The word ‘statistics’ often sends a chill down the spine of anyone who spent years being flogged with the calculations by professors at university. Whatever your experience with statistics, they do exist and they are genuinely useful. In this article I simply want to explain the different types that exist.
Numerous types of statistics are used; these include Descriptive Statistics, Inferential Statistics, Mathematical Statistics and Exact Statistics. The two major forms of statistics that we are concerned with as Lean Six Sigma practitioners are Descriptive Statistics and Inferential Statistics.
Descriptive statistics describe, summarise and display the characteristics of data, typically a sample of data taken from a population or process. Descriptive statistics encompass both quantitative methods and graphical techniques which help us understand certain things about the data set, such as:
- Central tendency
- Symmetry of its distribution
- Distribution of proportions as they relate to certain variables
These are examples of what descriptive statistics look like:
- A table of numbers that displays the statistics of a sample of data (e.g. mean, standard deviation, variance, range, min, max etc.)
- A vertical column chart displaying the total value per month over the period of 1 year
- A histogram displaying the frequency distribution of the entire data set
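As a rough illustration of the first example above, the following sketch builds a small table of descriptive statistics using Python's standard `statistics` module. The cycle-time values are made up purely for illustration and are not taken from any real process.

```python
import statistics

# Hypothetical sample: cycle times in minutes (made-up values for illustration)
sample = [12.1, 11.8, 12.4, 13.0, 11.5, 12.2, 12.9, 11.9]

summary = {
    "n": len(sample),
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "variance": statistics.variance(sample),  # sample variance (n - 1 denominator)
    "std_dev": statistics.stdev(sample),      # sample standard deviation
    "min": min(sample),
    "max": max(sample),
    "range": max(sample) - min(sample),
}

# Print the summary as a simple two-column table
for name, value in summary.items():
    print(f"{name:>8}: {value:.3f}")
```

Note that `statistics.variance` and `statistics.stdev` treat the data as a sample; `statistics.pvariance` and `statistics.pstdev` are the counterparts to use when the data covers the whole population.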
Inferential statistics, on the other hand, model patterns in data in such a way that uncertainty and randomness are accounted for, so that reasonably accurate inferences can be made about a specific process or population.
Inferential statistics use various advanced methods to help us understand the following:
- Whether or not samples come from different populations (e.g. hypothesis testing)
- Whether or not one or more variables are associated with variation in one or more other variables (e.g. correlation)
- Models for estimating outcomes for a specific dependent variable when other independent variables are altered (e.g. regression)
- The accuracy of estimates of population characteristics (e.g. interval estimation)
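To make the last item (interval estimation) more concrete, here is a minimal sketch of a confidence interval for a population mean. It assumes a normal approximation, which is only reasonable for larger samples; for small samples a t-distribution would normally be used instead. The function name `mean_confidence_interval` and the simulated cycle times are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def mean_confidence_interval(data, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Assumption: uses the normal approximation (z value), not the t-distribution.
    """
    n = len(data)
    sample_mean = statistics.mean(data)
    std_error = statistics.stdev(data) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return sample_mean - z * std_error, sample_mean + z * std_error


rng = random.Random(7)  # fixed seed so the sketch is repeatable
# Hypothetical sample: 40 simulated cycle times in minutes
sample = [rng.gauss(12.0, 0.5) for _ in range(40)]

low, high = mean_confidence_interval(sample)
print(f"sample mean = {statistics.mean(sample):.2f}")
print(f"95% CI for the population mean: ({low:.2f}, {high:.2f})")
```

A narrower interval indicates a more precise estimate of the population characteristic; increasing the sample size or reducing the variation in the data both tighten it.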
These are examples of specific Inferential Statistics tests:
- T Tests (1 sample and 2 sample)
- Z Tests
- Analysis of Variance (1 way and 2 way ANOVA)
- Correlation and Regression
- Chi-Square
- F Test
- Anderson-Darling Normality Test
- Bartlett's and Levene's Tests
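As one hedged example from the list above, the sketch below runs a two-sided one-sample Z test using only the Python standard library. It assumes the population standard deviation is known, which is what distinguishes a Z test from a T test. The function name `one_sample_z_test`, the fill weights, the target of 500 g and the sigma of 2 g are all invented for illustration.

```python
import math
from statistics import NormalDist, mean


def one_sample_z_test(data, mu0, sigma):
    """Two-sided one-sample z test.

    Assumption: the population standard deviation `sigma` is known.
    Returns the z statistic and the two-sided p-value.
    """
    n = len(data)
    z = (mean(data) - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical fill weights in grams (made-up values for illustration)
weights = [501.2, 499.8, 502.5, 500.9, 503.1, 501.7, 500.4, 502.0]

z, p = one_sample_z_test(weights, mu0=500.0, sigma=2.0)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

A small p-value suggests the sample is unlikely to have come from a population whose mean equals the hypothesised value; the threshold for "small" (often 0.05) is a choice made before the test is run.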
To undertake statistical analysis effectively, we must understand the following terms.
A parameter is a numerical measure that describes a specified characteristic of a population. For example:
- The average operating cost per month for our entire business is $5.75m
- 30 percent of all university students during the year of 2002 applied for HECS fees
- All fines issued by police officers for speeding during 2003 ranged from $100 to $275
- The average defect rate per month for our production line is 1.3 percent
A statistic is a numerical measure that describes a characteristic of a sample of data taken from a population. For example:
- The average height of the 50 high school students we sampled is 173.5 cm
- The range of heights for the 50 high school students we sampled was 163cm to 191cm
- 25 percent of the products we sampled last week as they came off the production line were out of spec
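The difference between a parameter and a statistic can be sketched in a few lines of Python. The population here is simulated, so every number is hypothetical: the mean height of all 5,000 simulated students plays the role of the parameter, and the mean of 50 randomly sampled students plays the role of the statistic.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Hypothetical population: simulated heights (cm) of 5,000 students
population = [rng.gauss(173.5, 7.0) for _ in range(5000)]

# Parameter: calculated from every member of the population
population_mean = statistics.mean(population)

# Statistic: calculated from a sample of 50 drawn from that population
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.1f} cm")
print(f"statistic (sample mean):     {sample_mean:.1f} cm")
```

The two values will usually be close but not identical; that gap is the sampling error that inferential statistics are designed to account for.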
Continuous Numerical Variables
Data generated through measuring characteristics using an infinitely divisible (continuous) scale.
The level of detail in the result depends on the resolution of the instrument being used to measure. (e.g.: time, weight, length, width, height, thickness, strength, density etc)
Discrete Numerical Variables
Data generated through counting attributes.
The results appear as whole units that cannot be divided. (e.g.: number of defects, number of males, number of players, number of complaints, shots on a golf course etc)
‘Half’ of one of these does not make practical sense; half a defect or half a player is not a meaningful value.
Categorical Variables
Data generated through observing categorical variables (or characteristics).
The results are qualitative in the information provided and yield values that can only be placed into categories or classes. (e.g.: pass or fail, causes of failure, defective or non-defective, male or female, low, medium or high etc)
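The three kinds of data lend themselves to different summaries, as the following sketch suggests. The variable names and values are hypothetical: measured values are averaged, counted values are totalled, and categorical values are tallied into classes.

```python
import statistics
from collections import Counter

# Hypothetical data, one list per kind of variable
thickness_mm = [2.31, 2.28, 2.35, 2.30]                        # continuous: measured
defects_per_unit = [0, 2, 1, 0, 3]                             # discrete: counted
inspection_result = ["pass", "fail", "pass", "pass", "pass"]   # categorical: classified

# Continuous data can take any value on the scale, so a mean is meaningful
print(f"mean thickness: {statistics.mean(thickness_mm):.3f} mm")

# Discrete data are whole-number counts
total_defects = sum(defects_per_unit)
print(f"total defects: {total_defects}")

# Categorical data can only be tallied into categories and expressed as proportions
counts = Counter(inspection_result)
proportions = {label: count / len(inspection_result) for label, count in counts.items()}
print(f"counts: {dict(counts)}")
print(f"proportions: {proportions}")
```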
Since data can refer either to the whole population in a study or to just a sample selected from that population, it is vitally important that we, as Lean Six Sigma professionals, are able to distinguish between values that describe a population and values that describe a sample.