Normal Distribution for Lean Six Sigma

The Normal distribution, also called the Gaussian distribution, is arguably the most important distribution in statistical analysis. As a Lean Six Sigma practitioner, you need to understand this distribution, its characteristics and its applications in projects. We will look at the details of the Normal distribution and how it applies in Lean Six Sigma. Read on!

In the last posts, we discussed basic probability concepts and probability distributions at length, covering both discrete and continuous distributions. We also looked at histograms and how to build them to identify the shape or distribution of the data. Read those related posts first, as we will build on them here.

What is Normal Distribution?

A Normal distribution is what you get when the data is clustered around the center (the mean) and tails off towards both sides almost symmetrically. This means that more data points lie near the center of the range of your data set than near either end.

Normal Distribution Histogram and Curve example

Weights of students in a class

Let us take a simple example of weights of students in a particular class with 50 students. Most of the students will normally have approximately the same weight. This would be close to the average or mean weight of the class. There would be a few students who are overweight and a few who are underweight.

If you actually collect such data, you will get something similar to the table below.

42.9  46.5  48.1  48.8  49.7
44.4  47.3  48.2  48.9  49.7
44.5  47.4  48.3  49.1  50.0
44.6  47.6  48.4  49.2  50.4
46.4  47.7  48.7  49.3  50.4
50.4  51.0  51.2  51.7  53.1
50.4  51.0  51.4  51.9  53.4
50.6  51.1  51.4  52.1  53.6
50.7  51.2  51.6  52.2  54.8
50.7  51.2  51.6  53.1  56.4

Table with weights of 50 students from a class

Download my latest eBook – Lean Six Sigma Acronyms

Contains 220+ LSS acronyms and abbreviations, a handy reference guide for all LSS practitioners. And it's FREE!

Observations from the Weights Data

If you observe carefully or, even better, plot a histogram of the above data, you will see the following:

  1. The average weight of the class is 49.9 kgs, very close to 50.
  2. The weights are clustered around this mean: 32 of the 50 students weigh between 48 and 52 kgs.
  3. Only 4 students weigh less than 46 kgs, and just one weighs less than 44 kgs.
  4. Similarly, only 2 students weigh more than 54 kgs, and just one weighs more than 56 kgs.

The table below shows the frequency of data points in each interval, and the graph shows the histogram of the above data. Each interval includes its lower limit and excludes its upper limit, so 50.0 kgs is counted in the 50 kgs to 52 kgs interval.

Interval            Frequency
42 kgs to 44 kgs    1
44 kgs to 46 kgs    3
46 kgs to 48 kgs    6
48 kgs to 50 kgs    12
50 kgs to 52 kgs    20
52 kgs to 54 kgs    6
54 kgs to 56 kgs    1
56 kgs to 58 kgs    1

Normal Distribution Histogram of Weights Data

Such a data set is said to be normally distributed.
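The frequency counts for the weights data can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name `bin_counts` is made up for illustration, and it assumes each 2 kg interval includes its lower limit and excludes its upper limit.

```python
# Weights (kgs) of the 50 students, copied from the table in this post.
weights = [
    42.9, 46.5, 48.1, 48.8, 49.7,
    44.4, 47.3, 48.2, 48.9, 49.7,
    44.5, 47.4, 48.3, 49.1, 50.0,
    44.6, 47.6, 48.4, 49.2, 50.4,
    46.4, 47.7, 48.7, 49.3, 50.4,
    50.4, 51.0, 51.2, 51.7, 53.1,
    50.4, 51.0, 51.4, 51.9, 53.4,
    50.6, 51.1, 51.4, 52.1, 53.6,
    50.7, 51.2, 51.6, 52.2, 54.8,
    50.7, 51.2, 51.6, 53.1, 56.4,
]


def bin_counts(values, start, width, n_bins):
    """Count values per interval (hypothetical helper).

    Assumption: each interval includes its lower limit and excludes its upper limit.
    """
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


counts = bin_counts(weights, start=42, width=2, n_bins=8)

# Print a simple text histogram, one row per interval.
for i, c in enumerate(counts):
    lo = 42 + 2 * i
    print(f"{lo} kgs to {lo + 2} kgs: {c:2d} {'#' * c}")
```

The row of `#` characters gives a rough sideways histogram, which is enough to see the clustering around 50 kgs.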

Parameters of Normal Distribution

Any probability distribution usually is defined by some key parameters. These parameters go on to define the shape and hence the distribution of the data.

For Normal distribution, Mean and Standard deviation are such parameters. The shape of your normal distribution will depend on the mean and the standard deviation of your data set. Let’s see how!

Mean

Mean is the mathematical average of the data. It is the sum of all data values in your data set divided by the total number of values you have in the data set.

Mean is used to represent the central tendency of the data. As discussed earlier, most of the values in normally distributed data will be clustered around the mean. In terms of probabilities, a value is much more likely to fall close to the mean than far away from it.

The position of the normal curve changes when the mean of the data changes. The whole normal distribution curve moves to the left or right depending on the change in the mean, while its shape stays the same.

Shift in Normal Distribution Curve with change in Mean

Standard Deviation

Standard deviation is a measure of the variation in your data. It essentially represents how far the data values are spread from the mean value. Hence, standard deviation dictates the width of your normal distribution curve.
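Both parameters can be computed from the weights data with Python's standard `statistics` module. This is a minimal sketch, not a prescribed method. Note the assumption in it: the Z score table later in this post appears to be based on the population standard deviation (`pstdev`), while the sample standard deviation (`stdev`) comes out slightly larger. Both round to 2.7.

```python
import statistics

# Weights (kgs) of the 50 students, copied from the table in this post.
weights = [
    42.9, 46.5, 48.1, 48.8, 49.7,
    44.4, 47.3, 48.2, 48.9, 49.7,
    44.5, 47.4, 48.3, 49.1, 50.0,
    44.6, 47.6, 48.4, 49.2, 50.4,
    46.4, 47.7, 48.7, 49.3, 50.4,
    50.4, 51.0, 51.2, 51.7, 53.1,
    50.4, 51.0, 51.4, 51.9, 53.4,
    50.6, 51.1, 51.4, 52.1, 53.6,
    50.7, 51.2, 51.6, 52.2, 54.8,
    50.7, 51.2, 51.6, 53.1, 56.4,
]

# Mean: sum of all values divided by the number of values.
mean = statistics.fmean(weights)

# Population standard deviation (divides by n) and
# sample standard deviation (divides by n - 1).
population_sd = statistics.pstdev(weights)
sample_sd = statistics.stdev(weights)

print(f"mean          = {mean:.2f} kgs")
print(f"population sd = {population_sd:.2f} kgs")
print(f"sample sd     = {sample_sd:.2f} kgs")
```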

Related Post : What are the measures of Variation?

When the standard deviation is small, the curve is tall and narrow. When the standard deviation is large, the curve is short and wide. The image below shows this.

Change in Normal Distribution Curve with change in Standard Deviation
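Both effects described above (the curve shifting with the mean and changing width with the standard deviation) can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper `peak_height` and the example means and standard deviations are illustrative values, not taken from any data set in this post.

```python
from statistics import NormalDist


def peak_height(mean, sd):
    """Height of the normal curve at its center, which is the mean (hypothetical helper)."""
    return NormalDist(mu=mean, sigma=sd).pdf(mean)


# Same standard deviation, different means:
# the peak moves left or right, but its height does not change.
for mean in (45, 50, 55):
    print(f"mean={mean}, sd=3 -> peak at x={mean}, height={peak_height(mean, 3):.4f}")

# Same mean, different standard deviations:
# a smaller sd gives a taller (and therefore narrower) curve.
for sd in (1, 2, 3):
    print(f"mean=50, sd={sd} -> peak height={peak_height(50, sd):.4f}")
```

Because the total area under the curve is always 1, a taller peak necessarily means a narrower curve.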

Thus, the actual shape of your normal distribution curve will depend on these 2 parameters, mean and standard deviation.

Now that we understand what normal distribution is and how the shape of the curve varies based on mean and standard deviation, let us take a look at some of its characteristics.

Characteristics of a Perfect Normal Distribution

A perfect normal distribution will always have the characteristics below. Do not confuse this with the standard normal distribution, which is different; we will talk about it later in this post.

1. It will always look like a bell shaped curve

2. The mean will always be at the center of the curve

3. Half of the data points will always be greater than the mean, that is, on the right side of the mean

4. Other half of the data points will always be smaller than the mean, that is, on the left side of the mean

5. The mean, mode and median of the perfectly normally distributed data will always be equal

6. 68.2% of the data values will always be between +/- 1 standard deviation from the mean

7. 95.4% of the data values will always be between +/- 2 standard deviations from the mean

8. 99.7% of the data values will always be between +/- 3 standard deviations from the mean

The normal curve below shows the percentage of values within each standard deviation range. Take a minute to go through it.

Area Under the Normal Curve
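The percentages above are properties of the normal curve itself, so they can be computed rather than memorized. A minimal sketch using the error function from Python's `math` module follows; the function name `share_within` is made up for illustration.

```python
import math


def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within +/- {k} std dev: {share_within(k):.2%}")
```

This prints 68.27%, 95.45% and 99.73%, which this post quotes as 68.2%, 95.4% and 99.7%.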

What do these probabilities mean?

The weights data mentioned above has a mean of 49.9 kgs and a standard deviation of 2.7 kgs, and it approximately follows a normal distribution. This means that if you pick a random student from the class, there is a 68.2% probability that this student weighs between 47.2 kgs and 52.6 kgs (+/- 1 std dev). There is also a 95.4% chance that this student weighs between 44.5 kgs and 55.3 kgs (+/- 2 std dev).

Thus, once you know that a particular data set follows normal distribution and know the parameters (mean and standard deviation), you can predict the probability of the random variable taking a value within a range.

The same is true for any normally distributed data set.
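As a sketch of this kind of prediction, the probability of a weight falling in any range can be read off the cumulative distribution function. The code below assumes the weights are modeled as exactly normal with the rounded parameters quoted above (mean 49.9, standard deviation 2.7); `prob_between` is a hypothetical helper name.

```python
from statistics import NormalDist

# Assumption: weights modeled as a normal distribution with the rounded
# mean and standard deviation quoted in the text.
weights_model = NormalDist(mu=49.9, sigma=2.7)


def prob_between(dist, low, high):
    """Probability that a value drawn from dist falls between low and high."""
    return dist.cdf(high) - dist.cdf(low)


p_1sd = prob_between(weights_model, 47.2, 52.6)  # +/- 1 std dev
p_2sd = prob_between(weights_model, 44.5, 55.3)  # +/- 2 std dev
p_48_52 = prob_between(weights_model, 48, 52)    # any other range works too

print(f"P(47.2 to 52.6 kgs) = {p_1sd:.1%}")
print(f"P(44.5 to 55.3 kgs) = {p_2sd:.1%}")
print(f"P(48 to 52 kgs)     = {p_48_52:.1%}")
```

The last line shows that the range does not have to be a whole number of standard deviations.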

One important point to remember here.

A normal distribution is not the same as a symmetrical distribution. All normal distributions are symmetrical, but not all symmetrical distributions are normal.

Standard Normal Distribution

We saw that the normal distribution curve can take various shapes depending on the mean and standard deviation of the data. There can be one with a mean of 50 and a standard deviation of 3, and another with a mean of 100 and a standard deviation of 5. The next question we need to answer is how to compare such different normally distributed data sets or processes.

Converting the data into a Standard normal distribution is the answer.

This is a distribution with a mean of ‘Zero’ and a standard deviation of 1.

All normal distributions can be converted to the standard normal distribution. This is done by calculating the standard score, or Z score, for each data value in your data set. We can then compare them, since they are on the same scale. This distribution is also called the Z-Distribution.

Essentially, a Z score of a data point represents how far the said data point is from the mean. If you have a Z Score of 0, it means the data point is the mean of the data. A Z score of 1 means that the data point is 1 standard deviation on the right side of the mean (mean + 1 std dev). A Z score of -1 means that the data point is 1 standard deviation on the left side of the mean (mean – 1 std dev).

How to calculate the Z Scores?

It is quite simple. All you need to know is the mean and the standard deviation of your data set.

To calculate the Z Score for a data point, simply subtract the mean of the data from the data value and divide the result by the standard deviation. That's it.

The mean weight of our example class is 49.9 and the standard deviation is 2.7. The weight of the first student was 42.9 kgs, the first value in our data set. To calculate the Z score of this value, first subtract the mean from this value.

We get 42.9 - 49.9 = -7

Next, divide this difference by the standard deviation.

We get -7/2.7 = -2.59

This is the Z Score for 42.9 kgs.

Similarly, for 54.8 kgs, another value in our data set, the Z score is (54.8-49.9)/2.7 = 1.81. You can calculate the Z scores for all the values using the same method. See the table below. It uses the unrounded mean (49.89) and standard deviation (about 2.65), so its Z scores differ slightly from the hand calculations above (for example, -2.63 instead of -2.59 for 42.9 kgs).

Table with Z Scores for Weights

Weights   Weight - Mean   Z Score
50.40      0.51            0.1938
52.10      2.21            0.8349
50.60      0.71            0.2692
51.40      1.51            0.5709
48.80     -1.09           -0.4095
50.40      0.51            0.1938
50.70      0.81            0.3069
47.70     -2.19           -0.8243
44.60     -5.29           -1.9933
48.10     -1.79           -0.6735
42.90     -6.99           -2.6343
48.30     -1.59           -0.5981
50.70      0.81            0.3069
48.20     -1.69           -0.6358
44.50     -5.39           -2.0310
49.70     -0.19           -0.0701
51.00      1.11            0.4201
47.40     -2.49           -0.9374
53.60      3.71            1.4005
49.70     -0.19           -0.0701
51.10      1.21            0.4578
51.20      1.31            0.4955
50.40      0.51            0.1938
48.90     -0.99           -0.3718
50.40      0.51            0.1938
51.60      1.71            0.6463
51.40      1.51            0.5709
56.40      6.51            2.4563
50.00      0.11            0.0430
51.70      1.81            0.6840
53.10      3.21            1.2119
46.50     -3.39           -1.2768
48.70     -1.19           -0.4472
51.20      1.31            0.4955
47.60     -2.29           -0.8620
51.60      1.71            0.6463
53.40      3.51            1.3251
47.30     -2.59           -0.9751
53.10      3.21            1.2119
48.40     -1.49           -0.5603
49.30     -0.59           -0.2210
54.80      4.91            1.8530
49.10     -0.79           -0.2964
51.20      1.31            0.4955
44.40     -5.49           -2.0687
49.20     -0.69           -0.2587
51.90      2.01            0.7594
52.20      2.31            0.8726
46.40     -3.49           -1.3145
51.00      1.11            0.4201
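
The Z scores in the table can be reproduced with the short sketch below. It assumes the table was built from the unrounded mean and the population standard deviation (`statistics.pstdev`), which is what the tabulated values suggest; `z_score` is a hypothetical helper name.

```python
import statistics

# Weights (kgs) of the 50 students, copied from the table in this post.
weights = [
    42.9, 46.5, 48.1, 48.8, 49.7,
    44.4, 47.3, 48.2, 48.9, 49.7,
    44.5, 47.4, 48.3, 49.1, 50.0,
    44.6, 47.6, 48.4, 49.2, 50.4,
    46.4, 47.7, 48.7, 49.3, 50.4,
    50.4, 51.0, 51.2, 51.7, 53.1,
    50.4, 51.0, 51.4, 51.9, 53.4,
    50.6, 51.1, 51.4, 52.1, 53.6,
    50.7, 51.2, 51.6, 52.2, 54.8,
    50.7, 51.2, 51.6, 53.1, 56.4,
]

mean = statistics.fmean(weights)
sd = statistics.pstdev(weights)  # assumption: population standard deviation


def z_score(value, mean, sd):
    """Distance of value from the mean, measured in standard deviations."""
    return (value - mean) / sd


z_scores = [z_score(w, mean, sd) for w in weights]

# The two values worked through by hand in the text.
for w in (42.9, 54.8):
    print(f"{w} kgs -> Z = {z_score(w, mean, sd):.4f}")

# After standardizing, the Z scores have a mean of 0 and a standard deviation of 1.
print(f"mean of Z scores = {abs(statistics.fmean(z_scores)):.4f}")
print(f"sd of Z scores   = {statistics.pstdev(z_scores):.4f}")
```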

The distribution of the Z Scores will look as shown below. This is the Standard Normal Distribution for the weights data of our class.

Standard Normal Distribution

Now, if you convert the weights data of multiple classes into Standard Normal Distributions, with a mean of 0 and a standard deviation of 1, you can easily compare the weights across all the classes. You can also see whether a particular student is overweight or underweight relative to his or her class. And you can make statements such as "student A from class VI is doing better than student B from class X" based on how far from the class mean, and on which side of it, each student stands. Thus, the standard normal distribution helps us "compare apples and oranges" as well 🙂
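Here is a minimal sketch of such a comparison. It reuses the two example distributions mentioned earlier (mean 50 with standard deviation 3, and mean 100 with standard deviation 5); the two data points, 56 and 106, are made-up values for illustration only.

```python
def z_score(value, mean, sd):
    """Distance of value from the mean, measured in standard deviations."""
    return (value - mean) / sd


# Hypothetical data points from two different normally distributed data sets.
z_a = z_score(56, mean=50, sd=3)     # data set A: mean 50, std dev 3
z_b = z_score(106, mean=100, sd=5)   # data set B: mean 100, std dev 5

print(f"A: Z = {z_a:.2f}")
print(f"B: Z = {z_b:.2f}")

# Both points are 6 units above their own mean, but relative to the spread
# of its own data set, the point from A is further out than the one from B.
if z_a > z_b:
    print("The point from A is more extreme relative to its own data set")
else:
    print("The point from B is at least as extreme relative to its own data set")
```

The raw values cannot be compared directly, but the Z scores can, because both are expressed in standard deviations from their own mean.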

Importance of Normal Distribution

From what we have discussed above, it is quite evident how important the normal distribution is in statistically analyzing data, especially for drawing conclusions about the population based on sample data. That is exactly what Lean Six Sigma practitioners do, so it is critical for every practitioner to understand the Normal distribution.

Apart from what we already discussed, there are other reasons why the Normal distribution is important.

In a Lean Six Sigma project, your data will either follow a normal distribution or a non-normal distribution. More often than not, you will see that process data follows a Normal distribution.

A lot of the tests that you will do in the Analyze phase, as well as some in the Measure phase, assume that your data follows a normal distribution.

Even if your data does not follow a normal distribution, if you draw multiple samples from the same data set, the means of those samples tend to follow a normal distribution, provided each sample is reasonably large. This is especially important because, in our processes, it is usually not possible to capture data for the whole population. We usually pick multiple samples from the population. The distribution of the means of such samples will be approximately normal, largely irrespective of the distribution of the population data. More on this when we discuss the Central Limit Theorem.
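A small simulation illustrates the idea. Everything in this sketch is an assumption chosen for illustration: the population is a strongly skewed exponential distribution with mean 1, each sample has 30 values, 2000 samples are drawn, and the random seed is fixed so that the run is repeatable. Skewness (0 for a perfectly symmetrical distribution) is used as a rough indicator of how far the shape is from normal.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30    # values per sample (assumption)
N_SAMPLES = 2000    # number of samples drawn (assumption)

# Draw samples from a skewed, clearly non-normal population:
# an exponential distribution with mean 1.
samples = [
    [rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    for _ in range(N_SAMPLES)
]
raw_values = [v for sample in samples for v in sample]
sample_means = [statistics.fmean(sample) for sample in samples]


def skewness(values):
    """Average cubed Z score: 0 for symmetrical data, positive for a long right tail."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)


print(f"skewness of raw values   = {skewness(raw_values):.2f}")
print(f"skewness of sample means = {skewness(sample_means):.2f}")
print(f"mean of sample means     = {statistics.fmean(sample_means):.3f}")
print(f"sd of sample means       = {statistics.pstdev(sample_means):.3f}")
```

The raw values are heavily skewed, while the sample means are much closer to symmetrical and cluster around the population mean, with a spread close to the population standard deviation divided by the square root of the sample size.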

This covers what a Lean Six Sigma practitioner needs to understand about the Normal distribution. Do let me know in the comments section below if there is anything I missed or if you have any comments or questions. I will surely get back to you. Don't forget to subscribe so you won't miss out on the latest posts!
