Histogram – Identifying Shape of the Data

A histogram is a graphical analysis tool used to identify the shape of the data. In this post, we will discuss the main characteristics of a histogram, how to identify the shape or distribution of your data set using a histogram, and how to plot a histogram using Minitab as well as Excel.

What is a Histogram?

A histogram is a graphical analysis tool. In simple words, it is a bar chart representing your complete data set. However, this bar chart does not plot the individual data values. Instead, it divides the range of the data into multiple ‘intervals’ or ‘bins’ and plots the frequency, that is, the number of values that fall into each interval. Thus, it is essentially a bar chart depicting the frequency of the values in your data.

Histograms are useful for analysing continuous data rather than discrete data. More about continuous and discrete data types. Histograms help Lean Six Sigma practitioners understand the shape, or distribution, of the data. They show where the center of the data is, or where the data is most concentrated. They also show the spread of the data, which is the variation between the lowest and highest values. Remember, the center of the data is described by the average (mean) or the median. However, the mean or median does not necessarily lie at the center of the histogram, as we will see in a few examples below.
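To make the ideas of center and spread concrete, here is a minimal Python sketch. The values are made up for illustration (they are not the data set used later in this post) and are deliberately right-skewed, so the mean and median sit well to the left of the middle of the plotted range.

```python
from statistics import mean, median

# Illustrative values only (not the data set used in this post).
# A few large values on the right make the data right-skewed.
times = [4, 5, 5, 6, 6, 7, 7, 8, 9, 10, 12, 15, 22, 38]

center_mean = mean(times)                   # average of all values
center_median = median(times)               # middle value of the sorted data
spread = max(times) - min(times)            # range: highest minus lowest value
midpoint = (min(times) + max(times)) / 2    # middle of the plotted range

print(f"mean={center_mean}, median={center_median}")
print(f"spread={spread}, midpoint of range={midpoint}")
```

Here both the mean and the median are far below the midpoint of the range, which is why the center of the data and the center of the chart should not be confused.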

Related Post : What are the measures of Variation?

Download my latest eBook – Lean Six Sigma Acronyms

Contains 220+ LSS acronyms and abbreviations, a handy reference guide for all LSS practitioners. And it's FREE!

How to create a Histogram

Before we start creating histograms in Minitab or Excel, let us first understand, step by step, how a histogram is built. This will also help you understand some of the important characteristics of histograms.

Let us consider the below data set. It is the time taken, in minutes, for an operator to fulfil each order at an e-commerce warehouse.

Histogram Data Set

This data set has 50 values. The minimum value is 3.79 and the maximum is 39.60, which means the data ranges roughly from 0 to 40. We start the range at 0 because it is the lowest natural boundary: the time taken to fulfil an order cannot practically go below it.

To plot a histogram, we first divide this range into equal intervals. For this example, let's consider intervals of 5 minutes each, which gives us a total of 8 intervals, as shown in the image below. Next, we count the number of values that fall into each interval. There are 3 values between 0 and 5 minutes, 5 values between 5.01 and 10 minutes, 8 values between 10.01 and 15 minutes, and so on.

Histogram Frequency table
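The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 50 observations are shown as an image in this post, so the values below are made up (only the minimum 3.79 and maximum 39.60 match the text), and `bin_index` is a hypothetical helper that places a value equal to an upper boundary, such as 5 or 10, in the lower interval, matching the "0 to 5, 5.01 to 10" convention.

```python
import math

# Illustrative values only: the real data set is shown as an image,
# so these numbers are made up for the sketch.
times = [3.79, 4.2, 7.5, 9.9, 10.0, 12.3, 14.8, 17.1, 21.6, 26.4, 33.0, 39.60]

BIN_WIDTH = 5   # minutes per interval
NUM_BINS = 8    # 8 intervals cover 0 to 40 minutes


def bin_index(value, width=BIN_WIDTH):
    """Return the index of the interval (lower, upper] that holds value."""
    if value <= 0:
        return 0
    return math.ceil(value / width) - 1


# Build the frequency table: one count per interval.
counts = [0] * NUM_BINS
for t in times:
    counts[bin_index(t)] += 1

for i, count in enumerate(counts):
    lower, upper = i * BIN_WIDTH, (i + 1) * BIN_WIDTH
    print(f"{lower:>2} to {upper:>2} min: {count}")
```

Every value lands in exactly one interval, so the counts always add up to the number of observations.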

We then use the information from the frequency table above to plot the bar chart. The X axis shows the intervals and the Y axis shows the frequency of each interval. The resulting bar chart is a histogram.

Histogram Example
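As a rough sketch of how a frequency table turns into a chart, the Python snippet below draws a text histogram with one `#` per observation. The counts are hypothetical (not the frequency table from this post), `render_histogram` is a made-up helper name, and the bars run horizontally here simply because that is easier to print as text.

```python
def render_histogram(counts, bin_width, start=0):
    """Return one text line per interval, with one '#' per observation."""
    lines = []
    for i, count in enumerate(counts):
        lower = start + i * bin_width
        upper = lower + bin_width
        lines.append(f"{lower:>3}-{upper:<3} | {'#' * count} ({count})")
    return lines


# Hypothetical frequency table, not the one from this post.
example_counts = [2, 3, 2, 1, 1, 1, 1, 1]
chart = render_histogram(example_counts, bin_width=5)
print("\n".join(chart))
```

The length of each bar is the frequency of that interval, which is exactly what the bars in Excel or Minitab represent.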


Histogram using MS Excel

There are multiple ways to create a histogram in Excel. One of the easiest is to use the free Excel add-in for data analysis. This add-in is not only useful for creating histograms but also comes in handy for a lot of other analyses. Hence I strongly recommend adding it.

The steps are simple. From the Excel File menu, go to Options, which is at the bottom of the File menu. Click on Add-ins and select ‘Analysis ToolPak’ from the ‘Inactive Application Add-ins’ list. At the bottom of the dialog box, ensure that ‘Excel Add-ins’ is selected in the drop-down and click on Go. If a new dialog box appears, ensure that Analysis ToolPak is ticked and click on OK. You will then see the Data Analysis option under the ‘Data’ menu in Excel, at the top right corner. I will try to put up a detailed post on adding this, with relevant screenshots, sometime later and link it here.

Now that you have the Analysis ToolPak, here are the steps to create the histogram.

Step 1. Copy the 50 data values above into a single column of an Excel file. Create one more column which lists the intervals or bins. We will use intervals of 5 minutes each, as described earlier. Hence this interval column will have the values 0, 5, 10, 15, and so on up to 45. The data will look as shown in the image below.

Step 1 – Histogram in Excel

Step 2. Click on the “Data” tab in the Excel menu bar. You will see the Data Analysis button at the top right of this menu. Click on it and a dialog box will appear.

Step 2 – Histogram in Excel
Data Analysis Dialog box

Step 3. Select Histogram in the dialog box and click on OK. The Histogram dialog box will appear as shown below.

Histogram Dialogue box

Step 4. In the Input Range field, select the cells where you have the time data stored. I have it saved in cells B3 to B52, hence I will select the same. In the Bin Range field, select the cells where you have the intervals stored. For me, it's cells C3 to C12. If you need the output in the same worksheet, select the Output Range radio button and enter the cell where you want the output to start. Select the New Worksheet Ply radio button if you need the output in a new worksheet, or New Workbook if you need it in a new file. Ensure that you tick the Chart Output option to get the histogram. Once all of this is done, hit OK.

Step 4 – Histogram in Excel

Once you click on OK, you get the output below. It contains the frequency table as well as the histogram.

Histogram Output in Excel
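To see how the bin column drives the frequency table, here is a small Python sketch of the counting rule as I understand the Analysis ToolPak to apply it: each bin value acts as an inclusive upper limit, and anything above the largest bin is reported in a final ‘More’ row. The function name `toolpak_style_counts` and the sample values are assumptions made for illustration; this is not Excel's own code and not the data from this post.

```python
from bisect import bisect_left


def toolpak_style_counts(values, bins):
    """Count values per bin, treating each bin value as an inclusive upper limit.

    Returns len(bins) + 1 counts; the last one is the 'More' row for
    values above the largest bin.
    """
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)
    for v in values:
        # bisect_left gives the position of the first bin that is >= v
        counts[bisect_left(bins, v)] += 1
    return counts


bins = list(range(0, 50, 5))  # 0, 5, 10, ..., 45 as in Step 1
# Illustrative values only, not the 50 values from this post.
times = [3.79, 4.2, 5.0, 7.5, 10.0, 12.3, 39.60, 47.2]

counts = toolpak_style_counts(times, bins)
for label, count in zip([*bins, "More"], counts):
    print(f"{label:>4}: {count}")
```

Note that a value of exactly 5.0 is counted in the bin labelled 5, and the made-up value 47.2 ends up in ‘More’ because it is larger than the last bin, 45.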


Histogram using Minitab

Creating the same chart in Minitab is much easier. However, the Histogram dialog in Minitab does not ask you to define the intervals or bins. Minitab automatically calculates a bin size and then plots the histogram by itself.

To start with, paste the 50 data values into a single column of a Minitab worksheet. Once you have this data in the worksheet, go to the ‘Graph’ menu in the menu bar. You will see Histogram listed in this menu, as shown in the image below. Click on it.

Histogram in Minitab – Path

Once you click on Histogram, a pop-up will appear asking which type of histogram you wish to create. Ignore the last 2 options for now. The first option, ‘Simple’, plots only the histogram, whereas the second, ‘With Fit’, plots the histogram and also fits a distribution curve over it. You should try both. For this example, we select ‘Simple’. Hit OK.

Histogram in Minitab – Dialogue Box 1

Once you hit OK, another dialog box will appear asking for ‘Graph Variables’. This is where you enter the column in which your data is stored. Double-click on the column name in the list on the left-hand side. Once done, hit OK.

Histogram in Minitab – Dialogue Box 2

That is all. You will get the histogram as graphical output in a new window, as shown below. I suggest you also try the same steps with multiple data sets, and with the ‘With Fit’ option, for better understanding.

Histogram in Minitab – Output

Now that we know how to create a histogram, let’s look at one very important aspect that can change how the shape of your data looks: the interval size.

Importance of Intervals

The image below shows four histograms plotted using the same set of data. Although the data is the same, the graphs look very different from each other. Why do you think this happens?

This happens because of the different interval or bin sizes that we defined. The first graph uses a bin size of 5 minutes. Just by looking at that histogram, you may tend to conclude that the data is normally distributed. However, the moment we change the bin size from 5 to 4, the graph no longer looks the same. The third graph, with a bin size of 3, looks even further from a normal distribution. The last graph, with a bin size of 2 minutes, gives a completely different picture and makes it very difficult to draw any conclusion about the normality of this data set.

Do remember that some statistical analysis tools will give you the option to select the interval or bin size of your choice, and some will not.

Histogram with different intervals
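The effect of the bin size can also be reproduced with a short Python sketch. The 20 values below are made up for illustration (they are not the data behind the image), and `bin_counts` is a hypothetical helper that counts values in consecutive intervals starting at 0, with each upper boundary included in its interval.

```python
import math


def bin_counts(values, width):
    """Count values in consecutive intervals (lower, upper] of the given width."""
    num_bins = math.ceil(max(values) / width)
    counts = [0] * num_bins
    for v in values:
        index = max(math.ceil(v / width) - 1, 0)
        counts[index] += 1
    return counts


# Illustrative values only, not the data set from this post.
times = [3.79, 6.1, 8.4, 9.7, 11.2, 12.9, 13.5, 14.1, 16.3, 17.8,
         18.2, 19.9, 21.4, 22.6, 24.0, 27.3, 29.5, 33.1, 36.8, 39.60]

# Same data, four different bin sizes.
for width in (5, 4, 3, 2):
    print(f"bin size {width}: {bin_counts(times, width)}")
```

The data never changes, yet each bin size produces a different number of bars and a different pattern of heights, which is why the histograms look so different.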

The shape of the histogram always depends on the interval or bin size used to plot it. Hence, it is strongly advised to look at the results of normality tests and their p-values before you conclude whether a data set is normal. A histogram gives you a graphical view of normality or non-normality, but it should not be used as a standalone tool for the final conclusion.
