# Data Type for Lean Six Sigma Projects

Why is data type important? A Lean Six Sigma project deals with a lot of data, and the project owner needs to perform various statistical tests on these data sets. Which statistical tests apply depends on the data type. Hence, it's imperative to understand the data type of every data set you have for your project.

**Quick disclaimer:** we will discuss only the data types relevant to a Lean Six Sigma project and will restrict ourselves to this scope. There are multiple alternative ways to classify data other than what is mentioned below. However, understanding the data types discussed below is both necessary and sufficient from an LSS project's perspective.

## Data Types

All data sets for your Lean Six Sigma project can be broadly classified into 2 data types: **Continuous data** and **Discrete data**. Discrete data is further classified into **Ordinal**, **Count** and **Binary** data types.

In addition to the above data types, there is a school of thought that treats some data as **pseudo continuous**. This is the data you get for a derived metric, as opposed to something that you collect or collate. We will discuss this in detail at the end of this post.

First, let's understand Discrete and Continuous data types.

**Download my latest eBook – Lean Six Sigma Acronyms**

Contains 220+ LSS acronyms and abbreviations, a handy reference guide for all LSS Practitioners. **And it's FREE!**

## Discrete Data Type

Simply put, discrete data is what can be counted, not measured, like the number of students, total invoices processed or total defective units.

Statistically, discrete data is a data set which can only contain integers, with no fractions or decimal values. There is no continuity between 2 data points. This means that if you take any 2 consecutive data points from a data set and split the difference between them into intervals, the intermediate values are not valid data points.

As a simple example, consider a data set of student ranks. Rank 1, 2, 3, 4 and so on are valid data points in this data set. Now, if you take rank 2 and rank 3 and split the difference in half, you get rank 2.5. This is not a valid data point for this data set. Hence, this data set is of Discrete data type by nature.

Thus, any data set in which the gap between 2 consecutive data points cannot be split into parts that are still valid data points is Discrete data.
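The rank example above can be sketched in a few lines of Python. This is only an illustration: the set `valid_ranks` and the class size of five are assumptions made for the sketch, not part of any real data set.

```python
# Hypothetical data set: the valid ranks in a class of five students.
valid_ranks = {1, 2, 3, 4, 5}

# Take two consecutive data points and split the difference in half.
lower_rank, upper_rank = 2, 3
midpoint = (lower_rank + upper_rank) / 2  # rank 2.5

# The midpoint is not a member of the data set, so there is no
# continuity between the two ranks: the data is discrete.
midpoint_is_valid = midpoint in valid_ranks
print(midpoint, midpoint_is_valid)
```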

### Types of Discrete data

Discrete data can be further categorised into 3 types.

### Binary

Any data set which consists of only 2 possible and valid values is of Discrete Binary data type. It can be a Yes or a No, a True or a False, a 1 or a 0, valid or invalid, pass or fail, correct or incorrect etc. Any data set which has only 2 possible values is binary.

Remember, data for an LSS project can be qualitative as well as quantitative. The trick is to convert the qualitative data into quantitative data for further analysis.
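As a minimal sketch of that conversion, qualitative pass/fail results can be mapped to 1 and 0. The sample results and the coding convention (1 = pass, 0 = fail) are assumptions chosen for illustration.

```python
# Hypothetical inspection results recorded as text (qualitative data).
inspection_results = ["pass", "fail", "pass", "pass", "fail"]

# Assumed coding convention: 1 = pass, 0 = fail.
BINARY_CODES = {"pass": 1, "fail": 0}

# Convert each qualitative value into its numeric code.
encoded = [BINARY_CODES[result] for result in inspection_results]

# Once encoded, the data can be summarised numerically,
# for example as the proportion of units that passed.
pass_rate = sum(encoded) / len(encoded)
print(encoded, pass_rate)
```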

### Ordinal

A data set whose data points correspond to a specific order or importance is of Discrete Ordinal data type. Let's understand this with an example.

Let's say you have a data set of ratings given by multiple customers for your product on a scale of 1 to 5, 1 being the lowest and 5 being the highest. This data set will consist of values ranging from 1 to 5. Each data point in such a data set has a standing relative to the other data points, and the whole data set has an order to it. Such data sets are of Discrete Ordinal data type.

Other examples of ordinal data type are ranks, ratings (discrete scale), scores (integers), grades etc.
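To illustrate what "an order to it" means in practice, here is a small sketch using letter grades. The grade scale `GRADE_ORDER` and the sample grades are hypothetical, chosen only to show that ordinal values can be ranked against each other.

```python
# Assumed grade scale, listed from lowest to highest.
GRADE_ORDER = ["D", "C", "B", "A"]

# Give each grade its position on the scale (1 = lowest).
grade_rank = {grade: position for position, grade in enumerate(GRADE_ORDER, start=1)}

# Hypothetical grades collected for five students.
grades = ["B", "A", "D", "C", "B"]

# Because the data is ordinal, it can be sorted by standing.
sorted_grades = sorted(grades, key=lambda grade: grade_rank[grade])
print(sorted_grades)
```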

### Count

The third type of discrete data is count. A data set in which each value is a count of people, things, units, products etc. is of discrete count data type.

If you have a data set of the number of students in each class of a school, it's count data. The number of defective units in each batch and the number of defects in each unit are also count data.
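A rough sketch of how such count data might be produced from raw records, using Python's standard `collections.Counter`. The defect log and batch names below are made up for illustration.

```python
from collections import Counter

# Hypothetical defect log: one entry per defective unit,
# recording the batch in which it was found.
defect_log = ["batch-1", "batch-2", "batch-1", "batch-3", "batch-1", "batch-2"]

# Tally defective units per batch. Every value is a whole number,
# which is what makes this count data.
defects_per_batch = Counter(defect_log)
print(dict(defects_per_batch))
```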

## Continuous Data Type

Simply put, continuous data is data that can be measured, not counted. Consider the weight or height of a person. You can’t count height or weight, you have to measure it. That’s continuous data. Other examples are temperature, humidity, pressure, time and so on.

Statistically, continuous data is a data set which can contain integers as well as fractions. It has a built-in continuity. If you pick any two consecutive values from the data set and divide the difference into multiple parts, each resulting value is a valid data point for this data set. Repeat this exercise and you will still get valid data points. Such a data set is of Continuous data type.

Take the example of weight. 10 kg is a valid weight. 11 kg is also a valid weight. If you split the range from 10 kg to 11 kg into 10 parts, 10.1, 10.2, 10.3 and so on are also valid weights. Further, 10.11, 10.12, 10.13 and so on are also valid weights. There is no break in the validity of the data points. Such a data set is of continuous data type.
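The repeated splitting described above can be sketched as follows. The helper `split_interval` is a hypothetical name introduced for this illustration, and `fractions.Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

def split_interval(low, high, parts):
    """Return the interior points that divide [low, high] into equal parts."""
    step = Fraction(high - low, parts)
    return [low + i * step for i in range(1, parts)]

# First split: 10 kg to 11 kg into 10 parts gives 10.1, 10.2, ... 10.9.
first_split = split_interval(10, 11, 10)

# Second split: 10.1 kg to 10.2 kg into 10 parts gives 10.11, 10.12, ... 10.19.
second_split = split_interval(first_split[0], first_split[1], 10)

# Every value produced is still a valid weight, however often we repeat this.
print([float(w) for w in first_split])
print([float(w) for w in second_split])
```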

## Key Callouts

So now we know how to differentiate between discrete and continuous data types. Below are a few pointers that we should keep in mind while identifying the data types.

- Always remember, you do not identify the data type of a data set just by looking at the values in it. Identify it by looking at what the data is about.
- For example, a weight data set might sometimes have data points recorded as integer values only. Still, weight data is continuous by nature and you should treat it as continuous.
- Always prefer continuous data sets over discrete data sets. They are more precise and give a lot more information than discrete data.
- Qualitative data is always discrete in nature.
- Data types should be identified based on the data that you collate / collect, not on the basis of a representative data point. More on this in my post on Project metric data type.
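The first two pointers can be illustrated with a short sketch. The lookup table `DATA_TYPE_BY_METRIC`, the metric names and the helper `looks_discrete` are all hypothetical, introduced only to contrast judging by values with judging by what the data is about.

```python
# Hypothetical lookup: the data type follows from what the metric measures.
DATA_TYPE_BY_METRIC = {
    "weight_kg": "continuous",
    "defects_per_unit": "discrete (count)",
    "customer_rating": "discrete (ordinal)",
    "inspection_passed": "discrete (binary)",
}

def looks_discrete(values):
    """Naive check based on values only: are they all whole numbers?"""
    return all(float(value).is_integer() for value in values)

# Weights that happen to be recorded as whole kilograms.
weights_kg = [10, 11, 12, 10]

# Judging by values alone is misleading here...
naive_guess = "discrete" if looks_discrete(weights_kg) else "continuous"

# ...whereas judging by what the data is about gives the right answer.
actual_type = DATA_TYPE_BY_METRIC["weight_kg"]
print(naive_guess, actual_type)
```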