Six Sigma relies on statistics to help you make sense of all the data that you collect when completing your Six Sigma project. Today, we go over some basic Six Sigma statistics in QuikSigma to help you make sense of everything.
In this session, you will learn to use the descriptive statistics function, and you’ll notice I have opened that for you here. It’s under the toolbox. For this demonstration, I’m going up here to Tools, and I’m going to open the data manipulator and generate some random data. I’ve already selected that I want 12 rows of data with a mean of 100 and a standard deviation of 15. More properly, I want 12 data randomly drawn from a population like that. So let’s take the data that we have generated, copy it, and paste it over here and see what happens. You can think of descriptive statistics as a sort of compression algorithm for data. It’s not lossless. You do lose some information, but it’s the way that you can take a big, nasty, ugly collection of data and talk about it meaningfully with just a few numbers.
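As a rough sketch of what that data generation step amounts to outside QuikSigma, the snippet below draws 12 values at random from a normal population with mean 100 and standard deviation 15, using only the Python standard library. The function name `draw_sample` and the seed value are assumptions made for illustration; nothing here reflects how the data manipulator is actually implemented.

```python
import random
import statistics


def draw_sample(n=12, mean=100.0, sd=15.0, seed=None):
    """Draw n values at random from a normal population (hypothetical helper)."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return [rng.gauss(mean, sd) for _ in range(n)]


# The seed is arbitrary; change or omit it to get a different sample.
sample = draw_sample(seed=1)

print(len(sample))
# The sample mean and standard deviation will be near, not equal to, 100 and 15.
print(round(statistics.mean(sample), 2), round(statistics.stdev(sample), 2))
```

With only 12 values, the sample statistics can land noticeably away from the population values, which is the same effect seen in the demonstration.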
Okay, three things that we look for as we do our descriptive statistics. Number one, where’s the middle of the distribution? Number two, how spread out are the data around the middle? Number three, what’s the shape of the distribution? So let me take those in a little different order. The first thing we can look at is that, according to the Anderson-Darling normality test, we cannot detect any non-normality. So we are justified in assuming that the data are normally distributed, and that’s handy. Here’s a histogram of the data. Now, sometimes people are a little shocked when they see histograms of small data sets that don’t look very normal, but truly, if you enlarge the size of the sample, then you’ll see that you get something closer and closer to a normal distribution. Measures of the middle are right up and down this column, and in the middle is the mean. The mean is 96.63. The median, the 50th percentile, where we have as much data below as we do above, is 97.12, and the trimmed mean is 97.5. The trimmed mean is calculated after you throw out outliers, and if that’s very different from the mean, it means that your data are probably a little bit skewed. So the next issue is how spread out are the data? That can be measured by standard deviation or by range. On larger groups of data, we prefer standard deviation. On small subgroups, range works very well. This is the one that is probably center stage for you.
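The measures of the middle and of spread described here can be sketched in a few lines of Python. The data below are hypothetical, not the values from the demonstration, and the trimming rule (drop the single lowest and highest value) is an assumption; the transcript does not say how QuikSigma decides what to trim.

```python
import statistics


def trimmed_mean(values, trim_each_end=1):
    """Mean after dropping the lowest and highest values.

    Assumption: trim a fixed count from each end. QuikSigma may use a
    different rule.
    """
    ordered = sorted(values)
    kept = ordered[trim_each_end:len(ordered) - trim_each_end]
    return statistics.mean(kept)


def describe(values):
    """Collect a few measures of the middle and of spread."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "trimmed_mean": trimmed_mean(values),
        "stdev": statistics.stdev(values),      # sample standard deviation
        "range": max(values) - min(values),
    }


# Hypothetical data with one high value pulling the mean upward.
data = [82.0, 91.5, 95.0, 97.0, 98.5, 101.0, 104.0, 130.0]
summary = describe(data)

for name, value in summary.items():
    print(name, round(value, 2))
```

In this made-up sample the mean sits about two units above both the median and the trimmed mean, which is the kind of gap that hints at skew.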
This is Sigma sum of squares, 15.85. This is the usual way that standard deviation is calculated. This, on the other hand, is standard deviation calculated by the moving range method. Now, we use that in process behavior charts, or control charts, and we use that in capability. Okay, let’s see, where next? Oh, confidence intervals. Around the mean we have this ninety-five percent confidence interval. It says that if we drew the same size sample again and again from this population, ninety-five percent of the time we would get a mean between 86.56 and 106.7. Similarly, we’ve got ninety-five percent confidence intervals on our standard deviation. That could be anywhere between 11.23 and 26.91. Okay, what have we left out? Kurtosis. If you’re center-heavy or tail-heavy in your distribution, you may see something in kurtosis. Skew simply means that the distribution has kind of been pushed to one side. Here are our nonparametric measures, min and max. The difference between those two is the range, of course. First quartile: 25 percent of the data are below that. Median: 50 percent are below. Third quartile: 75 percent are below. The only things left, I guess, are to point out that we had 12 data, and then these relate to my Anderson-Darling test up here. So it’s pretty simple, and it gives you a pretty complete rundown on what the data look like in a condensed version.
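The confidence intervals quoted above can be checked by hand from the three summary numbers in the demonstration (12 data, mean 96.63, standard deviation 15.85). The sketch below assumes the usual t-based interval for the mean and chi-square-based interval for the standard deviation, with critical values taken from standard tables for 11 degrees of freedom; the transcript does not say which formulas QuikSigma uses. The moving range function likewise assumes the common control-chart rule of dividing the average moving range by 1.128, and its input data are hypothetical.

```python
import math

# Summary values quoted in the demonstration.
n = 12
mean = 96.63
s = 15.85
df = n - 1

# Standard table values for 11 degrees of freedom, 95% two-sided.
T_975 = 2.201      # Student's t, 0.975 quantile
CHI2_975 = 21.920  # chi-square, 0.975 quantile
CHI2_025 = 3.816   # chi-square, 0.025 quantile


def mean_ci(mean, s, n, t_crit):
    """t-based confidence interval for the mean."""
    half_width = t_crit * s / math.sqrt(n)
    return mean - half_width, mean + half_width


def sd_ci(s, df, chi2_upper, chi2_lower):
    """Chi-square-based confidence interval for the standard deviation."""
    return s * math.sqrt(df / chi2_upper), s * math.sqrt(df / chi2_lower)


def moving_range_sigma(values, d2=1.128):
    """Estimate sigma as the average moving range divided by d2.

    Assumption: moving ranges of two consecutive points, for which the
    bias-correction constant d2 is 1.128.
    """
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(moving_ranges) / len(moving_ranges) / d2


mean_low, mean_high = mean_ci(mean, s, n, T_975)
sd_low, sd_high = sd_ci(s, df, CHI2_975, CHI2_025)

print(round(mean_low, 2), round(mean_high, 2))
print(round(sd_low, 2), round(sd_high, 2))

# Hypothetical time-ordered data; moving ranges are 2, 1, 3.
print(round(moving_range_sigma([10.0, 12.0, 11.0, 14.0]), 3))
```

Under these assumptions the arithmetic lands on the same limits shown on screen, 86.56 to 106.7 for the mean and 11.23 to 26.91 for the standard deviation. Note how wide the interval on the standard deviation is with only 12 data.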