Regarding the value of 4.7: I’ll talk about two things here. One is appropriate data, the other is appropriate use of statistics.
DATA - What you’re missing is that standard deviation is an almost worthless metric unless you meet the following criteria…
1) Sufficiently large sample size. How large is large enough varies a lot, depending on many factors.
2) An average value “appropriately” close to both the max and min values.
i.e. the average for ALL values should be “somewhat near” the average of just the max and min.
3) The measurements should roughly approximate a Gaussian (or similar) distribution. You know this as a “bell curve”.
So, like almost all statistics, S.D. is an “ideal” calculation. The closer your data are to approximating this ideal, the more relevant the notion of a standard deviation becomes. But if your data violate one or more of the previous three rules by too wide a margin, then S.D. is just some number which doesn’t mean anything. You can calculate it, but what for?
So go back to the example and look. I’m assuming esac chose his data randomly, so we can ruthlessly bash the results. :-)
1) There are only 5 data points, which is so few as to produce meaningless, almost random results. He just wanted to show you how to do the calculations.
2) The distance from max to average is 1.8, the distance from min to average is 1.0. That’s an awfully lopsided spread, so rule two is already in trouble. It’s also a hint that we’re violating rule three…
3) Look at the data again. How do you construct a Gaussian distribution from that? You can’t. You can’t even get close. There are other curves that you can use, but they have different and more complicated distribution formulas. They’re a perfect pain in the ass to work with, also.
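If you want to run the rule-two check on your own numbers, here is a minimal sketch. The function name center_check and the sample values are made up for illustration (this is NOT esac's data). It just compares the average of all values with the midpoint of the max and min, and reports the two distances discussed above.

```python
from statistics import mean

def center_check(values):
    """Compare the mean of all values with the midpoint of min and max."""
    lo, hi = min(values), max(values)
    avg = mean(values)
    return {
        "mean": avg,
        "midrange": (lo + hi) / 2,    # average of just the max and min
        "max_minus_mean": hi - avg,   # distance from max to average
        "mean_minus_min": avg - lo,   # distance from min to average
    }

# Made-up, deliberately lopsided sample (hypothetical, not esac's data).
sample = [1, 2, 3, 4, 10]
print(center_check(sample))
```

When the two distances are far apart, or the mean sits well away from the midrange, that is the same warning sign as in point 2 above. How far is "too far" is a judgment call; the sketch only does the arithmetic.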
STATISTICAL USAGE - One other point to keep in mind is that esac could have given us a much larger and better dataset and still come up with the same “problem”: namely, that 4.7 would be smaller than any data point we have.
This is perfectly reasonable. The purpose of a standard deviation is either to validate a theory (by showing that the actual S.D. reproduces our expected outcome) OR to provide a tool for prediction. That second part is important.
So maybe 4.7 is lower than any of our actual data. That’s ok because what it really means is that “if we select one more person at random and take their measurement, we expect them to fall within a certain range.”
Would it surprise you to find someone with a penis less than 5.1 inches? Of course not. Just because you haven’t seen one yet doesn’t mean you don’t ever expect to see one!
Example: Your dad is 6’2”, you are 6’0”, and you have a brother who is 6’3”. If another brother is born, would you assume that he must grow to be between 6’0” and 6’3”?
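To put rough numbers on the height example, here is a small sketch. It assumes the sample standard deviation (n - 1 in the denominator), which may or may not be the formula esac used, and the helper name expected_range is my own. Three data points are far too few to trust (see rule one above), so treat this purely as a demonstration of the arithmetic.

```python
from statistics import mean, stdev

def expected_range(values, k=1):
    """Return (low, high) = mean minus/plus k sample standard deviations."""
    avg = mean(values)
    sd = stdev(values)  # sample S.D. (n - 1 denominator); an assumption
    return avg - k * sd, avg + k * sd

heights_in = [74, 72, 75]  # 6'2", 6'0", 6'3" converted to inches

for k in (1, 2):
    low, high = expected_range(heights_in, k)
    print(f"{k} S.D.: {low:.1f} to {high:.1f} inches")
```

At 2 S.D. the range comes out to roughly 70.6 to 76.7 inches, which is wider than anything actually observed in the family (72 to 75). That is the point: the range describes what you might expect from the next measurement, not what you have already seen.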
And finally, note that esac only showed 1 S.D. For a roughly Gaussian population, this is where we expect to find about 68% of it. He could have extended his estimate to 2 or 3 S.D., which is where you expect to find ~95% and ~99.7% of a population, respectively.
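If you don't want to memorize those percentages, they fall straight out of the Gaussian formula. A quick sketch, assuming an ideal bell curve (real data will only approximate this); the function name fraction_within is just for illustration:

```python
from math import erf, sqrt

def fraction_within(k):
    """Fraction of an ideal Gaussian population within k S.D. of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"{k} S.D.: {fraction_within(k):.1%}")
# prints:
# 1 S.D.: 68.3%
# 2 S.D.: 95.4%
# 3 S.D.: 99.7%
```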
Hope this helps,
busted bus