Originally Posted by huff
That thread is awesome Para. Your decorous yet strident battle with Wad is a classic confrontation.
Para gave me dubious praise for "rhetoric," modestly avoiding his own employment of rhetorical devices. He had stated: "In short, as counterintuitive as it may be at first glance, a statistic based on a sample size of 100 is equally valid regardless of whether the population from which the sample is drawn is 100,000 or 100 trillion."
Taking him at his word, the sample size is equally valid whether it represents only 1/1,000th or even 1/1,000,000,000,000th of the total. Since there are nowhere near 1 trillion people on earth, let’s cut it to "only" 1/1,000,000,000th. There are about 6.5 billion people on earth - maybe 3.25 billion men. According to Para’s analogy, we need only measure the penises of 3 men on the planet and we will draw as valid an average as if we measured 3,250,000 men. A rhetorical reach I wouldn’t dare attempt. :)
Furthermore, Para said: "What I have described here is a foundational part of statistics, and if you have some new insight that calls it — or its application to any domain, including the extremely simple task of inferring a population mean for penis size — into question, then I invite you to enter the field and turn the entire discipline on its head. I did not, and in principle could not, overstate the centrality of the central limit theorem to modern medical research and sundry other domains of inquiry."
Far be it from me to "enter the field and turn the entire discipline on its head." I’m simply wondering why Para doesn’t take his statistical expertise to the medical research institutions and save them countless millions of dollars in research expenses by convincing them to study only 1/1,000,000,000,000th of the total population. Again, since the earth’s population is far less than 1 trillion, they could limit their "studies" to just 1 test subject.
What a boon to humanity that would be. :)
http://www.free-definition.com … stribution.html
The central limit theorem
The normal distribution has the very important property that under certain conditions, the distribution of a sum of a large number of independent variables is approximately normal. This is the so-called central limit theorem.
The practical importance of the central limit theorem is that the normal distribution can be used as an approximation to some other distributions.
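To make this concrete, here is a minimal Python sketch (the uniform summands, sample counts, and seed are arbitrary illustrative choices): summing even 50 independent uniform variables already produces a distribution very close to normal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sum 50 independent Uniform(0, 1) draws, repeated 100,000 times.
n, trials = 50, 100_000
sums = rng.uniform(0.0, 1.0, size=(trials, n)).sum(axis=1)

# The central limit theorem predicts approximate normality with
# mean n/2 and variance n/12 (each uniform has mean 1/2, variance 1/12).
print(sums.mean(), n / 2)    # sample mean vs. theoretical mean (25.0)
print(sums.var(), n / 12)    # sample variance vs. theoretical variance (~4.17)
```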
A binomial distribution with parameters n and p is approximately normal for large n and p not too close to 1 or 0 (some books recommend using this approximation only if np and n(1 − p) are both at least 5; in this case, a continuity correction should be applied). The approximating normal distribution has mean μ = np and standard deviation σ = √(np(1 − p)).
A Poisson distribution with parameter λ is approximately normal for large λ. The approximating normal distribution has mean μ = λ and standard deviation σ = √λ.
Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in the tails of the distribution.
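Both approximations, and the weaker accuracy in the tails, can be checked numerically; the following SciPy sketch uses arbitrary illustrative parameters.

```python
import numpy as np
from scipy import stats

# Binomial(n=100, p=0.3): normal approximation with mu = np, sigma = sqrt(np(1-p)).
n, p = 100, 0.3
mu, sigma = n * p, np.sqrt(n * p * (1 - p))
exact = stats.binom.cdf(25, n, p)
approx = stats.norm.cdf(25.5, mu, sigma)        # the 0.5 is the continuity correction
print(exact, approx)                            # close near the center of mass

# Poisson(lambda=40): normal approximation with mu = lambda, sigma = sqrt(lambda).
lam = 40
exact_tail = stats.poisson.sf(60, lam)          # exact P(X > 60), far in the tail
approx_tail = stats.norm.sf(60.5, lam, np.sqrt(lam))
print(exact_tail, approx_tail)                  # relative error grows in the tail
```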
Infinite divisibility
The normal distributions are infinitely divisible probability distributions.
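Concretely, for any n, a N(μ, σ²) variable equals in distribution a sum of n independent N(μ/n, σ²/n) pieces; a small NumPy sketch with arbitrary parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

# For any n, N(mu, sigma^2) equals in distribution the sum of
# n independent N(mu/n, sigma^2/n) variables.
mu, sigma, n, trials = 5.0, 2.0, 10, 200_000
parts = rng.normal(mu / n, sigma / np.sqrt(n), size=(trials, n))
sums = parts.sum(axis=1)
print(sums.mean(), sums.std())   # ~5.0 and ~2.0, i.e. N(5, 2^2) recovered
```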
Occurrence
Approximately normal distributions occur in many situations, as a result of the central limit theorem. When there is reason to suspect the presence of a large number of small effects acting additively and independently, it is reasonable to assume that observations will be normal. There are statistical methods to empirically test that assumption.
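One standard such method is the D'Agostino-Pearson test; below is a minimal SciPy sketch on simulated data built from many small additive effects (the effect distribution is an arbitrary choice):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Observations built from many small additive, independent effects.
observations = rng.uniform(-1.0, 1.0, size=(5_000, 30)).sum(axis=1)

# D'Agostino-Pearson test: a large p-value means the data are
# consistent with the normality assumption.
statistic, pvalue = stats.normaltest(observations)
print(pvalue)
```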
Effects can also act as multiplicative (rather than additive) modifications. In that case, the assumption of normality is not justified, and it is the logarithm of the variable of interest that is normally distributed. The distribution of the directly observed variable is then called log-normal.
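A simulated sketch of that distinction (the factor range and counts are arbitrary): the variable itself fails a normality test, while its logarithm passes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Many small multiplicative effects: each step scales the variable
# by a random factor close to 1.
factors = rng.uniform(0.9, 1.1, size=(5_000, 200))
values = factors.prod(axis=1)

# The values themselves are skewed (not normal), but their logarithm
# is a sum of many small independent terms, hence approximately normal.
print(stats.normaltest(values).pvalue)          # tiny: normality rejected
print(stats.normaltest(np.log(values)).pvalue)  # large: log-normal fits
```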
Finally, if there is a single external influence which has a large effect on the variable under consideration, the assumption of normality is not justified either. This is true even if, when the external variable is held constant, the resulting marginal distributions are indeed normal. The full distribution will be a superposition of normal variables, which is not in general normal. This is related to the theory of errors (see below).
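A minimal sketch of such a superposition, with two invented sub-populations for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Two sub-populations, each normal on its own (compare the blood-pressure
# example further down), pooled into a single sample.
group_a = rng.normal(120.0, 10.0, size=5_000)
group_b = rng.normal(140.0, 10.0, size=5_000)
pooled = np.concatenate([group_a, group_b])

print(stats.normaltest(group_a).pvalue)  # each group alone: consistent with normal
print(stats.normaltest(pooled).pvalue)   # superposition: normality rejected
```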
To summarize, here’s a list of situations where approximate normality is sometimes assumed. For a fuller discussion, see below.
In counting problems (so the central limit theorem includes a discrete-to-continuum approximation) where reproductive random variables are involved, such as
Binomial random variables, associated to yes/no questions;
Poisson random variables, associated to rare events;
In physiological measurements of biological specimens:
The logarithm of measures of size of living tissue (length, height, skin area, weight);
The length of inert appendages (hair, claws, nails, teeth) of biological specimens, in the direction of growth; presumably the thickness of tree bark also falls under this category;
Other physiological measures may be normally distributed, but there is no reason to expect that a priori;
Measurement errors are assumed to be normally distributed, and any deviation from normality must be explained;
Financial variables
The logarithm of interest rates, exchange rates, and inflation; these variables behave like compound interest, not like simple interest, and so are multiplicative;
Stock-market indices are supposed to be multiplicative too, but some researchers claim that they are log-Lévy variables instead of lognormal;
Other financial variables may be normally distributed, but there is no reason to expect that a priori;
Light intensity
The intensity of laser light is normally distributed;
Thermal light has a Bose-Einstein distribution on very short time scales, and a normal distribution on longer timescales due to the central limit theorem.
Of relevance to biology and economics is the fact that complex systems tend to display power laws rather than normality.
Physical characteristics of biological specimens
The overwhelming biological evidence is that bulk growth processes of living tissue proceed by multiplicative, not additive, increments, and that therefore measures of body size should at most follow a lognormal rather than a normal distribution. Despite common claims of normality, the sizes of plants and animals are approximately lognormal. The evidence, and an explanation based on models of growth, were first published in the classic book
Huxley, Julian: Problems of Relative Growth (1932)
Differences in size due to sexual dimorphism, or other polymorphisms like the worker/soldier/queen division in social insects, further make the joint distribution of sizes deviate from lognormality.
The assumption that linear size of biological specimens is normal leads to a non-normal distribution of weight (since weight, like volume, scales roughly as the 3rd power of length, and Gaussian distributions are only preserved by linear transformations), and conversely assuming that weight is normal leads to non-normal lengths. This is a problem, because there is no a priori reason why one of length or body mass, but not the other, should be normally distributed. Lognormal distributions, on the other hand, are preserved by powers, so the "problem" goes away if lognormality is assumed.
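A quick simulation of that asymmetry (all distributions and parameters here are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Lognormal length: weight ~ length^3 is still lognormal,
# because log(weight) = 3 * log(length) remains normal.
log_length = rng.normal(0.0, 0.1, size=10_000)
weight_lognormal = np.exp(log_length) ** 3
print(stats.normaltest(np.log(weight_lognormal)).pvalue)  # large: lognormality kept

# Normal length: weight ~ length^3 is visibly non-normal.
length = rng.normal(10.0, 2.0, size=10_000)
weight_normal = length ** 3
print(stats.normaltest(weight_normal).pvalue)             # tiny: normality lost
```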
Blood pressure of adult humans is supposed to be normally distributed, but only after separating males and females into different populations (each of which is normally distributed).
The length of inert appendages such as hair, nails, teeth, claws and shells is expected to be normally distributed if measured in the direction of growth. This is because the growth of inert appendages depends on the size of the root, and not on the length of the appendage, and so proceeds by additive increments. Hence, we have an example of a sum of very many small lognormal increments approaching a normal distribution. Another plausible example is the width of tree trunks, where a new thin ring is produced every year whose width is affected by a large number of factors.
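A toy version of the tree-ring picture (the lognormal ring widths and the year/tree counts are assumptions made for illustration): the skewness of the sum shrinks by roughly a factor of √80 relative to a single ring.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Tree-trunk model: each year adds one ring whose width is lognormal
# (itself a product of many small growth factors). The trunk radius is
# the *sum* of the rings, so the CLT pushes it toward normality.
years, trees = 80, 5_000
rings = rng.lognormal(mean=-1.0, sigma=0.4, size=(trees, years))
radius = rings.sum(axis=1)

print(stats.skew(rings[:, 0]))  # a single ring: clearly right-skewed (~1.3)
print(stats.skew(radius))       # sum of 80 rings: skewness shrinks toward 0 (~0.15)
```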