The Central Limit Theorem (CLT)

The Central Limit Theorem (CLT) is an asymptotic result about sums of random variables. It is very useful for obtaining the distribution of a sum of individuals (e.g. sums of animal weights, yields, scraps). It also explains why so many distributions look approximately Normal. We won't look at the derivation, just see some examples and its use.

The CLT result

The sum S of n independent random variables x(i) (where n is large), all of which have the same distribution, will asymptotically approach a Normal distribution with known mean and standard deviation:

$$S = \sum_{i=1}^{n} x_i \approx \text{Normal}\left(n m, \sqrt{n}\, s\right) \qquad (1)$$

where m, s are the mean and standard deviation of the distribution from which the n samples are drawn. This distribution of the sum of random variables is implemented in ModelRisk with the VoseCLTSum function.

Examples

Imagine that the weight of a nail produced by some company has a mean of 27.4 g and a standard deviation of 1.3 g. What will be the weight of a box of 100 nails? The answer is approximately the following Normal distribution: Normal(100*27.4, SQRT(100)*1.3) = Normal(2740, 13) grams:

=VoseNormal(2740, 13)
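The nail example can be checked by simulation. The sketch below is only illustrative: the text gives the mean and standard deviation of a nail's weight but not its distribution, so a Uniform distribution with those two moments is assumed here, and the names `clt_sum_params` and `simulate_box_weights` are hypothetical helpers, not ModelRisk functions.

```python
import math
import random
import statistics

MEAN_G, SD_G, N_NAILS = 27.4, 1.3, 100


def clt_sum_params(n, mean, sd):
    """Mean and standard deviation of the Normal approximation to a sum of n iid variables."""
    return n * mean, math.sqrt(n) * sd


def simulate_box_weights(n_boxes, rng):
    """Simulate box weights, ASSUMING each nail weight is Uniform with the stated mean and sd."""
    half_width = SD_G * math.sqrt(3)  # a Uniform of half-width h has sd h / sqrt(3)
    low, high = MEAN_G - half_width, MEAN_G + half_width
    return [sum(rng.uniform(low, high) for _ in range(N_NAILS)) for _ in range(n_boxes)]


rng = random.Random(1)  # fixed seed so the run is repeatable
boxes = simulate_box_weights(5000, rng)
mu, sigma = clt_sum_params(N_NAILS, MEAN_G, SD_G)

print(f"CLT:       mean = {mu:.1f} g, sd = {sigma:.2f} g")
print(f"Simulated: mean = {statistics.fmean(boxes):.1f} g, sd = {statistics.stdev(boxes):.2f} g")
```

Swapping the Uniform for any other distribution with the same mean and standard deviation should give box weights with much the same mean and spread, which is the point of the CLT result.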

This CLT result turns out to be very important in risk analysis. Many distributions are the sum of a number of identical random variables, and so as the number of terms in that sum gets larger, the distribution tends to look like a Normal distribution. For example:

Gamma(a,b) is the sum of a independent Expon(b) distributions, so as a gets larger, the Gamma distribution looks progressively more like a Normal distribution. An Exponential distribution has a mean and standard deviation of b, so we have:

$$\text{Gamma}(a, b) \to \text{Normal}\left(a b, \sqrt{a}\, b\right) \quad \text{as } a \to \infty$$
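A quick way to see the Gamma distribution approaching the Normal is to watch its skewness shrink as a grows. The sketch below assumes that Gamma(a,b) uses a as the shape and b as the scale (so an Expon(b) has mean b), which matches Python's `random.gammavariate(alpha, beta)`; the Normal approximation then has mean a*b and standard deviation sqrt(a)*b. The helper names are hypothetical.

```python
import math
import random
import statistics


def gamma_normal_params(a, b):
    """Mean and sd of the Normal approximating Gamma(shape a, scale b)."""
    return a * b, math.sqrt(a) * b


def sample_skewness(values):
    """Skewness of a sample: the mean cubed standardised deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


rng = random.Random(2)  # fixed seed so the run is repeatable
b = 2.0
for a in (1, 10, 100):
    draws = [rng.gammavariate(a, b) for _ in range(20000)]
    mean, sd = gamma_normal_params(a, b)
    print(
        f"a={a:>3}: approx mean={mean:.1f}, sd={sd:.2f} | "
        f"sample mean={statistics.fmean(draws):.1f}, sd={statistics.stdev(draws):.2f}, "
        f"skewness={sample_skewness(draws):.2f}"
    )
```

The theoretical skewness of a Gamma distribution is 2/sqrt(a), so it falls towards the Normal value of 0 only slowly, which is why the Exponential needs a comparatively large n.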

Other examples are discussed in the section on approximating one distribution by another.

How large does n have to be for the sum to be distributed Normally?

Distribution of individual variable	Sufficient n
Uniform	12 (try it: an old way of generating Normal distributions)
Symmetric Triangle	6 (because U(a,b)+U(a,b) = T(2a,a+b,2b))
Normal	1 (the sum is exactly Normal)
Skewed	30+ (30 lots of Poisson(2) = Poisson(60))
Exponential	50+ (check with Gamma(a,b) = sum of a Expon(b)'s)
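The Uniform row of the table refers to an old trick: a Uniform(0,1) has mean 1/2 and variance 1/12, so the sum of 12 of them has mean 6 and variance 1, and subtracting 6 gives an approximate Normal(0,1) value. A minimal sketch (the function name is hypothetical):

```python
import random
import statistics


def approx_standard_normal(rng):
    """Sum of 12 Uniform(0,1) draws minus 6: mean 0, variance 12 * (1/12) = 1."""
    return sum(rng.random() for _ in range(12)) - 6.0


rng = random.Random(3)  # fixed seed so the run is repeatable
draws = [approx_standard_normal(rng) for _ in range(50000)]
share_within_1sd = sum(abs(z) <= 1 for z in draws) / len(draws)

print(f"mean = {statistics.fmean(draws):.3f}, sd = {statistics.stdev(draws):.3f}")
print(f"share within 1 sd = {share_within_1sd:.3f} (Normal value is about 0.683)")
```

Note that values generated this way can never fall outside [-6, 6], so the extreme tails are truncated compared with a true Normal distribution.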

A nice way to see the CLT at work is to use the VoseAggregateMoments function with a fixed number n as the frequency argument. Try inserting

{=VoseAggregateMoments(n,VoseTriangleObject(0,1,4))}

using larger and larger values of n. As n increases, the skewness and kurtosis (which indicate the shape of the aggregate distribution) approach the Normal values of 0 and 3 respectively.
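The convergence of the skewness and kurtosis can also be worked out directly. For a sum of n independent, identical variables the skewness scales as 1/sqrt(n) and the excess kurtosis (kurtosis minus 3) as 1/n. The sketch below applies this to a Triangle(0,1,4) using the standard closed-form Triangle moments; it is an independent calculation with hypothetical function names, not a description of how VoseAggregateMoments is implemented.

```python
import math


def triangle_moments(a, c, b):
    """Mean, variance, skewness and kurtosis of a Triangle(min a, mode c, max b)."""
    q = a * a + b * b + c * c - a * b - a * c - b * c
    mean = (a + b + c) / 3
    variance = q / 18
    skewness = math.sqrt(2) * (a + b - 2 * c) * (2 * a - b - c) * (a - 2 * b + c) / (5 * q ** 1.5)
    kurtosis = 2.4  # the same for every Triangle distribution
    return mean, variance, skewness, kurtosis


def sum_moments(n, mean, variance, skewness, kurtosis):
    """Moments of the sum of n iid variables with the given individual moments."""
    return n * mean, n * variance, skewness / math.sqrt(n), 3 + (kurtosis - 3) / n


individual = triangle_moments(0, 1, 4)
for n in (1, 5, 30, 100, 1000):
    mean, variance, skewness, kurtosis = sum_moments(n, *individual)
    print(f"n={n:>4}: mean={mean:8.2f}  variance={variance:7.2f}  "
          f"skewness={skewness:.4f}  kurtosis={kurtosis:.4f}")
```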

Other related results

The average of a large number of independent, identical distributions

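Dividing both sides of Equation (1) by n, the average x̄ of n variables drawn independently from the same distribution is given by:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \approx \text{Normal}\left(m, \frac{s}{\sqrt{n}}\right) \qquad (2)$$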

Note: Equation (2) follows because both parameters of the Normal distribution, the mean and the standard deviation, are in the same units as the variable itself, so dividing the variable by n divides both parameters by n. Be warned that for most distributions one cannot simply divide the distribution parameters of a variable X by n to get the distribution of X/n.
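As an illustration of the result for the average, the sketch below reuses the nail example (mean 27.4 g, standard deviation 1.3 g): the average weight of 100 nails is approximately Normal with mean m and standard deviation s/sqrt(n). The threshold of 27.6 g and the function name are made up for the example.

```python
import math
from statistics import NormalDist


def clt_mean_dist(n, mean, sd):
    """Normal approximation to the average of n iid variables: Normal(mean, sd / sqrt(n))."""
    return NormalDist(mean, sd / math.sqrt(n))


average_weight = clt_mean_dist(100, 27.4, 1.3)
prob_above = 1 - average_weight.cdf(27.6)  # hypothetical threshold of 27.6 g

print(f"sd of the average = {average_weight.stdev:.3f} g")
print(f"P(average > 27.6 g) = {prob_above:.3f}")
```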

The product of a large number of independent, identical distributions

CLT can also be applied where a large number of identical random variables are being multiplied together, for the following reason:

Let P be the product of a large number of positive random variables x(i); i = 1 to n, i.e.:

$$P = \prod_{i=1}^{n} x_i$$

Taking logs of both sides, we get:

$$\ln P = \sum_{i=1}^{n} \ln x_i$$

The right-hand side is the sum of a large number of random variables, and will therefore tend to a Normal distribution. Thus, from the definition of a Lognormal distribution, P will be approximately Lognormally distributed.

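A neat related result is that if all x(i) are independent and Lognormally distributed, their product is exactly Lognormally distributed for any n, because each ln x(i) is Normal and a sum of independent Normal variables is itself Normal.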
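The argument for products can be checked by simulation. In the sketch below the individual factors are assumed to be Uniform(0.5, 1.5), purely for illustration; the log of the product of 50 such factors should then be close to Normal with mean n*E[ln x] and standard deviation sqrt(n)*sd[ln x], while the product itself is strongly right-skewed.

```python
import math
import random
import statistics

N_FACTORS = 50
LOW, HIGH = 0.5, 1.5  # assumed range of each positive factor


def sample_skewness(values):
    """Skewness of a sample: the mean cubed standardised deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


rng = random.Random(6)  # fixed seed so the run is repeatable

# Estimate the mean and sd of ln(x) for a single factor.
single_logs = [math.log(rng.uniform(LOW, HIGH)) for _ in range(100000)]
expected_mean = N_FACTORS * statistics.fmean(single_logs)
expected_sd = math.sqrt(N_FACTORS) * statistics.pstdev(single_logs)

# Simulate products of N_FACTORS factors and take their logs.
products = [math.prod(rng.uniform(LOW, HIGH) for _ in range(N_FACTORS)) for _ in range(10000)]
log_products = [math.log(p) for p in products]

print(f"ln P: expected mean={expected_mean:.2f}, sd={expected_sd:.2f}")
print(f"ln P: sample mean={statistics.fmean(log_products):.2f}, "
      f"sd={statistics.stdev(log_products):.2f}, skewness={sample_skewness(log_products):.2f}")
print(f"P:    skewness={sample_skewness(products):.2f}")
```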

Is CLT why the Normal distribution is so popular?

Many stochastic variables are neatly described as the sum or product, or a mixture, of a number of random variables. A very loose form of CLT says that if you add up a large number n of different random variables, and if none of those variables dominates the spread of the resultant distribution, the sum will eventually look Normal as n gets bigger. The same applies to multiplying different positive random variables, where the product tends to the Lognormal distribution. In fact, a Lognormal distribution will also look very similar to a Normal distribution if its mean is much larger than its standard deviation (see graph below), so perhaps it should not be too surprising that so many variables in nature seem to be somewhere between Lognormally and Normally distributed.
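One way to quantify how closely a Lognormal resembles a Normal is its skewness, which depends only on the ratio of standard deviation to mean (the coefficient of variation, cv): skewness = cv^3 + 3*cv. The sketch below assumes the Lognormal is described by its own mean and standard deviation; the function name and the example values are hypothetical.

```python
def lognormal_skewness(mean, sd):
    """Skewness of a Lognormal distribution with the given mean and standard deviation."""
    cv = sd / mean
    return cv ** 3 + 3 * cv


for mean, sd in ((10, 10), (10, 3), (10, 1), (10, 0.1)):
    print(f"mean={mean}, sd={sd}: skewness = {lognormal_skewness(mean, sd):.3f}")
```

As the mean becomes large relative to the standard deviation, the skewness falls towards the Normal value of 0.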

Some ModelRisk functions useful with the Central Limit Theorem and their Excel equivalents

Use	Function	Explanation
Normal probability	=VoseNormalProb(x,m,s,cumulative) which is equivalent to: =NORMDIST(x,m,s,cumulative)	The Normal density at x (cumulative = FALSE), or the cumulative probability P(variable ≤ x) (cumulative = TRUE)
Lognormal probability	=VoseLogNormalProb(x,m,s,cumulative) which has no Excel equivalent	The Lognormal density at x (cumulative = FALSE), or the cumulative probability P(variable ≤ x) (cumulative = TRUE)
Lognormal inverse probability	=VoseLogNormal(m,s, U) which has no Excel equivalent	The value x such that P(variable ≤ x) = U
Normal inverse probability	=VoseNormal(m,s, U) which is equivalent to: =NORMINV(U,m,s)	The value x such that P(variable ≤ x) = U
Unit Normal inverse probability	=VoseNormal(0,1, U) which is equivalent to: =NORMSINV(U)	The value z such that P(variable ≤ z) = U for a Normal(0,1) distribution (i.e. a z-test limit)
CLT Sum	=VoseCLTSum(N,m,s)	This function generates values from a Normal distribution approximating the sum of N independent identically distributed random variables following a distribution with mean m and standard deviation s.
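For readers working outside a spreadsheet, the calculations in the table can be approximated with Python's standard library. This is only a sketch: the function names are hypothetical, and it assumes that m and s in the Lognormal rows are the mean and standard deviation of the Lognormal variable itself (not of its logarithm). It is not a description of how the ModelRisk functions are implemented.

```python
import math
import random
from statistics import NormalDist


def normal_prob(x, m, s, cumulative):
    """Normal density at x, or P(variable <= x) when cumulative is True."""
    dist = NormalDist(m, s)
    return dist.cdf(x) if cumulative else dist.pdf(x)


def normal_inverse(u, m, s):
    """Value x such that P(variable <= x) = u for a Normal(m, s)."""
    return NormalDist(m, s).inv_cdf(u)


def _log_scale_dist(m, s):
    """Normal distribution of ln(X) when X is Lognormal with mean m and sd s (assumed)."""
    sigma_sq = math.log(1 + (s / m) ** 2)
    return NormalDist(math.log(m) - sigma_sq / 2, math.sqrt(sigma_sq))


def lognormal_prob(x, m, s, cumulative):
    """Lognormal density at x, or P(variable <= x) when cumulative is True."""
    log_dist = _log_scale_dist(m, s)
    return log_dist.cdf(math.log(x)) if cumulative else log_dist.pdf(math.log(x)) / x


def lognormal_inverse(u, m, s):
    """Value x such that P(variable <= x) = u for the Lognormal."""
    return math.exp(_log_scale_dist(m, s).inv_cdf(u))


def clt_sum_sample(n, m, s, rng):
    """One random value from Normal(n*m, sqrt(n)*s), the CLT approximation to a sum."""
    return rng.gauss(n * m, math.sqrt(n) * s)


rng = random.Random(8)  # fixed seed so the run is repeatable
print(f"P(Normal(0,1) <= 1.96)   = {normal_prob(1.96, 0, 1, True):.4f}")
print(f"97.5th pct of Normal(0,1) = {normal_inverse(0.975, 0, 1):.4f}")
print(f"Median of Lognormal(5,2)  = {lognormal_inverse(0.5, 5, 2):.4f}")
print(f"One box of 100 nails      = {clt_sum_sample(100, 27.4, 1.3, rng):.1f} g")
```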
