Chebyshev's inequality | Vose Software

# Chebyshev's inequality

If a data set has mean x̄ and standard deviation s, we are used to saying that about 68% of the data will lie between (x̄ - s) and (x̄ + s), about 95% between (x̄ - 2s) and (x̄ + 2s), and so on. However, that is only true when the data follow a Normal distribution. The same applies to a probability distribution. So when the data, or the probability distribution, are not Normally distributed, how can we interpret the standard deviation?

Chebyshev's rule (also transliterated as Tchebysheff's rule) applies to any data set, and to any probability distribution with a finite mean and standard deviation. It states:

 For any number k greater than 1, at least (1 - 1/k²) of the measurements will fall within k standard deviations of the mean.
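Written in symbols, the rule can be sketched as follows. The notation is an assumption added here for illustration: X is a random variable, and μ and σ are the mean and standard deviation of its distribution.

$$
P\left(|X - \mu| < k\sigma\right) \;\ge\; 1 - \frac{1}{k^{2}}, \qquad k > 1
$$

For a data set, read the probability as the fraction of observations, with x̄ and s in place of μ and σ.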

Substituting k = 1 (the limiting case), Chebyshev's rule says that at least 0% of the data or probability distribution lies within one standard deviation of the mean. Well, we already knew that! However, substituting k = 2 tells us that at least 75% of the data or distribution lies within 2 standard deviations of the mean. That is useful information because it applies to all distributions.
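To make the substitution concrete, here is a minimal Python sketch that evaluates the bound for a few values of k and checks it against a deliberately skewed sample. The function names, the exponential test sample and the random seed are illustrative assumptions, not part of the original article.

```python
import random
import statistics


def chebyshev_lower_bound(k: float) -> float:
    """Minimum fraction guaranteed to lie within k standard deviations of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k <= 1 the formula gives zero or a negative number, so the bound is just 0.
    return max(0.0, 1.0 - 1.0 / k**2)


def fraction_within(data, k: float) -> float:
    """Observed fraction of data strictly within k standard deviations of the mean."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)  # standard deviation of the data set itself
    inside = sum(1 for x in data if abs(x - mean) < k * sd)
    return inside / len(data)


# Hypothetical test data: a strongly skewed (exponential) sample.
random.seed(0)
sample = [random.expovariate(1.0) for _ in range(10_000)]

for k in (1, 1.5, 2, 3):
    bound = chebyshev_lower_bound(k)
    observed = fraction_within(sample, k)
    print(f"k = {k}: guaranteed at least {bound:.3f}, observed {observed:.3f}")
```

The observed fraction should never fall below the guaranteed one, whatever sample is substituted for the exponential data.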

This is a fairly conservative rule, in that if we know the distribution type we can specify a much higher percentage (e.g. about 95% within 2 standard deviations for a Normal distribution, compared with at least 75% from Chebyshev's rule), but it is certainly helpful in interpreting the standard deviation of a data set or probability distribution that is grossly non-Normal.
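The gap between the two figures can be tabulated with a short sketch. It assumes the standard result that a Normal variable lies within k standard deviations of its mean with probability erf(k/√2); the function names are hypothetical.

```python
import math


def chebyshev_lower_bound(k: float) -> float:
    """Distribution-free lower bound on the fraction within k standard deviations."""
    return max(0.0, 1.0 - 1.0 / k**2)


def normal_coverage(k: float) -> float:
    """Probability that a Normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(
        f"k = {k}: Chebyshev at least {chebyshev_lower_bound(k):.1%}, "
        f"Normal {normal_coverage(k):.1%}"
    )
```

Knowing that the distribution is Normal replaces a loose guarantee with an exact probability, which is why the distribution-free bound looks conservative by comparison.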

This figure compares Chebyshev's rule with the results for a few distributions. You can see that for any k, knowing the distribution type allows you to specify a much higher fraction of the distribution to be contained in the range mean ± k standard deviations. The bimodal distribution tested was: