See also: Histogram plots, Graphical descriptions of model outputs, Sturges' rule
A histogram plot is a natural way of representing a set of samples drawn from a univariate variable. It shows the range, central tendencies and shape of the distribution of the data. However, in order to make a histogram plot one must decide on the number of bars to use.
Most statistical software uses Sturges' rule, which says the data range should be split into k equally spaced classes, where
$$k = \lceil \log_2 n + 1 \rceil$$
Here n is the number of data values and $\lceil \cdot \rceil$ is the ceiling operator (meaning round up to the nearest integer at or above the calculated value). A more detailed explanation of this equation is given in the Sturges' rule topic.
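As a minimal sketch of Sturges' rule, k = ceil(log2(n) + 1), the following Python function computes the number of bars from the number of data values. The function name `sturges_bins` is chosen here for illustration and is not part of any particular package.

```python
import math


def sturges_bins(n: int) -> int:
    """Number of histogram classes k = ceil(log2(n) + 1) for n data values."""
    if n < 1:
        raise ValueError("need at least one data value")
    return math.ceil(math.log2(n) + 1)


if __name__ == "__main__":
    # The bar count grows only logarithmically with the sample size.
    for n in (100, 1000, 10000):
        print(n, sturges_bins(n))
```

For example, 100 data values give 8 bars and 1000 data values give 11, so the rule adds only a few bars even when the sample size grows tenfold.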
Sturges' rule is the most commonly applied rule in statistical software, even though it does not perform well when the data are skewed or otherwise non-normal. Two guidelines generally work better:
Scott (1979) proposed that the bar width w should be determined as follows:
$$w = 3.49\, s\, n^{-1/3}$$
where s is the sample standard deviation of the n data values. The equation is derived by minimizing the integrated mean squared error of the histogram as an estimate of the underlying probability density. The underlying theory requires knowledge of the distributional form of the data, which we rarely have, so the above equation assumes normality, although this turns out not to be very restrictive in practice.
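Scott's rule, w = 3.49 s n^(-1/3), can be sketched in a few lines of Python. The helper names `scott_width` and `scott_bar_count` are illustrative assumptions, and the bar count simply divides the data range by the width and rounds up.

```python
import math
import statistics


def scott_width(data) -> float:
    """Bar width w = 3.49 * s * n**(-1/3), with s the sample standard deviation."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data values")
    s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    return 3.49 * s * n ** (-1 / 3)


def scott_bar_count(data) -> int:
    """Number of bars needed to cover the data range at Scott's width."""
    w = scott_width(data)
    if w == 0:
        raise ValueError("all data values are identical")
    return max(1, math.ceil((max(data) - min(data)) / w))


if __name__ == "__main__":
    sample = [2.1, 3.4, 3.9, 4.4, 5.0, 5.2, 6.1, 7.8]
    print(scott_width(sample), scott_bar_count(sample))
```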
Freedman and Diaconis (1981) proposed that the bar width w should be determined as follows:
$$w = 2\, \mathrm{IQR}\, n^{-1/3}$$
where IQR is the sample inter-quartile range of the n data values, i.e. the difference between the 75th and 25th percentiles of the data. The rule was based on the goal of minimizing the sum of squared errors between the histogram bar height and the probability density of the underlying distribution, which gave the $n^{-1/3}$ part of the equation. The use of 2*IQR as a measure of spread was determined from their empirical experiments.
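The Freedman and Diaconis rule, w = 2 IQR n^(-1/3), can be sketched in the same way. Percentile definitions differ between packages, so the choice of `statistics.quantiles` with `method="inclusive"` below is an assumption made for illustration, as is the function name `freedman_diaconis_width`.

```python
import statistics


def freedman_diaconis_width(data) -> float:
    """Bar width w = 2 * IQR * n**(-1/3), with IQR = 75th - 25th percentile."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data values")
    # Quartiles by linear interpolation; other percentile conventions
    # give slightly different IQR values on small samples.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("inter-quartile range is zero; the rule gives no usable width")
    return 2 * iqr * n ** (-1 / 3)


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(freedman_diaconis_width(sample))
```

Because the IQR ignores the outer 50% of the data, this width is less affected by extreme tails than a width based on the standard deviation.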
In general, we have found that Scott's rule gives the most pleasing balance between detail and overview (Freedman and Diaconis' rule generally produces more bars). However, bar ranges calculated directly from either rule take awkward minimum and maximum values, which makes the histogram harder to read.
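To see why Freedman and Diaconis' rule tends to produce more bars, one can compare the two widths for normally distributed data, where the IQR equals about 1.349 standard deviations. The sketch below, with the illustrative function name `scott_to_fd_width_ratio`, computes the ratio of Scott's width (3.49 s n^(-1/3)) to the Freedman and Diaconis width (2 IQR n^(-1/3)) under that normality assumption.

```python
from statistics import NormalDist


def scott_to_fd_width_ratio() -> float:
    """Ratio of Scott's width to the Freedman-Diaconis width for normal data."""
    # For a normal distribution, IQR / sigma = 2 * z(0.75).
    iqr_over_sigma = 2 * NormalDist().inv_cdf(0.75)
    # The n**(-1/3) factor and sigma cancel in the ratio.
    return 3.49 / (2 * iqr_over_sigma)


if __name__ == "__main__":
    print(round(scott_to_fd_width_ratio(), 3))
```

The ratio is roughly 1.29, so for approximately normal data the Freedman and Diaconis rule yields bars about 23% narrower and hence around 29% more of them over the same range. For skewed or heavy-tailed data the ratio can differ considerably.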
In the histograms produced by Vose Data Analysis in ModelRisk, we round off to more intuitive values. The user can also override the settings, which is most useful when there are extreme tails in the data.
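The exact rounding used in ModelRisk is not described here. As a purely hypothetical illustration of the idea, the sketch below rounds a raw bar width to the nearest of 1, 2, 5 or 10 times a power of ten and aligns the first bar edge to a multiple of that width. The names `nice_width` and `nice_edges` and the 1-2-5 convention are assumptions, not the product's algorithm.

```python
import math


def nice_width(w: float) -> float:
    """Round a raw bar width to the nearest of 1, 2, 5 or 10 times a power of ten."""
    if w <= 0:
        raise ValueError("width must be positive")
    base = 10 ** math.floor(math.log10(w))
    fraction = w / base  # roughly in [1, 10)
    nearest = min((1, 2, 5, 10), key=lambda c: abs(c - fraction))
    return nearest * base


def nice_edges(lo: float, hi: float, w: float) -> list:
    """Bar edges covering [lo, hi], starting on a multiple of the rounded width."""
    width = nice_width(w)
    start = math.floor(lo / width) * width
    count = max(1, math.ceil((hi - start) / width))
    return [start + i * width for i in range(count + 1)]


if __name__ == "__main__":
    # A raw width of 4.27 over data spanning 1.3 to 17.8 (hypothetical values).
    print(nice_edges(1.3, 17.8, 4.27))
```

With these hypothetical inputs the raw width of 4.27 becomes 5 and the edges fall at 0, 5, 10, 15 and 20, which are easier to read than edges such as 1.3, 5.57, 9.84 and so on.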