How to read probability distribution equations

This topic helps you better understand how to read and use the equations that we provide to describe distributions. For each distribution featured in this help file we give the following equations:

Probability mass function (see definition) (for discrete distributions);
Probability density function (see definition) (for continuous distributions);
Cumulative distribution function (see definition) (where available);
Mean (see definition);
Mode (see definition);
Variance (see definition);
Skewness; and
Kurtosis.

There are many other distribution properties we could have provided (e.g. moment generating functions), but they are of little general use in risk analysis, and would leave you facing a daunting page of algebra to wade through.

Location, scale and shape parameters

In this help file we parameterise distributions to reflect the most common usage, and where there are two or more common parameterisations we use the one that is most useful to model risk. So we use, for example, mean and standard deviation a lot for consistency between distributions, or other parameters that most readily connect to the stochastic process that the distribution is most commonly applied to.

Another way to describe parameters is to categorise them as location, scale and shape, which can disconnect the parameters from their usual meaning, but is sometimes helpful in understanding how a distribution will change with variation of the parameter value.

A location parameter controls the position of the distribution on the x-axis. It should therefore appear in the same way in the equations for the mode and mean - two measures of location. So, if a location parameter increases by 3 units, then the mean and mode should increase by three units. For example, the mean m of a Normal distribution is also the mode, and can be called a location parameter. The same applies for the Laplace, for example. A lot of distributions are extended by including a shift parameter (VoseShift), which has the effect of moving the distribution along the x-axis and is a location parameter.

A scale parameter controls the spread of the distribution on the x-axis. Its square should therefore appear in the equation for a distribution's variance. For example, b is the scale parameter for the Gamma, Weibull and Logistic distributions, s for the Normal and Laplace distributions, b for the ExtremeValueMax, ExtremeValueMin and Rayleigh distributions, etc.
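As a rough numerical illustration of location and scale, the Python sketch below integrates a Normal density on a grid and reports its mean, mode and variance. The helper names (normal_pdf, summarise) and the grid settings are assumptions made for this example; they are not ModelRisk functions.

```python
import math

def normal_pdf(x, m, s):
    """Normal density with mean m (location) and standard deviation s (scale)."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def summarise(m, s, half_width=12.0, steps=24000):
    """Approximate (mean, mode, variance) with a midpoint Riemann sum on a grid around m."""
    lo = m - half_width * s
    dx = 2 * half_width * s / steps
    xs = [lo + (i + 0.5) * dx for i in range(steps)]
    ps = [normal_pdf(x, m, s) for x in xs]
    mean = sum(x * p for x, p in zip(xs, ps)) * dx
    var = sum((x - mean) ** 2 * p for x, p in zip(xs, ps)) * dx
    mode = xs[ps.index(max(ps))]  # grid point with the highest density
    return mean, mode, var

base = summarise(m=10.0, s=2.0)
shifted = summarise(m=13.0, s=2.0)  # location parameter increased by 3
widened = summarise(m=10.0, s=4.0)  # scale parameter doubled

print("shift in mean:", shifted[0] - base[0])
print("shift in mode:", shifted[1] - base[1])
print("variance ratio:", widened[2] / base[2])
```

Up to grid error, the mean and mode both move by the 3 units added to the location parameter, and doubling the scale parameter multiplies the variance by four, which is the square of the scale factor.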

A shape parameter controls the shape (e.g. skewness, kurtosis) of the distribution. It will appear in the pdf in a way that controls the manipulation of x in a non-linear fashion, usually as an exponent of x.

For example, the Pareto distribution has pdf:

$$f(x) = \frac{q\,a^{q}}{x^{q+1}}, \quad x \ge a$$

q is a shape parameter here as it changes the functional form of the relationship between f(x) and x. Other examples you can look at are n for the GED, Student and ChiSq distributions, and a for the Gamma distribution. A distribution may sometimes have two shape parameters, e.g. a1 and a2 for the Beta distribution, and n1 and n2 for the F distribution.
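One way to see that q changes the functional form is to look at how fast the density falls when x doubles. The sketch below assumes the Pareto pdf is f(x) = q a^q / x^(q+1) for x >= a; the function names are illustrative only.

```python
def pareto_pdf(x, q, a):
    """Pareto density, assuming f(x) = q * a**q / x**(q + 1) for x >= a, else 0."""
    if x < a:
        return 0.0
    return q * a ** q / x ** (q + 1)

def tail_ratio(x, q, a):
    """f(2x) / f(x): the factor by which the density drops when x doubles."""
    return pareto_pdf(2 * x, q, a) / pareto_pdf(x, q, a)

for q in (1.0, 3.0):
    for a in (1.0, 5.0):
        print(f"q={q}, a={a}: f(2x)/f(x) at x=10 is {tail_ratio(10.0, q, a):.4f}")
```

Under this assumed form the ratio is 2^-(q+1), so it depends on q but not on a: changing a only rescales the curve, while changing q changes how the density relates to x.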

If there is no shape parameter the distribution always takes the same shape (like the Cauchy, Exponential, Extreme Value, Laplace, Logistic and Normal).

Understanding distribution equations

Probability mass function (pmf) and probability density function (pdf)

The pmf or pdf is the most common equation used to define a distribution, for two reasons. The first is that it gives the shape of the density (or mass) curve, which is the easiest way to recognize and review a distribution. The second is that the pmf (or pdf) is always in a useful form, whereas the cdf frequently doesn't have a closed form (meaning a simple algebraic identity rather than an integral or summation).

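Pmfs must sum to 1, and pdfs must integrate to 1, in order to obey the basic probability rule that the sum of all probabilities = 1. This means that a pmf or pdf equation has two parts: a function of x, the possible value of the variable, and a normalizing part that calibrates the distribution to sum to unity. For example, the Error distribution pdf takes the (rather complicated) form:

$$f(x) = \frac{K}{b}\exp\left[-\frac{1}{2}\left|\frac{x-m}{b}\right|^{n}\right] \qquad (1)$$

where

$$K = \frac{n}{2^{1+1/n}\,\Gamma(1/n)}$$

The part that varies with x is simply

$$f(x) \propto \exp\left[-\frac{1}{2}\left|\frac{x-m}{b}\right|^{n}\right] \qquad (2)$$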

The rest of Equation (1), i.e. K/b, is a normalizing constant for a given set of parameters and ensures the area under the curve is unity. Equation (2) is sufficient to define or recognize the distribution and allows us to concentrate on how the distribution behaves with changes to the parameter values. In fact probability mathematicians frequently work with just the component that is a function of x, keeping in the back of their mind that it will be normalized eventually.
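The role of the normalizing constant can be checked numerically. The sketch below assumes the part of the Error (GED) density that varies with x is exp(-0.5 |(x-m)/b|^n), and that K = n / (2^(1+1/n) Γ(1/n)), which is the value that form requires; the function names, grid settings and parameter values are assumptions for illustration.

```python
import math

def ged_kernel(x, m, b, n):
    """Part of the assumed Error (GED) density that varies with x."""
    return math.exp(-0.5 * abs((x - m) / b) ** n)

def kernel_area(m, b, n, half_width=60.0, steps=120000):
    """Midpoint-rule estimate of the area under the kernel (tails beyond the grid are ignored)."""
    lo = m - half_width * b
    dx = 2 * half_width * b / steps
    return sum(ged_kernel(lo + (i + 0.5) * dx, m, b, n) for i in range(steps)) * dx

def ged_constant(n):
    """K such that (K / b) * kernel integrates to 1, for the assumed kernel."""
    return n / (2 ** (1 + 1 / n) * math.gamma(1 / n))

m, b = 0.0, 1.5
for n in (1.0, 2.0, 4.0):
    area = kernel_area(m, b, n)
    print(f"n={n}: kernel area = {area:.6f}, (K/b) * area = {ged_constant(n) / b * area:.6f}")
```

The kernel area changes with b and n, but multiplying by K/b brings the total back to 1 in each case (up to numerical error). The grid is only wide enough for n of about 1 or more; heavier-tailed cases would need a wider range.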

For example, the x-m part shows us that the distribution is shifted m along the x-axis (a location parameter), and the division by b means that the distribution is rescaled by this factor (a scale parameter). The parameter n changes the functional form of the distribution. For example:

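For n = 2:

$$f(x) \propto \exp\left[-\frac{1}{2}\left(\frac{x-m}{b}\right)^{2}\right]$$

Compare that to the Normal distribution density function:

$$f(x) = \frac{1}{s\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{x-m}{s}\right)^{2}\right]$$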

So we can say that when n = 2, the GED is Normally distributed with mean m and standard deviation b. The functional form (the part in x) gives us sufficient information to say this, as we know the multiplying constant must adjust to keep the area under the curve equal to unity (see the proof).

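Similarly, for n = 1 we have:

$$f(x) \propto \exp\left[-\frac{|x-m|}{2b}\right]$$

Compare that to the Laplace distribution density function:

$$f(x) = \frac{1}{\sqrt{2}\,s}\exp\left[-\frac{\sqrt{2}\,|x-m|}{s}\right]$$

So we can say that when n = 1, the GED takes a Laplace(m,s) distribution, with s = 2√2 b.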
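Both special cases can be checked numerically. The sketch below assumes the GED density is (K/b) exp(-0.5 |(x-m)/b|^n) with K = n / (2^(1+1/n) Γ(1/n)), and that the Laplace density is parameterised by its standard deviation s, i.e. f(x) = exp(-√2 |x-m| / s) / (√2 s). Under those assumptions the matching Laplace parameter is s = 2√2 b. All function names are illustrative.

```python
import math

def ged_pdf(x, m, b, n):
    """Assumed Error (GED) density: (K / b) * exp(-0.5 * |(x - m) / b| ** n)."""
    k = n / (2 ** (1 + 1 / n) * math.gamma(1 / n))
    return k / b * math.exp(-0.5 * abs((x - m) / b) ** n)

def normal_pdf(x, m, s):
    """Normal density with mean m and standard deviation s."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def laplace_pdf(x, m, s):
    """Laplace density with mean m and standard deviation s (assumed parameterisation)."""
    return math.exp(-math.sqrt(2) * abs(x - m) / s) / (math.sqrt(2) * s)

m, b = 1.0, 2.0
xs = [-3.0, 0.0, 1.0, 2.5, 7.0]

# Largest absolute difference between the GED and its claimed special case.
gap_normal = max(abs(ged_pdf(x, m, b, 2.0) - normal_pdf(x, m, b)) for x in xs)
gap_laplace = max(abs(ged_pdf(x, m, b, 1.0) - laplace_pdf(x, m, 2 * math.sqrt(2) * b)) for x in xs)

print("n = 2 vs Normal(m, b):", gap_normal)
print("n = 1 vs Laplace(m, 2*sqrt(2)*b):", gap_laplace)
```

Both gaps should be at the level of floating-point rounding, which confirms that matching the part in x is enough: the normalizing constant takes care of itself.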

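The same idea applies to discrete distributions. For example, the Logarithmic(q) distribution has pmf:

$$f(x) = \frac{-q^{x}}{x\,\ln(1-q)}, \quad x = 1, 2, 3, \ldots$$

Here -1/ln(1-q) is the normalizing part because, it turns out,

$$\sum_{x=1}^{\infty}\frac{q^{x}}{x} = -\ln(1-q)$$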
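The series identity behind the normalizing part is easy to confirm with partial sums. This sketch assumes the Logarithmic(q) pmf is f(x) = -q^x / (x ln(1-q)) for x = 1, 2, ...; the function names and the number of terms are illustrative choices.

```python
import math

def logarithmic_pmf(x, q):
    """Logarithmic(q) pmf, assuming f(x) = -q**x / (x * ln(1 - q)) for x = 1, 2, ..."""
    return -(q ** x) / (x * math.log(1 - q))

def series_part(q, terms=500):
    """Partial sum of q**x / x, the part of the pmf that varies with x."""
    return sum(q ** x / x for x in range(1, terms + 1))

for q in (0.3, 0.9):
    total = sum(logarithmic_pmf(x, q) for x in range(1, 501))
    print(f"q={q}: sum of q^x/x = {series_part(q):.10f}, "
          f"-ln(1-q) = {-math.log(1 - q):.10f}, sum of pmf = {total:.10f}")
```

The partial sum of q^x / x approaches -ln(1-q), so dividing by that quantity makes the probabilities sum to 1.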

Cumulative distribution function (cdf)

The cdf gives us the probability of being less than or equal to the variable value x. For discrete distributions this is simply the sum of the pmf up to x, so reviewing its equation is not more informative than the pmf equation. However, for continuous distributions the cdf can take a simpler form than the corresponding pdf. For example, for a Weibull distribution:

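$$f(x) = a\,b^{-a}\,x^{a-1}\exp\left[-\left(\frac{x}{b}\right)^{a}\right]$$

$$F(x) = 1 - \exp\left[-\left(\frac{x}{b}\right)^{a}\right] \qquad (3)$$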

The latter is simpler to envisage.
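The two Weibull equations describe the same distribution, since the pdf is the slope of the cdf. The sketch below checks this with a central finite difference, assuming the Weibull(a, b) cdf is F(x) = 1 - exp(-(x/b)^a) and the pdf is a b^-a x^(a-1) exp(-(x/b)^a); the function names and the step size h are assumptions for the example.

```python
import math

def weibull_cdf(x, a, b):
    """Weibull cdf, assuming F(x) = 1 - exp(-(x / b)**a) for x >= 0."""
    return 1.0 - math.exp(-((x / b) ** a))

def weibull_pdf(x, a, b):
    """Weibull pdf, assuming f(x) = a * b**(-a) * x**(a - 1) * exp(-(x / b)**a)."""
    return a * b ** (-a) * x ** (a - 1) * math.exp(-((x / b) ** a))

def cdf_slope(x, a, b, h=1e-5):
    """Central finite-difference estimate of dF/dx."""
    return (weibull_cdf(x + h, a, b) - weibull_cdf(x - h, a, b)) / (2 * h)

a, b = 2.5, 2.0
for x in (0.5, 1.3, 3.0):
    print(f"x={x}: pdf = {weibull_pdf(x, a, b):.8f}, slope of cdf = {cdf_slope(x, a, b):.8f}")
```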

Many cdfs have a component that involves the exponential function (e.g. Weibull, Exponential, Extreme Value, Laplace, Logistic, Rayleigh). Exp(-∞) = 0 and Exp(0) = 1, which is the range of F(x), so you'll often see functions of the form:

$$F(x) = \exp[-g(x)] \quad \text{or} \quad F(x) = 1 - \exp[-g(x)]$$

where g(x) is some function of x that goes from zero to infinity or infinity to zero monotonically (meaning always increasing or always decreasing) with increasing x. For example, Equation (3) for the Weibull distribution shows us:

The value b scales x
When x = 0, F(x) = 1-1 = 0, so the variable has a minimum of 0
When x = ∞, F(x) = 1-0 = 1, so the variable has a maximum of ∞
a makes the distribution shorter, because it 'amplifies' x. For example (leaving b=1), if a = 2 and x = 3 it calculates 3^2 = 9, whereas if a = 4 it calculates 3^4 = 81
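These four observations can be reproduced directly. The sketch assumes the Weibull cdf is F(x) = 1 - exp(-(x/b)^a); weibull_tail is a helper introduced here for the probability of exceeding x, computed without subtracting from 1 so that very small tails are not lost to rounding.

```python
import math

def weibull_cdf(x, a, b):
    """Weibull cdf, assuming F(x) = 1 - exp(-(x / b)**a) for x >= 0."""
    return 1.0 - math.exp(-((x / b) ** a))

def weibull_tail(x, a, b):
    """Probability of exceeding x, i.e. 1 - F(x) = exp(-(x / b)**a)."""
    return math.exp(-((x / b) ** a))

# b scales x: F evaluated at x with scale b equals F evaluated at x / b with scale 1.
print(weibull_cdf(6.0, 2.0, 2.0), weibull_cdf(3.0, 2.0, 1.0))

# The variable has a minimum of 0, and F approaches 1 for large x.
print(weibull_cdf(0.0, 2.0, 1.0), weibull_cdf(50.0, 2.0, 1.0))

# a 'amplifies' x: with b = 1 and x = 3 the exponent is 3**a.
for a in (2.0, 4.0):
    print(f"a={a}: (x/b)^a = {3.0 ** a}, P(X > 3) = {weibull_tail(3.0, a, 1.0):.3e}")
```

With a = 4 the probability of exceeding x = 3 is many orders of magnitude smaller than with a = 2, which is what "shorter" means here.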

Mean m

The mean of a probability distribution is useful to know for several reasons.

It gives a sense of the location of the distribution;
The Central Limit Theorem (CLT) uses the mean;
Knowing the equation of the mean can help us understand the distribution. For example, a Gamma(a,b) distribution can be used to model the waiting time until a number a of independent events have been observed, where the events occur randomly in time with a mean time between occurrences of b. It makes intuitive sense that, 'on average', you need to wait a*b, which is the mean of the distribution;
We sometimes want to approximate one distribution with another to make the mathematics easier. Knowing the equations for the mean and variance can help us find a distribution with these same moments;
Because of CLT, the mean propagates through a model much more precisely than the mode or median. So, for example, if you replaced a distribution in a simulation model with its mean the output mean value will usually be close to the output mean when the model includes that distribution. However, the same does not apply as well by replacing a distribution with its median, and often much worse still if one uses the mode; and
A distribution is often fitted to data by matching the data's mean and variance to the mean and variance equations of the distribution - a technique known as Method of Moments.
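As a minimal sketch of the Method of Moments mentioned in the last point, the code below fits a Gamma(a, b) distribution by solving mean = ab and variance = ab^2 for a and b. It assumes a is the shape and b the scale parameter, and it uses Python's random.gammavariate to generate test data; fit_gamma_by_moments is a hypothetical helper, not a ModelRisk function.

```python
import random
import statistics

def fit_gamma_by_moments(data):
    """Match the sample mean and variance to mean = a*b and variance = a*b**2."""
    mean = statistics.fmean(data)
    var = statistics.variance(data)  # sample variance
    b = var / mean                   # from variance / mean = b
    a = mean / b                     # from mean = a * b
    return a, b

rng = random.Random(1)  # fixed seed so the example is repeatable
sample = [rng.gammavariate(3.0, 2.0) for _ in range(20000)]  # true a = 3, b = 2

a_hat, b_hat = fit_gamma_by_moments(sample)
print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}")
```

The estimates should land near the true values of 3 and 2, with some sampling error. By construction, a_hat * b_hat reproduces the sample mean exactly.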

When the pdf of a distribution is of the form f(x) = g(x - z) where g( ) is any function and z is a fixed value, the equation for the mean will be a linear function of z.
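The linearity in z follows from a single substitution. The sketch below assumes the mean exists (the integral converges) and writes c for the mean of the unshifted density g; c is a symbol introduced here for illustration.

```latex
\begin{aligned}
E[X] &= \int_{-\infty}^{\infty} x\, g(x - z)\, dx \\
     &= \int_{-\infty}^{\infty} (u + z)\, g(u)\, du \qquad (u = x - z) \\
     &= z \int_{-\infty}^{\infty} g(u)\, du + \int_{-\infty}^{\infty} u\, g(u)\, du \\
     &= z + c
\end{aligned}
```

The last step uses the fact that g integrates to 1, so the mean is z plus a constant that does not depend on z.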

Mode

The mode is the location of the peak of a distribution, and is the most intuitive parameter to consider - the 'most likely value to occur'.

If the mode has the same equation as the mean it tells us the distribution is symmetric. If the mode is less than the mean (e.g. for the Gamma distribution, mode = (a-1)b and mean = ab) we know the distribution is right-skewed; if the mode is greater than the mean, the distribution is left-skewed.
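The Gamma example can be checked by locating the peak of the density on a grid. The sketch assumes the Gamma(a, b) pdf is x^(a-1) exp(-x/b) / (Γ(a) b^a) with shape a and scale b, and takes a > 1 so that the mode equation (a-1)b applies; the function names and grid settings are illustrative.

```python
import math

def gamma_pdf(x, a, b):
    """Gamma(a, b) density with shape a and scale b, for x > 0 (assumed parameterisation)."""
    return x ** (a - 1) * math.exp(-x / b) / (math.gamma(a) * b ** a)

def grid_mode(a, b, upper=60.0, steps=60000):
    """Grid point in (0, upper) with the highest density."""
    xs = [(i + 0.5) * upper / steps for i in range(steps)]
    return max(xs, key=lambda x: gamma_pdf(x, a, b))

a, b = 3.0, 2.0
mode = grid_mode(a, b)
mean = a * b
print(f"grid mode = {mode:.3f}, (a-1)*b = {(a - 1) * b}, mean = a*b = {mean}")
print("right-skewed by the mode < mean rule:", mode < mean)
```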

Variance V

The variance gives a measure of the spread of a distribution. We give equations for the variance rather than the standard deviation because it avoids having square-root signs all the time, and because probability mathematicians work in terms of variance rather than standard deviation. However, it can be useful to take the square root of the variance equation (i.e. the standard deviation s) to help make more sense of it. For example, the Logistic(a,b) distribution has variance:

$$V = \frac{b^{2}\pi^{2}}{3}$$

which shows us that b is a scaling parameter: the distribution's spread is proportional to b. Another example - the Pareto(q,a) distribution has variance:

$$V = \frac{q\,a^{2}}{(q-1)^{2}(q-2)}, \quad q > 2$$

which shows us that a is a scaling parameter.
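Taking square roots makes the scaling explicit. The sketch below assumes the Logistic(a, b) variance is b^2 π^2 / 3 and the Pareto(q, a) variance is q a^2 / ((q-1)^2 (q-2)) for q > 2; the function names are illustrative.

```python
import math

def logistic_sd(b):
    """Standard deviation of Logistic(a, b), assuming V = b**2 * pi**2 / 3."""
    return b * math.pi / math.sqrt(3)

def pareto_sd(q, a):
    """Standard deviation of Pareto(q, a), assuming V = q * a**2 / ((q - 1)**2 * (q - 2))."""
    if q <= 2:
        raise ValueError("the variance is only finite for q > 2")
    return a * math.sqrt(q / (q - 2)) / (q - 1)

# Doubling the scale parameter doubles the standard deviation in both cases.
print(logistic_sd(2.0) / logistic_sd(1.0))
print(pareto_sd(3.0, 10.0) / pareto_sd(3.0, 5.0))
```

In both cases the standard deviation is the scale parameter multiplied by a factor that does not involve it, which is what identifies b and a as scaling parameters.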

Read on: Selecting the appropriate distributions for your model