Goodness of Fit Plots

See also: Fitting distributions to data, Fitting in ModelRisk, Analyzing and using data

Goodness-of-fit plots offer the analyst a visual comparison between the data and fitted distributions. They provide an overall picture of the errors in a way that a goodness-of-fit statistic cannot, and allow the analyst to select the best-fitting distribution in a more qualitative and intuitive way. Several types of plot are in common use; their individual merits are discussed below.

Note that the goodness-of-fit plots described below are implemented in the Vose Distribution Fit window in ModelRisk.

Be careful, however, to have a realistic expectation of how closely a data set of the size you have will resemble the parent distribution it came from. For example, in the figure below, 20 samples of 20 data points each are drawn from a Normal distribution and plotted as histograms. You'll notice that there is little in each histogram to suggest the data's original distribution. Moreover, even a histogram of all 400 values is still not a particularly convincing visual match to a Normal.

Figure 1: Histograms of 20 samples of 20 values drawn randomly from a Normal distribution. The red plot is a histogram of all 400 values.
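The experiment behind the figure is easy to repeat. The following is a minimal sketch using only the Python standard library; the seed, the Normal(0, 1) parent, the bin edges and the helper name histogram_counts are all illustrative assumptions, not the settings used to produce the figure.

```python
import random
from statistics import NormalDist


def histogram_counts(values, edges):
    """Count values in each bin [edges[k], edges[k+1]); values outside all bins are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(counts)):
            if edges[k] <= v < edges[k + 1]:
                counts[k] += 1
                break
    return counts


rng = random.Random(1)                        # arbitrary fixed seed (assumption)
parent = NormalDist(mu=0.0, sigma=1.0)        # assumed parent distribution
edges = [-3.0 + 0.75 * k for k in range(9)]   # 8 equal bins covering -3 to 3

# 20 samples of 20 values each, all from the same Normal parent
samples = [[rng.gauss(parent.mean, parent.stdev) for _ in range(20)]
           for _ in range(20)]

# Print bin counts for the first few samples, then for all 400 values pooled
for i, sample in enumerate(samples[:3], start=1):
    print(f"sample {i}:", histogram_counts(sample, edges))
pooled = [v for sample in samples for v in sample]
print("pooled  :", histogram_counts(pooled, edges))
```

Running this with different seeds gives a feel for how ragged the bin counts of a 20-point sample usually are, even though every sample comes from exactly the same distribution.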


Comparison of Probability Density

Overlaying a histogram plot of the data with a density function of the fitted distribution is usually the most informative comparison. It is easy to see where the main discrepancies are and whether the general shape of the data and fitted distribution compare well. The same scale and number of histogram bars should be used for all plots if a direct comparison of several distribution fits is to be made for the same data.
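The numbers behind such an overlay can be sketched as follows. The histogram bars are scaled to unit area so that they are directly comparable with the fitted density. The function name density_comparison, the choice of 8 bars, the simulated data and the use of a Normal fitted by sample mean and standard deviation are assumptions made for illustration only.

```python
import random
from statistics import NormalDist


def density_comparison(data, fitted, n_bins=8):
    """Return (bin midpoint, histogram density, fitted density) for each bar."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in data:
        k = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bar
        counts[k] += 1
    rows = []
    for k, c in enumerate(counts):
        mid = lo + (k + 0.5) * width
        # Dividing by n * width makes the bar areas sum to 1, like a density
        rows.append((mid, c / (len(data) * width), fitted.pdf(mid)))
    return rows


rng = random.Random(2)                               # arbitrary seed (assumption)
data = [rng.gauss(50.0, 5.0) for _ in range(200)]    # stand-in for real data
fitted = NormalDist.from_samples(data)               # fit by sample mean and stdev

for mid, hist_density, fit_density in density_comparison(data, fitted):
    print(f"{mid:7.2f}  data {hist_density:.4f}  fitted {fit_density:.4f}")
```

To compare several candidate distributions on the same data, keep n_bins fixed and change only the fitted argument, in line with the advice above about using the same number of bars.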

Comparison of Probability Distributions

An overlay of the cumulative frequency plots of the data and the fitted distribution is sometimes used. However, this plot has a very insensitive scale, and the cumulative curves of most distribution types follow very similar S-curves. This type of plot will therefore only show up very large differences between the data and fitted distributions and is not generally recommended as a visual measure of goodness of fit.
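The similarity of S-curves can be illustrated numerically. The sketch below compares the cumulative curves of a Normal and a Logistic distribution that share the same mean and standard deviation; the choice of these two distributions, the grid and the helper logistic_cdf are assumptions made purely for illustration.

```python
import math
from statistics import NormalDist


def logistic_cdf(x, mu, sigma):
    """CDF of a Logistic distribution with mean mu and standard deviation sigma."""
    s = sigma * math.sqrt(3.0) / math.pi   # scale parameter that gives stdev sigma
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))


normal = NormalDist(0.0, 1.0)
grid = [-4.0 + 0.01 * k for k in range(801)]   # x from -4 to 4 in steps of 0.01

# Largest vertical distance between the two cumulative curves on the grid
max_gap = max(abs(normal.cdf(x) - logistic_cdf(x, 0.0, 1.0)) for x in grid)
print(f"largest vertical gap between the two S-curves: {max_gap:.3f}")
```

On a plot whose vertical axis runs from 0 to 1, a gap of this size is hard to see, which is the insensitivity described above.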

Probability-Probability (P-P) Plots

This is a plot of the cumulative distribution of the fitted curve, F(x), against the cumulative frequency of the data, Fn(xi) = i/(n+1), evaluated at each data value xi, where xi is the i-th smallest of the n observations. The better the fit, the more closely this plot resembles a straight line. It can be useful if one is interested in closely matching cumulative percentiles, and it will show significant differences between the middles of the two distributions. However, the plot is far less sensitive to discrepancies in fit than the Comparison of Probability Density plot and is therefore not often used. It can also be rather confusing when used to review discrete data (see below), where a fairly good fit can easily be masked, especially if there are only a few allowable x-values.
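The coordinates of a P-P plot follow directly from that definition. The following is a minimal sketch; the function name pp_points, the simulated data and the fitted Normal are illustrative assumptions.

```python
import random
from statistics import NormalDist


def pp_points(data, fitted):
    """Return (Fn, F) pairs: Fn = i/(n+1) for the i-th smallest value, F = fitted CDF there."""
    xs = sorted(data)
    n = len(xs)
    return [(i / (n + 1), fitted.cdf(x)) for i, x in enumerate(xs, start=1)]


rng = random.Random(3)                              # arbitrary seed (assumption)
data = [rng.gauss(10.0, 2.0) for _ in range(50)]    # stand-in for real data
fitted = NormalDist.from_samples(data)              # fit by sample mean and stdev

points = pp_points(data, fitted)
# A perfect fit would put every point on the diagonal F = Fn
worst = max(abs(f - fn) for fn, f in points)
print(f"largest departure from the diagonal: {worst:.3f}")
```

Both coordinates lie between 0 and 1, so the plot is always drawn on the unit square regardless of the units of the data.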

Quantile-Quantile (Q-Q) Plots

This is a plot of the observed data xi against the x-values at which the fitted distribution reaches the same cumulative probability, i.e. the x for which F(x) = Fn(xi) = i/(n+1). As with P-P plots, the better the fit, the more closely this plot resembles a straight line. It can be useful if one is interested in closely matching cumulative percentiles, and it will show significant differences between the tails of the two distributions. However, the plot suffers from the same insensitivity problem as the Probability-Probability plot.
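The coordinates of a Q-Q plot can be sketched in the same way, this time inverting the fitted cumulative distribution at i/(n+1). The function name qq_points, the simulated data and the fitted Normal are illustrative assumptions.

```python
import random
from statistics import NormalDist


def qq_points(data, fitted):
    """Return (observed x_i, fitted x with F(x) = i/(n+1)) for the sorted data."""
    xs = sorted(data)
    n = len(xs)
    # i/(n+1) is strictly between 0 and 1, so the inverse CDF is always defined
    return [(x, fitted.inv_cdf(i / (n + 1))) for i, x in enumerate(xs, start=1)]


rng = random.Random(4)                              # arbitrary seed (assumption)
data = [rng.gauss(10.0, 2.0) for _ in range(50)]    # stand-in for real data
fitted = NormalDist.from_samples(data)              # fit by sample mean and stdev

points = qq_points(data, fitted)
# Show the extremes, where tail differences between data and fit appear
for observed, theoretical in points[:3] + points[-3:]:
    print(f"observed {observed:7.3f}   fitted quantile {theoretical:7.3f}")
```

Unlike the P-P plot, both axes here are in the units of the data, which is why departures in the tails stand out more clearly.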

 
