# Goodness of Fit Plots

Goodness-of-fit plots offer the analyst a visual comparison between the data and fitted distributions. They provide an overall picture of the errors in a way that a goodness-of-fit statistic cannot and allow the analyst to select the best fitting distribution in a more qualitative and intuitive way. Several types of plot are in common use. Their individual merits are discussed below.

Note that the goodness of fit plots described below are implemented in the Vose Distribution Fit window from ModelRisk.

But be careful: make sure that you have an appropriate expectation of the level of correspondence between a data set of the size you have and the parent distribution it came from. For example, in the figure below, 20 samples of 20 data points each are drawn from a Normal distribution and plotted as histograms. There is little in any single histogram to suggest the data's original distribution. Moreover, a histogram of all 400 values pooled together is still not a particularly convincing visual match to a Normal.
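The sampling variability described here can be reproduced with a short simulation. The following is a minimal sketch using only the Python standard library; the standard Normal parent, the seed and the bin edges are assumptions chosen for illustration, not values taken from the figure.

```python
import random
from statistics import NormalDist

# Assumed setup: standard Normal parent, arbitrary seed, six bins with open-ended tails.
parent = NormalDist(mu=0.0, sigma=1.0)
rng = random.Random(1)
edges = [float("-inf"), -2.0, -1.0, 0.0, 1.0, 2.0, float("inf")]

def histogram_counts(values, edges):
    """Count the values falling in each bin [edges[k], edges[k+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(counts)):
            if edges[k] <= v < edges[k + 1]:
                counts[k] += 1
                break
    return counts

# 20 samples of 20 values each, as in the example above.
samples = [[rng.gauss(parent.mean, parent.stdev) for _ in range(20)] for _ in range(20)]
per_sample = [histogram_counts(s, edges) for s in samples]
pooled = histogram_counts([v for s in samples for v in s], edges)

# Bin counts that a sample of 20 would show if it matched the parent exactly.
expected = [20 * (parent.cdf(hi) - parent.cdf(lo)) for lo, hi in zip(edges, edges[1:])]

print("expected :", [round(e, 1) for e in expected])
for counts in per_sample[:5]:
    print("sample   :", counts)
print("pooled/20:", [c / 20 for c in pooled])
```

Comparing the per-sample rows with the expected row gives a feel for how much a histogram of 20 points can wander from its parent distribution purely by chance.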

Figure 1: Histograms of 20 samples of 20 values drawn randomly from a Normal distribution. The red plot is a histogram of all 400 values.

### Comparison of Probability Density

Overlaying a histogram plot of the data with a density function of the fitted distribution is usually the most informative comparison. It is easy to see where the main discrepancies are and whether the general shape of the data and fitted distribution compare well. The same scale and number of histogram bars should be used for all plots if a direct comparison of several distribution fits is to be made for the same data.
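As a rough illustration of this comparison, the sketch below computes the numbers behind such an overlay: histogram bar heights scaled to unit area, and the fitted density evaluated at each bar's midpoint. The data set, the choice of a Normal fit and the helper `density_histogram` are all hypothetical; a plotting library would draw the bars and the curve from these values.

```python
from statistics import NormalDist

# Hypothetical data set, for illustration only.
data = [4.1, 4.7, 5.0, 5.2, 5.3, 5.6, 5.8, 6.0, 6.1, 6.4, 6.9, 7.5]

def density_histogram(values, n_bins):
    """Return bin midpoints, bar heights scaled to unit area, and the bin width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        k = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[k] += 1
    n = len(values)
    mids = [lo + (k + 0.5) * width for k in range(n_bins)]
    heights = [c / (n * width) for c in counts]
    return mids, heights, width

# Fit a Normal from the sample mean and standard deviation (an assumed choice of model).
fitted = NormalDist.from_samples(data)
mids, heights, width = density_histogram(data, n_bins=4)

for m, h in zip(mids, heights):
    print(f"x={m:.2f}  histogram={h:.3f}  fitted pdf={fitted.pdf(m):.3f}")
```

Scaling the bars to unit area puts them on the same vertical scale as the density function, and reusing the same `n_bins` for every candidate distribution keeps the comparisons like for like.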

### Comparison of Probability Distributions

An overlay of the cumulative frequency plots of the data and the fitted distribution is sometimes used. However, this plot has a very insensitive scale, and the cumulative distributions of most distribution types follow very similar S-curves. This type of plot will therefore reveal only very large differences between the data and the fitted distribution, and is not generally recommended as a visual measure of goodness of fit.
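To give a sense of how similar these S-curves can be, the sketch below compares the cumulative distributions of a Normal and a Logistic distribution with the same mean and variance. The choice of these two distributions and the evaluation grid are assumptions made for illustration only.

```python
import math
from statistics import NormalDist

normal = NormalDist(mu=0.0, sigma=1.0)

# A Logistic with scale s has variance (s * pi)^2 / 3, so this s matches the Normal's variance.
logistic_scale = normal.stdev * math.sqrt(3) / math.pi

def logistic_cdf(x, mu=0.0, s=logistic_scale):
    """Cumulative distribution function of a Logistic(mu, s)."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))

# Evaluate both CDFs on a grid from -4 to 4 and record the largest vertical gap.
grid = [-4.0 + 0.01 * k for k in range(801)]
max_gap = max(abs(normal.cdf(x) - logistic_cdf(x)) for x in grid)

print(f"largest vertical gap between the two CDFs: {max_gap:.3f}")
```

The gap stays within a few hundredths on a vertical axis that runs from 0 to 1, even though the two density functions differ visibly in peak height and tail weight.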

### Probability-Probability (P-P) Plots

This is a plot of the cumulative distribution of the fitted curve $F(x_i)$ against the cumulative frequency $F_n(x_i) = i/(n+1)$ for all values of $x_i$, where $x_i$ is the i-th smallest of the n data values. The better the fit, the more closely this plot resembles a straight line. It can be useful if one is interested in closely matching cumulative percentiles, and it will show significant differences between the middles of the two distributions. However, the plot is far less sensitive to discrepancies in fit than the Comparison of Probability Density plot and is therefore not often used. It can also be rather confusing when used to review discrete data (see below), where a fairly good fit can easily be masked, especially if there are only a few allowable x-values.
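The coordinates of a P-P plot follow directly from this definition. A minimal sketch, assuming a hypothetical data set and a Normal fit obtained from the sample mean and standard deviation:

```python
from statistics import NormalDist

# Hypothetical data set, for illustration only.
data = [4.1, 4.7, 5.0, 5.2, 5.3, 5.6, 5.8, 6.0, 6.1, 6.4, 6.9, 7.5]

xs = sorted(data)
n = len(xs)
fitted = NormalDist.from_samples(xs)  # assumed choice of fitted distribution

# One point per observation: (empirical i/(n+1), fitted F(x_i)).
pp_points = [(i / (n + 1), fitted.cdf(x)) for i, x in enumerate(xs, start=1)]

for empirical, model in pp_points:
    print(f"Fn={empirical:.3f}  F={model:.3f}  difference={model - empirical:+.3f}")
```

Both coordinates lie between 0 and 1, so a perfect fit would place every point on the diagonal from (0, 0) to (1, 1).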

### Quantile-Quantile (Q-Q) Plots

This is a plot of the observed data $x_i$ against the x-values at which the fitted distribution satisfies $F(x) = F_n(x_i) = i/(n+1)$, i.e. the fitted quantiles $F^{-1}(i/(n+1))$. As with P-P plots, the better the fit, the more closely this plot resembles a straight line. It can be useful if one is interested in closely matching cumulative percentiles, and it will show significant differences between the tails of the two distributions. However, the plot suffers from the same insensitivity problem as the P-P plot.
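The Q-Q coordinates can be sketched in the same way, using the inverse cumulative distribution of the fitted curve. As before, the data set and the Normal fit are hypothetical choices for illustration.

```python
from statistics import NormalDist

# Hypothetical data set, for illustration only.
data = [4.1, 4.7, 5.0, 5.2, 5.3, 5.6, 5.8, 6.0, 6.1, 6.4, 6.9, 7.5]

xs = sorted(data)
n = len(xs)
fitted = NormalDist.from_samples(xs)  # assumed choice of fitted distribution

# One point per observation: (fitted quantile at i/(n+1), observed x_i).
qq_points = [(fitted.inv_cdf(i / (n + 1)), x) for i, x in enumerate(xs, start=1)]

for model_x, observed_x in qq_points:
    print(f"fitted quantile={model_x:.3f}  observed={observed_x:.3f}")
```

Here both axes are in the units of the data, so departures from the diagonal at either end of the plot correspond to differences in the tails.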