See also: Fitting distributions to data, Goodness of fit statistics, Fitting in ModelRisk
Although still popular today, the Chi-Squared, Kolmogorov-Smirnov and Anderson-Darling goodness of fit statistics are technically all inappropriate as a method of comparing fits of distributions to data. They are also limited to precise observations and cannot incorporate censored, truncated or binned data. Realistically, most of the time we are fitting a continuous distribution to a set of precise observations, and in that case the Anderson-Darling statistic does a reasonable job.
For important work you should instead consider using statistical measures of fit called information criteria.
Let:
n = number of observations (e.g. data values, frequencies)
k = number of parameters to be estimated (e.g. the Normal distribution has 2: mu and sigma)
Lmax = the maximized value of the Likelihood for the estimated model (i.e. fit the parameters by MLE; ln[Lmax] is then the natural log of the maximized Likelihood)
SIC = ln[n] k - 2ln[Lmax]
AIC = (2n / (n - k - 1)) k - 2ln[Lmax]
HQIC = 2ln[ln[n]] k - 2ln[Lmax]
The aim is to find the model with the lowest value of the selected information criterion. The -2ln[Lmax] term appearing in each formula is an estimate of the deviance of the model fit. The coefficient of k in the first part of each formula shows the degree to which the number of model parameters is penalized. For n > ~20 the SIC (Schwarz, 1978) is the strictest of the three in penalizing the loss of degrees of freedom that comes from having more parameters in the fitted model. For n > ~40 the AIC (Akaike, 1974, 1976) is the least strict and the HQIC (Hannan and Quinn, 1979) holds the middle ground; for n < ~20 the HQIC is the least penalizing.
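To make the comparison of penalties concrete, the sketch below computes the coefficient of k for each criterion at a few sample sizes. It assumes the forms SIC = ln[n] k - 2ln[Lmax], AIC = (2n / (n - k - 1)) k - 2ln[Lmax] (the small-sample corrected version) and HQIC = 2ln[ln[n]] k - 2ln[Lmax]. The function names are illustrative only and are not ModelRisk functions.

```python
import math

def penalty_coefficients(n, k):
    """Return the multiplier of k in each criterion (assumed forms, see lead-in)."""
    if n - k - 1 <= 0:
        raise ValueError("the corrected AIC needs n > k + 1")
    return {
        "SIC": math.log(n),
        "AIC": 2.0 * n / (n - k - 1),          # small-sample corrected AIC (assumption)
        "HQIC": 2.0 * math.log(math.log(n)),
    }

def information_criteria(log_lik_max, n, k):
    """Return SIC, AIC and HQIC given ln[Lmax], n observations and k parameters."""
    return {name: coef * k - 2.0 * log_lik_max
            for name, coef in penalty_coefficients(n, k).items()}

if __name__ == "__main__":
    k = 2  # e.g. a Normal distribution: mu and sigma
    for n in (10, 20, 40, 100):
        coefs = penalty_coefficients(n, k)
        strictest = max(coefs, key=coefs.get)
        mildest = min(coefs, key=coefs.get)
        rounded = {name: round(c, 3) for name, c in coefs.items()}
        print(n, rounded, "strictest:", strictest, "mildest:", mildest)
```

With k = 2, under these assumed forms, the HQIC carries the lightest penalty at n = 10, while at n = 40 the SIC is the strictest and the AIC the lightest. The exact crossover points shift with k because the corrected AIC coefficient depends on it.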
ModelRisk applies these three criteria as a means of ranking each fitted model, whether it be fitting a distribution, a time series model or a copula. The following functions return the AIC, SIC or HQIC of a Fit Object directly to the spreadsheet:
If you fit a number of models to your data, try not to automatically pick the fitted distribution with the best statistical result, particularly if the top two or three are close. Also look at the range and shape of each fitted distribution and see whether they correspond to what you think is appropriate.
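As an illustration of ranking fits while keeping an eye on how close the candidates are, the following sketch fits a Normal and an Exponential distribution to a small made-up data set by closed-form MLE, then reports each model's SIC together with its gap to the best one. The data, the function names and the choice of the SIC (taken here as ln[n] k - 2ln[Lmax]) are assumptions for the example; it does not use or reproduce ModelRisk's functions.

```python
import math

def normal_loglik(data):
    """Maximized log-likelihood of a Normal fit (MLE: sample mean, 1/n variance)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def exponential_loglik(data):
    """Maximized log-likelihood of an Exponential fit (MLE: rate = 1 / mean)."""
    n = len(data)
    rate = n / sum(data)
    # n*ln(rate) - rate*sum(x), where rate*sum(x) equals n at the MLE
    return n * math.log(rate) - n

def sic(log_lik_max, n, k):
    """Schwarz information criterion, assumed form ln[n] k - 2ln[Lmax]."""
    return math.log(n) * k - 2.0 * log_lik_max

def rank_fits(data):
    """Return (name, SIC, gap to best) tuples, best (lowest SIC) first."""
    n = len(data)
    candidates = {
        "Normal": (normal_loglik(data), 2),            # k = 2: mu, sigma
        "Exponential": (exponential_loglik(data), 1),  # k = 1: rate
    }
    scores = {name: sic(ll, n, k) for name, (ll, k) in candidates.items()}
    best = min(scores.values())
    return sorted(((name, s, s - best) for name, s in scores.items()),
                  key=lambda row: row[1])

if __name__ == "__main__":
    data = [0.3, 0.5, 0.9, 1.1, 1.6, 2.2, 2.4, 3.8, 5.1, 7.4]  # made-up values
    for name, score, gap in rank_fits(data):
        print(f"{name:12s} SIC = {score:8.3f}  gap to best = {gap:6.3f}")
```

A small gap between the top candidates is a cue to compare the range and shape of the fitted distributions rather than rely on the ranking alone.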