Comparing fitted models using the SIC, HQIC or AIC information criteria | Vose Software

See also: Fitting distributions to data, Goodness of fit statistics, Fitting in ModelRisk

Although still popular today, the Chi-Squared, Kolmogorov-Smirnov and Anderson-Darling goodness of fit statistics are all technically inappropriate as a method of comparing the fits of distributions to data. They are also limited to precise observations and cannot incorporate censored, truncated or binned data. That said, most of the time we are fitting a continuous distribution to a set of precise observations, in which case the Anderson-Darling statistic does a reasonable job.

For important work you should instead consider using statistical measures of fit called information criteria.

Let:

  • n = number of observations (e.g. data values, frequencies)

  • k = number of parameters to be estimated (e.g. the Normal distribution has 2: mu and sigma)

  • Lmax = the maximized value of the log-Likelihood for the estimated model (i.e. fit the parameters by MLE and record the natural log of the Likelihood.)

SIC (Schwarz information criterion, aka Bayesian information criterion BIC)

                SIC = ln(n) k - 2ln[Lmax]

AIC (Akaike information criterion)

                AIC = 2k - 2ln[Lmax]

HQIC (Hannan-Quinn information criterion)

                HQIC = 2 ln(ln(n)) k - 2ln[Lmax]

The aim is to find the model with the lowest value of the selected information criterion. The -2ln[Lmax] term appearing in each formula is an estimate of the deviance of the model fit. The coefficient of k in the first part of each formula shows the degree to which the number of model parameters is penalized. SIC (Schwarz, 1978) and HQIC (Hannan and Quinn, 1979) penalize the loss of degrees of freedom more strictly than AIC (Akaike, 1974, 1976).
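As a cross-check of the definitions above, the sketch below fits a Normal distribution by MLE, records the maximized log-likelihood, and evaluates all three criteria. It uses Python with numpy and scipy rather than ModelRisk, purely for illustration; the synthetic data and seed are assumptions.

```python
import numpy as np
from scipy import stats

# Illustrative data: 200 draws from a Normal(10, 2) (assumed example)
rng = np.random.default_rng(1)
data = rng.normal(loc=10.0, scale=2.0, size=200)

# Fit the Normal by MLE: k = 2 estimated parameters (mu, sigma)
mu, sigma = stats.norm.fit(data)
log_l = np.sum(stats.norm.logpdf(data, mu, sigma))  # ln[Lmax]

n, k = len(data), 2
aic = 2 * k - 2 * log_l
sic = np.log(n) * k - 2 * log_l              # aka BIC
hqic = 2 * np.log(np.log(n)) * k - 2 * log_l

print(f"AIC = {aic:.2f}, SIC = {sic:.2f}, HQIC = {hqic:.2f}")
```

Since ln(n) > 2 ln(ln(n)) > 2 for the sample sizes typically encountered, SIC applies the heaviest penalty per parameter and AIC the lightest, which is why SIC and HQIC are described as stricter than AIC.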

ModelRisk applies these three criteria to rank each fitted model, whether it is a fitted distribution, a time series model or a copula. ModelRisk functions can also return the AIC, SIC or HQIC of a Fit Object directly to the spreadsheet.

If you fit a number of models to your data, try not to automatically pick the fitted distribution with the best statistical result, particularly if the top two or three are close. Also look at the range and shape of each fitted distribution and check whether they correspond to what you know about the variable being modelled.
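The ranking idea can be sketched as follows, again using Python and scipy as a stand-in for ModelRisk's fitting; the candidate list, data and fixed-location choices are illustrative assumptions. Note that k counts only the parameters actually estimated, since fixing a parameter (here loc = 0) removes it from the penalty.

```python
import numpy as np
from scipy import stats

# Illustrative data: 300 draws from a Gamma distribution (assumed example)
rng = np.random.default_rng(7)
data = rng.gamma(shape=3.0, scale=2.0, size=300)

# Candidate models: (name, scipy distribution, keyword args fixing parameters)
candidates = [
    ("Normal", stats.norm, {}),
    ("Gamma", stats.gamma, {"floc": 0}),
    ("Lognormal", stats.lognorm, {"floc": 0}),
]

ranking = []
for name, dist, fit_kw in candidates:
    params = dist.fit(data, **fit_kw)            # MLE fit
    k = len(params) - len(fit_kw)                # free parameters estimated
    log_l = np.sum(dist.logpdf(data, *params))   # ln[Lmax]
    aic = 2 * k - 2 * log_l
    ranking.append((aic, name))

for aic, name in sorted(ranking):                # lowest AIC ranks first
    print(f"{name}: AIC = {aic:.1f}")
```

Even with such a ranking in hand, the advice above still applies: if the top two or three criteria values are close, inspect the fitted shapes rather than accepting the winner mechanically.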

 
