Critical Values and Confidence Intervals for Goodness-of-Fit Statistics

See also: Fitting distributions to data, Fitting in ModelRisk, Analyzing and using data introduction

Analysis of the χ² (chi-squared), K-S and A-D statistics can provide confidence levels that reflect the probability that the fitted distribution could have produced the observed data. It is important to note that this is not equivalent to the probability that the data did, in fact, come from the fitted distribution, since there may be many distributions that have similar shapes and that could have been quite capable of generating the observed data. This is particularly so for data that are approximately normally distributed, since many distributions tend to a Normal shape under certain conditions.

Critical values are determined by the required confidence level α. A critical value is the value of the goodness-of-fit statistic that has a probability of being exceeded equal to the specified confidence level. Critical values for the χ² test are found directly from the χ² distribution. The shape and range of the χ² distribution are defined by the degrees of freedom ν, where:

                ν = N − a − 1

N = number of histogram bars or classes

a = number of parameters that are estimated to determine the best-fitting distribution
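
As a rough illustration of how a χ² critical value follows from the degrees of freedom N − a − 1, the sketch below uses only the Python standard library. The function names chi2_cdf and chi2_critical_value are hypothetical and are not part of ModelRisk. The sketch assumes that the confidence level is the probability of the statistic being exceeded, evaluates the χ² cumulative distribution with a series expansion of the incomplete gamma function, and finds the critical value by bisection.

```python
import math


def chi2_cdf(x, df):
    """Chi-squared CDF, computed from the series for the regularized
    lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    # All terms are positive, so the sum is numerically stable.
    while term > 1e-16 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_critical_value(alpha, n_classes, n_params):
    """Value exceeded with probability alpha for N - a - 1 degrees of freedom."""
    df = n_classes - n_params - 1
    if df < 1:
        raise ValueError("need at least one degree of freedom")
    target = 1.0 - alpha
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < target:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(100):  # bisection on the CDF
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, chi2_critical_value(0.05, 10, 2) corresponds to 10 histogram classes and 2 estimated parameters, i.e. 7 degrees of freedom, and returns approximately 14.07.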

Critical values for the K-S and A-D statistics have been found by Monte Carlo simulation (Stephens 1974, Stephens 1977 and Chandra et al. 1981). Tables of critical values for the K-S statistic are very commonly found in statistical textbooks. Unfortunately, the standard K-S and A-D critical values are of limited use if there are fewer than about 30 data points. The problem arises because these statistics are designed to test whether a distribution with known parameters could have produced the observed data. If the parameters of the fitted distribution have been estimated from the data, the statistic tends to be smaller than the standard tables assume, so the K-S and A-D tests produce conservative results, i.e. a poorly fitting distribution is less likely to be rejected than the stated confidence level implies. The size of this effect varies with the type of distribution being fitted.
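
The effect of estimating parameters can be seen with a small Monte Carlo experiment. The sketch below is an illustration only, not the procedure used in the cited papers: it simulates Normal samples, computes the K-S statistic against either the true Normal(0, 1) distribution or a Normal distribution fitted to each sample, and reads off the simulated critical value. The function names, the number of simulations and the seed are arbitrary assumptions.

```python
import random
from statistics import NormalDist, mean, stdev


def ks_statistic(sample, cdf):
    """Largest vertical distance between the empirical CDF and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def simulated_critical_value(n, alpha, estimate_params, n_sims=2000, seed=1):
    """Simulated K-S value exceeded with probability alpha for Normal data."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if estimate_params:
            # Parameters estimated from the same data that is being tested.
            dist = NormalDist(mean(sample), stdev(sample))
        else:
            dist = NormalDist(0.0, 1.0)
        stats.append(ks_statistic(sample, dist.cdf))
    stats.sort()
    k = min(round((1.0 - alpha) * n_sims), n_sims - 1)
    return stats[k]


known = simulated_critical_value(20, 0.05, estimate_params=False)
fitted = simulated_critical_value(20, 0.05, estimate_params=True)
print(known, fitted)
```

With 20 data points the simulated critical value for fitted parameters should come out clearly below the one for known parameters, which is why standard tables are too lenient when the parameters have been estimated from the data.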

Modifications to the K-S and A-D statistics have been determined to correct for this problem, as follows, where n is the number of data points and Dn and An² are the unmodified K-S and A-D statistics respectively:

 

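Kolmogorov-Smirnov Statistics

Distribution               Modified Test Statistic
Normal                     (√n − 0.01 + 0.85/√n) Dn
Exponential                (Dn − 0.2/n)(√n + 0.26 + 0.5/√n)
Weibull & Extreme Value    √n Dn
All others                 (√n + 0.12 + 0.11/√n) Dn

Anderson-Darling Statistics

Distribution               Modified Test Statistic
Normal                     (1 + 4/n − 25/n²) An²
Exponential                (1 + 0.6/n) An²
Weibull & Extreme Value    (1 + 0.2/√n) An²
All others                 An²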
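
A minimal sketch of how such modifications might be applied in code follows. It assumes the commonly tabulated forms of the modified statistics attributed to Stephens and to Chandra et al.; these coefficients are an assumption of this sketch and should be checked against the original tables before use. The function names and the family labels ("normal", "exponential", "weibull", "other") are hypothetical.

```python
import math


def modified_ks(d_n, n, family="other"):
    """Modified K-S statistic from the unmodified statistic d_n (assumed forms)."""
    root = math.sqrt(n)
    if family == "normal":
        return (root - 0.01 + 0.85 / root) * d_n
    if family == "exponential":
        return (d_n - 0.2 / n) * (root + 0.26 + 0.5 / root)
    if family == "weibull":  # also used for the Extreme Value distribution
        return root * d_n
    return (root + 0.12 + 0.11 / root) * d_n


def modified_ad(a2_n, n, family="other"):
    """Modified A-D statistic from the unmodified statistic a2_n (assumed forms)."""
    if family == "normal":
        return (1.0 + 4.0 / n - 25.0 / n ** 2) * a2_n
    if family == "exponential":
        return (1.0 + 0.6 / n) * a2_n
    if family == "weibull":  # also used for the Extreme Value distribution
        return (1.0 + 0.2 / math.sqrt(n)) * a2_n
    return a2_n
```

The modified value, not the raw statistic, is then compared with the critical value tabulated for the corresponding distribution family.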

Another goodness-of-fit statistic with intuitive appeal, similar to the A-D and K-S statistics, is the Cramér-von Mises statistic Y:

                Y = 1/(12n) + Σ(i=1..n) [F0(Xi) − (2i − 1)/(2n)]²

Apart from the constant 1/(12n), the statistic sums the squared differences between the cumulative percentile F0(Xi) of the fitted distribution at each observation Xi (sorted in ascending order) and (2i − 1)/(2n), the average of i/n and (i − 1)/n, which are the high and low steps of the empirical cumulative distribution at Xi. Tables for this statistic can be found in Anderson and Darling (1952).
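
The calculation can be sketched in a few lines. The example below assumes the usual form of the statistic, Y = 1/(12n) + Σ [F0(Xi) − (2i − 1)/(2n)]², with the observations sorted in ascending order. The function name and the fitted_cdf argument are hypothetical.

```python
def cramer_von_mises(sample, fitted_cdf):
    """Cramer-von Mises statistic Y for a sample and a fitted CDF F0."""
    xs = sorted(sample)
    n = len(xs)
    total = 1.0 / (12.0 * n)
    for i, x in enumerate(xs, start=1):
        # (2i - 1) / (2n) is the midpoint of the empirical CDF step at x.
        midpoint = (2 * i - 1) / (2.0 * n)
        total += (fitted_cdf(x) - midpoint) ** 2
    return total
```

For example, cramer_von_mises(data, NormalDist(mu, sigma).cdf), with NormalDist taken from the Python statistics module, would evaluate Y for a fitted Normal distribution. Smaller values indicate a closer match between the fitted and empirical cumulative distributions.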

 
