Kolmogorov-Smirnov (K-S) Statistic | Vose Software

# Kolmogorov-Smirnov (K-S) Statistic

The K-S statistic $D_n$ is defined as:

$$D_n = \max_x \left| F_n(x) - F(x) \right|$$

where $D_n$ is known as the K-S distance

$n$ = total number of data points

$F(x)$ = distribution function of the fitted distribution

$F_n(x) = i/n$

$i$ = the cumulative rank of the data point

The K-S statistic is thus only concerned with the maximum vertical distance between the cumulative distribution function of the fitted distribution and the cumulative distribution of the data. The figure below illustrates the concept for data fitted to a Uniform(0,1) distribution.

Figure 1: Illustration of a K-S distance determination for a Uniform distribution

### Method

·       The data are ranked in ascending order

·       The upper $F_U(i)$ and lower $F_L(i)$ cumulative percentiles are calculated as follows:

$$F_L(i) = \frac{i - 1}{n}, \qquad F_U(i) = \frac{i}{n}$$

where $i$ = the rank of the data point and $n$ = the total number of data points.

·       F(x) is calculated for the Uniform distribution (in this case F(x) = x)

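·       The maximum distance $D_i$ between the fitted value $F(x_i)$ at the $i$-th ranked data point $x_i$ and the two empirical percentiles $F_L(i)$ and $F_U(i)$ is calculated for each $i$:

$$D_i = \max\left( \left| F(x_i) - F_L(i) \right|,\ \left| F(x_i) - F_U(i) \right| \right)$$

where $|\cdot|$ denotes the absolute value

·       The maximum value of the $D_i$ distances is then the K-S distance $D_n$:

$$D_n = \max_i \{ D_i \}$$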
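The steps above can be sketched in Python. This is a minimal illustration, not Vose Software's implementation: the names `ks_distance` and `uniform01_cdf` and the sample values are hypothetical, and the sketch assumes the lower and upper cumulative percentiles are (i - 1)/n and i/n respectively.

```python
def ks_distance(data, cdf):
    """Return the K-S distance Dn between `data` and a fitted CDF.

    `cdf` is any callable returning F(x) for the fitted distribution.
    """
    xs = sorted(data)                  # rank the data in ascending order
    n = len(xs)
    if n == 0:
        raise ValueError("data must contain at least one point")

    d_n = 0.0
    for i, x in enumerate(xs, start=1):
        f_x = cdf(x)                   # fitted F(x) at the i-th ranked point
        f_lower = (i - 1) / n          # assumed FL(i): empirical CDF just below x
        f_upper = i / n                # assumed FU(i): empirical CDF at x
        d_i = max(abs(f_x - f_lower), abs(f_x - f_upper))
        d_n = max(d_n, d_i)            # keep the largest Di seen so far
    return d_n


def uniform01_cdf(x):
    """CDF of Uniform(0,1): F(x) = x, clamped to the interval [0, 1]."""
    return min(1.0, max(0.0, x))


# Made-up illustrative sample, not data from the article.
sample = [0.10, 0.35, 0.40, 0.80, 0.95]
print(ks_distance(sample, uniform01_cdf))
```

For this made-up sample the largest gap is about 0.2, reached at both the third and fourth ranked points. Passing the CDF as a callable keeps the same function usable for other fitted distributions.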

The K-S statistic is generally more useful than the χ² (chi-squared) statistic because the data are assessed at all data points, which avoids the problem of choosing how many bands to split the data into. However, its value is determined only by the single largest discrepancy and takes no account of the lack of fit across the rest of the distribution.

The difference between the observed distribution $F_n(x)$ and the theoretical fitted distribution $F(x)$ at any point, say $x_0$, itself has a distribution with a mean of zero and a standard deviation $\sigma_{\text{K-S}}$ given by binomial theory:

$$\sigma_{\text{K-S}} = \sqrt{\frac{F(x_0)\,\left[1 - F(x_0)\right]}{n}}$$
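To get a feel for the size of this standard deviation, the short sketch below evaluates it at a few cumulative probabilities. It assumes the usual binomial form, the square root of F(x0)(1 - F(x0))/n, since the number of points falling at or below x0 is Binomial(n, F(x0)). The function name `ks_std_dev` is hypothetical.

```python
import math


def ks_std_dev(f_x0, n):
    """Standard deviation of Fn(x0) - F(x0), given f_x0 = F(x0) and sample size n.

    Assumes the count of points at or below x0 is Binomial(n, F(x0)).
    """
    if not 0.0 <= f_x0 <= 1.0:
        raise ValueError("F(x0) must lie between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return math.sqrt(f_x0 * (1.0 - f_x0) / n)


# Evaluate at a few cumulative probabilities for n = 100.
for p in (0.01, 0.25, 0.50, 0.75, 0.99):
    print(f"F(x0) = {p:.2f}  std dev = {ks_std_dev(p, 100):.4f}")
```

With n = 100 the standard deviation peaks at 0.05 where F(x0) = 0.5 and drops to roughly 0.01 at F(x0) = 0.01 or 0.99. Because the expression depends on x0 only through F(x0), its shape along the x-axis follows the shape of the fitted distribution's CDF.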

The size of the K-S standard deviation $\sigma_{\text{K-S}}$ varies considerably over the x-range of a fitted distribution, and depends greatly on the type of distribution being fitted, as shown in the graphs below:


Figure 2: Standard deviation $\sigma_{\text{K-S}}$ for a number of distribution types with n = 100

$D_n$ is more likely to occur at x-values where $\sigma_{\text{K-S}}$ is greatest which, as Figure 2 shows, will generally be away from the low-probability tails. This insensitivity of the K-S statistic to lack of fit at the extremes of the distribution is corrected for in the Anderson-Darling statistic.