Fitting a continuous non-parametric first-order distribution to data


See also: Fitting distributions to data, Fitting in ModelRisk, Analyzing and using data introduction

If the observed data are continuous and reasonably extensive, it is often sufficient to use a cumulative frequency plot of the data points themselves (sometimes known as an ogive) to define the variable's probability distribution. The figure below illustrates an example with 18 data points.

The observed F(x) values are calculated as the expected F(x) that would correspond to a random sample from the distribution, i.e. F(xi) = i / (n + 1), where i is the rank of the observed data point in ascending order and n is the number of data points. An explanation for this formula is provided here. Determination of the empirical cumulative distribution proceeds as follows:

This formula maximises the chance of replicating the true distribution. Use the VoseOgive1 function to generate an array of these F(xi) values directly.
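As an illustration of the F(xi) = i / (n + 1) calculation outside ModelRisk, the following is a minimal Python sketch using only the standard library. It is not the VoseOgive1 implementation. The function names ogive_points and sample_ogive are hypothetical, and the minimum and maximum arguments are assumptions introduced here: analyst-supplied bounds at which F is taken to be 0 and 1, so that the ogive can be inverted by linear interpolation.

```python
import random
from bisect import bisect_right


def ogive_points(data):
    """Return the sorted data and their plotting positions F(x_i) = i / (n + 1)."""
    xs = sorted(float(v) for v in data)
    n = len(xs)
    fs = [i / (n + 1) for i in range(1, n + 1)]
    return xs, fs


def sample_ogive(data, minimum, maximum, size, seed=None):
    """Draw random values by inverting the piecewise-linear ogive.

    minimum and maximum are assumed bounds (F = 0 and F = 1); they are
    an assumption of this sketch, not something derived from the data.
    """
    xs, fs = ogive_points(data)
    if not xs:
        raise ValueError("data must not be empty")
    if minimum > xs[0] or maximum < xs[-1]:
        raise ValueError("minimum and maximum must enclose the data")

    grid_x = [minimum] + xs + [maximum]
    grid_f = [0.0] + fs + [1.0]

    rng = random.Random(seed)
    samples = []
    for _ in range(size):
        u = rng.random()  # uniform on [0, 1)
        # Locate the segment with grid_f[j - 1] <= u < grid_f[j].
        j = bisect_right(grid_f, u)
        f0, f1 = grid_f[j - 1], grid_f[j]
        x0, x1 = grid_x[j - 1], grid_x[j]
        # Interpolate linearly between the two neighbouring ogive points.
        samples.append(x0 + (x1 - x0) * (u - f0) / (f1 - f0))
    return samples
```

For example, ogive_points([3, 1, 2]) ranks the three values and assigns them cumulative probabilities of 1/4, 2/4 and 3/4.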

If there is a very large amount of data, it becomes impracticable to use all of the data points to define the cumulative distribution. In such cases, it is useful to batch the data first. The number of batches should balance fineness of detail (a larger number of bars) against the practicalities of having large arrays defining the distribution (a smaller number of bars).
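One possible way to batch the data is sketched below in Python. The text above does not prescribe a batching rule, so this sketch makes two assumptions: batches are of equal width between an analyst-supplied minimum and maximum, and the cumulative probability at each batch edge is the fraction of data points at or below it. The function name batch_cumulative is hypothetical.

```python
def batch_cumulative(data, minimum, maximum, n_batches):
    """Batch data into equal-width bars and return (edges, cumulative probabilities).

    Assumptions of this sketch: equal-width batches over [minimum, maximum],
    and cumulative probability at each edge = cumulative count / n.
    """
    n = len(data)
    if n == 0:
        raise ValueError("data must not be empty")
    if n_batches < 1 or maximum <= minimum:
        raise ValueError("need n_batches >= 1 and maximum > minimum")

    width = (maximum - minimum) / n_batches
    counts = [0] * n_batches
    for v in data:
        if v < minimum or v > maximum:
            raise ValueError("data point outside [minimum, maximum]")
        # A value equal to the maximum is placed in the last batch.
        k = min(int((v - minimum) / width), n_batches - 1)
        counts[k] += 1

    edges = [minimum + k * width for k in range(n_batches + 1)]
    edges[-1] = maximum  # avoid floating-point drift at the upper bound

    cumulative = [0.0]
    running = 0
    for c in counts:
        running += c
        cumulative.append(running / n)
    return edges, cumulative
```

The returned arrays have n_batches + 1 entries regardless of how many data points there are, which is the trade-off described above: more batches preserve more detail, fewer batches keep the arrays small.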