Fitting a distribution to truncated, censored or binned data | Vose Software

Fitting a distribution to truncated, censored or binned data

Maximum likelihood methods offer the greatest flexibility for distribution fitting because we need only be able to write a probability model that corresponds with how our data are observed and then maximise that probability by varying the parameters.

Censored data are those observations that we do not know precisely, only that they fall above or below a certain value. For example, a weighing scale will have a maximum value X it can record: we might have some measurements off the scale, and all we can say is that they are greater than X.

Truncated data are those observations that we never see because they fall above or below some level. For example, a bank may not be required to record errors below $100, and a sieve system may not pick out diamonds from a river below a certain diameter.

Binned data are those observations whose values we know only in terms of bins or categories. For example, a survey might record customers' ages as (0,10], (10,20], (20,40] and 40+ years.

It is a simple matter to produce a probability model for each category or combination, as shown in the following examples where we are fitting to a continuous variable with density f(x) and cumulative probability F(x).

Censored data

Observations: Measurement censored at Min and Max. Observations between Min and Max are a, b, c, d and e. There are p observations below Min and q observations above Max.

Likelihood function: f(a)*f(b)*f(c)*f(d)*f(e)*F(Min)^p*(1-F(Max))^q

Explanation: For p values we know only that they lie below Min, and the probability of lying below Min is F(Min). Likewise, each of the q values above Max has probability (1-F(Max)). For the remaining values we have exact measurements, so each contributes its density f(x).
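The censored likelihood above can be sketched in code by working with its logarithm. This is a minimal illustration, not a definitive implementation: the normal distribution as the candidate model, the data values, the helper names (safe_log, censored_log_likelihood) and the coarse grid search are all assumptions made for the example. In practice a numerical optimiser would replace the grid search.

```python
import math
from statistics import NormalDist


def safe_log(p):
    """Return log(p), or -inf when p has underflowed to zero."""
    return math.log(p) if p > 0 else -math.inf


def censored_log_likelihood(dist, exact, low, high, n_below, n_above):
    """log[ f(a)*...*f(e) * F(Min)^p * (1 - F(Max))^q ]."""
    ll = sum(safe_log(dist.pdf(x)) for x in exact)
    if n_below:
        ll += n_below * safe_log(dist.cdf(low))
    if n_above:
        ll += n_above * safe_log(1.0 - dist.cdf(high))
    return ll


# Hypothetical data: five exact readings, two below Min, three above Max.
exact = [4.1, 5.3, 5.9, 6.4, 7.2]
low, high = 3.0, 8.0
n_below, n_above = 2, 3

# Assumed candidate model: normal. Coarse grid over (mu, sigma), step 0.1.
grid = [(m / 10, s / 10) for m in range(30, 91) for s in range(5, 41)]
mu_hat, sigma_hat = max(
    grid,
    key=lambda ms: censored_log_likelihood(
        NormalDist(ms[0], ms[1]), exact, low, high, n_below, n_above
    ),
)
print(f"mu = {mu_hat:.1f}, sigma = {sigma_hat:.1f}")
```

Taking logs turns the product into a sum, which avoids numerical underflow when there are many observations.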

Truncated data

Observations: Measurement truncated at Min and Max. Observations between Min and Max are a,b,c,d and e.

Likelihood function: f(a)*f(b)*f(c)*f(d)*f(e)/(F(Max)-F(Min))^5

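Explanation: We only observe a value if it lies between Min and Max, which has probability (F(Max)-F(Min)). The density of each of the five observations is therefore divided by this probability, so that the truncated density integrates to one over the observable range.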

In the ModelRisk distribution fit window you can indicate whether the measurements were truncated, and provide the min and max.
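As a rough sketch of the truncated likelihood, the function below computes its logarithm for any number of observed values. It is independent of ModelRisk. The normal candidate distribution, its parameters, the data values and the function name truncated_log_likelihood are all hypothetical, chosen only to illustrate the calculation.

```python
import math
from statistics import NormalDist


def truncated_log_likelihood(dist, observed, low, high):
    """log[ f(a)*...*f(e) / (F(Max) - F(Min))^n ] for n observed values."""
    mass = dist.cdf(high) - dist.cdf(low)  # probability of being observed
    if mass <= 0:
        return -math.inf
    ll = 0.0
    for x in observed:
        density = dist.pdf(x)
        if density <= 0:
            return -math.inf
        ll += math.log(density)
    return ll - len(observed) * math.log(mass)


# Hypothetical data: only values between Min = 100 and Max = 500 are recorded.
observed = [120.0, 150.0, 210.0, 340.0, 480.0]
low, high = 100.0, 500.0

# Assumed candidate parameters, for illustration only.
candidate = NormalDist(mu=250.0, sigma=150.0)
with_truncation = truncated_log_likelihood(candidate, observed, low, high)
ignoring_truncation = sum(math.log(candidate.pdf(x)) for x in observed)
print(with_truncation, ignoring_truncation)
```

Because F(Max)-F(Min) is less than one, the truncation term raises the log-likelihood relative to the untruncated value. Leaving it out would bias the fitted parameters towards distributions that place little mass outside the observed range.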

Binned data

Observations: Measurement binned into contiguous categories as follows:

 Bin      Frequency
 0-10     10
 10-20    23
 20-50    42
 50+      8

Likelihood function: F(10)^10*(F(20)-F(10))^23*(F(50)-F(20))^42*(1-F(50))^8

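Explanation: A value falls in a bin bounded by Low and High with probability F(High) - F(Low), and each bin's probability is raised to the power of its frequency. The first bin contributes F(10), which assumes no probability lies below 0, and the open-ended last bin contributes 1 - F(50).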
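The binned likelihood can be sketched the same way, using the frequencies from the table in this section. The normal candidate distribution, the grid search and the function name binned_log_likelihood are assumptions made for illustration. The first bin is treated as open below so that it contributes F(10), as in the likelihood function given here.

```python
import math
from statistics import NormalDist


def binned_log_likelihood(cdf, bins):
    """Sum of count * log(F(High) - F(Low)) over the bins.

    Each bin is (low, high, count); None marks an open-ended side.
    """
    ll = 0.0
    for low, high, count in bins:
        p_low = 0.0 if low is None else cdf(low)
        p_high = 1.0 if high is None else cdf(high)
        mass = p_high - p_low
        if count:
            if mass <= 0:
                return -math.inf
            ll += count * math.log(mass)
    return ll


# Bins and frequencies from the table: 0-10, 10-20, 20-50, 50+.
bins = [(None, 10, 10), (10, 20, 23), (20, 50, 42), (50, None, 8)]

# Assumed candidate model: normal. Coarse grid over (mu, sigma), step 1.
grid = [(m, s) for m in range(10, 51) for s in range(5, 41)]
mu_hat, sigma_hat = max(
    grid,
    key=lambda ms: binned_log_likelihood(NormalDist(ms[0], ms[1]).cdf, bins),
)
print(f"mu = {mu_hat}, sigma = {sigma_hat}")
```

Passing the CDF as a callable keeps the function independent of the distribution family, so any F(x) can be substituted for the normal used here.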