Fitting distributions to data | Vose Software

Fitting distributions to data

See also: ModelRisk functions and windows, Distribution fitting functions, Fitting distributions to data

 

Introduction

In the Distribution Fit window you can fit distributions to a set of data in the spreadsheet. The distribution's parameters are estimated using maximum likelihood estimation (MLE).
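As a rough illustration of what maximum likelihood estimation does, the sketch below fits a Normal distribution to a small sample in plain Python. It is not ModelRisk's implementation: the data values and the function names `normal_log_likelihood` and `fit_normal_mle` are hypothetical, and the Normal is used only because its maximum likelihood estimates have a simple closed form.

```python
import math
import statistics


def normal_log_likelihood(data, mu, sigma):
    """Sum of the log densities of Normal(mu, sigma) over all data points."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - squared_error / (2 * sigma ** 2))


def fit_normal_mle(data):
    """Closed-form MLE for a Normal: sample mean and population std deviation."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data, mu)  # divides by n, not n - 1
    return mu, sigma


# Hypothetical data standing in for a spreadsheet range.
data = [4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.1, 5.7]
mu_hat, sigma_hat = fit_normal_mle(data)
max_log_likelihood = normal_log_likelihood(data, mu_hat, sigma_hat)
```

Moving either parameter away from its estimate lowers the log-likelihood, which is what makes these values the maximum likelihood fit.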

The fitted distributions are ranked according to the SIC, AIC (Akaike) and HQIC information criteria. For all three, a lower value indicates a better fit. To avoid confusion, the negatives of these criteria are displayed in the list. This means that:

the higher the value shown in the list, the better the fit.

AIC and the other information criteria are better statistics for ranking fits than alternatives such as chi-squared, because they take into account the number of parameters estimated and penalize overfitting: a model that fits well using fewer parameters is preferred over one that needs more. You can read more about information criteria here.

AIC is the least strict of the three in penalizing additional parameters, while SIC is the strictest.
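To make the penalty comparison concrete, the sketch below computes the three criteria in their common textbook forms: AIC = 2k - 2 ln L, SIC = k ln(n) - 2 ln L and HQIC = 2k ln(ln(n)) - 2 ln L, where k is the number of fitted parameters, n the number of data points and ln L the maximized log-likelihood. These forms are an assumption; the exact formulas ModelRisk uses may differ (for example, through a small-sample correction). The function names and log-likelihood values are hypothetical.

```python
import math


def information_criteria(log_likelihood, k, n):
    """Textbook AIC, SIC and HQIC; lower values indicate a better fit."""
    return {
        "AIC": 2 * k - 2 * log_likelihood,
        "SIC": k * math.log(n) - 2 * log_likelihood,
        "HQIC": 2 * k * math.log(math.log(n)) - 2 * log_likelihood,
    }


def displayed_values(criteria):
    """Negate each criterion so that higher means better, as in the list."""
    return {name: -value for name, value in criteria.items()}


# Hypothetical fits to the same 100 data points.
two_param = information_criteria(log_likelihood=-120.0, k=2, n=100)
three_param = information_criteria(log_likelihood=-119.5, k=3, n=100)

shown_two = displayed_values(two_param)
shown_three = displayed_values(three_param)
```

Here the three-parameter model has a slightly higher likelihood, but not by enough to offset its extra parameter, so the two-parameter model ranks higher on all three criteria. In these textbook forms the ordering "AIC least strict, SIC strictest" holds once n exceeds about 15, because ln(ln(n)) only rises above 1 beyond that point.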

The graph toolbar has additional buttons that show density, mass, P-P and Q-Q plots, for visual inspection of the quality of the fit. Also see Goodness of Fit Plots for a more detailed explanation.
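For readers who want to see what P-P and Q-Q plots compare, here is a minimal sketch that computes the plotted points for a fitted Normal distribution. The plotting position (i - 0.5)/n, the axis order, the function name `pp_qq_points` and the fitted parameters are assumptions chosen for illustration; they are not taken from ModelRisk.

```python
from statistics import NormalDist


def pp_qq_points(data, dist):
    """Return (P-P points, Q-Q points) comparing data with a fitted distribution.

    P-P pairs: (empirical probability, fitted cumulative probability).
    Q-Q pairs: (fitted quantile, observed value).
    """
    ordered = sorted(data)
    n = len(ordered)
    # Assumed plotting position: midpoint of each step of the empirical CDF.
    probabilities = [(i - 0.5) / n for i in range(1, n + 1)]
    pp = [(p, dist.cdf(x)) for p, x in zip(probabilities, ordered)]
    qq = [(dist.inv_cdf(p), x) for p, x in zip(probabilities, ordered)]
    return pp, qq


# Hypothetical data and fitted parameters.
data = [4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.1, 5.7]
fitted = NormalDist(mu=5.175, sigma=0.56)
pp, qq = pp_qq_points(data, fitted)
```

In both plots, points lying close to the 45-degree line indicate a good fit. P-P plots tend to emphasize discrepancies in the middle of the distribution, while Q-Q plots emphasize the tails.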

The fitted distribution can be dynamically linked to the spreadsheet data. From the fitted distribution you can insert a random value, percentile calculation, distribution object, etc. Also, the fitted parameters themselves can be inserted in the spreadsheet.

To list all the distributions you can fit click here.

Window elements

In the Data location field, you can specify where in the spreadsheet the data is located. Note that this can be an array of any dimension, though in most cases it will be one-dimensional.

You can take truncated data into account by checking the Enabled box. If enabled, the minimum and maximum can be provided.
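One standard way to account for truncation in a likelihood-based fit is to rescale the density by the probability mass that falls between the minimum and maximum. The sketch below shows that idea for a Normal distribution. Whether ModelRisk handles truncation in exactly this way is an assumption, and the function name, data and parameters are hypothetical.

```python
import math
from statistics import NormalDist


def truncated_log_likelihood(data, dist, minimum, maximum):
    """Log-likelihood of data observed only within [minimum, maximum]."""
    # Probability that an untruncated value falls inside the observed window.
    mass = dist.cdf(maximum) - dist.cdf(minimum)
    plain = sum(math.log(dist.pdf(x)) for x in data)
    # Each density is divided by `mass`, so n * log(mass) is subtracted in total.
    return plain - len(data) * math.log(mass)


# Hypothetical example: values could only be recorded between 0 and 4.
data = [1.2, 2.5, 3.1]
candidate = NormalDist(mu=2.0, sigma=1.0)
log_likelihood = truncated_log_likelihood(data, candidate, 0.0, 4.0)
```

Maximizing this quantity over the distribution's parameters, rather than the untruncated log-likelihood, avoids treating the missing values outside the window as evidence that the distribution is narrower than it really is.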

In the Distribution List, you specify which distributions to fit to the data. Add and remove distributions by pressing the Add and Remove buttons, respectively. Note that, when adding distributions to fit, you can only select distributions that can be fitted to this data. For example, you cannot:

  • fit a discrete distribution to continuous data; or

  • fit a bounded distribution (e.g. a Beta) to a data set that has data points outside of the distribution's boundaries.
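The two restrictions above can be expressed as a simple eligibility check. This is only a sketch of the logic, not ModelRisk's actual rule set: the function name `can_fit` and its parameters are hypothetical, and "discrete" is simplified here to mean integer-valued.

```python
import math


def can_fit(data, discrete, lower=-math.inf, upper=math.inf):
    """Return True if a distribution with the given support could fit the data."""
    # A discrete (integer-valued) distribution cannot produce fractional values.
    if discrete and not all(float(x).is_integer() for x in data):
        return False
    # Every data point must lie within the distribution's boundaries.
    return all(lower <= x <= upper for x in data)


# A Beta distribution is bounded on [0, 1], so a value of 1.3 rules it out.
beta_ok = can_fit([0.2, 0.7, 1.3], discrete=False, lower=0.0, upper=1.0)
# A count distribution (support 0, 1, 2, ...) cannot fit continuous data.
count_ok = can_fit([1.5, 2.0, 3.2], discrete=True, lower=0.0)
```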

In the Distribution List, you can rank the fitted distributions by the SIC, HQIC or AIC information criterion. The higher the value shown in the list, the better the fit.

By marking the checkbox you can choose whether to include uncertainty about the fitted distribution's parameters. On the preview graph, this uncertainty is shown as grey lines added to the fitted distribution's graph. To read the motivation behind this parameter click here. Selecting the ‘Overlay’ option allows you to compare the fits of two or more selected distributions on the same graph.



 

Below the graph you can specify the number of bins used to group the data (this only affects the image, not the fitting algorithm) and the number of lines generated to represent uncertainty in the fitted distribution.

Click the button above the preview graph to insert the fitted distribution in the spreadsheet. The following options are available:

  • Random sample (linked to data) - generate random values from the fitted distribution.
  • Random sample (not linked to data) - insert the fitted distribution with static values as parameters, i.e. not dynamically linked to the source data.
  • Estimated parameters - insert the fitted distribution's parameters.

  • Quantile(U) - calculate a quantile value of the fitted distribution (through the U parameter)

  • Object - construct a fitted distribution object.
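The relationship between the Quantile(U) and random sample options can be illustrated outside the spreadsheet. In the sketch below, a quantile is the inverse cumulative distribution function evaluated at U, and a random sample is the same calculation with U drawn from a Uniform(0, 1) distribution. The fitted parameters and function names are hypothetical, and this is not a description of how ModelRisk generates values internally.

```python
import random
from statistics import NormalDist

# Hypothetical fitted distribution.
fitted = NormalDist(mu=5.0, sigma=0.6)


def quantile(dist, u):
    """Value below which a fraction u of the distribution lies (0 < u < 1)."""
    return dist.inv_cdf(u)


def random_sample(dist, rng):
    """Draw one value by passing a Uniform(0, 1) number through quantile()."""
    u = rng.random()
    while u == 0.0:  # inv_cdf requires 0 < u < 1
        u = rng.random()
    return quantile(dist, u)


rng = random.Random(42)  # seeded so the draws are repeatable
median = quantile(fitted, 0.5)
p95 = quantile(fitted, 0.95)
draws = [random_sample(fitted, rng) for _ in range(5)]
```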

 

Industrial version only:

Clicking the “Create report” button above the chart will produce a fit report in a new worksheet, with the fitted models in a table. The table will contain the fitted distribution objects, goodness of fit rankings, statistics and percentiles of the fitted models. The report will also include the OptimatFit function, which automatically returns the best-fitting model according to the selected information criterion.

An example of such a report is available in the example model.

For explanations about other fields, buttons, graphs and summary statistics tables in this window, see Common elements of ModelRisk windows.

Useful tips and tricks

See also: Graphics, workflow and error handling in ModelRisk

Using View Function to return to a window

The output of ModelRisk windows always corresponds to VoseFunctions (the functions ModelRisk adds to Excel) being entered into one or more spreadsheet cells.

You can always re-open the window for a ModelRisk function that is in a spreadsheet cell by using View Function. Select the spreadsheet cell and then select View Function from the ModelRisk menu/toolbar/ribbon.

 

 
