Fitting in ModelRisk | Vose Software

Fitting in ModelRisk

See also: Fitting distributions to data,Estimating model parameters from data, Goodness of fit statistics, Comparing fitted models using the SIC HQIC or AIC information criterion








ModelRisk allows one to fit a distribution, time series or a copula to spreadsheet data.

All fits are performed using Maximum Likelihood Estimation (MLE) methods. In the fitting windows (see list on the right) different fitted models can be ranked according to SIC, HQIC or AIC (Akaike) information criteria.

About the uncertainty parameter

The uncertainty parameter is common to all ModelRisk fitting functions. It allows the inclusion of uncertainty about the fitted model parameter estimates. Unfortunately, it is common practice in risk analysis to use just the maximum likelihood estimates (MLEs) for a fitted distribution, copula or time series. However, when there are relatively few data available or when the model needs to be precise, omitting the uncertainty about the true parameter values can lead to significant underestimation of the model output uncertainty.

The Uncertainty parameter is set to FALSE by default (i.e. returns MLEs or projections based on MLEs) to coincide with common practice, but we strongly recommend setting it to TRUE. Uncertainty values are then generated for the fitted parameters using parametric bootstrapping techniques, which has the great advantage of allowing correlation structure between uncertain parameters and non-normal marginal uncertainty distributions, the latter being an important constraint of more classical methods based on asymptotic results (i.e. when the amount of data approaches infinity).

Distribution fitting functions

For non-parametric bootstrapping techniques for estimation of parameters, ModelRisk has the VoseNBoot functions.

For each univariate distribution in ModelRisk, a number of fitting functions are included:

VoseDistributionFit

Returns a sampled value from a distribution fitted to the data using Maximum Likelihood Estimation. The general syntax is:

=VoseDistributionFit({data}, Uncertainty, U)

where Distribution is replaced by the name of the distribution.

The parameters are:

  • {data} - array containing data to fit the distribution to.

  • Uncertainty - optional boolean parameter. Set TRUE to include uncertainty about the fitted distribution (as explained above), and FALSE (default) to use the MLE. When set to TRUE, a new fitted parameter value is used on each spreadsheet recalculation through bootstrapping techniques.

  • U - optional parameter specifying the cumulative percentile of the distribution. If omitted the function generates random values. Also see Distribution functions and the U parameter.

For example, if DataSet is an array of data, VoseNormalFit(DataSet) will return a random value from a Normal distribution that is the MLE fit to the DataSet. VoseNormalFit(DataSet,1) will use bootstrapping to simulate the uncertainty about the fitted parameters.

If we want to use VoseDistributionFit to generate multiple random values from a fitted distribution with uncertainty included (i.e. Uncertainty=TRUE), there are two ways to do this:  

1.      In one cell, or in many cells, but not as array function

2.      In many cells as array function

In the first case, the uncertainty and variability are mixed, because each random value is sampled form a different distribution. However in the second case, all random values are sampled from the same distribution and the distribution will change only with the next iteration.

You can read more about separating uncertainty and randomness in the Separating uncertainty from randomness and variability introduction topic.

VoseDistributionFitP

Array function that returns the parameters of the VoseDistribution fitted to the data. The general syntax is:

{=VoseDistributionFitP({data}, Uncertainty)}

where Distribution is replaced by the name of the distribution.

  • {data} - array containing data to fit the distribution to.

  • Uncertainty - optional boolean parameter. Set TRUE to include uncertainty about the fitted distribution (as explained above), and FALSE (default) to use the MLE.

The output array size should be one-dimensional, with the number of cells equal to the number of estimated parameters. The fitted parameters are returned in the same order as they are in the corresponding VoseDistribution (simulation) function.

So, for example,

{=VoseNormalFitP({1,2,2,3},0)}

should have an output of two cells. The function will return best fitting values for Mu and Sigma, in that order,  because the ModelRisk syntax for the normal distribution is VoseNormal(Mu, Sigma).

VoseDistributionFitObject

Constructs a distribution object of the fitted distribution.  General syntax:

=VoseDistributionFitObject({data}, Uncertainty)

where Distribution is replaced by the name of the distribution.

  • {data} - array containing data to fit the distribution to.

  • Uncertainty - optional boolean parameter. Set TRUE to include uncertainty about the fitted distribution, and FALSE (default) to use the MLE.

Time series fitting functions

For each time series in ModelRisk the following fitting functions are included.

VoseTimeSeriesFit

Generates a sequence of random values of a time series model fitted to the data using Maximum Likelihood Estimation. Syntax:

{=VoseTimeSeriesFit({data}, Uncertainty, Log Returns, Initial Value)}

where Series is replaced by the name of the time series.

  • {data} - array containing data to fit the time series to.

  • Uncertainty - optional boolean parameter. Set TRUE to include uncertainty about the fitted time series (as explained above), and FALSE (default) to use the MLE.

  • Log Returns - optional boolean (TRUE/FALSE) parameter that specifies whether the time series are in log returns. Default is FALSE.

  • Initial Value - last known historic value. The generated time series values will continue on from this value. Should only be provided if the Log Return parameter is set to FALSE or omitted.

For example, if DataSet is an array of historical data, {=VoseTimeAR1Fit(DataSet)}) will return a random value from a AR1 time series that is the MLE fit to the DataSet.

{=VoseTimeAR1Fit(DataSet,1)} will use bootstrapping to simulate the uncertainty about the fitted parameters.

When the data fitted to a time series takes negative values ModelRisk recognizes that these data can only be log returns, not the actual value of the variable. In this situation, the Log Return option is automatically selected and ModelRisk will produce a forecast of log returns, making Initial Value redundant as described above.

VoseTimeSeriesFitP

Array function that returns the parameters of a time series model fitted to the data using Maximum Likelihood Estimation. General syntax:

=VoseTimeSeriesFit({data}, Uncertainty, Log Returns)

where Series is replaced by the name of the time series.

    • {data} - array containing data to fit the time series to.

    • Uncertainty - optional boolean parameter. Set TRUE to include uncertainty about the fitted time series, and FALSE (default) to use the MLE.

    • Log Returns - optional boolean (TRUE/FALSE) parameter that specifies whether the time series are in log returns. Default is FALSE.

For example, if DataSet is an array of historical data, {=VoseTimeAR1FitP(DataSet)}) will return the parameters from a AR1 time series that is the MLE fit to the DataSet.

The output array size should be one-dimensional, with the number of cells equal to the number of estimated parameters. The fitted parameters are returned in the same order as they are in the corresponding VoseTimeSeries (simulating) function.

So, for example,

{=VoseTimeGBMFitP({1,2,2,3},0)}

should have an output of two cells. The function will return best fitting values for Mu and Sigma, in that order,  because the ModelRisk syntax for modeling an GBM Time Series is VoseTimeGBM(Mu, Sigma,[other parameters]).

Copula fitting functions

For each of the Copulas available in ModelRisk, the following fitting functions are included:

VoseCopulaFit

Array function that generates values from the bivariate or multivariate copula fitted to the data using Maximum Likelihood Estimation. The syntax for fitting bivariate respectively multivariate copula is

 

 

 

 

 

 

 

{=VoseCopulaBiNameFit({data},Data_in_rows,Uncertainty)}

{=VoseCopulaMultiNameFit({data},Data_in_rows,Uncertainty)}

where Name is replaced by the name of the copula.

For the bivariate archimedean copulas (Clayton, Gumbel, Frank) this function chooses the direction of correlation that best fits the data and simulates from the fitted copula. Alternatively one can use the standard multivariate copula fitting functions, even for bivariate data, which will assume the standard direction of the fitted copula.

  • {data} - the array of data to fit the copula to. This should be an 2-dimensional array for fitting a bivariate copula, or n-dimensional where n>2 for fitting a multivariate copula.

  • Data_in_rows - optional boolean parameter that specifies whether the data is in columns (FALSE,default) or rows (TRUE).

  • Uncertainty - optional boolean parameter. Set TRUE to include uncertainty about the fitted time series, and FALSE (default) to use the MLE.

The scatter plot shown on the right illustrates a Gumbel copula fitted to bivariate data.

Note from the scatter plot that the data are negatively correlated. The bivariate Gumbel can adapt to this by rotating the fitted copula, and gives a fitted Direction parameter of 3, and a value of theta = 3.036.

In contrast, the multivariate Gumbel copula only has Direction = 1 at its disposal so would provide a very poor fit.

VoseCopulaFitP

Array function that returns the parameter(s) from the bivariate or multivariate copula fitted to the data.  The syntax is:

{=VoseCopulaBiNameFitP({data},Data_in_rows,Uncertainty)}

{=VoseCopulaMultiNameFitP({data},Data_in_rows,Uncertainty)}

where Name is replaced by the name of the copula.

  • {data} - the array of data to fit the copula to. This should be an 2-dimensional array for fitting a bivariate copula, or n-dimensional where n>2 for fitting a multivariate copula.

  • Data_in_rows - optional boolean parameter that specifies whether the data is in columns (FALSE,default) or rows (TRUE).

  • Uncertainty - optional boolean parameter. Set TRUE to include uncertainty about the fitted time series, and FALSE (default) to use the MLE.

The output array size should be one-dimensional, with the number of cells equal to the number of estimated parameters. The fitted parameters are returned in the same order as they are in the corresponding VoseCopula (simulating) function.

So, for example,

{=VoseCopulaBiTFitP(Data,0)}

should have an output of two cells. The function will return best fitting values for the Nu and Covariance parameters, in that order,  because the ModelRisk syntax for modeling a bivariate T copula is VoseCopulaBiT(Nu,Covariance).

VoseCopulaData

This array function generates random values from an empirical copula constructed entirely from the correlation pattern of given data. Syntax:

{=VoseCopuladata({data},Data_in_rows)}

  • {data} - the spreadsheet data from which to construct the copula. This should be at least a two-dimensional array.

  • Data_in_rows - a boolean parameter (TRUE/FALSE) that specifies whether the data is oriented in rows (TRUE) or not (FALSE, default)

Note the difference between constructing an empirical copula, and fitting an existing type of copula:

When fitting a copula, we determine the parameter of the copula that makes for a best fit to the data, but retaining the copula's functional form. With the empirical copula, the functional form itself (not just the parameter) is based on the data, making it a flexible tool for capturing any correlation pattern, however unusual (for example the one shown on the right).

An illustration of the use of the empirical copula on the right is given in the example model

VoseCopulaData.xls
 

 

Navigation