Estimating model parameters from data

Model parameters are fixed quantitative values that characterise the model we believe reflects the real world. They have to be estimated either by statistical inference from observations, as discussed in this section, or by expert opinion.

The starting point of statistical inference is some postulated probability model that relates the true value of the parameter in nature to the set of observations we have. That model could be as simple as believing that the observations are random samples from some Normal distribution, or could be far more involved. Looking at the data from the viewpoint of the assumed probability model, we then back-calculate to infer the true value of the parameter(s) being estimated. We recommend that you take a good look at stochastic processes, which are the foundations of statistics and will help you greatly in understanding the topics of this section.

There are essentially three different approaches to statistical inference:

Classical statistics
The Bootstrap
Bayesian inference

Each of these has its own strengths and weaknesses.

The goal is a distribution

Most of you will be familiar with some classical statistics results, but will be used to seeing the output in the form of a best guess together with a confidence bound, for example "m = 0.32 [95% CI: 0.27, 0.37]". Although this is common scientific and engineering practice, the output is inadequate for a risk analyst's needs: we need the entire uncertainty distribution, from which we can generate values. Thus, you will see, for example, the 'z-test' described here in the form:

μ = Normal(x̄, σ/√n)

where x̄ is the mean of the n observations and σ is the known population standard deviation. That is, we have a distribution of our uncertainty about the true mean of a population or probability distribution. When two or more parameters are being estimated at the same time we need some method to generate samples from their joint distribution, which sometimes requires some ingenuity.
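As a minimal sketch of what "the goal is a distribution" means in practice, the Python below draws values from Normal(x̄, σ/√n), the uncertainty distribution for a true mean when the population standard deviation σ is assumed known. The observations, the value of sigma and the function name sample_mean_uncertainty are all hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def sample_mean_uncertainty(observations, sigma, n_draws=10_000, seed=0):
    """Draw values from Normal(x_bar, sigma / sqrt(n)).

    This is the uncertainty distribution for the true mean under the
    assumption that the population standard deviation sigma is known.
    """
    rng = random.Random(seed)
    x_bar = statistics.fmean(observations)
    std_error = sigma / math.sqrt(len(observations))
    return [rng.gauss(x_bar, std_error) for _ in range(n_draws)]


# Hypothetical observations and an assumed known standard deviation.
data = [0.29, 0.35, 0.31, 0.33, 0.30, 0.34]
draws = sample_mean_uncertainty(data, sigma=0.05)

# The 2.5th and 97.5th percentiles of the draws give the familiar 95% interval.
cuts = statistics.quantiles(draws, n=40)
lower, upper = cuts[0], cuts[-1]
print(f"mean of draws = {statistics.fmean(draws):.3f}")
print(f"95% interval  = [{lower:.3f}, {upper:.3f}]")
```

The list of draws, rather than the single interval, is what a risk model would use: each draw is one plausible value of the true mean that can be fed into a simulation.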

Three approaches

Classical statistics is what we generally get taught at school and university: the z-test, t-test and chi-squared test. These are all useful, but how often did we understand why they work? Mostly, we just got taught a set of procedures to follow. We hope that the material presented here will clear up a lot of that mystery.

The Bootstrap is a particular classical technique that is becoming ever more popular, and with good reason. It requires far fewer assumptions than the most common classical methods, and has the flexibility to answer many more questions. It also has the advantage of returning the same answers as classical statistics where the assumptions match.
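To make the idea concrete, here is a minimal sketch of a non-parametric Bootstrap for the mean: the observations are resampled with replacement many times, and the mean of each resample is recorded. The data values and the function name bootstrap_means are hypothetical, and this is only one simple variant of the technique.

```python
import random
import statistics


def bootstrap_means(observations, n_resamples=5_000, seed=0):
    """Return the mean of each of n_resamples bootstrap resamples.

    Each resample has the same size as the original data and is drawn
    from it with replacement. No distributional form is assumed.
    """
    rng = random.Random(seed)
    n = len(observations)
    return [
        statistics.fmean(rng.choices(observations, k=n))
        for _ in range(n_resamples)
    ]


# Hypothetical observations, for illustration only.
data = [0.29, 0.35, 0.31, 0.33, 0.30, 0.34]
boot = bootstrap_means(data)

# The spread of the resampled means describes our uncertainty about the true mean.
print(f"mean of bootstrap means = {statistics.fmean(boot):.3f}")
print(f"standard deviation      = {statistics.stdev(boot):.4f}")
```

The same resampling loop works for other statistics, such as a median or a percentile, by swapping the function applied to each resample.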

Bayesian inference is, in our view, the most powerful of all the methods we present. It can estimate many parameters from the same data set, it is explicit about its models and assumptions, it makes it easy to incorporate different data sets into one estimate, and it is very intuitive too. Some classical statisticians have claimed that Bayesian inference is extremely subjective, and that one could therefore come up with any answer one wished; in our view that is nonsense. Each method has its own problems, as you will see, but where all methods can be used on the same data set they usually come up with exactly the same, or very nearly the same, answer. We encourage you to keep an open mind, not to classify yourself as being in one camp or another, and just to pick the methods that work.
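As a small illustration of how Bayesian inference combines data sets, the sketch below estimates a binomial probability p using a Beta prior, for which the posterior is again a Beta distribution. The two data sets, the uniform Beta(1, 1) prior and the function name beta_posterior are assumptions made for this example only.

```python
import random
import statistics


def beta_posterior(successes, trials, prior_a=1.0, prior_b=1.0):
    """Return (a, b) of the Beta posterior for a binomial probability.

    Assumes a Beta(prior_a, prior_b) prior; the default is uniform.
    """
    return prior_a + successes, prior_b + (trials - successes)


# Hypothetical data: two separate studies of the same probability p.
# The posterior from the first study becomes the prior for the second.
a1, b1 = beta_posterior(successes=7, trials=20)
a2, b2 = beta_posterior(successes=12, trials=30, prior_a=a1, prior_b=b1)

# Draw values from the final posterior for use in a simulation.
rng = random.Random(0)
posterior_draws = [rng.betavariate(a2, b2) for _ in range(10_000)]
print(f"posterior is Beta({a2:g}, {b2:g})")
print(f"mean of draws = {statistics.fmean(posterior_draws):.3f}")
```

Updating with the two studies in sequence gives the same posterior as pooling them into a single data set of 19 successes in 50 trials, which is one way to see why adding further data is straightforward in this framework.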
