
Introduction to Bayesian inference concepts

Bayesian inference is based on Bayes' Theorem, the logic of which was first proposed in Bayes (1763). Bayes' Theorem states:

P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}                                  (1)

We're on the first line and already it looks awful if you are not familiar with the topic. The formula is best explained with an example:
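As a minimal numerical sketch (the prevalence and test characteristics below are hypothetical figures chosen purely to show the arithmetic), consider the probability that a person has a condition given a positive screening result:

    # Bayes' Theorem, P(A|B) = P(B|A) * P(A) / P(B), with hypothetical numbers
    # A = "person has the condition", B = "screening test is positive"
    p_A = 0.01              # prior probability: 1% prevalence (hypothetical)
    p_B_given_A = 0.95      # probability of a positive test if the condition is present (hypothetical)
    p_B_given_not_A = 0.05  # probability of a positive test if it is absent (hypothetical)

    # Total probability of a positive test, P(B)
    p_B = p_B_given_A * p_A + p_B_given_not_A * (1 - p_A)

    # Posterior probability of the condition given a positive test
    p_A_given_B = p_B_given_A * p_A / p_B
    print(round(p_A_given_B, 3))   # about 0.161

Even with a fairly accurate test, the posterior probability is modest because the prior probability (the prevalence) is low: Bayes' Theorem formalizes exactly this revision of belief.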

If you are unfamiliar with Bayesian notation, see the topic Notation for Bayesian inference.

Bayesian inference is about shapes

The basic equation of Bayesian inference is:

f(\theta \mid X) = \frac{l(X \mid \theta)\,\pi(\theta)}{\int l(X \mid \theta)\,\pi(\theta)\,d\theta}          when \theta is continuous          (2)

f(\theta \mid X) = \frac{l(X \mid \theta)\,\pi(\theta)}{\sum_{\theta} l(X \mid \theta)\,\pi(\theta)}          when \theta is discrete          (3)

where \pi(\theta) is the prior distribution for the parameter \theta, l(X \mid \theta) is the likelihood of observing the data X for a given value of \theta, and f(\theta \mid X) is the posterior distribution.

The denominators in these equations are normalizing constants to give the posterior distribution a total confidence of one. Since the denominator is simply a scalar value and not a function of \theta, one can rewrite the equations in a form that is generally more convenient:

f(\theta \mid X) \propto l(X \mid \theta)\,\pi(\theta)                                  (4)
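A rough sketch of how equation (4) is applied in practice, here by evaluating the prior and likelihood over a discretised grid of parameter values (the Beta prior and the binomial data are hypothetical choices, used for illustration only):

    import numpy as np
    from scipy import stats

    # Discretised values of the parameter theta (a probability in this sketch)
    theta = np.linspace(0.001, 0.999, 999)

    # Hypothetical prior pi(theta): Beta(2, 2), evaluated on the grid
    prior = stats.beta.pdf(theta, 2, 2)

    # Likelihood l(X|theta) for hypothetical data: 7 successes in 20 trials
    likelihood = stats.binom.pmf(7, 20, theta)

    # Equation (4): the posterior is proportional to likelihood * prior;
    # dividing by the sum (the normalizing constant) gives a total confidence of one
    posterior = likelihood * prior
    posterior /= posterior.sum()

    # A summary of the updated state of knowledge
    print((theta * posterior).sum())   # posterior mean, roughly 0.375

The shapes of the three curves on the grid (prior, likelihood, posterior) are what the rest of this topic discusses.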

The shape of the prior distribution embodies the amount of knowledge we have about the parameter to start with. The more informed we are, the more focused the prior distribution will be:

Example 1: Comparison of the shapes of relatively more and less informed priors
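For instance (using hypothetical Beta priors purely for illustration), the better-informed prior has a much smaller spread around the same central value:

    from scipy import stats

    # Two hypothetical priors for a probability parameter, both centred on 0.5
    vague_prior = stats.beta(2, 2)       # little prior knowledge: broad
    informed_prior = stats.beta(50, 50)  # a lot of prior knowledge: focused

    print(vague_prior.std())     # about 0.224
    print(informed_prior.std())  # about 0.050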

The shape of the likelihood function embodies the amount of information contained in the data. If the information it contains is small, the likelihood function will be broadly distributed, whereas if the information it contains is large, the likelihood function will be tightly focused around some particular value of the parameter:

Example 2: Comparison of the shapes of likelihood functions for two data sets. The data set containing more information produces a much more tightly focused likelihood.
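A quick way to see this (with hypothetical binomial data) is to compare the likelihood functions for a small and a large data set that point to the same proportion; the larger data set gives a far more tightly focused likelihood:

    import numpy as np
    from scipy import stats

    theta = np.linspace(0.001, 0.999, 999)

    # Two hypothetical data sets with the same observed proportion (30%)
    small = stats.binom.pmf(3, 10, theta)      # 3 successes in 10 trials
    large = stats.binom.pmf(300, 1000, theta)  # 300 successes in 1000 trials

    # Width of each likelihood function, measured as the span of theta values
    # whose likelihood exceeds half of its maximum
    def half_max_width(lik):
        above = theta[lik > lik.max() / 2]
        return above.max() - above.min()

    print(half_max_width(small))  # roughly 0.33
    print(half_max_width(large))  # roughly 0.034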

But the amount of information contained in the data can only be measured by how much it changes what you believe. If someone tells you something you already know, you haven't learned anything, but if another person was told the same information, they might have learned a lot. Keeping to our graphical review, the flatter the likelihood function relative to the prior, the smaller the amount of information the data contains:

Example 3: The likelihood is flat relative to the prior so has little effect on the level of knowledge (the prior and posterior are very similar)

Example 4: The likelihood is highly peaked relative to the prior so has a great influence on the level of knowledge (the likelihood and posterior have very similar shapes)
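The same contrast in code (again with a hypothetical Beta prior and binomial data): with very little data the likelihood is nearly flat over the prior's range and the posterior barely moves, whereas with a great deal of data the posterior is dominated by the likelihood:

    import numpy as np
    from scipy import stats

    theta = np.linspace(0.001, 0.999, 999)
    prior = stats.beta.pdf(theta, 10, 10)   # hypothetical, fairly informed prior centred on 0.5
    prior /= prior.sum()

    def posterior_mean(successes, trials):
        # Equation (4): multiply the prior by the likelihood, then normalise
        post = stats.binom.pmf(successes, trials, theta) * prior
        post /= post.sum()
        return (theta * post).sum()

    print(posterior_mean(1, 2))        # about 0.50: stays at the prior mean (nearly flat likelihood)
    print(posterior_mean(1400, 2000))  # about 0.70: pulled to the likelihood's focus (peaked likelihood)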

The closer the shape of the likelihood function is to that of the prior distribution, the less new information the data contain, and so the posterior distribution will not change greatly from the prior. In other words, one would not have learned very much from the data:

Example 5: Prior and likelihood have similar shapes (i.e. they agree), so the posterior distribution differs little from the prior.

On the other hand, if the focus of the likelihood function is very different from the prior we will have learned a lot from the data:

Example 6: The likelihood is focused in a very different place from the prior, so the data have a large influence on the level of knowledge (the posterior is pulled well away from the prior).

That we learn a lot from a set of data does not necessarily mean that we are more confident about the parameter value afterwards. If the prior and likelihood strongly conflict, it is quite possible that our posterior distribution will be broader than our prior. Conversely, if the likelihood leans towards an extreme of the possible range for the parameter, it can have a significantly different emphasis from the prior and yet produce a posterior distribution that is narrower than the prior distribution:

Example 7: The likelihood is highly peaked relative to the prior and focused at one extreme of the prior's range, so it disagrees considerably with the prior. The posterior is nonetheless strongly focused, because the parameter cannot be negative and is therefore constrained at zero.
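A sketch of that situation with hypothetical numbers: a broad prior for a rate parameter that cannot be negative, combined with data whose likelihood is focused near zero, gives a posterior that is far narrower than the prior even though the two disagree:

    import numpy as np
    from scipy import stats

    # Rate parameter constrained to be non-negative
    theta = np.linspace(0.001, 100, 10000)

    # Hypothetical broad prior: exponential with mean 10 (standard deviation about 10)
    prior = stats.expon.pdf(theta, scale=10)
    prior /= prior.sum()

    # Hypothetical data: 10 Poisson observations totalling 4 events, so the
    # likelihood is proportional to theta^4 * exp(-10 * theta), peaked near
    # the lower (zero) end of the prior's range
    likelihood = theta**4 * np.exp(-10 * theta)

    posterior = likelihood * prior
    posterior /= posterior.sum()

    def sd(weights):
        mean = (theta * weights).sum()
        return np.sqrt(((theta - mean) ** 2 * weights).sum())

    print(sd(prior))      # about 10: broad prior
    print(sd(posterior))  # about 0.22: much narrower posterior, despite the disagreement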
