
Uninformed priors


An uninformed prior has a distribution that would be considered to add no information to the Bayesian inference. For example, a Uniform(0,1) distribution could be considered an uninformed prior when estimating a binomial probability because it states that, prior to the collection of any data, we consider every possible value for the true probability to be as likely as every other. An uninformed prior is often desirable in the development of public policy to demonstrate impartiality. Laplace (1812), who also independently stated Bayes' Theorem in Laplace (1774), 11 years after Bayes' essay was published (he apparently had not seen Bayes' essay), proposed that public policy priors should assume all allowable values to have equal likelihood (i.e. Uniform or DUniform distributions).

At first glance then, it might seem that uninformed priors will just be Uniform distributions running across the entire range of possible values for the parameter:


Figure 1: These intuitively seem to be logical uninformed priors (leaving aside the choice of ranges), but are they?

That this is not true can easily be demonstrated with the following example. Consider the task of estimating the intensity λ of a Poisson process (this is equivalent to fitting a Poisson distribution to data). We have observed a certain number of events within a certain period, which gives us a likelihood function. It might seem reasonable to assign a Uniform(0, z) prior to λ, where z is some large number. However, we could just as easily have parameterised the problem in terms of β, the mean exposure between events. Since β = 1/λ, we can quickly check what a Uniform(0, z) prior for λ would look like as a prior for β by running a simulation on the formula: =1/Uniform(0, z):

The result is effectively impossible to show as a density plot because it has an extremely tall, narrow peak near zero, but this plot compares the Uniform(0.01,100) with a 1/Uniform(0.01,100) on a cumulative scale:

Figure 2: Cumulative prior distributions: p(λ) = Uniform(0.01,100) and β = 1/λ
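For readers who want to reproduce Figure 2 outside a spreadsheet, here is a minimal simulation sketch, assuming Python with NumPy and Matplotlib (neither is part of the original example):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Simulate the Uniform(0.01, 100) prior for lambda and the prior it
# implies for beta = 1/lambda.
lam = rng.uniform(0.01, 100, size=100_000)
beta = 1.0 / lam

# Compare the two priors on a cumulative scale, as in Figure 2.
for sample, label in [(lam, "lambda ~ Uniform(0.01, 100)"),
                      (beta, "beta = 1/lambda")]:
    x = np.sort(sample)
    plt.plot(x, np.arange(1, x.size + 1) / x.size, label=label)

plt.xlim(0, 100)
plt.xlabel("parameter value")
plt.ylabel("cumulative probability")
plt.legend()
plt.show()
```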

The prior for β is alarmingly far from being uninformed with respect to β! Of course, the reverse equally applies: if we had performed a Bayesian inference on β with a Uniform prior, the prior for λ would be just as far from being uninformed. The probability density function for the prior distribution of a parameter must be known in order to perform a Bayesian inference calculation. However, one can often choose between a number of different parameterisations that would equally well describe the same stochastic process. For example, one could describe a Poisson process by λ, the mean number of events per unit exposure; by β, the mean exposure between events, as above; or by P(x>0) = 1 - exp(-λ), the probability of at least one event in a unit of exposure.

The Jacobian transformation lets us formally calculate the prior distribution for a Bayesian inference problem after reparameterising, but it is easy to see why the two priors are not the same with an example:

If λ is given a Uniform(0,z) prior then the fraction 1/z of the distribution lies below 1 and the rest between 1 and z. The prior for β would then be 1/Uniform(0,z), which has the fraction (z-1)/z below 1, because 1/Uniform(0,z) is <1 exactly when the Uniform(0,z) is >1.
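To make the reasoning explicit, here is the change-of-variables step (a sketch of the standard Jacobian calculation, using the notation above): if p(λ) = 1/z on (0, z) and β = 1/λ, then

\[ p(\beta) = p(\lambda)\left|\frac{d\lambda}{d\beta}\right| = \frac{1}{z}\cdot\frac{1}{\beta^{2}}, \qquad \beta \in \left(\frac{1}{z}, \infty\right) \]

which piles its probability mass up against small β rather than spreading it evenly.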

There is no all-embracing solution to the problem of finding an uninformed prior that doesn't become "informed" under some reparameterising of the problem, but one approach is to select a prior that is unchanged under such transformations, so that the estimation at least does not depend on the parameterisation chosen.

For example, consider a prior such that log10(θ) (or loge(θ)) is Uniform(-∞, +∞) distributed:

The parameter 1/θ would have a prior log10(1/θ) = -log10(θ) = Uniform(-∞, +∞)

The parameter c·θ would have a prior log10(c·θ) = log10(c) + log10(θ) = Uniform(-∞, +∞)

The parameter θ^c would have a prior log10(θ^c) = c·log10(θ) = Uniform(-∞, +∞)
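A quick numerical check of this invariance, as a sketch assuming Python with NumPy (the truncation of the improper Uniform(-∞, +∞) to a finite log-range is an assumption needed to make it simulable):

```python
import numpy as np

rng = np.random.default_rng(1)

# Approximate log10(theta) ~ Uniform(-inf, +inf) by a wide finite range;
# the true prior is improper, so any simulation has to truncate it.
log10_theta = rng.uniform(-6, 6, size=200_000)
theta = 10.0 ** log10_theta

c = 3.7  # an arbitrary positive constant
for name, transformed in [("1/theta", 1 / theta),
                          ("c*theta", c * theta),
                          ("theta**c", theta ** c)]:
    logs = np.log10(transformed)
    counts, _ = np.histogram(logs, bins=10)
    # Roughly equal bin counts show the transformed parameter is still
    # uniform on the log scale (over a shifted or rescaled interval).
    print(f"{name:9s} bin counts: {counts}")
```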

This prior is therefore invariant under a number of transformations. Replotting Figure 2 on a log scale shows the relationship graphically:

Figure 3: Plotting θ on a log scale against the cumulative distributions of Figure 2 shows their reflective relationship.

Using the Jacobian transformation it can be shown that log(θ) = Uniform(-∞, +∞) is equivalent to the prior density p(θ) ∝ 1/θ for a parameter that can take any positive real value. You probably wouldn't describe that distribution as very uninformed, but it is arguably the best one can do for this particular problem. It is worth remembering, too, that if there is a reasonable amount of data available the likelihood function l(X|θ) will overpower the prior p(θ) ∝ 1/θ, and then the shape of the prior becomes unimportant. This will occur much more quickly if the likelihood function has its maximum in a region of θ where the prior is flatter.
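As an illustration of the likelihood overpowering the prior, consider the Poisson example again with hypothetical data (x events in exposure t; the Gamma posteriors below follow from conjugacy with the Poisson likelihood, and SciPy is an assumed tool, not part of the original text):

```python
from scipy import stats

# Hypothetical data: x events observed in t units of exposure
x, t = 50, 10.0

# Posterior under the p(lambda) ∝ 1/lambda prior: Gamma(shape=x, rate=t)
post_inv = stats.gamma(a=x, scale=1 / t)
# Posterior under a flat prior on lambda:         Gamma(shape=x+1, rate=t)
post_flat = stats.gamma(a=x + 1, scale=1 / t)

# With a reasonable amount of data the two posteriors nearly coincide,
# i.e. the likelihood has overpowered the choice of prior.
print("means:        ", post_inv.mean(), post_flat.mean())
print("95% intervals:", post_inv.interval(0.95), post_flat.interval(0.95))
```

With only a handful of events the two posteriors would differ noticeably, which is exactly when the choice of prior matters.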

A location parameter of a distribution should have the same effective prior irrespective of the scaling chosen. This is achieved if we select a Uniform(a, a+b) prior for θ, i.e. p(θ) = 1/b where b is the range of the Uniform distribution. Changing the scale then has no effect on the shape of the prior. For example, a prior Uniform(0, 10kg) has a density p(θ) = 0.1 kg⁻¹. Changing the units to grams, we would have a prior Uniform(0, 10000g), which again has the constant density p(θ) = 10⁻⁴ g⁻¹.

Parametric distributions often have a location parameter, a scale parameter, or both. If more than one parameter is unknown and one is attempting to estimate these parameters, it is common practice to assume independence between the two parameters in the prior: the logic is that an assumption of independence is more uninformed than an assumption of any specific degree of dependence. The joint prior for a scale parameter and a location parameter is then simply the product of the two priors. So, for example, the prior for the mean of a normal distribution is p(μ) ∝ 1 (a constant), since μ is a location parameter; the prior for the standard deviation of the normal distribution is p(σ) ∝ 1/σ, since σ is a scale parameter; and their joint prior is given by the product of the two priors, i.e. p(μ, σ) ∝ 1/σ.
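As a sketch of how this joint prior might be used in practice (assuming Python with NumPy, made-up data, and a simple grid approximation rather than any particular software's method):

```python
import numpy as np

# Hypothetical sample assumed Normal(mu, sigma), both parameters unknown
data = np.array([9.8, 10.4, 10.1, 9.6, 10.9, 10.2])

# Grid over plausible (mu, sigma) values: mu varies over rows, sigma over columns
mu = np.linspace(8.0, 12.0, 400)[:, None]
sigma = np.linspace(0.1, 3.0, 400)[None, :]

# Log-likelihood of the sample under each (mu, sigma) pair
loglik = np.sum(
    -0.5 * ((data[:, None, None] - mu) / sigma) ** 2 - np.log(sigma),
    axis=0,
)

# Add the log of the joint prior p(mu, sigma) ∝ 1/sigma
logpost = loglik - np.log(sigma)

# Normalise on the grid and extract the marginal posterior for mu
post = np.exp(logpost - logpost.max())
post /= post.sum()
mu_marginal = post.sum(axis=1)
print("posterior mean of mu:", float((mu.ravel() * mu_marginal).sum()))
```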
