Example of using the estimate of binomial probabilities

A (perhaps rather cynical) performance measure for a lawyer is the percentage of cases he or she has won. Suppose a person is looking for a lawyer to defend his case and would like to choose the one with the best record. Three lawyers in the area are knowledgeable in the field, and their past performance is shown in the table below. All three work in the same field, so we assume that each lawyer's cases were random samples from the same population of cases (which lets us assume a binomial process). Whom should the person choose as his lawyer?

Lawyer	Number of trials done	Number of trials won	% of trials won
"Gary"	230	215	93.5%
"Jon"	34	32	94.1%
"Frank"	17	16	94.1%

Answer

The first step is to determine the probability r that each lawyer would win a random case. We assume a binomial process for the trials. This is probably reasonable if each trial's result is unrelated to any other trial, but it will not be entirely correct if random effects change the probability for all trials, in which case we would have a mixture process. Given the binomial assumption, the number of trials n and the number of successes s allow us to estimate the probability r as Beta(s+1,n-s+1), which is the Bayesian estimate with an uninformed Uniform(0,1) prior (the technique is exactly the same if you use a classically derived uncertainty distribution for r).
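As an illustration of the Beta(s+1,n-s+1) estimate, the following minimal Python sketch builds the distribution parameters for each lawyer from the table and reports the mean and standard deviation of each. It is not part of the original model: the names `records`, `beta_posterior` and `beta_mean_sd` are hypothetical, and only the standard closed-form moments of the Beta distribution are used.

```python
# Court records from the table: (number of trials n, number of trials won s).
records = {"Gary": (230, 215), "Jon": (34, 32), "Frank": (17, 16)}


def beta_posterior(n, s):
    """Parameters (alpha, beta) of Beta(s+1, n-s+1), i.e. a uniform prior updated with s wins in n trials."""
    return s + 1, n - s + 1


def beta_mean_sd(alpha, beta):
    """Closed-form mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1))
    return mean, variance ** 0.5


# summary[name] = (alpha, beta, mean, sd) for each lawyer (hypothetical layout).
summary = {}
for name, (n, s) in records.items():
    alpha, beta = beta_posterior(n, s)
    mean, sd = beta_mean_sd(alpha, beta)
    summary[name] = (alpha, beta, mean, sd)
    print(f"{name}: Beta({alpha},{beta})  observed={s / n:.3f}  mean={mean:.3f}  sd={sd:.3f}")
```

The observed win percentages are almost identical, but the spread of each distribution depends strongly on the number of trials: Gary's 230 trials give the narrowest distribution and Frank's 17 trials the widest.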

A Beta distribution estimate of r is constructed for each lawyer. The three distributions are plotted below:

The density plot on the left shows that there is considerable overlap between the three distributions, which means that we cannot know for sure which lawyer is the best. The cumulative plot, however, shows that 'Gary' is second-order stochastically dominant and should therefore be preferred by the defendant. This may well be sufficient for the defendant's needs, but with simulation we can also describe how confident we are that 'Gary' is actually better than the other two.

The file Lawyers.xls calculates the confidence we should have that each lawyer is the best. A random sample from each of these distributions represents a possible true performance rate (probability of success): if the value for 'Gary' is greater than the values for the other two lawyers in an iteration, then this scenario gives 'Gary' as being the best lawyer. In the model, an IF function for each lawyer compares the generated values of the three rates and returns a 1 if that lawyer's success rate is the largest and a 0 otherwise. The simulation results for each IF function are equivalent to samples from a Bernoulli(p) distribution, where p is the confidence the analysis gives us that the lawyer is better than the others. The mean of that Bernoulli distribution is just p, the value we are interested in. Using the simulation software to calculate the mean, we can report the value of p directly in Excel at the end of a simulation. In this case, we have a 38.7% confidence that 'Gary' is the best, a 33.9% confidence that 'Jon' is the best, and therefore a 27.5% confidence that 'Frank' is the best.
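The simulation logic described above can be sketched outside the spreadsheet. The following Python example is an assumption-laden stand-in for the Excel model, not a copy of it: it uses the standard library's `random.betavariate` in place of the spreadsheet's Beta distributions, and the names `posteriors` and `simulate_best` are hypothetical.

```python
import random

# Beta(s+1, n-s+1) parameters for each lawyer, taken from the table.
posteriors = {"Gary": (216, 16), "Jon": (33, 3), "Frank": (17, 2)}


def simulate_best(posteriors, iterations=100_000, seed=1):
    """Estimate, for each lawyer, the confidence that his rate is the largest."""
    rng = random.Random(seed)
    wins = {name: 0 for name in posteriors}
    for _ in range(iterations):
        # One possible set of true success rates, one draw per lawyer.
        draws = {name: rng.betavariate(a, b) for name, (a, b) in posteriors.items()}
        # Equivalent of the IF functions: score 1 for the largest rate, 0 for the rest.
        best = max(draws, key=draws.get)
        wins[best] += 1
    # The mean of each 0/1 indicator is the confidence that the lawyer is the best.
    return {name: count / iterations for name, count in wins.items()}


confidence = simulate_best(posteriors)
for name, p in confidence.items():
    print(f"{name}: {p:.3f}")
```

With enough iterations the three fractions should settle close to the confidences quoted above, and they always sum to 1 because exactly one lawyer scores a 1 in each iteration.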

Numerical integration

This model is an example of Numerical Integration: a broad technique that has many applications in risk analysis. The confidence value we are looking for is, in words, the confidence of rate(Gary)=x AND rate(Jon)<x AND rate(Frank)<x, summed over all values of x. In equation terms this is:

$$P(\text{Gary is best}) = \int_0^1 f_{\text{Gary}}(x)\, F_{\text{Jon}}(x)\, F_{\text{Frank}}(x)\, dx$$

where f(x) is the Beta pdf and F(x) is the Beta cdf of the named lawyer's rate.
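The integral described above (the pdf of one lawyer's rate at x multiplied by the cdfs of the other two lawyers' rates at x, integrated over x from 0 to 1) can also be evaluated deterministically. The sketch below uses a simple midpoint rule; this is an illustrative choice, not the method used in the original model. The helper names are hypothetical, and `beta_cdf_int` relies on the binomial-sum identity for the Beta cdf, which holds only for integer parameters (as is the case here).

```python
import math


def beta_pdf(x, a, b):
    """Beta(a, b) density at 0 < x < 1, computed in log space for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def beta_cdf_int(x, a, b):
    """Beta(a, b) cdf for INTEGER a, b: equals P(Binomial(a+b-1, x) >= a)."""
    n = a + b - 1
    return sum(math.comb(n, j) * x ** j * (1 - x) ** (n - j) for j in range(a, n + 1))


def confidence_best(target, others, steps=20_000):
    """Midpoint-rule estimate of the integral of f_target(x) * prod(F_other(x)) over [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoints avoid evaluating log(0) at the end points
        term = beta_pdf(x, *target)
        for a, b in others:
            term *= beta_cdf_int(x, a, b)
        total += term * h
    return total


posteriors = {"Gary": (216, 16), "Jon": (33, 3), "Frank": (17, 2)}
result = {}
for name, params in posteriors.items():
    others = [p for other, p in posteriors.items() if other != name]
    result[name] = confidence_best(params, others)
    print(f"{name}: {result[name]:.4f}")
```

A useful sanity check is that the three integrals add up to 1, since exactly one lawyer has the highest rate.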

In the model we are effectively using the frequencies of the generated values as a replacement for the integration: the pdf $f_{\text{Gary}}(x)$ is replaced by the frequency with which the Gary Beta distribution generates a value x, and the cdf $F_{\text{Jon}}(x)$ is replaced by the frequency with which the Jon Beta distribution generates a value less than x (and likewise for Frank), in which case the IF function for Gary returns a 1.

A more efficient approach

Bearing in mind the numerical integration equation equivalent of this analysis, we could make a model that is much more efficient:

Create a Beta(215+1,230-215+1) = Beta(216,16) distribution for Gary's success rate.

Then calculate the confidence that both Jon's and Frank's rate would be lower:

Cell X: =VoseBeta(216,16)

Cell Y: =VoseBetaProb(Cell X,32+1,34-32+1,TRUE)*VoseBetaProb(Cell X,16+1,17-16+1,TRUE)

Cell Z: =Mean(Cell Y), where the mean over all iterations is calculated by the simulation software

The VoseBetaProb(...,TRUE) function returns the Beta cdf, evaluated at the value in Cell X, for each of the other two lawyers' rates.

Running a simulation will give the confidence that Gary is better in Cell Z.
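The three cells can be mirrored in a short Python sketch. This is only an approximation of the spreadsheet logic under stated assumptions: `random.betavariate` stands in for VoseBeta, the hypothetical helper `beta_cdf_int` stands in for VoseBetaProb(...,TRUE) and is valid only for integer Beta parameters, and `efficient_confidence` is an illustrative name.

```python
import math
import random


def beta_cdf_int(x, a, b):
    """Beta(a, b) cdf for INTEGER a, b: equals P(Binomial(a+b-1, x) >= a)."""
    n = a + b - 1
    return sum(math.comb(n, j) * x ** j * (1 - x) ** (n - j) for j in range(a, n + 1))


def efficient_confidence(iterations=20_000, seed=1):
    """Return the mean and sample variance of the Cell Y values over the iterations."""
    rng = random.Random(seed)
    values = []
    for _ in range(iterations):
        x = rng.betavariate(216, 16)                          # Cell X: Gary's rate
        y = beta_cdf_int(x, 33, 3) * beta_cdf_int(x, 17, 2)   # Cell Y: P(Jon < x) * P(Frank < x)
        values.append(y)
    mean = sum(values) / iterations                           # Cell Z: mean of Cell Y
    variance = sum((v - mean) ** 2 for v in values) / (iterations - 1)
    return mean, variance


estimate, variance = efficient_confidence()
# Per-iteration variance of the 0/1 indicator used in the first approach, for comparison.
bernoulli_variance = estimate * (1 - estimate)
print(f"confidence Gary is best: {estimate:.3f}")
print(f"variance per iteration: {variance:.4f} (indicator approach: {bernoulli_variance:.4f})")
```

Each iteration here averages a probability rather than a 0 or 1, so the per-iteration variance is smaller than that of the indicator, which is why fewer iterations are needed for the same precision.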

The model Lawyers.xls also performs this alternative calculation. If you run a simulation you will see that the results from both approaches are the same. However, the second approach is more efficient, meaning that it reaches the required value within a specified tolerance in fewer iterations than the original approach.

Exercise

Try repeating the exercise using a classical statistical estimate of the lawyers' rates. The answers will be a little different: why? How would you present the results to the client (the defendant)?
