The Hypergeometric Process | Vose Software

The Hypergeometric Process

See also: Stochastic processes introduction, The binomial process, The Poisson process

Description

The hypergeometric process occurs when one samples randomly without replacement from a population (as opposed to sampling with replacement in the binomial process) and counts how many individuals in the sample have some particular characteristic. This is a very common type of scenario: population surveys, herd testing, and lotto draws are all hypergeometric processes. In many situations the population is very large compared with the sample, so even if each sampled individual were put back into the population, the probability of picking it again would be very small. In that case each draw has essentially the same probability of picking an individual with the characteristic; in other words, the process becomes a binomial process. When the population is not very large compared with the sample (a good rule of thumb is that the population is less than ten times the sample size), we cannot use a binomial approximation to the hypergeometric. This section discusses the distributions associated with the hypergeometric process.
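To see the rule of thumb at work, the sketch below compares exact hypergeometric probabilities with a binomial approximation. It writes M for the population size, D for the number with the characteristic, n for the sample size and s for the count in the sample, and uses p = D/M for the binomial. The sample size (10) and proportion (0.3) are arbitrary choices for illustration, and the function names are hypothetical helpers, not part of any Vose product.

```python
from math import comb


def hypergeo_pmf(s, n, D, M):
    """Exact probability that s of n items drawn without replacement
    from a population of M come from a sub-population of size D."""
    if s < max(0, n - (M - D)) or s > min(n, D):
        return 0.0  # impossible count
    return comb(D, s) * comb(M - D, n - s) / comb(M, n)


def binomial_pmf(s, n, p):
    """Probability of s successes in n independent draws with success probability p."""
    return comb(n, s) * p**s * (1 - p) ** (n - s)


def max_abs_difference(n, D, M):
    """Largest pointwise gap between the hypergeometric pmf and its
    binomial approximation with p = D/M, over s = 0..n."""
    p = D / M
    return max(abs(hypergeo_pmf(s, n, D, M) - binomial_pmf(s, n, p)) for s in range(n + 1))


# Fixed sample size n = 10 and proportion D/M = 0.3; only the population size changes.
n = 10
for M in (20, 100, 1_000, 100_000):
    D = 3 * M // 10
    print(f"M = {M:>7}: max |hypergeometric - binomial| = {max_abs_difference(n, D, M):.5f}")
```

The gap shrinks as M grows relative to n; with M = 20 (only twice the sample size) the two distributions differ noticeably.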

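[Figure: the four parameters of the hypergeometric process: population M, sub-population D, sample size n, and number s in the sample that come from D.]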

The figure above shows the four parameters of the hypergeometric process: the population one is sampling from (M), the sub-population of interest (D), the number randomly sampled from the population (n), and the number (s) in that sample that come from D. We recommend drawing a diagram like this whenever you face a hypergeometric problem, to keep it all clear!
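A direct simulation makes the four parameters concrete. This minimal sketch (plain Python; `sample_s` is a hypothetical helper name) labels the first D items of a population of M as the sub-population, draws n of them without replacement with `random.Random.sample`, and counts s. Averaged over many trials, s should approach the hypergeometric mean nD/M.

```python
import random


def sample_s(n, D, M, rng):
    """Draw n items without replacement from a population of M in which
    items 0..D-1 form the sub-population; return how many came from it."""
    drawn = rng.sample(range(M), n)  # sampling without replacement
    return sum(1 for item in drawn if item < D)


rng = random.Random(42)  # fixed seed so the run is repeatable
M, D, n = 50, 12, 15
trials = 50_000
mean_s = sum(sample_s(n, D, M, rng) for _ in range(trials)) / trials
print(f"simulated mean of s = {mean_s:.3f}, theoretical n*D/M = {n * D / M}")
```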

  1. Number in a sample with a particular characteristic

  2. Number of samples to get a specific s

  3. Number of samples that were taken to have observed a specific s

  4. Estimate of population and sub-population sizes

Summary of results for the hypergeometric process

| Quantity | Formula | Notes |
|---|---|---|
| Number s from the sub-population in the sample | s = VoseHypergeo(n,D,M) | |
| Number of samples needed to observe s from the sub-population | n = s + VoseInvHypergeo(s,D,M) | |
| Number of samples n there were, having observed s from the sub-population | n = s + VoseInvHypergeo(s,D,M) | The last sample is known to have been from the sub-population. |
| Number of samples n there were, having observed s from the sub-population | | The last sample is not known to have been from the sub-population. This uncertainty distribution needs to be normalized. |
| Size of sub-population D | D = VoseHypergeoD(s,n,M) | This uncertainty distribution needs to be normalized. |
| Size of population M | M = VoseHypergeoM(s,n,D,max) | A maximum upper limit has to be placed on the possible range of values for M. |
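The quantities in the table can be sketched outside ModelRisk in plain Python. This is a minimal illustration under stated assumptions, not the ModelRisk implementation: (1) VoseInvHypergeo(s,D,M) is taken to count the draws not from the sub-population before the s-th one, as the formula n = s + VoseInvHypergeo(s,D,M) implies; (2) each uncertainty distribution uses a uniform prior over its admissible range, so it is the hypergeometric likelihood normalized over that range. This likelihood-based form is also assumed for the n row without a formula. All function names are hypothetical.

```python
from math import comb


def hypergeo_pmf(s, n, D, M):
    """P(s | n, D, M): probability that s of n items sampled without
    replacement from a population of M come from a sub-population of D."""
    if not (0 <= s <= n <= M and 0 <= D <= M):
        return 0.0
    return comb(D, s) * comb(M - D, n - s) / comb(M, n)


def samples_needed_pmf(n, s, D, M):
    """P(the s-th member of the sub-population turns up on draw n).

    The last draw is from the sub-population, so s - 1 of the first n - 1
    draws are too (assumed to match n = s + VoseInvHypergeo(s,D,M))."""
    if not (1 <= s <= D <= M and s <= n <= M):
        return 0.0
    return comb(n - 1, s - 1) * comb(M - n, D - s) / comb(M, D)


def normalized(likelihood, values):
    """Normalize a likelihood over a finite range of values (a uniform
    prior over that range) and return {value: probability}."""
    weights = {v: likelihood(v) for v in values}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no admissible values in the given range")
    return {v: w / total for v, w in weights.items()}


def uncertainty_n(s, D, M):
    """n unknown; s observed; last sample not known to be from the sub-population."""
    return normalized(lambda n: hypergeo_pmf(s, n, D, M), range(s, M - D + s + 1))


def uncertainty_D(s, n, M):
    """Sub-population size D, from s observed in a sample of n out of M."""
    return normalized(lambda D: hypergeo_pmf(s, n, D, M), range(s, M - n + s + 1))


def uncertainty_M(s, n, D, max_M):
    """Population size M, truncated at the upper limit max_M."""
    return normalized(lambda M: hypergeo_pmf(s, n, D, M), range(D + n - s, max_M + 1))


# Example: 3 of a random sample of 20 from a population of 100 had the characteristic.
dist_D = uncertainty_D(s=3, n=20, M=100)
most_likely_D = max(dist_D, key=dist_D.get)
print(most_likely_D)
```

Under the uniform prior assumed here, the likelihood for M does not fall fast enough to sum to a finite value over an unbounded range when s = 0 or s = 1, so `uncertainty_M` takes an explicit `max_M`, in line with the note in the table.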

 

 

Read on: Mixture processes
