Nearest neighbor distance
An example of a Monte Carlo simulation risk analysis model for the Statistical Modeling
The probability of a risk event occurring is often a function of the distance to the hazard, which itself is usually a random variable. For example: the risk that a house will be consumed by a wildfire within some period is related to the distance of the closest fire that occurs within that period. This example model shows how to determine the probability distribution of the nearest neighbor distance.
The risk analysis model is built using Excel with our Monte Carlo simulation add-in called ModelRisk. It uses the following formatting convention:
This example models the distance from a house to the nearest fire. The scatter plot depicts the idea – hit the F9 key on your computer to see it change.
We will assume that fires occur independently and randomly in time and space, at a rate of 0.032 fires per square km per year.
The model simulates the locations of a number of wildfires on a square with the house at its center. The mean number of fires is set to 200, which is large enough for the model to give quite precise results, and the dimensions of the square are calculated to give this mean number of fires at the expected frequency.
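The sizing of the square can be illustrated with a short calculation. This is a minimal sketch in Python rather than the spreadsheet's own formula; it assumes a one-year period, and the function name `square_side_km` is hypothetical.

```python
import math

FIRE_RATE = 0.032   # fires per km^2 per year (from the text)
MEAN_FIRES = 200    # target mean number of fires in the square (from the text)

def square_side_km(mean_fires: float, rate_per_km2: float) -> float:
    """Side of the square whose expected fire count equals mean_fires.

    Assumes a one-year period, so expected count = rate * area.
    """
    area_km2 = mean_fires / rate_per_km2
    return math.sqrt(area_km2)

side = square_side_km(MEAN_FIRES, FIRE_RATE)
print(f"area = {MEAN_FIRES / FIRE_RATE:.0f} km^2, side = {side:.2f} km")
```

Under these assumptions the square covers 6,250 km², which gives a side of roughly 79 km.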
On average there are 200 fires, but since these occur randomly the actual number follows a Poisson(200) distribution. Cell C11 calculates how many fires need to be simulated to be 99.9% sure that the table is long enough using the function VosePoisson(200,99.9%), which returns 245:
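The same percentile can be approximated outside ModelRisk. The sketch below is a plain-Python stand-in, not the VosePoisson implementation: it sums Poisson probabilities until the cumulative probability reaches the requested level. The name `poisson_quantile` is hypothetical.

```python
import math

def poisson_quantile(mean: float, prob: float) -> int:
    """Smallest k such that P(X <= k) >= prob for X ~ Poisson(mean)."""
    pmf = math.exp(-mean)   # P(X = 0)
    cdf = pmf
    k = 0
    while cdf < prob:
        k += 1
        pmf *= mean / k     # P(X = k) obtained from P(X = k - 1)
        cdf += pmf
    return k

# Number of table rows needed to hold the fires with 99.9% confidence
rows_needed = poisson_quantile(200, 0.999)
print(rows_needed)
```

The result should land close to the 245 rows reported above; small differences are possible if ModelRisk defines the percentile slightly differently.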
The model takes about 4 seconds to run 5,000 samples. It is set up to directly show two reports within the ModelRisk ResultsViewer at the end of the simulation, which are described below.
The first tab shows a histogram plot of the probability distribution of the distance to the nearest wildfire:
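For readers without the spreadsheet, the simulation idea can be sketched in plain Python. This is an illustration under stated assumptions, not the ModelRisk model: it draws the actual Poisson number of fires each time instead of filling a fixed-length table, it uses 2,000 samples to keep the run short, and all function names are hypothetical.

```python
import math
import random

FIRE_RATE = 0.032                        # fires per km^2 per year (from the text)
MEAN_FIRES = 200                         # mean number of fires in the square
SIDE = math.sqrt(MEAN_FIRES / FIRE_RATE) # side of the square in km

def sample_poisson(rng: random.Random, mean: float) -> int:
    """Poisson sample: count arrivals of a rate-`mean` process in one time unit."""
    count = 0
    t = rng.expovariate(mean)
    while t <= 1.0:
        count += 1
        t += rng.expovariate(mean)
    return count

def nearest_fire_distance(rng: random.Random) -> float:
    """Distance from the house (at the origin) to the closest simulated fire."""
    n_fires = sample_poisson(rng, MEAN_FIRES)
    half = SIDE / 2
    best = math.inf  # stays infinite only if no fire occurs, which is negligible here
    for _ in range(n_fires):
        x = rng.uniform(-half, half)
        y = rng.uniform(-half, half)
        best = min(best, math.hypot(x, y))
    return best

def simulate(n_samples: int, seed: int = 1) -> list:
    rng = random.Random(seed)
    return [nearest_fire_distance(rng) for _ in range(n_samples)]

distances = simulate(2000)
mean_distance = sum(distances) / len(distances)
print(f"mean nearest distance: {mean_distance:.2f} km")
```

A histogram of `distances` should resemble the plot described above, with most of the probability within a few kilometers of the house.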
In fact, probability theory gives the exact probability distribution for this distance. A circle of radius x around the house has an area of πx², within which the expected number of fires is λπx², where λ is the mean number of fires per unit area per year (in this case 0.032 fires/km²/year). From the Poisson distribution, the probability of there being no fires within that area is given by:
$$P(0) = e^{-\lambda \pi x^2}$$
The probability that there is at least one fire is then:
$$P(>0) = 1 - e^{-\lambda \pi x^2}$$
which is the same as the probability that the distance to the nearest fire is less than x, i.e. the cumulative probability distribution for x. Comparing this equation with the cumulative probability distribution for the Rayleigh distribution:
$$F(x) = 1 - e^{-x^2 / (2 b^2)}$$
We can see that these are identical if:
$$b = \frac{1}{\sqrt{2 \pi \lambda}}$$
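The match can be checked numerically. The sketch below assumes the common Rayleigh parameterization F(x) = 1 − exp(−x²/(2b²)), which leads to b = 1/√(2πλ); the function names are hypothetical and none of this is ModelRisk code.

```python
import math

FIRE_RATE = 0.032  # lambda: fires per km^2 per year (from the text)

def rayleigh_scale(rate: float) -> float:
    """Rayleigh parameter b that matches the nearest-fire distribution."""
    return 1.0 / math.sqrt(2.0 * math.pi * rate)

def nearest_distance_cdf(x: float, rate: float) -> float:
    """P(at least one fire within distance x), from the Poisson argument."""
    return 1.0 - math.exp(-rate * math.pi * x * x)

def rayleigh_cdf(x: float, b: float) -> float:
    """Rayleigh cumulative probability (assumed parameterization)."""
    return 1.0 - math.exp(-(x * x) / (2.0 * b * b))

b = rayleigh_scale(FIRE_RATE)
# The two expressions should agree at every distance
for x in (0.5, 1.0, 2.0, 5.0):
    assert math.isclose(nearest_distance_cdf(x, FIRE_RATE), rayleigh_cdf(x, b))
print(f"b = {b:.3f} km")
```

For the rate used here, b comes out at about 2.23 km.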
The Rayleigh distribution is entered in Cell C16. The second tab compares the cumulative distributions for both methods and shows that they match:
Whilst both models give the same result, the Rayleigh distribution approach is better for several reasons:
- It does not require building a model with long columns of calculations, whose length would need to be adjusted if the mean number of fires were larger;
- It is much faster – simulating just one cell instead of hundreds; and
- We could directly calculate any value on the cumulative probability curve: for example, the probability that the distance is less than x is given by VoseRayleighProb(x,b,1), and the distance that we are 90% sure the nearest fire will exceed is given by VoseRayleigh(b,1-0.9).
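The two direct calculations in the last point can be mirrored with closed-form expressions. This is a hedged sketch in plain Python, assuming the Rayleigh parameterization F(x) = 1 − exp(−x²/(2b²)); `rayleigh_cdf` and `rayleigh_quantile` are hypothetical stand-ins for the cumulative probability and percentile calculations, not the ModelRisk functions themselves.

```python
import math

FIRE_RATE = 0.032                                   # fires per km^2 per year
b = 1.0 / math.sqrt(2.0 * math.pi * FIRE_RATE)      # Rayleigh parameter

def rayleigh_cdf(x: float, b: float) -> float:
    """Probability that the nearest fire is closer than x."""
    return 1.0 - math.exp(-(x * x) / (2.0 * b * b))

def rayleigh_quantile(u: float, b: float) -> float:
    """Distance x at which the cumulative probability equals u (inverse CDF)."""
    return b * math.sqrt(-2.0 * math.log(1.0 - u))

# Probability that the nearest fire is within 1 km of the house
p_within_1km = rayleigh_cdf(1.0, b)

# Distance exceeded with 90% confidence, i.e. the 10th percentile
d90 = rayleigh_quantile(1 - 0.9, b)

print(f"P(distance < 1 km) = {p_within_1km:.4f}")
print(f"90% sure the nearest fire is further than {d90:.2f} km")
```

With the rate used in this example, the probability of a fire within 1 km is a little under 10%, and the distance exceeded with 90% confidence is roughly 1 km.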
ModelRisk can be used as a tool for probability modeling, not just Monte Carlo simulation. If you are familiar with probability theory, ModelRisk offers hundreds of functions for probability calculations.