Time series projection of events occurring randomly in time
An example of a Monte Carlo simulation risk analysis model for forecasting
Minimum software requirements: ModelRisk Basic edition
Technical difficulty: 2
Many things we are concerned about occur randomly in time: people arriving at a queue (customers, emergency patients, telephone calls into a centre, etc.); accidents, natural disasters, shocks to a market, terrorist attacks, etc. Naturally we may want to model these over time, perhaps to figure out whether we will have enough stock, vaccine, storage space, etc.
The natural contender for modelling random events is the Poisson distribution, which takes one parameter λ and models the number of random events occurring in a unit of time given that, on average, there should be λ events in a unit of time. Often, this expected (mean) number of events λ will increase or decrease over time, so we make λ a function of time. For example, we could use the following equation:
S(t) = Poisson(m*t + c)
The example model Poisson_random_walk 1 illustrates this idea:
This model can be used, for example, to describe vehicle accident claims made to an insurance company, or cases of a disease for a health authority: as the number of cars increases, the number of car crashes increases correspondingly according to some function; as the pollution level in a city increases, the number of people with respiratory disease increases.
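A minimal sketch of such a projection in Python, assuming a linear mean λ(t) = m*t + c so that S(t) = Poisson(m*t + c). The function names and the values of m and c are illustrative assumptions and are not taken from the example model. The sampler counts exponential inter-arrival times that fall within one unit of time, which is one standard way of generating Poisson counts:

```python
import random

def poisson_sample(lam, rng):
    """Number of events in one unit of time for a Poisson process of rate lam."""
    if lam <= 0:
        return 0
    count = 0
    elapsed = rng.expovariate(lam)  # time of the first event
    while elapsed <= 1.0:
        count += 1
        elapsed += rng.expovariate(lam)  # waiting time to the next event
    return count

def simulate_linear_trend(m, c, periods, seed=None):
    """One simulated path S(t) = Poisson(m*t + c) for t = 1..periods."""
    rng = random.Random(seed)
    path = []
    for t in range(1, periods + 1):
        lam = m * t + c
        if lam < 0:
            raise ValueError(f"Poisson mean is negative at t={t}")
        path.append(poisson_sample(lam, rng))
    return path

# Illustrative values (assumptions): the mean starts near 20 and grows by 0.5 per period.
path = simulate_linear_trend(m=0.5, c=20.0, periods=24, seed=1)
print(path)
```

Each call with a different seed gives another equally plausible path, which is what a Monte Carlo run of the model would repeat many times.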
The fractional variation of the series is much bigger in the top panel than in the bottom panel. This is because the standard deviation of Poisson(λ) counts equals √λ. Thus, the coefficient of variation (std. dev./mean) is √λ/λ = 1/√λ, which gets smaller as λ gets bigger: the larger the expected number of events, the smaller the fractional variation one would observe. This property of a Poisson process is very useful to insurance companies: the more people they cover, the more stable their liabilities become, and the less margin they need to cover themselves at a certain risk level. It is an example of when big is actually better.
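The 1/√λ relationship can be checked by simulation. The sketch below is illustrative only: the two values of λ and the sample size are arbitrary choices, and the helper names are hypothetical. It compares the simulated coefficient of variation with the theoretical value:

```python
import math
import random
import statistics

def poisson_sample(lam, rng):
    """Number of events in one unit of time for a Poisson process of rate lam."""
    if lam <= 0:
        return 0
    count = 0
    elapsed = rng.expovariate(lam)
    while elapsed <= 1.0:
        count += 1
        elapsed += rng.expovariate(lam)
    return count

def simulated_cv(lam, n_samples, seed=0):
    """Standard deviation divided by mean for n_samples Poisson(lam) draws."""
    rng = random.Random(seed)
    draws = [poisson_sample(lam, rng) for _ in range(n_samples)]
    return statistics.pstdev(draws) / statistics.mean(draws)

# Arbitrary illustrative means: one small, one large.
for lam in (4, 100):
    print(lam, round(simulated_cv(lam, 4000), 3), round(1 / math.sqrt(lam), 3))
```

The simulated value should sit close to 1/√λ in both cases, with the larger mean giving the smaller fractional variation.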
The equation S(t) = Poisson(m*t + c) has a limitation: if m is negative, then after time t = -c/m the equation will produce negative (i.e. impossible) values for the Poisson mean. If one is approaching such a situation it is worth considering the following equation, which is the basis of Poisson regression techniques:
S(t) = Poisson(exp(m*t + c))
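To illustrate the difference, the sketch below compares a linear mean m*t + c with a log-linear mean exp(m*t + c), the form used in Poisson regression. The parameter values are assumptions chosen so that both means start at 10; note that m and c are on different scales in the two forms:

```python
import math

def linear_mean(m, c, t):
    """Poisson mean under the linear form; can go negative when m < 0."""
    return m * t + c

def log_linear_mean(m, c, t):
    """Poisson mean under the log-linear form; always strictly positive."""
    return math.exp(m * t + c)

# Illustrative parameters (assumptions), both giving a mean of 10 at t = 0.
m_lin, c_lin = -2.0, 10.0            # linear mean reaches zero at t = -c/m = 5
m_log, c_log = -0.2, math.log(10.0)  # log-linear mean decays towards zero

for t in range(0, 9, 2):
    print(t, linear_mean(m_lin, c_lin, t), round(log_linear_mean(m_log, c_log, t), 3))
```

The linear mean turns negative once t passes 5, while the log-linear mean shrinks but never reaches zero.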
A variation of this model is to take account of seasonality by multiplying the expected number of events by seasonal indices (which should average to 1).
Seasonality for lambda
Imagine that an insurance company needs to create a risk analysis model of the number of car crashes that will occur in the country in the next 52 weeks. A reasonable assumption (which can be checked by analyzing the historic data) is that the number of car crashes n(t) over a period of time follows a Poisson process, i.e. each car crash is independent of any other. This is, of course, not exactly true since many car crashes involve at least two cars, and sometimes more than 10, but those cars are probably not all covered by the same insurance company. Here we will neglect this small inaccuracy, so:
n(t) = Poisson(λ(t))
The Poisson intensity parameter - λ(t) - is the mean, or expected, number of events per unit time. In this model it is not constant throughout the year because of two factors:
- The number of crashes depends on the number of cars in the country. Let's assume that the number of cars in the country will grow within a period of one year by 15%. And since the correlation between the two parameters is probably not perfect, the number of car crashes is expected to increase by 10% over the same period.
- The seasonality factor - the number of car crashes increases in the winter season due to several reasons like slippery roads and low visibility, and with certain yearly events like summer holidays, Christmas, etc. Seasonality is a repeated underlying pattern (perhaps disguised by overlying randomness) from one year to the next.
λ(t) = f(t) * Si
where f(t) is a trend function and Si is a seasonality factor for period i.
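A minimal sketch of λ(t) = f(t) * Si for the 52-week car crash example. Everything numeric here is an assumption made for illustration: a base of 200 crashes per week, a trend that rises linearly to 10% above the base by week 52, and four quarterly seasonal indices that average to 1. None of these values come from the example model:

```python
import random

def poisson_sample(lam, rng):
    """Number of events in one unit of time for a Poisson process of rate lam."""
    if lam <= 0:
        return 0
    count = 0
    elapsed = rng.expovariate(lam)
    while elapsed <= 1.0:
        count += 1
        elapsed += rng.expovariate(lam)
    return count

def trend(week, base, annual_growth=0.10, weeks_per_year=52):
    """f(t): expected weekly count before seasonality, growing linearly over the year."""
    return base * (1.0 + annual_growth * week / weeks_per_year)

def seasonal_index(week, indices, weeks_per_year=52):
    """Si: index of the equal-length season that contains week (1-based)."""
    block = weeks_per_year // len(indices)
    return indices[min((week - 1) // block, len(indices) - 1)]

def simulate_year(base, indices, seed=None):
    """Return (week, lambda, simulated count) for weeks 1..52."""
    if abs(sum(indices) / len(indices) - 1.0) > 1e-9:
        raise ValueError("seasonal indices should average to 1")
    rng = random.Random(seed)
    rows = []
    for week in range(1, 53):
        lam = trend(week, base) * seasonal_index(week, indices)
        rows.append((week, lam, poisson_sample(lam, rng)))
    return rows

QUARTERLY_INDICES = [1.2, 0.9, 0.8, 1.1]  # hypothetical; they average to 1
rows = simulate_year(base=200.0, indices=QUARTERLY_INDICES, seed=7)
for week, lam, count in rows[:3]:
    print(week, round(lam, 1), count)
```

Requiring the indices to average to 1 keeps the annual expected total governed by the trend alone, so seasonality only redistributes events within the year.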
The model Poisson_series 2 shows an example of the above technique:
Including other factors
The Poisson intensity parameter may also include other factors: in fact, as many factors as are needed to give a fair estimate of the mean number of events over a period of time. For example, if the same insurance company were to model the number of deaths among elderly people in transition-economy Country X, λ(t) might consist of the following factors:
- The trend factor, which is influenced by the changes in the population size and improvement of medical care;
- The seasonality factor. Elderly people tend to die more often in the hot and cold seasons, and less often in other seasons; and
- The economic factor. As Country X is going through economic hardship, many elderly people are affected by instability in the country, and their deaths can be caused by factors such as stress, cold (as they are not able to pay for central heating) and malnutrition.
This model Seasonal_Poisson_random_walk 5 provides an example:
Using a Pólya
The Pólya and Delaporte distributions are counting distributions that are similar to the Poisson but allow λ to be a random variable too, which has the effect of increasing the amount of variation around the mean. The Pólya is particularly useful because with one extra parameter, h, we can add some volatility to the expected number of events, as shown in the model Polya_time_series 4 3:
Notice the much greater peaks in the plot for this model compared to that of the previous model. Mixing a Poisson with a Gamma distribution to create the Pólya is a helpful tool because we can get the likelihood function directly from the pmf of the Pólya and therefore fit it to historical data. If the MLE value for h is very small then the Poisson model will be as good a fit and has one fewer parameter to estimate, so fitting the Pólya model is a useful first test.
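The Gamma mixing can be sketched as follows. The parameterisation is an assumption made for illustration and may not match the one used in the example model: the Poisson mean λ is multiplied by a Gamma variable with mean 1 and variance h, so that the count has mean λ and variance λ + h*λ². The function names and parameter values are hypothetical:

```python
import random
import statistics

def poisson_sample(lam, rng):
    """Number of events in one unit of time for a Poisson process of rate lam."""
    if lam <= 0:
        return 0
    count = 0
    elapsed = rng.expovariate(lam)
    while elapsed <= 1.0:
        count += 1
        elapsed += rng.expovariate(lam)
    return count

def polya_sample(lam, h, rng):
    """Poisson count whose mean is lam times a Gamma multiplier (mean 1, variance h)."""
    if h <= 0:
        return poisson_sample(lam, rng)  # no extra volatility: plain Poisson
    multiplier = rng.gammavariate(1.0 / h, h)  # shape 1/h, scale h
    return poisson_sample(lam * multiplier, rng)

def summarize(lam, h, n_samples=5000, seed=0):
    """Sample mean and variance of n_samples draws."""
    rng = random.Random(seed)
    draws = [polya_sample(lam, h, rng) for _ in range(n_samples)]
    return statistics.mean(draws), statistics.pvariance(draws)

# Illustrative values (assumptions): same mean, with and without extra volatility.
print("Poisson:", summarize(lam=20.0, h=0.0))
print("Polya:  ", summarize(lam=20.0, h=0.1))
```

Both cases should show a mean near 20, but the variance of the mixed version should be roughly three times larger (about 20 + 0.1 * 20² = 60), which is the source of the higher peaks.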
Pólya regression model
The linear equation used in the above two models to approximate the relationship of the expected number of events with time is often quite convenient, but one needs to be careful because a negative slope will ultimately produce a negative expected value, which is clearly nonsensical. This is why it is good practice to plot the expected value together with the modelled counts, as shown in the two figures above. As discussed above, the more correct Poisson regression model considers the log of the expected value of the number of counts to be a linear function of time:
ln(λ(t)) = λ0 + λ1*t
The Polya regression model fits a Pólya regression to data (year <=0) and forecasts the next three years of annual sports accidents:
Note that the λ0, λ1 and h parameters are determined by using Excel’s Solver:
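A fit of this kind is typically done by maximising the log-likelihood over λ0, λ1 and h. Whether the spreadsheet's Solver setup does exactly this is an assumption, as is the parameterisation below (mean exp(λ0 + λ1*t) and variance mean + h*mean², i.e. a negative binomial pmf). The data values are hypothetical and only serve to make the sketch runnable:

```python
import math

def polya_log_pmf(k, mean, h):
    """Log pmf of a Gamma-mixed Poisson with the given mean and variance mean + h*mean**2."""
    r = 1.0 / h
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mean))
            + k * math.log(mean / (r + mean)))

def log_likelihood(params, years, counts):
    """Sum of log pmf values with log-linear mean exp(lam0 + lam1 * year)."""
    lam0, lam1, h = params
    total = 0.0
    for year, count in zip(years, counts):
        mean = math.exp(lam0 + lam1 * year)
        total += polya_log_pmf(count, mean, h)
    return total

# Hypothetical historical data for years <= 0 (not from the example model).
years = [-5, -4, -3, -2, -1, 0]
counts = [31, 28, 40, 37, 52, 45]

# An optimiser would search for the parameters that maximise this value.
print(log_likelihood((3.8, 0.1, 0.05), years, counts))

# Expected counts for the next three years under the same trial parameters.
forecast_means = [math.exp(3.8 + 0.1 * year) for year in (1, 2, 3)]
print([round(x, 1) for x in forecast_means])
```

The trial parameters are not fitted values. In practice any numerical optimiser can take the place of Solver, with h constrained to be positive.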
ModelRisk needs to be installed in order for the model to work.