Time series models with leading indicators

Leading indicators are variables whose movement has some relationship to the movement of the variable you are actually interested in.

The leading indicator may move in the same direction as the variable of interest:

or in the opposite direction:

In order to evaluate the leading indicator relationship, you will have to determine:

The causal relationship.
The quantitative nature of the relationship.

The causal relationship is critical. It gives a plausible argument for why movement in the leading indicator should presage movement in the variable of interest. It is very easy to find apparent leading indicator patterns if you try out enough variables, but if you cannot argue logically why there should be a relationship, the observed relationship is likely to be spurious. Preferably make that argument before you analyse the potential indicator variable: it is always easy to argue something after the fact.

The quantitative nature of the relationship should come from a mixture of historic data analysis and practical thinking. Some leading indicators have a cumulative effect over time (e.g. rainfall as an indicator of the water available for use at a hydro-electric plant) and so need to be summed or averaged. Other leading indicators respond more quickly than the variable you are interested in to the same, perhaps unmeasurable, causal variable (if the causal variable were measurable, you would use it as the leading indicator instead), so your variable exhibits the same pattern after a time lag.
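The two transformations described above, accumulating an indicator over a trailing window and shifting it by a lag, can be sketched as follows. This is a minimal illustration using NumPy; the function names `trailing_sum` and `lagged`, the window length and the rainfall figures are all made up for the example and do not come from the example model.

```python
import numpy as np

def trailing_sum(x, window):
    """Sum of the current and previous (window - 1) values.

    Entries are NaN until a full window of data is available.
    """
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    csum = np.cumsum(np.insert(x, 0, 0.0))
    out[window - 1:] = csum[window:] - csum[:-window]
    return out

def lagged(x, lag):
    """Shift x so that position t holds x[t - lag]; the first `lag` entries are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    if lag == 0:
        out[:] = x
    else:
        out[lag:] = x[:-lag]
    return out

# Hypothetical rainfall per period (illustrative values only)
rainfall = [10, 0, 5, 20, 0, 15]
print(trailing_sum(rainfall, 3))  # cumulative indicator over 3 periods
print(lagged(rainfall, 2))        # the same series delayed by 2 periods
```

Dividing the trailing sum by the window length gives the corresponding trailing average.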

The analysis of historic data to determine the leading indicator relationship will depend largely on the type of causal relationship. Linear regression is one possible method: one regresses historic values of the variable of interest against the lead indicator values, using either a specific lag time if that can be deduced from the causal argument, or, if the lag time has to be estimated, the lag time that gives the greatest r-squared. Note that a forecast can only extend into the future by as many periods as the lag time: beyond that, one needs to forecast the lead indicator too.
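As a rough sketch of estimating the lag by regression, the code below regresses Y(t) on X(t - d) for each candidate lag d and keeps the lag with the greatest r-squared. The function name `r_squared_by_lag`, the synthetic data and the true lag of 5 are assumptions made purely for illustration. For a single predictor, r-squared equals the squared correlation coefficient, which is what the code computes.

```python
import numpy as np

def r_squared_by_lag(x, y, max_lag):
    """Return {lag: r-squared} for the simple regression of y[t] on x[t - lag]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = {}
    for lag in range(max_lag + 1):
        xs = x[:len(x) - lag]  # x[t - lag]
        ys = y[lag:]           # y[t]
        r = np.corrcoef(xs, ys)[0, 1]
        scores[lag] = r ** 2
    return scores

# Synthetic data (illustrative only): y follows x with a lag of 5 periods
rng = np.random.default_rng(0)
x = rng.normal(size=200).cumsum()
true_lag = 5
y = np.empty_like(x)
y[true_lag:] = 2.0 * x[:-true_lag] + 1.0
y[:true_lag] = 1.0  # no earlier x is available, so use a placeholder
y += rng.normal(scale=0.1, size=x.size)

scores = r_squared_by_lag(x, y, max_lag=15)
best_lag = max(scores, key=scores.get)
print(best_lag, scores[best_lag])
```

Scanning many lags is itself a form of trying out many variables, so a lag chosen this way deserves the same causal scrutiny as the indicator.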

The example model Leading_indicator provides a fairly simple example in which the historic data (used to create the first figure above) of the variable of interest Y are compared visually with lead indicator X data for different lag periods. The closest pattern match occurs for a lag d of 11 periods:

A scatter plot of Y(t) against X(t-11) shows a strong linear relationship, so a least squares regression seems appropriate:

The regression parameters are:

Slope = 0.04555

Intercept = -0.01782

SteYX (standard error of the predicted Y) = 0.1635

Note that we could use the Linear regression parametric Bootstrap (see type A) to give us uncertainty about these parameters if we wished.
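The bootstrap method referred to above is not described in this section. One common reading of a parametric bootstrap for a linear regression is sketched below: fit the line, simulate new Y values from the fitted Normal model at the observed X values, refit, and repeat. The function names, the number of replicates and the use of NumPy are assumptions for illustration, not the method of the example model.

```python
import numpy as np

def fit_line(x, y):
    """Least squares slope, intercept and standard error of the y estimate (n - 2 dof)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    steyx = np.sqrt(np.sum(resid ** 2) / (len(x) - 2))
    return slope, intercept, steyx

def parametric_bootstrap(x, y, n_boot=1000, seed=0):
    """Return an (n_boot, 3) array of bootstrap (slope, intercept, steyx) replicates."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    slope, intercept, steyx = fit_line(x, y)
    fitted = slope * x + intercept
    draws = np.empty((n_boot, 3))
    for b in range(n_boot):
        # Simulate a new data set from the fitted model, then refit it
        y_sim = fitted + rng.normal(scale=steyx, size=len(x))
        draws[b] = fit_line(x, y_sim)
    return draws
```

The spread of each column of the returned array, for example its 5th and 95th percentiles, gives an indication of the uncertainty about the corresponding parameter.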

The resultant model is then:

Y(i)=Normal(0.04555*X(i-11)-0.01782,0.1635)

which we can use to predict {Y(1)...Y(11)}:
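A minimal simulation sketch of this forecast is shown below. It uses the regression parameters quoted above and assumes that the second argument of Normal is a standard deviation. The X values passed in are hypothetical, since the data of the example model are not reproduced here, and `simulate_forecast` is a name invented for the illustration.

```python
import numpy as np

SLOPE, INTERCEPT, STEYX, LAG = 0.04555, -0.01782, 0.1635, 11

def simulate_forecast(x_recent, n_sims=10_000, seed=0):
    """Simulate Y(1)...Y(LAG) from the last LAG observed X values.

    x_recent holds X(-10)...X(0), oldest first, so that column i - 1 of the
    result uses X(i - 11). Returns an array of shape (n_sims, LAG).
    """
    x_recent = np.asarray(x_recent, dtype=float)
    if x_recent.shape != (LAG,):
        raise ValueError(f"expected exactly {LAG} X values")
    rng = np.random.default_rng(seed)
    mean = SLOPE * x_recent + INTERCEPT
    return rng.normal(loc=mean, scale=STEYX, size=(n_sims, LAG))

# Hypothetical recent X values (illustrative only)
x_recent = np.linspace(10.0, 20.0, LAG)
sims = simulate_forecast(x_recent)
lower, upper = np.percentile(sims, [5, 95], axis=0)  # 90% band per period
print(sims.mean(axis=0))
```

Forecasting beyond Y(11) would require X values that have not yet been observed, so the lead indicator itself would have to be forecast.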

