A few weeks ago, I wrote a piece describing the construction of an in-running model for the final margin of an AFL game. Today, I'm going to use the same data set (viz., score progression data from the www.afltables.com website, covering every score in every AFL game from 2008 to 2016) to construct a different in-running model, this one to project the final total score.
As well as using the same dataset, I'll also use the same statistical algorithm (quantile regression), and a somewhat similar set of regressors and functional form.
Specifically, the variables I'll use are:
- GameFraction - the proportion of the game completed
- CurrentHomeScore - the home team score at a particular game fraction
- CurrentAwayScore - the away team score at a particular game fraction
- CurrentTotalScore - CurrentHomeScore + CurrentAwayScore
- PreGameExpectedScore - the total score predicted, pre-game using the MoSHBODS Team Rating System
- HomeTeamSSRun - the number of consecutive scoring shots registered by the home team as at a particular game fraction (this variable will be zero if the Away team scored last)
- AwayTeamSSRun - the number of consecutive scoring shots registered by the away team as at a particular game fraction (this variable will be zero if the Home team scored last)
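The two run variables can be computed directly from the chronological sequence of scoring shots. A minimal Python sketch of that definition (the function name and the 'H'/'A' encoding are my own, not from the original data set):

```python
def scoring_shot_runs(shots):
    """Given a chronological list of scoring shots ('H' for home, 'A' for away),
    return (HomeTeamSSRun, AwayTeamSSRun) as at the latest shot.
    A shot by one team resets the other team's run to zero."""
    home_run = away_run = 0
    for team in shots:
        if team == 'H':
            home_run += 1
            away_run = 0  # the away run ends as soon as the home team scores
        else:
            away_run += 1
            home_run = 0  # and vice versa
    return home_run, away_run
```

So a game whose last three scoring shots all came from the away team would have HomeTeamSSRun = 0 and AwayTeamSSRun = 3.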
The quantile regression algorithm returns a separate regression output for each of the quantiles for which output is requested. I requested outputs for the 5th, 25th, 50th, 75th, and 95th percentiles.
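For readers unfamiliar with the mechanics: quantile regression fits a separate set of coefficients for each requested quantile by minimising the "pinball" (or check) loss rather than squared error. A toy Python illustration with invented data - minimising that loss over a constant recovers the relevant sample quantile, which is what makes the technique suitable for estimating percentiles:

```python
def pinball_loss(y, pred, q):
    """Average check loss at quantile q: under-predictions are
    weighted by q, over-predictions by (1 - q)."""
    return sum(q * (yi - pred) if yi >= pred else (1 - q) * (pred - yi)
               for yi in y) / len(y)

scores = [60, 70, 80, 90, 200]  # invented final totals

# the constant minimising the q = 0.5 loss is the sample median ...
best_50 = min(range(50, 210), key=lambda c: pinball_loss(scores, c, 0.5))
# ... and for q = 0.95 it sits near the top of the sample
best_95 = min(range(50, 210), key=lambda c: pinball_loss(scores, c, 0.95))
```

A full quantile regression does the same minimisation, but with the constant replaced by a function of the regressors listed above.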
The equation for the 50th percentile - which we can treat as our model for the "most likely" final total score - appears in the diagram at right.
(Producing it has given me a new-found disdain for LaTeX.)
It comes from fitting a model to 99 points in each game, these relating to the situation after 1%, 2%, 3%, 4%, and so on up to 99% of the game has been completed. Each quarter of a game is deemed to represent 25% of the entire contest, with points then equally-spaced within each quarter. With this approach, in a typical game we are sampling the score at intervals of roughly 30-40 seconds.
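In code, the sampling scheme looks something like the following sketch of the mapping just described (not the author's actual implementation):

```python
def game_fraction(quarter, frac_within_quarter):
    """Each quarter is deemed to be exactly 25% of the game,
    with sample points equally spaced within it."""
    return 0.25 * (quarter - 1) + 0.25 * frac_within_quarter

# the 99 per-game sample points: 1%, 2%, ..., 99%
grid = [g / 100 for g in range(1, 100)]
```

Note that this treats, say, the halfway point of the second quarter as GameFraction = 0.375 regardless of how much clock time each quarter actually consumed.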
The equation tells us how to project the final total score for a game after a proportion GameFraction of it has been completed, given that we know the Home and Away team scores at that point, the length of the Home or Away team's Scoring Shot run at that point, and MoSHBODS' pre-game prediction.
Because each term in the equation is multiplied by (1-GameFraction) raised to some power, a term's influence diminishes as the game progresses if its exponent is positive, and increases as the game progresses if its exponent is negative. In this equation, only the CurrentTotalScore term has a negative exponent, and a very small one at that, which, together with a coefficient near 1, ensures that the output of the equation converges on the CurrentTotalScore as we near the end of the game. Which is, of course, exactly what we'd want.
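To make that functional form concrete, here is a stripped-down two-term sketch in Python. The coefficients and exponents a, b, c, d are invented purely for illustration - the fitted values are those in the diagram - and the scoring-shot-run terms are omitted:

```python
def projected_total(gf, current_total, pre_game_expected,
                    a=1.0, b=-0.005, c=1.0, d=0.8):
    """Illustrative shape only: each term is scaled by (1 - gf)
    raised to some power. The small negative exponent b on the
    CurrentTotalScore term means its weight drifts up towards (and
    slightly past) 1 as gf -> 1, while the positive exponent d
    shrinks the pre-game expectation's influence to nothing."""
    return (a * current_total * (1 - gf) ** b
            + c * pre_game_expected * (1 - gf) ** d)
```

Early in the game the projection leans on the pre-game expectation; late in the game it is dominated by the current total, mirroring the behaviour described above.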
Below is an example output for all fitted percentiles for the Collingwood v Gold Coast game from late in 2016.
(Please click on it to access a larger version).
In the chart we map the fitted values for all five of the percentiles for which an equation was fitted (5th, 25th, 50th, 75th and 95th) across the entire game, and overlay the MoSHBODS pre-game expectation of the total score (blue line), the actual final total score (black line), and a projected score, calculated simply by dividing the current total score by the fraction of the game played (green dotted line). We extrapolate the final score only from quarter-time onwards because extrapolations for earlier points in the game tend to be highly variable (for example, if a goal were scored in the first 30 seconds, the extrapolated final total would be 600 points).
The regression specification was chosen to minimise the Akaike Information Criterion (AIC) for the model fitted to the 50th percentile. In this limited sense, then, the model is relatively good. That optimised AIC value reveals nothing, however, about the practical utility of the model as an estimator of the 50th percentile.
One way that we can measure the model's performance in a useful and intuitive way is to proceed as we did in the previous blog and estimate the model's calibration. If the model for the 50th percentile is well-calibrated, we'd expect the actual final total score to fall below it 50% of the time, and to rise above it 50% of the time. We can perform similar calculations for the models for the four other percentiles.
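The calibration check itself is simple to express in code: count how often the actual final total falls below the projection. A sketch (function and variable names are mine):

```python
def fraction_below(actual_totals, projections):
    """Proportion of games whose actual final total fell below the
    model's projection. For a well-calibrated model of the q-th
    percentile, this should come out close to q."""
    below = sum(actual < proj
                for actual, proj in zip(actual_totals, projections))
    return below / len(actual_totals)
```

Running this for the 50th percentile model over a large set of games should return a value near 0.5; for the 95th percentile model, a value near 0.95.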
If we do that, and consider the results for each of the quarters separately, we obtain the results as shown at left.
A perfectly-calibrated model for the Xth percentile would provide estimates such that the final total score would fall below them exactly X% of the time across some sufficiently large set of estimates. We see that all five models seem well-calibrated across all four quarters. In particular, the models for the 5th, 50th and 95th percentiles appear to be especially good.
We can also use this approach to estimate the models' calibration for each of the 1,786 games in the sample. For this purpose we calculate the proportion of times during the game that the 5th percentile model provided a projection of the final total that was above the actual final total, and then do the same for the four other percentiles. Lastly, we add the absolute differences between these proportions and the relevant percentage. So, for example, if the 5th percentile model provided projections that were too high 6% of the time in a game, its contribution to the sum would be 1 percentage point since it would, ideally, have provided such projections only 5% of the time. Similarly, if the 50th percentile model provided estimates that were too high 55% of the time, its contribution would be 5 percentage points.
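That per-game miscalibration score is a one-liner. A sketch, using the worked percentages from above:

```python
PERCENTILES = [0.05, 0.25, 0.50, 0.75, 0.95]

def miscalibration(prop_above):
    """prop_above[q] is the proportion of a game's projections from the
    q-th percentile model that exceeded the actual final total.
    Returns the sum of absolute deviations from the ideal proportions;
    0 would indicate perfect calibration on that game."""
    return sum(abs(prop_above[q] - q) for q in PERCENTILES)

# e.g. 5th percentile model too high 6% of the time (ideal: 5%) and
# 50th percentile model too high 55% of the time (ideal: 50%):
example = {0.05: 0.06, 0.25: 0.25, 0.50: 0.55, 0.75: 0.75, 0.95: 0.95}
```

For the example game the score would be 0.01 + 0.05 = 0.06, or 6 percentage points in the units used above.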
Proceeding in this way allows us to rank the models' performance on every game and identify the games for which calibration was poorest.
The chart below is for the game where calibration was poorest of all.
It's not difficult to see why the models struggled with this game: an 83-point final term after three quarters averaging just 48 points each.
By way of contrast, the chart below is for the game where calibration was best. Here there was just enough variability in the scoring rate to bounce the model estimates around sufficiently for them all to be too high in almost exactly the right proportions - a "Goldilocks" game, if you like.
Note in particular how much better the 50th percentile model performed than naively projecting the final total based on the current scoring rate. The game produced only 6 goals in the first half - well below MoSHBODS' pre-game expectation - but 16 in the second. The 50th percentile model reacted well to the lower-than-expected scoring, though it did overshoot a little.
There's one final performance metric worth estimating for the 50th percentile model and that is its mean absolute error (MAE) relative to that of naive extrapolation of the score. To calculate this we take the 99 estimates provided by the 50th percentile model for a game and then average the absolute differences between each of these and the actual final total score. We do this for the 50th percentile model for all 1,786 games and then take the average.
Next we perform the same set of calculations for our naive model for which final score projections are made at the same 99 points by dividing the actual total score at that point by the fraction of the game completed. So, for example, if 47 points had been scored to quarter time, the naive model projection of the total would be 47/0.25 = 188 points.
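Both calculations are easily expressed in code. A sketch of the naive extrapolation and the per-game error measure (names are mine):

```python
def naive_projection(current_total, game_fraction):
    """Straight-line extrapolation of the current scoring rate:
    e.g. 47 points by quarter time projects 47 / 0.25 = 188."""
    return current_total / game_fraction

def mean_abs_error(projections, actual_final_total):
    """Average absolute distance between a game's projections
    (e.g. the 99 in-running estimates) and its actual final total."""
    return sum(abs(p - actual_final_total)
               for p in projections) / len(projections)
```

Applying mean_abs_error to the 50th percentile model's projections and to the naive projections over the same 99 points, then averaging across all 1,786 games, gives the two figures being compared.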
If you look back at either of the model output charts above, what we're calculating for the 50th percentile model is the average distance between the heavy, jagged black line and the straight black line of the actual final total score. For the naive model we're measuring the average distance between the green dotted line and the straight black line.
Doing this, separately for projections made in each quarter, and excluding those made in the 1st quarter altogether, yields the results in the table shown at left.
We see two things here. Firstly that, as we'd expect, the size of the absolute errors of both models declines as the game gets nearer to concluding.
Secondly, and more importantly, we see that the 50th percentile model outpredicts the naive model in all three quarters, though by progressively smaller amounts in each quarter.
We can see the diminishing superiority even more clearly if we plot the mean absolute errors of the two models at each value of GameFraction from 25% to 99%, which is what we've done in the chart at right.
One interesting feature of this chart is the behaviour of the average absolute errors for the two models across time. The naive model's errors follow three distinct phases, declining steeply from the 25% point to about the 35% point, then more slowly until about three-quarter time, then more rapidly again across the final quarter.
The average absolute errors for the 50th percentile model decline at a fairly constant (and lower) rate until just after three-quarter time, and then match the trajectory of the errors for the naive model.
What this flags is variability in the actual scoring rate across the course of a typical AFL game. This variability "fools" the naive model, but is captured relatively effectively by the 50th percentile model.