In the previous post we looked at some simple models for projecting the final score of an AFL game based solely on the scores at the quarter-time breaks. I said there that I'd revisit that modelling, incorporating pre-game market information, providing that I could source it in large enough volume.
The Australia Sports Betting site carries historical market and results data for a number of sports, AFL included, from which I was able to extract pre-game opening and closing Pinnacle Overs/Unders information for just over 400 games across the 2013 to 2015 seasons. For those of you less familiar with the wagering terminology, bettors are often interested in two snapshots of an Overs/Unders market: the price and total first posted by a bookmaker (the "opening" market), and the price and total available from that bookmaker just before the game starts (the "closing" market). It's this information that I've extracted for the modelling.
Today we'll look at how incorporating such opening and closing data might improve the fit of our models to the actual final total score.
Fifteen models in total were fitted: five using the score information available at Quarter Time, five using the score information available at Half Time, and five using the score information available at Three-Quarter Time. Within each block of five, the first model uses only the score, and the remaining four add the bookmaker's opening total (raw, then adjusted) and closing total (raw, then adjusted). The coefficients of those fifteen models appear in the table below.
Looking firstly at the left-hand block of five models, the first is a replica of the model built in the previous post, except that now we're using only the information from the 411 games that we're considering today. This model explains 27.6% of the variability in final total scores for those 411 games and suggests that we should expect a total score equal to 121.2 plus 1.25 times the total score at Quarter Time.
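As a minimal sketch, the baseline model can be written as a one-line projection using the coefficients quoted above, alongside the usual R-squared calculation (the share of the variability in final totals that a model's projections explain). The function names are hypothetical, and the example input is illustrative rather than drawn from the 411-game dataset.

```python
from typing import Sequence


def project_total_baseline_qt(qt_total: float) -> float:
    """Model 1 (Quarter Time, no bookmaker data): 121.2 + 1.25 x Quarter Time total."""
    return 121.2 + 1.25 * qt_total


def r_squared(actuals: Sequence[float], predictions: Sequence[float]) -> float:
    """Share of the variability in actual totals explained by the predictions."""
    mean_actual = sum(actuals) / len(actuals)
    ss_total = sum((a - mean_actual) ** 2 for a in actuals)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actuals, predictions))
    return 1.0 - ss_residual / ss_total


# Hypothetical game: 40 points scored in total by Quarter Time,
# which projects to 121.2 + 1.25 * 40 = 171.2 points.
print(round(project_total_baseline_qt(40), 1))
```

An R-squared of 0.276 for this model would correspond to the 27.6% figure in the text.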
In the second model we add the bookmaker's pre-game opening total. This model suggests that we should take 68% of that pre-game total (we can think of this as a Bayesian prior of sorts) and add to it 4.6 points plus 1.16 times the total score at Quarter Time. This new model explains about another 6 percentage points of the variability in final total scores. Its superiority over the first model is also evident in the residual quantiles at the foot of the table, which show that, in almost every case, the residual at a given quantile is smaller in magnitude for the second model than for the first. The practical implication is that confidence intervals derived from the second model (intervals to which we can attach some probability of including the actual final total) are narrower than those derived from the first.
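Here is a sketch of the second model's projection, together with one simple way of turning a model's residual quantiles into an interval for the final total. The function names are hypothetical, the 80% coverage (10th to 90th percentiles) is an arbitrary illustrative choice, and the residuals passed in would be the model's own past errors (actual minus projected), not figures from the table.

```python
from statistics import quantiles
from typing import Sequence, Tuple


def project_total_opening_qt(opening_total: float, qt_total: float) -> float:
    """Model 2 (Quarter Time + opening total): 4.6 + 0.68 x opening + 1.16 x QT total."""
    return 4.6 + 0.68 * opening_total + 1.16 * qt_total


def empirical_interval(point_projection: float,
                       residuals: Sequence[float]) -> Tuple[float, float]:
    """Approximate 80% interval from the 10th and 90th percentiles of past residuals.

    Residuals are assumed to be actual minus projected totals from the same model.
    """
    deciles = quantiles(residuals, n=10)  # nine cut points: 10th, 20th, ..., 90th
    return point_projection + deciles[0], point_projection + deciles[-1]


# Hypothetical game: opening total of 170 and 40 points scored by Quarter Time,
# giving 4.6 + 115.6 + 46.4 = 166.6 points.
projection = project_total_opening_qt(170, 40)
print(round(projection, 1))
```

Because the interval is just the point projection shifted by residual quantiles, a model whose residual quantiles are smaller in magnitude yields a narrower interval for the same coverage.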
To understand what's been done for the third model, you first need to know that the Pinnacle bookmaker sometimes posts his market at other than even-money prices. That means he doesn't assess the probability of a game finishing under or over his posted total as exactly 50%. For these games we need to adjust the total by inferring what total he would regard as an even-money wager.
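To make "other than even-money prices" concrete, the sketch below converts a pair of decimal Over and Under prices into an implied probability of the game going Over. It assumes decimal odds and removes the bookmaker's margin by scaling both implied probabilities proportionally; that normalisation method and the function name are assumptions for illustration, not necessarily what was done for this analysis.

```python
def implied_over_probability(over_price: float, under_price: float) -> float:
    """Probability of the game going Over implied by decimal Over/Under prices.

    Assumes the bookmaker's margin (overround) is removed by scaling both raw
    implied probabilities proportionally so that they sum to 1.
    """
    raw_over = 1.0 / over_price
    raw_under = 1.0 / under_price
    return raw_over / (raw_over + raw_under)


# Equal prices imply exactly 50%; a shorter Over price implies more than 50%.
print(implied_over_probability(1.90, 1.90))
print(round(implied_over_probability(1.80, 2.00), 4))
```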
I've made this adjustment by assuming that final total scores are Normally distributed around some unbiased expected value, with a standard deviation of 28 points for the opening market and 26 points for the closing market. (These standard deviations were estimated empirically by comparing actual totals with expected totals for those games where even-money prices were posted.) Combining this assumption with the probability implicit in the bookmaker's prices, we can invert the Normal Cumulative Distribution Function to reverse-engineer a "fair" total. In general these adjustments are very small, never more than a couple of points.
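Under the Normal assumption, if the posted total is L and the implied probability of Over is p, then 1 - Φ((L - μ)/σ) = p, which rearranges to μ = L + σ·Φ⁻¹(p). The sketch below applies that rearrangement using the standard deviations quoted above. The function names are hypothetical, and the sigma estimator (the sample standard deviation of actual minus expected totals) is one plausible reading of "estimated empirically", not necessarily the exact method used here.

```python
from statistics import NormalDist, stdev
from typing import Sequence

# Standard deviations quoted in the text, estimated from even-money games.
SIGMA_OPENING = 28.0
SIGMA_CLOSING = 26.0


def fair_total(posted_total: float, p_over: float, sigma: float) -> float:
    """Total at which Over and Under would be even money.

    If final totals ~ Normal(mu, sigma) and P(total > posted_total) = p_over, then
    1 - Phi((posted_total - mu) / sigma) = p_over, so
    mu = posted_total + sigma * Phi^-1(p_over).
    """
    return posted_total + sigma * NormalDist().inv_cdf(p_over)


def estimate_sigma(actual_totals: Sequence[float],
                   expected_totals: Sequence[float]) -> float:
    """Sample standard deviation of actual minus expected totals (assumed method)."""
    errors = [a - e for a, e in zip(actual_totals, expected_totals)]
    return stdev(errors)


# Hypothetical closing market: total of 170.5 with a 52% implied chance of Over.
print(round(fair_total(170.5, 0.52, SIGMA_CLOSING), 1))
```

For that hypothetical market the adjustment is about 1.3 points upward, in line with the text's observation that adjustments are never more than a couple of points.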
For all that effort, we see practically no improvement in the third model compared to the second.
In the fourth and fifth models we use closing rather than opening data, and clearly benefit from this data being set closer to game time. Adding the unadjusted closing total allows us to explain about another 3 percentage points of the variability in total scores, and adjusting this total for games without even-money prices yields about another 0.5 percentage points.
Note that the "weighting" of the bookmaker total in these two models (about 80 to 83%) is much higher than in the equivalent models using the opening data instead (68%). This reflects the greater information content of the closing data relative to the opening data, even after accounting for the score at Quarter Time.
The second block of models can be interpreted in the same way as the first. Including bookmaker data again improves the fit, but now by much less (the difference in R-squared between the best and worst models is now under 5 percentage points), and using the adjusted rather than the raw bookmaker data adds much less value.
And, finally, the third block of models can be interpreted in the same way and reveals an even smaller contribution from the bookmaker data and virtually no benefit from the adjustment process.
As a final observation, note how the relative weighting of in-game scoring and pre-game bookmaker opinion varies as the game progresses. At Quarter Time our best model uses 83% of the adjusted pre-game closing total, at Half Time it uses 55%, and at Three-Quarter Time it uses only 28%. Simply put, in seeking to project the final total, we should pay more attention to the actual score and less attention to the bookmaker's pre-game opinion as the game unfolds. Intuitively that's pretty obvious, but the models give us an idea of how much attention we should pay to each at different points in the game.
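The shrinking weights can be made concrete by computing how many points of the projected total come from the bookmaker's pre-game figure at each break. The sketch uses only the three weights quoted above; the intercepts and in-game score coefficients of those models aren't given in the text, so they are deliberately left out, and the 170-point adjusted closing total is hypothetical.

```python
from typing import Dict

# Weights on the adjusted pre-game closing total in the best model at each break,
# as quoted in the text.
PRE_GAME_WEIGHT = {"Quarter Time": 0.83, "Half Time": 0.55, "Three-Quarter Time": 0.28}


def bookmaker_contribution(adjusted_closing_total: float) -> Dict[str, float]:
    """Points of the projected final total attributable to the pre-game total."""
    return {stage: weight * adjusted_closing_total
            for stage, weight in PRE_GAME_WEIGHT.items()}


# Hypothetical adjusted closing total of 170 points.
for stage, points in bookmaker_contribution(170.0).items():
    print(f"{stage}: {points:.1f} points from the pre-game total")
```

For that hypothetical total, the bookmaker component falls from 141.1 points at Quarter Time to 93.5 at Half Time and 47.6 at Three-Quarter Time, with the in-game score terms making up the rest of the projection.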