Imagine that you allowed me to ask you for just one piece of data about an upcoming AFL game. Armed with that single piece of data I contend that I will predict the margin of that game and, on average, be within 5 goals of the actual margin. Further, one-half of the time I'll be within 4 goals of the final margin and one-third of the time I'll be within 3 goals. What piece of data do you think I am going to ask you for?
I'll ask you for the bookies' price for the home team, true or notional, and I'll plug that piece of data into this equation:
Predicted Margin = -49.17 + 96.31 x (1 / Home Team Price)
(A positive margin means that the Home Team is predicted to win, a negative margin that the Away Team is predicted to win. So, at a Home Team price of $1.95 the Home Team is predicted to win; at $1.96 the Away Team is predicted to squeak home.)
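For anyone who wants to check that arithmetic, here is a minimal Python sketch of the equation above. The function name bookie_margin and the sample prices are my own illustrative choices; only the two coefficients come from the fitted equation.

```python
def bookie_margin(home_price: float) -> float:
    """Predicted home-team margin (in points) from the home team's price.

    Coefficients are those of the equation in the text; the function
    name is a hypothetical label used for illustration only.
    """
    return -49.17 + 96.31 * (1.0 / home_price)


# The price at which the predicted margin is exactly zero solves
# 96.31 / price = 49.17, so it is simply the ratio of the coefficients.
break_even_price = 96.31 / 49.17

# Illustrative prices only, not drawn from any actual game.
for price in (1.20, 1.95, 1.96, 4.00):
    print(f"${price:.2f}: {bookie_margin(price):+.1f} points")

print(f"Break-even price: ${break_even_price:.3f}")
```

The break-even price falls between $1.95 and $1.96, which is why the predicted winner flips between those two prices.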
Over the period 2006 to 2009 this simple equation has performed as I described in the opening paragraph and explains 23.6% of the variability in the victory margins across games.
Here's a chart showing the performance of this equation across seasons 2006 to 2009.
The red line shows the margin predicted by the formula and the crosses show the actual results for each game. The crosses are fairly well described by the line, but they are densely packed in the $1.50 to $2.00 range, so here's a second chart showing only those games with a home team price of $4 or less.
How extraordinary to find a model so parsimonious yet so predictive. Those bookies do know a thing or two, don't they?
Now what if I was prohibited from asking you for any bookie-related data but, as a trade-off, was allowed two pieces of data rather than one? Well, then I'd be asking you for my MARS Ratings of the teams involved (though quite why you'd have my Ratings and I'd need to ask you for them spoils the narrative a mite).
The equation I'd use then would be the following:
Predicted Margin = -69.79 + 0.779 x MARS Rating of Home Team - 0.702 x MARS Rating of Away Team
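As a quick sketch of how this equation behaves, the Python below evaluates it for two teams carrying the same Rating. The function name mars_margin and the Rating of 1,000 are assumptions chosen purely for illustration; only the three coefficients come from the equation above.

```python
def mars_margin(home_rating: float, away_rating: float) -> float:
    """Predicted home-team margin (in points) from the two MARS Ratings.

    Coefficients are those of the equation in the text; the function
    name is a hypothetical label used for illustration only.
    """
    return -69.79 + 0.779 * home_rating - 0.702 * away_rating


# Hypothetical example: two teams with identical Ratings of 1,000.
equal_teams = mars_margin(1000, 1000)
print(f"Equal Ratings of 1,000: {equal_teams:+.2f} points")

# One extra Rating point for the home team adds 0.779 points of margin;
# one extra Rating point for the away team subtracts 0.702 points.
print(f"Home team 10 points better: {mars_margin(1010, 1000):+.2f} points")
print(f"Away team 10 points better: {mars_margin(1000, 1010):+.2f} points")
```

With both teams on that assumed Rating the equation still favours the home team by about 7 points, which is one way of reading the home ground advantage implied by the fitted coefficients.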
Switching from the bookies' brains to my MARS' mindless maths makes surprisingly little difference. Indeed, depending on your criterion, the MARS Model might even be considered superior, your Honour.
The prosecution would point out that the MARS Model explains about 1.5 percentage points less of the overall variability in victory margins. The case for the defence would counter that it predicts margins that are within 6 points of the actual margin over 15% of the time, more than 1.5 percentage points more often than the bookies' model does, and would also avow that the MARS Model's predictions are 6 goals or more different from the actual margin less often than are the predictions from the bookies' model.
So, if you're looking for a model that better fits the entire set of data, then percent of variability explained is your metric and the bookies' model is your winner. If, instead, you want a model that's more often very close to the true margin and less often very distant from it, then the MARS Model rules.
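The competing criteria can be made concrete with a short Python sketch. Everything here is an assumption made for illustration: the function name summarise, the choice to treat "within 6 points" as an absolute error of 6 or less and "6 goals or more" as 36 points or more, and the four made-up games, which are not real results.

```python
def summarise(predicted, actual):
    """Summary measures for a set of margin predictions.

    Assumptions for this sketch: 'within 6 points' means an absolute
    error of 6 or less, and '6 goals or more' means 36 points or more.
    """
    errors = [p - a for p, a in zip(predicted, actual)]
    abs_errors = [abs(e) for e in errors]
    n = len(actual)
    mean_actual = sum(actual) / n

    # Share of variability explained: 1 - (residual SS / total SS).
    ss_residual = sum(e * e for e in errors)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)

    return {
        "variability_explained": 1 - ss_residual / ss_total,
        "mean_abs_error": sum(abs_errors) / n,
        "within_6_points": sum(e <= 6 for e in abs_errors) / n,
        "off_by_36_plus": sum(e >= 36 for e in abs_errors) / n,
    }


# Made-up predictions and results for four games (illustration only).
predicted = [10, -5, 20, 0]
actual = [12, 30, 20, -40]
for name, value in summarise(predicted, actual).items():
    print(f"{name}: {value:.3f}")
```

The first measure rewards a model for fitting the whole set of games, while the last two count only how often the model is very close or very distant, so two models can rank differently depending on which one you read.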
Once again we have a situation where a mathematical model, with no knowledge of player ins and outs, no knowledge of matchups or player form or player scandals, with nothing but a preternatural recollection of previous results, performs at a level around or even above that of an AFL-obsessed market-maker.
A concept often used in modelling is that of information. In the current context we can say that a bookie's home team price contains information about the likely victory margin. We can also say that my MARS ratings contain information about likely victory margins. One interesting question is whether the bookie's price contains essentially the same information as my MARS ratings, or whether there is some additional information content in their combination.
To find out we fit a model using all three variables - the Home Team price, the Home Team MARS Rating, and the Away Team MARS Rating - and we find that all three variables are statistically significant at the 10% level. On that basis we can claim that all three variables contain some unique information that helps to explain a game's victory margin.
The model we get, which I'll call the Combined Model, is:
Predicted Margin = -115.63 + 67.02 / Home Team Price + 0.31 x MARS Rating of Home Team - 0.22 x MARS Rating of Away Team
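Here is a minimal Python sketch of the Combined Model as written above. The function name combined_margin and the example inputs (a $2.00 home team price and Ratings of 1,000 for both teams) are hypothetical; only the four coefficients come from the equation.

```python
def combined_margin(home_price: float,
                    home_rating: float,
                    away_rating: float) -> float:
    """Predicted home-team margin (in points) from the Combined Model.

    Coefficients are those of the equation in the text; the function
    name is a hypothetical label used for illustration only.
    """
    return (-115.63
            + 67.02 / home_price
            + 0.31 * home_rating
            - 0.22 * away_rating)


# Hypothetical game: home team at $2.00, both teams Rated 1,000.
example = combined_margin(2.00, 1000, 1000)
print(f"Predicted margin: {example:+.2f} points")

# Shortening the home team's price lifts the prediction even when
# the Ratings are held fixed.
print(f"At $1.50 instead: {combined_margin(1.50, 1000, 1000):+.2f} points")
```

Note that the price coefficient (67.02) and the Rating coefficients (0.31 and 0.22) are all smaller than in the single-source models, which is what you would expect if the price and the Ratings share some, but not all, of their information.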
A summary of this model and the two we covered earlier appears in the following table:
The Combined Model - the one that uses the bookie price and MARS ratings - explains over 24% of the variability in victory margin and has an average absolute prediction error of just 29.2 points. It produces these more accurate predictions not by being very close to the actual margin more often - in fact, it's within 6 points of the actual margin only about 13% of the time - but, instead, by being a long way from the actual margin less often.
Its margin prognostications are sufficiently accurate that, based on them, the winning team on handicap betting is identified a little over 53% of the time. Of course, it's one thing to fit a dataset that well and another thing entirely to convert that performance into profitable forecasts.
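One way a handicap success rate like that might be calculated is sketched below in Python. This is an assumption about the mechanics, not a record of how the figure above was produced: the function name handicap_hit_rate, the convention that home_start is the points start given to the home team (negative when the home team is giving start), and the four made-up games are all hypothetical.

```python
def handicap_hit_rate(predicted, actual, home_start):
    """Share of games where the model picks the handicap winner.

    Assumed convention for this sketch: home_start is the start given
    to the home team (negative if the home team gives start). The home
    team wins on handicap when actual margin + start is positive, and
    the model tips the home team when predicted margin + start is
    positive. Games where either adjusted figure is exactly zero are
    left out of the count.
    """
    hits = 0
    decided = 0
    for p, a, s in zip(predicted, actual, home_start):
        adjusted_prediction = p + s
        adjusted_result = a + s
        if adjusted_prediction == 0 or adjusted_result == 0:
            continue  # no tip, or a tied handicap result
        decided += 1
        if (adjusted_prediction > 0) == (adjusted_result > 0):
            hits += 1
    return hits / decided if decided else float("nan")


# Made-up games for illustration only.
predicted = [20, -3, 10, 25]
actual = [30, -20, 10, 18]
home_start = [-12.5, 6.5, -15.5, -20.5]
print(f"Handicap hit rate: {handicap_hit_rate(predicted, actual, home_start):.0%}")
```

Under this convention a model needs to be right only about which side of the handicap the margin lands on, so a rate above 50% reflects directional accuracy against the market rather than small absolute errors.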