We've fresh tipsters for 2011, fresh Funds for 2011, so now we need fresh margin predictors for 2011.
This year, all of the margin predictors are based on models that produce probability forecasts. These include the algorithms powering ProPred, WinPred and the Head-to-Head Fund, as well as the "model" that is the TAB Sportsbet bookmaker. To create the margin predictors, I let Eureqa loose on the historical data for seasons 2007 to 2010 to produce equations that fit past home team margins of victory as a function of these models' probabilities.
The more experience I gain in model-fitting, the more deeply I appreciate the need to guard against overfitting, which shows up as models that fit the data they were built on well but perform poorly when called upon to forecast future, unseen results. As one guard against overfitting in the current exercise, I instructed Eureqa to fit candidate models using only a randomly chosen 50% of the available data and then to select the best models based on how well they fit the remaining 50%. (I chose Mean Absolute Prediction Error, or Mean APE, as the measure of a model's accuracy, though I've also calculated the Median Absolute Prediction Error, or Median APE, for all the 'winning' models.)
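As a rough sketch of the selection step, the Python below assumes each candidate equation has already been fitted on one half of the data (Eureqa's own search is not reproduced here) and simply scores each candidate by Mean APE on the other, held-out half. The function names, the fixed random seed and the (probability, margin) tuple layout are all illustrative assumptions, not Eureqa's interface, and the games are made up.

```python
import random
import statistics

def mean_ape(predicted, actual):
    """Mean Absolute Prediction Error: average of |predicted - actual|."""
    return statistics.mean(abs(p - a) for p, a in zip(predicted, actual))

def median_ape(predicted, actual):
    """Median Absolute Prediction Error: median of |predicted - actual|."""
    return statistics.median(abs(p - a) for p, a in zip(predicted, actual))

def split_in_half(games, seed=2011):
    """Randomly split games into a fitting half and a holdout half (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(games)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def best_on_holdout(candidates, holdout):
    """Score each already-fitted candidate on the holdout half; lowest Mean APE wins.

    candidates: dict of name -> function(prob) returning a predicted home margin.
    holdout: list of (home_prob, actual_home_margin) tuples.
    """
    actual = [margin for _, margin in holdout]
    scores = {
        name: mean_ape([model(prob) for prob, _ in holdout], actual)
        for name, model in candidates.items()
    }
    return min(scores, key=scores.get), scores

if __name__ == "__main__":
    # Made-up (home_prob, actual_margin) pairs, purely for illustration.
    games = [(0.55, 6), (0.70, 20), (0.40, -12), (0.80, 31), (0.30, -25), (0.62, -3)]
    # fit_half would go to the fitting step, which is not shown here.
    fit_half, holdout = split_in_half(games)
    candidates = {"flat_home_edge": lambda p: 6.0, "scaled_prob": lambda p: 100 * (p - 0.5)}
    name, scores = best_on_holdout(candidates, holdout)
    print(name, scores)
```

Keeping the holdout half out of the fitting step is what makes its Mean APE an honest, if noisy, estimate of how a candidate will do on unseen games.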
Eureqa also gives modellers a way to impose their own tolerance for the risk of overfitting: it provides a measure of the complexity of every model it produces, on the basis that the more complex the model, the higher the risk of overfitting.
An example might help make this more concrete, so first here's a simple model that Eureqa produced for generating margin predictions based only on the bookie's implicit home team probability of victory:
Predicted Home Team Margin = 25.7*ln(Prob/(1-Prob)), where ln stands for 'natural log' and Prob is the implicit Home Team probability of victory, which is Away Team's Price / (Home Team's Price + Away Team's Price)
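The equation is easy to evaluate by hand, but a few lines of Python make the two steps explicit: convert the head-to-head prices into the bookie's implicit home probability, then apply the fitted coefficient. The function names are my own, and the $1.50 / $2.50 prices are a made-up example.

```python
import math

def implied_home_prob(home_price, away_price):
    """Bookie's implicit home probability: Away Price / (Home Price + Away Price)."""
    return away_price / (home_price + away_price)

def bookie_3_margin(prob):
    """Predicted home margin = 25.7 * ln(Prob / (1 - Prob))."""
    return 25.7 * math.log(prob / (1 - prob))

# Hypothetical market: home team at $1.50, away team at $2.50.
prob = implied_home_prob(1.50, 2.50)   # 2.50 / 4.00 = 0.625
margin = bookie_3_margin(prob)
print(f"Home prob {prob:.3f}, predicted margin {margin:.1f} points")
# prints: Home prob 0.625, predicted margin 13.1 points
```

Because Prob / (1 - Prob) simplifies to Away Price / Home Price, the log term is just the log of the ratio of the two prices, so a game with equal prices gets a predicted margin of zero.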
That's a fairly simple model and Eureqa gives it a complexity score of just 3.
Another model that Eureqa produced was the following:
Predicted Home Team Margin = 2.2 + 17.7*ln(Prob/(1-Prob)) + 2*Prob
Intuitively, it's reasonable to describe this as a more complex model, and Eureqa gives it a complexity score of 9. If you're willing to accept the additional risk that this model, being more complex, is also more likely to have overfit the data, however slightly, your reward is a lower Mean Absolute Prediction Error.
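To see how the two equations differ in practice, the sketch below evaluates both across a range of home probabilities. I've labelled the second equation bookie_9 on the assumption, based on its complexity score of 9 and its use of only the bookie's probability, that it is the Bookie_9 model discussed later; the function names are otherwise illustrative.

```python
import math

def log_odds(prob):
    """ln(Prob / (1 - Prob))."""
    return math.log(prob / (1 - prob))

def bookie_3(prob):
    """Complexity-3 equation: 25.7 * ln(Prob / (1 - Prob))."""
    return 25.7 * log_odds(prob)

def bookie_9(prob):
    """Complexity-9 equation (assumed to be Bookie_9): 2.2 + 17.7 * ln(Prob / (1 - Prob)) + 2 * Prob."""
    return 2.2 + 17.7 * log_odds(prob) + 2 * prob

print(f"{'Prob':>5} {'Bookie_3':>9} {'Bookie_9':>9}")
for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    print(f"{p:5.2f} {bookie_3(p):9.1f} {bookie_9(p):9.1f}")
```

At even money (Prob = 0.5) the simpler equation predicts a draw, while the more complex one still has the Home team winning by 2.2 + 2 × 0.5 = 3.2 points. Towards the extremes, its smaller log-odds coefficient (17.7 versus 25.7) makes it the more conservative of the two.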
In selecting the models to include in this blog and, therefore, to become margin predictors for 2011, I excluded any model with a complexity score greater than 10.
Eleven models made the cut and their vital statistics are presented below.
The labels are based on the underlying probability algorithm used in the equation that produced the plotted Mean and Median APE, and the number appended after the underscore is the complexity of the relevant equation. So, for example, the Bookie_3 result is the Mean and Median APE produced by the equation shown above when applied to all the results in seasons 2007 to 2010.
For the Combo model Eureqa had access to all the probability algorithms. In Combo_7 it chose to use only the bookie's probabilities and those from the Head-to-Head Unadjusted algorithm. (You might recall that, for wagering purposes, I adjust the raw probability outputs from the Head-to-Head model by ensuring that its predicted probability of victory for the Home team is never more than 25% greater than the bookmaker's probability and by setting the probability of victory for the Home team equal to the bookmaker's probability whenever the Home team is priced at over $5. These adjusted probabilities are "H2H_Adj" in the above chart; the unadjusted probabilities are "H2H_Unadj".)
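The two adjustment rules can be written as a small function. This is a sketch under one stated assumption: I've read "never more than 25% greater than the bookmaker's probability" as a relative cap (at most 1.25 times the bookie's probability) rather than a cap of 25 percentage points. The function and parameter names are hypothetical.

```python
def adjust_h2h_prob(h2h_prob, bookie_prob, home_price,
                    max_ratio=1.25, price_cutoff=5.0):
    """Turn an H2H_Unadj home probability into an H2H_Adj one (sketch).

    Assumption: the 25% limit is relative, i.e. the adjusted probability is
    capped at max_ratio * bookie_prob.
    """
    if home_price > price_cutoff:
        # Home team priced at over $5: defer entirely to the bookmaker.
        return bookie_prob
    # Otherwise cap the H2H probability at 125% of the bookie's probability.
    return min(h2h_prob, max_ratio * bookie_prob)

print(adjust_h2h_prob(0.60, 0.40, home_price=2.40))  # capped at 1.25 * 0.40
print(adjust_h2h_prob(0.30, 0.15, home_price=6.00))  # long-shot home team, so the bookie's 0.15
print(adjust_h2h_prob(0.50, 0.45, home_price=2.10))  # within the cap, so unchanged
```

The price check comes first so that, for home teams at over $5, the bookmaker's probability is used regardless of what the cap would have allowed.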
In the chart, the closer a point is to the bottom left corner, the better the performance of the model it represents, since such a model has a lower Mean and Median APE. Combo_7, being nearest the y-axis, has the lowest Mean APE of all models shown here, and ProPred_7, being nearest the x-axis, has the lowest Median APE.
As always, models produced using the TAB Sportsbet bookmaker's opinions fare well. Bookie_9 does best of these, returning a Mean APE of 28.99 and a Median APE of 24.20. Here's a chart showing Bookie_9's and Bookie_3's predicted margins, overlaid with actual margins for every game from 2007 to 2010.
Bookie_3 is a little less conservative than Bookie_9 and predicts larger victories for the Home team when the TAB Sportsbet bookie's prices imply a high probability of a Home team victory, and larger losses when the opposite is the case.
Extremes of probability are relatively rare, however, so the correlation between actual and predicted margins is roughly the same for Bookie_3 and Bookie_9, at about +0.509. (For Combo_7, the model with the lowest Mean APE, the correlation is only slightly higher, at +0.516.)
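A related point: Pearson correlation is unchanged when a set of predictions is shifted or multiplied by a positive constant, so Bookie_3's correlation with actual margins is exactly that of the raw log-odds, and Bookie_9 departs from a rescaled log-odds only through its small 2*Prob term. The sketch below uses an evenly spaced grid of probabilities, not the real 2007 to 2010 games, to show how closely the two models' predictions track each other. The helper names are illustrative, and I've again assumed the complexity-9 equation is Bookie_9.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def log_odds(p):
    return math.log(p / (1 - p))

def bookie_3(p):
    return 25.7 * log_odds(p)

def bookie_9(p):
    return 2.2 + 17.7 * log_odds(p) + 2 * p

# A grid of home probabilities from 0.05 to 0.95 (not real games).
probs = [i / 100 for i in range(5, 96)]
r = pearson([bookie_3(p) for p in probs], [bookie_9(p) for p in probs])
print(f"Correlation between the two models' predictions: {r:.4f}")
```

Because the two prediction series are almost perfectly correlated with each other, it's unsurprising that their correlations with actual margins come out nearly identical.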
So, we now have the season's eleven margin predictors.