To what extent can the head-to-head prices set by the TAB Sportsbet Bookmaker in 2011 be modelled using only the competing teams' MAFL MARS Ratings, their respective Venue Experiences, and the Interstate Status of the fixture?
That's the question that motivated this blog, and the first step was to use R's incredibly versatile non-linear least squares function, nls, to fit the following logistic model to the TAB Sportsbet bookmaker's implicit probabilities:
Implicit Home Team Probability = 1-1/(1+exp(a1 + a2 x Own MARS Rating + a3 x Opponent MARS Rating + a4 x Own Venue Experience + a5 x Opponent Venue Experience + a6 x Interstate Status))
The choice of the logit-style functional form is to ensure that fitted values are, as are probabilities, constrained to the 0-1 range. Note that I don't have a 0/1 target variable here, so a binary logit is unsuitable.
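As a rough illustration of that functional form, here is a minimal Python sketch. The function name, argument names and the coefficients in the demo are all assumptions made up for illustration (they are not the fitted values); the only thing taken from the model is the shape 1-1/(1+exp(z)).

```python
import math

def implicit_home_prob(coefs, own_mars, opp_mars, own_exp, opp_exp, interstate):
    """Evaluate 1 - 1/(1 + exp(z)), where z is the linear predictor.

    coefs is (a1, ..., a6) in the order used in the formula above.
    """
    a1, a2, a3, a4, a5, a6 = coefs
    z = (a1 + a2 * own_mars + a3 * opp_mars
         + a4 * own_exp + a5 * opp_exp + a6 * interstate)
    return 1.0 - 1.0 / (1.0 + math.exp(z))

# Hypothetical coefficients, for illustration only (NOT the fitted values).
demo_coefs = (0.0, 0.04, -0.04, 0.01, -0.01, 0.5)
demo_prob = implicit_home_prob(demo_coefs, 1010.0, 990.0, 20, 5, 1)
print(round(demo_prob, 3))
```

Whatever values the coefficients take, the result lies strictly between 0 and 1 for any moderate value of z, which is the point of choosing this form.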
This model, once fitted, does a good job of explaining the bookmaker's implicit probabilities - it explains about 87% of their variability and has, across the 163 games, an average absolute error of about 6.6 percentage points.
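For anyone wanting to reproduce these two summary measures, here is a small sketch. It assumes that "variability explained" means the usual R-squared (1 minus the ratio of residual to total sum of squares), which the text doesn't spell out, and the four probabilities in the demo are invented.

```python
def r_squared(actual, fitted):
    """Proportion of variability explained: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(actual, fitted):
    """Average absolute gap between actual and fitted probabilities."""
    return sum(abs(a - f) for a, f in zip(actual, fitted)) / len(actual)

# Invented probabilities for four games, for illustration only.
actual = [0.60, 0.45, 0.80, 0.30]
fitted = [0.55, 0.50, 0.70, 0.35]

r2 = r_squared(actual, fitted)
mae = mean_absolute_error(actual, fitted)
print(round(r2, 3), round(mae, 4))
```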
All model coefficients are statistically significant at the 5% level or higher and their signs are what we'd expect. Higher MARS Ratings and greater Venue Experience for the Home team are associated with higher fitted values for the probability of a home team victory, as are games where the Home team is playing in its home state and its opponents are not. Conversely, higher MARS Ratings and greater Venue Experience for the Away team reduce the Home team's fitted victory probability.
An analysis of the residuals from this model reveals strong team-specific effects. For example, for games in which Essendon, Hawthorn or the Gold Coast are the Home team, the average absolute errors are 10 percentage points or more. This is also true for games in which Essendon or the Gold Coast are the Away team.
Why might team-specific covariates be required in the model? Well, it's conceivable - I'd even venture likely - that the MAFL MARS Ratings for some teams have differed from the TAB Sportsbet bookmaker's assessment of these same teams at the same point in time. Alternatively, it might be that the bookmaker tweaks the market prices of particular teams to reflect things other than their current form and the venue at which they are playing - perhaps the strength of the wagering support for them or changes in key personnel for the game. In either case, the market price, and hence implicit probability, for particular teams will differ from what might be expected based on their assessed relative strength and where they're playing. Team-specific covariates will cater for these effects.
So, I refitted the model above incorporating 32 dummy variables, 16 to reflect which team was playing at home, and 16 more to reflect which team was playing away (dummies for the 17th team were omitted to prevent perfect collinearity in the regressors).
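The dummy coding can be sketched as follows. The helper name is hypothetical, and the text doesn't say which team was the omitted 17th, so the choice of reference team below is an arbitrary assumption.

```python
# The 17 teams in the 2011 competition.
TEAMS = [
    "Adelaide", "Brisbane Lions", "Carlton", "Collingwood", "Essendon",
    "Fremantle", "Geelong", "Gold Coast", "Hawthorn", "Melbourne",
    "North Melbourne", "Port Adelaide", "Richmond", "St Kilda", "Sydney",
    "West Coast", "Western Bulldogs",
]

def team_dummies(home, away, reference):
    """Return 0/1 Home and Away indicators for every team except `reference`.

    Dropping one team avoids perfect collinearity among the regressors.
    """
    row = {}
    for team in TEAMS:
        if team == reference:
            continue
        row[f"{team} (Home)"] = 1 if team == home else 0
        row[f"{team} (Away)"] = 1 if team == away else 0
    return row

# Reference team chosen arbitrarily for illustration (an assumption).
row = team_dummies("Western Bulldogs", "Essendon", reference="Sydney")
print(len(row), row["Western Bulldogs (Home)"], row["Essendon (Away)"])
```

With 17 teams and one omitted, each game gets 16 Home and 16 Away indicators, 32 in all, of which at most two are non-zero.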
In this new model a number of coefficients were not statistically significant - the two Venue Experience variables and 15 of the team-specific dummy variables - so I removed them, producing a model that explains over 96% of the variability in the TAB Sportsbet bookmaker's implicit probabilities for the home teams. It uses 21 parameters: an intercept, each team's MARS Rating, the Interstate Status variable, 8 dummy variables reflecting whether or not 8 particular teams were playing at home in the relevant game, and 9 more dummy variables reflecting whether or not 9 particular teams were playing away in the relevant game.
The coefficients of this model appear in the table at left. Signs for the MARS Rating and Interstate Status coefficients remain logical and their values remain highly significant.
Our new model has team-specific effects for 11 of the teams. Take Essendon, for example. The positive coefficient for Essendon (Home) means that, if they are the Home team, the bookmaker rates their chances of victory higher than the teams' relative MARS Ratings and the Interstate Status of the clash would suggest. The negative coefficient for Essendon (Away) means that, if they are the Away team, the bookmaker rates their opponent's chances of victory lower than the teams' relative MARS Ratings and the Interstate Status of the clash would suggest.
As another example, consider the coefficients related to the Gold Coast. Since Gold Coast (Home) carries a negative coefficient and Gold Coast (Away) a positive coefficient, this suggests that the presence of Gold Coast on the park is felt by the bookmaker to enhance the prospects of their opponents, whoever they are and wherever they're playing.
An example might help to explain how this model can be used to predict the bookmaker's pricing. Consider the Western Bulldogs v Essendon clash of Round 21. Using the model, we'd expect that the Dogs' implicit probability would be:
Predicted Implicit Probability = 1-1/(1+exp(-3.792047 + 0.038189 x 1,001.561 - 0.034321 x 988.243 + 0.507 x 0 - 0.644)), or about 47.4%
(The Dogs' pre-game MARS Rating was 1,001.561 and Essendon's was 988.243. Also, the match was not an Interstate Clash since the Dogs and Essendon were playing in their home state. Finally, the -0.644 reflects the fact that Essendon were the Away team in this clash.)
The Dogs' actual implicit probability was 1.75/(2 + 1.75) or 46.7% so, for this game, the actual and fitted probabilities were quite similar.
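The arithmetic for this example can be checked with a few lines of Python. The coefficients and Ratings are those quoted above; the function names are my own, and the prices of $2.00 for the Dogs and $1.75 for Essendon are inferred from the 1.75/(2 + 1.75) calculation, assuming the implicit probability is the opponent's price divided by the sum of the two prices.

```python
import math

def fitted_prob(z):
    """The logistic form used throughout: 1 - 1/(1 + exp(z))."""
    return 1.0 - 1.0 / (1.0 + math.exp(z))

def implicit_prob(own_price, opp_price):
    """Implicit probability from head-to-head prices (assumed convention)."""
    return opp_price / (own_price + opp_price)

# Linear predictor for Western Bulldogs (Home) v Essendon (Away), Round 21.
z = (-3.792047
     + 0.038189 * 1001.561   # Dogs' MARS Rating
     - 0.034321 * 988.243    # Essendon's MARS Rating
     + 0.507 * 0             # not an Interstate clash
     - 0.644)                # Essendon (Away) dummy

predicted = fitted_prob(z)
actual = implicit_prob(2.00, 1.75)  # prices inferred from the text

print(f"predicted {predicted:.1%}, actual {actual:.1%}")
```

The gap between the two is under one percentage point.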
Indeed, across all 163 games, the largest discrepancy between the fitted and actual home team probability was just 12.6 percentage points, and the average absolute difference was just 3.8 percentage points, or less than 60% of what it was for the earlier model without team-specific effects.
The chart at right shows the actual and predicted home team probabilities for all 163 games; the quality of the fit is apparent.
Seventeen dummy variables capturing team-specific effects made it into the final model. As I alluded to earlier, the need for these dummy variables might reflect one or both of the following:
- a deliberate strategy on the part of the TAB Sportsbet bookmaker to price contests involving these teams differently, assuming that his assessment of relative team strengths is similar to that reflected in the respective MARS Ratings
- a persistent difference in the bookmaker's assessment of team strengths as compared to those encapsulated by MAFL MARS Ratings.
One way to determine which of these explanations is better supported by the data is to consider the probability scoring performance of the actual home team probabilities and those fitted using the earlier model without the team-specific variables. If the TAB bookmaker has been tweaking his home team probabilities for reasons other than form and venue (ie explanation 1 is supported) then we'd expect his probability score would suffer relative to the fitted probabilities from the earlier model, which excludes team-specific effects. Conversely, if he's been making better assessments of relative team strengths (ie explanation 2 is supported) then we'd expect the probability score of his probability assessments would exceed those of the fitted probabilities from the earlier model.
It's a ridiculously close-run thing, but the fitted probabilities from the model without team-specific effects produce a marginally higher probability score: 0.2782 to 0.2777. I'd call that an effective numeric draw.
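The probability score isn't defined here, so what follows is only a sketch under an assumption: that the score for a game is the logarithmic one, 1 + log2(p), where p is the probability assigned to the team that actually won, averaged across games. If a different scoring rule was used, the code would need to change accordingly. The function names and demo data are invented, and draws are ignored.

```python
import math

def log_prob_score(home_prob, home_won):
    """Assumed scoring rule: 1 + log2(probability given to the winner)."""
    p = home_prob if home_won else 1.0 - home_prob
    return 1.0 + math.log2(p)

def average_score(home_probs, home_wins):
    """Mean score across games; higher is better."""
    scores = [log_prob_score(p, w) for p, w in zip(home_probs, home_wins)]
    return sum(scores) / len(scores)

# Invented data: three games, home team won the first two.
demo_probs = [0.75, 0.50, 0.75]
demo_wins = [True, True, False]
demo_avg = average_score(demo_probs, demo_wins)
print(round(demo_avg, 4))
```

Under this rule a 50% assessment always scores zero, so any positive average indicates assessments that were, on balance, better than coin-tossing.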
Another test of the relative merits of the competing explanations is to imagine that we Kelly-staked on the basis of the fitted probabilities from the model without team-specific effects. If this strategy is profitable then explanation 1 is favoured, otherwise explanation 2 is favoured. It turns out that such Kelly-staking yields a +24% ROI, but all of it - and a bit more - stems from wagers on Gold Coast alone. Without these wagers the ROI is -13%.
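As a sketch of how such a test might be run, the code below uses the standard Kelly fraction for a decimal price, (p x price - 1)/(price - 1), wagering only when that is positive. It assumes stakes are sized against a fixed one-unit bankroll rather than a running one, and that ROI is net profit divided by total amount staked; neither detail is stated above, and the bets in the demo are invented.

```python
def kelly_fraction(prob, price):
    """Kelly stake as a fraction of bankroll for a decimal price.

    Returns 0.0 when the assessed probability offers no positive edge.
    """
    edge = prob * price - 1.0
    return edge / (price - 1.0) if edge > 0 else 0.0

def kelly_roi(bets):
    """ROI across (prob, price, won) tuples, assuming a fixed unit bankroll."""
    staked = 0.0
    profit = 0.0
    for prob, price, won in bets:
        stake = kelly_fraction(prob, price)
        staked += stake
        profit += stake * (price - 1.0) if won else -stake
    return profit / staked if staked > 0 else 0.0

# Invented wagers: (fitted probability, market price, did the team win?)
demo_bets = [(0.60, 2.00, True), (0.60, 2.00, False), (0.40, 2.00, True)]
print(round(kelly_roi(demo_bets), 4))
```

In the demo the third wager is never placed, because a 40% chance at a price of $2.00 carries no positive edge.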
If we look at the ROIs from wagering using the model without team-specific effects only on those teams and at those venues for which team-specific covariates appear in the later model we find that:
- ROIs are positive for wagering on Gold Coast, Hawthorn and Richmond when they're at home
- ROIs are negative for wagering on the Lions, Fremantle and the Dogs when they're at home
- ROIs are positive for wagering on Carlton, Collingwood and the Gold Coast when they're away
- ROIs are negative for wagering on Fremantle, Richmond, St Kilda and West Coast when they're away
- No wagers are made on Essendon at home or away, none on Hawthorn when they're playing away, and none on West Coast when they're playing at home
On balance then I'd conclude that the TAB Sportsbet bookmaker's adjustments for the Lions, Fremantle, St Kilda, West Coast and the Western Bulldogs meant that his probability assessments were superior to those of the fitted model with no team-specific effects, and that those for Carlton, Collingwood, the Gold Coast and Hawthorn meant that his assessments were inferior. For Richmond, his adjustments are inferior when they're playing at home but superior when they're playing away.
I'd also conclude that you can do a surprisingly good job of modelling the TAB Sportsbet bookmaker using only MARS Ratings, Interstate Status and Venue Experience.