In the previous blog on this topic I posited that the Scoring Shot production of a team could be modelled as a Poisson random variable with some predetermined mean, and that the conversion of these Scoring Shots into Goals could be modelled as a BetaBinomial with fixed conversion probability and theta (a spread parameter).
Implementing this idea involved:
- determining sensible Home team and Away team Scoring Shot ranges over which to simulate random outcomes
- estimating a correlation coefficient between Home team and Away team Scoring Shot production. Ultimately, for the previous blog, I used the empirical correlation between Home and Away Scoring Shot production across all games (which I acknowledged was a fudge since we should really be using the conditional correlation coefficient, conditioned on the expected Scoring Shot production for the Home and the Away team in each game. Absent a model to estimate those expectations - a deficiency I'll remedy in this blog - there's no way to estimate this correlation.)
- estimating the parameters, separately, for the Home team's and the Away team's BetaBinomials for Goal Scoring
- establishing that Home team and Away team conversion rates across games were virtually uncorrelated
- generating random samples of games for all considered combinations of expected Home team and Away team Scoring Shot production, then using the appropriate BetaBinomial to randomly convert those opportunities into Goals (and, by inference, Scores)
What emerged from these simulations was that:
- The distribution of Game Margins (Home Score - Away Score) was virtually indistinguishable from a Normal Distribution across the entire range of mean Scoring Shot scenarios. The Skewnesses and Kurtoses were consistent with those of a Normal Distribution (0 and +3, respectively), and QQ-plots revealed little divergence from the Normal except at the extreme tails.
- The standard deviation of each Distribution was generally in the 30 to 40 point range (and in a narrower range for more common mean Scoring scenarios), which is consistent with previous empirical analyses on MoS.
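The Normality check in the first bullet rests on two sample moments. Below is a minimal Python sketch of how they might be computed; the moment-based (population) estimators and the use of Normal draws as a stand-in for simulated Game Margins are my assumptions for illustration, not the code behind the original analysis. Kurtosis is reported on the raw scale, so a Normal Distribution scores 3 rather than 0.

```python
import random


def skewness_and_kurtosis(xs):
    """Return (skewness, kurtosis) from population moment estimators.

    Kurtosis is the raw fourth standardised moment, so a Normal
    Distribution has skewness 0 and kurtosis 3 (not the 'excess' form).
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


if __name__ == "__main__":
    # Hypothetical stand-in for simulated Game Margins: Normal draws with a
    # 35-point standard deviation (inside the 30 to 40 point range above).
    rng = random.Random(2014)
    margins = [rng.gauss(10.0, 35.0) for _ in range(100_000)]
    skew, kurt = skewness_and_kurtosis(margins)
    print(f"skewness = {skew:.3f}, kurtosis = {kurt:.3f}")
```

In practice you'd pass in the simulated Home Score minus Away Score values; a QQ-plot remains the better tool for judging behaviour in the extreme tails.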
Since then I've been investigating models to explain Home team and Away team Scoring Shot production for a given game - in much the same way as I investigated models for Total Scoring in an earlier blog - as a function of regressors including:
- the team's recent scoring history
- the portion of the season in which the game was played
- the Venue
- the teams' Venue Experience
- the teams' MARS Ratings at the time
- the TAB Bookmaker's Implicit Home Team Probability (using the Risk-Equalising approach).
As part of these current investigations I've considered both the Poisson Distribution (ie the distribution I used in the previous blog) and the closely-related though slightly more flexible Negative Binomial Distribution, the major benefit of which in the current circumstance is that it removes the Poisson's restriction that the conditional mean of Scoring Shot production equals its conditional variance. The Negative Binomial Distribution is equivalent to a Poisson Distribution whose mean is not fixed but is, instead, a Gamma-distributed random variable. In a footballing context we might think of the difference as the Poisson assuming that a team will generate Scoring Shots at some fixed rate throughout the game with no memory of the rate at which it has produced them earlier in the game, while the Negative Binomial allows the Scoring Shot production rate to vary throughout the course of the game.
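The Gamma-mixture description can be checked numerically. The minimal Python sketch below (standard library only) draws a Poisson rate from a Gamma Distribution with mean mu and variance mu²/size, then draws a Poisson count at that rate; the resulting counts have mean mu but variance mu + mu²/size, whereas plain Poisson counts have variance equal to their mean. The size value of 10 is purely hypothetical, chosen to make the extra spread obvious.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Poisson draw via Knuth's multiplication method (fine for modest lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def negbin_draw(rng, mu, size):
    """Negative Binomial draw as a Gamma-Poisson mixture.

    The rate is Gamma(shape=size, scale=mu/size): mean mu, variance mu**2/size.
    """
    rate = rng.gammavariate(size, mu / size)
    return poisson_draw(rng, rate)


if __name__ == "__main__":
    rng = random.Random(42)
    mu, size = 25.0, 10.0  # hypothetical expected Scoring Shots and dispersion
    nb = [negbin_draw(rng, mu, size) for _ in range(20_000)]
    po = [poisson_draw(rng, mu) for _ in range(20_000)]
    print("Poisson: mean %.2f, variance %.2f"
          % (statistics.fmean(po), statistics.variance(po)))
    print("NegBin:  mean %.2f, variance %.2f (theory %.2f)"
          % (statistics.fmean(nb), statistics.variance(nb), mu + mu * mu / size))
```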
I'll not report the details here, but the summary of the analysis is that the Negative Binomial fits Home team Scoring marginally less well than the Poisson (based on the AIC metric) but fits Away team Scoring better, and by a more substantial margin. Home team Scoring Shot production appears to occur at a more constant rate throughout games, whereas Away team Scoring Shot production appears to be more variable. (We could test this by investigating the scoring sequences from a large enough sample of games. Yet another exercise for another day.)
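For reference, AIC trades the maximised log-likelihood against the parameter count k, with lower values preferred. Assuming the Negative Binomial adds a single dispersion parameter to an otherwise identical Poisson regression, it earns a lower AIC only if it lifts the maximised log-likelihood by more than 1:

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\mathrm{AIC}_{\mathrm{NB}} < \mathrm{AIC}_{\mathrm{Pois}}
\iff
\ln\hat{L}_{\mathrm{NB}} - \ln\hat{L}_{\mathrm{Pois}} > 1
\quad \text{when } k_{\mathrm{NB}} = k_{\mathrm{Pois}} + 1 .
```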
But rather than postulate one statistical distribution for Home team Scoring and another for Away team Scoring I've opted to model both as Negative Binomials.
The modelling approach I've used for this blog then is as summarised in the inset above. The data I've used for modelling relates to the period 2007 to 2013 and includes all home-and-away games and Finals.
Ideally, I'd have fitted the Scoring Shot data as a bivariate Negative Binomial, but none of the R packages I could find did this while also reporting a fitted covariance matrix. So, instead, I used the VGAM package's vglm function to fit univariate Negative Binomials to Home team and Away team Scoring separately, then estimated the conditional correlation between Home team and Away team Scoring by calculating the correlation between the two models' residuals.
The fitted models appear at right and they suggest that the expected Scoring Shot production of Home teams increases:
- the higher Rated they are
- the lower Rated their opponent
- the higher their chances of victory according to the TAB
- the more points they've scored in their four most-recent games
- the more points their current opponents (ie the Away team) have conceded in their four most-recent games
Further, relative to games at the MCG and controlling for all other factors in the model, Home team Scoring Shot production is lower at Cazaly's Stadium, Football Park and Manuka Oval. Also, relative to games played in the 1st third of the home-and-away portion of the season, Home teams tend to produce fewer Scoring Shots in the 2nd third of the home-and-away season, all other things being equal.
Similarly, Away teams tend to produce more Scoring Shots when:
- their Rating is higher
- their opponent's Rating is lower
- the Home team's chances of victory according to the TAB are lower
- they've scored more points in their four most-recent games
- their current opponents (ie the Home team) have conceded more points in their four most-recent games.
Also for Away teams, relative to games played at the MCG, they tend to produce fewer Scoring Shots at Football Park, Kardinia Park and Subiaco. And, finally, relative to games played in the 1st third of the home-and-away portion of the season, Away teams tend to produce fewer Scoring Shots in the 2nd third of the home-and-away season and in Finals.
The correlation between the residuals for these two models - our estimate of the conditional correlation between Home team and Away team Scoring Shot production - came in at -0.24, which is a little smaller in magnitude than the -0.36 we used in the previous blog.
I also refitted the BetaBinomials for Home team Goals and Away team Goals, now using only the 2007 to 2013 data. This gave the following parameter estimates:
- Home team Conversion = 53%
- Home team Theta = 231.4
- Away team Conversion = 53%
- Away team Theta = 245.6
The larger value of Theta for Away teams reflects the greater variability in Conversion rates exhibited by them across games, which you can see in the density plots of Home team and Away team Conversion rates below.
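Goal conversion under a BetaBinomial can be sketched in a few lines: draw a game-specific conversion probability from a Beta Distribution, then convert each Scoring Shot with that probability. The Python sketch below assumes the parameterisation Beta(conversion × theta, (1 − conversion) × theta), under which the variance of Goals from n Scoring Shots is n × p × (1 − p) × (theta + n)/(theta + 1); other software parameterises the spread differently, so the mapping from the post's Theta values to this theta is an assumption.

```python
import random
import statistics


def betabinomial_goals(rng, shots, conversion, theta):
    """Convert Scoring Shots into Goals via a BetaBinomial.

    Assumed parameterisation: this game's conversion probability is drawn
    from Beta(conversion * theta, (1 - conversion) * theta), which has mean
    `conversion`; Goals are then Binomial(shots, that probability).
    """
    p = rng.betavariate(conversion * theta, (1 - conversion) * theta)
    return sum(rng.random() < p for _ in range(shots))


if __name__ == "__main__":
    rng = random.Random(7)
    shots = 25
    for side, theta in (("Home", 231.4), ("Away", 245.6)):
        goals = [betabinomial_goals(rng, shots, 0.53, theta) for _ in range(20_000)]
        theory = shots * 0.53 * 0.47 * (theta + shots) / (theta + 1)
        print(f"{side}: mean {statistics.fmean(goals):.2f}, "
              f"variance {statistics.variance(goals):.2f} "
              f"(theory {theory:.2f}, plain Binomial {shots * 0.53 * 0.47:.2f})")
```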
SIMULATING GAME MARGINS
We now repeat the process of simulating games under different assumptions for expected Home team and Away team Scoring, using the NegativeBinomial assumption for Scoring Shot production, the earlier-estimated correlation between Home team and Away team Scoring, and the new parameters for the BetaBinomials to convert Scoring Shots into Goals.
(In passing, I'll note that I used a handy trick to generate the correlated bivariate NegativeBinomial Scoring Shot data, the core code for which is summarised below.)
The results of these new simulations are very similar to those from the previous blog, as evidenced by the tables for the Standard Deviation of Game Margins and the Probability of a Draw.
Wagering on a draw at a price of anything less than about $70 remains a practice to be discouraged based on the statistical evidence.
Speaking of wagering, we can also use the simulations to estimate the probability of a Home team victory for any given Scoring Shot scenario.
Reviewing this table gave me a new perspective on the influence of randomness on AFL game outcomes. Just a couple of additional Scoring Shots can add as much as 10 percentage points to a team's victory chances. Consider, for example, a Home team moving from 25 expected Scoring Shots to 27 - say as the result of a few fortuitous umpiring decisions or other pieces of luck - while the Away team remains at 25 expected Scoring Shots. In that scenario, the Home team's victory probability climbs from 50% to 58%.
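To make the simulation loop concrete, here is a compact Python sketch that produces the three quantities discussed (margin standard deviation, draw probability and Home team victory probability) for a given pair of expected Scoring Shot values. It is a simplification under stated assumptions, not the code behind the tables: Scoring Shots are drawn independently (the zero-correlation case), the Negative Binomial dispersion of 50 is hypothetical, and the BetaBinomial uses the assumed parameterisation Beta(conversion × theta, (1 − conversion) × theta) with the 2007 to 2013 estimates. The fair draw price printed is simply 1 / P(draw), the break-even price ignoring any overround.

```python
import math
import random


def poisson_draw(rng, lam):
    """Poisson draw via Knuth's multiplication method (fine for modest lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def negbin_draw(rng, mu, size):
    """Negative Binomial (mean mu, variance mu + mu**2/size) as a Gamma-Poisson mixture."""
    return poisson_draw(rng, rng.gammavariate(size, mu / size))


def goals_from_shots(rng, shots, conversion, theta):
    """BetaBinomial conversion, assuming Beta(conversion*theta, (1-conversion)*theta)."""
    p = rng.betavariate(conversion * theta, (1 - conversion) * theta)
    return sum(rng.random() < p for _ in range(shots))


def simulate_games(rng, mu_home, mu_away, n, size=50.0):
    """Return (SD of Game Margin, P(draw), P(Home win)) over n simulated games.

    Scoring Shots are drawn independently (the zero-correlation case); size is
    a hypothetical dispersion value. Score = 6 x Goals + 1 x Behinds.
    """
    margins = []
    for _ in range(n):
        home_ss = negbin_draw(rng, mu_home, size)
        away_ss = negbin_draw(rng, mu_away, size)
        home_goals = goals_from_shots(rng, home_ss, 0.53, 231.4)
        away_goals = goals_from_shots(rng, away_ss, 0.53, 245.6)
        home_score = 6 * home_goals + (home_ss - home_goals)
        away_score = 6 * away_goals + (away_ss - away_goals)
        margins.append(home_score - away_score)
    mean = sum(margins) / n
    sd = math.sqrt(sum((m - mean) ** 2 for m in margins) / (n - 1))
    p_draw = sum(m == 0 for m in margins) / n
    p_home_win = sum(m > 0 for m in margins) / n
    return sd, p_draw, p_home_win


if __name__ == "__main__":
    rng = random.Random(2014)
    for mu_home in (25.0, 27.0):
        sd, p_draw, p_home = simulate_games(rng, mu_home, 25.0, 10_000)
        fair_draw_price = 1 / p_draw if p_draw > 0 else float("inf")
        print(f"Home {mu_home:.0f} v Away 25: margin SD {sd:.1f}, "
              f"P(draw) {p_draw:.4f} (fair price ${fair_draw_price:.0f}), "
              f"P(Home win) {p_home:.3f}")
```

Comparing the two printed rows shows the effect of two extra expected Scoring Shots for the Home team; given the simplifications, the figures will differ somewhat from the post's tables.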
Varying the assumption about the correlation between Home team and Away team Scoring Shot production reveals that weaker teams do better when this correlation is smaller (ie more negative).
For example, from the table at right we see that when the Home team's expected Scoring Shot production is 28 and the Away team's is 32, the Home team is expected to win 33% of the time if there's zero correlation between Scoring Shot production, but 37% of the time if the correlation is -0.5.
The table also demonstrates that, the larger the difference in expected Scoring Shot production, the bigger the benefit for the weaker team from a more-negative correlation between the teams' Scoring Shot production.
The intuition behind this finding is that, the weaker you are as a team, the more you'll benefit from your own goal-scoring tending to dampen the likelihood of a reply from your stronger opponent.
INFERRING OVERROUND IN THE HEAD-TO-HEAD MARKETS
Another frequent topic on MoS has been that of estimating the overround levied on the two team prices in the Head-to-Head market (for example, this blog, or this blog, or this blog ... I could go on). The model described above allows us to estimate directly the levied overround by:
- Assuming that the Bookmaker in question behaves as if he or she subscribes to the model's assumptions (ie that Scoring Shot production can be modelled as a NegativeBinomial and Goal scoring as a BetaBinomial, and that the parameters I've estimated for the various correlations, conversion rates and thetas are sufficiently accurate to be useful)
- Combining his or her contemporaneous opinions as expressed in the Head-to-Head, Line and Over/Under markets
We've talked often here about the first two of those markets, but not about the last. The Over/Under market is one in which the Bookmaker sets a total score for a given game that he or she thinks has about a 50% probability of being exceeded. For example, this week the Bet365 site has the Over/Under for the Collingwood v Hawthorn game set at 191.5 points, meaning that its Bookmaker thinks it about equally likely that the total score for this game will be less than or greater than 191.5 points.
By combining the Line market handicap and the Over/Under points score - provided both markets are at even-money prices - we can infer the Bookmaker's expectations about the actual scores for each team.
Take, for example, the Collingwood v Hawthorn game again where we find that the Bet365 site has the Line market as Collingwood +48.5 (with the same price for Collingwood as for Hawthorn). If the total score is expected to be 191.5 points and Collingwood are expected to score 48.5 points fewer than Hawthorn, then the implied final score is Collingwood 71.5 - Hawthorn 120.
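Treating the Over/Under total as the expected combined score and the Line handicap as the expected margin (the even-money proviso above), the two expected scores solve a pair of linear equations: Collingwood + Hawthorn = 191.5 and Hawthorn − Collingwood = 48.5. A tiny Python helper (the name is hypothetical) makes the arithmetic explicit:

```python
def implied_scores(total, underdog_start):
    """Split an Over/Under total into expected scores using the Line handicap.

    `underdog_start` is the Line-market start given to the underdog
    (e.g. 48.5 for Collingwood +48.5). Returns (underdog, favourite).
    """
    underdog = (total - underdog_start) / 2
    favourite = (total + underdog_start) / 2
    return underdog, favourite


print(implied_scores(191.5, 48.5))  # (71.5, 120.0)
```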
We can convert these scores into expected Scoring Shot production for each team by assuming that both are expected to convert their opportunities at 53%, so that each Scoring Shot is worth 0.53 × 6 + 0.47 × 1 = 3.65 points on average. This yields expected Scoring Shots for Collingwood of 19.6 (71.5 / 3.65) and for Hawthorn of 32.9 (120 / 3.65). These Scoring Shot expectations can be fed into the simulation model and used to estimate a victory probability for the two teams via random simulation. These estimated probabilities, when combined with the current Head-to-Head prices, allow us to estimate the overround in those prices.
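The conversion from expected score to expected Scoring Shots, and the final overround step, amount to a few lines of arithmetic. In the Python sketch below the function names are hypothetical, and the per-price overround convention, price = 1 / (probability × (1 + overround)), is an assumption since the post doesn't spell out its formula; the price and probability in the last line are illustrative only.

```python
def points_per_shot(conversion):
    """Expected points from one Scoring Shot: 6 for a Goal, 1 for a Behind."""
    return 6 * conversion + 1 * (1 - conversion)


def expected_scoring_shots(expected_score, conversion=0.53):
    """Expected Scoring Shots needed to produce expected_score points."""
    return expected_score / points_per_shot(conversion)


def overround(price, probability):
    """Overround on a single price, assuming price = 1 / (probability * (1 + overround))."""
    return 1 / (price * probability) - 1


if __name__ == "__main__":
    print(round(expected_scoring_shots(71.5), 1))   # 19.6 (Collingwood)
    print(round(expected_scoring_shots(120.0), 1))  # 32.9 (Hawthorn)
    # Hypothetical price and simulated probability, for illustration only.
    print(f"{overround(1.90, 0.50):.2%}")           # 5.26%
```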
I've followed that procedure using the Bet365 markets as at early on Wednesday morning to produce the following table:
Overround calculations are extremely sensitive to probability estimates, especially for those near 0 and 1. Accordingly, I've provided standard errors for the probability estimates in the table above and I've used 100,000 simulations to create these probability estimates for games where the Home team probability is in the 15% to 85% range, and 1,000,000 simulations otherwise.
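The reason for using more simulations when probabilities approach 0 or 1 can be made explicit. A probability estimated from n independent simulations has binomial standard error sqrt(p(1 − p)/n). If overround is defined through price = 1 / (p × (1 + overround)) (an assumption here), then 1 + overround = 1 / (price × p), so to first order it is the error relative to p, sqrt((1 − p)/(p × n)), that carries through to the overround estimate. That relative error grows as p shrinks:

```python
import math


def simulation_se(p, n):
    """Binomial standard error of a probability estimated from n simulated games."""
    return math.sqrt(p * (1 - p) / n)


def relative_se(p, n):
    """Standard error as a fraction of p itself: sqrt((1 - p) / (p * n))."""
    return simulation_se(p, n) / p


if __name__ == "__main__":
    for p in (0.50, 0.15, 0.05):
        for n in (100_000, 1_000_000):
            print(f"p = {p:.2f}, n = {n:>9,}: SE = {simulation_se(p, n):.5f}, "
                  f"relative SE = {relative_se(p, n):.2%}")
```

For example, at p = 0.05 the relative error with 100,000 simulations (about 1.4%) is more than four times that at p = 0.5 (about 0.3%), and moving to 1,000,000 simulations brings it down to about 0.4%.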
Concentrating on the Estimated Overround columns we can see that:
- Favourites (whose figures are shaded) are estimated as carrying less overround in six of the nine games.
- The range of estimates of overround on favourites across the nine games is relatively narrow - from about +3% to +6%. That on underdogs is much broader - from -1% to +21%.
Undertaking the same exercise on the TAB markets from about 12 hours later produced the following:
These markets are even kinder to favourites, imposing less overround on their prices in every contest except Geelong v Brisbane (and maybe Fremantle v Port Adelaide, though the balance of evidence suggests a slightly greater overround imposition on Port Adelaide).
My long-standing aversion to wagering on Away teams might, it seems, be down not to the fact that all Home team prices carry less overround but, instead, to the fact that Home teams are more often favourites, and favourites receive the more favourable treatment from Bookmakers. In the dataset we've been using for this blog, for example, Home teams are outright favourites 58% of the time and equal favourites 1% of the time.
But that too is a topic for another day - and a larger dataset.