In my earlier posts on statistically modelling team scoring (see here and here) I treated Scoring Shot conversion as a phenomenon best represented by the Beta Binomial distribution and proceeded to empirically estimate the parameters for two such distributions, one to model the Home team conversion process and the other to model Away team conversion. The realised conversion rates for the Home team and for the Away team in any particular game were assumed to be random, independent draws from these two fixed distributions.
It's possible though - and many seasons of following AFL would suggest it's maybe even likely - that we could identify covariates of a team's expected Scoring Shot conversion rate for a given game. For example, we might reasonably expect an autoregressive component in conversion rates in that teams with a history of better-than-average Scoring Shot conversion would be considered more likely to continue that behaviour into the future. As well, a game's venue, the teams' experience at that venue, and the relative strengths of the competing teams could all legitimately be put forward as candidate covariates of Shot conversion in a particular game.
How well these covariates explain variability in team conversion rates is the sole focus of this blog post.
Since I plan to use the TAB Bookmaker implicit probabilities as one input, I've restricted my analysis to the period commencing in 2006 and running to the end of Round 25 in 2014, which is the period for which I've hand-recorded Bookmaker prices. That window of time affords me a potential sample of 1,740 games, though I lose a handful of games in accommodating lagged historical conversion rates in the modelling.
The variables used in the modelling were:
- The Home team and Away team conversion rates in a given game, defined as Goals Scored / (Goals Scored + Behinds Scored). These variables were used as the target variables of the regression models. One regression model was fitted to the Home team conversion data and another was fitted to the Away team conversion data.
- Game Venue. Venues where fewer than 20 games were played across the period were grouped into an "Other" category. The reference category was defined as the Adelaide Oval on no more sophisticated a basis than alphabetic superiority.
- Home and Away team Venue Experience, defined as the number of games played at the same venue in the preceding 12 months.
- Implicit Home Team Victory Probability, derived from the TAB Bookmaker's prices in the head-to-head market, and employing the Risk-Equalising approach to transform those prices into probabilities.
- Home and Away team MARS Ratings.
- Home and Away teams' historical conversion rates, averaged over the past 2, 3, 4, 5, 6, 7 and 8 games. For the purposes of calculating these averages, games from the previous season can be included as necessary.
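As a rough illustration of how the variables listed above might be constructed, here is a minimal Python sketch. The function names, the treatment of "12 months" as 365 days, and the choice to average per-game rates (rather than pool goals and behinds across games) are all assumptions made for the example, not details taken from the original analysis.

```python
from collections import Counter
from datetime import date, timedelta

def conversion_rate(goals, behinds):
    """Goals Scored / (Goals Scored + Behinds Scored)."""
    shots = goals + behinds
    if shots == 0:
        return None  # assumed handling: undefined with no Scoring Shots
    return goals / shots

def lagged_mean_rate(past_rates, k):
    """Simple mean of a team's conversion rates over its previous k games.

    past_rates is in chronological order and excludes the current game
    (it may reach back into the previous season). Returns None when
    fewer than k earlier games are available.
    """
    if len(past_rates) < k:
        return None
    return sum(past_rates[-k:]) / k

def group_rare_venues(venues, min_games=20):
    """Relabel venues with fewer than min_games games as 'Other'."""
    counts = Counter(venues)
    return [v if counts[v] >= min_games else "Other" for v in venues]

def venue_experience(game_date, venue, past_games):
    """Count a team's games at this venue in the preceding 365 days.

    past_games is a list of (date, venue) pairs for the team's earlier games.
    """
    window_start = game_date - timedelta(days=365)
    return sum(1 for d, v in past_games
               if v == venue and window_start <= d < game_date)

# Toy usage with made-up scores (goals, behinds)
rates = [conversion_rate(g, b) for g, b in [(12, 10), (15, 9), (10, 14)]]
recent = lagged_mean_rate(rates, 3)
```

Games for which `lagged_mean_rate` returns `None` are the ones that would drop out of the modelling sample, which is consistent with losing a handful of games to the lagged variables.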
(Team names were used as the sole regressors in a second round of modelling, discussed below.)
For regular season games, Home team status is determined by the AFL's designation and, as has become the custom on MoS, Home team status in Finals is attached to the team with the higher MARS Rating going into the contest.
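For readers unfamiliar with the Implicit Home Team Victory Probability listed among the variables, the sketch below shows one additive way of turning head-to-head prices into probabilities. It assumes that Risk-Equalising means splitting the total overround equally, in absolute terms, between the two teams. The post itself does not spell out the formula, so treat both the formula and the function name as assumptions.

```python
def risk_equalising_probs(home_price, away_price):
    """Convert head-to-head prices to probabilities (assumed formulation).

    Assumption: the bookmaker's total overround is levied equally, in
    absolute terms, on both teams, so half of it is subtracted from each
    team's raw inverse price.
    """
    total_overround = 1 / home_price + 1 / away_price - 1
    home_prob = 1 / home_price - total_overround / 2
    away_prob = 1 / away_price - total_overround / 2
    return home_prob, away_prob

# Illustrative prices only: $1.50 for the Home team, $2.50 for the Away team
home_prob, away_prob = risk_equalising_probs(1.50, 2.50)
# home_prob is about 0.633 and away_prob about 0.367; they sum to 1
```

By construction the two probabilities sum to 1, so only the Home team figure needs to enter the regression.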
Before moving on to the regression modelling of the Home and Away team conversion rates, here's a plot of their density and a tabular summary of some key distributional characteristics over the period.
The two distributions are broadly similar, though Home team conversion rates span a narrower range, have a lower mean, higher median and smaller standard deviation than Away team conversion rates. What they share is an upper quartile of about 60%: only 25% of Home teams and 25% of Away teams convert 6 or more Scoring Shots in 10 into goals in the course of a typical game.
I used the betareg package in R to fit the two regression models. It provides a variety of link functions for the mean and caters for the possibility of overdispersion by allowing the precision parameter, phi, to be modelled, like the mean, as a function of specified covariates. I used the same covariates to model phi as I used to model the (transformed) mean.
Though the choice of link function turned out to have relatively little effect on the fit of the models, a log link provided a moderately superior outcome, so this was used in preference to the default logit link.
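To make the roles of the mean, the link function and phi a little more concrete, here is a small Python sketch of the mean-precision form of the Beta density that beta regression is built on, where a conversion rate y follows a Beta(mu * phi, (1 - mu) * phi) distribution. It is not the betareg code, only an illustration, and the function names and example values are hypothetical.

```python
import math

def beta_log_density(y, mu, phi):
    """Log-density of Beta(mu * phi, (1 - mu) * phi) at y, for 0 < y < 1.

    mu is the expected conversion rate; larger phi means less
    game-to-game variability around mu.
    """
    a = mu * phi
    b = (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

def mean_from_log_link(eta):
    """Invert a log link: mu = exp(eta). Only sensible when eta < 0."""
    mu = math.exp(eta)
    if not 0.0 < mu < 1.0:
        raise ValueError("log link gave a mean outside (0, 1)")
    return mu

def mean_from_logit_link(eta):
    """Invert a logit link: mu = 1 / (1 + exp(-eta)), always in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical linear predictor and precision, for illustration only
mu = mean_from_log_link(-0.65)
log_density = beta_log_density(0.55, mu, phi=30.0)
```

One practical difference between the links is that the logit always returns a mean between 0 and 1, whereas the log link can in principle produce a fitted mean above 1 if the linear predictor is positive. That is not a concern when, as here, fitted rates sit near 50%.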
As well, the choice of the historical conversion rate variable - that is, whether to use the last 2, 3, 4 or more games in calculating a team's recent "accuracy" - made little difference to the model fit. Ultimately, I settled on using the previous 5 games for the Home and for the Away team for this purpose.
The two fitted models appear below and what is immediately striking about them is the lack of statistical significance of the coefficient estimates. No variable - excepting the intercept in the Away team model - has a statistically significant coefficient in the context of modelling the mean, and only one variable in each model has a statistically significant coefficient in the context of modelling the precision parameter, phi.
Further, the models explain less than 3% of the total variability in the relevant set of conversion rates.
Ignoring the issue of statistical significance (ie whether or not the true values of the coefficients might be zero) we find that Home team conversion rates are:
- Higher for Home teams with higher pre-game victory probabilities
- Higher for higher MARS Rated teams
- Lower when the opponent's (ie the Away team's) MARS Rating is higher
- Higher for teams with greater Venue Experience
- Lower when the opponent's (ie the Away team's) Venue Experience is greater
- Higher for Home teams with recent low conversion rates
- Lower when the opponent's (ie the Away team's) recent conversion rates have been higher
Similarly, we find that Away team conversion rates are:
- Lower when the Home team has a higher pre-game victory probability
- Higher for higher MARS Rated teams
- Lower when the opponent's (ie the Home team's) MARS Rating is higher
- Lower for teams with greater Venue Experience
- Lower when the opponent's (ie the Home team's) Venue Experience is greater
- Higher for Away teams with recent high conversion rates
- Lower when the opponent's (ie the Home team's) recent conversion rates have been lower
There are also some mildly differentiated effects across venues, the most notable being the generally higher conversion rates for Home and Away teams at Docklands, Kardinia Park and Manuka Oval.
In summary then, based on these modelling results, stronger teams tend to have marginally higher conversion rates, as do home teams with greater Venue Experience, and teams facing a stronger opponent tend to experience depressed conversion rates. After controlling for all of these factors, knowledge of a team's recent conversion rate history provides little that is of additional predictive value about its expected conversion rate in the current game.
This last point is reinforced by calculating linear correlations between teams' conversion rates in a current game and their historical conversion rates in the previous 2, 3, 4 or more games. These correlations range between -0.01 and +0.03.
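A correlation of this kind can be sketched as follows for a single team's sequence of conversion rates. The actual analysis would pool games across all teams, and the function names and toy data here are purely illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(rates, k):
    """Correlate each game's rate with the mean rate of the previous k games.

    rates is one team's per-game conversion rates in chronological order.
    """
    current = rates[k:]
    history = [sum(rates[i - k:i]) / k for i in range(k, len(rates))]
    return pearson_r(history, current)

# Made-up conversion rates for one team
toy_rates = [0.50, 0.60, 0.40, 0.55, 0.45, 0.60]
r = lagged_correlation(toy_rates, k=2)
```

Values of `r` near zero, like those reported above, indicate that a team's recent accuracy carries almost no linear information about its accuracy in the next game.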
For all practical purposes then, team conversion rates are almost entirely unpredictable on the basis of the regressors used in the models described.
We can do a little - though only a little - better by using team names as regressors instead. The results of fitting the same two models using team names as the only regressors appear below.
In essence, what these models are doing is fitting each team's historical average conversion rate at Home and Away, controlling for the different mix of teams they've each faced at Home and Away venues.
If we consider only the model for Home team conversion rates and focus solely on the coefficients in the top half of the table (ie those of the form Home Team is X), we can calculate that the lowest expected Scoring Shot conversion rate for a team at Home facing Adelaide is 49.6% for the Gold Coast (ie exp(-0.687 - 0.015)) and the highest is 54.0% for Hawthorn and the Western Bulldogs (ie exp(-0.687 + 0.070)). The difference then is only 4.4 percentage points.
To give that difference some context, simulation using the model described in this earlier blog post suggests that every 1 percentage point change in the expected conversion rate for the Home team facing an average Away team equates to about a 1.4 percentage point change in victory probability.
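The arithmetic in the two paragraphs above can be checked with a few lines of Python. The coefficients are the ones quoted in the text. The final step, which multiplies the spread by 1.4, assumes the simulated relationship is roughly linear over that range, which is an extrapolation on my part rather than something the simulation is stated to show.

```python
import math

INTERCEPT = -0.687           # intercept quoted in the text (log link)
GOLD_COAST = -0.015          # Home Team is Gold Coast
HAWTHORN_OR_DOGS = 0.070     # Home Team is Hawthorn / Western Bulldogs

# With a log link, the expected rate is exp(linear predictor)
lowest = math.exp(INTERCEPT + GOLD_COAST)          # about 0.496
highest = math.exp(INTERCEPT + HAWTHORN_OR_DOGS)   # about 0.540

spread_points = (highest - lowest) * 100           # about 4.4 points

# Assumption: 1.4 points of victory probability per point of conversion
# rate holds linearly across the whole spread
implied_swing_points = spread_points * 1.4         # roughly 6 points
```

On that assumption, the gap between the least and most accurate Home teams is worth about 6 percentage points of victory probability.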
So, whilst knowledge of the particular teams in a given contest allows us to do a little better than chance in predicting the teams' Scoring Shot conversion rates in that game, the overwhelming majority of the variability remains unpredictable. Even a model (not shown here) incorporating all of the regressors from the two models above - and therefore at greater risk of overfitting the data - explains less than 5% of the variability.
For most practical purposes, treating a team's Scoring Shot conversion rate in a particular game as a random draw from a Beta Binomial with fixed parameters - choosing the fixed parameters solely on the basis of whether the team is playing at Home or Away - seems to be empirically justified. Slightly improved predictions might be made by adjusting the expected conversion rate to reflect the long term conversion rate history of a particular team, but the predictive gains are small.
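A minimal sketch of what treating conversion as a Beta Binomial draw looks like in simulation is given below. The alpha and beta values are placeholders chosen only to give a mean near 53%. They are not the parameters estimated in the earlier posts, and the function name is hypothetical.

```python
import random

def simulate_goals(scoring_shots, alpha, beta, rng):
    """Simulate goals from a Beta Binomial conversion process.

    First draw the game's conversion rate p from Beta(alpha, beta), then
    treat each Scoring Shot as an independent goal with probability p.
    Returns (goals, p).
    """
    p = rng.betavariate(alpha, beta)
    goals = sum(1 for _ in range(scoring_shots) if rng.random() < p)
    return goals, p

# Placeholder parameters (assumptions, not fitted values)
ALPHA, BETA = 50.0, 45.0
rng = random.Random(2014)  # seeded so the example is reproducible

goals, p = simulate_goals(25, ALPHA, BETA, rng)
behinds = 25 - goals
```

In this setup the only thing that would distinguish a Home team from an Away team is the pair of fixed parameters passed in, which is the simplification the results above appear to support.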
(I should note that it would be perfectly reasonable to adjust the expected conversion rates in a specific game for any known on-the-day factors such as unusually high winds or inclement weather. I don't have the necessary data to estimate the appropriate adjustments for such factors, however.)