What Variables Are Used in MoS Statistical Models?
In a fine example of making a virtue of necessity, I've steadfastly defended my practice of excluding any player information from the statistical models I build. The thought of scanning the papers every week for news of player ins and outs makes me shudder even now.
What I do use as predictors for modelling game outcomes are:
Variables reflecting the Venue
- Which team is the home team
- Where the game is being played
- How much experience each team has at the venue
Measures of Team Strength and Recent Form
- The TAB Bookmaker's pre-game head-to-head prices
- The teams' Team Ratings
- The teams' recent scoring record
In most sports, teams have a designated and unique home ground so it's always clear which team is playing at home and which is playing away. This was true of the VFL/AFL competition for most of its history too until, in recent seasons, teams began sharing home grounds, calling multiple venues home, and playing designated "home" games at neutral venues - or even, wackiest of all, playing them at their opponent's home ground.
These recent practices confuse things because, empirically, there is a Home Ground Advantage (HGA) in AFL.
In the early years of MoS I decided, at the start of each season, which team I'd recognise as the Home team in every contest for that season. That began to feel a bit arbitrary so I now simply recognise as the home team for MoS purposes whichever team the AFL designates as the home team. There are, in any case, other variables I use that can adjust for some of the anomalies this might introduce when, for example, Richmond play the Gold Coast at "home" at Cazaly's Stadium.
(For the MoSSBODS Team Rating System I go a step further than this and look at the Venue Effects for both the designated Home and the designated Away teams, recognising that some teams actually tend to do worse than you'd expect when they're playing at some "home" venues.)
With the introduction of Greater Western Sydney in 2012 and the Gold Coast in 2011, the AFL competition now includes 8 non-Victorian and 10 Victorian teams, with home grounds spanning 5 States. Interstate travel by the Away team increases the HGA enjoyed by the Home team, though the size of this incremental advantage has been diminishing in recent years.
To explicitly incorporate the beneficial effect for the Home team of interstate travel by the Away team I include in many models a 'dummy' variable that takes on the value:
- +1 if the Home team is playing in its home State and the Away team is playing outside its home State
- 0 if the Home team and the Away team are both playing in or both playing outside their home States
- -1 if the Home team is playing outside its home State and the Away team is playing in its home State
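As a minimal sketch of the three cases just listed (not the actual MoS code; the function name, argument names, and state codes are assumptions made for illustration), the dummy could be computed like this:

```python
def interstate_dummy(venue_state: str, home_team_state: str, away_team_state: str) -> int:
    """Return +1, 0 or -1 according to which team, if either, is playing in its home State.

    All three arguments are State codes such as "VIC" or "QLD" (hypothetical labels).
    """
    home_in_state = venue_state == home_team_state
    away_in_state = venue_state == away_team_state
    # +1 when only the Home team is in its home State,
    # -1 when only the Away team is, and 0 when both or neither are.
    return int(home_in_state) - int(away_in_state)


# Richmond (Victorian) designated as Home against the Gold Coast (Queensland)
# at Cazaly's Stadium in Queensland: the Home team has travelled, the Away team has not.
print(interstate_dummy("QLD", "VIC", "QLD"))  # -1
```

Writing the variable as a difference of two indicators makes the symmetry explicit: the two "0" cases (both teams at home, both teams travelling) fall out without special handling.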
One of the reasons posited for the existence of a Home Ground Advantage is the Home team's relative familiarity with the game venue compared to the Away team. We can treat familiarity as a binary concept simply by recognising which team is the home team and which is the away team or we can, instead or as well, treat familiarity as a continuous variable.
To do this I include two variables, one for each team, which reflect the number of games each has played at the venue of the current game at any time in the previous 12 calendar months.
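A rough sketch of how such a count might be produced is shown here. The tuple layout for past games, the function name, and the approximation of "12 calendar months" as 365 days are all assumptions for illustration, and the fixtures are made up.

```python
from datetime import date, timedelta


def venue_experience(team, venue, game_date, past_games, window_days=365):
    """Count the games `team` played at `venue` in the window before `game_date`.

    `past_games` is an iterable of (played_on, home_team, away_team, ground) tuples.
    The 365-day window is an approximation of 12 calendar months.
    """
    window_start = game_date - timedelta(days=window_days)
    return sum(
        1
        for played_on, home, away, ground in past_games
        if ground == venue
        and team in (home, away)  # experience counts whether home or away
        and window_start <= played_on < game_date
    )


# Made-up fixtures, purely to exercise the function.
games = [
    (date(2015, 4, 4), "Team A", "Team B", "Ground X"),
    (date(2015, 8, 1), "Team C", "Team A", "Ground X"),
    (date(2014, 3, 1), "Team A", "Team D", "Ground X"),  # too old to count
    (date(2015, 6, 1), "Team A", "Team E", "Ground Y"),  # different venue
]
print(venue_experience("Team A", "Ground X", date(2015, 9, 1), games))  # 2
```

The function would be called twice per game, once for each team, to produce the two variables.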
Measures of Team Strength
Bookmakers are exceptionally good at quantifying the relative strengths of teams, which they convey in the head-to-head prices they offer. MoS's bookmaker of choice (well, let's be honest, 'convenience' or 'necessity' are nearer the truth) is the TAB Sportsbet market-maker, whose pricing opinions I usually gather at noon on Wednesdays.
Bookmaker prices can be translated into a measure of the teams' relative strengths in a number of ways, but the two formulations I use most commonly are:
- The Home Team Implicit Probability: which is defined as the Away Team's Price divided by the sum of the Home Team's and the Away Team's Prices.
- The Home Team Log Probability Ratio: which is defined as the log of the ratio of the Home Team Implicit Probability to the Away Team Implicit Probability.
Most of the possible formulations that convert prices into measures of relative strength are highly correlated for all prices except the very small (say less than $1.10) and the very large (say greater than $10). At those extremes, different formulations can behave very non-linearly. For example, consider two different head-to-head markets, the first $1.05 and $10, the second $1.04 and $11. The Implicit Probability measures for the favourites in these two markets are 90.5% and 91.4%, which is about a 1% difference, whereas the Log Probability Ratio measures (using base-10 logs) are 0.98 and 1.02, which is about a 5% difference. The Odds Ratio, another potential measure and defined as (Away Team Price - 1) / (Home Team Price - 1), provides relative strength measures of 180 and 250, which is a 39% difference. Finding a way to reliably incorporate bookmaker opinion in statistical models across the range of team prices is an area of ongoing interest to me.
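The three measures can be sketched in a few lines, taking the favourite as the Home team in both example markets. The function names are my own for illustration, and the use of base-10 logs is an assumption inferred from the quoted figures of 0.98 and 1.02 (natural logs would give roughly 2.25 and 2.36).

```python
import math


def implicit_probability(home_price, away_price):
    """Home Team Implicit Probability: away price over the sum of both prices."""
    return away_price / (home_price + away_price)


def log_probability_ratio(home_price, away_price):
    """Log (base 10, by assumption) of home probability over away probability."""
    p_home = implicit_probability(home_price, away_price)
    return math.log10(p_home / (1 - p_home))


def odds_ratio(home_price, away_price):
    """(Away Team Price - 1) / (Home Team Price - 1)."""
    return (away_price - 1) / (home_price - 1)


for home_price, away_price in [(1.05, 10.0), (1.04, 11.0)]:
    p = implicit_probability(home_price, away_price)
    lpr = log_probability_ratio(home_price, away_price)
    ratio = odds_ratio(home_price, away_price)
    print(f"{p:.1%}  {lpr:.2f}  {ratio:.0f}")
# 90.5%  0.98  180
# 91.4%  1.02  250
```

Because the Odds Ratio divides by (Home Team Price - 1), it becomes extremely sensitive as the favourite's price approaches $1, which is where the 39% gap comes from.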
As a supplement to bookmaker prices - and as a replacement for them when I need to project into a future in which I don't have them because they haven't yet been formed - I've created my own team Rating systems, MARS, MoSSBODS, and ChiPS.
These systems are all ELO-style systems and so reward teams for their game-day performance relative to what might be expected of them given the strength of their opponents and whether they face them at home, away, or on a neutral venue. The ELO approach includes a number of user-selectable parameters and, as you'll see in those earlier links, I've spent a considerable time optimising (and re-optimising) those parameters, especially for MoSSBODS and ChiPS.
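To make the general idea concrete, here is a generic ELO-style update expressed in terms of game margins. It is only an illustration of the family of methods, not the formula used by MARS, MoSSBODS, or ChiPS; the function name, the update multiplier `k`, and the home ground allowance `hga` are hypothetical values chosen for the example.

```python
def elo_style_update(home_rating, away_rating, actual_margin, k=0.1, hga=8.0):
    """Return updated (home_rating, away_rating) after one game.

    Ratings are measured in points, so the expected home margin is the
    rating difference plus a home ground allowance (use hga=0 for a neutral venue).
    `k` and `hga` are illustrative, user-selectable parameters.
    """
    expected_margin = home_rating - away_rating + hga
    surprise = actual_margin - expected_margin  # positive if Home outperformed
    # The Home team's gain is the Away team's loss, so total rating is conserved.
    return home_rating + k * surprise, away_rating - k * surprise


# Home rated +10, Away rated 0, Home wins by 38 against an expected 18.
print(elo_style_update(10.0, 0.0, 38.0))  # (12.0, -2.0)
```

A team that wins by less than expected loses rating points under this scheme, which is what distinguishes it from simply counting wins.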
Recent Scoring Record
Empirically, Bookmaker prices and Team Ratings seem to capture teams' short-term and medium-term performance, but neither seems to completely capture the predictive content of teams' longer-term performance. Accordingly, I include in many models variables to reflect the average for and against differential for the home and away teams over the course of the most recent 16 rounds of the current season.
Prior to the end of the 16th round of a given season I set these variables equal to the average for and against differential for each team over the course of all completed rounds in the current season and, for the 1st week of the season, I set these variables equal to zero.
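The windowing and the early-season fallbacks can be sketched as follows. I'm assuming here that the differential is points for minus points against, averaged per completed round, and that byes are ignored; the function and argument names are illustrative only.

```python
def average_differential(round_margins, window=16):
    """Average for-and-against differential over the most recent `window` rounds.

    `round_margins` holds one entry per completed round of the current season
    (points for minus points against), oldest first.
    """
    recent = round_margins[-window:]  # all rounds if fewer than `window` exist
    if not recent:
        return 0.0  # first week of the season: no completed rounds yet
    return sum(recent) / len(recent)


print(average_differential([]))                     # 0.0
print(average_differential([10, -4]))               # 3.0
print(average_differential([100] * 4 + [5] * 16))   # 5.0
```

Slicing with a negative index handles both fallbacks without extra branching: a short season-to-date list is returned whole, and an empty one triggers the zero default.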
Another way of incorporating a team's form into the modelling is to calculate the change in its Team Ratings over the past few games. The advantage of this approach over the simple averaging approach just described is that Ratings changes reflect the quality of opposition against which recent results were achieved.
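A minimal sketch of that alternative is given below, assuming a team's Rating is recorded after every game. The function name and the default of three games are hypothetical, since "the past few games" isn't pinned to a specific number here.

```python
def rating_change(rating_history, games_back=3):
    """Change in a team's Rating over its last `games_back` games.

    `rating_history` lists the Rating after each game, oldest first, with the
    starting Rating at index 0. If fewer games have been played, the change
    since the starting Rating is returned.
    """
    if len(rating_history) < 2:
        return 0.0  # no games played yet
    start_index = max(0, len(rating_history) - 1 - games_back)
    return rating_history[-1] - rating_history[start_index]


# Made-up Ratings: start at 1000, then four games.
print(rating_change([1000, 1004, 1001, 1010, 1012]))  # 8
```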