This week I've been part of a Twitter conversation about Home Ground Advantage in the AFL, a trending topic because of the shift from Football Park to Adelaide Oval for the home games of Adelaide and Port Adelaide in the 2014 season.
The broad questions under discussion have been:
- How much of a disadvantage will Port Adelaide and Adelaide suffer with the switch of home grounds?
- At what point will the disadvantage be significantly reduced?
To investigate these questions in this blog I'm going to use data for the home-and-away portions of the 8 most recent seasons (i.e. from 2006 to 2013) and measure the nature and extent of Home Ground Advantage by creating random forest classification models where each game is classified as a "home team win" or a "home team loss", with drawn games being ignored.
As I've written before, Home Ground Advantage can be parameterised in a startling variety of ways. Today I'll be encapsulating it by using two features of each game:
- Its Interstate Status (refer here) from the viewpoint of the designated Home team
- The Excess Venue Experience (refer here) of the designated Home team, which is the number of times it has played at the game venue during the past 12 months more often than has its opponent. Since the start of season 2006, over 80% of the time the value of Excess Venue Experience has been positive, and values greater than +12 have been relatively uncommon (they've occurred in about 6% of games).
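To make the Excess Venue Experience definition above concrete, here is a minimal sketch of how it might be computed. The record layout, the function names, the 365-day reading of "past 12 months" and the fixtures themselves are all assumptions made for illustration; they are not taken from the data behind this post.

```python
from datetime import date, timedelta


def venue_appearances(history, team, venue, game_date, window_days=365):
    """Count a team's games at a venue in the window_days before game_date."""
    start = game_date - timedelta(days=window_days)
    return sum(
        1
        for played_on, played_at, home, away in history
        if played_at == venue
        and team in (home, away)
        and start <= played_on < game_date
    )


def excess_venue_experience(history, home, away, venue, game_date):
    """Home team's recent appearances at the venue minus the away team's."""
    return (venue_appearances(history, home, venue, game_date)
            - venue_appearances(history, away, venue, game_date))


# Invented fixtures, not real results: (date, venue, home team, away team)
history = [
    (date(2013, 4, 6), "Football Park", "Adelaide", "Brisbane"),
    (date(2013, 5, 4), "Football Park", "Port Adelaide", "Richmond"),
    (date(2013, 6, 1), "Football Park", "Adelaide", "Fremantle"),
    (date(2013, 7, 6), "Football Park", "Adelaide", "Richmond"),
    (date(2012, 3, 31), "Football Park", "Adelaide", "Richmond"),  # too old
]

eve = excess_venue_experience(
    history, "Adelaide", "Richmond", "Football Park", date(2013, 8, 10)
)
print(eve)  # 3 home-team appearances minus 2 away-team appearances
```

A positive value means the designated Home team has been at the ground more often than its opponent over the window; a negative value means the reverse.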
In the modelling, I want to control for the relative strengths of the teams, for which purpose I've included the MARS Ratings of both teams, and for the teams' recent form, for which I've included each team's average for-and-against difference for their most recent 2, 6 and 16 games in the current season. If the season has not yet spanned 2, 6 or 16 games for a team, I average instead across the number of games that have been played in the current season. So, for example, if the season is only 10 games old, the average for-and-against for the Home and for the Away teams for the last 16 games will be based solely on the 10 games of the current season.
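The windowing rule for the form variables can be sketched as below. The function name and the margins are invented for illustration, and returning 0 for a team that has not yet played in the season is my assumption, since the text above does not cover that case.

```python
def recent_form(margins, window):
    """Average for-and-against difference over the last `window` games.

    `margins` lists a team's margins for the games it has already played
    in the current season, oldest first. If fewer than `window` games have
    been played, every available game is averaged instead.
    """
    recent = margins[-window:]
    if not recent:
        return 0.0  # assumption: no games played yet means neutral form
    return sum(recent) / len(recent)


# Invented margins for a season that is 10 games old
season_margins = [12, -30, 45, 7, -3, 20, 18, -25, 60, 6]

form = {window: recent_form(season_margins, window) for window in (2, 6, 16)}
print(form)
```

With only 10 games played, the 16-game figure is simply the average of all 10 margins, which matches the example in the paragraph above.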
Bookmaker prices, or probabilities derived from them, have been deliberately excluded from the modelling process because these prices, unlike MARS Ratings and recent form, will almost certainly include a component reflecting the Home team's Excess Venue Experience. By excluding these prices I'm able to obtain a direct read on the effects of Excess Venue Experience.
So, in short, the formula for my random forests is:
Home Team Result = Excess_Venue_Experience + Interstate_Status + Own_MARS_Rating + Opponent_MARS_Rating + Own_Ave_FA_Last_2 + Opp_Ave_FA_Last_2 + Own_Ave_FA_Last_6 + Opp_Ave_FA_Last_6 + Own_Ave_FA_Last_16 + Opp_Ave_FA_Last_16
I fitted this model to each of the 8 seasons separately and to the eight seasons as a whole.
The random forest formulation offers a number of advantages:
- The random forest algorithm has been shown to be an excellent all-purpose modelling algorithm in a variety of domains.
- It automatically handles non-linear relationships. So, for example, if it turns out that each additional game of Excess Venue Experience has a different impact on the designated Home team's chances of victory, this will automatically be included in the random forest's output.
- It provides for what are called "Partial Dependence Plots", which trace out the relationship between any of our input variables and our target variable.
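The idea behind a partial dependence curve can be sketched without any modelling library: force one input to a given value for every game, average the model's predictions, and repeat across a grid of values. In the sketch below `toy_model` is a made-up stand-in for a fitted model, not the random forest from this post, and the rows and coefficients are invented. Averaging predicted probabilities is one common form of the calculation; the plots in this post may use a different scale.

```python
import math


def partial_dependence(predict_proba, rows, feature, grid):
    """Average prediction with `feature` forced to each value in `grid`."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row, **{feature: value})  # copy, then override
            total += predict_proba(modified)
        curve.append((value, total / len(rows)))
    return curve


def toy_model(row):
    """Hypothetical stand-in for a fitted model's home-win probability."""
    score = 0.02 * (row["own_rating"] - row["opp_rating"])
    if row["excess_venue_experience"] >= 5:
        score += 0.3  # arbitrary step, chosen only to give a non-linear curve
    return 1.0 / (1.0 + math.exp(-score))


# Invented games, for illustration only
rows = [
    {"own_rating": 1010, "opp_rating": 990, "excess_venue_experience": 2},
    {"own_rating": 985, "opp_rating": 1005, "excess_venue_experience": 8},
    {"own_rating": 1000, "opp_rating": 1000, "excess_venue_experience": 0},
]

curve = partial_dependence(toy_model, rows, "excess_venue_experience", [0, 3, 5, 7])
for value, prob in curve:
    print(value, round(prob, 3))
```

Because every other input keeps its observed value, the resulting curve isolates how the averaged prediction moves as the one chosen input changes.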
As evidence of the efficacy of this modelling approach, the "out-of-bag" error estimates (broadly, the algorithm's assessment of how well, on average, it predicts the result of games that are excluded from a particular tree in its forest) for each season range from a high of about 40% for 2006 to a low of 29% for 2012. For the model built using all eight seasons' data at once the error estimate is 33.5%. In other words, using only the data described earlier, the random forest fitted to seasons 2006 to 2013 as a whole correctly picks the winner (on out-of-bag games) 66.5% of the time.
That's good enough, I say.
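For readers who want to try something similar, here is a rough sketch of fitting a random forest and reading off its out-of-bag error. It uses scikit-learn, which is my choice of tool rather than anything stated in this post, and it runs on synthetic data with an invented outcome rule and only three of the ten inputs, so the numbers it prints say nothing about real AFL games.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_games = 400

# Synthetic stand-in data, NOT real AFL games
own_rating = rng.normal(1000, 15, n_games)
opp_rating = rng.normal(1000, 15, n_games)
excess_venue_experience = rng.integers(-5, 15, n_games)
X = np.column_stack([excess_venue_experience, own_rating, opp_rating])

# Invented outcome rule so that the ratings carry some signal
p_home_win = 1 / (1 + np.exp(-0.05 * (own_rating - opp_rating)))
y = rng.random(n_games) < p_home_win  # True = home team win

# oob_score=True asks for accuracy on the games each tree did not see
forest = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
forest.fit(X, y)

oob_error = 1 - forest.oob_score_
print(f"OOB error: {oob_error:.3f}, OOB accuracy: {forest.oob_score_:.3f}")

# Overall variable importances, which sum to 1
names = ["Excess_Venue_Experience", "Own_MARS_Rating", "Opponent_MARS_Rating"]
importances = dict(zip(names, forest.feature_importances_))
print(importances)
```

The out-of-bag accuracy is just one minus the out-of-bag error, which is the same arithmetic that turns the 33.5% error estimate above into a 66.5% success rate.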
Though I could produce partial dependence plots for all 10 of the input variables, I'll only show them here for 3 variables: Own MARS Rating, Opponent MARS Rating and Excess Venue Experience.
Plots for the MARS Rating variables are shown mainly to provide context to those for Excess Venue Experience in a way I'll explain in a moment. But, firstly, here are the plots for seasons 2006 to 2009.
These plots should be viewed in pairs, the first two in the upper left, which relate to season 2006, the next two in the upper right, which relate to season 2007, and so on. Each line in a chart maps the relationship between a particular input and the chances of a Home team win. So, for example, the green line in the chart at top left traces the relationship between the MARS Rating of the Home team and its victory prospects.
From this chart you can make the following observations for season 2006:
- Based on the green line, the Home team's chances of victory generally increase as its MARS Rating increases, though they're essentially flat for Ratings between about 950 and 990, and for Ratings of about 1,000 or higher
- Based on the black line, the Home team's chances of victory generally decrease as the MARS Rating of its opponent increases, pretty much uniformly across the range from 980 to 1,020
- Because the range of the black line (in terms of the y-axis, which measures impact) is greater than the range of the green line, the Rating of the Home team's opponent can be said to have a larger impact on the Home team's chances than the Rating of the Home team itself. (Strictly speaking this claim also requires that the bulk of the observed values for the two Ratings are in those portions of the curves where the range of the black line exceeds the range of the green. That is the case here, and elsewhere.)
Turning our attention next to the blue line in the neighbouring chart we can make the following additional observations for season 2006:
- The Home team's chances rise with increased Excess Venue Experience to a maximum at an Excess Venue Experience of about 5 games
- Excess Venue Experience is a lesser determinant of the Home team's victory chances than is its own MARS Rating and, especially, the MARS Rating of its opponent.
Reviewing in the same manner the charts for the other three seasons we can see that it's generally true that:
- The MARS Rating of the Home team's Opponent is the most important variable, its own MARS Rating is next most important, and its Excess Venue Experience is, while still somewhat important, least important of all. (One of the outputs of the random forest algorithm is an overall assessment of the importance of every variable in the model. It consistently ranks Excess Venue Experience as the 2nd least important variable, ahead of only Interstate Status, and just slightly less important than some of the form variables. Own and Opponent MARS Ratings always rank 1st and 2nd, with Opponent MARS Rating most often but not always in 1st.)
- The importance of Excess Venue Experience reaches its maximum around the 5 to 7 game mark.
(By the way, as tempting as it might be to do so, you can't really compare curves on the basis of their range from one modelled season to another - so you can't determine whether, for example, Own MARS Rating is more or less important in 2007 than it was in 2009.)
Next we'll look at the same chart for seasons 2010 to 2013.
The first thing to note about these charts is that they show that team Ratings have been more important than Excess Venue Experience throughout this period as well. Indeed, in 2010 Excess Venue Experience bore virtually no relationship to the Home team's chances, and in 2011 and 2013 the relationship was weak, at best. The "magic number" for Excess Venue Experience, as much as one can be said to exist for this period, appears to be in the +5 to +7 range.
Notwithstanding the clear differences in the relationship between Excess Venue Experience and Home team victory chances across the eight seasons, combining the data for all of them might remove some of the "noise" and help better generalise the key relationships. So, let's do that, and plot the results.
We find the same ordering in the importance of the variables shown, with Opponent MARS Rating most important, Own MARS Rating next most important, and Excess Venue Experience least important.
Also, we find that Excess Venue Experience has roughly the same impact on Home team chances for any value below about +5, after which its impact spikes for values up to about +7 before plateauing - even diminishing a little - for higher values.
The answers to the questions posed at the start of this blog are therefore that Excess Venue Experience is only mildly important in determining a team's victory prospects, and far less important than the Ratings of the teams involved in the contest, and that the benefit of Excess Venue Experience reaches its peak at a value of around +7.
TEAM HISTORY OF EXCESS VENUE EXPERIENCE
With the foregoing analysis in mind it's interesting to analyse the average Excess Venue Experience that each team has enjoyed when playing as the designated Home team during the past 8 seasons.
There is, of course, a large State component to this analysis, with the South Australian, Queensland, and West Australian teams all enjoying large average Excess Venue Experience.
Sydney, despite being another non-Victorian team, has benefited less from this factor because it has split its home games across Stadium Australia and the SCG.
Most of the Victorian teams have much lower average Excess Venue Experience values, especially Geelong, Richmond, Carlton and Essendon. Geelong's average has been much affected by its playing home games at Docklands and the MCG, grounds at which it has regularly faced opponents, such as Collingwood, who have had 5 or more additional appearances during the preceding 12 months.
Hawthorn has improved its average in recent seasons partly by playing some home games at Aurora Stadium, but also by playing almost all of its Victorian home games at the MCG. The Roos have bolstered their average Excess Venue Experience in similar ways, by playing some home games at Bellerive and by consolidating their Victorian home games at Docklands, avoiding the G, which they had also used in earlier seasons.
Amongst the remaining teams, the Western Bulldogs have benefited from shifting almost all of their home games to Docklands, while St Kilda have suffered somewhat in recent years from playing a few home games at the G against more venue-experienced opponents and from now also facing more venue-experienced teams at their major home ground, Docklands.
We know though, from the earlier analysis, that Excess Venue Experience is not linearly related to a Home team's chances. Instead, values of +5 and more and, especially, of +7 and more, are most important. Here then is a table with the number of games that each team has played at home when enjoying Excess Venue Experiences at or above these thresholds.
The pattern across teams for the number of games where the Excess Venue Experience is +5 or more is similar to that for average Excess Venue Experience for all teams except Geelong, who have played a surprisingly high number of games while enjoying such an advantage.
Moving to the lower table, which shows the number of home games played with an Excess Venue Experience of +7 or higher, we see that the Cats' advantage disappears for the three most recent seasons on account of their slightly lighter schedule at Kardinia Park. They now join Carlton and Essendon - and, more recently, Hawthorn - as teams virtually starved of games where they enjoy significant Excess Venue Experience.
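Counts like those discussed above, of home games at or above an Excess Venue Experience threshold, could be tallied along the following lines. The record layout, team labels and values are invented for illustration and do not reproduce the tables in this post.

```python
from collections import Counter


def games_at_or_above(home_games, threshold):
    """Count home games per (team, season) with EVE at or above threshold."""
    counts = Counter()
    for team, season, eve in home_games:
        if eve >= threshold:
            counts[(team, season)] += 1
    return counts


# Invented records: (home team, season, Excess Venue Experience)
home_games = [
    ("Team A", 2012, 9), ("Team A", 2012, 6), ("Team A", 2013, 4),
    ("Team B", 2012, 5), ("Team B", 2013, 7), ("Team B", 2013, -2),
]

at_least_5 = games_at_or_above(home_games, 5)
at_least_7 = games_at_or_above(home_games, 7)
print(at_least_5[("Team A", 2012)], at_least_7[("Team A", 2012)])
```

A `Counter` returns 0 for any team and season with no qualifying games, which is convenient when laying the results out as a team-by-season table.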
In the context of the original premise for this blog, Adelaide and Port Adelaide are in for something of a shock next season when they find that their Excess Venue Experience is far smaller than they've enjoyed in previous seasons.