Some of you will already know about the fantastic fitzRoy R package, which makes available a slew of match and player data spanning the entire history of the V/AFL. A number of people are already doing interesting things with that data, and this is the first in what I expect will be a series of blog posts presenting my own investigations and analyses.
Specifically, in this post we'll be covering the following:
- How has individual player experience changed over the years? Are the players in a typical game today more or less experienced - on average in terms of games played - than their counterparts from previous eras?
- How has shared player experience changed over the years? Have the players in a typical game today played together more or less often than their counterparts from previous eras?
- What predictive value does individual or shared player experience offer to us? Are we better able to predict game margins knowing the individual and shared experience of the players involved, compared to a model where we only use team rating and venue performance data?
For all the analyses we will use the data spanning from Round 1 of 1897 to the end of the 2018 home and away season.
PLAYER EXPERIENCE DATA
First, let's look at individual player experience across history by plotting density functions for the average number of previous V/AFL games played by the players taking part in games during each era. We use average rather than aggregate experience because team sizes have varied over time from 18 to 22 players per side.
We see a general increase in the average experience levels of players in each successive era, culminating in the current one where the average player taking the field has done it 92 times before, about 16 games more often than those from the previous era, and more than twice as often as those from the earliest era, where the average was just around 40 games per player.
Using this measure, the most experienced team ever to take the field in a V/AFL game was the Brisbane Lions' Round 22 side of 2004, with an aggregate experience of 3,718 games, or an average of 169 games per player. The Lions' Round 21 side was only slightly less experienced (3,696 games), and the Hawthorn 2015 Grand Final team comes in 3rd with an aggregate of 3,648 games, an average of about 166 games per player.
It seems highly logical that the level of average individual experience in a team should affect its performance and that more experienced teams should do better than less experienced ones.
In the first chart we investigate, at the game level, the relationship between the difference in average individual experience (ie the average number of previous games played for this or any other team by all members of the current team) for the two sides in a contest and the final margin in that contest.
We find that, as we would expect, larger differences in average individual experience are associated with larger final margins. Across time, however, the expected margin benefit from additional experience appears to have declined a little. In the 1940-1959 era, for example, a difference of 25 games in average individual experience was worth about 30 points, whereas in the modern era it is worth only about 20 to 25 points of margin.
Across all eras, the correlation between the difference in average individual experience and game margin is only +0.34, which means that only about 12% of the variability in margins can be explained by the variability in average experience differences.
Maybe what's more important for a team than average individual experience is average shared experience - that is, the average number of games that any pair of members of the team have played together previously. Our measure for this is average pairwise shared experience, which we calculate by summing all of the shared experience counts for every pair of players on a team and then dividing by the number of pairs.
So, for example, if Players A and B have previously played on the same team 10 times, Players A and C have played together 5 times, and Players B and C have played together 12 times, then the aggregate pairwise shared experience for those three players is 27 games. Their average per pair is therefore 27/3, or 9 games. Calculating this value for a modern, 22-man team in a single game requires that we sum the shared experience across all 231 possible pairs of players (ie 22 x 21 divided by 2) and then divide by 231.
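The calculation above is simple enough to sketch in code. The analysis behind this post is done in R, but here is a minimal, illustrative Python version; the function name and the dictionary-of-pairs representation are my own choices, not anything from the original analysis:

```python
from itertools import combinations

def avg_pairwise_shared_experience(players, shared_games):
    # shared_games maps a frozenset of two players to the number of games
    # that pair has previously played together (pairs absent from the
    # dict are treated as never having played together).
    pairs = list(combinations(players, 2))  # 22 players -> 231 pairs
    total = sum(shared_games.get(frozenset(pair), 0) for pair in pairs)
    return total / len(pairs)

# The worked example from the text: A&B 10 games, A&C 5, B&C 12
shared = {frozenset("AB"): 10, frozenset("AC"): 5, frozenset("BC"): 12}
print(avg_pairwise_shared_experience(["A", "B", "C"], shared))  # 9.0
```

For a full modern side the same call simply takes 22 players, and `combinations` generates the 231 pairs automatically.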
The era-by-era profiles appear below.
Again we find that experience, as measured by this shared metric, has risen over time. Were we to select a pair of players at random from an AFL game in progress in the modern era, we would expect them to have played previously together about 34 times. Were we to do the same thing in the era from 1897 to 1919, that figure would be closer to 16.
Using this measure, the team with the highest level of shared experience ever to take the field in a V/AFL game was the Geelong side in the 2004 Qualifying Final, with an aggregate shared experience of 22,667 games, or an average of 98 games per pair. Geelong teams from that same year also fill positions 2nd to 4th on the list of most experienced, while their 2009 sides fill 5th and 6th.
It's not until we get to 7th that we find a different team (that 2004 Round 22 Brisbane Lions side, as it happens), with an aggregate shared experience of 19,688 games, or just over 85 games per pair.
This shared measure is highly correlated with the individual measure we used earlier. In fact, the correlation between individual and shared experience across all eras is +0.92, which means that about 84% of their variability is common. The chart below shows this relationship, by era.
One of the striking features of this chart is the growth in the spread of average individual and shared experience over time. We can also see that, in most eras, the relationship between individual and shared experience is not strictly linear: at higher levels of average individual experience, shared pairwise experience grows more rapidly. In short, if a team has a lot of experienced players it will tend to have disproportionately high levels of shared pairwise experience.
So, we might ask: how correlated is the difference in this shared metric with final game margins?
The answer: slightly more correlated than the individual experience metric. Here the correlation is +0.39 across all eras, meaning that about 15% of the variability in game margins can be explained by the variability in shared experience differences.
To put that number in context, if we look only at the modern era, the percentage of explained variability is about 17.6%. Using MoSHBODS' forecasts we explain 32% of the variability during that same period. Using bookmaker prices we can get to about 34%.
So, we know that we're better able to forecast margins using MoSHBODS alone than we would be using either of the experience metrics alone, but can we do a better job still by using all of them?
To test this, we'll build a random forest on a randomly selected 50% sample of all 15,398 games from V/AFL history (choosing whether to adopt a home team or away team perspective also at random), and then estimate the mean absolute error (MAE) of predictions from this model on the remaining 50% of games - the "holdout" sample. We compare that MAE with what we get from using MoSHBODS alone on that same 50% holdout sample.
The model we'll fit is:
Game Margin = MoSHBODS Expected Margin + Own Average Individual Experience + Opponent Average Individual Experience + Own Average Shared Experience + Opponent Average Shared Experience
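The original model is fitted in R, but the overall setup - a random 50% train/holdout split, a random forest on the five inputs, and an MAE comparison against MoSHBODS alone - can be sketched in Python with scikit-learn. Everything below uses synthetic stand-in data; the variable ranges are loosely inspired by the figures in this post but the numbers themselves are made up for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n = 2000  # synthetic games; the real analysis uses 15,398 V/AFL games

# Synthetic stand-ins for the five model inputs
X = np.column_stack([
    rng.normal(0, 15, n),   # MoSHBODS expected margin
    rng.normal(80, 25, n),  # own average individual experience
    rng.normal(80, 25, n),  # opponent average individual experience
    rng.normal(30, 12, n),  # own average shared experience
    rng.normal(30, 12, n),  # opponent average shared experience
])
margin = X[:, 0] + rng.normal(0, 25, n)  # margins driven mainly by ratings

# Random 50% train / 50% holdout split, mirroring the blog's setup
idx = rng.permutation(n)
train, hold = idx[: n // 2], idx[n // 2:]

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X[train], margin[train])

mae_rf = mean_absolute_error(margin[hold], rf.predict(X[hold]))
mae_moshbods = mean_absolute_error(margin[hold], X[hold, 0])  # ratings alone
print(f"RF MAE: {mae_rf:.1f}  MoSHBODS-only MAE: {mae_moshbods:.1f}")
```

With real data, the interesting question is whether the forest's holdout MAE beats the MoSHBODS-only MAE; as discussed below, in the actual analysis it generally does not.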
One output from this model is a variable importance plot, which allows us to rank and talk in broad, quantitative terms about the predictive ability of all the input variables.
Two measures of variable importance are provided, the definitions for which are, loosely:
- The percentage increase in the mean squared error of the model when we permute (randomise in a defined way) a particular variable
- The average increase in node purity when a particular variable is used in a tree in the forest
(see this PDF for more detail).
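The first of those measures is easy to illustrate by hand: fit a forest, then shuffle one input column at a time and see how much the mean squared error rises. The `%IncMSE` label below mirrors the R randomForest package's terminology; the data and variable names are synthetic, purely to show the mechanics:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 1500
# Two synthetic inputs: one strong signal, one weak
strong = rng.normal(0, 15, n)
weak = rng.normal(0, 15, n)
y = strong + 0.1 * weak + rng.normal(0, 10, n)
X = np.column_stack([strong, weak])

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
base_mse = np.mean((y - rf.predict(X)) ** 2)

# Permutation importance: shuffle one column, measure the rise in MSE
inc = {}
for j, name in enumerate(["strong", "weak"]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    mse = np.mean((y - rf.predict(Xp)) ** 2)
    inc[name] = 100 * (mse - base_mse) / base_mse
    print(f"{name}: %IncMSE = {inc[name]:.0f}%")
```

Shuffling the strong predictor destroys far more of the model's accuracy than shuffling the weak one, which is exactly how the importance plot ranks MoSHBODS well ahead of the experience variables.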
These measures provide an identical ordering of the variables, with the MoSHBODS Expected Margin by far the most important followed by the shared experience variables (own before opponent's) and then the individual experience variables (also own before opponent's).
Now, to be clear, MoSHBODS' Expected Margins are based on teams' offensive and defensive ratings, which themselves incorporate experience to the extent that it has contributed to previous performances, because those performances are what are used to update ratings. Nonetheless, it seems that on-the-day individual and shared experience data is of distinctly secondary importance compared to ratings when predicting margins. Put another way, teams' experience levels don't change enough from week to week for experience to be highly predictive or, alternatively, any such changes provide no great predictive value.
That's not to say that the absence of a key player has no effect on game margins, just that any such effect is not captured by the change in experience level, if any, when he is missing. The impact of individual players is something we'll explicitly consider in a future post.
One other interesting output that we can produce for the random forest is what is called a partial dependence plot, which shows the dependence between the model's predictions and each input variable, marginalising over the values of all other input variables. In essence, it shows how the output variable (for us, game margin) responds to changes in one of the input variables, controlling for all other variables.
The package we use, plotmo, calculates these partial effects by setting all but the variable of interest at their average values.
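That "sweep one variable, pin the rest" calculation can be sketched directly. Again this is an illustrative Python version with synthetic data, not the plotmo output itself:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n = 1500
X = rng.normal(0, 15, (n, 3))        # synthetic inputs
y = X[:, 0] + rng.normal(0, 10, n)   # response driven by the first input

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# plotmo-style partial effect: sweep the first input over a grid while
# the other inputs are pinned at their average values
grid = np.linspace(-20, 20, 9)
background = X.mean(axis=0)
probe = np.tile(background, (len(grid), 1))
probe[:, 0] = grid
effect = rf.predict(probe)
for g, e in zip(grid, effect):
    print(f"x0={g:6.1f} -> predicted y = {e:6.1f}")
```

For a well-calibrated input like MoSHBODS' Expected Margin, this sweep traces out the roughly one-to-one line described below.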
This chart allows us to assess the size and sign of the relationships between our input variables and predicted margins.
We can see for example, in the first panel, that a one point increase in MoSHBODS' Expected Margin translates roughly at a one-to-one rate into an increase in the model's predicted game margin, at least in the input range from about -20 to +20, which represents a large proportion of the observed range as indicated by the rug plot at the bottom of that chart. This one-to-one effect size is as we would hope it to be if MoSHBODS is well-calibrated.
The second panel suggests that the relationship between a team's average individual experience and predicted margin is broadly linear though the effect size is quite small. Moving from an average of 50 games per player to 100 games per player changes the predicted margin by only about a couple of points.
In contrast, the third panel suggests that the relationship between a team's average pairwise shared experience and predicted margin is only linear over a portion of the input range - up to about an average of 50 shared games per pair. In that range, the effect size is roughly 3 points of increased predicted margin for every 4 games of additional average shared experience. Beyond that point, there is no significant increase in expected margin as shared experience increases.
The remaining panels can be interpreted similarly, and suggest that the relationships for opponent experience levels are broadly mirror-images of those for own experience levels.
So, how accurate is this model in comparison to using MoSHBODS raw forecasts (or MoSHBODS with the 2-point away team bias correction that we've employed when using it during the season)?
The random forest model has proven to be only occasionally superior to MoSHBODS-based forecasts, finishing 1st in only 43 seasons including just 9 since 1980. Overall, it has performed worse than both MoSHBODS-based methods.
SUMMARY AND CONCLUSION
We've found that experience, formulated as average experience per player or average shared experience per pair, offers no additional predictive power over what we obtain from using MoSHBODS forecasts alone.
There are a number of interpretations we might place on this finding, some broad, some narrow, including that:
- A different experience metric is required (for example, I've had it suggested that experience in winning teams is different from experience in losing teams)
- A different model choice other than Random Forest might reveal the predictive value of experience (for example, we might try an ensemble model)
- The relationship between experience and margins hasn't been stable over time so we should incorporate a time variable in the model or fit a model only to the most recent era
- Similar to the first point, experience isn't well-encapsulated by raw game counts and one game played by Player X isn't equivalent to one game played by Player Y
- Elo ratings already encapsulate the key predictive elements of experience such that week-to-week changes in experience levels provide no additional "signal"
All of these are interesting hypotheses and deserve separate investigation in subsequent analyses and blog posts.
(BTW: You can read more about shared experience in the V/AFL in the book Footballistics in a chapter bearing that name written by Matt Cowgill and James Coventry.)