Most sporting codes with a history of any significant length will eventually be described in terms of having passed through a number of eras, one or both ends of which are usually defined by some relatively obvious characteristic that forms the basis of the discussion.
Sometimes, eras will be defined by a rule change that dramatically affects the nature of the game and the strategies that are effective in playing it - think the "6 tackle rule" in Rugby League, the "Designated Hitter rule" in Baseball, or the "Shot Clock" in Basketball. Other times, eras will be defined by an outstanding player or team of players who were so dominant or influential that they similarly changed the way the game was played.
Another way to define eras is purely quantitatively, which is the approach I'll be adopting for this blog, in which I'll be using scoring data for every game played from Round 1 of 1897 to Round 14 of 2014.
In timeseries literature, changes in the underlying statistical process imagined to be generating the observed data are referred to as "changepoints" and techniques have been devised to identify these points given certain assumptions about the nature of that underlying generating statistical process.
I'll be using the R package changepoint for the current analysis. Its companion Journal of Statistical Software (JSS) paper does an excellent job of defining the notion of changepoints, providing historical context for previously proposed approaches to the problem of finding them, and describing the proper use and interpretation of the outputs produced by the R package.
Broadly, the idea of changepoint detection is to split the timeseries into a number of segments such that the underlying process generating the data appears to be roughly constant within each segment and different between adjacent segments. Loosely speaking, when we break a timeseries up into these segments, we don't want "too many" and we don't want any of them to be "too short", but we want them all to be "acceptably different". Practically, we operationalise the notion of "too many" by imposing a penalty on the function we're optimising whenever we postulate a new segment, and we operationalise the notions of "too short" and "acceptably different" by visually inspecting proposed solutions.
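To make the role of the penalty concrete, here is a minimal base-R sketch of a penalised search for a single changepoint. It is not the changepoint package's code and it is not the PELT algorithm; it simply assumes a Normal cost (twice the negative log-likelihood at the segment's own mean and variance) and accepts a split only if the improvement in fit exceeds the penalty. The function names and the simulated data are hypothetical.

```r
# Twice the negative Normal log-likelihood of a segment,
# evaluated at that segment's own (maximum likelihood) mean and variance
segment_cost <- function(y) {
  n <- length(y)
  sigma_sq <- max(mean((y - mean(y))^2), 1e-10)  # guard against log(0)
  n * (log(2 * pi) + log(sigma_sq) + 1)
}

# Try every admissible split point and keep the one with the lowest total cost;
# the split is accepted only if it improves the fit by more than the penalty
best_single_split <- function(y, penalty, min_len = 2) {
  n <- length(y)
  candidates <- min_len:(n - min_len)
  split_costs <- sapply(candidates, function(tau) {
    segment_cost(y[1:tau]) + segment_cost(y[(tau + 1):n])
  })
  best <- which.min(split_costs)
  gain <- segment_cost(y) - split_costs[best]
  list(tau = candidates[best], gain = gain, accepted = gain > penalty)
}

# Simulated series (NOT real football data) with a clear shift after observation 30
set.seed(1)
y <- c(rnorm(30, mean = 120, sd = 5), rnorm(30, mean = 180, sd = 5))
result <- best_single_split(y, penalty = 20.5)
```

Raising the penalty makes the extra segment harder to justify, which is the sense in which the penalty controls "too many".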
As the JSS paper notes with admirable honesty:
"The choice of appropriate penalty is still an open question and typically depends on many factors including the size of the changes and the length of segments, both of which are unknown prior to analysis ... In current practice, the choice of penalty is often assessed by plotting the data and changepoints to see if they seem reasonable."
Sometimes it really is the case that the best solution can be arrived at by a combination of mathematical technique and commonsense interpretation of the technique's outputs under various, reasonable assumptions.
CHANGEPOINTS IN TOTAL SCORING
Let's first attempt to define the changepoints in VFL/AFL football based on the average total points scored per game during a season. Further, let's assume that the underlying process generating a season's average score per game is Normally distributed and that changes in either or both of the mean and the variance of the underlying distribution define each era (i.e. the segment between changepoints). The assumption of Normality and that eras are defined by changes in the mean and variance will be carried through into all analyses in this blog.
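As a rough illustration of how the season-level series might be built from game-level results, here is a base-R sketch. Only the names summary_by_season and Total_Average_Score come from this post; the games data frame, its column names and its scores are made up for the example.

```r
# Hypothetical game-level data: one row per game (made-up scores)
games <- data.frame(
  Season     = c(1897, 1897, 1897, 1898, 1898),
  Home_Score = c(45, 38, 52, 60, 41),
  Away_Score = c(30, 41, 27, 33, 58)
)
games$Total_Score <- games$Home_Score + games$Away_Score

# Average total points per game within each season, every game weighted equally
summary_by_season <- aggregate(Total_Score ~ Season, data = games, FUN = mean)
names(summary_by_season)[2] <- "Total_Average_Score"

# Annual timeseries starting at the first season, ready for changepoint analysis
total_score_series <- ts(summary_by_season$Total_Average_Score,
                         start = min(summary_by_season$Season))
```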
By trialling different penalty values I eventually landed on the following specification:
total_score_model <- cpt.meanvar(ts(summary_by_season$Total_Average_Score, start = 1897), penalty = "Manual", pen.value = 20.5, method = "PELT", test.stat = "Normal", class = TRUE, param.estimates = TRUE)
The JSS paper will allow the interested reader to decipher the precise meaning of this function call, but here I'll just show you what it produces.
The model suggests that there have been nine eras in terms of Season Average Total Scoring:
The red lines in the chart define each era and mark the average Season Average Total Score for that era, weighting each season equally.
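For readers who want to pull the era boundaries and era averages out of a fitted model rather than read them off a chart, the sketch below uses the changepoint package's cpts() and param.est() accessors. The data are simulated stand-ins, not the real VFL/AFL series, and the variable names are hypothetical.

```r
library(changepoint)

# Simulated stand-in for the season averages (NOT the real VFL/AFL data)
set.seed(42)
simulated_scores <- ts(c(rnorm(40, mean = 130, sd = 8),
                         rnorm(40, mean = 190, sd = 8)),
                       start = 1897)

model <- cpt.meanvar(simulated_scores, penalty = "Manual", pen.value = 20.5,
                     method = "PELT", test.stat = "Normal",
                     class = TRUE, param.estimates = TRUE)

# cpts() gives the position of the last observation in every segment but the final one
last_index <- cpts(model)

# Convert positions in the series to calendar seasons
last_season_of_era <- start(simulated_scores)[1] + last_index - 1

# Estimated mean and variance within each segment
era_means <- param.est(model)$mean
era_variances <- param.est(model)$variance
```

Calling plot(model) on the fitted object then draws the series with the segment means overlaid.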
Reducing the penalty value to allow the algorithm to find additional (or even different) eras results in some very short eras, some of which don't look all that different from their neighbouring eras.
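One way to see this trade-off is to refit the model over a grid of penalty values and record how many eras each produces and how short the shortest one is. The sketch below does that on simulated data (not the real series); the penalty grid and variable names are arbitrary choices for illustration.

```r
library(changepoint)

# Simulated stand-in series (NOT the real VFL/AFL data)
set.seed(7)
simulated_scores <- ts(c(rnorm(40, mean = 130, sd = 8),
                         rnorm(40, mean = 190, sd = 8)),
                       start = 1897)
n <- length(simulated_scores)

# Arbitrary grid of manual penalty values to compare
penalties <- c(5, 10, 15, 20.5, 30)

penalty_sweep <- do.call(rbind, lapply(penalties, function(pen) {
  fit <- cpt.meanvar(simulated_scores, penalty = "Manual", pen.value = pen,
                     method = "PELT", test.stat = "Normal", class = TRUE)
  # Segment lengths implied by the changepoint positions
  era_lengths <- diff(c(0, cpts(fit), n))
  data.frame(penalty = pen,
             eras = length(era_lengths),
             shortest_era = min(era_lengths))
}))
```

Recent versions of the changepoint package also accept a minseglen argument in cpt.meanvar that rules out very short segments directly, though nothing in this post suggests it was used here.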
So, nine eras of Total Scoring it is. It's interesting that four of them come before the Second World War, and that three of the four post-WWII eras have each lasted 15 to 20 years.
CHANGEPOINTS IN WINNERS' AND LOSERS' SCORING
Another obvious way to define eras based on team scoring is to look at the trends in the average scores of winning and of losing teams within each season.
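A minimal base-R sketch of how winning and losing score averages might be derived from game-level results follows. The data frame, its column names and its scores are hypothetical, and treating a drawn game as contributing the same score to both averages is an assumption.

```r
# Hypothetical game-level data (made-up scores)
games <- data.frame(
  Season     = c(1897, 1897, 1898, 1898),
  Home_Score = c(45, 38, 60, 41),
  Away_Score = c(30, 41, 33, 58)
)

# The higher score in each game is the winner's, the lower is the loser's
games$Winning_Score <- pmax(games$Home_Score, games$Away_Score)
games$Losing_Score  <- pmin(games$Home_Score, games$Away_Score)

# Season averages of each, every game weighted equally
winners_losers_by_season <- aggregate(cbind(Winning_Score, Losing_Score) ~ Season,
                                      data = games, FUN = mean)
```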
Using the same penalty value as above (20.5) for both the winning and losing score series yields the following solutions.
For Winning Scores, the eras are:
These are very similar to the eras we found for Total Scores above.
For Losing Scores, the eras are:
These are similar to, but a little different from, the eras we found for Total Scores above. Most notably, there's no separate era from the late 1960s to the mid 1970s.
CHANGEPOINTS IN VICTORY MARGINS
Eras might also be defined based on the ease with which teams have won in a given season - that is, the average difference between the winning and losing scores.
In this case a slightly smaller penalty value, 12.5, produces an acceptable (to me) solution that includes just six eras:
With a lower penalty, the changepoint detection algorithm is keen to use its extra freedom to create a 2010-2011 era, and a 2013-2014 era, reflecting the very different average winning margins in those pairs of seasons. Exactly how best to partition the period from the late 1970s to modern times will only be fully determined once we have a longer timeseries.
CHANGEPOINTS IN SCORING CONVERSION RATES
As a final analysis for this blog I'll define eras based on teams' conversion rates - that is, goals as a proportion of all scoring shots (goals plus behinds).
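In code the conversion rate is a one-line calculation. The sketch below applies it to made-up season totals; pooling all goals and behinds within a season, rather than averaging per-team or per-game rates, is an assumption, as are the names used.

```r
# Conversion rate: goals as a proportion of goals plus behinds
conversion_rate <- function(goals, behinds) {
  goals / (goals + behinds)
}

# Hypothetical season totals (made-up numbers)
season_totals <- data.frame(
  Season  = c(1897, 1898),
  Goals   = c(400, 450),
  Behinds = c(600, 550)
)
season_totals$Conversion_Rate <- conversion_rate(season_totals$Goals,
                                                 season_totals$Behinds)
```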
To derive the solution for this metric I've used a penalty value of 15, which yields the chart shown here.
Five eras are postulated, the first of which, though very short, persists even if we lower the penalty value substantially (which has the undesirable effect of introducing a range of additional, very short eras elsewhere in the timeline):
Apart from that very short initial era, the other four are quite long, especially the third (44 seasons) and the fifth (34 seasons, including the current season).
It's often important to define your terms and in this post we've found that we can legitimately claim that, based on various ways of analysing scoring data alone, there have been as few as five or as many as nine eras in VFL/AFL football.
Of course, we could split the history of football into different sets of eras again based on other metrics such as the number of goals scored per game, the number of scoring shots, and so on, which is something I might look at in a future blog.