I've been talking here on MoS and elsewhere about doing some analysis of player data for quite some time but, until now, have lacked a key ingredient for that analysis (viz, the data). That's just changed.
Over on GitHub, Twitter users @anoafl and @plusSixOneblog are working on an R package that provides player and other historical V/AFL data in a ready-to-analyse format. You can get the current development version using the following R commands:
It's a fantastic resource, and builds on the amazing work done over on the afltables.com website. Keep an eye out on CRAN for the final package release.
So much data; so many questions ... what should we start with? How about team-mates?
TEAM-MATES: NUMBER AND FAMILIARITY
Here's an interesting question: which V/AFL player has taken the field with the largest number of team-mates when we aggregate across their entire career?
(Note that, for all of the analyses in this blog, I've included only players whose most recent game was played in or before the 2016 season.)
The answer is Wels Eicke, who played with 299 different team-mates during a 218-game career that spanned two clubs over an 18-year period. That would be one hell of a reunion.
He's one of only eight players to have had 250 or more team-mates, a surprising proportion of whom have done so despite playing for only a single club.
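For anyone wanting to try this sort of count themselves, here is a minimal Python sketch (not the R code used for the analysis in this post). It assumes the data has already been reduced to one set of player names per team per game; the toy `lineups` list and the function names are purely illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each set is one team's line-up for one game.
lineups = [
    {"A", "B", "C"},
    {"A", "B", "D"},
    {"A", "E", "F"},
    {"B", "C", "D"},
]

def teammate_sets(lineups):
    """Map each player to the set of distinct players he has shared a team with."""
    mates = defaultdict(set)
    for team in lineups:
        for player in team:
            mates[player] |= team - {player}
    return dict(mates)

def games_played(lineups):
    """Count the games in which each player was named."""
    return Counter(player for team in lineups for player in team)

mates = teammate_sets(lineups)
games = games_played(lineups)

# Career team-mate count and team-mates per game for every player.
summary = {p: (len(mates[p]), len(mates[p]) / games[p]) for p in mates}
most_teammates = max(summary, key=lambda p: summary[p][0])
print(most_teammates, summary[most_teammates])
```

The same `summary` dictionary also gives team-mates per game played, which is the ratio used for the list-stability comparison further on.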
Many on the list are from some time back, with Tony Lockett the only representative from the current century, and only nine of the 25 having careers that extended beyond 1980.
Obviously, players with longer careers have a greater opportunity to rack up team-mates, but there's an extraordinarily large range of team-mate counts for virtually all career lengths.
As two extreme examples, consider Jimmy Bartel and Richard Osborne, both of whom are labelled on the chart above. Jimmy had only 109 team-mates across a 305-game career, while Richard had 267 across a 283-game career.
There's an interesting piece to be done, I think, on the link between a club's list stability and its on-field success.
In that vein, at right is a table showing the players who have had the fewest team-mates per game played.
Michael Tuck heads this list, having played 426 games across 20 seasons for Hawthorn and seen only 144 team-mates across that period, or about 7 per season.
Thereafter come Corey Enright and Jimmy Bartel, whose careers at Geelong overlap for the 2002 to 2016 seasons.
No-one in this Top 25 list had a career that ended prior to 1984, and 16 of them had careers ending in 2010 or later. Lists, it would seem, have been much more stable in the modern era.
We can also see this relative stability if we compare how many team-mates players with similar career lengths have had across the various eras.
What we find is that, for careers of 150 games or more, players in the modern era (i.e. those whose average is described by the red line in the chart above) have tended to have fewer team-mates during those careers.
We also see that the relationship between incremental team-mates and incremental games becomes roughly linear in the modern era once a player's career has hit the 100-game mark. Thereafter he can expect to see one new team-mate about every four games.
If we aggregate across career lengths, we can investigate the marginal distribution of team-mate counts across eras (where we place each player in an era based on the season in which he played his most-recent game).
We see that players in the modern era tend to have had slightly more team-mates than in earlier eras, with a median value of 75 and a mean of 79.3. The comparative figures for the 1980 to 1999 era, which has the next-highest figures, are 69 and 78.7.
Career lengths, however, differ substantially across these two eras, the average for the 2000 to 2016 era being 90.7 games (and the median 58), and for the 1980 to 1999 era being only 67.8 games (and the median 35).
Players in the modern era also tend to have more direct experience with their team-mates, with about 50% of modern players having averaged 17 games or more with each team-mate they've played with.
So far we've looked at the direct connections between a player and his team-mates. We might also be interested in the second-order connections between a single player's team-mates. We might want to know, loosely speaking, what proportion of those team-mates have also played in the same team together, with or without the player in question. The local clustering metric in social network analysis measures exactly this by counting the number of connections between a player's team-mates as a proportion of the number of possible connections.
Consider, for example, a player who has only four career team-mates, A, B, C and D, each of whom he's played with at different times. Imagine that he's played in one team with A and B, and in another with C and D, and that none of the four have otherwise played together. The clustering co-efficient here would be 2/6 = 0.33, because there are two direct connections between team-mates (A with B, and C with D) out of a possible six (A with B, A with C, A with D, B with C, B with D, and C with D).
We can think of the clustering co-efficient as a measure of the familiarity amongst a player's team-mates.
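The four team-mate example can be checked with a short Python sketch. The helper names and the toy line-ups are hypothetical, and this is a bare-bones version of the local clustering calculation rather than the code behind the charts in this post.

```python
from collections import defaultdict
from itertools import combinations

def build_neighbours(lineups):
    """Map each player to the set of players he has shared a team with."""
    neighbours = defaultdict(set)
    for team in lineups:
        for a, b in combinations(team, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours

def local_clustering(player, neighbours):
    """Connections among a player's team-mates as a share of possible connections."""
    mates = neighbours[player]
    possible = len(mates) * (len(mates) - 1) // 2
    if possible == 0:
        return 0.0  # assumption: treat fewer than two team-mates as zero clustering
    actual = sum(1 for a, b in combinations(mates, 2) if b in neighbours[a])
    return actual / possible

# Player P plays one game with A and B, and another with C and D.
lineups = [{"P", "A", "B"}, {"P", "C", "D"}]
neighbours = build_neighbours(lineups)
coefficient = local_clustering("P", neighbours)
print(round(coefficient, 2))  # 0.33
```

Only the A-B and C-D pairs are connected, so the result is 2 of a possible 6.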
If we create a scatter-plot of this measure against team-mate count, and analyse it by era, we find that modern-day players, for any given number of team-mates, tend to have more team-mates who have played together at some stage in their own careers.
All of which underscores the point that modern-day lists do seem to have been more stable.
The data that we create to perform the team-mate analyses above can also be viewed more overtly through a social network lens where the players are the 'nodes', the 'edges' represent two nodes having played a game in the same team, and the 'weights' of those edges are a count of how often they've played together.
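As a rough illustration of that network view, the Python sketch below turns a handful of made-up line-ups into weighted edges. The data and the function names are assumptions for the example only; the real network is built from the full V/AFL player history.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set is one team's line-up for one game.
lineups = [
    {"A", "B", "C"},
    {"A", "B", "D"},
    {"B", "C", "D"},
]

def edge_weights(lineups):
    """Count how many games each pair of players has played together."""
    weights = Counter()
    for team in lineups:
        # Sorting gives every pair a single canonical (a, b) key.
        for a, b in combinations(sorted(team), 2):
            weights[(a, b)] += 1
    return weights

def mean_games_per_teammate(player, weights):
    """Average edge weight over all of a player's team-mates."""
    shared = [n for pair, n in weights.items() if player in pair]
    return sum(shared) / len(shared)

weights = edge_weights(lineups)
nodes = {p for pair in weights for p in pair}
print(len(nodes), "nodes,", len(weights), "edges")
print(weights[("A", "B")], "games together for A and B")
```

The mean edge weight for a player is the "average games with each team-mate" figure used in the era comparison earlier.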
Viewed in this way, our player data defines a network with 12,519 nodes and 388,761 edges. That network has an average path length of 5.02, which means that traversing from any player in V/AFL history to any other player solely via team-mates takes, on average, about 5 'hops'. We can think of this as the V/AFL's average Kevin Bacon number if we treat each player, in turn, as Kevin Bacon and calculate the distance from that player to all other players. Equivalently, we can consider it to be the V/AFL network's average Erdős number. (Alas, there is no Erdős-Bacon equivalent.)
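To make the idea of an average path length concrete, here is a small Python sketch using a breadth-first search on a four-player toy network. The graph is invented, edge weights are ignored (every shared game counts as one hop), and pairs that cannot be linked at all are simply left out of the average, which is an assumption of this sketch rather than a statement about how the 5.02 figure was calculated.

```python
from collections import deque

# Hypothetical toy network: A played with B, B with C, C with D.
graph = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}

def bfs_distances(graph, source):
    """Number of hops from source to every player reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def average_path_length(graph):
    """Mean hop count over all ordered pairs of distinct, connected players."""
    total = 0
    pairs = 0
    for source in graph:
        for target, hops in bfs_distances(graph, source).items():
            if target != source:
                total += hops
                pairs += 1
    return total / pairs

print(round(average_path_length(graph), 2))  # 1.67
```

Running one search per player is fine for a toy example; for a network with thousands of players a dedicated graph library would be the more practical choice.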
We might ask then, which player is, on average, closest (or, to use slightly more standard social network analysis terminology, 'most connected') to every other player in V/AFL history?
The answer is Ted Whitten, who we can link to every other player in V/AFL history by, on average, 3.64 steps. He's just ahead of six players whose careers all ended sometime in the 1950s and who can be linked to every other player in history by, on average, between 3.70 and 3.78 steps.
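The per-player version of the calculation can be sketched in Python as follows. The five-player network and the function names are hypothetical, and the sketch assumes every player can be reached from every other.

```python
from collections import deque

# Hypothetical toy network: C is the hub, A hangs off B.
graph = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"C"},
    "E": {"C"},
}

def mean_distance(graph, source):
    """Average number of hops from source to every other player."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    others = [hops for player, hops in dist.items() if player != source]
    return sum(others) / len(others)

averages = {player: mean_distance(graph, player) for player in graph}
closest = min(averages, key=averages.get)
furthest = max(averages, key=averages.get)
print(closest, averages[closest])    # C 1.25
print(furthest, averages[furthest])  # A 2.25
```

Sorting `averages` from smallest to largest gives a 'most connected' ranking, and from largest to smallest a 'least connected' one.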
It's notable that no-one on this list finished his career before 1944 or after 1974.
Curiously, at least to me, only three of the players on the list have played with more than a single club, and only one of them, Jack O'Halloran, with as many as three.
Being well-connected in a V/AFL sense does not, it turns out, require playing stints at a wide range of clubs.
Equally, we could ask which player is furthest, in the network sense, from every other player.
The answer to that question is Tommy Kinman, who played a single game for Carlton in 1898, in which he suffered a 4.5 (29) to 3.12 (30) loss. To get from Tommy to every other player in history takes, on average, 7.02 steps.
Apart from Harry Gyles (equal 4th) and Harry Morgan (equal 20th, despite playing for two clubs), every player on this list played his final game in 1897 or 1898. What partly counts against players from these seasons is the smaller number of players (viz, 20) named for each game in that period.
If we broaden our view to look not just at the extremes of connectedness but instead at the connectedness of every player, categorising them by the last season in which they played a game, we find that players from about the mid-1940s to the late 1950s are, on average, most connected.
And that, for now, is all.
I feel as though there are a lot more interesting and useful analyses to do with this data, alone and in combination with other data, and I plan to undertake some of these, progressively, during the season. Let me know if there's a particular analysis that interests you.