Many AFL fans, I reckon, would have a reasonably accurate internal model of what a good, average or poor crowd might be for a given contest. My goal in this blog is to see if we can build a statistical model to encapsulate that intuition, and see how well it might 'predict' the attendance figure for any game from the past 16 seasons based on certain of that game's features (in a somewhat similar manner to this post from 2012).
For this modelling exercise, we'll be using data for the period 2000 to 2015, focussing solely on games from the home-and-away portion of those seasons.
We'll source attendance figures from the afltables site and include, as regressors, variables describing:
- The Home Team
- The Away Team
- Whether or not the Home Team was the pre-game Favourite (as determined by MoSSBODS Team Ratings)
- The Strength of Favouritism for the team preferred by MoSSBODS (measured in terms of Expected Scoring Shots superiority)
- The Venue at which the game was played
- The Time of Day at which the game commenced
- The Day of the Week on which the game took place
- The Month of the Year in which the game took place
- The Year in which the game took place
Before we look at the fitted model, let's explore the relationship between Attendance and each of the proposed regressors, in turn, starting with the Season in which the game took place.
In this first table we look at Average Attendance at games within each of the 16 home-and-away seasons and find that the all-season average attendance is just over 34,000 per game, with higher averages in the 2005 to 2011 period (and especially in the 2007 to 2010 period), and notably lower averages in the 2012 to 2015 period.
The variability of crowds, as described by the standard deviation, peaked during the 2011 to 2013 period, possibly reflecting the introduction of the Gold Coast and GWS teams.
From a modelling viewpoint, the data suggests that splitting the 2000 to 2015 history into three periods - 2000 to 2004, 2005 to 2011, and 2012 to 2015 - makes some empirical sense.
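As a minimal sketch of that three-way split, a helper like the one below could map each season to its period. The function name `season_era` and the period labels are my own for illustration, not anything taken from the fitted model.

```python
def season_era(year: int) -> str:
    """Map a season to one of the three periods suggested by the season table.

    The label strings are illustrative; only the cut-points (2004/2005 and
    2011/2012) come from the text.
    """
    if not 2000 <= year <= 2015:
        raise ValueError(f"season {year} is outside the 2000 to 2015 sample")
    if year <= 2004:
        return "2000-2004"
    if year <= 2011:
        return "2005-2011"
    return "2012-2015"


print([season_era(y) for y in (2000, 2004, 2005, 2011, 2012, 2015)])
```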
Next, let's look at average attendances for each of the 18 teams when they are the (notional) home team.
This table is sorted on the basis of average "excess" attendance relative to the all-team average, and sees Collingwood perched atop the list having, on average, generated crowds when playing as the home team of a bit over 52,000.
They are one of seven teams that can claim to have generated above average home crowds, the six other teams including four more from Victoria (Essendon, Richmond, Carlton and Hawthorn), and just two non-Victorian teams (Adelaide and West Coast).
Three non-Victorian teams sit amongst the bottom four places (GWS, Gold Coast and the Brisbane Lions), interrupted only by the Kangaroos who've seen crowds of fewer than 26,000 at their typical home game.
Hawthorn have seen the most variable home crowds, though Carlton, Collingwood, Richmond and Melbourne have also been associated with above-average variability of home game crowd sizes.
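For anyone wanting to reproduce the "excess" figures used to sort the home-team table, here's a rough sketch. It assumes the benchmark is the simple average across all games (the table might instead use an average of team averages), and the attendance numbers in the example are made up purely for illustration.

```python
from collections import defaultdict
from statistics import mean


def home_excess(games):
    """Return {team: mean home attendance minus the all-game mean}.

    `games` is an iterable of (home_team, attendance) pairs.
    Assumption: the benchmark is the mean over all games in the sample.
    """
    games = list(games)
    overall = mean(att for _, att in games)
    by_team = defaultdict(list)
    for team, att in games:
        by_team[team].append(att)
    return {team: mean(atts) - overall for team, atts in by_team.items()}


# Made-up figures for two hypothetical teams, not real attendances.
toy = [("A", 50_000), ("A", 54_000), ("B", 20_000), ("B", 28_000)]
excess = home_excess(toy)

# Sort from largest to smallest excess, as the table does.
for team, diff in sorted(excess.items(), key=lambda kv: kv[1], reverse=True):
    print(team, round(diff))
```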
Looking at the teams when they are (notional) Away teams instead, we find that Collingwood, again, is associated with the largest average crowds (just over 50,000 per game), and that Essendon is once again second to them.
In fact, of the six teams with above-average drawing power as Away teams, five of them are the Top 5 teams from the list of teams with above-average drawing power as Home teams (Collingwood, Essendon, Carlton, Hawthorn and Richmond). Geelong is the only team that climbs into the Top 6 as an Away team that wasn't also towards the top of the list as a Home team.
Those six teams are the only ones with above-average typical attendances when playing as the Away team, and they are all based in Victoria.
In contrast, the eight teams with the smallest average attendances when playing as the Away team are all non-Victorian.
The greatest variability in attendance as an Away team, however, is associated with five of the Victorian teams - Collingwood, Essendon, Carlton, Geelong and Richmond.
When we turn next to Venue we find a considerable range of average attendances, even for grounds that have been used quite a number of times during the period.
Perhaps not surprisingly, the MCG has attracted the largest crowds, the average for that venue coming in at a tick under 47,000 per game. The Adelaide Oval and Stadium Australia, albeit on a much smaller base of games, come in at 2nd and 3rd based on average attendance.
Four more non-Victorian venues (Subiaco, Football Park, the Gabba, and the SCG) fill slots in the next five places, all of them with average attendances in excess of 25,000 per game.
Non-Victorian grounds also sit at the foot of the table, however, filling all of the bottom eight positions, each with an average below 15,000 per game.
Variability in attendance figures has been greatest by far at the MCG, where the standard deviation of almost 19,000 is over 50% larger than the next-highest figure of almost 12,000, recorded for Stadium Australia.
Next, let's consider the crowd-boosting effects of home team favouritism (again determined by MoSSBODS Ratings and not any bookmaker).
We find a small, though highly statistically significant, jump in attendance when the home team is the favourite according to MoSSBODS, equivalent to a little over 300 people per game.
One aspect of this table that's interesting to note is just how common it is for the home team to be favourite. It's the case just over 60% of the time, which is well above what we'd expect due to chance.
Okay, so what about the effects of month, day of week, and time?
Firstly, here's a table of average attendances by month.
It reveals that early-season games tend to attract the largest crowds, with games in March dragging an additional 3,000 fans through the gate, and those in April - which would, of course, include ANZAC Day clashes - attracting about half that many extra behinds witnessing goals (assuming a ratio of one behind per fan).
The period May through August then sees very similar average attendance figures, all in the 33,300 to 33,750 range.
Those seasons where there have been Home and Away games in September have produced crowds a bit over 2,500 below average, a reflection perhaps of the relative unimportance of those games in the context of the relevant season's Finals.
The greatest variability in crowds has been seen across games played in April, the least for games played in September, though games in August and June have produced attendances that are almost equally variable.
From a Day of Week perspective, days other than the weekend have produced the largest average crowds, no doubt bolstered by the occasional "blockbuster" nature of these contests.
The largest average crowds have come on Tuesdays and Wednesdays, though the paucity of games on those days makes this average subject to extremely large sample variation.
Amongst those days on which more than 50 games have been played - Friday, Saturday and Sunday - Friday has the highest average at just over 44,000 fans per game, while Sunday has the lowest at just under 31,000 fans per game. Sunday crowds have been the least variable, and Saturday the most variable, though only slightly more so than Friday crowds.
Games starting at 7:30pm or later have attracted significantly larger crowds (a bit over 41,000 per game), while those starting before 4:30pm have attracted the smallest crowds (just over 32,000).
The greatest variability in attendance, however, has also been associated with games starting at 7:30pm or later, while the smallest variability has been associated with games starting in the 4:30pm to 7:30pm window.
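The three start-time windows used here could be coded along the following lines. This is only a sketch: I'm assuming a game starting at exactly 4:30pm belongs to the middle window, and the function name `start_window` is hypothetical.

```python
from datetime import time


def start_window(start: time) -> str:
    """Bucket a local start time into the three windows used in the text.

    Assumption: the middle window is 4:30pm inclusive to 7:30pm exclusive,
    consistent with "before 4:30pm" and "7:30pm or later".
    """
    if start < time(16, 30):
        return "before 4:30pm"
    if start < time(19, 30):
        return "4:30pm to 7:30pm"
    return "7:30pm or later"


for t in (time(13, 10), time(16, 40), time(19, 50)):
    print(t.strftime("%H:%M"), "->", start_window(t))
```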
Finally, we might ask what effect the expected competitiveness of the contest had on attendances, which here we'll measure using MoSSBODS' expected victory margin for the favourite, measured in Scoring Shots.
What we find is a clear relationship between attendances and the strength of the favourite, with games expected to be close attracting an additional 1,400 or so fans above average, and with games expected to be blowouts seeing crowds, on average, almost 4,500 below average.
Variability of attendance is highest in games where the favourite is expected to win by 9 Scoring Shots or fewer, and is lowest in games where the favourite is expected to win by 12 Scoring Shots or more.
So, what happens if we construct a model - here an OLS regression - where we attempt to fit actual attendance as a function of the variables just explored?
We end up with a model that explains almost three-quarters of the variability in actual attendances, and that is:
- within 800 of the actual attendance 10% of the time
- within 4,600 of the actual attendance 50% of the time
- within 9,120 of the actual attendance 80% of the time
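Accuracy figures of the kind listed above come from looking at the distribution of absolute errors. A minimal sketch of that calculation follows, with made-up actual and fitted values standing in for the real data; `share_within` is a name I've chosen for illustration.

```python
def share_within(actual, fitted, threshold):
    """Proportion of games whose absolute error is no more than `threshold`."""
    actual, fitted = list(actual), list(fitted)
    if len(actual) != len(fitted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(a - f) for a, f in zip(actual, fitted)]
    return sum(e <= threshold for e in errors) / len(errors)


# Made-up numbers for four hypothetical games (absolute errors of
# 500, 4,000, 8,000 and 11,000), not output from the actual model.
actual = [30_000, 45_000, 22_000, 61_000]
fitted = [30_500, 41_000, 30_000, 50_000]

for limit in (800, 4_600, 9_120):
    print(limit, share_within(actual, fitted, limit))
```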
That model is summarised at right. Using its coefficients requires an understanding of its construction.
Firstly, it's important to recognise the model's "reference" levels, which mean that the coefficients are estimates relative to a game between Hawthorn and West Coast, played sometime in the 2000 to 2004 period in March at the MCG on a Friday, starting before 4:30pm and where the Eagles start as very narrow favourites.
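To make the idea of a reference level concrete, here is a small sketch of the usual dummy (treatment) coding for a categorical regressor. I'm assuming the model was built this way, which is standard for OLS with categorical variables; the helper `treatment_code` is hypothetical.

```python
def treatment_code(value, levels, reference):
    """Return 0/1 indicators for every level except the reference level.

    A game at the reference level gets all zeros, so its effect is absorbed
    into the intercept and every coefficient is read relative to it.
    """
    if reference not in levels:
        raise ValueError(f"unknown reference level: {reference!r}")
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {level: int(value == level) for level in levels if level != reference}


days = ["Friday", "Saturday", "Sunday"]
print(treatment_code("Saturday", days, reference="Friday"))
print(treatment_code("Friday", days, reference="Friday"))
```

A Friday game produces zeros for every indicator, which is why the Saturday and Sunday coefficients are interpreted as differences from a Friday game.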
Looking at each of the coefficient blocks in turn (which means, implicitly, assuming that we hold all other variables at the reference levels just described), we see that, from the first block, Carlton, Collingwood, Essendon, Geelong and Richmond might be expected to draw a larger crowd than Hawthorn when playing at home (which, Geelong aside, we saw in the earlier table).
The second block reveals that the Lions fans are likely to attend in greater number when the Lions are favourites than are fans of other clubs, and that the Roos' and Cats' fans are least likely.
Next, the third block tells us that a number of teams are better drawcards than the Eagles as Away teams, most notably Collingwood, Essendon, Carlton, Richmond and Geelong. GWS, Fremantle and Port Adelaide are significantly poorer drawcards than West Coast.
Block four talks to Venue (relative to the MCG) and shows that only Subiaco and Carrara can be expected to draw larger crowds for a given contest. Most other venues are expected to draw crowds of between 20,000 and 30,000 less (although it's important to factor in the Home team coefficients from Blocks 1, and perhaps 2 and 3, when coming up with a likely crowd figure, because certain teams are more likely to be the Home team at particular venues).
The fifth block estimates the effects of the day of the week on which a contest is played and shows that, relative to a Friday game, Saturdays and Sundays are likely to attract a smaller crowd for the same contest. That said, as the next block reveals, games starting at 7:30pm or later attract smaller than average crowds, so a mid-afternoon Saturday or Sunday game might actually be expected to draw a marginally larger crowd for the same venue and teams.
We see from the coefficient for the MoSSBODS' absolute expected margin that every additional Scoring Shot of superiority for the Favourite knocks about 180 people off the crowd, so a 10 Scoring Shot favourite might, for example, reduce the expected attendance by about 1,800, or about 5% of the average crowd. Of course, if that favourite were also the home team then the coefficients in Block 2 need to be considered as well.
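The arithmetic behind that example can be checked with a few lines. The constants below are the rounded figures quoted in the text (about 180 people per Scoring Shot, and an all-season average of roughly 34,000), not the exact model coefficients.

```python
PER_SHOT = -180          # approximate coefficient quoted in the text
AVERAGE_CROWD = 34_000   # rounded all-season average from the first table


def margin_effect(expected_margin_shots: float) -> float:
    """Estimated change in attendance for a given absolute expected margin."""
    return PER_SHOT * abs(expected_margin_shots)


effect = margin_effect(10)
share = effect / AVERAGE_CROWD
print(effect, f"{share:.1%}")
```

This ignores the Block 2 coefficients, so it applies as written only when the favourite is the Away team.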
Next we see the effects of month (relative to March) and find that they are generally small and uniformly statistically insignificant.
Finally, amongst the final block of coefficients we have the effects of the seasons (relative to the 2000 to 2004 period) and find that crowds in the 2005 to 2011 period were almost 3,000 higher per game, while those in the period 2012 to 2015 were just over 1,400 higher per game. That's an interesting result when you look back at the raw season-by-season attendance figures, which might have you believe that the coefficient on this most recent period should be nearer zero, probably even negative. The coefficients in the model, however, adjust for the different mixes of games across the seasons, in particular here adjusting for the attendance-depressing impact in this most recent period of the introduction of Gold Coast and GWS.
At the base of the table we have Wald statistics for various hypotheses testing the joint significance of groups of coefficients. We find that, taken as a whole, the Home team (and its cross with Home team favouritism), Away team, Venue, Day of Week and Start Time (and their cross), as well as the Season coefficient sets are all highly statistically significant. The only set of coefficients not achieving joint statistical significance is that for the Month in which the game was played. In practical terms this means we can't reject the null hypothesis that all of these coefficients are zero, which would imply that Month has no effect on Attendance.
A plot of the Actual versus the Fitted attendance reveals that the fitted model tends to understate the attendance at games which attract very large audiences - say above 50,000.
If we look at those 58 games where the model is in error by more than 20,000 fans and the actual attendance was over 50,000 we find that:
- 53 of the games were played at the MCG
- 29 of the games were played on a Saturday, 15 on a Sunday, and 10 on a Friday
- 30 of the games started before 4:30pm. Only 19 started at 7:30pm or later
- 43 of the games were played in May through August
- 36 of the games were played in the period 2009 to 2015, and only 16 were played in the period 2001 to 2008
- 45 of the games had a 6 Scoring Shot or less favourite
- 31 of the games had a Home team favourite
- 33 of the games involved Hawthorn, Collingwood or Richmond as the Home team, 9 more involved Essendon, and 5 more involved Carlton
- 13 of the games involved Carlton as the Away team, 11 involved Collingwood, 9 involved Essendon, 6 involved Geelong, and 6 involved Richmond
- 8 of the games were Richmond v Essendon clashes, 7 were Collingwood v Carlton clashes, 6 were Richmond v Carlton, 5 were Collingwood v Hawthorn, 4 were Essendon v Carlton, 4 were Hawthorn v Geelong, 4 were Collingwood v Geelong, and 3 were Essendon v Collingwood
- 7 of the games came in the 1st round of the season, 13 from Rounds 6 through 9, and 12 from Rounds 19 through 22
This suggests that the creation of a "super blockbuster" flag, defined on the basis of the characteristics most prominent in the large-error, large-attendance games described above, might improve the fit of the model. For example, we might consider an "all-Victorian and played at the MCG" variable.
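One way such an "all-Victorian and played at the MCG" variable might be coded is sketched below. The team and venue spellings are assumptions on my part and would need to match whatever labels the underlying data uses.

```python
# The ten Victorian-based clubs; the spellings here are assumed labels.
VICTORIAN_TEAMS = {
    "Carlton", "Collingwood", "Essendon", "Geelong", "Hawthorn",
    "Kangaroos", "Melbourne", "Richmond", "St Kilda", "Western Bulldogs",
}


def vic_mcg_flag(home: str, away: str, venue: str) -> int:
    """Return 1 for a game between two Victorian teams at the MCG, else 0."""
    return int(
        venue == "MCG"
        and home in VICTORIAN_TEAMS
        and away in VICTORIAN_TEAMS
    )


print(vic_mcg_flag("Collingwood", "Carlton", "MCG"))
print(vic_mcg_flag("Collingwood", "West Coast", "MCG"))
print(vic_mcg_flag("Collingwood", "Carlton", "Docklands"))
```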
Other variables that would be interesting to include are:
- Weather (rain, temperature, wind, humidity, etc)
- Roof on versus roof off for Docklands games
- Public holiday information
- Team positions at the time of the contest and, therefore, how important the game was in the context of Finals positions
- Whether or not the game was played as part of a round with byes
- The availability or unavailability of key players
- The average experience of the named squads for the two teams
I'm sure readers could think of yet more variables to add, but further data scraping and analysis is a task for another day ...