We've looked at the topic of uncertainty of outcome and its effects on attendance at AFL games before, first in this piece from 2012 and then again in this piece from 2015.

In both of those write-ups, we used entropy, derived from the pre-game head-to-head probabilities, as our measure of the uncertainty in the outcome. In the first of them we found that fans prefer more uncertainty of outcome rather than less, and in the second that fans prefer the home team to be favourites, but not overwhelmingly so.
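As a reminder of how that entropy measure behaves, here is a minimal sketch. It assumes a two-outcome game (draws ignored) and base-2 logarithms, neither of which is stated above, and the function name is purely illustrative.

```python
import math


def outcome_entropy(home_win_prob: float) -> float:
    """Entropy (in bits) of a two-outcome game, given the home team's
    pre-game win probability. Assumes draws are ignored and logs are base 2."""
    entropy = 0.0
    for p in (home_win_prob, 1.0 - home_win_prob):
        if p > 0.0:  # treat 0 * log(0) as 0
            entropy -= p * math.log2(p)
    return entropy


if __name__ == "__main__":
    for prob in (0.5, 0.7, 0.9):
        print(prob, round(outcome_entropy(prob), 3))
```

Entropy is highest for an even-money game and falls away symmetrically as either team becomes a stronger favourite. That symmetry is one reason to prefer a signed measure such as Expected Margin if we want to distinguish home favourites from home underdogs.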

Today I want to revisit that topic, including home and away game attendance data from the period 2000 to the end of Round 13 of the 2017 season (sourced from the afltables site), and using as the uncertainty metric the Expected Margin - the Home team less the Away team score - according to the MoSHBODS Team Rating System. There's also been a suggestion recently that fans prefer higher-scoring games, so I'll be including MoSHBODS' pre-game Expected Total data as well.

Let's begin by looking at the relationship between expected final margin (from the designated Home team's perspective) and attendance.

There are just over 3,000 games in this sample and the preliminary view from this analysis is that:

fans prefer games with expected margins near zero - maybe where the home team is a slight favourite
attendance drops off more rapidly for decreases in expected margin (ie as the home team becomes a bigger underdog) than for increases in expected margin (ie as the home team becomes a bigger favourite)

Those conclusions are broadly consistent with what we found in the earlier blogs (and with the more general "uncertainty of outcome" hypothesis by which name this topic goes in the academic literature).

There doesn't appear to be much evidence in this chart for increased attendance with higher expected total scoring, however. The following chart supports that assertion.

Now there's clearly a lot of variability in attendance in those charts, and whilst Expected Margin might explain some of it, by no means does it explain all of it.

One obvious variable to investigate as a source for explaining more of the variability in attendance is Home team, since some teams are likely to attract higher or lower attendances when playing at home, regardless of how competitive they are.

We see here some quite different patterns of association between Expected Margin and attendance across Home teams, with a number of teams - especially the non-Victorian ones - drawing similar crowds almost regardless of how competitive they were expected to be.

Attendances, of course, are constrained by capacity at many venues, which suggests another dimension on which we might condition the analysis.

Here we consider only the 10 venues at which at least 50 home and away games have been played during the period we're analysing, and we again see a variety of relationships between attendance and expected margin, though the more frequently-used grounds - the MCG and Docklands - do show the inverted-U shape we saw in the first chart.

We could continue to do these partial analyses on single variables at a time, but if we're to come up with an estimate of the individual contribution of Expected Margin and Expected Total to attendance we'll need to build a statistical model.

For that purpose, today I'll be creating a Multivariate Adaptive Regression Spline model (using the earth package in R), which is particularly well-suited to fitting the type of non-linear relationship we're seeing between attendance and Expected Margin.

The target variable for the regression will be Attendance, and the regressors will be:

Designated Home Team
Designated Away Team
Venue
Day of Week
Month of Year
Night Game dummy (which is Yes if the game starts after 5pm local time)
Same State dummy (which is Yes if the game involves two teams from the same State playing in their home state)
Expected Margin (from MoSHBODS, home team perspective)
Expected Total Score (from MoSHBODS)

We'll allow the algorithm to explore interactions, but only between pairs of variables, and we'll stop the forward search when the R-squared increases by less than 0.001.
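The stopping rule can be illustrated with a small sketch. This is not the earth package's internal code, just a toy version of the idea that the forward search ends once an extra step improves R-squared by less than the threshold; the function name and the R-squared path are made up for illustration.

```python
def forward_steps_kept(r_squared_path, threshold=0.001):
    """Count how many forward steps are taken before the first step whose
    R-squared improvement falls below `threshold`.

    `r_squared_path[0]` is the R-squared before any step (intercept only),
    and each later entry is the R-squared after one more forward step.
    """
    steps = 0
    for previous, current in zip(r_squared_path, r_squared_path[1:]):
        if current - previous < threshold:
            break  # improvement too small: stop the forward search here
        steps += 1
    return steps


if __name__ == "__main__":
    # Hypothetical R-squared values, not taken from the fitted model
    path = [0.0, 0.40, 0.60, 0.70, 0.7005, 0.75]
    print(forward_steps_kept(path))
```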

We obtain the model shown at right, the coefficients in which we interpret as follows:

A constant, which sets a baseline for the fitted attendance values. It will be added to and subtracted from on the basis of the other, relevant terms
A block of coefficients that apply based on the game's Designated Home Team. If that's Collingwood, for example, we add 6,805 to our fitted attendance figure.
A block of coefficients that apply based on the game's Designated Away Team. If that's Fremantle, for example, we subtract 3,944 from our fitted attendance figure.
A block of coefficients for different venues. Games played at the MCG attract a 5,747 increase, for example
A coefficient for night games, which attract, on average, an extra 2,657 fans
A coefficient for games played between teams from the same State and played in their home state (for example, Adelaide v Port Adelaide at the Adelaide Oval, Brisbane v Gold Coast at the Gabba, or Melbourne v Western Bulldogs at the MCG). These games attract, on average, an additional 10,629 fans
A coefficient for a hinge function based on the Expected Margin. That coefficient only applies when the expression in the brackets, 3.04112 - Expected Margin, is positive, which means it only applies where the Expected Margin is less than 3.04112 points. For Expected Margins greater than this, the effect is zero; for Expected Margins below it, the fitted attendance is reduced by 118.5 people for each point that the Expected Margin falls short of 3.04112.
A coefficient for another hinge function based on the Expected Margin, but this one only applies for games that are played between teams from the same State in their home State. The hinge element adds a further restriction and means the coefficient applies only in games where the Expected Margin is above 3.0727.

Together, these last two terms create the relationship between attendance and Expected Margin that we saw earlier. The orange portion to the left of about a +3 Expected Margin applies to all games. For games where the Expected Margin is above about +3 points, the red portion applies if the game involves teams from different States or teams from the same State playing out of their home State (for example, in Wellington or at Marrara Oval), and the orange portion applies if the game involves teams from the same State playing in their home State (for example Sydney v GWS at the SCG).

Note that we obtain here not only the inverted-U shape, but also a relationship where attendance drops off more rapidly with negative Expected Margins than it does with positive Expected Margins.
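To make the mechanics of the two Expected Margin hinge terms concrete, here is a minimal sketch of their combined contribution to fitted attendance. The knots and the -118.5 coefficient come from the description above. The coefficient on the same-State hinge isn't quoted in the text, so it is left as a parameter and the value used in the example is a hypothetical placeholder, as are the function names.

```python
def hinge(x: float) -> float:
    """MARS hinge function: max(0, x)."""
    return max(0.0, x)


KNOT_ALL_GAMES = 3.04112    # knot for the hinge that applies to all games
COEF_ALL_GAMES = -118.5     # attendance change per point below that knot
KNOT_SAME_STATE = 3.0727    # knot for the same-State-only hinge


def expected_margin_effect(expected_margin: float,
                           same_state: bool,
                           same_state_coef: float) -> float:
    """Contribution of the two Expected Margin hinge terms to fitted attendance.

    `same_state_coef` is NOT given in the text; callers must supply it.
    """
    effect = COEF_ALL_GAMES * hinge(KNOT_ALL_GAMES - expected_margin)
    if same_state:
        effect += same_state_coef * hinge(expected_margin - KNOT_SAME_STATE)
    return effect


if __name__ == "__main__":
    placeholder_coef = -100.0  # hypothetical value, for illustration only
    for margin in (-30, -10, 0, 3, 10, 30):
        print(margin,
              round(expected_margin_effect(margin, False, placeholder_coef), 1),
              round(expected_margin_effect(margin, True, placeholder_coef), 1))
```

With a negative same-State coefficient, the same-State curve rises to a peak near +3 and then falls, which gives the inverted-U. For all other games the curve is flat above about +3, because neither hinge is active there.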

There are a few more interaction terms in the model.

One that involves a hinge function on Expected Total Score that applies in games involving teams from the same State. It provides a small increment to fitted attendances for games where the Expected Total Score exceeds about 165 points. So, for these particular game types (which represent about 37% of all games), fans do prefer higher scoring.
Some terms related to particular Home Team and Venue combinations, and to particular Away Team and Venue combinations
Terms for the traditional Collingwood v Carlton, and Collingwood v Essendon matchups, where Collingwood is the Designated Home Team
Some special terms for games involving particular teams at home or away facing other teams from their home State.
Terms for Monday, Tuesday and Sunday games at the MCG, the first two capturing the higher attendances at long-weekend and ANZAC day matches
A night game at Stadium Australia term, and a Thursday night with teams from the same State term
Some Home team and Expected Margin interaction terms involving hinge functions again
A term for games where Essendon is the Away team that lifts the fitted attendance for games where the Expected Total is above about 171 points.

The overall fit of the model is quite good, with almost 80% of the variability in attendance figures being explained (the Generalised R-squared for the model, which provides an estimate of how well the model might be expected to fit other data drawn from a similar sample, is about 76%).
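For readers curious about how the Generalised R-squared differs from the ordinary one, here is a rough sketch. It assumes the usual MARS generalised cross-validation (GCV) formula as I understand it from the earth documentation, with an effective number of parameters of terms + penalty x (terms - 1) / 2 and a penalty of 3 for models with interactions. The sums of squares and the term count in the example are hypothetical and are not taken from the fitted model.

```python
def gcv(rss: float, n_obs: int, n_terms: int, penalty: float = 3.0) -> float:
    """Generalised cross-validation score: the residual sum of squares,
    inflated to penalise model complexity (assumed formula, see lead-in)."""
    effective_params = n_terms + penalty * (n_terms - 1) / 2
    return rss / (n_obs * (1 - effective_params / n_obs) ** 2)


def r_squared(rss: float, tss: float) -> float:
    """Ordinary R-squared: share of total variability explained."""
    return 1 - rss / tss


def generalised_r_squared(rss: float, tss: float, n_obs: int,
                          n_terms: int, penalty: float = 3.0) -> float:
    """GRSq: 1 minus the ratio of the model's GCV to that of an
    intercept-only model (whose RSS is the total sum of squares)."""
    gcv_model = gcv(rss, n_obs, n_terms, penalty)
    gcv_null = gcv(tss, n_obs, 1, penalty)
    return 1 - gcv_model / gcv_null


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the direction of the adjustment
    rss, tss, n_obs, n_terms = 20.0, 100.0, 3000, 40
    print(round(r_squared(rss, tss), 3))
    print(round(generalised_r_squared(rss, tss, n_obs, n_terms), 3))
```

Because the penalty grows with the number of terms, the Generalised R-squared is lower than the ordinary R-squared whenever the model has more than an intercept.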

Diagnostic plots reveal that there is some heteroscedasticity, however, with larger errors for games with higher fitted attendance levels.

It could be that some systematic sources of error remain and that the fit could be improved by, for example, considering the criticality of a particular game in the context of the season or the availability or unavailability of key players. Weather too would doubtless play a role, and maybe even the quality of the other games in the round.

Nonetheless, this model seems a reasonable one for at least first-order estimations of the magnitudes and shapes of the relationships between attendance and Expected Margin, and between attendance and Expected Total score. Both Expected Margin and Expected Total have some influence, but the rate at which attendance varies with changes in either depends on the specifics of the game being considered - in particular, who is playing whom, and where.