Lately, while waiting for the competition to generate some meaningful new data to analyse, I've been looking at the history of VFL/AFL scoring, in particular Scoring Shot generation and Conversion Rates. (Those of you who follow me on Twitter will know that this has been a source of regular, if not always gripping, Tweetage.)
Both metrics have varied quite considerably across the 119 seasons of VFL/AFL history, so, to remove one source of variability, I typically analyse the data on an Era-by-Era basis, the latest of which I define as the period from 2000 to the present.
It was while reviewing the data for this Era that I came across the interesting fact that about 1 more Scoring Shot per game was registered in Sunday games compared to Saturday games, and these Shots were converted at about a 1% higher rate.
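For anyone wanting to reproduce this sort of comparison, here's a minimal sketch in Python. It assumes that Scoring Shots are goals plus behinds and that the Conversion Rate is goals divided by Scoring Shots. The game records and the summarise_by_day name are made up for illustration; they are not the data behind the figures above.

```python
from collections import defaultdict

# Hypothetical game records: (day, goals, behinds), summed over both teams.
games = [
    ("Saturday", 28, 24),
    ("Saturday", 25, 23),
    ("Sunday", 30, 22),
    ("Sunday", 27, 25),
]


def summarise_by_day(records):
    """Return {day: (Scoring Shots per game, Conversion Rate)}."""
    totals = defaultdict(lambda: [0, 0, 0])  # games, goals, behinds
    for day, goals, behinds in records:
        totals[day][0] += 1
        totals[day][1] += goals
        totals[day][2] += behinds

    summary = {}
    for day, (n_games, goals, behinds) in totals.items():
        shots = goals + behinds  # assumed definition of a Scoring Shot
        summary[day] = (shots / n_games, goals / shots)
    return summary


by_day = summarise_by_day(games)
for day, (shots_per_game, conversion) in by_day.items():
    print(f"{day}: {shots_per_game:.1f} shots per game, {conversion:.1%} converted")
```

Note that the Conversion Rate here is pooled (total goals over total Scoring Shots for the day) rather than an average of game-by-game rates; the two can differ slightly.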
It seemed unlikely that the day itself was a causal factor, so I cast about for plausible hypotheses with the help of some Twitter followers.
First off, we know from very recent analyses that Conversion Rates tend to be higher in games with stronger favourites.
Using MoSSBODS Team Ratings to measure the relative team strengths (at the venue in question), we see about a 1% point variation in empirical Conversion Rates in games with closely-matched teams versus games with large mismatches, but we see only a little variability in Scoring Shot production.
It was also suggested that, perhaps, Saturday games have tended to be better match-ups than Sunday games, an hypothesis we can investigate by crossing our earlier Strength of Favourite metric with the Day of Week variable.
The results show that, to the extent MoSSBODS is a reasonable measure of relative team ability, the profile of Saturday games is not that different to the profile of Sunday games (see the shaded column of data).
What we also see in the results is that, whichever Strength of Favourite category we choose, teams have converted at rates 0.5% to 1.5% points higher on Sundays than on Saturdays within the same category.
The same is true for Scoring Shot production where we see Sunday games generating between 0.5 and 2 Scoring Shots per game more than games played on Saturdays from within the same Strength of Favourite grouping.
Now we also know from previous analyses that some Venues tend to produce games with Conversion Rates significantly different from average. For the 16-season period being analysed (and ignoring those grounds where only a few games have been played), Conversion Rates across the Venues range from a low of 50.6% at Carrara to a high of 55.4% at Docklands.
(It's an interesting side note to the analysis that almost exactly 50% of all games in the period (1,539 of 3,059) have been played at either the MCG or at Docklands.)
Scoring Shot production has also varied widely by Venue, the Gabba, for example, producing on average 53.2 Scoring Shots per game, while York Park has managed just 46.5 and Kardinia Park 48.0.
Amongst the not infrequently used grounds then, we might classify Docklands and Princes Park as high Scoring, high Conversion; the Gabba and, to a lesser extent, the MCG, as high Scoring, low Conversion; Kardinia as low Scoring and (slightly) high Conversion; and Football Park, the SCG and Stadium Australia as low Scoring, low Conversion.
Fairly obviously, these differences cannot reasonably be attributed solely to the nature of the grounds themselves, but must also be driven by the teams that play at a given venue most often. (Teasing out the Venue from the Team effects was what we attempted to do in the blog post linked to earlier.)
But what of the relationship between Day of Week and Venue?
The table at right shows the Venue mix of games for each day of the week and reveals that, compared to Saturday games, Sunday games have more often been played at Docklands in particular, which we've just seen has been a ground that's tended to produce higher Conversion Rates and greater Scoring Shot production.
Also, a lesser proportion of Sunday games have been played at the MCG, which is a ground associated with only around-average Conversion Rates and Scoring Shot production.
Speaking of teams, we should also look at the team-by-team statistics for this period.
There's significant variability across this dimension too, the Gold Coast having converted at just 50.3% across the entire period, while Hawthorn have converted at an amazing 55.6%.
Greater Western Sydney are the team that has generated fewest Scoring Shots per game (20.6 per game), and Geelong the most (27.1 per game). That's a difference of over 30% - and a difference in Goal production per game of almost 40% when you combine Geelong's Scoring Shot production superiority with its Conversion Rate superiority.
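The arithmetic behind those two percentages can be sketched as follows. It assumes Goals per game equals Scoring Shots per game multiplied by Conversion Rate, so the two relative advantages compound multiplicatively. The function name is mine, and the 40% figure is the rounded "almost 40%" from the text, so the implied Conversion Rate ratio is only approximate.

```python
def pct_more(a, b):
    """Percentage by which a exceeds b."""
    return (a / b - 1) * 100


geelong_shots, gws_shots = 27.1, 20.6  # Scoring Shots per game, from the text

shot_gap = pct_more(geelong_shots, gws_shots)
print(f"Scoring Shot gap: {shot_gap:.1f}%")

# If goals = shots x conversion, then
#   goal ratio = shot ratio x conversion ratio.
# Working backwards from a goal gap of roughly 40% gives the implied
# ratio of Geelong's Conversion Rate to Greater Western Sydney's.
approx_goal_ratio = 1.40  # "almost 40%" in the text, so an approximation
implied_conversion_ratio = approx_goal_ratio / (geelong_shots / gws_shots)
print(f"Implied Conversion Rate ratio: {implied_conversion_ratio:.3f}")
```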
The spread of the 18 teams across these two dimensions - and the positive correlation between them (+0.54) - is made very clear by a chart.
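A correlation of this kind is a Pearson coefficient calculated across the teams. Here's a minimal sketch with invented numbers for five hypothetical teams, purely to show the calculation; the values are not the actual team figures.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical per-team figures (one entry per team), for illustration only.
shots_per_game = [21.0, 23.5, 24.8, 25.9, 27.0]
conversion_pct = [51.0, 53.2, 52.1, 54.0, 54.5]

r = pearson(shots_per_game, conversion_pct)
print(f"correlation: {r:+.2f}")
```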
The chart also seems to broadly group teams based on their competition success over the 16-year period, the obvious exception being the Western Bulldogs, whose scoring statistics suggest that they might have reasonably hoped for more than just the three Preliminary Finals appearances in that time.
Offence, though, is only one half of the picture, and it's when we review the scoring performance of teams' opponents that we gain a more complete understanding of their abilities.
The Bulldogs, for example, have conceded 27.1 Scoring Shots per game, about 1.7 more than the all-team average, and those Shots have been converted at a rate of 54.4%, which is almost 1% point higher than the all-team average.
There are worse teams, however, on both measures. Greater Western Sydney have conceded 30.1 Scoring Shots per game, Gold Coast 28.7, and Melbourne 27.8. Greater Western Sydney have also seen those Shots converted at a rate of 55.1%, which is slightly higher than the 54.6% Conversion Rates by opponents of Essendon and West Coast.
Best on Scoring Shot production has been Sydney with just 22.7 Scoring Shots per game conceded, ahead of Geelong on 22.9, while best on Conversion Rate has been Sydney's 51.6%, then Hawthorn's 52.4%.
Again, a chart highlights the spread of the teams on these two metrics and suggests a moderate, positive correlation between them (it's +0.48).
So then, which teams tend to play more often on which days?
The table at left provides the team-mix profile for each day of the week and reveals that, relative to Saturdays, Fremantle, the Kangaroos, Port Adelaide and the Western Bulldogs have appeared more often in Sunday games, while the Brisbane Lions, Collingwood, Geelong and Sydney have appeared less often.
It's difficult to use this data in conjunction with the team-by-team offensive and defensive data above to draw definitive conclusions about the contribution of the different team mixes to the observed differences in Saturday and Sunday scoring data without constructing a statistical model (which we'll do later), but we can make a few qualitative observations.
In particular, Sunday Conversion Rates are bolstered by the relatively less frequent appearances of Brisbane, Collingwood and Sydney, and by the relatively more frequent appearances of the Western Bulldogs. Also, Sunday Scoring Shot levels benefit from the relatively less frequent appearances of Collingwood, Geelong and Sydney, and the relatively more frequent appearances of the Kangaroos, Port Adelaide, and the Western Bulldogs.
A number of people on Twitter thought the apparent Sunday effect might be attributable to the relative rareness of "night games" on Sundays compared to Saturdays.
The table at left summarises Scoring Shot and Conversion data by the Day of the Week and the (grouped) Starting Time of the game. Note that only 5 games during the period were played on a Sunday and started at 7:30pm or later. By comparison, 162 games on Saturdays started at that time.
As well, only 108 Sunday games started between 4:30pm and 7:30pm, compared to 653 of the Saturday games.
Very clearly, the starting time profile of Sunday games is different from the starting time profile of Saturday games. That said, for any given Starting Time group, Conversion Rates on Sundays are higher than those for the same Starting Time group on Saturdays.
Lastly, it was conjectured that crowd pressure might drive variability in Scoring Shot production and conversion. To test this, I sourced crowd data for every game from the afltables site and then grouped games on the basis of the quartile into which each game's attendance fell.
Evidence for any (linear) link between Attendance Level and either Conversion Rate or Scoring Shot production seems weak at best.
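The quartile grouping can be sketched like this. The attendance and Conversion Rate values are invented, the function name is mine, and the quartile cut points come from Python's statistics.quantiles with its default method, which may not match the method used for the tabulated results.

```python
from bisect import bisect_left
from statistics import mean, quantiles

# Hypothetical (attendance, Conversion Rate %) pairs, one per game.
games = [
    (10_000, 52.0), (20_000, 55.0), (30_000, 51.0), (40_000, 54.0),
    (50_000, 53.0), (60_000, 56.0), (70_000, 50.0), (80_000, 53.5),
]


def group_by_quartile(records):
    """Split the rates into four lists by attendance quartile (lowest first)."""
    cuts = quantiles([attendance for attendance, _ in records], n=4)
    groups = [[] for _ in range(4)]
    for attendance, rate in records:
        # A game sitting exactly on a cut point goes into the lower group.
        groups[bisect_left(cuts, attendance)].append(rate)
    return groups


groups = group_by_quartile(games)
quartile_means = [mean(g) for g in groups]
for i, avg in enumerate(quartile_means, start=1):
    print(f"Quartile {i}: mean Conversion Rate {avg:.1f}%")
```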
MULTIVARIATE REGRESSION MODELS
We've now teased out, for a range of variables, the univariate and/or bivariate relationships they have with Conversion Rates and Scoring Shot production. But, there's only so far you can go by controlling for just one or two variables.
For this last section I've created two multivariate least squares regressions to explore the marginal relationships between all of the variables explored so far and the two scoring metrics.
(Note that I included Attendance and Strength of Favourite in their continuous forms, rather than as categoricals as tabulated above. Also, I should point out that I'd hoped to create a Beta regression for the Conversion Rate model, but had difficulties fitting the model using the betareg package in R. Doubtless the inadequacy was mine and not the package's.)
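To make the structure of such a model concrete, here's a minimal sketch of a least squares fit with one categorical variable (dummy-coded against a reference level) and one continuous variable. It is not the model fitted for this post, which was built in R with many more variables. The data are synthetic and noise-free, constructed so that the recovered coefficients can be checked by eye, and all function names are mine.

```python
def dummy_code(values, reference):
    """One 0/1 column per non-reference level, levels in sorted order."""
    levels = sorted(set(values) - {reference})
    rows = [[1.0 if v == level else 0.0 for level in levels] for v in values]
    return levels, rows


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic games: (venue, strength of favourite, Conversion Rate %).
# Built as 53 + venue effect + 0.1 x strength, with Docklands +2 and MCG -1.
games = [
    ("Adelaide Oval", 0, 53.0), ("Adelaide Oval", 10, 54.0),
    ("Docklands", 0, 55.0), ("Docklands", 20, 57.0),
    ("MCG", 5, 52.5), ("MCG", 15, 53.5),
]

levels, dummies = dummy_code([g[0] for g in games], reference="Adelaide Oval")
X = [[1.0] + d + [float(g[1])] for d, g in zip(dummies, games)]
y = [g[2] for g in games]

coefs = ols(X, y)
names = ["Intercept"] + levels + ["Strength of Favourite"]
for name, value in zip(names, coefs):
    print(f"{name}: {value:+.2f}")
```

Each Venue coefficient is read relative to the reference Venue, which is why the reference itself has no coefficient of its own.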
Let's start by reviewing the Conversion Rate model. The coefficients here can be interpreted as percentage points in fitted Conversion Rates. So, for example, we can say that, controlling for all other variables in the model, teams playing at Bellerive Oval have converted at 3.8% points lower than teams playing at Adelaide Oval (which is the reference Venue for all Venue-related coefficients).
Only one Venue has a coefficient that is significantly different from zero, Cazaly's Stadium, where teams are estimated to have converted at a rate 8.4% points lower than at Adelaide Oval.
At the foot of the table are Wald Statistics on the joint significance of various sets of coefficients, the first of which is for the Venue variables taken together. The p-value for the null hypothesis that all of these coefficients are jointly zero comes in at 0.0004, which allows us to reject that null with some considerable vigour. In practical terms this simply means that at least one Venue almost certainly has some effect on Conversion Rate variability across games.
The next block of coefficients is for the Home Team in each game and these are expressed relative to Adelaide. No single coefficient is statistically significantly different from zero and, taken together, they are jointly significant only at the 10% level. So, taken across the entire 16-season period, the Home Team variable hasn't provided a lot of help in explaining Conversion Rate variability.
The Away Team variable has offered even less assistance, although we can claim that St Kilda, playing away, has tended to convert at a higher rate than Adelaide, also playing away.
Moving next to the Day of Week variable, where all comparisons are made with Friday games, we find that Saturdays and Sundays have very similar coefficient values of around -1%. This suggests that other variables included in the model account for the difference in Conversion Rates between Saturdays and Sundays that we noted at the very start of this blog.
In fact, we can't reject the null hypothesis that the Day of Week coefficients are jointly zero, which implies that we've otherwise explained the day-versus-day variability in Conversion Rates that we saw in that original table.
As we'd probably have expected based on the results from the recent blog exploring favourite strength and Conversion Rates, we find the Strength of Favourite variable statistically significant here. The coefficient isn't huge though, and suggests only a 0.8% point increase in the expected Conversion Rate for a game with a 10 Scoring Shot favourite.
Lastly, we have the Attendance variable which, unsurprisingly given our earlier univariate analysis, proves to be not statistically significantly different from zero. It is, however, negative, so our best estimate would be that larger crowds do tend to lower Conversion Rates, but the effect size is tiny - just 0.2% points for every additional 10,000 crowd members.
Overall, the regression model for Conversion Rate has a similar fit to those built previously here on MatterOfStats, with about 5% of the variability in Conversion Rates from game-to-game explained.
Much more of the variability in Scoring Shot production (about 12%) is explained by the other regression model however, where we find, to begin with, that particular Venues are associated with quite different levels of Scoring Shot production (again relative to Adelaide Oval). As noted earlier, Adelaide Oval has been a low-scoring ground, so it's not surprising to see a slew of positive coefficients here, foremost amongst them those for the Gabba, SCG and Stadium Australia where the coefficients are all double-digit and statistically significant. Taken together, the Venue coefficients are highly significant.
Both the Home Team and Away Team variables are also highly significant jointly. Amongst the individual coefficients for the Home Team variable we see that games in which Sydney is the Home team are estimated to produce about 9 fewer Scoring Shots than games in which Adelaide is the Home team. This offsets, to some extent, the large coefficients we saw for the SCG and Stadium Australia, at least for those games where the Swans are playing as the Home team.
The Away Team coefficients are, generally, smaller in size, reflecting lesser variability in the away performances of teams relative to Adelaide. Sydney (-4.3) and St Kilda (-2.2) have the most-negative coefficients, and Carlton (+1.6) the most-positive.
Turning next to the Day of Week variable we see that Sundays produce, on average, about 0.4 more Scoring Shots per game than Saturdays (ie 1.3 - 0.9) after we control for all other variables in the model. That said, the Day of Week variables, taken together, are not jointly significant, so we can't reject the null hypothesis of them all being equal to zero. As for the Conversion Rate model then, it appears that other variables in this Scoring Shot model account for most of the day-versus-day variability in Scoring Shot production we saw in the original table.
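The "1.3 - 0.9" calculation works because both coefficients are measured against the same reference day, so the reference cancels when one is subtracted from the other. A tiny sketch, assuming Friday is the reference day here as it is in the Conversion Rate model, and using only the two coefficients quoted above:

```python
# Day of Week coefficients relative to the reference day. The Saturday and
# Sunday values are from the text; Friday as the reference is an assumption.
day_coefs = {"Friday": 0.0, "Saturday": 0.9, "Sunday": 1.3}


def contrast(coefs, day_a, day_b):
    """Estimated difference between two days, holding other variables fixed."""
    return coefs[day_a] - coefs[day_b]


sunday_vs_saturday = contrast(day_coefs, "Sunday", "Saturday")
print(f"Sunday minus Saturday: {sunday_vs_saturday:+.1f} Scoring Shots per game")
```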
The two Start Time coefficients are negative and statistically significant, suggesting that later starts are responsible for about a 1 to 1.5 Scoring Shot reduction relative to starts before 4:30pm.
Favourite Strength appears to have little if any effect on Scoring Shot production, the estimated effect of -0.01 being neither practically nor statistically different from zero.
Lastly, Attendance does seem to have a mild though statistically significant positive effect on Scoring Shot production, with every 10,000 increase in attendance lifting production by 0.3 Scoring Shots.
In summary then:
- Variability in game-to-game Conversion Rates is statistically associated with Venue, Home Team (somewhat), Start Time and Strength of Favourite, though not with Attendance, Away Team, or Day of Week.
- Variability in game-to-game Scoring Shot production is also statistically associated with Venue, Home Team and Start Time, but also with Away Team and Attendance, though not with Strength of Favourite nor Day of Week.
The Sunday versus Saturday effect that we saw at the start of this blog then, is unlikely to be due to anything particular about those days of the week, but instead to the different characteristics of games played on Sundays compared to games played on Saturdays - who played, where they played, the time they started, in front of what sized crowd, and with what relative abilities.