In the previous post we saw that large or what might be called "blowout" victories in AFL games occurred at a rate much higher than the rate at which bookmakers assigned equally large handicaps to those same games. We also saw why this might be the case from a theoretical perspective, on the assumption that AFL game margins were Normally distributed.
It's one thing to find a result that applies to a single sport, but might it also be true in another major Australian sporting competition, the NRL?
Unlike for the AFL, I don't have a personally curated data set for the NRL, so for this post I'll be relying on the information provided on the aussportsbetting site, which includes results and bookmaker data for National Rugby League games going back to 2009. That data doesn't include handicap data for the entire period, and some of the handicap data that it does include has different prices for the home and the away teams, meaning that the handicap provided for these games is not a true estimate of (the negative of) the expected final margin. So, if we want to investigate the entire period, we're going to need a reliable way of estimating pre-game handicaps for every game.
The data does, however, include head-to-head prices for almost every game, averaged across a number of bookmakers. For the 292 games that have handicap data with equal prices for the home and the away teams, we can use those head-to-head prices to build a beta regression model in R. In that model we regress the average implied home team probability (calculated, for the current purpose, by adopting an Overround Equalising approach) on the opening home team line, using a logit link. The fitted model can then be used to estimate an implicit handicap for every game for which head-to-head prices are available.
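As a rough illustration of the implied probability calculation, here is a minimal Python sketch. It assumes that Overround Equalising means the bookmaker's overround is levied equally on both teams, in which case the implied probabilities are just the normalised inverse prices. The function name and the example prices are mine, chosen for illustration, and are not taken from the data set.

```python
def implied_home_probability(home_price: float, away_price: float) -> float:
    """Implied home team probability under an assumed overround-equalising approach.

    If the same overround is applied to both teams, it cancels when the
    inverse prices are normalised to sum to 1.
    """
    if home_price <= 1.0 or away_price <= 1.0:
        raise ValueError("decimal prices must be greater than 1")
    inv_home = 1.0 / home_price
    inv_away = 1.0 / away_price
    return inv_home / (inv_home + inv_away)


# Hypothetical prices: home team at $1.50, away team at $2.50
example_probability = implied_home_probability(1.50, 2.50)
```

Under this assumption the result is algebraically the same as Away Price / (Home Price + Away Price).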
(On a slightly technical note, I should point out that I've used the panel-based head-to-head bookmaker price data with the Pinnacle-only handicap data for this regression. The data set does also include Pinnacle-only head-to-head data, but for a much smaller number of games. So, since I'll ultimately be inferring handicaps using the bookmaker prices from the panel data, I opted to use those prices in the model build, despite the possible logical inconsistency that this represents. I don't think much swings on this decision.)
The fitted model is ln(Implicit Prob/(1-Implicit Prob)) = 0.0065 - 0.1040 x Home Opening Line, and it has a pseudo R-squared of about 97%.
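To show how the fitted equation can be turned around to give a line from a probability, here is a small Python sketch that simply inverts the logit relationship using the two coefficients quoted above. The function names are hypothetical, and the sketch is not the R code used for the analysis.

```python
import math

# Coefficients as quoted for the fitted model
INTERCEPT = 0.0065
SLOPE = -0.1040


def probability_from_line(home_line: float) -> float:
    """Implied home team probability for a given home opening line."""
    log_odds = INTERCEPT + SLOPE * home_line
    return 1.0 / (1.0 + math.exp(-log_odds))


def inferred_home_line(implied_probability: float) -> float:
    """Invert the fitted model: the home line implied by a home team probability."""
    if not 0.0 < implied_probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    log_odds = math.log(implied_probability / (1.0 - implied_probability))
    return (log_odds - INTERCEPT) / SLOPE
```

With these coefficients a home team probability above 50% maps to a negative line (the home team is giving start), and the expected margin is then the negative of that line.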
This model allows us, via the panel bookmaker head-to-head prices, to infer a Home Line for all but 3 of the 1,407 games, and these Lines can be used to investigate margin errors for the remaining 1,404 games, where each error is calculated as the Actual Margin minus the Expected Margin.
Remarkably (well, to me at least) these errors look suspiciously Normally distributed, if a little less peaked and thinner-tailed than would be textbook.
A QQ-plot (see chart at left) suggests that an assumption of Normality is not unwarranted, save perhaps in the tails, as the chart above hinted, and a Shapiro-Wilk test spectacularly fails to reject the null hypothesis of Normality (p = 0.78).
The errors have a mean of about 1 and a standard deviation of about 16 points, and I've used these parameters in the tables that follow to make a comparison, as I did for the AFL data, between the observed empirical and theoretical results.
The grouping of Expected Absolute Margins used in the analysis came from a review of their distribution across the 1,407 games, and I chose boundaries that put about 20% of games in each group. We see, as we did for the AFL data, that in the most extreme category (here, 9 points or more) the proportion of games with an actual margin that large is about three times the proportion with an expected margin that large. While almost 60% of games finished with a margin of 9 points or more, only 19% of games had a pre-game expected absolute margin in this same range.
We also see that the correspondence between empirical and theoretical results is very high. The difference again seems to be largest for games with the highest expected margins, which might be explained by a higher inherent variability in the final margins of these games, by a greater divergence from Normality for them, or by a combination of both.
Regardless, we find that, in the NRL under seemingly reasonable assumptions, large victory margins occur more often than a simple analysis of pre-game margin expectations would imply.
From a theoretical perspective, the reasons for this are the same as those posited for the AFL results: relatively speaking, the Normal distribution puts a substantial portion of its density in the tails, so even games between quite evenly-matched opponents will often finish with a relatively large victory margin for one of the teams - sometimes even a large victory margin for the underdog.
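To put a number on that argument, here is a minimal Python sketch of the theoretical probability that a game finishes with an absolute margin of at least some threshold. It assumes the final margin is Normally distributed around the expected margin with a standard deviation of 16 points, as estimated earlier, and for simplicity it ignores the mean error of about 1 point. The function name is hypothetical.

```python
from statistics import NormalDist


def prob_abs_margin_at_least(threshold: float,
                             expected_margin: float,
                             sd: float = 16.0) -> float:
    """P(|final margin| >= threshold) when the final margin is assumed to be
    Normal with mean `expected_margin` and standard deviation `sd`."""
    margin = NormalDist(mu=expected_margin, sigma=sd)
    upper_tail = 1.0 - margin.cdf(threshold)   # favoured side wins big
    lower_tail = margin.cdf(-threshold)        # other side wins big
    return upper_tail + lower_tail


# A game between perfectly evenly-matched teams (expected margin of zero)
even_game = prob_abs_margin_at_least(9.0, expected_margin=0.0)
```

Under these assumptions even a game with an expected margin of zero has roughly a 57% chance of finishing with a margin of 9 points or more, and that probability only rises as the expected margin moves away from zero.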
Exactly why handicap-adjusted margins in both the AFL and the NRL should, to a reasonable degree, follow a Normal distribution is a separate matter, though I have been able to derive this result for games of Australian Rules from a pseudo "first principles" approach by making assumptions about the score-generating processes.
In any case, the conclusion seems to be that relatively large wins - for suitably defined "relatively" - are an inevitable part of sport, even for contests between equally-talented teams.