With the Tigers toppling the Swans with a goal after the siren on Saturday night, one of my Twitter followers, a Swans fan, wondered if there was a way to mathematically operationalise the notion of a "bogey team" and, more importantly, whether the Tigers were such a team for the Swans.
For those of you who might be unfamiliar with the term, a "bogey team" is, loosely, thought of as being a team that seems to beat your team more often than they should, even when they're the less-fancied of the pair.
That somewhat intuitive description lends itself to a natural, quantifiable measure of what I'll call Bogeyness, which I'll define as the difference between the number of times one team has beaten another and the number of times we might reasonably have expected that to occur.
As an equation then:
Bogeyness Score = Actual Wins - Expected Wins
So, for example, if your team has played another team on four occasions and been equal-favourites on all four, then we'd expect both teams to have won, on average, two games. If your team lost all four, then its Bogeyness Score against this opponent would be -2, since it won two games fewer than expected.
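As a sketch of the calculation, here's some illustrative Python. The function and the example figures are mine, not drawn from MoSSBODS; all that's assumed is that we have, for each game, the team's pre-game victory probability and the result.

```python
def bogeyness_score(games):
    """Actual wins minus expected wins over a series of games.

    games: list of (win_probability, won) tuples, where win_probability
    is the team's pre-game chance of victory and won is True/False.
    """
    actual_wins = sum(1 for _, won in games if won)
    expected_wins = sum(p for p, _ in games)
    return actual_wins - expected_wins

# The worked example from the text: four games as an equal favourite,
# all lost, gives a score of -2.
four_losses = [(0.5, False)] * 4
print(bogeyness_score(four_losses))  # -2.0
```

The expected-wins term is just the sum of the pre-game probabilities, which is why the choice of probability model (here, MoSSBODS) matters so much to the resulting scores.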
Calculating Bogeyness Scores for all pairs of teams across the period 2000 to the end of Round 8 2016, using MoSSBODS to calculate each team's victory probability in every game, yields the chart below.
In this chart, the size of the dot is proportional to the number of games played between a pair of teams, and the colour of that dot denotes the Bogeyness Score. If we review the results for any given team (ie scan across a row), the dark green dots represent that team's "bogey teams" - the opponents that have won more often than they should have.
We see for Sydney then, for example, that Richmond is something of a "bogey team" but less so than Collingwood and only about as much as Adelaide and Geelong. Sydney themselves, however, are "bogey teams" for Brisbane, Carlton, Port Adelaide and West Coast during the period we're reviewing.
Hawthorn is a "bogey team" for Adelaide, Carlton, Collingwood, Fremantle and Melbourne, but has underperformed, relatively speaking, against Geelong, Port Adelaide and Richmond.
If you support a particular team you'll probably have your own views about which are your bogey teams and whether this chart aligns with your perceptions. I'd be keen to hear your feedback on this.
INTERPRETING BOGEYNESS SCORES
There are a few things we need to consider when interpreting the Bogeyness Scores shown here.
The first is that it's a mathematical and statistical fact that the Bogeyness Score will tend to increase in absolute size with the number of games played between two teams, in the same way that the expected excess of heads over tails in a series of coin tosses increases with the number of tosses. Since most pairs of teams in this chart have played each other a similar number of times across the time period we're considering, that's not such an issue here, except for GWS and the Gold Coast.
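The coin-toss analogy is easy to check with a quick simulation. The Python below is purely illustrative (it isn't part of the original analysis): it estimates the average absolute excess of heads over tails for series of different lengths, which grows roughly like the square root of the number of tosses.

```python
import random

random.seed(1)

def mean_abs_excess(n_tosses, n_trials=500):
    """Average absolute excess of heads over tails, estimated by simulation."""
    total = 0
    for _ in range(n_trials):
        heads = sum(random.random() < 0.5 for _ in range(n_tosses))
        tails = n_tosses - heads
        total += abs(heads - tails)
    return total / n_trials

# The average absolute excess grows with the series length
# (theoretically at about sqrt(2n/pi)).
for n in (10, 100, 1000):
    print(n, round(mean_abs_excess(n), 1))
```

The same logic applies to head-to-head records: the more games two teams have played, the larger the raw gap between actual and expected wins we should anticipate from chance alone.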
The second is that the Bogeyness Scores shown here depend partly on the accuracy of MoSSBODS probability assessments in each game. If MoSSBODS is poorly calibrated then the Scores shown here are likely to be more variable than their "true" values.
Now MoSSBODS has been by no means perfectly calibrated across the 2000 to 2016 period but, as the chart at right reveals, it's not been especially poorly calibrated across any part of the home team probability space either.
In an ideal world the line on this chart would be a 45-degree line passing through the (50%, 50%) point, since such a line would imply that, when MoSSBODS assigned the home team a probability of X% it would win, on average, X% of the time.
We see a few imperfections, such as the fact that home teams assigned a 50% probability tend to win a little over 50% of the time, but the overall picture is of a reasonably well-calibrated probability forecaster.
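For readers curious about what such a calibration check involves, here's a hedged Python sketch. It bins games by forecast probability and compares each bin's average forecast with the observed win rate; the data is synthetic and perfectly calibrated by construction, standing in for actual MoSSBODS forecasts.

```python
import random

random.seed(42)

def calibration_table(forecasts, outcomes, n_bins=10):
    """Group games by forecast probability and compare each bin's
    average forecast with the observed win rate in that bin."""
    bins = [[] for _ in range(n_bins)]
    for p, won in zip(forecasts, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, won))
    table = []
    for b in bins:
        if b:
            mean_forecast = sum(p for p, _ in b) / len(b)
            win_rate = sum(w for _, w in b) / len(b)
            table.append((round(mean_forecast, 3), round(win_rate, 3), len(b)))
    return table

# Synthetic forecaster: the home team wins with exactly the forecast
# probability, so mean forecast and win rate should track each other.
forecasts = [random.random() for _ in range(20000)]
outcomes = [random.random() < p for p in forecasts]

for mean_forecast, win_rate, n in calibration_table(forecasts, outcomes):
    print(mean_forecast, win_rate, n)
```

A well-calibrated forecaster produces a table (or chart) where the two columns nearly match in every bin, which is the 45-degree line described above.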
So, I think it's reasonable to say that MoSSBODS, overall, is doing a sufficiently good job at estimating team victory probabilities.
That's not to say, of course, that it might not systematically be under- or over-estimating the chances of individual teams or in specific contests, but the difficulty in determining this comes from the challenge of differentiating pre-game miscalculation of a team's chances across a series of games from genuine underperformance by that team in those games. Put another way, it might truly be the case that a team was, objectively, a 75% chance of winning each of 10 contests against some other team, but that it won only 3 because it consistently underperformed in those games.
Lastly, in interpreting the Bogeyness Scores, we should recognise that deviations from expectations are to be expected, and that some of those deviations will be large due solely to chance. In my view, the notion that a team "plays above themselves" or "has a hoodoo" when playing some other team or at a particular venue, across a reasonably long period of history, is highly suspect. It might be the case that, in the short term, one team's style works particularly well against another, otherwise better-performed team, but it's unlikely that this will be the case over an extended period.
With that in mind, I'd be inclined to consider most of the larger (in absolute terms) Bogeyness Scores here to be partly a reflection of short-term advantages resulting from a beneficial mismatch in styles or a well-suited venue, but mostly a reflection of random variation.
They're still fun to look at though.
ALL-TIME BOGEYNESS SCORES
What if we take a longer view of history, go right back to 1897, and review the Bogeyness Scores across that expanse of time?
If we do that, we arrive at the chart below, prepared on the same basis as the earlier chart.
I noted earlier that larger absolute differences between actual and expected wins were more likely for pairs of teams that had played more often, and we see the empirical evidence for this in the chart above, where the darker red and darker green dots tend to be among the larger ones, and the smaller dots tend towards the yellows and oranges.
One way to control for this is to standardise the Bogeyness Scores by dividing them by the square root of the number of games played between the relevant teams, which is what I've done for this final chart.
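The standardisation is a one-line adjustment. In this illustrative Python sketch, the win counts are made-up figures rather than values from the charts:

```python
import math

def standardised_bogeyness(actual_wins, expected_wins, n_games):
    """Bogeyness Score scaled by the square root of games played,
    making head-to-head records of different lengths comparable."""
    return (actual_wins - expected_wins) / math.sqrt(n_games)

# The same raw excess of three wins is more striking over 9 games
# than over 100 (figures here are illustrative, not from the charts).
print(standardised_bogeyness(7, 4, 9))      # 1.0
print(standardised_bogeyness(53, 50, 100))  # 0.3
```

Dividing by the square root (rather than by the number of games itself) matches the rate at which chance deviations grow, so the standardised scores for long and short head-to-head histories sit on a comparable scale.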
Taking this view we see that Carlton has consistently over-performed relative to expectations when playing St Kilda, as did Fitzroy against South Melbourne, and South Melbourne against St Kilda.
All the same interpretational caveats apply to these charts as they did to the chart for 2000 to 2016.
MoSSBODS calibration, however, is even less of a concern for this long expanse of history than it was for the 2000 to 2016 timeframe, as evidenced by the chart at right, which represents a model about as empirically well-calibrated as you might wish for.
So, certainly, some teams tend to do better or worse than expected against other teams, even across relatively large expanses of time, but the extent to which that means one team is the other's "bogey team" or that it's merely the natural outcome of a random process, is a topic worthy of further analysis and discussion.