Which Teams Are The Surprisal Packets?

Anyone who's ever tipped sporting outcomes knows the frustration of the untippable team, the team that win when they're supposed to lose, and lose when they're supposed to win. With that idea in mind, which team's results, I wondered, have been the most difficult to predict across the history of the VFL/AFL?

To answer that question I need two things: 

  1. an estimate of the probability of the outcome of every VFL/AFL game ever played, and 
  2. a metric for converting actual game outcomes relative to these probabilities into a measure of the difficulty of predicting that result. 

For the probability estimates for every game I'll use the model that I created for the previous blog, a binary logit with season-specific coefficients and coefficients for the home and away teams' MARS Ratings. With this model I can, for example, estimate the pre-game probability that could reasonably have been associated with Essendon beating Collingwood in their meeting in 1917 by inputting into the model, along with the year, each team's MARS Rating at the time of the clash.
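To make the shape of such a model concrete, here is a minimal sketch of how a binary logit turns two ratings into a home-team win probability. The function name, the single season-level intercept and all of the coefficient values are hypothetical, chosen only for illustration; they are not the fitted values from the model described above.

```python
import math

def win_probability(home_rating, away_rating, season_intercept, b_home, b_away):
    """Home-team win probability from a binary logit.

    All coefficients are hypothetical placeholders, not fitted values.
    """
    z = season_intercept + b_home * home_rating + b_away * away_rating
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative only: made-up ratings and coefficients for a single season
p_home = win_probability(1010.0, 990.0, season_intercept=0.2,
                         b_home=0.03, b_away=-0.03)
print(round(p_home, 3))
```

With a positive coefficient on the home rating and a negative one on the away rating, a stronger home team or a weaker visitor pushes the estimated probability upwards, which is the behaviour we'd want from any model of this type.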

Surprisals, you might recall, measure the "surprise" or, more technically, the "information content", of the outcome of a random event. They're measured in bits, with one bit being the surprise associated with the outcome of the toss of a fair coin; the more bits associated with an outcome, the more surprising it is. Surprisals are ideally suited for the second task listed above, that of converting the actual outcome of a game and the pre-game probability of that outcome into a measure of "predictive difficulty".

The formula for calculating the surprisals associated with a given outcome is:

Surprisals = - log2(p), where p is the pre-game probability of the eventual game outcome

By way of example, consider a game where the model had estimated Geelong as 75% favourites. If the Cats won, the surprisals associated with the outcome would be -log2(0.75), or about 0.42 bits. If, instead, the Cats lost, which is intuitively a more surprising outcome, the associated surprisals would be -log2(0.25), or 2 bits. (Were the outcome a draw we'd average these two figures, giving about 1.2 bits.)
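The calculation for the Geelong example can be sketched in a few lines of Python. The function names and the 'win'/'loss'/'draw' labels are my own illustrative choices; the draw rule is the simple average described above.

```python
import math

def surprisal(p):
    """Surprisal, in bits, of an outcome that had pre-game probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

def game_surprisal(p_win, result):
    """Surprisal of a result for a team that was given probability p_win.

    result is 'win', 'loss' or 'draw'; a draw averages the win and loss figures.
    """
    if result == "win":
        return surprisal(p_win)
    if result == "loss":
        return surprisal(1.0 - p_win)
    if result == "draw":
        return 0.5 * (surprisal(p_win) + surprisal(1.0 - p_win))
    raise ValueError("result must be 'win', 'loss' or 'draw'")

# The 75% favourite from the example above
for result in ("win", "loss", "draw"):
    print(result, round(game_surprisal(0.75, result), 3))
```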

If you imagine doing that for every game of VFL/AFL ever played, that's the basis for all that follows.


Let's start by looking at the average surprisals per game associated with the game outcomes for every team that's ever played. Note that I've followed the MAFL convention of combining the results for Sydney and South Melbourne, for the Kangaroos and North Melbourne, and for the Western Bulldogs and Footscray.

On the left are the surprisals for each team when (notionally at least) playing at home, in the middle are the surprisals for each team when playing away, and on the right are the combined surprisals across all games for a particular team whether playing at home or away.
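For anyone wanting to build a table of this shape, here is a rough sketch of the aggregation. It assumes each game is stored as a (home team, away team, surprisal in bits) record and that the combined figure is a simple per-game average across all of a team's games; the team labels and numbers in the example are placeholders, not real results.

```python
from collections import defaultdict

def average_surprisals(games):
    """Average bits per game for each team at home, away and overall.

    games: iterable of (home_team, away_team, bits) records.
    """
    # For each team keep [sum of bits, number of games] per venue type
    totals = defaultdict(lambda: {"home": [0.0, 0], "away": [0.0, 0]})
    for home, away, bits in games:
        totals[home]["home"][0] += bits
        totals[home]["home"][1] += 1
        totals[away]["away"][0] += bits
        totals[away]["away"][1] += 1

    table = {}
    for team, t in totals.items():
        (home_sum, home_n), (away_sum, away_n) = t["home"], t["away"]
        table[team] = {
            "home": home_sum / home_n if home_n else None,
            "away": away_sum / away_n if away_n else None,
            "all": (home_sum + away_sum) / (home_n + away_n),
        }
    return table

# Placeholder data: three games among teams A, B and C
example = [("A", "B", 1.0), ("B", "A", 2.0), ("A", "C", 0.5)]
table = average_surprisals(example)
# Sort from hardest to easiest to predict on the combined measure
for team in sorted(table, key=lambda t: table[t]["all"], reverse=True):
    print(team, table[team])
```

Note that each game's surprisal is credited to both participants, once in the home column for the host and once in the away column for the visitor.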

The table is sorted by this combined surprisal measure and so reveals that Adelaide have been the most difficult team to predict across the entirety of their history. The next most difficult have been three other non-Victorian teams: Port Adelaide, Fremantle and West Coast.

Conversely, the Gold Coast, University and GWS are the teams whose results have been easiest to predict. That ease of prediction has come mostly from the strategy "tip loss".

Most teams have generated broadly similar levels of surprisal when playing at home as they have when playing away, but there are some notable exceptions. Collingwood have been significantly more predictable when playing away than at home whereas St Kilda have been just the opposite. The home and away pattern for Carlton and for Geelong is similar to that of Collingwood, though less differentiated.

Few reading this blog will have, I'm willing to wager, a deep knowledge of all 116 seasons of the competition so in these next few tables I've looked only at more-recent seasons to allow readers to compare their intuitions with the empirical evidence.

Firstly, here's the view for the teams playing at home across the period 1991-2012, where it's been Richmond who've most troubled the tipsters, especially in the last two seasons. It takes a certain devotion to contrarianism to generate 1.3 bits of surprisal per game.

Completely unsurprisingly, the Gold Coast and GWS are the teams associated with the lowest average surprisals. Next-lowest are the Cats, whose well-chronicled consistency at home is reflected here in a paucity of surprisals.

Next we look at teams' away records.

Carlton head the table with Geelong not far behind, the Cats proving to be a much trickier tipping proposition away from the fortress that has been Kardinia Park (and, for more-recent seasons, the G).

The Gold Coast and GWS once again prop up the table, joined now by Fitzroy on the basis of the very-predictable results they churned out over the course of their last 3 seasons in the competition.

Bringing it all together gives us this final table.

All up then it's Carlton, the Roos and Hawthorn that have, probabilistically speaking, been hardest to tip over the 1991-2012 period, and Fitzroy, Gold Coast and GWS that have been easiest.

(As you're scanning and interpreting all these tables, recognise that very high levels of surprisals can only be generated by games where a heavy underdog prevails, but that moderate levels of surprisals are an inevitability whenever two teams of roughly equal chances meet. For example, at the extreme, where a contest is assessed as a 50:50 proposition, exactly 1 bit of surprisal will result whichever team wins. The commonsense of this behaviour of the surprisal measure is that, when teams of roughly equal ability meet, the outcome gives us quite a lot of "information" that we didn't have pre-game (i.e. which team is superior). In that sense, the result, whatever it is, is "surprising".)
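One way to see this is to compute the surprisal we'd expect from a game, on average, before it's played. The sketch below assumes the pre-game probability is well calibrated and ignores draws; the function name is illustrative only.

```python
import math

def expected_surprisal(p):
    """Expected bits from a game whose favourite wins with probability p.

    Assumes p is well calibrated and that draws can be ignored.
    """
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no surprise
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

for p in (0.5, 0.6, 0.75, 0.9):
    print(p, round(expected_surprisal(p), 3))
```

The expected figure peaks at 1 bit for a 50:50 contest and falls away as the favourite becomes more certain, so a team that plays many even-money games will accumulate a moderately high average without ever pulling off a major upset.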


These days a number of clashes are touted as "traditional" rivalries - Essendon v Collingwood, Adelaide v Port Adelaide, and Fremantle v West Coast - often with the implication that they'll be less predictable than other contests because of this status. Whether or not these games do tend to produce surprising outcomes is, now, an empirical question.

Again we'll start by spanning all of history.

This table shows the average surprisals generated per game each time a particular pair of teams have met throughout history.
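A table like this can be assembled by grouping games on the unordered pair of teams involved, so that A hosting B and B hosting A count towards the same matchup. The sketch below again assumes (home team, away team, bits) records, with placeholder team labels and values.

```python
from collections import defaultdict

def matchup_averages(games):
    """Average bits per game for each pair of teams, ignoring venue.

    games: iterable of (home_team, away_team, bits) records.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for home, away, bits in games:
        pair = tuple(sorted((home, away)))  # same key whoever is at home
        sums[pair] += bits
        counts[pair] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}

# Placeholder data: A v B played twice, A v C once
example = [("A", "B", 1.0), ("B", "A", 2.0), ("A", "C", 0.5)]
for pair, bits in sorted(matchup_averages(example).items(),
                         key=lambda item: item[1], reverse=True):
    print(pair, bits)
```

Keying on the ordered (home, away) tuple instead would give the venue-specific version of the same table.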

Interestingly, both the Adelaide v Port Adelaide and West Coast v Fremantle matchups do stand out as games that tend to generate surprising outcomes - though not Collingwood v Essendon clashes.

Other pairings that do tend to generate surprising outcomes are West Coast v Western Bulldogs, Adelaide v Fremantle, Adelaide v Sydney, Carlton v Port Adelaide, and Richmond v Port Adelaide.

Conversely, unsurprising outcomes are more the norm for clashes pitting St Kilda against Collingwood, West Coast against Richmond, or Carlton against Brisbane.

(For interest's sake I've included below tables showing the 25 matchups generating the highest and lowest levels of average surprisals if we take into account which team is playing at home and which away. For brevity's sake I'll not discuss them.)

Finally, again to provide a more-recent flavour to the analysis, here are the average surprisals for each matchup looking only at games from 2000 to 2012.

The standout surprisal factory in this period has been the Carlton v Western Bulldogs clash, though Collingwood v Geelong and Carlton v Essendon clashes have also tended to produce surprising outcomes. In the case of the Pies v Cats matchup this is almost certainly because of the generally even-chance nature of these affairs.

Geelong v Brisbane games have been least likely to surprise in this period, though only slightly less so than St Kilda v Collingwood and St Kilda v Brisbane games.

That's all for this blog. In the next I'll look at surprisals on a season-by-season basis to assess whether the competition has been getting more or less predictable.