Expected Surprisals

If the TAB Bookmaker is any sort of a judge (and, speaking from painful experience, he is) then this week's results, Round 17 of season 2012, hold as much information about the competition as almost any round this season.

What, exactly, do I mean by that?

Well, you might recall the notion of surprisals, which I talked about most recently earlier this season as well as on a number of previous occasions. You can take this concept a step further and ask: what is the expected surprisal content of a particular contest? If, for example, the head-to-head prices for a game are \$2.20 / \$1.68, then the implicit probability of a win by the underdog is given by, following MAFL convention, 1.68/(1.68 + 2.20), which is about 43%. Such a result would constitute -log(0.43,2), the negative base-2 logarithm of 0.43, or 1.21 bits of surprisal.

Conversely, a win by the favourite has probability 57% and associated surprisals of -log(0.57,2) or 0.82 bits. Since a win by the favourite is more expected, it carries less information about the relative merits of the two teams and so produces fewer surprisals when it happens.

The expected surprisal from the outcome of this game, then, is just the probability-weighted sum of the surprisals of the individual results. Here that's 43% x 1.21 + 57% x 0.82, which is about 0.99 bits.
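For anyone who'd like to reproduce that arithmetic, here's a minimal sketch in Python. The function names are my own illustrative choices, and it assumes the simple two-price normalisation described above, with draws ignored.

```python
import math

def implied_probabilities(price_a, price_b):
    """Turn two head-to-head prices into win probabilities.

    Each team's probability is the OTHER team's price divided by the
    sum of the two prices, as in the worked example in the text.
    """
    total = price_a + price_b
    return price_b / total, price_a / total

def surprisal(p):
    """Bits of surprisal for an outcome with probability p."""
    return -math.log2(p)

def expected_surprisal(price_a, price_b):
    """Probability-weighted sum of the surprisals of the two results."""
    p_a, p_b = implied_probabilities(price_a, price_b)
    return p_a * surprisal(p_a) + p_b * surprisal(p_b)

p_dog, p_fav = implied_probabilities(2.20, 1.68)
print(round(p_dog, 2), round(p_fav, 2))                        # 0.43 0.57
print(round(surprisal(p_dog), 2), round(surprisal(p_fav), 2))  # 1.21 0.82
print(round(expected_surprisal(2.20, 1.68), 2))                # 0.99
```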

Because the two teams in this game are of roughly equal ability, as evidenced by their near-equal favouritism, the result of the game involving them carries a relatively large amount of information, a fact which is reflected in the relatively large expected surprisal value of the game's outcome. In comparison, consider a game between teams priced at \$5 and \$1.17. Here an underdog win produces 2.4 bits of surprisal, but has only a 19% chance of occurring, whereas a win by the favourite promises just 0.3 bits of surprisal but has an 81% chance of occurring. The expected surprisal from this game is only 0.7 bits so, in some sense, the result of the game is expected to convey only about 70% of the information that the result of the previous game is expected to provide.
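To see how quickly expected surprisal falls away as a game becomes more lopsided, here's a small sketch that compares the two games above and then sweeps across a few favourite probabilities. The function name and the particular probabilities in the sweep are illustrative assumptions, not anything taken from the TAB data.

```python
import math

def expected_surprisal_from_prob(p):
    """Expected surprisal (bits) of a two-outcome game won by one team with probability p."""
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))

# Underdog probabilities for the two example games, using the same
# price normalisation as in the text.
close_game = expected_surprisal_from_prob(1.68 / (1.68 + 2.20))
lopsided_game = expected_surprisal_from_prob(1.17 / (1.17 + 5.00))

print(round(close_game, 2), round(lopsided_game, 2))
print(round(lopsided_game / close_game, 2))  # ratio of the two expectations

# Illustrative sweep: an even-money game tops out at exactly 1 bit.
for p_fav in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(p_fav, round(expected_surprisal_from_prob(p_fav), 2))
```

The sweep drops from 1 bit at even money to a little under half a bit for a 90% favourite, which is why rounds full of short-priced favourites promise so little information.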

This approach to calculating expected surprisals can be applied historically to the pre-game TAB head-to-head prices for every game since 2006. We can also calculate the actual bits of surprisal that were realised in the game results by calculating -log(p,2) for each game, where p was the probability associated with the team that actually won. (For draws we just average the surprisals associated with a win by either team.)
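The realised figure for a single game, including the treatment of draws, might be sketched like this. The function name and the 'home' / 'away' / 'draw' labels are hypothetical conveniences of mine; only the averaging rule for draws comes from the description above.

```python
import math

def realised_surprisal(p_home, result):
    """Bits of surprisal realised by a game result.

    p_home: pre-game implied probability of a home-team win.
    result: 'home', 'away' or 'draw' (hypothetical labels).
    """
    s_home = -math.log2(p_home)        # surprisal if the home team wins
    s_away = -math.log2(1.0 - p_home)  # surprisal if the away team wins
    if result == "home":
        return s_home
    if result == "away":
        return s_away
    if result == "draw":
        # A draw is scored as the average of the two win surprisals.
        return (s_home + s_away) / 2.0
    raise ValueError("result must be 'home', 'away' or 'draw'")

for outcome in ("home", "away", "draw"):
    print(outcome, round(realised_surprisal(0.57, outcome), 2))
```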

Averaged per game by round, these expected and actual surprisal figures appear in the following tables:

Remember that larger average values imply more expected (left) or actual (right) surprisal.
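The per-round averaging behind figures of this kind could be sketched as follows. The handful of games listed here is entirely made up for illustration (it is not the TAB data), and the record layout is an assumption of mine.

```python
import math
from collections import defaultdict

# HYPOTHETICAL records: (round label, favourite's implied probability, favourite won?)
games = [
    ("R1", 0.55, True),
    ("R1", 0.60, False),
    ("R2", 0.80, True),
    ("R2", 0.70, True),
]

def expected_bits(p_fav):
    """Probability-weighted surprisal of the two possible results."""
    return -(p_fav * math.log2(p_fav) + (1 - p_fav) * math.log2(1 - p_fav))

def actual_bits(p_fav, fav_won):
    """Surprisal of the result that actually occurred."""
    return -math.log2(p_fav if fav_won else 1 - p_fav)

# Accumulate [expected total, actual total, game count] for each round.
totals = defaultdict(lambda: [0.0, 0.0, 0])
for rnd, p_fav, fav_won in games:
    totals[rnd][0] += expected_bits(p_fav)
    totals[rnd][1] += actual_bits(p_fav, fav_won)
    totals[rnd][2] += 1

# Per-game averages by round: (expected, actual).
averages = {rnd: (e / n, a / n) for rnd, (e, a, n) in totals.items()}
for rnd, (exp_avg, act_avg) in averages.items():
    print(rnd, round(exp_avg, 2), round(act_avg, 2))
```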

Based on the opening prices for Round 17 of Season 2012, then, the average game in the round carries 0.87 bits of expected surprisal, lower only than the figure of 0.90 per game for Round 10 - hence my opening claim that this round carries about as much information about the competition as almost any other round so far this season.

Note that the average expected surprisal per game for the entirety of Season 2012 stands at just 0.75 bits per game, making this the most anticipatedly unsurprising season since sometime before 2006. Looking at the equivalent "Total" row in the right-hand table, we can see that the season's broadly been living up to these expectations. An average game this season has produced only 0.73 bits of surprisal, making it only a whisker less predictable than last season, which produced just 0.72 bits of surprisal per game.

Some of you might prefer to see this data in chart form:

For me, the most notable features of these tables are that:

• The first round of the season, and the Finals, produce the highest average expected surprisal figures. In the first round this is probably due to the caginess of the TAB Bookmaker and his consequent unwillingness to offer juicy odds about any team. In the finals it's almost certainly due to the more evenly-matched nature of opponents in these games.
• Actual surprisals show a much greater level of variability from round to round, tending to peak in Rounds 1 through 7 before generally staying lower for the remainder of the Home-and-Away season, with a couple of spikes in Rounds 17 and 19. The first week of the finals is another source of higher-than-average surprisal production, followed by a couple more weeks of general predictability, especially in the Prelims, capped by a final spurt of surprisal generation in Grand Finals.

Finally, I note that the TAB Bookmaker's all-game, all-Season expected surprisal value of 0.84 bits per game is impressively close to the all-game, all-Season actual surprisal value of 0.86 bits per game. Calibration, thy name is Bookmaker.
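Why should near-equality of those two averages hint at calibration? A rough way to see it is with a simulation: if results really do occur with the probabilities implied by the prices, then average realised surprisal should settle close to average expected surprisal over enough games. The sketch below uses invented favourite probabilities and a fixed random seed, both arbitrary assumptions, and bear in mind that matching averages are consistent with good calibration rather than proof of it.

```python
import math
import random

random.seed(2012)  # arbitrary seed so the run is repeatable

n_games = 20000
expected_total = 0.0
actual_total = 0.0

for _ in range(n_games):
    # HYPOTHETICAL favourite probability for this simulated game.
    p_fav = random.uniform(0.5, 0.95)
    # A perfectly calibrated world: the favourite wins with exactly that probability.
    fav_won = random.random() < p_fav
    expected_total += -(p_fav * math.log2(p_fav) + (1 - p_fav) * math.log2(1 - p_fav))
    actual_total += -math.log2(p_fav if fav_won else 1 - p_fav)

expected_avg = expected_total / n_games
actual_avg = actual_total / n_games
print(round(expected_avg, 2), round(actual_avg, 2))
```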