A Proposition Bet on the Score

It's been a while since I've written about a proposition bet. So let me remedy that by crafting one from the two components of a team's score: the number of goals and the number of behinds it scored.

CONDITIONALLY GUESSING GOALS SCORED OR BEHINDS SCORED

Consider this proposition: 

I'm going to select a game and a team at random and tell you how many behinds that team scored. Your job is to "predict" how many goals that same team scored in that game. Conversely, you're going to select another game and a team at random and tell me how many goals that team scored, and I'm going to "predict" how many behinds it scored. If we're both right or both wrong, no money changes hands; if only one of us is correct, the loser pays the winner.

I'm proposing even-money odds, which sounds fair to me. Are you in?

Well, if the games we used for the wagering came from VFL/AFL history before 1981, you'd be well-advised to take that bet, because in about 51.1% of the games where money changed hands it would have been paid to you. But if we'd instead selected games at random from the period since then, I'd have collected 52.7% of the time that money changed hands.

In fact, if we'd settled up at the end of each season, I'd have prevailed in every season of that later period except 2000, when we'd have broken even. (What's charted at right is, for each season, the percentage of games in which money changed hands where the bettor asked to guess goals given behinds would have collected.)
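For anyone who wants to check the settlement arithmetic, here's a minimal Python sketch of the bet's rules and of the percentage plotted in that chart: the share of money-changing games won by the goals-given-behinds bettor. The function names and the rounds of data are hypothetical, invented for illustration rather than drawn from the VFL/AFL records.

```python
from typing import Iterable, Tuple

def settle(you_correct: bool, me_correct: bool) -> int:
    """Return +1 if you collect, -1 if I collect, 0 if no money changes hands."""
    if you_correct and not me_correct:
        return 1
    if me_correct and not you_correct:
        return -1
    return 0  # both right or both wrong: no payment

def share_collected_by_you(rounds: Iterable[Tuple[bool, bool]]) -> float:
    """Fraction of paying rounds won by the goals-given-behinds guesser."""
    outcomes = [settle(y, m) for y, m in rounds]
    paying = [o for o in outcomes if o != 0]
    if not paying:
        raise ValueError("no round paid out")
    return sum(1 for o in paying if o == 1) / len(paying)

# Hypothetical rounds: (you guessed goals correctly, I guessed behinds correctly)
rounds = [(True, False), (False, True), (False, True), (True, True), (False, False)]
print(share_collected_by_you(rounds))  # 1 of 3 paying rounds -> 0.333...
```

Rounds where we're both right or both wrong drop out of the denominator entirely, which is why the chart is expressed relative to games in which money changed hands.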

From an information-theoretic point of view, what we're asking here is how much randomness (or information) remains in the number of goals a team scored once we know how many behinds it scored (ie H(Goals | Behinds)), and how this compares with how much remains in the number of behinds a team scored once we know how many goals it scored (ie H(Behinds | Goals)). These are conditional entropies, which can be estimated in R using the condentropy function of the infotheo package.
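The analysis itself used R's infotheo::condentropy. As an illustration of what that quantity is, here's a Python sketch of the plug-in (empirical) estimate via the identity H(X | Y) = H(X, Y) - H(Y), in nats. I'm assuming this corresponds to condentropy's default empirical estimator; the helper names and the six (goals, behinds) pairs are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence

def entropy(counts: Counter) -> float:
    """Plug-in (empirical) Shannon entropy in nats from a Counter of frequencies."""
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def cond_entropy(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """H(X | Y) = H(X, Y) - H(Y), estimated from paired observations."""
    if len(x) != len(y):
        raise ValueError("x and y must be the same length")
    return entropy(Counter(zip(x, y))) - entropy(Counter(y))

# Hypothetical (goals, behinds) pairs for six team-games
goals   = [9, 9, 12, 15, 10, 12]
behinds = [8, 8, 10, 10, 11, 10]

# Knowing 10 behinds leaves doubt between 12 and 15 goals, so this is positive
print("H(Goals | Behinds) =", cond_entropy(goals, behinds))
# In this toy data each goals tally pins down the behinds, so this is ~0
print("H(Behinds | Goals) =", cond_entropy(behinds, goals))
```

Computing the conditional entropy as a difference of two joint/marginal entropies avoids building the conditional distributions explicitly.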

The following chart tracks these two conditional entropy measures by season and reveals that H(Goals | Behinds) < H(Behinds | Goals) from 1897 to about 1970, and that the reverse has been consistently true from about 1982 onwards, which, as I revealed above, is when the bet I originally proposed became more advantageous for me.

Another interesting feature of this chart is that both H(Goals | Behinds) and H(Behinds | Goals) have been increasing since about 1982, suggesting that knowing either component of a team's score has become less revealing of the other, whichever of the two you happen to know.

In fact, there's been a general increase in both conditional entropy measures, but especially in H(Goals | Behinds), pretty much from the very first VFL season onwards. Over the same period, the linear correlation between Goals Scored per game and Behinds Scored per game has remained positive but trended downwards.
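Tracking that correlation by season just means grouping team-games by season and computing a correlation within each group. Here's a sketch assuming "linear correlation" means the Pearson coefficient; the rows are invented (season, goals, behinds) triples, not actual results, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def pearson(xs, ys):
    """Pearson (linear) correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_by_season(rows):
    """rows: iterable of (season, goals, behinds), one per team-game.
    Returns {season: correlation between goals and behinds in that season}."""
    goals, behinds = defaultdict(list), defaultdict(list)
    for season, g, b in rows:
        goals[season].append(g)
        behinds[season].append(b)
    return {s: pearson(goals[s], behinds[s]) for s in sorted(goals)}

# Hypothetical data, three team-games per season
rows = [(2013, 10, 12), (2013, 14, 15), (2013, 8, 9),
        (2014, 12, 8), (2014, 9, 14), (2014, 15, 11)]
for season, r in correlation_by_season(rows).items():
    print(season, round(r, 3))
```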

So, how exactly would I benefit from this knowledge in my proposition bet? Well, if I aggregate the scoring data for every game from season 1982 to the present (2014 R17), I get the following cross-tabulation, in which each entry is the number of times a team finished a game with the number of goals (rows) and number of behinds (columns) shown.

For example, there was one game where a team finished with 1 goal and 7 behinds, and 68 games where a team finished with 9 goals and 8 behinds.

To find the optimal strategy for predicting the most likely number of goals scored by a team given knowledge of the number of behinds it scored, we find the maximum in each column, since the row holding a column's maximum is the goal tally most often recorded alongside that number of behinds. I've coloured these cells orange (or green if they're also the row maximum). Summing these orange cells tells us how often we'd correctly "guess" the number of goals scored by a team if we knew how many behinds it had scored (and had prior access to this cross-tab).

Similarly, the optimal strategy for predicting the most likely number of behinds scored by a team given knowledge of the number of goals it scored can be found by identifying the row maxima, which I've coloured grey (or green if they're also the column maximum). The sum of these maxima is 1,281, which tells us that optimally guessing behinds given goals would have paid out on 132 more occasions than optimally guessing goals given behinds.
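The column-maxima and row-maxima calculation can be sketched in Python as below, assuming the data arrive as one (goals, behinds) pair per team-game. Ties for a maximum don't matter, since either tied guess would have been right equally often. The function name and the toy data are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

def optimal_hits(pairs: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """pairs: (goals, behinds) per team-game.
    Returns (correct guesses of goals given behinds,
             correct guesses of behinds given goals)
    when each guess is the modal value for the known component."""
    table = Counter(pairs)  # cross-tab: (goals, behinds) -> count
    col_max = {}  # behinds -> largest count in that column
    row_max = {}  # goals -> largest count in that row
    for (g, b), n in table.items():
        col_max[b] = max(col_max.get(b, 0), n)
        row_max[g] = max(row_max.get(g, 0), n)
    return sum(col_max.values()), sum(row_max.values())

# Hypothetical team-game results
pairs = [(9, 8), (9, 8), (12, 10), (15, 10), (10, 11), (12, 10)]
print(optimal_hits(pairs))  # (5, 6)
```

On this toy data, guessing behinds given goals is right six times to five, the same direction as the 1982-onwards pattern.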

CONDITIONALLY GUESSING THE WINNING OR LOSING SCORE

The same methodology can be employed to determine whether it's easier to guess the winning score in a game knowing the losing score, or to guess the losing score knowing the winning score.

Take a moment to consider which of these tasks might be easier.

Here, it turns out that, with the exception of only a few seasons, H(Winning Score | Losing Score) > H(Losing Score | Winning Score). In this sense it's fair to say that it's harder to predict the winning score knowing the losing score than vice versa. One factor that immediately springs to mind as a probable contributor to this result is that, once the winning score is known, the losing score is confined between 0 and that winning score, whereas once the losing score is known, the winning score is bounded below by it but, theoretically, unbounded above (though practically bounded at, say, 250).
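Applying the same methodology to winning and losing scores first requires converting each team's goals and behinds into a score (6 points per goal, 1 per behind) and sorting each game's pair of scores into winner and loser. Here's a sketch that assumes a draw simply records the same score as both the winning and losing score; the helper names and the four games are hypothetical.

```python
import math
from collections import Counter

def entropy(counts: Counter) -> float:
    """Plug-in Shannon entropy in nats."""
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def cond_entropy(x, y) -> float:
    """H(X | Y) = H(X, Y) - H(Y), empirical estimate in nats."""
    return entropy(Counter(zip(x, y))) - entropy(Counter(y))

def winning_losing(games):
    """games: ((goals, behinds), (goals, behinds)) for the two teams.
    Returns parallel lists of winning and losing scores.
    Assumption: a draw puts the same score in both lists."""
    win, lose = [], []
    for (g1, b1), (g2, b2) in games:
        s1, s2 = 6 * g1 + b1, 6 * g2 + b2  # 6 points a goal, 1 a behind
        win.append(max(s1, s2))
        lose.append(min(s1, s2))
    return win, lose

# Hypothetical games
games = [((10, 10), (8, 6)), ((12, 8), (8, 6)),
         ((7, 12), (11, 4)), ((15, 9), (10, 10))]
W, L = winning_losing(games)
print("H(Winning | Losing) =", cond_entropy(W, L))
print("H(Losing | Winning) =", cond_entropy(L, W))
```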

Making a viable proposition bet from this knowledge would require some adjustment to the settlement process we described for the bet involving goals and behinds. In that case it made practical sense to insist that a payout required an exact guess of the number of goals or behinds scored. Here, such a requirement would make payouts extremely rare, so instead we might determine the winner of the wager on the basis of who was nearer the actual score.
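One way to codify a nearest-score settlement is sketched below. It assumes that, as in the goals-and-behinds bet, each bettor guesses about a different game, so each error is measured against the actual score in that bettor's own game, and that equal errors mean no money changes hands. Both assumptions, and the function name, are mine rather than part of any settled rule.

```python
def settle_nearest(your_actual: int, your_guess: int,
                   my_actual: int, my_guess: int) -> int:
    """+1 if your guess was nearer your game's actual score than mine was to
    my game's, -1 if mine was nearer, 0 on a tie (assumed: no payment)."""
    your_err = abs(your_guess - your_actual)
    my_err = abs(my_guess - my_actual)
    if your_err < my_err:
        return 1
    if my_err < your_err:
        return -1
    return 0

# You guessed a winning score of 88 (actual 95); I guessed a losing score of 64 (actual 60)
print(settle_nearest(95, 88, 60, 64))  # -1: my error of 4 beats your 7
```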

I think it's correct to say that the conditional entropy results above imply that an optimal strategy for predicting the winning score given the losing score would be dominated by an optimal strategy for predicting the losing score given the winning score, but this is something I'd like to test empirically at some point.

In the meantime, I'll finish with a chart of the time series of the correlation between winning and losing scores for games in each season. In recent seasons this correlation has been at levels not seen since the 1920s. I'm not sure what to make of this, or what to attribute it to.