Are Some Games Harder to Predict Than Others?

If you've ever had to enter tips for an office competition where the sole objective was to predict the winner of each game, you'll intuitively recognise that the winners of some games are inherently harder to predict than others.

In fact, the level of difficulty can be quantified for each game via the expected surprisal metric, which has it that the amount of surprise we expect from the result of a particular game (ignoring draws) is given by:

Expected Surprisal = - p log p - (1-p) log (1-p)

where p is the probability of victory for one of the teams (so 1-p is the probability for the other) and the logs are measured base 2.

This measure, whose unit is bits, is maximised when p is 0.5 and shrinks as p moves towards 0 or towards 1. The simple intuition is that the hardest games of all to predict - those with the highest expected surprisal - are those where the teams are equal favourites, and the easiest to predict are those where there's a very short-priced favourite. Intuitive - and fairly obvious.
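As a minimal sketch of the calculation (the function name expected_surprisal is purely illustrative), the metric can be evaluated for a few probabilities to see how it peaks at 1 bit when p is 0.5 and falls away as one team becomes a stronger favourite:

```python
import math

def expected_surprisal(p: float) -> float:
    """Expected surprisal, in bits, for a game with two possible winners."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    total = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # by convention 0 * log(0) is treated as 0
            total -= q * math.log2(q)
    return total

# Equal favourites, a moderate favourite, and a short-priced favourite
for p in (0.5, 0.75, 0.9):
    print(f"p = {p:.2f}: {expected_surprisal(p):.3f} bits")
```

A game with a 90% favourite carries a little under half a bit of expected surprisal, compared with a full bit for a coin-toss game.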

But what if our task is not just to predict the winner of a game but also the size of that victory - the final game margin as we call it on MoS, the difference between the home team and the away team score? Are there games for which this margin is harder to predict?

To address this issue we first need a difficulty metric, and for this task I'm going to use Expected Error, defined as follows:

Expected Error = Expected(abs(Actual Margin - Expected Margin))

In words, the metric is the average absolute amount by which we expect our margin prediction to be in error assuming that our margin prediction is equal to the true expected game margin. To draw on the terminology used elsewhere on MoS, it's our expected Absolute Prediction Error (APE) if our predictions are unbiased (the correct statistical terminology here is MAE, which stands for Mean Absolute Error).

So, if we (correctly) expect that Geelong will win by 20 points but the underlying randomness of the game outcome is such that, on average, we can expect to be in error by 35 points, then the MAE for the game is 35 points. If, for another game, our MAE is only 30 points then this game's margin could be said to be easier to predict than the margin in the Geelong game.
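To make the Geelong example concrete, here is a rough sketch that estimates the Expected Error by simulation. It assumes, purely for illustration, that actual margins are Normally distributed around the expected margin; that distributional assumption and the standard deviations used are not taken from this post, and were chosen only so that the two MAEs land near 35 and 30 points.

```python
import math
import random

def expected_abs_error(expected_margin, sd, n_games=100_000, seed=1):
    """Monte Carlo estimate of Expected(abs(Actual Margin - Expected Margin)).

    ASSUMPTION (illustrative only): actual margins are Normal with mean
    `expected_margin` and standard deviation `sd`.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_games):
        actual_margin = rng.gauss(expected_margin, sd)
        total += abs(actual_margin - expected_margin)
    return total / n_games

# Hypothetical standard deviations; the expected margin of 20 is from the example
for label, sd in (("Geelong game", 43.9), ("other game", 37.6)):
    simulated = expected_abs_error(20, sd)
    closed_form = sd * math.sqrt(2 / math.pi)  # exact MAE for a Normal variable
    print(f"{label}: simulated MAE {simulated:.1f}, closed form {closed_form:.1f}")
```

Under this assumption the MAE depends only on the spread of outcomes and not on the size of the expected margin, which is why the interesting question is whether that spread differs from game to game.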

The question is: can we claim that one particular game has an inherently greater MAE than another and, if so, what are the characteristics of that game that make it so?

I have touched on this issue before in MoS, albeit framed slightly differently, in a blog post where I investigated the relationship between the MAE from a fitted model and the Bookmaker's pre-game Implicit Home Team Probability, and between the MAE from a different model and team MARS Ratings and the Interstate Status of a game. What I did there was fit models to the margins from a range of games and then investigate whether the fitted errors from those models - the differences between the actual and the fitted game margins - were related to any of the regressors or to the size of the predicted margin. I found no significant relationships in either case.

A more technical way of stating that conclusion is that I found no evidence for heteroskedasticity in the model errors. When looking for heteroskedasticity you need to, implicitly at least, posit a model of how that heteroskedasticity manifests - in other words, what variables are related to greater or lesser variance in your prediction errors. In that blog post I was positing that predictive errors would be related to, for example, the strength of favouritism of the home team, the relative abilities of the two teams, the venue at which the game was played, or the size of the margin that a model predicted. I found no such links.

Looking back over that analysis, I realised that I've never investigated a relationship between game margin prediction errors and the expected score of the home team, the expected score of the away team, or the aggregate of these scores, mostly because I've lacked a basis on which to estimate expectations for these scores for a given game.

That's changed now with the more recent work on modelling these aspects, in particular:

The team and total score models provide an empirical basis on which to test the relationship between expected scores and MAEs, while the theoretical model of scoring provides a, well, theoretical basis to explore the same issue. To make it explicit, what we're asking is whether:

  • games where the Home team is expected to score more points are harder (or easier) to predict
  • games where the Away team is expected to score more points are harder (or easier) to predict
  • games where the Total score of the Home and Away teams is expected to be higher are harder (or easier) to predict

We'll also revisit the relationship between the expected game margin and the predictive difficulty of a game.


Firstly, let's investigate the relationship between the absolute errors from a model fitted to game margins for the period 2007 to 2013 and:

  1. the expected Home team score in those games, estimated by fitting a model to Home team scores over the period
  2. the expected Away team score in those games, estimated by fitting a model to Away team scores over the period
  3. the expected Total score in those games, estimated by fitting a model to total Home plus Away team scores over the period

I'll not record here the details of the models that were fitted for these purposes, but suffice to say that they were similar in terms of functional form and regressor selection to those I've published previously on MoS. As proof of the models' adequacy I'd note that the R-squared for the Home team score model was 27%, for the Away team score model was 28%, for the Total score model was 14%, and for the Game Margin model was 37%.

The charts below are scatterplots of the three relationships.

There's no evidence for a relationship between empirical MAE and the Expected Home team score, some evidence for a relationship with the Expected Away team score, and only weak evidence for a relationship with the Expected Total score in the most-commonly predicted range of Expected Total scores.

The data though is very noisy, so it might be that a larger sample could reveal the existence of some stronger relationships. Though we've more than 1,300 data points in each chart, the number of observations at the extremes of the prediction ranges is quite small.

So, to horribly mangle an old economist joke, we know that the evidence is weak at best in practice; let's see if it's stronger in theory.


To briefly recap, in a recent blog I asserted that we could model the Home team and Away team scores with a bivariate distribution formed by a combination of a bivariate Negative Binomial to model Scoring Shots and bivariate Beta Binomials (with 0 covariance) to model the conversion of those Scoring Shots into goals by the two teams. The relevant equations are shown in the inset.
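For readers who'd like to experiment, the following is a simplified, single-team sketch of that scoring process rather than the fitted bivariate model itself. It ignores the correlation between the two teams' Scoring Shots, draws the Negative Binomial as a gamma-mixed Poisson, and assumes a Beta Binomial parametrisation in which a larger theta means more variable conversion. The size and theta values are hypothetical placeholders, not the estimates from the earlier blog.

```python
import math
import random
import statistics

def poisson_draw(rng, lam):
    """Draw from a Poisson(lam) using Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_scores(mean_shots, size, conversion, theta, n_games=10_000, seed=1):
    """Simulate one team's scores: Scoring Shots first, then conversion."""
    rng = random.Random(seed)
    # ASSUMED parametrisation: Beta(alpha, beta) with mean `conversion`
    alpha = conversion / theta
    beta = (1 - conversion) / theta
    scores = []
    for _ in range(n_games):
        # Negative Binomial with mean `mean_shots` as a gamma-mixed Poisson
        game_rate = rng.gammavariate(size, mean_shots / size)
        shots = poisson_draw(rng, game_rate)
        # Beta Binomial: a game-specific conversion rate, then one trial per shot
        game_conversion = rng.betavariate(alpha, beta)
        goals = sum(rng.random() < game_conversion for _ in range(shots))
        scores.append(6 * goals + (shots - goals))  # 6 per goal, 1 per behind
    return scores

SIZE, THETA = 100.0, 0.01  # hypothetical values, for illustration only
for mean_shots in (30, 35):
    scores = simulate_scores(mean_shots, SIZE, 0.53, THETA)
    print(f"{mean_shots} expected Scoring Shots: mean score "
          f"{statistics.mean(scores):.1f}, SD {statistics.pstdev(scores):.1f}")
```

Because the parameter values are placeholders, the standard deviations this prints will not match the figures reported from the fitted model; the point is only to show the two-stage structure of the simulation.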

When using the Negative Binomial to simulate the outcome for games with different expected Scoring Shot values, I've assumed that the size parameter is a constant. Though I didn't report it in the previous blog, I do have justification for this assumption. Fitting the Negative Binomial to empirical data as I did in the previous blog, but assuming that size varies with the same regressors that I used to model lambda, results in models both for Home team and Away team scoring with a higher AIC metric (ie an inferior model).

One implication of this assumption is that the variance of each team's Scoring Shot production will increase with the mean of its Scoring Shot distribution, lambda, since the variance of a negative binomial is related to its two parameters by Variance = lambda + lambda^2 / size. (Note that lambda is non-negative and our estimated size parameter is positive.)
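A quick numerical check of that mean-variance relationship, using the standard mean-and-size form of the Negative Binomial variance (written out in the docstring). The size value here is a hypothetical placeholder, not the fitted estimate:

```python
def neg_binomial_variance(lam, size):
    """Variance of a Negative Binomial with mean lam: lam + lam^2 / size."""
    return lam + lam ** 2 / size

SIZE = 100.0  # hypothetical constant size parameter, for illustration only

for lam in (20, 25, 30, 35):
    variance = neg_binomial_variance(lam, SIZE)
    print(f"lambda = {lam}: variance = {variance:.2f}, SD = {variance ** 0.5:.2f}")
```

With size held constant and positive, each increase in lambda adds to both terms, so the variance can only grow as the expected number of Scoring Shots rises.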

A similar property emerges when we use the Scoring Shot distribution and the relevant Goals distribution to produce simulated Scores, as shown in the table at left where I've recorded the standard deviation of the simulated Home team (left column) and Away team (right column) scores under different assumed values for the expected number of Scoring Shots. Put simply, the spread of Actual Scores for a team about its Expected Score grows wider as that Expected Score increases.

For example, a Home team expected to generate 30 Scoring Shots, and so having an Expected Score of 30 x 6 x 0.53 + 30 x (1 - 0.53) = 109.5 points, will produce Actual Scores with a standard deviation of 25.1 points per game. A Home team expected to instead generate 35 Scoring Shots will produce Actual Scores with a standard deviation of 27.1 points per game. So, in this sense, if our task was to predict the number of points that the Home team would score in a game and our metric was MAE, we'd expect to do more poorly for Home teams expected to score more points. The same statement would pertain also if we were asked to predict the Away team score.
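The Expected Score arithmetic in that example is easy to reproduce. In this small sketch the function name expected_score is illustrative, and the 0.53 conversion rate is the one used in the example above:

```python
def expected_score(scoring_shots, conversion_rate):
    """Expected points: 6 for each goal plus 1 for each behind."""
    expected_goals = scoring_shots * conversion_rate
    expected_behinds = scoring_shots * (1 - conversion_rate)
    return 6 * expected_goals + expected_behinds

for shots in (30, 35):
    print(f"{shots} Scoring Shots: {expected_score(shots, 0.53):.2f} points")
```

Each extra expected Scoring Shot is worth 6 x 0.53 + 0.47 = 3.65 points at this conversion rate, so moving from 30 to 35 Scoring Shots lifts the Expected Score from 109.5 to 127.75 points.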

(The reason, by the way, for the higher levels of variability for Away teams compared to Home teams with the same value for Expected Scoring Shots is that Away teams exhibit, empirically, greater levels of variability in their Scoring Shot conversion rates, which is reflected in our modelling by the higher value of the theta parameter in the Beta Binomial.)

There is some empirical support for this theoretical result from the models fitted earlier. In the charts that follow we're looking at the relationship between the predicted Home (left) or Away (right) team score and the fitted error in that prediction.

Again though, there is considerable noise in the data.

Still, broadly speaking, it seems fair to say that the higher we expect the Home team's score to be, the larger the absolute error we should expect for that prediction. And we can say the same thing for Away team score predictions.

Claiming that it's harder to predict the Home team's or the Away team's score as the expected level of those scores increases gives us a first inkling about the characteristics of games where the margin (ie the difference between the Home and Away team scores) might be harder to predict.

To see this, recall that the variance of the difference between two random variables is equal to the sum of the variances of the individual random variables less twice their covariance, which here can be expressed as:

Var(Home Score - Away Score) = Var(Home Score) + Var(Away Score) - 2 x Cov(Home Score, Away Score)

Now we know from what we've just done that the two variance terms increase with the mean of the underlying Scoring Shot distributions. By simulation we can also show that the covariance becomes more negative as the expected score of either the Home or the Away team increases, which, because the covariance is subtracted, increases the variance of the difference in Home and Away scores further still. So, theoretically, we should find that the variability of game margins increases as the expected score for either team increases - which is actually a result we found in the previous blog and reported in the table that I've repeated below.
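The role of the covariance term can be checked on any set of paired scores. The sketch below uses a handful of made-up Home and Away scores (not real or simulated AFL results) that happen to be negatively correlated, and confirms that the variance of the margin equals the two score variances less twice the covariance:

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

def variance(xs):
    return covariance(xs, xs)

# Made-up scores, for illustration only
home_scores = [112, 95, 130, 88, 104, 121]
away_scores = [80, 99, 70, 105, 92, 77]
margins = [h - a for h, a in zip(home_scores, away_scores)]

cov = covariance(home_scores, away_scores)
margin_variance = variance(margins)
decomposed = variance(home_scores) + variance(away_scores) - 2 * cov

print(f"Covariance of Home and Away scores: {cov:.1f}")
print(f"Variance of margins:                {margin_variance:.1f}")
print(f"Var(Home) + Var(Away) - 2 x Cov:    {decomposed:.1f}")
```

Because the covariance in this toy data is negative, subtracting twice its value makes the margin variance larger than the sum of the two score variances, which is the mechanism described above.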

For that table I simulated a very broad range of expected Home team and Away team Scoring Shot levels. Many of the combinations in that table would never apply to a real AFL game, such as a team expected to generate 30 Scoring Shots facing another expected to generate 40. In order to run the simulation on more realistic examples, I fitted models to Home team and Away team Scoring Shot production and then repeated the simulations using only the pairs of fitted values for each game that emerged from these models. Interestingly, if we do that and then create a scatterplot of the simulated Game Margins versus the Margin Error, we get the chart shown at right.

That curious hockey-stick shape emerges - though it's less dramatic here because the range of the y-axis is larger - if we create a similar scatterplot using the empirical Game Margin model we fitted earlier and the fitted errors for that model.

Lastly, if we take the results from the latest round of simulations and scatterplot the expected absolute game margin error against the expected total game score we obtain the plot you see below, which implies a much stronger relationship between these two aspects than we saw when we created a similar plot using the empirical data earlier.

We should note though the relatively small range of absolute margin error values that the bulk of the observations inhabit. For games where the expected Total score is about 180 points the theoretical expected absolute game margin error is about 28 points, while for games where the expected Total score is about 200 the theoretical expected absolute game margin error is about 30 points. It's easy to see how an effect this small could get lost in the noise.


So where does that leave us?

Firstly, I'd say that there's some theoretical but relatively scant (though possibly lost in the noise) empirical evidence that games expected to be higher scoring are likely to be harder to predict in the sense of having a higher expected absolute margin error.

There's also theoretical AND empirical evidence that predicted large Away team victories are likely to be harder to predict, though the theoretical estimate of the additional difficulty is higher than that we find empirically. The theoretical results also suggest a moderate increase in difficulty for predictions of larger Home team wins, though the empirical evidence does not support this.

Finally, in addition to identifying some characteristics of games that might make their final margin harder to predict in practice, this analysis has demonstrated that the theoretical model of team scores, while not a perfect model of actual team scoring, appears to be an adequate first-order approximation for describing key features of real AFL game scores.