Modelling Miscalibration

If you're making probability assessments, one of the things you almost certainly want them to be is well-calibrated, and we know both from first-hand experience and from a variety of analyses here on MatterOfStats over the years that the TAB Bookmaker is exactly that.

Well, he is at least well-calibrated as far as I can tell. His actual probability assessments aren't directly available to anyone but must, instead, be inferred from his head-to-head prices, and I've come up with three ways of making this inference: an Overround-Equalising, a Risk-Equalising and an LPS-Optimising (LPSO) approach.
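To make the three approaches concrete, here's a minimal sketch of how each might back out a home team probability from a pair of head-to-head prices. The Overround-Equalising and Risk-Equalising formulas below are my assumed versions of the standard definitions (OE divides the inverse price by the total overround, RE subtracts half the overround from the inverse price), and the LPSO constant is the 1.0281% that appears later in this post; it's written in Python rather than the R used for the original analysis.

```python
# Sketch of the three inference approaches (assumed definitions):
#   OE   - divide the inverse price by the total overround
#   RE   - subtract half the total overround from the inverse price
#   LPSO - subtract a fixed amount (the 1.0281% fitted later in the post)

def implicit_probs(home_price, away_price, lpso_const=0.010281):
    inv_h, inv_a = 1 / home_price, 1 / away_price
    overround = inv_h + inv_a - 1          # total vig embedded in the prices
    oe = inv_h / (inv_h + inv_a)           # Overround-Equalising
    re = inv_h - overround / 2             # Risk-Equalising
    lpso = inv_h - lpso_const              # LPS-Optimising
    return {"OE": oe, "RE": re, "LPSO": lpso}

print(implicit_probs(1.60, 2.45))
```

For a typical pair of prices like these, all three approaches land within about a percentage point of one another, which is why their probability assessments are so highly correlated.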

All three of these approaches produce very similar and therefore highly-correlated probability assessments, the absolute calibration errors for which are mostly within about 5 percentage points across the entire range of possible probabilities.

There are, however, some probability ranges where the calibration error is larger than elsewhere, and one way to model these deviations is via multivariate adaptive regression splines, which essentially allow us to fit piecewise linear curves to different ranges of our input values. In short, what we'll be doing is fitting a regression model to the implicit probabilities that will allow us to "correct" them on the basis of actual winning rates. I used the earth package in R for this purpose.
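A toy sketch might help here. MARS builds its fitted curve from "hinge" functions of the form max(0, x - k), where each knot k marks a point at which the slope is allowed to change. The intercept, slopes and knot below are invented purely to show the mechanics (and it's Python rather than the R earth package used for the actual fitting):

```python
# A made-up MARS-style fit: a base linear term plus one hinge term,
# so the slope steepens once x passes the (invented) knot at 0.6.

def hinge(x, knot):
    return max(0.0, x - knot)

def mars_style_fit(x):
    return 0.05 + 0.9 * x + 0.5 * hinge(x, 0.6)

for x in (0.3, 0.6, 0.9):
    print(x, round(mars_style_fit(x), 3))
```

Below the knot the fit behaves like a simple straight line; above it, the hinge term switches on and the slope increases, which is exactly the flexibility that lets a MARS model track calibration errors that differ across probability ranges.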


For this analysis I drew on the data for the 1,536 games during the seasons 2006 to 2013 inclusive, excluding the 17 games that finished in draws. Three models were fitted, each using as its sole regressor one of the three home team probability estimates produced from the TAB Bookmaker's pre-game head-to-head prices. The target variable for all three models was the result of each game, win or lose, from the home team's perspective.

All three models produced broadly similar fits, the generalised R-squared for the model using the LPS-Optimising (LPSO) approach proving to be slightly superior to that using the Overround-Equalising (OE) approach, which was in turn microscopically superior to that using the Risk-Equalising (RE) approach.

The OE and RE models each had one "knot" (ie a value at which the relationship between the input and output variables changes slope). For the RE model that knot was at a probability of 62.5%, while for the OE model it was at a probability of about 61.7%. More complex was the LPSO model, which had two knots, one at a probability of about 72.5% and another at about 85.2%. All three models had non-zero intercepts.

For probabilities less than the first knot in the OE model (ie about 62%), the three models yield very similar outputs (fitted probabilities) for a given input (implicit probability).

For implicit probabilities less than 25% - that is, games where the approach we're using to infer the Bookmaker's probability from his prices suggests he rates the home team's chances at 25% or less - the modelled result data suggests that the RE, OE and LPSO approaches all underestimate the home team's true probability, and by an increasing amount as the implicit probability approaches zero.

For example, for those games where the approaches estimate the home team as about 15% chances the home teams' actual success rate is modelled to be about 20%, and for games where the estimates are about 10% the modelled success rate is nearer 17 to 18%.

For input probabilities in the range from about 25% to 65% the modelled probabilities are roughly equal to the input probabilities - in other words, all three approaches are very well-calibrated in this range. This range of probabilities represents about half of all games - or a little less, depending on which approach we use.

The RE and OE models also suggest that the RE and OE approaches tend to understate the true home team probabilities for input probabilities higher than about 70%, as they do for probabilities less than 25%. The understatement is largest for input probabilities around 80%, where the difference is about 5 to 6 percentage points.

In contrast to the RE and OE models, the LPSO model suggests that probabilities derived using the LPSO approach tend to overstate home teams' true probabilities for input probabilities up to about 77%. For input probabilities greater than this, the LPSO model, like the RE and OE models, finds that the LPSO approach to estimating the Bookmaker's true probability assessment tends to understate the home team's true chances.


Based on the generalised R-squared metric we've seen that the LPSO model provides a modestly superior fit.

We can also measure the quality of the probability estimates produced by these models using probability scores, in particular the Brier Score and the Log Probability Score. (Note that, contrary to the standard definition, I add 1 to all Log Probability Scores so that a perfect score is 1 and not 0.)
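For concreteness, here's how these two scores might be calculated for a single game. I've assumed base-2 logs for the LPS (the post doesn't restate the base, so treat that as an assumption), coded a home win as 1, and applied the +1 adjustment just described so that a perfect LPS is 1 rather than 0:

```python
import math

# Brier score (smaller is better) and adjusted log probability score
# (larger is better) for one game; result is 1 for a home win, else 0.

def brier(prob, result):
    return (result - prob) ** 2

def lps(prob, result):
    p = prob if result == 1 else 1 - prob  # probability assigned to what happened
    return 1 + math.log2(p)                # +1 so a perfect score is 1, not 0

print(brier(0.75, 1))  # 0.0625
print(lps(0.75, 1))    # 1 + log2(0.75), about 0.585
```

Averaging these over all games gives the per-game figures reported in the table.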

In the table at right I've calculated these two probability scores for the three approaches to inferring home team probabilities. The rows labelled "Raw" are the scores achieved if we were to use the inferred probabilities, uncorrected, as our estimates while those labelled "MARS model" are the scores that would be achieved if we were to use the relevant regression spline model, taking as input the relevant, raw inferred home team probabilities.

For Brier Scores, smaller is better, while for Log Probability Scores the opposite is true. The first thing to notice, then, is that for all three approaches, and regardless of whether we consider the Brier or the LPS results, the MARS model scores are superior to the Raw scores; in this sense at least, the modelled probabilities improve on their raw equivalents.

We can also see that the scores for the LPSO-based MARS model are best of all, suggesting that the probability assessments from this model best fit the actual results achieved by home teams if we take probability score as our preferred metric.


The best home team probability estimates for the period 2006 to 2013 come from the following binary logit model:

ln(Estimated Home Team Probability / (1 - Estimated Home Team Probability)) =
  0.70784
  + if(LPSO_Prob > 72.5%, (LPSO_Prob - 72.5%) x 11.597, 0)
  + if(LPSO_Prob < 72.5%, (72.5% - LPSO_Prob) x -3.6586, 0)
  + if(LPSO_Prob > 85.2%, (85.2% - LPSO_Prob) x -5.3031, 0)

where LPSO_Prob = 1/Home Team Price - 1.0281%
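For anyone wanting to apply the model, here it is expressed in code. The intercept, slopes, knots and the 1.0281% constant come straight from the equation above; the function name and the example price are mine (and it's Python rather than R):

```python
import math

# The binary logit model above: compute LPSO_Prob from the home price,
# build the log-odds from the three knot terms, then invert the logit.

def home_prob(home_price):
    lpso = 1 / home_price - 0.010281            # LPSO_Prob
    logit = 0.70784
    if lpso > 0.725:
        logit += (lpso - 0.725) * 11.597
    if lpso < 0.725:
        logit += (0.725 - lpso) * -3.6586
    if lpso > 0.852:
        logit += (0.852 - lpso) * -5.3031
    return 1 / (1 + math.exp(-logit))           # invert the log-odds

print(round(home_prob(1.60), 3))
```

For a $1.60 home team, say, the raw LPSO probability is about 61.5% and the model pulls the estimate down a few points, consistent with the overstatement noted earlier for input probabilities below about 77%; for short-priced favourites it nudges the estimate up instead.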

Forced to make a probability assessment of a home team's chances armed only with the TAB Bookmaker's latest prices, this is the best model that MatterOfStats can currently offer.