As I was writing up the recent post about the application of the Pythagorean Expectation approach to AFL I realised that it provided yet another method for generating a margin prediction from a probability prediction.
Recall that the Pythagorean Expectation approach involves finding a value of k that best fits teams' winning rates as a function of their scoring, as per the equation:

Win PC = Points Scored^k / (Points Scored^k + Points Conceded^k)
Now, if we replace the Win PC term in that equation with some probability assessment of a team's - say the Home team's - chances of victory and then rearrange the equation to make the score ratio the subject, we wind up with an expression that allows us to infer an expected score ratio from a probability assessment:

Points Scored / Points Conceded = (Prob / (1 - Prob))^(1/k)
Lastly, if we make one further assumption about the expected aggregate points scored in the game (ie Points Scored + Points Conceded) then we achieve our ultimate objective, an expression relating an expected or predicted game margin to a probability assessment:

Expected Margin = Total Score x ((Prob / (1 - Prob))^(1/k) - 1) / ((Prob / (1 - Prob))^(1/k) + 1)
(Incidentally, as part of the analysis I did for this blog I investigated the relationship between the pre-game home team probability and the ultimate aggregate points scored, thinking that games with a very strong or very weak home team might result in a larger or smaller than average aggregate points tally. No such relationship exists - or certainly not a linear one - so it seems reasonable to make a blanket assumption about the expected aggregate score for a game, regardless of the pre-game probabilities.)
I'll be the first to admit that this isn't an especially pretty equation, but it does have a certain impish charm in its slightly imperfect symmetry.
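For anyone who'd like to play with the equation, it collapses to a few lines of Python. This is a minimal sketch of the inversion described above (the function name and defaults are mine; 3.92 and 185.4 are the empirical values used later in this post):

```python
def pythagorean_margin(prob: float, k: float = 3.92, total: float = 185.4) -> float:
    """Convert a Home team victory probability into a predicted margin.

    Inverts the Pythagorean Expectation: the expected score ratio is
    R = (prob / (1 - prob))**(1/k), and with Scored + Conceded = total,
    the expected margin is total * (R - 1) / (R + 1).
    """
    ratio = (prob / (1.0 - prob)) ** (1.0 / k)
    return total * (ratio - 1.0) / (ratio + 1.0)
```

A 70% Home team probability comes out at a little under 20 points, and a 50% probability at exactly zero, as you'd hope.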
COMPARISON WITH ASSUMING A NORMAL DISTRIBUTION
Most commonly, when called on to generate a margin prediction from a probability assessment I'll use a Normal distribution. For the purposes of this blog I'm going to use a Normal distribution with a mean of 0 and a standard deviation of 38.4 points, the latter parameter a little larger than I'd generally use, but not ridiculously so. Having chosen a specific Normal distribution to use I'd then evaluate the inverse CDF of that distribution at the probability assessment I have to convert that probability into a margin.
So, for example, if I had calculated a 70% probability for the Home team, I'd evaluate the inverse CDF for a Normal(0, 38.4) and arrive at a predicted margin of 20.14 points for the Home team. In other words, if I'd assessed that the TAB Bookmaker had the Home team as 70% favourites - and just how I might have come to that assessment I'll return to in a minute - then I'd infer that this meant he expected the Home team to win by a little over 20 points.
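That calculation needs nothing beyond the Python standard library, whose statistics.NormalDist class supplies the inverse CDF directly:

```python
from statistics import NormalDist

# Margin model: game margins ~ Normal(mean = 0, sd = 38.4 points)
margin_dist = NormalDist(mu=0.0, sigma=38.4)

# A 70% Home team probability maps to a predicted Home margin of about 20.14 points
predicted_margin = margin_dist.inv_cdf(0.70)
print(round(predicted_margin, 2))  # 20.14
```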
The chart below shows the comparison between the expected margins I'd get for various Home team probabilities using this Normal assumption and what I'd get using the equation I derived above based on Pythagorean Expectations, assuming values for k and Total Score of 3.92 and 185.4 respectively (which are both plausible empirical values for the 2006 to 2013 timeframe).
Clearly, the Normal and Pythagorean Expectation approaches can yield very similar results. Even for games involving highly mismatched teams, where we do start to see some signs of divergence between the two approaches, the difference for the range of probabilities shown in the chart (about 9% to 90%) is only 1 or 2 points.
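If you'd rather reproduce that comparison numerically than read it off the chart, a short self-contained sketch (same parameter values as above) confirms the size of the largest gap across the charted probability range:

```python
from statistics import NormalDist

K, TOTAL, SIGMA = 3.92, 185.4, 38.4
normal = NormalDist(0.0, SIGMA)


def pythag_margin(p: float) -> float:
    """Pythagorean-Expectation-based margin for Home team probability p."""
    r = (p / (1 - p)) ** (1 / K)
    return TOTAL * (r - 1) / (r + 1)


# Largest absolute disagreement between the two approaches for p in 9% to 90%
worst = max(abs(normal.inv_cdf(p / 100) - pythag_margin(p / 100))
            for p in range(9, 91))
print(round(worst, 2))  # stays under 2 points
```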
A BROADER FRAMEWORK
We can put this new approach to generating margin predictions from probability assessments into a broader framework, recognising it as one of a number of approaches that I've chronicled here on MatterOfStats over the years. At the same time, we can remind ourselves about the first step in the process, the inferring of Home team probability assessments from the observed Home and Away team prices. I wrote about the various approaches for this step in a series of blogs culminating in this one and this one, both in September 2013.
The following diagram summarises my thinking about the various ways to move from observed head-to-head prices to predicted game margins.
On the left we have the information we observe: the head-to-head prices for the two teams at some point in time. From these prices we can calculate the Total Overround in the market as the sum of the inverse of the prices less 1.
The middle section addresses the problem of converting the observed prices into implicit victory probabilities for the two teams, with each variant making a different assumption about how the total overround in the market is levied on each team. Note that this framework, for now, excludes any consideration of drawn games.
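As a concrete, hypothetical example of the left and middle sections, here's how two of the simpler assumptions play out in Python. The prices are invented for illustration, the Overround Equalising approach levies the overround proportionally on each team, and the additive variant (equal levy on each team) reflects my understanding of one of the other options in the diagram:

```python
home_price, away_price = 1.40, 3.10  # hypothetical head-to-head prices

inv_home, inv_away = 1 / home_price, 1 / away_price
total_overround = inv_home + inv_away - 1  # about 3.7% for these prices

# Overround Equalising: overround levied proportionally on each team,
# so probabilities are the normalised inverse prices
oe_home = inv_home / (inv_home + inv_away)

# Equal additive levy: the same absolute slice of overround on each team
add_home = inv_home - total_overround / 2

print(round(total_overround, 4), round(oe_home, 4), round(add_home, 4))
```

The two variants disagree about the Home team's probability by almost 1% point even at this modest level of overround, which is why the choice of assumption matters.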
(One of the more profound realisations I've had while working through the mathematics of bookmaker overround and generating the three options you see in the diagram is that the total overround in the head-to-head market is equal to the sum of the maximum calibration errors that the bookmaker can have in his assessments of the two teams. In other words, if there's 6% overround in the head-to-head prices and the bookmaker's price for the Home team provides cover for a calibration error of 4% - that is, still carries a negative expectation even if the probability on which it was based underestimates the true probability for the Home team by no more than 4% - then his price for the Away team provides cover for a calibration error of no more than 2%. That is extraordinary confidence in your ability to assess probabilities accurately - or at least in your belief that no-one will be able to spot any opportunities you might inadvertently provide.)
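The arithmetic behind that claim is easy to check: if each inverse price equals the bookmaker's assessed probability plus its calibration buffer, the buffers must sum to the total overround. A tiny sketch with invented numbers:

```python
# Bookmaker's (unobservable) probability assessments, summing to 1
p_home, p_away = 0.65, 0.35

# Calibration buffers: how far each assessment can underestimate the true
# probability before the corresponding bet becomes profitable for punters
e_home, e_away = 0.04, 0.02

# Prices embed probability plus buffer: price = 1 / (p + e)
home_price = 1 / (p_home + e_home)
away_price = 1 / (p_away + e_away)

total_overround = 1 / home_price + 1 / away_price - 1
print(round(total_overround, 4))  # 0.06, which is e_home + e_away
```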
Finally, the section on the right looks at the various ways for taking the implicit probabilities generated in the middle section and converting them into margin predictions. Where we choose to make some distributional assumption about game margins, such as assuming that they're related to the Normal distribution in the way described earlier, we need to calculate optimal values for the distribution's parameters.
One way to do this is to empirically optimise some metric such as the mean absolute error (MAE) of the resulting margin predictions across some expanse of time. In the diagram above I have, for example, found optimised values of mu and sigma if the input probabilities come from, say, the Overround Equalising approach and we choose to minimise MAE across the 2006 to 2013 period.
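One way to picture that empirical step: with a history of (probability, actual margin) pairs in hand, scan candidate values of sigma and keep whichever produces the lowest MAE. A toy sketch with invented data standing in for the 2006 to 2013 history:

```python
from statistics import NormalDist

# Invented (home probability, actual home margin) pairs for illustration only
history = [(0.70, 25), (0.60, 5), (0.45, -10), (0.80, 40),
           (0.55, 12), (0.30, -28), (0.65, 31), (0.50, -3)]


def mae(sigma: float) -> float:
    """Mean absolute error of Normal(0, sigma) margin predictions on history."""
    dist = NormalDist(0.0, sigma)
    return sum(abs(dist.inv_cdf(p) - m) for p, m in history) / len(history)


# Grid search over candidate standard deviations, 20.0 to 60.0 points
candidates = [s / 10 for s in range(200, 601)]
best_sigma = min(candidates, key=mae)
```

In practice you'd optimise mu as well, and use a proper optimiser rather than a grid, but the principle is the same.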
Other distributional assumptions are possible, some of which I've listed above, though I've not explored them so far on MatterOfStats.
Another, more direct way of finding an empirical relationship between probabilities and margins is to feed both series for a certain time period into Eureqa and let it come up with a suitable equation relating the two. As you can see in the diagram, when I did this for the 2006 to 2013 data Eureqa quite likes the fit provided by equations of the form c.e^Prob - k.
And, as we've just discussed, the Pythagorean Expectations approach can also be used. Here too though we need a method for determining optimal parameter values.
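The same empirical trick works here too: scan candidate (k, Total Score) pairs and keep whichever minimises MAE against history. Again a toy sketch, with invented data standing in for the real history:

```python
# Invented (home probability, actual home margin) pairs for illustration only
history = [(0.70, 25), (0.60, 5), (0.45, -10), (0.80, 40),
           (0.55, 12), (0.30, -28), (0.65, 31), (0.50, -3)]


def pythag_margin(p: float, k: float, total: float) -> float:
    """Pythagorean-Expectation-based margin for probability p."""
    r = (p / (1 - p)) ** (1 / k)
    return total * (r - 1) / (r + 1)


def mae(k: float, total: float) -> float:
    """Mean absolute error of Pythagorean margin predictions on history."""
    return sum(abs(pythag_margin(p, k, total) - m)
               for p, m in history) / len(history)


# Coarse grid search over plausible ranges for k and Total Score
grid = [(k / 100, t) for k in range(300, 501, 4) for t in range(160, 211, 5)]
best_k, best_total = min(grid, key=lambda kt: mae(*kt))
```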
At the foot of both the middle and right-hand sections I've somewhat hand-wavily allowed for the possibility of other empirical approaches, both to the determination of probabilities based on prices, and to the determination of expected margins based on probabilities. I'm sure at some point I'll discover and write about more of these possibilities, but for now the diagram above provides a comprehensive summary of what I know.