The previous post here on the Simulations blog triggered an interesting exchange of comments with Rob that forced me to think a little harder about the appropriate interpretation of the results presented in that original post. I recommend you read his well-constructed and well-argued comments in their entirety but, to paraphrase substantially, his main point as I take it is this: the TAB Bookmaker's prices each week are contingent on the results of all the games that preceded them, so as soon as we change the result of any game in any one week it's no longer valid to use his prices for subsequent weeks of the simulation.
In that sense then, no single replicate can purport to be a representation of a variant of season 2015 that we would ever expect to see in real life. That is, I think, an absolutely fair point, so it would be wrong to characterise the outputs in that way.
So then, how should we interpret them and in what, more limited, sense do they tell us something about:
- How different season 2015 has been from what might be implied from the TAB Bookmaker's weekly head-to-head prices as the "most likely" situation right now?
- How much season 2015 might reasonably have varied from that most likely scenario?
Firstly, let's remind ourselves of what we're doing in the simulation process. In running the simulations we're implicitly assuming that
1. the Bookmaker's estimate of the expected game margin is unbiased and reflects the difference in the true, underlying abilities of the teams on the day, at the relevant venue and with the teams as selected
2. deviations from that margin in the actual result reflect a mixture of:
   - Bookmaker error in assessing team abilities
   - On-the-day variability in team abilities relative to their underlying ability
   - Purely random components
Historical data allows us to confirm the unbiased nature of the Bookmaker's opinions in 1. and to estimate the size and distribution of the deviations in 2. What that history has shown us is that the deviations are distributed as a Normal random variable with a standard deviation of about 36-38 points. Numerous posts have found that, on balance, this standard deviation is the same for all games with perhaps a few exceptions.
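To make the single-game model concrete, here's a minimal Python sketch of it. It is not the code used for the simulations in these posts; the function names are hypothetical, and the default standard deviation of 37 points is simply an assumed midpoint of the 36-38 range quoted above.

```python
import random
from statistics import NormalDist

ASSUMED_SD = 37.0  # assumption: midpoint of the 36-38 point range


def simulate_margin(expected_margin, sd=ASSUMED_SD, rng=random):
    """Draw one simulated game margin: the Bookmaker's expected margin
    plus a zero-mean Normal deviation with the given standard deviation."""
    return rng.gauss(expected_margin, sd)


def win_probability(expected_margin, sd=ASSUMED_SD):
    """Probability that the simulated margin is positive under the
    Normal model, i.e. that the favoured team actually wins."""
    return 1.0 - NormalDist(mu=expected_margin, sigma=sd).cdf(0.0)


# A hypothetical 20-point favourite: one random result and its win chance.
rng = random.Random(2015)
one_result = simulate_margin(20, rng=rng)
favourite_chance = win_probability(20)
```

Under these assumptions a team expected to win by 20 points still loses a sizeable share of the time, which is why individual results in a replicate can differ from what actually happened without any one of them being "surprising".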
If we agree that this is a reasonable basis on which to model the outcome of a single game then it's fair to say that, for any single replicate, the outcomes of all 170 games are entirely consistent with what the Bookmaker might have expected for each game this season given his pre-game beliefs about the relative strengths of the teams and his understanding of the inherent variability of a game in the AFL. It's true, as we've agreed, that, were the games played in the same sequence as they have been in 2015, he would almost certainly have revised his opinions about the relative strengths of the teams on the basis of earlier results but, taken individually, no single result should be excessively "surprising".
So, I think it's fair to say that each replicate represents a set of outcomes for the 170 games that is entirely consistent with the 2015 TAB Bookmaker's pre-game opinions for each of those games. The question is whether you believe that adding up the results for each team and treating the totals as a legitimate replicate of the entire season so far provides any useful insight into the two questions I posed at the start of this post. That is, I think, mostly a matter of personal preference and I find myself developing cogent arguments for both the defence and the prosecution.
To diverge for a moment: as I noted in my reply in the Comments section, there's no doubt that the choice of standard deviation for the random element of each game's result has a significant bearing on the variability in team results that we observe across replicates. To get a sense of just how large an effect this assumption has I've re-run the simulations, assuming that the standard deviation of the random component is only about half as big (i.e. 18 points).
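The effect of that assumption can be illustrated with a small Python sketch. The expected margins below are made up for a single hypothetical team (they are not TAB Bookmaker data), the function name is hypothetical, and 37 points stands in for the 36-38 range; the sketch only compares how much the team's win total varies across replicates under the two standard deviations.

```python
import random
from statistics import pstdev


def simulate_win_totals(expected_margins, sd, n_replicates=2000, seed=0):
    """Return one team's win total in each replicate, where each game's
    margin is drawn as Normal(expected margin, sd)."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_replicates):
        wins = sum(1 for mu in expected_margins if rng.gauss(mu, sd) > 0)
        totals.append(wins)
    return totals


# Hypothetical expected margins for one team's first 10 games.
margins = [25, 12, -8, 30, 5, -15, 18, 22, -3, 10]

# Spread (standard deviation) of the win totals across replicates.
spread_sd37 = pstdev(simulate_win_totals(margins, sd=37))
spread_sd18 = pstdev(simulate_win_totals(margins, sd=18))

print(f"Spread of wins with SD 37: {spread_sd37:.2f}")
print(f"Spread of wins with SD 18: {spread_sd18:.2f}")
```

With the smaller standard deviation each game's result sits closer to the expected margin, so favourites win more reliably and the team's win total varies less from replicate to replicate.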
Here, firstly, is a comparison of team ladder finishes. We still see some results for some teams that are very different from the current ladder - for example, Essendon are in the Top 8 in 1% of replicates - but these are less common.
Note that the ordering of the teams in terms of average ladder finish is identical in both sets of simulations, these being driven by the relative average Bookmaker-expected margins for each team rather than by the variability of the results around those margins.
The Dinosaur Charts and heat maps of teams' ladder positions and the most common Top 2s and Top 4s tell a similar story - that of a reduction in the variability across replicates.
The most common Top 8s also occur more frequently with the smaller standard deviation, though even the most common of them still crops up only 0.1% of the time.
All 10 of the 10 most common Top 8s see Hawthorn finishing 1st and Fremantle 2nd, with any two of Port Adelaide, Sydney and West Coast finishing 3rd and 4th. The Roos take 6th exclusively, either of Adelaide or Richmond take 7th, and any of Adelaide, Richmond or Collingwood finish 8th.
It's always an interesting and useful exercise to think about how best to interpret the results of some simulation or analysis, no less so in the case of the simulations of the 2015 season. My sense from the feedback I've received - from Rob and from others, this having been one of the most popular blog posts on MatterOfStats for a while - is that the contention that the TAB Bookmaker would have expected, say, Port Adelaide to be higher on the ladder now, and the Western Bulldogs lower, is a reasonable one. In that sense then the simulation - or at least the mean outcome for each team - does reflect the 2015 season that the TAB Bookmaker would have expected.
The debate lies more, firstly, at the level of the individual replicate and whether or not each provides a realistic version of a season. In the sense of "likely to be observed in real life", I think many replicates would be deemed "unrealistic", but in the sense of an assemblage of 170 games, each individually plausible and matched to the TAB Bookmaker's pre-game opinions about one of the 170 actual games this season, every replicate is, by design, "realistic".
Secondly, it's about the emergent property we call a season: if we assemble 170 simulated results that are realistic in the latter sense, one for each game of the real season, is it legitimate to think about that assemblage as a replicate of the 2015 season?
That, I think, is a very interesting question and I very much thank Rob (and others) for making me ponder it. As ever, your comments are welcomed too.