Blowing Hot and Cold About Simulations

There are two fundamentally different ways of simulating the remainder of a sporting competition, which are loosely referred to as ‘cold’ and ‘hot’ simulations. The difference between them comes down to the assumptions made about the dynamics of the abilities of the teams involved. In cold simulations, teams’ assumed abilities are unrelated to the simulated results of earlier games, while in hot simulations they are related to earlier results.

(I’ve found surprisingly scant writing about this topic in academic journals or elsewhere. Please let me know if you come across anything.)

Cold Simulating the men’s AFL Competition

In the simplest version of cold simulation, we could proceed by assuming that all 18 teams’ underlying abilities remain constant for the remainder of the season, and that the teams’ Venue Performance Values (my extension of the Home Ground Advantage concept in which teams have a plus or minus value for every venue) do likewise. Whilst that makes the simulation process quite simple, it deliberately ignores a major source of variability in the actual world: variability in the teams’ underlying abilities across time.

To cater for this source of known variability, we could look at how it relates to time, and then somehow incorporate this in the simulation process. That’s the path I’ve adopted previously in creating my cold simulations.

More specifically, what I’ve done is use empirical data to estimate that the standard deviation of a team’s underlying rating is roughly 4.5 times the square root of the number of days into the future we project. The variability also roughly follows a Normal distribution, so we can simply make draws from a Normal distribution with the requisite standard deviation and add these to the teams’ current Ratings to come up with an on-the-day rating for each team in a particular future game.

(In practice, it’s actually even more complex, because we need to add variability to each team’s offensive and defensive ratings. Most recently, I’ve tried making the same adjustment to a team’s offensive rating and its defensive rating, to reflect the historical positive correlation between them.)

We could also choose to similarly add variability to the Venue Performance Values but, in practice, I’ve always kept them fixed within and across simulation replicates.
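As a rough sketch of how those on-the-day draws might be made (the function below is illustrative rather than actual MoSHBODS code, but it uses the 4.5 × √days relationship and the shared offensive/defensive shock described above, with VPVs left fixed):

    import numpy as np

    rng = np.random.default_rng()

    def on_the_day_ratings(off_rating, def_rating, days_ahead):
        """Perturb a team's current offensive and defensive ratings for a game
        `days_ahead` days into the future. The shock is Normal with standard
        deviation roughly 4.5 * sqrt(days ahead), and the same shock is applied
        to both ratings to reflect their historical positive correlation.
        Venue Performance Values are left unchanged."""
        shock = rng.normal(0.0, 4.5 * np.sqrt(days_ahead))
        return off_rating + shock, def_rating + shock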

Although this method does produce simulations that seem to approximate the betting markets moderately well - more so around the middle of the season - it suffers from some obvious shortcomings:

  • It ignores regression to the mean entirely, so very strong teams are assumed to remain very strong (on average) all season, and very weak teams to remain very weak

  • It adds a lot of variability to games that are far into the future, which tends to make these games a little closer to 50:50 contests

  • It ignores the fact that team ratings are constrained to sum to zero

  • It treats each future round independently so that a team’s assumed underlying rating in Round T will bear no relation to its underlying rating in Round T+1 except for the fact that the two values have the same expected value

There might well be better ways of implementing cold simulations that address all of the issues I’ve listed here, but life is finite (at least for the sample of lives we’ve observed so far). One other thing I have tried in the past to deal with the final point listed above is to use empirical rating trajectories - that is, the paths that actual teams’ ratings have tracked from a given point in the season. These will show realistic variability over time and will, if chosen at random from all available trajectories, keep the expected change in the underlying rating across all simulations at zero.
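A minimal sketch of that trajectory idea, assuming the history is stored as round-by-round rating changes from the equivalent point in past seasons (the array layout and names here are illustrative):

    import numpy as np

    rng = np.random.default_rng()

    def sample_rating_path(current_rating, historical_deltas):
        """Pick one historical trajectory of round-by-round rating changes at
        random and apply its cumulative changes to the current rating, giving
        a simulated path with realistic round-to-round dependence.
        `historical_deltas` has shape (n_trajectories, n_remaining_rounds)."""
        deltas = historical_deltas[rng.integers(len(historical_deltas))]
        return current_rating + np.cumsum(deltas)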

So, while cold simulations seem, conceptually, like the more logical way to proceed, building in the necessary variability in a manner that is faithful both to history and to the underlying structure of ratings has proven to be challenging.

And this all might seem overly complicated when there is an obvious alternative that suffers from none of the shortcomings described here.

Hot Simulating the men’s AFL Competition

In practice, nothing could be simpler than hot simulation. Under this approach you simply simulate each game, in turn, using the current values for each team’s rating and the relevant Venue Performance Values, update these ratings and Values to reflect the simulated result, and then repeat for all remaining games in the season. Then, repeat the whole process for the next replicate, starting afresh.
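In skeleton form, the whole approach is just a nested loop (the simulate_game and update_ratings callables here are stand-ins for whatever rating system you’re using, not MoSHBODS internals):

    import copy

    def hot_simulate_season(ratings, vpvs, fixture, simulate_game, update_ratings,
                            n_replicates=10_000):
        """Skeleton of a hot simulation. For each replicate, start from the
        real current ratings and VPVs, simulate the remaining games in order,
        and update the ratings and VPVs after each simulated result."""
        all_replicates = []
        for _ in range(n_replicates):
            r, v = copy.deepcopy(ratings), copy.deepcopy(vpvs)
            replicate = []
            for game in fixture:  # remaining games in chronological order
                result = simulate_game(r, v, game)        # uses the simulated history
                r, v = update_ratings(r, v, game, result)
                replicate.append(result)
            all_replicates.append(replicate)
        return all_replicates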

In other words, you alter each team’s estimated ratings and Venue Performance Values for games in Round T+1 based on their results from Round T. And that is the fundamental objection to the hot simulation approach: why alter our estimate of a team’s underlying rating or its Venue Performance Value on the basis of a simulated outcome that is, by definition, consistent with its previous ratings and Value?

Proponents will counter with “but all we’re doing is what we’d do in the real world if we observed that result”, which is true, but we rely on the real world to give us new information about team abilities, which is something that simulated results cannot provide.

Regardless, the hot simulation method incorporates regression to the mean, respects rating constraints, and builds in dependence across the rounds within a given simulation replicate. As a result, it produces simulation outputs that look a lot like real seasons.

The only major downside of hot simulations is that, because of the dependence of all future games on all past ones, they’re a lot slower to perform. You’re forced to simulate each round before you can move onto the next, because subsequent rounds will have team ratings and Venue Performance Values that depend on those earlier results.

That aside, pragmatism heavily favours hot simulation.

SIMULATING EACH GAME

Whether using a cold or a hot approach, one very important component of any simulation is the method used to simulate a single game. In a hot simulation regime it is, in fact, the only source of variability and so, for completeness, I’ll describe my current method here.

Expected Scores

Firstly, note that I use MoSHBODS Ratings in my simulations, which are designed such that each team’s ratings are measured in points and relative to an average team. So, a rating of +3 means that a team is expected to score three points more than an average team (facing an average team at a neutral venue). Venue Performance Values (VPVs) are also measured in points, so we create the expected score for a team playing a particular opponent at a given venue as

Expected Score = Expected Score for Average Team + Own Offensive Rating - Opponent Defensive Rating + 0.5 * (Own VPV - Opponent VPV)

The Expected Score for an average team is calculated based on the all-team actual average score over the past X days (currently 365 days for MoSHBODS).

We can, therefore, given two teams’ current ratings and relevant VPVs, come up with an expected score for each of them.
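In code, that calculation is a one-liner (a hypothetical helper mirroring the formula above):

    def expected_score(avg_team_score, own_offence, opp_defence, own_vpv, opp_vpv):
        """Expected score in points: the all-team average score, plus own
        offensive rating, minus the opponent's defensive rating, plus half
        the difference in Venue Performance Values."""
        return avg_team_score + own_offence - opp_defence + 0.5 * (own_vpv - opp_vpv)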

Expected Scoring Shots

My individual game simulation works on scoring shots, not points, so we need to convert the expected scores to expected scoring shots. We do this by estimating the average points per scoring shot for home teams and for away teams over the past 365 days, and then calculating

Expected Scoring Shots = Expected Score / Historical Average Points per Scoring Shot
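Continuing the sketch (the points-per-shot figure in the comment is purely illustrative):

    def expected_scoring_shots(exp_score, points_per_shot):
        """Convert an expected score into expected scoring shots, using the
        historical average points per scoring shot (estimated separately for
        home and away teams over the past 365 days)."""
        return exp_score / points_per_shot

    # For example, at an (illustrative) 3.6 points per scoring shot, an
    # expected score of 85 points implies about 23.6 expected scoring shots.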

Simulating Scoring Shots

To simulate the scoring shots for a single game we use a bivariate negative binomial, the parameters for which I’ve estimated using historical data. The bivariate nature allows us to incorporate the empirical fact that scoring shot production tends to be negatively correlated in that, if Team A records more scoring shots than expected, Team B tends to record fewer.
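Most statistical libraries don’t ship a bivariate negative binomial, so here is one possible construction (not necessarily the parameterisation I actually use): a Gaussian copula imposes the negative correlation, and negative binomial marginals supply the counts. The dispersion and correlation values below are placeholders, not estimates.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng()

    def simulate_scoring_shots(exp_home, exp_away, dispersion=20.0, rho=-0.3, size=1):
        """Draw negatively correlated scoring-shot counts: correlated Normals
        are mapped to uniforms (a Gaussian copula), then through negative
        binomial quantile functions whose means are the expected shots."""
        z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=size)
        u = stats.norm.cdf(z)  # correlated uniforms

        def nb_ppf(u_col, mean):
            # scipy's nbinom uses (n, p) with mean = n * (1 - p) / p
            p = dispersion / (dispersion + mean)
            return stats.nbinom.ppf(u_col, dispersion, p).astype(int)

        return nb_ppf(u[:, 0], exp_home), nb_ppf(u[:, 1], exp_away)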

Simulating Conversion

To simulate the conversion of scoring shots into goals (and behinds) we use two independent beta binomials, the parameters for which I have also estimated using historical data. These also require estimates for the means, for which I use actual home and away conversion rates over the past 12 months.

Although it’s easy to imagine that teams’ conversion rates in a single game might be correlated - if it’s wet, for example, you’d expect both teams to record conversion rates below expectations - the empirical evidence (at least as far as I’ve assessed it) suggests that any such correlation is very small, which is why we use two independent beta binomials here.
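A sketch of the conversion step, with the beta binomial’s concentration standing in for the historically estimated dispersion:

    from scipy import stats

    def simulate_goals(scoring_shots, conversion_mean, concentration=100.0):
        """Draw the number of goals from a beta binomial whose mean conversion
        rate is `conversion_mean` (e.g. the relevant home or away rate over the
        past 12 months). `concentration` (a + b of the underlying beta) is an
        illustrative placeholder for the estimated dispersion."""
        a = conversion_mean * concentration
        b = (1.0 - conversion_mean) * concentration
        return int(stats.betabinom.rvs(scoring_shots, a, b))

    # Behinds are whatever scoring shots remain: score = 6 * goals + behinds.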

To get a feel for this method of simulating individual game scores, we can look at the standard deviation of the simulated margins that it produces.

We can see that, for expected totals of around 160 points, which is the modern-day average, the standard deviations are around 34 to 35 points, which aligns with other analyses.
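For what it’s worth, the same check is easy to run with the sketch functions above; because their parameters are placeholders rather than properly estimated values, the figure they produce will differ from the one quoted here, but the shape of the exercise is the same.

    import numpy as np

    n_sims = 10_000
    home_shots, away_shots = simulate_scoring_shots(22.0, 22.0, size=n_sims)
    margins = []
    for hs, aw in zip(home_shots, away_shots):
        home_goals = simulate_goals(int(hs), 0.53)   # illustrative conversion rate
        away_goals = simulate_goals(int(aw), 0.53)
        home_score = 6 * home_goals + (hs - home_goals)
        away_score = 6 * away_goals + (aw - away_goals)
        margins.append(home_score - away_score)
    print(np.std(margins))  # standard deviation of simulated margins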

CONCLUSION

So, this year, at least for now, I’ll be doing away with cold simulations, and presenting only the results of my hot (or Heretical) simulations.

I am nothing if not a pragmatist …