Simulating the Finalists for 2016 After Round 14

This year I'm starting simulations for the Finalists a bit earlier than usual, partly as a result of a request from Twitter follower @SgtButane, who's another of the burgeoning group of AFL fans using statistical modelling techniques to forecast game outcomes. If you've an interest in this sort of modelling and you're not yet on Twitter, I'd strongly encourage you to sign up and start following like-minded people (I'm @MatterOfStats by the way). There's a tremendous amount of informed commentary and quality analysis being shared by some very talented people in this space.

THE PHILOSOPHY AND PRACTICE OF SIMULATION

Anyway, that request for me to start simulating the season triggered a lively, but very civil, debate on Twitter about how to approach the task. The main point of contention was whether any adjustment should be made to a team's assumed abilities during the course of a single replicate in the simulation, on the basis of the simulated result of a game.

If, say, Melbourne were to defeat Adelaide in one simulated version of Round 15 - which would represent a surprise result given the two teams' current ratings - should Melbourne's team rating be increased and Adelaide's decreased before simulating the results for Round 16?

My view is that they should not be altered because we have no new information in the simulation about Melbourne's or Adelaide's ability, just a single simulated result which, while unlikely, is by definition completely consistent with the teams' assumed pre-game ratings. If we model victory margins as stochastic variables then, I'd suggest, we shouldn't respond to the randomness that this produces as if it were somehow a sign of improved ability. Doing so implies that we believe team ratings respond to the randomness inherent in actual results.

The counterargument seems to be that we would adjust the teams' ratings were we to see this result in real life, so we should do the same when we see it in a simulation. My response would be that we make adjustments when we see a result in real life because we assume there is some deterministic component in that outcome.

Just how much we assume an actual result reflects random versus deterministic elements - and, consequently, how much we adjust team ratings on the basis of that single result - is at the heart of any ratings system: adjust too much and we respond excessively to randomness; too little and we ignore real changes in team abilities over time.
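To make that tradeoff concrete, here is the generic shape of a single-result rating update - an illustrative Elo-style rule, not MoSSBODS's actual update equation - in which a responsiveness parameter k determines how much of the "surprise" in a result is treated as genuine ability change rather than noise.

    def update_rating(rating, expected_margin, actual_margin, k=0.1):
        """Generic Elo-style update. The surprise (actual minus expected margin)
        is scaled by k: a large k chases randomness, a small k is slow to pick
        up genuine changes in ability. Illustrative only; k=0.1 is arbitrary."""
        return rating + k * (actual_margin - expected_margin)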

But, in the case of the simulations, we know in advance that any deviations from expectation we see are due solely to chance, so there's no logical basis, I'd argue, on which to proceed otherwise.

The participants in the conversation split almost 50:50 into those who do and those who don't adjust team ratings within a single replicate, but one of the modellers who does not make adjustments (@RankingSW on Twitter) provided a subtle alternative: use the same base team ratings across every round of the simulation, but treat them as random variables, drawing a different value for each simulation run - and maybe even for each round within each run - centred on those base ratings. These adjustments to ratings are made without regard to previous simulated results and are used to reflect the fact that each team's base ratings are inherently measured with error (because, for example, of the mis-allocation of "surprises" in previous actual results into random versus deterministic components).

I've incorporated this idea in the 2016 simulations, as outlined in the following section.

THE APPROACH

The simulation of each game this year has proceeded in the following manner:

  1. Take each team's MoSSBODS Offensive and Defensive Ratings and estimated Venue Performance Values as at the end of the most-recent round (Round 14 for this first week of simulations).
  2. Add a random component to both of the Ratings for each team (but not to the estimated Venue Performance figures). We use a Normal(0, 0.5) distribution for this purpose, the 0.5 standard deviation being approximately equal to the standard deviation of MoSSBODS team component rating changes across the last eight or so home-and-away rounds of seasons from 2000 to 2015. Note that we draw a different value from this distribution for the Offensive and Defensive Ratings of every team, and a different pair of values for each team in every round.
  3. Use these values, along with any Travel Penalty if applicable, to calculate expected Scoring Shots for any pair of teams scheduled to meet.
  4. Simulate the scores for any pair of teams using the Negative Binomial/Beta Binomial model I developed in 2014 (and using the parameter values I estimated there).
  5. Repeat for every game remaining in the schedule and collect the results across replicates.
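To make those steps concrete, here's a rough sketch in Python of how a single game might be simulated under this scheme. It is illustrative only: the average scoring shot count, the Negative Binomial and Beta Binomial parameters, the additive form of the expected Scoring Shot calculation, and the way the Travel Penalty and Venue Performance values enter it are all placeholder assumptions, not the actual MoSSBODS equations or the estimates from the 2014 post.

    import numpy as np

    rng = np.random.default_rng(2016)

    # Placeholder parameters -- illustrative values only, not the estimates
    # used in the actual simulations.
    RATING_SD = 0.5          # SD of the Normal(0, 0.5) rating perturbation (step 2)
    AVG_SCORING_SHOTS = 25   # assumed league-average scoring shots per team
    CONVERSION_MEAN = 0.53   # expected conversion rate quoted in the post
    NB_SIZE = 20             # hypothetical Negative Binomial size parameter
    BB_CONCENTRATION = 100   # hypothetical Beta Binomial concentration parameter

    def simulate_game(home, away, travel_penalty=0.0):
        """Simulate one game. 'home' and 'away' are dicts with 'off' and 'def'
        Ratings (scoring shots above/below average) and a 'venue' performance
        value. Returns (home_score, away_score) in points."""
        # Step 2: perturb each Rating independently with Normal(0, 0.5) noise
        h_off = home["off"] + rng.normal(0, RATING_SD)
        h_def = home["def"] + rng.normal(0, RATING_SD)
        a_off = away["off"] + rng.normal(0, RATING_SD)
        a_def = away["def"] + rng.normal(0, RATING_SD)

        # Step 3: expected Scoring Shots -- an assumed additive form, with the
        # Travel Penalty simply subtracted from the travelling team's expectation
        exp_ss_home = AVG_SCORING_SHOTS + h_off - a_def + home["venue"]
        exp_ss_away = AVG_SCORING_SHOTS + a_off - h_def + away["venue"] - travel_penalty

        scores = []
        for exp_ss in (exp_ss_home, exp_ss_away):
            exp_ss = max(exp_ss, 1.0)  # guard against pathological inputs
            # Step 4a: Scoring Shots drawn from a Negative Binomial with this mean
            p = NB_SIZE / (NB_SIZE + exp_ss)
            shots = rng.negative_binomial(NB_SIZE, p)
            # Step 4b: conversion rate drawn from a Beta with mean 53%, then goals
            # from a Binomial -- together, a Beta Binomial for goals
            conversion = rng.beta(CONVERSION_MEAN * BB_CONCENTRATION,
                                  (1 - CONVERSION_MEAN) * BB_CONCENTRATION)
            goals = rng.binomial(shots, conversion)
            behinds = shots - goals
            scores.append(6 * goals + behinds)

        return tuple(scores)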

This process of simulation is more complicated than that which I've used in previous years and, as a consequence, is a little slower to run. So, for now, we'll need to make do with 25,000 simulations of the remainder of the season, which take a bit over an hour to run. I might ramp up the number of simulations, if time permits, in future weeks.
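For completeness, here's a rough sketch of how step 5's replicate loop might be organised, using the simulate_game sketch above. Again it is illustrative: the fixture and ladder structures, and the ordering by premiership points then percentage, are simplifying assumptions rather than a description of the actual code.

    from collections import defaultdict

    def simulate_season(fixture, ratings, current_ladder, n_replicates=25_000):
        """Simulate the remaining fixture n_replicates times and tally final
        ladder positions. 'fixture' is a list of (home, away, travel_penalty)
        tuples; 'current_ladder' maps team -> [points, points_for, points_against]
        at the start of the simulations; 'ratings' maps team -> rating dict."""
        position_counts = defaultdict(lambda: defaultdict(int))
        for _ in range(n_replicates):
            ladder = {team: list(vals) for team, vals in current_ladder.items()}
            for home, away, penalty in fixture:
                home_score, away_score = simulate_game(ratings[home], ratings[away], penalty)
                for team, scored, conceded in ((home, home_score, away_score),
                                               (away, away_score, home_score)):
                    ladder[team][1] += scored
                    ladder[team][2] += conceded
                    if scored > conceded:
                        ladder[team][0] += 4      # win: 4 premiership points
                    elif scored == conceded:
                        ladder[team][0] += 2      # draw: 2 points each
            # Rank by premiership points, then percentage (points for / points against)
            ordered = sorted(ladder,
                             key=lambda t: (ladder[t][0], ladder[t][1] / ladder[t][2]),
                             reverse=True)
            for position, team in enumerate(ordered, start=1):
                position_counts[team][position] += 1
        return position_counts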

THE RESULTS

Each team's simulated results are subject to three sources of variability: one that comes from the stochastic components of their base Offensive and Defensive ratings, another that comes from the modelled on-the-day variability of their Scoring Shot production around its expected value, and a third that comes from the modelled on-the-day variability in Scoring Shot conversion around its expected value (which is 53% for all teams).

That variability manifests in terms of a range of possible final ladder positions, which are summarised below in what I last year dubbed the "Dinosaur Chart".

Not surprisingly, with so many games still to be played, most teams have a relatively wide range of simulated potential ladder finishes, in some cases spanning as many as 13 ladder spots. Only the Lions, Dons and, to a lesser extent, Suns, appear to have a significantly more restricted menu.

Looking at these results instead as an annotated heat map reveals more detail about the relative likelihood of specific ladder finishes for each team, and about the sets of teams that appear most likely to contest particular groups of ladder positions.

The number shown in a cell is the percentage of all simulation runs in which that team finished in the ladder position indicated. So, for example, Geelong finished 1st in 45.8% of all simulation runs, more often than any other team. Because, in every run, each team occupies exactly one ladder position and each ladder position is occupied by exactly one team, both the rows and the columns sum to 100.
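In code terms, the heat map is just those position tallies expressed as percentages - a minimal sketch, assuming the position_counts structure from the season-loop sketch above:

    def heatmap_percentages(position_counts, n_replicates):
        """Convert (team, position) tallies into the percentages shown in the
        heat map: each cell is 100 * count / n_replicates. Because every run
        assigns each team exactly one position and each position exactly one
        team, both rows and columns of the resulting table sum to 100."""
        return {team: {pos: 100 * count / n_replicates
                       for pos, count in positions.items()}
                for team, positions in position_counts.items()}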

From this heat map we can see that the teams form some natural groupings:

  • Geelong - most likely to finish 1st
  • Adelaide, GWS, Sydney, Hawthorn and the Western Bulldogs - most likely to scrap over 2nd to 6th
  • West Coast and the Kangaroos - most likely to compete for 7th and 8th
  • Port Adelaide and St Kilda - most likely to compete for 9th and 10th
  • Melbourne, Collingwood, Carlton and Richmond - likely to split 11th to 14th amongst them
  • Fremantle and Gold Coast - candidates for 15th and 16th
  • Brisbane Lions and Essendon - candidates for 17th and the Spoon.

COMPARISON WITH LATEST TAB MARKETS

As always, a sanity check using the prices of a professional bookmaker is advisable, and for this purpose we'll use those of the TAB.

Broadly speaking, the simulation results accord with the bookmaker's thinking, though a handful of prices appear to offer some value if you have faith in the simulation inputs and process.

Specifically:

  • Geelong ($2.75) and Adelaide ($9.00) offer value in the market for the Minor Premier
  • Adelaide ($1.65) and the Western Bulldogs ($3.50) offer value in the Top 4 market
  • St Kilda ($13.00) and Carlton ($251.00) offer value in the Top 8 market
  • The Kangaroos ($7.50) offer value in the Miss the Top 8 market.

Adelaide appears to offer particular value in the Minor Premiers and Top 4 markets, which you might interpret as either an opportunity or an error in modelling, depending on your perspective.
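As a rough indication of what "value" means here, a price's implied probability is approximately its reciprocal (ignoring the bookmaker's overround). A minimal sketch of the comparison, using Geelong's Minor Premiership price and simulated probability from above:

    def simulated_edge(price, simulated_prob):
        """Difference between the simulated probability and the price's implied
        probability (1 / price); positive values suggest the price offers value
        if you trust the simulation inputs and process."""
        return simulated_prob - 1 / price

    # Geelong at $2.75 for the Minor Premiership vs 45.8% of runs finishing 1st
    print(simulated_edge(2.75, 0.458))  # roughly +0.09, i.e. about nine percentage points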

LADDER FINISHES

Lastly, let's look at what the simulations reveal about the teams most likely to occupy ranges of ladder positions.

Firstly, consider 1st and 2nd at the end of the home-and-away season. 

About 14% of simulation runs finished with the Cats in 1st and the Giants in 2nd, and another 6% with the reverse ordering. Slightly more common in aggregate was a Cats / Crows 1-2 finish, though neither possible ordering of that pair was as common as Cats in 1st and Giants in 2nd.

Geelong appears in all but two of the Top 10 pairings, Adelaide in four, GWS in four, Sydney in two, and Hawthorn in two.

Next, let's expand our view to consider positions 1st through 4th.

The first thing to note is that none of these quartets is particularly likely. Even the most common of them (Geelong / GWS / Adelaide / Sydney) appeared in only about 1 simulation run in 50.

Also noteworthy is the fact that Geelong is in 1st position in all 10 of the most common quartets, and that GWS also appears in all 10, half the time in 2nd, four times in 3rd, and once in 4th.

Adelaide also makes 10 appearances, Sydney 5, Hawthorn 4, and the Western Bulldogs only 1.

Finally, we'll focus on the bottom half of the Top 8.

Here we find a much more diverse range of possibilities, though again even the most likely of them (Dogs / Hawks / Eagles / Roos) appears in only about 1 simulation run in 50.

Amongst the 10 most common results we find that:

  • The Western Bulldogs appear in all 10, and in nine of those either in 5th or 6th
  • West Coast also appears in all 10, most often in 7th
  • The Roos appear in all 10 as well, most often in 8th
  • Hawthorn appears in 5, and always in 5th or 6th
  • Sydney appears in 3, and either in 5th or 6th
  • GWS appears in only 1, in 5th
  • Adelaide appears in only 1, also in 5th

In future weeks we might also review other aspects of the simulated results, for example the most common Top 8. For now though, analyses of that sort are fairly futile because of the high levels of variability that remain in final ladder positions. As an example of this, amongst the 25,000 simulations run for this blog post, the most common Top 8s appeared only twice.
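For the record, that tally is simple to produce from the simulated ladders - a sketch, assuming each replicate's ordered final ladder has been kept as a list of teams (the season-loop sketch above would need a minor tweak to retain these):

    from collections import Counter

    def most_common_top8s(replicate_ladders, k=5):
        """Count how often each distinct (unordered) Top 8 occurs across the
        replicates, where replicate_ladders is a list of ordered team lists,
        one per simulation run."""
        tallies = Counter(frozenset(ladder[:8]) for ladder in replicate_ladders)
        return tallies.most_common(k)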