Interest is being fuelled by the variety of contests and wagering markets, public and private, offering prizes and plaudits for ordering accuracy. William Hill, for example, is offering a share in \$100m (for, I think, a \$5 bet) to anyone able to place all teams in the correct order.

It's interesting to consider how you might go about estimating the correct odds for this wager and how you might assess the likelihood of other, related outcomes. I can't, though, find any analyses performed and published by others that provide answers to normative questions such as these:

• How many teams might you reasonably expect to place in the right order?
• How likely is it that you'll get 0, 1, 2 and all the way up to 18 teams in the correct order?
• If you measure the difference between your own team ordering and the correct one on the basis of the sum of the absolute differences in the ranks (so, for example, if you had Hawthorn 1st and they finish 4th you "score" 3), what's a reasonable score?

### Simulation-Based Approach

One reason there are no published analyses in this area, I think, is that there's no practical, purely theoretical, first-principles approach that can be used to answer the questions I've just posed. If the result of each game can be considered to be a realisation of a draw from a Normal distribution (for which there's reasonable empirical and theoretical support), then the set of all results is a draw from a huge and analytically unwieldy multivariate Normal. That renders a whole-of-season theoretical approach an apparent dead-end.

Nor can we legitimately simplify the analysis by switching to a team-by-team view, because team ladder finishes are inherently correlated. By this I mean that knowledge about the final ladder position of one team in a given season realisation alters the probability distribution for other, similarly skilled teams. To give an obvious example, the conditional probability of Sydney finishing 1st is zero in all realisations where Hawthorn finished 1st. The practical implication is that you can't treat each team's final ladder finish as an independent event. You can't, therefore, make separate assessments of the probability of correctly ordering each team and then multiply these together to estimate the joint probability of getting them all correct.

So, if an analytical approach won't work, my response is to adopt a simulation-based approach, for which purpose I'm going to reuse the assumptions and approach that I used for this earlier blog exploring the schedule strength of the 18 teams in 2015. Those assumptions can be used to derive an expected number of wins for every team, and this can then be used to create an implicit "correct" final team ordering:

1. Hawthorn
3. Sydney
4. Fremantle
5. Geelong
6. Kangaroos
7. West Coast
9. Richmond
10. Essendon
11. Carlton
12. Collingwood
13. Gold Coast
14. Western Bulldogs
15. Brisbane Lions
16. GWS
17. Melbourne
18. St Kilda

To some extent, the particular ordering of the teams here is irrelevant. What it serves to do for us is determine a "correct" or "most likely" result against which simulated seasons can be compared. We're assuming, if you like, that we know the correct ordering of the teams, given their relative strengths, Home Ground Advantages and the nature of their draw, and we're then assessing the extent to which the random nature of football, as summarised in the assumptions I've used for the simulations, serve to produce final ladder orderings different from this "correct" one.

These assessments then give us a best-case indication of the likely accuracy of an all-knowing forecaster's ladder predictions.
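
The simulation approach can be sketched in a few dozen lines. Everything below is illustrative rather than a reproduction of the actual analysis: the team names, ratings, home-ground advantage, margin standard deviation and all-plays-all schedule are all stand-in assumptions, not the values or draw used for the blog's 175,000 simulations.

```python
import random

# Stand-in inputs: these ratings, HGA and margin sd are illustrative
# assumptions, not the values underlying the blog's actual simulations.
TEAMS = ["Team%02d" % i for i in range(1, 19)]
RATINGS = {t: 18 - i for i, t in enumerate(TEAMS)}  # Team01 strongest
MARGIN_SD = 36   # assumed sd of the final margin about its expected value
HGA = 6          # assumed home-ground advantage, in points

def simulate_season(schedule, rng):
    """Play out one season, drawing each game margin from a Normal."""
    wins = {t: 0.0 for t in TEAMS}
    for home, away in schedule:
        expected = RATINGS[home] - RATINGS[away] + HGA
        margin = rng.gauss(expected, MARGIN_SD)
        if margin > 0:
            wins[home] += 1
        elif margin < 0:
            wins[away] += 1
        else:                       # treat a dead-level margin as a draw
            wins[home] += 0.5
            wins[away] += 0.5
    return wins

def n_correct(wins, correct_order):
    """Count teams finishing in exactly their 'correct' ladder position."""
    simulated = sorted(TEAMS, key=lambda t: -wins[t])
    return sum(s == c for s, c in zip(simulated, correct_order))

# Stand-in schedule: everyone hosts everyone once (the real draw is unbalanced).
schedule = [(a, b) for a in TEAMS for b in TEAMS if a != b]
correct_order = sorted(TEAMS, key=lambda t: -RATINGS[t])

rng = random.Random(42)
counts = [n_correct(simulate_season(schedule, rng), correct_order)
          for _ in range(2000)]
print(sum(counts) / len(counts))    # average number of correctly placed teams
```

Repeating this across many simulated seasons yields the distribution of the number of correctly placed teams discussed next.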

### How Many Correctly-Ordered Teams?

The results of 175,000 simulations of the 2015 season are summarised in the table at right. It makes for sobering reading.

For now, focus on the first two columns, which show, for example, that in about 6% of the simulated seasons, none of the teams finished in its correct position. The most likely result for an all-knowing forecaster is that he or she will correctly forecast the final ladder positions of only 2 or 3 teams (the average is 2.78 teams correct).

If the simulations are anything to go by, the chances of getting more than one half of the teams in exactly the right order are only about 1 in 1,700.

Whilst there's no a priori reason to surmise that the number of teams correctly ordered should follow any particular statistical distribution, the simulation results are broadly consistent with a Poisson distribution, albeit a truncated one, since there's a logical cap at 18. I fitted a Poisson to the simulation data, choosing its sole parameter lambda to minimise the sum of the absolute differences between the actual and fitted proportions at each observed count. The details of the fitted Poisson are shown in the two rightmost columns. The fit is pretty good, as you can see, especially for values of the number of correctly ordered teams from 0 to 11.
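
That fitting procedure is simple enough to sketch. The "observed" proportions below are generated from a Poisson rather than taken from the real simulation output (which isn't reproduced in the post), so this only demonstrates the minimise-sum-of-absolute-differences mechanism and that it recovers a known lambda.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def fit_lambda(observed_props):
    """Grid-search lambda to minimise the sum of absolute differences
    between observed and fitted proportions at each observed count."""
    grid = [g / 100 for g in range(100, 501)]     # lambda from 1.00 to 5.00
    loss = lambda lam: sum(abs(p - poisson_pmf(k, lam))
                           for k, p in observed_props.items())
    return min(grid, key=loss)

# Illustrative 'observed' proportions, generated here from a Poisson(2.78)
# as a stand-in for the real simulation results.
observed = {k: poisson_pmf(k, 2.78) for k in range(12)}
print(fit_lambda(observed))   # recovers 2.78
```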

To get an estimate of the probability of correctly ordering all 18 teams, we:

1. Sum the fitted values from the Poisson for the Number Correct between 0 and 16
2. Recognise that it's not possible to get exactly 17 of the 18 teams in the correct order (if you've got 17 right, you must also have the 18th right)
3. Assign the remaining probability to the assessment of getting all 18 teams correct. This gives a quasi-best case estimate of the likelihood of getting all 18 teams right.
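
The three steps above amount to a one-line tail calculation. Here I use the average of 2.78 correct teams quoted earlier as a stand-in for the fitted lambda, which is an assumption.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 2.78   # assumption: the simulated average number correct, used as
             # a stand-in for the fitted Poisson parameter

p_0_to_16 = sum(poisson_pmf(k, lam) for k in range(17))   # step 1
p_all_18 = 1 - p_0_to_16   # steps 2 and 3: 17 correct is impossible, so
                           # all the leftover tail mass goes to 18 correct
print(1 / p_all_18)        # about 1.37e8, i.e. roughly 1 in 137 million
```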

That gives an estimate of about 1 in 137,000,000 for getting all 18 teams right. I make no claims about the accuracy of that estimate, but it is comfortably smaller than the 1 in 20,000,000 implied by William Hill's offer, so that gives me some additional confidence that it is in the ballpark (or at least in the carpark of the ballpark).

It is, by the way, easy to come up with a worst-case estimate of the probability of correctly ordering all the teams. Since there are 18 teams in the competition, the number of possible final ladders is given by 18! (ie 18 x 17 x 16 .... x 3 x 2 x 1), which is about 6.4 quadrillion combinations. That's 64 followed by 14 zeros, which is, according to the unimpeachable Wikipedia, about the number of ants on the face of the planet. So, that's a few.

Now if we assume that any team is as likely as any other to finish in one of the 18 positions, then each of these combinations is equally likely, so the probability of correctly ordering the 18 teams is about 1 in 6.4 quadrillion. That's roughly equivalent to rolling 20 dice and having them all come down 6, or tossing 52 coins and having them all land Heads. Put slightly more formally in the information-theoretic language of surprisals, correctly tipping the final team order would be an outcome carrying 52.5 bits of surprisal.
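
The arithmetic in the last two paragraphs can be checked in a couple of lines:

```python
import math

n_orderings = math.factorial(18)
print(n_orderings)                        # 6402373705728000, about 6.4 quadrillion
print(round(math.log2(n_orderings), 1))   # 52.5 bits of surprisal for a uniform guess
```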

This is though, as I just mentioned, a worst-case estimate since it assumes you're unable to do any better than assume every team is as likely to finish first as it is last or indeed any other position on the ladder. In contrast, the simulation-based approach above attempts to take maximum advantage of our knowledge about relative team strengths and the actual 2015 schedule.

### Results for the Sum of the Absolute Rank Differences

Instead of assessing the accuracy of our team ladder predictions by considering only the number that we placed in the correct position, we might use a more discriminating metric in which we sum, for each team, the absolute difference between its predicted and actual ladder finish. I'll call this the Aggregate Absolute Rank Difference Metric (AARDM). The best possible value of this metric is 0, corresponding to ordering all 18 teams correctly, and the worst possible value is 162, corresponding to ordering all 18 teams in the reverse order to their actual ladder finish.
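
The metric itself is a one-liner; here's a sketch (with placeholder team names) that also confirms the best and worst possible values just quoted:

```python
def aardm(predicted, actual):
    """Aggregate Absolute Rank Difference Metric: the sum, over teams, of the
    absolute difference between predicted and actual ladder rank (1 = top)."""
    return sum(abs(predicted[t] - actual[t]) for t in actual)

teams = ["Team%02d" % i for i in range(1, 19)]       # placeholder names
true_ranks = {t: i + 1 for i, t in enumerate(teams)}
reversed_ranks = {t: 19 - r for t, r in true_ranks.items()}

print(aardm(true_ranks, true_ranks))       # 0, the best possible value
print(aardm(reversed_ranks, true_ranks))   # 162, the worst possible value
```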

We proceed again by simulating seasons in the same way, this time for 50,000 new seasons, and then use those simulations to construct a distribution of values for the AARDM for that same all-knowing forecaster, operating in 2015.

The first chart shows this distribution in probability density form so that, for example, we can say that an AARDM of 44 would be expected about 9% of the time, and one below 20 or above 70 would be quite rare. In fact, a score below 20 was achieved in only 0.2% of simulations, and a score above 70 in only 0.25%.

The average AARDM across the 50,000 simulated seasons was 42.7 and the median 42. The lowest score recorded was 12 and the highest 88. (Note that the nature of the metric is such that only even values are possible.)
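
The even-values property follows because the signed rank differences across the 18 teams must sum to zero, so the sum of their absolute values has the same (even) parity. A quick empirical check over random orderings, assuming nothing beyond the metric's definition:

```python
import random

def aardm_of(perm):
    """AARDM of a predicted ordering against the identity ordering,
    where perm[i] is the predicted rank of the team truly ranked i+1."""
    return sum(abs(p - (i + 1)) for i, p in enumerate(perm))

rng = random.Random(1)
ranks = list(range(1, 19))
for _ in range(10_000):
    rng.shuffle(ranks)
    assert aardm_of(ranks) % 2 == 0     # only even values ever occur
print("all 10,000 random orderings gave an even AARDM")
```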

These same results can, instead, be viewed in the form of a cumulative distribution function, as seen here at left. This tracks the proportion of simulations for which a given AARDM or lower was achieved.

So, for example, scores of 60 or less occurred in about 97% of simulations. The median score of 42 can be read off as the x-value at which the cumulative probability is 50%.

Before finishing, I think it's worth thinking about the factors that make team orderings in a particular season easier or harder to predict. Orderings, I contend, will be more difficult to predict when:

• Teams are more evenly matched
• Schedules are shorter (especially if mis-matches are deliberately avoided)
• Individual game results are more highly variable (in other words, the standard deviation of the final game margin about its expected value is higher)

These factors will differ from year to year, so a good AARDM in one year might be more or less impressive in another.

### WRAP UP

There are a vast number of different possible orderings that the 18 AFL teams might produce at the end of this season and, while our knowledge of teams' relative abilities and schedules helps us shorten the odds, simulation suggests that correctly ordering half of them or more is an extremely unlikely achievement. Getting 4 or 5 of them exactly right seems like a more realistic aspiration.

Similarly, an AARDM of under 24 looks a daunting target while one under 32 seems challenging but achievable.