Are AFL Games Like Random Walks?

I came across an interesting journal article this week, published in March of 2015 on arXiv and entitled "Safe Leads and Lead Changes in Competitive Team Sports".

Its abstract provides the following summary:

"We investigate the time evolution of lead changes within individual games of competitive team sports. Exploiting ideas from the theory of random walks, the number of lead changes within a single game follows a Gaussian distribution. We show that the probability that the last lead change and the time of the largest lead size are governed by the same arcsine law, a bimodal distribution that diverges at the start and at the end of the game. We also determine the probability that a given lead is “safe” as a function of its size L and game time t. Our predictions generally agree with comprehensive data on more than 1.25 million scoring events in roughly 40,000 games across four professional or semi-professional team sports, and are more accurate than popular heuristics currently used in sports analytics."

There is, as the abstract implies, a lot in this article, but for this blog I'll be focussing solely on the notion of "safe" leads which, in the context of the piece, are leads that are "unlikely", in some probabilistic sense, to be surrendered over the remainder of a contest.

The arXiv article looks mainly at the scoring progression in games from the NBA, CFB, NHL and NFL, but its methodology is well-suited to AFL games as well.

Section V of the paper provides an equation for the probability that a lead of size L is safe assuming that:

  • teams are, roughly, evenly matched
  • τ (tau) seconds of the contest remain
  • the contest generates Scoring Events, on average, every ΔT seconds (and this average does not vary significantly across the duration of the contest)
  • s points are generated, on average, per Scoring Event
  • scoring persistence, the probability that a team scores again having scored last, is p

Given all those parameters, the probability that a lead of size L is safe is given by the equation at right, where erf is the Gauss error function.
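Below is a form that reproduces the figures quoted later in this post, for example the 31% chance that a 1-point lead with 60 seconds left is safe and the 87% chance for a 30-point half-time lead. It is a reconstruction under stated assumptions, not a transcription of the paper's Section V. In particular, it assumes that persistence enters only by scaling the random walk's diffusion coefficient D by p/(1 − p):

```latex
Q(L,\tau) = \operatorname{erf}\!\left(\frac{L}{\sqrt{4D\tau}}\right),
\qquad
D = \frac{s^{2}}{2\,\Delta T}\cdot\frac{p}{1-p}
```

With p = 0.5 (no persistence) the factor p/(1 − p) equals 1, and D reduces to s²/(2ΔT). Higher persistence inflates D and so lowers the probability that any given lead is safe.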

The paper suggests, slightly incorrectly I think, that this gives the probability that "the team leading at time t will ultimately win the game". Instead, I'd suggest, it gives the probability that a team with lead L at time t will not see that lead completely eroded at any point over the remainder of the contest. It's entirely possible for a team to see its lead eroded yet still go on to win, but this probability does not include that eventuality.

Regardless, to make this equation operational we need estimates of its parameters, for which purpose I've turned to some previous score progression data that was provided to me by Paul from the afltables site and which covers all games from R1 of 2008 through to the end of the 2014 home-and-away season. 

A typical AFL game in that period lasted for about 7,300 seconds and produced 50.2 Scoring Shots, each generating 3.65 points on average, with a Scoring Shot recorded, on average, every 145 seconds. There are some differences in the quarter-by-quarter statistics, most notably that scoring, scoring frequency, and accuracy tend to increase in the second half of games compared to the first. Even so, it appears that no major disservice will be done if the all-quarter averages are used to characterise the scoring at any given time.
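The 145-second figure follows directly from the game-level averages. The arithmetic check below uses illustrative variable and function names:

```python
# Game-level averages for 2008 to 2014 as quoted in the text.
GAME_SECONDS = 7_300      # typical game length (seconds)
SHOTS_PER_GAME = 50.2     # average Scoring Shots per game
POINTS_PER_SHOT = 3.65    # average points per Scoring Shot (s)


def mean_seconds_between_shots(game_seconds: float, shots: float) -> float:
    """Average time between Scoring Shots, i.e. Delta T in the safe-lead formula."""
    return game_seconds / shots


delta_t = mean_seconds_between_shots(GAME_SECONDS, SHOTS_PER_GAME)
points_per_game = POINTS_PER_SHOT * SHOTS_PER_GAME  # implied combined score

print(f"Delta T = {delta_t:.1f} s")               # prints "Delta T = 145.4 s"
print(f"points per game = {points_per_game:.1f}")  # prints "points per game = 183.2"
```

The implied combined score of roughly 183 points per game is simply s multiplied by the number of Scoring Shots.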

Another key parameter in the model postulated in the paper is the "persistence" of scoring, which is the tendency for the team that scored last to also score next. The table at left summarises the data on this aspect for the period noted earlier and reveals that, when the last score was a goal by one team, the next score is a goal by that same team 28% of the time, a behind by that same team 25% of the time, a goal by the opposing team 25% of the time, and a behind by the opposing team 22% of the time.

The table also provides similar data for the situation where the previous score was a behind and, overall, suggests that "persistence" in AFL - the rate at which a score by one team is followed by another score by the same team - runs at about 54%.
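The persistence figure is a proportion over consecutive pairs of scores. The sketch below assumes each game is available as an ordered list recording which team scored each time. The "H"/"A" encoding and the function name are hypothetical, and goals and behinds are treated alike:

```python
from typing import Iterable, Sequence


def pooled_persistence(games: Iterable[Sequence[str]]) -> float:
    """Share of consecutive scoring-event pairs, within each game, in which
    the same team scored both times (goals and behinds treated alike)."""
    same = total = 0
    for scorers in games:
        # Pair each score only with the next score in the same game,
        # so that no pair straddles two different games.
        for prev, nxt in zip(scorers, scorers[1:]):
            total += 1
            same += prev == nxt
    if total == 0:
        raise ValueError("need at least one game with two or more scores")
    return same / total


# Made-up scoring sequences: "H" = home team scored, "A" = away team scored.
games = [
    ["H", "H", "A", "H", "H", "H", "A", "A", "H", "A"],
    ["A", "A", "H"],
]
print(f"persistence = {pooled_persistence(games):.3f}")  # 5 of 11 pairs
```

Pooling the pairs across games, rather than averaging per-game rates, weights each game by how many scores it produced.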

I've used that 54% figure, along with the all-quarter averages for scoring rates and scoring returns per Scoring Shot, in producing the table at right, on the assumption that these all-quarter averages are a reasonable proxy for the quarter-by-quarter data. The table shows the probability that a lead of a given size with some specified time remaining will be run down over the course of the rest of the game.

So, for example, it suggests that a 54-point lead at (roughly) quarter time, when about 5,400 seconds of the contest can still be assumed to remain, will be run down only about 2% of the time.

It also suggests that a 3-point lead with 60 seconds left will be run down only 23% of the time.
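The Python sketch below offers a rough check on these two examples. It assumes the form Q = erf(L / √(4Dτ)) with a persistence-adjusted diffusion coefficient D = s²/(2ΔT) · p/(1 − p). That form is inferred from the figures in this post rather than taken verbatim from the paper, and the function name is hypothetical:

```python
from math import erf, sqrt


def safe_lead_probability(lead: float, tau: float, delta_t: float,
                          s: float, p: float) -> float:
    """Probability that a lead of `lead` points with `tau` seconds left is never
    fully eroded, for evenly matched teams (no drift), using the assumed
    persistence-adjusted diffusion coefficient D = s^2 / (2 * delta_t) * p / (1 - p)."""
    if not 0 < p < 1:
        raise ValueError("persistence p must lie strictly between 0 and 1")
    if tau <= 0:
        return 1.0 if lead > 0 else 0.0  # no time left: any positive lead survives
    d = s ** 2 / (2 * delta_t) * p / (1 - p)
    return erf(lead / sqrt(4 * d * tau))


# All-quarter averages quoted above: points per shot, seconds per shot, persistence.
S, DELTA_T, P = 3.65, 145.0, 0.54

for lead, tau in [(54, 5_400), (3, 60)]:
    q = safe_lead_probability(lead, tau, DELTA_T, S, P)
    print(f"{lead:>2}-point lead, {tau:>5} s left: run down {1 - q:.1%}")
```

Under these assumptions the 54-point lead is run down roughly 2.5% of the time and the 3-point lead roughly 24% of the time, close to the figures read from the table.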

Now this first table assumes a rate of "persistence" consistent with recent history, but this year seems to have been characterised by higher levels of persistence (or "momentum" if you like), so it might be worth reproducing the same table assuming a higher rate of persistence, say 60%.

The result of doing this appears at left, and reveals that the implications of changing the level of persistence, at least to the extent explored here, are only small. Even with only 60 seconds left and a desperate 1-point lead, the probability that the lead survives shrinks only from 31% to 27% at the higher level of persistence.

Big leads are less safe in a world with more persistence, but only slightly so. A 30-point lead at half-time (i.e. with 3,600 seconds to go) is still safe 82% of the time, only 5 percentage points less often than in the world with lower levels of persistence.
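Under the assumed formula Q = erf(L / √(4Dτ)), the probability depends on L, D and τ only through L/√(Dτ). The effect of raising persistence can therefore be summarised as a single scaling factor on lead size. The sketch below uses the same assumed D = s²/(2ΔT) · p/(1 − p), and its function names are hypothetical:

```python
from math import erf, sqrt


def diffusion(s: float, delta_t: float, p: float) -> float:
    """Assumed persistence-adjusted diffusion coefficient (points^2 per second)."""
    return s ** 2 / (2 * delta_t) * p / (1 - p)


def safe(lead: float, tau: float, d: float) -> float:
    """Assumed safe-lead probability erf(L / sqrt(4 D tau))."""
    return erf(lead / sqrt(4 * d * tau))


S, DELTA_T = 3.65, 145.0  # all-quarter averages
d_low, d_high = diffusion(S, DELTA_T, 0.54), diffusion(S, DELTA_T, 0.60)

# A lead must grow by sqrt(D_high / D_low) to be equally safe at higher persistence.
print(f"lead scaling factor: {sqrt(d_high / d_low):.3f}")
for lead, tau in [(1, 60), (30, 3_600)]:
    print(f"{lead:>2}-pt lead, {tau:>4} s left: "
          f"safe {safe(lead, tau, d_low):.0%} (p=0.54) vs "
          f"{safe(lead, tau, d_high):.0%} (p=0.60)")
```

Under these assumptions the factor is about 1.13. A lead would need to be roughly 13% larger at 60% persistence to be as safe as it was at 54%, which is one way of seeing why the differences are small. The loop gives figures in line with the 31% versus 27% and 87% versus 82% discussed above.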

But, maybe, games with higher rates of persistence are also characterised by greater accuracy and higher rates of scoring generally, so let's consider a situation where persistence is 60%, the average yield per Scoring Shot is 4 points, and the average time between Scoring Shots is just 2 minutes.

We again find that leads of any size are less safe than they were under the assumption that the relevant parameters could be estimated from the all-quarter analysis. Again, though, the probability that a 1-point lead with 60 seconds to go survives is only slightly different under these assumptions than under the original ones: it falls from 31% to 23%.

Our 30-point lead at half-time is also still safe 74% of the time, down from the 87% we had in the original table.
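The original parameters can be compared with this faster, higher-yielding scenario in a few lines. The sketch assumes the same form as before, Q = erf(L / √(4Dτ)) with D = s²/(2ΔT) · p/(1 − p). The scenario labels and the function name are illustrative only:

```python
from math import erf, sqrt


def safe_lead_probability(lead: float, tau: float, delta_t: float,
                          s: float, p: float) -> float:
    """Assumed form: erf(L / sqrt(4 D tau)) with D = s^2 / (2 dT) * p / (1 - p)."""
    d = s ** 2 / (2 * delta_t) * p / (1 - p)
    return erf(lead / sqrt(4 * d * tau))


# (label, s points per Scoring Shot, dT seconds per Scoring Shot, persistence p)
SCENARIOS = [
    ("all-quarter averages", 3.65, 145.0, 0.54),
    ("p = 0.60, s = 4, dT = 120 s", 4.0, 120.0, 0.60),
]
results = {}
for label, s, dt, p in SCENARIOS:
    results[label] = (safe_lead_probability(1, 60, dt, s, p),
                      safe_lead_probability(30, 3_600, dt, s, p))
    one_pt, thirty_pt = results[label]
    print(f"{label:<28} 1-pt lead, 60 s left: {one_pt:.0%}; "
          f"30-pt lead at half-time: {thirty_pt:.0%}")
```

Under these assumptions D rises from about 0.054 to 0.1 points² per second. That increase is what pulls the two probabilities down to roughly 23% and 74%.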


So, how does this model of scoring perform empirically? To answer this question let's again use the scoring progression data we have for 2008 to 2014 and, for every scoring event in the third and fourth quarters of the games in that sample, use the model to estimate the probability that the lead at that scoring event is safe given the size of the lead and the time remaining. In the first game in our sample, for example, Richmond led Carlton by 18 points with 1,076 seconds to go. Given that position, the model estimates the Tigers' lead should be safe 91% of the time.
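The procedure can be sketched as a filter over scoring events followed by the safe-lead calculation. The record layout is hypothetical, the formula is the assumed persistence-adjusted form used above, and the quarter attached to the Richmond example is an inference:

```python
from dataclasses import dataclass
from math import erf, sqrt

S, DELTA_T, P = 3.65, 145.0, 0.54         # all-quarter averages (2008 to 2014)
D = S ** 2 / (2 * DELTA_T) * P / (1 - P)  # assumed persistence-adjusted D


@dataclass
class ScoringEvent:
    """Hypothetical record for one scoring event."""
    quarter: int
    seconds_remaining: float  # time left in the whole game, not the quarter
    lead: int                 # leading team's margin after the event (0 = level)


def assessed_safe_probability(event: ScoringEvent) -> float:
    """Model probability that the current lead is never fully eroded."""
    if event.seconds_remaining <= 0:
        return 1.0 if event.lead > 0 else 0.0
    return erf(event.lead / sqrt(4 * D * event.seconds_remaining))


events = [
    ScoringEvent(quarter=2, seconds_remaining=4_000, lead=7),   # made up; filtered out
    ScoringEvent(quarter=4, seconds_remaining=1_076, lead=18),  # Richmond v Carlton (quarter assumed)
]
second_half = [e for e in events if e.quarter in (3, 4)]
for e in second_half:
    print(f"Q{e.quarter}, {e.seconds_remaining:.0f} s left, lead {e.lead}: "
          f"safe {assessed_safe_probability(e):.0%}")
```

For the Richmond position this returns roughly 0.9, in line with the 91% quoted.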

We can then also calculate, with the benefit of knowing how the remainder of each game actually unfolded, whether or not the lead was, indeed, safe, and so assess how well calibrated the model's probability assessments are. A well-calibrated forecaster is one for whom the events assessed as X% likely happen X% of the time, for all values of X.
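Deciding whether a lead "was, indeed, safe" means checking that it was never completely eroded later in the game, with level scores counting as erosion. The sketch below assumes margins are stored from a fixed perspective, for example home minus away; that convention and the function name are hypothetical:

```python
from typing import Sequence


def lead_was_safe(margin_now: int, later_margins: Sequence[int]) -> bool:
    """True if the team leading now is never caught: every later margin
    (same home-minus-away convention) stays strictly on the leader's side of zero."""
    if margin_now == 0:
        return False  # scores level: there is no lead to be safe
    sign = 1 if margin_now > 0 else -1
    return all(sign * m > 0 for m in later_margins)


print(lead_was_safe(18, [12, 19, 13, 7, 13]))  # prints True: never caught
print(lead_was_safe(5, [5, 0, 6]))             # prints False: scores were levelled
```

Note that a team whose lead is levelled and which then goes on to win still counts as "not safe" here, matching the definition of safety used earlier.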

The results of performing such an analysis appear in the table at left in which each row summarises the performance of the model for probability assessments falling within specific bounds. The first row, for example, summarises the results for those scoring events after which the model assessed the probability of the current lead being safe as less than 5%.

It reveals that, when such assessments were made in 3rd Quarters, the average estimated probability that the lead was safe (averaged across all the times when the assessed probability was under 5%) came in at about 4%, but that the leads at the time were safe, in fact, about 15% of the time. The better calibrated the model, the closer will be these two numbers.
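A calibration table of this kind groups assessments into probability bands, then compares the mean assessed probability with the observed rate in each band. The sketch below uses hypothetical bin edges that may not match those in the table, and a made-up sample:

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Tuple

# Hypothetical bin edges giving bins [0, 5%), [5%, 15%), ..., [85%, 95%), [95%, 100%].
EDGES = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]


def calibration_table(
    assessments: Iterable[Tuple[float, bool]],
) -> List[Tuple[int, int, float, float]]:
    """Bin (assessed probability lead is safe, was it actually safe?) pairs and
    return (bin index, count, mean assessed probability, observed safe rate)."""
    totals: Dict[int, Tuple[int, float, int]] = {}
    for prob, was_safe in assessments:
        b = bisect_right(EDGES, prob)
        n, prob_sum, safe_count = totals.get(b, (0, 0.0, 0))
        totals[b] = (n + 1, prob_sum + prob, safe_count + int(was_safe))
    return [(b, n, prob_sum / n, safe_count / n)
            for b, (n, prob_sum, safe_count) in sorted(totals.items())]


# Tiny made-up sample: (model probability the lead is safe, was it safe?)
sample = [(0.03, False), (0.04, True), (0.02, False),
          (0.91, True), (0.93, True), (0.90, False)]
for b, n, mean_p, observed in calibration_table(sample):
    print(f"bin {b:>2}: n={n}, mean assessed {mean_p:.2f}, observed safe {observed:.2f}")
```

The closer the last two columns are in each row, the better calibrated the model.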

Looking first at 3rd Quarters, the results suggest that the model is quite well calibrated for probability assessments greater than 15% but is a little conservative about how safe leads are when its probability assessments are smaller.

In 4th Quarters the model is well-calibrated only for probability assessments greater than 55% and is, again, too conservative when its probability assessments are lower.

The conservatism in the model can't be blamed on our having used all-quarter scoring averages rather than quarter-specific ones. Scoring is more frequent, and the average score per event higher, in 3rd and 4th Quarters, so using quarter-specific metrics would tend to make the model even more conservative here.

More likely, the conservatism stems from the assumption that the teams are always evenly matched, which would tend to make larger leads less likely to be deemed safe than they really are. The paper does discuss how the probability formula might be adjusted to cater for teams of unequal ability, and this is a topic I might look at in a future post.

In the meantime and in any case, I think it's interesting to see how well the model performs generally, especially when it deems a lead to be safe with a probability in excess of 50%. There is then, apparently, something of a random walk in the progression of the score in an AFL game.