As I'm fairly certain I've commented before: it's somehow part startling and part comforting when the Normal distribution turns up at a party to which it's not been formally invited.
In putting together the most recent blogs on the in-running models I had reason to analyse the distribution of the change in the Home team lead from the end of one quarter to the next, using data for the period 2006 to 2012. That gives three random variables: one defined by the change in the lead between the end of Q1 and the end of Q2, another by the change between the end of Q2 and the end of Q3, and the third by the change between the end of Q3 and the end of Q4. All three turned out to be statistically indistinguishable from Normal distributions.
An ocular assessment suggests that a Normal distribution will do a fine job of describing all three of these curves, and the Shapiro-Wilk test fails in all three cases to reject the null hypothesis that they are, indeed, Normally distributed (the p-values are 0.42, 0.57 and 0.64, respectively).
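For anyone wanting to run the same kind of check, here's a minimal sketch in Python using `scipy.stats.shapiro`. The data are synthetic stand-ins, not the 2006 to 2012 results: the sample size, mean and standard deviation below are assumptions chosen purely for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for one quarter-to-quarter lead change series.
# Assumed values (NOT the real data): 1,300 games, mean 2 points, SD 17.5 points.
rng = np.random.default_rng(seed=42)
lead_changes = np.round(rng.normal(loc=2.0, scale=17.5, size=1300))

# Shapiro-Wilk: the null hypothesis is that the sample comes from a Normal distribution.
statistic, p_value = stats.shapiro(lead_changes)
print(f"W = {statistic:.4f}, p = {p_value:.3f}")
```

A large p-value here means only that the test found no evidence against Normality, which is the sense in which the three distributions are "statistically indistinguishable" from Normal.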
What's more, the three distributions are all remarkably similar in terms of mean, median and skewness:
- Lead Change Between End of Q1 and Q2: Mean 1.998; Median 2.000; Skewness -0.0096
- Lead Change Between End of Q2 and Q3: Mean 1.992; Median 2.000; Skewness -0.0022
- Lead Change Between End of Q3 and Q4: Mean 1.958; Median 2.000; Skewness -0.0239
Frankly, that's abnormally Normal.
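These summary statistics are simple to compute for any list of lead changes. The sketch below uses only Python's standard library. The skewness formula is the plain moment-based version, which may not be the exact definition used for the figures above, and the `lead_changes` values are made up for illustration.

```python
import statistics

def moment_skewness(values):
    """Moment-based skewness: mean cubed deviation divided by the cubed population SD."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)

# Made-up lead changes in points, for illustration only (not the real data).
lead_changes = [-21, -9, -4, 0, 2, 3, 7, 12, 25]

summary = {
    "mean": statistics.fmean(lead_changes),
    "median": statistics.median(lead_changes),
    "skewness": moment_skewness(lead_changes),
}
print(summary)
```

A skewness near zero, together with a mean and median that almost coincide, is what you'd expect from a symmetric distribution such as the Normal.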
For practical purposes, it's also useful to know the ranges that encompass 50%, 75% and 90% of these distributions:
- Lead Change Between End of Q1 and Q2: 90% range (-26 to +31); 75% range (-19 to +22); 50% range (-10 to +14)
- Lead Change Between End of Q2 and Q3: 90% range (-28 to +33); 75% range (-20 to +23); 50% range (-11 to +14)
- Lead Change Between End of Q3 and Q4: 90% range (-30 to +32); 75% range (-19 to +23); 50% range (-11 to +15)
(We used these ranges in the previous blog to explore the appropriate range over which we might safely apply the in-running models.)
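Ranges like these can be read straight off the empirical percentiles. Here's a rough sketch, assuming equal tails are trimmed from each side (so the 90% range runs from the 5th to the 95th percentile); that convention, the helper name `central_range` and the synthetic data are all assumptions made for illustration.

```python
import random
import statistics

def central_range(values, coverage):
    """Return (low, high) bounding the central `coverage` share of the values."""
    # 999 cut points at 0.1%, 0.2%, ..., 99.9% of the sorted data.
    cuts = statistics.quantiles(values, n=1000, method="inclusive")
    tail = (1 - coverage) / 2
    low_idx = round(tail * 1000) - 1
    high_idx = round((1 - tail) * 1000) - 1
    return cuts[low_idx], cuts[high_idx]

# Synthetic lead changes (NOT the real data): assumed mean 2 points, SD 17.5 points.
rng = random.Random(2012)
synthetic = [round(rng.gauss(2.0, 17.5)) for _ in range(1300)]

for coverage in (0.90, 0.75, 0.50):
    low, high = central_range(synthetic, coverage)
    print(f"{coverage:.0%} range: {low:+.0f} to {high:+.0f}")
```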
These ranges further demonstrate the remarkable similarity of these three distributions. You could use these details, for example, to frame a market on the Home team lead, and offer even money that it would change by between about -2.0 and +2.5 goals from the end of any one quarter to the next.
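To sanity-check a market like that, you could count how often the lead change actually landed inside the proposed band and convert that share into fair odds. The sketch below assumes the lead is measured in points and that a goal is worth 6 points; the function names and the sample data are hypothetical.

```python
POINTS_PER_GOAL = 6  # assumption: lead is recorded in points, one goal = 6 points

def share_within(lead_changes, low_goals, high_goals):
    """Share of lead changes (in points) falling inside a band quoted in goals."""
    low = low_goals * POINTS_PER_GOAL
    high = high_goals * POINTS_PER_GOAL
    inside = sum(1 for change in lead_changes if low <= change <= high)
    return inside / len(lead_changes)

def fair_decimal_odds(probability):
    """Decimal odds with no bookmaker margin; 2.0 corresponds to even money."""
    return 1 / probability

# Made-up lead changes in points, for illustration only.
sample = [-13, -12, 0, 15, 16, 30]
p = share_within(sample, -2.0, 2.5)
print(f"share inside band: {p:.2f}, fair decimal odds: {fair_decimal_odds(p):.2f}")
```

If the share for the real data came out well away from 0.5, the band would need widening or narrowing before even money was a fair price.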
The only lead change that looks a little different is the one between the end of Q3 and the end of the game, where there seems to be an atypical 'bump' close to zero. The charts below show the same density information as appears in the chart above, along with the count data on which it is based.
Inspecting these, there does indeed seem to be a small gap in the final term at Home team lead changes slightly larger than zero (see the chart on the right), suggesting that Home teams are relatively less likely to extend their lead by only a very small amount in the final term. More likely is that they'll not extend it at all, see it reduced by a little, or increase it by a little bit more than a little - if you'll forgive the jargon.