In an earlier blog we estimated empirical relationships between Home Teams' success rate in each Quarter of the game and their Implicit Probability of Victory, as reflected in the TAB Bookmaker's pre-game prices. It turned out that this relationship appeared to be quite similar for all four Quarters, with the possible exception of the 3rd. We also showed that there was a near one-to-one relationship between the Home Team's Implicit Probability and its actual Victory Probability - in other words, that the TAB Bookmaker's forecasts were well-calibrated.
Together, these results imply an empirical relationship between the Home Team's likelihood of winning a Quarter and its likelihood of winning an entire Game. In this blog I'm going to draw on a little probability theory to see if I can derive that relationship theoretically, largely from first principles.
THE FULL DERIVATION
The axioms of standard probability theory allow us to state that the probability of a Home Team victory is equal to the sum, for X equal to 0 through 4, of the probability that the Home Team wins X Quarters times the probability that it is victorious given that it wins X Quarters. Or, as an equation:
$$\Pr(\text{Home Team Wins}) = \sum_{X=0}^{4} \Pr(\text{Home Team Wins } X \text{ quarters}) \times \Pr(\text{Home Team Wins} \mid \text{Home Team Wins } X \text{ quarters})$$
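Now let's assume - based on what we've seen empirically - that the Home Team has a fixed probability, p, of winning any of the four quarters, and that the results of the quarters are independent of one another. The number of quarters won then follows a binomial distribution, so we can say that: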
- $\Pr(\text{Home Team Wins 0 quarters}) = (1-p)^4$
- $\Pr(\text{Home Team Wins 1 quarter}) = 4p(1-p)^3$
- $\Pr(\text{Home Team Wins 2 quarters}) = 6p^2(1-p)^2$
- $\Pr(\text{Home Team Wins 3 quarters}) = 4p^3(1-p)$
- $\Pr(\text{Home Team Wins 4 quarters}) = p^4$
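The five probabilities in this list can be sketched in a few lines of Python. This is only an illustration under the assumptions stated above (a fixed p, independent quarters, no tied quarters); the function name `quarter_win_distribution` is my own and doesn't come from the original analysis.

```python
from math import comb


def quarter_win_distribution(p, n_quarters=4):
    """Return [Pr(win 0 quarters), ..., Pr(win n_quarters quarters)].

    Assumes each quarter is won independently with the same probability p
    and that quarters cannot be tied (both are simplifying assumptions).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return [
        comb(n_quarters, k) * p**k * (1 - p) ** (n_quarters - k)
        for k in range(n_quarters + 1)
    ]


if __name__ == "__main__":
    # Illustrative value of p only; not an estimate for any real team.
    for k, prob in enumerate(quarter_win_distribution(0.55)):
        print(f"Pr(Home Team Wins {k} quarters) = {prob:.4f}")
```

Whatever value of p is used, the five probabilities sum to 1, which is a handy sanity check.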
Next, from empirical data for the period 2006 to 2012, we can determine that:
- Probability(Home Team Wins | Home Team Wins 0 quarters) = 0% (obviously - if you don't win a Quarter, you don't win the game)
- Probability(Home Team Wins | Home Team Wins 1 quarter) = 6%
- Probability(Home Team Wins | Home Team Wins 2 quarters) = 51%
- Probability(Home Team Wins | Home Team Wins 3 quarters) = 97%
- Probability(Home Team Wins | Home Team Wins 4 quarters) = 100% (also obviously)
(Note that, for simplicity, I've ignored the possibility of tied quarters).
Combining all these pieces of information allows us to create an expression that relates the probability of a Home team victory to its fixed probability, p, of winning a Quarter:
$\Pr(\text{Home Team Wins}) = 0.06 \times 4p(1-p)^3 + 0.51 \times 6p^2(1-p)^2 + 0.97 \times 4p^3(1-p) + p^4$
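As a rough sketch, the quartic can be evaluated by weighting each binomial term by the matching conditional victory probability. The names `EMPIRICAL_CONDITIONALS` and `home_win_probability` are hypothetical; the percentages are the empirical values quoted above.

```python
from math import comb

# Conditional victory probabilities quoted above for 2006 to 2012,
# indexed by the number of quarters won (0, 1, 2, 3, 4).
EMPIRICAL_CONDITIONALS = (0.0, 0.06, 0.51, 0.97, 1.0)


def home_win_probability(p, conditionals=EMPIRICAL_CONDITIONALS):
    """Pr(Home Team Wins) as a function of the per-quarter probability p.

    Applies the law of total probability over the number of quarters won,
    assuming independent quarters with a fixed win probability p.
    """
    n = len(conditionals) - 1
    return sum(
        c * comb(n, k) * p**k * (1 - p) ** (n - k)
        for k, c in enumerate(conditionals)
    )


if __name__ == "__main__":
    for p in (0.3, 0.4, 0.5, 0.6, 0.7):
        print(f"p = {p:.1f} -> Pr(Home Team Wins) = {home_win_probability(p):.4f}")
```

Passing a different tuple of conditionals, such as (0, 0, 0.5, 1, 1), makes it easy to try other assumptions about how quarters won translate into victories.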
The graph of this relationship should, by now, look a little familiar.
If you switch "Home Team Implicit Probability" for "Probability Wins Game" (on the x-axis) and "Probability Home Team Wins Quarter" for "Probability Wins Quarter" (on the y-axis), then you have a chart that's extremely similar to those from the earlier blog.
For example, at the anchor points (as I referred to them in the earlier blog) of 25%, 50% and 75%, the Probability Wins Quarter values are within about 3 to 5 percentage points of what we found empirically in that earlier blog.
A SIMPLIFYING ASSUMPTION
Actually, we can get even closer to the values we found in that blog by replacing the empirical values for the conditional victory probabilities in the equation above - the 0%, 6%, 51%, 97% and 100% - with values that we might have come up with based on the reasonable simplification that Home teams winning 0 or 1 quarters all lose, Home teams winning 3 or 4 quarters all win, and those that win 2 quarters win half of the time.
The previous equation then simplifies to:
- $\Pr(\text{Home Team Wins}) = 3p^2(1-p)^2 + 4p^3(1-p) + p^4 = p^2(3-2p)$
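For anyone wanting to check the algebra, the leading 3 comes from 0.5 x 6, and expanding each term shows that the fourth-power terms cancel, leaving a cubic:

```latex
\begin{aligned}
3p^2(1-p)^2 &= 3p^2 - 6p^3 + 3p^4 \\
4p^3(1-p) &= 4p^3 - 4p^4 \\
3p^2(1-p)^2 + 4p^3(1-p) + p^4 &= 3p^2 - 2p^3 + (3 - 4 + 1)\,p^4 \\
&= p^2(3 - 2p)
\end{aligned}
```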
The chart for this cubic equation is very similar to the earlier quartic, the only noticeable difference being for smaller values of p, as you can see from the following chart, which shows both the quartic (in blue) and the cubic (in red).
That's a remarkable simplification. What we're saying is that the Home Team's victory probability, V, is directly related to its probability of winning any single Quarter, p, via the relationship $V = p^2(3-2p)$.
As simple as this equation is, it's surprisingly challenging to find a simple way of expressing its inverse - that is, to express p as a function of V.
An equation in this form would be far more useful to us in practice since we can get an estimate of V from the Bookmaker's pre-game prices, and then use that estimate to make claims - and even to make wagers - about the outcome of individual quarters.
To come up with an approximation for the inverse function I eventually called upon the services of Eureqa, feeding it a range of value pairs for V and p and then asking it to create an equation to fit p as a function of V. It's the extreme values of p and V - those near 0 and 1 - that are the hardest to fit, so I restricted the fit to V values in the interval (0.05, 0.95). In that range, the following equation provides values for p that are in error by at most about 0.7 percentage points:
$p = 0.031720 + 2.27862V - 7.65809V^2 + 17.19818V^3 - 18.12259V^4 + 7.24110V^5$
Even outside that range the approximation isn't too egregious: the error is never more than a couple of percentage points.
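One way to get a feel for the quality of the fit is to generate (p, V) pairs from the cubic and compare each p with the value the fitted polynomial returns for the matching V. The sketch below does that using the polynomial's coefficients as quoted in full in this blog; the function names and the grid size are my own assumptions, and it reports the error on a grid of p values rather than proving a bound.

```python
# Coefficients of the fitted polynomial in V, lowest power first.
COEFFS = (0.031720, 2.27862, -7.65809, 17.19818, -18.12259, 7.24110)


def victory_probability(p):
    """Simplified cubic relationship: V = p^2 (3 - 2p)."""
    return p * p * (3 - 2 * p)


def approx_quarter_probability(v):
    """Evaluate the fitted polynomial for p at V = v using Horner's rule."""
    result = 0.0
    for c in reversed(COEFFS):
        result = result * v + c
    return result


def max_abs_error(v_lo=0.05, v_hi=0.95, steps=2000):
    """Largest |approximate p - true p| over a grid of p with V in [v_lo, v_hi]."""
    worst = 0.0
    for i in range(steps + 1):
        p = i / steps
        v = victory_probability(p)
        if v_lo <= v <= v_hi:
            worst = max(worst, abs(approx_quarter_probability(v) - p))
    return worst


if __name__ == "__main__":
    print(f"Max error for V in [0.05, 0.95]: {max_abs_error():.4f}")
    print(f"Max error for V in [0.00, 1.00]: {max_abs_error(0.0, 1.0):.4f}")
```

Working from p to V sidesteps the need to invert the cubic at all, since every grid point supplies an exact (p, V) pair.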
We've now got a pair of equations that allow us to relate a team's probability of winning any quarter (p) to its probability of victory (V).
$V = p^2(3-2p)$
$p = 0.031720 + 2.27862V - 7.65809V^2 + 17.19818V^3 - 18.12259V^4 + 7.24110V^5$
These equations can be motivated either by the empirical data for the period 2006 to 2012 or by a more theoretical approach that rests on standard probability theory, on the assumption that p is constant for all quarters in a given game, and on the simplifying assumption that teams winning fewer than 2 quarters always lose, teams winning more than 2 quarters always win, and teams winning exactly 2 quarters win half the time.