In earlier blogs I've claimed that bookie prices contain little information that's useful for predicting victory margins beyond what can be derived from a statistical analysis of recent results and an understanding of game venues.

We saw that the Smart Model - which makes its victory margin predictions using only team MARS Ratings, a team's two most recent results and information about whether or not the relevant game is being played interstate from the point of view of one of the participants - performed about as well as the best model that can be built using bookmaker prices, where performance is measured on any of a number of sensible metrics.

Since the Smart Model performs about as well as the model based solely on bookie prices, it's likely that the variables used in the Smart Model are themselves correlated with bookie prices. One way to determine just how correlated they are is to construct a model that attempts to predict bookie prices using the same types of variables that were used in constructing the Smart Model. The greater the R-squared of this model, the greater the extent to which bookie prices can be predicted by such variables, and the greater the justification we can feel for claiming that a large proportion of the information in bookie prices is just a summary of previous results and venue information.

Well I have built a model using this approach. Herewith the Bookie Probability Prediction Model:

Predicted Bookie Probability for Team = 0.981158 * exp(0.0111118 * Home_Team) * logistic(0.00237077 * Ave_Res_Last_3 + 29.8825 * Own_MARS + 0.41195 * Interstate_Clash - 29.821 * Opp_MARS)

where

Home_Team = +1 if the team in question is at home, 0 if there's no home team for the game, and -1 if the opponent of the team in question is playing at home

Ave_Res_Last_3 = the average victory margin for the team in question over the most recent three home-and-away games in this or, if necessary, the previous season

Own_MARS = the team's own MARS Rating divided by 1,000

Opp_MARS = the team's opponent's MARS Rating divided by 1,000

Interstate_Clash = +1 if the team in question is playing in its home state and its opponent is playing out of its home state, 0 if neither or both teams are playing out of their home states, and -1 if the team in question is playing out of its home state and its opponent is playing in its home state (NB for this purpose, the Gold Coast and the ACT are considered to be separate States from QLD and NSW respectively)

and

logistic(x) = exp(x)/(1+exp(x))
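
For anyone who'd like to play along at home, here's a minimal Python sketch of the formula above. The function and argument names are illustrative only; the coefficients are the ones quoted above, and I've assumed that MARS Ratings are passed in raw (say, 1,000) and divided by 1,000 inside the function.

```python
import math


def logistic(x):
    """Logistic function, algebraically equal to exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_bookie_probability(home_team, ave_res_last_3, own_mars,
                                 opp_mars, interstate_clash):
    """Predicted bookie probability for a team (illustrative sketch).

    home_team        : +1, 0 or -1, as defined above
    ave_res_last_3   : average victory margin over the last three games
    own_mars         : the team's raw MARS Rating (assumed unscaled)
    opp_mars         : the opponent's raw MARS Rating (assumed unscaled)
    interstate_clash : +1, 0 or -1, as defined above
    """
    linear_part = (0.00237077 * ave_res_last_3
                   + 29.8825 * (own_mars / 1000.0)
                   + 0.41195 * interstate_clash
                   - 29.821 * (opp_mars / 1000.0))
    return 0.981158 * math.exp(0.0111118 * home_team) * logistic(linear_part)


# Two teams both Rated 1,000, no home team, no interstate clash,
# and an average recent margin of zero
print(round(predicted_bookie_probability(0, 0.0, 1000.0, 1000.0, 0), 3))
```

Note that the 0.981158 multiplier sits outside the logistic, so the predictions for the two teams in a game needn't sum to exactly 1.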

This model explains about 83.9% of the variability in the team victory probabilities implicit in the TAB Sportsbet bookie's head-to-head prices. That, then, is more evidence in support of my earlier conjecture.

Here's what an 83.9% R-squared looks like in its native habitat:

Using this model to predict bookie probabilities would have you within 5% of the actual bookie probability about 50% of the time and within 2.5% about 25% of the time. Your average absolute prediction error would be just over 6%.
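
If you want to calculate the same sorts of summary figures for your own predictions, something like the following Python sketch would do it. It assumes that "within 5%" means an absolute difference of 0.05 in probability terms and that R-squared is calculated in the usual way as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name and the four data points are made up purely for illustration; they are not taken from the actual model fit.

```python
def fit_summary(actual, predicted):
    """Summarise how closely predicted probabilities track actual ones."""
    n = len(actual)
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "r_squared": 1.0 - ss_res / ss_tot,
        "share_within_5": sum(e <= 0.05 for e in abs_errors) / n,
        "share_within_2_5": sum(e <= 0.025 for e in abs_errors) / n,
        "mean_abs_error": sum(abs_errors) / n,
    }


# Hypothetical bookie probabilities and model predictions (illustrative only)
bookie_probs = [0.50, 0.60, 0.70, 0.80]
model_probs = [0.52, 0.66, 0.69, 0.80]
print(fit_summary(bookie_probs, model_probs))
```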

But the Bookie Probability Prediction Model (BPPM) can be used for more than just second-guessing the bookies and proving my point about the information content of bookie prices. By interpreting its predictions of probabilities as standalone estimates of the true victory probabilities we can use them as the basis for a wagering system.

To do this I'll take these probabilities and decide when and how much to wager on a team using the Kelly Betting criterion. Our wager on a team for whom we've predicted a victory probability of Prob and for whom the bookies are offering \$BookiePrice in the head-to-head market will then be given by:

Wager = max(0, (Prob*BookiePrice - 1)/(BookiePrice - 1))

So, for example, if the BPPM estimates a team's victory probability as being 70% and the price being offered for that team is \$1.70, then the recommended Kelly Bet fraction is (70% x 1.7 - 1)/(1.7 - 1) = 0.271 units. If the price on offer were less than about \$1.43 then no bet would be made, since \$1.43 is roughly the break-even price (1/0.70) at which 70% x BookiePrice - 1 equals zero.
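
The wager formula and the worked example can be sketched in a few lines of Python. The function names are illustrative, and the guard against prices of 1.00 or less is my own addition to avoid dividing by zero; it isn't part of the formula above.

```python
def kelly_fraction(prob, bookie_price):
    """Kelly wager as a fraction of one unit, floored at zero."""
    if bookie_price <= 1.0:
        # Assumed guard: a price of 1.00 or less offers no profit at all
        return 0.0
    return max(0.0, (prob * bookie_price - 1.0) / (bookie_price - 1.0))


def break_even_price(prob):
    """Price at which prob * price - 1 is zero, so no wager is made below it."""
    return 1.0 / prob


print(round(kelly_fraction(0.70, 1.70), 3))  # the worked example: 0.271
print(kelly_fraction(0.70, 1.40))            # price below break-even: 0.0
print(round(break_even_price(0.70), 2))      # 1.43
```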

Returns from following such an approach commencing in Round 13 of 1999 and finishing in Round 12 of 2010 are shown in the following tables.

The top table shows the return from wagering in all rounds of every season and splits the results into those that would have been achieved wagering on home teams and those that would have been achieved wagering on away teams. I created this split because all my previous experience building wagering models has shown that it's hard to make money betting on away teams.

And indeed that's the situation here too with the BPPM. Wagering on home teams alone would have been profitable in all but two seasons and would have been highly profitable overall; wagering on away teams would have been a slow and painful way to wind up contributing more than your fair share to State Government infrastructure projects.

Yet more profit would have accrued had we passed on the opportunity to wager in Round 1 of every season. Such foresightedness would have lifted our ROI from 28% to 31% while still leaving us to make 40 to 60 wagers a year or about 2 or 3 a week.

Though the results don't appear in the table above I've been tracking the returns that would have accrued had we followed this strategy in Rounds 13 through 15 of the current season. Round 13 would have been quite unprofitable, while the small gains in Round 14 would have been roughly offset by the losses in Round 15. All up the updated ROI for the full season would be 37.2%, a little less stratospheric but still acceptable.

Given the prices for the Round 16 games as they currently stand (on Monday night), the home team bets for the weekend would be 0.086 units on Adelaide at \$3.75 and 0.065 units on Collingwood at \$2.00.

Pending ongoing monitoring, the BPPM can be added to the Super Smart Model as a candidate Fund algorithm for 2011.