Why You Should Have Genes in Your Ensemble

Over on the MAFL Wagers & Tips blog I've been introducing the updated versions of the Heuristics in two recent posts. I've shown there that these heuristics are, individually, at least moderately adept at predicting historical AFL outcomes.

All told, there are eleven heuristics, comfortably enough to form an ensemble, so in the spirit of the previous entry in MAFL Statistical Analyses, the question must be asked: can I find a subset of the heuristics which, collectively, using a majority voting scheme, tips better than any one of them alone?

Well no, I can't. No ensemble of tipsters that I can assemble, guided by their majority vote, performs better than BKB across the 11 seasons from 2000 to 2010. I suspect that this is largely because any decent ensemble must contain BKB and because all other heuristics have a strong propensity to agree with BKB, and be wrong when they don't. Consult the Ladder agrees most often with BKB - 78% of the time - and the relatively poorly performing Home Sweet Home agrees least often, but even it manages to concur 62% of the time. 

If BKB's so good, why not let it have a larger say in any ensemble by giving it more than one vote? In fact, why not build an algorithm to find how many votes each underlying heuristic should have in order to create the most predictively accurate ensemble?

Enter, the genetic algorithm, or GA for short.

For my GA, each organism that I'm evolving carries a genome of 11 integers, each one a chromosome if you like, lying between 0 and 100 and reflecting how many votes the corresponding heuristic receives in the ensemble. So, for example, if an organism's genome is {0,40,25,26,93,20,9,11,3,44,1} then it makes its predictions by weighting BKB's opinion by 0, Home Sweet Home's by 40, Consult the Ladder's by 25, and so on, then summing these weighted opinions and selecting the team that reflects the weighted majority opinion. (In the unlikely event of a tie, the organism picks the home team.)
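To make the voting rule concrete, here is a minimal Python sketch of how one organism might turn its 11 weights into a tip. The +1/-1 encoding of home and away tips, the function name ensemble_tip and the sample tips are all assumptions made for illustration; they are not taken from the GA I actually ran.

```python
from typing import Sequence

HOME, AWAY = 1, -1  # assumed encoding of a tip; not from the original GA


def ensemble_tip(weights: Sequence[int], tips: Sequence[int]) -> int:
    """Return HOME or AWAY according to the weighted majority vote."""
    if len(weights) != len(tips):
        raise ValueError("need exactly one tip per weight")
    # Each heuristic adds its votes to the side that it tips.
    score = sum(w * t for w, t in zip(weights, tips))
    # A positive score favours the home team; an exact tie also goes home.
    return HOME if score >= 0 else AWAY


genome = [0, 40, 25, 26, 93, 20, 9, 11, 3, 44, 1]
# Made-up tips for a single game, in the same order as the weights.
tips = [HOME, AWAY, HOME, HOME, AWAY, HOME, HOME, AWAY, HOME, HOME, AWAY]
print(ensemble_tip(genome, tips))
```

With these made-up tips the home team collects 127 votes and the away team 145, so the organism tips the away team.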

The choice of a fitness function for my GA - the basis on which an organism's evolutionary fitness is assessed and its chances of procreating are determined - is pretty obvious: it's the number of predictions that the organism gets right. Since an organism's likelihood of mating is directly related to its fitness, organisms with better predictive abilities are more likely to mate than those with poorer abilities and so, over time, the population of organisms should increase in average fitness. ("Would you like to come back to my place, I'm good at predicting footy results?" is clearly an irresistible offer.)
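As a rough sketch of that fitness function, the Python below counts correct tips over a list of games. It assumes that each heuristic's tip is encoded as +1 for the home team and -1 for the away team, and that every game has a winner (drawn games are ignored); the name tipping_fitness and the three-heuristic sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

# One game: (tips of each heuristic as +1 home / -1 away, did the home team win?)
Game = Tuple[Sequence[int], bool]


def tipping_fitness(weights: Sequence[int], games: List[Game]) -> int:
    """Number of games in which the weighted majority vote tips the winner."""
    correct = 0
    for tips, home_won in games:
        score = sum(w * t for w, t in zip(weights, tips))
        tipped_home = score >= 0  # ties go to the home team
        if tipped_home == home_won:
            correct += 1
    return correct


# Tiny made-up example with three heuristics rather than eleven.
sample_games = [
    ([1, -1, -1], True),   # home team won
    ([-1, 1, 1], False),   # away team won
    ([1, 1, 1], False),    # away team won
]
print(tipping_fitness([3, 1, 1], sample_games))
```

Here the weights {3,1,1} let the first heuristic outvote the other two combined, which gets the first two games right and the third wrong, for a fitness of 2.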

When organisms mate, their offspring inherits chromosomes (weights) from each parent's genes, position by position. Six of the weights come from one mate and five from the other. So, if my organism above with genome {0,40,25,26,93,20,9,11,3,44,1} mated with another organism with genome {4,2,1,1,0,0,0,6,7,8,9}, one possible offspring is an organism with genome {0,2,1,26,93,0,9,11,7,44,9}, where the first, fourth, fifth, seventh, eighth and tenth weights come from the first organism and the remaining five come from the second organism.
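One way such a mating step could be coded is sketched below in Python. I'm assuming here that the six positions are chosen at random and that which parent supplies six weights is decided by a coin toss; the function name crossover and the seed are purely illustrative.

```python
import random
from typing import List, Sequence


def crossover(parent_a: Sequence[int], parent_b: Sequence[int],
              rng: random.Random) -> List[int]:
    """Build a child that takes each weight from one parent, position by position."""
    n = len(parent_a)
    if len(parent_b) != n:
        raise ValueError("parents must have genomes of equal length")
    n_from_major = (n + 1) // 2  # 6 of the 11 weights
    # Coin toss (an assumption) for which parent contributes the larger share.
    major, minor = (parent_a, parent_b) if rng.random() < 0.5 else (parent_b, parent_a)
    # Random positions (also an assumption) inherited from the major parent.
    major_positions = set(rng.sample(range(n), n_from_major))
    return [major[i] if i in major_positions else minor[i] for i in range(n)]


mum = [0, 40, 25, 26, 93, 20, 9, 11, 3, 44, 1]
dad = [4, 2, 1, 1, 0, 0, 0, 6, 7, 8, 9]
child = crossover(mum, dad, random.Random(2011))
print(child)
```

Because each weight stays in its own position, the child's first weight is still BKB's vote count, its second is still Home Sweet Home's, and so on.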

Random mutations are introduced by changing, on average, some percentage of any organism's weights. Choosing this mutation percentage is important to creating generations with generally improved fitness: too small a percentage and you risk creating a species that tops out at some relatively low level of fitness; too high a percentage and you risk creating a species that doesn't evolve at all.
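A simple way to get mutations that hit a given percentage of weights on average is to give every weight the same independent chance of changing. The Python sketch below does that, and assumes that a mutated weight is simply redrawn uniformly from 0 to 100; the name mutate and the 5% rate in the example are hypothetical, not the settings I used.

```python
import random
from typing import List, Sequence


def mutate(genome: Sequence[int], rate: float, rng: random.Random,
           low: int = 0, high: int = 100) -> List[int]:
    """Redraw each weight with probability `rate`; otherwise keep it unchanged."""
    return [
        rng.randint(low, high) if rng.random() < rate else weight
        for weight in genome
    ]


genome = [0, 40, 25, 26, 93, 20, 9, 11, 3, 44, 1]
rng = random.Random(42)
# With 11 weights and a 5% rate, roughly one weight changes every two organisms.
print(mutate(genome, 0.05, rng))
```

Setting the rate to 0 switches mutation off entirely, while a rate of 1 replaces every weight and so throws away whatever the parents had learned.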

Another important component of the ecosystem is how fitness maps to procreative chances. Again you need to find the balance between two extremes. One is letting the early, most fit organisms hog all the mating opportunities, which leaves the species as a whole tied down to the potential of these precocious individuals. The other is letting every organism breed with roughly equal chances, regardless of its fitness, which can lead to a species that is presumably happier in general but has more mediocre and static predictive abilities.
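The trade-off can be illustrated with a single tuning knob. In the Python sketch below, mating probabilities are proportional to fitness raised to a power that I've called pressure: a pressure of 0 gives every organism an equal chance, and a large pressure hands most of the mating to the fittest. This power mapping is just one possible choice and an assumption on my part, as are the function names; it also assumes all fitness values are positive, which holds for counts of correct tips.

```python
import random
from typing import List, Sequence


def mating_probabilities(fitnesses: Sequence[float], pressure: float) -> List[float]:
    """Map positive fitness values to mating probabilities that sum to 1."""
    scaled = [f ** pressure for f in fitnesses]
    total = sum(scaled)
    return [s / total for s in scaled]


def pick_parents(population: Sequence[list], fitnesses: Sequence[float],
                 pressure: float, rng: random.Random) -> List[list]:
    """Draw two parents (with replacement) according to the mating probabilities."""
    probs = mating_probabilities(fitnesses, pressure)
    return rng.choices(population, weights=probs, k=2)


fitnesses = [100, 110, 120]  # made-up counts of correct tips
for pressure in (0, 1, 10):
    print(pressure, [round(p, 3) for p in mating_probabilities(fitnesses, pressure)])
```

Note that drawing with replacement means an organism can occasionally be picked as both parents; a fuller implementation might redraw in that case.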

So, I ran my GA, and after a few hours of evolution, a creature emerged which could tip better than BKB, but only just. Its genome was {99 (BKB), 44 (HSH), 17 (CTL), 6 (Shadow), 7 (Silhouette), 64 (EI I), 28 (EI II), 6 (STM I), 1 (STM II), 3 (RYL), 0 (FTS)}. It correctly predicts 67.6% of results across seasons 2000 to 2010, just 0.4 percentage points more than BKB. It's smarter than BKB, but I can't promise that its intellect will generalise into the new environment that is Season 2011.

In statistical terms there doesn't appear to be a lot of useful additional information for predicting winners in the other heuristics' predictions over and above that which resides in BKB.

Perhaps there is information in the other heuristics, however, that could be used to create a profitable level-stake betting algorithm.

Again we can use a GA to test this, similar to the one I used above, but with a fitness function now based on betting returns. So, whereas a correct tip in the GA used earlier would score an organism 1 fitness point and an incorrect tip would score it 0, now a correct tip scores the organism an amount equal to the betting return that would accrue from wagering one unit on that tip (which is the price of the winning team minus 1) and an incorrect tip scores it -1 fitness points. An organism's overall fitness then is the profit that would have accrued from level-staking its predictions. Everything else in the GA is as before, and once more we must carefully choose the mutation rate and the mapping of fitness to fecundity.
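The betting version of the fitness function might look something like the Python sketch below. As before, tips are assumed to be encoded as +1 for the home team and -1 for the away team, every game is assumed to have a winner, and the prices are decimal odds; the name betting_fitness and the sample prices are made up for illustration.

```python
from typing import List, Sequence, Tuple

# One game: (tips as +1 home / -1 away, home price, away price, did home win?)
PricedGame = Tuple[Sequence[int], float, float, bool]


def betting_fitness(weights: Sequence[int], games: List[PricedGame]) -> float:
    """Profit, in units, from wagering one unit on every ensemble tip."""
    profit = 0.0
    for tips, home_price, away_price, home_won in games:
        score = sum(w * t for w, t in zip(weights, tips))
        tipped_home = score >= 0  # ties go to the home team
        if tipped_home == home_won:
            winning_price = home_price if home_won else away_price
            profit += winning_price - 1.0  # return on a winning one-unit bet
        else:
            profit -= 1.0  # the stake is lost
    return profit


# Tiny made-up example with three heuristics and invented prices.
sample_games = [
    ([1, -1, -1], 1.50, 2.50, True),   # tipped home, home won: +0.50
    ([-1, 1, 1], 1.20, 4.00, False),   # tipped away, away won: +3.00
    ([1, 1, 1], 1.80, 2.00, False),    # tipped home, away won: -1.00
]
profit = betting_fitness([3, 1, 1], sample_games)
print(profit, profit / len(sample_games))  # profit and return per unit staked
```

Dividing the profit by the number of bets gives the ROI for level-staking, since every bet is for exactly one unit.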

(Since I mistrust the bookie data I have for seasons prior to 2006, I used only the data from seasons 2006 to 2010 in running this GA.)

A few hours later, I had this: 

{0 (BKB), 100 (HSH), 3 (CTL), 37 (Shadow), 13 (Silhouette), 68 (EI I), 8 (EI II), 13 (STM I), 24 (STM II), 91 (RYL), 0 (FTS)} 

Level-staking the predictions from this organism would have netted a 39.3-unit profit, which represents a 4.5% ROI over the five seasons. Profits would have accrued in every season but 2007: 16 to 19 units in each of 2006, 2008 and 2010, and just over 1 unit in 2009. The loss in 2007 would have been about 13 units.

Once again I'm not confident that this organism will do well in Season 2011, but I'll be watching ...