AFL Crowds and Optimal Uncertainty

Fans the world over, the literature shows, like a little uncertainty in their sports. AFL fans are no different, as I recounted in a 2012 blog entitled Do Fans Really Want Close Games? in which I described regressions showing that crowds were larger at games where the level of expected surprisal or 'entropy' was higher. 

The entropy measure I used in those regressions treated the source of any uncertainty symmetrically in that, for example, a game where the home team was a 70% favourite was assessed as having the same level of entropy - and hence the same level of interest to an intending attendee - as a game where the away team was the 70% favourite. This agnosticism towards the source of uncertainty might or might not reflect the reality of fans' behaviour. Some recognition was given in those earlier regression models to the likely greater importance of home team favouritism by including a dummy variable reflecting whether or not the home team was the favourite, but that was the extent of any asymmetric treatment of the two teams.

More recently I've come across some articles in the literature suggesting that there is an optimal home team probability for some sports that serves to balance the Goldilocksian desires of home team fans to feel that a home team victory is more likely than not, but not so likely that it renders the contest no longer worth watching.

That would suggest a more nuanced, asymmetric treatment of home team versus away team prospects is required, and it's this aspect that I'm especially keen to explore later in this blog.

CROWDS AND SURPRISE BY TEAM

First though, let's update those 2012 tabulations of crowd, entropy and actual surprisal data, now adding the Home and Away season data for 2013 and 2014 to give us nine complete years of data.

(As in the earlier blog, crowd data has been sourced from the afltables site, to which I again here record my gratitude.)

On the left of the table is the summarised data for teams when playing at home, a status enjoyed by each team on 99 occasions across the nine years, except by the two most-recently joined teams. Note that, for the purposes of this blog, I've used the AFL's designation of 'home' and 'away' status, so, for example, a designated Tigers home game at the Gold Coast Stadium is counted as a Tigers home game.

Here we can see that Adelaide has averaged crowds of just over 39,000 when playing at home and has participated in contests where the average entropy has been 0.847 bits. That's roughly equivalent to a game where the favourite is a 72.6% proposition. Those contests have been slightly more surprising than the TAB Bookmaker expected since the actual average level of surprisal generated by the outcomes of those games has been 0.894 bits.
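
To make those figures concrete, here's a minimal sketch in R of the entropy and surprisal calculations (the function names are mine, and the inversion simply searches for the favourite probability whose entropy matches a given value):

    # Binary entropy (expected surprisal) in bits, given the favourite's probability p
    entropy_bits <- function(p) -p * log2(p) - (1 - p) * log2(1 - p)

    # Surprisal in bits generated by an outcome that carried pre-game probability p
    surprisal_bits <- function(p) -log2(p)

    # Invert the entropy function to find the equivalent favourite probability
    equivalent_favourite <- function(h) {
      uniroot(function(p) entropy_bits(p) - h, interval = c(0.5, 0.9999))$root
    }

    entropy_bits(0.726)          # ~0.847 bits
    equivalent_favourite(0.847)  # ~0.726, a 72.6% favourite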

Staying focussed on this block of data we can see that Collingwood are the team with the largest average home crowd at a little under 56,500 per game. At the other end of the scale, GWS have drawn the smallest home crowds at just under 10,000 per game. Amongst the more established teams, the Brisbane Lions have drawn the smallest home crowds (just over 25,000 per game), with the Roos (about 26,000), Port Adelaide (about 26,600), the Western Bulldogs (roughly 27,750), and Melbourne (about 29,700) the only other teams to average under 30,000 per game.

At home, Roos fans have faced the greatest uncertainty of outcome, their average contest promising 0.868 bits of information. Those games have ultimately delivered even more surprise than was expected (0.887 bits), but this is considerably less than the average surprise delivered by Essendon (1.011 bits) and Port Adelaide (0.974 bits) home games. GWS, at home, has promised least surprise (0.575 bits) of any team, but despite being even less surprising (0.544 bits) than expected, finishes just behind Geelong (0.543 bits) in actual surprisal generation. In other words, Geelong's home games have gone even more to script than have GWS'.

(For a refresher on surprisals and entropy, please refer to the 2012 blog.)

Turning next to the middle block in the table, which relates to teams' away statistics, we find the Pies once again claiming top spot in crowd attraction. Their average crowd of just over 53,000 per game is almost 7,000 fans per game higher than the next best team, Essendon, at just over 46,200. Surprisingly, Collingwood and Essendon are two of only eight teams that draw larger average crowds at home than away, and two of only four Victorian teams about which the same can be said, Carlton and Richmond being the other two. Put another way, more than half the teams in the competition have drawn larger average crowds over the past nine years away from home than at home. That fact is also true if we consider only the most recent season, 2014, alone.

Members of those large away crowds for Pies games have witnessed, on average, the most surprising outcomes of any team, each contest having generated 0.915 bits of information (and bear in mind that the toss of a fair coin generates only 1 bit of information). In contrast, the 22,000 fans who've turned up to an average GWS away contest have very rarely been surprised about the outcome. They've witnessed just 0.343 bits of information per outcome, which is about as surprising as picking a business day at random and finding that it isn't a Friday.

Combining the home and the away data tells us that the Pies have drawn about 54,750 fans to an average game across the nine years, which is about 8,000 more than Essendon and 10,000 more than Carlton. It also reveals that games involving the Crows have promised the highest levels of surprise (0.860 bits) and delivered the fourth-highest levels of actual surprise (0.889 bits). Essendon's results have represented the highest level of surprise (0.938 bits per game), which is about 9% more information per game than was expected. That figure of 1.09 is also the highest ratio for any team of actual to expected surprise, the lowest ratio belonging to GWS whose average result has delivered 18% less surprise than was apparently expected. (My entropy calculations do, of course, depend on the methodology employed for inferring probability from head-to-head prices - which for this blog has been the Overround-Equalising variant - and will be affected by the extent to which this methodology misattributes overround to the teams involved).

Despite its high nine-year average, Collingwood's crowd-drawing ability appears to be declining. After peaking in 2010 at just under 61,500 fans per game, its home ground attendances have fallen in every succeeding year, most precipitously in 2014 when they fell by more than 7,000 fans per game. Its appeal playing away also declined, the average attendance at its away games falling by over 5,000 fans per game.

Adelaide and Port Adelaide were the teams whose home attendances spiked most notably in 2014, rising by over 14,000 and 17,000 fans per game respectively as they swapped their home games from Football Park to the revamped Adelaide Oval. Those levels of increase did not extend to their away game attendances, however, which in Adelaide's case fell by over 2,000 fans per game, and in Port Adelaide's rose by just over 3,000.

Carlton and Richmond were other teams whose away games were notably less well attended in 2014, while Sydney enjoyed modest increases in attendances at both home and away contests.

CROWDS AND SURPRISE BY VENUE

If you delight in large crowds then the MCG should have been your venue of choice over the past nine years, each contest there pulling an average attendance of almost 50,000. Assuming instead that you were seeking uncertainty - and, in these uncertain times, who isn't? - then Football Park would've been your preferred venue, promising that you'd leave the ground with 0.858 more bits of information than you'd had when you'd entered.

That choice would have left you a little disappointed however, as you would have boarded your transportation home with only 0.748 bits of information, considerably less than those wiser folk who'd ventured to Aurora Stadium and packed away, on average, 1.011 bits to take home. Aurora Stadium, however, offered only 37 opportunities for information acquisition across the nine years so, very arguably, Docklands would have been a superior choice. There you'd have had 428 opportunities to grab 0.974 bits of information per game. In total, the 428 contests at that venue have provided fans with 417 bits of information, making it by far the greatest generator of football information of any venue in use at any time during the period. The next best information pump, the MCG, has generated only 286 bits and this from just 21 fewer games.

CROWDS AND SURPRISE BY DAY OF WEEK

No day of the week has been spared at least one game of football at some point during the nine seasons being reviewed here, though Tuesdays and Wednesdays have been used only twice each: three of those four games were ANZAC Day Collingwood v Essendon clashes, and the fourth an odd Hawthorn v Geelong game in 2011, played on the day after ANZAC Day to complete a curtailed Round 5 of just seven games that had started on the previous Thursday. AFL scheduling continues to mystify.

The average crowds on these two days have been very high (about 85,000 to 90,000), especially in comparison to the averages we've seen for Saturdays and Sundays (about 32,000 to 33,000). About 12% of all games have been played on Fridays, with these games drawing about 50% larger crowds than games on the weekend.

Across the core Friday to Sunday block, Fridays have both promised (0.859 bits) and delivered (0.838 bits) the most surprise, while Sundays have delivered least (0.775 bits) despite promising about the same level of surprise as Saturdays. If you're looking for an upset in a typical week of football, then Sundays would appear to be your worst bet and Fridays your best.

APPEARANCES OF EACH TEAM BY DAY OF WEEK

Lastly, before we analyse the results of the regression modelling, let's review the profile of each team's appearances, home and away, by day of week.

The Gold Coast and GWS have each played about 80% to 85% of their home games on a Saturday, more than any other team, while the Brisbane Lions and Gold Coast have each played 70% or more of their away games on Saturdays.

St Kilda have played almost one-quarter of their home games and almost 20% of their away games on Fridays. Collingwood have also featured prominently on Fridays, playing almost one-quarter of their home games and almost 30% of their away games on this day. Carlton, Essendon, Geelong and Hawthorn have been other teams frequently appearing in the early part of the weekend.

Combining home and away appearances, the Gold Coast have the largest proportion of Saturday games (78%) and the Western Bulldogs the smallest (38%), while Melbourne are highest for Sundays (48%) and Collingwood lowest (20%). Collingwood, however, have the largest proportion of Friday games (26%), and GWS and Gold Coast the smallest (0%). None of the non-Victorian teams have played more than 10% of their games on a Friday.

REGRESSION MODELS

As I mentioned at the start of the blog, one of my goals was to investigate the notion of a crowd-optimising home team probability. Here I've done this by including Home Probability in a simple linear regression, both linearly and as a squared term, which is about the simplest way I can imagine of incorporating Home Probability in a manner that affords the notion of an optimum.

Also included in the regression are the same terms as from the 2012 blog post, namely the identities of the home and the away teams, the game venue, and the day of the week on which the game was played. 

In the first of the regressions, the results of which appear on the left of the table at right, no terms are included that relate to individual seasons and the coefficients on Home Probability and its square are both statistically significant and imply that the optimum home probability is about 58%. 

We can derive this by noting that the sum of the Home Probability terms is maximised when the derivative of the sum is equal to zero, which implies that the Home Probability is equal to minus the coefficient on the linear term divided by twice the coefficient on the squared term. 
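
As a sketch of that calculation (the coefficient values below are purely illustrative, chosen only so that the optimum lands near the quoted 58%; they are not the fitted values):

    # If estimated attendance includes b1 * HomeProb + b2 * HomeProb^2 (with b2 < 0),
    # setting the derivative b1 + 2 * b2 * HomeProb to zero gives the optimum
    optimal_home_prob <- function(b1, b2) -b1 / (2 * b2)

    # Illustrative coefficients only
    optimal_home_prob(b1 = 30000, b2 = -25860)  # ~0.58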

This figure of 58% is broadly consistent with the results I've seen for other sports (see, for example, this piece on NBA, which records an optimum home team probability of 67% and alludes to results of between 60% and 67% for MLB; this piece, which finds an optimum home team probability of 60.5% for the NRL; and this article, which summarises the findings of a large number of other papers and suggests that a range of optima have been found for some sports, and none at all for others, but that optima, where they exist, are most often in the 60% to 70% range).

For the regression whose results are recorded on the right of the table I've allowed for season-specific coefficients for the two terms involving home team probability, which serves to modestly lift the proportion of explained variance from 77.87% to 78.18% and which also affords the opportunity to calculate season-specific optimal home team probabilities.

These optima are shown in the table below and suggest that fans preferred stronger home team favourites in 2006 and 2007, but that in ensuing seasons they have generally rewarded less certain home team victors with their attendance.

The most recent season has hinted at a return towards a preference for more highly fancied home teams, though the optimum of 58.7% for 2014 remains well below the 64% and 65% optima of those earliest analysed years.

It seems then, that AFL crowds most prefer home teams that are modest favourites. The Goldilocks zone appears to span an approximate range from 55% to 65% (which equates to prices of about $1.50 to $1.75 assuming a 5% overround) within which, using the 2014 coefficients, estimated attendances are practically flat. Outside that zone, increases or decreases in home team probability have more material effects on expected attendance. For example, about another 660 fans could be expected to attend a game with a 59% home team favourite compared to the same game with a 3/1 on home team favourite. Compared to a 4/1 on home team favourite the additional expected attendance would be 1,130 fans.
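
The size of that penalty follows directly from the quadratic form: the expected shortfall relative to the optimum is proportional to the squared distance from it. A sketch, using an illustrative value for the squared-term coefficient chosen only so that the outputs roughly match the figures quoted above:

    # With attendance terms b1*p + b2*p^2 and optimum p_opt = -b1/(2*b2), the expected
    # attendance shortfall at home probability p, relative to the optimum, is -b2 * (p - p_opt)^2
    attendance_shortfall <- function(p, p_opt, b2) -b2 * (p - p_opt)^2

    attendance_shortfall(0.75, p_opt = 0.59, b2 = -25800)  # ~660 fans (a 3/1 on favourite)
    attendance_shortfall(0.80, p_opt = 0.59, b2 = -25800)  # roughly 1,130 fans (a 4/1 on favourite)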

Similar calculations for other home team probabilities are shown in the chart at right.

A comparison of the regression coefficients in this blog with those from the 2012 blog in which Entropy was used instead of Home Probability (and its square) shows broad similarity in the outputs. This is partly because of the high empirical correlation between Entropy and Home Probability x (1-Home Probability), which comes in at +0.998 across the nine years. It's also because the crowd-optimising Home Probability turns out to be quite close to 50%, which is the implicit optimum imposed by the earlier formulation using Entropy. To see why this is the case, recall that the 2012 regression had a positive coefficient on Entropy, so higher Entropy implied larger estimated attendance. Next, recall that Entropy is maximised when the Home Probability is 50%. QED.
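
That relationship is easy to check directly; here's a quick sketch (computed over a uniform grid of probabilities rather than the empirical distribution of home probabilities that underpins the +0.998 figure):

    entropy_bits <- function(p) -p * log2(p) - (1 - p) * log2(1 - p)

    # Correlation between Entropy and p * (1 - p) over a grid of probabilities
    p <- seq(0.01, 0.99, by = 0.01)
    cor(entropy_bits(p), p * (1 - p))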

CONCLUSION

The raw historical data reveals that per game attendances in 2014, though up marginally on the 2012 and 2013 figures, are nonetheless down by about 4,000 fans per game (or 13%) on the 2008 home and away season peak.

This decline has coincided with the introduction of the Gold Coast and GWS into the competition and with an overall increase in the variability of home team probability, as shown by the boxplot at left. The standard deviation of home team probability peaked in 2012 at 27.7 percentage points, which is almost 30% higher than the standard deviation for 2008.

Increased variability has meant a decrease in the proportion of games within the "Goldilocks Zone" of home team probabilities, which I've defined as the set of probabilities within 15% of the optimal season-by-season home team probabilities as estimated using the second of the regression models above. It's really only outside this range, the modelling suggests, that the impact on estimated attendance is material - say 500 fans or more.

It'd be wrong though to attribute too much of the per game decline in attendances to the changing distribution of home team probabilities in recent seasons - the modelled effect of a change in home team probability from the 2014 optimum of 59% to, say, 30% or 88% is only about 2,000 fans, which is only one half of the decline from 2008.

Still, it'd be nice to move closer to the situation as it was in the 2007 to 2010 period when 40-50% of all games were in the Goldilocks zone rather than the situation as it stands now where the proportion is nearer 30%. Here's hoping that the Suns and the Giants continue to improve, that the Dees and the Saints become more competitive, and that no other team surprises us with its ineptitude in 2015.  

The Effects of Narrow Wins and Losses On Subsequent Performance

I've often heard it asserted after a team's close loss that it will "bounce back harder next week". With a little work, that's a testable claim.

Firstly, we need to define some terms, specifically what we mean by "close" and what constitutes "bouncing back". Today I'll test three definitions of close - a margin of under 6 points, under 12 points, or under 18 points - and I'll define bouncing back as having a higher probability of winning the next game than might be expected, or as producing a better-than-expected game margin in that next game.

While I'm looking for evidence of next-game resilience, I might as well look for evidence of its opposite too - that is, for any evidence that teams enjoying a close win might suffer a performance decline relative to expectations in the subsequent game. 

THE DATA

The search for evidence will draw, as is customary, on the data for all games from the start of season 2006 up to and including the most recent result (which is the regrettable 2014 Grand Final). 

The hypothesis being tested is whether a team's performance in a particular game is influenced by its suffering a close loss (or enjoying a close win) in the immediately preceding game. To properly isolate the effects of such a narrow win or loss we need to control for other factors that might reasonably be thought to influence the victory chances and ultimate game margin of that ensuing game. These factors are:

  • The strength of the teams involved (which I'll proxy by my team MARS Ratings and, for some models, the Bookmaker's Implicit Home Team Probability - here the Risk Equalising version)
  • The game venue (which I'll proxy by team Venue Experience and the game's Interstate Status)
  • Whether or not the team's opponents are themselves playing after having narrowly won or narrowly lost their previous game. This status will be reflected in the models by a pair of dummy variables taking on the value 1 when the relevant status is true (eg the opponents had a narrow loss in their previous game) and 0 when it's false.

(For an explanation of many of these variables, see What Variables Are Used in MAFL Statistical Models? in the MoS Primer.)
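
As a hedged sketch of how the close win and close loss dummies described above might be constructed (the data frame and column names here are hypothetical, not those of the actual MoS dataset):

    # games: a hypothetical data frame with one row per team per game, where PrevMargin
    # holds that team's margin (positive for a win, negative for a loss) in its
    # immediately preceding game
    make_close_dummies <- function(games, close_margin = 6) {
      games$CloseWinLastGame  <- as.numeric(games$PrevMargin > 0 &
                                            games$PrevMargin <  close_margin)
      games$CloseLossLastGame <- as.numeric(games$PrevMargin < 0 &
                                            games$PrevMargin > -close_margin)
      games
    }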

THE MODELS

For each of the three definitions of "close" I've fitted two models:

  • A binary logit, with the game result from the Home team's perspective as the dependent variable (and excluding all drawn games from the data)
  • An OLS regression, with the game margin from the Home team's perspective as the dependent variable

Initially I fit all of these models excluding the Bookmaker Home Team Implicit Probability variable on the assumption that, were there any "close last game" effects, these might already be incorporated in the Bookmaker's pricing. Including the Bookmaker variable then serves to account for any such expected last game effects, but also serves to further control for the relative quality of the competing teams to the extent that this is not achieved via my own team Ratings and the other, venue-related variables.
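
In R, the two formulations might look something like the sketch below (the variable and column names are illustrative only; the second pass in each case simply adds the Bookmaker probability term):

    # Hypothetical variable names; drawn games excluded from the logit via a Drawn flag
    base_terms <- HomeWin ~ HomeMARS + AwayMARS + HomeVenueExperience +
      AwayVenueExperience + InterstateStatus +
      HomeCloseWinLastGame + HomeCloseLossLastGame +
      AwayCloseWinLastGame + AwayCloseLossLastGame

    # Binary logit for the Home team's result, first without the Bookmaker term ...
    logit_fit  <- glm(base_terms, data = games, family = binomial, subset = !Drawn)
    # ... then with it added
    logit_fit2 <- update(logit_fit, . ~ . + BookmakerHomeProb)

    # OLS regression for the Home team's game margin
    ols_fit  <- lm(update(base_terms, HomeMargin ~ .), data = games)
    ols_fit2 <- update(ols_fit, . ~ . + BookmakerHomeProb)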

THE RESULTS

Looking firstly at the modelling outputs for the Binary Logit formulations, in which we are assessing the effects of narrow wins and losses on a team's subsequent victory probability, we find scant evidence for any "last game effects", regardless of which definition of "close" we adopt.

The models on the left in each block are those where the Bookmaker Implicit Probabilities have been excluded and they all include small and statistically non-significant coefficients on the Close Win and Close Loss dummy variables. In other words, a team playing after a close win or a close loss in the preceding round is not, statistically speaking, more or less likely to win. Ignoring statistical significance we can say that teams playing after a close win are slightly less likely to win in the next round (adjusting for other factors), regardless of the definition of close, while teams playing after a close loss are also LESS likely to win unless we define a close loss as being a loss by less than 3 goals. For teams that would otherwise be 50% chances, even the largest coefficient in absolute terms implies only a reduction in that probability of 1.4 percentage points, to 48.6%.

Including the Bookmaker Implicit Probability variable does nothing to change the signs of the Close Win and Close Loss coefficients, though it does serve to increase their absolute magnitudes. Not, though, by enough to make them statistically significant and, even in the case of the most negative coefficient, only by enough to drop an otherwise 50:50 proposition to a 46.7% chance. So, not statistically significant in any of the cases, and probably only practically significant in the most extreme case.
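
To see how coefficients of that size translate into probability shifts for an otherwise 50:50 game, here's a quick sketch (the coefficient values shown are back-calculated from the probabilities quoted above, not taken from the fitted models):

    # Apply a logit coefficient (for a dummy moving from 0 to 1) to a baseline probability
    shifted_prob <- function(baseline, coefficient) plogis(qlogis(baseline) + coefficient)

    shifted_prob(0.50, -0.056)  # ~0.486, a 1.4 percentage point reduction
    shifted_prob(0.50, -0.132)  # ~0.467, the effect of the most negative coefficient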

On that basis I think it's fair to assert that a team's close loss or close win in one week has no meaningful influence on the likelihood of its winning its next game after we adjust for the quality of the opponent it's facing and the venue of the subsequent contest.

Perhaps though the effect is more subtle, and teams don't become more likely to win but merely more likely to score or concede a few more points. That's the hypothesis tested via the OLS Regression formulation, the results for which appear next.

Once again, if we're searching for statistically significant effects, we've come to the wrong hypothesis. Excluding the Bookmaker Implicit Probability variable we find promising coefficient values only when we define a close result as being one where the margin is less than a goal. In that case we find that teams backing up after a close loss score 4.3 points more than we'd expect, and teams backing up after a close win score 1.3 points fewer than we'd expect.

Including the Bookmaker Implicit Probability variable produces this same pattern of losing teams subsequently scoring more and winning teams subsequently scoring less after close results, regardless of the definition of close. Still, however, none of the coefficients is statistically significant. There is perhaps some solace in the fact that, as we narrow the definition of close, the absolute size of the coefficients increases. So, the narrower the loss (and the narrower the win), the bigger the subsequent effect.

A generous interpretation of these results would be that there might be a small effect - that teams suffering a narrow loss one week score slightly more in the following week and that teams enjoying a narrow win one week score slightly less in the following week, but that the effect is so small (or variable) that we've insufficient sample to confidently proclaim its existence.

Even if such an effect does exist in terms of the ultimate game margin, however, the results of the binary logit models mostly suggest that it's insufficiently large to affect teams' winning and losing chances and serves instead only to reduce the size of a loss or increase the size of a victory.

THE CONCLUSION

The main conclusion is that there's no statistically significant evidence for either of the hypotheses we wound up testing. Teams that suffer narrow losses don't do better (or worse) than they otherwise might have been expected to in the subsequent games, and teams that enjoy narrow wins don't subsequently do worse (or better).

Ignoring statistical significance, there is some, small evidence that very narrow wins and losses (ie those by fewer than 6 points) have about a half-goal effect on the score in the ensuing game in the direction hypothesised. We need more data though to distinguish between the existence of a genuine but small effect and a highly variable but mean zero effect.

 

On Choosing Strong Classifiers for Predicting Line Betting Results

The themes in this blog have been bouncing around in my thoughts - in virtual and in unpublished blog form - for quite a while now. My formal qualifications are as an Econometrician but many of the models that I find myself using in MoS come from the more recent (though still surprisingly old) Machine Learning (ML) discipline, which I'd characterise as being more concerned with the predictive ability of a model than with its theoretical pedigree. (Breiman wrote a wonderful piece on this topic, entitled Statistical Modeling: The Two Cultures, back in 2001.)

What's troubled me, I've begun to realise, about my adoption of ML techniques is their apparent arbitrariness. Just because, say, a random forest (RF) performs better than a Naive Bayes (NB) model in fitting a holdout sample for some AFL metric or outcome, who's to say that the RF technique will continue to outperform the NB approach in the future? It's not as if we can speak of some variable as being "generated by a random forest like process", or identify situations in which an RF model can, a priori, be deemed preferable to an NB model.

Note that I'm not here concerned about the stationarity of the underlying process driving the AFL outcome - every model throws its hands up when the goalposts move - but instead with the relative efficacy of modelling approaches on different samples drawn from approximately the same distribution. Can we, at least empirically, ever legitimately claim that classifier A is quantifiably better at modelling some phenomenon than classifier B?

It took the recent publication of this piece by Delgado et al to crystallise my thoughts on all this and to devise a way of exploring the issue in an AFL context.

THE DATA AND METHODOLOGY

Very much in the spirit of the Delgado paper I'm going to use a variety of different classifiers, each drawing on the same set of predictive variables, and each attempting to model the same outcome: the handicap-adjusted winner of an AFL contest.

I've been slightly less ambitious than the Delgado paper, choosing to employ only 45 classifiers, but those I've selected do cover the majority of types covered in that paper. The classifiers I'm using are from the Discriminant Analysis, Bayesian, Neural Networks, Support Vector Machines, Decision Trees, Rule-Based Methods, Boosting, Bagging, Random Forests, Nearest Neighbour, Partial Least Squares, and General Linear Models families. All I'm missing are representatives from the Stacking, Multivariate Adaptive Regression Splines, and Other Ensemble tribes.

The data I'll be using is from the period 2006 to 2014, and the method I'll be adopting for each classifier is as follows:

  1. Select the data for a single entire season.
  2. For those classifiers with tuning parameters, find optimal values of them by performing 5 repeats of 4-fold cross-validation using the caret package.
  3. Assess the performance of the classifier, using the optimal parameter values, when applied to the following season. In other words, if the classifier was tuned on the data from season 2006, measure its performance for season 2007.

In step 2 we'll optimise for two different metrics, Area Under the Curve and Brier Score, and in step 3 we'll measure performance in terms of Accuracy, Brier Score and Log Probability Score.
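
A sketch of what the tuning in step 2 might look like using caret (the data frame and outcome names are illustrative, and the Brier Score summary is a custom function, since caret's built-in twoClassSummary provides only ROC-based metrics):

    library(caret)

    # Custom summary function returning the Brier Score for a two-class problem
    brierSummary <- function(data, lev = NULL, model = NULL) {
      obs <- as.numeric(data$obs == lev[1])
      c(Brier = mean((data[, lev[1]] - obs)^2))
    }

    ctrl_auc   <- trainControl(method = "repeatedcv", number = 4, repeats = 5,
                               classProbs = TRUE, summaryFunction = twoClassSummary)
    ctrl_brier <- trainControl(method = "repeatedcv", number = 4, repeats = 5,
                               classProbs = TRUE, summaryFunction = brierSummary)

    # Tune, say, gbm on a single season's data, optimising AUC (metric = "ROC") ...
    fit_auc   <- train(LineResult ~ ., data = season_data, method = "gbm",
                       trControl = ctrl_auc, metric = "ROC", verbose = FALSE)

    # ... or optimising the Brier Score (lower is better, hence maximize = FALSE)
    fit_brier <- train(LineResult ~ ., data = season_data, method = "gbm",
                       trControl = ctrl_brier, metric = "Brier", maximize = FALSE)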

The regressors available to all the classifiers are:

  • Implicit Home Team Probability (here using the Overround Equalising approach)
  • Home and Away teams' bookmaker prices
  • Home Team bookmaker handicap
  • Home and Away teams' MARS Ratings
  • Home and Away teams' Venue Experience
  • Home team's Interstate Status
  • Home and Away teams' recent form, as measured by the change in their MARS Ratings over the two most-recent games from the same season. For the first two games of each season these variables are set to zero.

The main issue I'm seeking to explore is whether any algorithm emerges as consistently better than any other. With the technique and timeframe I've used I obtain 8 measures of each algorithm's performance on which to base such an assessment - not a large sample, but definitely larger than 1.

Secondarily, the methodology also allows me to explore the effects, if any, of tuning the classifiers using different performance metrics and of measuring their performances using different metrics.

THE RESULTS

In interpreting the results that follow it's important to recognise the inherent difficulty in predicting the winners of line-betting contests, which is what we're asking the various classifiers to do. Were it possible to consistently and significantly predict more than 50% of line-betting winners (or to produce probabilities with associated Brier Scores under 0.25 or Log Probability Scores - as I'll define them later - over 1) then bookmakers everywhere would be justifiably nervous. We probably can't, so they're probably not.

With that in mind, it's not entirely surprising that the best-performed classifiers only barely achieve better-than-chance results, no matter what we do.

The first set of charts that appear below summarise the results we obtain when we tune the classifiers using the AUC metric.

Two C5.0-based classifiers finish equal first on the Accuracy metric, averaging just over 53% across the 8 seasons for which predictions were produced. Twenty-nine of the classifiers average better than chance, while the worst classifier, treebag, records a significantly worse-than-chance 47.5% performance.

The whiskers at the end of every bar reflect the standard deviation of the classifier's Accuracy scores across the 8 seasons and we can see that the treebag classifier has a relatively large standard deviation. Its best season was 2007 when it was right 55% of the time; in 2012 it was right only 40% of the time.

Because of its binary nature, Accuracy is not a metric I often use in assessing models. I've found it to be highly variable and poor at reliably discriminating amongst models, and so have preferred to rely on probability scores such as the Brier Score or Log Probability Score (LPS) in my assessments. Charts for these two metrics appear below.

Note that, for the purposes of this analysis I've defined the LPS as 2+log(P) where P is the probability attached to the final outcome and the log is measured in base 2. Adding 2 to the LPS makes the chance LPS score equal to +1 (ie the LPS a classifier would achieve by assigning a probability of 50% to the home team in every contest). The equivalent chance Brier Score is 0.25. Recall that higher LPS' and lower Brier Scores are preferred.
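
For concreteness, here's a minimal sketch of the two scoring rules as defined here:

    # LPS as defined here: 2 + log2 of the probability assigned to the actual result,
    # so a constant 50% forecast scores exactly +1
    lps <- function(p_outcome) 2 + log2(p_outcome)

    # Brier Score for a binary outcome: squared error of the home team probability,
    # so a constant 50% forecast scores 0.25
    brier <- function(p_home, home_won) (p_home - as.numeric(home_won))^2

    lps(0.5)          # +1, the chance benchmark
    brier(0.5, TRUE)  # 0.25, the chance benchmark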

The ranking of the classifiers on these two metrics is very similar, excepting that the C5.0Rules algorithm does well under an LPS regime but not under a Brier Score regime. The blackboost, ctree, gbm and cforest algorithms, along with three SVM-based and four Partial Least Squares-based algorithms, fill the top places on both measures. Only blackboost and ctree record better-than-chance values for both metrics. They also predict with significantly better than 50% Accuracy.

(Note that three classifiers - rpart, rpart2 and treebag - fail to register an LPS because they occasionally generate probabilities of exactly 0 or 1, which can lead to undefined LPS'.)

In a future blog I might look at bootstrapping the probability scores of some or all of the classifiers with a view to assessing the statistical significance of the differences in their performances, but for now I'll settle for a tentative conclusion that the 10 or so best classifiers under the Brier Score metric are, in some sense, generally superior to the 10 or so worst classifiers under that same metric.

Now in this first analysis I've tuned the classifiers on one metric (AUC) and then measured their performance on a holdout dataset and on different metrics (Accuracy, Brier Score and LPS). What if, instead, I tuned the classifiers using the Brier Score metric?

Perhaps unsurprisingly it doesn't make much difference when we assess the subsequent models using the Accuracy metric. 

We find the same two C5.0-based classifiers at the head of the table, gbm and nnet doing a little better than previously, a few classifiers doing a little worse, but not a lot changing. In fact, 17 classifiers do worse in terms of Accuracy when tuned using a Brier Score instead of an AUC metric, 18 do better, and 10 stay the same.

Accuracy though, as I've already noted, is not my favourite metric, nor was it the metric on which we based our tuning in this round of analysis. That metric was instead a probability score - specifically, the Brier Score - so let's see how the models perform when we assess them using probability score metrics.

The blackboost and ctree classifiers continue to lead, and six of the 10 best classifiers under the AUC-based tuning regime remain in the top 10 under a BS-based tuning regime. The two neural network classifiers, nnet and avNNet, benefit greatly from the change in tuning regime, while the four Partial Least Squares classifiers suffer a little.

Overall though, the change of tuning regime is a net positive for classifier Brier Scores. Twenty-seven classifiers finish with a better BS, while just four finish with an inferior BS, and five classifiers now show better-than-chance Scores.

The story is similar for classifier Log Probability Scores, with 23 recording improvements, six showing declines, and five finishing with better-than-chance Scores.

Those five classifiers are the same in both cases and are:

  • blackboost
  • ctree
  • nnet
  • cforest
  • gbm

CONCLUSION

We started with a goal of determining whether or not it's fair to claim that some classifiers are, simply, better than others, at least in terms of the limited scope to which I've applied them here (viz the prediction of AFL line-betting results). I think we can claim that this appears to be the case, given that particular classifiers emerged as better-than-chance performers across 8 seasons and over 1,600 games even when we:

  1. Changed the basis on which we tuned them
  2. Changed the metric on which we assessed them

Specifically, it seems fair to declare blackboost, ctree, cforest and gbm as being amongst the best, general classifiers, and the nnet classifier as capable of joining this group provided that an appropriate tuning metric is employed.

Classifiers based on Support Vector Machines and Partial Least Squares methods also showed some potential in the analyses performed here, though none achieved better-than-chance probability scores under either tuning regime. These classifiers, I'd suggest, require further work to fully assess their efficacy.

One other phenomenon that the analyses revealed - reinforced really - was the importance of choosing appropriately aligned tuning and performance metrics. Two of the four best classifiers, cforest and gbm, only achieved better-than-chance Brier and Log Probability Scores when we switched from tuning them on AUC to tuning them on BS. On such apparently small decisions are profits won and lost.

I'll finish with this sobering abstract from David J Hand's 2006 paper, Classifier Technology and the Illusion of Progress:

"A great many tools have been developed for supervised classification,
ranging from early methods such as linear discriminant analysis
through to modern developments such as neural networks and support
vector machines. A large number of comparative studies have been
conducted in attempts to establish the relative superiority of these
methods. This paper argues that these comparisons often fail to take
into account important aspects of real problems, so that the apparent
superiority of more sophisticated methods may be something of an illusion.
In particular, simple methods typically yield performance almost
as good as more sophisticated methods, to the extent that the difference
in performance may be swamped by other sources of uncertainty that
generally are not considered in the classical supervised classification
paradigm."

... which reminds me of this:

“So we beat on, boats against the current, borne back ceaselessly into the past.”

How Often Does The Best Team Win The Flag?

Finals series are a significant part of Australian sporting life. No professional team sport I know determines its ultimate victor - as does, say the English Premier League - on the basis of a first-past-the-post system. There's no doubt that a series of Finals adds interest, excitement and theatre (and revenue) to a season, but, in the case of VFL/AFL at least, how often does it result in the best team being awarded the Flag?

Read More

VFL/AFL Home-and-Away Season Team Analysis

This year, Sydney collected its 8th Minor Premiership (including its record when playing as South Melbourne), drawing it level with Richmond in 7th place on the all-time list. That list is headed by Collingwood, whose 19 Minor Premierships have come from the 118 seasons it has contested, one season more than Sydney/South Melbourne and 11 more than Richmond.

Read More

The 2014 Grand Final : When the Coin Flipped

The Sydney Swans were deserved pre-game favourites on Saturday according to most pundits (but not all - congratulations to Robert and Craig for tipping the winners). At some point during the course of their record-breaking loss that favouritism was handed to the Hawks. In this blog we'll investigate when.

Read More

Grand Final History 1898-2013 : Winning Team Scoring Patterns

Only three teams in VFL/AFL history have trailed by more than three goals at Quarter Time in the Grand Final and gone on to win. The most recent was Sydney in 2012, who trailed the Hawks by 19 points at the first break before rallying in the second term to kick 6.0 to 0.1, eventually going on to win by 10 points; and before that Essendon, who in 1984 trailed the Hawks by 21 points at Quarter Time - and still trailed them by 23 points at Three Quarter Time - before recording a 24-point victory on the strength of a 9.6 to 2.1 avalanche in the final term.

Read More

Scoring Catenation: An Alternative Measure of Momentum

Almost two years ago, in a post-GF funk, I recall painstakingly cutting-and-pasting the scoring progression from the afltables site for 100 randomly-selected games from 2012. I used that data to search for evidence of in-game momentum, there characterising it as the tendency for a team that's just scored to be the team that's more likely to score next.

Read More

Scoring Shot Conversion Rates: How Predictable Are They?

In my earlier posts on statistically modelling team scoring (see here and here) I treated Scoring Shot conversion as a phenomenon best represented by the Beta Binomial distribution and proceeded to empirically estimate the parameters for two such distributions, one to model the Home team conversion process and the other to model Away team conversion. The realised conversion rates for the Home team and for the Away team in any particular game were assumed to be random, independent draws from these two fixed distributions.

Read More

Leading and Winning in AFL

One of the bets that's offered by TAB Sportsbet is on which of the teams will be the first to score 25 points. After analysing scoring event data for the period 2008 to 2014 provided by Paul from afltables.com I was surprised to discover that the first team to score 25 points goes on to win the game over 70% of the time.

Read More

When Do AFL Teams Score?

Soccer goals, analysis suggests, are scored at different rates throughout the course of matches as teams tire and as, sometimes, one team is forced to press for a goal or chooses to concentrate on defending. Armed with the data provided by Paul from afltables.com, which includes every scoring and end-of-quarter event from every game played between the start of season 2008 and the end of the home-and-away season of 2014, we can investigate whether or not the same is true of AFL scoring.

Read More

Scoring In Bursts: Evidence For In-Game Momentum?

The notion of momentum gets flung about in sports commentary as if it's some fundamental force, like gravity, that apparently acts at both long and short distances. Teams have - or don't have - momentum for periods as short as a few minutes, for perhaps half a quarter, going into the half-time break, entering the Finals, and sometimes even as they enter a new season, though I think when we start talking about momentum at the macro scale we wander perilously close to confusing it with another fundamental sporting force: form. It's a topic I've addressed, in its various forms, numerous times on MoS.

Read More

Are Some Games Harder to Predict Than Others?

If you've ever had to enter tips for an office competition where the sole objective was to predict the winner of each game, you'll intuitively recognise that the winners of some games are inherently harder to predict than others.

Read More