Updating the Player-Based MoSHPlay Forecaster
Last season, we had the first partially player-based forecaster, MoSHPlay, which provided forecasts for game margins and game totals, and estimates of home team victory probabilities. It performed reasonably well in a year that was, in many ways, completely unlike those it had been trained on.
It used as inputs the margin and team score forecasts of MoSHBODS, and player ratings derived from historical SuperCoach scores.
In this blog I’ll take you through the process of reviewing the existing MoSHPlay forecasting equations, and investigate the efficacy of deriving player ratings from AFL Player Rating scores rather than from SuperCoach scores.
**** WARNING: LONG READ AHEAD ****
THE DATA
We have:
MoSHBODS forecasts for team scores and game totals for the entirety of V/AFL history (note that we’ll be using the forecasts from the most recent version of MoSHBODS, as rebuilt for the 2021 season)
SuperCoach scores, available through the Footywire extract in the fitzRoy package in R, from 2010 onwards
AFL Player Rating scores, available through the fryzigg extract in the fitzRoy package in R, from 2012 onwards
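For anyone who wants to pull the raw data themselves, a minimal sketch using fitzRoy's fetch_player_stats function might look like the following (treat the source names and column details as things to check against the current package documentation rather than gospel):

library(fitzRoy)

# SuperCoach scores come via the Footywire extract (available from 2010 onwards)
sc_stats <- fetch_player_stats(season = 2020, source = "footywire")

# AFL Player Rating scores come via the fryzigg extract (available from 2012 onwards)
pr_stats <- fetch_player_stats(season = 2020, source = "fryzigg")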
PLAYER RATINGS
Ultimately, what we want from a Player Rating model is some estimate of what that player is likely to contribute to the team in the next game he or she plays. As a first step in estimating this, it might be useful to create a model that forecasts the Score (SuperCoach or AFL Player Rating) that he or she is most likely to produce in that next game, and then, later, link those projected player Scores to actual outcomes such as game margins.
(1) SUPERCOACH
Let’s first work on a model for forecasting SuperCoach Scores.
Now the simplest approach to forecasting a player’s next SuperCoach Score would be to use his or her last score - I mean, you are only as good as your last game, right - but that pathway is obviously subject to enormous variability. There are many ephemeral reasons why a player’s performance in any one particular game might be a poor indicator of likely future output.
To reduce that variability and estimate some underlying ability, it makes sense to use a number of previous Scores for the same player, and to combine them in some way.
For this purpose we’ll use the ets function from the R forecast package, which allows us to treat the string of scores that a player has recorded over a period as a time series, thereby recognising both the magnitude of and order in which scores were registered. More specifically, we’ll estimate a one-period ahead forecast SC score for a player based on his most recent N games played sometime during the past D days. We’ll try a range of values for N and D, and for the other parameters described in the following paragraphs, and select a joint set of parameter values that provide the best fit based on a metric described below.
(Technical note: we use players’ actual SC scores and do not adjust them for time spent on the ground, because we’re interested in each player’s actual “output” as measured by his or her SC score, not what he or she might have produced had he or she been on the field for every minute of the contest. Implicitly, we assume that a player’s average minutes per game in the sample data is a reasonable proxy for minutes per game in future matches. In treating the string of SC scores as a time series, we also ignore the time that elapses between games: a player’s second-most recent game is treated as having occurred twice as long ago as his or her most recent, regardless of how much time actually separated them. The fact that we only consider games played in the past D days does ameliorate this issue somewhat.)
The ets function requires that we provide a number of parameters. One of those parameters is alpha, a smoothing parameter, and another is a parameter to define the model type we are fitting. We’ll set that latter parameter to “ZNN”, the two N’s of which mean that we’re assuming the underlying SC data has no trend or seasonality, and the Z of which means that we’re allowing the algorithm to determine the error type (refer to the previously linked PDF for more details).
When we run the ets function using a player’s SC history for a specified alpha, we’ll get an estimate - a forecast - of that player’s next SC score. We can think of that as an estimate of his current value, as measured by his next most-likely SC score.
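As a concrete illustration, here’s a minimal sketch (not the actual MoSHPlay code) of that one-game-ahead forecast for a single player, where sc_history is assumed to hold his or her SC scores in chronological order, already restricted to the most recent N games from the past D days:

library(forecast)

forecast_next_score <- function(sc_history, alpha = 0.1) {
  # "ZNN": no trend, no seasonality, error type chosen by the algorithm,
  # with the smoothing parameter alpha fixed rather than estimated
  fit <- ets(ts(sc_history), model = "ZNN", alpha = alpha)
  # The one-period-ahead point forecast is our estimate of the player's next SC score
  as.numeric(forecast(fit, h = 1)$mean)
}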
Now for some players, this estimate will be based on very few games and so will be highly variable. To deal with this reality we will employ some regularisation, assuming that players who’ve played fewer than Y games would have recorded a score of S in the “missing” games.
The final estimate value of a player then will be a simple weighted average of the estimate we get from ets and this assumed score for the missing games.
In other words:
For players who have played Y games or more
Estimated Player Value = ets estimate
For players who have played fewer than Y games
Estimated Player Value = (ets estimate x number of games played + (Y - number of games played) x S) / Y
In essence, what we do for less-experienced players is drive their actual forecast Score back towards S, more strongly the fewer the games the player has played.
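In code, that shrinkage step might look something like this sketch (the argument names are mine, and Y and S are the parameters defined above):

regularised_value <- function(ets_estimate, games_played, Y, S) {
  if (games_played >= Y) {
    # Experienced players: use the ets forecast as-is
    ets_estimate
  } else {
    # Less-experienced players: pad out to Y games with the assumed score S,
    # pulling the estimate towards S the fewer games they have played
    (ets_estimate * games_played + (Y - games_played) * S) / Y
  }
}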
In all, we had five parameters to optimise, the final values for which were as follows:
N (the maximum number of games to include): 30 (roughly speaking then, for a player who regularly takes the field for his team, we’ll be including the last couple of seasons)
D (the maximum age of included games): 3 years (this ensures that we’re not considering games from too long ago)
Alpha (the smoothing parameter): 0.1 (this produces forecasts that are relatively slow to respond to the most recent games. It makes the forecasts behave somewhat like a raw average, though older SC scores are assigned slightly less weight)
Y (the minimum number of games before regularisation is no longer employed): 10
S (the assumed SC score for “missing” games): 55 (this tends to lower estimated values for players in their first year or two of their careers, unless they consistently produce sub-55 SC scores. It also provides the value that is used for debutants.)
These values were chosen because they minimised the mean absolute difference between the one-game ahead Score forecasts and the actual Scores recorded (the MAE). In other words, these values of the parameters produced one-game ahead forecasted Scores that were relatively close to the Scores actually achieved by the players in that next game.
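You can think of that selection as a simple grid search. A rough sketch appears below, where the candidate values are purely illustrative (not the grid actually searched) and one_step_forecasts is a hypothetical helper that returns the forecast and actual Score for every player-game under a given parameter set:

grid <- expand.grid(N = c(20, 30, 40, 50),
                    D = c(2, 3, 4),                  # years
                    alpha = c(0.05, 0.075, 0.1, 0.15),
                    Y = c(5, 10, 20),
                    S = c(45, 55, 65))

# one_step_forecasts() is a hypothetical helper returning a data frame with
# a forecast and an actual column for every player-game
grid$MAE <- apply(grid, 1, function(p) {
  fc <- one_step_forecasts(N = p["N"], D = p["D"], alpha = p["alpha"],
                           Y = p["Y"], S = p["S"])
  mean(abs(fc$forecast - fc$actual))
})

grid[which.min(grid$MAE), ]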
To give you an idea of what the SuperCoach Forecast Score model produces, here are the 2020 forecasts for a selection of highly rated current players.
(2) AFL PLAYER RATING
We adopt a similar, ets-based approach in building a model to forecast AFL Player Rating Scores.
Here, the optimal parameter values were:
N (the maximum number of games to include): 50
D (the maximum age of included games): 3 years (this ensures that we’re not considering games from too long ago)
Alpha (the smoothing parameter): 0.075
Y (the minimum number of games before regularisation is no longer employed): 20
S (the assumed AFL Player Rating score for “missing” games): 6
Note that we inflate all 2020 AFL Player Rating Scores by 25% to adjust for the shorter quarters, since these Scores are derived directly from on-field actions, and the number of actions is a function of game time. No such adjustment is required for SuperCoach Scores since these are constrained to average 75 per player, regardless of match length.
Here are the AFL Player Rating forecasts for some selected, highly rated, current players.
COMBINING SCORE FORECASTS WITH MOSHBODS FORECASTS
To build all of the models incorporating MoSHBODS and Player Score forecasts, we’ll consider only the data from seasons 2015 to 2020 (and increase the actual 2020 margins by 25% to adjust for the shorter quarters), and split the data 50:50 into a training and a test set.
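In code, that preparation step might look something like this sketch (column names are assumptions, and a random 50:50 split is used here purely for illustration):

# Restrict to 2015 to 2020 and inflate 2020 margins by 25% for the shorter quarters
games <- subset(all_games, season >= 2015 & season <= 2020)
games$actual_margin <- ifelse(games$season == 2020,
                              games$actual_margin * 1.25,
                              games$actual_margin)

# 50:50 split into training and test sets
set.seed(2021)
idx   <- sample(nrow(games), size = floor(nrow(games) / 2))
train <- games[idx, ]
test  <- games[-idx, ]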
(1) USING SUPERCOACH FORECASTS
One issue I’m particularly keen to explore in this review is the best manner in which to employ forecast Scores - in their raw form, or transformed in some way. Is, for example, a player with a forecast SuperCoach Score of 90 really twice as good as one with a forecast Score of 45, or do we need to use some transformation that reduces this ratio?
FORECASTING GAME MARGINS
We fit ordinary least squares regression models to the training data using eight different functional forms as follows:
Model 1: Actual Margin ~ MoSHBODS Expected Margin
Model 2: Actual Margin ~ MoSHBODS Expected Margin + Difference in Team Average Forecast SC Scores
Model 3: Actual Margin ~ MoSHBODS Expected Margin + Difference in Team Average sqrt(Forecast SC Scores)
Model 4: Actual Margin ~ MoSHBODS Expected Margin + Difference in Team Average ln(Forecast SC Scores)
Model 5: Actual Margin ~ Difference in Team Average Forecast SC Scores
Model 6: Actual Margin ~ MoSHBODS Expected Home Score + MoSHBODS Expected Away Score + Home Team Average Forecast SC Score + Away Team Average Forecast SC Score
Model 7: Actual Margin ~ MoSHBODS Expected Home Score + MoSHBODS Expected Away Score + ln(Home Team Average Forecast SC Score) + ln(Away Team Average Forecast SC Score)
Model 8: Actual Margin ~ MoSHBODS Expected Home Score + MoSHBODS Expected Away Score + sqrt(Home Team Average Forecast SC Score) + sqrt(Away Team Average Forecast SC Score)
(Note that in, for example, the term Difference in Team Average ln(SC Scores), we first take the natural logs of the individual player scores, then we average them, then we difference the results for the home and the away teams)
We then use the fitted versions of these models to make forecasts on the test data, and choose the model with the smallest mean absolute error.
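As a sketch of what that looks like for, say, Model 4 (again with assumed column names, where diff_mean_log_sc is the home team’s average ln(Forecast SC Score) less the away team’s):

# Fit Model 4 on the training data
m4 <- lm(actual_margin ~ moshbods_exp_margin + diff_mean_log_sc, data = train)

# Score it on the test data using the mean absolute error
test_preds <- predict(m4, newdata = test)
mae_m4 <- mean(abs(test$actual_margin - test_preds))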
It’s a very close run thing between models 3 and 4, but model 4, which takes the natural logarithm of raw forecast SuperCoach Scores, is best, with an MAE of 28.76 points per game.
Refitting model 4 to the entire data set yields the following equation:
Expected Margin = 1.33821 + 0.83371 x MoSHBODS Expected Margin + 104.03676 x Difference in Team Average ln(Forecast SC Scores)
One implication of this model is that a player with a forecast SuperCoach Score of 90 is only about 1.2 times as “good” (in terms of game margin) as a player with a forecast Score of 45, since ln(90)/ln(45) is about 1.18.
FORECASTING TOTAL SCORES
We fit ordinary least squares regression models to the training data using nine different functional forms as follows:
Model 1: Actual Total ~ MoSHBODS Expected Total
Model 2: Actual Total ~ MoSHBODS Expected Total + Difference in Team Average Forecast SC Scores
Model 3: Actual Total ~ MoSHBODS Expected Home Score + MoSHBODS Expected Away Score + Home Team Average Forecast SC Scores + Away Team Average Forecast SC Scores
Model 4: Actual Total ~ Home Team Average Forecast SC Scores + Away Team Average Forecast SC Scores
Model 5: Actual Total ~ MoSHBODS Expected Home Score + MoSHBODS Expected Away Score + sqrt(Home Team Average Forecast SC Scores) + sqrt(Away Team Average Forecast SC Scores)
Model 6: Actual Total ~ MoSHBODS Expected Home Score + MoSHBODS Expected Away Score + ln(Home Team Average Forecast SC Scores) + ln(Away Team Average Forecast SC Scores)
Model 7: Actual Total ~ MoSHBODS Expected Home Score + MoSHBODS Expected Away Score + abs(Difference in Team Average Forecast SC Scores)
Model 8: Actual Total ~ MoSHBODS Expected Home Score + MoSHBODS Expected Away Score + Difference in Team Average sqrt(Forecast SC Scores)
Model 9: Actual Total ~ MoSHBODS Expected Home Score + MoSHBODS Expected Away Score + Difference in Team Average ln(Forecast SC Scores)
(The functional form for Model 7 was informed by some earlier analysis that suggested games with more mismatched teams tended to produce higher totals.)
Model 6 (which also incorporates the natural log transformation of forecast SC Scores, but applies it here after the mean Score has been calculated for each team) narrowly yields the lowest MAE when applied to the test data, at 22.48 points per game.
Refitting model 6 to the entire data set yields the following equation:
Expected Total = 240.35440 + 0.42209 x MoSHBODS Expected Home Team Score + 0.62279 x MoSHBODS Expected Away Team Score + 20.64999 x ln(Home Team Mean Forecast SC Scores) - 57.23884 x ln(Away Team Mean Forecast SC Scores)
I’ll confess to being a little bit nervous about including a model with four inputs, but I guess we’ll see how it performs in 2021.
ESTIMATING WIN PROBABILITIES
We fit binary logit regression models to the training data (for non-drawn games) using six different functional forms as follows:
Model 1: Result ~ MoSHBODS Expected Margin
Model 2: Result ~ MoSHBODS Expected Margin + Difference in Team Average Forecast SC Scores
Model 3: Result ~ Modelled Margin^ + Difference in Team Average Forecast SC Scores
Model 4: Result ~ 0 + Modelled Margin^ (ie a model without intercept)
Model 5: Result ~ Difference in Team Average Forecast SC Scores
Model 6: Result ~ MoSHBODS Expected Home Score + MoSHBODS Expected Away Score + Home Team Average Forecast SC Scores + Away Team Average Forecast SC Scores
^ Using the best model from the Game Margin models above (ie model 4)
Using log probability score (LPS) as the performance metric, model 4 is the standout leader here. Its LPS on the test data is 0.1662 bits per game.
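A sketch of fitting and scoring that preferred, no-intercept form appears below (column names are again assumptions; result_home is 1 for a home win and 0 for a loss, modelled_margin is the output of the Game Margin model above, and I’m using the 1 + log2(p) version of the log probability score):

# Fit the no-intercept binary logit on the modelled margin
m_prob <- glm(result_home ~ 0 + modelled_margin,
              family = binomial(link = "logit"),
              data = train)

# Estimated probability of a home win on the test data
p_home <- predict(m_prob, newdata = test, type = "response")

# Log probability score in bits per game: 1 + log2 of the probability
# assigned to the actual result (higher is better)
lps <- mean(1 + log2(ifelse(test$result_home == 1, p_home, 1 - p_home)))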
Refitting it to the entire data set yields the following equation:
Probability of Home Win = 1/(1+exp(-0.04865 x Modelled Margin))
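To give a sense of the scale that implies, a Modelled Margin of +10 points maps to a home win probability of 1/(1+exp(-0.4865)), or roughly 62%, while a Modelled Margin of -10 points maps to roughly 38%.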
One especially appealing feature of this model is that it will always produce probability estimates under 50% for negative expected margins, and always produce probability estimates over 50% for positive expected margins (and produce a 50% probability estimate when the expected margin is exactly zero).
(2) USING AFL PLAYER RATING FORECASTS
We fit exactly the same models here as we did when we were using SuperCoach Forecasts, and the results are:
FORECASTING GAME MARGINS
The preferred model, with an MAE on the test data of 28.29 points per game, which is about half a point better than the equivalent model using SuperCoach forecasts, is the following:
Expected Margin = 1.9327 + 0.7394 x MoSHBODS Expected Margin + 9.9399 x Difference in Team Average Forecast AFL PR Scores
Note that no transformation is applied to the forecast Scores, so our estimate of the ratio of the abilities of two players is proportionate to the ratio of their forecast Scores.
FORECASTING TOTAL SCORES
The preferred model, with an MAE on the test data of 22.51 points per game, which is about the same as that for the equivalent model using SuperCoach forecasts, is the following:
Expected Total = 82.3997 + 0.5076 x MoSHBODS Expected Total + 2.7446 x Absolute Difference in Team Average Forecast AFL PR Scores
So, just two inputs here, which gives us a little less (maybe) to be concerned about.
ESTIMATING WIN PROBABILITIES
The preferred model, with an LPS on the test data of 0.1760 bits per game, which is considerably better than the equivalent model using SuperCoach forecasts, is the following:
Probability of Home Win = 1/(1+exp(-0.04834 x Modelled Margin))
Note that the coefficient on Modelled Margin here is very similar to that in the equivalent model based on SuperCoach forecasts, but the Modelled Margin referred to is that from the AFL PR model for forecasting game margins.
COMPARATIVE PERFORMANCE
As a last step, let’s apply the six preferred models to the entire data set from 2015 to 2020, and compare the SuperCoach-based models with their AFL Player Rating-based equivalents year-by-year, and overall.
We see that:
For forecasting Game Margins, the AFL Player Rating-based model is clearly superior to the SuperCoach-based model. It produces a lower MAE in 4 of the 6 seasons, and an MAE across all 6 seasons about one-quarter of a point per game lower. (Note that the MAE for 2020 uses Actual Margins that are inflated by 25%. You can calculate the equivalent MAEs based on the actual scores by taking 80% of the figures shown here.)
For forecasting Total Scores or estimating Home Team Winning Probabilities, there’s little to separate the SuperCoach from the AFL Player Rating models. That’s a little surprising given the clear superiority of the AFL Player Rating model for estimating probabilities when applied to the test data alone, though you would hope that future out-of-sample performance is better estimated by the relative performances on the holdout. We will see.
NEXT STEPS
Now that we have player ability estimates using SuperCoach and AFL Player Rating data, it might be interesting to further analyse their similarities and differences.
Some other thoughts I have are to:
Identify the highest- and lowest-rated teams since 2015
Compare SuperCoach and Player Rating data for the same players to identify those who consistently do better or worse under one methodology compared to the other
Investigate team-by-team Scores to see if different team styles systematically lead to different outcomes in terms of SuperCoach or AFL Player Rating Scores
As always, I’d also be interested to hear your suggestions.