Fitting Team Winning Percentages: Alternatives to Pythagorean Expectation

I've addressed the topic of fitting a team's winning rate as a function of its scoring behaviour before on MoS, in discussions about Win Production Functions generally and in posts about Pythagorean Expectation specifically.

Pythagorean Expectation has been cropping up again for me recently, in a really nice piece reviewing the inequities of the 2015 Draw that I read on the HurlingPeopleNow website, in a tweet on the SoccerMetrics site generously pointing some of their visitors to the earlier MoS piece, and in updates to analyses I've been performing with the encouragement of Friend of MoS, Mark (who, by the way, has been diligently assembling pre-1896 footy data - he's managed to get back as far as 1870).

During that work, Mark made me aware of some relatively simple alternatives to Pythagorean Expectation, most of which, it seems, were developed in the US for use in baseball. That genesis makes sense: when you're 42 games into a 162-game season, an accurate yet simple way to project the remainder of the season must seem attractive.

In this blog I'm going to describe a few of these Pythagorean alternatives and assess how well, comparatively and absolutely, they fit historical end-of-home-and-away-season VFL/AFL data.

ALTERNATIVES TO PYTHAGOREAN EXPECTATION

For the blog I'll be considering eight methods, each of which uses only information about a team's winning rate for a season, the number of points it scored and conceded, and, in some cases, the number of games it played in scoring and conceding them.

The formulae for these methods appear in the table at right, and you can read a little about their history in this very old blog post, which appears to have been most recently updated sometime around 2004, though it was created in an era when website aesthetics were, by necessity, hostage to the available capabilities.

I'll leave any further study of the history of these methods to the interested reader, but I will note here that I've generalised - and in some cases newly parameterised - them. For example, the standard Kross method has no free parameters; in the generalised version used here it corresponds to fixing k1 and k2 at 2. As well, none of the methods formally constrains its winning percentage outputs to the (0,1) range, as I've done here with all that convoluted "min"ing and "max"ing.
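To make that clamping concrete, the general pattern is to wrap a method's raw output in min and max. As a purely hypothetical illustration (this linear rule is not one of the eight methods), the constraint looks like:

$$\widehat{W} = \min\left(1,\ \max\left(0,\ \frac{1}{2} + k\,\frac{PF - PA}{G}\right)\right)$$

where PF and PA are a team's season points for and against, G is its games played, and k is a free parameter. Ratio-based forms such as Pythagorean Expectation already produce values inside (0,1), but linear forms like this hypothetical one can stray outside it, which is what the clamping guards against.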

Lastly, I'll note that the Palmer version shown here is different from the one in the ancient blog post I linked to earlier. The version I've used is the one that Mark provided, and it fits better than the version in that blog anyway.

I've also included in the set of methods to be explored an alternative version of Pythagorean Expectation in which separate exponents are allowed for Points For and Points Against.
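For orientation, the standard Pythagorean Expectation and this two-exponent alternative take the forms below. I've also shown the Pythagenpat idea, in which the exponent itself depends on total scoring per game; the 0.287 is the value commonly quoted in the baseball literature, and in this analysis the corresponding parameter is fitted rather than fixed:

$$\widehat{W} = \frac{PF^{k}}{PF^{k} + PA^{k}} \quad \text{(Pythagorean)}$$

$$\widehat{W} = \frac{PF^{k_1}}{PF^{k_1} + PA^{k_2}} \quad \text{(Alternative Pythagorean)}$$

$$\widehat{W} = \frac{PF^{x}}{PF^{x} + PA^{x}}, \quad x = \left(\frac{PF + PA}{G}\right)^{0.287} \quad \text{(Pythagenpat, baseball form)}$$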

To test the methods I fitted them to the results of each of the 118 seasons separately, using the nls function of the stats package in base R to estimate optimal values of the k's for each season and method. Though nls uses squared error to determine the optimal values of the free parameters, to evaluate the fit I used both the squared correlation between actual and fitted winning rates, and the mean absolute difference between the actual and fitted winning rates for all the teams in a given season (i.e. the MAE).
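As a rough sketch of that per-season fitting for the standard Pythagorean method (the data frame and column names here are mine, assumed for illustration, not those of the actual analysis):

```r
# Fit one season's Pythagorean exponent by nonlinear least squares,
# assuming season_data has columns win_rate, points_for and points_against
fit <- nls(win_rate ~ points_for^k / (points_for^k + points_against^k),
           data  = season_data,
           start = list(k = 2))  # the classic exponent as a starting value

fitted_rates <- fitted(fit)

# The two evaluation metrics described above
squared_cor <- cor(season_data$win_rate, fitted_rates)^2
mae         <- mean(abs(season_data$win_rate - fitted_rates))
```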

The chart below records the MAE for each of the eight methods, optimised for each of the 118 seasons. Most of the methods produce fitted winning percentages that differ from the actual percentages, on average, by between 4 and 8 percentage points per team. So, for example, for a team whose actual winning rate was 50%, a model fitted using the scoring data would most likely produce a value in the 45-55% range. In this regard, all of the methods fit the actual data well.

Averaged over the entirety of VFL/AFL history, the Alternative Pythagorean method performs best, coming in with a season-average MAE of 5.25% per team per season. Pythagenpat is next best at 5.27%, the Alternative Pythagenpat at 5.28%, Palmer at 5.29%, Pythagorean and Tango both at 5.30%, Ben V-L at 5.31%, then Kross at 5.61%. Kross aside, then, there's not a lot to differentiate the methods.

Looking, instead, at just the Modern Era, Pythagenpat is best with an average MAE of 5.14%, followed by the Alternative Pythagorean method (5.15%), the Alternative Pythagenpat method (5.16%), Pythagorean (5.18%), Palmer (5.24%), Ben V-L (5.26%), Tango (5.27%), and Kross (5.59%). So, there's a bit more spread and Kross remains definitively inferior, but there's still not a great deal of separation.

These averages, however, hide a subtler story about the superiority of particular methods in specific seasons. Kross, though comfortably last in terms of average MAE, is most often the best method for a season. You can get a hint of this from the chart below, which tracks the performance of each method in a given season relative to the best method in that season. (You can also see from this chart how highly variable Kross' performance is.)
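For the curious, that relative-to-best comparison is simple to compute. A sketch, assuming a long-format data frame mae_by_season with columns season, method and mae (again, names of my own invention):

```r
# Express each method's MAE in a season relative to that season's best method
library(dplyr)

relative_mae <- mae_by_season %>%
  group_by(season) %>%
  mutate(mae_vs_best = mae - min(mae)) %>%  # 0 for the season's best method
  ungroup()
```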

In fact, as the table below reveals, the Kross method is best in almost 1 season in 4. Its problem is that it's also worst in over 1 season in 2.

Pythagenpat arguably has the most appealing best versus worst record, finishing best in 20 of the 118 seasons and worst in only 9. Across the three most recent Eras it's been best 12 times and worst only once. The Alternative Pythagorean method has a similar record, though it achieves this with the aid of an additional free parameter.

The full set of optimised parameters and performance metrics for all methods and seasons appears in the table below (which, I remind you, like all images on MoS, can be clicked for a larger version).

MAKING FORECASTS

As further input into our assessment of the relative merits of the various methods, we might also consider their stability, which we measure by fitting the results of one season using the optimised model coefficients from the season before. Stable methods should still provide reasonably good levels of fit when we do this.
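A sketch of that stability check for the Pythagorean method, once more with assumed object names: take the exponent optimised for season t-1 and apply it, unchanged, to season t.

```r
# Evaluate season t using the exponent fitted to season t-1
pythagorean_w <- function(k, pf, pa) pf^k / (pf^k + pa^k)

predicted <- pythagorean_w(k_prev_season,
                           this_season$points_for,
                           this_season$points_against)

mae_out_of_sample <- mean(abs(this_season$win_rate - predicted))
```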

Appearing below are the same two charts as previously, but now based on the analysis just described. The chart on the left tracks the season-by-season MAE for each method, and the one on the right tracks each method's MAE in comparison to the best method in that season. Note that I've allowed the y-axis scale to vary for each method in the chart on the right because, otherwise, the 8% figure for Kross in 1898 would define the scale for all methods and reduce our ability to distinguish some of the smaller numbers for the other methods.

As they must (coefficients borrowed from the previous season can hardly be expected to fit better than those optimised for the season itself), the MAEs increase, but not by much. The figures are now:

All-time

  • Pythagenpat: 5.81%
  • Alternative Pythagenpat: 5.81%
  • Pythagorean: 5.83%
  • Tango: 5.84%
  • Ben V-L: 5.85%
  • Palmer: 5.85%
  • Alternative Pythagorean: 5.89%
  • Kross: 6.30%

Modern Era

  • Pythagenpat: 5.34%
  • Alternative Pythagenpat: 5.40%
  • Pythagorean: 5.44%
  • Tango: 5.47%
  • Palmer: 5.47%
  • Ben V-L: 5.49%
  • Alternative Pythagorean: 5.51%
  • Kross: 5.92%

The All-Time figures are generally about 0.5 to 0.65 percentage points higher than the earlier equivalent MAEs, and the Modern Era figures about 0.2 to 0.3 percentage points higher.

Looking at the best versus worst figures for this new analysis, Pythagenpat fares best in the comparison. It is the best method in more than one-quarter of seasons and the worst in only one-eighth. The Kross method fares worst: it is now the best method in only 14% of seasons but remains the worst in over one-half.

SUMMARY AND CONCLUSION

The Kross method aside, none of the alternative methods explored in this blog fits the historical record substantially less well than does Pythagorean Expectation.

That said, the Pythagenpat method finishes among the top two methods on All-Time and Modern Era MAE in our first analysis, and as the top method on both measures in the second. So, it both fits the data well and displays a high level of stability, as I've measured those things here. On that basis, Pythagenpat appears to be a legitimate alternative to Pythagorean Expectation for fitting VFL/AFL winning percentages.