It's a cliche, sure, but sometimes a picture (or two) does paint a thousand words.
Recently I've been working on the assumption that the Home and Away team scores in a contest can be modelled as a bivariate normal with an appropriately estimated mean vector and variance-covariance matrix. Various analyses have suggested to me that this assumption is broadly valid from an empirical viewpoint but, until tonight, I've never taken the simple step of graphing the data. Rookie error ...
These few lines of R code did the trick:
# read in the data, which has just 2 columns: Home Score and Away Score. It spans 1999 to 2012 R20
Data <- read.csv("C:/Users/Tony/Desktop/Data.csv")
# load the necessary packages: ks provides kde() and Hpi(), rgl provides persp3d()
library(ks)
library(rgl)
dens = kde(Data,H=Hpi(Data),xmin=c(10,10), xmax=c(240,240))
persp3d(dens$eval.points[[1]], dens$eval.points[[2]], dens$estimate, col = 'skyblue',
        xlab = "Home Score", ylab = "Away Score", zlab = "Density")
This last command produces an interactive 3D plot of the kernel density estimates, various snapshots of which appear below.
These charts (each of which is clickable by the way) hint that the bivariate Normal distribution might not be the best choice for jointly fitting Home and Away scores. Whilst the Normal distribution will probably do an adequate job - as it very often does - the extended tails to the right of the marginal distributions for Home and for Away scores make me wonder if a bivariate lognormal or a bivariate negative binomial might do better.
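One quick way to probe the right-tail concern is to compare the sample skewness of a score before and after a log transform. The sketch below is illustrative only: it runs on simulated lognormal values standing in for Home scores (the parameters are arbitrary assumptions, not estimates from the 1999 to 2012 data), and skewness() is a small helper defined here rather than a function from any package.

```r
# Simulated stand-in for Home scores (hypothetical values, NOT the real data)
set.seed(1)
Home <- rlnorm(2000, meanlog = log(95), sdlog = 0.28)

# Sample skewness: third central moment divided by the variance to the power 1.5
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

skew_raw <- skewness(Home)       # positive when the right tail is extended
skew_log <- skewness(log(Home))  # close to zero if a lognormal is a good fit

print(c(raw = skew_raw, log = skew_log))
```

On the real data, a clearly positive skewness for the raw scores that largely disappears after taking logs would be one piece of evidence in favour of the lognormal alternative.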
The second image of this series, its shape reminiscent of the Breadknife rock formation in the Warrumbungles in NSW, Australia and tilted relative to the Home Score and Away Score axes, best depicts the correlation between Home and Away scores, which necessitates that a bivariate approach be adopted.
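For what it's worth, the ingredients of a bivariate normal fit are just the sample mean vector, the variance-covariance matrix and the implied correlation. Here is a minimal sketch of estimating them, using simulated scores in place of the real file. The means, spreads and correlation used to generate the data are arbitrary assumptions, and the column names Own and Opp are assumed to hold the Home and Away scores.

```r
# Simulated stand-in for the two score columns (hypothetical values, NOT the real data)
set.seed(1)
n   <- 5000
rho <- -0.15   # arbitrary illustrative correlation, not an estimate
z1  <- rnorm(n)
z2  <- rnorm(n)
Sim <- data.frame(
  Own = 100 + 26 * z1,
  Opp =  90 + 25 * (rho * z1 + sqrt(1 - rho^2) * z2)
)

# Estimated parameters of the bivariate normal
mu_hat    <- colMeans(Sim)           # mean vector
Sigma_hat <- cov(Sim)                # variance-covariance matrix
rho_hat   <- cor(Sim$Own, Sim$Opp)   # correlation between the two scores

print(mu_hat)
print(Sigma_hat)
print(rho_hat)
```

Replacing Sim with the real Data would give the estimates for actual games; a non-zero off-diagonal entry in the covariance matrix is what rules out modelling the two scores independently.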
Sometime soon then I'll be revisiting the assumption that team scores in a game are drawn from bivariate normal distributions.
These few additional lines of R code produce a 2D contour plot of the kernel density estimates, which does a better job of showing the mild correlation between Home Score and Away Score:
library(MASS)
# Own and Opp hold the Home and Away scores respectively
f1 <- kde2d(Data$Own, Data$Opp, n = 50, lims = c(10, 240, 10, 240))
filled.contour(f1, col = rev(grey.colors(10)), nlevels = 10,
               xlab = "Home Score", ylab = "Away Score")