Who is the Premier League’s best goalkeeper?
Despite the ever-increasing opportunities for supporters to watch live televised football, many still rely on edited highlights to follow the ebb and flow of the domestic leagues.
On a typical Saturday evening, Match of the Day condenses the day’s Premier League matches into ten-minute segments, complete with interviews and post-game comment.
Inevitably, selectively chosen incidents create a distorted impression of the talent on display.
What does Match of the Day show viewers?
Mistakes and exceptional pieces of skill appear much more common than is actually the case. Combined with the many cognitive biases that arise when the human brain attempts to assimilate and quantify information, this makes it very easy to over- or underestimate a team’s likely true worth.
Goals are the natural focus of highlight reels, so goalscorers and keepers are the players most prone to seeing their reputations inflated or depressed on the flimsiest of short-term evidence.
Increasingly, clubs are taking an integrated approach to scouting, combining the nuances that can be identified from watching a player in the flesh with extensive, data-driven assessment that can track a player’s traits and abilities over his entire career.
Central to this data-rich environment are the various expected goals models that exist to enable coaches to evaluate the quality and quantity of chances each player is participating in, measure the outcome of such processes and compare actual outcomes to those of the player’s peers.
What are xG2 models?
Expected goals models, or xG for short, use a large historical database of similar chances and their actual outcomes to estimate the likelihood of success for any opportunity presented to an attacking player.
Equally, the chance that a keeper will save an attempt on target can be modelled, again based on parameters such as how and from where the shot was taken, but also incorporating post-shot information: typically, where the shot was destined to enter the net, whether it was struck with power or swerve, and whether the keeper was further inconvenienced by the ball taking a deflection.
These models are called xG2 models to distinguish them from ones that are used to describe all attempts on goal, whether they are on target or not.
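As a rough illustration of the idea, and not a description of how any particular xG2 model is built, the sketch below estimates the chance of a goal from the historical frequency of goals among similar on-target attempts. The records, the bucket labels (`central_box`, `corner` and so on) and the function names are all made up for the example; a real model would draw on far more data and variables.

```python
from collections import defaultdict

# Hypothetical, made-up records: (shot zone, placement in goal, goal conceded?).
# The labels are illustrative assumptions, not fields from any real data feed.
history = [
    ("central_box", "corner", True),
    ("central_box", "corner", True),
    ("central_box", "corner", False),
    ("central_box", "middle", False),
    ("central_box", "middle", True),
    ("central_box", "middle", False),
    ("central_box", "middle", False),
    ("long_range", "corner", True),
    ("long_range", "corner", False),
    ("long_range", "middle", False),
    ("long_range", "middle", False),
]


def fit_goal_rates(records):
    """Return the observed goal frequency for each (zone, placement) bucket."""
    goals = defaultdict(int)
    shots = defaultdict(int)
    for zone, placement, scored in records:
        shots[(zone, placement)] += 1
        goals[(zone, placement)] += int(scored)
    return {key: goals[key] / shots[key] for key in shots}


goal_rates = fit_goal_rates(history)


def xg2(zone, placement):
    """Estimated chance that an on-target attempt of this type becomes a goal."""
    return goal_rates[(zone, placement)]


def save_probability(zone, placement):
    """Estimated chance that the keeper saves an attempt of this type."""
    return 1.0 - xg2(zone, placement)


print(f"xG2, central box to the corner: {xg2('central_box', 'corner'):.2f}")
print(f"Save chance, long range to the middle: {save_probability('long_range', 'middle'):.2f}")
```

Simple buckets like these become unreliable as soon as a bucket holds only a handful of shots, which is one reason to expect real models to fit a statistical model across many variables instead.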
Models take a probabilistic world view, but the reality of a football match is that chances are either taken or spurned, and reconciling the differences between the process of chance creation and actual outcomes is often key to interpreting player or team assessment.
A keeper typically has a 20% chance of saving an on-target penalty kick and two saves in every ten such attempts is the most likely actual outcome, but there is also a range of possible - but less likely - outcomes on either side of this expected value.
Simulating numerous sequences of ten penalties, each with a 20% chance of being saved, shows that in well over a third of such trials (roughly 38%) a keeper will save at most one of the ten attempts.
Similarly, in around 12 out of every 100 trials, a keeper will save at least four such kicks.
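The penalty example can be checked in a few lines. The sketch below assumes, as the text does, that every kick has the same independent 20% chance of being saved; it computes the exact binomial probabilities and compares them with a simulation. The function names, the seed and the number of trials are arbitrary choices for the illustration.

```python
import random
from math import comb

SAVE_PROB = 0.2  # per-penalty save chance quoted in the text
KICKS = 10       # penalties per trial


def exact_pmf(k, n=KICKS, p=SAVE_PROB):
    """Binomial probability of exactly k saves in n penalties."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def simulate_saves(trials, n=KICKS, p=SAVE_PROB, seed=0):
    """Simulate `trials` sequences of n penalties; return the saves in each."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]


# Exact probabilities from the binomial distribution.
at_most_one = sum(exact_pmf(k) for k in range(0, 2))
at_least_four = 1 - sum(exact_pmf(k) for k in range(0, 4))

# The same quantities estimated by simulation.
saves = simulate_saves(100_000)
sim_at_most_one = sum(s <= 1 for s in saves) / len(saves)
sim_at_least_four = sum(s >= 4 for s in saves) / len(saves)

print(f"At most one save:    exact {at_most_one:.3f}, simulated {sim_at_most_one:.3f}")
print(f"At least four saves: exact {at_least_four:.3f}, simulated {sim_at_least_four:.3f}")
```

The exact figures come out at roughly 0.376 and 0.121, close to the proportions quoted above, and the simulated values settle near them as the number of trials grows.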
If such relatively common over- and under-performances against the expected average were seen over the course of a season, it would be tempting to regard the former group as below-average keepers when facing penalties and the latter as expert penalty savers.
What is the chance of a penalty being saved?
However, throughout these simulations, each penalty kick has had the same 20% chance of being saved, and what might have passed for talent, or a lack of it, in the real world is simply randomness in a small number of trials.
It is therefore essential that we allow for the often strong possibility that random variation, rather than a difference in skill levels, has separated the performances of players as measured by our various xG models.
One simple way to visualise multiple seasons of player data is to take a running average that smooths out the performance curves, to see whether periods of under- or over-performance persist or become swamped by runs of less extreme outcomes.
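A running average of this kind might be computed along the following lines. This is a minimal sketch: the per-shot values are invented for the example, and the window of four shots is an arbitrary assumption, since the window length used for the plots is not stated.

```python
def rolling_mean(values, window):
    """Mean of each consecutive run of `window` values (no padding at the ends)."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        # Slide the window forward: add the new value, drop the oldest one.
        total += values[i] - values[i - window]
        means.append(total / window)
    return means


# Hypothetical data for eight on-target attempts, in the order they were faced.
xg2_per_shot = [0.10, 0.45, 0.30, 0.05, 0.80, 0.25, 0.15, 0.60]  # modelled goal chance
conceded = [0, 1, 0, 0, 1, 0, 0, 1]                              # 1 = goal, 0 = save

expected_curve = rolling_mean(xg2_per_shot, window=4)  # expected goals per attempt
actual_curve = rolling_mean(conceded, window=4)        # actual goals per attempt

for expected, actual in zip(expected_curve, actual_curve):
    verdict = "better than expected" if actual < expected else "no better than expected"
    print(f"expected {expected:.3f}, actual {actual:.3f}: {verdict}")
```

A longer window gives a smoother curve but reacts more slowly to genuine changes, so the choice of window is itself a judgement call.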
Thibaut Courtois’ shot stopping data well illustrates the narrative-driven stories that can easily be woven by looking at small samples of performance-related metrics.
What does the graph above show?
The blue plot represents the rolling expected goals per attempt allowed, based on the modelling of pre- and post-shot variables and applied to all non-penalty attempts faced by Courtois since the beginning of the 2014/15 Premier League season.
The red line represents Courtois’ actual concession rate per attempt.
When the red line falls below the blue one, Courtois has conceded fewer goals in that run than the model expects, given the quality of the on-target attempts he has faced.
So there have been periods within each of the last four seasons when Chelsea’s keeper has exceeded the shot-stopping expectation of our model. But there have also been times, notably mid-season in 2014/15 and at the beginning of 2016/17 and 2017/18, when his actual performance has dipped below par.
Is Courtois the best Premier League goalkeeper?
Overall, Courtois has been a fairly average shot stopper when measured against the data of Premier League keepers upon which the model is based.
Shot stopping is just one aspect of a goalkeeper’s skill set, but Courtois appears to be neither better nor worse than the historical average for a top-flight keeper.
The occasional waxing and waning of his performances are most likely down to simple randomness, although such variation is frequently portrayed, erroneously, as the keeper being “in” or “out” of form.
We need many seasons of data, often corrected for a player’s natural age-related improvement and decline, before we can be confident that any deviation from a modelled prediction is more likely due to signal than to statistical noise.
So who is the best Premier League goalkeeper?
Manchester United’s David de Gea is a rare example of a keeper who consistently exceeds expectations.
He has experienced a few under-performing runs but, overwhelmingly, he has conceded far fewer goals than our xG2 model suggests an average keeper would concede.
Repeated simulations of each shot faced by de Gea, using the modelled likelihood that a goal would be conceded, very rarely produce a record as good as the one de Gea has actually achieved over multiple seasons, so randomness alone cannot explain the Spaniard’s level of over-performance.
Finding examples of players with a significant skill advantage over their peers at the very top of a sport is often difficult, but de Gea’s performances over recent seasons present a solid case for considering him currently the best of the Premier League’s best at preventing the ball from crossing the goal line.
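The kind of simulation described here can be sketched as follows. The inputs are purely hypothetical (100 on-target attempts, each given a 30% chance of being scored, and 20 goals actually conceded) and are not de Gea’s real figures; the function names are likewise invented for the example.

```python
import random


def simulate_goals_conceded(goal_probs, runs=10_000, seed=0):
    """Replay every shot `runs` times; return the goals conceded in each replay."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for p in goal_probs) for _ in range(runs)]


def share_at_or_below(simulated_totals, actual_goals):
    """Fraction of replays in which the modelled average keeper concedes
    no more goals than the real keeper actually did."""
    return sum(t <= actual_goals for t in simulated_totals) / len(simulated_totals)


# Hypothetical inputs, for illustration only.
goal_probs = [0.3] * 100  # modelled goal chance for each on-target attempt faced
actual_goals = 20         # goals the keeper actually conceded from those attempts

totals = simulate_goals_conceded(goal_probs)
share = share_at_or_below(totals, actual_goals)

print(f"Average goals conceded in simulation: {sum(totals) / len(totals):.1f}")
print(f"Share of simulations matching or beating the keeper: {share:.3f}")
```

When that share is very small, chance alone becomes an unconvincing explanation for the keeper’s record, which is the argument being made for de Gea.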