Football Analytics Introduction

The application of sporting statistics as either an explanatory or a predictive aid has a rich and varied history.

Traditionally, US-based sports such as basketball, hockey and particularly baseball have been the pioneers in using a numerical approach to better understand the complex interactions that occur on the field of play.

Often these events are influenced to a large degree by randomness, and the challenge in all sports has been to separate the part played by luck from the part attributable to the skill of the player.

Skill will tend to persist into the future, subject to the age-related waxing and waning of a player’s physical abilities, whilst luck-based contributions will tend to regress towards less extreme values.

Bill James and Billy Beane in baseball are the highest profile “sabermetricians” of the current era, but football had its own, often unfairly maligned, visionary in the late Charles Reep.

Reep not only collected his own data, but also analysed the outcomes of differing patterns of play.

He is best known as an advocate of the long ball game that was used to great effect by Wimbledon in the 1980s, the Norwegian national side in the 1990s and, most recently, Leicester in winning the 2015/16 Premier League title.

The availability of data has been a stumbling block to the development of the current data analytics movement in football.

The natural desire of data companies to protect their product has often conflicted with the need to provide datasets to drive the analytics movement forwards.

In 2012, Manchester City and Opta joined forces to release individual player data for one Premier League season.

Prior to 2012, the burgeoning online analytics community had largely used simple counting statistics, such as the number of goals or the number of shots, both on and off target, recycling methods that were common in ice hockey, another invasion team sport.

The larger volume of goal attempts in a match or season, typically expressed as shot ratios or shot differentials, often improved predictive power compared with a model based on the less frequent goals alone.
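As a rough illustration of these shot-based measures, the sketch below computes a shot ratio (a side's share of all goal attempts in its matches) and a shot differential (attempts for minus attempts against). The function names and the season totals are hypothetical, chosen only to show the arithmetic.

```python
def shot_ratio(shots_for: int, shots_against: int) -> float:
    """Share of all goal attempts in a team's matches taken by that team."""
    total = shots_for + shots_against
    if total == 0:
        # No attempts by either side: treat as an even split (an assumption).
        return 0.5
    return shots_for / total


def shot_differential(shots_for: int, shots_against: int) -> int:
    """Goal attempts taken minus goal attempts conceded."""
    return shots_for - shots_against


# Hypothetical season totals for one team, not real data.
season_shots_for = 520
season_shots_against = 480

ratio = shot_ratio(season_shots_for, season_shots_against)
differential = shot_differential(season_shots_for, season_shots_against)
print(f"shot ratio: {ratio:.3f}, shot differential: {differential:+d}")
```

A ratio above 0.5 or a positive differential indicates a side that out-shoots its opponents over the sample.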

The availability of a much wider range of counting stats also broadened the scope for analytical investigation.

Assists, key passes, tackles and interceptions, expressed per 90 minutes, were used to establish a baseline for an average Premier League player against which others could be compared.
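The per 90 minutes adjustment is simple arithmetic, sketched below. The player names, counts and minutes are invented for illustration, and pooling all players' counts and minutes is just one plausible way of forming an "average player" baseline, assumed here rather than taken from any particular data provider.

```python
def per_90(count: float, minutes: float) -> float:
    """Scale a raw event count to a rate per 90 minutes played."""
    if minutes <= 0:
        raise ValueError("minutes played must be positive")
    return count * 90 / minutes


# Hypothetical (tackles, minutes played) for two players, not real data.
players = {
    "Player A": (60, 2700),
    "Player B": (30, 900),
}

rates = {name: per_90(count, minutes) for name, (count, minutes) in players.items()}

# Assumed baseline: pool every player's events and minutes into one rate.
total_count = sum(count for count, _ in players.values())
total_minutes = sum(minutes for _, minutes in players.values())
baseline = per_90(total_count, total_minutes)

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} per 90 ({rate - baseline:+.2f} vs baseline)")
```

Player B has fewer tackles in total but a higher rate once playing time is accounted for, which is the point of the adjustment.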

Very quickly more granular information was added to the established shot figures, such as whether the chance was a header or a shot and where on the field it was taken from.

By adding more information both pre- and post-shot and comparing the outcomes for similar attempts from previous seasons, analytics can attempt to evaluate the sustainability of a player’s hot scoring streak, by estimating how likely it is that his or her performance might be repeated by an average player.
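One simple way to make this comparison concrete is to group historical attempts by type and location, take the conversion rate in each group as the chance an average player scores, and sum those chances over a player's own attempts. The sketch below does exactly that with invented data; the categories ("shot", "header", "central_box", "outside_box") and all counts are assumptions for illustration, and real models use far more detail than two features.

```python
from collections import defaultdict

# Hypothetical historical attempts: (attempt_type, zone, scored). Not real data.
historical = (
    [("shot", "central_box", True)] * 1 + [("shot", "central_box", False)] * 3
    + [("header", "central_box", True)] * 1 + [("header", "central_box", False)] * 9
    + [("shot", "outside_box", True)] * 1 + [("shot", "outside_box", False)] * 19
)


def conversion_rates(attempts):
    """Historical goals per attempt for each (attempt_type, zone) group."""
    tried = defaultdict(int)
    scored_count = defaultdict(int)
    for attempt_type, zone, scored in attempts:
        key = (attempt_type, zone)
        tried[key] += 1
        scored_count[key] += int(scored)
    return {key: scored_count[key] / tried[key] for key in tried}


def expected_goals(player_attempts, rates):
    """Goals an average player would be expected to score from these attempts.

    Raises KeyError if an attempt falls in a group with no historical data.
    """
    return sum(rates[(attempt_type, zone)] for attempt_type, zone in player_attempts)


rates = conversion_rates(historical)

# Hypothetical player: ten attempts that produced three actual goals.
player_attempts = (
    [("shot", "central_box")] * 4
    + [("header", "central_box")] * 2
    + [("shot", "outside_box")] * 4
)
actual_goals = 3

xg = expected_goals(player_attempts, rates)
print(f"expected: {xg:.2f}, actual: {actual_goals}, difference: {actual_goals - xg:+.2f}")
```

Here the player has scored about 1.6 more goals than an average player would expect from the same chances, the sort of gap that may shrink in future if it owes more to luck than to finishing skill.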

Football analytics has irrevocably moved from being a niche internet topic to being a part of the decision-making process at most top-tier clubs.

On-field events that have a causal connection to statistical indicators that correlate with success or failure during a match are being identified and utilised in pre-match tactical planning.

A less skilled team may choose to play long balls into their opponents’ half rather than risk conceding possession in dangerous areas of the field by attempting intricate passing movements to which their skill set is not suited.

Stoke used such tactics to consolidate their Premier League status following promotion with a squad of largely Championship quality players.

Most publicly, Brentford have employed statistical techniques to improve their acquisition of talent and also to maximise their returns from on-field events, such as free kicks and other set-piece plays, allowing them to compete with more affluent rivals.

And the likes of Liverpool and Arsenal have been quietly using analytical techniques to drive match and transfer strategy over recent seasons.

As in other sports before it, football analytics, armed with more extensive data, is trying to strip the statistical noise from observed performance in order to estimate the talent-based signal.
