Expected Goals Introduction

One of the cornerstones of the ever-developing field of football analytics is the concept of expected goals.

Intuitively, any fan will recognise that not all shots or headers have the same likelihood of ending up in the net.

Put simply, a shot is more likely to be scored than a header from the same position on the field. Indeed, shot location and shot type are the two major contributing factors when projecting how likely it is that a goal is scored from any chance.

Minor contributing factors include how the chance originated: for instance, whether the opportunity followed a fast break (often a proxy for the amount of defensive pressure), whether the build-up was slower, or whether the chance was created from a set piece.

The so-called “expected goals” of a chance, sometimes abbreviated to just xg, is expressed as a probability. For example, penalty kicks are typically converted around 78% of the time.

So, prior to the kick, a penalty would have an expected goals figure of 0.78.

Therefore, xg figures lie between 0 and 1, where 0 means that a goal will never be scored from such a chance and 1 indicates total certainty.

In practice, most figures will range from very small, in the region of 0.01 for speculative long shots, up to around 0.6 when a shot is taken from relatively close range.

The xg figures are derived from historical precedent: a large dataset of granular shot data is used to model the relationship between the characteristics of a shot and its actual outcome, whether it was blocked, off target, saved or resulted in a goal.

These models are then validated on sample data that was not part of the original model building. For example, a group of attempts that are each predicted to be successful 30% of the time should, if the model is well calibrated, produce goals from roughly 30% of those attempts.
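The validation step described above can be sketched as a simple calibration table. This is only an illustrative sketch: the function name `calibration_table`, the equal-width binning and the ten-shot sample are all assumptions made for the example, not part of any particular xg model.

```python
from collections import defaultdict

def calibration_table(predicted, scored, n_bins=10):
    """Group shots into equal-width xg bins and compare predicted and actual goal rates.

    predicted: iterable of xg values in [0, 1]
    scored:    iterable of 0/1 outcomes (1 = goal), same length as predicted
    Returns a list of (bin_lower_edge, n_shots, mean_predicted, actual_rate).
    """
    bins = defaultdict(list)
    for p, y in zip(predicted, scored):
        # Clamp the index so that p == 1.0 falls in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    table = []
    for idx in sorted(bins):
        shots = bins[idx]
        n = len(shots)
        mean_pred = sum(p for p, _ in shots) / n
        actual = sum(y for _, y in shots) / n
        table.append((idx / n_bins, n, mean_pred, actual))
    return table

# Hypothetical held-out sample: ten shots rated 0.30, of which three were scored.
held_out_xg = [0.30] * 10
held_out_goals = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
rows = calibration_table(held_out_xg, held_out_goals)
for lower, n, mean_pred, actual in rows:
    print(f"bin from {lower:.1f}: {n} shots, predicted {mean_pred:.2f}, actual {actual:.2f}")
```

In a well-calibrated model the predicted and actual columns should sit close together in every bin, although real samples would need far more than ten shots per bin for the comparison to mean much.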

Football is a low-scoring sport, where most top-flight leagues average around 2.5 total goals per game. By looking at the process of chance creation, denoted by the expected goals of each individual chance, it is possible to identify more quickly those teams who are profiting from an unsustainable, lucky hot streak.

The process of chance creation, together with a quantified estimate of each individual chance created and faced by a side, is often a better indicator of future performance than the often luck-driven outcome of those chances.

At its simplest, the expected goals created by each team in a match can be summed and compared to the actual result to determine whether or not the match outcome was a fair reflection of the attacking and defensive process of each team.
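As a minimal sketch of that summation, assuming each shot is recorded as a (team, xg) pair, the per-team totals can be computed as follows. The helper name `match_xg` and the shot values are invented for illustration.

```python
def match_xg(shots):
    """Sum per-shot xg by team. `shots` is a list of (team, xg) pairs."""
    totals = {}
    for team, xg in shots:
        totals[team] = totals.get(team, 0.0) + xg
    return totals

# Hypothetical match: the home side had a penalty (0.78) and two lesser chances.
shots = [
    ("Home", 0.78), ("Home", 0.05), ("Home", 0.12),
    ("Away", 0.30), ("Away", 0.02),
]
totals = match_xg(shots)
for team, total in totals.items():
    print(f"{team}: {total:.2f} expected goals")
```

These totals can then be set against the actual score line to judge whether the result flattered either side.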

A more satisfying approach may involve simulating the likelihood of a goal resulting from each individual chance, once due regard is taken for rebounds, to produce a range and probability of different score lines.

The probabilities of the score lines that lead to a home win, an away win or a draw may then be summed to create a post-game probability for each of the three possible outcomes.
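The simulation approach can be sketched with a small Monte Carlo routine. It rests on a simplifying assumption that should be stated loudly: every chance is treated as an independent event, so rebounds and other linked chances are not handled here. The function name `simulate_match`, the number of simulations and the shot values are all illustrative.

```python
import random
from collections import Counter

def simulate_match(home_xg, away_xg, n_sims=20000, seed=0):
    """Simulate a match many times from per-chance xg values.

    Assumption: chances are independent (no special handling of rebounds).
    Returns (score_probs, outcome_probs), where score_probs maps
    (home_goals, away_goals) to a probability and outcome_probs has the
    keys "home", "draw" and "away".
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    scorelines = Counter()
    for _ in range(n_sims):
        # Each chance is scored with probability equal to its xg.
        home_goals = sum(rng.random() < p for p in home_xg)
        away_goals = sum(rng.random() < p for p in away_xg)
        scorelines[(home_goals, away_goals)] += 1

    score_probs = {score: count / n_sims for score, count in scorelines.items()}

    # Sum the score line probabilities into the three match outcomes.
    outcome_probs = {"home": 0.0, "draw": 0.0, "away": 0.0}
    for (h, a), p in score_probs.items():
        key = "home" if h > a else "away" if a > h else "draw"
        outcome_probs[key] += p
    return score_probs, outcome_probs

# Hypothetical chances for each side.
score_probs, outcome_probs = simulate_match([0.78, 0.05, 0.12], [0.30, 0.02])
print(outcome_probs)
```

Because each simulated match is a random draw, the estimated probabilities carry some sampling noise; increasing `n_sims` reduces it at the cost of run time.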

A team that is consistently winning matches while creating fewer expected goals than their opponents might be expected to receive more typical rewards for their efforts in the future, and subsequent results may take a downswing.

Similarly, a side that is creating chances but failing to score regularly might eventually begin to reap a fairer reward for their creativity.

The versatility of xg can be further extended to individual players, again through simulation, to gain a more nuanced evaluation of their scoring record. A goal drought, for instance, may be more correctly attributed to the natural random ebb and flow in the process of chance conversion than to a narrative-driven crisis of confidence.
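To put a number on how likely a drought is by chance alone, one option is to work out the full distribution of goals a player could have scored from their chances. The sketch below computes that distribution exactly rather than by simulation, which is a substitution on my part, and it assumes the chances are independent. The function name `goal_distribution` and the striker's 20 chances are hypothetical.

```python
def goal_distribution(xg_values):
    """Exact distribution of goals scored from independent chances.

    Returns a list `dist` where dist[k] is the probability of exactly k goals.
    """
    dist = [1.0]  # before any chance is taken, zero goals is certain
    for p in xg_values:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)   # this chance is missed
            new[k + 1] += prob * p     # this chance is scored
        dist = new
    return dist

# Hypothetical striker: 20 chances, each rated 0.10 xg (2.0 xg in total).
dist = goal_distribution([0.10] * 20)
p_drought = dist[0]
print(f"Probability of no goals from these chances: {p_drought:.3f}")
```

Under these assumptions the striker goes scoreless roughly 12% of the time despite accumulating 2.0 expected goals, which illustrates how a barren run can arise without any change in underlying ability.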
