Monte Carlo simulations: The best way to predict final standings
There are only a handful of league matches left in the top two English leagues and the prizes at the top of the table are close to being settled, whilst relegation is becoming a painful reality for sides towards the foot of the table.
This time of the season is rife with interactive tools that calculate the final league positions after anxious supporters have inputted their predicted scorelines for the remaining matches.
An optimistic away win on the final day of the season can be the difference between demotion and survival for another year in the top flight.
Such exercises are useful in bolstering false hope, but are of very limited value when attempting to form an informed opinion about the likelihood of a struggling side avoiding the drop.
Predictions, by which we mean which team will win a contest by a precise scoreline, are very different to probabilistic projections that involve a modelled estimate of all three possible match outcomes.
Certainty...is never attainable
Certainty, which is implied by a match prediction, is never attainable, even when the bottom side travels to the team currently in second place, as all-but-relegated WBA’s 1-0 victory over Manchester United amply demonstrated on Sunday afternoon.
The requirements of an interactive table calculator would lead the vast majority to opt for a comfortable home win for Mourinho’s side, whereas a probabilistic approach allows for the small, but non-zero, likelihood that the Baggies might pull off an unlikely victory as their Premier League swansong.
The most generous bookmakers’ odds had an implied probability of around 0.05 for a WBA win, compared to Infogol’s slightly more optimistic 0.08 assessment of their chances.
The probabilistic approach gives a much more realistic version of the weighted possible outcomes, but it is not well suited to the simple predictive process of a league calculator that requires the input of a single result.
Making actual predictions for every upcoming game produces a definitive final table, but that table has probably been populated largely by “most likely” results and cannot give any flavour of the effect of rare, but possible, surprises, such as we saw at Old Trafford.
Monte Carlo simulations bridge the gap between the uncertainty that is ever present in our assessment of future matches and the need to input a single match result, preferably with a scoreline to break ties where goal difference may place one team above another.
The building blocks of a Monte Carlo simulation are the probabilistic assessments of each upcoming game, in which each of the three possible outcomes is given a probability of occurring.
These probabilities can be based on the rate at which each side has created or allowed expected goals in a representative selection of previous matches.
In the Manchester United vs WBA game, the hosts had unsurprisingly performed better than West Brom in both expected goal creation and the rate at which they had allowed their opponents to threaten their own goal.
It is then possible to combine the rate at which United create chances with the rate at which WBA allow chances, with due regard for the venue, to calculate the average number of goals the hosts would be expected to score.
Likewise we can do the same from WBA’s perspective.
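As a rough illustration of that combination step, the sketch below scales a league-average scoring rate by one side's attacking strength and the other's defensive weakness, then applies a venue adjustment. This is one common approach and an assumption on our part, not Infogol's stated formula; the function name, the `venue_factor` parameter and every number are hypothetical.

```python
def expected_goals(attack_rate, opp_concede_rate, league_avg, venue_factor=1.0):
    """Blend one side's chance creation with the opponent's chances allowed.

    All rates are expected goals per match. venue_factor is a hypothetical
    multiplier (above 1 for the home side, below 1 for the visitors).
    """
    attack_strength = attack_rate / league_avg
    defence_weakness = opp_concede_rate / league_avg
    return league_avg * attack_strength * defence_weakness * venue_factor


# Hypothetical inputs, not the real figures for either club.
LEAGUE_AVG = 1.35
home_xg = expected_goals(1.8, 1.7, LEAGUE_AVG, venue_factor=1.1)
away_xg = expected_goals(1.0, 0.9, LEAGUE_AVG, venue_factor=0.9)
print(f"home: {home_xg:.2f}, away: {away_xg:.2f}")
```

The two resulting averages, one per side, are the only inputs the later steps need.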
The Poisson approach
Average expected goals figures for each team are well suited to making scoreline predictions by a Poisson approach.
At its simplest, a Poisson distribution gives the chance that a side will score a particular number of goals in a match, provided we know the average number of goals that side is expected to score.
We have just calculated these rates for both Manchester United and WBA, so we can use them to calculate how frequently United would score zero goals and how often WBA would score exactly one.
By further combining these two probabilities we can estimate how often such a match up would end as Manchester United 0, WBA 1.
It was roughly a true 33/1 shot, or a probability of about 3%.
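The scoreline calculation can be sketched in a few lines. The two averages below are hypothetical stand-ins rather than the actual Infogol rates, so the result will not exactly match the figure quoted above, and multiplying the two probabilities assumes the sides' goal counts are independent of each other.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals when the average is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def scoreline_probability(home_goals, away_goals, home_mean, away_mean):
    """Chance of one exact scoreline, assuming independent goal counts."""
    return poisson_pmf(home_goals, home_mean) * poisson_pmf(away_goals, away_mean)


# Hypothetical averages for the hosts and the visitors.
HOME_MEAN = 2.5
AWAY_MEAN = 0.6

p = scoreline_probability(0, 1, HOME_MEAN, AWAY_MEAN)
fractional_odds = (1 - p) / p  # "true" odds against, with no bookmaker margin
print(f"P(0-1) = {p:.3f}, roughly {fractional_odds:.0f}/1")
```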
Similarly, we can calculate and sum the probability of every scoreline that leads to a WBA win, and do the same for draws and home wins. This gives a percentage chance for each of the three possible outcomes, as shown on the Infogol app below, with the Betfair percentages for comparison.
We have moved from the simplistic, gut feeling, result-based requirements of a league table calculator to a probabilistically-based assessment of the match.
In short, we think that WBA will win 8 in every 100 of such games, Manchester United 77 out of every 100 and 15 will be drawn, while Betfair’s combined wisdom of the crowds had settled at 5/100 for WBA, 82/100 for United and 13/100 for the stalemate.
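Summing scorelines into the three outcomes might look like the sketch below. It assumes independent Poisson goal counts and truncates the grid at a hypothetical `max_goals`, rescaling so the three figures sum to one; the example averages are illustrative and are not tuned to reproduce the 77/15/8 split.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals when the average is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def outcome_probabilities(home_mean, away_mean, max_goals=10):
    """Return (home win, draw, away win) by summing a grid of scorelines."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    total = home_win + draw + away_win  # slightly under 1 due to truncation
    return home_win / total, draw / total, away_win / total


# Hypothetical averages for a strong home side and a weak visitor.
home, drawn, away = outcome_probabilities(2.5, 0.6)
print(f"home {home:.0%}, draw {drawn:.0%}, away {away:.0%}")
```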
We’re now ready to use a Monte Carlo simulation to recreate multiple possible realities for every remaining game based on our assessment of the current, respective expected goals-based qualities of each team.
Rather than forcing a single choice for all upcoming matches, Monte Carlo simulations use random numbers to select an outcome based on the probabilities we have calculated.
For example, if we randomly generate a number between 1 and 100 and it falls between 1 and 77, then the simulation chooses a Manchester United win.
If it falls between 78 and 92, the result is deemed a draw and a WBA win transpires if the random number is between 93 and 100.
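The random-number rule just described can be written directly. This is a minimal sketch using the 77/15/8 split from the text; the function name and the seed are arbitrary choices of ours.

```python
import random


def pick_outcome(home_pct, draw_pct, rng):
    """Map a random integer from 1 to 100 onto home win, draw or away win."""
    roll = rng.randint(1, 100)  # inclusive at both ends
    if roll <= home_pct:
        return "home"
    if roll <= home_pct + draw_pct:
        return "draw"
    return "away"


rng = random.Random(2018)  # fixed seed so the run is repeatable
counts = {"home": 0, "draw": 0, "away": 0}
for _ in range(10_000):
    counts[pick_outcome(77, 15, rng)] += 1
print(counts)
```

Over many repetitions the counts settle close to the input percentages, while any single draw can still land on the unlikely away win.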
It’s also possible to incorporate scorelines to allow for goal difference tie breakers.
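One way to incorporate scorelines, sketched here as an assumption rather than a description of any particular model, is to draw each side's goals from its own Poisson distribution and read the result off the score. The averages are hypothetical and the sampler is a simple inverse-transform routine.

```python
import math
import random


def sample_poisson(mean, rng, max_goals=20):
    """Draw a goal count by walking up the cumulative Poisson probabilities."""
    u = rng.random()
    k = 0
    p = math.exp(-mean)  # probability of zero goals
    cumulative = p
    # max_goals guards against floating-point rounding near u = 1
    while u > cumulative and k < max_goals:
        k += 1
        p *= mean / k  # turns P(k - 1) into P(k)
        cumulative += p
    return k


def sample_scoreline(home_mean, away_mean, rng):
    """Simulate one match as two independent goal counts."""
    return sample_poisson(home_mean, rng), sample_poisson(away_mean, rng)


rng = random.Random(7)
# Hypothetical averages for the home and away sides.
print([sample_scoreline(2.5, 0.6, rng) for _ in range(5)])
```

Sampling the score first gives the result, the points and the goal difference in a single step.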
The power of simulations
We now have a method that selects an individual prediction for a future match, but importantly uses our probabilistic assessment of the outcomes of that match.
The power of simulations is that these iterations can be repeated many times over, not just for a single game, but for every match yet to be played.
Points won, goals scored and goals conceded, in multiple iterations, can then be added to each side’s actual record to produce a distribution of possible final league placings for many alternative realities.
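Put together, the whole process might be sketched as follows. The four-team table, the fixtures and their goal averages are invented for illustration, and ties are broken by goal difference and then goals scored, which is an assumption about the competition rules.

```python
import math
import random
from collections import Counter


def sample_poisson(mean, rng, max_goals=20):
    """Draw a goal count by walking up the cumulative Poisson probabilities."""
    u, k, p = rng.random(), 0, math.exp(-mean)
    cumulative = p
    while u > cumulative and k < max_goals:
        k += 1
        p *= mean / k
        cumulative += p
    return k


# Hypothetical current records: team -> (points, goals for, goals against)
TABLE = {"A": (40, 45, 40), "B": (38, 40, 42), "C": (37, 38, 44), "D": (35, 36, 50)}
# Hypothetical remaining fixtures: (home, away, home average, away average)
FIXTURES = [("A", "B", 1.5, 1.1), ("C", "D", 1.4, 1.0),
            ("B", "C", 1.3, 1.2), ("D", "A", 1.1, 1.4)]


def simulate_season(table, fixtures, rng):
    """Play every remaining fixture once and return teams in finishing order."""
    rec = {team: list(r) for team, r in table.items()}
    for home, away, home_mean, away_mean in fixtures:
        hg, ag = sample_poisson(home_mean, rng), sample_poisson(away_mean, rng)
        rec[home][1] += hg
        rec[home][2] += ag
        rec[away][1] += ag
        rec[away][2] += hg
        if hg > ag:
            rec[home][0] += 3
        elif hg < ag:
            rec[away][0] += 3
        else:
            rec[home][0] += 1
            rec[away][0] += 1
    # Rank by points, then goal difference, then goals scored.
    return sorted(rec, key=lambda t: (rec[t][0], rec[t][1] - rec[t][2], rec[t][1]),
                  reverse=True)


def position_distribution(table, fixtures, n_sims=10_000, seed=0):
    """Share of simulations in which each team finishes in each position."""
    rng = random.Random(seed)
    counts = {team: Counter() for team in table}
    for _ in range(n_sims):
        for pos, team in enumerate(simulate_season(table, fixtures, rng), start=1):
            counts[team][pos] += 1
    return {team: {pos: n / n_sims for pos, n in sorted(c.items())}
            for team, c in counts.items()}


for team, dist in position_distribution(TABLE, FIXTURES).items():
    print(team, dist)
```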
Applied to other leagues, this process allows us to move away from the restrictive question of “will Derby County make the playoffs?” to a much more useful and realistic conclusion based on likelihood, one that encompasses the inherent uncertainty surrounding any projection.
Attempting to predict every result in a league calculator format can only tell you whether or not Derby finish inside the top six, but by running through thousands of Monte Carlo simulations we can place that likelihood at 66%, with 5th (41%) being the Rams’ most common final placing and a 1% chance they could finish as high as 4th or as low as 11th.
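Headline figures of this kind are simple summaries of the simulated finishing positions. The sketch below shows the idea on a tiny, made-up list of ten finishes; it is not the Derby County data, and the function name and the `cutoff` parameter are our own.

```python
from collections import Counter


def summarise_finishes(finishes, cutoff=6):
    """Condense one team's simulated finishing positions into headline figures."""
    counts = Counter(finishes)
    n = len(finishes)
    share = {pos: counts[pos] / n for pos in sorted(counts)}
    return {
        "inside_cutoff": sum(p for pos, p in share.items() if pos <= cutoff),
        "most_common": max(share, key=share.get),
        "best": min(share),
        "worst": max(share),
        "share": share,
    }


# Hypothetical finishing positions from ten simulated seasons.
finishes = [5, 5, 6, 7, 5, 4, 8, 6, 5, 7]
summary = summarise_finishes(finishes)
print(summary)
```

With thousands of simulations rather than ten, the same summary yields a playoff probability and a most common placing of the kind quoted for Derby.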