# APR Modelling: How to Ruin Football

*“Some people believe football is a matter of life and death, I am very disappointed with that attitude. I can assure you it is much, much more important than that.”*

Clearly, UEFA disagreed with this famous Shankly sentiment, and decided that cramming 90,000 fans and a killer virus into Wembley Stadium was to be avoided. While a virtuous subset of the populace may be patient enough to wait another year for Euro 2020, we actuaries fire up our favourite software and get modelling.

Now, obviously, any sane model will predict England to win next year. Any child will tell you that this is the obvious result. We’re not going to predict the winners – that’s boring, that’s been done.

What we’re going to look at is the format of the Euros. The tournament has grown from its humble beginnings in 1960, when only 4 teams competed, up to the latest edition, in which 24 countries will compete. The main problem with larger tournaments, aside from the dilution of quality (even Wales qualify now), is that more teams means more games to be crammed into a few weeks in the summer. But could it be better? In this article, we’re going to investigate the current structure, as well as some alternative formats.

And as a quick appetizer for the kinds of crazy results we see under different formats: low-ranked teams such as Kosovo have almost twice the chance of winning under a straight knockout format as under a league format.

## If it ain’t broke, don’t fix it

Let’s quickly run through the tournament structures we’ve modelled.

**Euro 2020 Format**

24 teams compete in the summer tournament. You’ll remember that, before times became unprecedented, 20 teams had successfully qualified for the tournament. The 16 remaining teams were fighting tooth and nail to fill one of the 4 remaining spots via the playoffs.

After the qualifiers have concluded, the tournament will begin with 6 groups of 4 teams. All teams in a group play each other once, being awarded 3 points for a win, 1 for a draw and 0 for a loss. The top two teams from each group, along with the 4 best-performing 3rd-placed teams – 16 teams in total – will then advance to the knockout stage.

**Unseeded Knockout**

All teams still involved in the competition are combined into a 36-team roster. The 8 lowest-ranked teams engage in a play-off round, leaving 32 teams to be entered into a knockout tournament. Fixtures are drawn randomly, so you could have the two top-ranked teams facing each other in the first round proper.

**Seeded Knockout**

The 32 teams will be decided in the same way as for the unseeded knockout. The fixtures for the knockout will be determined by a rudimentary seeding system. The 8 highest-ranked teams are randomised and placed on the draw sheet in such a way that they cannot play each other until the third round (the quarter finals). The remaining teams are then randomly inserted into the draw sheet.
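One way to picture the seeded draw is sketched below in Python. This is an illustration only, not the original Excel/VBA code: the function name `seeded_draw` and the "bracket order" convention (neighbouring entries meet in round one, neighbouring pairs in round two, and so on) are assumptions. Each seed is dropped into its own block of four, so no two seeds can meet before the third round.

```python
import random


def seeded_draw(ranked_teams, n_seeds=8, rng=random):
    """Return a bracket order in which the top n_seeds teams are kept apart.

    ranked_teams is assumed to be sorted best first, with a length that is
    a multiple of n_seeds (32 teams and 8 seeds in the article's setting).
    """
    seeds = list(ranked_teams[:n_seeds])
    others = list(ranked_teams[n_seeds:])
    rng.shuffle(seeds)    # randomise which section each seed lands in
    rng.shuffle(others)   # the remaining teams are placed at random

    per_section = len(ranked_teams) // n_seeds - 1  # non-seeds per section
    bracket = []
    for i, seed in enumerate(seeds):
        section = [seed] + others[i * per_section:(i + 1) * per_section]
        rng.shuffle(section)  # random slot for the seed within its section
        bracket.extend(section)
    return bracket


# Hypothetical team names T1 (best) to T32 (worst).
teams = [f"T{i}" for i in range(1, 33)]
draw = seeded_draw(teams, rng=random.Random(2020))
```

With 32 teams and 8 seeds, each section holds 4 teams, which is exactly the part of the bracket that produces one quarter-finalist.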

**Draw and Process**

This type of tournament is used in croquet and is similar to playing two knockout tournaments (the first called the draw and the other the process), with the winners of both playing each other to determine the overall winner. If the same team wins the draw and the process, they are the overall winner. The fixtures of the draw are determined randomly and, in our model, will be unseeded. As we have 36 teams there will be 28 byes (free wins) in the first round of the tournament.
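The bye count follows from padding the field up to the next power of two. A small Python sketch of that arithmetic (the helper name `byes_needed` is ours, not part of the model):

```python
def byes_needed(n_teams):
    """Number of byes required to fill a knockout bracket for n_teams."""
    bracket_size = 1
    while bracket_size < n_teams:
        bracket_size *= 2  # knockout brackets come in powers of two
    return bracket_size - n_teams


print(byes_needed(36))  # 36 teams need a 64-slot bracket: 64 - 36 = 28 byes
```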

The process then ensures that teams who played each other in the first round of the draw cannot play each other until the final, teams who played in the second round of the draw cannot play until the semi-finals, and so on.

An example of an 8 team draw and process is illustrated below where the teams in the draw have been numbered from 1 to 8 in descending order.

**League**

All 36 teams in the tournament play each other twice with 3 points awarded for a win, 1 for a draw and 0 for a loss. At the end of the league the team with the most points wins. If we have two winners, both get recorded as having come second – the same principle applies to all teams joint on points.
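The tie rule can be sketched in a few lines of Python. This is our reading of the rule rather than the original VBA: every team level on points is given the lowest position in its tied group, so two joint leaders both come second. The function name and the points totals are hypothetical.

```python
def league_positions(points):
    """Map each team to its final position, with ties taking the lowest place.

    A team's position is the number of teams with at least as many points
    (itself included), so two teams tied at the top are both recorded as 2nd.
    """
    return {
        team: sum(1 for other in points.values() if other >= pts)
        for team, pts in points.items()
    }


# Hypothetical final points totals.
table = league_positions({"A": 10, "B": 10, "C": 7, "D": 7, "E": 1})
# A and B are joint 2nd, C and D joint 4th, E is 5th.
```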

## The Model Build

Onto the fun bit! Key steps in the creation of our model:

- Deciding which software to use.
- Choosing the data that will be used to rank teams.
- Writing a function to determine the outcome of a given match.
- Writing the code to simulate different tournament structures. Each structure was simulated 10,000 times.

**Step 1 – Software**

A tricky decision and always one worth taking some time over. The main requirements on the software were:

- It had to be able to easily produce visualisations of the results.
- It had to be familiar to all model developers.

There are many technologies which satisfy the first point. R is a good example, and was nearly chosen, but in the end the possibility that some of the development would be split among other staff meant that the low-risk Excel/VBA approach was taken. This meant that anyone drafted in was guaranteed to be familiar with the software, and demonstrates that care should be taken before introducing new technologies – though they might be better for the job than what you are using, they also bring training and maintenance overheads along with them.

**Step 2 – Team Rankings**

To simulate a match between two teams, ratings of the teams’ skill levels were needed. One data source considered was bookmakers’ odds, but these suffer from numerous issues which you can look forward to reading about in 2-5 minutes’ time. In the end, we settled on FIFA world rankings points. Each team is assigned a skill rating by dividing their number of FIFA points by the total number of points of all teams. The probability of each team winning a match is then calculated proportionately using these skill ratings. While sufficient for our investigation, this section of the model is the one most in need of refinement – but we’ll talk about that in due course.
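As a rough Python sketch of this step (the model itself was built in Excel/VBA, so the function names and the points below are purely illustrative), we read "proportionately" as each team's share of the two teams' combined skill:

```python
def skill_ratings(points):
    """Convert ranking points into skill ratings that sum to 1."""
    total = sum(points.values())
    return {team: p / total for team, p in points.items()}


def win_probability(skill_a, skill_b):
    """Probability that A beats B when draws are not allowed."""
    return skill_a / (skill_a + skill_b)


# Hypothetical points, chosen only to keep the arithmetic easy to follow.
ratings = skill_ratings({"Team A": 1600, "Team B": 800, "Team C": 800})
p_a_beats_b = win_probability(ratings["Team A"], ratings["Team B"])
print(round(p_a_beats_b, 3))  # Team A has twice Team B's points
```

Because only the ratio of the two ratings matters, dividing by the total of all teams' points does not change any individual match probability.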

**Step 3 – Play Match**

A difficulty we encountered was simulating the probability of draws. Finding a winner of a match is fairly straightforward – it’s just a case of transforming the skill levels into probabilities of winning, and comparing this probability to a random number to determine the actual winner. This is fine for knockout matches, but leaves something to be desired for league formats.
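For a knockout match, the comparison against a random number might look like the following Python sketch. The function name and the optional `rng` argument are assumptions made for illustration.

```python
import random


def play_knockout_match(team_a, team_b, p_a_wins, rng=random):
    """Pick a winner by comparing A's win probability with a uniform draw."""
    # rng.random() is uniform on [0, 1), so A wins with probability p_a_wins.
    return team_a if rng.random() < p_a_wins else team_b


# Seeded generator so the example is repeatable; the 0.6 is hypothetical.
rng = random.Random(2020)
winner = play_knockout_match("Belgium", "Kosovo", 0.6, rng)
```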

In order to model draws, then, we use a parameter, D, which represents the maximum draw probability. This will be the probability of a draw for two equally skilled teams. Next, a function is generated that takes the difference in win probabilities of the two teams (x) as an input and outputs the probability of a draw (y). Our boundary conditions:

- If one team has a 100% chance of beating the other team (i.e. a difference in skill of 1) the probability of a draw is 0%
- If the skill difference is 0 the probability of a draw is D (from the definition of D)

These boundary conditions can be expressed mathematically as: for x = 0, y = D, and for x = 1, y = 0. The function satisfying these conditions that was settled on is below; however, this could be fine-tuned with sufficient data. Our justification of “that looks about right” is far from TAS compliant!

\(y=\frac{D}{1-e} - \frac{De}{1-e}\,e^{-|x|}\)

For D = 0.25 (i.e. a 25% chance of a draw for two equally skilled teams) this graph looks as follows.

The probability of each team winning is then rescaled so that the sum of these probabilities plus the probability of a draw equals 1.

For example, if team A has twice as many FIFA points as team B this gives win probabilities for team A and B of 66.7% and 33.3% respectively. So, team A is twice as likely to win as team B if draws are not allowed. For D = 0.25 and x = 0.667 – 0.333 the probability of a draw is then calculated as 14%. The probabilities of winning are then re-scaled to add up to 86% (i.e. the probability of there not being a draw). This sees teams A and B have 57% and 29% probabilities of winning, respectively.
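The draw function and the rescaling can be checked with a short Python sketch. The function names are ours and the original was written in VBA, so treat this as an illustration of the formula rather than the model's code.

```python
import math
import random


def draw_probability(x, d):
    """Draw probability for a win-probability difference x; equals d at x = 0."""
    e = math.e
    return d / (1 - e) - (d * e / (1 - e)) * math.exp(-abs(x))


def match_probabilities(p_a, d):
    """Return (P(A wins), P(draw), P(B wins)) from A's no-draw win probability."""
    p_b = 1 - p_a
    p_draw = draw_probability(p_a - p_b, d)
    scale = 1 - p_draw  # win probabilities are rescaled to sum to 1 - p_draw
    return p_a * scale, p_draw, p_b * scale


def play_league_match(p_a, d, rng=random):
    """Simulate one match and return the league points earned as (A, B)."""
    win_a, draw, _ = match_probabilities(p_a, d)
    u = rng.random()
    if u < win_a:
        return 3, 0
    if u < win_a + draw:
        return 1, 1
    return 0, 3


# Worked example from the text: A has twice B's points and D = 0.25.
win_a, draw, win_b = match_probabilities(2 / 3, 0.25)
print(f"A wins {win_a:.0%}, draw {draw:.0%}, B wins {win_b:.0%}")
```

Running this reproduces the 57%, 14% and 29% quoted above, and the two boundary conditions hold: `draw_probability(0, D)` returns D and `draw_probability(1, D)` returns 0, up to floating-point rounding.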

**Step 4 – Tournament Format Simulations**

Once functions were written to simulate individual matches, the rest of the code was a case of getting these matches played in the right order, and recording the results. Since many of the formats investigated were some combination of league and knockout, subroutines were written to simulate both fundamental formats. These could then be used as building blocks for other structures. We stored information about the teams and their results in 2-dimensional arrays, and so we found it useful to write utility tools for:

- Randomising the order of a 2-dimensional array
- Sorting 2-dimensional arrays by a given dimension
- Merging 2-dimensional arrays together
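In Python, where a 2-dimensional array can be held as a list of rows, the three utilities might be sketched as below. The names are hypothetical, and we have assumed that "sorting by a given dimension" means sorting the rows by one chosen column.

```python
import random


def randomise_rows(table, rng=random):
    """Return a copy of the table with its rows in random order."""
    shuffled = list(table)
    rng.shuffle(shuffled)
    return shuffled


def sort_rows(table, column, descending=True):
    """Return a copy of the table sorted by the values in one column."""
    return sorted(table, key=lambda row: row[column], reverse=descending)


def merge_tables(*tables):
    """Stack several tables with the same columns into one."""
    merged = []
    for table in tables:
        merged.extend(table)
    return merged


# Hypothetical rows of [team, points].
group_a = [["Team A", 7], ["Team B", 4]]
group_b = [["Team C", 9], ["Team D", 1]]
ranked = sort_rows(merge_tables(group_a, group_b), column=1)
```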

Examples of where the functions above were useful:

- The knockout and Draw-and-Process tournaments used the randomise function to create the random fixtures
- The subroutine which controlled the Euro 2020 format used the knockout subroutine to simulate the playoffs; the league subroutine to simulate the group stage, and then the knockout subroutine again for the rest of the tournament.

Taking a bit of time to consider the model build before plunging into it meant that, once the building blocks were established, it was far easier to build more complex structures from these building blocks. It made implementing these structures much quicker and less error-prone than if it was done from scratch each time – and when the time came to review the model, this was also much easier, and any errors were picked up more easily. There is still some specific code for each tournament format; for example, for the Euro 2020 simulations, the knockout fixtures are not random and depend on which groups the four 3rd-placed teams advancing to the knockout stage come from. The model implements these rules within the Euro 2020 subroutine.

Once built, each structure was run 10,000 times, with information about each team’s performance over those 10,000 simulations stored in the array.
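To give a flavour of the simulation loop, here is a minimal Python sketch of a knockout building block run many times over. It is not the original VBA: the names, the skill ratings and the choice to record only the winner are all assumptions made to keep the example short.

```python
import random
from collections import Counter


def play_knockout(bracket, skills, rng):
    """Play a knockout in bracket order and return the winning team.

    bracket is a list whose length is a power of two; neighbours meet first.
    """
    remaining = list(bracket)
    while len(remaining) > 1:
        winners = []
        for i in range(0, len(remaining), 2):
            a, b = remaining[i], remaining[i + 1]
            p_a = skills[a] / (skills[a] + skills[b])
            winners.append(a if rng.random() < p_a else b)
        remaining = winners
    return remaining[0]


def simulate_unseeded_knockout(skills, n_runs=10_000, seed=0):
    """Estimate each team's chance of winning an unseeded knockout."""
    rng = random.Random(seed)
    teams = list(skills)
    wins = Counter()
    for _ in range(n_runs):
        rng.shuffle(teams)  # a fresh random draw for every simulation
        wins[play_knockout(teams, skills, rng)] += 1
    return {team: wins[team] / n_runs for team in teams}


# Hypothetical skill ratings for a 4-team tournament.
estimates = simulate_unseeded_knockout({"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1})
```

A fuller version would record every team's finishing position in each run rather than just the winner, which is what the later sections on shock results and standard deviation rely on.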

## The Big Limitation – Quantifying Skill Levels

The skill levels are clearly a large source of over-simplification in our method. The method we have used to derive the skill levels of the teams does not discriminate between the top and bottom teams by nearly as much as we would like. For example, in a knockout match between the teams with the highest and lowest scores, Belgium vs Kosovo, the method used gives Kosovo a 40% chance of winning (Vedat Muriqi is good, but he’s not *that* good).

Alternative data sources we considered to quantify the skill levels were:

- The points earned by each team in the Euro qualifiers. This was rejected early on due to the relative scarcity of data, and the fact that some teams had smaller groups than others, meaning some subjective workaround would be needed
- Bookmakers’ odds of winning the tournament. We explored this in some depth, but they were eventually rejected as:
  - They are calculated assuming the existing Euro 2020 structure. We want our skill ratings to be independent of the tournament structure used.
  - They include loadings for the bookmakers’ profit
  - They are not entirely probability-based; they also take some account of how much money is wagered on each team. For example, UK bookmakers will likely receive a large amount of wagers on the Home Nations to win. They then adjust the odds they offer to ensure they are not overly exposed to a particular country winning. The Edinburgh APR office balked at the English dominance that the bookies predicted, and rejected these probabilities straight away
- Historical goals scored and conceded by each team. These could be used to derive some attack/defence parameters which are then used to simulate actual goals scored in a match, adding a layer of detail. We may revisit this if we ever refine the model, but it was deemed too spurious for our initial build

However, while the skill levels are clearly not ideal for realistic predictions, for the purposes of analysing the differences caused by tournament structure they suit reasonably well.

## And the winners are…

Belgium. That’s that then.

So, with the highest probability of winning any given match, Belgium come out on top in all 5 formats – sense check complete, at least the model isn’t a total disaster. Now to look at some interesting results.

## What do we expect?

Before we get into details, let’s think about what we expect to see. In a league format, where everyone plays everyone twice and the odd upset is lost in a sea of ‘correct’ results, we’d expect to see results much more in line with the form book than for the other formats. The knockout would probably be the most volatile, and the Euros, as a hybrid of the two, would see results somewhere in the middle. The draw and process, being vaguely similar to a knockout competition (albeit with two lives before you’re out), would possibly see fairly volatile results too.

**Win Probability**

Let’s start simple – how does the probability of teams winning the tournament change with the tournament structure? We’ve gone for the two extremes – mighty Belgium and less-mighty Kosovo, to illustrate how the format affects fortunes at the top and bottom of the food chain.

After a bit of digesting, this is largely as we might expect. The league format sees the largest gap between the two teams’ win probabilities. Knockout tournaments favour less skilled teams more, but seeding erodes this effect slightly. Hybrid structures, like Draw and Process and Euro 2020, sit somewhere in the middle of the two extremes.

**Shock Results**

Some thrill seekers like their tournaments complete with twists, turns, and shocks aplenty. Defining a shock will always be a bit subjective, but here are a few that we investigated:

- **Shock Winner** – A team other than England, France, Germany, Spain, Belgium, Croatia, Italy, Netherlands or Portugal wins.
- **Plucky Underdog** – A team other than England, France, Germany, Spain, Belgium, Croatia, Italy, Netherlands or Portugal reaches the final, or the top 2 in the league format.
- **Rogue Semi-Finalist** – A team in the bottom quarter of our irreproachable Skill Level rankings reaches the semi-final, or the top 4 in the league format.

It follows from our expectations above that the league format gives little chance to these shock results, while a straight knockout format makes them much more likely. But what does the simulation say?

Those of you paying attention may have noticed that the Euro 2020 format has a lower probability of these shock results than the league. However, before you begin to question our intuition or fabulous model, the reason for this will be explained in the next section on the standard deviation.

That aside, there are no real surprises. The only other potentially unexpected result is that seeding apparently makes only a minuscule difference to the chance of an upset. I don’t know why Wimbledon bother.

For those of you looking at the x-axis, you’ll observe that the shock result probability is extraordinarily high, with the straight KO tournaments recording shock winners nearly 70% of the time, and even the league hitting above 50%. This is an unfortunate circumstance of our Skill Level metric which is looking more flawed by the paragraph. In any case, it’s the shape of the graph that is important here rather than the figures. However, the fairly uniform Skill Levels probably do feed into the similarity between seeded and unseeded knockout tournaments, perhaps rendering our exciting result void.

Comparing the modelled results to real world results is difficult as the competition has evolved so much over time. Perhaps the biggest shock result came in 2004, when Greece, who were given odds as long as 150/1, went on to win what was then a 16-team tournament of a similar format to the present-day tournament. All winners since have been among the top teams in the tournament.

**Standard Deviation of Position**

We looked at shock results, but then thought we’d go a little further and look at the standard deviation of the position of each team. Before we proceed, it’s worth noting what ‘position’ means for a knockout competition. Say we are in a straight knockout with 32 teams. We have assumed that the teams getting knocked out in the first round come joint 32nd, the teams getting knocked out in the second round come joint 16th, and so on.
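The position convention and the resulting standard deviation can be sketched in Python as follows. The helper name is ours, the list of results is made up, and the use of the population standard deviation (rather than the sample version) is an assumption, since we don't specify which was used.

```python
import statistics


def knockout_position(n_teams, round_lost):
    """Position for a team eliminated in a given 1-based round.

    round_lost is None for the tournament winner, who finishes 1st.
    """
    if round_lost is None:
        return 1
    # Each completed round halves the field: 32nd, 16th, 8th, 4th, 2nd.
    return n_teams // 2 ** (round_lost - 1)


# Hypothetical outcomes for one team across five simulated tournaments.
rounds_lost = [1, 1, 2, 3, None]
positions = [knockout_position(32, r) for r in rounds_lost]
spread = statistics.pstdev(positions)  # population standard deviation
```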

Now that’s understood, we go to our glamorous assistants, Belgium and Kosovo, to demonstrate the differences – but we also include Finland, for reasons that will soon become clear. Because position is hard to define in a Draw and Process competition, we’ve cunningly left it out:

This may resemble expectations less than the other graphs. Belgium behaves as we might expect – their stranglehold on the competition is less volatile in the league, but much more so in the knockout formats. However, Kosovo have almost the exact same volatility of results in the knockout tournaments as in the Euros. This is quite unintuitive at first – since the Euros incorporates a mini-league, shouldn’t that act to decrease volatility of results?

Finland goes some way to explaining this. They exhibit similar behaviour to Belgium, but are of a similar rank to Kosovo. The difference is that, although they are of a low rank, they have somehow managed to fluke their way into the Euros and so don’t need to engage in the 16-team playoff round. This playoff round means Kosovo are pretty much in a knockout competition from the off, explaining their high positional volatility.

## What Next?

With a year until the tournament kicks off, we’ve got plenty of time to expand and refine the model. We’ve already discussed refining the skill levels used in determining the outcome of matches – other potential extensions we could investigate would be:

- The obvious thing to do would be to model more competition formats
- The functions used to play matches could be refined. Instead of simply simulating the result, we could simulate the number of goals scored for each team. This would allow us to apply the goal difference rule to teams tied on points, as UEFA do, but would require a lot more data on each team.
- The point above opens the door to penalty shootouts in the knockout round. We all know how much fun they are, and, when calibrated to past data, they may reduce some teams’ odds more than others.
- We could also add simulations for other statistics used to differentiate teams with equal points – for example, the number of red cards they have received. At the risk of delving too much into data analysis, we could also create and calibrate different parameters for home and away games, adding a further level of detail to the model

Obviously there are many other directions we could take this, and we are open to suggestions, so if you have any ideas that you think would be interesting to explore, please do get in touch.

##### Craig Lynch

June 2020