06.09.2021

The Fantasy Premier League API: Web Scraping with Python

Here at APR, one of our main aims is keeping up with developments in the actuarial world. From a technological perspective, one development that has become noticeable among our clients is that both R and Python are gaining a lot of traction. R has recently been introduced into the actuarial examination structure, and clients are seeing how beneficial it can be for processing larger data sets.

Meanwhile, Python has an attractive flexibility and data science background. At APR, we therefore prize knowledge of these languages, and actively look to add value to our clients with them where possible.

At the same time, APR’s long-standing tradition of having at least 3 employees care about the Fantasy Premier League has just entered a fresh new season. This article will use the opportunity to illustrate some of the easy benefits of using Python in the context of the “FPL”.

Python Packages

One of the great benefits of Python is the global community of developers who create “packages” for other users. A “package” is basically a wrapped-up box of neat functions and features that can be distributed.

The ‘requests’ package provides users with the ability to read data from online APIs very easily. API stands for Application Programming Interface; it is essentially a helpful intermediary that allows pieces of software to communicate with one another. For example, the Fantasy Premier League API allows the requests package to ‘ask for’ data, and the API will return it in a specific format.

The ‘pandas’ package allows the user to read that data structure into something like a two-dimensional table (with headings) called a DataFrame.

Finally, ‘pyplot’ (a module within the ‘matplotlib’ package) provides an easy graph plotter for the data that’s been analysed.

The Process

The general idea when interpreting information from an online source is to take it step by step:

  1. Use the requests package to ask for relevant information.
  2. Use the pandas package to play around with the information.
  3. Use the pyplot package to visualise the results in a way that is more understandable.

For the first step, the requests package needs a URL to direct its request to (i.e. a location to ask its questions). For the FPL API, this is the URL:

https://fantasy.premierleague.com/api

This URL is a base pathway – in order to ask it more specific questions, the user would need to append search terms to it. What these are will depend on what the user is interested in finding out. For example, it is possible to find out, at varying complexities:

Each of these bits of information will be held somewhere in the API data, and the user needs to have a good knowledge of where to find them. Some will be superficial information, available to everyone at all times. Some will be available to the public, but might need some data manipulation before any useful conclusions can be drawn. Others may be sensitive (e.g. private league information), and will require the user to provide log-in credentials.
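As a minimal sketch of the request step, the snippet below appends an endpoint name to the base URL and asks for the data. The endpoint name used in the usage note ("bootstrap-static") is an assumption about the public API rather than something stated in this article, and the helper names build_url and fetch_json are purely illustrative.

```python
import requests

BASE_URL = "https://fantasy.premierleague.com/api"


def build_url(endpoint: str) -> str:
    """Append an endpoint name to the base URL, normalising slashes."""
    return f"{BASE_URL}/{endpoint.strip('/')}/"


def fetch_json(endpoint: str, timeout: float = 10.0) -> dict:
    """Request an endpoint and return the decoded JSON payload.

    Hypothetical helper: only suitable for public endpoints that
    do not require log-in credentials.
    """
    response = requests.get(build_url(endpoint), timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()
```

Calling fetch_json("bootstrap-static") would then, under that assumption, return a dictionary whose keys can be inspected with list(data.keys()) to see what information is on offer.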

For the purposes of this article, the focus will be on value for money, as an attempt to figure out if there are any undervalued players to add to a team. In particular, we will look at:

Value: How is it Measured?

This article assumes readers will have cursory knowledge of the Fantasy Premier League, including budgets, player selection, and points. If you are interested in finding out more, please go to:

Fantasy Premier League, Official Fantasy Football Game of the Premier League

The question we are interested in answering is: how do we know which players are good value for money?

The API provides a data field called “value_season”, which is not visible to regular users on the website. It essentially equals the points a player obtained divided by their cost to purchase. This will be used as a basis for further analysis. Intuitively, “good value for money” means high-performing AND cheap, or some kind of optimal combination of the two. It makes sense, therefore, to use a value that incorporates both of these characteristics.
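The arithmetic behind this measure can be sketched as below. It assumes that the API stores cost in tenths of £1m in a field called now_cost and season points in total_points; those field names, the unit, and the rounding to one decimal place are assumptions, and the example figures are invented.

```python
def value_for_money(total_points: int, now_cost: int) -> float:
    """Points per £1m of cost.

    now_cost is assumed to be expressed in tenths of £1m,
    so 50 means £5.0m and 125 means £12.5m.
    """
    cost_in_millions = now_cost / 10
    return round(total_points / cost_in_millions, 1)


# Invented example: 150 points from a £5.0m player.
print(value_for_money(150, 50))   # 30.0
# Invented example: 230 points from a £12.5m player.
print(value_for_money(230, 125))  # 18.4
```

The cheaper player scores fewer points in total but returns more points per £1m spent, which is exactly the trade-off the measure is meant to capture.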

The API database contains a lot of information about individual players: for example, the team they play for, the position they play, their value, their cost, and their popularity among fantasy managers. It’s not all ready-made for someone to look at, so the code needs to combine a few things together to get all this information in one place. In Python, that’s almost trivial, achievable in just a few lines.
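One way this combining step might look in pandas is sketched below. It is not the code behind this article: the payload is a tiny invented example, and the key and field names (elements, teams, element_types, web_name, now_cost, singular_name and so on) are assumptions about how the API response is laid out.

```python
import pandas as pd

# Tiny invented payload, shaped like the assumed API response.
payload = {
    "elements": [
        {"web_name": "Player A", "team": 1, "element_type": 1,
         "now_cost": 50, "total_points": 140,
         "value_season": "28.0", "selected_by_percent": "12.5"},
        {"web_name": "Player B", "team": 2, "element_type": 3,
         "now_cost": 125, "total_points": 230,
         "value_season": "18.4", "selected_by_percent": "50.1"},
    ],
    "teams": [{"id": 1, "name": "Team X"}, {"id": 2, "name": "Team Y"}],
    "element_types": [
        {"id": 1, "singular_name": "Goalkeeper"},
        {"id": 3, "singular_name": "Midfielder"},
    ],
}


def build_player_table(data: dict) -> pd.DataFrame:
    """Join players to their team and position names in one DataFrame."""
    players = pd.DataFrame(data["elements"])
    teams = pd.DataFrame(data["teams"])[["id", "name"]].rename(
        columns={"id": "team", "name": "team_name"}
    )
    positions = pd.DataFrame(data["element_types"])[["id", "singular_name"]].rename(
        columns={"id": "element_type", "singular_name": "position"}
    )
    table = players.merge(teams, on="team", how="left").merge(
        positions, on="element_type", how="left"
    )
    # Some numeric fields are assumed to arrive as text, so convert them.
    for column in ["value_season", "selected_by_percent"]:
        table[column] = pd.to_numeric(table[column])
    table["cost_m"] = table["now_cost"] / 10  # assumed unit: tenths of £1m
    return table[["web_name", "team_name", "position", "cost_m",
                  "total_points", "value_season", "selected_by_percent"]]


table = build_player_table(payload)
print(table)
```

Left joins are used so that a player is never dropped if a team or position lookup is missing; the gap would show up as a blank name instead.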

Insights from the 2020/21 season

Teams – Value for Money

One of the rules governing the FPL game is that managers cannot choose more than 3 players from a single team. This means finding teams with “good value” players is crucial. It is possible to aggregate all that information from the API, grouping it together by the teams to get an idea of the teams which represent good value for money and the teams that don’t. Here, we see the final table:
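A grouping of this kind can be sketched in a couple of lines of pandas. The data below is invented, the column names are illustrative, and taking the mean of value_season per team is an assumption; a different aggregate (such as the median) would slot into the same pattern.

```python
import pandas as pd

# Invented player-level data with illustrative column names.
players = pd.DataFrame({
    "team_name": ["Team X", "Team X", "Team Y", "Team Y"],
    "position": ["Goalkeeper", "Midfielder", "Goalkeeper", "Midfielder"],
    "value_season": [28.0, 20.0, 15.0, 17.0],
})


def average_value(table: pd.DataFrame, by: str) -> pd.Series:
    """Mean value_season per group, best value first."""
    return table.groupby(by)["value_season"].mean().sort_values(ascending=False)


team_value = average_value(players, "team_name")
print(team_value)
```

Because the grouping column is a parameter, the same helper works for any other label in the table, for example average_value(players, "position").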

The higher the value_season number, the better the players on a team performed relative to their cost. The top 5 contains title-winning Manchester City and 4 teams that were picked to do poorly and overperformed. The worst-value team in the division (excluding relegated sides) was Liverpool, who, as the previous season’s champions, had incredibly expensive players but underperformed over the year.

There’s a common perception among FPL managers that midfielders and forwards score the big points, and so provide the most value for money. In theory, this feels reasonable; the average forward on FPL costs £7m, midfielders £6m, whereas defenders and goalkeepers cost under £5m on average.

However, it is possible to get a view of how true this is by employing the same method as with teams above. This time, it is applied to playing position:

It transpires that the mark-up in cost for midfielders and forwards leads to them being comparatively low-value positions. By contrast, the goalkeeper position, if well chosen, can lead to a very good return for money, as is the case with defenders. Once again, the author feels mocked at this point. But a good takeaway from this? Don’t make goalkeepers the afterthought. It can be tempting to blow the entire budget on top quality midfielders, and then purchase the lowest cost goalkeeper to make the books balance. However, selecting the right goalkeeper can really bring good value.

This article has covered broader thinking so far:

But knowing that a defender is worth investment does not make it easy to pick the right one. Understanding that Leeds United are basically a gold mine of value talent doesn’t mean a fantasy manager will win by picking their second-choice goalkeeper.

To get down to the question of who specifically should be picked, more specific data will need to be looked at.

For example, let’s try answering two questions:

  1. Which goalkeepers are the best value for money?
  2. Are there any undervalued midfielders?

For question 1, Python can once again help in just a few lines of code. Using the pyplot library, it is possible to very quickly plot a scatter graph of goalkeepers. For the purposes of this analysis, only goalkeepers who scored above 100 points last season are included; removing the low scorers seems reasonable and keeps the number of points on the graph manageable.
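The filter and scatter graph described above might be sketched as follows. The goalkeeper data is invented and the column names are illustrative. The dashed least-squares line is an optional addition, included only as a rough visual benchmark of points against cost, and the non-interactive backend is chosen simply so that the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove to show plots interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented goalkeeper data with illustrative column names.
goalkeepers = pd.DataFrame({
    "web_name": ["GK A", "GK B", "GK C", "GK D"],
    "cost_m": [4.5, 5.0, 5.5, 6.0],          # cost this season, in £m
    "total_points": [95, 128, 140, 149],     # points last season
})

# Keep only goalkeepers who scored above 100 points.
shortlist = goalkeepers[goalkeepers["total_points"] > 100]

fig, ax = plt.subplots()
ax.scatter(shortlist["cost_m"], shortlist["total_points"])

# Label each point with the player's name.
for row in shortlist.itertuples():
    ax.annotate(row.web_name, (row.cost_m, row.total_points),
                textcoords="offset points", xytext=(5, 5))

# Optional: straight line fitted by least squares through the shortlist.
x = shortlist["cost_m"].to_numpy()
y = shortlist["total_points"].to_numpy()
slope, intercept = np.polyfit(x, y, deg=1)
xs = np.linspace(x.min(), x.max(), 50)
ax.plot(xs, slope * xs + intercept, linestyle="--")

ax.set_xlabel("Cost this season (£m)")
ax.set_ylabel("Points last season")
ax.set_title("Goalkeepers: points vs cost")
```

The figure can then be written to disk with fig.savefig("goalkeepers.png"), or displayed with plt.show() if an interactive backend is in use.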

The plot:

The plot above shows the goalkeepers in the Premier League, plotted by their cost for this season versus the number of points they gained last season. Remembering that value is defined as points/cost, in theory it is possible to draw an imaginary line of best fit. Above that line, all players are good value for money, that is to say, they get more points than expected for their cost. Below the line, players are less good value for money.

This plot has highlighted three specific players at three different price points:

To answer question 2, the term “undervalued” needs to be qualified. Usually, it would mean that the player is worth more than they cost; that is to say, similar to the positions analysis above, all three of those identified goalkeepers would be undervalued. However, in FPL mechanics, getting 10 points is only a definitive “good” result if a sufficient portion of competitors did worse. That is to say, if, for example, a player scores 5 goals in a game, picking them loses some value if everyone else picked them too.

As a result, for undervalued midfielders, the plot will not be points vs cost. Instead, it will plot the popularity of the player (the % of managers who selected them) against the value. The rationale here is to identify the player who is very good value for money but whom few people have identified as worth picking.

As these are midfielders, the plot was restricted to midfielders who cost above £7.0m (for this season) and only the best 20 by value are plotted to keep the graph legible.
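The restriction described above can be sketched in pandas as a filter followed by a top-N selection. The midfielder data is invented and the column names (cost_m, value_season, selected_by_percent) are illustrative assumptions.

```python
import pandas as pd

# Invented midfielder data with illustrative column names.
midfielders = pd.DataFrame({
    "web_name": ["Mid A", "Mid B", "Mid C", "Mid D"],
    "cost_m": [12.5, 7.5, 6.5, 8.0],
    "value_season": [18.5, 21.0, 25.0, 16.0],
    "selected_by_percent": [50.1, 8.2, 30.0, 3.4],
})


def shortlist_by_value(table: pd.DataFrame, min_cost: float = 7.0,
                       top_n: int = 20) -> pd.DataFrame:
    """Players costing above min_cost, limited to the top_n by value."""
    premium = table[table["cost_m"] > min_cost]
    return premium.nlargest(top_n, "value_season")


candidates = shortlist_by_value(midfielders)
print(candidates[["web_name", "value_season", "selected_by_percent"]])
```

"Mid C" has the highest value in this invented data but is dropped because it costs less than £7.0m. The remaining rows hold the two columns needed for a popularity-against-value scatter graph.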

The plot:

The plot above can give some insight into previous selection patterns. For example, Mohamed Salah is very good value for money, but is selected by 1 in 2 fantasy managers; therefore, selecting him is a good way to ensure other managers do not get a lead over you, but not a good way to create a lead over other managers. By contrast, Heung-Min Son is very good value for money, and is only selected by 1 in 5 managers, so might be a worthwhile risk to take on the rest of the field.

In general, the bottom right quadrant provides a list of players who are viable alternatives to the popular midfielders everyone already picks, and may even be better value for money if looking to save for a big purchase elsewhere. Other players to pick out include:

Conclusions

While the analysis performed above is not especially deep and will be subject to limitations, it illustrates how valuable insights can rapidly be gleaned from taking a closer look at the data. The key takeaways:

The code used to generate these results is available to any curious readers; just contact john.nicholls@aprllp.com for further information.

John Nicholls

September 2021