The Fantasy Premier League API: Web Scraping with Python
Here at APR, one of our main aims is keeping up with developments in the actuarial world. From a technological perspective, one noticeable development among our clients is that both R and Python are gaining a lot of traction. R has recently been inducted into the actuarial examination structure, and clients are seeing how beneficial it can be for processing larger data sets.
Meanwhile, Python offers attractive flexibility and strong roots in data science. At APR, we therefore prize knowledge of these languages, and actively look to add value to our clients with them where possible.
At the same time, APR’s long-standing tradition of having at least 3 employees care about the Fantasy Premier League has just entered a fresh new season. This article will use the opportunity to illustrate some of the easy benefits of using Python in the context of the “FPL”.
One of the great benefits of Python is the global community of developers who create “packages” for other users. A “package” is basically a wrapped-up box of neat functions and features that can be distributed.
The ‘requests’ package provides users with the ability to read data from online APIs very easily. API stands for Application Programming Interface; it is essentially a helpful intermediary that allows pieces of software to communicate with one another. For example, the Fantasy Premier League API allows the requests package to ‘ask for’ data, and the API will return it in a specific format.
The ‘pandas’ package allows the user to read that data structure into something like a two-dimensional table (with headings) called a DataFrame.
Finally, the ‘pyplot’ module (part of the ‘matplotlib’ package) provides an easy way to plot graphs from the data that’s been analysed.
The general idea when interpreting information from an online source is to take it step by step:
- Use the requests package to ask for relevant information.
- Use the pandas package to play around with the information.
- Use the pyplot module to visualise the results in a way that is more understandable.
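As a rough sketch of how steps two and three fit together, the snippet below loads a handful of records into a DataFrame and draws a bar chart from them. The records are invented stand-ins for what step one would return, and the field names `name` and `points` are assumptions made purely for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove for on-screen plots
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for step 1: in practice these records would come back from the API.
# The field names and numbers are invented for illustration only.
payload = [
    {"name": "Team A", "points": 61},
    {"name": "Team B", "points": 45},
    {"name": "Team C", "points": 52},
]

# Step 2: load the records into a DataFrame and rearrange them.
df = pd.DataFrame(payload).sort_values("points", ascending=False)

# Step 3: visualise the result.
fig, ax = plt.subplots()
ax.bar(df["name"], df["points"])
ax.set_xlabel("Team")
ax.set_ylabel("Points")
```

Each dictionary becomes a row and each key becomes a column heading, which is why a list of records maps so naturally onto a DataFrame.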
For the first step, the requests package needs a URL to direct to (i.e. a location to ask its questions to). For the FPL API, this is the URL:
This URL is a base pathway – in order to ask it more specific questions, the user would need to append search terms to it. What these are will depend on what the user is interested in finding out. For example, it is possible to find out, at varying complexities:
- The names of all teams participating in the Premier League this year.
- The names of all players representing those teams.
- The number of goals scored last season by player A.
- The number of goals scored so far this season by player B.
- The team names of all people who’ve entered into a public league (e.g. England) this year.
- The team composition (i.e. which players were picked) of the best performing fantasy manager in the world for last season.
Each of these bits of information will be held somewhere in the API data, and the user needs to have a good knowledge of where to find them. Some will be superficial information, available to everyone at all times. Some will be available to the public, but might need some data manipulation before any useful conclusions can be drawn. Others may be sensitive (e.g. private league information), and will require the user to provide log-in credentials.
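A minimal sketch of the first step, for publicly available data, might look like the following. The base URL shown is a placeholder rather than the real address, the helper names `build_url` and `fetch_json` are our own, and the `get` parameter exists only so that the function can be tried out without a network connection.

```python
import requests

# Placeholder only: substitute the real base URL of the API being queried.
BASE_URL = "https://example.invalid/api/"


def build_url(base_url, search_term):
    """Append a search term to the base pathway."""
    return base_url.rstrip("/") + "/" + search_term.lstrip("/")


def fetch_json(url, timeout=10, get=requests.get):
    """Ask the API for data and return the decoded JSON payload.

    `get` defaults to requests.get; it is a parameter only so that the
    function can be exercised with a stand-in during testing.
    """
    response = get(url, timeout=timeout)
    response.raise_for_status()  # raise an error on a 4xx/5xx reply
    return response.json()
```

A call such as `fetch_json(build_url(BASE_URL, "some-search-term"))` would then return the payload as ordinary Python dictionaries and lists; the search term here is hypothetical and would need replacing with one the API actually recognises.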
For the purposes of this article, the focus will be on value for money, as an attempt to figure out if there are any undervalued players to add to a team. In particular, we will look at:
- The teams that represent good value for money.
- The positions on the pitch that represent good value for money.
- The players that represent good value for money.
Value: How is it Measured?
This article assumes readers will have cursory knowledge of the Fantasy Premier League, including budgets, player selection, and points. If you are interested in finding out more, please go to:
Fantasy Premier League, Official Fantasy Football Game of the Premier League
The question we are interested in answering is: how do we know which players are good value for money?
The API provides a data field called “value_season”, which is not visible to regular users on the website. It essentially equals the points a player obtained divided by their cost to purchase. This will be used as a basis for further analysis. Intuitively, “good value for money” means high-performing AND cheap, or some kind of optimal combination of the two. It makes sense, therefore, to use a value that incorporates both of these characteristics.
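To make the definition concrete, here is a small sketch that recomputes a points-per-cost value by hand. The players and numbers are invented, and the column names `total_points` and `cost_m` (cost in £m) are assumptions for this example rather than confirmed API fields.

```python
import pandas as pd

# Invented sample records; column names are assumptions for this sketch.
players = pd.DataFrame(
    {
        "player": ["Keeper A", "Midfielder B", "Forward C"],
        "total_points": [150, 200, 180],
        "cost_m": [5.0, 10.0, 12.0],
    }
)

# Value = points obtained per £1m of cost.
players["value"] = players["total_points"] / players["cost_m"]

# Highest value first.
ranked = players.sort_values("value", ascending=False)
print(ranked[["player", "total_points", "cost_m", "value"]])
```

In this made-up example the cheapest player ranks first despite scoring the fewest points, which is exactly the trade-off between performance and price that the measure is meant to capture.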
The API database contains a lot of information about individual players: for example, the team they play for, the position they play, their value, their cost, and their popularity among fantasy managers. It’s not all ready-made for someone to look at, however; the code needs to combine a few things together to get all this information in one place. In Python, that’s almost trivial, achievable in just a few lines.
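One way this combining step could look is sketched below using pandas merges. We assume, for illustration, that players are stored with numeric team and position identifiers that have to be looked up in separate tables; the table layouts, column names and values are all invented.

```python
import pandas as pd

# Invented stand-ins for three separate tables of API data.
players = pd.DataFrame(
    {
        "name": ["Player A", "Player B", "Player C"],
        "team_id": [1, 2, 1],
        "position_id": [1, 3, 4],
        "value_season": [30.0, 20.0, 15.0],
    }
)
teams = pd.DataFrame({"team_id": [1, 2], "team_name": ["Team X", "Team Y"]})
positions = pd.DataFrame(
    {
        "position_id": [1, 2, 3, 4],
        "position": ["Goalkeeper", "Defender", "Midfielder", "Forward"],
    }
)

# Look up each player's team and position, then drop the numeric keys.
combined = (
    players.merge(teams, on="team_id", how="left")
    .merge(positions, on="position_id", how="left")
    .drop(columns=["team_id", "position_id"])
)
print(combined)
```

A left merge keeps every player even if a lookup fails, which makes missing team or position names easy to spot afterwards.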
Insights from the 2020/21 season
Teams – Value for Money
One of the rules governing the FPL game is that managers cannot choose more than 3 players from a single team. This means finding teams with “good value” players is crucial. It is possible to aggregate all that information from the API, grouping it together by the teams to get an idea of the teams which represent good value for money and the teams that don’t. Here, we see the final table:
The higher the value_season number, the better the players on a team performed relative to their cost. The top 5 contains title-winning Manchester City and 4 teams that were tipped to do poorly but overperformed. The worst team in the division (excluding relegated sides) was Liverpool, who, as defending champions, had incredibly expensive players but underperformed over the year.
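The grouping described above can be sketched with a pandas `groupby`. The data below is invented, and taking the mean of value_season per team is our assumption; a different aggregate (such as the median) could equally be used.

```python
import pandas as pd

# Invented sample of combined player data.
combined = pd.DataFrame(
    {
        "name": ["Player A", "Player B", "Player C", "Player D", "Player E"],
        "team_name": ["Team X", "Team X", "Team Y", "Team Y", "Team Z"],
        "value_season": [30.0, 20.0, 10.0, 14.0, 18.0],
    }
)

# Average value per team, best value first.
team_value = (
    combined.groupby("team_name")["value_season"]
    .mean()
    .sort_values(ascending=False)
)
print(team_value)
```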
There’s a common perception among FPL managers that midfielders and forwards score the big points, and so provide the most value for money. In theory, this feels reasonable; the average forward on FPL costs £7m and the average midfielder £6m, whereas defenders and goalkeepers cost under £5m on average.
However, it is possible to get a view of how true this is by employing the same method as with teams above. This time, it is applied to playing position:
It transpires that the mark-up in cost for midfielders and forwards leads to them being comparatively low-value positions. By contrast, the goalkeeper position, if well chosen, can lead to a very good return for money, as is the case with defenders. Once again, the author feels mocked at this point. But a good takeaway from this? Don’t make goalkeepers the afterthought. It can be tempting to blow the entire budget on top quality midfielders, and then purchase the lowest cost goalkeeper to make the books balance. However, selecting the right goalkeeper can really bring good value.
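A sketch of the same method applied to positions is shown below, this time reporting average cost alongside average value so that the two can be compared side by side. The figures are invented purely to exercise the code, and `cost_m` (cost in £m) is an assumed column name.

```python
import pandas as pd

# Invented sample data; not real player figures.
players = pd.DataFrame(
    {
        "position": [
            "Goalkeeper", "Goalkeeper", "Defender", "Defender",
            "Midfielder", "Midfielder", "Forward",
        ],
        "cost_m": [5.0, 4.5, 5.0, 4.0, 6.0, 8.0, 7.0],
        "value_season": [30.0, 26.0, 24.0, 20.0, 18.0, 16.0, 15.0],
    }
)

# Named aggregation: one output column per summary statistic.
by_position = (
    players.groupby("position")
    .agg(
        mean_cost_m=("cost_m", "mean"),
        mean_value=("value_season", "mean"),
        n_players=("value_season", "size"),
    )
    .sort_values("mean_value", ascending=False)
)
print(by_position)
```

Including the player count is a small safeguard: an average based on only a handful of players deserves less weight than one based on dozens.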
This article has covered broader thinking so far:
- What team is worth picking from?
- What position is worth investing in?
But knowing that a defender is worth investment does not make it easy to pick the right one. Understanding that Leeds United are basically a gold mine of value talent doesn’t mean a fantasy manager will win by picking their second-choice goalkeeper.
To get down to the question of who specifically should be picked, more specific data will need to be looked at.
For example, let’s try answering two questions:
- Which goalkeepers are the best value for money?
- Are there any undervalued midfielders?
For question 1, Python can once again be helpful in just a few lines of code. Using the pyplot library, it is possible to very quickly plot a scatter graph of goalkeepers. For the purposes of this analysis, only goalkeepers who scored above 100 points last season are included; this keeps the graph uncluttered, and removing the low scorers seems reasonable.
The plot above shows the goalkeepers in the Premier League, plotted by their cost for this season versus the number of points they gained last season. Bearing in mind that value is defined as points/cost, it is possible in theory to draw an imaginary line of best fit. Above that line, all players are good value for money, that is to say, they get more points than expected for their cost. Below the line, players are poorer value for money.
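A sketch of how such a plot might be produced follows. The goalkeepers and figures are invented, the column names are assumptions, and the "imaginary" line is made explicit here with an ordinary least-squares fit from `numpy.polyfit`, which is one possible choice rather than necessarily the one behind the original graph.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove for on-screen plots
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented sample data; not real player figures.
keepers = pd.DataFrame(
    {
        "player": ["GK A", "GK B", "GK C", "GK D", "GK E"],
        "cost_m": [4.5, 5.0, 5.5, 6.0, 4.5],
        "points_last_season": [120, 150, 140, 160, 80],
    }
)

# Keep only goalkeepers who scored above 100 points last season.
shortlist = keepers[keepers["points_last_season"] > 100].copy()

# Straight line of best fit: expected points for a given cost.
slope, intercept = np.polyfit(
    shortlist["cost_m"], shortlist["points_last_season"], deg=1
)
shortlist["expected"] = slope * shortlist["cost_m"] + intercept
shortlist["above_line"] = shortlist["points_last_season"] > shortlist["expected"]

fig, ax = plt.subplots()
ax.scatter(shortlist["cost_m"], shortlist["points_last_season"])
xs = np.linspace(shortlist["cost_m"].min(), shortlist["cost_m"].max(), 50)
ax.plot(xs, slope * xs + intercept, linestyle="--")
for row in shortlist.itertuples():
    ax.annotate(row.player, (row.cost_m, row.points_last_season))
ax.set_xlabel("Cost this season (£m)")
ax.set_ylabel("Points last season")
```

Players flagged in the `above_line` column scored more than the fitted line predicts for their price, which is the sense of "good value" used in the text.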
This plot has highlighted three specific players at three different price points:
- Emiliano Martinez: got more points than any other goalkeeper last season, and is £0.5m cheaper than the most expensive keepers.
- Illan Meslier: the Leeds goalkeeper scored the best among all keepers at his price point, and 3rd best overall. At £5.0m, he isn’t the cheapest around.
- Vicente Guaita: the Crystal Palace keeper can be seen as the “budget” option. He wasn’t in the top 10 scoring goalkeepers, but in terms of value for money, he is 3rd in the entire cohort. Consider choosing him if money is tight.
To answer question 2, the term “undervalued” needs to be qualified. Usually, it would mean that the player is worth more than they cost; that is to say, similar to the positions analysis above, all three of those identified goalkeepers would be undervalued. However, in FPL mechanics, getting 10 points is only a definitive “good” result if a sufficient portion of competitors did worse. That is to say, if, for example, a player scores 5 goals in a game, picking them loses some value if everyone else picked them too.
As a result, for undervalued midfielders, the plot will not be points vs cost. Instead, it will plot the popularity of the player (by % of managers who selected them) against the value. The aim is to identify players who are very good value for money but whom few managers have identified as worth picking.
As these are midfielders, the plot was restricted to midfielders who cost above £7.0m (for this season) and only the best 20 by value are plotted to keep the graph legible.
The plot above can give some insight into previous selection patterns. For example, Mohamed Salah is very good value for money, but is selected by 1 in 2 fantasy managers; therefore, selecting him is a good way to ensure other managers do not get a lead over you, but not a good way to create a lead over other managers. By contrast, Heung-Min Son is very good value for money, and is only selected by 1 in 5 managers, so might be a worthwhile risk to take on the rest of the field.
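The selection behind such a plot could be sketched as follows. The data is invented, the column names are assumptions, and using the medians of the shortlist to separate "high value" from "low ownership" is simply one convenient rule of thumb for flagging candidates.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove for on-screen plots
import matplotlib.pyplot as plt
import pandas as pd


def shortlist_midfielders(mids, min_cost_m=7.0, n=20):
    """Keep midfielders costing more than min_cost_m, then the n best by value."""
    premium = mids[mids["cost_m"] > min_cost_m]
    top = premium.nlargest(n, "value")
    high_value = top["value"] >= top["value"].median()
    low_ownership = top["selected_pct"] < top["selected_pct"].median()
    return top.assign(differential=high_value & low_ownership)


# Invented sample data; not real player figures.
mids = pd.DataFrame(
    {
        "player": ["Mid A", "Mid B", "Mid C", "Mid D", "Mid E", "Mid F"],
        "cost_m": [12.5, 10.0, 8.0, 7.5, 6.5, 9.0],
        "value": [18.0, 17.0, 12.0, 16.0, 25.0, 11.0],
        "selected_pct": [50.0, 20.0, 30.0, 5.0, 10.0, 8.0],
    }
)
top = shortlist_midfielders(mids, n=4)  # n=4 only because the sample is tiny

fig, ax = plt.subplots()
ax.scatter(top["value"], top["selected_pct"])
ax.axvline(top["value"].median(), linestyle="--")
ax.axhline(top["selected_pct"].median(), linestyle="--")
for row in top.itertuples():
    ax.annotate(row.player, (row.value, row.selected_pct))
ax.set_xlabel("Value (points per £m)")
ax.set_ylabel("Selected by (% of managers)")
```

With value on the horizontal axis and popularity on the vertical axis, the players flagged as `differential` sit to the right of the vertical dashed line and below the horizontal one.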
In general, the bottom right quadrant provides a list of players who are viable alternatives to the popular midfielders everyone already picks, and may even be better value for money if looking to save for a big purchase elsewhere. Other players to pick out include:
- Jack Grealish: he was the 3rd most popular >£7.0m midfielder last season but was outside the top 10 in terms of value.
- Ilkay Gundogan: the Manchester City midfielder went on a goalscoring tear last season when no one knew he could, generating a great points return, despite being a cheap midfielder initially (in fact, on last season’s price, he would not have even qualified for this dataset).
While the analysis performed above is not especially deep and will be subject to limitations, it illustrates how valuable insights can rapidly be gleaned from taking a closer look at the data. The key takeaways:
- With Python, a few lines of code can go a long way, and the ability to get information off websites is incredibly useful.
- We now know what APIs are, and that the requests package can be used for getting information from them.
- FPL insights:
  - When it comes to teams, the best value for money comes from sides that overperform, even if they don’t necessarily finish near the top in the actual table.
  - When it comes to positions, think twice before blowing the entire budget on midfielders and forwards; with the right picks, great value can be gained from those at the back!
  - When it comes to players, think about the best that can be done at various price points, and consider the likely popularity of flagship players when trying to judge their competitive value.
The code used to generate these results is available to any curious readers; just contact firstname.lastname@example.org for further information.