I have recently talked about the best data sources for expected goals (xG) and the best ways to use xG data. In this post I’m sharing the R code I use, which:
- Scrapes non-penalty xG (npxG) and expected assists (xA) data for all the players in the Premier League
- Scrapes npxG for and against data for each team in the Premier League
- Uses the above data in weighted averages to calculate npxG & xA baselines for each player (which are then used in my FPL model)
- Uses the above data in weighted averages to calculate offensive and defensive ratings for each team (which are then used in my FPL model)
The data is from FBref.
You can use this code to apply my previous advice in order to:
- Make better FPL decisions
- Build your own FPL model
How to run the code
To use this code, you need to have R installed. I also recommend installing RStudio, an editor that runs on top of R, as it looks nicer and makes it easier to view the variables the code produces.
I’ve attached two files, one called ‘Alberts_scraping_code.R’ and the other called ‘Alberts_helper_functions.R’. The second script holds the functions that the first script calls. The first script is where the key variables are defined and where it’s easier to see what is happening.
Below are the packages I use in my code. When running this code for the first time, it may ask you to install some of these packages, especially if you haven’t used R before. Go ahead and install the packages.
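If you would rather check for missing packages before running anything, something like the sketch below should work. The package names here are placeholders only (they are base packages that ship with R), and ‘missing_packages’ is a hypothetical helper, not part of my scripts. Swap in the names from the library() calls at the top of the scraping script.

```r
# Placeholder names only: replace with the packages the scripts actually load.
required_packages <- c("stats", "utils")

# Hypothetical helper: return the packages in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Install only what is missing, so re-running this is harmless.
to_install <- missing_packages(required_packages)
if (length(to_install) > 0) {
  install.packages(to_install)
}
```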

Below are the 3 main variables which you can change.

‘working_directory’ is the folder where you need to have saved the ‘Alberts_helper_functions.R’ file and where the output CSV files will be saved.
‘window_length’ determines how many games’ worth of data to use when calculating the weighted averages. I recommend setting this variable to a number between 20 and 30.
The final variable, ‘weight_type’, determines whether a weighted mean or a normal mean is used. It’s currently set to ‘w’, which means a weighted mean is used. If you want to use a normal unweighted mean, set it to ‘s’.
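To give a feel for what ‘window_length’ and ‘weight_type’ do, here is a minimal sketch. It is not the code from my scripts: ‘window_average’ is a hypothetical helper, the numbers are made up, and the linear weighting (most recent game counts the most) is an assumption chosen purely for illustration.

```r
window_length <- 25   # number of most recent games to use
weight_type   <- "w"  # "w" = weighted mean, "s" = simple (unweighted) mean

# Hypothetical helper: average the last `window_length` values of `x`,
# where `x` is ordered from oldest game to most recent game.
window_average <- function(x, window_length, weight_type = "w") {
  recent <- tail(x, window_length)
  if (weight_type == "s") {
    return(mean(recent))
  }
  # Assumed weighting: 1 for the oldest game in the window, rising
  # linearly to n for the most recent game.
  weights <- seq_along(recent)
  weighted.mean(recent, weights)
}

# Made-up npxG values for five games, oldest first.
npxg_per_game <- c(0.1, 0.4, 0.0, 0.8, 0.3)

simple_avg   <- window_average(npxg_per_game, window_length = 3, weight_type = "s")
weighted_avg <- window_average(npxg_per_game, window_length = 3, weight_type = "w")
```

With a weighted mean, a recent run of good or bad games moves the baseline faster than it would with a simple mean over the same window.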
Once all the code has run, it should output two CSV files in the folder you chose to be your working directory. The first file is called ‘all_player_stats’, which contains the npxG & xA averages calculated for each player. It looks like this:

The second file is called ‘team_stats’, which contains the attack and defence ratings for each team. It looks like this:

Alternatively, the corresponding dataframes ‘all_player_stats’ and ‘team_stats’ can be viewed in Rstudio.
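If you want to load the saved files back into R later, a sketch along these lines should do it. ‘read_output’ is a hypothetical helper, and the ‘.csv’ file extension is my assumption about how the files are named on disk.

```r
# Hypothetical helper: read one of the output files from the working directory.
# Assumes the file is saved as "<name>.csv".
read_output <- function(working_directory, name) {
  path <- file.path(working_directory, paste0(name, ".csv"))
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Usage, once the scraping script has been run:
# all_player_stats <- read_output(working_directory, "all_player_stats")
# team_stats       <- read_output(working_directory, "team_stats")
# head(team_stats)
```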
Let me know on Twitter or via email (albyedw@yahoo.co.uk) if you have any questions. I hope you find the code useful.
Google Drive folder with the files to download: https://drive.google.com/drive/folders/1pnumMnD9_Wq4_aVvalhbJlocL27G7kZc?usp=sharing