With the Premier League season currently on hold, I thought now would be the ideal time to talk about the fantasy football model I made this season. Last season, I chose my players based on expected goals and fixtures, so a model that combined these to provide points predictions for each player seemed like the natural next step.

The model takes each player’s (non-penalty) expected goals and assists per 90 from the whole season and adjusts them for each game based on the defensive strength of the opponent and whether the player is at home or away. Combining these adjusted figures with the points awarded for a goal or assist gives the expected points for each player in each game. I then use this estimate, together with each player’s position, to estimate how many bonus points each player may expect to score, which gives a total points estimate for each game. Unfortunately (or fortunately), there are no extra ‘frills’ to the model. However, I have built some tools to accompany it, such as a program that builds the optimal team from my predictions. The data I base my model on is StatsBomb’s expected goals data. I wrote code to scrape the data from the website myself, but there is also a good R package which gets data from Understat (https://github.com/ewenme/understatr). The main thing I would say is that the data provider doesn’t make much difference to the accuracy of predictions (although it may make a bigger difference for a better model).
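The core calculation can be sketched roughly as below. This is only a minimal illustration and not my actual code: the function name, the `opponent_factor` and `venue_factor` multipliers and the example numbers are all made up for the sketch, and it leaves out the bonus points step. The only fixed inputs are the standard FPL points for goals and assists.

```python
# Standard FPL scoring: points for a goal depend on position, an assist is worth 3.
GOAL_POINTS = {"GK": 6, "DEF": 6, "MID": 5, "FWD": 4}
ASSIST_POINTS = 3


def expected_attacking_points(npxg_per90, xa_per90, position,
                              opponent_factor, venue_factor):
    """Expected goal/assist points for one game, assuming a full 90 minutes.

    opponent_factor and venue_factor are hypothetical multipliers:
    above 1 for a weak defence or a home game, below 1 for the opposite.
    """
    adjustment = opponent_factor * venue_factor
    adjusted_xg = npxg_per90 * adjustment
    adjusted_xa = xa_per90 * adjustment
    return adjusted_xg * GOAL_POINTS[position] + adjusted_xa * ASSIST_POINTS


# Illustrative forward: 0.5 npxG and 0.2 xA per 90, at home to a leaky defence.
points = expected_attacking_points(0.5, 0.2, "FWD",
                                   opponent_factor=1.2, venue_factor=1.1)
print(round(points, 2))
```

Multiplying the two factors together is just one simple way of combining opponent strength and home advantage; other ways of combining them would be equally reasonable.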

To evaluate my model, I’ve been using it to make predictions in chunks of 6 gameweeks and then comparing my predictions to players’ actual points totals. If you compare the model predictions with players’ raw points totals over a certain period, the model is not that accurate, as the average prediction error (average difference between prediction and true points total) is normally around 8-10 points. However, if the predictions are scaled for each player based on the proportion of minutes played, the model is more accurate and the average prediction error is almost zero. As seen in the histogram below, the main problem with my model is that some players overperform or underperform their predictions by large amounts. For example, Robert Snodgrass overperformed his prediction by 16 points and Sergio Agüero underperformed his by 16 points, after adjusting for the minutes they played. I think this shows how difficult fantasy can be, as no one in their right mind would have thought Snodgrass would do so well, whilst Agüero could easily have come out of this period of fixtures with some monster hauls. Alas, this is why football, and fantasy football, is so enjoyable.

*Histogram of the prediction error, when predictions are scaled by proportion of minutes played. The red line is the average prediction error.*
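As a rough sketch of the minutes-scaled evaluation described above: each prediction assumes 90 minutes in every game, so it is multiplied by the share of available minutes the player actually played before being compared with his real total. The function name, the record layout, the sign convention (actual minus prediction, so positive means overperformance) and the sample numbers are all assumptions for illustration, not real results.

```python
from statistics import mean, pvariance


def minutes_scaled_errors(records, gameweeks=6):
    """Return actual points minus minutes-scaled prediction for each player."""
    max_minutes = 90 * gameweeks  # one game per gameweek assumed
    errors = []
    for record in records:
        share_played = record["minutes"] / max_minutes
        scaled_prediction = record["predicted"] * share_played
        errors.append(record["actual"] - scaled_prediction)
    return errors


# Made-up players over a 6-gameweek chunk.
sample = [
    {"predicted": 30.0, "actual": 15.0, "minutes": 270},  # played half the minutes
    {"predicted": 24.0, "actual": 28.0, "minutes": 540},  # played every minute
    {"predicted": 18.0, "actual": 5.0, "minutes": 270},
]

errors = minutes_scaled_errors(sample)
print("average error:", mean(errors))
print("variance of error:", pvariance(errors))
```

In this toy example the average error is zero even though two of the three players are several points off, which is the same pattern as in the histogram: a good average can hide big individual misses.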

To try and reduce the variance of the prediction error, I investigated whether there was a way to take form into account. This consisted of taking players’ expected goals and assists over their last 3 games and comparing them to what the model predicted their expected goals and assists to be in those games. This allowed me to calculate ‘form ratios’ for each player, which I could multiply their season averages by. For example, if a player tallied up 3 (non-penalty) expected goals in the last 3 games, but the model only predicted 1.5 for these games, he’d get a form ratio of 2. I do this because saying ‘this player was top for expected goals in the last 3 weeks’ isn’t enough if they’ve only played Norwich, Aston Villa and Manchester United… Whilst this comparison between players’ actual stats and my model’s predictions obviously depends on the quality of my model, if my model is consistently over/underestimating a player, this form ratio may help correct it. However, my predictions with this form ratio were a bit worse than my normal model, as there was a small increase in both the (average) prediction error and its variance. This may be because 3 games is too few to pick up actual changes in performance, and any under/over performance in these games may just be natural variance. I could try repeating this process using the last 5 games instead of the last 3. I could also improve the way I estimate bonus points to take into account ‘talisman’ players, e.g. Jamie Vardy, but this potential improvement isn’t big enough to lower the variance of my model.
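The form ratio itself is a simple calculation, sketched below. The function name and the per-game numbers are invented for illustration, and returning a ratio of 1 when the model predicted nothing is my own assumption here to avoid dividing by zero.

```python
def form_ratio(actual_recent, predicted_recent):
    """Ratio of actual to model-predicted xG (or xA) over the recent games."""
    total_predicted = sum(predicted_recent)
    if total_predicted == 0:
        # Assumption: with nothing predicted, leave the season average unchanged.
        return 1.0
    return sum(actual_recent) / total_predicted


# The example from the text: 3 npxG over the last 3 games against 1.5 predicted.
ratio = form_ratio([1.2, 0.8, 1.0], [0.5, 0.4, 0.6])

# Multiply the season average by the ratio to get a form-adjusted rate.
season_npxg_per90 = 0.45  # hypothetical season average
form_adjusted_npxg_per90 = season_npxg_per90 * ratio
print(round(ratio, 2), round(form_adjusted_npxg_per90, 2))
```

Switching from the last 3 games to the last 5 would only mean passing in longer lists, so it is cheap to try both.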

The main assumptions in my model are that home advantage is the same for every team and player, and that each player will play 90 minutes each game. It would be difficult to estimate the effect of home advantage for each team, as you’d have to wait until later in the season, once teams have played enough games home and away. One way to get around this is to supplement the model with extra analysis, by looking at the home and away breakdown of a player’s goals/expected goals over the last few seasons to see how they perform home and away. I did this in double gameweek 24 to see how Mane and Salah performed at home and away, but their home/away splits seemed fairly normal, so I didn’t think the model assumptions were massively wrong. With regards to the full minutes assumption, you could look at the proportion of minutes each player plays when fully fit, especially for Manchester City players. For rotated players on other teams, it’s unlikely you’d want to transfer them in anyway.
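A home/away check like the one described can be sketched as follows. The match records, field names and numbers are hypothetical and only show the shape of the calculation: total up expected goals and minutes by venue, then convert each total to a per-90 rate.

```python
def home_away_per90(matches):
    """Return expected goals per 90 split by venue ('home' or 'away')."""
    totals = {"home": [0.0, 0], "away": [0.0, 0]}  # [xG, minutes]
    for match in matches:
        totals[match["venue"]][0] += match["xg"]
        totals[match["venue"]][1] += match["minutes"]
    return {
        venue: (90 * xg / minutes if minutes else 0.0)
        for venue, (xg, minutes) in totals.items()
    }


# Made-up matches for one player.
matches = [
    {"venue": "home", "xg": 0.9, "minutes": 90},
    {"venue": "home", "xg": 0.6, "minutes": 90},
    {"venue": "away", "xg": 0.4, "minutes": 90},
    {"venue": "away", "xg": 0.5, "minutes": 45},
]

split = home_away_per90(matches)
print(split)
```

If a player’s home rate is far above or below what the model’s single home advantage figure would imply, that is a hint the assumption is not holding for him.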

Like any model, this one should not be taken as gospel. I’m not sure how useful it will be at the start of next season, as there’s too much uncertainty and not enough data early in a new season to base predictions on. I don’t let my model make transfers and pick my team for me, but it helps guide me. So far, this has helped me achieve an overall rank just below 18,000. Examples of good transfers I’ve made with help from my model are bringing in Danny Ings in gameweek 14 and Dominic Calvert-Lewin in gameweek 22. Another punt I was pleased with was transferring in Romain Saïss in gameweek 16, as my model was giving him high points predictions due to the chances he racked up from corners. I don’t always make successful transfers though. I took Martial out recently and he’s scored three 8-pointers since.

I will try and improve the form element of my model by looking at the last 5 games instead of the last 3, but I can’t think of any other major improvements at the moment. If anyone has any suggestions about how to improve the model, feel free to mention them. I have seen that other people on Twitter also have fantasy models, so it would be fun to get a predictions competition going. Get in touch if you want to do that.