Finding West Ham a Striker

Since Sébastien Haller’s departure to Ajax in the January transfer window, West Ham’s only senior striker has been Michail Antonio. With Antonio also playing in only around half of all games due to injuries, it’s no surprise that West Ham are keen to sign a striker. This article identifies players West Ham could target in the summer transfer window to bolster their front line.

A key assumption behind the methods used to identify recruits is that West Ham want to sign a player who plays in a similar manner to Antonio. This seems reasonable, given how well Antonio has done for West Ham in the last year. Figure 1 is FBref’s scouting report for Antonio, which shows he’s close to the top 5% of strikers for non-penalty expected goals per 90 over the last year. This assumption also makes it easier to identify potential targets. It may not hold, however: West Ham might prefer a striker suited to a different style of football from the one they currently play, or someone whose skillset differs from Antonio’s and can offer something different.

Figure 1: FBref’s scouting report for Antonio, showing how he performs in various stats

The data used for this analysis covers all forwards in the big five leagues for this past season and is available from FBref. With more time, I could also have looked at previous seasons. Three methods have been used to identify potential recruits. More technical details on the first two methods are in the appendix. The first method aims to find players who have a similar statistical output to Antonio, based on stats such as expected goals, dribbles, types of passes completed and defensive actions. Table 1 shows the 10 players most similar to Antonio according to this method. FBref’s ‘similar player’ feature also includes Watkins and Dzeko among the 10 players most similar to Antonio, which suggests that the method gives reasonable results.

Table 1: Players Returned By The First Method

The second method, which I will refer to as ‘stylistic scouting’, uses the same algorithm as the first to find players similar to Antonio, but on different data. Instead of traditional stats such as ‘shots per 90’ or ‘dribbles per 90’, it uses the percentage of a player’s ‘actions’ that are shots, dribbles, passes and so on. As the data available covers all the different types of ‘actions’ a player can make in a game (e.g., passing, shooting, attempting a dribble or tackle), it’s possible to work out what share of a player’s ‘actions’ each type accounts for. The aim is to take into account players who play for worse teams and may not have as many opportunities to shoot or dribble. The method therefore looks for players who play in a similar style to Antonio, even if their raw output isn’t as good. Table 2 shows the 10 players most similar to Antonio, based on the stylistic scouting. In addition to Dzeko and Watkins, FBref also has Belotti as one of the 10 players most similar to Antonio.

Table 2: Players Returned By The Stylistic Scouting

The final method was simpler: it only involved looking at which stats Antonio does well in and filtering the data to find players who perform similarly in those stats. Based on his scouting report in Figure 1, Antonio is good at getting into favourable positions to score from and at carrying the ball up the pitch into dangerous areas (as shown by his progressive carries). He is also involved in a lot of aerial duels, although he is only roughly average at winning them. Therefore, I decided to filter players based on their non-penalty xG per 90, progressive carries per 90, aerial duel success rate and aerials per 90. This was a more experimental process than the previous two methods: different values for these variables were tried until a reasonable number of players under 26 years old were returned. Table 3 shows the players returned by applying these filters.

Table 3: Players returned by filtering the data
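
The filtering behind Table 3 can be sketched roughly as below. This is only an illustration: the cut-off values, the column names and the three players are made up, as the actual thresholds were found by trial and error and aren’t listed here.

```python
# Hypothetical cut-offs: illustrative values only, not the ones used for Table 3.
THRESHOLDS = {
    "npxg_per90": 0.35,
    "prog_carries_per90": 3.0,
    "aerials_per90": 5.0,
    "aerial_win_pct": 35.0,
}
MAX_AGE = 25  # i.e. under 26 years old


def passes_filters(player, thresholds=THRESHOLDS, max_age=MAX_AGE):
    """Return True if the player is young enough and meets every minimum."""
    if player["age"] > max_age:
        return False
    return all(player[stat] >= minimum for stat, minimum in thresholds.items())


# Made-up players, not real FBref rows.
forwards = [
    {"name": "Player A", "age": 23, "npxg_per90": 0.42,
     "prog_carries_per90": 3.6, "aerials_per90": 6.1, "aerial_win_pct": 41.0},
    {"name": "Player B", "age": 29, "npxg_per90": 0.55,  # too old
     "prog_carries_per90": 4.2, "aerials_per90": 7.0, "aerial_win_pct": 48.0},
    {"name": "Player C", "age": 24, "npxg_per90": 0.30,  # npxG below cut-off
     "prog_carries_per90": 5.0, "aerials_per90": 6.5, "aerial_win_pct": 45.0},
]

shortlist = [p["name"] for p in forwards if passes_filters(p)]
print(shortlist)  # ['Player A']
```

Loosening or tightening the values in `THRESHOLDS` and re-running is the ‘experimental’ part of the process: the shortlist grows or shrinks until it is a manageable size.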


Appendix

Both the first method and stylistic scouting calculate the Euclidean distance between Antonio and every other player, using the variables chosen for that method. In each case the data was first standardised to have zero mean and unit variance, so that no single variable would ‘dominate’ the distance calculation.
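
As a rough sketch of that calculation (not the exact code used for the tables), the snippet below standardises each column and then ranks players by their Euclidean distance to a target player. The function names, the three variables and all of the numbers are made up for illustration.

```python
import math
from statistics import mean, pstdev


def standardise(rows):
    """Scale every column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 so a constant column doesn't cause division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def most_similar(players, target, top_n=10):
    """Return the top_n (name, distance) pairs closest to the target player."""
    names = list(players)
    scaled = dict(zip(names, standardise([players[n] for n in names])))
    distances = {
        n: math.dist(scaled[n], scaled[target]) for n in names if n != target
    }
    return sorted(distances.items(), key=lambda item: item[1])[:top_n]


# Made-up per-90 values: npxG, progressive carries, aerials contested.
players = {
    "Target": [0.50, 4.0, 8.0],
    "Player A": [0.48, 3.8, 7.5],
    "Player B": [0.20, 1.0, 2.0],
    "Player C": [0.55, 6.0, 3.0],
}

ranking = most_similar(players, "Target", top_n=3)
for name, distance in ranking:
    print(f"{name}: {distance:.2f}")
```

Without the standardising step, a variable measured on a larger scale (aerials contested here) would contribute far more to the distance than one measured in fractions of a goal.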

For the first method, 30 different variables were used. Where applicable, ‘per 90’ stats were used instead of totals. As well as standard stats such as non-penalty expected goals, expected assists and dribbles attempted, some of the stats were very detailed and could be seen as irrelevant for a striker. These include passes attempted by length (short, medium, long), ball recoveries, interceptions and tackle success rate. However, I decided it would be easier to be less selective about which stats to include and then do a quick comparison with Antonio (based on key stats) for the most similar players returned, to see if they’re sensible suggestions. Similarly, the distance calculations for the stylistic scouting were done on a lot of variables, and the matching players were also checked manually to verify that they were a good match. This manual check involved looking at which variables the returned player and Antonio were similar in. For example, when the stylistic scouting suggested that Aaron Leya-Iseka plays similarly to Antonio, the manual check revealed that this was only because they matched on stats which are ‘less important’ for a player like Antonio, such as ball recoveries and progressive passes.
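
One way to do this kind of manual check is to look at the gap between two players on each standardised variable separately, rather than only at the overall distance. The sketch below is an assumption about how that could be done, with made-up names and numbers; it is not the exact check described above.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise one column of values to zero mean and unit variance."""
    m = mean(values)
    s = pstdev(values) or 1.0  # avoid dividing by zero for a constant column
    return [(v - m) / s for v in values]


def variable_gaps(table, variables, player, target):
    """Return (variable, absolute z-score gap) pairs, smallest gap first."""
    names = list(table)
    gaps = []
    for i, variable in enumerate(variables):
        z = dict(zip(names, z_scores([table[n][i] for n in names])))
        gaps.append((variable, abs(z[player] - z[target])))
    return sorted(gaps, key=lambda gap: gap[1])


# Made-up per-90 values, not real FBref data.
variables = ["npxg", "ball_recoveries", "progressive_passes"]
table = {
    "Target": [0.50, 3.0, 2.0],
    "Candidate": [0.15, 3.1, 2.1],
    "Other 1": [0.30, 6.0, 4.0],
    "Other 2": [0.45, 1.0, 0.5],
}

gaps = variable_gaps(table, variables, "Candidate", "Target")
for variable, gap in gaps:
    print(f"{variable}: {gap:.2f}")
```

In this toy example the candidate is very close to the target on ball recoveries and progressive passes but far away on npxG, which is the pattern that would flag a match as unconvincing for a striker.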

The data used for the stylistic scouting was the percentage of a player’s ‘actions’ accounted for by each type of ‘action’, such as shooting, dribbling, passing or tackling. Therefore, stats like expected goals or pass completion percentages weren’t included in this method, as they describe the quality of the shots and passes a player makes rather than how often they are attempted. However, shots and passes can be broken down into different categories which give more information about the types of shots and passes being attempted.
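
To show how the percentages are worked out, here is a minimal sketch that turns raw action counts into shares of a player’s total actions. The action categories and counts are made up, and the real data uses a more detailed breakdown.

```python
import math


def action_shares(counts):
    """Convert raw action counts into each action's share of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("player has no recorded actions")
    return {action: n / total for action, n in counts.items()}


# Made-up season totals: the second player sees half as much of the ball
# but splits his actions in exactly the same way.
strong_team_player = {"shots": 90, "dribbles": 60, "passes": 600,
                      "tackles": 30, "aerials": 120}
weak_team_player = {"shots": 45, "dribbles": 30, "passes": 300,
                    "tackles": 15, "aerials": 60}

strong_shares = action_shares(strong_team_player)
weak_shares = action_shares(weak_team_player)

for action in strong_shares:
    same = math.isclose(strong_shares[action], weak_shares[action])
    print(f"{action}: {strong_shares[action]:.1%} (same for both: {same})")
```

The two players have very different raw totals but identical shares, so they would be treated as stylistically the same. These shares are then standardised and compared using the same distance calculation as the first method.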

One problem with the stylistic scouting is deciding how to split up events into more detailed categories, whilst making sure that nothing is double counted. One could also see this as an advantage, as it allows for customisation. I have split up pass attempts into failed passes, key passes, progressive passes and everything else (‘standard passes’). However, some key passes could also have been recorded as progressive passes, so these two categories may overlap. If I had raw data available and a better understanding of the way types of passes are defined, passes could be broken down into more categories, such as passes into the final third and short, medium or long passes.

Shots have been broken down into shots on/off target, as that should give some information about the shot location, but it would be better to split up shots based on their actual location, e.g., in or out of the penalty area. Dribbles have not been split up into successful or failed attempts, as I believe looking at just dribbles attempted gives a better indication of how often a player tries to dribble. Similarly, only the total number of aerial duels contested was used.
