I entered the data visualization competition run by Chance Analytics. We were given a full season’s data from the Chinese Super League and asked to visualize it as well as we could. In this post I want to describe my entry and share some thoughts on creating data visualizations in general.
For my entry, I plotted the locations of low crosses that created a chance.
I received the data in an Excel spreadsheet. Before I could import it into R, I had to tidy it up, which meant filling empty cells with ‘NA’. Stratagem had given each chance a subjective ‘chance quality’ rating, and I replaced this with an approximate expected goal value for that chance. I learnt how to use R at university, so although I ran into some difficulties, the programming was easier than I expected. I used R’s basic plotting function, ‘plot’, and checked that the data was in the right structure, e.g. that numbers were stored as numeric values and not as strings.
I also tried to color each point by the probability that the shot the cross assisted would be a goal, i.e. the expected assist value of each cross, but I ran into difficulties. My next step is therefore to learn the R package ggplot2, which produces better graphics and would make it easier to color each point by its expected assist value. I would also remove the numbers from the axes.
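To make these steps concrete, here is a minimal sketch in base R of converting string columns to numbers, coloring points by a value, and hiding the axis numbers. It is not the code from my entry: the data frame, the column names (x, y, xa) and the values are all made up for illustration, and binning the values into ten colors is just one simple way to do the coloring.

```r
# Hypothetical stand-in for the imported spreadsheet.
# Column names and values are assumptions, not the competition data.
crosses <- data.frame(
  x  = c("12.5", "30.1", "NA", "44.0"),   # cross location, read in as text
  y  = c("5.0", "60.2", "33.3", "NA"),
  xa = c("0.05", "0.31", "0.12", "0.48"), # expected assist value of each cross
  stringsAsFactors = FALSE
)

# Convert every column from string to numeric; the "NA" text becomes a real NA.
crosses[] <- lapply(crosses, as.numeric)

# Keep only the rows where nothing is missing.
crosses <- crosses[complete.cases(crosses), ]

# Build a blue-to-red palette and assign each cross to one of ten bins
# according to its expected assist value (assumed to lie between 0 and 1).
n_bins <- 10
palette_fn <- colorRampPalette(c("blue", "red"))
bins <- cut(crosses$xa,
            breaks = seq(0, 1, length.out = n_bins + 1),
            include.lowest = TRUE)
point_cols <- palette_fn(n_bins)[as.integer(bins)]

# Plot the locations, colored by bin, with no axis numbers or axis labels.
plot(crosses$x, crosses$y,
     col = point_cols, pch = 19,
     xaxt = "n", yaxt = "n", xlab = "", ylab = "")
```

Setting xaxt and yaxt to "n" is what hides the numbers on the axes, and pch = 19 draws filled circles so the colors are easy to see.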
I entered this competition to practice making data visualizations with football data. Before it, using R or another programming language in this context felt scary and complicated. However, I now think that the most important part of creating a data visualization is working out what you want to communicate and how to visualize it. Of course, once you know what you want to do you still have to program it, but the programming is a means to an end and can be learnt. As I discovered, it wasn’t too difficult.