Data-based decision-making is increasingly becoming a trend in the world of sports. Last year’s Champions League winner, Liverpool, has a remarkable data analyst staff, and a data analyst team played a significant role in the Croatian team reaching the 2018 World Cup final. The Eastern European region is still developing in this respect, but several initiatives have been launched recently. Hiflylabs has been involved in one such sports analytics project. This post tells that story, combined with a brief industry outlook.
Sports analytics becoming widespread
The topic of sports analytics first gained popularity with the movie “Moneyball” starring Brad Pitt in 2011. The story is about Billy Beane, the General Manager of the Oakland A’s baseball team, who is looking for new solutions to deal with the team’s unfavorable financial situation. He hires an assistant, a young sports analyst, Paul DePodesta, with whom he signs, based on statistical indicators, easily accessible and undervalued players who play little or are in decline at their current teams. They reach the playoffs in 4 consecutive years from 2000, and in 2002 they set a record by winning 20 consecutive games, a first in the 100+-year history of the American League.
It is a good example of how new ideas born out of external constraints can eventually revolutionize an industry. However, baseball is perhaps one of the most “data-ready” sports, with pitching, batting, and fielding moments and their results easily identified. Can this be applied to other sports?
When data began to be used in the NFL, the professional American football league, the majority were sceptical. “It’s not baseball, it’s a much more unpredictable and creative game here” was a popular opinion. The situation has changed so much that since 2014 every player has had an RFID chip in their shoulder pads, so the exact movement of all players and the ball can be tracked by coaches, and even by spectators. This data enables almost real-time analysis during live broadcasts.
Is it possible to do this level of data analysis in football as well? Certainly not, because football is a much more unpredictable, creative… oh wait, we have heard that somewhere before…
Sports analytics in football
Let’s look at a simple example from European football. The “passing victim” is a ball-winning tactic designed to make the opponent’s clumsiest passer handle the ball a lot. The main point is that 3 designated players of the typically 4-player defense are pressed by the strikers and midfielders, so the ball goes to the 4th, who can only pass forward. He is the “passing victim”.
In the past, applying the method required many hours of video review by the teams’ video analysts before each match. They had to spend several days each week analyzing the next opponent’s last 4-5 matches to find its best and worst players.
With data, however, it is much simpler. If we build a data warehouse with the most important data sources and performance indicators for the team, this information will be available in a few minutes. With 1-2 clicks in the analyst interface, you can even call up Arsenal’s worst-passing defender of the last 3 years among those who spent at least, say, 300 minutes on the pitch. Moreover, our dashboard can show you who to consider if you are looking for an attacker under the age of 21, taller than 180 cm, with good aerial and finishing skills, a Latin-speaker with at least 100,000 Instagram likes…
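As a minimal sketch of the kind of query such a warehouse answers, the Python snippet below filters a handful of made-up player records. The field names (`pass_accuracy`, `minutes_played`, and so on), the sample rows, and the two helper functions are assumptions for illustration, not the schema or data of the actual project.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    name: str
    team: str
    position: str         # "defender", "attacker", ...
    age: int
    height_cm: int
    minutes_played: int
    pass_accuracy: float  # share of successful passes, 0.0-1.0


# Made-up rows standing in for a warehouse table (not real statistics).
PLAYERS = [
    PlayerSeason("Defender A", "Arsenal", "defender", 27, 188, 2100, 0.91),
    PlayerSeason("Defender B", "Arsenal", "defender", 24, 185, 1450, 0.78),
    PlayerSeason("Defender C", "Arsenal", "defender", 31, 190, 120, 0.65),
    PlayerSeason("Attacker D", "Other FC", "attacker", 20, 186, 900, 0.74),
    PlayerSeason("Attacker E", "Other FC", "attacker", 19, 176, 1300, 0.80),
]


def worst_passing_defender(players, team, min_minutes=300):
    """Return the team's defender with the lowest pass accuracy, ignoring small samples."""
    candidates = [
        p for p in players
        if p.team == team
        and p.position == "defender"
        and p.minutes_played >= min_minutes
    ]
    return min(candidates, key=lambda p: p.pass_accuracy, default=None)


def scout_attackers(players, max_age=21, min_height_cm=180):
    """Return attackers younger than max_age and taller than min_height_cm."""
    return [
        p for p in players
        if p.position == "attacker"
        and p.age < max_age
        and p.height_cm > min_height_cm
    ]


victim = worst_passing_defender(PLAYERS, "Arsenal")
prospects = scout_attackers(PLAYERS)
```

The minimum-minutes filter matters here: without it, Defender C, who barely played, would be picked as the “passing victim” on the basis of a tiny sample.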
data, data, DATA
Much of the data is generated manually (Messi: left-foot pass, one mark; Messi: left-foot pass again, another mark). Although there are automated video analysis systems that use AI support, significant human effort is still required to validate them (e.g., to decide whether a deflected ball was a bad shot or a bad pass).
To perform the analyses, we can divide data into 3 major groups:
- Match statistics: squad, results, shots on goal, passes (200-300 data points/match)
- Event data: game events by area, per player (2,000-3,000 data points/match)
- Tracking data: second-by-second data about the 22 + 6 players and the ball (2-3 million data points/match)
While it is worthwhile to develop simpler systems for “match statistics”, more sophisticated scouting and player profiling are possible with “event data”, and with “tracking data” deep tactical analyses become possible. The larger the amount of data, the more expertise and money are required, so we should be aware of how much information our team, meaning our coaches and players, is actually capable of incorporating.
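To illustrate how the tiers relate, here is a small sketch that rolls event data up into match-statistics-level figures. The event tuples and their field layout are invented for the example; real data providers use their own formats.

```python
from collections import defaultdict

# Invented event records: (minute, player, event_type, successful)
EVENTS = [
    (3, "Player 1", "pass", True),
    (3, "Player 2", "pass", False),
    (10, "Player 1", "shot", False),
    (17, "Player 1", "pass", True),
    (25, "Player 2", "pass", True),
    (40, "Player 1", "pass", False),
]


def summarize(events):
    """Aggregate per-event rows into per-player attempt and success counts."""
    stats = defaultdict(lambda: defaultdict(lambda: {"attempts": 0, "successful": 0}))
    for _minute, player, event_type, successful in events:
        cell = stats[player][event_type]
        cell["attempts"] += 1
        cell["successful"] += int(successful)
    return stats


def pass_accuracy(stats, player):
    """Share of successful passes for one player, or None if no passes were recorded."""
    passes = stats[player]["pass"]
    if passes["attempts"] == 0:
        return None
    return passes["successful"] / passes["attempts"]


summary = summarize(EVENTS)
```

Going the other way is not possible: once only the aggregated counts are stored, the minute and location of each event are lost, which is why the richer tiers need more storage and more expertise.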
How much is this wealth of data worth?
Sports analytics has grown into a huge global market. According to Grand View Research, the value of the global sports analytics market will be around $1 billion in 2020 and is projected to reach $4.5 billion by 2025, of which the European football analytics market could be worth €100-200 million. This shows that the competition is getting more intense. Teams that act in time can gain a significant competitive advantage: they can buy undervalued players, predict opponents’ weak points, track players’ daily physical condition, prevent injuries, and thus catch up with clubs that have larger budgets.
The story of a sports analytics project
Recently, at the initiative of a well-known Eastern European football club, we participated in the implementation of a pilot project. The project aimed to provide data-based support to the club’s decision-makers (manager + sports director) in the following 3 areas:
- Player performance evaluation (adult + youth team)
- Opponent analysis
During the implementation, we built a data warehouse, developed a live dashboard, and drew up a medium-term data strategy.
We need a team
It was an old dream of mine to be involved in a football-themed data project, but I didn’t know that the same was true for half of the company, so when news of the project spread, we almost had to hold a “qualifier” to decide among the applicants. In the end, we took the field with a starting lineup of nearly 10 people, in the following positions:
- Data scientist
- Data engineer
- Dashboard expert
- UI / UX designer
- Strategic consultant
- Project manager
- Soccer analytics experts – it was extremely helpful to involve them in the team as consultants
Instead of the traditional top-down or bottom-up project methodologies, we used a methodology called “outside-in”, in which the team was divided into 2 parts:
- the data analysis team set out to find and scrape available data (WyScout, Instat, Polar, Transfermarkt).
- meanwhile, the other half of the team, involving external football experts, focused on the end result to be presented, gathering insights that could be truly useful to the club.
The two teams kept iterating with each other, which gave the analysts direction on what data to look for and gave the experts a clear picture of what would actually be feasible.
The All-Decisive Player Index (at least in FIFA)
Aggregating the data, we created the Player Index, customized to the club’s player strategy. For each position, we created an index based on 10 different parameters (e.g., for strikers: successful tricks, shots on goal, etc.; for defenders: tackles, duels won, etc.). The index also gives a general picture of players regardless of position.
We indexed all the players in the league and the youth league for each match, so it was possible to track their progress match by match, quarter by quarter, or even across seasons.
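The exact formula of the index is not described here, so the following is only a sketch of one common way to build such a score: min-max normalize each parameter across players in the same position, then take a weighted average. The parameter names, weights, and numbers are hypothetical, and only three parameters are used instead of ten to keep the example short.

```python
# Hypothetical weights per position; each set sums to 1.0.
WEIGHTS = {
    "striker": {"shots_on_goal": 0.5, "successful_dribbles": 0.3, "duels_won": 0.2},
    "defender": {"tackles": 0.5, "duels_won": 0.4, "successful_dribbles": 0.1},
}

# Invented single-match figures for three strikers.
MATCH_STATS = {
    "Striker A": {"shots_on_goal": 4, "successful_dribbles": 6, "duels_won": 5},
    "Striker B": {"shots_on_goal": 2, "successful_dribbles": 2, "duels_won": 9},
    "Striker C": {"shots_on_goal": 0, "successful_dribbles": 4, "duels_won": 1},
}


def player_index(stats_by_player, weights):
    """Return a 0-100 index per player from min-max normalized, weighted parameters."""
    index = {name: 0.0 for name in stats_by_player}
    for param, weight in weights.items():
        values = [stats[param] for stats in stats_by_player.values()]
        low, high = min(values), max(values)
        spread = high - low
        for name, stats in stats_by_player.items():
            # If every player has the same value, the parameter cannot rank anyone,
            # so each player gets a neutral 0.5 for it.
            normalized = (stats[param] - low) / spread if spread else 0.5
            index[name] += weight * normalized
    return {name: round(100 * score, 1) for name, score in index.items()}


striker_index = player_index(MATCH_STATS, WEIGHTS["striker"])
ranking = sorted(striker_index, key=striker_index.get, reverse=True)
```

With these invented numbers, Striker A ranks first. Running the same function on every match and storing the results is what would make tracking over time possible. Note that min-max scores are relative to the group being compared, so an index computed this way is only comparable within the same pool of players.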
Based on these, we generated practical suggestions for situations that can arise in the club’s everyday work.
We created an online Power BI interface for the presentation (online, but not real-time), which was “clickable” during the project, so the club had an opportunity to test-drive the system and add their suggestions before we launched the final version.
Don’t build a nuclear missile, start small
Being a country of 10 million football coaches, we came up with close to a million ideas, but we had to keep our feet on the green lawn and prioritize development proposals with resource constraints and benefits in mind. On the one hand, a correlation found in the data is of little value if it cannot be put to professional use in the sport (e.g. a correlation between the number of yellow cards received and the number of throw-ins). On the other hand, if we try to introduce an over-engineered methodology in an environment that has not used one before, resistance will be high and it is unlikely to succeed.
In the first round, it is always worth assessing the data maturity of the club, based on the organizational structure, IT skills, individual competencies, and making a corresponding proposal.
Then it is best to go step by step and start with so-called “proofs of concept” (stay tuned, the next Hiflylabs article on the topic is coming soon). These are test projects that pose little risk to clients, yet are fast, lasting 1-2 months. They give a taste of what decisions can be made with data, improve colleagues’ data maturity, and give direction to future developments.
Finally, to answer the question in the title: overall, it’s unlikely that a third-tier team with a higher level of data maturity will beat Liverpool. However, it does have better odds of a draw.
Balázs Füredi – Data Solution Advisor