Players don't contribute the way you think
Introduction
In a game where the objective is to outscore the opponent, the spotlight naturally falls on the players who score or assist goals. However, football is a team game and no player can score on his own against the whole opposition, apart from maybe one small Argentinian. A lot has to happen before the ball gets into a position for a shot on goal: some events increase the likelihood of a shot and some decrease it. Players who increase that likelihood are just as important as the ones giving the assist or scoring the goal, and they deserve recognition. Furthermore, conventional statistics favour players from better teams. Teams which dominate games and generate more shots allow their players to accumulate more assists, crosses, through balls and so on. In teams which rarely get the opportunity for a shot, it is important to highlight the players who are most relied on to create. Likewise, players who progress the ball in the first two-thirds of the field should be credited for putting it into dangerous areas, which raises the probability of a dangerous attack. On the other hand, players who continuously decrease that likelihood could be silently holding the team’s performance back, and working on improving them can have just as much effect on the final outcome.
This article will explore the possibility of evaluating player contribution to shots. Inspired by an article written by StatsBomb, a first-order Markov model will be constructed to take into account all events that have happened in a single season for Chelsea WFC and evaluate their contribution towards a shot.
First-order Markov chain
A Markov chain is a random process describing a sequence of possible events in which the probability of each event depends on what has happened before. This analysis uses a first-order Markov chain, meaning that the probability of each event depends only on the state attained in the previous event. A graphical representation of the chain is given in Figure 1. The states of the chain are divided into transient states and absorption states.

States that have a probability of transitioning into another state are called transient states.
States in which the probability of remaining in the state is 1 are called absorption states.
Once the probabilities of moving between all states are calculated, they are collected in a transition matrix. Similarly, an absorption matrix is constructed, holding the probability of each transient state moving directly into each absorption state. From these, the probability of reaching an absorption state from a given transient state in n steps can be calculated. A fundamental matrix is then derived from the transition matrix; its row sums give the expected number of plays before the possession ends. Finally, the ‘contribution factor’ is calculated as the product of the fundamental matrix and the absorption matrix.
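The matrix steps described above can be sketched on a toy chain. This is a minimal illustration only: it assumes NumPy, uses two transient states instead of 34, and every probability in it is made up rather than taken from the Chelsea data.

```python
import numpy as np

# Toy chain: two transient states (A, B) and two absorption states
# (shot, loss of possession). All numbers are hypothetical.
Q = np.array([[0.5, 0.3],    # A -> A, A -> B
              [0.2, 0.4]])   # B -> A, B -> B
R = np.array([[0.0, 0.2],    # A -> shot, A -> loss of possession
              [0.3, 0.1]])   # B -> shot, B -> loss of possession

# Every transient state must go somewhere, so each row of [Q | R] sums to 1.
assert np.allclose(Q.sum(axis=1) + R.sum(axis=1), 1.0)

# Fundamental matrix N = (I - Q)^-1: expected number of visits to each
# transient state before the possession is absorbed.
N = np.linalg.inv(np.eye(len(Q)) - Q)

# Row sums of N: expected number of plays before the possession ends.
expected_plays = N.sum(axis=1)

# B = N R: probability of eventually ending in each absorption state.
B = N @ R
```

With these made-up numbers, a possession currently in state A eventually ends in a shot with probability 0.375, and one in state B with probability 0.625. Each row of `B` sums to 1 because every possession must end in one of the two outcomes.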
Methodology
Before the analysis can be carried out, the data needs to be cleaned. As with previous analyses, the cleaning starts with splitting the pitch into 30 zones and excluding events which do not add value to the possession chains. Furthermore, only shots which meet a certain xG threshold are counted as shots. For a more detailed explanation of these steps, see the previous posts.

For absorption states two possible outcomes will be considered, namely:
- Shot
- Loss of possession
A possession therefore ends either with a shot, which is considered a positive outcome, or with a loss of possession, a negative outcome. Shots were chosen as the outcome instead of goals because, even though goals provide the highest value in a football game, many valuable plays do not end in a goal and would be missed. The same is true of shots: some valuable plays will not result in a shot. However, fewer valuable plays are missed by tracking shots than by tracking goals, since shots occur more frequently. In that respect, it is believed that having shots as an outcome captures the true value of a player to a greater extent.
In terms of transient states, for the purpose of this analysis we are interested in the zones which contribute most to a positive outcome. For this reason, each zone is considered a transient state. On top of that, four set-piece scenarios have been added, namely:
- Goal kick
- Throw in
- Corner
- Free kick
Together, the 30 zones and 4 set-piece scenarios give 34 transient states.
Once the transient and absorption states are known, the dataset needs to be reshaped for the analysis. The dataset is divided into possession sequences, and each possession sequence is considered individually. Each event in the possession is paired with the event that follows it: the current state is the zone or set-piece in which the event occurred, and the next state is the zone or set-piece of the following event. The only exception is the last event of a possession sequence, whose next state is either a ‘shot’ or a ‘loss of possession’. Events after this manipulation are shown in Table 1.
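The pairing step can be sketched as follows. This is only an illustration under stated assumptions: the function name `pair_events`, the `ended_in_shot` flag and the example possession are hypothetical, and the real dataset will carry more fields per event.

```python
SHOT = "Shot"
LOSS = "Loss of possession"

def pair_events(states, ended_in_shot):
    """Pair each state of one possession with the state that follows it.

    `states` is the ordered list of zones or set-pieces in one possession.
    The last state is paired with the absorption state that ended it.
    """
    if not states:
        return []
    # zip pairs every state with its successor inside the possession.
    pairs = list(zip(states, states[1:]))
    # The final event moves into an absorption state instead of a zone.
    pairs.append((states[-1], SHOT if ended_in_shot else LOSS))
    return pairs

# Hypothetical possession that starts with a goal kick and ends in a shot.
possession = ["Goal kick", "Zone 7", "Zone 13", "Zone 22", "Zone 28"]
pairs = pair_events(possession, ended_in_shot=True)
```

Each possession is processed on its own, so no pair ever links the last event of one possession to the first event of the next.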

A Markov model is then run on the dataset to generate the probabilities of going between transient states and absorption states. This will then be used to calculate the shot contribution value of each transient state. This will be achieved with the following matrices:
(1) 34x34 transition matrix: the probability of each transient state moving to any other transient state
(2) 34x2 absorption matrix: the probability of each transient state moving to each absorption state
(3) 34x34 identity matrix
(4) 34x34 fundamental matrix: the inverse of the difference between the identity matrix and the transition matrix
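One way the transition and absorption matrices might be estimated is by counting the paired events and normalising each row. The sketch below assumes the pairs are available as (current state, next state) tuples; the helper names and the six example pairs are hypothetical, and only two transient states are used to keep it readable.

```python
from collections import Counter, defaultdict

ABSORBING = ("Shot", "Loss of possession")

def estimate_probabilities(pairs):
    """Turn (current, next) pairs into per-state transition probabilities."""
    counts = defaultdict(Counter)
    for current, nxt in pairs:
        counts[current][nxt] += 1
    probs = {}
    for current, row in counts.items():
        total = sum(row.values())
        probs[current] = {nxt: n / total for nxt, n in row.items()}
    return probs

def split_matrices(probs, transient):
    """Arrange probabilities as transition (Q) and absorption (R) matrices."""
    Q = [[probs.get(s, {}).get(t, 0.0) for t in transient] for s in transient]
    R = [[probs.get(s, {}).get(a, 0.0) for a in ABSORBING] for s in transient]
    return Q, R

# Hypothetical paired events, not real match data.
pairs = [
    ("Zone 22", "Zone 28"), ("Zone 28", "Shot"),
    ("Zone 22", "Loss of possession"),
    ("Zone 22", "Zone 28"), ("Zone 28", "Loss of possession"),
    ("Zone 22", "Zone 22"),
]
transient = ["Zone 22", "Zone 28"]
Q, R = split_matrices(estimate_probabilities(pairs), transient)
```

A state that never appears as a current state would get an all-zero row here, so a real implementation would need to decide how to handle unseen states before inverting anything.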
The probability of each transient state leading to an absorption state is calculated as the product of the fundamental matrix and the absorption matrix. Once these probabilities are known, a player’s contribution can be calculated by assessing whether each play made by that player increases or decreases the probability of a shot. When the next event is a shot, the probability is equal to 1; when it is a loss of possession, it is equal to 0.
Player contribution = P(shot | next event) - P(shot | current event)
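The formula can be applied to a single event as in the sketch below. The function names and the shot probabilities in `p_shot` are hypothetical placeholders, not values produced by the model in this article.

```python
SHOT = "Shot"
LOSS = "Loss of possession"

def shot_probability(state, p_shot):
    """P(shot | state): 1 for a shot, 0 for a loss, else the model's value."""
    if state == SHOT:
        return 1.0
    if state == LOSS:
        return 0.0
    return p_shot[state]

def contribution(current, nxt, p_shot):
    """Change in shot probability caused by moving from current to nxt."""
    return shot_probability(nxt, p_shot) - shot_probability(current, p_shot)

# Hypothetical shot probabilities per transient state.
p_shot = {"Zone 13": 0.125, "Zone 22": 0.25, "Zone 28": 0.5}

progressive_pass = contribution("Zone 13", "Zone 22", p_shot)   # positive
shot_taken = contribution("Zone 28", SHOT, p_shot)              # positive
ball_lost = contribution("Zone 22", LOSS, p_shot)               # negative
```

A play that moves the ball into a state with a higher shot probability earns a positive value, while losing the ball costs the player whatever shot probability the current state carried.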
Results
The resulting probabilities are presented in Figures 2 and 3.


Figure 2 shows that, rather unsurprisingly, Zone 28 is the area with the highest probability of preceding a shot. It is followed by corners, which indicates that the team takes advantage of corner set-pieces to generate shots. The left-hand side has a slight edge for shot generation: Zone 26 narrowly edges Zone 24 and is followed by Zone 22, whilst Zone 29 and Zone 27 have fairly similar probabilities. At the other end of the field, as expected, are the areas which contribute to a shot the least, the lowest being Zone 1 and Zone 5.
The derived equation for player contribution has been used to evaluate each event. In the end, all events in which a particular player has been involved have been summed to calculate the player’s season contribution towards shots for the team. The results are presented in Figure 4.
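The season totals described above amount to a grouped sum. The sketch below assumes each event has already been reduced to a (player, contribution) tuple; the player labels and values are hypothetical and do not correspond to the results in Figure 4.

```python
from collections import defaultdict

def season_contribution(events):
    """Sum per-event contributions for each player, highest total first."""
    totals = defaultdict(float)
    for player, value in events:
        totals[player] += value
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked)

# Hypothetical (player, contribution) tuples for a handful of events.
events = [
    ("Player A", 0.125),
    ("Player B", -0.25),
    ("Player A", 0.5),
    ("Player B", 0.125),
]
totals = season_contribution(events)
```

Because the total is a plain sum, players involved in more events have more opportunity to move it in either direction, which is worth keeping in mind when comparing players with very different minutes.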

Analysing Figure 4 provides some interesting insights. Unsurprisingly, the model ranks Sam Kerr as the player with the highest contribution towards shots, with Pernille Harder and Francesca Kirby second and third. This could be largely influenced by those three players having the most shots on the team, meaning that they convert chances into shots, which eventually lead to goals. It could also indicate that those players progress the ball into more dangerous areas more often than not. At the other end, Bethany England has the lowest shot contribution in the team. There are a few potential reasons for such a result. Firstly, the model punishes giving possession away: if a player’s style is to attempt many aggressive, high-risk passes, most of which are unsuccessful, the contribution is driven down because the ball is given away more often than chances are created. Secondly, players at the other end of the spectrum, who play too many safe passes in advanced areas, are also punished by the model, since the ball is recycled instead of being progressed into an area with a higher probability of a positive outcome, and chances are not taken.
Figure 5 shows the difference between the current model and one that also takes into account whether the player was under pressure during the event. In some cases, a player not being under pressure in a deeper zone might prove more impactful for creating a shot than a player under pressure in an advanced area. In that regard, the model which includes pressure has the potential to be more accurate.

It can be observed that the values have not changed much, mainly due to the fact that in the most dangerous areas, players will very rarely not be under pressure.
It is important to clarify that the model does not take into account tactical considerations, such as players being tasked with consolidating play or being given the freedom to attempt riskier passes in the hope of making the one killer pass that ultimately decides a game. Different models could be constructed, using different transient and absorption states, to capture specific coaching requirements. Furthermore, the model does not take into account the context of the event: details such as the state of the game, the opponent or the occasion are ignored, and differentiating between whether the player was under pressure or not did not produce significant changes. Estimation accuracy could also be improved by implementing a higher-order Markov model, which takes into account the state before the current one. Lastly, the model does not consider how events transition from one state to another. A Markov decision process could be constructed to accommodate that need.
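One common way to approximate the higher-order idea mentioned above is to fold the previous state into the state label, so that the same first-order machinery can be reused. The sketch below is an assumption about how that might look, not something built in this analysis; the `"Start"` placeholder and the function name are hypothetical.

```python
def second_order_states(states, start="Start"):
    """Relabel a possession so each state also carries the state before it."""
    # The first event has no predecessor, so it is paired with a placeholder.
    previous = [start] + states[:-1]
    return list(zip(previous, states))

# Hypothetical possession.
possession = ["Zone 7", "Zone 13", "Zone 22"]
composite = second_order_states(possession)
```

The trade-off is that the number of possible states grows quickly, so each composite state is observed far less often and its probabilities are estimated from less data.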
Conclusion
A first-order Markov model has been constructed and used to analyse the 20/21 season of Chelsea WFC. It aimed to evaluate players based on their contribution towards creating shooting chances and taking shooting opportunities. It was also used to visualise which zones have the highest probability of leading to a shot. The left-hand side turned out to be slightly more dangerous than the right-hand side, and corners carry the second highest threat of a shot after Zone 28. Players in forward or midfield positions yield the highest shot contribution. Defenders having low shot contribution values can indicate that Chelsea does not rely on defenders as much to progress the ball into dangerous positions. This can be a result of tactical instructions or a lack of the qualities needed to do so. Interesting insights have been extracted from the least contributing players, which can lead to more detailed analysis when raised with the coaching staff.
Player evaluation has always been a crucial part of any competitive team. Recognising which players perform up to standard and which need to be developed further is embedded into the work cycle of any coaching staff. Before the digital era, conclusions were drawn primarily from subjective judgement. However, technology has opened the way to a more objective approach to player evaluation based on pre-agreed criteria. A vast number of parameters can be taken into account over a long time span, helping to uncover the merit of certain players which could have been invisible to the naked eye. In doing so, a team can not only recognise and retain talent more effectively but also scout and uncover hidden gems who can vastly improve certain aspects of the team, something which in the long run can make a difference.