Introduction

Over the past decade, various types of data have become available for sports data analysis. In particular, the prevalence of play-by-play1 and tracking data2 for soccer games has enabled new analyses that were previously impossible3. Some typical examples include ball-passing network analysis4,5, formation analysis6,7, and space evaluation8,9,10. Such research topics require a variety of statistical analysis methods, including network theory, computational geometry, and machine learning. In particular, machine learning is an effective tool for soccer game analysis because soccer produces very complex behaviors from simple unified rules and has a huge accumulation of data. Soccer games are a good laboratory to develop cutting-edge machine learning techniques11,12.

Statistical properties of player interactions and collective motions are also hot topics in soccer game analysis. In terms of statistical physics, soccer players can be regarded as self-propelled particles13. We can therefore apply techniques developed for characterizing self-propelled particles, such as flocks of birds or schools of fish14. Examples include the detection of highly correlated segments using directional correlation functions15, the characterization of order-disorder transition16, and the modeling of soccer players’ motion by the self-propelled player model17. Furthermore, the dynamics of player interactions and ball passing in soccer games have been described as stochastic processes, such as the Markov chain18 and the first-passage process19.

The widespread use of these new data has also made it possible to test mathematical models that have mainly been discussed theoretically in soccer game analysis. One of the essential models in soccer games is a motion model, which calculates the arrival point of a player in t s based on the current location and velocity. The two practical applications of motion models in soccer games are space evaluation and pass prediction. In space evaluation, a fundamental concept is the “dominant region,” defined as the region where a specific player can arrive before other players3,20. In general, we can estimate the dominant region of each player by comparing the arrival times of all players to each location in the field. In addition, the outcome of a given pass can be estimated by calculating the arrival time of each player on the ball trajectory17,21,22,23. Because the motion model can calculate the arrival times of a player to a specific location, it is essential for soccer game analysis.

Thus far, various motion models have been proposed. The simplest motion model assumes uniform linear motion for all players, resulting in the Voronoi region as the dominant region. More realistic models have also been proposed; for example, Taki and Hasegawa assumed uniform accelerated motion of players20,24, and Fujimura and Sugihara considered viscous resistance21. These motion models are based on the equation of motion and are often referred to as “physics-based motion models.” They provide the arrival points of players under the specific condition that each player sprints toward every location.

Another type of motion model calculates the arrival points of players based on machine learning25,26,27. This “probabilistic motion model” can predict realistic arrival points based on previous tracking data, although training it is costly. We note that because the tracking data used for learning include various running patterns other than sprints, the predicted arrival point is not necessarily the position reached by sprinting.

Both physics-based motion models and probabilistic motion models aim to predict the arrival point of players. Physics-based motion models also play a role in elucidating the principles of players’ movement laws. This study focuses on the motion model proposed by Fujimura and Sugihara (hereinafter Fujimura–Sugihara model) in sprint conditions to elucidate soccer players’ movement laws during sprinting. Specifically, we aim to investigate the validity and limitations of the Fujimura–Sugihara model based on soccer tracking data. We stress that the motion model in the sprint condition is significant in situations where we calculate the minimum arrival time of players to each location. The minimum arrival time is utilized to evaluate which player can receive the ball21,22 and to quantify the degree of safety and sparsity of each location in the field10. Thus, the present study provides an essential basis for various applied analyses.

In our investigation, we first focused on the shape of the arrival region of players predicted by the motion model. The Fujimura–Sugihara model generates a circular arrival region21. However, it has been pointed out that the arrival region should be elliptical, considering the difference in the acceleration ability of players depending on the direction of motion8,26. We show that a circular arrival region is obtained from soccer tracking data, and that the initial speed dependence of the arrival region also agrees with the solution of the Fujimura–Sugihara model. Next, we propose a method to estimate the kinetic parameters of the Fujimura–Sugihara model. In contrast to previous experiment-based methods21, this method can estimate valid parameters directly from tracking data. Finally, we discuss the limitations of the Fujimura–Sugihara model for soccer games based on the time dependence of the estimated kinetic parameters.

Method

Data

The datasets used in this study were from 54 soccer games in the top division of the Japan Professional Football League (J1 League). The games were played by the league's 18 teams in 2016. The primary data are the absolute positional coordinates (xy) of all players every 0.04 s, collected using the TRACAB system28. Based on accuracy assessments of the TRACAB system, the x and y coordinates are considered to contain an error of \(\pm 1\) m. These datasets were provided by DataStadium Inc., Japan, which was authorized to collect and sell these data under a contract with the J League29. This contract also ensures that the use of relevant datasets does not infringe on the rights of players and clubs belonging to the J League. Although the datasets were proprietary, we received explicit permission from DataStadium Inc. for use in this study. Data analyses and visualizations in this study were performed using Python packages on an iMac Pro system with a 3-GHz 10-Core Intel Xeon W processor and 128 GB of memory.

Preliminary analysis

We summarize the basic properties of soccer players’ motions. The player’s velocity \(\vec {v}(t)\) was calculated as the difference between the current position and the position 1 s ago. Figure 1a shows the speed distributions for the goalkeeper and the other players obtained from a game. It is found that all players have a peak at \(v \simeq 1\) m/s, whereas the non-goalkeepers have a second peak at \(v \simeq 3\) m/s. These two peaks correspond to walking and jogging, respectively. Figure 1b shows that the speed distribution of the non-goalkeepers is more widely distributed than that of the goalkeeper, and both decay almost exponentially. Because the sprint is observed mainly for players other than the goalkeeper, we exclude goalkeepers in the following analysis.
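
The velocity definition above can be illustrated with a minimal sketch. It assumes a track stored as a list of (x, y) positions in meters sampled every 0.04 s; the function name `speeds` and the synthetic track are hypothetical and are not taken from the authors' code.

```python
import math

FRAME_DT = 0.04  # sampling interval of the tracking data [s]


def speeds(positions, lag_s=1.0, frame_dt=FRAME_DT):
    """Speed [m/s] at each frame, from the displacement over the previous lag_s seconds."""
    lag = round(lag_s / frame_dt)  # 25 frames for a 1 s lag at 0.04 s sampling
    result = []
    for i in range(lag, len(positions)):
        dx = positions[i][0] - positions[i - lag][0]
        dy = positions[i][1] - positions[i - lag][1]
        result.append(math.hypot(dx, dy) / lag_s)
    return result


# Synthetic check (not real data): a player moving at a constant 3 m/s along x.
track = [(3.0 * FRAME_DT * i, 0.0) for i in range(100)]
v = speeds(track)
```

The first 25 frames have no position 1 s earlier, so the sketch returns one speed per remaining frame.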

Figure 1

Speed distributions of soccer players. (a) Linear plot and (b) semi-logarithmic plot.

We also examine mean squared displacement (MSD) to characterize players’ trajectories. When a player at location \(\vec {x}(t)\) at t moves to location \(\vec {x}(t+\tau )\) after \(\tau\) [s], MSD is defined as \(\langle |\Delta \vec {x}|^2\rangle _{\tau } = \langle |\vec {x}(t+\tau ) - \vec {x}(t) |^2\rangle _{\tau }\). In general, MSD is scaled as

$$\begin{aligned} \langle |\Delta \vec {x}|^2\rangle _{\tau } \sim \tau ^{\beta }, \end{aligned}$$
(1)

where the exponent \(\beta\) reflects the trajectory of the player; in particular, \(\beta =1\) and 2 correspond to the simple random walk and linear motion of the player, respectively. In real data analysis, we calculated MSD for a player as the long-time average defined as follows:

$$\begin{aligned} \langle |\Delta \vec {x}|^2\rangle _{\tau } = \frac{1}{T - \tau } \sum _{t=0}^{T - \tau } |\vec {x}(t+\tau ) - \vec {x}(t)|^2, \end{aligned}$$
(2)

where T is the length of a single time series from a restart to a time-out in a game. In Fig. 2, we present the \(\tau\) dependence of MSD obtained from the entire time series in a game; each line in Fig. 2 is obtained by averaging MSD over all players except for the goalkeeper. It is found that each line exhibits \(\beta =2\) in \(\tau \lesssim 10\) s and \(\beta =1\) in \(\tau \gtrsim 10\) s. This result indicates that each player moves in a straight line for up to 10 s and then changes direction randomly. Thus, the physics-based deterministic motion model is considered valid for a range of up to 10 s.
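
As a rough illustration of Eqs. (1) and (2), the following sketch computes a time-averaged MSD and a two-point estimate of the exponent \(\beta\). It assumes lags measured in frames and normalizes by the number of available position pairs; the helper names `msd` and `scaling_exponent` and the synthetic straight-line track are assumptions made for this example only.

```python
import math


def msd(track, lag):
    """Time-averaged mean squared displacement at a lag given in frames (cf. Eq. (2))."""
    n = len(track) - lag  # number of available position pairs
    total = 0.0
    for i in range(n):
        dx = track[i + lag][0] - track[i][0]
        dy = track[i + lag][1] - track[i][1]
        total += dx * dx + dy * dy
    return total / n


def scaling_exponent(track, lag_a, lag_b):
    """Slope of log MSD against log lag between two lags (beta in Eq. (1))."""
    return math.log(msd(track, lag_b) / msd(track, lag_a)) / math.log(lag_b / lag_a)


# Synthetic check (not real data): uniform linear motion should give beta = 2.
straight = [(0.1 * i, 0.05 * i) for i in range(500)]
beta = scaling_exponent(straight, 5, 50)
```

With real trajectories, the slope would more reasonably be obtained by a regression over many lags rather than from two points.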

Figure 2

\(\tau\) dependence of the mean squared displacement of soccer players. The graph is shown in double logarithmic scale.

Fujimura–Sugihara model

We summarize the motion model proposed by Fujimura and Sugihara21. For the position \(\vec {x}(t)\) of a player at time t, the Fujimura–Sugihara model is given by the following equation of motion:

$$\begin{aligned} m \frac{d^{2} \vec {x}(t)}{d t^{2}}&= F \vec {n} - k \frac{d \vec {x}(t)}{d t}, \end{aligned}$$
(3)

where m is the mass of the player, F and \(\vec {n}\) are the magnitude and direction of the driving force, respectively, and k is the coefficient of viscous resistance. In other words, the player accelerates in the direction of \(\vec {n}\) with a magnitude F, and it becomes harder to accelerate in proportion to its velocity \(d\vec {x}(t)/dt\). Given an initial position \(\vec {x}_{0}=(0, 0)\) and an initial velocity \(\vec {v}_{0}\), the solution is given as follows:

$$\begin{aligned} \vec {x}(t)&= \frac{1 - \exp (-\alpha t)}{\alpha } \vec {v}_{0} + V_{\textrm{max}} \left( t - \frac{1 - \exp (-\alpha t)}{\alpha }\right) \vec {n}, \end{aligned}$$
(4)

where \(V_{\textrm{max}} = F/k\) and \(\alpha = k/m\) are arbitrary constants referred to as “kinetic parameters.” Here, we define the first and second coefficients in Eq. (4) as

$$\begin{aligned} A(\alpha , t)&= \frac{1 - \exp (-\alpha t)}{\alpha }, \end{aligned}$$
(5)
$$\begin{aligned} B(\alpha , V_{\textrm{max}}, t)&= V_{\textrm{max}} \left( t - \frac{1 - \exp (-\alpha t)}{\alpha }\right) = V_{\textrm{max}} (t - A(\alpha , t)). \end{aligned}$$
(6)

When the direction of the driving force \(\vec {n}\) changes arbitrarily, the arrival points in t [s] of the player are distributed on a circle with center \(A(\alpha , t) \vec {v}_{0}\) and radius \(B(\alpha , V_{\textrm{max}}, t)\); this circle is referred to as the “arrival circle” hereinafter. Fujimura and Sugihara conducted a sprint experiment with three amateur hockey players to estimate the kinetic parameters. By fitting the obtained speed curve with the solution of the Fujimura–Sugihara model, they empirically obtained the kinetic parameters \(V_{\textrm{max}}=7.8\) m/s and \(\alpha =1.3\) 1/s as the typical values during sprinting21.
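
The solution (4) and the arrival circle can be written out as a short sketch. The function names below are hypothetical; the parameter values are those reported by Fujimura and Sugihara, and the initial speed of 5 m/s is chosen only for illustration.

```python
import math


def coeff_a(alpha, t):
    """First coefficient of Eq. (4), defined in Eq. (5)."""
    return (1.0 - math.exp(-alpha * t)) / alpha


def coeff_b(alpha, v_max, t):
    """Second coefficient of Eq. (4), defined in Eq. (6)."""
    return v_max * (t - coeff_a(alpha, t))


def arrival_point(v0, alpha, v_max, t, theta):
    """Position at time t for initial velocity v0 = (vx, vy) and driving direction theta [rad]."""
    a = coeff_a(alpha, t)
    b = coeff_b(alpha, v_max, t)
    return (a * v0[0] + b * math.cos(theta), a * v0[1] + b * math.sin(theta))


ALPHA, V_MAX = 1.3, 7.8  # values reported by Fujimura and Sugihara
V0 = (5.0, 0.0)          # illustrative initial velocity [m/s]
center_x = coeff_a(ALPHA, 1.0) * V0[0]  # x coordinate of the circle center at t = 1 s
radius = coeff_b(ALPHA, V_MAX, 1.0)     # radius of the arrival circle at t = 1 s
```

Sweeping `theta` over a full turn traces the arrival circle: every point returned by `arrival_point` lies at distance `radius` from the center `(center_x, 0)`.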

Investigation method

To characterize the arrival points of players in \(\Delta t\) [s] using tracking data, we adopt the method used in previous studies21,26. First, the velocity \(\vec {v}(t)\) of each player at time t is transformed so that it starts at the origin (0, 0) and points in the positive direction of the x axis (Fig. 3). We then plot the location of the same player at \(t+\Delta t\) in the same coordinate system. By repeating this plot for various players and t, we obtain a heat map formed by the arrival points in \(\Delta t\) [s].

Figure 3

Coordinate system for the calculation of heat maps.
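
The coordinate transformation described above amounts to a translation followed by a rotation. The sketch below is one possible way to write it; the function name `to_velocity_frame` and the sign convention for the y axis (positive to the left of the direction of motion) are assumptions.

```python
import math


def to_velocity_frame(pos_now, vel_now, pos_later):
    """Express pos_later in a frame with origin at pos_now and +x along vel_now."""
    speed = math.hypot(vel_now[0], vel_now[1])
    if speed == 0.0:
        raise ValueError("direction is undefined for zero velocity")
    ux, uy = vel_now[0] / speed, vel_now[1] / speed  # unit vector of the heading
    dx, dy = pos_later[0] - pos_now[0], pos_later[1] - pos_now[1]
    # Rotate by minus the heading angle: x is along the velocity, y is to its left.
    return (dx * ux + dy * uy, -dx * uy + dy * ux)


# A player heading in +y who ends up 3 m further along +y lies on the new +x axis.
p = to_velocity_frame((10.0, 5.0), (0.0, 2.0), (10.0, 8.0))
```

Frames with zero velocity have no defined heading, so the sketch rejects them instead of guessing a direction.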

The shape of the heat map reflects the movement pattern of each player during \(\Delta t\). For example, when players lose speed during \(\Delta t\), they arrive at a point within the heat map. In contrast, when they sprint, they arrive at a point near the boundary. Because we investigated the validity of the Fujimura–Sugihara model under sprint conditions, we focused on the shape of the heat map’s boundary and compared it with the solution (4). We set \(\vec {v}_{0} = (v_{0}, 0)\), where \(v_{0} > 0\), in Eq. (4) for comparison with the heat map.

In the following analyses, we calculated heat maps using all data during the playing time of 54 games for all players except the goalkeepers. The controlling parameters were the initial speed \(v_{0}\) [m/s] and time interval \(\Delta t\) [s]. Specifically, for a fixed \(\Delta t\), we calculated the heat maps for the initial speed \([v_{0}, v_{0} + \Delta v_{0})\), where \(\Delta v_{0} = 0.3\) m/s (\(\simeq 1\) km/h). We set one cell side in the heat map to \(0.2\times \Delta t\) [m], and only cells adjacent to more than c nonblank cells were used for the analyses. To exclude isolated cells as outliers, we manually set the value of c to eight for \(\Delta t < 1\), six for \(1 \le \Delta t \le 3\), and four for \(3 < \Delta t\).
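
The binning and outlier-removal steps can be sketched as follows. Two points are assumptions rather than statements from the text: the neighborhood is taken to be the 8 cells surrounding a cell, and a cell is kept when at least c of them are nonblank (strictly more than eight neighbors would be impossible under this neighborhood). All function names are hypothetical.

```python
import math
from collections import Counter


def heat_map(points, dt):
    """Count arrival points per square cell with side 0.2 * dt [m]."""
    side = 0.2 * dt
    return Counter((math.floor(x / side), math.floor(y / side)) for x, y in points)


def threshold(dt):
    """Values of c quoted in the text for each range of dt."""
    if dt < 1:
        return 8
    if dt <= 3:
        return 6
    return 4


def drop_isolated(cells, c):
    """Keep cells with at least c nonblank cells among the 8 surrounding ones (assumed rule)."""
    kept = {}
    for (i, j), count in cells.items():
        neighbors = sum(
            (i + di, j + dj) in cells
            for di in (-1, 0, 1)
            for dj in (-1, 0, 1)
            if (di, dj) != (0, 0)
        )
        if neighbors >= c:
            kept[(i, j)] = count
    return kept


# Synthetic check (not real data): a 5 x 5 block of occupied cells plus one stray point.
DT = 1.0
side = 0.2 * DT
points = [((i + 0.5) * side, (j + 0.5) * side) for i in range(5) for j in range(5)]
points.append((10.1, 10.1))  # isolated outlier
cells = heat_map(points, DT)
kept = drop_isolated(cells, threshold(DT))
```

In this toy example the stray cell is removed, and so are the edge cells of the block, which shows that such a filter also trims the outermost layer of a dense region.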

Result

\(v_{0}\) dependence of heat map

Figure 4a,b present the heat maps of the players’ arrival points for various initial speeds, \(v_{0}=1,\ 3,\ 5,\ 7\) m/s, where \(\Delta t = 1\) and 2 s. We focused only on the shape of each heat map’s boundary, although the color gradation of the heat map is proportional to the number of data points. Remarkably, Fig. 4a,b show that the boundary of the heat map is not elliptical but circular for any initial \(v_{0}\). This result is consistent with the solution (4) of the Fujimura–Sugihara model.

Figure 4

Heat maps of players’ arrival points for various initial speeds \(v_{0}=1, 3, 5, 7\) m/s, where (a) \(\Delta t = 1\) s and (b) \(\Delta t = 2\) s. The dotted line in each panel represents the arrival circle (4) with kinetic parameters estimated from the heat maps.

Next, we approximated the boundary of each heat map as a circle and estimated its center coordinates \((x_{\textrm{c}}, y_{\textrm{c}})\) and radius \(r_{\textrm{c}}\). After excluding the isolated heat map cells as outliers, we calculated the maximum and minimum values of the heat map’s x and y coordinates, \(x_{\textrm{max}}\), \(x_{\textrm{min}}\), \(y_{\textrm{max}}\), and \(y_{\textrm{min}}\). Then, \((x_{\textrm{c}}, y_{\textrm{c}})\) and \(r_{\textrm{c}}\) were estimated as follows:

$$\begin{aligned} (x_{\textrm{c}}, y_{\textrm{c}})&= \left( \frac{x_{\textrm{max}}+x_{\textrm{min}}}{2}, \frac{y_{\textrm{max}}+y_{\textrm{min}}}{2}\right) , \end{aligned}$$
(7)
$$\begin{aligned} r_{\textrm{c}}&= \frac{x_{\textrm{max}} - x_{\textrm{min}} + y_{\textrm{max}}-y_{\textrm{min}}}{4}. \end{aligned}$$
(8)
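
Eqs. (7) and (8) translate directly into code. The sketch below assumes the remaining heat map cells are given as a list of (x, y) coordinates; the function name `circle_from_extent` and the synthetic circle are hypothetical.

```python
import math


def circle_from_extent(points):
    """Center and radius from the extreme x and y coordinates, following Eqs. (7) and (8)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_max, x_min = max(xs), min(xs)
    y_max, y_min = max(ys), min(ys)
    center = ((x_max + x_min) / 2, (y_max + y_min) / 2)
    radius = (x_max - x_min + y_max - y_min) / 4
    return center, radius


# Synthetic check (not real data): points on a circle centered at (2, 0) with radius 5.
ring = [
    (2.0 + 5.0 * math.cos(math.radians(d)), 5.0 * math.sin(math.radians(d)))
    for d in range(360)
]
center, radius = circle_from_extent(ring)
```

Because only the four extreme coordinates enter the estimate, a single missing or spurious cell on the boundary shifts both the center and the radius.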

Figure 5 shows the \(v_{0}\) dependence of \(x_{\textrm{c}}\) and \(y_{\textrm{c}}\). We find that \(y_{\textrm{c}}\) is independent of \(v_{0}\); namely, \(y_{\textrm{c}} \simeq 0\). However, \(x_{\textrm{c}}\) is proportional to \(v_{0}\), particularly for \(v_{0} \lesssim 6\) m/s. The dotted lines in each panel of Fig. 5 represent the regression line for the points where \(v_{0} \le 6\) m/s. These results are consistent with the solution (4) of the Fujimura–Sugihara model.

Figure 5

\(v_{0}\) dependence of the center coordinates \(x_{\textrm{c}}\) and \(y_{\textrm{c}}\) of the estimated circle of each heat map, where (a) \(\Delta t = 1\) s and (b) \(\Delta t = 2\) s. The dotted lines in each panel represent the regression line for points where \(v_{0} \le 6\) m/s.

We also examined the \(v_{0}\) dependence of the radius \(r_{\textrm{c}}\) of the estimated circle. Figure 6 shows that \(r_{\textrm{c}}\) is almost constant, particularly for \(v_{0} \lesssim 6\) m/s. The dotted line in each panel of Fig. 6 represents the average value of \(r_{\textrm{c}}\) where \(v_{0} \le 6\) m/s. As the radius \(B(\alpha , V_{\textrm{max}}, t)\) defined in Eq. (6) is independent of \(v_{0}\), this result is also consistent with the solution (4) of the Fujimura–Sugihara model.

Figure 6

\(v_{0}\) dependence of the radius \(r_{\textrm{c}}\) of the estimated circle of each heat map. The dotted line represents \(r_{\textrm{c}} = 6.16\) m for (a) \(\Delta t = 1\) s, and \(r_{\textrm{c}} = 12.94\) m for (b) \(\Delta t = 2\) s. These values are obtained by averaging the points where \(v_{0} \le 6\) m/s.

Estimation of kinetic parameters using heat maps

According to the solution (4), the x coordinate of the center of the arrival circle is proportional to \(v_{0}\), and the proportionality coefficient is given by Eq. (5). Because \(x_{\textrm{c}}\) calculated from the heat map is proportional to \(v_{0}\) (refer to Fig. 5), we can estimate the kinetic parameter \(\alpha\) by equating the measured proportionality coefficient with \(A(\alpha , \Delta t)\) in Eq. (5) and solving for \(\alpha\). Specifically, the proportionality coefficients obtained from the regression lines shown in Fig. 5a,b are 0.58 and 0.84; then, we obtain \(\alpha =1.23\) and 1.04 1/s for \(\Delta t = 1\) and 2 s, respectively.

The radius of the arrival circle is given by Eq. (6). As shown in Fig. 6a,b, the radius of the estimated circle of each heat map is \(r_{\textrm{c}}\simeq 6.16\) and 12.94 m for \(\Delta t = 1\) and 2 s, respectively. Thus, substituting the estimated values of \(\alpha\) into Eq. (6) yields the other kinetic parameter \(V_{\textrm{max}}\); we obtain \(V_{\textrm{max}}=14.53\) and 11.19 m/s, respectively.
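
The two estimation steps can be sketched numerically. Eq. (5) has no closed-form inverse for \(\alpha\), so the sketch solves it by bisection, which is an assumption about the numerical method and not necessarily what the authors used. The function names and the search bracket are hypothetical; the inputs are the rounded values quoted in the text.

```python
import math


def coeff_a(alpha, t):
    """A(alpha, t) of Eq. (5), written with expm1 for accuracy at small alpha * t."""
    return -math.expm1(-alpha * t) / alpha


def solve_alpha(slope, t, lo=1e-9, hi=50.0, iterations=200):
    """Solve A(alpha, t) = slope by bisection; A decreases monotonically with alpha."""
    if not coeff_a(hi, t) < slope < coeff_a(lo, t):
        raise ValueError("slope is outside the range covered by the search bracket")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if coeff_a(mid, t) > slope:
            lo = mid  # A is still too large, so alpha must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


def solve_v_max(radius, slope, t):
    """Invert Eq. (6): radius = V_max * (t - A), with A equal to the measured slope."""
    return radius / (t - slope)


alpha_1 = solve_alpha(0.58, 1.0)  # slope from Fig. 5a, dt = 1 s
alpha_2 = solve_alpha(0.84, 2.0)  # slope from Fig. 5b, dt = 2 s
v_max_1 = solve_v_max(6.16, 0.58, 1.0)
v_max_2 = solve_v_max(12.94, 0.84, 2.0)
```

Because the slopes and radii used here are rounded to two decimals, the results are close to, but not identical with, the values quoted above.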

The dotted lines in each panel of Fig. 4 show the arrival circle calculated by the solution (4) with the above estimated kinetic parameters \(\alpha\) and \(V_{\textrm{max}}\). We found that the solution (4) of the Fujimura–Sugihara model can correctly predict the boundary of the arrival region.

\(\Delta t\) dependence of kinetic parameters

We also analyze the dependence of the kinetic parameters on \(\Delta t\). Repeating the analyses of the previous section for different \(\Delta t\), we confirm the characteristics shown in Figs. 4, 5, and 6 for all \(\Delta t\). We present the \(\Delta t\) dependence of the estimated kinetic parameters in Fig. 7. Although \(\alpha\) and \(V_{\textrm{max}}\) are assumed to be constant in the Fujimura–Sugihara model, we found that the estimated kinetic parameters vary with \(\Delta t\), particularly in the range of \(\Delta t \lesssim 1\) s.

Figure 7

\(\Delta t\) dependence of kinetic parameters, \(\alpha\) and \(V_{\textrm{max}}\).

Discussion and conclusion

We investigated the validity of the Fujimura–Sugihara model based on heat maps of the players’ arrival points obtained from soccer tracking data. Our results can be summarized as follows. First, the boundary of the heat map became a circle rather than an ellipse. Second, \(x_{\textrm{c}}\) was proportional to \(v_{0}\) and \(y_{\textrm{c}}\) was independent of \(v_{0}\). Third, \(r_{\textrm{c}}\) was independent of \(v_{0}\). These results are consistent with the solution (4) of the Fujimura–Sugihara model. We also proposed a method for estimating valid kinetic parameters in the Fujimura–Sugihara model. Meanwhile, the estimated kinetic parameters varied with \(\Delta t\), particularly in \(\Delta t \lesssim 1\) s; this result is inconsistent with the assumption of the Fujimura–Sugihara model.

In the previous study by Fujimura and Sugihara, the kinetic parameters were estimated by a sprint experiment21. The experimental subjects were members of a college field hockey team who sprinted in a straight line. They empirically obtained the kinetic parameters \(\alpha = 1.3\) 1/s and \(V_{\textrm{max}}=7.8\) m/s by fitting the solution of the Fujimura–Sugihara model to the speed curve of each subject. In the present study, we estimated the parameters based on each soccer player’s real positional data. The obtained values were slightly different from those of the previous study; for example, \(\alpha = 1.04\) 1/s and \(V_{\textrm{max}}=11.19\) m/s for \(\Delta t = 2\) s. However, our estimation method of kinetic parameters is more reasonable than the previous empirical one in the sense that the parameters were estimated directly from the soccer tracking data.

Note that, as shown in Figs. 4, 5, and 6, the plots in the large \(v_{0}\) region differ from the others. This discrepancy arises because the heat maps contain few data points in the large \(v_{0}\) region. For example, a player with large \(v_{0}\) is unlikely to move in the opposite direction during \(\Delta t\). Namely, the larger the \(v_{0}\), the fewer the data points in the region \(x < 0\) in the heat maps. The estimated values of \((x_{c}, y_{c})\) and \(r_{c}\) for large \(v_{0}\) values appear to be incorrect because they are calculated using the maximum and minimum values of the heat map’s x and y coordinates. In the future, the use of large-scale tracking data could eliminate such problems.

We comment on the results of the \(\Delta t\) dependence of the kinetic parameters shown in Fig. 7. This result requires careful consideration from two perspectives. First, according to accuracy assessments of the TRACAB system, the x and y coordinates of our tracking data contain an error of \(\pm 1\) m. This error cannot be ignored in the analysis of the heat maps, especially when \(\Delta t \lesssim 1\) s. Therefore, we must check the validity of the rapid increase in \(\alpha\) and \(V_{\textrm{max}}\) in Fig. 7 using more accurate data. The second possibility is that the results in Fig. 7 indicate the limitation of the Fujimura–Sugihara model. In that case, the Fujimura–Sugihara model must be extended to reproduce the behavior shown in Fig. 7. One possible extension is a model in which the magnitude of the driving force F and the coefficient of viscous resistance k in Eq. (3) are time-dependent:

$$\begin{aligned} m \frac{d^{2} \vec {x}(t)}{d t^{2}}&= F(t) \vec {n} - k(t) \frac{d \vec {x}(t)}{d t}. \end{aligned}$$
(9)

This equation is a second-order linear ODE with variable coefficients. As the time dependence of kinetic parameters is conceivable in soccer, the analysis of this extended motion model can be a challenging and significant future topic in soccer game analysis.
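
Since Eq. (9) generally has no closed-form solution, one way to explore it is numerical integration. The following is a minimal sketch using the classical fourth-order Runge–Kutta method; the function `simulate`, the step count, and the decaying driving force in the example are all hypothetical choices made for illustration and are not proposed in the text.

```python
import math


def simulate(f_over_m, k_over_m, n, v0, t_end, steps=1000):
    """Integrate d2x/dt2 = (F(t)/m) n - (k(t)/m) dx/dt from x = (0, 0) with RK4."""
    h = t_end / steps
    pos = [0.0, 0.0]
    vel = [v0[0], v0[1]]

    def acc(t, v):
        f, k = f_over_m(t), k_over_m(t)
        return [f * n[0] - k * v[0], f * n[1] - k * v[1]]

    for s in range(steps):
        t = s * h
        a1 = acc(t, vel)
        v2 = [vel[i] + 0.5 * h * a1[i] for i in range(2)]
        a2 = acc(t + 0.5 * h, v2)
        v3 = [vel[i] + 0.5 * h * a2[i] for i in range(2)]
        a3 = acc(t + 0.5 * h, v3)
        v4 = [vel[i] + h * a3[i] for i in range(2)]
        a4 = acc(t + h, v4)
        for i in range(2):
            # The position stages are the velocities evaluated at each RK4 stage.
            pos[i] += h * (vel[i] + 2 * v2[i] + 2 * v3[i] + v4[i]) / 6
            vel[i] += h * (a1[i] + 2 * a2[i] + 2 * a3[i] + a4[i]) / 6
    return tuple(pos)


ALPHA, V_MAX = 1.3, 7.8  # values reported by Fujimura and Sugihara

# Constant coefficients (F/m = alpha * V_max, k/m = alpha) reduce Eq. (9) to Eq. (3).
constant = simulate(lambda t: ALPHA * V_MAX, lambda t: ALPHA, (0.0, 1.0), (3.0, 0.0), 2.0)

# Hypothetical time dependence: a driving force that decays over time.
decaying = simulate(
    lambda t: ALPHA * V_MAX * math.exp(-0.2 * t), lambda t: ALPHA, (0.0, 1.0), (3.0, 0.0), 2.0
)
```

With constant coefficients, the numerical result can be checked against the closed-form solution (4), which gives a simple sanity test before trying time-dependent forms of F(t) and k(t).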

Previously, Brefeld et al. demonstrated that the arrival region of soccer players becomes elliptical26. Fernández et al. modeled the player influence area as elliptical when the initial speed is large8. However, Anzer et al. pointed out that the smaller the speed interval \(\Delta v_{0}\), the closer the shape of the arrival region is to circular rather than elliptical23. This study supports the results of Anzer et al. with a more detailed and comprehensive analysis. Furthermore, we have newly shown that soccer players’ sprints satisfy the characteristics of the Fujimura–Sugihara model, that is, the \(v_{0}\) dependence of the arrival circle for fixed \(\Delta t\). Our results suggest that a relatively simple model can describe soccer players’ sprints, although a slight discrepancy exists between actual observations and model predictions. Moreover, the motion model under the sprint condition has been utilized for pass prediction17,21,22 and for modeling the dominance and influence of soccer players at each location in the field10. Our observations are fundamental characteristics that the motion model should satisfy and provide direction for modeling soccer players’ motions.

In conclusion, the Fujimura–Sugihara model effectively predicts the arrival points of sprinting soccer players, except when the time interval \(\Delta t\) is small. The boundary of the player’s arrival region is shown to be circular rather than elliptical, and the initial speed dependence of the arrival region satisfies the model. In the case of sprinting, kinetic parameters in the model can be estimated directly from the soccer tracking data.