Abstract
During the COVID-19 pandemic, governments faced difficulties in implementing mobility restriction measures, as no clear quantitative relationship between human mobility and infection spread in large cities is known. We developed a model that enables quantitative estimations of the infection risk for individual places and activities by using smartphone GPS data for the Tokyo metropolitan area. The effective reproduction number is directly calculated from the number of infectious social contacts defined by the square of the population density at each location. The difference in the infection rate of daily activities is considered, where the ‘stay-out’ activity, staying at someplace neither home nor workplace, is more than 28 times larger than other activities. Also, the contribution to the infection strongly depends on location. We imply that the effective reproduction number is sufficiently suppressed if the highest-risk locations or activities are restricted. We also discuss the effects of the Delta variant and vaccination.
Similar content being viewed by others
Introduction
Since the beginning of the COVID-19 pandemic in 2019, there have been 452 million confirmed cases and over six million deaths globally as of 12 March 2022, posing serious healthcare challenges1,2,3. Most governments have struggled to control this disease and simultaneously minimize its damage to daily and economic activities owing to limited time and resources4,5,6,7,8. Along with vaccinations, nonpharmaceutical interventions are considered essential for managing this disease9,10,11,12,13. For example, the governments around the Tokyo metropolitan area in Japan declared states of emergency (SoEs) that limited daily and economic activities including schools, department stores, cinemas, restaurants, bars, and travel to reduce human mobility in public spaces5,9. Consequently, the number of social contacts and, thereby, the effective reproduction number decreased. Governments require reliable quantitative estimations of the effect of such policies on the pandemic, that is, a predictable model, for making informed decisions.
Studies have estimated the effectiveness of lockdowns or non-compulsory measures such as SoEs on reducing human mobility9,10,11,12,13. However, the effect of reduced human mobility on the pandemic remains unclear, as it is insufficient to investigate the number of social contacts alone. Infectious diseases, including the COVID-19 pandemic, have been studied using global positioning system (GPS) data14,15,16,17,18,19,20,21,22,23. However, with such diseases, the types of social contacts are considered more critical. Specifically, the infection rate is known to strongly depend on whether social contacts are wearing masks or talking and on the ventilation state of the rooms they are in24,25,26,27. The point is that the number of social contacts through each daily activity type should be investigated.
In this study, we propose a model to predict the effective reproduction number of COVID-19 based on daily human activities inferred from smartphone GPS data. From these data, first, we categorize citizens’ daily activity patterns into four types and estimate the population density for each activity, location, and time in the Tokyo metropolitan area. We also calculate the number of social contacts for each activity by assuming it to be proportional to the square sum of the population density. Second, we propose an activity-dependent infection model based on the susceptible–infectious–removed (SIR) model. This model seems to have a lower resolution than that of other compartmental models (e.g., susceptible–exposed–infectious–removed (SEIR) model). However, it is suitable for direct formulation using GPS data. The effective reproduction number is a linear combination of the number of social contacts, and the coefficients are the infection rate per contact. We determine the parameters to fit the empirical data before the spread of the Delta variant and verify that the model is sufficient to predict the effective reproduction number. Then, we compare the parameters to observe the activity that has the highest infection risk. We also calculate the effective reproduction number for each daily activity, location, and individual. The model prediction is valid for up to around 2 weeks after the human mobility data are obtained. Third, we investigate effects other than those of human mobility. We show that the domination of the Delta variant is described by only two parameters: starting date and ratio of effective reproduction number to that of existing variants. Furthermore, we demonstrate the effectiveness of vaccination through its high prevention ratio of infection in a metropolis.
Methods
Epidemiological data
COVID-19 epidemiological data were provided by the Ministry of Health, Labour, and Welfare in Japan28. The data consist of the number of new infection cases and severe cases from Feb. 2020 and May 2020, respectively, to Oct. 2021. We took a 7-day moving average to remove the periodicity of epidemiological data in a week. The moving average was taken from 6 days prior to the target date. Vaccination statistics were obtained from the Prime Minister of Japan and His Cabinet29.
GPS data
Our GPS data from 1 Jan. 2020 to 30 Sep. 2021 were purchased from Agoop Corp, Japan30, which collected the information of smartphones’ location only if the users downloaded some smartphone applications provided by Agoop and gave permission and their consent to the terms of use for the applications following the Privacy Policy of Agoop Corp31. The example and description of the data are given in Supplementary note 2. The data were fully anonymized. Namely, all user IDs were renewed every midnight, so any tracking of individual persons across the days is impossible. The median accuracy of the latitude/longitude data was 10 m. However, the precise home location was masked by limiting the data resolution around their home for privacy protection in the following way. If a user is located in a 1 km-square where their home is located, the location data was replaced by that of the centre of the square; also, the size of the square extends to 10 km-square in low population areas. This square size is used only for the data provider to mask the users’ homes in very low population areas. As for users’ workplaces, the city’s name is given in the resolution of municipalities. The GPS data has a bias in generations: the Ministry of Internal Affairs and Communications, Japan, reported that the ratio of the smartphone users is concentrated between 13 and 59 ages in 201932. A bias in the home location is discussed in Supplementary note 3. We confirmed that the population observed in the GPS data is similar to the actual population in the census data, although the GPS data has a bias to concentrate on the high-density population areas more than the census data. The anonymized GPS data is commercially available from Agoop Corp., which complies with the Japanese Personal Information Protection Act. Human Subjects Research Ethics Review Committee of Tokyo Institute of Technology authorized that no ethical review is needed in this case.
We pre-processed the original data to obtain 15 min interval data using linear interpolation, discarding data that cannot be interpolated sufficiently. The timestamps were every 15 min from 05:00 to 24:00. Data for midnight were discarded because most people stay in bed and do not spread infection. The home/work city data were converted into the home/work 1 km-square data. A home square is a 1 km-square where the user stays at 05:00 in the home city, and a work square is where the user visits for at least 5 h in a day in the work city. Data for iOS smartphones were discarded because they are not gathered if the user is not moving due to the iOS specification, making staying at home undetectable. The converted data cover approximately 0.4 million users in Japan. We investigated the population corresponding to a GPS point to renormalize the GPS data to the actual population distribution. We counted the users at 05:00 for each home prefecture defined by their home city’s prefecture for each day. The effective population of one GPS data point in a prefecture was calculated as the ratio of the actual population in the prefecture in Oct. 201933 to the number of users counted. At the beginning of 2020, the typical values ranged from 250 in metropolitan areas to 500 in the countryside.
Results
Epidemiological data
First, we analysed COVID-19 epidemiological data in the Tokyo metropolitan area (Tokyo, Kanagawa, Saitama, Chiba, Ibaraki, Gumma, Tochigi, and Yamanashi prefectures) from 1 Feb. 2020 to 31 Oct. 202128. t denotes the day count from the beginning of 2020 (e.g., \(t=1\) means 1 Jan. 2020) and \(I^{\mathrm{new}}(t)\), the number of new COVID-19 infection cases on day t. The effective reproduction number \(R_{e}^{\mathrm{data}}(t)\) was estimated as34
where \(\gamma ^{-1} = 5\) is the mean generation time35. Figure 1a shows a plot of the effective reproduction number. Before 30 Mar. 2020, the data is unreliable because the number of PCR tests per day is under 10,000 in this period, and the confirmation of infection is not considered comprehensive. In this span, the effective reproduction number was estimated using an average of 40 days. Assuming the effective reproduction number is constant as \(R_{e}^{\mathrm{data}}(t)=R_{e}^{\mathrm{avr}}\), Eq. (1) results in \(I^{\mathrm{new}}(t)\left( R_{e}^{\mathrm{avr}}\right) ^{4}=I^{\mathrm{new}}(t+4\gamma ^{-1})\). Here we sum the number of newly confirmed cases over the former 20 days and the latter 20 days of the 40 days. Then, we obtain \(R_{e}^{\mathrm{avr}} = \left( \frac{\sum _{t^{\prime }=t+20}^{t+39} I^{\mathrm{new}}(t^{\prime })}{\sum _{t^{\prime }=t}^{t+19}I^{\mathrm{new}}(t^{\prime })}\right) ^{1/4}\), where we did not take a 7-day moving average for the calculation of \(I^{\mathrm{new}}(t)\). That is, the average effective reproduction number of the 40 days is determined by the ratio of the number in the latter 20 days to that in the former 20 days to the power of 1/4. The difference from the estimation by the Cori method36 is shown in Supplementary note 4. The result of the method in this study is consistent with that of the Cori method. Similarly, we plotted the effective reproduction number limited in severe cases, which is defined using only the number of new severe cases instead of the number of all infections. The correlation between the effective reproduction numbers for all cases and severe cases is maximized if that for severe cases is regarded to be delayed by 12 days. Both are consistent with each other after \(t=200\). The 1st to 4th SoEs in the Tokyo metropolitan area5 are also shown. During each SoE, the effective reproduction number started to decrease after around 2 weeks. The Delta variant was first detected in Japan on 18 May 2021 (i.e., \(t=504\))37.
Human mobility and activity patterns
We analysed empirical human mobility data in Japan, especially in the Tokyo metropolitan area, from 1 Jan. 2020 to 30 Sep. 202130. We calculated the population density at each time and location to estimate the number of infection routes. The infection probability of each infection route depends on how people contact each other and, consequently, on the activity. Therefore, we categorized the GPS user activity into four states, staying home (‘home’), moving (‘move’), staying out (‘stay’), and working (‘work’), to distinguish the activity-dependent infection probability with a time resolution of 15 min. The definition is as follows: (a) home is the state when the user is within the 1 km-square corresponding to their home (home square), (b) work is the state when the user is within the 1 km-square corresponding to their workplace (work square), (c) move is the state in which the user is within a 1 km-square that is different from the one in which they were 15 min ago and that does not correspond to either their home or workplace, and (d) stay-out is any other situation, e.g., in restaurants, department stores, or stadiums away from their home. Here, the home and work locations (home/work square) are estimated from the home/work city data in the GPS dataset, as explained in the “Methods” section. For an example of the categorization, if a user is at home/work 1 km-square, then the activity type is home/work (home is preferred if both are the same). If a user moves from another 1 km-square, the activity is move unless the following location is not home/work square. Even if the user stays for over 5 h, the activity type is not work in a location other than the work city. Note that home activity is not limited to staying at home literally. It also includes activities within a 1 km-square around the home location, such as shopping and having a meal at a restaurant.
Figure 1b shows a plot of the time series of the mean duration per day of the four activities, where we take the 7-day moving average for avoiding the periodicity related to weekdays. The total time per day is 19 h because the data point is from 05:00 to 24:00. During the pandemic, roughly speaking on average, users were at home for 12 h (not including midnight), at work for 3 h, moved for 2 h, and stayed out for 2 h. Note that this is averaged over all days including holidays for all users. We also calculated the correlation among the mean durations of the four states in the span \(200 \le t < 500\). This span was used later for model parameter fitting. Move was strongly correlated with work and stay.
Model
We derived an infection model based on the classical SIR model using the GPS data for the four states with several assumptions. The classical SIR model is given by the following set of equations:
where t is the time; S(t), the number of susceptible people; I(t), the number of infectious people; R(t), the number of removed people (either recovered, quarantined, or dead); and \(N=S(t)+I(t)+R(t)\), the total number. The parameter \(\beta\) represents the strength of infection spread, and the parameter \(\gamma\) determines the timescale from infected state to recovered or quarantined state. The reciprocal of \(\gamma\) is the mean generation time35: \(\gamma ^{-1}=5\). We adopted the following four assumptions:
-
(A1)
the number of removed people is much smaller than the number of susceptible people,
-
(A2)
no nonlocal infections occur between different spaces or time periods,
-
(A3)
infectious people are distributed uniformly in the target area (i.e., Tokyo metropolitan area), and
-
(A4)
no infections occur between different activities.
Assumption (A1) comes from the low cumulative number of COVID-19 cases in Japan (below 2% on 30 Oct. 2021)28. Then, the number of susceptible people is assumed to be constant, that is, the whole population of Japan. Assumption (A2) means that we discard indirect infections such as droplet infection with long-distance or contact infection after a long time38. Nonetheless, indirect infections are effectively included if the population in the target area does not change drastically. Assumption (A3) is used for simplifying the model. As our GPS data does not contain users’ privacy including the information of infection, we simply assume that infected people distribute uniformly. We discard the effect from infected people who went outside or inside of the Tokyo metropolitan area. Assumption (A4) is the intuition that social contacts between different activities are fewer than those within the same activity. In the Tokyo metropolitan area (and possibly most other metropolitan cities), the office space and restaurants are separated even though they are on the same 1 km-square. Working people in office towns are considered to have a comparably small interaction with the stay-out people because most of them are working in their offices. In addition, the workers contacting with stay-out people (e.g., waiters contacting customers in restaurants) are strongly recommended to wear masks5 and considered to be less infectious.
By applying the first assumption (A1), we only need to consider the second equation of the SIR model because infection causes a negligible change in S(t). In this case, the equation becomes
where \(R_{e}(t)=\frac{\beta }{N\gamma }S(t)\) is an effective reproduction number. We discretize the equation by setting \(dt=\gamma ^{-1}\) as
where \({\tilde{\beta }} = \frac{\beta }{N\gamma }\). This discretization is needed for fitting the empirical data.
Next, we adopt the second assumption (A2) of local infections. We suppose that all people in the target area have equal possibility of close contact with each other. In this case, the time evolution of I(t) is described by the product of I(t) and S(t) because the number of infection routes or social contacts is proportional to I(t)S(t). In this sense, the classical SIR model is exact only if the population density is uniform. However, this is not the case in real situations. Therefore, we divide the area into squares (grids) in which social contacts are equally possible among all people in a square. The size of the squares is set to 1 km \(\times\) 1 km. The case of 100 m \(\times\) 100 m9 is discussed in Supplementary note 1. Thus, we obtain
where m is the square label, and \(\tau\) is the time label in the considered area A and date t. The area A is the Tokyo metropolitan area, and it contains 36,898 1 km-squares. The time label \(\tau\) moves in 15 min intervals around the current date t for 5 days, from \(t-2\) to \(t+2\), according to the discretization unit \(\gamma ^{-1}=5\). The parameter \({\tilde{\beta }}\) takes a different value from the earlier equations because the space size and period are different: \({\tilde{\beta }}\) is multiplied by (number of 1 km-squares)/(number of time steps) if the density is uniform in space and time.
The third assumption (A3) gives the following relation:
where \(S(t)=\sum _{m\in A}S_{m\tau }(t)\), \(I(t)=\sum _{m\in A}I_{m\tau }(t)\), and A is the target area. Therefore, the effective reproduction number is given as
show consistent differencesFinally, assumption (A4) is applied. We divide the population density \(S_{m\tau }(t)\) into that of the activities \(S_{am\tau }(t)\) and introduce activity-dependent parameters \(\beta _{a}\). The equation is
where the time label \(\tau\) moves from \(t-2\) to \(t+2\), and the activity label a takes the values ‘home’, ‘move’, ‘stay’, and ‘work’. The coefficient \(\beta _{a}\) is supposed to be constant in the timespan considered. For simplicity, we approximate the sum as follows:
for the equation to have values only on date t, where the time label \(\tau\) moves only on date t in the right-hand side. Thus, the effective reproduction number is a linear combination of the human mobility time series,
by introducing \(M_{a}(t)=\gamma ^{-1}\sum _{m\in A,\tau \in t}\left( S_{a m\tau }(t)\right) ^{2}\), the population moment of activity a on date t. We emphasize that the effective reproduction number is simply determined only by the human mobility on the same day. The parameters \(\beta _{a}\) are interpreted as the infection rate during activity a per infection route and 15 min period.
We determine the model parameters \(\beta _{a}\) to fit the effective reproduction number; however, the observed effective reproduction number is based on the report date and not the infection date. We have to consider a typical time delay from infection to report \(\Delta T\) to deal with it. In light of this effect, Eq. (12) is modified as follows:
Here, we use the 7-day moving average of \(M_{a}(t)\) to remove the periodicity of weekdays as well as the definition of the empirical effective reproduction number. We estimate the time delay from infection to report \(\Delta T\) in the period \(200 \le t < 500\). Figure 2 shows the correlation between the effective reproduction number and the population moment of each activity delayed by \(\Delta T\). The peak at \(\Delta T \sim 14\) is the delay of the infection reports, whereas the trough at \(\Delta T \sim -60\) means that human mobility is decreased after infection spread. The optimal value \(\Delta T=14\) is derived by minimizing a fitting loss to the effective reproduction number by the population moments as a function of \(\Delta T\), where the fitting loss is the squared distance in the log-10 space, to prevent the estimation from being dominated only by the significant value of the effective reproduction number. This value is consistent with that reported in a previous study39,40.
The remaining problem is the multicollinearity of the move population moment; the correlation of the population moments is 0.89 between move and stay-out and 0.90 between move and work. To address this, we assume that the coefficient \(\beta _{a}\) for move and work is the same because the situations in these two activities are similar: people wear masks in trains (move) and offices (work) but not always in restaurants (stay-out).
We fit the model parameters to the data for all cases in the period \(200 \le t < 500\). We do not use data for \(t<200\) because the social situation drastically changed during the 1st SoE (e.g., mask distribution in Japan41), and the data are not stationary. Furthermore, the effective reproduction numbers for all cases and severe cases are different: epidemiological data are not reliable in that span. The fitting is performed under the log space to prevent the estimation from being dominated only by the significant value of \(R_{e}(t)\). As a result, the parameters are \(\beta _{\mathrm{home}}=(1.2 \pm 0.1)\times 10^{-7}\), \(\beta _{\mathrm{move}}=\beta _{\mathrm{work}}=(6.2 \pm 2.8)\times 10^{-8}\), and \(\beta _{\mathrm{stay}}=(3.4 \pm 0.2)\times 10^{-6}\), where the unit is per 15 min. Figure 3 shows that the model explains the data in the period; however, they are different before \(t=200\) and after \(t=500\). Before \(t=200\), the effective reproduction number for severe cases is more consistent with the model result. After \(t=500\), the effects of the Delta variant and the vaccination result in differences29,37. Thus, the classical SIR model is suitable for direct GPS data modelling. Despite its simplicity, we observe that the model quantitatively explains the COVID-19 epidemic.
Next, we change the threshold concerning the activity duration to investigate the sensitivity of the fitting parameters. As explained in the “Methods” section, a work 1 km-square has been detected by the user’s dwelling for at least 5 h in their work city. Here, we check the cases of 4 h and 6 h minimum dwelling time. The parameters for the 4 h, 5 h, and 6 h cases are obtained in Table 1. The estimated values of \(\beta _{\mathrm{home}}\) and \(\beta _{\mathrm{stay}}\) are considered to be unchanged within the margin of the standard deviation, while the values of \(\beta _{\mathrm{move}}\) and \(\beta _{\mathrm{work}}\) show a tendency to take smaller values for longer minimum dwelling time. This difference is because the infection rate in move and work is much lower than the other parameters and has a more significant fluctuation relatively. If we decrease the minimum dwelling time, the work hour tends to be counted more. For example, the spent time percentages (move, stay-out, work) in public are (25%, 32%, 42%) for the 4-h case, (26%, 34%, 40%) for the 5-h case, and (26%, 36%, 38%) for the 6-h case, on the average of a week starting from 9 to 15 January 2020, in which COVID-19 did not affect the human mobility.
We also estimate the sensitivity of move activity. For explanation, here we suppose the case that the user is not at home or work. In this supposition, move is the state in which the user is out of a 1 km-square where they were located 15 min ago. This definition means that the minimum moving time is evaluated as 15 min. Instead, we consider the interpretation that the minimum moving time is 30 min. We append an additional criterion for move that the 1 km-square in the next 15 min is different from the present. For instance, if a user is in A at 6:00 and B at 6:15, the state is move both at 6:00 and 6:15 under this criterion, while it is move only at 6:15 under the previous criterion. The result in this case is shown in Table 1. The parameter \(\beta _{\mathrm{stay}}\) is increased because the effective reproduction number is supported by the decreased stay-out time. The ratio \(\beta _{\mathrm{stay}} / \beta _{\mathrm{home}}\) is increased to about 35.
In addition, we check the case with the assumption of the relation, \(\beta _{\mathrm{move}}=\beta _{\mathrm{stay}}\). We have investigated the case of \(\beta _{\mathrm{move}}=\beta _{\mathrm{work}}\) because of the multicollinearity of the population moment of move with that of stay-out and work. If we assume the opposition case \(\beta _{\mathrm{move}}=\beta _{\mathrm{stay}}\) instead, the parameters are changed as in Table 1. As a result, the parameters for home, work and stay decrease, while the parameter for move soars. The ratio \(\beta _{\mathrm{stay}} / \beta _{\mathrm{home}} = 29\) is still close to that in the case of \(\beta _{\mathrm{move}}=\beta _{\mathrm{work}}\).
Delta variant and vaccination effects
The effects of the Delta variant and vaccination are estimated as follows42. Let \(r_{\delta } = R^{\delta } / R^{0}\) be the ratio of the effective reproduction number of the Delta variant \(R^{\delta }\) to that of the other existing variants \(R^{0}\). Furthermore, we assume that the numbers of people infected by the Delta variant \(I^{\delta }(t)\) and the existing variants \(I^{0}(t)\) are the same at \(t=t_{\delta }\). If both variants increase independently, the ratio is \(I^{\delta }(t)/I^{0}(t) = (R^{\delta })^{\gamma (t-t_{\delta })}/(R^{0})^{\gamma (t-t_{\delta })} = r_{\delta }^{\gamma (t-t_{\delta })}\). By definition, the effective reproduction number of mixed variants \(R^{\mathrm {mix}}(t)\) is
where \(S(x)=1/(1+\exp (-x))\) is the standard sigmoid function. The range of \(R^{\mathrm {mix}}(t)\) is \(R^{0}\) to \(R^{\delta }\). Therefore, the effect of the Delta variant is introduced by multiplying by the factor \(1+(r_{\delta }-1)S(\gamma \log (r_{\delta })\cdot (t-t_{\delta }))\). The vaccination model is also multiplied by a factor
where \(C_{V}\) is the infection prevention ratio of the vaccine43,44, and \(R_{V}(t)\) is the vaccination ratio in the target area. We approximate the vaccination ratio as \(R_{V}(t)=\)(number of vaccinations in Japan up to date t)/2(Japanese population) and fit its data by a sigmoid function
as shown in Fig. 4. The parameters are \(t_{V}=590\), \(\gamma _{V}^{-1}=34.6\), and \(R_{V}^{\infty }=0.817\). The sigmoid function S(x) is suitable for fitting the vaccination ratio because it is saturated in the limit \(x\rightarrow \pm \infty\).
We determine the other parameters of the Delta variant and the vaccination after \(t=500\) by fitting the epidemiological data as \(r_{\delta }=1.68 \pm 0.03\), \(t_{\delta }=530 \pm 2\), and \(C_{V}=0.99 \pm 0.02\), where the model equation is finally
Figure 3 implies that the effects of the Delta variant and the vaccination are fully explained. The values of the parameters, effective reproduction number ratio of the Delta variant, and vaccination’s effectiveness are comparable to those reported previously42,43,44,45,46.
Components of effective reproduction number
The effective reproduction number for each activity, location, and person was investigated. Figure 5 shows the result for each activity: the sum equals the total effective reproduction number. The stay-out activity dominates the change in the whole effective reproduction number because the parameter for stay-out is 28 and 55 times larger than that for home and move/work, respectively. The effective reproduction number at each location m is defined by the partial sum of Eq. (12)
as shown in Fig. 6 (a) 2 months just before the 1st SoE, (b) during the 1st SoE, (c) 2 months just before the 2nd SoE, and (d) during the 2nd SoE. The movie of the effective reproduction number map each day is provided as Supplementary movie 1, where a raw value is taken instead of a 7-day moving average to calculate the population moments. The sum over the Tokyo metropolitan area gives the total effective reproduction number. High-risk zones are concentrated in downtown Tokyo. Figure 7 shows the cumulative distribution. This figure shows that the distribution is almost the same for low-risk regions, and a power law approximates it for high-risk regions whose exponents are (a) − 0.93, (b) − 1.73, (c) − 1.03, and (d) − 1.12. Exponents close to − 1 imply that the highest-risk regions dominantly affect the total effective reproduction number. In fact, the top-20 highest-risk 1 km-squares in the Tokyo metropolitan area (36,898 km\(^{2}\)) contribute (a) 40%, (b) 13%, (c) 32%, and (d) 27% of the total effective reproduction number. Consequently, restrictions in the highest-risk downtown regions effectively suppress infections. For example, the total effective reproduction number is reduced by 17%, 25%, and 36% if the infection rate is suppressed to 10% in the top-5, -10, and -20 highest-risk 1 km-squares in period (a), respectively. With regard to real restrictions, the 1st SoE successfully reduced the population density in the downtown region. The SoEs reduced the exponents; however, the change amount of the exponent before and after SoEs also decreased.
The effective reproduction number for each infected person47,48 is defined by the mean number of people to whom they would spread the infection; the average for all people gives the overall effective reproduction number. We calculate it using the time series of the activity and the location in a day by the following relation:
where \(\tau\) is the time label with 15 min intervals, and \(a_{i}(\tau )\) and \(m_{i}(\tau )\) represent the history of person i. We refer to this value as the GPS-based individual effective reproduction number. Figure 8 shows the cumulative distribution function (CDF) of \(R_{e}^{(i)}(t)\), where we collect all data for (a) 2 months just before the 1st SoE, (b) during the 1st SoE, (c) 2 months just before the 2nd SoE, and (d) during the 2nd SoE. The distribution is approximated by truncated power laws. The exponents between − 1 and 0 in (a) and (c) clearly indicate that significant cluster infections or superspreaders dominate the total infection. In fact, the top 10% of people contribute to (a) 58%, (b) 38%, (c) 54%, and (d) 50% of the overall effective reproduction number. A comparison before and during SoEs reveals that the cut-off does not change while the exponent is decreased. This implies that the effects of the SoE are represented by the reduction of the exponent in the CDF power law; it results in the suppression of the overall effective reproduction number. The cut-off corresponds to the largest infection clusters or superspreaders, which was not suddenly changed by SoEs; however, we observe that it decreased slowly from the 1st to the 2nd SoE.
Discussion
In this study, we verified human-activity-dependent COVID-19 infection rates using smartphone GPS data. We classified human activity patterns into four types, ‘home’, ‘move’, ‘stay-out’, and ‘work’, and estimated the number of social contacts for each daily activity in the Tokyo metropolitan area. Then, we derived an equation from the classical SIR model for GPS data with activity information to be used directly. The model successfully predicted effective reproduction numbers for future reporting. We demonstrated that infection risk is the highest when the people are not at home or work or not moving. By quantitatively understanding the effect of human mobility on infection spread, we distinguished the impact of the Delta variant and vaccination. Furthermore, we derived formulas that divide the effective reproduction number into the contributions from each location or individual. These formulas enabled us to observe the distributions of infection risk. As applications, we present an effective reproduction number map and GPS-based individual effective reproduction numbers, whose distributions obey the power law or truncated power law.
The model provides a comprehensive understanding of infection spread of epidemics. A previous research by Yabe et al.9 investigated the COVID-19 spread in the Tokyo metropolitan area during 2020 in detail and discovered the nonlinear relation between the effective reproduction number and the contact index, the number of social contacts among people not in their homes. The nonlinear relation is explained by the change in the ratio of the population moment of stay-out to that of work and move. Another previous research conducted by Rüdiger et al.15 shows that the non-uniformity in the infective contact network has an important role as well as the total number of social contacts. The difference from this study in approach is the origin of the non-uniformity in social contacts: the non-uniform distribution of the infectious people in their study [c.f. it can break the assumption (A3)], and the type of the social contacts in this study.
We validated the model in the following two ways: demonstrating the stability of prediction by using a shorter fitting span and comparing the results with a previous study. We have already shown the result of fitting in the span \(200 \le t < 500\), but checking the result in another shorter span is meaningful as cross-validation. Here we fit the parameters in the shorter span \(200 \le t < 300\) to check the model stability as shown in Supplementary note 4. We observe that both errors in the parameters and prediction are reasonable despite the data containing non-stationarity. Another comparison is with the airborne transmission dynamics27 in possible situations in daily life. Previous research by Prentiss et al.27 indicated that the infection risk is increased by 10 times or more depending on the use of masks, filtration, and talking. Their results are consistent with ours that stay-out activity, including eating at restaurants, enhances the infection rate more than 10 times.
The assumptions (A1)–(A4) determines the limitation of our model, but relaxing some of them can lead to a new application or complex model for epidemics. If we do not adopt the assumption (A1), the effective reproduction number is decreased to the ratio \((1-R(t)/N)\). This is because the possibility of infection in each location decreases to the same ratio. For the assumption (A3), the non-uniformity in the infective contact network is considered. Although it is impossible to track each person across the days in the GPS dataset for privacy protection, preferences of people with high infection risk could be assumed or observed in another dataset to estimate the spatial distribution of infected people.
For future research, the infection spread to the whole country could be simulated. In the second wave of the pandemic in Japan, after the infection spread among metropolitan areas, it exploded into countryside areas28. We have already started the research of modelling the infection spread across distant regions. The model will be adopted in other metropolitan areas to validate the model further, and the parameters possibly dependent on the regions will be determined and discussed. After analysing the bulk property of epidemics in each area, we will introduce the interaction between the regions into the model, using the GPS data of travelers.
For practical applications, the result of this study may be helpful for evidence-based policy making for epidemic control, such as non-pharmaceutical interventions under pandemics. The following questions will be answered: “How does the human mobility decrease in target areas affect the effective reproduction number?” and “What intervention is the most effective under given constraints?”. In our estimation, the contribution from the downtown area fills a large amount of the effective reproduction number. It will be possible to suggest the locations where activities such as stay-out should be limited enough to realize a goal effective reproduction number. This model can also be applied to real-time forecasting of the infection spread and assessing the individual infection risk. We are preparing a web page that will illustrate the infection risk of each location and time on a map, using the epidemiological data and the semi-real-time GPS data provided by Agoop. People can check the infection risks of visiting places on the web page. Also, we suggest implementation in a smartphone application that tracks the user’s location each time. The smartphone application downloads the estimated infection risks of the visiting places each day. After the day, the individual effective reproduction number, which represents both the risk of transmitting infection and being transmitted, is given a notice to the users for risk management. Finally, we are proceeding with a study to estimate the trade-off relations between the infection spread and the economy, using economic data on each site together. We will analyse the relationship between human mobility and the companies’ sales in the leisure industry, such as restaurants, department stores, and traveling. The sales of those companies that are generated in each location are estimated as a function of human mobility, with the data of the spatial distributions and the mean expenditure of such restaurants or stores. This investigation bridges the effective reproduction number with the economy in each location.
Data availibility
The data that support the findings of this study are available from Agoop Corp.30,50 but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of Agoop Corp.30,50
References
WHO Coronavirus (COVID-19) Dashboard. https://covid19.who.int/. Accessed 13 March 2022.
Timeline: WHO’s COVID-19 Response. https://www.who.int/emergencies/diseases/novel-coronavirus-2019/interactive-timeline/. Accessed 13 March 2022.
Huang, C. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 395, 475–481. https://doi.org/10.1016/S0140-6736(20)30183-5 (2020).
Anderson, R. M., Heesterbeek, H., Klinkenberg, D. & Hollingsworth, T. D. How will country-based mitigation measures influence the course of the COVID-19 epidemic?. Lancet 395, 931–934 (2020).
Tokyo Metropolitan Government. https://www.metro.tokyo.lg.jp/english/index.html. Accessed 13 March 2022.
Ashraf, B. N. Economic impact of government interventions during the COVID-19 pandemic: International evidence from financial markets. J. Behav. Exp. Finance 27, 100371. https://doi.org/10.1016/j.jbef.2020.100371 (2020).
Aristovnik, A., Keržič, D., Ravšelj, D., Tomaževič, N. & Umek, L. Impacts of the COVID-19 pandemic on life of higher education students: A global perspective. Sustainability 12, 8438. https://doi.org/10.3390/su12208438 (2020).
Di Domenico, L., Pullano, G., Sabbatini, C. E., Boëlle, P.-Y. & Colizza, V. Modelling safe protocols for reopening schools during the COVID-19 pandemic in France. Nat. Commun. 12, 1073. https://doi.org/10.1038/s41467-021-21249-6 (2021).
Yabe, T. et al. Non-compulsory measures sufficiently reduced human mobility in Tokyo during the COVID-19 epidemic. Sci. Rep. 10, 18053. https://doi.org/10.1038/s41598-020-75033-5 (2020).
Wellenius, G. A. et al. Impacts of social distancing policies on mobility and COVID-19 case growth in the us. Nat. Commun. 12, 3118. https://doi.org/10.1038/s41467-021-23404-5 (2021).
Kraemer, M. U. The effect of human mobility and control measures on the COVID-19 epidemic in China. Science 368, 493–497. https://doi.org/10.1126/science.abb4218 (2020).
Lai, S. Effect of non-pharmaceutical interventions for containing the COVID-19 outbreak in China. medRxivhttps://doi.org/10.1101/2020.03.03.20029843 (2020).
Pepe, E. COVID-19 outbreak response: A first assessment of mobility changes in Italy following national lockdown. medRxivhttps://doi.org/10.1101/2020.03.22.20039933 (2020).
Oliver, N. et al. Mobile phone data for informing public health actions across the COVID-19 pandemic life cycle. Sci. Adv. 6, eabc0764. https://doi.org/10.1126/sciadv.abc0764 (2020).
Rüdiger, S. et al. Predicting the SARS-CoV-2 effective reproduction number using bulk contact data from mobile phones. PNAShttps://doi.org/10.1073/pnas.2026731118 (2021).
Oliver, N. Mobile phone data for informing public health actions across the COVID-19 pandemic life cycle. Sci. Adv. 6, eabc0764 (2020).
Pepe, E. et al. COVID-19 outbreak response, a dataset to assess mobility changes in Italy following national lockdown. Sci. Data 7, 1–7 (2020).
Gao, S. et al. Association of mobile phone location data indications of travel and stay-at-home mandates with COVID-19 infection rates in the US. JAMA Netw. Open 3, e2020485–e2020485 (2020).
Huang, X., Li, Z., Jiang, Y., Li, X. & Porter, D. Twitter reveals human mobility dynamics during the COVID-19 pandemic. PLoS ONE 15, 1–21. https://doi.org/10.1371/journal.pone.0241957 (2020).
Grantz, K. H. et al. The use of mobile phone data to inform analysis of COVID-19 pandemic epidemiology. Nat. Commun. 11, 4961. https://doi.org/10.1038/s41467-020-18190-5 (2020).
Hunter, R. F. et al. Effect of COVID-19 response policies on walking behavior in us cities. Nat. Commun. 12, 3652. https://doi.org/10.1038/s41467-021-23937-9 (2021).
Arik, S. O. et al. Interpretable Sequence Learning for COVID-19 Forecasting. https://storage.googleapis.com/covid-external/COVID-19ForecastWhitePaper.pdf.
Shida, Y., Takayasu, H., Havlin, S. & Takayasu, M. Universal scaling of human flow remain unchanged during the COVID-19 pandemic. Appl. Netw. Sci. 6, 1–13. https://doi.org/10.1007/s41109-021-00416-0 (2021).
Worby, C. J. & Chang, H.-H. Face mask use in the general population and optimal resource allocation during the COVID-19 pandemic. Nat. Commun. 11, 4049. https://doi.org/10.1038/s41467-020-17922-x (2020).
Kwon, S. et al. Association of social distancing and face mask use with risk of COVID-19. Nat. Commun. 12, 3737. https://doi.org/10.1038/s41467-021-24115-7 (2021).
Moritz, S. et al. The risk of indoor sports and culture events for the transmission of COVID-19. Nat. Commun. 12, 5096. https://doi.org/10.1038/s41467-021-25317-9 (2021).
Prentiss, M., Chu, A. & Berggren, K. K. Finding the infectious dose for COVID-19 by applying an airborne-transmission model to superspreader events. PLoS ONE 17, 1–23. https://doi.org/10.1371/journal.pone.0265816 (2022).
Ministry of Health, Labour and Welfare, Japan. Visualizing the data: Information on COVID-19 infections. https://covid19.mhlw.go.jp/extensions/public/en/index.html. Accessed 13 March 2022
Prime Minister of Japan and His Cabinet. https://japan.kantei.go.jp/ongoingtopics/vaccine.html. Accessed 13 March 2022.
Agoop. https://www.agoop.co.jp/. Accessed 4 July 2022 (in Japanese)
Agoop Privacy Policy. https://www.agoop.co.jp/privercy/. Accessed 4 July 2022 (in Japanese)
“Communications Usage Trend Survey” in 2019 Compiled, Ministry of Internal Affairs and Communications, Japan. https://www.soumu.go.jp/johotsusintokei/tsusin_riyou/data/eng_tsusin_riyou02_2019.pdf. Accessed 7 June 2022.
Statistics Bureau of Japan. https://www.stat.go.jp/english/index.html. Accessed 13 March 2022.
Alessandro, A. & Tommi, A. Effective Reproduction Number Estimation from Data Series. Tech. Rep. JRC121343, Publications Office of the European Union (2020). https://doi.org/10.2760/036156.
Nishiura, H., Linton, N. M. & Akhmetzhanov, A. R. Serial interval of novel coronavirus (COVID-19) infections. Int. J. Infect. Dis. 93, 284–286. https://doi.org/10.1016/j.ijid.2020.02.060 (2020).
Cori, A., Ferguson, N. M., Fraser, C. & Cauchemez, S. A new framework and software to estimate time-varying reproduction numbers during epidemics. Am. J. Epidemiol. 178, 1505–1512. https://doi.org/10.1093/aje/kwt133 (2013).
New variants of a novel coronavirus (SARS-CoV-2) of concern for increased infectivity and transmissibility and altered antigenicity (12th report). https://www.niid.go.jp/niid/ja/diseases/ka/corona-virus/2019-ncov/2484-idsc/10554-covid19-52.html (in Japanese)
Chia, P. Y. et al. Detection of air and surface contamination by SARS-CoV-2 in hospital rooms of infected patients. Nat. Commun. 11, 2800. https://doi.org/10.1038/s41467-020-16670-2 (2020).
Linton, N. M. et al. Incubation period and other epidemiological characteristics of 2019 novel coronavirus infections with right truncation: A statistical analysis of publicly available case data. J. Clin. Med. 9, 538 (2020).
Shimada, T., Suimon, Y. & Izumi, K. On the relation between active population and infection rate of COVID-19. arXiv preprint arXiv:2008.07791v2 (2020).
Policy of distributing 2 cloth masks per address. https://www3.nhk.or.jp/news/html/20200401/k10012362911000.html. Accessed 13 March 2022 (in Japanese) .
Figgins, M. D. & Bedford, T. SARS-CoV-2 variant dynamics across us states show consistent differences in effective reproduction numbers. medRxivhttps://doi.org/10.1101/2021.12.09.21267544 (2021).
About Pfizer’s COVID-19 Vaccine. https://www.mhlw.go.jp/stf/seisakunitsuite/bunya/vaccine_pfizer.html. Accessed 13 March 2022 (in Japanese).
Marziano, V. et al. The effect of COVID-19 vaccination in Italy and perspectives for living with the virus. Nat. Commun. 12, 7272. https://doi.org/10.1038/s41467-021-27532-w (2021).
Liu, Y. & Rocklöv, J. The reproductive number of the Delta variant of SARS-CoV-2 is far higher compared to the ancestral SARS-CoV-2 virus. J. Travel Med. 28, taab124. https://doi.org/10.1093/jtm/taab124 (2021).
Information for Healthcare Professionals on COVID-19 Vaccine Pfizer/BioNTech (Regulation 174). https://www.gov.uk/government/publications/regulatory-approval-of-pfizer-biontech-vaccine-for-covid-19/information-for-healthcare-professionals-on-pfizerbiontech-covid-19-vaccine. Accessed 13 March 2022.
Rocha, L. E. C. & Masuda, N. Individual-based approach to epidemic processes on arbitrary dynamic contact networks. Sci. Rep. 6, 31456. https://doi.org/10.1038/srep31456 (2016).
Lloyd-Smith, J. O., Schreiber, S. J., Kopp, P. E. & Getz, W. M. Superspreading and the effect of individual variation on disease emergence. Nature 438, 355–359. https://doi.org/10.1038/nature04153 (2005).
Generic Mapping Tools (GMT). https://www.generic-mapping-tools.org/.
Dynamic Population Data, Agoop. https://www.agoop.co.jp/service/dynamic-population-data/. Accessed 4 July 2022 (in Japanese).
Acknowledgements
We thank Kenta Yamada, Yukie Sano, Takashi Shimada, and Takahiro Nishi for the helpful discussions. We thank Agoop for providing the GPS datasets. This work was supported by the Tokyo Tech World Research Hub Initiative (WRHI) Program of the Institute of Innovative Research, Tokyo Institute of Technology.
Funding
This work was supported by a Grant-in-Aid for Scientific Research (A) (Grant Number 21H04595) and Grant-in-Aids for Scientific Research (B) (Grant Number 18H01656 and 22H01711). The authors thank the Tokyo Tech World Research Hub Initiative (WRHI) Program of the Institute of Innovative Research, Tokyo Institute of Technology, for financial support.
Author information
Authors and Affiliations
Contributions
M.T. was the project leader and directed the entire research plan and writing of the manuscript. Y.S. pre-processed the raw GPS data and appended the activity information. J.O. analysed the pre-processed GPS data, developed the models, performed the numerical calculations, and wrote the manuscript. H.T. improved the data analysis and modelling methods and revised the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Supplementary Movie 1.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ozaki, J., Shida, Y., Takayasu, H. et al. Direct modelling from GPS data reveals daily-activity-dependency of effective reproduction number in COVID-19 pandemic. Sci Rep 12, 17888 (2022). https://doi.org/10.1038/s41598-022-22420-9
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-022-22420-9
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.