A dataset to assess mobility changes in Chile following local quarantines

Fighting the COVID-19 pandemic, most countries have implemented non-pharmaceutical interventions like wearing masks, physical distancing, lockdown, and travel restrictions. Because of their economic and logistical effects, tracking mobility changes during quarantines is crucial in assessing their efficacy and predicting the virus spread. Unlike many other heavily affected countries, Chile implemented quarantines at a more localized level, shutting down small administrative zones, rather than the whole country or large regions. Given the non-obvious effects of these localized quarantines, tracking mobility becomes even more critical in Chile. To assess the impact on human mobility of the localized quarantines, we analyze a mobile phone dataset made available by Telefónica Chile, which comprises 31 billion eXtended Detail Records and 5.4 million users covering the period February 26th to September 20th, 2020. From these records, we derive three epidemiologically relevant metrics describing the mobility within and between comunas. The datasets made available may be useful to understand the effect of localized quarantines in containing the COVID-19 pandemic.

Governments have started to collaborate with mobile network operators to estimate the effectiveness of control measures in several countries 2,10,46-50. To assess the impact of the NPIs imposed by Chilean authorities in response to the epidemic, we analyse a mobile phone dataset provided by Telefónica Chile, which comprises 31 billion eXtended Detail Records (XDRs) and 5.4 million users distributed all over the country, covering the period from February 26th, 2020 to September 20th, 2020. An XDR is created every fifteen minutes if a certain threshold of traffic has been reached, thus describing individual movements in great detail 21. From the XDRs, we derive three epidemiologically relevant metrics: the Index of Internal Mobility (IM int ), which quantifies the amount of mobility within each comuna of the country; the Index of External Mobility (IM ext ), which quantifies the mobility between comunas; and the Index of Mobility (IM), which considers any movement, both within and between comunas. We analyse how these metrics change as the COVID-19 epidemic spreads in Chile, highlighting a considerable heterogeneity of response to local quarantines across the country.
The datasets we make available will grow as time goes by and, to the best of our knowledge, are the only ones describing mobility changes and dates of local quarantines in Chile at the comuna level. They can be used not only for fighting the COVID-19 epidemic but will also benefit other research and applications such as emergency response 51,52 and crowd flow prediction 14,53-55. The datasets described are currently used at all levels of the Chilean government.

Methods
Mobile phone operators collect several different streams of mobile phones' interactions with the cellular network for billing and operational purposes. Among them are the eXtended Detail Records (XDRs), a mixture of human- and device-driven events, triggered either by explicitly requesting an HTTP address or by automatically downloading content from the Internet (e.g., emails) every 15 minutes and at certain traffic thresholds. Formally, an XDR is a tuple (u, t, A, k), in which there is only one tower A involved, u is the user's identifier, t is the timestamp at which the record is created, and k is the amount of downloaded information (Fig. 1a). Rather than capturing trips, we are interested in detecting any "movement", i.e., any transition between two antennas. From an epidemiological point of view, transitions provide a useful indication of people's displacements and hence of the possible movements of the virus between areas within the same comuna or between two comunas. Even if an individual's movement between two antennas is not a trip from a semantic point of view, it still denotes a possible movement of the virus between those two antennas. To this purpose, from the XDRs of the individuals, we define two types of movement. Every time a user moves from one tower to another within the same comuna, they generate an intra-comuna movement. Every time a user moves from a tower to another in a different comuna, they generate an inter-comuna movement (Fig. 1b). For each day and comuna, we construct three indicators of mobility based on the intra- and inter-comuna movements:
1. IM int (Index of Internal Mobility), the number of intra-comuna movements for that day;
2. IM ext (Index of External Mobility), the number of inter-comuna movements for that day;
3. IM = IM int + IM ext (Index of Mobility).
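As an illustration of the counting step, the following minimal Python sketch derives the raw (not yet normalized) daily counts from a list of XDR tuples (u, t, A, k). It is not the authors' code. The function name `count_movements`, the `tower_comuna` lookup and the toy records are hypothetical. Two further assumptions are labeled in the comments: a movement is dated by the record at which the user arrives, and an inter-comuna movement is credited to both the origin and the destination comuna (in line with the reading of IM as movements "within, to, or from" a comuna).

```python
from collections import defaultdict
from datetime import datetime


def count_movements(xdrs, tower_comuna):
    """Count intra- and inter-comuna movements per (day, comuna).

    xdrs: iterable of (u, t, A, k) tuples, where t is a datetime.
    tower_comuna: hypothetical lookup from tower id to comuna id.
    Returns two dicts keyed by (date, comuna): raw IM int and raw IM ext.
    """
    im_int = defaultdict(int)
    im_ext = defaultdict(int)
    last_tower = {}  # most recent tower seen for each user
    # Process each user's records in chronological order.
    for u, t, tower, _k in sorted(xdrs, key=lambda r: (r[0], r[1])):
        prev = last_tower.get(u)
        last_tower[u] = tower
        if prev is None or prev == tower:
            continue  # first record of the user, or no tower change
        day = t.date()  # assumption: a movement is dated by its arrival record
        src, dst = tower_comuna[prev], tower_comuna[tower]
        if src == dst:
            im_int[(day, src)] += 1
        else:
            # Assumption: credit the movement to both origin and destination.
            im_ext[(day, src)] += 1
            im_ext[(day, dst)] += 1
    return im_int, im_ext


# Toy example with made-up towers and comunas.
towers = {"A": "c1", "B": "c1", "C": "c2"}
records = [
    ("u1", datetime(2020, 3, 9, 8, 0), "A", 1.2),
    ("u1", datetime(2020, 3, 9, 8, 15), "B", 0.8),  # A -> B: intra-comuna (c1)
    ("u1", datetime(2020, 3, 9, 8, 30), "C", 2.0),  # B -> C: inter-comuna (c1 -> c2)
    ("u1", datetime(2020, 3, 9, 8, 45), "C", 0.5),  # same tower: no movement
]
im_int, im_ext = count_movements(records, towers)
```

In the toy example, comuna c1 receives one intra-comuna and one inter-comuna movement, and comuna c2 receives one inter-comuna movement.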
All three indices range in [0, ∞), where a value of 0 indicates no mobility at all. We normalize the three indices with respect to the number of users that reside in the comuna, estimated as the total number of unique mobile devices whose home tower falls in that comuna. Each device's home tower is computed as the tower in which it has the highest number of XDRs during nighttime (between 7 pm and 7 am, inclusive) 21,56. The number of estimated resident users in the comunas is strongly correlated (R² = 0.96, slope = 4.37, intercept = 298.30) with the official population of the comunas as per the official 2017 Chilean Census.
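The home-tower estimation and the normalization can be sketched as follows. This is a minimal illustration under stated assumptions, not the released implementation: all function names, the `tower_comuna` lookup and the toy records are hypothetical, and ties between towers are broken arbitrarily because the text does not specify a rule.

```python
from collections import Counter, defaultdict
from datetime import datetime, time


def is_night(t):
    """True if t falls between 7 pm and 7 am, inclusive."""
    return t.hour >= 19 or t.time() <= time(7, 0)


def home_towers(xdrs):
    """Map each user to the tower with the most night-time XDRs."""
    night_counts = defaultdict(Counter)
    for u, t, tower, _k in xdrs:
        if is_night(t):
            night_counts[u][tower] += 1
    # Assumption: ties are broken arbitrarily (the first tower counted wins).
    return {u: c.most_common(1)[0][0] for u, c in night_counts.items()}


def residents_per_comuna(homes, tower_comuna):
    """Count the users whose home tower falls in each comuna."""
    residents = Counter()
    for tower in homes.values():
        residents[tower_comuna[tower]] += 1
    return residents


def normalise(raw_count, residents):
    """Divide a raw movement count by the estimated number of residents."""
    if residents <= 0:
        raise ValueError("comuna has no estimated residents")
    return raw_count / residents


# Toy example with made-up towers, comunas and records.
towers = {"A": "c1", "B": "c1", "C": "c2"}
records = [
    ("u1", datetime(2020, 3, 9, 22, 0), "A", 1.0),
    ("u1", datetime(2020, 3, 9, 23, 0), "A", 1.0),
    ("u1", datetime(2020, 3, 10, 2, 0), "B", 1.0),
    ("u1", datetime(2020, 3, 10, 12, 0), "C", 1.0),  # daytime: ignored
    ("u2", datetime(2020, 3, 10, 7, 0), "C", 1.0),   # 7 am sharp counts as night
]
homes = home_towers(records)
residents = residents_per_comuna(homes, towers)
```

With these toy records, user u1 is assigned to tower A and user u2 to tower C, so each comuna has one estimated resident.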

Data Records
The raw datasets were provided by Telefónica/Movistar Chile, a mobile phone company that holds between 29% and 32% of the Chilean mobile phone market. Telefónica gathers data for billing and network maintenance purposes by persisting network events. Users are not allowed to "opt out" of billing information, as stated in Telefónica's Terms and Conditions. They are, however, able to opt out of the use of personal data by calling a number or visiting the Telefónica website (see page 3, section 6 of Telefónica's Terms and Conditions 57, in Spanish). In this study, no personal data or information whatsoever is used in the creation of the dataset proposed here: it contains only the aggregated number of transitions between towers, without any individual information. From the raw datasets we construct the three mobility indices described above. The datasets are released under the CC BY 4.0 License and are publicly available at 58. Table 1 shows the structure of the dataset describing the mobility indices. Each record refers to a comuna in Chile and describes:
• the official name of the region (region, type:string);
• the identifier of the region as per the official 2017 Chilean Census (rid, type:string);
• the official name of the comuna (comuna, type:string);
• the identifier of the comuna as per the official 2017 Chilean Census (cid, type:string). All maps and their official identifiers can be downloaded from the National Statistics Office of Chile 59;
• the area of the comuna in km² (area, type:float);
• the values of IM, IM int and IM ext for that day (type:float);
• the day the IM, IM int and IM ext values refer to (date, type:date).
Table 2 shows the structure of the quarantines dataset.
Each record refers to a quarantine regulation and describes:
• the identifier of the quarantine regulation (qid, type:integer);
• the official name of the comuna (comuna, type:string);
• the status of the quarantine, which can be either active or not active (status, type:string);
• the coverage of the quarantine, which can be either partial, rural, or complete (coverage, type:string);
• the date the quarantine started (start, type:date);
• the date the quarantine ended, which is " -" if it is still active (end, type:date);
• the identifier of the comuna as per the official 2017 Chilean Census (cid, type:string);
• the area of the quarantine in m² (area, type:float);
• the perimeter of the quarantine (perimeter, type:float).
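A record of the quarantines dataset could be represented as in the following Python sketch. The field names follow Table 2, but the class, the helper functions and the example values are hypothetical. The sketch also assumes ISO-formatted dates (YYYY-MM-DD) and an inclusive end date, neither of which is stated in the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Quarantine:
    qid: int
    comuna: str
    status: str        # "active" or "not active"
    coverage: str      # "partial", "rural", or "complete"
    start: date
    end: Optional[date]  # None while the quarantine is still active
    cid: str
    area: float        # m²
    perimeter: float


def parse_end(raw: str) -> Optional[date]:
    """Turn the end field into a date, mapping the " -" placeholder to None."""
    raw = raw.strip()
    # Assumption: dates are ISO formatted (YYYY-MM-DD).
    return None if raw == "-" else date.fromisoformat(raw)


def in_force(q: Quarantine, day: date) -> bool:
    """True if the quarantine covers the given day (end date assumed inclusive)."""
    return q.start <= day and (q.end is None or day <= q.end)


# Made-up record, for illustration only.
example = Quarantine(
    qid=1, comuna="Example", status="active", coverage="partial",
    start=date(2020, 3, 26), end=parse_end(" -"),
    cid="00000", area=1.0e6, perimeter=4.0e3,
)
```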
A limitation of all phone-records studies concerns the position of towers and the geographical area they "illuminate" or serve given their technical specifications. For example, there may be towers that serve two neighboring comunas, which affects our movement counts. However, two phenomena mitigate this problem: (i) comunas are generally large, and any borderline events are scarce given the 15-minute span; and (ii) telco companies do not record all antenna interactions by mobile devices, because storing all that information would be costly.
Table 3. Values of IM, IM ext and IM int of the ten comunas with the highest IM computed between March 9th and March 15th, 2020. As an example, Rinconada (Valparaíso region) has IM = 30.37, meaning that the number of movements within, to, or from that comuna is around 30 times higher than the estimated number of users that reside in Rinconada.
In our case, an event (a phone record or XDR) is typically generated every y minutes and if and only if the device has crossed a threshold of x MegaBytes (MBs) of traffic (not revealed by the company as it is an industrial secret). A two-rule heuristic determines the quantities x and y. In the first rule, a "clock" triggers a check every 15 minutes: if the user has reached x MBs at either 15, 30, or 45 minutes, the system appends a new XDR to the database. Some heavy users will use up the x MBs threshold at 15 minutes (if they are watching movies on the web, for instance), most at 30 minutes, and a few light users will reach the threshold at 45 minutes. A fair share of records is also generated at other times. The second rule states that if the control plane of the mobile network notices some particular phone events, such as some antenna handovers, turning off the phone, or losing connection, then a record is written to the database at any time (irrespective of the megabytes used), making it possible to find events anywhere in between the clock's 15-minute triggers.
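The two-rule heuristic can be summarized with the following Python sketch. It is only an interpretation of the description above, not the operator's logic: the function name, its parameters and the comparison with `>=` are assumptions, and the threshold value used in the example (10 MB) is an arbitrary placeholder because the real value of x is not disclosed.

```python
def should_write_xdr(minutes_on_clock, mb_since_last_record, x_mb,
                     control_plane_event=False):
    """Decide whether a new XDR is written, following the two rules.

    minutes_on_clock: minutes elapsed on the 15-minute clock (hypothetical input).
    mb_since_last_record: traffic accumulated since the previous record.
    x_mb: traffic threshold x (undisclosed by the operator; caller must supply it).
    control_plane_event: True for events such as handovers or loss of connection.
    """
    # Rule 2: control-plane events create a record at any time,
    # irrespective of the megabytes used.
    if control_plane_event:
        return True
    # Rule 1: at a clock tick (15, 30 or 45 minutes), write a record
    # only if the traffic threshold has been reached.
    at_clock_tick = minutes_on_clock in (15, 30, 45)
    return at_clock_tick and mb_since_last_record >= x_mb


# Placeholder threshold, chosen arbitrarily for the example.
X_MB = 10
heavy_user = should_write_xdr(15, 12, X_MB)    # threshold reached at 15 minutes
light_user = should_write_xdr(15, 3, X_MB)     # threshold not yet reached
handover = should_write_xdr(22, 0, X_MB, control_plane_event=True)
```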

Technical Validation
In our analysis, we consider two periods: the pre-quarantine period, from March 9th to March 15th, 2020, and the quarantine period, from June 22nd to June 28th, 2020. Although our data include two weeks before March 9th, the transition from February to March marks the start of the Fall school semester in Chile. In 2020, the semester started on March 6th, so we assume that the "business as usual" period is best represented by the week from March 9th to March 15th. March 16th marked the start of NPIs in Chile, with the closure of schools and universities and the ban of large public gatherings. After that, on March 26th, there was a partial lockdown of seven comunas in the Metropolitan Region. By June 22nd-28th, more than half of the population of the country was under quarantine, and mobility had been reduced by 40%.
During the pre-quarantine period, comunas with high mobility indices and comunas with low mobility indices coexist. Geographically, high-mobility comunas are concentrated near urban areas such as the capital Santiago and, in general, in the center of the country (Figs. 2a, 3a, 4a, and 5a). The northern and southern parts of Chile have fewer high-mobility comunas. The comunas with the highest mobility registered during the pre-quarantine period are located in the regions of Metropolitana de Santiago, Arica y Parinacota, Valparaíso, Ñuble, and Magallanes (Table 3).
The top-ten comunas with the highest mobility indices change during the quarantine period, except for Rinconada in the region of Valparaíso (Table 4), mirroring the different degrees of reduction in human mobility in the Chilean regions (Fig. 6). All regions show a reduction in all three mobility indices during the quarantine period, albeit with different intensities (Fig. 7). At the comuna level, high-mobility comunas are rare and clustered near the large urban areas located in central Chile (Figs. 2-5).
Table 4. Values of IM, IM ext and IM int of the ten comunas with the highest IM computed over the period from June 22nd to June 28th, 2020. As an example, Rinconada (Valparaíso region) has IM = 22.44, meaning that the number of movements within, to, or from that comuna is around 22 times higher than the estimated number of users that reside in Rinconada.
These results are supported by the distributions of the mobility indices of the two periods (Fig. 8). There is a clear shift towards the left of the distribution of the IM index (Fig. 8a): (i) the average IM during the quarantine period (5.16 ± 2.74) is 27.6% lower than the average IM during the pre-quarantine period (7.13 ± 4.15); (ii) the distribution of IM during the quarantine period is more concentrated toward lower values, showing a decrease of the mobility in Chile during the selected days. Regarding IM int and IM ext , we observe no net shift of the curve, but rather a flattening, suggesting that intra- and inter-comuna movements decreased during the quarantine (Fig. 8b,c).
We further analyze the reduction of the mobility by defining IM red as the relative reduction of the IM index in the quarantine period with respect to the pre-quarantine period. The distribution of IM red shows that, following the Chilean government interventions, a large number of comunas reduced their mobility, by an average of 25.37% ± 43.2 (Fig. 8d). However, comunas that were not in quarantine during the quarantine period do not reduce their mobility significantly (Fig. 9a).
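The text does not give an explicit formula for IM red. A natural reading, assumed here, is IM red = (IM pre - IM quarantine) / IM pre, expressed as a percentage, so that positive values indicate a reduction. The short Python sketch below applies this assumed formula to figures reported in this section; the function name `im_red` is hypothetical.

```python
def im_red(im_pre, im_quarantine):
    """Relative reduction of IM (in %) with respect to the pre-quarantine value.

    Assumed formula: 100 * (IM_pre - IM_quarantine) / IM_pre.
    Positive values mean that mobility decreased.
    """
    if im_pre <= 0:
        raise ValueError("pre-quarantine IM must be positive")
    return 100.0 * (im_pre - im_quarantine) / im_pre


# Country-wide averages reported above: 7.13 (pre-quarantine), 5.16 (quarantine).
national = im_red(7.13, 5.16)     # about 27.6, matching the reported 27.6%

# Rinconada, from the captions of Tables 3 and 4: 30.37 and 22.44.
rinconada = im_red(30.37, 22.44)  # about 26.1
```

Note that the reduction of the average IM (27.6%) need not coincide with the average of the per-comuna reductions (25.37%), since the latter gives each comuna the same weight regardless of its level of mobility.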
The percentage of the population that lives in comunas where the authorities applied NPIs increases with time (Fig. 9a) and reaches its peak (≈57%) in late July 2020. With the increase of the number of people under quarantine, IM red initially increases, but it slightly decreases over time even though both the number of individuals and the number of comunas under quarantine increase. This phenomenon suggests that mobility restrictions are more effective in the short-medium term and become less effective as time goes by, and it can be observed both at the regional (Fig. 7) and the comuna level (Fig. 9a,b).
Unfortunately, we do not have ground truth data to compare our data with because there are no official indices at the comuna level in Chile. However, other mobility reports do exist for the same area and period at the regional level (not comunas), such as Google Mobility Reports 13 . By aggregating our data at the regional level and comparing them with Google's data, we find a strong Pearson correlation (r = 0.7), suggesting that our mobility index is reflecting mobility trends captured by other reliable data sources.
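For reference, the Pearson correlation between two aligned series (for instance, a regional IM series and the corresponding series from another mobility report) can be computed as in the sketch below. The function name `pearson_r` and the numbers are made up for illustration. The aggregation of comuna-level values to the regional level is not shown, because the text does not specify how it is performed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up daily values for one region (not real data).
regional_im = [7.0, 6.5, 5.2, 4.8, 5.0]
other_report = [0.0, -8.0, -25.0, -33.0, -28.0]  # e.g. % change from a baseline
r = pearson_r(regional_im, other_report)
```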
Limitations of our dataset. Mobile phone records are sparse and irregular in time, leading to gaps between the user's actual trajectory and the trajectory that can be inferred from their digital trace 15. Chen et al. 60 propose an algorithm to reconstruct individual trajectories from CDRs by recovering the unspecified positions of each user. They revisit the seminal work of Gonzalez et al. 23, in which the authors show that heavy tails characterise the distributions of (characteristic) distances traveled by individuals, and show that CDRs preserve the mobility patterns observed in the reconstructed (denser) trajectories, though slightly underestimating long trips and overestimating short ones 60. Considering that in our study we use XDRs, which are much denser than CDRs, we can assume that the mobility traces of individuals represented in our dataset do not differ significantly from the users' actual trajectories.

Code availability
The up-to-date data are available from the general repository of the Ministry of Science of Chile at: https://raw.githubusercontent.com/MinCiencia/Datos-COVID19/master/output/producto33/IndiceDeMovilidad.csv (IM indices), and https://github.com/MinCiencia/Datos-COVID19/blob/master/output/producto29/Cuarentenas-Activas.csv (quarantines). The code to download the up-to-date data automatically and to reproduce the analysis in our paper is available at 58.