Countrywide population movement monitoring using mobile devices generated (big) data during the COVID-19 crisis

Mobile phones have been used to monitor mobility changes during the COVID-19 pandemic but surprisingly few studies addressed in detail the implementation of practical applications involving whole populations. We report a method of generating a “mobility-index” and a “stay-at-home/resting-index” based on aggregated anonymous Call Detail Records of almost all subscribers in Hungary, which tracks all phones, examining their strengths and weaknesses, comparing it with Community Mobility Reports from Google, limited to smartphone data. The impact of policy changes, such as school closures, could be identified with sufficient granularity to capture a rush to shops prior to imposition of restrictions. Anecdotal reports of large scale movement of Hungarians to holiday homes were confirmed. At the national level, our results correlated well with Google mobility data, but there were some differences at weekends and national holidays, which can be explained by methodological differences. Mobile phones offer a means to analyse population movement but there are several technical and privacy issues. Overcoming these, our method is a practical and inexpensive way forward, achieving high levels of accuracy and resolution, especially where uptake of smartphones is modest, although it is not an alternative to smartphone-based solutions used for contact tracing and quarantine monitoring.

www.nature.com/scientificreports/ measures 15 . The first two known COVID-19 cases were confirmed on 4 March. Visits to hospitals and longterm care facilities were banned on 8 March and a state of emergency was declared by the government on 11 March, restricting public gatherings and requiring universities to switch from face-to-face teaching to distance learning 16,17 . This was followed by school closures, announced on 13 March and effective from 16 March, when all public gatherings were banned and severe restrictions were imposed on retail trade and personal services. Eventually a relatively strict curfew was implemented on 28 March [18][19][20] . However, this was not a total lockdown, with outdoor recreational activities still permitted, giving rise to some crowding around popular tourist destinations. Travel and entry restrictions followed the same pattern, starting with suspending entry only from hard hit countries such as China, Iran, Italy and South Korea on 11 March, adding Israel on 14 March, with full border closures on 16 March 17,21,22 . The success of physical distancing and population movement restrictions in interrupting the exponential growth of cases does, however, depend on how well people comply with the regulations. Effective decision making in a crisis is greatly facilitated by access to relevant information. The Hungarian Ministry for Innovation and Technology (which has been authorised by the government to access routinely available data to support the management of the pandemic) 23 in collaboration with data experts and researchers, established a Digital Health and Data Utilisation Team (DHDUT) on 22 March, tasked with identifying relevant data that could inform the design and implementation of responses to the pandemic. Using raw data as diverse as medical information from the Hungarian eHealth cloud, financing data from the National Health Insurance Fund Management, and epidemiological data from the National Public Health Center, the experts quickly assembled a dashboard reporting both standard indicators and graphs, such as numbers of new and cumulative cases, deaths, and recoveries and their age, gender, and geographical distribution, but also developed new indicators, for instance for monitoring the effectiveness of physical distancing measures. This dashboard now serves as the main management information system for those members of the government involved in day to day pandemic decision making.
The application of information and communication technologies to improve the efficiency of the public sector, and within it the health system, is increasingly widespread and acknowledged 24 . One of the most promising developments among digital data technologies is the pooling and analysis of routinely generated, mass-produced data, such as Call Detail Records (CDR) of mobile phone use [25][26][27][28][29] . The use of data from mobile phones to analyse physical mobility in a population is not new [30][31][32][33] , and the location of mobile devices, as a proxy for the users' geographical presence, has long been used for commercial purposes, such as for tailored advertising, as well as to inform subscribers about their daily activities, such as numbers of steps taken 34 . The opportunity that mobile phones and associated apps offer in combating the pandemic has already been recognised 35 , with applications such as the Chinese coronavirus "close contact detector" app, which brings together routine data on travel, case reports, and users' mobile phone location information to detect possible contacts with infected people 36 , or the South Korean official mobile phone apps, which are used similarly for agile contact tracing, but also as a means to monitor people in home quarantine 37 . While the pooling of personal information, such as GPS data from mobile phones, car navigation systems, credit card use, or security camera footage, is fraught with legal and ethical questions, digital contact tracing has been suggested as a way to ease physical distancing measures while keeping the pandemic at bay 38 . Not denying the importance of ethical and legal considerations in using such data, the immense potential of the methodology justifies the attention given to it, and international collaborations offer scope to bring together the still somewhat limited expertise that exists, especially in smaller countries 35 .
While there are several accounts of using mobile phone data to assess the impact of physical distancing measures from both EU countries and the UK, as well as China, Chile, and the USA 35 , there are surprisingly few studies on the implementation details of practical applications involving the population of a whole country. The feasibility of CDR-based movement tracking raises serious technical questions, such as the collation of data from different mobile network providers, or the need to comply fully with personal data privacy laws, such as the General Data Protection Regulation of the European Union (GDPR). In this paper, we present the Hungarian experience of how anonymized aggregate cellular data, covering 98.7% of the 8.6 million mobile phone subscribers, were used for monitoring the effectiveness of physical distancing measures during the first wave of the COVID-19 pandemic. The aim of the work reported here was to develop a measure to monitor the effectiveness of physical distancing measures, a so-called mobility (and stay-at-home) index, using a methodology that overcomes feasibility limitations. Our purpose is to share our experiences with other countries interested in replicating this work.

Methods
We sought to develop an index to track changes in population movement during the imposition of physical distancing measures, comparing results from before and after the implementation of restrictions. The process is summarised in Fig. 1. We used geolocation data generated by the use of mobile phones. Mobile network providers locate users whenever they use their phone to initiate communication, whose parameters are automatically stored in so-called Call Detail Records (CDR). This communication can be of several types: calling another client, sending an SMS, or starting a new data transfer session during use of the Internet. The recorded location data for each communication session comprise the location of the cellular base station (also referred to as cellular tower or mast), which handled the transmission request. When a mobile phone is connected to more than one mast, the call is handled by the mast that has available capacity and provides the best signal, usually the one closest to the user.
Hungary has three main companies providing mobile telecommunications for the general public, with market shares of 44.8%, 27.2%, and 26.7%, as of the end of 2019 39 . The market share of smaller carriers is negligible, at 1.3% for calls and 2.3% for internet service provision. The three telecommunications companies provided daily www.nature.com/scientificreports/ CDR data, starting on 15 December 2019, and collaborated with the research team to develop the methodology and advise on how to interpret the raw data for the purpose of our analysis. Given the need to respect the anonymity of clients and to remain compliant with the European Union GDPR, each provider aggregated data at the level of settlements, the lowest level of public administration in Hungary. In 2019, there were 3155 settlements with an area ranging from 1 km 2 to 488 km 2 and a population from 10 to 201,432, except for the capital, Budapest, which covers an area of 525 km 2 with a population of more than 1.7 million, 18% of the total population of Hungary 40 . Since Budapest is divided into 23 districts, each with its own local government body, each district was treated as a separate settlement. Close to 53% of the population lives in one of the other 345 towns, while only 3% lives in small villages with less than 500 inhabitants. A phone was defined as changing location if it was registered in two or more settlements in a 24-h period and non-mobile if it remained in only one. The mobile phone geolocation data places a phone within the land area served by the transmitting cellular tower, whose radius can be as large as 35 kms, but the actual geographical resolution that can be achieved depends on the density of cell towers, which is much higher in densely populated areas because the capacity of masts to handle simultaneous communication sessions is limited. In urban areas, this can be as low as a few hundred metres, while in rural areas masts are generally spaced at least a few kilometres apart. Each mast was assigned to a settlement by the telecommunications companies, which ensured an unequivocal matching of CDR location data with settlements.
The aggregated datasets from the three providers in Hungary were then merged by the DHDUT for each settlement. These data were used to create 2 indices, a "mobile" and a "stay-at-home" index. Since the mobility of the Hungarian population fluctuates over the course of a week, a simple sum of SIM cards in each category would be misleading for our purpose, so the values were normalised in relation to the distribution by day of week in a reference period, in this case February 2020. Hence, the relative-mobility and the relative-stay-at-home index are calculated as the ratio of the device counts during a given day in the pandemic period to the average value for the corresponding weekdays in the reference period. For instance, the relative-mobility index on 14 April is calculated as 100·c 0414 /C Tuesday − 100, where c 0414 is the count of devices that have changed their location on 14 April (Tuesday) and C Tuesday is the average device-count of Tuesdays of the reference month. National holidays were normalised as Sundays. In this paper we present our findings for the period between 1 February 2020 and 20 May 2020. This period covers the first wave of the COVID-19 pandemic in Hungary with over a month before the first identified cases, the period of the curfew, and the gradual lifting of the movement restrictions after 30 April 2020. For the processing of the data we used the Microsoft Power BI Pro software.
Finally, we compared our results with those generated using Google mobility data. Its main features are essentially the same as those of other smartphone-based methods, when a specific smartphone application is developed to track the movement of those who download and install it. While, as noted above, these are usually applied to track individuals or smaller population groups for the purpose of monitoring targeted interventions, such as contact tracing or quarantine monitoring, it is technically possible to use them for the monitoring of large scale population movement. As such, they are an alternative to CDR-based methods. The comparison with the mobility reports of Google, therefore, serves both as an external reference point for validation, but also to set www.nature.com/scientificreports/ out their relative advantages and disadvantages when policymakers must decide on solutions to be deployed in their countries. Although both use data from mobile phones, they differ in that the Google data track individuals continuously, either via satellites, cellular base stations or both, but only as long as their location function is turned on, whereas CDR automatically register the location data, but only when someone interacts with the telephony provider. Further, the former only captures mobility of smartphones while the latter captures all mobile phones. Hence, one would expect that the measures would differ, but it was not clear how much and in what way.

Results
Trends in numbers of devices, prior to adjustment for day of week, are shown in Fig. 2. A clear weekly pattern is visible, with a substantial reduction in movement at weekends, with the reduction most marked on Sundays.
Higher activity on working days with a relatively strong maximum on Fridays can also be observed. In contrast, the resting dataset is not so volatile, showing only a maximum on Sundays and a minimum on Fridays. After normalising to the reference values, the weekly periodicity has smoothed as shown in Fig. 3. The trend of the two indices can be considered to reflect what happened when the government imposed restrictions on movement. As noted above, the first significant measures were introduced on 11 March and continued with the closure of schools on 16 March. This was associated with a clear drop in mobility, which is intuitive as parents had to remain at home to care for their children. Two weeks later, the staying at home regulation was announced, which was a lighter version of curfews elsewhere. As Fig. 3 shows, both mobility and resting indices remained relatively stable except for a discrete peak associated with announcement of the regulation. On this day, many members of the public rushed to supermarkets and pharmacies to prepare themselves for the forthcoming restrictions on mobility. Figure 3 also shows the impact of the gradual loosening of movement restrictions, which, in Hungary, started on 30 April, with the state of emergency being abolished on 18 June.
Moving to the local level, the data are sufficiently granular to track changes at the level of individual settlements in smaller geographical areas. This is illustrated in Fig. 4, showing the change in the mobility index at the level of settlements in 3 Hungarian counties (Somogy, Zala and Veszprém) around Lake Balaton on 5 April. At the end of March, rumours started to spread in various media that people with a weekend house near Lake Balaton (one of the most popular domestic tourist destinations during the summer) decided to move from Budapest and  www.nature.com/scientificreports/ other cities in large numbers with the intention of staying in quarantine there. Initially this story was supported only by anecdotal evidence but closer examination of the mobility index at the local level confirmed the problem. The majority of the settlements close to Lake Balaton are shaded dark in Fig. 4, signalling an anomalous increase of at least 50% in the mobility index, but in certain cases as high as 800%. In response, mayors were authorised to impose local movement restrictions in addition to the national measures if they felt necessary. As noted above, Google and other global technology companies use smartphone-generated data to track the location of users. We looked at the COVID-19 Community Mobility Reports from Google in detail to compare our findings with 41 . In Fig. 5a we compare the CDR-based resting index calculated between 16 February and 17 May and the residential index of Google, based on similar aggregated and anonymized data used to show popular times for places in Google Maps.
The curves follow very similar patterns, e.g., the effect of the curfew announced on 11 March is manifested in a large increase of both indices. In Fig. 5b we show a scatter plot of the daily CDR-based resting (staying-athome) index as a function of the daily residential index from Google. Most of the points are close to the diagonal line that would correspond to a perfect fit. However, during the weekends our CDR-based index takes a larger value compared to the Google index (points deviate from the diagonal to the left), whereas on national holidays the opposite happens and the Google index is higher. The likely reason behind this behaviour is the different reference baseline, e.g. in the case of the CDR-based index the holidays were normalised by the Sunday in the reference week even when they fell on weekdays.
Finally, in Table 1, we compare the advantages and disadvantages of the two methods regarding population movement monitoring.

Discussion
The idea to use mobile phone CDR to track population movement in the case of emergencies is not new, and its use has been reported during this COVID-19 pandemic 35 . Nevertheless, to our knowledge, no detailed description of its application and utility has been published to date. www.nature.com/scientificreports/ We have shown that the mobility and staying-at-home indices developed using CDR data give results that are similar, but not identical, to those with other applications that record population movement on the basis of high resolution smartphone capabilities. Nevertheless, both approaches have strengths and limitations. The advantage of smartphone-generated, satellite-based location data, such as that collected by Google, is that the temporal and spatial resolution is extremely high for any individual 41 , which is impossible to achieve with CDR. In addition, by linking its mobility data to the vast amount of customer data owned by Google and similar companies it is possible to study mobility by characteristics of the phone owner, including by socio-economic position or employment, whose feasibility is limited using CDR only. However, a serious disadvantage is that only some users are included. Google see only Android users, and among them only those, whose location history is turned on. In general, any smartphone-based location tracking solution is blind to individuals who do not have a smartphone, or not willing to share their location. In contrast, by combining the data from all mobile phone operators, generated automatically, as part of the routine administration of calls, the CDR-based mobility index captures all active mobile phone users. This property makes its implementation relatively easy and inexpensive, and can be especially important in settings where early versions of phones are still widely used. In Hungary, out of roughly 8.6 million mobile phone users, 5.3 million have smart phones, further broken down to 4.5 million Android users Figure 5. Comparison of two methods for measuring the relative change from the baseline of "staying-athome". Left (a) The residential index of Google (purple) and the CDR based stay-at-home index (green) plotted as a time series. Right (b) Scatter plot of the daily stay-at-home index as a function of the daily residential index of Google. Perfect fit would correspond to a set of points falling onto the diagonal line shown in purple. The weekends during the curfew seem to deviate from this to the left, whereas outliers to the right are corresponding to the national holidays. Table 1. Comparison of CDR-based tracking as opposed to smartphone-based tracking of mass population movement.

CDR-based method
Smartphone-based method

Tracking accuracy
Better population coverage Better location accuracy for individual smartphone users

Location of a person
Lower (around one hundred metres to kilometres), depends on mast density and environment Depends on the use of the phone

Feasibility
Easier and less expensive More expensive

Personal data protection and privacy
Ensured with anonymous and aggregated data Does not require approval from mobile phone users (approval is given when one registers for the service)

Requires approval from smartphone users
When should be used?
Better option for monitoring mass population movement during large scale restrictions country-wide (hammer phase), or regional (dance phase) www.nature.com/scientificreports/ and 0.8 million using some other operating system 42 . Therefore, the mobility index using CDR benefits from having data from almost twice as many users as Google. While this may not seem to be a major advantage at the national level, it can make a substantial difference in smaller geographical areas where smartphone penetration is small, or where a significant part of the population deny access to their location data. Voluntary compliance is the Achilles-heel of any smartphone-based location tracking application, and while mandating its use can work for small scale, targeted epidemiological interventions, it would be a serious infringement on civil liberties if it was enforced for whole populations.
The main limitation of the CDR-based method is that the accuracy of localisation depends on the use of the phone. Thus, it will only register when a call is made, an SMS is sent, or data accessed. Moreover, if a user is starting a long phone call in one settlement and continues to talk during a trip to another distant one, the CDR-based location method will report the client as if they remain at the starting position. So, no movement is detected by the method, when the location of the phone did change. However, if the same client is making a call using data via an app, such as WhatsApp, the trip will be tracked accurately because apps usually divide their data flow into shorter sessions. Therefore, a new session event, with the changed location information, is generated frequently, allowing geolocation at high resolution.
In summary, as shown in Table 1, CDR-based tracking is superior to smartphone-based tracking in that the former automatically generates data on all users, while the latter only includes those who have smartphones and permit location tracking. On the other hand, the CDR-based method depends on the actual use of the device, which makes the tracking of less frequent phone users less accurate. This limitation could be overcome by using cell registration data, which is feasible, but they are more difficult to obtain and process. Another limitation is that the socioeconomic characteristics could not be linked to the basic CDR event records. Such analysis is feasible only with more detailed device registry data. The fact that the CDR-based method, as used by the government, utilises anonymous, aggregated data provided by independent operators, makes the identification of individual users impossible. On the other hand, it is rather a strength than a weakness of the methodology in that no major personal data protection and privacy questions arise. The aggregation and anonymization of the data were guaranteed by the mobile network operators, since they had to hide details of CDR records not only for the protection of their customers, but to protect details of their business strategies too.
The experience in Hungary has shown that CDR-based mobility and staying-at-home indices provide a means to monitor the effectiveness of restrictions on mobility and to pinpoint problematic local geographical areas where further measures are warranted. Swift implementation of strict measures to reduce the frequency of movement seems to be paying off as seen by the continuing low incidence of infection. In the early phase of the COVID-19 outbreak, the Hungarian government was able to interrupt the exponential growth of new cases, keep the curve flat and keep the number of COVID-19 deaths per 1 million population well below the average of the EU-27 and the UK 43 . The next test of the methodology will be the exit phase, where success depends on the accuracy and timeliness of monitoring, as well as the agility of contact tracing, should a local flare up occur. While the CDR-based method seems to be a better option to monitor mass population movement during large scale restrictions, when there is no need to pinpoint the location of individuals with high precision but it is very important to see as many members of the population as possible, the smartphone-based method should work better to track individuals or smaller population groups for the purpose of monitoring targeted interventions, such as contact tracing or quarantine monitoring, where high precision is the main determinant of effectiveness. In this respect, the two methods are rather complementary in supporting pandemic responses.

Conclusions
We report how integrated CDR data from three major European telecommunication companies covering almost all the mobile phone market in Hungary have been used to monitor mobility. The algorithms automatically process these huge routine databases into information easily interpreted by high level decision makers. The process has been made routine and sufficiently rapid to support swift decision making. At the national level, its results are in good agreement with Google mobility reports, both are able to follow the changes in the movement of the population at the scales relevant for the decision making related to the fight against COVID-19. Nevertheless, our CDR-based method is more accurate with much better population coverage, especially when it comes to the monitoring of the movement of local communities with lower smartphone penetration.
We are convinced, that our method can be easily adapted by other countries, should they wish to apply it in the management of the COVID-19 outbreak. The DHDUT is happy to share further details upon request to support the fight against the COVID-19 pandemic all over the world.