World’s human migration patterns in 2000–2019 unveiled by high-resolution data

Despite being a topical issue in public debate and on the political agenda for many countries, a global-scale, high-resolution quantification of migration and its major drivers for the recent decades remained missing. We created a global dataset of annual net migration between 2000 and 2019 (~10 km grid, covering the areas of 216 countries or sovereign states), based on reported and downscaled subnational birth (2,555 administrative units) and death (2,067 administrative units) rates. We show that, globally, around 50% of the world’s urban population lived in areas where migration accelerated urban population growth, while a third of the global population lived in provinces where rural areas experienced positive net migration. Finally, we show that, globally, socioeconomic factors are more strongly associated with migration patterns than climatic factors. While our method is dependent on census data, incurring notable uncertainties in regions where census data coverage or quality is low, we were able to capture migration patterns not only between but also within countries, as well as by socioeconomic and geophysical zonings. Our results highlight the importance of subnational analysis of migration—a necessity for policy design, international cooperation and shared responsibility for managing internal and international migration.


Supplementary text
Socio-climatic bins
Socio-climatic bins were created using global gridded data of aridity (Global Aridity Index (1)), human development (Human Development Index, HDI; downscaled using the methodology of Kummu et al. (2)), and population counts for 2000-2019 from the WorldPop program (3). Our binning divides global inhabited areas into 100 socio-climatically analogous zones with similar human development and climatic conditions. The binning was conducted in two steps. First, we divided all considered grid cells into 10 population-weighted quantiles based on HDI. Then, each HDI quantile was divided into 10 population-weighted quantiles based on aridity. This division ensured that each bin contains 1% of the global population.
For both HDI and aridity, long-term averages were used. HDI data were provided for each year in 1990-2019 and averaged by taking the mean over 2000-2019. The Global Aridity Index and Potential Evapotranspiration dataset of Trabucco and Zomer (1) provides a global estimate of aridity for 1970-2000 as a long-term average: Aridity = MAP / MAET, in which MAP refers to mean annual precipitation and MAET to mean annual evapotranspiration. Lower index values indicate more arid conditions, and higher values indicate more humid conditions. The original aridity raster at 30 arc-second resolution was aggregated to 5 arc-minute resolution prior to binning. Extended Data Fig. 1 shows the long-term averages of HDI and aridity and the derived socio-climatic bins. Extended Data Fig. 2 illustrates how the bins were divided.
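The two-step binning described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the flattened 1-D inputs, and the rule of assigning a cell by the midpoint of its cumulative population share are all assumptions made here.

```python
import numpy as np

def weighted_quantile_bins(values, weights, n_bins=10):
    """Assign each cell to one of n_bins population-weighted quantiles (0..n_bins-1)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values, kind="stable")
    w_sorted = weights[order]
    cum = np.cumsum(w_sorted)
    # Assumption: a cell belongs to the quantile containing the midpoint
    # of its own interval on the cumulative population axis.
    midpoint_share = (cum - 0.5 * w_sorted) / cum[-1]
    bins_sorted = np.minimum((midpoint_share * n_bins).astype(int), n_bins - 1)
    bins = np.empty(values.size, dtype=int)
    bins[order] = bins_sorted
    return bins

def socio_climatic_bins(hdi, aridity, population, n_bins=10):
    """Two-step binning: HDI quantiles first, then aridity quantiles within each."""
    hdi = np.asarray(hdi, dtype=float)
    aridity = np.asarray(aridity, dtype=float)
    population = np.asarray(population, dtype=float)
    hdi_bin = weighted_quantile_bins(hdi, population, n_bins)
    arid_bin = np.zeros_like(hdi_bin)
    for k in range(n_bins):
        mask = hdi_bin == k
        if mask.any():
            arid_bin[mask] = weighted_quantile_bins(
                aridity[mask], population[mask], n_bins
            )
    # Combined bin id in 0..n_bins**2 - 1 (0..99 for the 10 x 10 case).
    return hdi_bin * n_bins + arid_bin
```

Because whole grid cells are assigned to bins, each bin holds approximately, rather than exactly, 1% of the total population in this sketch.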

Rural-urban classification and net migration in rural and urban areas
Extended Data Fig. 3 showcases an example of the urban extent data developed in this study. The data map urban areas over two decades, from 2000 to 2019. They are provided as rasters at 5 arc-min resolution, with grid cells classified as 1 (urban) or 0 (rural). Extended Data Fig. 3 shows the development and growth of urban areas in the South China Sea region around the Malay Peninsula during three timesteps, as well as the global extent of urban areas in 2019.
Urban extent rasters were used to calculate net migration in urban and rural areas. Urban net migration was defined by multiplying the net migration raster with the urban extent raster for each year. Rural net migration was similarly derived for each year from the net migration data by subsetting all raster cells where the urban extent raster had the value 0 (i.e. defined as rural). Extended Data Fig. 4 illustrates gridded net migration in urban and rural areas for selected years (2000, 2010, and 2019).
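The masking step can be illustrated with a short sketch for a single year. The function name and the use of in-memory NumPy arrays are assumptions for illustration. The rasters are assumed to share the same grid.

```python
import numpy as np

def split_net_migration(net_migration, urban_extent):
    """Split one year's net migration raster into urban and rural parts.

    urban_extent holds 1 for urban cells and 0 for rural cells.
    """
    net_migration = np.asarray(net_migration, dtype=float)
    urban_extent = np.asarray(urban_extent)
    # Urban: multiply by the 0/1 mask, so rural cells become zero.
    urban = net_migration * urban_extent
    # Rural: keep only the cells where the urban extent raster is 0.
    rural = np.where(urban_extent == 0, net_migration, 0.0)
    return urban, rural

# Hypothetical 2 x 2 example grid.
nm = np.array([[10.0, -5.0], [3.0, 0.0]])
urban_mask = np.array([[1, 0], [0, 1]])
urban_nm, rural_nm = split_net_migration(nm, urban_mask)
```

By construction, the urban and rural rasters sum back to the original net migration raster in every cell.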
The amount of urban and rural net migration at national, sub-national and communal scales for each year over 2000-2019 could then be calculated as a zonal sum over each administrative area. Extended Data Fig. 5 illustrates cumulative (2000-2019) and aggregated urban and rural net migration at national, sub-national and communal scales. Furthermore, we investigated how net migration was divided between rural and urban areas within each administrative area. We divided administrative areas into four classes depending on the "direction" of migration. If both urban and rural areas had net-positive migration, the administrative area in question was classified as a 'net-receiver', whereas in the opposite case it was a 'net-sender'. A case in which urban migration was net-positive and rural migration net-negative was called 'urban pull - rural push', and the opposite case 'rural pull - urban push'.
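A minimal sketch of the zonal sum and the four-class division is given below. The function names and the integer zone raster are hypothetical. The handling of exactly zero totals is not specified in the text, so they are returned as 'undefined' here, which is an assumption.

```python
import numpy as np

def zonal_sum(values, zones, n_zones):
    """Sum raster values over administrative areas.

    zones holds a non-negative integer area id per cell; values must not contain NaN.
    """
    zones = np.asarray(zones).ravel()
    values = np.asarray(values, dtype=float).ravel()
    return np.bincount(zones, weights=values, minlength=n_zones)

def classify_direction(urban_total, rural_total):
    """Classify an administrative area by the sign of urban and rural net migration."""
    if urban_total > 0 and rural_total > 0:
        return "net-receiver"
    if urban_total < 0 and rural_total < 0:
        return "net-sender"
    if urban_total > 0 and rural_total < 0:
        return "urban pull - rural push"
    if urban_total < 0 and rural_total > 0:
        return "rural pull - urban push"
    return "undefined"  # assumption: zero totals are left unclassified

# Hypothetical example: two areas (ids 0 and 1) on a 2 x 2 grid.
zone_ids = np.array([[0, 0], [1, 1]])
urban_cells = np.array([[5.0, 0.0], [0.0, -2.0]])
rural_cells = np.array([[0.0, 3.0], [4.0, 0.0]])
urban_totals = zonal_sum(urban_cells, zone_ids, 2)
rural_totals = zonal_sum(rural_cells, zone_ids, 2)
classes = [classify_direction(u, r) for u, r in zip(urban_totals, rural_totals)]
```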

Administrative zoning
The regional division used in the study is shown in Extended Data Fig. 6. The grouping is based on the United Nations (UN) country grouping (4).

Births and deaths
For births, we used two compiled databases, namely StatCompiler (5) and EUROSTAT (6), as well as national census data. For deaths, we likewise used two compiled databases, namely OECD regional statistics (7) and EUROSTAT (8), as well as national censuses. illustrates the origin of data used for each country, noting that for some countries, births and deaths data were available from different sources.

Downscaling input variables
Extended Data Fig. 8 illustrates the spatial distribution of the downscaling input variables: HDI, population density, share of women of reproductive age (15-49), and share of life lived for an average person.

Regression model performance
We assessed the performance of the regression models used to predict cell-wise birth and death rates. The models predicted both rates well, with coefficients of determination of 0.74 and 0.60, respectively (Table S1).
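For reference, the coefficient of determination can be computed as in the sketch below. This uses the standard definition, 1 minus the ratio of the residual to the total sum of squares. It is assumed here that the reported values follow that definition, and the function name is hypothetical.

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

A perfect prediction yields 1, while always predicting the mean of the observations yields 0.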

Validation of the data
Data developed in the study were validated against sub-national and national observations. Data used in the validation are described in Table S2.

Births and deaths
Gridded birth and death data were validated against EUROSTAT and OECD data, respectively. The results presented in Supplementary Figure 2 and Supplementary Table 2 show that the downscaled values were in line with reported values, with correlation coefficients (Pearson's R) of 0.79 for births and 0.76 for deaths (p < 0.001 for both).
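The correlation used in this comparison can be sketched with NumPy as follows. The function name is hypothetical, and the sketch covers only the coefficient itself, not the significance test.

```python
import numpy as np

def pearson_r(observed, modelled):
    """Pearson correlation coefficient between reported and downscaled values."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    # np.corrcoef returns the 2 x 2 correlation matrix; the off-diagonal is R.
    return float(np.corrcoef(observed, modelled)[0, 1])
```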

Net migration
Gridded net migration was validated against national and sub-national observations. At the national level, correlations between observed and modelled values were mainly strong (Pearson's R ranging between 0.61 and 0.9) and significant (p < 0.001), except for year 2020, when the correlation was not statistically significant (Supplementary Figure 3). The largest differences between observed and modelled values were observed in 2019. This may be explained by uncertainty in census data and reporting over the most recent years.
At the sub-national level, validation was done for the US, Europe and South Korea. The US validation (Supplementary Figure 4) shows a very strong and significant correlation between the modelled and observed data. For the US, the observed data were reported for different age groups as a 10-year cumulative sum. To obtain total net migration for the whole population, net migration was first summed over all age groups. Then, a cumulative sum over the reported years (2000-2010) was calculated from the data developed here and compared to the reported values.
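The two aggregation steps for the US comparison can be sketched as below. The array layouts, the function names and the inclusive treatment of both end years are assumptions for illustration.

```python
import numpy as np

def observed_total(net_migration_by_age):
    """Sum reported net migration over age groups.

    Input shape: (n_regions, n_age_groups). Output shape: (n_regions,).
    """
    return np.asarray(net_migration_by_age, dtype=float).sum(axis=1)

def modelled_cumulative(annual_net_migration, years, start, end):
    """Sum modelled annual net migration over the years start..end (inclusive).

    annual_net_migration has shape (n_years, n_regions); years has length n_years.
    """
    years = np.asarray(years)
    annual = np.asarray(annual_net_migration, dtype=float)
    in_period = (years >= start) & (years <= end)
    return annual[in_period].sum(axis=0)
```

The two resulting per-region vectors can then be compared directly, for example with a correlation coefficient.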
In Europe, the correlation between the modelled net migration and reported data is mainly strong and statistically significant, with Pearson's R ranging between 0.31 and 0.88 (p < 0.001) in years 2001, 2005, 2010, 2015 and 2020 (Supplementary Table 2). For 5- and 10-year cumulative net migration rates, correlations were moderate (0.3-0.57, p < 0.001) (Supplementary Figure 6). For individual years, the correlation between observed and estimated rates ranged between 0.17 and 0.49 (Supplementary Table 2). We suspect that the largest differences between the observed and modelled data are caused by differences in the definition of net migration. EUROSTAT defines net migration in two ways. Firstly, net migration is defined as the difference between in- and out-migration. Secondly, for some countries, the definition also includes a "statistical adjustment", meaning "other changes in the population figures between 1 January for two consecutive years which cannot be attributed to births, deaths, immigration or emigration" (9; metadata). We defined net migration as the difference between absolute and natural population change, and thus lack detailed information on immigration, emigration and other flows in or out.
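The residual definition of net migration used in this study can be written as a short sketch. Natural change is taken as births minus deaths, which is the standard demographic definition. The function name and the use of counts (rather than rates) for a single period are assumptions made for illustration.

```python
import numpy as np

def net_migration_residual(pop_start, pop_end, births, deaths):
    """Net migration as absolute population change minus natural change."""
    pop_start = np.asarray(pop_start, dtype=float)
    pop_end = np.asarray(pop_end, dtype=float)
    births = np.asarray(births, dtype=float)
    deaths = np.asarray(deaths, dtype=float)
    absolute_change = pop_end - pop_start
    natural_change = births - deaths
    return absolute_change - natural_change
```

For example, a cell growing from 1,000 to 1,030 inhabitants with 20 births and 10 deaths has a natural change of 10 and therefore a net migration of 20. Any statistical adjustment of the kind EUROSTAT reports would be absorbed into this residual.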
In South Korea, the correlation (Pearson's R) between modelled and observed net migration counts ranges between 0.6 and 0.99 for the annual totals and the accumulated year groups (Supplementary Figure 5, Supplementary Table 2). Here, it should be noted that net migration counts for Sejong were provided only for 2012-2020 due to a district reform in 2012, in which Sejong was formed by joining parts of two other provinces (Jeongi-gun and Chungcheongnam-do) (10). The reported net migration counts for Sejong were 1,000 times our estimates and thus appeared as outliers. Sejong was therefore excluded from the validation. The validation of migration rates shows a strong and significant correlation for the five-year periods (2001-2005, 2006-2010) (Supplementary Figure 5a) and the first 10-year period of the century (2001-2010) (Supplementary Figure 5b). For other observed years and year groups, the correlation was weaker and not significant.