Sub-national mapping of population pyramids and dependency ratios in Africa and Asia

Pezzulo, Carla; Hornby, Graeme M.; Sorichetta, Alessandro; Gaughan, Andrea E.; Linard, Catherine; Bird, Tomas J.; Kerr, David; Lloyd, Christopher T.; Tatem, Andrew J.

doi:10.1038/sdata.2017.89


Data Descriptor
Open access
Published: 19 July 2017


Carla Pezzulo^1,2,
Graeme M. Hornby^1,3,
Alessandro Sorichetta^1,2,
Andrea E. Gaughan⁴,
Catherine Linard ORCID: orcid.org/0000-0002-0819-7755^5,6,
Tomas J. Bird^1,2,
David Kerr¹,
Christopher T. Lloyd ORCID: orcid.org/0000-0001-7435-8230¹ &
…
Andrew J. Tatem^1,2

Scientific Data volume 4, Article number: 170089 (2017)


Abstract

The age group composition of populations varies substantially across continents and within countries, and is linked to levels of development, health status and poverty. Subnational variability in the shape of the population pyramid, and in the corresponding dependency ratios, reflects the different levels of development within a country and drives its economic prospects and health burdens. Whether measured as a single ratio of young and old dependents to the working-age population, or through separate young-age and old-age ratios, dependency ratios are often highly heterogeneous between and within countries. Assessments of subnational dependency ratio and age structure patterns have been undertaken for specific countries and across high income regions, but much less so across low income regions. In the framework of the WorldPop Project, through the assembly of over 100 million records across 6,389 subnational administrative units, subnational dependency ratio and high resolution gridded age/sex group datasets were produced for 87 countries in Africa and Asia.

Design Type(s)	data integration objective • observation design • time series design
Measurement Type(s)	population data
Technology Type(s)	digital curation
Factor Type(s)	spatiotemporal_interval • spatiotemporal_region
Sample Characteristic(s)	Homo sapiens • Algeria • Angola • Benin • Botswana • Burkina Faso • Burundi • Cameroon • Central African Republic • Chad • Republic of Congo • Democratic Republic of the Congo • Cote d'Ivoire • Djibouti • Egypt • Equatorial Guinea • Eritrea • Ethiopia • Gabon • Gambia • Ghana • Guinea • Guinea-Bissau • Kenya • Lesotho • Liberia • Libya • Madagascar • Malawi • Mali • Mauritania • Morocco • Mozambique • Namibia • Niger • Nigeria • Rwanda • Senegal • Sierra Leone • Somalia • South Africa • South Sudan • Sudan • Swaziland • Togo • Tunisia • Uganda • Tanzania • Western Sahara • Zambia • Zimbabwe • Afghanistan • Armenia • Azerbaijan • Bangladesh • Bhutan • Brunei Darussalam • Cambodia • China • Georgia • India • Indonesia • Iran • Iraq • Kazakhstan • North Korea • Kyrgyzstan • Laos • Malaysia • Mongolia • Myanmar • Nepal • Pakistan • Papua New Guinea • Philippines • Russia • Singapore • Solomon Islands • South Korea • Sri Lanka • Tajikistan • Thailand • Timor-Leste • Turkey • Turkmenistan • Uzbekistan • Vanuatu • Viet Nam

Machine-accessible metadata file describing the reported data (ISA-Tab format)


Background & Summary

Populations in low income countries are generally younger and growing more rapidly compared to those in high income countries. Africa has the highest rate of population growth and more than half of the global population growth between now and 2050 is expected to occur in Africa¹. Asia is projected to be the second largest contributor to future global population growth, adding 0.9 billion people between 2015 and 2050¹.

Population demographics have been central to the Millennium Development Goals and are key to the Post-2015 Sustainable Development Agenda², and development policies need to account for population dynamics and their relationships with social, economic and environmental factors^3,4. One characteristic of low income countries is a higher proportion of children relative to working age individuals^5,6. Moreover, given the increasing number of elders dependent on working age individuals, population aging is also an emerging issue in some low and middle income countries^7,8. The ratio of the ‘dependent age’ population (aged under 15 or 65 and over) to the remaining ‘working age’ population (aged 15–64) is generally quantified through the ‘age dependency ratio’^5,9,10. A declining share of children and/or elders and an increase in the working-age adult population lower the dependency ratio and can create opportunities for development^11,12. For example, households with fewer dependents tend to increase their share of savings and to invest in the human capital of their children^13,14, and countries experiencing a rise in the number of working-age people can expand their productivity¹⁵. In contrast, high young-age and old-age dependency ratios mean that working age populations face a greater burden in supporting children and elders and in providing the social services they need¹⁶. In regions such as sub-Saharan Africa and South Asia, age dependency ratios remain relatively high and have only in some cases started to decline slowly, creating opportunities for development and economic growth^17,18, with the potential to enter the so-called ‘demographic dividend’¹⁹.

Age-sex pyramids and dependency ratios are generally measured, mapped and summarised at the national level, despite the large subnational demographic heterogeneities that exist within countries. Studies have identified and explored the role of subnational or household-level dependency ratios in some countries^20,21,22, but these have focussed mostly on high income regions²³. However, subnational demographic data are becoming increasingly available for countries in low income regions²⁴, and this trend is being encouraged as a means of highlighting inequalities, targeting populations at risk²⁴ and guiding the development of targeted policies for the Sustainable Development Goals²⁵.

Contemporary and spatially detailed datasets that identify areas where policy relevant age groups (e.g., children under the age of five, young persons, women of childbearing age and the elderly) reside are a fundamental prerequisite for monitoring demographic changes¹⁵, supporting geographic targeting of policies (www.undatarevolution.org), planning vaccination interventions²⁶, identifying age groups at risk of mortality and morbidity²⁷ and allocating resources effectively^28,29. Accurately depicting the distribution of young-age and old-age dependency ratios at subnational levels aids in identifying opportunities for economic growth¹⁸ and possible gains from the demographic dividend³⁰. It also helps in investigating other socio-economic conditions usually associated with high dependency ratios, such as high fertility rates³¹, low levels of school enrolment among children³² and limited opportunities for productive employment³³.

In the framework of the WorldPop Project (http://www.worldpop.org)³⁴, advancing the approach described in Tatem et al.²⁴, an open access archive of (i) gridded subnational dependency ratio datasets and (ii) high resolution gridded 5-year age/sex group count datasets, for 87 countries in Africa and Asia (excluding Middle Eastern countries and Japan), has been created using subnational data on population age and sex structures for circa 2010 and the corresponding WorldPop gridded continental population count datasets (Fig. 1). In particular, over 100 million records were extracted from census data, census microdata and household survey data, assembled and used to derive the proportional age and sex structures for 6,389 subnational administrative units.

Figure 1: Schematic overview of the data processing method adopted to generate the WorldPop gridded subnational dependency ratio datasets and high resolution gridded 5-year age/sex group count datasets.

Methods

Subnational tabular data on population age and sex structures were gathered on a country-by-country basis. Table format and structure varied significantly from one country to the next, reflecting the wide range of data sources used in this project. For example, some countries reported absolute counts per age group whilst others reported proportions. The width of the age groups also varied from country to country, ranging from 1 to 5 years. In addition, some datasets represented the full population (e.g., national census data), whilst others were a sub-sample of the population (e.g., micro-census, household surveys). These challenges required a standardisation process to harmonise the disparate data sources into a set of consistent tables in which each row represents a subnational administrative unit and the fields contain the corresponding proportions of males, females and people in each 5-year age group (from ages 0–4 up to a final group aged 65 and over). The standardised country tables were then joined to their respective subnational spatial boundaries, and the resulting spatial datasets were merged to produce two continental vector datasets (i.e., Africa and Asia) depicting the proportional age and sex structure of each subnational administrative unit.
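In this project the restructuring was done manually in Excel or with the ArcGIS and SPSS tools described below. Purely as an illustration of what the standardisation involves, the sketch below regroups one administrative unit's raw age counts into the 5-year scheme and converts them to proportions. The input format (a dict mapping age bins to counts) and the function names are hypothetical, and the sketch assumes that every raw bin nests inside a single 5-year group, which holds for 1-year and 5-year bins but not, for example, for an open-ended "60+" bin.

```python
# Hypothetical sketch of the standardisation step: regroup raw age-bin counts
# into the standard 5-year scheme (0-4, 5-9, ..., 60-64, 65+) and convert the
# counts for one administrative unit into proportions of its total.
GROUPS = [(lo, lo + 4) for lo in range(0, 65, 5)] + [(65, None)]  # None = open-ended


def group_label(lo, hi):
    return f"{lo}+" if hi is None else f"{lo}-{hi}"


def to_five_year_proportions(raw_counts):
    """raw_counts maps (start_age, end_age) -> count, with end_age None for an
    open-ended top bin. Each raw bin must fit inside one standard group."""
    totals = {group_label(lo, hi): 0 for lo, hi in GROUPS}
    for (start, end), count in raw_counts.items():
        for lo, hi in GROUPS:
            fits_above = start >= lo
            fits_below = hi is None or (end is not None and end <= hi)
            if fits_above and fits_below:
                totals[group_label(lo, hi)] += count
                break
        else:
            # e.g. a 60+ bin cannot be split into 60-64 and 65+ without extra data
            raise ValueError(f"bin starting at {start} straddles a 5-year boundary")
    grand_total = sum(totals.values())
    return {label: n / grand_total for label, n in totals.items()}
```

Raising an error on straddling bins, rather than silently splitting them, keeps any such case visible for manual handling.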

Subnational proportional age structures were then used to calculate combined, young-age and old-age dependency ratios for each administrative unit in Africa and Asia. Following the standard definitions most widely found in the literature^10,35, the combined dependency ratio was measured as the ratio of dependents (those younger than 15 and those aged 65 and over) to the population aged 15–64; the young-age dependency ratio as the ratio of dependents younger than 15 to the working-age population aged 15–64; and the old-age dependency ratio as the ratio of dependents aged 65 and over to the working-age population. Finally, all subnational dependency ratio vector datasets were rasterized at a resolution of 30 arc seconds (approximately 1 km at the equator).

A series of gridded 5-year age/sex group count datasets for Africa and Asia, with a spatial resolution of 30 arc seconds, was generated by gridding the subnational proportional age and sex structure datasets and overlaying them on the corresponding WorldPop gridded continental population count dataset (http://www.worldpop.org.uk/data/data_sources/), the latter adjusted to match United Nations Population Division (UNPD) estimates for 2010³⁶.

Data collection

Data on population age and sex structure, as close as possible to 2010, were collected at the finest spatial level available for all African and Asian countries listed in Tables 1 and 2. Where data for a given country were available from multiple sources, the choice of which to use was based on the year in which the data were collected and on other key determinants, such as sample size.

Table 1 Data sources for African countries from which age and sex proportions were derived.

Full size table

Table 2 Data sources for Asian countries from which age and sex proportions were derived.

Full size table

The most reliable, complete and accurate source of subnational population composition is usually national census data covering the whole population of a country. Thus, for countries where age and sex structure data were available from a recent national census, these were collected at the finest administrative level available, along with their corresponding spatial boundaries.

For those countries where recent full census data were not available, alternative sources were sought. Priority was given to census microdata obtained from the Integrated Public Use Microdata Series International (IPUMSI) database³⁷. Census microdata represent household-level records derived from census data by sampling a representative fraction of the population (with sample size generally between 2 and 15% of the full census).

Where neither full census data nor census microdata close to 2010 were available, national household survey data were obtained from the Demographic and Health Surveys (DHS), DHS special Malaria Indicator Surveys (MIS) and AIDS Indicator Surveys (AIS)³⁸, Social Indicator Surveys (SIS)³⁹ or Multiple Indicator Cluster Surveys (MICS)⁴⁰. Although they provide a representation of the population closer to 2010 than outdated full census data and census microdata, household surveys tend to suffer from a constrained sampling framework, with sample sizes typically below 1% of the national population (full range 0.04 to 12%). Nevertheless, household survey data are designed to be representative at both national and subnational levels (typically administrative level 1, generally equivalent to provinces) and can thus be used to derive the corresponding age and sex structures. Household surveys were prioritised by survey year and sample size.
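The source hierarchy described above can be expressed as a ranking rule. The sketch below is hypothetical: the `Source` record, the source-type labels and the `max_gap` recency window are assumptions made for illustration (the article does not state a numeric threshold for "close to 2010"), and the tie-breaking order (source type, then closeness to 2010, then larger sample) is one reading of the text.

```python
from dataclasses import dataclass
from typing import Optional

# Lower rank = preferred, following the hierarchy described in the text.
TYPE_RANK = {"census": 0, "ipums_microdata": 1, "household_survey": 2}


@dataclass
class Source:
    kind: str         # one of TYPE_RANK's keys (hypothetical labels)
    year: int         # year of data collection
    sample_size: int  # number of persons enumerated or sampled


def choose_source(sources, target_year=2010, max_gap=10) -> Optional[Source]:
    """Pick the preferred subnational source, or None if no source is close
    enough to target_year (national UNPD estimates would then be used).
    max_gap is a hypothetical recency window, not taken from the article."""
    close = [s for s in sources if abs(s.year - target_year) <= max_gap]
    if not close:
        return None
    return min(close, key=lambda s: (TYPE_RANK[s.kind],
                                     abs(s.year - target_year),
                                     -s.sample_size))
```

With this rule a census from 2004 would win over a 2011 survey under a 10-year window, but not under a 5-year one, which shows how sensitive the choice is to the recency threshold.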

Tables 1 and 2 list data sources on a country-by-country basis for Africa and Asia, respectively, and provide information on the year of data collection and the subnational administrative level at which they were collected. In summary, the data for 30 of the 87 countries were derived from full national population and housing censuses, 17 from IPUMSI census microdata, 19 from traditional DHS, 4 from DHS-MIS/AIS/SIS and 8 from MICS (Tables 1 and 2).

For the remaining 9 countries where subnational data on population age and sex structures were not available (namely Libya, Eritrea, Western Sahara, Equatorial Guinea, Brunei, Myanmar, Papua New Guinea, Sri Lanka and Turkmenistan) 2010 UNPD country level estimates³⁵ were used.
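As a quick consistency check, the per-source country counts reported above add up to the 87 countries covered by the datasets (78 with subnational data plus 9 with national UNPD estimates):

```python
# Country counts by data source, as reported in the text.
counts = {
    "full national census": 30,
    "IPUMSI census microdata": 17,
    "DHS": 19,
    "DHS-MIS/AIS/SIS": 4,
    "MICS": 8,
    "UNPD national estimates (no subnational data)": 9,
}
subnational = sum(v for k, v in counts.items() if not k.startswith("UNPD"))
print(subnational, sum(counts.values()))  # prints: 78 87
```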

Data preparation

Depending on the format and structure of the raw tables containing the age and sex data for each country, a specific processing technique was developed and applied in order to standardise the data tables across all countries. Table 3 presents an example of a standardised table, for Bhutan, in which each record represents a subnational administrative unit and the fields contain the corresponding proportions of people (both sexes) in each 5-year age group, males and females. The main techniques employed to process the raw table data are summarised below by data source.

Table 3 Example of a standardised country table containing the proportionate values of people (both sexes) in each 5-year age group, males and females in each administrative unit.

Full size table

Processing national census data

National census data are recorded and documented according to protocols determined by national governments; hence a wide range of table formats and structures needed to be processed and standardised. Microsoft Excel was used to manually restructure the raw data tables into a common format, structure and schema, similar to the one presented in Table 3.

Processing IPUMSI data

Unlike data derived from full censuses, IPUMSI data tables are already provided in a standard format and structure, and thus a model was developed and applied to automate their standardisation. Raw country-level data are provided as comma-separated values (CSV) tables with accompanying spatial boundaries. Each CSV table contains one row per person surveyed, with the person's age and sex recorded in two fields of the same table. The raw table data were processed as follows:

Convert each CSV table to a file geodatabase table
Create new fields for each 5-year age group and sex class and populate with binary values, 0 or 1; a value of 1 in an age group field would indicate that the person's age fell in that range
Use a summarise function to sum the new fields by administrative unit (note, a ‘count’ of all records by administrative unit also provides the total sampled population)
Create and populate new fields for proportion of total population within each 5-year age group and sex class, by administrative unit
Perform QA/QC to ensure that for each administrative unit a sum of people in all 5-year age groups equates to total population
Perform QA/QC to ensure that for each administrative unit a sum of male and female equates to total population
Join new table data to spatial boundaries based on administrative unit unique IDs
Export the temporary join to a geodatabase feature class
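For readers without ArcGIS, the tabulation steps listed above (indicator fields, summarising by administrative unit, proportions and the two QA/QC checks) can be sketched with Python's standard library; the spatial join is not shown. The column names (`admin_id`, `age`, `sex`) and the sex coding (1 = male, 2 = female) are assumptions for illustration, and the real variable names and codes should be taken from the IPUMS extract's codebook. Incrementing a counter per record is equivalent to creating 0/1 fields and summing them.

```python
import csv
import io
from collections import defaultdict

SEX_CODES = {"1": "male", "2": "female"}  # assumed coding; check the codebook


def age_group(age):
    """Map a single-year age to its 5-year group label (65+ is open-ended)."""
    if age >= 65:
        return "65+"
    lo = age // 5 * 5
    return f"{lo}-{lo + 4}"


def tabulate_person_records(csv_text):
    """Sum one-row-per-person records by administrative unit and return the
    proportion of each unit's sampled population in every age group and sex."""
    counts = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(io.StringIO(csv_text)):
        unit = counts[row["admin_id"]]
        unit["total"] += 1  # the record count is the total sampled population
        unit[age_group(int(row["age"]))] += 1  # equivalent to summing 0/1 fields
        unit[SEX_CODES[row["sex"]]] += 1

    proportions = {}
    for admin_id, unit in counts.items():
        total = unit["total"]
        age_sum = sum(v for k, v in unit.items() if k not in ("total", "male", "female"))
        # QA/QC: age groups, and males plus females, must each equal the total.
        if age_sum != total or unit["male"] + unit["female"] != total:
            raise ValueError(f"QA/QC failed for administrative unit {admin_id}")
        proportions[admin_id] = {k: v / total for k, v in unit.items() if k != "total"}
    return proportions
```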

The ArcGIS Toolbox containing the ArcGIS ModelBuilder models was used to perform all tasks described above and is distributed as part of the WorldPop-DepRatioAgeStruc-v1 code⁴¹ described in the Code availability subsection below.

Processing household survey data

Household survey data are designed to be representative at both national and sub-national level, where the sub-national level usually corresponds to region or province, depending on the survey sample design and representativeness.

Information about age structure by region was derived from the raw household survey data files. Given that the format and structure of all household survey data are very similar, a data management process was developed to produce standardised outputs from the raw tables containing age and sex data. Household member files, which include information on each household member's age, sex and administrative unit of residence, were accessed and downloaded from the respective web pages, namely those of the Measure Demographic and Health Surveys (DHS) program³⁸ (which also includes MIS and AIS), the UNICEF Multiple Indicator Cluster Surveys (MICS)⁴⁰ and the Social Indicator Surveys (SIS)³⁹. Data processing was mainly performed using the SPSS (IBM) statistical software⁴². Sampling weights were applied to the calculations to ensure representativeness, following the instructions given in the data providers’ documentation. The variable containing the sampling weights was identified and made available in the appropriate SPSS format. In addition, to account for the sampling strategy adopted by each survey, the relevant strata and primary sampling unit variables were defined in SPSS (using CSPLAN ANALYSIS). Following the DHS documentation and final reports, age structures were calculated only on de facto members, generally defined as those household members who slept in the household the night before the interview. The variables corresponding to members’ age, sex and administrative unit of residence were also identified. After selecting only de facto members and weighting the data appropriately, cross tabulations were used to calculate counts and proportions of the population within each 5-year age group and sex class, by administrative unit. Finally, the outputs of the cross tabulations were exported to Excel and reformatted to match the standard table schema described in the Data preparation subsection.
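The core of the survey calculation (keep de facto members, apply sampling weights, cross-tabulate age group by region) can be sketched as below. This is a hypothetical Python analogue, not the distributed SPSS syntax: the record keys are assumptions. Because only proportions are needed, any constant scaling of the weights (DHS, for instance, stores weights multiplied by 1,000,000) cancels out; strata and primary sampling units matter for standard errors rather than for these point estimates, so they are not used here.

```python
from collections import defaultdict


def age_group(age):
    """Map a single-year age to its 5-year group label (65+ is open-ended)."""
    if age >= 65:
        return "65+"
    lo = age // 5 * 5
    return f"{lo}-{lo + 4}"


def weighted_age_proportions(members):
    """members: iterable of dicts with hypothetical keys 'region', 'age',
    'weight' and 'de_facto' (True if the person slept in the household the
    night before the interview). Returns weighted age-group proportions by region."""
    sums = defaultdict(lambda: defaultdict(float))
    for m in members:
        if not m["de_facto"]:
            continue  # age structures are computed on de facto members only
        sums[m["region"]][age_group(m["age"])] += m["weight"]

    result = {}
    for region, groups in sums.items():
        total = sum(groups.values())  # any constant weight scaling cancels here
        result[region] = {g: w / total for g, w in groups.items()}
    return result
```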

A template SPSS syntax file, showing the process for creating proportional age and sex structures from household survey data, is distributed as part of the WorldPop-DepRatioAgeStruc-v1 code⁴¹ described in the Code availability subsection below.

Joining standardised data tables to spatial boundaries

Once formatted to the standard table schema, subnational age and sex structure data were joined on a country-by-country basis, using a GIS, to their corresponding spatial boundaries, which represent the administrative unit level at which the data were assembled. The ArcGIS Toolbox tool ‘Join Field’ was used for this purpose (…\data management tools.tbx\joins\join field). IPUMSI data were the exception at this stage, as their join was carried out as part of the model workflow described in the Processing IPUMSI data subsection.

Spatial boundary datasets were obtained from a range of sources, including the GADM database⁴³, the DHS Spatial Repository⁴⁴ and multiple national statistical offices. Mismatches between the data tables and the subnational administrative units, as well as topological inconsistencies between national boundaries, were manually corrected using a GIS.
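Outside ArcGIS, the attribute join can be sketched as a lookup keyed on administrative unit ID, which also lists the mismatches that would need manual correction before the data can be mapped. The function name, data structures and the example IDs in the usage are hypothetical.

```python
def join_to_boundaries(boundary_ids, table_rows):
    """boundary_ids: IDs of the polygons in a boundary layer.
    table_rows: dict mapping administrative unit ID -> dict of proportions.
    Returns (joined, unmatched_boundaries, unmatched_rows)."""
    joined = {bid: table_rows[bid] for bid in boundary_ids if bid in table_rows}
    # Polygons with no table row, and table rows with no polygon, both need fixing.
    unmatched_boundaries = sorted(set(boundary_ids) - set(table_rows))
    unmatched_rows = sorted(set(table_rows) - set(boundary_ids))
    return joined, unmatched_boundaries, unmatched_rows
```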

Supplementary Fig. 1a,b (Africa and Asia, respectively) present the boundaries of the administrative units and each unit is coloured according to the proportion of the total population sampled and used to derive the 5-year age group and sex proportions. These country-level spatial datasets were then merged into two continental vector datasets (i.e., an African and Asian dataset) using the ArcGIS ‘Merge’ tool (toolboxes\system toolboxes\data management tools.tbx\General\Merge).

Producing subnational dependency ratio datasets

Using the two continental vector datasets described above, dependency ratios were calculated using a field calculation. The ArcGIS tool ‘Field Calculator’ was used to populate three new fields (i.e., CDR, YDR and ODR), listing subnational dependency ratio values calculated at the administrative unit level:

\begin{matrix} (1) & CDR = \dfrac{pc0\_14 + pc65}{pc15\_64} \times 100 \end{matrix}

\begin{matrix} (2) & YDR = \dfrac{pc0\_14}{pc15\_64} \times 100 \end{matrix}

\begin{matrix} (3) & ODR = \dfrac{pc65}{pc15\_64} \times 100 \end{matrix}

where CDR, YDR and ODR represent the combined, young-age and old-age dependency ratios, respectively; (pc0_14) represents the proportion of the population aged 0 to 14; (pc65) represents the proportion of the population aged 65 and over; and (pc15_64) represents the proportion of the population aged 15 to 64.
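Equations (1) to (3) can be checked with a short function. The age-group labels follow a hypothetical "0-4" … "65+" naming. Because each ratio divides by the same working-age share, raw counts give the same result as proportions, and CDR always equals YDR + ODR.

```python
YOUNG = ["0-4", "5-9", "10-14"]
WORKING = [f"{lo}-{lo + 4}" for lo in range(15, 65, 5)]  # 15-19 ... 60-64
OLD = ["65+"]


def dependency_ratios(shares):
    """shares maps 5-year group labels to proportions (or counts) for one
    administrative unit. Returns (CDR, YDR, ODR) expressed per 100."""
    young = sum(shares[g] for g in YOUNG)
    working = sum(shares[g] for g in WORKING)
    old = sum(shares[g] for g in OLD)
    cdr = 100 * (young + old) / working  # Eq. (1)
    ydr = 100 * young / working          # Eq. (2)
    odr = 100 * old / working            # Eq. (3)
    return cdr, ydr, odr


# Usage with made-up counts: 30 children, 50 working-age adults, 20 elders.
example = {**{g: 10 for g in YOUNG}, **{g: 5 for g in WORKING}, "65+": 20}
print(dependency_ratios(example))  # (100.0, 60.0, 40.0)
```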

All three subnational dependency ratio vector datasets were rasterized at a resolution of 30 arc seconds (approximately 1 km at the equator). Figure 2 illustrates the spatial distribution of the subnational YDR in Africa and Asia.

**Figure 2: Subnational young age dependency ratio (YDR) datasets, circa 2010.**

Producing subnational 5-year age/sex group count datasets

At this stage, to produce high resolution gridded 5-year age/sex group count datasets for 2010, the WorldPop continental gridded population count datasets for Africa and Asia (http://www.worldpop.org.uk/data/data_sources/), with the total population of each country adjusted to match United Nations Population Division (UNPD) estimates for 2010 (ref. 35), were multiplied by the assembled subnational proportional age and sex structures.

The first step in this process was to convert the subnational administrative units, represented by polygon features in the two continental vector datasets described above, from vector to raster format. Each polygon feature (representing a subnational administrative unit) is attributed with proportionate values for each 5-year age group and sex class (i.e., the proportion of the total population in each administrative unit belonging to each 5-year age group and sex class). The conversion produced a stack of raster grids (one for each 5-year age group and sex class), in which each pixel retained the proportionate value, for that 5-year age group or sex class, of the subnational administrative unit in which the pixel is located. Because these grids are subsequently overlaid on the WorldPop gridded continental population count datasets for Africa and Asia, accurate cell-by-cell calculations require the mesh of each grid to be identical to that of the WorldPop datasets in both cell size (resolution) and cell alignment. The conversion uses the ArcGIS tool Polygon to Raster. A Python script was created that calls this tool and applies it iteratively, while maintaining the cell properties described above (using Geoprocessing Environment Settings), to all fields in the age/sex structure vector dataset containing the 5-year age group and male/female proportions. The ArcGIS Toolbox containing the Geoprocessing tool that calls the Python script is distributed as part of the WorldPop-DepRatioAgeStruc-v1 code⁴¹ described in the Code availability subsection below.
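The alignment requirement means every output raster must share the reference grid's cell size and have an origin offset by a whole number of cells. In ArcGIS this is presumably handled through environment settings such as Cell Size and Snap Raster; the sketch below instead shows the underlying arithmetic with hypothetical function names and origins. It also shows where "approximately 1 km" comes from: 30 arc seconds is 1/120 of a degree, or about 0.93 km at the equator.

```python
import math

CELL = 1 / 120  # 30 arc seconds expressed in degrees


def cell_index(lon, lat, origin_lon, origin_lat, cell=CELL):
    """Row/column of the cell containing (lon, lat) on a north-up grid whose
    upper-left corner is (origin_lon, origin_lat)."""
    col = math.floor((lon - origin_lon) / cell)
    row = math.floor((origin_lat - lat) / cell)
    return row, col


def grids_aligned(origin_a, origin_b, cell_a, cell_b, tol=1e-9):
    """Two grids align if their cell sizes match (within tol degrees) and their
    origins differ by a whole number of cells (within tol cells)."""
    if abs(cell_a - cell_b) > tol:
        return False
    for a, b in zip(origin_a, origin_b):
        offset = (a - b) / cell_a
        if abs(offset - round(offset)) > tol:
            return False
    return True


km_per_degree = 40075.017 / 360  # equatorial circumference / 360
print(round(CELL * km_per_degree, 3))  # 0.928 km per cell at the equator
```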

ArcGIS ModelBuilder was then used to automate the processing of the resulting stack of raster grids (representing gridded subnational 5-year age group proportions). The model iterated through the stack, sending each raster to the Raster Calculator tool together with the corresponding WorldPop continental gridded population count dataset (Africa or Asia), and both were used in a simple map algebra calculation:

\begin{matrix} (4) & AGC = AGP \times WPPC \end{matrix}

(where AGC represents the resulting gridded population count dataset for a given 5-year age group, AGP is the gridded proportion dataset for the corresponding age group and WPPC is the WorldPop continental gridded population count dataset either for Africa or Asia).
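Equation (4) is a cell-by-cell product of two aligned grids. A minimal sketch using nested Python lists, with a hypothetical NODATA marker (`None`) for cells outside land or outside any administrative unit; production work would normally use a raster library instead:

```python
NODATA = None  # hypothetical marker for cells without data


def age_group_counts(agp, wppc):
    """AGC = AGP x WPPC, applied cell by cell to two aligned grids of
    identical shape; NODATA in either input gives NODATA in the output."""
    if len(agp) != len(wppc) or any(len(a) != len(w) for a, w in zip(agp, wppc)):
        raise ValueError("grids must have identical shape (same mesh)")
    return [
        [NODATA if p is NODATA or n is NODATA else p * n for p, n in zip(prow, nrow)]
        for prow, nrow in zip(agp, wppc)
    ]
```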

This model produced a series of grids for Africa and Asia, each providing estimates of the population count for a specific age group (circa 2010) at the grid cell level. To disaggregate them by sex, two similar models, one for males and one for females, iterated through this stack to produce sex-specific population counts for each age group (circa 2010) at the grid cell level. These two models use the grids of subnational male and female proportions generated in the rasterization step described above.
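The sex split can be sketched as a second cell-by-cell product. As the text reads, each administrative unit carries one male proportion and one female proportion, so the same split is applied to every age group within that unit. Because the two proportions sum to 1, the male and female grids add back up to the age-group grid. The function name is hypothetical.

```python
def split_by_sex(age_counts, male_prop, female_prop):
    """Split one age-group count grid into male and female count grids by
    multiplying, cell by cell, with the gridded male and female proportions."""
    males = [[c * m for c, m in zip(crow, mrow)]
             for crow, mrow in zip(age_counts, male_prop)]
    females = [[c * f for c, f in zip(crow, frow)]
               for crow, frow in zip(age_counts, female_prop)]
    return males, females
```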

Figure 3 illustrates an application of these datasets, providing gridded estimates of the young and working-age population distributions (aged 0 to 14 and 15 to 64, respectively) for Africa and Asia.

**Figure 3: Circa 2010 High-resolution gridded population distribution presenting age structures for mainland Africa and Madagascar and the Asian region.**

The ArcGIS Toolbox containing the ArcGIS ModelBuilder models used to perform the steps described above is distributed as part of the WorldPop-DepRatioAgeStruc-v1 code⁴¹ described in the Code availability subsection below.

Code availability

The WorldPop-DepRatioAgeStruc-v1 code⁴¹, used to produce the datasets described in this article, is publicly available through Figshare. It consists of (1) an ArcToolbox Geoprocessing Tool to pre-process the raw IPUMSI data, (2) an SPSS (IBM version 22) script to pre-process the DHS raw data and (3) an ArcToolbox Geoprocessing Tool to generate gridded age group structure and sex class proportion datasets and combine them with gridded population count datasets to produce the gridded age/sex structure count datasets. All of them are internally documented in order to both briefly explain their purpose and, when required, guide the user through their customization.

Data Records

All datasets described in this article, for the 87 countries listed in Tables 1 and 2, are publicly and freely available through both the WorldPop Dataverse Repository (Data Citation 1, Data Citation 2, Data Citation 3, Data Citation 4) and the WorldPop website (http://www.worldpop.org.uk/data/data_sources/). The datasets available through the WorldPop Dataverse Repository will be preserved in their published form, while those available through the WorldPop website will be extended to additional countries (Middle Eastern countries and Japan). Furthermore, gridded subnational dependency ratio datasets and high resolution gridded 5-year age/sex group count datasets for all countries in Latin America and the Caribbean will soon be available through the WorldPop website.

Gridded subnational dependency ratio datasets and high resolution gridded 5-year age/sex group count datasets for each country listed in Tables 1 and 2 can be obtained by (i) downloading the corresponding datasets associated with the continent in which the country of interest is located and (ii) using the accompanying gridded ISO country code mask⁴⁵ to extract them (Table 4).
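Step (ii) amounts to keeping only the cells whose mask value equals the country's code. The sketch assumes the continental grid and the mask are already aligned, uses nested lists in place of rasters, and uses a hypothetical NODATA marker; the numeric codes in the usage (64 for Bhutan, 524 for Nepal, as in ISO 3166-1 numeric) are illustrative, and the actual encoding should be checked in the mask's documentation.

```python
NODATA = None  # hypothetical marker for cells outside the selected country


def extract_country(continental, iso_mask, code):
    """Copy of a continental grid with every cell whose mask value differs
    from `code` set to NODATA. Both grids must share the same mesh."""
    return [
        [v if m == code else NODATA for v, m in zip(vrow, mrow)]
        for vrow, mrow in zip(continental, iso_mask)
    ]


def country_total(grid):
    """Sum of all non-NODATA cells, e.g. a country population total."""
    return sum(v for row in grid for v in row if v is not NODATA)


# Usage with a made-up 2 x 2 grid.
out = extract_country([[1.0, 2.0], [3.0, 4.0]], [[64, 64], [524, 64]], 64)
print(out, country_total(out))  # [[1.0, 2.0], [None, 4.0]] 7.0
```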

Table 4 Name, description (from XX to YY represents all 5-year age groups from 00 up to a final group aged 65 and over), format, resolution and DOI of all datasets available for Africa and Asia (including the gridded ISO country code mask⁴⁵ needed to extract country datasets from the continental datasets).

Full size table

Technical Validation

All data collected, assembled and used were (i) already validated by the corresponding data collector, owner and/or distributor, and (ii) further checked, in the framework of this project, to ensure that they represent genuine variation, by inspecting the proportions in each age class and sex group and making sure they were within reasonable ranges on a country-by-country basis. Because both the gridded subnational dependency ratio datasets and the high resolution gridded 5-year age/sex group count datasets were produced solely by processing these input data, the outputs were verified by inspecting them for abnormal values and nonsensical results. For the high resolution gridded 5-year age/sex group count datasets, for both Africa and Asia, this was done by summing all of them into a single dataset (depicting the total number of people for all age groups and both sexes at the grid cell level) and then subtracting it from the corresponding WorldPop continental gridded population count dataset, to make sure that country totals matched the UNPD estimates for 2010. Subnational dependency ratios were also evaluated by examining their spatial distribution and by comparing them against existing UN country-level estimates.
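The sum-and-subtract check can be sketched as below, with nested lists standing in for rasters and a hypothetical function name. A result close to zero (allowing for floating-point rounding) indicates that the age/sex grids add back up to the total population grid.

```python
def max_residual(component_grids, total_grid):
    """Largest absolute cell difference between the total population grid and
    the cell-wise sum of all age/sex count grids."""
    worst = 0.0
    for r, total_row in enumerate(total_grid):
        for c, total in enumerate(total_row):
            summed = sum(g[r][c] for g in component_grids)
            worst = max(worst, abs(total - summed))
    return worst
```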

Comparing sub-national dependency ratios against national estimates

The distributions of the YDR at subnational scales for Africa and Asia were plotted against World Bank national estimates for each country’s corresponding year. Figure 4 indicates the size of the subnational variation in the YDR, captured by the subnational datasets, that is masked when averaging at the national level.

**Figure 4: Differences between national and subnational young-age dependency ratios (YDR) for Asian and African countries.**

The observed variation also depends on the administrative unit level of the input data: the countries for which the most spatially detailed YDR were available (i.e., Tanzania, South Africa and China) show the largest differences between minimum and maximum estimates. Nevertheless, even for countries where subnational data were only available at administrative unit level 1, subnational differences from the World Bank national estimates are evident. Additionally, national estimates are often strongly influenced by densely populated areas, whose rates dominate the averages; this is reflected in the World Bank estimates falling outside the interquartile range of the boxplots in many cases (Fig. 4).
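Whether a national estimate falls outside the interquartile range of the subnational values can be checked with Python's `statistics.quantiles` (Python 3.8+). The function name and input values are made up for illustration, and the quartile method is a choice: plotting libraries may use a different interpolation, which can shift the quartiles slightly.

```python
import statistics


def outside_iqr(subnational_values, national_value):
    """True if the national estimate lies outside the interquartile range of
    the subnational estimates (quartiles via the 'inclusive' method)."""
    q1, _, q3 = statistics.quantiles(subnational_values, n=4, method="inclusive")
    return not (q1 <= national_value <= q3)


# Made-up subnational YDR values; their quartiles are 70 and 90.
ydr_units = [60, 70, 80, 90, 100]
print(outside_iqr(ydr_units, 65), outside_iqr(ydr_units, 85))  # True False
```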

Sensitivity tests on differences over time in population structures

All available sources as close as possible to 2010 at the time of data processing were considered in this work. However, for some countries only relatively outdated data were available. To test the assumption that age structures did not change substantially between years, the absolute change (difference) in age-group-specific percentage points was examined in countries where two or more comparable time points were available from household surveys. Two countries in Asia and one in sub-Saharan Africa were selected: Nepal (DHS Surveys 2011–2006 & 2011–2001), Kazakhstan (MICS Surveys 2011–2006) and Guinea (DHS Surveys 2011–2005 & 2011–1999). The absolute change (difference) in percentage points was calculated for each survey region and for the total country (Table 5) (available online only).
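The per-group comparison can be sketched as follows. Here "absolute change" is read as the signed difference in percentage points (later minus earlier), as opposed to a relative change; that reading, the function name and the example shares are assumptions.

```python
def change_in_pp(shares_later, shares_earlier):
    """Difference (later minus earlier), in percentage points, of each age
    group's share of the population between two survey rounds."""
    return {g: 100 * (shares_later[g] - shares_earlier[g]) for g in shares_later}


# Usage with made-up shares for one region.
d = change_in_pp({"0-4": 0.14, "65+": 0.05}, {"0-4": 0.16, "65+": 0.04})
largest = max(d, key=lambda g: abs(d[g]))  # group with the biggest shift
print(largest)  # 0-4
```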

Table 5 Absolute change (difference) in age-group-specific percentage points between survey round years in Nepal (DHS Surveys 2011–2006 & 2011–2001), Kazakhstan (MICS Surveys 2011–2006) and Guinea (DHS Surveys 2011–2005 & 2011–1999), for each region and total country


In addition, for the same countries and all available time points, population pyramids were constructed and their structures compared between years (Supplementary Figs 2a–4c). The population pyramids constructed for this test follow the methodology described in the Methods section and therefore reflect the input data used to construct the datasets described in this article. For consistency with that method, country-level proportions of males and females were applied to all age groups. Overall, the results of these tests suggest that differences are relatively small. At the country level, both the percentage-point differences and the pyramid comparisons show the largest differences in the younger age groups, a pattern consistent across all tested countries. Country-level tests for Kazakhstan also show larger percentage-point differences for some older age groups (60–64 and 65–69) than for the other age groups. At the subnational level, the most marked differences relative to other regions and age classes are found in the Central and Far-western regions of Nepal when comparing the 2001 and 2011 surveys, and in Mangystau oblysy (Kazakhstan, 2011–2006) for age group 0–5.

Usage Notes

The gridded subnational dependency ratio datasets and high resolution gridded 5-year age/sex structure datasets described in this article can support a range of applications, from planning interventions and designing strategies to deriving health and development metrics, as well as predicting response variables that depend intrinsically on dependency ratios and age/sex structures.

Ongoing work involves integrating these datasets with health and development information to plan interventions. This can include, for example, information on vaccination rates among children under the age of five²⁶, access to antenatal, delivery and postnatal care for women of reproductive age⁴⁶, and disease prevalence and burden^47,48.
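As an illustration of deriving such metrics, the sketch below sums 5-year age/sex count grids cell by cell to obtain the number of women of reproductive age (15–49) and of children under five. The grids are tiny in-memory stand-ins keyed by a hypothetical (sex, lower age bound) scheme; real use would read the published rasters with a GIS library and handle nodata cells, which is not shown here.

```python
Grid = list[list[float]]


def cellwise_sum(grids: list[Grid]) -> Grid:
    """Add equally sized grids (lists of rows) cell by cell."""
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*grids)]


def select(layers: dict[tuple[str, int], Grid], sexes: list[str], starts: list[int]) -> Grid:
    """Sum the 5-year age/sex grids for the given sexes and group lower bounds."""
    return cellwise_sum([layers[(s, a)] for s in sexes for a in starts])


# Hypothetical 2x2 count grids for each sex and 5-year group (0-4 ... 80+).
layers = {
    (sex, start): [[10.0, 12.0], [8.0, 0.0]]
    for sex in ("f", "m")
    for start in range(0, 85, 5)
}

women_15_49 = select(layers, ["f"], list(range(15, 50, 5)))  # groups 15-19 ... 45-49
under_five = select(layers, ["f", "m"], [0])                 # group 0-4, both sexes

print("women 15-49 per cell:", women_15_49)
print("children under five per cell:", under_five)
```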

These datasets can also serve as a baseline for measuring demographic progress on topics such as ageing, population growth, population dynamics and projections^5,6,8,15,49,50. Furthermore, they can be combined with economic data to assess the association between the population pyramid and economic development in a given country, as well as to study the effects of population structure on savings and growth⁵¹. Finally, age and sex compositions can be used to anticipate upcoming demographic events and their implications for the types of facilities and services (e.g., schools and maternity services) the population will need.
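Measuring ageing from the 5-year age groups typically starts by aggregating them into broad dependency ratios. The sketch below assumes the conventional cut-offs (young 0–14, working age 15–64, old 65 and over, with an open-ended top group) and hypothetical counts keyed by each group's lower age bound; the total ratio is the sum of the young and old ratios.

```python
def dependency_ratios(counts: dict[int, float]) -> dict[str, float]:
    """Young, old and total dependency ratios (per 100 aged 15-64) from 5-year groups.

    Keys are the lower bound of each 5-year age group; the highest key may be an
    open-ended group (e.g. 80 meaning 80+), which falls in the old-age sum.
    """
    young = sum(n for start, n in counts.items() if start < 15)
    working = sum(n for start, n in counts.items() if 15 <= start < 65)
    old = sum(n for start, n in counts.items() if start >= 65)
    ydr = 100.0 * young / working
    odr = 100.0 * old / working
    return {"young": ydr, "old": odr, "total": ydr + odr}


# Hypothetical counts for one grid cell or administrative unit.
counts = {start: 100.0 for start in (0, 5, 10)}             # 300 aged 0-14
counts.update({start: 60.0 for start in range(15, 65, 5)})  # 600 aged 15-64
counts.update({start: 15.0 for start in (65, 70, 75, 80)})  # 60 aged 65+

print(dependency_ratios(counts))
```

Tracking the old-age component separately makes it possible to see ageing even where the total ratio is stable because falling young-age dependency offsets it.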

Although the subnational dependency ratio and age/sex structure datasets presented here represent the best available datasets at this level of spatial detail and resolution, limitations remain. In particular, all of the census data, census microdata and survey-based data used here are subject to various sources of error and bias.

Surveys often do not sample certain groups and places (e.g., indigenous groups, informal settlements, places experiencing civil unrest and refugee camps) because of political biases, missing sampling frames or security issues. These groups are therefore not covered by surveys, and in some cases are also excluded from national censuses. Household survey data are in most cases sampled using complex survey design procedures⁵², and estimates derived from household survey datasets are usually representative at administrative level 1 (e.g., provinces) or at groupings of such units. IPUMS-International microdata are samples drawn from population censuses conducted by national statistical offices around the world since 1960, using different sampling designs (https://international.ipums.org/international/variance_estimation.shtml). Furthermore, they cover only a fraction of the population (approximately 5%) and, like any sample, are subject to sampling error³⁷.
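To give a sense of the sampling error mentioned above, the sketch below computes an approximate 95% confidence interval for the share of a unit's population aged 0–14 estimated from a roughly 5% sample. It assumes simple random sampling with a finite population correction, inflated by a user-supplied design effect; actual IPUMS and household survey designs differ, and the variance estimation guidance linked above should be preferred. All inputs are hypothetical.

```python
from math import sqrt


def proportion_ci(p: float, n: int, N: int, deff: float = 1.0,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Approximate confidence interval for a sample proportion.

    Uses the simple-random-sampling variance p(1-p)/n, multiplied by a design
    effect (deff) and by the finite population correction (1 - n/N).
    Returns (standard error, lower bound, upper bound).
    """
    variance = deff * p * (1.0 - p) / n * (1.0 - n / N)
    se = sqrt(variance)
    return se, p - z * se, p + z * se


# Hypothetical unit: 200,000 residents, a 5% census sample, 42% observed aged 0-14.
se, low, high = proportion_ci(p=0.42, n=10_000, N=200_000, deff=1.5)
print(f"SE = {se:.4f}, 95% CI = {low:.3f} to {high:.3f}")
```

For smaller units the sample size shrinks accordingly, so the interval widens roughly with the inverse square root of the unit's population.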

Uncertainties also arise when comparing estimates of dependency ratios and age/sex structures based primarily on full censuses with those based on IPUMSI microdata and household survey data. Differences in how these were measured contribute to uncertainty in such comparisons, although strong correlations between age structures derived from household surveys and those derived from census data suggest that these differences may be small²⁴. For some countries, the input data used in this project were also relatively outdated and/or coarse: while the precision with which heterogeneities in dependency ratios and age/sex structures are mapped improves on simple country-level estimates, the datasets presented here remain limited by the spatial detail of the inputs, which ranges from a single set of values for Libya to thousands for China. Moreover, like most other population characteristics reported at the administrative unit level, the dependency ratios and age/sex structures are also subject to the modifiable areal unit problem⁵³.
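The modifiable areal unit problem can be illustrated with a toy example: the same four small areas, grouped into two units in two different ways, give identical totals but very different unit-level YDRs. The counts and zonings below are hypothetical.

```python
def ydr(young: float, working: float) -> float:
    """Young-age dependency ratio: persons aged 0-14 per 100 persons aged 15-64."""
    return 100.0 * young / working


def zone_ydrs(cells: dict[str, tuple[float, float]],
              zoning: dict[str, list[str]]) -> dict[str, float]:
    """YDR of each zone, computed by pooling the counts of its member cells."""
    result = {}
    for zone, members in zoning.items():
        young = sum(cells[c][0] for c in members)
        working = sum(cells[c][1] for c in members)
        result[zone] = ydr(young, working)
    return result


# Hypothetical (aged 0-14, aged 15-64) counts for four small areas.
cells = {"A": (10, 40), "B": (40, 40), "C": (30, 40), "D": (20, 40)}

zoning_1 = {"north": ["A", "B"], "south": ["C", "D"]}
zoning_2 = {"west": ["A", "D"], "east": ["B", "C"]}

print(zone_ydrs(cells, zoning_1))  # both zones have the same YDR
print(zone_ydrs(cells, zoning_2))  # same data, a wide spread between zones
```

The overall ratio is 62.5 under either zoning; only the apparent subnational heterogeneity changes with the choice of boundaries.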

Population data disaggregated by both sex and age group at the subnational administrative unit level were available for only a minority of countries. In most cases, the subnational data were disaggregated by age group only, with a single two-way sex split. Therefore, to maintain comparability across countries, the subnational sex split was simply applied to all age groups within the corresponding subnational administrative unit.
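The sketch below shows what applying one sex split to every age group implies: the male-to-female ratio becomes the same in every age group of a unit, so age-specific differences in sex ratio (for example, more women than men at older ages) are not represented. The unit-level male share and the age-group counts are hypothetical.

```python
def apply_sex_split(age_counts: dict[str, float],
                    male_share: float) -> dict[str, tuple[float, float]]:
    """Split each age-group count into (males, females) using one unit-wide male share."""
    return {group: (n * male_share, n * (1.0 - male_share))
            for group, n in age_counts.items()}


# Hypothetical age-group totals for one subnational unit and its overall male share.
age_counts = {"0-4": 1000.0, "5-9": 900.0, "65-69": 200.0}
split = apply_sex_split(age_counts, male_share=0.48)

for group, (males, females) in split.items():
    # The sex ratio (males per female) is identical across all age groups.
    print(f"{group:>6}: males={males:.0f}, females={females:.0f}, "
          f"ratio={males / females:.3f}")
```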

Additional Information

How to cite this article: Pezzulo, C. et al. Sub-national mapping of population pyramids and dependency ratios in Africa and Asia. Sci. Data 4:170089 doi: 10.1038/sdata.2017.89 (2017).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. United Nations. World Population Prospects: The 2015 Revision, Key Findings and Advance Tables. Department of Economic and Social Affairs, Population Division (UNDESA) (2015).
2. United Nations. Population Dynamics in the Post-2015 Development Agenda: Report of the Global Thematic Consultation on Population Dynamics. United Nations Population Fund (2013).
3. Herrmann, M. Factoring population dynamics into sustainable development. Development Co-operation Report 2012: Lessons in Linking Sustainability and Development (OECD), 10.1787/dcr-2012-11-en (2012).
4. Das Gupta, M., Bongaarts, J. & Cleland, J. Population, Poverty, and Sustainable Development: A Review of the Evidence (World Bank, 2011).
5. Bongaarts, J. in Population Matters: Demographic Change, Economic Growth, and Poverty in the Developing World (Oxford University Press, 2001).
6. Bloom, D. E. & Canning, D. Demographics and Development Policy. Development Outreach 13, 77–81 (2011).
7. Cohen, J. E. Human Population: The Next Half Century. Science 302, 1172–1175 (2003).
8. United Nations. World Population Policies 2013. Department of Economic and Social Affairs, Population Division (UNDESA) (2013).
9. Shryock, H. S. & Siegel, J. S. The Methods and Materials of Demography (US Dept. of Commerce, Bureau of the Census, 1973).
10. United Nations. Definition of the indicators of population ageing in World Population Ageing: 1950–2050, Annex I. United Nations Publications, 41–42 (2002).
11. Bloom, D. E., Canning, D. & Sevilla, J. The Demographic Dividend: A New Perspective on the Economic Consequences of Population Change. Population Matters Series (2003).
12. Bloom, D. E., Canning, D., Fink, G. & Finlay, J. E. Microeconomic Foundations of the Demographic Dividend. PGDA Working Paper 93 (2012).
13. Williamson, J. in Population Matters: Demographic Change, Economic Growth and Poverty in the Developing World (eds Birdsall, N., Kelley, A. & Sinding, S.) (Oxford University Press, 2001).
14. Schultz, T. P. Demographic Determinants of Savings: Estimating and Interpreting the Aggregate Association in Asia. IZA Discussion Paper No. 1479; Yale University Economic Growth Center Discussion Paper Series No. 901 (2005).
15. Leahy, E., Engelman, R., Vogel, C. G., Haddock, S. & Preston, T. The Shape of Things to Come: Why Age Structure Matters to a Safer, More Equitable World (Population Action International, 2007).
16. United Nations. Demographic Indicators: Dependency Ratio. Department of Economic and Social Affairs, Population Division (UNDESA), 104–106 (2013).
17. United Nations. Human Development Report 2013. The Rise of the South: Human Progress in a Diverse World. United Nations Development Programme (UNDP) (2013).
18. Van der Ven, R. & Smits, J. The demographic window of opportunity: age structure and sub-national economic growth in developing countries (Nijmegen Center for Economics (NiCE), Institute for Management Research, Radboud University Nijmegen, 2011).
19. Eastwood, R. & Lipton, M. Demographic transition in sub-Saharan Africa: How big will the economic dividend be? Population Studies 65, 9–35 (2011).
20. Madhavan, S., Schatz, E. & Clark, B. Effect of HIV/AIDS-related mortality on household dependency ratios in rural South Africa, 2000–2005. Population Studies 63, 37–51 (2009).
21. Fayissa, B. & Gutema, P. Dependency Ratio and the Economic Growth Puzzle in Sub-Saharan Africa (Murfreesboro, Tenn.: Middle Tennessee State Univ., Dep. of Economics and Finance, 2010).
22. Hadley, C., Belachew, T., Lindstrom, D. & Tessema, F. The shape of things to come? Household dependency ratio and adolescent nutritional status in rural and urban Ethiopia. American Journal of Physical Anthropology 144, 643–652 (2011).
23. File, T. & Kominski, R. Dependency Ratios in the United States: A State and Metropolitan Area Analysis. Data from the 2009 American Community Survey (Social, Economic, and Household Statistics Division (SEHSD), U.S. Census Bureau, 2012).
24. Tatem, A. J. et al. Millennium development health metrics: where do Africa’s children and women of childbearing age live? Population Health Metrics 11, 11 (2013).
25. Rossouw, S. & Webber, D. Millennium Development Goals: A Concern Over Sub-National Variations. International Business & Economics Research Journal 10, 127–136 (2011).
26. Alegana, V. A. et al. Fine resolution mapping of population age-structures for health and development applications. Journal of The Royal Society Interface 12 (2015).
27. Tatem, A. J. et al. Mapping populations at risk: improving spatial demographic data for infectious disease modeling and metric derivation. Population Health Metrics 10, 1–14 (2012).
28. Bigman, D. & Srinivasan, P. V. Geographical targeting of poverty alleviation programs: methodology and applications in rural India. Journal of Policy Modeling 24, 237–255 (2002).
29. Elbers, C., Fujii, T., Lanjouw, P., Oezler, B. & Yin, W. Poverty alleviation through geographic targeting: How much does disaggregation help? Journal of Development Economics 83, 198–213 (2007).
30. United Nations. Sub-National Estimates of Human Capital Indicators: Localizing Investments for the Demographic Dividend. United Nations Population Fund (2016).
31. Bassett, T. J. & Winter-Nelson, A. in The Atlas of World Hunger Ch. 28 (University of Chicago Press, 2007).
32. Admassie, A. Explaining the High Incidence of Child Labour in Sub-Saharan Africa. African Development Review 14, 251–275 (2002).
33. Vijayakuma, S. An Empirical Study on the Nexus of Poverty, GDP Growth, Dependency Ratio and Employment in Developing Countries. Journal of Competitiveness 5, 67–82 (2013).
34. Tatem, A. J. WorldPop, open data for spatial demography. Scientific Data 4, 170004 (2017).
35. United Nations. World Population Prospects: The 2015 Revision. Population indicators. United Nations Department of Economic and Social Affairs Population Division (UNDESA) (2016).
36. United Nations Development Programme (UNDP). World Urbanization Prospects: The 2014 Revision. CD-ROM Edition. Available at http://esa.un.org/unpd/wup/CD-ROM/ (2014).
37. Minnesota Population Center. Integrated Public Use Microdata Series, International: Version 6.4 [Machine-readable database]. Available at https://international.ipums.org/international/citation.shtml (University of Minnesota, 2015).
38. ICF International. Demographic and Health Surveys (Various) [Datasets] (ICF International [Distributor], 2000–2015).
39. Ministry of Health, Lao Statistics Bureau & UNICEF. Lao PDR: Social Indicator Survey 2011–2012 [Dataset]. Available at http://microdata.worldbank.org/ (2012).
40. United Nations International Children's Fund (UNICEF). Multiple Indicator Cluster Survey (MICS) [Various Datasets]. Available at http://www.childinfo.org/mics4_surveys.html (United Nations, 2000–2015).
41. Pezzulo, C. et al. Source code for: Sub-national mapping of population pyramids and dependency ratios in Africa and Asia. 10.6084/m9.figshare.4596187.v1 (2017).
42. IBM. SPSS Statistics for Windows, Version 22.0. IBM Corp. (2013).
43. GADM. Database of Global Administrative Areas. Available at http://www.gadm.org/ (2015).
44. ICF International. Spatial Data Repository, the DHS Program (Various) [Datasets] (ICF International [Distributor], 2000–2015).
45. Lloyd, C. T., Sorichetta, A. & Tatem, A. J. High resolution global gridded data for use in population studies. Scientific Data 4, 170001 (2017).
46. Tatem, A. J. et al. Mapping for maternal and newborn health: the distributions of women of childbearing age, pregnancies and births. International Journal of Health Geographics 13, 2 (2014).
47. Lozano, R. et al. Global and regional mortality from 235 causes of death for 20 age groups in 1990 and 2010: a systematic analysis for the Global Burden of Disease Study 2010. The Lancet 380, 2095–2128 (2012).
48. Wang, H. et al. Age-specific and sex-specific mortality in 187 countries, 1970–2010: a systematic analysis for the Global Burden of Disease Study 2010. The Lancet 380, 2071–2094 (2012).
49. United Nations. World Population Ageing, 1950–2050. Department of Economic and Social Affairs, Population Division (UNDESA) (United Nations, 2002).
50. Madsen, E., Daumerie, B. & Hardee, K. The effects of age structure on development. Policy and issue brief (Population Action International, 2010).
51. Cruz, M. & Amer, A. On the Impact of Demographic Change on Growth, Savings, and Poverty (World Bank, 2016).
52. ICF International. Demographic and Health Survey Sampling and Household Listing Manual. MEASURE DHS (ICF International, Calverton, MD, USA, 2012).
53. Openshaw, S. The Modifiable Areal Unit Problem (Geo Books, 1984).

Data Citations

Pezzulo, C. Harvard Dataverse http://dx.doi.org/10.7910/DVN/S5JHQN (2017)
Pezzulo, C. Harvard Dataverse http://dx.doi.org/10.7910/DVN/6TPPZ8 (2017)
Pezzulo, C. Harvard Dataverse http://dx.doi.org/10.7910/DVN/4MJN3G (2017)
Pezzulo, C. Harvard Dataverse http://dx.doi.org/10.7910/DVN/GUSJUZ (2017)


Acknowledgements

This work is supported by funding from the Bill & Melinda Gates Foundation (OPP1134076). C.P. is supported by funding from the Bill & Melinda Gates Foundation (OPP1106427). G.M.H. and A.S. are supported by funding from the Bill & Melinda Gates Foundation (OPP1106427, 1032350). A.E.G. is supported by funding from Google (OICB150153) and the Bill & Melinda Gates Foundation (OPP1134076). C.L. is supported by funding from the Belgian Science Policy (SR/00/304). A.J.T. is supported by funding from NIH/NIAID (U19AI089674) and acknowledges the support of the RAPIDD program of the Science and Technology Directorate, Department of Homeland Security, and the Fogarty International Center, National Institutes of Health. A.J.T. is also supported by the Bill & Melinda Gates Foundation (OPP1106427, 1032350, OPP1134076, OPP1094793), the Clinton Health Access Initiative as well as a Wellcome Trust Sustaining Health Grant (106866/Z/15/Z). This work forms part of the WorldPop Project (www.worldpop.org.uk) and Flowminder Foundation (www.flowminder.org).

Author information

Authors and Affiliations

Department of Geography and Environment, WorldPop, University of Southampton, Southampton, SO17 1BJ, UK
Carla Pezzulo, Graeme M. Hornby, Alessandro Sorichetta, Tomas J. Bird, David Kerr, Christopher T. Lloyd & Andrew J. Tatem
Flowminder Foundation, Roslagsgatan 17, Stockholm, SE-11355, Sweden
Carla Pezzulo, Alessandro Sorichetta, Tomas J. Bird & Andrew J. Tatem
GeoData, University of Southampton, Southampton, SO17 1BJ, UK
Graeme M. Hornby
Department of Geography and Geosciences, University of Louisville, Louisville, 40292, Kentucky, USA
Andrea E. Gaughan
Department of Geography, University of Namur, Namur, B-5000, Belgium
Catherine Linard
Spatial Epidemiology Lab (SpELL), Université Libre de Bruxelles, Brussels, B-1050, Belgium
Catherine Linard


Contributions

C.P., G.M.H. and A.S. drafted the manuscript. C.P., G.M.H., A.S., A.E.G., T.J.B., C.T.L. and D.K. acquired and assembled the raw data. G.M.H., A.S. and D.K. produced the WorldPop gridded continental population count datasets for 2010. G.M.H. and D.K. produced the final datasets. C.P., G.M.H. and D.K. performed the technical validation of the raw data and final datasets. G.M.H., A.S. and C.T.L. produced the gridded ISO country code datasets accompanying the final datasets. C.P. and G.M.H. prepared the tables and figures. A.J.T. aided in drafting the manuscript. C.P., G.M.H., C.T.L. and A.J.T. conceived the study. All authors read and approved the final version of the manuscript.

Corresponding author

Correspondence to Carla Pezzulo.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

ISA-Tab metadata

Supplementary information

Supplementary Information (PDF 806 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files made available in this article.


About this article

Cite this article

Pezzulo, C., Hornby, G., Sorichetta, A. et al. Sub-national mapping of population pyramids and dependency ratios in Africa and Asia. Sci Data 4, 170089 (2017). https://doi.org/10.1038/sdata.2017.89


Received: 09 March 2017
Accepted: 02 June 2017
Published: 19 July 2017
DOI: https://doi.org/10.1038/sdata.2017.89
