Maps are key to revealing public-health challenges and suggesting potential solutions. Physician John Snow’s map of cholera cases and public wells in London in the 1850s was an early example of this, famously revealing that cholera is spread through contaminated water¹. Two papers²,³ in Nature now provide high-resolution maps of health and education levels for children across Africa. Their maps will draw attention to areas most in need of support, and guide future interventions to where they can have the greatest impact.
Public-health policies and decision-making are often local, and so, ideally, are based on information on a small spatial scale. For example, when analysing disparities in immunization rates between urban and rural areas, information at the level of a village is often more valuable than district-level information. Like most disciplines in the twenty-first century, public health benefits from access to an unprecedented amount of data that can provide information not only on smaller spatial scales than was previously possible, but also more frequently — an exciting prospect for scientists and policymakers.
However, individual data sets often do not provide the information needed to answer key policy-relevant questions. For example, to estimate the relationship between maternal education level and infant mortality, it might be necessary to combine data from multiple sources. In much the same way as one can make educated guesses about missing pieces in a partially completed puzzle, it can be possible to find hidden information by piecing together data from different sources. But because each data source comes with its own set of errors and complexities, sophisticated statistical methods are required to both integrate the data into usable information and reflect its intrinsic uncertainties. An example of one such set of methods is Bayesian geostatistics, which combines data from multiple locations and data sets, exploiting their spatial correlations to predict values for regions for which information is lacking, while also providing information about the uncertainties involved in these predictions.
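As a rough illustration of the idea (not the models used in the two papers), the sketch below implements a zero-mean Gaussian-process predictor, one basic building block of geostatistics. The squared-exponential covariance, the parameter values and the function names are assumptions chosen for brevity. A full Bayesian analysis would also place prior distributions on these parameters rather than fixing them.

```python
import numpy as np

def sq_exp_kernel(a, b, variance=1.0, length_scale=1.0):
    """Squared-exponential covariance between two sets of 2-D locations."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(x_obs, y_obs, x_new, noise_var=0.1, variance=1.0, length_scale=1.0):
    """Posterior mean and standard deviation of the latent surface at x_new.

    Assumes a zero-mean Gaussian process observed with independent noise.
    All hyperparameters are fixed here purely for illustration.
    """
    k_oo = sq_exp_kernel(x_obs, x_obs, variance, length_scale)
    k_oo = k_oo + noise_var * np.eye(len(x_obs))
    k_no = sq_exp_kernel(x_new, x_obs, variance, length_scale)
    chol = np.linalg.cholesky(k_oo)
    # Solve K alpha = y using the Cholesky factor (two triangular systems).
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs))
    mean = k_no @ alpha
    # Predictive variance: prior variance minus what the data explain.
    v = np.linalg.solve(chol, k_no.T)
    var = variance - (v**2).sum(axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))
```

Calling `gp_predict` with a handful of observed locations and a grid of unobserved ones returns a prediction and a standard deviation for each grid point. The standard deviation grows with distance from the data, which is the behaviour that lets such maps flag where information is lacking.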
The current studies use new, advanced Bayesian geostatistical tools to analyse two problems Africa-wide. In the first paper, Osgood-Zimmerman et al.² focused on childhood growth failure. They pooled geolocated information on growth stunting, muscle wasting and weight in children under the age of 5 from several surveys across tens of thousands of villages over 15 years. They then combined this with information on local climate and geography. They carefully validated their statistical model by first fitting the model to data at a subset of locations, then comparing the predictions from this fitted model with data at a different subset of locations.
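The validation step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the inverse-distance-weighted predictor is a deliberately simple stand-in for the authors' geostatistical model, and the function names, holdout fraction and error metric are hypothetical choices rather than details taken from the paper.

```python
import numpy as np

def split_locations(n_locations, holdout_fraction=0.2, seed=0):
    """Randomly partition location indices into fitting and holdout sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_locations)
    n_holdout = int(round(holdout_fraction * n_locations))
    return order[n_holdout:], order[:n_holdout]

def idw_predict(x_fit, y_fit, x_new, power=2.0):
    """Inverse-distance-weighted average (stand-in for a real spatial model)."""
    dist = np.linalg.norm(x_new[:, None, :] - x_fit[None, :, :], axis=-1)
    weights = 1.0 / np.maximum(dist, 1e-12) ** power
    return (weights * y_fit).sum(axis=1) / weights.sum(axis=1)

def holdout_rmse(x, y, predict=idw_predict, holdout_fraction=0.2, seed=0):
    """Fit on one subset of locations, score predictions on the other."""
    fit_idx, test_idx = split_locations(len(x), holdout_fraction, seed)
    pred = predict(x[fit_idx], y[fit_idx], x[test_idx])
    return float(np.sqrt(np.mean((pred - y[test_idx]) ** 2)))
```

Because the holdout locations play no part in fitting, the resulting error reflects how well the model predicts places it has not seen, which is exactly the situation a map of unsampled areas faces.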
The map is split into ‘pixels’ of 5 × 5 kilometres across Africa, and shows changes in growth failure over time, from 2000 to 2015 (Fig. 1). The authors use their data to point out the differences in improvements across time in different regions, and show that national-level data mask nuances uncovered by their precision maps. They also provide measures of the certainty of the prediction made for each region, thereby highlighting both the uncertainty in the maps and the areas of Africa most in need of additional sampling. They then use their model to assess the likelihood of achieving 2025 global nutrition targets, and point out regions in which progress is lagging behind. The authors find that, unless there is a change in the current rates of improvement, much of the continent will fail to meet the goal of ending malnutrition by 2030.
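One common way to turn a Bayesian model's output into a statement about targets, sketched here as an assumption rather than as the authors' procedure, is to count how many posterior draws for each pixel fall at or below the target value. The array layout (draws in rows, pixels in columns) and the function name are hypothetical.

```python
import numpy as np

def prob_meeting_target(prevalence_draws, target):
    """Share of posterior draws, per pixel, at or below the target prevalence.

    prevalence_draws: array of shape (n_draws, n_pixels), hypothetical layout.
    target: prevalence threshold that counts as meeting the goal.
    """
    draws = np.asarray(prevalence_draws, dtype=float)
    return (draws <= target).mean(axis=0)
```

A pixel whose projected prevalence falls below the target in half of the draws would be assigned a probability of 0.5, so the same draws that give the map its uncertainty also give the probability of success.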
In the second paper, Graetz et al.³ used a similar approach to map local variations across Africa in the number of years of education that women between the ages of 15 and 49 have received. This is of particular interest because educational attainment is linked to the health of both mothers and their children. The authors produced their maps from data gathered from geolocated household surveys and censuses. They generated maps of average attainment across regions, including changes between 2000 and 2015, and provided uncertainties for each average. In addition, they generated similar maps for men, and showed that, although there has been progress in educational attainment for both men and women across the continent, substantial differences between the sexes remain.
As these papers show, we now have sufficiently mature statistical methodology, theory and open-source software to analyse continental-scale problems rigorously. This type of analysis was simply not possible ten years ago. In the past, researchers might have resorted to aggregating the data used in the current studies by country, thereby losing the local phenomena that really drive the science. Alternatively, they might have restricted the analysis to one country, thus failing to exploit the full power of the data at hand. But the sophisticated geospatial tools used in the current work employ clever numerical approximations to sidestep the computational bottlenecks posed by analysing so many correlated observations. These methods are applicable to much more than just the public-health domains described here, and should provide scientific insights in many disciplines. Of course, it is important that these powerful statistical tools are not applied blindly. In both papers, the authors are careful to weight data appropriately and to validate their predictions at each step.
There is much excitement these days about the way in which enormous data sets are helping us to address many hard scientific challenges. In reality, data sets are useful only when combined with a deep understanding of the relevant science, economics or sociology, such as the impact of culture in a particular region, or details about how diseases spread. A solid understanding of how data are collected is also crucial. Rigorous scientific advances emerge when interdisciplinary teams work closely together — the current papers, which involve researchers trained in epidemiology, statistics, demography and public health, are prime examples of this.
The ultimate goal of a spatial analysis is to design interventions for maximum impact. If we understand a spatio-temporal process, we can optimize the allocation of resources in space and time. For example, consider the spread of malaria, and the effect of interventions such as bed-net distribution. A 2016 analysis⁴ considered several malaria interventions, and determined the most cost-effective intervention for each 5-km² pixel in Africa on the basis of spatial variation in climate, mosquito populations and the current state of the disease. The results from Osgood-Zimmerman et al. and Graetz et al. should prove useful in an analogous study of optimal interventions for nutrition and education. We believe that we are entering an era in which this type of analysis can be applied broadly to improve the lives of people around the world.
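The pixel-by-pixel choice described above can be illustrated with a toy calculation. This sketch assumes that each intervention has a single cost and a predicted effect in every pixel, and it simply picks the largest effect per unit cost. The function name and inputs are hypothetical, and the 2016 analysis may well have used a different criterion.

```python
import numpy as np

def best_intervention_per_pixel(effect, cost):
    """Pick, for each pixel, the intervention with the highest effect per unit cost.

    effect: array of shape (n_interventions, n_pixels), predicted benefit.
    cost:   array of shape (n_interventions,), cost of each intervention.
    Returns the chosen intervention index and its ratio for every pixel.
    """
    effect = np.asarray(effect, dtype=float)
    cost = np.asarray(cost, dtype=float)
    ratio = effect / cost[:, None]  # benefit per unit cost
    return ratio.argmax(axis=0), ratio.max(axis=0)
```

In practice the predicted effects would themselves come from a spatial model with uncertainty attached, so a fuller treatment would carry that uncertainty through to the choice of intervention.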
Nature 555, 32-33 (2018)