## Main

Access to adequate housing and shelter is a fundamental human right, considered central to human wellbeing through the provision of facilities that are essential to security, comfort, health and nutrition. However, major inequalities persist, and a third of the world’s urban population lived in slum conditions in 20141. In response, Sustainable Development Goal 11 aims for universal access to adequate, safe and affordable housing, and to upgrade slums by 20302. This goal builds on Millennium Development Goal 7, which aimed for a substantial improvement in the lives of 100 million people who lived in slums by 20203.

The opportunity and need for better housing is particularly acute in Africa, with its rapidly shifting economic and demographic profile. The continent’s population is the fastest growing in the world and is predicted to increase from 1.2 billion in 2015 to 2.5 billion by 2050 (an addition equivalent to the current population of India)4, which will necessitate hundreds of millions of new homes. Alongside increased housing demand, the existing housing stock is steadily transforming—for example, thatch roofs are being replaced by corrugated metal roofs, and mud walls by concrete and brick walls5. These changes present a powerful opportunity to improve human wellbeing, and they also demonstrate the urgent need for investment in housing infrastructure to ensure that vulnerable populations are not left behind6.

Reliable measurements of house types in Africa are critical for tracking changes and targeting interventions, but existing data on African housing are limited7. The primary housing indicator for the United Nations Millennium Development Goals and Sustainable Development Goals is the prevalence of urban slum housing, estimates of which are limited to urban areas only, derived from basic extrapolations from national survey data, restricted to specific years and not standardized across the continent at any subnational scale8,9. Other detailed records of African housing conditions are focused on housing costs and finance10. Here we conduct a standardized analysis using a geospatial framework to quantify the changing profile of housing in urban and rural sub-Saharan Africa during the era of the Millennium Development Goals. We show that African housing underwent a marked change between 2000 and 2015, but unimproved housing persists.

To quantify changes in housing across sub-Saharan Africa, we leveraged 62 georeferenced national household surveys, representing 661,945 unique households in 31 countries (Extended Data Fig. 1). We designed a geostatistical regression model to map house construction materials and overall house type at 5 × 5-km2 resolution across sub-Saharan Africa. We categorized house construction materials into a binary variable that compared houses built from finished materials (for example, parquet, vinyl, tiled, cement or carpet flooring) to those built from natural or unfinished materials (for example, earth, sand, dung or palm flooring) (Extended Data Table 1). We based our categorization of house type on the Millennium Development Goal and Sustainable Development Goal definition of slum housing. We considered ‘unimproved’ housing to have at least one of four characteristics: (1) unimproved water supply; (2) unimproved sanitation; (3) more than three people per bedroom; and (4) house made of natural or unfinished materials (Supplementary Methods). We considered houses that had none of these characteristics as ‘improved’.

The independent variables (covariates) used in our model were aridity11, urbanicity12, accessibility13, travel friction13, night-time lights14 and irrigation15, which are commonly used in Africa-focused geostatistical models16; we also included space and time to account for autocorrelated residual effects. Our geostatistical model utilizes the random Fourier feature approach17 in which a nonlinear, interacting function is defined through high-dimensional feature spaces computed in explicit form in a feature map. The feature map that characterizes this relationship $$\left(\varphi :{\mathscr{X}}\to {\mathscr{H}}\right)$$ induces a measure of similarity—the kernel function $$\left(k:{\mathscr{X}}\times {\mathscr{X}}\to {\mathbb{R}}\right)$$—such that $$k\left({x}_{i},{x}_{j}\right)={\left(\varphi \left({x}_{i}\right),\varphi \left({x}_{j}\right)\right)}_{{\mathscr{H}}}$$ . Expanding on a previous study17, our feature map takes the Fourier spectral form $$z(x|\omega )={[{\rm{c}}{\rm{o}}{\rm{s}}({x}^{T}\omega ){\rm{s}}{\rm{i}}{\rm{n}}({x}^{T}\omega )]}^{T}$$such that $$k\left({x}_{i},{x}_{j}|\theta \right)\approx \frac{{\sigma }^{2}}{{N}_{{\rm{feat}}}}{\sum }_{r=1}^{M}z{\left({x}_{i}|{\omega }_{r}\right)}^{T}z({x}_{j}|{\omega }_{r})$$   , with spectral measure ωr. Rather than assuming a specific spectral distribution ωr, we obtain this Lebesgue measure directly from the data18. Given a response variable (for example, wall material), we used a beta-binomial likelihood function p([y+ytotal] | xωϕ) = BetaBinomial(z(x|ω), ϕ) that enabled overdispersion in the data while simultaneously accounting for sample size variation (from ytotalytotal). We performed regularization using dropout19. Approximate posterior confidence intervals were estimated using the weighted likelihood bootstrap20. Fitting was performed using ADAM stochastic gradient descent21. Cross-validation was performed to check for model fit and to assess the predictive accuracy of the model (Supplementary Text, Extended Data Figs. 2, 9). Population-weighted prevalences of people living in different house types in urban and rural areas were calculated using yearly population data from the WorldPop project22 and a static urban–rural definition from the Global Urban Footprint project23.

Our analysis revealed a marked transformation of housing in sub-Saharan Africa from 2000 to 2015. Across all sub-Saharan countries (excluding South Africa, Comoros and desert areas), the prevalence of houses that were built with finished materials increased from 32% (29–33%) in 2000 to 51% (49–54%) in 2015 (Table 1 and Figs. 1, 2). Our analysis suggests a widespread pattern of incremental modifications to the roof, then the walls and finally to the floor of houses (Extended Data Fig. 3). Overall, the predicted prevalence of improved housing (with improved water and sanitation, sufficient living area and durable construction) doubled from 11% (10–12%) in 2000 to 23% (21–25%) in 2015 (Table 1), with prevalences ranging from 5% (5–6%) in rural Ethiopia to 76% (71–80%) in urban Zimbabwe in 2015 (Supplementary Table 1). Between 2000 and 2015, 134 (118–147) million Africans in the analysed countries gained access to improved housing. However, unacceptable inequalities persist, with 53 (50–57) million urban inhabitants (47% (44–50%) of the total urban population of sub-Saharan Africa analysed) and 595 (585–607) million rural inhabitants (82% (80–83%) of the rural population) living in unimproved housing in 2015.

To examine the links between housing and socioeconomic factors, we quantified the association between house type and household characteristics in 51 national surveys, representing 588,892 households (Supplementary Table 2). For each survey, controlling for cluster-level variation, we jointly estimated the odds of improved housing in relation to education level of the household head, household wealth and age of the household head. We found that the odds of improved housing were 80% higher in more educated households (adjusted odds ratio, 1.80; 95% confidence interval, 1.68–1.93; P < 0.001; Fig. 3a), more than double in the wealthiest households (adjusted odds ratio, 2.53; 95% confidence interval, 2.28–2.82; P < 0.001; Fig. 3b) and 31% higher with increased age of the household head (adjusted odds ratio, 1.31; 95% confidence interval, 1.24–1.39; Extended Data Fig. 4). We also observed a higher prevalence of improved housing in urban survey clusters than rural survey clusters (Extended Data Fig. 5). Across all surveys, a 10% increase in the prevalence of urban clusters (including small rural towns that have grown from villages and account for much of sub-Saharan Africa’s urban growth) was associated with a 7.5% increase in improved housing (Extended Data Fig. 6).

Here we quantified housing conditions across urban and rural sub-Saharan Africa during the era of the Millenium Development Goals, and provided a detailed baseline measurement for the Sustainable Development Goals. By applying a geospatial approach to empirical observations, we have built considerably on existing measurements of African housing, which are limited to urban areas, not standardized at any subnational scale and are derived from more simplistic extrapolations from survey data. We show that the prevalence of improved housing (defined as housing with improved water and sanitation, sufficient living area and durable construction) doubled during 2000–2015, but that an unacceptably large proportion of people still live in unimproved housing in urban areas.

Our findings are consistent with continent-wide changes to African housing being driven by economic growth24. Increasing household spending is likely to have led people to invest more in their homes and, indeed, we found a clear increase in the prevalence of houses built with finished materials since 2000. Furthermore, house types changed the most in countries with the highest baseline prevalence of improved housing (Extended Data Fig. 7) and house type was clearly associated at the household level with education, wealth and age of the household head. In urban areas, the changes may also have been driven by a lack of traditional materials and the commodification of housing. In the future, continued population and urban growth in sub-Saharan Africa may help to sustain housing demand and incremental housing changes. In turn, ‘healthy urbanization’ has been recognized as important for maintaining economic productivity and growth24.

Our study has important implications for international goals, which have sought to address housing inequalities by achieving a substantial improvement in the lives of 100 million people living in slums by 20203 and universal access to adequate, safe and affordable housing by 20302. We show a considerable reduction in the prevalence of urban unimproved housing across sub-Saharan Africa from 68% (65–71%) in 2000 to 47% (44–50%) in 2015, similar to the equivalent estimates from the United Nations of 65% in 2000 and 55% in 201425. However, nearly half of Africa’s urban population still lives in unimproved conditions, which is partly explained by widespread unimproved sanitation—the most common housing deprivation in 75% (52 out of 69) of surveys analysed (Extended Data Fig. 8). These findings highlight the urgent need for governments to improve water and sanitation infrastructure as households continue to spend individually on their homes.

Housing is a central pillar of human security and wellbeing and is increasingly vital in the context of Africa’s urbanization and population growth. For example, house design is integral to Sustainable Development Goal 3 through a myriad of associated health outcomes, including mental health, respiratory disease, soil-transmitted helminths, diarrhoeal disease, leishmaniasis and malaria26,27,28. As towns and cities in sub-Saharan Africa grow, rapid development of luxury housing in major urban centres is occurring alongside the expansion of informal settlements that lack basic infrastructure. In addition, formal housing investment typically lags behind urbanization and the continent’s major urban growth is concentrated in smaller urban centres that have limited capacity to organize construction6. Addressing the housing needs of a growing population is key for sustainable urban development and the health and wellbeing of millions of Africans29, and will facilitate faster attainment of the Sustainable Development Goals. Our maps provide a critical mechanism to guide intervention and the measurement of change.

## Methods

### Data reporting

No statistical methods were used to predetermine sample size. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.

### Data sources

Data were sourced from Demographic and Health Surveys (DHS), Malaria Indicator Surveys (MIS) and AIDS Indicator Surveys (AIS), which are cross-sectional surveys designed to collect nationally representative health and sociodemographic data, typically at 3–5-year intervals30,31. Surveys are administered using a stratified two-stage cluster design in which primary sampling units are randomly selected from census data and households are randomly selected within primary sampling units from an updated enumeration list. Raw DHS datasets were downloaded from www.dhsprogram.com and data were directly categorized into the household-level variables included in the analysis, using the definitions below.

### Definition of house type

We explored two aspects of housing: (1) house construction materials; and (2) overall house type (Extended Data Table 1). DHS and MIS record the main materials used for the roof, walls and/or floor32 and categorize these as ‘natural’, ‘rudimentary’ or ‘finished’ (for example, finished floor materials include parquet, vinyl, ceramic tiles, cement and carpet, whereas natural or rudimentary floor materials include earth, sand, dung, wood planks, palm and bamboo33). We classified houses as ‘built from finished materials’ if at least two out of three of the materials for the walls, roof and floor were finished and as ‘built from natural or unfinished materials’ if this criterion was not met. We based our categorization of house type on the definition of slum housing3 of the United Nations Millennium Development Goals and Sustainable Development Goals. ‘Unimproved’ houses were considered to have at least one of four characteristics: (1) unimproved water supply (as defined by the World Health Organization Joint Monitoring Programme (WHO JMP)34 (Supplementary Table 3)); (2) unimproved sanitation (as defined by WHO JMP34); (3) more than three people per bedroom; and (4) house made of natural or unfinished material. Houses that had none of these characteristics were considered as ‘improved’. Following the United Nations protocol, we excluded a fifth characteristic of unimproved housing from our definition (insecurity of tenure) due to the lack of internationally comparable data3.

### Predicting changes in housing from 2000 to 2015

We included in the analysis all georeferenced surveys (that is, latitude and longitude were available for each geographical cluster) with data on water supply, sanitation facilities, number of household members, number of bedrooms and main material of the roof, walls and floor (Extended Data Fig. 1 and Supplementary Table 4). We designed a geostatistical regression model to map (1) main material of the roof, walls and floor; and (2) overall house type at 5 × 5-km2 resolution across sub-Saharan Africa. The independent variables (covariates) used in our model were aridity index11, degree of urbanicity12, accessibility to large cities13, travel friction surface13, night-time lights14 and irrigation15. These independent variables were chosen as close proxies for factors that affect house type, such as poverty, development, urbanization, transport access and population density. We also included spatial coordinates and time, to account for spatio-temporally autocorrelated residual effects. Our geostatistical model utilizes the random Fourier feature approach17 for which a nonlinear, interacting function is defined through high-dimensional feature spaces computed in explicit form through a feature map. This approach approximates a kernel function through an explicit rather than implicit map. The feature map associates a kernel function $$k:{\mathscr{X}}\times {\mathscr{X}}\to {\mathbb{R}}$$, which is defined on an input domain $${\mathscr{X}}\in \left\{{x}_{1},..,{x}_{d}\right\}\in {{\mathbb{R}}}^{d}$$ such that $$k\left({x}_{i},{x}_{j}\right)={\left(\varphi \left({x}_{i}\right),\varphi \left({x}_{j}\right)\right)}_{{\mathscr{H}}}$$ where $$\varphi :{\mathscr{X}}\to {\mathscr{H}}$$ is the feature map that associates kernel k with an embedding of the input space into a reproducing kernel Hilbert space $${\mathscr{H}}$$. On the basis of a previous study17, our feature map takes the form $$z(x|\omega )={[{\rm{c}}{\rm{o}}{\rm{s}}({x}^{T}\omega ){\rm{s}}{\rm{i}}{\rm{n}}({x}^{T}\omega )]}^{T}$$ such that $$k\left({x}_{i},{x}_{j}|\theta \right)\approx \frac{{\sigma }^{2}}{{N}_{{\rm{feat}}}}{\sum }_{r=1}^{M}z{\left({x}_{i}|{\omega }_{r}\right)}^{T}z({x}_{j}|{\omega }_{r})$$ with a given spectral measure ωr. Rather than assuming a spectral distribution in which ωr is associated with a given kernel (for example, Student’s t for the Matérn kernel), we obtained this distribution (empirical Lebesgue measure) directly from the data18. Given a response variable (for example, wall type), we used a beta-binomial likelihood function $$p(y|x,\omega ,\varphi )={\rm{BetaBinomial}}\left(z\left(x|\omega \right),\varphi \right)$$ to perform inference, allowing for overdispersion and sample size effects in the data. We performed regularization using dropout19. Dropout was chosen over standard $${{\ell }}_{1},{{\ell }}_{2}$$ penalties owing to ease of implementation, parameter tuning and superior cross-validation performance. We included length scale parameters for each covariate dimension by shrinking or expanding the empirical measure ω. These unconstrained length scale parameters were passed through a reticulated linear function to ensure positivity and sparse covariate selection (known as automatic relevance determination). Fitting was performed using ADAM stochastic gradient descent21 with GPU TensorFlow. Validation was carried out to check for appropriate model fit and to assess the predictive accuracy of the model. The predictive performance of the model at the pixel level and administrative division level 1 was assessed via out-of-sample validation. The dropout hyperparameter was selected by random search. The robustness of this hyperparameter was double-checked by implementing automatic selection of the hyperparameter through concrete dropout35. Confidence intervals and uncertainty were estimated using the weighted likelihood bootstrap, a method that generates samples from an approximate Bayesian posterior of a parametric model20. In brief, this approach involved repeated fitting of the model while weighting the likelihood with uniform Dirichlet weights—that is, $$p{\left(y|x,\omega \right)}^{w}$$ where $$w \sim {\rm{Dirichlet}}\left(1,1,\ldots ,n\right)$$. We performed this procedure 100 times to obtain 100 realizations of the model and subsequent surfaces. Population-weighted prevalences of people living in different house types in urban and rural areas were calculated using yearly population data from the WorldPop project22 and using a static urban-rural definition from the Global Urban Footprint project23.

### Model performance

The predictive performance of both the improved housing and finished materials models at pixel level was assessed through out-of-sample validation. In this validation scheme, the full dataset was randomly partitioned such that 75% of the dataset was used to fit the model and the remaining 25% was used to test the model predictive performance. This scheme was repeated 10 times and the scores were averaged. The mean squared error36 and the correlation coefficient were used to evaluate model performance. We evaluated the model at both the pixel level and the aggregate survey level. The latter was included to give an indication of how well the model predicts the survey as a whole. The predictive scores (Supplementary Table 6 and Extended Data Fig. 9) indicate excellent model performance, comparable to those from widely used, established models16. In addition, as pixel-level estimates we performed additional cross-validation analyses in which entire administrative divisions of data were held out. This allowed us to compare our predictions with those from surveys at a resolution at which the surveys are reliable. The results from this validation support the conclusions from our primary analysis (Supplementary Table 7). Confidence intervals were evaluated by the widely used continuous ranked probability score, a score that generalizes the mean absolute error to probability distributions36. The continuous ranked probability scores show suitable credible intervals with most scores clustering towards zero.

### Household-level association between house type and socioeconomic factors

Binary variables for education, wealth and age of the household head were created at the household level. National surveys record the highest level of education of the household head and we compared more than primary education with primary education or less. DHS and MIS household wealth index scores are developed using principal component analyses that typically include variables that describe durable asset ownership, access to utilities and infrastructure and house construction materials37. To enable estimation of the association between house type and wealth, we constructed a new asset-based wealth index for each survey that excluded variables related to house construction. For each household survey, we applied inclusion criteria27 of (1) fewer than 10% missing values; and (2) frequency values between 5% and 95% for the following set of assets: (a) car; (b) motorboat; (c) scooter; (d) cart; (e) bicycle; (f) television; (g) refrigerator; (h) radio; (i) watch; (j) mobile telephone; (k) landline telephone; and (l) electrification of the household. We tested several dimensionality reduction algorithms, with the goal of condensing these twelve assets into a single dimensional index. We tested ISOMAP38, kernel principal component analysis (PCA)39, t-distributed stochastic neighbour embedding40 and linear PCA41. We found minimal differences between algorithms and therefore opted for linear PCA. Using the first principal component, we created a binary wealth variable comparing households belonging to the upper 75th wealth quartile with all other households.

We examined the association between house type and education level of the household head, household wealth and age of the household head in 51 cross-sectional household surveys, representing 588,892 households. For each survey, we jointly estimated the odds of improved housing in relation to whether the household head completed at least primary education, whether a household belonged to the upper 75% wealth quartile and whether the household head was aged over 55 years (which is approximately the upper 75% age quartile). We performed conditional logistic regression to allow these associations to be estimated within geographical cluster in order to eliminate confounding due to inter-cluster variation in urbanicity, regional wealth, climate and other survey design factors. Individual survey odds ratios were combined to determine a summary odds ratio for all surveys using random effects meta-analysis. Individual and summary odds ratios were displayed in forest plots.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

### Code availability

Analysis code is available via codeshare: https://codeshare.io/2pm4Px.