Mapping HIV prevalence in sub-Saharan Africa between 2000 and 2017

HIV/AIDS is a leading cause of disease burden in sub-Saharan Africa. Existing evidence has demonstrated that there is substantial local variation in the prevalence of HIV; however, subnational variation has not been investigated at a high spatial resolution across the continent. Here we explore within-country variation at a 5 × 5-km resolution in sub-Saharan Africa by estimating the prevalence of HIV among adults (aged 15–49 years) and the corresponding number of people living with HIV from 2000 to 2017. Our analysis reveals substantial within-country variation in the prevalence of HIV throughout sub-Saharan Africa and local differences in both the direction and rate of change in HIV prevalence between 2000 and 2017, highlighting the degree to which important local differences are masked when examining trends at the country level. These fine-scale estimates of HIV prevalence across space and time provide an important tool for precisely targeting the interventions that are necessary to bringing HIV infections under control in sub-Saharan Africa.

Provide information on all included data sources and their main characteristics. For each data source used, report reference information or contact name/institution, population represented, data collection method, year(s) of data collection, sex and age range, diagnostic criteria or measurement method, and sample size, as relevant.
Methods section; SI section 5.2 For data inputs that contribute to the analysis but were not synthesized as part of the study: 7 Describe and give sources for any other data inputs. SI sections 3.1, 3.3, 3.4; Supplementary Table 5 For all data inputs: 8 Provide all data inputs in a file format from which data can be efficiently extracted (e.g., a spreadsheet rather than a PDF), including all relevant meta-data listed in item 5. For any data inputs that cannot be shared because of ethical or legal reasons, such as third-party ownership, provide a contact name or the name of the institution that retains the right to the data.
Available through http://ghdx.healthd ata.org/ihmedata/africa-hivprevalencegeospatialestimates-2000-2017 Data analysis 9 Provide a conceptual overview of the data analysis method. A diagram may be helpful.
Methods section, Extended data Figure 5  10 Provide a detailed description of all steps of the analysis, including mathematical formulae. This description should cover, as relevant, data cleaning, data pre-processing, data adjustments and weighting of data sources, and mathematical or statistical model(s State how analytic or statistical source code used to generate estimates can be accessed.

Methods section
Results and Discussion 15 Provide published estimates in a file format from which data can be efficiently extracted.
Results section; Extended data figure 4; Supplementary figures 1- 4 17 Interpret results in light of existing evidence. If updating a previous set of estimates, describe the reasons for changes in estimates.
Precision public health and HIV section 18 Discuss limitations of the estimates. Include a discussion of any modelling assumptions or data limitations that affect interpretation of the estimates.
Methods section; SI section 5.2 2 HIV data sources and data processing 2.1 Seroprevalence surveys

Data identification strategy
We identified HIV seroprevalence surveys in sub-Saharan Africa through a review of all surveys in the Demographic and Health Survey (DHS), AIDS Indicator Survey (AIS), and Multiple Indicator Cluster Survey (MICS) series; surveys listed in the Global Health Data Exchange 1 ; surveys included in the national HIV estimates files from UNAIDS 2 ; and surveys listed in the US Census Bureau HIV/AIDS Surveillance Database 3 . For a survey to be considered for this analysis, we required that the survey reported HIV blood test results, sampled from the general adult population, and contained geographic information more refined than country-level. For surveys with no microdata available we used reports if they included sample size or uncertainty intervals. Our desired age range was 15-49 years, but we also included survey reports that recorded prevalence for an expanded or overlapping age range. The surveys used in this analysis are listed in Supplementary

Data processing for microdata
To prepare survey microdata for analysis, we first subset the data to the age range of interest, 15-49 years. We then dropped rows for individuals explicitly listed as not tested or where the blood samples were marked as lost or rejected (insufficient sample volume, tip broken, etc.). Inconclusive and indeterminate test results were coded as a negative test result. After subsetting on these conditions, a further 0.07% of the microdata were missing an HIV test result or a survey weight and were dropped from the analysis. A further 0.93% of the microdata were dropped due to missing geographic information or due to the GPS coordinates being located more than 10 km outside of the country border; coordinates within 10 km of the country border were snapped to be approximately 1 km inside the nearest border of the specified country.
We then aggregated the individual-level microdata to calculate HIV prevalence at the finest possible spatial resolution available, ideally a latitude and longitude pair representing the location of the survey cluster (point-level data). The interview date for each specific location was calculated as the median of the individual-level interview dates. Where point-level referencing was not available, we geolocated survey microdata to the smallest geographical area (polygon) possible. Individual-level sample weights were used when calculating prevalence, and the effective sample size for each prevalence estimate was estimated via the Kish approximation 4 , which accounts for differences in the underlying selection probability within a sample.

Data processing for reports
In instances where individual-level microdata were not available, we used summary reports, given that the estimates reported were similar in nature to what we would calculate from the microdata. We used the median months of the reported data collection periods (ranging from 2 to 13 months in length) as the interview dates to align with the extracted microdata. If the sample sizes were not included in the report, we estimated the sample sizes from the reported confidence intervals, assuming that a Normal approximation was used to generate 95% confidence intervals. In both instances, sample sizes were further adjusted by multiplying the median design effect (ratio of effective sample size to observed sample size) calculated in the microdata as described above. For prevalence estimates provided for an age range other than 15-49 years, we cross-walked the prevalence to estimate prevalence for the 15 There were 1,846 results from this search. We then completed a title/abstract review where articles were excluded that met the following criteria: • Used a data source for which we already had microdata; • Location outside of Africa; • No measurement of HIV prevalence; • Facility-or school-based studies; • Special populations (migrants, workers, blood donors, patients, pregnant women, men who have sex with men, female sex workers, refugees, etc.).
668 articles remained after we excluded the articles that met our initial exclusion criteria. We then performed a full text review, in which we added these additional exclusion criteria: • Only national-level results; • Data collected prior to 2000; • Sample size and uncertainty intervals not included; • Convenience sampling or other non-probability type sampling; • Impossible to geolocate due to lack of location information; • Articles that had duplicate information from the same cohort study.
30 articles remained after the full text review. One additional article that was not captured by our PubMed search but was identified by other means was also included, for a total of 31 articles that were ultimately extracted.

Data processing
For all 31 articles, we extracted information on location, sex, age range, year, sample size, HIV prevalence, HIV type (HIV-1, HIV-2, dual-infections, HIV and combined), response rate, case definition (type of blood test used), case diagnostics (sampling methods and exclusion criteria), and any additional notes deemed relevant.
After extracting the data from each article, we geolocated each observation to a specific latitude and longitude or to a polygon as follows: 1. If the location information provided described an administrative subdivision at the first, second, third, or fourth level, we geolocated these data to a polygon corresponding to this area. 2. If the location information provided described an administrative area below the fourth level, we searched for the location on Google Maps and OpenStreetMaps and measured the area of the location: if the area was less than 5 km in diameter, we geolocated it to the central GPS coordinate of the area; if the area was greater than 5 km in diameter, we created a polygon shape file using QGIS based on the shape in Google Maps or OpenStreetMaps. 3. If the data extracted were from a demographic surveillance site or other cohort, we attempted to locate a map of the study site (these were provided in the article in some instances; in other instances, we did additional internet searches to locate this information) and then used ArcGIS to create a shape file to accurately represent the area.

Age cross-walk
In some instances, HIV prevalence estimates extracted from literature or survey reports referred to age ranges other than the standard 15-49 age range, which we denote below as 1 -2 where 1 ≠ 15 and/or 2 ≠ 49. In these instances, we used linear regression to transform the estimates of HIV prevalence from the reported age range 1 -2 to HIV prevalence for the standard 15-49 age range.
To do this, for each 1 -2 we first identified surveys with microdata available that spanned both the 1 -2 and 15-49 age ranges, i.e., minimum( 1 , 15)-maximum( 2 , 49), and produced HIV prevalence estimates for both age ranges by first-level administrative subdivisions. Then for each non-standard age range we used these data to fit the following linear regression model: 15−49 is the observed HIV prevalence among individuals age 15-49 years; • 1 − 2 is the observed HIV prevalence among individuals age 1 -2 years; • 0 is an intercept; is a normally distributed error term with mean 0.
In this model, a logit transform was applied to prevalence to ensure the predicted prevalence was restricted to between 0 and 1. For values that were equal to 0 and 1, a transformation was employed to preserve the shape of the logistic curve. For these values, an was selected to be half of the smallest non-zero value, and all instances of 0 were replaced with and instances of 1 replaced with 1 − . Linear regression was chosen over more complex algorithms due to the high proportion of variance explained with this relatively simple method (R 2 ranged from 0.84 to 0.99).
Uncertainty in prevalence for ages 15-49 as a result of this cross-walk was estimated by simulating 1,000 draws of the slope, intercept, and error term from the fitted linear model, and using these to construct 1,000 draws of HIV prevalence among ages 15-49 5 . Then, for each draw of HIV prevalence, we simulated a random draw from a binomial distribution with p equal to prevalence from this draw and N equal to the reported sample size and then divided this simulated count by the reported sample size to again calculate HIV prevalence. These draws of HIV prevalence reflected both the sampling uncertainty as well as the additional uncertainty introduced by cross-walking from non-standard age ranges to 15-49. Finally, a new effective sample size was calculated by finding the variance of the prevalence draws and employing the relationship between variance and sample size of a binomial distribution. This new sample size reflected our confidence in the estimate of HIV prevalence for ages 15-49 and is a function of the uncertainty in our linear model, the original sample size, and level of HIV prevalence.
Surveys where this age cross-walk was required are listed in Supplementary Table 3. 2.4 Antenatal care (ANC) sentinel surveillance

Data sources
In addition to general population surveys, we used antenatal care sentinel surveillance data, which measure HIV prevalence among pregnant women attending antenatal care clinics. Most of these data came from national Spectrum files that were developed by a country team of experts and compiled and shared by the UNAIDS secretariat 2 . These files include the HIV prevalence and sample size of ANC sentinel surveillance and routine testing for various sites and years. We only used the sentinel surveillance estimates for our analysis.
We supplemented this data with ANC sentinel surveillance country reports. In general, the reports contained the same information as the Spectrum files, but there was some additional information in the reports and some discrepancies compared to the Spectrum files. The additional information included additional sites, additional years for given sites, and more precise prevalence estimates. In instances where there were discrepancies for a given site-year, we elected to use the source where HIV prevalence was closest to the average prevalence of surrounding years for the same site.
Five countries had a notably large number of discrepancies between the Spectrum files and the ANC reports. The Senegal Spectrum files recorded only HIV-1 positive results so we used the reports to extract both HIV-1 and HIV-2 positive results to correspond with other countries' estimates. The Zambia Spectrum files recorded prevalence for the 15-39 age range, while the reports recorded prevalence for the 15-44 age range. In this case, we elected to use the Spectrum files because they had better data coverage in terms of number of site-years. There were also many discrepancies in Central African Republic, Côte d'Ivoire, and Zimbabwe; we were unable to identify a specific reason for these discrepancies and elected to use data from the Spectrum files only for these countries.
We investigated the ANC reports to determine if site names in the Spectrum files represented hospitals, cities, or administrative subdivisions. We then used various mapping websites to find geographic information related to these sites. For hospitals and cities/towns that are less than 25 km 2 in area, we used a central GPS coordinate, and for administrative subdivisions we used a polygon of the area. Some hospital sites had a city or town name rather than a hospital name. In those instances, we searched for a hospital in the given city or town and used that hospital's GPS coordinates. If there were multiple hospitals in the area but they were less than 5 km apart, we used the GPS coordinates of the midpoint of the hospitals. If no hospitals were found in the area but the corresponding region was less than 25 km 2 in area, we used the central GPS coordinate. 5.2% of the sites could not be geolocated because none of these conditions were met.
The ANC data included in this analysis are listed in Supplementary

Data processing
To prepare the ANC data for analysis, we compiled the HIV prevalence and sample size data from the Spectrum files and the ANC reports, and the site geographic information -either GPS coordinates or polygons for administrative subdivisions -into one dataset. After thoroughly inspecting the data, we decided to exclude the following data from our analysis: • Hospital-level sites were dropped from Congo in 2011 (23 site-years) and Guinea-Bissau in 2003, 2005, 2010, 2014 (10 site-years) because the data aggregated by administrative subdivisions had better temporal coverage. • We dropped administrative subdivisions that were masked by a different level of administrative subdivisions (8 site-years), defaulting to the level that would give better temporal coverage. • We determined that 4 site-years were outliers based on inspection of site-level time trends, and these were dropped from the analysis. • Data from sites that could not be geolocated were also dropped (245 site-years).
• Additionally, in the Spectrum files, data from 12 site-years labeled as sentinel surveillance we suspect are actually routine testing, and we excluded these from the analysis.

Polygon resampling
Wherever possible, HIV prevalence data were matched to a specific latitude and longitude. In instances where this was not possible, we matched data to the smallest areal unit (termed a polygon) possible. In most instances, these polygons represented administrative subdivisions. The statistical model we employed requires point-referenced data, so data matched to polygons were resampled to generate pseudo-point data based on the underlying population distribution within the polygon. The methods for the resampling are consistent with those previously used in geospatial modelling of under-5 mortality 6 .
Specifically, for each polygon-level observation, we randomly sampled 10,000 locations among grid cells in the given polygon with probability proportional to grid cell population. Grid cells were defined to be contained within the polygon if their centroid fell within the geographic boundary. We performed kmeans clustering (with k set to 1 per 40 grid cells) on the sampled points to generate a reduced set of locations to be used in modelling based on the k-means cluster centroids. Weights were assigned to each pseudo-point proportional to the number of sampled points contained in each of the k-means clusters, i.e., the number of sampled points divided by 10,000. Each pseudo-point generated by this process was assigned the HIV prevalence observed for the polygon as a whole, and a sample size equal to the sample size for the polygon as a whole multiplied by the weight derived for each point.
3 Covariate and auxiliary data 3.1 Pre-existing covariates This analysis included five pre-existing covariates: travel time to the nearest settlement of more than 50,000 inhabitants, total population, night-time lights, urbanicity, and malaria incidence. These variables were selected from among available gridded datasets for sub-Saharan Africa because they are factors or proxies for factors previous literature has identified to be associated (not necessarily causally) with HIV prevalence. The first four variables were included as measures or proxies for connectedness and urbanicity as HIV historically spread through sub-Saharan Africa along travel routes 7,8 and is typically found to be higher in more urban compared to more rural locations. Malaria incidence was selected based on prior evidence relating higher malaria incidence rates to higher prevalence of HIV at the population level 9,10 . Sources for these data are given in Supplementary Table 5. These covariates underwent spatial and temporal processing in preparation for their inclusion in analysis.
Spatial processing involved resampling the input covariate raster to align the spatial resolution of the covariate to the 5 x 5-km resolution used in modelling. For covariates that were originally at a finer resolution, we resampled the raster by taking the neighborhood average (travel time to the nearest settlement of more than 50,000 inhabitants, night-time lights, and urbanicity) or sum (total population) of the finer covariate raster to produce one at a 5 x 5-km resolution. Malaria incidence was natively at a 5 x 5-km resolution and thus did not require additional spatial processing.
Temporal processing was required in instances where the original temporal resolution of the covariate was anything other than annual. To resolve from a coarser time period to an annual time period, we filled the intervening years with the value from the nearest neighboring year (urbanicity) or using an exponential growth rate model (total population). Night-time lights and malaria incidence were provided at a one-year temporal resolution and did not require interpolation. As travel time to the nearest settlement of more than 50,000 inhabitants was available only for a single representative year (2015), this covariate was set to be unchanged over time. After interpolation, night-time lights and urbanicity were still missing the most recent years of the 2000-2017 analysis period, and in these instances we filled out the end of the time-series carrying forward the most recent year without modification.

Covariate selection criteria and definitions
In addition to the five pre-existing covariates, additional covariates were constructed specifically for this analysis. Numerous studies have been conducted in sub-Saharan Africa on risk and protective factors for HIV infection, and these factors commonly include sexual behaviour and factors that are thought to influence the transmission of HIV during sexual intercourse 11 . Potential covariates were informed by past literature and required to have a demonstrated association with HIV prevalence, though not necessarily a causal relationship. Furthermore, our selection of covariates depended on having adequate data coverage from data sources that could be readily extracted. In total, eight covariates were constructed: • Prevalence of male circumcision, including medical or traditional circumcision ('male circumcision'); • Prevalence of self-reported STI symptoms (genital discharge and/or genital ulcer/sore) in the last 12 months ('STI symptoms'); • Prevalence of marriage or living with a partner as married ('in union'); • Prevalence of one's current partner living elsewhere ('partner away'); • Prevalence of condom use at last sexual encounter within the last 12 months ('condom last time'); • Prevalence of reporting ever had sexual intercourse among young adults ('had intercourse'); • Prevalence of men reporting multiple sexual partners within the last year ('multiple partners in year'); • Prevalence of women reporting multiple sexual partners within the last year ('multiple partners in year').
The notion that male circumcision has a protective effect against acquiring HIV was first proposed in 1986, and since then more than 30 cross-sectional studies have found the prevalence of HIV to be significantly higher in uncircumcised men as well as numerous prospective studies that have shown a protective effect ranging from 48% to 88% 12,13 . In 2005, following the interruption of a randomized controlled trial of male circumcision in South Africa that showed a 60% protective effect of circumcision, WHO and UN agencies first acknowledged evidence of male circumcision's protective effect 14,15 . Following these declarations, voluntary medical male circumcision clinics (VMMC) emerged as an HIV prevention strategy in 14 countries in Eastern and Southern Africa with high HIV prevalence and low levels of male circumcision 16 . Given male circumcision's linkage to HIV in the scientific literature, many surveys record men's self-reported circumcision status.
Coinfection of HIV with viral and bacterial sexually transmitted infections (STIs), most notably herpes simplex virus type 2, is a well-studied mechanistic factor associated with higher risk of HIV acquisition 17 .
STIs are thought to have been especially important risk factors during the early stages of the epidemic when infections were concentrated in high-risk groups, though researchers have since argued STIs are also critical in advanced stages 18 . Due to the association between STI prevalence, sexual behaviour, and HIV, most survey series detail the self-reported presence of STI symptoms, facilitating its inclusion as an HIV covariate in this analysis.
Marital status represents a structural factor that, while distal to HIV exposure, has been associated with the number and type of sexual partners, as well as with HIV status 19,20 . It has been postulated that the relationship between an individual's marital status and the number of sexual relationships regulates the protective effect of marriage on the risk of HIV infection, especially in older populations 21 . Marital status is a readily available indicator in household surveys more generally.
The frequency with which a partner has slept away from home during the past year is an indicator of the mobility of male partners, and studies have found that mobility confers an increased risk for HIV 22 . Part of the rapid spread of HIV in sub-Saharan Africa has been attributed to occupations that consist of geographical mobility, especially truck drivers, who are identified as high risk for acquiring and spreading HIV 23 . Many surveys ask women if their partner has lived away from home in the past year, and we use these responses as a proxy for occupational mobility.
Condom use is a sexual behaviour factor that is protective against acquiring HIV. Condoms are often presented as the most effective HIV prevention method of sexual transmission of the disease 24 . Though difficult to measure accurately how often condoms are used in sexual encounters, most surveys report on the use of condoms in last sexual intercourse, a readily available proxy for overall condom use.
An early age at sexual debut may be associated with the number of lifetime sexual partners, which is considered a key risk factor for contracting HIV 21 . Furthermore, early age at sexual debut has been shown to be associated with numerous other risk factors for HIV acquisition, such as STI prevalence, decreased condom use, and increased number of sexual partners 25 . For young women, the initiation of sexual activity is the first important determinant of potential viral exposure, and delayed sexual debut has been associated with decreased risk of HIV acquisition 26 . Given these relationships between HIV and age of sexual debut, and the relative ease of acquiring self-reported sexual status, we constructed an indicator for whether young (ages 15-24) men and women have had intercourse.
An individual's number of sexual partners correlates with HIV risk, and past studies have found an increasing linear relationship between the number of sexual partners and HIV prevalence 27 . The number of sexual partners is thought to have been an especially important factor in the early stages of an epidemic, though past research has determined it remains a key risk factor in advanced stages 18 . Surveys often ask men and women their number of partners in the past year, and we used these responses to construct a proxy for multiple concurrent sexual relationships. Separate covariates were constructed for men and women given the well-documented discrepancy in the number of partners reported by men as compared to women 28 . Because of variations we identified in the way these questions were asked across surveys, we tracked the skip logic and question format for all surveys including STI symptoms and/or the sexual activity indicators. This helped us identify surveys for which the question format was so substantively different from others as to require special handling or exclusion (e.g., questions asked without a time restriction for indicators that require a response from the last 12 months). We excluded select surveys because of these irreconcilable question variations, incomplete sampling (e.g., a specific age range or subpopulation), or untrustworthy or outlier data (as determined by the survey administrator or by inspection). The surveys used for these covariates are listed in Supplementary Table 6, and surveys identified as outliers and subsequently excluded from this analysis are listed in Supplementary Table 7. 3.2.2.2 Covariate data processing for microdata To prepare the survey microdata for analysis, we first constructed final indicators from the raw variables included in the survey data:

Covariate data
• For 'STI symptoms,' we constructed a symptoms indicator that was true if a respondent reported either genital discharge or a genital sore/ulcer in the last 12 months, missing if either individual symptom was missing, and false if both symptoms were reported in the negative. • For 'in union,' we constructed an indicator that was true for all respondents who reported being either currently married or living with a partner, false for any other marital status response, and missing if the marital status response was missing. • For 'multiple partners in year', we used the reported number of sexual partners within the last 12 months to construct a binary indicator that was true for any respondent reporting two or more partners and false for any respondent with 0 or 1 partners (including respondents who had never had intercourse). • The other indicators were extracted from the survey microdata in their final form and required no additional construction.
For each indicator, we subset the data to the desired age range (15-24 years for 'had intercourse', 15-49 years for all other indicators). For 'STI symptoms' we additionally restricted the sample to respondents who reported having had intercourse, while for 'partner away' we additionally restricted the sample to respondents currently 'in union'. We dropped any rows with missing responses or sample weights. For indicators where we model men and women together ('STI symptoms,' 'in union,' 'condom last time'), we dropped any surveys that did not interview both men and women. Any observations missing geographic information or with inconsistent geographic information (i.e., points more than 10 km from the nearest specified country border) were also dropped.
Finally, we aggregated the weighted individual-level microdata for each indicator to the finest possible spatial resolution available, in the same way as was done for the HIV prevalence data. Data were geolocated to latitude and longitude at the survey cluster level wherever possible, and to the smallest possible polygon available otherwise. As with the HIV prevalence data, we calculated the effective sample size for each spatial aggregation using the Kish approximation 4 .
For three of the sexual activity indicators ('condom last time', 'multiple partners in past year', and 'had intercourse'), we initially prepared the data separately for men, women, and both sexes combined to evaluate any sex-specific patterns or trends. To investigate the sex-specific trends, three models were fit with sex-specific data and both sexes combined. The results were interrogated to see how correlated the indicators were over space and time. For 'condom last time', prevalence reported among men was generally higher than among women, but levels among men and women were highly correlated over time and space. We decided to include both sexes combined for the final model to maximize data availability.
Likewise, for 'had intercourse,' the estimates from the men-and women-specific models showed high correlation, but a women-specific model was selected due to existence of a number of surveys that only asked this question of women. In contrast, for 'multiple partners per year,' men-and women-specific models showed different trends over space and time, so separate sex-specific covariates were created. This was reinforced by literature that suggested discrepancies between men and women in the reported number of lifetime partners 28 .

Covariate data processing for reports
For 'male circumcision,' we additionally included summary reports for surveys where individual-level microdata were not available. As with HIV prevalence report extractions, we used the median month of data collection as the interview date. We cross-walked the circumcision prevalence from any surveys with an age range other than 15-49 to estimate the prevalence for that age range in alignment with the microdata, again using the same methods as for HIV prevalence. Surveys where this age cross-walk was required are listed in Supplementary Table 3.
We chose not to include summary reports for other covariates. For 'STI symptoms,' the estimates included in reports used a different construction of the variable than that which we built from the microdata, making the reports incompatible with the microdata. For the sexual activity indicators, we decided against summary report extraction due to the significant number of surveys we were able to extract at the microdata level and the scarcity of reports for most of these indicators.

Covariate modelling
Each of these covariates was estimated using the same modelling framework used for HIV prevalence as described in Section 4, with the key exceptions of not including any covariates in the geospatial model, and not including any correction for data derived from ANC sentinel surveillance (which are used only for modelling HIV prevalence). Therefore, these models were fit including only the intercept, Gaussian process term, country random effect, and nugget effect.

Administrative boundaries
For this analysis we used the Global Administrative Units Layers (GAUL) shape files maintained for the Food and Agriculture Organization (FAO) to define country boundaries and first-and second-level administrative subdivisions 29 . For Kenya, GADM shape files were used instead for first-and second-level administrative subdivisions and subsequently manually updated to address known discrepancies 30 .

Gridded population
The gridded population data used for this analysis were obtained from WorldPop 31,32 . Because WorldPop provides data at a 1 x 1-km spatial resolution at five-year intervals, we processed these data as described in Section 3.1 to aggregate to a 5 x 5-km spatial resolution and interpolate to annual time periods. When we use population as a covariate (described in Section 3.1), we use total population. In all other instances (as described in Section 4.3) we use the population for ages 15-49. 4 Statistical model 4

.1 Covariate stacking
Stacked generalization/regression, or stacking, is an ensemble modelling method that combines multiple prediction methods to increase predictive validity relative to a single modelling approach. This ensemble modelling method relies on a variety of sub-models that are then combined by a secondary learner to produce a meta-model that fuses multiple algorithmic methods to capture nonlinear effects and complex interactions 33 . Our implementation of stacking largely follows the approach described by Bhatt and colleagues 34 and previously implemented for mapping under-5 mortality, child growth failure, educational attainment, and diarrheal diseases in Africa 6,35-37 .
We fit three sub-models -a generalized additive model, boosted regression trees, and lasso regressionto the HIV survey data described above with the five pre-existing and eight constructed covariates as well as calendar year included as explanatory variables. ANC data were excluded from the sub-models because of known biases (described below). We selected these three sub-models based on ease of implementation through existing software packages, the fundamental differences in their approaches, and a proven track record of predictive accuracy 34 . Sub-models were fit in R using the mgcv, xgboost, glmnet, and caret packages.
Each sub-model was fit using five-fold cross-validation to avoid overfitting, and hyper-parameter fitting was done to maximize predictive power. For each sub-model, we produced two sets of predictions: outof-sample and in-sample. Out-of-sample predictions for each model were generated by compiling the predictions from the five holdouts from each cross-validation fold, and in-sample predictions were generated by re-fitting the sub-models using all available data. The out-of-sample sub-model predictions were used as explanatory covariates when fitting the geostatistical model described below, and the insample predictions were used when generating predictions from the geostatistical model in order to maximize data use. In both instances, the logit-transformation of the predictions was used to put these predictions on the same scale as the linear predictor in the geostatistical model.

Model description
We modelled HIV prevalence using the following spatially and temporally explicit generalized linear mixed effects model: where: • , and , are the number of individuals sampled and the number of individuals who are HIV+ among those sampled, respectively, in location i and year t; • , is the underlying HIV prevalence in location i and year t; , is a vector of logit-transformed stacked covariates for location i and year t, and 1 is the corresponding vector of regression coefficients; • [ ] is a country-level random effect for country c containing location i; • , is a spatially and temporally correlated random effect for location i and year t; • , is an independent and identically distributed random effect for location i and year t; • is an indicator variable that is 1 for data derived from antenatal care clinic sentinel surveillance and 0 otherwise; • 2 is a fixed offset for data derived from antenatal care clinic sentinel surveillance; • and is a spatially correlated random offset for data derived from antenatal care clinic sentinel surveillance for location i.
Descriptively, this model specifies logit-transformed HIV prevalence as a linear combination of a regional intercept ( 0 ), covariate effects ( 1 , ), country random effects ( [ ] ), spatially and temporally correlated random effects ( , ), and an uncorrelated error term or nugget effect ( , ). The intercept captures the overall mean level of HIV prevalence while the covariate effects capture the spatial and temporal variation in HIV prevalence that can be described as a function of spatial and temporal variation in the included covariates. The country random effects capture additional variation between countries, while the spatially and temporally correlated random effects capture additional variation by location (within and between countries) and time that varies smoothly in space and time. Finally, the uncorrelated error term (or nugget effect) captures any additional, non-structured variation by location and time.
HIV prevalence as measured by sentinel surveillance of antenatal care (ANC) clinics is known to be biased as a measure of HIV prevalence in the general adult population because it captures pregnant women who attend ANC only, as compared to all adult men and women 38,39 . This bias may be either positive or negative: the fact that all pregnant women are sexually active tends to elevate their risk of having acquired HIV prevalence compared to the general adult population (some of whom are not sexually active), while HIV-related sub-fertility tends to reduce the prevalence of HIV-positive women among the population of pregnant women 40,41 . However, ANC data have better temporal and spatial coverage in many countries than survey data alone (Extended Data Figure 1). We incorporated ANC data to capitalize on this additional data coverage, but also attempted to correct for the known biases. In instances where data in our model were derived from ANC sentinel surveillance ( = 1), our model allows for this bias via a fixed term ( 2 ) that captures the overall mean bias, and a spatially varying term ( ) that captures local differences in the extent of this bias. This approach is conceptually similar to previously described approaches for spatial modelling using non-randomized (and therefore potentially biased) data and randomized survey data 42,43 . Although the bias associated with ANC sentinel surveillance may also vary over time in addition to varying spatially, we felt there was insufficient data to estimate both spatial and temporal variation in this bias, and so the bias associated with ANC sentinel surveillance was assumed to be time-invariant over the period of this analysis.
The spatially and temporally correlated random effect ( , ) is modelled as a Gaussian process with mean 0 and a covariance matrix given by the Kronecker product of a spatial Matérn covariance function (Σ ) and a temporal first-order autoregressive (AR1) covariance function (Σ ). is a spatially correlated random effect and is modelled as a Gaussian process with mean 0 and spatial Matérn covariance (Σ ). The Matérn covariance function is given by: In this analysis (the smoothness parameter) was fixed at 1. For both , and , a penalized complexity (PC) prior was used for the Matérn covariance function and specified via two hyper-parameters: the spatial range, s (where s = √8 ⁄ and is equal to the distance at which correlation is approximately 0.1; the subscript s for space is used as to not confuse with the temporal correlation parameter), and marginal standard deviation, σ. PC priors shrink towards a more simplistic base model -in this case, one where the marginal variance is 0 and the spatial range is infinite -and are specified via setting the tail probabilities on each hyper-parameter 44,45 . We followed the guidance provided by Fugulstad et al., who recommend selecting priors that satisfy P(σ > σ 0 ) = 0.05 and P� s < 0 � = 0.05, where σ 0 is between 2.5 to 40 times the expected true marginal standard deviation and 0 is between 1 10 ⁄ to 1 2.5 ⁄ of the expected true range 46 . Specifically, we set: In addition, for , , the AR1 covariance function is associated with a temporal correlation parameter (not be confused with s ). We used the following hyper-prior in this case, which corresponds to a prior mean of 0.76 with a 95% range of -0.17 to 0.97 for : PC priors were also used for the standard deviation of [ ] and , and were set to: Finally, priors for fixed effects were set as:

Prior sensitivity analysis
Sensitivity analyses were undertaken to assess the impact of the hyper-priors for all random effects on the model predictions. In addition to the model described above, we considered five alternate models with different hyper-prior specifications, as outlined below.
• Model 1 (Less informative PC priors) In this model, we retained the PC priors for s and σ and the hyper-parameters for , and , but made these hyper-priors less informative by specifying a larger value of the standard deviation and smaller value of the spatial range when setting the tail probabilities: We similarly updated the PC priors for the standard deviation of [ ] and , : The hyper-prior for the temporal correlation, , was the same as our selected model.

• Model 2 (Less informative PC priors with informative )
This model has the same less-informative PC priors as Model 1 on the spatial parameters and other random effects, but the hyper-prior for the temporal correlation parameter was adjusted to be more heavily weighted toward a high temporal correlation: This hyper-prior corresponds to a prior mean of 0.96 with a 95% range of 0.68 to 1 for .

• Model 3 (INLA default)
This model used the default hyper-priors in INLA implemented in the function inla.spde2.matern() 47 . These hyper-priors are defined in terms of (where = �1/(4 2 2 ) ) and , and were set as: INLA default hyper-priors were also used for the precision parameter of the remaining random effects: The hyper-prior for the temporal correlation parameter, , was the same as our selected model.

• Model 4 (INLA default with informative ):
This model had the same hyper-priors for , , 2 , and 2 as model 3, but the more informative hyper-prior for the temporal correlation parameter, , described for model 2.

• Model 5 (default PC prior with informative )
This model used the same PC priors as our selected model but used the more informative hyperprior for the temporal correlation parameter, , described for model 2.
All six models (five alternative models outlined here and selected PC prior model described in the previous section) were fit for each covariate model and the HIV prevalence model to enable comparison of the corresponding model predictions. The less-informative PC prior (models 1 or 2) failed to converge in at least one region for three covariates ('partner away', 'had intercourse', and 'condom last time'), which is a known problem in INLA when the prior is insufficiently informative. For these indicators, the model that failed to converge was excluded from the comparison.
The predictions from each of the five alternate models were highly correlated with the predictions from the selected model at the grid cell level and at the first-and second-order administrative subdivision level, with the pairwise correlation above 0.999 in most cases (Supplementary Table 8). Additionally, in all but one case (Model 4 for 'male circumcision' at the first-order administrative subdivision level), there was 100% overlap in the uncertainty intervals between the selected model and any alternate model. The mean absolute difference between predictions was also generally low. These comparisons suggest that the predictions are relatively robust to different hyper-prior specifications.

Model fitting and prediction
This model was fit in R-INLA 48 using the stochastic partial differential equations (SPDE) 49 approach to approximate the continuous spatial and spatial-temporal Gaussian random fields ( and , , respectively). We constructed a finite elements mesh for the SPDE approximation to the Gaussian process regression using a simplified polygon boundary (Supplementary Figure 27). We set the inner mesh triangle maximum edge length (the mesh size for areas over land) to be 0.25 decimal degrees, and the buffer maximum edge length (the mesh size for areas over the ocean) to be 5.0 decimal degrees. Estimated model parameters are listed in Supplementary Table 9.
Due to computational constraints and to allow for regional differences in the relationship between covariates and HIV prevalence as well as the strength of spatial and temporal auto-correlation in HIV prevalence, separate models were fit for four regions (Extended Data Figure 7). Specifically, we used the regional classifications for sub-Saharan Africa from the Global Burden of Disease (GBD) study 50 which group countries by location and epidemiological profile. We made one modification to this classification, grouping Sudan as part of the Eastern sub-Saharan Africa region rather than the North Africa and the Middle East region. After fitting each model, we generated 1,000 draws of all model parameters from the approximated joint posterior distribution using the inla.posterior.sample() function in R-INLA. For each draw s of the model parameters we constructed a draw of , as: is set to 0 for the purposes of generating estimates, so draws of 2 and are not incorporated when generating draws of , . Additional processing of the output from inla.posterior.sample() is required for the spatial-temporal random effect ( , ( ) ) and the nugget effect ( , ( ) ) prior to constructing , ( ) according to the equation above. Specifically, for , ( ) , draws are generated initially only at vertices of the finite element mesh, so we project from this mesh to each combination of ( , ) desired for prediction, i.e., the centroid of each grid cell on a 5 × 5-km grid as well as all years from 2000 to 2017. For the nugget effect, we generate , ( ) for each combination of ( , ) by sampling from Normal �0, 2 ( ) �. At the end of this process, we have 1,000 draws of , for each grid cell and year combination.

Validation strategy
We used five-fold cross-validation in order to assess the performance of the modelling framework described above with respect to predicting HIV prevalence. To do so, we first split all survey data into five groups by randomly sorting a list of unique identifiers for each survey, calculating the cumulative effective sample size represented by the surveys in this list, and then dividing the list into five parts at the point where this cumulative sample size was closest to 20%, 40%, 60%, and 80% of the total. This results in five groups that are approximately equal in terms of the total effective sample size and which contain entire surveys (i.e., all of the data points derived from each survey are contained exclusively within only one fold). We then fit the model described above five times, excluding each of the five groups of data in turn.
All ANC data were included in all models and were not used to assess model performance given the known biases in these data.
After fitting the model five times, the data withheld from each model were matched with predictions from that model, and then these data-prediction pairs were compiled across all five models, resulting in a complete dataset of out-of-sample predictions corresponding to all survey data included in the analysis.
HIV prevalence estimates based on single survey clusters are generally quite noisy due to very small sample sizes, and are consequently insufficient as a 'gold standard' for evaluating the model predictions 6 .
To address this issue, we aggregated both the observed data and the corresponding out-of-sample predictions within countries and within first-and second-level administrative subdivisions, by calculating a weighted mean of each using the effective sample sizes as the weights. Then, across all data-estimate pairs, we calculated two summary measures: the mean error (ME, a measure of bias) and the root-meansquare error (RMSE: a measure of total variance).
In addition, for each data-estimate pair, we constructed 95% prediction intervals from the 2.5 th and 97.5 th percentiles of 1,000 draws from a binomial distribution corresponding to each of the 1,000 posterior draws of HIV prevalence with p equal to HIV prevalence in a given posterior draw and N equal to the effective sample size for the data point. We then calculated coverage as the percentage of data-estimate pairs where the data point was contained within this 95% prediction interval.
Finally, to complement the out-of-sample predictive validity metrics, we also calculated in-sample predictive validity metrics using the same process but matching each data point to predictions from a model fit using all data.

Sensitivity analyses
We used this validation strategy to assess model performance of the final model (as described in sections 4.1-4.2) and to compare to a number of alternatives, considering in particular the impact of the choices and assumptions we make with regards to incorporating covariates, using data from ANC data, and integrating data with polygon rather than point location information 51 .
First, we assessed the contribution of the covariates and the stacking algorithm to the overall modelling strategy by considering models with no covariates and models with covariates included in the geostatistical model as-is ('raw') rather than first going through the stacking algorithm. We additionally considered models with and without the Gaussian process (GP) spatial-temporal smoothing term ( , ). This leads to five models: The summary error measures for these five models are shown in Supplementary Figure 20 and a comparison of the estimates derived from these models is shown in Supplementary Figure 21. Across all three levels of aggregation and all three validation metrics, the models with a Gaussian process outperformed those without, with mean error closer to zero, smaller RMSE, and coverage closer to 95%. Among the three models with a Gaussian process, performance was more similar, although the mean error for models with covariates (either raw or stacked) was noticeably closer to zero than models without covariates, both in-and out-of-sample. For all models with a Gaussian process, coverage of the prediction intervals was between 96% and 97% in-sample, and between 90% and 93% out-of-sample.
Second, to assess the contribution of including ANC data in addition to survey data as well as several alternate approaches to including these data, we considered models with and without the ANC data included, and three different approaches to the bias-correction for these models: no correction, as a non-structured (independently and identically distributed (IID)) random effect, and as a spatially structured random effect. This leads to four models: The summary error measures for these four models are shown in Supplementary Figure 22, and a comparison of the estimates derived from these models is shown in Supplementary Figure 23. The model including ANC data but no bias correction was consistently the worst in terms of mean error and RMSE both in-and out-of-sample. The models including ANC data with some type of bias correction outperformed the model with no ANC data, with a noticeably lower out-of-sample RMSE. Comparing the two bias corrections, the model with the Gaussian process correction was slightly superior to the model with the IID correction in terms of out-of-sample mean error (closer to zero), and out-of-sample coverage (closer to 95%), although the RMSE both in-and out-of-sample was comparable between these two models.
Third, to assess the sensitivity of the model to including polygon data processed via polygon resampling in addition to point data, we considered models with and without polygon data included. In both cases, we assessed the performance of the model in terms of in-sample and out-of-sample predictions for the point data only. The summary error measures for these two models are shown in Supplementary Figure 24, and a comparison of the estimates derived from these models is shown in Supplementary Figure 25. Insample RMSE was relatively constant between the two models, while in-sample mean error was slightly larger in the model with point and polygon data compared to the model with point data only. Out-ofsample, however, the model with both point and polygon data performed noticeably better -smaller mean error and lower RMSE -than the model with point data only.

Calibration to Global Burden of Disease 2017
To take advantage of the more epidemiologically structured modelling approach and additional nationallevel data used by GBD 2017, we performed post hoc calibration of our estimates to the GBD estimates 52 . As a preliminary step to this calibration and the aggregation described in the next section, we first intersected each grid cell with the second-level administrative subdivision shape file to determine what fraction of the area of each grid cell fell within each administrative unit. Since all second-level subdivisions nest within first-level subdivisions, which in turn nest within countries, this strategy assigned the cell fractions to an administrative area at each level of the administrative hierarchy. We assumed that population density within each cell was uniform, and for cells that were split across multiple subdivisions, allocated the WorldPop population estimate in proportion to area. This process was carried out separately for each modelling region, so cells that cross international borders that are also regional borders were allocated in their entirety to the country that contained the centroid of the grid cell.
Using this assignment of cells and cell fractions to the administrative hierarchy, we first scaled the gridcell-level WorldPop estimates for the 15-49 age group to match the corresponding GBD population estimates for each country and year 53 . To do so, for each country and year, we defined a population raking factor as the ratio of the GBD population estimate to the sum of the WorldPop population estimates for all cells and fractional cells within the country, and then multiplied the WorldPop population estimates for all cells and fractional cells within the country by this raking factor.
We then similarly adjusted our HIV prevalence estimates. Specifically, for each country and year, we defined a prevalence raking factor as the ratio of the GBD prevalence estimate to the populationweighted mean of estimates for all cells and fractional cells within the country, and then multiplied each HIV prevalence draw for all cells and fractional cells within the country by this raking factor. At this point, the prevalence estimates for cells that had been fractionally allocated to multiple countries were recombined by calculating a weighted average with weights determined by the relative area of each fraction. Final point estimates for each grid cell were calculated as the mean of the scaled draws, and 95% uncertainty intervals were calculated as the 2.5th and 97.5th percentiles of the scaled draws. The impact of this calibration procedure is depicted in Supplementary Figure 26, which compares the pre-calibration estimates to the post-calibration estimates, both aggregated at the national level. The corresponding prevalence raking factors had a mean of 1.09 and an interquartile range of 0.88-1.14.

Aggregation to first-and second-level administrative subdivisions
In addition to estimates of HIV prevalence on a grid, we also constructed estimates of HIV prevalence for first-and second-level administrative subdivisions. These estimates were derived by calculating population-weighted averages of HIV prevalence for each grid cell or fractional grid cell within a given first-or second-level administrative subdivision. Grid cell fractions were assigned through intersection as described above, and this process used the calibrated population estimates and prevalence draws also described above. This was carried out for each of the 1,000 posterior draws at the grid cell level, generating 1,000 posterior draws for each administrative subdivision. Final point estimates and uncertainty intervals for each subdivision at each level of the administrative hierarchy were derived from the mean, 2.5 th percentile, and 97.5 th percentile of these draws, respectively.

Calculating people living with HIV (PLHIV)
We estimated the number of people living with HIV (PLHIV) in each grid cell and year by combining estimated population and HIV prevalence after calibration to GBD as described above. Specifically, for each cell and fractional cell, we multiplied the estimated population by each of the 1,000 prevalence draws to generate 1,000 draws of PLHIV. Fractional cells were then recombined by summing PLHIV for each draw within each cell. Final point estimates and uncertainty intervals for PLHIV were calculated as the mean, 2.5 th percentile, and 97.5 th percentile of these draws, respectively.

Advances compared to previous analyses
This study is not the first to consider subnational variation in HIV prevalence (Supplementary Table 1). However, this analysis builds upon and expands beyond previously published analyses in several important ways.
First, this is, to our knowledge, the only study to date to investigate subnational variation in HIV prevalence with a high degree of spatial granularity (second-level administrative subdivisions or lower) on a continental scale. This enables multinational comparisons and exploration of cross-border trends, but more importantly ensures that estimates are available for all countries in sub-Saharan Africa, including those that may have previously been missed entirely or analyzed only at a coarser subnational level (e.g., first-level administrative subdivisions).
Second, this analysis makes use of a wider array of data sources than are typically considered, with the aim of producing more robust and precise estimates. Most previous analyses used a single data source per country, whereas this analysis incorporated all identified and available data from seroprevalence surveys and ANC surveillance in each country and incorporated additional data in the form of covariates. Additionally, by calibrating our estimates to those from the GBD, we are able to capitalize on the more structured modelling approach and additional national-level data used in that study.
Third, this study considers not only differences in HIV prevalence at a single point in time, but also temporal trends over an 18-year period. While estimates for the most recent year are likely the most useful from a programmatic or policy standpoint, information about recent trends can also provide some suggestion of what to expect in the near future.
Finally, to our knowledge this is the first study to make estimates readily available via a web-based visualization tool, which we believe will make these estimates more accessible. We intend to regularly update this analysis so that contemporary, comprehensive data continue to be available. Subnational estimates of HIV prevalence or PLHIV are already in use in some contexts, for example, for geographic priority setting in annual PEPFAR country operational plans 54 . The underlying data sources and methodologies used to produce these estimates are often unclear based on publicly available documentation.

Limitations
This methodology and the resulting estimates are subject to a number of limitations.
Most importantly, the accuracy of our estimates is critically dependent on the quantity and quality of the underlying data. We have constructed a large database of geo-located HIV prevalence data for the purposes of this analysis; nonetheless important gaps in data coverage, both spatial and temporal, remain (Extended Data Figures 1-3). In addition, there are several factors related to data quality that should be acknowledged. Survey data are subject to non-response bias. This is potentially a particular concern for HIV prevalence data, as participation rates in the blood test portion of household surveys are typically lower than response rates for the questionnaire portion 55 . We carried out a complete-case analysis and have made no explicit adjustment for potential non-response bias. Future research should consider the impact of non-response on HIV prevalence estimates and potentially include adjustments alongside geostatistical methods for mapping prevalence. Survey data used to construct covariates may also be subject to biases. In addition to non-response bias, recall bias and social desirability bias are potential concerns for these self-reported outcomes 56 . If these biases impact our covariate estimates, particularly in a manner that varies spatially and temporally, these errors could potentially be propagated to the HIV prevalence estimates. However, our out-of-sample validation analysis found no evidence of increased bias or total error (as measured by RMSE) in HIV prevalence as estimated from models using covariates compared to models that do not use covariates.
The location information associated with the data compiled for this analysis is subject to some error. In order to protect respondent confidentiality, most surveys that collect GPS coordinates perform some type of random displacement on those coordinates prior to releasing data for secondary analysis: for example, GPS coordinates for Demographic and Health Surveys (DHS) are displaced by up to 2 km for urban clusters, up to 5 km for most rural clusters, and up to 10 km in a random 1% of rural clusters 57 . Past research has found that displacement can degrade the predictive power of a geostatistical model; however, this effect was found to be modest, and researchers concluded that relatively accurate mapping can be undertaken at a 5 x 5-km resolution even with GPS displacement 58 . ANC data and survey data extracted from reports and scientific literature were manually matched to location information, a process that is inherently associated with some uncertainty and potentially human error. Moreover, ANC data are collected at clinics. We attempted to match these data to the clinic location, but clinic location is only a proxy for where the women attending this clinic reside. Finally, data associated with polygons rather than GPS coordinates were resampled so that they could be included in the geostatistical model, but this process effectively assumes that HIV prevalence is constant over the polygon and may result in overly homogenous estimates in locations with a preponderance of polygon data. This may be the case, for example, in the Southern sub-Saharan Africa region, where an unusually large proportion of the ANC data are associated with polygon rather than point location data and the posterior estimate of the spatial range parameter is noticeably larger than in other regions (Supplemental Table 9). Research on scalable methods for better integration of polygon data in geostatistical models similar to those used in this analysis is currently ongoing.
With respect to the modelling strategy, the primary limitation is the difficulty in assessing model performance at the grid cell level. We used cross-validation to assess model performance, but due to the substantial impact of sampling error on estimates derived from single survey clusters, it was necessary to aggregate both the data and predictions when assessing error. Additionally, while we have attempted to propagate uncertainty from various sources through the different modelling stages, there are some sources of uncertainty that have not been propagated. In particular, it was not computationally feasible to propagate uncertainty from the covariates (all of which are associated with some uncertainty, including those that we modeled specifically for this analysis) or the sub-models in stacking through the geostatistical model. Similarly, although the WorldPop population raster is also composed of estimates associated with some uncertainty, this uncertainty is difficult to quantify and not currently reported, and so we were unable to propagate this uncertainty into our estimates of the number of PLHIV or estimates of HIV prevalence for administrative subdivisions that were created using population-weighted averages of grid cell estimates. The modelling strategy also incorporates a number of assumptions, which are difficult to check and, if incorrect, may lead to error. In particular, due to data sparsity, we elected to assume that the bias associated with estimates of HIV prevalence derived from ANC sentinel surveillance varied spatially but was constant over time, though it is possible that this bias varied temporally as well. Finally, our model 'borrows strength' spatially and temporally as well as from covariates in order to stabilize estimates in locations and time periods with very small sample sizes and to interpolate in locations and time periods with no directly observed HIV prevalence data. While we think this is broadly appropriate, there may be specific places and times where this methodology fails; for example, abrupt discontinuities in HIV prevalence in neighboring locations or time periods that cannot be described as a function of the covariates we included are unlikely to be reflected in our estimates unless they are directly observed in the underlying HIV data and those data have substantial sample sizes.
Model fitting was carried out using an integrated nested Laplace approximation to the posterior distribution, as implemented in the R-INLA package 48 . Prediction from fitted models was subsequently carried out using the inla.posterior.sample() function, which generates samples from the approximated posterior of the fitted model. Both model fitting and prediction thus require approximations, and these approximations may introduce error. While it is difficult to assess the impact of these approximations in this particular use case, our validation analysis found that our final model has minimal bias and good coverage of the 95% prediction intervals, which provides some reassurance that the approximation method used -as well as other potential sources of error -are not resulting in appreciable bias or poorly described uncertainty in our reported estimates.

Future directions
There are opportunities and a need to expand this analysis further.
In particular, information on HIV prevalence by age and sex and covering a larger age range would be valuable, given the well-known differences in HIV prevalence by age and sex, the unique needs of children and adolescents who are living with HIV, and the increasing number of PLHIV who are age 50 years or older 52,59 .
Even more so, mapping diagnosis rates, treatment rates, incidence, and mortality in addition to prevalence would provide more nuanced tools to target primary prevention, testing, and health care delivery strategies, and would also allow for local tracking of progress toward international targets including SDG3 and the 90-90-90 targets 60,61 . Comprehensively mapping incidence and mortality in sub-Saharan Africa is likely to be considerably more difficult than mapping prevalence, as there are substantially fewer sources of directly observed data 62 . Nonetheless, there is an opportunity to combine the geostatistical methods described here with methods akin to those used by the GBD study and by UNAIDS to estimate incidence and mortality largely on the basis of observed prevalence and the epidemiological relationships between prevalence, incidence, and mortality 52,59 . Moreover, HIV incidence assays -which distinguish between recent and non-recent infections, allowing for direct measurement of population-level HIV incidence -are increasingly common in seroprevalence surveys 63 and in routine health care settings, and could provide an important input into future incidence mapping efforts. Similarly, survey-based data on diagnosis and treatment are also increasingly common and could support future efforts to map diagnosis and treatment rates 63 .
This analysis does not speak to the underlying mechanisms driving geographic differences in HIV prevalence or differences in changes in HIV prevalence over time. Future analyses focusing on geographic differences in HIV incidence and mortality, as well as consideration of migration patterns, will be necessary to assess the extent to which local changes in HIV prevalence observed in this study are due to changes in incidence, mortality, and/or migration. Similarly, future analyses of diagnosis and treatment patterns, potentially coupled with information about structural and behavioural factors related to HIV incidence (including many of the variables constructed as covariates in this analysis), would enable an even more nuanced exploration of the root causes of the geographic differences observed here.
Finally, this analysis generated estimates on a 5 x 5-km grid and then aggregated these estimates to produce estimates for first-and second-order administrative subdivisions, but other levels of analysis are possible. Future research should consider the optimal level of analysis, which likely requires attention to data availability, data quality, and the specificity of the location information associated with available data sources, in addition to consideration of the level of policy action, implementation, and monitoring.   (1994,1996,1999,2001,2003,2005,2007,2010) Supplementary