A spatial database of health facilities managed by the public health sector in sub Saharan Africa

Health facilities form a central component of health systems, providing curative and preventative services and structured to allow referral through a pyramid of increasingly complex service provision. Access to health care is a complex and multidimensional concept, however, in its most narrow sense, it refers to geographic availability. Linking health facilities to populations has been a traditional per capita index of heath care coverage, however, with locations of health facilities and higher resolution population data, Geographic Information Systems allow for a more refined metric of health access, define geographic inequalities in service provision and inform planning. Maximizing the value of spatial heath access requires a complete census of providers and their locations. To-date there has not been a single, geo-referenced and comprehensive public health facility database for sub-Saharan Africa. We have assembled national master health facility lists from a variety of government and non-government sources from 50 countries and islands in sub Saharan Africa and used multiple geocoding methods to provide a comprehensive spatial inventory of 98,745 public health facilities.


Background and Summary
Defining the location of health services in relation to the communities they are intended to serve is the cornerstone of health system planning, ensuring the right services are accessible to the population and that no one is geographically marginalized from essential services. Across sub-Saharan Africa (SSA), health services are offered with increasing levels of medical sophistication, from community providers who handle basic care to hospitals which play a critical role in providing emergency care [1][2][3] . However, these services are not accessible to everyone equally 4 . This inability to reach quality assured health services able to provide life-saving interventions contributes to the sustained high burden of communicable and non-communicable disease morbidity and mortality in SSA relative to other regions of the world 5,6 . The Global health agenda now aims to ensure Universal Health Coverage (UHC), to ensure that all people obtain the health services they need and underpins the health-related Sustainable Development Goals. Defining and planning for universal equity in access to health services demands better data on the location of both services and populations they are intended to serve.
While countries have been encouraged to develop health Master Facility Lists (MFLs) 7,8 and censuses of service provision 9 , this has not been universal and most inventories lack coordinates. In Kenya in 2003, we began an exercise to compile and geocode a single, public health MFL from a variety of government, non-governmental (NGO), community-based (CBO) and faith-based organization (FBO) listings 10 , the first time a national map of service providers had been developed since 1959 11 . Over the next 15 years we extended this approach across other countries in SSA, coinciding with a regional impetus to improve master health facility lists 7,8 .
Access to health care is a complex and multidimensional concept and can be defined in a variety of ways, however, in its most basic sense, it refers to geographic availability. Improved data metrics on health access demands at the very least a knowledge of where service providers are in relation to populations. Provider-to-population ratios were used to estimate availability in 1970s and earlier [12][13][14] , then the emergence of Geographic information (2019) 6:134 | https://doi.org/10.1038/s41597-019-0142-2 www.nature.com/scientificdata www.nature.com/scientificdata/ systems (GIS) in 1980s and 1990s led to the evolution of distance as a metric [15][16][17] while network analyses, floating catchment area methods and cost-distance analyses are more recent [18][19][20][21] . By identifying top tier hospitals and lower tier facilities, it is now possible to estimate specific access to emergency care and surgeries 18,[22][23][24] . Other extensions are application in defining access to Basic and Comprehensive Emergency Obstetric and Neonatal Care (BEmONC/EmONC) services 25 , evaluating public health sector service delivery 26 and community-based interventions [27][28][29] . In the future, it is likely that this will be extended to understanding health care optimization. Georeferenced health service inventories, beyond defining equitable access, increase the epidemiological value of defining the evolution of emerging, epidemic pathogens 30 , antimicrobial resistance, and defining sub-national disease epidemiology as well as improving the potential of routine health data use 31 .
Notwithstanding previous efforts to extract health service providers from OpenStreetMap complimented by volunteered information (https://www.healthsites.io), there is no single, georeferenced and comprehensive public health facility inventory for sub-Saharan Africa. Building on a previous audit of public hospitals in sub Saharan Africa 18 , we provide a geo-coded inventory of 50 countries in sub-Saharan Africa, provided as an open-access resource. We have focused on public health facilities managed by government, local authorities, FBOs and NGOs. A wide variety of primary sources of information were consulted, from websites and data portals of ministries of health, national and international organizations, health sector reports, to sourcing data through personal communications. Facility types and numbers for each country were validated using reported data in national health sector strategic plans and other health sector reports.

Methods
Data sources. We undertook a data assembly process to compile a comprehensive health facility database across SSA, starting with Kenya in 2004 10,32 , and expanding to the rest of SSA between 2012 and 2018. Countryspecific public health facility databases were developed through a systematic and iterative process of data assembly from a variety of sources, data abstraction, geocoding and technical validation of the data. For each country, the Ministry of Health (MoH) website was the first source of authoritative MFLs consulted. This was done through online searches and downloads of MFLs or maps hosted on the MoH website or on data portals managed by the MoH such as the National Health Map (Carte Sanitaire), health facility registries and online Health Management Information Systems (HMIS) including District Health Information Systems version 2 (DHIS2). In some instances, MoH publications with information on facility lists or maps were used. Occasionally, while the MoH sources did not have comprehensive lists of facilities or maps, other government agencies' websites, for example national statistical agencies, hosted health facility listings and these were consulted. Where the governmental ministries and agencies had no facility listing information or was inadequate, other sources of data were consulted, including data hosted by the United Nations Office for the Coordination of Humanitarian Affairs' (UNOCHA) Humanitarian Data Exchange (HDX) portal and websites of other UN agencies, and where relevant, websites of FBOs and NGOs working in each country. For countries where no health facility data were available online, we made attempts to directly contact people working with National Malaria Control Programmes, World Health Organization or other health-oriented agencies to aid in locating health facility listings. Often, more than one source of data was consulted for each country and assembled into a single list as illustrated in Online-only Table 1 and Fig. 1.
Locations of existing health facility data for sub-Saharan African countries are fragmented. Half of the 50 countries had an MoH website/HMIS/data portal as the main source of data used; 4 (8%) had other governmental bodies, mainly national statistical agencies, as the main data source; 9 (18%) countries had UNOCHA HDX as the main data source, while we used other sources like NGO and FBO websites for 7 (14%) countries. The main source of data for the remaining 5 countries (10%) is unpublished data obtained through personal communication. In total, 93 different sources of data were consulted to develop the first sub-Saharan Africa geocoded inventory of health service providers across all levels of service provision (Online-only Table 1).
We have classified the 50 countries into 4 broad categories according to each country's main source of health facility data consulted (Online-only Table 1). We have treated Zanzibar as a separate entity from mainland Tanzania as it has a separate health system. The first category includes 17 countries (Burundi, Central African Republic, Ghana, Guinea, Liberia, Malawi, Mali, Mozambique, Namibia, Nigeria, Rwanda, Sierra Leone, Somalia, South Africa, South Sudan, Tanzania and Zimbabwe) that have an online list of health facilities and for which data are available either fully or near fully geocoded. The second category includes 11 countries (Botswana, Chad, Comoros, Gambia, Kenya, Lesotho, Senegal, Seychelles, Togo, Uganda and Zambia) that had MFLs available online but lacked coordinates and where a process of geocoding was undertaken. The third category includes 17 countries (Benin, Burkina Faso, Cameroon, Cape Verde, Cote d'Ivoire, Djibouti, Equatorial Guinea, Eritrea, eSwatini, Ethiopia, Gabon, Guinea-Bissau, Mauritania, Mauritius, Niger, São Tomé & Príncipe and Zanzibar) that had maps available online where a facility list was created through a process of onscreen digitization of locations in a GIS platform. The fourth category are 5 countries (Angola, Congo, Democratic Republic of Congo, Madagascar and Sudan) where adequate data were not available online, and for which we relied on personal communication and in country contacts to acquire these data. We could not locate data for lower tier facilities for Guinea-Bissau and Sudan, and for 10 provinces in Angola (Online-only Table 1).
Data abstraction. Broadly, there are two main categories of health care delivery systems across sub-Saharan Africa -public and private. Private includes private-not-for-profit and private-for-profit 33 . We have focused on the public and private-not-for-profit sectors where health facilities are managed by the MoH, local authorities, NGOs, FBOs and CBOs. Private health facilities, though important in extending provision of health services, were excluded as previous audits of MFLs reveal that private sector facilities are often under-represented in MoH registries, located in urban centres, accessible only to those able to afford services, unregulated and do not often feature in MoH commodity distribution systems 17 . Their exclusion was a pragmatic decision based on difficulties www.nature.com/scientificdata www.nature.com/scientificdata/ with enumeration within the private sector 10,32 and the complexity of its structural and organizational system [34][35][36][37] . Enumerating and regulating this sector has remained a challenge in most countries in sub-Saharan Africa. The private sector was difficult to audit despite extensive searches from multiple sources, where available, completeness varied wildly from country to country, and most of the facilities were often based in large cities and smaller urban areas. Our focus, therefore, has been on providers of public health sector services that cover services including expanded immunization programmes, health data surveillance and receive government funding to provide services to the general population. We have made a single exception to this rule, in Botswana, where many private company-owned health service providers also provide care to the general public and are more formally integrated into the government health system 38 . Across all countries, we excluded government facilities serving special groups, for example prisons, armed services, police, schools, universities and government service clinics. These services are often not available to the general population. Further exclusions were made to remove specialized care service providers such as blood transfusion centres, HIV Voluntary Counselling and Testing (VCT) centres, maternity and nursing homes, family planning clinics and specialist facilities (e.g. dental, eye, psychiatric, tuberculosis, rehabilitation, ophthalmic) (Fig. 1). The spatial location of these services is equally important for health system planning but were not universally documented across national health facility listings. Our ambition was to provide a geo-coded inventory of operational facilities providing broad clinical care services to the general population. It was notable across several data sources that not all facilities were deemed operational, and those labelled as "under construction" or "closed" were excluded. In total, 98,745 public health facilities were assembled for 50 countries in sub Saharan Africa.
Data cleaning. From all sources a single set of minimal descriptive data was possible for each facility: first level administrative unit, facility name, facility type, and details on facility ownership. A common feature across all databases was the presence of duplicates and these were identified by a careful review of each country's list and where identified were discarded. For facilities without names, we adopted names of the smallest administrative units i.e. wards, towns or villages, in which they are located if these were included in the original lists, with the assumption being that it is likely that a public facility has same naming orientation as the ward, town or village its located in. For those without names and information on admin units but with coordinates, we adopted the name of the nearest populated place based on calculated relative distances. In instances where information on facility ownership was missing an attempt was made to assign ownership based on the facility name e.g. all facilities that had a Christian name or church name such as Saint, protestant, mission, evangelical or Islamic were assigned FBO ownership, facility names with NGO names were classified as NGO-owned while the rest were assigned government ownership. The size and definition of level of health facilities varied considerably within and between countries (Online-only Table 1). There doesn't exists a universal standardized definition of facility types making cross country comparisons difficult, and definitions provided in national health policies varied between countries hence while we are relatively confident in the hospital definition used here, the definition for subsequent tiers were www.nature.com/scientificdata www.nature.com/scientificdata/ harder to reconcile between countries. We have therefore elected to include the information as shown in primary data sources consulted (Online-only Table 2).
Geocoding. Facilities that had GPS coordinates from original data sources were 47,805 (48%), while 10,759 (11%) coordinates were obtained from digitizing health facility maps. For the remaining 40,181 (41%) health facilities with missing coordinates, we used a variety of existing location data and digital resources for geocoding. As a priority, where Global Positioning Systems (GPS) data for previously mapped infrastructure like schools were available, we used these to match to names of the health facilities in the same administrative zones to assign coordinates (n = 9,656; 10%). The most useful digital resources were Geonames digital place name gazetteer (n = 6,270; 6%), Google Earth/Maps (n = 5,672; 6%) and other digital place name gazetteers including Encarta, OpenStreetMap, HereMaps, Bingmaps, iTouchMap and Fallingrain (n = 11,925; 12%) ( Fig. 1). A comprehensive list of all digital place name gazetteers used is presented in Table 1. Matching facility names to place names proved difficult in some situations. Although digital gazetteers strive to realize standard nomenclatures and unique naming strategies, these are difficult to achieve at national levels where spellings change between authors, overtime and where the same names are replicated across different places in the country 39 . Since all data were stored in a standard format in MS Excel (Microsoft, Redmond, USA), where maps were available, facility locations were digitized in ArcGIS 10.5 (ESRI, Redlands, USA), attribute information captured and converted to MS Excel.
For facilities that could not be geo-located using any of the above resources, we used a combination of sources and methods to geocode these (n = 4,308; 4%). This involved finding descriptions of locations of villages matching health facility names in journal papers and news articles and using this information to triangulate these locations in Google Earth, or using shapefiles of the smallest administrative divisions of a country, matching these with the same administrative units in the facility databases and placing the facility location on a populated place in Google Earth, but only in administrative units that were roughly 5 square kilometers in area to minimize the location error. Within small administrative units in Google Earth, populated places were not always obvious to locate visually so we located them by tracing footpaths, roads, rivers or streams that led up to populated places. www.nature.com/scientificdata www.nature.com/scientificdata/ Facilities that could not be geocoded using any of the sources in Table 1 in the final SSA MFL are 2,350 (2%). Figure 2 shows the location of the geocoded 96,395 public health facilities in sub Saharan Africa.

Data Records
The geocoded master facility list described here has been made publicly and freely available through both the figshare repository 40 and through the World Health Organization's Global Malaria Programme (https://www. who.int/malaria/areas/surveillance/public-sector-health-facilities-ss-africa/en/) in Microsoft Excel format. While the dataset available through figshare 40 will be preserved in its published form, the one available through the World Health Organization's Global Malaria Programme website will be updated through further in-depth validation and updating in collaboration with individual countries. The Database includes data from 50 countries in sub-Saharan Africa. Each data record represents a health facility and has 8 descriptive variables -Location identifiers including: country, first level administrative division, latitude, longitude and LL source (source of the coordinates). Coordinates are rounded off to four decimal places for uniformity, allowing an accuracy of 5-10 metres in decimal degrees coordinate format. We have included only the first level administrative divisions for each country that were in the primary data sources as they are least ambiguous and do not change much over time unlike second level administrative divisions which can change rapidly and are not consistent over time for most countries. The other pertinent details are the facility name, facility type and facility ownership. Facility types definitions vary between countries (Online-only Table 2), for example, a health centre in Kenya may not necessarily equal to a health centre in Sierra Leone in terms of services offered. Ownership expresses who owns or manages a facility e.g. MoH (government), local authority, NGO, FBO or CBO. Health facility coordinates are given in decimal degrees format in the World Geodetic System 1984 (WGS84) coordinate system.

technical Validation
Heath facility locations. Facility locations were validated in ArcGIS using two datasets and by visual inspection in Google Earth. In ArcGIS, the Spatial Join and Select by Location tools were used to identify facilities that fell slightly off international boundaries or slightly outside of their correct region and/or district. The Global Lakes and Wetlands Database (GLWD) 41 shapefile was used to ensure facilities were within defined land areas. While in theory GPS coordinates should represent an unambiguous spatial location, these required careful www.nature.com/scientificdata www.nature.com/scientificdata/ re-evaluation to ensure that facility locations were on their true position on the Earth's surface. We therefore routinely verified all GPS data from all sources using place names and/or Google Earth to ensure coordinates were located on populated places. Despite 47,805 (48%) facilities in the original lists having coordinates, some of the these plotted in wrong districts, on water bodies and or outside country boundaries. These were assigned new coordinates using Google Earth and minor shifts made in ArcGIS.
Service delivery levels and number of health facilities. Health facilities levels are defined based on the complexity of services they offer. Health facilities types, specifically at the primary level, vary between countries in their definitions, specialization, size of populations they serve and the essential services they provide, infrastructure and staffing. Health centres, medical centres, polyclinics, health posts, dispensaries, clinics, health huts, health units etc. may have similar functions but equally may represent different levels of service provision between countries. To cross-reference the completeness of our national inventories of facilities and to understand this variation, we first identified and defined service provision and health facilities at each level based on information from health sector policies and strategic plans for each country. We compared this against information in our database and for most countries, there was good concordance across all levels of service provision. Secondly, the number of health facilities at each service provision level for each country were compared with the corresponding number of facilities reported in the most current Health Sector Strategic Plans (HSSP) and other health sector reports and these were found to be largely similar with minor variations (Online-only Table 2). Most of these variations are partly because HSSPs and other health sector reports usually under-report mission and NGO facilities or do not at all, and in some instances, there was temporal misalignment between the data included in the database and the HSSPs consulted. While this inventory serves as a useful entry point to future facility censuses in Africa, the accuracy and completeness of this resource now requires further country and regional level validation.

Usage Notes
The definitions of health facility levels provided in national health policies and across country specific MFLs varied between countries hence the arbitrariness of health facility definitions used here. This has been noted previously 8 and demands development of uniform health service delivery definitions across Africa. Importantly, indicators of service availability cannot accurately reflect access to and utilization of services. Hence service accessibility must be accompanied with improved quality of care at facilities. Accessibility on its own does not by itself guarantee that services are available 42 . Defining the scope, service provision and laboratory capacities and optimal catchment populations for health care providers should therefore be a priority.
We have elected to exclude the facility codes provided in the original sources because, though important, we are unsure of the fidelity of these facility codes as unique facility identifiers in each country without further investigation, and these codes were absent from most of the original data sources we consulted. While we acknowledge the significance of facility codes in managing database updates and providing linkages to other datasets, we encourage countries to create a uniform national coding system for facilities to be used by all stakeholders. For now, future updates and linkage to this dataset will be possible through name and admin units matching, while facility type and ownership can also be used to refine the matching.
The private sector has grown significantly over the last two decades in many countries in Africa, and they will provide an important role in achieving universal health coverage. However, enumerating and regulating this sector has remained a challenge in most countries in sub-Saharan Africa and are difficult to audit within current master health facility lists. Future iterations of the African database of public service providers released here should aim to include private sector providers following improvements in their national enumerations and auditing.
In addition to the data release through the figshare repository 40 this dataset will simultaneously be hosted by the World Health Organization's Global Malaria Programme (WHO-GMP) (http://www.who.int/malaria/areas/ surveillance/public-sector-health-facilities-ss-africa/en) who will henceforth facilitate further in-depth validation and updating in collaboration with individual countries covered. This will also bring the possibility of assigning the facility codes for future updating, data linkages and comparisons. We have focused this work on SSA countries as this is the region our previous work has focused on, is most lacking with respect to comprehensive MFLs and it's the region under the World Health Organizations' Regional Office for Africa, who we have collaborated with on this work.
We recognize that, despite our best efforts, this dataset is incomplete and will change over time. As such, we encourage readers to alert us via email to additional and/or newly available data so that we may add them, if appropriate.

code availability
The data assembly was performed using MS Excel (Microsoft, Redmond, USA). The datasets included in this paper were manipulated using ESRI ArcGIS v10.5 and Google Earth V7.3.2.5776. Where maps were available, facility locations were digitized in ArcGIS, attribute information captured and converted to MS Excel.