Abstract
Onchocerciasis is a neglected tropical disease with numerous symptoms and side effects, and when left untreated can lead to permanent blindness or skin disease. This database is an attempt to combine onchocerciasis prevalence data from peer-reviewed publications into a single open-source dataset. The process followed to extract and format the information is detailed in this paper. A total of 14,043 unique location, diagnostic, age and sex-specific records from 1975–2017 have been collected, organized and marked for collapse where a single geo-position is shared between multiple records. The locations vary from single villages up to smaller administrative units and onchocerciasis control program-defined foci. The resulting database can be used by the global health community to advance understanding of the distribution of onchocerciasis infection and disease.
Design Type(s) | data integration objective • disease analysis objective |
Measurement Type(s) | Scientific Publication |
Technology Type(s) | digital curation |
Factor Type(s) | geographic location • temporal_interval |
Sample Characteristic(s) | Homo sapiens |
Machine-accessible metadata file describing the reported data (ISA-Tab format)
Background & Summary
Onchocerciasis is a filarial disease that can lead to permanent blindness and skin disease. Infection with Onchocerca volvulus is transmitted through the bite of the Simulium species of blackfly, which breed in fast-moving rivers. Once an individual is infected, the adult female worm circulates throughout subcutaneous connective tissues, producing thousands of larval worms (microfilariae). Microfilariae migrate into the skin and the eye, causing damage to these organs as they die, resulting in terrible itching and ocular lesions. After repeated years of exposure to microfilariae, these lesions can result in irreversible disability.
The World Health Organization estimates that approximately 200 million individuals1 reside in an area at risk of infection across Africa, the Americas and Yemen, with over 90% of the burden of onchocerciasis-related disease found in Africa. Large scale onchocerciasis control interventions began in 1974 with the Onchocerciasis Control Program (OCP), which employed vector control interventions throughout West Africa to reduce transmission by targeting potential breeding sites of Simulium black flies. In the 1980s, Mectizan (ivermectin) was demonstrated to be an effective microfilaricide, shifting the control strategy towards preventive chemotherapy via mass drug administration (MDA). In 1987, the Mectizan Donation Program began supporting national onchocerciasis control programs by supplying ivermectin free of charge. Since the inception of the donation program, MDA with ivermectin has become the primary intervention to reduce the transmission of infection. Recent evidence from the Americas2, Uganda3,4 and Sudan5, as well as modelling studies6, has shown that MDA at population coverage of at least 80% can interrupt transmission after a period of approximately 12–15 years of annual or semi-annual treatment, achieving local elimination. The success observed in these settings has led stakeholders7,8 to consider the elimination of onchocerciasis across Africa9.
The objective of this systematic review of published literature was to quantify the amount of onchocerciasis-related data available from peer-reviewed sources and aggregate those indicators into a single open-source dataset. In this report, we summarize the sources identified and the data extracted on onchocerciasis-related infection and disability indicators from 1975–2017, encompassing the period of implementation for control and local elimination programs in the Americas, Africa and Yemen. By presenting our results, we aim to make these data available for use in future studies of the burden of onchocerciasis-related disease as well as the prevalence of infection.
Methods
The methods outlined below are intended to clarify how data were systematically collected from published articles on onchocerciasis. The protocols stated here have been adapted from previously published literature extraction efforts. A guide to our extraction (Fig. 1) shows the overarching process we followed to produce this dataset.
Data collection
Published reports of onchocerciasis were identified via searches of PubMed, Web of Science, and Scopus with the following search terms: “Oncho”, “river blindness”, “O. Volvulus”, “robles disease”, “blinding filariasis”, “coast erysipelas”, and “sowda”. The search covered all articles about onchocerciasis published prior to July 7, 2017. The exact strings used to identify articles for the systematic review can be seen in Supplementary Table 1. The search yielded 4,130 results in total, which was reduced to 2,502 after removing duplicates. The 2,502 results were collated into a database, and the first step of the systematic review was a manual title/abstract screening. The purpose of this step was to remove publications that did not report onchocerciasis prevalence among humans, as well as case study articles and those solely reporting diagnostic development. A total of 579 articles underwent full text review. To meet the inclusion criteria, articles had to satisfy the following: 1) detection of onchocerciasis in human subjects; 2) onchocerciasis cases from 1975 or later; 3) original sources only; 4) geographically representative populations only; and 5) no case-control studies. Full text review resulted in the exclusion of 320 sources. In addition to the initial screening, all citations were reviewed so that relevant articles not already included could be added to the database. Through this iterative process, 18 articles that were not originally identified were retroactively added and subsequently marked for extraction. Ultimately, geographic data, as well as relevant epidemiological metadata, were extracted from 259 peer-reviewed sources reporting prevalence of onchocerciasis.
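The screening counts reported above can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable names are illustrative. The 18 articles added from the citation review are left out because the text does not state at which stage they enter the totals.

```python
# Counts reported in the Data collection paragraph.
search_results = 4130      # results returned by the three database searches
after_dedup = 2502         # results remaining after duplicates were removed
full_text_reviewed = 579   # articles that went on to full text review
full_text_excluded = 320   # sources excluded at full text review

# Quantities implied by those counts.
duplicates_removed = search_results - after_dedup
excluded_at_screening = after_dedup - full_text_reviewed
retained_after_full_text = full_text_reviewed - full_text_excluded

print(duplicates_removed)        # 1628
print(excluded_at_screening)     # 1923
print(retained_after_full_text)  # 259
```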
Geo-positioning of data
Location information was manually extracted at the highest resolution possible from each article using either Google Maps or ArcGIS (https://www.esri.com/en-us/home). Two classes of spatial information were evaluated: points and polygons. If the location of transmission was reported to have occurred within a 5 × 5 km area, the geography was defined as a point and represented by a specific latitude and longitude. Points were defined as areas no larger than 5 × 5 km for compatibility with satellite imagery, which is typically resolved at 5 × 5 km for global analyses.
If location of transmission occurred within an area greater than 5 × 5 km (e.g. a large city), or if the location of transmission was less clear but known to have occurred in a general area (e.g. a province), a polygon was assigned to cover the region of the reported occurrence. In instances where the author’s spelling of a location differed from ArcGIS or Google Maps, contextual information was utilized in order to determine the location. Where authors provided maps, these were digitized using ArcGIS.
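The point-versus-polygon rule described above can be written as a small decision function. This is a minimal sketch, not the authors' tooling: the function name, its parameters, and the choice to treat an extent of exactly 5 × 5 km as a point are assumptions made for illustration.

```python
POINT_MAX_SIDE_KM = 5.0  # threshold stated in the text


def classify_geography(width_km, height_km, location_is_precise=True):
    """Return 'point' or 'polygon' for a reported survey location.

    width_km, height_km: approximate extent of the reported area.
    location_is_precise: False when the source only names a general
    area (for example a province), which the text assigns to a polygon.
    """
    fits_in_cell = (width_km <= POINT_MAX_SIDE_KM
                    and height_km <= POINT_MAX_SIDE_KM)
    if location_is_precise and fits_in_cell:
        return "point"
    return "polygon"


print(classify_geography(2, 3))                             # point
print(classify_geography(12, 8))                            # polygon
print(classify_geography(1, 1, location_is_precise=False))  # polygon
```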
Three different types of polygons were used: known administrative boundaries, buffers, and custom polygons. For governorates, districts, or regions, the relevant administrative unit (sourced from the Global Administrative Unit Layers curated by the Food and Agriculture Organization of the UN10) was paired with the record. For cities and regions without corresponding administrative units, a buffer was created to encompass the area. Buffers were created by generating a circle that encompassed the entirety of the region of interest, using Google Maps to evaluate the required radius. Custom polygons were created in ArcGIS for areas with unspecified boundaries, such as groups of communities and the stretches of specific rivers along which surveys had been conducted. For subsequent re-identification, each polygon is assigned a unique code within a defined shapefile representing the spatial extent.
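To illustrate what a circular buffer looks like as a set of coordinates, the sketch below generates the vertices of a circle of a given radius around a centre point. It is not the procedure the authors used; it assumes a spherical Earth, and the function names, the vertex count, and the example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def circular_buffer(lat, lon, radius_km, n_vertices=36):
    """Return a closed ring of (lat, lon) vertices radius_km from the centre."""
    lat1, lon1 = math.radians(lat), math.radians(lon)
    d = radius_km / EARTH_RADIUS_KM  # angular distance in radians
    ring = []
    for i in range(n_vertices):
        bearing = 2 * math.pi * i / n_vertices
        lat2 = math.asin(math.sin(lat1) * math.cos(d)
                         + math.cos(lat1) * math.sin(d) * math.cos(bearing))
        lon2 = lon1 + math.atan2(
            math.sin(bearing) * math.sin(d) * math.cos(lat1),
            math.cos(d) - math.sin(lat1) * math.sin(lat2))
        ring.append((math.degrees(lat2), math.degrees(lon2)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical centre and radius, for illustration only.
ring = circular_buffer(9.5, -1.2, radius_km=10.0)
```

Every vertex in `ring` lies 10 km from the centre under the spherical assumption, which can be confirmed with `haversine_km`.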
Data Records
The database has been made publicly available through the Open Science Framework11 (OSF). Each row represents a unique location, year, diagnostic, age, and sex combination of data. A summary of the number of records by diagnostic and location is presented in Table 1. The database contains the following fields:
1. NID: Unique source identifier number that connects to the survey or paper, which can be found through the Institute for Health Metrics and Evaluation’s (IHME) Global Health Data Exchange (http://ghdx.healthdata.org/).
2. SITE_MEMO: A character string that details the breakdown of the record’s location.
3. LOC_GROUP: A unique identifying number that groups record rows that are georeferenced to the same location.
4. LOC_UNIQ: A unique identifying number that groups record rows within a unique LOC_GROUP number that share the SITE_MEMO. This is done for collapse purposes when a location could not be individually georeferenced.
5. LOC_SPEC: A character string, within each unique LOC_GROUP and LOC_UNIQ combination, that signifies whether the row represents a portion of the location total by a Sex, Age, or Sex Age subset.
6. COUNTRY: A unique character string used by IHME to identify the country and/or subnational location within a country.
7. POINT: Whether the record has been georeferenced to a point or a polygon. 1 = point; 0 = polygon.
8. LAT: If POINT is 1, this is a decimal value representing the latitude of the point; otherwise this value is NA.
9. LONG: If POINT is 1, this is a decimal value representing the longitude of the point; otherwise this value is NA.
10. POLY_REFERENCE: If POINT is 0, this is a character string giving the name of the shapefile that contains the georeferenced shape for this record.
11. POLY_ID_FIELD_NAME: If POINT is 0, this is a character string giving the name of the column in the shapefile that contains the unique identifier connecting the record to the polygon within the shapefile specified in POLY_REFERENCE.
12. POLY_ID: If POINT is 0, this is a unique numeric identifier that, within the shapefile and column specified in POLY_REFERENCE and POLY_ID_FIELD_NAME, references the polygon to which this record has been georeferenced.
13. AGE_START: A number that represents the start of the age range tested for this record. If no age is provided, a value of 0 is assumed.
14. AGE_END: A number that represents the end of the age range tested for this record. If no age is provided, a value of 99 is assumed.
15. SEX: A character string representing which sex was tested in this record. Possible values: Both, Female, Male.
16. YEAR_START: Starting year of the data collected from the study. If no year is provided, the start year is assumed to be 3 years before the publication date.
17. MONTH_START: If provided, a numeric value between 1 and 12 representing the starting month; otherwise it is given the value NA.
18. YEAR_END: Ending year of the data collected from the study. If no year is provided, the end year is assumed to be 1 year before the publication date.
19. MONTH_END: If provided, a numeric value between 1 and 12 representing the ending month; otherwise it is given the value NA.
20. DX_CODE: A numeric value (integer) representing the test for the presence or symptoms of onchocerciasis performed in this record. A table describing specific diagnostic codes has been included in the OSF data upload and in Supplementary Table 2.
21. DX_GROUP: A character string that represents the diagnostic grouping this record belongs to. The table of specific diagnostic names and codes shows how codes were grouped. Possible values: Prevalence - ss (Skin Snip), nod (Nodules), sero (Serology), otherPrev (Other Prevalence); Sequelae - skin (Skin Symptoms), eye (Eye Symptoms).
22. N: A numeric value representing the number of surveyed participants for this record.
23. CASES: A whole numeric value representing the number of persons who tested positive for this record.
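The field definitions above imply several consistency rules that a user may want to check after loading the data. The sketch below is one way to express them for a single record held as a dictionary of strings; the function names and the example values are hypothetical. The checks on age order, year order, and CASES not exceeding N are assumed sanity checks, not rules stated in the field list.

```python
def is_missing(value):
    """True for None, empty strings, and the literal 'NA' used in the field list."""
    return value is None or str(value).strip() in ("", "NA")


def check_record(row):
    """Return a list of problems found in one record (empty list if none)."""
    problems = []

    # POINT decides which location fields must be filled in.
    point = str(row["POINT"]).strip()
    if point == "1":
        if is_missing(row["LAT"]) or is_missing(row["LONG"]):
            problems.append("point record without LAT/LONG")
    elif point == "0":
        for field in ("POLY_REFERENCE", "POLY_ID_FIELD_NAME", "POLY_ID"):
            if is_missing(row[field]):
                problems.append("polygon record without " + field)
    else:
        problems.append("POINT is not 0 or 1")

    if row["SEX"] not in ("Both", "Female", "Male"):
        problems.append("unexpected SEX value")

    for field in ("MONTH_START", "MONTH_END"):
        if not is_missing(row[field]) and not 1 <= int(row[field]) <= 12:
            problems.append(field + " outside 1-12")

    # Assumed sanity checks, not stated in the field list.
    if float(row["AGE_START"]) > float(row["AGE_END"]):
        problems.append("AGE_START after AGE_END")
    if int(row["YEAR_START"]) > int(row["YEAR_END"]):
        problems.append("YEAR_START after YEAR_END")
    if float(row["CASES"]) > float(row["N"]):
        problems.append("CASES exceeds N")

    return problems


# Hypothetical records, for illustration only.
good = {"POINT": "1", "LAT": "9.5", "LONG": "-1.2", "POLY_REFERENCE": "NA",
        "POLY_ID_FIELD_NAME": "NA", "POLY_ID": "NA", "AGE_START": "0",
        "AGE_END": "99", "SEX": "Both", "YEAR_START": "1990",
        "MONTH_START": "NA", "YEAR_END": "1990", "MONTH_END": "NA",
        "N": "200", "CASES": "40"}
bad = dict(good, POINT="0", CASES="250")

print(check_record(good))  # []
print(check_record(bad))
```

The second call reports the three missing polygon fields and the CASES value that exceeds N.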
We have reviewed, standardized, and grouped all the diagnostics we extracted in this process and have detailed the translation between diagnostic code and diagnostic name or diagnostic group in three tables included in the Global Health Data Exchange. In total, we extracted information on 120 diagnostics: 21 prevalence tests, 41 skin symptoms, and 58 eye symptoms. The conversion tables can be found in the OSF data upload and in Supplementary Table 2.
Technical Validation
The data validation was managed by a senior extractor who supervised the data entry performed by six data extraction analysts. Potential duplicate data records were investigated and removed if necessary, and formatting and naming conventions were standardized across the data. A new numbering system for collapse purposes was created programmatically due to miscommunication during the initial extraction. We also standardized and removed duplicate diagnostic test categories in the final dataset.
The georeferencing was verified by plotting all the points and polygons on a map and checking, year by year, for overlap between the locations. If overlap was found, the georeferencing was manually double-checked to ensure accuracy and remove redundancy. Every polygon location was also verified to ensure that it was the smallest, most accurate polygon possible to represent the area surveyed. The finalized map of all points and polygons collected is presented as two maps, one of Africa (Fig. 2a) and one of the Americas (Fig. 2b). The geo-locations are color coded to show which type or types of diagnostics were collected in each area.
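The verification described above was done by inspecting maps. A programmatic analogue for point records could flag pairs of distinct locations from the same year that lie very close together, so that they can be reviewed by hand. The sketch below is an illustration under stated assumptions: it covers points only, groups by YEAR_START, uses a 5 km threshold borrowed from the point definition, and runs on hypothetical records.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_point_pairs(records, max_km=5.0):
    """Return (year, loc_a, loc_b) for distinct locations within max_km in the same year."""
    by_year = {}
    for rec in records:
        by_year.setdefault(rec["YEAR_START"], []).append(rec)

    flagged = []
    for year, recs in by_year.items():
        for a, b in combinations(recs, 2):
            if a["LOC_GROUP"] == b["LOC_GROUP"]:
                continue  # same georeferenced location, nothing to review
            if haversine_km(a["LAT"], a["LONG"], b["LAT"], b["LONG"]) <= max_km:
                flagged.append((year, a["LOC_GROUP"], b["LOC_GROUP"]))
    return flagged


# Hypothetical point records, for illustration only.
records = [
    {"LOC_GROUP": 1, "YEAR_START": 1990, "LAT": 9.50, "LONG": -1.20},
    {"LOC_GROUP": 2, "YEAR_START": 1990, "LAT": 9.51, "LONG": -1.20},   # about 1 km north
    {"LOC_GROUP": 3, "YEAR_START": 1990, "LAT": 10.50, "LONG": -1.20},  # about 111 km north
    {"LOC_GROUP": 4, "YEAR_START": 1995, "LAT": 9.50, "LONG": -1.20},   # different year
]
flagged = nearby_point_pairs(records)
print(flagged)  # [(1990, 1, 2)]
```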
The final georeferenced dataset contains 14,043 records. The 112 records that we were unable to georeference to a subnational location have been kept and georeferenced to the country from which they were collected. A summary of the countries these records belong to can be seen in Table 2.
Usage Notes
We provide a comprehensive dataset of onchocerciasis infection prevalence and related skin and eye disease. As national programs consider expanding mass drug administration of ivermectin to achieve the elimination of onchocerciasis infection, national onchocerciasis elimination committees12 are tasked with compiling both current and historical data. The publication of this systematic review will enable stakeholders to review the published literature for locations, years or indicators of interest. Users should note that age-specific data are stored as reported in the original source, not aggregated by location or year.
For grouping purposes, we have included three variables: LOC_GROUP, LOC_UNIQ, and LOC_SPEC. To use them, first select the records for the diagnostic group of interest, and then collapse by LOC_SPEC into LOC_UNIQ and by LOC_UNIQ into LOC_GROUP.
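One possible reading of this collapse, sketched on hypothetical rows, is to filter on DX_GROUP and then sum N and CASES at successively coarser levels. The helper name and the example values are illustrative, and a real analysis would likely also keep year and DX_CODE in the grouping keys.

```python
from collections import defaultdict


def collapse(records, keys):
    """Sum N and CASES over all records that share the same values of keys."""
    totals = defaultdict(lambda: {"N": 0, "CASES": 0})
    for rec in records:
        group = tuple(rec[k] for k in keys)
        totals[group]["N"] += rec["N"]
        totals[group]["CASES"] += rec["CASES"]
    return [dict(zip(keys, group), **sums) for group, sums in totals.items()]


# Hypothetical rows, for illustration only.
rows = [
    {"DX_GROUP": "ss", "LOC_GROUP": 1, "LOC_UNIQ": 1, "LOC_SPEC": "Sex", "N": 40, "CASES": 10},
    {"DX_GROUP": "ss", "LOC_GROUP": 1, "LOC_UNIQ": 1, "LOC_SPEC": "Sex", "N": 60, "CASES": 20},
    {"DX_GROUP": "ss", "LOC_GROUP": 1, "LOC_UNIQ": 2, "LOC_SPEC": "Sex", "N": 50, "CASES": 5},
    {"DX_GROUP": "nod", "LOC_GROUP": 1, "LOC_UNIQ": 1, "LOC_SPEC": "Sex", "N": 100, "CASES": 35},
]

# Step 1: keep only the diagnostic group of interest (skin snip here).
skin_snip = [r for r in rows if r["DX_GROUP"] == "ss"]

# Step 2: collapse subsets within each LOC_SPEC, then LOC_UNIQ, then LOC_GROUP.
by_spec = collapse(skin_snip, ["LOC_GROUP", "LOC_UNIQ", "LOC_SPEC"])
by_uniq = collapse(by_spec, ["LOC_GROUP", "LOC_UNIQ"])
by_group = collapse(by_uniq, ["LOC_GROUP"])

print(by_group)  # [{'LOC_GROUP': 1, 'N': 150, 'CASES': 35}]
```

If a location carries more than one LOC_SPEC breakdown of the same total, summing across them would count participants twice, so it seems safer to select a single breakdown before the second step.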
This systematic review was conducted for the purposes of modeling the burden of onchocerciasis for the Global Burden of Disease Study (GBD). Since the GBD is implemented on an annual basis, the dataset will be updated on a routine basis to account for newly published data.
References
1. World Health Organization. Progress report on the elimination of human onchocerciasis, 2016–2017. WHO, http://www.who.int/onchocerciasis/resources/who_wer9245/en/ (2017).
2. Sauerbrey, M., Rakers, L. J. & Richards, R. F. Progress toward elimination of onchocerciasis in the Americas. International Health 10, 71–78 (2018).
3. Lakwo, T. et al. Cross-border collaboration in onchocerciasis elimination in Uganda: progress, challenges and opportunities from 2008 to 2013. Global Health. 14, 16 (2018).
4. Lakwo, T. L. et al. The disappearance of onchocerciasis from the Itwara focus, western Uganda after elimination of the vector simulium neavei and 19 years of annual ivermectin treatments. Acta Trop. 126, 218–221 (2013).
5. Zarroug, I. M. A. et al. The first confirmed elimination of an onchocerciasis focus in Africa: Abu Hamed, Sudan. Am. J. Trop. Med. Hyg. 95, 1037–1040 (2016).
6. Coffeng, L. E. et al. Elimination of african onchocerciasis: modeling the impact of increasing the frequency of ivermectin mass treatment. PLoS One. 9, 12 (2014).
7. Lawrence, J. & Sodahlon, Y. K. Onchocerciasis: the beginning of the end. Int Health 10, 1–2 (2018).
8. Lawrence, J., Sodahlon, Y. K., Ogoussan, K. T. & Hopkins, A. D. Growth, challenges, and solutions over 25 years of mectizan and the impact on onchocerciasis control. PLoS Negl Trop Dis 9, 5 (2015).
9. Rebollo, M. P. et al. Onchocerciasis: shifting the target from control to elimination requires a new first-step-elimination mapping. Int Health. 10, 14–19 (2018).
10. Food and Agricultural Organization of the United Nations. The global administrative unit layers (GAUL): technical aspects. FAO, http://www.fao.org/tempref/AG/Reserved/PPLPF/ftpOUT/GLiPHA/Gaulmaps/gaul_2008/documentation/GAUL%20Doc01%20Ver16.pdf (2008).
11. Cromwell, E. Onchocerciasis systematic review. Open Science Framework, https://doi.org/10.17605/OSF.IO/W69RF (2018).
12. Griswold, E. et al. The role of national committees in eliminating onchocerciasis. Int Health. 10, 60–70 (2018).
Acknowledgements
E.H., J.H., I.D.L., K.D., S.S., D.M.P., S.I.H. and E.A.C. are supported by a grant from the Bill & Melinda Gates Foundation OPP#1132415.
Author information
Contributions
E.A.C. conceived of the analysis. E.H., J.H. and I.D.L. carried out extraction, data cleaning, and mapping. S.S. provided managerial support. All authors participated in interpreting and summarizing the results. E.H. and J.H. wrote the first draft of the manuscript. All other authors reviewed the manuscript. All authors read and approved the final version submitted. E.A.C. has full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Ethics declarations
Competing Interests
S.I.H. and D.M.P. are members of the editorial board of Scientific Data.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.
About this article
Cite this article
Hill, E., Hall, J., Letourneau, I.D. et al. A database of geopositioned onchocerciasis prevalence data. Sci Data 6, 67 (2019). https://doi.org/10.1038/s41597-019-0079-5
DOI: https://doi.org/10.1038/s41597-019-0079-5