High-depth African genomes inform human migration and health

An Author Correction to this article was published on 12 April 2021

This article has been updated


The African continent is regarded as the cradle of modern humans, and African genomes contain more genetic variation than those from any other continent, yet only a fraction of the genetic diversity among African individuals has been surveyed1. Here we performed whole-genome sequencing analyses of 426 individuals, comprising 50 ethnolinguistic groups and including previously unsampled populations, to explore the breadth of genomic diversity across Africa. We uncovered more than 3 million previously undescribed variants, most of which were found among individuals from newly sampled ethnolinguistic groups, as well as 62 previously unreported loci that are under strong selection, which were predominantly found in genes that are involved in viral immunity, DNA repair and metabolism. We observed complex patterns of ancestral admixture and putative-damaging and novel variation, both within and between populations, alongside evidence that Zambia was a likely intermediate site along the routes of expansion of Bantu-speaking populations. Pathogenic variants in genes that are currently characterized as medically relevant were uncommon, but in other genes, variants denoted as ‘likely pathogenic’ in the ClinVar database were commonly observed. Collectively, these findings refine our current understanding of continental migration, identify gene flow and the response to human disease as strong drivers of genome-level population variation, and underscore the scientific imperative for a broader characterization of the genomic diversity of African individuals to understand human ancestry and improve health.


Fig. 1: H3Africa WGS data.
Fig. 2: Population admixture and genetic ancestry among African populations.
Fig. 3: Novel variation in the H3Africa dataset.
Fig. 4: Selection and medically relevant variants in African populations.

Data availability

WGS data used in this paper are available through the European Genome-phenome Archive (EGA) under study accession number EGAS00001002976. The data include genomic (BAMs and VCFs) and minimal phenotypic data from appropriately consented individuals. In compliance with current international standards to protect participant confidentiality, the H3Africa-generated data are available to bona fide researchers within the wider scientific community through a controlled access process. Some of the DNA samples are archived in H3Africa biorepositories as part of the H3Africa Consortium agreement. To gain access to data in the EGA or biospecimens in the biorepositories, requests must be submitted through the H3Africa Data and Biospecimen Catalogue. Requests are subject to approval by an independent H3Africa Data and Biospecimen Access Committee (DBAC). Novel SNVs identified and reported here will be deposited into dbSNP. The H3Africa Initiative is committed to providing research data generated by the H3Africa research projects to the entire research community. H3Africa research seeks to promote fair collaboration between scientists in Africa and those from elsewhere. The H3Africa Consortium Data Sharing, Access and Release Policy outlines a policy framework that places a firm focus on African leadership and capacity building as guiding principles for African genomics research. The policy and related documents are available from the H3Africa Consortium.

Code availability

Code for the implementation of PROCRUSTES is available under the GNU General Public License v.3.0.

References

1. Nielsen, R. et al. Tracing the peopling of the world through genomics. Nature 541, 302–310 (2017).
2. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
3. Tishkoff, S. A. et al. The genetic structure and history of Africans and African Americans. Science 324, 1035–1044 (2009).
4. Gurdasani, D. et al. The African Genome Variation Project shapes medical genetics in Africa. Nature 517, 327–332 (2015).
5. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
6. Posey, J. E. et al. Insights into genetics, human biology and disease gleaned from family based genomic studies. Genet. Med. 21, 798–812 (2019).
7. Landry, L. G., Ali, N., Williams, D. R., Rehm, H. L. & Bonham, V. L. Lack of diversity in genomic databases is a barrier to translating precision medicine research into practice. Health Aff. 37, 780–785 (2018).
8. H3Africa Consortium. Enabling the genomic revolution in Africa. Science 344, 1345–1346 (2014).
9. Patin, E. et al. Dispersals and genetic adaptation of Bantu-speaking populations in Africa and North America. Science 356, 543–546 (2017).
10. Hanchard, N. et al. Classical sickle beta-globin haplotypes exhibit a high degree of long-range haplotype similarity in African and Afro-Caribbean populations. BMC Genet. 8, 52 (2007).
11. Ranciaro, A. et al. Genetic origins of lactase persistence and the spread of pastoralism in Africa. Am. J. Hum. Genet. 94, 496–510 (2014).
12. Genovese, G. et al. Association of trypanolytic ApoL1 variants with kidney disease in African Americans. Science 329, 841–845 (2010).
13. Sabeti, P. C. et al. Detecting recent positive selection in the human genome from haplotype structure. Nature 419, 832–837 (2002).
14. Schlebusch, C. M. et al. Genomic variation in seven Khoe-San groups reveals adaptation and complex African history. Science 338, 374–379 (2012).
15. Scheinfeldt, L. B. et al. Genomic evidence for shared common ancestry of East African hunting-gathering populations and insights into local adaptation. Proc. Natl Acad. Sci. USA 116, 4166–4175 (2019).
16. Skoglund, P. et al. Reconstructing prehistoric African population structure. Cell 171, 59–71 (2017).
17. Choudhury, A. et al. Whole-genome sequencing for an enhanced understanding of genetic variation among South Africans. Nat. Commun. 8, 2062 (2017).
18. Ilboudo, H. et al. Introducing the TrypanoGEN biobank: a valuable resource for the elimination of human African trypanosomiasis. PLoS Negl. Trop. Dis. 11, e0005438 (2017).
19. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
20. Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).
21. Semo, A. et al. Along the Indian Ocean coast: genomic variation in Mozambique provides new insights into the Bantu expansion. Mol. Biol. Evol. 37, 406–416 (2020).
22. Loh, P.-R. et al. Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193, 1233–1254 (2013).
23. Patin, E. et al. The impact of agricultural emergence on the genetic history of African rainforest hunter-gatherers and agriculturalists. Nat. Commun. 5, 3163 (2014).
24. Shriner, D. & Rotimi, C. N. Genetic history of Chad. Am. J. Phys. Anthropol. 167, 804–812 (2018).
25. Campbell, I. M. et al. Multiallelic positions in the human genome: challenges for genetic analyses. Hum. Mutat. 37, 231–234 (2016).
26. Campbell, M. C. & Tishkoff, S. A. African genetic diversity: implications for human demographic history, modern human origins, and complex disease mapping. Annu. Rev. Genomics Hum. Genet. 9, 403–433 (2008).
27. Pavlidis, P., Živkovic, D., Stamatakis, A. & Alachiotis, N. SweeD: likelihood-based detection of selective sweeps in thousands of genomes. Mol. Biol. Evol. 30, 2224–2234 (2013).
28. Szpiech, Z. A. & Hernandez, R. D. selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol. Biol. Evol. 31, 2824–2827 (2014).
29. Vitti, J. J., Grossman, S. R. & Sabeti, P. C. Detecting natural selection in genomic data. Annu. Rev. Genet. 47, 97–120 (2013).
30. Yi, X. et al. Sequencing of 50 human exomes reveals adaptation to high altitude. Science 329, 75–78 (2010).
31. Retshabile, G. et al. Whole-exome sequencing reveals uncaptured variation and distinct ancestry in the southern African population of Botswana. Am. J. Hum. Genet. 102, 731–743 (2018).
32. Lim, E. T. et al. Distribution and medical impact of loss-of-function variants in the Finnish founder population. PLoS Genet. 10, e1004494 (2014).
33. World Health Organization. WHO Influenza (Seasonal): Fact Sheet (2016).
34. Kalia, S. S. et al. Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2016 update (ACMG SF v2.0): a policy statement of the American College of Medical Genetics and Genomics. Genet. Med. 19, 249–255 (2017).
35. Manjurano, A. et al. African glucose-6-phosphate dehydrogenase alleles associated with protection from severe malaria in heterozygous females in Tanzania. PLoS Genet. 11, e1004960 (2015).
36. Howes, R. E., Battle, K. E., Satyagraha, A. W., Baird, J. K. & Hay, S. I. G6PD deficiency: global distribution, genetic variants and primaquine therapy. Adv. Parasitol. 81, 133–201 (2013).
37. Kimuda, M. P. et al. No evidence for association between APOL1 kidney disease risk alleles and human African trypanosomiasis in two Ugandan populations. PLoS Negl. Trop. Dis. 12, e0006300 (2018).
38. Rotimi, C. N. & Jorde, L. B. Ancestry and disease in the age of genomic medicine. N. Engl. J. Med. 363, 1551–1558 (2010).
39. Phillipson, D. W. Iron Age history and archaeology in Zambia. J. Afr. Hist. 15, 1–25 (1974).
40. Schlebusch, C. M. & Jakobsson, M. Tales of human migration, admixture, and selection in Africa. Annu. Rev. Genomics Hum. Genet. 19, 405–428 (2018).
41. Mulindwa, J. et al. High levels of genetic diversity within Nilo-Saharan populations: implications for human adaptation. Am. J. Hum. Genet. 107, 473–486 (2020).
42. Shiroya, O. J. E. The Lugbara states: politics, economics and warfare in the eighteenth and nineteenth centuries. TransAfrican J. Hist. 10, 125–183 (1981).
43. R Core Team. R: A Language and Environment for Statistical Computing. (R Foundation for Statistical Computing, 2017).
44. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
45. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
46. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
47. O’Connell, J. et al. A general approach for haplotype phasing across the full spectrum of relatedness. PLoS Genet. 10, e1004234 (2014).
48. Loh, P. R., Palamara, P. F. & Price, A. L. Fast and accurate long-range phasing in a UK Biobank cohort. Nat. Genet. 48, 811–816 (2016).
49. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
50. Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).
51. Buchmann, R. & Hazelhurst, S. Genesis PCA and Admixture Plot Viewer. Version 0.2.6 (2014).
52. Jakobsson, M. & Rosenberg, N. A. CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23, 1801–1806 (2007).
53. Wang, C. et al. Comparing spatial maps of human population-genetic variation using Procrustes analysis. Stat. Appl. Genet. Mol. Biol. 9, 13 (2010).
54. Pickrell, J. K. & Pritchard, J. K. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8, e1002967 (2012).
55. Pickrell, J. K. et al. Ancient west Eurasian ancestry in southern and eastern Africa. Proc. Natl Acad. Sci. USA 111, 2632–2637 (2014).
56. Browning, B. L. & Browning, S. R. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics 194, 459–471 (2013).
57. Atzmon, G. et al. Abraham’s children in the genome era: major Jewish diaspora populations comprise distinct genetic clusters with shared Middle Eastern ancestry. Am. J. Hum. Genet. 86, 850–859 (2010).
58. Maples, B. K., Gravel, S., Kenny, E. E. & Bustamante, C. D. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am. J. Hum. Genet. 93, 278–288 (2013).
59. Haber, M. et al. Chad genetic diversity reveals an African history marked by multiple Holocene Eurasian migrations. Am. J. Hum. Genet. 99, 1316–1324 (2016).
60. Weissensteiner, H. et al. HaploGrep 2: mitochondrial haplogroup classification in the era of high-throughput sequencing. Nucleic Acids Res. 44, W58–W63 (2016).
61. Van Geystelen, A., Decorte, R. & Larmuseau, M. H. D. AMY-tree: an algorithm to use whole genome SNP calling for Y chromosomal phylogenetic applications. BMC Genomics 14, 101 (2013).
62. Pemberton, T. J. et al. Genomic patterns of homozygosity in worldwide human populations. Am. J. Hum. Genet. 91, 275–292 (2012).
63. Fumagalli, M. Assessing the effect of sequencing depth and sample size in population genetics inferences. PLoS ONE 8, e79667 (2013).
64. Zerbino, D. R. et al. Ensembl 2018. Nucleic Acids Res. 46, D754–D761 (2018).
65. Amberger, J. S., Bocchini, C. A., Scott, A. F. & Hamosh, A. OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Res. 47, D1038–D1043 (2019).
66. Stelzer, G. et al. The GeneCards suite: from gene data mining to disease genome sequence analyses. Curr. Protoc. Bioinformatics 54, 1.30.31–1.30.33 (2016).
67. Pybus, M. et al. 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans. Nucleic Acids Res. 42, D903–D909 (2014).
68. Sabeti, P. C. et al. Genome-wide detection and characterization of positive selection in human populations. Nature 449, 913–918 (2007).
69. Pickrell, J. K. et al. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19, 826–837 (2009).
70. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
71. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
72. Cingolani, P. et al. Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift. Front. Genet. 3, 35 (2012).
73. Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
74. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
75. MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 45, D896–D901 (2017).
76. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
77. Mazandu, G. K., Chimusa, E. R., Mbiyavanga, M. & Mulder, N. J. A-DaGO-Fun: an adaptable Gene Ontology semantic similarity-based functional analysis tool. Bioinformatics 32, 477–479 (2016).
78. Bindea, G. et al. ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics 25, 1091–1093 (2009).
79. Balasubramanian, S. et al. Using ALoFT to determine the impact of putative loss-of-function variants in protein-coding genes. Nat. Commun. 8, 382 (2017).
80. Smedley, D. et al. The BioMart community portal: an innovative alternative to large, centralized data repositories. Nucleic Acids Res. 43, W589–W598 (2015).
81. Piñero, J. et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Res. 45, D833–D839 (2017).
82. Babbi, G. et al. eDGAR: a database of disease–gene associations with annotated relationships among genes. BMC Genomics 18, 554 (2017).
83. Davis, A. P. et al. The Comparative Toxicogenomics Database: update 2019. Nucleic Acids Res. 47, D948–D954 (2019).
84. ACMG Board of Directors. ACMG policy statement: updated recommendations regarding analysis and reporting of secondary findings in clinical genome-scale sequencing. Genet. Med. 17, 68–69 (2015).
85. Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).



Acknowledgements

We thank the members of the wider H3Africa Consortium for their support and input, particularly J. Troyer and A. Duncanson; S. Tishkoff, J. Lupski, J. Belmont and C. Tyler-Smith for comments and feedback on the manuscript; K. Garson, A. Gillum and K. Schulze for their help with figure visualizations and for giving permission for the use of these figures; M. Cherif Rahimy for their assistance with recruitment in Benin; and L. Sergeevna Mainzer, G. Rendon and V. Jongeneel from the HPCBio team at the University of Illinois Urbana-Champaign for the initial processing and variant calling of the high-depth H3A-Baylor dataset using the Blue Waters supercomputing centre. WGS in H3Africa cohorts was supported by a grant from the National Human Genome Research Institute, National Institutes of Health (NIH/NHGRI) U54HG003273. The African Collaborative Center for Microbiome and Genomics Research (ACCME) is funded by NIH/NHGRI grant U54HG006947. The AWI-Gen Collaborative Centre is funded by NIH grant U54HG006938. The Exploring Perspectives on Genomics and Sickle Cell Public Health Interventions study was funded by NHGRI/NIH grant U01HG007459. The Clinical and Genetic Studies of Hereditary Neurological Disorders in Mali study was funded by NHGRI/NIH grant U01HG007044. The Collaborative African Genomics Network (CAfGEN) is funded by the National Institute of Allergy and Infectious Diseases (NIAID) of NIH and the NHGRI of the NIH (U54AI110398). ‘TrypanoGEN: an integrated approach to the identification of genetic determinants of susceptibility to trypanosomiasis’ was funded by the Wellcome Trust (099310/Z/12/Z). L.R.B. was supported by the CERCA Programme/Generalitat de Catalunya and by the Spanish Ministry of Economy and Competitiveness, through the ‘Severo Ochoa Programme for Centres of Excellence in R&D’ 2016–2019 (SEV-2015-0533). N.M. (principal investigator), S.A., G.B., G.W., J.K., Y.J.F., T.O., O.F., E.A., S.H., G. Mazandu, M. Mbiyavanga, A.B., S.K.K., E.R.C. and A. Moussa are funded by the NIH H3ABioNet grant under award number U24HG006941. The content of this paper is solely the responsibility of the authors and does not necessarily represent the official views of the African Academy of Sciences, the National Institutes of Health or the Wellcome Trust.

Author information





Contributions

Study design: M.R., C.R., Z.L. and A.A.A. Manuscript oversight and editing: N.A.H. (team leader), Z.L., S.K.K. and S.N.A. Data processing and quality control: N.M. (team leader), G.B., G. Mazandu, M. Mbiyavanga and E.R.C. Population genetics: S.A. (team leader), A.C., D. Sengupta, D. Shriner, S.H., E.A., T.O., E.D. and O.F. Signatures of selection: A.C. (team leader), D. Sengupta and T.B. Novel and rare variation: L.R.B. (team leader), N.A.H., T.B., O.A.N., G.B. and M. Mbiyavanga. Medically relevant variation: A.A.A. (team leader), Y.J.F., A.B., N.M., J.K., G.W., N.A.H., A.W.G. and T.B. Data generation: R.A.G., G. Metcalf and D.M. Data providers: The African Collaborative Center for Microbiome and Genomics Research (ACCME): C.A. (principal investigator), S.N.A. and A.A.A. TrypanoGEN: E.M. (principal investigator), D.M.-N., M.K., G.S., B.B., M.S., C.H.-F., H.N. and A. Macleod. AWI-Gen: M.R. (principal investigator), H.S., R.P.B., G. Agongo and A. Oduro. H3Africa Kidney Disease Research Network: A. Ojo, D.A. (principal investigators), B.O.S. and D.B. Awadalla lab: P.A. (principal investigator), E.G. and V.B. Exploring Perspectives on Genomics and Sickle Cell Public Health Interventions: A.W. (principal investigator). Clinical and Genetic Studies of Hereditary Neurological Disorders in Mali: G.L. (principal investigator), L.C., S.D. and O.S. Collaborative African Genomics Network (CAfGEN): G. Anabwani, M. Matshaba (principal investigators), S.W.M., A.K., M.J., G. Mardon (co-principal investigators), B.M., G.R., N.A.H., L.W., S.M. and S.K. H3ABioNet: A. Moussa and A.B.

Corresponding authors

Correspondence to Adebowale A. Adeyemo or Zané Lombard or Neil A. Hanchard.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Laura Gauthier, Joanna Mountain and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 ADMIXTURE clustering analysis of H3A-WGS samples.

Existing African datasets from AGVP4, 1000 Genomes project2, SAHGP17 and previously published studies9,14 and a representative European population (CEU) from the 1000 Genomes Project are included as reference panels. K values from 2 to 10 are shown. See Supplementary Table 22 for definitions of abbreviations.

Extended Data Fig. 2 Characteristics of known and regional selected loci.

a, CLR score distributions in known selected genes (significant population-specific outlier scores (that is, with P < 0.01) for the window overlapping the gene are indicated by an asterisk). b, Summary of PBS comparisons. Genes with longer branch lengths in WGR compared to BOT and CAM are circled in blue; longer branch lengths in BOT and CAM in comparison to the other two populations are encircled in brown and dark green, respectively. c, Overlap between the proportion of KS ancestry (%) and CLR score across chromosome 6 in BOT.
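For readers unfamiliar with the PBS comparisons summarized in panel b, the following is a minimal sketch of the population branch statistic as it is commonly defined (following Yi et al., ref. 30): pairwise FST values among three populations are log-transformed into branch lengths, and the statistic gives the length of the branch leading to the target population. The function names and the FST values are hypothetical and chosen only for illustration; the exact pipeline used for this figure is not described in the caption.

```python
import math


def branch_length(fst: float) -> float:
    """Log-transform a pairwise FST into a branch length, T = -ln(1 - FST)."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("FST must lie in the interval [0, 1)")
    return -math.log(1.0 - fst)


def pbs(fst_target_a: float, fst_target_b: float, fst_a_b: float) -> float:
    """Population branch statistic for the target in a three-population tree.

    PBS = (T_target,A + T_target,B - T_A,B) / 2
    """
    t_ta = branch_length(fst_target_a)
    t_tb = branch_length(fst_target_b)
    t_ab = branch_length(fst_a_b)
    return (t_ta + t_tb - t_ab) / 2.0


# Hypothetical per-gene FST values (not taken from the paper): the target
# population is strongly differentiated from both comparison populations,
# which are themselves similar, so the target branch is long.
example = pbs(fst_target_a=0.30, fst_target_b=0.25, fst_a_b=0.05)
print(f"PBS for the target population: {example:.3f}")
```

Under this definition, a gene with a larger PBS in one population than in the other two would correspond to the "longer branch length" highlighted in the figure.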

Extended Data Fig. 3 Highly divergent and putative LOF variants.

a, EFO traits from the GWAS catalogue reflected by highly divergent SNVs within 50 kb of GWAS hits. From left to right, ribbons illustrate the relative representation of variants across pairwise population comparisons, GWAS ancestry, EFO top label, EFO trait or disease label, and disease or traits mapped to the EFO label. b, Distribution and sharing of common (MAF > 5%) putative LOF variants between two or more populations (coloured bars) and between all populations surveyed (red bars). c, Specific disease classes to which 5% or more genes with putative LOF variants shared between all populations were mapped. d, Correlation (Pearson) between WHO mortality rates for influenza and ratio of putative LOF variants in direct (n = 181) compared with indirect (n = 1,842) influenza-associated genes (red solid line, all populations; red dotted line, west African populations). The blue dotted line represents the mean correlation for the same correlations generated using 1,000 permutations of random genes; the s.e.m. for all populations is shown in grey. e, Correlation statistics (adjusted R²) for the putative LOF ratio for genes related to hepatitis C (HCV, n = 190 direct genes, n = 1,837 indirect genes), HIV (n = 724 direct genes, n = 1,351 indirect genes), influenza in west African countries (CAM, MAL, FNB and BRN), and malaria (n = 484 direct genes, n = 1,554 indirect genes) are shown as red dots against the box plot distributions of correlation statistics (adjusted R²) generated using 1,000 permutations of random genes (Supplementary Table 18). Box plots show the median value (centre line), whiskers indicate the limits of the highest (fourth) and lowest (first) quartiles of the data; distribution outliers are shown as dots.
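The permutation comparison in panels d and e can be illustrated with a short sketch. It assumes a simplified procedure that is not spelled out in the caption: genes are randomly reassigned to a 'direct' set of fixed size, the per-population ratio of putative LOF counts (direct over remaining genes) is recomputed, and its Pearson correlation with mortality is recorded to build a null distribution. All names (`lof_counts`, `permutation_null`, `empirical_p`) and the data layout are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant series
    return sxy / (sxx * syy) ** 0.5


def lof_ratio(lof_counts, gene_set, other_set, pop_index):
    """Ratio of LOF counts in gene_set to other_set for one population."""
    num = sum(lof_counts[g][pop_index] for g in gene_set)
    den = sum(lof_counts[g][pop_index] for g in other_set)
    return num / den if den else 0.0


def permutation_null(lof_counts, mortality, n_direct, n_permutations=1000, seed=0):
    """Null correlations obtained from randomly drawn 'direct' gene sets.

    lof_counts maps gene -> list of LOF counts, one entry per population,
    in the same population order as the mortality list.
    """
    rng = random.Random(seed)
    genes = sorted(lof_counts)
    null = []
    for _ in range(n_permutations):
        shuffled = rng.sample(genes, len(genes))
        direct, indirect = shuffled[:n_direct], shuffled[n_direct:]
        ratios = [
            lof_ratio(lof_counts, direct, indirect, i)
            for i in range(len(mortality))
        ]
        null.append(pearson(ratios, mortality))
    return null


def empirical_p(observed, null):
    """Two-sided empirical P value with the usual +1 correction."""
    hits = sum(abs(r) >= abs(observed) for r in null)
    return (1 + hits) / (1 + len(null))
```

In this sketch, the observed correlation for the real direct gene set would be compared against `null`, whose mean plays the role of the blue dotted line in panel d.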

Extended Data Fig. 4 Distribution of G6PD variants and ClinVar pathogenic variants across H3Africa populations.

a, Frequency distribution of pathogenic and likely pathogenic variants (n = 287) in H3Africa HC-WGS populations. Disease genes with variants that had an allele frequency > 5% across multiple populations (shown in Fig. 4c) are highlighted. Box plots show the median value (centre line), whiskers indicate the limits of the highest (fourth) and lowest (first) quartiles of the data; distribution outliers are shown as dots. b, Relative frequencies of 11 G6PD deficiency-associated alleles within each population separated by sex. G6PD A− 202A and 376G refer to the A-deficiency associated with either rs1050828 (c.202G>A) or rs1050829 (c.376A>G) (MIM 305900).
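Because G6PD is X-linked, allele frequencies separated by sex (as in panel b) have different denominators: one chromosome per male and two per female. The sketch below shows that bookkeeping under the assumption that genotypes are encoded as the number of copies of the deficiency-associated allele carried by each individual; the function name and the example genotypes are hypothetical and not taken from the dataset.

```python
def x_linked_allele_frequency(alt_allele_counts, sex):
    """Frequency of an X-linked allele within one sex.

    alt_allele_counts: copies of the allele per individual
                       (0 or 1 for males, 0 to 2 for females).
    sex: "M" or "F".
    """
    copies_per_person = {"M": 1, "F": 2}[sex]
    for count in alt_allele_counts:
        if not 0 <= count <= copies_per_person:
            raise ValueError(f"invalid allele count {count} for sex {sex}")
    total_chromosomes = copies_per_person * len(alt_allele_counts)
    if total_chromosomes == 0:
        return 0.0
    return sum(alt_allele_counts) / total_chromosomes


# Hypothetical genotypes for a single allele in a single population.
male_freq = x_linked_allele_frequency([1, 0, 0, 1], "M")    # 2 of 4 chromosomes
female_freq = x_linked_allele_frequency([2, 1, 0, 0], "F")  # 3 of 8 chromosomes
print(male_freq, female_freq)
```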

Supplementary information

Supplementary Information

This file contains Supplementary Notes 1–5, Supplementary Figures 1–20, Supplementary Methods Figures 1–3 and Supplementary References.

Reporting Summary

Supplementary Tables

This file contains Supplementary Methods Tables 1–2 and 23 Supplementary Tables (referred to in the main Supplementary Information file).


Cite this article

Choudhury, A., Aron, S., Botigué, L.R. et al. High-depth African genomes inform human migration and health. Nature 586, 741–748 (2020).
