Most large-scale human genome sequencing projects to date have sampled either large metropolitan populations or only a few individuals from more diverse groups. Now, a study in Science demonstrates that anthropologically informed genome sequencing can provide a fuller understanding of human genetic variation than previous approaches.

In the study, 929 whole genomes, representing 54 populations with different geographical locations, languages and cultures, were sequenced at an average coverage of 35×. Linked-read sequencing was used to phase 26 of these genomes (representing 13 populations). Of the 67.3 million SNPs, 8.8 million insertions and deletions (indels) and 40,763 copy number variants identified, millions were not detected in other large-scale projects, and hundreds of thousands of these novel variants are common in at least 1 of the 54 populations. Additionally, the discovered SNPs provide a more nuanced view of shared ancestry, particularly among African populations, than do the variants typically included on common genotyping arrays.

Populations from all geographical regions were found to have some private common variation, but such variants reach high frequencies only in African, American and Oceanian populations. In most populations, this variation arose largely by novel mutation. However, a substantial proportion of private variants in Oceanian populations is derived from admixture with Denisovans. Patterns of genetic variation indicate that population sizes expanded for most groups over the past 10,000 years, except for hunter–gatherers in Africa.
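The idea of private common variation can be made concrete with a small sketch. The example below is illustrative only: the variant names and allele frequencies are made up, the 5% cutoff for "common" is an assumption rather than the threshold used in the study, and the function name `private_common_variants` is hypothetical. It treats a variant as private when it has a non-zero frequency in exactly one region, and as common when that frequency meets the cutoff.

```python
# Toy allele frequencies per region. All values are invented for illustration
# and do not come from the study.
freqs = {
    "var1": {"Africa": 0.22, "America": 0.00, "Oceania": 0.00},
    "var2": {"Africa": 0.10, "America": 0.03, "Oceania": 0.00},
    "var3": {"Africa": 0.00, "America": 0.00, "Oceania": 0.35},
    "var4": {"Africa": 0.00, "America": 0.01, "Oceania": 0.00},
}

# Assumed cutoff for calling a variant "common"; the study may define it differently.
COMMON_THRESHOLD = 0.05


def private_common_variants(freqs, threshold=COMMON_THRESHOLD):
    """Map each private common variant to the single region that carries it."""
    result = {}
    for variant, by_region in freqs.items():
        # Regions in which the variant is observed at all.
        present = [region for region, f in by_region.items() if f > 0]
        # Private: seen in exactly one region. Common: frequency at or above the cutoff.
        if len(present) == 1 and by_region[present[0]] >= threshold:
            result[variant] = present[0]
    return result


print(private_common_variants(freqs))
```

In this toy table, `var2` is shared between two regions and `var4` is private but rare, so only `var1` and `var3` qualify.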

Analyses of haplotype variation indicate that present-day population structure formed gradually over the past ~250,000 years, with evidence for more recent gene flow in most populations but also more ancient interactions in a few. Patterns of diversity in archaic haplotypes suggest that a single episode of admixture occurred between Neanderthals and the ancestors of present-day humans. By contrast, Denisovans are likely to have admixed multiple times with geographically distinct ancestral human populations.

The genome sequences from this study are freely available and provide a valuable resource for further examining human genetic variation from a range of perspectives, from anthropology through to medicine. Understanding whether population-specific variation has medical relevance will be particularly important.