As modern and ancient DNA sequence data from diverse human populations accumulate, evidence is increasing in support of the existence of beneficial variants acquired from archaic humans that may have accelerated adaptation and improved survival in new environments — a process known as adaptive introgression. Within the past few years, a series of studies have identified genomic regions that show strong evidence for archaic adaptive introgression. Here, we provide an overview of the statistical methods developed to identify archaic introgressed fragments in the genome sequences of modern humans and to determine whether positive selection has acted on these fragments. We review recently reported examples of adaptive introgression, grouped by selection pressure, and consider the level of supporting evidence for each. Finally, we discuss challenges and recommendations for inferring selection on introgressed regions.
Recent analyses of genome sequence data from archaic humans have detected evidence of gene flow from archaic humans into modern humans. Several studies have identified DNA segments within the genomes of modern humans that show strong signatures of both introgression and positive selection.
Statistical methods have been developed to detect surviving archaic human DNA segments and to assess whether these introgressed segments show signatures of positive selection.
Reported examples of adaptive introgression can be grouped by putative selective pressure, including pathogens, temperature, altitude and diet. Candidate genes showing evidence for adaptive introgression have functional annotations that suggest roles in immune function, pigmentation, response to high altitude and metabolism.
Although recent studies have identified several well-supported examples of adaptive introgression in humans, we still lack a framework that jointly models the effects of introgression and positive selection. Studies of the synergistic effect of these two forces will lead to a better characterization of adaptive introgression and of its relative importance in human adaptation.
E.H.-S. is supported by start-up funds from the University of California, Merced, USA. R.N. and S.S. are supported by the US National Institutes of Health (grants R01HG003229-09 and K99 GM111744). F.R. is supported by the US National Institutes of Health grant to M. Slatkin (R01-GM40282). The authors thank F. Casey for discussions and help with Box 2, as well as E. Durand for help with Supplementary information S1 (box).
- Modern humans
Present-day humans and their recent ancestors, up to the time at which they diverged from their most closely related archaic human groups, the Neanderthals and Denisovans.
- Out-of-Africa model
A model of recent human evolution positing that all present-day humans had a recent origin in Africa and then expanded across the world, replacing other archaic groups.
- Admixture
Genetic exchange between individuals from two populations that were isolated in the past.
- Archaic humans
A broad category of human populations that diverged from present-day humans 550–765 thousand years ago (kya) (assuming a mutation rate of 0.5 × 10⁻⁹ per base pair per year) before present-day human populations started diverging from each other 86–130 kya (assuming the same mutation rate) and that are now extinct. This includes the Neanderthal and Denisovan populations.
- Ancestral population structure
A demographic scenario in which an ancestral population is not homogeneously mixing. For example, some subpopulations might exchange more migrants with certain other subpopulations than with the rest because of geography or mate choice.
- Haplotypes
Sequences of contiguous alleles that are closely linked and that tend to be inherited together as a single unit.
- Human mutation rate
The rate (per base pair) at which mutations appear in the genome sequence of an individual at each generation or year. Currently, the exact value of this rate in humans is a topic of debate, with most estimates ranging from 0.5 × 10⁻⁹ per base pair per year to 10⁻⁹ per base pair per year.
- D statistic
A summary statistic based on differential sharing of derived alleles among different pairs of populations. When applied on a genome-wide scale, it can be used to detect significant deviations from a strict population tree with no admixture or migration.
- Incomplete lineage sorting
(ILS). A phenomenon whereby two or more lineages from different populations or species share a common ancestor more recently than their respective most recent common ancestor within populations, causing discordance between the population tree and a gene tree.
- Time of the MRCA
(TMRCA). Time in generations back into the past until two copies of an allele or two haplotypes shared a most recent common ancestor (MRCA). This is often an unknown parameter that can be estimated from genetic data.
- Linkage disequilibrium
(LD). A nonrandom association of alleles at different loci along the same chromosome due to low recombination rate, population structure and/or selection.
- S* statistic
A summary statistic based on patterns of linkage disequilibrium that can be used to detect introgressed haplotypes.
- Hidden Markov model
(HMM). A statistical modelling method used to infer hidden states from observed data along an ordered sequence, in which each hidden variable is independent of all other hidden variables, conditional on knowing the state of the immediately previous hidden variable.
- Archaic introgression
The introduction of genetic material into the ancestors of an extant population (for example, East Asians) from an archaic population that is currently extinct (for example, Neanderthals) via admixture.
- Conditional random field
(CRF). A statistical modelling method that is similar to a hidden Markov model but that also allows contextual data (regional data not directly contiguous to a site in a sequence) to provide information about the state of a hidden variable.
- Emission functions
Functions that relate the hidden variables to the observed data in the conditional random field framework.
- Positive selection
Selection that favours a specific allele over others. The allele may consequently rise to high frequency or become fixed. Hitchhiking of neutral alleles tightly linked to the favoured allele leaves a known genetic footprint in the genome, sometimes allowing detection of positive selection at a particular locus.
- Balancing selection
Selection that favours the maintenance of variability in a population, which can prevent any single allele from reaching fixation. Examples include frequency-dependent selection and heterozygous advantage (that is, overdominance).
- Negative selection
Selection that acts to prune away deleterious variants from the genome.
- Coalescent
Pertaining to coalescence: an event in the past at which two genetic lineages sampled in the present shared their most recent common ancestor at a specific locus in the genome.
- Ancestral polymorphism
A present-day polymorphism that exists at a site or haplotype because more than one allele existed in the ancestor of the two populations before they diverged from each other.
- Hybrid sterility
Reduced viability or fertility of offspring from a mating between individuals from populations or species that diverged a long time ago; it is often due to incompatible mutations that occurred in each daughter population after they separated from each other.
- Uniquely shared sites
Sites containing high-frequency derived alleles in a particular population that are also present in a distantly related population but that are absent or at low frequencies in other populations more closely related to the first population. Such sites serve as necessary, but insufficient, evidence for adaptive introgression from the distantly related population.
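Two of the glossary entries above, the D statistic and uniquely shared sites, describe simple computations on derived-allele frequencies, and a short sketch may make them more concrete. The Python code below is illustrative only and is not taken from any of the reviewed studies. It assumes the standard ABBA-BABA form of the D statistic for a four-population tree (((P1, P2), P3), outgroup), with per-site derived-allele frequencies supplied as plain lists. The function names, the argument names and the cut-offs `min_target` and `max_reference` are hypothetical choices made for this example; published analyses choose their own thresholds.

```python
from typing import List, Sequence


def d_statistic(p1: Sequence[float], p2: Sequence[float],
                p3: Sequence[float], p4: Sequence[float]) -> float:
    """ABBA-BABA D statistic from per-site derived-allele frequencies.

    Assumed tree: (((P1, P2), P3), P4), with P4 the outgroup.
    A positive value indicates excess derived-allele sharing between P2 and P3.
    """
    abba = 0.0  # expected count of sites where P2 and P3 share the derived allele
    baba = 0.0  # expected count of sites where P1 and P3 share the derived allele
    for f1, f2, f3, f4 in zip(p1, p2, p3, p4):
        abba += (1 - f1) * f2 * f3 * (1 - f4)
        baba += f1 * (1 - f2) * f3 * (1 - f4)
    total = abba + baba
    if total == 0:
        raise ValueError("no informative (ABBA or BABA) sites")
    return (abba - baba) / total


def uniquely_shared_sites(target: Sequence[float],
                          reference: Sequence[float],
                          archaic: Sequence[float],
                          min_target: float = 0.5,      # hypothetical cut-off
                          max_reference: float = 0.01   # hypothetical cut-off
                          ) -> List[int]:
    """Indices of sites whose derived allele is at high frequency in the
    target population, absent or rare in the more closely related reference
    populations, and present in the distantly related (archaic) population."""
    return [
        i
        for i, (t, r, a) in enumerate(zip(target, reference, archaic))
        if t >= min_target and r <= max_reference and a > 0
    ]
```

In practice, the significance of a genome-wide D value is usually assessed with a resampling scheme such as a block jackknife, which this sketch omits. As the glossary notes, sites returned by `uniquely_shared_sites` would be necessary but insufficient evidence for adaptive introgression.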