Analyses of recently acquired genomic sequence data are leading to important insights into the early evolution of anatomically modern humans, as well as into the more recent demographic processes that accompanied the global radiation of Homo sapiens. Some of the new results contradict early, but still influential, conclusions that were based on analyses of gene trees from mitochondrial DNA and Y-chromosome sequences. In this review, we discuss the different genetic and statistical methods that are available for studying human population history, and identify the most plausible models of human evolution that can accommodate the contrasting patterns observed at different loci throughout the genome.
Over the past two decades, phylogenetic analyses of mitochondrial DNA and Y-chromosome polymorphisms supported a simple model of human origins, called the single origin hypothesis.
The single origin model proposes that anatomically modern humans trace their ancestry to a single small population that lived in Africa, and that, following a speciation bottleneck, the population expanded and completely replaced archaic forms of humans.
More sophisticated methods of analysis, based on the coalescent approach, are being applied to a plethora of new genomic sequence data.
These new analyses of multilocus sequence data show a large variance in the shape and depth of genealogies for X-chromosomal and autosomal loci, and present a more complex picture of human demographic history.
Non-African populations have reduced diversity and fewer rare polymorphisms than African populations, suggesting a history of bottlenecks. By contrast, African populations do not exhibit the predicted patterns of polymorphism after a speciation bottleneck.
These genome-scale patterns could be best accounted for by models that involve low levels of gene flow among archaic populations before the emergence of anatomically modern humans — that is, they imply the existence of ancestral population structure.
There is also growing evidence that some highly divergent genetic lineages might have entered our genome through hybridization between an expanding anatomically modern human population and archaic forms of humans.
Further tests of the predictions of these models await more systematic surveys of DNA sequence variation in multiple human populations, along with more sophisticated methods of population genetic inference.
We thank A. Di Rienzo for providing Tajima's D values and the following people for providing feedback on the manuscript: M. Cox, L. Excoffier, F. Mendez, C. Stringer, J. Wilder and E. Wood. Some of the work presented here was made possible by a US National Science Foundation HOMINID grant to M.F.H.
All the taxa on the human lineage after the split from the common ancestor with the chimpanzee.
- Neutral DNA polymorphism
Nucleotide variants that segregate in a population but have frequencies that are not influenced by natural selection.
- Demographic processes
Changes in population size, distribution and structure.
- Bottleneck
A transient reduction in the abundance of a population. This could occur, for example, because of environmental catastrophe or after the founding of a new population.
- Pleistocene
An epoch of the Quaternary period beginning 1.8 million years ago and transitioning to the Holocene epoch approximately 10,000 years ago. The Pleistocene is characterized by a cool climate and extensive glaciation of northern latitudes.
- Population phylogeny
The hierarchical relationship among individual populations, typically inferred from pairwise genetic differences between populations.
- Summary statistics
Statistics that describe some aspect of polymorphism data, such as the number of polymorphic sites, the distribution of mutation frequencies or the extent of association between linked polymorphisms. Summary statistics are often estimates of parameters in an evolutionary model.
- Coalescent approach
A probabilistic construct that describes the hierarchical common ancestry of a sample of gene copies. The probability that two gene copies share a common ancestor (or coalesce) in the preceding generation is proportional to the reciprocal of the size of the entire population.
- Haplotype
A contiguous DNA sequence of arbitrary length along a chromosome that has a primary structure that is distinct from that of other homologous regions in a given population.
- Place of the most recent common ancestor
The geographical area, of arbitrary scale, where the ancestry of a current sample of gene copies can be traced back to a single, endemic ancestral population.
The state of genotypic or phenotypic character, possessed by some biological entity, which has mutated from a common ancestral state.
When the common ancestor of one natural group is shared with any other such group.
- Time to the most recent common ancestor
The number of generations back in time when a single gene copy gave rise to all of the gene copies in a contemporary sample. If n gene copies are sampled from a population of size N, the time to a most recent common ancestor for an autosomal locus is expected to be 4N(1 − 1/n) generations.
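The closed-form expectation above can be checked by adding up the expected waiting times between successive coalescent events. The sketch below assumes the standard result that, while k lineages remain in a diploid population of size N, the expected wait until the next coalescence is 4N / (k(k − 1)) generations. The function names are illustrative and do not come from the review.

```python
def expected_tmrca_by_intervals(n, N):
    """Sum the expected waiting times while k = n, n-1, ..., 2 lineages remain.

    Assumes an autosomal locus in a diploid population of constant size N,
    so that the expected wait with k lineages is 4N / (k * (k - 1)) generations.
    """
    return sum(4.0 * N / (k * (k - 1)) for k in range(2, n + 1))


def expected_tmrca_closed_form(n, N):
    """Closed form quoted in the glossary: 4N(1 - 1/n) generations."""
    return 4.0 * N * (1.0 - 1.0 / n)


if __name__ == "__main__":
    for sample_size in (2, 10, 100):
        print(sample_size,
              expected_tmrca_by_intervals(sample_size, 10000),
              expected_tmrca_closed_form(sample_size, 10000))
```

The two functions agree for any sample size, and the expectation approaches 4N generations as n grows, so sampling more gene copies adds little to the expected depth of the genealogy.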
- Effective population size
The number of individuals of a given generation that contribute gametes to the subsequent generations. This abstract quantity depends on the breeding sex ratio, number of offspring per individual and type of mating system.
- Island model of population structure
A commonly used model to describe gene flow in a subdivided population in which each subpopulation of constant size, N, receives and gives migrants to each of the other subpopulations at the same rate, m. Under the Island model, FST = 1/(4Nm + 1).
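As a small worked example of the island-model relationship, the following sketch evaluates FST = 1/(4Nm + 1) and inverts it to recover the number of migrants per generation, Nm. The function names are hypothetical, and the inversion is only meaningful to the extent that the island model's assumptions hold.

```python
def island_model_fst(N, m):
    """Expected FST under the island model: 1 / (4Nm + 1)."""
    return 1.0 / (4.0 * N * m + 1.0)


def migrants_per_generation(fst):
    """Invert the island-model formula: Nm = (1/FST - 1) / 4."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("FST must lie in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


if __name__ == "__main__":
    # One migrant per generation (Nm = 1) gives FST = 0.2.
    print(island_model_fst(N=1000, m=0.001))
    print(migrants_per_generation(0.2))
```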
- Standard neutral model
A population genetics model that assumes all individuals in a population are replaced by their offspring each generation, so that the population size remains constant, mating occurs randomly and each parent produces a Poisson-distributed number of offspring. Under these conditions, the model predicts the fate of mutations that are not affected by natural selection.
- Harmonic mean
One method for calculating an average, defined as the reciprocal of the arithmetic mean of the reciprocals of a specified set of positive numbers.
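The definition can be illustrated with a short sketch. The population sizes used here are invented for illustration; the example assumes the common population-genetic use of the harmonic mean to average population sizes that fluctuate across generations, where a single small generation pulls the average down sharply.

```python
from statistics import harmonic_mean


def harmonic_mean_manual(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty set of positive numbers")
    return len(values) / sum(1.0 / v for v in values)


if __name__ == "__main__":
    # Hypothetical population sizes over four generations, with one crash.
    sizes = [1000, 1000, 10, 1000]
    print(sum(sizes) / len(sizes))       # arithmetic mean
    print(harmonic_mean_manual(sizes))   # dominated by the smallest value
    print(harmonic_mean(sizes))          # standard-library equivalent
```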
- Tajima's D
A statistic used to test the standard neutral model for a given region of DNA sequence. It is the standardized difference between two estimates of nucleotide diversity: one based on the average number of pairwise nucleotide differences, and one based on the total number of segregating sites, corrected for sample size.
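A minimal sketch of how this statistic can be computed from aligned sequences follows. It uses the standard constants from Tajima's 1989 formulation, assumes that each variable alignment column counts as one segregating site, and ignores missing data. The function name and the toy alignments are illustrative only.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for a list of equal-length aligned sequences (strings).

    Returns None when there are no segregating sites.
    """
    n = len(sequences)
    if n < 4:
        raise ValueError("the variance term is zero for fewer than 4 sequences")

    # S: number of alignment columns with more than one allele.
    S = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if S == 0:
        return None

    # pi: average number of differences over all pairs of sequences.
    total_differences = sum(
        sum(1 for x, y in zip(s1, s2) if x != y)
        for s1, s2 in combinations(sequences, 2)
    )
    pi = total_differences / (n * (n - 1) / 2)

    # Sample-size constants.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * S + e2 * S * (S - 1)
    return (pi - S / a1) / sqrt(variance)


if __name__ == "__main__":
    # Only singleton variants: an excess of rare polymorphisms, D < 0.
    print(tajimas_d(["AAAA", "TAAA", "ATAA", "AATA"]))
    # Only intermediate-frequency variants: D > 0.
    print(tajimas_d(["AA", "AA", "TT", "TT"]))
```

Negative values indicate an excess of rare variants relative to the neutral expectation, and positive values indicate an excess of intermediate-frequency variants.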
- Frequency spectrum
The distribution of polymorphism frequencies in a sample of DNA sequences. For example, 30% of polymorphisms might occur in a single gene copy, 20% in two gene copies, and so on. Under the standard neutral model, the expected number of polymorphisms at which the derived variant is present in i gene copies is proportional to 1/i, so rare variants form the largest class.
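The kind of percentages given in the example above can be tabulated directly from a sample. The sketch below assumes that each gene copy is coded as a string of '0' (ancestral) and '1' (derived) characters, which presupposes that the ancestral state at each site is known. The coding scheme and function name are assumptions made for illustration.

```python
from collections import Counter


def frequency_spectrum(haplotypes):
    """Proportion of polymorphic sites whose derived variant occurs in i copies.

    haplotypes: list of equal-length strings of '0' (ancestral) and '1' (derived).
    Returns a dict mapping i = 1 .. n-1 to a proportion, or {} if no site varies.
    """
    n = len(haplotypes)
    counts = Counter()
    for column in zip(*haplotypes):
        derived_copies = column.count("1")
        if 0 < derived_copies < n:  # skip sites that do not vary in the sample
            counts[derived_copies] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {i: counts[i] / total for i in range(1, n)}


if __name__ == "__main__":
    sample = ["1000", "0100", "0011", "0011"]
    print(frequency_spectrum(sample))
```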
- Linkage disequilibrium
The non-random association of polymorphisms at two linked loci. Linkage disequilibrium is created by mutation, but broken down over time primarily by crossing over between the two loci.
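Two common measures of this association can be sketched as follows: the coefficient D = p_AB − p_A p_B and the squared correlation r². The example assumes phased data in which each gene copy is represented by a pair of 0/1 allele codes for the two loci. The function name and data layout are illustrative.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for two biallelic loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) tuples coded 0/1.
    r_squared is None when either locus is monomorphic in the sample.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denominator if denominator > 0 else None
    return d, r_squared


if __name__ == "__main__":
    # Complete association: allele 1 at one locus always travels with allele 1 at the other.
    print(linkage_disequilibrium([(1, 1), (1, 1), (0, 0), (0, 0)]))
    # No association: all four haplotypes are equally frequent.
    print(linkage_disequilibrium([(1, 1), (1, 0), (0, 1), (0, 0)]))
```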
- Directional selection
A form of positive selection in which a single mutation has a selective advantage over all other mutations, resulting in the selected mutation rapidly reaching fixation (that is, a frequency of 100%) in the population.
- Balancing selection
A form of positive selection that maintains polymorphism in the population. One well-known form of balancing selection is heterozygote advantage, where an individual who is heterozygous at a selected locus has a higher fitness than either of the homozygous genotypes.
- Population structure
Arises when the individual members of a population do not mate at random with respect to geography, age class, language, culture or some other defining characteristic.
- Likelihood-based method
A class of statistical methods that calculate the probability of the observed data under varying hypotheses, in order to estimate model parameters that best explain the observed data and determine the relative strengths of alternative hypotheses.
- Bayesian technique
An approach to inference in which probability distributions of model parameters represent both what we believe about the distributions before looking at data and the likelihood of the parameters given the observed data.
- Markov chain Monte Carlo technique
A simulation technique for producing samples from an unknown probability distribution. By evaluating the probability of the observed data at each step in the Markov chain, an estimate of the probability distribution of model parameters can be obtained by observing the behaviour of the chain as it proceeds through many steps.
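A toy Metropolis sampler illustrates the idea. It is a sketch under simple assumptions, not a method taken from the review: the single parameter is the population frequency p of a variant, the data are k copies of the variant in a sample of n gene copies, and the prior on p is flat. All names, the step size and the burn-in fraction are arbitrary choices.

```python
import math
import random


def log_posterior(p, k, n):
    """Log of the binomial likelihood of k in n under a flat prior, up to a constant."""
    if not 0.0 < p < 1.0:
        return float("-inf")
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def metropolis_samples(k, n, steps=20000, step_size=0.1, seed=1):
    """Random-walk Metropolis chain for p; the first 10% of steps are discarded."""
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_posterior(current, k, n)
    chain = []
    for _ in range(steps):
        proposal = current + rng.uniform(-step_size, step_size)
        proposal_lp = log_posterior(proposal, k, n)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain[steps // 10:]


if __name__ == "__main__":
    samples = metropolis_samples(k=30, n=100)
    # Under a flat prior, the exact posterior mean is (k + 1) / (n + 2).
    print(sum(samples) / len(samples), 31 / 102)
```

In this toy case the posterior is known exactly, which makes it easy to check that the chain is behaving. In realistic demographic models no closed form exists, and the chain is the only practical way to approximate the distribution.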
- Importance sampling
An efficient simulation method for integrating an unknown function, in which only those parameters that can actually produce the observed data are considered.
- Approximate likelihood
A measure of the fit of some hypothetical model to a statistic calculated from observed data. For example, if 50% of polymorphisms occur in single individual chromosomes, a population growth model might have a higher likelihood of producing the observed number of singleton mutations than a model of population reduction.
- Deme
A geographically localized population of a species that can be considered a distinct, interbreeding unit.
- Neolithic
A human cultural period, beginning approximately 10,000 years ago, marked by the appearance in the archaeological record of industries such as polished stone and metal tools, pottery, animal domestication and agriculture.
Describes a diploid population in which each individual of a particular sex has an equal chance of producing offspring with any other member of the opposite sex in the population.
About this article
Scientific Reports (2015)