Letter

Modelling the recent common ancestry of all living humans

Nature volume 431, pages 562–566 (30 September 2004)

Abstract

If a common ancestor of all living humans is defined as an individual who is a genealogical ancestor of all present-day people, the most recent common ancestor (MRCA) for a randomly mating population would have lived in the very recent past (refs 1–3). However, the random mating model ignores essential aspects of population substructure, such as the tendency of individuals to choose mates from the same social group, and the relative isolation of geographically separated groups. Here we show that recent common ancestors also emerge from two models incorporating substantial population substructure. One model, designed for simplicity and theoretical insight, yields explicit mathematical results through a probabilistic analysis. A more elaborate second model, designed to capture historical population dynamics in a more realistic way, is analysed computationally through Monte Carlo simulations. These analyses suggest that the genealogies of all living humans overlap in remarkable ways in the recent past. In particular, the MRCA of all present-day humans lived just a few thousand years ago in these models. Moreover, among all individuals living more than just a few thousand years earlier than the MRCA, each present-day human has exactly the same set of genealogical ancestors.

Main

In investigations of the common ancestors of all living humans, much attention has focused on descent through either exclusively maternal or exclusively paternal lines, as occurs with mitochondrial DNA and most of the Y chromosome (refs 4, 5). But according to the more common genealogical usage of the term ‘ancestor’, ancestry encompasses all lines of descent through both males and females, so that the ancestors of an individual include all of that person's parents, grandparents, and so on.

For a population of size $n$, assuming random mating (and so ignoring population substructure), probabilistic analysis (ref. 2) has proved that the number of generations back to the MRCA, $T_n$, has a distribution that is sharply concentrated around $\log_2 n$. We express this using the notation $T_n \sim \log_2 n$, meaning that the quotient $T_n / \log_2 n$ converges in probability to 1 as $n \to \infty$. In contrast, the mean time to the MRCA along exclusively matrilineal or patrilineal lines is approximately $n$ generations (ref. 6), and the distribution is not sharply concentrated. For example, in a panmictic population of one million people, the genealogical MRCA would have lived about 20 generations ago, or around the year AD 1400, assuming a generation time of 30 years. The MRCA along exclusively maternal lines would have lived something like 50,000 times earlier, in the order of one million generations ago.
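The arithmetic behind the one-million-person example can be checked with a few lines of Python. This is only a sketch of the calculation described above; the function name and the choice of 2004 (the publication year) as "the present" are assumptions made for illustration.

```python
import math

GENERATION_YEARS = 30   # generation time assumed in the text
PRESENT_YEAR = 2004     # assumption: publication year taken as "the present"

def mrca_generations_random_mating(n):
    """Approximate generations back to the genealogical MRCA: log2(n)."""
    return math.log2(n)

n = 1_000_000
gens = mrca_generations_random_mating(n)
mrca_year = PRESENT_YEAR - gens * GENERATION_YEARS
# Uniparental (maternal-line) MRCA is roughly n generations back,
# so this ratio says how many times earlier it lies.
uniparental_ratio = n / gens

print(f"genealogical MRCA: about {gens:.1f} generations ago, near AD {mrca_year:.0f}")
print(f"maternal-line MRCA: about {uniparental_ratio:,.0f} times earlier")
```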

As genealogical ancestry is traced back beyond the MRCA, a growing percentage of people in earlier generations are revealed to be common ancestors of the present-day population. Tracing further back in time, there was a threshold, let us say $U_n$ generations ago, before which ancestry of the present-day population was an all-or-nothing affair. That is, each individual living at least $U_n$ generations ago was either a common ancestor of all of today's humans or an ancestor of no human alive today. Thus, among all individuals living at least $U_n$ generations ago, each present-day human has exactly the same set of ancestors. We refer to this point in time as the identical ancestors (IA) point. As with the MRCA point, the IA point is also quite recent in a randomly mating population: $U_n \sim 1.77 \log_2 n$ generations ago (ref. 2).
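The MRCA and IA points of a randomly mating population can be explored with a small Monte Carlo sketch. The code below is not the authors' program. It assumes a constant population of size n with non-overlapping generations, in which every individual picks two parents uniformly at random (with replacement) from the previous generation. Each ancestor carries a bitmask recording which present-day individuals descend from it.

```python
import math
import random

def simulate_mrca_ia(n, rng):
    """One run of a constant-size random-mating model.

    Returns (t_mrca, t_ia): generations back to the first common ancestor
    of everyone, and to the first generation in which every individual is
    an ancestor of either all or none of the present-day population.
    """
    full = (1 << n) - 1
    # desc[i]: bitmask of present-day individuals descended from individual i
    desc = [1 << i for i in range(n)]
    t = 0
    t_mrca = None
    while True:
        t += 1
        parents = [0] * n
        for child_mask in desc:
            # each child picks two parents at random (assumption: with replacement)
            for _ in range(2):
                parents[rng.randrange(n)] |= child_mask
        desc = parents
        if t_mrca is None and any(m == full for m in desc):
            t_mrca = t
        if all(m == 0 or m == full for m in desc):
            return t_mrca, t

if __name__ == "__main__":
    rng = random.Random(1)
    n = 500
    runs = [simulate_mrca_ia(n, rng) for _ in range(20)]
    mean_t = sum(t for t, _ in runs) / len(runs)
    mean_u = sum(u for _, u in runs) / len(runs)
    print(f"mean MRCA generations: {mean_t:.1f} (log2 n = {math.log2(n):.1f})")
    print(f"mean IA generations:   {mean_u:.1f} (1.77 log2 n = {1.77 * math.log2(n):.1f})")
```

Because the limits hold only as n grows, the averages for small n should be expected to sit near, not exactly on, the two reference values printed alongside them.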

The major problem in applying these results to human populations is that mating is not random in the real world. Mating patterns are structured by geography, proximity, culture, language and social class. Nevertheless, even in populations with considerable internal structure, the time to the MRCA can be remarkably brief. To demonstrate this in a tractable mathematical model, consider a population of size $n$ divided into randomly mating subpopulations that are linked by occasional migrants. The population is represented by a graph, $G$, with a node for each subpopulation. Edges indicate pairs of nodes that exchange a small number (for example, one pair) of migrants per generation. Let $R$ denote the radius of $G$, and let $\Delta$ be a quantity ranging between 0 and 1 that depends on the structure of $G$ (see Box 1). A probabilistic analysis (see Supplementary Information) shows that as $n \to \infty$, $T_n \sim (R + \Delta) \log_2 n$. Furthermore, if we let $D$ denote the diameter of the graph, then the number of generations, $U_n$, since the IA point satisfies $U_n \sim (D + 1.77) \log_2 n$.

Box 1: Graph-theoretical definitions

The length of a path in a graph, $G$, is the number of edges in the path. For each pair of nodes $i$ and $j$ in $G$, the distance $d(i, j)$ is defined to be the length of a shortest path joining $i$ and $j$. The radius of $G$ is $R = \min_{i \in G} \max_{k \in G} d(i, k)$, and a node $i$ is called a centre of $G$ if $\max_{k \in G} d(i, k) = R$. Assume $R \ge 1$; the case $R = 0$ ($G$ has one node) was treated previously (ref. 2). For each centre node $i$, let $S_i$ be a set of minimal size that consists of neighbours of node $i$ and satisfies $\min\{d(j, k) : j \in \{i\} \cup S_i\} \le R - 1$ for all $k \in G$. $H_i$ is defined as the number of nodes in $S_i$, $H$ is the minimum of $H_i$ over all centres $i$, and $\Delta = 1 - 1/H$. The diameter of $G$ is $D = \max_{i, k \in G} d(i, k)$.
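The quantities in Box 1 can be computed by brute force for small graphs. The sketch below is illustrative only: the function names are hypothetical, the graph is assumed to be connected with radius at least 1, and the covering condition is read as requiring every node to lie within distance R - 1 of the centre or of some member of its neighbour set. The two example graphs (a three-node path and five fully connected nodes) are chosen for illustration and are not claimed to match the graphs used in the tables.

```python
from collections import deque
from itertools import combinations

def distances(adj, src):
    """Breadth-first search: shortest path length from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def graph_quantities(adj):
    """Return (R, D, H, delta) for a connected graph with radius >= 1."""
    d = {i: distances(adj, i) for i in adj}
    ecc = {i: max(d[i].values()) for i in adj}   # eccentricity of each node
    R = min(ecc.values())                        # radius
    D = max(ecc.values())                        # diameter
    H = None
    for i in adj:
        if ecc[i] != R:
            continue                             # only centres are considered
        # smallest set of neighbours of i that, together with i,
        # brings every node within distance R - 1
        for size in range(len(adj[i]) + 1):
            if any(
                all(min(d[j][k] for j in (i,) + S) <= R - 1 for k in adj)
                for S in combinations(adj[i], size)
            ):
                H = size if H is None else min(H, size)
                break
    return R, D, H, 1 - 1 / H

path3 = {0: [1], 1: [0, 2], 2: [1]}
k5 = {i: [j for j in range(5) if j != i] for i in range(5)}

for name, g in (("three-node path", path3), ("five fully connected nodes", k5)):
    R, D, H, delta = graph_quantities(g)
    print(f"{name}: R={R}, D={D}, H={H}, delta={delta}")
```

Under the asymptotic results quoted in the text, R + delta and D + 1.77 are then the coefficients multiplying log2 n for the MRCA and IA times, respectively.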

Computer simulations accord with these theoretical predictions. Tables 1 and 2 give distributions of $T_n$ and $U_n$ for small populations of varying sizes in graphs with one node, three connected nodes, five fully connected nodes and for a ten-node graph loosely based on world geography as shown in Fig. 1. In these simulations, neighbouring subpopulations exchange one pair of migrants per generation. Each mean is calculated from 100 model runs. Although guaranteed to be accurate only for sufficiently large $n$, the theoretical predictions describe the simulations quite well even for models with just a few thousand individuals. Whenever $n$ is doubled, $T_n$ is expected to increase by $R + \Delta$, and $U_n$ is expected to increase by $D + 1.77$. These predicted increases, which are listed in the last columns of Tables 1 and 2, agree closely with the simulation results.

Table 1: Simulations of $T_n$
Table 2: Simulations of $U_n$

Figure 1: World map viewed as a ten-node graph. This graph has radius 3 and diameter 5.

To hazard a rough first guess about human recent common ancestors, we could extrapolate the results for the graph of Fig. 1 to a growing population with a final size of 250 million. When applying this model to a growing population, the fixed population size that provides the best approximation is the size at the time that the MRCA lived. We take this effective population size to be 250 million, which is approximately the global population in the year AD 1. Starting from $n = 16{,}000$, a population of 250 million is reached by doubling 14 times. Approximating the increases in $T_n$ and $U_n$ beyond the values seen in Tables 1 and 2 by their theoretical predictions for each doubling of $n$, we arrive at $T_n \approx 34 + 14 \times 3 = 76$ generations (about 2,300 years) and $U_n \approx 74 + 14 \times 6.77 = 169$ generations (about 5,000 years). These estimates would suggest, with the exchange of just one pair of migrants per generation between large panmictic populations of realistic size, that the MRCA appears in about the year 300 BC, and all modern individuals have identical ancestors by about 3,000 BC. Such estimates are extremely tentative, and the model contains several obvious sources of error, as it was motivated more by considerations of theoretical insight and tractability than by realism. Its main message is that substantial forms of population subdivision can still be compatible with very recent common ancestors.
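The extrapolation above is a short calculation and can be reproduced directly. In this sketch the starting values (34 and 74 generations at a population of 16,000) and the per-doubling increases (3 and 6.77) are taken from the text; the function name and the 30-year generation time carried over from earlier in the article are the only other ingredients.

```python
import math

GENERATION_YEARS = 30  # generation time assumed earlier in the article

def extrapolate(base_generations, increase_per_doubling, n_start, n_target):
    """Add the predicted per-doubling increase for each doubling of n."""
    doublings = round(math.log2(n_target / n_start))
    return base_generations + doublings * increase_per_doubling, doublings

# MRCA: simulated value at n = 16,000 plus 3 generations per doubling
t_n, doublings = extrapolate(34, 3, 16_000, 250_000_000)
# IA point: simulated value at n = 16,000 plus D + 1.77 = 6.77 per doubling
u_n, _ = extrapolate(74, 5 + 1.77, 16_000, 250_000_000)

print(f"doublings: {doublings}")
print(f"MRCA: {t_n} generations, about {t_n * GENERATION_YEARS} years")
print(f"IA:   {u_n:.0f} generations, about {u_n * GENERATION_YEARS:.0f} years")
```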

The dynamics of human subpopulations are much more complex than those in the simple graph model discussed above. Although these complexities make theoretical analysis difficult, a computer model incorporating more complicated forms of population substructure and migration allows the demographic history of human populations to be simulated. The Supplementary Information contains more details on the model and computations; here we briefly outline some of the main points.

This model is based on a simplified projection of the world's actual inhabited land masses and has three levels of substructure: continents, ‘countries’ and ‘towns’. Figure 2 depicts the model's geography and migration routes used before AD 1500, with the countries shown as squares and the number of towns per country differing from continent to continent. Towns and countries represent both the local geographical areas and the relevant social and ethnic groups from which most people find mates.

Figure 2: Geography and migration routes of the simulated model. Arrows denote ports and the adjacent numbers are their steady migration rates, in individuals per generation. If given, the date in parentheses indicates when the port opens. Upon opening, there is usually a first-wave migration burst at a higher rate, lasting one generation.

The model uses a simplified migration system in which each person has a single opportunity to migrate from his or her town of birth. The probabilities of leaving a town or a country are set at various levels to reflect different migration patterns. Migrants who move between towns can travel to any other town within the country. A migrant who leaves a country for another country within the same continent chooses the destination with a probability that diminishes as the inverse square of the geographical distance.
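The inverse-square rule for choosing a destination country can be sketched as a weighted random choice. The code below is an illustration, not the authors' implementation: the country names, their grid coordinates and the function name are hypothetical, and distance is assumed to be Euclidean distance between country positions.

```python
import random

def choose_destination(origin, coords, rng):
    """Pick another country with probability proportional to 1 / distance^2."""
    candidates = [c for c in coords if c != origin]
    ox, oy = coords[origin]
    weights = []
    for c in candidates:
        x, y = coords[c]
        dist_sq = (x - ox) ** 2 + (y - oy) ** 2
        weights.append(1.0 / dist_sq)
    return rng.choices(candidates, weights=weights, k=1)[0]

# Hypothetical layout: B is 1 unit from A, C is 10 units from A,
# so B is weighted 100 times more heavily than C.
coords = {"A": (0, 0), "B": (1, 0), "C": (10, 0)}
rng = random.Random(42)
draws = [choose_destination("A", coords, rng) for _ in range(2000)]
print({c: draws.count(c) for c in ("B", "C")})
```

With this weighting, nearby countries absorb almost all migrants while distant ones still receive an occasional arrival, which is the behaviour the paragraph describes.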

Each continent has a number of port countries from which migrants can travel to another continent. A fixed, large percentage (for example, 95% in some simulations) of the migrants through a port come from the country in which the port is located, with the remainder drawn from other countries in the continent in proportion to their inverse squared distance. The value next to a port in Fig. 2 is its migration rate, in people per generation, and the date in parentheses indicates when the port opens, if it is more recent than the start of the simulation in 20,000 BC. When a port opens, there is usually a single generation of migration at a higher rate than the steady-state rate shown in the figure. After the year AD 1500, additional large ports, which are not shown, begin to open to simulate colonization of the Americas, Australia and elsewhere. Immediately before this, the native population of the Americas is markedly reduced to simulate the effects of European-introduced diseases (ref. 7).

Generations overlap in this model and we explicitly simulated the lifespan and the times at which mating and reproduction events occur for each individual (refs 8, 9), as described in more detail in Supplementary Information. The birth rate of each continent or island was individually adjusted so that the populations match historical estimates, and growth rates were higher in under-populated areas. Full-sized populations were used until the world population reached 50 million in 1,000 BC. Subsequently, birth rates were reduced to achieve a worldwide level of 55 million, carried out in such a way that sparsely populated areas were less affected. This limit was a computational necessity, but simulations show that population growth has little effect, especially if it occurs after the MRCA has died.

With 5% of individuals migrating out of their home town, 0.05% migrating out of their home country, and 95% of port users born in the country from which the port emanates, the simulations produce a mean MRCA date of 1,415 BC and a mean IA date of 5,353 BC. Interestingly, the MRCAs are nearly always found in eastern Asia. This is due to the proximity of this region to both Eurasia and either the remote Pacific islands or the Americas, allowing the MRCA's descendants to reach a few major world regions in a relatively short time.

Arguably, this simulation is far too conservative, especially given its prediction that, even in densely populated Eurasia, only 55.3 people will leave each country per generation in AD 1500. If the migration rate among towns is increased to 20%, the local port users are reduced to 80%, and the migration rates between countries and continents are scaled up by factors of 5 and 10, respectively, the mean MRCA date is as recent as AD 55 and the mean IA date is 2,158 BC. The predictions of the simple ten-node graph model sketched earlier fall somewhere between these dates and those of the more conservative computational model.

The model can also be used to calculate the percentage of ancestry that current individuals receive from different parts of the world. In generations sufficiently far removed from the present, some ancestors appear much more often than do others on any current individual's family tree, and can therefore be expected to contribute proportionately more to his or her genetic inheritance (refs 1, 10, 11). For example, a present-day Norwegian generally owes the majority of his or her ancestry to people living in northern Europe at the IA point, and a very small portion to people living throughout the rest of the world. Furthermore, because DNA is inherited in relatively large segments from ancestors, an individual will receive little or no actual genetic inheritance from the vast majority of the ancestors living at the IA point (ref. 12).

Several factors could cause the time to the true MRCA or IA point to depart from the predictions of our model. If a group of humans were completely isolated, then no mixing could occur between that group and others, and the MRCA would have to have lived before the start of the isolation. A more recent MRCA would not arise until the groups were once again well integrated. In the case of Tasmania, which may have been completely isolated from mainland Australia between the flooding of the Bass Strait, 9,000–12,000 years ago, and the European colonization of the island, starting in 1803 (ref. 13), the IA date for all living humans must fall before the start of isolation. However, the MRCA date would be unaffected, because today there are no remaining native Tasmanians without some European or mainland Australian ancestry.

No large group is known to have maintained complete reproductive isolation for extended periods. The populations on either side of the Bering Strait appear to have exchanged mates throughout the period documented in the archaeological record (ref. 14). Religious isolates such as the Samaritans have occasionally absorbed migrants from outside the group (ref. 15). Even populations on isolated Pacific islands have experienced occasional infusions of newcomers (ref. 16). Even if rates of migration between some adjoining populations are very low, the time to the MRCA tends not to change substantially. For example, with a migration rate across the Bering Strait of just one person in each direction every ten generations, rather than the ten per generation in the more conservative simulation described earlier, $T_n$ only increases from 3,415 years to 3,668 years.

Conversely, other factors could reduce the time to the MRCA from that predicted by the model. Examples of such factors include the existence of more diverse intercontinental migration routes, the large-scale movement and mixing of populations documented in the historical record (ref. 17), marked individual differences in fertility (ref. 18), and the population increase of the past two millennia, which would result in more migrants.

Actual migration rates among populations are very poorly known and undoubtedly have varied considerably in different times and places. Studies of hunter-gatherer groups and subsistence agricultural communities have found that anywhere from 1% (ref. 19) to as much as 30% (ref. 20) of mates are from outside the group. The tendency of most human groups to marry out with surrounding groups, at least to a limited extent, links networks of ancestry within specific regions (see http://computing.dcu.ie/~humphrys/FamTree/Royal/famous.descents.html).

Given the remaining uncertainties about migration rates and real-world mating patterns, the date of the MRCA for everyone living today cannot be identified with great precision. Nevertheless, our results suggest that the most recent common ancestor for the world's current population lived in the relatively recent past—perhaps within the last few thousand years. And a few thousand years before that, although we have received genetic material in markedly different proportions from the people alive at the time, the ancestors of everyone on the Earth today were exactly the same.

Further work is needed to determine the effect of this common ancestry on patterns of genetic variation in structured populations (refs 21–24). But to the extent that ancestry is considered in genealogical rather than genetic terms, our findings suggest a remarkable proposition: no matter the languages we speak or the colour of our skin, we share ancestors who planted rice on the banks of the Yangtze, who first domesticated horses on the steppes of the Ukraine, who hunted giant sloths in the forests of North and South America, and who laboured to build the Great Pyramid of Khufu.

References

  1. in Genealogical Demography (eds Dyke, B. & Morrill, W. T.) 85–93 (Academic, New York, 1980)

  2. Recent common ancestors of all present-day individuals. Adv. Appl. Probab. 31, 1002–1026, 1027–1038 (1999)

  3. On the genealogy of a population of biparental individuals. J. Theor. Biol. 203, 303–315 (2000)

  4. Mitochondrial genome variation and the origin of modern humans. Nature 408, 708–713 (2000)

  5. Recent common ancestry of human Y chromosomes: Evidence from DNA sequence data. Proc. Natl Acad. Sci. USA 97, 7360–7365 (2000)

  6. in Oxford Surveys of Evolutionary Biology (eds Harvey, P. H. & Partridge, L.) 1–44 (Oxford Univ. Press, New York, 1990)

  7. American Holocaust: Columbus and the Conquest of the New World (Oxford Univ. Press, New York, 1992)

  8. Death Rates by Age, Race, and Sex, United States, 1900–1953, Vital Statistics—Special Reports Vol. 43 (US Government Printing Office, Washington DC, 1956)

  9. Model fitting and hypothesis testing for age-specific mortality data. J. Evol. Biol. 12, 430–439 (1999)

  10. The Malthusian parameter of ascents: What prevents the exponential increase of one's ancestors? Proc. Natl Acad. Sci. USA 93, 15276–15278 (1996)

  11. Distribution of repetitions of ancestors in genealogical trees. Physica A 281, 1–16 (2000)

  12. On the number of ancestors to a DNA sequence. Genetics 147, 1459–1468 (1997)

  13. Tasmanian archaeology: Establishing the sequences. Ann. Rev. Anthropol. 24, 423–446 (1995)

  14. Fitzhugh, W. W. & Chausonnet, V. (eds) Crossroads of Continents: Cultures of Siberia and Alaska (Smithsonian Institution Press, Washington DC, 1988)

  15. et al. Maternal and paternal lineages of the Samaritan isolate: Mutation rates and time to most recent common male ancestor. Ann. Hum. Genet. 67, 153–164 (2003)

  16. Pingelap and Mokil atolls: Migration. Am. J. Hum. Genet. 23, 339–349 (1971)

  17. Cultures in Contact: World Migrations in the Second Millennium (Duke Univ. Press, Durham, North Carolina, 2002)

  18. et al. The genetic legacy of the Mongols. Am. J. Hum. Genet. 72, 717–721 (2003)

  19. Archeology, population genetics and studies of human racial ancestry. Am. J. Phys. Anthropol. 44, 31–50 (1976)

  20. Gene frequencies and microdifferentiation among the Makiritare indians. IV. A comparison of a genetic network with ethnohistory and migration matrices; a new index of genetic isolation. Am. J. Hum. Genet. 22, 538–561 (1970)

  21. in Current Developments in Anthropological Genetics (eds Mielke, J. H. & Crawford, M. H.) 135–208 (Plenum, New York, 1980)

  22. The coalescent and the genealogical process in geographically structured populations. J. Math. Biol. 29, 59–75 (1990)

  23. Genealogy and subpopulation differentiation under various models of population structure. J. Math. Biol. 37, 535–585 (1998)

  24. The study of structured populations—new hope for a difficult and divided science. Nature Rev. Genet. 4, 535–543 (2003)

Acknowledgements

The research of D.L.T.R. was supported by the National Institutes of Health.

Author information

Affiliations

  1. Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

    • Douglas L. T. Rohde
  2. 7609 Sebago Road, Bethesda, Maryland 20817, USA

    • Steve Olson
  3. Department of Statistics, Yale University, New Haven, Connecticut 06520, USA

    • Joseph T. Chang

Competing interests

The authors declare that they have no competing financial interests.

Corresponding author

Correspondence to Douglas L. T. Rohde.

Supplementary information

PDF files

  1. Supplementary Methods A

    This file contains additional Methods (further explanation and derivations of mathematical results) and an extra reference.

  2. Supplementary Methods B

    This file contains additional Methods (further details of the computational model), Supplementary Figure 1, Supplementary Table 1 and extra references.

About this article

DOI

https://doi.org/10.1038/nature02842
