Parental relatedness through time revealed by runs of homozygosity in ancient DNA

Ringbauer, Harald; Novembre, John; Steinrücken, Matthias

doi:10.1038/s41467-021-25289-w

Article
Open access
Published: 14 September 2021

Harald Ringbauer^1,2,
John Novembre ORCID: orcid.org/0000-0001-5345-0214^2,3^na1 &
Matthias Steinrücken^2,3^na1

Nature Communications volume 12, Article number: 5425 (2021)

Abstract

Parental relatedness of present-day humans varies substantially across the globe, but little is known about the past. Here we analyze ancient DNA, leveraging that parental relatedness leaves genomic traces in the form of runs of homozygosity. We present an approach to identify such runs in low-coverage ancient DNA data aided by haplotype information from a modern phased reference panel. Simulations and experiments show that this method robustly detects runs of homozygosity longer than 4 centimorgan for ancient individuals with at least 0.3× coverage. Analyzing genomic data from 1,785 ancient humans who lived in the last 45,000 years, we detect low rates of first cousin or closer unions across most ancient populations. Moreover, we find a marked decay in background parental relatedness co-occurring with or shortly after the advent of sedentary agriculture. We observe this signal, likely linked to increasing local population sizes, across several geographic transects worldwide.

Introduction

An individual’s parents can be related to each other to varying degrees. For present-day humans, much intriguing geographic variation in parental relatedness has been observed. On one end of the spectrum, globally more than 700 million living humans are the offspring of second cousins or closer relatives. In some regions, the rate of such unions reaches 20–60%¹. Parents can also be more distantly related to each other, often via many deeper connections in their pedigree, as a common consequence of small population sizes^2,3,4, or as a consequence of founder effects in tight-knit groups^5,6. At the other end of the spectrum, in large populations where cousin marriages are not common, many parents have no recent connections in their pedigree at all². Going back in time, sporadic matings of close kin are documented in royal families of Europe, ancient Egypt, Inca, and pre-contact Hawaii^7,8, but little is known about broader patterns of past parental relatedness, because archeological evidence alone is typically not informative about mating preferences, especially for prehistoric societies.

The genetic sequence of an individual contains information about the relatedness of their two parents, since co-inheritance of identical haplotypes results in stretches of DNA that lack genetic variation (Fig. 1A). Such stretches are often termed runs of homozygosity (ROH)⁵, though they are also known by other terms, such as segments with homozygosity by descent (HBD; Supplementary Note 4). The more recent the genealogical relationship of the two parents, the more frequent and longer the resulting ROH tends to be². ROH can be identified in genome-wide data^9,10, and this signal has been analyzed for a wide range of purposes in medical, conservation, and population genetics².

**Fig. 1: Detecting runs of homozygosity using a reference panel.**

Recently, ROH has been identified in ancient DNA (aDNA)^11,12,13,14,15,16,17,18, that is, genetic material extracted from ancient human remains. This advance is especially promising, as large datasets of aDNA have been generated in the last decade¹⁹. However, major challenges persist: coverage for aDNA is often around or less than 1× per site (see Fig. S19), and contamination and DNA degradation introduce genotyping errors²⁰. As a consequence, ROH detection is currently only possible for ancient individuals of exceptionally high coverage. Recent methodological advances enable identifying ROH in data with at least 5× coverage²¹, but this threshold precludes analysis of all but a small fraction of the currently available aDNA record.

Here, we present an approach to detect ROH that can identify ROH longer than 4 centimorgan (cM) in individuals with coverage as low as 0.3×. It is designed to perform well for a common type of human aDNA data: Pseudo-haploid genotypes (Fig. 1B), which consist of a single allele call for each diploid site. Such data do not convey homozygote versus heterozygote genotype states directly; however, as we show, one can extract ROH from such data by leveraging haplotype information from a phased reference panel.
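To make the notion of pseudo-haploid data concrete, the following minimal Python sketch draws one read at random per site and records only its allele. The function name, the read counts, and the 0/1 allele coding are illustrative assumptions; this is not code from hapROH or from any genotyping pipeline used in the study.

```python
import random

def pseudo_haploid_call(ref_reads, alt_reads, rng):
    """Return 0 (reference allele), 1 (alternative allele), or None if uncovered.

    Picking one read uniformly at random is the same as calling the
    alternative allele with probability alt_reads / depth.
    """
    depth = ref_reads + alt_reads
    if depth == 0:
        return None  # site not covered by any read
    return 1 if rng.random() < alt_reads / depth else 0

# Hypothetical (ref, alt) read counts at four sites of a low-coverage sample.
read_counts = [(2, 0), (0, 0), (1, 1), (0, 3)]
rng = random.Random(42)
calls = [pseudo_haploid_call(ref, alt, rng) for ref, alt in read_counts]
print(calls)
```

The third site carries reads for both alleles, yet the output holds a single allele for it, which is why heterozygosity cannot be read off such data directly.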

Using this method, we analyze 1785 ancient individuals from the last 45,000 years, a substantial fraction of the published global human aDNA record. We focused on quantifying two domains of parental relatedness: (1) close-kin unions, measured by the sum of all ROH > 20 cM, denoted as sROH_>20; and (2) background relatedness, measured by the sum of all ROH of 4–8 cM, denoted as sROH_[4,8]. First, we find that matings among first-cousins or closer relatives, though widely practiced today in numerous societies, are generally infrequently observed in aDNA data. Second, we observe decreasing levels of short ROH across many regions coinciding with or shortly after the local Neolithic transition from foraging to agricultural subsistence strategies. This genomic evidence of reduced background relatedness supports and refines long-held evidence of the Neolithic transition involving a major demographic shift towards increased local population sizes.

Results

Detecting ROH using a haplotype reference panel

Our approach to detect ROH employs a phased reference panel to leverage haplotype data (described in Supplementary Note 1). Briefly, our method utilizes the fact that sequencing reads in a region of ROH are effectively sampled from a single haplotype only, because the maternal and paternal haplotypes of the diploid individual are identical. In contrast, outside an ROH, two distinct haplotypes are carried by the target individual, and thus the sequencing reads originate from both. As a result, modeling sequencing reads as a mosaic of long stretches copied from single reference haplotypes works substantially better within ROH regions (see Fig. 1). To utilize this signal, we developed a Hidden Markov Model (HMM) with hidden copying states, one for each reference haplotype, to model copying long stretches from the panel [similar to the copying model of ref. ²²], and an additional single non-ROH state as in ref. ¹⁰. We implemented this algorithm in the software package hapROH, available at https://pypi.org/project/hapROH. The default parameters of the current implementation are tuned for pseudo-haploid genotype calls from a widely used capture technology that targets ca. 1.24 million SNPs [hereafter the “1240K” SNP panel²³] when using a reference panel of 5008 haplotypes of present-day human genetic variation [1000 Genomes²⁴].
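The state layout described above (one copying state per reference haplotype plus a single non-ROH state) can be sketched as a toy Viterbi decoder. This is a simplified illustration and not the hapROH implementation: the constant per-site transition probabilities, the error rate, the allele-frequency emission used for the non-ROH state, and all function and variable names are assumptions made for the example, and the data are simulated.

```python
import math
import random

def viterbi_roh(obs, panel, err=0.01, roh_in=1e-3, roh_out=1e-2, jump=1e-3):
    """Toy ROH decoder. State 0 = non-ROH; state k >= 1 = ROH copying panel[k-1].

    obs: 0/1 pseudo-haploid calls per site; panel: list of 0/1 haplotypes.
    Returns one boolean per site, True where the best path is in a copying state.
    """
    K, n_states = len(panel), len(panel) + 1
    # Constant per-site transition probabilities (assumed; no genetic map here).
    log_t = [[0.0] * n_states for _ in range(n_states)]
    log_t[0][0] = math.log(1 - roh_in)
    for k in range(1, n_states):
        log_t[0][k] = math.log(roh_in / K)   # enter ROH on haplotype k
        log_t[k][0] = math.log(roh_out)      # leave ROH
        for j in range(1, n_states):
            stay = 1 - roh_out - jump
            log_t[k][j] = math.log(stay if j == k else jump / (K - 1))

    def log_emit(site):
        # Non-ROH: allele drawn at the panel frequency, clipped away from 0 and 1.
        freq = min(max(sum(h[site] for h in panel) / K, 0.05), 0.95)
        out = [math.log(freq if obs[site] == 1 else 1 - freq)]
        # Copying states: the call matches the copied haplotype except for errors.
        out += [math.log(1 - err if h[site] == obs[site] else err) for h in panel]
        return out

    # Initialise as if the site before the first one was non-ROH.
    score = [a + b for a, b in zip(log_t[0], log_emit(0))]
    back = []
    for site in range(1, len(obs)):
        emit, new_score, pointers = log_emit(site), [], []
        for j in range(n_states):
            best = max(range(n_states), key=lambda i: score[i] + log_t[i][j])
            pointers.append(best)
            new_score.append(score[best] + log_t[best][j] + emit[j])
        score = new_score
        back.append(pointers)

    # Trace back the most likely state path.
    state = max(range(n_states), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return [s != 0 for s in reversed(path)]

# Simulated data (hypothetical): 8 random haplotypes over 400 sites; the target
# copies haplotype 3 on sites 100-299 and carries random alleles elsewhere.
rng = random.Random(1)
panel = [[rng.randint(0, 1) for _ in range(400)] for _ in range(8)]
target = [panel[2][s] if 100 <= s < 300 else rng.randint(0, 1) for s in range(400)]
calls = viterbi_roh(target, panel)
print("sites called ROH inside the copied block:", sum(calls[100:300]))
print("sites called ROH outside the block:", sum(calls[:100]) + sum(calls[300:]))
```

In this toy setting a long stretch that matches a single reference haplotype is cheap to explain with a copying state, whereas random alleles would require frequent, heavily penalised jumps between haplotypes, so the non-ROH state wins there.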

Validating the ROH inference

We tested the method in four scenarios: (1) Spiking ROH segments of various lengths (4–10 cM) into data (Supplementary Note 2.1), (2) down-sampling high-coverage ancient individuals (Supplementary Note 2.3), (3) down-sampling present-day individuals (Supplementary Note 2.4), and (4) testing different divergence times between the reference panel and the target individual (Supplementary Note 2.2).

For the spike-in experiments, we observe that the power to detect at least 80% of an inserted ROH block was above 85% in all simulated scenarios (Fig. 1D, Supplementary Note 2.1). Bias in the estimated length of the longest overlapping inferred ROH was consistently below 0.5 cM, and we observed no false positives for ROH > 4 cM. In the down-sampling experiments, we tested the ability to recover the sum of ROH segments falling into four length ranges (4–8, 8–12, 12–20, and >20 cM). When down-sampling the oldest modern human genome sequenced to high depth [Ust-Ishim, 45,000 years old²⁵], the method produces little bias in estimating the respective sROH statistics from pseudo-haploid data with as few as 400,000 of the 1240K sites covered (see Fig. 1E, Supplementary Note 2.3). In the experiments where we down-sampled 599 present-day individuals from a global sample (Supplementary Note 2.4), the ROH inference from pseudo-haploid data performs generally well (sROH_[4,8]: r = 0.925 between diploid ROH calls and pseudo-haploid data, sROH_>20: r = 0.988, Fig. S10), except for sub-Saharan forager populations. When assessing the impact of different divergence times between the test population and the haplotype reference panel in individuals with a simulated coverage of 1×, we find that using a European-only reference panel, the method can detect ROH in low coverage test individuals from East Asia and South America but showed much less power for test individuals from West Africa (see Fig. S6).

Together, our tests show that the method can infer ROH segments longer than 4 cM for individuals with more than 400,000 of the 1240K sites covered at least once while tolerating sequencing error rates up to 3%. In addition, our experiments demonstrate that the method can analyze target individuals from populations that diverged from the reference panel up to several tens of thousands of years ago. Therefore, all present-day and ancient humans that share the out-of-Africa bottleneck [20,000–40,000 BP²⁶] fall into the range of applicability of our method when using the 1240K marker set and the full 1000 Genomes dataset as haplotype reference panel.

Application to aDNA data

We then applied our method to a large publicly available dataset of aDNA data (Allen Ancient DNA Resource v42.4, released on March 1, 2020) using the 1000 Genomes dataset as haplotype reference panel (see Section “Methods”). Only 134 of the 3723 individuals in this dataset have average coverage > 5× (Fig. S19), a typical minimum coverage requirement for previous ROH methods²¹. Using the method described here with its threshold of 400,000 of the 1240K sites covered at least once allowed us to analyze a much larger fraction of this dataset (1833 of the 3723 individuals). We also integrated a dataset of modern individuals genotyped at the Human Origins SNPs [HO²⁷], which are a subset of the 1240K SNPs. After quality control and filtering (see Section “Methods”), we arrived at a dataset of 1785 ancient and 1941 present-day unique individuals. Within this dataset, we inferred ROH longer than 4 cM using all available 1240K pseudo-haploid data in all ancient individuals and using diploid data for the HO SNPs in all modern individuals. After confirming that ROH calls on pseudo-haploid and diploid data in modern individuals correlate closely (Pearson correlation coefficient r = 0.925–0.988, Fig. S8), we analyzed the inferred ROH in ancient and modern individuals jointly.

Low abundance of long ROH in ancient humans

We first identified individuals with sROH_>20 greater than 50 cM. We chose this threshold based on calculations (Supplementary Note 4) and simulations (Supplementary Note 5), which show that in large populations, 88% and 20% of the offspring of first and second cousins, respectively, have sROH_>20 > 50 cM, but less than 1% of offspring of third or more distant cousins do. The 50 cM threshold for sROH_>20 can also be surpassed in very small isolated populations: specifically, by 34% of individuals in populations of size 250 and 8% for size 500 (Fig. S15). Hereafter we refer to this as the “long ROH” threshold, and to individuals crossing it as having “long ROH”.
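As an illustration of how the summary statistics are formed, the sketch below sums ROH lengths within the four length ranges used in the validation (4–8, 8–12, 12–20, and >20 cM) and applies the 50 cM long ROH threshold. The function names and the half-open bin boundaries are assumptions of this example, and the segment lengths are made up.

```python
# Length ranges in cM; treating each bin as [lower, upper) is an assumption.
BINS_CM = {"4-8": (4, 8), "8-12": (8, 12), "12-20": (12, 20), ">20": (20, float("inf"))}
LONG_ROH_THRESHOLD_CM = 50

def sum_roh(lengths_cm):
    """Sum of ROH lengths (in cM) falling into each length range."""
    totals = {name: 0.0 for name in BINS_CM}
    for length in lengths_cm:
        for name, (lower, upper) in BINS_CM.items():
            if lower <= length < upper:
                totals[name] += length
    return totals

def has_long_roh(lengths_cm):
    """True if the summed length of all ROH in the >20 cM bin exceeds 50 cM."""
    return sum_roh(lengths_cm)[">20"] > LONG_ROH_THRESHOLD_CM

# Hypothetical ROH segment lengths (cM) for one individual.
segments = [5.0, 6.5, 9.0, 25.0, 30.0]
print(sum_roh(segments))
print(has_long_roh(segments))
```

Segments shorter than 4 cM are ignored here, mirroring the minimum length at which the method is reported to work reliably.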

Overall, we find that only 54 out of the 1785 ancient individuals (3.0%, CI: 2.3–3.9%) have sROH_>20 above 50 cM. Generally, these individuals with long ROH do not concentrate in any particular region or time period (Figs. 2B and 3). The only archeological cluster (defined in annotations from the source dataset, modified for readability) with more than two individuals is “Iron Age Republican Rome”, where 3 of 11 samples (reported in ref. ²⁸) fall above the long ROH threshold. In the Pontic-Caspian Steppe region, 3 of 54 individuals who lived between 2600 and 1500 BP (5.6%, CI: 1.2–15.4%) exceed the threshold (Fig. 2F), but this signal is not significantly different from the rate in the full dataset. Three individuals with long ROH appear in the late pre-contact Andes region (Fig. 2D), and a follow-up study describes this signal with a larger sample size²⁹. Notably, 11 of the 54 individuals with long ROH are located on islands. Ordered by time and using the cluster annotations from the publicly available dataset (modified for readability), these are: “Sardinia Early Copper Age” (1 of 1, Fig. S20), “Sweden Megalithic” (1 of 5, all on Gotland), “England Neolithic” (1 of 16), “Chilean Western Archipelago” (1 of 3), “England C-EBA” (2 of 14, Fig. 2), “Russia Bolshoy” (2 of 6), “Vanuatu 1100 BP” (1 of 3), “Argentina Tierra del Fuego” (1 of 1), and “Indian Great Andaman” (1 of 1).
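Confidence intervals for proportions such as 54 of 1785 or 3 of 54 can be computed in several ways, and the text here does not state which one was used. As one possibility, the sketch below computes an exact (Clopper-Pearson) binomial interval using only the Python standard library; the choice of interval and the function names are assumptions of this example.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_terms = (
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k + 1)
    )
    return min(1.0, sum(math.exp(t) for t in log_terms))

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion k / n."""
    def root(g):
        # Bisection for a function g that increases from negative to positive.
        lo, hi = 1e-12, 1 - 1e-12
        for _ in range(100):
            mid = (lo + hi) / 2
            if g(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha / 2; upper bound: P(X <= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else root(lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    upper = 1.0 if k == n else root(lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper

for k, n in [(54, 1785), (3, 54)]:
    low, high = clopper_pearson(k, n)
    print(f"{k}/{n}: {100 * k / n:.1f}% (CI: {100 * low:.1f}-{100 * high:.1f}%)")
```

The small sample of 54 Steppe individuals yields a much wider interval than the full dataset, which is why its higher point estimate is not a significant departure from the overall rate.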

**Fig. 2: Time transects of major regions.**

**Fig. 3: Individual ROH in a subset of ancient and present-day populations.**

The highest value of sROH_>20 across the whole dataset (including present-day individuals) is found in a 6000-year-old Levantine Copper Age individual [I1178³⁰] with 545 cM sROH_>20. The other 8 individuals tested from the same burial site [Peqi’in Cave, Israel Chalcolithic 6000 BP³⁰] had sROH_>20 values of 0, and very little ROH overall (sROH_>4 < 30 cM). The sum and length distribution of ROH suggest the parents of individual I1178 were first-degree relatives (Fig. 4), i.e., parent-offspring or full siblings whose offspring will have a quarter of their genome in ROH. We note that the burial context of this male individual was not reported to be exceptional.
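The expected fraction of the genome in ROH for a given parental relationship is the inbreeding coefficient, which can be obtained from Wright's path-counting formula. The sketch below applies that textbook formula under the assumption that the shared ancestors are themselves not inbred; the function name and the pedigree encoding are illustrative and not taken from the paper's supplementary calculations.

```python
from fractions import Fraction

def inbreeding_coefficient(paths):
    """Wright's path formula, assuming non-inbred common ancestors.

    paths: one (n1, n2) pair per shared ancestor, giving the number of
    generations from each parent up to that ancestor.
    """
    return sum(Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths)

PEDIGREES = {
    "parent-offspring": [(0, 1)],        # one parent is the ancestor of the other
    "full siblings": [(1, 1), (1, 1)],   # two shared parents
    "first cousins": [(2, 2), (2, 2)],   # two shared grandparents
    "second cousins": [(3, 3), (3, 3)],  # two shared great-grandparents
}

for relation, paths in PEDIGREES.items():
    print(relation, inbreeding_coefficient(paths))
```

Both first-degree relationships give one quarter, matching the expectation stated above, while each further degree of cousinship reduces the expected fraction fourfold.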

**Fig. 4: ROH in a 6000-year-old individual.**

The rate of long ROH is substantially higher in the present-day Human Origins dataset; we inferred that 176 of 1941 modern individuals (9.1%, CI: 7.8–10.4%) have long ROH. In contrast to ancient data, several geographic clusters of long ROH are found, mainly in present-day Near East, North Africa, Central/South Asia, and South America (Supplementary Data 1). This signal was described previously [reviewed in ref. ²] and mirrors the estimated prevalence of cousin marriages¹.

In two regions where long ROH are common in the present-day data (Fig. 3), our ancient data contains several ancient individuals, which allowed us to analyze time transects. In the Levant, all five present-day annotated groups in our study (Druze, Palestinian, Syrian, Lebanese, Jordanian) have a high fraction of individuals above the long ROH threshold (30 out of 102 in total, see Fig. 3C). In the ancient sample of this region, only 2 out of 28 analyzed Levant individuals from the Copper Age (n = 9), Bronze Age (n = 8), Roman times (n = 3) to the Middle Ages (n = 8) fall above this threshold: the first is the Israel Chalcolithic individual with the highest sROH_>20 in our dataset (see above) (Fig. 4); the second is a male individual (SI-38) excavated from a mass burial in South Lebanon connected to a Medieval Crusader battle, who was found to have local ancestry³¹. The second region for which we could analyze a time transect is the region of present-day Pakistan. In five out of six modern annotated groups in the dataset (Pathan, Brahui, Makrani, Balochi, and Sindhi), many individuals have long ROH (33 out of 98 individuals with sROH_>20 above 50 cM). In the sixth group, from the Kalash, an isolated valley population, only 1 individual out of 18 exceeds this threshold, despite elevated levels of background ROH being observed (Fig. 3B). In contrast, in the ancient individuals³² [from present-day Northwestern Pakistan], we infer that only 1 individual out of 75 from the Iron Age (3200–2700 BP) has sROH_>20 above the threshold and that none of the 20 individuals from the Historical Period (2600–1900 BP) and none of the 4 individuals from the Middle Period (900–400 BP) surpass the long ROH threshold (Fig. 3B).

Human background relatedness decreased over time

Shorter ROH segments measured by sROH_[4,8] accumulate from parental lineages coalescing on average 10–30 generations ago (Fig. S14); thus, their abundance reflects the size of the ancestral mating pool (background relatedness) over approximately the previous half millennium [assuming 30 years per human generation³³]. Because ancestry often spreads out geographically back in time, the probability of recent coalescence and ROH decreases not only with increasing local population size but also with increasing parent-offspring dispersal^34,35. Assuming that individual mobility is comparable between groups, sROH_[4,8] is a proxy for local population size³⁶. We plotted the values of sROH_[4,8] in time transects for 24 major geographic regions that cover 1763 of the 1785 ancient individuals (16 regions shown in Fig. 2, 8 regions in Fig. S20, 29 additional individuals from islands are shown in Fig. 3D, and the remaining 22 individuals are reported in Supplementary Data 1). In addition, we tested whether sROH_[4,8] differs between subsistence strategies (annotated for most ancient individuals, see Section “Methods”) in certain regions (PERMANOVA used for Table 1 and p-values in the text below).
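For readers unfamiliar with PERMANOVA, the following sketch shows a one-way version for a single variable: a pseudo-F statistic is computed from pairwise squared distances and compared against label permutations. The distance measure, the number of permutations, the function names, and the sROH_[4,8] values are all assumptions of this example; the settings actually used for Table 1 are described in the Methods and are not reproduced here.

```python
import random

def pseudo_f(values, labels):
    """One-way PERMANOVA pseudo-F based on squared Euclidean distances."""
    n = len(values)
    groups = sorted(set(labels))

    def ss(idx):
        # Sum of squared pairwise distances within idx, divided by its size.
        return sum((values[i] - values[j]) ** 2
                   for a, i in enumerate(idx) for j in idx[a + 1:]) / len(idx)

    ss_total = ss(list(range(n)))
    ss_within = sum(ss([i for i in range(n) if labels[i] == g]) for g in groups)
    ss_between = ss_total - ss_within
    return (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))

def permanova(values, labels, n_perm=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = random.Random(seed)
    observed = pseudo_f(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(values, shuffled) >= observed:
            hits += 1
    # Count the observed labelling itself so the p-value is never zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical sROH_[4,8] values (cM) for two subsistence groups.
values = [50.0, 60.0, 55.0, 70.0, 45.0, 66.0, 0.0, 4.0, 9.0, 0.0, 5.0, 2.0]
labels = ["forager"] * 6 + ["farmer"] * 6
f_obs, p_value = permanova(values, labels)
print(f"pseudo-F = {f_obs:.1f}, p = {p_value:.3f}")
```

Because the p-value is obtained by permutation, its smallest attainable value is one over the number of permutations plus one, so the resolution of reported p-values depends on how many permutations are run.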

Table 1 Comparison of statistics for sROH_[4,8] for pairs of groups.

We find that sROH_[4,8] is highest among the most ancient individuals in the dataset and then generally decreases going forward in time. Each of 43 ancient individuals in the global sample dated to before 10,000 BP was inferred to have sROH_[4,8] > 0, with a median value of 54.5 (39 individuals shown in Fig. 2). We then observe a substantial decline in sROH_[4,8] coinciding with the Neolithic transition to sedentary, agricultural lifestyles (Fig. 2). In Western Eurasia, we contrasted individuals from forager cultures to those from early farming cultures (i.e., farming cultures within the first 2000 years after the first annotated “Agriculture” individual per region). We found that sROH_[4,8] decreases substantially in all 8 regional transects which contain both annotated foragers and farmers (p-value < 0.05 in 7 of these 8 transects), and median sROH_[4,8] values drop from 13–66 cM per foraging group to 0–9 cM per early farming group (Table 1). In the Andes, where agriculture gradually increased in intensity starting around 5000 BP in a heterogeneous process lasting thousands of years³⁷, the median sROH_[4,8] decreases from 55.4 for foragers to 17.9 for agriculturalists (p = 1.2 × 10⁻², Table 1).

Detailed inspection of the transitions from foraging to farming reveals interesting finer-scale dynamics. First, for the earliest western Eurasian farmers not using ceramics yet, who lived ~10,000 years ago and predate the Neolithic expansions into Europe, we still observe elevated rates of short ROH, with a median sROH_[4,8] of 36.7, 16.4, and 15.2 in Aegean, Levant, and Central Asian aceramic farmers, respectively, which is comparable to values observed for western Eurasian foragers (sROH_[4,8] ranging from 13 to 66 cM, Table 1). In all three regions there is a subsequent marked drop to ceramic early farmers, with median sROH_[4,8] decreasing substantially to 0, 0, and 4.8, respectively (p = 6.0 × 10⁻⁵, 3.8 × 10⁻², and 5.4 × 10⁻², Table 1).

Furthermore, one ceramic early farming group in our sample stands out: Individuals annotated in the original dataset as Iberian Early Neolithic [7400–7000 BP³⁸] have median sROH_[4,8] of 32.8 cM, which is substantially higher than in other early Eurasian farmers (median sROH_[4,8]: 0–8.7 cM, Table 1). However, in Iberian Middle Neolithic farmers (6800–4600 BP) ROH decreases (median sROH_[4,8] = 0, p = 1.0 × 10⁻⁵, Table 1) and becomes typical of other early European farmers. As the early Iberian individuals have exceptionally high early farmer ancestry [>90%³⁸], this signal cannot be explained by forager (hunter-gatherer) ancestry. However, archeological evidence of a rapid maritime spread (Cardial Ware expansion) within a few hundred years around 7500 BP³⁹ provides one plausible explanation of this increased abundance of short ROH in the Early Neolithic, as a rapid spread could have caused an initial bottleneck. Moreover, an initially small population of early farmers would explain why forager admixture substantially increased in Middle Neolithic Iberians and remained one of the highest of European Neolithic populations [~25%³⁸].

In the ancient Americas, elevated sROH_[4,8] values evidence sustained high levels of background relatedness. This signal is found across all American regions: Western North America (West NA) (Fig. 2D1, median sROH_[4,8] = 67.1 cM excluding long ROH individuals, n = 7), Eastern South America (East SA) (Fig. 2D2, 59.0 cM, n = 13), Andean (Fig. 2D3, 31.5 cM, n = 20), Southern SA (Fig. 2D4, 112.5 cM, n = 8) and Beringia (Fig. S20, 52.4 cM, n = 9). This abundance of ROH (overall median sROH_[4,8] = 56.3) is higher than in the rest of the global sample in the same broad time period (<13k years ago, median 4.2 cM, p < 10⁻⁵, Table 1). Since sROH_[4,8] is driven by co-ancestry within the last few dozen generations (Supplementary Note 4), this elevated sROH_[4,8] cannot be explained by bottlenecks during early migrations into the Americas; instead, one needs to invoke more recent, sustained small effective population sizes. Overall we observe little temporal variation (Fig. 2), with one exception in the dataset being Andean populations around the time of the shift to agriculture (see above; also note the dataset does not include individuals from other early centers of agriculture in the Americas, e.g., Central Mexico, eastern North America).

Another observation of elevated ROH on a large geographical scale is found in the Eurasian Steppe, where early pastoralist groups all have substantial amounts of sROH_[4,8] (Steppe-PA 5.2–3k BP, median sROH_[4,8] = 10.9, Table 1), including the Yamnaya (median 17.5 cM, n = 17), Afanasievo (18.1 cM, n = 22), Sintashta (5.7 cM, n = 21), Okunevo (24.5 cM, n = 12) and Srubnaya (4.8 cM, n = 19). These sROH_[4,8] levels are significantly higher than in Western Eurasian farmer populations before 5000 BP (median 4.2 cM, p = 1.0 × 10⁻⁵, Table 1), and, notably, also significantly higher than in their southern contemporaneous neighbors, sedentary farmers from Central Asia (median 0, p = 1.0 × 10⁻⁵, Table 1). In samples from the Western Pontic-Caspian Steppe (present-day Ukraine and Moldavia), at the transition from foragers to pastoralists, we observe a substantial decrease of sROH_[4,8] from median 14.2 to 0 (p = 6.9 × 10⁻³, Table 1). Similarly, in the Eastern Steppe (around Lake Baikal and present-day Mongolia), a shift from foraging to pastoralism coincides with a significant reduction in sROH_[4,8] (median from 32.5 to 4.7, p = 1.0 × 10⁻⁵). We note that in both the Western and Eastern Steppe many of the pastoralists in our sample date to 3000–2000 BP (Scythian and Xiongnu, respectively), substantially later than the early pastoralists mentioned above.

Discussion

We developed a method for measuring ROH in low coverage ancient DNA. Our algorithm follows a long line of previous work utilizing HMMs to infer such segments^10,40,41,42. A key methodological advantage here is to use hidden states that, within an ROH segment, copy from a reference panel of haplotypes to take advantage of haplotype information. This tool enabled us to screen aDNA data from 1785 individuals for ROH, an order of magnitude more ancient individuals than hitherto amenable for such analysis. We generated evidence for two key aspects of the human past: Identifying long ROH (>20 cM) provided insight into the past prevalence of close kin unions such as cousin matings, whereas short ROH (4–8 cM) revealed changing patterns of past background relatedness that reflect local population sizes.

We found that only 1 out of 1785 ancient individuals has long ROH typical for the offspring of first-degree relatives (e.g., brother–sister or parent–offspring). Historically, matings of first-degree relatives are only documented in royal families of ancient Egypt, Inca, and pre-contact Hawaii, where they were sporadic occurrences⁷. The only other example of an offspring of first-degree relatives found using aDNA to date is the recently reported case from an elite grave in Neolithic Ireland¹⁸. Our findings are in agreement that first-degree unions were generally rare in the human past.

Further, we find that only 54 out of 1785 ancient individuals (3.0%, CI: 2.3–3.9%) have long ROH typical for the offspring of first cousins (88%) and less commonly observed for second cousins (20%). Such long ROH can also arise as a consequence of small mating pools (e.g., 8% in randomly mating populations of size 500, which may explain the long ROH we observed in certain island populations). Therefore, the rate of long ROH is an upper bound for the rate of first-cousin unions. On the other hand, because of incomplete power, some long ROH may be missed in our empirical analysis; however, even if the method failed to detect half of all ROH > 20 cM, well below the power that we observed in our simulations, we would still detect 60% of first cousins (see Table S5). We conclude that in our ancient sample substantially less than 10% of all parental unions occurred on the level of first cousins.
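The figures quoted above can be combined in a back-of-envelope calculation to see why a rate well below 10% is plausible. The sketch below is an illustration and not a calculation taken from the paper: it assumes, conservatively, that every individual with long ROH is the offspring of first cousins and divides the upper end of the reported interval by the probability of flagging such an offspring. The function and variable names are hypothetical.

```python
def first_cousin_rate_upper_bound(long_roh_rate, detection_prob):
    """Upper bound on the rate of first-cousin unions, assuming every long-ROH
    individual is a first-cousin offspring and each such offspring is flagged
    with probability detection_prob."""
    return long_roh_rate / detection_prob

upper_ci = 0.039  # upper end of the reported interval for long ROH (3.9%)
bound_nominal = first_cousin_rate_upper_bound(upper_ci, 0.88)      # 88% exceed 50 cM
bound_pessimistic = first_cousin_rate_upper_bound(upper_ci, 0.60)  # halved power, Table S5
print(f"nominal power: {bound_nominal:.1%}; pessimistic power: {bound_pessimistic:.1%}")
```

Even under the pessimistic detection scenario the bound stays below 10%, and it would fall further if some long ROH stem from small mating pools rather than first-cousin unions.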

In two specific regions with high levels of long ROH in the present-day², the dataset contained a sufficient number of ancient individuals to allow analyzing time transects. For both transects (the Levant and present-day Northwest Pakistan), we observe a substantial shift in the levels of long ROH. In contrast to the high abundance of long ROH typical of close kin unions in the present-day individuals, long ROH was uncommon in the ancient individuals, including up to the Middle Ages. Additional data from these regions and others with high levels of long ROH today, such as North Africa as well as Central, South, and West Asia², will help resolve with more precision the origin and spread of these well-studied kinship-based mating systems^43,44. Overall, our results show how an ROH-based method can be used to inform understanding of shifts in cultural marriage/mating practices.

As a second major finding, we observed that human background relatedness as measured by short ROH (4–8 cM) decreased markedly over time in many geographic transects, with a significant drop occurring during or shortly after the local “Neolithic Transition”, the transition from a lifestyle of hunting and gathering to one of agriculture and settlement^45,46,47. Assuming that early farmers had no increased individual mobility compared to foragers, which would agree with observations in present-day forager populations⁴⁸, the substantial decrease of short ROH evidences markedly increasing local population sizes. This finding adds support to the long-held hypothesis of local population sizes increasing following the Neolithic transition^45,46,47. Previous analysis of ancient genomes of foragers and early farmers already identified several lines of genomic evidence for farmers having larger population sizes than earlier hunter–gatherers, such as decreasing genome-wide diversity^49,50, decreasing prevalence of ROH^11,12,13,14,18 and decreasing coalescent rates estimated from high-coverage genomes²⁷. Our analysis adds a refined level of geographic and temporal resolution by analyzing an order of magnitude more individuals (1785 ancient humans) and by organizing those individuals into several densely sampled time transects in different geographic regions.

For individuals from early Eurasian Steppe pastoralist groups, we observe an intermediate level of short ROH. These early cultures (e.g., the Yamnaya) have drawn much attention in archeological and ancient DNA studies to date, as archeological, linguistic, and genetic evidence suggest they played an important role in the origin of Indo-European languages and of several population expansions^32,51,52,53,54. The elevated rate of short ROH we observed provides evidence that many matings occurred within and among small, related groups. An alternative interpretation for the abundance of short ROH could be that burial sites (Kurgans) represent a biased sample of societal classes with more short ROH than the general populace⁵¹. However, as short ROH probes parental ancestry up to several dozen generations into the past, this signal would require reproductive isolation between societal strata maintained over many generations. Therefore, it is likely that at least part of the signal is due to Steppe populations having comparably low population densities or having experienced recent bottlenecks.

Our analysis is limited by several caveats. Importantly, skeletal remains accessible by archeological means often do not constitute a random cross-section of past populations. While levels of background relatedness are expected to be similar within a mixing population, rates of close kin unions can vary substantially because of social structure; e.g., elite dynasties may practice close kin unions despite them being uncommon in the general population. Another limitation is the incomplete sampling of the current aDNA record and that for much of the world, we necessarily make inferences from small numbers and sparse sampling. Future work analyzing the rapidly growing ancient DNA record will help to resolve additional details of social and cultural factors operating at finer scales (e.g., leveraging more precise timings of shifts and more subtle shifts in ROH patterns). In particular, future studies focusing on specific localized questions will increasingly combine archeological and genetic evidence¹⁶, in ways that will empower the use of the genetic evidence about the past provided by the methodology presented here.

In addition to denser sampling, there are several ways in which our analysis can be improved upon by future work. Here we focused our analysis on long ROH (>20 cM) and short ROH (4–8 cM). While this dichotomy helped us to disentangle more clearly recent and distant parental relatedness, we expect that future work refining the downstream analysis of ROH will be able to extract more subtle signatures by looking across all ROH scales. Furthermore, we note that our application focused on a set of SNPs widely used for human ancient DNA (1240K SNPs). For whole-genome sequencing data (available for a subset of the data analyzed here), using all genome-wide variants would likely lower the requirements for coverage below the current limit of 400,000 of the 1240K SNPs covered at least once (corresponding to ca. 0.3× whole-genome sequencing coverage). Another improvement would be using a reference panel that includes ancient haplotypes. Currently, no long-range phased ancient haplotypes are available, but future work will likely produce such data.

One alternative approach to identify ROH in low coverage ancient genomes could be to use imputation followed by screening for stretches of homozygous markers using standard ROH detection methods. This was recently done for ancient individuals with >10× coverage¹⁸. Since imputation of genomes was reported to work well down to a coverage similar to the low coverage cutoff used here [ca. 0.5×^55,56] and most imputation methods are based on haplotype-copying methods related to the approach utilized here [the Li and Stephens model²²], we expect any such approach to perform similarly to ours, after appropriate testing and calibration, as conducted for our method. We chose to develop a method utilizing several key advantages of pseudo-haploid data, which is more widely available and requires fewer assumptions about genotype quality, making subsequent analysis less prone to batch effects introduced by various isolation, sequencing, and genotyping protocols.

Identifying ROH can also be a starting point for other powerful applications: ROH consists of only a single haplotype (the main signal of our method), which is therefore perfectly phased, a prerequisite for powerful methods relying on haplotype copying⁵⁷ or tree reconstruction^26,58. Moreover, long ROH could be used to estimate contamination and error rates, an important task in ancient DNA studies²⁰. ROH lacks heterozygotes, allowing one to identify heterozygous reads within ROH that must originate from contamination or genotyping error, similar to estimating contamination from the hemizygous X chromosomes in males⁵⁹. Another promising future direction is the development of a method to identify long shared sequence blocks in ancient DNA not only within individuals (ROH), but also between individuals, called identity-by-descent (IBD). Calling IBD between individuals would substantially increase power for measuring background relatedness since signals from every pair of individuals could be used. Moreover, a geographic IBD block signal is highly informative about patterns of recent migration^35,60,61,62. Extending our method to similarly use haplotype information from a phased reference panel when detecting IBD could enable such analyses in low-coverage ancient individuals.

Finally, the analysis of ROH has additional implications beyond human demography and kinship-based mating systems. In many plant and animal species, ROH is more prevalent (due to different mating systems, small population sizes, or domestication), and the study of ROH may be particularly interesting for understanding early plant and animal breeding, as actively controlled mating among domesticates would be expected to alter ROH⁶³. For aDNA from extinct or endangered species, ROH can shed light on the extinction and inbreeding processes, as is observed for example in aDNA from high-coverage Neanderthal individuals^17,64,65,66, or modern DNA from Isle Royale wolves⁶⁷. Finally, as ROH exposes rare deleterious recessive alleles⁶⁸, the temporal dynamics of ROH are relevant for understanding the evolutionary dynamics of deleterious variants and health outcomes^67,69,70,71. We hope that the core ideas of our approach will inspire the analysis of low-coverage data from a wide range of natural populations.

Methods

Calling ROH in a global dataset

To detect ROH, we developed a method, hapROH, which is based on an HMM with ROH and non-ROH states and uses a panel of reference haplotypes. The detailed method description and evaluation are provided in Supplementary Notes 1, 1.7, 2.1, and 2.4. The software is publicly available at https://pypi.org/project/hapROH/. For the global data analysis, we ran hapROH (version 0.1a4) using the default parameter settings.

Empirical dataset

The global ancient DNA dataset we analyze originates from a curated dataset of published ancient DNA (“1240K”, v42.4, released on March 1, 2020, https://reich.hms.harvard.edu). This release provides ancient DNA data in a pseudo-haploid format for 1.24 million SNPs (the 1240K SNP panel). The data include whole-genome as well as 1240K SNP capture data compiled from 92 primary publications, which were processed starting from bam- or fastq-files using largely identical pipelines across datasets, adjusting bioinformatics procedures only when required by different data generation procedures. We added 40 ancient Sardinian individuals in pseudo-haploid format from a recent publication⁷² that had not yet been compiled into the global reference dataset.

We only analyzed previously generated, publicly available genetic data. For all data, we contacted the corresponding authors of each original study regarding our project and publication plan. We included in our final analysis the data from all studies for which we received a response confirming the use is consistent with the original permits. We filtered to individuals that contained PASS in the ASSESSMENT column of the meta-data table in order to remove individuals with possible contamination. For the remaining ancient individuals that had multiple genotypes listed, we kept the record with the highest coverage. Furthermore, we removed all Neanderthal and Denisovan individuals, as well as the individual tem003, for which initial analysis showed that it has all of chromosome 2 in ROH, but no other long ROH. Finally, we kept only individuals with at least 400,000 SNPs of the 1.24 million covered, the approximate cutoff above which our method can provide robust ROH inference (Fig. S1). For present-day data, we downloaded the Human Origins dataset with diploid genotype calls for ca. 550,000 autosomal SNPs²⁷, which are a subset of the 1240k SNPs.
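The filtering steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used for the analysis: the record keys (`iid`, `assessment`, `coverage`, `n_snps`) and the example IDs are hypothetical stand-ins for the meta-data table, and only the PASS filter, the highest-coverage rule, the exclusion of specific individuals, and the 400,000-SNP cutoff are taken from the text.

```python
MIN_SNPS = 400_000  # cutoff on covered 1240K SNPs, as stated in the text


def filter_individuals(records, excluded_ids=frozenset()):
    """Apply the quality filters described in the text.

    Each record is a dict with hypothetical keys:
    'iid' (individual ID), 'assessment' (str), 'coverage' (float),
    'n_snps' (number of 1240K SNPs covered at least once).
    """
    best = {}
    for rec in records:
        if "PASS" not in rec["assessment"]:
            continue  # drop individuals with possible contamination
        if rec["iid"] in excluded_ids:
            continue  # e.g., archaic humans or individually excluded samples
        prev = best.get(rec["iid"])
        # keep only the highest-coverage record per individual
        if prev is None or rec["coverage"] > prev["coverage"]:
            best[rec["iid"]] = rec
    # finally, require enough covered SNPs for robust ROH inference
    return [r for r in best.values() if r["n_snps"] >= MIN_SNPS]


# Made-up example records
records = [
    {"iid": "I0001", "assessment": "PASS", "coverage": 0.8, "n_snps": 650_000},
    {"iid": "I0001", "assessment": "PASS", "coverage": 1.9, "n_snps": 900_000},
    {"iid": "I0002", "assessment": "QUESTIONABLE", "coverage": 2.0, "n_snps": 950_000},
    {"iid": "I0003", "assessment": "PASS", "coverage": 0.2, "n_snps": 250_000},
    {"iid": "tem003", "assessment": "PASS", "coverage": 3.0, "n_snps": 1_000_000},
]
kept = filter_individuals(records, excluded_ids={"tem003"})
```

In this toy input, only the higher-coverage record of `I0001` survives all filters.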

We applied hapROH to the pseudo-haploid data for the 1785 ancient individuals and the diploid data for the 1941 modern individuals that pass our quality thresholds. We used all SNPs with available data for which the reference and the alternative allele matched the information in the reference panel, set the respective emission probabilities to values designed for these two types of data (Supplementary Note 1.3), and used the default parameters of hapROH that were optimized for the 1240K SNPs (Supplementary Note 1.8). For the haplotype reference panel, we used the full global set of 5008 phased haplotypes of the 1000 Genomes Project dataset (Phase 3, release 20130502) accessible via http://ftp.1000genomes.ebi.ac.uk²⁴, filtered to biallelic markers and downsampled to SNPs in the 1240K SNP panel with bcftools (version 1.9). This standard human reference dataset is computationally (and in some cases trio-) phased and we kept the phasing as provided. Throughout, we used allele frequencies calculated from the diploid genotypes of the full reference panel when calculating the emission probabilities. We report the detailed ROH calls for all individuals in Supplemental Information 1.

Annotation of subsistence strategy

For each ancient individual, we annotated the primary subsistence strategy into standard broad categories of food production⁷³, using descriptions of the archeological sites and cultural affiliations. We used three main labels: (1) “Forager” for hunter–gatherer and horticulture lifestyles based on collecting wild plants, hunting, or fishing; (2) “Agricultural” for groups that practiced substantial amounts of sedentary farming (e.g., cereals and domesticates observed in the archeological record); and (3) “Pastoralist” for groups with nomadic and semi-nomadic mobile lifestyles based on herding and breeding of domestic animals (e.g., cattle). Groups that had intermediate and transitory lifestyles were annotated using the plausible dominant food economy of the associated archeological culture. To better resolve the transition to agricultural food production, we labeled early groups that practiced agriculture but lacked ceramics in the archeological record as “Aceramic Farmers”. Individuals and groups for which the archeological record does not contain sufficient information to annotate a subsistence strategy were labeled as “Uncertain”. We stress that archeological evidence is often sparse and that assignments are frequently interpretations of various lines of evidence; assessments might therefore change with updates to the archeological record. Here, we tolerate some error, since we address questions regarding very broad temporal and geographic patterns, but we advise against using our subsistence assignments as a reference for questions on a finer scale.

Detecting offspring of close relatives from ROH

We screened all individuals for ROH longer than 20 cM to identify potential offspring of close relatives. Pairwise IBD > 20 cM, which translates to ROH in the offspring, is very unlikely to be a concatenation of multiple shorter IBD blocks⁷⁴. Moreover, recombination quickly breaks up long ancestry segments of the genome, and thus most long ROH originates from co-ancestry within only a small number of generations back. Therefore, if the fraction of the genome in ROH longer than 20 cM in an individual is large, this provides strong evidence for a close relationship of its parents. We report individuals where the sum of all such ROH exceeds 50 cM as potential offspring of closely related parents (i.e., sROH_>20 > 50 cM). This cutoff is motivated by analytical calculations and simulations; see Supplementary Note 4 and Supplementary Note 5 for details. Briefly, this threshold detects a large fraction of close-kin offspring (parents being first cousins or closer) while also being insensitive to background relatedness unless a population has a very small size (<500).
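The screening rule can be written down directly. The following is a minimal sketch, assuming ROH calls are available as a plain list of segment lengths in cM per individual; the function names and the example individuals are hypothetical, and only the 20 cM and 50 cM thresholds come from the text.

```python
LONG_ROH_CM = 20.0    # minimum length of a "long" ROH (from the text)
SUM_CUTOFF_CM = 50.0  # cutoff on the summed length of long ROH (from the text)


def s_roh_long(roh_lengths_cm, min_len_cm=LONG_ROH_CM):
    """Sum of all ROH longer than min_len_cm, i.e. sROH_>20 by default."""
    return sum(length for length in roh_lengths_cm if length > min_len_cm)


def is_potential_close_kin_offspring(roh_lengths_cm):
    """Flag an individual if sROH_>20 exceeds 50 cM."""
    return s_roh_long(roh_lengths_cm) > SUM_CUTOFF_CM


# Made-up ROH lengths (cM) for two hypothetical individuals
calls = {
    "ind_A": [4.5, 6.1, 22.0, 35.5],  # 57.5 cM in long ROH
    "ind_B": [4.2, 5.0, 7.9, 21.0],   # 21.0 cM in long ROH
}
flags = {iid: is_potential_close_kin_offspring(l) for iid, l in calls.items()}
```

Here `ind_A` is flagged and `ind_B` is not, because short ROH do not contribute to the sum.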

Gaussian process modeling of short ROH

To visualize the trend of the abundance of ROH in the individuals in certain regions over time, while still conveying the levels of uncertainty due to varying sample sizes, we fit a Gaussian Process (GP) model⁷⁵ using the Python package scikit-learn⁷⁶. As input, we used the square root of the sROH_[4,8] statistic to stabilize its variance⁷⁷, since sROH_[4,8] corresponds closely to count data, which can be approximated by a Poisson distribution. Furthermore, since we use this statistic as a proxy for background relatedness (which in turn proxies for local population size), we removed all individuals with sROH_>20 above 50 cM when fitting the GP model, to minimize the impact of putative offspring of close kin on this analysis (Fig. S13).
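The variance-stabilizing effect of the square-root transform can be seen from a first-order (delta-method) approximation. As a sketch, assume an idealized count X that is Poisson distributed with mean λ, so that Var(X) = λ; the symbols X and λ are introduced here for illustration only:

```latex
\operatorname{Var}\!\left(\sqrt{X}\right)
  \approx \left( \left.\frac{d\sqrt{x}}{dx}\right|_{x=\lambda} \right)^{2} \operatorname{Var}(X)
  = \frac{1}{4\lambda}\,\lambda
  = \frac{1}{4}
```

Under this approximation, which is reasonable when λ is not too small, the variance of the transformed statistic no longer depends on its mean, so a single residual noise term is a more plausible description of the scatter across individuals.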

For the variance model of the GP, we used a standard squared-exponential covariance kernel summed with a residual white noise kernel. In preliminary analyses, we estimated all parameters of the model via maximum likelihood, but we found that these estimates appeared to over-fit the data for several time transects. Thus, we set custom length scales for the covariance kernel for each transect (1500 for all non-American populations and 2000 for American populations, because they had larger temporal sampling gaps) and only fit the two coefficients of the squared-exponential and white noise kernel. To visualize the final output, we estimated the variance of the predicted mean across a dense set of time points⁷⁵. We estimated the uncertainty of the predicted mean and the uncertainty of each individual point and plotted both as 95% confidence interval bands (±1.96 standard deviations) on a dense grid.
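A GP fit of this kind can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the code used for the figures: the function name, the synthetic input data, the manual centering of the response, and the initial kernel values are all choices made here. The fixed length scale, the summed squared-exponential and white-noise kernels, the square-root transform, and the ±1.96 standard deviation bands follow the text.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel


def fit_roh_trend(ages, s_roh_4_8, length_scale=1500.0, n_grid=200):
    """Fit a GP to sqrt(sROH_[4,8]) versus sample age with a fixed length scale."""
    x = np.asarray(ages, dtype=float).reshape(-1, 1)
    y = np.sqrt(np.asarray(s_roh_4_8, dtype=float))
    y_mean = y.mean()  # assumption: center manually, since the GP prior mean is zero

    # Squared-exponential kernel (length scale held fixed) plus white noise;
    # only the two coefficients (signal variance and noise level) are optimized.
    kernel = (
        ConstantKernel(constant_value=1.0)
        * RBF(length_scale=length_scale, length_scale_bounds="fixed")
        + WhiteKernel(noise_level=1.0)
    )
    gp = GaussianProcessRegressor(kernel=kernel)
    gp.fit(x, y - y_mean)

    grid = np.linspace(x.min(), x.max(), n_grid).reshape(-1, 1)
    # With a WhiteKernel in the model, the predictive std includes the noise,
    # i.e. it describes the uncertainty of an individual point.
    mean, sd_point = gp.predict(grid, return_std=True)
    # Subtracting the fitted noise variance leaves the uncertainty of the mean.
    noise_var = gp.kernel_.k2.noise_level
    sd_mean = np.sqrt(np.clip(sd_point**2 - noise_var, 0.0, None))
    return grid.ravel(), mean + y_mean, sd_mean, sd_point, gp


# Synthetic data for illustration only (not from the study)
rng = np.random.default_rng(0)
ages = rng.uniform(0, 10_000, size=60)
s_roh = rng.gamma(shape=2.0, scale=5.0 + ages / 500.0)

grid, mean, sd_mean, sd_point, gp = fit_roh_trend(ages, s_roh)
band_mean = (mean - 1.96 * sd_mean, mean + 1.96 * sd_mean)      # predicted mean
band_point = (mean - 1.96 * sd_point, mean + 1.96 * sd_point)   # individual points
```

All returned quantities are on the square-root scale. Passing `length_scale=2000.0` would correspond to the setting described for the American transects.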

Analytical expectations of ROH

To aid interpretation of ROH, we visualize expectations of sROH using formulas describing ROH of closely related parents in otherwise outbred populations and finite populations without substructure. We derive and state these formulas in a unified framework (Supplementary Note 4). We note that these formulas have been derived previously^78,79. In addition, we verified these formulas by simulating ROH for these demographic scenarios and comparing expected sROH values to empirical averages (Supplementary Note 4 and Supplementary Note 5).

Comparing ROH between groups

To test for significant differences in the distributions of the sROH statistics between two groups, we applied permutational multivariate analysis of variance (PERMANOVA⁸⁰), which calculates a pseudo-F statistic and assesses its significance via permutation tests. We used the permanova function implemented in the Python package skbio, and based the distance matrices on absolute differences between individuals’ sROH_[4,8]. For each test, we ran 99,999 permutations (minimal p-value: p = 10⁻⁵) and report two-sided p-values. As with the GP modeling, we removed all individuals with sROH_>20 above 50 cM when comparing distributions of sROH_[4,8] between groups.
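To make the test concrete, the following standalone sketch computes the PERMANOVA pseudo-F statistic and a permutation p-value for a one-dimensional statistic with absolute-difference distances. It is a from-scratch illustration using only the Python standard library, not the skbio implementation used for the analysis; the function names and example values are hypothetical, and far fewer permutations are used than the 99,999 reported above.

```python
import random
from itertools import combinations


def pseudo_f(values, labels):
    """PERMANOVA pseudo-F for 1D values with |x_i - x_j| as distance."""
    n = len(values)
    groups = sorted(set(labels))
    a = len(groups)
    # total sum of squares from all pairwise squared distances
    ss_total = sum(
        (values[i] - values[j]) ** 2 for i, j in combinations(range(n), 2)
    ) / n
    # within-group sum of squares, one term per group
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        ss_within += sum(
            (values[i] - values[j]) ** 2 for i, j in combinations(idx, 2)
        ) / len(idx)
    ss_among = ss_total - ss_within
    # note: undefined if all within-group distances are zero
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova_1d(values, labels, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value) by shuffling group labels."""
    rng = random.Random(seed)
    f_obs = pseudo_f(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(values, shuffled) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (permutations + 1)


# Made-up sROH_[4,8] values (cM) for two hypothetical groups
group_a = [40.0, 55.0, 62.0, 48.0, 70.0, 51.0]
group_b = [5.0, 12.0, 0.0, 8.0, 15.0, 3.0]
values = group_a + group_b
labels = ["A"] * len(group_a) + ["B"] * len(group_b)
f_stat, p_value = permanova_1d(values, labels)
```

With P permutations, the smallest attainable p-value of this scheme is 1/(P + 1), which matches the minimal p-value of 10⁻⁵ quoted for 99,999 permutations.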

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

No new DNA data were generated for this study. The ancient dataset and modern data (Human Origins²⁷) we analyzed originate from the Allen Ancient DNA Resource (version V42.4, available via https://reich.hms.harvard.edu); the primary publications are listed in Supplementary Data 1B. The raw reference panel data that we used (phased haplotypes from the 1000 Genomes dataset) are available at http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/. The ancient and modern data we screened for ROH, as well as the processed reference panel we generated (down-sampled to biallelic SNPs at 1240K sites), are archived at https://doi.org/10.5281/zenodo.4992532. The source data underlying Fig. 2, Fig. 3, and Table 1, i.e., the ROH results on ancient DNA data, are provided in Supplementary Data 1A. Code that generates the data for each figure in the main text and Supplementary Information is listed in Supplementary Data 1C.

Code availability

The Python package implementing the method is available at the Python Package Index (https://pypi.org/project/hapROH/) and can be installed using pip. The documentation provides example use cases as blueprints for custom applications. Code developed for simulating data, analysis, and data visualization is available at the GitHub repository https://github.com/hringbauer/hapROH. The version used for this work is archived at https://doi.org/10.5281/zenodo.4992416⁸¹. For data analysis we used Python (3.7.6) and the Python packages jupyterlab (2.1.2), scipy (1.3.1), pandas (1.1.4), numpy (1.19.4), and scikit-bio (0.5.6). We visualized results using matplotlib (3.1.1) and basemap (1.2.1).

References

1. Bittles, A. H. & Black, M. Consanguinity, human evolution, and complex diseases. Proc. Natl Acad. Sci. USA 107, 1779–1786 (2010).
2. Ceballos, F. C., Joshi, P. K., Clark, D. W., Ramsay, M. & Wilson, J. F. Runs of homozygosity: windows into population history and trait architecture. Nat. Rev. Genet. 19, 220 (2018).
3. Henn, B. M. et al. Hunter-gatherer genomic diversity suggests a Southern African origin for modern humans. Proc. Natl Acad. Sci. USA 108, 5154–5162 (2011).
4. Mondal, M. et al. Genomic analysis of andamanese provides insights into ancient human migration into asia and adaptation. Nat. Genet. 48, 1066–1070 (2016).
5. Broman, K. W. & Weber, J. L. Long homozygous chromosomal segments in reference families from the centre d’Etude du polymorphisme humain. Am. J. Hum. Genet. 65, 1493–1500 (1999).
6. Goldschmidt, E., Ronen, A. & Ronen, I. Changing marriage systems in the jewish communities of israel. Ann. Hum. Genet. 24, 191–204 (1960).
7. Bixler, R. H. Sibling incest in the royal families of Egypt, Peru, and Hawaii. J. Sex Res. 18, 264–281 (1982).
8. Ceballos, F. C. & Álvarez, G. Royal dynasties as human inbreeding laboratories: the Habsburgs. Heredity 111, 114–121 (2013).
9. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
10. Narasimhan, V. et al. BCFtools/RoH: a hidden Markov model approach for detecting autozygosity from next-generation sequencing data. Bioinformatics 32, 1749–1751 (2016).
11. Gamba, C. et al. Genome flux and stasis in a five millennium transect of european prehistory. Nat. Commun. 5, 1–9 (2014).
12. Jones, E. R. et al. Upper palaeolithic genomes reveal deep roots of modern eurasians. Nat. Commun. 6, 1–8 (2015).
13. Broushaki, F. et al. Early neolithic genomes from the eastern fertile crescent. Science 353, 499–503 (2016).
14. Sikora, M. et al. Ancient genomes show social and reproductive behavior of early Upper Paleolithic foragers. Science 358, 659–662 (2017).
15. Schroeder, H. et al. Origins and genetic legacies of the Caribbean Taino. Proc. Natl Acad. Sci. USA 115, 2341–2346 (2018).
16. Racimo, F., Sikora, M., Vander Linden, M., Schroeder, H. & Lalueza-Fox, C. Beyond broad strokes: sociocultural insights from the study of ancient genomes. Nat. Rev. Genet. 21, 355–366 (2020).
17. Mafessoni, F. et al. A high-coverage Neandertal genome from Chagyrskaya Cave. Proc. Natl Acad. Sci. USA 117, 15132–15136 (2020).
18. Cassidy, L. M. et al. A dynastic elite in monumental Neolithic society. Nature 582, 384–388 (2020).
19. Skoglund, P. & Mathieson, I. Ancient genomics of modern humans: the first decade. Annu. Rev. Genom. Hum. Genet. 19, 381–404 (2018).
20. Furtwängler, A. et al. Ratio of mitochondrial to nuclear DNA affects contamination estimates in ancient DNA analysis. Sci. Rep. 8, 14075 (2018).
21. Renaud, G., Hanghøj, K., Korneliussen, T. S., Willerslev, E. & Orlando, L. Joint estimates of heterozygosity and runs of homozygosity for modern and ancient samples. Genetics 212, 587–614 (2019).
22. Li, N. & Stephens, M. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165, 2213–2233 (2003).
23. Fu, Q. et al. An early modern human from Romania with a recent Neanderthal ancestor. Nature 524, 216 (2015).
24. Consortium, G. P. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
25. Fu, Q. et al. Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514, 445–449 (2014).
26. Speidel, L., Forest, M., Shi, S. & Myers, S. R. A method for genome-wide genealogy estimation for thousands of samples. Nat. Genet. 51, 1321–1329 (2019).
27. Lazaridis, I. et al. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513, 409–413 (2014).
28. Antonio, M. L. et al. Ancient Rome: a genetic crossroads of Europe and the Mediterranean. Science 366, 708–714 (2019).
29. Ringbauer, H., Steinrücken, M., Fehren-Schmitz, L. & Reich, D. Increased rate of close-kin unions in the central andes in the half millennium before european contact. Curr. Biol. 30, R980–R981 (2020).
30. Harney, É. et al. Ancient DNA from Chalcolithic Israel reveals the role of population mixture in cultural transformation. Nat. Commun. 9, 1–11 (2018).
31. Haber, M. et al. A transient pulse of genetic admixture from the crusaders in the near east identified from ancient genome sequences. Am. J. Hum. Genet. 104, 977–984 (2019).
32. Narasimhan, V. M. et al. The formation of human populations in South and Central Asia. Science 365, eaat7487 (2019).
33. Fenner, J. N. Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies. Am. J. Phys. Anthropol. 128, 415–423 (2005).
34. Barton, N. H., Depaulis, F. & Etheridge, A. M. Neutral evolution in spatially continuous populations. Theor. Popul. Biol. 61, 31–48 (2002).
35. Ringbauer, H., Coop, G. & Barton, N. H. Inferring recent demography from isolation by distance of long shared sequence blocks. Genetics 205, 1335–1351 (2017).
36. Browning, S. R. et al. Ancestry-specific recent effective population size in the Americas. PLoS Genet. 14, e1007385 (2018).
37. Piperno, D. R. & Fritz, G. J. On the emergence of agriculture in the New World. Curr. Anthropol. 35, 637–643 (1994).
38. Olalde, I. et al. The genomic history of the iberian peninsula over the past 8000 years. Science 363, 1230–1234 (2019).
39. Zilhão, J. Radiocarbon evidence for maritime pioneer colonization at the origins of farming in west Mediterranean Europe. Proc. Natl Acad. Sci. USA 98, 14180–14185 (2001).
40. Leutenegger, A.-L. et al. Estimation of the inbreeding coefficient through use of genomic data. Am. J. Hum. Genet. 73, 516–523 (2003).
41. Auton, A. et al. Global distribution of genomic diversity underscores rich complex history of continental human populations. Genome Res. 19, 795–803 (2009).
42. Vieira, F. G., Albrechtsen, A. & Nielsen, R. Estimating ibd tracts from low coverage ngs data. Bioinformatics 32, 2096–2102 (2016).
43. King-Irani, L. Kinship, class, and ethnicity. Underst. Contemp. Middle East 2, 299–334 (2004).
44. Korotayev, A. Parallel-cousin (fbd) marriage, islamization, and arabization. Ethnology 39, 395–407 (2000).
45. Diamond, J. & Bellwood, P. Farmers and their languages: the first expansions. Science 300, 597–603 (2003).
46. Ammerman, A. J. & Cavalli-Sforza, L. L. The Neolithic Transition and the Genetics of Populations in Europe, Vol. 836 (Princeton University Press, 2014).
47. Bacci, M. L. A Concise History of World Population (John Wiley & Sons, 2017).
48. MacDonald, D. H. & Hewlett, B. S. Reproductive interests and forager mobility. Curr. Anthropol. 40, 501–524 (1999).
49. Skoglund, P. et al. Genomic diversity and admixture differs for Stone-Age Scandinavian foragers and farmers. Science 344, 747–750 (2014).
50. Kousathanas, A. et al. Inferring heterozygosity from ancient and low coverage genomes. Genetics 205, 317–332 (2017).
51. Anthony, D. W. The Horse, the Wheel, and Language: How Bronze-age Riders from the Eurasian Steppes Shaped the Modern World (Princeton University Press, 2010).
52. Allentoft, M. E. et al. Population genomics of Bronze Age Eurasia. Nature 522, 167–172 (2015).
53. Haak, W. et al. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522, 207 (2015).
54. de Barros Damgaard, P. et al. The first horse herders and the impact of early Bronze Age steppe expansions into Asia. Science 360, eaar7711 (2018).
55. Hui, R., D’Atanasio, E., Cassidy, L. M., Scheib, C. L. & Kivisild, T. Evaluating genotype imputation pipeline for ultra-low coverage ancient genomes. Sci. Rep. 10, 1–8 (2020).
56. Rubinacci, S., Ribeiro, D. M., Hofmeister, R. J. & Delaneau, O. Efficient phasing and imputation of low-coverage sequencing data using large reference panels. Nat. Genet. 53, 120–126 (2021).
57. Lawson, D. J., Hellenthal, G., Myers, S. & Falush, D. Inference of population structure using dense haplotype data. PLoS Genet. 8, e1002453 (2012).
58. Kelleher, J. et al. Inferring whole-genome histories in large population datasets. Nat. Genet. 51, 1330–1338 (2019).
59. Korneliussen, T. S., Albrechtsen, A. & Nielsen, R. ANGSD: analysis of next generation sequencing data. BMC Bioinformatics 15, 356 (2014).
60. Ralph, P. & Coop, G. The geography of recent genetic ancestry across europe. PLoS Biol. 11, e1001555 (2013).
61. Palamara, P. F. & Pe’er, I. Inference of historical migration rates via haplotype sharing. Bioinformatics 29, i180–i188 (2013).
62. Al-Asadi, H., Petkova, D., Stephens, M. & Novembre, J. Estimating recent migration and population-size surfaces. PLoS Genet. 15, e1007908 (2019).
63. Frantz, L. A., Bradley, D. G., Larson, G. & Orlando, L. Animal domestication in the era of ancient genomics. Nat. Rev. Genet. 21, 449–460 (2020).
64. Prüfer, K. et al. The complete genome sequence of a neanderthal from the altai mountains. Nature 505, 43–49 (2014).
65. Kuhlwilm, M. et al. Ancient gene flow from early modern humans into eastern neanderthals. Nature 530, 429–433 (2016).
66. Prüfer, K. et al. A high-coverage Neandertal genome from Vindija Cave in Croatia. Science 358, 655–658 (2017).
67. Robinson, J. A. et al. Genomic signatures of extensive inbreeding in isle royale wolves, a population on the threshold of extinction. Sci. Adv. 5, eaau0757 (2019).
68. Narasimhan, V. M. et al. Health and population effects of rare gene knockouts in adult humans with related parents. Science 352, 474–477 (2016).
69. Szpiech, Z. A. et al. Ancestry-dependent enrichment of deleterious homozygotes in runs of homozygosity. Am. J. Hum. Genet. 105, 747–762 (2019).
70. Clark, D. W. et al. Associations of autozygosity with a broad range of human phenotypes. Nat. Commun. 10, 1–17 (2019).
71. Walters, R., Millwood, I., Lin, K., Mei, X. & Chen, Z. Associations of autozygosity with a broad range of human phenotypes. Nat. Commun. 10, 1–17 (2019).
72. Marcus, J. H. et al. Genetic history from the Middle Neolithic to present on the Mediterranean island of Sardinia. Nat. Commun. 11, 1–14 (2020).
73. Haviland, W. A., Prins, H. E., McBride, B. & Walrath, D. Cultural Anthropology: The Human Challenge (Cengage Learning, 2013).
74. Chiang, C. W., Ralph, P. & Novembre, J. Conflation of short identity-by-descent segments bias their inferred length distribution. G3 6, 1287–1296 (2016).
75. Williams, C. K. & Rasmussen, C. E. Gaussian processes for machine learning. Vol. 2 (MIT Press Cambridge, MA, 2006).
76. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
77. McCullagh, P. & Nelder, J. Generalized Linear Models (Monographs on Statistics and Applied Probability 37). (Chapman Hall, London, 1989).
78. Carmi, S., Wilton, P. R., Wakeley, J. & Pe’er, I. A renewal theory approach to IBD sharing. Theor. Popul. Biol. 97, 35–48 (2014).
79. Browning, S. R. & Browning, B. L. Accurate non-parametric estimation of recent effective population size from segments of identity by descent. Am. J. Hum. Genet. 97, 404–418 (2015).
80. Anderson, M. J. A new method for non-parametric multivariate analysis of variance. Austral. Ecol. 26, 32–46 (2001).
81. Ringbauer, H. hringbauer/hapRoh. https://doi.org/10.5281/zenodo.4992416 (2021).

Acknowledgements

We thank the original study authors for sharing their data publicly, and David Reich and his lab, in particular Shop Mallick, for compiling and making publicly accessible a normalized pseudohaploid compilation of those data. We thank David Anthony and Alissa Mittnik for reviewing parts of the subsistence strategy annotations and for helpful discussions. We thank Arjun Biddanda, Shai Carmi, David Schloen, Lars Fehren-Schmitz, Montgomery Slatkin, and Mashaal Sohail for their comments on the paper. Funding for H.R. and J.N. was provided by NIH grants R01HG007089 and R01GM132383 to J.N.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

These authors contributed equally: John Novembre, Matthias Steinrücken

Authors and Affiliations

Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
Harald Ringbauer
Department of Human Genetics, University of Chicago, Chicago, IL, USA
Harald Ringbauer, John Novembre & Matthias Steinrücken
Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
John Novembre & Matthias Steinrücken

Contributions

We annotate author contributions using the CRediT Taxonomy labels (https://casrai.org/credit/). Where multiple individuals serve in the same role, the degree of contribution is specified as “lead”, “equal”, or “support”. Conceptualization (Design of study)—lead: H.R.; support: J.N. and M.S. Software—lead: H.R.; support: M.S. Formal analysis—H.R. Data curation—H.R.; support: J.N. Writing (original draft preparation)—lead: H.R.; support: J.N. and M.S. Writing (review and editing)—input from all authors. Supervision—equal: J.N. and M.S. Project administration—equal: J.N. and M.S. Funding acquisition—J.N.

Corresponding author

Correspondence to Harald Ringbauer.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Communications thanks Francisco Ceballos and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Ringbauer, H., Novembre, J. & Steinrücken, M. Parental relatedness through time revealed by runs of homozygosity in ancient DNA. Nat Commun 12, 5425 (2021). https://doi.org/10.1038/s41467-021-25289-w

Received: 22 September 2020
Accepted: 21 July 2021
Published: 14 September 2021
DOI: https://doi.org/10.1038/s41467-021-25289-w
