Triangulation supports agricultural spread of the Transeurasian languages

The origin and early dispersal of speakers of Transeurasian languages (that is, Japanese, Korean, Tungusic, Mongolic and Turkic) is among the most disputed issues of Eurasian population history 1–3 . A key problem is the relationship between linguistic dispersals, agricultural expansions and population movements 4,5 . Here we address this question by ‘triangulating’ genetics, archaeology and linguistics in a unified perspective. We report wide-ranging datasets from these disciplines, including a comprehensive Transeurasian agropastoral and basic vocabulary; an archaeological database of 255 Neolithic–Bronze Age sites from Northeast Asia; and a collection of ancient genomes from Korea, the Ryukyu islands and early cereal farmers in Japan, complementing previously published genomes from East Asia. Challenging the traditional ‘pastoralist hypothesis’ 6–8 , we show that the common ancestry and primary dispersals of Transeurasian languages can be traced back to the first farmers moving across Northeast Asia from the Early Neolithic onwards, but that this shared heritage has been masked by extensive cultural interaction since the Bronze Age. As well as marking considerable progress in the three individual disciplines, by combining their converging evidence we show that the early spread of Transeurasian speakers was driven by agriculture.


Linguistics
We collected a new dataset of 3,193 cognate sets that represent 254 basic vocabulary concepts for 98 Transeurasian languages, including dialects and historical varieties (Supplementary Data 1). We applied Bayesian methods to infer a dated phylogeny of the Transeurasian languages (Supplementary Data 24). Our results indicate a time-depth of 9181 bp (5595-12793 95% highest probability density (95% HPD)) for the Proto-Transeurasian root of the family; 6811 bp (4404-10166 95% HPD) for Proto-Altaic, the unity of Turkic, Mongolic and Tungusic languages; 4491 bp (2599-6373 95% HPD) for Mongolo-Tungusic; and 5458 bp (3335-8024 95% HPD) for Japano-Koreanic (Fig. 1b). These dates estimate the time-depth of the initial break-up of a given language family into more than one foundational subgroup.
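The 95% HPD intervals quoted above summarize posterior samples of node ages. As a minimal sketch of how such an interval can be read off a set of samples (this is a generic shortest-interval routine, not the BEAST implementation, and the toy ages are hypothetical):

```python
import math

def hpd_interval(samples, mass=0.95):
    """Return the shortest interval that contains `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    best_lo, best_hi = xs[0], xs[k - 1]
    for i in range(1, n - k + 1):
        lo, hi = xs[i], xs[i + k - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi

# Hypothetical root-age samples (years bp); not values from the study.
toy_ages = [5000, 7000, 8000, 8500, 9000, 9200, 9500, 10000, 11000, 20000]
print(hpd_interval(toy_ages, mass=0.5))
```

Because the interval is the narrowest one holding the requested mass, it need not be symmetric around the mean, which is why the reported bounds sit unevenly around each point estimate.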
We used our lexical dataset to model the expansion of Transeurasian languages in space (Supplementary Data 3,4). We applied Bayesian phylogeography to complement classical approaches, such as lexicostatistics, the diversity hotspot principle and cultural reconstruction 1-3,8 .
In contrast to previously proposed homelands, which range from the Altai 6-8 to the Yellow River 22 to the Greater Khingan Mountains 23 to the Amur basin 24 , we find support for a Transeurasian origin in the West Liao River region in the Early Neolithic. After a primary break-up of the family in the Neolithic, further dispersals took place in the Late Neolithic and Bronze Age. The ancestor of the Mongolic languages expanded northwards to the Mongolian Plateau, Proto-Turkic moved westwards over the eastern steppe and the other branches moved eastwards: Proto-Tungusic to the Amur-Ussuri-Khanka region, Proto-Koreanic to the Korean Peninsula and Proto-Japonic over Korea to the Japanese islands (Fig. 1b).
By contrast, individual subfamilies that separated in the Bronze Age, such as Turkic, Mongolic, Tungusic, Koreanic and Japonic, inserted new subsistence terms that relate to the cultivation of rice, wheat and barley; dairying; domesticated animals such as cattle, sheep and horses; farming or kitchen tools; and textiles such as silk (Supplementary Data 5). These words are borrowings that result from linguistic interaction between Bronze Age populations speaking various Transeurasian and non-Transeurasian languages.
In summary, the age, homeland, original agricultural vocabulary and contact profile of the Transeurasian family support the farming hypothesis and exclude the pastoralist hypothesis (Supplementary Data 5).

Archaeology
Although Neolithic Northeast Asia was characterized by widespread plant cultivation 25 , cereal farming expanded from several centres of domestication, the most important of which for Transeurasian was the West Liao basin, where cultivation of broomcorn millet started by 9000 bp 26-29 . Extracting data from the published literature, we scored 172 archaeological features for 255 Neolithic and Bronze Age sites (Supplementary Data 6, Fig. 2a) and compiled an inventory of 269 directly carbon-14-dated early crop remains (Supplementary Data 9) in northern China, the Primorye, Korea and Japan.
The main results of our Bayesian analysis (Supplementary Data 25), which clusters the 255 sites according to cultural similarity, are visualized in Fig. 2b. We find a cluster of Neolithic cultures in the West Liao basin, from which two branches associated with millet farming separate: a Korean Chulmun branch and a branch of Neolithic cultures covering the Amur, Primorye and Liaodong. This confirms previous findings about the dispersal of millet agriculture to Korea by 5500 bp and via the Amur to the Primorye by 5000 bp 30,31 . Our analysis further clusters Bronze Age sites in the West Liao area with Mumun sites in Korea and Yayoi sites in Japan. This mirrors how during the fourth millennium bp, the agricultural package of the Liaodong-Shandong area was supplemented with rice and wheat. These crops were transmitted to the Korean Peninsula by the Early Bronze Age (3300-2800 bp) and from there to Japan after 3000 bp (Fig. 2b).
Although population movements were not linked with monothetic archaeological cultures, Neolithic farming expansions in Northeast Asia were associated with some diagnostic features, such as stone tools for cultivation and harvesting and textile technology 32 (Supplementary Data 7). Domesticated animals and dairying had an important role in the spread of the Neolithic in western Eurasia but, except for dogs and pigs, our database shows little evidence for animal domestication in Northeast Asia before the Bronze Age (Supplementary Data 6). The link between agriculture and population migrations is especially clear from similarities between ceramics, stone tools, and domestic and burial architecture between Korea and western Japan 33 .
Building on previous studies, we provide an overview of demographic changes associated with the introduction of millet farming across the regions in our study (Extended Data Fig. 3). Having invested in elaborate paddy fields, wet rice farmers tended to stay in one place, absorbing population growth through extra labour, whereas millet farmers typically adopted a more expansionary settlement pattern 34 . Neolithic population densities increased across Northeast Asia before a population crash in the Late Neolithic 35,36 . The Bronze Age then saw exponential population increases in China, Korea and Japan.

Genetics
We report genomic analyses of 19 authenticated ancient individuals from the Amur, Korea, Kyushu and the Ryukyus and combine them with published genomes that cover the eastern steppe, West Liao, Amur and Yellow River regions, Liaodong, Shandong, the Primorye and Japan between 9500 and 300 bp (Fig. 3a, b). Although we lack Early Neolithic genomes in the West Liao River, Amur-like ancestry thus is likely to represent the original genetic profile of indigenous pre-Neolithic (or late Palaeolithic) hunter-gatherers covering Baikal, Amur, Primorye, the southeastern steppe and West Liao, continuing in the early farmers from this region. This contradicts a recent genetic study 13 [...] As Amur-related ancestry can be traced down to speakers of Japanese and Korean 13 , it appears to be the original genetic component common to all speakers of Transeurasian languages. By analysing ancient genomes from Korea (Supplementary Data 12), we find that Jomon ancestry was present on the Peninsula by 6000 bp (Fig. 3b) [...] Our results support massive migration from Korea into Japan in the Bronze Age. The Nagabaka genomes from Miyako Island (Supplementary Data 12) represent the first (to our knowledge) ancient genome-wide data from the Ryukyus. Contrary to previous findings that Holocene populations reached the southern Ryukyus from Taiwan 40 , our results suggest that the prehistoric Nagabaka population originated in Jomon cultures to the north (Extended Data Fig. 7). The genetic turn-over from Jomon- to Yayoi-like ancestry before the early modern period mirrors the late arrival of agriculture and Ryukyuan languages in this region.

Discussion
Triangulation of linguistic, archaeological and genetic evidence shows that the origins of the Transeurasian languages can be traced back to the beginning of millet cultivation and the early Amur gene pool in Neolithic Northeast Asia. The spread of these languages involved two major phases that mirror the dispersal of agriculture and genes (Fig. 4). The first phase, represented by the primary splits in the Transeurasian family, goes back to the Early-Middle Neolithic, when millet farmers associated with Amur-related genes spread from the West Liao River to contiguous regions. The second phase, represented by linguistic contacts between the five daughter branches, goes back to the Late Neolithic, Bronze and Iron Ages, when millet farmers with substantial Amur ancestry gradually admixed with Yellow River, western Eurasian and Jomon populations and added rice, west Eurasian crops and pastoralism to the agricultural package.
Bringing together the spatiotemporal and subsistence patterns, we find clear links between the three disciplines (Supplementary Data 26). The onset of millet cultivation in the West Liao region around the ninth millennium bp can be associated with substantial Amur-related ancestry and overlaps in time and space with the ancestral Transeurasian speech community. In line with recent associations between the Sino-Tibetan family estimated at 8000 bp 41,42 and Neolithic farmers from the Upper and Middle Yellow River 13,14 , our results associate the two centres of millet domestication in Northeast Asia with the origins of two major language families: Sino-Tibetan on the Yellow River and Transeurasian on the West Liao River. The lack of evidence for Yellow River influence in the ancestral Transeurasian language and genes is consistent with the multi-centric origins of millet cultivation suggested in archaeobotany 28 .
The early stages of millet domestication in the ninth to seventh millennia bp are accompanied by population growth (Extended Data Fig. 3), leading to the formation of environmentally or socially separated subgroups in the West Liao region and broken connectivity between speakers of Altaic and Japano-Koreanic.
Around the mid-sixth millennium bp, some of these farmers started to migrate eastwards, around the Yellow Sea into Korea and northeast into the Primorye, bringing Koreanic and Tungusic languages to these regions and bringing from the West Liao region additional Amur ancestries to the Primorye and mixed Amur-Yellow River ancestries to Korea. Our newly analysed Korean genomes are notable in that they testify to the presence of and admixture with Jomon-related ancestries outside Japan.
The Late Bronze Age saw extensive cultural exchange across the Eurasian steppe, which resulted in the admixture of populations from the West Liao region and the Eastern steppe with western Eurasian genetic lineages. Linguistically, this interaction is mirrored in the borrowing of agropastoral vocabulary by Proto-Mongolic and Proto-Turkic speakers, especially relating to wheat and barley cultivation, herding, dairying and horse exploitation.
Around 3300 bp, farmers from the Liaodong-Shandong area migrated to the Korean peninsula, adding rice, barley and wheat to millet agriculture. This migration aligns with the genetic component modelled as Upper Xiajiadian in our Bronze Age sample from Korea and is reflected in early borrowings between Japonic and Koreanic languages. Archaeologically it can be associated with agriculture in the larger Liaodong-Shandong area without being specifically restricted to Upper Xiajiadian material culture.
In the third millennium bp, this agricultural package was transmitted to Kyushu, triggering a transition to full-scale farming, a genetic turn-over from Jomon to Yayoi ancestry and a linguistic shift to Japonic. By adding unique samples from Nagabaka in the southern Ryukyus, we traced the farming/language dispersal to the edge of the Transeurasian world. Demonstrating that Jomon ancestry stretched as far south as Miyako Island, our results contradict previous assumptions of a northward expansion by Austronesian populations from Taiwan. Together with the Jomon profile discovered at Yokchido in Korea, our results show that Jomon genomes and material culture did not always overlap.
By advancing new evidence from ancient DNA, our research thus confirms recent findings that Japanese and Korean populations have West Liao River ancestry, whereas it contradicts previous claims that there is no genetic correlate of the Transeurasian language family 13 .
Although some previous research regarded the Transeurasian zone as beyond the area suitable for farming 20 , our research confirms that the farming/language dispersal hypothesis remains an important model for understanding Eurasian population dispersals 21 . Triangulation of linguistics, archaeology and genetics resolves the competition between the pastoralist and farming hypotheses and concludes that the early spread of Transeurasian speakers was driven by agriculture.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-021-04108-8.

Linguistics
Bayesian phylogenetics. Combining dictionary search with fieldwork, we collected a comparative dataset including 3,193 datapoints representing 254 basic vocabulary concepts for 98 Transeurasian languages, including contemporary and historical varieties (Supplementary Data 1). These concepts are based on a merger of the Leipzig-Jakarta 200 (ref. 43 ) and Jena 200 (ref. 44 ) lists (Supplementary Data 2). The Turkic and Tungusic basic vocabulary included is based on a revision of recently published datasets 45,46 . Cognate coding is supported by an inventory of basic vocabulary etymologies and sound correspondences across the Transeurasian languages presented in Supplementary Data 2.
We performed a Bayesian phylogenetic analysis with cognates encoded as binary data 47 . Because the data were collected such that at least one cognate was present, the data were ascertained to not contain any sites having all zeros. Ascertainment correction was applied to cater for this 47 .
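The ascertainment correction mentioned above can be illustrated with a minimal sketch: each site likelihood is conditioned on the site being observable, that is, on the pattern not being all zeros. The function name and the toy numbers are hypothetical, and the per-site likelihoods are assumed to come from whatever tree likelihood routine is in use.

```python
import math

def ascertained_log_likelihood(site_log_liks, log_lik_all_absent):
    """Condition the site likelihoods on each site being observable.

    site_log_liks: log-likelihoods of the observed cognate patterns.
    log_lik_all_absent: log-likelihood of the unobservable all-zero pattern.
    """
    p_absent = math.exp(log_lik_all_absent)
    if not 0.0 <= p_absent < 1.0:
        raise ValueError("all-absent probability must lie in [0, 1)")
    log_observable = math.log1p(-p_absent)  # log(1 - P(all zeros))
    # Dividing each likelihood by P(observable) is a subtraction in log space.
    return sum(ll - log_observable for ll in site_log_liks)

# Toy example: one site with likelihood 0.1, all-zero pattern probability 0.2.
print(ascertained_log_likelihood([math.log(0.1)], math.log(0.2)))
```

Without this conditioning, the absence of all-zero patterns in the data would be read as evidence about the rates of cognate gain and loss rather than as a consequence of how the data were collected.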
We considered the following substitution models, which govern the evolutionary process of cognates along branches of a tree: continuous time Markov chain (CTMC), which assumes a constant rate of mutations; covarion, which assumes a slow and fast rate and the model switching between these two states; and the pseudo Dollo covarion model, which is based on the Dollo principle that a cognate can only appear once, but can be lost many times. Detailed descriptions of the CTMC and covarion models 47 and the pseudo Dollo covarion model 48 are available in the literature. For all models, we assume that each meaning class has its own relative rate to capture the variation between rates of evolution of different words.
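For the simplest of these models, the binary CTMC, the probability that a cognate is present or absent at the end of a branch has a closed form. The sketch below assumes a two-state chain with a hypothetical gain rate and loss rate; the covarion and pseudo Dollo covarion models add further states and are not shown.

```python
import math

def binary_ctmc_probs(gain_rate, loss_rate, t):
    """Transition matrix P(t) for a two-state chain (0 = absent, 1 = present).

    Row i, column j gives the probability of ending in state j after time t
    when starting in state i.
    """
    total = gain_rate + loss_rate
    if total <= 0 or t < 0:
        raise ValueError("rates must sum to a positive value and t must be >= 0")
    decay = math.exp(-total * t)
    pi_present = gain_rate / total  # stationary frequency of state 1
    pi_absent = loss_rate / total   # stationary frequency of state 0
    p01 = pi_present * (1.0 - decay)
    p10 = pi_absent * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]

# Hypothetical rates per unit time; not estimates from the study.
for row in binary_ctmc_probs(gain_rate=0.2, loss_rate=0.8, t=1.0):
    print(row)
```

As the branch length grows, both rows converge to the stationary frequencies, so long branches carry little information about the ancestral state.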
Although language evolves on average at a constant rate, we find that there can be considerable variation in rates between branches on a tree 47,48 . Such variation can be captured using the uncorrelated relaxed clock 49 , assuming rates are log-normally distributed.
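To make the idea of an uncorrelated log-normal relaxed clock concrete, the sketch below draws an independent rate multiplier for each branch from a log-normal distribution whose mean is fixed at 1. The parameterization (log-space mean set to minus half the variance) and the value of sigma are assumptions chosen for illustration, not settings taken from the BEAST XML files.

```python
import math
import random

def sample_branch_rates(n_branches, sigma, rng):
    """Draw independent log-normal rate multipliers with expected value 1."""
    mu = -0.5 * sigma ** 2  # makes the mean of the log-normal equal to 1
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]

def coefficient_of_variation(rates):
    """Standard deviation of the rates divided by their mean."""
    mean = sum(rates) / len(rates)
    variance = sum((r - mean) ** 2 for r in rates) / len(rates)
    return math.sqrt(variance) / mean

rng = random.Random(1)  # fixed seed so the sketch is repeatable
rates = sample_branch_rates(20000, sigma=0.5, rng=rng)
print(sum(rates) / len(rates), coefficient_of_variation(rates))
```

The coefficient of variation computed here is the same kind of summary that is used to judge how far branch rates depart from a strict clock.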
A birth death model is used to describe the generative process of language creation. As the data contain ancient languages that may be ancestral to current languages, we allow the tree to have ancestral nodes. A fossilized birth death model 50 , which allows such ancestral nodes, is used as prior on the tree. Language family node ages were informed by age priors (Japonic 2100 bp ± 175, Koreanic 800 bp ± 175, Turkic 2100 bp ± 175, Mongolic 750 bp ± 50, Tungusic 1900 bp ± 275). These calibrations are supported by chronological estimations proposed in linguistic literature (Supplementary Data 18). We found that these node age priors helped to reduce uncertainty slightly in the root age distribution.
We compared the fit of different models by estimating the marginal likelihoods using nested sampling 51 (Supplementary Data 18), and conclude that the pseudo Dollo covarion model with a relaxed clock has the best fit, and covarion with relaxed clock the next best fit. Both models produce compatible time estimates, though covarion estimates tend to have larger uncertainty (that is, have larger 95% HPD intervals). Time estimates of the CTMC model with relaxed clock are still compatible but even wider, and tend to have a higher mean.
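Model comparison by marginal likelihood amounts to ranking models by their log marginal likelihood and reading differences as log Bayes factors. The sketch below shows that bookkeeping only; the numbers are hypothetical placeholders, not the nested sampling estimates reported in Supplementary Data 18.

```python
def rank_models(log_marginal_likelihoods):
    """Sort models from best to worst and report log Bayes factors vs the best.

    Returns a list of (model name, log BF relative to the best model); the
    best model has a log Bayes factor of 0 and all others are negative.
    """
    ranked = sorted(log_marginal_likelihoods.items(),
                    key=lambda item: item[1], reverse=True)
    best_value = ranked[0][1]
    return [(name, value - best_value) for name, value in ranked]

# Hypothetical log marginal likelihoods, for illustration only.
toy_estimates = {
    "pseudo Dollo covarion + relaxed clock": -1000.0,
    "covarion + relaxed clock": -1012.5,
    "CTMC + relaxed clock": -1040.0,
}
for name, log_bf in rank_models(toy_estimates):
    print(f"{name}: log BF = {log_bf}")
```

In practice the uncertainty of each marginal likelihood estimate should be compared with the size of the difference before a ranking is treated as decisive.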
All posterior estimates were performed using BEAST v.2.6 52 using adaptive coupled Markov chain Monte Carlo (MCMC) 53 . Detailed specification of the models, priors, hyperpriors and settings used to run these models can be found in the BEAST XML files (Supplementary Data 19). The results of our Bayesian analysis are visualized as a dated phylogenetic tree of the Transeurasian languages (Supplementary Data 24).
Bayesian phylogeography. We assumed that the dispersal of people through Eurasia can be described as a random walk, so is best captured by diffusion on a sphere 54 . To get an impression about the uncertainty in locating origins by such model, we performed a post hoc analysis using the posterior tree set from the lexical analysis. We assigned point positions to the tips and randomly sampled trees from the posterior while estimating geographical parameters through MCMC. Even in this relatively restricted set-up, the uncertainty in root location does not allow us to distinguish the different geographical origin hypotheses. The results of our analysis are represented on a map (Supplementary Data 3). As Bayesian phylogeography must contend with a number of limitations 55,56 , we complemented it with other homeland detection methods such as linguistic palaeontology and the diversity hotspot principle to reach a balanced location for the homelands of the root and nodes of the Transeurasian family (Supplementary Data 4). To distinguish between inherited and borrowed correspondence sets, we used standard criteria based on the phonology, semantics, morphology and distribution of the word involved, as specified in Supplementary Data 5. Dividing our dataset into inherited versus borrowed subsistence vocabulary, we determined distinctive spatiotemporal and cultural patterns for each category (Supplementary Data 5).
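Diffusion on a sphere measures displacement along great circles rather than on a flat map. The sketch below shows only that distance calculation (the haversine formula), assuming a spherical Earth with a mean radius of 6,371 km; it is not the phylogeographic model itself, and the coordinates in the example are arbitrary.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for this sketch

def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2.0) ** 2)
    # Clamp to 1.0 to guard against rounding just above the valid range.
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

# Arbitrary example coordinates (latitude, longitude in degrees).
print(great_circle_km(42.0, 120.0, 35.0, 129.0))
```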
We applied linguistic palaeontology to our subsistence vocabulary, a historical comparative method that enables us to study human prehistory by correlating our linguistic reconstructions with information from archaeology about the culture of the ancient speech communities that used these words. In this way, we drew inferences about the subsistence strategies available to speakers of the different Transeurasian proto-languages in the Neolithic and Bronze Age (Supplementary Data 5) and identified a plausible location for the homeland of the ancient speech communities involved (Supplementary Data 4).

Diversity hotspot principle.
To estimate the location of the ancient speech communities involved, we combined Bayesian phylogeography and linguistic palaeontology with the diversity hotspot principle. The principle is based on the assumption that the homeland is closest to the greatest diversity with regard to the deepest subgroups of the language family. We located these areas on the map and took them as an approximation of the area where a certain proto-language began to diversify (Supplementary Data 4). Although this method must contend with certain limitations (Supplementary Data 4), taken together with the other techniques for homeland location discussed here, it can give us a reasonably robust estimation of the location of an ancient speech community.
Sites with several major cultural phases were scored separately. The sites date from 8400-1700 bp and include the Early Neolithic to Bronze Age in northeast China, the Middle Neolithic Zaisanovka culture in the Primorye, the Middle-Late Neolithic Chulmun and Bronze Age Mumun cultures in Korea, and the Late Neolithic-Bronze Age Final Jomon and Yayoi cultures in western Japan. Categories of cultural traits scored comprised ceramics (70), stone tools (38), buildings (9), plant and animal remains (26), shell and bone artefacts (17) and burials (12). Definitions of scored features are found in Supplementary Data 6 (sheet 2) and further discussion of scoring methods can be found in Supplementary Data 7. All features were scored as present (1) or absent (0) following published site reports or other literature.
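The category counts listed above add up to the 172 features scored per site. A small sketch of that tally, together with the present/absent encoding, is given below; the feature names in the usage example are hypothetical and stand in for the definitions in Supplementary Data 6.

```python
# Number of scored features per category, as listed in the text.
FEATURE_CATEGORIES = {
    "ceramics": 70,
    "stone tools": 38,
    "buildings": 9,
    "plant and animal remains": 26,
    "shell and bone artefacts": 17,
    "burials": 12,
}

TOTAL_FEATURES = sum(FEATURE_CATEGORIES.values())

def encode_site(present_features, all_features):
    """Score each feature as present (1) or absent (0) for one site phase."""
    present = set(present_features)
    return [1 if feature in present else 0 for feature in all_features]

print(TOTAL_FEATURES)

# Hypothetical feature names, for illustration only.
features = ["cord-marked pottery", "stone reaping knife", "pit house"]
print(encode_site(["pit house", "stone reaping knife"], features))
```

Each row produced this way is one line of the binary alignment that the phylogenetic analysis of cultural data takes as input.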
The database was used to analyse changes in the distribution of Neolithic and Bronze Age artefacts over time, especially in relation to the spread of agricultural systems in Northeast Asia (Supplementary Data 7).
In addition, the cultural data in our archaeological database were analysed using Bayesian phylogenetic methods. There is a large amount of phylogenetic work with archaeological data 57 , some parsimony-based 58 , others distance-based 59 . The benefit of Bayesian approaches is that they are model-based, have sound formal mathematical foundations in probability theory allowing us to estimate uncertainty around all estimates, and allow integration of information from various sources in a single analysis (like cognate and geographic data) based on probability theory. BEAST is aimed specifically at inferring rooted time trees, and uncertainty of time estimates, which sets it apart from other Bayesian packages that target unrooted trees. Furthermore, BEAST supports models that are currently not available in other packages, hence the use of this package.
The cultural data are encoded as a binary alignment, and we applied the same substitution and clock models as for the lexical data. The pseudo Dollo model with relaxed clock fits the data best (Supplementary Data 20). Because the coefficient of variation of the relaxed clock exceeded 1, which indicates a considerable amount of variation, we also ran the analysis with the standard deviation capped at 1, which only slightly affected time estimates.
The large number of sampling dates and uncertainty about the number of missing cultures made it hard to apply the fossilized birth death prior, so we opted for the flexible Bayesian skyline plot instead 60 . Timing information is based on sampling dates of archaeological finds. As there is uncertainty in dating these findings, tip dates were uniformly sampled in these intervals during the MCMC. In line with previous archaeological studies 61 [...]
Archaeobotanical database. In addition to the database of archaeological features, we compiled a list of the earliest crop remains from each region of Northeast Asia directly dated by radiocarbon (Supplementary Data 9). This list comprises 269 samples (China, 82; Primorye, 12; Korea, 31; Japan (excluding Ryukyus), 120; Ryukyu Islands, 24). Radiocarbon dates in this database were re-calibrated using OxCal v.4.4. We used kernel density mapping to plot the spread of cereals in this database over time (Supplementary Data 7). Our databases were supplemented by published datasets for faunal remains 64,65 , dolmens 66 and spindle whorls 67 .
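The regional sample counts above sum to the 269 dated crop remains, and kernel density mapping turns the find locations into a smooth surface. The sketch below checks the tally and evaluates a Gaussian kernel density at one map position; the planar coordinates, the bandwidth and the Gaussian kernel are assumptions for illustration, since the mapping software and its settings are not specified here.

```python
import math

# Directly dated crop remains per region, as listed in the text.
SAMPLES_PER_REGION = {
    "China": 82,
    "Primorye": 12,
    "Korea": 31,
    "Japan (excluding Ryukyus)": 120,
    "Ryukyu Islands": 24,
}
TOTAL_SAMPLES = sum(SAMPLES_PER_REGION.values())

def kernel_density(x, y, points, bandwidth):
    """Gaussian kernel density at (x, y) for finds at planar `points`."""
    if not points:
        return 0.0
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    return norm * sum(
        math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * bandwidth ** 2))
        for px, py in points
    )

print(TOTAL_SAMPLES)

# Hypothetical find locations in arbitrary planar units.
finds = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (8.0, 8.0)]
print(kernel_density(0.5, 0.5, finds, bandwidth=1.0))
```

Evaluating the same function on a grid for successive time slices would give one density surface per period.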

Genetics
Laboratory procedures. Ancient DNA wet laboratory work, including DNA extraction and library preparation, was performed in a dedicated ancient DNA clean room facility at the Max Planck Institute for the Science of Human History (MPI-SHH) and in an ancient DNA laboratory at Jilin University following established protocols 68 . A double-stranded library was built with 8-mer index sequences at both P5 and P7 Illumina adapters. Four individuals from China characterized in Jilin were directly shotgun-sequenced on the Illumina HiSeq X10 instrument in the 150-bp paired-end sequencing design to obtain adequate coverage. Eighty-three double-stranded libraries for 33 individuals from Korea and Japan were generated and characterized in the MPI-SHH either by shotgun sequencing or by in-solution capture at approximately 1.2 million informative nuclear single-nucleotide polymorphisms (SNPs). After initial screening of the preservation of those libraries, a further 108 single-stranded libraries were built aiming at retrieving more endogenous DNA from the samples, and again, those libraries were directly shotgun-sequenced and in-solution-captured at around 1.2 million SNPs (Supplementary Data 17) and sequenced on the Illumina HiSeq 4000 platform following the manufacturer's protocols.
Sequence data processing. Raw sequencing reads were processed by an automated workflow with the EAGER v.1.92.55 programme 69 . Illumina adapter sequences were trimmed from the sequencing data and overlapping pairs were merged with AdapterRemoval v.2.2.0 70 . We mapped the merged reads with a minimum of 30 bp to the human reference genome (hs37d5; GRCh37 with decoy sequences) using BWA v.0.7.12 71 . We removed PCR duplicates with DeDup v.0.12.2 60 . To minimize the effect of post-mortem DNA damage on genotyping, we masked 2 bp for non-UDG libraries and 10 bp for half-UDG libraries on both ends per read using the trimbam function on bamUtils v.1.0.13 72 . The cleaned reads with both base quality (Phred-scale quality) and mapping quality (Phred-scale mapping quality) over 30 were piled up by SAMtools 1.3 60 with the mpileup function. We called pseudo-diploid genotypes using the pileupCaller program (https://github.com/stschiff/sequenceTools) against SNPs in the '1240k' panel 73,74 under the random haploid calling mode. For C/T and G/A SNPs, we used the masked BAM files; for the rest we used the original unmasked BAM files.
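Two steps of this pipeline, masking read ends and random haploid calling, can be illustrated with a short sketch. It is a simplified stand-in and not the behaviour of the trimbam function or pileupCaller: the function names, the tuple layout of the pileup and the inclusive quality threshold are assumptions made for the example.

```python
import random

VALID_BASES = {"A", "C", "G", "T"}

def mask_read_ends(bases, n_mask):
    """Replace the first and last `n_mask` bases of a read with 'N'."""
    if n_mask <= 0:
        return bases
    if 2 * n_mask >= len(bases):
        return "N" * len(bases)
    return "N" * n_mask + bases[n_mask:-n_mask] + "N" * n_mask

def random_haploid_call(pileup, rng, min_base_q=30, min_map_q=30):
    """Pick one base at random from the reads covering a SNP.

    pileup: list of (base, base_quality, mapping_quality) tuples.
    Returns None when no read passes the filters. Treating the threshold
    as inclusive (>=) is an assumption of this sketch.
    """
    usable = [base for base, base_q, map_q in pileup
              if base in VALID_BASES
              and base_q >= min_base_q and map_q >= min_map_q]
    return rng.choice(usable) if usable else None

rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(mask_read_ends("ACGTACGTAC", 2))
print(random_haploid_call([("C", 35, 37), ("T", 40, 60), ("C", 12, 60)], rng))
```

Sampling a single read per site avoids calling heterozygotes from low-coverage data, at the cost of treating every individual as haploid at each SNP.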
Reference datasets. We compared our ancient individuals to three sets of world-wide genotype panels, one based on the Affymetrix HumanOrigins Axiom Genome-wide Human Origins 1 array ('HumanOrigins'; 593,124 autosomal SNPs) 75 , the '1240k' panel 73 , and the 'Illumina' dataset 76 . We augmented these datasets by adding the Simons Genome Diversity Panel 77 and published ancient genomes (Supplementary Data 11).
Ancient DNA authentication. We applied multiple criteria to confirm the authentication of the newly published ancient genomes from Korea and Japan. First, we characterized the post-mortem chemical modifications characteristic for ancient DNA using mapDamage v.2.0.6 78 . Second, we estimated mitochondrial contamination rates for all individuals using Schmutzi v.1.5.1 79 . Third, we measured the nuclear genome contamination rate in males on the basis of X chromosome data as implemented in ANGSD v.0.910 80 . As males have only a single copy of the X chromosome, mismatches between bases, aligned to the same polymorphic position, beyond the level of sequencing error are considered as evidence of contamination. Fourth, we assessed the potential West Eurasian contamination with all reads available and the damage-restricted reads on single-stranded libraries implemented in the PMDtools 81 with a PMD score of at least 3 and compared their positions in a Eurasia PCA with all reads and damaged reads alone. Fifth, we applied qpAdm 74 per individual to further characterize the West Eurasian contamination with West Eurasian characteristic groups such as Sintashta_MLBA or LBK_EN as sources (see Supplementary Data 17, 22 for details).
Population structure analysis. We performed a PCA with smartpca v.16000 82 using a set of 2,077 present-day Eurasian individuals from the 'HumanOrigins' dataset and the '1240kIllumina' dataset with the option 'lsqproject: YES' and 'shrinkmode: YES'. We used outgroup-f3 statistics 83,84 to obtain a measurement of genetic affinity between two populations since their divergence from an African outgroup. We calculated f4 statistics with the 'f4mode: YES' function in admixtools 31 . Both f3 and f4 statistics were calculated using qp3Pop v.435 and qpDstat v.755 in the admixtools package.
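The quantities behind these statistics are simple averages of allele-frequency products across SNPs. The sketch below computes naive point estimates from hypothetical population allele frequencies; it omits the finite-sample bias correction and the block jackknife standard errors that the admixtools programs provide, so it illustrates the definitions rather than replacing qp3Pop or qpDstat.

```python
def f3_outgroup(outgroup, pop_a, pop_b):
    """Naive outgroup-f3: mean of (o - a) * (o - b) across SNPs.

    Larger values indicate more genetic drift shared by A and B since
    their split from the outgroup.
    """
    terms = [(o - a) * (o - b) for o, a, b in zip(outgroup, pop_a, pop_b)]
    return sum(terms) / len(terms)

def f4(pop_a, pop_b, pop_c, pop_d):
    """Naive f4(A, B; C, D): mean of (a - b) * (c - d) across SNPs."""
    terms = [(a - b) * (c - d)
             for a, b, c, d in zip(pop_a, pop_b, pop_c, pop_d)]
    return sum(terms) / len(terms)

# Hypothetical allele frequencies at three SNPs, for illustration only.
outgroup = [0.0, 1.0, 0.5]
pop_a = [0.5, 0.5, 0.25]
pop_b = [0.5, 0.75, 0.25]
pop_c = [0.25, 0.5, 0.75]
print(f3_outgroup(outgroup, pop_a, pop_b))
print(f4(outgroup, pop_c, pop_a, pop_b))
```

An f4 value close to zero is what would be expected when A and B form a clade relative to C and D, which is why significant deviations are read as evidence of gene flow.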
Genetic sexing and uniparental haplogroup assignment. We determined the molecular sex of our ancient samples by comparing the ratio of X and Y chromosome coverages to autosomes 85 . For women, we would expect an approximately even ratio of X to autosome coverage and a Y ratio of 0. For men, we would expect roughly half the autosomal coverage on both X and Y.
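The expected coverage ratios translate directly into a simple classification rule. In the sketch below the cut-off values are hypothetical and chosen only to separate the two expected patterns; samples that fit neither pattern are left undetermined.

```python
def classify_sex(x_cov, y_cov, autosomal_cov):
    """Classify molecular sex from mean coverages on X, Y and autosomes.

    Expected ratios to autosomal coverage: female X ~ 1 and Y ~ 0;
    male X ~ 0.5 and Y ~ 0.5. The cut-offs below are illustrative
    assumptions, not thresholds from the study.
    """
    if autosomal_cov <= 0:
        raise ValueError("autosomal coverage must be positive")
    x_ratio = x_cov / autosomal_cov
    y_ratio = y_cov / autosomal_cov
    if x_ratio >= 0.8 and y_ratio <= 0.1:
        return "female"
    if 0.3 <= x_ratio <= 0.7 and 0.3 <= y_ratio <= 0.7:
        return "male"
    return "undetermined"

# Hypothetical coverages, for illustration only.
print(classify_sex(x_cov=0.98, y_cov=0.01, autosomal_cov=1.0))
print(classify_sex(x_cov=0.26, y_cov=0.24, autosomal_cov=0.5))
```

Intermediate ratios can arise from contamination or very low coverage, which is the reason for keeping an explicit undetermined class.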

Triangulation
The term 'triangulation' is borrowed from a navigational technique that determines a single point in space with the convergence of measurements taken from two other distinct points. In qualitative research it designates a method used to capture different dimensions of the same phenomenon by using evidence from three distinct scientific disciplines. To avoid circularity in the argumentation, data collection, analyses and results are performed or reached within the limits of each individual discipline, independently from the other two. Only in the final phase of the triangulation process are the inferences drawn by the three disciplines mapped on each other by comparing a number of variables describing the phenomenon. The purpose of triangulation is to increase the credibility and validity of the results by evaluating the extent to which the evidence from the three disciplines converges and by identifying correlations, inconsistencies, uncertainties and potential biases across the different perspectives on the investigated phenomena.
Building on previous applications of triangulation in anthropology 86 , we applied the method to the dispersal of the Transeurasian languages, integrating linguistics, archaeology and genetics to contribute a better understanding of the phenomenon. We collected different datasets and applied the methods described above to draw independent inferences with regard to a number of variables such as location, chronology, migratory dynamics, continuity versus diffusion, and subsistence (Supplementary Data 26). Each discipline inferred the most parsimonious model involving these variables on the basis of the application of tools internal to its own field, whether qualitative or quantitative, based on direct or indirect evidence. Taken by itself, a single discipline alone cannot conclusively resolve the question about farming/language dispersals, but taken together the three disciplines increase the credibility and validity of this scenario. Aligning the evidence offered by the three disciplines, we gained a more balanced and richer understanding of Transeurasian migration than each of the three disciplines could provide us with individually.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data analysis
The code used in the Bayesian analysis of the linguistic and cultural topologies is fully referenced. Population genetic data analysis in this study was performed using the following publicly available programs: Smartpca v16000, ADMIXTURE v1.3.0, PLINK v1.90, lcMLkin v0.5.0, qp3Pop v435, qpDstat v755, qpWave v410, qpAdm v810, DataGraph v4.5.1. Non-default parameters used in our analysis are described in the Methods section. The base map in Figure 1 was downloaded from the Natural Earth map dataset (https://www.naturalearthdata.com/), which is in the public domain and free to use in any type of project. AMS 14C dating results were calibrated with OxCal v4.4, using the IntCal20 calibration curve.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

April 2020
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability

All linguistic and archaeological datasets are available through the supplementary information. Files that require applications were uploaded to two external sources, i.e. GitHub (https://github.com/rbouckaert/Eurasia3angle) and FigShare. For our genetic datasets, the DNA sequences reported in this paper have been deposited in the European Nucleotide Archive (ENA) under accession PRJEB46162. Haploid genotype data of ancient individuals in this study on the 1240k panel are available in the EIGENSTRAT format from the following link: https://edmond.mpdl.mpg.de/imeji/collection/59JGAaOpSxRb96Vh

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sample size
No sample-size calculation was performed. The study proceeded by attempting to sample ancient DNA from contexts that had not previously been analyzed, and every new sample contributed meaningful new information. Where limited sample size is a concern, the resulting uncertainties are clearly indicated.

Data exclusions
Data were excluded from analysis on the basis of either evidence of sample contamination or low coverage. We clearly indicate these cases.
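As an illustration of what a low-coverage check on genotype data can look like, the following sketch computes per-individual missingness from text-format EIGENSTRAT `.geno` rows (one line per SNP, one character per individual, with `9` marking a missing call). It is a minimal sketch and not the exclusion procedure used in the study: the function names and the `max_missing` threshold are hypothetical, the toy genotypes are invented, and packed binary genotype files are not handled.

```python
def missing_rate_per_individual(geno_lines):
    """Return the fraction of SNPs coded '9' (missing) for each individual.

    `geno_lines` is an iterable of text-format EIGENSTRAT .geno rows:
    one row per SNP, one character (0, 1, 2 or 9) per individual.
    """
    rows = [line.strip() for line in geno_lines if line.strip()]
    if not rows:
        return []
    n_individuals = len(rows[0])
    missing = [0] * n_individuals
    for row in rows:
        if len(row) != n_individuals:
            raise ValueError("inconsistent number of individuals per SNP row")
        for i, genotype in enumerate(row):
            if genotype == "9":
                missing[i] += 1
    return [count / len(rows) for count in missing]


def flag_low_coverage(rates, max_missing=0.9):
    """Indices of individuals whose missing rate exceeds a hypothetical threshold."""
    return [i for i, rate in enumerate(rates) if rate > max_missing]


# Invented toy data: 4 SNPs x 3 individuals, pseudo-haploid calls (0 or 2) plus missing (9).
toy_geno = ["029", "299", "009", "229"]
rates = missing_rate_per_individual(toy_geno)
print(rates)
print(flag_low_coverage(rates))
```

Any real threshold would depend on the analysis at hand; the value here only makes the example concrete.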

Replication
As our study is an evolutionary analysis of language, culture and genes and the evolutionary process only proceeds once, replication was not possible.
Randomization
This is not relevant to our study because we are dealing with an evolutionary process, not a human-designed experiment.

Blinding
Blinding was not possible for this study because the analysts needed to understand the historical background of the samples.

Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.

Skeletal samples newly analysed in the study are under the custodianship of archaeologists or anthropologists in our team who contributed them to the study and whose permission to analyse the samples is indicated through co-authorship of the manuscript.

Specimen deposition
The analyzed samples are under the custodianship of the co-authors who contributed them to the study; the provenance of each sample is described in SI 11 and SI 12. Our co-authors will give access to the parts of the samples remaining after ancient DNA and radiocarbon analysis to anyone who requests it. We also shared photos in SI 13 and commit to sharing more photographic material of skeletal samples before and after sampling.