Early medieval genetic data from Ural region evaluated in the light of archaeological evidence of ancient Hungarians

The ancient Hungarians originated from the Ural region of Russia, and migrated through the Middle-Volga region and the Eastern European steppe into the Carpathian Basin during the ninth century AD. Their Homeland was probably in the southern Trans-Ural region, where the Kushnarenkovo culture was disseminated. In the Cis-Ural region Lomovatovo and Nevolino cultures are archaeologically related to ancient Hungarians. In this study we describe maternal and paternal lineages of 36 individuals from these regions and nine Hungarian Conquest period individuals from today’s Hungary, as well as shallow shotgun genome data from the Trans-Uralic Uyelgi cemetery. We point out the genetic continuity between the three chronological horizons of Uyelgi cemetery, which was a burial place of a rather endogamous population. Using phylogenetic and population genetic analyses we demonstrate the genetic connection between Trans-, Cis-Ural and the Carpathian Basin on various levels. The analyses of this new Uralic dataset fill a gap of population genetic research of Eurasia, and reshape the conclusions previously drawn from tenth to eleventh century ancient mitogenomes and Y-chromosomes from Hungary.


Scientific Reports
| (2020) 10:19137 | https://doi.org/10.1038/s41598-020-75910-z www.nature.com/scientificreports/ Earlier mitochondrial DNA (mtDNA) studies of modern populations speaking Uralic languages suggest that the distribution of Eastern and Western Eurasian mtDNA lineages are determined by geographic distances rather than linguistic barriers [18][19][20] , e.g. Finno-Ugric populations from Volga-Ural region seem to be more similar to their Turkic neighbours than to linguistically related Balto-Finnish ethnic groups 18 . The recent study of 15 Uralic-speaking populations describes their similarities to neighbouring populations as well, however they also share genetic component of possibly Siberian origin 21 . Even though that some mitochondrial lineages of present-day Hungarians have possible Siberian descent 22 , the Hungarians' gene pool differs from that of other Uralic-speakers 21 .
The main goal of this study is to expand the current set of archaeological knowledge about the early medieval populations of the Ural region by archaeogenetic methods. During the collection of 36 human samples from Ural region processed in this study, the most important intention was to collect burials exclusively from such professionally excavated and appropriately documented cemeteries from the South-Ural region, which are culturally and temporally (directly or indirectly) connected to the ancestors of Hungarians ( Fig. 1 and Supplementary  Fig. S1a-h).
The sampled Uyelgi cemetery from Trans-Ural region presented the greatest similarity to the archaeological traits of the tenth-century Carpathian Basin (Fig. 1, Supplementary Fig. S1e-h). This cemetery of the late Kushnarenkovo culture was used from the end of eighth century to the eleventh century 2,23 .
As the archaeological and historical theories are slightly diverse, we aimed to cover a wide range of early medieval archaeological cultures located in the middle course of the Kama River in the west side of the Ural Mountains (Cis-Ural region). Scholars connect the termination of the Nevolino Culture in eighth-ninth centuries AD to the westward migration of ancestors of Hungarians 1-3 , hence the sampling was carried out in all three phases of this culture: Brody (third-fourth centuries), Bartym (fifth-sixth centuries) and Sukhoy Log (seventh-eighth centuries) 24 (Fig. 1). Furthermore, we investigated the Bayanovo cemetery (ninth-tenth centuries AD), which represents the southern variant of Lomovatovo culture 3 that shows close cultural connection to its southern neighbour Nevolino culture. The sampling of the richly furnished graves of Bayanovo was limited by the poor preservation of bone samples (see Supplementary text, Figs S1b-d) 6 .
Additionally, we reanalysed nine samples from tenth-to twelfth-centuries ancient Hungarians for whole mitogenomes from the Carpathian Basin, who were chosen from the previous study Csősz et al. 13 based on identical hypervariable I region (HVRI) haplotypes of mtDNA with some of investigated Uralic individuals.
In this paper, our main purpose was to characterize the maternal and paternal genetic composition of populations from the third-to eleventh-centuries South-Ural region and compare the results with the available ancient and modern genetic datasets of Eurasia. We also aimed to describe possible genetic connections between the studied Uralic populations and the Conquest period populations of the Carpathian Basin.

Results and discussion
The sample-pool consisted of 29 males and 16 females. We performed whole mitochondrial DNA and 3122 nuclear SNPs target-enrichment combining with shallow shotgun sequencing. With the latter we obtained autosomal and Y-chromosomal SNPs, as well as sex-determination of 45 individuals that originated from five different cemeteries in Ural region and six burial sites in present-day Hungary (Carpathian Basin). Furthermore, we investigated the Y-STR profiles of 20 male individuals from the Ural-region. For detailed information see Supplementary Tables S1, S2, S13. For the radiocarbon dating and stable isotope data see the Supplementary Information chapter 2 and Supplementary Tables S1, S14. Primary observations. 45 high coverage mitochondrial genomes were obtained (sequencing depth from 8.71 × to 154.03 ×), with mean coverage of 71.16 × and an average contamination rate of 0.2%. The new dataset consists of the mixture of nine macrohaplogroups (A, C, D, H, T, U, N, R, Z) (Fig. 2a). Haplogroups of presumably West Eurasian origin are represented by U (U2e1, U3a1, U4a1d, U4b1a1a1, U4d2, U5a1a1, U5b2a1a1, N = 12), H (H1b2, H3b, H40b, N = 9), N (N1a1a1a1a, N = 5) and T (T1a1, T1a2, T2b4h, N = 5), although phylogeographic analyses show eastern origin for some of them, see Supplementary Table S16  Even though that the Hungarian conquerors were selected based on mtDNA HVRI matches with certain ancient individuals from the Ural region, they have not proved to be identical on whole mitogenome level, but remained phylogenetically close to the associated samples (see Supplementary Fig. S4a-s).
A few mitochondrial lineage relations connect Trans-Ural and Cis-Ural regions: e.g. samples from Uyelgi and Sukhoy Log clustered together in one main branch of the A + 152 + 16,362 haplogroup tree ( Supplementary  Fig. S4b), furthermore samples from Uyelgi and Brody (with haplogroup D4j2) and from Uyelgi and Bartym (with haplogroup U4d2) are located on the same main branch as well (Supplementary Figs. S4e and S4p).
In contrast to the mitochondrial lineages, the Y-chromosomal gene pool based on STR and/or SNP data show homogenous composition in our dataset: 83.3% is N-M46, 5.5% G2a (G-L1266), 5.5% J2 and 5.5% is R1b of the typed male individuals (Supplementary Table S2). 13 male samples out of 19 from Uyelgi cemetery carry Y-haplogroup N with various DNA preservation-dependent subhaplogroup classifications, while in the Cis-Ural we detected three N-M46 Y-haplogroups (samples from Brody, Bartym and Bayanovo cemeteries). The overall poor preservation of further Cis-Uralic samples from Sukhoy Log and Bartym disabled further Y-chromosomebased analyses (Supplementary Table S2).
Comparative population genetic analyses of maternal lineages and genomic data. We Fig. S6). Some of Central-South Asian and Finno-Ugric modern populations (e.g.   Table S12). The genetic distance between Uyelgi and Hungarian conquerors is lower than between Uyelgi and geographical closer Cis-Ural population (Supplementary Table S5).
According to the MDS plot of 28 ancient populations based on linearized Slatkin F ST ( Supplementary Fig. S9a), the Cis-Ural population shows affinities among others to the populations of medieval Hungarian conquerors along coordinates 1 and 2, and is situated between European and Asian populations, which reflects the raw F ST values. The Uyelgi is standing on the Asian part of the plot relatively far from all ancient populations, which is most likely due to its significant and larger genetic distances from ancient populations (except the Late Iron Age population from Central Asia 25 ) and the scarcity of Asian comparative mitogenome datasets. The rank correlation heatmap ( Supplementary Fig. S9b) of the F ST values of ancient populations supports the MDS plot, where the Uyelgi and Cis-Ural populations cluster with the same ancient populations that are close to them on the MDS plot. The genetic connection of Cis-Ural population and Hungarian conquerors 14 is obvious based on pairwise F ST calculation and is visible on the PCA and MDS plots as well, where they are the closest, although direct phylogenetic connections are scarce. This indicates geographical proximity of their former settlement area, rather than a direct connection. Neparáczki et al. have described the Hungarian conqueror mitogenome diversity in essence as a mixture of Srubnaya culture associated and Asian nomadic populations. Their analyses and interpretation were restricted by the lack of ancient samples from the Ural region, whereas new data now refine such previous conclusions 14 . Furthermore, it is notable, that the previously studied Hungarian conqueror population is a pool of mixed origin including not only immigrants but also local admixed lineages from the Carpathian Basin.
The Cis-Ural population reveals non-significant genetic distances from four modern populations of Central Asian Highlands, furthermore seven populations of Near East and Caucasus region and six European populations (see Supplementary Table S6) indicating a mixed character of this population, which is also visible on the MDS plot ( Supplementary Fig. S10).
Interestingly, the mitogenome pool of Uyelgi shows significant differences in genetic distances among nearly all prehistoric and modern populations including Hungarian conqueror population in spite of the extensive phylogenetic connections, which might be explained by high amount of related lineages within the population, as well as by their mixed character of Eastern-and Western-Eurasian haplogroups.
We performed genomic PCA with five Uyelgi samples yielding 10,928 nuclear genomic SNPs on average gained from 3000 SNP capture and shallow shotgun sequencing data (called from 598,094 SNPs of which 524,301 were used for the calculations, see also Supplementary Information, Chapter 3). The PCA plots (Fig. 3,   (Fig. 3). Based on linguistic clines described by Jeong et al. 28 , the Uyelgi population is plotted between the Uralic speaking individuals and the Turkic language speaking populations from Central Steppe. Since PCA may not reveal population stratification we performed unsupervised ADMIXTURE (K = 16) on an enlarged sets of SNPs (SI, chapter 3). The five Uyelgi samples with an average calling of 22,540 SNPs show the most similar ancestry cluster proportions to present-day Mansis and Irtysh-Barabinsk Tatars and to a set of various ancient genomes from the Central Steppe region 26 .
To disentangle the connections between these populations and possible population genetic events of thousands of years between populations under study, more ancient reference samples and deeper sequencing is needed.
The genetic continuity between the horizons of the Uyelgi cemetery (Trans-Ural region). The kurgan burials at Uyelgi site can be divided into at least three chronological horizons: (1) the oldest ninthcentury, (2) ninth-and tenth-centuries and (3) tenth-and eleventh-centuries according to the archaeological records (see Supplementary text chapter 1 and Supplementary Fig. S1e-h). Uniparental genetic markers show genetic continuity between these horizons suggesting maternally rather endogamous population, which could not be observed in archaeological findings due to high number of disturbed burials in the cemetery. Mitochondrial phylogenies of N1a1a1a1a, C4a1a6 and H40b provide identical or monophyletic lineages within and between the three horizons (see Figs. 4, 5 and Supplementary Fig. S4c, g-h), which trend is more pronounced by haplotype and network analysis of paternal lineages (Fig. 6, Supplementary Figs. S11-S12).
The haplotypes of N-M46 Y-haplogroup are presented in all three horizons, however with little differences in STR profiles (Supplementary Table S2). The oldest and the middle horizons contain only N-M46 haplotypes including two identical STR profiles in Kurgan 32 (ninth century). Three identical Y-STR profiles are detected among individuals of Kurgans 28,29 and 30 (Figs. 4,6). Probably further identical Y-haplotypes could have been in this cemetery, but the preservation does not let us reconstruct whole Y-STR profiles of seven males (see Supplementary Table S2). Based on these results we suggest that Uyelgi cemetery was used by a patrilocal community.
Uniparental markers as well the grouping of the graves suggest blood relationships between the investigated individuals at least to some extent. Unfortunately, informative positions for kinship analysis do not exceed 400 autosomal SNPs on pairwise comparisons, thus only conjectures can be made at this level.
The possible maternal genetic connection of South-Ural region's populations and the Hungarian conquerors. The genetic connection of Uyelgi cemetery in the Trans-Ural and tenth century Hungarian conquerors in the Carpathian Basin is supposed by close maternal relationships of the following individuals: Uyelgi3 and three Hungarian conquerors from Karos II cemetery 14 have identical U4d2 mitogenome haplotype ( Supplementary Fig. S4p). Furthermore, the mtDNA A12a lineage of Hconq3 (30-40 years old woman from Harta cemetery dated to the first half of tenth century AD) is an ancestor of the mtDNA lineage of Uyelgi7 (see Supplementary Fig. S4a).
The mentioned graves from Uyelgi show the archaeological characteristic of the Srostki culture, where the gilt silver mounts with plant ornaments were typical, and which was disseminated from the Siberian Minusinsk Depression and the Altai region through the Baraba Steppe and North-Kazakhstan to the Trans-Ural region (Fig. 1). Moreover, it is notable that the archaeological findings in these kurgans are dated not earlier then the tenth century AD, i.e. after the Hungarian conquest of the Carpathian Basin. Identical mitogenome sequences with the Hungarian conquerors from Karos cemetery appearing on the phylogenetic tree U4d2 can point out close biological connections or common source population of the Uyelgi population and the Hungarian conquerors.
The D4j phylogenetic tree shows one interesting phenomenon: Uyelgi21 clusters with one modern-day Hungarian (Fig. S4e). The findings of this Kurgan 11 (belonging to the Srostki culture) show similarities to the typical findings of the Hungarian conquerors from the Carpathian Basin as well (see Fig. 1 and Supplementary Fig. S1h).
The mitogenome of individual Uyelgi10 and three identical lineages of two Hungarian conquerors (Hconq1 and Hconq6) from Balatonújlak-Erdő-dűlő and Hconq9 from Makó-Igási járandó cemetery clustered together in one branch on the phylogenetic tree of haplogroup U5a1a1 (Supplementary Fig. S4q). The Uyelgi10 shows mixed character from archaeological point of view: the findings can be connected to the ninth century AD as well as to the cultural influences of the Srostki culture (for the detailed information see Supplementary information) 29,30 . The samples of adult women from Balatonújlak-Erdő-dűlő buried with gilt silver hairpins could be dated (based on archaeological findings) to the middle third of the tenth century AD 31 . One of the burials had a grave with a sidewall niche of eastern origin. The grave from Makó-Igási járandó without findings is dated to the middle third of eleventh century AD, i.e. to the Árpádian Age, when conquerors and the local population presumably admixed already. Interestingly, the 25-30 years old man shows some Asian cranial traits as the most men buried in this cemetery 32 .

Scientific Reports
| (2020) 10:19137 | https://doi.org/10.1038/s41598-020-75910-z www.nature.com/scientificreports/ The connection of Uyelgi cemetery and Hungarian conquerors is visible on the N1a1a1a1a branch of the tree of mtDNA haplogroup N1a too, that was prevalent among the ancient Hungarians (Fig. 5). Here seven Hungarian conqueror samples from cemeteries Kenézlő-Fazekaszug, Orosháza-Görbicstanya and Karos-Eperjesszög . Mitochondrial haplotype and Y-chromosomal similarities between the kurgans of the three horizons of Uyelgi cemetery. Three chronological horizons were defined in the cemetery: an oldest horizon from ninth century (marked with green), a middle horizon from ninth-tenth centuries (marked with orange) and the youngest horizon from tenth-eleventh centuries (marked with red colour). The bold and italic highlighted letters indicate different mitochondrial and/or Y-STR haplotype matches within and between the kurgans whose localization is visible on the right part of the figure. Four identical mitogenome haplotypes belonging to C4a1a6 haplogroup appear in all three horizons in four different kurgans, furthermore two identical haplotypes of H40b haplogroup are from the middle horizon and three also identical but different from the previous haplotypes of H40b come from two kurgans of the youngest horizon. Additionally, the five individuals with mtDNA haplogroup N1a1a1a1a are distributed in three kurgans from the oldest and youngest horizons.

Scientific Reports
| (2020) 10:19137 | https://doi.org/10.1038/s41598-020-75910-z www.nature.com/scientificreports/ clustered on one branch, while the five Uyelgi samples from the earliest and latest horizons are located on the adjacent branch. These results signalize indirect connection between these two populations and their common source in agreement with the archaeological chronology of Uyelgi site. The maternal genetic connection of the Cis-Ural region and the Hungarian conquerors is apparent especially on the phylogenetic tree of mitochondrial haplogroup T2b4h, where Bartym2, Bay3 and Hungarian conqueror from Karos site 14 are located on the same branch, moreover, the individuals from Bartym and Karos share the same lineage that is ancestral to the mtDNA lineages of individual from Bayanovo ( Supplementary Fig. S4k). The lineage of Karos sample was determined as of possibly Asian origin 14 , nevertheless, this assumption is revisited by our data, not only by actual phylogenetic connections but due to the recurrent western presence of these lineages even from pre-medieval times.
The described phylogenetic connections of six mitochondrial lineages between Ural region and Hungarian conquerors signal rather close maternal genetic connection between these populations, which is supported by the archaeological findings, as well.

Ancient paternal lineages of the South-Ural region.
Majority of Uyelgi males belong to Y chromosome haplogroup N, and according to combined STR, SNP and network analyses they belong to the same subclade within N-M46 (also known as N-Tat and N1a1-M46 in ISOGG 14.255). N-M46 is geographically widely distributed from East of Siberia to Scandinavia 33 . One of its subclades is N-Z1936 (also known as N3a4 and N1a1a1a1a2 in ISOGG 14.255), which is prominent among Uralic speaking populations, probably originated from the Ural region as well and mainly distributed from the west of Ural Mountains to Scandinavia (Finland). Seven samples of Uyelgi site most probably belong to N-Y24365 (also known as N-B545 and N1a1a1a1a2a1c2 in ISOGG 14.255) under N-Z1936, a specific subclade that can be found almost exclusively in todays' Tatarstan, Bashkortostan and Hungary 17 (ISOGG, Yfull). Median Joining (MJ) network analysis is performed using 238 N-M46 Y-haplotypes including seven samples from Uyelgi detected with 17 STR loci (Fig. 6, Supplementary Table S8) as well as 335 N-M46 Y-haplotypes with 12 STR loci ( Supplementary Fig. S12, Supplementary  Table S8). Based on MJ of 17 Y-STR loci, certain samples show identical or one-step neighbour profiles to Bashkirs, Khantys 17 , Hungarians 34 , Tatars from Volga-Ural region and a Central Russian sample 17 (Fig. 6). The MJ based on 12 Y-STR data shows one-step neighbour connection of Uyelgi with two Hungarian conquerors from Bodrogszerdahely-Bálványhegy and Karos-Eperjesszög 15 (Supplementary Fig. S12). YHRD online database shows further affinities or identities among Finnish, Ural region (Sverdlovsk Oblast) or European Russian region (Penza and Arkhangelsk Oblasts) samples, notably either from territories of Uralic language affinities, where also the Hungarian language belongs to, or along the supposed migration route of early Hungarians.
It is noteworthy that the seventh-century Avar elite males from the Carpathian Basin 35 , in spite of the similar N-M46 frequency to Uyelgi, had a distant subtype (N-F4205, N1a1a1a1a3a in ISOGG 14.255), which is prominent in present-day Mongolic speaking populations around Lake Baikal 33 . Furthermore, they had a fairly different population history than populations of this study, therefore they shall not be confused with each other 35 .

Figure 5.
Phylogenetic tree of mitochondrial haplogroup N1a1. The subhaplogroup N1a1a1a1a was detected in five individuals assigned to two horizons of Uyelgi cemetery: Uyelgi19 and Uyelgi20 from the oldest (ninth century) horizon and the Uyelgi2, Uyelgi12 and Uyelgi13 from the youngest horizon (tenth-eleventh centuries), furthermore, in nine tenth centuries Hungarian conqueror graves from various cemeteries in Hungary, and in one modern Hungarian individual. The Uyelgi branch of the tree is very compact, clearly connects the oldest and youngest horizons together, however, the maternal lineages of the populations from Uyelgi and the Hungarian conquerors are separated (for the abbreviation and further information see Table S7).
Uyelgi4 belongs to G-L1266 (G2a2b2a1a1a1b in ISOGG 14.255), which sublineage is confirmed to be present outside of Europe within the European G-L140 branch of G. Among Hungarian conquerors the presence of G-L30 (G2a2b in ISOGG 14.255) was attested by Neparáczki et al. 16 from Karos II (K2/33) without further classification or STR data, but recently G-L1266 has been confirmed by Fóthi et al. 15 which sample could also be included in our STR analysis. By using 14 STR markers in this case, due to the limitations of the database, MJ network shows a Caucasian affinity of both Hungarian conqueror (RP/2) and Uyelgi individuals ( Supplementary  Fig. S11, Supplementary Table S9), however, neither identity nor monophyly can be observed between them.

Conclusions
The Ural region had an important role in ancient Hungarians' ethnogenesis based on archaeological, linguistic and historical sources, although the results of these research fields exhibit differences of chronological and cultural aspects. The new mitogenome, Y-chromosome and shallow shotgun autosomal DNA sequence data from the South-Urals presented here confirms the region's relevance from population genetic perspective too.
The overall maternal makeup of the investigated 36 samples from the Ural region in a phylogenetic and phylogeographic point of view suggests a mixed characteristic of rather western and rather eastern components, although the paternal lineages are more homogenous with Y-haplogroups typical for the Volga-Ural region. The exact assignment of each mitochondrial haplotype of the Trans-Uralic Uyelgi population to the Eastern and Western Eurasian components is impossible, but comprehensive representatives are present. Mitochondrial haplogroups of European origin N1a1a1a1a and H40b provide a horizon-through success of maternal lineages with inner diversification, which suggests a base population of a rather western characteristics. On the other hand, identical (C4a1a6) or single (A, A12a, C4a2a1) haplotypes with strong eastern phylogeography, highly  Supplementary Table S8).

Scientific Reports
| (2020) 10:19137 | https://doi.org/10.1038/s41598-020-75910-z www.nature.com/scientificreports/ pronounced in the third horizon, suggest a relatively recent admixture to this population. The apparent co-occurrence of genetic and archaeological shift is however contradicted by the homogeneity of ancestry components, nuclear genomic PCA positions, homogeneity of paternal makeup (although this one itself can be explained by patrilocality), and presence of eastern component (C4a1a6) in all horizons. Despite the fact that the genetic contribution of a population related to the Srostki culture cannot be excluded at this level, it is more likely that the majority of eastern components admixed before the usage of the Uyelgi cemetery. The uniparental genetic composition of Uyelgi population signals them as a chronologically and/or geographically related population to the possible genetic source of the Hungarian conquerors. Furthermore, their preliminary autosomal results show that they shared their allele frequency makeup with modern Uralic and West Siberian populations that are linguistically or historically related to Hungarians, which provide a good standpoint for future studies. The maternal phylogenetic connections of Uyelgi with Hungarian conquerors can be divided to indirect (monophyletic but not successive) and direct (identical or one-step neighbour) relationships. Interestingly, indirect connections can be genetically assigned to the western-characteristic base population, whereas direct connections are almost exclusive to the admixed eastern component. One possible explanation for this phenomenon is that Hungarian conquerors and Uyelgi shared common ancestry in the past that separated prior eastern admixture, latter which provided genetic components subsequently to both groups. The exact origin or identification of the eastern component yet to be described, however, nuclear admixture proportions and loose phylogenetic connections points towards Central Asia.
The phylogenetic makeup of Cis-Ural region questions their compactness or successiveness; however, the scarce data does not allow extensive analysis for this group. Hungarian conqueror lineage-based connections here are sporadic, but regional affinity is observable, which is more pronounced in MDS and PCA. Earlier studies based solely on the genetic makeup of Hungarian conquerors tend to connect the non-European lineages to various eastern regions, but especially the presence of rare Eastern Eurasian haplotypes in the Late Iron Age and Early Medieval Cis-Ural group may reshape these conclusions in the future.
In subsequent studies, we plan to extend the presented dataset with high coverage genomic analyses and including samples from further East European cemeteries of ancient Hungarians and their neighbouring communities.

Material and methods
Sampling. Our aim was to collect samples from all available anthropologically well characterised human remains from five cemeteries of the Ural region: from Uyelgi 22 samples, from Bayanovo (Boyanovo) three samples, from Sukhoy Log five samples, from Bartym five samples and from Brody one sample, as well as nine comparative samples from Carpathian Basin (for more information see Supplementary Table S1).
Individuals from the cemeteries Bayanovo, Sukhoy Log, Bartym and Brody were grouped as "Cis-Ural" in the mtDNA population genetic analyses, indicated by the relative geographical proximity (~ 400 km) and archaeological similarities (see Supplementary text). Furthermore, these cemeteries are connected to Hungarian prehistory through various archaeological evidences and historical sources as well 1,3 .
Sample preparation. All procedures were performed in a dedicated ancient DNA laboratory according cleanness recommendations at Laboratory of Archaeogenetics, Institute of Archaeology, Research Centre for the Humanities in Budapest, Hungary. After photo documentation and bleach, samples were UV-C irritated for 30 min per side. Therefore, samples were abraded by using bench-top sandblaster machine with clean sand, followed by additional UV-C exposure procedure for 20 min per side. Cleaned bone samples were grinded into fine powder. Approximately 100 mg (80-120 mg) of powder was collected and processed 13,37 . DNA extraction, library preparation and NGS sequencing. DNA extraction was performed according the protocol of Dabney et al. 38 with minor changes pointed also by Lipson et al. 37 .
For verifying the result of DNA extraction, a test PCR reaction was performed 37 . DNA library preparation with partial uracil-DNA-glycosylate treatment was performed as described at Rohland et al. 39 with minor modifications. Partially double-stranded and barcoded P5 and P7 adapters were used. The libraries were amplified using TwistAMP (TwistDX) in 34.3 µL final volume. Amplification reaction products were purified by AMPure Beads Purification (Agilent).
To capture the target sequences covering whole mitochondrial genome and autosomal SNPs, in solution hybridisation method was used as described by Csáky et al. 35 , Haak et al. 40 and Lipson et al. 37 . The bait production used for capture was performed based on Fu et al. 41 and N. Rohland personal communication, with the exemption that the oligos as a pool was ordered from CustomArray Inc. The raw libraries for shotgun sequencing and the captured samples were indexed using universal iP5 and unique iP7 indexes 42 .
Next generation sequencing was performed on Illumina MiSeq System (Illumina) using V3 (2 × 75 cycles) sequencing kits and custom sequencing setup.
Pre-processing of the Illumina sequence data. Customized in-house analytic pipeline was run on the Illumina sequence data. Paired reads were merged together with SeqPrep master (John JS. SeqPrep. https :// githu b.com/jstjo hn/SeqPr ep), requiring an overlap at least 10 base pairs for capture, and 5 base pairs for shotgun data. For one mismatch, the one with higher base quality was accepted, the overlapping reads with two or more mismatches were discarded. Cutadapt 43 were used to remove barcodes as well as to discard fragments too short (< 15 bp for shotgun and < 20 bp for capture) or/and without barcode. The pre-processed reads were mapped to the reference sequence (GRCh37) using BWA v.0.7.5 44 , with MAPQ of 20, and gap extension of 3 base pairs. These permissive options were considered due to the frequent occurrence of low quality and/or amount of reli- Estimates of contamination. The contamMix 1.0.10 was used to estimate the level of human DNA contamination in the mitochondrial DNA 40,48 . All of our samples show 99% < endogenous content, which makes them eligible for whole genome analyses. For the results see Supplementary Table S2. Spearman rank correlation matrix of F ST values was calculated in Pandas (Python) and visualized in seaborn package by clustermap function using Euclidean metric.
To assess the correlation between genetic and geographic distances for the Hungarian conquerors and two investigated Uralic populations, we performed Mantel test 54 based on population pairwise F ST and linearized Slatkin F ST values using Arlequin 3.5.2.2 50 (Supplementary Table S12).
Principal component analysis was performed based on mtDNA haplogroup frequencies of 64 modern and 50 ancient populations. 32 mitochondrial haplogroups were considered in PCA of ancient populations, while in PCA of modern populations and two ancient populations from Ural region we considered 36 mitochondrial haplogroups (Supplementary Tables S3, S4). The PCAs were carried out using the prcomp function in R 3.4.1 and visualised in a two-dimensional plot with first two (PC1 and PC2) or the first and third principal components (PC1 and PC3) (Fig. 2b, Supplementary Figs. S5, S7).
For hierarchical clustering, Ward type algorithm 55 and Euclidean measurement was conducted based on haplogroup frequencies of ancient and modern populations as well, and displayed as a dendrogram in R3.4.1 ( Supplementary Figs. S6, S8). The same population-pool was used as in PCAs.
Shallow shotgun and captured nuclear DNA sequences were 1 bp trimmed on both ends by trimBam function of bamUtil (https ://genom e.sph.umich .edu/wiki/BamUt il:_trimB am). Genotypes from shotgun data were called for the Human Origin SNP panel by samtools mpileup command (-q30 and -Q30) and by pileupCaller (which is designed to sample alleles from low coverage sequence data, see https ://githu b.com/stsch iff/seque nceTo ols). Methods of genomic PCA are presented in Supplementary Information, Chapter 3. Prior to the ADMIXTURE analysis, we filtered for missing SNPs in the dataset ("-geno 0.999 parameter") and pruned SNPs in strong linkage disequilibrium with each other using the parameters "-indep-pairwise 200 25 0.4" in PLINK 56 , leaving 1,146,167 SNPs. We run unsupervised ADMIXTURE with K = 16 on a "1240k" worldwide dataset (https ://reich .hms.harva rd.edu/datas ets) of published ancient captured/shotgun sequenced and modern deep sequenced genomes 57 . The plotted samples' sources are seen in Supplementary Table S10.
Kinship analyses. We used READ 58 and lcMLkin 59 software on our samples, to assess IBD patterns for nuclear data. The informative positions do not exceed 400 SNPs on pairwise comparisons, which are below of comparable thresholds for each software.
Phylogenetic and network analysis. All available mitochondrial genome sequences in NCBI (more than 33,500) were downloaded and sorted according to their haplogroup assignments. Then multiple alignments for each haplogroup were performed with ClustalO within SeaView 49 . Neighbour Joining (NJ) trees were generated by PHYLIP version 3.6 60 . The phylogenetic trees then were drawn by Figtree version 1.4.2 (https ://tree. bio.ed.ac.uk/softw are/figtr ee). We decided to omit median joining network (MJN) to avoid unresolvable ties and bootstrap calculation due to the low number of substitutions.

Scientific Reports
| (2020) 10:19137 | https://doi.org/10.1038/s41598-020-75910-z www.nature.com/scientificreports/ To analyse the Y-STR variation within the Y-chromosomal haplogroups N1a1-M46 and G2a, Median Joining (MJ) networks were constructed using the Network 5.0 software (https ://www.fluxu s-engin eerin g.com). For the N1a1-M46 Y-haplogroup MJ network calculation with 17 STR loci 238 samples, and for MJ network calculation with 12 STR loci of the same haplogroup, 335 samples of 27 ancient and modern population were included (Supplementary Table S8). The MJ network analysis of G2a Y-haplogroup was calculated based on 14 STR data using 120 samples of 27 populations (Supplementary Table S9). Post processing MP calculation was used, creating network containing all shortest tree. Repeats of the locus DYS389I were subtracted from the DYS389II.

Data availability
The NGS data were uploaded to the repository ENA (European Nucleotide Archive) under Project No. PRJEB39054.