Y-chromosome haplogroups from Hun, Avar and conquering Hungarian period nomadic people of the Carpathian Basin

Hun, Avar and conquering Hungarian nomadic groups arrived to the Carpathian Basin from the Eurasian Steppes and significantly influenced its political and ethnical landscape, however their origin remains largely unknown. In order to shed light on the genetic affinity of above groups we have determined Y chromosomal haplogroups and autosomal loci, suitable to predict biogeographic ancestry, from 49 individuals, supposed to represent the power/military elit. Haplogroups from the Hun-age are consistent with Xiongnu ancestry of European Huns. Most of the Avar-age individuals carry east Eurasian Y haplogroups typical for modern north-eastern Siberian and Buryat populations and their autosomal loci indicate mostly un-admixed Asian characteristics. In contrast the conquering Hungarians seem to be a recently assembled population incorporating un-admixed European, Asian as well as admixed components. Their heterogeneous paternal and maternal lineages indicate similar supposed phylogeographic origin of males and females, derived from Central-Inner Asian and European Pontic Steppe sources.


Results
We selected 168 phylogenetically informative Y chromosome SNP-s 13 defining all major Hg-s and the most frequent Eurasian sub-Hg-s, as well as the following 61autosomal SNP-s: 25 HirisPlex markers suitable for eye and hair color prediction 14 , two SNP-s linked to adult lactase persistence 15 and 34 ancestry informative markers (AIMs) 16 , as listed in Supplementary Tables S1 and S2. DNA segments containing above SNP-s were enriched from the next generation sequencing (NGS) library of each sample with hybridization capture. Anthropological males were selected from the cemeteries, but as control we also included four females for which the presence of Y-chromosomal reads can reveal contamination. Analysis of women control samples revealed low contamination, having negligible Y-chromosomal reads but comparable autosomal reads to males (Supplementary Tables S1 and  S2) with one exception, KEF1/10936 genetically turned out to be a male despite its anthropological description as female (Supplementary Table S6). Negligible contamination levels for 26 of the 49 libraries had been demonstrated before 10 and low contamination is also inferred from unambiguous Hg classifications with very few contradicting SNP-s, most of which can be explained by postmortem ancient DNA modifications (Supplementary Table S1). Besides, transition patterns of DNA molecules and library fragment size distributions were typical for ancient DNA and we also estimated low contamination by calculating the proportion of reads which did not correspond to the consensus sequence in diagnostic positions (Supplementary Table S3). Moreover we detected a significant number of east Eurasian Hg-s which had a negligible chance to have been derived from a contaminating source of European researchers. We obtained low to medium average coverage of the targeted loci, (Supplementary Tables S1, S2 and S3) and Y-Hg determination was consistent from 46 males (Supplementary Table S1) which is summarized in Fig. 1 and phylogeographic distribution of Conqueror lineages is displayed on Fig. 2. by the Hg of our Hun/1 sample, derived from Transylvania. One Conqueror sample KEF1/10936 belongs to Q1a-F1096, possibly to the Q1a1-F746 sister clade of Q1a2-M25, which was not tested in our experiment. Q1a1 is present in East Asia; Mongolia, Japan, China, Korea and its presence in the Kenézlő (KEF) graveyard imply a common substrate of Huns and Conquerors.
Hun/3 belongs to Hg R1a1a1b2a2-Z2124, a subclade of R1a1a1b2-Z93, the east Eurasian subbranch of R1a. Today Z2124 is most frequent in Kyrgyzstan and Afghanistan, but is also widespread among Karachai-Balkars and Baskhirs 18 . Z2124 was widespread on the Bronze Age steppe, especially in the Afanasievo and Sintashta cultures 19 and R1a detected in Xiongnus 20,21 very likely belong to the same branch. Two samples from the Karos Conqueror cemeteries (K1/3286 and K2/61) were also classified as R1a-Z2124 and two Avar age individuals (DK/701 and MM/227) belong to the same R1a1a1b2a-Z94 branch but marker Z2124 was not covered in latter samples.
Two Avar samples belonged to Hg C2-M217, which is found mostly in Central Asia and Eastern Siberia. Presence of this Hg had also been detected in the Conquerors, as the Karos2/60 individual belongs to C2 (Fóthi unpublished).
All N-Hg-s identified in the Avars and Conquerors belonged to N1a1a-M178. We have tested 7 subclades of M178; N1a1a2-B187, N1a1a1a2-B211, N1a1a1a1a3-B197, N1a1a1a1a4-M2118, N1a1a1a1a1a-VL29, N1a1a1a1a2-Z1936 and the N1a1a1a1a2a1c1-L1034 subbranch of Z1936. The European subclades VL29 and Z1936 could be excluded in most cases, while the rest of the suclades are prevalent in Siberia 22 from where this Hg dispersed in a counter-clockwise migratory route to Europe 23 . Avar sample MM/58, did not go into any of the tested M178 subclades, while only N1a1a2 could be excluded for the KB/300 Avar khagan due to low coverage. All the 5 other Avar samples belonged to N1a1a1a1a3-B197, which is most prevalent in Chukchi, Buryats, Eskimos, Koryaks and appears among Tuvans and Mongols with lower frequency 22  West Eurasian Hg-s. The west Eurasian R1a1a1b1a2b-CTS1211 subclade of R1a is most frequent in Eastern Europe especially among Slavic people. This Hg was detected just in the Conqueror group (K2/18, K2/41 and K1/10). Though CTS1211 was not covered in K2/36 but it may also belong to this sub-branch of Z283.
Hg I2a1a2b-L621 was present in 5 Conqueror samples, and a 6 th sample form Magyarhomorog (MH/9) most likely also belongs here, as MH/9 is a likely kin of MH/16 (see below). This Hg of European origin is most prominent in the Balkans and Eastern Europe, especially among Slavic speaking groups. It might have been a major lineage of the Cucuteni-Trypillian culture and it was present in the Baden culture of the Calcholitic Carpathian Basin 24 .
I1-M253, identified from one Conqueror sample is a northern European Hg found mostly in Scandinavia and Finland and might have originated from this region during the Mesolithic. It has somewhat similar distribution to R1b-U106 associated with Germanic speaking populations.
Three out of 4 samples in the small Karos3 cemetery belonged to Hg R1b1a1b1a1a1-U106 setting apart this cemetery from all other groups, except for the Hun/2 sample which is the only other one with this Hg. Hg U106 is considered a "Germanic" branch as it is most significant today in Germany, Scandinavia, and Britain, and rare in Eastern Europe (Supplementary Table S4). Its ancestral branch Hg R1b1a1b-M262 is assumed to have emerged www.nature.com/scientificreports www.nature.com/scientificreports/ in the Pontic-Caspian Steppe and arrived to Europe with Bronze Age migrations 25 . Its presence in Hun and Conqueror samples may derive from Goths, Gepids or other German allies of the Huns.
We detected R1b1a1b1a1a2b-U152 in one sample from the Sárrétudvari Conqueror cemetery, which represent rather commoners than the elite. U152 is the Italo-Celtic R1b branch, concentrated around the Alps, and which was present in the Carpathian Basin before the conquer, so did not necessarily arrived with the Conquerors.
The mediterranean haplogroup E1b1b1a1b1a-V13 was detected in an Avar (SzK/239) and a Conqueror (K2/6) sample, while this marker was not covered in another sample (K1/13, E1b1b-M215). This Hg originated in the Middle East and migrated to the Balkans and Western Asia during the Bronze Age.
The PSZ/1 Avar leader belongs to Hg G2a-P15, while K2/33 to the G2a2b-L30 subbranch. Hg G2a is originated from Anatolia/Iran, now it is most common in the Caucasus region and its arrival to Europe is associated with the spread of Neolithic farmers 26 .
One Conqueror sample belongs to Hg J1-M267 and another to J2a1a-L26. Both J1 and J2 lineages are most frequent around the Middle East-Caucasus and probably originated from this region 27 , then expanded with pastorists prior to the Neolithic.
Autosomal SNP-s. We have predicted eye, hair and skin color phenotypes from 25 HirisPlex SNP-s, also suitable to predict non-European ancestry 14 , as summarized in Figs 3 and 4. Samples from different archaeological cultures and cemeteries showed a remarkable pattern of phenotypic distribution. All Hun and Avar  Table S2), gray shaded values predict non-European ancestry. Most likely origin of individuals was calculated from 34 AIMs (Supplementary Table S2) with two methods; multinomial logistic regression and naive Bayesian classifier (supposing Hardy-Weinberg equilibrium). "Cannot be classified" indicates that maximum probabilities of most likely ancestral origin are below 0.34. Names of individuals carrying SNP variants associated with lactase persistence are highlighted with green. Abbreviations are the following: NA = lack of data, EU = Europe, EA = East Asia AF = Africa. www.nature.com/scientificreports www.nature.com/scientificreports/ age samples had inherently dark eye/hair colors, DK/701 being the only exception (Fig. 3). Moreover 6/14 Avar age samples were characterized with >0,7 black hair; >0,99 brown eye p-values, inferring 86,5% probability of non-European biogeographic ancestry 14 in agreement with their anthropological, archaeological and historical evaluation. In contrast the Conquerors showed a wide variety of phenotypes clustered by cemeteries (Fig. 4). All individuals from the Sárrétudvari (SH), Magyarhomorog (MH) and majority from the Kenézlő (KEF) graveyards displayed European phenotypic patterns; blue eye and/or light hair with pale skin. In the three Karos cemeteries darker eye/hair colors predominated, 4/20 individuals having p-values consistent with non-European origin, www.nature.com/scientificreports www.nature.com/scientificreports/ nevertheless 5/20 individuals had light hair color indicating a rather mixed origin of this population, concurrent with their mtDNA and Y chromosomal Hg composition.
We have also determined 34 AIMs, which can inform about the likely geographic origin of individuals, and used the Snipper App suite version 2.5 portal 16 to assign biogeographic ancestry. We predicted ancestry with both the naive Bayesian classifier and multinomial logistic regression (MLR) algorithms, as these make different assumptions about genetic equilibrium 28 , and listed the results on Figs 3 and 4. The AIM-s results fairly matched and complemented phenotypic information. All Hun age individuals revealed admixture derived from European and East Asian ancestors, while 8/15 Avar age individuals showed predominantly East Asian origin with both methods, 4 individuals were definitely European, while two showed evidence of admixture. The KFP/31 sample gave contradicting results due to low coverage.
Conqueror samples from the Magyarhomorog (MH) and Sárrétudvari (SH) cemeteries showed mostly European ancestry in agreement with their phenotypes and Y Hg-s, though MLR detected a significant east Asian ancestry component and the SH/103 women was classified east Asian despite her blond hair. The Karos (K) and Kenézlő (KEF) populations were profoundly admixed, comprising individuals of purely East Asian, European and mixed origin in nearly identical proportions, again in agreement with results obtained from uniparental and phenotypic markers.
The determined variable autosomal loci are also suitable to exclude possible direct (parent-child, sibling) genetic relatedness 29 , thus we compared the autosomal genotypes of all individuals sharing either maternal or paternal Hg-s. Direct kinship could not be excluded between the MH/9 and MH/16 individuals with identical phenotypes, suggesting that MH/9 probably also belongs to Hg I2a1a2b-L621, despite its uncovered L621 marker. KEF2per1027 and KEF2per1045 were probably brothers as they had identical mitogenomes, Y Hg-s and blue eye color besides sharing autosomal alleles. The same applies to K3/1 and K3/13 individuals who were probably also brothers. We could not exclude possible direct paternal relationship between K2/36, K2/18 and K2/41 but the first two samples had unsatisfactory coverage to make a strong statement.
We have tested two SNP-s (rs4988235 and rs182549) associated with the adult lactase persistence phenotype in Europe 15 . Individuals carrying derived alleles in these loci are able to digest lactose in dairy products during adulthood without symptoms of lactose intolerance. Allele frequency of the persistence genotype varies throughout Eurasia 30 , reaching above 90% at some parts of Northwestern Europe, around 80% in present day Hungary, but drops below 30% in Central Asia and even lower in East Asia. We detected the derived persistence allele in all of the studied groups (Figs 3 and 4, Supplementary Table S2); 1/3 of the Hun period individuals, 2/14 of the Avar period individuals, and 6/31 of the Conqueror period individuals carried persistence alleles. It is remarkable that the persistence genotype seems to be associated with European origin, as all of the carriers were predicted to have predominantly European ancestors. This is in agreement with a previous study 31 , which found that all of the 11% Conqueror samples with persistence genotype carried European mtDNA Hg H. In addition all carriers were heterozygous, 6 of them for both SNP-s, but three of them carried just the rs182549 (-22018 G > A) derived allele suggesting previous admixture with non-carriers, possibly derived from East Eurasia.
Population genetic analysis. The studied Conqueror group very likely represent real populations, as 24 of the 29 samples came from 4 nearby cemeteries (Karos 1,2,3 and Kenézlő) with identical archaeological and anthropological groupings and the studied Magyarhomorog individuals were also categorized as belonging to the same early Conqueror elite. The Avar group was assembled from several different cemeteries of a wider timespan, thus they cannot represent the Avar period population of the Carpathian Basin, however their relatively homogenous Hg distribution indicate that the Avar elite embodied the same east Eurasian sub-population throughout their reign, so it appeared to be meaningful to include them in the analysis.
In order to find the most similar populations to our studied ones, we compared the Hg distribution of the Avar and Conqueror elite to that of 78 modern Eurasian populations (Supplementary Table S4) and represented their relations on MDS plot displayed on Fig. 5. Only one individual was included in the analysis from first degree relatives. In order to increase resolution, we separately calculated the frequencies of each sub-Hg-s which are present in our samples, while frequencies of all other subclades were combined as listed in Supplementary Table S4 In order to better discern the position of the Conquerors we redraw the MDS plot without the eastern Asian and Siberian populations (Fig. 6). Though having removed subset of the data MDS imaging in two-dimensions inevitably rearranged the remaining components, but the general organization of the population clusters remained unchanged: Western-Northern Europeans map to the left, Central-eastern European and Balkan people around the middle, Central Asians to the top right, and Caucasus-Middle East populations to bottom right. Conquerors remained between Central Asians and eastern Europeans, with smallest weighted Euclidean distances from Bashkirs (BSH), Hungarians (HUN), Tajiks (TJK), Estonians (EST), Kazakhs (KAZ), Uzbeks (UZB) and Slovaks (SVK).
The considerable frequency of Hg N1a in Conquerors and especially in Avars facilitates another analysis in which the frequency distribution of their N1a subbranches can be compared to that of all Eurasian populations carrying this Hg, which is described in 22 . The N1a database of 22

Discussion
The origin and composition of the Conqueror paternal lineages fairly mirrors that of their maternal ones 10 ; 20,7% of the Y-Hg-s originated from East Eurasia, this value is 30,4% for mtDNA; proportion of west Eurasian paternal lineages is 69% compared to 58,8% for mtDNA; while proportion of lineages with north-western European and Caucasus-Middle East origin are nearly the same affirming that both males and females of similar origin migrated together. Both MDS analysis of the entire Conqueror Y chromosome pool and PCA of their N1a lineages indicates that their admixture sources are found among Central Asians and eastern European Pontic Steppe groups, a finding comparable to what had been described for maternal lineages 10 . Composition of the Conqueror paternal lineages is very similar to that of Baskhirs, while their maternal composition was found most similar to Volga Tatars 10 . These modern populations are located next to each other, have similar prehistory 32 and genetic structure  www.nature.com/scientificreports www.nature.com/scientificreports/ derived from the same admixture sources detected in the Conquerors. Moreover it must be noted that Bashkirs were not represented in the mitogenome database while Volga Tatars are missing from our Y-chromosome database due to lack of data, but their N1a distribution is quite similar (Fig. 7), thus mtDNA results are in accord with Y-chromosomal ones.
The Conqueror-Bashkir relations are also supported by historical sources, as early Hungarians of the Carpathian Basin were reported to be identical to Baskhirs by Arabic historians like al-Masudi, al-Qazwini, al-Balhi, al-Istahri and Abu Hamid al-Garnati 33 , latter visited both groups at the same time around 1150 AD and used the term Bashgird to refer to the Hungarians in the Carpathian Basin. In addition parallels were found between several Conqueror and Bashkir tribe names and Bashkiria has been identified with Magna Hungaria, the motherland of Conquerors 34 .
We identified potential relatives within Conqueror cemeteries but not between them. The uniform paternal lineages of the small Karos3 (19 graves) and Magyarhomorog (17 graves) cemeteries approve patrilineal organization of these communities. The identical I2a1a2b Hg-s of Magyarhomorog individuals appears to be frequent among high-ranking Conquerors, as the most distinguished graves in the Karos2 and 3 cemeteries also belong to this lineage. The Karos2 and Karos3 leaders were brothers with identical mitogenomes 10 and Y-chromosomal STR profiles (Fóthi unpublished). The Sárrétudvari commoner cemetery seems distinct from the others, containing other sorts of European Hg-s. Available Y-chromosomal and mtDNA data 10 from this cemetery suggest that common people of the 10 th century rather represented resident population than newcomers. The great diversity of Y Hg-s, mtDNA Hg-s, phenotypes and predicted biogeographic classifications of the Conquerors indicate that they were relatively recently associated from very diverse populations.
In contrast the studied Avar military leader group had a much more uniform origin. The Avar group carried predominantly East Eurasian lineages in accordance with their known Inner Asian origin inferred from archaeological and anthropological parallels as well as historical sources. However the unanticipated prevalence of their Siberian N1a Hg-s, sheds new light on their prehistory. Accepting their presumed Rouran origin would implicate a ruling class with Siberian ancestry in Inner Asia before Turkic take-over. The surprisingly high frequency of N1a1a1a1a3 Hg reveals that ancestors of contemporary eastern Siberians and Buryats could give a considerable part the Rouran and Avar elite, nevertheless a larger sample size from more Avar cemeteries are needed to clarify their exact composition.
The genetic profile of the Avar and Conqueror leader groups seems considerably different, as latter group is distinguished by the significant presence of European Hg-s; I2a1a2b-L621, R1b1a1b1a1a1-U106 and the Finno-Permic N1a1a1a1a2-Z1936 branch. Their Siberian N1a1a1a1a4 subclade also points at different source populations among ancestors of Yakuts, Evenks and Evens. Nevertheless the east Eurasian R1a subclade, R1a1a1b2a-Z94 seems to be a common element of the Hun, Avar and Conqueror elite. In contrast to Avars, all three Hun lineages have parallels among the Conquerors, but strong inferences cannot be drawn due to small sample size.
It is generally accepted that the Hungarian language was brought to the Carpathian Basin by the Conquerors. Uralic speaking populations are characterized by a high frequency of Y-Hg N, which have often been interpreted as a genetic signal of shared ancestry. Indeed, recently a distinct shared ancestry component of likely Siberian origin was identified at the genomic level in these populations, modern Hungarians being a puzzling exception 35   www.nature.com/scientificreports www.nature.com/scientificreports/ of the examined Conquerors belonged to the L1034 subclade of Z1936, while all of the Khanty Z1936 lineages reported in 36 proved to be L1034 which has not been tested in the 22 study. Population genetic data rather position the Conqueror elite among Turkic groups, Bashkirs and Volga Tatars, in agreement with contemporary historical accounts which denominated the Conquerors as "Turks" 37 . This does not exclude the possibility that the Hungarian language could also have been present in the obviously very heterogeneous, probably multiethnic Conqueror tribal alliance.

Methods
Ancient DNA work was performed in the specialized ancient DNA (aDNA) facilities of the Department of Genetics, University of Szeged, Hungary. Petrous bones were used for DNA extraction from each sample. All libraries were made from partial uracil-DNA-glycosylase (UDG) treated DNA extracts. Details of the aDNA extraction, NGS library construction, sequencing and sequence analysis methods are given in 10,38 . For designing oligonucleotide probes we selected 120 nucleotide (nt) long sequences around each targeted SNP (listed in Supplementary Tables S1 and S2), with the SNP in central position, and designed 80 nt long oligos with 3x tiling and 60 nt overlap. Selected sequences were sent to MYcroarray (Ann Arbor, MI, USA, now Arbor Biosciences) for quality control. Sequences with acceptable quality were used to synthetize a custom biotinylated MYbaits RNA oligonucleotide probe set at MYcroarray.
We tested the endogenous DNA content of each library with low coverage shotgun sequencing (Supplementary Hybridization was done at 65 °C for 72 hours, then bead-bound enriched libraries were resuspended in 24 µl water and released from the beads in a 50 µl PCR reaction containing 1 X KAPA HiFi HotStart ReadyMix and 1000 nM each of P5 forward and P7 reverse indexing primers. PCR conditions were: 98 °C 2 min, 16 cycles of 98 °C 20 sec, 60 °C 30 sec, 72 °C 30 sec, followed by a final extension of 72 °C 1 min. The captured and amplified library was purified on MinElute column and eluted in 16 µl EB. Quantity and quality measurements were performed using Qubit and TapeStation (Agilent).
Sequencing was done on Illumina MiSeq platform (2 × 150 bp paired end reads) or NextSeq platform (1 × 150 bp single end reads). Adapters were trimmed with the cutadapt software 40 in paired or single end read mode. Read quality was assessed with FastQC 41 . Sequences shorter than 25 nucleotide were removed from the dataset, the resulting reads were mapped to the GRCh37.75 human genome reference sequence using the Burrows Wheeler Aligner (BWA) v0.7.9 software 42 with the BWA mem algorithm and default parameters. In case the read counts were insufficient, according to duplicate statistics and library complexity, new library preparation and/or new NGS sequencing were performed and the aligned BAM files from multiple runs were merged by the samtools (version 1.3.1) merge algorithm 43 . Duplicates were removed by PICARD tools (version 1.113) 44 MarkDuplicates algorithm and <MAPQ 50 reads were excluded from downstream analysis. The SNPs were genotyped by freebayes (version 1.2.0) [http://arxiv.org/abs/1207.3907] using the "-variant-input" parameter with a VCF file and the "-t" flag with a bed file containing the coordinates of the SNPs. For each samples the reference and alternate allele counts for each positions were exported to a joint XLS file, and variants with low allele counts were also manually evaluated in IGV.
Five N1a Y-chromosomal sub-Hg-s were determined by amplicon sequencing. Annealing temperatures were optimized for each primers and non-template PCR controls were done for each reaction. PCR fragments were purified from agarose gels and sequenced on Ion Torrent Personal Genome Machine (PGM). Primers with amplified Hg defining markers, annealing temperatures and product length are listed below: