Rare human mitochondrial HV lineages spread from the Near East and Caucasus during post-LGM and Neolithic expansions

Of particular significance to human population history in Eurasia are the migratory events that connected the Near East to Europe after the Last Glacial Maximum (LGM). Utilizing 315 HV*(xH,V) mitogenomes, including 27 contemporary lineages first reported here, we found the genetic signatures for distinctive movements out of the Near East and South Caucasus both westward into Europe and eastward into South Asia. The parallel phylogeographies of rare, yet widely distributed HV*(xH,V) subclades reveal a connection between the Italian Peninsula and South Caucasus, resulting from at least two (post-LGM, Neolithic) waves of migration. Many of these subclades originated in a population ancestral to contemporary Armenians and Assyrians. One such subclade, HV1b-152, supports a postexilic, northern Mesopotamian origin for the Ashkenazi HV1b2 lineages. In agreement with ancient DNA findings, our phylogenetic analysis of HV12 and HV14, the two exclusively Asian subclades of HV*(xH,V), point to the migration of lineages originating in Iran to South Asia before and during the Neolithic period. With HV12 being one of the oldest HV subclades, our results support an origin of HV haplogroup in the region defined by Western Iran, Mesopotamia, and the South Caucasus, where the highest prevalence of HV has been found.

1. In several subhaplogroups, Italian lineages coalesce with the Near East and the Caucasus lineages at the bases of the clades. This pattern is strikingly similar across several pre-Neolithic and Neolithic subclades (HV1a1, HV1a2, HV1b3, HV4a2, HV4b, HV18). 2. Certain HV*(xH,V) subclades are completely absent in Europe. Theses subclades are primarily composed of lineages found in the Near East, the Caucasus and South Asia. The coalescent times of most of these subclades point at their pre-Neolithic (HV12, HV14) and early Neolithic (HV1a3, HV1b1) origins. This category also includes subclades first described in this study (HV1b5, HV12a1a, HV12b1b).
We focus our analysis on these two groups of subclades, as they provide insight into the earliest Europe-Near East and Near East-South Asia branching events within the HV*(xH,V) clade, as well as the time and place of origin of the oldest non-European HV*(xH,V) subclades.
HV1 is one of the most prevalent and most diverse subclades of HV*(xH,V). Previous studies have estimated the age of HV1 at between 17 and 28 kya 7,26,29 , making it one of the oldest subclades of HV. A vast majority of the HV1 lineages belong to HV1a and HV1b subclades, dated similarly at 17 kya 7,29 . Using 40 mitogenomes representing more populations than previous studies, we estimate the age of HV1a at 16.3-18.8 kya (corrected mutation rate) and 11.0-17.4 kya (aDNA-calibrated rates). Of the three HV1a branches, HV1a1 is the oldest. According to the median-joining network analysis, an Armenian mitogenome represents the ancestral node of HV1a1 (Fig. 2). The vast majority of lineages in this subclade belong to the HV1a1a subclade, the exception being four mitogenomes representative of Armenians, Assyrians and Azeris. The ancestral node of HV1a1a is also represented by an Assyrian mitogenome. Notably, an Italian lineage branches off the root of HV1a1a, representing a schism between the ancient Italian HV*(xH,V) lineages, and the rest of lineages descending from the Assyrian mitogenome. Our Bayesian estimation puts the age of this branch at 11.4-15.6 kya.
Within haplogroup HV*(xH,V), HV1a1a is unusual for its extraordinary geographic distribution. Although rare, HV1a1a is present in the Caucasus, the Near East, Central and Northeast Asia, North Africa, and Italy. A previous study designating Siberian-specific HV1a1a1 and HV1a1a2 lineages has speculated about the West Asian origin of these subclades 30 . We were able to identify a sister subclade of HV1a1a1, hereby designated as HV1a1a3, comprising Iranian and Armenian lineages. Sharing the deletion at position 249, the HV1a1a1-HV1a1a3 clade (Supplementary Fig. S1) has an estimated age of 2.9-4.2 kya. We can speculate that the Mongolian/Siberian HV1a1a1 subclade is the result of migrations from Mesopotamia and Iran to Central Asia during the expansion of the Persian Achamenid Empire 2.5 kya 31 . Within HV1a1a, we also introduce an older subclade, HV1a1a4, which represents shared ancestry of its Italian and Yemeni lineages at 6.1-7.9 kya ( Supplementary Fig. S1).
The HV1a2 and HV1a3 subclades are distinctive from HV1a1 by virtue of their predominant presence in Africa and Yemen. The phylogeny of the HV1a2, the larger and older of the two (6.8-12.5 kya), shows some parallels with that of HV1a1: Assyrian, Armenian, and Italian mitogenomes are present, with the latter comprising a deep branch that separates it from the root with eight mutations (Fig. 2).
According to Phylotree Build 17 2 , HV1b contains two main branches: HV1b1, found in Arabia and East Africa, and a larger subclade defined by the T152C! mutation. HV1b-152 is subsequently divided into HV1b2 and HV1b3, present in Eastern Europe and Armenia, respectively. Our Assyrian samples informed the phylogeny of HV1b subclade by revealing four new haplotypes (Fig. 2). Within HV1b-152, we were able to identify two new branches: HV1b4 (defined by mutations T4047C and C10095T) and HV1b5 (defined by C16234T) (Supplementary Fig. S1). With regard to the relationship between Italian and Near Eastern lineages, the phylogeny of HV1b resembles that of previously described HV1a subclades; an Italian lineage forms an extremely deep branch off the root of HV1b (12.0-17.2 kya), and several Assyrian and Armenian mitogenomes form the most ancestral nodes. Curiously, three Sardinian mitogenomes 32 represent a younger branch (8 kya) within the otherwise exclusively Armenian HV1b3 subclade.
Unlike HV1, HV4 is a primarily European clade, with its highest frequencies reported in Eastern Europe. Within HV4, however, HV4a2 and HV4b are known for their different geographic distributions ( Supplementary  Fig. S3). Unlike the exclusively European HV4a1, HV4a2 has been found among Assyrians and Armenians, as well as in Jordan, Egypt and Italy. Here we doubled the number of full mtDNA sequences representing HV4a2 by adding seven new Assyrian mitogenomes. We estimate the age of HV4a2 and the exclusively Assyrian HV4a2a subclades at 8.2-12.5 and 3.3-4.2 kya, respectively. We also identified an Assyrian branch within HV4b, a small subclade previously reported from the Caucasus (Supplementary Fig. S1).
With their territory stretching from the South Caucasus to South Asia, the phylogeography of HV12 and HV14 haplogroups reveal a distinctive pattern among the HV*(xH,V) subclades. Given the ages of these subclades, HV12 and HV14 provide important information about the origin and early expansion of HV*(xH,V). Particularly concerning HV12, previous studies have estimated its origin at 19.4 to 21.3 kya, a short time after the coalescence of the entire HV haplogroup around 21.9 to 28 kya 7,29 . Using a larger number of sequences, including five new Assyrian mitogenomes, we estimate the age of HV12 at a similar 19.2 kya (ρ statistics, corrected mutation rate), although Bayesian analysis suggests a younger (16.6 kya) age. The median-joining network (Fig. 3) illustrates a pattern of geographic separation between the two main branches of HV12. HV12a, the older subclade, is exclusive to South Caucasus and Anatolia. HV12b is the younger branch (8.8-14.0 kya) and has been found in Iran, India, and sporadically as far as Central and Southeast Asia. A lineage branching off the root of HV12 consists of a single Qashqai mitogenome. Nomadic pastoralists of southern Iran, Qashqais are the Turkic-speaking people who previously resided in the Iranian section of the South Caucasus 33 .
A rare haplogroup, HV14 has been previously reported from Iran, India, Sri Lanka and Yemen, as well as one individual from ancient Turkmenistan (4.5 kya) 22 . A subclade of this haplogroup (HV14a1) is surprisingly prevalent (5%) in the southernmost states of India as well as Sri Lanka, with its age estimated at 11 kya 34  www.nature.com/scientificreports www.nature.com/scientificreports/ of only eight HV14 mitogenomes. We estimate the age of HV14 haplogroup at 12.1-13.4 kya (corrected mutation rate) or 8.4-12.6 kya (aDNA-calibrated rates). Although the small number of available mitogenomes makes any conclusive interpretation difficult, the phylogeography of HV14 nevertheless suggests southern Iran as a likely point of origin for this haplogroup. Moreover, two highly differentiated HV14 mitogenomes reported from a province in southern Iran (Kerman) suggest an old age for HV14 in this region. Additionally, one of the Iranian mitogenomes and the ancient Turkmenistan individual form together a "Pre-HV14" subclade ( Supplementary  Fig. S2) that had reached Central Asia 4.5 kya 22 .
The Near Eastern and South Asian dispersion of HV14 is particularly curious, given that it is a subclade of the otherwise European HV-16311. Furthermore, one ancestral lineage of HV14 in fact lacks the haplogroup-defining T16311C mutation. With T16311C being highly recurrent, we examined the alternative phylogeny of HV14 as a clade independent from HV-16311. Utilizing all available HV-16311 mitogenomes, a total of 135, our analysis did not provide evidence against the current position of HV14 as a subclade of HV-16311 ( Supplementary Fig. S2,  Fig. 4). Given the phylogeography of HV-16311 and the age estimates, it is likely that HV-16311 first emerged in Anatolia around 17 kya. While most of HV-16311 lineages soon expanded westward into Europe, probably during www.nature.com/scientificreports www.nature.com/scientificreports/ the post-LGM and Neolithic migrations, an eastward migration introduced the HV14 or its precursors to Iranian plateau by 12 kya.
Revisiting the phylogeography of HV*(xH,V) in the light of the aDNA, archaeological, and paleoclimatological discoveries, our findings are as follows:

HV*(xH,V) subclades shared between Italy and the Near East characterize two distinct migratory events.
a. Post-Glacial migration from the Near East to Europe: Parallel phylogenies of HV1a1, HV1a2, HV1b and HV4b point to a division of exclusively Italian lineages from the Caucasus/Near East lineages no later than 12 kya. The phylogeography of these subclades ( Supplementary Fig. S1) suggests an expansion from the South Caucasus or northern Mesopotamia into Europe, as well as to Africa and Asia. Our proposed ages of 12-16 kya for branching of the oldest European HV*(xH,V) lineages from the Caucasus/Near East lineages correlate with the start of the first post-LGM warm period in Europe around 14.7 kya 35 . Studies of other mtDNA haplogroups have provided substantial evidence for a migratory event of similar age being responsible for introducing a variety of Near Eastern haplogroups (I, W, J-T) to Europe 36,37 . The aDNA evidence also reveals a major change in the mitochondrial makeup of Europe around 14.5 kya 38 , although to date HV*(xH,V) has not been found in pre-Neolithic Europe or Near Eastern sites. We attribute the lack of aDNA evidence to the small population size of HV*(xH,V) clade before the Neolithic period (Supplementary Fig. S4), and predict that further study of pre-Neolithic sites in Southern Europe, especially Italy, will reveal the presence of HV*(xH,V) lineages, likely from HV1a, HV1b and HV4b subclades. Given the significance of South Caucasus in phylogeography of HV*(xH,V), it is noteworthy that genomic similarities have been observed between the Caucasus "Satsurblia" individuals (10-13 kya) and the "Villabruna Cluster" Europeans (7-14 kya) 39 . It should be noted that these results do not necessarily indicate a relationship between these two groups, and data can also be explained by population structure. b. Neolithic expansion into Europe: Several HV*(xH,V) subclades, including HV1b3 and HV18, point at an early Neolithic connection between Europe and the Near East. More specifically, these subclades represent branching events at 7-11 kya between Italian and South Caucasus lineages (Supplementary Fig. S1). Ancient DNA studies have demonstrated that the first European farmers were descended from Neolithic northwestern Anatolians and Aegeans [40][41][42] , and that the Neolithic Western Iranians are not ancestral to Neolithic Europeans 43,44 . That said, the possible role of intermediate southeastern and central Anatolian populations in Neolithic migration to Europe has not been ruled out 44,45 and a recent study has revealed a secondary source of ancestry for Neolithic Europeans 42 . With its Caucasus Hunter Gatherer (CHG)-rich ancestry, this source population is likely related to Neolithic Central or eastern Anatolia. Given the heavy presence of closely related Armenian and Assyrian lineages near the root of these HV*(xH,V) subclades, it is plausible that these lineages were introduced to Europe by Neolithic populations from southeastern or central Anatolia that are yet to be represented in aDNA studies.  comprises two major branches (Fig. 3). The eastern branches of these haplogroups, HV12b and HV14a1, split from their Near Eastern bases 8.8-14.0 and 5.0-10.6 kya, respectively. Comprising very deep branches of Indian and Iranian lineages, HV12b is unique among HV*(xH,V) subclades in characterizing a pre-or early Neolithic expansion of this haplogroup eastward of Iranian plateau. We also identified a new branch within HV12b; originated 10-13.4 kya, HV12b1b connects Iran to Central (Altai Kazakh) and Southeast Asia (Myanmar). In case of HV14, due to the prevalence of HV14a1 among Dravidian-speaking populations of southern India and Sri Lanka, it has been suggested that this subclade represents the migration of proto-Dravidian people from the Near East to South Asia around 10 kya 34 . That said, the origin of Dravidian languages and the possibility of a Near Eastern connection is highly debated 48 . Nevertheless, the phylogeography of HV14 points to an origin in Iran, and the age of HV14a1 in India suggest its presence in South Asia prior to the Bronze Age expansion of Indo-Aryan languages.
HV1b2 is an Ashkenazi Jewish subclade with links to Northern Mesopotamia. The new lineages introduced in this study have implications on the possible origin of HV1b2, a star-like subclade of HV1b-152 that is mostly comprised of Ashkenazi Jewish lineages 49 (Fig. 2). Our updated phylogeography of HV1b-152 suggests its pre-Neolithic origin in South Caucasus or northern Mesopotamia, with HV1b2 positioned close to two Assyrian branches. Given our age estimates for HV1b2, it is conceivable that this clade originated among the displaced Jewish communities during Assyrian and Babylonian captivities (2.5-2.7 kya), and likely remained exclusive to the Jewish populations that settled in Upper Mesopotamia, including Adiabene and Osroene Kingdoms, up to 1.7 kya 50 . Perhaps a similar course of events has resulted in 58% of Georgian Jews belonging to a single haplotype of HV1a1a 51 , a subclade also known to be present among Assyrians, Armenians and Georgians. Being phylogenetically independent from nuclear DNA, mtDNA is particularly valuable when studying complex demographic events, such as sexually biased migrations and unilateral population structuring 52,53 . A growing number of mitogenomic studies prove the effectiveness of this approach, especially when analysis focuses on a specific subclade [54][55][56][57] . Given the underrepresentation of most Near Eastern populations in genomic studies, the relative wealth of full mitogenome data provides a great opportunity to study the complex and long history of human dispersal in Near East. In case of HV*(xH,V), although relatively uncommon, this haplogroup is highly informative to study of prehistoric migrations that connected the Near East to Europe and South Asia. The parallel phylogenies of several HV*(xH,V) subclades reveal a connection between the Italian Peninsula and the South Caucasus populations, likely resulting from at least two (post-LGM, Neolithic) waves of migrations. These findings add to aDNA evidence suggesting a secondary, CHG-rich source of ancestry for Neolithic Europeans. Similarly, the eastern subclades of HV*(xH,V) provide insight into the ancient migratory events between Near East and South Asia.
While the accumulation of full mitogenomes in the past decade has largely advanced our understanding of the phylogeography of major haplogroups, the bulk of mitogenomes comes from studies of European populations, both contemporary and ancient. Given the abundance of novel mtDNA lineages discovered by every study of Near Eastern populations, further research in this region is essential to better understanding of the origin and dispersal of West Eurasian mtDNA clades, and of important demographic events that have shaped the genetic structure of West Eurasian populations.

Materials and Methods
Participants were selected from Assyrian residents of the United States who could trace their maternal lineages back for at least three generations, and were able to identify their ancestral town or village. All participants provided their written informed consent as per the consent form approved by Binghamton University's Human Subjects Research Review Committee (protocol number: 3000-13). All methods were performed in accordance with the Binghamton University's Human Subjects Research Review Committee. Buccal samples were collected from the volunteers using Oragene OGR-500 Saliva Collection Kit (DNA Genotek, Canada) and DNA was extracted according to the manufacturer' instructions. Prior to high-throughput sequencing, mtDNA content of samples was enriched using a long-PCR method with a novel set of primers that amplified mtDNA in two segments of 8,476 bp (primers 2508F-CATCACCTCTAGCATCACCAGT and 10983R-AGGGGTAGGAGTCAGGTAGTT) and 8,460 bp (primers 10837F-CACAACCACCCACAGCCTAAT and 2727R-GGTCTTCTCGTCTTGCTGTGT). These primers were designed with high specificity, in order to avoid amplification of nuclear DNA, particularly mtDNA pseudo-genes. The two amplicons targeted by these primers also have a smaller overlap compared with previous mtDNA long-PCR methods 14,58,59 in order to improve evenly distributed coverage during sequencing. The optimized conditions of two separate reactions were as following: TaKaRa (Clontech) LA PCR buffer II with 25 mM Mg2+ (5 μL), TaKaRa 10 mM dNTP mixture (8 μL), TaKaRa LA Taq polymerase (2.5 units), primers (1.25 μL of each), and dH20 to reach a final volume of 50 microliters.
Libraries were prepared using Nextera ® XT kits (Illumina, Inc.) and sequencing was carried out on Illumina's MiSeq benchtop sequencer using the v2 500-cycle reagent kit. Trim Galore v0.4.0 software (Babraham Informatics) was used for quality filtering (PHRED >30) of reads and trimming of adapters. The paired-end reads were mapped using BWA MEM 60 . Further processing of SAM files was carried out utilizing Samtools 61 along with bcftools and vcfutils 62 . Haplotypes were manually verified using Integrative Genomics Viewer (IGV) v2.3.72 63 . Full mitogenome sequences were aligned using MAFFT v7 64 . Mutations were defined relative to the rCRS, and haplogroups were assigned following PhyloTree Build 17 2 . Network 5.0.0.1 program (Fluxus Technology Ltd, fluxus-engineering.com) was used to construct Median-joining networks 65 . Two different approaches were taken to estimating the coalescence times: (ρ) statistic 66 , and the Bayesian MCMC as implemented in Program BEAST2 67 . For both approaches, times were calculated based on the corrected molecular clock mutation rate 26 , as well as mutation rates calibrated with ancient mitochondrial genomes 27,28 . For Bayesian analysis, jModelTest v2.1 68 determined the Hasegawa Kishino-Yano (HKY) 69 as the best substitution model for the sample. The Strict