East Eurasian ancestry in the middle of Europe: genetic footprints of Steppe nomads in the genomes of Belarusian Lipka Tatars

Medieval era encounters of nomadic groups of the Eurasian Steppe and largely sedentary East Europeans had a variety of demographic and cultural consequences. Amongst these outcomes was the emergence of the Lipka Tatars—a Slavic-speaking Sunni-Muslim minority residing in modern Belarus, Lithuania and Poland, whose ancestors arrived in these territories via several migration waves, mainly from the Golden Horde. Our results show that Belarusian Lipka Tatars share a substantial part of their gene pool with Europeans as indicated by their Y-chromosomal, mitochondrial and autosomal DNA variation. Nevertheless, Belarusian Lipkas still retain a strong genetic signal of their nomadic ancestry, witnessed by the presence of common Y-chromosomal and mitochondrial DNA variants as well as autosomal segments identical by descent between Lipkas and East Eurasians from temperate and northern regions. Hence, we document Lipka Tatars as a unique example of former Medieval migrants into Central Europe, who became sedentary, changed language to Slavic, yet preserved their faith and retained, both uni- and bi-parentally, a clear genetic echo of a complex population interplay throughout the Eurasian Steppe Belt, extending from Central Europe to northern China.


Supplementary Information Text (Linguistics) Lipka Tatars by Alexei Kassian
It is likely that Turkic-speaking Muslims, whose descendants are known as Lipka Tatars [...]

The Lipka Tatars gradually gave up their Turkic language, shifting to the Slavic languages of the surrounding communities: normally to Old Belarusian, sometimes to Polish. As follows from direct textual evidence, in the middle of the 16th century a substantial part of the Lipka Tatars had already abandoned their original language in favor of Belarusian and Polish (Dubiński 1972; Antonovich 1968: 10). It is probable that the original Turkic language (or languages?) was almost totally lost by the community already in the early 17th century.

The Kipchak group is one of the five main groups within the Turkic language family. The other traditionally distinguished groups are Bulghar, Oghuz, Karluk and Siberian, plus some minor taxa; note that the real genealogical classification is apparently more complicated. The Kipchak group consists of at least the following modern languages (the recently divergent lects are grouped):
• Karachay-Balkar, Kumyk,
• Karaim,
• Crimean Tatar (Middle "dialect"),
• Crimean Tatar (Steppe "dialect"),
• Tatar [...]

According to Dybo (2013: 18), the Proto-Kipchak language split into the western (Karachay-Balkar, Kumyk, Karaim, Middle "dialect" of Crimean Tatar) and eastern (the rest of the lects) branches ca. the 9th century AD. Further detailed filiation of the Kipchak lects cannot be proposed with certainty due to the following obstacles:
• dialects of individual languages are still poorly documented (in particular, Swadesh wordlists have not been collected);
• active contact-driven convergent processes within the Kipchak lects, or between the Kipchak and other groups such as Oghuz and Karluk, in the first half of the 2nd millennium AD led to linguistic homoplasy of various kinds;
• some literary languages, which are much better described than their living dialects, are actually somewhat artificial, being full of hidden loans (e.g., in the lexicostatistical tree in Dybo 2013: 18, the languages of the Karluk group, namely Literary Uzbek and Literary Uygur, are included in the eastern Kipchak branch, which seems historically unjustified).
The Kipchak nature of the Lipka Tatar language is well confirmed by historical evidence.
It is known that the main waves of the Lipka Tatars arrived from the Golden Horde and post-Golden Horde khanates, such as the Crimean Khanate, in the 14th-16th centuries (see Pylypchuk 2014 with further references). The languages of the Golden Horde were predominantly Kipchak, including the official language of this state.
In the second half of the 2nd millennium, the Lipka Tatars were culturally and linguistically influenced by the Ottoman Empire, whose dominant languages were Turkish varieties, including the official language of the empire, Ottoman Turkish (Dubiński 1972: 86; Miškinienė 2005). Turkish belongs to the Oghuz group of the Turkic language family.

The haplogroup composition and haplogroup frequencies in the BLT sample are given in Table 1.

Supplementary Information Text (Material and Methods)
In the whole-genome SNP variation analyses, we used six samples of Belarusian Lipka Tatars (BLT) genotyped on the Illumina HumanOmniExpress-24 v1.0 BeadChip, which includes around 730 thousand SNPs. Although this sample size is small, we believe that the conclusions drawn in this study are reliable and reasonably applicable to the BLT population.

Dataset
The BLT samples genotyped genome-wide were selected from different regions of Belarus so as to increase the diversity of the analyzed dataset. According to self-reported information, as well as to our analysis of cryptic kinship, the sampled individuals are unrelated for at least three generations.
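A check for cryptic kinship of this kind can be illustrated with a minimal sketch. The text does not name the method that was used, so the KING-robust kinship estimator is assumed here purely for illustration. The function names, the 0/1/2 genotype coding with -1 for missing calls, and the 0.0442 cutoff (a conventional lower bound for third-degree relatives) are likewise assumptions, not details taken from the study.

```python
import numpy as np

def king_robust_kinship(genotypes):
    """Pairwise kinship coefficients from a (n_samples, n_snps) array.

    Genotypes are coded as 0/1/2 copies of one allele, -1 for missing
    (assumed coding). Uses the KING-robust form
        phi = 1/2 - sum((g_i - g_j)^2) / (4 * min(het_i, het_j)),
    where het_x is the number of heterozygous calls of sample x among
    the SNPs called in both samples.
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    phi = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(i, n):
            called = (g[i] >= 0) & (g[j] >= 0)   # SNPs present in both
            gi, gj = g[i, called], g[j, called]
            min_het = min(np.sum(gi == 1), np.sum(gj == 1))
            if min_het == 0:
                continue                         # estimator undefined
            sq_diff = np.sum((gi - gj) ** 2)
            phi[i, j] = phi[j, i] = 0.5 - sq_diff / (4.0 * min_het)
    return phi

def related_pairs(phi, threshold=0.0442):
    """Index pairs (i < j) whose kinship exceeds the assumed cutoff."""
    n = phi.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if phi[i, j] > threshold]

# Toy data: samples 0 and 1 are duplicates, sample 2 is very different.
toy = np.array([[0, 1, 2, 1],
                [0, 1, 2, 1],
                [2, 1, 0, 1]])
toy_phi = king_robust_kinship(toy)
print(toy_phi)
print(related_pairs(toy_phi))
```

With real array data the estimate would be computed over hundreds of thousands of SNPs, and a sample set would be accepted as unrelated only if no pair exceeded the chosen cutoff.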

Genetic homogeneity of the BLT dataset
Analyses based on genotype data as well as haplotype-based approaches indicate the homogeneity of the BLT dataset used in this study:
a) The PC plot (PC1 vs PC2) shown in Figure 2A in the main text reveals that the BLT individuals do not overlap, yet form a tight cluster of their own.
b) The ADMIXTURE plot (K = 6; Figure 2B in the main text) shows that all six individuals bear similar ancestral proportions, which is another indication of the homogeneity of their autosomal genomes and of the lack of recent admixture with the equally homogeneous, though different, Belarusians.
c) The fineSTRUCTURE dendrogram (Supplementary Fig. 12) is based on information on genomic chunks "copied" by a recipient population from a range of other populations (donors). According to the dendrogram, BLTs are differentiated from other populations and form a single cluster.
Hence, the analyses performed in this study suggest that, although the sample set of six BLT is limited, it is homogeneous and representative of Lipkas.
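The way a PC plot can show a group as a cluster of its own may be sketched as follows. This is an illustrative example on simulated genotypes, not the pipeline used in the study: the function name, the 0/1/2 genotype coding, the per-SNP scaling by the binomial standard deviation, and the two simulated populations are all assumptions introduced here.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """Principal components of a (n_samples, n_snps) 0/1/2 genotype matrix.

    Each SNP is centered on its sample mean and scaled by the binomial
    standard deviation sqrt(2p(1-p)); SNPs monomorphic in the sample are
    dropped. Returns sample coordinates on the leading components.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                 # sample allele frequencies
    keep = (p > 0) & (p < 1)                 # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Simulated example: two hypothetical populations with unrelated
# allele frequencies, ten individuals each, 500 SNPs.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=500)
freq_b = rng.uniform(0.1, 0.9, size=500)
pop_a = rng.binomial(2, freq_a, size=(10, 500))
pop_b = rng.binomial(2, freq_b, size=(10, 500))
pcs = genotype_pca(np.vstack([pop_a, pop_b]))

# Individuals of one population fall on one side of PC1, the other
# population on the opposite side; the spread within each group on
# PC1 indicates how tight the cluster is.
print(pcs[:10, 0].std(), pcs[10:, 0].std())
```

In such a plot, a homogeneous sample appears as a group of closely spaced points that is separated from the other populations along the leading components.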