The phylogeographic history of Krascheninnikovia reflects the development of dry steppes and semi-deserts in Eurasia

Constituting one of Earth’s major biomes, steppes are characterised by naturally treeless extra-tropical vegetation. The formation of the Eurasian steppe belt, the largest steppe region in the world, began in Central Asia during the Neogene. In the glacial stages of the Pleistocene, steppe displaced forest vegetation, which in turn recolonised the area during the warmer interglacial periods, thus affecting the distribution of plants adapted to these habitats. Krascheninnikovia ceratoides (Chenopodiaceae) is a plant characteristic of dry steppe and semi-desert formations. Earlier studies showed that the ancestor of this autochthonous steppe element originated in Central Asia during the Miocene/Pliocene, i.e., in the same region and at the same time as the first appearance of steppe vegetation. However, as the extant lineages of Krascheninnikovia ceratoides diversified only 2.2 ± 0.9 Mya, it may represent a modern element of current dry steppe and semi-desert formations, rather than a component of the first steppe precursors of the Miocene. As such, it may have capitalised on the climatic conditions of the cold stages of the Quaternary to expand its range and colonise suitable habitats outside of its area of origin. To test this hypothesis, phylogeographic methods were applied to high-resolution genotyping-by-sequencing data. Our results indicate that Krascheninnikovia originated in western Central Asia and the Russian Altai, then spread to Europe in the West, and reached North America in the East. The populations of eastern Central Asia and North America belong to the same clade and are genetically clearly distinct from the Euro-Siberian populations. Among the populations west of the Altai Mountains, the European populations are genetically distinct from all others, which could be the result of the separation of populations east and west of the Urals caused by the Pleistocene transgressions of the Caspian Sea.

According to previous studies, the stem group of Krascheninnikovia originated c. 5 to 23 Million years ago (Mya) (~ 21.2 Mya 50 ; 12.8 Mya (5.3-23.1 Mya) 51 ; 17.3 Mya (12.0-23.5 Mya) 35 ), while the crown group diversified 2.2 ± 0.9 Mya 35 . With a minimum age of more than 5 My, K. ceratoides has persisted throughout several global climatic changes including the glacial and interglacial phases of the Quaternary. Its diversification into extant lineages took place during the Gelasian (Early Pleistocene, 2.6-1.8 Mya), and seemingly correlates with the appearance of modern types of desert and semi-desert. These are, together with semi-arid temperate grasslands such as prairie, steppe, and pampas, a Plio-/Pleistocene phenomenon 52 . Therefore, K. ceratoides could have been among the first plants that inhabited modern-type dry steppe areas.
Seidl et al. 35 reconstructed the biogeographical history of Krascheninnikovia using a few molecular markers from its nuclear and chloroplast genomes (ITS, ETS, atpB-rbcL, rpl32-trnL, trnL-trnF). Genetic analyses relying on single loci may reflect only the history of these particular loci, rather than the history of the species studied. Therefore, a whole-genome approach is favourable, where single loci have less weight and cannot influence the result as strongly. To investigate the spatio-temporal diversification of K. ceratoides in more detail, we sequenced populations of the species from throughout most of its distribution range employing genotyping-bysequencing (GBS). We then inferred phylogenetic trees and networks from the GBS data, conducted ancestral range reconstructions, and measured genome size. We specifically addressed the following questions: (1) Where did Krascheninnikovia originate? (2) Can GBS provide the appropriate resolution to retrace its colonising history and migration routes? (3) Can the retrieved phylogeographic and population genetic patterns be related to palaeogeographical and palaeoenvironmental events during the evolutionary history of the Eurasian steppe? With this, our overall aim was to contribute to the knowledge of the florogenesis of the Eurasian steppe belt.

Materials and methods
Plant material. Krascheninnikovia ceratoides leaves were collected in the field and dried with silica gel or taken from herbarium material. Leaves from up to five individuals per population were collected. The five individual specimens from Kyrgyzstan combined into population P025 came from two different localities about 80 km apart. In total, 183 accessions from 43 different populations of K. ceratoides were used for GBS analyses ( Fig. 1; Supplementary Table S1). Most of the samples were the same as those used in a previous phylogenetic study of ITS, ETS, and chloroplast DNA sequences 35 . Three North American populations determined as K. ceratoides subsp. lanata (P039, P040, P041) were included, as well as two eastern Middle Asian populations of K. ceratoides subsp. ceratoides, which were originally determined as K. ewersmanniana (P069, P070). Axyris hybrida and Ceratocarpus arenarius were used as outgroups. Further details to the collected samples can be found in Supplementary Table S1.
The present study complies with relevant institutional, national, and international guidelines and legislation.
Genome size measurements. We determined the genome size and ploidy level of 99 samples of K. ceratoides leaf tissue by flow cytometry, using an internal standard (either Pisum sativum L. 'Kleine Rheinländerin' (2C = 8.8 pg 53 ) or Petroselinum crispum (Mill.) Fuss (2C = 4.5 pg 54 )) and the CyStain PI Absolute P kit (Sysmex, Görlitz, Germany) according to the manufacturer's instructions. The relative fluorescence of at least 5000 particles per sample was recorded with a CyFlow Space (532 nm diode laser; Sysmex).
DNA extraction and GBS. The accessions used in the study by Seidl et al. 35  Genotyping-by-sequencing library preparation followed Poland et al. 55 , using the restriction enzymes MspI and PstI. Samples were sequenced in two runs on a HiSeq 2500 (Illumina, California, USA) (single-end sequencing of 100 bp fragments). To achieve the same genomic coverage for individuals with different ploidies, twice the amount of DNA was used for tetraploids as for diploids. Two individuals (one from population P029 and one from population P072) were sequenced twice to assess the reproducibility of the method. Data analysis. We used STACKS v.2.55 56 for the sorting, filtering and de novo assembly of raw read data.
To find the optimal parameter settings, we ran STACKS 'denovo_map' on a reduced data set (Supplementary Methods S2). We varied one of the three following parameters while keeping the others at their default value: the minimal number of identical reads required per stack, m, was tested in a range from three to six; the differences between loci within an individual, M, was tested for one to four base pair differences; the differences allowed between catalog loci, n, was tested for one to eight base pair differences. The parameter value that maximized www.nature.com/scientificreports/ the number of variant loci present in at least 80% of all individuals were used for the complete data set: m = 3, M = 3 and n = 8. We set the number of stacks per locus to five to account for possibly present allelic variation in tetraploids. Diploid and tetraploid samples were treated identically. The STACKS 'population' program using the 'write random SNPs' flag was used to produce unlinked SNP data sets, with and without the Axyris and Ceratocarpus outgroups.
To estimate ploidy levels from GBS data, we used nQuire 57 and ploidyNGS 58 , which recognize bam-files as input. These were built using BWA (Burrows-Wheeler Alignment tool) 59 with the concatenated consensus sequence of the respective individuals serving as a reference. BWA settings were the following: maximum gap length of twelve, maximum of one gap per read, and a mismatch parameter of 0.01 60 . These bam-files were used as input for nQuire using the denoising option and for ploidyNGS using the "guess ploidy" option.
To infer the population genetic structure of Krascheninnikovia, an analysis was performed in R 61 using the LEA 62 package, using the data set of unlinked SNPs in Variant Call Format (VCF) of ingroup samples only. The ploidy was set to tetraploid. One to fifteen hypothetical ancestral populations were tested with 100 repetitions each. The number of clusters was chosen based on minimal cross-entropy. To show the result per population, the affiliation to each hypothetical ancestral population was calculated per population as the mean over its individuals and shown as pie charts on a map. To validate the results using a different approach, both Discriminant Analysis of Principal Components (DAPC) and Principle Coordinate Analysis (PCoA) were performed using the R packages 63, 64 adegenet and dartR, respectively, based on the ingroup data set of unlinked SNPs.
Unlinked SNPs of 6,029 loci were retained in the PHYLIP file of Krascheninnikovia as consensus sequences per population, after removing all uninformative sites 65 . Phylogenetic analyses were conducted by applying the maximum likelihood (ML) criterion using 66 IQ-TREE v.2.0.3 67 with the GTR+G model and correction for ascertainment bias 68 . SH-like approximate likelihood ratio test 69 and ultrafast bootstrapping (UFBoot) 70 were performed with 1,000 repetitions each to obtain bootstrap support (aLRT/UFBoot BS) for nodes. The outgroups Axyris and Ceratocarpus are genetically too distant to assess the correct rooting, so midpoint rooting was used, which resulted in a tree with basal positions for populations of the Russian Altai. The basal population of the IQ-TREE analysis in Seidl et al. 35 , P242, originated in the Russian Altai as well. The splits data was visualised in SplitsTree v4.14.8 71  To calculate private allelic richness per region, a rarefaction analysis was performed in HP-Rare 73 . Populations were assigned to biogeographic regions according to their origin based on Brummit 74 : Central Europe, eastern Europe, western Middle Asia, central Middle Asia, eastern Middle Asia, southern Middle Asia, the Russian Altai Mountains, western Central Asia, southern Central Asia, eastern Central Asia, and North America. Only loci with called bases present in all populations were used. Rarefaction sample sizes for each 'SNP locus' were two populations from each region, and two alleles ('genes' in HP-Rare) from each population. Mean rarefaction and standard deviation were then calculated per region.
Ancestral range reconstruction. An ultrametric tree was built based on the ML tree using PATHd8 75 as input tree for the BioGeoBears v. 1.1.1 76-78 R package for R ver. 3.5.0 [76][77][78] . The assignment of populations to regions was described above. The analysis was performed based on two different migration hypotheses: free migration between all areas, and restricted migration due to the distance between areas (0.8 for regions with a common land border and 0.2 for areas without a common land border). Three different biogeographic models were tested: DEC, DIVALIKE, and BAYAREALIKE, all both with and without the jumping parameter "j", which determines the probability of jump dispersals. Based on AICc values, DIVALIKE+J with restricted migration was chosen as the most likely model. DIVALIKE is a likelihood interpretation of parsimony-based DIVA and allows the same processes 79 such as dispersal, extinction, narrow sympatry, narrow and widespread vicariance, but not, e.g., subset sympatry.

Results
Ploidy. The mean 2C genome sizes (± standard deviation) of diploid (N = 38) and tetraploid (N = 61) individuals were 2.8 ± 0.3 pg and 5.8 ± 0.2 pg, respectively. The populations measured were uniformly di-or tetraploid. The populations from Central Europe, western Middle Asia, and the Russian Altai Mountains were all tetraploid, but in most regions both diploid and tetraploid populations were found ( Fig. 1; Supplementary Table S1). The genome sizes of 85 samples could not be measured due to the old age of the material, their storage conditions (e.g., populations from eastern Europe: P215; central Middle Asia: P034; western Central Asia: P207, P209, P218; eastern Central Asia: P210; Supplementary Table S1), or a lack of material. Only one individual each of the populations P034 (central Middle Asia) and P025 (Kyrgyzstan) could be measured, both turning out to be diploid.
Ploidy level estimations based on GBS data using nQuire and ploidyNGS confirmed most of these measurements. Only the North American sample P041-I0372 was inferred to be tetraploid but appeared diploid according to flow cytometry measurements. Of those not measured using flow cytometry, the populations of eastern Europe (P215), western Central Asia (P207, P209, P218), and eastern Central Asia (P210) were found to be tetraploid; of P025, all individuals were found to be tetraploid except P025-I0223, which was diploid in both analyses, by GBS and flow-cytometry data. Consequently, we consider it very likely that P025-I0223 came from one of the two Kyrgyz localities, and the other four P025 individuals from the other locality, which is ~ 80 km away. P025-I0223 (hereinafter P025_D) and the other four individuals (hereinafter P025_T) were therefore treated as two different populations in all analyses. Of population P034, one individual (P034-I0317) was tetraploid, while all others were diploid. In all populations except for P025 and P034, the GBS-estimated ploidy of unmeasured samples corresponded to the ploidy of measured individuals from the same population.  North America belong to the lilac cluster. Some admixture is observed, especially between the light green, dark green, and red clusters (Fig. 2).
To illustrate the consecutive splits between the hypothetical ancestral populations, the cluster membership was depicted as pie charts on a map for all K values from two to eight (Fig. 3)

DAPC.
Similarly to LEA, DAPC analysis clusters individuals according to their genetic composition (Fig. 4).    Figure 2. Genetic cluster membership as inferred using LEA is shown as barplot. Vertical bars denote individuals. Eight clusters were retained, which are indicated by different colours. The samples are ordered by their assigned geographical region from west to east. Ploidy is indicated by grey (diploid) and black (tetraploid) bars. The figure was generated in R using the package 'ggplot2' (https:// ggplo t2. tidyv erse. org) and edited in Inkscape v1.0.2 (https:// inksc ape. org).  Biogeographic analysis with BioGeoBears. Based on our BioGeoBEARS analysis, the ancestral range of K. ceratoides lies within the area of western Central Asia (Mongolian Altai Mountains, yellow) and the Russian Altai Mountains (dark blue) (Fig. 7). From western Central Asia, K. ceratoides migrated east to eastern Central Asia (orange) and from there to North America (lilac). From the Russian Altai Mountains, it migrated to southern Central Asia and to eastern Middle Asia (light green). From eastern Middle Asia, it then migrated again to southern Central Asia (light blue) and to southern Middle Asia (brown). It migrated further west to central Middle Asia (pink) and to western Middle Asia (dark green), and on to eastern Europe (red) and Central Europe (blue).
Private allelic richness. The highest private allelic richness was found in western Central Asia (0.014 ± 0.002), followed by eastern Central Asia (0.013 ± 0.004) and the Russian Altai Mountains (0.011 ± 0.001; Supplementary Figure S5). The lowest private allelic richness was found in North America (0.007 ± 0.003).

Discussion
In this study we reconstructed the phylogeographic history of K. ceratoides, a widespread and representative steppe inhabitant, to gain insights into the general evolutionary history of the Eurasian steppe belt. The retrieved molecular patterns fit best to a scenario of a western Central Asian/Russian Altai origin of Krascheninnikovia and its subsequent spread both towards the West, through stepwise migration until reaching Central Europe, and towards the East, finally colonizing western North America.   Steppe precursors originated in Central Asia during the Miocene at the latest 1,80,81 . From there, the steppe started to spread, with many latitudinal range shifts due to changing climatic conditions 6,[82][83][84][85] . A continuous Eurasian steppe belt was first present during the late Miocene/early Pliocene 1 . At the end of the Pliocene, true steppes were present in Central Asia, while forest-steppe assemblies were present in south-eastern Europe and on the West Siberian Plain 5 . Cold steppe vegetation spread during the glacial periods of the Pleistocene 1 . Eurythermal species could have survived the last glacial maximum in situ 1 . Krascheninnikovia ceratoides is an autochthonous steppe element, which originated in the steppe belt similarly to some species of the genus Allium 86 . In contrast, species of the genus Camelina migrated into the Eurasian steppe secondarily out of their presumed area of origin in the western part of the Irano-Turanian floristic region 19 .
The Altai Mountains are known as a refuge, e.g. for cold-adapted species during warm humid phases 87,88 , and could have served as a refugium for steppe and semi-desert plants such as K. ceratoides during the interglacial phases of the Quaternary. As a consequence, the Altai Mountains could have been a source for recolonisation of the spreading steppe area during glacial periods. Because of the numerous large populations of K. ceratoides in Central Asia with high morphological variation in leaf size and shape, hair density, and plant height, this area was assumed to be the cradle of the species 35,46,49,[89][90][91][92][93][94][95][96] . Our findings suggest western Central Asia and the Altai Mountains as the area of origin for Krascheninnikovia (Fig. 7) and the starting point of its range expansion throughout the Northern Hemisphere. As indicated by their high private allelic richness, the populations of eastern and western Central Asia harbour a diverse gene pool (Supplementary Figure S5). Thus, these may have served as a good source for dry steppe plants for recolonising the surrounding areas after times of unsuitable, more humid conditions 97 during the interglacial periods.
Although the stem lineage of Krascheninnikovia may have existed since the Pliocene 34,49,50 , it has only diversified in the last ~ 2.2 My 35 , coincidental with the beginning of the Pleistocene glacial-interglacial macro cycles. It seems that the cold, dry climate during the glacial stages facilitated the migration of Krascheninnikovia, as opposed to the more humid and less continental climatic conditions earlier during the Neogene 1 .
Krascheninnikovia ceratoides shows an apparent isolation-by-distance pattern: populations that are geographically close are usually genetically similar (Figs. 2,5,6,7). The populations of Central Asia, which belong to two distinct genetic lineages, are exceptions. While geographically close to the populations of western Central Asia and separated only by the Khangai Mountains, the eastern populations are genetically more similar to the populations of North America. The Khangai Mountains (covered by alpine vegetation, taiga, mountain steppe, and forest-steppe 98 ) have divided the populations east and west of the mountains and prevented gene flow for a long time. This split occurred in the early stages of diversification, after Krascheninnikovia migrated from the Altai Mountains to eastern Central Asia (Fig. 7).
A migration from eastern Central Asia to North America as hypothesised by Heklau and Röser 23 and Seidl et al. 35 is here confirmed. The dispersal to North America may have occurred across Beringia, which repeatedly connected the Asian and American continents during several arid glacial phases in the Pleistocene 99, 100 .
The sister group relationship of populations in North America (USA) on the one hand and eastern Central Asia (East Mongolia) on the other was found in all phylogenetic inferences (Figs. 3, 4) as well as in Seidl et al. 35 The two groups form strongly supported clades (Figs. 2, 3, 4, 6, Supplementary Figure S4), which hints at low or absence of gene flow.
Starting from its likely area of origin in western Central Asia and the adjacent Altai mountains, Krascheninnikovia migrated west to Middle Asia.
During the Middle Pleistocene, glaciers dammed the Siberian Ob-Irtysh-Tobol river system, leading to backwaters that reached the foothills of the Kazakhstan Highlands and the Altai Mountains. The resulting landscape of lakes and swampy areas was not a suitable habitat for steppe plants 1,83,85 . According to our biogeographic analysis, this probably also affected the phylogeographic patterns of K. ceratoides, which was probably present on the Kazakhstan plain and even further west at this time. This damming would have caused range splits and contractions, followed by isolation and subsequent expansion. Repeated secondary contact may be the explanation for the absence of monophyly for individuals from eastern Middle Asia in our IQ-TREE tree (Supplementary Figure S4). It has been shown that the histories of other steppe taxa such as Clausia aprica (Stephan) Korn.-Trotzky 12 , Goniolimon speciosum (L.) Boiss. 18 , Capsella Medik. 13 , Allium cretaceum N.Friesen & Seregin, and A. montanostepposum N.Friesen & Seregin 14 were affected by these backwater events as well.
Our results suggest that Krascheninnikovia migrated stepwise from Central Asia to Middle Asia and to Europe. Then, the European populations were separated from populations east of the Urals (Figs. 2, 4, 7). This genetic split may be traced back to a migration barrier imposed by a transgression of the Caspian Sea like the Apsheron transgression ~ 1 Mya 1 . This suggests that the separated populations found refugia in the Ural mountains 17,18 and west of the Caspian Sea 18, 101, 102 to survive the interglacial phases. The same spatio-temporal pattern was observed for the steppe plant Camelina microcarpa Andrz. ex DC., whose two major groups split about 1.2 ± 1 Mya 19 .
According to our data, Central Europe was colonised through stepping-stone-like migration from the Altai Mountains to Central Europe, rather than by long-distance dispersal. Stepping-stone migration to Central Europe would indicate that trees present during glacial periods may potentially have grown in patches, as opposed to a dense tree cover 103 , allowing the steppe and semi-desert plant Krascheninnikovia to migrate step by step.
Diploid and tetraploid populations of Krascheninnikovia coexist in most regions 48 . In Eurasia, both ploidy levels are present in eastern Europe, Middle Asia, southern Central Asia, and eastern Central Asia (Fig. 1, Supplementary Table S1) 34,47 . Most diploid samples from eastern Middle Asia (P030, P070, P072, P269) and southern Central Asia (P237, P238) belong to the light green cluster in our LEA analysis, suggesting a common ancestry. The same cluster membership is shared by the diploid individuals from central Middle Asia (P034) and eastern Europe (P057, P062), which are admixed. The westward migration of K. ceratoides was possibly accomplished by diploid individuals, followed by (auto-)polyploidization in the newly colonised areas, a process which occurred several times independently as a response to climatic fluctuations in the Pleistocene 104  www.nature.com/scientificreports/ There is an ongoing discussion about potential subspecies and species in the genus Krascheninnikovia. While some argue that Krascheninnikovia is monospecific with only two subspecies ceratoides and lanata 22 , others distinguish several entities at the specific level, such as K. arborescens, K. ewersmanniana, and K. compacta from Middle and Central Asia 48, 87-90 . Our data did not include samples representing these potential taxa, except for two populations identified as K. ewersmanniana (P069 and P070). Therefore, this study is not suitable to conclude on this topic. The results of our analyses support however the recognition of subspecies lanata as defined by Heklau and Röser 23 , as the North American individuals form a clearly distinguishable genetic lineage. Our data did not support the samples identified as K. ewersmanniana to belong to a distinct clade, as they are nested at different positions among morphologically typical subsp. ceratoides specimens from eastern Middle Asia. However, to confirm this hypothesis thorough morphological studies are necessary. Thus, we concur with the conclusions of Heklau and Röser 23 , recognizing (at the moment) two subspecific entities, i.e. subsp. ceratoides and subsp. lanata within monotypic Krascheninnikovia.
Our analyses provide insights into the history of Krascheninnikovia and its connection to the history of the Eurasian steppe belt, confirming and refining previous findings using GBS. The observed phylogeographic patterns reveal the common history of Krascheninnikovia and the steppe belt: The Krascheninnikovia lineage represents an autochthonous steppe element that evolved in situ during the Miocene when steppe precursors made their first appearance in Central Asia. Additionally, the Caspian Sea transgressions and the ice-dammed backwaters of the Ob-Irtysh-Tobol river system during the Pleistocene left clear imprints on the population genetic structure that persist until today. Nevertheless, there is still much to be investigated: The inclusion of further samples from the southernmost part of the range in Eurasia and from North America in future GBS analyses of Krascheninnikovia could allow the narrowing down of the area of origin and conclusions to be drawn about the number of migration events to America that have taken place.

Data availability
The raw data is available at the European Nucleotide Archive (accessions ERS5794424 to ERS5794612).