Genetic diversity in North American Cercis Canadensis reveals an ancient population bottleneck that originated after the last glacial maximum

Understanding of the present-day genetic diversity, population structure, and evolutionary history of tree species can inform resource management and conservation activities, including response to pressures presented by a changing climate. Cercis canadensis (Eastern Redbud) is an economically valuable understory tree species native to the United States (U.S.) that is also important for forest ecosystem and wildlife health. Here, we document and explain the population genetics and evolutionary history of this deciduous tree species across its distributed range. In this study, we used twelve microsatellite markers to investigate 691 wild-type trees sampled at 74 collection sites from 23 Eastern U.S. states. High genetic diversity and limited gene flow were revealed in wild, natural stands of C. canadensis with populations that are explained by two major genetic clusters. These findings indicate that an ancient population bottleneck occurred coinciding with the last glacial maximum (LGM) in North America. The structure in current populations likely originated from an ancient population in the eastern U.S. that survived LGM and then later diverged into two contemporary clusters. Data suggests that populations have expanded since the last glaciation event from one into several post-glacial refugia that now occupy this species’ current geographic range. Our enhanced understanding benchmarks the genetic variation preserved within this species and can direct future efforts in conservation, and resource utilization of adaptively resilient populations that present the greatest genetic and structural diversity.


Materials and methods
Sample collection. Leaf samples of C. canadensis were collected by authors, collaborators, and citizen scientists (see acknowledgements), who sampled specimens across the native range of this species in 23 states in the midwestern and eastern U.S. (Table 1). The use of trees in the present study complied with international, national and/or institutional guidelines. Plants were identified based on collection guide provided to our collectors and confirmed by co-authors (voucher specimen deposited at the University of Tennessee Vascular Herbarium, catalog # TENN-V-0246136). For each collection site, at least 10 non-cultivated C. canadensis trees occurring within a one-mile radius were selected and their geographical coordinates were recorded. From each tree, five to seven young and disease-free leaves were collected, held between several pieces of absorbent paper and stored at ambient room temperature in a paper envelope until processing. Leaf samples from 1193 individual trees were collected at 117 collection sites. To avoid the over-representation of trees within a geographical area, we randomly selected a subset of collection sites from geographical areas with more than one collection site sampled. This study used total of 790 trees representing 79 collection sites that span much of the current native geographic range of C. canadensis. DNA extraction. From each tree, 60 to 100 mg of dry leaf tissue was used to isolate DNA. Samples were homogenized four times for 30 s each at 6 m/s using a Beadmill 24 homogenizer (Fisher Scientific, Pittsburgh, Pennsylvania, U.S.) and were kept in liquid nitrogen for 5 min between each pulverization step. The Qiagen DNeasy Plant Mini Kit (Qiagen, Valencia, California, U.S.) was used to isolate genomic DNA (gDNA) from the pulverized samples with the following minor modifications in the manufacturer's provided protocol. Specifically, 2% w/v polyvinylpyrrolidone (PVP) was mixed into the lysis buffer (AP1). Then 8 µl of RNase was added into each sample tube and incubated at 65 °C in a water bath for 45 min. Every two mins each sample tube was inverted gently to mix the sample well. Lastly, samples were incubated at − 20 °C for at least one hour. Ethanol was used to wash the spin columns if there was any visible remaining debris and elution buffer added. Elution buffer was heated to 65 °C before 50 µl was added to the spin columns twice. Concentrations of gDNA were quantified using ND1000 Ultraviolet-Vis Spectrophotometer (NanoDrop Technologies, Wilmington, Delaware, U.S.) and the gDNA was stored at − 20 °C until further use.  Amplified PCR products were visualized with QIAxcel Capillary Electrophoresis System (Qiagen) and analyzed with a 15/600 bp internal alignment marker and a 25 to 500 bp DNA ladder. All C. canadensis gDNA samples were amplified and visualized against each of the 12 microsatellite loci using the procedure described above. Reactions not producing any amplified products were rerun once before they were considered missing data. Samples with ≥ 40% missing data were discarded. Also, collection sites with more than four samples having ≥ 40% missing data were excluded from the dataset.
Genetic diversity. Using the Excel macro FLEXBIN version 28 , raw allele sizes were converted into allelic classes. In this program, alleles were binned into base-pair (bp) size categories by statistical similarities. This binned genetic dataset was used for all of the following statistical analyses, which were completed using R version 3.5.3 29 . Clone-correction of the data was implemented to identify presence of clonal individuals at the collection site level using the R package POPPR version 2.8.2 30,31 . For each collection site, only multi-locus genotypes (MLG) were used to obtain unbiased estimates of allelic frequency from the dataset 32 .
R package POPPR was used to calculate the total number of alleles per locus, observed heterozygosity (H O ; number of the heterozygotes present at a locus which is divided by sample size), expected heterozygosity (H E ; calculated as expected heterozygosity per locus 33 ), and linkage disequilibrium (rbard); non-random association of alleles between loci). Additionally, the Shannon-Weiner diversity index (H) was calculated for each collection site using POPPR. H considers both allele richness and evenness of the allelic distribution 34 . The number of unique private alleles in collection sites and different loci was estimated in POPPR package. Allelic richness (Ar), a measure of rarefied allelic counts per locus, was estimated using package HIERFSTAT version 0.04-22 35 . Allelic richness is used as an estimate of the long-term evolutionary potential to adapt and persist in a given population 36,37 . The genetic fixation index (F ST ), inbreeding coefficient (F IS ), and allelic differentiation (F' ST ) 38,39 were calculated using HIERFSTAT package. Gene flow (Nm) was estimated using GenAlEx 6. Population structure. Population structure within the native range of wild C. canadensis trees was analyzed using the program STRU CTU RE version 2.3.4 40 to which an admixture model was applied. This Bayesian clustering method with Monte Carlo Markov Chain (MCMC) approach was used with the following parameters: 500,000 burn-in period with 1,000,000 MCMC repetitions for 30 independent chains for K values from 1 to 18. The resulting output was visualized with STRU CTU RE HARVESTER web version 6.94 41 . The optimum K value, indicator of population clusters present in the dataset, was calculated utilizing the Evanno method 42 . The estimation of ΔK criterion obtained from STRU CTU RE HARVESTER were visualized using POPHELPER 2.2.6 43 that merged the 30 independent chains. R packages MAPS version 3.3.0 44 and PLOTRIX version 3.8-1 45 were used to generate pie charts of admixture proportions at K = 2.
Several model-free methods were utilized to investigate the population structure of C. canadensis samples. A Neighbor-Joining (NJ) dendrogram was constructed using Nei's genetic distance in POPPR 46,47 . Discriminant Analysis of Principal Components (DAPC) was implemented using package ADEGENET version 2.1.1 48 to visualize the underlying genetic structure of this species in its wide geographical range. This is a two-step multivariate analysis that investigates the genetic variations within populations among the sampled collection sites 49 . At first, a principal component analysis (PCA) was conducted, and then the number of PCA vectors (to explain majority of variance with minimizing over-fit of the DAPC) was selected. Then, a selected number of PCAs were used to reveal differences between groups while minimizing within group variations, as well as ordination of collection sites into distinct groups using discriminant analysis 49,50 . Moreover, missing values were calculated as mean allele frequency and cross-validation analysis was performed to select appropriate PC numbers.
Isolation by distance (IBD) was estimated using the Mantel test 51,52 with 10,000 permutations in package VEGAN version 2.5-6 53 using Euclidean distance. IBD checks for a correlation between genetic distance and geographical distance among the individuals in a dataset. The Mantel test was implemented across the 74 collection sites while considering the whole dataset as one population.
Analysis of Molecular Variance (AMOVA) 54 was carried out using POPPR with 10,000 permutations by sorting the individuals into hierarchical groups to assess the degree of molecular variance partitioned within, between, and among the collection sites. The levels of population hierarchy included: (1) 74 collection sites as one hierarchical group; (2) two groups on the basis of the STRU CTU RE analysis; and (3) four major groups on the basis of five major eco-region divisions namely hot continental division (mountain provinces), hot continental division, warm continental division, subtropical division, and prairie division (see Supporting Information Fig. S2) in the midwestern and eastern U.S. (Bailey, 1994). C. canadensis collection sites in warm continental division were grouped with hot continental division collection sites. As C. canadensis is found in wide range of climate and elevations, we tested if there was any influence of the regional climate patterns in the population structures of C. canadensis in AMOVA analysis.

Demographic histories.
To investigate and interpret the evolutionary history of C. canadensis, we used DIYABC program version 2.1 55,56 that utilized Approximate Bayesian Computation (ABC) statistical methods. For this analysis, collected individuals were pooled into two major groups, based on the STRU CTU RE results. To elucidate the evolutionary history of C. canadensis, we analyzed competing scenarios in two ABC steps. In the first step, we tested five demographic scenarios using 200,000 simulated pseudo-observed datasets (PODs) wherein: (1) the first two scenarios suggested stepwise divergence of the current two major groups from Scientific Reports | (2021) 11:21803 | https://doi.org/10.1038/s41598-021-01020-z www.nature.com/scientificreports/ an ancient population, (2) a third scenario suggested a single, two-way split of contemporary groups from an ancient unsampled population, and (3) the last two scenarios were based on the hypothesis of divergence of current groups from two separate ancient un-sampled populations. Once the analysis of these scenarios was completed, the two scenarios from the first step that yielded higher logistic regression support were selected as the basis for assessing the second step of ABC. In the second step, seven scenarios were constructed that addressed the possibility of a bottleneck occurrence within the evolutionary history of the species. Over 1,000,000 pseudoobserved datasets (PODs) were simulated under the assumed prior parameter ranges for each scenario. Posterior probabilities of the compared scenarios were estimated to select the best supported scenario 55 .

Results
Microsatellite genetic diversity and hierarchical fixation indices. Twelve microsatellite loci were amplified from 790 C. canadensis trees sampled in this study. Due to presence of missing data (missing ≥ 40% SSRs), five of 79 collection sites were excluded and 49 individuals from the remaining 74 sites were discarded, resulting in 691 individuals from 74 collections. Additionally, after deleting two clonal individuals, 689 unique multilocus genotypes from 74 collection sites remained for further data analyses ( Table 1). The average null alleles or missing data across the dataset were overall 2.89% (see Supporting Information Fig. S1). Nei's genetic diversity index (H E ) value in the studied 74 collection sites was 0.67, ranging from 0.32 (Anderson county, TN10) to 0.68 (Sarpy county, NE1) ( Table 1). Moreover, weak (rbard = 0.05, P value = 0.01) but significant linkage disequilibrium value detected in the dataset. Nineteen private alleles were detected within the 74 collection sites (Table 1). In addition, 9 of the 12 microsatellite loci yielded private alleles, and the highest number of private alleles was recovered from locus 168a (Pa = 4, Table 2). The number of alleles per locus ranged from 6 to 13 with a mean of 10 alleles per locus (  Table 2). Additionally, high population fixation (F ST = 0.19; ranging from 0.05 to 0.53; Table 2) and population differentiation (F' ST = 0.19; ranging from 0.05 to 0.54) were identified among C. canadensis populations. We estimated an inbreeding coefficient (F IS ) of 0.43 across all loci, indicating excess homozygotes (Table 2) among the studied C. canadensis populations. The average estimated gene flow was 0.75, which indicates that a limited amount of gene flow has occurred among the studied populations (Table 2). Population structure. Using Nei's genetic distance, we estimated pairwise F ST values among the 74 collection sites and the values ranged from 0.02 to 0.33 (see Supporting Information Table S1). STRU CTU RE results revealed an optimum ∆K = 2, implying that across its wide native range, C. canadensis collection sites are divided into two major clusters. Collection sites in the northern-most collection region of the U.S. (Ohio to Nebraska) and mid-south to mid-north (from Texas to Nebraska) were part of the first cluster (designated as north genetic cluster) (Fig. 1). The rest of the collection sites from the northeast (New York) to mid-south (Mississippi) along the Atlantic Ocean coastline belonged to the second cluster (designated as south genetic cluster). Whereas, a constructed NJ dendrogram revealed the presence of two major groups (except KS and TX collection sites that did not group with any major group), which supported the STRU CTU RE findings of presence of two genetic clusters (Fig. 2). In addition, the collection site distribution in these two major groups (NJ dendrogram) is similar to the distribution of collection sites in the two STRU CTU RE-based clusters. The DAPC biplot further confirmed the presence of genetic structures, primarily along the x-axis with two overlapping clusters (Fig. 3). Therefore, based on additional analyses used in this study, the grouping of C. canadensis individuals is best explained with two genetic clusters ( Fig. 1-3). These analyses also showed that the majority of the collection sites (except two collection sites from Georgia) grouped in clusters based on their geographical origin.
In the Analysis of Molecular Variance (AMOVA) analysis, the first data arrangement showed that most of the genetic variation was present within 74 collection sites (74.2%, P < 0.001) ( Table 3). A significant amount of variation was also partitioned among collection sites (25.8%, P < 0.001). When the dataset was divided into two genetic clusters based on STRU CTU RE results, 69.2% (P < 0.001) of the genetic variability was attributed to the location of collection sites (Table 3). There was also a significant amount of variability between the two different genetic clusters (13.9%, P < 0.001) and between collection sites within the clusters (17%, P < 0.001) ( Table 3). When the dataset was partitioned by the major eco-regions groups that are represented across the distribution of C. canadensis, only 7.9% (P < 0.001) of the variability could be attributed among eco-region groups, versus 18.6% (P < 0.001) variability that was attributed among the collection sites within groups ( Table 3). The majority of genetic variation was explained among individuals within a collection site, rather than among populations or group levels for all three tested scenarios (Table 3). Nevertheless, the extent of variation that was observed within collection sites and between clusters revealed the presence of genetic structure. AMOVA results were, therefore, congruent with the hierarchical fixation indices and indicated the presence of population structure. However, the lowest amount of variation was found within the groups when the data were divided according to major eco-regions. This finding suggests that partitioning trees within eco-regions cannot be expected to explain the genetic differentiation and population structure observed in C. canadensis wild populations. Results from the isolation-by-distance analysis indicated that among C. canadensis populations, the geographical distance effect was weak, but linearly correlated (r = 0.08, P < 0.001) with genetic distance (see Supporting Information Fig. S2).  Fig. 4A). In these analyses, Scenario 2 provided evidence that the contemporary C. canadensis population originated in the southeastern U.S. region from an ancient population and then later, a north population group (first genetic cluster) diverged from the south population group (second genetic cluster). Alternatively, Scenario 3 suggested that both current C. canadensis groups (north and south) have split from an ancient, as yet unsampled group (Fig. 4A). In the second step of the ABC analysis (Fig. 4B), the principal component analysis and relative posterior probability tests revealed that Scenario 2a (posterior probability (P) = 0.74, Fig. 4B) was the most   www.nature.com/scientificreports/ 25,000 years ago, given the average time for a C. canadensis tree to reach reproductive maturity is six to seven years 57 (Fig. 4B). Therefore, the bottleneck event most probably occurred during the last glacial period, which ended about 21,000 years ago 3,5 . Later, the northern population diverged from the southern population about 493 generations ago (ranging from 102 to 1,490 generations in simulated datasets) (Fig. 4B). Post-hoc analyses provided goodness-of-fit for this scenario, with the original dataset well embedded in the prior PODs population and nested in the posterior PODs population (see Supporting Information Table S2 for details of this analysis).

Discussion
Wild populations of C. canadensis that were sampled across its native range in the U.S. revealed high levels of genetic diversity and population differentiation, the presence of population structure, limited gene flow, and an ancient bottleneck that temporally coincides with the last glacial period in North America. We detected the presence of geographical clusters, longitudinally in the southern region (along U.S. coastal plains), and northern region. Evolutionary history analyses revealed an ancient bottleneck event occurring in the C. canadensis population in the south followed by divergence of the northern population from the southern population of C. canadensis. When populations were compared across the ecoregion divisions from which they were collected, ecoregion designations were not associated with population structure and genetic diversity of wild populations of C. canadensis. Low genetic variation in C. canadensis across ecoregions is not surprising, given that this tree species is well-adapted to a wide range of soil types, environmental conditions and habitats, has not been constrained by any major geographical barriers, and is widespread among the eastern U.S. However, we found evidence of relatively higher genetic diversity among the northernmost collection sites (IA, IN, MI, NE, and OH, and in mid-latitude North America) that are located at the periphery of the contemporary northern range of this species. A plausible explanation for this discrepancy between northernmost samples when compared to the southern collection sites is the probability of one major refugium or admixture between small, but genetically rich, populations in refugial contact zones [58][59][60] . This effect is most evident in species that experience reproduction via long distance gene flow and local adaptation among sensitive individuals in the distributed population margins. However, unlike European temperate species, eastern north American species maintained high genetic diversity in northern populations 61,62 . This high level of genetic diversity in the C. canadensis northern population could be maintained by long-distance seed dispersal events during range expansion from post-glacial northern refugia 62,63 .
The ability of C. canadensis to maintain high genetic diversity can be influenced by several factors, including wide and continuous geographic distribution, an outcrossing reproductive system, and large effective population size 59,[64][65][66] . Many other temperate tree species sustain high genetic diversity and allelic richness across a wide geographical range even in the presence of environmental stressors 11,14,58 , pressure from insect and plant pathogens 67,68 , and human disturbances 67,69,70 . A study using microsatellite loci revealed high genetic variation among five Asian Cercis spp., averaging 5.7 alleles per locus 71 . A recent study focused on smaller and fragmented C. canadensis populations also determined that trees in this species maintain high genetic diversity and allelic richness across the native range 27 , congruent with Asian Cercis species and several other hardwood tree species 66,72-76 . Table 3. Analysis of Molecular Variance of Cercis canadensis across 12 microsatellite loci for (i) 74 collection sites as one hierarchical cluster, (ii) 74 collection sites divided into two groups (north and south) based on STRU CTU RE results (two clusters), and (iii) four eco-regions. F ST = variance among collection sites relative to the total variance. F IS = inbreeding coefficient of individuals relative to population. F CT = variance among groups relative to the total variance. www.nature.com/scientificreports/ In addition to high genetic variability, C. canadensis populations also display a wide range of morphological variation across diverse environmental conditions 20,26,[77][78][79] . For example, Cercis leaf shape, size, surface pubescence, and other structural features were found to be strongly related to environmental factors, such as temperature and moisture content [79][80][81] . In Cercis spp., these characteristics likely originated through local adaptation to varying climatic pressure 26,82,83 . Morphological variation in C. canadensis has led efforts to differentiate the species into the following three varieties: C. canadensis var. canadensis L., distributed across mesophytic habitats of the eastern U.S., C. canadensis var. mexicana (Rose) M. Hopkins (Mexican redbud) and C. canadensis  A, B). Probable DIYABC evolutionary scenarios for Cercis canadensis evolutionary history. Here, current C. canadensis populations are north (N1) and south (N2). Also, N1b and N2b represent the populations of N1 and N2 before bottleneck event. On the right side of each scenario, a time scale indicated the timeline of each event (t = 0 is current time, t1-db = bottleneck occurrence, t1/t2 = split of the populations from originating population). Our data was analyzed using two ABC steps resulting in five competing scenarios from the first step (4A), and seven scenarios with possible bottleneck events from the second step analyses (4B). The scenario 2a (B) from step 2 gained the most support in DIYABC analysis and the timeline for the bottleneck (t1-db) and divergence (t1) events of scenario 2a is given in generations. For each scenario, value of relative posterior probability (P) was reported.  89 and Cornus florida L. 68 are temperate tree species that are widely distributed in the southeastern U.S. and  75,89 . Contrary to these studies, high genetic differentiation observed among widely distributed populations of C. canadensis may be attributed to a limited gene flow, as well as the demographic history of the species. Similar to many other self-incompatible forest tree species 20 , gene flow in C. canadensis is dependent upon various pollen and seed dispersal mechanisms. Flight distance of insect pollinators varies from one to several miles 90,91 , which would limit long distance gene flow by pollen dispersal among trees. Seedpods and seeds of C. canadensis are relatively heavy and typically fall in close proximity to the parental tree. Progeny that survive grow as non-reproductive seedlings during the next several years 57,92 and yield half-sib "neighborhoods" within a localized spatial scale 89,[93][94][95][96] . Several seed-feeding mammals, such as eastern woodrats (Neotoma floridana Ord) 97 , and birds including quail 98 contribute to the dispersal of C. canadensis seed to some extent. Small rodents and deer may repeatedly eat from the same tree, thus carrying the closely related, half-sibling propagules (if eaten when seeds have matured) for distances restricted to the retention time of fecal scats 57,97,[99][100][101] . To fully understand gene flow patterns and predict changes to C. canadensis distribution patterns, it may be helpful to unravel the seed and pollen dispersal methods and efficacy of seed transport by animals that have been associated with this tree species. However, seed transport efficacy will likely be limited to the relatively short distances traveled by these animals during foraging. Fruit consumption rate by animals is also restricted by reliance upon C. canadensis fruits as emergency food in late fall or winter, and this behavior would lower efficiency of functional seed dispersal 57,97,99-101 . These events likely limit the gene flow to short distances, create spatial genetic structures, and increase the likelihood of inbreeding at a local level, as revealed in fine scale level assessments of C. canadensis 27,69,102 . We also collected C. canadensis samples from New York (U.S.) that represent individuals occurring farther north than the reported geographic range of the species. These individuals also could result from open-pollinated escapes subsequent to introduction of C. canadensis into managed landscapes.
STRU CTU RE analysis of the C. canadensis dataset revealed the presence of two geographically distinct clusters, designated as northern and southern clusters, that are divided longitudinally northwest by southeast along a Kentucky-Tennessee-Mississippi transition zone. From this evidence, the southern Appalachian Mountains have not posed a barrier, as populations belonging to the northern STRU CTU RE clusters were found on both sides of the Appalachian Mountains. The presence of only two genetic clusters is congruent with the simple postglacial lineage theory presented for eastern North American tree species 59,103 . The most recent glacial event ended approximately 21,000 years ago. By its conclusion, boreal and temperate tree species had shifted closer to mid-latitudes within the eastern U.S., where many species survived within bottlenecked refugial populations 104 . According to our DIYABC supported scenario, a southeastern refugium was also likely to be the major postglacial refugium for C. canadensis. This scenario is further supported by several phylogenetic studies that have indicated that southeastern U.S. populations served as one prominent large-scale, post-glacial refugium for many temperate species 14,103,104 . Modern day temperate species including Fagus grandifolia Ehrh. (American beech), Acer rubrum L. (Red maple), and C. florida (Flowering dogwood), for example, likely originated from this southeastern refugium 4,14,105 . Cercis canadensis also shares the same geographic distribution as these temperate tree species, and modern-day wild populations of C. canadensis are ubiquitous throughout this region.
Our analyses also revealed support for several possible micro-refugia across the eastern U.S., evident in genetic differences among populations and presence of substructures that lack distinct centers. Post-glacial C. canadensis populations from this geographical range may have spread northward to establish the current species distribution. Post-glacial populations of other tree species from this range are adapted to semiarid to xeric environments 9,14,103 and present adaptive characteristics that are similarly evident in mid-western C. canadensis populations. Moreover, the presence of a number of refugia or fragmented refugia is also can be supported by the high genetic diversity and allelic richness of the modern-day C. canadensis populations 60 .
Phylogeographical studies of other tree species and animals indicate that they survived as northern cryptic micro-refugia 104,106-108 . Due to insufficient fossil data from the Late Pleistocene in the northern region, it is difficult to conclude the definite presence of northern refugia of the C. canadensis populations 5,6 . The best supported DIYABC evolutionary scenario suggests that at the time of the last glacial period C. canadensis populations persisted within a southern population group. Therefore, we also find little evidence for the possibility of a northern cryptic refugium for this species. Instead, pre-glacial C. canadensis was distributed in midwestern and southeastern U.S. populations, which later survived in one or more post-glacial midwestern and southeastern U.S. refugia. As a consequence of long-term population isolation within the refugial areas, a post-glacial refugial population in the midwestern U.S. may have diverged from the southeastern large refugium population, giving rise to a genetically differentiated northern spatial cluster 70,95 . Also, this post-glacial northern population may have later migrated from the midwestern U.S. to establish the current range distribution.
Ancestors of North American Cercis species are thought to have originated under mesic conditions and may have dispersed into North America across the North Atlantic Land Bridge 81,84,109 . According to several studies, ancestral Cercis population adapted to the drier environment and then spread into the Northern hemisphere during the mid-Miocene period 84,87 . It is possible, then, that this ancestral, un-sampled, mid-Miocene Cercis population gave rise to the southern C. canadensis population as suggested by DIYABC Scenario 2a.
This economically and ecologically significant deciduous shade tree species has a number of desirable morphological variations and ornamental characteristics including foliar color and texture, flower color variation, drought tolerance, pathogen resistance, as well as a wide variety of architectural forms 20,26,57 . Fruits and seeds of C. canadensis are consumed by several bird species and small mammals 57,97,99,101 , and many pollinators depend on this tree for an early season food source 110 . More than three dozen cultivars are available commercially and nursery stock sales of the species contribute to more than $27 M annually in the U.S. 71,111 . When paired with Scientific Reports | (2021) 11:21803 | https://doi.org/10.1038/s41598-021-01020-z www.nature.com/scientificreports/ recent introductions of novel horticultural cultivars with highly desirable characteristics, the value of adaptive traits that are likely to exist across different geographic localities supports the importance in conserving local level diversity of C. canadensis. These populations are genetic reservoirs of potential variability that can provide breeding programs with the resources needed to improve selected traits (e.g., limiting seed pod productivity in landscape specimens) and provide additional opportunities for developing high-value cultivars for commercial trade. Future work should also focus on identifying important adaptive traits in wild populations that can be used to help ensure that C. canadensis populations will persist and will continue to adapt to a changing climate that is occurring across portions of the current species distribution.

Data Archiving Statement
After the manuscript is accepted, data will be publicly available and deposited to Dryad Depository.