Past climate-driven range shifts structuring intraspecific biodiversity levels of the giant kelp (Macrocystis pyrifera) at global scales

The paradigm of past climate-driven range shifts structuring the distribution of marine intraspecific biodiversity lacks replication in biological models exposed to comparable limiting conditions in independent regions. This may lead to confounding effects unlinked to climate drivers. We aim to fill in this gap by asking whether the global distribution of intraspecific biodiversity of giant kelp (Macrocystis pyrifera) is explained by past climate changes occurring across the two hemispheres. We compared the species’ population genetic diversity and structure inferred with microsatellite markers, with range shifts and long-term refugial regions predicted with species distribution modelling (SDM) from the last glacial maximum (LGM) to the present. The broad antitropical distribution of Macrocystis pyrifera is composed by six significantly differentiated genetic groups, for which current genetic diversity levels match the expectations of past climate changes. Range shifts from the LGM to the present structured low latitude refugial regions where genetic relics with higher and unique diversity were found (particularly in the Channel Islands of California and in Peru), while post-glacial expansions following ~ 40% range contraction explained extensive regions with homogenous reduced diversity. The estimated effect of past climate-driven range shifts was comparable between hemispheres, largely demonstrating that the distribution of intraspecific marine biodiversity can be structured by comparable evolutionary forces across the global ocean. Additionally, the differentiation and endemicity of regional genetic groups, confers high conservation value to these localized intraspecific biodiversity hotspots of giant kelp forests.

www.nature.com/scientificreports/ fit non-linear relationships and complex interactions, while strongly reducing overfitting through hyper-parametrization and forcing of monotonic responses 49,50 . Biologically meaningful benthic predictors (i.e., along bottom layers) for the species 29 were developed 52 for the present-day and the LGM conditions using the computation pipelines of Bio-ORACLE 51 . These reflected essential resources (nutrients as Phosphate and Nitrate), disturbance (ice thickness) and factors affecting physiology (salinity and temperature). Predictors were clipped to 30 m depth, the typical maximum depth distribution of the species 52 . The LGM predictors considered sea level and shoreline displacements by placing the isobath of − 120 m below current sea level 53 . Presence records were extracted from the fine-tuned dataset of marine forest species 54 . The same number of pseudo-absences as presences were randomly generated in sites where no presences were recorded 55 . To reduce the negative effect of autocorrelation in the models, the correlation of predictors within the range of occurrence records (presence and pseudo-absences) was tested as a function of geographic distance. For this purpose, a correlogram was built to pinpoint the minimum distance at which predictors were significantly correlated. Records were pruned by randomly selecting one record from the pool found within such distance 56 .
Considering the general pattern found in the species' genetic structure (see "Results"), models were developed separately for the northern and southern hemispheres 57 . A tenfold cross-validation framework using independent latitudinal bands 7 was implemented to find the optimal hyperparameter combination of number of trees (50-1000, step 50), tree complexity (1-6) and learning rate (0.01, 0.005 and 0.001). This approach also inferred the performance of models 58 by using the area under the curve (AUC) of the receiver operating characteristic curve and sensitivity 59 . To reduce overfitting, models were forced to fit negative monotonic responses for maximum temperature and ice thickness, and positive responses for nutrients, salinity and minimum temperature 7,49,60 . The relative contribution of predictors to the models was determined by computing the increase in deviance explained (i.e., goodness of fit) when each predictor was added to its alternative model. Physiological tolerance limits (maximum and minimum, depending on the predictor) were estimated from individual response functions produced for each predictor, while fixing all alternative predictors to their averages 61,62 .
Parsimonious models (i.e., with fewer predictors) with higher potential for transferability 63 were built using a stepwise approach by fitting a full model (i.e., with all predictors) and interactively removing predictors one at the time, from the least to the higher contributive, until the difference of deviance explained between the full model and the reduced model was higher than zero 58,60,64 . Maps reflecting the potential distribution of the species for present-day conditions and the LGM were developed with the parsimonious models fitting the optimal hyperparameters. These maps were reclassified to binomial surfaces, reflecting presence and absence of Macrocystis pyrifera, by applying a threshold maximizing both specificity (true negative rate) and sensitivity 7,58,65 . Range shifts were determined in terms of area by comparing the LGM and present modelled distributions for the regions where genetic sampling took place (see "Methods" below). Regions providing glacial refugia were identified where the species was predicted to occur during the LGM period. Because the species can produce rafts that disperse for long periods of time over the surface of the ocean 29 , predictions were also performed for the LGM and the present with surface layers 51 .
Genetic structure and diversity. Approximately 30 individuals were sampled per site (115 sites; 3872 individuals in total; S1) for genetic analysis, by removing a piece of the blade and preserving it in silica drying crystals. Genomic DNA extraction was performed with the NucleoSpin 96 Plant Kit II (Macherey-Nagel, Duren, Germany) and standard protocols. Microsatellite amplification and scoring was performed for 6 polymorphic loci using PCR conditions and methods described in 66 and 39 . Genetic structure between sites was inferred with Structure software 67 without a priori population assignment, allowing admixture, using the model of correlated allele frequencies for a range of K clusters between 1 and 10, a burn-in time of 2 × 10 5 repetitions and 1 × 10 6 iterations. The potential number of clusters was inferred with the DeltaK criterion 68 , which identifies the "best" K value as the one maximizing the rate of change in the log probability of the data (ΔK). The analysis was performed under two hierarchical steps 3,8 : the first aimed to discover the main genetic groups when considering the entire distribution, and the second aimed to reveal the regional levels of genetic structure within each of the main genetic groups that resulted from the first step. This approach allows the analysis of regional genetic differentiation within the main genetic groups independently of their differences in allelic richness. Two sites showing signs of admixture in the first hierarchal step (see "Results") were excluded from the second step to better disentangle genetic structure 3,8 .
Genetic structure was also inferred with a network analysis, with nodes defined by the sampled sites and edges by their relative number of shared alleles 69 . Excessive network connections with surplus information were removed until a threshold maximizing Modularity 70 . The leading eigenvector algorithm using the percolated network assigned a unique membership (i.e., cluster) to the nodes based on the edge's distance. The significance of this process was inferred by testing the proportion of 10 4 random membership assignments retrieving higher Modularity than observed. Network eigenvector centrality was also determined to identify sites serving as hubs to gene flow 69 .
Genetic differentiation was determined between clusters with Jost's D. This index was chosen in detriment of F ST because it is more suitable for comparisons with contrasting levels of diversity 71 , as in this case (see "Results"). Genetic diversity per site and cluster was inferred as standardized allelic richness, gene diversity (expected heterozygosity) and private alleles, for the smallest sample size found in any population (excluding sizes ≤ 10) using 10 4 randomizations.

Results
Species distribution modelling. The marine forests dataset retrieved 10,715 occurrence records for the species. Correlograms identified predictors autocorrelated at a minimum distance of 4 km in the northern group, and 9 km in the southern group, pruning a final dataset of 180 and 369 records, respectively. The distribution models included multiple predictors, from which temperatures (maximum and minimum) had a prominent role in explaining the distribution of both northern and southern genetic groups (combined relative contributions: 95.06% and 75.15%, for the northern and southern hemispheres, respectively; Table 1). Thermal tolerance was estimated between 3.22 and 24.23 °C for the northern group and between 1.94 and 23.98 °C for the southern group. The modelled distribution of the southern group was further dependent on nutrients above ~ 0.02 mmol.m −3 and ice thickness above 0.16 m (contributions ranging between 5.69 and 10.63%; Table 1). An additional niche overlap analyses performed between the northern and southern groups (S2) showed evidence of niche conservatism in Macrocystis pyrifera (i.e., the northern and southern hemispheres display identical ecological niches).
The models showed good potential for transferability (CV AUC > 0.9; CV Sensitivities > 0.9; Table 2) and the predictions performed for the present largely matched the known distribution of the species, as verified by sensitivity (i.e., true positive rate) > 0.95 and AUC > 0.94 (Table 2; Fig. 1). Comparing predictions with the species' realized distribution, as inferred by reports of occurrence records ( Fig. 10 in S3), shows some degree of overprediction in Alaska (from Palma Bay to Kodiak Island, ~ 900 km coastline), Chile (from Antofagasta to Huasco, Atacama, ~ 500 km coastline), Southern Africa (from Paternoster, South Africa, to Kunene River, Namibia, ~ 1,500 km coastline). To a lower extent, overprediction was detected in Western Australia (~ 300 km coastline) and Southern Australia (~ 400 km coastline). These are regions where the species is not reported to occur, but where the models predicted suitable conditions based on the predictor variables considered. Similarly, the species was continuously predicted along the coastlines of Washington (USA) and Southern Oregon, where reports of occurrence records are elusive ( Fig. 10 in S3). During the LGM, the models estimated a 40% range contraction within the regions sampled for genetic data ( Table 3). During that past period, unsuitable areas might have occurred in the high latitudinal regions of Alaska to Washington state (USA), and throughout most of the Southern Ocean, Tierra del Fuego (Argentina) and Tasmania ( Fig. 1; Table 3). Conversely, the low latitude range limits might have remained stable (e.g., California to Mexico, Northern Chile, Peru, Australia and New Zealand). Among the sampled regions, Channel Islands to Mexico was the only one predicted with wider suitable habitats in the LGM when compared to the present (Table 3). Results also indicated that other regions, like Southeastern Africa, Western and Eastern Australia and Northern New Zealand, might have had wider suitable habitats for M. pyrifera during the LGM (Figs. 6-9 in S3), with ranges potentially expanding beyond its present-day modelled distribution. Australia is in the same genetic group as Patagonia and the sub-Antarctic islands (see genetic results below), and these latter ones suffered local extinctions due to ice during the LGM, therefore the genetic group as a whole increased in area from the LGM to the present (Table 3), even though that was not the case in Australia.
The models predicting suitable surface conditions (i.e., exclusively for rafting) showed disjunct potential distributions between the northern and southern hemispheres during the LGM and the present. Throughout the Southern Ocean, models showed broad suitable conditions for rafting with present-day conditions, while during the LGM, a discontinuity might have occurred along southern Chile and Tierra del Fuego (Argentina).
Genetic structure and diversity. A total of 156 alleles were amplified across 3872 genotyped specimens (range of alleles per locus: . The Evanno criteria applied to the first hierarchal level of Structure revealed www.nature.com/scientificreports/   Fig. 2). The southern hemisphere was divided into a broad region including northern Chile and New Zealand, and into Patagonia, Australia and the Subantarctic Islands. The network analysis supported the two main clusters of the northern and southern hemispheres (Modularity p-value test: 0.001) and showed the two sites in Peru belonging to the northern (site 74) and southern clusters (site 73; Fig. 2). This analysis further showed Point Conception, Catalina Island and Tierra del Fuego (sites 25, 26, 62 and 90; Fig. 2) with higher eigenvector centrality (> 95th percentile). The two main clusters associated with the northern and southern hemispheres showed high differentiation (Jost's D: 0.64; Fig. 3C). At a lower level of genetic structure, Peru (admixture region) and the cluster comprising northern Chile and New Zealand were most differentiated among themselves and from all remaining clusters (Fig. 3D). In terms of genetic diversity, results show allelic richness per site higher in the Channel Islands and San Mateo Point (USA), followed by San Diego region (USA) and Mexico (Â > 8; S1). Lower diversity values were found in the regions predicted as extinct during the LGM (i.e., Patagonia, throughout Marion, Gough and Macquarie Subantarctic Islands and in Canada and Alaska). Number of private alleles were higher in the Channel Islands (particularly in San Clemente and Santa Barbara Islands), in the two sampled sites of Peru, in Monterey (USA) and New Zealand (PÂ > 1; S1). More than 50 sites distributed throughout the sampled regions exhibited no private alleles. Overall, allelic richness outside glacial refugia was always < 3, with the exception of two sites in Alaska (USA), and the average number of private alleles was < 1 at distances > 55 km from refugia and = 0 at distances > 150 km from refugia (Fig. 3A, B). The sites Sitka (Alaska), Craig (Alaska) and Bamfield Island (Canada) of the northern hemisphere and Grytviken (Subantarctic Island) and Murray Channel (Tierra del Fuego) of the southern hemisphere distanced > 500 km from the closest glacial refugia (Table S1). Considering the first level of genetic clustering, results show diversity (allelic richness and number of private alleles) significantly higher in the northern hemisphere (Fig. 3C). The second level of structure revealed higher diversity in the cluster from Channel Islands to Mexico and lower diversify in that of Alaska and Canada. In the southern hemisphere the number of private alleles were higher in Peru and in the northern Chile and New Zealand clusters.

Discussion
The rare opportunity to study a biological model with replication of postglacial expansions in both northern and southern hemispheres allowed us to demonstrate that the distribution of intraspecific marine biodiversity can be structured by comparable evolutionary forces across the global ocean. By combining independent theoretical distribution modelling with empirical genetic data, we show how past range shifts (c. 20 Kyr ago to the present) shaped refugial regions in both hemispheres, where Macrocystis pyrifera retained higher and unique genetic diversity, and how significant post-glacial expansions left ~ 40% of the sampled distribution with homogenous reduced diversity, regardless the distance to refugia. Genetic patterns further show the northern hemisphere with significantly higher genetic diversity and number of private alleles, supporting previous hypotheses 46 of an ancestral origin of the species in this region. In particular, the genetic hotspots found at the Channel Islands (California; putative global origin of M. pyrifera) and in Peru (highest number of private alleles), potentially involved in the ancestral radiation of lineages, are suggested here as priority conservation areas under Climate Change Integrated Conservation Strategies for potentially endangered phylogeographic lineages 72 . The distribution of 6 distinct clusters, with strong genetic discontinuities corresponding to previously described phylogeographic breaks of additional species and well-known oceanographic barriers shaped by ocean currents and habitat discontinuities, allow making broad generalizations about the process driving marine phylogeography, particularly important for ecosystems structuring species.

Climate changes structuring genetic diversity across the hemispheres. The SDM approach
showed the potential distribution of Macrocystis pyrifera (i.e., climatic suitable habitats) mainly driven by thermal conditions (i.e., minimum and maximum extremes), which is typical for marine forest species modelled at the scales of our study 2,7 . The estimated thermal tolerance of the species between approx. 2 and 24 °C largely agreed with empirical physiological studies 73,74 , which together with the evidence found of niche conservatism between hemispheres (i.e., identical ecological niches; S2) provided strong support for the models, particularly important when performing temporal transferability to estimate demographic changes 57,75 . The high performance shown by our SDM approach is expected when using machine learning algorithms with proper hyperparameterization and monotonicity constrains 49 , fitting biologically meaningful predictors against comprehensive datasets of distribution records 7,76 . Despite the general agreement found between predicted suitable habitats and reported occurrence records (i.e., high sensitivity rate), overprediction was found in Alaska, Chile, and Southwestern Africa, and, to a lower extent, in Western and Southern Australia 52 , and along the coastlines of Washington and Oregon states (USA). While the models predicted favorable conditions, additional predictor variables not considered might be preventing its occurrence. Specifically, the lack of rocky substrate 43 , interspecific competition 77 , limited light conditions 78 , and restricted dispersal potential 29 can be excluding the species from such areas predicted as suitable.
Hindcasting the models to LGM climatologies estimated a 40% range contraction along the regions sampled for genetics. This was mostly evident from Alaska to Vancouver Island and throughout most of sub-Antarctic Islands, Southern Patagonia and Tasmania. Conversely, lower latitude ranges (e.g., California to Mexico, Northern www.nature.com/scientificreports/ The rate and magnitude of these changes are not uncommon for cold-temperate species 6,[9][10][11]14 , but were never addressed before with the same biological model exposed to the LGM conditions of both hemispheres in a SDM framework. This allowed identifying multiple mid-to low latitude refugia at global scales, for which peri-glacial margins largely match additional genetic studies; for instance, Vancouver Island adjacent to the Laurentide Ice sheet in the northern hemisphere 79 and Patagonia and the Falkland Islands in the southern hemisphere 24 . As anticipated, populations within refugia showed higher and unique genetic diversity levels, as long-term persistence rendered the possibility of population to accumulate and retain regional diversity levels 8,9,14,80 . Outside refugia, where range shifts might have occurred, diversity levels were always lower regardless the hemisphere, with private alleles reduced to negligible values < 1 at distances > 55 km of refugia, and to 0 at distances > 150 km. This skewed pattern of diversity occurring at such short distances of closest refugia (but not from the putative origin region; Fig. 5 in S3) likely resulted from consecutive founder effects occurring during post-glacial range expansions, which have left homogeneous landscapes with lower diversity levels in the two hemispheres 81 . This pattern might have been maintained by density barrier effects, halting the expansion of distinct immigrant genes 82,83 .
Results also highlighted the role of suitable habitat changes (i.e., predicted areas) as a proxy of effective population size changes through time 11 . In particular, the genetic hotspot found from Channel Islands to Mexico, www.nature.com/scientificreports/ with striking higher diversity compared to anywhere else, was the only sampled region that increased suitable habitats during the LGM. An additional study shows this region with even larger habitats (up to threefold) from the LGM to the mid-Holocene 42 . Large and stable habitats can potentially maintain population sizes large enough to counteract the genetic effects of inbreeding depression and drift over time 84 . This contrasts with regions like Patagonia and sub-Antarctic islands that might have experienced severe population reductions, while not complete extinct (up to 56% LGM contraction), and today show reduced diversity. But comparing diversity levels between sites / genetic groups must consider additional drivers beyond potential habitat changes through time. For instance, the major genetic bottleneck during the colonization of the southern hemisphere 46 (see "Discussion" below on "Global phylogeographic patterns") may have strongly reduced diversity of new colonized areas, despite the subsequent relative size of persisting refugia. Accordingly, outside the putative region of Macrocystis origin, even populations within estimated regionally persistent areas, may presently contain lower diversity than expected in the ancient locations of its ancestral evolutionary history. Genetic erosion within refugia can further occur due to the exposure of individuals to peripheral niche conditions at low latitude range margins, typically characterized by nutrient-deprived warmer waters 30,85 . In the present study, Mexico (northern hemisphere) and Peru (southern hemisphere) can represent such cases. Despite being predicted within refugia, considering their high private diversity levels and SDM estimates, both regions show low allelic richness. There, studies following demographic changes of giant kelp have recurrently linked the extreme conditions of El Niño events to local population extinctions 32,86 , which might have produced bottlenecks and shifted genetic baselines 34 . Cryptic persistence in deep colder waters or microscopic gametophyte stages, as suggested elsewhere 87,88 , could have counteracted complete genetic losses and safeguarded the observed unique diversity. In this context, climate change scenarios are projected to shift the 24 °C isotherm matching the species' thermal tolerance to more poleward regions, hundreds of km beyond these sites, particularly in the no mitigation scenarios overlooking the Paris Agreement initiative 89 . The complete loss of unique genes, as recently reported for the Ecklonia forests of Oman 90 , is of particular concern 30 , calling for considering both warm range edges of M. pyrifera as priority for conservation in light of Climate Change Integrated Conservation Strategies for potentially endangered phylogeographic lineages 72 .
Global phylogeographic patterns. Results showed M. pyrifera composed by two main genetic clusters, which coincide with the division of both hemispheres, followed by a second and more complex level of structure comprising six clusters. All clusters showed significant pairwise differentiation levels, largely suggesting the effect of genetic drift, not compensated by regular gene flow. Compared to the southern hemisphere, the northern hemisphere showed significantly higher genetic diversity and up to fourfold more private alleles (i.e., genetic diversity endemic of the northern hemisphere), rendering it the potential ancient origin of the species. This is supported by previous studies also hypothesizing the southern hemisphere distribution of M. pyrifera and additional cold-temperate species of macroalgae resulting from northward introductions 46,91 . The centrality patterns found in both genetic structure and network analyses point to Catalina Island bridging the hemispheres through the region sampled in Peru, a hypothesis raised elsewhere 46 . This process can explain the admixture levels found in Peru, and may have occurred by stepping-stone through unsampled tropical regions where the species occurred, or still occurs in elusive habitats (e.g., Socorro Island in Mexico) 92 , yet in a distant past beyond the LGM, as SDM does not suggest the possibility of rafting between the hemispheres. Additional temperature reconstructions for this tropical area match the sea surface temperature anomaly of our data during the LGM of approx. − 2 °C, reinforcing a plausible long-term barrier for rafting. Earlier periods like the pre-or mid-Pleistocene climate transition (620-435 Kyr ago and 870-620 Kyr ago) could have provided suitable conditions for the stepping-stone process, as suggested for the red algae Callophyllis variegata 91 .
Marked genetic structure hinders the inference of other colonization pathways, with the exception for the regions more recently colonized (i.e., Alaska, Canada and most sub-Antarctic Islands). Structure and network analyses (degree centrality) suggest Alaska and Canada potentially originating from Monterey Bay, and the sub-Antarctic Islands being colonized by Tierra del Fuego or a neighboring persistent location (e.g., Falkland Islands), where the species could have dispersed 9,24,93 ; this contrasts with additional persistent areas (e.g., southern Chile) from which dispersal might have been hindered by the major discontinuity for rafting occurring along southern Chile and Tierra del Fuego, as suggested by the SDM.
The genetic divisions observed in both hemispheres are corroborated by previous studies of M. pyrifera 38,39,46 , and reinforce the role of oceanographic currents and habitat discontinuity shaping the genetic structure of the species, as previously studied along the Californian coastline 43,44 . In the northern hemisphere, the deflection of the Alaska and California currents 94 can explain the division between G1 (Alaska, Canada & Monterey) and G2 (Santa Cruz to Point San Luis), while the well-known biogeographic barrier of Point Conception 95 defines a genetic cluster per se. In the southern hemisphere, the split of the Antarctic Circumpolar Current into the northward Humboldt current and the southward Cape Horn current shapes a barrier at − 42° latitude matching the observed division between Patagonia and N. Chile (G5 and G6). The genetic discontinuities in this region were also associated with founder effects during post-glacial colonization of Patagonia for other macroalgae species, as well as for invertebrates and vertebrates [96][97][98][99] . The subdivision of Peru can result from habitat discontinuity shaped by extensive sandy beaches, or by the highly advective Humboldt Current System, which promotes off-shore transport, therefore generating a connectivity barrier to coastal populations. Lastly, the Tasman Sea, identified as a major biogeographical barrier for marine biodiversity 100 , explains the additional separation of G5 and G6, from Australia to New Zealand.
Overall, the present study with replication along the northern and southern hemisphere, highlights the role of past climate changes at global scales in structuring refugial regions where populations may display higher and unique genetic diversity, compared to more recent populations resulting from post-glacial expansions. Additional www.nature.com/scientificreports/ drivers such as regional population size changes, genetic bottlenecks produced by major phylogeographic events (e.g., the colonization of the southern hemisphere) and the exposure to peripheral niche conditions in warm range edges might have further been key to structuring global diversity levels. The study also highlights the role of major oceanographic currents and habitat discontinuities, coupled with the density barrier effects of previously established populations, restricting the homogenization of distinct genetic lineages over time, as suggested for other kelp species 3,98 . In contrast, long-distance dispersal by rafting must be a rare process 29 , but crucial to the colonization of unoccupied suitable habitats following glacial events. Together, these processes shaped disjunct genetic lineages at short distances (scales of tens to hundreds of km), rendering them of higher conservation value in the face of future climate changes, particularly if one considers the potential loss of unique alleles at low latitude range margins, and the multiple ecological, and socioeconomic services provided by giant kelp forests 27 . The insights provided with replication across hemispheres, contribute to a better understanding of the genetic effects of past climate changes, and may be generalizable to additional cold-temperate species, particularly for other important ecosystem-structuring species.