Genetic patterns in Neotropical Magnolias (Magnoliaceae) using de novo developed microsatellite markers

Veltjen, Emily; Asselman, Pieter; Hernández Rodríguez, Majela; Palmarola Bejerano, Alejandro; Testé Lozano, Ernesto; González Torres, Luis Roberto; Goetghebeur, Paul; Larridon, Isabel; Samain, Marie-Stéphanie

doi:10.1038/s41437-018-0151-5

Open access
Published: 27 October 2018

Emily Veltjen ORCID: orcid.org/0000-0002-3170-3345¹^na1,
Pieter Asselman^1,2^na1,
Majela Hernández Rodríguez³,
Alejandro Palmarola Bejerano⁴,
Ernesto Testé Lozano³,
Luis Roberto González Torres⁵,
Paul Goetghebeur¹,
Isabel Larridon ORCID: orcid.org/0000-0003-0285-722X^1,6 &
…
Marie-Stéphanie Samain ORCID: orcid.org/0000-0002-7530-9024^1,7

Heredity volume 122, pages 485–500 (2019)

Abstract

Conserving tree populations safeguards forests, since trees are key elements of the ecosystem. The genetic characteristics underlying the evolutionary success of the tree growth form (high genetic diversity, extensive gene flow and strong species integrity) contribute to their survival in terms of adaptability. However, different biological and landscape contexts challenge these characteristics. This study employs 63 de novo developed microsatellite or SSR (Simple Sequence Repeat) markers in different datasets of nine Neotropical Magnolia species. The genetic patterns of these protogynous, insect-pollinated tree species occurring in fragmented, highly-disturbed landscapes were investigated. Datasets containing a total of 340 individuals were tested for their genetic structure and degree of inbreeding. Analyses of genetic structure showed structuring between species, i.e. strong species integrity. Within the species, all but one population pair were considered moderately to highly differentiated, i.e. there was no indication of extensive gene flow between populations. No overall correlation was observed between the genetic and geographic distances of the species’ population pairs. In contrast to the pronounced genetic structure, there was no evidence of inbreeding within the populations, suggesting mechanisms favouring cross pollination and/or selection for more genetically diverse, heterozygous offspring. In conclusion, the data illustrate that the Neotropical Magnolias in the context of a fragmented landscape still have ample gene flow within populations, yet little gene flow between populations.

Introduction

Conservation genetics utilises a representative sample of DNA and organisms to quantify and study genetic diversity to preserve species as dynamic entities capable of coping with environmental change (Frankham et al. 2010). A collection of DNA fragments representing the genome is realised by employing molecular markers: fragments of DNA associated with a certain location within the genome, providing information about the allelic variation at the given locus (Schlötterer 2004). Microsatellite or SSR (Simple Sequence Repeat) markers are often the preferred type of molecular marker in conservation genetics because they are codominant, highly polymorphic, ubiquitous, reproducible and neutral; they also have a high mutation rate and require only simple sample preparation (Selkoe and Toonen 2006). Although it is labour and cost intensive to develop and test SSR primer pairs, these can often be employed across species, with success decreasing as relatedness to the source species declines (Kalia et al. 2011). A representative sampling of organisms can be interpreted at different levels: individuals for populations, populations for species, and species for ecosystems. The latter strategy makes use of the umbrella species concept (Roberge and Angelstam 2004).

An exemplar group of umbrella species are trees: they maintain the structure and function of forest ecosystems, and create resource niches and patches for other organisms (Pautasso 2009). Trees also provide various ecosystem services and resources for human use (Neale and Kremer 2011) and their genetics and evolution have paradoxical features (Petit and Hampe 2006). Trees were found to maintain high levels of genetic diversity (Hamrick et al. 1992), but experience low nucleotide substitution rates and low speciation rates when compared to annual plant lineages (e.g. Bousquet et al. 1992; Petit and Hampe 2006; Whittle and Johnston 2003). They combine high local differentiation for adaptive traits (Aitken et al. 2008) with extensive gene flow (Austerlitz et al. 2000; Kremer and Le Corre 2012). Furthermore, they maintain species integrity, while expressing abundant interspecific gene flow (Ellstrand et al. 1996). The abovementioned features provide an expected capacity for tree survival, as they create resilience against threats such as climate change or habitat fragmentation (Aitken et al. 2008; Hamrick 2004). However, the interplay of the biological and landscape context challenges these generalised characteristics and creates the need for context-oriented tree conservation genetic studies and subsequent management guidelines (Aparicio et al. 2012; Dick et al. 2008).

To investigate the general patterns of tree genetics in an empirical setting, and to contribute to the conservation of the species and forests under study, we focus on New World representatives of the tree genus Magnolia (Magnoliaceae) occurring at tropical latitudes, hereafter named Neotropical Magnolias. Magnolia trees provide an interesting case-study with bisexual, protogynous flowers, specialised beetle pollination with tepal movement, variable flowering phenology and seed dispersal by animals (Thien 1974). The Red List of Magnoliaceae (Rivers et al. 2016) states that 76% of the Neotropical Magnolias are threatened, with an additional 16% listed as data deficient. Neotropical Magnolia populations have not been studied from a molecular point of view (Cires et al. 2013) and their species are delineated based on morphological and distributional argumentation (e.g. Howard 1948; Palmarola et al. 2016; Vázquez-García et al. 2013b). Many of the Magnolia species and populations occur in fragmented, highly-disturbed, relict primary forest landscapes, such as the cloud forests of the Caribbean islands and the cloud and rain forests of Mexico (Rivers et al. 2016).

This study aims to (1) provide de novo developed SSR markers for Neotropical Magnolia species; (2) employ the SSR markers for genetic species delimitation between Caribbean Magnolia species; (3) search for patterns of extensive gene flow between Caribbean Magnolia (sub)species and populations; and (4) test for signs of inbreeding within the Neotropical Magnolia populations.

Material and methods

Sampling and DNA extraction

Sample information of the 17 different taxa (i.e. 16 species, of which one species consists of two subspecies) and 17 populations included in this study is given in Table 1. A map showing the location information of the wild collected accessions of Neotropical Magnolia from the Caribbean and Mexico is given in Fig. 1. The wild collected samples comprise 346 samples, of which 340 represent the 17 populations. The additional six wild collected samples represent single collections of different species. One further sample is from an ex situ collection of M. dealbata.

Table 1 Sample information of 17 Magnolia taxa (i.e. 16 species, of which one species consists of two subspecies) and 17 populations included in the SSR testing and/or genotyping

For the 17 populations included in the full genetic analyses, Average Pairwise Distance between individuals (APD), Maximum distance between consecutive individuals (Max), Spatial extent of the populations (SpE) and number of sampled individuals per populations (N_S) are given in Table 2. Pairwise distances were calculated using the fossil package (Vavrek 2011) in R v.3.4.3 (R Core Team 2016).
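As an illustration of the kind of calculation behind the APD column, the following Python sketch computes great-circle distances with the haversine formula and averages them over all pairs of individuals. It is not the code of the fossil package; the function names, the Earth radius constant and the coordinates are assumptions made for this example only.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def average_pairwise_distance(coords):
    """Mean distance over all unordered pairs of individuals (APD)."""
    pairs = list(combinations(coords, 2))
    return sum(haversine_km(p, q) for p, q in pairs) / len(pairs)


# Hypothetical coordinates for three trees (not taken from the study).
trees = [(18.30, -65.80), (18.31, -65.80), (18.30, -65.79)]
apd = average_pairwise_distance(trees)
print(f"APD = {apd:.3f} km")
```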

Table 2 Population statistics of Caribbean and Mexican Magnolias

All 347 leaf samples were dried in silica gel and their DNA was isolated using a modified cetyltrimethylammonium bromide (CTAB) (Doyle and Doyle 1987) extraction protocol, with MagAttract Suspension G solution (Qiagen, Germantown, USA) (Xin and Chen 2012) mediated cleaning (Larridon et al. 2015). DNA quantity and quality control was executed using a Qubit® 2.0 Fluorometer (Thermo Fisher Scientific, Massachusetts, USA) and Nanodrop 2000 Spectrophotometer (Thermo Fisher Scientific), respectively.

SSR markers: development and testing

Primer pairs were developed to amplify sequences containing SSR repeats based on four Neotropical Magnolia species: Magnolia lacandonica (MA39), M. mayae (MA40), M. dealbata (MA41), and M. cubensis subsp. acunae (MA42). The development of the enriched microsatellite library was outsourced to Allgenetics® (A Coruña, Spain) where enrichment was performed using the Nextera XT DNA kit probes (Illumina, California, USA) with the following motifs: AGG, ACG, AAG, AAC, ACAC and ATCT. The library was sequenced on an Illumina MiSeq® platform.

From the 4 × 500 predetermined SSR primer pairs provided by Allgenetics®, 176 were selected for further testing: 49 developed from MA39-reads, 20 developed from MA40-reads, 20 developed from MA41-reads and 87 developed from MA42-reads. Selection of the 176 SSR markers was carried out randomly, respecting the characteristics specified in Guichoux et al. (2011). The forward primers were linked with a universal tail to accomplish multiplex pooling in a three-primer PCR (Vartia et al. 2014). The following universal tails were used: T3: 5′ AATTAACCCTCACTAAAGGG 3′, M13(-20): 5′ GTAAAACGACGGCCAGT 3′, Hill: 5′ TGACCGGCAGCAAAATTG 3′ (Tozaki et al. 2001) and Neomycin reverse: 5′ AGGTGAGATGACAGGAGATC 3′. The reverse primers had a PIG-tail (Brownstein et al. 1996).

All 176 markers were screened for amplification success on the 17 taxa, each represented by one randomly selected sample. PCRs were performed on a total volume of 13 µL under the following conditions: 2 min at 95 °C; 35 cycles of 95 °C for 30 s, 52 °C for 30 s, 72 °C for 90 s; 72 °C for 6 min. The Master Mix contained 0.2 µM forward primer, 0.2 µM reverse primer, 5 ng/ml DNA (suspended in 1 × TE buffer), 1 × TrueStart Taq Buffer (Thermo Fisher Scientific), 1.5 µM MgCl₂ (Thermo Fisher Scientific), 0.125 µM dNTP, 5U of TrueStart Hot Start DNA polymerase (Thermo Fisher Scientific), and 0.4 mg/ml BSA (bovine serum albumin) per reaction. PCR products were run on a 1% agarose gel, stained with ethidium bromide and visualised under UV-light. Every (sub)species × primer combination was scored. Amplification scores of the 63 published SSR markers are given in the Supplementary Table S1. The (sub)species × primer combinations which were scored to have a single band were submitted to polymorphism testing.

Polymorphism tests were executed on eight individuals per Magnolia species, comprising four individuals per predefined population. The individuals for the test-multiplexes were selected to be spatially spread throughout the populations and have 260/230 and 260/280 OD (Optical Density) ratios approximating 2. The (sub)species × primer combinations were scored: 63 were considered polymorphic and unambiguous SSR markers in at least one of the ten tested taxa (Supplementary Table S2). These 63 SSR markers were used for species-specific multiplex design and final genotyping. Their primer information can be found in Supplementary Table S3.

Genotyping of individuals was executed by multiplex pooling with a three-primer PCR (Vartia et al. 2014). The fluorescent labels FAM, NED, PET and VIC were linked to the tails T3, Hill, Neo and M13, respectively. The multiplex pools were designed using Multiplex Manager (Holleley and Geerts 2009). Multiplex PCRs were performed on a total volume of 5 µL, under the following conditions: 15 min at 95 °C; 35 cycles of 94 °C for 30 s, 57 °C for 90 s, 72 °C for 90 s; 72 °C for 10 min. Each multiplex reaction contained 2 × QIA Multiplex PCR Master Mix (Qiagen), 5 ng/µL DNA, 0.025 µM for each forward primer, 0.1 µM for each reverse primer and 0.1 µM for each specified dye, carrying the same universal tail as the selected forward primer of the chosen primer pairs. Fragment analyses were executed by Macrogen Inc. (Seoul, South Korea) on an ABI 3730XL fragment analyser (Thermo Fisher Scientific) with a GeneScan™ 500 LIZ™ ladder (Thermo Fisher Scientific). The results were analysed in Geneious v.8.1.9 (http://www.geneious.com, Kearse et al. 2012) using the microsatellite plugin. When the test on the subset of individuals appeared promising (i.e. one set of clear peaks, good amplification and more than one allele), 20 individuals per population were genotyped for that marker. The ten taxa were genotyped for 21–36 polymorphic markers, delivering ten separate taxon-datasets (Supplementary Table S2: one taxon-dataset = one column with the markers coded “A”).

Error rates (Selkoe and Toonen 2006) for the markers (Supplementary Table S3) across all ten taxon-datasets were calculated, but were not actively and consistently tested for: duplicate genotyping was produced as a side-product during testing for polymorphism, optimizing multiplexes, re-genotyping a complete multiplex for (a) low/unclear peak(s), or as positive control between PCR batches.
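A per-allele genotyping error rate can be sketched as the number of mismatched alleles divided by the number of alleles compared across duplicate genotypes. The Python example below follows that common definition; whether it matches the exact calculation used for Supplementary Table S3 is an assumption, and the function name and allele sizes are hypothetical.

```python
from collections import Counter


def allelic_error_rate(duplicates):
    """Per-allele error rate over duplicate genotypes of one marker.

    duplicates: list of (first_run, second_run), each run a pair of allele sizes.
    Returns mismatched alleles / alleles compared, or None if nothing was duplicated.
    """
    compared = 0
    mismatched = 0
    for first, second in duplicates:
        # Alleles shared between the two runs, counted as a multiset intersection.
        shared = sum((Counter(first) & Counter(second)).values())
        compared += 2
        mismatched += 2 - shared
    return mismatched / compared if compared else None


# Hypothetical duplicate genotypes (allele sizes in bp), not from the study.
runs = [((150, 154), (154, 150)),   # identical genotype, order swapped
        ((150, 154), (150, 156))]   # one allele differs
print(allelic_error_rate(runs))
```

Returning None for a marker without duplicates mirrors the case reported in the Results, where one marker had no duplicated genotypes and therefore no error rate.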

The ten taxon-datasets were submitted to MICRO-CHECKER v.2.2.3 (Van Oosterhout et al. 2004) and ML-NullFreq (Kalinowski and Taper 2006) to test for null alleles. MICRO-CHECKER was run with 1000, and ML-NullFreq was run with 100 000 repetitions. Based on the results, markers with a high probability of representing null alleles were discarded from all downstream analyses.

To ensure that all amplified genetic regions were independent samples of the genome, allelic associations (Lewontin and Kojima 1960) (synonym: Linkage Disequilibrium = LD) per population were analysed in each of the ten taxon-datasets using the software program GENEPOP v.4.3 (Rousset 2008) with the dememorization number set to 10 000, batches set to 1000 and 50 000 iterations per batch. Evaluation of allelic associations was executed by examining both the uncorrected (Waples 2015) and (sequential Bonferroni) corrected p-values (Holm 1979) with nominal p-values of 0.05 per species and per population.
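The sequential Bonferroni procedure (Holm 1979) mentioned above can be sketched in a few lines of Python: p-values are sorted in ascending order and each is compared against a threshold that relaxes step by step, stopping at the first non-significant test. The function name and the example p-values are illustrative assumptions, not values from the study.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where the Holm step-down procedure rejects the null."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p-value is tested at alpha/m, the next at alpha/(m-1), and so on.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: all remaining (larger) p-values are retained
    return reject


# Hypothetical p-values for four pairwise locus tests.
p_vals = [0.01, 0.04, 0.03, 0.005]
print(holm_reject(p_vals))
```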

Genetic structure

To assess the utility of the SSR markers for genetic species delimitation between closely located Caribbean Magnolia species and to search for patterns of extensive gene flow between Caribbean Magnolia (sub)species, five different supraspecific (i.e. above species level) datasets were assembled. Dataset 1 comprises 340 individuals representing 17 populations, genotyped for all their polymorphic and monomorphic loci (see Supplementary Table S2: all marker × taxon combinations coded A, B and C). Hence, for this dataset it was assumed that the loci that tested to be monomorphic for four or eight individuals were monomorphic for all 20 individuals. Dataset 2 comprises 340 individuals representing 17 populations, genotyped for all the polymorphic and monomorphic loci, but not the assumed monomorphic loci (see Supplementary Table S2: all marker × taxon combinations coded A and B). Dataset 3, or the Splendentes-normalized-dataset, comprises ten loci (see Supplementary Table S2: SSR markers labelled with an asterisk) that were genotyped for 260 individuals representing 13 populations and eight taxa of section Talauma subsection Splendentes (Table 1: Class. = TAS). In addition to datasets 1, 2 and 3, two smaller supraspecific datasets were assembled, representing apparently closely related species: the two species of Puerto Rico (the PR-dataset) and the three species of the Dominican Republic (the DR-dataset). To search for patterns of extensive gene flow between Caribbean Magnolia population pairs within the defined species, the 17 populations were studied on the infraspecific (i.e. below species) level using nine species-datasets (i.e. the taxon-datasets of the two M. cubensis subspecies were joined) and 17 population-datasets.

A first batch of analyses was conducted in STRUCTURE v.2.3.4 (Pritchard et al. 2000) on datasets 1, 2 and 3, the PR- and DR-datasets, the nine species-datasets and the 17 population-datasets. STRUCTURE analyses were run with a burn-in of 100 000, 100 000 MCMC steps after the burn-in and the admixture model as ancestry model. Datasets 1, 2 and 3 were run with the allele frequency model set to independent allele frequencies. They were expected to consist of 13 (dataset 3) or 17 (dataset 1 and 2) populations and were run with K set from 1 to 25. The PR- and DR-datasets were run both with the independent allele frequency model and the correlated allele frequency model and their results were compared. They were expected to have between 2 and 6 populations and K was set from 1 to 15. The nine species-datasets and 17 population-datasets were run with the allele frequency model set to correlated allele frequencies. They were run with K set from 1 to 10. For all datasets, each value of K was run 10 times. The results were visualized with Structure Harvester Web v.0.6.94 (Earl and vonHoldt 2012). The best K-value was selected using the ΔK statistic (Evanno et al. 2005) and the results for mean maximum likelihood (Mean LnK). The latter was taken into consideration because the ΔK statistic appointed K-values with unstable replicate results for datasets 1, 2 and 3 and because the ΔK statistic cannot detect single clusters: an outcome expected at the infraspecific level (i.e. population-datasets and possibly the species-datasets). Barplots were visualised using DISTRUCT v.1.1 (Rosenberg 2004).
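The ΔK statistic of Evanno et al. (2005) divides the absolute second-order rate of change of Ln P(D) by the standard deviation of Ln P(D) across replicate runs at the same K. The Python sketch below computes it from mean likelihoods per K; the use of the sample standard deviation and the example likelihood values are assumptions of this illustration, not output from the study or from Structure Harvester.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Evanno's delta K.

    lnp_by_k: dict mapping K to a list of Ln P(D) values from replicate runs.
    Returns a dict K -> delta K for every K that has both neighbours K-1 and K+1.
    """
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in lnp_by_k and k + 1 in lnp_by_k:
            # Absolute second difference of the mean likelihood, |L''(K)|.
            second_diff = abs(mean(lnp_by_k[k + 1])
                              - 2 * mean(lnp_by_k[k])
                              + mean(lnp_by_k[k - 1]))
            sd = stdev(lnp_by_k[k])
            if sd > 0:  # delta K is undefined when replicates are identical
                result[k] = second_diff / sd
    return result


# Hypothetical Ln P(D) values for two replicate runs per K.
runs = {1: [-1000, -1002], 2: [-900, -902], 3: [-890, -892], 4: [-885, -887]}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(best_k, dk)
```

Because the second difference needs both neighbours, the smallest and largest K never receive a ΔK value, which is why the statistic cannot identify K = 1 and why the mean likelihood was also considered.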

DAPC analyses (Discriminant Analysis of Principal Components) on datasets 1, 2 and 3 were executed in R using the package adegenet (Jombart 2008). In the find.clusters function we retained 300 PCs for dataset 1 and 2, and 140 PCs for dataset 3. The number of PCs to retain for the PCA eigenvalues was determined using cross-validation. All discriminant functions (DA eigenvalues) were kept.

Pairwise F_ST values (Weir and Cockerham 1984) and their confidence intervals were calculated in R using the package diveRsity (Keenan et al. 2013). To visualize the genetic distances for dataset 1, 2 and 3, an unrooted network applying the Neighbour-joining (NJ) method based on Nei’s genetic distance: D_A (Nei et al. 1983), was constructed using Populations v.1.2.32 (http://bioinformatics.org/populations/) using 1000 bootstrap replicates as a confidence measure.
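For reference, Nei's D_A distance between two populations is one minus the average over loci of the summed square roots of the products of matching allele frequencies. The Python sketch below is a minimal illustration of that formula; the function name and allele frequencies are hypothetical, and it is not the implementation used in the Populations software.

```python
import math


def nei_da(freqs_x, freqs_y):
    """Nei's D_A distance between two populations.

    freqs_x, freqs_y: lists with one dict per locus, mapping allele -> frequency.
    Both lists must describe the same loci in the same order.
    """
    n_loci = len(freqs_x)
    total = 0.0
    for fx, fy in zip(freqs_x, freqs_y):
        # Only alleles present in both populations contribute a non-zero term.
        total += sum(math.sqrt(fx[a] * fy[a]) for a in fx.keys() & fy.keys())
    return 1.0 - total / n_loci


# Hypothetical allele frequencies at two loci for two populations.
pop_a = [{150: 0.5, 154: 0.5}, {200: 1.0}]
pop_b = [{150: 0.5, 154: 0.5}, {204: 1.0}]
print(nei_da(pop_a, pop_b))
```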

Mantel tests on the supraspecific level were performed in GenAlEx v.6.5 (Peakall and Smouse 2006; Peakall and Smouse 2012) on the pairwise log-transformed geographic distance and pairwise F_ST values using 9999 permutations. Coordinates of one individual were taken as a representative of its population. Species geographic distance was averaged over the populations of the species.
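The logic of a Mantel test can be sketched as follows: correlate the off-diagonal entries of two distance matrices, then repeatedly permute the population labels of one matrix to build a null distribution. The Python example below is a simplified one-tailed version with hypothetical data; it is not GenAlEx's implementation, and the geographic matrix would be log-transformed by the caller beforehand.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mantel(dist_a, dist_b, permutations=9999, seed=0):
    """One-tailed Mantel test; returns (observed r, permutation p-value)."""
    n = len(dist_a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    a = [dist_a[i][j] for i, j in pairs]
    observed = pearson(a, [dist_b[i][j] for i, j in pairs])
    rng = random.Random(seed)
    labels = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)  # permute rows and columns of dist_b together
        b = [dist_b[labels[i]][labels[j]] for i, j in pairs]
        if pearson(a, b) >= observed:
            hits += 1
    # The observed arrangement counts as one permutation, so p is never zero.
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical populations on a line; genetic distance is proportional to geographic distance.
positions = [0.0, 1.0, 2.0, 5.0]
geo = [[abs(p - q) for q in positions] for p in positions]
gen = [[2.0 * d for d in row] for row in geo]
r, p = mantel(geo, gen, permutations=199)
print(r, p)
```

With this convention the smallest attainable p-value for 9999 permutations is 0.0001, so a reported p = 0.000 would correspond to a value below the rounding precision.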

Inbreeding and population statistics

To test for inbreeding within the Caribbean Magnolia populations, the inbreeding coefficient (F_IS) for each locus and population was calculated in FSTAT. Tests to detect significant deviations from Hardy-Weinberg proportions (HWP) were calculated in GENEPOP, performing 2-tailed exact tests for each locus in each population. Complete enumeration was performed whenever possible (Louis and Dempster 1987), otherwise MCMC chains were run with 200 batches and 50 000 iterations (Guo and Thompson 1992). Deviations of both the uncorrected and sequential Bonferroni corrected p-values were used to evaluate if populations were truly deviating from HWP (Waples 2015). To frame and discuss the results, different statistical parameters were calculated for each locus and population within the ten taxon-datasets using GenAlEx, i.e. the percentage of polymorphic loci (P), the number of genotyped individuals (N), mean number of alleles (A), expected heterozygosity (H_e), and observed heterozygosity (H_o).
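The relationship between H_o, H_e and the inbreeding coefficient can be illustrated with the simple fixation index F = 1 − H_o/H_e for a single locus. The Python sketch below uses that textbook formula with hypothetical genotypes; FSTAT applies the Weir and Cockerham estimator, which additionally corrects for sample size, so values from this sketch would differ slightly from the reported ones.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (H_o, H_e) for one locus from a list of diploid genotypes (allele pairs)."""
    n = len(genotypes)
    h_o = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    # Expected heterozygosity under Hardy-Weinberg proportions: 1 - sum(p_i^2).
    h_e = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    return h_o, h_e


def fixation_index(genotypes):
    """F = 1 - H_o / H_e; None when the locus is monomorphic (H_e = 0)."""
    h_o, h_e = heterozygosity(genotypes)
    return 1.0 - h_o / h_e if h_e > 0 else None


# Hypothetical genotypes (allele sizes in bp) in Hardy-Weinberg proportions.
hw_sample = [(150, 150), (150, 154), (150, 154), (154, 154)]
print(heterozygosity(hw_sample), fixation_index(hw_sample))
```

Positive values indicate a heterozygote deficit (as expected under inbreeding), values near zero indicate Hardy-Weinberg proportions, and negative values indicate a heterozygote excess.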

Results

SSR markers

Overall, 82–92% of the primer pairs amplified, of which 53–67% were scored as giving a single amplification product (Supplementary Table S1). The polymorphism tests of the markers giving a single amplification product classified 16–37% of the primer pairs as unambiguous and polymorphic (Supplementary Table S2). The reported SSR primers all have heterozygote states in at least one individual and a perfect motif (Weber 1990). For 56 SSR markers, the duplicate runs rendered the same genotypes (Supplementary Table S3: error rate: 0%). For one SSR marker no genotypes were duplicated. The error rates of the other six SSR markers ranged from 1 to 3.85%.

Results of detection and frequency of null alleles per marker × population combination are given in Supplementary Table S4. Twelve marker × species combinations were considered to have a high probability of showing null alleles: M. cubensis (MA42_028), M. domingensis (MA39_199), M. ekmanii (MA39_023, MA42_087), M. hamorii (MA40_223, MA42_413), M. lacandonica (MA39_182), M. pallescens (MA39_023, MA42_472), M. portoricensis (MA42_481) and M. splendens (MA39_023, MA42_481).

Associated alleles per marker × species combination are given in Supplementary Table S4. Magnolia domingensis and M. lacandonica showed a number of SSR markers with associated alleles that were higher than expected for the number of pairwise tests executed. The other eight taxa fell within their confidence intervals of false positives, whereby one significantly associated pair of SSR markers was detected in M. pallescens (MA40_045 × MA42_472).

Genetic structure: supraspecific level

Supraspecific ΔK and Mean LnK plots are depicted in Supplementary Figure S5A–E and their interpretation is summarized in Table 3. Barplots of the STRUCTURE analyses on the three full supraspecific datasets are depicted in Fig. 2a–d. The DR-dataset and PR-dataset structured according to the species under both criteria (ΔK and Mean LnK) and both allele frequency models. In the DAPC analysis, the “true” K in the replicate runs of the find.clusters algorithm was not univocal, and ranged from 9 to 13 for dataset 1, 9 to 15 for dataset 2 and 8 to 11 for dataset 3. For each dataset, a representative DAPC analysis is visualised in Fig. 3. Supraspecific pairwise F_ST values range from 0.216 to 0.618 for dataset 1, 0.166 to 0.472 for dataset 2 and 0.130 to 0.308 for dataset 3 (see Table 4). Their confidence intervals are visualized in Supplementary Figure S6. The unrooted NJ trees based on D_A are depicted in Fig. 4. The Mantel tests for all three datasets including all population-pairs were significant (p = 0.000–0.003). Mantel tests on the supraspecific pairwise distances were significant for dataset 1 (p = 0.000), but not for dataset 2 (p = 0.080) and dataset 3 (p = 0.256). See Supplementary Figure S7 for visualisation of the relationship between geographic and genetic distance and Table 4 for the Pairwise Geographic Distance (PGD) between the population pairs.

Table 3 Number of STRUCTURE clusters of Magnolias from the Caribbean and Mexico

Table 4 Pairwise F_ST values and pairwise geographic distance (PGD in km) of Magnolias from the Caribbean and Mexico

Genetic structure: infraspecific level

Infraspecific ΔK and Mean LnK plots are depicted in Supplementary Figure S5F–V2 and their interpretation is summarized in Table 3. Barplots of the two infraspecific STRUCTURE analyses that yielded more clusters than predefined (GUA and TOR) are given in Fig. 2e and f, respectively. Infraspecific pairwise F_ST values can be found in Table 4 and range from 0.044 to 0.222 for the species-datasets and 0.035 to 0.226 when standardized cf. dataset 3. Confidence intervals of the infraspecific pairwise F_ST values are depicted in Supplementary Figure S6. Mantel tests at the infraspecific level were not significant (dataset 1 and dataset 2: p = 0.084, dataset 3: p = 0.080): see Supplementary Figure S7.

Inbreeding: infraspecific level

Detailed results on the population statistics calculated on the ten taxon-datasets are listed per marker, population and subset in Supplementary Table S4. Population statistics of the most representative subset are listed in Table 2. Three populations (GUA, MART and TOR) showed significant departure from HWP. GUA and MART presented significant deviation from HWP for 5/21 and 4/21 loci, respectively (1.45 [0, 3] expected to test false positive when p = 0.05). TOR showed significant deviation from HWP for 7/29 loci (1.45 [0, 3] expected to test false positive when p = 0.05).