Introduction

Most sexually reproducing fungal species pathogenic on forest trees are thought to possess substantial genetic variability. These fungi, unless recently introduced, have undergone an extended period of coevolution with their host species that themselves are generally genetically diverse. Most, if not all, forest tree species exist as populations growing in either an undomesticated or partly domesticated state. Even the few that are intensively managed are grown primarily under seminatural conditions and almost all have the ability to survive and grow in an array of environments. Their fungal pathogens then must also have the capacity to function and reproduce in a variable environment and must retain the capability to adapt to the changing environmental challenges. These requirements suggest that a reservoir of genetic variation that can be exploited for purposes of survival and population expansion is highly beneficial.

To begin to understand which evolutionary processes contribute to the genetic variability present in these organisms, it is necessary to determine how this variation is distributed within and among populations, and for widespread pathogens, whether regional variation patterns exist. MacDonald (1997) has pointed out that information about how crop pathogens evolve can be used to improve disease management in agricultural ecosystems. Even though the pathosystems involved differ, the same conclusion is likely to hold for efforts directed at controlling pathogen populations in forest ecosystems.

This paper reports results from an analysis of genetic structure for populations of Cronartium quercuum (Berk.) Miyabe ex Shirai f. sp fusiforme (Burdsall and Snow, 1977) infecting loblolly pine (Pinus taeda L.) over much of this host's natural range. C. quercuum fusiforme is a macrocyclic, heteroecious rust fungus that kills or severely damages loblolly and slash pines (P. elliottii var. elliottii Engelm.) growing in the southeastern United States. It typically forms fusiform-shaped swellings, known as galls, at points of infection on pine hosts. In this research, we identified microsatellite marker loci that segregate in populations of this pathogen and based our analysis of population structure on allele frequency variation observed for these loci. Previously, little was known about how genetic variability is distributed across the landscape that comprises the natural range of infection for this pathogen. A number of earlier investigations, however, uncovered evidence indicating that this forma specialis harbors considerable genetic variation. Variation in pathogenicity was shown to occur among isolates collected from different geographic regions (Snow and Kais, 1970; Snow et al, 1975; Powers et al, 1977; Walkinshaw and Bey, 1981; Powers, 1985; Kuhlman, 1990), among different galls within regions (Snow et al, 1975, 1976; Powers et al, 1977, 1978; Snow and Griggs, 1980; Kuhlman, 1992) and among isolates originating from different dikaryotic spores collected within single galls (Powers, 1980; Kuhlman and Matthews, 1993). In an exploratory examination of genetic variability in C. quercuum fusiforme occurring on loblolly pine, Hamelin et al (1994) obtained results with random amplified polymorphic DNA (RAPD) markers that suggest that populations west of the Mississippi River might be differentiated from those in eastern areas. Still, by far, most of the genetic diversity observed occurred within the regions studied.

In contrast to the situation for C. quercuum fusiforme, patterns of genetic variability among and within North American populations of C. ribicola JC Fischer, a pathogen that infects white pines (subgenus Haploxylon), have been intensely investigated. Unlike C. quercuum fusiforme, C. ribicola is an exotic pathogen in North America, having been separately introduced to both eastern and western portions of this continent early in the 20th century (Maloy, 1997). Investigations in eastern Canada and in western North America based on neutral genetic markers indicate that most genetic diversity in this fungus is found among individual infections within local populations, but that modest differentiation also occurs among populations within geographic regions (Hamelin et al, 1995, 1998; Kinloch et al, 1998; Et-touil et al, 1999). Furthermore, there is no evidence for genetic variability among population composites representing different regions within either eastern or western areas. Populations in each area are believed to be individual gene pools shaped by founder and migration events that collectively form distinct eastern and western metapopulations that are highly differentiated from each other (Hamelin et al, 2000).

Our goal for the research described herein is to acquire a more detailed and complete understanding of population structure for C. quercuum fusiforme occurring on loblolly pine that both supplements and goes beyond the initial work described in Hamelin et al (1994). In the following sections, we present an account of procedures used to identify and select microsatellite loci appropriate for the study of genetic differentiation among as well as within populations of this pathogen, report estimates of descriptive genetic diversity parameters associated with microsatellite loci segregating in this forma specialis, and disclose inter- and intraregional genetic variation patterns detected during our investigation. Finally, we compare our results to patterns of variability previously reported for genetic marker loci in C. quercuum fusiforme and to findings regarding the distribution of genetic variation for neutral markers in C. ribicola.

Materials and methods

Subpopulation sampling and DNA extraction

Samples of C. quercuum fusiforme were collected at 19 sites across the coastal plain of the southeastern United States (Figure 1). Collections were obtained from three locations in Texas, three in Louisiana, five in Mississippi, two in Alabama, two in Georgia, and one in Florida during October and November of 1997. Specimens consisting of spermatial exudate from three additional Georgia sites collected in October and November of 1992 were added to expand our sample of locations. Most galls sampled occurred on naturally regenerated loblolly pine less than 25 years old, although some occurred on trees in plantations. At the GPTMS site, samples were collected from juvenile slash pines. Galls showing signs of imminent sporulation were selected for sampling, and entire galls or gall sections containing spermogonia were collected. Stands were generally no larger than 10–20 acres. Each sample was assigned a unique ID, placed on ice, and brought back to the laboratory for DNA extraction and analysis. For galls and gall sections, the bark was peeled or flaked off and a portion of the distinct orange-colored hymenium was gently scraped and extracted. For the spermatial samples, spores were separated from the nectar, washed, and extracted. DNA was extracted from approximately 50 mg of tissue using the Nucleon PhytoPure DNA Extraction Kit (Amersham International plc, Buckinghamshire, England).

Figure 1 Map of the geographic origin of the 19 C. quercuum f. sp fusiforme collection sites.

Construction of enriched repeat library for C. quercuum fusiforme

Approximately 5.0 μg of DNA extracted from a single basidiospore mycelial culture (gametothallus) of C. quercuum fusiforme was sent to BC Research Inc. (Vancouver, British Columbia, Canada), and a dinucleotide repeat-enriched library ((AC/TG)n or (AG/TC)n) consisting of 170 positive clones was constructed and stored as glycerol stocks. Additional information regarding dinucleotide repeat library construction may be obtained from Dr Craig Newton (cnewton@bcresearch.com).

Insert size and sequence determination

To assess insert sizes and prepare sequencing template, 1.0 μl of each of the glycerol stocks was diluted 20-fold and 1.0 μl was PCR amplified using the M13 universal (−40) and M13 reverse (−24) primers (Operon Technologies, Inc., Alameda, CA, USA). Approximately 5.0 μl of each reaction was run out on 3.0% agarose gels to assess insert size, and the remaining reaction was precipitated by the addition of 2.0 μl linear polyacrylamide and three volumes of cold ethanol, washed with 70% ethanol and dried at room temperature. Resuspended samples were sequenced on an Applied Biosystems 373A automated DNA sequencer (Applied Biosystems, Inc., Foster City, CA, USA) using the ABI PRISM dye terminator cycle sequencing kit as described by the manufacturer. Only those clones/sequences harboring stretches of perfect repeat (n⩾12, where n is the number of dinucleotide repeat units) were selected for oligonucleotide primer design. Oligonucleotide primer sequences were designed using the Oligo® Primer Analysis Software version 5.0 (National Biosciences, Inc., Plymouth, MN, USA). In general, 21-mers were selected if possible, but primer lengths down to 18 were accepted if longer primers were not available. The low end of the Td range was lowered, and duplexes allowed (except in the last five bases on the 3′ end) if necessary to obtain compatible primers. Oligonucleotide primers were synthesized by Integrated DNA Technologies, Inc. (Coralville, IA, USA).

Microsatellite PCR amplification

Individual microsatellite loci were PCR amplified in 24 μl total volume: 3.125 ng of template DNA, 1 μl each of forward and reverse primers (5 μM stocks), 3.6 μl of dNTPs (1 mM stock), 2.4 μl 10× Taq DNA polymerase reaction buffer (500 mM KCl, 100 mM Tris-HCl, 1.0% Triton X-100, 15 mM MgCl2), and 1.0 U Taq DNA polymerase. Although all primer pairs amplified sufficient product for detection using 1.5 mM MgCl2 per reaction, 2.25 mM was found to result in more robust amplification for some DNA templates. Reactions were loaded in flexible microtiter plates and overlaid with 25 μl of mineral oil. Microtiter plates were placed in preheated (85°C) programmable temperature cyclers (MJ Research PTC-100) and covered with mylar film. The DNA samples were immediately subjected to amplification using the following ‘touchdown’ profile: 2 min at 95°C; followed by 10 cycles of 1 min at 92°C, 20 s at X, and 20 s at 72°C, where X=73°C in the first cycle and decreases by 2°C every cycle thereafter; followed by 25 cycles of 20 s at 92°C, 20 s at 55°C, 20 s at 72°C; followed by a 5 min extension at 72°C and an indefinite hold at 4°C.
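The arithmetic behind the reaction mix and the touchdown profile can be checked with a short sketch. The function names and default arguments below are illustrative assumptions, not part of the original protocol; the numbers are taken from the text. Whether the 1 mM dNTP stock refers to each nucleotide or to the total is not stated, so the dNTP figure is simply the dilution of that stock.

```python
def touchdown_annealing_temps(start_c=73.0, step_c=2.0, touchdown_cycles=10,
                              plateau_c=55.0, plateau_cycles=25):
    """Annealing temperature (deg C) for each cycle of the touchdown profile."""
    # Touchdown phase: the first cycle anneals at start_c, then drops by step_c per cycle.
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]
    # Plateau phase: constant annealing temperature.
    return touchdown + [plateau_c] * plateau_cycles


def final_concentration(stock_conc, stock_volume_ul, total_volume_ul=24.0):
    """Simple dilution: C_final = C_stock * V_stock / V_total."""
    return stock_conc * stock_volume_ul / total_volume_ul


temps = touchdown_annealing_temps()
print("touchdown phase:", temps[:10])
print("total cycles:", len(temps))
print("MgCl2 (mM):", final_concentration(15.0, 2.4))
print("each primer (uM):", final_concentration(5.0, 1.0))
print("dNTPs (mM):", final_concentration(1.0, 3.6))
```

The tenth touchdown cycle anneals at 55°C, the same temperature used in the 25 plateau cycles, and 2.4 μl of the 10× buffer in 24 μl gives the 1.5 mM MgCl2 mentioned in the text.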

Microsatellite selection and characterization

All primer pairs that amplified a single band of expected size from the single basidiospore mycelial culture of C. quercuum fusiforme from which the enriched library had been constructed, and did not amplify any bands from a loblolly pine control, were selected for further characterization. Promising primer pairs were then screened against a panel of pooled DNA samples of C. quercuum fusiforme to assess the polymorphic nature of each band. Three randomly selected galls from seven sites were used to construct site-specific bulks. Based on this screen, a total of 12 polymorphic microsatellites were chosen for this study. The inheritance pattern of the selected microsatellites was then examined in defined pedigrees. DNAs were extracted from gametothalli produced from single-urediniospore cultures as described in Doudrick et al (1993), and the bands detected were tested for Mendelian segregation and genetic linkage. For each microsatellite, the forward primer was 5′-end labeled with one of the three possible fluorescent dyes (either HEX, 6-FAM, or TET) to facilitate detection and analysis using the Applied Biosystems 373A automated DNA sequencer and the GENESCAN® version 1.1 fragment analysis software (Applied Biosystems, Inc., Foster City, CA, USA). Microsatellites were multiplexed by color and size allowing for the simultaneous analysis of up to six loci in a single gel run. Allele sizes were determined by including the GENESCAN®-500[TAMRA] internal size standard in each sample lane. All alleles were classified to the nearest odd or even base pair. For those cases where an allele was estimated to be exactly half way between two size classes, it was visually classified using other alleles of apparent similar size from the same electropherogram.

Data analysis

Chi-square (χ2) tests were used to assess goodness-of-fit of segregating polymorphisms to expected Mendelian ratios. Loci were tested for linkage using χ2 tests and results confirmed using the software package JoinMap version 2.0 (Stam and van Ooijen, 1995). A search for common/redundant haplotypes within as well as among populations was performed. Allele frequencies for each population were computed and estimates obtained for effective number of alleles per locus (ne), Nei's (1973) gene diversity (h), Nei's unbiased measure of genetic distance (D), and Michalakis and Excoffier's (1996) genetic differentiation measure (ΦST), using the software programs POPGENE version 1.31 (Yeh et al, 1997) and ARLEQUIN version 2.001 (Schneider et al, 2000). In addition, χ2 tests were conducted to test homogeneity of allele frequencies among populations. For analysis in ARLEQUIN, the difference in length between amplified products was assumed to be the direct consequence of changes in repeat numbers; therefore, the smallest allele at each locus was assigned a repeat length of one and was used as a reference to assign a repeat length to the remaining alleles. Associations between allele frequency and latitude and/or longitude were studied using stepwise regression analysis in the PROC REG procedure in SAS version 8.01 (SAS, 1999). The models considered included both linear and quadratic components. A variable was only added to the model if its F-statistic was significant at the 1% level. Once added, any variable that did not have an F-value significant at the 1% level was deleted from the model. Estimates for genetic distance (D) and among-population differentiation (ΦST) were determined for each pair of populations and associations with geographic distance were investigated using the PROC REG procedure in SAS.
Genetic associations among populations were first studied using unweighted pair-group mean analysis (UPGMA) based on estimates for Nei's genetic distance, and then by principal components analysis (PCA) conducted on the allele frequency data using the PROC PRINCOMP procedure in SAS. A hierarchical analysis of molecular variance (AMOVA) based on the suggested population groupings was then performed using ARLEQUIN.
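The repeat-length coding described above for the ARLEQUIN input can be sketched as follows. This is a minimal illustration, assuming a 2 bp motif (the library was enriched for dinucleotide repeats); the function name, the error handling, and the example fragment sizes are hypothetical and are not data from the study.

```python
def repeat_codes(allele_sizes_bp, motif_bp=2):
    """Map fragment sizes (bp) at one locus to repeat-number codes.

    The smallest allele is coded 1; every additional motif adds 1.
    """
    smallest = min(allele_sizes_bp)
    codes = {}
    for size in sorted(set(allele_sizes_bp)):
        offset = size - smallest
        # Assumption: alleles at a locus differ by whole motifs only.
        if offset % motif_bp != 0:
            raise ValueError(
                f"{size} bp is not a whole number of repeats from {smallest} bp"
            )
        codes[size] = 1 + offset // motif_bp
    return codes


# Hypothetical fragment sizes for one locus.
codes = repeat_codes([150, 154, 152, 160, 150])
print(codes)
```

Coding alleles this way lets size differences between alleles be treated as differences in repeat number, which is the assumption stated for the ARLEQUIN analysis.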

Results

Identification of microsatellites in C. quercuum fusiforme

A total of 170 C. quercuum fusiforme genomic DNA clones putatively positive for dinucleotide repeats were subjected to PCR amplification to assess insert size and prepare sequencing template. Direct cycle sequencing was performed on those clones that amplified only a single easily observable PCR band. Inserts ranged in size from approximately 225 base pairs up to 1300 base pairs, averaging 650 base pairs. The number of repeat units per clone ranged from as few as three to as many as 63, averaging 15.7. Sequence data from a total of 126 unique repeat-containing clones were available for primer design. For 21 of the clones, the repeat was too close to the cloning site of the vector to facilitate primer design. Using fairly stringent criteria, primer pairs were developed for 37 clones. Of the 37 primer pairs constructed, 20 were successful in amplifying a single band of expected size from the gametothallic isolate of C. quercuum fusiforme used to produce the enriched library, five failed to amplify any bands, and 12 amplified more than one band. None of the 37 primer pairs amplified any bands from the loblolly pine control, suggesting that the microsatellite primer pairs were specific to the C. quercuum fusiforme genome. A total of 12 microsatellite markers were selected for this study.

Testing for Mendelian segregation and genetic linkage

Gametothalli derived from two unrelated single-urediniospore cultures of C. quercuum fusiforme were used to examine the segregation ratios for the microsatellite markers. A total of 48 gametothalli were derived from one single-spore culture (Pedigree 1), and 16 from another (Pedigree 2). Of the 12 microsatellites assayed, seven were found to be segregating in at least one of the two pedigrees. Based on χ2 analyses, only one marker (Cqf151) was found to deviate significantly from its expected Mendelian inheritance ratio. This marker, however, was found to deviate in only one of the two pedigrees examined (P<0.01 in Pedigree 1 and P=1.0 in Pedigree 2). No spurious or otherwise unexpected alleles were found for any of the loci. Based on recombination rates observed in the two pedigrees, there was no significant evidence for genetic linkage between/among any of the seven segregating loci (P⩾0.38 for χ2 analyses, LOD⩽0.58 for maximum-likelihood-based estimation).
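A goodness-of-fit test of this kind can be sketched as below. The sketch assumes that the expected Mendelian ratio for two alleles segregating among haploid gametothalli is 1:1, which the text does not state explicitly, and the allele counts are hypothetical values chosen only to match the pedigree sizes (48 and 16); they are not the observed counts. For one degree of freedom, the χ2 tail probability equals erfc(sqrt(χ2/2)).

```python
import math


def chi_square_1to1(count_a, count_b):
    """Chi-square goodness-of-fit test of two allele counts against a 1:1 ratio."""
    total = count_a + count_b
    expected = total / 2.0
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # Upper-tail probability of a chi-square variate with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts for a pedigree of 48 gametothalli (distorted segregation).
chi2_a, p_a = chi_square_1to1(34, 14)
# Hypothetical counts for a pedigree of 16 gametothalli (perfect 1:1 split).
chi2_b, p_b = chi_square_1to1(8, 8)
print(chi2_a, p_a)
print(chi2_b, p_b)
```

A perfectly even split yields χ2=0 and P=1.0, which is the only way a test of this form can return the P=1.0 reported for Pedigree 2.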

Genetic differentiation in C. quercuum fusiforme

Data describing the microsatellite loci used in our analyses are presented in Table 1. Number of alleles observed for the various loci varied from six to 81 with most loci having numbers in the range 20–35. Sufficient variation was displayed by the 12 microsatellite loci to identify uniquely almost every gall sample analyzed. Only two samples, both from the NWHMS population, were found to have the same alleles at all loci scored. Allele frequencies for alleles greater than 10% frequency over all populations, plus those found to be significantly associated with longitude, are presented by population in Table 2. After correcting for the presence of low-frequency alleles, all single-locus contingency χ2 analyses for heterogeneity of allele frequencies across populations were found to be highly significant (P<0.00001).

Table 1 Microsatellite primer sequence, repeat type, allele size range, and number of alleles identified in 19 populations of C. quercuum f. sp fusiforme located along the south Atlantic and Gulf Coast of the United States
Table 2 Microsatellite allele frequencies in 19 populations of C. quercuum f. sp fusiforme collected along the south Atlantic and Gulf Coasts of the United States

Differentiation statistics computed over all populations are shown in Table 3. Values obtained for effective numbers of alleles per locus ranged from 2.8 to 31.3. Genetic diversity varied among loci from 0.64 to a high of 0.97, and among population differentiation (ΦST) differed from a low of 0.019 to a maximum of 0.233 with a mean of 0.116. Using stepwise regression analysis, frequencies of 35 alleles at 10 loci were found to be significantly associated with latitude and/or longitude (P<0.01). All 35 alleles had frequencies associated with longitude (see the bold-italic markers in Table 2), but two also showed associations with latitude. The R2 or proportion of the variation explained by the models ranged from a low of 0.33 to a high of 0.80. In addition, at several loci, associations (P<0.01) were detected between longitude and number of alleles per population (Cqf084 and Cqf165) and between longitude and gene diversity (Cqf065, Cqf084, and Cqf165). For both Cqf065 and Cqf165, estimates of parameters were higher in eastern populations and lower in the west. The opposite trend was observed for locus Cqf084.
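For reference, the two single-locus diversity statistics reported above follow directly from allele frequencies: the effective number of alleles is ne = 1/Σp² and Nei's (1973) gene diversity is h = 1 − Σp². The sketch below uses these standard definitions; the function names and the example frequencies are illustrative assumptions, and no sample-size correction is applied.

```python
def sum_squared_freqs(freqs):
    """Expected homozygosity: sum of squared allele frequencies at one locus."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return sum(p * p for p in freqs)


def effective_alleles(freqs):
    """Effective number of alleles, ne = 1 / sum(p_i^2)."""
    return 1.0 / sum_squared_freqs(freqs)


def gene_diversity(freqs):
    """Nei's (1973) gene diversity, h = 1 - sum(p_i^2)."""
    return 1.0 - sum_squared_freqs(freqs)


# Hypothetical allele frequencies at one locus.
freqs = [0.5, 0.3, 0.2]
print(effective_alleles(freqs), gene_diversity(freqs))
```

Because both statistics depend only on Σp², they are linked by h = 1 − 1/ne. Applying this to the extremes reported above, ne = 2.8 and ne = 31.3 correspond to h of about 0.64 and 0.97, which agrees with the reported range of gene diversity.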

Table 3 Summary of genetic diversity descriptive statistics for 12 microsatellite loci assayed on 19 populations of C. quercuum f. sp fusiforme located along the south Atlantic and Gulf Coasts of the United States

Unbiased estimates of genetic distance (D) between pairwise comparisons of populations computed across all 12 loci ranged from a low of 0.081 to a high of 0.94, averaging 0.42. These estimates were shown to be significantly associated with the geographic distance between populations (P<0.0001 for the F-test and R2=0.42). Similarly, estimates of genetic differentiation (ΦST) between paired populations ranged from a low of −0.27 to a high of 0.52, averaging 0.06. These estimates were also found to be significantly associated with geographic distance separating populations (P=0.0002 for the F-test and R2=0.08). Thus, it is clear that geographically separated populations are more highly genetically differentiated from each other than geographically proximate populations.
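As an illustration of how a pairwise genetic distance is obtained from allele frequencies, the sketch below computes Nei's standard genetic distance, D = −ln(I), where I is the normalized identity summed over loci. Note that this is the uncorrected form; the estimates reported here used Nei's unbiased measure, which adds a sample-size correction not shown. The data structure (one dictionary of allele frequencies per locus) and the example frequencies are assumptions made for illustration.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance between two populations.

    pop_x, pop_y: lists with one dict per locus mapping allele -> frequency,
    with loci in the same order in both lists.
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        jx += sum(p * p for p in fx.values())
        jy += sum(p * p for p in fy.values())
        # Alleles absent from one population contribute zero to the cross term.
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
    identity = jxy / math.sqrt(jx * jy)
    # No shared alleles at any locus: identity is 0 and the distance is unbounded.
    return math.inf if identity == 0.0 else -math.log(identity)


# Hypothetical single-locus frequencies for two populations.
pop_a = [{"150": 0.5, "152": 0.5}]
pop_b = [{"150": 0.5, "154": 0.5}]
d_ab = nei_standard_distance(pop_a, pop_b)
print(d_ab)
```

Pairwise values computed this way for all population pairs can then be regressed on the geographic distance between the corresponding sites, as was done for the estimates above.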

Hierarchical structure among populations was studied using UPGMA on genetic distance and by PCA based on allele frequencies. Results from both methods suggested at least four distinct groups: an eastern Gulf and south Atlantic Coast group (AUGGA, ATHGA, BRNGA, BLTFL, SAVGA, and VALGA), a mid-eastern Gulf Coast group (BRWAL, CLNAL, NWHMS, and RSLMS), a mid-western Gulf Coast group (GPTMS, HLDLA, HZLMS, and SAUMS), and a western Gulf Coast group (ALXLA, CLHLA, JSPTX, LNGTX, and ORGTX) (Figure 2). The first two principal components accounted for 47.9% of the total variation and divided the populations into the same four basic groups as UPGMA. It is notable that the sample collected from slash pine, GPTMS, does not separate out as a distinct unit but is found clustered with its geographic neighbors. This suggests that C. quercuum fusiforme infecting slash pine may not be genetically distinct from C. quercuum fusiforme infecting loblolly pine.

Figure 2 UPGMA dendrogram based on Nei's (1978) genetic distance.

The four groups were chosen to serve as upper level categories in a hierarchical AMOVA. Differentiation measures estimated from this AMOVA are shown in Table 4. Considering all loci, the majority of the microsatellite variation occurs ‘within-populations’, but the ‘among-groups’ component generally is larger than the ‘among-populations within-groups’ component. Genetic diversity descriptive statistics estimated for each of the four regions separately are presented in Table 5. Genetic identities and distances among the four groups are presented in Table 6. In summary, the populations sampled appear to cluster into four distinct groups largely based on longitude.

Table 4 Results of AMOVA for C. quercuum f. sp fusiforme located in four regions along the south Atlantic and Gulf Coasts of the United States based on the hierarchical structure suggested by UPGMA and PCA
Table 5 Summary of genetic diversity descriptive statistics for 12 microsatellite loci assayed on C. quercuum f. sp fusiforme located in four regions along the south Atlantic and Gulf Coasts of the United States
Table 6 Summary of genetic identity and genetic distance statistics among four regional metapopulations of C. quercuum f. sp fusiforme located along the south Atlantic and Gulf Coasts of the United States

Discussion

Previously, it was not known whether perennial fusiform rust galls found in nature are primarily genetically homogeneous, that is, formed as a result of infection and colonization by the mycelium originating from a single basidiospore, or whether they are typically heterogeneous, that is, formed as a result of multiple infections or contain regions of dikaryotic cells (aecidia) resulting from past fertilization events. Since genetic heterogeneity would have a direct bearing on how to analyze and interpret our data, we first looked for the presence of more than one unique haplotype within individual galls. In a total of 33 DNA samples extracted from spermogonia of 13 individual galls, all samples from the same gall were observed to have identical multilocus haplotypes. Furthermore, only four DNA samples out of the 691 collected for this study (approximately 0.6%) contained more than one allele per microsatellite locus. These results suggest that the majority of fusiform rust galls formed under field conditions are produced either as a direct result of infection and colonization by the haploid mycelium originating from a single basidiospore of C. quercuum fusiforme, or, if multiple infections do occur, that only a single haplotype ultimately dominates and is responsible for gall formation.

Our results clearly demonstrate that high levels of microsatellite variability exist in C. quercuum fusiforme, and that most of this variation occurs within local populations (average 88.4%). These findings are consistent with previous observations on both pathogenic and molecular variability in C. quercuum fusiforme. In a comparison of pathogenicity among inocula collected along both a Florida–Georgia and a Mississippi transect, Snow et al (1975) found variation associated with inocula from locations within transects to be more pronounced than the variation between transects. Powers et al (1977) observed that variation in pathogenicity among isolates collected from individual galls within several southeastern states is influenced by a host family × state interaction and affected by a host family × collection within state interaction. Hamelin et al (1994) using RAPD markers reported that 94% of the variation in C. quercuum fusiforme occurred within regional assemblages of infections. However, due to the extremely small sample sizes employed in that study, samples from different populations within predefined geographic regions had to be pooled to obtain reliable allele frequency estimates. Therefore, differentiation among local populations within regions could not be distinguished from variation among individual galls within populations. Similar observations of large within-population variability have been reported for eastern Canadian populations of white pine blister rust fungus, C. ribicola (Hamelin, 1996; Hamelin et al, 1998; Et-touil et al, 1999). As a general conclusion, C. quercuum fusiforme exists as a highly variable forma specialis, with a large proportion of its genetic variability occurring within populations.

Although most of the microsatellite variation found in C. quercuum fusiforme occurs within local populations, a statistically significant proportion is found among populations, and the magnitude of this differentiation is closely associated with the geographic distance between populations. Magnitudes of the ΦST estimates obtained in this study are consistent with a previous estimate reported by Hamelin et al (1994) for an analogous measure of regional genetic differentiation (GST) in C. quercuum fusiforme. Whereas GST values for only three of the 12 RAPD loci studied by Hamelin et al (1994) were observed to be statistically different from random expectations (P<0.05), ΦST values obtained for 11 of the 12 microsatellite loci studied in this investigation were statistically significant (P<0.001). Moreover, single-locus estimates of pairwise population genetic differentiation measures were found to be significantly associated with distance between populations, demonstrating that geographically proximate populations have greater genetic similarity than geographically distant populations. These findings lead us to conclude that although long-distance spore migration is possible, it is infrequent enough so that genetic differentiation can take place.

In addition to significant levels of among-population differentiation, UPGMA and PCA both indicate that regional differentiation has occurred in C. quercuum fusiforme. These analyses identify at least four genetically distinct regional groups of C. quercuum fusiforme present in the south Atlantic and Gulf coastal states: an eastern group, a mid-eastern group, a mid-western group, and a western group. Estimates of ΦCT significantly greater than random expectations were found between all pairwise combinations of these groups. A somewhat weaker and incomplete pattern of differentiation was suggested, though not conclusively, by the results of Hamelin et al (1994), who reported evidence for regional differentiation in an east–west transect for only three of 12 RAPD loci. Although those investigators found significant differences between western populations on one hand and eastern and central populations on the other hand, GST values obtained between eastern and central populations were not statistically significant. In their exploratory investigation, populations in Louisiana were not sampled and populations in Mississippi were only marginally represented. With our more intensive sample, we demonstrate that Louisiana populations west of the Mississippi River join with those from Texas to make up a western metapopulation and that, contrary to their results, populations east of the Mississippi River are differentiated into at least three additional groups. In addition, in their exploratory investigation, population samples within predefined regional boundaries had to be pooled to obtain meaningful allele frequency estimates, and hence they could not discriminate between differentiation among individual galls within populations and differentiation among local populations.

In summary, our findings demonstrate that at least four metapopulations of C. quercuum fusiforme exist along the south Atlantic and Gulf coastal plain of the United States. Gene flow between these regions appears to be less than that occurring among populations within regions. As a general trend, genetic distance and genetic differentiation are closely associated with the geographic distance between populations. This pattern of genetic differentiation is different from that reported for C. ribicola in eastern Canada. Et-touil et al (1999) found variation among provinces to be several orders of magnitude smaller than differentiation among populations within regions. Furthermore, within the eastern and western metapopulations, associations between genetic and geographic distance have not been observed in C. ribicola. Although genetic variability in C. ribicola follows a pattern consistent with a hypothesis of subpopulations within metapopulations where genetic drift or founder effects play major evolutionary roles (Kinloch et al, 1998; Et-touil et al, 1999), in C. quercuum fusiforme the distribution of genetic variability is consistent with a hypothesis of at least four metapopulations with gene flow occurring less among regions than among populations within regions, and where overall levels of gene migration are related to the geographic distance between populations.