Introduction

Microsatellite analysis detects interpopulational admixtures from population differentiation and cladistic relationships among alleles. Pines, indigenous only to the northern hemisphere, have been introduced into southern hemisphere countries to provide forest products. These introduced populations provided the opportunity to trace North American ancestors using microsatellite analysis. Microsatellite markers are ideal for determining the origin of 20th century introductions of Pinus taeda in Zimbabwe.

There have been several documented 20th century introductions of P. taeda into Zimbabwe prior to 1960 (Mullin et al., 1978; pp. 189–193). The first introduction in Zimbabwe was a collection of two individuals growing near a homestead of the Rhodes Inyanga Estate. The second introduction in 1926 came from seed harvested from South African exotic plantations of unknown provenance. The third and largest introduction arrived in 1927–31 from both the U.S. via South Africa and from collections made in South African plantations. Results from local provenance trials suggest the Georgia Piedmont as the putative source of the third introduction (Mullin et al., 1978; pp. 189–193). From 1957 to 1962, Queensland, Australia supplied three P. taeda selections with a putative origin of south-eastern Georgia or north-eastern Florida.

Today, Zimbabwe is self-sufficient in producing P. taeda seed for afforestation, but determining the origin of Zimbabwe P. taeda introductions is critical for long-term management of the exotic population. The original provenance(s) of these selections is either the Georgia Piedmont (Mullin et al., 1978) or more likely an admixture of ancestral populations east and west of the Mississippi River Valley. Genetic diversity levels appear comparable to U.S. populations (Williams et al., 1995). The objective of this study is to determine the origin of a highly selected P. taeda population in Zimbabwe by comparison with P. taeda populations sampled throughout the species’ natural range in the southern United States.

Materials and methods

Study organism and sample collections

Pinus taeda L. (Pinaceae) is wind-pollinated, outcrossing and monoecious. The species has a nearly continuous range throughout the southern U.S. Population divergence follows the east–west division of the Mississippi River (Florence & Zink, 1979); in the Atlantic coast populations about 95% of the genetic variation is found within a population rather than between populations (Williams et al., 1995). The eastern range of the species extends from southern Delaware (39°21′N) through Maryland, Virginia, North Carolina, South Carolina and Georgia into central Florida, then westwards through Alabama, southern Tennessee and eastern Mississippi. The Mississippi River Valley forms a natural gap of 50–200 km, delineating the western range of Louisiana, south-western Arkansas, the south-eastern tip of Oklahoma, eastern Texas and the remote Lost Pines population in Bastrop Co., Texas, which marks the westernmost limit of the species’ range.

To test for admixtures, data must be available either from two putative sources or from parental populations which represent the original populations that produced the hybrid populations. A set of 41 U.S. samples was collected from grafted archives (Table 1). The archives represent indigenous populations present in the U.S. from 1950 to 1960 prior to domestication. The selected Zimbabwe population was composed of 12 pedigreed Zimbabwe individuals bred from eight unrelated parents of unknown origin. The Zimbabwe population represents an elite domesticated population which is three generations removed from indigenous U.S. populations.

Table 1 Description of 41 samples of Pinus taeda representing U.S. populations from 1950 to 1960

PCR amplification for microsatellite markers

The DNA extraction was based on Doyle & Doyle (1987). The 18 low-copy and genomic microsatellite markers were published in Elsik et al. (in press): PtTX2008, PtTX2037, PtTX2123, PtTX2128, PtTX2142, PtTX2146, PtTX2164, PtTX3011, PtTX3013, PtTX3017, PtTX3019, PtTX3020, PtTX3025, PtTX3026, PtTX3030, PtTX3032, PtTX3034, PtTX3037. GenBank accession numbers for sequences were AF143958 to AF143975.

Each 25 μL PCR reaction consisted of 50 ng of template DNA, 50 mM Tris (pH 9.0), 20 mM ammonium sulphate, 100 μg/μL gelatin, 1% DMSO, 100 μM of each dNTP, 200 nM of each forward and reverse primer, 0.625 units Taq polymerase (Life Technologies), 1–2 mM MgCl2 (optimized for each primer set). PCR products were visualized on silver-stained 6% polyacrylamide denaturing gels using modifications from Echt et al. (1996). The molecular weight marker pGem (Promega) was loaded every eight lanes for allele sizing.

Estimating genetic diversity

Allele frequencies were determined by direct count. Diagnostic alleles for eastern and western parts of the P. taeda range were determined by pooling allele frequencies in these two regions exclusive of Mississippi–Alabama. The Mississippi–Alabama population was intermediate between the eastern and western regions and thus omitted in the identification of region-specific diagnostic alleles. Only alleles present in Zimbabwe were treated as diagnostic in either the eastern or western parts of the range. Alleles which were present in one part of the range yet absent in Zimbabwe were excluded because their absence was confounded with population sampling error. Summary analyses were completed using GENEPOP (Raymond & Rousset, 1995) and BIOSYS-2, a version of BIOSYS-1 (Swofford & Selander, 1989) modified for microsatellite analyses by Dr William C. Black IV, Colorado State University (pers. comm.).
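The direct-count and diagnostic-allele rules described above can be sketched as follows. This is a minimal illustration, not the GENEPOP or BIOSYS-2 procedure: the function names, the genotype layout and the allele labels are all hypothetical, and a diagnostic allele is assumed to mean one that is present in Zimbabwe and in exactly one of the two pooled U.S. regions.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Direct-count allele frequencies from diploid genotypes at one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def diagnostic_alleles(east, west, zimbabwe):
    """Alleles present in Zimbabwe and in exactly one pooled U.S. region."""
    east_set = set(allele_frequencies(east))
    west_set = set(allele_frequencies(west))
    zim_set = set(allele_frequencies(zimbabwe))
    return {
        "east": sorted(zim_set & (east_set - west_set)),
        "west": sorted(zim_set & (west_set - east_set)),
    }


# Hypothetical genotypes at a single locus (allele labels are made up).
east = [(8, 10), (10, 12), (8, 12)]
west = [(12, 14), (14, 16), (12, 16)]
zimbabwe = [(8, 14), (12, 12)]

freqs = allele_frequencies(zimbabwe)
diag = diagnostic_alleles(east, west, zimbabwe)
print(freqs)
print(diag)
```

In this toy example allele 12 is shared by both regions and so carries no regional information, whereas alleles 10 and 16 are region-specific but absent in Zimbabwe and are therefore excluded, mirroring the exclusion rule in the text.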

Deviations from Hardy–Weinberg equilibrium (HWE) were based on exact tests (χ2) (Rousset & Raymond, 1995). In addition to the exact test, a Markov-chain algorithm was used for loci with five or more alleles to obtain an unbiased estimate of the exact probability of incorrectly rejecting HWE (Rousset & Raymond, 1995). In all cases, the Markov chain was set to 1000 steps of dememorization and 1000 iterations per batch. Heterozygote deficiency was tested as the alternative hypothesis to HWE. Heterozygote excess, common in the presence of symmetric overdominance or in the case where male and female gametes have different allele frequencies, was separately tested as an alternative hypothesis to HWE. The experimentwise error rates (α′) for 108 chi-squared tests (Weir, 1990; p. 109) were 0.996 and 0.102 at the 5% and 0.1% levels, respectively.
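The experimentwise error rates quoted above can be reproduced with a short calculation. The sketch assumes the standard formula for independent tests, α′ = 1 − (1 − α)^k with k = 108; the function name is hypothetical.

```python
def experimentwise_error(alpha, n_tests):
    """Probability of at least one false rejection among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


rate_5pct = experimentwise_error(0.05, 108)
rate_01pct = experimentwise_error(0.001, 108)
print(round(rate_5pct, 3), round(rate_01pct, 3))  # prints 0.996 0.102
```

With 108 tests, a nominal 5% threshold makes at least one false rejection nearly certain, whereas the 0.1% threshold keeps the experimentwise rate near 10%.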

The probability of deviation from HWE for pooled loci across populations was determined using Fisher’s formula (Raymond & Rousset, 1995):

$$\chi^2 = -2 \sum_{j=1}^{r} \ln P_j$$

This statistic follows a χ2 distribution with 2 × r degrees of freedom, where r is the number of loci pooled across populations and Pj is the probability value for each population.
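Fisher’s method for combining probability values can be sketched as below. The statistic is assumed to be −2 Σ ln Pj, referred to a χ2 distribution with 2r degrees of freedom; because the degrees of freedom are always even, the upper-tail probability has a closed form and no statistics library is needed. The function name and the P-values are hypothetical.

```python
import math


def fisher_combined(p_values):
    """Fisher's combined probability test.

    Returns (statistic, degrees_of_freedom, combined_p).
    Every P-value must be greater than zero, since log(0) is undefined.
    """
    r = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom 2r, the chi-squared upper tail is
    # P(X > x) = exp(-x/2) * sum_{k=0}^{r-1} (x/2)^k / k!
    half = statistic / 2.0
    combined_p = math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(r)
    )
    return statistic, 2 * r, combined_p


# Hypothetical P-values for one locus tested in six populations.
stat, df, p = fisher_combined([0.04, 0.20, 0.01, 0.50, 0.03, 0.10])
print(round(stat, 2), df, p)
```

A useful sanity check is that a single P-value is returned unchanged, because the χ2 upper tail with two degrees of freedom is exp(−x/2).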

Principal components analysis

Hybrid populations are not discrete evolutionary lineages and thus phylogenetic trees with admixed populations can violate the fundamental principles of phylogeny reconstruction (Felsenstein, 1982). Principal components analysis should be used in lieu of phylogenetic trees in the case of admixture between two ancestral populations (Cavalli-Sforza et al., 1994; pp. 39–42). Principal components analysis was conducted using the SAS statistical and graphics software package (SAS, 1991).

Estimating genetic distances and cladistic relationships among populations

Genetic distance estimates for microsatellites are based on underlying mutational processes. At least two known mutational processes are responsible for microsatellite variation: the infinite alleles model (IAM) and the stepwise mutation model (SMM). Unequal crossing-over during meiosis generates new alleles, satisfying the IAM assumptions for novel alleles not previously found in the population (Kimura & Crow, 1964). For SMM, DNA polymerase slippage shifts the number of repeats up or down in a stepwise manner (Levinson & Guttman, 1987). Under SMM, an allele with i repeats mutates to an allele either in state i + 1 or i − 1 with equal probability (Ohta & Kimura, 1973). The coexistence of two mutational processes operating on different loci or even different alleles within a locus raises the question of an appropriate measure of genetic distance (Estoup et al., 1995). The best choice is a third class of genetic distance estimates that is not based on a mutational model. Cavalli-Sforza and Edwards’ chord distance, Dc, depends on geometric distances rather than on a given mutational process (Cavalli-Sforza & Edwards, 1967). A computer simulation study of various genetic distance measures found Dc to be the most suitable for microsatellite analysis, even in the presence of a bottleneck effect (Takezaki & Nei, 1996).
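The contrast between the two mutational models can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: the function names are hypothetical, the SMM step ignores any lower bound on repeat number, and the IAM step simply labels each new allele with an unused integer.

```python
import random


def smm_mutate(repeats, rng):
    """Stepwise mutation: gain or lose one repeat with equal probability."""
    return repeats + rng.choice((-1, 1))


def iam_mutate(existing_alleles):
    """Infinite alleles: each mutation yields a state not already present."""
    return max(existing_alleles) + 1  # hypothetical label for a novel allele


rng = random.Random(1)

# Follow one allele lineage through ten SMM mutation events.
lineage = [20]
for _ in range(10):
    lineage.append(smm_mutate(lineage[-1], rng))

novel = iam_mutate({18, 20, 22})
print(lineage, novel)
```

Under SMM the lineage can revisit a repeat number it held earlier, which is the source of allelic size homoplasy; under IAM every mutation is by definition distinguishable from all existing alleles.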

If the two populations are represented on the surface of a multidimensional hypersphere using allele frequencies at the jth locus, Dc gives the chord distance between two populations. A generalized form of the chord distance (Goldstein & Pollock, 1997) is given as follows:

$$D_c = c \sqrt{1 - \sum_i x_i^{a}\, y_i^{b}}$$

where $x_i$ and $y_i$ are the frequencies of alleles of size i in populations x and y, respectively, a and b are constants set at 0.5, and the constant c equals 2√2/π. For multiple loci, the average distance is taken across loci. The cladistic relationship among U.S. (ancestral) populations was based on the chord distances and the neighbour-joining method (Saitou & Nei, 1987). Neighbour-joining analysis was conducted using MEGA (Kumar et al., 1993).
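A chord distance calculation along these lines can be sketched as follows. The sketch assumes that, with a = b = 0.5 and c = 2√2/π, the generalized form reduces to the usual Cavalli-Sforza and Edwards expression Dc = c √(1 − Σ √(x_i y_i)); the function names, locus labels and frequencies are hypothetical.

```python
import math

C = 2.0 * math.sqrt(2.0) / math.pi  # constant c from the text


def chord_distance_locus(x, y, a=0.5, b=0.5):
    """Chord distance at one locus from two allele-frequency dicts."""
    alleles = set(x) | set(y)
    overlap = sum(x.get(i, 0.0) ** a * y.get(i, 0.0) ** b for i in alleles)
    # Guard against tiny negative values caused by floating-point rounding.
    return C * math.sqrt(max(0.0, 1.0 - overlap))


def chord_distance(pop_x, pop_y):
    """Average the per-locus chord distance across shared loci."""
    loci = pop_x.keys() & pop_y.keys()
    return sum(chord_distance_locus(pop_x[k], pop_y[k]) for k in loci) / len(loci)


# Hypothetical allele frequencies at two loci in two populations.
pop_x = {"locus_A": {8: 0.5, 10: 0.5}, "locus_B": {3: 1.0}}
pop_y = {"locus_A": {8: 0.5, 10: 0.5}, "locus_B": {5: 1.0}}

d_same = chord_distance_locus(pop_x["locus_A"], pop_y["locus_A"])
d_disjoint = chord_distance_locus(pop_x["locus_B"], pop_y["locus_B"])
d_mean = chord_distance(pop_x, pop_y)
print(d_same, d_disjoint, d_mean)
```

The distance is zero when two populations have identical frequencies at a locus and reaches its maximum, c ≈ 0.90, when they share no alleles, so the multilocus average always lies between those bounds.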

Results

Genetic diversity levels were high within all populations except the Lost Pines from Bastrop Co., Texas. The greatest population differentiation for P. taeda occurred between east and west of the Mississippi River Valley. High levels of microsatellite polymorphism and population differentiation in the ancestral U.S. populations made it possible to distinguish between an admixture vs. the putative Georgia Piedmont origin for the Zimbabwe population. The Zimbabwe population was an admixture of south-eastern and western P. taeda populations. Five alleles in the Zimbabwe population were not detected in the samples of U.S. ancestral populations.

Genetic diversity estimation

All populations including Zimbabwe had similar levels of genetic diversity if the sample size bias was removed (Table 2). The total allele number was 192 for the full sample (Table 2). The SC–FL region added the most alleles (126) but it was also the population with the largest sample (14/41). The other populations had the following total alleles: Zimbabwe (94), Western Gulf (88), North-east (83), MissAla (67) and Bastrop (63). However, mean number of alleles (MNA) is a good indicator of genetic variation only if the populations are the same size and at mutation–drift equilibrium.

Table 2 Summary statistics for 18 low-copy microsatellite loci assayed for Zimbabwe samples and five U.S. Pinus taeda populations

To check for sample size bias, the Pearson product–moment correlation between MNA and the size of each population was estimated. The strongly positive correlation (r=0.93) indicated that MNA was higher in the larger populations and that allele numbers were biased by sample size. A uniform-size estimate of MNA was therefore obtained by randomly deleting entries from the larger populations until sample size was uniform across all populations (Table 2); the total allele number was then 172. After adjusting for sample size, the allelic contributions were similar in all populations, although the Lost Pines in Bastrop Co., Texas had slightly less genetic diversity.
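The sample size check and the uniform-size adjustment can be sketched as below. This is an illustration only: the function names are hypothetical, the genotypes and the sizes passed to the correlation are made-up numbers rather than values from Table 2, and a single random subsample is drawn where a real analysis might average over many.

```python
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def alleles_at_uniform_size(genotypes, size, rng):
    """Count distinct alleles after randomly subsampling to `size` individuals."""
    subsample = rng.sample(genotypes, size)
    return len({allele for genotype in subsample for allele in genotype})


rng = random.Random(0)

# Hypothetical single-locus genotypes for a large and a small population.
large = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (1, 6)]
small = [(1, 2), (2, 3), (1, 3)]

full_count = len({a for g in large for a in g})
uniform_count = alleles_at_uniform_size(large, len(small), rng)

# Hypothetical sample sizes and allele totals, used only to show the call.
r = pearson_r([3, 6, 14], [60, 90, 130])
print(full_count, uniform_count, round(r, 2))
```

Subsampling can only lower or preserve the allele count of the larger population, which is why the adjusted total is smaller than the unadjusted one.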

This is the first estimate of genetic diversity for Bastrop Co., Texas. This population was sampled within a single county, rather than the single sample per county or even regional sampling applied to other populations. The lower genetic diversity level was also expected because Bastrop Co., Texas has partial reproductive isolation. This population is connected to the larger, continuous distribution of P. taeda in east Texas by a string of small isolated stands between Bastrop and Houston, Texas.

The HWE test for heterozygote deficiency was significant at the 0.1% level in eight out of 108 tests (Table 3). Significance at the 5% level was reported only as suggestive because its high experimentwise error rate across 108 tests raises the probability of incorrectly rejecting HWE. Pooled loci across populations showed that seven loci out of 18 had statistically significant heterozygote deficiency and thus deviated substantially from HWE (Table 3). There was no heterozygote excess detected at any locus.

Table 3 Probability values (□ for P ≥ 0.05; for P ≥ 0.001) for exact test of Hardy–Weinberg equilibrium vs. heterozygote deficiency for Pinus taeda microsatellites. Markers developed from a set of total genomic libraries have 1 as second digit

Divergence in ancestral populations

The greatest population differentiation in ancestral P. taeda populations was east and west of the Mississippi River Valley. A neighbour-joining phylogeny was constructed for the two sets of ancestral U.S. populations to check for population differentiation prior to testing for admixture. It indicated that eastern populations were more similar to one another than to western populations (Fig. 1). The eastern and western regions were clearly delineated, as reported by Florence & Zink (1979). This differentiation between eastern and western P. taeda regions, combined with highly polymorphic microsatellites, made it possible to distinguish between U.S. regional origins and an interpopulational admixture in the selected Zimbabwe population.

Fig. 1 Neighbour-joining tree based on Cavalli-Sforza & Edwards (1967) chord distances among three eastern and two western Pinus taeda populations indigenous to the southern U.S.

Testing for interpopulational admixture

The principal components analysis supported the Zimbabwe population as an admixture. This was corroborated by the diagnostic alleles which appear to be specific to eastern or western regions of the P. taeda range in the southern U.S. The eastern region included the Eastern Shore, Virginia, North Carolina (north-east) and South Carolina, Georgia and Florida. The western region was represented by Louisiana, Oklahoma, Arkansas and Texas (Western Gulf) including the westernmost extreme of the P. taeda range, Bastrop Co., Texas.

To a lesser extent, chord distances showed the Zimbabwe selected population to be an admixture of populations ancestral to the east and west of the Mississippi River Valley. A finer analysis of sources within the South Carolina–Florida population was conducted to find the central origins. The analysis showed Georgia Piedmont and north-eastern Florida to have the two highest chord distances with Zimbabwe (Dc=0.578, 0.556). In the western region, the chord distance was higher for Bastrop Co., Texas (Dc=0.543). The South Carolina coastal region, the north-eastern range, and Western Gulf regions had slightly lower chord distances (Dc=0.521, 0.475 and 0.433, respectively).

Principal components analysis showed the Zimbabwe population to be divergent from either U.S. region despite a marked affinity to the eastern region (Fig. 2). The first, second and third principal components accounted for 73.1%, 9.2% and 8.3% of the total variance, respectively, so that all three components together accounted for 90.6%. Divergence of the Zimbabwe selected population from other populations was more pronounced with the addition of the third component (Fig. 2).

Fig. 2 Graphical depiction of principal components analysis for Zimbabwe and the U.S. ancestral populations of Pinus taeda. Factors 1, 2 and 3 account for 90.6% of the total variance associated with allele distribution and frequencies.

Allelic frequency distribution also confirmed the Zimbabwe population as an admixture. Seven diagnostic alleles present in Zimbabwe showed affinity for either the western or the eastern region (Table 4): each occurred in the Zimbabwe population and in only one of the two U.S. regions. Of the seven, five occurred only in the eastern region and two occurred only in the western region. The region-specific alleles could not be traced back to one particular parent, suggesting that admixture occurred in preceding generations. For example, the western alleles occurred in cross 124 × 13 at locus PtTX3037. The same cross 124 × 13 also had eastern alleles in three cases: PtTX2128 allele 8, PtTX2164 allele 10 and PtTX3011 allele 10.

Table 4 Allele number and frequencies for Zimbabwe which are unique to eastern vs. western parts of the natural range of Pinus taeda. Bold letters indicate frequencies of diagnostic alleles for the western part of the P. taeda range

Diagnostic alleles for the Zimbabwe population

The Zimbabwe population had five alleles absent in the U.S. samples. The unique alleles were PtTX2164 allele 3, PtTX3030 allele 17, PtTX3034 alleles 9 and 11, and PtTX3037 allele 26. These five alleles, spread across four loci, either (i) arose from de novo mutations, (ii) were overlooked in the U.S. sampling or (iii) resulted from introgression with another introduced North American pine.

Three of these four loci (PtTX3030, PtTX3034, PtTX3037) had dinucleotide repeat motifs (Elsik et al., in press). Four of the Zimbabwe alleles were at the smaller end of the size range of the respective locus; all alleles except PtTX3037 allele 26 were separated from the adjacent larger allele by at least two to five nucleotides, suggesting a tendency towards contraction in repeat length or deletions. The distribution of these private alleles was also consistent with a rare mutation, a low-frequency allele from P. taeda or mating with a close relative. Allele 17 in PtTX3030 occurred twice but the same parent contributed the allele in both cases. Allele 3 in PtTX2164 occurred four times in the selected Zimbabwe population but one parent contributed the allele in three of the four cases. PtTX3034 alleles 9 and 11 and PtTX3037 allele 26 each occurred once.

Discussion

This is the first microsatellite analysis of population structure for P. taeda. Detecting interpopulational admixture is possible given population differentiation between eastern and western regions of the P. taeda range (Florence & Zink, 1979) and highly polymorphic microsatellite markers (Elsik et al., in press). The selected Zimbabwe population is an admixture of eastern and western P. taeda populations.

High genetic diversity within all populations

All P. taeda populations except Bastrop Co., Texas had similar levels of genetic diversity. Zimbabwe’s selected population had genetic diversity levels comparable to U.S. populations, as previously reported (Williams et al., 1995). The Lost Pines were heterozygous but had lower genetic diversity levels. The lower genetic diversity resulted from the narrow area for sampling and partial reproductive isolation. The high genetic diversity levels within U.S. populations and relatively low levels of population differentiation fit with P. taeda’s life history attributes: historically large populations, outcrossing, annual production of large quantities of wind-borne pollen and seeds, and aggressive colonization. These attributes, together with the extensive plantings throughout southern Africa, multiple introductions of U.S. germplasm (Mullin et al., 1978; pp. 189–193) and possibly introgression or de novo mutations in southern Africa, may explain why Zimbabwe P. taeda introductions have retained high genetic diversity levels.

Deviation from Hardy–Weinberg equilibrium was caused by heterozygote deficiency. No loci exhibited deviation resulting from heterozygote excess. The most common cause for heterozygote deficiency in microsatellite marker studies is the presence of null alleles. Null alleles occur when there is a mutation within the DNA sequence complementary to one or both primers, preventing PCR amplification of the microsatellite sequence (Callen et al., 1993). Null alleles for some of these loci have been detected in intergenerational P. taeda pedigrees (Elsik et al., in press) but are undetected in this population survey, leading to downward bias for heterozygosity.

Other causes of heterozygote deficiency include selection or hitchhiking on a linked locus under selection, deviation from panmixis, or population admixture. Of the three, selection seemed to be a likely cause for HWE deviation at locus PtTX3037. Unlike the other microsatellite loci, PtTX3037 exhibited strong heterozygote deficiency at the 5% or 0.1% levels for all populations including Zimbabwe (Table 3). Of the 18 loci, 13 were selected from low-copy gene-rich regions in the pine genome (Elsik et al., in press) and thus may be under strong directional selection or closely linked to protein-coding regions which are under selection. Deviation from panmixis or the presence of admixture seemed the least likely cause of heterozygote deficiency: the selected Zimbabwe population exhibited both admixture and a deviation from panmixis, yet it had fewer HWE deviations than the U.S. populations (Table 3).

Pinus taeda in Zimbabwe: a divergent admixture

The strongest evidence for admixture was based on the principal components analysis (PCA). Chord distances differed only slightly and the diagnostic alleles should be subject to closer scrutiny with larger sampling from U.S. populations. PCA results also indicated that Zimbabwe has diverged from its ancestral populations (Fig. 2). Divergence may be accelerated by the same mutational processes which generated the five private alleles found only in the Zimbabwe population.

The unique alleles could have arisen through de novo mutations or from introgression with another introduced North American pine species. Both explanations are supported by a novel allele at the Pg1 isozyme locus found in a selected Zimbabwe population; this allele was absent in over 500 U.S. samples throughout the P. taeda range (Williams et al., 1995; J. L. Hamrick, pers. comm.). A less likely explanation is that the unique alleles are very low-frequency and thus undetected in these U.S. samples. If so, the frequency of these alleles would have been increased in the Zimbabwe population through drift and selective breeding.

Three of these four loci (PtTX3030, PtTX3034, PtTX3037) have dinucleotide repeat motifs (Elsik et al., in press). Dinucleotide repeats tend to have higher mutation rates in model organisms (Chakraborty et al., 1997) and this supports the idea of de novo mutations in Zimbabwean P. taeda introductions. Additional de novo mutations for Zimbabwe may be present yet obscured by allelic size homoplasy where two different repeat changes result in the same allele size (Grimaldi & Crouau-Roy, 1997) or by null alleles (Callen et al., 1993).

The fact that Zimbabwe alleles were at the smaller end of the size range of the respective locus militates against de novo mutation. Four of the five Zimbabwe alleles were at the smaller end of the size range of the respective locus; all alleles except PtTX3037 allele 26 were separated from the adjacent larger allele by at least two to five nucleotides. These private alleles would therefore have arisen from deletions or from contractions in the repeat motif rather than expansions. This is counter to the observed trend in microsatellite analyses in mammals: random mutations in the repeat motif are expected to be both expansions and contractions and longer alleles expand more than short alleles (e.g. Valdes et al., 1993).

Introgression with Pinus elliottii, another introduced species closely related to P. taeda, is the more likely source of novel alleles for Zimbabwe populations. Closely related pines hybridize freely and North American pines have an unusual reproductive phenology in southern Africa which favours introgression. In southern Africa, strobilus receptivity of P. elliottii and P. taeda overlaps completely, in contrast to the species’ native range in the southern U.S. (Mullin et al., 1978). Seed collections from plantations in southern Africa would therefore be more likely than U.S. collections to contain a high proportion of inter-specific hybrids. Natural hybrid zones frequently have higher levels of novel alleles (Hoffman & Brown, 1995). The introgression hypothesis is further supported by the high degree of interspecific repeat motif conservation in many low-copy microsatellite markers (C. G. Williams, unpubl. data).