Introduction

Understanding the relationship between genotype and phenotype is essential for the improvement of complex traits in economically important plant species. Improvements in genomic technology and knowledge gained from research in model organisms are creating opportunities for large-scale genomic research in commercially important species. Association genetics studies in humans have demonstrated the potential to discover DNA sequence variants that are correlated with disease phenotypes (Smith and Newton-Cheh, 2009). Both candidate gene and genome-wide approaches can discover variants that either are causal or are linked to the causal variant for disease and quantitative traits (Hirschhorn and Daly, 2005). Association studies in herbaceous plants have been successful in identifying polymorphisms related to phenotypic variation in adaptive traits in Arabidopsis (Chan et al., 2010), as well as economically important traits in maize (Buckler et al., 2009), sugar beet (Stich et al., 2008b) and wheat (Jing et al., 2007).

Forest trees have recently been used in several association studies (González-Martínez et al., 2007, 2008; Ingvarsson et al., 2008; Eckert et al., 2009). Conifers are well-suited for use in association genetics studies because of their large random-mating populations, nucleotide diversity, rapid decay of linkage disequilibrium and haploid tissue obtainable from seeds (Neale and Savolainen, 2004). Economic and adaptive traits have been explored in conifers including Douglas-fir (Pseudotsuga menziesii) and loblolly pine, and in angiosperms such as Populus (Neale and Ingvarsson, 2008). Results of association studies for wood properties and carbon isotope discrimination have been published in loblolly pine revealing potential associations using a candidate gene approach (González-Martínez et al., 2007, 2008).

The limitation of available water is a critical factor affecting plant growth and survival. The regulation and control of water use by plants is a complex system, and is difficult to measure in a high-throughput assay suitable for a population-scale study. Carbon isotope discrimination between atmosphere and plant organic material (Δ) has been used in a wide range of plant species to assess water use efficiency. Carbon isotopic composition of plant organic material has been established as an indirect but integrated measure of the ratio of photosynthetic activity to stomatal conductance (Farquhar et al., 1989). The response by plants to limitations in water has implications for both an understanding of adaptive variation in natural populations, as well as economic importance for increasing production or yield in economically important plant species. Carbon isotope discrimination has also been used in crop species in an attempt to improve water use efficiency (Condon et al., 2004; Rebetzke et al., 2006; Anyia et al., 2007).

Genetic variation in carbon isotopes (Δ or δ) has been reported for several species of forest trees. Foliar carbon isotope discrimination was under moderate genetic control in Picea mariana and was strongly correlated with growth, making it a candidate for indirect selection to improve growth (Johnsen et al., 1999). Studies in Araucaria cunninghamii (Prasolova et al., 2000), Pinus elliottii × Pinus caribaea hybrids (Prasolova et al., 2005) and Pinus pinaster (Brendel et al., 2002) report variable levels of inheritance for carbon isotope discrimination in foliage and wood samples, with individual tree heritability (h²) ranging from 0.07 to 0.72. Carbon isotope discrimination in loblolly pine seems to be under a lower level of genetic control than in Picea, Araucaria and Pinus pinaster. Recent work by Baltunis et al. (2008) reported a low individual tree heritability (h²=0.09) from a clonally replicated, controlled mating design, field-grown experiment in loblolly pine. However, the population-wide genetic variation in carbon isotope discrimination has not been well documented in loblolly pine. Carbon isotope discrimination has also been the target trait in genetic marker-based studies in other forest trees. Brendel et al. (2002) reported four QTL explaining 25% of the phenotypic variance for carbon isotope discrimination in Pinus pinaster. Previous association testing in loblolly pine, in which candidate sequences were selected based on putative functions in drought response, found SNPs from four candidate genes (dhn-1, sod-chl, wrky-like and lp5-like) potentially associated with carbon isotope discrimination (González-Martínez et al., 2008). However, to date, a large number of potential loci have not been tested in a large population for water use efficiency in loblolly pine.

Previous analyses of carbon isotope discrimination in loblolly pine suggested that isotope ratios may be related to either water use efficiency or photosynthetic capacity (González-Martínez et al., 2007; Baltunis et al., 2008); thus, traits related to growth should be measured to distinguish between SNPs influencing carbon isotope discrimination and those influencing growth traits. Prasolova et al. (2000) identified relationships in Araucaria cunninghamii between δ13C and tree growth and nitrogen concentration in the crown. The photosynthetic process in leaves uses a large proportion of foliar nitrogen (Evans, 1989), and foliar nitrogen concentration has been related to photosynthesis in loblolly pine (Springer et al., 2005). Examining foliar nitrogen content and tree growth together with Δ13C is therefore important for understanding whether the isotopic variation relates to growth.

The objectives of this study were to estimate the variation in carbon isotope discrimination, as well as height and foliar nitrogen concentration in a population of unrelated P. taeda trees; and to use an association genetics approach to test 3938 SNPs for associations with the previously mentioned traits, estimate effects of associated SNPs, and compare their allele frequencies.

Materials and methods

Plant material

A total of 425 unrelated genotypes of loblolly pine were selected from the North Carolina State University Cooperative Tree Improvement Program and the Western Gulf Forest Tree Improvement Program at the Texas Forest Service (henceforth referred to as the NCSU+WG population). Trees were grown from seed from trees selected from natural stands during the first generation of improvement to represent the natural range of loblolly pine (Figure 1). A small number of plantation selection seedlots with known seed sources were used to cover geographical areas from which natural stand selections were not available. All seeds were open-pollinated, except for a few control-pollinated seedlots. Control-pollinated seedlots came from 2nd- or 3rd-cycle mating, but care was taken to ensure no relatedness to other entries in the population. Trees were grown from seed for 1 year and then hedged for stem cutting production using established methods for loblolly pine (Lebude et al., 2004). In the spring of 2006, rooted cuttings of each genotype were planted in a raised nursery bed filled with a loamy sand from the coastal plain of North Carolina (85% sand, 12.2% silt and 2.8% clay; Gocke, 2006). The bed was 1.5 × 40 m with a soil depth of 30 cm. A randomized complete block design was used, with two replicates planted per clone at a 22-cm spacing.

Figure 1

Counties of origin for genotypes selected for the NCSU association population. Shaded counties represent at least one genotype.

The trees were grown for two growing seasons and foliage was collected for carbon isotope discrimination after the end of the second growing season in December 2007. Annual rainfall during 2007 was 69% of 30-year normal rainfall amounts (64 cm compared with the normal of 92 cm), so foliage produced during that growing season developed under drought conditions (http://www.nc-climate.ncsu.edu). The entire second flush of foliage from 2007 was harvested from each tree. Harvested tissue was dried at 65 °C for 48 h and isotope analysis was performed at the Cornell stable isotope facility (COIL; http://www.cobsil.com). Samples were pulverized using a SPEX CertiPrep freezer/mill and analyzed using a Thermo Delta V Advantage Isotope Ratio Mass Spectrometer (Thermo Fisher Scientific, Waltham, MA, USA). The relative abundance of 13C to 12C was determined using 3 mg samples of pine needle tissue. The carbon isotope ratio δ13C was reported against a standard of Pee Dee Belemnite (Craig, 1954) and then converted to Δ, the carbon isotope discrimination value, using

$$\Delta = \frac{\delta_a - \delta_p}{1 + \delta_p/1000}$$

where $\delta_a$ is the atmospheric isotope composition (assumed to be −8‰) and $\delta_p$ is the leaf tissue isotope composition. Foliar nitrogen was estimated by mass spectrometry (as a mass percentage) at COIL using the same foliage samples used for isotopic analysis. Total tree height (cm) was measured after the second growing season from the soil level to the top of the terminal bud.
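The conversion from δ13C to Δ can be sketched in a few lines. This is a minimal illustration that assumes the commonly used form Δ = (δa − δp)/(1 + δp/1000) with all values in per mil; the function name and the example sample value are hypothetical and are not taken from the study data.

```python
def carbon_isotope_discrimination(delta_plant, delta_air=-8.0):
    """Convert a leaf delta13C value (per mil) to discrimination (per mil).

    Assumes Delta = (delta_air - delta_plant) / (1 + delta_plant / 1000),
    with delta_air defaulting to the -8 per mil stated in the text.
    """
    return (delta_air - delta_plant) / (1.0 + delta_plant / 1000.0)


# Hypothetical needle sample with delta13C = -29.5 per mil.
example_delta = carbon_isotope_discrimination(-29.5)
print(round(example_delta, 2))
```

More negative δ13C values in the tissue give larger Δ, so the two measures rank trees in opposite order.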

Analysis of phenotypic data

A two-stage approach was used for association testing. Phenotypic data were analyzed using a mixed model with the standard form:

$$y = Xb + Za + e$$

where y is the vector of response variables (observations); X is the design matrix relating individual observations to the fixed effects in the model and b is the vector of fixed effects that includes the overall mean, site and replication within site effects; Z is the incidence matrix relating observations to random effects and a is the vector of random effects that includes clone effects and interactions with blocks; and e is the vector of random residual terms. The expectation of the observations is E(y) = Xb. The random terms are assumed to have zero means and variances var(a) = G and var(e) = R. The variance-covariance matrix of observations (vector y) is Var(y) = $ZGZ^{T} + R$, where G and R are covariance matrices corresponding to a and e, respectively. The G matrix accounts for the genetic effects, while R accounts for random residual effects. G is a block diagonal matrix defined as $I\sigma_C^2$ for the variance among clones and $I\sigma_{CB}^2$ for the variance of the clone by block interaction, where I is an identity matrix of dimension $n_i \times n_i$ ($n_i$ = number of levels of the ith term). When errors are independent, $R = I\sigma_e^2$ is a diagonal matrix with the residual error variances ($\sigma_e^2$) on the diagonal and zero covariances off the diagonal; here I is the identity matrix of dimension equal to the number of observations. In this experiment a spatial residual structure was implemented, which divides e into spatially dependent (ξ) and spatially independent (η) residuals (Dutkowski et al., 2002). For spatially dependent residuals a covariance structure was specified using a first-order autoregressive process in rows and columns, combined through a Kronecker product (⊗):

$$R = \sigma_\xi^2 \left[ \mathrm{AR1}(\rho_{col}) \otimes \mathrm{AR1}(\rho_{row}) \right] + \sigma_\eta^2 I$$

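where $\sigma_\xi^2$ is the spatial residual variance, $\sigma_\eta^2$ is the independent residual variance, I is an identity matrix of dimension equal to the number of observations, and the error term is included in the independent residual variance. AR1(ρ) is a first-order autoregressive correlation matrix with the form:

$$\mathrm{AR1}(\rho) = \begin{bmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{bmatrix}$$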

Broad-sense clone mean heritability was estimated using the formula:

$$H_c^2 = \frac{\sigma_C^2}{\sigma_C^2 + \sigma_{CB}^2/b + \sigma_E^2/(bt)}$$

where $\sigma_C^2$ is the variance among clones, $\sigma_{CB}^2$ is the variance due to the clone by block interaction, $\sigma_E^2$ is the residual variance after spatial residuals are removed, b is the number of blocks (2) and t is the number of trees per plot (1). Standard errors of the heritability estimates were calculated using the Taylor series expansion (Gilmour et al., 2006). The best linear unbiased prediction (BLUP) values for clonal genotypes were used as phenotypes in the association analysis. All statistical analyses were performed using the ASReml 2.0 statistical software package (VSN International Ltd., Hertfordshire, UK) (Gilmour et al., 2006).
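The spatial residual structure and the heritability ratio described above can be illustrated with a small pure-Python sketch. It assumes a separable AR1 × AR1 structure plus an independent term for the residual covariance, and a clone-mean heritability of the form clone variance divided by the variance of a clone mean. The function names and all numeric values are hypothetical; the study itself fitted these models in ASReml.

```python
def ar1_matrix(rho, n):
    """Return the n x n AR1 correlation matrix with entries rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def kronecker(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def spatial_residual_covariance(var_xi, var_eta, rho_col, rho_row, n_col, n_row):
    """R = var_xi * (AR1(rho_col) kron AR1(rho_row)) + var_eta * I."""
    corr = kronecker(ar1_matrix(rho_col, n_col), ar1_matrix(rho_row, n_row))
    size = n_col * n_row
    return [
        [var_xi * corr[i][j] + (var_eta if i == j else 0.0) for j in range(size)]
        for i in range(size)
    ]


def clone_mean_heritability(var_clone, var_clone_block, var_resid, blocks=2, trees=1):
    """Clone variance divided by the variance of a clone mean."""
    clone_mean_var = var_clone + var_clone_block / blocks + var_resid / (blocks * trees)
    return var_clone / clone_mean_var


# Hypothetical 2 x 2 grid of plots and hypothetical variance components.
R = spatial_residual_covariance(0.04, 0.06, 0.5, 0.5, n_col=2, n_row=2)
H2 = clone_mean_heritability(var_clone=0.10, var_clone_block=0.02, var_resid=0.18)
print(R[0], H2)
```

In the toy grid, neighbouring plots share a residual covariance of 0.02 and diagonal neighbours 0.01, which shows how the correlation decays with distance in both directions.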

Genotypic data

Genotypes for single nucleotide polymorphisms (SNPs) were obtained using the Illumina Infinium assay (Illumina, San Diego, CA, USA). The discovery and genotyping of the SNPs has been previously described (Eckert et al., 2009). A description of the development of these SNPs is available online at http://dendrome.ucdavis.edu/adept2/. Briefly, SNPs were detected and genotyped for 7508 resequenced amplicons generated from all available unique EST contigs representing all pine ESTs known to date using an Infinium genotyping chip. From the resequenced amplicons, roughly 22,000 SNPs were discovered and 7216 were selected for genotyping. SNPs were chosen based on quality of the SNP call (for example, PolyPhred), coverage across unique amplicons and spacing within those amplicons (SNPs were chosen to be far apart). This resulted in most amplicons being represented by one SNP. SNP genotypes were selected for association analysis using the BeadStudio ver. 3.1.3.0 software (Illumina), based on quality, reliability of genotype calls, and polymorphism which produced 3938 SNP loci out of the 7216 that were informative in this population. Predicted gene function was not a criterion for the choice of SNPs for inclusion among the SNPs assayed, nor for inclusion of the resulting genotypic data in the association analysis. Genotypic data from 3938 SNP loci were available for 380 of the 425 clones measured in this experiment.

Association analyses

Association testing was performed in TASSEL using both a general linear model (GLM) and a mixed model (MLM) approach (Bradbury et al., 2007). Population structure covariates were estimated from a set of 23 nuclear microsatellite markers using STRUCTURE with a cluster number of five (Eckert et al., 2010). Marker-based kinship was estimated using a function in the EMMA (efficient mixed model analysis) package (Kang et al., 2008) in the R programming environment (R Development Core Team, 2010). We used the positive false discovery rate approach to adjust P-values for multiple testing (Storey, 2003), using the qvalue package in R with a false discovery rate of 0.05.
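The multiple-testing step can be sketched as follows. This is a simplified version of the Storey q-value idea that estimates the proportion of true nulls (pi0) at a single fixed tuning value, lambda = 0.5; it is not the qvalue package's own implementation, which offers other ways of estimating pi0. The function name and the example P-values are hypothetical.

```python
def storey_qvalues(pvalues, lam=0.5):
    """Simplified Storey q-values with a fixed-lambda pi0 estimate."""
    m = len(pvalues)
    # pi0: estimated proportion of true null hypotheses, capped at 1.
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))
    if pi0 <= 0.0:
        raise ValueError("pi0 estimate is zero; try a different lambda")
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so q-values are monotone in rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[idx] / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical P-values from eight SNP tests.
example_p = [0.0001, 0.004, 0.03, 0.2, 0.45, 0.6, 0.75, 0.9]
example_q = storey_qvalues(example_p)
n_significant = sum(q <= 0.05 for q in example_q)
print(example_q, n_significant)
```

Tests with q-values at or below 0.05 would be declared significant at the false discovery rate used in the text.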

We chose to compare the use of population structure and marker-based kinship to determine whether these factors enhanced analysis of the NCSU association population. Yu et al. (2006) demonstrated the value of accounting for population structure and relatedness through the incorporation of genomic control and marker-based kinship in mixed model association testing. For each candidate model, we compared observed P-values from association testing against a uniform distribution of expected P-values using the mean of the squared differences (Stich et al., 2008a). We compared four models to examine the distribution of P-values from association tests: a GLM with no structure or kinship effects (GLM), a GLM with covariates to account for population structure (Q), a mixed model with a marker-based kinship component (K) and a mixed model that incorporated both population structure and marker-based kinship estimates (QK).
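A minimal sketch of the model-comparison statistic follows. It assumes the mean squared difference is computed between the sorted observed P-values and the uniform expectations i/n; the exact formulation used in the study may differ. The function name and example values are hypothetical.

```python
def mean_squared_difference(pvalues):
    """Mean squared difference between sorted P-values and uniform expectations i/n."""
    n = len(pvalues)
    observed = sorted(pvalues)
    return sum((p - (i + 1) / n) ** 2 for i, p in enumerate(observed)) / n


# P-values that follow the uniform expectation exactly, and a skewed set
# of the kind produced when a model yields too many small P-values.
msd_uniform = mean_squared_difference([0.25, 0.5, 0.75, 1.0])
msd_skewed = mean_squared_difference([0.01, 0.02, 0.03, 0.04])
print(msd_uniform, msd_skewed)
```

Under this definition, a model whose P-values track the uniform distribution more closely gives a smaller value.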

SNP effects and genotypic frequencies

For each associated SNP, we estimated the additive (a) and dominance (d) effects using ASReml 2.0. SNPs were treated as fixed effects to test for significance and to generate best linear unbiased estimates for genotype classes of each SNP. Additionally, variance estimates of significant SNPs with three genotypic classes were generated by adding a SNP effect into the mixed model analyzing phenotypic data, where the SNP effect was treated as a random effect with two degrees of freedom. To further evaluate the effect of potential associations with phenotypes, we compared the genotypic frequencies of the tails of the population phenotypic distribution with the frequencies observed in the entire population. The tails of each phenotypic distribution were defined as values more than 1.5 s.d. above or below the population mean for each trait. Genotypic frequencies were calculated and tests for Hardy-Weinberg equilibrium were performed with the Allele procedure in SAS (SAS, 1989).
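The additive and dominance effects can be illustrated from the three genotype class means. This sketch assumes the usual quantitative-genetic parameterization (a is half the difference between homozygote means; d is the heterozygote deviation from the homozygote midpoint); the study obtained its estimates from ASReml, and the function name and class means below are hypothetical.

```python
def additive_dominance_effects(mean_aa, mean_ab, mean_bb):
    """Return (a, d) from genotype class means.

    a = half the difference between the two homozygote means
    d = heterozygote mean minus the homozygote midpoint
    """
    midpoint = (mean_aa + mean_bb) / 2.0
    additive = (mean_bb - mean_aa) / 2.0
    dominance = mean_ab - midpoint
    return additive, dominance


# Hypothetical class means for a trait measured on the Delta13C scale.
a_effect, d_effect = additive_dominance_effects(21.8, 22.3, 22.2)
dominance_ratio = d_effect / a_effect
print(a_effect, d_effect, dominance_ratio)
```

A ratio d/a near zero indicates additive gene action, values near ±1 indicate complete dominance, and absolute values above 1 indicate overdominance.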

To obtain annotations for SNPs, flanking sequences of the corresponding EST contig were obtained from the Dendrome database (http://dendrome.ucdavis.edu/treegenes) for all SNPs associated with a trait after multiple testing correction. We performed a BLASTx query against the NCBI non-redundant protein database (http://blast.ncbi.nlm.nih.gov/Blast.cgi). Gene and site annotations for the strongest hits (lowest e-value) for each sequence are reported.

Results

Phenotypic variation

Carbon isotope discrimination, height and %N displayed significant variation among clones. Use of the spatial model allowed additional environmental variation to be removed, which enhanced heritability estimates and removed environmental bias from the clonal BLUP values. Figure 2 displays a variogram of the residuals from the spatial model indicating spatial trends in the nursery bed that would not have been accounted for without the spatial model in the two-replicate design. Broad-sense clone mean heritability estimates (Hc2) explained half of the variation in Δ13C (Table 1), whereas 40% of the variation in second-year height and %N were explained by variation among clones. Bivariate analyses (Table 2) revealed that Δ13C was negatively genetically correlated with height (rg(xy)=−0.37) and %N (rg(xy)=−0.42), whereas height and %N were positively correlated (rg(xy)=0.32). Phenotypic correlations taking into account both genetic and environmental effects were weak with the exception of Δ13C and %N (r=−0.55). Clonal BLUPs for carbon isotope discrimination ranged from 20.7 to 23.4 with a population mean of 22.11. Height ranged from 116 to 240 cm with a mean of 194 cm and %N ranged from 1.15 to 1.52 with a mean of 1.40. BLUPs were used as the phenotype in subsequent association testing.

Figure 2

Variogram plot of residuals for Δ13C. The three dimensional plot reflects the similarity of trees because of environmental trends in a row and column spatial configuration.

Table 1 Population phenotypic mean (Pop) and standard error (s.e. Pop), population minimum (Min), population maximum (Max), clonal variance (σc2), residual variance (σɛ2), and broad-sense clone mean heritability (Hc2) with its standard error (Hc2 s.e.) for δ13C, height and %nitrogen

Table 2 Phenotypic (below-diagonal) and clonal (above-diagonal) correlations (and standard errors) among Δ13C, height and %N

Annotations for associated SNPs

Association analyses revealed SNPs located in regions of sequences of previously described function and also in unknown sequences (Table 3). BLASTx queries using the flanking sequences around the associated SNPs found similar proteins in the GenBank database for five out of 14 SNPs. For Δ13C-associated SNPs, putative orthologs were a mitochondrial protein (0_8304_02_414), a heme-activated DNA-binding protein (CL599Contig1_07_109) and an AP2 domain transcription factor (0_3648_01_357). For %N-associated SNPs, these analyses revealed a receptor protein kinase-like protein (2_7865_01_156) and glutamate decarboxylase (0_17195_01_417) (Table 3).

Table 3 SNP loci annotations and significance values for second-year Ht, carbon isotope discrimination (Δ13C) and foliar nitrogen level (%N)

SNP-trait associations

Significant associations between BLUP values and one or more SNP loci were found for all traits. Before multiple testing correction, the number of associations significant at the P<0.05 level ranged from 144 (K) to 193 (GLM) for Δ13C, 133 (K and QK) to 172 (GLM) for height, and 135 (QK) to 176 (GLM) for %N, across the four different models tested. After correction for multiple testing, we found four to seven SNPs associated with Δ13C, one SNP associated with height, and five to six SNPs associated with %N, depending on the model used for testing. For each trait, the model with the highest number of significant associations following multiple testing correction was reported: QK for Δ13C, GLM for height, and GLM for %N. The four models used in this analysis identified similar numbers of SNPs as associated with all traits at the four levels of P-value thresholds (see Supplementary Information).

The associations in this study were largely caused by SNPs segregating rare alleles (MAF 2–35%) with small effects (r2: 4–9%) on Δ13C, height and %N (Table 4). None of the significantly associated SNPs departed significantly from Hardy-Weinberg equilibrium, except for SNP 2_1501_01_109. Additive effects were significantly different from zero for six loci associated with Δ13C and five loci associated with %N. Dominance effects were significant for the locus associated with height, four of the loci associated with Δ13C, and five of the loci associated with %N (Table 4).
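As an illustration of the Hardy-Weinberg check and the minor allele frequency reported above, the sketch below runs a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. The study used the Allele procedure in SAS, so this is only an assumed equivalent of the basic test; the function name and genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 d.f.) of Hardy-Weinberg proportions.

    Returns (minor allele frequency, chi-square statistic, P-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return min(p, q), chi2, p_value


# Hypothetical counts for 380 clones: one locus near equilibrium,
# and one with a deficit of heterozygotes.
maf_ok, chi2_ok, p_ok = hwe_chi_square(304, 72, 4)
maf_dev, chi2_dev, p_dev = hwe_chi_square(300, 60, 20)
print(maf_ok, p_ok, maf_dev, p_dev)
```

With rare alleles the expected count of the minor homozygote is small, so an exact test is usually preferable to the chi-square approximation.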

Table 4 Description of significant SNP loci, MAF, r2, %variance estimates, additive and dominance effects

SNP effects

When individual SNPs were used in a mixed model as an independent random effect with clones, no SNP variance estimates were significantly different from zero using a chi-square test of −2 log-likelihood model values (not shown). Marker r2 values for significantly associated SNPs ranged from 0.04 to 0.08, indicating that individual SNP loci account for small amounts of the variation among clones (Table 4). Using a model that included the effect of clones and individual SNPs as random effects, individual SNP effects accounted for small amounts of the phenotypic variance, ranging from less than 0.001% up to 3.2% for Δ13C and as high as 7% for %N.

Additive and dominance effects were estimated for all SNPs observed to be in Hardy-Weinberg equilibrium with all three genotype classes present. Additive effects were significant for six SNPs associated with Δ13C, but the estimates of SNP effects were not precise, as standard errors were greater than half the value of the estimate (Table 4). The ratio of dominance to additive effects (d/a) showed dominance effects similar in magnitude to additive effects, ranging from −0.79 to 1.30 for Δ13C (Table 4). Dominance effects for %N were similar in range (−0.75 to −1.04) with the exception of locus CL1074Contig1_03_101, where dominance was only 1% of the additive effect. The dominance effect for tree height at locus 0_14415_01_190 was the largest dominance effect observed, at more than three times the additive effect, and resulted in a reduction in height (Table 4).

Genotypic frequencies

We tested for differences in genotypic frequency in the tails of the population phenotypic distribution for all loci significantly associated with Δ13C, height and %N. For each trait, we compared the extremes of the population, which we considered to be clonal BLUP values more than 1.5 s.d. above or below the mean of clonal values. Using the genotypic frequencies of the whole population as the expected frequencies, we observed nine loci with significant frequency changes (P<0.05) in one of the phenotypic tails (Table 5). For each trait, we observed SNPs with departures from the overall population genotype frequencies: one SNP for height (0_14415_01_190), four SNPs for Δ13C (0_17030_01_94, 0_10921_01_353, CL599Contig1_07_109 and 0_8304_02_414), and five SNPs for %N (0_17195_01_417, 2_1087_01_86, 2_4191_01_104, 2_7865_01_156 and CL1074Contig1_03_101). The single SNP that survived multiple testing correction for height displayed an increased frequency in the heterozygous class for the tail with lower height (P<0.01), whereas SNPs for Δ13C and %N reflected changes in genotypes with the minor alleles in one tail.
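The tail comparison can be sketched as a chi-square goodness-of-fit test in which the whole-population genotype frequencies supply the expected counts for a tail. This is an assumed form of the test with two degrees of freedom for three genotype classes; the function name, population frequencies and tail counts are all hypothetical.

```python
import math


def tail_frequency_test(tail_counts, population_freqs):
    """Chi-square test (2 d.f.) of tail genotype counts against population frequencies.

    tail_counts: observed counts of the three genotype classes in one tail.
    population_freqs: genotype frequencies in the whole population.
    """
    if len(tail_counts) != 3 or len(population_freqs) != 3:
        raise ValueError("expected exactly three genotype classes")
    n_tail = sum(tail_counts)
    chi2 = 0.0
    for observed, freq in zip(tail_counts, population_freqs):
        expected = n_tail * freq
        chi2 += (observed - expected) ** 2 / expected
    # For 2 d.f. the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value


# Hypothetical tail of 30 clones with an excess of heterozygotes.
tail_chi2, tail_p = tail_frequency_test((9, 18, 3), (0.60, 0.35, 0.05))
print(tail_chi2, tail_p)
```

Because tails defined at 1.5 s.d. contain few clones, expected counts for rare genotype classes are small and the resulting P-values should be read as approximate.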

Table 5 Genotypic frequencies by SNP and phenotypic class (±1.5 std dev) with χ2 tests for significant differences from the population

Discussion

The ability of plants to respond to different levels of available water is variable and complex. Members of the genus Pinus use a drought avoidance strategy where under well-watered conditions water use is maximized, but decreases quickly when water is limiting to avoid low water potential (Martinez-Vilalta et al., 2004). Moderate genetic correlations indicate an increase in height growth and nitrogen content in foliage as Δ13C decreases, suggesting that under moderate water stress during the growing season, faster growth is related to assimilation rather than stomatal control, as seen by Brendel et al. (2002) in Pinus pinaster. Previous analyses of height and Δ13C revealed similar results, suggesting that carbon isotope analysis may be used to improve water use efficiency in loblolly pine populations (Baltunis et al., 2008). An understanding of the genes involved in carbon isotope discrimination and foliar nitrogen content is valuable for both the environmental services provided by natural forests managed with low intensity and wood productivity in more intensively managed plantation forests. The underlying genetic variation in water use efficiency may be valuable for modeling forest stand dynamics, as well as for producing more wood products in a sustainable and efficient manner. In the future we will likely be able to associate alleles of genes with more specific traits to dissect complex traits of economic and ecological value (Nelson and Johnsen, 2008). Carbon isotopes have shown promise for use in improving water use efficiency in pine species, but adequate sampling across environments, such as the natural range of loblolly pine, should be taken into account.

Significantly associated SNPs in this experiment were found in both known and unknown gene sequences, and in some cases SNPs were in functional genes related to similar traits in other plant species. The sequence flanking SNP 0_3648_01_357 for Δ13C was similar to an AP2 domain transcription factor. The AP2 domain family of transcription factors has been associated with ABA-sensitive abiotic stress response in seed and leaf tissue in Arabidopsis (Finkelstein et al., 1998). In Pinus strobus the over-expression of the ERF/AP2 transcription factor CaPF1 conferred increased drought and freeze tolerance in young plants (Tang et al., 2007). SNP 0_3648_01_357 results in a non-synonymous codon change and accounted for 3.8% of the phenotypic variation in this experiment, suggesting that a response to stress is linked to Δ13C variation in this experiment.

Nitrogen is critical to plant growth and photosynthetic activities, and it is often a limiting nutrient for growth in forest plantations (Fox et al., 2007). To date, genes in loblolly pine affecting the use or uptake of nitrogen have not been identified. The sequence flanking SNP 0_17195_01_417 is similar to glutamate decarboxylase (GAD), which is involved in the production of GABA, nitrogen metabolism and C:N ratio balance in Arabidopsis (Bouché and Fromm, 2004). The sequence flanking SNP 2_7865_01_156 was similar to a receptor-like protein kinase (RLK), a class of proteins that are important for many plant functions including nitrogen fixation in legumes such as alfalfa, soybean and peas (Morris and Walker, 2003).

Association analyses have been performed in populations of loblolly pine for wood quality traits (González-Martínez et al., 2007) and carbon isotope discrimination (González-Martínez et al., 2008), and cold-tolerance in Douglas-fir (Eckert et al., 2009) using a candidate gene approach. To date, associated SNPs for traits in loblolly pine explained the highest proportion of phenotypic variance in wood quality traits (20% of phenotypic variance) while only 7% of the phenotypic variance was accounted for in this experiment for the Δ13C-associated SNPs. Recent association analyses in conifers have explained small proportions of the phenotypic variation with individual SNPs (González-Martínez et al., 2007, 2008; Eckert et al., 2009), supporting the treatment of these traits as polygenic and quantitative in nature (Falconer, 1989).

Our results complement those of González-Martínez et al. (2008) with new SNPs in previously unidentified sequences, as well as those found in known sequences. Associations identified in this study and candidate genes from the work of González-Martínez et al. (2008) in carbon isotope discrimination association testing in loblolly pine reveal functional annotations of genes that are involved in abiotic stress response rather than growth functions, which might suggest that Δ13C is related to both stomatal conductance and photosynthetic capacity, but may be an artifact of the candidate genes tested in these experiments. The gene annotations may be somewhat contradictory to the trait relationships observed in this study, but the candidate genes used in this experiment do not cover a majority of the loblolly pine genome, so gene annotations by themselves are not conclusive.

In both studies, the most significant SNPs accounted for small portions of the phenotypic variance, supporting a polygenic response to regulate water loss in loblolly pine. González-Martínez et al. (2008) tested only a small number of candidate genes (46 SNPs from 41 genes), whereas we tested associations for 3938 SNPs from nearly as many genes. Unfortunately, the candidate gene sequences from González-Martínez et al. (2008) were not included in this study, thus validation of previous associations was not possible. In both studies, the SNPs represent a small proportion of the total variation in the loblolly pine genome, which is 24,000 Mb (Ahuja and Neale, 2005; Morse et al., 2009). Thus, there is future opportunity to identify additional genes that influence Δ13C and growth traits.

The population used in the González-Martínez et al. experiment consisted of 61 families from 31 parents in a partial-diallel mating design from a narrower geographical range, whereas the population in this experiment consisted of largely unrelated trees sampled from across the native range of loblolly pine. Allele frequencies for associated SNPs in their experiment were higher than observed in this study, which may be a product of the different population sizes or the degree of kinship among the individuals analyzed. Our study examined SNP alleles across 380 unrelated parents, compared with offspring of 31 unrelated individuals in the González-Martínez et al. (2008) work. This study was not replicated on multiple sites, but did experience water stress during the growing season, while the study of González-Martínez et al. (2008) was installed on two sites that were not limited in available water. A replicated experiment in a water-limited environment may be necessary to validate the marker-trait associations identified in this experiment.

Improved methods and larger populations are needed to have greater precision in estimates of marker effects. F-tests show significant additive and dominance effects at several SNP loci associated with measured phenotypes, but best linear unbiased estimates of the genotypic effects revealed large standard errors. The estimated magnitudes of allelic effects in this population are likely to be biased upward, because effects are estimated from a truncated distribution (Xu, 2003). In addition, the majority of minor allele frequencies observed for significant SNPs in this population are low (0.05–0.10) with one exception at 0.35, thus the small number of observations with these alleles is more heavily weighted in the estimation of allelic effects and variances. Experiments using a larger population are likely to more accurately estimate the true effect of any associated SNP polymorphism. A spatial model was implemented to remove environmental variability that would not have been removed by the randomized complete block experimental design. The individual SNP models regressed the BLUP of each clone for each trait on individual SNPs. BLUP predictions were based on two replicates of each genotype planted in a relatively small, homogeneous nursery bed. R-square values and the strength of any relationship between a SNP polymorphism and clonal BLUP values may not be repeatable in a larger, more heterogeneous field trial, which is typical of forest tree progeny trials. Future experiments should incorporate efficient designs to remove environmental variation and improve the precision of phenotypes used in association studies.

In this population the magnitude of the dominance effects was similar to that of additive effects. Excluding two SNPs, 0_14415_01_190, which was associated with height, and CL1074Contig1_03_101, which was associated with %N, the magnitude of the ratio of dominance to additive effects ranged from 0.75 to 1.3 (Table 4). Eckert et al. (2009) reported similar ratios of dominance to additive effects for cold-tolerance-related traits in Douglas-fir, where most dominance effects were 0.8x–1.3x the additive effect. Additive effects have been the focus of tree breeding programs for loblolly pine, but the use of non-additive effects may be an opportunity for capturing desirable trait attributes in deployment populations.

The largest dominance effect observed in this study was the effect of SNP 0_14415_01_190 on height. This effect is supported by the excess of heterozygotes in the low-end tail of the height distribution (Table 5): the proportion of heterozygotes increased from 0.41 in the entire population to 0.68 in the low-end tail, and this difference in genotypic frequency was significant (P<0.01). However, the variance in height due to this SNP was very small (<0.001%) and was not significant, highlighting the challenge of discovering variants and estimating the magnitude of their effects in field trials. Eckert et al. (2009) reported that 86% of the SNPs tested for associations with cold tolerance in Douglas-fir showed non-additive effects. If such SNPs are validated and account for significant variation, non-additive effects could have an important role in the improvement of growth.
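
A shift in heterozygote frequency of this kind can be checked with a simple exact test. The sketch below asks how likely it is to observe at least a given number of heterozygotes in a tail sample if the tail had the population-wide heterozygote proportion. It is an illustration only: the tail size of 25 clones is a hypothetical value chosen so that 17 heterozygotes gives the reported proportion of 0.68, the population proportion is treated as known, and this is not necessarily the test used in this study.

```python
from math import comb

def binomial_upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): a one-sided exact test."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(k, n + 1))

# Hypothetical counts: 17 heterozygotes among 25 low-tail clones (17/25 = 0.68),
# compared with the population-wide heterozygote proportion of 0.41.
p_value = binomial_upper_tail(k=17, n=25, p0=0.41)
print(f"one-sided P = {p_value:.4f}")
```

Because the tail is a subset of the population it is compared with, a test like this is only approximate; it is meant to show the logic of the comparison rather than to reproduce the reported P-value.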

We observed significant changes in genotypic frequency between at least one tail of the population and the overall population for nine SNPs, which supports the view that these SNP variants have some impact on the phenotypes observed in this study. Finding significant changes in allele frequency between the tails of the population suggests that a pooled sampling approach may be of value in forest trees. Pooling individuals by phenotype has shown promise in human blood and disease studies, both for discovering polymorphic SNPs and for estimating allele frequencies within the pooled phenotypic classes (Craig et al., 2009; Druley et al., 2009). Such an approach could significantly reduce genotyping cost compared with association studies in large populations and could identify candidate SNPs for future studies on the basis of allele frequency differences.
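
In its simplest form, a pooled comparison of this kind contrasts allele counts between the two phenotypic tails. The sketch below computes a Pearson chi-square test on a 2×2 table of allele counts. The function name and the counts are hypothetical, and real pooled-sequencing data would also need to account for sampling error introduced by pooling and sequencing, which this sketch ignores.

```python
from math import erfc, sqrt

def allele_count_chi_square(minor_low, major_low, minor_high, major_high):
    """Pearson chi-square (1 df) comparing allele counts between two pools."""
    table = [[minor_low, major_low], [minor_high, major_high]]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain observations")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))

# Hypothetical pools of 50 clones each (100 alleles per pool).
chi2, p_value = allele_count_chi_square(20, 80, 5, 95)
print(f"chi-square = {chi2:.2f}, P = {p_value:.4f}")
```

When expected counts are small, as they often are for rare alleles, an exact test would be a safer choice than the chi-square approximation used here.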

Recent characterization of population structure in loblolly pine, which included the NCSU+WG population, identified 24 SNPs as Fst outliers and five loci associated with geographical variation for potential evapotranspiration (Eckert et al., 2010). Two of the Fst outliers were associated with traits in this analysis: 0_14415_01_190 with height and 2_1087_01_86 with %N. If selection is acting on these genes in different sub-populations, the associations may be biased, and care must be taken in interpreting these results.

Conclusion

We identified 14 new marker-trait associations for Δ13C, height and %N in loblolly pine. The results suggest that these traits are influenced by many genes, each having a small effect, in keeping with the ‘infinitesimal model’ of quantitative genetics (Fisher, 1918). A recent review summarized data from many experiments and reported that additive genetic models adequately account for variation in complex traits (Hill et al., 2008), but results from this study suggest that there are both additive and non-additive effects on adaptive and economic traits. As breeding programs select for desirable genotypes of loblolly pine, the allele frequency distribution in pine breeding populations is likely to change from that found in the non-domesticated individuals analyzed in this study. As alleles that were originally rare in the wild population become more abundant through selection and their frequencies move closer to 0.5, more complex models that account for non-additive effects are likely to be useful (Mackay et al., 2009). Analytical tools such as spatial models will help remove environmental variation from phenotypic data and may strengthen future association testing in field-grown trees. Traditional breeding methods based on quantitative genetic approaches have been successful in improving loblolly pine populations, but the development of forest tree genomic resources will aid the improvement and understanding of ecologically and economically valuable traits in forest trees.