Introduction

Heritability, the proportion of total phenotypic variance attributable to the additive effects of genes, is a measure of the potential response to artificial or natural selection. An absence of heritable genetic variance would pose a fundamental constraint to evolution. Accurate measures of heritability are thus a prerequisite for predictions of evolutionary response to selection. The covariance between relatives for quantitative traits is the basis for estimating the heritability of traits and their genetic correlations (Falconer, 1989). Classically, the level of relationship between relatives is calculated from known pedigrees (Cannings & Thompson, 1981). A variety of statistical methods can be used to estimate genetic and environmental variance components for quantitative traits from comparisons of full-sibs and half-sibs, parents and their offspring or by response to selection (Falconer, 1989). For the most part, studies of quantitative genetic variation are conducted in the laboratory or greenhouse because of the need to conduct controlled breeding programmes and to minimize environmental variance that can confound estimates of genetic variation.

Two consequences of the constraints posed by traditional quantitative genetics designs are that: (i) organisms with long generation times (e.g. trees, large vertebrates) are very difficult to study; and (ii) the relationship between estimates of heritability and genetic correlations generated in the laboratory and those expressed in wild populations experiencing natural environmental conditions is unknown. Previous surveys of quantitative genetic variation (e.g. Mousseau & Roff, 1987; Roff & Mousseau, 1987; Houle, 1992) generally found significant heritabilities for most organisms, even for traits believed to be under intense selection (e.g. fitness components). However, because most of these studies were conducted in a controlled laboratory setting, their relevance to the real world (and hence our ability to predict evolutionary change in the wild) is unknown. It is widely believed that heritability will generally be lower in natural populations because of the obscuring effects of natural environmental variation (Price & Schluter, 1991; Houle, 1992) and maternal effects (e.g. Mousseau & Dingle, 1991). It has even been suggested that the lower heritabilities observed for fitness components, such as fecundity and survival (Mousseau & Roff, 1987; Roff & Mousseau, 1987), may simply reflect these traits' greater sensitivity to environmental variance rather than the eroding influences of natural selection (Price & Schluter, 1991; Houle, 1992).

Solutions to these problems include generating artificial populations in the wild (e.g. Mitchell-Olds, 1986; Shaw, 1986), cross-fostering designs (e.g. Boag & Grant, 1978; van Noordwijk et al., 1980; Smith & Dhondt, 1980; Alatalo & Lundberg, 1986), or the generation of populations adapted to the laboratory environment (e.g. the Rose lines of Drosophila melanogaster; Chippindale et al., 1996). More recently, Drosophila workers have estimated heritability in nature by regressing laboratory-grown progeny on field-collected parents (Coyne & Beecham, 1987; Prout & Barker, 1989; Riska et al., 1989). Unfortunately, each of these approaches has limitations: the generation of artificial populations ignores natural mating patterns; cross-fostering is limited to organisms that can be tracked through time and is very costly and time consuming; and evolution in laboratory populations may have limited relevance to natural populations and is dependent upon organisms that can be crossed and reared under laboratory conditions. Furthermore, all of these approaches are generally limited to organisms with relatively short generation times.

In this paper, we outline and apply a novel method for estimating heritabilities and genetic correlations that could be used in wild, naturally occurring populations. This method permits quantitative genetic analysis of wild populations with natural breeding patterns. It is particularly appropriate for organisms with long generation times, because it does not require any form of controlled breeding programme. It is not a method for detecting quantitative trait loci (QTL). Consequently, only a handful of markers is required. In essence, a simple pedigree is inferred for a wild population using molecular markers, and this pedigree is subsequently used to estimate quantitative genetic parameters for traits of interest. Here, we test this method using a captive population of the Pacific Chinook salmon, Oncorhynchus tshawytscha.

The basic approach is as follows. First, individuals in the study population are genotyped for molecular marker loci and scored for quantitative traits. Secondly, a maximum likelihood procedure infers relatedness between pairs of individuals, based upon the marker data, assuming a mixture of unrelated and full-sib pairs. Thirdly, estimates of relatedness are combined with quantitative trait data in a mixture model to infer heritabilities and genetic correlations. The procedure used in this paper differs from that presented by Ritland (1996a), in that here we make an explicit assumption about the distribution of relatedness (either full-sibs or unrelated), allowing maximum likelihood methods to be used. In contrast, Ritland (1996a) gave a linear estimation procedure that does not make assumptions about the distribution of relatedness and that is more appropriate for inferences involving natural populations, in which relatedness spans a near continuum.

Materials and methods

The study system

The Chinook salmon is a species of considerable commercial and ecological importance along the Pacific north-west coast of North America. Coastal populations generally spend the first 6–8 months of their life in freshwater and then undergo a complex physiological transformation (smoltification) and migrate to saltwater (Healey, 1991). Adults typically return to freshwater to reproduce after 2–5 years at sea. Some males of this species undergo precocious sexual maturation (‘jacking’), whereby males return to natal streams 1–2 years earlier than females of their cohort. These ‘jacks’ rely on a sneaker strategy (Krebs & Davies, 1984) to obtain matings as opposed to the territorial defence strategy of normal (large) males. Because they mature early and at a smaller size, jacks are of little economic value to commercial or recreational fishermen. The elimination of jacking genotypes would also be highly desirable for the salmon aquaculture industry. Similarly, flesh colour (red is preferred) and growth rate are traits of considerable economic importance.

In order to estimate the heritability of jacking, weight, length and flesh colour, we generated a captive population containing a mix of full-sibs and unrelated individuals. Wild-caught males and females (15 of each sex), originally collected from Robertson Creek, British Columbia, were mated to generate 15 full-sib families. During the first year, offspring were reared in large freshwater holding tanks. In the second year, offspring were transferred to large (10×10×10 m) sea pens. One hundred and seventy 2-year-old fish were collected and measured for flesh colour, body weight and length and the incidence of jacking. At this age (2 years), jacks show functional testes and adult skin coloration, whereas nonjack males and females are immature with respect to gonad development and skin coloration (i.e. silver).

DNA extraction and fingerprinting

The methods used for DNA extraction and fingerprinting are provided by Heath et al. (1994). Briefly, liver or testes samples were taken from each fish for DNA extraction according to the protocol given by Devlin et al. (1991). For Southern analysis, 3 μg of DNA from each individual were digested overnight with the restriction endonuclease Hae III, size fractionated by gel electrophoresis and transferred to 20×25 cm nylon membranes. The membranes were hybridized with two probes: OtSL1, a chinook salmon hypervariable probe developed by Heath et al. (1994) using directed amplification of minisatellite region DNA (DAMD; Heath et al. 1993), and Ssa1, a highly variable single-locus probe (SLP) developed for Atlantic salmon (Bentzen & Wright, 1993) and found suitable for use with chinook salmon. Nylon membranes were first hybridized with random-primed radiolabelled OtSL1 probe and exposed to X-ray film for 1–2 days with an intensifying screen at −70°C. Following autoradiography, membranes were stripped of OtSL1 probe and rehybridized with Ssa1 following the same protocol. Observed alleles were scored and binned according to Heath et al. (1995). The OtSL1 and Ssa1 probes yielded 23 and 21 alleles, respectively.

A mixture model for estimating heritability using molecular markers

Ritland (1996a) presented and evaluated a marker-based procedure for inferring heritability using marker-inferred relatedness, and this method was applied to natural populations of monkeyflowers (Ritland & Ritland, 1996). That procedure was based upon a linear model, which made no a priori assumptions about the distribution of relatedness.

This is appropriate for natural populations, in which a full spectrum of possible relationships can occur. In contrast, our salmon population was known to have been created by the mating of 15 pairs of parents. Thus, we can assume individuals are either mutually unrelated or full-sibs. This existence of only two types of relatives allows an alternative estimation procedure to be used: a ‘mixture model’ of unrelated pairs and full-sib pairs. Because we know the distribution of relatedness, we can also use maximum likelihood, a procedure that requires us to specify the distribution of data. This distributional assumption can increase our statistical power but, if assumptions are invalid, can also bias our estimates.

The data consist of the following. Observations are taken of a population consisting of m individuals. We have measurements of one or more quantitative traits on each individual and have assayed at least one marker locus in each individual for the purpose of estimating the level of relatedness between individuals. Relatedness between individuals is unknown but is restricted to being either unrelated or full-sibs (extension to other classes of relationship is straightforward).

Our expectations are as follows. With purely additive genetic variation and no sharing of environmental or maternal effects, the expected correlation between two individuals for a quantitative trait is either zero (when unrelated) or $0.5h^2$ (when full-sibs), $h^2$ being the narrow-sense heritability. This follows because the expected correlation between relatives for a quantitative trait is $2rh^2$, where $r$ is the coefficient of kinship (the probability that a gene from the first individual is identical-by-descent with a homologous gene from the second individual; cf. Jacquard, 1974). In our case, $r=0$ (unrelated) or $r=1/4$ (full-sibs). Unlike previous studies, we infer the relationship probabilistically from markers.

At this point, we can specify the distribution function for quantitative traits of two individuals, denoted $Y_i$ and $Y_j$, in three different ways. First, $Y_i$ and $Y_j$ may follow a bivariate distribution, in which relationship and heritability affect the covariance term but not the mean or variance term. Secondly, the product $Y_iY_j$ may follow a univariate distribution, in which relationship and heritability affect the mean term but not the variance term. Finally, the sum $Y_i+Y_j$ may follow a univariate distribution, in which relationship and heritability affect only the variance term. The second approach, although useful in the linear statistical approach of Ritland (1996a), requires a non-normal distribution function because the product $Y_iY_j$ is skewed. Also, the expected value of the product $Y_iY_j$ between unrelated individuals is actually negative and not zero, and including its expectation increases the complexity. The first and the third approaches extract essentially the same information from the data, and the third is much simpler; thus, we adopt the third approach.

The statistical model is therefore constructed as follows. For the ith individual, the value of a trait is $Y_i$. The probabilities of the marker data at a single locus, given the type of relationship, are given in Table 1 [these are obtained from table 2 of Ritland (1996b) using r=h=0 for unrelated individuals and r=1/4, h=1/8 for full-sibs]. In the following, these probabilities for relationship between individuals i and j for the kth locus are denoted as $P_{ijk|u}$ for unrelated individuals and $P_{ijk|f}$ for full-sibs. The probability of data at several marker loci is the product of conditional single-locus probabilities. Let the probability that two randomly sampled individuals are full-sibs be denoted as a. For simplicity, we denote the normalized trait value as $Y^*_i=(Y_i-\mu_y)/\sigma_y$, where $\mu_y$ is the mean of $Y_i$ and $\sigma^2_y$ is the variance of $Y_i$, so that each normalized value has unit variance. The expectation of $Y^*_i+Y^*_j$ is zero, and the variance of $Y^*_i+Y^*_j$ is $2+h^2$ for full-sibs and simply 2 for unrelated individuals.
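The two variances quoted above follow in one step from the expected correlation between relatives. As a short worked derivation, writing $Y^*$ for the normalized trait value and assuming (as in the text) unit-variance normalized values and purely additive effects:

```latex
\[
\operatorname{Var}(Y^*_i + Y^*_j)
  = \operatorname{Var}(Y^*_i) + \operatorname{Var}(Y^*_j)
    + 2\operatorname{Cov}(Y^*_i, Y^*_j)
  = 1 + 1 + 2\,(2 r h^2)
  = 2 + 4 r h^2 .
\]
```

Setting $r=1/4$ gives $2+h^2$ for full-sibs, and $r=0$ gives 2 for unrelated individuals.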

Table 1 Genotype probabilities for two diploid relatives at a marker locus, conditioned upon mode of relationship (unrelated or full-sib). Subscripts denote alleles, with j, k and l denoting any mutually exclusive alleles other than i.
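Pair probabilities of this kind can be computed with the standard identity-by-descent (IBD) formulation, in which two non-inbred full-sibs share 0, 1 or 2 alleles IBD with probabilities 1/4, 1/2 and 1/4. The sketch below is a minimal illustration under that assumption, together with Hardy-Weinberg genotype frequencies. It is not a transcription of the table's entries, and the allele frequencies and function names are hypothetical.

```python
from itertools import combinations_with_replacement


def genotype_prob(g, freq):
    """Hardy-Weinberg probability of an unordered diploid genotype g = (a, b)."""
    a, b = g
    return freq[a] ** 2 if a == b else 2.0 * freq[a] * freq[b]


def prob_given_shared_allele(g, shared, freq):
    """P(genotype g) when one allele copies `shared` and the other is drawn at random."""
    c, d = g
    if c == d:
        return freq[c] if c == shared else 0.0
    return (freq[d] if c == shared else 0.0) + (freq[c] if d == shared else 0.0)


def pair_prob(g1, g2, freq, k):
    """Joint probability of two genotypes given IBD-sharing probabilities k = (k0, k1, k2)."""
    k0, k1, k2 = k
    g1, g2 = tuple(sorted(g1)), tuple(sorted(g2))
    p1 = genotype_prob(g1, freq)
    share0 = p1 * genotype_prob(g2, freq)            # no alleles IBD
    share1 = p1 * 0.5 * (prob_given_shared_allele(g2, g1[0], freq)
                         + prob_given_shared_allele(g2, g1[1], freq))
    share2 = p1 if g1 == g2 else 0.0                 # both alleles IBD
    return k0 * share0 + k1 * share1 + k2 * share2


UNRELATED = (1.0, 0.0, 0.0)
FULL_SIB = (0.25, 0.5, 0.25)

# Hypothetical allele frequencies at one marker locus.
freq = {"A": 0.5, "B": 0.3, "C": 0.2}
genotypes = list(combinations_with_replacement(sorted(freq), 2))
```

For several loci, the single-locus values are multiplied together, as stated in the text.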
Table 2 Estimates of heritabilities in salmon based upon the marker method (SEs in parentheses, based on 100 bootstraps), calculated as a mean over the range of possible fractions of full-sibs, from a=0.07 to a=0.10

We assume that the traits $Y$ are Gaussian (normally) distributed according to the zero-mean probability function $F(Y, \sigma^2)$. Under the two possible levels of relatedness, the likelihood of each pair is a sum of two likelihoods (corresponding to unrelated vs. full-sibs), and the total likelihood is the product across all possible pairings of individuals in the population:
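Assembled from the definitions in this section, the likelihood takes the following form. This is a reconstruction from the surrounding text rather than a quotation; the subscripts $u$ and $f$ (unrelated, full-sib) and the symbol $Y^*$ for the normalized trait value are labels assumed here.

```latex
\[
L(h^2) \;=\; \prod_{i<j} \Big[
    (1-a)\, F\big(Y^*_i + Y^*_j,\; 2\big) \prod_k P_{ijk|u}
    \;+\;
    a\, F\big(Y^*_i + Y^*_j,\; 2 + h^2\big) \prod_k P_{ijk|f}
\Big]
\]
```

The first term inside the brackets is the unrelated component of a pair's likelihood and the second is the full-sib component; only the second depends on $h^2$.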

This equation can also be used to estimate genetic correlations by letting $Y^*_j$ be the normalized value of a different trait on the second individual. The genetic parameter estimate obtained in this case is the additive genetic covariance between the two traits divided by the geometric mean of the phenotypic variances of the two traits. The genetic correlation is subsequently found by dividing this quantity by the geometric mean of the heritabilities of the two traits. In cases in which the heritability for one or both traits is not statistically significant, estimates of the former parameter (the additive genetic covariance between the two traits divided by the geometric mean of their phenotypic variances) can at least be used to determine the sign of the genetic correlation.
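In symbols, the two-step calculation described above can be written as follows. The notation is assumed here for illustration: $\operatorname{cov}_A(1,2)$ is the additive genetic covariance between traits 1 and 2, $V_{P1}$ and $V_{P2}$ are their phenotypic variances, and $V_{A1}$ and $V_{A2}$ their additive genetic variances.

```latex
\[
\theta_{12} = \frac{\operatorname{cov}_A(1,2)}{\sqrt{V_{P1}\,V_{P2}}},
\qquad
r_G = \frac{\theta_{12}}{\sqrt{h_1^2\,h_2^2}}
    = \frac{\operatorname{cov}_A(1,2)}{\sqrt{V_{A1}\,V_{A2}}},
\qquad
h_t^2 = \frac{V_{At}}{V_{Pt}} .
\]
```

Because the denominator $\sqrt{h_1^2 h_2^2}$ is positive whenever both heritabilities are positive, $\theta_{12}$ and $r_G$ share the same sign, which is why $\theta_{12}$ alone indicates the sign of the genetic correlation.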

The numerical value of heritability that maximizes the above likelihood equation can be found by the classical Fisher scoring method. In this method, the first derivative of the log-likelihood is obtained for each pair of individuals; this is denoted the ‘score’ Sij and it works out to be
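Differentiating the logarithm of a pair's two-component likelihood with respect to $h^2$ gives an expression of the following form. This is a derivation from the zero-mean Gaussian density, using the fact that only the full-sib component depends on $h^2$; $P_{ij|f}$ denotes the posterior probability that the pair are full-sibs and $Y^*$ the normalized trait value, both labels assumed here.

```latex
\[
S_{ij} \;=\; \frac{\partial \log L_{ij}}{\partial h^2}
       \;=\; P_{ij|f}\;
             \frac{(Y^*_i + Y^*_j)^2 - (2 + h^2)}{2\,(2 + h^2)^2}
\]
```

The intermediate step is $\partial \log F(s, v)/\partial v = (s^2 - v)/(2v^2)$ for a zero-mean Gaussian with variance $v = 2 + h^2$.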

where $P_{ij|f}=L_{ij,f}/(L_{ij,u}+L_{ij,f})$ is the probability that pair ij are full-sibs, given their marker data, the quantitative trait data and the prior probability of full-sibs in the population; $L_{ij,u}$ and $L_{ij,f}$ denote the unrelated and full-sib components of the pair's likelihood. Under the scoring method protocol, the estimate of heritability is iterated as
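A common form of the scoring update, in which the sum of squared scores stands in for the information, is shown below. Whether this exact information estimate was the one used is an assumption; the update itself is a standard way of iterating from per-pair scores.

```latex
\[
(h^2)' \;=\; h^2 \;+\; \frac{\sum_{i<j} S_{ij}}{\sum_{i<j} S_{ij}^2}
\]
```

At the maximum the scores sum to zero, so the estimate stops changing.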

where the prime denotes the revised value. The convergence of this recursion to the maximum was confirmed by computing likelihoods about the final converged value.
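The whole estimation loop can be sketched in a few lines. The code below is a minimal illustration, not the FORTRAN program used for the analysis. It assumes the two-component pair likelihood described in this section, a scoring step equal to the sum of scores divided by the sum of squared scores, and a guard that keeps $h^2$ between -1 and 2. All function names are hypothetical, and the demonstration uses simulated, independent pairs with perfectly informative markers purely to keep the example small.

```python
import math
import random


def normal_pdf(x, var):
    """Zero-mean Gaussian density F(x, var)."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def pair_scores(h2, sums, marker_u, marker_f, a):
    """Per-pair score: posterior full-sib probability times d log F / d h2."""
    v = 2.0 + h2  # variance of the sum of normalized values for full-sibs
    scores = []
    for s, m_u, m_f in zip(sums, marker_u, marker_f):
        lik_u = (1.0 - a) * m_u * normal_pdf(s, 2.0)  # unrelated component
        lik_f = a * m_f * normal_pdf(s, v)            # full-sib component
        post_f = lik_f / (lik_u + lik_f)
        scores.append(post_f * (s * s - v) / (2.0 * v * v))
    return scores


def estimate_h2(sums, marker_u, marker_f, a, h2=0.3, tol=1e-10, max_iter=200):
    """Iterate h2 <- h2 + sum(S) / sum(S^2) until the step is negligible."""
    for _ in range(max_iter):
        scores = pair_scores(h2, sums, marker_u, marker_f, a)
        info = sum(x * x for x in scores)
        if info == 0.0:
            break
        step = sum(scores) / info
        h2 = min(2.0, max(-1.0, h2 + step))  # assumed guard, keeps 2 + h2 > 0
        if abs(step) < tol:
            break
    return h2


def simulate_pairs(n_pairs, h2, seed=0):
    """Simulated pair sums; even-numbered pairs are full-sibs (idealized markers)."""
    rng = random.Random(seed)
    sums, marker_u, marker_f = [], [], []
    for k in range(n_pairs):
        is_sib = k % 2 == 0
        var = 2.0 + h2 if is_sib else 2.0
        sums.append(math.sqrt(var) * rng.gauss(0.0, 1.0))
        marker_f.append(1.0 if is_sib else 0.0)
        marker_u.append(0.0 if is_sib else 1.0)
    return sums, marker_u, marker_f


sums, marker_u, marker_f = simulate_pairs(8000, h2=0.6, seed=1)
h2_hat = estimate_h2(sums, marker_u, marker_f, a=0.5)
print(round(h2_hat, 3))
```

With real data, `marker_u` and `marker_f` would hold the multilocus marker probabilities of each pair under the two relationships, and pairs would share individuals rather than being independent as in this simulation.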

In this paper, we assume that a is known. Its true value cannot be estimated from the marker data because the estimate of mean pairwise relatedness is approximately zero, regardless of the level of true relatedness (Ritland, 1996b). This downward bias is caused by the drift of gene frequency occurring during the formation of family structure in the population. Attempts to estimate a using the above likelihood equation with our data confirmed this. The trade-off of requiring prior knowledge of a is that, in contrast to the method of Ritland (1996a), the variance of actual relationship is not estimated. We estimated heritability for each of several plausible values of a, ranging from 0.07 to 0.10 in increments of 0.01. Estimates were found to depend very little on a in this range, varying by 10 per cent at most.
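The plausible range of a can be motivated by simple counting: a is the number of within-family pairs divided by the number of all pairs. The sketch below illustrates this and the averaging over a grid of a values. The family sizes are hypothetical (the text reports only 170 fish from 15 families), and the function names are assumptions made for illustration.

```python
def full_sib_pair_fraction(family_sizes):
    """Fraction of all pairs of individuals that fall within the same family."""
    n = sum(family_sizes)
    sib_pairs = sum(k * (k - 1) // 2 for k in family_sizes)
    return sib_pairs / (n * (n - 1) // 2)


def mean_over_a(estimate_fn, a_values=(0.07, 0.08, 0.09, 0.10)):
    """Average an estimator over a grid of assumed full-sib fractions."""
    estimates = [estimate_fn(a) for a in a_values]
    return sum(estimates) / len(estimates)


# Hypothetical family sizes summing to 170 fish in 15 families.
near_equal = [11] * 10 + [12] * 5
unequal = [30] + [10] * 14

print(full_sib_pair_fraction(near_equal))
print(full_sib_pair_fraction(unequal))
```

Near-equal family sizes give a value of about 0.061, and any inequality among families raises it; the hypothetical unequal split above gives about 0.074, inside the range examined in the text.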

The error of the heritability estimate can be determined by bootstrapping the data. Ideally, entire families should be the unit of resampling but, in our case, where markers incompletely infer relatedness, one can only resample individuals. The effect of this is that errors are slightly underestimated. (A copy of the FORTRAN code used for this analysis is available from K.R. at ritland@unixg.ubc.ca.)
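Resampling individuals, as described above, can be sketched as follows. This is a minimal illustration, not the original program: `estimate_from_sample` is a hypothetical callable that maps a list of resampled individual indices to a parameter estimate. The helper for forming pairs skips any pairing of an individual with its own duplicate, which is an assumption about how duplicates would be handled.

```python
import random
import statistics


def pairs_from_sample(sample):
    """All pairs of resampled individuals, skipping an individual paired with its own copy."""
    return [(i, j)
            for pos, i in enumerate(sample)
            for j in sample[pos + 1:]
            if i != j]


def bootstrap_se(n_individuals, estimate_from_sample, n_boot=100, seed=0):
    """Standard error as the standard deviation of estimates over bootstrap resamples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Draw individuals with replacement; families are not kept together.
        sample = [rng.randrange(n_individuals) for _ in range(n_individuals)]
        estimates.append(estimate_from_sample(sample))
    return statistics.stdev(estimates)


# Toy usage: standard error of a mean over five hypothetical trait values.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
se = bootstrap_se(len(values),
                  lambda sample: sum(values[i] for i in sample) / len(sample),
                  n_boot=200, seed=1)
print(se)
```

In the heritability setting, `estimate_from_sample` would rebuild the pair list with `pairs_from_sample` and rerun the likelihood estimation on each resample.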

Results and discussion

Heritabilities and genetic correlations generated using our marker-based approach are given in Tables 2 and 3 (values obtained for a=0.07–0.10 in increments of 0.01 were averaged). Estimates of heritability for jacking and colour were significantly greater than zero, and all estimates generally fall within ranges commonly observed for these traits in both salmon (e.g. Gjerde & Gjedrem, 1984; Heath et al., 1994) and animals in general (e.g. Mousseau & Roff, 1987; Roff & Mousseau, 1987; Houle, 1992).

Table 3 Estimates of genetic correlations in salmon based upon the marker method (none of the estimates differed significantly from zero)

The estimate of heritability for jacking (0.67) is in accordance with previous estimates of 0.73–1.86 obtained using sib analyses in a conventional quantitative genetic design for this same population of chinook salmon (Heath et al., 1994). Because sire–offspring regression analysis indicated heritabilities of 0.32–0.77, large nonadditive genetic effects are probably present for this trait in this population (Heath et al., 1994).

Genetic correlations among all traits were usually positive, but even the strong positive genetic correlation between length and weight did not differ significantly from zero, which is not surprising given the relatively small number of individuals used in this study (n=170). However, the pattern of estimates accords with expectations, with size traits highly positively correlated and correlations among the other traits intermediate. These results agree with the biology of this fish: jacks generally grow fastest and show immature flesh colour (white) even following sexual maturity (normal individuals turn red).

The principal objective for the development of the molecular marker-based method for quantitative genetic analyses presented here and elsewhere (Ritland, 1996a, b; Ritland & Ritland, 1996) was to provide a means for examining quantitative genetic variation in natural populations that did not require experimental manipulation and to allow genetic studies of organisms not amenable to laboratory studies either because of long life spans or difficulty of rearing in captivity. In most natural populations, it is not possible to establish mating crosses, nor is it usually possible to monitor natural breeding patterns. And even when organisms establish territories and appear to breed monogamously so that probable parents can be identified (e.g. many birds), extrapair copulations (i.e. cuckoldry) and their effects on relationships within families are not easily detected. The approach presented in this and related papers (Ritland, 1996a, b; Ritland & Ritland, 1996) circumvents these problems because it does not require manipulation of the study population in any way. Instead, individuals are sampled and genotyped (which can often be achieved in a nondestructive manner), and patterns of relatedness are inferred from molecular markers.

Similarly, many organisms have very long life spans (e.g. trees, many vertebrates, salmon). For these organisms, the generation of artificial populations for quantitative genetic analysis is a time-consuming and expensive venture. The approach advocated here permits study of these organisms because it does not require that individuals be monitored at all stages of development. This is an especially appealing attribute when working with mobile organisms (e.g. fish), which are very difficult to rear individually for the duration of their life span. Using a marker approach allows for group rearing, such as that conducted for this study. However, it should be noted that the maximum likelihood method promoted in this paper is not appropriate for situations in which the distribution of relatedness in a study population is not known (i.e. here, we knew a priori that all individuals were either full-sibs or unrelated); the methods outlined in Ritland (1996a) and Ritland & Ritland (1996) are more appropriate when the distribution of relatedness is unknown.

There are several situations under which our approach will result in biased estimates of quantitative genetic parameters, because all of the usual assumptions for quantitative genetic analyses still apply. Directional or stabilizing selection will bias heritability estimates downwards, whereas maternal and common environmental effects will tend to inflate heritability estimates (Falconer, 1989), although some of these influences can be detected and statistically removed from estimates of genetic variance (Ritland, 1996a, b). Also, it is assumed that parents are unrelated and represent a random sample from the study population, which is also a requirement for more conventional approaches. For this study, a random sample of wild-caught parents was used to generate the experimental families, no deliberate selection was imposed (and study individuals were a random sample of a much larger number of total offspring reared) and all individuals were reared in a common garden (a large sea pen). These precautions should have reduced the potential influence of inbreeding, selection and environmental effects, although there was no way of eliminating the possible importance of maternal effects, which have been shown to be important in many other organisms (e.g. Mousseau & Dingle, 1991; Mousseau & Fox, 1998).

Potential problems inherent in our approach include insufficient molecular marker variability and a low variance of relatedness in the population of interest. For example, in a highly inbred population, there will be few polymorphic markers, and all individuals will be highly related to one another. In contrast, large heterogeneous populations may be highly polymorphic, but most random pairs of individuals will show a very low degree of relatedness. Under these conditions, our approach will not work. However, some of these problems can be avoided by sampling at an appropriate scale and life history stage. For example, although many organisms disperse widely (e.g. salmon), some are philopatric and return to natal territories for breeding. As long as sampling is conducted near the breeding area (either for adults or offspring), the mean and variance of relatedness will probably be sufficient for our approach. In plants, dispersal distances are often small, resulting in low variance and a high degree of relatedness (i.e. inbreeding) over short distances. This problem is solved by sampling individuals over a wider geographical range (e.g. Ritland & Ritland, 1996).

A related problem stems from the number and degree of polymorphism for molecular markers. The method proposed here, as well as related methods described in Ritland & Ritland (1996) and Ritland (1996a, b), is absolutely dependent upon there being a large number of alleles for the molecular markers. This can be accomplished using either a few highly polymorphic loci or many less polymorphic loci. As discussed in Ritland & Ritland (1996) and Ritland (1996a, b), it is a multiplicative function of the number of loci and the number of alleles at each locus that will determine the precision with which pairwise relatedness can be estimated. Unlike studies of quantitative trait loci (QTL), where a large number of loci are usually required to sample from the entire genome, the method proposed here will work with a few loci, as long as there are many alleles present in the population (e.g. microsatellites, minisatellites).

Ideally, this study would have been conducted using additional molecular markers and a great many more individuals. More markers (i.e. total number of alleles) would enhance the precision with which relatedness might be estimated (see Ritland, 1996a, b for discussions of the relationship between the number of markers and estimates of relatedness), and an increase in the total number of individuals greatly enhances the precision and accuracy of estimates of heritability and genetic correlation (Falconer, 1989). The problem of sample size is frequently encountered in studies of quantitative genetic variation, and our method is not a panacea for these requirements. However, the pilot project described here serves very well as a test of the feasibility of this method for future studies on a larger scale.

Despite the shortcomings and pitfalls listed above, the ability to examine quantitative genetic variation in wild, undisturbed populations represents a major advance in the study of evolutionary genetics. The advent of DNA fingerprinting and allied techniques has made the generation of highly polymorphic molecular markers a relatively simple task, and the wide availability of high-speed computers has allowed for sophisticated and numerically intensive statistical analysis of population data. The approach advocated in this paper (and others) will probably be improved and ultimately lead to unexpected insights into the genetic mechanisms shaping organismal evolution in the wild.