For many traits, including susceptibility to common diseases in humans, causal loci uncovered by genetic-mapping studies explain only a minority of the heritable contribution to trait variation. Multiple explanations for this ‘missing heritability’ have been proposed1. Here we use a large cross between two yeast strains to accurately estimate different sources of heritable variation for 46 quantitative traits, and to detect underlying loci with high statistical power. We find that the detected loci explain nearly the entire additive contribution to heritable variation for the traits studied. We also show that the contribution to heritability of gene–gene interactions varies among traits, from near zero to approximately 50 per cent. Detected two-locus interactions explain only a minority of this contribution. These results substantially advance our understanding of the missing heritability problem and have important implications for future studies of complex and quantitative traits.
Individuals within species show heritable variation for many traits of biological and medical interest. Most heritable traits follow complex inheritance patterns, with multiple underlying genetic factors2,3. Finding these factors has been a central focus of modern genetic research in humans, as well as in model organisms and agriculturally important species4,5,6,7. Recent work, most notably genome-wide association studies (GWAS) in humans, has underscored the problem of missing heritability: although many genetic loci have been identified for a wide range of traits, these typically explain only a minority of the heritability of each trait, implying the existence of other, undiscovered genetic factors1.
Multiple non-mutually exclusive explanations have been proposed for missing heritability1. One possibility is that the undiscovered factors could have effects that are too small to be detected with current sample sizes, or even too small to ever be individually detected with statistical significance8,9. The existence of many small-effect variants is supported by studies showing that a large proportion of heritable trait variation is tagged when all GWAS markers are considered simultaneously10,11. Because GWAS can only detect variants that are common in the population, another possibility is that the undiscovered variants are too rare in the population to be captured by GWAS12. One recent proposal13 highlights the fact that non-additive interactions among loci (sometimes termed epistasis) may inflate heritability measures14. Other proposed contributions to missing heritability include structural variation, gene–environment interactions, parent-of-origin effects, heritable epigenetic factors and ‘entirely unforeseen sources’1,15. A better understanding of the sources of missing heritability is crucial for designing studies to find the missing components.
We set out to investigate these questions in the yeast Saccharomyces cerevisiae. We and others have previously used a cross between a laboratory strain and a wine strain to investigate the genetic basis of many complex traits, including global gene expression, protein abundance, telomere length, cell shape, gene-expression noise and drug sensitivity, and we have demonstrated missing heritability in this system16,17. Recently, we used extreme quantitative trait locus mapping (X-QTL), a bulk segregant approach that uses pools of millions of cross progeny (segregants), to detect many loci underlying heritable trait variation18. We showed that for one trait, loci detected by X-QTL explained most of the heritability. However, pooled approaches do not allow direct estimates of heritability, the contribution of gene–gene interactions or locus effect sizes. Here we use a large panel of individually genotyped and phenotyped yeast segregants to accurately measure the heritable components of many quantitative traits, discover the underlying loci and examine the sources of missing heritability.
To estimate heritability and detect the underlying loci with high statistical power, we constructed a panel of 1,008 prototrophic haploid segregants from a cross between a laboratory strain and a wine strain (Supplementary Fig. 1 and Methods). These strains differ by 0.5% at the sequence level19. We sequenced the parent strains to high coverage and compared the sequences to define 30,594 high-confidence single-nucleotide polymorphisms that distinguish the strains and densely cover the genome. We obtained comprehensive individual genotype information for each of the 1,008 segregants by highly multiplexed short-read sequencing (Supplementary Fig. 1d).
We sought to accurately measure a large number of quantitative traits in the segregant panel. To do so, we implemented a high-throughput end-point colony size assay and measured growth in multiple conditions, including different temperatures, pHs and carbon sources, as well as addition of metal ions and small molecules16,18 (Supplementary Fig. 1c). We defined each trait as end-point colony size normalized relative to growth on control medium (Methods), and obtained reproducible measurements with a strong heritable component for 46 traits (Supplementary Table 1). Most of the traits were only weakly correlated with each other (Supplementary Fig. 2).
Phenotypic variation in the segregant panel can be partitioned into the contribution of heritable genetic factors (broad-sense heritability) and measurement errors or other random environmental effects. Broad-sense heritability can, in turn, be partitioned into the contribution of additive genetic factors (narrow-sense heritability), dominance effects, gene–gene interactions and gene–environment interactions14. In our experiment, dominance effects are absent because the segregants are haploid, and gene–environment interactions for a given trait should also be absent as all the segregants are grown simultaneously under uniform conditions. Thus our estimates of broad-sense heritability include additive and gene–gene interaction components, whereas our estimates of narrow-sense heritability include only the additive component. The difference between the two heritability measures therefore provides an estimate of the contribution of gene–gene interactions.
We estimated broad-sense heritability from repeatability of trait measurements (Methods). Estimating narrow-sense heritability usually involves measuring phenotypic similarity for different degrees of relatedness. We took advantage of a recently developed genomic approach in which narrow-sense heritability is estimated by comparing phenotypic similarity among individuals with their actual genetic relatedness, computed from dense genotype data (Methods)20. Among the 46 traits, broad-sense heritability estimates ranged from 0.40 to 0.96, with a median of 0.77. Narrow-sense heritability estimates ranged from 0.21 to 0.84, with a median of 0.52 (Fig. 1). An analysis that partitioned additive genetic variation among chromosomes produced similar results (Supplementary Table 2). We used the difference between broad-sense and narrow-sense heritability to estimate the fraction of genetic variance due to gene–gene interactions, which ranged from 0.02 to 0.54, with a median of 0.30. Thus, the genetic basis for variation in some traits is almost entirely due to additive effects, whereas for others approximately half of the heritable component is due to gene–gene interactions.
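The arithmetic behind the interaction estimate can be sketched in a few lines of Python. This is a minimal illustration that reads "fraction of genetic variance due to gene–gene interactions" as (H² − h²)/H², which is our interpretation of the text. The function name `interaction_share` and the input values are hypothetical.

```python
def interaction_share(H2, h2):
    """Fraction of genetic variance attributed to gene-gene interactions.

    In haploid segregants grown together, H2 - h2 is taken to contain only
    the interaction component (no dominance or gene-environment terms).
    """
    if not 0.0 <= h2 <= H2 <= 1.0:
        raise ValueError("expected 0 <= h2 <= H2 <= 1")
    return (H2 - h2) / H2


# Hypothetical heritability estimates for a single trait
share = interaction_share(H2=0.80, h2=0.56)
```

A trait whose narrow-sense estimate equals its broad-sense estimate gets a share of zero, that is, purely additive genetics.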
Next, we sought to map the additive heritable variation to specific quantitative trait loci (QTL). Simple linkage analysis of one marker at a time revealed multiple QTL per trait (Methods). To more accurately capture the effects of each QTL while controlling for the other QTL affecting the same trait, we used a step-wise forward-search approach to detect QTL and build a multiple-regression model (Methods). With this approach, we detected a total of 591 QTL for 46 traits at an empirical false-discovery rate (FDR) of 5% (Supplementary Table 3). We observed varying degrees of trait complexity, with a minimum of 5, a maximum of 29 and a median of 12 QTL per trait. These numbers of QTL are comparable to those previously seen for a smaller set of traits by X-QTL18. Consistent with theoretical predictions21 and previous observations, we detected many more QTL of small effect than of large effect (Supplementary Fig. 3). Some traits showed a distribution of QTL effect sizes roughly consistent with Orr’s evolutionary model21, whereas others showed one or more larger-than-expected QTL (Supplementary Fig. 4).
Having identified QTL, we next measured the fraction of additive heritability explained by our model of detected QTL for each trait. To obtain unbiased estimates, we performed tenfold cross-validation by detecting QTL in a subset of the segregant panel and estimating the effects in the rest of the panel. Across the traits, the detected loci explained between 72% and 100% of the narrow-sense heritability, with a median of 88% (Fig. 2a). Thus, the high statistical power provided by the large segregant panel allowed us to detect QTL that jointly explain most of the additive heritability for the traits studied here. By analysing subsets of the data, we showed that ‘missing’ narrow-sense heritability can be explained by insufficient sample sizes (Fig. 2b). For instance, we detected 16 significant QTL, which jointly explain 78% of narrow-sense heritability for growth in E6 berbamine in a panel of 1,000 segregants (Fig. 3a), but only 2 of these, explaining 21% of narrow-sense heritability, also reached statistical significance in a smaller panel of 100 segregants (Fig. 3b). For traits with mostly additive genetics, the high fraction of variance explained by the detected QTL allowed us to accurately predict individual trait values from QTL genotypes (Fig. 4).
Differences between the estimates of broad-sense and narrow-sense heritability for many traits imply the presence of genetic interactions. We next sought to identify specific two-locus interactions. For each trait, we first performed an exhaustive two-dimensional scan for pairwise interactions. At a lod (log10 of the odds ratio for linkage) score of 6.2, corresponding to an empirical FDR of 10%, we detected significant QTL–QTL interactions for 17 of the 46 traits, with a total of 23 interacting locus pairs. A two-dimensional scan has low statistical power owing to the large search space. Power can be increased, at the cost of missing interactions between loci with no main effects, by testing only for interactions between each locus with significant additive effects and the rest of the genome22. Using this approach, we detected interactions for 24 of the 46 traits, with a total of 78 QTL–QTL interactions at an FDR of 10%. We observed a minimum of 1 and a maximum of 16 pairwise interactions per trait. These 78 pairs included 20 of the 23 locus pairs detected in the exhaustive two-dimensional scan, suggesting that two-locus interactions in which neither locus has a detectable main effect are uncommon. For 47 of the 78 pairs, both loci were detected as significant in the single-locus search for additive effects. In the remaining 31 cases, the additive effect of the second locus was too small to reach genome-wide significance, although it was nominally significant in 10 of these cases. These observations are broadly consistent with our previous work on genetic interactions that affect gene-expression traits22,23.
For most of the traits with a sizeable difference between broad-sense and narrow-sense heritability, pairwise interactions were either not detected or explained little of the difference (Fig. 5). The detected interaction effects were typically small (a median of 1.1% of genetic variance per interaction or a median of 3% of genetic variance per trait). Only in a few cases did detected genetic interactions explain a substantial fraction of the difference between broad-sense and narrow-sense heritability. Most notably, in the case of growth on maltose, one strong interaction explained 14% of the genetic variance and 71% of the difference between broad-sense and narrow-sense heritability (Fig. 5, inset).
We have used a large panel of segregants from a cross between two yeast strains to investigate the genetic architecture of 46 quantitative traits. We measured both the total and the additive contributions of genetic factors to trait variation, and showed that these often differ. The observed differences between total and additive heritability estimates suggest that the contribution of genetic interactions to broad-sense heritability ranges from near zero to 54%. However, with a few exceptions, the specific combinations of loci that account for these interactions remain elusive. There are several possible explanations for this result. First, the statistical power to detect interactions is lower than the power to detect main effects. Second, individual interaction effects are expected to be smaller than additive effects, and hence their detection requires even larger sample sizes13. Finally, higher-order interactions among more than two loci could also contribute24. Our estimates of the contribution of interactions in a cross may overestimate their contribution to trait heritability in a population, because a higher proportion of variance is expected to be additive as allele frequencies depart from one-half25.
The large size of the panel allowed us to detect specific loci that jointly account for almost all of the additive (narrow-sense) heritability of each trait (72–100%). Human traits examined by GWAS vary in their genetic complexity1, ranging from macular degeneration, for which 5 variants in 3 genes explain roughly half of the genetic risk26, to height, for which 180 loci explain about 13% of heritability, implying the existence of a much larger number of undetected loci27. Compared to our results in a yeast cross, GWAS typically detect a larger number of loci explaining a smaller proportion of trait heritability. One obvious difference is that the number of variants segregating in a cross between two strains is smaller than the number of common variants segregating in a population sample. The difference is roughly a factor of three for a neutral allele frequency spectrum, and potentially much larger if functional variants are deleterious and hence shifted towards lower frequency (Methods). The human genome also offers a larger target size, perhaps by a factor of five, for variants affecting a trait (Methods). These very rough estimates suggest that we might expect at least 15 times more loci to be found by GWAS than the median of 12 loci per trait we observe in the yeast cross. Because of the resolution of linkage analysis in our cross, some QTL may contain multiple linked variants, further increasing the true number of loci. Several additional factors could lead to a larger missing heritability in humans: the fraction of heritability due to genetic interactions could be higher13, rare variants may account for a disproportionately large contribution of heritable variation28,29,30, and some human traits might be inherently more complex than yeast traits in that they integrate over physiological processes involving a larger number of underlying gene pathways. Within-locus dominance effects represent an additional source of genetic complexity in diploid organisms.
Our results are consistent with the suggestions that missing additive (narrow-sense) heritability arises primarily from many loci with small but not infinitesimal effects. These loci can be discovered in studies with sufficiently large sample sizes, although the optimal study designs will depend on the population frequency spectra of the causative alleles. Because all alleles are fixed at a frequency of one-half in a cross, we cannot yet delineate the contributions of common and rare variants to inherited variation, but we plan to do so in future studies.
Construction, genotyping and phenotyping of segregant panel
We crossed two prototrophic strains, a MATa BY parent (a laboratory strain derived from a cross of BY4716 and BY4700) and a MATα RM parent (a vineyard strain derived from RM11-1a). We sporulated the diploid and retained 1,056 four-spore tetrads that showed 2:2 segregation of mating type and drug-resistance markers. One spore from each tetrad was genotyped using a modified version of the Nextera protocol and Illumina short-read sequencing. Segregants were phenotyped for end-point growth on agar plates. Custom image processing software was used to quantify colony size.
We estimated broad-sense heritability using replicated trait measurements for each strain and a random effects analysis of variance. We estimated narrow-sense heritability using a linear mixed model that compares phenotypic similarity among individuals with their realized genetic relatedness measured from genotype data. Standard methods were used for linkage analysis. Empirical FDRs were calculated using a permutation approach.
Construction of segregant panel
We crossed a prototrophic BY parent (a laboratory strain derived from a cross of BY4716 and BY4700) that is MATa with a prototrophic RM parent (a vineyard strain derived from RM11-1a) that is MATα hoΔ::hphMX4 flo8Δ::natMX4 AMN1-BY. Diploid zygotes were recovered and sporulated for 5–7 days in SPO++ sporulation medium (http://dunham.gs.washington.edu/sporulationdissection.htm) in a roller drum at room temperature (∼21–22 °C). Tetrads were dissected31 using the MSM 400 dissection microscope (Singer Instrument Company). Colonies from four-spore tetrads were inoculated into 150 μl of yeast nitrogen base (YNB) + 2% glucose in 96-well plates (Corning), grown for 48 h at 30 °C without shaking, and stored as frozen stocks in 20% glycerol. A total of 1,056 four-spore tetrads that showed 2:2 segregation of hygromycin resistance, G418 resistance and mating type were retained. A Biomek FX (Beckman Coulter) was used to select one segregant from each four-spore tetrad for downstream analysis.
We calculated statistical power (1− β) for sample sizes of 100 and 1,000 segregants in R using the ‘power.t.test’ function32. Power was calculated over a range of effect sizes, where effect size was calculated as the per cent genetic variance explained by a single QTL. To correct for multiple testing over thousands of markers across the genome, a genome-wide significance threshold (α) of P < 2 × 10−4 was used.
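A rough Python analogue of this power calculation is sketched below. It is not R's `power.t.test`: it uses a normal approximation to the two-sample t-test, assumes that each allele class contains half of the segregants, and expresses effect size as the fraction of phenotypic (not genetic) variance explained; multiplying a genetic-variance fraction by the trait's broad-sense heritability converts between the two. The function name `qtl_power` is hypothetical.

```python
from statistics import NormalDist


def qtl_power(n_segregants, var_explained, alpha=2e-4):
    """Approximate power to detect a single biallelic QTL in a haploid cross.

    Normal approximation to a two-sided, two-sample t-test in which each
    allele class holds half of the segregants and the QTL explains
    `var_explained` of the phenotypic variance. As in R's default
    power.t.test, the negligible opposite tail is ignored.
    """
    n_per_class = n_segregants / 2
    # Standardized difference between allele-class means (Cohen's d)
    d = 2 * (var_explained / (1 - var_explained)) ** 0.5
    ncp = d * (n_per_class / 2) ** 0.5   # noncentrality of the test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return 1 - NormalDist().cdf(z_crit - ncp)


for n in (100, 1000):
    print(n, round(qtl_power(n, 0.02), 3))
```

Under these assumptions, a QTL explaining 2% of phenotypic variance is rarely detected with 100 segregants at the genome-wide threshold but is detected most of the time with 1,000.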
DNA preparation and sequencing library construction
Segregants were inoculated into 1-ml deep-well 96-well plates (Thermo Scientific) in 800 μl of yeast peptone dextrose (YPD) and grown for 2 days at 30 °C without shaking. Plates were sealed with Breathe-Easy gas-permeable membranes (Sigma-Aldrich). DNA was extracted using 96-well DNeasy Blood & Tissue kits (Qiagen). DNA concentrations were determined using the Quant-iT dsDNA High-Sensitivity DNA quantification kit (Invitrogen) and the Bio-Tek Synergy 2 plate reader. DNA was diluted to 1.66 ng per microlitre. Per sample, 15 μl of 1.66 ng per μl DNA was added to 4 μl of 5× Nextera HMW buffer, 0.95 μl of water and 0.05 μl of Nextera Enzyme Mix (Epicentre). The transposition reaction was performed for 5 min at 55 °C. One-hundred microlitres of water was added to each sample, and the samples were purified using the MinElute kit (Qiagen). Fifteen microlitres of the purified fragmented DNA was PCR-amplified and barcoded with custom 5-base pair (bp) sequences using Ex Taq polymerase (Takara) and between 15 and 30 PCR cycles. Five microlitres of each PCR-amplified sample was combined into one 96-plex library. The combined library was loaded on a 2% agarose gel, and the 350–650-bp region was excised and gel-extracted using QIAquick Gel Extraction Kit (Qiagen). Final libraries were diluted to 3.3 ng per microlitre and sequenced using the single-end module on a HiSeq 2000 (Illumina) with 100-bp reads.
Determining segregant genotypes
Custom R and Python code was used to demultiplex the sequencing data and trim read ends. Sequencing reads were assigned to segregants based on the 5-bp barcode at the beginning of each read. The internal 19-bp transposon sequence and 10 bp on the right end of each read were removed. Reads were aligned to the S288C reference genome using the Burrows–Wheeler Aligner33 with the ‘-q 30’ parameter. SAMtools34 was run with the ‘view’ command and ‘-bHsq 1’ parameters to retain uniquely mapping reads. Sequence variants were identified using SAMtools34 with the ‘mpileup’ command and parameters ‘-d 10000 -D -u’. A total of 42,689 high-confidence sequence variants between BY and RM were determined from sequencing the parental strains at greater than 50-fold coverage. Variants in the segregants were restricted to these 42,689 expected sites using ‘bcftools view’34 with the parameters ‘-N -c -g -v -P flat’. Genotype likelihoods for the BY and RM alleles at each variant were extracted from the VCF file using custom R code.
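The demultiplexing and trimming step might look like the following Python sketch. It assumes, hypothetically, that the 19-bp transposon sequence immediately follows the 5-bp barcode at the start of each read; the function name `demultiplex` and the barcode table are illustrative and are not the authors' code.

```python
BARCODE_LEN = 5      # per-segregant barcode at the start of each read
TRANSPOSON_LEN = 19  # internal transposon sequence (assumed to follow the barcode)
TRIM_RIGHT = 10      # bases removed from the 3' end of each read


def demultiplex(reads, barcode_to_sample):
    """Assign reads to segregants by barcode and trim the fixed-length ends.

    barcode_to_sample maps 5-bp barcodes to segregant IDs (hypothetical
    table); reads whose barcode is not in the table are discarded.
    """
    by_sample = {}
    for read in reads:
        sample = barcode_to_sample.get(read[:BARCODE_LEN])
        if sample is None:
            continue
        genomic = read[BARCODE_LEN + TRANSPOSON_LEN:len(read) - TRIM_RIGHT]
        by_sample.setdefault(sample, []).append(genomic)
    return by_sample
```

The trimmed sequences would then be passed to the aligner; exact barcode matching is used here for simplicity, with no tolerance for sequencing errors in the barcode.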
For each segregant and chromosome, a hidden Markov model was used to calculate the posterior likelihood that the read data came from the BY allele, the RM allele, a BY gene-conversion event or an RM gene-conversion event. A genotype was called BY if the log10 ratio of the BY posterior likelihood to the RM posterior likelihood was greater than 2, RM if this ratio was less than −2, and missing if it lay between −2 and 2. In total, 1,008 out of 1,056 segregants had between 25 and 120 recombination breakpoints and at least 35,000 markers with genotype calls, and these were retained for downstream analysis. Genotypic markers were excluded if their allele frequency was greater than 56% or less than 45%, or if they were not called in at least 99% of the segregants. This resulted in a final set of 30,594 genotypic markers. Markers with missing data were imputed using the Viterbi algorithm as implemented in the R/qtl package35. Adjacent markers with the same genotypes in all segregants were collapsed to one unique marker, resulting in a final set of 11,623 unique genotypic markers.
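The calling rule and the marker-collapsing step can be sketched in Python as below. The hidden Markov model itself is not reproduced: the sketch assumes its output is already available as a log10 posterior ratio per marker, and that markers are processed one chromosome at a time in genomic order. The function names are hypothetical.

```python
def call_genotype(log10_ratio, threshold=2.0):
    """Call one marker from log10(P_BY / P_RM), the ratio of HMM posteriors."""
    if log10_ratio > threshold:
        return "BY"
    if log10_ratio < -threshold:
        return "RM"
    return None  # treated as missing and imputed later


def collapse_markers(marker_names, genotype_columns):
    """Keep one marker per run of adjacent markers with identical genotypes.

    genotype_columns[i] is a tuple of calls for marker i across all segregants;
    markers are assumed to be in genomic order on a single chromosome.
    """
    kept = []
    previous = None
    for name, column in zip(marker_names, genotype_columns):
        if column != previous:
            kept.append(name)
            previous = column
    return kept
```

Collapsing removes only exact redundancy: two adjacent markers are merged only if no segregant has a crossover between them.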
Phenotyping by end-point growth on agar plates
Individual segregants were inoculated in at least two different plate configurations into 384-well plates (Thermo Scientific; 264574) with 50 μl of YPD and grown for 36 to 48 h in a 30 °C incubator without shaking. Replicates here consisted of strains that were grown in independent 384-well plates and in different plate configurations. Each segregant was represented once in each plate configuration. Each target agar plate was made with 50 ml of media (YPD or YNB) and with drug or condition of choice (Supplementary Table 1). The Singer Rotor HDA pinning robot was used to pin the segregants to the agar plates. Before pinning, each 384-well plate was mixed for 1 min at 2,000 r.p.m. using a MixMate (Eppendorf). Segregants were pinned to the agar plates with 100% pin pressure and 384 long pins (Singer Instruments; RP-MP-3L). After pinning, the plates were incubated at 30 °C, or the specified condition temperature (Supplementary Table 1), for approximately 48 h. Plates were scanned face-up and without lids on an Epson 700 transparency scanner with 400 DPI resolution and a greyscale bit-depth of 8. Images were saved as TIFFs or 99% quality JPEGs. The pixel coordinates of the centres of the four corner colonies for each plate were manually identified using ImageJ36.
Custom R code was written to determine the size of each colony. Expected colony positions were calculated using manually identified coordinates of the corner colonies for each plate. Images were segmented using k-means clustering on the distribution of pixel intensities across a plate. The Voronoi region of each 9-pixel-diameter circular seed, corresponding to the expected location of a colony, was used together with the segmented image to match colonies to their expected positions. This was implemented using functions in the EBImage37 R package. The radius of each colony was calculated as $\sqrt{\mathrm{area}/\pi}$, where area is the colony area in pixels. Colonies with more than 15 pixels touching the edge of the image and colonies larger than 3,500 pixels were removed and treated as missing data in downstream analysis. Irregular colonies, representing image processing or pinning artefacts, were removed if and radius >20. These irregular colonies were also treated as missing data for downstream analysis. Colonies on each edge of a plate were tested for difference with all other colonies on the plate using a Wilcoxon rank-sum test. All colonies on an edge with P < 0.05 were treated as missing data and excluded from downstream analysis. Images were further inspected manually, and colonies subject to pinning or image-processing artefacts were removed. To normalize for occasional subtle within-plate spatial growth artefacts, a robust locally weighted regression was fit to the radius measurements using functions in the locfit R package38. The residuals were used for downstream analysis. End-point growth measurements were normalized for growth on control medium by regressing them on growth on control medium and using the residuals for downstream analysis. One-hundred-and-forty conditions were assayed. If multiple doses of a compound were tested, the dose with the highest heritability and phenotype data for at least 600 segregants was retained.
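Two of the quantitative steps above are sketched here in Python: converting a colony's pixel area to a radius, and removing the effect of growth on control medium by taking regression residuals. The sketch assumes the radius is that of a circle with the same area, √(area/π), and uses an ordinary least-squares line. The spatial locfit correction, edge tests and manual filtering are omitted, and the function names are hypothetical.

```python
import math


def colony_radius(area_px):
    """Radius of a circle with the same pixel area as the colony (assumed formula)."""
    return math.sqrt(area_px / math.pi)


def residualize(values, control):
    """Residuals from an ordinary least-squares line of `values` on `control`.

    `values` and `control` hold one growth measurement per segregant (same
    order, missing data already removed).
    """
    n = len(values)
    mean_c = sum(control) / n
    mean_v = sum(values) / n
    sxx = sum((c - mean_c) ** 2 for c in control)
    sxy = sum((c - mean_c) * (v - mean_v) for c, v in zip(control, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_c
    return [v - (intercept + slope * c) for c, v in zip(control, values)]
```

Taking residuals means that a segregant which simply grows well everywhere does not register as resistant to the tested condition.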
Traits with a broad-sense heritability of less than 25% were excluded from downstream analysis.
Broad-sense heritability was calculated using replicated segregant data and a random-effects analysis of variance. This was implemented using the ‘lmer’ function in the lme4 R package39. The variance components $\sigma^2_G$, the genetic variance due to the effect of segregant, and $\sigma^2_E$, the error variance, were calculated, and broad-sense heritability was estimated as $H^2 = \sigma^2_G/(\sigma^2_G + \sigma^2_E)$. Standard errors were calculated by delete-one jackknife.
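For a balanced design, in which every segregant is measured the same number of times, the same variance components can be obtained from one-way ANOVA mean squares, as in the Python sketch below. This method-of-moments estimator is swapped in for illustration in place of lme4's REML fit; for balanced data the two agree whenever the estimated genetic variance is non-negative. The function names are hypothetical.

```python
from statistics import mean


def broad_sense_h2(replicates):
    """H^2 = sigma2_G / (sigma2_G + sigma2_E) from a balanced one-way ANOVA.

    `replicates` is a list of per-segregant lists, each holding the same
    number k >= 2 of replicate measurements.
    """
    n = len(replicates)
    k = len(replicates[0])
    grand = mean(v for reps in replicates for v in reps)
    seg_means = [mean(reps) for reps in replicates]
    ms_between = k * sum((m - grand) ** 2 for m in seg_means) / (n - 1)
    ms_within = sum((v - m) ** 2 for reps, m in zip(replicates, seg_means)
                    for v in reps) / (n * (k - 1))
    var_g = max((ms_between - ms_within) / k, 0.0)  # sigma^2_G, truncated at zero
    var_e = ms_within                               # sigma^2_E
    return var_g / (var_g + var_e)


def jackknife_se(estimator, data):
    """Delete-one jackknife standard error of estimator(data)."""
    n = len(data)
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    avg = sum(loo) / n
    return ((n - 1) / n * sum((x - avg) ** 2 for x in loo)) ** 0.5
```

Here the jackknife leaves out one segregant (with all of its replicates) at a time.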
Narrow-sense heritability was calculated for each trait using a linear mixed model40. This can be written as $y = 1_n\beta + Zu + e$. Here y is a vector of phenotype values for n segregants. For the comparison of broad-sense and narrow-sense heritabilities, y consists of one randomly chosen measurement from the replicate phenotype measurements for each segregant. This puts the two heritability estimates on the same scale. For the comparison of variance explained by each QTL model and narrow-sense heritability, y consists of the average of replicate phenotype measurements for each segregant. β is the overall mean, and $1_n$ is a vector of n ones. Z is an n × n identity matrix, u is a vector of random effects (BLUPs or breeding values for each segregant), and e is a vector of residuals. The variance structure of the phenotypes is written as $\mathrm{Var}(y) = A\sigma^2_A + I\sigma^2_E$, where A is the relatedness matrix between all pairs of segregants, estimated from our genotype data as the proportion of markers shared identical-by-descent (IBD) between each pair of segregants, I is an n × n identity matrix, $\sigma^2_A$ is the polygenic additive genetic variance explained by the single nucleotide polymorphisms and $\sigma^2_E$ is the error variance. Variance components were estimated using the rrBLUP R package41, and narrow-sense heritability was estimated as $h^2 = \sigma^2_A/(\sigma^2_A + \sigma^2_E)$. Standard errors were calculated by delete-one jackknife. Estimates of narrow-sense heritability using the average of segregant replicates or one measurement for each segregant were similar.
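The structure of this model can be illustrated with a small maximum-likelihood sketch in Python (NumPy). It is not rrBLUP: it profiles out β and the total variance, searches a grid of h² values, and reuses a single eigendecomposition of the relatedness matrix, a common trick in mixed-model software. It uses ML rather than REML, so small-sample estimates can differ slightly. The simulated example builds a centred relationship matrix from ±1-coded, unlinked markers; that choice is an assumption, and rescaling the relatedness matrix changes the scale of $\sigma^2_A$. The function name `estimate_h2` is hypothetical.

```python
import numpy as np


def estimate_h2(y, K, grid=np.linspace(0.0, 0.99, 100)):
    """Grid-search ML estimate of h^2 in y = 1*beta + u + e.

    Var(y) = sigma2 * (h2 * K + (1 - h2) * I), so sigma2_A = h2 * sigma2 and
    sigma2_E = (1 - h2) * sigma2. beta and sigma2 are profiled out, and one
    eigendecomposition of K is reused for every h2 on the grid.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    evals, U = np.linalg.eigh(K)
    y_rot = U.T @ y               # phenotypes in K's eigenbasis
    x_rot = U.T @ np.ones(n)      # intercept column in the same basis
    best_ll, best_h2 = -np.inf, 0.0
    for h2 in grid:
        d = h2 * evals + (1.0 - h2)   # eigenvalues of h2*K + (1-h2)*I
        w = 1.0 / d
        beta = np.sum(w * x_rot * y_rot) / np.sum(w * x_rot ** 2)  # GLS mean
        resid = y_rot - beta * x_rot
        sigma2 = np.sum(w * resid ** 2) / n                          # profiled total variance
        loglik = -0.5 * (n * np.log(sigma2) + np.sum(np.log(d)))     # up to a constant
        if loglik > best_ll:
            best_ll, best_h2 = loglik, h2
    return best_h2


# Simulated panel: 800 segregants, 200 unlinked +/-1-coded markers, true h^2 = 0.6
rng = np.random.default_rng(1)
n, m = 800, 200
Z = rng.choice([-1.0, 1.0], size=(n, m))
K = Z @ Z.T / m                                   # centred genomic relationship matrix
y = Z @ rng.normal(0.0, np.sqrt(0.6 / m), m) + rng.normal(0.0, np.sqrt(0.4), n)
h2_hat = estimate_h2(y, K)
```

The estimate is informative only because relatedness varies around its mean; with no variation in K, additive and residual variance could not be separated.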
Although the average genetic relatedness among the segregants is 0.5, it varies owing to random Mendelian segregation, and the actual relatedness for any pair of segregants can be calculated from high-density genotype data as the proportion of single nucleotide polymorphism alleles shared by these segregants. In our segregant panel, the standard deviation of relatedness was 0.048. For the partitioning of additive variance by chromosome, the relatedness matrix A between all pairs of segregants was estimated as the proportion of markers on a given chromosome shared IBD between each pair of segregants42.
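The proportion-shared relatedness and its spread across pairs can be computed as in this NumPy sketch. It assumes genotypes coded 0 (BY) and 1 (RM) at markers that distinguish the two parents, so that sharing an allele implies sharing it identical by descent. Restricting the columns to the markers of one chromosome gives the per-chromosome matrix. The function names are hypothetical.

```python
import numpy as np


def proportion_shared(genotypes):
    """Pairwise fraction of markers at which two segregants carry the same allele.

    `genotypes` is an (n_segregants, n_markers) array coded 0 (BY) / 1 (RM).
    """
    G = np.asarray(genotypes, dtype=float)
    same = G @ G.T + (1 - G) @ (1 - G).T   # matches on RM plus matches on BY
    return same / G.shape[1]


def offdiagonal_sd(A):
    """Standard deviation of relatedness over distinct pairs of segregants."""
    iu = np.triu_indices_from(A, k=1)
    return A[iu].std()
```

Because linked markers co-segregate, the effective number of independent markers is far smaller than the marker count, which is why the observed spread of relatedness is as large as it is.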
Mapping additive QTL
Forty-six traits were chosen for QTL mapping on the basis of the criteria described above. Each trait was scaled to have mean 0 and variance 1. We tested for linkage by calculating lod scores for each genotypic marker and each trait as $\mathrm{LOD} = -\frac{n}{2}\log_{10}(1 - r^2)$, where n is the number of segregants and r is the Pearson correlation coefficient between the segregant genotypes at the marker and segregant trait values43.
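A direct Python translation of this single-marker scan, LOD = −(n/2) log10(1 − r²), is sketched below with hypothetical function names. The statistic is undefined when |r| = 1, a case the sketch does not guard against.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lod_scan(genotype_columns, trait):
    """LOD = -(n/2) * log10(1 - r^2) for each marker's genotype column."""
    n = len(trait)
    return [-(n / 2) * math.log10(1 - pearson_r(g, trait) ** 2)
            for g in genotype_columns]
```

Because the lod score depends on the data only through r², it is unchanged by how the two alleles are coded.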
To estimate significance empirically, assignment of phenotype to each segregant was randomly permuted 1,000 times while maintaining the correlation structure among phenotypes. The maximum lod score for each chromosome and trait was retained44. The FDR was calculated as the ratio of expected peaks to observed peaks across different lod thresholds. Genetic markers corresponding to QTL peaks that were significant at an FDR of 5% were added to a linear model for each trait. Trait-specific linear models that included the significant QTL genotypes as additive covariates were computed, and phenotypic residuals were estimated. Phenotypic residuals for each trait were then used for another round of QTL detection45. This process of peak detection, calculation of empirical significance thresholds and expansion of the linear model for each trait to include significant QTLs detected at each step was repeated four times. The lod thresholds corresponding to a 5% FDR at each step were 2.68, 2.92, 3.72 and 4.9.
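The permutation-based FDR can be sketched in Python as follows. The sketch assumes the per-chromosome maximum lod scores have already been collected from the real data and from all permutations, and it picks the smallest observed lod value whose FDR is at or below the target; that selection rule is our assumption. The function names are hypothetical.

```python
def empirical_fdr(observed_peaks, permuted_peaks, n_perm, threshold):
    """Expected false peaks (permutation average) divided by observed peaks."""
    observed = sum(1 for x in observed_peaks if x >= threshold)
    expected = sum(1 for x in permuted_peaks if x >= threshold) / n_perm
    return expected / observed if observed else 0.0


def threshold_at_fdr(observed_peaks, permuted_peaks, n_perm, target=0.05):
    """Smallest observed peak lod whose empirical FDR is at or below `target`."""
    for t in sorted(set(observed_peaks)):
        if empirical_fdr(observed_peaks, permuted_peaks, n_perm, t) <= target:
            return t
    return None
```

In the stepwise procedure, this threshold would be recomputed on the residuals at each round, which is consistent with the rising lod thresholds reported above.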
Calculating effect size and variance explained by additive QTL
For each trait, a multiple regression linear model was computed with trait-specific QTL genotypes as independent variables. Phenotypes for each trait were scaled to have mean 0 and variance 1. The multiple regression coefficients are the standardized differences in allelic means for each QTL while controlling for the effects of other segregating QTL. The square of the multiple regression coefficient is the fraction of phenotypic variance explained by a QTL. The fitted truncated exponential distribution in Supplementary Fig. 3 is parameterized as , where x is the absolute value of the multiple regression coefficient pooled across traits and QTL, n is the number of bins, w is the bin size, l and r are left and right truncation points, respectively and b is estimated using maximum likelihood. Here l = 0.12 and r = 0.35 to correspond to the magnitude of effect sizes where power is nearly 100% and to exclude large-effect QTL.
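Because the truncated exponential has a single parameter, its maximum-likelihood estimate reduces to matching the sample mean, as the Python sketch below shows. It assumes a density proportional to exp(−bx) on [l, r] with b > 0 and uses the l and r values given above. It fits individual absolute effect sizes rather than binned counts, so it illustrates the idea rather than reproducing the fit in Supplementary Fig. 3. The function names are hypothetical.

```python
import math


def truncated_exp_mean(b, l, r):
    """Mean of a density proportional to exp(-b*x) on [l, r], for b > 0."""
    el, er = math.exp(-b * l), math.exp(-b * r)
    return 1 / b + (l * el - r * er) / (el - er)


def fit_truncated_exp_rate(effects, l=0.12, r=0.35, b_lo=1e-3, b_hi=1e3):
    """ML estimate of b from absolute effect sizes that fall inside [l, r].

    The truncated exponential is an exponential family in x, so the ML
    equation is mean(x) = truncated_exp_mean(b); the truncated mean falls
    as b grows, so the root is found by bisection.
    """
    x = [abs(e) for e in effects if l <= abs(e) <= r]
    xbar = sum(x) / len(x)
    lo, hi = b_lo, b_hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if truncated_exp_mean(mid, l, r) > xbar:
            lo = mid   # mean still too large: need a steeper decay
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

If the mean effect inside the window exceeds (l + r)/2, the density would have to increase with effect size; this sketch then simply returns the lower bracket `b_lo`.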
The total phenotypic variance explained by the multiple regression QTL model is the r2 from the model, which was calculated using the ‘fitqtl’ function in R/qtl. Unbiased estimates of the total phenotypic variance explained by the multiple QTL model were calculated by standard tenfold cross-validation. The segregants were randomly split into ten equal-sized groups, nine groups were combined for QTL detection using the algorithm described above, and the remaining group was used to estimate QTL effect sizes. This process was repeated for each of the ten groups, and the average of the ten estimates was calculated. A similar method was used for phenotypic prediction. The segregants were randomly split into ten equal-sized groups, but here nine groups were combined for QTL detection and for estimation of QTL effect sizes. Phenotypes were then predicted for the remaining group (validation set) using only the genotypes from the validation set and the QTL model constructed from the nine other groups (training set). This was repeated such that each segregant was in the validation set one time and the training set nine times.
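The cross-validated estimate of variance explained can be sketched with NumPy. The QTL detector is passed in as a function; the stand-in `top_k_markers`, which simply takes the k markers most correlated with the trait, is a hypothetical placeholder for the iterative FDR-based search described above, and all function names are illustrative.

```python
import numpy as np


def r_squared(X, y):
    """r^2 of an OLS fit of y on the columns of X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)


def top_k_markers(G, y, k=3):
    """Hypothetical stand-in detector: the k markers most correlated with y."""
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    score = np.abs(Gc.T @ yc) / (np.sqrt((Gc ** 2).sum(axis=0)) * np.sqrt(yc @ yc))
    return np.argsort(score)[::-1][:k]


def cv_variance_explained(G, y, detect=top_k_markers, folds=10, seed=0):
    """Detect QTL on 9/10 of segregants, measure r^2 on the held-out 1/10."""
    rng = np.random.default_rng(seed)
    groups = np.array_split(rng.permutation(len(y)), folds)
    estimates = []
    for test in groups:
        train = np.setdiff1d(np.arange(len(y)), test)
        qtl = detect(G[train], y[train])
        estimates.append(r_squared(G[test][:, qtl], y[test]))
    return float(np.mean(estimates))
```

Separating detection from estimation matters because effects estimated in the same samples used to detect them are inflated (the Beavis effect).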
Detecting QTL–QTL interactions
For computational efficiency, the marker set was reduced to 4,420 by picking one marker closest to each centimorgan position on the genetic map. For the full-genome scan for interacting QTL, a lod score corresponding to the likelihood ratio of a model that includes an interaction term, y = ax+bz+cxz+d, to a model that does not, y = ax+bz+d, was computed for each trait and every marker pair. Here, y is the residuals vector for each trait after fitting the additive QTL model, x is the genotype vector at one position in the genome, z is the genotype vector at another position in the genome at least 25 centimorgans away from x, and a, b, c and d are estimated parameters specific to each trait and marker pair22,23. FDR at different lod thresholds was calculated by dividing the average number of peaks obtained in 1,000 permutations of the data that scramble the segregant identities by the number of peaks observed in the real data.
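The interaction test compares two nested linear models, so for normally distributed residuals its lod score can be computed from residual sums of squares, as in this NumPy sketch (hypothetical function names). Because the full model contains an intercept, both main effects and their product, the lod score does not depend on how the two genotype classes are coded.

```python
import numpy as np


def rss(design, y):
    """Residual sum of squares of an OLS fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def interaction_lod(x, z, y):
    """LOD for y = a*x + b*z + c*x*z + d versus y = a*x + b*z + d.

    x, z: genotype vectors at two markers; y: trait residuals after the
    additive QTL model. For Gaussian errors,
    LOD = (n/2) * log10(RSS_reduced / RSS_full).
    """
    x, z, y = (np.asarray(v, dtype=float) for v in (x, z, y))
    ones = np.ones_like(y)
    reduced = np.column_stack([x, z, ones])
    full = np.column_stack([x, z, x * z, ones])
    return 0.5 * len(y) * np.log10(rss(reduced, y) / rss(full, y))
```

In a full two-dimensional scan this function would be evaluated for every qualifying marker pair (here, at least 25 centimorgans apart) on the thinned marker set.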
To increase statistical power by reducing the number of tests, we also tested for interactions between each locus with significant additive effects and the rest of the genome. The interaction LOD score was computed exactly as in the full two-dimensional scan described above, except that x was restricted to genotypes at trait-specific significant additive QTL. FDR was calculated as above. The non-additive genetic variance explained was calculated as the additional variance explained by including QTL–QTL interactions in the model, divided by the difference between broad-sense and narrow-sense heritability.
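Written as a formula (the notation is ours, not the paper's), with $r^2_{\text{add}}$ the phenotypic variance explained by the additive QTL model, $r^2_{\text{add+int}}$ the variance explained after adding the detected QTL–QTL interaction terms, and $H^2$ and $h^2$ the broad-sense and narrow-sense heritabilities:

```latex
\text{fraction of non-additive variance explained}
  = \frac{r^2_{\text{add+int}} - r^2_{\text{add}}}{H^2 - h^2}
```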
Conversion factors for yeast versus human expected numbers of QTL
The number of variants segregating in a cross between two strains is smaller than the number of (common) variants segregating in a population sample. If the causal variants follow a neutral allele frequency spectrum, then under the standard neutral model of population genetics, the relationship between the number of variants segregating in a cross of two haploid strains ($S_2$) and the number of common variants segregating in a population ($S_f$) with minor allele frequency ≥ $f$ is $S_f/S_2 = \ln\left(\frac{1-f}{f}\right)$ (ref. 46). Setting $f$ to 0.05 gives a ratio $S_f/S_2$ of approximately three (2.94). The ratio is 2.2 for $f = 0.1$ and 4.6 for $f = 0.01$. If the functional variants are at least weakly deleterious, and hence skewed towards lower frequency, the conversion factors will be larger.
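Under the standard neutral model, the expected number of sites at derived-allele frequency x in the population is proportional to θ/x, so integrating from f to 1 − f gives θ·ln[(1 − f)/f] sites with minor allele frequency ≥ f, whereas two haploid genomes differ at θ sites on average. The short sketch below (function name hypothetical) evaluates the resulting ratio S_f/S_2 = ln[(1 − f)/f] and reproduces the conversion factors of 4.6, 2.94 and 2.2.

```python
import math

def common_to_cross_ratio(f):
    """S_f / S_2 under the standard neutral model: ln((1 - f) / f)."""
    if not 0.0 < f < 0.5:
        raise ValueError("minor allele frequency cutoff f must lie in (0, 0.5)")
    return math.log((1.0 - f) / f)

for f in (0.01, 0.05, 0.1):
    print(f"f = {f}: S_f/S_2 = {common_to_cross_ratio(f):.2f}")
# f = 0.01: S_f/S_2 = 4.60
# f = 0.05: S_f/S_2 = 2.94
# f = 0.1: S_f/S_2 = 2.20
```

Because the ratio grows only logarithmically as f shrinks, lowering the frequency cutoff raises the conversion factor slowly under neutrality; a deleterious, rare-skewed spectrum is what would push it higher.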
Several approaches can give a rough estimate of the relative mutational target size. The yeast genome is 12 megabases, whereas the coding regions of the human genome alone are ∼30 megabases, and at least the same amount of non-coding sequence is expected to be functional on the basis of selective constraint47. If we assume that the number of loci involved in a trait is proportional to the number of bases at which mutations have functional consequences, human traits should involve at least five times as many loci as yeast traits (at least 60 megabases versus 12 megabases). Similarly, yeast has ∼5,700 genes compared with ∼20,000 in humans, which gives a conversion factor of ∼3.5.
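The two target-size conversion factors are simple ratios. The sketch below (hypothetical function name, default values taken from the text) makes the arithmetic explicit, including the assumption that functional non-coding sequence is at least as large as the ∼30 megabases of coding sequence.

```python
def target_size_factors(yeast_genome_mb=12.0, human_coding_mb=30.0,
                        yeast_genes=5700, human_genes=20000):
    # Assumption from the text: functional non-coding sequence is at least as
    # large as coding sequence, so human functional sequence >= 2 * coding.
    human_functional_mb = 2 * human_coding_mb
    by_sequence = human_functional_mb / yeast_genome_mb  # lower bound
    by_genes = human_genes / yeast_genes
    return by_sequence, by_genes

seq_factor, gene_factor = target_size_factors()
print(f"{seq_factor:.1f} {gene_factor:.1f}")  # 5.0 3.5
```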
Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009)
Hill, W. G. Understanding and using quantitative genetic variation. Phil. Trans. R. Soc. Lond. B 365, 73–85 (2010)
Mackay, T. F. C., Stone, E. A. & Ayroles, J. F. The genetics of quantitative traits: challenges and prospects. Nature Rev. Genet. 10, 565–577 (2009)
Buckler, E. S. et al. The genetic architecture of maize flowering time. Science 325, 714–718 (2009)
Atwell, S. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627–631 (2010)
Mackay, T. F. et al. The Drosophila melanogaster Genetic Reference Panel. Nature 482, 173–178 (2012)
Aylor, D. L. et al. Genetic analysis of complex traits in the emerging Collaborative Cross. Genome Res. 21, 1213–1222 (2011)
Rockman, M. V. The QTN program and the alleles that matter for evolution: all that’s gold does not glitter. Evolution 66, 1–17 (2012)
Goldstein, D. B. Common genetic variation and human traits. N. Engl. J. Med. 360, 1696–1698 (2009)
Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nature Genet. 42, 565–569 (2010)
Visscher, P. M., Brown, M. A., McCarthy, M. I. & Yang, J. Five years of GWAS discovery. Am. J. Hum. Genet. 90, 7–24 (2012)
Pritchard, J. K. Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69, 124–137 (2001)
Zuk, O., Hechter, E., Sunyaev, S. R. & Lander, E. S. The mystery of missing heritability: genetic interactions create phantom heritability. Proc. Natl Acad. Sci. USA 109, 1193–1198 (2012)
Falconer, D. S. & Mackay, T. F. C. Introduction to Quantitative Genetics edn 4 (Longman, 1996)
Eichler, E. E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nature Rev. Genet. 11, 446–450 (2010)
Ehrenreich, I. M., Gerke, J. P. & Kruglyak, L. Genetic dissection of complex traits in yeast: insights from studies of gene expression and other phenotypes in the BY×RM cross. Cold Spring Harb. Symp. Quant. Biol. 74, 145–153 (2009)
Brem, R. B. & Kruglyak, L. The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proc. Natl Acad. Sci. USA 102, 1572–1577 (2005)
Ehrenreich, I. M. et al. Dissection of genetically complex traits with extremely large pools of yeast segregants. Nature 464, 1039–1042 (2010)
Ruderfer, D. M., Pratt, S. C., Seidel, H. S. & Kruglyak, L. Population genomic analysis of outcrossing and recombination in yeast. Nature Genet. 38, 1077–1081 (2006)
Visscher, P. M. et al. Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genet. 2, e41 (2006)
Orr, H. A. Adaptation and the cost of complexity. Evolution 54, 13–20 (2000)
Storey, J. D., Akey, J. M. & Kruglyak, L. Multiple locus linkage analysis of genomewide expression in yeast. PLoS Biol. 3, e267 (2005)
Brem, R. B., Storey, J. D., Whittle, J. & Kruglyak, L. Genetic interactions between polymorphisms that affect gene expression in yeast. Nature 436, 701–703 (2005)
Dowell, R. D. et al. Genotype to phenotype: a complex problem. Science 328, 469 (2010)
Hill, W. G., Goddard, M. E. & Visscher, P. M. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet. 4, e1000008 (2008)
Maller, J. et al. Common variation in three genes, including a noncoding variant in CFH, strongly influences risk of age-related macular degeneration. Nature Genet. 38, 1055–1059 (2006)
Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010)
Tennessen, J. A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012)
Keinan, A. & Clark, A. G. Recent explosive human population growth has resulted in an excess of rare genetic variants. Science 336, 740–743 (2012)
Nelson, M. R. et al. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 337, 100–104 (2012)
Amberg, D. C., Burke, D. & Strathern, J. N. Methods in Yeast Genetics: a Cold Spring Harbor Laboratory Course Manual (Cold Spring Harbor Laboratory Press, 2005)
R Development Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2012)
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009)
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009)
Broman, K. W., Wu, H., Sen, S. & Churchill, G. A. R/qtl: QTL mapping in experimental crosses. Bioinformatics 19, 889–890 (2003)
Abramoff, M. D., Magalhaes, P. J. & Ram, S. J. Image Processing with ImageJ. Biophotonics International 11, 36–42 (2004)
Pau, G., Fuchs, F., Sklyar, O., Boutros, M. & Huber, W. EBImage—an R package for image processing with applications to cellular phenotypes. Bioinformatics 26, 979–981 (2010)
Loader, C. locfit: Local Regression, Likelihood and Density Estimation http://CRAN.R-project.org/package=locfit (2012)
Bates, D., Maechler, M. & Bolker, B. lme4: Linear Mixed-Effects Models Using S4 Classes http://CRAN.R-project.org/package=lme4 (2011)
Lee, S. H., Wray, N. R., Goddard, M. E. & Visscher, P. M. Estimating missing heritability for disease from genome-wide association studies. Am. J. Hum. Genet. 88, 294–305 (2011)
Endelman, J. B. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4, 250–255 (2011)
Visscher, P. M. Variation of estimates of SNP and haplotype diversity and linkage disequilibrium in samples from the same population due to experimental and evolutionary sample size. Ann. Hum. Genet. 71, 119–126 (2007)
Lynch, M. & Walsh, B. Genetics and Analysis of Quantitative Traits edn 1 (Sinauer Associates, 1998)
Chen, L. & Storey, J. D. Relaxed significance criteria for linkage analysis. Genetics 173, 2371–2381 (2006)
Doerge, R. W. & Churchill, G. A. Permutation tests for multiple loci affecting a quantitative character. Genetics 142, 285–294 (1996)
Kruglyak, L. & Nickerson, D. A. Variation is the spice of life. Nature Genet. 27, 234–236 (2001)
Birney, E. et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816 (2007)
We thank D. Botstein, M. McClean, E. Andersen, F. Albert, S. Treusch, R. Ghosh and X. Wang for comments on the manuscript, Y. Jia and S. Schrader for technical assistance and E. Lander for discussions. This work was supported by National Institutes of Health (NIH) grants R37 MH59520 and R01 GM102308, a James S. McDonnell Centennial Fellowship, and the Howard Hughes Medical Institute (L.K.), a National Science Foundation (NSF) fellowship (J.S.B.), NIH postdoctoral fellowship F32 HG51762 (I.M.E.) and NIH grant P50 GM071508 to the Center for Quantitative Biology at the Lewis-Sigler Institute of Princeton University.
The authors declare no competing financial interests.
This file contains Supplementary Figures 1-4 and a link to the additional Supplementary Data and Code. (PDF 536 kb)
This table contains drug doses, heritability statistics, and QTL summary statistics for traits investigated in this study. (XLS 36 kb)
This table shows the additive genetic variance, partitioned by chromosome, for each trait. (XLS 34 kb)
This is a table of detected QTL. Positions, effect sizes, confidence intervals and genes underneath detected QTL for each trait are listed. (XLS 522 kb)
Cite this article
Bloom, J., Ehrenreich, I., Loo, W. et al. Finding the sources of missing heritability in a yeast cross. Nature 494, 234–237 (2013). https://doi.org/10.1038/nature11867