Dysregulation of expression correlates with rare-allele burden and fitness loss in maize

Abstract

Here we report a multi-tissue gene expression resource that represents the genotypic and phenotypic diversity of modern inbred maize, and includes transcriptomes in an average of 255 lines in seven tissues. We mapped expression quantitative trait loci and characterized the contribution of rare genetic variants to extremes in gene expression. Some of the new mutations that arise in the maize genome can be deleterious; although selection acts to keep deleterious variants rare, their complete removal is impeded by genetic linkage to favourable loci and by finite population size (refs 1–4). Modern maize breeders have systematically reduced the effects of this constant mutational pressure through artificial selection and self-fertilization, which have exposed rare recessive variants in elite inbred lines (ref. 5). However, the ongoing effect of these rare alleles on modern inbred maize is unknown. By analysing this gene expression resource and exploiting the extreme diversity and rapid linkage disequilibrium decay of maize (ref. 6), we characterize the effect of rare alleles and evolutionary history on the regulation of expression. Rare alleles are associated with the dysregulation of expression, and we correlate this dysregulation to seed-weight fitness. We find enrichment of ancestral rare variants among expression quantitative trait loci mapped in modern inbred lines, which suggests that historic bottlenecks have shaped regulation. Our results suggest that one path for further genetic improvement in agricultural species lies in purging the rare deleterious variants that have been associated with crop fitness.

Figure 1: The abundance of local rare alleles correlates with extremes in expression.
Figure 2: Ancestral rare alleles are significantly enriched for highly explanatory cis eQTL in modern germplasm.
Figure 3: Dysregulation of expression can predict fitness.

Accession codes

Primary accessions

BioProject

Sequence Read Archive

References

  1. Kimura, M., Maruyama, T. & Crow, J. F. The mutation load in small populations. Genetics 48, 1303–1312 (1963)

  2. Marth, G. T. et al. The functional spectrum of low-frequency coding variation. Genome Biol. 12, R84 (2011)

  3. Henn, B. M., Botigué, L. R., Bustamante, C. D., Clark, A. G. & Gravel, S. Estimating the mutation load in human genomes. Nat. Rev. Genet. 16, 333–343 (2015)

  4. Gibson, G. Rare and common variants: twenty arguments. Nat. Rev. Genet. 13, 135–145 (2012)

  5. Troyer, A. F. A retrospective view of corn genetic resources. J. Hered. 81, 17–24 (1990)

  6. Remington, D. L. et al. Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proc. Natl Acad. Sci. USA 98, 11479–11484 (2001)

  7. Kono, T. J. Y. et al. The role of deleterious substitutions in crop genomes. Mol. Biol. Evol. 33, 2307–2317 (2016)

  8. Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009)

  9. Li, X. et al. Transcriptome sequencing of a large human family identifies the impact of rare noncoding variants. Am. J. Hum. Genet. 95, 245–256 (2014)

  10. Zhao, J. et al. A burden of rare variants associated with extremes of gene expression in human peripheral blood. Am. J. Hum. Genet. 98, 299–309 (2016)

  11. Jiao, Y. et al. Genome-wide genetic changes during modern breeding of maize. Nat. Genet. 44, 812–815 (2012)

  12. Gore, M. A. et al. A first-generation haplotype map of maize. Science 326, 1115–1117 (2009)

  13. Tenaillon, M. I. et al. Patterns of DNA sequence polymorphism along chromosome 1 of maize (Zea mays ssp. mays L.). Proc. Natl Acad. Sci. USA 98, 9161–9166 (2001)

  14. Vigouroux, Y. et al. Rate and pattern of mutation at microsatellite loci in maize. Mol. Biol. Evol. 19, 1251–1260 (2002)

  15. Beissinger, T. M. et al. Recent demography drives changes in linked selection across the maize genome. Nat. Plants 2, 16084 (2016)

  16. Duvick, D. N. The contribution of breeding to yield advances in maize (Zea mays L.). Adv. Agron. 86, 83–145 (2005)

  17. Troyer, A. F. & Wellin, E. J. Heterosis decreasing in hybrids: yield test inbreds. Crop Sci. 49, 1969–1976 (2009)

  18. Flint-Garcia, S. A. et al. Maize association population: a high-resolution platform for quantitative trait locus dissection. Plant J. 44, 1054–1064 (2005)

  19. Eveland, A. L., McCarty, D. R. & Koch, K. E. Transcript profiling by 3′-untranslated region sequencing resolves expression of gene families. Plant Physiol. 146, 32–44 (2008)

  20. Lohman, B. K., Weber, J. N. & Bolnick, D. I. Evaluation of TagSeq, a reliable low-cost alternative for RNAseq. Mol. Ecol. Resour. 16, 1315–1321 (2016)

  21. Bukowski, R. et al. Construction of the third generation Zea mays haplotype map. Gigascience https://doi.org/10.1093/gigascience/gix134 (2017)

  22. Romay, M. C. et al. Comprehensive genotyping of the USA national maize inbred seed bank. Genome Biol. 14, R55 (2013)

  23. Yao, H., Dogra Gray, A., Auger, D. L. & Birchler, J. A. Genomic dosage effects on heterosis in triploid maize. Proc. Natl Acad. Sci. USA 110, 2665–2669 (2013)

  24. Josephs, E. B., Lee, Y. W., Stinchcombe, J. R. & Wright, S. I. Association mapping reveals the role of purifying selection in the maintenance of genomic variation in gene expression. Proc. Natl Acad. Sci. USA 112, 15390–15395 (2015)

  25. Gout, J.-F., Kahn, D., Duret, L. & Paramecium Post-Genomics Consortium. The relationship among gene expression, the evolution of gene dosage, and the rate of protein evolution. PLoS Genet. 6, e1000944 (2010)

  26. Hufford, M. B. et al. Comparative population genomics of maize domestication and improvement. Nat. Genet. 44, 808–811 (2012)

  27. Hung, H.-Y. et al. The relationship between parental genetic or phenotypic divergence and progeny variation in the maize nested association mapping population. Heredity 108, 490–499 (2012)

  28. Rodgers-Melnick, E. et al. Recombination in diverse maize is stable, predictable, and associated with genetic load. Proc. Natl Acad. Sci. USA 112, 3823–3828 (2015)

  29. Wan, C. Y. & Wilkins, T. A. A modified hot borate method significantly enhances the yield of high-quality RNA from cotton (Gossypium hirsutum L.). Anal. Biochem. 223, 7–12 (1994)

  30. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014)

  31. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013)

  32. Anders, S., Pyl, P. T. & Huber, W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015)

  33. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010)

  34. Money, D. et al. LinkImpute: fast and accurate genotype imputation for nonmodel organisms. G3 5, 2383–2390 (2015)

  35. Bradbury, P. J. et al. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23, 2633–2635 (2007)

  36. Swarts, K. et al. Novel methods to optimize genotypic imputation for low-coverage, next-generation sequence data in crop plants. Plant Genome 7, https://doi.org/10.3835/plantgenome2014.05.0023 (2014)

  37. Ramu, P. et al. Cassava haplotype map highlights fixation of deleterious mutations during clonal propagation. Nat. Genet. 49, 959–963 (2017)

  38. Stegle, O., Parts, L., Durbin, R. & Winn, J. A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLOS Comput. Biol. 6, e1000770 (2010)

  39. Shabalin, A. A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics 28, 1353–1358 (2012)

  40. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010)

  41. Kiesselbach, T. A. The Structure and Reproduction of Corn (Cold Spring Harbor Laboratory, 1999)

Acknowledgements

We thank J. Pardo, J. Wallace, R. Punna, K. Shirasawa and S. Miller for assistance with tissue collection; J. Budka and G. Inzinna for field and greenhouse assistance; R. Bukowski for running the maize HapMap genotyping pipeline; L. Johnson and Z. Miller for database curation; G. Gibson, M. Wolfe, J.-L. Jannink, M. Hufford and J. Ross-Ibarra for discussions; P. Schweitzer, J. Mosher, A. Tate, J. Mattison, M. Magallanes-Lundback, I. Holländer and D. Daujotyte for guidance on RNA extraction, library preparation automation and sequencing; and S. Miller for copy-editing. This work was supported by the US Department of Agriculture–Agricultural Research Service and the National Science Foundation grants IOS-0922493 and IOS-1238014 to E.S.B. The National Science Foundation Graduate Research Fellowship Program grant DGE-1650441 and the Section of Plant Breeding and Genetics at Cornell University provided support to K.A.G.K. The Taiwanese Ministry of Science and Technology Overseas Project for Post Graduate Research grant 104-2917-I-564-015 supported S.-Y.C.

Author information

Contributions

K.A.G.K. and E.S.B. designed the experiments and wrote the manuscript. K.A.G.K. performed the analyses and made the RNA-seq libraries. K.A.G.K., S.-Y.C. and M.-H.S. extracted RNA. N.K.L. managed germplasm and plants with K.A.G.K. M.C.R., K.L.S. and A.L. produced and imputed HapMap genotypic data. P.J.B. implemented matrixEQTL in Java/TASSEL. F.L. implemented SNP calling from RNA-seq data.

Corresponding authors

Correspondence to Karl A. G. Kremling or Edward S. Buckler.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Reviewer Information: Nature thanks N. Springer and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Figure 1 Tissues that were expression profiled by 3′ RNA-seq.

See additional details regarding tissue collection in Methods. Illustrations inspired by ref. 41.

Extended Data Figure 2 Higher numbers of rare alleles are upstream of genes in extreme-expressing individuals, for the most highly expressed genes.

Quadratic regression of the expression rank of each line, for each of the top 5,000 most-expressed genes versus the average local (5-kb upstream) rare-allele count. a, Base of leaf three (n = 263 unique inbred samples). b, Tip of leaf three (n = 265 unique inbred samples). c, Adult leaves collected during the day (n = 204 unique inbred samples). d, Adult leaves collected at night (n = 260 unique inbred samples). e, Kernels at 350-growing-degree days (n = 229 unique inbred samples). f, Roots of germinating seedling (n = 273 unique inbred samples). g, Shoots of germinating seedling (n = 278 unique inbred samples).
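
The regression described in this legend can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names (`rank_lines`, `fit_quadratic`) and the toy numbers are hypothetical, and the sketch assumes that the rare-allele counts have already been averaged across genes at each expression rank. Lines are ranked by expression, and a quadratic is fitted to rank versus mean upstream rare-allele count; a positive quadratic coefficient indicates a U shape, that is, more rare alleles at both expression extremes.

```python
def rank_lines(expression):
    """Return the 1-based expression rank of each line for one gene."""
    order = sorted(range(len(expression)), key=lambda i: expression[i])
    ranks = [0] * len(expression)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_quadratic(x, y):
    """Least-squares fit of y = c0 + c1*x + c2*x^2; returns (c0, c1, c2)."""
    s = [sum(xi ** k for xi in x) for k in range(5)]            # sums of x^k
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    normal = [[s[i + j] for j in range(3)] for i in range(3)]   # normal equations
    return tuple(solve_3x3(normal, t))


# Toy example (hypothetical values): mean upstream rare-allele count at ranks 1..9.
ranks = list(range(1, 10))
mean_rare_alleles = [0.5 * (r - 5) ** 2 + 2 for r in ranks]
c0, c1, c2 = fit_quadratic(ranks, mean_rare_alleles)
print(f"quadratic coefficient: {c2:.3f}")
```

In practice a numerical library would replace the hand-written solver; it is spelled out here only to keep the sketch self-contained.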

Extended Data Figure 3 Higher numbers of rare alleles are upstream of genes in extreme-expressing individuals, for the medium-expressed genes.

Quadratic regression of the expression rank of each line, for each of the top 5,001–10,000 most-expressed genes versus the average local (5-kb upstream) rare-allele count. a, Base of leaf three (n = 263 unique inbred samples). b, Tip of leaf three (n = 265 unique inbred samples). c, Adult leaves collected during the day (n = 204 unique inbred samples). d, Adult leaves collected at night (n = 260 unique inbred samples). e, Kernels at 350-growing-degree days (n = 229 unique inbred samples). f, Roots of germinating seedling (n = 273 unique inbred samples). g, Shoots of germinating seedling (n = 278 unique inbred samples).

Extended Data Figure 4 Comparison of the number of rare cis alleles near genes with differing expression levels.

The 10,000 most-expressed genes in each tissue are divided into groups of 1,000 on the basis of expression level. Plots in each panel show genes ranked 1–1,000, 1,001–2,000, …, 9,001–10,000 from left to right. Each of the individuals represented in each tissue is ranked for expression for each of the 1,000 genes in each group. Individuals in the bottom five expression ranks (fuchsia) versus the middle two quartiles (yellow) versus the top five expression ranks (blue) (mean ± s.e.m.). Y axes refer to mean upstream (within 5 kb) rare-allele count. a, Roots of germinating seedling (n = 273 unique inbred samples). b, Shoots of germinating seedling (n = 278 unique inbred samples). c, Kernels at 350-growing-degree days (n = 229 unique inbred samples). d, Base of leaf three (n = 263 unique inbred samples). e, Tip of leaf three (n = 265 unique inbred samples). f, Adult leaves collected during the day (n = 204 unique inbred samples). g, Adult leaves collected at night (n = 260 unique inbred samples).
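
The grouping in this legend can be illustrated with a short sketch. It is not the authors' code: `summarize_rank_groups` and the toy values are hypothetical, and the "middle two quartiles" are assumed here to be the lines whose rank position falls between the first and last quarter of the ranking. For one gene, lines are sorted by expression, split into the bottom five, the middle half and the top five, and the mean and s.e.m. of the upstream rare-allele count are reported for each group.

```python
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean (sample standard deviation / sqrt(n))."""
    return stdev(values) / len(values) ** 0.5


def summarize_rank_groups(expression, rare_counts, n_extreme=5):
    """Group lines by expression rank for one gene and summarize rare-allele counts.

    Returns {group: (mean, sem)} for the bottom `n_extreme` ranks, the middle
    two quartiles, and the top `n_extreme` ranks.
    """
    order = sorted(range(len(expression)), key=lambda i: expression[i])
    counts = [rare_counts[i] for i in order]  # rare-allele counts in rank order
    n = len(counts)
    groups = {
        "bottom": counts[:n_extreme],
        "middle": counts[n // 4 : n - n // 4],  # assumed definition of the middle half
        "top": counts[-n_extreme:],
    }
    return {name: (mean(vals), sem(vals)) for name, vals in groups.items()}


# Toy example (hypothetical values): 20 lines, more rare alleles at both extremes.
toy_expression = list(range(20))
toy_rare_counts = [6, 5, 6, 5, 6] + [2, 1] * 5 + [7, 6, 7, 6, 7]
summary = summarize_rank_groups(toy_expression, toy_rare_counts)
for group, (group_mean, group_sem) in summary.items():
    print(f"{group}: {group_mean:.2f} ± {group_sem:.2f}")
```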

Extended Data Figure 5 eQTL R² distribution comparisons between SNPs in 0.0–0.1 (tropical MAF) and 0.1–0.2 (RNA-set MAF) versus 0.1–0.2 (RNA-set and tropical MAF).

a, Adult leaves collected at night (n = 260 unique inbred samples). b, Adult leaves collected during the day (n = 204 unique inbred samples). c, Tip of leaf three (n = 265 unique inbred samples). d, Base of leaf three (n = 263 unique inbred samples). e, Kernels at 350-growing-degree days (n = 229 unique inbred samples). f, Shoots of germinating seedling (n = 278 unique inbred samples). g, Roots of germinating seedling (n = 273 unique inbred samples). All pairs of distributions within each tissue are significantly different (P < 2.2 × 10⁻¹⁶, two-sided Wilcoxon signed-rank test and Kolmogorov–Smirnov test).

Extended Data Figure 6 eQTL R² distribution comparisons between SNPs in 0.0–0.1 (tropical MAF) and 0.4–0.5 (RNA-set MAF) versus 0.4–0.5 (RNA-set and tropical MAF).

a, Adult leaves collected at night (n = 260 unique inbred samples). b, Adult leaves collected during the day (n = 204 unique inbred samples). c, Tip of leaf three (n = 265 unique inbred samples). d, Base of leaf three (n = 263 unique inbred samples). e, Kernels at 350-growing-degree days (n = 229 unique inbred samples). f, Shoots of germinating seedling (n = 278 unique inbred samples). g, Roots of germinating seedling (n = 273 unique inbred samples). All pairs of distributions within each tissue are significantly different (P < 2.2 × 10⁻¹⁶, two-sided Wilcoxon signed-rank test and Kolmogorov–Smirnov test).

Extended Data Figure 7 Expression value and dysregulation of 5,000 most-expressed genes are both predictive of fitness.

Orange boxes represent correlations between predicted and true seed weight when using expression values. Yellow boxes represent correlations between predicted and true seed weight when using absolute deviation in expression from the population mean. Range of correlations between predicted and true seed weight is displayed from ten repetitions of nested tenfold cross validation (ten inner and ten outer) using ridge regression. In the box plots, the middle horizontal lines represent the median, hinges represent the 25th and 75th percentiles (the interquartile range), the upper and lower whiskers extend to maximum and minimum points no more than 1.5× interquartile range beyond the hinges, and individual dots are outliers beyond the whiskers. Sample sizes: 2-cm root tips of germinating seedlings (unique n = 181) and whole shoots of germinating seedlings (unique n = 183); the 2-cm base (unique n = 181) and tip (unique n = 182) of leaf 3; leaves collected in the field during the day (unique n = 135) and night (unique n = 187); and 350-growing-degree-day kernels (unique n = 171), post sexual maturity (anthesis).

Extended Data Figure 8 Cumulative expression dysregulation of the 5,000 most-expressed genes in each tissue versus seed weight.

a, Adult leaves collected at night (n = 221 unique inbred samples). b, Adult leaves collected during the day (n = 171 unique inbred samples). c, Tip of leaf three (n = 226 unique inbred samples). d, Base of leaf three (n = 224 unique inbred samples). e, Kernels at 350-growing-degree days (n = 195 unique inbred samples). f, Shoots of germinating seedling (n = 235 unique inbred samples). g, Roots of germinating seedling (n = 226 unique inbred samples). Regression statistics in Extended Data Table 1. Sweet corn and popcorn lines were excluded from these regressions.
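
A rough sketch of the kind of regression shown in these panels follows. It is an illustration under stated assumptions rather than the authors' method: dysregulation is taken, as in the Extended Data Figure 7 legend, to be the absolute deviation of expression from the population mean, and the cumulative value is assumed here to be the simple sum of those deviations across genes. The names `cumulative_dysregulation` and `linear_fit` and all numbers are hypothetical.

```python
from statistics import mean


def cumulative_dysregulation(expression):
    """Sum, per line, of |expression - population mean| across genes.

    `expression` is a list of genes, each a list of expression values per line.
    Summing across genes is an assumption made for this sketch.
    """
    n_lines = len(expression[0])
    totals = [0.0] * n_lines
    for gene in expression:
        gene_mean = mean(gene)
        for i, value in enumerate(gene):
            totals[i] += abs(value - gene_mean)
    return totals


def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Toy data: 2 genes x 4 lines, plus a hypothetical seed weight per line.
expression = [
    [10.0, 12.0, 8.0, 10.0],  # gene 1, population mean 10
    [5.0, 5.0, 1.0, 9.0],     # gene 2, population mean 5
]
seed_weight = [0.30, 0.26, 0.18, 0.22]

dysregulation = cumulative_dysregulation(expression)
intercept, slope = linear_fit(dysregulation, seed_weight)
print(dysregulation, slope)
```

In this toy example the fitted slope is negative, which mirrors the direction of the relationship named in the article title: greater dysregulation accompanies lower seed weight.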

Extended Data Figure 9 Mean upstream rare-allele count from the 5,000 most highly expressed genes versus seed weight.

a, Adult leaves collected at night (n = 221 unique inbred samples). b, Adult leaves collected during the day (n = 171 unique inbred samples). c, Tip of leaf three (n = 226 unique inbred samples). d, Base of leaf three (n = 224 unique inbred samples). e, Kernels at 350-growing-degree days (n = 195 unique inbred samples). f, Shoots of germinating seedling (n = 235 unique inbred samples). g, Roots of germinating seedling (n = 226 unique inbred samples).

Extended Data Table 1 Regression statistics for cumulative expression dysregulation in each tissue against seed-weight fitness

Supplementary information

Life Sciences Reporting Summary (PDF 72 kb)

Supplementary Table 1

This table contains collection details for all sampled genotypes. Sequencing batch, tissue of origin, RNA-seq depth and subpopulation membership are specified for each sample. (XLS 445 kb)

Cite this article

Kremling, K., Chen, SY., Su, MH. et al. Dysregulation of expression correlates with rare-allele burden and fitness loss in maize. Nature 555, 520–523 (2018). https://doi.org/10.1038/nature25966

  • DOI: https://doi.org/10.1038/nature25966
