Dysregulation of expression correlates with rare-allele burden and fitness loss in maize

Kremling, Karl A. G.; Chen, Shu-Yun; Su, Mei-Hsiu; Lepak, Nicholas K.; Romay, M. Cinta; Swarts, Kelly L.; Lu, Fei; Lorant, Anne; Bradbury, Peter J.; Buckler, Edward S.

doi:10.1038/nature25966

Letter
Published: 14 March 2018

Nature volume 555, pages 520–523 (2018)

Abstract

Here we report a multi-tissue gene expression resource that represents the genotypic and phenotypic diversity of modern inbred maize, and includes transcriptomes in an average of 255 lines in seven tissues. We mapped expression quantitative trait loci and characterized the contribution of rare genetic variants to extremes in gene expression. Some of the new mutations that arise in the maize genome can be deleterious; although selection acts to keep deleterious variants rare, their complete removal is impeded by genetic linkage to favourable loci and by finite population size^1,2,3,4. Modern maize breeders have systematically reduced the effects of this constant mutational pressure through artificial selection and self-fertilization, which have exposed rare recessive variants in elite inbred lines⁵. However, the ongoing effect of these rare alleles on modern inbred maize is unknown. By analysing this gene expression resource and exploiting the extreme diversity and rapid linkage disequilibrium decay of maize⁶, we characterize the effect of rare alleles and evolutionary history on the regulation of expression. Rare alleles are associated with the dysregulation of expression, and we correlate this dysregulation to seed-weight fitness. We find enrichment of ancestral rare variants among expression quantitative trait loci mapped in modern inbred lines, which suggests that historic bottlenecks have shaped regulation. Our results suggest that one path for further genetic improvement in agricultural species lies in purging the rare deleterious variants that have been associated with crop fitness.

**Figure 1: The abundance of local rare alleles correlates with extremes in expression.**

**Figure 2: Ancestral rare alleles are significantly enriched for highly explanatory *cis* eQTL in modern germplasm.**

**Figure 3: Dysregulation of expression can predict fitness.**

Accession codes

Primary accessions

BioProject: PRJNA383416
Sequence Read Archive: SRP115041

References

1. Kimura, M., Maruyama, T. & Crow, J. F. The mutation load in small populations. Genetics 48, 1303–1312 (1963)
2. Marth, G. T. et al. The functional spectrum of low-frequency coding variation. Genome Biol. 12, R84 (2011)
3. Henn, B. M., Botigué, L. R., Bustamante, C. D., Clark, A. G. & Gravel, S. Estimating the mutation load in human genomes. Nat. Rev. Genet. 16, 333–343 (2015)
4. Gibson, G. Rare and common variants: twenty arguments. Nat. Rev. Genet. 13, 135–145 (2012)
5. Troyer, A. F. A retrospective view of corn genetic resources. J. Hered. 81, 17–24 (1990)
6. Remington, D. L. et al. Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proc. Natl Acad. Sci. USA 98, 11479–11484 (2001)
7. Kono, T. J. Y. et al. The role of deleterious substitutions in crop genomes. Mol. Biol. Evol. 33, 2307–2317 (2016)
8. Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009)
9. Li, X. et al. Transcriptome sequencing of a large human family identifies the impact of rare noncoding variants. Am. J. Hum. Genet. 95, 245–256 (2014)
10. Zhao, J. et al. A burden of rare variants associated with extremes of gene expression in human peripheral blood. Am. J. Hum. Genet. 98, 299–309 (2016)
11. Jiao, Y. et al. Genome-wide genetic changes during modern breeding of maize. Nat. Genet. 44, 812–815 (2012)
12. Gore, M. A. et al. A first-generation haplotype map of maize. Science 326, 1115–1117 (2009)
13. Tenaillon, M. I. et al. Patterns of DNA sequence polymorphism along chromosome 1 of maize (Zea mays ssp. mays L.). Proc. Natl Acad. Sci. USA 98, 9161–9166 (2001)
14. Vigouroux, Y. et al. Rate and pattern of mutation at microsatellite loci in maize. Mol. Biol. Evol. 19, 1251–1260 (2002)
15. Beissinger, T. M. et al. Recent demography drives changes in linked selection across the maize genome. Nat. Plants 2, 16084 (2016)
16. Duvick, D. N. The contribution of breeding to yield advances in maize (Zea mays L.). Adv. Agron. 86, 83–145 (2005)
17. Troyer, A. F. & Wellin, E. J. Heterosis decreasing in hybrids: yield test inbreds. Crop Sci. 49, 1969–1976 (2009)
18. Flint-Garcia, S. A. et al. Maize association population: a high-resolution platform for quantitative trait locus dissection. Plant J. 44, 1054–1064 (2005)
19. Eveland, A. L., McCarty, D. R. & Koch, K. E. Transcript profiling by 3′-untranslated region sequencing resolves expression of gene families. Plant Physiol. 146, 32–44 (2008)
20. Lohman, B. K., Weber, J. N. & Bolnick, D. I. Evaluation of TagSeq, a reliable low-cost alternative for RNAseq. Mol. Ecol. Resour. 16, 1315–1321 (2016)
21. Bukowski, R. et al. Construction of the third generation Zea mays haplotype map. Gigascience https://doi.org/10.1093/gigascience/gix134 (2017)
22. Romay, M. C. et al. Comprehensive genotyping of the USA national maize inbred seed bank. Genome Biol. 14, R55 (2013)
23. Yao, H., Dogra Gray, A., Auger, D. L. & Birchler, J. A. Genomic dosage effects on heterosis in triploid maize. Proc. Natl Acad. Sci. USA 110, 2665–2669 (2013)
24. Josephs, E. B., Lee, Y. W., Stinchcombe, J. R. & Wright, S. I. Association mapping reveals the role of purifying selection in the maintenance of genomic variation in gene expression. Proc. Natl Acad. Sci. USA 112, 15390–15395 (2015)
25. Gout, J.-F., Kahn, D., Duret, L. & Paramecium Post-Genomics Consortium. The relationship among gene expression, the evolution of gene dosage, and the rate of protein evolution. PLoS Genet. 6, e1000944 (2010)
26. Hufford, M. B. et al. Comparative population genomics of maize domestication and improvement. Nat. Genet. 44, 808–811 (2012)
27. Hung, H.-Y. et al. The relationship between parental genetic or phenotypic divergence and progeny variation in the maize nested association mapping population. Heredity 108, 490–499 (2012)
28. Rodgers-Melnick, E. et al. Recombination in diverse maize is stable, predictable, and associated with genetic load. Proc. Natl Acad. Sci. USA 112, 3823–3828 (2015)
29. Wan, C. Y. & Wilkins, T. A. A modified hot borate method significantly enhances the yield of high-quality RNA from cotton (Gossypium hirsutum L.). Anal. Biochem. 223, 7–12 (1994)
30. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014)
31. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013)
32. Anders, S., Pyl, P. T. & Huber, W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015)
33. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010)
34. Money, D. et al. LinkImpute: fast and accurate genotype imputation for nonmodel organisms. G3 5, 2383–2390 (2015)
35. Bradbury, P. J. et al. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23, 2633–2635 (2007)
36. Swarts, K. et al. Novel methods to optimize genotypic imputation for low-coverage, next-generation sequence data in crop plants. Plant Genome 7, https://doi.org/10.3835/plantgenome2014.05.0023 (2014)
37. Ramu, P. et al. Cassava haplotype map highlights fixation of deleterious mutations during clonal propagation. Nat. Genet. 49, 959–963 (2017)
38. Stegle, O., Parts, L., Durbin, R. & Winn, J. A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLOS Comput. Biol. 6, e1000770 (2010)
39. Shabalin, A. A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics 28, 1353–1358 (2012)
40. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010)
41. Kisselbach, T. A. The Structure and Reproduction of Corn (Cold Spring Harbor Laboratory, 1999)

Acknowledgements

We thank J. Pardo, J. Wallace, R. Punna, K. Shirasawa and S. Miller for assistance with tissue collection; J. Budka and G. Inzinna for field and greenhouse assistance; R. Bukowski for running the maize HapMap genotyping pipeline; L. Johnson and Z. Miller for database curation; G. Gibson, M. Wolfe, J.-L. Jannink, M. Hufford and J. Ross-Ibarra for discussions; P. Schweitzer, J. Mosher, A. Tate, J. Mattison, M. Magallanes-Lundback, I. Holländer and D. Daujotyte for guidance on RNA extraction, library preparation automation and sequencing; and S. Miller for copy-editing. This work was supported by the US Department of Agriculture–Agricultural Research Service and the National Science Foundation grants IOS-0922493 and IOS-1238014 to E.S.B. The National Science Foundation Graduate Research Fellowship Program grant DGE-1650441 and the Section of Plant Breeding and Genetics at Cornell University provided support to K.A.G.K. The Taiwanese Ministry of Science and Technology Overseas Project for Post Graduate Research grant 104-2917-I-564-015 supported S.-Y.C.

Author information

Authors and Affiliations

Section of Plant Breeding and Genetics, 175 Biotechnology Building, Cornell University, Ithaca, 14853, New York, USA
Karl A. G. Kremling, Kelly L. Swarts & Edward S. Buckler
Institute for Genomic Diversity, 175 Biotechnology Building, Cornell University, Ithaca, 14853, New York, USA
Shu-Yun Chen, Mei-Hsiu Su, M. Cinta Romay, Fei Lu & Edward S. Buckler
Institute of Plant and Microbial Biology, Academia Sinica 128, Sec 2nd, Academia road, Taipei, 11529, Taiwan
Shu-Yun Chen
USDA-ARS, R. W. Holley Center, Cornell University, Ithaca, 14853, New York, USA
Nicholas K. Lepak, Peter J. Bradbury & Edward S. Buckler
Department of Molecular Biology, Research Group for Ancient Genomics and Evolution, Max Planck Institute for Developmental Biology, Spemannstr. 35, Tübingen, 72076, Germany
Kelly L. Swarts
The State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
Fei Lu
Department of Plant Sciences, University of California Davis, Davis, 95616, California, USA
Anne Lorant

Contributions

K.A.G.K. and E.S.B. designed the experiments and wrote the manuscript. K.A.G.K. performed the analyses and made the RNA-seq libraries. K.A.G.K., S.-Y.C. and M.-H.S. extracted RNA. N.K.L. managed germplasm and plants with K.A.G.K.; M.C.R., K.L.S. and A.L. produced and imputed HapMap genotypic data. P.J.B. implemented matrixEQTL in Java/TASSEL. F.L. implemented SNP calling from RNA-seq data.

Corresponding authors

Correspondence to Karl A. G. Kremling or Edward S. Buckler.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Reviewer Information: Nature thanks N. Springer and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Figure 1 Tissues that were expression profiled by 3′ RNA-seq.

See additional details regarding tissue collection in Methods. Illustrations inspired by ref. 41.

Extended Data Figure 2 Higher numbers of rare alleles are upstream of genes in extreme-expressing individuals, for the most highly expressed genes.

Quadratic regression of the expression rank of each line, for each of the top 5,000 most-expressed genes versus the average local (5-kb upstream) rare-allele count. a, Base of leaf three (n = 263 unique inbred samples). b, Tip of leaf three (n = 265 unique inbred samples). c, Adult leaves collected during the day (n = 204 unique inbred samples). d, Adult leaves collected at night (n = 260 unique inbred samples). e, Kernels at 350-growing-degree days (n = 229 unique inbred samples). f, Roots of germinating seedling (n = 273 unique inbred samples). g, Shoots of germinating seedling (n = 278 unique inbred samples).
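
The quadratic regression named in this legend can be illustrated with a minimal sketch. It assumes, for illustration only, that the centred expression rank is the predictor and the mean upstream rare-allele count is the response. The function name `fit_quadratic` and the toy values are hypothetical and are not taken from the paper.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Power sums that make up the 3x3 normal equations (X'X) coef = X'y.
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    # Augmented matrix [X'X | X'y].
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


# Toy data (hypothetical): centred expression ranks and rare-allele counts
# that are higher at both extremes of expression.
ranks = [-2, -1, 0, 1, 2]
counts = [14.0, 5.0, 2.0, 5.0, 14.0]
a, b, c = fit_quadratic(ranks, counts)
print(a, b, c)
```

In this toy case a positive quadratic coefficient `c` corresponds to a U-shaped relationship, that is, more rare alleles upstream of genes in lines at either expression extreme.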

Extended Data Figure 3 Higher numbers of rare alleles are upstream of genes in extreme-expressing individuals, for the medium-expressed genes.

Quadratic regression of the expression rank of each line, for each of the genes ranked 5,001–10,000 by expression, versus the average local (5-kb upstream) rare-allele count. a, Base of leaf three (n = 263 unique inbred samples). b, Tip of leaf three (n = 265 unique inbred samples). c, Adult leaves collected during the day (n = 204 unique inbred samples). d, Adult leaves collected at night (n = 260 unique inbred samples). e, Kernels at 350-growing-degree days (n = 229 unique inbred samples). f, Roots of germinating seedling (n = 273 unique inbred samples). g, Shoots of germinating seedling (n = 278 unique inbred samples).

Extended Data Figure 4 Comparison of the number of rare cis alleles near genes with differing expression levels.

The 10,000 most-expressed genes in each tissue are divided into groups of 1,000 on the basis of expression level. Plots in each panel show genes ranked 1–1,000, 1,001–2,000, …, 9,001–10,000 from left to right. Each of the individuals represented in each tissue is ranked for expression for each of the 1,000 genes in each group. Individuals in the bottom five expression ranks (fuchsia) versus the middle two quartiles (yellow) versus the top five expression ranks (blue) (mean ± s.e.m.). Y axes refer to mean upstream (within 5 kb) rare-allele count. a, Roots of germinating seedling (n = 273 unique inbred samples). b, Shoots of germinating seedling (n = 278 unique inbred samples). c, Kernels at 350-growing-degree days (n = 229 unique inbred samples). d, Base of leaf three (n = 263 unique inbred samples). e, Tip of leaf three (n = 265 unique inbred samples). f, Adult leaves collected during the day (n = 204 unique inbred samples). g, Adult leaves collected at night (n = 260 unique inbred samples).
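
The grouping in this legend can be sketched for a single gene as follows. This is a simplified illustration, not the authors' code: the legend summarizes groups of 1,000 genes, whereas this sketch handles one gene, approximates the middle two quartiles by index, and uses the hypothetical names `rank_group_summary`, `expression` and `rare_counts`.

```python
from math import sqrt
from statistics import mean, stdev


def rank_group_summary(expression, rare_counts, n_extreme=5):
    """Mean and s.e.m. of rare-allele counts for low, middle and high expressers."""
    # Indices of individuals ordered from lowest to highest expression.
    order = sorted(range(len(expression)), key=lambda i: expression[i])
    n = len(order)
    groups = {
        "bottom": order[:n_extreme],            # bottom five expression ranks
        "middle": order[n // 4: n - n // 4],    # approximately the middle two quartiles
        "top": order[n - n_extreme:],           # top five expression ranks
    }
    summary = {}
    for name, idx in groups.items():
        values = [rare_counts[i] for i in idx]
        summary[name] = (mean(values), stdev(values) / sqrt(len(values)))
    return summary


# Toy data (hypothetical): 20 individuals, with more upstream rare alleles
# in the lowest and highest expressers.
expression = [float(i) for i in range(20)]
rare_counts = [5, 4, 5, 4, 5, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 4, 5, 4, 5, 4]
summary = rank_group_summary(expression, rare_counts)
for name, (avg, sem) in summary.items():
    print(name, round(avg, 2), round(sem, 2))
```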

Extended Data Figure 5 eQTL R² distribution comparisons between SNPs in 0.0–0.1 (tropical MAF) and 0.1–0.2 (RNA-set MAF) versus 0.1–0.2 (RNA-set and tropical MAF).

a, Adult leaves collected at night (n = 260 unique inbred samples). b, Adult leaves collected during the day (n = 204 unique inbred samples). c, Tip of leaf three (n = 265 unique inbred samples). d, Base of leaf three (n = 263 unique inbred samples). e, Kernels at 350-growing-degree days (n = 229 unique inbred samples). f, Shoots of germinating seedling (n = 278 unique inbred samples). g, Roots of germinating seedling (n = 273 unique inbred samples). All pairs of distributions within each tissue are significantly different (P < 2.2 × 10⁻¹⁶; two-sided Wilcoxon signed-rank test and Kolmogorov–Smirnov test).

Extended Data Figure 6 eQTL R² distribution comparisons between SNPs in 0.0–0.1 (tropical MAF) and 0.4–0.5 (RNA-set MAF) versus 0.4–0.5 (RNA-set and tropical MAF).

a, Adult leaves collected at night (n = 260 unique inbred samples). b, Adult leaves collected during the day (n = 204 unique inbred samples). c, Tip of leaf three (n = 265 unique inbred samples). d, Base of leaf three (n = 263 unique inbred samples). e, Kernels at 350-growing-degree days (n = 229 unique inbred samples). f, Shoots of germinating seedling (n = 278 unique inbred samples). g, Roots of germinating seedling (n = 273 unique inbred samples). All pairs of distributions within each tissue are significantly different (P < 2.2 × 10⁻¹⁶; two-sided Wilcoxon signed-rank test and Kolmogorov–Smirnov test).

Extended Data Figure 7 Expression value and dysregulation of 5,000 most-expressed genes are both predictive of fitness.

Orange boxes represent correlations between predicted and true seed weight when using expression values. Yellow boxes represent correlations between predicted and true seed weight when using absolute deviation in expression from the population mean. Range of correlations between predicted and true seed weight is displayed from ten repetitions of nested tenfold cross validation (ten inner and ten outer) using ridge regression. In the box plots, the middle horizontal lines represent the median, hinges represent the 25th and 75th percentiles (the interquartile range), the upper and lower whiskers extend to maximum and minimum points no more than 1.5× interquartile range beyond the hinges, and individual dots are outliers beyond the whiskers. Sample sizes: 2-cm root tips of germinating seedlings (unique n = 181) and whole shoots of germinating seedlings (unique n = 183); the 2-cm base (unique n = 181) and tip (unique n = 182) of leaf 3; leaves collected in the field during the day (unique n = 135) and night (unique n = 187); and 350-growing-degree-day kernels (unique n = 171), post sexual maturity (anthesis).
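
The prediction scheme in this legend can be sketched in simplified form. The sketch computes dysregulation as the absolute deviation from the population mean, as the legend states, and then predicts seed weight by ridge regression under plain k-fold cross-validation. It is not the authors' pipeline: the inner cross-validation loop that would tune the penalty is omitted and a fixed penalty `lam` is assumed instead. All function names and the toy data are hypothetical.

```python
from statistics import mean


def abs_deviation(expr):
    """expr[i][g] is the expression of gene g in line i; returns |value - gene mean|."""
    gene_means = [mean(col) for col in zip(*expr)]
    return [[abs(v - m) for v, m in zip(row, gene_means)] for row in expr]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_fit(x, y, lam):
    """Closed-form ridge on centred data; returns (intercept, coefficients)."""
    p = len(x[0])
    xm, ym = [mean(col) for col in zip(*x)], mean(y)
    xc = [[v - m for v, m in zip(row, xm)] for row in x]
    yc = [v - ym for v in y]
    xtx = [[sum(r[i] * r[j] for r in xc) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(xc, yc)) for i in range(p)]
    beta = solve(xtx, xty)
    return ym - sum(b * m for b, m in zip(beta, xm)), beta


def cv_predict(x, y, lam=1.0, k=5):
    """Out-of-fold predictions from k-fold cross-validation (fixed penalty)."""
    preds = [0.0] * len(y)
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        b0, beta = ridge_fit([x[i] for i in train], [y[i] for i in train], lam)
        for i in range(fold, len(y), k):
            preds[i] = b0 + sum(b * v for b, v in zip(beta, x[i]))
    return preds


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    um, vm = mean(u), mean(v)
    num = sum((a - um) * (b - vm) for a, b in zip(u, v))
    den = (sum((a - um) ** 2 for a in u) * sum((b - vm) ** 2 for b in v)) ** 0.5
    return num / den


# Toy data (hypothetical): 20 lines, 2 genes; seed weight falls as dysregulation rises.
expr = [[i - 9.5, (7 * i) % 20 - 9.5] for i in range(20)]
features = abs_deviation(expr)
seed_weight = [300.0 - 10.0 * f[0] - 5.0 * f[1] for f in features]
r = pearson(cv_predict(features, seed_weight), seed_weight)
print(round(r, 3))
```

Replacing `features` with the raw `expr` values would give the analogue of the expression-value predictions that the legend contrasts with the dysregulation predictions.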

Extended Data Figure 8 Cumulative expression dysregulation of the 5,000 most-expressed genes in each tissue versus seed weight.

a, Adult leaves collected at night (n = 221 unique inbred samples). b, Adult leaves collected during the day (n = 171 unique inbred samples). c, Tip of leaf three (n = 226 unique inbred samples). d, Base of leaf three (n = 224 unique inbred samples). e, Kernels at 350-growing-degree days (n = 195 unique inbred samples). f, Shoots of germinating seedling (n = 235 unique inbred samples). g, Roots of germinating seedling (n = 226 unique inbred samples). Regression statistics in Extended Data Table 1. Sweet corn and popcorn lines were excluded from these regressions.
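
As a rough illustration of the regressions in this figure, the sketch below assumes that cumulative dysregulation is the sum, over genes, of each line's absolute deviation from the population mean expression; the legend does not give the exact definition, so this is an assumption. The function names and the toy values are hypothetical.

```python
from statistics import mean


def cumulative_dysregulation(expr):
    """Sum over genes of |expression - population mean| for each line."""
    gene_means = [mean(col) for col in zip(*expr)]
    return [sum(abs(v - m) for v, m in zip(row, gene_means)) for row in expr]


def simple_regression(x, y):
    """Ordinary least squares of y on x; returns (slope, intercept, r_squared)."""
    xm, ym = mean(x), mean(y)
    sxx = sum((a - xm) ** 2 for a in x)
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    syy = sum((b - ym) ** 2 for b in y)
    slope = sxy / sxx
    return slope, ym - slope * xm, sxy ** 2 / (sxx * syy)


# Toy data (hypothetical): 4 lines x 2 genes, and a seed weight for each line.
expr = [[10.0, 20.0], [12.0, 18.0], [8.0, 22.0], [10.0, 20.0]]
seed_weight = [30.0, 26.0, 25.0, 31.0]
dysregulation = cumulative_dysregulation(expr)
slope, intercept, r_squared = simple_regression(dysregulation, seed_weight)
print(dysregulation, slope, intercept, round(r_squared, 3))
```

A negative slope in such a regression would correspond to lower seed weight in lines with greater cumulative dysregulation.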

Extended Data Figure 9 Mean upstream rare-allele count from the 5,000 most highly expressed genes versus seed weight.

a, Adult leaves collected at night (n = 221 unique inbred samples). b, Adult leaves collected during the day (n = 171 unique inbred samples). c, Tip of leaf three (n = 226 unique inbred samples). d, Base of leaf three (n = 224 unique inbred samples). e, Kernels at 350-growing-degree days (n = 195 unique inbred samples). f, Shoots of germinating seedling (n = 235 unique inbred samples). g, Roots of germinating seedling (n = 226 unique inbred samples).

Extended Data Table 1 Regression statistics for cumulative expression dysregulation in each tissue against seed-weight fitness

Supplementary information

Life Sciences Reporting Summary (PDF 72 kb)

Supplementary Table 1

This table contains collection details for all sampled genotypes. Sequencing batch, tissue of origin, RNAseq depth, and subpopulation membership are specified for each sample. (XLS 445 kb)

About this article

Cite this article

Kremling, K., Chen, SY., Su, MH. et al. Dysregulation of expression correlates with rare-allele burden and fitness loss in maize. Nature 555, 520–523 (2018). https://doi.org/10.1038/nature25966

Received: 07 October 2016
Accepted: 01 February 2018
Published: 14 March 2018
Issue Date: 22 March 2018
DOI: https://doi.org/10.1038/nature25966
