The strength and pattern of natural selection on gene expression in rice

Abstract

Levels of gene expression underpin organismal phenotypes [1,2], but the nature of selection that acts on gene expression and its role in adaptive evolution remain unknown [1,2]. Here we assayed gene expression in rice (Oryza sativa) [3], and used phenotypic selection analysis to estimate the type and strength of selection on the levels of more than 15,000 transcripts [4,5]. Variation in most transcripts appears (nearly) neutral or under very weak stabilizing selection in wet paddy conditions (with median standardized selection differentials near zero), but selection is stronger under drought conditions. Overall, more transcripts are conditionally neutral (2.83%) than are antagonistically pleiotropic [6] (0.04%), and transcripts that display lower levels of expression and stochastic noise [7,8,9] and higher levels of plasticity [9] are under stronger selection. Selection strength was further weakly negatively associated with levels of cis-regulation and network connectivity [9]. Our multivariate analysis suggests that selection acts on the expression of photosynthesis genes [4,5], but that the efficacy of selection is genetically constrained under drought conditions [10]. Drought selected for earlier flowering [11,12] and higher expression of OsMADS18 (Os07g0605200), which encodes a MADS-box transcription factor and is a known regulator of early flowering [13], marking this gene as a drought-escape gene [11,12]. The ability to estimate selection strengths provides insights into how selection can shape molecular traits at the core of gene action.
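The phenotypic selection analysis referred to in the abstract can be illustrated with a minimal sketch in the general spirit of the Lande and Arnold regression framework. This is not the authors' code: the function names and the toy data are hypothetical, and the sketch assumes that the linear differential S is the covariance between relative fitness and variance-standardized expression, and that the quadratic differential C is the covariance between relative fitness and squared standardized deviations.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale trait values to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def relative_fitness(fitness):
    """Divide each individual's fitness by the population mean fitness."""
    m = mean(fitness)
    return [w / m for w in fitness]


def covariance(x, y):
    """Population covariance between two equally long sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def linear_differential(expression, fitness):
    """S: covariance of relative fitness with the standardized trait."""
    return covariance(standardize(expression), relative_fitness(fitness))


def quadratic_differential(expression, fitness):
    """C: covariance of relative fitness with squared standardized deviations."""
    z = standardize(expression)
    return covariance([v * v for v in z], relative_fitness(fitness))


# Hypothetical toy data: expression of one transcript and fitness of five plants.
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
directional = [2.0, 4.0, 6.0, 8.0, 10.0]  # fitness rises with expression
stabilizing = [1.0, 2.0, 3.0, 2.0, 1.0]   # intermediate expression is fittest

print(linear_differential(expression, directional))     # positive S
print(quadratic_differential(expression, stabilizing))  # negative C
```

In this toy case a positive S indicates directional selection for higher expression, and a negative C is the pattern expected under stabilizing selection; values near zero for both would correspond to the (nearly) neutral transcripts described above.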

Fig. 1: The strength and pattern of selection on heritable rice-leaf transcript levels differ across field environments.
Fig. 2: Gene-expression level, stochasticity, plasticity, tissue specificity and connectivity influence microevolutionary rates of expression change.
Fig. 3: Transcripts under selection could affect fitness through regulating early growth vigour and flowering time.
Fig. 4: Selection targets expression patterns in different biological processes in wet and dry conditions.

Data availability

Raw FASTQ reads for 188 accessions with resequenced genomes were downloaded from the SRA under SRA BioProject accession numbers PRJNA422249 and PRJNA557122. Raw FASTQ reads for a further 27 accessions included in the 3K-RG project were downloaded from the SRA under BioProject accession number PRJEB6180. RNA sequence data that support the findings of this study have been deposited under SRA BioProject accession number PRJNA588478. Processed RNA expression count data have been deposited in Zenodo (https://zenodo.org/record/3533431 with DOI 10.5281/zenodo.3533431), alongside a sample metadata file with a key to the RNA sequence data in SRA BioProject accession number PRJNA588478. This key can also be found in Supplementary Table 4. Source Data for Figs. 1–4 and Extended Data Figs. 1–8 are provided with the paper.

Code availability

Selection analyses were run using custom-made scripts in Python version 2.7, which are available in Supplementary Notes 1, 2, and on GitHub in repositories icalic/Linear-regression-analysis (https://github.com/icalic/Linear-regression-analysis.git) and icalic/Logistic-regression-analysis (https://github.com/icalic/Logistic-regression-analysis.git). For all other analyses we used previously developed, publicly available software and code: leaf area was assessed using ImageJ v.1.52 and GIMP v.2.10.0; RNA-seq data were processed and analysed using Drop-seq tools v.1.12, STAR aligner v.020201, Picard tools v.2.9.0, DChip v.2010.01 and R v.3.4.3 packages edgeR v.3.14 and lme4 v.1.1; gene-set enrichment analyses were performed using PlantGSEA v.1; statistical analyses were performed in R v.3.4.3, further using packages lme4 v.1.1 and corpcor v.1.6.9; and genome analyses were performed using bbduk v.37.66, bwa-mem v.0.7.16a-r1181, the GATK GenotypeGVCFs engine v.3.8-0-ge9d806836, vcftools v.0.1.15, jvarkit suite v.1, Beagle v.4.1, plink v.1.9 and GAPIT v.3.

References

  1. Fay, J. C. & Wittkopp, P. J. Evaluating the role of natural selection in the evolution of gene regulation. Heredity 100, 191–199 (2008).

  2. Romero, I. G., Ruvinsky, I. & Gilad, Y. Comparative studies of gene expression and the evolution of gene regulation. Nat. Rev. Genet. 13, 505–516 (2012).

  3. Wing, R. A., Purugganan, M. D. & Zhang, Q. The rice genome revolution: from an ancient grain to green super rice. Nat. Rev. Genet. 19, 505–517 (2018).

  4. Kingsolver, J. G. et al. The strength of phenotypic selection in natural populations. Am. Nat. 157, 245–261 (2001).

  5. Lande, R. & Arnold, S. J. The measurement of selection on correlated characters. Evolution 37, 1210–1226 (1983).

  6. Anderson, J. T., Lee, C. R., Rushworth, C. A., Colautti, R. I. & Mitchell-Olds, T. Genetic trade-offs and conditional neutrality contribute to local adaptation. Mol. Ecol. 22, 699–708 (2013).

  7. Lemos, B., Bettencourt, B. R., Meiklejohn, C. D. & Hartl, D. L. Evolution of proteins and gene expression levels are coupled in Drosophila and are independently associated with mRNA abundance, protein length, and number of protein–protein interactions. Mol. Biol. Evol. 22, 1345–1354 (2005).

  8. Lehner, B. Selection to minimise noise in living systems and its implications for the evolution of gene expression. Mol. Syst. Biol. 4, 170 (2008).

  9. MacNeil, L. T. & Walhout, A. J. Gene regulatory networks and the role of robustness and stochasticity in the control of gene expression. Genome Res. 21, 645–657 (2011).

  10. Conner, J. & Via, S. Natural selection on body size in Tribolium: possible genetic constraints on adaptive evolution. Heredity 69, 73–83 (1992).

  11. Franks, S. J. Plasticity and evolution in drought avoidance and escape in the annual plant Brassica rapa. New Phytol. 190, 249–257 (2011).

  12. Kumar, A. et al. Breeding high-yielding drought-tolerant rice: genetic variations and conventional and molecular approaches. J. Exp. Bot. 65, 6265–6278 (2014).

  13. Fornara, F. et al. Functional characterization of OsMADS18, a member of the AP1/SQUA subfamily of MADS box genes. Plant Physiol. 135, 2207–2219 (2004).

  14. Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).

  15. Ayroles, J. F. et al. Systems genetics of complex traits in Drosophila melanogaster. Nat. Genet. 41, 299–307 (2009).

  16. Conner, J. Field measurements of natural and sexual selection in the fungus beetle, Bolitotherus cornutus. Evolution 42, 736–749 (1988).

  17. Hoekstra, H. E. et al. Strength and tempo of directional selection in the wild. Proc. Natl Acad. Sci. USA 98, 9157–9160 (2001).

  18. Nourmohammad, A. et al. Adaptive evolution of gene expression in Drosophila. Cell Rep. 20, 1385–1395 (2017).

  19. Ghalambor, C. K. et al. Non-adaptive plasticity potentiates rapid adaptive evolution of gene expression in nature. Nature 525, 372–375 (2015).

  20. Kenkel, C. D. & Matz, M. V. Gene expression plasticity as a mechanism of coral adaptation to a variable environment. Nat. Ecol. Evol. 1, 0014 (2016).

  21. Zhang, L. & Li, W. H. Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Mol. Biol. Evol. 21, 236–239 (2004).

  22. Hendry, A. P. & Kinnison, M. T. The pace of modern life: measuring rates of contemporary microevolution. Evolution 53, 1637–1653 (1999).

  23. Duveau, F. et al. Fitness effects of altering gene expression noise in Saccharomyces cerevisiae. eLife 7, e37272 (2018).

  24. Jimenez-Gomez, J. M., Corwin, J. A., Joseph, B., Maloof, J. N. & Kliebenstein, D. J. Genomic analysis of QTLs and genes altering natural variation in stochastic noise. PLoS Genet. 7, e1002295 (2011).

  25. Plessis, A. et al. Multiple abiotic stimuli are integrated in the regulation of rice gene expression under field conditions. eLife 4, e08411 (2015).

  26. Wilkins, O. et al. EGRINs (environmental gene regulatory influence networks) in rice that function in the response to water deficit, high temperature, and agricultural environments. Plant Cell 28, 2365–2384 (2016).

  27. Huang, X. et al. Genome-wide association study of flowering time and grain yield traits in a worldwide collection of rice germplasm. Nat. Genet. 44, 32–39 (2011).

  28. Wang, Y. et al. Background-independent quantitative trait loci for drought tolerance identified using advanced backcross introgression lines in rice. Crop Sci. 53, 430–441 (2013).

  29. Liu, X., Li, Y. I. & Pritchard, J. K. Trans effects on gene expression can drive omnigenic inheritance. Cell 177, 1022–1034.e6 (2019).

  30. Zaidem, M. L., Groen, S. C. & Purugganan, M. D. Evolutionary and ecological functional genomics, from lab to the wild. Plant J. 97, 40–55 (2019).

  31. Keurentjes, J. J. et al. Regulatory network construction in Arabidopsis by using genome-wide gene expression quantitative trait loci. Proc. Natl Acad. Sci. USA 104, 1708–1713 (2007).

  32. Caicedo, A. L. et al. Genome-wide patterns of nucleotide polymorphism in domesticated rice. PLoS Genet. 3, e163 (2007).

  33. Garris, A. J., Tai, T. H., Coburn, J., Kresovich, S. & McCouch, S. Genetic structure and diversity in Oryza sativa L. Genetics 169, 1631–1638 (2005).

  34. Gutaker, R. M. et al. Genomic history and ecology of the geographic spread of rice. Preprint at bioRxiv https://doi.org/10.1101/748178 (2019).

  35. McCouch, S. R. et al. Open access resources for genome-wide association mapping in rice. Nat. Commun. 7, 10532 (2016).

  36. McNally, K. L. et al. Genomewide SNP variation reveals relationships among landraces and modern varieties of rice. Proc. Natl Acad. Sci. USA 106, 12273–12278 (2009).

  37. Torres, R. O., McNally, K. L., Cruz, C. V., Serraj, R. & Henry, A. Screening of rice genebank germplasm for yield and selection of new drought tolerance donors. Field Crops Res. 147, 12–22 (2013).

  38. Wang, W. et al. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature 557, 43–49 (2018).

  39. Abramoff, M. D., Magalhaes, P. J. & Ram, S. J. Image processing with ImageJ. Biophoton. Int. 11, 36–42 (2004).

  40. Bracken, B. Barcoded plate-based single cell RNA-seq. https://www.protocols.io/view/barcoded-plate-based-single-cell-rna-seq-nkgdctw (2018).

  41. Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098 (2013).

  42. Soumillon, M., Cacchiarelli, D., Semrau, S., van Oudenaarden, A. & Mikkelsen, T. S. Characterization of directed differentiation by high-throughput single-cell RNA-seq. Preprint at bioRxiv https://doi.org/10.1101/003236 (2014).

  43. Li, C. & Wong, W. H. Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc. Natl Acad. Sci. USA 98, 31–36 (2001).

  44. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).

  45. R Core Team. R: a language and environment for statistical computing. http://www.R-project.org/ (R Foundation for Statistical Computing, Vienna, 2016).

  46. Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).

  47. Bates, D., Maechler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48 (2015).

  48. Yi, X., Du, Z. & Su, Z. PlantGSEA: a gene set enrichment analysis toolkit for plant community. Nucleic Acids Res. 41, W98–W103 (2013).

  49. Brodie, E. D. III, Moore, A. J. & Janzen, F. J. Visualizing and quantifying natural selection. Trends Ecol. Evol. 10, 313–318 (1995).

  50. Janzen, F. J. & Stern, H. S. Logistic regression for empirical studies of multivariate selection. Evolution 52, 1564–1571 (1998).

  51. Koenig, W. D., Albano, S. S. & Dickinson, J. L. A comparison of methods to partition selection acting via components of fitness: do larger male bullfrogs have greater hatching success? J. Evol. Biol. 4, 309–320 (1991).

  52. Kassambara, A. Practical Guide to Principal Component Methods in R: PCA, M(CA), FAMD, MFA, HCPC, factoextra (STHDA, 2017).

  53. Davidson, R. M. et al. Comparative transcriptomics of three Poaceae species reveals patterns of gene expression evolution. Plant J. 71, 492–502 (2012).

  54. Yanai, I. et al. Genome-wide midrange transcription profiles reveal expression level relationships in human tissue specification. Bioinformatics 21, 650–659 (2005).

  55. Schäfer, J. & Strimmer, K. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat. Appl. Genet. Mol. Biol. 4, Article32 (2005).

  56. Larracuente, A. M. et al. Evolution of protein-coding genes in Drosophila. Trends Genet. 24, 114–123 (2008).

  57. Keren, L. et al. Noise in gene expression is coupled to growth rate. Genome Res. 25, 1893–1902 (2015).

  58. Hieno, A. et al. ppdb: plant promoter database version 3.0. Nucleic Acids Res. 42, D1188–D1192 (2014).

  59. Yamamoto, Y. Y. et al. Identification of plant promoter constituents by analysis of local distribution of short sequences. BMC Genomics 8, 67 (2007).

  60. Weirauch, M. T. et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell 158, 1431–1443 (2014).

  61. Proost, S. et al. PLAZA: a comparative genomics resource to study gene and genome evolution in plants. Plant Cell 21, 3718–3731 (2009).

  62. Van Bel, M. et al. PLAZA 4.0: an integrative resource for functional, evolutionary and comparative plant genomics. Nucleic Acids Res. 46, D1190–D1196 (2018).

  63. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).

  64. Van der Auwera, G. A. et al. From FastQ data to high-confidence variant calls: the genome analysis toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.1–11.10.33 (2013).

  65. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).

  66. Browning, B. L. & Browning, S. R. Genotype imputation with millions of reference samples. Am. J. Hum. Genet. 98, 116–126 (2016).

  67. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).

  68. Tropf, F. C. et al. Human fertility, molecular genetics, and natural selection in modern societies. PLoS ONE 10, e0126821 (2015).

  69. Lipka, A. E. et al. GAPIT: genome association and prediction integrated tool. Bioinformatics 28, 2397–2399 (2012).

  70. VanRaden, P. M. Efficient methods to compute genomic predictions. J. Dairy Sci. 91, 4414–4423 (2008).

  71. Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010).

  72. Kang, H. M. et al. Efficient control of population structure in model organism association mapping. Genetics 178, 1709–1723 (2008).

  73. Segura, V. et al. An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat. Genet. 44, 825–830 (2012).

  74. Bland, J. M. & Altman, D. G. Multiple significance tests: the Bonferroni method. Br. Med. J. 310, 170 (1995).

  75. Fournier-Level, A. et al. A map of local adaptation in Arabidopsis thaliana. Science 334, 86–89 (2011).

  76. Huang, X. et al. Genome-wide association studies of 14 agronomic traits in rice landraces. Nat. Genet. 42, 961–967 (2010).

  77. Mather, K. A. et al. The extent of linkage disequilibrium in rice (Oryza sativa L.). Genetics 177, 2223–2232 (2007).

  78. Zhao, K. et al. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat. Commun. 2, 467 (2011).

Acknowledgements

We thank B. U. Principe, P. C. Maturan and L. Holongbayan for assistance with field management, tissue sampling and trait measurements; the staff of IRRI’s Climate Unit for providing weather data; Z. Fresquez for help with tissue processing; L. Harshman for assistance with a pilot RNA-seq run; the New York University Center for Genomics and Systems Biology GenCore Facility for sequencing support; and New York University High Performance Computing for supplying computational resources. We are grateful to current and former members of the Purugganan laboratory (particularly J. Flowers, R. Gutaker, A. Plessis, O. Wilkins and M. Zaidem) and the IRRI Strategic Innovation and Rice Breeding research platforms (particularly S. Dixit, A. Kohli, Y. Ludwig, K. McNally, R. Oliva, V. Roman-Reyna and N. Tsakirpaloglou) for insightful discussions; M. Quintana for sharing scripts in R; and S. Zaaijer for codesigning the figures. This work was funded in part by grants from the Zegar Family Foundation, the National Science Foundation Plant Genome Research Program and the NYU Abu Dhabi Research Institute to M.D.P., a fellowship from the Natural Sciences and Engineering Research Council of Canada through Grant PDF-502464-2017 to Z.J.-L., and a fellowship from the Gordon and Betty Moore Foundation/Life Sciences Research Foundation through Grant GBMF2550.06 to S.C.G.

Author information

Contributions

M.D.P. conceived and directed the project; M.D.P., G.V., A.H., R.O.T., A.K., and S.C.G. designed and coordinated field experiments; M.N., C.L.U.C., Z.J.-L., J.Y.C., and S.C.G. performed field experiments; K.D., M.N., Z.J.-L., J.Y.C. and S.C.G. processed samples and extracted RNA for sequencing; W.M.M. III and B.B. made RNA-seq libraries; A.E.P. and R.S. designed and ran the bioinformatics workflow for RNA-seq; J.Y.C. and S.C.G. conducted genetic-marker-based analyses; I.Ć., S.C.G., S.J.F. and M.D.P. designed and performed selection analyses; M.N., A.H., J.Y.C., S.C.G. and I.Ć. processed fitness, higher-level-trait and gene-expression data, and performed statistical analyses; and S.C.G., S.J.F. and M.D.P. wrote the manuscript.

Corresponding author

Correspondence to Michael D. Purugganan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature thanks Anthony Greenberg, Detlef Weigel and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Experimental setup.

a, Geographical origins of 220 O. sativa accessions, of which 4 constitute additionally replicated checks (Supplementary Table 1). Seven accessions that are not from Eurasia or Africa are not shown. Varietal group (vg.) Indica accessions are indicated in indigo and vg. Japonica accessions are indicated in jade. Map data ©2019 Google. b, Populations of Indica and Japonica accessions (planted in triplicate alongside one another) were monitored for total lifetime fitness in wet (magenta) and dry (blue) fields. Both fields had identical layouts. Numbers reflect Indica populations with 3 × 136 accessions = 408 individuals planted in each field; Extended Data Fig. 3 shows Japonica populations. Under drought conditions, both multiplicative fitness components (flowering success (lime) and fecundity (green)) were relevant (multiplying to total lifetime fitness), but in wet conditions only the latter was relevant (fecundity equating to total lifetime fitness, magenta). c, Drought exerts truncating selection on the populations (declining and shifting blue versus magenta bar), and end-of-season was reached earlier under drought conditions. d, Cumulative rainfall shows one major rainfall event that caused the rainout shelter over the dry field to close temporarily after the start of the drought treatment and the sampling of leaf tissue for RNA sequencing (>51 days after sowing; DAS). e, During the period of flowering (>51 DAS), there was an increasing deficit in soil water potential. f, g, Patterns of volumetric soil moisture and vapour pressure deficit (VPD) were consistent with the pattern of soil water potential. Lighter shades of grey in f indicate deeper layers of soil. Grey and mustard lines in g indicate the VPD in the wet and dry field, respectively. h, Day length increased over the course of the experiment. i, Air temperature generally increased over the course of the experiment (grey and mustard lines indicate the wet and dry field, respectively).
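The multiplicative relationship between the fitness components in panel b can be written out as a small sketch. The function names are hypothetical, and treating flowering success as a binary (0 or 1) outcome per plant is an assumption made for illustration; the population sizes are the ones stated in the captions.

```python
def total_lifetime_fitness(fecundity, flowered=True):
    """Total lifetime fitness = flowering success (0 or 1) x fecundity.

    In wet conditions only fecundity is relevant, which corresponds to
    leaving `flowered` at its default of True.
    """
    flowering_success = 1 if flowered else 0
    return flowering_success * fecundity


def plants_per_field(n_accessions, n_replicates=3):
    """Number of individuals planted per field for a triplicated design."""
    return n_accessions * n_replicates


print(plants_per_field(136))  # Indica: 408 individuals per field
print(plants_per_field(84))   # Japonica: 252 individuals per field

# Hypothetical plants in the dry field: one that flowered, one that did not.
print(total_lifetime_fitness(fecundity=120, flowered=True))
print(total_lifetime_fitness(fecundity=0, flowered=False))
```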

Extended Data Fig. 2 Systems genetics of gene expression in the Indica populations in wet and dry field environments.

a, Environmental bias for transcript expression. Magenta and blue dots represent transcripts showing a 1.5-fold difference in expression between the wet and dry field environments, respectively. ANOVA, Indica environment FDR-adjusted q < 0.001, n = 136 accessions. b, Distribution of cross-environment genetic correlations (rWD) for transcripts showing significant (blue) genotype × environment (G × E) variance. ANOVA, Indica genotype × environment FDR-adjusted q < 0.001, n = 136 accessions.

Extended Data Fig. 3 Systems genetics of gene expression in the Japonica populations in wet and dry field environments.

a, Monitoring the Japonica populations, with 3 × 84 accessions = 252 individuals planted in both the wet and dry fields, for flowering success, fecundity fitness and total lifetime fitness (legend as in Extended Data Fig. 1b, c). b, Environmental bias for transcript expression. Magenta and blue dots represent transcripts showing a 1.5-fold difference in expression between the wet and dry field environments, respectively. ANOVA, Japonica environment FDR-adjusted q < 0.01, n = 84 accessions. c, Distribution of broad-sense heritabilities (H²) for transcripts with significant expression polymorphism. ANOVA, Japonica genotype FDR-adjusted q < 0.01, n = 84 accessions. d, Distribution of cross-environment genetic correlations (rWD) for transcripts showing significant (blue) genotype × environment (G × E) variance. ANOVA, Japonica genotype × environment FDR-adjusted q < 0.01, n = 84 accessions.

Extended Data Fig. 4 The strength and pattern of selection on Indica rice-leaf transcript levels under drought conditions differ across fitness components.

a, The strength of selection |S| on gene expression differed between selection for flowering success (lime), and fecundity (green) in the dry field. Mann–Whitney U-test, two-sided P < 0.001, n = 15,343. b, Positive directional selection (n = 11,304) was stronger than negative selection (n = 4,039) for fecundity under drought (green) (Mann–Whitney U-test, two-sided P < 0.001), and selection for flowering success showed higher absolute values (Kolmogorov–Smirnov test, two-sided P < 0.001, n = 15,343). c, Patterns of quadratic selection differed significantly for the two fitness components. Kolmogorov–Smirnov test, two-sided P < 0.001, n = 15,343. d, Patterns of conditional neutrality (light grey) and antagonistic pleiotropy (lime and green for transcripts beneficial for flowering success and fecundity, respectively) for gene expression under drought conditions. Black indicates transcripts that experienced selection in the same direction for both fitness components.
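The categories shown in panel d can be illustrated with a minimal classification sketch. It assumes that each transcript has a selection differential and a P value for each of the two fitness components; the function name, the significance cut-off `alpha` and the example values are hypothetical, and the criteria used in the actual analysis (Supplementary Table 12) may differ.

```python
def classify_transcript(s_flowering, p_flowering, s_fecundity, p_fecundity,
                        alpha=0.05):
    """Classify a transcript by its pattern of selection across two components.

    - antagonistic pleiotropy: significant selection in opposite directions
    - same direction: significant selection with the same sign in both
    - conditional neutrality: significant selection in only one component
    """
    sig_flowering = p_flowering < alpha
    sig_fecundity = p_fecundity < alpha
    if sig_flowering and sig_fecundity:
        if s_flowering * s_fecundity < 0:
            return "antagonistic pleiotropy"
        return "same direction"
    if sig_flowering or sig_fecundity:
        return "conditional neutrality"
    return "no detectable selection"


# Hypothetical selection differentials and P values for three transcripts.
print(classify_transcript(0.20, 0.001, -0.15, 0.010))
print(classify_transcript(0.20, 0.001, 0.02, 0.600))
print(classify_transcript(0.20, 0.001, 0.10, 0.020))
```

The same logic applies when the two contexts are the wet and dry field environments rather than two fitness components.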

Extended Data Fig. 5 Stochastic expression noise and transcript connectivity limit the efficacy of selection on gene expression.

a, b, Partial correlation analyses of factors that negatively (grey) and positively (mustard) influence the strength of selection |S| on gene expression for flowering success (a) and fecundity (b) fitness in dry conditions. Dots indicate statistical significance of Pearson’s partial r (t-test, two-sided P < 0.05, n = 14,753) (Supplementary Table 14). c, Global expression stochasticity limits fecundity under drought conditions. Spearman’s ρ = −0.174, t-test, two-sided P = 0.042, n = 136 accessions. d, As in wet conditions, |S| is bounded by expression connectivity under drought conditions. Kruskal–Wallis test, P = 0.0008, n = 12,502 transcripts. Left, box plot with centre line = median, cross = mean, box limits = upper and lower quartiles, whiskers = 1.5 × interquartile range, points = outliers. Right, mean ± s.e.m. e, In dry as well as in wet conditions, |S| is limited by gene regulatory constraints as assessed through the number of cis-regulatory elements in the promoter (n = 3,907 transcripts, Mann–Whitney U-test, two-sided P = 0.000015), and the number of transcription factors regulating a gene (n = 2,905 transcripts, Mann–Whitney U-test, two-sided P = 0.0027) illustrated for selection for total lifetime fitness under drought. Left, boxes and whiskers as in d. Right, mean ± s.e.m.
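The idea behind the partial correlations in panels a and b can be sketched for the simplest case of a single controlling variable. This is an illustration only: the analysis above controls for several factors at once, whereas the sketch uses the first-order formula r_xy.z = (r_xy − r_xz·r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). The function names and the centred toy values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical centred toy values: x and y are each driven by a shared
# factor z plus independent noise, so they correlate only through z.
z = [2.0, -2.0, -2.0, 2.0]
x = [3.0, -1.0, -3.0, 1.0]
y = [3.0, -3.0, -1.0, 1.0]

print(pearson_r(x, y))     # marginal correlation is clearly positive
print(partial_r(x, y, z))  # close to zero once z is accounted for
```

A factor that remains associated with |S| after such control is one whose effect is not explained by its correlation with the other factors.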

Extended Data Fig. 6 Distributions of transcript–trait correlations for the three higher-level traits measured in the dry field environment.

a, Absolute Pearson’s correlations |r| of transcripts with leaf area (green). n = 15,635 transcripts. The cloud delineates transcripts (listed) that show significant linear or quadratic selection differentials for fecundity under drought conditions, and significant correlations with leaf area (Supplementary Text). b, Absolute Pearson’s correlations |r| of transcripts with chlorophyll concentration (green). n = 15,635 transcripts. The cloud delineates a transcript that shows a significant quadratic selection differential for fecundity under drought conditions, and a significant correlation with chlorophyll concentration (Supplementary Text). c, Absolute Pearson’s correlations |r| of transcripts with flowering time (lime). n = 15,635 transcripts. The cloud delineates transcripts (listed) that show significant linear selection differentials for flowering success under drought conditions, and significant correlations with early flowering (Supplementary Text).

Extended Data Fig. 7 Genome-wide association mapping of the genetic architecture of transcripts that covary significantly with fitness in the Indica population under drought conditions.

Three out of eight transcripts are partially controlled by trans-eQTLs (illustrated for expression of the glycine-rich family protein-coding gene Os11g0209000 under drought conditions). Supplementary Table 27 provides results for other transcripts and for expression principal components or eigengenes as suites of transcripts. a, PCA of 179,634 SNP markers from the Indica population that were selected for analysis; the three principal components, plus a fourth, were included as cofactors in the multi-locus linear mixed model. b, Distribution of expected versus observed P values for associations between SNP markers and Os11g0209000 expression in a QQ plot. n = 131 genotypes; multi-locus linear mixed model, two-sided, Bonferroni-adjusted P < 0.05 for 179,634 SNP markers. c, The Manhattan plot indicates two significant trans-eQTL peaks for expression of Os11g0209000 (gene location indicated with vertical red bar). Only the top approximately 5% of SNPs (10,000 SNPs) are shown.
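The Bonferroni criterion in panel b and the SNP fraction in panel c can be checked with simple arithmetic. The sketch below assumes the standard equivalence between a Bonferroni-adjusted P < 0.05 and an unadjusted P below 0.05 divided by the number of tests; the function and variable names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


N_SNPS = 179_634  # SNP markers analysed in the Indica population

threshold = bonferroni_threshold(0.05, N_SNPS)
top_fraction = 10_000 / N_SNPS  # SNPs displayed in the Manhattan plot

print(f"{threshold:.2e}")     # 2.78e-07
print(f"{top_fraction:.1%}")  # 5.6%
```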

Extended Data Fig. 8 Genome-wide association mapping for fitness in the wet and dry field environments.

Taking the top approximately 0.5% of SNPs (1,000 SNPs) with the strongest association to total lifetime fitness in the wet (magenta) and dry (blue) field conditions after genome-wide association mapping, we observed no enrichment for transcripts (n = 809 and 142 transcripts in the wet and dry fields, respectively) that were expressed in the leaves and had significant linear selection differentials S (n = 408 plants, t-test, two-sided, unadjusted P < 0.05) among transcripts (n = 1,960 transcripts in the wet field and n = 1,671 transcripts in the dry field) from genes in 100-kb regions surrounding these SNPs, compared to transcripts from genes in other genomic regions (χ², not significant (ns); two-sided P = 0.862 for the wet field and P = 0.85 for the dry field). Supplementary Table 27 provides genome-wide association mapping results for total lifetime fitness in wet and dry conditions, and for flowering success and fecundity under drought conditions.
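An enrichment test of this kind can be sketched as a χ² test on a 2 × 2 contingency table (transcripts near versus away from top SNPs, with versus without significant S). The sketch is an assumption about the general form of the test, not the authors' implementation: it uses no continuity correction, and it obtains the P value for one degree of freedom from the complementary error function. The row and column totals of the example table follow the wet-field numbers in the caption (1,960 and 809), but the split across cells and the grand total are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and P value (1 d.f.) for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: rows = near top SNPs / elsewhere,
# columns = significant S / not significant.
table = [[100, 1860],
         [709, 12674]]

statistic, p_value = chi_square_2x2(table)
print(statistic, p_value)  # a large P value indicates no enrichment
```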

Extended Data Table 1 Phenotypic selection gradients, G-matrices and outcomes of selection for transcript levels in wet and dry conditions
Extended Data Table 2 Phenotypic selection gradients on transcript levels for flowering success, fecundity and lifetime fitness in dry conditions

Supplementary information

Supplementary Information

This file contains Supplementary Text and References, and Supplementary Notes 1-2

Reporting Summary

Supplementary Table 1 | List of accessions with metadata and genome re-sequencing statistics

Supplementary Table 2 | Experimental design, and trait as well as fitness measurements

Supplementary Table 3 | Weather and soil characteristics data

Supplementary Table 4 | Details of RNA-seq libraries

Supplementary Table 5 | Systems genetics analysis of variance in the transcriptome of the Indica population in wet and dry field conditions

Supplementary Table 6 | Gene set enrichment analysis of transcripts showing environmentally biased expression patterns in the Indica population

Supplementary Table 7 | Systems genetics analysis of variance in the transcriptome of the Japonica population in wet and dry field conditions

Supplementary Table 8 | Gene set enrichment analysis of transcripts showing environmentally biased expression patterns in the Japonica population

Supplementary Table 9 | Statistical analyses of fitness measurements in the Indica and Japonica populations

Supplementary Table 10 | Selection differentials for the Indica population across field environments and fitness components

Supplementary Table 11 | Gene set enrichment analyses on the tails of the distributions of |S| for the Indica population across field environments and fitness components

Supplementary Table 12 | Conditional Neutrality / Antagonistic Pleiotropy (CNAP) analyses

Supplementary Table 13 | Metadata per transcript of factors that may influence the strength of selection on gene expression

Supplementary Table 14 | Partial correlation analyses on factors that may influence the strength of selection

Supplementary Table 15 | Global levels of stochastic expression noise per Indica accession in each of the two field environments

Supplementary Table 16 | Global levels of gene expression plasticity per Indica accession in each of the two field environments

Supplementary Table 17 | Metadata per transcript and analysis of gene regulatory network factors that may influence the strength of selection on gene expression

Supplementary Table 18 | Principal components/eigengenes as suites of transcripts for multivariate selection analyses for the Indica population in wet and dry field conditions

Supplementary Table 19 | Statistical analyses for the higher-level traits measured in the Indica and Japonica populations in wet and dry field conditions

Supplementary Table 20 | Multivariate selection analyses on the higher-level traits for the Indica and Japonica populations in wet and dry field conditions

Supplementary Table 21 | Gene set term enrichment analyses on the tails of the distributions of principal components for the transcriptomes of the Indica population across field environments and fitness components

Supplementary Table 22 | Transcript-trait correlations for the Indica population in both field environments

Supplementary Table 23 | Strength of selection on genes grouped by gene ontology biological process for the Indica population in the two field environments

Supplementary Table 24 | Selection differentials for the Japonica population across field environments and fitness components

Supplementary Table 25 | Single-nucleotide polymorphisms included in genome-wide association mapping

Supplementary Table 26 | Principal component (PC) loadings for SNPs on PCs included as cofactors in genome-wide association mapping

Supplementary Table 27 | Genome-wide association mapping of fitness and (suites of) transcripts under selection in the Indica population across field environments

Cite this article

Groen, S.C., Ćalić, I., Joly-Lopez, Z. et al. The strength and pattern of natural selection on gene expression in rice. Nature 578, 572–576 (2020). https://doi.org/10.1038/s41586-020-1997-2
