Mass-spectrometry-based draft of the Arabidopsis proteome

Mergner, Julia; Frejno, Martin; List, Markus; Papacek, Michael; Chen, Xia; Chaudhary, Ajeet; Samaras, Patroklos; Richter, Sandra; Shikata, Hiromasa; Messerer, Maxim; Lang, Daniel; Altmann, Stefan; Cyprys, Philipp; Zolg, Daniel P.; Mathieson, Toby; Bantscheff, Marcus; Hazarika, Rashmi R.; Schmidt, Tobias; Dawid, Corinna; Dunkel, Andreas; Hofmann, Thomas; Sprunck, Stefanie; Falter-Braun, Pascal; Johannes, Frank; Mayer, Klaus F. X.; Jürgens, Gerd; Wilhelm, Mathias; Baumbach, Jan; Grill, Erwin; Schneitz, Kay; Schwechheimer, Claus; Kuster, Bernhard

doi:10.1038/s41586-020-2094-2

Article
Published: 11 March 2020

Nature volume 579, pages 409–414 (2020)

Abstract

Plants are essential for life and are extremely diverse organisms with unique molecular capabilities¹. Here we present a quantitative atlas of the transcriptomes, proteomes and phosphoproteomes of 30 tissues of the model plant Arabidopsis thaliana. Our analysis provides initial answers to how many genes exist as proteins (more than 18,000), where they are expressed, in which approximate quantities (a dynamic range of more than six orders of magnitude) and to what extent they are phosphorylated (over 43,000 sites). We present examples of how the data may be used, such as to discover proteins that are translated from short open-reading frames, to uncover sequence motifs that are involved in the regulation of protein production, and to identify tissue-specific protein complexes or phosphorylation-mediated signalling events. Interactive access to this resource for the plant community is provided by the ProteomicsDB and ATHENA databases, which include powerful bioinformatics tools to explore and characterize Arabidopsis proteins, their modifications and interactions.
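A dynamic range "of more than six orders of magnitude" is a log10 ratio: the number of powers of ten separating the most and least abundant quantified proteins. The sketch below illustrates that arithmetic using iBAQ values, the abundance measure plotted in Extended Data Fig. 1f. iBAQ divides a protein's summed peptide intensity by its number of theoretically observable tryptic peptides. This is a minimal illustration under stated assumptions, not the authors' pipeline. The in-silico trypsin rule (cleave after K or R unless followed by P, no missed cleavages), the 6 to 30 residue "observable" length window and all function names are hypothetical choices made for this example.

```python
import math
import re

# Assumption for illustration: peptides of 6-30 residues count as "observable".
MIN_LEN, MAX_LEN = 6, 30


def tryptic_peptides(sequence: str) -> list[str]:
    """In-silico trypsin digest without missed cleavages.

    Cleaves C-terminal to K or R unless the next residue is P.
    """
    # Zero-width split (Python 3.7+) at every position after K/R not followed by P;
    # empty strings (e.g. after a C-terminal K/R) are dropped.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def observable_peptide_count(
    sequence: str, min_len: int = MIN_LEN, max_len: int = MAX_LEN
) -> int:
    """Number of tryptic peptides whose length falls inside the assumed window."""
    return sum(min_len <= len(p) <= max_len for p in tryptic_peptides(sequence))


def ibaq(summed_peptide_intensity: float, sequence: str) -> float:
    """iBAQ: summed peptide intensity / number of theoretically observable peptides."""
    n = observable_peptide_count(sequence)
    if n == 0:
        raise ValueError("no observable tryptic peptides for this sequence")
    return summed_peptide_intensity / n


def dynamic_range_orders(abundances: list[float]) -> float:
    """Orders of magnitude spanned by the positive values: log10(max / min)."""
    positive = [a for a in abundances if a > 0]
    if not positive:
        raise ValueError("need at least one positive abundance")
    return math.log10(max(positive) / min(positive))


if __name__ == "__main__":
    # Toy values, not data from the study.
    print(dynamic_range_orders([3.2e4, 1.0e7, 5.5e10]))  # about 6.2
```

Zero and missing values are excluded before taking the ratio because the logarithm of zero is undefined; in practice the reported range therefore depends on the detection limit of the measurement.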

**Fig. 1: Tissue map and multi-omics dataset.**

**Fig. 2: Data exploration in ATHENA and ProteomicsDB.**

**Fig. 3: Protein and mRNA expression.**

**Fig. 4: Characterization of protein complexes by protein co-expression and SEC–MS.**

**Fig. 5: Ascribing function to protein phosphorylation.**

Data availability

The data supporting the findings of this study are available within the paper, the Supplementary Information and the public repositories. Source Data for Figs. 1–5 and Extended Data Figs. 1–9 are included with the paper. Transcriptome sequencing and quantification data are available at ArrayExpress (www.ebi.ac.uk/arrayexpress) under the identifier E-MTAB-7978. The raw mass spectrometric data and MaxQuant result files have been deposited to the ProteomeXchange Consortium via PRIDE¹²², with the dataset identifier PXD013868.

References

1. Krämer, U. Planting molecular functions in an ecological context with Arabidopsis thaliana. eLife 4 (2015).
2. Peng, J. et al. ‘Green revolution’ genes encode mutant gibberellin response modulators. Nature 400, 256–261 (1999).
3. The Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815 (2000).
4. Kawakatsu, T. et al. Epigenomic diversity in a global collection of Arabidopsis thaliana accessions. Cell 166, 492–505 (2016).
5. Cheng, C. Y. et al. Araport11: a complete reannotation of the Arabidopsis thaliana reference genome. Plant J. 89, 789–804 (2017).
6. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45 (D1), D158–D169 (2017).
7. Baerenfaller, K. et al. Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320, 938–941 (2008).
8. van Wijk, K. J., Friso, G., Walther, D. & Schulze, W. X. Meta-analysis of Arabidopsis thaliana phospho-proteomics data reveals compartmentalization of phosphorylation motifs. Plant Cell 26, 2367–2389 (2014).
9. Durek, P. et al. PhosPhAt: the Arabidopsis thaliana phosphorylation site database. An update. Nucleic Acids Res. 38, D828–D834 (2010).
10. Sharma, K. et al. Ultradeep human phosphoproteome reveals a distinct regulatory nature of Tyr and Ser/Thr-based signaling. Cell Rep. 8, 1583–1594 (2014).
11. Schmidt, T. et al. ProteomicsDB. Nucleic Acids Res. 46 (D1), D1271–D1281 (2018).
12. Gessulat, S. et al. Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning. Nat. Methods 16, 509–518 (2019).
13. Bienvenut, W. V. et al. Comparative large scale characterization of plant versus mammal proteins reveals similar and idiosyncratic N-α-acetylation features. Mol. Cell. Proteomics 11, mcp.M111.015131 (2012).
14. Hazarika, R. R. et al. ARA-PEPs: a repository of putative sORF-encoded peptides in Arabidopsis thaliana. BMC Bioinformatics 18, 37 (2017).
15. Wilhelm, M. et al. Mass-spectrometry-based draft of the human proteome. Nature 509, 582–587 (2014).
16. Zheng, Y. et al. iTAK: a program for genome-wide prediction and classification of plant transcription factors, transcriptional regulators, and protein kinases. Mol. Plant 9, 1667–1670 (2016).
17. Yang, M. et al. A comprehensive analysis of protein phosphatases in rice and Arabidopsis. Plant Syst. Evol. 289, 111–126 (2010).
18. Litt, A. & Kramer, E. M. The ABC model and the diversification of floral organ identity. Semin. Cell Dev. Biol. 21, 129–137 (2010).
19. Bar-On, Y. M. & Milo, R. The global mass and average rate of rubisco. Proc. Natl Acad. Sci. USA 116, 4738–4743 (2019).
20. Gupta, R. et al. Time to dig deep into the plant proteome: a hunt for low-abundance proteins. Front. Plant Sci. 6, 22 (2015).
21. Galván-Ampudia, C. S. & Offringa, R. Plant evolution: AGC kinases tell the auxin tale. Trends Plant Sci. 12, 541–547 (2007).
22. Zhang, Y., He, J. & McCormick, S. Two Arabidopsis AGC kinases are critical for the polarized growth of pollen tubes. Plant J. 58, 474–484 (2009).
23. Eraslan, B. et al. Quantification and discovery of sequence determinants of protein-per-mRNA amount in 29 human tissues. Mol. Syst. Biol. 15, e8513 (2019).
24. Liu, Y., Beyer, A. & Aebersold, R. On the dependency of cellular protein levels on mRNA abundance. Cell 165, 535–550 (2016).
25. Hanson, G. & Coller, J. Codon optimality, bias and usage in translation and mRNA decay. Nat. Rev. Mol. Cell Biol. 19, 20–30 (2018).
26. Schwanhäusser, B. et al. Global quantification of mammalian gene expression control. Nature 473, 337–342 (2011).
27. Santner, A. & Estelle, M. The ubiquitin-proteasome system regulates plant hormone signaling. Plant J. 61, 1029–1040 (2010).
28. Luo, J., Zhou, J. J. & Zhang, J. Z. Aux/IAA gene family in plants: molecular structure, regulation, and function. Int. J. Mol. Sci. 19, E259 (2018).
29. Bai, B. et al. Seed stored mRNAs that are specifically associated to monosome are translationally regulated during germination. Plant Physiol. 182, 378–392 (2019).
30. Szklarczyk, D. et al. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 45 (D1), D362–D368 (2017).
31. Wang, Y., Tan, X. & Paterson, A. H. Different patterns of gene structure divergence following gene duplication in Arabidopsis. BMC Genomics 14, 652 (2013).
32. Lloyd, J. & Meinke, D. A comprehensive dataset of genes with a loss-of-function mutant phenotype in Arabidopsis. Plant Physiol. 158, 1115–1129 (2012).
33. Brandão, M. M., Dantas, L. L. & Silva-Filho, M. C. AtPIN: Arabidopsis thaliana protein interaction network. BMC Bioinformatics 10, 454 (2009).
34. Kristensen, A. R., Gsponer, J. & Foster, L. J. A high-throughput approach for measuring temporal changes in the interactome. Nat. Methods 9, 907–909 (2012).
35. Schwartz, D. & Gygi, S. P. An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets. Nat. Biotechnol. 23, 1391–1398 (2005).
36. Villén, J., Beausoleil, S. A., Gerber, S. A. & Gygi, S. P. Large-scale phosphorylation analysis of mouse liver. Proc. Natl Acad. Sci. USA 104, 1488–1493 (2007).
37. Battaglia, M., Olvera-Carrillo, Y., Garciarrubio, A., Campos, F. & Covarrubias, A. A. The enigmatic LEA proteins and other hydrophilins. Plant Physiol. 148, 6–24 (2008).
38. Bah, A. et al. Folding of an intrinsically disordered protein by phosphorylation as a regulatory switch. Nature 519, 106–109 (2015).
39. Mitra, S. K. et al. An autophosphorylation site database for leucine-rich repeat receptor-like kinases in Arabidopsis thaliana. Plant J. 82, 1042–1060 (2015).
40. Landry, C. R., Levy, E. D. & Michnick, S. W. Weak functional constraints on phosphoproteomes. Trends Genet. 25, 193–197 (2009).
41. Hauser, F., Li, Z., Waadt, R. & Schroeder, J. I. SnapShot: abscisic acid signaling. Cell 171, 1708–1708 (2017).
42. Vaddepalli, P. et al. The C2-domain protein QUIRKY and the receptor-like kinase STRUBBELIG localize to plasmodesmata and mediate tissue morphogenesis in Arabidopsis thaliana. Development 141, 4139–4148 (2014).
43. Fulton, L. et al. DETORQUEO, QUIRKY, and ZERZAUST represent novel components involved in organ development mediated by the receptor-like kinase STRUBBELIG in Arabidopsis thaliana. PLoS Genet. 5, e1000355 (2009).
44. Smyth, D. R., Bowman, J. L. & Meyerowitz, E. M. Early flower development in Arabidopsis. Plant Cell 2, 755–767 (1990).
45. Johnson-Brousseau, S. A. & McCormick, S. A compendium of methods useful for characterizing Arabidopsis pollen mutants and gametophytically-expressed genes. Plant J. 39, 761–775 (2004).
46. Sprunck, S. et al. Egg cell-secreted EC1 triggers sperm cell activation during double fertilization. Science 338, 1093–1097 (2012).
47. Karimi, M., Inzé, D. & Depicker, A. GATEWAY vectors for Agrobacterium-mediated plant transformation. Trends Plant Sci. 7, 193–195 (2002).
48. Clough, S. J. & Bent, A. F. Floral dip: a simplified method for Agrobacterium-mediated transformation of Arabidopsis thaliana. Plant J. 16, 735–743 (1998).
49. Schmid, M. et al. A gene expression map of Arabidopsis thaliana development. Nat. Genet. 37, 501–506 (2005).
50. Boyes, D. C. et al. Growth stage-based phenotypic analysis of Arabidopsis: a model for high throughput functional genomics in plants. Plant Cell 13, 1499–1510 (2001).
51. Bowman, J. L. Arabidopsis: an Atlas of Morphology and Development (Springer-Verlag, 1994).
52. Bradford, M. M. A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal. Biochem. 72, 248–254 (1976).
53. Ruprecht, B. et al. Optimized enrichment of phosphoproteomes by Fe-IMAC column chromatography. Methods Mol. Biol. 1550, 47–60 (2017).
54. Marx, H. et al. A large synthetic peptide and phosphopeptide reference library for mass spectrometry-based proteomics. Nat. Biotechnol. 31, 557–564 (2013).
55. Ruprecht, B., Zecha, J., Zolg, D. P. & Kuster, B. High pH reversed-phase micro-columns for simple, sensitive, and efficient fractionation of proteome and (TMT labeled) phosphoproteome digests. Methods Mol. Biol. 1550, 83–98 (2017).
56. Smith, P. K. et al. Measurement of protein using bicinchoninic acid. Anal. Biochem. 150, 76–85 (1985).
57. Zolg, D. P. et al. PROCAL: a set of 40 peptide standards for retention time indexing, column performance monitoring, and collision energy calibration. Proteomics 17 (2017).
58. Hahne, H. et al. DMSO enhances electrospray response, boosting sensitivity of proteomic experiments. Nat. Methods 10, 989–991 (2013).
59. Bian, Y. et al. Robust, reproducible and quantitative analysis of thousands of proteomes by micro-flow LC-MS/MS. Nat. Commun. 11, 157 (2020).
60. Tyanova, S., Temu, T. & Cox, J. The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat. Protocols 11, 2301–2319 (2016).
61. Hanada, K. et al. sORF finder: a program package to identify small open reading frames with high coding potential. Bioinformatics 26, 399–400 (2010).
62. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
63. Li, W., Jaroszewski, L. & Godzik, A. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 17, 282–283 (2001).
64. Perkins, D. N., Pappin, D. J., Creasy, D. M. & Cottrell, J. S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567 (1999).
65. Franken, H. et al. Thermal proteome profiling for unbiased identification of direct and indirect drug targets using multiplexed quantitative mass spectrometry. Nat. Protocols 10, 1567–1593 (2015).
66. Toprak, U. H. et al. Conserved peptide fragmentation as a benchmarking tool for mass spectrometers and a discriminating feature for targeted proteomics. Mol. Cell. Proteomics 13, 2056–2071 (2014).
67. Oñate-Sánchez, L. & Vicente-Carbajosa, J. DNA-free RNA isolation protocols for Arabidopsis thaliana, including seeds and siliques. BMC Res. Notes 1, 93 (2008).
68. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
69. Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
70. Silva, J. C., Gorenstein, M. V., Li, G. Z., Vissers, J. P. & Geromanos, S. J. Absolute quantification of proteins by LCMSE: a virtue of parallel MS acquisition. Mol. Cell. Proteomics 5, 144–156 (2006).
71. The Gene Ontology Consortium. Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res. 45 (D1), D331–D338 (2017).
72. Cox, J. & Mann, M. 1D and 2D annotation enrichment: a statistical method integrating quantitative proteomics with complementary high-throughput data. BMC Bioinformatics 13 (Suppl. 16), S12 (2012).
73. Tyanova, S. et al. The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat. Methods 13, 731–740 (2016).
74. Olsen, J. V. et al. Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 127, 635–648 (2006).
75. Uhlén, M. et al. Transcriptomics resources of human tissues and organs. Mol. Syst. Biol. 12, 862 (2016).
76. Rijpkema, A. S., Vandenbussche, M., Koes, R., Heijmans, K. & Gerats, T. Variations on a theme: changes in the floral ABCs in angiosperms. Semin. Cell Dev. Biol. 21, 100–107 (2010).
77. Heazlewood, J. L., Verboom, R. E., Tonti-Filippini, J., Small, I. & Millar, A. H. SUBA: the Arabidopsis Subcellular Database. Nucleic Acids Res. 35, D213–D218 (2007).
78. Löytynoja, A. Phylogeny-aware alignment with PRANK. Methods Mol. Biol. 1079, 155–170 (2014).
79. Castresana, J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. 17, 540–552 (2000).
80. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
81. van der Graaf, A. et al. Rate, spectrum, and evolutionary dynamics of spontaneous epimutations. Proc. Natl Acad. Sci. USA 112, 6676–6681 (2015).
82. Gebert, D., Jehn, J. & Rosenkranz, D. Widespread selection for extremely high and low levels of secondary structure in coding sequences across all domains of life. Open Biol. 9, 190020 (2019).
83. Camiolo, S., Melito, S. & Porceddu, A. New insights into the interplay between codon bias determinants in plants. DNA Res. 22, 461–470 (2015).
84. Drummond, D. A., Bloom, J. D., Adami, C., Wilke, C. O. & Arnold, F. H. Why highly expressed proteins evolve slowly. Proc. Natl Acad. Sci. USA 102, 14338–14343 (2005).
85. Das, S. & Bansal, M. Variation of gene expression in plants is influenced by gene architecture and structural properties of promoters. PLoS ONE 14, e0212678 (2019).
86. Celaj, A. et al. Quantitative analysis of protein interaction network dynamics in yeast. Mol. Syst. Biol. 13, 934 (2017).
87. Niederhuth, C. E. et al. Widespread natural variation of DNA methylation within angiosperms. Genome Biol. 17, 194 (2016).
88. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
89. Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44, 821–824 (2012).
90. Nakazawa, N. fmsb: functions for medical statistics book with some demographic data. R package v.0.6.3; https://CRAN.R-project.org/package=fmsb (2018).
91. Zhang, Z. Variable selection with stepwise and best subset approaches. Ann. Transl. Med. 4, 136 (2016).
92. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996).
93. R Core Team. R: A language and environment for statistical computing. https://www.R-project.org/ (R Foundation for Statistical Computing, 2014).
94. Knecht, W. Pilot Willingness to Take Off Into Marginal Weather, Part II: Antecedent Overfitting With Forward Stepwise Logistic Regression. Final Report DOT/FAA/AM-05/15 (Federal Aviation Administration, 2005).
95. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).
96. Groemping, U. Relative importance for linear regression in R: the package relaimpo. J. Stat. Softw. 17, 1–27 (2007).
97. Heusel, M. et al. Complex-centric proteome profiling by SEC-SWATH-MS. Mol. Syst. Biol. 15, e8438 (2019).
98. McBride, Z., Chen, D., Reick, C., Xie, J. & Szymanski, D. B. Global analysis of membrane-associated protein oligomerization using protein correlation profiling. Mol. Cell. Proteomics 16, 1972–1989 (2017).
99. Ruepp, A. et al. CORUM: the comprehensive resource of mammalian protein complexes–2009. Nucleic Acids Res. 38, D497–D501 (2010).
100. Zhang, B. & Horvath, S. A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Mol. Biol. 4, Article 17 (2005).
101. Langfelder, P., Zhang, B. & Horvath, S. Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics 24, 719–720 (2008).
102. Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45 (D1), D353–D361 (2017).
103. Fabregat, A. et al. The Reactome pathway Knowledgebase. Nucleic Acids Res. 44 (D1), D481–D487 (2016).
104. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289–300 (1995).
105. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
106. List, M. et al. KeyPathwayMinerWeb: online multi-omics network enrichment. Nucleic Acids Res. 44 (W1), W98–W104 (2016).
107. Letunic, I. & Bork, P. 20 years of the SMART protein domain annotation resource. Nucleic Acids Res. 46 (D1), D493–D496 (2018).
108. Wagih, O. ggseqlogo: a versatile R package for drawing sequence logos. Bioinformatics 33, 3645–3647 (2017).
109. Goel, R., Harsha, H. C., Pandey, A. & Prasad, T. S. Human Protein Reference Database and Human Proteinpedia as resources for phosphoproteome analysis. Mol. Biosyst. 8, 453–463 (2012).
110. Zourelidou, M. et al. The polarly localized D6 PROTEIN KINASE is required for efficient auxin transport in Arabidopsis thaliana. Development 136, 627–636 (2009).
111. Mayer, U., Büttner, G. & Jürgens, G. Apical-basal pattern formation in the Arabidopsis embryo: studies on the role of the gnom gene. Development 117, 149–162 (1993).
112. Moes, D., Himmelbach, A., Korte, A., Haberer, G. & Grill, E. Nuclear localization of the mutant protein phosphatase abi1 is required for insensitivity towards ABA responses in Arabidopsis. Plant J. 54, 806–819 (2008).
113. Tischer, S. V. et al. Combinatorial interaction network of abscisic acid receptors and coreceptors from Arabidopsis thaliana. Proc. Natl Acad. Sci. USA 114, 10280–10285 (2017).
114. Waterhouse, A. et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 46 (W1), W296–W303 (2018).
115. Nishimura, N. et al. Structural mechanism of abscisic acid binding and signaling by dimeric PYR1. Science 326, 1373–1379 (2009).
116. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
117. Pettersen, E. F. et al. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
118. Box, M. S., Coustham, V., Dean, C. & Mylne, J. S. Protocol: A simple phenol-based method for 96-well extraction of high quality RNA from Arabidopsis. Plant Methods 7, 7 (2011).
119. Enugutti, B. et al. Regulation of planar growth by the Arabidopsis AGC protein kinase UNICORN. Proc. Natl Acad. Sci. USA 109, 15060–15065 (2012).
120. Koncz, C. & Schell, J. The promoter of TL-DNA gene 5 controls the tissue-specific expression of chimaeric genes carried by a novel type of Agrobacterium binary vector. Mol. Gen. Genet. 204, 383–396 (1986).
121. Schindelin, J. et al. Fiji: an open-source platform for biological-image analysis. Nat. Methods 9, 676–682 (2012).
122. Vizcaíno, J. A. et al. 2016 update of the PRIDE database and its related tools. Nucleic Acids Res. 44 (D1), D447–D456 (2016).
123. Kwok, S. F. et al. Arabidopsis homologs of a c-Jun coactivator are present both in monomeric form and in the COP9 complex, and their abundance is differentially affected by the pleiotropic cop/det/fus mutations. Plant Cell 10, 1779–1790 (1998).

Acknowledgements

We thank the NGS@tum core facility for RNA sequencing, R. Tofanelli for help with imaging the ovules, R. J. Schmitz for providing data access for the feature analysis, and M. Reinecke, F. Bayer and S. Galinec for mass spectrometry measurements. This work was funded in part by the German Research Foundation (DFG, SFB924), a research fellowship to H.S. from the Japan Society for the Promotion of Science, and a research fellowship to X.C. from the Chinese Research Council.

Author information

Authors and Affiliations

Chair of Proteomics and Bioanalytics, Technical University of Munich (TUM), Freising, Germany
Julia Mergner, Martin Frejno, Patroklos Samaras, Daniel P. Zolg, Tobias Schmidt, Mathias Wilhelm & Bernhard Kuster
Chair of Experimental Bioinformatics, Technical University of Munich (TUM), Freising, Germany
Markus List & Jan Baumbach
Chair of Botany, Technical University of Munich (TUM), Freising, Germany
Michael Papacek & Erwin Grill
Plant Developmental Biology, Technical University of Munich (TUM), Freising, Germany
Xia Chen, Ajeet Chaudhary & Kay Schneitz
Center for Plant Molecular Biology, University of Tübingen, Tübingen, Germany
Sandra Richter & Gerd Jürgens
Chair of Plant Systems Biology, Technical University of Munich (TUM), Freising, Germany
Hiromasa Shikata & Claus Schwechheimer
Division of Plant Environmental Responses, National Institute for Basic Biology, Okazaki, Japan
Hiromasa Shikata
Plant Genome and Systems Biology, Helmholtz Center Munich, German Research Center for Environmental Health, Munich-Neuherberg, Germany
Maxim Messerer, Daniel Lang & Klaus F. X. Mayer
Institute of Network Biology (INET), Helmholtz Center Munich, German Research Center for Environmental Health, Munich-Neuherberg, Germany
Stefan Altmann & Pascal Falter-Braun
Cell Biology and Plant Biochemistry, University of Regensburg, Regensburg, Germany
Philipp Cyprys & Stefanie Sprunck
Cellzome GmbH, Heidelberg, Germany
Toby Mathieson & Marcus Bantscheff
Population Epigenetics and Epigenomics, Technical University of Munich (TUM), Freising, Germany
Rashmi R. Hazarika & Frank Johannes
Institute of Advanced Study (IAS), Technical University of Munich (TUM), Freising, Germany
Rashmi R. Hazarika, Frank Johannes & Bernhard Kuster
Chair of Food Chemistry and Molecular Sensory Science, Technical University of Munich (TUM), Freising, Germany
Corinna Dawid, Andreas Dunkel & Thomas Hofmann
Chair of Microbe-Host Interactions, Ludwig-Maximilians-University (LMU), Munich, Germany
Pascal Falter-Braun
Plant Genome Biology, Technical University of Munich (TUM), Freising, Germany
Klaus F. X. Mayer
Bavarian Biomolecular Mass Spectrometry Center (BayBioMS), Technical University of Munich (TUM), Freising, Germany
Bernhard Kuster

Contributions

J.M. performed (phospho)proteomic and transcriptomic experiments under the supervision of B.K. S.R. and H.S. performed AGC kinase experiments in plants under the supervision of G.J. and C.S. M.P., A.C. and X.C. performed phosphomutant analysis under the supervision of E.G. and K.S. P.C. and S.S. generated and provided plant material. J.M., M.F., M.M., D.L., S.A., D.P.Z., T.M., C.D., A.D. and R.R.H. performed data analysis under the supervision of B.K., K.F.X.M., P.F., M.B., T.H. and F.J. M.L., P.S. and T.S. generated Arabidopsis resource databases under the supervision of M.W. and J.B. J.M., C.S. and B.K. conceptualized the project and wrote the manuscript. All authors edited the manuscript.

Corresponding author

Correspondence to Bernhard Kuster.

Ethics declarations

Competing interests

M.W. and B.K. are founders and shareholders of OmicScouts GmbH and msAId GmbH. They have no operational role in the companies. M.F. and D.P.Z. are founders and shareholders of msAId GmbH. T.M. and M.B. are employees and/or shareholders of Cellzome GmbH. The remaining authors declare no competing interests.

Additional information

Peer review information Nature thanks José Dinneny, Paul Haynes and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Descriptive analysis of the multi-omic tissue atlas.

a, Pairwise global Pearson’s expression correlation analysis of all 30 tissues (n = 1 measurement per tissue) at the transcriptome level (bottom triangle) and proteome level (top triangle) using all identified gene loci. Tissues correlate more strongly with each other at the proteome level than at the transcriptome level. Turquoise squares mark examples of morphologically highly similar tissues. Tissues are coloured as in Fig. 1. b, Scatter plots showing highly reproducible abundance measurements for transcript (top) and protein (bottom) in morphologically similar tissues that were marked in a; namely, node (ND) versus internode (IND), leaf distal (LFD) versus leaf proximal (LFP) and root (RT) versus root upper zone (RTUZ). r denotes the Pearson’s correlation coefficient; n denotes the number of transcripts or proteins. c, Percentage of genes encoded by a specific chromosome that were identified at the transcriptome, proteome or phosphoproteome level. d, Percentage of Swiss-Prot and TrEMBL protein database entries as well as protein evidence categories from UniProt that were identified at the transcriptome, proteome or phosphoproteome level. Evidence level: (1) protein evidence; (2) transcript evidence; (3) homology; (4) predicted; and (5) uncertain. e, Comparison of protein identifications between an earlier Arabidopsis proteome study⁷ based on 12 tissues, this study (30 tissues) and the number of protein-coding genes in Araport11. f, iBAQ intensity distribution of proteins identified in this study. Proteins also identified in a previous study⁷ are projected into the same plot. g, Left, proportion of identified P-sites on S, T or Y residues with highly confident localization of the phosphorylation site within the identified peptide sequence (termed class I P-sites if the localization score is greater than 0.75). Right, distribution of proteins for which phosphorylated S, T or Y residues were identified.
h, Left, Venn diagram comparing phosphoprotein datasets from a previous publication⁸, PhosPhAT4.0 and this study. Right, Venn diagram comparing P-site localization confidence between class I sites identified in this study and the low and high confidence datasets reported in a previous publication⁸.

Source data
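
The class I criterion in panel g reduces to a threshold on the localization score. The Python sketch below is illustrative only: it assumes a hypothetical input of (residue, localization score) pairs, applies the > 0.75 cut-off stated in the legend, and reports the share of class I P-sites on S, T and Y. It is not the processing pipeline used in the study.

```python
from collections import Counter

CLASS_I_CUTOFF = 0.75  # class I: localization score greater than 0.75

def class_i_residue_fractions(sites):
    """Fraction of class I P-sites that fall on S, T and Y.

    sites: iterable of (residue, localization_score) tuples (hypothetical format).
    """
    class_i = [residue for residue, score in sites if score > CLASS_I_CUTOFF]
    counts = Counter(class_i)
    total = len(class_i)
    return {residue: (counts[residue] / total if total else 0.0) for residue in "STY"}

# Hypothetical sites; the last one sits exactly at 0.75 and is therefore not class I.
sites = [("S", 0.99), ("S", 0.80), ("T", 0.76), ("Y", 0.50), ("S", 0.75)]
print(class_i_residue_fractions(sites))
```

Because the comparison is strict, a score of exactly 0.75 is excluded, matching the "greater than 0.75" wording.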

Extended Data Fig. 2 Proteogenomics and dynamic range of transcript and protein expression.

a, Number of identified N-terminal (NT) or C-terminal (CT) peptides of proteins in either unmodified or phosphorylated form. b, Frequency of amino acids following the initiator methionine in N-terminal peptides with (−X) or without (M−X) cleavage of the initiator methionine. X denotes the amino acid after the start codon. c, Frequency of protein N-terminal acetylation for amino acids in b. Because trypsin was used for protein digestion, the frequencies for Arg and Lys residues could not be determined (n.d.). d, Distribution of peptide-based sequence coverage of proteins in individual tissues and for the combined dataset (tissue abbreviations as in Fig. 1). Boxes contain 50% of the data and show the median as a black line. The top and bottom quartile ranges are shown as whiskers. The number of proteins is indicated for each tissue. e, Pie charts showing the percentage of proteins identified by <3, 3–10 or >10 peptides either allowing shared (razor) peptides or restricting to unique peptides only. f, Left, number of protein isoforms detected at the transcript and protein level compared with the number of all annotated isoforms in Araport11. Right, number of multiple isoforms of the same gene distinguished at the peptide level. g, Validation of protein isoform and sORF identification by comparing the tandem mass spectra from the tissue atlas to those of synthetic peptide reference standards. The normalized spectral contrast angle (SA) was used as a similarity metric (Methods). Candidate isoforms and sORFs were considered valid if the spectral contrast angle of the spectra was >0.7. These data are reported in Supplementary Data 3. h, Amino acid sequence and mirror plots of tandem mass spectra for two peptides of the sORF BIP138_4. The spectra pointing upwards were collected from tissue digests; those pointing downwards were collected from synthetic peptides. 
The normalized spectral contrast angle and Pearson’s correlation coefficient (r) were used as similarity metrics (Methods) and indicate that both high-scoring spectra (n = 1 acquired spectra) are near identical, thus validating the identification of this sORF as an expressed protein. i, Dynamic range of transcript abundance (grey) and proportion of transcripts that were also identified at the protein level projected into this plot (blue). OM, orders of magnitude. Note that for lower abundance transcripts, fewer proteins were detected. j, Dynamic range of protein abundance and proportion of proteins with phosphorylation evidence. Protein abundance spans six orders of magnitude, whereas transcript abundance only spans four (i). In addition, note that phosphorylation was detected across the entire protein abundance range. k, Percentage of all annotated kinases (K), phosphatases (P), transcription factors (TF) and transcription regulators (TR) detected at the transcript, protein or phosphoprotein levels. Numbers below the x axis denote the number of genes for these protein classes in the A. thaliana genome.

Source data
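
Panels g and h use the normalized spectral contrast angle to compare tissue spectra with synthetic-peptide spectra. A commonly used form of this metric is SA = 1 − 2·arccos(cos θ)/π, where cos θ is the normalized dot product of the two fragment-ion intensity vectors. The sketch below assumes both spectra have already been matched to the same ordered list of fragment ions; peak matching, intensity transformation and any other details from the Methods are not reproduced. It then applies the > 0.7 acceptance threshold quoted in the legend.

```python
import math

def normalized_spectral_contrast_angle(a, b):
    """Return SA = 1 - 2 * arccos(cos_sim) / pi for two aligned intensity vectors.

    a, b: intensities of the same ordered fragment ions (0.0 where an ion is absent).
    SA is 1.0 for proportional spectra and 0.0 for spectra sharing no fragment ions.
    """
    if len(a) != len(b):
        raise ValueError("spectra must be aligned to the same fragment ions")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum cannot match anything
    cos_sim = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_sim = max(-1.0, min(1.0, cos_sim))  # clamp rounding error before acos
    return 1.0 - 2.0 * math.acos(cos_sim) / math.pi

SA_CUTOFF = 0.7  # acceptance threshold quoted in the legend

# Hypothetical aligned fragment-ion intensities (tissue spectrum vs synthetic peptide).
tissue = [120.0, 45.0, 300.0, 0.0, 80.0]
synthetic = [110.0, 50.0, 280.0, 5.0, 90.0]
sa = normalized_spectral_contrast_angle(tissue, synthetic)
print(f"SA = {sa:.3f}, accepted: {sa > SA_CUTOFF}")
```

Mapping the angle onto [0, 1] makes the score more discriminating near the top end than the raw cosine, which tends to crowd very close to 1 for good matches.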

Extended Data Fig. 3 Descriptive analysis of transcript and protein expression in tissues.

a, Distribution of expression specificity categories for protein and transcript identifications. See Methods for the definition of these categories. In brief, there are very few transcripts and proteins that are only expressed in a single tissue. The quantities of the shared transcripts or proteins can differ vastly between tissues (b). b, Left, protein identifications shared between flower (FL) and flower organs showing an almost complete qualitative overlap of proteins. Sepal (SP), petal (PT), stamen (ST), carpel (CP). Right, clustering of z-scored protein intensities showing distinct quantitative expression differences between flower organs. c, Expression analysis of flower organ identity markers at the protein and transcript level. PISTILLATA (PI, green), APETALA3 (AP3, red), APETALA1 (AP1, orange), AGAMOUS (AG, blue). The expression of these markers is in line with the model of flower organ identity (AP1 marking sepal; AP1, AP3 and PI marking petal; AG, AP3 and PI marking stamen; and AG marking carpel). d, Total number of transcripts plotted against the total number of proteins detected in each individual tissue (n = 30 tissues) showing that the more genes are expressed as mRNAs, the more proteins can be detected in a tissue (Pearson’s correlation r = 0.79). Tissues are coloured according to tissue groups as in Fig. 1. e, Cumulative abundance plots of intensity-ranked identifications of transcripts and proteins for five representative tissues. The five most abundant transcripts and proteins are listed in descending order for each tissue. The most abundant transcripts and the most abundant proteins are generally not the same. In addition, note that the shapes of the curves differ between tissues. In flower, the protein line rises more quickly than the transcript line. The opposite is true for pollen, and a more even profile is observed in seed. f, Distribution of shared and unique identifications among the 100 most abundant transcripts and proteins in each tissue.
Relatively few proteins and transcripts are found together on the list of the 100 most abundant transcripts and proteins. This demonstrates that the quantitative differences in transcript and protein expression are more important in defining a tissue than the qualitative expression of transcripts or proteins. g, List of 11 proteins that were the most abundant protein in at least one tissue, and their proportion of the total iBAQ intensity in each tissue. Individual proteins can represent up to 9% of the total protein in a given tissue. h, Principal component analysis (PCA) of the core tissue proteomes and transcriptomes (that is, the proteins and transcripts that were identified in every tissue) using z-scored abundances. Only about 30% of all proteins and 20% of all mRNAs were detected in all 30 tissues, even though all tissues were deeply profiled at both the protein and transcript level. This shows that strong qualitative and quantitative expression differences exist between tissues. The PCA separates tissues into photosynthetically active versus inactive tissues (component 1) and separates pollen from all other tissues (component 2), indicating that the molecular composition of pollen is particularly different from all other tissues. i, Proportion of the total summed protein intensity for genes with specific subcellular compartment annotation (from SUBA⁷⁷; Methods) in the different tissue groups. The comparison of photosynthetically active and inactive tissues shows that most of the protein content in photosynthetically active tissues is contained in the plastids, whereas most protein is found in the cytosol for photosynthetically inactive tissues. Proteins with a single subcellular compartment annotation were selected for the plot, and the proportions of their iBAQ intensities were averaged for each tissue group.
Nucleus (n = 1,393), endoplasmic reticulum (n = 58), Golgi (n = 68), peroxisome (n = 67), plastid (n = 525), mitochondrion (n = 317), vacuole (n = 71), cytosol (n = 385), cytoskeleton (n = 1), plasma membrane (n = 268), extracellular (n = 351).

Source data
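
Panels e, g and h rest on three simple transformations of iBAQ intensities: each protein's share of a tissue's total intensity, the cumulative share of intensity-ranked proteins, and z-scoring of a protein's abundance across tissues. The Python sketch below illustrates these under stated assumptions: the input dictionaries are hypothetical, z-scores use the population standard deviation, and values would normally be log-transformed before z-scoring. The exact choices in the Methods may differ.

```python
import statistics
from itertools import accumulate

def share_of_total(intensities):
    """Fraction of a tissue's summed iBAQ intensity contributed by each protein."""
    total = sum(intensities.values())
    return {protein: value / total for protein, value in intensities.items()}

def cumulative_share(intensities):
    """Cumulative fraction of total intensity after ranking proteins by intensity."""
    ranked = sorted(intensities.values(), reverse=True)
    total = sum(ranked)
    return [running / total for running in accumulate(ranked)]

def z_score(values):
    """Z-score one protein's (ideally log-transformed) abundances across tissues."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population s.d.; an assumption
    return [0.0 if sd == 0 else (v - mean) / sd for v in values]

# Hypothetical iBAQ values for one tissue.
tissue = {"P1": 9.0e9, "P2": 4.0e9, "P3": 87.0e9}
print(share_of_total(tissue)["P1"], cumulative_share(tissue))
```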

Extended Data Fig. 4 Relationships between transcript and protein levels.

a, Pearson’s correlation (r) of transcriptome and proteome expression (core datasets; n = 5,043) for each tissue. b, Pearson’s correlation between measured and predicted protein abundance levels in all tissues. Predicted protein abundance levels were obtained from the best-fitting feature selection model for each tissue (Methods). The number of genes used for the correlation analysis is indicated for each tissue. c, Violin plots showing the spread in relative contribution of selected features to the prediction of gene-level protein abundance across tissues (n = 30 tissues) using our model. Violin shapes show the kernel density estimation of the data distribution and the median as a white dot. Thick black bars denote the interquartile range. d, Specific nucleotide sequence motifs in 5′ UTRs of mRNAs contribute to the prediction of protein levels in a subset of tissues. Clustering tissues based on the presence or absence of detected 5′ UTR motifs shows that several features are repeatedly selected for inclusion in the model while others appear to be more tissue-specific. e, On the basis of the observation that the d_N/d_S ratio between orthologues of A. thaliana and A. lyrata contributed to the prediction of protein levels (c), we analysed this feature in more detail. Left, distribution of the d_N/d_S ratio for orthologous genes in A. thaliana and A. lyrata. The distribution is plotted for the example of ‘leaf distal’ (n = 6,447 genes). To compare evolutionarily conserved genes (defined by low d_N/d_S ratios) and genes that evolve neutrally or are under positive selection (high d_N/d_S ratios), we selected the bottom 5% and top 5% of the d_N/d_S ratio distribution, respectively. Right, evolutionarily conserved genes (low d_N/d_S ratio) show 10–20 times higher protein abundance than genes with high d_N/d_S ratios. Boxes contain 50% of the data and show the median as a black line. Whiskers denote 1.5 times the interquartile range.
Outliers were omitted from the plot for clarity. f, Time-course analysis of median protein abundance changes after treatment with CHX (translation block) or MG132 (proteasome block) versus time-matched DMSO control samples (Methods). Boxes contain 50% of the data and show medians as black lines. Whiskers denote 1.5 times the interquartile range. Outliers were omitted from the plot for clarity but were included in the statistical tests below. All proteins in the experiment (n = 8,920, grey), proteins that have a high PTR in seed (n = 425, red) or a low PTR in seed (n = 254, blue) (defined as in Fig. 3d) are shown. Differences between time points were tested for significance within each subset (all; high PTR; low PTR) using one-way ANOVA and the post hoc Tukey HSD test. ***P < 0.001 (all_CHX8–CHX16: P < 1 × 10⁻⁷; all_CHX8–CHX24: P < 1 × 10⁻⁷; all_CHX16–CHX24: P = 0.0002; highPTR_CHX8–CHX24: P = 0.0003; lowPTR_CHX8–CHX16: P = 0.0000004; lowPTR_CHX8–CHX24: P < 1 × 10⁻⁷; lowPTR_CHX16–CHX24: P < 1 × 10⁻⁷). g, Representative images of seeds after 4 days of incubation with CHX, MG132 or DMSO control medium (n = 1). Germination was completely inhibited by CHX and partially inhibited by MG132, showing that the drug treatments were effective.

Source data
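
Panel e compares genes in the bottom 5% and top 5% of the d_N/d_S distribution. A minimal Python sketch of that selection follows, assuming hypothetical dictionaries that map gene identifiers to d_N/d_S ratios and to protein abundance. Tails are taken by rank, so ties at the cut-off are broken arbitrarily, and the median abundance of each group is returned for comparison.

```python
import statistics

def extreme_dnds_groups(dnds, abundance, fraction=0.05):
    """Median protein abundance of genes in the bottom and top `fraction` of dN/dS.

    dnds: dict gene -> dN/dS ratio; abundance: dict gene -> protein abundance.
    """
    genes = sorted(dnds, key=dnds.get)      # genes ordered by increasing dN/dS
    k = max(1, int(len(genes) * fraction))  # number of genes in each tail
    conserved, high_dnds = genes[:k], genes[-k:]
    return (statistics.median([abundance[g] for g in conserved]),
            statistics.median([abundance[g] for g in high_dnds]))

# Hypothetical data: 100 genes whose abundance falls as dN/dS rises.
dnds = {f"g{i}": i / 100 for i in range(100)}
abundance = {f"g{i}": 1000.0 - 10 * i for i in range(100)}
print(extreme_dnds_groups(dnds, abundance))
```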

Extended Data Fig. 5 Correlations between transcriptomes, proteomes and phosphoproteomes.

a, Median PTRs across tissues plotted against the inter-tissue variation of these PTRs (expressed as median absolute deviation, MAD; proteins and transcripts had to be detected in at least 10 matching tissues to be included in the analysis). Arrows denote examples of genes with high PTRs (rbcL and petA) and low PTRs (IAA8 and IAA13). Bar plot shows the MAD range segmented into five quantiles, each containing the same number of genes (coloured bars and dashed lines). Most genes have reasonably stable PTRs across tissues. b, As in a (dataset n = 14,069) but for transcript (left) and protein (right) measurements. There is more variation in protein levels across tissues than in mRNA levels (80% of all transcripts show a MAD of <1; 80% of all proteins show a MAD of <1.2). Low-abundance proteins also vary more across tissues than high-abundance proteins. This may in part be due to technical limitations, as low-abundance proteins are generally quantified less accurately. c, As in a but for the ratio of phosphorylation site versus protein abundance. P-sites and proteins had to be detected in at least 10 matching tissues to be included in the analysis (n = 13,793). d, As in b (dataset n = 13,793) but for P-site abundance. P-site abundance shows greater variation across tissues than protein abundance (60% of all P-sites show MAD <1 compared with 80% of all proteins; see b). Again, this may in part be due to technical limitations, as P-site quantification is performed at the peptide level and does not benefit from aggregating multiple peptide quantifications into one value, as protein quantification does.

Source data
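
Panels a and c summarize, for each gene or P-site, the median ratio across tissues and its inter-tissue spread as a MAD, using only features detected in at least 10 matching tissues. The sketch below shows this for PTRs under assumptions the legend does not state: the PTR is taken as log10(protein/transcript), abundances are hypothetical iBAQ and TPM values keyed by tissue, and the MAD is left unscaled (no 1.4826 consistency factor).

```python
import math
import statistics

MIN_TISSUES = 10  # features must be detected in at least 10 matching tissues

def ptr_summary(protein, transcript):
    """Median and MAD of log10 protein-to-RNA ratios across matching tissues.

    protein, transcript: dicts tissue -> abundance; undetected tissues are absent.
    Returns None when fewer than MIN_TISSUES tissues have both measurements.
    """
    shared = [t for t in protein
              if t in transcript and protein[t] > 0 and transcript[t] > 0]
    if len(shared) < MIN_TISSUES:
        return None
    log_ptr = [math.log10(protein[t] / transcript[t]) for t in shared]
    med = statistics.median(log_ptr)
    mad = statistics.median([abs(x - med) for x in log_ptr])  # unscaled MAD
    return med, mad

# Hypothetical gene with a constant 100-fold protein-to-transcript ratio.
tpm = {f"T{i}": float(i + 1) for i in range(10)}
ibaq = {t: 100.0 * v for t, v in tpm.items()}
print(ptr_summary(ibaq, tpm))
```

The MAD is preferred over the standard deviation here because a single outlier tissue barely moves it.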

Extended Data Fig. 6 Inferring redundant gene function and physical interactions from co-expression analysis.

a, Scatter plot of Pearson’s correlation coefficients (r) as a measure of co-expression across tissues for all pairs of proteins (x axis) and all pairs of transcripts (y axis) (core dataset only, n = 5,043) along with their marginal histograms. Colours denote the log₁₀-normalized STRING scores of individual gene pairs as a measure of known or predicted direct (physical) or indirect (functional) associations. Gene pairs whose transcripts, proteins or both are strongly co-expressed are more strongly related (physically or functionally) than pairs that are not. b, Co-expression analysis of duplicated genes (pairs had to be detected in at least 10 matching tissues to be included in the analysis). The density plots show the distribution of Pearson’s correlation coefficients (r) of co-expressed transcripts (grey) or proteins (blue) for genes that arose by whole-genome duplications (WGD), local duplications or transposon-mediated duplications. Randomly selected gene pairs are shown as a control. Medians are given and displayed as dotted lines. There is substantial co-expression of duplicated genes, indicating that these genes probably have redundant functions. c, Left, protein-level Pearson’s correlation coefficient (r) values (from b) for all duplicate gene pairs (WGD, local, transposed) plotted against the protein abundance ratio of each pair (average across 30 tissues) (Methods). Blue arrows denote examples of high and low ratios of protein production for the duplicated genes. Right, example of tissue-resolved protein intensity proportions (top-3) (Methods) for the duplicate pair MAC5A and MAC5B. Irrespective of the tissue, MAC5A is always expressed at much higher levels than MAC5B. Tissues are coloured as in Fig. 1. d, Top, ranked protein abundance ratio for selected duplicate pairs (mean ± s.d.; n = 30) and annotated for phenotypic effects (bottom) in the loss-of-function mutant for either duplicate 1 or duplicate 2 (+).
Minus symbols denote absence of a phenotypic effect. Asymmetric protein production within duplicate pairs can be associated with the occurrence of a phenotype in the loss-of-function mutant of the more highly expressed duplicate, indicating a dominant functional role of the more highly expressed protein. Blue arrows highlight MAC5A–MAC5B and PHB3–PHB4 as examples. e, Inference of physical protein–protein interactions from co-expression data. Distribution of pairwise Pearson’s correlation coefficients (r), computed across at least 10 tissues, of co-expressed proteins that are subunits of selected protein complexes. r > 0.5 (shaded in grey) was chosen as a cut-off for the selection of proteins for subsequent analysis to make sure that proteins present in well-characterized protein complexes are retained. CONSTITUTIVE PHOTOMORPHOGENESIS9 SIGNALOSOME (CSN), CELLULOSE SYNTHASE (CESA). f, Recovery of annotated protein–protein interactions by co-expression analysis. Distribution of Pearson’s correlation coefficients (r) of pairs of transcripts (grey) or proteins (blue) that are annotated to interact physically in the AtPIN database³³ (pairs had to be detected in at least 10 matching tissues to be included in the analysis). Subsets of the AtPIN database are shown: interactions detected by the yeast two-hybrid (Y2H) method, by affinity purification–mass spectrometry (AP–MS) or by both. Protein pairs with r > 0.5 are shaded in blue. Dotted lines denote median values. Co-expression recovers only a minority of annotated physical interactions, and interactions supported by more than one line of experimental evidence also tend to show stronger co-expression.

Source data
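
Panels b, e and f compute Pearson's r for a gene pair over the tissues in which both members were quantified, requiring at least 10 matching tissues, and panels e and f then apply an r > 0.5 cut-off. The Python sketch below reproduces that logic for hypothetical abundance profiles (dictionaries keyed by tissue); whether profiles are log-transformed beforehand is an assumption left to the caller.

```python
import math

MIN_SHARED = 10  # pairs must be detected in at least 10 matching tissues
R_CUTOFF = 0.5   # co-expression cut-off used in panels e and f

def pearson(x, y):
    """Pearson's r of two equal-length sequences; NaN if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def coexpression(profile_a, profile_b):
    """Pearson's r over tissues where both genes were quantified; None if too few."""
    shared = sorted(set(profile_a) & set(profile_b))
    if len(shared) < MIN_SHARED:
        return None
    return pearson([profile_a[t] for t in shared], [profile_b[t] for t in shared])

def fraction_recovered(pairs, profiles):
    """Share of annotated interacting pairs (with enough shared tissues) with r > R_CUTOFF."""
    rs = [coexpression(profiles[a], profiles[b]) for a, b in pairs
          if a in profiles and b in profiles]
    rs = [r for r in rs if r is not None and not math.isnan(r)]
    return sum(r > R_CUTOFF for r in rs) / len(rs) if rs else 0.0
```

Requiring a minimum number of shared tissues keeps r from being driven by a handful of data points, which matters because many proteins are detected in only part of the atlas.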

Extended Data Fig. 7 Inferring protein complexes and subunit stoichiometry from proteome correlation profiling using SEC–MS.

a, Molecular mass (MW) of monomeric proteins (determined from sequence) plotted against the mass determined from the apex of the elution profile for proteins identified by SEC–MS fractions of flower tissue (sFL). Inset shows the molecular mass calibration of the SEC column using a protein calibration standard (mass between 44 and 690 kDa). The distribution of proteins annotated in Araport11 is shown at the top. Many proteins show a much higher apparent molecular mass than would be expected from their sequences (data points above the x = y line). This suggests that these proteins engage in physical protein interactions that are sufficiently stable during SEC separation. b, SEC traces of proteins from five well-characterized protein complexes for flower, leaf and root tissue. Although the resolution of SEC separations is not very high, the complex subunits show very strong co-elution behaviour and the SEC separations of the five complexes are reproducible between tissues. CoA carboxylase n = 4 proteins; CDC48 n = 3 proteins; RubisCO n = 4 proteins; prefoldin n = 6 proteins; SCS n = 3 proteins. c, Intensity-normalized SEC elution profile of proteins for flower tissue. Proteins are ordered based on the SEC fraction in which their intensity peaks and the data are displayed as a heat map (n = 2,485 protein traces). Co-eluting proteins were grouped into ‘trace modules’ (Methods). Proteins in trace modules may represent members of protein complexes and thus serve as candidates for further experimental validation. d, To quantify how well protein complexes can be detected using co-expression analysis from data in the tissue atlas (TA) or by SEC–MS, a summary statistic termed ‘complex index’ was calculated (Methods). The complex index is 1 when all subunits of a complex are identified in the same module and no other proteins are contained in the module. 
Bar plots show examples of complex indices obtained from the different datasets and are divided into large (>4 subunits) and small (≤4 subunits) protein complexes (according to UniProt). Co-expression alone generates many candidate interactors, but combining co-expression and SEC–MS analysis is an efficient way to prioritize candidates for follow-up experiments. e, Subunit heterogeneity within the coatomer complex. The coatomer complex consists of seven subunits, five of which (α, β, β′, ε and ζ) can be provided by twelve paralogues of these five genes. Plots show the protein proportions of these paralogues in all 30 tissues (data from tissue atlas). The coatomer complex has a similar composition in most tissues. Seed tissues are a notable exception: there, production of subunit ζ-1 dominates over the two other paralogous proteins, suggesting that the coatomer complex in seed tissue also preferentially contains the ζ-1 subunit. Tissues are coloured as in Fig. 1. f, Absolute SEC intensity traces of individual complex subunits for determining subunit stoichiometry. Examples from left to right: the chaperonin complex (flower, 8 proteins, ratio of all subunits: 1:1), the 26S proteasome core and lid (flower, 14+17 proteins, ratio of all subunits: 1:1), the COP9 signalosome (flower, CSN; 8 proteins, ratio of all subunits: 1:1) and the CESA1–CESA3–CESA6 complex (root, 3 proteins, ratio of all subunits: 1:1). CSN3 and CSN5 were detected both as part of the CSN complex and in monomeric form. g, Top, total intensity of protein complex subunits across all tissues for the complexes shown in f (subunit intensities from the tissue atlas). Middle, relative proportion (mean ± s.d.; n = 30 tissues) of subunits across tissues (Methods). For the CESA complex, ratios were calculated for the subunit combinations CESA1–CESA3–CESA6 and CESA4–CESA7–CESA8.
The stoichiometries determined from the tissue expression data are generally well-aligned with the expected 1:1 ratio of subunits in these complexes. As noted in f, a substantial amount of CSN5 was detected as a monomer in the SEC analysis, and the tissue expression atlas also shows higher relative expression of this protein compared with all other complex partners. This suggests that the protein is produced in excess over what is required for the COP9 complex (as observed previously¹²³), and may therefore indicate an additional function within the cell.

Source data
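
Two calculations underlie this figure: estimating an apparent molecular mass from the SEC elution apex via a calibration curve (panel a), and expressing each complex subunit as a proportion of the summed subunit intensity in every tissue, summarized as mean ± s.d. (panel g). The Python sketch below assumes a log-linear relation between mass and elution fraction for the calibration standards and uses hypothetical per-tissue subunit intensities. For a complex of k subunits at 1:1 stoichiometry, each proportion should be close to 1/k. It is an illustration, not the procedure described in the Methods.

```python
import math
import statistics

def fit_sec_calibration(standards):
    """Fit log10(mass) = intercept + slope * fraction by least squares.

    standards: list of (elution_fraction, mass_kDa) for the calibration proteins.
    Returns a function mapping an elution-apex fraction to an apparent mass in kDa.
    """
    xs = [fraction for fraction, _ in standards]
    ys = [math.log10(mass) for _, mass in standards]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda fraction: 10 ** (intercept + slope * fraction)

def subunit_proportions(intensity_by_tissue):
    """Mean and s.d. across tissues of each subunit's share of summed complex intensity.

    intensity_by_tissue: dict tissue -> dict subunit -> intensity (e.g. iBAQ).
    A subunit absent from a tissue simply contributes no share for that tissue.
    """
    shares = {}
    for subunits in intensity_by_tissue.values():
        total = sum(subunits.values())
        if total <= 0:
            continue  # skip tissues where no subunit was quantified
        for name, value in subunits.items():
            shares.setdefault(name, []).append(value / total)
    return {name: (statistics.mean(values),
                   statistics.stdev(values) if len(values) > 1 else 0.0)
            for name, values in shares.items()}

# Hypothetical standards spanning the 44-690 kDa calibration range; larger proteins elute first.
apparent_mass = fit_sec_calibration([(10, 690.0), (20, 158.0), (30, 44.0)])
print(round(apparent_mass(15)))

# Hypothetical two-subunit complex quantified in two tissues.
print(subunit_proportions({"FL": {"SU1": 10.0, "SU2": 10.0}, "RT": {"SU1": 30.0, "SU2": 10.0}}))
```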

Extended Data Fig. 8 Kinases, phosphatases and phosphorylation motifs.

a, Percentage of annotated kinase and phosphatase family members detected at the protein or phosphoprotein level. Parentheses denote the number of genes in each family in the Arabidopsis genome. b, Tissue-resolved combined intensity (that is, protein abundance) of families of kinases (left) and phosphatases (right). Tissues are coloured as in Fig. 1. Several tissues (notably pollen) stand out in terms of the expression of kinases and phosphatases, which indicates that these tissues are particularly active in phosphorylation-mediated dynamic signalling. c, Top, pie chart of specificity categories for kinases and phosphatases (see Methods for definition). Bottom, distribution of tissue-enhanced kinases and phosphatases across the 30 tissues. Several tissues (such as pollen) stand out in terms of the expression of certain kinases and phosphatases, which indicates tissue-specific signalling. d, Pie charts showing the proportion of proline-directed, acidic, basic and other motif categories for phosphorylated Ser (pS), Thr (pT) and Tyr (pY) residues. Only class I P-sites (localization score > 0.75; Methods) were considered in this analysis. e, Example logo plots for proline-directed, acidic and basic motifs. P-site motifs were identified using the motif-X algorithm (see Supplementary Table 2 for all 266 motifs). n denotes the number of phosphorylation sites that contain the respective motif; ‘fc’ denotes the fold change (that is, enrichment) of the motif in phosphorylated versus unmodified peptides (Methods). f, Enrichment of proline-directed (yellow), acidic (red), basic (blue) and other (grey) sequence motifs (circles) in the serine P-site dataset versus the same motifs detected in the background dataset of unmodified peptides (Methods). Motifs are shown for two, three and four fixed amino acid positions. The P-site in each motif example is underlined. ‘X’ denotes any amino acid.
g, Number of identified P-sites for a given protein plotted against the sequence lengths of the same protein. LEA proteins are shown. h, Schematic of the LEA protein sequences (black bars). Pink denotes phosphorylated and blue denotes unphosphorylated STY residues. Almost all STY residues in LEA proteins can be phosphorylated. i, Schematic of the sequences and domain topology of the receptor-like kinases SRF4, FER and CERK1. P-sites often preferentially occur in specific domains, notably the juxtamembrane domain. Protein sequence regions covered by identified peptides are marked in blue, and P-sites are marked in pink.

Source data
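
The 'fc' value in panel e is a fold enrichment: the frequency of a motif among phosphorylated sequence windows divided by its frequency among background windows from unmodified peptides. The sketch below computes only that ratio for a regular-expression motif over hypothetical 7-residue windows centred on the candidate site; motif-x itself additionally extracts motifs iteratively with significance testing, which is not reproduced here. The window length and motif pattern are assumptions for illustration.

```python
import re

def motif_fold_change(pattern, foreground, background):
    """Return (n, fc) for a motif given as a regex over fixed-length sequence windows.

    n:  number of phosphorylated windows (foreground) matching the motif.
    fc: motif frequency in foreground divided by its frequency in background windows.
    """
    if not foreground or not background:
        return 0, 0.0
    motif = re.compile(pattern)
    n_fg = sum(1 for window in foreground if motif.fullmatch(window))
    n_bg = sum(1 for window in background if motif.fullmatch(window))
    if n_bg == 0:
        # Motif absent from background: enrichment is unbounded (or undefined if also absent).
        return n_fg, (float("inf") if n_fg else float("nan"))
    return n_fg, (n_fg / len(foreground)) / (n_bg / len(background))

# Hypothetical 7-residue windows with the site at index 3; Pro at +1 marks a proline-directed site.
PROLINE_DIRECTED_S = r".{3}SP.{2}"
phospho = ["AAASPAA", "GGGSPKK", "LLLSEEE", "AAASPRR"]
unmodified = ["AAASPAA"] + ["AAASAAA"] * 9
print(motif_fold_change(PROLINE_DIRECTED_S, phospho, unmodified))
```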

Extended Data Fig. 9 Functional analysis of phosphorylation mutants of RCAR10 and QKY.

a, P-site localization within the structure of RCAR10. The RCAR10 structure (blue) was modelled using the RCAR11 protein crystal structure (cornflower blue) as a template¹¹⁵. ABA-binding loops are shown in turquoise, P-sites in pink, and ABA ligand in yellow. b, RCAR10 expression across tissues at the protein (blue, iBAQ), transcript (grey, TPM) and P-site (pink, intensity) level. c, Tissue-resolved total protein intensity and relative proportions of the members of the PP2C co-receptor family. Seed tissues stand out in terms of overall expression as well as the dominance of AHG1 in these tissues. d, Measurement of the ABA response after expression of RCAR10 or phosphomimetic mutant variants in combination with different PP2C co-receptors in protoplasts (Methods). Columns display the average ABA response (mean ± s.d., n = 3) and grey dots indicate individual measurements. Co-expression of the phosphatases HAI1–HAI3 leads to similar responses in both phosphomimetic mutants, whereas other co-expressed phosphatases show diverse responses. e, QKY expression across tissues at the protein (blue, iBAQ), transcript (grey, TPM) and P-site (pink, intensity) level. f, Members of the MCTP family clustered by sequence similarity (left) and schematic of their domain structures along with detected P-sites (right). MCTP11a, MCTP12 and MCTP13 were not detected (n.d.) in this study. MCTP15 (also known as QKY) is in bold. g, Number of independent transgenic plant lines (qky-9 mutant background) transformed with wild-type QKY, phosphomutant (S262A, SA; blue) or phosphomimetic (S262E, SE; purple) constructs that show complete, partial or no rescue of the mutant phenotype. qPCR results (mean ± s.d., n = 3; individual data points as grey dots) show the relative transgene expression in wild-type, qky-9 mutant and selected transgenic lines. 
h–j, Representative confocal images of six-day-old qky-9 pQKY::mCherry:QKY (WT QKY; n = 14 roots), qky-9 pQKY::mCherry:QKY(S262A) (phosphomutant; n = 21 roots), qky-9 pQKY::mCherry:QKY(S262E) (phosphomimetic; n = 15 roots) root epidermal cells of the meristematic zone. The punctate signal along the cell circumference shows the expected localization of the QKY protein. Arrows indicate punctate structures. Scale bars, 5 μm.

Source data
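
Panel g reports relative transgene expression from qPCR (mean ± s.d., n = 3), but the legend does not specify the calculation. A common choice is the 2^(−ΔΔCt) method, which assumes near-100% amplification efficiency. The sketch below uses it purely as an assumption, with hypothetical Ct values for the transgene, a reference gene and a calibrator sample, and summarizes the replicates as mean ± s.d.

```python
import statistics

def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """2^-ddCt expression relative to a calibrator sample (assumes ~100% PCR efficiency)."""
    delta_ct_sample = ct_target - ct_ref              # normalize to the reference gene
    delta_ct_calibrator = ct_target_cal - ct_ref_cal  # same for the calibrator sample
    return 2.0 ** -(delta_ct_sample - delta_ct_calibrator)

def summarize(values):
    """Mean and sample s.d. (n - 1) of replicate measurements."""
    return statistics.mean(values), statistics.stdev(values)

# Hypothetical (target Ct, reference Ct) for three replicates of one transgenic line,
# calibrated against a sample with target Ct 25.0 and reference Ct 18.0.
replicates = [(22.0, 18.0), (22.1, 18.0), (21.9, 18.1)]
fold = [relative_expression(t, r, 25.0, 18.0) for t, r in replicates]
print(summarize(fold))
```

If primer efficiencies deviate noticeably from 100%, an efficiency-corrected model would be more appropriate than this simple form.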

Supplementary information

Supplementary Tables:

This file contains Supplementary Table 1: List of validated peptide spectra for uncertain proteins including mirror plots between experimental and predicted spectra. Supplementary Table 2: Position weight matrix logo plots for all identified serine, threonine and tyrosine phosphorylation motifs.

Reporting Summary

Supplementary Data

Supplementary Data 1: This file contains tissue sample names and a description of their growth conditions.

Supplementary Data 2: This file contains the tissue atlas expression values at the transcriptome, proteome and P-site levels.

Supplementary Data 3: This file contains a summary of expression evidence for gene isoforms and sORF sequences.

Supplementary Data 4: This file contains the results of 1D GO term enrichment analyses using transcript abundance or protein-to-RNA ratio (PTR) values as input.

Supplementary Data 5: This file contains a description of feature analysis parameters used in the protein abundance prediction model.

Supplementary Data 6: This file contains protein expression values for selected paralog genes and information about their mutational phenotypes.

Supplementary Data 7: This file contains size exclusion chromatography protein elution profiles for flower, leaf and root tissue.

Supplementary Data 8: This file contains GO term enrichment analysis for WGCNA protein and trace modules and lists gene identifiers in protein and trace module intersections.

Supplementary Data 9: This file contains annotations of selected complexes and their identification in tissue atlas or size-exclusion chromatography experiments.

Supplementary Data 10: This file lists all identified serine, threonine and tyrosine phosphorylation motifs.

Supplementary Data 11: This file contains a list of all primer sequences and gene IDs used in this study.

Source data

Source Data Fig. 1

Source Data Fig. 2

Source Data Fig. 3

Source Data Fig. 4

Source Data Fig. 5

Source Data Extended Data Fig. 1

Source Data Extended Data Fig. 2

Source Data Extended Data Fig. 3

Source Data Extended Data Fig. 4

Source Data Extended Data Fig. 5

Source Data Extended Data Fig. 6

Source Data Extended Data Fig. 7

Source Data Extended Data Fig. 8

Source Data Extended Data Fig. 9

About this article

Cite this article

Mergner, J., Frejno, M., List, M. et al. Mass-spectrometry-based draft of the Arabidopsis proteome. Nature 579, 409–414 (2020). https://doi.org/10.1038/s41586-020-2094-2

Received: 04 June 2019
Accepted: 17 January 2020
Published: 11 March 2020
Issue Date: 19 March 2020
DOI: https://doi.org/10.1038/s41586-020-2094-2
