Abstract
Identification of proteins by tandem mass spectrometry requires a reference protein database, but such databases are available only for model species. Here we demonstrate that, for a non-model species, sequencing of expressed mRNA can generate a protein database for mass spectrometry–based identification. This combination of high-throughput sequencing and protein identification technologies allows detection of genes and proteins. We use human cells infected with human adenovirus as a complex and dynamic model to demonstrate the robustness of this approach. Our proteomics informed by transcriptomics (PIT) technique identifies >99% of the more than 3,700 distinct proteins identified by a traditional analysis that relies on comprehensive human and adenovirus protein lists. We show that this approach can also be used to highlight genes and proteins undergoing dynamic changes in post-transcriptional protein stability.
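The core step described in the abstract, turning assembled transcripts into a protein list that a mass spectrometry search engine can use, can be illustrated with a minimal sketch. It is not the authors' software: the supplementary material indicates that Trinity and getorf were used, whereas the function names (`six_frame_orfs`, `build_fasta`), the stop-to-stop ORF definition, the `min_len` threshold and the FASTA header layout below are all assumptions made for illustration.

```python
# Illustrative sketch only: six-frame translation of assembled transcripts
# into a FASTA protein list. All names and defaults here are hypothetical.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order (TTT, TTC, TTA, ...).
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate from position 0; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(seq, min_len=30):
    """Return (strand, frame, peptide) for stop-to-stop ORFs in six frames.

    An ORF here is any stretch between stop codons (or sequence ends) of at
    least `min_len` residues. This definition is an assumption of the sketch.
    """
    seq = seq.upper()
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for peptide in translate(strand_seq[frame:]).split("*"):
                if len(peptide) >= min_len:
                    orfs.append((strand, frame, peptide))
    return orfs


def build_fasta(transcripts, min_len=30):
    """Format the ORFs of all transcripts (dict of id -> sequence) as FASTA."""
    lines = []
    for tid, seq in transcripts.items():
        orfs = six_frame_orfs(seq, min_len)
        for n, (strand, frame, peptide) in enumerate(orfs, start=1):
            lines.append(f">{tid}|{strand}{frame + 1}|orf{n}")
            lines.append(peptide)
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # Toy transcript; real input would be assembler output.
    print(build_fasta({"toy_transcript": "ATGGCCAAATAA"}, min_len=3), end="")
```

The resulting FASTA text would then serve as the search database in place of a curated reference proteome. Keeping every stop-to-stop stretch rather than only ATG-initiated ORFs is a deliberately permissive choice in this sketch, since assembled transcripts may be truncated at the 5′ end.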
Accession codes
Primary accessions
ArrayExpress
Referenced accessions
NCBI Reference Sequence
Acknowledgements
We thank P. Kellam and A. Palser for help and advice throughout and C. Trapnell, J. Goeks, J. Jackson, P. Cingolani, B. Haas, T. Wu and J. Robinson for informative and helpful discussions by email. We especially thank I. Goodfellow for discussions on using proteomic data from baby hamster kidney and CHO cells. We are grateful to R.T. Hay (University of Dundee) and J. Blenis (Harvard Medical School) for antibodies to DBP and POLDIP3, respectively. In addition, we thank J. Blenis for the HA-tagged POLDIP3 expression plasmid and K. Leppard (University of Warwick) for the dl366 adenovirus. We also thank the members of the University of Bristol Transcriptomics facility (especially J. Coghill) and University of Bristol Wolfson Bioimaging facility for their help. D.A.M. and V.C.E. receive funding from the Wellcome Trust (grant no. 083604). C.B. and J.F. receive funding from the Biotechnology and Biological Sciences Research Council (grant BB/I00095X/1).
Author information
Contributions
V.C.E. cowrote the manuscript, prepared infected cells, performed western blots and assisted with immunofluorescence. G.B. cowrote the manuscript, wrote software and assisted with handling the RNA-seq data. K.J.H. performed the mass spectrometry and assisted with analysis of the MS/MS data. J.F. helped with the BLAST analysis and wrote some of the BLAST search software. C.B. cowrote the manuscript and assisted with the MS/MS analysis and BLAST database searches. D.A.M. conceived of the experiments and PIT analysis pipeline, led the manuscript writing, wrote software, assisted with the immunofluorescence and the preparation of infected cells, and carried out manual curation, analysis and integration of the data.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–3 and Supplementary Tables 4 and 8 (PDF 2517 kb)
Supplementary Data
Supplementary Data 1–12:
1. Trinity transcripts derived from human RNA-seq data.
2. Trinity-derived protein data set from human Trinity transcripts data.
3. Sequence alignment map (SAM) file of Trinity transcripts with peptide data added from human PIT analysis.
4. Gene feature format (GFF3) file of the peptides associated with the Trinity transcripts from the human PIT analysis.
5. FASTA file of the longest open-reading frames in the human PIT analysis for which MS/MS data indicate that a protein is being made from a Trinity-predicted transcript.
6. Trinity transcripts derived from Chinese hamster ovary (CHO) RNA-seq data.
7. Trinity-derived protein list from CHO Trinity transcriptome data.
8. SAM file of Trinity transcripts mapped to the Cricetulus griseus genome with peptide data added.
9. GFF3 file of the peptides associated with the Trinity transcripts on the Cricetulus griseus genome.
10. FASTA file of proteins identified as altered by SNPeffect.
11. FASTA file of proteins identified as altered by SNPeffect and corrected to reflect the nonsynonymous change.
12. SNPeffect output file containing a list of proteins that have been affected by nonsynonymous changes, and the location and type of each change.
(ZIP 97275 kb)
Supplementary Tables
Supplementary Tables 1–3, 5–7, 9 and 10:
1. Combined data on gene expression and changes in protein abundance.
2. Peptides identified by searching distinct data sets.
3. Results of BLAST analysis of proteins predicted by Trinity assembly and confirmed by MS/MS.
5. Analysis of a hamster proteome using a Trinity-derived list of hamster proteins, presenting peptides identified.
6. BLAST analysis of open-reading frames in the CHO-based Trinity list of which at least one peptide has been identified by MaxQuant.
7. Searching for SNPs within the proteome.
9. Analysis of transcript and ORF length generated by Trinity and getorf from human RNA-seq data.
10. BLAST analysis of all the ORFs present in the Trinity list derived from the human RNA-seq data and used by MaxQuant to search for peptides.
(ZIP 46155 kb)
Supplementary Software
A ZIP file (Scripts.zip) of the Supplementary Software (9 items) and the README file. (ZIP 22 kb)
About this article
Cite this article
Evans, V., Barker, G., Heesom, K. et al. De novo derivation of proteomes from transcriptomes for transcript and protein identification. Nat Methods 9, 1207–1211 (2012). https://doi.org/10.1038/nmeth.2227