De novo derivation of proteomes from transcriptomes for transcript and protein identification


Identification of proteins by tandem mass spectrometry requires a reference protein database, but these are only available for model species. Here we demonstrate that, for a non-model species, the sequencing of expressed mRNA can generate a protein database for mass spectrometry–based identification. This combination of high-throughput sequencing and protein identification technologies allows detection of genes and proteins. We use human cells infected with human adenovirus as a complex and dynamic model to demonstrate the robustness of this approach. Our proteomics informed by transcriptomics (PIT) technique identifies >99% of over 3,700 distinct proteins identified using traditional analysis that relies on comprehensive human and adenovirus protein lists. We show that this approach can also be used to highlight genes and proteins undergoing dynamic changes in post-transcriptional protein stability.
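The central step described above, turning assembled transcripts into a searchable protein list, can be illustrated with a six-frame translation. The following is a minimal sketch only, not the authors' pipeline: the function names, the stop-to-stop ORF definition, and the `min_len` cutoff are all assumptions made for illustration.

```python
from typing import Dict, Iterator, Tuple

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate from position 0; codons with ambiguous bases become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(seq: str, min_len: int = 30) -> Iterator[Tuple[str, int, str]]:
    """Yield (strand, frame, protein) for every stop-to-stop ORF.

    min_len is a hypothetical cutoff in amino acids, chosen for illustration.
    """
    seq = seq.upper().replace("U", "T")
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for segment in translate(s[frame:]).split("*"):
                if len(segment) >= min_len:
                    yield strand, frame, segment


def protein_database(transcripts: Dict[str, str], min_len: int = 30) -> str:
    """Build FASTA text of candidate proteins from named transcripts."""
    records = []
    for name, seq in transcripts.items():
        orfs = six_frame_orfs(seq, min_len)
        for n, (strand, frame, protein) in enumerate(orfs, 1):
            records.append(f">{name}_orf{n} strand={strand} frame={frame}\n{protein}")
    return "\n".join(records) + "\n"
```

The resulting FASTA text could then be supplied to a peptide search engine as the reference database. In practice a dedicated ORF finder and a realistic length threshold would replace this toy translation.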


Figure 1: Data integration between the transcriptome and the proteome.
Figure 2: Adenovirus-induced degradation of POLDIP3 in an MG132-sensitive manner, and redistribution of POLDIP3 in infected cells.



We thank P. Kellam and A. Palser for help and advice throughout and C. Trapnell, J. Goeks, J. Jackson, P. Cingolani, B. Haas, T. Wu and J. Robinson for informative and helpful discussions by email. We especially thank I. Goodfellow for discussions on using proteomic data from baby hamster kidney and CHO cells. We are grateful to R.T. Hay (University of Dundee) and J. Blenis (Harvard Medical School) for antibodies to DBP and POLDIP3, respectively. In addition, we thank J. Blenis for the HA-tagged POLDIP3 expression plasmid and K. Leppard (University of Warwick) for the dl366 adenovirus. We also thank the members of the University of Bristol Transcriptomics facility (especially J. Coghill) and University of Bristol Wolfson Bioimaging facility for their help. D.A.M. and V.C.E. receive funding from the Wellcome Trust (grant no. 083604). C.B. and J.F. receive funding from the Biotechnology and Biological Sciences Research Council (grant BB/I00095X/1).

Author information

V.C.E. cowrote the manuscript, prepared infected cells, performed western blots and assisted with immunofluorescence. G.B. cowrote the manuscript, wrote software and assisted with handling the RNA-seq data. K.J.H. performed the mass spectrometry and assisted with analysis of the MS/MS data. J.F. helped with the BLAST analysis and wrote some of the BLAST search software. C.B. cowrote the manuscript and assisted with the MS/MS analysis and BLAST database searches. D.A.M. conceived of the experiments and PIT analysis pipeline, led the manuscript writing, wrote software, assisted with the immunofluorescence and the preparation of infected cells, and carried out manual curation, analysis and integration of the data.

Corresponding author

Correspondence to David A Matthews.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–3 and Supplementary Tables 4 and 8 (PDF 2517 kb)

Supplementary Data

Supplementary Data 1–12 (ZIP 97275 kb):

  1. Trinity transcripts derived from human RNA-seq data.
  2. Trinity-derived protein data set from human Trinity transcripts data.
  3. Sequence alignment map (SAM) file of Trinity transcripts with peptide data added from human PIT analysis.
  4. Gene feature format (GFF3) file of the peptides associated with the Trinity transcripts from the human PIT analysis.
  5. FASTA file of the longest open-reading frames in the human PIT analysis for which MS/MS data indicate that a protein is being made from a Trinity-predicted transcript.
  6. Trinity transcripts derived from Chinese hamster ovary (CHO) RNA-seq data.
  7. Trinity-derived protein list from CHO Trinity transcriptome data.
  8. SAM file of Trinity transcripts mapped to the Cricetulus griseus genome with peptide data added.
  9. GFF3 file of the peptides associated with the Trinity transcripts on the Cricetulus griseus genome.
  10. FASTA file of proteins identified as altered by SNPeffect.
  11. FASTA file of proteins identified as altered by SNPeffect and corrected to reflect the nonsynonymous change.
  12. SNPeffect output file containing a list of proteins that have been affected by nonsynonymous changes and the location and type of each change.

Supplementary Tables

Supplementary Tables 1–3, 5–7, 9 and 10 (ZIP 46155 kb):

  1. Combined data on gene expression and changes in protein abundance.
  2. Peptides identified by searching distinct data sets.
  3. Results of BLAST analysis of proteins predicted by Trinity assembly and confirmed by MS/MS.
  5. Analysis of a hamster proteome using a Trinity-derived list of hamster proteins, presenting peptides identified.
  6. BLAST analysis of open-reading frames in the CHO-based Trinity list for which at least one peptide has been identified by MaxQuant.
  7. Searching for SNPs within the proteome.
  9. Analysis of transcript and ORF length generated by Trinity and getorf from human RNA-seq data.
  10. BLAST analysis of all the ORFs present in the Trinity list derived from the human RNA-seq data and used by MaxQuant to search for peptides.

Supplementary Software

A ZIP file of the Supplementary Software (9 items) and the README file. (ZIP 22 kb)

About this article

Cite this article

Evans, V., Barker, G., Heesom, K. et al. De novo derivation of proteomes from transcriptomes for transcript and protein identification. Nat Methods 9, 1207–1211 (2012).
