Identification of proteins by tandem mass spectrometry requires a reference protein database, but such databases are available only for model species. Here we demonstrate that, for a non-model species, sequencing of expressed mRNA can generate a protein database for mass spectrometry–based identification. This combination of high-throughput sequencing and protein identification technologies allows detection of both genes and proteins. We use human cells infected with human adenovirus as a complex and dynamic model to demonstrate the robustness of this approach. Our proteomics informed by transcriptomics (PIT) technique identifies >99% of the more than 3,700 distinct proteins found by traditional analysis, which relies on comprehensive human and adenovirus protein lists. We show that this approach can also be used to highlight genes and proteins undergoing dynamic changes in post-transcriptional protein stability.
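The central step, turning assembled transcripts into a searchable protein list, can be sketched as a six-frame translation. The Python below is a minimal illustration under stated assumptions and is not the authors' scripts (Supplementary Table 9 indicates that getorf was used for ORF calling). It assumes that transcripts are plain nucleotide strings, that the standard genetic code applies, and that an ORF is any stop-to-stop stretch of at least `min_aa` residues. The cutoff of 30 is arbitrary, and all helper names are hypothetical.

```python
from typing import Dict, Iterator, Tuple

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; codons containing N or other symbols become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(seq: str, min_aa: int = 30) -> Iterator[Tuple[str, int, str]]:
    """Yield (strand, frame, peptide) for stop-to-stop stretches >= min_aa residues.

    min_aa is a hypothetical cutoff chosen for illustration only.
    """
    seq = seq.upper()
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(strand_seq[frame:])
            # Splitting on stops keeps every open stretch, with or without a Met.
            for segment in protein.split("*"):
                if len(segment) >= min_aa:
                    yield strand, frame, segment


def build_search_database(transcripts: Dict[str, str], min_aa: int = 30) -> str:
    """Return FASTA text with one entry per candidate ORF across all transcripts."""
    lines = []
    for name, seq in transcripts.items():
        for n, (strand, frame, peptide) in enumerate(six_frame_orfs(seq, min_aa), 1):
            lines.append(f">{name}_orf{n} strand={strand} frame={frame}")
            lines.append(peptide)
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    toy = {"tx1": "ATG" + "GCT" * 40 + "TAA"}  # hypothetical toy transcript
    print(build_search_database(toy))
```

The resulting FASTA text would be given to a search engine such as MaxQuant in place of a reference proteome. Stop-to-stop translation deliberately over-generates candidates, so peptide evidence, rather than the ORF caller, determines which candidates are supported.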
We thank P. Kellam and A. Palser for help and advice throughout, and C. Trapnell, J. Goeks, J. Jackson, P. Cingolani, B. Haas, T. Wu and J. Robinson for informative and helpful discussions by email. We especially thank I. Goodfellow for discussions on using proteomic data from baby hamster kidney and CHO cells. We are grateful to R.T. Hay (University of Dundee) and J. Blenis (Harvard Medical School) for antibodies to DBP and POLDIP3, respectively. In addition, we thank J. Blenis for the HA-tagged POLDIP3 expression plasmid and K. Leppard (University of Warwick) for the dl366 adenovirus. We also thank the members of the University of Bristol Transcriptomics facility (especially J. Coghill) and the University of Bristol Wolfson Bioimaging facility for their help. D.A.M. and V.C.E. receive funding from the Wellcome Trust (grant no. 083604). C.B. and J.F. receive funding from the Biotechnology and Biological Sciences Research Council (grant BB/I00095X/1).
Supplementary Data 1–12:
1. Trinity transcripts derived from human RNA-seq data.
2. Trinity-derived protein data set from the human Trinity transcripts.
3. Sequence Alignment/Map (SAM) file of Trinity transcripts with peptide data added from the human PIT analysis.
4. Generic Feature Format version 3 (GFF3) file of the peptides associated with the Trinity transcripts from the human PIT analysis.
5. FASTA file of the longest open reading frames in the human PIT analysis for which MS/MS data indicate that a protein is being made from a Trinity-predicted transcript.
6. Trinity transcripts derived from Chinese hamster ovary (CHO) RNA-seq data.
7. Trinity-derived protein list from the CHO Trinity transcriptome data.
8. SAM file of Trinity transcripts mapped to the Cricetulus griseus genome, with peptide data added.
9. GFF3 file of the peptides associated with the Trinity transcripts on the Cricetulus griseus genome.
10. FASTA file of proteins identified as altered by SNPeffect.
11. FASTA file of proteins identified as altered by SNPeffect and corrected to reflect the nonsynonymous change.
12. SNPeffect output file listing the proteins affected by nonsynonymous changes, with the location and type of each change.
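Supplementary Data 3–5 attach peptide evidence to transcript coordinates. The following is a minimal sketch of that projection, written as GFF3 lines. It assumes a forward-strand ORF whose first codon starts at 0-based nucleotide offset `orf_nt_start` on the transcript; minus-strand ORFs would need their coordinates mirrored, which is not handled here. The function name, the `pit_sketch` source tag and the `polypeptide_region` feature type are illustrative choices and are not taken from the supplementary files.

```python
from typing import List


def peptide_gff3_lines(transcript_id: str, orf_protein: str, orf_nt_start: int,
                       peptides: List[str], source: str = "pit_sketch") -> List[str]:
    """Project identified peptides onto transcript coordinates as GFF3 features.

    Hypothetical helper: assumes a forward-strand ORF beginning at 0-based
    nucleotide offset orf_nt_start. GFF3 coordinates are 1-based and inclusive.
    """
    lines = ["##gff-version 3"]
    for pep in peptides:
        n = 0
        idx = orf_protein.find(pep)
        while idx != -1:  # report every (possibly overlapping) occurrence
            n += 1
            gff_start = orf_nt_start + 3 * idx + 1           # first base of first codon
            gff_end = orf_nt_start + 3 * (idx + len(pep))    # last base of last codon
            attrs = f"ID={transcript_id}_{pep}_{n};Name={pep}"
            lines.append("\t".join([
                transcript_id, source, "polypeptide_region",
                str(gff_start), str(gff_end), ".", "+", ".", attrs,
            ]))
            idx = orf_protein.find(pep, idx + 1)
    return lines


if __name__ == "__main__":
    for line in peptide_gff3_lines("tx1", "MAAAK", 0, ["AAK"]):
        print(line)
```

Every occurrence of a peptide is reported because a short peptide can match more than one position in an ORF. Resolving such ambiguity, or the isoleucine/leucine mass degeneracy, is left to downstream filtering.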
Supplementary Tables 1–3, 5–7, 9 and 10:
1. Combined data on gene expression and changes in protein abundance.
2. Peptides identified by searching distinct data sets.
3. Results of BLAST analysis of proteins predicted by Trinity assembly and confirmed by MS/MS.
5. Analysis of a hamster proteome using a Trinity-derived list of hamster proteins, presenting the peptides identified.
6. BLAST analysis of open reading frames in the CHO-based Trinity list for which at least one peptide has been identified by MaxQuant.
7. Searching for SNPs within the proteome.
9. Analysis of transcript and ORF lengths generated by Trinity and getorf from human RNA-seq data.
10. BLAST analysis of all the ORFs present in the Trinity list derived from the human RNA-seq data and used by MaxQuant to search for peptides.
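Supplementary Table 1 combines gene expression with changes in protein abundance. One simple, hypothetical way to shortlist candidates for post-transcriptional regulation from such a table is to compare log2 fold changes at the mRNA and protein levels and flag genes where they diverge. The input layout, the function name and the threshold of 1.0 (a twofold discrepancy) are assumptions for illustration. This is not the statistic used in the study.

```python
from typing import Dict, List, Tuple


def flag_post_transcriptional(changes: Dict[str, Tuple[float, float]],
                              threshold: float = 1.0) -> List[Tuple[str, float]]:
    """Flag genes whose protein change departs from their mRNA change.

    changes maps gene -> (mRNA log2 fold change, protein log2 fold change).
    Returns (gene, protein_minus_mrna) for |difference| >= threshold, sorted by
    the size of the discrepancy, largest first. The threshold is hypothetical.
    """
    flagged = []
    for gene, (mrna_lfc, protein_lfc) in changes.items():
        delta = protein_lfc - mrna_lfc  # > 0: protein rises more than its mRNA
        if abs(delta) >= threshold:
            flagged.append((gene, delta))
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical values, not data from the study.
    demo = {"geneA": (0.1, 2.0), "geneB": (1.0, 1.2), "geneC": (0.0, -1.5)}
    print(flag_post_transcriptional(demo))
```

A positive difference suggests that the protein accumulates beyond what its transcript level predicts, for example through increased stability. A negative difference suggests the opposite. Either case would need replicate-aware statistics before any biological claim is made.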
A ZIP file (Scripts.zip) of the Supplementary Software (9 items) and the README file.
About this article
Scientific Reports (2017)