Article

Detecting actively translated open reading frames in ribosome profiling data

Nature Methods volume 13, pages 165–170 (2016)

Abstract

RNA-sequencing protocols can quantify gene expression regulation from transcription to protein synthesis. Ribosome profiling (Ribo-seq) maps the positions of translating ribosomes over the entire transcriptome. We have developed RiboTaper (available at https://ohlerlab.mdc-berlin.de/software/), a rigorous statistical approach that identifies translated regions on the basis of the characteristic three-nucleotide periodicity of Ribo-seq data. We used RiboTaper with deep Ribo-seq data from HEK293 cells to derive an extensive map of translation that covered open reading frame (ORF) annotations for more than 11,000 protein-coding genes. We also found distinct ribosomal signatures for several hundred upstream ORFs and ORFs in annotated noncoding genes (ncORFs). Mass spectrometry data confirmed that RiboTaper achieved excellent coverage of the cellular proteome. Although dozens of novel peptide products were validated in this manner, few of the currently annotated long noncoding RNAs appeared to encode stable polypeptides. RiboTaper is a powerful method for comprehensive de novo identification of actively used ORFs from Ribo-seq data.
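The three-nucleotide periodicity mentioned in the abstract can be illustrated with a small sketch. The abstract does not describe RiboTaper's statistical test, so the code below swaps in a plain discrete Fourier transform and is not the authors' method: it measures what fraction of a footprint profile's spectral power falls at a frequency of one cycle per codon. The function name `periodicity_score` and both toy count profiles are hypothetical and chosen only for illustration.

```python
import cmath
import math


def periodicity_score(profile):
    """Fraction of non-DC spectral power at 1/3 cycles per nucleotide.

    Hypothetical helper: a plain DFT periodogram, not RiboTaper's test.
    """
    n = len(profile) - len(profile) % 3  # trim to whole codons so 1/3 is an exact DFT bin
    if n < 6:
        raise ValueError("profile must cover at least two codons")
    mean = sum(profile[:n]) / n
    centered = [x - mean for x in profile[:n]]  # remove the DC component

    def power(k):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        return abs(coeff) ** 2

    # Bins 1..n//2 cover all positive frequencies of a real-valued signal.
    powers = {k: power(k) for k in range(1, n // 2 + 1)}
    total = sum(powers.values())
    return powers[n // 3] / total if total > 0 else 0.0


# Toy per-nucleotide counts (assumed data): one reading frame dominates
# in the "translated" profile, while the second profile has no codon phase.
translated = [9, 1, 2] * 20
unphased = [4, 4, 4, 5, 3, 4, 4, 5, 3, 4] * 6

print(round(periodicity_score(translated), 3))
print(round(periodicity_score(unphased), 3))
```

A score near 1 means that almost all signal variation follows the codon rhythm, whereas a score near 0 means that no frame preference is visible. A real analysis would also need a significance test and a mapping of reads to ribosomal P-site positions, neither of which is attempted here.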

Accessions

Primary accessions

Gene Expression Omnibus

Referenced accessions

Gene Expression Omnibus

Acknowledgements

L.C. sincerely thanks R. Marangoni (University of Pisa) for inspiring and fruitful discussions and A. Munteanu (MDC) for help with the supplementary figures. L.C. is funded by the MDC PhD program. B.O. acknowledges funding through a Delbrück fellowship at the MDC. N.M. acknowledges funding from EU Marie Curie IIF. N.M. and U.O. were supported by US National Institutes of Health grant R01GM104962.

Author information

Author notes

    Neelanjan Mukherjee & Emanuel Wyler contributed equally to this work.

Affiliations

  1. Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, Berlin, Germany.

    Lorenzo Calviello, Neelanjan Mukherjee, Emanuel Wyler, Henrik Zauber, Antje Hirsekorn, Matthias Selbach, Markus Landthaler, Benedikt Obermayer & Uwe Ohler

  2. Department of Biology, Humboldt University, Berlin, Germany.

    Uwe Ohler

  3. Department of Computer Science, Humboldt University, Berlin, Germany.

    Uwe Ohler

Contributions

L.C. and U.O. developed the computational approach. L.C. implemented the RiboTaper method and analyzed the sequencing data, supervised by U.O. E.W., N.M. and A.H. performed the Ribo-seq experiments, supervised by M.L. B.O. carried out the evolutionary-conservation analysis and helped with the interpretation of the presented findings. H.Z., M.S. and L.C. analyzed the mass spectrometry data. L.C., N.M. and U.O. wrote the manuscript, with crucial input from all other authors.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Uwe Ohler.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–17

Excel files

  1. Supplementary Table 1

    Statistics about alignment and pre-processing of the sequencing datasets used in this study.

  2. Supplementary Table 2

    Complete list of identified ORFs in the different datasets used.

  3. Supplementary Table 3

    Filtered list of identified ORFs in the different datasets used. Noncoding ORFs overlapping CDS regions were removed. Additionally, ORFs were filtered out when >30% of the Ribo-seq coverage was supported by multi-mapping reads only.

  4. Supplementary Table 4

    Evidence table for peptides identified by the RiboTaper database only.

Zip files

  1. Supplementary Data

    Archive containing BED files for the identified ORFs.

  2. Supplementary Software

    RiboTaper (version 1.0) software code.

About this article

DOI

https://doi.org/10.1038/nmeth.3688
