Improved Ribo-seq enables identification of cryptic translation events

Erhard, Florian; Halenius, Anne; Zimmermann, Cosima; L'Hernault, Anne; Kowalewski, Daniel J; Weekes, Michael P; Stevanovic, Stefan; Zimmer, Ralf; Dölken, Lars

doi:10.1038/nmeth.4631

Brief Communication
Published: 12 March 2018

Florian Erhard ORCID: orcid.org/0000-0002-3574-6983^1,2,
Anne Halenius^3,4,
Cosima Zimmermann^3,4,
Anne L'Hernault^5,
Daniel J Kowalewski^6,7,
Michael P Weekes^8,
Stefan Stevanovic^6,
Ralf Zimmer^1 &
…
Lars Dölken ORCID: orcid.org/0000-0002-4651-3544^2

Nature Methods volume 15, pages 363–366 (2018)

Abstract

Ribosome profiling has been used to predict thousands of short open reading frames (sORFs) in eukaryotic cells, but it suffers from substantial levels of noise. PRICE (https://github.com/erhard-lab/price) is a computational method that models experimental noise to enable researchers to accurately resolve overlapping sORFs and noncanonical translation initiation. We experimentally validated translation using major histocompatibility complex class I (MHC I) peptidomics and observed that sORF-derived peptides efficiently enter the MHC I presentation pathway and thus constitute a substantial fraction of the antigen repertoire.

**Figure 2: Re-decoding human cytomegalovirus.**

Accession codes

Primary accessions

Proteomics Identifications Database

PXD007203

Referenced accessions

Proteomics Identifications Database

PXD000394

References

Ingolia, N.T., Ghaemmaghami, S., Newman, J.R.S. & Weissman, J.S. Science 324, 218–223 (2009).
Ingolia, N.T. Cell 165, 22–33 (2016).
Gao, X. et al. Nat. Methods 12, 147–153 (2015).
Ingolia, N.T., Lareau, L.F. & Weissman, J.S. Cell 147, 789–802 (2011).
Calviello, L. et al. Nat. Methods 13, 165–170 (2016).
Chun, S.Y., Rodriguez, C.M., Todd, P.K. & Mills, R.E. BMC Bioinformatics 17, 482 (2016).
Ingolia, N.T. et al. Cell Rep. 8, 1365–1379 (2014).
Bazzini, A.A. et al. EMBO J. 33, 981–993 (2014).
Ji, Z., Song, R., Regev, A. & Struhl, K. eLife 4, e08890 (2015).
Malone, B. et al. Nucleic Acids Res. 45, 2960–2972 (2017).
Fields, A.P. et al. Mol. Cell 60, 816–827 (2015).
Zhang, P. et al. Nat. Commun. 8, 1749 (2017).
Rutkowski, A.J. et al. Nat. Commun. 6, 7126 (2015).
Stern-Ginossar, N. et al. Science 338, 1088–1093 (2012).
Weekes, M.P. et al. Cell 157, 1460–1472 (2014).
Yewdell, J.W. Trends Immunol. 32, 548–558 (2011).
Bassani-Sternberg, M., Pletscher-Frankild, S., Jensen, L.J. & Mann, M. Mol. Cell. Proteomics 14, 658–673 (2015).
Karosiene, E., Lundegaard, C., Lund, O. & Nielsen, M. Immunogenetics 64, 177–186 (2012).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Genome Biol. 10, R25 (2009).
Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Nat. Methods 5, 621–628 (2008).
Erhard, F. & Zimmer, R. Nucleic Acids Res. 43, e136 (2015).
Hong, Y. Comput. Stat. Data Anal. 59, 41–51 (2013).
Pachter, L. Preprint at http://arxiv.org/abs/1104.3889 (2011).
Kowalewski, D.J. & Stevanović, S. Methods Mol. Biol. 960, 145–157 (2013).
Cox, J. & Mann, M. Nat. Biotechnol. 26, 1367–1372 (2008).

Acknowledgements

This work was funded by the MRC (Clinical Fellowship grant G1002523 to L.D.), NHSBT (grant WP11-05 to L.D.), the European Research Council (grant ERC-2016-CoG 721016–HERPES to L.D.) and the Wellcome Trust (Senior Clinical Research Fellowship 108070/Z/15/Z to M.P.W.). R.Z. acknowledges partial funding from the DFG (SFB 1123) and from Bavaria (BioSysNet). We thank S. Gorsky for critical reading of the manuscript.

Author information

Authors and Affiliations

Institute for Informatics, Ludwig-Maximilians-Universität München, München, Germany
Florian Erhard & Ralf Zimmer
Institute for Virology and Immunobiology, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
Florian Erhard & Lars Dölken
Institute of Virology, Medical Center, University of Freiburg, Freiburg, Germany
Anne Halenius & Cosima Zimmermann
Faculty of Medicine, University of Freiburg, Freiburg, Germany
Anne Halenius & Cosima Zimmermann
Innovative Medicines & Early Development, AstraZeneca UK Ltd, Cambridge, UK
Anne L'Hernault
Department of Immunology, Interfaculty Institute for Cell Biology, University of Tübingen, Tübingen, Germany
Daniel J Kowalewski & Stefan Stevanovic
Immatics Biotechnologies GmbH, Tübingen, Germany
Daniel J Kowalewski
Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
Michael P Weekes

Contributions

F.E. designed and implemented the computational approach. R.Z. supervised the development of the computational methods. A.H., C.Z., D.J.K. and S.S. provided the MHC I peptidome analysis. M.P.W. provided whole-proteome mass spectrometry data. A.L. provided Ribo-seq data used for the validation of this approach. F.E. and L.D. designed the experiments and wrote the paper.

Corresponding authors

Correspondence to Florian Erhard, Ralf Zimmer or Lars Dölken.

Ethics declarations

Competing interests

D.J.K. is an employee of Immatics Biotechnologies GmbH.

Integrated supplementary information

Supplementary Figure 1 Deterministic versus probabilistic mapping strategies.

(a) The read length distribution of the HSV-1 Ribo-seq experiment (see Supplementary Table 1) shows that cleavage is stochastic. Single 5’ mismatches are frequent and shift the read-length distribution by +1. (b) In the HSV-1 data set, the most abundant read class is 28 bp long, and the annotated codon triplets start at positions divisible by three within the read (Frame=0). Shorter or longer reads with frame 0 are also highly abundant, indicating high variability of cleavage distances downstream of the P site. Additionally, a substantial fraction of 29 bp long reads is in frame 1, and many 27 bp long reads are in frame 2. This indicates that cleavage upstream of the ribosome also varies significantly, albeit less than downstream. A substantial fraction of reads have a mismatch at their 5’ end, which can be attributed to untemplated nucleotide additions during cDNA synthesis. (c) For each data set (see Supplementary Table 1), the fraction of CDS-mapped reads that have the most frequent read length is shown. In all data sets, only a minority of reads have the canonical length. (d) Only a fraction of all canonical reads correspond to footprints with the P site codon at a defined distance from their respective start position in the read alignment (e.g. position 12 within the read).
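The read classes in panel (b) can be tabulated with a few lines of Python. This is a minimal sketch, not PRICE code: the function name `read_class_table`, the `(start, length)` read representation and the single `cds_start` coordinate are illustrative assumptions. The frame is taken as the position (mod 3) of the first codon boundary inside the read, so a read with one extra 5’ nucleotide lands in frame 1 and a read lacking one 5’ nucleotide lands in frame 2, as described above.

```python
from collections import Counter

def read_class_table(reads, cds_start):
    """Count reads per (length, frame) class.

    reads: iterable of (start, length) pairs in transcript coordinates
           (hypothetical input format).
    cds_start: coordinate of the first nucleotide of the annotated start codon.
    """
    table = Counter()
    for start, length in reads:
        # Offset (mod 3) of the first codon boundary relative to the read start.
        frame = (cds_start - start) % 3
        table[(length, frame)] += 1
    return table

# Toy reads: two canonical 28-mers, one 29-mer with an extra 5' nucleotide,
# and one 27-mer missing one 5' nucleotide.
reads = [(100, 28), (103, 28), (99, 29), (101, 27)]
table = read_class_table(reads, cds_start=100)
```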

Supplementary Figure 2 Probabilistic mapping approach.

(a) Ribo-seq produces several distinct reads from a given codon (AGU) located in the P site of the ribosome. Besides the canonical read (1), both shorter (2) and longer (5) reads arise due to stochastic mRNA cleavage. Further variability is caused by observed (3) or unobservable (4) untemplated single nucleotide additions. The proportions of these reads are determined by the model parameters. (b) The deterministic approach of mapping reads to P site codons assigns all reads of a certain class (characterized by its length and 5’ mismatch state) to a specific codon at the same position within each read. In contrast, probabilistic mapping solves the inverse problem of the generative model to assign reads to codons. Finally, regularization can be used to overcome the problem of fractional mappings that are due to the over-parameterized problem.
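The idea of solving the inverse problem in panel (b) can be illustrated with a plain expectation-maximization loop. This is a hedged sketch under strong assumptions, not the PRICE implementation: the per-class offset probabilities in `offset_probs` are taken as given rather than estimated, the regularization step is omitted, and all names (`probabilistic_mapping`, the class labels `"28nt"` and `"29nt"`) are hypothetical.

```python
from collections import defaultdict

def probabilistic_mapping(reads, offset_probs, n_iter=50):
    """Distribute reads over candidate P site codons by EM.

    reads: dict {(read_start, read_class): count}
    offset_probs: dict {read_class: {offset: probability}}, where offset is
        the assumed distance from the read 5' end to the P site codon.
    Returns a dict {codon_position: expected number of reads}.
    """
    positions = {start + off
                 for (start, cls) in reads
                 for off in offset_probs[cls]}
    activity = {pos: 1.0 for pos in positions}  # flat initial guess
    for _ in range(n_iter):
        new_activity = defaultdict(float)
        for (start, cls), count in reads.items():
            # E-step: weight each candidate codon by activity * offset probability.
            weights = {start + off: activity[start + off] * p
                       for off, p in offset_probs[cls].items()}
            total = sum(weights.values())
            if total == 0:
                continue
            # M-step contribution: split the read count proportionally.
            for pos, w in weights.items():
                new_activity[pos] += count * w / total
        activity = {pos: new_activity.get(pos, 0.0) for pos in positions}
    return activity

# Toy data generated from a single active codon at position 112.
offset_probs = {"28nt": {12: 0.9, 13: 0.1}, "29nt": {12: 0.3, 13: 0.7}}
reads = {(100, "28nt"): 90, (99, "28nt"): 10,
         (100, "29nt"): 30, (99, "29nt"): 70}
activity = probabilistic_mapping(reads, offset_probs)
```

A deterministic mapping with a fixed offset of 12 would place 80 of these 200 reads on the neighbouring position 111, whereas the iterative assignment concentrates essentially all reads on position 112.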

Supplementary Figure 3 Start-codon prediction.

(a) Potential start codons are predicted by comparing the activity of each codon to the mean activity of downstream codons. Data from several different kinds of samples, such as translation start site profiling of cells treated with lactimidomycin or harringtonine prior to cell lysis, can be included in the start codon prediction using a logistic regression approach. A start codon score (s) is computed via logistic regression for each codon in an ORF candidate. Three examples (x, y and z) are shown. (b) For all annotated ORFs with at least 25 reads, start scores (see Figure 1e) were predicted for all codons. We computed the relative ranks for all codons (i.e. the maximal score gets rank 0) and plotted the distribution of ranks for the annotated start codon. For both the HCMV and HSV-1 data sets, about three quarters of the start codons are top ranked.
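The two quantities in this legend can be sketched as follows. Both functions are illustrative assumptions rather than the published procedure: `downstream_ratio` is one plausible way to compare a codon with the mean of its downstream codons (the window size and pseudocount are hypothetical), and `relative_rank` normalizes ranks to the range 0 to 1 so that the best-scoring codon gets rank 0. The logistic regression itself is not reproduced.

```python
def downstream_ratio(counts, i, window=10, pseudocount=1.0):
    """Activity of codon i relative to the mean of the next `window` codons."""
    downstream = counts[i + 1:i + 1 + window]
    if not downstream:
        return 0.0
    mean_downstream = sum(downstream) / len(downstream)
    return (counts[i] + pseudocount) / (mean_downstream + pseudocount)

def relative_rank(scores, index):
    """Fraction of the other codons scoring strictly higher than codon `index`.

    The top-scoring codon gets rank 0, the lowest-scoring codon rank 1.
    """
    if len(scores) < 2:
        return 0.0
    higher = sum(1 for s in scores if s > scores[index])
    return higher / (len(scores) - 1)

codon_counts = [0, 0, 50, 5, 5, 5]
ratio = downstream_ratio(codon_counts, 2, window=3)   # (50 + 1) / (5 + 1)
start_scores = [0.1, 0.9, 0.3]
rank_of_annotated = relative_rank(start_scores, 1)    # top-ranked codon
```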

Supplementary Figure 4 Signal-to-noise ratio of read classes.

Read classes (characterized by read length and 5’ mismatch state) exhibit specific signal to noise ratios from slightly above the theoretical minimum (dashed horizontal line) to more than 8. Ignoring 5’ mismatches (pale orange) increases noise levels. For this experiment, the optimal deterministic mapping strategy utilizes four read classes, namely 28 and 29 bp without 5’ mismatch, and 29 and 30 bp long reads with 5’ mismatch.

Supplementary Figure 5 Improved signal-to-noise with PRICE.

(a) Deterministic mapping approaches trade off signal against signal to noise ratio (connected dots, from left to right) when considering reads of additional sizes. Probabilistic mapping and regularization (symbols for the regularization parameter δ) substantially improve Ribo-seq data quality for a broad range of published data sets (see Supplementary Table 1). (b) Aggregated reads around start codons demonstrate the improved signal to noise ratio of PRICE for both the HSV-1 and HCMV data set. Codon counts were normalized to the start codon for all well-translated ORFs (at least one read per codon on average) and aggregated per position relative to the start codon.
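The aggregation in panel (b) can be sketched as below. This is a minimal illustration with assumed details: ORFs are given as hypothetical `(counts, start, end)` tuples of per-codon counts and codon indices, ORFs with a zero count at the start codon are skipped because they cannot be normalized, and normalized values are summed per relative position (a mean would be an equally plausible choice).

```python
from collections import defaultdict

def aggregate_around_start(orfs, upstream=2, downstream=2, min_mean=1.0):
    """Sum start-codon-normalized codon counts per position relative to the start.

    orfs: list of (counts, start, end); counts are per-codon read counts along
          the transcript, and the ORF covers codon indices start..end-1.
    """
    profile = defaultdict(float)
    for counts, start, end in orfs:
        orf_counts = counts[start:end]
        # Keep only well-translated ORFs: at least `min_mean` reads per codon.
        if not orf_counts or sum(orf_counts) / len(orf_counts) < min_mean:
            continue
        if counts[start] == 0:
            continue  # cannot normalize to an unobserved start codon
        for rel in range(-upstream, downstream + 1):
            pos = start + rel
            if 0 <= pos < len(counts):
                profile[rel] += counts[pos] / counts[start]
    return dict(profile)

orfs = [
    ([0, 0, 10, 5, 5, 0], 2, 5),  # well translated, start codon at index 2
    ([0, 1, 0, 0], 1, 4),         # fewer than one read per codon, skipped
]
profile = aggregate_around_start(orfs)
```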

Supplementary Figure 6 Between-laboratory comparisons of HFF data sets (HCMV compared with HSV-1).

(a) The top-ranked uoORFs from ORF-scoring methods (RibORF, SPECtre, ORFScore, FLOSS) have significantly higher off-frame fractions (overlapping reads mapped out-of-frame divided by all overlapping reads) than uoORFs from ORF-identifying methods (PRICE, Rp-Bp, ORF-RATER). Data for PRICE with regularization (reg.) and using only the ORF-identifying component of PRICE with deterministic mapping (det.) are included. (b) Start codon distributions for the combined set of uORFs and uoORFs identified by the different methods are shown. Other refers to all other codons with one mismatch to AUG. Rp-Bp could only be run to identify NUG start codons. For each method, the data of the top 1000 uORFs/uoORFs according to the read density (number of reads from the HSV-1 data set mapped to codons divided by ORF length) are shown. Reproducibility of the respective ORFs between two Ribo-seq data sets from primary human fibroblasts is indicated. (c) / (d) Illustration of ORF classes analyzed in (e)-(h): uORFs (e) are defined to both start and end in the 5’-UTR of an annotated mRNA, whereas uoORFs (f) end within the downstream coding sequence. Both may either start with an AUG (g) or a non-canonical start codon (h). (e) - (h) The reproducibility (number of reproducible ORFs divided by total number of ORFs identified) is plotted against the total number of ORFs by varying their read density (translation rates). This is shown for each method and for different classes of ORFs (uORFs, uoORFs, uORFs/uoORFs starting with either an AUG or non-canonical start codon). PRICE with probabilistic mapping substantially outperforms all other methods in terms of reproducibility and total number of identified ORFs.
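The three ratios defined in this legend (off-frame fraction, read density and reproducibility) are simple to compute. The sketch below follows the definitions given in the text, but the data structures are assumptions: ORFs are keyed by hypothetical identifiers, an ORF counts as reproducible when its identifier also appears in the second data set, and the density thresholds are arbitrary.

```python
def off_frame_fraction(frame_counts, orf_frame):
    """Overlapping reads mapped out-of-frame divided by all overlapping reads."""
    total = sum(frame_counts)
    return (total - frame_counts[orf_frame]) / total

def reproducibility_curve(orfs_a, ids_b, thresholds):
    """Reproducibility versus number of ORFs for varying read density cut-offs.

    orfs_a: dict {orf_id: (reads_mapped_to_codons, orf_length)} from data set A.
    ids_b: set of ORF ids identified in data set B.
    Returns a list of (number_of_orfs, reproducibility) points.
    """
    curve = []
    for threshold in thresholds:
        selected = [orf_id for orf_id, (n_reads, length) in orfs_a.items()
                    if n_reads / length >= threshold]  # read density filter
        if not selected:
            continue
        reproducible = sum(1 for orf_id in selected if orf_id in ids_b)
        curve.append((len(selected), reproducible / len(selected)))
    return curve

orfs_a = {"u1": (300, 30), "u2": (60, 30), "u3": (15, 30)}
ids_b = {"u1", "u3"}
curve = reproducibility_curve(orfs_a, ids_b, thresholds=[0.1, 1, 5])
fraction = off_frame_fraction([80, 15, 5], orf_frame=0)
```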

Supplementary Figure 7 Intra-laboratory comparisons of HFF data sets (HSV-1 replicate 1 compared with HSV-1 replicate 2).

(a) This is the same as Supplementary Fig. 6b, except that all ORFs were used rather than only the top 1000. Note that the start codon distributions are virtually the same, but with a much larger fraction of reproducible ORFs for PRICE. The remaining panels correspond to those of Supplementary Fig. 6, here for the comparison between two replicates of the HSV-1 data set.

Supplementary Figure 8 Sensitivity analysis using mass spectrometry data.

(a) PRICE has the greatest sensitivity for identifying annotated ORFs, as estimated by the total number of peptides detected in the total proteome. (b) Novel ORFs remain virtually undetectable in the whole-proteome data.

Supplementary Figure 9 Sensitivity and specificity in MHC I peptidome analysis.

(a) As in Fig. 1c of the main text, but showing data for MHC I peptidome data set 2. (b) As in Fig. 1d of the main text, but showing data for MHC I peptidome data set 1.

Supplementary Figure 10 Predicted MHC I binding affinities.

(a) We used NetMHCcons [Karosiene et al., 2012] to predict the binding affinities of all identified peptides to the respective HLA allotypes for both MHC I peptidome data sets (published: HLA-A*03:01, HLA-A*23:01, HLA-B*08:01, HLA-B*15:01, HLA-C*07:02, HLA-C*07:04; new: HLA-A*01:01, HLA-A*03:01, HLA-B*08:01, HLA-B*51:01, HLA-C*07:01, HLA-C*01:02). Shown is the percentile rank of the IC50 value, as predicted by NetMHCcons. Strong binders are recognized by a percentile rank ≤ 1 (vertical dotted line) [Karosiene et al., 2012]. With the exception of Rp-Bp predicted peptides in the new data set, all predictions of novel ORFs result in peptides whose binding affinities strongly resemble those of peptides from annotated proteins.

Supplementary Figure 11 Rejected and confirmed novel HCMV ORFs.

(a) Many of the novel ORFs from the HCMV re-annotation (see Fig. 2 in the main text) were filtered by PRICE due to too much off-frame noise, too many inactive codons, noise down- or upstream, no clear start codon, abortive translation or very weak expression. (b) In comparison to the 168 ORFs of the reference annotation and the 248 confirmed ORFs, about half of our novel ORFs were expressed at very low levels and thus might reflect translational rather than experimental noise.

Supplementary Figure 12 Penalizing factor of start-codon prediction.

We used this shifted logistic function to compute a penalizing factor for start codon prediction. For each start codon candidate, we computed the fraction of reads downstream of this codon and multiplied the preliminary start codon prediction score (from the logistic regression) by the factor computed from the logistic function. The specific shape of the function is automatically chosen by our implementation such that (i) a preliminary score of 0.5 is still enough to call the start position (we used 0.1 as the overall threshold), and (ii) the score increases smoothly to 1 for positions where more than 60% of the reads belong to the translated region.
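A shifted logistic penalty of this kind might look like the sketch below. The legend does not state the parameters that the implementation chooses, so the `midpoint` and `steepness` values here are purely illustrative assumptions, as are the function names; they are picked only so that the factor is close to 1 once 60% of the reads lie downstream of the candidate.

```python
import math

def penalizing_factor(downstream_fraction, midpoint=0.3, steepness=20.0):
    """Shifted logistic function of the fraction of reads downstream of a codon.

    midpoint and steepness are illustrative values, not the published ones.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (downstream_fraction - midpoint)))

def penalized_start_score(preliminary_score, downstream_fraction):
    """Preliminary start codon score multiplied by the penalizing factor."""
    return preliminary_score * penalizing_factor(downstream_fraction)

THRESHOLD = 0.1  # overall calling threshold stated in the legend

# A candidate with 60% of reads downstream keeps almost its full score,
# while one with almost no downstream reads is strongly penalized.
kept = penalized_start_score(0.5, 0.6)
penalized = penalized_start_score(0.5, 0.05)
```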

Supplementary Figure 13 Data-driven estimation of the noise model.

(a) Each panel plots the mean read count per in-frame codon (in log space, discarding zero-count codons) against the fraction of in-frame (f = 0) or off-frame (f = 1 or f = 2) codons above the cut-off of 10% of the mean log read count, for all genes from the annotation. A smoothing spline is fit to each scatterplot (red lines). This spline is used as an estimate of the probability p_f that a codon in frame f is observed (above this cut-off) when an ORF is actively translated with a specific strength (as estimated by the mean log read count per codon). To filter ORF candidates with too few observed codons, another spline is fit to the squared residuals of f = 0 as an estimate of the variance of p₀. The spline functions for the mean f_m and the variance f_v of p₀ are used to construct a family of beta distributions with parameters (f_m(a) · c(a), (1 − f_m(a)) · c(a)), where c(a) = f_m(a) · (1 − f_m(a)) / f_v(a) − 1, and their 5% quantiles q₅(a). An ORF with a length of n amino acids and a mean log read count per in-frame codon of a is filtered if fewer than q₅(a) · n reads are observed (i.e. below the blue line in the scatter plot).
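The beta-distribution filter can be illustrated numerically. In this sketch the spline values f_m(a) and f_v(a) are simply passed in as numbers (`mean`, `variance`), the 5% quantile is approximated by seeded Monte Carlo sampling from the standard library rather than computed exactly, and the observed quantity is assumed to be the number of in-frame codons above the cut-off. All function names are hypothetical.

```python
import random

def beta_parameters(mean, variance):
    """Method-of-moments beta parameters: (m * c, (1 - m) * c), c = m(1 - m)/v - 1."""
    c = mean * (1.0 - mean) / variance - 1.0
    if c <= 0:
        raise ValueError("variance too large for a beta distribution")
    return mean * c, (1.0 - mean) * c

def beta_quantile_mc(alpha, beta, q=0.05, n_samples=20000, seed=0):
    """Approximate the q-quantile of Beta(alpha, beta) by sampling."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(n_samples))
    return samples[int(q * n_samples)]

def passes_filter(n_observed, n_codons, mean, variance):
    """Keep an ORF if it has at least q5 * n observed codons (assumed criterion)."""
    alpha, beta = beta_parameters(mean, variance)
    return n_observed >= beta_quantile_mc(alpha, beta) * n_codons

# Sanity check: mean 1/2 and variance 1/12 give Beta(1, 1), the uniform
# distribution, whose 5% quantile is 0.05.
alpha, beta = beta_parameters(0.5, 1.0 / 12.0)
q5 = beta_quantile_mc(alpha, beta)
```

For production use, an exact quantile function from a statistics library would replace the sampling step.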

Source data

Source data to Fig. 1

Source data to Fig. 2

About this article

Cite this article

Erhard, F., Halenius, A., Zimmermann, C. et al. Improved Ribo-seq enables identification of cryptic translation events. Nat Methods 15, 363–366 (2018). https://doi.org/10.1038/nmeth.4631

Received: 28 February 2017
Accepted: 22 December 2017
Published: 12 March 2018
Issue Date: 01 May 2018
DOI: https://doi.org/10.1038/nmeth.4631
