Noncanonical open reading frames encode functional proteins essential for cancer cell survival

Prensner, John R.; Enache, Oana M.; Luria, Victor; Krug, Karsten; Clauser, Karl R.; Dempster, Joshua M.; Karger, Amir; Wang, Li; Stumbraite, Karolina; Wang, Vickie M.; Botta, Ginevra; Lyons, Nicholas J.; Goodale, Amy; Kalani, Zohra; Fritchman, Briana; Brown, Adam; Alan, Douglas; Green, Thomas; Yang, Xiaoping; Jaffe, Jacob D.; Roth, Jennifer A.; Piccioni, Federica; Kirschner, Marc W.; Ji, Zhe; Root, David E.; Golub, Todd R.

doi:10.1038/s41587-020-00806-2

Letter
Published: 28 January 2021

John R. Prensner¹,²,³,
Oana M. Enache¹,
Victor Luria⁴,
Karsten Krug¹,
Karl R. Clauser¹,
Joshua M. Dempster¹,
Amir Karger (ORCID: orcid.org/0000-0002-4561-3850)⁵,
Li Wang¹,
Karolina Stumbraite¹,
Vickie M. Wang¹,
Ginevra Botta¹,
Nicholas J. Lyons¹,
Amy Goodale¹,
Zohra Kalani¹,
Briana Fritchman¹,
Adam Brown¹,
Douglas Alan¹,
Thomas Green¹,
Xiaoping Yang¹,
Jacob D. Jaffe (ORCID: orcid.org/0000-0001-9845-1210)¹,⁸,
Jennifer A. Roth¹,
Federica Piccioni¹,⁹,
Marc W. Kirschner⁴,
Zhe Ji (ORCID: orcid.org/0000-0002-1809-8099)⁶,⁷,
David E. Root (ORCID: orcid.org/0000-0001-5122-861X)¹ &
Todd R. Golub (ORCID: orcid.org/0000-0003-0113-2403)¹,²,³

Nature Biotechnology volume 39, pages 697–704 (2021)

Abstract

Although genomic analyses predict many noncanonical open reading frames (ORFs) in the human genome, it is unclear whether they encode biologically active proteins. Here we experimentally interrogated 553 candidates selected from noncanonical ORF datasets. Of these, 57 induced viability defects when knocked out in human cancer cell lines. Following ectopic expression, 257 showed evidence of protein expression and 401 induced gene expression changes. Clustered regularly interspaced short palindromic repeat (CRISPR) tiling and start codon mutagenesis indicated that their biological effects required translation as opposed to RNA-mediated effects. We found that one of these ORFs, G029442—renamed glycine-rich extracellular protein-1 (GREP1)—encodes a secreted protein highly expressed in breast cancer, and its knockout in 263 cancer cell lines showed preferential essentiality in breast cancer-derived lines. The secretome of GREP1-expressing cells has an increased abundance of the oncogenic cytokine GDF15, and GDF15 supplementation mitigated the growth-inhibitory effect of GREP1 knockout. Our experiments suggest that noncanonical ORFs can express biologically active proteins that are potential therapeutic targets.

**Fig. 1: Identification of translated unannotated or unstudied ORFs.**

**Fig. 2: Defining bioactive ORFs through gene expression profiling.**

**Fig. 3: CRISPR screening to identify unknown ORFs implicated in cancer cell viability.**

**Fig. 4: Characterization of *GREP1* as a cancer dependency gene in breast cancer.**

Data availability

Processed data for CRISPR screens (Figs. 3 and 4d) are available in Supplementary Tables 22 and 27. Raw data are available in the Source data files accompanying this manuscript, as well as through the NCBI Sequence Read Archive at: SRR13126801, SRR13128583, SRR13132373, SRR13142215 and SRR13142421. Mass spectrometry data relating to Fig. 1 are available in Supplementary Table 14. Raw MS spectra are available through the original datasets at: https://cptac-data-portal.georgetown.edu/study-summary/S060 (CPTAC2_BRCA_prosp), https://cptac-data-portal.georgetown.edu/study-summary/S045 (CPTAC2_COAD_prosp), https://cptac-data-portal.georgetown.edu/study-summary/S050 (CPTAC3_ccRCC), https://cptac-data-portal.georgetown.edu/study-summary/S056 (CPTAC3_LUAD), https://cptac-data-portal.georgetown.edu/study-summary/S051 (CPTAC3_PTRC_DP1), https://cptac-data-portal.georgetown.edu/study-summary/S053 (CPTAC3_UCEC), ftp://massive.ucsd.edu/MSV000080527 (HLA_Abelin), ftp://massive.ucsd.edu/MSV000084787 (HLA_Ouspenskaia), ftp://massive.ucsd.edu/MSV000084172/; ftp://massive.ucsd.edu/MSV000080527; ftp://massive.ucsd.edu/MSV000084442/ (HLA_Sarkizova), ftp://massive.ucsd.edu/MSV000082644 (CPTAC Medulloblastoma) and http://www.peptideatlas.org (PeptideAtlas database). L1000 data relating to Fig. 2 and Supplementary Figs. 8 and 9 are available through the NIH LINCS program and at https://clue.io/data. The website lincsproject.org provides information about the LINCS consortium, including data standards. Source data are provided with this paper.

Code availability

L1000 data analysis code and preprocessed data are available via GitHub: https://github.com/cmap/cmapM. There is additional information about this database and tools at http://clue.io/connectopedia. L1000 data were analyzed via the following: the ‘tidyverse’ suite³⁶ of R packages (v.1.2.1), the ‘cmapR’ package³⁷ (v.1.0.1) in R v.3.5.0 (R Core Team 2018) and in-house code available through GitHub (https://github.com/johnprensner/smORF_analyses). Mass spectrometry peptides were processed via Spectrum Mill MS Proteomics Workbench v.6.0. Additional code for computational tools used in this study is listed here: PhyloCSF (https://github.com/mlin/PhyloCSF/wiki) for 29-mammal alignment, Slncky (https://slncky.github.io), STARS v.1.3 (http://www.broadinstitute.org/rnai/public/software/index) and CERES v.1.0 (https://github.com/cancerdatasci/ceres).

References

Ewing, B. & Green, P. Analysis of expressed sequence tags indicates 35,000 human genes. Nat. Genet. 25, 232–234 (2000).
Fields, C., Adams, M. D., White, O. & Venter, J. C. How many genes in the human genome? Nat. Genet. 7, 345–346 (1994).
Liang, F. et al. Gene index analysis of the human genome estimates approximately 120,000 genes. Nat. Genet. 25, 239–240 (2000).
Omenn, G. S. et al. Progress on identifying and characterizing the human proteome: 2018 metrics from the HUPO Human Proteome Project. J. Proteome Res. 17, 4031–4041 (2018).
Ingolia, N. T. et al. Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes. Cell Rep. 8, 1365–1379 (2014).
Ji, Z., Song, R., Regev, A. & Struhl, K. Many lncRNAs, 5′UTRs, and pseudogenes are translated and some are likely to express functional proteins. eLife 4, e08890 (2015).
Pertea, M. et al. CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise. Genome Biol. 19, 208 (2018).
van Heesch, S. et al. The translational landscape of the human heart. Cell 178, 242–260 (2019).
Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
Dinger, M. E., Pang, K. C., Mercer, T. R. & Mattick, J. S. Differentiating protein-coding and noncoding RNA: challenges and ambiguities. PLoS Comput. Biol. 4, e1000176 (2008).
Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
Mouse Genome Sequencing Consortium Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).
Mudge, J. M. et al. Discovery of high-confidence human protein-coding genes and exons by whole-genome PhyloCSF helps elucidate 118 GWAS loci. Genome Res. 29, 2073–2087 (2019).
Banfai, B. et al. Long noncoding RNAs are rarely translated in two human cell lines. Genome Res. 22, 1646–1657 (2012).
Jungreis, I. et al. Nearly all new protein-coding predictions in the CHESS database are not protein-coding. Preprint at bioRxiv https://doi.org/10.1101/360602 (2018).
Bazzini, A. A. et al. Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation. EMBO J. 33, 981–993 (2014).
Branca, R. M. et al. HiRIEF LC–MS enables deep proteome coverage and unbiased proteogenomics. Nat. Methods 11, 59–62 (2014).
Cabili, M. N. et al. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 25, 1915–1927 (2011).
Calviello, L. et al. Detecting actively translated open reading frames in ribosome profiling data. Nat. Methods 13, 165–170 (2016).
Gao, X. et al. Quantitative profiling of initiating ribosomes in vivo. Nat. Methods 12, 147–153 (2015).
Gascoigne, D. K. et al. Pinstripe: a suite of programs for integrating transcriptomic and proteomic datasets identifies novel proteins and improves differentiation of protein-coding and non-coding genes. Bioinformatics 28, 3042–3050 (2012).
Iyer, M. K. et al. The landscape of long noncoding RNAs in the human transcriptome. Nat. Genet. 47, 199–208 (2015).
Kim, M. S. et al. A draft map of the human proteome. Nature 509, 575–581 (2014).
Koch, A. et al. A proteogenomics approach integrating proteomics and ribosome profiling increases the efficiency of protein identification and enables the discovery of alternative translation start sites. Proteomics 14, 2688–2698 (2014).
Ma, J. et al. Discovery of human sORF-encoded polypeptides (SEPs) in cell lines and tissue. J. Proteome Res. 13, 1757–1765 (2014).
Mackowiak, S. D. et al. Extensive identification and analysis of conserved small ORFs in animals. Genome Biol. 16, 179 (2015).
Ruiz-Orera, J., Messeguer, X., Subirana, J. A. & Alba, M. M. Long non-coding RNAs as a source of new peptides. eLife 3, e03523 (2014).
Schwaid, A. G. et al. Chemoproteomic discovery of cysteine-containing human short open reading frames. J. Am. Chem. Soc. 135, 16750–16753 (2013).
Slavoff, S. A. et al. Peptidomic discovery of short open reading frame-encoded peptides in human cells. Nat. Chem. Biol. 9, 59–64 (2013).
Sun, H. et al. Integration of mass spectrometry and RNA-seq data to confirm human ab initio predicted genes and lncRNAs. Proteomics 14, 2760–2768 (2014).
Zhang, C. et al. Systematic analysis of missing proteins provides clues to help define all of the protein-coding genes on human chromosome 1. J. Proteome Res. 13, 114–125 (2014).
Vanderperre, B. et al. Direct detection of alternative open reading frames translation products in human significantly expands the proteome. PLoS ONE 8, e70698 (2013).
Wilhelm, M. et al. Mass-spectrometry-based draft of the human proteome. Nature 509, 582–587 (2014).
Subramanian, A. et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 171, 1437–1452 (2017).
Nassa, M. et al. Analysis of human collagen sequences. Bioinformation 8, 26–33 (2012).
Breit, S. N., Tsai, V. W. & Brown, D. A. Targeting obesity and cachexia: Identification of the GFRAL receptor-MIC-1/GDF15 pathway. Trends Mol. Med. 23, 1065–1067 (2017).
Mullican, S. E. & Rangwala, S. M. Uniting GDF15 and GFRAL: therapeutic opportunities in obesity and beyond. Trends Endocrinol. Metab. 29, 560–570 (2018).
Baroni, M. et al. Distinct response to GDF15 knockdown in pediatric and adult glioblastoma cell lines. J. Neurooncol. 139, 51–60 (2018).
Huang, C. Y. et al. Molecular alterations in prostate carcinomas that associate with in vivo exposure to chemotherapy: identification of a cytoprotective mechanism involving growth differentiation factor 15. Clin. Cancer Res. 13, 5825–5833 (2007).
Ratnam, N. M. et al. NF-kappaB regulates GDF-15 to suppress macrophage surveillance during early tumor development. J. Clin. Invest. 127, 3796–3809 (2017).
Corre, J. et al. Bioactivity and prognostic significance of growth differentiation factor GDF15 secreted by bone marrow mesenchymal stem cells in multiple myeloma. Cancer Res. 72, 1395–1406 (2012).
Peake, B. F., Eze, S. M., Yang, L., Castellino, R. C. & Nahta, R. Growth differentiation factor 15 mediates epithelial mesenchymal transition and invasion of breast cancers through IGF-1R-FoxM1 signaling. Oncotarget 8, 94393–94406 (2017).
Martinez, T. F. et al. Accurate annotation of human protein-coding small open reading frames. Nat. Chem. Biol. 16, 458–468 (2020).
Chen, J. et al. Pervasive functional translation of noncanonical human open reading frames. Science 367, 1140–1146 (2020).
Xie, W. et al. Epigenomic analysis of multilineage differentiation of human embryonic stem cells. Cell 153, 1134–1148 (2013).
Chen, J. et al. Evolutionary analysis across mammals reveals distinct classes of long non-coding RNAs. Genome Biol. 17, 19 (2016).
Liu, S. J. et al. CRISPRi-based genome-scale identification of functional long noncoding RNA loci in human cells. Science 355, aah7111 (2017).
Petersen, T. N., Brunak, S., von Heijne, G. & Nielsen, H. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods 8, 785–786 (2011).
Kelley, L. A., Mezulis, S., Yates, C. M., Wass, M. N. & Sternberg, M. J. The Phyre2 web portal for protein modeling, prediction and analysis. Nat. Protoc. 10, 845–858 (2015).
Domazet-Loso, T., Brajkovic, J. & Tautz, D. A phylostratigraphy approach to uncover the genomic history of major adaptations in metazoan lineages. Trends Genet. 23, 533–539 (2007).
Domazet-Loso, T. et al. No evidence for phylostratigraphic bias impacting inferences on patterns of gene emergence and evolution. Mol. Biol. Evol. 34, 843–856 (2017).
Kumar, S., Stecher, G., Suleski, M. & Hedges, S. B. TimeTree: a resource for timelines, timetrees, and divergence times. Mol. Biol. Evol. 34, 1812–1819 (2017).
Yang, X. et al. A public genome-scale lentiviral expression library of human ORFs. Nat. Methods 8, 659–661 (2011).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Ross, Z., Wickham, H., Robinson, D. Declutter your R workflow with tidy tools. Preprint at PeerJ https://peerj.com/preprints/3180.pdf (2017).
Enache, O. M. et al. The GCTx format and cmap{Py, R, M, J} packages: resources for optimized storage and integrated traversal of annotated dense matrices. Bioinformatics 35, 1427–1429 (2019).
Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34, 184–191 (2016).
Piccioni, F., Younger, S. T. & Root, D. E. Pooled lentiviral-delivery genetic screens. Curr. Protoc. Mol. Biol. 121, 32.1.1–32.1.21 (2018).
Meyers, R. M. et al. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nat. Genet. 49, 1779–1784 (2017).
Hart, T., Brown, K. R., Sircoulomb, F., Rottapel, R. & Moffat, J. Measuring error rates in genomic perturbation screens: gold standards for human functional genomics. Mol. Syst. Biol. 10, 733 (2014).
Bae, S., Park, J. & Kim, J. S. Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 30, 1473–1475 (2014).
Yu, C. et al. High-throughput identification of genotype-specific cancer vulnerabilities in mixtures of barcoded tumor cell lines. Nat. Biotechnol. 34, 419–423 (2016).
Pinello, L. et al. Analyzing CRISPR genome-editing experiments with CRISPResso. Nat. Biotechnol. 34, 695–697 (2016).
Niknafs, Y. S. et al. MiPanda: a resource for analyzing and visualizing next-generation sequencing transcriptomics data. Neoplasia 20, 1144–1149 (2018).
Shevchenko, A., Wilm, M., Vorm, O. & Mann, M. Mass spectrometric sequencing of proteins silver-stained polyacrylamide gels. Anal. Chem. 68, 850–858 (1996).
Peng, J. & Gygi, S. P. Proteomics: the move to mixtures. J. Mass Spectrom. 36, 1083–1091 (2001).
Eng, J. K., McCormack, A. L. & Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976–989 (1994).
Beausoleil, S. A., Villen, J., Gerber, S. A., Rush, J. & Gygi, S. P. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat. Biotechnol. 24, 1285–1292 (2006).
Jones, D. T. & Cozzetto, D. DISOPRED3: precise disordered region predictions with annotated protein-binding activity. Bioinformatics 31, 857–863 (2015).

Acknowledgements

We thank D. Bondeson, P. Tsvetkov, S. Corsello, U. Ben-David and T. Ouspenskaia for helpful discussions and critical reading of the manuscript. We thank M. Zhong for technical assistance with cloning and Z. Demere for assistance with CRISPR-sequencing. We thank D. Nusinow and S. Gygi for insights into identification of small peptides in proteomics datasets. We thank R. Tomaino for assistance with mass spectrometry at the Taplin Biological Mass Spectrometry Facility at Harvard Medical School. We thank J. Chen for assistance with the Slncky algorithm. We thank J. Gould for assistance with gene datasets. We thank I. Cheeseman for provision of DOX-inducible HeLa Cas9 cells. J.R.P. was supported by the Harvard K-12 in Central Nervous System tumors (grant 5K12 CA 90354-18). V.L. and M.W.K. were supported by the National Institutes of Health (grants R01 HD073104 and R01 HD091846 to M.W.K.).

Author information

Jacob D. Jaffe
Present address: Inzen Therapeutics, Cambridge, MA, USA
Federica Piccioni
Present address: Merck Research Laboratories, Boston, MA, USA

Authors and Affiliations

Broad Institute of Harvard and MIT, Cambridge, MA, USA
John R. Prensner, Oana M. Enache, Karsten Krug, Karl R. Clauser, Joshua M. Dempster, Li Wang, Karolina Stumbraite, Vickie M. Wang, Ginevra Botta, Nicholas J. Lyons, Amy Goodale, Zohra Kalani, Briana Fritchman, Adam Brown, Douglas Alan, Thomas Green, Xiaoping Yang, Jacob D. Jaffe, Jennifer A. Roth, Federica Piccioni, David E. Root & Todd R. Golub
Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
John R. Prensner & Todd R. Golub
Division of Pediatric Hematology/Oncology, Boston Children’s Hospital, Boston, MA, USA
John R. Prensner & Todd R. Golub
Department of Systems Biology, Harvard Medical School, Boston, MA, USA
Victor Luria & Marc W. Kirschner
IT-Research Computing, Harvard Medical School, Boston, MA, USA
Amir Karger
Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Zhe Ji
Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, IL, USA
Zhe Ji

Contributions

J.R.P. and T.R.G. conceived the project, designed experimental approaches, supervised the study and analyzed data. J.R.P. selected ORFs for screening and developed ORF prioritization methods. J.R.P. and X.Y. designed and generated the ORF cDNA library. J.R.P. performed ORF library screening, in vitro CRISPR experiments, siRNA experiments, immunoblots, cell culture assays and all GREP1 functional experiments. B.F. executed the arrayed ORF screen for L1000. O.M.E. and N.J.L. performed gene expression profiling and analyzed L1000 gene expression data. Z.J. contributed ORF predictions and assisted in analysis of ORF candidates. V.L., A.K., M.W.K. and J.R.P. performed protein evolutionary analyses and analyzed phylostratigraphy data. K.K., K.R.C. and J.D.J. performed proteomic identification of ORFs from datasets. J.R.P., F.P. and D.E.R. designed and analyzed CRISPR screens. T.G., D.A. and A.B. assisted with sgRNA design. A.G. and Z.K. performed cell line CRISPR screens. L.W., K.S., G.B. and J.A.R. performed pooled CRISPR screening. V.M.W. and J.M.D. analyzed pooled CRISPR screen data. J.M.D. performed comparative analyses of ORF CRISPR data with publicly available CRISPR screens. J.R.P. and T.R.G. wrote the manuscript draft and all authors contributed to editing it.

Corresponding author

Correspondence to Todd R. Golub.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Generation and validation of a non-canonical ORF cDNA library.

a, Vector design and sequence details for the ORF library. The vector used is a modified version of the plx307 vector developed by the Genomic Perturbation Platform at the Broad Institute. b, Titration analyses of in-cell western experiments. Three ORFs were chosen: eGFP (positive control), LINC00116 (high-expressing ORF) and RP11-539I5 (low-expressing ORF). Increasing amounts of plasmid were transfected into increasing numbers of HEK293T cells as shown. c, Quantification of the in-cell western titration shown in b, demonstrating signal detection over noise and signal plateau. Signal was quantified using pixel density in the 800 nm green color channel. d, Replicate experiments assessing signal-to-noise thresholds for a low-expressing ORF transfected into HEK293T cells with a low DNA plasmid concentration, as well as a high-expressing ORF (eGFP) transfected into HEK293T cells at a high DNA plasmid concentration. e, Example in-cell western data in triplicate experiments for selected ORFs. f, Abrogation of protein translation via mutation of the ORF for selected examples. g, A systematic evaluation of in-cell western signal for wild-type and mutant ORFs for all pairs. ORFs are separated into those with signal above the baseline threshold and those without reproducible signal. h, An immunoblot showing in vitro transcription/translation of selected tag-free ORFs using a wheat germ lysate system. Red arrows indicate the translated ORFs. Results were repeated in two independent experiments.

Extended Data Fig. 2 Analysis of paired wild-type and mutant constructs in L1000 data.

a, A strategy for ORF mutagenesis in which the start codon and downstream methionines were mutated to alanine. The amino acid sequence shown is fictional. b, A pie chart showing the number and percentage of amino acids changed per ORF by the mutagenesis. c, A violin plot showing the number of Perturbational Class (PCL) connections made at the 98th percentile for matched mutant and wild-type constructs (n = 47 for each; all data points are biologically independent experiments). P value by a two-tailed Wilcoxon matched pairs rank test. d, Left, the overall distribution of PCL connections across all ranks in wild-type and mutant constructs (n = 19,012 independent comparisons for each). Right, an inset of the distribution of PCL connections at high connectivity, showing a bias toward connections made with wild-type compared to mutant constructs (n = 1,920 independent comparisons each). P value by a two-tailed Wilcoxon matched pairs rank test. e, All PCL connections in wild-type constructs at either the ≥95th percentile or ≤−95th percentile, with the matched percentile connectivity in the mutant constructs. f, The distribution of percentile connectivity results in wild-type or mutant constructs for the indicated genes. In brief, all ORF L1000 signatures were queried against all PCL classes and a percentile connectivity was generated for each individual cell line and for both wild-type and mutant constructs. Cell line and construct data were then aggregated and ranked from highest to lowest connectivity. The rank positions of wild-type and mutant ORFs were then plotted to reveal a depletion of mutant constructs at high connectivity scores. g, Two example heatmaps for the TINCR and SLC35A4 uORF plasmids showing clustering of PCL connectivity among wild-type constructs that is not shared with mutant constructs. Purple bars denote wild-type ORF experiments and green bars denote mutant ORF experiments. h, L1000 signature replicate reproducibility for all wild-type and mutant pairs across all cell lines. All ORF signatures with at least one reproducible wild-type signature are shown.
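
The aggregation and ranking described for panel f can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names `rank_constructs` and `mutant_fraction_in_top` are hypothetical, the scores are made-up numbers, and summarizing depletion as the mutant fraction among the top-ranked entries is an assumption made here for illustration.

```python
from typing import List, Tuple


def rank_constructs(wt_scores: List[float],
                    mut_scores: List[float]) -> List[Tuple[float, str]]:
    """Pool wild-type and mutant percentile-connectivity scores and
    sort them from highest to lowest connectivity."""
    pooled = [(s, "WT") for s in wt_scores] + [(s, "MUT") for s in mut_scores]
    return sorted(pooled, key=lambda pair: pair[0], reverse=True)


def mutant_fraction_in_top(ranked: List[Tuple[float, str]], top_n: int) -> float:
    """Fraction of the top_n ranked entries that come from mutant constructs.
    A value well below the overall mutant fraction indicates depletion."""
    top = ranked[:top_n]
    if not top:
        return 0.0
    return sum(1 for _, label in top if label == "MUT") / len(top)


# Illustrative (made-up) percentile connectivity values.
wt = [99.1, 97.4, 96.0, 60.2, 12.5]
mut = [70.3, 55.0, 41.8, 20.1, 5.6]

ranked = rank_constructs(wt, mut)
print(mutant_fraction_in_top(ranked, 3))   # top three entries are all wild type
print(mutant_fraction_in_top(ranked, 10))  # whole pool: half are mutant
```

In this toy example the mutant fraction is 0.0 among the three highest-ranked entries and 0.5 across the whole pool, which is the kind of depletion at high connectivity that the caption describes.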

Extended Data Fig. 3 Validation of CRISPR hits via manual assays.

a–i, CRISPR assays using doxycycline-inducible Cas9 in HeLa cells. Targets are divided into those that validated and those that did not. For each experiment, the panel on the right shows qPCR data of expression 96 hours after induction of Cas9 with doxycycline. a, ZBTB11-AS1; b, HP08474; c, GREP1; d, RP11-54A9.1; e, G083755; f, OLMALINC; g, CTD-2270L9.4; h, RP11-277L2.3; i, ASNSD1 uORF. j,k, CRISPR assays using stably-expressing A375 Cas9 cells. j, CTD-2270L9.4; k, ASNSD1 uORF. For all data in this figure, n = 6 technical replicates for each data point. Error bars represent standard deviation. Data were also acquired as three independent biological replicates based on doxycycline dose level (0.2 µg/mL, 1.0 µg/mL and 2.0 µg/mL doxycycline, as well as 0 µg/mL doxycycline). The data shown are the 1.0 µg/mL dosing level, with similar results observed for the 0.2 µg/mL and 2.0 µg/mL doxycycline dosing levels.

Extended Data Fig. 4 Tiling CRISPR assays to elucidate functional non-canonical ORFs.

a, A heatmap showing log fold change viability loss at Day +21 in the secondary CRISPR screen for the indicated non-canonical ORFs tested by multiple tiling sgRNA regions. b–e, Examples of non-canonical ORFs with a CRISPR tiling phenotype, shown as graphical representations of tiling CRISPR assays in which each dot represents an individual sgRNA. sgRNAs are mapped to their genomic loci and the genomic region of the tiling assay is shown. The location of the putative non-canonical ORF is shown in the gene annotation above. b, CTD-2270L9.4; c, OLMALINC; d, RP11-54A9.1; e, RPP14 dORF / HTD2. f–k, Representative sgRNA log fold change data for the indicated transcripts. Each tiling experiment is classified as indicated. f, LINC00662; g, RP11-195B21.3; h, LYRM4-AS1; i, ESRG; j, TCONS_I2_00007040; k, LINC01184.
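
As a rough illustration of the per-sgRNA log fold change values plotted in these tiling panels, the Python sketch below computes a log2 fold change between an early and a late read-count sample and orders the guides by genomic position. It is a sketch under stated assumptions, not the pipeline used in the study: reads-per-million normalization, the pseudocount of 1, the log base and the guide names and coordinates are all assumptions introduced here.

```python
import math
from typing import Dict, List, Tuple


def log2_fold_change(early_counts: Dict[str, int],
                     late_counts: Dict[str, int],
                     pseudocount: float = 1.0) -> Dict[str, float]:
    """Per-sgRNA log2 fold change (late vs early) after reads-per-million
    normalization. The normalization and pseudocount are assumptions."""
    early_total = sum(early_counts.values())
    late_total = sum(late_counts.values())
    if early_total == 0 or late_total == 0:
        raise ValueError("each sample needs at least one read")
    lfc = {}
    for guide, early in early_counts.items():
        late = late_counts.get(guide, 0)
        early_rpm = early / early_total * 1e6 + pseudocount
        late_rpm = late / late_total * 1e6 + pseudocount
        lfc[guide] = math.log2(late_rpm / early_rpm)
    return lfc


def order_by_position(lfc: Dict[str, float],
                      positions: Dict[str, int]) -> List[Tuple[int, float]]:
    """Pair each guide's genomic coordinate with its log fold change,
    sorted by coordinate, ready for a tiling plot (one dot per sgRNA)."""
    return sorted((positions[g], value) for g, value in lfc.items())


# Hypothetical guides, counts and coordinates for illustration only.
early = {"sg_intron": 500, "sg_orf": 500}
late = {"sg_intron": 875, "sg_orf": 125}
coords = {"sg_intron": 1_000, "sg_orf": 2_000}

lfc = log2_fold_change(early, late)
for position, value in order_by_position(lfc, coords):
    print(position, round(value, 2))
```

In this toy input the guide labeled as falling inside the ORF drops out by roughly fourfold (log2 fold change near −2) while the guide outside it does not, which is the pattern a tiling assay looks for when localizing a viability phenotype to the ORF.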

Extended Data Fig. 5 Specific siRNA knockdown of ZBTB11-AS1 mRNA transcript causes a viability phenotype which is specifically rescued by the wild type ZBTB11-AS1 ORF.

a, A schematic showing the genomic location and sequences for the two siRNAs used for ZBTB11-AS1. b, mRNA expression levels for ZBTB11-AS1 or ZBTB11 transcripts 48 hours after siRNA knockdown of ZBTB11-AS1 in A549 cells. N = 3 independent replicates for all conditions. Barplots represent mean ± standard deviation. c, Relative cell viability of A549 cells treated with ZBTB11-AS1 siRNAs at 72 hours. Parental A549 cells were used along with A549 cells expressing cDNAs for GFP, wild-type ZBTB11-AS1 ORF sequence, or mutant ZBTB11-AS1 ORF lacking translational start sites. Only the wild-type ZBTB11-AS1 ORF sequence rescues the viability phenotype. N = 6 independent replicates for all conditions. Barplots represent mean ± standard deviation. d, DNA and amino acid sequences of the wild-type and mutant ZBTB11-AS1 ORF cDNAs. *P < 0.05, **P < 0.01. n.s., non-significant. For P values: Parental, non-targeting vs siRNA #1 P < 0.0001, non-targeting vs siRNA #2 P < 0.0001; GFP, non-targeting vs siRNA #1 P = 0.0008, non-targeting vs siRNA #2 P < 0.0001; WT ORF, non-targeting vs siRNA #1 P = 0.04, non-targeting vs siRNA #2 P = 0.83; MUT ORF, non-targeting vs siRNA #1 P = 0.001, non-targeting vs siRNA #2 P = 0.02. P values by a two-tailed Student’s t-test.

Extended Data Fig. 6 The GREP1 locus and expression.

a, A schematic representation of the GREP1 gene structure and the annotation of this locus in the indicated databases. The year of release for each database is indicated. b, mRNA expression level of GREP1 across tumor lineages in the Cancer Cell Line Encyclopedia. The Y axis is on a log10 scale. c, mRNA expression of GREP1 across tumor types using TCGA and GTEx data. A two-tailed Student’s t-test was used to calculate significance of change between normal and cancer tissues. Cell lineages are grouped according to whether GREP1 expression is specifically modulated in cancer, universally expressed as a lineage gene, or not robustly expressed in the indicated lineage.

Extended Data Fig. 7 GREP1 is implicated in cell proliferation and breast cancer patient outcomes.

a, Cell viability curves following GREP1 knockout in three sensitive and three insensitive cell lines. GREP1 expression in the Cancer Cell Line Encyclopedia is indicated in transcripts per million (TPM). b, A scatter plot showing lineage-specific correlation between cell viability and GREP1 mRNA expression on the X axis with the average GREP1 expression level on the Y axis. c, Overall survival for breast cancer patients in the TCGA database stratified by GREP1 expression. N = 1,036 individual patients. N = 969 GREP1-low and N = 67 GREP1-high patients. Significance by a one-sided log-rank P value. d, Overall survival for colon cancer patients in the TCGA database stratified by GREP1 expression. N = 296 individual patients. N = 38 GREP1-high and N = 258 GREP1-low patients. Significance by a one-sided log-rank P value. e, Immunoblot of V5-tagged GREP1 or GFP in HEK293T cells in both whole cell lysate and conditioned media. A mutant GREP1, in which translational start sites were mutated to alanine, lacks protein translation initiation ability. Results were repeated in three independent experiments. i, Abundance of mass spectrometry peptides detected in the full-length GREP1 or cleavage product GREP1 proteins. Peptide abundance is represented as a fraction of total peptides detected. All error bars represent standard deviation.

Extended Data Fig. 8 GREP1 is associated with the extracellular matrix.

a, Total fraction of amino acid usage in the ORFeome, GENBANK, GREP1, and the Collagen alpha-1 family. Sequence similarities between GREP1 and the collagen family are indicated. b, Predicted disorder score for the GREP1 amino acid sequence. c, Amino acid conservation for detected homologs of GREP1 in the indicated species. d, Non-denaturing native western blot of GREP1 in conditioned media from HEK293T cells expressing V5-tagged GREP1. e, Representative Coomassie-stained gels for immunoprecipitation of GREP1 from the conditioned media of HEK293T cells. Two representative biological replicates are shown. f, Enrichment of extracellular matrix proteins in the IP-MS data for GREP1 compared to IP-MS data for GFP. g, Gene Ontology Cellular Component analysis of proteins ≥2-fold enriched in GREP1 immunoprecipitation compared to GFP immunoprecipitations. h, IP-MS total peptide count for fibronectin shown for three separate experiments. i, Coomassie stain of V5 immunoprecipitation of V5-tagged GFP, GREP1 del_SLS or GREP1 constructs expressed in CAMA-1 cells following fractionation of cell lysate into cytoplasmic, membrane and cell media components. Results were repeated in two independent experiments. j, Western blot of endogenous fibronectin, E-cadherin, beta-actin and GAPDH in cell lysate or cell culture media for CAMA-1 cells expressing GFP, GREP1 del_SLS or GREP1 constructs as in panel i. Results were repeated in two independent experiments. k, IP-MS data showing the total peptide count for GREP1 and other top-scoring proteins following IP of V5-tagged GREP1 in HEK293T, ZR-75-1, and CAMA-1 cells. N = 4 independent IP-MS experiments. Lines represent median ± interquartile (25–75%) range.

Extended Data Fig. 9 GREP1 regulates GDF15 in vitro and correlates with GDF15 expression in patient tumor tissues.

a, Cytokine profiling in HEK293T cells with transient ectopic GREP1 or GFP overexpression, ZR-75-1 cells with stable GREP1 knockout, or HDQP1 cells with stable GREP1 knockout. The change in signal abundance was calculated for each control/GREP1 pair. To rank cytokines, the average of the absolute values for the individual signal changes was plotted. b, GDF15 abundance by ELISA in ZR-75-1 and CAMA-1 cells overexpressing a GREP1 or GFP cDNA plasmid. N = 3 technical replicates. N = 2 independent experiments performed, with representative results shown. c, Spearman’s rho for the correlation of GREP1 expression with GDF15, EMILIN2, or FN1 in the indicated TCGA datasets. d, P values for the Spearman correlation of GREP1 expression with GDF15, EMILIN2, or FN1 in the indicated TCGA datasets. e–g, Recombinant GDF15 partially rescues GREP1 knockout. CAMA-1, ZR-75-1 or T47D Cas9 cells were infected with the indicated sgRNAs. 24 hours after infection, cells were treated with vehicle control or increasing concentrations of recombinant human GDF15 as shown. Relative abundance was measured 7 days after infection. N = 5 for all conditions in panel e. N = 6 for all conditions in panel f. N = 5 for all conditions in panel g. All error bars represent standard deviation. Two independent experiments were performed for panels e–g.
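
The ranking rule described for panel a can be sketched as follows. The legend does not state whether the "change" is a difference or a ratio, so this example assumes a simple difference (perturbed minus control). The function name `rank_cytokines`, the placeholder cytokine `CYTOKINE_B` and all signal values are hypothetical.

```python
from statistics import mean


def rank_cytokines(signal_pairs):
    """Rank cytokines by the mean absolute signal change across cell models.

    signal_pairs maps each cytokine to a list of
    (control_signal, grep1_perturbed_signal) tuples, one per cell model.
    """
    scores = {}
    for cytokine, pairs in signal_pairs.items():
        # Assumption: change is the difference between perturbed and control signal.
        changes = [perturbed - control for control, perturbed in pairs]
        # Absolute values let overexpression (increase) and knockout (decrease)
        # models contribute to the score in the same direction.
        scores[cytokine] = mean([abs(change) for change in changes])
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical array signals for three control/GREP1 pairs per cytokine.
signals = {
    "GDF15": [(1.0, 2.5), (2.0, 0.8), (1.8, 0.9)],
    "CYTOKINE_B": [(1.0, 1.1), (1.0, 0.9), (1.2, 1.2)],
}

ranked = rank_cytokines(signals)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Taking absolute values before averaging is what allows an overexpression model and two knockout models, which shift a responsive cytokine in opposite directions, to be combined into one ranking.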

Supplementary information

Supplementary Information

Supplementary Figures 1–14, Supplementary Discussion and Supplementary References.

Reporting Summary

Supplementary Tables 1–17

This file contains Supplementary Tables 1–17, including additional information and source data for Fig. 1.

Supplementary Tables 18–20

This file contains Supplementary Tables 18–20, including additional information and source data for Fig. 2.

Supplementary Tables 21–32

This file contains Supplementary Tables 21–32, including additional information and source data for Fig. 3.

Supplementary Tables 33–38

This file contains Supplementary Tables 33–38, including additional information and source data for Fig. 4.

Source data

Source Data Fig. 1

Unprocessed immunoblot images used in Fig. 1d.

Source Data Fig. 3

Unprocessed immunoblot images used in Fig. 3d.

Source Data Fig. 4

Unprocessed Coomassie image used in Fig. 4i.

Source Data Fig. 4

Unprocessed immunoblot images used to generate cytokine data in Fig. 4k.

Source Data Fig. 3

A table, including the unprocessed sequencing read counts for each gRNA at each time point used in the primary CRISPR screen shown in Fig. 3.

Source Data Fig. 3

A table, including the unprocessed sequencing read counts for each gRNA at each time point used in the secondary CRISPR screen shown in Fig. 3.

Source Data Extended Data Fig. 8

Unprocessed native immunoblot images used in Extended Data Fig. 8d.

Source Data Extended Data Fig. 8

Unprocessed immunoblot images used in Extended Data Fig. 8j.

About this article

Cite this article

Prensner, J.R., Enache, O.M., Luria, V. et al. Noncanonical open reading frames encode functional proteins essential for cancer cell survival. Nat Biotechnol 39, 697–704 (2021). https://doi.org/10.1038/s41587-020-00806-2

Received: 18 February 2020
Accepted: 16 December 2020
Published: 28 January 2021
Issue Date: June 2021
DOI: https://doi.org/10.1038/s41587-020-00806-2
