Integrative classification of human coding and noncoding genes through RNA metabolism profiles

Abstract

Pervasive transcription of the human genome results in a heterogeneous mix of coding RNAs and long noncoding RNAs (lncRNAs). Only a small fraction of lncRNAs have demonstrated regulatory functions, thus making functional lncRNAs difficult to distinguish from nonfunctional transcriptional byproducts. This difficulty has resulted in numerous competing human lncRNA classifications that are complicated by a steady increase in the number of annotated lncRNAs. To address these challenges, we quantitatively examined transcription, splicing, degradation, localization and translation for coding and noncoding human genes. We observed that annotated lncRNAs had lower synthesis and higher degradation rates than mRNAs and discovered mechanistic differences explaining slower lncRNA splicing. We grouped genes into classes with similar RNA metabolism profiles, containing both mRNAs and lncRNAs to varying extents. These classes exhibited distinct RNA metabolism, different evolutionary patterns and differential sensitivity to cellular RNA-regulatory pathways. Our classification provides an alternative to genomic context-driven annotations of lncRNAs.
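The link between synthesis rate, degradation rate and RNA abundance can be illustrated with a minimal sketch. It assumes simple first-order kinetics (constant synthesis, degradation proportional to abundance), which is a textbook simplification and not necessarily the model fitted in the paper. The function names and the example rates are hypothetical and in arbitrary units.

```python
import math


def steady_state_abundance(synthesis_rate, degradation_rate):
    """Steady state of dM/dt = s - d * M, i.e. M = s / d (assumed model)."""
    if degradation_rate <= 0:
        raise ValueError("degradation_rate must be positive")
    return synthesis_rate / degradation_rate


def half_life(degradation_rate):
    """Half-life implied by a first-order degradation rate constant."""
    if degradation_rate <= 0:
        raise ValueError("degradation_rate must be positive")
    return math.log(2) / degradation_rate


# Hypothetical rates: a gene with high synthesis and slow decay versus
# a gene with low synthesis and fast decay.
stable_gene = steady_state_abundance(synthesis_rate=10.0, degradation_rate=0.1)
unstable_gene = steady_state_abundance(synthesis_rate=2.0, degradation_rate=0.5)
```

Under this assumed model, lower synthesis and higher degradation both reduce steady-state abundance, so the two effects compound.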

Figure 1: Progressive metabolic labeling of RNA.
Figure 2: Dynamics of intron excision.
Figure 3: RNA metabolism of mRNA and lncRNA.
Figure 4: Classification of genes according to RNA metabolism profiles.
Figure 5: Evolutionary and regulatory differences among RNA classes.
Figure 6: Distinct behavior of lncRNA classes.

Accession codes

Primary accessions

Gene Expression Omnibus

Acknowledgements

U.O. acknowledges support from an award from the US National Institutes of Health (R01-GM104962) and the Simons Institute for the Theory of Computing at UC Berkeley, where he was a long-term visitor in the Algorithmic Challenges in Genomics Program in the spring of 2015. N.M. acknowledges support from EU Marie Curie IIF.

Author information

Contributions

N.M. and U.O. conceived the project; N.M. and U.O. developed the methodology; N.M., L.C. and S.d.P. developed software and performed formal analysis; N.M. and A.H. conducted the investigation; N.M. conducted the visualization; N.M. and U.O. wrote the original draft; L.C., S.d.P. and M.P. reviewed and edited the paper; N.M. and U.O. acquired funding; N.M. and U.O. provided resources; N.M. and U.O. supervised the project.

Corresponding authors

Correspondence to Neelanjan Mukherjee or Uwe Ohler.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 ERCC fit and table.

The fit between the number of expected ERCC molecules and the observed TPM measurement for total RNA depleted of rRNA with (a) Ribo-Zero or (b) RNase H with oligos targeting rRNA, and (c) RNA that underwent one round of poly(A) selection. (d) Median gene expression (TPM) of 101 tissues/cell lines from strand-specific paired-end RNA-seq generated by ENCODE. CPC distribution for the high-expression population of (e) intronless and (f) multiexonic genes in HEK293 cells. (g) Boxplot of the distribution of primary RNA fraction for different data types for genes with mature RNA TPM > 1 in total RNA (n = 12,033 genes). Coverage depth compared to coverage breadth of 4sU and GRO-seq data for (h) the genome, (i) introns of coding genes, (j) intergenic enhancers, (k) exons of coding genes, (l) exons of lncRNA, and (m) introns of lncRNA.
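A spike-in fit of this kind can be sketched as an ordinary least-squares line relating expected molecule counts to observed TPM. The caption does not state the functional form, so the log10-log10 scale used here is an assumption, and the function names and example values are hypothetical.

```python
import math


def fit_loglog(expected_molecules, observed_tpm):
    """Least-squares fit of log10(TPM) = slope * log10(molecules) + intercept.

    Pairs with a non-positive value are skipped because their log is undefined.
    """
    pairs = [(math.log10(x), math.log10(y))
             for x, y in zip(expected_molecules, observed_tpm)
             if x > 0 and y > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two positive (molecules, TPM) pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0:
        raise ValueError("expected molecule counts must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def tpm_to_molecules(tpm, slope, intercept):
    """Invert the fitted line to estimate a molecule count from a TPM value."""
    return 10 ** ((math.log10(tpm) - intercept) / slope)


# Hypothetical spike-in series in which TPM is exactly twice the molecule count.
slope, intercept = fit_loglog([1, 10, 100, 1000], [2, 20, 200, 2000])
```

Inverting the fitted line, as in `tpm_to_molecules`, is one way such a calibration could be used to place endogenous transcripts on an approximate molecule scale.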

Supplementary Figure 2 Splicing metrics, features and alternative models.

(a) Description of θ. (b) Features utilized in splicing models: physical features (orange), canonical splicing signals (blue) and splicing regulatory element density (blue). For details regarding calculation of splice site strength, polypyrimidine tract score, branchpoint score and exonic splicing enhancers and silencers, see Supplemental Experimental Procedures. (c) Violin plot of the θ calculated for introns of coding genes, lncRNA, mirtrons and snoRNA host introns. (d) The average r-squared for all regression models generated for each labeling time point and intron category. (e) The Spearman correlation coefficient for each feature with θ for different feature categories and intron types. (f) The variable importance for different feature categories and intron types. The average NET-seq signal ±25 nucleotides from the 5' splice site for (g) total RNA polymerase II, (h) unphosphorylated RNA polymerase II, and (i) Ser2P RNA polymerase II.
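The Spearman correlation in panel (e) is the Pearson correlation of ranks, with tied values given their average rank. The following is a minimal standard-library sketch of that definition, not the authors' code; the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def spearman(feature, theta):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(feature), average_ranks(theta))
```

Because it depends only on ranks, this measure captures any monotonic relationship between a feature and θ, not just a linear one.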

Supplementary Figure 3 Comparison of inferred rates.

Boxplot of the (a) synthesis, (b) processing, and (c) degradation rates. (d) The Pearson correlation between rates derived from all timepoints. (e) The distribution of polysomal versus cytosolic ratio for coding genes, lncRNAs, and pseudogenes. (f) The distribution of synthesis rates for polyribosomal lncRNAs divided into groups based on the presence of a translated ORF.

Supplementary Figure 4 Characterizing class behavior.

(a) Optimal cluster number estimation by gap statistic. (b) Clustering GO enrichment for protein-coding genes and fold-enrichment of unclassified genes (grey). (c) Steady-state HEK293 expression distribution for each class. (d) Tissue specificity score distribution for each class and genes with insufficient metabolic data in HEK293 cells. (e) Nuc/Cyt localization in mouse liver RNA-seq. (f) Odds-ratio for enrichment of the "core" and "missing" proteome.

Supplementary Figure 5 Regulatory and fitness differences.

(a) Log2 fold change of RBP perturbation relative to control for K562 ENCODE data. (b) Boxplot of cytoplasmic vs nuclear localization for genes grouped by origination class. A line is depicted connecting the means for each class (point).

Supplementary Figure 6 Characterization of lncRNAs in classes.

(a) The odds-ratio of the overlap between lncRNAs that were either found ("yes") or not found ("no") in lncRNAdb for each lncRNA biotype. The numbers represent the gene count in each category. The fraction of nucleotides in a particular class with a fitCons score > S calculated for (b) coding exons and (c) 3' UTR exons of protein-coding genes and (d) all exons of lncRNA genes defined by GENCODE V19. The signal in the “sense-intronic” category, which comprises genes located in introns of protein-coding genes, may be due to higher than background signal within introns of coding genes. (e-g) Coverage of 4sU (blue), total (green), cytoplasmic (red) and nuclear (cyan) RNA profiles for example lncRNAs from c6, c5, and c7, respectively. (h) Comparison of median values for each RNA metabolism feature by each class for coding genes (left) and lncRNAs (right).
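An odds ratio of the kind reported in panel (a) can be computed from a 2×2 table of gene counts. This is a minimal sketch of the standard definition; the function name, argument names and counts are hypothetical, and the caption does not say how zero cells or significance were handled.

```python
def odds_ratio(in_class_in_set, in_class_not_set,
               out_class_in_set, out_class_not_set):
    """Odds ratio (a * d) / (b * c) for a 2x2 contingency table.

    a = genes in the class and in the gene set (e.g. listed in a database)
    b = genes in the class but not in the set
    c = genes outside the class but in the set
    d = genes outside the class and not in the set
    """
    a, b, c, d = (in_class_in_set, in_class_not_set,
                  out_class_in_set, out_class_not_set)
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Hypothetical counts: 20 of 100 genes in a class are in the set,
# compared with 10 of 100 genes outside the class.
example = odds_ratio(20, 80, 10, 90)
```

A value above 1 indicates that genes in the class are over-represented in the set relative to genes outside it; a value below 1 indicates depletion.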

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–6 and Supplementary Note (PDF 1805 kb)

Cite this article

Mukherjee, N., Calviello, L., Hirsekorn, A. et al. Integrative classification of human coding and noncoding genes through RNA metabolism profiles. Nat Struct Mol Biol 24, 86–96 (2017). https://doi.org/10.1038/nsmb.3325
