Analysis

The landscape of long noncoding RNAs in the human transcriptome

Abstract

Long noncoding RNAs (lncRNAs) are emerging as important regulators of tissue physiology and disease processes including cancer. To delineate genome-wide lncRNA expression, we curated 7,256 RNA sequencing (RNA-seq) libraries from tumors, normal tissues and cell lines comprising over 43 Tb of sequence from 25 independent studies. We applied ab initio assembly methodology to this data set, yielding a consensus human transcriptome of 91,013 expressed genes. Over 68% (58,648) of genes were classified as lncRNAs, of which 79% were previously unannotated. About 1% (597) of the lncRNAs harbored ultraconserved elements, and 7% (3,900) overlapped disease-associated SNPs. To prioritize lineage-specific, disease-associated lncRNA expression, we employed non-parametric differential expression testing and nominated 7,942 lineage- or cancer-associated lncRNA genes. The lncRNA landscape characterized here may shed light on normal biology and cancer pathogenesis and may be valuable for future biomarker development.
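
As an illustration of what non-parametric differential expression testing can look like, the following minimal sketch ranks the expression values of one gene across two sample groups and computes a Mann-Whitney rank-sum statistic with a normal approximation. This is an assumption made for illustration only: the abstract does not name the specific test, and the sketch does not reproduce the SSEA method credited in the Contributions section. The function names and the example expression values are hypothetical.

```python
from math import erfc, sqrt


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Mann-Whitney U for group_a versus group_b.

    Uses the normal approximation without a tie correction in the
    variance, so the p-value is only approximate for small samples
    or heavily tied data.
    """
    n_a, n_b = len(group_a), len(group_b)
    ranks = rank_with_ties(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_two_sided = erfc(abs(z) / sqrt(2))
    return u_a, z, p_two_sided


if __name__ == "__main__":
    # Hypothetical expression values for one lncRNA gene in two cohorts.
    tumor = [10.0, 12.0, 14.0, 16.0, 18.0]
    normal = [1.0, 2.0, 3.0, 4.0, 5.0]
    u, z, p = rank_sum_test(tumor, normal)
    print(f"U={u}, z={z:.2f}, two-sided p={p:.4f}")
```

Because the statistic depends only on ranks, it makes no assumption about the distribution of expression values, which is the property that motivates non-parametric testing on heterogeneous RNA-seq cohorts.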

Acknowledgements

We thank B. Palen and J. Hallum for technical assistance with the high-performance computing cluster, S. Roychowdhury for reviewing the manuscript, the University of Michigan DNA Sequencing Core for Sanger sequencing and K. Giles for critically reading the manuscript and for the submission of documents. This work was supported in part by US National Institutes of Health Prostate Specialized Program of Research Excellence grant P50 CA69568, Early Detection Research Network grant U01 CA111275, US National Institutes of Health grants R01 CA132874 and R01 CA154365 (D.G.B. and A.M.C.), and US Department of Defense grant PC100171 (A.M.C.). A.M.C. is supported by the Prostate Cancer Foundation and the Howard Hughes Medical Institute. A.M.C. is an American Cancer Society Research Professor and a Taubman Scholar of the University of Michigan. R.M. was supported by a Prostate Cancer Foundation Young Investigator Award and by US Department of Defense Post-Doctoral Fellowship W81XWH-13-1-0284. Y.S.N. is supported by a University of Michigan Cellular and Molecular Biology National Research Service Award Institutional Predoctoral Training Grant.

Author information

Author notes

    • Matthew K Iyer & Yashar S Niknafs

    These authors contributed equally to this work.

Affiliations

  1. Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, Michigan, USA.

    • Matthew K Iyer, Yashar S Niknafs, Rohit Malik, Udit Singhal, Anirban Sahu, Yasuyuki Hosono, Terrence R Barrette, John R Prensner, Joseph R Evans, Shuang Zhao, Anton Poliakov, Xuhong Cao, Saravana M Dhanasekaran, Yi-Mi Wu, Dan R Robinson, Felix Y Feng & Arul M Chinnaiyan
  2. Department of Computational Medicine and Bioinformatics, Ann Arbor, Michigan, USA.

    • Matthew K Iyer & Arul M Chinnaiyan
  3. Department of Cellular and Molecular Biology, University of Michigan, Ann Arbor, Michigan, USA.

    • Yashar S Niknafs
  4. Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA.

    • Rohit Malik, Anirban Sahu, Saravana M Dhanasekaran & Arul M Chinnaiyan
  5. Howard Hughes Medical Institute, University of Michigan, Ann Arbor, Michigan, USA.

    • Udit Singhal, Xuhong Cao & Arul M Chinnaiyan
  6. Department of Radiation Oncology, University of Michigan, Ann Arbor, Michigan, USA.

    • Joseph R Evans, Shuang Zhao, David G Beer & Felix Y Feng
  7. Section of Thoracic Surgery, Department of Surgery, University of Michigan, Ann Arbor, Michigan, USA.

    • David G Beer
  8. Comprehensive Cancer Center, University of Michigan, Ann Arbor, Michigan, USA.

    • Felix Y Feng & Arul M Chinnaiyan
  9. Department of Statistics, Colorado State University, Fort Collins, Colorado, USA.

    • Hariharan K Iyer
  10. Department of Urology, University of Michigan, Ann Arbor, Michigan, USA.

    • Arul M Chinnaiyan

Contributions

M.K.I., Y.S.N. and A.M.C. conceived the study and analyses. M.K.I. processed RNA-seq data and performed ab initio assembly. M.K.I. and Y.S.N. performed data processing and data analysis with assistance from T.R.B., R.M., A.S., Y.H., J.R.E., S.Z., J.R.P. and F.Y.F. R.M., U.S., A.S. and Y.H. performed quantitative PCR validations. M.K.I. and Y.S.N. developed SSEA with the help of H.K.I. D.G.B. contributed primary samples. D.R.R., Y.-M.W. and S.M.D. generated RNA-seq libraries, and X.C. performed the sequencing. M.K.I., Y.S.N. and A.S. developed the web resource. T.R.B. provided systems administration, data storage, high-performance computing and networking support. A.P. performed the proteomics analysis. M.K.I., Y.S.N. and A.M.C. wrote the manuscript. All authors discussed results and commented on the manuscript.

Competing interests

Oncomine is supported by ThermoFisher, Inc. (previously Life Technologies and Compendia Biosciences). A.M.C. was a co-founder of Compendia Biosciences and served on the scientific advisory board of Life Technologies before it was acquired. The University of Michigan has filed a patent application for the use of a subset of the lncRNAs described in this study as biomarkers of cancer.

Corresponding author

Correspondence to Arul M Chinnaiyan.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1-12 and Supplementary Note.

Excel files

  1. Supplementary Tables 1-9, 11, 12, 14 and 15

  2. Supplementary Table 10

    Specific details for lncRNA discoveries.

  3. Supplementary Table 13

    GSEA results.