Review Article

Role of non-coding sequence variants in cancer

Nature Reviews Genetics volume 17, pages 93–108 (2016)

Abstract

Patients with cancer carry somatic sequence variants in their tumours in addition to the germline variants in their inherited genome. Although variants in protein-coding regions have received the most attention, numerous studies have noted the importance of non-coding variants in cancer. Moreover, the overwhelming majority of variants, both somatic and germline, occur in non-coding portions of the genome. We review the current understanding of non-coding variants in cancer, including the great diversity of mutation types (from single nucleotide variants to large genomic rearrangements) and the wide range of mechanisms by which they affect gene expression to promote tumorigenesis, such as disrupting transcription factor-binding sites or the functions of non-coding RNAs. We highlight specific case studies of somatic and germline variants, and discuss how non-coding variants can be interpreted on a large scale through computational and experimental methods.

Key points

  • Germline and somatic sequence variants in non-coding regions can play an important role in cancer.

  • Many different modes of action of non-coding variants are known. For example, point mutations and complex genomic rearrangements can disrupt or create transcription factor-binding sites or affect non-coding RNA loci.

  • Oncogenesis involves an interplay between germline and somatic variants.

  • Drivers in non-coding regions can be identified using computational methods that analyse functional effects of variants and recurrence across multiple samples.

  • Functional effects of non-coding variants can be studied by various experimental approaches.

  • The overall role of non-coding variants in tumorigenesis is probably underestimated at present, because only a handful of genome-wide studies of tumours have analysed them. However, current and future efforts involving large-scale whole-genome sequencing of tumours are likely to shed more light on the importance of non-coding variants in cancer.
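The recurrence idea in the key points above can be illustrated with a minimal sketch. It assumes a uniform background mutation rate and independent samples, which is a deliberate simplification: mutation rates vary along the genome, and the methods discussed in the Review account for that heterogeneity. The function names `binomial_tail` and `recurrence_p_value`, and the example numbers, are hypothetical and are not taken from any published tool.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def recurrence_p_value(mutated_samples, total_samples, region_length, per_base_rate):
    """Probability of seeing at least `mutated_samples` tumours with a mutation
    in a region, under a uniform background rate (an assumption of this sketch)."""
    # Chance that one sample carries at least one background mutation in the region.
    p_hit = 1 - (1 - per_base_rate) ** region_length
    return binomial_tail(mutated_samples, total_samples, p_hit)


if __name__ == "__main__":
    # Hypothetical example: a 500 bp promoter mutated in 5 of 100 tumours,
    # with an assumed background rate of 1e-6 mutations per base per sample.
    p = recurrence_p_value(5, 100, 500, 1e-6)
    print(f"p-value under the uniform background model: {p:.3g}")
```

A small p-value here only says that the region is mutated more often than the simple background model predicts. Because real mutation rates depend on local sequence and chromatin context, a uniform model of this kind tends to produce false positives, so it is best read as an illustration of the principle rather than a usable driver test.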



Acknowledgements

F.D. would like to acknowledge grant IG 13562 from AIRC (Associazione Italiana per la Ricerca sul Cancro).

Author information

Affiliations

  1. Meyer Cancer Center, Weill Cornell Medical College, New York, New York 10065, USA.

    • Ekta Khurana & Mark A. Rubin

  2. Institute for Precision Medicine, Weill Cornell Medical College, New York, New York 10065, USA.

    • Ekta Khurana, Dimple Chakravarty, Francesca Demichelis & Mark A. Rubin

  3. Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York 10021, USA.

    • Ekta Khurana & Francesca Demichelis

  4. Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York 10065, USA.

    • Ekta Khurana

  5. Bina Technologies, Roche Sequencing, Redwood City, California 94065, USA.

    • Yao Fu

  6. Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, New York 10065, USA.

    • Dimple Chakravarty & Mark A. Rubin

  7. Centre for Integrative Biology, University of Trento, 38123 Trento, Italy.

    • Francesca Demichelis

  8. Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA.

    • Mark Gerstein

  9. Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA.

    • Mark Gerstein

  10. Department of Computer Science, Yale University, New Haven, Connecticut 06520, USA.

    • Mark Gerstein

Authors

  1. Ekta Khurana

  2. Yao Fu

  3. Dimple Chakravarty

  4. Francesca Demichelis

  5. Mark A. Rubin

  6. Mark Gerstein

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Ekta Khurana, Mark A. Rubin or Mark Gerstein.

Glossary

Exome sequencing

Sequencing the protein-coding portion of the genome using target-enrichment and high-throughput sequencing technology.

Driver mutations

Sequence variants that confer growth advantage to tumour cells.

Passenger mutations

Sequence variants that do not contribute to cancer growth.

Germline variants

Heritable variants that are transmitted to offspring. These variants are constitutional (that is, present in all cells of the body).

Genome-wide association studies

(GWASs). Studies that interrogate multiple common genetic variants along the genome in large cohorts of individuals to evaluate whether any variant is associated with a specific trait.

Single nucleotide variants

DNA sequence changes at single nucleotides.

Somatic variants

Variants that are not inherited from a parent and are not transmitted to offspring.

Penetrance

The proportion of individuals carrying an allele (or a genotype) that also express the trait (phenotype) associated with it.

Chromoplexy

(From the Greek pleko, meaning to weave, or to braid). A class of complex somatic DNA rearrangements whereby abundant DNA deletions and intra- and inter-chromosomal translocations that have originated in an interdependent way occur within a single cell cycle.

Chromothripsis

(From the Greek thripsis, meaning shattering into pieces). A clustered chromosomal rearrangement in confined genomic regions that results from a single catastrophic event, usually limited to one chromosome.

Kataegis

(From the Greek kataigis, meaning thunder). A phenomenon that is characterized by large clusters of mutations (hypermutation) in the genome of cancer cells. An APOBEC family enzyme might be responsible for the kataegis process.

Cis-regulatory regions

Regions that regulate the expression of genes on the same DNA molecule. These include promoters, enhancers, silencers, insulators and untranslated regions.

Enhancers

Distal cis-regulatory regions bound by transcription factors that activate genes by helping the recruitment of RNA polymerase to the promoters.

Silencers

Distal cis-regulatory regions bound by transcription factors that repress gene expression by preventing RNA polymerase from binding to the gene promoter.

Insulators

Regions that block the interaction between enhancers and promoters.

DNase I footprinting

A method to detect the exact binding sites of DNA-binding proteins based on the fact that a protein bound to DNA protects it from cleavage by DNase I.

Chromosome conformation capture

(3C). A biochemical method whereby the three-dimensional organization of chromatin in living cells is fixed and analysed.

Expression quantitative trait loci

(eQTLs). Loci at which DNA sequence variants are associated with the expression levels of mRNAs.

Endo-siRNAs

Endogenously produced small interfering RNAs that regulate gene expression by binding and cleaving mRNA targets or mediating heterochromatin formation.

Negative selection

Selective pressure that results in the removal of deleterious alleles.

Single nucleotide polymorphisms

(SNPs). Single nucleotide variants that show variability in the human population. As used in the context of this Review, they may be common (with high allele frequency) or rare (with low allele frequency).

Oncogene

A gene that is often upregulated in cancer and can lead to or promote cancer growth.

Burden tests

Statistical methods to test the cumulative effect of multiple variants in a genomic region.

Positive selection

Directed selection that forces the allele frequency of advantageous mutations to increase.

Minigene assays

Assays using a plasmid that carries the minimal gene fragment necessary for the gene to be expressed. The fragment can include exons as well as introns, and it serves as a tool for evaluating splicing patterns.

Precision medicine

Medical care tailored to the individual patient, usually using the patient's genomic sequence.
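As an illustration of the burden tests defined in the glossary above, the following is a minimal sketch of one simple variant, a collapsing test: each individual is reduced to carrier or non-carrier status for any variant in a region, and carrier counts in cases and controls are compared with a one-sided Fisher's exact test. This is only one possible burden statistic and is not necessarily the one used in the studies the Review discusses. The function names and the toy genotype data are hypothetical.

```python
from math import comb


def carries_any(genotypes):
    """True if the individual carries at least one variant allele in the region."""
    return any(g > 0 for g in genotypes)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test on a 2x2 table.

    a = case carriers, b = case non-carriers,
    c = control carriers, d = control non-carriers.
    Returns P(at least `a` carriers among cases) under the hypergeometric null.
    """
    n_cases = a + b
    n_carriers = a + c
    total = a + b + c + d
    upper = min(n_cases, n_carriers)
    numerator = sum(
        comb(n_carriers, x) * comb(total - n_carriers, n_cases - x)
        for x in range(a, upper + 1)
    )
    return numerator / comb(total, n_cases)


def burden_test(case_genotypes, control_genotypes):
    """Collapse the variants in a region per individual, then test for enrichment in cases."""
    a = sum(carries_any(g) for g in case_genotypes)
    c = sum(carries_any(g) for g in control_genotypes)
    b = len(case_genotypes) - a
    d = len(control_genotypes) - c
    return fisher_one_sided(a, b, c, d)


if __name__ == "__main__":
    # Toy data: rows are individuals, columns are variant sites in one region.
    cases = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
    controls = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
    print(burden_test(cases, controls))
```

Collapsing to carrier status lets several individually rare variants contribute to a single test, at the cost of treating all variants in the region as equally relevant and acting in the same direction.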

About this article

DOI

https://doi.org/10.1038/nrg.2015.17
