We developed cis-X, a computational method for discovering regulatory noncoding variants in cancer by integrating whole-genome and transcriptome sequencing data from a single cancer sample. cis-X first finds aberrantly cis-activated genes that exhibit allele-specific expression accompanied by an elevated outlier expression. It then searches for causal noncoding variants that may introduce aberrant transcription factor binding motifs or enhancer hijacking by structural variations. Analysis of 13 T-lineage acute lymphoblastic leukemias identified a recurrent intronic variant predicted to cis-activate the TAL1 oncogene, a finding validated in vivo by chromatin immunoprecipitation sequencing of a patient-derived xenograft. Candidate oncogenes include the prolactin receptor PRLR activated by a focal deletion that removes a CTCF-insulated neighborhood boundary. cis-X may be applied to pediatric and adult solid tumors that are aneuploid and heterogeneous. In contrast to existing approaches, which require large sample cohorts, cis-X enables the discovery of regulatory noncoding variants in individual cancer genomes.
WGS and RNA-seq data for the SCMC cohort analyzed in this study can be accessed from the Genome Sequence Archive for Human under the National Genomics Data Center of China (http://bigd.big.ac.cn/gsa-human), under accession nos. HRA000097 and HRA000096 for WGS and RNA-seq, respectively. The data are publicly available to users following a standard access application process for human genomic and associated phenotypic data. The ChIP–seq data generated in this study can be accessed from the Gene Expression Omnibus under accession nos. GSE113565 and GSE145549, for H3K27Ac and YY1, respectively, with the called peaks (in BED format) available upon request. Whole-exome sequencing and RNA-seq data for the TARGET T-ALL and NBL cohorts have been deposited in the database of Genotypes and Phenotypes (http://www.ncbi.nlm.nih.gov/gap) as part of previous projects under accession nos. phs000464 and phs000467, respectively. The WGS and RNA-seq data for the TCGA melanoma samples were downloaded from the Genomic Data Commons data portal (https://portal.gdc.cancer.gov/legacy-archive/search/f). The complete list of somatic variant calls for the 13 T-ALLs used as the input of the cis-X analysis presented in the manuscript can be accessed from our research laboratory page at http://www.stjuderesearch.org/site/lab/zhang/cis-x. Source data are provided with this paper.
The cis-X package, together with detailed instructions and demo data, is available at https://www.stjuderesearch.org/site/lab/zhang/cis-x, https://platform.stjude.cloud/workflows/cis-x and https://github.com/stjude/cis-x. In addition to the source code, we have provided a Dockerfile along with the package to run cis-X in a container via Docker, to minimize the difficulty of running cis-X on different computing platforms.
Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
Khurana, E. et al. Role of non-coding sequence variants in cancer. Nat. Rev. Genet. 17, 93–108 (2016).
Hnisz, D. et al. Activation of proto-oncogenes by disruption of chromosome neighborhoods. Science 351, 1454–1458 (2016).
Weischenfeldt, J. et al. Pan-cancer analysis of somatic copy-number alterations implicates IRS4 and IGF2 in enhancer hijacking. Nat. Genet. 49, 65–74 (2017).
Northcott, P. A. et al. Enhancer hijacking activates GFI1 family oncogenes in medulloblastoma. Nature 511, 428–434 (2014).
Zhang, J. et al. Deregulation of DUX4 and ERG in acute lymphoblastic leukemia. Nat. Genet. 48, 1481–1489 (2016).
Zhang, X. et al. Identification of focally amplified lineage-specific super-enhancers in human epithelial cancers. Nat. Genet. 48, 176–182 (2016).
Mansour, M. R. et al. Oncogene regulation. An oncogenic super-enhancer formed through somatic mutation of a noncoding intergenic element. Science 346, 1373–1377 (2014).
Huang, F. W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013).
Horn, S. et al. TERT promoter mutations in familial and sporadic melanoma. Science 339, 959–961 (2013).
Rheinbay, E. et al. Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature 578, 102–111 (2020).
Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).
Fredriksson, N. J., Ny, L., Nilsson, J. A. & Larsson, E. Systematic analysis of noncoding somatic mutations and gene expression alterations across 14 tumor types. Nat. Genet. 46, 1258–1263 (2014).
Weinhold, N., Jacobsen, A., Schultz, N., Sander, C. & Lee, W. Genome-wide analysis of noncoding regulatory mutations in cancer. Nat. Genet. 46, 1160–1165 (2014).
Melton, C., Reuter, J. A., Spacek, D. V. & Snyder, M. Recurrent somatic mutations in regulatory regions of human cancer genomes. Nat. Genet. 47, 710–716 (2015).
Kim, K. et al. Chromatin structure-based prediction of recurrent noncoding mutations in cancer. Nat. Genet. 48, 1321–1326 (2016).
Ma, X. et al. Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature 555, 371–376 (2018).
Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
Belver, L. & Ferrando, A. The genetics and mechanisms of T cell acute lymphoblastic leukaemia. Nat. Rev. Cancer 16, 494–507 (2016).
Liu, Y. et al. The genomic landscape of pediatric and young adult T-lineage acute lymphoblastic leukemia. Nat. Genet. 49, 1211–1218 (2017).
Li, Z. et al. APOBEC signature mutation generates an oncogenic enhancer that drives LMO1 expression in T-ALL. Leukemia 31, 2057–2064 (2017).
Hu, S. et al. Whole-genome noncoding sequence analysis in T-cell acute lymphoblastic leukemia identifies oncogene enhancer mutations. Blood 129, 3264–3268 (2017).
Abraham, B. J. et al. Small genomic insertions form enhancers that misregulate oncogenes. Nat. Commun. 8, 14385 (2017).
Rahman, S. et al. Activation of the LMO2 oncogene through a somatically acquired neomorphic promoter in T-cell acute lymphoblastic leukemia. Blood 129, 3221–3226 (2017).
Mayba, O. et al. MBASED: allele-specific expression detection in cancer tissues and cell lines. Genome Biol. 15, 405 (2014).
Pawlikowska, I. et al. The most informative spacing test effectively discovers biologically relevant outliers or multiple modes in expression. Bioinformatics 30, 1400–1408 (2014).
Simonis, M. et al. High-resolution identification of balanced and complex chromosomal rearrangements by 4C technology. Nat. Methods 6, 837–842 (2009).
Weintraub, A. S. et al. YY1 is a structural regulator of enhancer-promoter loops. Cell 171, 1573–1588.e28 (2017).
Ali, S. & Ali, S. Prolactin receptor regulates Stat5 tyrosine phosphorylation and nuclear translocation by two separate pathways. J. Biol. Chem. 273, 7709–7716 (1998).
Goffin, V. Prolactin receptor targeting in breast and prostate cancers: New insights into an old challenge. Pharmacol. Ther. 179, 111–126 (2017).
Pugh, T. J. et al. The genetic landscape of high-risk neuroblastoma. Nat. Genet. 45, 279–284 (2013).
Peifer, M. et al. Telomerase activation by genomic rearrangements in high-risk neuroblastoma. Nature 526, 700–704 (2015).
Valentijn, L. J. et al. TERT rearrangements are frequent in neuroblastoma and identify aggressive tumors. Nat. Genet. 47, 1411–1414 (2015).
Davis, C. F. et al. The somatic genomic landscape of chromophobe renal cell carcinoma. Cancer Cell 26, 319–330 (2014).
Zhang, Y. et al. High-coverage whole-genome analysis of 1220 cancers reveals hundreds of genes deregulated by rearrangement-mediated cis-regulatory alterations. Nat. Commun. 11, 736 (2020).
Akbani, R. et al. Genomic classification of cutaneous melanoma. Cell 161, 1681–1696 (2015).
Strub, T. et al. SIRT6 haploinsufficiency induces BRAFV600E melanoma cell resistance to MAPK inhibitors via IGF signalling. Nat. Commun. 9, 3440 (2018).
Zhou, B. et al. INO80 governs superenhancer-mediated oncogenic transcription and tumor growth in melanoma. Genes Dev. 30, 1440–1453 (2016).
Fontanals-Cirera, B. et al. Harnessing BET inhibitor sensitivity reveals AMIGO2 as a melanoma survival gene. Mol. Cell 68, 731–744.e9 (2017).
Kaufman, C. K. et al. A zebrafish melanoma model reveals emergence of neural crest identity during melanoma initiation. Science 351, aad2197 (2016).
Lin, A. W. & Lowe, S. W. Oncogenic ras activates the ARF-p53 pathway to suppress epithelial cell transformation. Proc. Natl Acad. Sci. USA 98, 5025–5030 (2001).
Kamijo, T. et al. Tumor suppression at the mouse INK4a locus mediated by the alternative reading frame product p19 ARF. Cell 91, 649–659 (1997).
Zhang, Y. et al. A cis-element within the ARF locus mediates repression of p16 INK4A expression via long-range chromatin interactions. Proc. Natl Acad. Sci. USA 116, 26644–26652 (2019).
Zhang, B. & Peng, Z. Defective folding of mutant p16INK4 proteins encoded by tumor-derived alleles. J. Biol. Chem. 271, 28734–28737 (1996).
Walker, G. J., Gabrielli, B. G., Castellano, M. & Hayward, N. K. Functional reassessment of P16 variants using a transfection-based assay. Int. J. Cancer 82, 305–312 (1999).
Yu, M. & Ren, B. The three-dimensional organization of mammalian genomes. Annu. Rev. Cell Dev. Biol. 33, 265–289 (2017).
Monk, M. & Holding, C. Human embryonic genes re-expressed in cancer cells. Oncogene 20, 8085–8091 (2001).
Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Hidalgo, M. et al. Patient-derived xenograft models: an emerging platform for translational cancer research. Cancer Discov. 4, 998–1013 (2014).
Forbes, S. A. et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 45, D777–D783 (2017).
Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).
Kulakovskiy, I. V. et al. HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models. Nucleic Acids Res. 44, D116–D125 (2016).
Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Edmonson, M. N. et al. Bambino: a variant detector and alignment viewer for next-generation sequencing data in the SAM/BAM format. Bioinformatics 27, 865–866 (2011).
Chen, X. et al. CONSERTING: integrating copy-number analysis with structural-variation detection. Nat. Methods 12, 527–530 (2015).
Wang, J. et al. CREST maps somatic structural variation in cancer genomes with base-pair resolution. Nat. Methods 8, 652–654 (2011).
MacDonald, J. R., Ziman, R., Yuen, R. K., Feuk, L. & Scherer, S. W. The Database of Genomic Variants: a curated collection of structural variation in the human genome. Nucleic Acids Res. 42, D986–D992 (2014).
Geoffroy, V. et al. AnnotSV: an integrated tool for structural variations annotation. Bioinformatics 34, 3572–3574 (2018).
Parker, M. et al. C11orf95-RELA fusions drive oncogenic NF-κB signalling in ependymoma. Nature 506, 451–455 (2014).
Anders, S., Pyl, P. T. & Huber, W. HTSeq: a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).
Zhang, X.-L. et al. Integrative epigenomic analysis reveals unique epigenetic signatures involved in unipotency of mouse female germline stem cells. Genome Biol. 17, 162 (2016).
Kharchenko, P. V., Tolstorukov, M. Y. & Park, P. J. Design and analysis of ChIP–seq experiments for DNA-binding proteins. Nat. Biotechnol. 26, 1351–1359 (2008).
Zhang, Y. et al. Model-based analysis of ChIP–Seq (MACS). Genome Biol. 9, R137 (2008).
Cheng, Y. et al. Principles of regulatory information conservation between mouse and human. Nature 515, 371–375 (2014).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
This work was funded in part by National Institutes of Health grant nos. 1R35 CA210064-01 (to A.T.L.) and 1R01 CA216391-01A1 (to J.Z.), and Cancer Center Support Grant no. P30 CA021765 from the National Cancer Institute and the American Lebanese Syrian Associated Charities of St. Jude Children’s Research Hospital. We thank B. Abraham, M. Zimmerman, A. Durbin, D. Wheeler and D. Flasch for critically reviewing the manuscript, and C. Sherr for providing the literature relevant to p16 activation.
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cumulative distribution of transcription imbalance under the binomial transcription model (dotted line), the beta-binomial model as implemented in MBASED (solid line), the balanced transcription model (dashed line) and experimentally observed data (dots). Different RNA-seq coverages (N = 10, 50 and 100) are shown separately.
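As an illustration of the simplest of the models compared above, the following is a minimal sketch of an exact two-sided binomial test for allelic imbalance at a single heterozygous marker, evaluated at the three coverages named in the legend. It is not the cis-X or MBASED implementation: the function name `binomial_two_sided_p`, the null allele fraction of 0.5 and the example 80% allelic fraction are assumptions made for illustration only.

```python
from math import comb


def binomial_two_sided_p(alt_reads: int, total_reads: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for an allelic read count.

    Sums the probability of every outcome that is no more likely than the
    observed one under the null allele fraction `p` (hypothetical default:
    0.5, i.e. balanced transcription of the two alleles).
    """
    if total_reads <= 0 or not 0 <= alt_reads <= total_reads:
        raise ValueError("need 0 <= alt_reads <= total_reads and total_reads > 0")
    pmf = [
        comb(total_reads, k) * p**k * (1 - p) ** (total_reads - k)
        for k in range(total_reads + 1)
    ]
    observed = pmf[alt_reads]
    # Small relative tolerance so that ties are counted despite rounding.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


if __name__ == "__main__":
    # Same 80% allelic fraction at the three coverages shown in the legend.
    for coverage in (10, 50, 100):
        alt = round(0.8 * coverage)
        print(coverage, alt, binomial_two_sided_p(alt, coverage))
```

The same allelic fraction yields a smaller p-value as coverage grows, which is why the legend shows the coverages separately. A beta-binomial model additionally allows for overdispersion of read counts, which this sketch does not attempt.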
Each panel represents a simulation of allelic imbalance ranging from 1:1 (no allele-specific expression) to 10000:1 (complete mono-allelic expression). The percentage of simulations identified as allele-specific expression, out of a group of 2,000 simulations, is shown on the y-axis, with each panel showing the results for a different ratio of imbalanced transcription between the two alleles. The 1:1 ratio, shown at the top, represents the false-positive rate, while the plots in the other rows represent the false-negative rates of detecting transcription imbalance at the various allelic ratios. Coverage for the markers in RNA-seq is shown on the x-axis. Each column, labeled by a distinct color, represents a distinct ploidy group (that is, copy number alterations), while the shape of each plot symbol represents the number of markers within a gene used for assessing allele-specific expression.
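The kind of simulation summarized in this legend can be sketched as follows, under strong simplifying assumptions: a single heterozygous marker, a diploid locus, reads drawn from a plain binomial distribution and an exact binomial test at a significance threshold of 0.05. None of these choices is taken from the cis-X code; the function names, the threshold and the random seed are hypothetical, and ploidy groups and multi-marker genes are not modeled.

```python
import random
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value under balanced (0.5) transcription."""
    pmf = [comb(n, i) * 0.5**n for i in range(n + 1)]
    return min(1.0, sum(x for x in pmf if x <= pmf[k] * (1 + 1e-9)))


def detection_rate(ratio: float, coverage: int, n_sim: int = 2000,
                   alpha: float = 0.05, seed: int = 0) -> float:
    """Fraction of simulated markers called as allele-specific expression.

    `ratio` is the transcription ratio between the two alleles (ratio:1).
    `alpha` and `seed` are illustrative choices, not published parameters.
    """
    rng = random.Random(seed)
    p_major = ratio / (ratio + 1.0)
    # One p-value per possible read count, computed once and reused.
    pvals = [binomial_two_sided_p(k, coverage) for k in range(coverage + 1)]
    calls = 0
    for _ in range(n_sim):
        # Draw the number of reads supporting the more highly expressed allele.
        k = sum(rng.random() < p_major for _ in range(coverage))
        if pvals[k] < alpha:
            calls += 1
    return calls / n_sim


if __name__ == "__main__":
    for ratio in (1, 2, 10000):
        for coverage in (10, 50, 100):
            print(ratio, coverage, detection_rate(ratio, coverage))
```

At a ratio of 1:1 the returned fraction estimates the false-positive rate; at larger ratios it is the fraction of truly imbalanced markers that are detected, which rises with coverage.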
Workflow for constructing the gene-specific reference expression matrix.
(a) Allele-specific expression of LMO3 in T-ALL SJTALL013797_D1. Eight heterozygous variants are present in the LMO3 locus in this tumor, with the B-allele fractions from WGS and RNA-seq plotted on top of the wiggle plot. (b) Outlier high expression of LMO3 was observed in this sample compared to the NCI TARGET T-ALL cohort (n = 264 samples). (c) Gene expression-based clustering of the combined cohort of 13 SCMC T-ALLs and 264 NCI TARGET T-ALLs showed that SJTALL013797_D1 clusters with other T-ALLs driven by TAL/LMO activation. The same genes from the previous study (Liu et al., Nature Genetics, 2017) were used in clustering the combined cohort. Colors on the top track represent different T-ALL subtypes.
Extended Data Fig. 5 Somatically acquired noncoding mutation activating TAL1 in T-ALL sample SJALL018373.
(a) The heterozygous C to T mutation (indicated by arrow, with mutant allele T shown in red) was present only in the tumor DNA and not in the remission sample, based on whole-genome sequencing data. (b) H3K27Ac profile from ChIP-seq at the TAL1 locus. The active enhancer present in the mutation-positive PDX sample (as shown in Fig. 3d) was absent from normal T cells (CD3, CD4 and CD8) and from the T-ALL cell line (LOUCY) with no TAL1 expression.
Expression (FPKM on y-axis) of SPEF2 (a) and IL7R (b) in the T-ALLs. The three tumors carrying the focal deletions (SJALL043558_D1, PATFYZ and PATRUN) are labeled. (c) H3K27Ac profiles from ChIP-seq show an active enhancer upstream of IL7R in the PDX (derived from patient SJALL018373) and a T-ALL cell line (KOPT-K1) having high IL7R transcription; both samples have the wild-type allele at this locus.
(a) Copy number variations identified in the four neuroblastoma cell lines. Blue and red represent the deletions and amplifications, respectively, identified in these cell lines. (b) Circos plot showing the cis-activating structural rearrangements identified in NBL cell lines by cis-X. The copy number alterations in each genome are shown in the inner track, with blue lines representing a copy number of one and red lines a copy number of three. The cis-activating structural variants are shown as links in the middle of the plot, with purple links representing inter-chromosome translocations and green links representing intra-chromosome translocations. The target genes activated by these rearrangements are labeled on the outer track of each plot.
The analysis was based on 90 NBL primary tumor samples with matching RNA-seq and WGS from TARGET, 42 of which had a positive immune cell infiltration signature based on prior analysis (Ma et al., Nature, 2018). (a) Samples with somatic copy number alterations (CNA, marked by red or blue blocks) and/or structural variations (SVs, marked by circles) at the TERT locus. All except one (PARAMT, marked #) were detected by cis-X as cis-activated candidates. Samples marked with * have an immune cell infiltration signature. Samples highlighted in gray are used to illustrate allele-specific expression (ASE) below. (b) Examples of ASE detected in neuroblastoma with or without infiltrating immune cells. The variant allele fractions of SNPs in DNA (by WGS) and RNA (by RNA-seq), depicted as bar graphs, demonstrate that ASE analysis is not affected by the presence of an immune cell infiltration signature in tumor samples.
TERT expression in adult TCGA melanoma (MEL) samples (n = 38), pediatric neuroblastoma (NBL) patient samples from the TARGET project (n = 90) and cell lines (n = 4) analyzed in this study. The MEL samples are color-coded by TERT promoter mutation status, while the NBL samples are marked by the status of cis-activation, infiltrating immune cells and cell lines, as depicted in the figure legend.
About this article
Cite this article
Liu, Y., Li, C., Shen, S. et al. Discovery of regulatory noncoding variants in individual cancer genomes by using cis-X. Nat Genet 52, 811–818 (2020). https://doi.org/10.1038/s41588-020-0659-5