Structural variants (SVs) can contribute to oncogenesis through a variety of mechanisms. Despite their importance, the identification of SVs in cancer genomes remains challenging. Here, we present a framework that integrates optical mapping, high-throughput chromosome conformation capture (Hi-C), and whole-genome sequencing to systematically detect SVs in a variety of normal or cancer samples and cell lines. We identify the unique strengths of each method and demonstrate that only integrative approaches can comprehensively identify SVs in the genome. By combining Hi-C and optical mapping, we resolve complex SVs and phase multiple SV events to a single haplotype. Furthermore, we observe widespread structural variation events affecting the functions of noncoding sequences, including the deletion of distal regulatory sequences, alteration of DNA replication timing, and the creation of novel three-dimensional chromatin structural domains. Our results indicate that noncoding SVs may be underappreciated mutational drivers in cancer genomes.
This work was supported by NIH grants R35GM124820, R01HG009906, and U01CA200060 (F.Y.), R24DK106766 (R.C.H. and F.Y.), GM083337 (D.M.G.), GM085354 (D.M.G.), DK107965 (D.M.G.), U54HG004592 (J.D. and J.A.S.), HG003143 and DK107980 (J.D.), U41HG007000 (W.S.N.), and DP5OD023071 (J.D.). This work was also supported by the European Research Council (No. 615584 to D.T.O. and C.E.), Cancer Research UK (Nos. 20412 and 22398 to D.T.O. and C.E.), Wellcome Trust (No. 84459 to D.T.O. and C.E.), and Wellcome Trust (No. 106985/Z/15/Z to S.H.). J.D. is an investigator of the Howard Hughes Medical Institute. J.R.D. is also supported by the Leona M. and Harry B. Helmsley Charitable Trust grant No. 2017-PG-MED001. F.A. was supported by Institute Leadership Funds from La Jolla Institute for Allergy and Immunology. F.Y. is also supported by the Leukemia Research Foundation and Penn State Clinical and Translational Science Institute. We thank the ENCODE Data Coordination Center for helping with Hi-C and replication timing data deposition. We would also like to thank Jan Karlseder and Nausica Arnault for help with the FISH experiments.
The authors declare no competing interests.
Supplementary Figures 1–24
List of cell/tissue types with performed experiments and analysis
Number of SVs detected by WGS, Hi-C and optical mapping in eight cancer cell lines and NA12878
SVs detected by WGS in eight cancer cell lines and NA12878
SVs detected by optical mapping in eight cancer cell lines and NA12878
SVs detected by Hi-C in 36 cell lines
High-confidence SV calls from integration
Validated translocations and deletions in K562, Caki and T47D cells
Cross comparison of large intrachromosomal rearrangements (≥1 Mb) and interchromosomal translocations
Contribution by each method and their overlapping percentage with high-confidence SVs
Integration of intrachromosomal rearrangements (<1 Mb)
Irys-detected deletions encompass multiple smaller WGS-detected deletions with the same total deletion sizes
Optical mapping predicts the size of unresolved genome gap in hg19
Optical mapping provides estimation of gap size in hg38 and comparison to previous gap assessment of hg38
SV-induced fused genes detected by RNA-seq
Summary of genes, repetitive elements and insulators overlapping with high-confidence deletions
Frequency of enhancer deletions versus simulated expectation in cancer cells and normal cells
Deleted potential enhancers and insulators in T47D, Caki2, K562 and NCIH460
Cite this article
Dixon, J.R., Xu, J., Dileep, V. et al. Integrative detection and analysis of structural variation in cancer genomes. Nat Genet 50, 1388–1398 (2018). https://doi.org/10.1038/s41588-018-0195-8