Structural variants (SVs) can contribute to oncogenesis through a variety of mechanisms. Despite their importance, the identification of SVs in cancer genomes remains challenging. Here, we present a framework that integrates optical mapping, high-throughput chromosome conformation capture (Hi-C), and whole-genome sequencing to systematically detect SVs in a variety of normal or cancer samples and cell lines. We identify the unique strengths of each method and demonstrate that only integrative approaches can comprehensively identify SVs in the genome. By combining Hi-C and optical mapping, we resolve complex SVs and phase multiple SV events to a single haplotype. Furthermore, we observe widespread structural variation events affecting the functions of noncoding sequences, including the deletion of distal regulatory sequences, alteration of DNA replication timing, and the creation of novel three-dimensional chromatin structural domains. Our results indicate that noncoding SVs may be underappreciated mutational drivers in cancer genomes.
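The abstract argues that SV calls from WGS, Hi-C and optical mapping must be combined. The sketch below illustrates one generic way to combine them for rearrangement junctions, assuming each method reports a junction as two chromosome positions. Calls whose two ends fall on the same chromosome pair within a tolerance window are merged. A merged event is flagged as high-confidence when at least two methods support it. The `Junction` record, the `matches` and `integrate` functions, the 50 kb tolerance and the two-method threshold are all hypothetical choices made for illustration. This is not the integration procedure used in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """A rearrangement junction reported by one method (hypothetical record layout)."""
    method: str  # e.g. "WGS", "Hi-C" or "optical"
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int

    def ends(self):
        # Sort the two ends so that an A-B call and a B-A call compare alike.
        return tuple(sorted([(self.chrom_a, self.pos_a), (self.chrom_b, self.pos_b)]))


def matches(x: Junction, y: Junction, tol: int) -> bool:
    """True if both ends lie on the same chromosomes and within `tol` bp of each other."""
    (xa_chr, xa_pos), (xb_chr, xb_pos) = x.ends()
    (ya_chr, ya_pos), (yb_chr, yb_pos) = y.ends()
    return (xa_chr == ya_chr and xb_chr == yb_chr
            and abs(xa_pos - ya_pos) <= tol and abs(xb_pos - yb_pos) <= tol)


def integrate(calls, tol=50_000, min_methods=2):
    """Merge per-method calls into events; flag events supported by >= min_methods methods."""
    parent = list(range(len(calls)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Single-linkage clustering: link every pair of matching calls (O(n^2), fine for a sketch).
    for i in range(len(calls)):
        for j in range(i + 1, len(calls)):
            if matches(calls[i], calls[j], tol):
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i, call in enumerate(calls):
        clusters[find(i)].append(call)

    events = []
    for members in clusters.values():
        methods = sorted({c.method for c in members})
        events.append({
            "ends": members[0].ends(),  # first call's coordinates serve as a representative
            "methods": methods,
            "high_confidence": len(methods) >= min_methods,
        })
    return events


# Toy, made-up coordinates: three methods report roughly the same chr9-chr22 junction,
# and one WGS-only call has no support from the other methods.
calls = [
    Junction("WGS", "chr9", 1_000_000, "chr22", 5_000_000),
    Junction("Hi-C", "chr22", 5_020_000, "chr9", 990_000),
    Junction("optical", "chr9", 1_003_000, "chr22", 4_998_000),
    Junction("WGS", "chr3", 20_000_000, "chr7", 40_000_000),
]
for event in integrate(calls):
    print(event["ends"], event["methods"], event["high_confidence"])
```

Single linkage is transitive, so a chain of calls can drift further apart than `tol`. A realistic pipeline would also account for each method's resolution, because Hi-C breakpoints are limited by bin size. It would also treat deletions, inversions and copy-number changes separately from translocation junctions.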

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Acknowledgements

This work was supported by NIH grants R35GM124820, R01HG009906, and U01CA200060 (F.Y.), R24DK106766 (R.C.H. and F.Y.), GM083337 (D.M.G.), GM085354 (D.M.G.), DK107965 (D.M.G.), U54HG004592 (J.D. and J.A.S.), HG003143 and DK107980 (J.D.), U41HG007000 (W.S.N.), and DP5OD023071 (J. D.). This work was also supported by the European Research Council (No. 615584 to D.T.O. and C.E.), Cancer Research UK (Nos. 20412 and 22398 to D.T.O. and C.E.), and the Wellcome Trust (No. 84459 to D.T.O. and C.E.; No. 106985/Z/15/Z to S.H.). J.D. is an investigator of the Howard Hughes Medical Institute. J.R.D. is also supported by the Leona M. and Harry B. Helmsley Charitable Trust (grant No. 2017-PG-MED001). F.A. was supported by Institute Leadership Funds from the La Jolla Institute for Allergy and Immunology. F.Y. is also supported by the Leukemia Research Foundation and the Penn State Clinical and Translational Science Institute. We thank the ENCODE Data Coordination Center for help with Hi-C and replication timing data deposition. We also thank Jan Karlseder and Nausica Arnault for help with the FISH experiments.

Author information

Author notes

  1. These authors contributed equally to this work: Jesse R. Dixon, Jie Xu, Vishnu Dileep, Ye Zhan, and Fan Song.


Affiliations

  1. Salk Institute for Biological Studies, La Jolla, CA, USA

    Jesse R. Dixon & Victoria T. Le
  2. Department of Biochemistry and Molecular Biology, College of Medicine, The Pennsylvania State University, Hershey, PA, USA

    Jie Xu, Lijun Zhang, Hongbo Yang, Tingting Liu, Sriranga Iyyanki, James R. Broach & Feng Yue
  3. Department of Biological Science, Florida State University, Tallahassee, FL, USA

    Vishnu Dileep, Takayo Sasaki, Juan Carlos Rivera-Mulia & David M. Gilbert
  4. Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA, USA

    Ye Zhan, Hakan Ozadam, Bryan R. Lajoie & Job Dekker
  5. Bioinformatics and Genomics Program, The Pennsylvania State University, University Park, State College, PA, USA

    Fan Song, Yanli Wang, Lin An & Feng Yue
  6. Department of Genome Sciences, University of Washington, Seattle, WA, USA

    Galip Gürkan Yardımcı & William Stafford Noble
  7. La Jolla Institute for Allergy and Immunology, La Jolla, CA, USA

    Abhijit Chakraborty & Ferhat Ay
  8. Division of Otolaryngology, Head & Neck Surgery, Milton S. Hershey Medical Center, Hershey, PA, USA

    Darrin V. Bann & Christopher Pool
  9. Penn State College of Medicine, Informatics and Technology, Hershey, PA, USA

    Royden Clark
  10. Altius Institute for Biomedical Sciences, Seattle, WA, USA

    Rajinder Kaul, Michael Buckley, Kristen Lee, Morgan Diegel & John A. Stamatoyannopoulos
  11. Research Department of Cancer Biology, Cancer Institute, University College London, London, UK

    Dubravka Pezic & Suzana Hadjur
  12. Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK

    Christina Ernst & Duncan T. Odom
  13. German Cancer Research Center (DKFZ), Division Signaling and Functional Genomics, Heidelberg, Germany

    Duncan T. Odom
  14. Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, State College, PA, USA

    Ross C. Hardison
  15. School of Medicine, University of California San Diego, La Jolla, CA, USA

    Ferhat Ay
  16. Howard Hughes Medical Institute, Chevy Chase, MD, USA

    Job Dekker


Contributions

J.X., J.R.D., F.S., and F.Y. led the overall integrative analysis. J.X. and F.S. performed the WGS data analysis. J.R.D. led the overall Hi-C analysis. ENCODE Hi-C data were generated by Y.Z. and analyzed by B.R.L., H.O., and J.D. J.R.D., V.T.L., J.X., and F.Y. performed the additional Hi-C and FISH experiments. J.X., F.Y., A.C., and F.A. contributed to the Hi-C analysis. J.X., D.V.B., R.C., J.B., L.Z., C.P., J.R.B., and F.Y. performed the optical mapping and data analysis. V.D., T.S., J.C., and D.G. led the replication timing analysis. C.E. and D.O. prepared the Tc1 material. D.P. and S.H. performed the Hi-C experiments on Tc1 cells and the preliminary analysis. G.Y., L.Z., H.Y., T.L., S.I., L.A., C.P., R.K., M.B., K.L., M.D., J.S., and D.G. analyzed the data. J.R.D., J.X., V.D., F.S., F.A., R.C.H., W.S.N., J.D., D.G., and F.Y. wrote the manuscript.

Competing interests

The authors declare no competing interests.

Corresponding authors

Correspondence to Jesse R. Dixon, Ferhat Ay, William Stafford Noble, Job Dekker, David M. Gilbert or Feng Yue.

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figures 1–24

  2. Reporting Summary

  3. Supplementary Note

  4. Supplementary Table 1

    List of cell/tissue types with performed experiments and analysis

  5. Supplementary Table 2

    Number of SVs detected by WGS, Hi-C and optical mapping in eight cancer cell lines and NA12878

  6. Supplementary Table 3

    SVs detected by WGS in eight cancer cell lines and NA12878

  7. Supplementary Table 4

    SVs detected by optical mapping in eight cancer cell lines and NA12878

  8. Supplementary Table 5

    SVs detected by Hi-C in 36 cell lines

  9. Supplementary Table 6

    High-confidence SV calls from integration

  10. Supplementary Table 7

    Validated translocations and deletions in K562, Caki and T47D cells

  11. Supplementary Table 8

    Cross-comparison of large intrachromosomal rearrangements (≥1 Mb) and interchromosomal translocations

  12. Supplementary Table 9

    Contribution by each method and their overlapping percentage with high-confidence SVs

  13. Supplementary Table 10

    Integration of intrachromosomal rearrangements (<1 Mb)

  14. Supplementary Table 11

    Irys-detected deletions encompass multiple smaller WGS-detected deletions with the same total deletion sizes

  15. Supplementary Table 12

    Optical mapping predicts the sizes of unresolved genome gaps in hg19

  16. Supplementary Table 13

    Optical mapping estimates of gap sizes in hg38 and comparison with previous assessments of hg38 gaps

  17. Supplementary Table 14

    SV-induced fused genes detected by RNA-seq

  18. Supplementary Table 15

    Summary of genes, repetitive elements and insulators overlapping with high-confidence deletions

  19. Supplementary Table 16

    Frequency of enhancer deletions versus simulated expectation in cancer cells and normal cells

  20. Supplementary Table 17

    Deleted potential enhancers and insulators in T47D, Caki2, K562 and NCIH460

About this article

Publication history