Integrative detection and analysis of structural variation in cancer genomes

Abstract

Structural variants (SVs) can contribute to oncogenesis through a variety of mechanisms. Despite their importance, the identification of SVs in cancer genomes remains challenging. Here, we present a framework that integrates optical mapping, high-throughput chromosome conformation capture (Hi-C), and whole-genome sequencing to systematically detect SVs in a variety of normal or cancer samples and cell lines. We identify the unique strengths of each method and demonstrate that only integrative approaches can comprehensively identify SVs in the genome. By combining Hi-C and optical mapping, we resolve complex SVs and phase multiple SV events to a single haplotype. Furthermore, we observe widespread structural variation events affecting the functions of noncoding sequences, including the deletion of distal regulatory sequences, alteration of DNA replication timing, and the creation of novel three-dimensional chromatin structural domains. Our results indicate that noncoding SVs may be underappreciated mutational drivers in cancer genomes.
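The integration idea in the abstract can be illustrated with a minimal sketch. It assumes each method reports an SV as a pair of breakpoints, treats two calls as the same event when both breakpoints agree within a tolerance, and keeps events supported by at least two methods. The `SVCall` type, the `integrate` function, the 50 kb tolerance, the two-method threshold, and the coordinates are all hypothetical choices for illustration; they are not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    """One SV call as a breakpoint pair (hypothetical record layout)."""
    method: str  # e.g. "WGS", "Hi-C", "optical"
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def same_event(a: SVCall, b: SVCall, tol: int) -> bool:
    """Two calls match if both breakpoints agree within tol bp (assumed criterion)."""
    return (
        a.chrom1 == b.chrom1
        and a.chrom2 == b.chrom2
        and abs(a.pos1 - b.pos1) <= tol
        and abs(a.pos2 - b.pos2) <= tol
    )


def integrate(calls, tol=50_000, min_methods=2):
    """Greedily cluster calls, then keep clusters seen by >= min_methods methods."""
    clusters = []
    ordered = sorted(calls, key=lambda c: (c.chrom1, c.pos1, c.chrom2, c.pos2))
    for call in ordered:
        for cluster in clusters:
            if any(same_event(call, other, tol) for other in cluster):
                cluster.append(call)
                break
        else:
            # No existing cluster matched, so this call starts a new one.
            clusters.append([call])
    return [c for c in clusters if len({x.method for x in c}) >= min_methods]


# Made-up coordinates, for illustration only.
calls = [
    SVCall("WGS", "chr9", 1_000_000, "chr22", 2_000_000),
    SVCall("Hi-C", "chr9", 1_010_000, "chr22", 1_990_000),
    SVCall("optical", "chr1", 5_000_000, "chr1", 8_000_000),
]
supported = integrate(calls)
print(len(supported))
```

With these toy inputs, the WGS and Hi-C calls fall into one cluster and pass the two-method filter, while the lone optical mapping call is dropped. A real pipeline would also need to account for the very different breakpoint resolution of each technology, which a single fixed tolerance does not capture.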

Fig. 1: Overall strategy of SV detection in cancer genomes.
Fig. 2: Detection of SVs using Hi-C in cancer genomes.
Fig. 3: Comparison of SVs detected by different methods.
Fig. 4: The impact of SVs on enhancers.
Fig. 5: Rearrangements and TAD fusions.

Acknowledgements

This work was supported by NIH grants R35GM124820, R01HG009906, and U01CA200060 (F.Y.), R24DK106766 (R.C.H. and F.Y.), GM083337 (D.M.G.), GM085354 (D.M.G.), DK107965 (D.M.G.), U54HG004592 (J.D. and J.A.S.), HG003143 and DK107980 (J.D.), U41HG007000 (W.S.N.), and DP5OD023071 (J.D.). This work was also supported by European Research Council (No. 615584 to D.T.O. and C.E.), Cancer Research UK (Nos. 20412 and 22398 to D.T.O. and C.E.), Wellcome Trust (No. 84459 to D.T.O. and C.E.), and Wellcome Trust (No. 106985/Z/15/Z to S.H.). J.D. is an investigator of the Howard Hughes Medical Institute. J.R.D. is also supported by the Leona M. and Harry B. Helmsley Charitable Trust grant No. 2017-PG-MED001. F.A. was supported by Institute Leadership Funds from La Jolla Institute for Allergy and Immunology. F.Y. is also supported by the Leukemia Research Foundation and Penn State Clinical and Translational Science Institute. We thank the ENCODE Data Coordination Center for helping with Hi-C and replication time data deposition. We would also like to thank Jan Karlseder and Nausica Arnault for help with the FISH experiments.

Author information

Contributions

J.X., J.R.D., F.S., and F.Y. led the overall integrative analysis. J.X. and S.F. performed the WGS data analysis. J.R.D. led the overall Hi-C analysis. ENCODE Hi-C data were generated by Y.Z. and analyzed by B.R.L., H.O., and J.D. J.R.D., V.T.L., J.X., and F.Y. performed the additional Hi-C and FISH experiments. J.X., F.Y., A.C., and F.A. contributed to Hi-C analysis. J.X., D.V.B., R.C., J.B., L.Z., C.P., J.R.B., and F.Y. performed the optical mapping and data analysis. V.D., T.S., J.C., and D.G. led the replication timing analysis. C.E. and D.O. prepared the Tc1 material. D.P. and S.H. prepared the Hi-C experiments on Tc1 cells and the preliminary analysis. G.Y., L.Z., H.Y., T.L., S.I., L.A., C.P., R.K., M.B., K.L., M.D., J.S., and D.G. analyzed the data. J.R.D., J.X., V.D., F.S., F.A., R.C.H., W.S.N., J.D., D.G., and F.Y. wrote the manuscript.

Corresponding authors

Correspondence to Jesse R. Dixon, Ferhat Ay, William Stafford Noble, Job Dekker, David M. Gilbert or Feng Yue.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–24

Reporting Summary

Supplementary Note

Supplementary Table 1

List of cell/tissue types with performed experiments and analysis

Supplementary Table 2

Number of SVs detected by WGS, Hi-C and optical mapping in eight cancer cell lines and NA12878

Supplementary Table 3

SVs detected by WGS in eight cancer cell lines and NA12878

Supplementary Table 4

SVs detected by optical mapping in eight cancer cell lines and NA12878

Supplementary Table 5

SVs detected by Hi-C in 36 cell lines

Supplementary Table 6

High-confidence SV calls from integration

Supplementary Table 7

Validated translocations and deletions in K562, Caki and T47D cells

Supplementary Table 8

Cross comparison of large intrachromosomal rearrangements (≥1 Mb) and interchromosomal translocations

Supplementary Table 9

Contribution by each method and their overlapping percentage with high-confidence SVs

Supplementary Table 10

Integration of intrachromosomal rearrangements (<1 Mb)

Supplementary Table 11

Irys-detected deletions encompass multiple smaller WGS-detected deletions with the same total deletion sizes

Supplementary Table 12

Optical mapping predicts the size of unresolved genome gap in hg19

Supplementary Table 13

Optical mapping provides estimation of gap size in hg38 and comparison to previous gap assessment of hg38

Supplementary Table 14

SV-induced fused genes detected by RNA-seq

Supplementary Table 15

Summary of genes, repetitive elements and insulators overlapping with high-confidence deletions

Supplementary Table 16

Frequency of enhancer deletions versus simulated expectation in cancer cells and normal cells

Supplementary Table 17

Deleted potential enhancers and insulators in T47D, Caki2, K562 and NCIH460

Cite this article

Dixon, J.R., Xu, J., Dileep, V. et al. Integrative detection and analysis of structural variation in cancer genomes. Nat Genet 50, 1388–1398 (2018). https://doi.org/10.1038/s41588-018-0195-8

  • DOI: https://doi.org/10.1038/s41588-018-0195-8
