  • Analysis
Systematic discovery of complex insertions and deletions in human cancers


Complex insertions and deletions (indels) are formed by simultaneously deleting and inserting DNA fragments of different sizes at a common genomic location. Here we present a systematic analysis of somatic complex indels in the coding sequences of samples from over 8,000 cancer cases using Pindel-C. We discovered 285 complex indels in cancer-associated genes (such as PIK3R1, TP53, ARID1A, GATA3 and KMT2D) in approximately 3.5% of cases analyzed; nearly all instances of complex indels were overlooked (81.1%) or misannotated (17.6%) in previous reports of 2,199 samples. In-frame complex indels are enriched in PIK3R1 and EGFR, whereas frameshifts are prevalent in VHL, GATA3, TP53, ARID1A, PTEN and ATRX. Furthermore, complex indels display strong tissue specificity (such as VHL in kidney cancer samples and GATA3 in breast cancer samples). Finally, structural analyses support findings of previously missed, but potentially druggable, mutations in the EGFR, MET and KIT oncogenes. This study indicates the critical importance of improving complex indel discovery and interpretation in medical research.
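The definition above, together with the in-frame versus frameshift distinction, can be illustrated with a minimal sketch. This is not Pindel-C and does not reproduce the authors' pipeline; the names `IndelCall`, `trim_shared_ends` and `classify_event` are hypothetical, and the sketch assumes the event lies entirely within coding sequence and is given as a reference allele and an alternate allele. It only checks whether both a deleted and an inserted fragment remain after shared flanking bases are trimmed, and whether the net length change is a multiple of three.

```python
from typing import NamedTuple, Tuple


class IndelCall(NamedTuple):
    kind: str               # e.g. "complex_indel", "simple_deletion"
    deleted: str            # reference bases removed after trimming
    inserted: str           # new bases added after trimming
    net_length_change: int  # len(inserted) - len(deleted)
    frame_effect: str       # "in_frame" or "frameshift" (coding sequence assumed)


def trim_shared_ends(ref: str, alt: str) -> Tuple[str, str]:
    """Remove bases shared by both alleles at the start, then at the end."""
    start = 0
    while start < len(ref) and start < len(alt) and ref[start] == alt[start]:
        start += 1
    ref, alt = ref[start:], alt[start:]

    end = 0
    while end < len(ref) and end < len(alt) and ref[-1 - end] == alt[-1 - end]:
        end += 1
    return ref[: len(ref) - end], alt[: len(alt) - end]


def classify_event(ref: str, alt: str) -> IndelCall:
    """Label an allele pair as a simple or complex event (illustrative only)."""
    deleted, inserted = trim_shared_ends(ref.upper(), alt.upper())
    net = len(inserted) - len(deleted)

    if not deleted and not inserted:
        kind = "no_change"
    elif not deleted:
        kind = "simple_insertion"
    elif not inserted:
        kind = "simple_deletion"
    elif net == 0:
        kind = "substitution"      # same-size replacement, not an indel
    else:
        kind = "complex_indel"     # deletion and insertion of different sizes

    # A net change that is a multiple of three preserves the reading frame.
    frame_effect = "in_frame" if net % 3 == 0 else "frameshift"
    return IndelCall(kind, deleted, inserted, net, frame_effect)


if __name__ == "__main__":
    print(classify_event("ACGTAC", "TT"))
    print(classify_event("GGAATT", "CCC"))
```

The sketch ignores effects that a real annotation would need to consider, such as introduced stop codons, splice-site disruption and transcript choice.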

Figure 1: Workflow and algorithm testing for somatic complex indel detection and filtering.
Figure 2: The exome-wide landscape and characteristics of somatic complex indels across 20 cancer types.
Figure 3: Schematics showing the configurations of simple and complex indels.
Figure 4: Abundance of somatic complex indels in key cancer genes.
Figure 5: Druggability of somatic complex indels in EGFR and KIT.


This work was supported by the National Cancer Institute grant R01CA180006 (L.D.), the National Human Genome Research Institute grant U01HG006517 (L.D.), the Department of Defense grant W81XWH-14-1-0458 (F.C.) and the National Institute of Diabetes and Digestive and Kidney Diseases grant R01DK087960 (F.C.). Additional support came from the National Institute of General Medical Sciences Cell and Molecular Biology training grant GM 007067 (R.J.) and the National Human Genome Research Institute Genome Analysis Training Program grant T32 HG000045 (M.X.). We acknowledge The Cancer Genome Atlas as the source of primary data, and we thank M. Wyczalkowski for technical assistance and members of the TCGA Research Network for helpful discussions.

Author information

Authors and Affiliations



L.D. designed and supervised research. L.D. and K.Y. led data analysis and K.Y., J.W., M.D.M., M.X., R.J., S.C., A.S., V.Y., K.H., K.J.J. and M.C.W. performed data analysis. K.Y. led methods development and E.-W.L., M.M., B.N. and P.E.S. contributed code to Pindel-C. K.Y., R.J., S.C. and S.F. developed the QC code. J.F.M., K.Y. and L.D. prepared figures and tables. F.C. and J.N. performed experimental validation. K.Y., M.C.W. and L.D. wrote the manuscript.

Corresponding author

Correspondence to Li Ding.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–3 and Supplementary Notes 1–4 (PDF 35613 kb)

Supplementary Table 1

In total 1,128 complex indels on Venter chr1 were spiked in (XLSX 126 kb)

Supplementary Table 2

The complex indels reported by Pindel-C, taking the BAM file with spiked-in Venter complex indels as input (XLSX 98 kb)

Supplementary Table 3

Somatic and germline complex indel validation result in COLO829 cell lines (XLSX 46 kb)

Supplementary Table 4

Exome-wide complex indels (XLSX 451 kb)

Supplementary Table 5

MuSiC correlation analysis (XLSX 894 kb)

Supplementary Table 6

A list of 624 cancer-associated genes compiled from literature (XLSX 41 kb)

Supplementary Table 7

A list of complex indels in cancer-associated genes (XLSX 94 kb)

Supplementary Table 8

Validation of somatic complex indels discovered in exome data using whole genome sequence data (XLSX 53 kb)

Supplementary Table 9

Complex indel variant allele fraction and VAF of simple variants (XLSX 61 kb)

About this article

Cite this article

Ye, K., Wang, J., Jayasinghe, R. et al. Systematic discovery of complex insertions and deletions in human cancers. Nat Med 22, 97–104 (2016).
