Abstract
Complex insertions and deletions (indels) are formed by simultaneously deleting and inserting DNA fragments of different sizes at a common genomic location. Here we present a systematic analysis of somatic complex indels in the coding sequences of samples from over 8,000 cancer cases using Pindel-C. We discovered 285 complex indels in cancer-associated genes (such as PIK3R1, TP53, ARID1A, GATA3 and KMT2D) in approximately 3.5% of cases analyzed; nearly all instances of complex indels were overlooked (81.1%) or misannotated (17.6%) in previous reports of 2,199 samples. In-frame complex indels are enriched in PIK3R1 and EGFR, whereas frameshifts are prevalent in VHL, GATA3, TP53, ARID1A, PTEN and ATRX. Furthermore, complex indels display strong tissue specificity (such as VHL in kidney cancer samples and GATA3 in breast cancer samples). Finally, structural analyses support findings of previously missed, but potentially druggable, mutations in the EGFR, MET and KIT oncogenes. This study indicates the critical importance of improving complex indel discovery and interpretation in medical research.
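As an illustration of the definition above, the following minimal Python sketch models a complex indel as a deleted fragment and an inserted fragment of different sizes at one position, and labels it in-frame or frameshift by whether the net length change is a multiple of three. The `ComplexIndel` class, the `apply_indel` helper and the toy sequence are hypothetical names and data chosen for illustration; they are not part of Pindel-C and do not reflect how it detects or annotates variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexIndel:
    """Hypothetical record: bases deleted and bases inserted at one position."""
    ref: str  # deleted reference bases
    alt: str  # inserted bases

    def is_complex(self) -> bool:
        # Complex indel: both fragments are non-empty and differ in size.
        return bool(self.ref) and bool(self.alt) and len(self.ref) != len(self.alt)

    def net_length_change(self) -> int:
        # Positive for a net gain of bases, negative for a net loss.
        return len(self.alt) - len(self.ref)

    def coding_effect(self) -> str:
        # Within coding sequence, a net change divisible by 3 keeps the reading frame.
        return "in-frame" if self.net_length_change() % 3 == 0 else "frameshift"


def apply_indel(sequence: str, start: int, indel: ComplexIndel) -> str:
    """Replace indel.ref with indel.alt at a 0-based start position."""
    end = start + len(indel.ref)
    if sequence[start:end] != indel.ref:
        raise ValueError("reference bases do not match the sequence at this position")
    return sequence[:start] + indel.alt + sequence[end:]


# Toy coding sequence (assumed for illustration only).
toy_cds = "ATGGCCTTAGGA"

in_frame = ComplexIndel(ref="GCCTTA", alt="CAT")  # delete 6 bases, insert 3
frameshift = ComplexIndel(ref="GCC", alt="T")     # delete 3 bases, insert 1

for variant in (in_frame, frameshift):
    mutated = apply_indel(toy_cds, 3, variant)
    print(variant.is_complex(), variant.net_length_change(), variant.coding_effect(), mutated)
```

The sketch ignores details that real annotation has to handle, such as splice sites, strand, and the anchor base used by some variant formats.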
Acknowledgements
This work was supported by the National Cancer Institute grant R01CA180006 (L.D.), the National Human Genome Research Institute grant U01HG006517 (L.D.), the Department of Defense grant W81XWH-14-1-0458 (F.C.) and the National Institute of Diabetes and Digestive and Kidney Diseases grant R01DK087960 (F.C.). Additional support came from the National Institute of General Medical Sciences Cell and Molecular Biology training grant GM 007067 (R.J.) and the National Human Genome Research Institute Genome Analysis Training Program grant T32 HG000045 (M.X.). We acknowledge The Cancer Genome Atlas (http://cancergenome.nih.gov) as the source of primary data, and we thank M. Wyczalkowski for technical assistance and members of the TCGA Research Network for helpful discussions.
Author information
Contributions
L.D. designed and supervised research. L.D. and K.Y. led data analysis and K.Y., J.W., M.D.M., M.X., R.J., S.C., A.S., V.Y., K.H., K.J.J. and M.C.W. performed data analysis. K.Y. led methods development and E.-W.L., M.M., B.N. and P.E.S. contributed code to Pindel-C. K.Y., R.J., S.C. and S.F. developed the QC code. J.F.M., K.Y. and L.D. prepared figures and tables. F.C. and J.N. performed experimental validation. K.Y., M.C.W. and L.D. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–3 and Supplementary Notes 1–4 (PDF 35613 kb)
Supplementary Table 1
In total, 1,128 complex indels on Venter chr1 were spiked in (XLSX 126 kb)
Supplementary Table 2
The complex indels reported by Pindel-C when given the BAM file with spiked-in Venter complex indels as input (XLSX 98 kb)
Supplementary Table 3
Somatic and germline complex indel validation results in COLO829 cell lines (XLSX 46 kb)
Supplementary Table 4
Exome-wide complex indels (XLSX 451 kb)
Supplementary Table 5
MuSiC correlation analysis (XLSX 894 kb)
Supplementary Table 6
A list of 624 cancer-associated genes compiled from literature (XLSX 41 kb)
Supplementary Table 7
A list of complex indels in cancer-associated genes (XLSX 94 kb)
Supplementary Table 8
Validation of somatic complex indels discovered in exome data using whole-genome sequence data (XLSX 53 kb)
Supplementary Table 9
Variant allele fraction (VAF) of complex indels and VAF of simple variants (XLSX 61 kb)
About this article
Cite this article
Ye, K., Wang, J., Jayasinghe, R. et al. Systematic discovery of complex insertions and deletions in human cancers. Nat Med 22, 97–104 (2016). https://doi.org/10.1038/nm.4002
DOI: https://doi.org/10.1038/nm.4002