Abstract
Microsatellites (MSs) are tracts of variable-length repeats of short DNA motifs that exhibit high rates of mutation in the form of insertions or deletions (indels) of the repeated motif. Despite their prevalence, the contribution of somatic MS indels to cancer has been largely unexplored, owing to difficulties in detecting them in short-read sequencing data. Here we present two tools: MSMuTect, for accurate detection of somatic MS indels, and MSMutSig, for identification of genes containing MS indels at a higher frequency than expected by chance. Applying MSMuTect to whole-exome data from 6,747 human tumors representing 20 tumor types, we identified >1,000 previously undescribed MS indels in cancer genes. Additionally, we demonstrate that the number and pattern of MS indels can accurately distinguish microsatellite-stable tumors from tumors with microsatellite instability, thus potentially improving classification of clinically relevant subgroups. Finally, we identified seven MS indel driver hotspots: four in known cancer genes (ACVR2A, RNF43, JAK1, and MSH3) and three in genes not previously implicated as cancer drivers (ESRP1, PRDM2, and DOCK3).
Acknowledgements
We thank C. Mayer for supplying and supporting the PHOBOS tool. G.G. was partially funded by the Paul C. Zamecnick, MD, Chair in Oncology at MGH and the NIH TCGA Genome Data Analysis Center (NIH U24CA143845). Y.E.M., P. Polak, and A. Kamburov were funded by G.G.'s start-up funds at Massachusetts General Hospital. K.W.M. was partially funded by an American Society of Radiation Oncology (ASTRO) Junior Faculty Career Research Training Award and a Harvard Catalyst KL2/CMeRIT Award. F.M. gratefully acknowledges support from the Dana-Farber Cancer Institute Physical Sciences Oncology Center (NIH U54CA193461). R.K. was supported by the European Commission Seventh Framework Programme (Integra-Life; grant 315997) and the Croatian Science Foundation (grant IP-2014-09-6400).
Author information
Contributions
Y.E.M., K.W.M., F.M., and G.G. devised the research strategy. Y.E.M. and G.G. developed the tools. Y.E.M., R.K., N.J.H., and J.M.H. performed analyses. Y.E.M., K.W.M., R.K., P. Parasuraman, A. Kamburov, P. Polak, N.J.H., J.M.H., E.R., Y.B., A. Koren, L.Z.B., A.D'A., M.S.L., A.J.B., A.B., F.M., and G.G. helped interpret results. Y.E.M., K.W.M., and G.G. wrote the manuscript. All authors reviewed and approved the final manuscript.
Ethics declarations
Competing interests
The Broad Institute has filed a patent application regarding the analysis of somatic microsatellite indels in cancers, as reported in this publication.
Integrated supplementary information
Supplementary Figure 1 Motif size distribution.
The number of MS loci per motif size across the whole genome (red), the exome (green), and an annotated set of cancer genes from Lawrence et al. (ref. 1; blue). Mono- and di-repeats represent ~99% of all MS loci.
1. Lawrence, M. S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).
Supplementary Figure 2 Sequencing coverage across motifs.
The number of MS loci per length for different motifs (A, C, AC, and AG) across the exome is shown in red while the average number of MS loci covered by at least 10 reads is shown in blue. The number of MS loci covered at 10x depth decreases more rapidly than the number of MS loci, demonstrating the difficulty in achieving sufficient coverage for longer repeat lengths. Together, the motifs A, C, AC, and AG represent 98% of MS loci in the exome.
Supplementary Figure 3 Comparison of accuracy of sequence-alignment tools at MS loci.
Noise is plotted as a function of the MS repeat length for the standard alignment (using the Burrows-Wheeler Aligner, BWA; ref. 2) versus the MS-specific alignment (adapted from lobSTR; ref. 3). Data are shown for the AG motif. Noise was defined as the fraction of reads that differ from the modal number of repeats, aggregated over all the MS loci in the X chromosome from normal male samples (which are assumed to be homozygous at each MS locus). On average, the MS-specific alignment method reduces noise approximately fivefold.
2. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
3. Gymrek, M., Golan, D., Rosset, S. & Erlich, Y. lobSTR: A short tandem repeat profiler for personal genomes. Genome Res. 22, 1154–1162 (2012).
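The noise definition in Supplementary Figure 3 can be illustrated with a short Python sketch. This is not the authors' implementation: the function names and the input layout (one list of per-read repeat counts per locus) are assumptions made for illustration, and pooling reads across loci is only one plausible reading of "aggregated".

```python
from collections import Counter


def locus_noise_counts(repeat_counts):
    """Return (non_modal_reads, total_reads) for one MS locus.

    repeat_counts: hypothetical input, the repeat number observed in each read.
    """
    if not repeat_counts:
        return 0, 0
    histogram = Counter(repeat_counts)
    modal_reads = max(histogram.values())  # reads supporting the modal repeat number
    total = len(repeat_counts)
    return total - modal_reads, total


def aggregate_noise(loci):
    """Fraction of reads that differ from their locus's modal repeat number,
    pooled over all loci (assumption: pooling by reads, not averaging per locus)."""
    non_modal = 0
    total = 0
    for reads in loci:
        n, t = locus_noise_counts(reads)
        non_modal += n
        total += t
    return non_modal / total if total else float("nan")


# Two toy loci on a haploid chromosome: 3 of 10 reads deviate from the mode.
example_loci = [[10, 10, 10, 9], [12, 12, 13, 12, 12, 11]]
example_noise = aggregate_noise(example_loci)
```

Restricting the calculation to the X chromosome of male samples, as the caption describes, means each locus is expected to carry a single true allele, so any non-modal read can be treated as noise.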
Supplementary Figure 4 Analysis of true-positive rates.
The number of detected simulated MS indels (out of 200) across repeat lengths (shown in different colors) and allele fractions. The sensitivity to detect MS indels decreases markedly at low allele fractions.
Supplementary Figure 5 False-positive rates.
False-positive rates for the A and C motifs as a function of MSMuTect parameters. Heat maps show the log10 false-positive rate per MS locus (i.e., the fraction of falsely called MS indels among all MS loci) for the A and C motifs. The y-axis is the threshold for the different AIC scores (Tr) and the x-axis is the threshold for the Kolmogorov-Smirnov (KS) filtering step.
Supplementary Figure 6 Distribution of MS indels and SNVs across cancer.
Comparison of the fraction of MS indels (upper panel) and number of SNVs (lower panel) across 4,041 tumors from 20 tumor types. Only samples with annotated MS indels and SNVs are shown. Red horizontal lines represent the mean number of MS indels in each tumor type.
Supplementary Figure 7 The number of MS indels for different changes in the number of alleles.
The number of MS indels in STAD samples (divided into MSI-H, MSI-L, and MSS) plotted for different numbers of germline and tumor alleles. MSMuTect not only detects the presence of a somatic MS indel but also infers the actual alleles in both the germline and tumor samples. The upper row shows the number of MS indels for loci with one allele in the germline, and the lower row for loci with two alleles in the germline. The columns represent the number of somatic MS indel alleles in the tumor (ranging from one to four). For example, the plot in the third column of the second row shows cases in which the germline has two alleles (i.e., heterozygous sites) but the tumor sample has three alleles. MS indels are more common in MSI-H tumors in all settings except when the germline has two alleles but the tumor has only a single allele (bottom left corner), which reflects loss of heterozygosity (LOH). MSI designations (MSI-H, MSI-L, or MSS) are based on the Bethesda gel classification (taken from TCGA). The y-axis scale varies across panels. The significance of the difference was calculated using a one-tailed t-test (ns, p>0.05; * p<0.05; ** p<10⁻³; *** p<10⁻⁸; **** p<10⁻¹⁶).
Supplementary Figure 8 Correlation between germline variability and somatic MS indel frequency.
The x-axis represents the binned fraction of non-reference alleles at each MS locus (out of the 2N alleles in our cohort, where N is the number of covered normal samples). The somatic MS indel frequency for each MS locus is plotted as blue dots. Black dots represent the mean of each bin. The upper panel shows A8 loci with germline variability from 0 to 0.1, and the lower panel shows the full range from 0 to 1. The effect of germline variability on the somatic rate is minor for germline variability <0.1.
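The germline variability measure used on the x-axis can be written out as a minimal Python sketch. The function name, the genotype representation (a pair of repeat lengths per normal sample), and the reference length argument are illustrative assumptions, not part of MSMuTect.

```python
def germline_variability(genotypes, reference_length):
    """Fraction of non-reference alleles at one MS locus.

    genotypes: hypothetical input, one (allele_1, allele_2) pair of repeat
               lengths for each of the N covered normal samples.
    reference_length: repeat length of the reference allele.
    """
    n_alleles = 2 * len(genotypes)  # the 2N alleles in the cohort
    if n_alleles == 0:
        return float("nan")
    non_reference = sum(
        (a != reference_length) + (b != reference_length) for a, b in genotypes
    )
    return non_reference / n_alleles


# Four normal samples at an A8 locus: 3 of the 8 alleles differ from the reference.
example_variability = germline_variability([(8, 8), (8, 9), (7, 9), (8, 8)], 8)
```

Loci would then be binned by this value before averaging the somatic MS indel frequency within each bin, as in the plot.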
Supplementary Figure 9 Distribution of MS indels in A8 in noncoding regions.
The observed frequencies of mutated A8 loci for each number of indels are shown as black dots, whereas the expected frequency from a fit based on a binomial distribution is represented by the red line. The x-axis represents the number of MS indels and the y-axis represents the fraction of loci that have a particular number of MS indels.
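A binomial expectation of this kind can be sketched in Python as follows. The caption does not state how the fit was performed, so the estimator here (a single per-sample mutation probability taken as the overall mean) and all names are assumptions for illustration only.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k mutated samples out of n at per-sample rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_fractions(indel_counts, n_samples):
    """Expected fraction of loci carrying 0, 1, 2, ... MS indels.

    indel_counts: hypothetical input, number of samples with an indel at each locus.
    n_samples: number of tumor samples in which each locus was assessed.
    Assumption: one shared mutation probability, estimated as the mean rate.
    """
    p_hat = sum(indel_counts) / (len(indel_counts) * n_samples)
    max_k = max(indel_counts)
    return p_hat, [binomial_pmf(k, n_samples, p_hat) for k in range(max_k + 1)]


# Ten toy loci assessed in ten samples; four indels in total gives p_hat = 0.04.
p_hat, expected = expected_fractions([0, 0, 1, 1, 2, 0, 0, 0, 0, 0], 10)
```

Comparing these expected fractions with the observed fraction of loci at each indel count shows whether a single shared rate is adequate or whether some loci are mutated more often than the binomial model predicts.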
Supplementary Figure 10 STAD quantile–quantile plot.
MSMutSig QQ plot for stomach adenocarcinoma (STAD). Quantile-quantile plot of observed vs. expected P-values under the negative binomial (also called gamma-Poisson) model. Significant MS loci (q<0.1) are shown in red.
Supplementary Figure 11 COAD quantile–quantile plot.
MSMutSig QQ plot for colon adenocarcinoma (COAD). Quantile-quantile plot of observed vs. expected P-values under the negative binomial (also called gamma-Poisson) model. Significant MS loci (q<0.1) are shown in red.
Supplementary Figure 12 UCEC quantile–quantile plot.
MSMutSig QQ plot for endometrial cancer (UCEC). Quantile-quantile plot of observed vs. expected P-values under the negative binomial (also called gamma-Poisson) model. Significant MS loci (q<0.1) are shown in red.
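The quantile-quantile plots in Supplementary Figures 10–12 compare observed P-values with those expected under the null model. The Python sketch below shows one common way to build such plot coordinates and to obtain q-values. It assumes the P-values have already been computed, are all greater than zero, and that q-values come from the Benjamini-Hochberg procedure; the captions do not specify the correction method, so that choice and the function names are assumptions.

```python
from math import log10


def qq_points(p_values):
    """Return (expected, observed) pairs of -log10 P-values, most significant last.

    Under the null, the i-th smallest of n P-values is expected to be i / (n + 1).
    """
    n = len(p_values)
    observed = sorted(p_values, reverse=True)
    expected = [(n - i) / (n + 1) for i in range(n)]
    return [(-log10(e), -log10(o)) for e, o in zip(expected, observed)]


def bh_q_values(p_values):
    """Benjamini-Hochberg q-values, returned in the input order (assumed method)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone q-values.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        q[idx] = running_min
    return q


toy_p = [0.001, 0.04, 0.03, 0.5]
points = qq_points(toy_p)
q_values = bh_q_values(toy_p)
significant = [i for i, q in enumerate(q_values) if q < 0.1]  # loci drawn in red
```

Points lying on the diagonal indicate that the null model is well calibrated, and loci that rise above it with q<0.1 are the ones highlighted as significant.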
Supplementary Figure 13 PRDM2 transcript levels in WT versus mutant PRDM2 cases.
PRDM2 transcript levels (by RNA-seq) were lower in cases with a PRDM2 p.K1489fs frameshift mutation than in PRDM2 WT cases (P=0.016, two-tailed Mann-Whitney test).
Supplementary Figure 14 MutSig quantile–quantile plot for endometrial cancer (UCEC).
Quantile-quantile plot of observed vs. expected P-values for MSI-H cases using only previously identified mutations (red) and using previously identified mutations and MS indels (green). Using MutSig for datasets with large numbers of MS indels leads to an inflation in the number of significantly mutated genes.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–14 (PDF 5712 kb)
Supplementary Tables 1–5
Supplementary Tables 1–5 (XLSX 6429 kb)
Supplementary Software 1
MSMuTect (ZIP 167867 kb)
Supplementary Software 2
MSMutSig (ZIP 1362 kb)
About this article
Cite this article
Maruvka, Y., Mouw, K., Karlic, R. et al. Analysis of somatic microsatellite indels identifies driver events in human tumors. Nat Biotechnol 35, 951–959 (2017). https://doi.org/10.1038/nbt.3966
DOI: https://doi.org/10.1038/nbt.3966