  • Brief Communication

Accurate de novo and transmitted indel detection in exome-capture data using microassembly

Abstract

We present an open-source algorithm, Scalpel (http://scalpel.sourceforge.net/), which combines mapping and assembly for sensitive and specific discovery of insertions and deletions (indels) in exome-capture data. A detailed repeat analysis coupled with a self-tuning k-mer strategy allows Scalpel to outperform other state-of-the-art approaches for indel discovery, particularly in regions containing near-perfect repeats. We analyzed 593 families from the Simons Simplex Collection and demonstrated Scalpel's power to detect long (≥30 bp) transmitted events and enrichment for de novo likely gene-disrupting indels in autistic children.
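The self-tuning k-mer idea mentioned in the abstract can be illustrated with a small sketch. This is not Scalpel's implementation: the abstract gives no parameters, so the function names, the k range, the step size and the restriction to exact repeats (rather than near-perfect repeats) are all assumptions made for illustration. The sketch raises k until the sequence window contains no repeated k-mer, which is one simple way to keep a de Bruijn graph built from that window free of repeat-induced cycles.

```python
from typing import Optional


def has_repeated_kmer(seq: str, k: int) -> bool:
    """Return True if any k-mer occurs more than once in seq (exact matches only)."""
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            return True
        seen.add(kmer)
    return False


def choose_k(window: str, k_min: int = 3, k_max: int = 15, step: int = 2) -> Optional[int]:
    """Hypothetical helper: pick the smallest k in [k_min, k_max] with no repeated k-mer.

    The defaults are illustrative assumptions, not values taken from the paper.
    Returns None when every tested k still yields a repeat.
    """
    for k in range(k_min, min(k_max, len(window)) + 1, step):
        if not has_repeated_kmer(window, k):
            return k
    return None


if __name__ == "__main__":
    # "ACG" occurs twice at k=3, so the sketch moves on to k=5.
    print(choose_k("ACGTACGTTT"))
    # A homopolymer run repeats at every tested k, so no k is accepted.
    print(choose_k("AAAAAAAA", k_max=7))
```

A window for which no k is accepted, such as the homopolymer above, would need separate handling; the sketch only reports that case by returning None.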

Figure 1: Overview of the Scalpel algorithm workflow.
Figure 2: Concordance of indels between pipelines.
Figure 3: MiSeq validation.

Accession codes

Primary accessions

Sequence Read Archive

Acknowledgements

The project was supported in part by the US National Institutes of Health (R01-HG006677) and US National Science Foundation (DBI-1350041) to M.C.S. and by the Cold Spring Harbor Laboratory (CSHL) Cancer Center Support Grant (5P30CA045508), the Stanley Institute for Cognitive Genomics and the Simons Foundation (SF51 and SF235988) to M.W. The DNA samples used in this work are included within SSC release 13. Approved researchers can obtain the SSC population data set described in this study by applying at https://base.sfari.org/. We thank S. Eskipehlivan for technical assistance with the MiSeq validation experiments. We thank M. Bekritsky, S. Neuburger, M. Ronemus, D. Levy, B. Yamron and B. Mishra for helpful discussions and comments on the paper. We thank R. Aboukhalil for testing the software.

Author information

Contributions

G.N. developed the software and conducted the computational experiments. G.N. and M.C.S. designed and analyzed the experiments. Y.W. assisted in designing the primers and performed the MiSeq validation experiments. J.A.O. designed the primers and analyzed the MiSeq data. H.F. and J.A.O. assisted with the computational experiments for the comparative analysis between different variant-detection pipelines. G.J.L. planned and supervised the experimental design for indel validation. Z.W. designed the primers and performed experiments for the validation of de novo and transmitted indels in the SSC. I.I., Y.-h.L. and M.W. assisted with the analysis of the SSC. G.N. and M.C.S. wrote the manuscript with input from all authors. All of the authors have read and approved the final manuscript.

Corresponding author

Correspondence to Giuseppe Narzisi.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–20, Supplementary Tables 1–11 and Supplementary Notes 1–6 (PDF 3587 kb)

Supplementary Data 1

Scripts and data used to generate Figure 3b (ZIP 225 kb)

Supplementary Data 2

List of 97 high-quality de novo indels in 593 SSC families (XLS 47 kb)

Cite this article

Narzisi, G., O'Rawe, J., Iossifov, I. et al. Accurate de novo and transmitted indel detection in exome-capture data using microassembly. Nat Methods 11, 1033–1036 (2014). https://doi.org/10.1038/nmeth.3069

  • DOI: https://doi.org/10.1038/nmeth.3069
