  • Brief Communication

forestSV: structural variant discovery through statistical learning

Abstract

Detecting genomic structural variants from high-throughput sequencing data is a complex and unresolved challenge. We have developed a statistical learning approach, based on Random Forests, that integrates prior knowledge about the characteristics of structural variants and leads to improved discovery in high-throughput sequencing data. The implementation of this technique, forestSV, offers high sensitivity and specificity coupled with the flexibility of a data-driven approach.
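
To make the idea of Random Forest classification of genomic windows concrete, the following is a minimal, self-contained sketch in Python. It is not the forestSV implementation. The two features (normalized read depth and fraction of discordant read pairs), the class centers, the simulated data and all function names are illustrative assumptions, chosen only to show how bootstrap sampling, random feature selection and majority voting combine to map window features to a call.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, None
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best_score is None or score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(rows, labels, rng, n_try, max_depth=4):
    """Grow one tree; a leaf is a class label, a node is (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    # Random Forests consider only a random subset of features at each split.
    features = rng.sample(range(len(rows[0])), n_try)
    split = best_split(rows, labels, features)
    if split is None:
        return majority
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    if not left or not right:
        return majority
    kids = [grow_tree([r for r, _ in part], [y for _, y in part], rng, n_try, max_depth - 1)
            for part in (left, right)]
    return (f, t, kids[0], kids[1])


def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def train_forest(rows, labels, rng, n_trees=25):
    """Train each tree on a bootstrap resample of the training windows."""
    n_try = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx], rng, n_try))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]


def simulate_windows(rng, n_per_class=40):
    """Hypothetical training windows: (normalized depth, discordant-pair fraction)."""
    centers = {"normal": (1.0, 0.05), "deletion": (0.5, 0.40), "duplication": (1.5, 0.20)}
    rows, labels = [], []
    for label, (depth, discordant) in centers.items():
        for _ in range(n_per_class):
            rows.append((rng.gauss(depth, 0.05), max(0.0, rng.gauss(discordant, 0.02))))
            labels.append(label)
    return rows, labels


rng = random.Random(42)
train_rows, train_labels = simulate_windows(rng)
forest = train_forest(train_rows, train_labels, rng)
print(predict_forest(forest, (0.5, 0.40)))  # a window resembling the simulated deletions
```

In this toy setting, a window with about half the expected depth and many discordant pairs should be voted a deletion by most trees. A real caller would derive its features from aligned reads and train on validated variant calls rather than simulated values.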

Figure 1: The forestSV framework.
Figure 2: Mapping of features to structural variant calls by Random Forests (RF).

Acknowledgements

This work was performed under US National Institutes of Health grants HG005725 and MH076431 and with support from the Beyster Family Foundation. We also thank the 1000 Genomes Project for access to data and J. Wang, H. Zheng, Y. Li, X. Jin and Y. Shi from BGI-Shenzhen for their roles in producing the unpublished autism sequencing data.

Author information

Contributions

J.J.M. conceived of and implemented forestSV. J.J.M. and J.S. wrote the manuscript.

Corresponding author

Correspondence to Jonathan Sebat.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–14, Supplementary Tables 1 and 2 and Supplementary Results (PDF 2737 kb)

Supplementary Data 1

Genomic regions used for training (TXT 13315 kb)

Supplementary Data 2

Structural variant calls produced by forestSV in NA12878, NA12891, NA12892, NA19240, NA19238 and NA19239 (TXT 18421 kb)

About this article

Cite this article

Michaelson, J., Sebat, J. forestSV: structural variant discovery through statistical learning. Nat Methods 9, 819–821 (2012). https://doi.org/10.1038/nmeth.2085

  • DOI: https://doi.org/10.1038/nmeth.2085
