We present novoBreak, a genome-wide local assembly algorithm that discovers somatic and germline structural variation breakpoints in whole-genome sequencing data. novoBreak consistently outperformed existing algorithms on real cancer genome data and on synthetic tumors in the ICGC-TCGA DREAM 8.5 Somatic Mutation Calling Challenge, primarily because it made more effective use of reads spanning breakpoints. novoBreak also showed high sensitivity in identifying short insertions and deletions.
We thank the ICGC-TCGA DREAM SMC Challenge organizers and participants for providing data and evaluations, and A.K. Eterovic and G.B. Mills for assistance with the experiment and manuscript. This study was supported in part by the National Institutes of Health (grant numbers R01 CA172652 to K.C. and U41 HG007497 to C. Lee, Jackson Lab), the National Cancer Institute Cancer Center Support Grant (P30 CA016672 to R. Depinho, MD Anderson Cancer Center), the Andrew Sabin Family Foundation (to K.C.), and a training fellowship from the Computational Cancer Biology Training Program of the Gulf Coast Consortia (CPRIT grant number RP140113) to Z.C. The results published here are based in part upon data generated by TCGA, established by the NCI and NHGRI. Information about TCGA and the investigators and institutions that constitute the TCGA research network can be found at http://cancergenome.nih.gov/.
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Illustration of the fate of the breakpoint spanning read pairs in alignment-based methods versus that in our k-mer targeted assembly method.
The alignment-based approaches underutilize read pairs spanning a breakpoint (top), while our k-mer targeted assembly strategy can fully utilize them (bottom).
At each breakpoint, there are k-1 k-mers covering the breakpoint. If a read fully covers a breakpoint, it must contain several k-mers (< k-1 if there are sequencing errors) covering the breakpoint. On the other hand, there should be several read pairs sharing identical k-mers, given sufficient coverage. Based on this relationship, a union-find algorithm is applied to accomplish the clustering procedure.
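The relationship described above can be illustrated with a minimal Python sketch. It is not the novoBreak implementation: the function names (`spanning_kmers`, `cluster_read_pairs`), the representation of a read pair as a tuple of two strings, and the assumption that a set of breakpoint k-mers has already been identified are all hypothetical choices made for illustration. The sketch first shows why k-1 k-mers cover a breakpoint, then groups read pairs that share at least one such k-mer using union-find.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def spanning_kmers(left, right, k):
    """Return the k-mers that contain bases from both sides of a junction.

    Assumes (for illustration) that both flanks are at least k-1 bases long.
    The window of k-1 bases on each side has length 2k-2, so it holds k-1 k-mers.
    """
    window = left[-(k - 1):] + right[:k - 1]
    return list(kmers(window, k))


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_read_pairs(read_pairs, breakpoint_kmers, k):
    """Group read pairs (by index) that share at least one breakpoint k-mer.

    read_pairs: list of (mate1, mate2) sequence tuples (hypothetical format).
    breakpoint_kmers: set of k-mers assumed to cover candidate breakpoints.
    Read pairs carrying no breakpoint k-mer are left out of the result.
    """
    uf = UnionFind(len(read_pairs))
    first_owner = {}   # k-mer -> index of the first read pair seen carrying it
    carriers = set()   # read pairs with at least one breakpoint k-mer
    for idx, pair in enumerate(read_pairs):
        for mate in pair:
            for km in kmers(mate, k):
                if km not in breakpoint_kmers:
                    continue
                carriers.add(idx)
                if km in first_owner:
                    uf.union(first_owner[km], idx)
                else:
                    first_owner[km] = idx
    clusters = defaultdict(list)
    for idx in sorted(carriers):
        clusters[uf.find(idx)].append(idx)
    return sorted(clusters.values())
```

For k = 4, `spanning_kmers("ACGTACGT", "TTGGCCAA", 4)` returns three k-mers (`CGTT`, `GTTT`, `TTTG`), matching the k-1 count. Because each breakpoint k-mer is unioned with the first read pair that carried it, clustering takes close to linear time in the total number of k-mers scanned. A read with sequencing errors still joins its cluster as long as one of its breakpoint k-mers is intact.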
Supplementary Figures 1 and 2, Supplementary Tables 1, 2 and 6, and Supplementary Notes 1–4. (PDF 674 kb)
Previously experimentally validated somatic SV breakpoints in COLO-829 cell line. (XLSX 67 kb)
novoBreak predicted breakpoints in the COLO-829 whole genome sequencing data. (XLSX 47 kb)
Experimentally validated novel breakpoints found by novoBreak in COLO-829. (XLSX 41 kb)
All filtered novoBreak calls of the 22 TCGA samples. (XLSX 1169 kb)
Sensitivity of novoBreak in detecting gene fusions from the whole-genome sequencing data of 22 breast cancer patients from TCGA. (XLSX 48 kb)
novoBreak source code and binary distribution. (ZIP 2005 kb)
About this article
Cite this article
Chong, Z., Ruan, J., Gao, M. et al. novoBreak: local assembly for breakpoint detection in cancer genomes. Nat Methods 14, 65–67 (2017). https://doi.org/10.1038/nmeth.4084