novoBreak: local assembly for breakpoint detection in cancer genomes

Chong, Zechen; Ruan, Jue; Gao, Min; Zhou, Wanding; Chen, Tenghui; Fan, Xian; Ding, Li; Lee, Anna Y; Boutros, Paul; Chen, Junjie; Chen, Ken

doi:10.1038/nmeth.4084

Brief Communication
Published: 28 November 2016

Zechen Chong¹,
Jue Ruan²,
Min Gao³,
Wanding Zhou¹,
Tenghui Chen¹,
Xian Fan¹,
Li Ding⁴ (ORCID: orcid.org/0000-0003-1517-2975),
Anna Y Lee⁵ (ORCID: orcid.org/0000-0001-8197-6272),
Paul Boutros⁵,⁶,⁷ (ORCID: orcid.org/0000-0003-0553-7520),
Junjie Chen³ &
Ken Chen¹

Nature Methods volume 14, pages 65–67 (2017)

Abstract

We present novoBreak, a genome-wide local assembly algorithm that discovers somatic and germline structural variation breakpoints in whole-genome sequencing data. novoBreak consistently outperformed existing algorithms on real cancer genome data and on synthetic tumors in the ICGC-TCGA DREAM 8.5 Somatic Mutation Calling Challenge primarily because it more effectively utilized reads spanning breakpoints. novoBreak also demonstrated great sensitivity in identifying short insertions and deletions.

**Figure 2: novoBreak performance on DREAM 8.5 Mutation Calling Challenge data.**

Accession codes

Primary accessions

Sequence Read Archive

SRP042948

Acknowledgements

We thank the ICGC-TCGA DREAM SMC Challenge organizers and participants for providing data and evaluations; and we thank A.K. Eterovic and G.B. Mills for assistance with the experiment and manuscript. This study was supported in part by the National Institutes of Health (grant numbers R01 CA172652 to K.C. and U41 HG007497 to C. Lee, Jackson Lab), the National Cancer Institute Cancer Center Support Grant (P30 CA016672 to R. Depinho, MD Anderson Cancer Center), Andrew Sabin Family Foundation to K.C., and a training fellowship from the Computational Cancer Biology Training Program of the Gulf Coast Consortia (CPRIT grant number RP140113) to Z.C. The results published here are in part based upon data generated by TCGA established by the NCI and NHGRI. Information about TCGA and the investigators and institutions who constitute the TCGA research network can be found at http://cancergenome.nih.gov/.

Author information

Zechen Chong, Jue Ruan and Min Gao: These authors contributed equally to this work.

Authors and Affiliations

Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
Zechen Chong, Wanding Zhou, Tenghui Chen, Xian Fan & Ken Chen
Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
Jue Ruan
Department of Experimental Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA
Min Gao & Junjie Chen
McDonnell Genome Institute, Washington University, St. Louis, Missouri, USA
Li Ding
Informatics and Biocomputing Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
Anna Y Lee & Paul Boutros
Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
Paul Boutros
Department of Pharmacology & Toxicology, University of Toronto, Toronto, Ontario, Canada
Paul Boutros

Contributions

Z.C., J.R., and K.C. conceived the algorithm. Z.C. developed the software. Z.C. and K.C. designed and analyzed the experiments. M.G. and J.C. designed and performed the validation experiments. W.Z. designed the scoring statistics. M.G., T.C., X.F., L.D., A.Y.L., and P.B. tested the algorithm and performed additional analyses. K.C. supervised the projects. Z.C. and K.C. wrote the manuscript with input from all authors. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Ken Chen.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Illustration of the fate of the breakpoint spanning read pairs in alignment-based methods versus that in our k-mer targeted assembly method.

The alignment-based approaches underutilize read pairs spanning a breakpoint (top), while our k-mer targeted assembly strategy can fully utilize them (bottom).

Supplementary Figure 2 Illustration of clustering based on novo k-mers and associated read pairs.

At each breakpoint, there are k-1 k-mers covering the breakpoint. If a read fully covers a breakpoint, it must contain several k-mers (< k-1 if there are sequencing errors) covering the breakpoint. On the other hand, there should be several read pairs sharing identical k-mers, given sufficient coverage. Based on this relationship, a union-find algorithm is applied to accomplish the clustering procedure.
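
The clustering idea in this caption can be sketched in Python as follows. This is only an illustration under stated assumptions, not the novoBreak implementation: the names `breakpoint_kmers`, `UnionFind`, and `cluster_read_pairs` and the toy sequences are hypothetical, and each read pair is assumed to have already been reduced to the set of candidate k-mers it contains.

```python
from collections import defaultdict


def breakpoint_kmers(left, right, k):
    """Return the k-mers of left+right that span the junction between them.

    Hypothetical helper: `left` and `right` are the sequences on either
    side of a breakpoint.
    """
    seq = left + right
    junction = len(left)  # index of the first base after the breakpoint
    # A k-mer starting at i spans the junction when i < junction < i + k,
    # which gives at most k - 1 start positions.
    start = max(0, junction - k + 1)
    stop = min(junction, len(seq) - k + 1)
    return [seq[i:i + k] for i in range(start, stop)]


class UnionFind:
    """Minimal union-find (disjoint-set) structure with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # compress the path to the root
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_read_pairs(pair_kmers):
    """Group read pairs that share at least one k-mer, directly or transitively.

    pair_kmers: dict mapping a read-pair id to the set of k-mers it contains
    (an assumed input format for this sketch).
    """
    uf = UnionFind()
    first_owner = {}  # k-mer -> first read pair seen carrying it
    for pair_id, kmers in pair_kmers.items():
        uf.find(pair_id)  # register pairs that share nothing
        for kmer in kmers:
            if kmer in first_owner:
                uf.union(first_owner[kmer], pair_id)
            else:
                first_owner[kmer] = pair_id
    clusters = defaultdict(set)
    for pair_id in pair_kmers:
        clusters[uf.find(pair_id)].add(pair_id)
    return sorted(sorted(members) for members in clusters.values())


# Toy example: with k = 5 there are k - 1 = 4 k-mers across the junction.
k = 5
bp = breakpoint_kmers("ACGTACGT", "TTGGCCAA", k)
pairs = {
    "pair1": set(bp[0:2]),   # shares bp[1] with pair2
    "pair2": set(bp[1:3]),
    "pair3": {"GGGGG"},      # unrelated k-mer, stays in its own cluster
}
print(len(bp))
print(cluster_read_pairs(pairs))
```

Indexing each k-mer by the first read pair that carries it keeps the number of union operations linear in the total number of k-mer occurrences, rather than comparing every read pair against every other.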

About this article

Cite this article

Chong, Z., Ruan, J., Gao, M. et al. novoBreak: local assembly for breakpoint detection in cancer genomes. Nat Methods 14, 65–67 (2017). https://doi.org/10.1038/nmeth.4084

Received: 30 October 2015
Accepted: 01 November 2016
Published: 28 November 2016
Issue Date: January 2017
DOI: https://doi.org/10.1038/nmeth.4084
