  • Brief Communication

CONSERTING: integrating copy-number analysis with structural-variation detection

Abstract

We developed Copy Number Segmentation by Regression Tree in Next Generation Sequencing (CONSERTING), an algorithm for detecting somatic copy-number alteration (CNA) using whole-genome sequencing (WGS) data. CONSERTING performs iterative analysis of segmentation on the basis of changes in read depth and the detection of localized structural variations, with high accuracy and sensitivity. Analysis of 43 cancer genomes from both pediatric and adult patients revealed novel oncogenic CNAs, complex rearrangements and subclonal CNAs missed by alternative approaches.
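As a rough illustration of segmentation driven by changes in read depth, the sketch below applies regression-tree-style recursive binary splitting to a one-dimensional signal such as a per-window tumor-minus-normal depth difference. It is not the CONSERTING implementation: the function names, the `min_size` parameter and the fixed `min_gain` cutoff are assumptions made for this example, and the integration of structural-variation breakpoints described above is omitted.

```python
from typing import List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared deviations of `values` from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(values: List[float], min_size: int) -> Tuple[float, Optional[int]]:
    """Return (gain, index) for the split that most reduces the SSE.

    The index is None when no split improves on the unsplit segment.
    """
    total = sse(values)
    best_gain, best_idx = 0.0, None
    for i in range(min_size, len(values) - min_size + 1):
        gain = total - sse(values[:i]) - sse(values[i:])
        if gain > best_gain:
            best_gain, best_idx = gain, i
    return best_gain, best_idx


def segment(values: List[float], start: int = 0, min_size: int = 2,
            min_gain: float = 0.5) -> List[Tuple[int, int, float]]:
    """Recursively split `values` into (start, end, mean) segments.

    `min_gain` is an arbitrary stopping threshold chosen for this sketch;
    a real tool would use a principled model-selection criterion instead.
    """
    gain, idx = best_split(values, min_size)
    if idx is None or gain < min_gain:
        return [(start, start + len(values), sum(values) / len(values))]
    return (segment(values[:idx], start, min_size, min_gain)
            + segment(values[idx:], start + idx, min_size, min_gain))


# Toy signal: a five-window gain flanked by copy-neutral windows.
signal = [0.0] * 10 + [1.0] * 5 + [0.0] * 10
for seg_start, seg_end, seg_mean in segment(signal):
    print(seg_start, seg_end, seg_mean)
```

Segments are half-open window ranges. Each call rescans the whole segment for every candidate split, which keeps the sketch short but is too slow for genome-scale input.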

Figure 1: Strategy for CNA detection used by CONSERTING.
Figure 2: Comparison of WGS CNA detection by CONSERTING and four other methods.
Figure 3: A complex rearrangement in a sample of pediatric low-grade glioma identified by CONSERTING.

References

  1. Mullighan, C.G. et al. N. Engl. J. Med. 360, 470–480 (2009).

  2. Ley, T.J. et al. Nature 456, 66–72 (2008).

  3. Chiang, D.Y. et al. Nat. Methods 6, 99–103 (2009).

  4. Xie, C. & Tammi, M.T. BMC Bioinformatics 10, 80 (2009).

  5. Boeva, V. et al. Bioinformatics 27, 268–269 (2011).

  6. Abyzov, A., Urban, A.E., Snyder, M. & Gerstein, M. Genome Res. 21, 974–984 (2011).

  7. Xi, R. et al. Proc. Natl. Acad. Sci. USA 108, E1128–E1136 (2011).

  8. Downing, J.R. et al. Nat. Genet. 44, 619–622 (2012).

  9. Zhang, J. et al. Nature 481, 157–163 (2012).

  10. Roberts, K.G. et al. Cancer Cell 22, 153–166 (2012).

  11. Zhang, J. et al. Nature 481, 329–334 (2012).

  12. Zhang, J. et al. Nat. Genet. 45, 602–612 (2013).

  13. Brennan, C.W. et al. Cell 155, 462–477 (2013).

  14. Pleasance, E.D. et al. Nature 463, 191–196 (2010).

  15. Wang, J. et al. Nat. Methods 8, 652–654 (2011).

  16. Stephens, P.J. et al. Cell 144, 27–40 (2011).

  17. Sanborn, J.Z. et al. Cancer Res. 73, 6036–6045 (2013).

  18. Handsaker, R.E., Korn, J.M., Nemesh, J. & McCarroll, S.A. Nat. Genet. 43, 269–276 (2011).

  19. Parker, M. et al. Nature 506, 451–455 (2014).

  20. Wu, G. et al. Nat. Genet. 46, 444–450 (2014).

  21. Li, H. et al. Bioinformatics 25, 2078–2079 (2009).

  22. Edmonson, M.N. et al. Bioinformatics 27, 865–866 (2011).

  23. Breiman, L., Friedman, J.M., Olshen, R. & Stone, C. Classification and Regression Trees edn. 1 (Chapman and Hall/CRC, 1984).

  24. Schwarz, G. Ann. Stat. 6, 461–464 (1978).

  25. Kent, W.J. et al. Genome Res. 12, 996–1006 (2002).

  26. Rozen, S. & Skaletsky, H. Methods Mol. Biol. 132, 365–386 (2000).

Acknowledgements

This study was supported by the St. Jude Children's Research Hospital–Washington University Pediatric Cancer Genome Project, Cancer Center support grant P30 CA021765 from the US National Cancer Institute and the American Lebanese Syrian Associated Charities of St. Jude Children's Research Hospital. C.G.M. is supported as a Pew Scholar in the Biomedical Sciences and is a St. Baldrick's Scholar.

Author information

Contributions

X.C. and J.Z. conceived and designed the CONSERTING algorithm. X.C., P.G. and J.W. implemented the algorithm. J.Z., S.J.B., M.A.D., J.R.D., D.W.E. and C.G.M. designed the experiment. X.C., J.W., J.D.D., M.P., J.M., M.R., G.W., A.P., S.E., S. Pounds and J.Z. analyzed the data. K.R., J.D.D., S. Patel, L.H., D.P. and J.E. performed validation and functional assays. J.N. and S.S. generated COLO-829 whole-genome sequencing data. X.C. and J.Z. wrote the manuscript.

Corresponding author

Correspondence to Jinghui Zhang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Overview of the CONSERTING process.

A parallelogram shows input or output files, and a rectangle defines an analytical process. A diamond defines the condition for a follow-up process.

Supplementary Figure 2 CIRCOS plots for CNAs derived from SNP array, CONSERTING, CNV-seq, SegSeq, FREEC and BIC-seq for the 20 paired tumor-normal whole-genome sequencing data sets presented in this paper (SegSeq was not run on SJTALL015 and SJLGG039, as these two samples were mapped to hg19, which is not compatible with SegSeq).

All CIRCOS plots are shown in the same order. (a) Legend for the CIRCOS plot. (b) CIRCOS plots for the 12 ETP-ALL cases. (c) CIRCOS plots for the four RB cases. (d) CIRCOS plots for two B-ALL samples from COG. (e) A CIRCOS plot for the non-ETP T-ALL sample SJTALL015. (f) A CIRCOS plot for the low-grade glioma sample SJLGG039 (arrows show amplifications identified by both CONSERTING and SNP array).

Supplementary Figure 3 Somatic CNAs computed by CONSERTING and BIC-seq for 22 paired whole-genome sequencing data sets from The Cancer Genome Atlas (TCGA) GBM project.

Each sample is displayed in a colored box with the BIC-seq results (denoted by B) shown at the top, the SNP array result (denoted by S, and downloaded from TCGA) in the middle and the CONSERTING results (denoted by C) at the bottom. TCGA sample I.D.s are at the left. Samples marked with an asterisk had lower-than-median F1 scores for both CONSERTING and BIC-seq. Diagnosis and relapse sample pairs are shown in the same color with different intensities. Pairs with underlined sample I.D.s had highly divergent CNA profiles from diagnosis to relapse.
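For readers unfamiliar with the F1 score mentioned in this legend, the sketch below computes it as the harmonic mean of precision and recall. How calls were matched to the SNP array result is not stated here, so the base-pair overlap used in this example is an assumption, as are the function names and the half-open interval convention.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def covered(intervals: List[Interval]) -> int:
    """Number of bases covered by `intervals`, counting overlaps once."""
    total, cur_end = 0, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            total += end - start
            cur_end = end
        elif end > cur_end:
            total += end - cur_end
            cur_end = end
    return total


def overlap(a: List[Interval], b: List[Interval]) -> int:
    """Bases shared by two interval sets, via |A| + |B| - |A union B|."""
    return covered(a) + covered(b) - covered(list(a) + list(b))


def f1_score(calls: List[Interval], truth: List[Interval]) -> float:
    """F1 of `calls` against `truth`, both measured in base pairs."""
    true_positive = overlap(calls, truth)
    if true_positive == 0:
        return 0.0
    precision = true_positive / covered(calls)
    recall = true_positive / covered(truth)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: one call that half-overlaps one reference segment.
print(f1_score([(100, 200)], [(150, 250)]))
```

A per-segment matching rule, for example requiring reciprocal overlap above some fraction, would be an equally reasonable definition and can give different scores.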

Supplementary Figure 4 ROC curves for CONSERTING and BIC-seq in 11 ETP-ALL samples.

SJTALL008 was excluded from this analysis because there were no CNV calls in the curated SNP array result.

Supplementary Figure 5 CNA calls at chr1:164–244 Mb of retinoblastoma tumor SJRB003 by SNP array, CONSERTING and four other methods.

(a) Global view of the CNA state in the 93-Mb region on 1q. The thin horizontal lines define the copy-neutral state (i.e., no copy-number variation). Blue blocks above the “neutral” line are the copy-number gains (amplifications) identified by each method, with the height of the block corresponding to the amplitude of the copy-number gain. (b) Detailed view of a 35-kb region showing that BIC-seq missed two breakpoints separating two CNV segments. Both breakpoints were confirmed experimentally by Sanger sequencing. The two CNV boundaries at chr1:231118034 and chr1:231123028 missed by BIC-seq are part of the complex rearrangement depicted in Supplementary Fig. 5c. Both were involved in interchromosomal translocations 60–70 Mb upstream, and the breakpoints were experimentally validated. (c) A complex rearrangement identified by CONSERTING at chr1:164–244 Mb. The top panel is an SV graph that connects the 12 SVs identified in this region. The black lines mark the breakpoints of the seven SVs detected only by CONSERTING, and the gray lines mark the breakpoints of the five SVs detected by both CONSERTING and CREST. The purple bar marks the boundaries of the CNA segment, with the amplitudes of CNAs marked at the bottom. The three colored dots mark the location of FISH probes used to validate the copy number and SVs. (d) PCR amplicon validation of CONSERTING-predicted SVs. Lane 68 (predicted amplicon size: 500 bp): chr1:231123028(-)|chr1:164744054(+); lane 69 (predicted amplicon size: 313 bp): chr1:166476222(+)|chr1:174088270(+); lane 70 (predicted amplicon size: 343 bp): chr1:173901001(+)|chr1:233328796(+); lane 71 (predicted amplicon size: 488 bp): chr1:230868636(-)|chr1:173478509(+); lane 72 (predicted amplicon size: 332 bp): chr1:241160552(+)|chr1:177362495(-); lane 73 (predicted amplicon size: 350 bp): chr1:177464060(+)|chr1:224896416(+); lane 74 (predicted amplicon size: 371 bp): chr1:236474589(+)|chr9:136321446(-). (e) FISH validation of the WGS-predicted gain (scale bar, 10 µm; also refer to Supplementary Fig. 14 of Zhang, J. et al. Nature 481, 329–334 (2012)).
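The SV junctions in panel (d) are written as two breakpoints joined by a vertical bar, each in the form chromosome:position(strand). The sketch below parses that notation and flags interchromosomal junctions. The `Breakpoint` type and the function names are hypothetical helpers for this example and are not part of the CONSERTING package.

```python
import re
from typing import NamedTuple, Tuple


class Breakpoint(NamedTuple):
    chrom: str
    pos: int
    strand: str


# Matches one breakpoint such as "chr1:231123028(-)".
_BREAKPOINT = re.compile(r"^(chr[0-9A-Za-z]+):(\d+)\(([+-])\)$")


def parse_breakpoint(text: str) -> Breakpoint:
    """Parse a single 'chrom:pos(strand)' string."""
    match = _BREAKPOINT.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized breakpoint: {text!r}")
    chrom, pos, strand = match.groups()
    return Breakpoint(chrom, int(pos), strand)


def parse_sv(text: str) -> Tuple[Breakpoint, Breakpoint]:
    """Parse a junction written as 'breakpoint|breakpoint'."""
    parts = text.split("|")
    if len(parts) != 2:
        raise ValueError(f"expected two breakpoints: {text!r}")
    return parse_breakpoint(parts[0]), parse_breakpoint(parts[1])


def is_interchromosomal(sv: Tuple[Breakpoint, Breakpoint]) -> bool:
    """True when the two breakpoints lie on different chromosomes."""
    return sv[0].chrom != sv[1].chrom


# Lane 74 from the legend joins chr1 to chr9.
lane_74 = parse_sv("chr1:236474589(+)|chr9:136321446(-)")
print(lane_74, is_interchromosomal(lane_74))
```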

Supplementary Figure 6 Double-minute chromosomes identified in the TCGA-GBM data set.

The inner green and magenta lines connect intra- and interchromosomal SV breakpoints, respectively. The red and blue arcs represent amplification and deletion identified by CONSERTING, respectively. The thickness of the arc is proportional to the level of amplification or deletion. Samples marked with an asterisk (06-0152-01A, 06-0210-01A, 06-0211-01A, 06-0211-02A, 06-0648-01A, 14-1402-01A, 14-1402-02A, 19-5960-01A and 27-1831-01A) had a chromothripsis-like CNA-SV profile.

Supplementary Figure 7 A novel intragenic NOTCH1 deletion resulting in expression of ICN in T-ALL.

(a) The normalized read depth for part of NOTCH1 for normal germline (blue), tumor (red) and tumor-normal (gray), showing a deletion spanning exons 14–27 in the tumor sample. (b) A deletion in this region detected by CONSERTING but not by SNP array with two ends mapped within exons 14 and 27. (c) Sanger-sequencing chromatograms confirming the in-frame deletion in both genomic DNA and cDNA. (d) Western blot analysis of SJTALL015 with the intragenic deletion (∆) targeting the NOTCH1 HD domain, and the T cell lines MOLT3, HPBALL, DND41, PF382, TALL-1 and LOUCY and the murine fibroblast cell line GPE-86, either wild type (-) or harboring activating mutations (+). (e) NOTCH1 domain view showing that the intragenic deletion removes several EGF domains, an LNR domain and the heterodimerization (HD) domain.

Supplementary Figure 8 Fractured genome in WGS data from SJRB002 and data from The Cancer Genome Atlas (TCGA) project.

(a) CNA plots of chromosomes 5 and 6 using the initial WGS data. Copy-number gain with log2 ratio > 0.17 is marked in red, copy-number loss with log2 ratio < –0.17 is marked in blue, and the remaining segments are marked in black. (b) CNA plot of the same sample based on the second WGS data set with no ‘fracture’. The 6p amplification is the only CNA that is replicated in the two WGS data sets (not detected by SNP array but validated by SKY mapping). (c) Genome-wide CNAs using the original WGS by all methods. (d) Genome-wide CNAs using the second 10× WGS by all methods. (e) Number of predicted somatic CNAs for TCGA-GBM samples with and without a fractured genome by BIC-seq, CONSERTING and SNP array. (f) Three paired WGS data sets showing a fractured genome. Each sample is displayed in a colored box with the BIC-seq results shown at the top, the SNP array result (downloaded from TCGA) in the middle and the CONSERTING results at the bottom. TCGA sample I.D.s are at the left. Gain is shown in red, and loss in blue.
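The coloring rule in panel (a) amounts to a three-way threshold on the log2 ratio. The sketch below applies the ±0.17 cutoff quoted in the legend. Treating the ratio as tumor depth over matched-normal depth is an assumption of this example, and the function names are illustrative only.

```python
import math

LOG2_THRESHOLD = 0.17  # cutoff quoted in the legend for panel (a)


def log2_ratio(tumor_depth: float, normal_depth: float) -> float:
    """Log2 of tumor over normal depth (assumed definition of the ratio)."""
    if tumor_depth <= 0 or normal_depth <= 0:
        raise ValueError("depths must be positive")
    return math.log2(tumor_depth / normal_depth)


def classify_segment(ratio: float, threshold: float = LOG2_THRESHOLD) -> str:
    """Return 'gain', 'loss' or 'neutral' for a segment's log2 ratio."""
    if ratio > threshold:
        return "gain"     # drawn in red in the figure
    if ratio < -threshold:
        return "loss"     # drawn in blue
    return "neutral"      # drawn in black


# Hypothetical depths: 30x tumor versus 20x normal is a gain.
print(classify_segment(log2_ratio(30.0, 20.0)))
```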

Supplementary Figure 9 Distribution of quality score of a typical PCGP sample (SJTALL013) and that of a typical TCGA sample (TCGA-06-0145-01A).

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9 and Supplementary Data 1–4 (PDF 1523 kb)

Supplementary Table 1

Sample characteristics of 43 tumor pairs. (XLS 36 kb)

Supplementary Table 2

Manually curated autosomal CNAs by SNP array analysis in 12 ETP-ALL tumors. (XLS 41 kb)

Supplementary Table 3

Comparison of CNAs from whole-genome sequencing by CONSERTING, BIC-seq, SegSeq, CNV-seq and FREEC with manually curated CNVs from SNP array analysis in the 12 ETP-ALL samples. (XLS 49 kb)

Supplementary Table 4

Comparison of CNAs from whole-genome sequencing by CONSERTING and BIC-seq with CNAs from SNP array analysis in the 22 TCGA-GBM samples. (XLS 53 kb)

Supplementary Table 5

CNA profile and SV matching status of diluted COLO-829 sample analyzed by CONSERTING. (XLS 49 kb)

Supplementary Table 6

Experimental validation of novel CNAs in coding exons identified by CONSERTING and SegSeq in the two COG samples and SJTALL015. (XLS 36 kb)

Supplementary Software

CONSERTING package including instructions (README.pdf) and source code. Please see http://www.stjuderesearch.org/site/lab/zhang for updates, test data and instructions for running CONSERTING on the Amazon Web Services cloud. (ZIP 1593 kb)

About this article

Cite this article

Chen, X., Gupta, P., Wang, J. et al. CONSERTING: integrating copy-number analysis with structural-variation detection. Nat Methods 12, 527–530 (2015). https://doi.org/10.1038/nmeth.3394

  • DOI: https://doi.org/10.1038/nmeth.3394
