ACUTE LYMPHOBLASTIC LEUKEMIA

RNAseqCNV: analysis of large-scale copy number variations from RNA-seq data

Abstract

Transcriptome sequencing (RNA-seq) is widely used to detect gene rearrangements and quantitate gene expression in acute lymphoblastic leukemia (ALL), but its utility and accuracy in identifying copy number variations (CNVs) have not been well described. CNV information inferred from RNA-seq can be highly informative for disease classification and risk stratification in ALL because of the high incidence of aneuploid subtypes within this disease. Here we describe RNAseqCNV, a method to detect large-scale CNVs from RNA-seq data. We used models based on normalized gene expression and minor allele frequency to classify arm-level CNVs with high accuracy in ALL (99.1% overall and 98.3% for non-diploid chromosome arms), and the models were further validated with excellent performance in acute myeloid leukemia (accuracy 99.8% overall and 99.4% for non-diploid chromosome arms). RNAseqCNV outperforms alternative RNA-seq-based algorithms in calling CNVs in the ALL dataset, especially in samples with a high proportion of CNVs. The CNV calls were highly concordant with DNA-based CNV results and more reliable than conventional cytogenetic-based karyotypes. RNAseqCNV provides a method to robustly identify copy number alterations in the absence of DNA-based analyses, further enhancing the utility of RNA-seq to classify ALL subtypes.
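The two signals named in the abstract, normalized gene expression and minor allele frequency (MAF), can be illustrated with a toy Python sketch. This is not the RNAseqCNV implementation: the function names, the pseudocount, and the fixed thresholds (`ratio_cut`, `maf_cut`) are hypothetical stand-ins for the trained models the abstract refers to. The sketch only shows how per-arm features might be summarised and how overall and non-diploid accuracy can be computed from arm-level calls.

```python
from math import log2
from statistics import median


def arm_features(sample_expr, reference_expr, alt_fractions):
    """Summarise one chromosome arm (illustrative only).

    sample_expr / reference_expr: normalized expression per gene, same order.
    alt_fractions: alternate-allele fraction of each SNV on the arm.
    """
    # Log2 ratio against a diploid reference; the pseudocount avoids log(0).
    log_ratios = [log2((s + 1.0) / (r + 1.0))
                  for s, r in zip(sample_expr, reference_expr)]
    # Fold each allele fraction into [0, 0.5] to obtain the minor allele frequency.
    mafs = [min(f, 1.0 - f) for f in alt_fractions]
    return median(log_ratios), median(mafs)


def classify_arm(median_log_ratio, median_maf, ratio_cut=0.2, maf_cut=0.42):
    """Rule-based call with hypothetical thresholds (an assumption, not a fitted model)."""
    balanced = median_maf >= maf_cut  # MAF near 0.5 suggests two balanced copies
    if median_log_ratio >= ratio_cut and not balanced:
        return "gain"
    if median_log_ratio <= -ratio_cut and not balanced:
        return "loss"
    return "diploid"


def accuracy(calls, truth, non_diploid_only=False):
    """Fraction of arms called correctly, optionally restricted to non-diploid arms."""
    pairs = [(c, t) for c, t in zip(calls, truth)
             if not non_diploid_only or t != "diploid"]
    if not pairs:
        raise ValueError("no arms to evaluate")
    return sum(c == t for c, t in pairs) / len(pairs)
```

Reporting accuracy separately for non-diploid arms matters because most arms in a typical sample are diploid, so an overall figure alone can hide poor performance on the arms that actually carry a CNV.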

Fig. 1: Workflow of RNAseqCNV.
Fig. 2: Chromosomal level CNVs detected by gene expression and MAF of SNVs.
Fig. 3: Diploid adjustment in a sample with a high proportion of CNVs.

Code availability

RNAseqCNV and the tutorial are freely available from https://github.com/honzee/RNAseqCNV. A Docker image of RNAseqCNV is available at https://hub.docker.com/repository/docker/honzik1/rnaseqcnv

Acknowledgements

We thank the Biorepository, the Genome Sequencing Facility of the Hartwell Center for Bioinformatics and Biotechnology, and the Cytogenetics core facility of SJCRH. This work was supported by the American Lebanese Syrian Associated Charities of SJCRH, the American Society of Hematology Scholar Award (to Z.G.), the Leukemia & Lymphoma Society’s Career Development Program Special Fellow (to Z.G.), the NIH/NCI K99/R00 Award CA241297 (to Z.G.), NCI Outstanding Investigator Award R35 CA197695 (to C.G.M.), National Institute of General Medical Sciences grant P50 GM115279 (to C.G.M.), and NCI grant P30 CA021765 (St. Jude Cancer Center Support Grant).

Author information

Contributions

ZG and CGM conceived and designed the study. JB and ZG developed the algorithms and the R package. JB, ZH, LW, DW and ZG analyzed and interpreted the genomic data. DR and CM uploaded the genomic data to St. Jude Cloud. JB, ZG, and CGM wrote the manuscript.

Corresponding authors

Correspondence to Zhaohui Gu or Charles G. Mullighan.

Ethics declarations

Competing interests

CGM has received consulting fees from Illumina, speaking fees from Amgen, and research support from Pfizer, Loxo Oncology and AbbVie.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bařinka, J., Hu, Z., Wang, L. et al. RNAseqCNV: analysis of large-scale copy number variations from RNA-seq data. Leukemia 36, 1492–1498 (2022). https://doi.org/10.1038/s41375-022-01547-8

  • DOI: https://doi.org/10.1038/s41375-022-01547-8
