Abstract
Transcriptome sequencing (RNA-seq) is widely used to detect gene rearrangements and quantitate gene expression in acute lymphoblastic leukemia (ALL), but its utility and accuracy in identifying copy number variations (CNVs) have not been well described. Because aneuploid subtypes are common in ALL, CNV information inferred from RNA-seq can be highly informative for disease classification and risk stratification. Here we describe RNAseqCNV, a method to detect large-scale CNVs from RNA-seq data. We used models based on normalized gene expression and minor allele frequency to classify arm-level CNVs with high accuracy in ALL (99.1% overall and 98.3% for non-diploid chromosome arms), and the models were further validated with excellent performance in acute myeloid leukemia (accuracy 99.8% overall and 99.4% for non-diploid chromosome arms). RNAseqCNV outperforms alternative RNA-seq-based algorithms in calling CNVs in the ALL dataset, especially in samples with a high proportion of CNVs. The CNV calls were highly concordant with DNA-based CNV results and more reliable than conventional cytogenetic-based karyotypes. RNAseqCNV provides a method to robustly identify copy number alterations in the absence of DNA-based analyses, further enhancing the utility of RNA-seq for classifying ALL subtypes.
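To make the idea of arm-level calling from two signals concrete, the following is a minimal Python sketch and not the RNAseqCNV implementation. It assumes each chromosome arm is summarised by per-gene log2 expression ratios against a diploid reference and by minor allele frequencies (MAF) of expressed SNVs. The thresholds, the function names (`arm_features`, `call_arm`, `accuracy`), and the simple rule-based classifier are all hypothetical stand-ins for the trained models described in the abstract. The `accuracy` helper shows how an overall figure and a figure restricted to non-diploid arms can be computed from the same set of calls.

```python
from statistics import median

# Hypothetical thresholds chosen only for illustration; NOT taken from RNAseqCNV.
GAIN_LOG2 = 0.25             # median log2 ratio at or above this suggests a gain
LOSS_LOG2 = -0.40            # median log2 ratio at or below this suggests a loss
BALANCED_MAF = (0.40, 0.50)  # MAF range treated as balanced heterozygosity


def arm_features(log2_ratios, mafs):
    """Return (median log2 expression ratio, fraction of SNVs with balanced MAF)."""
    lo, hi = BALANCED_MAF
    # With no SNVs on the arm, the balanced fraction defaults to 0.0,
    # so the call then rests on expression alone.
    balanced = sum(lo <= m <= hi for m in mafs) / len(mafs) if mafs else 0.0
    return median(log2_ratios), balanced


def call_arm(log2_ratios, mafs):
    """Toy rule: an expression shift plus mostly unbalanced MAF implies a CNV."""
    med, balanced = arm_features(log2_ratios, mafs)
    if med >= GAIN_LOG2 and balanced < 0.5:
        return "gain"
    if med <= LOSS_LOG2 and balanced < 0.5:
        return "loss"
    return "diploid"


def accuracy(calls, truth, non_diploid_only=False):
    """Fraction of correct calls, optionally restricted to truly non-diploid arms."""
    pairs = [(c, t) for c, t in zip(calls, truth)
             if not non_diploid_only or t != "diploid"]
    if not pairs:
        raise ValueError("no arms to evaluate")
    return sum(c == t for c, t in pairs) / len(pairs)


# Made-up example data for three arms: (log2 ratios, MAFs).
arms = {
    "1p": ([0.02, -0.05, 0.01], [0.48, 0.45, 0.50]),
    "21q": ([0.55, 0.60, 0.48], [0.33, 0.35, 0.31]),
    "9p": ([-0.9, -1.1, -0.8], [0.02, 0.00, 0.05]),
}
calls = {arm: call_arm(ratios, mafs) for arm, (ratios, mafs) in arms.items()}
```

Reporting accuracy separately for non-diploid arms matters because most arms in most samples are diploid, so an overall figure alone can hide weak performance on the arms that actually carry a CNV.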
Code availability
RNAseqCNV and its tutorial are freely available from https://github.com/honzee/RNAseqCNV. A Docker image of RNAseqCNV is available at https://hub.docker.com/repository/docker/honzik1/rnaseqcnv.
Acknowledgements
We thank the Biorepository, the Genome Sequencing Facility of the Hartwell Center for Bioinformatics and Biotechnology, and the Cytogenetics core facility of St. Jude Children’s Research Hospital (SJCRH). This work was supported by the American Lebanese Syrian Associated Charities of SJCRH, the American Society of Hematology Scholar Award (to Z.G.), the Leukemia & Lymphoma Society’s Career Development Program Special Fellow (to Z.G.), the NIH/NCI K99/R00 Award CA241297 (to Z.G.), NCI Outstanding Investigator Award R35 CA197695 (to C.G.M.), National Institute of General Medical Sciences grant P50 GM115279 (to C.G.M.), and NCI grant P30 CA021765 (St. Jude Cancer Center Support Grant).
Author information
Contributions
ZG and CGM conceived and designed the study. JB and ZG developed the algorithms and the R package. JB, ZH, LW, DW and ZG analyzed and interpreted the genomic data. DR and CM uploaded the genomic data to St. Jude Cloud. JB, ZG, and CGM wrote the manuscript.
Ethics declarations
Competing interests
CGM has received consulting fees from Illumina, speaking fees from Amgen, and research support from Pfizer, Loxo Oncology, and AbbVie.
About this article
Cite this article
Bařinka, J., Hu, Z., Wang, L. et al. RNAseqCNV: analysis of large-scale copy number variations from RNA-seq data. Leukemia 36, 1492–1498 (2022). https://doi.org/10.1038/s41375-022-01547-8