Clonal tracking methods provide quantitative insights into the cellular output of genetically labeled progenitor cells across time and cellular compartments. In the context of gene and cell therapies, these methods have made it possible to follow progenitor cell output both in humans receiving therapies and in corresponding animal models, providing valuable insight into lineage reconstitution, clonal dynamics and vector genotoxicity. However, the absence of a toolbox for analysis of clonal tracking data has precluded the development of standardized analytical frameworks within the field. Thus, we developed barcodetrackR, an R package and accompanying Shiny app containing diverse tools for the analysis and visualization of clonal tracking data. We demonstrate the utility of barcodetrackR in exploring longitudinal clonal patterns and lineage relationships in a number of clonal tracking studies of hematopoietic stem and progenitor cells (HSPCs) in humans receiving HSPC gene therapy and in animals receiving lentivirally transduced HSPC transplants or tumor cells.
Data availability
All clonal tracking datasets analyzed in this study are publicly available, with access instructions outlined in Supplementary Table 1. The Six dataset (ref. 7) was downloaded from https://github.com/BushmanLab/HSC_diversity/tree/master/data. The Belderbos dataset (ref. 31) was downloaded from its manuscript’s supplementary material. The Elder dataset (ref. 32) was downloaded from GEO accession GSE149170. The Espinoza dataset (ref. 30) was downloaded from GEO accession GSE153130. The Wu dataset (ref. 11) was downloaded from its manuscript’s supplementary material. The Koelle dataset (ref. 9) was downloaded from https://github.com/dunbarlabNIH/R-code-and-tabular-data. The Clarke dataset (ref. 33) was downloaded from ref. 51. Data were pre-processed in R to create tabular data files suitable for use with barcodetrackR. Source data are provided with this paper.
Code availability
The barcodetrackR package is freely available from GitHub under a Creative Commons Zero (CC0) license and can be accessed at https://github.com/dunbarlabNIH/barcodetrackR. Additionally, the package is available through the Bioconductor repository (https://www.bioconductor.org/). A frozen version of the package at the time of publication is available on Zenodo (ref. 52). A frozen and interactive version of the package at the time of publication is available on Code Ocean (ref. 53), allowing readers to reproduce all figures and the full barcodetrackR vignette within a pre-specified computational environment.
References
1. Lu, R., Neff, N. F., Quake, S. R. & Weissman, I. L. Tracking single hematopoietic stem cells in vivo using high-throughput sequencing in conjunction with viral genetic barcoding. Nat. Biotechnol. 29, 928–933 (2011).
2. Wu, C. et al. Clonal tracking of rhesus macaque hematopoiesis highlights a distinct lineage origin for natural killer cells. Cell Stem Cell 14, 486–499 (2014).
3. Radtke, S. et al. A distinct hematopoietic stem cell population for rapid multilineage engraftment in nonhuman primates. Sci. Transl. Med. 9, eaan1145 (2017).
4. Kim, S. et al. Dynamics of HSPC repopulation in nonhuman primates revealed by a decade-long clonal-tracking study. Cell Stem Cell 14, 473–485 (2014).
5. Gerrits, A. et al. Cellular barcoding tool for clonal analysis in the hematopoietic system. Blood 115, 2610–2618 (2010).
6. Wu, C. et al. Geographic clonal tracking in macaques provides insights into HSPC migration and differentiation. J. Exp. Med. 215, 217–232 (2018).
7. Six, E. et al. Clonal tracking in gene therapy patients reveals a diversity of human hematopoietic differentiation programs. Blood 135, 1219–1231 (2020).
8. Biasco, L. et al. In vivo tracking of human hematopoiesis reveals patterns of clonal dynamics during early and steady-state reconstitution phases. Cell Stem Cell 19, 107–119 (2016).
9. Koelle, S. J. et al. Quantitative stability of hematopoietic stem and progenitor cell clonal output in rhesus macaques receiving transplants. Blood 129, 1448–1457 (2017).
10. Brugman, M. H. et al. Development of a diverse human T-cell repertoire despite stringent restriction of hematopoietic clonality in the thymus. Proc. Natl Acad. Sci. USA 112, E6020–E6027 (2015).
11. Wu, C. et al. Clonal expansion and compartmentalized maintenance of rhesus macaque NK cell subsets. Sci. Immunol. 3, eaat9781 (2018).
12. Merino, D. et al. Barcoding reveals complex clonal behavior in patient-derived xenografts of metastatic triple negative breast cancer. Nat. Commun. 10, 766 (2019).
13. Porter, S. N., Baker, L. C., Mittelman, D. & Porteus, M. H. Lentiviral and targeted cellular barcoding reveals ongoing clonal dynamics of cell lines in vitro and in vivo. Genome Biol. 15, R75 (2014).
14. Sheih, A. et al. Clonal kinetics and single-cell transcriptional profiling of CAR-T cells in patients undergoing CD19 CAR-T immunotherapy. Nat. Commun. 11, 219 (2020).
15. Cordes, S., Wu, C. & Dunbar, C. E. Clonal tracking of haematopoietic cells: insights and clinical implications. Br. J. Haematol. https://doi.org/10.1111/bjh.17175 (2020).
16. Berry, C. C. et al. INSPIIRED: quantification and visualization tools for analyzing integration site distributions. Mol. Ther. Methods Clin. Dev. 4, 17–26 (2017).
17. Sherman, E. et al. INSPIIRED: a pipeline for quantitative analysis of sites of new DNA integration in cellular genomes. Mol. Ther. Methods Clin. Dev. 4, 39–49 (2017).
18. Thielecke, L., Cornils, K. & Glauche, I. genBaRcode: a comprehensive R-package for genetic barcode analysis. Bioinformatics 36, 2189–2194 (2020).
19. Bramlett, C. et al. Clonal tracking using embedded viral barcoding and high-throughput sequencing. Nat. Protoc. 15, 1436–1458 (2020).
20. Berry, C. C., Ocwieja, K. E., Malani, N. & Bushman, F. D. Comparing DNA integration site clusters with scan statistics. Bioinformatics 30, 1493–1500 (2014).
21. Afzal, S., Fronza, R. & Schmidt, M. VSeq-Toolkit: comprehensive computational analysis of viral vectors in gene therapy. Mol. Ther. Methods Clin. Dev. 17, 752–757 (2020).
22. Hocum, J. D. et al. VISA - Vector Integration Site Analysis server: a web-based server to rapidly identify retroviral integration sites from next-generation sequencing. BMC Bioinformatics 16, 212 (2015).
23. Spinozzi, G. et al. VISPA2: a scalable pipeline for high-throughput identification and annotation of vector integration sites. BMC Bioinformatics 18, 520 (2017).
24. Hawkins, T. B. et al. Identifying viral integration sites using SeqMap 2.0. Bioinformatics 27, 720–722 (2011).
25. Zorita, E., Cuscó, P. & Filion, G. J. Starcode: sequence clustering based on all-pairs search. Bioinformatics 31, 1913–1919 (2015).
26. Zhao, L., Liu, Z., Levy, S. F. & Wu, S. Bartender: a fast and accurate clustering algorithm to count barcode reads. Bioinformatics 34, 739–747 (2018).
27. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
28. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).
29. Lyne, A.-M. et al. A track of the clones: new developments in cellular barcoding. Exp. Hematol. 68, 15–20 (2018).
30. Espinoza, D. A. et al. Aberrant clonal hematopoiesis following lentiviral vector transduction of HSPCs in a rhesus macaque. Mol. Ther. 27, 1074–1086 (2019).
31. Belderbos, M. E. et al. Donor-to-donor heterogeneity in the clonal dynamics of transplanted human cord blood stem cells in murine xenografts. Biol. Blood Marrow Transplant. 26, 16–25 (2020).
32. Elder, A. et al. Abundant and equipotent founder cells establish and maintain acute lymphoblastic leukaemia. Leukemia 31, 2577–2586 (2017).
33. Clarke, E. L. et al. T cell dynamics and response of the microbiota after gene therapy to treat X-linked severe combined immunodeficiency. Genome Med. 10, 70 (2018).
34. Morgan, M., Obenchain, V., Hester, J. & Pagès, H. SummarizedExperiment: SummarizedExperiment Container (2020); https://bioconductor.org/packages/SummarizedExperiment
35. Chang, W., Cheng, J., Allaire, J. J., Xie, Y. & McPherson, J. Shiny: Web Application Framework for R (2020); https://CRAN.R-project.org/package=shiny
36. Truitt, L. L. et al. Impact of CMV infection on natural killer cell clonal repertoire in CMV-naïve rhesus macaques. Front. Immunol. 10, 2381 (2019).
37. Adair, J. E. et al. DNA barcoding in nonhuman primates reveals important limitations in retrovirus integration site analysis. Mol. Ther. Methods Clin. Dev. 17, 796–809 (2020).
38. Thielecke, L. et al. Limitations and challenges of genetic barcode quantification. Sci. Rep. 7, 43249 (2017).
39. Kiselev, V. Y., Andrews, T. S. & Hemberg, M. Challenges in unsupervised clustering of single-cell RNA-seq data. Nat. Rev. Genet. 20, 273–282 (2019).
40. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
41. Lin, D. S. et al. DiSNE movie visualization and assessment of clonal kinetics reveal multiple trajectories of dendritic cell development. Cell Rep. 22, 2557–2566 (2018).
42. Jahn, K., Kuipers, J. & Beerenwinkel, N. Tree inference for single-cell data. Genome Biol. 17, 86 (2016).
43. Ross, E. M. & Markowetz, F. OncoNEM: inferring tumor evolution from single-cell sequencing data. Genome Biol. 17, 69 (2016).
44. Zafar, H., Navin, N., Chen, K. & Nakhleh, L. SiCloneFit: Bayesian inference of population structure, genotype, and phylogeny of tumor clones from single-cell genome sequencing data. Genome Res. 29, 1847–1859 (2019).
45. Sadeqi Azer, E. et al. PhISCS-BnB: a fast branch and bound algorithm for the perfect tumor phylogeny reconstruction problem. Bioinformatics 36, i169–i176 (2020).
46. Vavoulis, D. V., Cutts, A., Taylor, J. C. & Schuh, A. A statistical approach for tracking clonal dynamics in cancer using longitudinal next-generation sequencing data. Bioinformatics https://doi.org/10.1093/bioinformatics/btaa672 (2020).
47. Kebschull, J. M. & Zador, A. M. Cellular barcoding: lineage tracing, screening and beyond. Nat. Methods 15, 871–879 (2018).
48. Oksanen, J. et al. vegan: Community Ecology Package (2019); https://CRAN.R-project.org/package=vegan
49. de Vries, A. & Ripley, B. D. ggdendro: Create Dendrograms and Tree Diagrams using ‘ggplot2’ (2016); https://CRAN.R-project.org/package=ggdendro
50. Gu, Z., Gu, L., Eils, R., Schlesner, M. & Brors, B. circlize implements and enhances circular visualization in R. Bioinformatics 30, 2811–2812 (2014).
51. Clarke, E. SCID multiomics post-processed data and analysis (version v0.1.0) [data set]. Zenodo https://doi.org/10.5281/zenodo.1256169 (2018).
52. Espinoza, D. A., Mortlock, R. D., Koelle, S. J., Wu, C. & Dunbar, C. E. barcodetrackR: an R package for the interrogation of clonal tracking data (Zenodo freeze). Zenodo https://doi.org/10.5281/zenodo.4609410 (2021).
53. Espinoza, D. A., Mortlock, R. D., Koelle, S. J., Wu, C. & Dunbar, C. E. barcodetrackR: an R package for the interrogation of clonal tracking data. Code Ocean https://doi.org/10.24433/CO.6231752.v2 (2021).
We thank D. Allan of the NHLBI Intramural Research Program for his contributions to approaches for statistical testing of genetic tag abundances. We thank members of the Dunbar lab for helpful feedback in revision of this manuscript. D.A.E. was supported by NIH Medical Scientist Training Program T32 GM07170 and T32 G000046. R.D.M., S.J.K., C.W. and C.E.D. were supported by the Division of Intramural Research at the National Heart, Lung and Blood Institute.
The authors declare no competing interests.
Peer review information Nature Computational Science thanks Jennifer E. Adair, Mark Enstrom, Ingmar Glauche and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Fernando Chirigati was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Source Data Fig. 2
Numerical data underlying all panels of Fig. 2. Data for each panel is in a separate tab of the spreadsheet.
Source Data Fig. 3
Numerical data underlying all panels of Fig. 3. Data for each panel is in a separate tab of the spreadsheet.
Source Data Fig. 4
Numerical data underlying all panels of Fig. 4. Data for each panel is in a separate tab of the spreadsheet.
Source Data Fig. 5
Numerical data underlying all panels of Fig. 5. Data for each panel is in a separate tab of the spreadsheet.
Cite this article
Espinoza, D.A., Mortlock, R.D., Koelle, S.J. et al. Interrogation of clonal tracking data using barcodetrackR. Nat Comput Sci 1, 280–289 (2021). https://doi.org/10.1038/s43588-021-00057-4