Interrogation of clonal tracking data using barcodetrackR


Clonal tracking methods provide quantitative insights into the cellular output of genetically labeled progenitor cells across time and cellular compartments. In the context of gene and cell therapies, clonal tracking methods have enabled the tracking of progenitor cell output both in humans receiving therapies and in corresponding animal models, providing valuable insight into lineage reconstitution, clonal dynamics and vector genotoxicity. However, the absence of a toolbox for analysis of clonal tracking data has precluded the development of standardized analytical frameworks within the field. Thus, we developed barcodetrackR, an R package and accompanying Shiny app containing diverse tools for the analysis and visualization of clonal tracking data. We demonstrate the utility of barcodetrackR in exploring longitudinal clonal patterns and lineage relationships in a number of clonal tracking studies of hematopoietic stem and progenitor cells (HSPCs) in humans receiving HSPC gene therapy and in animals receiving lentivirally transduced HSPC transplants or tumor cells.

Fig. 1: Clonal tracking experimental design and barcodetrackR analysis.
Fig. 2: Global clonal distributions.
Fig. 3: Measures of clonal diversity.
Fig. 4: Longitudinal clonal patterns.
Fig. 5: Lineage bias.

Data availability

All clonal tracking datasets analyzed in this study are publicly available with accession instructions outlined in Supplementary Table 1. The Six dataset (ref. 7) was downloaded from The Belderbos dataset (ref. 31) was downloaded from its manuscript’s supplementary material. The Elder dataset (ref. 32) was downloaded from GEO accession GSE149170. The Espinoza dataset (ref. 30) was downloaded from GEO accession GSE153130. The Wu dataset (ref. 11) was downloaded from its manuscript’s supplementary material. The Koelle dataset (ref. 9) was downloaded from The Clarke dataset (ref. 33) was downloaded from ref. 51. Data were pre-processed in R to create tabular data files amenable for use with barcodetrackR. Source data are provided with this paper.

Code availability

The barcodetrackR package is freely available from GitHub under a Creative Commons 0 license. Additionally, the package is available through the Bioconductor repository. A frozen version of the package at the time of publication is available on Zenodo (ref. 52). A frozen and interactive version of the package at the time of publication is available on Code Ocean (ref. 53), allowing readers to reproduce all figures and the full barcodetrackR vignette within a pre-specified computational environment.


  1. Lu, R., Neff, N. F., Quake, S. R. & Weissman, I. L. Tracking single hematopoietic stem cells in vivo using high-throughput sequencing in conjunction with viral genetic barcoding. Nat. Biotechnol. 29, 928–933 (2011).

  2. Wu, C. et al. Clonal tracking of rhesus macaque hematopoiesis highlights a distinct lineage origin for natural killer cells. Cell Stem Cell 14, 486–499 (2014).

  3. Radtke, S. et al. A distinct hematopoietic stem cell population for rapid multilineage engraftment in nonhuman primates. Sci. Transl. Med. 9, eaan1145 (2017).

  4. Kim, S. et al. Dynamics of HSPC repopulation in nonhuman primates revealed by a decade-long clonal-tracking study. Cell Stem Cell 14, 473–485 (2014).

  5. Gerrits, A. et al. Cellular barcoding tool for clonal analysis in the hematopoietic system. Blood 115, 2610–2618 (2010).

  6. Wu, C. et al. Geographic clonal tracking in macaques provides insights into HSPC migration and differentiation. J. Exp. Med. 215, 217–232 (2018).

  7. Six, E. et al. Clonal tracking in gene therapy patients reveals a diversity of human hematopoietic differentiation programs. Blood 135, 1219–1231 (2020).

  8. Biasco, L. et al. In vivo tracking of human hematopoiesis reveals patterns of clonal dynamics during early and steady-state reconstitution phases. Cell Stem Cell 19, 107–119 (2016).

  9. Koelle, S. J. et al. Quantitative stability of hematopoietic stem and progenitor cell clonal output in rhesus macaques receiving transplants. Blood 129, 1448–1457 (2017).

  10. Brugman, M. H. et al. Development of a diverse human T-cell repertoire despite stringent restriction of hematopoietic clonality in the thymus. Proc. Natl Acad. Sci. USA 112, E6020–E6027 (2015).

  11. Wu, C. et al. Clonal expansion and compartmentalized maintenance of rhesus macaque NK cell subsets. Sci. Immunol. 3, eaat9781 (2018).

  12. Merino, D. et al. Barcoding reveals complex clonal behavior in patient-derived xenografts of metastatic triple negative breast cancer. Nat. Commun. 10, 766 (2019).

  13. Porter, S. N., Baker, L. C., Mittelman, D. & Porteus, M. H. Lentiviral and targeted cellular barcoding reveals ongoing clonal dynamics of cell lines in vitro and in vivo. Genome Biol. 15, R75 (2014).

  14. Sheih, A. et al. Clonal kinetics and single-cell transcriptional profiling of CAR-T cells in patients undergoing CD19 CAR-T immunotherapy. Nat. Commun. 11, 219 (2020).

  15. Cordes, S., Wu, C. & Dunbar, C. E. Clonal tracking of haematopoietic cells: insights and clinical implications. Br. J. Haematol. (2020).

  16. Berry, C. C. et al. INSPIIRED: quantification and visualization tools for analyzing integration site distributions. Mol. Ther. Methods Clin. Dev. 4, 17–26 (2017).

  17. Sherman, E. et al. INSPIIRED: a pipeline for quantitative analysis of sites of new DNA integration in cellular genomes. Mol. Ther. Methods Clin. Dev. 4, 39–49 (2017).

  18. Thielecke, L., Cornils, K. & Glauche, I. genBaRcode: a comprehensive R-package for genetic barcode analysis. Bioinformatics 36, 2189–2194 (2020).

  19. Bramlett, C. et al. Clonal tracking using embedded viral barcoding and high-throughput sequencing. Nat. Protoc. 15, 1436–1458 (2020).

  20. Berry, C. C., Ocwieja, K. E., Malani, N. & Bushman, F. D. Comparing DNA integration site clusters with scan statistics. Bioinformatics 30, 1493–1500 (2014).

  21. Afzal, S., Fronza, R. & Schmidt, M. VSeq-Toolkit: comprehensive computational analysis of viral vectors in gene therapy. Mol. Ther. Methods Clin. Dev. 17, 752–757 (2020).

  22. Hocum, J. D. et al. VISA - Vector Integration Site Analysis server: a web-based server to rapidly identify retroviral integration sites from next-generation sequencing. BMC Bioinformatics 16, 212 (2015).

  23. Spinozzi, G. et al. VISPA2: a scalable pipeline for high-throughput identification and annotation of vector integration sites. BMC Bioinformatics 18, 520 (2017).

  24. Hawkins, T. B. et al. Identifying viral integration sites using SeqMap 2.0. Bioinformatics 27, 720–722 (2011).

  25. Zorita, E., Cuscó, P. & Filion, G. J. Starcode: sequence clustering based on all-pairs search. Bioinformatics 31, 1913–1919 (2015).

  26. Zhao, L., Liu, Z., Levy, S. F. & Wu, S. Bartender: a fast and accurate clustering algorithm to count barcode reads. Bioinformatics 34, 739–747 (2018).

  27. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).

  28. Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).

  29. Lyne, A.-M. et al. A track of the clones: new developments in cellular barcoding. Exp. Hematol. 68, 15–20 (2018).

  30. Espinoza, D. A. et al. Aberrant clonal hematopoiesis following lentiviral vector transduction of HSPCs in a rhesus macaque. Mol. Ther. 27, 1074–1086 (2019).

  31. Belderbos, M. E. et al. Donor-to-donor heterogeneity in the clonal dynamics of transplanted human cord blood stem cells in murine xenografts. Biol. Blood Marrow Transplant. 26, 16–25 (2020).

  32. Elder, A. et al. Abundant and equipotent founder cells establish and maintain acute lymphoblastic leukaemia. Leukemia 31, 2577–2586 (2017).

  33. Clarke, E. L. et al. T cell dynamics and response of the microbiota after gene therapy to treat X-linked severe combined immunodeficiency. Genome Med. 10, 70 (2018).

  34. Morgan, M., Obenchain, V., Hester, J. & Pagès, H. SummarizedExperiment: SummarizedExperiment Container (2020).

  35. Chang, W., Cheng, J., Allaire, J. J., Xie, Y. & McPherson, J. Shiny: Web Application Framework for R (2020).

  36. Truitt, L. L. et al. Impact of CMV infection on natural killer cell clonal repertoire in CMV-naïve rhesus macaques. Front. Immunol. 10, 2381 (2019).

  37. Adair, J. E. et al. DNA barcoding in nonhuman primates reveals important limitations in retrovirus integration site analysis. Mol. Ther. Methods Clin. Dev. 17, 796–809 (2020).

  38. Thielecke, L. et al. Limitations and challenges of genetic barcode quantification. Sci. Rep. 7, 43249 (2017).

  39. Kiselev, V. Y., Andrews, T. S. & Hemberg, M. Challenges in unsupervised clustering of single-cell RNA-seq data. Nat. Rev. Genet. 20, 273–282 (2019).

  40. van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  41. Lin, D. S. et al. DiSNE movie visualization and assessment of clonal kinetics reveal multiple trajectories of dendritic cell development. Cell Rep. 22, 2557–2566 (2018).

  42. Jahn, K., Kuipers, J. & Beerenwinkel, N. Tree inference for single-cell data. Genome Biol. 17, 86 (2016).

  43. Ross, E. M. & Markowetz, F. OncoNEM: inferring tumor evolution from single-cell sequencing data. Genome Biol. 17, 69 (2016).

  44. Zafar, H., Navin, N., Chen, K. & Nakhleh, L. SiCloneFit: Bayesian inference of population structure, genotype, and phylogeny of tumor clones from single-cell genome sequencing data. Genome Res. 29, 1847–1859 (2019).

  45. Sadeqi Azer, E. et al. PhISCS-BnB: a fast branch and bound algorithm for the perfect tumor phylogeny reconstruction problem. Bioinformatics 36, i169–i176 (2020).

  46. Vavoulis, D. V., Cutts, A., Taylor, J. C. & Schuh, A. A statistical approach for tracking clonal dynamics in cancer using longitudinal next-generation sequencing data. Bioinformatics (2020).

  47. Kebschull, J. M. & Zador, A. M. Cellular barcoding: lineage tracing, screening and beyond. Nat. Methods 15, 871–879 (2018).

  48. Oksanen, J. et al. vegan: Community Ecology Package (2019).

  49. de Vries, A. & Ripley, B. D. ggdendro: Create Dendrograms and Tree Diagrams using ‘ggplot2’ (2016).

  50. Gu, Z., Gu, L., Eils, R., Schlesner, M. & Brors, B. circlize implements and enhances circular visualization in R. Bioinformatics 30, 2811–2812 (2014).

  51. Clarke, E. SCID multiomics post-processed data and analysis (version v0.1.0) [data set]. Zenodo (2018).

  52. Espinoza, D. A., Mortlock, R. D., Koelle, S. J., Wu, C. & Dunbar, C. E. barcodetrackR: an R package for the interrogation of clonal tracking data (Zenodo freeze). Zenodo (2021).

  53. Espinoza, D. A., Mortlock, R. D., Koelle, S. J., Wu, C. & Dunbar, C. E. barcodetrackR: an R package for the interrogation of clonal tracking data. Code Ocean (2021).

Acknowledgements

We thank D. Allan of the NHLBI Intramural Research Program for his contributions to approaches for statistical testing of genetic tag abundances. We thank members of the Dunbar lab for helpful feedback in revision of this manuscript. D.A.E. was supported by NIH Medical Scientist Training Program T32 GM07170 and T32 G000046. R.D.M., S.J.K., C.W. and C.E.D. were supported by the Division of Intramural Research at the National Heart, Lung and Blood Institute.

Author information

D.A.E. and R.D.M. wrote the manuscript. D.A.E. and R.D.M. developed code and performed analysis of existing datasets. S.J.K. and C.W. aided with development of visualizations. C.E.D. supervised the project and edited the manuscript.

Corresponding author

Correspondence to Cynthia E. Dunbar.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Computational Science thanks Jennifer E. Adair, Mark Enstrom, Ingmar Glauche and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Fernando Chirigati was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Section 1, Table 1 and Figs. 1–7.

Source data

Source Data Fig. 2

Numerical data underlying all panels of Fig. 2. Data for each panel is in a separate tab of the spreadsheet.

Source Data Fig. 3

Numerical data underlying all panels of Fig. 3. Data for each panel is in a separate tab of the spreadsheet.

Source Data Fig. 4

Numerical data underlying all panels of Fig. 4. Data for each panel is in a separate tab of the spreadsheet.

Source Data Fig. 5

Numerical data underlying all panels of Fig. 5. Data for each panel is in a separate tab of the spreadsheet.

About this article

Cite this article

Espinoza, D.A., Mortlock, R.D., Koelle, S.J. et al. Interrogation of clonal tracking data using barcodetrackR. Nat Comput Sci 1, 280–289 (2021).
