Retrieval of vector integration sites from cell-free DNA

Abstract

Gene therapy (GT) has rapidly attracted renewed interest as a treatment for otherwise incurable diseases, with several GT products already on the market and many more entering clinical testing for selected indications. Clonal tracking techniques based on vector integration enable monitoring of the fate of engineered cells in the blood of patients receiving GT and allow assessment of the safety and efficacy of these procedures. However, owing to the limited number of cells that can be tested and the impracticality of studying cells residing in peripheral organs without performing invasive biopsies, this approach provides only a partial snapshot of the clonal repertoire and dynamics of genetically modified cells and reduces the predictive power as a safety readout. In this study, we developed liquid biopsy integration site sequencing, or LiBIS-seq, a polymerase chain reaction technique optimized to quantitatively retrieve vector integration sites from cell-free DNA released into the bloodstream by dying cells residing in several tissues. This approach enabled longitudinal monitoring of in vivo liver-directed GT and clonal tracking in patients receiving hematopoietic stem cell GT, improving our understanding of the clonal composition and turnover of genetically modified cells in solid tissues and, in contrast to conventional analyses based only on circulating blood cells, enabling earlier detection of vector-marked clones that are aberrantly expanding in peripheral tissues.

Fig. 1: Retrieval of vector IS from cfDNA of patients treated with GT.
Fig. 2: Retrieval of IS from cfDNA purified from plasma of seven patients with MLD treated with LV-based HSC GT.
Fig. 3: IS retrieved from cfDNA allows identification of vector-marked tumors expanding in solid organs.
Fig. 4: Detection of malignant lymphomatous expansions in a mouse X-SCID HSC transplantation model by LiBIS-seq.
Fig. 5: Adjusted H-index over time in patients with MLD, WAS and SCID-X1 treated with LV- and γRV-based HSC GT.
Fig. 6: LV ISs retrieved from cfDNA purified from serum of dogs treated with in vivo liver-directed GT.

Data availability

All requests for raw and analyzed data are promptly reviewed by the corresponding author to verify if the request is subject to any intellectual property or confidentiality obligations. Patient-related data not included in the paper were generated as part of clinical trials and might be subject to patient confidentiality. Any data that can be shared will be released via a material transfer agreement. Source data are provided with this paper.

Code availability

Software scripts developed for the integration site analysis will be made available by the corresponding author upon reasonable request.

Acknowledgements

This work was supported by Telethon Foundation TGT16B01 and TGT16B03 to E.M. and Giovani Ricercatori Grant 2013 from the Italian Ministry of Health to A. Calabria and D.C. (GR-2016–02363681). We are grateful to all members of the Montini lab, especially A. Albertini and S. Esposito, for technical help and to L. Albano for support on the X-SCID mouse model of HSC GT. We are also grateful for the work of J. Everett and A. Roche Doto from the Bushman lab. Elements of Fig. 4 were obtained from Servier Medical Art (https://smart.servier.com), which is licensed under a Creative Commons Attribution 3.0 Unported License, and from Vecteezy (https://www.vecteezy.com).

Author information

Contributions

D.C. conceived and developed the project, performed experiments and wrote the manuscript. L.R., P.G. and F. Benedicenti provided technical support. A. Calabria and G. Spinozzi performed bioinformatics analyses. G. Schiroli and P.G. performed experiments on the T cell lymphoma mouse model. A. Cantore, A.M., S.A., F.F., V.C. and M.W. provided sample material from patients treated with GT for MLD, WAS and X-SCID and from hemophilic dogs. F. Bushman provided information on vector insertion in lymphoma cells in X-SCID P9. C.K., A.A., A.F., M.C., E.S. and L.N. provided all the clinical samples and critically reviewed the manuscript. E.M. conceived and supervised the project, designed the experiments and wrote the manuscript.

Corresponding author

Correspondence to Eugenio Montini.

Ethics declarations

Competing interests

D.C. and E.M. are inventors of the LiBIS-seq method for which a United Kingdom patent was filed in April 2019. The patent is owned and managed by the San Raffaele Scientific Institute and the Telethon Foundation. All other authors have no competing interests.

Additional information

Peer review information Nature Medicine thanks Vijay Sankaran, Matthew Porteus and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Joao Monteiro was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Isolation of cfDNA from blood plasma of MLD patients treated by LV-based HSC-GT.

a, Representative examples of the size profile of cfDNA fragments purified from blood plasma of the indicated MLD patients. b, Amount of blood plasma cfDNA (ng/ml) in MLD patients. Blood plasma was collected before gene therapy (pre-GT), within the first 3 months post-GT (Early, 1–3 mo) and thereafter (Late, 6–72 mo); each dot is an independent measurement at a different time point for the same patient. Bars indicate the average value. Statistics were performed using the Kruskal-Wallis test with Dunn’s multiple comparison test; N = 6 biologically independent patients (MLD04 was excluded from the analyses because this patient was affected by the early juvenile form of MLD and treated when already symptomatic, whereas the remaining 6 patients were affected by the late infantile form and were treated when still asymptomatic). The grey area refers to cfDNA concentration levels in healthy subjects. c, d, Distribution of DNA fragment sizes after PCR amplification and sequencing, starting from sonicated genomic DNA purified from PBMC (c) and from cfDNA (d). Except for a peak around 167 bp, the distribution of fragment sizes obtained from cfDNA was similar to the size distribution of PCR products generated from the sonicated cell-derived DNA.

Source data

Extended Data Fig. 2 Retrieval of IS from cfDNA of MLD patients treated by LV-based HSC-GT.

a, b, Frequency distribution of LV integrations around gene transcriptional start sites in cfDNA (a) and cell-derived genomic DNA (b) in 7 MLD patients. c, Frequency distribution across the human genome of RefSeq genes (black) and LV IS (cell-derived genomic DNA is represented in blue and cfDNA in red). d, Detail of four genomic hotspots of LV integration in cfDNA (red line) and cell-derived genomic DNA (blue line), shown as % IS (y-axis) across the selected chromosomal coordinates (x-axis). The symbols of some highly targeted genes within each region are indicated. e, Scatter plot of gene targeting frequency by IS in cfDNA (y-axis) and in cell-derived genomic DNA (x-axis). Gene targeting frequency was calculated as the number of IS targeting each gene divided by the gene length (plus 200,000 bp as a constant value, to reduce the extremely high integration frequencies found when targeting very small genes) versus the total number of IS for each dataset. Each dot represents a gene and the color scale refers to the p-value calculated by Fisher exact test after FDR correction. No significant differences in gene targeting were observed between the genomic DNA and cfDNA datasets. f, Gene Ontology (GO) analysis of IS retrieved from cfDNA of 7 MLD patients, performed using GREAT software. Significantly overrepresented GO classes of the Cellular Component (CC), Biological Process (BP) and Molecular Function (MF) categories are shown. Only classes with a fold enrichment score higher than 2 are shown. Red lines indicate the threshold level of significance, p < 0.001. g, Percentage of cell-derived and cfDNA-derived IS targeting accessible regions defined by ATAC-seq in human CD34+ HSPCs, common lymphoid progenitors (CLP), common myeloid progenitors, and mature B and T cells. The numbers of cfDNA-derived and cell-derived IS targeting ATAC-seq regions were compared by two-sided Fisher exact test.
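The gene targeting frequency described for panel e can be sketched in a few lines of Python. This is only an illustration of one reading of the caption, not the authors' analysis code: the function name, the argument names and the example counts are hypothetical, and the sketch assumes that "versus the total number of IS" means dividing the padded per-gene density by the dataset size.

```python
def gene_targeting_frequency(is_in_gene, gene_length_bp, total_is, pad_bp=200_000):
    """IS per padded base pair of a gene, normalized by the dataset size.

    pad_bp is the 200,000 bp constant from the caption; treating the
    normalization as a division by total_is is an assumption of this sketch.
    """
    if total_is <= 0:
        raise ValueError("total_is must be positive")
    return is_in_gene / (gene_length_bp + pad_bp) / total_is


# Hypothetical counts: the same number of IS in a very small and a large gene.
small_gene = gene_targeting_frequency(5, 1_000, 10_000)
large_gene = gene_targeting_frequency(5, 300_000, 10_000)
print(small_gene, large_gene)
```

Without the padding term, the small gene would score 300 times higher than the large one; with it, the ratio drops to roughly 2.5, which is the damping effect the caption describes.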

Source data

Extended Data Fig. 3 Accuracy and quantification of cfDNA and cell derived IS in MLD HSC-GT patients.

a, Relative abundance (%) of CEM1 IS and CEM6 ISs (y-axis) in each DNA dilution (x-axis), as defined by the sonicLength quantification method. Dashed lines represent the expected theoretical abundance of each IS; solid lines represent the experimental values. For each dilution, the dot refers to the average among technical triplicates (N = 3) and the s.e.m. is shown. b, Fold change of the observed versus the expected relative abundance of CEM1 IS (y-axis), determined for each PCR replicate of each DNA dilution (x-axis). N = 8 biologically independent samples (dilutions) in technical triplicates. c, Signal-to-noise ratio (SNR) for the different dilutions. As shown in the figure, the SNR at the lowest dilution (28%, SNR = 54) was tenfold higher than at the 0.16% dilution (SNR = 4.5, corresponding to 210 theoretical molecules). d–f, Percentage of IS (y-axis) represented by the indicated number of genomes (x-axis, Genome bin) in IS datasets derived from cfDNA (d), genomic DNA of BM-derived cells (e) and genomic DNA of PB-derived cells (f) of MLD patients treated by HSPC gene therapy. N = 7 biologically independent patients. For each patient, the average value among different time points (N = 3–8) and the s.e.m. are shown.
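The observed-versus-expected comparison in panels a and b can be illustrated with a minimal Python sketch. It assumes that fold change is the plain ratio of observed to expected relative abundance and that replicates are summarized as mean and s.e.m.; the function names and the triplicate values are hypothetical and do not come from the paper.

```python
import math
import statistics


def fold_change(observed_pct, expected_pct):
    """Ratio of observed to expected relative abundance (both in percent)."""
    if expected_pct <= 0:
        raise ValueError("expected_pct must be positive")
    return observed_pct / expected_pct


def mean_and_sem(values):
    """Mean and standard error of the mean across technical replicates."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical technical triplicate for a clone expected at 0.16% abundance.
observed = [0.12, 0.20, 0.16]
folds = [fold_change(o, 0.16) for o in observed]
avg, sem = mean_and_sem(folds)
print(avg, sem)
```

A fold change close to 1 means the measured abundance matches the dilution; the s.e.m. across replicates gives a rough sense of how reproducible the measurement is at that input level.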

Extended Data Fig. 4 Tracking of IS from genomic DNA purified from blood cells of MLD patients.

a, b, Stacked bar plots showing the abundance of LV IS retrieved over time (months, x-axis) from genomic DNA purified from PBMC (a) and whole blood cells (b) of 7 MLD patients. In each column, each LV IS is represented by a different color, whose height is proportional to the number of fragments (plasma) retrieved for that IS over the total for each specific time point (% IS abundance, y-axis). Ribbons connect LV IS tracked among consecutive time points. The number of unique IS retrieved at each specific time point is indicated in blue above each column. PBMC: peripheral blood mononuclear cells. c, Dot plots showing the number of genomes associated with IS located near cancer-associated (red) or other (blue) genes in the cfDNA and WPB datasets of the different MLD patients. For each IS, the y-axis represents the number of genomes for WPB IS and the number of fragments for cfDNA. The number of IS retrieved close to genes annotated as cancer or other is indicated above each column. Shown are boxplots in which whiskers mark the minimum and maximum, boxes represent the interquartile range (IQR, between the 25th and 75th percentiles) and the median is represented as a bar within the box. Statistics were performed for each pair by two-tailed Mann-Whitney test; the p-value is indicated. The total number of genes belonging to the Cancer (red) or Other (blue) group is indicated at the top of each panel.
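The % IS abundance plotted in these stacked bars is a share of a per-time-point total, which can be sketched as follows. This is a minimal illustration, assuming each IS comes with a single count (fragments or genomes) per time point; the function name and the site labels are hypothetical.

```python
def relative_abundance(counts_by_site):
    """Percent abundance of each IS within one sample or time point.

    counts_by_site maps an IS label to its count (fragments or genomes).
    """
    total = sum(counts_by_site.values())
    if total == 0:
        return {site: 0.0 for site in counts_by_site}
    return {site: 100.0 * n / total for site, n in counts_by_site.items()}


# Hypothetical counts for two integration sites at one time point.
abundance = relative_abundance({"chr1:100:+": 30, "chr2:200:-": 10})
print(abundance)
```

Because each column is normalized to its own total, columns with very different numbers of retrieved IS remain visually comparable.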

Source data

Extended Data Fig. 5 Tracking over time of cfDNA-derived IS with cell-derived IS.

a–g, Classification map showing all LV IS (left part of the graph) retrieved over time from cfDNA purified from blood plasma of the indicated patients. Each row represents a unique LV IS, and for each IS the colored bars indicate when and where the IS was identified across the different cell lineages and time points after gene therapy (columns). Lack of color indicates that the integration was not retrieved at the indicated time point and source. Wh.: whole blood or whole bone marrow cells; MNC: mononuclear cells; PBMC: peripheral blood mononuclear cells; CD34: HSPCs; CD13, CD14, CD15: myeloid cells; CD19: B cells; CD3: T cells. h, Percentage (%) of IS shared between cell lineages and MNCs (black) and between cell-derived IS and cfDNA-derived IS (orange). The numbers of shared IS in cfDNA and cell-derived datasets were compared by two-sided Fisher exact test; the p-value is indicated. i, j, Percentage (%) of IS (y-axis) from cfDNA datasets, by the indicated number of genomes (x-axis), that are shared with myeloid (CD15+, CD13+, CD14+) or lymphoid (CD19+ and CD3+) cell lineages purified from BM and PB. k, Percentage (%) of cfDNA-derived IS shared or not shared with hematopoietic cell lineages and represented by the indicated number of genomes (x-axis, Genome bin). cfDNA-derived IS that are shared with any hematopoietic lineage are represented by a significantly higher number of genomes than cfDNA-derived IS that are not shared with hematopoietic lineages, as determined by multiple unpaired t-tests with the Holm-Sidak method for multiple comparison correction. l, Percentage (%) of cfDNA-derived IS that are recaptured across different time points. Statistics were performed by paired t-test of log-odds transformed values. For panels i–l, N = 7 biologically independent patients, except for l, where N = 6 (no recaptured IS were found in cfDNA from MLD07).
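The sharing and recapture percentages in panels h and l reduce to set operations on IS identifiers. The Python sketch below shows one way to compute them; it assumes the cfDNA dataset is the denominator for sharing and that "recaptured" means seen at two or more time points, and all names and labels are hypothetical.

```python
from collections import Counter


def percent_shared(cfdna_sites, cell_sites):
    """Percent of cfDNA-derived IS that are also found in a cell-derived dataset."""
    cfdna_sites = set(cfdna_sites)
    if not cfdna_sites:
        return 0.0
    return 100.0 * len(cfdna_sites & set(cell_sites)) / len(cfdna_sites)


def recaptured_sites(sites_by_timepoint):
    """IS observed at two or more time points (keys are time points)."""
    seen = Counter(
        site for sites in sites_by_timepoint.values() for site in set(sites)
    )
    return {site for site, n in seen.items() if n >= 2}


# Hypothetical IS labels.
shared_pct = percent_shared({"a", "b", "c", "d"}, {"b", "d", "e"})
recaptured = recaptured_sites({6: ["a", "b"], 12: ["b", "c"], 24: ["c", "b"]})
print(shared_pct, sorted(recaptured))
```

Each time point is deduplicated before counting, so an IS retrieved several times within one sample is not mistaken for a recapture.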

Source data

Extended Data Fig. 6 Genomic distribution of γRV ISs in T-ALL and lymphoma.

a–c, Genomic distribution of γRV IS identified in the T-ALL leukemic clones of patients WAS P7 (a), WAS P5 (b) and SCID-X1 P9 (c). In each schema, genomic coordinates and scale are indicated. Blue boxes and bars indicate exons; black arrows indicate the orientation of gene transcription; red triangles indicate the position and orientation of the γRV IS.

Extended Data Fig. 7 Hematopoietic reconstitution, phenotypical and molecular characterization of lymphoblastic T lymphomas developing in SCID-X1 mice.

a, Schema of the gating strategy: single cells were selected by plotting the FSC height parameter versus area. Then, cell populations were selected based on their physical parameters (FSC-A/SSC-A). Dead cells were excluded using a live/dead cell staining kit that marks dead cells. Next, donor cells were gated by the expression of CD45.1, and myeloid cells, B cells and T cells were then identified by expression of CD11b, CD19 or CD3 markers, respectively. T cell subtypes were distinguished by CD4 and CD8 expression. b, Engraftment level of SINLV-transduced CD45.1 WT lin- cells detected in peripheral blood of transplanted SCID-X1 mice and stratified by tumor outcome. c, Percentage of GFP+ cells within the transplanted population of CD45.1 cells. Mice are divided based on tumor outcome (N = 6 and N = 5, respectively). d, Relative proportion of B (CD19+) and T (CD3+CD4+ and CD3+CD8+) cells within the CD45.1 cell population detected in peripheral blood at 8 weeks post-transplant. e, Heatmap showing the relative proportion of the indicated cell populations in the thymus of mice at sacrifice. f, Representative flow cytometry plot of CD4+ and CD8+ markers in the thymus of tumor-bearing mice (A3, A4 and B1) and a WT control. g, i, k, l, Tracking over time and across DNA sources of IS shared among plasma and tissues in mice that developed T cell lymphoblastic leukemia (B1, A4) and mice that did not (C1, C2). Each row represents a unique LV IS, whose color is proportional to the relative abundance of that IS over the total (red: high abundance; blue: low abundance). Lack of color indicates that the IS was not retrieved at the indicated time point and source. The number of unique IS retrieved at each time point is indicated above the column. h, j, Relative abundance over time and among different DNA sources of the most abundant LV IS retrieved in the thymus of mouse B1 (6 LV IS identified as top abundant that mapped near Trps1, Samsn1, DseI, Tbl1x, Zwint and Cep44) and mouse A4 (2 LV IS identified as top abundant that mapped near RefSeq gene Gm10732 and Rictor).

Extended Data Fig. 8 Shannon Diversity index (H-index) correction.

a, H-index values (y-axis) of WBM and WPB IS datasets of MLD patients obtained by LAM-PCR (N = 108 independent PCR reactions from 7 patients) and SLiM PCR (N = 30 independent PCR reactions from 7 patients). b, PCR efficiency of the LAM (N = 108) and SLiM PCR (N = 30) approaches, obtained by dividing the observed number of IS retrieved in WBM and WPB samples of the 7 MLD patients by the expected number of IS. Shown is the average ± s.e.m.; statistics were performed by two-tailed non-parametric Mann-Whitney test. c, Correlation between the H-index values calculated in WBM, WPB and cfDNA samples of 7 MLD patients (y-axis) and the amount of DNA used in the PCR reaction (ng DNA, x-axis). d, Shannon diversity index (H-index, y-axis) calculated over time using standard methods (STD, see Online Methods for details) in cfDNA- (blue line) and WPB- (red line) derived IS datasets from the indicated MLD patient. The red rectangle highlights the WPB time points where LV IS were retrieved by SLiM PCR. e, f, Graphs showing the correlation between the STD and the Adjusted H-index and VCN in cfDNA- and WPB-derived IS datasets from MLD patients. Each dot is the average ± s.e.m. of STD and Adjusted H-index values (y-axis) measured in cfDNA- (e) and WPB- (f) derived IS datasets obtained from MLD patients stratified by their averaged VCN measured in PBMC (x-axis); N = 7 independent patients. g–j, Dot plots showing STD (g, i) and Adjusted (h, j) H-index values (y-axis) measured in cfDNA- (g, h) and WPB- (i, j) derived IS datasets obtained from 7 MLD patients stratified based on the averaged VCN in PBMC (x-axis). Shown is the average ± s.e.m.; statistics were performed by unpaired t-test with Welch’s correction. k–m, Shannon diversity index (H-index) calculated over time in WAS patients P5 and P7 (k, l) and SCID-X1 patients (m).
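For readers unfamiliar with the Shannon diversity index, the standard (STD) form can be computed from per-IS abundance counts as in the sketch below. It uses the textbook definition H = -Σ p_i ln p_i with natural logarithms; the choice of log base and the function name are assumptions, and the Adjusted H-index described in the paper's methods is not reproduced here.

```python
import math


def shannon_index(counts):
    """Standard Shannon diversity index, H = -sum(p_i * ln(p_i)).

    counts holds one abundance value per IS (for example fragments or genomes);
    zero counts are ignored.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical datasets: an even repertoire versus one dominated by a single clone.
even = shannon_index([25, 25, 25, 25])
skewed = shannon_index([97, 1, 1, 1])
print(even, skewed)
```

The even repertoire reaches the maximum for four clones, ln 4 ≈ 1.386, while the dominated one scores far lower, which is why a falling H-index can flag an expanding clone.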

Source data

Extended Data Fig. 9 Abundance of ISs retrieved from hemophilic dogs treated by liver-directed GT approaches.

a, b, Distribution of the number of genomes (x-axis, Genome bin) per integration site (y-axis) in liver genomic DNA (a) and cfDNA (b). In b, values are averaged among different time points for each dog, N = 7 for M57, N = 8 for 059 and N = 7 for 021, N = ; c, Gene Ontology (GO) analysis of IS retrieved from cfDNA of dogs treated with in vivo liver-directed GT, performed using DAVID EASE software. Red lines indicate the threshold level of significance, p = 0.001.

Extended Data Fig. 10 Estimate of the abundance levels of cfDNA derived-IS.

a, Scatter plots of IS abundance by number of distinct fragments (x-axis) and number of fragments estimated by “sonicLength” (y-axis). Each panel refers to a different study, as indicated. Each dot represents an IS identified at a given time point and PCR replicate. For the SCID-X1 and WAS panels, the triangle shape indicates integrations retrieved from malignant clones. e, Shannon diversity index (H-index, y-axis) by bins of ng of cfDNA (x-axis and different color code). Each dot represents a sample, and the shape of each dot refers to a different study. Shown are boxplots in which whiskers mark the minimum and maximum, the box represents the interquartile range (IQR, between the 25th and 75th percentiles) and the median is represented as a bar within the box. Statistics were performed by Kruskal-Wallis test and by Mann-Whitney test for each pair of bins connected by a whisker; for each group, N is indicated below each column.

Source data

Supplementary information

Supplementary Information

Supplementary Fig. 1 and Supplementary Tables 1–7.

Reporting Summary

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 8

Statistical source data.

Source Data Extended Data Fig. 10

Statistical source data.

Cite this article

Cesana, D., Calabria, A., Rudilosso, L. et al. Retrieval of vector integration sites from cell-free DNA. Nat Med (2021). https://doi.org/10.1038/s41591-021-01389-4
