Cell-type-specific consequences of mosaic structural variants in hematopoietic stem and progenitor cells

The functional impact and cellular context of mosaic structural variants (mSVs) in normal tissues are understudied. Utilizing Strand-seq, we sequenced 1,133 single-cell genomes from 19 human donors of increasing age, and discovered the heterogeneous mSV landscapes of hematopoietic stem and progenitor cells. While mSVs are continuously acquired throughout life, expanded subclones in our cohort are confined to individuals >60. Cells already harboring mSVs are more likely to acquire additional somatic structural variants, including megabase-scale segmental aneuploidies. Capitalizing on comprehensive single-cell micrococcal nuclease digestion with sequencing reference data, we conducted high-resolution cell-typing for eight hematopoietic stem and progenitor cells. Clonally expanded mSVs disrupt normal cellular function by dysregulating diverse cellular pathways, and enriching for myeloid progenitors. Our findings underscore the contribution of mSVs to the cellular and molecular phenotypes associated with the aging hematopoietic system, and establish a foundation for deciphering the molecular links between mSVs, aging and disease susceptibility in normal tissues.

Figure S10: Cell type enrichment across all unique genotypes at the a) single donor/single genotype and b) cross-donor/cross-genotype levels. a) Extended dotplot of the results of the cell-type enrichment analysis for each mSV identified, showing the CF, enrichment and significance of enrichment in cell type per mSV subclone vs. an idealised control. This analysis was extended from the result in Fig. 3c to investigate the effect of whole chromosome losses (LOY, LOX), and the effect of newly arisen mosaicisms within subclones harboring mSVs. For instance, BM63_dup19_and_LOY denotes the LOY subclone that originated from the bigger subclone harboring a duplication on chromosome 19. b) Dotplot of combined enrichments for cells that harbor singletons ('Singleton mSVs'), LOYs ('LOY events'), or subclonal mSVs that are not LOYs ('non LOY subclonal mSV'). c) Comparison of cell-type enrichment between subclones that acquired secondary mSVs and the bigger subclones they originated from. Significance scores of cell-type enrichment (-log10 p.adjust) are visualized as the color gradient of the dots and were calculated from the permutation-adjusted P-values of a binomial test (Methods).

Selecting an optimal timepoint for BrdU incorporation in cultured human HSPCs
To obtain a high number of usable, high-quality Strand-seq libraries, a critical step in the experimental setup is to maximize the number of cells that have undergone a single cell division in the presence of BrdU, while ensuring that a second round of DNA synthesis has not begun (as this would incorporate BrdU into the template DNA strands, making cells unusable for Strand-seq). To identify this timepoint, we carried out a BrdU timecourse in CD34+ UCB cells, isolating nuclei and profiling their BrdU incorporation by staining with Hoechst at 0h, 24h, 48h and 72h (Supplementary Fig. 37; see ref. 36 for more details on BrdU timecourses). Based on these profiles, the 48h timepoint came closest to 100% of cells having completed a single cell division (i.e. a Hoechst fluorescence at ~50% of the 0 BrdU control). However, given the heterogeneity of cells within HSPCs, and the possibility of variability among donors, we settled on a more conservative timepoint of 45 hours for BrdU incorporation in our Strand-seq experiments.

Characteristics of de novo mSVs in HSPCs
In 1 out of every 43 cells, regardless of donor age, we identify a singleton mSV. We scrutinized these singleton mSVs by utilizing single-cell tri-channel processing (scTRIP), the underlying principle of which is that each structural variant is characterized and discerned by a specific 'diagnostic footprint' 37. The footprint encapsulates the co-segregation patterns of rearranged DNA segments, identified by sequencing single strands of each chromosome in each cell in a haplotype-resolved manner, using Strand-seq.
Our analysis revealed that singleton mSVs, but not subclonal mSVs, bear the following characteristics indicative of de novo DNA rearrangement: (1) Amongst the 32 singleton mosaicisms we identify in our HSPC single-cell genomic dataset, 21 (66%) display terminal gains or losses confined to a single haplotype. In contrast, subclonal mSVs lack terminal rearrangements entirely and instead exhibit a significant enrichment for interstitial rearrangements (P=0.0004; Fisher's exact test) when compared to singleton mSVs. These terminal rearrangement footprints of singleton mSVs are depicted, along with all other singleton mSV events, in Supplemental Data File 1 as well as in Fig. 1c. The frequent occurrence of terminal losses and gains in singleton mSVs suggests that the derivative chromosomes emerging from these rearrangements often lack telomeric stabilization events, which may increase the likelihood that more DNA rearrangements accumulate in these cells (Fig. 1c,f) 38.
(2) Three singleton mSVs exhibit characteristics of complex chromosomal rearrangements, encompassing mSVs triggered by breakage-fusion-bridge (BFB) cycles 37,39 and an amplification induced by terminal sister chromatid fusion, leading to a sevenfold increase in copy number (Fig. 1c). The latter rearrangement event could stem from a BFB process occurring in the absence of telomere stabilization 38.
(3) Our observations, as delineated in Fig. 1j, Supplementary Fig. 5 and in the main text, show instances of SCEs occurring in the same cell and haplotype directly at the breakpoints of singleton mSVs (we did not, by comparison, observe significant colocalisation with SCEs for the breakpoints of subclonal mSVs). This indicates a link between SCE formation and DNA rearrangement processes resulting in mSV formation 40,41.
(4) On average, singleton mSVs are ~17.6 times larger than subclonal mSVs (mean size of 36.9 and 2.1 Megabasepairs (Mb), respectively; P=0.0009, Wilcoxon rank-sum test; Fig. 1d). Due to their substantial size, these mSVs result in significant autosomal aneuploidy, often tens of megabasepairs in length, which is likely to be detrimental to their clonal expansion given the adverse effects of autosomal aneuploidy in normal cells 42.
We therefore infer that the singleton mSVs identified in our HSPC dataset predominantly represent de novo mSV formation events. Considering their distinct characteristics, such as substantial regions of autosomal aneuploidy and a likely lack of telomeric functionality on the affected homolog, it is plausible that most newly formed mSVs are incapable of reaching considerable subclonal frequencies in normal HSPCs.

Comparison with prior surveys of mosaic copy-number alterations and mSVs
Placing our findings into the larger scope of research on clonal hematopoiesis and its previously known association with mosaic CNAs (i.e. copy-imbalanced mSVs), we find that the subclonal mSVs identified in our study through single-cell genomic sequencing (Strand-seq) are significantly smaller (Supplementary Fig. 31) than CNAs detected in surveys of blood cells in bulk (primarily pursued using microarray-based hybridization) 22-30. This observation is consistent with the high genomic resolution of Strand-seq, which exceeds that of bulk approaches for detecting certain subclonal structural variant classes, particularly sub-Megabase sized somatic variants 37. By comparison, the subclonal mSVs detected in our study fall into the same size range as those reported in one study, by Mitchell and colleagues 23, who undertook WGS of clones derived from single HSPCs. These data suggest that the mSVs identified in our study as well as by Mitchell et al. may have escaped detection in prior CNA-focused studies using bulk hybridization-based assays. We note that Mitchell et al. and our study are the only two surveys specifically examining HSPCs, rather than the peripheral blood, leaving open the possibility that different cell types are differentially impacted by mSVs.
To bolster our findings relating to common fragile sites (CFSs) and their association with mSVs, we performed additional analyses on the aforementioned prior data on mosaic CNAs 22-30. We permuted breakpoints from previously reported mosaic CNAs against the SCE hotspots identified in our study, identifying a similar trend to that in our own data, whereby mosaic CNA breakpoints demonstrate significant local enrichment at SCE hotspots (Supplementary Fig. 31). This observation suggests that genomic loci in HSPCs that are predisposed to mSV formation also denote fragile regions prone to CNAs in peripheral blood.
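The permutation logic described above can be illustrated with a minimal sketch. It is not the procedure used in this study: the toy coordinates, the single linear "genome", the uniform resampling of breakpoint positions and the function names are all assumptions made for illustration. A real analysis would need to respect chromosome boundaries, mappability and breakpoint uncertainty.

```python
import random

def count_in_hotspots(positions, hotspots):
    """Count positions that fall inside any half-open hotspot interval [start, end)."""
    return sum(any(start <= pos < end for start, end in hotspots) for pos in positions)

def permutation_p_value(breakpoints, hotspots, genome_length, n_perm=1000, seed=0):
    """Empirical one-sided P-value for enrichment of breakpoints at hotspots.

    Breakpoints are repeatedly redrawn uniformly along a hypothetical linear
    genome, and the observed overlap count is compared with the resampled counts.
    """
    rng = random.Random(seed)
    observed = count_in_hotspots(breakpoints, hotspots)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = [rng.randrange(genome_length) for _ in breakpoints]
        if count_in_hotspots(shuffled, hotspots) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the empirical P-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)

# Toy data (hypothetical): hotspots cover 2% of a 1,000-unit genome.
hotspots = [(100, 110), (500, 510)]
enriched = [101, 103, 105, 107, 109, 501, 503, 505, 507, 509]
depleted = [10, 20, 30, 40, 50, 60, 70, 80, 90, 95]

p_enriched = permutation_p_value(enriched, hotspots, genome_length=1000)
p_depleted = permutation_p_value(depleted, hotspots, genome_length=1000)
print(p_enriched, p_depleted)
```

With the add-one correction, the smallest attainable P-value is 1/(n_perm + 1), so the number of permutations bounds the resolution of the test.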

Investigation of genes associated with local effect of subclonal inversion in BM65
Amongst the genes our analysis associates with a local effect of the subclonal inversion in BM65, three genes (AR, the top hit, as well as HDAC8 and MAGEE1) are inferred to be more active in mSV cells, whereas the remaining 10 genes are inferred to be downregulated in mSV cells. Five of the 13 genes (AR, EDA2R, SLC7A3, HDAC8, and CHIC1) show expression in HSPCs according to previously published bulk RNA-seq data from HSPCs.

Investigation of functional links between dysregulated TFs in the 17p-Del subclone in BM712
Prior reports have tied Srebf1 knockout in murine blood cells to an increase in primitive HSPCs 43, and our findings closely mirror these data, with 17p-Del cells showing enrichment for both HSCs and CMPs (Fig. 5a). Protein-protein interaction mapping of the dysregulated TFs using STRING 35 (Supplementary Methods) revealed significant functional associations between SREBF1 and six additional TFs, which suggests that SREBF1 cooperates with functionally-related TFs that become dysregulated in association with 17p-Del (P=3.57e-08; Supplementary Fig. 21). Collectively, these data implicate the somatic hemizygous loss of SREBF1 as a putative driver of gene dysregulation in 17p-Del-bearing cells. SREBF1 (also known as SREBP1) has been reported to induce PPARG expression 44,45. Furthermore, PPARG activates CREB1 expression by binding its promoter 46,47, in line with the tight functional connections between the TFs identified in the STRING-based PPI analysis, and in support of a possible causal role of SREBF1 deletion in mediating the molecular phenotype seen in BM712.

Potential small deletions at regions of recurrent SCE/mSV formation
We observed marked localized 'fragility' at the FRA3B locus in donor BM762 (Fig. 1j). Although this donor showed similar SCE counts to the other samples (Extended Data Fig. 1), nine single cells exhibit an SCE, an mSV, or both within a 500 kb region of this CFS. One cell shows two SCEs at FRA3B, one on each homolog, while another harbors a terminal deletion originating from the same locus (Fig. 1j, Supplementary Fig. 5). A closer look at FRA3B at sub-Mb resolution reveals potential small deletions (<200 kb), which are below our mSV discovery resolution 37, aligning with SCEs in the same cells (Fig. 1j; Supplementary Fig. 5). Manual inspection shows that these putative small deletions arise in 37.5% (3/8) of cells with an SCE versus 1.89% (1/53) without an SCE (P=0.0055; Fisher's exact test).
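As a worked illustration of the test named above, the following sketch computes a two-sided Fisher's exact test from the counts stated in the text (3 of 8 cells with an SCE versus 1 of 53 cells without). The 2x2 table layout and the function name are assumptions, and the implementation is a generic textbook version rather than the software used in this study. The result is of the same order as the reported P-value but need not match it to the last digit.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(
        prob(k) for k in range(lowest, highest + 1)
        if prob(k) <= observed * (1 + 1e-9)
    )

# Rows: cells with / without an SCE at FRA3B.
# Columns: putative small deletion present / absent.
p_value = fisher_exact_two_sided(3, 5, 1, 52)
print(f"P = {p_value:.4f}")
```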

Analysis of somatic SNVs from the IntOGen Clonal Hematopoiesis Mutation Browser
We analysed data from the IntOGen Clonal Hematopoiesis Mutation Browser

Analysis of somatic SNVs affecting the AR gene in the UK Biobank
The AR gene has a sex-specific molecular biology 50, and shows highly sex-specific SNV distributions in the UK Biobank cohort (for example, AR pLoF SNVs are seen in females (Fig. 6d), whereas they are essentially absent in males from the UK Biobank cohort). Therefore, our comprehensive analysis of presumed mosaic SNVs within the AR gene employed a sex-specific approach. We included sex as a covariate in the multiple linear regression model; in addition, we built two separate models for males and females, respectively. We focused our analysis of the AR gene on females, as the mosaic inversion was identified in a female (BM65).

Singleton CNA discovery in HSPCs in scWGS data
We analyzed intermediate coverage scWGS data generated following the fragmentation of single cell genomes with MNase (which cuts the human genome in a highly uniform manner) to explore the frequency of large copy-number imbalanced SVs (CNAs) in the absence of cell culturing and BrdU application. We performed copy-number segmentation using DNAcopy 32, with standard parameter settings, to call CNAs among 480 scWGS cells, with each scWGS cell corresponding to a cell sequenced using the scMNase-seq protocol outlined in the Methods section 33,34. From the 480 single-cell libraries, we find 20 CNAs in 16 cells (3.3%) (see examples shown in Supplementary Fig. 30). In the case of Strand-seq, we find 32 singleton mSVs in 1133 single-cell libraries, a frequency (2.8%) similar to the singleton CNA frequency seen in these intermediate coverage scWGS data. In conclusion, singleton mSV proportions identified in HSPCs using Strand-seq are in line with findings obtained from scWGS libraries generated without the incorporation of BrdU. Therefore, BrdU incorporation is unlikely to have a significant impact on the singleton mSV frequencies seen in HSPCs.
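The two frequencies quoted above can be recomputed from the stated counts. The sketch below also attaches approximate 95% Wilson score intervals to show how much sampling uncertainty surrounds each estimate. The interval calculation is an illustrative addition and not an analysis reported in this study. The variable and function names are assumptions.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Counts taken from the text.
scwgs_cells_with_cna, scwgs_total = 16, 480       # scMNase-seq libraries
strandseq_singletons, strandseq_total = 32, 1133  # Strand-seq libraries

scwgs_freq = scwgs_cells_with_cna / scwgs_total
strandseq_freq = strandseq_singletons / strandseq_total
scwgs_ci = wilson_interval(scwgs_cells_with_cna, scwgs_total)
strandseq_ci = wilson_interval(strandseq_singletons, strandseq_total)

print(f"scWGS:      {scwgs_freq:.1%}  CI {scwgs_ci[0]:.1%} to {scwgs_ci[1]:.1%}")
print(f"Strand-seq: {strandseq_freq:.1%}  CI {strandseq_ci[0]:.1%} to {strandseq_ci[1]:.1%}")
```

Overlapping intervals are compatible with similar underlying frequencies, although overlap alone is not a formal test of equality.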

Analysis of data release from a CRISPR in vivo knockout (KO) screen
During the revision of our manuscript, we reanalyzed CRISPR knockout (KO) screens performed in vivo in a mouse model, reported in a manuscript recently posted by Haney and colleagues 31, to bolster observations made with respect to the functional consequences of candidate genes in HSPCs suggested in our study. These screens were conducted using expanded primary mouse hematopoietic stem and progenitor cells, targeting ~7,000 genes from various functional categories including transcription factors, kinases, phosphatases, drug targets, and genes linked to apoptosis and cancer 31.
The KO screen under consideration included the Nf1 gene but did not encompass knockout of Srebf1 or activation of the Ar gene. Analysis of this screen data release (see www.hematopoiesis crispr screens.com) suggests that Nf1 KO promotes HSPC differentiation into mature cells in vivo, preferentially those of the myeloid lineage (P<0.00001, effect score=5.6, for myeloid cell enrichment). Remarkably, Nf1 is amongst the most myeloid-enriched genes detected based on this data release, ranking 4th among ~7,000 screened target genes based on P-value (Supplementary Fig. 32). These KO screen data hence further corroborate the data from our Strand-seq experiments (Fig. 5a), supporting our conclusion that Nf1 disruption may bias CD34+ cells towards myeloid lineages.

Interplay between mSVs and clonal hematopoiesis
Another outstanding question regarding mSVs and CH pertains to the potential synergy between mSVs and SNV mosaicism in association with CH 51,52. Though not our primary focus, we analyzed surplus material from 6 of the 19 donors for common driver SNVs tied to clonal hematopoiesis using high coverage gene panel sequencing (Supplementary Table 1). Only one of these 6 donors (BM762) displayed a detectable SNV (TET2 p.W1291C, CF=39.4%); this donor also had a low-frequency (CF=3.2%) X chromosome loss. In contrast, BM70, which had multiple mSVs, showed no common driver SNVs. These data are consistent with a recent study in 628,388 blood donors showing a lack of association between common driver SNVs in CH and mosaic CNAs after adjusting for age, sex, and smoking status 53, and suggest that CH driver SNVs and mSVs emerge as separate events in the blood compartment.
Prior reports have associated mosaic CNAs in blood with CH 22,28,54,55, an age-related phenomenon where HSPCs contribute to genetically distinct blood cell subpopulations. Our findings imply that subclonal mSVs, seen in 36% of donors over 60 years, commonly impact HSPC function by affecting diverse genomic loci, including genes with known or suspected roles in clonal hematopoiesis 48. The prevalence of this class of mosaicism implies that the cumulative phenotypic impact of mSVs on specific tissues or organs could potentially parallel that of SNVs, a finding that underscores the necessity for future studies in larger cohorts.

Figure S9: Simulation analysis of cell-type bias for different subclone sizes. Each pyramid plot shows the number of cells in each cell type (Y-axis) required to statistically demonstrate a cell-type bias for a given mSV clone size (X-axis). This simulation analysis was performed for mSV clone sizes ranging from 1 to 72, for the five cell types used in the cell-type enrichment analysis in Fig. 3c-e.
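The question addressed by this simulation (how many cells of one type a clone of a given size must contain before a bias becomes detectable) can be sketched with a one-sided binomial test. This is a simplified illustration only. The background cell-type fraction of 0.2, the significance threshold and the function names are hypothetical, and the permutation-based adjustment of P-values described in the Methods is not reproduced.

```python
from math import comb

def upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

def min_cells_for_bias(clone_size, background_fraction, alpha=0.05):
    """Smallest number of cells of one type that yields a significant enrichment.

    Returns None when even a clone consisting entirely of that cell type
    would not reach significance at the chosen threshold.
    """
    for k in range(clone_size + 1):
        if upper_tail(k, clone_size, background_fraction) <= alpha:
            return k
    return None

# Hypothetical background: the cell type makes up 20% of all sequenced cells.
background = 0.2
for size in (1, 2, 5, 10, 72):
    print(size, min_cells_for_bias(size, background))
```

Very small clones cannot reach significance regardless of their composition, which is why clone size constrains the cell-type biases that can be detected.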