Pharmacological inhibition of METTL3 impacts specific haematopoietic lineages

cation

of 52,424 transcriptomes passed these quality controls from 4 vehicle-treated (n= 8,113,7,585,6,756,6,656) and 3 STM2457-treated (n= 7,931, 7,731, 7,652) animals and were taken forward for subsequent analysis in an integrated manifold comprising all samples from vehicle and STM2457-treated animals. Counts were log-normalized and 1,105 highly variable genes (HVG) were selected following the method of Satija et al (3) implemented through Scanpy. Counts were scaled and cells scored and allocated to a cell cycle phase by calculating the difference in mean expression between a randomly sampled reference gene set and the given list of cell cycle genes (https://github.com/theislab/scanpy_usage/blob/master/180209_cell_cycle/data/rege v_lab _cell_cycle_genes.txt). The effect of cell cycle was regressed out to reduce impact on visualisation and clustering. Cell-cycle regressed values of 1,105 highly variable genes were used to compute 50 principal components. Correction for the batch effect of sample identity was performed using Harmony(4) and the top 30 Harmony batch-corrected principal components were utilised to calculate a k-nearest neighbour graph with k=15 and a Euclidean distance metric. This was then taken as inputs to visualise the data as a UMAP embedding (5). Data were clustered using the Leiden algorithm in Scanpy, following the implementation of Traag (6). The integrated dataset was mapped onto a previously published reference wild-type droplet-based scRNA-seq dataset of 44,802 mouse lineage negative c-kit+ (LK) cells(7) as previously described (8). Marker gene expression and projection of the dataset onto a reference LK dataset allowed cell type annotation of the 20 clusters ( Figure 1C, Suppl Fig1 D-G). Peripheral clusters containing contaminating mature cell types, and clusters containing fewer than 1,000 cells (<2% of total) were not considered in downstream cluster analysis. As a surrogate of effective METTL3 inhibition, the Scanpy command sc.tl.score_genes() was used to compute a gene score per cell for genes included in the GO term GO:GO:0051607 (defence to virus). Analogous to the calculation of cell cycle score, the gene score calculates the mean expression difference between a given gene list and a randomly sampled gene list.

Pseudobulk differential expression analysis
A pseudobulk method(9) was employed to perform differential gene expression analysis for the dataset as a whole and for individual clusters (10). Low count genes were excluded on a per-sample or per-cluster basis, retaining only genes with an estimated expression above 1 transcript per million in more than 25% of cells (11). Differential expression between treatment conditions was computed using the likelihood ratio test in EdgeR (12,13) with experimental batch used as a covariate in the model. Genes with log2 fold changes >0.5 or <-0.5 and Benjamini-Hochberg (BH) adjusted p values <0.05 were considered significant. Gene set enrichment analysis using gene lists ordered by BH-adjusted p value were used as input for gene set enrichment analysis in EnrichR (14).

Differential cell abundance analysis
Treatment-related differences in cellular abundance were computed using the python package MELD (15). First, a cell-similarity graph was calculated based on Euclidean distance in the shared PCA space of the integrated dataset. Then for each sample a kernel density estimate (KDE) was computed and smoothed over the cell similarity graph with 8 nearest neighbours and smoothing parameter beta = 15. Differential abundance was then calculated as the relative likelihood of observing a cell in either the STM2457-or vehicle-treated condition following cell-wise L1 normalization of the sample KDEs. The mean of relative likelihoods from all pairwise comparisons was used to plot differential abundance on the UMAP embedding ( Figure 1D), while comparison of the KDEs between treatments within experimental batch permitted calculation of statistical significance. An independent t-test was used to calculate significance, corrected for multiple testing with the Benjamini-Hochberg method. Differential abundance with a BH-corrected p value <0.2 was considered significant.

Cell fate probability analysis
Cell fate probability was computed using the python package CellRank (16). Root cells were assigned by calculating the HSC score (17) to infer the cells with the transcriptome most highly enriched for long term repopulating HSC genes. Diffusion pseudotime values were calculated using the scanpy function 'scanpy.tl.dpt' with default settings. For each sample a single-cell transition matrix was computed using the CellRank pseudotime kernel and 'soft' threshold scheme. Terminally differentiated macrostates were identified using the Generalized Perron Cluster Cluster Analysis (GPCCA) estimator, and absorption (fate) probabilities towards these states were then computed. These inferred fate probabilities were then compared for each cluster between treatment conditions using the Seurat 'find markers' function (18) to perform logistic regression and a likelihood ratio test. Experimental batch was included in the logistic regression model. Cell fate probability differences were considered significant if Benjamini-Hochberg adjusted p values were <0.05.

Trajectory-based differential gene expression dynamics
To identify potential gene-expression correlates of altered cell fate, gene expression dynamics were compared along pseudotime using the R package tradeSeq (19). Using cell fate probabilities computed previously, cells were assigned to seven trajectories (Erythroid, Neutrophil, Monocyte, Megakaryocyte, Lymphoid, Basophil and Mast Cell). Each cell was assigned to a trajectory if its cell fate probability for that trajectory was either the maximum or 0.6 times the maximum fate value. This permitted cells to be assigned to more than one trajectory at lower pseudotime values, indicating a more immature progenitor state. The erythroid and neutrophil trajectories were then selected for downstream analysis. To exclude a small number of outliers, cells were excluded from the erythroid trajectory if they were not also assigned to the HSC, MEP or early/late erythroid clusters, and excluded from the neutrophil clusters if they were not also assigned to the HSC, immature, myeloid, neutrophil or monocyte clusters. To ensure sufficient representation along pseudotime in the erythroid trajectory, the pseudotime range 0-0.6 was included. TradeSeq was then used to fit a negative binomial generalized additive model (fitGAM function, 6 knots) with experimental batch included as a covariate in the model. Expression patterns along lineage-specific pseudotime were then compared between treatment conditions using the 'condition test' function which applies a Wald test to the parameters of the fitted model to test a null hypothesis that these parameters do not differ between the two conditions. Genes with log2 fold changes of >1.0 and Benjamini-Hochberg adjusted p values <0.05 were considered significant.
RNA nucleoside quantification by mass spectrometry 8 to 12-week-old female C57BL/6 mice were treated daily for two weeks with either vehicle or 50 mg/kg STM2457 (STORM) and whole bone marrow from these mice was harvested. Subsequent total RNA extraction from cells of all origins was performed using QIAZOL Reagents (QIAGEN), following the manufacturer's protocol. Nucleosides were prepared from total RNA by addition of nuclease digest mix. Each 100 µL volume contained 62.5 units of Benzonase (Sigma Aldrich), 5 units of Antarctic Phosphatase (NEB) and 10 mU/µL of Phosphodiesterase I (PDEI) from Crotalus adamanteus venom (Sigma Aldrich) made up in a buffer composed of 20 mM Tris-HCl (pH 8), 20 mM MgCl2 and 100 mM NaCl. Nucleosides were liberated by digestion at 37ºC overnight. The following morning the sample was cooled to room temperature and 100 µl of ice cold 2x Mass Spec buffer was added (0.1% formic acid containing internal standard). Nucleosides were then quantified by LC-MS using a Sciex 4500 triple quadrupole mass spectrometer attached to either a U3000 or 1290 liquid chromatograph (Thermo Scientific or Agilent, respectively). Nucleosides were separated across a HSS T3 column (2mm x 10mm with 1.8 µm particles, Waters) using an increasing gradient of 2-15% mobile phase B. The mobile phases were 0.1% v/v formic acid in water or acetonitrile for mobile phase A and B, respectively, and the flow was held at 300 µL/min. Nucleoside concentrations were extrapolated from a concentration curve of external standards and expressed as modified nucleoside relative to the total amount of canonical nucleosides. All data in this section were plotted using GraphPad Prism (Version 9).

Statistics and visualisation
Statistical analysis was performed using the python package statsmodels. Unless otherwise specified, t-tests were independent student's t-tests. Homogeneity of variance was first determined using Levene's and Bartlett's tests. Analysis and visualisation were also performed in GraphPad Prism. Visualisations were generated using Matplotlib, Seaborn and ggplot2.

Data accessibility
The scRNA-seq datasets generated and analysed during the current study are available under the GEO accession number: GSE228562.