A genetic disorder reveals a hematopoietic stem cell regulatory network co-opted in leukemia

Voit, Richard A.; Tao, Liming; Yu, Fulong; Cato, Liam D.; Cohen, Blake; Fleming, Travis J.; Antoszewski, Mateusz; Liao, Xiaotian; Fiorini, Claudia; Nandakumar, Satish K.; Wahlster, Lara; Teichert, Kristian; Regev, Aviv; Sankaran, Vijay G.

doi:10.1038/s41590-022-01370-4

Article
Open access
Published: 15 December 2022

Richard A. Voit ORCID: orcid.org/0000-0002-6790-8641^1,2,3^na1,
Liming Tao³^na1^nAff7,
Fulong Yu ORCID: orcid.org/0000-0002-6100-8300^1,2,3^na1,
Liam D. Cato^1,2,3,
Blake Cohen^1,2,3,
Travis J. Fleming^1,2,3,
Mateusz Antoszewski ORCID: orcid.org/0000-0002-7651-0547^1,2,3,
Xiaotian Liao^1,2,3,
Claudia Fiorini^1,2,3,
Satish K. Nandakumar^1,2,3^nAff8,
Lara Wahlster^1,2,3,
Kristian Teichert^1,2,3,
Aviv Regev^3,4,5^nAff7 &
Vijay G. Sankaran ORCID: orcid.org/0000-0003-0044-443X^1,2,3,6

Nature Immunology volume 24, pages 69–83 (2023)

Abstract

The molecular regulation of human hematopoietic stem cell (HSC) maintenance is therapeutically important, but limitations in experimental systems and interspecies variation have constrained our knowledge of this process. Here, we have studied a rare genetic disorder due to MECOM haploinsufficiency, characterized by an early-onset absence of HSCs in vivo. By generating a faithful model of this disorder in primary human HSCs and coupling functional studies with integrative single-cell genomic analyses, we uncover a key transcriptional network involving hundreds of genes that is required for HSC maintenance. Through our analyses, we nominate cooperating transcriptional regulators and identify how MECOM prevents the CTCF-dependent genome reorganization that occurs as HSCs differentiate. We show that this transcriptional network is co-opted in high-risk leukemias, thereby enabling these cancers to acquire stem cell properties. Collectively, we illuminate a regulatory network necessary for HSC self-renewal through the study of a rare experiment of nature.

Main

HSCs lie at the apex of the hierarchical process of hematopoiesis and rely on transcriptional regulators to coordinate self-renewal and lineage commitment to enable effective and continuous blood cell production¹. Perturbations of HSC maintenance or differentiation result in a spectrum of hematopoietic consequences, ranging from bone marrow failure to leukemia². Despite the importance of HSCs in human health and the therapeutic opportunities that could arise from being able to better manipulate these cells, the precise regulatory networks that maintain these cells remain poorly understood.

Recently, loss-of-function mutations in the myelodysplastic syndrome (MDS) and ecotropic virus integration site-1 (EVI1) complex locus (MECOM) have been identified that lead to a severe neonatal bone marrow failure syndrome^3,4,5. Haploinsufficiency of MECOM leads to near complete loss of HSCs within the first months of life, suggesting an important and dosage-dependent role in early hematopoiesis. In mice, different Mecom isoforms have distinct hematopoietic functions^6,7,8,9,10, but the ability of Mecom haploinsufficient mice to maintain sufficient hematopoietic output stands in sharp contrast to the profound and highly penetrant HSC loss observed in patients with MECOM haploinsufficiency, irrespective of which isoform is impacted. This interspecies variation suggests that the clinical observations in MECOM haploinsufficiency may provide a unique opportunity to better understand human HSC regulation.

MECOM overexpression has been reported in ~10% of adult and pediatric acute myeloid leukemias (AMLs) and is associated with a particularly poor prognosis¹¹. Despite the potential mechanisms of MECOM activity that have been suggested from studies in AML cell lines^12,13,14,15, the holistic functions of MECOM that enable effective human HSC maintenance and drive leukemia remain enigmatic. Here, inspired by in vivo observations from patients who are MECOM haploinsufficient, we have modeled this disorder by genome editing of primary human CD34⁺ hematopoietic stem and progenitor cells (HSPCs). Through integrative single-cell genomic analyses in this model, we define fundamental transcriptional regulatory circuits necessary for human HSC maintenance. Finally, we demonstrate that this same HSC transcriptional regulatory network is co-opted in AML, thereby conferring stem cell features and a poor prognosis.

Results

MECOM loss impairs HSC function in vitro and in vivo

Monoallelic mutations spanning the coding sequence of MECOM have been reported in at least 31 individuals with severe, early-onset neonatal bone marrow failure (Fig. 1a, Supplementary Table 1 and Extended Data Fig. 1a,b)^3,4,5. The paucity of HSCs associated with MECOM haploinsufficiency prevents the mechanistic study of primary patient samples⁴, so we sought to develop a model to study MECOM haploinsufficiency in primary human cells by disrupting MECOM via CRISPR editing in CD34⁺ HSPCs purified from umbilical cord blood (UCB) samples of healthy newborns (Fig. 1b and Extended Data Fig. 1a,c,d). We achieved editing at >80% of alleles in the bulk CD34⁺ population, but the subpopulation of CD34⁺CD45RA⁻CD90⁺CD133⁺EPCR⁺ITGA3⁺ phenotypic long-term HSCs (LT-HSCs)¹⁶ displayed 48% editing of MECOM alleles (Fig. 1c), allowing for predominantly heterozygous edits in the LT-HSC compartment. Genotyping of single LT-HSCs following MECOM perturbation confirmed that 70% were heterozygous for MECOM edits (Fig. 1d), although this is likely an underestimation given that allelic dropout is common in single-cell genotyping¹⁷. These edits were transcribed to messenger RNA, but transcript levels were reduced, possibly due to nonsense-mediated decay¹⁸ (Extended Data Fig. 1e–g).

**Fig. 1: Generating a faithful model of *MECOM* haploinsufficiency and HSC loss.**

MECOM-edited human HSPCs underwent 1.9-fold higher expansion over 5 d in culture conditions that promote HSC maintenance¹⁹ (Extended Data Fig. 1h,i), consistent with previous observations of differentiation and expansion of HSCs after MECOM loss⁸. MECOM perturbation was associated with a decrease in the proportion of bulk cells in G0/G1 on day 5, but no difference in the cell cycle states of HSCs (Extended Data Fig. 1j). Most HSCs remained in G0/G1 and the majority of LT-HSCs had G0/G1 transcriptional signatures (Extended Data Fig. 1k), as previously reported²⁰. MECOM editing resulted in more frequent cell divisions (Extended Data Fig. 1l) and a significant reduction in the absolute number of LT-HSCs (Extended Data Fig. 1m), with a 3.7-fold reduction by day 10 after editing (Fig. 1e,f). We observed a 6.4-fold reduction in multipotent colony-forming unit (c.f.u.) granulocyte erythroid macrophage megakaryocyte (GEMM) colonies and a 3.8-fold reduction in bipotent c.f.u. granulocyte macrophage (GM) colonies, along with increases in more differentiated unipotential c.f.u. granulocyte (G) and c.f.u. macrophage (M) colonies (Fig. 1g). There was a similar loss of multipotent and bipotent progenitor colonies derived from adult HSPCs following MECOM editing (Extended Data Fig. 1n), validating the importance of this factor across developmental stages.

Next, we performed non-irradiated xenotransplantation of edited HSPCs into immunodeficient and Kit-mutant (Methods) mice to assess how MECOM loss impacts human HSCs in vivo²¹. MECOM-edited HSPCs engrafted in only half of the transplanted animals with significantly lower human chimerism in the peripheral blood and bone marrow compared to AAVS1-edited controls (Fig. 1h). When we compared the edited allele frequency of cells collected from the bone marrow at 16 weeks with the cells before transplant, we found a fivefold enrichment of the unmodified MECOM allele (Fig. 1i and Extended Data Fig. 1o,p), consistent with selection occurring against MECOM-edited HSCs. In the mouse bone marrow, there was a 2.7-fold reduction in human CD34⁺ HSPCs in the MECOM-edited samples, but no detectable differences in engrafted lymphoid, erythroid, megakaryocytic or monocytic lineages (Fig. 1j). Similarly, we found significant reduction in human chimerism following primary xenotransplantation of adult HSPCs following MECOM editing (Extended Data Fig. 1q). When we performed secondary xenotransplantation of UCB HSPCs, we observed moderate secondary engraftment of AAVS1-edited cells (two of five mice), but no detectable secondary engraftment of MECOM-edited cells (zero of eight mice). To more sensitively assay for the presence of human cells in the secondary transplant recipients, we PCR-amplified human MECOM from all bone marrow samples. Sequencing revealed 100% wild-type MECOM in seven of eight secondary recipients and 95% in the remaining mouse (Extended Data Fig. 1r). This near complete absence of MECOM edits in serially repopulating LT-HSCs is consistent with the profound HSC loss observed in patients with MECOM haploinsufficiency. In summary, our model of MECOM haploinsufficiency reveals that MECOM is required for maintenance of LT-HSC in vitro and in vivo and enables us to capture LT-HSCs before their complete loss to directly study MECOM function.

Single-cell profiling reveals HSC loss after MECOM disruption

Having established a primary human HSC model of MECOM haploinsufficiency, we sought to gain insights into the transcriptional circuitry required for human HSC maintenance by single-cell RNA sequencing (scRNA-seq) before complete HSC loss. Three days after AAVS1 or MECOM perturbation, we sorted CD34⁺CD45RA⁻CD90⁺ HSPCs and performed scRNA-seq using the 10x Genomics platform. We used Celltypist²² to delineate cellular identity based on lineage-specific signatures and identified 11 cell clusters (Fig. 2a), of which only the earliest HSC cluster was significantly depleted after MECOM editing (Fig. 2b,c and Extended Data Fig. 2a). Next we examined cells expressing an HSC molecular signature (CD34, HLF and CRHBP)²³, which is found in a rare subpopulation representing only 0.6% of 263,828 UCB cells from the Immune Cell Atlas (Extended Data Fig. 2b,c). MECOM perturbation led to a significant loss of cells expressing the HSC signature (Fig. 2d,e and Extended Data Fig. 2d). To examine the gene expression changes in this population of transcriptional LT-HSCs, we again edited UCB CD34⁺ HSPCs and sorted for phenotypic CD34⁺CD45RA⁻CD90⁺CD133⁺EPCR⁺ITGA3⁺ LT-HSCs. We found that our sorted phenotypic LT-HSCs are highly enriched for the HSC signature (Fig. 2f and Extended Data Fig. 2e–g). Next, we compared the transcriptomes of 5,935 MECOM-edited and 4,291 AAVS1-edited phenotypic LT-HSCs. Following our stringent immunophenotypic sorting strategy, MECOM-edited LT-HSCs colocalized with AAVS1-edited cells (Fig. 2g). This confirmed that our sorting strategy would allow us to directly compare developmentally stage-matched cells before they are completely lost, to uncover transcriptional changes that underlie the profound depletion of LT-HSCs after MECOM editing.

**Fig. 2: Loss of transcriptional HSCs after MECOM perturbation.**

As an orthogonal approach to simultaneously profile the precise genomic editing outcome and transcriptional profile of LT-HSCs, we employed genome and transcriptome sequencing (G&T-seq)²⁴. MECOM heterozygous cells (Fig. 1d) colocalize with AAVS1-edited cells, as well as the non-genotyped cells examined with the 10x Genomics method (Fig. 2h). These results reveal a high degree of similarity in the high-dimensional transcriptomic analysis of LT-HSCs following MECOM perturbation, as expected given the stringent phenotypic sorting strategy we employed before scRNA-seq analysis. Furthermore, these results suggest that the profound functional consequences of MECOM loss are due to coordinated expression changes in a select group of genes.

MECOM loss in LT-HSCs elucidates a dysregulated gene network

To compare individual gene expression in single LT-HSCs following AAVS1 or MECOM editing, we used model-based analysis of single-cell transcriptomes (MAST)²⁵ (Fig. 3a and Extended Data Fig. 3a,b). Despite the high-dimensional transcriptional similarity in the LT-HSCs, we detected significant downregulation of a group of 322 genes following MECOM editing that we refer to as ‘MECOM down’ genes (Supplementary Table 2), which includes factors with previously described functions in HSC maintenance (Fig. 3a,b). We then used MAST to identify 402 genes that are significantly upregulated after MECOM editing, which we refer to as the ‘MECOM up’ gene set (Supplementary Table 2), which includes key factors expressed during hematopoietic differentiation (Fig. 3a,c). To validate these subtle differences, we performed random permutation analysis and did not detect any differentially expressed genes (Extended Data Fig. 3c,d).
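The logic of the permutation control can be illustrated with a minimal sketch. It assumes a simple per-gene statistic (difference in mean expression) in place of the MAST model used in the study, and the function name and expression values are hypothetical. Shuffling the AAVS1 and MECOM labels destroys any true group difference, so a real effect should yield a small P value while a comparison without a group difference should not.

```python
import random
from statistics import mean

def permutation_p_value(control, edited, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in mean expression.

    Hypothetical helper: a simple stand-in for a label-shuffling control,
    not the MAST model used in the study.
    """
    rng = random.Random(seed)
    observed = abs(mean(edited) - mean(control))
    pooled = list(control) + list(edited)
    n_control = len(control)
    hits = 0
    for _ in range(n_permutations):
        # Shuffle cells across groups, i.e. reassign the labels at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_control:]) - mean(pooled[:n_control]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)

# Toy expression values for a single gene (made-up numbers).
control_cells = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.1, 4.7]
edited_cells = [3.9, 4.1, 3.8, 4.2, 4.0, 3.7, 4.1, 3.9]

p_shifted = permutation_p_value(control_cells, edited_cells)
p_null = permutation_p_value(control_cells, control_cells)
```

In this toy case the shifted gene receives a small P value, whereas comparing a group with itself returns a P value of 1.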

**Fig. 3: Delineation of a MECOM regulatory network in LT-HSCs.**

To minimize the potential confounding influence of allelic dropout, we performed pseudobulk analysis of gene expression changes following MECOM perturbation²⁶. We observed that the MECOM down and up gene sets again represented the most differentially expressed genes with larger expression differences compared to the single-cell analysis (Fig. 3d). To validate that the gene expression differences that we observed in the population of immunophenotypic LT-HSCs accurately represented gene expression changes in molecularly defined LT-HSCs, we examined expression of each differentially expressed gene in the subset of cells with robust expression of the HSC signature. There was significant correlation of gene expression changes in this subpopulation of transcriptionally defined LT-HSCs compared to the total population of immunophenotypic LT-HSCs, demonstrating that MECOM network genes were indeed differentially expressed in cells with a stringent molecular HSC signature (Extended Data Fig. 3e). As further validation of this gene signature, we examined differential gene expression in bulk phenotypic LT-HSCs at days 3, 7 and 10 after MECOM perturbation and detected significant and consistent changes of the MECOM down and MECOM up gene sets at all time points (Fig. 3e).
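A minimal sketch of the pseudobulk idea follows, assuming raw counts are summed per gene across all cells of a sample before comparison. The data layout, gene names, counts and helper names are hypothetical, and the normalization (counts per million with a pseudocount) is one common choice rather than the procedure used in the study.

```python
from collections import defaultdict
from math import log2

def pseudobulk(cells):
    """Sum raw counts per gene within each sample.

    `cells` is a list of (sample_id, {gene: count}) pairs (hypothetical layout).
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sample_id, counts in cells:
        for gene, count in counts.items():
            totals[sample_id][gene] += count
    return {sample: dict(genes) for sample, genes in totals.items()}

def log2_fold_change(bulk, gene, case, control, pseudocount=1):
    """log2 ratio of counts-per-million between two pseudobulk samples."""
    def cpm(sample):
        library_size = sum(bulk[sample].values())
        return 1e6 * bulk[sample].get(gene, 0) / library_size
    return log2((cpm(case) + pseudocount) / (cpm(control) + pseudocount))

# Toy data: many single cells report zero counts for a lowly expressed gene.
cells = [
    ("AAVS1", {"geneA": 2, "housekeeping": 50}),
    ("AAVS1", {"geneA": 0, "housekeeping": 48}),
    ("AAVS1", {"geneA": 3, "housekeeping": 52}),
    ("MECOM", {"geneA": 0, "housekeeping": 49}),
    ("MECOM", {"geneA": 1, "housekeeping": 51}),
    ("MECOM", {"geneA": 0, "housekeeping": 50}),
]

bulk = pseudobulk(cells)
lfc = log2_fold_change(bulk, "geneA", case="MECOM", control="AAVS1")
```

Aggregating counts in this way reduces the influence of zero counts in individual cells, at the cost of losing cell-level resolution.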

Next, we sought to uncover differential gene expression patterns between AAVS1- and MECOM-edited HSPCs in each of the 11 hematopoietic cell clusters identified in our initial scRNA-seq profiling of CD34⁺CD45RA⁻CD90⁺ cells. The MECOM down genes were significantly depleted from the HSC and cycling multipotent progenitor clusters, but not in other early progenitor populations, including megakaryocyte-erythroid progenitors, megakaryocyte-erythroid-mast cell progenitors and common myeloid progenitors. Early megakaryocytes and mast cell progenitors also had differential expression of MECOM down genes (Extended Data Fig. 3f). Combining these data with the observed cell numbers in each cell cluster after MECOM perturbation revealed that only the HSC cluster was depleted (Extended Data Fig. 2a), providing further support for the notion that the MECOM down gene set is crucial for HSC maintenance. Gene set enrichment analysis (GSEA) for the MECOM up genes in each cluster revealed that these genes were significantly enriched in 7 out of the 11 cell clusters (Extended Data Fig. 3f), suggesting that MECOM up genes are expressed in cells undergoing differentiation into multiple lineages. We then evaluated the expression of the MECOM down and up genes during normal hematopoiesis by comparing the enrichment of the gene sets in 20 distinct hematopoietic cell lineages²⁷. Similar to MECOM itself (Fig. 3f), the MECOM down genes are collectively more highly expressed in HSCs and early progenitors (Fig. 3g). Conversely, the MECOM up genes are turned on during hematopoietic differentiation and are more highly expressed in differentiated cells of various lineages (Fig. 3h). Collectively, these analyses reveal that MECOM loss in LT-HSCs leads to functionally significant transcriptional dysregulation in genes that are fundamental to HSC maintenance and differentiation.
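The enrichment statistic underlying GSEA can be sketched as a running sum over a ranked gene list. The version below is the unweighted (Kolmogorov-Smirnov style) form and omits the weighting by ranking metric and the permutation-based normalization that produce a normalized enrichment score, so it is an illustration only. Gene names and the ranking are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (Kolmogorov-Smirnov style).

    `ranked_genes` is ordered from most upregulated to most downregulated.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hits = len(members)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        # Step up at gene-set members, step down otherwise.
        running += 1.0 / n_hits if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Toy ranking of ten genes in edited versus control cells.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10"]
es_up = enrichment_score(ranked, {"g1", "g2"})            # set at the top
es_down = enrichment_score(ranked, {"g8", "g9", "g10"})   # set at the bottom
```

A gene set concentrated at the top of the ranking gives a positive score (enrichment) and one concentrated at the bottom gives a negative score (depletion).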

Increased MECOM expression rescues HSC dysregulation

To confirm that the functional and transcriptional impacts on LT-HSCs are due specifically to reduced MECOM levels, we sought to rescue the phenotype by lentiviral MECOM expression in HSCs after CRISPR editing (Fig. 4a). To avoid unintended CRISPR disruption of the virally encoded MECOM complementary DNA, we introduced wobble mutations in the single guide RNA (sgRNA) binding site in the cDNA (Extended Data Fig. 4a,b). Infection of MECOM-edited HSPCs with MECOM virus led to supraphysiologic levels of MECOM expression (Fig. 4b), which was sufficient to rescue the LT-HSC loss observed after MECOM editing (Fig. 4c,d and Extended Data Fig. 4c,d). Expression of the shorter MECOM isoform EVI1 resulted in a higher percentage of LT-HSCs on day 6, but this increase was blunted by endogenous MECOM editing. Expression of the MDS isoform did not result in rescue of LT-HSCs (Extended Data Fig. 4e). Green fluorescent protein (GFP) is coexpressed with MECOM and we observed a significantly higher ratio of GFP expression in LT-HSCs compared to the bulk population (Fig. 4e), confirming that increased MECOM expression favored LT-HSC preservation. Increased MECOM expression also rescued the loss of multipotent and bipotent progenitor colonies after MECOM editing (Fig. 4f). Together, these data reveal that restoration of the full-length MECOM isoform is sufficient to overcome the functional loss of LT-HSCs caused by endogenous MECOM perturbation.

**Fig. 4: MECOM rescue of functional and transcriptional changes in HSCs.**

Next, we performed RNA-seq of phenotypic LT-HSCs after MECOM editing and rescue. After MECOM perturbation alone, we observed significantly lower expression of the MECOM down gene set compared to a subset of randomly selected genes (Fig. 4g). Similarly, GSEA revealed significant depletion of the MECOM down genes (Fig. 4h). Following rescue by increasing MECOM expression, the MECOM down genes were significantly upregulated (Fig. 4i,j and Supplementary Table 3). While increasing MECOM expression can rescue the impact of MECOM perturbation in short-term in vitro contexts, due to the risk of leukemic transformation driven by constitutive MECOM overexpression¹², it is challenging to assess this rescue of HSC function in vivo.

We did not observe upregulation or subsequent rescue of the MECOM up genes in bulk following MECOM perturbation and overexpression (Extended Data Fig. 4g,h). The MECOM up gene set contains factors important for hematopoietic differentiation. Lentiviral infection may subtly alter this process. Alternatively, the supraphysiologic expression that we obtained may not allow effective regulation of the MECOM up genes. Regardless, these data collectively show that the loss of LT-HSCs after MECOM editing can be rescued with increased MECOM expression and is accompanied by restoration of the MECOM down gene set.

Defining the HSC cis-regulatory network mediated by MECOM

We next sought to define the cis-regulatory elements (cisREs) that control expression of the MECOM network, which underlies HSC self-renewal. To do so, we developed HemeMap, a computational framework to identify putative cisREs and cell-type-specific cisRE-gene interactions by integrating multiomic data from 18 hematopoietic cell populations (Fig. 5a and Extended Data Fig. 5a,b)^28,29,30,31,32. We calculated HemeMap scores based on chromatin accessibility for each cisRE-gene interaction in HSCs and found that the scores were correlated with gene expression (Extended Data Fig. 5c). There was significant overlap of the predicted enhancer–gene pairings from HemeMap with chromatin looping data in hematopoietic progenitors²⁹ and predicted regulatory elements in HSPCs³³. Our cisREs had a strong H3K4me1 signal and DNase hypersensitivity without an H3K27me3 signal, consistent with their likely identities as enhancer elements (Extended Data Fig. 5d). All of the interactions with a significant HemeMap score in HSCs were selected to construct an HSC-specific regulatory network (Extended Data Fig. 5e).

**Fig. 5: Defining the HSC *cis*-regulatory network coordinated by MECOM.**

To identify cooperating transcription factors (TFs) driving expression of the MECOM network genes in HSCs, we performed unbiased motif discovery within the MECOM network cisREs and found six significantly enriched motifs: ETS, RUNX, JUN, KLF, CTCF and GATA (Fig. 5b). The ETS family motif (AGGAAGT) was most highly enriched and can be bound by several hematopoietic TFs, including FLI1, ERG, ETV2 and ETV6 (ref. ³⁴). Additionally, the experimentally determined binding motif of EVI1 in AML¹³ is a near-perfect mimic of our nominated ETS motif, suggesting that many of these cisREs may be directly occupied by MECOM (Fig. 5c). Notably, HemeMap scores were significantly higher in cisREs with ETS motifs compared to those without (Extended Data Fig. 5f).

Next, we performed digital genomic footprinting analyses to predict TF occupancy in HSCs (Supplementary Tables 4 and 5 and Fig. 5d). We observed a significant co-occurrence of footprints across TF pairs, with a particular enrichment of overlap between ETS with RUNX, JUN and GATA footprints, suggesting cooperativity between these TFs (Fig. 5e and Extended Data Fig. 5g,h). We evaluated specific TF binding to the MECOM network cisREs by integrating TF ChIP-seq data from human HSPCs³⁵. Consistent with the footprinting analysis, we found highly enriched TF occupancy of the ETS family member FLI1, as well as RUNX1 and GATA2 in HSPCs (Fig. 5f). These ChIP-seq data are derived from bulk CD34⁺ HSPCs, so while they provide a general indication of TF binding in HSPCs, there may be important differences in TF binding in LT-HSCs. As further evidence of TF cooperativity, we found that FLI1, RUNX1 and GATA2 have significant co-occupancy at the MECOM-regulated gene cisREs in HSPCs (Fig. 5g). Additionally, we examined EVI1 binding data from overexpression studies¹⁴ and found significant overlap with cisREs that contain ETS footprints (Extended Data Fig. 5i). These analyses from heterogeneous populations of hematopoietic progenitors provide support for our model of cooperativity between MECOM and other hematopoietic TFs (these datasets are summarized in Supplementary Table 6).
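One common way to assess whether footprints of two TFs co-occur more often than expected by chance is a one-sided hypergeometric test, sketched below. This is an assumption for illustration: the statistical test used for the co-occurrence analysis is not stated in this section, and the counts and function name are hypothetical.

```python
from math import comb

def cooccurrence_p_value(n_total, n_a, n_b, n_both):
    """One-sided hypergeometric P(overlap >= n_both).

    n_total: number of cisREs considered
    n_a, n_b: cisREs containing a footprint of factor A or factor B
    n_both: cisREs containing footprints of both factors
    """
    upper = min(n_a, n_b)
    denominator = comb(n_total, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_both, upper + 1)
    )
    return tail / denominator

# Hypothetical counts: 1,000 cisREs, 200 with factor A and 150 with factor B.
# Under independence about 30 shared cisREs would be expected.
p_enriched = cooccurrence_p_value(1000, 200, 150, 60)
p_expected = cooccurrence_p_value(1000, 200, 150, 30)
```

With these made-up counts, an overlap of 60 is far above the chance expectation and yields a very small P value, whereas an overlap of 30 does not.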

Dynamic CTCF binding represses MECOM down genes

In addition to the enrichment of HSC TF motifs, the MECOM network cisREs showed CTCF motif enrichment. CTCF is a regulator of three-dimensional genome organization and acts by anchoring cohesin-based chromatin loops to insulate genomic regions of self-interaction³⁶. Recently, CTCF has been implicated in regulating HSC differentiation by altering looping to silence key stemness genes³⁷, while also cooperating with lineage-specific TFs during hematopoietic differentiation³⁸. Therefore, we hypothesized that CTCF plays a role in mediating the differential expression of MECOM down genes following loss of MECOM.

We uncovered CTCF footprints in bulk CD34⁺ HSPCs (Fig. 6a) and significant co-occurrence of CTCF with ETS, RUNX, JUN and KLF footprints in the cisREs of MECOM down genes (Fig. 6b). On average, the distance between ETS and CTCF footprints in our cisREs was 36 base pairs (Extended Data Fig. 6a). We observed significant CTCF binding to the nominated cisREs (Fig. 6c). We found CTCF occupancy of nominated footprints was highly conserved across erythroid cells, T cells, B cells and monocytes (Fig. 6d and Extended Data Fig. 6b). In HSPCs, CTCF binding was measured in bulk CD34⁺ cells, which contain LT-HSCs and numerous other progenitors. Despite the heterogeneity of the HSPC compartment, terminally differentiated cells showed significantly stronger CTCF signals compared to the CD34⁺ HSPCs and chromatin accessibility at those loci decreased during hematopoietic differentiation (Extended Data Fig. 6c–e). Although these analyses do not allow for a sensitive description of CTCF binding throughout the many intermediate stages of hematopoietic differentiation, they reveal increased binding of CTCF to the cisREs of MECOM down genes in differentiated cells in comparison with the heterogeneous population of CD34⁺ HSPCs.

**Fig. 6: Dynamic CTCF binding facilitates repression of MECOM down genes as HSCs undergo differentiation.**

To gain mechanistic insights into the role of CTCF in the MECOM-driven regulation of HSC quiescence, we analyzed an overall set of 7,358 chromatin loops from studies of HSCs³⁷, as well as a subset of loops whose anchors colocalized with MECOM network cisREs. These loops were elucidated in the OCI-AML2 cell line, which was previously used to extrapolate differential looping as LT-HSCs exit quiescence³⁷. In total, 448 chromatin interactions were identified for MECOM down genes and the loop anchors showed a strong enrichment of CTCF footprints (Extended Data Fig. 6f). Next, we performed aggregate peak analysis to compare the genomic organization of the MECOM down genes upon exit from quiescence by integrating Low-C chromatin interaction data from phenotypic LT-HSCs and short-term (ST)-HSCs. Using all 7,358 common chromatin loops, there was significant enrichment of chromatin interaction apices in both LT-HSCs and ST-HSCs, as previously observed³⁷, but there was no significant difference between the populations. Analysis of the chromatin loops of CTCF footprint-containing cisREs associated with MECOM down genes revealed significantly stronger chromatin interactions in ST-HSCs compared to LT-HSCs. There was no chromatin interaction difference in MECOM down genes that lacked association with a CTCF footprint-containing cisRE (Fig. 6e,f). These observations are consistent with the concept that CTCF activity at the cisREs of MECOM down genes induces tighter chromatin looping and restricts gene expression, promoting differentiation of HSCs, as exemplified by the increased chromatin looping at MLLT3 and MEF2C concordant with their silencing as LT-HSCs differentiate (Fig. 6g,h).

To validate their functional interaction, we performed simultaneous MECOM and CTCF perturbation in primary human HSPCs (Extended Data Fig. 6g) and observed that concurrent CTCF perturbation was sufficient to rescue the loss of LT-HSCs (Fig. 6i) and prevent the increased expansion of HSPCs caused by MECOM perturbation (Extended Data Fig. 6h). GSEA revealed significant depletion of MECOM down genes and significant upregulation of MECOM up genes following MECOM compared to AAVS1 editing, corroborating our observations from single cells (Extended Data Fig. 6i). When compared to the AAVS1 sample, CTCF editing alone resulted in significant enrichment of the MECOM down gene set, but no significant changes in the MECOM up genes (Extended Data Fig. 6j). Dual editing of MECOM and CTCF resulted in significant upregulation of MECOM down genes (Fig. 6j) and significant depletion of MECOM up genes (Fig. 6k). Upon dual perturbation, there was significantly greater rescue of MECOM down genes that are associated with cisREs containing CTCF binding motifs compared to those without CTCF motifs (Extended Data Fig. 6k). These data demonstrate that MECOM plays a key role in activating the expression of genes critical for HSC maintenance, which are then subject to genomic reorganization by CTCF upon differentiation.

The MECOM gene network is hijacked in high-risk AMLs

Having elucidated a fundamental transcriptional regulatory network necessary for HSC maintenance, we wondered to what extent this network may be relevant to leukemia. First, we combined 165 primary adult AML samples from The Cancer Genome Atlas (TCGA)³⁹ with 430 adult samples from the BEAT AML dataset⁴⁰ into an adult AML cohort (Fig. 7a). We found significant enrichment of the MECOM down gene set in clinical samples with high MECOM expression levels (Extended Data Fig. 7a). We analyzed this adult AML cohort in parallel with 440 pediatric AML samples from the TARGET AML dataset⁴¹ (Fig. 7b). Using optimal thresholding to stratify patients by MECOM expression, we observed a survival disadvantage in both adult and pediatric AML (Fig. 7c), consistent with previous reports^42,43.

**Fig. 7: The MECOM down gene network is hijacked in high-risk adult and pediatric AML.**

Given the importance of the MECOM down gene network in HSC maintenance, we sought to determine whether expression of this network was associated with survival in AML. Using GSEA, we determined whether individual patient AML samples had enrichment or depletion of the MECOM down gene set (Extended Data Fig. 7b–d). Enrichment of the MECOM down gene set was associated with worse survival in both the adult (hazard ratio (HR) 1.52 (95% CI 1.13–2.04), P = 0.005) and pediatric AML cohorts (HR 1.96 (95% CI 1.38–2.69), P = 7.4 × 10⁻⁵; Fig. 7d).
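The relationship between a hazard ratio, its 95% confidence interval and a P value can be sketched with a Wald approximation on the log scale. This is an illustration only: the function name is hypothetical, and the P values reported in the study may derive from a different test (for example a log-rank or likelihood-ratio test), so exact agreement should not be expected.

```python
from math import erfc, log, sqrt

def wald_from_ci(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover the standard error of log(HR), the z statistic and the
    two-sided p-value from a hazard ratio and its 95% confidence interval."""
    # A 95% interval on the log scale spans 2 * z_crit standard errors.
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)
    z = log(hr) / se
    # Two-sided tail probability of a standard normal distribution.
    p = erfc(abs(z) / sqrt(2))
    return se, z, p

# Adult cohort values reported above: HR 1.52 (95% CI 1.13-2.04).
se_adult, z_adult, p_adult = wald_from_ci(1.52, 1.13, 2.04)
```

For the adult cohort values, this approximation gives a P value of roughly 0.005, in line with the reported value.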

We then generated a rank order list based on the normalized enrichment score (NES) for each sample to allow for further stratification based on the degree of network enrichment. We used optimal thresholding to stratify patients based on NES and found significantly worse overall survival in patients with high MECOM NES compared to patients with low NES in both adult (HR 1.58 (95% CI 1.18–2.11), P = 0.0016) and pediatric (HR 2.08 (95% CI 1.49–2.89), P = 3.6 × 10⁻⁵) patients (Fig. 7e).
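Optimal thresholding can be sketched as a scan over candidate cutpoints that keeps the one giving the strongest separation in survival. The sketch below assumes the two-group log-rank statistic as the criterion, which is an assumption for illustration rather than a description of the procedure used in the study. All names and data are hypothetical.

```python
def logrank_statistic(times, events, groups):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times: follow-up times; events: 1 = death, 0 = censored;
    groups: True for the 'high' group, False for the 'low' group.
    """
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)
        n_high = sum(at_risk)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        deaths_high = sum(
            1 for ti, e, g in zip(times, events, groups) if ti == t and e and g
        )
        # Observed deaths in the high group minus the number expected by chance.
        observed_minus_expected += deaths_high - deaths * n_high / n
        if n > 1:
            variance += (
                deaths * (n_high / n) * (1 - n_high / n) * (n - deaths) / (n - 1)
            )
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0

def best_threshold(scores, times, events, min_group=2):
    """Return (threshold, statistic) maximizing the log-rank statistic."""
    best = (None, 0.0)
    for cut in sorted(set(scores)):
        groups = [s >= cut for s in scores]
        n_high = sum(groups)
        if min(n_high, len(groups) - n_high) < min_group:
            continue  # skip cutpoints that leave a group too small
        stat = logrank_statistic(times, events, groups)
        if stat > best[1]:
            best = (cut, stat)
    return best

# Toy cohort of eight patients (made-up enrichment scores and follow-up).
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
times = [60, 55, 58, 50, 6, 12, 8, 10]
events = [0, 0, 1, 0, 1, 1, 1, 1]

threshold, statistic = best_threshold(scores, times, events)
```

Because the cutpoint is chosen to maximize the statistic, P values obtained at the selected threshold are optimistic unless corrected or validated in an independent cohort.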

Stratification based on clinical risk group or LSC17 score⁴⁴ had significant associations with survival (Fig. 7f,g) and we sought to determine whether MECOM network enrichment identified the same subgroup of high-risk patients. We observed that 48% of adult AML and 51% of pediatric AML with adverse clinical risk features also had MECOM network enrichment. Similarly, we found that 51% of adult AML and 55% of pediatric AML with high LSC17 scores had MECOM network enrichment (Extended Data Fig. 7e,f). Thus, MECOM network enrichment identifies a largely unique subset of patients compared to currently available risk stratification tools.

Next, we investigated whether the addition of MECOM network enrichment to the clinical risk group or LSC17 score resulted in improved risk stratification. In the adult AML cohort, MECOM down gene set enrichment was independently associated with mortality particularly in patients with intermediate risk AML (P = 0.005) (Fig. 7h) and high LSC17 score (P = 0.01) (Fig. 7i). The contribution of MECOM network enrichment to clinical risk grouping was even more striking in the pediatric AML cohort in which MECOM network enrichment was significantly associated with mortality independent of clinical risk group (P = 0.008) (Fig. 7h) and, separately, independent of LSC17 score (P = 0.01) (Fig. 7i). These results reveal that stratification of primary AML patient samples by MECOM down gene enrichment can be integrated with currently available prognostic tools to improve risk stratification for overall survival in both adult and pediatric AML. Additionally, MECOM down network enrichment was significantly associated with lower event-free survival, independent of clinical risk group and LSC17 score in pediatric AML (P = 1.72 × 10⁻⁶ and P = 5.62 × 10⁻⁵, respectively) (Extended Data Fig. 7g,k).

Finally, we calculated marginal HRs to evaluate the association of MECOM expression or MECOM network NES with overall survival. We observed a modest effect of incremental increases of MECOM expression on the marginal HR of survival (Fig. 7j) and a much more pronounced effect of incremental increases in MECOM NES (Fig. 7k). Together, these data reveal that the MECOM down network is highly enriched in a subset of adult and pediatric AMLs with poor prognosis and can be integrated with currently available prognostic tools to improve risk stratification for patients with AML.

Validation of MECOM addiction in a subset of high-risk AMLs

Given the prognostic significance of MECOM network enrichment in AML, we sought to further study this network in AML cell lines. We examined 44 AML cell lines from the Cancer Cell Line Encyclopedia (CCLE) and stratified them based on MECOM expression (Extended Data Fig. 8a). We compared gene expression in MECOM-high and MECOM-low AML cell lines and found significant enrichment of MECOM down genes and depletion of MECOM up genes (Fig. 8a). Comparison of gene expression in individual MECOM-high AML cell lines to the average expression in MECOM-low AML lines revealed highly significant MECOM network enrichment in MUTZ-3, F36P, HNT34 and OCI-AML4 cells (Extended Data Fig. 8b). We compared CRISPR dependencies of MECOM-high and MECOM-low AML cell lines and observed differential essentiality of RUNX1, consistent with our findings of potential cooperativity between RUNX1 and MECOM in regulating the HSC network genes (Extended Data Fig. 8c).

**Fig. 8: The MECOM gene regulatory network is indispensable in AML.**

To validate the role of the MECOM network in an otherwise isogenic AML background, we performed CRISPR editing of MECOM in the MUTZ-3 AML cell line^45,46. MUTZ-3 cells maintain a population of primitive CD34⁺ blasts in culture that can self-renew or differentiate into CD14⁺ monocytes (Fig. 8b and Extended Data Fig. 8d). MECOM editing in MUTZ-3 cells (Fig. 8c) resulted in significant reduction in MECOM expression level (Fig. 8d) and a loss of primitive CD34⁺ cells (Fig. 8e). Loss of progenitors after MECOM perturbation was accompanied by enrichment of edited MECOM alleles, as MECOM perturbed cells underwent greater expansion (Extended Data Fig. 8e). Maintenance of CD34⁺ cells was restored by lentiviral MECOM expression, but not lentiviral expression of the EVI1 isoform (Fig. 8f), consistent with our rescue data from primary HSPCs (Extended Data Fig. 4e). RNA-seq of CD34⁺ progenitor MUTZ-3 cells after MECOM editing revealed significant depletion of MECOM down genes and significant enrichment of MECOM up genes (Fig. 8g, Extended Data Fig. 8f and Supplementary Table 7). Additionally, MECOM perturbation in HNT34 AML cells led to significant depletion of MECOM down genes and significant enrichment of MECOM up genes (Fig. 8h), revealing the conservation of this gene regulatory network in multiple AML contexts.

Because of the functional interaction between MECOM and CTCF in the transcriptional control of LT-HSC quiescence, we reasoned that the loss of MUTZ-3 progenitors following MECOM perturbation may also be dependent on CTCF. We performed dual CRISPR editing of MECOM and CTCF and observed partial rescue of the loss of CD34⁺ progenitors induced by MECOM perturbation alone (Fig. 8i). The more modest rescue of progenitors in the MUTZ-3 system compared to the LT-HSC model (Fig. 6i) may be a function of less efficient CTCF editing in MUTZ-3 cells (Extended Data Fig. 8g).

To evaluate binding of CTCF to the cisREs of MECOM network genes, we generated a Cas9 and GFP expressing MUTZ-3 cell line, which we infected with a lentivirus encoding an sgRNA targeting AAVS1 or MECOM along with red fluorescent protein (RFP). We observed a gradual loss of CD34⁺ cells following MECOM sgRNA delivery and, on day 4 after editing, before complete loss of CD34⁺ progenitors, we examined CTCF binding in CD34⁺ MUTZ-3 progenitors by ChIP-seq. In the AAVS1-treated samples, we observed strong CTCF binding in the cisREs of MECOM network genes that contain CTCF footprints (Extended Data Fig. 8h). There was no difference in CTCF binding after MECOM editing, suggesting that the co-regulation of MECOM network genes by CTCF is not due to differential CTCF chromatin occupancy in CD34⁺ MUTZ-3 cells, but may instead be due to differential cofactor interactions or chromatin looping. Collectively, these data reveal that the MECOM regulatory gene network co-regulated by CTCF is indispensable for AML progenitor maintenance.

Discussion

A greater fundamental understanding of the transcriptional circuitry that enables human HSC self-renewal holds considerable promise for future mechanistic studies of HSC function and therapeutic applications. For instance, with emerging advances in gene therapy and genome editing of HSCs, the ability to better maintain and manipulate these cells both ex vivo and in vivo would be clinically beneficial⁴⁷; however, the limitations in our molecular understanding of this regulatory process have hampered such efforts.

Here, we have taken advantage of a rare experiment of nature to illuminate fundamental transcriptional circuitry that is required for human HSC maintenance in vivo. We have followed up on the human genetic observation that MECOM haploinsufficiency results in early-onset bone marrow failure and by modeling this disorder in primary HSPCs, we show that the functional loss of HSCs is accompanied by alterations in a network of genes critical for HSC maintenance. The identification of this gene network highlights the need to couple rigorous functional assays that nominate cellular vulnerabilities with integrative genomic profiling and analyses. Our results demonstrate how subtle gene expression changes can translate into major defects in HSC maintenance and uncover additional regulators of HSCs that can be subject to systematic perturbation studies in the future.

Through integrative genomic analysis of this network, we have gained insights into critical gene targets and have elucidated cooperative interactions among hematopoietic TFs involved in HSC function. We identify an antagonistic role for CTCF in altering chromatin looping of MECOM network genes as the cells differentiate and validate this interaction by functional and molecular rescue, illuminating fundamental transcriptional circuitry required for human HSC maintenance. We also find that this very same network is co-opted in AMLs with poor prognosis. A notable finding is that the MECOM regulatory network serves as a better predictor of poor outcome than does MECOM expression itself, suggesting that some AMLs may augment MECOM function in a manner beyond expression changes. This will be an important area for future exploration. It is also notable that leukemias arising due to insertional mutagenesis following human gene therapy trials have resulted in activation of MECOM⁴⁸. Clones with increased MECOM expression often have a long latency, but can result in a more aggressive disease course. Our finding that an HSC regulatory program is co-opted by increased MECOM expression may help explain these perplexing clinical observations. A deeper understanding of how such stem cell networks are utilized in malignant states may enable improved therapeutic approaches and provide opportunities to expand and manipulate non-malignant HSCs for therapeutic benefit.

Methods

Data reporting

No statistical methods were used to predetermine sample sizes but our sample sizes are similar to those reported in previous publications^16,23. Data distribution was assumed to be normal but this was not formally tested. Data collection and analysis were not performed blind to the conditions of the experiments. No animals or data points were excluded from analysis.

Cell line and primary cell culture

HSPCs were purified from discarded UCB samples of healthy male or female newborns using the EasySep Human CD34 Positive Selection Kit II following pre-enrichment using the RosetteSep Pre-enrichment cocktail (Stem Cell Technologies) and mononuclear cell isolation on Ficoll-Paque (GE Healthcare) density gradient. Cells were cryopreserved for later use. Granulocyte colony-stimulating factor mobilized adult CD34⁺ HSPCs were purchased (Fred Hutchinson Cancer Research Center). Thawed cells were cultured at 37 °C and 5% O₂ in serum-free HSC medium consisting of StemSpan II medium (Stem Cell Technologies) supplemented with CC100 cytokine cocktail (Stem Cell Technologies), 100 ng ml⁻¹ TPO (Peprotech) and 35 nM UM171 (Stem Cell Technologies). Confluency was maintained between 2 × 10⁵ and 1 × 10⁶ cells per ml.

MUTZ-3 cells (DSMZ) were cultured at 37 °C in α-MEM (Life Technologies) supplemented with 20% FBS, 20% conditioned medium from 5637 cells (ATCC)⁴⁹ and 1% penicillin/streptomycin. Confluency was maintained between 7 × 10⁵ and 1.5 × 10⁶ cells per ml.

HNT34 cells (Creative Bioarray) were cultured at 37 °C in α-MEM (Life Technologies) supplemented with 20% FBS, 20% conditioned medium from 5637 cells (ATCC)⁴⁹ and 1% penicillin/streptomycin. Confluency was maintained between 5 × 10⁵ and 1.5 × 10⁶ cells per ml.

The 293T cells were cultured at 37 °C in DMEM (Life Technologies) supplemented with 10% FBS and 1% penicillin/streptomycin.

Mouse model

NOD.Cg-Kit^W-41JTyr⁺Prkdc^scidIl2rg^tm1Wjl (NBSGW) mice were obtained from the Jackson Laboratory (stock 026622)²¹. Littermates of the same sex were randomly assigned to experimental groups. NBSGW mice were interbred to maintain a colony of animals homozygous or hemizygous for all mutations of interest. The Institutional Animal Care and Use Committee at Boston Children’s Hospital approved the study protocol and provided guidance and ethical oversight.

CRISPR editing and analysis

Electroporation was performed on day 1 after thawing HSPCs using the Lonza 4D Nucleofector with 20 µl Nucleocuvette strips as described^23,50. Briefly, the RNP complex was made by combining 100 pmol Cas9 (IDT) and 100 pmol modified sgRNA (Synthego) targeting MECOM (5′-CAAGGTCTGCAAACCTAACA-3′), AAVS1 (5′-GGGGCCACTAGGGACAGGAT-3′) or CTCF (5′-CAATTCTCCACTGGTCACAA-3′) and incubating at 21 °C for 15 min. Between 2 × 10⁵ and 4 × 10⁵ HSPCs resuspended in 20 µl P3 solution were mixed with RNP and underwent nucleofection with program DZ-100. For samples that underwent dual perturbation, total amounts of 100 pmol Cas9 and 100 pmol sgRNA (50 pmol each guide) were used. Cells were returned to HSC medium and editing efficiency was measured by PCR at 48 h after electroporation, unless otherwise indicated. First, genomic DNA was extracted using the DNeasy kit (QIAGEN) or both DNA and RNA were extracted using the AllPrep DNA/RNA Mini kit (QIAGEN) according to the manufacturer’s instructions. Genomic PCR was performed using Platinum II Hotstart Mastermix (Thermo Fisher Scientific) and edited allele frequency was detected either by Sanger sequencing and analyzed by ICE (ice.synthego.com) or by NGS and analyzed with CRISPResso2 (ref. ⁵¹). The following primer pairs were used: MECOM-ICE (forward: 5′-ACATCAACCCAGAATCAGAAAC-3′; reverse: 5′-GGAAAAGGAAGGCTGCAAAG-3′); MECOM-NGS (forward: 5′-AGAAATGTGAGTTCCATGCAAGA-3′; reverse: 5′-AGCAAATATCATTGTCAGACCTGT-3′); and CTCF (forward: 5′-CAGCGGATTCAGATGGGTAA-3′; reverse: 5′-TCACCGTTTTAGCCAGGATG-3′). The effect on MECOM mRNA after editing was detected by quantitative PCR with reverse transcription (qRT–PCR) using SYBR green (Bio-Rad) after cDNA synthesis with iScript (Bio-Rad).

MUTZ-3 cells were edited as above with the following modification: cells were resuspended in 20 µl SF solution and program EO-100 was used for electroporation.

Viral constructs and transduction

MDS and EVI1 cDNA were synthesized from mRNA of human HSPCs using the following primers: MDS (forward: 5′-CGTACTCGAGGCCGCCACCATGAGATCCAAAGGCAGGGCAA-3′; reverse: 5′-TACGGAATTCTCACTCCCATCCATAACTGGGGTCT-3′); and EVI1 (forward: 5′-CGTACTCGAGGCCGCCACCATGATCTTAGACGAATTTTACAATG-3′; reverse: 5′-TACGGAATTCTCATACGTGGCTTATGGACTGG-3′). MECOM cDNA was synthesized using MDS-F and EVI1-R primers. Wobble mutations were introduced to disrupt the sgRNA binding site using the following primers: EVI1-F with wobble reverse (5′-GTGCCGAGTGAGATTCGCGGATCTAGGAAAAAT-3′) and wobble forward (5′-ATTTTTCCTAGATCCGCGAATCTCACTCGGCAC-3′) with EVI1-R, followed by overlap PCR of the two fragments. Primers included restriction enzyme sites to allow for cloning using EcoRI and XhoI into the HMD IRES–GFP backbone⁵².

The lentiviral pXPR_049 plasmid was obtained from the Genomics Perturbation Platform at the Broad Institute and RFP was cloned in place of the puromycin resistance gene. sgRNA sequences targeting AAVS1 or MECOM as described above were cloned into pXPR_049-RFP using BsmBI. The lentiviral pXPR_104 plasmid encoding Cas9v3-2A-GFP was also obtained from the Broad Institute Genomics Perturbation Platform.

To produce lentivirus, approximately 24 h before transfection, 293T cells were seeded in 10-cm plates. Cells were co-transfected with 10 µg pΔ8.9, 1 µg VSVG and 10 µg HMD vector variant, Cas9–GFP or sgRNA–RFP using calcium phosphate. The medium was changed the following day and viral supernatant was collected 48 h after transfection, filtered with a 0.45-µm filter and concentrated by ultracentrifugation at 100,000g for 2 h at 4 °C.

For lentiviral rescue experiments, 24 h after CRISPR nucleofection, 1 × 10⁵ HSPCs were transduced at a multiplicity of infection (MOI) of 10, with HMD empty, MDS, EVI1 or MECOM virus in 12-well plates with 8 µg ml⁻¹ of polybrene (Millipore), spun at 931g for 1.5 h at 21 °C and incubated in the viral supernatant overnight at 37 °C. Virus was washed off 16 h after infection.

MUTZ-3 cells were transduced at an MOI of 1 by spinfection at 1,455g for 1.5 h at 21 °C and were incubated in the viral supernatant overnight. Virus was washed off 16 h after infection. MUTZ-3 cells underwent viral transduction first, followed by CRISPR editing at 48 h after infection. MUTZ-3 or HNT34 cell lines expressing Cas9–GFP were generated by spinfection followed by GFP purification and subsequent spinfection with sgRNA–RFP virus and a second sorting for GFP⁺RFP⁺ cells.

Transplantation assays

Non-irradiated NBSGW mice (4–8 weeks of age) were tail vein injected with UCB or adult CD34⁺ HSPCs (1–2 × 10⁵ cells) on day 3 after CRISPR editing. Peripheral blood was sampled monthly by retro-orbital sampling and animals were killed at 16 weeks for BM evaluation. Secondary transplantations were performed by directly transplanting 60% of total BM cells from primary recipients into secondary non-irradiated NBSGW recipients. Human chimerism was assessed by evaluation of the BMs of secondary recipients at 16 weeks by flow cytometry and MECOM sequencing.

Flow cytometry and cell sorting

Cells were washed with PBS and stained with the following panel of antibodies to quantify and enrich for LT-HSCs: anti-CD34-PerCP-Cy5.5 (BioLegend, 343612), anti-CD45RA-APC-H7 (BD, 560674), anti-CD90-PECy7 (BD, 561558), anti-CD133-super bright 436 (eBioscience, 62-1338-42), anti-EPCR-PE (BioLegend, 351904) and anti-ITGA3-APC (BioLegend, 343808). LT-HSCs were defined by the following immunophenotype: CD34⁺CD45RA⁻CD90⁺CD133⁺ITGA3⁺EPCR⁺ (ref. ¹⁶). Three microliters of each antibody were used per 1 × 10⁵ cells in 100 µl. Total LT-HSC numbers were calculated as a product of the frequency of LT-HSCs by flow cytometry and total cell number in culture.
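The calculation of total LT-HSC numbers described above can be illustrated with a minimal Python sketch. The function name and the example values are hypothetical and are not taken from the study.

```python
def total_lt_hsc_count(lt_hsc_fraction: float, total_cells: float) -> float:
    """Absolute LT-HSC number = LT-HSC frequency (0-1) x total cells in culture."""
    if not 0.0 <= lt_hsc_fraction <= 1.0:
        raise ValueError("lt_hsc_fraction must be between 0 and 1")
    return lt_hsc_fraction * total_cells


# Hypothetical example: 2% LT-HSCs by flow cytometry among 5 x 10^5 cultured cells.
example_total = total_lt_hsc_count(0.02, 5e5)
```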

Human cell chimerism after xenotransplantation was determined by staining with anti-mouse CD45-FITC (BioLegend, 103108) and anti-human CD45-APC (BioLegend, 368512). Human cell subpopulations were detected in the BM of transplanted mice using the following antibodies: anti-human CD45-APC (BioLegend, 368512), anti-human CD3-Pacific Blue (BioLegend, 344823), anti-human CD19-PECy7 (BioLegend, 302215), anti-human CD11b-FITC (BioLegend, 301330), anti-human CD41a-FITC (eBioscience, 11-0419-42), anti-human CD34-Alexa 488 (BioLegend, 343518) and anti-human CD235a-APC (eBioscience, 17-9987-42). Aliquots were stained individually for CD34 and CD235 or with CD45 in conjunction with the other lineage-defining markers. Mice with human cell chimerism <2% in the BM were excluded from subpopulation analysis.

MUTZ-3 cells were stained with anti-CD34-APC (BioLegend, 343607) and anti-CD14-PECy7 (BioLegend, 367112).

Flow cytometric analyses were conducted on BD LSRII, LSR Fortessa or Accuri C6 instruments and all data were analyzed using FlowJo software (v.10.8). FACS was performed on a BD Aria and samples were collected in PBS containing 2% BSA and 0.01% Tween for immediate processing for sequencing on the 10x Genomics platform. Alternatively, single cells were sorted into PCR plates containing 5 µl Buffer RLT Plus (QIAGEN) with 1% BME and immediately frozen at −80 °C for G&T sequencing.

Cell cycle analysis

For cell cycle analyses, on day 5 after CRISPR editing, cells were incubated with 5-ethynyl-2′-deoxyuridine (EdU) (Thermo Fisher Scientific, C10634) for 2 h, then fixed and permeabilized before cell surface staining as per the manufacturer’s recommendations. Multipotent progenitors were defined by the immunophenotype CD34⁺CD45RA⁻CD90⁺CD133⁺. Pegasus v.1.0 (https://github.com/klarman-cell-observatory/pegasus) in the Terra environment (https://app.terra.bio/#) was used to determine the expression of transcriptional signatures of cell cycle status of single LT-HSCs⁵³.

Analysis of cell division was performed by carboxyfluorescein succinimidyl ester (CFSE) labeling (Thermo Fisher Scientific, C34554). At 24 h after CRISPR editing, cells were incubated with CFSE, washed and subjected to flow cytometric analysis to establish a baseline and again on day 5. Proliferation modeling was performed in FlowJo v.10.8.0. Replication index was calculated in FlowJo v.10.8.0 as the total number of divided cells / cells that underwent at least one division.

Colony-forming unit cell assays

Three days after RNP electroporation, 500 CD34⁺ HSPCs were plated in 1 ml methylcellulose medium (H4034, Stem Cell Technologies) in triplicate unless otherwise noted. Primary colonies were counted after 14 d.

10x Genomics scRNA-seq

A suspension of 11,000 AAVS1-edited LT-HSCs and a suspension of 16,000 MECOM-edited LT-HSCs were loaded into two lanes of 10x RNA 3′ V3 kit (10x Genomics) according to the manufacturer’s guidelines. Libraries were constructed with distinct i7 barcodes, pooled in equal molecular concentrations and sequenced on one lane of Hiseq (Illumina) according to the manufacturer’s protocol. Briefly, 36 cycles were carried out for read1, 8 cycles for index1 and 90 cycles for read2, yielding ~15,000 reads per cell.

Bulk RNA-seq

Total RNA was extracted using the RNeasy Micro kit (QIAGEN, 74004) or using the 2.2× RNAClean XP kit (Beckman, A63987) from ~1,000 cells sorted in 25 µl Buffer RLT Plus with 1% BME. Then we proceeded with the SmartSeq2 protocol from the reverse transcription step using 10 ng of RNA⁵⁴. The whole transcriptome amplification step was set at ten cycles. The 15 bulk RNA libraries were pooled at equal molecular concentration and sequenced using the NextSeq550 High Output or Novaseq kit (Illumina) with 35 paired-end reads.

Genome and transcriptome sequencing

Plates of sorted LT-HSCs were thawed from −80 °C on ice and an equal volume of prepared 2× Dynabeads was added. Samples were incubated at 72 °C for 1 min, then 56 °C for 2 min, followed by 10 min at 25 °C to allow for mRNA hybridization. Plates were placed on a magnet for 2 min and 8 µl of the supernatant containing genomic DNA (gDNA) was transferred into a new plate. Beads were washed twice in 10 µl of cold 1× Hybridization Buffer and once in PBS + RNase Inhibitor. All washes were transferred to the gDNA plate. Once PBS was removed, Dynabeads were immediately resuspended in 7.34 µl of SmartSeq2 Mix 1 and the plate was incubated at 80 °C for 3 min. The plate was immediately placed on the magnet and the supernatant containing mRNA was rapidly transferred into a new plate on ice. Then, 2.66 µl of SmartSeq2 Mix 2 was added. At this point, we proceeded with the SmartSeq2 protocol from the reverse transcription step⁵⁴. The whole transcriptome amplification step was set at 23 cycles. gDNA which was present in the pooled supernatant/wash buffer was precipitated on DNA SPRI beads at a 0.6× ratio and eluted in 10 µl MDA Hyb buffer, denatured at 95 °C for 3 min and cooled on ice. Then 5 µl of Phi29 Mix was added and the mix was incubated at 45 °C for 8 h. The reaction was deactivated at 65 °C for 5 min. The MDA plate was stored at −20 °C. Eight plates of mRNA libraries were sequenced using the Nextseq550 high output kit (Illumina) with 35 paired-end reads according to the manufacturer’s recommendations. To genotype each cell based on MECOM editing status, MECOM from gDNA and whole transcriptome analysis was amplified by PCR and libraries were constructed, pooled and sequenced using the Miseq 300 cycle kit (Illumina) according to manufacturer’s protocol with 150 paired-end reads.

ChIP-seq

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) was performed on chromatin from 2 × 10⁶ CD34⁺ MUTZ-3 cells after MECOM or AAVS1 editing. Sorted cells were cross-linked with 1% methanol-free formaldehyde (Pierce Life Technologies, 28906), quenched with 0.125 M glycine, frozen at −80 °C and stored until further processing. ChIP reaction was performed with iDeal ChIP-seq kit for TFs (Diagenode, C01010055) with modifications of the manual detailed below. Lysed samples were sonicated using the E220 sonicator (Covaris, 500239) in microTUBE AFA Fiber Pre-Slit Snap-Cap tubes (Covaris, 520045) with settings for 200-bp DNA shearing. Sheared chromatin was immunoprecipitated with 2.5 μg CTCF antibody (Abcam, ab128873, RRID AB_11144295) or 2.5 μg IgG antibody (Diagenode, C15410206, RRID AB_2722554). Eluted and decross-linked DNA was purified with MicroChIP DiaPure columns (Diagenode, C03040001) and eluted in 30 μl of nuclease-free water. ChIP and input libraries for sequencing were prepared with ThruPLEX DNA-Seq kit (Takara, R400674) and DNA Single Index kit, 12S Set A (Takara, R400695). Size selection steps were performed with Magbio Genomics HighPrep PCR beads (Fisher Scientific, 50-165-6582). The libraries were sequenced at Broad Institute Genomic Services by using the Illumina NextSeq 500 platform and the 150-bp paired-end configuration to obtain at least 30 million reads per sample.

Quantification and statistical analysis

Protein structure prediction

The MECOM sequence corresponding to amino acids 700–900 was submitted to the I-TASSER server for homology modeling⁵⁵. The predicted structure of the zinc finger domain was rendered and visualized using PyMOL.

Bulk RNA data analysis

Fastq files demultiplexed by bcl2fastq from bulk RNA-seq runs were uploaded to Terra and processed with the Cumulus pipeline for bulk RNA-seq⁵³ to get gene counts and gene isoform matrices. Human reference genome GRCh38 and gene annotation reference Homo_sapiens.GRCh38.93.gtf were used in all the RNA analyses.

Single-cell RNA data analysis

BCL files generated by scRNA-seq were uploaded to Terra and processed with the Cumulus pipeline for 10x single-cell RNA data and SmartSeq2 (ref. ⁵³) to get gene matrices. Human reference genome GRCh38 and gene annotation reference Homo_sapiens.GRCh38.93.gtf were used in all the RNA analyses. For 10x data, doublets were filtered out and cells that contained reads for 500 to 8,000 genes with the percent of mitochondrial genes <20% were included in the analysis; cells were not filtered based on unique molecular identifier counts. For SmartSeq2 data, Scanpy⁵⁶ was used to integrate all plates and perform batch correction and normalization. Cells that contained reads for 2,000 to 20,000 genes with the percent of mitochondrial genes <20% were included. Genes expressed in at least 0.05% of cells were included. Scanorama⁵⁷ was used for batch correction. SmartSeq2 and 10x data were integrated and batch correction was performed on donor, technology and process batch with a Python version of Harmony⁵⁸. Celltypist²² was used to infer cell types with the Pan_Fetal_Human.pkl model.
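The per-cell quality-control thresholds described above can be sketched in Python as follows. This is a minimal illustration, not the Cumulus or Scanpy implementation: the `CellQC` record, the function names and the treatment of the gene-count bounds as inclusive are assumptions, and doublet removal and gene-level filtering are not shown.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    barcode: str
    n_genes: int      # number of genes with at least one read
    pct_mito: float   # percent of reads from mitochondrial genes


# Gene-count windows stated in the text; inclusive bounds are an assumption.
GENE_COUNT_WINDOWS = {"10x": (500, 8000), "smartseq2": (2000, 20000)}
MAX_PCT_MITO = 20.0  # cells must have strictly less than 20% mitochondrial reads


def filter_cells(cells, technology):
    """Keep cells whose gene count and mitochondrial percentage pass QC."""
    low, high = GENE_COUNT_WINDOWS[technology]
    return [
        cell for cell in cells
        if low <= cell.n_genes <= high and cell.pct_mito < MAX_PCT_MITO
    ]
```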

MECOM genotyping in G&T data

MECOM editing was determined by CRISPResso2 (ref. ⁵¹). Genotyping from gDNA and from cDNA was combined for the same cell and cells that contained both an edited allele and a wild-type allele were defined as heterozygous. Genotyping annotation was integrated into gene matrix metadata.
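The genotype assignment described above can be sketched as a small Python function. Only the heterozygous definition comes from the text; the other category labels (`edited_only`, `wild_type_only`, `no_call`) and the input encoding are hypothetical and are introduced here for illustration.

```python
def classify_cell(gdna_calls, cdna_calls):
    """Combine allele calls from gDNA and cDNA for one cell.

    Each argument is an iterable of strings, "edited" or "wild_type",
    as might be derived from CRISPResso2 allele tables (assumed encoding).
    """
    calls = set(gdna_calls) | set(cdna_calls)
    has_edited = "edited" in calls
    has_wild_type = "wild_type" in calls
    if has_edited and has_wild_type:
        return "heterozygous"      # definition given in the text
    if has_edited:
        return "edited_only"       # hypothetical label
    if has_wild_type:
        return "wild_type_only"    # hypothetical label
    return "no_call"               # hypothetical label
```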

Differential expression analysis

Differential expression analysis was performed with the FindMarkers function of Seurat v.4.0 on the 10x single-cell RNA data to compare AAVS1- and MECOM-edited LT-HSCs. The fold change threshold for significant gene expression was 0.05 on the log₂ scale, ident.1 was AAVS1-edited cells, ident.2 was MECOM-edited cells and the test algorithm was MAST. Permutation analysis was performed by randomly assigning single cells to one of two groups irrespective of the initial experimental group and repeating differential expression analysis. One hundred independent permutations were performed.
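The label permutation step can be sketched in Python as below. The sketch assumes that group sizes are preserved by shuffling the original labels, which the text does not specify, and it only generates the permuted labelings; rerunning the differential expression test on each labeling is not shown. The function name and seed are hypothetical.

```python
import random


def permute_group_labels(labels, n_permutations=100, seed=0):
    """Return n_permutations shuffled copies of the cell group labels.

    Shuffling preserves the number of cells per group (an assumption).
    """
    rng = random.Random(seed)
    permutations = []
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        permutations.append(shuffled)
    return permutations


# Toy example with ten cells; real data would hold one label per LT-HSC.
labels = ["AAVS1"] * 6 + ["MECOM"] * 4
null_labelings = permute_group_labels(labels)
```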

Pseudobulk analysis

Raw counts from single LT-HSCs that passed the quality control from each experimental condition (AAVS1 or MECOM-edited) were aggregated to generate pseudobulk data for each group. Genes that did not reach the detection ratio cutoff used in the single-cell differential gene expression discovery were removed from the pseudobulk analysis. Log₂ fold change between groups was calculated and correlation with gene expression data from single cells was calculated by Spearman’s rank correlation.
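A minimal Python sketch of the pseudobulk comparison follows. Several details are assumptions that the text does not state: counts are scaled to counts per million before comparison, a pseudocount of 1 is added, and the fold change is oriented as the second group over the first. Spearman's rank correlation is implemented directly with average ranks for ties.

```python
import math


def pseudobulk(cells):
    """Sum raw counts per gene across cells (each cell: dict gene -> count)."""
    totals = {}
    for cell in cells:
        for gene, count in cell.items():
            totals[gene] = totals.get(gene, 0) + count
    return totals


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2(CPM_b / CPM_a) per gene; CPM scaling and pseudocount are assumptions."""
    total_a, total_b = sum(group_a.values()), sum(group_b.values())
    result = {}
    for gene in sorted(set(group_a) | set(group_b)):
        cpm_a = group_a.get(gene, 0) / total_a * 1e6
        cpm_b = group_b.get(gene, 0) / total_b * 1e6
        result[gene] = math.log2((cpm_b + pseudocount) / (cpm_a + pseudocount))
    return result


def _average_ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank for tied values
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation (Pearson correlation of the ranks)."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```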

HSC signatures in the Immune Cell Atlas

Pegasus was used to determine the expression of the HSC signature (CD34, HLF and CRHBP)²³ in umbilical cord samples from the Immune Cell Atlas (https://data.humancellatlas.org/explore/projects/cc95ff89-2e68-4a08-a234-480eca21ce79).

Gene signature enrichment during hematopoiesis

We measured the enrichment of the MECOM down or MECOM up gene sets during hematopoiesis, using bulk RNA-seq datasets across 20 hematopoietic subpopulations²⁷. The observed expression $y_{i,j}$ for the tested gene set $i$ in each cell type $j$ was calculated by taking the mean expression of genes in the list. We performed 1,000 permutations in which we sampled gene sets with the same number of genes as the tested gene set. The expected expression $y_{i,j}^{(P)}$ for permuted gene set $P$ in cell type $j$ was calculated by taking the mean expression of genes in the list. The enrichment for gene set $i$ in cell type $j$ was computed as follows:

$$z_{i,j} = \frac{y_{i,j} - \mathrm{mean}\left(y_{i,j}^{(P)}\right)}{\mathrm{s.d.}\left(y_{i,j}^{(P)}\right)}$$

where the mean and s.d. of $y_{i,j}^{(P)}$ are taken over all permutations $P \in \{1, 2, \ldots, 1{,}000\}$.
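The enrichment z-score defined above can be sketched in Python for a single gene set and a single cell type. The function name, the random seed and the use of the sample standard deviation are assumptions; the permuted gene sets are drawn without replacement from all genes in the expression table, which is one reasonable reading of the text.

```python
import random
import statistics


def enrichment_z(expression, gene_set, n_permutations=1000, seed=0):
    """z-score of the mean expression of gene_set against size-matched random sets.

    expression: dict mapping gene -> expression in one cell type.
    """
    rng = random.Random(seed)
    all_genes = sorted(expression)
    members = [g for g in gene_set if g in expression]
    observed = statistics.fmean([expression[g] for g in members])

    permuted_means = []
    for _ in range(n_permutations):
        sampled = rng.sample(all_genes, len(members))  # same number of genes
        permuted_means.append(statistics.fmean([expression[g] for g in sampled]))

    # Sample standard deviation is an assumption; the text only says "s.d.".
    return (observed - statistics.fmean(permuted_means)) / statistics.stdev(permuted_means)
```

A gene set with no enrichment is expected to give a z-score near zero, whereas a set of consistently highly expressed genes gives a large positive value.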

Gene set enrichment analysis

We used GSEApy (https://github.com/zqfang/GSEApy) for all GSEA to determine the enrichment of MECOM network genes following MECOM editing and rescue and in the TCGA and CCLE datasets that were stratified based on MECOM expression or overall survival. Significant enrichment of the gene set was determined using a t-test for MECOM rescue in LT-HSCs and MUTZ-3 cells and diff_of_classes for TCGA analyses. Genes from CCLE data were preranked by determining mean expression for each gene in MECOM-high and MECOM-low cohorts and calculating log₂ fold change. GSEA was performed using 1,000 permutations to determine significance.
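The preranking step for the CCLE data can be sketched in Python as below. This is an illustration under stated assumptions: expression values are taken to be on a linear scale, a pseudocount of 1 is added before taking the ratio, and the fold change is oriented as high over low. The function name and input layout are hypothetical, and the resulting ranked list would then be handed to a preranked GSEA routine, which is not shown.

```python
import math
from statistics import fmean


def prerank_by_log2fc(expr_high, expr_low, pseudocount=1.0):
    """Rank genes by log2 fold change of mean expression (high vs low cohort).

    expr_high, expr_low: dict mapping gene -> list of per-cell-line values.
    Returns a list of (gene, log2FC) sorted from most positive to most negative.
    """
    ranking = []
    for gene in sorted(expr_high.keys() & expr_low.keys()):
        mean_high = fmean(expr_high[gene])
        mean_low = fmean(expr_low[gene])
        log2fc = math.log2((mean_high + pseudocount) / (mean_low + pseudocount))
        ranking.append((gene, log2fc))
    ranking.sort(key=lambda item: item[1], reverse=True)
    return ranking
```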

Development of HemeMap

A detailed description is provided in the Supplementary Note^59,60,61,62,63,64,65.

ChIP-seq data analysis

The raw ChIP-seq data³⁵ for the binding sites of hematopoietic TFs FLI1, GATA2 and RUNX1 in human CD34⁺ HSPCs were downloaded and processed. The paired-end reads were trimmed and aligned to the hg19 reference genome using Trimmomatic and Bowtie2, respectively. MACS2 (ref. ⁶⁶) was used for peak calling with the default narrow peak setting. Genomic tracks were generated from BAM files using counts per million mapped reads normalization to facilitate comparison between tracks. The processed CTCF ChIP-seq data from HSPCs and differentiated hematopoietic lineages were obtained from a previous study³⁸. To determine the significance of the enrichment of TF occupancy within cisREs of MECOM network genes, a permutation test was performed. For each TF, we calculated the number of cisREs overlapping with ChIP-seq peaks. The expected distribution of overlapping cisREs was generated by 1,000 permutations of an equal number of TF peaks across the genome. The presence of TF peaks in cisREs was counted and the Venn plot was generated by the web app BioVenn (https://www.biovenn.nl). The enrichment of CTCF signal on the footprints was computed using deepTools software⁶⁷. We used a Wilcoxon signed-rank test to evaluate the differences in normalized CTCF signals on footprints between HSPCs and other terminal blood cells, namely erythroid cells, T cells, B cells and monocytes.
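The permutation test for TF occupancy in cisREs can be sketched in Python on a single toy chromosome. This is a simplified illustration and not the authors' implementation: intervals are half-open `(start, end)` pairs, shuffled peaks keep their original widths and are placed uniformly at random, and the empirical P value uses the common add-one correction. All of these choices, and the function names, are assumptions.

```python
import random


def count_overlapping(cisres, peaks):
    """Number of cisREs that overlap at least one peak (half-open intervals)."""
    return sum(
        any(p_start < c_end and c_start < p_end for p_start, p_end in peaks)
        for c_start, c_end in cisres
    )


def overlap_permutation_test(cisres, peaks, genome_length,
                             n_permutations=1000, seed=0):
    """Return (observed overlap count, empirical one-sided P value)."""
    rng = random.Random(seed)
    observed = count_overlapping(cisres, peaks)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        shuffled = []
        for p_start, p_end in peaks:
            width = p_end - p_start
            new_start = rng.randrange(0, genome_length - width + 1)
            shuffled.append((new_start, new_start + width))
        if count_overlapping(cisres, shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the P value strictly above zero.
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)
```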

CTCF-mediated loop enrichment analysis

A set of 7,358 representative chromatin interactions in hematopoietic cells was identified from a high-resolution Hi-C map of OCI-AML2 cells as previously described³⁷. The loops whose anchors overlap with cisREs of MECOM down genes were extracted for further analysis. The CTCF-mediated loops (at least one of the anchors containing a CTCF footprint) and non-CTCF-mediated loops (anchors without CTCF footprint) were identified separately. The Low-C data of chromatin looping in LT-HSCs and ST-HSCs were normalized by Knight–Ruiz balanced interaction frequencies at a resolution of 25 kb. We used Juicer to perform aggregate peak analysis³⁶ to test for enrichment of loops within the Low-C data from LT-HSCs and ST-HSCs. Loops containing genes were identified by the genes within the genomic domains between loop anchors.

Analysis of primary AML patient data

Included studies

Three study cohorts were included in the survival analyses. We downloaded RNA-seq V2 expression data and corresponding clinical outcomes from the TCGA LAML³⁹ cohort from cBioPortal (https://www.cbioportal.org/study/summary?id=laml_tcga_pub) for 173 patients with AML. The same was conducted for the BEAT AML cohort for 430 patients (https://www.cbioportal.org/study/summary?id=aml_ohsu_2018)⁴⁰. In addition, the TARGET dataset was downloaded for 440 pediatric patients with AML (https://www.cbioportal.org/study/summary?id=aml_target_2018_pub)⁴¹. To gain maximal insight, adult datasets (TCGA and BEAT) were combined, with subsequent adjustments in analyses to account for study specific features. The only pediatric data used were from the TARGET dataset. The results published here are in part based upon data generated by the Therapeutically Applicable Research to Generate Effective Treatments (https://ocg.cancer.gov/programs/target) initiative, phs000218. The data used for this analysis are available at https://portal.gdc.cancer.gov/projects.

Derivation of variables of interest

A detailed description is provided in the Supplementary Note.

Survival analyses

KM curves were constructed to show survival for each cohort (adult and pediatric) and variable (MECOM expression, MECOM network enrichment score, MECOM network enrichment (categorical), LSC17 and clinical risk score). To visualize survival differences for continuous variables, KM curves were stratified at the optimum threshold determined by Youden’s J statistic, which maximizes the sum of sensitivity and specificity of the metric. Follow-up time was truncated at 2,500 d for the pediatric cohort (thereby including n = 350, 79.5% of all complete cases) and at 1,500 d for the adult cohort (thereby including n = 513, 83.8% of all complete cases) for this and subsequent analyses to limit the issue of data sparsity at very late event time points. KM curves were constructed in R using the survival package and the ggsurvplot function (survminer package).
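Threshold selection by Youden’s J statistic can be sketched in Python as follows. The sketch assumes a binary outcome (for example, death within the truncated follow-up window) and that scores at or above the threshold predict the event; how the outcome was dichotomized is not stated in the text, so this is an illustration of the statistic rather than the authors' procedure.

```python
def youden_threshold(scores, events):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    scores: continuous values (e.g. a network enrichment score).
    events: 1 if the event occurred, 0 otherwise (assumed binary outcome).
    A sample is called positive when score >= threshold.
    """
    n_events = sum(events)
    n_non_events = len(events) - n_events
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        true_pos = sum(1 for s, e in zip(scores, events) if s >= threshold and e)
        true_neg = sum(1 for s, e in zip(scores, events) if s < threshold and not e)
        j = true_pos / n_events + true_neg / n_non_events - 1
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j
```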

HRs and 95% CI of death were determined from Cox proportional hazards models. These were created for each variable, correcting for contributing study in the adult group. This allowed assessment of continuous variables across their full spectrum. This also allowed for assessment of the association of MECOM down network enrichment with mortality, independent of existing clinical approaches such as the clinical risk score and LSC17. Models corrected for age and sex were created and the marginal hazard of mortality was derived and displayed graphically by different ages. The R packages survival (coxph function), rms and ggeffects were used.

For analysis of AML cells from the CCLE database, we downloaded RNA-seq and CRISPR dependency data from the Cancer Dependency Map (https://depmap.org)⁶⁸. We stratified the cohort based on MECOM expression (MECOM-low, log₂(RPKM + 1) < 1; MECOM-high, log₂(RPKM + 1) ≥ 1). Differential essentiality was determined by subtracting the CERES gene effect score of MECOM-low AML samples from that of MECOM-high AML samples. A negative value indicates stronger essentiality in MECOM-high AML.
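The stratification and subtraction can be sketched in Python as follows. This is a minimal example under stated assumptions rather than the authors' code: the function names, the cell line labels and the values are hypothetical, and averaging the CERES scores within each group before subtracting is an assumption, as the text does not state how scores were summarized across cell lines.

```python
import math
from statistics import mean


def stratify_by_mecom(rpkm_by_line, cutoff=1.0):
    """Split cell lines into MECOM-high and MECOM-low by log2(RPKM + 1)."""
    high, low = [], []
    for line, rpkm in rpkm_by_line.items():
        if math.log2(rpkm + 1) >= cutoff:
            high.append(line)
        else:
            low.append(line)
    return high, low


def differential_essentiality(ceres_by_line, high, low):
    """Mean CERES score in MECOM-high minus mean CERES score in MECOM-low.

    CERES scores are more negative for more essential genes, so a negative
    difference indicates stronger essentiality in MECOM-high lines.
    """
    high_mean = mean(ceres_by_line[line] for line in high)
    low_mean = mean(ceres_by_line[line] for line in low)
    return high_mean - low_mean


# Hypothetical toy data for one gene across four AML cell lines.
mecom_rpkm = {"line_A": 0.0, "line_B": 0.5, "line_C": 3.0, "line_D": 7.0}
gene_ceres = {"line_A": -0.1, "line_B": -0.3, "line_C": -0.9, "line_D": -0.7}
high_lines, low_lines = stratify_by_mecom(mecom_rpkm)
diff = differential_essentiality(gene_ceres, high_lines, low_lines)
```

In this toy example the gene is more essential in the MECOM-high lines, so the difference is negative.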

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Summary statistics from RNA-seq studies are available in Supplementary Tables 2, 3 and 7. HemeMap correlation data are available in Supplementary Tables 4 and 5. All sequencing data are deposited in the National Center for Biotechnology Information Gene Expression Omnibus under Super Series GSE175521, including GSE175515 for MUTZ-3 and primary human CD34⁺ LT-HSPC bulk RNA-seq; GSE175516 for LT-HSPC 10x Genomics single-cell RNA-seq data; GSE175518 for primary human CD34⁺ LT-HSPC Amplicon-seq data; GSE175520 for primary human CD34⁺ LT-HSPC SmartSeq2 data; GSE214399 for CTCF in MUTZ-3 ChIP-seq data; and GSE216225 for F36P, HNT34 and primary human CD34⁺ HSPC bulk RNA-seq data and HSPC 10x Genomics scRNA-seq data. Publicly available AML gene expression data were downloaded from the following links and analyzed as described in the Methods: TCGA LAML (https://www.cbioportal.org/study/summary?id=laml_tcga_pub), TARGET AML (https://www.cbioportal.org/study/summary?id=aml_target_2018_pub) and BEAT AML (https://www.cbioportal.org/study/summary?id=aml_ohsu_2018). Source data are provided with this paper.

Code availability

Source data for reproducing results of this study are available on GitHub (https://github.com/sankaranlab/mecom_var).

References

Liggett, L. A. & Sankaran, V. G. Unraveling hematopoiesis through the lens of genomics. Cell 182, 1384–1400 (2020).
Karantanos, T. & Jones, R. J. Acute myeloid leukemia stem cell heterogeneity and its clinical relevance. Adv. Exp. Med. Biol. 1139, 153–169 (2019).
Bluteau, O. et al. A landscape of germ line mutations in a cohort of inherited bone marrow failure patients. Blood 131, 717–732 (2018).
Germeshausen, M. et al. MECOM-associated syndrome: a heterogeneous inherited bone marrow failure syndrome with amegakaryocytic thrombocytopenia. Blood Adv. 2, 586–596 (2018).
Niihori, T. et al. Mutations in MECOM, encoding oncoprotein EVI1, cause radioulnar synostosis with amegakaryocytic thrombocytopenia. Am. J. Hum. Genet. 97, 848–854 (2015).
Goyama, S. et al. Evi-1 is a critical regulator for hematopoietic stem cells and transformed leukemic cells. Cell Stem Cell 3, 207–220 (2008).
Christodoulou, C. et al. Live-animal imaging of native haematopoietic stem and progenitor cells. Nature 578, 278–283 (2020).
Zhang, Y. et al. PR-domain-containing Mds1-Evi1 is critical for long-term hematopoietic stem cell function. Blood 118, 3853–3861 (2011).
Kataoka, K. et al. Evi1 is essential for hematopoietic stem cell self-renewal and its expression marks hematopoietic cells with long-term multilineage repopulating activity. J. Exp. Med. 208, 2403–2416 (2011).
Yuasa, H. et al. Oncogenic transcription factor Evi1 regulates hematopoietic stem cell proliferation through GATA-2 expression. EMBO J. 24, 1976–1987 (2005).
Bindels, E. M. J. et al. EVI1 is critical for the pathogenesis of a subset of MLL-AF9-rearranged AMLs. Blood 119, 5838–5849 (2012).
Ayoub, E. et al. EVI1 overexpression reprograms hematopoiesis via upregulation of Spi1 transcription. Nat. Commun. 9, 4239 (2018).
Glass, C. et al. Global identification of EVI1 target genes in acute myeloid leukemia. PLoS ONE 8, e67134 (2013).
Bard-Chapeau, E. A. et al. EVI1 oncoprotein interacts with a large and complex network of proteins and integrates signals through protein phosphorylation. Proc. Natl Acad. Sci. USA 110, E2885–E2894 (2013).
Kurokawa, M. et al. The evi-1 oncoprotein inhibits c-Jun N-terminal kinase and prevents stress-induced cell death. EMBO J. 19, 2958–2968 (2000).
Tomellini, E. et al. Integrin-α3 is a functional marker of ex vivo expanded human long-term hematopoietic stem cells. Cell Rep. 28, 1063–1073 (2019).
Pellegrino, M. et al. High-throughput single-cell DNA sequencing of acute myeloid leukemia tumors with droplet microfluidics. Genome Res. 28, 1345–1352 (2018).
Kurosaki, T., Popp, M. W. & Maquat, L. E. Quality and quantity control of gene expression by nonsense-mediated mRNA decay. Nat. Rev. Mol. Cell Biol. 20, 406–420 (2019).
Fares, I. et al. Cord blood expansion. Pyrimidoindole derivatives are agonists of human hematopoietic stem cell self-renewal. Science 345, 1509–1512 (2014).
Laurenti, E. et al. CDK6 levels regulate quiescence exit in human hematopoietic stem cells. Cell Stem Cell 16, 302–313 (2015).
McIntosh, B. E. et al. Nonirradiated NOD,B6.SCID Il2rγ-/- Kit(W41/W41) (NBSGW) mice support multilineage engraftment of human hematopoietic cells. Stem Cell Rep. 4, 171–180 (2015).
Domínguez Conde, C. et al. Cross-tissue immune cell analysis reveals tissue-specific features in humans. Science 376, eabl5197 (2022).
Bao, E. L. et al. Inherited myeloproliferative neoplasm risk affects haematopoietic stem cells. Nature 586, 769–775 (2020).
Dey, S. S., Kester, L., Spanjaard, B., Bienko, M. & van Oudenaarden, A. Integrated genome and transcriptome sequencing of the same cell. Nat. Biotechnol. 33, 285–289 (2015).
Finak, G. et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 16, 278 (2015).
Squair, J. W. et al. Confronting false discoveries in single-cell differential expression. Nat. Commun. 12, 5692 (2021).
Wahlster, L. et al. Familial thrombocytopenia due to a complex structural variant resulting in a WAC-ANKRD26 fusion transcript. J. Exp. Med. 218, e20210444 (2021).
Corces, M. R. et al. Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat. Genet. 48, 1193–1203 (2016).
Granja, J. M. et al. Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nat. Biotechnol. 37, 1458–1465 (2019).
Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384 (2016).
Ulirsch, J. C. et al. Interrogation of human hematopoiesis at single-cell and single-variant resolution. Nat. Genet. 51, 683–693 (2019).
Zhang, X. et al. Large DNA methylation nadirs anchor chromatin loops maintaining hematopoietic stem cell identity. Mol. Cell 78, 506–521 (2020).
Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Ciau-Uitz, A., Wang, L., Patient, R. & Liu, F. ETS transcription factors in hematopoietic stem cell development. Blood Cells Mol. Dis. 51, 248–255 (2013).
Beck, D. et al. Genome-wide analysis of transcriptional regulators in human HSPCs reveals a densely interconnected network of coding and noncoding genes. Blood 122, e12–e22 (2013).
Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Takayama, N. et al. The transition from quiescent to activated states in human hematopoietic stem cells is governed by dynamic 3D genome reorganization. Cell Stem Cell 28, 488–501 (2021).
Qi, Q. et al. Dynamic CTCF binding directly mediates interactions among cis-regulatory elements essential for hematopoiesis. Blood 137, 1327–1339 (2021).
Cancer Genome Atlas Research Network et al. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368, 2059–2074 (2013).
Tyner, J. W. et al. Functional genomic landscape of acute myeloid leukaemia. Nature 562, 526–531 (2018).
Bolouri, H. et al. The molecular landscape of pediatric acute myeloid leukemia reveals recurrent structural alterations and age-specific mutational interactions. Nat. Med. 24, 103–112 (2018).
Glass, C., Wilson, M., Gonzalez, R., Zhang, Y. & Perkins, A. S. The role of EVI1 in myeloid malignancies. Blood Cells Mol. Dis. 53, 67–76 (2014).
Gröschel, S. et al. Deregulated expression of EVI1 defines a poor prognostic subset of MLL-rearranged acute myeloid leukemias: a study of the German-Austrian Acute Myeloid Leukemia Study Group and the Dutch-Belgian-Swiss HOVON/SAKK Cooperative Group. J. Clin. Oncol. 31, 95–103 (2013).
Ng, S. W. K. et al. A 17-gene stemness score for rapid determination of risk in acute leukaemia. Nature 540, 433–437 (2016).
Gröschel, S. et al. A single oncogenic enhancer rearrangement causes concomitant EVI1 and GATA2 deregulation in leukemia. Cell 157, 369–381 (2014).
Yamazaki, H. et al. A remote GATA2 hematopoietic enhancer drives leukemogenesis in inv(3)(q21;q26) by activating EVI1 expression. Cancer Cell 25, 415–427 (2014).
Porteus, M. H. A new class of medicines through DNA editing. N. Engl. J. Med. 380, 947–959 (2019).
Stein, S. et al. Genomic instability and myelodysplasia with monosomy 7 consequent to EVI1 activation after gene therapy for chronic granulomatous disease. Nat. Med. 16, 198–204 (2010).
Kappas, N. C. & Bautch, V. L. Maintenance and in vitro differentiation of mouse embryonic stem cells to form blood vessels. Curr. Protoc. Cell Biol. 23, Unit 23.3 (2007).
Bak, R. O., Dever, D. P. & Porteus, M. H. CRISPR/Cas9 genome editing in human hematopoietic stem cells. Nat. Protoc. 13, 358–376 (2018).
Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 37, 224–226 (2019).
Basak, A. et al. Control of human hemoglobin switching by LIN28B-mediated regulation of BCL11A translation. Nat. Genet. 52, 138–145 (2020).
Li, B. et al. Cumulus provides cloud-based data analysis for large-scale single-cell and single-nucleus RNA-seq. Nat. Methods 17, 793–798 (2020).
Trombetta, J. J. et al. Preparation of single-cell RNA-seq libraries for next generation sequencing. Curr. Protoc. Mol. Biol. 107, 4.22.1–17 (2014).
Yang, J. et al. The I-TASSER Suite: protein structure and function prediction. Nat. Methods 12, 7–8 (2015).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Hie, B., Bryson, B. & Berger, B. Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat. Biotechnol. 37, 685–691 (2019).
Korsunsky, I. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16, 1289–1296 (2019).
Machanick, P. & Bailey, T. L. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696–1697 (2011).
Bailey, T. L. DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics 27, 1653–1659 (2011).
Kulakovskiy, I. V. et al. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-seq analysis. Nucleic Acids Res. 46, D252–D259 (2018).
Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. & Noble, W. S. Quantifying similarity between motifs. Genome Biol. 8, R24 (2007).
Yu, F., Sankaran, V. G. & Yuan, G.-C. CUT&RUNTools 2.0: a pipeline for single-cell and bulk-level CUT&RUN and CUT&Tag data analysis. Bioinformatics 38, 252–254 (2021).
Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).
Pique-Regi, R. et al. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 21, 447–455 (2011).
Zhang, Y. et al. Model-based analysis of ChIP-seq (MACS). Genome Biol. 9, R137 (2008).
Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
Ghandi, M. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019).


Acknowledgements

We are grateful to members of the Sankaran laboratory and numerous colleagues for valuable comments and suggestions. This work was supported by the New York Stem Cell Foundation (V.G.S.), a gift from the Lodish Family to Boston Children’s Hospital (V.G.S.), the Klarman Cell Observatory (A.R.), the Edward P. Evans Foundation (V.G.S.) and National Institutes of Health Grants R01 DK103794, R01 CA265726 and R01 HL146500 (V.G.S.). R.A.V. and L.W. received support from National Institutes of Health Grant T32 HL007574. R.A.V. is supported by the Edward P. Evans Center for Myelodysplastic Syndromes at the Dana-Farber Cancer Institute, the Julia’s Wings Foundation and the Office of Faculty Development at Boston Children’s Hospital. S.K.N. is a Scholar of the American Society of Hematology. V.G.S. is a New York Stem Cell-Robertson Investigator.

Author information

Liming Tao & Aviv Regev
Present address: Genentech, South San Francisco, CA, USA
Satish K. Nandakumar
Present address: Department of Cell Biology, Albert Einstein College of Medicine, Albert Einstein Cancer Center, Ruth L. and David S. Gottesman Institute for Stem Cell Research and Regenerative Medicine, Bronx, NY, USA
These authors contributed equally: Richard A. Voit, Liming Tao, Fulong Yu.

Authors and Affiliations

Division of Hematology/Oncology, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
Richard A. Voit, Fulong Yu, Liam D. Cato, Blake Cohen, Travis J. Fleming, Mateusz Antoszewski, Xiaotian Liao, Claudia Fiorini, Satish K. Nandakumar, Lara Wahlster, Kristian Teichert & Vijay G. Sankaran
Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
Richard A. Voit, Fulong Yu, Liam D. Cato, Blake Cohen, Travis J. Fleming, Mateusz Antoszewski, Xiaotian Liao, Claudia Fiorini, Satish K. Nandakumar, Lara Wahlster, Kristian Teichert & Vijay G. Sankaran
Broad Institute of MIT and Harvard, Cambridge, MA, USA
Richard A. Voit, Liming Tao, Fulong Yu, Liam D. Cato, Blake Cohen, Travis J. Fleming, Mateusz Antoszewski, Xiaotian Liao, Claudia Fiorini, Satish K. Nandakumar, Lara Wahlster, Kristian Teichert, Aviv Regev & Vijay G. Sankaran
Howard Hughes Medical Institute, Chevy Chase, MD, USA
Aviv Regev
Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
Aviv Regev
Harvard Stem Cell Institute, Cambridge, MA, USA
Vijay G. Sankaran

Authors

Richard A. Voit
Liming Tao
Fulong Yu
Liam D. Cato
Blake Cohen
Travis J. Fleming
Mateusz Antoszewski
Xiaotian Liao
Claudia Fiorini
Satish K. Nandakumar
Lara Wahlster
Kristian Teichert
Aviv Regev
Vijay G. Sankaran

Contributions

R.A.V., L.T., F.Y. and V.G.S. conceived and designed the experiments and wrote the manuscript with input from all authors. R.A.V., L.T., L.D.C., B.C., T.J.F., M.A., X.L., C.F., S.K.N., L.W. and K.T. performed functional studies and provided interpretation. F.Y. and L.T. performed the computational analyses. F.Y. designed and developed HemeMap. A.R. and V.G.S. provided supervision and overall project oversight.

Corresponding authors

Correspondence to Richard A. Voit or Vijay G. Sankaran.

Ethics declarations

Competing interests

A.R. is a founder and equity holder of Celsius Therapeutics, an equity holder in Immunitas Therapeutics and until 31 August 2020 was a scientific advisory board member of Syros Pharmaceuticals, Neogene Therapeutics, Asimov and Thermo Fisher Scientific. Since 1 August 2020, A.R. has been an employee of Genentech, a member of the Roche Group. V.G.S. serves as an advisor to and/or has equity in Branch Biosciences, Ensoma, Novartis, Forma and Cellarity, all unrelated to the present work. The authors have no other competing interests to declare.

Peer review

Peer review information

Nature Immunology thanks H. Grimes and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Laurie A Dempsey, in collaboration with the Nature Immunology team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Modeling MECOM haploinsufficiency in human CD34⁺ HSPCs.

(a) Schematic of the MECOM locus annotated with the location of sgRNAs (sg1-sg9) tested for efficiency of MECOM editing. The binding site of sg8 (underlined) which is used in subsequent studies is shown, and clinical mutations annotated with amino acid number that have been described in MECOM haploinsufficient bone marrow failure (red) are indicated. (b) Predicted partial protein structure of the MECOM zinc finger domain with mutated residues shown as spheres. These mutations are expected to disrupt the structure of the zinc finger, either through abrogation of Zn coordination (H751, C766) or tethering between the ZnF (R750, R778). (c) Percent modified alleles (left y-axis) and percent LT-HSCs of total live cells (right y-axis) after CRISPR editing of primary human CD34⁺ HSPCs. Editing efficiency was detected at 72 hours after RNP delivery of Cas9 and sgRNA by nucleofection and percent of live cells that remained in the LT-HSC gate was evaluated on day 6. LT-HSCs are defined by the following immunophenotype: CD34⁺CD45RA⁻CD90⁺CD133⁺EPCR⁺ITGA3⁺. sg2, sg5, sg7, sg8 are sgRNAs targeting MECOM as described in Extended Data Fig. 1a. n = 3 biologically independent samples. Mean is plotted and error bars show s.e.m. (d) Comparison of Sanger sequencing followed by ICE analysis and Next Generation Sequencing (NGS) for the detection of CRISPR edits. AAVS1 (blue) and MECOM (red) edited samples were analyzed by ICE and NGS in parallel. (e) MECOM editing in human CD34⁺ HSPCs after RNP delivery by nucleofection. Editing frequency was detected at 48 hours by Sanger sequencing of genomic DNA. Transcription of edited MECOM alleles was determined by cDNA synthesis followed by Sanger sequencing of RNA from bulk HSPCs at 48 hours. n = 3 biologically independent samples. Mean is plotted and error bars show s.e.m. (f) MECOM expression following CRISPR editing. 
MECOM expression (normalized to GAPDH) in bulk HSPCs was detected by qRT-PCR (n = 3 AAVS1, n = 9 MECOM; three biologically independent experiments) and was normalized to expression in the AAVS1-edited sample on the same day. Mean is plotted and error bars show s.e.m. Two-sided Student t-test used. *P = 1.7e-3, **P = 2.5e-4. (g) MECOM expression in LT-HSCs. MECOM expression (normalized to GAPDH) was detected by qRT-PCR (n = 3 per group; three biologically independent experiments) in bulk CD34⁺ HSPCs and in LT-HSCs sorted on day 3 after CRISPR editing. Mean is plotted and error bars show s.e.m. Two-sided Student t-test used. *P = 5.1e-3, **P = 8.3e-4. (h) Expansion of LT-HSCs in culture. HSPCs were cultured in the presence (n = 2) or absence (n = 2) of the HSC self-renewal agonist UM171. Percent of LT-HSCs was determined by FACS as in Fig. 1e and was used to calculate the total LT-HSC number. Cells were supplemented with fresh media every 2 days. (i) Expansion time course of bulk CD34⁺ HSPCs following CRISPR editing. HSPCs were thawed into HSC media containing 35 nM UM171 and underwent CRISPR editing 24 hours later. Cells were counted daily by trypan blue exclusion starting on day 2 after CRISPR editing and media was added to maintain equal confluency. n = 3 per group. Mean is plotted and error bars show s.e.m. Error bars that are shorter than the size of the symbols have been omitted for clarity. Two-sided Student t-test used. *P = 5e-3. (j) Stacked bar graph of cell cycle status of bulk HSPCs and HSCs (HSC: CD34⁺CD45RA⁻CD90⁺CD133⁺) as determined by EdU incorporation and 7-AAD staining. On day 5 after CRISPR editing, cells were incubated with EdU for 2 hours, then fixed and permeabilized prior to 7-AAD and cell surface staining. AAVS1-edited (A) and MECOM-edited (M) samples were compared by the proportion of cells in G0/G1 (EdU⁻/2n DNA content), S (EdU⁺), or M (EdU⁻/>2n DNA content) in bulk CD34⁺ cells or CD34⁺CD45RA⁻CD90⁺ HSCs. n = 3 per group. 
Mean is plotted and error bars show s.e.m. Two-sided Student t-test used. *P = 8.1e-3. (k) Stacked bar graph of cell cycle status of LT-HSCs as determined by transcriptional signatures of single-cell LT-HSCs. UCB CD34⁺ cells underwent CRISPR perturbation of MECOM or AAVS1 and were maintained in HSC media. On day 4 after editing, LT-HSCs were sorted and 10x scRNA sequencing was performed. There was no difference in cell cycle state in LT-HSCs following AAVS1 or MECOM editing. (l) Analysis of cell expansion following CRISPR editing. AAVS1- or MECOM-edited HSPCs were labeled with CFSE and successive generations of cell divisions were determined by CFSE signal intensity on day 5, which was used to calculate the replication index, defined as the total number of divided cells divided by the number of cells that underwent at least one division. Mean of three independent experiments is plotted and error bars show s.e.m. Two-sided Student t-test used. *P = 5e-2. (m) Total number of LT-HSCs following MECOM editing. Primary human CD34⁺ HSPCs underwent CRISPR editing on day 1 after thawing and were cultured in HSC media containing UM171, which was changed every 2 days. On day 6 after editing, the percentage of immunophenotypic LT-HSCs determined by flow cytometry and the total cell number determined by trypan blue exclusion were used to calculate the total number of LT-HSCs in culture. Mean of three independent experiments is plotted and error bars show s.e.m. Two-sided Student t-test used. *P = 4.7e-3. (n) Stacked bar plots of colony-forming assay comparing MECOM-edited HSPCs derived from peripherally mobilized CD34⁺ cells from healthy adult donors (n = 6) to AAVS1-edited controls (n = 3). CFU-GEMM, colony-forming unit (CFU) granulocyte erythroid macrophage megakaryocyte; CFU-GM, CFU granulocyte macrophage; CFU-M, CFU macrophage; CFU-G, CFU granulocyte. Mean colony number is plotted and error bars show s.e.m. Two-sided Student t-test used. *P = 3.9e-2, **P = 2.5e-4, ***P = 1.7e-5; ns, not significant. 
(o-p) NGS of MECOM in human HSPCs following CRISPR editing, prior to xenotransplantation (o), and after harvest from bone marrow at 16 weeks of one representative mouse (p). Sequences present at frequencies >0.5% are displayed. (q) Analysis of bone marrow of mice at week 16 following transplantation of MECOM-edited (n = 5) and AAVS1-edited (n = 3) adult HSPCs. Mean is indicated by black line and each data point represents one mouse. Two-sided Student t-test used. *P = 3.8e-2. (r) Analysis of the MECOM locus of human cells harvested from mice following primary or secondary xenotransplantation. Half of the primary recipient mice (4/8) had human chimerism >0.25% (circles) and the other half had chimerism <0.25% (triangles) but had human MECOM sequences that were detectable by PCR. All of the secondary recipients had human chimerism <0.25% but had human MECOM sequences that were detectable by PCR.


Extended Data Fig. 2 Single-cell RNA sequencing of LT-HSCs after MECOM editing.

(a) Bar graphs showing the number of cells in each of the 11 hematopoietic cell clusters identified by single cell RNA sequencing of CD34⁺CD45RA⁻CD90⁺ HSCs following AAVS1 (yellow) or MECOM (red) editing. Mean is plotted and each of two biological replicates is shown. Total number of cells profiled in each group: AAVS1 – 19,375, MECOM – 19,821. (b) Uniform manifold approximation and projection (UMAP) plot of 263,828 single cells from human umbilical cord blood, colored according to HSC signature (CD34, HLF, CRHBP). (c) Transcriptional identities of cells stratified by HSC signature score. HSC signature score was calculated for CD34⁺CD45RA⁻CD90⁺ HSCs from Fig. 2d. Cells were grouped into high HSC signature score (>0.5), mid HSC signature score (>0 and <0.5), and low HSC signature score (<0) clusters, and cell identities were determined by transcriptional signatures using Celltypist⁴¹. Cells with a high HSC signature score were enriched for HSCs and cMPPs. Abbreviations of cell types defined in Fig. 2a. (d) Stacked bar graph showing AAVS1 or MECOM edited CD34⁺CD45RA⁻CD90⁺ HSCs stratified by expression of HSC signature score as defined in Extended Data Fig. 2c. MECOM editing leads to a depletion of cells with high HSC signature score. (e-g) UMAP plots of the normalized expression of CD34 (e), HLF (f), and CRHBP (g) in phenotypic LT-HSCs. The combined expression of these three genes defines the HSC signature in Fig. 2d, e and Extended Data Fig. 2c, d.


Extended Data Fig. 3 Characterization of the MECOM network in LT-HSCs.

(a) Scatter plot of gene expression in LT-HSCs following AAVS1 or MECOM editing showing the expression of all genes. The inset box highlights the subset of genes described in Fig. 3a that contains the differentially expressed genes that make up the MECOM regulatory network. (b) Volcano plot projection of the data from Fig. 3a displaying the small but significant fold changes in gene expression of MECOM down genes (log₂ fold change < −0.05) and MECOM up genes (log₂ fold change > 0.05) with P < 1e-20 as determined by Mann-Whitney U test. The log₂ fold change of MPO expression is beyond the scale of the axis and is noted by a red arrow. (c-d) Box plots showing expression of a subset of MECOM down (c) and MECOM up (d) genes in a representative random permutation of cohort assignments, demonstrating no difference in gene expression. Gray dots show imputed gene expression in single cells. n = 1,000 randomly assigned cells in each group. The box plot center line, limits, and whiskers represent the median, quartiles, and interquartile range, respectively. (e) Scatter plot of gene expression in CD34⁺CD45RA⁻CD90⁺CD133⁺EPCR⁺ITGA3⁺ LT-HSCs enriched for the HSC signature compared to bulk LT-HSCs. Expression differences between MECOM and AAVS1-edited LT-HSCs were calculated and MECOM down and MECOM up genes are plotted. Correlation was calculated using Spearman’s rank correlation test and significance was calculated using permutation testing. (f) Scatter plot of enrichment scores of MECOM down and MECOM up gene sets in hematopoietic progenitors. CD34⁺CD45RA⁻CD90⁺ HSCs from Fig. 2b were clustered by cell identities as in Fig. 2a. Differences in gene expression between AAVS1 and MECOM edited samples in each cluster were calculated and used to query for the enrichment of MECOM down (red) or MECOM up (blue) gene sets by GSEA. The x-axis plots the Normalized Enrichment Score (NES) and the y-axis plots -log₁₀(P value) for each cluster as calculated by the Kolmogorov-Smirnov (K-S) test. Significant enrichment was defined as NES > 1.5 or < −1.5 and P < 0.01.


Extended Data Fig. 4 Lentiviral expression of MECOM rescues LT-HSCs but does not reverse upregulation of MECOM up genes.

(a) Schematic of lentiviral vector for increased MECOM expression. MECOM sgRNA binding site is shown in bold, and wobble mutations introduced by PCR are indicated. LTR, long terminal repeat; IRES, internal ribosome entry site. (b) Edited allele frequency of intended endogenous MECOM locus and MECOM cDNA after viral integration. Editing and infection were performed as in Fig. 4a. Integrated viral cDNA was amplified using a forward primer in the cDNA sequence and reverse primer in the IRES sequence. n = 3 biologically independent samples. Mean is plotted and error bars show s.e.m. (c) FACS plots for LT-HSC detection after MECOM editing and rescue. Gating strategy as in Fig. 1e. Percentages show the mean (± s.e.m.) of three independent experiments. GFP ratio (Fig. 4e) is defined as the ratio of GFP⁺ cells in LT-HSC population (column 4) to GFP⁺ cells in the bulk population (column 5). (d) Cell expansion after MECOM editing and rescue. Increased expansion of HSPCs after MECOM editing is not reversed by viral MECOM expression. AAVS1, edited at AAVS1, infected with GFP virus; MECOM, edited at MECOM, infected with GFP virus; rescue, edited at MECOM, infected with MECOM virus. n = 3 for each group. Mean is plotted and error bars show s.e.m. Two-sided Student t-test used. *P = 3.7e-3. (e) Bar graph of the effect of MECOM isoform overexpression on the maintenance of LT-HSCs. HSPCs were edited at AAVS1 (yellow) or MECOM (red) and infected with lentivirus encoding GFP or MECOM isoforms as displayed. The percentage of LT-HSCs was determined by FACS. n = 2 biologically independent samples. Mean is plotted and error bars show s.e.m. (f-g) GSEA of MECOM up genes after editing and rescue in bulk LT-HSCs. (f) MECOM up genes are more highly enriched in AAVS1 samples in bulk in contrast to data from single cell analysis (Fig. 3a). (g) MECOM up genes are further increased after MECOM viral infection. The Kolmogorov-Smirnov (K-S) test was used to determine the significance of GSEA.


Extended Data Fig. 5 Establishment of a cis-regulatory network in HSCs.

(a) Schematic view demonstrating different types of functional interactions between cis-regulatory elements and genes. HemeMap predicts these interactions by integration of multiomics data including RNA-seq, ATAC-seq and promoter capture Hi-C (PC-HiC) data across 16 or 18 hematopoietic cell types. (b) Bar graph showing the overlap between genomic interactions nominated by HemeMap and experimentally defined interactions. More than half of the direct interactions nominated by PC-HiC and RNA–ATAC correlations were supported by evidence from Hi-C interactions in HSPCs. (c) Correlation of cisRE-gene interaction strength with gene expression in HSCs. HemeMap scores were calculated for each cisRE-gene interaction, and HemeMap interactions were arranged by increasing score and grouped evenly into 50 bins. Median gene expression in each bin is depicted (bars). The median expression of a randomly sampled equal-sized gene set is shown (dots). (d) Validation of cisREs associated with MECOM network genes. H3K4 methylation, DNase hypersensitivity and H3K27 trimethylation signals from HSPCs⁵² at MECOM network cisREs reveal an active transcriptional pattern consistent with enhancer elements. (e) Distribution of HemeMap scores in HSCs. To construct the HSC-specific regulatory network, significant interaction scores >8.91 were included. Significance threshold was determined by chi-square distribution. (f) Comparison of interaction strengths. cisREs containing an ETS footprint (n = 711) were significantly associated with stronger HemeMap scores than those without (n = 6,371). P values as shown were calculated using a two-sided Wilcoxon signed-rank test. The box plot center line, limits, and whiskers represent the median, quartiles and 1.5× interquartile range, respectively. (g-h) Analysis of TF footprint co-occurrence in the cisREs associated with MECOM down genes (g) and MECOM up genes (h), respectively. The frequency of occurrence and P values were calculated using a two-sided hypergeometric test. The color and size of dots are proportional to statistical significance. (i) Experimentally defined EVI1 ChIP-seq peaks²⁶ were compared to HemeMap-predicted cisREs of MECOM network genes and show significant overlap with cisREs that contain ETS footprints. P value was determined by permutation testing.
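The binning described in panel (c) (interactions ordered by increasing score, split evenly into 50 bins, median expression taken per bin) can be sketched as follows. This is an illustrative reconstruction using only the Python standard library; the function name, the input layout, and the toy data are assumptions and do not reproduce the HemeMap implementation.

```python
from statistics import median
from typing import List, Sequence


def median_expression_by_score_bin(
    scores: Sequence[float],
    expression: Sequence[float],
    n_bins: int = 50,
) -> List[float]:
    """Sort interactions by score, split them into n_bins near-equal groups,
    and return the median expression of the linked genes in each group."""
    if len(scores) != len(expression):
        raise ValueError("scores and expression must have the same length")
    # Indices of interactions ordered by increasing score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    medians = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        if members:  # skip empty bins when there are fewer items than bins
            medians.append(median(expression[i] for i in members))
    return medians


# Toy data (hypothetical): expression rises with interaction score.
toy_scores = list(range(100))
toy_expression = [2 * s for s in toy_scores]
toy_medians = median_expression_by_score_bin(toy_scores, toy_expression)
```

The random-gene-set control shown as dots in the panel would apply the same per-bin median to an equal-sized random sample of genes; it is omitted here to keep the sketch deterministic.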

Extended Data Fig. 6 CTCF-mediated looping of MECOM down genes in HSCs.

(a) Density plot showing the distribution of distance between ETS motifs and CTCF motifs in cisREs of MECOM network genes. Average distance is 36 base pairs (bp). (b) Box plots depicting the quantitative difference of CTCF ChIP-seq signals between CD34⁺ HSPCs and lineage-committed cells from Fig. 6d. The normalized CTCF ChIP-seq signals of 50 bp regions centered on CTCF footprints (n = 6,185) were calculated and compared. The significance was determined using a two-sided Wilcoxon signed-rank test and P values for each comparison are displayed. The box plot center line, limits, and whiskers represent the median, quartiles, and interquartile range, respectively. (c-e) Box plots displaying the chromatin accessibility of CTCF-associated cisREs during hematopoietic differentiation. MECOM down cisREs that contain a CTCF footprint (n = 6,185) are associated with progressively less chromatin accessibility during differentiation along the (c) erythroid, (d) myeloid, and (e) lymphoid lineages. The box plot center line, limits, and whiskers represent the median, quartiles, and interquartile range, respectively. (f) Chromatin interactions of MECOM down genes based on the presence and orientation of CTCF footprints. 448 chromatin interactions involving MECOM down genes were identified and were categorized as: (1) no CTCF footprint detected at either anchor; (2) CTCF present at both anchors in the same orientation; (3) CTCF present at both anchors in opposite orientations; (4) CTCF present at only one anchor. (g) Bar graphs of CRISPR editing frequencies in human HSPCs. Cells that underwent dual CRISPR perturbation of MECOM and CTCF had similar editing frequencies compared to single-edited cells. n = 3 per group. Mean is plotted and error bars show s.e.m. (h) Bar graphs of total cell number following CRISPR editing. Increased expansion of HSPCs following MECOM perturbation was seen as in Extended Data Fig. 1i and was rescued by dual MECOM and CTCF perturbation. n = 3 per group. Mean is plotted and error bars show s.e.m. Two-sided Student's t-test used. *P = 5e-2. (i) GSEA of MECOM down genes and MECOM up genes in bulk LT-HSCs after MECOM perturbation compared to AAVS1 perturbation. MECOM down genes are depleted and MECOM up genes are enriched following MECOM editing. The Kolmogorov–Smirnov (K-S) test was used to determine the significance of GSEA. (j) GSEA of MECOM down genes and MECOM up genes in bulk LT-HSCs after CTCF perturbation compared to AAVS1 perturbation. MECOM down genes are upregulated after CTCF editing alone, but there is no enrichment of MECOM up genes. The Kolmogorov–Smirnov (K-S) test was used to determine the significance of GSEA. (k) Expression of MECOM down genes that are associated with at least two CTCF footprints (n = 80, left) and those not associated with CTCF footprints (n = 29, right), following either MECOM perturbation alone or dual MECOM and CTCF perturbation. P values as shown were calculated using a two-sided Wilcoxon signed-rank test. The box plot center line, limits, and whiskers represent the median, quartiles, and interquartile range, respectively.
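The four interaction categories in panel (f) amount to a small decision rule over the two loop anchors. The sketch below assumes, purely for illustration, that each anchor is annotated with the strand of its CTCF footprint ("+" or "-") or with None when no footprint is detected; this encoding, the function name, and the example loops are hypothetical.

```python
from collections import Counter
from typing import Optional


def classify_interaction(anchor_a: Optional[str], anchor_b: Optional[str]) -> str:
    """Assign a chromatin interaction to one of the four categories in panel (f).

    Each anchor is '+' or '-' (strand of its CTCF footprint) or None (no footprint).
    """
    for strand in (anchor_a, anchor_b):
        if strand not in ("+", "-", None):
            raise ValueError(f"unexpected strand annotation: {strand!r}")
    if anchor_a is None and anchor_b is None:
        return "no CTCF footprint at either anchor"
    if anchor_a is None or anchor_b is None:
        return "CTCF at only one anchor"
    if anchor_a == anchor_b:
        return "CTCF at both anchors, same orientation"
    return "CTCF at both anchors, opposite orientation"


# Hypothetical anchor annotations for five loops, not data from the study.
example_loops = [("+", "-"), ("+", "+"), (None, "-"), (None, None), ("-", "+")]
category_counts = Counter(classify_interaction(a, b) for a, b in example_loops)
```

This sketch does not separate convergent from divergent footprint pairs within the opposite-orientation class; doing so would additionally require the genomic order of the two anchors.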

Source data

Extended Data Fig. 7 MECOM down gene network enrichment is independently associated with overall and event-free survival.

(a) GSEA of MECOM down genes in primary AML patient samples from TCGA (left) and BEAT AML (right) stratified by MECOM expression. Individual gene expression was averaged from TCGA or BEAT AML samples with MECOM expression of log₂(RPKM) > 4 and compared to the average of samples with MECOM expression of log₂(RPKM) < 4. The Kolmogorov–Smirnov (K-S) test was used to determine the significance of GSEA. (b-d) GSEA of MECOM down genes in primary AML patient samples from TCGA. For each patient sample, expression of every gene was compared to its average expression from all TCGA patient samples, and GSEA was performed to assess for enrichment of MECOM down genes. Representative plots of three individual patients are shown. (b) Patient 2896 had enrichment of MECOM down genes and an overall survival of 230 days. (c) Patient 3011 had depletion of MECOM down genes and an overall survival of 2450 days. (d) Patient 2982 had no significant enrichment or depletion of MECOM down genes and an overall survival of 1110 days. The Kolmogorov–Smirnov (K-S) test was used to determine the significance of GSEA. (e-f) Stacked bar graphs showing the proportion of patients with MECOM network enrichment or depletion following stratification by clinical risk group or LSC17 score in adult (e) or pediatric (f) AML. (g-k) Kaplan–Meier (KM) event-free survival curves for the pediatric AML cohort stratified by (g) MECOM expression, (h) MECOM network enrichment, (i) MECOM NES, (j) clinical risk group, and (k) LSC17. For continuous variables in (g), (i), and (k), the optimal threshold was determined by maximizing sensitivity and specificity on mortality (Youden’s J statistic). Hazard ratios (HR) were computed from univariate Cox proportional hazards models. P values representing the result of Mantel–Cox log-rank testing are displayed. Test for trend was performed for clinical risk group stratification (>2 groups).
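The threshold selection used for the continuous variables in panels (g), (i), and (k) relies on Youden's J statistic, J = sensitivity + specificity − 1. A minimal sketch of that selection follows. It assumes that values at or above the threshold are called high risk and that the outcome is a binary mortality flag; the function name and the toy cohort are hypothetical and not drawn from the patient data.

```python
from typing import Sequence, Tuple


def youden_optimal_threshold(
    values: Sequence[float], died: Sequence[bool]
) -> Tuple[float, float]:
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    Samples with value >= threshold are called high risk (an assumption of this sketch).
    """
    if len(values) != len(died):
        raise ValueError("values and died must have the same length")
    n_pos = sum(1 for d in died if d)
    n_neg = len(died) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    best_threshold, best_j = None, float("-inf")
    for t in sorted(set(values)):  # every observed value is a candidate cut point
        true_pos = sum(1 for v, d in zip(values, died) if d and v >= t)
        true_neg = sum(1 for v, d in zip(values, died) if not d and v < t)
        j = true_pos / n_pos + true_neg / n_neg - 1
        if j > best_j:  # strict comparison keeps the lowest threshold on ties
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Hypothetical toy cohort: higher scores coincide with mortality.
toy_values = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
toy_died = [False, False, False, True, True, True]
toy_threshold, toy_j = youden_optimal_threshold(toy_values, toy_died)
```

The resulting cut point would then define the two groups compared by the log-rank test and the Cox model; those steps are not shown here.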

Source data

Extended Data Fig. 8 Evaluation of the MECOM gene network in high-risk AML.

(a) Violin plots showing MECOM expression in AML samples from CCLE. AML samples were stratified by MECOM expression (log₂(RPKM + 1)). Low, <1 (n = 31); High, ≥1 (n = 13). Mean is plotted and dashed lines indicate quartiles. (b) GSEA of MECOM down genes and MECOM up genes in four AML cell lines with high MECOM expression compared to average expression in MECOM low AML cell lines. MUTZ-3, F36P, HNT34, and OCI-AML4 have enrichment of MECOM down genes and depletion of MECOM up genes. The Kolmogorov–Smirnov (K-S) test was used to determine the significance of GSEA. (c) Volcano plot showing differential CRISPR dependencies of CCLE AMLs stratified by MECOM expression. Average CRISPR dependencies for the CCLE AML cohorts as defined in Extended Data Fig. 8a were determined using CERES, and effect size was calculated by comparing dependency scores of MECOM high and MECOM low AMLs. An effect size of 0 indicates no difference in essentiality, whereas a negative effect size indicates higher essentiality in MECOM high AML. Significance was calculated with the Mann–Whitney U test. (d) FACS plots showing the differentiation of MUTZ-3 cells after CD34 selection. CD34⁺ MUTZ-3 cells were magnetically separated using the EasySep Human CD34 Positive Selection Kit II, cultured in MUTZ-3 media, and analyzed by flow cytometry at the indicated timepoints. (e) Time course of edited allele frequency in MUTZ-3 AML. Genotyping was performed in bulk MUTZ-3 cells following CRISPR editing at AAVS1 (blue) or MECOM (red). n = 3 biologically independent samples. Mean is plotted and error bars show s.e.m. Missing error bars are obscured by the icons. (f) Violin plot of differential gene expression in CD34⁺ MUTZ-3 cells following MECOM perturbation. MECOM down genes are significantly depleted and MECOM up genes are significantly enriched in MECOM-edited samples compared to AAVS1-edited samples, unlike a set of randomly selected genes. Two-sided Student's t-test used. ****P = 1e-4. (g) Bar graphs of CRISPR editing frequencies in MUTZ-3 AML. Cells that underwent dual CRISPR perturbation of MECOM and CTCF had similar editing frequencies compared to single-edited cells. n = 2 biologically independent samples. Mean is plotted and error bars show s.e.m. (h) CTCF ChIP-seq in MUTZ-3 cells after MECOM editing. MUTZ-3-Cas9 cells were transduced with sgMECOM lentivirus and cells were harvested on day 4 for ChIP-seq. There is significant CTCF binding to cisREs of MECOM down genes in MUTZ-3, but no differential binding after MECOM editing.
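The stratification in panel (a) and the sign convention for effect size in panel (c) can be illustrated with a short sketch. The legend does not state how the effect size was computed, so the difference in mean dependency score (MECOM high minus MECOM low) used below is an assumption, as are the function name, the cell line labels, and every number in the example.

```python
from statistics import mean
from typing import Dict


def dependency_effect_size(
    mecom_expression: Dict[str, float],
    dependency: Dict[str, float],
    threshold: float = 1.0,
) -> float:
    """Mean dependency score in MECOM-high lines minus that in MECOM-low lines.

    Lines are MECOM high when log2(RPKM + 1) >= threshold, as in panel (a).
    The difference of means is an assumed effect-size definition for this sketch.
    A negative result means the gene is more essential in MECOM-high lines.
    """
    high = [dependency[c] for c, e in mecom_expression.items()
            if e >= threshold and c in dependency]
    low = [dependency[c] for c, e in mecom_expression.items()
           if e < threshold and c in dependency]
    if not high or not low:
        raise ValueError("both MECOM-high and MECOM-low groups must be non-empty")
    return mean(high) - mean(low)


# Hypothetical cell lines and scores, not CCLE or CERES data.
toy_expression = {"lineA": 0.2, "lineB": 1.5, "lineC": 3.0, "lineD": 0.9}
toy_dependency = {"lineA": -0.1, "lineB": -0.9, "lineC": -0.7, "lineD": -0.3}
toy_effect = dependency_effect_size(toy_expression, toy_dependency)
```

In this toy example the effect size is negative, which under the stated convention corresponds to higher essentiality in the MECOM high group.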

Source data

Supplementary information

Supplementary Information

Supplementary Methods.

Reporting Summary

Supplementary Table 1

Supplementary Tables 1–7.

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 2

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 6

Statistical source data.

Source Data Fig. 8

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 7

Statistical source data.

Source Data Extended Data Fig. 8

Statistical source data.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Voit, R.A., Tao, L., Yu, F. et al. A genetic disorder reveals a hematopoietic stem cell regulatory network co-opted in leukemia. Nat Immunol 24, 69–83 (2023). https://doi.org/10.1038/s41590-022-01370-4

Received: 22 February 2022
Accepted: 25 October 2022
Published: 15 December 2022
Issue Date: January 2023
DOI: https://doi.org/10.1038/s41590-022-01370-4

