While the majority of myelodysplasia and acute myeloid leukemia (MDS/AML) cases are sporadic, rare familial predisposition syndromes have been delineated and now represent a separate disease entity in the revised World Health Organization (WHO) classification of myeloid neoplasms. Germline mutations in ~14 disease genes have been uncovered thus far, with GATA2 representing one of the key transcriptional regulators commonly mutated in inherited MDS/AML. Increasing evidence suggests that aberrations in GATA2 impair its transcription and promoter activation, leading to loss of function and supporting a mechanism of GATA2 haploinsufficiency [3,4,5]. Reduced penetrance, in which family members carry an identical germline mutation yet display variable clinical manifestations, is common and poses a clinical challenge in the diagnosis and management of familial leukemias, particularly when identifying “silent” mutation carriers for genetic screening and for exclusion as potential stem cell transplant donors [6, 7]. Indeed, we have noted that reduced penetrance is a feature among certain GATA2-mutated MDS/AML families, especially those harboring missense germline mutations such as c.1061C>T (p.Thr354Met) (Table S1), although the precise molecular explanation for this has not been investigated.
Analysis of five MDS/AML families harboring p.Thr354Met GATA2 mutations revealed significant intra- and interfamilial variation in disease latency, phenotype, and penetrance (Figure S1). These observations suggest that, within the context of a shared germline mutation, individuals require additional co-operating events to develop overt malignancy. To investigate this hypothesis further, we examined an extensive five-generation pedigree (Fig. 1a) in which two first-degree cousins (IV.1 and IV.6) developed high-risk MDS/AML with monosomy 7, while a third cousin (IV.10) presented with recurrent minor infections and significant monocytopenia [0.1 × 10⁹/L] and neutropenia [0.8 × 10⁹/L] in years (yr.) 1–3, which subsequently stabilized (monocyte count, neutrophils [>1 × 10⁹/L]) 3 years after presentation (Fig. 1b). This contrasted with the parental generation (III.1, III.5, and III.7), in which mutation carriers remain symptom-free, with no evidence of hematopoietic abnormality, at over 60 years of age.
We therefore began with targeted deep sequencing of 33 genes frequently mutated in MDS/AML to define the landscape of secondary genetic mutations across mutation carriers. Notably, while no acquired mutations were detected in asymptomatic family members, all affected cousins analyzed shared an identical somatic ASXL1 mutation (p.Gly646TrpfsTer12) (Fig. 1c). The variant allele frequency (VAF), however, was lower (12%) in IV.10 and remained stable (range 12–6%) over a 6-year monitoring period. While the co-occurrence of ASXL1 and GATA2 mutations has been proposed as one mechanism driving the onset and severity of disease symptoms [9,10,11], the low VAF of the ASXL1 mutation and the sustained improvement in hematopoiesis in IV.10 at later follow-up suggested that the combination of GATA2 and ASXL1 mutations alone is insufficient to promote clonal expansion and leukemic transformation, and that this secondary somatic hit may neither mark disease progression nor indicate when treatment is required. Intriguingly, apart from the ASXL1 mutation, no other acquired mutations were detected among the 33 myeloid genes assessed in the affected individuals. Moreover, on the basis of our observations and in agreement with previous studies [12, 13], it seems that monosomy 7 in IV.1 and IV.6 was acquired after the ASXL1 mutation, hence contributing to the malignancy but not initiating symptoms.
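For readers less familiar with the VAF figures quoted above, the following is a minimal Python sketch of how a VAF is derived from read counts and how it might be translated into an approximate clone size. The function names and the read counts are hypothetical and are not taken from this study; the conversion assumes a heterozygous mutation in a copy-neutral (diploid) region, which may not hold in practice.

```python
def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of sequencing reads at a locus that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def heterozygous_clone_fraction(vaf: float) -> float:
    """Approximate fraction of cells carrying the mutation.

    Assumption (not from the source): the mutation is heterozygous and the
    locus is diploid, so each mutated cell contributes one variant allele
    out of two. Capped at 1.0.
    """
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must lie between 0 and 1")
    return min(1.0, 2.0 * vaf)


# Hypothetical read counts chosen only to reproduce a VAF of 12%.
vaf = variant_allele_frequency(alt_reads=120, total_reads=1000)
clone = heterozygous_clone_fraction(vaf)
print(f"VAF = {vaf:.2f}, approximate clone fraction = {clone:.2f}")
```

Under these assumptions, a VAF of 12% would correspond to a mutant clone making up roughly a quarter of the sampled cells, which illustrates why a low, stable VAF is read as a lack of clonal expansion.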
We next considered whether disease symptoms are modulated by endogenous levels of GATA2. Quantitative real-time PCR (qRT-PCR) of bone marrow material demonstrated total GATA2 expression to be significantly lower in the symptomatic carrier (IV.10-yr.1) than in an asymptomatic carrier (III.7) (Fig. 1d). Notably, Sanger sequencing of the cDNA template revealed striking allele-specific expression (ASE) favoring the mutant (T) allele, with no detectable expression of the wild-type (WT) (C) allele in the symptomatic patient (IV.10), contrasting with biallelic expression in asymptomatic members (III.5 and III.7) (Fig. 1e). This observation was validated by cDNA cloning of III.7 and IV.10 bone marrow samples and subsequent Sanger sequencing of individual clones (Figure S2). Because this suggested that an imbalance in the WT:mutant GATA2 expression ratio may account for the variable disease penetrance in this pedigree, we assessed GATA2 expression in IV.10 at four time-points over a 6-year disease period (yr. 1, 3, 4, and 6). GATA2 expression increased at the later time-points (yr. 4 and 6) (Fig. 1d), coinciding with reactivation of WT (C) allele expression (Fig. 1f) and an improvement in hematological parameters in the absence of any clinical intervention (Fig. 1b).
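The clone-based validation of allelic imbalance described above can be illustrated with a short Python sketch. It counts sequenced cDNA clones carrying the WT (C) or mutant (T) allele and asks how compatible the counts are with balanced (1:1) expression, using an exact two-sided binomial test. The clone counts, function names, and the choice of test are illustrative assumptions and do not reproduce the analysis in Figure S2.

```python
from math import comb


def wt_fraction(wt_clones: int, mutant_clones: int) -> float:
    """Fraction of sequenced cDNA clones that carry the WT allele."""
    total = wt_clones + mutant_clones
    if total == 0:
        raise ValueError("at least one clone is required")
    return wt_clones / total


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count k under the null hypothesis (balanced expression, p = 0.5).
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(pr for pr in probs if pr <= observed * (1 + 1e-9)))


# Hypothetical clone counts, for illustration only.
samples = {
    "monoallelic-like sample": (0, 20),   # (WT clones, mutant clones)
    "biallelic-like sample": (10, 10),
}
for name, (wt, mut) in samples.items():
    frac = wt_fraction(wt, mut)
    p_val = binomial_two_sided_p(wt, wt + mut)
    print(f"{name}: WT fraction = {frac:.2f}, p = {p_val:.3g}")
```

With enough clones, a complete absence of WT clones is very unlikely under balanced expression, whereas an even split is fully compatible with it.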
To test whether monoallelic GATA2 expression has an impact on the transcriptome driving the onset of disease symptoms, we performed RNA-seq with a view to examining downstream biological features distinctive of GATA2 monoallelic (IV.10-yr.1 and 3) vs. biallelic (IV.10-yr.4 and III.7) groups. Unsupervised analysis revealed a clear separation between GATA2 monoallelic and biallelic samples (Fig. 1g, S3 and Table S2). Notably, certain canonical pathways and gene sets related to tumorigenesis (e.g., DNA replication and cell cycle) were enriched in GATA2 monoallelic vs. biallelic groups (Figure S4), potentially reflecting the clinical and phenotypic switch between these two groups. We also noted a significant overexpression of genes with motifs for the GATA2 cofactor PU.1 in their regulatory regions (NES = 2.06) in GATA2 biallelic vs. monoallelic samples, in support of a recent finding that p.Thr354Met mutants bind and interact with PU.1 more tightly than WT, thus sequestering PU.1 from its normal cellular functions. Consequently, the transcriptional activation triggered by PU.1 would be expected to be diminished in our GATA2 monoallelic samples.
The differences observed in these gene-expression profiles prompted us to explore the molecular mechanisms underlying monoallelic GATA2 expression. We hypothesized that these allele-specific changes in GATA2 expression are driven by transient epigenetic mechanisms, including changes in DNA methylation and chromatin mark deposition. A CpG single-nucleotide polymorphism (CpG-SNP) (rs1806462) [C/A], located within the promoter and 5′UTR of GATA2 and overlapping a CpG island, offered a marker to distinguish between mutant and WT alleles, because this SNP creates or abolishes a CpG dinucleotide within the GATA2 promoter region (Fig. 2a). More specifically, cDNA sequencing of the 5′UTR allowed us to define haplotypes, in which the promoter SNP allele (A) resides on the germline mutant GATA2 allele (T) (Fig. 2a(ii)). Apart from IV.10, no other family members and only 2/12 individuals from the pedigrees presented in Figure S1 were heterozygous for this SNP (one of whom is an asymptomatic carrier). We therefore do not infer that this haplotype contributes to the progression of symptoms. Instead, we used this SNP to determine whether allele-specific differences in DNA methylation could explain the silencing of WT GATA2 allele expression observed at earlier time-points in IV.10. As illustrated in Fig. 2b and S5, bisulfite sequencing of a 200-bp region encompassing rs1806462 demonstrated a significant increase in promoter methylation on the WT allele of IV.10 in yr. 1 and yr. 3 following diagnosis, in contrast with the absence of allele-specific differences in methylation at a later time-point.
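The logic of using a heterozygous SNP to phase bisulfite reads can be sketched in Python as follows. Each sequenced clone is assigned to an allele from the base observed at the SNP, and the methylated fraction of CpG calls is then computed per allele. The data layout, function names, and example clones are hypothetical. The sketch also assumes that reads come from the bisulfite-converted top strand, where an unmethylated C at the SNP is read as T, so that both C and T indicate the C allele.

```python
from collections import defaultdict


def assign_allele(base_at_snp: str) -> str:
    """Assign a bisulfite read to the C or A allele of a C/A SNP.

    Assumption: top-strand reads, so an unmethylated C is converted and
    read as T, while A is unaffected by bisulfite treatment.
    """
    base = base_at_snp.upper()
    if base == "A":
        return "A"
    if base in ("C", "T"):
        return "C"
    raise ValueError(f"unexpected base at SNP: {base_at_snp!r}")


def methylation_by_allele(clones):
    """Return the fraction of methylated CpG calls for each allele.

    `clones` is an iterable of (base_at_snp, cpg_calls) pairs, where
    cpg_calls is a sequence of 1 (methylated) or 0 (unmethylated) for the
    CpG sites covered by that clone, excluding the SNP position itself.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for base_at_snp, cpg_calls in clones:
        allele = assign_allele(base_at_snp)
        methylated[allele] += sum(cpg_calls)
        total[allele] += len(cpg_calls)
    return {a: methylated[a] / total[a] for a in total if total[a] > 0}


# Hypothetical clones, for illustration only (not data from this study).
example_clones = [
    ("C", [1, 1, 1, 0]),
    ("T", [1, 1, 0, 1]),
    ("A", [0, 0, 0, 0]),
    ("A", [0, 1, 0, 0]),
]
for allele, fraction in sorted(methylation_by_allele(example_clones).items()):
    print(f"allele {allele}: methylated fraction = {fraction:.3f}")
```

In this toy example the C allele is more heavily methylated than the A allele, which is the kind of allele-specific difference the bisulfite experiment is designed to detect.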
We next sought to establish whether these allele-specific changes in GATA2 methylation and expression are accompanied by changes in chromatin structure at the promoter. H3K4me3 and H3K27me3 define poised or closed chromatin, respectively, rendering it more or less accessible to transcription factors and thereby regulating gene expression. The deposition of these bivalent marks was assessed in IV.10 by allele-specific chromatin immunoprecipitation (ChIP) followed by Sanger sequencing within the GATA2 promoter region encompassing the SNP rs1806462 [C/A]. While there were no apparent allele-specific differences in H3K27me3 deposition across the different time-points of IV.10, an enrichment of H3K4me3 on the promoter of the mutant allele (A) relative to the WT allele (C) was noted in IV.10 monoallelic samples (yr. 1 and 3) (Fig. 2c, S6 and S7). In contrast, and consistent with the pattern observed for DNA methylation, there was no demonstrable difference in H3K4me3 deposition in the IV.10 biallelic sample (yr. 4), coinciding with reactivation of WT allele expression and an overall improvement in clinical parameters. We believe that these observations are in keeping with the notion that H3K4me3 occupancy inhibits de novo DNA methylation, which was borne out by subsequent bisulfite sequencing of H3K4me3-enriched DNA from our ChIP experiments, demonstrating that DNA methylation and H3K4me3 deposition are mutually exclusive in our IV.10 samples (Figure S8).
Collectively, our findings provide a step forward in understanding the molecular mechanisms underlying reduced penetrance in GATA2-mutated MDS/AML pedigrees. Penetrance may be governed by the acquisition of additional co-operating mutations (e.g., ASXL1) combined with dynamic epigenetic reprogramming and subsequent allele-specific expression of the GATA2 mutant allele, adding another level of complexity to the (epi)genetic basis of familial MDS/AML.
We are indebted to the family investigated in this study whose members have kindly donated samples for research. We also thank all the clinicians who have looked after this family over the years. This study was supported by the Saudi Arabian Ministry of Higher Education through a doctoral scholarship awarded to A.F.A.S. and a Bloodwise Programme grant (14032) awarded to J.F., T.V., and I.D.
J.F., A.F.A.S., and A.R.-M. designed the study; A.F.A.S. and A.R.-M. performed the experiments; A.F.A.S., A.R.-M., K.T., and J.F. analyzed the data and wrote the manuscript; K.T., H.S., C.H., T.V., I.D., M.S., and J.C. collated familial clinical information; S.I. provided patient material from tissue bank; S.B., N.L., and D.M. performed targeted deep sequencing; J.W. and A.N. carried out RNA-seq analysis; J.A.H. provided technical ChIP expertise; E.J.K., M.W.W., and C.M.N. provided familial samples; T.B. provided patient blood films; and C.B., A.E., S.R.C., H.T., T.V., and I.D. assisted with data analysis and contributed to the study with fruitful discussions. All authors read, reviewed, and approved the final manuscript.