Epigenomic analysis of KLF1 haploinsufficiency in primary human erythroblasts

Haploinsufficiency for the erythroid-specific transcription factor KLF1 is associated with hereditary persistence of fetal hemoglobin (HPFH). Increased HbF ameliorates the symptoms of β-hemoglobinopathies and downregulation of KLF1 activity has been proposed as a potential therapeutic strategy. However, the feasibility of this approach has been challenged by the observation that KLF1 haploinsufficient individuals with the same KLF1 variant, within the same family, display a wide range of HbF levels. This phenotypic variability is not readily explained by co-inheritance of known HbF-modulating variants in the HBB, HBS1L-MYB and/or BCL11A loci. We studied cultured erythroid progenitors obtained from Maltese individuals in which KLF1 p.K288X carriers display HbF levels ranging between 1.3 and 12.3% of total Hb. Using a combination of gene expression analysis, chromatin accessibility assays and promoter activity tests we find that variation in expression of the wildtype KLF1 allele may explain a significant part of the variability in HbF levels observed in KLF1 haploinsufficiency. Our results have general bearing on the variable penetrance of haploinsufficiency phenotypes and on conflicting interpretations of pathogenicity of variants in other transcriptional regulators such as EP300, GATA2 and RUNX1.

β-hemoglobinopathies such as β-thalassemia and sickle cell anemia (SCA) are caused by mutations within the β-globin subunit of adult hemoglobin (HbA, α2β2). High levels of fetal hemoglobin (HbF, α2γ2) ameliorate the symptoms of β-thalassemia and SCA. Reactivation of fetal γ-globin expression and, in the case of SCA, concomitant downregulation of sickle β-globin expression is therefore seen as an attractive approach to improve the condition of β-hemoglobinopathy patients. Although not completely resolved, fetal-to-adult hemoglobin switching depends on a core network of transcriptional regulators converging on B-cell lymphoma 11A (BCL11A) and Lymphoma Related Factor (LRF, encoded by ZBTB7A), that are direct repressors of the HBG1/2 genes encoding γ-globin [1][2][3] . In erythroid cells, BCL11A and ZBTB7A expression is induced by the transcription factor Krüppel-like factor 1 (KLF1) [4][5][6] . Haploinsufficiency for KLF1 results in hereditary persistence of fetal hemoglobin (HPFH) through a mechanism that involves reduced expression of BCL11A 4, 6 and LRF 5 . Modulation of this regulatory network has been proposed as an approach to increase HbF levels 1-10 , a notion that has been confirmed in animal models [11][12][13][14][15] . As BCL11A 16 and LRF 17 are tumor suppressors with important functions outside erythropoiesis, targeting the correct cells is important. In addition, KLF1 18,19 and LRF 20 are essential for terminal erythroid differentiation. KLF1 variants are associated with a broad spectrum of benign but also severe erythroid defects 9 . These include the In(Lu) and Indian blood types 21 , pyruvate kinase deficiency 22 , increased HbA2 (α2δ2) 23 , increased zinc protoporphyrin 24 , congenital dyserythropoiesis [25][26][27][28] and HPFH 4 . In families with HPFH caused by KLF1 haploinsufficiency, HbF levels are variable suggesting additional regulators 4 . This variation could only be partially explained by coinheritance of known HbF-modulating genotypes at the HBB locus www.nature.com/scientificreports/ itself and trans-acting HPFH loci BCL11A and HBS1L-MYBI 4,7,9 . This raises the possibility that another, as yet unidentified, trans-acting locus is involved. To investigate the phenotypic variability of KLF1 haploinsufficiency, we used cultured erythroid cells carrying the KLF1 p.K288X variant obtained from two Maltese HPFH pedigrees 4 and control Maltese individuals. This nonsense variant, now annotated as SNP rs267607202, ablates the complete zinc finger domain and therefore abrogates DNA binding of the mutant protein. Since the truncated KLF1 protein failed to rescue the HPFH phenotype, the KLF1 p.K288X variant is considered to be a null mutation 4,9 . Here, we used gene expression (RNA-seq) and chromatin accessibility assays (ATAC-seq 29 ) to demonstrate that KLF1 haploinsufficiency affects erythroid gene expression of KLF1 target genes, but -with the exception of a small set of genes including HBG1/2-chromatin accessibility was not altered. The HbF expression in HPFH samples correlated negatively with KLF1, BCL11A and ZBTB7A expression levels. We suggest that allelic variation affecting wildtype KLF1 expression explains a significant part of the HbF variability observed within this cohort of Maltese HPFH individuals.  . 1a). Terminal differentiation induced a short proliferation phase (3 days) followed by growth arrest (Fig. 1b) and morphological changes, which were comparable between HPFH and control cells (Fig. 1c). Based on CD71 (transferrin receptor) and CD235a (glycophorin A) expression and overall hemoglobinization, no major differences in erythroid differentiation were observed ( Fig. 1d; Suppl Fig. 2). The previously identified KLF1 target gene and cell surface marker CD44 was expressed at lower levels on cells from HPFH cultures ( Fig. 1e) 4,21 . Flow cytometry for HbA and HbF at 96 h of differentiation showed a population expressing only HbA and a population expressing both HbF and HbA (Fig. 1f). HbF levels were higher in HPFH compared to control cultures and generally recapitulated the relative expression levels of HbA and HbF in vivo ( Fig. 1f; Table 1, Suppl Table 1). The geometric mean fluorescence intensity (GeoMFI) correlated to the percentage of HbF measured by HPLC on both cultured erythroid cells (Fig. 1g) and erythrocytes (Suppl Fig. 2b), but no significant increase in HbF GeoMFI within the HbF/HbA double-positive cells was observed (Suppl Fig. 2c). This shows that higher HbF results from more cells expressing HbF as opposed to increased HbF expression per cell (Fig. 1f). As reported before, control and HPFH cultures showed higher HbF levels compared to erythrocytes obtained from the same individuals ( Fig. 1f www.nature.com/scientificreports/ differentiation). This was also reflected in the correlation matrix (ρ < 0.93 and ρ > 0.95, Suppl Fig. 3a). At T48, out of 9136 unique RNAs we identified 344 differentially expressed genes (3.7%; FDR < 0.05) between HPFH and control samples with 173 up-and 171 downregulated genes (Suppl Table 2a,b). In line with increased HbF expression, HPFH erythroblasts expressed higher γ-globin mRNA levels (HBG1/2) and lower β-globin mRNA levels (HBB) compared to controls (Fig. 2b,c). Expression of known KLF1 target genes BCAM and CD44 4, 21 was reduced in HPFH samples (Fig. 2d, Suppl Table 2a). Of a set of known globin regulators only BCL11A was significantly lower expressed (1.8-fold) in HPFH compared to control cultures (FDR < 0.05; Fig. 2b, Suppl Fig. 3b) 4 . Of note, KLF1 and ZBTB7A, the gene encoding LRF, were expressed approximately 1.6-fold lower, but these genes did not pass the correction threshold for multiple testing (P-value < 0.001, FDR > 0.05). RT-qPCR validated the different expression levels of BCL11A, HBG1/2, and HBB, but expression of KLF1 and ZBTB7A genes was not significantly different comparing control to HPFH samples (Suppl Fig. 3c). We previously reported that the KLF1 p.K288X mRNA may be subject to nonsense mediated decay and that KLF1 mRNA expression was lower in samples from HPFH individuals 4 . The stringent threshold applied here combined with heterogeneity between the samples may explain the inability to detect significant expression differences for KLF1 and ZBTB7A. Within individual samples, lower KLF1 expression www.nature.com/scientificreports/ correlated with lower expression of BCL11A and ZBTB7A, and higher expression of HBG1/2. (ρ = 0.90 P < 0.001, ρ = 0.67 P < 0.05, ρ = − 0.71 P < 0.05 respectively, Fig. 2c). In addition, HBG1/2 expression levels correlated well with the percentage of HbF found in the cultures (see also Fig. 1g). This suggests that the observed variability in HbF levels results from additional variation of KLF1 expression in the haploinsufficiency setting of the HPFH samples. Many genes change expression at the onset of differentiation 33 . Indeed, out of the 344 differentially expressed genes between HPFH and control erythroblasts, 296 were also found to change upon differentiation induction (T0 and T48; FDR < 0.05; Suppl Fig. 3c). Importantly, differential expression between HPFH and control samples was not due to an altered response to induction of differentiation (Fig. 1c, Gene-proximal chromatin accessibility is largely unchanged in KLF1 haploinsufficiency. To address if KLF1 haploinsufficiency controls specific chromatin accessibility, for instance on the deregulated target genes, the HPFH and control cells were subjected to ATAC sequencing. Fragment length in ATAC-seq showed the expected periodicity 29 , with more insert sizes at multiples of around 200 bp, corresponding to the lack of one or more nucleosomes (Suppl Fig. 4a, insert). Peaks were defined as present in HPFH or control samples when detected in three or more replicates of that group. ATAC-seq peaks around 200 bp in length were  www.nature.com/scientificreports/ enriched just before the transcription start sites (TSS; Fig. 3a). To detect differences in the abundance of specific regulatory elements, open regions were categorized as belonging to proximal or distal regulatory elements based on distance to the nearest TSS (ALTRE R-package) 34 . 14,654 putative regulatory elements were identified, with 76% found in both HPFH and control samples. 4% of all ATAC peaks were unique to controls and 20% to HPFH samples (Fig. 3b). Of these HPFH-specific elements the majority were identified at distal locations (> 5000 bp from the TSS). Next, the reads per called peak were quantified and annotated to the nearest TSS within a window of + 5000 bp upstream and − 3000 bp downstream, yielding 41,447 peaks. To determine the degree of variance in chromatin accessibility we performed PCA on these peaks. Samples from the two timepoints cluster along the first component, accounting for ~ 52% of total observed variance (Fig. 3c). This most likely reflects the changes in accessible chromatin regions during differentiation 35 . In the second component, the HPFH samples clustered separately from the controls (Fig. 3c). However, this separation was lost upon restricting the analysis to the 2000 most variable ATAC-seq peaks (Suppl Fig. 4c). This indicates that KLF1 haploinsufficiency only accounts for minor variations in chromatin accessibility in the 2000 most variable regions. At T48, 559 out of 41,447 peaks showed altered accessibility in HPFH samples compared to controls (EdgeR 36 ; FDR < 0.05, Suppl Table 2d); 339 were more and 220 were less accessible (Suppl Fig. 4c). For comparison, the variation in chromatin accessibility observed during erythroid differentiation identified 9338 differential peaks between T0 and T48. The minor differences in chromatin accessibility between HPFH and control samples against the larger scale rearrangements that occurred during differentiation illustrate the limited effect of KLF1 haploinsufficiency on chromatin accessibility in erythroid cells. A notable exception was within the HBB locus with HPFH samples showing increased accessibility of the HBG1 and HBG2 promoters. No differences were observed for the other globin genes, including HBB, or for the distal enhancers of the locus control region (LCR) (Fig. 3d, Suppl Table 2d). Of note, while the HPFH samples showed increased accessibility of the HBG1/2 promoters compared to control samples, a further subdivision of the HPFH samples into HbF levels above or below 28% yielded differential accessibility of 99 peaks between the two groups, but these did not include the HBG1/2 promoters (Fig. 3d, Suppl Table 2e).  Chromatin accessibility at KLF1 target genes. Whereas 96% of the 9136 expressed genes and 90% of differentially expressed genes identified by RNA-seq also contained one or more ATAC peaks (Fig. 4a), only 27 of the 344 differentially expressed genes displayed altered chromatin accessibility (Suppl Fig. 4d, Suppl Table 2a). The majority of differentially expressed genes had more ATAC signal at T48 compared to T0 indicating increased chromatin accessibility upon differentiation, regardless of sample genotype or direction of differential gene expression (Fig. 4b). As discussed above, the HBG1/2 promoters displayed increased accessibility in HPFH samples, which was paired with increased γ-globin RNA expression. In contrast, the HBB promoter, known to bind KLF1 37, 38 , did not show reduced accessibility paired with reduced β-globin RNA expression (Fig. 4c). Similar to HBB, KLF1 target genes that were previously identified by ChIP such as BCL11A and CARM1 showed differential RNA expression, but no differential chromatin accessibility. In contrast, CD44 showed both reduced accessibility and reduced expression (Fig. 4c). However, here differential accessibility was found in the first intron, while KLF1 binding was reported at the TSS 5 . Together this suggests that KLF1 haploinsufficiency is not resulting in an altered chromatin accessibility of its target genes. Rather, the increased fraction of genes with BCL11A and LRF binding sites among the differentially expressed genes suggests that indirect regulation by KLF1 is an important mechanism in those cases where both gene expression and chromatin accessibility were altered.

KLF1
Specific SNPs within the promoter of KLF1 may influence promoter activity. Next, we asked if variation in residual KLF1 expression might be linked to variability in HbF levels between HPFH cultures. In the RNA-seq data we observed an inverse correlation between HBG1/2 and KLF1 expression ( Fig. 2c and Fig. 5a). In contrast, previously identified KLF1 targets HBB, BCL11A, ZBTB7A, PLEK2, and CARM1 displayed a positive correlation with KLF1 expression, while SOX6 expression was constant and MYB expression very variable (Figs. 2c, 5a, Suppl Fig. 5). Within the KLF1 promoter several SNPs that may influence KLF1 expression have been annotated 39 . Sanger sequencing showed that control individual 7 and HPFH individuals 1 and 2 carried the minor allele of a SNP, rs3817621, located at position -251 in the KLF1 promoter (Fig. 5b, Suppl Table 1). The minor allele of SNP rs112943515 further discriminated between HPFH and control samples, as it was linked to the KLF1 p.K288X allele (Fig. 5b, Suppl Table 1). To test whether rs3817621 and rs112943515 might affect promoter activity, luciferase reporter experiments using the KLF1 promoter were performed in K562 cells. Promoters containing the minor alleles of rs3817621 or rs112943515 showed reduced luciferase expression compared to the wildtype KLF1 promoter, with rs3817621 having the strongest effect (Fig. 5c). For both rs1129435  (Table 1). CPM: counts per million mapped reads; enh = erythroid enhancer in intron 2 of the BCL11A gene. mean + /-SD; *P < 0.05, **P < 0.01, ***P < 0.01, Students T-test.

Discussion
Reduced expression of KLF1 decreases expression of the HBG1/2 repressors BCL11A and LRF, thus enhancing HbF expression 2, 4, 5 . There are, however, considerable differences in HbF levels between carriers of the KLF1 p.K288X haploinsufficiency allele, prompting us to find an explanation for this variability. We found that a SNP (rs3817621) in the promoter of the remaining wildtype KLF1 allele, which reduces KLF1 promoter activity in luciferase assays, correlated with high HbF levels in cultured erythroid progenitors derived from Maltese KLF1 p.K288X carriers. The rs3817621variant is present in HPFH individuals 1 and 2, displaying the highest ex vivo HbF levels. However, the in vivo HbF level of HPFH individual 2 is the lowest (1.3%) of the four HPFH individuals included here (Table 1, Suppl Table 1). Thus, our studies do not allow extrapolation to the role of rs3817621 in steady-state erythropoiesis in vivo. We note that erythroid progenitor cultures are thought to represent stress erythropoiesis 32 , which might explain the discrepancy between the in vivo and ex vivo observations. SNP rs3817621 (minor allele frequency 0.234) is not associated with the typical reduction in Lutheran and Indian blood groups or altered globin expression caused by KLF1 variants 9 . Intriguingly, rs3817621 was found to be associated with red blood cell indices in a population study 39,40 , supporting the notion that minor alleles may affect the activity of the KLF1 promoter in vivo. From our experimental data we surmise that subtle variations in expression of the remaining wildtype KLF1 allele affect HbF levels through differential expression of the two major HBG1/2 repressors BCL11A and LRF. This is consistent with previous work establishing that expression of these two repressors is under direct control of KLF1 [3][4][5]10 . We propose that the HbF level is the resultant of the compound effect of the activity of the remaining wildtype KLF1 allele and the p.K288X null variant on the other KLF1 allele. These individuals should therefore be considered as compound heterozygote carriers of KLF1 variants, as their phenotype extends beyond that expected for KLF1 haploinsufficiency. Finally, in agreement with our previous microarray RNA expression data 4 western blot analysis showed reduced expression of KLF1 www.nature.com/scientificreports/ in proerythroblasts derived from individuals carrying the p.K288X KLF1 allele. However, a further differential KLF1 protein expression between the individual HPFH cases could not be demonstrated by western blotting (data not shown). We believe that this reflects a technical limitation of quantitative western blot analysis due to the error margin of this technique, which may therefore fail to reliably detect micro-variation in KLF1 expression.

Increased chromatin accessibility of the HBG1/2 promoters. In erythroid progenitors from KLF1
p.K288X-bearing individuals, the HPFH phenotype is associated with increased chromatin accessibility at the HBG1/2 promoters. Notably, this increased promoter accessibility is an exception, as this is not observed for promoters of the vast majority of the 344 deregulated genes observed between control and HPFH erythroid cells. This overwhelming absence of locally altered chromatin accessibility in HPFH erythroblasts shows there is no major direct impact of reduced KLF1 levels on chromatin accessibility of its target genes. This is supported by cross-referencing the genes with differential expression and differential chromatin accessibility against datasets from published ChIP-seq experiments 2, 5 . These genes showed enrichment for binding sites of BCL11A and LRF.
In the context of HbF regulation, this is consistent with the hypothesis that reduced KLF1 expression results in downregulation of many of its target genes, including genes encoding the HBG1/2 repressors LRF and BCL11A. These repressors directly bind and repress the HBG1/2 promoters in adult erythroid cells 1,3 . Presumably, diminished expression of LRF and BCL11A results in reduced binding to the HBG1/2 promoters. Complementary reduced interaction of the HBB promoter with the LCR, previously described in the context of Klf1 null mice 37 , favors activation of the accessible HBG1/2 promoters. We propose that this dual role of KLF1, along with microvariation in KLF1 expression between HPFH individuals, explains a significant part of the variable penetrance of HbF expression levels associated with KLF1 haploinsufficiency.
KLF1 haploinsufficiency does not affect erythroid chromatin accessibility broadly. In addition to the HBG1/2 promoters there is a limited set of ATAC peaks, consisting of less than 2% of the total, that also show differential accessibility in the HPFH cells. Although most of the genes associated with these regions do not display differential expression, it remains possible that these regions act as distal regulatory elements for more remotely located genes. Since erythroid differentiation is accompanied by broad scale chromatin reorganization 35,41 , the limited set of regions changed during KLF1 haploinsufficiency may potentially interfere with differentiation. However, cell morphology and expression of differentiation markers are largely unperturbed in KLF1 haploinsufficiency indicating that the remaining KLF1 expression is sufficient to keep erythroid chromatin accessibility compatible with terminal erythroid differentiation. Reduced expression of the nuclear exportin XPO7, which in turn affects global chromatin condensation 42,43 , provides an alternative explanation for the increased chromatin accessibility. Similar to the HBB locus, regulation of Xpo7 expression by KLF1 ensues via a chromatin looping mechanism in mice 42 . Possibly, this mode of regulation renders genes particularly susceptible to the effects of reduced KLF1 levels. This point deserves future investigation of long-range interactions using chromosome conformation capture techniques 44,45 .
Phenotypic variability associated with haploinsufficiency for transcription regulators. The notion of variable penetrance resulting from mutated alleles was addressed in a study on the roundworm Caenorhabditis elegans. Expression of individual transcripts was counted in a small three-layer regulatory network 46 .
Mutations to the initiating node of the network caused highly variable expression of the intermediate nodes and resulted in bimodal expression of the most downstream genes. For different mutant alleles the threshold required for the downstream effects shifted, which was proposed as an explanation for incomplete penetrance.
In the context of Maltese HPFH, we propose that micro-variation in expression of the wildtype KLF1 allele alters direct and indirect regulation of globin gene expression from the HBB locus. Thus, the variable penetrance of high HbF expression in KLF1 haploinsufficiency fits very well with the idea that micro-variation in expression of a critical factor can be amplified in a regulatory network. In summary, we propose that micro-variations in KLF1 levels are a major source of variable HbF expression observed in KLF1 haploinsufficiency. Our findings have broader implications for understanding the phenotypic variability associated with mutations in other hematopoietic transcription factors, such as RUNX1, EP300 and GATA2. For example, variations in monoallelic GATA2 expression were recently shown to reduce penetrance in patients with hereditary GATA2-mutated MDS/AML 47 . Similarly, micro-variations in RUNX1 and EP300 expression might help explain the range of phenotypes observed in response to mutations in these factors [48][49][50] . Such cases of differential penetrance illustrate the importance of extended genotypic screening of transcription factor loci, in order to improve prognostic and therapeutic strategies in the event of haploinsufficiency. www.nature.com/scientificreports/ Hemoglobin content and cell morphology. Aliquots (1 × 10 5 cells) were analyzed for hemoglobin content by spectrophotometry as described 51 . Cell morphology was analyzed in cytospins stained with Giemsa and neutral benzidine 52 , using an Olympus BX40 microscope (40 × objective, NA 0.65), and a Leica DM-2500digital camera.

Methods
HPLC. 1 × 10 7 cells were collected and analyzed for hemoglobin expression by high-performance cationexchange liquid chromatography (HPLC) on Waters Alliance 2690 equipment. The column was purchased from PolyLC 53 .
Real-time quantitative PCR. Samples were collected from expanding (T0) and differentiating (T48) erythroid cultures. RNA was isolated using TRIzol RNA isolation reagents (ThermoFisher Scientific) and used for RT-qPCR analysis as described in Supplementary Materials and Methods.