Survey and evaluation of mutations in the human KLF1 transcription unit

Erythroid Krüppel-like Factor (EKLF/KLF1) is an erythroid-enriched transcription factor that plays a global role in all aspects of erythropoiesis, including cell cycle control and differentiation. We queried whether its mutation might play a role in red cell malignancies by genomic sequencing of the KLF1 transcription unit in cell lines, erythroid neoplasms, dysplastic disorders, and leukemia. In addition, we queried published databases from a number of varied sources. In all cases we only found changes in commonly notated SNPs. Our results suggest that if there are mutations in KLF1 associated with erythroid malignancies, they are exceedingly rare.

Erythroid Krüppel-like Factor (EKLF/KLF1) is an erythroid-enriched transcription factor that plays a global role in all aspects of erythropoiesis, including cell cycle control and differentiation. We queried whether its mutation might play a role in red cell malignancies by genomic sequencing of the KLF1 transcription unit in cell lines, erythroid neoplasms, dysplastic disorders, and leukemia. In addition, we queried published databases from a number of varied sources. In all cases we only found changes in commonly notated SNPs. Our results suggest that if there are mutations in KLF1 associated with erythroid malignancies, they are exceedingly rare.
Erythroid Krüppel-like Factor (EKLF/KLF1) is a red cell-enriched, zinc finger DNA binding protein that interacts with its cognate 5′CCMCRCCCN3′ element at target promoters and enhancers 1 . Its roles in ß-like globin gene regulation during terminal erythroid differentiation have been well-established using genetic, biochemical, and molecular approaches 2,3 . Specific functional properties and expression characteristics of EKLF, along with recognition of its surprisingly broad role prior to and during red cell differentiation (reviewed in [4][5][6][7][8], provide the conceptual basis for the present study.
First, single amino acids and their modifications are critically important for EKLF protein-protein interactions and its function as an activator or repressor [9][10][11][12] , raising the possibility that mutations at these sites or within their consensus sequences could have dramatic functional effects on gene expression control.
Second, subtle altering of EKLF cellular levels can change the erythroid cell cycle status from proliferation to differentiation, an effect mediated, at least in part, by its direct activation of the p21, p18, p27, and E2f2 genes [13][14][15][16][17] . Induction of p21 is reminiscent of a similar up-regulation that has been observed with KLF4 18 and with KLF6 19 , known tumor suppressors that function in a p53-independent manner and that are frequently inactivated or downregulated in human cancer 20 .
Third, EKLF mRNA is highly restricted in its expression pattern during development to erythropoietic organs such as the yolk sac, fetal liver, adult bone marrow, and red pulp of the spleen 1,21 . Although most abundant in the erythroid cell, EKLF is also highly expressed in the megakaryocyte/erythroid progenitor (MEP) 22 . Its level is downregulated as MEPs differentiate towards the megakaryocytic lineage yet remains high in the erythroid lineage 22 . By using both gain-and loss-of-function approaches, we 12,22 and others [23][24][25] have found that the expression levels of EKLF impacts the bipotential lineage decisions that are made by the MEP; specifically, EKLF inhibits the formation of megakaryocyte colony and cell numbers while at the same time stimulating erythroid differentiation.
Fourth, there are now links between KLF1 mutation and altered mammalian hematology [26][27][28] . For example, the semi-dominant mouse mutation Nan (neonatal anemia), which presents with hereditary spherocytosis, was mapped to a single amino acid change (E339D) within the second zinc finger of EKLF [29][30][31] . The mutation alters the DNA binding specificity of EKLF such that it no longer binds promoters of a subset of its DNA targets 29 . In addition, recognition of a novel target DNA sequence by Nan-EKLF leads to ectopic expression of genes not normally expressed in the red cell, yielding a neomorphic phenotype with cellular and systemic consequences 32,33 . Excitingly, this murine mutation has converged with human disease, particularly a subtype of congenital dyserythropoietic anemia (CDA) [34][35][36][37][38][39] , where the same amino acid is altered (albeit to another charged residue, lysine) in human EKLF/KLF1. These patients are severely anemic with highly elevated HbF and reticulocyte levels, membrane abnormalities, severe hemolytic anemia, erythroid hyperplasia with dyserythropoiesis, splenomegaly, and growth delay 36 . KLF1/E325K is recognized as a characteristic feature of CDA type IV 40 .
Collectively these functional properties suggest that genetic mutation or altered levels of KLF1 may also be a causative factor for a specific subset of hematopoietic disease. We focused our attention on two types of disorders. Myeloproliferative neoplasms (MPN) are chronic hematological malignancies that yield an excessive proliferation of blood cells with normal differentiation 44,45 . On the other hand, unrestricted proliferation and impaired differentiation are characteristic of acute myeloid leukemia (AML) 46,47 . Our test hypothesis is that dysregulation of KLF1 function may contribute or lead to either of these human malignancies.

Results
Chromosomal associations. 19p13.13, the chromosomal locus of KLF1, has been associated with variation in blood cell traits in meta-analysis studies 48 . Recent studies provide an extensive catalogue of SNP variants of consequence for red cell parameters 49 . Perusal of the data enable us to extract the ones most relevant to KLF1 as summarized in Table 1, suggesting these KLF1 gene variants are significantly associated with altered MCH, MCV, and MCHC red cell indices, and RBC and RET numbers. In combination with studies summarized in the Introduction, we felt justified to address whether mutagenic changes in KLF1 might also be associated with aberrant or malignant red cell parameters.
Genomic analyses of KLF1 in selected populations. The complete human KLF1 transcription unit is only 3.5 kB 50 , enabling us to interrogate its proximal promoter, 5′ UTR, introns, exons, and 3′ UTR by eight overlapping amplimers. The proximal promoter contains highly conserved transcription factor binding sequences that are critical for its expression in erythroid cells [51][52][53][54][55] . To begin our evaluation of human KLF1 genomic status, we focused first on sequencing human leukemia cell lines that retain erythroid and/or megakaryocytic features 56,57 . These lines are derived from CML, AML-M6 or -M7 patients (F36P, HEL, JK1, K562, KMOE2, KU812, LAMA84, OCIM1, TF1), and include those with mixed erythroid/megakaryocytic features (CMK, KG1, Meg01). We conclude from genomic sequence comparison of these lines to the 1000 Genomes project 58,59 that the KLF1 genomic changes observed represent known single nucleotide polymorphisms (SNPs), but no novel mutations ( Table 2).
We then directly assessed primary human DNA samples from a selected cohort of patients. Given that KLF1 levels may be playing a directive role in erythroid/megakaryocyte bipotential decisions, we focused on myeloproliferative neoplasms (MPNs) as cells whose aberrant properties might result from expression of mutated KLF1. MPNs are a heterogeneous group of clonal hematological malignancies that are characterized by hypercellular bone marrow, panmyelosis, and a gradual evolution to myelofibrosis (MF) and acute leukemia 60,61 . These disorders are clonal and yield an excessive proliferation of blood cells that exhibit normal differentiation 44 . We were particularly interested in two subtypes of MPNs: polycythemia vera (PV) and essential thrombocythemia (ET), which represent abnormalities in the proliferation of erythroid and megakaryocytic lineage respectively 45 . We therefore sequenced genomic DNA from a collection of individuals with PV (eighteen), ET (eleven), and MF (five) samples. No novel mutations were identified, only SNPs (Table 2). Although rs115672848 is a rare variant, we found no evidence for its selective enrichment.
Many KLF1 target genes overlap with the expression signature identified in the differential analysis of myelodysplastic syndrome (MDS) patients that vary in their response to lenalidomide treatment 62 . We hypothesized that the mutational status of KLF1 may provide a mechanistic basis to explain the differential expression signature in these patients, and potentially predict whether they will respond to lenalidomide. We analyzed 26 samples each of responders and non-responders. All variants identified in these samples were known SNPs (Table 2), and none partitioned significantly to either of the two differentially responding groups. There were no significant difference in clinical parameters between the most common rs3817621 (p = 0.44) and rs2072597 (p = 0.53) SNPs. In addition, while rs201870270 and rs182276666 are rare variants, they are not selectively enriched. We next examined ten acute megakaryoblastic leukemia (AMKL) samples 63 , including four from patients with Down syndrome (DS) and six from those without DS. AMKL is an aggressive form of leukemia where a majority of the expanding cells are abnormal megakaryoblasts 64 , and whose DNA methylation patterns are distinct between DS and non-DS cells 65 . Again, all samples contain known KLF1 SNPs ( Table 2).
As indicated in Table 2, many of these variants have been noted before 26 . The ones that result in non-synonymous coding changes are predicted not to affect KLF1 function 66,67 (with the possible exception of rs2072596 as highlighted in Table 2). However, it is noteworthy that a geographical analysis 68,69 shows that, unlike ones whose variation is common and universal (e.g., rs2072597), some with an overall low frequency (~5%) are nonetheless highly enriched (~20%) prevalently and exclusively in selected genetic or geographic sub-populations while completely absent in all others (e.g., rs16978754) (Fig. 1).
Alternative analyses. Given our hypothesis, we were surprised by the absence of mutations associated with our target samples/sources. To expand our analysis, we also considered an in silico approach and queried published data from whole genome and exome analyses. These global analyses enable tabulation of altered genes to be accumulated in an unbiased manner. Consistent with our own directed sequencing studies, perusal of recent MPN 70,71 , acute erythroleukemia (250 samples total in two studies 72,73 ), and AML tabulations 74 , along with those derived from The Cancer Genome Atlas (TCGA) data sets 75,76 do not reveal a role for KLF1 in any case in these blood cancers. KLF1 is not one of 142 driver genes identified from an analysis of 1699 pediatric leukemias and tumors 77 . An shRNA screen of AML cells lines also did not implicate a role for KLF1 78 , and KLF1 does not appear in differential analyses related to predicting therapy resistance in AML 79 . Examination of COSMIC data (v83) 80 indicates that, out of 3478 curated hematopoietic and lymphoid samples, only a single coding sequence variant was observed (F27V) in a CLL patient 81 , one predicted to be benign 66,67 .
We finally considered whether variation in levels of KLF1, rather than a mutated form, might be correlated with a particular blood cancer. This hypothesis arises from two bits of data. One, it is known that some genes are uniquely sensitive to haploinsufficient levels KLF1 (reviewed in 4,26 26 , indicating that any amino acid change is likely benign, a conclusion supported by the "Predicted effect" based on other criteria 66,67 . KLF1 transcription initiation is at BP = 12887203 in Ch38 (based on 50,54 ). *rare (<1%); **noted previously in reference 26  various malignancies as judged by tabulation of RNA sequence datasets 82 (Fig. 2a). To further address this idea we queried a series of 200 AML samples to see whether KLF1 levels correlate with a particular AML subtype 83 . We were surprised by two observations (Fig. 2b). First, there is a wide range in KLF1 expression within the AML M6 and M7 categories that generally would be anticipated to express elevated levels of KLF1. Second, other AML subtypes, including M0-M5, contain samples that express similar high levels of KLF1. This was not expected as KLF1 is most highly enriched in erythroid, not myeloid, cells. Clearly, normal cellular and genetic control mechanisms are altered in many of these patient samples, as KLF1 levels do not even correlate with HBG expression (Fig. 2b), a target that is normally repressed by KLF1 and whose levels are indirectly proportional to that of KLF1 42 . We conclude there is no apparent correlation between KLF1 levels and malignancy status across a range of AML malignancy subtypes.

Discussion
Given the molecular and biochemical properties of KLF1, along with its regulation of selected downstream target gene expression, it remains surprising that our study did not uncover any mutations associated with erythroid dyplasia or malignancy. A simple explanation is that an insufficient number of samples have been analyzed. Although this possibility cannot be excluded, we directly analyzed nearly 150 samples, and in addition perused numerous databases without success. In any case, other explanations come to mind. For instance, we have not considered whether the epigenetic status of the KLF1 gene may be playing a role in its regulation that may be of consequence to malignancy. For example, 5mC modification at the KLF1 locus inversely correlates with its expression when compared across a number of cell types 84,85 . In addition, the level and extent of 5hmC modification at the KLF1 locus is also inversely correlated with its expression as the CD34+ cell differentiates to the mature erythroid cell 86 . Intriguingly, the KLF1 gene exhibits a synergistic "type III/cluster 3" pattern of expression control such that KLF1 transcript levels are dramatically increased in Tet2/Dnmt3a double knockout cells 87 . Of relevance to the present discussion, increased KLF1 levels in a subset of AML is only seen in samples from patients with mutations in both genes 87 . The DNA modification locales overlap regions demonstrated to be important for KLF1 expression control 7,54,55 . These studies demonstrate that the DNA modification status of controlling regions in the KLF1 gene is important for establishing its optimal level of expression. However, the causative versus correlative nature of KLF1 gene epigenetic modification and aberrant erythropoiesis will remain challenging to tease out.
One final explanation for our findings is that KLF1 functions during late stages of erythropoiesis, which may circumscribe any search for an effect at early stages 8 . In other words, it is known that terminal maturation of erythroid cells, particularly at the transition from orthochromatic to reticulocyte stage, is completely dependent on KLF1 17 . Part of the explanation for this requirement is KLF1 regulation of cell cycle inhibitors such as p18 and p27 specifically at this late stage. It is notable that these genes are not dependent on KLF1 at an earlier stage, for example in proliferating erythroblasts; indeed, there are no cell cycle differences when comparing such cells from WT vs KLF1-null 17 . Given that the blood cell disorders that we tested exhibit unrestricted proliferation, it remains possible that mutated KLF1 would not have a causative effect on cell cycle in this context in any case.
In spite of these considerations, we are still left with the example of the monoallelic mutation in KLF1 that leads to CDA type IV, with its dominant effect on erythroid cell properties, including proliferation 34,36,39 . Nonetheless, our results suggest that if there are any KLF1 mutants implicated in erythroid malignancy, they are quite rare. 88 ; all other cell lines were purchased from either the ATCC or DSMZ. K562 cells merit additional discussion. There are inconsistencies in the literature as to whether KLF1 is expressed in this cell line, with some studies indicating low to nil 50,88,89 , and others showing detectable levels 90 . There is a large body of work on its use as a cotransfection reporter line whose utility is dependent on lack of KLF1 expression (studies that began with 88 ). This line was established decades ago 91 , and early on was noted to exhibit variability 92,93 . We have noted major differences in transfection efficiency upon comparing our lab K562 stock with the  Table 2 as examples of widely dispersed (rs2072597; >40%) or limited (rs16978754; ~5%) MAFs. Blue pie conveys a given MAF percentage out of 100% across the indicated global populations. rs16978754 is common (~20%) only in the Gambian, Sierra Leone, and Nigerian populations on the African continent, and beyond is commonly detected only in the African Caribbean and African American populations; otherwise it is not detectable. ATCC K562 line (unpublished observations) although we did not find any KLF1 genomic sequence differences. We suggest labs are working with dissimilar isolates, differing in levels of GATA1 and/or KLF1 94 .

K562 cells were from our original lab stock
Patient samples were procured after informed consent and IRB approval within the individual institutions (Mount Sinai School of Medicine, Columbia University Medical Center, New York Blood Center, Northwestern University School of Medicine). All methods were performed in accordance with the relevant institutional guidelines and regulations. Mononuclear cells from MPN patients were provided by the Myeloproliferative Disorders Research Consortium Tissue Bank Core C.
Genomic DNA was isolated from all samples using a Qiagen DNeasy Blood & Tissue Kit. PCR primers spanning the complete KLF1 transcription unit were used to amplify eight overlapping regions across the locus (Supplemental Table 1). These were individually sequenced in both directions (with their corresponding PCR forward and reverse primer) using a 96-well format (Macrogen USA). With regards to comparison of clinical parameters in MDS samples, for continuous variables satisfying the normality assumption, a two-tailed unpaired t test was used. Discovery of any nucleotide change(s) followed alignment to the GRCh38 reference sequence using Vector NTI ContigExpress software (ThermoFisher Scientific). SNPs and any associated parameters were identified from the 1000 Genomes databases 58,59,68 . Other datasets queried included BloodSpot 82 , cBioPortal 95 , GGV 69 , UK10K 96,97 , COSMIC 80 , and NCBI public resources.