Main

All of the cellular components of the blood are derived from lineage-committed and multipotential progenitors found in the bone marrow, which are ultimately produced by self-renewing hematopoietic stem cells (1,2). These cellular components include the red blood cells (RBC) that are responsible for the delivery of oxygen to the tissues of the body, the platelets that have a role in hemostasis, and the white blood cells (principally the neutrophils, monocytes, and lymphocytes) that function in host defense and other physiologic functions (3). An extensive body of literature exists on how hematopoiesis occurs, the various molecules regulating this process, and how this process can be perturbed in disease (1,2,3). While many questions remain to be addressed, such as the exact hierarchical arrangement of cells that produce the blood (4,5), much more is understood about this system compared with most other aspects of cell differentiation observed in physiology. Yet despite our sophisticated understanding of hematopoiesis, the ability to explain a number of clinical observations in patients with blood disorders is often limited. However, this limitation can be viewed as a potentially rich opportunity to gain further insight into hematopoiesis by studying human variation.

While an incredible amount of knowledge on hematopoiesis has been gained from studies on model organisms (3), recent work has shown that there is at least some divergence, both at the level of specific molecules and more globally at a genomic level, when comparing blood cell production in humans with that observed in mice (6,7,8,9,10). For example, mice with mutations in Sec23b, whose orthologue is known to cause congenital dyserythropoietic anemia type II in humans, have completely normal blood counts and ostensibly normal red blood cell production (erythropoiesis) (11,12,13). In addition, it has not been possible to develop faithful models for other congenital forms of anemia, such as Diamond-Blackfan anemia, which is characterized by a paucity of the earliest identifiable erythroid precursors with a reduction in erythroid-committed progenitors as well (14). At the genomic level, gene expression and histone modifications show global divergence at comparable stages of mouse and human erythropoiesis (6,7,8,10). Some of this divergence seems to be mediated by alteration in the binding of key transcription factors between species (10), which is similar to what has been noted in other tissues and cell types (15). Considering the 75 million years of evolutionary divergence between mice and humans, this extent of regulatory rewiring is perhaps not all that surprising and emphasizes an opportunity to use observations in humans to help us better understand the transcriptional and other regulatory programs that may underlie human disease.

Insight From Rare Disorders of Erythropoiesis

The first diseases whose molecular basis was uncovered were those affecting the red blood cell. These included sickle cell disease, which is due to a point mutation in the β-subunit of hemoglobin, and thalassemia, which is due to reduced production of either the α- or β-subunits of hemoglobin. While both sickle cell disease and thalassemia are among the most common genetic diseases worldwide, other rare disorders of erythropoiesis have provided important insight into this process. Traditionally, linkage approaches in families were used to identify mutations that segregate with a particular disease (16). While these approaches led to the identification of hundreds of disease genes, the pace of discovery of the etiologies underlying rare genetic disorders has accelerated recently with the development of modern massively parallel sequencing approaches. The observed acceleration in disease gene discovery stems in part from the fact that large pedigrees are no longer required to map mutations segregating with a disease. Instead, disease gene identification can be accomplished by assessing multiple unrelated individuals who all have a particular disease after removal of variants found in healthy individuals in the general population (17). This has led to the widespread use of approaches that enrich and sequence the 1–2% of the genome containing protein-coding genes, termed the exome, a strategy commonly referred to as whole-exome sequencing (17,18,19). While using such broad-based sequencing approaches can lead to the identification of a number of rare variants, most of which will not be causal for a disease, specific approaches can enrich for true causal alleles, such as by looking at multiple unrelated probands and using segregation of mutations in affected individuals within a family (17). Given our interest in gaining a deeper understanding of hematopoiesis and specifically erythropoiesis, several years ago we began to utilize this approach for disease gene discovery.
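
The filtering strategy described in this section (discard variants seen in the general population, then look for genes hit in multiple unrelated probands) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the cited studies: the gene names, variant labels, frequency table, and the thresholds `max_freq` and `min_probands` are all hypothetical.

```python
from collections import defaultdict


def candidate_genes(proband_variants, population_freq, max_freq=0.001, min_probands=2):
    """Return genes carrying rare variants in at least `min_probands` unrelated probands.

    proband_variants: dict mapping proband ID -> set of (gene, variant) calls.
    population_freq:  dict mapping variant -> allele frequency in healthy controls.
    Both inputs and both thresholds are illustrative assumptions.
    """
    carriers = defaultdict(set)
    for proband, variants in proband_variants.items():
        for gene, variant in variants:
            # Variants absent from the control panel are treated as frequency 0.
            if population_freq.get(variant, 0.0) <= max_freq:
                carriers[gene].add(proband)
    # Keep only genes with rare variants in enough unrelated probands.
    return {gene: sorted(ids) for gene, ids in carriers.items() if len(ids) >= min_probands}


# Hypothetical toy data: three unrelated probands with the same disease.
probands = {
    "P1": {("GENE_A", "A:c.10C>T"), ("GENE_B", "B:c.5G>A")},
    "P2": {("GENE_A", "A:c.220del"), ("GENE_C", "C:c.7A>G")},
    "P3": {("GENE_B", "B:c.5G>A"), ("GENE_C", "C:c.9T>C")},
}
# Hypothetical control frequencies; B:c.5G>A and C:c.7A>G are too common to be causal.
freq = {"B:c.5G>A": 0.12, "C:c.7A>G": 0.04}

result = candidate_genes(probands, freq)
print(result)
```

In this toy example only GENE_A survives, because it carries distinct rare variants in two unrelated probands. In practice, additional filters such as predicted functional effect and segregation within families would typically be layered on top.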

Our efforts initially focused on Diamond-Blackfan anemia (DBA), an anemia characterized by a paucity of erythroid precursors and progenitors in the bone marrow without observable defects in other blood lineages (2). This disease is caused by autosomal dominant mutations in one of 17 different ribosomal protein genes (14). Such mutations account for approximately 60–70% of cases of this disorder. How mutations in ubiquitous ribosomal proteins can lead to such a cell-type–specific defect remained a major mystery. By utilizing whole-exome sequencing, we were able to show that rare cases of DBA can be caused in males by mutations that reduce the production of the key hematopoietic transcription factor GATA1 (20). However, whether such rare mutations were related to the much more commonly observed ribosomal protein mutations remained unclear. Recently, we were able to address this by showing that the ribosomal protein mutations in DBA reduce translation of specific mRNAs, including that encoding GATA1 (21). Thus, downregulation of GATA1 protein provides a common mechanism to explain the pathogenesis of the erythroid defects observed in this disorder. However, additional questions remain, including what other mutations may underlie this disorder and how these may connect to the GATA1 pathway, as well as how a selective defect in translation can arise from ribosomal protein haploinsufficiency. This latter issue may teach us more broadly about blood cell production, as protein synthesis appears to be exquisitely regulated at various stages of this process (22).

While the focus of our ongoing genetic analysis in rare erythroid disorders has been on DBA, we have also been pursuing the genetic cause of other rare forms of anemia that involve impaired erythropoiesis and where no causal mutations have been identified. A group of disorders characterized by disordered and ineffective erythropoiesis has been termed the dyserythropoietic anemias. The majority of cases have known genetic etiologies, including mutations in the CDAN1, SEC23B, KIF23, and KLF1 genes (23). We recently identified a family with a macrocytic dyserythropoietic anemia found in multiple females that could not be explained by mutations in one of these previously characterized genes. Thus, whole-exome sequencing of multiple affected and unaffected family members was performed and revealed an X-linked dominantly acting loss-of-function mutation in the ALAS2 gene (24). This case not only revealed a new, albeit likely rare, etiology of dyserythropoietic anemia, but also revealed the value of whole-exome sequencing to extend the phenotypic spectrum of known genes that can cause anemia: mutations in ALAS2 in males can cause a microcytic sideroblastic anemia, but such mutations have not generally been associated with dyserythropoietic anemia. Functional studies additionally showed that this mutation likely causes a severe impairment in the differentiation of the erythroid precursors in which the mutant allele is expressed, and that the cells expressing the wild-type allele then undergo dyserythropoietic changes in the drive to compensate, providing broader mechanistic insight into normal and disordered erythropoiesis.

Use of whole-exome sequencing has significantly accelerated the pace of disease gene discovery, yet many mutations may lie in noncoding regions of the genome and may explain some of the cases where mutations have not been identified to date. In the case of erythroid disorders, previous studies have identified GATA1 binding site mutations located in noncoding regions of a number of genes implicated in human erythroid disorders: ALAS2 in X-linked sideroblastic anemia (25,26), PKLR in pyruvate kinase deficiency (27), and UROS in congenital erythropoietic porphyria (28). These noncoding variants provide a unique opportunity to examine the necessity of these elements using modern genome editing tools, such as CRISPR/Cas9 (29). It is likely that such genome editing approaches, coupled with human genetic analysis of whole-genome sequencing data, will help to identify additional noncoding variants implicated in human disease. We have started to examine how we can systematically approach this challenging problem in human genetics.

Fetal Hemoglobin Regulation and Variation in the Hemoglobin Disorders

As discussed above, the most common human monogenic disorders are those affecting the hemoglobin molecule: sickle cell disease and β-thalassemia. While these disorders have been considered to be simple monogenic disorders with recessive inheritance, clinical observations have highlighted the substantial phenotypic diversity in these cases (30). Indeed, there are some individuals harboring disease-causing mutations who are entirely asymptomatic. In both sickle cell disease and β-thalassemia, elevated levels of fetal hemoglobin (HbF), a form of hemoglobin predominantly expressed throughout gestation that is normally silenced shortly after birth, have been shown to ameliorate clinical symptoms in a quantitative manner. To address this heterogeneity in HbF expression and clinical severity in hemoglobin disorders, we and others utilized genome-wide association studies (GWAS) several years ago in both nonanemic and sickle cell disease populations to identify common genetic variants associated with HbF levels (30,31,32,33). GWAS allow one to assess whether any one of thousands to millions of common genetic variants is associated with a particular trait or disease of interest (34). The GWAS revealed three loci associated with HbF levels: variants within the β-globin gene cluster itself, in an intergenic region on chromosome 6 between the genes HBS1L and MYB, and in an intron of the gene BCL11A.
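
To make the idea of a single-variant association test concrete, the following Python sketch regresses a quantitative trait on allele dosage (0, 1, or 2 copies of an allele) for each variant and reports the slope and its t statistic. It is a simplified illustration, not the method of the cited studies: the variant names, genotypes, and trait values are invented, and real GWAS additionally adjust for covariates such as ancestry and apply a stringent genome-wide significance threshold to account for the very large number of tests.

```python
import math


def association(genotypes, trait):
    """Least-squares slope and t statistic for a trait regressed on allele dosage.

    genotypes: list of allele counts (0, 1, or 2), one per individual.
    trait:     list of quantitative trait values in the same order.
    Assumes the genotypes are not all identical and the fit is not perfect.
    """
    n = len(trait)
    mean_g = sum(genotypes) / n
    mean_t = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (t - mean_t) for g, t in zip(genotypes, trait))
    beta = sxy / sxx  # estimated trait change per additional allele copy
    residual_ss = sum(
        (t - mean_t - beta * (g - mean_g)) ** 2 for g, t in zip(genotypes, trait)
    )
    se = math.sqrt(residual_ss / (n - 2) / sxx)  # standard error of the slope
    return beta, beta / se


# Hypothetical trait values (for example, a measured HbF level) in eight individuals.
trait = [0.4, 0.6, 0.5, 1.1, 0.9, 1.2, 1.8, 1.6]
# Hypothetical genotypes at two variants for the same eight individuals.
variants = {
    "variant_1": [0, 0, 0, 1, 1, 1, 2, 2],  # dosage tracks the trait
    "variant_2": [0, 1, 2, 0, 1, 2, 0, 1],  # dosage unrelated to the trait
}

results = {name: association(g, trait) for name, g in variants.items()}
for name, (beta, t_stat) in results.items():
    print(f"{name}: beta={beta:.3f}, t={t_stat:.2f}")
```

In this toy data set, variant_1 shows a clear positive per-allele effect while variant_2 does not, which mirrors how a locus stands out from the background of unassociated variants in a genome-wide scan.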

BCL11A was particularly attractive to study, as it was a zinc-finger transcription factor that was well studied for its role in B lymphocyte production, yet a role in erythropoiesis had not been previously recognized. We were able to demonstrate that BCL11A is a potent silencer of HbF in human adult erythroid cells and has a key role in developmental hemoglobin switching (35,36). BCL11A appears to associate with key co-repressor complexes and mediate chromatin looping (37,38), although the exact mechanisms by which it is able to silence the HbF genes remain unknown. Recent work has also identified erythroid-specific enhancers of this gene that may be affected by common genetic variation (39).

However, two major questions have remained: to what extent BCL11A may silence HbF in vivo, and what other developmental functions it may have in humans. These questions are not only of academic interest; if targeting of BCL11A is considered as a therapeutic approach to treat the hemoglobin disorders, this knowledge could serve a key function. We have recently addressed this by studying three patients with rare deletions causing haploinsufficiency of BCL11A (40). All of the patients had substantial persistence of HbF, ranging from 15 to 30%, in the setting of BCL11A mRNA haploinsufficiency, without observable abnormalities in either other blood or immune cell phenotypes. Similar findings have also been reported in subsequent follow-up studies (41). Importantly, we could demonstrate using a variety of orthogonal genetic datasets that BCL11A had a key role in human neurodevelopment, suggesting the importance of careful phenotypic assessment with any potential BCL11A-targeting therapies that may be developed in the future (40). Studies of such patients with elevated levels of HbF provide an opportunity to gain important mechanistic insight into HbF regulation. Indeed, our own studies have been able to reveal a role for the transcription factor MYB and identify specific cis-elements necessary for HbF silencing by studying rare patients (42,43).

Common Variation in RBC Traits and Erythropoiesis

While much of our discussion has focused on identifying the genetic causes for various forms of anemia and the factors that can ameliorate these disorders, a fruitful opportunity exists to understand hematopoiesis better by studying common variation in blood cell traits (44) (Figure 1). In fact, the identification of BCL11A as a key factor in HbF regulation would not have been possible without such studies that utilized GWAS (35). We have gone on to use insight from common variation in RBC traits to identify new regulators of erythropoiesis. Our focus has been on using insight from GWAS to identify regulatory elements and the genes they affect to gain new insight into hematopoiesis. For example, we identified variants altering the levels of cyclins D3 and A2, which have key roles in regulating the cell divisions that occur in the last stages of erythropoiesis before the cells prepare to enucleate and form mature RBCs (45,46). It should be noted that over 75 loci have been identified as associated with RBC and other blood cell traits (47), suggesting that there may be more opportunities to gain important new insight into regulators of erythropoiesis and hematopoiesis through more systematic studies. In addition, rare or low frequency genetic variation may underlie some of the common variation observed in blood cell traits and important insight into new regulators of hematopoiesis may be gained through studies of this kind of variation (Figure 1), particularly as large-scale genome or exome sequencing is performed in cohorts with blood traits ascertained.

Figure 1

The genetic landscape of human hematopoiesis. Here, we depict a sample normal distribution, as is observed for blood cell counts or other related traits (such as HbF). We illustrate how variation in such counts or traits can arise from common genetic variation (depicted by individuals shown in blue) and how more extreme phenotypes, such as those observed with anemia or other cytopenias, can be due to rare mutations (depicted by individuals shown in orange).


Conclusions and Future Directions

In this short review, we have discussed how, although our sophisticated understanding of hematopoiesis has largely been derived from model systems, studies of human variation in blood production provide an important opportunity to gain additional insight into this process. We have discussed several vignettes that illustrate the type of insight that can be gained from such studies. As further technologic advances are made in human genetics, additional opportunities to systematically and comprehensively assess these forms of genetic variation will arise.

There are several opportunities that we envision for the coming years. While rare cases of anemia, such as DBA, have provided us with insight into other genetic etiologies, a significant fraction of the underlying causes remain to be identified. Future work will help to identify such causes using further analysis of whole-exome and genome sequencing data. Moreover, in many conditions, noncoding variation may underlie the etiology in some fraction of cases and use of sequencing tools coupled with modern genome editing will allow for such variation to be interrogated. The regulation of processes such as HbF silencing remains poorly understood and it is likely that the study of rare patients with significant elevations of HbF may provide us with additional mechanistic insight into this process. Indeed, even for well-studied factors like BCL11A, the underlying mechanisms of action remain poorly understood.

In addition to studies of human disease, examining natural variation in blood cell traits that does not meet the criteria for anemia or other cytopenias still provides an opportunity to gain important insight into hematopoiesis (Figure 1). Indeed, we have used common variation from GWAS to identify new regulators of erythropoiesis. However, these are only a couple of examples and the majority of loci remain unstudied. Systematically dissecting these loci will likely be fruitful in the coming years. Not only can basic insight into hematopoiesis be gained from such studies, but these findings can provide insight into blood disorders. Common variation in RBC traits appears to be associated with variation in the clinical course of specific forms of anemia, such as β-thalassemia (47). There is likely a much greater contribution of such common variation to disease states than we currently appreciate.

Finally, and perhaps most importantly, such studies can lead to promising therapeutic approaches for blood disorders. BCL11A is a promising target for HbF induction and significant efforts using genome editing, gene therapy, or small molecule approaches are currently in development (2). Since GATA1 appears to underlie a common pathway in DBA pathogenesis, development of gene therapy approaches using GATA1 vectors may represent a fruitful approach to treat the anemia in cases of DBA (21). In addition, insight from common variation in blood cell traits could provide an opportunity to improve upon autologous blood cell therapies, such as the use of ex vivo produced RBCs as an alternative to donor-derived blood units (48). Clearly, there is a lot that can and hopefully will be learned about blood cell production from studies of natural human variation in this process.

Statement of Financial Support

The authors received support from the March of Dimes Basil O’Connor Scholar Award (White Plains, NY), the Diamond-Blackfan Anemia Foundation (West Seneca, NY), and the National Institutes of Health (Bethesda, MD) (grants R01 DK103794, R21 HL120791, and U01 HL117720).

Disclosure

No relevant conflicts of interest exist.