Introduction

Hematological traits are essential medical indicators describing blood cells circulating in the human body.1, 2 Blood cells have an integral role in a variety of physiological processes involved in the maintenance of our vital activities. Blood cells are classified into three general categories: (i) white blood cells (WBC, also known as leucocytes), which mediate the immune system; (ii) red blood cells (RBC, also known as erythrocytes), which transport oxygen and respiratory gases and (iii) platelets (PLT, also known as thrombocytes), which form blood clots.1 Homeostasis of hematological traits, including the counts and volume of peripheral blood cells and their biological activities, is tightly regulated within narrow physiological ranges, and abnormalities are closely linked to the development of disease. Therefore, laboratory data of hematological traits are widely used as important diagnostic markers in clinical practice. For example, a rapid increase in the WBC counts along with the development of a fever suggests that the patient has an infectious disease, and an acute decrease in RBC volume implies bleeding. Hematological traits can be useful prognostic parameters for several diseases and the overall health of individuals. PLT volume has been proposed as an independent predictor of morbidity and mortality for cardiovascular diseases (CAD).3 Therefore, the elucidation of the genetic components of an individual’s hematological traits would have a substantial impact.

Hematological traits are highly heritable, and it has been suggested that genetic factors as well as environmental factors, such as age, gender, obesity, smoking and drinking behaviors, contribute significantly to inter-individual variance.4, 5, 6, 7, 8 Studies comparing monozygotic and dizygotic twins have demonstrated that 50–90% of inter-individual variance in WBC, RBC and PLT counts could be attributed to genetic components.4, 5 These heritabilities were estimated only in healthy individuals, but genetic factors would also contribute some of the variance in the abnormal values observed in disease, which may complicate the clinical interpretation of those values. The processes by which blood cells differentiate from hematopoietic stem cells and are maintained in the peripheral blood are partly or fundamentally shared among several lineages of blood cells.9 For this reason, the values of each of the hematological traits are not distributed independently but are significantly correlated.10 Additionally, significant differences in hematological traits exist between ethnic groups. African Americans are known to have lower WBC counts than Europeans,11 and Japanese people have, on average, lower levels of RBC-related traits.12 These observations suggest that there exist both shared and divergent genetic backgrounds for hematological traits and that they might be characterized by ethnicity.2

Recently, genome-wide association studies (GWAS), a new approach in genetic epidemiology that comprehensively evaluates hundreds of thousands or millions of polymorphisms in the genome, have successfully identified causal genetic loci for many diseases.13 Since the initial GWAS was conducted at the RIKEN institute in Japan in 2002,14, 15, 16 >1000 GWAS had been published by mid-2011 for >200 diseases or traits.13 At first, GWAS were primarily conducted for binary phenotypes; that is, case-control association studies.17 Then GWAS for quantitative phenotypes were launched, in which hematological, biological and physical traits were assessed.18, 19 To date, GWAS for a variety of hematological traits have been conducted in several ethnic populations, and >100 genetic loci that affect individuals’ hematological traits have been identified (Table 1).20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43

Table 1 Genetic association studies for hematological traits

In this review, we describe the current approaches being used to elucidate the genetics of hematological traits in humans, which are primarily based on the three major lineages: WBC, RBC and PLT. We also propose additional approaches that are promising as next steps in the post-GWAS era.

Genetic factors for WBC

WBC play a substantial role in the immune system and defend the body against invading foreign microorganisms.1 WBC include a variety of cells with distinct biological roles and are morphologically classified into five major subtypes: neutrophils, lymphocytes, monocytes, basophils and eosinophils. Although further refined classifications exist, especially for lymphocytes,44 the five subtypes mentioned above are widely used in routine clinical screenings.

Neutrophils comprise 50–70% of the total WBC population and have an essential part in the innate immune system as an early defense against microorganisms.45, 46 Inadequate numbers of circulating neutrophils, a hematological disorder called neutropenia, make affected individuals susceptible to bacterial infections. Lymphocytes comprise 20–30% of the total WBC population and have a pivotal role in the adaptive immune system.1 Lymphocytes are further subdivided into several subsets based on their surface molecules, such as T-cells, B-cells and natural killer cells. Monocytes comprise 5–10% of the total WBC population and mediate the phagocytosis of pathogens and antigen presentation.47 Eosinophils and basophils comprise 1–5% and 0.5–1% of the total WBC population, respectively. Both are known to be involved in the regulation of allergic reactions.48, 49 Basophils are the rarest of the major WBC subtypes and their role in host immunity has been less clear, although recent studies have indicated that they provide unique functions, including acting as antigen-presenting cells that promote T helper cell 2-associated allergic responses.48, 50

Associations at the DARC locus

Genetic association studies on WBC-related hematological traits first focused on the ethnic differences in total WBC counts between European and African populations11 that could not be explained by known non-genetic factors (Table 1).7 Admixture and fine mapping scans determined that a common African-derived null variant of rs2814778 at the Duffy antigen receptor for chemokines (DARC) gene at 1q23 was responsible for lower WBC and neutrophil counts in African Americans.21, 25 This null variant explained approximately 20% of the inter-individual variance of total WBC counts in African Americans21 and has been known to confer a selective advantage against malaria.51 A recent GWAS in African Americans identified an additional significant association between WBC counts and the HYDIN locus at 16q22, but this association was considered to be a probe cross-hybridization artifact caused by sequence similarity with the DARC locus that arose through segmental duplication.40

Associations at the PSMD3–CSF3 locus

Since 2009, GWAS for total WBC and WBC subtypes have identified a number of genetic loci (Table 1).26, 28, 32, 33, 34, 35, 38, 39, 40, 41, 42 As expected, both common and unique associations for each WBC subtype were observed (Figure 1a), which reflects their shared and non-shared processes in hematopoiesis. Because neutrophils comprise the majority of WBC, it is not surprising that the associations identified in GWAS for these two phenotypes were mostly shared. This was most evident for the PSMD3–CSF3 locus at 17q21.28, 34, 35 The association at the PSMD3–CSF3 locus would be interesting from a clinical point of view, because CSF3, which is also known as granulocyte colony-stimulating factor (G-CSF), encodes a cytokine that regulates granulocyte production52, 53 and its recombinant protein is used in clinical therapy to treat neutropenia.1 Further analysis assessing relationships between PSMD3–CSF3 variants and clinical outcomes, such as a response to G-CSF therapy and duration of agranulocytosis after chemotherapy, would be desirable.

Figure 1

Shared associations between white blood cells’ (WBC) subtypes. Shared associations of genetic loci with WBC subtypes in a Japanese population.41 (a) A Venn diagram of the shared associations among the WBC subtypes-associated genetic loci. The colors in the Venn diagram (red, orange, purple, green and aqua) correspond to each of the WBC subtypes (neutrophils, lymphocytes, monocytes, eosinophils and basophils, respectively). (b) A scatter plot of the normalized Z values of the eosinophil and basophil counts in Japanese subjects (n=14 654). The center, 50% probability ellipse and 95% probability ellipse of the subjects with the AA (red)/AG (green)/GG (blue) genotypes of rs4328821 at the GATA2 locus are indicated as crosses, solid ellipses and dashed ellipses, respectively. The subjects who were homozygous for the A allele had 1.19-fold and 1.28-fold higher eosinophil and basophil counts, respectively, than the subjects who were homozygous for the G allele. These figures are taken from Okada et al.41 under a Creative Commons Attribution 2.5 Generic License.

Associations at the ITGA4 locus

GWAS for monocytes in European and Japanese populations identified the ITGA4 locus at 2q31.39, 41, 42 ITGA4 encodes the α4 chain of integrin, which mediates the migration of WBC into the tissues.54 The variants of the ITGA4 locus were shown to regulate the expression of ITGA4 messenger RNA in monocytes.42, 55 Recent studies have demonstrated the clinical benefits of inhibition of α4 integrin in treatments for autoimmune diseases.56 Although functional relevance between messenger RNA levels of ITGA4 and their biological effects on immune-related cells needs to be elucidated, the identified variants of the ITGA4 locus could be promising pharmacogenetic targets for anti-α4 integrin therapy.

Shared associations between eosinophils and basophils

Reflecting the similar biological roles of eosinophils and basophils, several overlapping genetic loci were identified for these two subtypes (Figure 1a). In a Japanese population, the GATA2 and ERG loci demonstrated prominent association with both eosinophils and basophils.41 In particular, the index single nucleotide polymorphism (SNP) in the GATA2 locus, rs4328821, explained 2.7% of the inherent correlation between these two phenotypes (Figure 1b).10, 41 Compared with the other WBC subtypes, GWAS for lymphocytes have been less successful.32, 38, 39, 40, 41 This may be because lymphocytes can be further subdivided, and these sub-phenotypes have not been sufficiently examined. A GWAS focusing on lymphocyte subsets by Ferreira et al.33 successfully identified associated genetic loci, and further studies on lymphocyte subsets would be desirable.

Development of trans-ethnic analysis

Notably, genetic association studies for WBC-related hematological traits have been cooperatively conducted to allow for trans-ethnic comparisons of regional associations among different populations. Recently, three large-scale GWAS for WBC and WBC subtypes were concurrently reported for European (by the CHARGE (Cohorts for Heart and Aging Research in Genomic Epidemiology) consortium),39, 57 African American (by the COGENT (Continental Origins and Genetic Epidemiology Network) consortium)40 and Japanese populations (by the BioBank Japan Project).41, 58 Comparative analysis of these populations revealed that WBC have both ethnically shared and divergent genetic backgrounds. The association with the DARC locus was only observed in African Americans,40 whereas the associations with the PSMD3–CSF3 and CXCL2 loci were shared among all three populations.39, 40, 41 The index SNP (rs4328821) in the GATA2 locus was associated with eosinophils and basophils in Japanese and with monocytes and basophils in Europeans.40, 41 This observation suggests that rs4328821 is a candidate causal variant in this locus that may have a pivotal role in the regulation of these WBC subtypes. Allelic differences in the binding affinity of nuclear extracts from human basophilic leukemia cells have been demonstrated for this SNP using an electrophoretic mobility shift assay (personal communication with Drs. T. Hirota and M. Tamari at CGM (Center for Genomic Medicine), RIKEN).

Overlap with disease-associated loci

GWAS have also identified substantial overlap between the loci associated with WBC-related hematological traits and those implicated in susceptibility to several diseases. The PSMD3–CSF3 locus and the major histocompatibility complex class II region have been implicated in rheumatoid arthritis, an autoimmune disease characterized by inflammation of the joints.59, 60, 61, 62 The PSMD3–CSF3 locus is also a well-known risk locus for asthma.63 The ERG locus, one of the loci associated with eosinophils and basophils,41 is located in the Down syndrome critical region on chromosome 21.64 These observations are compatible with our knowledge that WBC are essentially involved in the pathology of multiple inflammatory diseases46, 65 and should encourage further comparative analysis of WBC-related loci with disease susceptibility.

Genetic factors for RBC

RBC are the most abundant blood cells and comprise approximately half of the total blood volume.1 RBC are an essential component of the respiratory system. They carry oxygen to the tissues and carbon dioxide away from them through the circulation. The cytoplasm of RBC is rich in hemoglobin (Hb), an iron-containing molecule that is responsible for the red color of blood. Decreases and increases of RBC are called anemia and polycythemia, respectively, and are associated with multiple comorbid conditions, such as chronic bleeding, inflammation, smoking and renal impairment.12, 66 RBC counts, Hb levels and the proportion of RBC in the total blood volume (hematocrit; Ht) are the most common indicators of RBC used in clinical practice. Additionally, the following three indices are calculated using the RBC counts, Hb and Ht:

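Mean corpuscular volume: $\mathrm{MCV\ (fL)} = \dfrac{\mathrm{Ht\ (\%)}}{\mathrm{RBC}\ (\times 10^{6}\ \mu\mathrm{l}^{-1})} \times 10$;

Mean corpuscular hemoglobin: $\mathrm{MCH\ (pg)} = \dfrac{\mathrm{Hb\ (g\ dl^{-1})}}{\mathrm{RBC}\ (\times 10^{6}\ \mu\mathrm{l}^{-1})} \times 10$;

Mean corpuscular hemoglobin concentration: $\mathrm{MCHC\ (\%)} = \dfrac{\mathrm{Hb\ (g\ dl^{-1})}}{\mathrm{Ht\ (\%)}} \times 100$.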

Although RBC, Hb and Ht are useful indicators of the basal status of anemia and polycythemia, MCV, MCH and MCHC provide additional information regarding the underlying pathologies of these disorders.
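The three index definitions above can be sketched as a short calculation. This is a minimal illustration only: the function name `rbc_indices`, its argument names and the input values are hypothetical and are not taken from any of the cited studies.

```python
def rbc_indices(rbc_million_per_ul, hb_g_per_dl, ht_percent):
    """Return (MCV in fL, MCH in pg, MCHC in %) from RBC count, Hb and Ht."""
    if rbc_million_per_ul <= 0 or ht_percent <= 0:
        raise ValueError("RBC count and Ht must be positive")
    mcv = ht_percent / rbc_million_per_ul * 10    # MCV (fL)
    mch = hb_g_per_dl / rbc_million_per_ul * 10   # MCH (pg)
    mchc = hb_g_per_dl / ht_percent * 100         # MCHC (%)
    return mcv, mch, mchc


# Hypothetical input values, chosen only for illustration
mcv, mch, mchc = rbc_indices(rbc_million_per_ul=5.0, hb_g_per_dl=15.0, ht_percent=45.0)
print(f"MCV={mcv:.1f} fL, MCH={mch:.1f} pg, MCHC={mchc:.1f} %")
```

Because all three indices are ratios of the same three measurements, their values are expected to be correlated with one another by construction.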

Associations at the HBS1L–MYB locus

Patients with the blood disorders sickle cell disease and β-thalassemia have abnormal proportions of Hb subunits, characterized by an increase in fetal Hb.67 Studies investigating this phenomenon have found that the BCL11A and HBS1L–MYB loci contribute to the inter-individual variance of fetal Hb levels.20, 22, 23 The HBS1L–MYB locus is now known as a fundamental regulator of blood cells, and it exhibits significant associations with all hematological traits, including those related to the three major lineages: WBC (WBC and WBC subtypes), RBC (RBC, Hb, Ht, MCV, MCH and MCHC) and PLT.41 Associations in the HBS1L–MYB locus were observed in the intergenic region between HBS1L and MYB. The eQTL (expression quantitative trait locus) analyses showed that variants in the HBS1L–MYB locus act both as cis-eQTL for HBS1L within 6q23 and as trans-eQTL for HBG1 and HBG2 at 11p15 (eQTL Browser; http://eqtl.uchicago.edu/Home.html), whereas Ganesh et al.29 reported relatively higher expression levels of MYB in erythroblasts than other genes in the locus.

Associations at the TMPRSS6 locus

Large-scale GWAS for RBC-related hematological traits were first conducted in European populations,28, 29, 30, 31, 32 and then in a Japanese population.34 These studies identified dozens of novel loci, including the TMPRSS6 locus.30, 31 TMPRSS6 encodes a type II plasma membrane serine protease that has a pivotal role in iron homeostasis by regulating hepatic hepcidin production.68 Concurrently, GWAS have been conducted on indicators of serum iron concentrations, such as serum transferrin, iron and ferritin levels, which are important clinical parameters for anemia.69, 70, 71, 72 These studies also identified the associations with the TMPRSS6 locus.70, 71

Associations at the ABO locus

A GWAS in a Japanese population found that a variant located in the ABO locus was associated with certain RBC-related hematological traits (RBC, Hb, Ht and MCHC).34, 73 ABO encodes glycosyltransferases that transfer specific sugar residues to the surface substance of RBC to produce blood-type-specific antigens. The combination of the sequence polymorphisms at ABO determines the ABO blood type of the individual, the most important classification of blood types used in human blood transfusion.74 The variants identified through the GWAS in the Japanese population were in complete linkage disequilibrium with rs8176743 (nt703),34 one of the major deterministic variants of B antigens.74 When the ABO blood types of 14 097 Japanese subjects were classified based on possession of B alleles, as defined by nt703, the subjects with the B allele (B or AB blood types) had 0.15 mg dl$^{-1}$ higher Hb levels than the subjects without the B allele (A or O blood types; $P = 1.4 \times 10^{-8}$; Table 2).34, 73 This association was independently validated in 8421 Japanese subjects ($P = 3.0 \times 10^{-5}$).73 Nevertheless, the ABO blood type of an individual can be more precisely defined using other polymorphisms in the locus,75 and further validation is required.

Table 2 Mean hemoglobin levels of Japanese individuals based on ABO blood types

Dissection of association patterns among RBC-related traits

Intriguingly, the number of identified loci and the degree of their associations were generally more prominent in the GWAS for MCV, MCH and MCHC than in those for RBC, Hb and Ht.28, 29, 34 One probable explanation for this discrepancy is that the associations with medical indicators of physiological aspects of RBC (MCV, MCH and MCHC) are more robust to confounding factors than those with direct morphological measurements of RBC (RBC, Hb and Ht). The associations observed for MCV, MCH and MCHC substantially overlapped, probably because these trait values are strongly correlated, as all three are derived from RBC counts, Hb and Ht.2 Future studies should further explore which phenotype, or phenotypes, are the most responsible for the associations of each of the loci.

Genetic factors for PLT

PLT are the second most abundant blood cells. They have an essential role in hemostasis, the process that stops bleeding.1 In the event of bleeding, PLT adhere to damaged blood vessels and form a plug through aggregation. An abnormal excess of PLT in the blood may induce the formation of clots that obstruct vessels and can result in vascular events, including myocardial or brain infarction or pulmonary embolism. In contrast, a relative decrease in PLT in the blood may cause bruising or hemorrhage. It is known that PLT release multiple growth factors, such as platelet-derived growth factor, vascular endothelial growth factor and transforming growth factor-β, which have a broad spectrum of impacts in physiology and medicine, including the repair and regeneration of tissues.76, 77 In clinical practice, two major phenotypes are measured: PLT counts and mean PLT volume (MPV). MPV has been suggested as an independent predictor of CAD.3

Associations at the THPO locus

Thrombocythemia is a chronic myeloproliferative syndrome that is caused by the sustained proliferation of megakaryocytes, the bone marrow cells responsible for production of PLT, which results in an increased number of circulating PLT. In 1998, a study examining a Dutch family with hereditary thrombocythemia found that THPO was the responsible gene for the disease.78 THPO encodes thrombopoietin, a glycoprotein hormone that regulates the production of PLT.53 A splice donor mutation in THPO leads to overproduction of thrombopoietin.78 Associations between the THPO locus and PLT counts were subsequently verified by a linkage analysis on an Asian Indian kindred79 and a GWAS in a Japanese cohort.34

Associations at the WDR66 locus

As with other hematological traits, GWAS have successfully identified the genetic loci associated with PLT-related traits.24, 27, 28, 32, 34, 38, 43 The initial GWAS for MPV was conducted in European populations and identified WDR66, ARHGEF3 and TAOK1 as candidate genes responsible for the regulation of PLT.24 Of these, a variant located in the genetic region of WDR66 showed the most significant association, explaining approximately 2.0% of the inter-individual variance in MPV. A recent study reported that another variant of WDR66 was associated with a metabolic trait in human urine (2-hydroxyisobutyrate),80 although its relevance to the regulation of PLT is unclear. Although we do not discuss the results in this review, GWAS for coagulation and fibrinolysis have also been reported.81, 82, 83, 84, 85

Associations at 12q24

Given the close biological relationship between PLT and CAD,3 Soranzo et al.28 evaluated the risks of CAD in the loci associated with PLT and found an overlap between these two traits at 12q24. The variants identified at 12q24 were encompassed by a long-range haplotype that extended >1.5 Mbp, on which significant pressure of natural selection was observed. Interestingly, the association between PLT and CAD and the existence of the long-range haplotype at 12q24 were also observed in East Asian populations.34, 86, 87 These results suggest the existence of variant(s) with strong biological effects on multiple traits at 12q24, which may have influenced natural selection in several populations.

Findings from a large-scale GWAS meta-analysis

Recently, Gieger et al.43 reported a large-scale GWAS meta-analysis by the HaemGen consortium for PLT and MPV. The study incorporated >66 000 individuals of European ancestry and led to the identification of as many as 53 novel associated loci. A detailed comparison of the identified associations with those identified in Asian populations revealed a substantial overlap of the association signals for the two populations. Interestingly, gene silencing of the identified genes using Danio rerio and Drosophila melanogaster demonstrated their functional involvement in hematological phenotypes. This is one of the first successful studies that comprehensively assessed the genetic factors of the traits, including trans-ethnic comparison and functional verification of the identified loci using model organisms. Additional studies of this type with other hematological traits are warranted.

Strategies for the post-GWAS era

As we have discussed in this review, the recent development of the GWAS approach has provided us with fundamental knowledge regarding the genetic factors that influence hematological traits. However, it has been suggested that the majority of genetic factors could not be explained by the loci identified through GWAS because of the stringent significance threshold used.88 In fact, despite the large sample size in the GWAS by the HaemGen consortium, the identified loci explained <10 and 20% of the inter-individual variance in PLT and MPV, respectively.43 Although GWAS with larger numbers of subjects are recommended, alternative strategies would also be necessary to uncover these undetected loci in the post-GWAS era. It should also be noted that the molecular mechanisms through which the identified variants could influence hematological traits are yet to be uncovered. Here, we would like to introduce several promising approaches (Figure 2).

Figure 2

Strategies for detecting genetic factors in the post-genome-wide association studies (GWAS) era. (a) Meta-analysis of GWAS involving multiple populations would lead to the identification of novel loci that were not detected by the studies in single populations. (b) Prioritization of GWAS results using external biological databases, such as eQTL,94 biological pathways95 and the scientific literature.96 (c) Assessment of pleiotropic associations. A variant in the interleukin (IL)6 locus (rs2097677) increased C-reactive protein (CRP), white blood cells (WBC), platelets (PLT) and serum protein, but decreased anemia-related indices.98 (d) GWAS for structural variants, including copy number polymorphisms (CNPs)100 and epigenetic variants.101 (e) Evaluation of gene–gene or gene–environmental interactions. (f) Detection of rare variants using next generation sequencing (NGS). Pictures of NGS machines were obtained from the Togo picture gallery (http://g86.dbcls.jp/~togoriv/). ©2011 DBCLS Licensed under a Creative Commons Attribution 2.1 Japan License. (g) Extraction of shared components from inherently correlated traits. A correlation matrix comparing the covariate adjusted hematological trait values and principal components (PC) 1–8 obtained from a Japanese population (n=13 008) is shown in the left panel. The colors in the matrix correspond to the correlation coefficient, $r$, or coefficient of determination, $R^2$, in the legend. The proportion of variance explained by each PC or its cumulative proportion is shown in the right panel.

Meta-analysis of GWAS involving multiple populations

As shown in the previous trans-ethnic studies for WBC subtypes and PLT,39, 40, 41, 43 the genetic factors that are associated with traits are partially shared by different populations.89 Therefore, meta-analysis of GWAS involving multiple populations would enable the identification of novel associated loci that were not detected in studies of single populations and also contribute to fine mapping of the identified loci (Figure 2a). One issue to be resolved is that the impact of the variant(s) at a locus can differ among populations. Even when a significant association of a variant is observed in multiple populations, the effect sizes and allele frequencies would not be identical between the populations.90 Most of the recent studies have used meta-analysis assuming a fixed-effect model for effect sizes; however, meta-analysis assuming a random-effect model or other sophisticated methods is required for trans-ethnic studies.91, 92
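The contrast between the two meta-analysis models can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any cited study: the fixed-effect estimate uses standard inverse-variance weighting, and the random-effect estimate uses the DerSimonian–Laird estimator of between-study variance, which is only one of several possible choices. The function names and the per-population effect sizes are hypothetical.

```python
import math


def fixed_effect(betas, ses):
    """Inverse-variance weighted fixed-effect estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def random_effect(betas, ses):
    """DerSimonian-Laird random-effect estimate, its standard error and tau^2."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta_fe, _ = fixed_effect(betas, ses)
    # Cochran's Q measures heterogeneity of effect sizes across studies
    q = sum(w * (b - beta_fe) ** 2 for w, b in zip(weights, betas))
    c = total - sum(w ** 2 for w in weights) / total
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (len(betas) - 1)) / c) if c > 0 else 0.0
    adjusted = [1.0 / (se ** 2 + tau2) for se in ses]
    total_adj = sum(adjusted)
    beta = sum(w * b for w, b in zip(adjusted, betas)) / total_adj
    return beta, math.sqrt(1.0 / total_adj), tau2


# Hypothetical effect sizes and standard errors of one variant in three populations
betas = [0.10, 0.20, 0.05]
ses = [0.02, 0.03, 0.02]
beta_fe, se_fe = fixed_effect(betas, ses)
beta_re, se_re, tau2 = random_effect(betas, ses)
print(f"fixed: {beta_fe:.4f} (SE {se_fe:.4f}); random: {beta_re:.4f} (SE {se_re:.4f})")
```

When the effect sizes differ between populations, the estimated between-study variance is positive and the random-effect standard error is larger than the fixed-effect one, which reflects the additional uncertainty introduced by heterogeneity.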

Prioritization of GWAS results using external biological databases

To effectively explore the genetic loci that were not captured by the standard GWAS approach, it would be important to assess the subloci of GWAS results by prioritizing or weighting them using information from external biological databases (Figure 2b).93 We conducted a highly selective replication study by prioritizing the result of a GWAS for systemic lupus erythematosus, an autoimmune disease of B-cell abnormality.94 Using eQTL analysis for the lymphoblastoid B-cell lines, we successfully identified a novel susceptibility locus of AFF1 at 4q21.94 Because genes involved in specific biological pathways are likely to exhibit enrichments of association signals with related traits, prioritization based on pathway databases would also be useful.95 Recently, approaches that aim to understand the biological relatedness of the implicated genes and the traits based on the published scientific literature have yielded substantial achievements.39, 60, 96

Assessment for pleiotropic associations

Pleiotropy is a phenomenon in which a single genetic locus influences multiple traits.88, 97 Because blood cells mediate a variety of biological processes in a coordinated manner, it would be plausible that some of the genetic loci would have pleiotropic effects on multiple hematological traits or on other related traits. A GWAS for C-reactive protein, a biomarker for the inflammatory response, identified pleiotropic associations between the interleukin (IL)-6 locus at 7p15, and multiple hematological and biochemical traits.98 The C-reactive protein-increasing allele of the variant in the interleukin-6 locus also increases WBC, PLT and serum protein levels; however, it decreases RBC-related anemia indices (Figure 2c).

GWAS for structural and epigenetic variants

Although SNP-based GWAS have been successful, there exists a concern that not all of the genetic risks are attributable to SNPs. Alternatively, a GWAS investigating structural variants, such as copy number polymorphisms (CNPs), could be a key to finding these missing pieces (Figure 2d).99 Kumasaka et al.100 conducted a CNP-based GWAS for hematological traits in Japanese subjects. The study suggested that the association of the major histocompatibility complex region with WBC and PLT previously observed in a SNP-based GWAS34 may have reflected an association with CNPs through linkage disequilibrium. Contributions of epigenetic variants to human diseases have also been suggested. Integration of epigenome-wide association studies with GWAS would be an effective method to detect the origins of functional causality in the identified genetic loci.101

Evaluation of gene–gene or gene–environmental interaction

Standard GWAS are performed using the single-locus test that considers the impact of each locus separately. Recently, interest has focused on epistasis that considers interactions between multiple factors, such as gene–gene or gene–environmental interactions (Figure 2e).102 Although there are few reports on gene–gene or gene–environmental interactions in the field of human quantitative traits, these approaches deserve further evaluation.
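The difference between a single-locus test and an interaction model can be illustrated with a small regression sketch. It assumes additive genotype coding (0, 1 or 2 copies of an allele) and fits trait = b0 + b1·g1 + b2·g2 + b3·(g1 × g2) by ordinary least squares, where a non-zero b3 represents a gene–gene interaction. The function names and the noise-free data are hypothetical, and the sketch returns point estimates only; a real analysis would also need standard errors, covariates and correction for multiple testing.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_interaction_model(g1, g2, y):
    """Least-squares estimates [b0, b1, b2, b3] via the normal equations."""
    rows = [[1.0, a, b, a * b] for a, b in zip(g1, g2)]
    p = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical noise-free data covering all nine two-locus genotype combinations
g1 = [0, 0, 0, 1, 1, 1, 2, 2, 2]
g2 = [0, 1, 2, 0, 1, 2, 0, 1, 2]
y = [1.0 + 0.5 * a + 0.2 * b + 0.3 * a * b for a, b in zip(g1, g2)]
coefficients = fit_interaction_model(g1, g2, y)
print([round(c, 3) for c in coefficients])
```

A single-locus test corresponds to dropping the g2 and product columns, so any effect that exists only through the product term would be missed or misattributed.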

Detection of rare variants using next generation sequencing

Most of the GWAS to date have focused on common variants (minor allele frequency above 0.01–0.05), although recent developments in next generation sequencing technologies have enabled the assessment of rare variants (minor allele frequency <0.01; Figure 2f).103, 104 Large-scale databases of rare human variants assembled by the 1000 Genomes Project Consortium are now publicly available,105 and association studies assessing rare variants have been launched.106, 107 Family-based linkage analysis is known to capture rare variants that are not detected by case-control GWAS, and the application of next generation sequencing to linkage analysis has also been reported.108

Extraction of shared components among correlated traits

Finally, we would like to introduce a preliminary approach that would dissect the shared genetic architecture of multiple hematological traits. Because hematological traits are inherently correlated, we performed principal component analysis of the trait values to extract shared components, using Japanese subjects from the BioBank Japan Project (n=13 008; Figure 2g).58 Each PC showed unique patterns of correlation with the original hematological traits, and the top four PC (PC 1–4) explained >90% of variance in the covariate-adjusted trait values. We conducted a GWAS in which each PC was assessed as a dependent variable, and using this approach, we identified novel loci that satisfied the genome-wide significance threshold of $P < 5.0 \times 10^{-8}$ for a certain PC, but did not satisfy the threshold in any of the GWAS for the original hematological traits (Okada Y, unpublished data). Although the biological meaning of each PC would be difficult to interpret, QTL-GWAS for PC could provide additional information for screening the loci implicated in hematological traits. An approach to evaluate the ratios of the trait pairs as secondary traits, in addition to the primary traits themselves, has also been proposed.109
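The idea of extracting a shared component from correlated traits can be sketched in a few lines. This is a toy illustration, not the analysis described above: it standardizes three traits, builds their correlation matrix and finds only the first PC by power iteration, whereas a full analysis would extract all components with a numerical library and use covariate-adjusted values. All function names and trait values are hypothetical.

```python
import math


def standardize(values):
    """Scale a trait to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def correlation_matrix(z_columns):
    """Correlation matrix of already standardized trait columns."""
    n = len(z_columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z_columns]
            for zi in z_columns]


def leading_component(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector by power iteration."""
    k = len(matrix)
    # Asymmetric start vector, to avoid starting orthogonal to the leading eigenvector
    v = [1.0 / (i + 1) for i in range(k)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * matrix[i][j] * v[j] for i in range(k) for j in range(k))
    return eigenvalue, v


# Hypothetical trait values for six individuals (RBC, Hb and Ht)
traits = {
    "RBC": [4.2, 4.5, 4.8, 5.0, 5.3, 5.6],
    "Hb": [12.5, 13.4, 14.1, 15.2, 15.8, 16.9],
    "Ht": [38.0, 40.0, 43.0, 45.0, 47.0, 50.0],
}
z = [standardize(column) for column in traits.values()]
corr = correlation_matrix(z)
eigenvalue, loadings = leading_component(corr)
proportion = eigenvalue / len(z)  # share of total standardized variance in PC1
# PC1 score per individual; this is the value that would serve as the GWAS phenotype
pc1_scores = [sum(l * column[s] for l, column in zip(loadings, z))
              for s in range(len(z[0]))]
print(f"PC1 explains {proportion:.1%} of the variance")
```

Each individual's PC score is a weighted sum of the standardized traits, which is why a locus with modest effects on several correlated traits can show a stronger association with a PC than with any single trait.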

Summary

As we have described in this review, the GWAS approach has successfully elucidated the genetics of hematological traits in humans. Additional studies, including assessments of the proposed strategies, are needed to uncover underlying genetic architectures that have yet to be detected.