Introduction

Hair morphology is one of the most divergent traits among human populations. Africans and Melanesians have twisted hair, and Asians have thicker hair than people of the other continents.1 To discover genes involved in human hair morphology, 21 candidate genes were previously selected based on their functions and genetic differentiation between populations. Among these candidate genes, a nonsynonymous single nucleotide polymorphism (SNP) in ectodysplasin A receptor (EDAR), EDAR 1540T/C (rs3827760), which is highly differentiated between Asian (HapMap-CHB+JPT) and other populations (YRI and CEU), was found to be strongly associated with hair fiber thickness.2, 3 In addition, the chromosomal region of the EDAR gene showed a strong signature of recent positive selection in East Asian populations.2, 4, 5, 6, 7, 8, 9 These results led to the conclusion that EDAR is a major genetic determinant of Asian hair thickness and is likely to play an important role in adaptation to the local environments of East Asia.

As the variation in hair thickness cannot be explained solely by EDAR 1540T/C,3 there might be other genetic variants associated with hair thickness. In the previous studies, however, EDAR was the only candidate gene evaluated. In addition, genes associated with the shape of the cross-section, that is, with hair index, have not yet been identified. To further identify genes associated with hair morphology, this study examined the possible association between hair morphology and 10 candidate genes including ectodysplasin A (EDA).

Materials and methods

Subjects

As described in the previous studies,2, 3 DNA samples and hair morphological data were obtained from 189 Japanese (JPN), 121 Indonesian (IDN) and 65 Thai-Mai (THM) individuals. Hair cross-sectional area, small diameter, large diameter and hair index (that is, the ratio of small diameter to large diameter) were used as measures of the thickness and shape of hair fibers.

SNP selection and genotyping

In the previous study, 21 candidate genes were shown to be highly differentiated.2 Of these genes, we selected for the present association study LEF1, MSX2, DLL1, EGFR, CUTL1, NOTCH1, FGFR2, KRT6IRS, GPC5, AKT1, MYO5A, TGM3, EDA2R and EDA, which are highly differentiated between HapMap-CHB+JPT and other HapMap populations (YRI and CEU). For ectodysplasin A2 receptor (EDA2R), a nonsynonymous SNP (rs1385699) was genotyped, because rs1385699 was recently reported to be involved in androgenetic alopecia.10, 11 In total, 17 SNPs of the 14 candidate genes listed in Table 1 were genotyped using either the DigiTag2 method12 or PCR-direct sequencing.

Table 1 Candidate SNPs with high population differentiation for the association study

Association of candidate SNPs with hair morphology

The allele frequencies were estimated by gene counting. SNPs with a minor allele frequency of less than 0.1 in all the studied populations (that is, JPN, IDN and THM) were excluded from the subsequent association analyses. As the previous study did not show a significant difference in hair morphology between IDN and THM,2 these populations were merged and designated as the Southeast Asian population (SEA) in the following multiple regression analysis. Associations between the candidate SNPs and hair morphology (cross-sectional area, small diameter, large diameter or hair index) were evaluated by a multiple regression analysis with the number of copies (that is, 0, 1 or 2) of the allele that is major in the CHB+JPT population for each SNP, age, sex and population (SEA or JPN) as independent variables. As the hair of Asians has a larger cross-sectional area, larger small diameter, larger large diameter and a higher hair index than that of African and European populations, alleles associated with an increase in hair thickness and/or hair index were assumed to have higher frequencies in the CHB+JPT population than in the YRI and CEU populations. Therefore, the P-values were calculated by a one-sided test. A multiple regression analysis that included EDAR 1540T/C in addition to the other factors was also performed. To compare the goodness of fit between a statistical model including EDAR 1540T/C and one not including it, we used Akaike's information criterion (AIC) as guidance. For testing the association of X-linked genes, the multiple regression analyses were performed for male and female participants separately.
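The regression scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code: the data are simulated, the helper names (ols, one_sided_p, gaussian_aic) and all effect sizes are hypothetical, and the one-sided P-value uses a normal approximation to the t distribution, which is an assumption made here to keep the example dependency-free.

```python
import math
import random
from statistics import NormalDist


def ols(X, y):
    """Least-squares fit via the normal equations (X'X) b = X'y.

    Returns (coefficients, standard errors, residual sum of squares).
    """
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination on [X'X | I] yields the inverse of X'X.
    aug = [xtx[i] + [float(i == j) for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    sigma2 = rss / (n - k)
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    return beta, se, rss


def one_sided_p(beta, se):
    """P-value for H1: beta > 0 (normal approximation, an assumption of this sketch)."""
    return 1.0 - NormalDist().cdf(beta / se)


def gaussian_aic(rss, n, k):
    """AIC of a Gaussian linear model with k coefficients plus the error variance."""
    return n * (math.log(2 * math.pi * rss / n) + 1) + 2 * (k + 1)


# Simulated subjects; every number below is made up for illustration.
random.seed(0)
n = 300
rows = []
for _ in range(n):
    snp = sum(random.random() < 0.6 for _ in range(2))   # copies of the tested allele
    edar = sum(random.random() < 0.7 for _ in range(2))  # copies of EDAR 1540C
    age = random.uniform(20, 60)
    sex = random.randint(0, 1)
    pop = random.randint(0, 1)                           # 0 = SEA, 1 = JPN
    area = 4000 + 200 * snp + 600 * edar + 300 * pop + random.gauss(0, 100)
    rows.append((snp, edar, age, sex, pop, area))

y = [r[5] for r in rows]
X_without = [[1.0, r[0], r[2], r[3], r[4]] for r in rows]      # SNP, age, sex, population
X_with = [[1.0, r[0], r[1], r[2], r[3], r[4]] for r in rows]   # same, plus EDAR 1540T/C

beta0, se0, rss0 = ols(X_without, y)
beta1, se1, rss1 = ols(X_with, y)
p_snp = one_sided_p(beta1[1], se1[1])
aic_without = gaussian_aic(rss0, n, len(beta0))
aic_with = gaussian_aic(rss1, n, len(beta1))
print(f"SNP per-copy effect {beta1[1]:.1f}, one-sided P {p_snp:.3g}")
print(f"AIC with EDAR {aic_with:.1f} vs without {aic_without:.1f}")
```

The model with the lower AIC is preferred. In a real analysis a statistics package with exact t-based P-values would normally replace the hand-rolled solver.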

FST values and structure of linkage disequilibrium in FGFR2

FST values in the FGFR2 region were calculated based on the HapMap data. The structures of linkage disequilibrium (LD) in a 200 kb genomic region containing FGFR2 in the CHB+JPT population were assessed using pairwise D′ and r2 values between SNPs with minor allele frequency of more than 0.05.
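For orientation, the quantities named above can be computed as in the sketch below. The text does not state which FST estimator was used, so the simple heterozygosity-based form with two equally weighted populations is an assumption. The LD function assumes phased two-SNP haplotypes are already available, whereas dedicated LD software typically estimates haplotype frequencies from genotypes. Function names and example data are hypothetical.

```python
def fst_two_pops(p1, p2):
    """Wright's FST for one biallelic SNP, two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # SNP is monomorphic in the pooled sample
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def ld_stats(haplotypes):
    """Return (|D'|, r2) from phased haplotypes given as (a, b) pairs of 0/1 alleles."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    # D is scaled by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


# Made-up example: allele 1 at the first SNP occurs only with allele 1 at the second.
haps = [(1, 1)] * 2 + [(0, 1)] * 3 + [(0, 0)] * 5
d_prime, r2 = ld_stats(haps)
fst = fst_two_pops(0.9, 0.1)
print(f"D' = {d_prime:.2f}, r2 = {r2:.2f}, FST = {fst:.2f}")
```

The example shows why both measures are reported: D′ reaches 1 whenever one of the four possible haplotypes is absent, while r2 stays well below 1 unless the two SNPs also have matching allele frequencies.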

Association of FGFR2 genotype with mRNA expression level

The association between an SNP of fibroblast growth factor receptor 2 (FGFR2), rs4752566, and the mRNA expression level was evaluated using a simple regression analysis with the number of rs4752566-T alleles as the independent variable. Normalized mRNA data from Epstein–Barr virus-transformed lymphoblastoid cell lines derived from 44 JPT and 45 CHB HapMap subjects were obtained from the database of the Gene Expression Variation (GENEVAR) project (http://www.sanger.ac.uk/humgen/genevar/).13 The mRNA expression data detected by the GI_13186256-I probe for FGFR2 were used in this study.
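A simple regression of expression on allele count can be sketched as below. The genotype strings and expression values are invented for illustration and are not GENEVAR data; the function names are hypothetical.

```python
import math


def t_allele_count(genotype):
    """Number of T alleles in an unphased genotype string such as 'GT'."""
    return genotype.count("T")


def simple_regression(x, y):
    """Return (slope, intercept, t statistic of the slope) for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope, intercept, slope / se


genotypes = ["GG", "GG", "GT", "GT", "TT", "TT"]
expression = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2]  # made-up normalized values
counts = [t_allele_count(g) for g in genotypes]
slope, intercept, t_stat = simple_regression(counts, expression)
print(f"slope {slope:.2f} per T allele, t = {t_stat:.2f} on {len(counts) - 2} df")
```

The P-value would then be read from a t distribution with n − 2 degrees of freedom.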

Results

The allele frequencies of 17 SNPs listed in Table 1 were investigated in JPN, IDN and THM. Five SNPs including rs1385699 of EDA2R showed minor allele frequency of less than 0.1 in all the studied populations (data not shown). Such SNPs with low minor allele frequency were excluded from the subsequent association analyses because the association test was not expected to attain enough statistical power.
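The filter applied here can be illustrated with a short sketch. Genotypes are coded as the number of copies (0, 1 or 2) of one allele, which holds for autosomal SNPs only; X-linked SNPs in males would need separate handling. SNP names and genotype lists are hypothetical.

```python
def allele_frequency(genotypes):
    """Gene counting: each individual contributes two alleles (coded 0, 1 or 2 copies)."""
    return sum(genotypes) / (2 * len(genotypes))


def minor_allele_frequency(genotypes):
    freq = allele_frequency(genotypes)
    return min(freq, 1 - freq)


def passes_maf_filter(genotypes_by_pop, threshold=0.1):
    """Keep a SNP unless its MAF is below the threshold in every population."""
    return any(minor_allele_frequency(g) >= threshold
               for g in genotypes_by_pop.values())


snps = {
    "snp_common": {"JPN": [0, 1, 1, 2], "IDN": [0, 0, 1, 1], "THM": [1, 1, 2, 2]},
    "snp_rare": {"JPN": [0] * 9 + [1], "IDN": [0] * 10, "THM": [2] * 10},
    "snp_one_pop": {"JPN": [0] * 10, "IDN": [0] * 7 + [1] * 3, "THM": [0] * 10},
}
kept = [name for name, g in snps.items() if passes_maf_filter(g)]
print(kept)
```

A SNP that is common in only one population (snp_one_pop) is retained, because the exclusion rule requires a low frequency in all populations.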

The association P-values for the remaining 12 SNPs on 10 genes are summarized in Table 2. Of these, a G/T SNP (rs4752566) in the 9th intron of FGFR2 showed the strongest associations with cross-sectional area (P-value=0.0330 and P-value=0.0052 when considering the effect of EDAR 1540T/C; Table 2), small diameter (P-value=0.116 and P-value=0.029 when considering the effect of EDAR 1540T/C; Table 2) and large diameter (P-value=0.0074 and P-value=0.0015 when considering the effect of EDAR 1540T/C; Table 2). These significant associations were observed in IDN, but not in the other two populations, when the test was performed in each population (data not shown). SNPs on the TGM3 and EDA genes were weakly associated with large diameter (Table 2). However, no associations were detected for the other genes.

Table 2 Results of the association analysis between SNP genotypes and hair morphology

The estimated per-copy effects of the rs4752566-T allele (that is, the regression coefficients of rs4752566) on cross-sectional area, small diameter and large diameter were 206.3 μm2, 2.8 μm and 2.8 μm, respectively. When considering the effect of EDAR 1540T/C, the per-copy effects of the rs4752566-T allele on cross-sectional area, small diameter and large diameter were 274.3 μm2, 1.4 μm and 3.4 μm, respectively. The per-copy effects of rs4752566-T were smaller than those of EDAR 1540C (578.4 μm2 for cross-sectional area, 4.2 μm for small diameter and 4.5 μm for large diameter). A power calculation revealed that high statistical power (>0.8) is difficult to achieve in this study if the per-copy effect of an allele is half that of EDAR 1540C (data not shown). Thus, the possibility of false negatives for SNPs with small effects cannot be excluded.

For the associations of rs4752566 with cross-sectional area, small diameter and large diameter, model evaluation using AIC suggested that a regression model including EDAR 1540T/C as one of the independent variables fits the data better than one not including EDAR 1540T/C (AIC=5195.21 vs 5233 for cross-sectional area; AIC=1536.29 vs 1576.2 for small diameter; AIC=1838.34 vs 1858 for large diameter). As the association of rs4752566 with large diameter remained significant after stringent multiple testing correction (Bonferroni-adjusted P-value=0.018 when considering the effect of EDAR 1540T/C) and the association of rs4752566 with cross-sectional area was marginal (Bonferroni-adjusted P-value=0.062 when considering the effect of EDAR 1540T/C), we focused on rs4752566 in the following analyses.
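The adjusted values above are consistent with a Bonferroni factor of 12, the number of SNPs tested. That factor is inferred here from the reported numbers rather than stated in the text, so the short check below should be read under that assumption.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni adjustment: multiply by the number of tests and cap at 1."""
    return min(1.0, p_value * n_tests)


N_SNPS = 12  # assumption: correction across the 12 SNPs that passed the MAF filter

adj_large_diameter = bonferroni(0.0015, N_SNPS)
adj_cross_section = bonferroni(0.0052, N_SNPS)
print(round(adj_large_diameter, 3), round(adj_cross_section, 3))  # 0.018 0.062
```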

The rs4752566-T allele, which is the derived allele and the major allele in the HapMap-CHB+JPT population, was associated with an increase in the cross-sectional area. A multiple regression analysis also revealed a significant effect of population (SEA or JPN) on the cross-sectional area (P-value<0.0001) even when considering the effects of EDAR 1540T/C and rs4752566. This implies the presence of other genes that account for the variation in the cross-sectional area in Asians, although unknown environmental factors that differ between Southeast Asia and East Asia may also have an important role in hair formation.

To examine whether FGFR2 has a number of SNPs highly differentiated between CHB+JPT and YRI, FST values of the SNPs on FGFR2 were calculated (Figure 1a). No SNP other than rs4752566 had an FST value at or above the 95th percentile in the FGFR2 region. To evaluate the extent of LD from rs4752566, the structure of LD in a 200 kb genomic region centered on rs4752566 was investigated in the HapMap-CHB+JPT population (Figures 1b and c). In this region, which contains only the FGFR2 gene, rs4752566 was not in strong LD with the surrounding SNPs, suggesting that the significant association of rs4752566 with hair thickness is not due to LD with polymorphisms of other genes. Therefore, a polymorphism associated with hair thickness appears to be located within the FGFR2 region, although rs4752566 may not be the causative SNP.

Figure 1

FST, LD and expression of FGFR2. (a) Structure of the FGFR2 gene and FST values in the FGFR2 gene. (b) Pairwise LD measured with D′ between SNPs in a 200 kb genomic region including the FGFR2 gene. Bright red squares indicate high D′ values (D′=1) and high LOD scores (LOD>=2), and light blue squares indicate high D′ values (D′=1) and low LOD scores (LOD<2). For other cases, the D′ value is shown in each square. (c) Pairwise LD measured with r2 between SNPs in a 200 kb genomic region including the FGFR2 gene. Black squares indicate high r2 values (r2=1), gray squares indicate intermediate r2 values (0<r2<1) and white squares indicate no LD (r2=0). (d) The association between rs4752566 genotypes and the expression level of FGFR2 in the HapMap-CHB+JPT population. The number of samples for each genotype is presented in parentheses.

The association between the genotypes of rs4752566 and the mRNA expression level of FGFR2 was evaluated to explore the functional role of rs4752566-T, which was associated with thicker hair. The number of rs4752566-T alleles was significantly correlated with the level of expression (P-value=0.027) in EBV-transformed lymphoblastoid cell lines from the CHB+JPT individuals (Figure 1d). As no significant difference in the mRNA expression level was observed between the GT and TT genotypes, the influence of rs4752566 on the FGFR2 expression level seems to be modest, and the apparent correlation may reflect the small sample size of the GG genotype. The present results therefore need to be confirmed in larger samples.

Discussion

In this study, an SNP in intron 9 of FGFR2, rs4752566, was found to be significantly associated with hair thickness after multiple testing correction, and to be significantly correlated with the mRNA expression level of FGFR2. FGFR2 is known to have an essential role in the proliferation of epidermal cells;14, 15 thus, rs4752566 itself or a polymorphism in LD with rs4752566 could influence hair thickness through alteration of the expression level of FGFR2. As no significant correlations between rs4752566 and FGFR2 expression were observed in EBV-transformed lymphoblastoid cell lines from the CEU and YRI populations (data not shown), rs4752566 may not be a causative SNP and there might be an Asian-specific causative polymorphism near rs4752566. As the association of rs4752566 with hair thickness was much weaker than that of EDAR 1540T/C, a replication study is necessary to confirm the association of FGFR2 with hair thickness.

The candidate genes included two EDAR-related genes, EDA and EDA2R; two different isoforms of EDA, EDA-A1 and EDA-A2, specifically bind to EDAR and EDA2R, respectively.16 Both genes also harbored SNPs with high population differentiation.2, 7, 10 In particular, rs1385699 in EDA2R is a nonsynonymous SNP with high population differentiation: the derived allele was not observed in YRI, whereas it reached population frequencies of 79.8% and 100.0% in CEU and CHB+JPT, respectively.7 Although rs1385699 in EDA2R is a promising candidate for a determinant of hair morphology, the association could not be tested because of the low minor allele frequency in the studied Asian populations. Association studies in European populations would be appropriate for examining the effect of EDA2R on hair morphology. The other EDAR-related gene, EDA, also showed high differentiation between HapMap-CEU and HapMap-CHB+JPT, and a significant association of the EDA polymorphism, rs1938023, with large diameter was detected in male participants (P-value=0.029 when considering the effect of EDAR 1540T/C; Table 2). However, no significant association was observed in female participants. Thus, the association of the EDA polymorphism with large diameter needs to be re-examined in future studies.

Unlike the EDAR 1540C allele, no extended LD was observed from the rs4752566-T allele of FGFR2 in CHB+JPT (Figures 1b and c), suggesting that the higher population frequency of rs4752566-T in CHB+JPT than in YRI and CEU has not been attained by recent positive selection. As rs4752566-T is observed in YRI (Table 1), the mutation that gave rise to rs4752566-T appears to predate the ‘out-of-Africa’ event of modern humans. Thus, the high interpopulation differentiation of rs4752566 may have been caused by random genetic drift, although it is difficult to fully exclude the possibility that positive selection acted in the ancestors of East Asians, because extended LD, as a signature of positive selection, is difficult to detect for a standing allele such as rs4752566-T.6

As the EDAR 1540C allele is almost absent in populations of African and European ancestry,7 the mutation is considered to have occurred in the ancestors of Asians after the split from the ancestors of Europeans. Thus, the possibility of local adaptation or positive selection related to hair thickness in non-Asian populations could not be discussed in our previous study.2, 3 If thicker hair is always advantageous in humans, an allele associated with thicker hair is expected to be highly frequent in all the populations where it exists. The rs4752566-T allele, which was found to be associated with hair thickness, has a lower population frequency in YRI and CEU (Table 1), implying that thicker hair may have been less advantageous in African and European populations than in East Asian populations, or that selection intensity may differ among populations.

In this study, no strong genetic determinants of hair morphology aside from EDAR 1540T/C were detected. As polymorphisms associated with hair thickness or hair index do not always show high population differentiation, it is necessary to analyze other hair-related genes to identify further genetic variants associated with hair thickness.

Conflict of interest

The authors declare no conflict of interest.