Introduction

Skin pigmentation is essential in the protection against ultraviolet (UV) radiation1,2. A complex regulatory system controls the production of melanin3, the main pigment providing colour to the skin4. Melanin is produced by melanocytes in the epidermis and is deposited in melanosomes, which are transferred to adjacent keratinocytes5. Melanocytes are also implicated in several other important bioregulatory, metabolic and homeostatic processes, both in the skin and in other organs5.

Skin colour varies among different populations and is strongly correlated with latitude due to the variation in UV radiation intensity6. Moreover, several selective factors have been implicated in the evolution of human pigmentation towards darker pigmentation in equatorial and tropical regions2, including: protection against the harmful effects of UV radiation exposure7; protection against folate photolysis2,8; maintenance of adequate levels of vitamin D9; and contribution to the skin’s barrier function by optimizing water conservation and improving cutaneous antimicrobial defense10.

The colour of unexposed skin (constitutive skin pigmentation) is a complex trait11. Indeed, evidence supports that many genes and other interacting factors are involved in determining normal skin pigmentation12,13. However, candidate-gene and genome-wide association studies (GWAS) have revealed only a few of the total estimated number of genes implicated in the variability of human skin colour2,14,15. Moreover, despite the large differences in skin pigmentation across populations, most genetic association studies of skin colour have been performed in European16,17 and Asian populations18,19,20. Only a handful of candidate-gene association studies have been performed in African ancestry populations14,21,22,23, and a single GWAS has been carried out in African-European admixed individuals from Cape Verde24.

Hispanics/Latinos from Puerto Rico are the result of the admixture of European, African, and Native American ancestry. Specifically, the Native American component derives from the Taínos, the native population of Puerto Rico, which was greatly reduced by the slave trade, warfare and disease25, and the European component was introduced by the Spanish settlers26,27,28. Later on, the Spanish brought African slaves who replaced the indigenous population of the island. As a result, the present-day population has predominantly European ancestry, followed by African ancestry and a smaller Native American component. Given that performing GWAS in recently admixed African-ancestry populations provides an opportunity to identify novel genes implicated in skin colour variability14, we hypothesized that a GWAS in Hispanics/Latinos from Puerto Rico could reveal novel genes contributing to this trait. Herein, we identify genetic variants influencing skin pigmentation in African-admixed individuals by analysing 14 million genetic variants across the genome.

Methods

Ethics statement

This study was approved by the institutional review boards of the University of California San Francisco and all participating centres. Written informed consent was obtained from all subjects or, for participants under 18 years old, from their appropriate surrogates. All methods were performed in accordance with the relevant guidelines and regulations for human subject research and with the Declaration of Helsinki.

Study populations

Samples from the Genes-environment & Admixture in Latino Americans (GALA II) Study and the Study of African Americans, Asthma, Genes & Environments (SAGE II) were used for the discovery of genetic variants associated with skin colour and replication of results, respectively. The GALA II and SAGE II studies are two independent case-control studies initially conceived for the study of genetic and environmental factors involved in asthma. Both studies used the same protocol and questionnaires to recruit unrelated children aged 8 to 21 years, but focused on two different racial/ethnic groups: Hispanics/Latinos in GALA II and African Americans in SAGE II. All recruited subjects were required to report that all four grandparents self-identified as Hispanics/Latinos (GALA II) or African Americans (SAGE II). Participants from GALA II with skin colour measurements included in this study were recruited in Puerto Rico, while SAGE II participants were recruited from the San Francisco Bay Area29,30.

Skin colour characterization

We used the DSM II ColorMeter (Cortex Technology, Hadsund, Denmark) to measure skin pigmentation in triplicate for each participant along the inner side of each upper arm. Melanin was measured using the melanin index, defined as the inverse of the melanin reflectance measured at 650 nm24; lower values of the melanin index correspond to light skin colour, whereas larger values correspond to dark skin colour.

Genotyping and assessment of genetic ancestry

Genome-wide genotyping data from participants of both studies were obtained by using the Axiom LAT1 array (World Array 4, Affymetrix, Santa Clara, CA, United States), and quality control procedures were performed as described elsewhere29,30. Genotype data were obtained for the 285 Hispanics/Latinos from Puerto Rico from GALA II and the 373 African Americans from SAGE II who had available skin colour measurement data.

Genetic ancestry was initially assessed by performing a principal components analysis (PCA) using EIGENSOFT31. We assessed ancestry structure among Hispanics/Latinos from Puerto Rico and African Americans using the African (YRI) and European (CEU) reference populations from the 1000 Genomes Project (1KGP)32, and Native American (NAM) individuals genotyped with the Axiom LAT1 array29. Genetic ancestry proportions for each subject were also estimated with an unsupervised model from ADMIXTURE33, using the CEU and YRI as parental populations for African Americans, and CEU, YRI and NAM as parental populations for Hispanics/Latinos from Puerto Rico.

Imputation, association testing and meta-analysis

Genetic variants located in autosomal chromosomes were imputed by means of the Michigan Imputation Server34, using SHAPEIT35 for haplotype reconstruction, and Minimac3 software for the imputation step36. The first release of the Haplotype Reference Consortium (HRC) was used as the reference population37.

Association testing with skin colour was performed in Hispanics/Latinos from Puerto Rico by means of the linear Wald test implemented in the software EPACTS 3.2.638, adjusting for the proportions of African and Native American ancestries. The results were then filtered to retain those variants with a minor allele frequency (MAF) ≥1% and an imputation quality (Rsq) ≥0.3. Variants associated with skin pigmentation in the GALA II discovery sample at a suggestive significance level (p ≤ 1 × 10−5) were followed up for replication in SAGE II African Americans. Association testing was performed similarly in the replication sample, with the exception that only African ancestry was used to adjust for genetic ancestry.
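The filtering and follow-up rule described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds; the record layout, the field names (id, maf, rsq, p) and the example variants are hypothetical and do not reflect the EPACTS output format or any data from the study.

```python
# Thresholds taken from the text; everything else here is illustrative.
MAF_MIN = 0.01        # minor allele frequency >= 1%
RSQ_MIN = 0.3         # imputation quality (Rsq) >= 0.3
P_SUGGESTIVE = 1e-5   # suggestive significance level for follow-up


def passes_qc(variant):
    """Return True if a variant meets the MAF and Rsq filters."""
    return variant["maf"] >= MAF_MIN and variant["rsq"] >= RSQ_MIN


def select_for_replication(variants):
    """Return ids of QC-passing variants at or below the suggestive threshold."""
    return [v["id"] for v in variants if passes_qc(v) and v["p"] <= P_SUGGESTIVE]


# Hypothetical summary records (not real study results).
example = [
    {"id": "var_a", "maf": 0.20, "rsq": 0.95, "p": 3e-7},
    {"id": "var_b", "maf": 0.005, "rsq": 0.90, "p": 1e-8},  # fails the MAF filter
    {"id": "var_c", "maf": 0.10, "rsq": 0.25, "p": 2e-6},   # fails the Rsq filter
    {"id": "var_d", "maf": 0.30, "rsq": 0.80, "p": 4e-3},   # not suggestive
]
selected = select_for_replication(example)
```

Only the first hypothetical variant satisfies all three conditions, so it alone would be carried forward to the replication sample.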

Results from the discovery and replication samples were meta-analyzed using METASOFT. Random-effects models were applied for single nucleotide polymorphisms (SNPs) showing heterogeneity of effects between studies (Cochran’s Q test p-value ≤ 0.05) and fixed-effects models for those SNPs without evidence of heterogeneity (Cochran’s Q test p-value > 0.05)39. Genome-wide significance was declared at p-value ≤ 5 × 10−8.
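The model-selection rule above can be illustrated with a minimal Python sketch for two studies. It assumes standard inverse-variance weighting for the fixed-effects estimate and the DerSimonian-Laird estimator for the random-effects estimate; this is a textbook approximation for illustration, not METASOFT's implementation, and the function names and input values are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def meta_analyze_pair(betas, ses, het_alpha=0.05):
    """Combine two effect estimates, choosing the model from Cochran's Q."""
    if len(betas) != 2 or len(ses) != 2:
        raise ValueError("this sketch handles exactly two studies")
    weights = [1.0 / se ** 2 for se in ses]          # inverse-variance weights
    sum_w = sum(weights)
    beta_fe = sum(w * b for w, b in zip(weights, betas)) / sum_w
    q = sum(w * (b - beta_fe) ** 2 for w, b in zip(weights, betas))
    # With two studies Q has 1 degree of freedom, so the chi-square tail
    # probability reduces to erfc(sqrt(Q / 2)).
    p_het = math.erfc(math.sqrt(q / 2))
    if p_het > het_alpha:
        model, beta, se = "fixed", beta_fe, math.sqrt(1.0 / sum_w)
    else:
        # DerSimonian-Laird between-study variance (assumed estimator).
        c = sum_w - sum(w ** 2 for w in weights) / sum_w
        tau2 = max(0.0, (q - 1.0) / c)
        weights_re = [1.0 / (s ** 2 + tau2) for s in ses]
        sum_w_re = sum(weights_re)
        beta = sum(w * b for w, b in zip(weights_re, betas)) / sum_w_re
        se = math.sqrt(1.0 / sum_w_re)
        model = "random"
    return {"model": model, "beta": beta, "se": se,
            "p": two_sided_p(beta / se), "q": q, "p_het": p_het}


# Hypothetical inputs: concordant effects, then strongly discordant effects.
homogeneous = meta_analyze_pair([4.0, 4.0], [1.0, 1.0])
heterogeneous = meta_analyze_pair([5.0, 1.0], [0.5, 0.5])
```

In the first call the two effects agree, Q is zero and the fixed-effects estimate is used; in the second the effects differ sharply relative to their standard errors, so the random-effects estimate with its wider standard error is returned.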

Chromosomal regions containing variants that were genome-wide significant were plotted for the discovery sample using Locus Zoom 1.1 (ref. 40) based on linkage disequilibrium (LD) data from the 1KGP (GRCh37/hg19 build)32. Independence of association signals with skin colour among SNPs located within the same genomic region was assessed by multivariate linear regression analyses conditioned on the most significant SNP of each region using R 3.2.2 (ref. 41).

Allele frequency distribution assessment of rs6602666

We assessed the distribution of the minor allele frequency of the novel associated variant rs6602666 across different populations. We first used the Geography of Genetic Variants Browser Beta v0.2 to plot allele distributions in African, admixed American, East Asian, European, and South Asian populations from 1KGP Phase III42. Given that Native American populations are not represented in the 1KGP dataset, we downloaded publicly available data for 108 Native American individuals described in Lazaridis et al.43 (7 Bolivian, 12 Karitiana, 18 Mayan, 10 Mixe, 10 Mixtec, 10 Nasoi, 4 Piapoco, 14 Pima, 5 Quechua, 8 Surui, and 10 Zapotec). Allele frequency in those groups was assessed using PLINK44.
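As a small worked check of the figures above, the following Python sketch sums the per-group sample sizes listed in the text and shows how an allele frequency follows from genotype counts. The group sizes come from the text; the function name and the genotype counts in the example are hypothetical, and this is not the PLINK procedure itself.

```python
# Sample sizes listed in the text for the 11 Native American groups.
SAMPLE_SIZES = {
    "Bolivian": 7, "Karitiana": 12, "Mayan": 18, "Mixe": 10, "Mixtec": 10,
    "Nasoi": 10, "Piapoco": 4, "Pima": 14, "Quechua": 5, "Surui": 8,
    "Zapotec": 10,
}


def allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """Frequency of the alternate allele from diploid genotype counts."""
    n_individuals = n_hom_ref + n_het + n_hom_alt
    if n_individuals == 0:
        raise ValueError("no genotyped individuals")
    # Each heterozygote carries one copy and each alternate homozygote two.
    return (n_het + 2 * n_hom_alt) / (2 * n_individuals)


total_individuals = sum(SAMPLE_SIZES.values())

# Hypothetical counts: a monomorphic site and a polymorphic site.
freq_monomorphic = allele_frequency(total_individuals, 0, 0)
freq_polymorphic = allele_frequency(50, 40, 10)
```

The group sizes add up to the 108 individuals stated in the text, and a site at which every individual is homozygous for the same allele has an alternate allele frequency of zero.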

Results

Ancestry composition and skin colour distribution

Our analysis of the ancestral composition using PCA revealed that no individuals were outliers regarding their ancestry composition (Supplementary Fig. S1). As expected, Hispanics/Latinos from Puerto Rico had a larger proportion of European and lower contribution of African and Native American admixture compared with African Americans (Table 1, Supplementary Fig. S2). The replication sample showed a predominant African component and, to a lesser extent, European ancestry (Table 1, Supplementary Fig. S2). Therefore, despite being two African-admixed populations, Hispanics/Latinos from Puerto Rico had significantly smaller proportions of African admixture (22.8% ± 9.5%) compared with African Americans (80.9% ± 10.0%, p < 0.001).

Table 1 Characteristics of the individuals included in the discovery and replication stages.

A summary of the descriptive data of the individuals from our study is shown in Table 1. Mean age and proportion of males were similar across the discovery and replication samples. Neither characteristic was associated with skin pigmentation (p > 0.05), so they were not included as covariates in the GWAS. Additionally, Hispanics/Latinos from Puerto Rico had lighter skin (melanin index: 45.8 ± 6.8) than African Americans (71.9 ± 13.5) (p < 0.001). The distributions of the melanin index for the two populations are shown in Fig. 1.

Figure 1: Distribution of melanin index in the discovery and replication samples.

The x-axis represents the melanin index for Hispanics/Latinos from Puerto Rico (blue) and African American (yellow) samples; the y-axis represents the number of observations.

Discovery study in Hispanics/Latinos from Puerto Rico

Association analyses of the 14 million imputed variants with MAF ≥1% in the discovery sample revealed a total of 82 SNPs associated with skin colour at a suggestive significance level (p-value ≤ 1 × 10−5) (Supplementary Table S1). No major genomic inflation (λGC = 1.02) was observed in the Q-Q plot (Fig. 2A) and the most significant SNPs were located in chromosomes 5, 10, and 15 (Fig. 2B). The top hit was rs2675345, located within SLC24A5, which was near genome-wide significance (p = 5.83 × 10−8; β for G allele: 3.31, 95% CI: 2.14–4.47).

Figure 2: Results of the melanin index GWAS in the discovery stage.

(A) Quantile-quantile plot showing the observed -log10 p-values versus the expected -log10 p-values. (B) Manhattan plot of association results (represented as -log10 p-value on the y-axis) along the chromosomes (x-axis). The suggestive significance threshold for replication is indicated by the black line (p = 1 × 10−5).

Replication of associated variants in African Americans and meta-analysis

Of the 82 SNPs that were significant at a suggestive level in Hispanics/Latinos from Puerto Rico, 77 were followed up for replication in the African American sample, since the remaining five were either monomorphic (three SNPs) or had a MAF <1% (two SNPs). Out of the 77 SNPs, 14 replicated in the African American sample (p-value < 0.05) and effect sizes were all in the same direction and of similar magnitude as the discovery sample (Table 2).

Table 2 Melanin index meta-analysis results for suggestively associated SNPs that also nominally replicated.

The meta-analysis showed evidence of association for nine of the 14 SNPs at a genome-wide significance level (p < 5 × 10−8) (Table 2). These SNPs are located within three genomic regions. Two regions are already known to contribute to skin colour, including SLC24A518,19,20,21,23,45 and surrounding genes (Supplementary Fig. S3) as well as the SLC45A218,24,45 gene (Supplementary Fig. S4). We also report one novel region described for the first time in the current study, located in the intergenic region of BEND7 and PRPF18. The remaining five SNPs not reaching genome-wide significance were located in three gene regions: SST-RTP2, ATP8B4, and EIF2S2-ASIP; the last two have previously been associated with skin colour-related traits18,23,46,47.

Three SNPs within SLC24A5 showed the strongest meta-analysis association signals: rs1426654 (β for G allele: 4.36, p = 2.62 × 10−14), rs2675345 (β for G allele: 3.89, p = 2.98 × 10−14), and rs2470102 (β for G allele: 4.31, p = 3.70 × 10−14). We also detected two SNPs near SLC24A5 that were genome-wide significant, one located within DUT (rs11637235, β for T allele: −3.83, p = 3.34 × 10−10) and the other within MYEF2 (rs8028919, β for A allele: −3.31, p = 1.62 × 10−10). After including all five SNPs located within or near SLC24A5 in one common regression model, we determined that the association signal among Hispanics/Latinos from Puerto Rico was driven by the top SNP (rs2675345), as the regression coefficients for the other SNPs were not significant in the common model (Supplementary Table S2).

The second locus with genome-wide significant association with skin colour in the meta-analysis was in SLC45A2: rs16891982 (β for G allele: −2.85, p = 9.71 × 10−10) and rs35397 (β for T allele: −2.66, p = 2.05 × 10−8). These two SNPs had high LD (r2 ≥ 0.82) in both populations and their association with skin pigmentation was driven by rs16891982 (a SNP associated with skin colour by previous studies)18, given that rs35397 lost significance after performing regression analysis conditioned on rs16891982 (p = 0.945) (Supplementary Table S2).

Moreover, two SNPs located in the intergenic region of BEND7 and PRPF18 (Fig. 3) were associated with skin colour at genome-wide significance in the meta-analysis: rs6602665 (β for C allele: 4.01, p = 6.14 × 10−9) and rs6602666 (β for G allele: 4.03, p = 4.58 × 10−9), which showed strong LD in the discovery and replication samples (r2 = 0.99). These SNPs were more significantly associated with skin colour in Hispanics/Latinos from Puerto Rico (β = 4.72, p = 7.27 × 10−7 for both SNPs) than in African Americans (β = 3.20, p = 1.80 × 10−3 and β = 3.14, p = 2.34 × 10−3 for rs6602666 and rs6602665, respectively). As expected by the high LD between the two SNPs, they represented one association signal in regression analysis when both SNPs were incorporated into the same model (p = 0.788 for rs6602666).

Figure 3: Regional plot of association results in the discovery stage for the BEND7/PRPF18 intergenic region, a novel locus for skin pigmentation.

The statistical significance of association results (-log10 p-value) is represented for each SNP as a dot (y-axis) by chromosome position (x-axis). The top hit (rs6602665) is represented by a diamond and remaining SNPs are colour coded based on their LD with this SNP, indicated by pairwise r2 values for American populations of the 1KGP.

The frequency distribution of the G allele of rs6602666 in 1KGP Phase III (Fig. 4) showed that this variant is more prevalent in populations with African ancestry (MAF = 30%) and in South Asians (MAF = 8%), and has a lower frequency in admixed American populations (MAF = 3%). Among admixed American populations, this variant was more prevalent in Puerto Ricans residing in Puerto Rico. In contrast, this variant is almost absent in Europeans and East Asians. An assessment of allele frequency in populations of Native American origin revealed that this variant is monomorphic in the 108 samples from the 11 Native American populations with available data43.

Figure 4: Allele frequency map for rs6602666, the most significant meta-analysis SNP from the intergenic region of BEND7 and PRPF18.

Frequency proportions for the effect (G) and non-effect (A) alleles are represented in dark and light gray, respectively. Obtained from the Geography of Genetic Variants Browser Beta v0.242.

Discussion

In this study, we performed the first GWAS of skin colour in Hispanics/Latinos from Puerto Rico from the GALA II study. After performing genotype imputation and subsequent association testing, we detected 82 suggestive association signals in Hispanics/Latinos, 14 of which replicated at nominal significance in an independent African American sample from the SAGE II study. We identified novel, genome-wide significant associations between skin colour and variants from the BEND7/PRPF18 intergenic region. We also validated the association of five genes already known to contribute to skin colour identified primarily in European populations: two loci with genome-wide significance (SLC24A5 and SLC45A2), and three at a suggestive level (EIF2S2, ASIP, and ATP8B4). In addition to replicating previously described SNPs18,21, our results also revealed additional loci within the same region (e.g., rs2675345 from SLC24A5).

Among the three most significantly associated gene regions, variants near or within SLC24A5 showed the strongest association signals. This gene is located in the 15q21.1 chromosomal band and encodes the NCKX5 protein (solute carrier family 24 [sodium/potassium/calcium exchanger], member 5), an intracellular membrane protein whose function has been associated with skin colour and diseases related to skin pigmentation21,48. The top SNP in our meta-analysis (rs1426654) has also been associated with skin colour in African American and African Caribbean populations in a candidate-gene study21, and has been broadly replicated across different populations18,19,20,23,24,45.

We also validated the association of SLC45A2 with skin colour in Hispanics/Latinos from Puerto Rico and African Americans. This gene encodes SLC45A2, a transporter that is highly expressed in the melanosomal membrane of melanocytic cell lines and overexpressed in melanoma cells49. SLC45A2 has been associated with different pigmentary traits (e.g., eye, skin, and hair colour)18 and diseases50. We confirmed the association of a previously described SNP with normal skin pigmentation (rs16891982 [Phe374Leu]), which was first identified in South Asians18 and has since been validated in other populations, including African-admixed individuals24,45.

Notably, we detected two novel genome-wide significant associations with skin colour (rs6602665 and rs6602666) in the intergenic region of BEND7 and PRPF18. Both variants are located closer to PRPF18 (approximately 23 kb) than to BEND7 (approximately 83 kb). The function of the intracellular protein encoded by BEND7 (BEN domain-containing protein 7) is not well characterized. Nevertheless, it contains the BEN domain, which is involved in transcription regulation through recruitment of chromatin remodelling factors and DNA-protein interactions51. The other gene located nearest the top SNP of this region, PRPF18 (pre-mRNA processing factor 18), encodes a splicing factor implicated in pre-mRNA splicing by means of protein-protein interactions52. While no skin pigmentation-specific functions have been attributed to either of these two flanking genes, RNA for both genes is expressed in skin regardless of exposure to UV light, with higher levels of expression for PRPF1853. However, at the protein level, only PRPF18 is expressed in melanocytes and other skin cells54.

Interestingly, based on 1KGP data, the rs6602666 G allele (which is associated with darker skin colour) is present in African, South Asian, and admixed American populations and rare in Europeans, and it was absent in the Native American samples we examined. Therefore, differences in the proportion of African genetic ancestry may provide a simple explanation as to why this locus has not been detected in previous GWAS, since they were predominantly focused on populations of European descent. This observation underscores the importance and scientific benefit of studying admixed populations, as the inclusion of genetically diverse groups improves statistical power, particularly when genetic variants are rare55.

Identification of genes implicated in human skin pigmentation has high anthropological, forensic and biomedical interest56,57. For example, genes associated with skin colour are also important in regulating vitamin D levels in Caucasian populations58. Given that vitamin D deficiency has been implicated in a variety of diseases59, and that the majority of circulating vitamin D is derived from photochemical reactions in the skin, genes affecting skin pigmentation could play an indirect role in several diseases60. Furthermore, identification of genes involved in controlling melanin levels in the skin could provide new insights regarding the genetics of several types of skin cancer61. In fact, some of the skin pigmentation associated loci in our present study, such as SLC45A2, have also been associated with protection against basal cell carcinoma, squamous cell carcinoma62, and melanoma among Europeans50, who are at increased risk of developing skin cancer60. Similarly, another gene associated with skin colour in our study, ASIP, has been previously implicated in basal cell carcinoma16. Therefore, the novel locus associated with skin pigmentation in the current GWAS might also be relevant for skin cancer susceptibility in African-descent populations. Indeed, data from the National Cancer Institute’s Surveillance, Epidemiology and End Results Program have shown that melanoma incidence is lowest among African Americans (1.0% in females and 1.1% in males), intermediate among Hispanics/Latinos (4.4% in females and 4.8% in males), and highest among non-Hispanic whites (19.4% in females and 32.2% in males)63. The lower incidence of melanoma in individuals from populations with darker skin may be attributed to the protective effects of higher melanin levels64. In addition, there are differences in the prevalence of other types of skin cancer among Hispanics and African Americans, such as basal cell carcinoma, which is the most prevalent skin cancer in Hispanics65,66 and the second most common skin malignancy in non-Hispanics with African ancestry.

In addition to skin cancer, other diseases of the skin (e.g., vitiligo, psoriasis, or alopecia areata) could be affected by the genetic variants identified in the current study. To date, association studies linking either of the genome-wide significant SNPs (rs6602665 and rs6602666) or their flanking genes (BEND7 and PRPF18) with any of these skin diseases are lacking; future studies should therefore investigate the association of the novel locus with these skin diseases.

Some of the SNPs identified as suggestively associated with skin pigmentation in the current study are located in gene regions previously associated with skin pigmentation, such as EIF2S246, ASIP47, and ATP8B418. In addition, we found another suggestive hit near two genes that had not been previously associated with this trait (SST and RTP2), which deserve further attention in future studies.

Our study has several strengths that should be highlighted: a) skin colour was assessed using skin reflectance spectrometry, which provides a quantitative measure of skin pigmentation, as opposed to many previous studies based on self-reported skin colour; b) skin pigmentation measures were obtained using the same instrument in the different recruitment centres participating in the GALA II and SAGE II studies to reduce possible biases; c) the analysed samples were genotyped with a specific array for African-admixed populations, providing good representation of their genomic variation67; d) for the first time in a GWAS of skin colour, the extensive catalogue of genetic variants provided by whole-genome sequencing data from the HRC reference panel was used37.

The current study also has some limitations that should be considered. Sample size was relatively limited, yet we had sufficient statistical power in the discovery sample (83%) to detect the association of genetic variants with allele frequencies ≥25% and effect sizes (β) ≥ 3.5. Statistical power was limited for variants with lower allele frequencies and modest effect sizes. Moreover, the differing proportions of African admixture and distinct distributions of skin colour between our study populations point to differences in the genetic architecture of skin colour between our two populations. In fact, a proportion of the SNPs associated with skin colour in Hispanics/Latinos was monomorphic or had a low frequency in African Americans, precluding replication attempts of those variants among African Americans. Given that Hispanics/Latinos from Puerto Rico have lower Native American proportions than other Hispanic/Latino subgroups29, our results may not generalize to other Hispanic/Latino groups. Therefore, additional replication should be performed in other populations with different ancestry proportions and skin phototypes.

We measured skin pigmentation using the melanin index. It is possible that our GWAS could have yielded different results had we used alternative methods for measuring melanin, such as pyrrole-2,3,5-tricarboxylic acid (PTCA), aminohydroxyphenylalanine (AHP), or electron paramagnetic resonance spectroscopy (EPR)68. Future studies using these alternative methods may provide convergent validity to our results if the associated loci are truly related to melanogenesis. Finally, clear functional evidence relating the novel locus, BEND7-PRPF18, with skin pigmentation has not yet been described. Given the involvement of these genes in pre-mRNA processing and transcription regulation, this locus could be related to melanocytic proliferation. Unfortunately, skin biopsies from the participants in this study are unavailable for performing histologic studies or in vitro experiments using epidermal cell cultures. Therefore, the functional role of associated variants will need to be assessed by future studies.

In summary, this GWAS validated the role of SLC24A5 and SLC45A2 in skin melanin levels in Hispanics/Latinos from Puerto Rico and African Americans, and identified a novel association of variants in the intergenic region of BEND7 and PRPF18 with this trait. Therefore, this study reinforces the advantages and the necessity of analyzing African-admixed populations to identify new loci involved in complex traits.

Additional Information

How to cite this article: Hernandez-Pacheco, N. et al. Identification of a novel locus associated with skin colour in African-admixed populations. Sci. Rep. 7, 44548; doi: 10.1038/srep44548 (2017).
