Introduction

Elucidating the genetic basis of human pigmentation traits such as eye, hair and skin colour is of great interest in many areas of scientific research. For example, pigmentation traits are known to be associated with a number of human diseases, including melanoma and non-melanoma skin cancer.1, 2, 3 Moreover, the prediction of pigmentation phenotypes from genotypes would be highly relevant to ancient DNA research4, 5, 6 and, if and when legally possible, to forensic case work, particularly for solving cases without a suspect.7, 8

The aetiology of human pigmentation traits is thought to be highly complex, possibly involving gene–gene and gene–environment interactions,9, 10, 11 and a high level of phenotypic diversity has been observed among individuals of European descent.12, 13 Interestingly, while the heritability of eye and hair colour is known to be very strong, skin colour appears to be less genetically determined.14

In the recent past, some important advances have been made towards the identification of genes associated with human pigmentation traits. For eye and hair colour, a few major genes show large effects, although additional genes of minor effect have also been found to be trait-associated in genome-wide studies, highlighting the polygenic nature of these phenotypes. The HERC2/OCA2 genes have an especially strong effect on human eye colour.10, 15, 16, 17, 18 Single-nucleotide polymorphism (SNP) rs12913832 in HERC2 shows a particularly strong genotype–phenotype association that is potentially modified by SNP rs1800407 in OCA2,10, 17 a gene involved in human pigmentation via the regulation of melanin production.19 More recently, the rs12913832 region of HERC2 has been reported to act as an enhancer of the transcription of OCA2.19

The red hair phenotype appears to be predominantly determined by the MC1R gene,20 which encodes the melanocortin receptor. For other hair colours, such as blond, brown and black, and for skin colour, weaker genetic associations involving OCA2 and HERC2 have been described in European populations.10 Additional genes suggested to have a minor effect upon pigmentation phenotypes include SLC24A4,21, 22 IRF4,22 TYR,21 TYRP1,23 LYST,24 TTC3/DSCR9,24 ASIP23 and SLC45A5,25 among others.

Recently, the prediction of external features from DNA data, also referred to as ‘forensic DNA phenotyping’, has gained considerable interest in forensic science, and substantial progress has been made, particularly for eye colour.7, 26, 27, 28 Two marker sets comprising six SNPs for eye colour and 13 markers for hair colour have been suggested to reliably predict these traits in European populations.29, 30

We aimed to develop a more specific prediction model for a realistic target population of forensic case work from Northern Germany. To this end, we investigated the association between eye, hair and skin colour and 12 candidate pigmentation SNPs in six different genes. These SNPs were partly overlapping with the two previously established marker sets mentioned above. Because of the special role of red hair, we followed a new phenotyping strategy, dividing hair colour into two, possibly independent, sub-phenotypes: the red tint component and the light-dark component. The light-dark component was defined by nine evenly graded types of shading.

After model selection, we evaluated the predictive capability of our models and compared our results with those of the other two marker sets.29, 30 To obtain unbiased estimates of predictive capability, we adopted a two-stage design for our study with 300 individuals in stage 1, the ‘modelling sample’, and 100 different individuals in stage 2, the ‘estimation sample’. Finally, in addition to deriving population-specific prediction models, we analysed the association between phenotypes and tried to define coherent phenotype groups.

Materials and methods

Study population

A total of 400 unrelated individuals (197 male, 203 female) from Northern Germany were recruited for our study between 2010 and 2011. The median age was 27 years (interquartile range: 24–33 years). All individuals were born in Germany and had German parents and grandparents (self-report). All 400 participants were recruited and investigated in the same way. The first 300 participants were included in the ‘modelling sample’ of stage 1 (used for SNP selection); the remaining 100 individuals constituted the ‘estimation sample’ of stage 2 (prediction evaluation). All participants gave written informed consent before the study. Genotype and phenotype data were de-identified for analysis purposes according to the Declaration of Helsinki. The project was approved by the Ethics Committee of the Medical Faculty of Christian-Albrechts University Kiel.

Phenotyping

Pigmentation phenotypes were documented by photographs taken under daylight conditions and from a distance of 30 cm, using a Canon EOS 400D (Canon Deutschland GmbH, Krefeld, Germany) (18–55 mm focal length). For each participant, one photograph was taken of each eye, the scalp hair and the inner arm. Photographs were normalised using the standard functions in Photoshop 4.0 (Adobe Systems Software, San Jose, CA, USA), and consensus phenotype calling was carried out by two raters by discussion. In the rare cases where no agreement was reached, a third party was involved.

Eye colour was divided into three categories, namely blue, green and brown (Figure 1a). Individual skin type was classified applying the Fitzpatrick scheme31 to the inner arm. Hair colour type was defined in multitiered fashion. To this end, a collection of coloured hair strands obtained from a hairdresser was categorised into nine evenly graded types of shading, ranging from light blond (type I) to black (type IX) (Figure 1b). Strands of red hair or with red tint were omitted from this classification because it was intended to address the light-dark component only. Then, hair colour was divided into two sub-phenotypes, namely the red tint component (yes/no) and the light-dark component (I–IX). For each individual, the presence of red tint was ascertained (by questioning) in head hair, facial hair (beard), axillary hair or pubic hair. If an individual had red head hair, this was noted separately to enable a separate analysis for this special phenotype. The light-dark component was determined by reference to the hair strand collection mentioned above. Here, individuals with recognisable red tint were classified according to their basic hair colour type. For example, strawberry blonds were deemed class I (blond), whereas people with auburn hair were classified as one of IV, V or VI (brown). Although this was possible for 22 red-haired individuals, four had no definable basic hair colour. These were excluded from the analysis of the light-dark component. No participants with exclusively white hair were included in the study. When a participant had dyed hair, the original hair colour was determined by the hairline.

Figure 1

Definition of eye and hair colour phenotypes. (a) Classification of eye colour; blue: 1 – pure blue, 2 – blue-brown, 3 – blue-green; green: 4 – blue-green-brown, 5 – pure green, 6 – green-brown; brown: 7 – amber, 8 – brown-green, 9 – pure brown. (b) Hair strands used for hair colour categorisation; light blond – I, blond – II, dark blond – III, ash – IV, light brown – V, brown – VI, dark brown – VII, black-brown – VIII, black – IX.

Genotyping

Buccal swabs (COPAN) were taken from all 400 participants and DNA was extracted using Chelex 100 (Walsh et al32). In a comprehensive PubMed search, 12 SNPs were identified as promising candidates for further analysis using the following criteria (Table 1, Supplementary Table S1): large odds ratio (OR), validation in several independent studies, large sample sizes and adequate population backgrounds, low to no linkage disequilibrium with other candidate markers and suitability for genotyping in a single assay. In addition to the 12 SNPs, participants were genotyped for rs1426654 (SLC24A5), rs1129038 (HERC2) and rs1667394 (OCA2). SNP rs1426654 is a European ancestry marker (Giardina et al33) used to control population background. SNPs rs1129038 and rs1667394 served as genotyping quality markers because they are in perfect LD with candidate SNPs rs12913832 and rs916977 respectively (Mengel-From et al34; Sturm et al17).

Table 1 Candidate SNPs investigated for an association with different pigmentation traits

Primers were designed and checked for possible dimer and hairpin structures using the DNAstar Lasergene v8.1.2 software (DNASTAR, Madison, WI, USA) and BLAST. PCR fragments had to be shorter than 200 bp in order to meet the standards of reliable forensic or ancient DNA analysis.

For DNA amplification, a Multiplex PCR Master Mix (Qiagen, Hilden, Germany) was used in a total reaction volume of 12.5 μl, with 0.2–0.5 ng template DNA. PCR was performed with a thermal cycler 2700 (Life Technologies, Carlsbad, CA, USA) under the following conditions: (1) 95 °C 15 min; (2) 35 cycles of 94 °C 30 s, 64 °C (SNPs nos 2–5, 7–8, 10 in Table 1) or 58 °C (SNPs nos 1, 6, 9, 11 and 12) 90 s, 72 °C 1 min; and (3) 60 °C 30 min. PCR products were purified using ExoSAP-IT (Affymetrix, Santa Clara, CA, USA) according to the manufacturer’s protocol. Single-base extension was carried out in a total reaction volume of 7 μl, including 0.5 μl of cleaned PCR products, using the SNaPshot Multiplex Kit (Life Technologies) on the same PCR cycler as before. The single-base extension cycling conditions were as follows: 25 cycles of 96 °C 10 s; 55 °C 5 s; and 60 °C 30 s. Fragment analysis was performed with the ABI Prism 3130 Genetic Analyzer (Life Technologies) using GeneMapper v3.2 (Life Technologies). For more information on primer sequences and concentrations, see Supplementary Table S2a.

The model proposed by Walsh et al30 for the prediction of eye colour is based upon six SNPs. Five of these had been genotyped in all our study participants before (Table 1, SNPs nos 1, 3, 10–12). The 100 individuals of stage 2 were additionally genotyped for rs16891982 (SLC45A2) as described above, with an annealing temperature of 58 °C in the first PCR. For more information on primer sequences and concentrations, see Supplementary Table S2b.

The model devised by Branicki et al29 for predicting hair colour is based upon 13 single or compound markers. Three of these were also included in our set of candidate SNPs (Table 1, SNPs nos 1, 3, 10) and were genotyped in all participants. The 100 stage 2 individuals were also genotyped for the remaining 10 markers, namely two compound markers in MC1R and rs1042602 (TYR), rs4959270 (EXOC2), rs28777 (SLC45A2), rs683 (TYRP1), rs2402130 (SLC24A4), rs12821256 (KITLG), rs16891982 (SLC45A2) and rs2378249 (ASIP). SNPs were analysed as described above, with an annealing temperature of 58 °C for the first PCR. The MC1R markers were analysed by sequencing the whole locus. To this end, a 1080 bp fragment was amplified and sequenced with an ABI Prism 3130xl Genetic Analyzer using the Big Dye Terminator v3.1 Cycle Sequencing Kit (both Life Technologies), following the manufacturer’s protocols. See Supplementary Table S2c for more information on primers used in this study.

Genotype and phenotype data of this study were submitted to the European Genome-phenome Archive (EGA, https://ega.crg.eu) with study accession number EGAS00001001174 (sample/proband ids EGAN00001268626-EGAN00001269025).

Statistical analysis

Genotypes for all markers of the two previously published models (or marker sets) (Branicki et al29; Walsh et al30) were only available for 100 individuals in our study (stage 2). To compare the predictive capability of the two marker sets with a model derived specifically for our target population, the 300 individuals of stage 1 were used to detect significant genotype–phenotype associations and to create an appropriate prediction model. Data from stage 2 then served for estimation and comparison of the predictive capability (sensitivity, specificity, predictive accuracy, area under the receiver operating characteristic curve (AUC)) for each new model and the two previously published models (Branicki et al29; Walsh et al30). For illustration, we also performed model selection and prediction evaluation on the whole data set (ie, stages 1 and 2 combined), using cross-validation to estimate sensitivity and specificity.

Sample size calculations indicated that ~100 individuals per group would suffice to detect an OR of 3 as nominally significant, depending upon the minor allele frequency of the SNP of interest, and 150 individuals per group after Bonferroni adjustment (12 SNPs tested, 80% power, 5% significance level). Stage 1 therefore comprised 300 individuals. The association between a given trait (ie, eye colour, hair colour/red tint, hair colour/light-dark component, skin colour) and a candidate SNP was tested for statistical significance using regression models. To allow for scarce genotypes, we performed permutation tests (100 000 permutations) in addition to standard asymptotic tests. Since P-values were not found to be notably different, only P-values from permutation tests will be given. Each SNP was analysed both individually (simple regression) and in combination with other candidate SNPs (multiple regression with backward selection), also allowing for possible SNP–SNP interactions. Genotypic, additive allelic, dominant and recessive models were considered for each SNP. Results, however, will be presented for the additive model only because this model required the fewest parameters but yielded consistently large effects. To derive robust prediction models, phenotypes were categorised in various ways. Depending on the scaling of the outcome, we performed logistic, linear, ordinal (proportional odds) and/or multinomial logistic regression. Since blue was by far the most frequent eye colour in our study, the analysis of eye colour was confined to the discrimination between blue and non-blue. For hair colour, red tint was treated as a dichotomous outcome whereas the light-dark component was treated in three different ways, either as dichotomous (blond versus non-blond), ordinal or quantitative (nine types of increasing darkness). Skin colour was treated either as dichotomous (types I–II versus types III–IV) or as ordinal.

Model selection was performed differently for the four traits. For eye colour and red tint, SNPs that remained significant in the multiple logistic regression analysis after backward selection and adjustment for multiple testing were included in the final model. For the light-dark component of hair colour, and for skin colour, a SNP had to be significant in all or in all but one of the multiple regression analyses after backward selection using different outcome definitions (ie, at least two of three analyses for the light-dark component, at least one of two analyses for skin colour).
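
The permutation testing mentioned in this section can be illustrated with a minimal Python sketch. It is not the procedure used in the study, which relied on regression-based permutation tests in R. The test statistic here (difference in mean effect-allele count between individuals with and without the trait), the function name and the random seed are assumptions made purely for illustration.

```python
import random


def permutation_p_value(allele_counts, phenotype, n_perm=100_000, seed=1):
    """Two-sided permutation P-value for a SNP-phenotype association.

    allele_counts: copies of the effect allele (0, 1 or 2) per individual,
                   i.e. additive allelic coding.
    phenotype:     1 if the trait is present, 0 otherwise.
    The statistic is an illustrative assumption: the difference in mean
    allele count between individuals with and without the trait.
    """
    def statistic(labels):
        with_trait = [g for g, y in zip(allele_counts, labels) if y == 1]
        without_trait = [g for g, y in zip(allele_counts, labels) if y == 0]
        return (sum(with_trait) / len(with_trait)
                - sum(without_trait) / len(without_trait))

    rng = random.Random(seed)
    observed = abs(statistic(phenotype))
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the labels breaks any genotype-phenotype link while
        # keeping the number of individuals with the trait fixed.
        rng.shuffle(shuffled)
        if abs(statistic(shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so that the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

With 100 000 permutations, the smallest P-value that such a procedure can resolve is of the order of 10⁻⁵, which is why very strong associations are reported as upper bounds rather than exact values.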

The relationships between traits were analysed using logistic regression analysis, treating one trait as the dependent variable and the other traits as independent variables, both with and without the additional inclusion of SNP genotypes. All four traits were encoded as dichotomous variables in these analyses. Model selection was again performed by backward selection. Multidimensional scaling was used to detect and visualise patterns in the phenotype data.

The predictive capability of a derived model was evaluated by means of the phenotype probability π from logistic regression analysis. This was done only for dichotomous outcomes (eg, blue versus non-blue eye colour). If π>0.5, the corresponding phenotype was assumed to be present. Predictive capability was quantified by the sensitivity, specificity, predictive accuracy and AUC of the model in question.
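
As a minimal sketch of this evaluation step, the following Python function applies the threshold π>0.5 to a list of phenotype probabilities and derives sensitivity, specificity and predictive accuracy. The function name and the input format are assumptions for illustration; the study itself used the R packages named below.

```python
def evaluate_predictions(probabilities, observed, threshold=0.5):
    """Sensitivity, specificity and accuracy of a dichotomous prediction.

    probabilities: phenotype probabilities (pi) from a fitted model.
    observed:      1 if the phenotype is actually present, 0 otherwise.
    The phenotype is predicted to be present if pi > threshold.
    Both phenotype classes must occur in `observed`.
    """
    tp = fp = tn = fn = 0
    for pi, y in zip(probabilities, observed):
        predicted = 1 if pi > threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```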

All statistical analyses were performed with R v2.10.1 (R Development Core Team35) unless indicated otherwise. Hardy–Weinberg equilibrium was assessed by means of the exact test implemented in R package genetics (Warnes et al36). Package MASS was used for ordinal and multinomial regression (Venables and Ripley37). Permutation tests of the linear and logistic regression models were performed with package glmperm (Werft et al38). For ordinal regression models, permutation tests were programmed in house. The predictive capabilities of different models were evaluated with packages DiagnosisMed (Brasil39) and pROC (Robin et al40). The proportion of phenotype heritability explained by a given marker was calculated according to So et al.41 Note that these estimates apply to single markers and do not take into account the characteristics of the respective regression models. Furthermore, these estimates refer to the liability scale and therefore tend to be higher than on the observation scale.

Sample size calculations were performed with the GPower software v3.0.8.42 All tests were two-sided and a P-value smaller than 0.05 was considered nominally statistically significant. P-values were adjusted for multiple testing using the Bonferroni method.
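
For illustration, the Bonferroni adjustment for the 12 candidate SNPs amounts to multiplying each P-value by the number of tests and capping the result at 1. The following Python sketch (the function name is an assumption) shows the calculation.

```python
def bonferroni(p_value, n_tests=12):
    """Bonferroni-adjusted P-value for n_tests tests, capped at 1."""
    return min(1.0, p_value * n_tests)


# A nominal P-value of 0.0014 tested across 12 SNPs.
print(round(bonferroni(0.0014), 3))
```

A nominal P-value of 0.0014 thus yields an adjusted value of about 0.017, and an upper bound of P<1.0 × 10⁻⁵ translates into padj<1.2 × 10⁻⁴.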

Results

Hardy–Weinberg equilibrium

After adjustment for multiple testing, none of the SNPs showed a significant deviation from the Hardy–Weinberg equilibrium.

Eye colour

Since blue was by far the most frequent eye colour in our study population, we confined our analysis of eye colour to the discrimination between blue and non-blue (Supplementary Table S3a; for the stratified genotype distribution, see Supplementary Table S4; for the eye colour categorisation, see Figure 1a). When analysed individually, five SNPs were found to be significantly associated with blue eye colour (Supplementary Table S3a), namely rs12913832, rs916977 (both HERC2), rs7495174, rs4778241 and rs4778138 (all OCA2). When all 12 candidate SNPs were included in a multiple logistic regression analysis, backward selection left only SNPs rs12913832 (HERC2) and rs1800407 (OCA2) with a significant phenotype association after adjustment for multiple testing (Table 2). Of these two SNPs, rs12913832 showed by far the strongest effect (P<1.0 × 10⁻⁵, padj<1.2 × 10⁻⁴, OR=40.0, 95% CI=18.3–87.5), although the effect of rs1800407 was still of considerable size (P=0.0014, padj=0.017, OR=4.9, 95% CI=1.8–13.6). When the expected prevalence of blue eye colour was calculated for each rs12913832/rs1800407 genotype combination (Supplementary Table S3b), it agreed reasonably well with the observed frequencies. Models of brown versus non-brown eye colour and blue versus brown eye colour revealed similarly strong genotype–phenotype associations. Moreover, the results were largely independent of whether dichotomous, ordinal or multinomial regression analyses were performed. In the models of brown versus non-brown and blue versus brown eye colour, and in the ordinal regression analysis of eye colour, SNP rs4778138 (OCA2) showed a nominally significant association with the respective trait in the multiple regression; this association did not, however, withstand correction for multiple testing.
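
To illustrate how per-allele odds ratios from an additive logistic model translate into an expected phenotype prevalence, consider the following Python sketch. It assumes a model without SNP–SNP interaction terms. The baseline odds (the odds of the phenotype for an individual carrying no effect alleles) correspond to the model intercept, which is not reported in the text, so the value used in the example is purely hypothetical.

```python
def phenotype_probability(baseline_odds, odds_ratios, allele_counts):
    """Phenotype probability under an additive logistic model.

    baseline_odds: odds of the phenotype with zero effect alleles
                   (exp of the model intercept).
    odds_ratios:   per-allele odds ratio of each SNP.
    allele_counts: copies of the effect allele (0, 1 or 2) at each SNP.
    Each additional allele copy multiplies the odds by the SNP's OR.
    """
    odds = baseline_odds
    for odds_ratio, count in zip(odds_ratios, allele_counts):
        odds *= odds_ratio ** count
    return odds / (1 + odds)


# Hypothetical baseline odds of 0.05; ORs of 40.0 and 4.9 as in the text.
for counts in [(0, 0), (1, 0), (2, 0), (2, 1)]:
    print(counts, round(phenotype_probability(0.05, [40.0, 4.9], counts), 3))
```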

Table 2 Model-based genotype–phenotype association for four pigmentation traits for stage 1

Hair colour – red tint

Hair colour was defined by two distinct traits, namely a light-dark component and the presence or absence of red tint in the scalp or body hair. Only two SNPs in the MC1R gene were found to be significantly associated with the red tint trait in a multiple regression analysis (Supplementary Table S5a; for the corresponding genotype data, see Supplementary Table S6). Of these, rs1805007 showed the stronger effect (Table 2; P<1.0 × 10⁻⁵, padj<1.2 × 10⁻⁴, OR=5.4, 95% CI=2.8–10.3) whereas only a moderate association was noted for rs1805008 (P=1.0 × 10⁻⁴, padj=0.0012, OR=3.5, 95% CI=1.9–6.6). As with eye colour, the observed and expected genotype-specific prevalences of red tint were found to agree well (Supplementary Table S5b). Similar results were obtained when red hair colour (26 individuals) was analysed instead of red tint (102 individuals).

Hair colour – light-dark component

The light-dark component of hair colour was initially categorised as blond, brown or black, with blond being the predominant type. In addition, we defined nine evenly graded hair colour types of different shading, ranging from light blond (type I) to black (type IX) (Figure 1b). The most consistent association with hair colour was noted for rs12913832 (HERC2), which showed a statistically significant effect in all three analyses (dichotomous blond versus non-blond, ordinal and linear regression of the light-dark component; Supplementary Table S8a). SNP rs12203592 (IRF4) also showed a highly significant association with the light-dark component of hair colour, but was less strongly associated with the blond versus non-blond trait. The two SNPs were also the only ones included in the final model of the genotype–phenotype relationship (Table 2; rs12913832: P<1.0 × 10⁻⁵, padj<1.2 × 10⁻⁴, OR=2.9, 95% CI=1.9–4.4; rs12203592: P<1.0 × 10⁻⁵, padj<1.2 × 10⁻⁴, OR=3.6, 95% CI=2.0–6.3). For further details on the genotype–phenotype relationship of the light-dark component, see Supplementary Tables S7 and S8.

Skin colour

Skin colour was categorised into four types using the Fitzpatrick scale.31 Our genetic association analysis was performed twice, once discriminating between skin types I–II and III–IV, and once by ordinal regression of the four skin types. SNPs rs1805007, rs1805008 (both MC1R) and rs4778138 (OCA2) were selected for the final model of the genotype–phenotype relationship and showed a consistent albeit moderate association with skin type (Table 2; rs1805008: P=6.0 × 10⁻⁵, padj=7.2 × 10⁻⁴, OR=3.0, 95% CI=1.8–5.1; rs1805007: P=0.0035, padj=0.042, OR=2.5, 95% CI=1.4–4.3; rs4778138: P=4.9 × 10⁻⁴, padj=0.0059, OR=3.2, 95% CI=1.6–6.2). For further details on the genotype–phenotype relationship of skin type, see Supplementary Tables S9 and S10.

Association between phenotypes

As was to be expected, the four pigmentation traits were not statistically independent. Thus, the phenotypes blue eye colour and blond hair colour and the phenotypes red tint and fair skin were strongly associated with one another even when the respective genotypes from the final genotype–phenotype models (Table 2) were taken into account (Table 3). Interestingly, red tint and blond hair colour were not significantly associated with one another in our data.

Table 3 Associations between dichotomous pigmentation traits for stage 1

Joint multidimensional scaling analysis of the four traits in both stages combined resulted in four distinct clusters (Figure 2). These clusters were determined completely by blue eye colour and red tint which therefore seem to be the most differentiating pigmentation traits. The same clusters were also found in stages 1 and 2 individually (data not shown). For information on the exact phenotypic composition of the four clusters, see Supplementary Table S11.

Figure 2

Multidimensional scaling analysis of four pigmentation traits. Squares: blue eye colour, red tint; circles: blue eye colour, no red tint; triangles: no blue eye colour, red tint; crosses: no blue eye colour, no red tint. All individuals of stages 1 and 2 were included in the analysis except four individuals with pure red hair. Phenotypes were coded as follows: eye colour, ordinal (blue, green, brown); hair colour - red tint, dichotomous; hair colour - light-dark component, ordinal (I to IX); skin colour, ordinal (I to IV).

Predictive capability of SNP-based models

We next assessed the predictive capability of the different SNP- and phenotype-based models derived in our study (Table 4a). In the process, we used the most prevalent phenotype as the reference category for eye and hair colour (light-dark component), that is, blue eyes and blond hair, to ensure sufficient sample size. SNP-based prediction was found to perform best for eye colour, with a sensitivity of 93%, a reasonable specificity of 59%, a predictive accuracy of 84% and an AUC of 77%. Additional inclusion of blond hair colour as a predictor of eye colour increased the predictive capability only marginally. Prediction of red tint also yielded comparatively high accuracy (74–77%) owing to the high specificity of the SNP genotypes (97–99%) and the low prevalence of the trait (31%), but had low sensitivity (19–32%). The AUC was approximately 75%. Blond hair was predicted moderately well by SNP genotypes (83% sensitivity, 67% specificity, 76% accuracy, 76% AUC). Interestingly, when phenotypes were included in the logistic regression model, backward selection excluded SNP rs12913832 (HERC2) to the benefit of blue eye colour. However, the ensuing model gave poorer predictive power (67% accuracy, 71% AUC) owing to a reduced sensitivity (67%). Finally, fair skin (types I and II) could be predicted with very high specificity (100%) using three SNPs, but sensitivity was low (9%), thereby resulting in a predictive accuracy of only 51% and an AUC of 64%. Only a slight improvement was achieved by the inclusion of other pigmentation traits as predictors.
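
The dependence of predictive accuracy on sensitivity, specificity and prevalence can be made explicit: accuracy equals prevalence × sensitivity + (1 − prevalence) × specificity. The Python sketch below reproduces the range of accuracies reported for red tint. The pairing of the lower sensitivity with the higher specificity (and vice versa) is an assumption, since only ranges are given in the text.

```python
def predictive_accuracy(sensitivity, specificity, prevalence):
    """Proportion of correct predictions, given the trait prevalence."""
    return prevalence * sensitivity + (1 - prevalence) * specificity


# Red tint: prevalence 31%; sensitivity/specificity pairing is assumed.
print(round(100 * predictive_accuracy(0.19, 0.99, 0.31)))
print(round(100 * predictive_accuracy(0.32, 0.97, 0.31)))
```

This makes clear why a model with low sensitivity can still reach a comparatively high accuracy when the trait is uncommon and the specificity is high.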

Table 4a Predictive capability of several prediction models for pigmentation phenotypes (stage 2 only). Capability of selected models to predict dichotomous pigmentation phenotypes

Comparison with previously proposed marker sets

We compared our eye colour model comprising only rs12913832 (HERC2) and rs1800407 (OCA2) (Table 2) to the six SNPs of the so-called ‘IrisPlex’, proposed by Walsh et al.30 Because of the high prevalence of blue eye colour and the low prevalence of green and brown eye colour in our study population, we focused upon the discrimination between blue and non-blue eye colour. For comparison, we also considered a model based upon major SNP rs12913832 (HERC2) alone. All three models yielded comparable predictive accuracy but the AUC of the IrisPlex model was found to be considerably larger (89% versus 77%, Table 4b). For hair colour, we compared our models (Table 2) to the 13 single or compound markers proposed by Branicki et al.29 The latter performed worse than our model for red tint. For blond hair, its predictive capacity was rather low, with an accuracy of 56% and an AUC of 57%, and was even outperformed by a model including only the major SNP (Table 4b). Interestingly, the predictive accuracy for blond hair and red tint was similar for the selected models of this study and a model incorporating only the respective major SNP, ie, either rs12913832 (HERC2) or rs1805007 (MC1R).

Table 4b Predictive capability of several prediction models for pigmentation phenotypes (stage 2 only). Comparative analysis of prediction models for pigmentation phenotypes (present study, Walsh et al,30 Branicki et al29)

Analysis of the whole data set

We also analysed our whole data set (ie, stages 1 and 2 combined) and the results were consistently found to be similar to those of the two-tiered analysis (Supplementary Tables S12–S17). Owing to the larger sample size, additional significant results occasionally emerged. Thus, two additional markers (rs1805008 in MC1R and rs12896399 in SLC24A4) were significantly associated with the light-dark component of hair colour, and one additional marker (rs7495174 in OCA2) emerged for skin colour. For red tint, the same two markers in MC1R as before were found to be significant. For eye colour, marker rs1800407 in OCA2 was no longer retained in the final model, which therefore consisted only of the main marker rs12913832 in HERC2.

When the associations between phenotypes were investigated, a new significant relationship between blond hair colour and fair skin emerged. Still, no significant association between red tint and blond hair colour was found.

For blue eye colour, the predictive capability of the main marker alone as estimated by cross-validation in the whole data set was higher than the capability estimated from stage 2 for a model of one or two markers selected from stage 1. Similar results were obtained for skin colour. For red tint and blond hair colour, by contrast, the stage 2-based estimates for models derived in stage 1 were found to be the higher ones.

Discussion

In our study population, only six of 12 candidate SNPs investigated in stage 1 were significantly associated with a pigmentation phenotype of eye, hair or skin. These SNPs were located in four genes, namely HERC2, MC1R, IRF4 and OCA2. The remaining SNPs showed no consistent association in the different analyses performed. Due to the limited sample size (300 individuals in stage 1), however, we cannot exclude the possibility that weak effects were overlooked. This possibility is also highlighted by the fact that an analysis of the whole data set yielded more significantly associated markers than the two-tiered approach. Larger sample sizes would most likely have revealed more genes to be associated with the (polygenic) pigmentation traits of interest.

The marker set previously proposed by Walsh et al,30 including six SNPs, achieved the best predictive capability for blue eye colour. This implies that it would also be well suited for use in the comparatively homogeneous Northern German population under study here. Notably, for the red tint component of hair colour, the model including only major SNP rs1805007 in the MC1R gene performed slightly better in our study than the model suggested by Branicki et al,29 which includes 13 single or compound markers. However, it must be taken into account that Branicki et al29 considered only red hair, and not red tint, as was done in our study. When the same 13 markers were used to predict blond hair colour, their predictive capability turned out to be even poorer. We observed that the Branicki model was much inferior to major SNP rs12913832 (HERC2) alone and performed only slightly better than mere chance prediction in the population under study. Thus, the comprehensive prediction model previously suggested for hair colour did not achieve convincing predictive results in our population, whilst a model comprising fewer (or only a single) marker(s) performed equally well or even better.

The main goal of our analysis was to develop a prediction model with a view to its practical application in a realistic target population, and to compare its predictive capability to that of previously proposed marker sets. We are fully aware that some of our results may only apply to Northern Germany. Even though population genetic variation is known to be small in Europe,43, 44, 45 more markers or different markers may be required for accurate phenotype prediction in other or less homogeneous populations. Moreover, the power of forensic DNA phenotyping depends upon the prevalence of the pigmentation phenotype(s) in question, and these frequencies differ considerably between European countries (as is clearly demonstrated in the case of red hair). For practical use, the negative and positive predictive values of a given model are more important than its sensitivity and specificity, and these predictive values are a function of the prevalence. Nevertheless, at least for blue eye colour, our evaluation of the markers proposed by Walsh et al30 gave results similar to those of a Europe-wide evaluation, by the same group,46 of a similar prediction model proposed by Liu et al.47 However, since Northern Germany constitutes only a very small segment of the European gene pool, the general validity of our conclusion needs to be clarified in future studies. For worldwide samples, different models may be required and the addition of ancestral markers may be worthwhile in these instances, owing to the preponderance of brown eye colour and dark skin or hair colour in some regions. Such markers have been applied successfully before. In the respective studies,28, 48 SNP rs12913832 (HERC2) again had the largest impact on eye colour prediction.
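
The dependence of the predictive values on prevalence follows from Bayes' theorem and can be sketched in Python as follows. The function name is an assumption; the sensitivity and specificity in the example are those reported above for the SNP-based eye colour model, whereas the prevalence values are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive value for a given prevalence.

    Assumes that at least some positive and some negative predictions
    occur, so that neither denominator is zero.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same test characteristics, three hypothetical prevalences.
for prevalence in (0.1, 0.5, 0.7):
    ppv, npv = predictive_values(0.93, 0.59, prevalence)
    print(prevalence, round(ppv, 2), round(npv, 2))
```

With fixed sensitivity and specificity, the positive predictive value rises and the negative predictive value falls as the phenotype becomes more common in the target population.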

One reason for the poor performance of the blond hair markers proposed by Branicki et al29 may be that the original study used a rather small sample of 385 individuals to estimate a large number of parameters (13 influential variables and four response hair categories) in a multinomial model, which rendered the analysis prone to over-fitting. The use of non-validated SNPs with minor or no effect, and the possibility of population-specific genotype–phenotype relationships, imply that the development of ever more refined models with large numbers of SNPs may result in prediction tools that are no longer robust. Consistent with this, weak-to-moderate effects could often not be replicated.11, 18, 49 Recently, the model originally proposed by Branicki et al29 has been refined on the basis of a large European data set. The resulting so-called ‘HIrisPlex’50 comprises all 13 previously proposed markers plus 8 additional markers also considered by Branicki et al.29 Because of its recency, we have not been able to determine whether the HIrisPlex produces better results than the smaller model in the Northern German population.

For simplicity and comparability, phenotype prediction employed a probability threshold of 0.5 in our analyses. Other thresholds might produce better results for some of the models. Moreover, individuals with a phenotype probability around 0.5 are difficult to classify correctly in any case. One way to overcome this problem would be to apply a high and a low threshold and to treat individuals falling between these thresholds as ‘undetermined’.29, 46 This could increase the specificity and sensitivity of the prediction model, but would result in a large proportion of cases in which no prediction is possible. A second option would be to discard thresholds altogether, with the scientist or forensic expert merely communicating the actual phenotype probability.
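The trade-off between accuracy and the proportion of undetermined cases can be made concrete with a small Python sketch. The thresholds 0.3 and 0.7, the function names and the toy probabilities below are assumptions made purely for illustration; they are neither our data nor the thresholds used in the cited studies.

```python
def classify(probability, low, high):
    """Three-way call: 'positive', 'negative' or 'undetermined'.

    Setting low == high reproduces a single-threshold rule.
    """
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= low <= high <= 1")
    if probability >= high:
        return "positive"
    if probability <= low:
        return "negative"
    return "undetermined"


def summarise(probabilities, has_phenotype, low, high):
    """Return (sensitivity, specificity, undetermined fraction).

    Sensitivity and specificity are computed among called individuals
    only; each is None if no individual of that true class was called.
    """
    calls = [classify(p, low, high) for p in probabilities]
    pairs = list(zip(calls, has_phenotype))
    tp = sum(1 for c, t in pairs if c == "positive" and t)
    fn = sum(1 for c, t in pairs if c == "negative" and t)
    tn = sum(1 for c, t in pairs if c == "negative" and not t)
    fp = sum(1 for c, t in pairs if c == "positive" and not t)
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    undetermined = calls.count("undetermined") / len(calls)
    return sensitivity, specificity, undetermined


if __name__ == "__main__":
    # Toy predicted probabilities and true phenotype status (hypothetical).
    probs = [0.90, 0.60, 0.40, 0.20, 0.55, 0.10]
    truth = [True, True, True, False, False, False]
    print("single threshold 0.5:", summarise(probs, truth, 0.5, 0.5))
    print("thresholds 0.3/0.7:  ", summarise(probs, truth, 0.3, 0.7))
```

In this toy example, widening the undetermined zone removes the misclassified borderline individuals from the calls, at the cost of leaving half of the sample without a prediction.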

Several methods are available for statistical model selection with the aim of prediction. We used regression models with backward selection based upon a likelihood ratio test of the association between influential variables (ie, candidate SNPs) and an outcome of interest. Another popular method is maximisation of the AUC, as employed in the studies by Walsh et al30 and Branicki et al.29 Here, the individuals of the two outcome groups are ordered by their predicted probability of belonging to group 1, and the AUC measures how well the two groups can be separated, that is, how consistently individuals of group 1 receive higher probabilities than individuals of group 2 (as would be desirable). Maximisation of the AUC is perhaps the most widely used approach, but its applicability has come under critical debate lately.51, 52 In addition, for small marker sets, estimation of the AUC is imprecise because the corresponding receiver operating characteristic (ROC) curve has only a few supporting points. For our study, which included 12 markers in the full model but only two to three markers in the final models, we consequently chose a regression approach. Note that the marker sets of Walsh et al30 and Branicki et al29 included a higher number of markers, so that estimation of the AUC is easier in those studies.
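The pairwise interpretation of the AUC, and the limited number of supporting points for small marker sets, can be sketched in Python as follows. The function names and the example probabilities are hypothetical. Counting tied pairs as one half is a common convention and an assumption of this sketch, as is the simplification that each SNP has three genotypes and that the predicted probability depends only on the genotype combination.

```python
def auc_by_pairs(probs_group1, probs_group2):
    """AUC as the proportion of (group 1, group 2) pairs in which the
    group 1 individual has the higher predicted probability.

    Tied pairs count as one half (an assumed convention).
    """
    if not probs_group1 or not probs_group2:
        raise ValueError("both groups must be non-empty")
    concordant = 0.0
    for p1 in probs_group1:
        for p2 in probs_group2:
            if p1 > p2:
                concordant += 1.0
            elif p1 == p2:
                concordant += 0.5
    return concordant / (len(probs_group1) * len(probs_group2))


def max_roc_points(n_snps, genotypes_per_snp=3):
    """Upper bound on the number of points of an empirical ROC curve.

    A model can produce at most one distinct predicted probability per
    genotype combination; m distinct probabilities yield m + 1 points,
    including the two end points of the curve.
    """
    return genotypes_per_snp ** n_snps + 1


if __name__ == "__main__":
    # Hypothetical predicted probabilities of belonging to group 1.
    group1 = [0.9, 0.8, 0.4]
    group2 = [0.7, 0.4, 0.1]
    print("AUC:", auc_by_pairs(group1, group2))
    for k in (2, 3, 6):
        print(f"{k} SNPs: at most {max_roc_points(k)} ROC points")
```

Under these assumptions, a model with two or three SNPs yields a ROC curve with only a handful of steps, so that the estimated area is coarse, whereas larger marker sets allow a much finer curve.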

We noted that a more refined definition of the pigmentation phenotypes yielded more significant genetic associations. This was most pronounced for hair colour, which was analysed as a dichotomous (blond versus non-blond), ordinal and quantitative trait (types I–IX). Thus, fine phenotyping may facilitate the detection of more moderate genotype–phenotype associations in the future. Fine phenotyping has previously been applied to eye colour with favourable results by Liu et al.24 In the same vein, Candille et al11 successfully measured pigmentation of eye, skin and hair colour on a quantitative scale.

Our study also highlights that pigmentation, even of one and the same part of the body, is genetically complex. This is best illustrated by hair colour, which was decomposed into two different sub-phenotypes that lacked significant association with one another in our data. Furthermore, whereas the light-dark component was associated with SNP rs12913832 in the HERC2 gene (the SNP that had a very strong effect on eye colour), red tint was associated with SNP rs1805007 in the MC1R gene. Variation in MC1R, in turn, was also shown to be associated with skin colour.

Of all pigmentation traits investigated in our study, eye colour turned out to be the most predictable from SNP genotypes. Even for eye colour, however, these predictions were far from fully reliable. Moreover, as was illustrated by the comparison between our models and those of Walsh et al30 and Branicki et al,29 the incorporation of additional SNPs is likely to achieve only small improvements, if any, in predictive power in a specific population. For skin colour, no reliable gene-based prediction model could be developed at all. These findings, together with the supposedly highly polygenic nature of the pigmentation traits, suggest that it may simply not be possible in a specific population to predict some of the pigmentation phenotypes with sufficient certainty from a handful of SNP genotypes. Instead, reliable prediction may require the use of a large number of SNPs, for example, as provided on available DNA microarrays. Indeed, similar approaches have been used successfully to facilitate the inference of population affiliation in Europe.43, 44, 53 Finally, our study also revealed that the prediction of one pigmentation phenotype may benefit from using information on other pigmentation phenotypes, if and when such information is available. In the future, our method might also be combined with other SNP-based assays, eg, for the determination of human origin.