Introduction

Alopecia areata (AA) is a common hair loss disorder, which affects approximately 1–2% of the general population. It affects both sexes and all age groups.1 Affected individuals present with a non-scarring, circumscribed hair loss, which has a sudden onset and a recurrent course. The scalp is the most commonly affected site, although all hair-bearing areas of the skin may be involved. Episodes of hair loss typically start with isolated hairless patches. These extend centrifugally and may coalesce. AA is divided into three main clinical types on the basis of the degree of hair loss and the sites affected: (i) patchy AA; (ii) AA totalis, affecting the whole scalp; and (iii) AA universalis, affecting the whole body. Individuals with AA may experience complete remission, a chronic course, or progression towards AA totalis or AA universalis. Familial AA also occurs, and recurrence risks of 5–6% have been reported for the children of affected individuals.2, 3 The pattern of familiality and a limited twin study, which reports a concordance rate in monozygotic twins of 55%,3 suggest that multiple genetic factors and environmental factors are involved in the pathology of AA. Nevertheless, the etiopathogenesis of AA remains poorly understood. One hypothesis is that AA is a tissue-specific autoimmune disease of the hair follicle.4 This is supported by reports of association between AA and specific HLA alleles.5, 6, 7, 8 Association has also been reported between AA and the W620 variant of the PTPN22 gene (protein tyrosine phosphatase, nonreceptor type 22), which has been implicated in several autoimmune disorders.9, 10 To date, only one systematic genome-wide linkage study of AA has been performed in humans. This identified four susceptibility loci on chromosomes 6, 10, 16, and 18, respectively.11

To identify new genetic variants and thus further elucidate the genetic basis of AA, we performed a genome-wide association (GWA) study in 729 AA cases and 656 controls, using the strategy of pooled DNA genotyping. The top 61 single nucleotide polymorphisms (SNPs) were selected for individual genotyping in the initial sample of the pooling approach for the purpose of confirmation. We further analyzed these SNPs in an independent replication sample. The best association finding was then analyzed in a second independent replication sample. The strongest association was found for variants in the HLA region. These results thus confirm the validity of the pooling-based approach to DNA genotyping. Borderline significance was found for rs304650 in the SPATA5 (spermatogenesis-associated protein 5) gene, which may therefore represent a new susceptibility gene locus for AA.

During the final stages of the present study, a GWA study of an AA sample from the US was published by Petukhova et al.12 This identified eight gene loci for AA, most of which are implicated in autoimmunity. Although SPATA5 was not among these loci, six SNPs in SPATA5, all of which show evidence for linkage disequilibrium with rs304650, were reported with a P-value <10⁻⁴.

Materials and methods

DNA samples

The analyses involved 1720 individuals with AA (cases) and 2677 controls. For cases, the inclusion criterion was a dermatologist-assigned diagnosis of AA. Patients with Down's syndrome or Turner's syndrome were excluded. The cases were recruited from: (i) outpatient clinics in Belgium (University Hospitals of Antwerp and Gent) and Germany (University Hospitals of Munich, Münster, Düsseldorf, Berlin, Bonn, Gießen, Hamburg and Mannheim); (ii) a private dermatology practice (Wesseling, Germany); and (iii) AA self-help support groups (Germany, The Netherlands). All cases were asked whether or not they had a positive family history of AA. This was defined as a history of at least one first- or second-degree relative with any form of the disorder. The controls were drawn from the general population, and were therefore not specifically screened for the absence of AA. They were healthy unrelated blood donors from the University Hospital of Bonn (n=1313) or participants in one of three population-based epidemiological studies: (1) PopGen13 (n=490); (2) KORA14 (n=490); and (3) HNR15 (n=384). All cases and controls were of central European origin. Ethical approval was obtained from the respective ethics committees, and all participants provided written informed consent before blood sampling. The DNA of patients and blood donors was extracted from peripheral blood leukocytes. This was achieved by salting out with saturated NaCl solution according to standard methods, or through the use of a Chemagic Magnetic Separation Module I (Chemagen, Baesweiler, Germany) in accordance with the manufacturer's instructions. The DNA was stored in liquid nitrogen until use. For the controls from the three population-based epidemiological studies, the genotypes were taken from previously generated GWA data sets. The PopGen and KORA data sets were generated within the German National Genome Research Network to serve as a national research resource. 
The HNR data set was generated as part of a collaboration to generate a set of universal controls for genetic studies.

DNA pooling

The DNA samples of 729 AA cases and 656 blood donor controls were selected at random. These samples were then used to generate a control pool and two AA case pools. The first AA case pool consisted of the DNA of all 729 AA cases (including the 224 AA cases with a positive family history). The second AA case pool consisted of the DNA of ‘positive family history’ AA patients only (n=224). To ensure equal amounts of DNA in each pool, each individual DNA sample was subjected to a series of dilution steps (above 100 ng/μl to 20 ng/μl, and then to a final concentration of 10 ng/μl). Each DNA sample was double-quantified using a NanoDrop device (PEQLAB Biotechnologie GmbH, Erlangen, Germany) and adjusted to ±10% of the required concentration. For the 10 ng/μl dilution, a broader range of −60 to +80% was used to reflect the measurement error of the NanoDrop device at low DNA concentrations. Equimolar amounts of each DNA sample (100 ng) were pipetted manually into a large tube to form the pool. Each pool was then concentrated to 60 ng/μl using the Microcon YM-100 Centrifugal Filter Device (Millipore GmbH, Schwalbach, Germany), and then requantified with the NanoDrop device.

Genotyping of the pooled DNA and data analysis

The Illumina Sentrix HumanHap550v3 genotyping BeadChips were used. These contain more than 550 000 tag SNPs (Illumina Inc., San Diego, CA, USA). To avoid inter-experimental variation, each pool was genotyped on five chips in accordance with the manufacturer's recommendations for individual samples. SNP allele frequencies were estimated using data from the BeadArray Reader imaging (Illumina Inc.) and Illumina's genotyping software Bead Studio 2.0 (Illumina Inc.). Three analyses were performed on the basis of the allele frequency estimates obtained from the genotyping experiment: (i) 729 cases versus 656 controls; (ii) 224 cases with a positive family history versus 656 controls; and (iii) a sliding window analysis using the data set of 729 cases versus 656 controls, in accordance with recommendations described elsewhere.16

For each replicate, the following approximation of allele frequencies in cases and controls was performed on the basis of the raw data: f_i=X_raw/(X_raw+Y_raw), where X_raw and Y_raw are the intensities of the two dyes (Cy5 and Cy3) used to genotype SNPs on the Illumina platform, and i is the replicate number. The obtained allele frequency was then averaged over the number of replicates of each pool, f=(f_1+…+f_M)/M (where M is the number of replicates), and corrected for unequal amplification using the formula f_corrected=f/(f+k·(1−f)).17 For each SNP, the correction factor k was estimated using the HapMap CEPH data (http://hapmap.ncbi.nlm.nih.gov/) and the formula k=(f_con−f_con·f_CEPH)/(f_CEPH−f_con·f_CEPH), where f_con is the allele frequency in controls obtained from the pooling experiment, and f_CEPH is the allele frequency calculated from the HapMap CEPH data.

The correction factor k was used as the quality control measure, as it provides an indication of how close the estimated allele frequencies in controls are to those of the CEPH data. SNPs were excluded from further analysis when k was >4 or <0.25.
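The frequency estimation and k-based filtering described above can be sketched as follows. This is a minimal illustration and not the analysis code used in the study; the function names and the example intensities are hypothetical, and the thresholds (0.25 and 4) are taken from the text.

```python
from statistics import mean


def raw_frequency(x_raw, y_raw):
    """Allele frequency estimate for one replicate from the two dye intensities."""
    return x_raw / (x_raw + y_raw)


def pooled_frequency(replicates):
    """Average the per-replicate estimates; `replicates` is a list of (x_raw, y_raw)."""
    return mean(raw_frequency(x, y) for x, y in replicates)


def correction_factor(f_con, f_ceph):
    """k = (f_con - f_con*f_CEPH) / (f_CEPH - f_con*f_CEPH).

    Undefined (division by zero) when f_CEPH is 0 or f_con is 1.
    """
    return (f_con - f_con * f_ceph) / (f_ceph - f_con * f_ceph)


def corrected_frequency(f, k):
    """Correct a pooled estimate for unequal amplification of the two alleles."""
    return f / (f + k * (1 - f))


def passes_k_filter(k, lower=0.25, upper=4.0):
    """SNPs are excluded when k > 4 or k < 0.25."""
    return lower <= k <= upper


# Hypothetical example: control pool estimate 0.6, HapMap CEPH frequency 0.5.
k = correction_factor(0.6, 0.5)
f_corr = corrected_frequency(0.6, k)
```

By construction, applying the correction to the control pool estimate returns the CEPH frequency, so k measures how far the pooled control estimate deviates from the reference.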

The coefficient of variation was also used as a quality control measure for each SNP in each pool. This was calculated using the formula sqrt(vare)/f, where vare is the experimental variance estimated from the replicate data. This indicates how consistent the allele frequency estimates for a SNP are across replicates, while taking into account allele frequencies and the number of replicates per pool. SNPs were excluded from further analysis when the coefficient of variation was >1 (upper boundary of the 95% confidence interval) in at least one of the three pools (ie, cases, controls, familial cases).
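A minimal sketch of this filter is given below. It assumes that the experimental variance is the sample variance of the per-replicate frequency estimates; the study may have used a different estimator (for example, the variance of the mean), and the pool names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def coefficient_of_variation(replicate_freqs):
    """sqrt(var_e) / f for one SNP in one pool (needs at least two replicates)."""
    f = mean(replicate_freqs)
    var_e = variance(replicate_freqs)  # assumption: sample variance across replicates
    return sqrt(var_e) / f


def passes_cv_filter(pools, threshold=1.0):
    """Keep a SNP only if the coefficient of variation is <= threshold in every pool."""
    return all(coefficient_of_variation(freqs) <= threshold for freqs in pools.values())


# Hypothetical per-replicate estimates for one SNP in the three pools.
snp_pools = {
    "cases": [0.31, 0.30, 0.32, 0.29, 0.31],
    "familial_cases": [0.33, 0.30, 0.34, 0.31],
    "controls": [0.27, 0.28, 0.26],
}
keep_snp = passes_cv_filter(snp_pools)
```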

For the remaining 487 932 SNPs, association analyses were performed to compare 729 cases and 224 familial cases with 656 controls. Modified Z-statistics were used, as described by Abraham et al.18 These take into account both the experimental variance (see above) and the sampling variance.
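The idea of combining the two variance components can be illustrated with a generic pooling Z-statistic. The sketch below is not the exact statistic of Abraham et al; it assumes a binomial sampling variance of f(1−f)/(2N) per pool and simply adds the experimental variances of both pools to the denominator. All input values are hypothetical.

```python
from math import erfc, sqrt


def sampling_variance(f, n_individuals):
    """Binomial sampling variance of an allele frequency from 2N chromosomes."""
    return f * (1 - f) / (2 * n_individuals)


def pooled_z(f_case, f_con, n_case, n_con, var_e_case, var_e_con):
    """Difference in pooled allele frequencies scaled by sampling plus experimental variance."""
    var_total = (
        sampling_variance(f_case, n_case)
        + sampling_variance(f_con, n_con)
        + var_e_case
        + var_e_con
    )
    return (f_case - f_con) / sqrt(var_total)


def two_sided_p(z):
    """Two-sided P-value under a standard normal null distribution."""
    return erfc(abs(z) / sqrt(2))


# Hypothetical frequencies for pools of 729 cases and 656 controls.
z = pooled_z(0.30, 0.25, 729, 656, 1e-4, 1e-4)
p = two_sided_p(z)
```

Including the experimental variance makes the test more conservative than a naive comparison that treats pooled estimates as if they were exact allele counts.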

A signal processing analysis technique was used to interpret the results of the case–control association analysis. This enables the detection of genetic association with disease, while taking into account the significance of several subsequent genetic markers in a sliding window. Moskvina et al17 have described a theoretical approach to calculating the probability of at least one false alarm being flagged by the detection statistic under the null hypothesis of no signal (association).14 This is equivalent to the probability of type I error when taking the number of comparisons in the window into account, and thus provides a ‘genome-wide window-based’ significance level.19

SNP selection, confirmation of pooling results and independent replication

A number of SNPs were selected to confirm the pooling results on an individual genotyping level. A replication step was performed in two independent case–control samples (for an overview, see Figure 1). The following SNPs were selected from the three pooling-based analyses: (i) the top 50 SNPs from the analysis ‘729 cases versus 656 controls’; (ii) the top 20 SNPs from the analysis ‘224 cases with a positive family history versus 656 controls’; and (iii) the top 35 SNPs from the ‘sliding window analysis’. All selected SNPs had shown P-values of <1 × 10⁻⁴ in the respective pooling analysis. As SNPs from the previously known HLA region were highly overrepresented, all but three of these SNPs were excluded. The three remaining HLA SNPs, rs3115553 (chr. 6: 32 353 805 bp), rs9275141 (chr. 6: 32 759 095 bp) and rs9275572 (chr. 6: 32 786 977 bp), were selected, as they were the best findings in the HLA region in the ‘sliding window analysis’. These were used as positive controls in subsequent analyses. Regrettably, rs9275141 failed during the assay design of the Sequenom iPlex reaction (Sequenom GmbH, Hamburg, Germany). It was therefore replaced by the next best SNP in the HLA region from the ‘sliding window analysis’, rs9268528 (chr. 6: 32 491 086 bp). By excluding other SNPs from the HLA region and SNPs appearing in more than one of the analyses (doubles and triples), it was possible to reduce the SNP set for the confirmation and replication step (step 2) from 105 to 61 SNPs. Individual genotyping was performed on the Illumina platform. For the DNA samples from the HNR, KORA and PopGen controls (n=1364), HumanHap550v3 genotyping BeadChips (Illumina Inc.) were used. For all other DNA samples, Sequenom's Compact MALDI-ToF Mass Array system and iPLEX Gold reagents (Sequenom GmbH) in multiplex reactions were used. Primer sequences and Sequenom's standard assay conditions are available upon request. All primers were checked by MALDI-ToF. 
For quality reasons, a success rate of at least 95% was required for all analyzed SNPs. A 95% call rate was required for each of the samples used in the confirmation and replication steps. In the confirmation step (step 2), DNA samples from eight AA cases and 11 controls failed to fulfill the quality criteria and were excluded from the analysis, and thus 721 AA cases and 645 controls remained for the analyses. In the independent replication step (step 2), four DNA case samples failed to fulfill the quality criteria, and thus 450 cases and 1364 controls remained for the analyses. Of the 61 selected SNPs, six SNPs (rs12493901, rs30117, rs41515, rs4777450, rs7246435 and rs9520256) were technical failures, and five SNPs (rs10123149, rs11098149, rs6700586, rs7099812 and rs724841) had to be excluded, as they did not reach the required call rate of ≥95% in one or more of the confirmation and/or replication samples. The SNP rs7334982 was not biallelic in the present samples and was therefore excluded from further analysis. Thus, a set of 49 SNPs remained for further analysis.
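The 95% call-rate criteria for SNPs and samples can be sketched as follows. This is an illustrative outline only; the data layout (a dictionary mapping each SNP to a list of per-sample genotype calls, with None for a failed call) and the SNP names are assumptions made for the example.

```python
def call_rate(calls):
    """Fraction of non-missing genotype calls (None marks a failed call)."""
    return sum(c is not None for c in calls) / len(calls)


def filter_snps(table, min_rate=0.95):
    """Keep SNPs whose call rate across samples is at least min_rate."""
    return {snp: calls for snp, calls in table.items() if call_rate(calls) >= min_rate}


def filter_samples(table, min_rate=0.95):
    """Keep samples whose call rate across SNPs is at least min_rate.

    Returns the reduced table and the indices of the retained samples.
    """
    n_samples = len(next(iter(table.values())))
    keep = [
        i for i in range(n_samples)
        if call_rate([calls[i] for calls in table.values()]) >= min_rate
    ]
    return {snp: [calls[i] for i in keep] for snp, calls in table.items()}, keep


# Hypothetical table: 20 samples, two SNPs, two failed calls for the second SNP.
genotypes = {
    "snpA": ["AA"] * 20,
    "snpB": ["AG"] * 18 + [None, None],
}
snps_kept = filter_snps(genotypes)
samples_kept_table, kept_indices = filter_samples(genotypes)

# Bookkeeping from the text: 61 selected, 6 technical failures,
# 5 below the call rate, 1 not biallelic.
remaining_snps = 61 - 6 - 5 - 1
```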

Figure 1

Overall workflow. The study was conducted in three steps: (a) pooling-based analyses using three different approaches (I) all cases vs all controls, (II) cases ‘positive family history’ vs all controls and (III) a sliding window analysis; (b) confirmation of selected best pooling-based findings through individual genotyping in previously pooled case- and control samples; (c) independent replication and follow-up analyses in additional samples of cases and controls.

Statistical analyses of individual genotyping data

The FAMHAP software package20 was used for the association and haplotype analyses. The Armitage trend test was used for the single marker analyses.21 All SNPs met the following quality criteria: minor allele frequency >1%, PHWE in cases >0.001 and PHWE in controls >0.05. The resulting P-values were corrected for multiple testing according to the number of SNPs successfully analyzed on the individual genotyping level (n=49).
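For readers unfamiliar with the single marker test, the Cochran–Armitage trend test on a 2×3 genotype table can be sketched as below. This is a textbook formulation with additive weights (0, 1, 2) and a normal approximation; it is not the FAMHAP implementation, and the genotype counts are hypothetical.

```python
from math import erfc, sqrt


def armitage_trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test.

    `cases` and `controls` are genotype counts ordered by number of risk alleles
    (0, 1, 2). Returns (z, two_sided_p) under the normal approximation.
    """
    n = [r + s for r, s in zip(cases, controls)]  # column totals per genotype
    R, S = sum(cases), sum(controls)
    N = R + S
    t = sum(w * (r * S - s * R) for w, r, s in zip(weights, cases, controls))
    var = (R * S / N) * (
        sum(w * w * c * (N - c) for w, c in zip(weights, n))
        - 2 * sum(
            weights[i] * weights[j] * n[i] * n[j]
            for i in range(3)
            for j in range(i + 1, 3)
        )
    )
    z = t / sqrt(var)
    return z, erfc(abs(z) / sqrt(2))


# Hypothetical counts: risk genotypes enriched in cases.
z_trend, p_trend = armitage_trend_test([10, 40, 50], [50, 40, 10])
```

The trend test does not assume Hardy–Weinberg equilibrium, which is one reason it is often preferred over an allele-count chi-square test for single marker analyses.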

Expression analyses

We used the forward primer 5′-CCTTCAAACCGACGCATACT-3′ and the reverse primer 5′-GCAGCCCACTCTTCTCTTGA-3′ to analyze the expression of SPATA5 in human hair follicle and skin samples (expected product size: 197 bp). As SPATA5 expression has been proven in human kidney and lung (http://www.genecards.org/cgi-bin/carddisp.pl?gene=SPATA5&search=SPATA5), these tissues were included as positive controls. Total RNA was extracted from human hair follicles and skin using the RNeasy Micro Kit (Qiagen, Hilden, Germany), and single strand cDNA was synthesized from a total of 400 ng RNA using the Super Script III First Strand Synthesis System (Invitrogen, Karlsruhe, Germany). Single strand cDNA from kidney and lung was obtained from the human Multiple-Tissue cDNA Panel I (LOT Nr. 6060248; Clontech, Takara Bio Europe/Clontech, Saint-Germain-en-Laye, France). A negative reverse transcription reaction (no enzyme) was included as a negative control (Figure 2).

Figure 2

Expression analysis of mRNA of SPATA5. SPATA5 was found to be expressed in human hair follicle, skin, kidney, and lung (samples displayed from left to right). The final lane shows a negative control.

Results

Step 1: Pooling-based approach

The genome-wide pooling-based approach involved three DNA-pools: (1) 729 AA cases (including the 224 cases with a positive family history), (2) 224 AA cases with a positive family history, and (3) 656 controls. Each pool was genotyped on five replicates of the Illumina Sentrix HumanHap550v3 genotyping BeadChip. Pool 1 was successfully analyzed in all five replicates. Pool 2 was successfully analyzed in four replicates. Two of the five control chips were excluded following quality control filtering.

Quality control measures and allele frequencies were estimated for all 504 931 SNPs. The quality control measures used were: (i) the coefficient of variation of each SNP in each pool, which reflected how consistent the allele frequency estimates were across replicates while taking into account allele frequencies and the number of replicates per pool; and (ii) correction factor k as an indicator of the closeness of the allele frequency estimates in the control pool to the allele frequencies in the CEPH sample from HapMap (for details see the Materials and methods section).

Markers were excluded if the coefficient of variation was >1 in at least one of three DNA pools, or if the correction factor k was >4 or <0.25. The remaining 487 932 SNPs were corrected for k, and SNPs with minor allele frequencies of <5% in controls were filtered out. The remaining 468 389 SNPs were tested for association using modified Z-statistics.18 Separate comparisons with controls were made for cases and familial cases. The results of the case–control comparison were further analyzed in a sliding window. The best markers from the top regions identified by the sliding window analysis19 were selected for replication. Thus, three different analyses were performed: (I) 729 AA cases versus 656 controls; (II) 224 AA cases with a positive family history versus 656 controls; and (III) a sliding window analysis (Figure 1). These analyses identified 31 SNPs with P-values <5 × 10⁻⁷ (Supplementary Table 1), resulting in a total of 18 SNPs after the exclusion of duplicates and triplicates. Of these 18 SNPs, 8 SNPs were localized in the HLA region, and 10 SNPs were localized elsewhere in the genome. The best SNP of all three analyses was rs9952976 (chr. 18: 42 561 717 bp), which had a P-value of 6.48 × 10⁻¹⁴ in analyses I and III (Supplementary Table 1). The best SNP in analysis II was rs9275572, which is localized in the HLA region (P=1.87 × 10⁻⁸). This SNP was also the best HLA–SNP in analyses I (P=1.00 × 10⁻¹¹) and III (P=5.67 × 10⁻¹²; Supplementary Table 1). When the best SNPs from each analysis were considered (see selection criteria in Materials and methods section), three SNPs (rs9275141 (chr. 6: 32 759 095 bp); rs9275572 (chr. 6: 32 786 977 bp); and rs9952976 (chr. 18: 42 561 717 bp)) appeared in all three analyses. Two of these SNPs (rs9275141 and rs9275572) are localized in the HLA region (Supplementary Table 1).

Step 2: Individual confirmation and independent replication

The top 50 SNPs from analysis I, the top 20 SNPs from analysis II, and the top 35 SNPs from analysis III, were selected for further analysis (Figure 1). The elimination of duplicates and triplicates resulted in a total of 61 SNPs (see Materials and methods section and Supplementary Table 1). To confirm the association findings of the 61 selected SNPs in the pooling approach, individual genotyping was performed in the previously pooled discovery sample of 729 AA cases and 656 controls. An independent replication step involved 454 AA cases and 1364 controls. Following quality control, 49 SNPs remained for analysis (see Materials and methods section). With the exception of five SNPs, this analysis confirmed the pooling results at a nominal level of significance, and thus demonstrated the validity of the DNA pooling approach. The strongest association was found for the three HLA–SNPs (rs3115553, rs9268528 and rs9275572). The SNP rs9275572 showed the strongest association with P=2.50 × 10⁻¹⁰ (OR=1.65 (1.41–1.94); Table 1). The three HLA–SNPs were the only SNPs to withstand correction for multiple testing using the previously suggested threshold of P=5 × 10⁻⁷.22 The remaining SNPs failed to show strong association. The best association finding outside of the HLA region was for rs2110597 (chr. 12: 12 832 280 bp) with P=1.42 × 10⁻⁵ (OR=1.44 (1.22–1.68)). The best finding from the pooling-based analysis, rs9952976 (chr. 18: 42 561 717 bp), showed only borderline significance on the level of individual genotyping, with P=0.034 (OR=1.20 (1.01–1.43)). This was one of the weakest association findings in this confirmation step (Table 1).
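Odds ratios with an interval in parentheses, as reported above, can be computed from a 2×2 table of allele counts. The sketch below uses the standard Woolf (log-based) interval and assumes that the reported ranges are 95% confidence intervals; the text does not state which method was used, and the counts shown are hypothetical rather than taken from Table 1.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a, b: risk / other allele counts in cases
    c, d: risk / other allele counts in controls
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts (2 x number of genotyped individuals per group).
or_est, ci_low, ci_high = odds_ratio_ci(620, 822, 430, 860)
```

An interval that excludes 1 corresponds to nominal significance at the 5% level, which is why the borderline finding for rs9952976 has a lower bound just above 1.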

Table 1 Association between alopecia areata and selected markers in the case–control confirmation- and case–control replication analyses

In the independent replication step, the strongest association was again found for the three HLA SNPs. The SNP rs9275572 was the most strongly associated SNP (P=7.94 × 10⁻¹¹; OR=1.71 (1.46–2.01)). Twenty-seven SNPs showed the same risk alleles as in the discovery sample. Only one SNP outside of the HLA region (rs304650; chr. 4: 124 303 368 bp) showed significant association (P-value of 0.001; OR=1.31 (1.12–1.53)). Following Bonferroni correction for the number of SNPs tested (n=49), only the three HLA SNPs and rs304650 (P=0.049) remained significant (Table 1).
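The Bonferroni-corrected value for rs304650 follows directly from multiplying the nominal P-value by the number of SNPs tested. A minimal sketch, using the rounded values given in the text:

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-corrected P-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# rs304650: nominal P = 0.001 (rounded in the text), 49 SNPs tested.
p_rs304650 = bonferroni(0.001, 49)

# rs9275572: nominal P = 7.94e-11 in the replication sample.
p_rs9275572 = bonferroni(7.94e-11, 49)
```

With 49 tests, a nominal P-value must be below roughly 0.05/49 ≈ 0.00102 to remain significant at the 5% level, so rs304650 passes the threshold only narrowly.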

Step 3: Follow-up analysis of the top finding

A second independent sample of 537 cases and 657 controls was then used to investigate the association finding for rs304650 further (Figure 1). Genotyping of rs304650 failed in one case and two controls, and thus, 536 cases and 655 controls remained. In this analysis, rs304650 could not be replicated at a significant level (P=0.127; Table 2). However, the risk allele remained the same. A combined analysis was therefore performed using all AA cases and controls from the independent replication and follow-up steps (a total of 985 cases and 2014 controls were successfully genotyped for this SNP; data not shown). Here, rs304650 showed stronger association, with P=3.43 × 10⁻⁴ (OR=1.24 (1.10–1.39)). After combining all cases and controls, we obtained a P-value of 1.58 × 10⁻⁵ (OR=1.23 (1.12–1.35)). Interestingly, SPATA5 expression was observed in hair follicles and skin, which is consistent with a role for this gene in hair biology (Figure 2).

Table 2 Follow-up analysis of rs304650 in an additional independent sample of 536 AA cases and 655 controls

Discussion

The present GWA study of AA is the first to have used pooled DNA. The analysis was performed in several stages to avoid the high costs of performing a GWA study in large individual samples. Genotyping of DNA pools was performed on 15 Illumina HumanHap550 arrays of patients and controls. A limitation of pooling studies in comparison to individual genotyping approaches is that allele frequencies are estimates deriving from DNA pools, which are inherently imprecise. In view of this, and the fact that the generally used quality control measures cannot be applied, the pooling-based approach was used as the discovery step (step 1; Figure 1). In the second step, the top SNPs were confirmed using individual genotyping of the previously pooled case and control samples, and replicated through individual genotyping in an independent sample of cases and controls (step 2; Figure 1). In the third step, the best SNP was followed up in a further independent replication sample (Figure 1).

The major histocompatibility complex on chromosome 6p21.3 was identified as a major risk locus for AA. Previous research by our group and others has implicated various HLA alleles in AA susceptibility. The best replicated findings have been for alleles of the DRB1 and DQB1 loci.5, 6, 8, 23, 24 The present highly significant findings for variants in the HLA region demonstrate that the pooling-based strategy is a valid alternative to individual genotyping in complex disorders.25 Although the pooling-based results for the HLA locus were not followed up systematically (step 1), the best three variants from the sliding window analysis were genotyped in the discovery and independent replication samples to confirm the initial pooling-based results (step 2). As expected, all three variants were confirmed on the individual genotyping level and reached genome-wide significance. This was also the case in the independent replication step. This indicates that the DNA pooling approach can reliably detect SNPs that have shown genome-wide significance in association studies. Furthermore, pooling detects highly significant results, and it is therefore very unlikely that any genes beyond the HLA region are more significant. However, our strategy carries a risk of false-negative findings in the case of smaller genetic effects. DNA pooling adds extra experimental error (eg, pipetting for pool construction) to the allele frequency measurement that directly influences the power to detect small effect sizes.26 Furthermore, only 61 top hits from the GWA study step were pursued in individual samples, with the great majority of nominally significantly associated markers having been excluded from the subsequent analyses. These are the two most likely explanations as to why the present study may have missed previously reported association findings.12 Therefore, the reliable detection of genes with smaller effects requires larger sample sizes and individual genotyping.

The only other SNP to reach experiment-wide significance in the combined analysis was rs304650 in the SPATA5 gene on chromosome 4q27-q28. Joint analysis of the replication samples used in steps 2 and 3, which included a total of 985 cases and 2014 controls who were successfully genotyped for this SNP, revealed a significant association between this variant and AA, with a P-value of 3.43 × 10⁻⁴ (OR=1.24 (1.10–1.39)). A joint analysis of all of the investigated samples revealed a significant association between this variant and AA, with a P-value of 1.58 × 10⁻⁵ (OR=1.23 (1.12–1.35)). The SNP rs304650 maps to an intronic region of the SPATA5 transcript. At the time of writing, the functional aspects of this protein are unknown. However, one study identified SPATA2, another member of the spermatogenesis-associated protein family, as a susceptibility gene for the autoimmune disorder psoriasis.27

Interestingly, although SPATA5 was not among the eight loci with genome-wide significance reported by Petukhova et al,12 the authors reported six SNPs in SPATA5 with a P-value <10⁻⁴ in their Supplementary Material. Although not very strong, there is evidence for LD between rs304650 and the Petukhova et al12 SNPs (ranging between r²=0.29 for rs11735364 and r²=0.46 for rs2201997). Although this might be viewed as supportive evidence, a more detailed workup of the region in very large samples is required to allow more definitive conclusions to be drawn. It is also interesting that the SPATA5 gene is located only 320 kb distal to rs7682241, a genome-wide significant marker in the study of Petukhova et al,12 which strongly suggests the involvement of the IL2/IL21 gene locus. Data from the CEU HapMap sample, however, show that rs304650 is not in LD with the best variants of the IL2/IL21 gene locus. Thus, the two loci probably confer their risk independently of each other. It remains theoretically possible, however, that the true causal variant for AA may be a functional variant that is in moderate LD with the variants reported in both analyses, and which is located between the IL2/IL21 and SPATA5 regions.
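The LD measure quoted above is the squared correlation between alleles at two SNPs. As a brief illustration, it can be computed from allele and haplotype frequencies as follows; the frequencies used here are hypothetical and are not the HapMap values underlying the reported estimates.

```python
def ld_r_squared(p_a, p_b, p_ab):
    """Pairwise LD r^2 from allele frequencies p_a, p_b and the frequency p_ab
    of the haplotype carrying both alleles. Requires 0 < p_a < 1 and 0 < p_b < 1."""
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical frequencies giving moderate LD.
r2_example = ld_r_squared(0.30, 0.40, 0.20)
```

Values near 0 indicate that the two variants segregate almost independently, whereas values near 1 indicate that one variant is a near-perfect proxy for the other.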