Introduction

Psoriasis is a chronic inflammatory skin disease mediated by both the innate and adaptive immune systems [1]. Psoriasis is highly heritable, with a heritability estimate of 68% (60 to 75%), and has a prevalence of around 2.5% among European populations with little difference between the sexes [2, 3]. Genome-wide association studies (GWASs) have highlighted more than 40 loci associated with psoriasis in Caucasian samples, which collectively explain <30% of the total genetic variation [4, 5]. Although meta-analyses of further GWAS cohorts may reveal additional novel loci, as observed in other immune-mediated diseases, most newly discovered variants are likely to be intergenic with moderate or weak effects and consequently pose challenges for biological interpretation [6]. GWAS signals often need to be interrogated in the light of other sources of information such as pathways and regulatory elements [7, 8], and non-additive effects such as gene–gene (GxG) and/or gene–environment (GxE) interactions may also be implicated [9].

Genome-wide exploration of GxG and/or GxE interactions in GWAS data has been attempted widely, but the results so far have been relatively disappointing, mainly owing to low power of detection [10,11,12]. Testing for non-additive interactions is more challenging than testing for additive signals because, for example, detecting a GxG epistatic interaction requires markers to tag the causal variants at both loci, and power is reduced wherever that tagging is imperfect [12]. A recent genotype–covariate interaction study also showed that GxE interactions could jointly contribute substantially (e.g., 8.1% by genotype–age interactions) to the variation in body mass index, but that large samples were required to detect them individually because each interaction probably has a small effect size [13]. Several studies have reported GxG interactions in psoriasis by examining only known susceptibility loci such as the major histocompatibility complex (MHC) [14,15,16,17]. A well-known example is the interaction between two GWAS-significant loci, HLA-C and ERAP1, where ERAP1 encodes an aminopeptidase regulating the quality of peptides bound to the MHC class I molecules encoded by HLA-C [16, 18]. This epistatic signal is not consistently detected, possibly owing to the relatively weak interaction effects [19]. Other examples include interactions between the HLA-Cw6 allele and the LCE3C and LCE3B deletion [14, 15, 17], and between HLA-C and IL12B in Chinese populations [17]. Each of these reported GxG interactions carries a rather moderate association signal, so they are more useful in indicating regulatory mechanisms than in explaining genetic variance. Environmental factors such as smoking and obesity are thought to interact with psoriasis susceptibility genes, but these interactions remain to be quantified because the factors are not documented adequately and consistently [20]. Nonetheless, a relatively weak interaction between IL12B and obesity was detected using large samples [21].

The low power of detection for non-additive loci is likely to persist whenever a specific interaction model (e.g., either GxG or GxE) is assumed in a genome search [12, 13]. On the other hand, a locus can be involved in multiple non-additive interactions [22, 23], whose aggregate effects could be substantial [9]. Genotypic variability-based genome-wide association study (vGWAS) provides an alternative that can prioritize non-additive loci without requiring any prior knowledge of the types of interaction or the interacting factors [12, 24, 25]. vGWAS achieves this by testing for genotypic variability, i.e., differences in phenotypic variance across the three genotypes of a SNP: such variance heterogeneity tends to be weak when only additive effects are important but strong when non-additive effects such as GxE interactions are important [26, 27]. Explicit tests of GxE and/or GxG interactions are still needed, but only for the identified vGWAS loci, which increases power by reducing the multiple-testing burden [23, 28].
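The variance-heterogeneity argument can be illustrated with a small simulation. The sketch below is not part of the study; it assumes a hypothetical model in which a SNP genotype g (0, 1, 2) has an additive effect on a quantitative trait and may also interact with an unobserved environmental factor e ~ N(0, 1). Under that model Var(y | g) = (1 + γg)² + 1, where γ (the `gxe` argument) is the interaction coefficient, so the within-genotype variance is constant when γ = 0 but grows with g when γ ≠ 0.

```python
import random
from statistics import pvariance


def variance_by_genotype(n=30000, additive=0.5, gxe=0.0, seed=7):
    """Simulate a trait and return its variance within each genotype group.

    Hypothetical model (illustration only):
        y = additive * g + gxe * g * e + e + noise,
    with genotype g in {0, 1, 2}, an unobserved environment e ~ N(0, 1)
    and noise ~ N(0, 1). The additive term shifts only the group means.
    """
    rng = random.Random(seed)
    by_g = {0: [], 1: [], 2: []}
    for _ in range(n):
        g = rng.choice((0, 1, 2))
        e = rng.gauss(0.0, 1.0)
        y = additive * g + gxe * g * e + e + rng.gauss(0.0, 1.0)
        by_g[g].append(y)
    # Var(y | g) = (1 + gxe * g)^2 + 1 under this model
    return {g: pvariance(values) for g, values in by_g.items()}
```

With `gxe=0.0` the three variances are all close to 2, whereas with `gxe=0.5` they are close to 2, 3.25 and 5. A purely additive locus therefore leaves no variance signal, while an interacting one does, even though the interacting factor is never observed.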

The vGWAS approach has been applied to a number of quantitative traits and has revealed interacting loci that also carry major additive effects [23, 28,29,30,31,32]. Recently, we adapted vGWAS to dichotomous disease phenotypes using a two-stage approach on the liability scale and identified the MHC as a key interacting locus in three sero-positive rheumatoid arthritis (RA) cohorts [22]. These vGWAS findings showed that the MHC genes carry not only important additive but also non-additive effects in RA [22]. In a separate vGWAS meta-analysis of six sero-negative RA cohorts, we identified not only the MHC genes but also the novel loci DHCR7 and IRF4, illuminating the biology of cutaneous vitamin D synthesis and of the light skin color that arose as early humans adapted to Northern Europe, where residents have reduced ultraviolet-B exposure [33]. In addition, a very recent study showed that only variants increasing both the mean and the between-subject variance of fasting glucose levels (i.e., with signals in both GWAS and vGWAS) could predict risk of type II diabetes [34].

Nevertheless, some questions about the vGWAS approach remain unanswered. First, the previous RA vGWAS analyzed cohorts genotyped with the Immunochip, which carries a high density of markers preselected for a set of immune-relevant genes [35], and consequently provided only a partial genome view. It is therefore of interest to find out whether vGWAS is useful in other diseases and at the genome-wide scale. Second, the power of vGWAS is determined by the sample size, the coverage and density of genomic markers such as single nucleotide polymorphisms (SNPs), the effect sizes of the underlying interactions, and the (probably unobserved) interacting factors [22, 26]. Imputation can markedly increase SNP coverage and density, i.e., increase linkage disequilibrium (LD) with any causal variants [12, 36], but has yet to be employed in vGWAS of diseases. We therefore applied the vGWAS approach [22] to two psoriasis cohorts, the Wellcome Trust Case Control Consortium 2 (WTCCC2, for discovery) and the cohort derived for the Collaborative Association Study of Psoriasis (CASP, for replication), to explore these aspects.

Materials and methods

Study cohorts and quality control

We used the WTCCC2 psoriasis cohort, derived as a component of the Wellcome Trust Case Control Consortium 2 project, to discover vGWAS signals and the cohort derived for the Collaborative Association Study of Psoriasis (CASP, General Research Use only) to replicate them. Both cohorts have been documented in detail elsewhere [16, 37]. Briefly, the WTCCC2 samples were recruited from five centers in England, Scotland and Ireland, where phenotypic data and blood samples were collected after research ethics approval had been received from each participating institution and after subjects had given written informed consent; DNA samples were genotyped on the Illumina Human660W-Quad (cases) and the Illumina custom Human1.2M-Duo (controls). The CASP samples were recruited in the US; all participants provided written informed consent and were genotyped on the PERLEGEN-600K. All experiments (e.g., genotyping) were performed in accordance with relevant guidelines and regulations.

We excluded SNPs on sex chromosomes and samples of non-European origin. The WTCCC2 and CASP datasets had 513,278 and 372,025 SNPs available for imputation, respectively. Both cohorts were pre-phased using SHAPEIT2 [38] and imputed to the 1000 Genomes Project reference panel (phase 1 integrated variant set v3) using IMPUTE2 [39]. Indels and SNPs with an imputation INFO score <0.8 were excluded, and the remaining imputed data were converted to PLINK format. We used PLINK [40] to perform quality control on both the imputed and the raw data of each cohort, retaining SNPs with minor allele frequency >0.01, SNP call rate >0.95 and no deviation from Hardy–Weinberg equilibrium at P < 1.0e−04, and samples with call rate >0.99.
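The quality-control thresholds above can be written as explicit filters. This is a minimal sketch over hypothetical per-SNP summary records (the `SnpQc` container, its field names and the helper functions are ours, not PLINK output columns). The filtering itself was done in PLINK, whose `--maf`, `--geno`, `--mind` and `--hwe` options cover the same criteria, although their boundary handling may differ slightly from the strict inequalities used here.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Hypothetical per-SNP summary used only for this illustration."""
    maf: float              # minor allele frequency
    call_rate: float        # fraction of samples with a called genotype
    hwe_p: float            # Hardy-Weinberg equilibrium test P-value
    info: float = 1.0       # IMPUTE2 INFO score (1.0 assumed for genotyped SNPs)
    is_indel: bool = False


def keep_snp(s: SnpQc, imputed: bool) -> bool:
    """Apply the SNP-level thresholds described in the text."""
    if imputed and (s.info < 0.8 or s.is_indel):
        return False                       # poorly imputed SNPs and indels
    return (s.maf > 0.01                   # minor allele frequency > 0.01
            and s.call_rate > 0.95         # SNP call rate > 0.95
            and s.hwe_p >= 1.0e-4)         # drop HWE deviation with P < 1e-4


def keep_sample(call_rate: float) -> bool:
    """Sample-level threshold: call rate > 0.99."""
    return call_rate > 0.99
```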

Two-stage vGWAS

We conducted vGWAS of the raw and imputed data for each cohort using the two-stage approach previously developed for case–control disease phenotypes [22]. Briefly, GCTA [41] was used to compute the genomic relationship matrix (GRM) and from it the first ten principal components (PCs), and then to partition the dichotomous phenotype of each unrelated individual into polygenic liability risk and a residual; the disease prevalence was set to 0.025, the GRM relatedness threshold to 0.05, and the first 10 PCs were fitted as covariates in the mixed model. The resulting residuals were then tested for variance heterogeneity across the three SNP genotypes using Levene's (Brown–Forsythe) test implemented in the R package VariABEL [42], which does not assume normally distributed phenotypes:
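The disease prevalence is needed because GCTA expresses case–control heritability on the liability scale. A sketch of the linear transformation of Lee et al. (2011, Am J Hum Genet), which we understand to be the conversion GCTA applies when a prevalence is supplied, is shown below. The function name is ours, and the sketch does not reproduce the residual partition of the two-stage method [22].

```python
from statistics import NormalDist


def liability_h2(h2_observed, prevalence, case_fraction):
    """Convert an observed-scale (0/1) heritability to the liability scale.

    h2_l = h2_o * K(1-K) / z^2 * K(1-K) / (P(1-P)), where
    K = population prevalence, P = proportion of cases in the sample and
    z = standard normal density at the liability threshold t = Phi^-1(1 - K).
    """
    k, p = prevalence, case_fraction
    std_normal = NormalDist()
    t = std_normal.inv_cdf(1.0 - k)        # liability threshold
    z = std_normal.pdf(t)                  # normal density at the threshold
    factor = (k * (1.0 - k)) / z ** 2 * (k * (1.0 - k)) / (p * (1.0 - p))
    return h2_observed * factor
```

As an illustration only, with K = 0.025 and the WTCCC2 case fraction after quality control (2175 of 7319 samples), the multiplier `liability_h2(1.0, 0.025, 2175 / 7319)` is about 0.83.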

$$T^2 = \frac{(N - k)\sum_{j=1}^{k} n_j\left(Z_{j\cdot} - Z_{\cdot\cdot}\right)^2}{(k - 1)\sum_{i=1}^{N}\left(Z_i - Z_{g_i\cdot}\right)^2}$$
(1)

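where the residuals were used as the trait y; $Z_i = |y_i - \tilde{y}_{g_i}|$ is the absolute deviation of the trait value of the ith sample ($y_i$) from the median trait value $\tilde{y}_{g_i}$ of samples with the same genotype ($g_i$); N is the sample size and k is the number of possible genotypes (k = 3); $n_j$ is the number of samples with genotype j; $Z_{j\cdot}$ is the mean of $Z_i$ among samples with genotype j (so $Z_{g_i\cdot}$ is this mean for the genotype of sample i), and $Z_{\cdot\cdot}$ is the mean of $Z_i$ over all samples. $T^2$ approximately follows an F distribution with k − 1 and N − k degrees of freedom; when N is large, $(k - 1)T^2$ is approximately χ² distributed with k − 1 = 2 degrees of freedom.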
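Equation (1) can be computed directly. The following is a plain-Python sketch of the Brown–Forsythe statistic for illustration only (VariABEL's own implementation is not reproduced here). The function names are hypothetical, and the P-value helper uses the large-sample χ² approximation for k = 3, whose survival function with two degrees of freedom is exp(−x/2).

```python
import math
from statistics import mean, median


def brown_forsythe_t2(y, g):
    """T^2 of Eq. (1): Levene's test centred on genotype-group medians.

    y: trait values (here, first-stage residuals); g: list of genotype labels.
    """
    groups = sorted(set(g))
    k, n = len(groups), len(y)
    med = {j: median([yi for yi, gi in zip(y, g) if gi == j]) for j in groups}
    z = [abs(yi - med[gi]) for yi, gi in zip(y, g)]                  # Z_i
    n_j = {j: g.count(j) for j in groups}                            # n_j
    z_j = {j: mean([zi for zi, gi in zip(z, g) if gi == j]) for j in groups}  # Z_j.
    z_all = mean(z)                                                  # Z..
    between = sum(n_j[j] * (z_j[j] - z_all) ** 2 for j in groups)
    within = sum((zi - z_j[gi]) ** 2 for zi, gi in zip(z, g))       # uses Z_{g_i.}
    return (n - k) * between / ((k - 1) * within)


def large_sample_p(t2, k=3):
    """Approximate P-value using (k - 1) * T^2 ~ chi^2 with k - 1 df (large N)."""
    if k != 3:
        raise ValueError("this sketch only covers three genotype groups")
    return math.exp(-(k - 1) * t2 / 2.0)   # chi^2_2 upper tail: exp(-x / 2)
```

On a toy input whose three genotype groups have spreads in the ratio 1:2:3, `brown_forsythe_t2([0, 1, 2, 0, 2, 4, 0, 3, 6], [0, 0, 0, 1, 1, 1, 2, 2, 2])` returns 6/7 (about 0.857), while groups with identical spread give 0.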

For simplicity, we adopted the GWAS consensus threshold of 5.0e−08 as the genome-wide significance threshold for vGWAS. We also derived a 5% genome-wide suggestive threshold of 6.7e−05 by permutation of the WTCCC2 data in the following steps: (a) randomly permute the environmental residuals (i.e., the first-stage residuals) across samples; (b) perform Levene's (Brown–Forsythe) test genome-wide; (c) record the lowest P-value; (d) repeat steps (a) to (c) 1000 times, sort the recorded P-values and take their 5% quantile as the threshold. We considered vGWAS signals exceeding the genome-wide suggestive threshold in WTCCC2 as discoveries and examined their replication in the CASP cohort: direct replication if the same SNP was tested with two degrees of freedom (i.e., all three genotypes observed) in the CASP vGWAS and had P < 5.0e−02, or indirect replication if the SNP was not available but a proxy SNP, also tested with two degrees of freedom, had P < 5.0e−02. The main text mostly reports results from the imputed data.
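Steps (a) to (d) amount to taking a quantile of the permutation distribution of the genome-wide minimum P-value. A minimal sketch, with hypothetical names and any per-SNP test passed in as `pvalue_fn` (for example, a Levene's (Brown–Forsythe) P-value); at genome scale this loop would be run with dedicated software rather than pure Python.

```python
import random


def permutation_threshold(residuals, genotypes_by_snp, pvalue_fn,
                          n_perm=1000, alpha=0.05, seed=2024):
    """Genome-wide threshold from permuted minimum P-values (steps a to d).

    residuals        -- first-stage residuals, one per sample
    genotypes_by_snp -- one genotype vector (0/1/2 per sample) for each SNP
    pvalue_fn(y, g)  -- any per-SNP test returning a P-value
    """
    rng = random.Random(seed)
    y = list(residuals)                  # copy so the caller's data stay intact
    minima = []
    for _ in range(n_perm):
        rng.shuffle(y)                   # (a) break the genotype-residual link
        pvals = [pvalue_fn(y, g) for g in genotypes_by_snp]  # (b) test every SNP
        minima.append(min(pvals))        # (c) keep the genome-wide minimum
    minima.sort()                        # (d) sort the recorded minima ...
    # ... and take their alpha-quantile (the 50th smallest of 1000 for alpha = 0.05)
    return minima[max(0, int(alpha * n_perm) - 1)]
```

With 1000 permutations and α = 0.05 the threshold rests on the 50th smallest of 1000 recorded minima, so it carries some Monte Carlo error of its own.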

Additional statistical analysis

For comparison, we also performed a conventional GWAS for each cohort using PLINK, following our previous study [22]. We used the following logistic regression model to test GxE or GxG interactions for an identified vGWAS SNP K:

$$\log \left( {\frac{P}{{1 - P}}} \right) = \mu + \beta X + \beta _kK + \beta _fF + \beta _{kf}KF + e$$
(2)

where P is the probability that an individual in the population is a case rather than a control, $\mu$ is the model constant, $\beta$ denotes the effects of the fixed covariates X (e.g., gender and age-of-onset), $\beta_k$ is the effect of the vGWAS SNP K, $\beta_f$ is the effect of the interacting factor F, which can be an environmental factor (i.e., testing GxE) or another SNP (i.e., testing GxG), $\beta_{kf}$ is the effect of the interaction between K and F, and e is the random error. SNP genomic locations are quoted on GRCh38/hg38 throughout.

Results

After quality control and removal of genomically related individuals based on the imputed data, 2175 cases and 5144 controls, each with genotypes at about 5.5 million SNPs, were included in the WTCCC2 analyses, compared with 849 cases and 641 controls with about 4.6 million SNPs in the CASP analyses (Table 1). The polygenic heritability estimate was 53.7% in WTCCC2, much lower than the 67.9% estimated in the CASP cohort, which had fewer controls (possibly inflating genetic effects) and a much larger standard error. On the basis of the GWAS consensus genome-wide significance threshold of 5.0e−08 and the permutation-derived genome-wide suggestive threshold of 6.76e−05, we found 6298 SNPs in WTCCC2, only 1% of which were located outside the MHC, whereas all 905 significant SNPs detected in CASP were within the MHC (Table 1).

Table 1 Summary information of the study cohorts and vGWAS results

In the discovery cohort WTCCC2, vGWAS identified two genome-wide significant (P < 5.0e−08) loci, MHC and IL12B, that were also genome-wide significant in the conventional GWAS. The vGWAS also identified the suggestive loci KCNH7, ANKRD55, EXOC2, SFTPD and FUT2, of which only KCNH7 was significant in the GWAS (Fig. 1). Conversely, a number of genome-wide significant (P < 5.0e−08) GWAS hits, e.g., rs27524 (mapping to ERAP1), were not detected in the vGWAS analysis (Supplementary Table S1). The vGWAS SNPs above the genome-wide suggestive threshold in WTCCC2 were tested in CASP, but only the two genome-wide significant SNPs rs115903802 (mapping to HLA-C) and rs4921482 (mapping to IL12B) were replicated (Table 2).

Fig. 1

Aligned Manhattan plots of GWAS (left) and vGWAS (right) analyses in WTCCC2 with imputed data. P-values are on the −log10 scale; the red line represents the genome-wide significance threshold; the blue line represents the permutation-derived genome-wide suggestive threshold for vGWAS; significant loci are annotated with gene names in red (or in blue if they reached only the vGWAS suggestive threshold), with the minimum P-value in brackets if <1.0e−20

Table 2 Genome-wide suggestive vGWAS signals in WTCCC2 and their replication in CASP

Imputation increased both the strength and the number of vGWAS signals in the WTCCC2 cohort: rs115903802 (HLA-C), rs4921482 (IL12B) and rs601338 (FUT2) were all imputed SNPs, each carrying a stronger signal than the corresponding genotyped SNPs within the gene region (Table 2; Supplementary Fig. S1; Supplementary Table S2). In addition, most of the identified genotyped vGWAS SNPs (Supplementary Table S2) gained some signal strength from imputation, which enabled better estimation and partitioning of the additive and non-additive genetic variances; e.g., the heritability estimate was 0.495 in the raw data without imputation, lower than the 0.537 estimated from the imputed data (Table 1).

We tested the genome-wide significant vGWAS SNPs for GxE interactions with either gender or age-of-onset in WTCCC2. We found a moderate interaction (interaction P = 8.1e−05, with the full two degrees of freedom (df)) between age-of-onset and rs3132560 (vGWAS P = 5.4e−18), which maps to the PSORS1C1 gene. In CASP, the age-of-onset interaction was not directly replicated (interaction P = 9.1e−02, 2 df) but was indirectly replicated by the proxy SNP rs1265092 (interaction P = 3.2e−04, 2 df). We also tested GxG interactions within the MHC region and found signals similar to those previously reported [43] (results not shown). In addition, we explicitly tested the reported interaction between HLA-C and ERAP1 but found no notable signal in either cohort.

Discussion

We present a genome-wide vGWAS profile of psoriasis for the first time. Similar to our previous vGWAS of RA based on Immunochip data [22], vGWAS was able to identify non-additive loci that were either significant in the conventional GWAS (e.g., HLA-C, IL12B) or not significant in the GWAS but detected in previous GWAS meta-analyses, namely EXOC2 and ANKRD55 for psoriasis [5, 44] and SFTPD for RA [45]. In addition, vGWAS of WTCCC2 identified a suggestive locus, FUT2, that was claimed as genome-wide significant only very recently in a large-scale GWAS meta-analysis combining WTCCC2 with seven other Caucasian cohorts [46]. Only the two major psoriasis loci HLA-C and IL12B were replicated in the CASP cohort (Table 2), which might be viewed as a limitation of the study. However, the sample size of CASP was much smaller than that of WTCCC2 (Table 1), limiting the power to detect modest effect sizes. Also owing to low power, we found evidence of GxG and GxE interactions only in the MHC region, not at IL12B. Nevertheless, both HLA-C and IL12B were previously known interacting loci [14,15,16,17, 21], suggesting that vGWAS does pick up non-additive signals correctly.

As expected, imputation was important for increasing the power of detection in vGWAS [12, 22]. The gain was seen in at least three aspects: (a) better estimates of additive and non-additive variances, and hence better phenotypes for vGWAS; (b) stronger vGWAS signals at known loci (e.g., HLA-C and IL12B) because imputed SNPs better tag causal variants; and (c) for the same reason, better chances of identifying novel associations, e.g., without imputation FUT2 would have been missed in several studies [46, 47]. Nonetheless, additional care is recommended when interpreting results derived from imputed data, as imputation itself could introduce extra uncertainty into genotypic variability. Further replication and functional investigation of FUT2 and the other suggestive vGWAS loci in the relevant environmental context are therefore recommended.

FUT2 encodes alpha-(1,2)-fucosyltransferase, which regulates the H antigen (the precursor of the human ABO blood group antigens) on the surface of epithelial cells and in body fluids. Interestingly, FUT2 was found to be moderately associated with psoriasis in a cross-disease GWAS combining psoriasis with cardiovascular and metabolic diseases, where rs492602 (in perfect LD with the vGWAS SNP rs681343) had a GWAS P-value of 6.59e−05 in the WTCCC2 cohort [47]. In an earlier cross-disease GWAS combining samples of Crohn's disease and psoriasis, the genome-wide significant SNP rs281379 mapping to FUT2 had a nearly genome-wide significant association with psoriasis when all psoriasis samples were combined (P = 7.86e−08) [48]. These cross-disease GWAS results suggest that FUT2 perhaps carries only moderate additive effects on psoriasis. Such moderate associations require large samples to identify, as shown in the latest large GWAS meta-analysis of psoriasis in Caucasians, where rs492602 was the lead SNP.

From this perspective, vGWAS has potential for discovering novel associations driven by both non-additive and additive effects, as shown previously for DHCR7 and IRF4 [33] and here for FUT2. There is evidence that FUT2 could be important in responding to different environmental factors. For example, a separate study of FUT2 in Behçet's disease [49] highlighted environmental aspects (e.g., viral infection, rheumatic inflammation or metabolic factors) in the heterogeneity of the FUT2 associations. Such environmental aspects may reduce the power of association tests based solely on additive effects. Similarly, the vGWAS locus EXOC2, known to be associated with skin color and closely related to sunlight exposure, was detected only in GWAS of psoriasis with very large samples owing to its moderate additive effects, or in GWAS of directly linked phenotypes (e.g., tanning, cutaneous basal cell carcinoma) in samples of European ancestry with sufficient allele frequencies [50, 51].

vGWAS appears to be an effective filter that improves the power to detect GxG interactions by considering only the identified vGWAS loci [12, 22, 33]. However, apart from the known GxG interactions within the MHC region [33, 43, 52], this study is clearly underpowered to test GxG or GxE interactions explicitly under a given model because of the small sample size of each single GWAS cohort. This is in line with previous observations that multiple reasons could account for the lack of robust interaction findings, e.g., unmeasured causal environmental triggers [12, 13, 22, 33]. These reasons may also explain the limited replication of the identified vGWAS loci and thus warrant further efforts to dissect any interactions potentially involving these loci. Together, our results indicate that vGWAS is a useful complement to GWAS for discovering loci where non-additive effects are equally (if not more) important, but that it requires large samples.