In a previous large-scale exome sequencing analysis for psoriasis, we discovered seven common and low-frequency missense variants within six genes with genome-wide significance. Here we describe an in-depth analysis of noncoding variants based on sequencing data (10,727 cases and 10,582 controls) with replication in an independent cohort of Han Chinese individuals consisting of 4,480 cases and 6,521 controls to identify additional psoriasis susceptibility loci. We confirmed four known psoriasis susceptibility loci (IL12B, IFIH1, ERAP1 and RNF114; 2.30 × 10−20≤P≤2.41 × 10−7) and identified three new susceptibility loci: 4q24 (NFKB1) at rs1020760 (P=2.19 × 10−8), 12p13.3 (CD27-LAG3) at rs758739 (P=4.08 × 10−8) and 17q12 (IKZF3) at rs10852936 (P=1.96 × 10−8). Two suggestive loci, 3p21.31 and 17q25, were also identified (P<1.00 × 10−6). The results of this study increase the number of confirmed psoriasis risk loci and provide novel insight into the pathogenesis of psoriasis.
Psoriasis is a common, chronic, inflammatory disease of the skin, affecting approximately 2% of the population worldwide1. Our understanding of the genetic architecture of psoriasis has increased rapidly due to genome-wide association studies (GWASs), which have identified more than 40 psoriasis susceptibility genes/loci2,3,4,5,6,7,8,9,10,11,12,13. However, the combined effect of these loci does not fully account for the observed genetic susceptibility to psoriasis, indicating that additional genetic factors remain to be discovered. Furthermore, most of the associated variants emerging from GWASs lie within noncoding regions14.
In our previous sequencing analysis of 10,727 cases and 10,582 controls, which investigated the contribution of functional coding variants to the genetic component of psoriasis, 7 common and low-frequency missense single-nucleotide variants (SNVs) within IL23R, GJB2, LCE3D, ERAP1, CARD14 and ZNF816A were identified as being associated with psoriasis15. However, single-variant and gene-based association analyses of non-synonymous SNVs failed to identify new psoriasis-associated genes, indicating that coding variants, at least low-frequency and rare non-synonymous ones, may make only a limited contribution to the overall genetic risk of psoriasis.
In this study, we performed an in-depth data analysis focusing on noncoding SNVs, which were derived from the flanking regions (within ∼200 bp) of target genes in the exome sequencing and targeted sequencing stages (10,727 cases and 10,582 controls)15, and conducted a replication study in an independent cohort of Han Chinese individuals (4,480 cases and 6,521 controls) to explore additional common susceptibility variants for psoriasis. We confirmed four known psoriasis susceptibility loci (IL12B, IFIH1, ERAP1 and RNF114) and identified three new susceptibility loci (NFKB1, 12p13.3 and 17q12) as well as two suggestive loci (3p21.31 and 17q25) for psoriasis. These findings reveal new genetic susceptibility factors and suggest several new biological pathways related to psoriasis.
Exome sequencing and targeted sequencing
During exome sequencing of samples from 781 cases and 676 controls, we identified 518,308 SNVs, of which 133,671 were noncoding SNVs with a minor allele frequency (MAF) >0.5%. Targeted sequencing of 1,362 genes in 9,946 cases and 9,906 controls detected 14,365 high-confidence noncoding SNVs15. We then applied a trend test (logistic regression, additive model) to the combined data set for these 1,362 genes (10,727 psoriasis cases and 10,582 controls from the two stages). The SNV rs1020760, located within NFKB1, reached genome-wide significance (Pexome-target=2.19 × 10−8, odds ratio (OR)=1.12), and the SNV rs1609798 at the same locus showed suggestive association (Pexome-target=9.87 × 10−8, OR=1.12) (Table 1). Linkage disequilibrium (LD) analysis showed that these two SNVs represent a single signal in strong LD (D′=0.98, r2=0.66). Conditional analysis indicated that rs1020760 (Pcondition=3.52 × 10−2, OR=1.08) was more strongly associated than rs1609798 (Pcondition=1.64 × 10−1, OR=1.05) (Supplementary Table 1). In addition, logistic regression (additive model) confirmed three previously reported psoriasis susceptibility genes: IL12B (rs2288831, Pexome-target=2.30 × 10−20, OR=0.83), IFIH1 (rs13431841, Pexome-target=2.96 × 10−9, OR=0.83) and ERAP1 (rs27043, Pexome-target=6.50 × 10−12, OR=0.87). We also observed 22 additional SNVs showing suggestive association (Ptarget<0.01) (Table 2 and Supplementary Table 2), which were selected for validation by genotyping. Genotype information for every sample at the reported associated SNP positions is available in Supplementary Data 1.
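The trend test applied to the combined exome and targeted sequencing data can be illustrated with the standard Cochran–Armitage statistic for a 2 × 3 genotype table. The Python sketch below is a minimal illustration, not the study's pipeline: it assumes additive weights (0, 1 or 2 risk alleles per genotype), includes no covariates (unlike a full logistic regression, to which it is closely related), and the genotype counts in the usage line are invented.

```python
import math


def cochran_armitage_trend(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 case/control genotype table.

    case_counts, control_counts: genotype counts ordered as
        (homozygous non-risk, heterozygous, homozygous risk).
    weights: additive scores, i.e. the number of risk alleles per genotype.
    Returns (chi-square statistic with 1 df, two-sided P-value).
    """
    n = [r + s for r, s in zip(case_counts, control_counts)]  # genotype totals
    n_cases, n_controls = sum(case_counts), sum(control_counts)
    total = n_cases + n_controls

    sum_tr = sum(t * r for t, r in zip(weights, case_counts))
    sum_tn = sum(t * k for t, k in zip(weights, n))
    sum_t2n = sum(t * t * k for t, k in zip(weights, n))

    numerator = total * (total * sum_tr - n_cases * sum_tn) ** 2
    denominator = n_cases * n_controls * (total * sum_t2n - sum_tn ** 2)
    chi2 = numerator / denominator
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Invented counts in which risk-allele homozygotes are enriched among cases.
chi2, p = cochran_armitage_trend((10, 20, 30), (30, 20, 10))
print(f"chi2 = {chi2:.2f}, P = {p:.2e}")
```

Identical genotype distributions in cases and controls give a statistic of exactly 0 and P=1.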
We genotyped the 22 SNVs showing suggestive evidence (Ptarget<0.01) in an additional 4,480 psoriasis cases and 6,521 controls. After combining the data from all three stages and applying logistic regression (additive model), we found that three additional SNVs at two loci reached genome-wide significance (Pcombined<5.00 × 10−8; Table 2): 12p13.3 (rs758739, Pcombined=4.08 × 10−8, OR=0.91; rs2243750, Pcombined=4.38 × 10−8, OR=0.91) and 17q12 (rs10852936, Pcombined=1.96 × 10−8, OR=1.10). In addition, rs12936231 within 17q12 showed a strong association with psoriasis, just above the genome-wide threshold (Pcombined=5.02 × 10−8, OR=1.10). LD analysis revealed that the two SNVs at 12p13.3 represent a single signal in strong LD (D′=0.98, r2=0.78), whereas the two SNVs at 17q12 were in complete LD (D′=1.00, r2=1.00). In the validation stage, logistic regression (additive model; Table 2) also identified two loci with suggestive evidence of association, 3p21.31 (rs1863837, Pcombined=3.91 × 10−7, OR=0.90) and 17q25 (rs3744017, Pcombined=5.83 × 10−7, OR=1.12), and confirmed the association of RNF114 with psoriasis (rs4647954, Pcombined=2.41 × 10−7, OR=1.09).
In the current study, we performed an in-depth analysis of noncoding SNVs based on large-scale sequencing data (10,727 cases and 10,582 controls)15, followed by a replication study in an independent cohort of Han Chinese individuals (4,480 cases and 6,521 controls). We replicated four known susceptibility loci (IL12B, IFIH1, ERAP1 and RNF114) and identified three novel psoriasis susceptibility loci: NFKB1 (rs1020760), 12p13.3 (rs758739) and 17q12 (rs10852936) (Supplementary Fig. 1). In addition, we identified two suggestive loci (3p21.31 and 17q25). Although we achieved good sequencing coverage and depth15, no functional (non-synonymous) coding variant showing significant association was identified within these new susceptibility loci.
At 4q24, we identified two noncoding variants (rs1020760 and rs1609798) within NFKB1 (Supplementary Fig. 1a). NFKB1 (nuclear factor of kappa light polypeptide gene enhancer in B-cells 1) encodes a 105 kDa protein that functions as a Rel protein-specific transcription inhibitor and that can be processed into a 50 kDa protein functioning as a DNA-binding subunit of the NF-kappa-B (NF-κB) protein complex16. NF-κB is one of the most important regulators of proinflammatory gene expression and has an important role in balancing growth and differentiation in the epidermis17. NF-κB mediates the synthesis of cytokines such as tumour necrosis factor α (TNF-α), interleukin-6 (IL-6), IL-1β and IL-8. In psoriasis, many cytokines, transcription factors and inflammatory mediators derived from chronic inflammatory cells have important roles in regulating keratinocyte differentiation and proliferation18. Because NF-κB regulates cytokine gene expression, it is likely to be one of the key factors in the pathogenesis of this disease. Another gene, NFKBIA (14q13.2), which encodes an inhibitor of NF-κB, has also been reported to be strongly associated with psoriasis2,9. Furthermore, a gene expression analysis based on public psoriasis databases19 showed that NFKB1 is upregulated in psoriatic skin, both in lesional skin from cases (PP) versus normal skin from controls (NN) (P=2.33 × 10−24) and in lesional (PP) versus non-lesional (PN) skin from cases (P=3.28 × 10−21). In addition, the ENCODE database shows that rs1609798 lies in a DNase I hypersensitive site (DHS) in the last intron of NFKB1 in human umbilical vein endothelial cells (HUVECs), skeletal muscle myoblasts (HSMMs) and embryonic stem cells (H1).
Candidate genes lying close to the associated SNVs (rs758739 and rs2243750) at 12p13.3 included CD27 (CD27 molecule), CHD4 (chromodomain helicase DNA binding protein 4) and LAG3 (lymphocyte activation gene 3) (Supplementary Fig. 1b); these genes were considered candidates because of their biological relevance to psoriasis. CD27, a member of the TNF receptor superfamily, is a transmembrane receptor required for the generation and long-term maintenance of T-cell immunity20. It binds the ligand CD70 and provides co-stimulatory signals for T-, B- and NK-cell activation, enhancing T-cell survival and effector function, NK-cell function, B-cell differentiation and plasma cell function21,22. CHD4 is a catalytic subunit of the NuRD (nucleosome remodelling and deacetylase) complex and has recently emerged as an important player in the DNA damage response, transcriptional regulation, cell cycle progression and the maintenance of genomic integrity23. LAG3 is closely related to CD4 and binds MHC class II molecules with high affinity, thereby blocking the binding of CD4; through this interaction it downregulates the activated T-cells on which it is expressed (ref. 24). Studies in LAG3 knockout mice have shown that LAG3 negatively regulates both the expansion of T-cells and the size of the memory-T-cell pool25. During inflammation, both LAG3 and MHC class II molecules are strongly upregulated, and their interaction may also be involved in the activation of antigen-presenting dendritic cells26. LAG3 has also been reported to confer susceptibility to multiple sclerosis. Gene expression analysis based on public psoriasis databases19 indicated that CD27 and LAG3 are upregulated in psoriatic lesions compared with both non-lesional skin from psoriasis cases (P=7.47 × 10−15 and P=6.56 × 10−10, respectively) and normal skin from controls (P=4.64 × 10−18 and P=1.16 × 10−23, respectively).
A search of the ENCODE database showed that rs2243750 lies within a DHS in medulloblastoma (medullo) cells, osteoblasts (osteobl) and pulmonary artery endothelial cells (HPAECs), suggesting a possible effect on transcription. Further studies are therefore warranted to identify the causal psoriasis gene within the novel susceptibility locus at 12p13.3 identified in this study.
At 17q12, we identified two psoriasis-associated SNVs that were in complete LD (D′=1, r2=1) (Supplementary Fig. 1c). The 17q12 region has been reported to be associated with several autoimmune diseases, including Crohn’s disease, ulcerative colitis, rheumatoid arthritis and systemic lupus erythematosus27,28,29,30,31. Within this region, IKZF3 (IKAROS family zinc finger 3) is a plausible candidate gene. IKZF3 is a member of the IKAROS family of transcription factors; it is involved in preventing the apoptosis of IL2-deprived B-cells32 and in regulating B-cell activation33, and it has been implicated in autoimmune disorders, including a lupus-like syndrome that develops in Ikzf3-deficient mice34. According to public psoriasis databases19, IKZF3 expression is upregulated in the skin of psoriasis patients (PP versus NN: P=9.28 × 10−17; PP versus PN: P=2.25 × 10−16). ENCODE data also indicate that rs12936231 lies within a DHS in several cell types, including mammary epithelial cells (HMECs) and prostate adenocarcinoma cells (LNCaPs), and shows a weak enhancer signal in B-lymphoblastoid cells (GM12878).
IFIH1 (interferon induced with helicase C domain 1) (2q24) encodes a member of the DEAD box protein family, whose members have been implicated in a number of cellular processes. The SNP rs17716942, located ∼85 kb upstream of IFIH1, has been reported to be associated with psoriasis in the European population9; however, it is monomorphic in the Han Chinese population, consistent with data from the 1000 Genomes Project. In this study, we identified an intronic susceptibility variant of IFIH1 (rs13431841), indicating that IFIH1 is also a psoriasis susceptibility gene in the Han Chinese population. IFIH1 has also been associated with systemic lupus erythematosus35, type 1 diabetes36 and Graves’ disease37.
RNF114 (ring finger protein 114) (20q13.13) encodes a 2.4 kb transcript and a 25.7 kDa protein. A previous GWAS identified the SNP rs495337, located ∼30 kb upstream of RNF114, as associated with psoriasis in the European population11. This European SNP was also present in our exome sequencing data. In this study, we found that another SNV in this region (rs4647954) is associated with psoriasis in Han Chinese individuals. LD analysis of the two SNPs using our exome sequencing data showed that rs4647954 represents a signal separate from that of the European SNP (D′=0.46, r2=0.08). RNF114 is abundantly expressed in skin, T-lymphocytes and dendritic cells and has a putative role in the regulation of immune responses11.
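The D′ and r2 values used above to decide whether two SNVs mark the same signal can be computed from allele and haplotype frequencies. This is a minimal Python sketch that assumes phased haplotype frequencies are already known; Haploview, which the study used, first estimates them from unphased genotypes, a step omitted here. The input frequencies are illustrative, not estimates from this study, and the second call shows how D′ can equal 1 while r2 stays low when allele frequencies differ.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (|D'|, r^2) for two biallelic SNVs.

    p_ab: frequency of the haplotype carrying allele A at SNV 1 and B at SNV 2.
    p_a, p_b: frequencies of alleles A and B (assumed strictly between 0 and 1).
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    # D is scaled by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Illustrative frequencies only.
print(ld_stats(0.30, 0.30, 0.30))  # complete LD: D' = r^2 = 1 (up to rounding)
print(ld_stats(0.10, 0.10, 0.50))  # D' = 1 (up to rounding) but r^2 = 1/9
```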
For the four SNVs at the four known susceptibility loci (IL12B, IFIH1, ERAP1 and RNF114), the minor alleles identified in the present study were the same in the European and Han Chinese populations, except for rs27043. However, the allele frequencies differed markedly between the two populations, suggesting that these SNVs exhibit allelic heterogeneity between the two ethnic groups (Supplementary Table 3).
In conclusion, we have established three new psoriasis susceptibility loci in a Han Chinese population: NFKB1, 12p13.3 and 17q12. Some of these loci have also been implicated in other autoimmune diseases. Furthermore, we confirmed four previously reported psoriasis susceptibility loci (IL12B, IFIH1, ERAP1 and RNF114) and identified two additional candidate loci (3p21.31 and 17q25). These data, along with data on other reported susceptibility loci, collectively demonstrate the complexity of the heritable contribution to the pathogenesis of psoriasis. Further study will be required if we are to advance our understanding of how the loci identified in this study influence the aetiology of psoriasis.
After sample quality control, 15,207 cases and 17,103 healthy controls were included in this study. All were Chinese individuals enrolled through a collaboration among multiple hospitals in China. The exome sequencing analysis (first stage) included 781 patients with psoriasis and 676 controls (Table 3), mainly selected from previous GWAS samples12. A total of 9,946 psoriasis cases and 9,906 controls (Table 3) were recruited for the targeted sequencing analysis (second stage). The third, genotyping cohort consisted of 4,480 psoriasis cases and 6,521 controls (Table 3). The clinical diagnosis of every individual was verified by at least two dermatologists, and clinical information on the patients was collected through a full clinical workup by professional investigators. All healthy controls were clinically confirmed to be free of psoriasis, autoimmune disorders and systemic disorders and to have no family history of psoriasis or other autoimmune-related disorders (including first-, second- and third-degree relatives). All participants provided written informed consent. The study was approved by the institutional ethics committee of each hospital (The Second Hospital of Anhui Medical University, The First Affiliated Hospital of Anhui Medical University and Huashan Hospital of Fudan University) and was conducted according to the principles of the Declaration of Helsinki.
In the first stage, a total of 1,500 exomes (800 cases and 700 controls) were sequenced at BGI Shenzhen, China. Genomic DNA was extracted using FlexiGene DNA kits (Qiagen), and each library was hybridized to a NimbleGen 2.1M-probe sequence capture array38 to enrich for exonic DNA. Each captured library was then sequenced independently on the Illumina HiSeq 2000 platform (San Diego, CA) to an average depth of ∼34× (ref. 15). To evaluate exon capture efficiency, the proportions of reads mapping to the target regions and to their flanking regions (within 200 bp) were calculated for each individual. Raw image files were processed with the Illumina Pipeline (version 1.3.4) for base calling with default parameters, generating 90 bp reads for each individual.
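The capture-efficiency check (the share of reads on target and within 200 bp of a target) could be computed roughly as follows. This Python sketch is an assumption about how such a summary might be implemented, not the BGI pipeline: reads and targets are hypothetical (chromosome, start, end) tuples with half-open coordinates, and a read overlapping both a target and its flank is counted as on target.

```python
from bisect import bisect_left
from collections import defaultdict

FLANK = 200  # flanking window (bp) on each side of a target, as in the text


def build_index(targets):
    """Merge half-open (chrom, start, end) targets per chromosome.

    After merging, intervals are sorted and non-overlapping, so both the
    start and end lists are increasing and one bisect answers each query.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in targets:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index, chrom, start, end, pad=0):
    """True if [start, end) overlaps a target widened by `pad` bp on each side."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_left(starts, end + pad) - 1  # last target with start - pad < end
    return i >= 0 and ends[i] + pad > start


def capture_summary(reads, targets, flank=FLANK):
    """Return fractions of reads (on target, flank only, off target)."""
    index = build_index(targets)
    on_target = flank_only = 0
    for chrom, start, end in reads:
        if overlaps(index, chrom, start, end):
            on_target += 1
        elif overlaps(index, chrom, start, end, pad=flank):
            flank_only += 1
    total = len(reads)
    off_target = total - on_target - flank_only
    return on_target / total, flank_only / total, off_target / total


targets = [("chr1", 1000, 1100)]  # one hypothetical exon
reads = [
    ("chr1", 1050, 1140),  # overlaps the exon
    ("chr1", 1150, 1240),  # within 200 bp of the exon
    ("chr1", 5000, 5090),  # far from any target
    ("chr2", 1000, 1090),  # chromosome without targets
]
print(capture_summary(reads, targets))  # (0.25, 0.25, 0.5)
```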
In the second stage, we carried out targeted sequencing in a large independent sample of 10,003 cases and 10,002 controls. The target regions consisted of three components: (1) all exons and exon–intron boundaries of the top 742 genes from gene-based association analysis of the exome sequencing data; (2) all exons and exon–intron boundaries of 622 immune disease-related genes from the GWAS catalogue (http://www.genome.gov/gwastudies), including 57 candidate genes within 44 psoriasis susceptibility loci; and (3) regions containing the top 133 non-synonymous SNVs from single-variant association analysis of the exome sequencing data (an average of 100 bp per SNV). The final target regions, covering 1,362 genes (∼3.2 Mb in total), were merged from these three sources, optimized and used to manufacture a SeqCap EZ Library (Roche NimbleGen, Inc.) to enrich the target regions. Targeted sequencing was performed on the Illumina HiSeq 2000 platform with 90 bp paired-end reads, providing an average coverage of 64× for these samples15. As in the first stage, the proportions of reads mapping to the target regions or their flanking regions (within 200 bp) were calculated for each individual to evaluate capture efficiency.
We aligned the reads to the NCBI human genome reference assembly (build 36.3) using the Burrows–Wheeler Aligner39 and performed local re-alignment of the BAM files around known indels using the Genome Analysis ToolKit (GATK v1.6)40. Covariates for all aligned reads were then counted against known SNVs from dbSNP (GATK CountCovariates), and base quality scores were recalibrated (GATK TableRecalibration).
Sample quality control
The sequencing data for every individual were evaluated against the following quality control (QC) metrics: (1) an average sequencing depth ≥10× in the first stage (exome sequencing) and ≥30× in the second stage (targeted sequencing); (2) ≥90% of the target region covered at ≥8×; (3) GC content; (4) heterozygosity of autosomal SNVs; (5) the inbreeding coefficient; (6) population stratification (principal component analysis, PCA)15; (7) pairwise identity by descent (IBD); and (8) concordance with genotyping data. All QC metrics were reviewed for deviations from known or historical norms, and samples that failed QC were excluded from further analyses.
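Of the QC metrics listed above, the two coverage criteria are simple threshold rules. The Python sketch below applies only those two, using the stage-specific thresholds from the text; the SampleCoverage record and its per-base depth list are hypothetical inputs, and the remaining metrics (GC content, heterozygosity, inbreeding, PCA, IBD and genotype concordance) are not modelled.

```python
from dataclasses import dataclass

# Thresholds from the text; the data layout below is hypothetical.
MIN_MEAN_DEPTH = {"exome": 10.0, "targeted": 30.0}
MIN_BASE_DEPTH = 8            # a target base counts as covered at >= 8x
MIN_COVERED_FRACTION = 0.90   # at least 90% of target bases must be covered


@dataclass
class SampleCoverage:
    sample_id: str
    per_base_depth: list[int]  # read depth at every target base (hypothetical input)


def passes_depth_qc(sample: SampleCoverage, stage: str) -> bool:
    """Apply QC criteria (1) mean depth and (2) breadth of coverage at >= 8x."""
    depths = sample.per_base_depth
    mean_depth = sum(depths) / len(depths)
    covered = sum(d >= MIN_BASE_DEPTH for d in depths) / len(depths)
    return mean_depth >= MIN_MEAN_DEPTH[stage] and covered >= MIN_COVERED_FRACTION


uniform_20x = SampleCoverage("S1", [20] * 100)
print(passes_depth_qc(uniform_20x, "exome"), passes_depth_qc(uniform_20x, "targeted"))  # True False
patchy = SampleCoverage("S2", [50] * 85 + [0] * 15)  # high mean depth, only 85% covered
print(passes_depth_qc(patchy, "targeted"))  # False
```

The second example shows why both criteria are needed: a high mean depth can hide poorly covered stretches of the target.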
A total of 43 individuals from the initial stage and 153 individuals from the second stage failed the sample QC procedure and were removed from further analyses.
We generated genotype information for the target and flanking regions in each individual using SOAPsnp (v1.05)41, and we further identified and merged candidate SNVs for which at least one individual was called as a high-quality SNV. We then extracted the genotypes of all samples at these positions to build a genotype matrix as an input for the subsequent analysis. A number of filtering criteria were applied to remove false positive calls, and the data quality and error rates were carefully evaluated.
During exome sequencing and targeted sequencing, variant detection and genotyping were performed with SOAPsnp (version 1.05) in the target regions and their 200 bp flanking regions for each individual. Variant sites at which at least one sample had a confident variant call (genotype quality ≥20, depth ≥8× and variant-allele depth ≥4×) were then collected and merged. Finally, genotype information at these sites was extracted for all samples and combined.
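The site-collection rule (keep a site if at least one sample has a confident variant call, then extract every sample's genotype there) can be sketched in Python as below. The Call record is a hypothetical stand-in for parsed SOAPsnp output, whose actual format is not reproduced here, and treating a confident variant as a non-reference genotype is an assumption.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """Hypothetical per-sample call record (not the SOAPsnp output format)."""
    sample: str
    site: tuple       # (chromosome, position)
    alt_dosage: int   # 0, 1 or 2 copies of the non-reference allele
    gq: int           # genotype quality
    depth: int        # total read depth
    alt_depth: int    # reads supporting the non-reference allele


def is_confident_variant(call: Call) -> bool:
    """Thresholds from the text: GQ >= 20, depth >= 8x, variant-allele depth >= 4x.
    Requiring a non-reference genotype is an assumption of this sketch."""
    return (call.alt_dosage > 0 and call.gq >= 20
            and call.depth >= 8 and call.alt_depth >= 4)


def genotype_matrix(calls, samples):
    """Keep sites where at least one sample has a confident variant call, then
    extract every sample's genotype at those sites (None where no call exists)."""
    sites = sorted({c.site for c in calls if is_confident_variant(c)})
    lookup = {(c.sample, c.site): c.alt_dosage for c in calls}
    return sites, [[lookup.get((s, site)) for s in samples] for site in sites]


calls = [
    Call("S1", ("chr1", 1000), 1, 45, 30, 14),  # confident heterozygote: site kept
    Call("S2", ("chr1", 1000), 0, 60, 25, 0),   # reference call at the kept site
    Call("S1", ("chr1", 2000), 1, 12, 9, 3),    # low GQ and alt depth: site dropped
]
sites, matrix = genotype_matrix(calls, ["S1", "S2", "S3"])
print(sites, matrix)  # [('chr1', 1000)] [[1, 0, None]]
```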
Variant quality control
After the initial SNV calls were generated, we applied further filters to obtain high-confidence SNVs in the exome sequencing and targeted sequencing steps. For the exome sequencing step, (1) the SNV had to lie in the target region or the 200 bp flanking region; (2) the SNV had to be covered by a total depth of at least 4,500× across all individuals (an average of 3× per individual); and (3) at least one individual had to carry >4 reads containing the mutant allele in its genotype file. For the targeted sequencing data sets, we excluded all SNVs with a call rate below 90% (calls with Q20 and depth ≥8× were considered high quality; all others were treated as missing). We further excluded SNVs that failed the allele balance test (at heterozygous positions, reads carrying the major and minor alleles should be balanced; binomial test, P>1.00 × 10−7), the end enrichment test (major and minor alleles should not be enriched at read ends; Fisher’s test, P>1.00 × 10−7) or the strand bias test (major and minor alleles should not be enriched on either strand; Fisher’s test, P>1.00 × 10−7). Finally, only SNVs in regions with a homopolymer length ≤6 bp were retained, and SNVs located within the MHC, in regions of homologous sequence or at known indels (indels identified in the 1000 Genomes Project, extended by 5 bp) were removed.
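The three read-level filters in the targeted sequencing step amount to exact tests with a P>1.00 × 10−7 pass threshold. The Python sketch below implements the exact binomial and two-sided Fisher tests with the standard library (scipy.stats.binomtest and scipy.stats.fisher_exact provide comparable tests). How read counts are pooled across heterozygous samples, and the argument layout of the hypothetical passes_variant_qc helper, are assumptions rather than details given in the text.

```python
from math import comb

P_CUTOFF = 1e-7  # a site passes a test only if P > 1e-7, as in the text


def binom_two_sided_half(k: int, n: int) -> float:
    """Exact two-sided binomial P-value for k of n reads under p = 0.5."""
    tail = min(k, n - k)
    p = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * p)  # the null is symmetric, so double the smaller tail


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Exact two-sided Fisher test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of every table with the observed margins.
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum tables no more likely than the observed one (small tolerance for rounding).
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def passes_variant_qc(het_major, het_minor, strand, ends, call_rate, homopolymer_len):
    """Combine the targeted-sequencing filters for one candidate SNV (hypothetical helper).

    het_major / het_minor: major- and minor-allele read counts pooled over
        heterozygous genotypes (the pooling scheme is an assumption).
    strand: ((major_fwd, major_rev), (minor_fwd, minor_rev)).
    ends:   ((major_at_read_end, major_elsewhere), (minor_at_read_end, minor_elsewhere)).
    """
    return (
        call_rate >= 0.90
        and binom_two_sided_half(het_minor, het_major + het_minor) > P_CUTOFF
        and fisher_two_sided(*strand[0], *strand[1]) > P_CUTOFF
        and fisher_two_sided(*ends[0], *ends[1]) > P_CUTOFF
        and homopolymer_len <= 6
    )


# Balanced site, then the same site with every minor-allele read on one strand.
print(passes_variant_qc(480, 520, ((250, 230), (260, 260)), ((40, 440), (45, 475)), 0.97, 3))
print(passes_variant_qc(480, 520, ((480, 0), (0, 520)), ((40, 440), (45, 475)), 0.97, 3))
```

Exact tests are used here because the cutoff of 1.00 × 10−7 lies far in the tail, where normal approximations can be unreliable.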
During the targeted sequencing stage, we also sequenced 24 CHB (Han Chinese in Beijing) samples from the 1000 Genomes Project so that variants and genotypes called in independent experiments could be compared directly, using the conservative genotype calls provided by the 1000 Genomes Project as the reference. A total of 35,297 variants had genotype calls in both studies for these 24 samples, and the overall concordance rate across all samples was 99.59%. In addition, 27 SNVs from the sequencing data were genotyped by MALDI-TOF mass spectrometry (Sequenom) in 975 samples as a genotype fingerprint check; 99.49% of these individuals showed a genotype concordance ≥99% (ref. 15).
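The concordance figures above are ratios of agreeing to shared genotype calls. A minimal Python sketch, assuming genotypes are stored in dictionaries keyed by hypothetical (sample, site) pairs, computes both the overall rate and the fraction of individuals whose own concordance is at least 99%:

```python
from collections import Counter


def overall_concordance(calls_a, calls_b):
    """Fraction of agreeing genotypes over (sample, site) pairs called in both
    data sets. calls_a, calls_b: dict mapping (sample, site) -> genotype."""
    shared = calls_a.keys() & calls_b.keys()
    agree = sum(calls_a[key] == calls_b[key] for key in shared)
    return agree / len(shared)


def per_sample_concordance(calls_a, calls_b):
    """Concordance computed separately for each individual."""
    total, agree = Counter(), Counter()
    for key in calls_a.keys() & calls_b.keys():
        sample = key[0]
        total[sample] += 1
        agree[sample] += calls_a[key] == calls_b[key]
    return {s: agree[s] / total[s] for s in total}


def fraction_of_samples_above(calls_a, calls_b, threshold=0.99):
    """Share of individuals whose own concordance is at least `threshold`."""
    per_sample = per_sample_concordance(calls_a, calls_b)
    return sum(c >= threshold for c in per_sample.values()) / len(per_sample)


# Hypothetical genotypes coded as alternative-allele dosage.
sequencing = {("S1", "snp1"): 1, ("S1", "snp2"): 0, ("S2", "snp1"): 2, ("S2", "snp2"): 1}
reference = {("S1", "snp1"): 1, ("S1", "snp2"): 0, ("S2", "snp1"): 2,
             ("S2", "snp2"): 2, ("S3", "snp1"): 0}  # S3 is not in both sets
print(overall_concordance(sequencing, reference))        # 0.75
print(fraction_of_samples_above(sequencing, reference))  # 0.5
```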
Annotation of SNVs
Using the RefSeq database, we annotated the obtained SNVs into different functional categories with ANNOVAR42 according to their genetic location and their expected effect on the encoded gene products. In addition, we categorized the SNVs as either known or novel according to their existence in dbSNP (version 135).
In the third stage of the project, we selected the top 22 SNVs with P<0.01 from the targeted sequencing data to be genotyped in an additional 4,480 psoriasis cases and 6,521 controls. Approximately 15 ng of genomic DNA from each sample was used for genotyping. Locus-specific PCR and detection primers were designed using MassARRAY Assay Design 3.0 software (Sequenom) (Supplementary Table 4). After the DNA samples were amplified via multiplex PCR, allele detection was performed through MALDI-TOF mass spectrometry (Sequenom) at the State Key Laboratory of Dermatology, Ministry of Science and Technology, Hefei, Anhui, China. In this case–control cohort, we excluded one SNV showing a call rate of <90% from further analysis. After quality control, 21 SNVs were analysed in a total of 11,001 individuals at the genotyping replication stage.
Single-variant association analysis of SNVs with an MAF >0.5% was performed using a standard case/control association test (basic χ2 allelic test) in PLINK 1.07 (ref. 43). We then counted the number of such variants carried by each patient and control and compared the number of carriers across the three collections using Fisher’s exact tests or Cochran–Mantel–Haenszel tests44. Heterogeneity was assessed using the Breslow–Day test. Allele frequencies in cases and controls, asymptotic P-values and odds ratios (ORs) were generated in parallel. A Manhattan plot was generated by plotting the P-value of each SNV against its genomic location (Supplementary Fig. 2). Quantile–quantile (QQ) plots were generated using R (Supplementary Fig. 3). LD analysis was conducted using Haploview 4.2 (ref. 45).
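For reference, the basic allelic χ2 test, its odds ratio and a Mantel–Haenszel summary across collections can be written as follows. This Python sketch illustrates the textbook formulas rather than PLINK's implementation; it applies no continuity correction, omits the Breslow–Day heterogeneity test, and uses invented allele counts in the usage example.

```python
import math


def allelic_test(a, b, c, d):
    """Basic allelic chi-square test (1 df, no continuity correction).

    a, b: minor / major allele counts in cases; c, d: the same in controls.
    Returns (chi2, P, odds ratio for the minor allele).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of a 1-df chi-square
    return chi2, p, (a * d) / (b * c)


def mantel_haenszel(strata):
    """Pooled OR and CMH chi-square (no continuity correction) over 2 x 2 tables
    (a, b, c, d), one table per collection."""
    or_num = or_den = deviation = variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        or_num += a * d / n
        or_den += b * c / n
        row1, row2, col1, col2 = a + b, c + d, a + c, b + d
        deviation += a - row1 * col1 / n  # observed minus expected count
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    return or_num / or_den, deviation * deviation / variance


chi2, p, odds = allelic_test(60, 40, 40, 60)
print(chi2, odds)  # 8.0 2.25
pooled_or, cmh_chi2 = mantel_haenszel([(60, 40, 40, 60), (55, 45, 45, 55)])
print(pooled_or, cmh_chi2)
```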
Accession codes: Exome sequence data for psoriasis cases and controls have been deposited in GenBank/EMBL/DDBJ Sequence Read Archive (SRA) under the accession code SRA168458.
How to cite this article: Sheng, Y. et al. Sequencing-based approach identified three new susceptibility loci for psoriasis. Nat. Commun. 5:4331 doi: 10.1038/ncomms5331 (2014).
We thank the individuals and their families who participated in this project. This study was funded by the National Science Fund for Excellent Young (81222022), the Program of Outstanding Talents of the Organization Department of the CPC Central Committee, the Key Program of the National Natural Science Foundation of China (81130031), the Local Universities Characteristics and Advantages of Discipline Development Program of the Ministry of Finance of China and the General Program of the National Natural Science Foundation of China (81072461, 30971644, 31171224, 31000528, 81000692, 81071285, 81172866, 81172591, 31200939), the New Century Excellent Talents in University (NCET-11-0889), the Science and Technological Fund of Anhui Province for Outstanding Youth (1108085J10) as well as the Pre-National Basic Research Program of China (973 Plan; 2012CB722404) and the National Basic Research Program of China (973 Plan; 2009CB825404).
Supplementary Data 1: Genotypes of associated SNPs in this study