Mapping of susceptible variants for cold medicine-related Stevens–Johnson syndrome by whole-genome resequencing

Stevens–Johnson syndrome (SJS) and its severe condition with extensive skin detachment and a poor prognosis, toxic epidermal necrolysis (TEN), are immunologically mediated severe cutaneous reactions of the skin and mucous membranes such as the ocular surface. Genetic variations on the HLA-A and other autosomal genes have been identified as risk factors for cold medicine-related SJS/TEN with severe ocular complications (CM-SJS/TEN with SOC). Using a whole-genome sequencing (WGS) approach, we explored other susceptible variants of CM-SJS/TEN with SOC, especially among rare variants and structural variants (SVs). WGS was performed on samples from 133 patients with CM-SJS/TEN with SOC and 418 healthy controls to obtain single nucleotide polymorphisms (SNPs) and SVs. Genome-wide association tests were performed with these variants. Our genome-wide association test reproduced the associations of the common variants of HLA-A and loci on chromosome 16q12.1. We also identified novel associations of SVs on these loci and an aggregation of rare coding variants on the TPRM8 gene. In silico gene expression analysis on the HLA-A locus revealed that the SNP (rs12202296), which was significantly associated with susceptibility to CM-SJS/TEN with SOC, was correlated to an increase in HLA-A expression levels in the whole blood (P = 2.9 × 10−17), from the GTEx database. The majority of variants that were significantly associated with CM-SJS/TEN with SOC were found in non-coding regions, indicating the regulatory role of genetic variations in the pathogenesis of CM-SJS/TEN with SOC.


INTRODUCTION
Stevens-Johnson syndrome (SJS) and its severe condition with extensive skin detachment and a poor prognosis, toxic epidermal necrolysis (TEN) are immunologically mediated acute inflammatory vesiculobullous reactions on the skin and mucous membranes. It manifests as painful blistering skin rashes, fever, and hematologic abnormalities. Many drugs and infectious agents have been reported to trigger these reactions [1][2][3] . Although the annual incidence is only 1-6 cases/million, these reactions show high mortality rates (SJS: 3%; TEN 27%) [1][2][3][4] . Among those who survive, about 40% have been reported to develop severe ocular complications (SOC) 4,5 . Carbamazepine, which is used primarily for treating epilepsy and neuropathic pain, allopurinol, which is used to decrease blood uric acid levels, and cold medicines (CM), including multi-ingredient cold medications and nonsteroidal antiinflammatory drugs (NSAIDs), have been implicated in the development of SJS/TEN 4
We have discovered loci associated with susceptibility to CM-SJS/TEN with SOC among SNPs that were directly genotyped by genome-wide SNP arrays 14 or obtained by genotype imputation 15 . Because genotype imputation can infer virtually all genotypes of common SNPs (minor allele frequency > 5%) 16 , the vast majority of common SNPs among the samples have been tested for association with CM-SJS/TEN with SOC in the previous studies. However, we have not tested the rare SNPs and structural variants (SVs) that could not be analyzed by SNP array genotyping. These variants may affect the susceptibility of the individual to develop complex traits and diseases. Indeed, the functional impact of structural variations is known to be higher than that of single nucleotide alterations 17 . Further, a recently constructed comprehensive map of gene expression has shown that the contribution of SVs to gene expression is not negligible 18 . In this study, we performed a comprehensive analysis of variations by wholegenome sequencing (WGS) of patients with CM-SJS/TEN with SOC to investigate the impact of rare variants and SVs on the development of this disease.

SNP and SV discovery from WGS data
We performed WGS of 133 patients with CM-SJS/TEN with SOC and 418 control individuals. Manipulation of DNA library and sequencing of all samples were performed in a single facility. Average read depths of autosomal chromosomes were 41.6 and 41.4 for case and control samples, respectively. We performed the discovery and genotyping of polymorphic variants using four different tools (Supplementary Table 1). After applying quality control (QC), we identified 21,207,465 variants. The most common form of variants was SNV (18,824,284). In addition, we detected various forms of structural variations (SVs): short insertion and deletions (INDEL), large deletions (DEL), duplications (DUP), inversions (INV), mobile element insertions (MEI), and short tandem repeats (STR). INDELs (1,702,192) were the most prevalent SVs, followed by STRs (670,775), MEIs (7821), DELs (2065), DUPs (273), and INVs (55). The number of variants per individual was uniform across case and control samples. In addition, no significant population stratification was observed between case and control samples by the principal component analysis ( Supplementary Fig. 1).
Gene-based test for rare variants Accumulation of rare (MAF < 1%) protein-altering mutations was assessed by Wald's test and SKAT-O test. A total of 17,315 proteincoding genes with at least one rare variant were assessed in this analysis. Although no gene exhibited significant association after Bonferroni correction (p < 0.05/17315 = 2.89 × 10 −6 ), the association shown by one and four genes were suggestive (p < 5.77 × 10 −5 ), according to the results of the Wald's test and SKAT-O test, respectively ( Table 2). The transient receptor potential cation channel subfamily M member 8 (TRPM8) gene was significant in both tests. Among 11 missense single nucleotide variants (SNVs) and two stop-gained SNVs detected in TRPM8, eight were detected in CM-SJS/TEN with SOC patients, while two were detected in control samples and three were shared between case and control samples (Supplementary Data 2).
The effects of CM-SJS/TEN with SOC-susceptibility polymorphisms in HLA-A and chromosome 16q12.1 on gene expression levels The endogenous expression level of HLA-A was strongly associated with polymorphisms in the HLA-A locus in a cis manner (Fig. 3a). These associations were probably caused by linkage disequilibrium (LD) with primary functional variants that regulated the efficiency of HLA-A expression levels. To assess the regulatory effects of HLA-A polymorphisms significantly associated with CM-SJS/TEN-susceptibility on gene expression, the differences in HLA-A expression levels in whole blood were compared among these SNPs using the GTEx portal database (V7) 19 . Among the SNPs which showed the top-100 of the strongest associations with susceptibility to CM-SJS/TEN with SOC, 48 SNVs were included in the SNPs which showed the top-100 of the strongest associations with the allelic difference of HLA-A expressions (Fig. 3b, c). Among these SNVs, the disease susceptibility allele of rs12202296, which showed both significant association with susceptibility to CM-SJS/ TEN with SOC and endogenous expression levels of HLA-A in whole blood, was associated with an increase in HLA-A expression levels (P = 2.9 × 10 −17 , Fig. 3d). These results indicated that the significant association of the HLA-A locus with susceptibility to CM-SJS/TEN with SOC was caused by the regulatory effects of primary functional polymorphisms (including SNVs, INDELs, and SVs) located near HLA-A; further, the expression level of HLA-A was up-regulated by the disease susceptibility-associated HLA-A polymorphism. We explored the functional effects of polymorphisms that were significantly associated with susceptibility to CM-SJS/TEN with SOC in chromosome 16q12.1 (Fig. 2b). The association between allelic differences of the expression levels of all genes expressed in the whole blood and the SNPs associated with susceptibility to CM-SJS/TEN with SOC were assessed using the GTEx portal database 19 . The disease susceptibility allele rs6500265, one of the SNPs most strongly associated with CM-SJS/TEN with SOC susceptibility, was also associated with a decrease in the expression level of bromodomain-containing 7 (BRD7) (P = 0.000018, Fig. 3e).

DISCUSSION
Previous studies on CM-SJS/TEN identified common SNPs and HLA alleles as susceptibility polymorphisms. In this study, we investigated the role of rare and structural variants in the development of CM-SJS/TEN with SOC. So far, the association of these variants with disease susceptibility has been difficult to assess due to technical limitations of SNP array-based GWAS. Therefore, we performed WGS on 133 patients and 418 control subjects, who were previously analyzed by SNP arrays. Extensive variant discovery, followed by joint genotyping of case and control samples, enabled us to test the associations of SNPs and SVs in a genome-wide manner. Unlike other genotyping methods such as SNP array or HLA allele typing, WGS allows the evaluation of the disease susceptibilities of all polymorphisms in the human genome. This allows the direct construction of a SNP catalog on both disease susceptibility and allelic differences in gene expressions by combining the WGS data with data from the eQTL database; it also helps us understand the functional mechanisms underlying disease susceptibility. In this study, we used this in silico fine-mapping approach to explore the regulatory role of gene expression in HLA-A and BRD7 loci (Fig. 3). However, it is difficult to adopt the study design in which a large number of samples are enrolled to gain enough statistical power in the GWAS of rare diseases like SJS. Therefore, we could not repeat the association test with replication cohorts. In the future, it is necessary to perform genome-wide metanalysis between replication cohorts from different populations to reinforce the associations found in this study. WGS has the advantage of being able to analyze almost all variants, but it is necessary to consider the increase in potential type 1 errors. This study uses a stringent 5 × 10 −8 as the genome-wide significant p value, but there remains the potential of false positives and false negative findings.
HLA class I genes, including HLA-A, HLA-B, and HLA-C, have been reported as the loci most strongly associated with susceptibility to all types of SJS/TEN, including CM-SJS/TEN with SOC. Although non-synonymous substitutions that influence the peptide binding or conformation of HLA molecules have been considered the major factors in the pathogenesis of immunological diseases, our results show that the expression levels of HLA genes also play an equally critical role. Indeed, differential expression levels of HLA-C in different alleles have been reported, with higher HLA-C expression causing an enhanced response of Tc and a deleterious effect in Crohn's disease 20 . In addition, the expression level of HLA-A has been reported to be regulated in an allele-specific manner because of methylation at several CpG sites, depending on HLA-A alleles 21 . In the present study, we constructed a SNP catalog on both CM-SJS/TEN with SOC susceptibility and allelic differences in gene expression levels by combining WGS and eQTL and elucidated the effects of SNPs associated with susceptibility to CM-SJS/TEN with SOC on HLA-A gene expression levels (Fig. 3b, c). Among these SNPs, the alleles associated with susceptibility to CM-SJS/TEN with SOC showed high HLA-A expression levels (Fig. 3d); thus, these alleles probably enhance the response of Tc as well as HLA-C 20 .
The association of SNPs with expression levels of HLA-A is likely caused by linkage disequilibrium with primary functional SVs that directly regulate HLA-A expression. For identifying primary functional polymorphisms, including SNPs and SVs, future studies could compare the HLA-A expression levels in cells that are knocked-in with different candidate functional alleles using geneediting technologies such as the CRISPR/Cas9 system. We had recently used a similar approach to identify primary functional variants in the primary biliary cholangitis (PBC)-susceptibility gene loci NFKB1/MANBA 22 .
We had previously used GWAS to confirm that chromosome 16q12.1 was associated with susceptibility to CM-SJS/TEN with SOC 15 . In the present study, we constructed a SNP catalog of chromosome 16q12.1 on both CM-SJS/TEN with SOC susceptibility and allelic differences in gene expression levels by combining WGS and eQTL analysis. We found that the gene BRD7 showed a significant association between gene expression level and rs6500265, a SNP that was one of the most strongly associated with susceptibility to CM-SJS/TEN with SOC. BRD7 is a cell growth inhibitor that causes cell cycle arrest and apoptosis by directly regulating BIRC2, BIRC3, TXN2, and NOTCH1 23-25 . A recent study had reported that BRD7 could repress the Ras/Raf/MEK/ERK pathway and inflammatory cytokines such as interleukin 6 (IL-6) and tumor necrosis factor α (TNFα) 26,27 . In the present study, BRD7 expression levels were lower in samples with allele of rs6500265, a CM-SJS/TEN-susceptibility SNP in chromosome 16q12.1 (Fig. 3e), according to the GTEx database. Lower expression levels of BRD7 could possibly attenuate the anti-inflammatory effects and may induce the explosion of inflammatory cytokines in patients with CM-SJS/TEN with SOC.
We successfully reproduced the CM-SJS/TEN with SOC variants that have been previously reported in the Japanese population. Two loci (HLA-A and chromosome 16q12.1) exhibiting genomewide significance (p < 5 × 10 −8 ) have been previously reported as associated loci of CM-SJS/TEN with SOC. In this study, the most significant association of rs9933632 at 16q12.1 was obtained by direct SNP calling from WGS analysis; we have also reported the association of this SNP by whole-genome imputation analysis 15 . This suggested that whole-genome imputation could cover a vast majority of common variants present in a reference panel, if a haplotype reference panel was matched to the study population 16 . In this study, we focused on SVs and rare variants that could not be imputed due to the limitations of the reference panel. In addition to the common SNPs in genes that have been reported to be associated with susceptibility to CM-SJS/TEN with SOC, novel candidate susceptibility variants were identified among SVs and rare variants. Cadherin 12 (CDH12) is a member of the cadherin gene family involved in calcium-dependent homophilic cell-cell adhesion 28 . In the CDH12 locus, an STR variant exhibited the strongest association with susceptibility to CM-SJS/TEN with SOC (Fig. 2d). Although the contributions of this molecule to immunological reactions and inflammations is unknown, CDH12 is known to affect vascular permeability in response to inflammation. Further research should be performed to elucidate the molecular mechanisms of disease onset attributed to the STR in the CDH12 locus, which showed suggestive association with susceptibility to CM-SJS/TEN with SOC in the present study. The rare variant association test identified the aggregation of rare variants that alter the peptide sequences of gene products ( Table 2). The gene transient receptor potential melastatin 8 (TRPM8) encodes an ion channel that acts as a cold receptor involved in thermosensation 29,30 . The activation of TRPM8 has also been reported to inhibit UV-B-induced production of prostaglandin E 2 (PGE 2 ), an inflammatory cytokine, in keratinocytes 31 . This result indicated that TRPM8 was related to inflammatory responses in the skin. The expression of EP 3 , an inhibitory receptor of PGE 2 , was found to be significantly reduced in the conjunctival epithelium on the ocular surface of patients with SJS/TEN with SOC 32 . Similar to such previous reports, rare variants whose allele frequencies are increased in SJS/TEN patients in this study might also modify the function of TRPM8 and might have effects on the production of PGE 2 . PARD3 (also known as Par3) is an adapter protein localized in the epithelial tight junction;  it plays a critical role in it by forming a complex with PARD6 (also known as Par6) and atypical protein kinase C (aPKC) 33,34 . There is evidence that PARD3 contributes to the acceleration of vascular permeability in response to inflammation. Vascular endothelial (VE)-cadherin, Pals1, and PARD3 regulates endothelial polarity and vascular lumen formation 35 . Additionally, in an in vitro study using human monocytic cell line THP1, PARD3 was revealed to contribute to cell migration in response to gradients of inflammatory chemokines 36 . Further, the PARD3-PARD6-aPKC complex has been shown to be downregulated by TNF-α signaling in epithelial cells and in vivo during intestinal inflammation 37 . Therefore, the rare variants in PARD3 detected here may have a role in acute inflammation via abnormal functioning of tight junctions in SJS/TEN with SOC patients. We also confirmed the known associations of rs897693 (OR = 4.39, p = 2.74 × 10 −6 ) and rs4917014 (OR = 0.54, p = 2.08 × 10 −5 ) associated with differential splicing isoforms of IKZF1 14 . We previously performed a meta-analysis of several ethnic groups (not only Japanese, but also Korean, Indian, and Brazilian) and reported that the IKZF1 gene, especially rs4917014 SNP (OR = 0.5, p = 8.46 × 10 −11 ) 14 , had a significant genome-wide association with CM-SJS/TEN with SOC. In addition, we recently found significant association between CM-SJS/TEN with SOC and IKZF1 gene including rs4917014 SNP in Thai population (in submission). We suggest that IKZF1 might be a universal marker for susceptibility to CM-SJS/TEN with SOC 14 . Our functional analysis of SNPs of the IKZF1 gene including rs4917014 revealed that the ratio of the splicing isoforms Ik2/Ik1 could be affected by IKZF1 SNPs and we found that the quantity of Ik2 isoform, which might dominant-negative, is increased in disease-protective genotypes of IKZF1 (rs4917014 G/G) 14 , suggesting that the function of Ikaros, the protein of IKZF1, is enhanced in CM-SJS/TEN with SOC 14 . Moreover, we produced K5-Ikzf1-EGFP transgenic (Ikzf1-Tg) mice by introducing the Ik1 isoform into cells expressing keratin 5, which was expressed in epithelial tissues such as the epidermis and conjunctiva and found that mucocutaneous inflammation was exacerbated in Ikzf1-Tg mice, suggesting that IKZF1 could play a critical role in maintaining mucocutaneous homeostasis 38 .
We previously performed HLA analysis of CM-SJS/TEN with SOC and reported that in Japanese, HLA-A*02:06 and HLA-B*44:03 were significantly associated with CM-SJS/TEN with SOC 12 , and these associations were not observed in Japanese cases of CM-SJS/TEN without SOC and cold medicine-unrelated (other medicinerelated) SJS/TEN with SOC 12 . Therefore, there might be different genetic predisposition between CM-SJS/TEN with SOC and cold medicine-unrelated (other medicine-related) SJS/TEN with SOC.
Moreover, we also reported that PGE 2 acts at EP 3 in epidermis and mucosal epithelium and negative regulates mucocutaneous inflammation. CM such as acetaminophen, NSAIDs, and multiingredient cold medications similarly down-regulate PGE 2 which suppress mucocutaneous inflammation. We also suspect that not only CM but also some viral or microbial infections might be important to develop CM-SJS/TEN with SOC. When individuals with a genetic background containing SJS/TEN with SOC susceptibility factors are infected by some viral or microbial infection, they develop abnormal immune responses, and then the administration of cold medicine can down-regulate PGE 2 inflammatory suppressing mechanism and might augment abnormal immune response resulting in the induction of SJS/ TEN with SOC. In contrast, individuals with no genetic susceptibility factors develop a normal immune response upon microbial infection, and the administration of cold medicine has no effect outside of its normal function 36 .
To the best of our knowledge, this is the first study on CM-SJS/ TEN with SOC that employed the WGS approach. This enabled extensive discovery of variants, including common and rare SNVs and a variety of SVs. Genome-wide association testing for these Partial coincidence between the 100 SNPs with the strongest association with susceptibility to CM-SJS/TEN with SOC (dots shown in yellow) and that with endogenous expression levels of HLA-A in whole blood (dots shown in blue). d Significant association between rs12202296 and the expression level of HLA-A in whole blood. CM-SJS/TEN with SOC-susceptibility allele rs12202296 is denoted as the C allele. e Significant association between rs6500265 and the expression level of BRD7 in whole blood. CM-SJS/TEN with SOC-susceptibility allele rs6500265 is denoted as the T allele.
variants revealed the regulatory role of common variants, as well as the contribution of rare and structural variants in the development of CM-SJS/TEN with SOC. This study has shed light on the genomic architecture of immune-mediated complex disease and will hopefully be a foundation for more such studies in the future.

Study subjects
In total, 133 patients with CM-SJS/TEN with SOC and 418 healthy volunteers were enrolled in this study. All participants are Japanese descent. DNA sample from 133 patients and all control samples have collected in our previous study 15 . The protocol of the study was approved by the institutional review board at Kyoto Prefectural University of Medicine and the University of Tokyo. Adult patients provided written informed consent, and if the patients were minors (<20 years old), parental consent was obtained. Diagnosis of SJS/TEN by the ophthalmologists was based on a confirmed history of acute onset of high fever, serious mucocutaneous illness with skin eruptions, and involvement of at least two mucosal sites including the ocular surface 39 . We defined patients with SOC as those who manifested pseudomembranes and epithelial defects on the ocular surface (cornea and/or conjunctiva) in the acute stage 40 , and as patients with ocular sequelae such as severe dry eye, trichiasis, symblepharon, and conjunctival invasion into the cornea in the chronic stage 41 .
We have classified the patients who had taken cold medicines such as NSAIDs and multi-ingredient cold medications for a few~several days before the disease onset for common-cold symptoms as CM-SJS/TEN. The specific drugs they used were not named by all patients 15 . In Japan, doctors in hospital usually prescribed cold medicine such as NSAIDs and acetaminophen with some antibiotics. However, we previously reported that the use of cold medicine such as NSAIDs and cold-remedies was significantly associated with SJS/TEN with SOC, while the use of antibiotics was not associated 6 .

Sample preparation and WGS
DNA library preparation and WGS were performed by a contracted laboratory (Toshiba Inc., Tokyo, Japan). DNA libraries were prepared using the TruSeq ® DNA PCR-Free Sample Prep Kit (Illumina, San Diego, CA), according to the manufacturer's instructions. WGS was performed using the Illumina HiSeqX (Illumina, San Diego, CA) with 150-bp paired-end reads. The resultant fastq-formatted raw sequence reads were then shipped to our laboratory.

Discovery and genotyping of polymorphisms
Discovery of SNVs and short INDELs was performed using the Haplotype-Caller implemented in GATK3.8, according to the recommended protocol 43 . Genotype calling was jointly carried out by GenotypeGVCFs. Variant quality score recalibration (VQSR) was applied to the raw genotype call set using the HapMap and International 1000 Genomes Omni2.5 sites as true datasets, the high-confidence SNPs of 1000 G sites as training datasets, and the dbSNP138 sites as known datasets. All variants that passed the 99.9% sensitivity filter were used for subsequent analyses.
Discovery and genotyping of MEI were performed using MELT version 2.1.4 46 , according to the author's instructions. The filtering criteria provided by MELT were applies, and loci without PASS in the FILTER field of the VCF file were excluded from downstream analyses. STR variants were analyzed using HipSTR v0.6.1 47 using the reference files of STR regions provided by the author. PCR stutter models and STR alleles were estimated without external references. STR loci were filtered using the filter_haploid_vcf.py program bundled with HipSTR with options (--min-call-qual 0.9 --max-call-flank-indel 0.15 --max-call-stutter 0.15).

Association test for single variants
Association tests of case/control samples were conducted for SNVs, short INDELs, and other SVs. The following filters were applied before the association tests: call rate < 95%, minor allele frequency (MAF) < 1% in cases and controls, and multi-allelic loci except for STRs, low-complexity regions, or Hardy-Weinberg equilibrium (HWE) p value < 1 × 10 −5 . For STR multiallelic loci, we filtered out loci in which the frequency of the most common allele was larger than 99%, instead of the MAF < 1% criterion. Statistical association was tested under the null model in which there are no difference in the allele count between case and controls for each variant. For biallelic variants (SNVs, short INDELs, DUPs, DELs, INVs, and MEIs), chi-squared tests with 2 × 2 contingency tables were performed using plink1.9 48 . For STR loci, we tested the independence of allele counts between the case and control samples using chi-squared tests with 2 × n contingency tables, where n is the number of alleles in a locus calculated by the in-house program.
Association test for rare variants EPACTS 3.2.6 (https://github.com/statgen/EPACTS) was used for genebased tests. Variants with call rates < 97%, MAF > 5%, or HWE p value < 1 × 10 −5 were excluded from this analysis. We evaluated the cumulative effect of rare variants that affect the protein peptide (non-synonymous or stop gain) or RNA splicing (splice site) (in which the variant impact was flagged as "HIGH" or "MODERATE" by the SNPeff program 49 ) with the hg19 dataset using Wald's test and optimal sequence kernel association test (SKAT-O) 50 . These high-impact variants were grouped by gene.