Gene mapping and functional annotation of GWAS of oral ulcers using FUMA software

Oral ulcers not only influence the physical health of patients, but they also interfere with their quality of life. However, the exact etiology of oral ulcers is not clear. To explore the roles of genetic factors in oral ulcers, a genome-wide association study of the condition in European individuals was re-evaluated by the FUMA v1.3.5e online tool. A total of 380 independent significant single nucleotide polymorphisms (SNPs) and 89 lead SNPs were identified in 34 genomic risk loci. Out of these identified genomic risk loci, 280 possible causal genes were pinpointed by positional mapping and expression quantitative trait locus mapping. Among these genes, 216 novel genes were identified. Furthermore, some genomic loci were mapped to a single gene. Functional annotation of these prioritized genes revealed that the immune response pathway was implicated in the onset of oral ulcers. Overall, our findings revealed novel possible causal genes and demonstrated that the immune response has a crucial role in the occurrence of oral ulcers.

Scientific RepoRtS | (2020) 10:12205 | https://doi.org/10.1038/s41598-020-68976-2 www.nature.com/scientificreports/ information 14 . Until now, some tools for bioinformatics analyses have been developed to aid identification of the causal variants associated with traits/diseases [15][16][17] . Nonetheless, some packages also have some defects when perform post-GWAS analysis. For example, DEPICT and INRICH tools do not take local LD into account, which might induce false-positive enrichment analysis. To provide a highly efficient, concise, and easy-to-use tool, Posthuma et al. developed an Internet-based program named FUMA v1. 3.5e (https ://fuma.ctgla b.nl/) that can further explore GWAS data by utilizing multiple biological databases 18 . Furthermore, FUMA can simultaneously carry out functional annotation of candidate SNPs, gene mapping, tissue-expression analysis of prioritized genes, gene set enrichment analysis (GSEA), and interactive visualization. Posthuma and colleagues re-analysed the GWAS results for Crohn's disease, schizophrenia and body mass index by FUMA. They found that FUMA not only validated known candidate genes in these traits, it also identified some additional putative causal genes by eQTL mapping and chromatin interaction mapping 18 . Taken together, FUMA could undertake robust and reliable post-GAWS analyses and provide valuable clues for understanding the genetic mechanism of traits/diseases. We wished to further explore the genetic mechanism of oral ulcers. Previously reported GWAS summary data of oral ulcers 19 were integrated with the published database by FUMA 18 . First, the most likely causal genes associated with oral ulcers were identified by a combination of positional mapping and eQTL mapping. Next, these prioritized genes were dissected to reveal their molecular function and implicated biological pathways in oral ulcers.

Methods
Summary statistics of GWAS for oral ulcers. The GWAS summary data of oral ulcers used in the present study were downloaded from the GWAS ATLAS database 20 . The detailed criteria of sample screening and quality control of SNPs have been presented previously 19 . After quality control of data, 10,599,054 SNPs were used to carry out post-GWAS analyses of oral ulcers. All participants provided their written informed consent. Ethics approval involved in these participants was obtained from the North West Centre for Research Ethics Committee (11/NW/0.382) 19 . All methods used in this study were carried out in accordance with the declaration of Helsinki. The general information of oral ulcers GWAS used in this study was given in Supplementary Table 1.
Identification of genes and their roles in oral ulcers using FUMA. Definition of genomic risk loci based on oral ulcers GWAS. Independent significant SNPs for which P < 5 × 10 −8 and r 2 < 0.6 were identified from GWAS results. Lead SNPs were defined further from these independent significant SNPs if pairwise SNPs had r 2 < 0.1. Genomic risk loci in which SNPs were in LD (r 2 > 0.6) with independent significant SNPs were identified. The maximum distance between LD blocks to merge into a genomic locus was 250 kb. The genetic data of European populations in 1000G phase3 21 were viewed as reference data to conduct LD analyses. Besides, 24 SNPs reported by Dudding and colleagues 4 which reached GWA significant P-values and displayed the same effect directions in different independent populations were defined as lead SNPs, as shown in Supplementary  Table 2.
Gene mapping. We used two methods to map SNPs to genes. First, CADD scores are deleterious scores of genetic variants obtained by 63 functional annotations. Kircher and colleagues proposed that 12.37 could be viewed as the threshold for a deleterious score 22 . Therefore, SNPs were filtered based on a CADD score > 12.37 when undertaking positional mapping. Then, genes in each genomic risk locus were determined by screened SNPs if the physical distance between a SNP and gene was < 10 kb. Second, for eQTL mapping, SNPs were mapped to a gene if these SNPs had significant effects on expression of the gene. eQTL data of 27 tissues (single-cell RNA eQTL 23 , Database of Immune Cell Expression (DICE) 24 , Biobank-Based Integrative Omics Study (BIOS) QTL browser 25 , and GTEx v8 Whole Blood and Minor Salivary Gland 26 ) were used for eQTL mapping. Only significant eQTL values (false discovery rate (FDR) < 0.05) were employed to map SNPs to genes.
Expression (transcripts per million) of prioritized genes in different tissues was estimated from GTEx v8 26 following winsorization at 50 and log 2 transformation with pseudocount 1.
GSEA by the GENE2FUN tool in FUMA. Using hypergeometric tests, the possible biological functions of the genes identified by positional mapping and eQTL mapping were explored further by comparing them with genes in the GWAS catalog 27 , as well as gene sets in WikiPathways 28 and the Molecular Signatures Database (MsigDB) v7.1 29 . Overall, there were 20,260 background genes which were applied to GSEA. Gene sets were reported if they met two criteria: (i) at least two prioritized genes belonged to the gene set; (ii) the adjusted P value of the gene set was < 0.05. The Benjamini-Hochberg correction 30 was used to assess the statistical significance of inputted gene sets. Next, genetic correlation analyses between oral ulcers and other available traits/diseases were undertaken by the LD hub 31 .

Results
Summary results of GWAS of oral ulcers by FUMA. The summary statistics of GWAS for oral ulcers were explored further by FUMA (Table 1). A total of 380 independent significant SNPs and 89 lead SNPs were identified from GWAS of oral ulcers by FUMA (Supplementary Tables 3, 4). For these 89 lead SNPs, we found one stop gained variant, one splice region variant, one non-coding transcript exon variant, three missense variants, five untranslated region variants, 14 regulatory region variants, 24 intergenic variants, and 40 intron variants (Supplementary Table 4). Dudding et al. conducted GWAS of oral ulcers based on the UK Biobank Project, and found 97 independent lead variants 4 . There were 37 lead SNPs which we also found in comparison with the 97 variants discovered by Dudding et al. (Supplementary Table 4). Furthermore, these 89 lead SNPs could be classified into 34 genomic risk loci (Table 2 and Supplementary Table 5). Compared with the 33 risk loci identified in the original study by Bycroft and colleagues 19 , two novel genomic loci (genomic loci 16 and 18) were identified in the present study. We also plotted the results of these 34 genomic risk loci (Fig. 1 www.nature.com/scientificreports/ that most SNPs and genes were mapped in chromosome 6, followed by chromosome 3 and chromosome 17. Furthermore, 280 prioritized genes that may be involved in the biological mechanism of oral ulcers were recognized by FUMA (Supplementary Table 6). Among these genes of interest, 183 genes and 81 genes were located inside and outside genomic risk loci, respectively. www.nature.com/scientificreports/ Gene prioritization. We pinpointed 280 possible causal genes involved in the genetic etiology of oral ulcers by positional mapping and eQTL mapping (Supplementary Table 6). Even though FUMA provides a chromatininteraction method to map SNPs to genes, chromatin-interaction data related to oral ulcers in FUMA are absent. Therefore, we did not conduct chromatin interaction mapping. For positional mapping, there were 143 genes in 20 genomic risk loci identified by deleterious SNPs. Besides, 262 genes in 28 genomic regions were pinpointed by eQTL mapping. Among these prioritized genes, 125 genes were identified by both deleterious SNPs and eQTL. Out of 34 genomic loci, there were four loci in which no genes were identified; the remaining loci were mapped to at least one candidate gene; nearly half of candidate genes were located in genomic locus 11 (Supplementary Table 6). Compared with findings by Dudding and colleagues 4 , we pinpointed 216 novel genes. Furthermore, we found that six genomic loci were mapped to a single gene (GTDC1 in genomic locus 5, NDUFAF2 in genomic locus 10, IFNGR1 in genomic locus 13, NSMCE2 in genomic locus 20, BLID in genomic locus 23, and CEBPB in genomic locus 34), implying that associations between these loci and oral ulcers were likely attributed to these genes. Furthermore, NDUFAF2, CEBPB and BLID were genes identified for the first time in the present study. NDUFAF2 encodes a complex I assembly factor, which facilitates the translocation of protons from across to inside the mitochondrial membrane. NDUFAF2 was pinpointed by eQTLs in naïve CD8 T cells, indicating that an SNP may affect NDUFAF2 expression in CD8 T cells, which further alters the functions of CD8 T cells. CEBPB, which was identified by eQTLs in whole blood cells, encodes transcription factors that include a basic leucine zipper domain; CEBPB is involved mainly in the regulation of genes related to immune and inflammatory responses. Therefore, abnormal expression of CEBPB might lead to dysregulation of the immune and inflammatory response, and then increase the risk of contracting an oral ulcer. BLID, which was identified by a deleterious SNP, encodes the BH3-like motif acting on cell death. Therefore, BLID may affect the risk of contracting an oral ulcer by regulating cell death. More importantly, we found that GTDC1 in genomic locus 5 was recognized by both deleterious SNPs and eQTL mapping. Therefore, a regional plot of genomic locus 5 was carried out (Fig. 2). Results revealed that GTDC1 was prioritized. In the study by Dudding and colleagues, GTDC1 was also identified by DEPICT software, indicating that GTDC1 might be implicated in the genetic basis of oral ulcers 4 .  Table 7). Most genes showed consistent expression in these 30 tissues. Moreover, 155 genes showed relatively high expression (> 2.84) in salivary-gland tissues.
GSEA. GSEA was undertaken to test the possible biological mechanisms of 280 candidate genes implicated in oral ulcers (Supplementary Table 8). A total of 793 gene sets with an adjusted P < 0.05 were identified against 20,260 background genes. Among these gene sets, the most significant gene set was the gene set involved in the autism spectrum disorder or schizophrenia (adjusted P value = 5.5392E-112), followed by other gene sets involved in other diseases. Furthermore, we found that the pinpointed 280 genes showed strong enrichment signals in gene sets related to the immune response and cytokine regulation, for example: GO_POSITIVE_REGU-LATION_OF_IMMUNE_RESPONSE (adjusted P values = 1.  Table 8). Even so, strong enrichment signals in some T-cell regulatory gene sets were observed in our study and in the study by Dudding and colleagues 4 .
To assess further the genetic associations of oral ulcers and other traits, genetic correlation analyses between oral ulcers and these traits were conducted (Fig. 3 and Supplementary Table 9). We found significant positive correlations between oral ulcers and neuroticism, allergic disease (asthma, hay fever or eczema), depression, monocyte percentage of white cells, and asthma (adult onset). We observed significant negative correlations between oral ulcers and height, moderate to vigorous physical activity, white blood cell count, and granulocyte www.nature.com/scientificreports/ percentage of myeloid cells (P < 0.05). However, there were no significant associations between oral ulcers and asthma (adult onset), height or moderate to vigorous physical activity after using the Bonferroni correction.

Discussion
The factors implicated in RAS are incompletely understood. Possible risk factors are genetic susceptibility, stress, immune-related diseases, as well as a lack of vitamins and minerals 1 . Lake and colleagues assessed the etiology of RAS. They collected incidence information of RAS in twins and their parents. They found that genetic factors contributed to > 60% of variations in RAS onset 32 , implying that genetic factors had pivotal roles in RAS.
Using the UK Biobank Project, Bycroft and colleagues collected the phenotypic data of 500,000 individuals and evaluated associations between genetic data and several phenotypes. For RAS, they found that many genetic variations were involved in the risk of suffering from an oral ulcer 19 . In the present study, to further explore the genetic architecture of oral ulcers, post-GWAS analyses of oral ulcers were carried out to pinpoint possible causal variants and genes using FUMA. Using FUMA, we pinpointed 34 genomic risk loci, including 89 lead SNPs and 380 independent significant SNPs, from GWAS of oral ulcers. Next, these 280 prioritized genes were identified from these 34 genomic risk loci by positional mapping and eQTL mapping. When comparing our results with those of other scholars 4,19 , we not only validated some previous findings, we also obtained novel insights into the genetic architecture of oral ulcers. For instance, 216 out of 280 prioritized genes were not reported in the study by Dudding and colleagues 4 . For these novel identified genes, we found that most genes were located in the human leukocyte region (HLA) region. Genes in the HLA regions might be in strong LD due to complicated LD structure of HLA regions. Thus, most novel genes were found in the HLA region. Moreover, we identified 83 novel genes in other regions. Among these novel prioritized genes, 121 had relatively high expression (> 2.84) in salivary gland tissues, which implied that these genes might have important roles in the onset of oral ulcers. Furthermore, six single genes were found in six genomic regions, respectively. Four genes (NDUFAF2, IFNGR1, NSMCE2 and CEBPB) showed high expression in most tissues. GTDC1 showed intermediate expression in these tissues, whereas BLID showed low expression in all tissues. Therefore, BLID might exert little influence on the onset of oral ulcers given its low www.nature.com/scientificreports/ expression in these tissues. Further functional validations of these novel genes must be conducted to unveil the effect of these genes in oral ulcers. GSEA of pinpointed genes revealed that most gene sets implicated in immune-related biological processes had been identified (Supplementary Table 8). Scholars have pointed out that Th1 type immunologic response is closely related to RAS development, and that RAS patients showed higher secretion of Th1 cytokines than that in healthy controls [33][34][35] . Dudding et al. explored the enrichment patterns of genetic variants related to oral ulcers in regulatory motifs using GARFIELD software. They found that these variants were enriched significantly in DNAsel hypersensitive sites in many T cells, and concluded that identified genes showed tissue-specific expression 4 . We identified GO_T_CELL_ACTIVATION, GO_T_CELL_PROLIFERATION, GO_REGULATION_OF_T_ CELL_ACTIVATION and other T cell-related biological processes, implying that T cells have a crucial role in RAS onset. Furthermore, changes in the microbiome community within the oral cavity have also been viewed as risk factors for oral ulcers 36,37 . Dudding and colleagues pointed out that genetic loci associated with oral ulcers might induce oral ulcers by affecting host microbiome compositions; also, the susceptibility of non-infective factors for oral ulcers was influenced by these genetic loci 4 . Therefore, we inferred that genetic variants in genomic risk loci might interfere with the function of immune-related genes, which increases the risk of contracting oral ulcers by eliciting immune response disorders or affecting other risk factors.
Here, we noted genetic correlations between oral ulcers and other traits, especially oral ulcers and neuroticism and depression. Dudding et al. also found significant positive associations between oral ulcers and neuroticism and depression. Furthermore, they conducted local genetic-correlation analyses between oral ulcers and these two traits by the rho-HESS method, and their results revealed that genetic correlations among these traits was scattered evenly in the whole genome 4 . We inferred that there might be a shared genetic architecture among oral ulcers, neuroticism and depression that contributed to these diseases. Besides, the pleiotropy of complex diseases/traits might also lead to their genetic correlations 20 .
In this study, we identified some novel causal genes and gene sets by conducting post-GWAS of oral ulcers based on FUMA. Our data could provide novel insights into the genetic mechanisms of oral ulcers. Nevertheless, our study had three main limitations. First, patients with an oral ulcer were selected by reviewing questionnaire data. This method may have resulted in the misclassification of recruited individuals. Given the short duration of oral ulcers, some affected individuals might not manifest visible symptoms. Therefore, the misclassification of recruited individuals could not be avoided in our study. Besides, RAS and other types of oral ulcers could not be distinguished from each other based on these investigation data. Even so, genetic factors might show slight effects on other types of oral ulcers. More importantly, Dudding and colleagues proposed that the phenomena mentioned-above could elicit small effects on the GWAS of oral ulcers 4 . Second, GWAS of oral ulcers were undertaken in individuals with European ancestries, which revealed the genetic mechanism of the disease in European populations. However, geographic, ethnic, and dietary differences also exert effects on the genetic background of oral ulcers 1 . Consequently, further research on the genetic architecture of oral ulcers in East Asian (especially Chinese) populations should be carried out. Third, the genetic mechanisms of oral ulcers were explored by bioinformatics analysis only. Further validation by cell or tissue experiments for these identified genes was not undertaken. Some genes may show false-positive correlations with oral ulcers if they are located in a LD block which includes causal genes. Taken together, the gene sets presented in our study (especially those for novel genes) require further functional analyses.

conclusion
We re-processed the GWAS data of oral ulcers using FUMA, and pinpointed some novel genes associated with oral ulcers. Further functional annotations of prioritized genes revealed that immune regulation pathways were implicated in the risk of contracting oral ulcers. Our results can aid clarification of the genetic architecture of oral ulcers.