Introduction

Mouth ulcer (known as oral ulceration) refers to a lesion that forms on the mucous membrane of the oral cavity. It occurs due to the damage caused to both the epithelium and lamina propria. Mouth ulcer is a non-specific term capturing different entities, but the most common one is aphthous ulceration. Children and adolescents are more likely to develop mouth ulcers, which are the most common ulcerative disease in humans1,2. Even though mouth ulcers do not pose a significant health burden, they can negatively affect your quality of life and overall health3,4,5. Several factors may contribute to the etiology, such as stress, food, trauma, hormonal imbalances, immune disorders, gastrointestinal disorders, and smoking6. The high incidence of mouth ulcers and their adverse effects on quality of life have stimulated much research on the etiology and effective treatment of this disease. Even though several risk factors are simultaneously related to mouth ulcers, the genes responsible for causing mouth ulcers are still unknown. It is therefore necessary to identify molecular features of mouth ulcer pathogenesis to offer a rationale for treatment.

Genome-wide association studies (GWAS) have identified several loci associated with mouth ulcers using high-throughput sequencing technologies7. Even with some efforts, the underlying mechanisms attributed to MU risk have remained elusive, making it difficult to translate identified risk loci into clinical interventions. Meanwhile, proteins are the most efficient biomarkers and therapeutic targets8,9 Since they are primary components of cellular and biological processes and products of gene expression10. Therefore, it is vital to screen the risk proteins in mouth ulcers.

Recently, large-scale quantitative trait loci (QTL) data have been used to examine the association between genotypes and protein abundance (pQTL) and gene expression (eQTL)11,12, which has promoted the emergence of statistical methods that facilitate the combination of multidimensional data13. Wingo et al.14 constructed a new framework named proteome-wide association study (PWAS) to integrate gene and protein expression data with GWAS findings (integrated gene expression data and GWAS results) in depression pathogenesis. In addition, Mendelian randomization (MR) and Bayesian colocalization analyses have been widely employed to screen hub genes by combining QTL and disease-related GWAS data15,16. Altogether, By combining GWAS data with multidimensional QTL data, potential genes contributing to mouth ulcers will be identified.

The data from high-throughput proteomics and blood proteomes were combined in this study to identify potential genes related to mouth ulcers. To identify potential protein biomarkers, we used a three-step approach to systematically link protein biomarkers to mouth ulcers. First, we performed PWAS analysis using GWAS data from oral ulcers and a protein quantitative trait locus (pQTL) dataset obtained from blood. Second, the Mendelian randomization (MR) analysis was performed to validate PWAS significant genes. Third, the COLOC method was used to combine the GWAS data and blood pQTL via Bayesian co-localization analysis to determine whether the two correlation signals correlate with a common variation.

Results

PWAS identified 35 genes correlated with MU

By combining MU GWAS data with blood proteomes, the FUSION pipeline was used to conduct the PWAS of mouth ulcers17. Based on the Bonferroni correction threshold of P 0.05/number of genes analyzed, PWAS identified 35 genes with protein abundances related to oral ulcers (Fig. 1 and Supplementary Table 1). These genetic instruments all had F statistics exceeding 10, which indicates strong instruments18. During the study, the researchers created a PPI network (depicted in Fig. 2A and B) and discovered that TLR1, IL1RN, and CTSB were the central genes in the protein interaction network. The GO enrichment analysis demonstrates that 1,170 terms, including 1065 BP terms, 70 CC terms, and 34 MF terms, were enriched. Among these GO categories, defense response to the bacterium, cytokine-mediated signaling pathway, and toll-like receptor signaling pathway was identified (Fig. 3A). The KEGG enrichment analysis19 revealed that four KEGG terms were found, including the Toll-like receptor signaling pathway, Cytokine-cytokine receptor interaction, Tuberculosis, and Pertussis (Fig. 3B). Through the above finding, we found that these genes were involved in inflammation.

Figure 1
figure 1

Manhattan plot for the discovery mouth ulcers PWAS integrating the mouth ulcer GWAS with human blood proteomes. Each point represents a single association test between a gene and mouth ulcers ordered by genomic position on the x axis and the association strength on the y axis as the − log10(P) of a z-score test.

Figure 2
figure 2

Protein–protein interaction (PPI) among 35 genes screened by the PWAS. (A) PPI network; (B) hub genes identified by the degree of PPI network.

Figure 3
figure 3

The functional enrichment analysis among candidate genes. (A) GO terms; (B) KEGG categories.

MR validates 30 genes correlated with MU using blood pQTL

The majority of proteins analyzed could be instrumented with one or more SNPs. Thus, the Wald ratio method, IVW, and MR egger method were primarily used to estimate MR. MR analysis of blood pQTL and mouth ulcer GWAS identified 30 protein biomarkers, which exhibited strong evidence of association [P < 0.001 (0.05/35)] (Table 1). The proteins encoded by eight genes (PMEL, ARFIP1, FAM213A, GLUL, CADM2, FAIM3, RIPK2, and DECR1) have only one SNP instrumental variable and have a association with mouth ulcers using the Wald ratio method. Five genes have two SNP instrumental variables. Among 23 genes with more than 3 instrumental variables, 11 genes (RIPK2, CXCL6, ARFIP1, TDGF1, TIRAP, TNFSF8, TLR1, GLCE, MICA, and CTSS) have a association with mouth ulcers using the MR egger method. Most of the 11 genes except for CADM2, CXCL6, and ARFIP1 have no horizontal pleiotropy using the inception of MR egger with a P value > 0.05.

Table 1 Candidate genes identified by Mendelian randomization.

Colocalization between MU risk genes and pQTL

Our study analyzed the posterior probability of a common variant between a pQTL and mouth ulcers for these biomarkers that satisfied the Bonferroni-corrected P-value threshold in previous MR analyses. However, only six genes (BTN3A3, IL12B, BPI, FAM213A, PLXNB2, and IL22RA2) assembled the criterion (PPH4 > 75%) in the analysis of mouth ulcers, indicating a shared single variant with mouth ulcers (Table 2). Through colocalization analysis, six SNP (rs9393711, rs4921484, rs6127742, rs12262228, rs28573806, and rs7749390) were significantly associated with six genes (BTN3A3, IL12B, BPI, FAM213A, PLXNB2, and IL22RA2), respectively (Fig. 4).

Table 2 Candidate genes identified by Bayesian colocalization.
Figure 4
figure 4

Illustration of the colocalization results. (AF) Is for BPI, BTN3A3, FAM213A, IL12B, IL22RA2 and PLXNB2, respectively.

Discussion

Based on MU GWAS data and blood-derived proteome data, a combination of PWAS, MR, and Bayesian colocalization analysis was used to screen crucial genes for mouth ulcers. PWAS analysis revealed that 35 variations in the gene expression were associated with mouth ulcers. In the MR analysis, 30 variations in the gene expression have a association with mouth ulcers. Ultimately, we identified 6 potential risk genes (BTN3A3, IL12B, BPI, FAM213A, PLXNB2, and IL22RA2) of mouth ulcers with altered protein abundances in the blood. Research into these genes may provide mechanistic and therapeutic targets.

Human genetics research aims to identify therapeutic targets for diseases, which is especially essential for mouth ulcer research. Of these identified genes, IL12B was associated with the risk of recurrent Oral ulcers20 and peptic ulcer disease21. Meanwhile, IL12B was closed to inflammation development22,23. It has been reported that overexpression of BPI inhibits Treg differentiation and intrigues exosome-mediated inflammatory responses in systemic lupus erythematosus24. Oh et al., demonstrated that FAM213A is associated with the prognostic significance of tumor development through regulation of oxidative stress, such as myelopoiesis25 and oral carcinoma26. Zhang et al.27 reported that the synergistic effect of CD100 and PlxnB2 promotes the inflammatory response of keratinocytes through the activation of NF-κB and NLRP3 inflammasomes and is involved in the pathogenesis of psoriasis. Several studies demonstrated that IL22RA2 was involved in the inflammation process28,29,30. The expression of several butyrophilin (BTN) and butyrophilin-like (BTNL) molecules was significantly altered by inflammation, including BTN1A1, BTN2A2, BTN3A3, and BTNL831, and associated with tumor development32,33,34.

Several advantages can be drawn from our study. First, Our PWAS for mouth ulcers included the largest and most comprehensive human proteome and GWAS data. Meanwhile, by integrating multidimensional QTL data, we were able to gain a comprehensive understanding of the complex biology of MU in blood. Second, by using Bayesian colocalization, two correlated signals with common causal variants can be estimated at specific sites, and the causative proteins of oral ulcers (BTN3A3, IL12B, BPI, FAM213A, PLXNB2, and IL22RA2) were validated. Finally, this study analyzed protein levels associated with oral ulcers using PWAS. Blood protein screening for mouth ulcers may help provide greater insight into those at high risk for MU recurrence.

There are also some limitations in our study. First, pQTL mapping does not resolve all GWAS signatures. It is difficult to explain how genes are involved in the biological development of oral ulcers at a single level, such as the protein level. It is necessary to conduct more epigenetic studies, such as mQTL analysis, single-cell sequencing, and whole-genome sequencing, to design tailored treatments and fully understand mouth ulcer molecular mechanisms. Second, A larger MU GWAS dataset will be necessary for the validation of our analysis, since we only analyzed one mouth ulcer dataset. Third, it is not sufficient to elucidate the numerous MU GWAS-recognized motifs at the protein and transcriptional levels. To gain a deeper understanding of disease progression, methylation data can be integrated into the analysis. Fourth, considering other races should be taken into consideration when extending our findings. Additionally, this research examines predicted protein levels using an in-silico investigation. To enhance the validation of the results, it would be more desirable to have an independent sample with measured proteomics instead of predicted. Functional genomics and biological experiments must be conducted to elucidate and understand the molecular mechanisms underlying mouth ulcers. Fifth, functional studies and/or genetic evidence suggest that although the identified genetic variations are directly involved in the pathogenesis of mouth ulcers, the underlying mechanisms of the disease are multifactorial, and need to be taken into consideration.

In conclusion, we found strong evidence supporting six novel blood proteins (BTN3A3, IL12B, BPI, FAM213A, PLXNB2, and IL22RA2) associated with mouth ulcers. In our study, we provided suggestions for future biological and therapeutic studies to verify their potential roles in MU.

Method

The present MR analysis was based on summary data from previous studies35,36 that had gained written informed consent and ethics approval. No ethical permit is required for the secondary analysis of summary data.

Mouth ulcers of GWAS data

A GWAS summary for mouth ulcers was collected from the UK Biobank (UKB) of European ancestry for the present study7. By completing questionnaires, and interviews, completing physical measurements, and donating biological samples, participants provided information pertinent to health outcomes in adulthood and later life (data showcase available at http://www.ukbiobank.ac.uk)37. The UKB was used for GWAS on mouth ulcers, in which all participants completed a baseline questionnaire regarding oral health. The term "Mouth ulcers (yes/no)" was defined as having had mouth ulcers within the past year.

Human blood proteomic data

The serum proteomic data was obtained from a large population-based study (Atherosclerosis Risk in Communities (ARIC) study; N ~ 9000)36. In 1987 and 1989, 15,792 participants were recruited from four communities in the U.S. for the ARIC study: Forsyth County, North Carolina; suburban Minneapolis, Minnesota; Washington County, Maryland, and Jackson, Mississippi. Blood samples for proteomic measurements were acquired during the third visit (v3) in 1993–1995. After excluding participants without genotype data, the current study retained 9084 participants with plasma protein data. The modified aptamer (‘SOMAmer Reagent’, hereafter referred to as SOMAmers) is a proteomics analysis platform for the determination of serum levels of 4,657 human serum proteins.

Statistical analysis

Proteome‑wide association studies (PWAS)

Using FUSION, PWAS was performed38. FUSION was used to estimate the effect of SNPs on protein abundance for proteins with significant heritability (heritability P < 0.01). Several predictive models were used in the analysis, including top1, blup, lasso, enet, and bslmm38. A selection of protein weights originated from the comprehensive predictive models. We then integrate the genetic effect of oral ulcers (mouth ulcer GWAS z-score) with protein weights for PWAS of oral ulcers using FUSION. By summing Z-scores and weights of independent SNPs on the locus, a linear sum is calculated. To reduce false positives, Bonferroni-corrected P values were used. Benjamini-Hochberg (BH) method was also used to impute the P value when the false discovery rate was adjusted.

Mendelian Randomization (MR) analysis

Through its cis-regulated protein abundance, the PWAS significant genes obtained from the FUSION method were related to mouth ulcers. The most significant genome-wide SNPs (P < 5 × 10–6) were targeted and LD clustering was used to determine independent SNPs (R2 > 0.01). Data from QTLs and MU GWAS were harmonized following the same effect alleles. When only one independent QTL is obtained, Wald ratios were employed to estimate the association between the mouth ulcer and genes. In cases where multiple SNPs are available, the ratios of SNP exposures to SNP outcomes were combined using the inverse variance weighting (IVW) method for random-effects meta-analysis. In addition, when the number of genes exceeded three, horizontal pleiotropy was tested using the MR-egger method39,40. Bonferroni correction thresholds for the number of genes analyzed were set at P < 0.05/multiple comparisons. Using R version 4.0, the two-sample Mendelian randomization analysis was performed using "TwoSampleMR" version 0.5.5.

Bayesian colocalization analysis

To determine whether the same causal signal shared MU risk loci and pQTL, we used the Coloc Bayesian test for colocalization17. It was defined that the default COLOC prior should be P1 = 10–4, P2 = 10–4, and P12 = 10–5, where P1 is the probability that a specific variant causes a mouth ulcer, P2 is a measure of the likelihood that a variant in a mouth ulcer correlates with a significant pQTL, and P12 indicates the probability that a specific variant shares common pQTL in a mouth ulcer. Five mutually exclusive hypotheses were tested: H0, no relationship with either GWAS or pQTL; H1, relationship with GWAS and no relationship with pQTL; H2, relationship with pQTL and no relationship with GWAS; H3, relationship with GWAS and pQTL, two independent SNPs; and H4, relationship with GWAS and pQTL, one shared SNP. The posterior probability (PP) method is used to estimate H4 support (denoted as PPH4). Co-localization is defined as strong when PPH4 ≥ 0.7541.