East Asian-specific and cross-ancestry genome-wide meta-analyses provide mechanistic insights into peptic ulcer disease

Peptic ulcer disease (PUD) refers to acid-induced injury of the digestive tract, occurring mainly in the stomach (gastric ulcer (GU)) or duodenum (duodenal ulcer (DU)). In the present study, we conducted a large-scale, cross-ancestry meta-analysis of PUD combining genome-wide association studies with Japanese and European studies (52,032 cases and 905,344 controls), and discovered 25 new loci highly concordant across ancestries. An examination of GU and DU genetic architecture demonstrated that GUs shared the same risk loci as DUs, although with smaller genetic effect sizes and higher polygenicity than DUs, indicating higher heterogeneity of GUs. Helicobacter pylori (HP)-stratified analysis found an HP-related host genetic locus. Integrative analyses using bulk and single-cell transcriptome profiles highlighted the genetic factors of PUD being enriched in the highly expressed genes in stomach tissues, especially in somatostatin-producing D cells. Our results provide genetic evidence that gastrointestinal cell differentiations and hormone regulations are critical in PUD etiology.


Supplementary Note
Plausible pathways suggested by GWAS and pQTL analysis.
By searching for pQTL associations [1][2][3][4][5] , we found that the PUD risk alleles of lead variants at ABO and GGT1 were linked with a reduced level of coagulation factor VIII (F8) and increased levels of factor X (F10) and PROS1(cofactor to activated protein C in the degradation of factor VIII).The risk allele of a lead SNP (rs1801020; F12) identified in the cross-ancestry meta-analysis is associated with decreased plasma levels of factor XII 6 .Factor XII, VIII, X, and PROS1 are all involved in the intrinsic pathway of blood coagulation 7,8 .These suggest that blood coagulation may be involved in peptic ulcer bleeding and healing.It is also likely that these signals were identified due to selection bias since PUD patients with severe symptoms are more likely to be detected than those without such severity.Additionally, in the pQTL study 1 that observed the proteomic associations with ABO blood groups and FUT2 secretor status, we found 31 proteins to be significantly associated with secretor status and non-O blood groups in the same direction (Methods).Notably, five proteins (CBLIF, CRNN, DSG2, REG1A, and REG1B) showed directionally concordant associations with secretor status and all three non-O blood groups (A, B, and AB) (Supplementary Table 19).

Genetic correlations between PUD and its risk factors
It has long been known cigarette smoking increases the risk for PUD 9 .In the genetic correlation analysis, we detected a nominally significant (P<0.05)negative correlation between PUD and age at smoking initiation, but such correlation was not detected for PUD and cigarettes per day, which suggest that PUD might share genetic components with long-term smoking behavior.We did not detect significant genetic correlations between PUD and drinking-related traits.The PUD risk allele for rs3859862 at GGT1 (gammaglutamyl transferase 1) identified in this study was linked with the alleles of a cis-eQTL and a cis-pQTL for GGT1 that decrease the expression of GGT1 in stomach and protein level of GGT in serum.It has been shown that alcohol consumption and smoking could lead to an increased level of GGT 10,11 .It is worth investigating if GGT plays a role in the association between PUD and smoking/drinking.Additionally, we identified nominally significant genetic correlations of PUD/GU with chronic obstructive pulmonary disease (COPD), asthma, and rheumatoid arthritis (RA) in EAS (Supplementary Figure 12; Supplementary Table 21).A previous large-scale meta-analysis of asthma 12 identified the significant genetic correlation between asthma and PUD in addition to the significant genetic correlation of asthma with COPD or RA.The consistent observations suggested a genetic link between PUD and immune-related diseases, which has not been well studied yet and warranted further investigations.

Potential roles of D cells and EC cells in PUD etiology
Gastrointestinal D cells are estimated to secret ~65% circulating somatostatin, suppressing the release of gastric hormones and gastric acid 13,14 .The PUD-associated rs2233580 in PAX4 is a missense variant predicted to be highly deleterious (CADD > 20).It has been shown that PAX4 is a transcriptional repressor for somatostatin 15 and regulates duodenal hormone-secreting cells and serotonin/somatostatin-producing cells of the distal stomach 16,17 .We also detected significant associations of PDX1 in gene-based tests with independent signals (rs139276646) in its regulatory region.PDX1 activates somatostatin transcription by interacting with its promoter 18 .These suggested that D cell/somatostatin dysregulation may contribute to PUD development.EC cells are the predominant source of body serotonin and play key roles in the gutbrain axis as chemosensors 19 , affecting a wide range of physiological processes, including gastrointestinal motility and secretion, nausea, and visceral hypersensitivity.Psychological conditions, including stress and depression, were associated with a higher risk of PUD 20,21 .A previous large-scale study identified the causal effect of major depression (MD) on PUD 22 .Given the important role of serotonin in psychological conditions, the association of EC cells/serotonin with PUD, and the bidirectional effects of the brain-gut axis, it is worth further investigating whether serotonin might be a key factor in the link between depression and PUD.

Comparison of fixed-effect and random-effects models in the metaanalysis of PUD in EAS
We conducted a random-effects meta-analysis using GWAMA 23 for PUD in EAS. 15 out of the 17 significant loci from the fixed-effect meta-analysis still showed significant association (P < 5 x 10 -8 ) under the random-effects model (Supplementary Figures 33), with the other two loci being observed at the suggestive level (P < 5 x 10 -6 ).We note that the analysis under the random-effects model was substantially underpowered (λGC = 0.889) compared with the fixed-effect model (λGC = 1.052).In the fixed-effect model, LD score regressions also supported no substantial inflation for the statistics obtained by the fixed-effect model (interceptEAS, PUD = 1.02).Based on these results, we selected the fixed-effect approach for population-specific meta-analysis of PUD in this study.

Comparison of GWASs using GRCh38-based datasets and GRCh37based datasets
To investigate the potential benefits of using imputation panels based on GRCh38/hg38, we processed and extracted the unrelated 2504 samples from the 1000 Genomes High Coverage datasets (GRCh38/hg38) as in the 1000 Genomes Project Phase3v5 panel (GRCh37/hg19).BBJ1-12K and BBJ2-42K were additionally imputed using the 1000 Genomes Project panel (GRCh38/hg38).We then conducted GWAS with the same settings and additional meta-analyses using the GRCh38-based datasets and GRCh37-based datasets from BBJ1-12K, BBJ2-42K, and FinnGen for comparison.The meta-analysis using GRCh38-based datasets showed overall high consistency of -log10(P) values (r = 0.92) with GRCh37-based results and consistent genomic inflation factors λgc, with substantially high consistency for significant variants (r = 0.9991).No additional novel loci were identified in GRCh38-specific regions.(Supplementary Figures 34 -35)

Cross-ancestry meta-analysis quality control
Only autosomal variants in the 1KGp3v5 dataset were included in the meta-analysis; all variants were normalized 28 , duplicate and multiallelic variants were removed for each dataset, and variants with imputation quality scores less than 0.3 were removed.For summary statistics from UKB-SAIGE (imputation quality scores not available), variants included in the previously published GWAS of PUD in UKB were kept, missing information of chromosome and base pair positions were assigned according to rsID, variants with extreme effect size values (|log(OR)| > 10) were removed, variants with minor allele count (MAC) < 5 were removed, and the strand of palindromic variants with MAF < 0.40 was further inferred using the allele frequencies obtained from each population in 1KGp3v5 dataset.Finally, we compared the effect allele frequencies in summary statistics and the population-specific alternative allele frequencies in 1KGp3v5.Variants with deviation in allele frequencies > 0.16 were excluded.In total, more than 19 million variants were included in the meta-analysis.

ABO blood group and secretor status interaction analysis
Logistic regression analyses adjusting for age, sex, and top 10 PCs were performed to examine the association of blood group or secretor status with PUD or subtypes.Blood group-specific effect sizes were estimated using the target blood group as exposure and combination of the other three blood groups as nonexposure (for example, individuals with A vs. B, AB, and O).Secretor status effect sizes were estimated considering the non-secretor status as exposure and the secretor status as non-exposure.To investigate the interaction of blood group O with non-secretor status, we further performed logistic regression analyses for blood group O-secretor interaction, adjusting for O blood group, FUT2 secretor status, age, sex, and top 10 PCs.Similarly, the blood group Onon-secretor status interaction was tested using imputed dosages of the variants determining O antigen and secretor status.All the above-mentioned logistic regressions were conducted in R v4.1.0.Additionally, we explored a previous pQTL study that had investigated proteomic associations with ABO blood groups and FUT2 secretor status for proteins associated with secretor status and A, B, and AB blood groups.

EAS LD reference panel construction for SBayesS
Briefly, a full LD matrix on HapMap3 SNPs was computed using 50,000 randomly selected and unrelated East Asian individuals from BBJ1-180K.The off-diagonal entries of full LD matrix were shrunk with the interpolated genetic map for the 1000 Genomes Project JPT population (https://github.com/joepickrell/1000-genomes-genetic-maps).The effective population size and genetic map sample size were set to 11,600 and 100, respectively, according to 1000 Genomes Project phased OMNI data (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130507_omni_recombination_rates/).The sparse shrunk LD matrix used for SBayesS was created by setting elements of the shrunk matrix to zero if their chi-squared statistic under the sampling distribution of the correlation coefficient did not exceed 10.
The MHC region was excluded.We ran four parallel MCMC chains with a length of 50,000 and a burn-in size of 20,000 for each trait.To evaluate the convergence in MCMC, potential scale reduction statistics for each parameter were computed.Traits with potential scale reduction statistic < 1.2 for all three parameters, including SNP-based heritability, polygenicity estimates, and S, were considered to have good convergence, and were, therefore, used in the study.