GWAS of peptic ulcer disease implicates Helicobacter pylori infection, other gastrointestinal disorders and depression

Genetic factors are recognized to contribute to peptic ulcer disease (PUD) and other gastrointestinal diseases, such as gastro-oesophageal reflux disease (GORD), irritable bowel syndrome (IBS) and inflammatory bowel disease (IBD). Here, genome-wide association study (GWAS) analyses based on 456,327 UK Biobank (UKB) individuals identify 8 independent and significant loci for PUD at, or near, genes MUC1, MUC6, FUT2, PSCA, ABO, CDX2, GAST and CCKBR. There are previously established roles in susceptibility to Helicobacter pylori infection, response to counteract infection-related damage, gastric acid secretion or gastrointestinal motility for these genes. Only two associations have been previously reported for duodenal ulcer, here replicated trans-ancestrally. The results highlight the role of host genetic susceptibility to infection. Post-GWAS analyses for PUD, GORD, IBS and IBD add insights into relationships between these gastrointestinal diseases and their relationships with depression, a commonly comorbid disorder.


A previous study reported rs10512344 as the only SNP associated at genome-wide significance (reported P = 3.6E-8) in females with IBS, using UKB self-reported illness data (Data Field: 20003) 1 . However, the P value for this SNP in our analyses is 5.0E-5 (4.4E-7 in females, Supplementary Fig. 10b). Because that study used a self-reported phenotype whereas we used a combination of self-reported, hospital admission and primary care diagnoses, we conducted a sensitivity analysis. We first removed individuals with diagnosis records from at least two resources, retaining individuals with a diagnosis record from only one resource. For each of IBS, PUD and GORD, we then generated three subgroup phenotypes using cases with a diagnosis record from a single resource and controls from the original phenotype (Supplementary Fig. 10a). We conducted GWAS analyses (Methods) and compared the summary statistics for rs10512344 across the GWAS of the original IBS phenotype and the three IBS subgroup phenotypes (Supplementary Fig. 10b). We also estimated the SNP-based heritability of, and genetic correlations among, the three subgroup phenotypes for each of IBS, PUD and GORD (Supplementary Fig. 10c-d and Supplementary Tables 8-9). The three subgroup phenotypes for each of IBS, PUD and GORD are highly genetically correlated with each other (Supplementary Fig. 10d and Supplementary Table 9); simulation and theory show that this overlap does not reflect the use of shared controls 2 . Our results do not support a robust association of rs10512344 with IBS.

Supplementary Note 3: Sensitivity analysis for SNP-based heritability and genetic correlation analyses.
As a sensitivity analysis, SNP-based heritability ($h^2_{SNP}$) and genetic correlation (rg) analyses were conducted for PUD, GORD, IBS and IBD using phenotypes generated after excluding individuals with more than one of the four gastrointestinal (GI) disorders (defined as sensitivity analysis phenotypes). The numbers of overlapping individuals among PUD, GORD, IBS and IBD cases are shown in Supplementary Fig. 9a. As expected, the $h^2_{SNP}$ estimates on the observed scale for these disorders were lower, because individuals were excluded from the cases of the original phenotypes, but were still significantly different from zero (Supplementary Fig. 9b); conversion to the liability scale is difficult after case exclusion because it contravenes the underlying assumptions of the transformation. We then calculated rg within the sensitivity analysis phenotypes, and between the sensitivity analysis phenotypes and traits from LD Hub and nine psychiatric and neurologic diseases from published studies (Supplementary Fig. 9c) (genetic correlations are robust to case/control ascertainment strategies 2 ). Within the sensitivity analysis phenotypes, rg was high among PUD, GORD and IBS, whereas all rg estimates with IBD were low and not statistically significant. The rg between GORD and PUD is 0.38 (SE = 0.08, P = 3.6E-6) and the rg between GORD and IBS is 0.47 (SE = 0.08, P = 3.4E-10), both lower than the original results. The rg between PUD and IBS is 0.25 (SE = 0.12, P = 0.034), compared with an original rg of 0.49 (SE = 0.08, P = 2.0E-10). As shown in Supplementary Fig. 9d, only 1,740 individuals overlap between PUD and IBS; however, we over-removed 4,751 individuals for PUD and 6,323 individuals for IBS because of their overlap with the other two GI disorders (GORD and IBD, shown in Supplementary Fig. 9a). The total numbers of PUD and IBS cases are 16,666 and 29,524, respectively, so these over-removed individuals account for ~1/4 of PUD cases and ~1/5 of IBS cases, which reduces the power to estimate rg.
We therefore removed only the 1,740 individuals overlapping between PUD and IBS and re-calculated the rg between PUD and IBS, as shown in Supplementary Fig. 9e. The rg is 0.33 (SE = 0.09, P = 3.0E-4). The PUD, GORD and IBS sensitivity analysis phenotypes showed statistically significant rg with depressive symptoms, whereas there is no statistically significant rg between IBD and depressive symptoms. The $h^2_{SNP}$ estimates for the sensitivity analysis phenotypes are in Supplementary Table 6. The rg within sensitivity analysis phenotypes, and between sensitivity analysis phenotypes and traits from the nine published psychiatric and neurologic studies, are in Supplementary Tables 7 and 11. The rg between sensitivity analysis phenotypes and traits from LD Hub are in Supplementary Data 4.
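The relationship between the observed and liability scales referred to in this note can be illustrated with a minimal sketch. It assumes the standard transformation for case-control data, h2_liab = h2_obs × K²(1−K)² / [P(1−P) z²], where K is the population lifetime risk, P is the proportion of cases in the sample and z is the standard normal density at the liability threshold. Setting K = P corresponds to taking the sample risk as the population lifetime risk. The function name and the example values are hypothetical and are not taken from the study.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs, sample_prev, pop_prev=None):
    """Convert an observed-scale SNP heritability to the liability scale.

    h2_obs      : heritability estimate on the observed (0/1) scale
    sample_prev : proportion of cases in the GWAS sample (P)
    pop_prev    : population lifetime risk (K); defaults to sample_prev,
                  i.e. the sample risk is used as the population risk
                  (an assumption of this sketch)
    """
    if pop_prev is None:
        pop_prev = sample_prev
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Liability threshold above which an individual is affected.
    threshold = std_normal.inv_cdf(1.0 - pop_prev)
    # Height of the standard normal density at that threshold.
    z = std_normal.pdf(threshold)
    scale = (pop_prev ** 2 * (1.0 - pop_prev) ** 2) / (
        sample_prev * (1.0 - sample_prev) * z ** 2
    )
    return h2_obs * scale


# Illustrative values only (not results from the study).
print(observed_to_liability(h2_obs=0.02, sample_prev=0.0365))
```

Because the conversion factor depends on K and P, removing cases changes P without changing the true population risk, which is one way to see why the transformation becomes hard to justify after case exclusion.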

Supplementary Note 4: Sensitivity analysis for Mendelian Randomization between major depression and PG+M.
Given the statistically significant results between MD and PG+M from bidirectional GSMR analyses, we also conducted bidirectional MR analyses between MD and PG+M using the TwoSampleMR package (https://github.com/MRCIEU/TwoSampleMR). For each of the MD and PG+M GWAS summary statistics, we first identified independent loci using PLINK (v1.90b) 3 (--clump-p1 5.0E-8 --clump-p2 5.0E-8 --clump-r2 0.01 --clump-kb 1000), with the genotype data (8,545,065 SNPs with MAF > 0.01) of 20,000 randomly sampled unrelated European individuals from UKB as the LD reference. Only the most significant SNP across the MHC region was retained. For each genetic instrument (i.e., SNP), we extracted the alleles, effect size, standard error and P value from the exposure GWAS summary statistics. We also extracted the corresponding information for these genetic instruments from the outcome GWAS summary statistics. If a SNP was unavailable in the outcome GWAS summary statistics, we identified a proxy SNP with a minimum LD r² of 0.7. For each direction of potential influence, we combined MR estimates using inverse variance-weighted (IVW) 4 analysis, which essentially translates to a weighted regression of SNP-outcome effects on SNP-exposure effects where the intercept is constrained to zero. The IVW method returns an unbiased estimate only if horizontal pleiotropy is absent or balanced. To account for this, we compared results from the IVW method with results from MR Egger 5 and the weighted median method 6 , whose estimates are known to be relatively robust to horizontal pleiotropy, though at the cost of reduced statistical power. To assess the robustness of significant results, we also conducted the MR Egger intercept test for horizontal pleiotropy. We also applied MR-PRESSO 7 (Pleiotropy Residual Sum and Outlier) to detect and correct for any outliers reflecting likely pleiotropic biases for all reported results.
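The IVW and MR Egger regressions described above can be sketched as follows. This is a minimal illustration of the weighted-regression idea only, assuming fixed-effect weights of 1/SE² on the SNP-outcome effects; it is not the TwoSampleMR implementation, and the function names and toy effect sizes are hypothetical.

```python
import math


def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect IVW: weighted regression of SNP-outcome effects on
    SNP-exposure effects with the intercept constrained to zero."""
    weights = [1.0 / s ** 2 for s in se_out]
    sxy = sum(w * x * y for w, x, y in zip(weights, beta_exp, beta_out))
    sxx = sum(w * x * x for w, x in zip(weights, beta_exp))
    return sxy / sxx, math.sqrt(1.0 / sxx)  # (estimate, standard error)


def egger_estimate(beta_exp, beta_out, se_out):
    """MR Egger point estimates: the same weighted regression, but with a
    free intercept that captures average directional pleiotropy."""
    # Orient every SNP so that its exposure effect is positive.
    signs = [1.0 if x >= 0 else -1.0 for x in beta_exp]
    xs = [s * x for s, x in zip(signs, beta_exp)]
    ys = [s * y for s, y in zip(signs, beta_out)]
    weights = [1.0 / s ** 2 for s in se_out]
    sw = sum(weights)
    sx = sum(w * x for w, x in zip(weights, xs))
    sy = sum(w * y for w, y in zip(weights, ys))
    sxx = sum(w * x * x for w, x in zip(weights, xs))
    sxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    slope = (sw * sxy - sx * sy) / (sw * sxx - sx ** 2)
    intercept = (sy - slope * sx) / sw
    return intercept, slope


# Toy instruments (hypothetical numbers, not from the study).
bx = [0.10, 0.20, 0.30]
by = [0.05, 0.10, 0.15]
se = [0.01, 0.01, 0.01]
print(ivw_estimate(bx, by, se))
print(egger_estimate(bx, by, se))
```

An Egger intercept close to zero is what the intercept test looks for; when it departs from zero, the IVW estimate (which forces the line through the origin) can be biased.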
The IVW results were statistically significant in both directions, with a pattern similar to that of the GSMR results (Supplementary Fig. 15). The MR-Egger intercept test was not statistically significant, providing no evidence of horizontal pleiotropy (Supplementary Table 22). No outliers were removed in the MR-PRESSO analyses. The latent causal variable (LCV) method 8 is designed to separate confounders from causality and hence is more likely to differ from MR where there is a unidirectional MR result. We used the LCV method to explore the relationship between MD and PG+M, following the instructions from https://github.com/lukejoconnor/LCV. Briefly, we used the munged files from LDSC for MD and PG+M, together with the provided LD scores from 1000 Genomes European data (eur_w_ld_chr, MHC region removed), as input. After selecting SNPs with MAF > 0.05 and sorting the SNPs from the GWAS summary statistics by genomic position, the RunLCV() function was used for analysis. As expected, the genetic causality proportion is not significant for PG+M and MD because of the strong bidirectional significance (Supplementary Table 24).

Supplementary Note 5: Sensitivity analysis for Mendelian Randomization between major depression and depression-removed sensitivity GI phenotypes.
As sensitivity analyses, we removed the depression cases (the combined cases from the eight UKB depression phenotypes) from the five GI disorder phenotypes (defined as GI-DepComRMV phenotypes) and conducted GWAS analyses. We repeated the LDSC genetic correlation and MR analyses between major depression and these five GI-DepComRMV phenotypes. All genetic correlation results retained a pattern that did not change our interpretation of the original results (Supplementary Table 25). The MR results showed a similar pattern for four GI-DepComRMV phenotypes (GORD, PG+M, IBS and IBD), although the result for major depression and PUD-DepComRMV became non-significant (Supplementary Table 26). These analyses removed a very high number of cases and controls from PUD, based on the combined cases of the eight depression phenotypes, and hence the standard errors of the estimates from these sensitivity analyses were large. To gain further insight, we generated eight PUD sensitivity phenotypes by removing the cases of each of the eight depression phenotypes from PUD in turn, and repeated the MR analyses. The MR results were significant for seven of them, as shown in Supplementary Table 27. The MR result was non-significant when cases and controls were removed based on the GPpsy phenotype (seen a GP for nerves, anxiety, tension or depression), for which only 59% of cases and 65% of controls were retained for analysis.

Supplementary Tables
Supplementary Table 7. Genetic correlation estimates for each pair of the five original digestion phenotypes and each pair of the five sensitivity digestion phenotypes from bivariate LD score regression analysis 15 (see Fig. 4b and Supplementary Fig. 9c).
* In LD score regression the sample overlap is partitioned into gcov_int (the genetic covariance intercept), for which the expected value is the phenotypic correlation × the proportion of shared individuals between the two GWAS datasets contributing to the LDSC genetic correlation analysis.
† RO: remove individuals with more than one of PUD, GORD, IBS and IBD disorders.
Supplementary Table 8. SNP-based heritability estimates and other parameters from LD score regression 14 for self-report, primary care and hospital admission subgroup phenotypes for each of PUD, GORD and IBS (see Supplementary Fig. 10c).
... 21 and corresponding number of cases and controls, and cells with orange colour represent four digestion phenotypes and corresponding number of cases and controls. Cells with red colour represent the number of overlapped cases and controls for each of 32 digestion-depression phenotype pairs.
Abbreviations: Seen general practice (GP) for nerves, anxiety, tension or depression (GPpsy); Seen psychiatrist for nerves, anxiety, tension or depression (Psypsy); Probable recurrent major depression or single probable major depression episode (DepAll); Self-reported depression (SelfRepDep); ICD10 defined depression (ICD10Dep); DSM-V clinical guideline defined major depression (LifetimeMDD); Major depression recurrence (MDDRecur); Seen GP for depression but no cardinal symptoms (GPNoDep); Peptic ulcer disease (PUD); Gastro-oesophageal reflux disease (GORD); Irritable bowel syndrome (IBS) and Inflammatory bowel disease (IBD).
* The direction represents using trait 1 as exposure to investigate the causality hypothesis on trait 2.
† The direction represents using trait 2 as exposure to investigate the causality hypothesis on trait 1.
‡ The unit represents per standard deviation change in liability to the exposure trait.
§ Yellow highlighted cells indicate use of a relaxed significance threshold for genetic instrument inclusion in the Trait2 -> Trait1 analysis, specified in the significance threshold column.
# Given the unidirectional statistically significant GSMR results between major depression and PUD (i.e., Major depression -> PUD), for which a relaxed significance threshold was used in the reverse direction (PUD -> Major depression) to obtain more than 10 SNPs, we also conducted GSMR analyses using the 8 genome-wide significant SNPs for PUD. The GSMR result between major depression and PUD remains unidirectionally statistically significant, suggesting that major depression is putatively causal for PUD. These analyses should be revisited when more genome-wide significant SNPs for PUD are available.

* The unit represents per standard deviation change in liability to the exposure trait.
† GSMR requires at least 10 SNPs as genetic instruments for the exposure. After controlling for EA, BMI and smoking-related traits, only 9 SNPs were retained; we used these 9 SNPs as genetic instruments rather than relaxing the significance threshold to obtain more SNPs.
... 24 (European ancestry and UK Biobank participants were excluded).
† The odds ratio of PG+M risk for participants with polygenic score in the 10th decile compared with participants with polygenic score in the 1st decile.
‡ 95% confidence interval for the odds ratio at the 10th decile.
Supplementary Table 29. Summary statistics for UKB peptic ulcer disease genome-wide significant SNPs (P < 5e-8) after mtCOJO analyses 23 conditional on UKB gastro-oesophageal reflux disease GWAS summary statistics (P < 5e-8
Supplementary Fig. 1. Full workflow of the study.

[Workflow figure labels: Phenotype; Genotype; European ancestry identification; Risk prediction in relatives and heritability estimation (Fig. 1); Post-GWAS; Comorbidity analyses (Fig. 1); Polygenic score prediction (Supplementary Table 28, Fig. 6c, 6d); PUD, GORD, PG + M, IBS, IBD]

Supplementary Fig. 2. Manhattan plot for IBD. SNPs with red diamond represent genome-wide statistically significant independent loci (P < 5.0E-8) for each trait. SNPs highlighted with yellow colour, bold and italic font represent loci that haven't been reported to be associated with IBD.
Supplementary Fig. 4. Regional association plot for peptic ulcer disease.
Supplementary Fig. 9. SNP-based heritability and genetic correlation for PUD, GORD, IBS and IBD after removing the overlapped individuals using LD score regression analyses 14,15 . Panel a. Venn diagram for the number of overlapped individuals among PUD, GORD, IBS and IBD cases. Panel b. Comparison of SNP-based heritability on the observed and liability scales for PUD, GORD, IBS and IBD between the original phenotypes and phenotypes generated after removing individuals with more than one disorder (defined as sensitivity analysis phenotypes). "RO" represents removing overlapped individuals with more than one disorder. We took the sample risk, i.e. the proportion of cases for each phenotype in the UKB cohort, as the population lifetime risk to calculate the SNP-based heritability on the liability scale for each digestion phenotype; the sample risk percentage is shown below the x axis. The error bars represent 95% confidence intervals for the estimated SNP-based heritability. "*" indicates that the SNP-based heritability was still significant after Bonferroni correction (P < 0.05/16). Panel c. Genetic correlation between sensitivity analysis phenotypes and the original phenotypes, within sensitivity analysis phenotypes, and between sensitivity analysis phenotypes and traits from LD Hub and published neuro-psychiatric disorder studies. "*" indicates that the genetic correlation estimate was still significant after Bonferroni correction (P < 0.05/(4*4+4*258+4*9)) while "√" represents the P value for genetic correlation between IBS (RO) and
Supplementary Fig. 10a. The process for generation of subgroup phenotypes for each of PUD, GORD and IBS.
Supplementary Fig. 10b. Summary statistics for rs10512344 from GWAS analyses of IBS and the three subgroup phenotypes of IBS using BOLT-LMM 11 .
Supplementary Fig. 10c. SNP-based heritability estimates and 95% confidence intervals from LD score regression 14 for self-report, primary care and hospital admission subgroup phenotypes and the original phenotype for each of PUD, GORD and IBS. "*" indicates that the P value for the corresponding SNP-based heritability is less than 0.05 and "**" indicates that the P value for the corresponding SNP-based heritability is less than 0.05/24 (after Bonferroni correction).
Supplementary Fig. 10d. Genetic correlation among self-report, primary care and hospital admission subgroup phenotypes and the original phenotype for each of PUD, GORD and IBS using bivariate LD score regression 15 .
Supplementary Fig. 10e. The effect sizes of genome-wide significant SNPs from the broad definitions of PUD, GORD and IBS in the hospital admission subgroup phenotype versus those in the self-report subgroup phenotype and the primary care subgroup phenotype, respectively. Because each broadly defined phenotype and its three subgroup phenotypes have a whole-part relationship, the effect size estimates for the SNPs associated with the broadly defined phenotype showed, as expected, high concordance in the GWAS summary statistics of the corresponding three subgroup phenotypes.
Supplementary Fig. 11. Significant heritability enrichment for GORD, PG+M and IBD of functional annotation based on the variants within each category after Bonferroni correction (P < 0.05/(53*5+5*5)). The nonsignificant annotations are not shown.
... predicts odds ratio (OR) for PUD (Panel a) and IBS (Panel b), respectively, in individuals from the GERA cohort using a logistic regression model. Polygenic scores of individuals from the GERA cohort were calculated based on PUD-associated SNPs with P < 5.0E-8 and IBS-associated SNPs with P < 0.1 from UKB and converted to deciles (1 = lowest, 10 = highest). OR and 95% confidence intervals (CI, orange diamonds and bars) relative to decile 1 were estimated using logistic regression. The blue dashed lines in a indicate that, compared with the lowest decile, the highest decile has an OR of 1.80 for PUD. The numbers of PUD cases and controls from the GERA cohort were 1,004 and 60,843, respectively. The P value for the case-control PUD polygenic score difference in the GERA cohort is 2.5E-6. The blue dashed lines in b indicate that, compared with the lowest decile, the highest decile has an OR of 1.42 for IBS. The numbers of IBS cases and controls from the GERA cohort were 3,359 and 58,488, respectively. The P value for the case-control IBS polygenic score difference in the GERA cohort is 5.4E-8.
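The decile comparison in the caption above can be illustrated with a minimal sketch. It computes unadjusted odds ratios relative to decile 1 directly from case and control counts, which matches a logistic regression on decile indicators only when no covariates are included; whether the study's model included covariates is not stated here, so this is an assumption. The function name and the toy data are hypothetical, and every decile is assumed to contain at least one case and one control.

```python
import math


def decile_odds_ratios(scores, is_case):
    """Return (OR, lower 95% CI, upper 95% CI) for each polygenic score
    decile relative to decile 1, using unadjusted 2x2 counts."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    cases = [0] * 10
    controls = [0] * 10
    for rank, idx in enumerate(order):
        decile = rank * 10 // n  # 0 = lowest decile, 9 = highest
        if is_case[idx]:
            cases[decile] += 1
        else:
            controls[decile] += 1
    results = []
    for d in range(10):
        odds_ratio = (cases[d] / controls[d]) / (cases[0] / controls[0])
        # Standard error of the log odds ratio from the four cell counts.
        se_log = math.sqrt(
            1 / cases[d] + 1 / controls[d] + 1 / cases[0] + 1 / controls[0]
        )
        lower = math.exp(math.log(odds_ratio) - 1.96 * se_log)
        upper = math.exp(math.log(odds_ratio) + 1.96 * se_log)
        results.append((odds_ratio, lower, upper))
    return results


# Toy data: 100 individuals with scores 0..99; the top decile has
# 5 cases out of 10, every other decile has 2 cases out of 10.
toy_scores = list(range(100))
toy_cases = [(i % 10) < (5 if i // 10 == 9 else 2) for i in range(100)]
for decile, (o, lo, hi) in enumerate(decile_odds_ratios(toy_scores, toy_cases), 1):
    print(decile, round(o, 2), round(lo, 2), round(hi, 2))
```

With real data the reference decile's own interval is degenerate by construction (its OR is fixed at 1), so only deciles 2 to 10 carry information about the trend.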