Sequence variant at 4q25 near PITX2 associates with appendicitis

Appendicitis is one of the most common conditions requiring acute surgery and can pose a threat to the lives of affected individuals. We performed a genome-wide association study of appendicitis in 7,276 Icelandic and 1,139 Dutch cases and large groups of controls. In a combined analysis of the Icelandic and Dutch data, we detected a single signal represented by an intergenic variant rs2129979 [G] close to the gene PITX2 associating with increased risk of appendicitis (OR = 1.15, P = 1.8 × 10−11). We only observe the association in patients diagnosed in adulthood. The marker is close to, but distinct from, a set of markers reported to associate with atrial fibrillation, which have been linked to PITX2. PITX2 has been implicated in determination of right-left symmetry during development. Anomalies in organ arrangement have been linked to increased prevalence of gastrointestinal and intra-abdominal complications, which may explain the effect of rs2129979 on appendicitis risk.

Appendicitis is a condition whereby the inner lining of the appendix becomes inflamed. Appendicitis has a relatively high prevalence; 8.6% of all males and 6.7% of all females will at some point develop appendicitis 1 . Rate of diagnosis peaks in childhood, although around half of cases are diagnosed in adulthood (Supplementary Table 1). The median age of diagnosis for all cases in the US is 21 1 . Appendix perforation presents a significant risk in acute appendicitis, causing release of bacteria into the abdominal cavity and significantly worsening prognosis 2 . Obstruction of the appendiceal lumen appears to play an important role in the pathogenesis of appendicitis 3 , but the condition remains poorly understood 4 .
No sequence variants influencing risk of appendicitis have been reported. A family history of appendicitis greatly increases an individual's risk, and twin studies suggest that genetics account for around 30% of appendicitis risk, with a strong sex-linked effect 5,6 . We estimated the risk ratio among siblings (λ S ) of Icelandic patients with appendicitis (N = 8,160) to be 1.95 by cross-matching with the nationwide genealogy 7 (Supplementary Table 2).
In order to discover sequence variants associating with risk of appendicitis, we performed a genome-wide association study (GWAS) of appendicitis, combining data from Icelandic and Dutch populations.
imputed markers identified through whole-genome sequencing of 15,220 Icelanders and subsequently imputed into chip-typed individuals through long-range haplotype phasing. Genotype probabilities were calculated for first and second-degree relatives of chip-typed individuals 8 . In the Netherlands we test 15,268,903 markers identified through whole-genome sequencing of 249 Dutch trios (GoNL). We combined the results of 11,290,636 markers found in both populations. When testing for association, we used weighted genome-wide significance thresholds depending on variant class 9 .
A sequence variant at 4q25 associates with appendicitis. In the meta-analysis of Iceland and the Netherlands, we found a single genome-wide significant signal at 4q25 (Figs 1 and 2). The signal is represented by the common intergenic variant rs2129979 [G] (MAF ICE = 29.29%; MAF NL = 27.66%) associating with an increased risk of appendicitis (OR = 1.15; 95% CI = 1.10, 1.20; P = 1.8 × 10 −11 ) ( Table 1). Three markers show high correlation with rs2129979 (r 2 > 0.93) in Iceland and in the main 1000Genome populations 10 . The variant is the most significant marker in the Icelandic dataset (OR = 1.14; 95% CI = 1.09, 1.19; P = 3.5 × 10 −9 ), and its effect in Holland is comparable to the one in Iceland (OR = 1.19; 95% CI = 1.07, 1.32; P = 1.1 × 10 −3 ; P het = 0.50). In addition, we called microsatellites and structural variants from the WGS set, but found no such markers that are highly correlated with or more significant than rs2129979. After conditional analysis, no further variants associated with appendicitis.
We assessed the genotypic effect of rs2129979 in Iceland under a full model limited to chip-typed individuals in Iceland (n = 151,677; Supplementary Table 4). The risk of appendicitis is significantly greater for homozygous  The leading variant is shown as a purple circle, and other variants are coloured according to correlation (r 2 ) with the leading marker (legend at top-right). −log 10 P values are shown along the left y-axis, and correspond to the variants depicted in the plot. The right y-axis shows calculated recombination rates at the chromosomal location, plotted as a solid red line. PITX2 is located 177 kb upstream of rs2129979.
We found that rs2129979[G] had a significant and positive correlation with age of appendicitis diagnosis among cases in Iceland (P = 3.9 × 10 −5 ), but a significant association was not seen in the Dutch cohort (P = 0.43). The combined effect on age of diagnosis for both datasets is about two years later per copy of the rs2129979 G allele (P = 7.2 × 10 −4 ). The median age of diagnosis for all cases in Iceland based on hospital record dates was 22 years, consistent with published data from the US 1 . When we divide Icelandic appendicitis patients into age-at-diagnosis quintiles, the appendicitis risk conferred by rs2129979 increased monotonically with age (from 1.03 to 1.30), and was only significant for the 3 strata corresponding to an adult age (Supplementary Tables 1 and 5, Fig. 3). When we stratified our case control test in Iceland for cases above or below the median age of diagnosis, no other signals besides rs2129979 were detected ( Supplementary Figures 1 and 2). No significant differences in variant frequency were found when the Icelandic appendicitis group was subdivided by sex, or presence of peritonitis (data not shown). We found no association of rs2129979 with any of 16 other phenotypes related to diseases of the colorectum, or infectious and inflammatory processes (Supplementary  Tables 6 and 7).   Table 1. Association of rs2129979 and its three fully correlated variants with appendicitis in Iceland and the Netherlands. The minor and major alleles are the same in Iceland and the Netherlands, and all effects are presented for the minor allele. A chi-square test was used to compute P-values. * Imputation information = 1.00; MAF = minor allele frequency; OR = odds ratio; CI = confidence interval. Functional Annotation. The closest protein-coding gene to rs2129979 and its correlates is the homeobox transcription factor PITX2, located around 170 kb upstream from the variant. We examined eQTL data from adipose tissue and blood. We did not find rs2129979 to affect PITX2 expression in adipose tissue (P = 0.66; N = 684), and expression was too low in blood for an analysis to be conducted (N = 2,512). No variants affecting expression of PITX2 are reported by GTEx Portal 11 . Furthermore, rs2129979 did not affect expression of other nearby genes in our data or in GTEx. The three variants are strongly correlated (r 2 ≥ 0.93) with rs2129979 in the four main 1000Genome super-populations (AFR, AMR, EUR, ASN) 10 (Supplementary Table 8), as well as in Iceland (r 2 = 1.00) ( Table 1). The four markers are all located within a 1.6 kb region. All four markers are reported to be within gastrointestinal enhancer histone marks specific to the fetal small intestine, and the three markers correlated to rs2129979 are also within enhancers specific to the fetal large intestine 10 .
Long range expression regulation has been found to occur to a large extent within so called topologically associated domains (TAD) 12 . rs2129979 and the three correlated appendicitis risk variants overlap a 1.5 Mb long TAD (chr4:110579395-112059395), within which PITX2 is the only protein-coding gene.

Relationship of appendicitis and other reported signals at 4q25. Two sequence variants (rs2200733
[T] and rs10033464 [T]) at the 4q25 locus have been reported to associate with increased risk of atrial fibrillation (AF) in the Icelandic population 13 (Table 2). rs2200733 correlates modestly with rs2129979 (r 2 = 0.32), and weakly with rs10033464 (r 2 = 0.04). Although the markers are physically close, the association signals are fully distinct (Fig. 4, Supplementary Figure 3 Table 2). The distance between the reported AF markers and the signal represented by rs2129979 ranges from 236 bp to 80 kb. None of the four variants reported to associate with AF associate with risk of appendicitis in the Icelandic dataset after adjusting for the rs2129979 signal (all P > 1 × 10 −3 ).

Discussion
We detected a significant association between four correlated sequence variants at 4q25 and appendicitis. The closest protein-coding gene is PITX2, encoding the transcription factor pituitary homeobox 2. The region containing PITX2 has previously been described in the context of atrial fibrillation 13,15 , and several papers have provided speculation as to the direct functional involvement of PITX2 16,17 . PITX2 has, in addition, been conclusively demonstrated to be important in the determination of right-left symmetry in development (i.e. situs-specific morphogenesis) 18 . Mutations in PITX2 are the cause of several Mendelian diseases including Rieger syndrome, a morphogenesis disorder that can present with systemic anomalies that include imperforate anus and anal stenosis 19 . Many individuals with situs anomalies also display various intra-abdominal and/or gastrointestinal abnormalities 20 . PITX2 signaling has also been shown to regulate the development of the cecum in mice, an intestinal structure directly connected to the appendix 21,22 , and to regulate gut looping and vascular development in the gut of various animal models 23,24 .
The small intestine and colon are among the tissues in which PITX2 is expressed the most 25 . In the Roadmap data 26 , rs2129979 and its three correlates are assigned an enhancer state specifically in fetal small intestine, and all four overlap a H3K4me1 enhancer histone mark region in intestinal cell lines. Two of the variants are predicted 27 to alter binding of transcription factors with tissue-specific expression in colon and/or small intestine: rs2129979 disrupts a binding site for EVI1 (encoded by the gene MECOM), and rs17042195 introduces an Early B-cell Factor (EBF) binding motif. This collection of evidence suggests that these variants affect appendicitis risk through modulation of tissue specific regulation of the PITX2 gene in intestinal tissues, and in a time-dependent fashion. We note that the second-closest protein-coding gene is ENPEP, but no clear links can be drawn between it and the effects of variants at 4q25 based on the literature.
It is unclear if rs2129979 or correlated variants at 4q25 directly increase the risk of appendicitis, the likelihood of being diagnosed with the condition, or patient prognosis and risk of complications. While the relationship  Table 2. Association of the appendicitis and previously reported atrial fibrillation signals at 4q25 with appendicitis (APP; N = 7,427) and atrial fibrillation (AF; N = 13,471) in Iceland. A chi-square test was used to compute P-values. MAF = minor allele frequency; OR = odds ratio; CI = confidence interval.
between situs anomalies and appendicitis are unclear, we cannot exclude that an anatomical abnormality could affect the likelihood of appendicitis diagnosis. The association of rs2129979 with appendicitis is limited to patients diagnosed above the median age of diagnosis. We note that the incidence of appendicitis is much lower among the adult population where we observe a higher risk conferred by rs2129979.
Our data do not indicate a direct relationship between the signals for atrial fibrillation and appendicitis at 4q25, although the reported appendicitis signal is close to previously reported atrial fibrillation signals. We speculate that both signals are encompassed within an important hub for tissue-specific regulatory elements close to 4q25.
We have discovered an association between a common sequence variant and risk of appendicitis at PITX2. We only detected association for cases diagnosed during adulthood, suggesting different pathogenesis for the disease in different age groups. Further studies are required, however, to elucidate biological mechanisms underlying this association.

Methods
Study subjects from Iceland. This study is based on whole-genome sequence data from the whole blood of 15,220 Icelanders participating in various disease projects at deCODE genetics. In addition, 151,677 Icelanders have been genotyped using Illumina SNP chips and genotype probabilities for untyped relatives has been calculated based on Icelandic genealogy.
All participating individuals who donated blood, or their guardians, provided written informed consent. The family history of participants donating blood was incorporated into the study by including the phenotypes of first-and second-degree relatives and integrating over their possible genotypes.
All sample identifiers were encrypted in accordance with the regulations of the Icelandic Data Protection Authority. Approval for the study was provided by the National Bioethics Committee (ref:VSNb2015100030/03.03). Personal identities of the participants and biological samples were encrypted by a third-party system approved and monitored by the Icelandic Data Protection Authority. The National Bioethics Committee approved the study, including the protocol, methodology and all documents presented to the participants, and all methods were performed in accordance with the relevant guidelines and regulations.
To identify appendicitis cases, we searched for patients with International Classification of Diseases (ICD10) codes, diagnosis codes K35-K38, indicative of appendicitis at Landspitali-The National University Hospital of Iceland in Reykjavik (LUH), a community hospital for half of Iceland's population. The records spanned from 1983-2015. A total of 7,267 appendicitis cases were included in the association analysis; 175 of these were whole-genome sequenced, 3,372 were genotyped using various Illumina chips and imputed using long-range phased haplotypes, and genotype probabilities for 3,720 were imputed on the basis of information from genotyped close relatives. Controls comprised individuals recruited through different genetic research projects at deCODE. Individuals in the appendicitis cohort were excluded from the control group. Of the controls, 7,534 were whole-genome sequened, 134,153 were genotyped by chip, and 185,447 were imputed on the basis of the genotypes of close relatives. The total number of controls was 327,134.
The process used to whole-genome sequence the Icelandic population, and the subsequent imputation from which the data for this analysis were generated has been extensively described in a recent publication 28 . Study subjects from the Netherlands. The Dutch cases consisted of 1,139 individuals with self-reported appendicitis and/or appendectomy. Of those, 659 cases were recruited in a project entitled "Nijmegen Biomedical Study" 29 . The remaining cases were recruited through previously described studies on bladder cancer (225 cases) 30 , prostate cancer (109 cases) 31 , breast cancer (109 cases) 32 and ovarian cancer (37 cases) 33 . The 4,587 Dutch controls were individuals from the Nijmegen Biomedical Study that did not report having had appendicitis or appendectomy.
The study protocols of all the cancer studies and the Nijmegen Biomedical Study were approved by the Institutional Review Board of the Radboud University Medical Center and all study subjects gave written informed consent.
Population structure. To study the population structure and the ancestry of samples in the Dutch cohort we used the ADMIXTURE (v 1.2) 37 and EIGENSOFT (v 6.0.1) 38 software. Samples were excluded if they were identified as ethnic outliers and to adjust for remaining population substructure 10 principle components were included as covariates in the subsequent association analysis.
Association testing. Logistic regression was used to test for association between variants and disease, assuming a multiplicative model, treating disease status as the response and expected genotype counts from imputation as covariates. For the Icelandic cohort this was done using software developed at deCODE genetics 28 , but the Dutch cohort was analyzed using the SNPTEST (v.2.5) software 39 . Testing was performed using the likelihood ratio statistic.
Meta-analysis. Variants in the GoNL imputation datasets were mapped to NCBI Build38 positions and matched to the variants in the Icelandic dataset based on allele variation. Results from the different study groups were combined using a Mantel-Haenszel model 40 in which the groups were allowed to have different population frequencies for alleles and genotypes but were assumed to have a common OR. Heterogeneity was tested by comparing the null hypothesis of the effect being the same in all populations to the alternative hypothesis of each population having a different effect using a likelihood ratio test. I 2 lies between 0% and 100% and describes the proportion of total variation in study estimates that is due to heterogeneity.