Sequence variants in ARHGAP15, COLQ and FAM155A associate with diverticular disease and diverticulitis

Diverticular disease is characterized by pouches (that is, diverticula) due to weakness in the bowel wall, which can become infected and inflamed, causing diverticulitis, with potentially severe complications. Here, we test 32.4 million sequence variants identified through whole-genome sequencing (WGS) of 15,220 Icelanders for association with diverticular disease (5,426 cases) and its more severe form, diverticulitis (2,764 cases). Subsequently, 16 sequence variants are followed up in a diverticular disease sample from Denmark (5,970 cases, 3,020 controls). In the combined Icelandic and Danish data sets we observe significant association of intronic variants in ARHGAP15 (Rho GTPase-activating protein 15; rs4662344-T: P=1.9 × 10^−18, odds ratio (OR)=1.23) and COLQ (collagen-like tail subunit of asymmetric acetylcholinesterase; rs7609897-T: P=1.5 × 10^−10, OR=0.87) with diverticular disease and in FAM155A (family with sequence similarity 155A; rs67153654-A: P=3.0 × 10^−11, OR=0.82) with diverticulitis. These are the first loci shown to associate with diverticular disease in a genome-wide study.

Reactive oxygen species (ROS) production test. Neutrophil respiratory burst after stimulation with E. coli, fMLP (low control) and PMA (high control). Number of blood donors: n=14 non-carriers of the ARHGAP15 sequence variant rs4662344, n=12 heterozygotes and n=14 homozygotes. A) Average geometric mean of fluorescence intensity for each rs4662344 genotype group, with standard deviation. B) Average percentage of ROS-positive cells for each rs4662344 genotype group, with standard deviation.

Supplementary Figure 3. Effect of sequence variants on the expression of COLQ in white blood cells by RNA sequencing. a) COLQ transcripts b) COLQ exons
The RNAseq expression in whole blood for each transcript and exon is plotted for non-carriers (wild type), heterozygotes and homozygotes for the associated sequence variant. A line indicates the median, the box the 25th and 75th percentiles of the distribution, and the whiskers the 95% confidence interval. A) Expression of two COLQ transcripts from 2,708 individuals plotted for the sequence variant rs7609897. B) COLQ expression, from 2,246 individuals, plotted for the sequence variant rs7609897.

Polygenic risk scores (PRS) were calculated using publicly available summary statistics for IBD/UC/CD in Europeans from an Immunochip GWAS of IBD 2. The polygenic risk scores were then tested for association with diverticular disease and diverticulitis using generalized additive regression with smoothed age, sex and the first five principal components as covariates, using P-value thresholds of P < 5 × 10^−8 and P < 0.05. R^2 (%): variance explained, estimated using Nagelkerke's pseudo R-squared. P-value and odds ratio (OR) with 95% confidence interval (CI) for the association with diverticular disease in Iceland.
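As an illustration of the variance-explained measure named above, the following is a minimal sketch of Nagelkerke's pseudo R-squared computed from the log-likelihoods of a null and a full logistic model. The function name, its arguments and the choice of null model (for example, covariates only versus covariates plus the risk score) are assumptions made for illustration; the text does not specify how the models were fitted.

```python
import math


def nagelkerke_r2(ll_null: float, ll_full: float, n: int) -> float:
    """Nagelkerke's pseudo R-squared from two model log-likelihoods.

    ll_null: log-likelihood of the reference model (hypothetical input).
    ll_full: log-likelihood of the model that adds the predictor of interest.
    n:       number of individuals used to fit both models.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Cox-Snell R-squared: 1 - (L_null / L_full)^(2/n), written with log-likelihoods.
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)
    # Largest value Cox-Snell can reach for a discrete outcome: 1 - L_null^(2/n).
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    # Nagelkerke rescales Cox-Snell so that the maximum is 1.
    return cox_snell / max_cox_snell
```

Multiplying the returned value by 100 gives a percentage comparable in form to the R^2 (%) column described above.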

Supplementary note.
We did not find any effect of rs4662344-T (or other variants associating with diverticular disease) at the ARHGAP15 locus on expression in blood or adipocytes. We looked for potential effects of these variants on transcription factor and enhancer regions. H3K27 acetylation, DNase I sensitivity and the SiPhy conservation score indicate that the indel rs61603193 (AT/A, r^2=0.97 with rs4662344-T) could be in an enhancer region (Haploreg 4.1) 3 (Supplementary Table 5). Acetylation marks are observed in multiple tissues, including colon smooth muscle, sigmoid colon and colonic mucosa, and rs61603193 alters six potential transcription factor (TF) binding sites, with the largest effect on Cdx, the caudal-related homeodomain TFs. Cdx1 and Cdx2 are expressed in colon, and Cdx2 is required for intestinal development and colon specification in mice 4,5.
rs7609897-T in COLQ is in an intron with potential histone marks in four tissues, including primary T-cells, and could introduce regulatory motifs for BATF_disc3 and Nrf1_known2 (Haploreg 4.1) 3.
At the FAM155A (Family With Sequence Similarity 155A) locus there are fourteen variants in LD (r^2>0.6) with rs67153654-A, which has the potential to disrupt regulatory motifs for Sox_13, Sox_10 and Nanog_disc2 and has histone modification signals in brain and neuronal cells (Haploreg 4.1) 3.
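The r^2 values quoted in this note measure linkage disequilibrium (LD) between pairs of variants. A minimal sketch of the standard definition, computed from allele and haplotype frequencies, is given here for orientation. The function and parameter names are hypothetical, and the sketch assumes phased haplotype frequencies are available; it is not a description of how LD was computed in this study.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation (r^2) between alleles at two biallelic sites.

    p_ab: frequency of haplotypes carrying allele A at site 1 and allele B at site 2.
    p_a:  frequency of allele A at site 1.
    p_b:  frequency of allele B at site 2.
    """
    # D is the deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when either site is monomorphic")
    return d * d / denom
```

Two alleles that always occur on the same haplotypes give a value of 1, and alleles that occur independently give a value of 0.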
The best coding signal identified in this study is represented by a rare missense variant rs61756577-C (p.

Effect of smoking.
We used questionnaire data to assess whether smoking affects the risk of DD in Iceland 6,7. Heavy smokers, defined as having 10 pack-years or more (N=26,113; 1 pack-year is defined as 1 pack of cigarettes per day for 1 year), were compared to those who answered that they had never smoked (N=22,815). The risk of being diagnosed with DD was calculated for each group and adjusted for sex and age. The effect of smoking on the risk conferred by the sequence variants rs4662344-T in ARHGAP15, rs7609897-T in COLQ and rs67153654-A in FAM155A was also calculated.
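The pack-year definition above can be written out as a short sketch. The function names, the "other" group label and the exact handling of the 10 pack-year boundary are assumptions made for illustration; the questionnaire coding used in the study is not described here.

```python
def pack_years(packs_per_day: float, years_smoked: float) -> float:
    """Pack-years: 1 pack of cigarettes per day for 1 year equals 1 pack-year."""
    return packs_per_day * years_smoked


def smoking_group(ever_smoked: bool,
                  packs_per_day: float = 0.0,
                  years_smoked: float = 0.0,
                  heavy_threshold: float = 10.0) -> str:
    """Assign a respondent to a comparison group (labels are hypothetical)."""
    if not ever_smoked:
        return "never"
    # Assumption: the boundary value itself counts as heavy ("10 pack-years or more").
    if pack_years(packs_per_day, years_smoked) >= heavy_threshold:
        return "heavy"
    # Smokers below the threshold belong to neither comparison group.
    return "other"
```

For example, half a pack per day for 20 years amounts to 10 pack-years and would fall in the heavy group under this reading of the threshold.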

Effect of inflammatory biomarkers.
Data on inflammatory biomarkers were obtained from three of the largest laboratories in Iceland

c) TruSeq Nano DNA library preparation method. Illumina HiSeq X sequencers.
A more detailed description of each sample preparation method is provided below.
Sample preparation and sequencing using the standard TruSeq DNA library preparation method.
Approximately 1 μg of genomic DNA, isolated from frozen blood samples, was fragmented to a mean target size of approximately 300-400 bp using a Covaris E210 instrument. The resulting fragmented DNA was end repaired using T4 and Klenow polymerases and T4 polynucleotide kinase with 10 mM dNTP, followed by addition of an 'A' base at the ends using Klenow exo fragment (3′ to 5′-exo minus) and dATP (1 mM). Sequencing adaptors containing 'T' overhangs were ligated to the DNA products, followed by agarose (2%) gel electrophoresis. Fragments of about 450-500 bp were isolated from the gels (QIAGEN Gel Extraction Kit), and the adaptor-modified DNA fragments were PCR enriched for ten cycles using Phusion DNA polymerase (Finnzymes Oy) and a PCR primer cocktail needed for paired-end sequencing. Enriched libraries were purified using AMPure XP beads. The quality and concentration of the