Genetic architecture differences between pediatric and adult-onset inflammatory bowel diseases in the Polish population

Most inflammatory bowel diseases (IBDs) are classic complex disorders represented by common alleles. Here we aimed to define the genetic architecture of pediatric and adult-onset IBDs for the Polish population. A total of 1495 patients were recruited, including 761 patients with Crohn’s disease (CD; 424 pediatric), 734 patients with ulcerative colitis (UC; 390 pediatric), and 934 healthy controls. Allelotyping employed a pooled-DNA genome-wide association study (GWAS) and was validated by individual genotyping. Whole exome sequencing (WES) was performed on 44 IBD patients diagnosed before 6 years of age, 45 patients diagnosed after 40 years of age, and 18 healthy controls. Altogether, of 88 selected SNPs, 31 were replicated for association with IBD. A novel BRD2 (rs1049526) association reached a significance of P = 5.2 × 10⁻¹¹ with an odds ratio (OR) of 2.43. Twenty SNPs were shared between pediatric and adult patients; 1 and 7 were unique to adult-onset and pediatric-onset IBD, respectively. WES identified numerous rare and potentially deleterious variants in IBD-associated or innate immunity-associated genes. In both gene groups, deleterious alleles were over-represented among rare variants in affected children. Our GWAS revealed differences in the polygenic architecture of pediatric- and adult-onset IBD. A significant accumulation of rare and deleterious variants in affected children suggests a contribution by yet unexplained genetic components.


Supplementary Figure 1.
Manhattan plots for loci found to be significant in comparison of all IBD patients vs controls at the GWAS stage and positively verified by Taqman genotyping.

Allelotyping GWAS
A pooled-DNA sample-based GWAS was performed as described previously 1.

Statistical analysis
For each SNP on each microarray in the GWAS, the relative allele signal (RAS) was calculated as described previously 1. The RAS was used as an approximation of the allele ratio.
Because call-rate statistics are not available for pooled samples, quality was assessed by visual inspection of the first two principal components. Principal component analysis (PCA) was performed on subsets of 250,000 probes because of memory constraints. Six control pools and three UC pools were removed as outliers (Supplementary Figure 1). No probe filtering was performed. Welch's t-test was used to compare allele ratios between groups. Distributional assumptions were verified by visual inspection of the QQ plot of the t-statistics (Supplementary Figure 2). The genomic inflation factor λ, defined as the ratio of the median observed test statistic to the median expected test statistic, was also computed (Supplementary Materials and Methods Table 1). To obtain informative λ values for the t-test, we modified the formula by taking the absolute values of the observed and expected statistics. P-values were corrected for multiple hypothesis testing using the Benjamini-Hochberg algorithm to control the false discovery rate (FDR) 2.
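As an illustration only, the per-SNP comparison and the FDR correction described above can be sketched as follows. The original analysis was carried out in R; this Python sketch is not the authors' code. The function names and the input layout (one list of RAS values per group of pools) are assumptions made for the example, and the conversion of the t-statistic to a P-value (which requires the Student's t distribution function) is omitted.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    x and y are hypothetical lists of RAS values (one per pool) for a single SNP.
    """
    vx = variance(x) / len(x)  # squared standard error of the mean, group x
    vy = variance(y) / len(y)  # squared standard error of the mean, group y
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that the adjusted
    # values stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A SNP would then be carried forward when its adjusted P-value falls below the chosen FDR level; the level itself is not specified in this section.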
Manhattan plotting was performed using the qqman R package 3. All computations were performed in the R environment 4.

Individual genotyping
To validate the findings of the GWAS and replicate the SNP typing study, individual patients and controls were genotyped using TaqMan SNP Genotyping Assays (Life Technologies, USA) with TaqMan Universal Master Mix II and a 7900HT Real-Time PCR system (Life Technologies, USA). Associations were examined by Fisher's exact test implemented in R (version 3.1.1). The ORs and 95% confidence intervals (CIs) were estimated by normal approximation using the EpiTools R package 5. The significance threshold was set at 6.67E-4 (0.05/75), because the 88 assessed SNPs represented 75 LD blocks.
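The association test and the threshold above can be illustrated with a minimal sketch. It is not the R code used in the study: the 2x2 layout of counts (a, b in cases; c, d in controls) is a hypothetical example, and the Wald interval on the log scale is assumed here as one common form of the normal approximation.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of observing k in the top-left cell.
        return math.comb(row1, k) * math.comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def odds_ratio_wald(a, b, c, d, z=1.959964):
    """Odds ratio with a Wald 95% CI; undefined if any cell is zero."""
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return odds_ratio, odds_ratio * math.exp(-z * se), odds_ratio * math.exp(z * se)


# Bonferroni-style threshold over the 75 LD blocks stated in the text.
ALPHA = 0.05 / 75
```

Under this sketch, an SNP counts as replicated when `fisher_exact_two_sided(...) < ALPHA`.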

Exome sequencing
A human exome sequencing library was prepared using the Ion AmpliSeq™ Exome Kit (Thermo Fisher) according to the manufacturer's protocol. Briefly, 100 ng of genomic DNA was subjected to multiplex amplification with 2x Exome Primer Pool. Next, the primers were digested and adapters were ligated to the amplicons. The samples were then purified using Agencourt AMPure XP beads (Beckman Coulter) and stored at -20°C for further processing. The concentration of each library was determined using a Qubit fluorometer (Thermo Fisher), and the DNA fragment length was assessed using High Sensitivity DNA Analysis Kits on a Bioanalyzer 2100 (Agilent).
Each library was diluted to ~100 pM prior to template preparation. Up to three barcoded libraries were subjected to automated template preparation with an Ion PI IC 200 Kit on the Ion Chef Instrument, which performs emulsion PCR on Ion Sphere Particles, followed by particle recovery and template loading on a PI chip. Samples were sequenced on an Ion Proton instrument.

Deleterious variants
A variant was deemed deleterious if it met three criteria. First, its impact, as determined by Variant Effect Predictor, was not 'low'. Second, it had not previously been reported as benign or likely benign in ClinVar. Finally, its CADD 11 score was at least 10 on the PHRED scale. The last criterion was based on the cut-off provided by Kelsen et al. 12.
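The three criteria can be expressed as a single predicate, sketched below. The `Variant` record, its field names, and the set of ClinVar labels are hypothetical and chosen for illustration; they do not reflect the annotation format used in the study. The first criterion is read literally (any impact other than 'low' passes), which is an assumption about how other impact classes were handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    vep_impact: str              # Variant Effect Predictor impact class
    clinvar: Optional[str]       # ClinVar clinical significance, None if unreported
    cadd_phred: Optional[float]  # PHRED-scaled CADD score, None if unavailable


# Hypothetical spellings of the benign ClinVar labels.
BENIGN = {"benign", "likely benign", "benign/likely benign"}


def is_deleterious(v: Variant, cadd_cutoff: float = 10.0) -> bool:
    """Apply the three criteria from the text to one annotated variant."""
    if v.vep_impact.upper() == "LOW":
        return False  # criterion 1: impact must not be 'low'
    if v.clinvar is not None and v.clinvar.strip().lower().replace("_", " ") in BENIGN:
        return False  # criterion 2: not reported as benign or likely benign
    # Criterion 3: CADD PHRED score of at least 10; unscored variants fail.
    return v.cadd_phred is not None and v.cadd_phred >= cadd_cutoff
```

Treating variants without a CADD score as failing the third criterion is a design choice of this sketch, not something stated in the text.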
We investigated rare (MAF < 2%) deleterious variants present as homozygotes only in children (i.e., present in neither affected adults nor healthy controls). This frequency cut-off was based on the frequency of rs2066847, a rare deleterious variant associated with IBD in the European population.
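The selection of child-only homozygous variants can be sketched as a simple filter. The data layout below (alternate-allele counts per sample, a group label per sample, and a MAF lookup) is hypothetical. The sketch assumes that "present as homozygotes only in children" means no affected adult or healthy control carries the homozygous genotype, that variants without a database frequency count as rare, and that the deleteriousness filter has already been applied upstream.

```python
def child_only_homozygous(genotypes, groups, maf, maf_cutoff=0.02):
    """Return rare variants whose homozygous carriers are all children.

    genotypes: {variant_id: {sample_id: alt_allele_count (0, 1 or 2)}}
    groups:    {sample_id: "child" | "adult" | "control"}
    maf:       {variant_id: minor allele frequency}; missing means novel
    """
    selected = []
    for variant, calls in genotypes.items():
        freq = maf.get(variant)
        if freq is not None and freq >= maf_cutoff:
            continue  # not rare under the MAF < 2% cut-off
        homozygotes = {s for s, alt in calls.items() if alt == 2}
        # Keep the variant only if at least one homozygote exists and
        # every homozygote is an affected child.
        if homozygotes and all(groups[s] == "child" for s in homozygotes):
            selected.append(variant)
    return selected
```

Under a stricter reading, in which the variant must be entirely absent from adults and controls, the condition would test `alt > 0` for non-child samples instead.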
We took into special consideration variants present in histocompatibility complex (HLA) genes, variants in genes previously associated with monogenic IBD according to the list provided by Uhlig et al. 13 (50 genes), and variants in genes previously associated with IBD according to the list provided by Jostins et al. 14 (1715 genes).

Over-representation of deleterious alleles among rare alleles
Only coding-region variants that were either novel or had a frequency below 2% in each of the following were chosen for further analysis: the global minor allele frequency (GMAF) in the 1000 Genomes Project database (1kGP), the European minor allele frequency (MAF) in the 1kGP, the European-American MAF in the NHLBI Exome Sequencing Project, and the ExAC MAF. HLA genes were excluded from this analysis and analyzed separately. Two subgroups of variants were analyzed: variants in genes previously associated with CD and/or UC according to the gene lists supplied by Jostins et al. 14, and variants in genes associated with the innate immune system according to Reactome 15. The over-representation of deleterious alleles was determined by Fisher's exact test. In each individual, only one deleterious variant per gene was taken into account.

Figure 2. QQ plot of the inflammatory bowel disease GWAS. CD: Crohn's disease; UC: ulcerative colitis.