A cross-ethnic survey of CFB and SLC44A4, Indian ulcerative colitis GWAS hits, underscores their potential role in disease susceptibility

Gupta, Aditi; Juyal, Garima; Sood, Ajit; Midha, Vandana; Yamazaki, Keiko; Vich Vila, Arnau; Esaki, Motohiro; Matsui, Toshiyuki; Takahashi, Atsushi; Kubo, Michiaki; Weersma, Rinse K; Thelma, B K

doi:10.1038/ejhg.2016.131

Article
Published: 19 October 2016

Aditi Gupta¹,
Garima Juyal¹,
Ajit Sood²,
Vandana Midha²,
Keiko Yamazaki³,
Arnau Vich Vila⁴,
Motohiro Esaki⁵,
Toshiyuki Matsui⁶,
Atsushi Takahashi⁷,
Michiaki Kubo³,
Rinse K Weersma⁴ &
B K Thelma¹

European Journal of Human Genetics volume 25, pages 111–122 (2017)

Subjects

Genome-wide association studies

Abstract

The first ever genome-wide association study (GWAS) of ulcerative colitis in the genetically distinct north Indian population identified two novel genes, namely CFB and SLC44A4. Considering their biological relevance, we investigated allelic/genetic heterogeneity in these genes among ulcerative colitis cohorts of north Indian, Japanese and Dutch origin using high-density ImmunoChip case–control genotype data. Comparative linkage disequilibrium profiling and tests of association were performed. Of the 28 CFB SNPs, a similar strength of association was observed for rs4151657 (novel ulcerative colitis GWAS SNP) in north Indians (P=1.73 × 10⁻¹⁰) and Japanese (P=2.02 × 10⁻¹²) but not in the Dutch. Further, a three-marker haplotype was shared between north Indians and Japanese (P<10⁻⁸), but a different five-marker haplotype was associated (P=2.07 × 10⁻⁶) in the Dutch. Of the 22 SLC44A4 SNPs, rs2736428 (novel ulcerative colitis GWAS SNP) was found significantly associated in north Indians (P=4.94 × 10⁻¹⁰) and Japanese (P=3.37 × 10⁻⁹), but not among the Dutch. These results suggest (i) apparent allelic heterogeneity in CFB and genetic heterogeneity in SLC44A4 across different ethnic groups; (ii) shared ulcerative colitis genetic etiological factors among Asians; and finally (iii) that re-exploration of GWAS findings together with high-density genotyping/sequencing and trans-ethnic fine mapping approaches may help identify shared and population-specific risk variants and help explain missing disease heritability.

Introduction

Ulcerative colitis (UC), a subtype of inflammatory bowel disease (IBD), is a complex autoimmune disorder with severe medical consequences. Multiple genetic, environmental and immunological factors and their interactions contribute to susceptibility to the disease.¹ This condition is emerging as an important health problem in India with an incidence rate of 6.02/10⁵ persons/year and a crude prevalence rate of 44.3/10⁵ individuals, which is comparable to the west, where incidence is 3–15/10⁵/year and prevalence is 50–80/10⁵. These figures are, however, much higher than those of other Asian countries such as Japan and Korea, with incidence rates of 1.95/10⁵/year and 1.23/10⁵/year, respectively, and prevalence rates of 5.5–18.12/10⁵ and 7.57/10⁵, respectively.²

Over the preceding years, several potential UC associated loci were identified, initially via genome-wide linkage scans and thereafter by genome-wide association studies (GWASs) and their meta-analyses, revealing new insights into UC pathogenesis.^{3, 4, 5, 6, 7, 8, 9} However, most of these studies were carried out in European populations. Recently, the International Inflammatory Bowel Disease Genetics Consortium (IIBDGC) conducted a trans-ancestry study using a new genotyping array called the ImmunoChip, which was designed to densely genotype overlapping risk loci among common immune-mediated diseases. This study substantially increased the number of known genetic risk loci for IBD to 200.¹⁰ The non-European UC GWASs performed to date have also identified novel susceptibility loci¹¹ and revealed shared UC risk loci between European and non-European cohorts.¹² These UC-specific studies also confirmed the long-established association between UC and the classical human leukocyte antigen (HLA) locus, which contains genes encoding antigen-presenting proteins and plays a crucial role in the regulation of the adaptive immune system.

Our first ever GWAS on UC from the genetically distinct north Indian (NI) population identified seven novel susceptibility genes, namely CFB, SLC44A4, 3.8-1/HCG26, MSH5, NOTCH4, HSPA1L and BAT2, from the extended HLA region; these were shown to be HLA independent based on conditional regression analysis.¹³ Of these seven novel genes, the two top significant hits, namely CFB (rs4151657; P=5.10 × 10⁻¹⁴) and SLC44A4 (rs2736428; P=4.86 × 10⁻¹¹), were selected for further analysis.

Complement activation can occur via three pathways: the classical, alternative or lectin pathway. CFB (Complement factor B; 6141 bp) encodes a secreted protein that is involved in the alternative pathway of complement activation and is expressed mainly by liver and mononuclear phagocytes.^{14, 15} The complement system is involved in lysis of pathogens, opsonization, inflammation and immune clearance,¹⁶ and therefore requires tight regulation. Improper regulation of the complement system has been implicated in a number of autoimmune and inflammatory disorders.¹⁷ Variations within CFB have been previously associated with age-related macular degeneration¹⁸ and atypical hemolytic uremic syndrome,¹⁹ suggesting its potential role in inflammatory disorders. A recent study²⁰ showed overexpression of CFB mRNA in inflamed versus normal colonic mucosa of IBD patients, suggesting its role in IBD pathogenesis by inappropriate activation of the complement system, contributing to chronic inflammation, one of the hallmarks of UC. This observation supports a role for CFB in UC etiology and is consistent with our novel GWAS findings. Based on this knowledge, complete exon resequencing of CFB in 50 NI UC cases to identify novel UC associated variant(s) revealed five reported SNPs, one non-synonymous in exon 1 (rs4151667 T>A), two adjacent non-synonymous in exon 2 (rs12614 C>T, rs641153 G>A) and two synonymous (rs1048709 G>A in exon 3 and rs4151669 G>A in exon 4), all of which were in the same haplotype block (D′=1) with the GWAS index SNP rs4151657 within intron 10. Of these, rs12614 was predicted to be the most damaging on the basis of in silico analysis and was taken forward for functional analysis. The percentage alternative pathway activity, assessed in sera from 52 UC cases comprising 29 wild-type homozygotes (CC) and 23 carriers of the variant allele (CT+TT) at rs12614, was significantly (P=0.01) lower in the latter group.¹³ This lower hemolytic activity of variant CFB is consistent with the autoimmune nature of the disease: it would reduce the efficiency of pathogen clearance, thereby increasing susceptibility to infections and, consequently, disease development.

Next, an extensive investigation of structural and regulatory variants within SLC44A4 (solute carrier family 44, member 4; 15855 bp) was undertaken,²¹ which revealed possible functional relevance of this gene in UC biology. The protein encoded by this gene, also named TPPT (thiamine pyrophosphate transporter), is a transmembrane thiamine pyrophosphate transporter expressed mainly in the colon. It has been suggested that TPPT plays an important role in the uptake of thiamine pyrophosphate generated in the colon by gut microbiota, thus contributing to thiamine nutrition, especially of the colonocytes.²² It has been observed that chronic fatigue in IBD is a consequence of mild thiamine deficiency.²³

Given the biological relevance of CFB and SLC44A4 in UC pathogenesis as exemplified by our work, the present study evaluated allelic heterogeneity in these two genes across three genetically divergent populations, namely NI, Japanese and Dutch, to (a) corroborate our GWAS findings and (b) identify population-specific signals, utilizing high-density ImmunoChip genotype data generated as a part of the IIBDGC project.

Subjects and methods

ImmunoChip genotype data and quality control

Genotype data for a total of 28 SNPs within CFB (~6 kb) and 22 SNPs within SLC44A4 (~16 kb) were retrieved from the total genotype data generated on an Illumina Infinium ImmunoChip platform, a custom-made chip with 196 524 markers used in a recently completed trans-ethnic ImmunoChip study.¹⁰ Sample quality control (QC) for the Indian and Japanese study samples was done using PLINK v1.07 (http://pngu.mgh.harvard.edu/purcell/plink/).²⁴ Samples with ambiguous sex, missing genotype rate ≥0.02 or outlying heterozygosity rate (threshold=mean±4 SD) were removed. Sample QC for the Dutch study samples is detailed elsewhere.¹⁰
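
The three sample-level filters described above can be sketched as follows. This is a minimal illustration in Python, not the PLINK workflow used in the study: the `Sample` record, its field names and the `filter_samples` helper are hypothetical, "ambiguous sex" is assumed to mean an undetermined or discordant genotype-inferred sex, and the heterozygosity mean and SD are assumed to be computed over the samples passed in.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List, Optional


@dataclass
class Sample:
    """Hypothetical per-sample QC summary (field names are illustrative)."""
    sample_id: str
    reported_sex: str            # sex recorded at recruitment, "M" or "F"
    genotype_sex: Optional[str]  # sex inferred from genotypes; None if undetermined
    missing_rate: float          # fraction of markers with no genotype call
    het_rate: float              # fraction of called genotypes that are heterozygous


def filter_samples(samples: List[Sample],
                   max_missing: float = 0.02,
                   n_sd: float = 4.0) -> List[Sample]:
    """Return the samples that pass all three sample-level filters."""
    het_rates = [s.het_rate for s in samples]
    centre, spread = mean(het_rates), stdev(het_rates)
    low, high = centre - n_sd * spread, centre + n_sd * spread

    kept = []
    for s in samples:
        if s.genotype_sex is None or s.genotype_sex != s.reported_sex:
            continue  # ambiguous sex (assumed definition, see lead-in)
        if s.missing_rate >= max_missing:
            continue  # missing genotype rate >= 0.02
        if not (low <= s.het_rate <= high):
            continue  # heterozygosity outside mean +/- 4 SD
        kept.append(s)
    return kept
```

Because the SD is estimated from the same samples being filtered, a single extreme sample can only exceed the 4 SD bound when the cohort is reasonably large, which is the case for the cohorts analyzed here.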

Study participants

Indian UC patients and controls were self-reported north Indians, recruited from Dayanand Medical College and Hospital, Ludhiana, Punjab state. These were a subset of the larger cohort previously used for the GWAS as detailed elsewhere.¹³ Similarly, Japanese UC patients were recruited from Kyushu University together with 25 affiliated hospitals. Controls were collected from the Midosuji and other related Rotary Clubs and the BioBank Japan project. All these samples were used in previous studies.^{11, 25} Dutch UC patients were recruited from the outpatient IBD clinic at the Department of Gastroenterology and Hepatology, University of Groningen and University Medical Center Groningen, the Netherlands. Control DNA samples were derived from healthy blood donors. All these samples were used in previous studies.⁹ All three sample sets have been included in the recent ImmunoChip analysis.¹⁰ Briefly, UC subjects were diagnosed according to standard clinical diagnostic criteria. The controls were age-, sex- and ethnicity-matched healthy unrelated blood donors with no history of chronic inflammatory, autoimmune or infectious diseases. Informed consent was obtained from each participant, and approval for the study was obtained from the ethical committees of the respective institutions.

Statistical analyses

Firstly, LD was estimated in each of the three populations using Haploview 4.2 (http://www.broadinstitute.org/haploview/haploview).²⁶ We next performed single SNP and haplotypic association analyses using PLINK v1.07 (http://pngu.mgh.harvard.edu/purcell/plink/).²⁴ Sliding window haplotypes were generated using UNPHASED 3.1.5.²⁷ P-values for individual marker and sliding window haplotypes were represented graphically using Graphical Assessment of Sliding P-values (GrASP v0.82 beta) (http://research.nhgri.nih.gov/GrASP/)²⁸ to present and assess P-values from multiple tests.
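
For readers unfamiliar with the two LD measures reported in this paper (D′ and r²), the following Python sketch shows how they relate for a pair of biallelic SNPs. It assumes the phased haplotype frequency is already known; it is not Haploview's implementation, and the function name `ld_stats` and the example frequencies are hypothetical.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")

    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Illustrative frequencies: allele A (20%) always occurs on a B background (40%).
d_prime, r2 = ld_stats(p_ab=0.2, p_a=0.2, p_b=0.4)
print(f"D' = {d_prime:.3f}, r^2 = {r2:.3f}")  # D' is 1 while r^2 is 0.375
```

The example illustrates why two SNPs can share a haplotype block at D′=1 and still be imperfect proxies for each other: r² only reaches 1 when the allele frequencies also match.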

In silico analysis of SNPs

SIFT (http://sift.jcvi.org/),²⁹ PolyPhen2 (http://genetics.bwh.harvard.edu/pph2/),³⁰ PolymiRTS (http://compbio.uthsc.edu/miRSNP/)^{31, 32} and RegulomeDB (http://regulomedb.org/)³³ were used for in silico characterization of the SNPs analyzed in this study.

The association data for the NI, Japanese and Dutch populations have been submitted to the GWAS Central database (Submission ID: HGVST 1840) and are available at http://www.gwascentral.org/study/HGVST1840.

Results

CFB

ImmunoChip genotype data for 28 CFB SNPs (Table 1) obtained for NI (897 cases and 896 controls), Japanese (719 cases and 3263 controls) and Dutch (1729 cases and 1350 controls) UC case–control cohorts were tested for allelic and haplotypic association separately and population-wise results are presented below.

Table 1 Association status of CFB SNPs with UC in north Indian, Japanese and Dutch populations

NI UC cohort

CFB coverage on ImmunoChip, QC and LD profile

Of the 28 SNPs, 13 were monomorphic and one deviated from Hardy–Weinberg Equilibrium (HWE) (P=2.08 × 10⁻⁷). Of the 14 remaining SNPs, each with HWE P>10⁻³, MAF >0.001 and ≥99.7% genotyping efficiency (Table 1), three exonic SNPs, namely rs4151667, rs4151669 and rs4151672, were in LD (r²>0.9) with each other, and an intronic SNP, rs541862, was in LD (r²=0.78) with rs2072634, an exonic SNP (Figure 1). These 14 SNPs were taken forward for analysis.
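
The HWE filter applied here (SNPs retained at P>10⁻³) can be illustrated with a 1-df chi-square goodness-of-fit test on genotype counts. This is a sketch under stated assumptions: the chi-square approximation is used for brevity and may differ from the test implemented in the software used in the study, and the function name and genotype counts are hypothetical.

```python
import math
from typing import Tuple


def hwe_chi2_test(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test (1 df) of Hardy-Weinberg proportions.

    Returns (chi2, p_value) for observed genotype counts AA, AB, BB.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the first SNP fits HWE, the second lacks heterozygotes.
for counts in [(25, 50, 25), (50, 0, 50)]:
    chi2, p_value = hwe_chi2_test(*counts)
    status = "retain" if p_value > 1e-3 else "exclude"
    print(counts, f"chi2={chi2:.2f}", f"P={p_value:.3g}", status)
```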

Allelic association

Of the 14 SNPs, the Indian GWAS index SNP rs4151657 (intron 10) was the most significant (unadjusted P=1.73 × 10⁻¹⁰), and five others namely rs12614 (exon 2), rs13194698 (intron 2), rs1048709 (exon 3), rs4151670 (exon 5) and rs17201431 (intron 6) were nominally associated at P≤0.05 (Table 1).
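
The kind of allelic test behind these P-values can be sketched as a 1-df chi-square test on a 2×2 table of allele counts (two alleles per individual). This is an illustration only, not PLINK's code; the function name and the allele counts below are hypothetical and are not taken from Table 1.

```python
import math
from typing import Tuple


def allelic_test(case_a: int, case_b: int,
                 ctrl_a: int, ctrl_b: int) -> Tuple[float, float, float]:
    """Allelic chi-square test (1 df) on allele counts.

    case_a/case_b: counts of alleles A and B in cases
    ctrl_a/ctrl_b: counts of alleles A and B in controls
    Returns (chi2, p_value, odds_ratio) with the odds ratio for allele A.
    """
    a, b, c, d = case_a, case_b, ctrl_a, ctrl_b
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the 2x2 table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 df
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return chi2, p_value, odds_ratio


# Hypothetical allele counts from 100 cases and 100 controls (200 alleles each).
chi2, p_value, odds_ratio = allelic_test(60, 140, 40, 160)
print(f"chi2={chi2:.2f}, P={p_value:.3f}, OR={odds_ratio:.2f}")
```

With these toy counts the SNP would be nominally associated at P≤0.05 but would fall far short of the genome-wide levels reported for rs4151657.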

Haplotypic association

Of the three exonic SNPs in LD, namely rs4151667, rs4151669 and rs4151672, only rs4151667 was used as a proxy as it was non-synonymous and damaging on in silico predictions; of rs541862 and rs2072634, which were also in LD, the exonic SNP rs2072634 was retained. Using these 11 markers, 1–11 marker sliding window haplotypes constructed in PLINK generated 66 sliding windows and a total of 280 haplotypes with minimum frequency ≥0.01 (Supplementary Table S1). A threshold P-value of <1.8 × 10⁻⁴ was set after Bonferroni correction. A number of haplotypes were found to be significantly associated. A four-marker haplotype (rs17201431–rs537160–rs2072634–rs4151657) was the smallest haplotype (A–G–G–G) encompassing the GWAS index SNP rs4151657 that was most significantly associated (P=4.4 × 10⁻¹¹). Of the 11-marker haplotypes that were generated, one predisposing haplotype (T–G–G–G–G–G–A–G–G–G–G) with frequency 0.42 in cases and 0.31 in controls (377 cases and 278 controls), containing the predisposing alleles of the GWAS index SNP rs4151657 and of all SNPs showing allelic association except rs17201431, was found to be significantly associated (P=2.7 × 10⁻¹¹). Global P-values of 1–11 marker sliding window haplotypes, generated using UNPHASED 3.1.5 and graphed using GrASP v0.82 beta with a minimum haplotype frequency threshold of 0.001, are presented in Supplementary Table S2 and Figure 2. Of the 11-marker haplotypes generated, the same haplotype as shown above (T–G–G–G–G–G–A–G–G–G–G) was found to be significantly associated (P=2.4 × 10⁻¹⁰). As can be seen from Figure 2, rs12614, rs13194698 and rs4151657 seem to be the main contributors within CFB.
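
The window count and the corrected threshold quoted above can be reproduced with a few lines of Python. This is a sketch under two assumptions that the text does not state explicitly: that windows of every size from 1 to 11 were slid one marker at a time, and that the Bonferroni correction divided a family-wise α of 0.05 by the 280 haplotypes tested. The function names are hypothetical.

```python
from typing import List, Tuple


def sliding_windows(n_markers: int) -> List[Tuple[int, int]]:
    """All contiguous windows over n_markers, as (start, end) index pairs.

    Window sizes run from 1 to n_markers; end is exclusive.
    """
    return [(start, start + size)
            for size in range(1, n_markers + 1)
            for start in range(n_markers - size + 1)]


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


windows = sliding_windows(11)
print(len(windows))                                 # 66 windows for 11 markers
print(f"{bonferroni_threshold(0.05, 280):.2e}")     # 1.79e-04, i.e. <1.8 x 10^-4
```

The number of windows is n(n+1)/2, so the same calculation gives 55 windows for 10 markers and 120 for 15 markers, matching the counts reported for the Japanese and Dutch cohorts.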

Japanese UC cohort

CFB coverage on ImmunoChip, QC and LD profile

Of the 28 SNPs in CFB on ImmunoChip, 2 were not called and 13 were monomorphic/uninformative in the Japanese UC cohort. Of the 13 remaining SNPs, each with MAF >0.001, HWE P>10⁻³ and ≥99.7% genotyping efficiency (Table 1), the two exonic SNPs rs4151667 and rs4151672 were in LD (r²=1) with each other, an exonic SNP rs1048709 was in LD (r²=0.78) with an intronic SNP rs537160, and an intronic SNP rs541862 was in LD (r²>0.9) with an exonic SNP rs2072634, which is slightly different from the NI pattern (Figure 1). These 13 SNPs were taken forward for analysis.

Allelic association

The NI UC GWAS index SNP rs4151657 was significantly associated (P=2.02 × 10⁻¹²), and eight other SNPs, namely rs4151667 (exon 1), rs12614 (exon 2), rs13194698 (intron 2), rs1048709 (exon 3), rs4151670 (exon 5), rs537160 (intron 7), rs2072633 (intron 17) and rs4151672 (3′ UTR), showed nominal association (P≤0.05) (Table 1).

Haplotypic association

Of the two exonic SNPs namely rs4151667 and rs4151672 in LD, rs4151667 was retained as it is non-synonymous and damaging on in silico predictions; of rs1048709 and rs537160 in LD, rs1048709 was retained as it is exonic and damaging on in silico predictions; and of rs541862 and rs2072634 in LD, rs2072634 was retained as it is exonic. 1–10 marker sliding window haplotypes constructed on PLINK generated 55 sliding windows and a total of 187 haplotypes with minimum frequency ≥0.01 (Supplementary Table S3). The threshold P-value of <2.7 × 10⁻⁴ was set after Bonferroni correction was applied. A number of haplotypes were found significantly associated. A three-marker haplotype (rs17201431–rs2072634–rs4151657) was the smallest haplotype (A–G–G), encompassing the GWAS index SNP rs4151657 that showed most significant association (P=9.6 × 10⁻¹³). Of the 10 marker haplotypes that were generated, the same predisposing haplotype (T–C–C–G–G–C–T–C–C–G) that was found associated in NI was found associated in Japanese population (P=2.6 × 10⁻¹¹) as well, with frequency 0.53 in cases and 0.43 in controls (384 cases and 1407 controls). 1–10 marker sliding window haplotypes generated using UNPHASED 3.1.5 and GrASP v0.82 beta, keeping a minimum haplotype frequency threshold of 0.001, revealed the same pattern of association (Supplementary Table S2 and Figure 2). From Figure 2, it is apparent that the three SNPs namely rs1048709, rs4151657 and rs2072633 are the main drivers for association of this region to UC.

Dutch UC cohort

CFB coverage on ImmunoChip, QC and LD profile

Of the 28 CFB SNPs on ImmunoChip, three were not in 1000 Genomes, one failed HWE, one failed heterogeneity, three were monomorphic and three had MAF <0.001 in the Dutch UC cohort. Of the remaining 17 SNPs, which were analyzed further, each with MAF >0.001, HWE P>10⁻³ and ≥99.8% genotyping efficiency (Table 1), three exonic SNPs, namely rs4151667, rs4151669 and rs4151672, were in LD (r²=0.99) with each other (Figure 1).

Allelic association

The Indian GWAS index SNP rs4151657 showed only a nominal association (P=0.002), along with two other intronic SNPs, namely rs537160 (P=4.29 × 10⁻⁵) and rs2072633 (P=0.003) (Table 1).

Haplotypic association

Of the three exonic SNPs namely rs4151667, rs4151669 and rs4151672 which were in LD, only rs4151667 was retained as it was non-synonymous and damaging on in silico predictions. 1–15 marker sliding window haplotypes constructed on PLINK generated 120 sliding windows and a total of 695 haplotypes with minimum frequency ≥0.01 (Supplementary Table S4). Some haplotypic combinations withstood the Bonferroni corrected P-value threshold of <7.2 × 10⁻⁵. A five-marker haplotype (rs4151651–rs4151652–rs17201431–rs512559–rs537160) was the smallest haplotype (G–G–A–A–A) showing most significant association (P=2.07 × 10⁻⁶). It was a protective haplotype with a frequency of 0.3 in cases and 0.35 in controls. 1–15 marker sliding window haplotypes generated using UNPHASED 3.1.5 and GrASP v0.82 beta, keeping a minimum haplotype frequency threshold of 0.001 revealed similar pattern of association (Supplementary Table S2 and Figure 2). As can be seen from Figure 2, rs537160 seems to be the only contributor within this region to UC.

In silico analysis of CFB SNPs

SIFT and PolyPhen2 predictions for the four missense SNPs, namely rs4151667 (exon 1), rs12614 (exon 2), rs4151651 (exon 5) and rs4151659 (exon 13), showed the first two to be damaging (Supplementary Table S5). The 3′ UTR SNP rs4151672 was checked on PolymiRTS Database 3.0; the reference allele C was found to disrupt two conserved miRNA sites and the variant allele T to create a new miRNA site, making the SNP possibly functional. When all SNPs were checked on RegulomeDB, most were predicted to lie near DNA features or regulatory elements such as transcription factor-binding sites and also to affect protein binding. Of note, three SNPs, namely rs1048709 (exon 3), rs17201431 (intron 6) and rs2072633 (intron 17), which showed allelic association in at least one of the three populations (Table 1), were predicted to have cis-eQTL effects on a number of HLA genes (Supplementary Table S6) in the vicinity of CFB, which may suggest a role of CFB mediated via HLA genes.

SLC44A4

ImmunoChip genotype data for 22 SLC44A4 SNPs (Table 2) obtained for NI (897 cases and 896 controls), Japanese (724 cases and 3271 controls) and Dutch (1729 cases and 1350 controls) UC case–control cohorts were tested for allelic and haplotypic association separately and population-wise results are presented below.

Table 2 Association status of SLC44A4 SNPs with UC in north Indian, Japanese and Dutch populations
