IL1B-CGTC haplotype is associated with colorectal cancer in admixed individuals with increased African ancestry

Single-nucleotide polymorphisms (SNPs) in cytokine genes can affect gene expression and thereby modulate inflammation and carcinogenesis. However, data on the association between SNPs in the interleukin 1 beta gene (IL1B) and colorectal cancer (CRC) are conflicting. We found an association between a 4-SNP haplotype block of IL1B (-3737C/-1464G/-511T/-31C) and CRC risk, and this association was observed exclusively in individuals with a higher proportion of African ancestry, such as individuals from the Coastal Colombian region (odds ratio, OR 2.06; 95% CI 1.31–3.25; p < 0.01). Moreover, a significant interaction between this CRC risk haplotype and local African ancestry dosage was identified at locus 2q14 (p = 0.03). We conclude that Colombian individuals with high African ancestry proportions at locus 2q14 harbour more IL1B-CGTC copies and are consequently at increased risk of CRC. This haplotype has previously been shown to increase IL1B promoter activity and is the most frequent haplotype in African Americans. Despite the limited number of samples and the lack of functional analyses examining the effect of these haplotypes in CRC cell lines, our results suggest that inflammation and ethnicity play a major role in modulating CRC risk.


Results
Participants' characteristics. Descriptive characteristics for three groups of participants, AP, CRC and controls, from the Andean and Coastal Colombian regions are summarized in Supplementary Table S1. The AP and CRC groups contained more subjects within the age range of 50 to 69 years compared to controls (68.6%, 63.8% and 45.8%, respectively; p < 0.01). In addition, compared to controls, the proportion of male subjects was higher in the CRC group (42.4% and 50.7%, respectively; p = 0.03), and a larger percentage of the AP group had a college degree or a higher education level (14.6% and 24.1%, respectively; p = 0.03), whereas more patients with CRC had no education or only completed primary school (34.7% and 51%, respectively; p < 0.01). The family history of CRC in immediate relatives, NSAID consumption or region of origin did not differ between groups.
Differences in the allele, genotype and haplotype frequencies of IL1B SNPs by case-control status. We selected five IL1B SNPs for which conflicting results regarding their association with inflammation and neoplastic processes had been published in previous studies [24][25][26][27][28][29] ; four of these SNPs are located within the promoter region (− 3737 C > T, − 1464 G > C, − 511C > T and − 31 T > C), and the fifth is located in the coding region (+ 3954 C > T). In single SNP analyses we found that IL1B-1464CC individuals (versus CG + GG) were protected from AP but not CRC (p = 0.04). No other associations were observed (Supplementary Table S2).
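The single-SNP result above contrasts IL1B-1464CC homozygotes with CG + GG carriers, i.e., a recessive coding of the C allele. As a minimal illustration of that coding, the Python sketch below tallies a 2×2 table from hypothetical genotypes and computes an odds ratio with a Woolf (log-scale) confidence interval. It is not the PLINK full-model test used in the study; the function names and toy data are hypothetical.

```python
import math


def recessive_table(genotypes, is_case, risk_genotype="CC"):
    """2x2 counts for a recessive model (risk_genotype versus all other genotypes).

    Returns (a, b, c, d): exposed cases, unexposed cases,
    exposed controls, unexposed controls.
    """
    a = b = c = d = 0
    for genotype, case in zip(genotypes, is_case):
        exposed = genotype == risk_genotype
        if case and exposed:
            a += 1
        elif case:
            b += 1
        elif exposed:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) 95% CI; assumes no empty cells."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical toy data: IL1B-1464 genotypes and case status (True = case).
genotypes = ["CC", "CG", "GG", "CC", "GG", "CG"]
is_case = [True, True, False, False, False, True]
print(recessive_table(genotypes, is_case))  # (1, 2, 1, 2)
```

With real data, zero cells would need a continuity correction or an exact method; the sketch deliberately leaves that out.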
Given the high linkage disequilibrium (LD) between the SNPs included in the analysis, and following previous work on IL1B haplotypes in other populations 10 , we inferred haplotypes for this region from the five genotyped SNPs. The four SNPs located in the promoter region of the IL1B gene are in strong LD and form one haplotype block (Supplementary Fig. S1). Within this haplotype block, only five of the 16 possible IL1B promoter haplotypes had a frequency greater than 0.01 (Table 1). Because the IL1B-CCTC haplotype (N°4) was the most frequent haplotype in our admixed sample (41%) (Table 1), it was used as the reference in all analyses. In the unadjusted generalized linear model (GLM) analysis, the IL1B-CGTC haplotype (N°3) was associated with an increased risk of CRC (odds ratio, OR 1.39; 95% CI 1.02-1.90; p = 0.03) (Table 1). In conditional haplotype tests controlling for the IL1B-511/IL1B-31 SNPs, we confirmed that the IL1B-1464G allele (haplotype N°3) was a risk factor for CRC compared with the IL1B-1464C allele (haplotype N°4) (subnull p = 0.04; likelihood ratio test: chi-square = 5.63, degrees of freedom (df) = 2, p = 0.06). This result was concordant with the test of the independent effect of IL1B-1464 (likelihood ratio test: chi-square = 4.38, df = 1, p = 0.04) (Table 1).

We also identified a strong correlation between the ancestry proportions estimated with two different statistical approaches, RFMix 31 and ADMIXTURE 30 , in the 393 samples with genome-wide genotypes (r = 0.94, 0.92 and 0.99 for the European, Amerindian and African components, respectively) (Fig. 1C). Because the African ancestry proportions in our admixed samples had a right-skewed distribution, we used a non-parametric test to assess differences in global ancestry estimates between Colombian cases and controls (Fig. 1).
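The likelihood-ratio p values quoted for the conditional haplotype tests follow from the chi-square statistic and its degrees of freedom alone. The sketch below, assuming only the Python standard library, recomputes them from closed-form chi-square upper-tail probabilities that hold for df = 1 and df = 2; the helper name is hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, df = 1 or 2 only.

    df = 1: P(X > x) = erfc(sqrt(x / 2))
    df = 2: P(X > x) = exp(-x / 2)
    """
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


print(round(chi2_sf(5.63, 2), 2))  # conditional haplotype test: 0.06
print(round(chi2_sf(4.38, 1), 2))  # independent effect of IL1B-1464: 0.04
```

Both values round to the p values reported in the text.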
We found significant differences in the European and Amerindian ancestry proportions between the AP and control groups for both sets of individuals (genome-wide set: p values < 0.01 for both components; candidate-gene set: p values of 0.04 and < 0.01, respectively) (Fig. 1A,B). Moreover, the African ancestry proportions significantly differed between the CRC group and the control group for genome-wide samples only (p = 0.04) (Fig. 1A).
IL1B haplotype association with AP and CRC risk adjusted for global ancestry. Among …

Figure 1. (A) Ancestry estimations per group for the genome-wide set. The difference in ancestries between the AP and control groups was significant for the European and Amerindian components (p < 0.01), and the difference between the CRC and control groups was significant for the African proportion (p = 0.04). (B) Ancestry estimations per group for the candidate-gene set. The difference in ancestries between the AP and control groups was significant for the European and Amerindian components (p = 0.04 and < 0.01, respectively), whereas the African proportion did not differ significantly between the CRC and control groups within the candidate-gene sample set. (C) Pearson's correlation per ancestry component. The correlations between genome-wide genotyped samples obtained with RFMix and ADMIXTURE are for 393 Colombians. The correlations between the genome-wide and candidate-gene sets obtained with ADMIXTURE are for 85 overlapping samples. 1k-HGDP, 1000 Genomes plus Human Genome Diversity Project databases; EUR, AME, AFR, global European, Amerindian and African components; AP, adenomatous polyps; CRC, colorectal cancer.
To identify differences in the effect of these haplotypes on AP and CRC risk in each Colombian region, we used the same stratification as described in Supplementary Fig. S3, which shows that people from the Coast have higher African ancestry than those from the Andean region of the country. In this analysis, the IL1B-TGCT (N°5) haplotype showed a trend towards association with AP in both the Andean (95% CI 0.96-2.40) and Coastal (95% CI 0.93-2.49) regions (Table 3). Interestingly, the association between the IL1B-CGTC (N°3) haplotype and CRC risk was exclusive to Colombians from the Coast (OR 2.06; 95% CI 1.31-3.25; p < 0.01) (Table 3).
IL1B haplotype frequencies among self-reported African American CRC cases and reference populations. The IL1B-CGTC (N°3) haplotype, which was associated with an increased risk of CRC in Colombian subjects from the Coastal region, was also the most frequent haplotype in the self-reported African American CRC cases from this study and in African Americans from the Atherosclerosis Risk in Communities (ARIC) cohort 10 (Supplementary Table S4). In addition, this haplotype was the least frequent in Non-Hispanic White populations from the US 10 . Using information available in HapMap3 32 via Haploview 33 , including a simplified haplotype configuration (with the IL1B-511 and IL1B-31 SNPs), we found similarities in … (Supplementary Table S5).

Role of locus-specific ancestry and IL1B risk haplotypes in CRC and AP. The association of IL1B haplotypes and locus-specific ancestry with the risk of AP and CRC was analysed using only the 393 Colombian samples for which this information was available; these samples represented only 50% of the 791 samples used previously in the adjusted regression analyses. The excess of local African and European ancestry per marker on chromosome 2 in the CRC and AP groups relative to the control group varied considerably (Supplementary Fig. S4), and the corresponding −log10(P-values) obtained from the GLM analyses adjusted for sex, age, educational level and global ancestry are shown in Fig. 2. Interestingly, the 18 SNPs located in the selected 2q14 region exhibited the largest differences in African ancestry dosage between the CRC group and the control group (overlapping green dots, p = 6.58 × 10⁻⁴; false discovery rate (FDR)-corrected p = 0.09) (Fig. 2A). Although these differences are significant at the nominal level, after correction for multiple testing the association does not reach significance at the 5% level and is only suggestive (FDR-corrected p < 0.1).
To better understand this result, we plotted each ancestry dosage within the 2q14 region by phenotype and found that the CRC group contained a higher proportion of African ancestry dosage (1 or 2 copies) than the AP and control groups (Fig. 3A). Moreover, 50% of the CRC samples with two copies of African ancestry carried two copies of the IL1B-CGTC haplotype, and the remaining 50% carried at least one copy (Fig. 3C). Table 4 shows the multinomial logistic regressions used to evaluate the effect of global and locus-specific African ancestry on CRC risk, including IL1B-CGTC haplotype copies, adjusted for sex, age, educational level, NSAID consumption and family history of CRC. Both global and locus-specific African ancestry were individually associated with an increased risk of CRC (Models 1a and 2a, respectively). When the main effects of both variables were included in the same model, the association with CRC remained significant only for locus-specific African ancestry (OR 3.40; 95% CI 1.05-10.98; p = 0.04; Model 3a). Although the IL1B-CGTC haplotype was associated with CRC risk in Colombians, especially in individuals from the Coastal region, who have higher African ancestry, in this smaller set of 393 individuals the association was only suggestive (95% CI 0.88-2.82; Model 4a). Remarkably, the interaction between IL1B-CGTC haplotype copies and locus-specific African ancestry was associated with CRC (p = 0.03; Model 6a); nevertheless, when only the main effects of all three variables were tested, only locus-specific African ancestry retained a borderline association with CRC (OR 3.58; 95% CI 0.97-13.30; p = 0.06; Model 7a). For European ancestry dosage, none of the comparisons between the AP and control groups in the adjusted GLM analysis yielded significant or suggestive p values after correction for multiple testing, including at locus 2q14 (overlapping green dots; Fig. 2B).
These results are consistent with Fig. 3B, which shows the excess or deficit of each locus-specific ancestry within locus 2q14 relative to the average of that ancestry along chromosome 2.
We also used adjusted multinomial logistic regressions to evaluate AP risk, including global and locus-specific European ancestry, and copies of the IL1B-TGCT haplotype (Table 4). We found that AP risk was mainly explained by the effect of global European ancestry (OR 1.91; 95% CI 1.17-3.13; p = 0.01; Model 7b) and that the data suggested an association with AP risk for the IL1B-TGCT haplotype copies (95% CI 0.83-2.66; Model 4b).
As already observed in the multimarker regression analysis for AP (Fig. 2B), the European ancestry proportion in 2q14 does not play an important role in AP risk (Model 7b, Table 4).

Discussion
We found that AP was less common among individuals homozygous for the IL1B-1464C allele (CC genotype). Furthermore, CRC risk differed significantly between carriers of the IL1B-1464G allele and the IL1B-1464C allele when the haplotype context was considered (N°3 versus N°4). Our results and the literature support a potential protective effect of this variant against neoplastic changes in some tissues, because the IL1B-1464G/C SNP flanks a putative binding site for the proteins DBP, C/EBP alpha and Pit-1a, and the C allele shows lower transcriptional activity in the haplotype context with IL1B-511T and IL1B-31C 10,34 . Therefore, accounting for haplotype context is important in association studies.
Although increased IL-1 beta production has also been reported for other SNPs in the promoter region of the IL1B gene 10,34-37 , associations of these SNPs with cancer are variable and depend on the cancer model and the genetic background of the population [24][25][26][27][28][29] . These inconsistencies are likely related to small effect sizes or to the low frequency of these variants in some of the studied populations, either of which makes any association with the disease of interest difficult to detect. Our three-way admixed population provides an opportunity to test differences in the effect of IL1B haplotypes among individuals with varying degrees of admixture. In these analyses, the second most frequent haplotype in Colombian controls, IL1B-TGCT, was associated with AP risk irrespective of the region of origin. Our results also suggest that most of the AP risk can be attributed to global European ancestry rather than to local European ancestry at 2q14, meaning that non-genetic factors associated with the European component could play an important role in AP risk. Further genome-wide association analyses that take other non-genetic risk factors into account could help disentangle the observed association between AP risk and global European ancestry in Colombians. We did not find an effect of the IL1B-TGCT haplotype on CRC risk, which could be explained in part by the small sample size or by the moderate transcriptional activity of this haplotype in vitro 10 , which may make it more suitable as a susceptibility marker for milder lesions that do not necessarily progress to CRC.
CRC risk was consistently associated with the IL1B-CGTC haplotype in Colombians, especially in individuals from the Coastal region of the country, who have the highest African ancestry proportions. However, an analysis of the 393 individuals with local ancestry inference (LAI) data only suggested an effect of IL1B-CGTC on CRC risk; the lack of significance is likely due to the small sample size and the overall low frequency of this haplotype in Colombians (~13%). The important role of African ancestry at locus 2q14 in CRC risk was suggested by the fact that variants within this region showed the lowest p-values in the multimarker regression analysis along chromosome 2. Interestingly, we found a significant interaction between the IL1B-CGTC haplotype and local African ancestry for CRC risk. Because only local African ancestry remained associated with CRC risk when we tested the main effects of the risk haplotype together with local and global African ancestry, we conclude that additional African-related variants that could explain CRC risk may be located within the selected 2q14 region, and fine-mapping methods will help to identify them.
The relationship of the IL1B-CGTC haplotype with CRC risk and African ancestry identified herein corroborates previous work showing that this haplotype has higher transcriptional activity than four other possible IL1B promoter haplotypes 10 and that it is the most frequent haplotype in African Americans 10 . Therefore, further population-based case-control studies among Colombians with different African ancestry proportions, examining the expression levels of IL-1 beta and its targets (such as COX-2 and PGE2) in colorectal tissues and plasma samples, will help establish the utility of these haplotypes as susceptibility markers for CRC risk in the general population. The identification of such markers will allow individuals at increased risk to be offered effective measures to prevent this type of cancer. For example, according to the atlas of cancer mortality in Colombia published in 2010 13 , two of the Andean cities included in this study, Bogotá DC and Bucaramanga, belong to regions with 27% and 11% higher risks of CRC mortality, respectively, than the rest of the country. Moreover, whereas the risk of CRC mortality is lower than in the general population in the cities included from the Caribbean Coast and surrounding areas (Cartagena, Barranquilla and Santa Marta), in Cali and its surroundings in Valle del Cauca, on the Pacific Coast, the risk of CRC mortality is 22% higher than in the rest of the country 13 . Interestingly, Cali has the highest concentration of Afro-Colombians 38 .

Table 4. Association of ancestry proportions or IL1B risk haplotypes with CRC or AP risk. P values for the adjusted multinomial logistic regression analyses evaluating the effect of African or European ancestry (global and/or locus-specific) on CRC and AP risk. These models also included the main effects of, and interactions with, copies (0 versus 1 or 2) of the IL1B risk haplotypes (IL1B-CGTC for CRC and IL1B-TGCT for AP risk). All models are adjusted for sex, age, educational level, NSAID consumption and family history of CRC. AFR, African; EUR, European.

Scientific Reports | 7:41920 | DOI: 10.1038/srep41920
Overall, these results are of particular interest due to previous observations about disparities in CRC risk among different population groups, such as differences between Non-Hispanic White and African American U.S. populations, with respect to inflammatory diseases and cancer development 11,39 . Our findings highlight the importance of conducting genetic studies in admixed populations to reveal potential population-specific susceptibility markers of risk for complex diseases.

Methods
Subjects. This is a multicentre, hospital-based case-control study conducted in six Colombian cities from the Andean (Bogotá and Bucaramanga) and Coastal regions (Cartagena, Santa Marta and Barranquilla in the Caribbean and Cali in the Pacific). These cities differ in CRC mortality risk and ancestry proportions. A total of 306 CRC cases, 191 AP cases and 500 matched controls aged 30 to 74 years were enrolled. All cases were incident and confirmed by histopathology. The control group consisted of individuals without gastrointestinal symptoms attending the outpatient services of primary care units. Neither cases nor controls had a personal history of other cancers, and none had received chemotherapy or radiotherapy. Each participant provided written informed consent. This work was approved by the Ethics Committee of the Instituto Nacional de Cancerología, Bogotá, Colombia, as a "study with greater than minimal risk" according to the guidelines established in "RESOLUCIÓN 8430 DE 1993" on Ethical Aspects of Human Research (Title II, Chapter 1), published by the República de Colombia Ministerio de Salud (https://www.invima.gov.co/images/pdf/medicamentos/resoluciones/etica_res_8430_1993.pdf).
We also included samples from 177 self-reported African American patients with CRC diagnosed at the Howard University College of Medicine in Washington, D.C., and the Ochsner Clinic Foundation in New Orleans, LA, and genotyped them for the IL1B SNPs to compare their IL1B haplotype frequencies with those of the Colombian samples. Because these samples were de-identified FFPE tissues and the study was retrospective, the LSUHSC IRB, following the recommendations for the use of human tissues and under the strictest HIPAA protocols, classified this specific study as "Institutional Review Board (IRB) exempt".
All methods were performed in accordance with the Declaration of Helsinki and the relevant local guidelines and regulations, as mentioned above.

IL1B SNP genotyping. All subjects were genotyped for five IL1B SNPs using TaqMan SNP Genotyping Assay Kits (Applied Biosystems, Foster City, CA, USA) according to the manufacturer's protocol. Four of these SNPs are located in the promoter region of the gene: − 3737 C > T (rs4848306), − 1464 G > C (rs1143623), − 511 C > T (rs16944) and − 31 T > C (rs1143627). The fifth SNP, + 3954 C > T (rs1143634), is located within exon 5. After PCR amplification, genotypes were determined using the Sequence Detection System (SDS) software (Applied Biosystems, Foster City, CA, USA). DNA controls with known genotypes for each SNP were run in parallel. Hardy-Weinberg equilibrium for the genotyped IL1B SNPs was assessed with the exact test statistic P HWE in PLINK 40,41 . All IL1B SNPs were in Hardy-Weinberg equilibrium (p > 0.05) (data not shown).
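PLINK's exact Hardy-Weinberg test follows the algorithm of Wigginton, Cutler and Abecasis (2005). The Python sketch below re-implements that published recursion for one biallelic SNP from its three genotype counts; it is an illustrative re-implementation under that assumption, not PLINK's source code, and the function name is hypothetical.

```python
def hwe_exact_p(n_het, n_hom1, n_hom2):
    """Exact Hardy-Weinberg p-value for one biallelic SNP.

    Enumerates every heterozygote count compatible with the observed allele
    counts, computes relative probabilities by recursion, and sums all
    outcomes no more likely than the observed one.
    """
    n = n_het + n_hom1 + n_hom2  # number of genotyped individuals
    if n == 0:
        raise ValueError("no genotypes")
    n_homr, n_homc = sorted((n_hom1, n_hom2))  # rare / common homozygotes
    rare = 2 * n_homr + n_het  # copies of the rare allele

    probs = [0.0] * (rare + 1)
    # Start near the expected heterozygote count, with the same parity as `rare`.
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0
    total = 1.0

    # Recurse towards fewer heterozygotes.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets > 1:
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (homr + 1) * (homc + 1))
        total += probs[hets - 2]
        hets, homr, homc = hets - 2, homr + 1, homc + 1

    # Recurse towards more heterozygotes.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets <= rare - 2:
        probs[hets + 2] = probs[hets] * 4.0 * homr * homc / ((hets + 2) * (hets + 1))
        total += probs[hets + 2]
        hets, homr, homc = hets + 2, homr - 1, homc - 1

    observed = probs[n_het]
    p = sum(pr for pr in probs if pr <= observed) / total
    return min(1.0, p)


print(hwe_exact_p(50, 25, 25))  # genotype counts close to HWE expectation
print(hwe_exact_p(0, 50, 50))   # complete absence of heterozygotes
```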
Global ancestry estimation. We used two Illumina ® platforms according to the manufacturer's instructions: a candidate-gene "Cancer SNP Panel" array that includes 1421 SNPs, used to genotype 521 Colombian samples, and a genome-wide "Infinium ® OmniExpressExome Array" that includes 958178 SNPs, used to genotype 443 Colombian samples.
Quality control (QC) and pruning steps were performed separately for each array because of the large difference in the number of markers tested. These steps were based on the protocol described by Anderson et al. 42 . We used ADMIXTURE 30 in unsupervised model-based mode to estimate individual ancestry from autosomal markers with K = 3 ancestral populations, separately for the candidate-gene and genome-wide datasets. For candidate-gene data, reference populations from the HapMap3 project 32 … After eliminating duplicates between both arrays (n = 85 samples) and selecting samples also genotyped for the IL1B SNPs, a total of 791 Colombian samples with available global ancestry estimates remained for further adjusted regression analyses.

Local ancestry inference (LAI). Among all available methods for LAI 47-51 , we selected a discriminative approach implemented in the RFMix v1.5.4 software 31 because it performs an iterative analysis that, beginning with unadmixed panels, also uses information in the chromosomes of admixed individuals to infer local ancestry. This approach refines the haplotype patterns in the ancestral populations and improves accuracy through expectation-maximization (EM) steps 31 .
For the LAI procedures, we used a reference panel that included the same populations from the 1000 Genomes 45 and HGDP 46 databases used previously to calculate global ancestry proportions. Because RFMix 31 requires phased haplotypes as input, we first phased the genotypes of our genome-wide Colombian samples and the HGDP 46 data using 1000 Genomes 45 as a reference, prior to the merging steps (Supplementary Methods). Specifically, we used the segmented haplotype estimation and imputation tool SHAPEIT v2.778 52 , which models the underlying stochastic processes with a hidden Markov model, to phase genotypes. To correct for admixture in our reference populations, we performed five EM iterations, as recommended 31 .

Statistical analysis.
A descriptive analysis of the characteristics of the 997 Colombian samples genotyped for the IL1B SNPs was performed in R 43 , using Pearson's chi-squared test to assess differences among groups. Full-model association tests between disease status and each SNP were performed in PLINK 40 . LD among all five IL1B SNPs was calculated with Haploview 33 .
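Haploview summarizes pairwise LD with D' and r². The sketch below shows how both follow from one two-locus haplotype frequency and the two allele frequencies. It assumes the haplotype frequency has already been estimated (for example by EM) and that both loci are polymorphic; the function name is hypothetical.

```python
def pairwise_ld(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of allele A and allele B (both strictly between 0 and 1).
    """
    d = p_ab - p_a * p_b  # deviation from independence
    # D is normalized by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies: allele A always occurs on the same haplotype as B.
print(pairwise_ld(0.3, 0.3, 0.3))
```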
Global ancestry estimates from ADMIXTURE 30 with candidate-gene or genome-wide datasets were obtained for 791 unique Colombian samples with IL1B haplotype information. Furthermore, the LAI results from RFMix 31 were used to calculate global ancestry as the average locus-specific ancestry across all loci for each individual with genome-wide data. Pearson's correlation per ancestry component was computed between all estimates. Differences in ancestry proportions between the control and CRC or AP group were assessed with the Wilcoxon rank sum test for ancestry estimates obtained with ADMIXTURE 30 .
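Two computations in this paragraph can be stated precisely in code: global ancestry as the mean of locus-specific ancestry, and Pearson's correlation between two sets of estimates. The sketch below assumes local ancestry has already been reduced to per-locus dosages (0, 1 or 2 haplotypes of a given ancestry) and weights every locus equally; this input layout is hypothetical and is not RFMix's output format. The Wilcoxon rank sum comparison between groups is not shown.

```python
import math


def global_from_local(dosages):
    """Global ancestry proportion as the mean local dosage across loci.

    dosages: per-locus counts (0, 1 or 2) of haplotypes assigned to one
    ancestry for one diploid individual. Each locus is weighted equally.
    """
    return sum(dosages) / (2.0 * len(dosages))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical local African dosages for three individuals at four loci,
# and hypothetical ADMIXTURE estimates for the same three individuals.
local_afr = [[0, 1, 1, 0], [2, 2, 1, 1], [0, 0, 0, 1]]
rfmix_global = [global_from_local(d) for d in local_afr]
admixture_global = [0.22, 0.70, 0.15]
print(rfmix_global)  # [0.25, 0.75, 0.125]
print(round(pearson_r(rfmix_global, admixture_global), 3))
```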
Because the sample size in this study is at best moderate and the observed African ancestry proportions were substantially skewed, we transformed the European and African ancestry components to a symmetric distribution using a logit transformation before conducting adjusted regression analyses. In these models, Amerindian ancestry was treated as the reference, and global ancestry proportions were always corrected by the "array" variable, referring to candidate-gene or genome-wide estimates.
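A minimal sketch of a logit transformation of ancestry proportions before regression. The text does not give the exact form used with Amerindian ancestry as the reference, so this sketch simply applies the plain logit to each proportion. The clipping constant eps is also an assumption added here, since proportions of exactly 0 or 1 would otherwise map to infinity.

```python
import math


def logit(p, eps=1e-6):
    """logit(p) = ln(p / (1 - p)), with p clipped to [eps, 1 - eps].

    The clipping constant is an assumption of this sketch; it keeps
    proportions of exactly 0 or 1 finite.
    """
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


# Hypothetical right-skewed African ancestry proportions.
afr = [0.02, 0.05, 0.08, 0.10, 0.45]
print([round(logit(p), 2) for p in afr])
```

The transform is monotone, so the ordering of individuals is preserved while the long right tail on the proportion scale is pulled towards a more symmetric shape.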
Multinomial logistic regressions of phenotypes, including global ancestry estimates, sex, age, educational level, city of origin, NSAID consumption and family history of CRC as explanatory covariates, were conducted in R 43 . The Akaike information criterion (AIC) was used to select the best model.
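AIC-based selection compares models by AIC = 2k - 2 ln L, where k is the number of estimated parameters and L the maximized likelihood. The sketch below uses hypothetical log-likelihoods and parameter counts for a three-category outcome (control, AP, CRC), in which each added covariate contributes one coefficient per non-reference category; the function names are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion, AIC = 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(fits):
    """Return the name of the model with the lowest AIC.

    fits: dict mapping model name -> (maximized log-likelihood, parameter count).
    For a multinomial model, the parameter count includes every coefficient in
    every non-reference outcome equation.
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fits: 7 coefficients per equation x 2 equations = 14;
# adding six cities (5 dummies) adds 5 x 2 = 10 more parameters.
fits = {
    "covariates": (-820.5, 14),
    "covariates + city": (-815.0, 24),
}
print(best_by_aic(fits))  # covariates
```

Here the ten extra city parameters improve the log-likelihood by 5.5 units, which lowers -2 ln L by 11 but adds a penalty of 20, so the smaller model is preferred.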
IL1B haplotype frequencies among the Colombian and African American CRC samples were inferred in R 43 through EM computation of haplotype probabilities with progressive insertion of loci. Conditional haplotype tests were conducted in PLINK 40 to evaluate differences in disease risk related to haplotype context, either controlling for some SNPs or assessing their independent effect. ORs for the association of each IL1B haplotype with disease risk were obtained for the Colombian samples using unadjusted and adjusted GLMs in R 43 , controlling for sex, age, educational level, global ancestry proportions and array. GLM analyses of IL1B haplotypes and disease risk were also conducted stratified by region of origin, with sex, age and educational level as covariates. In all cases, the most frequent haplotype was used as the reference, and rare haplotypes (frequency < 0.01) were excluded from the analyses.
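The EM idea behind haplotype frequency estimation can be shown for two SNPs, where only double heterozygotes have ambiguous phase. The sketch below is a plain two-locus EM under that simplification; it omits the progressive insertion of loci used for the five-SNP IL1B haplotypes and is not the R implementation used in the study. The function name and toy genotypes are hypothetical.

```python
from itertools import product


def em_two_locus(genotypes, n_iter=500, tol=1e-10):
    """EM estimate of two-locus haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2), the number of copies (0, 1 or 2) of the
    alternate allele at locus 1 and locus 2 for each individual.
    Haplotypes are tuples (a1, a2) with a = 0 (reference) or 1 (alternate).
    """
    haps = list(product((0, 1), repeat=2))
    # All ordered haplotype pairs compatible with each genotype; only double
    # heterozygotes (1, 1) have more than one possible phase.
    compatible = [
        [(h1, h2) for h1 in haps for h2 in haps
         if h1[0] + h2[0] == g1 and h1[1] + h2[1] == g2]
        for g1, g2 in genotypes
    ]
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for pairs in compatible:
            # E-step: posterior probability of each phase given current frequencies.
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by 2N chromosomes.
        new_freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        change = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if change < tol:
            break
    return freq


# Hypothetical sample: 10 double homozygotes of each type plus 2 double heterozygotes.
sample = [(0, 0)] * 10 + [(2, 2)] * 10 + [(1, 1)] * 2
for hap, f in sorted(em_two_locus(sample).items()):
    print(hap, round(f, 3))
```

Because only the (0, 0) and (1, 1) haplotypes are observed unambiguously, EM assigns the two double heterozygotes almost entirely to that phase.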
For the 393 Colombian samples with both IL1B haplotype and LAI information, we estimated global ancestry for chromosome 2 as the average locus-specific ancestry across all loci. We also selected a 100-kb region at locus 2q14 (Chr2:113500000:113600000; 18 SNPs; build 37) that contains the IL1B gene to calculate locus-specific ancestry. We plotted the variation of each ancestry proportion per marker on chromosome 2 in the CRC and AP groups relative to the control group, used as the baseline, and conducted a multimarker GLM analysis for CRC risk. Specifically, the African dosage per marker on chromosome 2 was used as the explanatory variable, adjusted for sex, age, educational level and global African ancestry. We conducted the same analysis comparing the AP and control groups, but used the European dosage per marker on chromosome 2 and global European ancestry instead. The corresponding -log10(P-values) for each analysis are displayed in Manhattan plots. The P-values were adjusted for multiple comparisons using the FDR correction for 28579 tests (variants) along chromosome 2.
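The Methods state only that an FDR correction was applied across 28579 tests. The sketch below assumes the Benjamini-Hochberg step-up procedure, the usual meaning of "FDR correction" (it matches R's p.adjust with method = "BH"); the function name and toy p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print([round(p, 4) for p in benjamini_hochberg([0.01, 0.04, 0.03, 0.20])])
# [0.04, 0.0533, 0.0533, 0.2]
```

Each raw p-value is scaled by m / rank, so with tens of thousands of correlated markers a nominal p near 10⁻⁴ can end up with an adjusted value in the suggestive range rather than below 0.05.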
Finally, we performed multinomial logistic regression analyses comparing the respective IL1B risk haplotype copies and the effect of global and locus-specific African or European ancestry in the control group with those in the CRC and AP groups, adjusted for sex, age, educational level, NSAID consumption and family history of CRC.
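The interaction models in Table 4 combine haplotype copies, coded as 0 versus 1 or 2 according to the Table 4 caption, with locus-specific ancestry. The sketch below builds one hypothetical predictor row for such a model. Coding local ancestry as a 0/1/2 dosage and placing the covariates after the interaction term are assumptions of this sketch; in a multinomial model the same row would enter each non-reference outcome equation (AP versus control, CRC versus control).

```python
def design_row(hap_copies, local_dosage, covariates):
    """One predictor row: intercept, carrier, local ancestry, their product, covariates.

    hap_copies: copies of the risk haplotype (0, 1 or 2), recoded as carrier 0/1
    local_dosage: ancestry dosage at 2q14 (0, 1 or 2), an assumed coding
    covariates: already-coded adjustment variables (hypothetical values)
    """
    carrier = 1 if hap_copies >= 1 else 0
    interaction = carrier * local_dosage  # haplotype x local-ancestry term
    return [1.0, carrier, local_dosage, interaction] + list(covariates)


# Hypothetical individual: two IL1B-CGTC copies, one African-ancestry copy at 2q14,
# male (coded 0) and aged 55.
print(design_row(2, 1, [0, 55]))  # [1.0, 1, 1, 1, 0, 55]
```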