Introduction

Colorectal cancer (CRC) is considered a major public health problem and is the fourth most common cancer and the fifth leading cause of cancer-related death in both sexes worldwide1. Thus, studies of the effects of non-genetic2,3 and genetic risk factors on CRC development4 have important implications for understanding the aetiology of this disease.

Inflammation is a hallmark of cancer5 and has been particularly associated with genetic instability and stromal mechanisms that affect the CRC tumour microenvironment, including angiogenesis, invasion and metastasis6. Specifically, interleukin 1 beta protein (IL-1 beta) plays a critical role as a pro-inflammatory cytokine in colon inflammation and carcinogenesis7 by up-regulating cyclooxygenase-2 (COX-2) and prostaglandin E2 (PGE2) via several signalling pathways8,9; furthermore, SNPs within the promoter region of the gene can modulate the IL-1 beta levels and this modulation depends upon the haplotype context10. However, the frequencies of these haplotypes vary by population10, which may partly explain the disparities observed in CRC incidence between African Americans and U.S. Non-Hispanic Whites11,12.

A heterogeneous risk pattern for CRC mortality across Colombian populations has been reported13, and this pattern could be explained by the influence of environmental and genetic factors13, including genetic ancestry. The latter is supported by the fact that Colombian populations are characterized by different three-way admixture contributions from ancestral source populations; for instance, the proportions of African ancestry are higher in cities near the Coast than in those located within the Andean region14,15. Moreover, African ancestry was previously shown to be associated with increased risk of CRC in individuals from Colombia16.

Previous genetic studies of admixed individuals have identified DNA markers linked to genetic ancestry that are associated with the risk of various types of cancer, and these markers partly explain the disparities in health between some population groups17,18,19,20,21. Recombination events can change allele and haplotype frequencies among populations, especially in cases of admixture between different ethnic groups22,23, and the data on the effects of IL1B variants on cancer risk differ by study population24,25,26,27,28,29. We therefore took advantage of the characteristic variation in the genetic structure of our admixed samples to identify associations of IL1B haplotypes with the risk of adenomatous polyps (AP) and CRC, as well as their interactions with global and local ancestry proportions in Colombians; we also analysed the variations in these effects by region. To the best of our knowledge, this study is the first to explore the association of IL1B haplotypes with AP and CRC risk in a genetically admixed population.

Results

Participants’ characteristics

Descriptive characteristics for three groups of participants, AP, CRC and controls, from the Andean and Coastal Colombian regions are summarized in Supplementary Table S1. The AP and CRC groups contained more subjects within the age range of 50 to 69 years compared to controls (68.6%, 63.8% and 45.8%, respectively; p < 0.01). In addition, compared to controls, the proportion of male subjects was higher in the CRC group (42.4% and 50.7%, respectively; p = 0.03), and a larger percentage of the AP group had a college degree or a higher education level (14.6% and 24.1%, respectively; p = 0.03), whereas more patients with CRC had no education or only completed primary school (34.7% and 51%, respectively; p < 0.01). The family history of CRC in immediate relatives, NSAID consumption or region of origin did not differ between groups.

Differences in the allele, genotype and haplotype frequencies of IL1B SNPs by case-control status

We selected five IL1B SNPs for which conflicting results regarding their association with inflammation and neoplastic processes had been published in previous studies24,25,26,27,28,29; four of these SNPs are located within the promoter region (−3737 C > T, −1464 G > C, −511 C > T and −31 T > C), and the fifth is located in the coding region (+3954 C > T). In single-SNP analyses, we found that IL1B-1464CC individuals (versus CG + GG) were protected from AP but not CRC (p = 0.04). No other associations were observed (Supplementary Table S2).

Given the high linkage disequilibrium (LD) between the SNPs included in the analysis and previous work investigating haplotypes of IL1B in other populations10, we obtained haplotypes for this region based on the 5 genotyped SNPs. The four SNPs located in the promoter region of the IL1B gene are in strong LD and form one haplotype block (Supplementary Fig. S1). Within this haplotype block, only five of 16 possible IL1B promoter haplotypes were found with a frequency greater than 0.01 (Table 1). Because the IL1B-CCTC haplotype (N°4) was the most frequent haplotype in our admixed sample (41%) (Table 1), it was used as the reference in all analyses. According to the unadjusted generalized linear model (GLM) analysis, the haplotype IL1B-CGTC (N°3) was associated with an increased risk of CRC (odds ratio, OR 1.39; 95% CI 1.02–1.90; p = 0.03) (Table 1). In the conditional haplotype tests controlling for the IL1B-511/IL1B-31 SNPs, we confirmed that the IL1B-1464G allele (haplotype N°3) was a risk factor for CRC compared to having the IL1B-1464C allele (haplotype N°4) (subnull p = 0.04; likelihood ratio test: chi-square = 5.63, degrees of freedom (df) = 2, p = 0.06). This result was concordant with the test of the independent effect of IL1B-1464 (likelihood ratio test: chi-square = 4.38, df = 1, p = 0.04) (Table 1).
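The p-values quoted for the two likelihood ratio tests can be recovered from the chi-square statistics and their degrees of freedom. The following Python sketch is only an illustration and is not the PLINK implementation used in the study; the function name `chi2_sf` is hypothetical, and the sketch covers only df = 1 and df = 2, for which the chi-square upper tail has a closed form.

```python
from math import erfc, exp, sqrt


def chi2_sf(stat, df):
    """Upper-tail p-value of a chi-square statistic (df = 1 or 2 only)."""
    if df == 1:
        # A chi-square variable with 1 df is the square of a standard normal.
        return erfc(sqrt(stat / 2.0))
    if df == 2:
        # A chi-square variable with 2 df is exponential with mean 2.
        return exp(-stat / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


# Statistics reported in the text for the conditional haplotype tests.
p_conditional = chi2_sf(5.63, 2)  # text reports p = 0.06
p_independent = chi2_sf(4.38, 1)  # text reports p = 0.04
print(round(p_conditional, 2), round(p_independent, 2))
```

Both values agree with the reported p-values after rounding to two decimals.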

Table 1 IL1B haplotypes and their association with AP and CRC risk in Colombian samples.

Genetic structure and global ancestry estimations

We were able to perform Multidimensional Scaling analyses (MDS) and global ancestry estimations in Colombian samples genotyped with genome-wide and candidate-gene platforms (Supplementary Methods). In both sets, cases and controls from Colombia overlapped, and most of them scattered between the European and Amerindian/Asian reference populations but were more closely related to Europeans; in addition, a small subset of them were close to the Africans (Supplementary Fig. S2). Global ancestry proportions per individual calculated with ADMIXTURE30 are shown in Supplementary Fig. S2. Regarding these estimates, European, Amerindian and African ancestry proportions for genome-wide and candidate-gene sets were strongly correlated for 85 samples genotyped using both platforms, despite the use of different reference populations and numbers of SNPs in each case (Pearson’s correlation coefficients of 0.77, 0.76 and 0.87 for each component, respectively) (Fig. 1C). We also identified a strong correlation between ancestry proportions estimated using two different statistical approaches, i.e., RFMix31 and ADMIXTURE30, in the 393 samples with genome-wide genotypes (0.94, 0.92 and 0.99 for European, Amerindian and African components, respectively) (Fig. 1C).

Figure 1: Differences in global ancestry proportions between cases and controls for genome-wide and candidate-gene genotyped samples (P values for the Wilcoxon rank sum Test are displayed).

(A) Ancestry estimations per group for the genome-wide set. The difference in ancestries between the AP and control groups was significant for the European and Amerindian components (p < 0.01) and between the CRC and control groups for the African proportion (p = 0.04). (B) Ancestry estimations per group for the candidate-gene set. The difference in ancestries between the AP and control groups was significant for the European and Amerindian components (p = 0.04 and <0.01, respectively), whereas the African proportion did not significantly differ between the CRC and control groups within these candidate-gene sample sets. (C) Pearson’s correlation per ancestry component. The correlations between genome-wide genotyped samples obtained with RFMix and ADMIXTURE are for 393 Colombians. The correlations between genome-wide and candidate-gene sets obtained with ADMIXTURE are for 85 overlapping samples. 1k-HGDP, 1000 genomes plus Human Genome Diversity Project databases; EUR - AME - AFR corresponds to global European, Amerindian and African components; AP, adenomatous polyps; CRC, colorectal cancer.

Due to the right-skewed distribution of the African ancestry proportions in our admixed samples, we selected a non-parametric test to assess differences in global ancestry estimates among Colombian cases and controls (Fig. 1). We found significant differences in the European and Amerindian ancestry proportions between the AP and control groups for both sets of individuals (genome-wide set: p values < 0.01 for both components; candidate-gene set: p values of 0.04 and <0.01, respectively) (Fig. 1A,B). Moreover, the African ancestry proportions significantly differed between the CRC group and the control group for genome-wide samples only (p = 0.04) (Fig. 1A).

IL1B haplotype association with AP and CRC risk adjusted for global ancestry

Among the 791 samples with both IL1B haplotype information and global ancestry estimations, we tested different multinomial logistic regression analyses that model phenotypes by global ancestry proportions (Supplementary Table S3). The best and least complex model according to the Akaike Information Criterion (AIC) was model 13, which included European and African ancestries along with array, sex, age, educational level and NSAID consumption (Supplementary Table S3.1). According to this model, European ancestry was only associated with AP risk (OR 1.98; 95% CI 1.35–2.91; p < 0.01), whereas African ancestry was associated with both AP (OR 1.12; 95% CI 1.03–1.22; p = 0.01) and CRC risk (OR 1.10; 95% CI 1.03–1.18; p = 0.01). Some of the AP and CRC risk was also explained by age, educational level and NSAID consumption (Supplementary Table S3).

Results from the unadjusted and adjusted GLM analyses to evaluate the effect of IL1B haplotypes on AP and CRC risk among these 791 Colombian samples are shown in Table 2. We found that haplotype IL1B-TGCT (N°5) was associated with an increased risk of AP in the unadjusted (OR 1.45; 95% CI 1.05–2.00; p = 0.02) and adjusted model (OR 1.40; 95% CI 0.98–2.00; p = 0.06) (Table 2). Furthermore, the association of the IL1B-CGTC (N°3) haplotype and CRC risk remained in both the unadjusted (OR 1.55; 95% CI 1.09–2.20; p = 0.02) and the adjusted model (OR 1.46; 95% CI 0.99–2.14; p = 0.06) (Table 2).

Table 2 Association of IL1B haplotypes with AP and CRC risk adjusting for global ancestry and other covariates in Colombian samples.

To identify differences in the effect of these haplotypes on the AP and CRC risk for each Colombian region, we used the same stratification method as that described in Supplementary Fig. S3. The latter showed that people from the Coast have higher African ancestry than those from the Andean region of the country. In this analysis, we saw a trend in the association of the IL1B-TGCT (N°5) haplotype with AP in both regions, Andean (95% CI 0.96–2.40) and Coastal (95% CI 0.93–2.49) (Table 3). Interestingly, we found that the association between the IL1B-CGTC (N°3) haplotype and CRC risk was exclusive to Colombians from the Coast (OR 2.06; 95% CI 1.31–3.25; p < 0.01) (Table 3).

Table 3 Association between IL1B haplotypes and AP or CRC risk among Colombians stratified by region of origin.

IL1B haplotype frequencies among self-reported African American CRC cases and reference populations

The IL1B-CGTC (N°3) haplotype found to be associated with an increased risk of CRC in Colombian subjects from the Coastal region was also the most frequent haplotype in the self-reported African American CRC cases from this study and the African Americans from the Atherosclerosis Risk Communities Cohort (ARIC)10 (Supplementary Table S4). In addition, this haplotype was the least frequent haplotype in Non-Hispanic White populations from the US10. We used information available on HapMap332 via Haploview33, including a simplified haplotype configuration (with IL1B-511 and IL1B-31 SNPs) and found similarities in the haplotype frequencies between the Colombian control group, HapMap3 African Americans32 and African Americans from the ARIC study10 (Supplementary Table S5).

Role of locus-specific ancestry and IL1B risk haplotypes on CRC and AP

The associations of IL1B haplotypes and locus-specific ancestry with the risk of AP and CRC were analysed using only the 393 Colombian samples for which the aforementioned information was available; these samples represented only 50% of the 791 samples used previously in the adjusted regression analyses. We saw great variation in the level of excess of local African and European ancestries per marker in chromosome 2 for the CRC and AP groups relative to the control group (Supplementary Fig. S4), and the corresponding −log10 (P-values) obtained for these differences in the GLM analyses adjusted by sex, age, educational level and global ancestries are shown in Fig. 2.

Figure 2: Manhattan plots of the −log10 (P-values) from adjusted GLM analyses that model phenotypes by local ancestry dosage per marker along chromosome 2.

(A) Plot of the differences in local African ancestry copies between the CRC and control groups. Green overlapping dots are SNPs within the 2q14 region, which holds the IL1B gene (FDR corrected p = 0.09). (B) Plot of the differences in local European ancestry copies between the AP and control groups. Green overlapping dots are the SNPs in the 2q14 region. CRC, colorectal cancer; AP, adenomatous polyps; FDR, false discovery rate.

Interestingly, the 18 SNPs located in the selected 2q14 region exhibited the largest differences in African ancestry dosage between the CRC group and the control group (overlapped green dots, p = 6.58 × 10^−4; false discovery rate, FDR, corrected p = 0.09) (Fig. 2A). Although these differences are significant at a nominal level, the association is not significant at the 5% level after correction for multiple testing, but it is suggestive (p < 0.1).

To better understand this result, we plotted each ancestry dosage within the 2q14 region by phenotype and found that the CRC group contains a higher proportion of African ancestry dosage (1 or 2 copies) than the AP and control groups (Fig. 3A). Moreover, 50% of the CRC samples with two copies of African ancestry carried two copies of the IL1B-CGTC haplotype, and the remaining 50% carried at least one copy (Fig. 3C).

Figure 3: Ancestry dosage and IL1B-CGTC haplotype copies in Colombian samples by phenotype.

(A) Ancestry dosage in Colombian samples by phenotype. This analysis included 393 samples with available locus ancestry estimates. An ancestry dosage of 0, 1 or 2 corresponds to the number of specific ancestry copies in the selected 100000-bp region at locus 2q14 (Chr2:113500000:113600000) that contains the IL1B gene. (B) Local ancestry proportions in locus 2q14 relative to the average locus-specific ancestry across all loci in chromosome 2 by phenotype. (C) Percentage of IL1B-CGTC haplotype copies (0, 1 or 2) within each local African ancestry dosage group (0, 1 or 2). CRC, colorectal cancer; AP, adenomatous polyps; Chr2, chromosome 2; EUR - AME - AFR, corresponds to European, Amerindian and African ancestries; Eur2q_chr2 - Ame2q_chr2 - Afr2q_chr2, corresponds to European, Amerindian or African locus specific ancestry minus their respective ancestry proportion for all of chromosome 2.

Table 4 shows the multinomial logistic regressions conducted to evaluate the effect of global and locus-specific African ancestry on the risk of CRC, including the IL1B-CGTC haplotype copies, adjusted by sex, age, educational level, NSAID consumption and family history of CRC. We found that both global and locus-specific African ancestries were individually associated with an increased risk of CRC (Models 1a and 2a, respectively). When we included the main effects of both variables in the same model, the association with CRC remained significant only for locus-specific African ancestry (OR 3.40; 95% CI 1.05–10.98; p = 0.04; Model 3a). Although we found that the IL1B-CGTC haplotype was associated with CRC risk in Colombians, especially individuals from the Coastal region characterized by having a higher African ancestry, our results for this small set of 393 individuals only suggest this association (95% CI 0.88–2.82; Model 4a). Remarkably, the interaction of IL1B-CGTC haplotype copies with locus-specific African ancestry was associated with CRC (p = 0.03; Model 6a); nevertheless, when testing only the main effects of all three variables, only locus-specific African ancestry showed an association with CRC, and this association was suggestive rather than significant at the 5% level (OR 3.58; 95% CI 0.97–13.30; p = 0.06; Model 7a).

Table 4 Association of ancestry proportions or IL1B risk haplotypes with CRC or AP risk.

For European ancestry dosage, none of the comparisons revealed significant differences or suggestive p values in the adjusted GLM analysis between the AP and control groups after correction for multiple testing, including locus 2q14 (overlapped green dots; Fig. 2B). These results are consistent with those displayed in Fig. 3B, which shows the excess or defect of each locus-specific ancestry within locus 2q14 with respect to the average of each ancestry along chromosome 2.

We also used adjusted multinomial logistic regressions to evaluate AP risk, including global and locus-specific European ancestry, and copies of the IL1B-TGCT haplotype (Table 4). We found that AP risk was mainly explained by the effect of global European ancestry (OR 1.91; 95% CI 1.17–3.13; p = 0.01; Model 7b) and that the data suggested an association with AP risk for the IL1B-TGCT haplotype copies (95% CI 0.83–2.66; Model 4b). As already observed in the multimarker regression analysis for AP (Fig. 2B), the European ancestry proportion in 2q14 does not play an important role in AP risk (Model 7b, Table 4).

Discussion

We found that AP is less common among recessive carriers of the IL1B-1464C allele. Furthermore, the risk of CRC significantly differed between carriers of the IL1B-1464G allele and the IL1B-1464C allele, when considering the haplotype context (N°3 versus N°4). Our results and the literature support the potential effect of this variant in preventing neoplastic changes in some tissues because the IL1B-1464G/C SNP flanks a putative binding site for the proteins DBP, C/EBP alpha and Pit-1a, and the C allele shows a lower transcriptional activity in haplotype context with IL1B-511T and IL1B-31C10,34. Therefore, accounting for haplotype context is important in association studies.

Although increased IL-1 beta production has also been reported for other SNPs in the promoter region of the IL1B gene10,34,35,36,37, associations of these SNPs with cancer are variable and depend on the cancer model and genetic background of the population24,25,26,27,28,29. These inconsistencies are likely related to a small effect or the low frequency of these variants in some of the studied populations, which makes the identification of any association with the disease of interest difficult. Our three-way admixed population provides an opportunity to test differences in the effect of IL1B haplotypes among individuals with varying degrees of admixture. Regarding these analyses, we found that the second most frequent haplotype in Colombian controls, IL1B-TGCT, was associated with AP risk irrespective of the region of origin. Also, our results suggest that most of the AP risk can be attributed to global European ancestry proportions instead of local European ancestry at 2q14, meaning that non-genetic factors associated with the European component could have an important role in the risk of AP. Further genome-wide association analysis taking into account other non-genetic risk factors could help disentangle the observed association between AP risk and global European ancestry in Colombians. We did not find an effect of the IL1B-TGCT haplotype on CRC risk, which could be explained in part by the small sample size or the fact that this haplotype exhibits moderate transcriptional activity in vitro10, making it more suitable as a susceptibility marker of milder lesions that do not necessarily evolve to CRC.

CRC risk was consistently associated with the IL1B-CGTC haplotype in Colombians, especially in individuals from the Coastal region of the country, who exhibit the highest African ancestry proportions. Despite this result, an analysis of 393 individuals with local ancestry inference (LAI) data only suggested an effect of IL1B-CGTC on CRC risk; lack of significance for this association is likely due to the small sample size and the overall low frequency of this haplotype in Colombians (~13%). The important role of African ancestry within locus 2q14 in CRC risk was suggested by the fact that variants within this region showed the lowest p-values in the multimarker regression analysis along chromosome 2. Interestingly, we found a significant interaction between the IL1B-CGTC haplotype and local African ancestry for CRC risk. Because only local African ancestry remained significantly associated with CRC risk when we tested the main effects of the risk haplotype as well as local and global African ancestries, we conclude that additional African-related variants that could explain the risk of CRC may be located within the selected 2q14 region, and fine-mapping methods will help to identify these variants.

The relationship of the IL1B-CGTC haplotype with CRC risk and African ancestry identified herein corroborates previous work showing that this haplotype exhibits the highest transcriptional activity amongst four other possible IL1B promoter haplotypes10 and that it is the most frequent haplotype in African Americans10. Therefore, further population case-control studies among Colombians with different African ancestry proportions, seeking differences in the expression levels of IL-1 beta and its targets (such as COX-2 and PGE2) in colorectal tissues and plasma samples, will help prove their utility as susceptibility markers for the risk of CRC in the general population. The identification of such markers will allow individuals who are at an increased risk to be offered effective measures to prevent this type of cancer. For example, according to the atlas of cancer mortality in Colombia published in 201013, two of the cities included in this study from the Andean region, Bogotá DC and Bucaramanga, belong to two regions with 27% and 11% higher risks of CRC mortality, respectively, than the rest of the country. Moreover, in the cities included from the Caribbean Coast and surrounding areas (Cartagena, Barranquilla and Santa Marta), the risk of CRC mortality is lower than in the general population, whereas in Cali and its surroundings in Valle del Cauca, located on the Pacific Coast, the risk of CRC mortality is 22% higher than that in the rest of the country13. Interestingly, Cali has the highest concentration of Afro-Colombians38.

Overall, these results are of particular interest due to previous observations about disparities in CRC risk among different population groups, such as differences between Non-Hispanic White and African American U.S. populations, with respect to inflammatory diseases and cancer development11,39. Our findings highlight the importance of conducting genetic studies in admixed populations to reveal potential population-specific susceptibility markers of risk for complex diseases.

Methods

Subjects

This is a multicentre, hospital-based case-control study that was conducted in six Colombian cities from the Andean (Bogotá and Bucaramanga) and Coastal regions (Cartagena, Santa Marta and Barranquilla in the Caribbean and Cali in the Pacific). These cities are characterized by differences in CRC mortality risk and ancestry proportions. A total of 306 CRC cases, 191 AP and 500 matched controls between age 30 and 74 years were enrolled. All cases were incident and confirmed by histopathology. The control group consisted of individuals without gastrointestinal symptoms attending the outpatient services of primary care units. Neither cases nor controls had a personal history of other cancers, and neither group received chemotherapy or radiotherapy. Each participant provided written informed consent. This work was approved by the Ethics Committee of the Instituto Nacional de Cancerología, Bogotá, Colombia as a “study with greater than minimal risk” according to guidelines established in the document “RESOLUCIÓN 8430 DE 1993” for Ethical Aspects of Human Research, (Title II, Chapter 1) published by the República de Colombia Ministerio de Salud (https://www.invima.gov.co/images/pdf/medicamentos/resoluciones/etica_res_8430_1993.pdf).

We also included samples from 177 self-reported African American patients with CRC diagnosed at the Howard University College of Medicine in Washington, D.C. and the Ochsner Clinic Foundation in New Orleans, LA for IL1B SNPs genotyping to compare the IL1B haplotype frequencies with those of Colombian samples. Because these samples were de-identified FFPE tissues and the study was retrospective, the LSUHSC IRB following the recommendations for the use of human tissues and under strictest HIPAA protocols classified this specific study as an “Institutional Review Board (IRB) exempt” study.

All methods were performed in accordance with the Declaration of Helsinki and relevant local guidelines and regulations, as already mentioned.

DNA extraction

DNA was extracted from 200 μl of buffy coat using the QIAamp DNA Blood Mini Kit (QIAGEN, Valencia, CA, USA) according to the manufacturer’s protocol. The DNA was resuspended in 100 μl of Ambion Nuclease-free Water (Ambion, Foster City, CA, USA) and stored at −20 °C. DNA purity and concentration were assessed using a NanoDrop 2000 (Thermo Scientific, Wilmington, DE, USA).

IL1B SNPs genotyping

All subjects were genotyped for five IL1B SNPs using TaqMan SNP Genotyping Assays Kits (Applied Biosystems, Foster City, CA, USA) according to the manufacturer’s protocol. Four of these SNPs are located in the promoter region of the gene: −3737 C > T (rs4848306), −1464 G > C (rs1143623), −511 C > T (rs16944) and −31 T > C (rs1143627). The fifth SNP, +3954 C > T (rs1143634), is located within exon 5. After PCR amplification, the genotype was determined using the Sequence Detection System (SDS) software (Applied Biosystems, Foster City, CA, USA). DNA controls with known genotype for each SNP were run in parallel. Hardy-Weinberg equilibrium for IL1B-genotyped SNPs was assessed using the exact test statistics PHWE in PLINK40,41. All IL1B SNPs were in Hardy-Weinberg equilibrium (p > 0.05) (data not shown).
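As an illustration of the Hardy-Weinberg check, the sketch below enumerates the conditional distribution of heterozygote counts given the allele counts and sums the probabilities of all configurations that are no more likely than the observed one. It is a minimal Python sketch of an exact test of this kind, not PLINK's code; the function name `hwe_exact_p` and the genotype counts in the example are hypothetical.

```python
from math import exp, lgamma, log


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact Hardy-Weinberg p-value from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb       # number of individuals
    n_a = 2 * n_aa + n_ab        # copies of allele A
    n_b = 2 * n_bb + n_ab        # copies of allele B

    def log_prob(het):
        # P(het | n, n_a) = n! / (hom_a! het! hom_b!) * 2^het * n_a! n_b! / (2n)!
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (lgamma(n + 1) - lgamma(hom_a + 1) - lgamma(het + 1)
                - lgamma(hom_b + 1) + het * log(2.0)
                + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    # Heterozygote counts must have the same parity as the allele counts.
    candidates = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {het: exp(log_prob(het)) for het in candidates}
    observed = probs[n_ab]
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in probs.values() if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical counts: the first set matches HWE expectations,
# the second has no heterozygotes at all.
print(hwe_exact_p(25, 50, 25))
print(hwe_exact_p(50, 0, 50))
```

A SNP would be flagged as deviating from Hardy-Weinberg equilibrium when this p-value falls below the chosen threshold (0.05 in the text).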

Global ancestry estimation

We used two different platforms from Illumina®, a candidate-gene “Cancer SNP Panel” array that includes 1421 SNPs, and a genome-wide “Infinium® OmniExpressExome Array”, which includes 958178 SNPs, according to the manufacturer’s instructions, to genotype 521 and 443 Colombian samples, respectively.

Quality control (QC) and pruning steps were performed separately due to large differences in the number of markers tested within each array. These steps were based on the protocol described by Anderson et al.42 using PLINK v1.0740 and the R statistics v3.2.243 software, as recommended. All datasets were lifted over to the human genome build 37 coordinates as necessary. After QC steps for the candidate-gene dataset (Supplementary Methods), a total of 1237 SNPs and 483 samples remained for further analysis. For genome-wide data, a total of 720815 SNPs and 415 samples remained for further analysis after QC procedures (Supplementary Methods).

We used the unsupervised model-based mode in ADMIXTURE30 for individual ancestry estimations using genotypes from autosomal markers and K = 3 ancestral populations individually for candidate-gene and genome-wide datasets. For candidate-gene data, reference populations from the HapMap3 project32 public database were used (CEU = Utah residents of Northern and Western European ancestry; LWK = Luhya in Webuye, Kenya; and CHB = Han Chinese in Beijing, China; MEX = Mexican ancestry in Los Angeles, California) (Supplementary Methods). Because the allele frequencies are similar between Asians and Amerindians, we used the CHB reference population to discriminate the Amerindian component16,44. The final merged and pruned candidate-gene database used to infer global ancestry in 483 Colombian samples included 473 overlapping SNPs. For genome-wide data, we included the genotypes of reference populations from public databases, 1000 genomes45 (IBS = Iberian Population in Spain and YRI = Yoruba in Ibadan, Nigeria) plus HGDP46 (AME = Amerindians, which includes Pima, Maya, Karitiana, Surui and Colombian Native Americans) (Supplementary Methods). The final merged and pruned genome-wide database used to infer global ancestry in 415 Colombian samples included 9663 overlapping SNPs.

After eliminating duplicates between both arrays (n = 85 samples) and selecting samples also genotyped for IL1B SNPs, a total of 791 Colombian samples with available global ancestry estimates remained for further adjusted regression analyses.

Local ancestry inference (LAI)

Among all available methods for LAI47,48,49,50,51, we selected a discriminative approach supported by the RFMix v1.5.4 software31 because it performs an iterative analysis that, beginning with unadmixed panels, also utilizes information in the chromosomes of admixed individuals to infer local ancestry. This approach allows refining our knowledge of haplotype patterns in the ancestral populations and improves accuracy via expectation maximization (EM) steps31.

For the LAI procedures, we used a reference panel that included the same populations from 1000 genomes45 and HGDP46 databases incorporated previously to calculate global ancestry proportions. Because RFMix31 requires phased haplotypes input, we first phased genotypes from our genome-wide Colombian samples and HGDP46 data using 1000 genomes45 as a reference prior to the merging steps (Supplementary Methods). Specifically, we used the segmented haplotype estimation and imputation tool, SHAPEIT v2.77852, based on the modelling of stochastic processes through a Hidden Markov Model to phase genotypes. To correct for admixture in our reference populations, we performed five EM iterations, as recommended31.

Statistical analysis

A descriptive analysis based on the characteristics of the 997 Colombian samples genotyped for the IL1B gene SNPs was performed in R43 using Pearson’s Chi-Squared Test to assess for differences among groups. Full model association tests between the disease and each SNP were performed in PLINK40. LD among all five IL1B SNPs was calculated using the Haploview algorithm33.

Global ancestry estimates from ADMIXTURE30 with candidate-gene or genome-wide datasets were obtained for 791 unique Colombian samples with IL1B haplotype information. Furthermore, the LAI results from RFMix31 were used to calculate global ancestry as the average locus-specific ancestry across all loci for each individual with genome-wide data. Pearson’s correlation per ancestry component was computed between all estimates. Differences in ancestry proportions between the control and CRC or AP group were assessed with the Wilcoxon rank sum test for ancestry estimates obtained with ADMIXTURE30.
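The comparison of ancestry proportions between groups can be sketched as a rank sum test. The Python version below uses midranks for ties and a normal approximation without continuity or tie correction, which is a simplification and may differ from the R routine used in the study; the function name `rank_sum_test` and the example values are hypothetical.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns the Mann-Whitney U statistic for x and an approximate p-value.
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j over the block of tied values and assign the midrank.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return u, erfc(abs(z) / sqrt(2.0))


# Hypothetical African ancestry proportions for two small groups.
controls = [0.02, 0.05, 0.08, 0.10, 0.12]
cases = [0.09, 0.15, 0.20, 0.31, 0.40]
print(rank_sum_test(controls, cases))
```

A rank-based test is a natural choice here because the African ancestry proportions are right-skewed, so a comparison of means could be dominated by a few highly admixed individuals.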

Because the sample size in this study is at best moderate and the observed African ancestry proportions were substantially skewed, we transformed the European and African ancestry components to a symmetric distribution using a logit transformation before conducting adjusted regression analyses. In these models, Amerindian ancestry was treated as the reference, and global ancestry proportions were always corrected by the “array” variable, referring to candidate-gene or genome-wide estimates.
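The logit transformation maps a proportion in (0, 1) onto the whole real line, which stretches the compressed region near zero where most African ancestry estimates lie. A minimal Python sketch follows; the clipping constant `eps`, used to keep proportions of exactly 0 or 1 finite, is an assumption, since the text does not state how boundary values were handled.

```python
from math import log


def logit(proportion, eps=1e-6):
    """Logit of an ancestry proportion, clipped away from 0 and 1."""
    # Clipping is an assumption of this sketch, not a detail from the study.
    p = min(max(proportion, eps), 1.0 - eps)
    return log(p / (1.0 - p))


# Hypothetical right-skewed African ancestry proportions.
african = [0.01, 0.02, 0.05, 0.10, 0.45]
print([round(logit(p), 2) for p in african])
```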

Multinomial logistic regressions of phenotypes, including global ancestry estimates, sex, age, educational level, city of origin, NSAID consumption and a family history of CRC as explanatory covariates, were conducted with R43. The AIC was used to select the best model.
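Model selection by AIC trades goodness of fit against the number of estimated parameters: AIC = 2k − 2 ln L, where k is the number of parameters and L the maximized likelihood, and the model with the lowest AIC is preferred. The Python sketch below shows the comparison only; the model names, log-likelihoods and parameter counts are invented for illustration and are not values from Supplementary Table S3.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_model(models):
    """Return the name of the model with the lowest AIC.

    models maps a model name to a (log_likelihood, n_params) pair.
    """
    return min(models, key=lambda name: aic(*models[name]))


# Hypothetical fits: the larger model fits slightly better but
# pays a penalty for three extra parameters.
candidate_models = {
    "ancestry + covariates": (-100.0, 5),
    "ancestry + covariates + city": (-99.5, 8),
}
print(best_model(candidate_models))
```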

The IL1B haplotype frequencies among Colombian and African American CRC samples were inferred in R43, through EM computation of haplotype probabilities with progressive insertion of loci. Conditional haplotype tests were conducted in PLINK40 to evaluate differences in disease risk related to haplotype context when controlling for some SNPs or when assessing for their independent effect. The odds ratio (OR) for each IL1B haplotype and disease risk were obtained for Colombian samples using unadjusted and adjusted GLM in R43 while controlling for sex, age, educational level, global ancestry proportions and array variables. GLM analyses of IL1B haplotypes and disease risk were conducted in a stratified analysis by region of origin, using sex, age and educational level as covariates. In all cases, the most frequent haplotype was used as the reference, and rare haplotypes (less than 0.01) were not included in the analyses.
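The idea behind EM-based haplotype inference can be illustrated with the simplest case of two biallelic SNPs, where only double heterozygotes have ambiguous phase. The Python sketch below is a deliberately reduced version and does not implement the progressive insertion of loci used in the study; the function name `em_two_snp_haplotypes` and the genotype coding (minor-allele counts 0, 1 or 2) are assumptions made for the example.

```python
from collections import Counter

HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def em_two_snp_haplotypes(genotypes, n_iter=100):
    """Estimate two-SNP haplotype frequencies by expectation maximization.

    genotypes is a list of (g1, g2) pairs of minor-allele counts.
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    fixed = Counter()     # haplotypes whose phase is unambiguous
    n_double_het = 0      # individuals heterozygous at both SNPs
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1
        else:
            # With at most one heterozygous site, both haplotypes are known.
            fixed[(int(g1 == 2), int(g2 == 2))] += 1
            fixed[(int(g1 >= 1), int(g2 >= 1))] += 1
    total = 2 * len(genotypes)
    freqs = {h: 0.25 for h in HAPLOTYPES}
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is 00/11 (cis)
        # rather than 01/10 (trans), given current frequencies.
        cis = freqs[(0, 0)] * freqs[(1, 1)]
        trans = freqs[(0, 1)] * freqs[(1, 0)]
        w_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: update frequencies from expected haplotype counts.
        counts = {h: float(fixed[h]) for h in HAPLOTYPES}
        for h in [(0, 0), (1, 1)]:
            counts[h] += n_double_het * w_cis
        for h in [(0, 1), (1, 0)]:
            counts[h] += n_double_het * (1.0 - w_cis)
        freqs = {h: counts[h] / total for h in HAPLOTYPES}
    return freqs


# Hypothetical genotypes; the last individual is phase-ambiguous.
print(em_two_snp_haplotypes([(0, 0), (2, 2), (1, 0), (1, 1)]))
```

With more SNPs the number of compatible haplotype pairs per individual grows quickly, which is why stepwise strategies such as progressive insertion of loci are used in practice.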

For the 393 Colombian samples with both IL1B haplotypes and LAI information, we estimated global ancestry for chromosome 2 as the average locus-specific ancestry across all loci. We also selected a 100000-bp region at locus 2q14 (Chr2:113500000:113600000; 18 SNPs; build 37) that holds the IL1B gene to calculate locus-specific ancestry. We plotted the variation of each ancestry proportion per marker in chromosome 2 in the CRC and AP groups relative to those in the control group used as the baseline and conducted a multimarker GLM analysis for CRC risk. Specifically, the African dosage per marker at chromosome 2 corrected for sex, age, educational level and global African ancestry was used as an explanatory variable. We conducted the same analysis comparing the AP and control groups but used the European dosage per marker at chromosome 2 and global European ancestry instead. Corresponding −log10 (P-values) for each analysis are displayed in Manhattan plots. The computed P-values were adjusted for multiple comparisons using the FDR correction for 28579 tests/variants along chromosome 2.
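The multiple-testing adjustment can be sketched as follows. The text does not name the FDR procedure, so this Python example assumes the Benjamini-Hochberg step-up method; the function name `bh_adjust` and the p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical per-marker p-values from a multimarker analysis.
raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 3) for p in bh_adjust(raw)])
```

Under this procedure, a marker whose adjusted p-value is below 0.05 would be declared significant at a 5% FDR, and values between 0.05 and 0.1 correspond to the suggestive range discussed in the Results.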

Finally, we performed multinomial logistic regression analyses that compared the respective IL1B risk haplotypes copies and the effect of global and locus-specific African or European ancestry of the control group with those of the CRC and AP groups, and these comparisons were adjusted for sex, age, educational level, NSAID consumption and a family history of CRC.
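Odds ratios and confidence intervals from logistic-type regressions are conventionally obtained by exponentiating a coefficient and its Wald interval. The Python sketch below shows that conversion under the assumption of Wald-type intervals, which the text does not state explicitly; the function name `odds_ratio_summary`, the coefficient and the standard error are hypothetical.

```python
from math import erfc, exp, sqrt


def odds_ratio_summary(beta, se, z_crit=1.959964):
    """Convert a log-odds coefficient and its SE to OR, 95% CI and p-value."""
    z = beta / se
    return {
        "or": exp(beta),
        "ci_low": exp(beta - z_crit * se),
        "ci_high": exp(beta + z_crit * se),
        # Two-sided p-value from the standard normal distribution.
        "p": erfc(abs(z) / sqrt(2.0)),
    }


# Hypothetical coefficient for one extra copy of a risk haplotype.
print(odds_ratio_summary(beta=0.40, se=0.18))
```

A confidence interval that includes 1 corresponds to a p-value above 0.05, which is the pattern seen for the associations described as suggestive in the Results.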

Additional Information

How to cite this article: Sanabria-Salas, M. C. et al. IL1B-CGTC haplotype is associated with colorectal cancer in admixed individuals with increased African ancestry. Sci. Rep. 7, 41920; doi: 10.1038/srep41920 (2017).
