Original Article

Oncogene (2012) 31, 3777–3784; doi:10.1038/onc.2011.564; published online 12 December 2011

The 14q22.2 colorectal cancer variant rs4444235 shows cis-acting regulation of BMP4

S J Lubbe1, A M Pittman1, B Olver1, A Lloyd1, J Vijayakrishnan1, S Naranjo2, S Dobbins1, P Broderick1, J L Gómez-Skarmeta2 and R S Houlston1

  1. Section of Cancer Genetics, Institute of Cancer Research, Surrey, UK
  2. Centro Andaluz de Biología del Desarrollo, CSIC-UPO, Carretera de Utrera Km1, Seville, Spain

Correspondence: Professor RS Houlston, Section of Cancer Genetics, Institute of Cancer Research, Sutton, Surrey SM2 5NG, UK. E-mail: Richard.houlston@icr.ac.uk

Received 13 August 2011; Revised 3 October 2011; Accepted 3 November 2011
Advance online publication 12 December 2011



Common genetic variation at human 14q22.2 tagged by rs4444235 is significantly associated with colorectal cancer (CRC) risk. Re-sequencing was used to comprehensively annotate the 17-kb region of strong linkage disequilibrium encompassing rs4444235. Through bioinformatic analyses using H3K4Me1, H3K4Me3 and DNase-I hypersensitivity chromatin signatures and evolutionary conservation, we identified seven candidate disease-causing single-nucleotide polymorphisms mapping to six regions within the 17-kb region predicted to have regulatory potential. Reporter gene studies of these regions demonstrated that the element to which rs4444235 maps acts as an allele-specific transcriptional enhancer. Allele-specific expression studies in CRC cell lines heterozygous for rs4444235 showed significantly increased expression of bone morphogenetic protein-4 (BMP4) associated with the risk allele (P<0.001). These data provide evidence for a functional basis for the non-coding risk variant rs4444235 at 14q22.2 and emphasize the importance of genetic variation in BMP pathway genes as determinants of CRC risk.


bone morphogenetic protein-4; colorectal cancer; cis-regulatory



Many colorectal cancers (CRCs) develop in genetically susceptible individuals, most of whom are not carriers of germline mismatch repair or APC gene mutations (Lichtenstein et al., 2000; Aaltonen et al., 2007; Lubbe et al., 2009). Much of the heritable risk of CRC is now thought to be the consequence of the co-inheritance of multiple low-risk variants. Such an assertion is supported by recent genome-wide association studies (GWASs) of CRC (Broderick et al., 2007; Tomlinson et al., 2007, 2008; Zanke et al., 2007; Houlston et al., 2008, 2010; Jaeger et al., 2008; Tenesa et al., 2008).

As the single-nucleotide polymorphisms (SNPs) genotyped in GWASs are generally not themselves strong candidates for causality, establishing the genetic and functional basis of a specific disease locus is a challenge. As demonstrated by recent studies of the 8q23, 8q24 and 18q21 risk loci for CRC, dissecting the genetic and functional basis of associations identified by GWASs can provide novel insights into cancer biology (Pittman et al., 2009, 2010; Pomerantz et al., 2009; Tuupanen et al., 2009).

We have shown recently that common variation at chromosome 14q22.2 annotated by the SNP rs4444235 influences CRC risk (Houlston et al., 2008). To elucidate the basis of this association, we systematically interrogated the 14q22.2 association signal through targeted re-sequencing, linkage disequilibrium (LD) mapping and functional analyses. We demonstrate that rs4444235 shows cis-acting regulation of bone morphogenetic protein-4 (BMP4) expression. These data provide strong support for a functional role of this SNP in the association with CRC observed at this locus.



Our previously published GWAS data (Houlston et al., 2008) provided evidence for an association at the 14q22.2 locus annotated by rs4444235. Assuming high LD between rs4444235 and a functional disease-causing variant (that is, r² ≥ 0.5), we identified a 17-kb genomic region of LD encompassing rs4444235 (chr14q22.2:53477192–53494200; UCSC March 2006 assembly, NCBI build 36.1) likely to harbor a functional variant (Figure 1).

Figure 1.

The 17-kb colorectal cancer locus tagged by rs4444235, including the BMP4 gene. (a) A regional plot using data from the original GWAS showing the single-marker association statistics (as −log10P, left y-axis) in the combined analysis of phase-1 (red dot), phases 1 and 2 (blue dots), and all phases (yellow dot) as a function of genomic position (NCBI build 36.1) (Houlston et al., 2008). The recombination rate across the region derived from the HapMap CEU samples is shown in black (right y-axis). The relative position of the BMP4 gene is shown as a green box. Also highlighted is the region selected for re-sequencing and annotation in the variant discovery panel. (b) A regional plot showing the single-marker association statistics for variants from the variant discovery panel (green dots), the replication phase (blue dots), the variants imputed using MACH 1.0 (purple dots) and the coding variant used for allele-specific expression analysis (red dots) as a function of genomic position (NCBI build 36.1). (c) MACH 1.0 imputation quality scores for each variant. (d) The LD structure encompassing rs4444235 derived using the Haploview software (v4.1). The seven candidate variants selected for luciferase reporter analysis from the replication phase (five blue triangles) and from the MACH 1.0 analysis (two purple triangles) are shown. The coding SNP rs17563 used for allele-specific expression analysis is denoted by a red triangle.


To annotate this 17-kb region, we re-sequenced constitutional DNA from 90 CRC cases. Only 708 bp (4.2%) of the 17-kb region was refractory to re-sequencing owing to low-complexity genomic sequence. We identified 68 variants (Supplementary Table 1); these included three insertion/deletion polymorphisms and 22 common variants (minor allele frequency ≥0.05), of which only six had been genotyped by HapMap. We calculated pairwise LD statistics between rs4444235 and each of the other common variants, and 12 polymorphisms showed evidence of high LD with rs4444235 (r² ≥ 0.50; Figure 1 and Supplementary Table 1). These 13 polymorphisms, including rs4444235, were genotyped in 3665 CRC cases and 2891 controls that had not been the subject of previous genetic analyses. The strongest associations were shown for rs4444235 (P=0.012), rs12898159 (P=0.024), rs35107139 (P=0.006), rs10130587 (P=0.009) and rs2855532 (P=0.028) (Figure 1 and Supplementary Table 2). To explore the possibility that we had failed to identify a candidate variant, we used the data generated from sequencing the 90 cases to impute all untyped SNPs in the case–control cohort and tested them for association with CRC. No SNP beyond those directly genotyped provided superior evidence for an association, indicating that our LD-based selection of candidate SNPs was sufficient to delineate a candidate disease-associated variant. From this analysis, rs2761884 (P=0.012) and rs2738265 (P=0.012) also showed evidence of an association with CRC (Figure 1 and Table 1). The SNPs rs4444235, rs12898159, rs35107139, rs10130587, rs2855532, rs2761884 and rs2738265 were strongly correlated with one another (pairwise r² > 0.4; Figure 1) and defined a single risk haplotype. Collectively, these data support the hypothesis that one of these seven variants is likely to underlie the 14q22.2 association.
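The LD-based selection step can be illustrated with a minimal Python sketch. It assumes phased two-locus haplotype frequencies are already available (in this study LD metrics were obtained with Haploview v4.1); the function names, the dictionary layout and the example frequencies are hypothetical. The sketch computes D, Lewontin's D′ and r² for a tag-SNP allele A and a candidate-SNP allele B, and keeps candidates meeting an r² threshold such as the 0.5 used here.

```python
from __future__ import annotations

# Hypothetical sketch: pairwise LD between a tag SNP (allele A) and a
# candidate SNP (allele B) from phased haplotype frequencies.


def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, signed D', r^2); p_ab is the frequency of the A-B haplotype."""
    d = p_ab - p_a * p_b
    # Lewontin's normalisation: D_max depends on the sign of D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


def select_candidates(snps: dict[str, tuple[float, float, float]],
                      threshold: float = 0.5) -> list[str]:
    """Keep SNP ids whose r^2 with the tag SNP is at least `threshold`.

    snps maps SNP id -> (p_ab, p_a, p_b).
    """
    return [snp for snp, (p_ab, p_a, p_b) in snps.items()
            if ld_stats(p_ab, p_a, p_b)[2] >= threshold]


# Hypothetical frequencies: one SNP in complete LD, one independent of the tag
example = {"snp_linked": (0.4, 0.4, 0.4), "snp_unlinked": (0.06, 0.2, 0.3)}
print(select_candidates(example))
```

Note that r² depends on allele frequencies as well as on D, which is why two SNPs can show D′ close to 1 yet a modest r², as for rs4444235 and rs17563 (r² = 0.74, D′ = 0.92).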

Cross-species sequence comparison of the 17-kb interval showed that this genomic region is highly conserved among mammals (Figure 2). Sequence conservation in non-coding regions has been shown to be a predictor of cis-regulatory sequences (Gomez-Skarmeta et al., 2006). Furthermore, it has been proposed that variation within evolutionarily conserved regions (ECRs) is likely to be associated with phenotypic differences that contribute to the expression of traits (Gomez-Skarmeta et al., 2006). Regions of the genome marked by H3K4Me1/3 histone modifications are known to influence gene expression through effects on chromatin accessibility, and regions hypersensitive to DNase-I cutting are indicative of regulatory elements. To further examine the nature of the sequence within the 17-kb region of association, we applied a number of computational approaches. Using the dcode ECR Browser and the UCSC Genome Browser, we examined H3K4Me1, H3K4Me3 and DNase-I hypersensitivity chromatin signatures in publicly available cell line data. The seven candidate disease-causing SNPs mapped to six regions predicted to have regulatory potential (Figure 2): these regions are annotated by rs4444235, rs12898159, rs2855532, rs2761884 and rs2738265 individually, and by rs35107139 and rs10130587 jointly, as these two SNPs map 3 bp apart and effectively define a single motif.

Figure 2.

The 17-kb region of 14q22.2 showing the relationship between the seven candidate variants and ECRs, H3K4Me1 and H3K4Me3 modifications, and DNase-I hypersensitivity clusters. Regions of DNase-I hypersensitivity are shown by light blue bars. Regions prone to H3K4Me1 or H3K4Me3 modification in the different cell lines are shown. ECRs between human and other species are denoted by pink lines relative to BMP4 (blue, coding exons; yellow, untranslated regions; red, intergenic regions; salmon, intronic regions; green, transposons and simple repeats). Also shown are the genomic regions that were subjected to reporter assays.


To directly evaluate the enhancer activity of the six putative regulatory regions, we cloned DNA fragments containing the six conserved islands, incorporating the different alleles of rs4444235, rs12898159, rs2855532, rs2761884, rs2738265, and rs35107139 and rs10130587, into luciferase (luc2) reporter vectors and assayed them in the human CRC cell lines RKO and LoVo. For five of the six regions, we detected enhancer activity with both alleles, with no significant difference in allele-specific enhancer activity (Supplementary Figure 1). By contrast, the region incorporating the risk allele of rs4444235 was associated with a significant increase in transcriptional activity compared with the protective allele (Figure 3). These data are consistent with the enhancer element defined by rs4444235 acting on transcription in an allele-specific manner.

Figure 3.

Luciferase reporter gene activity for the rs4444235 construct cloned into the pGL3-Control vector. The ratio of luminescence from the experimental pGL3-rs4444235 construct to the luminescence from the Renilla internal control (pRL-CMV reporter) was calculated to define the relative luciferase activity, which was compared between the risk allele (red) and the protective allele (green). The risk allele shows enhancer activity in the LoVo and RKO CRC cell lines in comparison with the activity associated with the protective allele and the Renilla internal control (blue). A full colour version of this figure is available at the Oncogene journal online.


Direct evidence supporting allele-specific protein binding at rs4444235 was provided by electrophoretic mobility-shift assays (EMSAs), which showed reduced nuclear protein–DNA complex formation with the risk allele in human lymphoblastoid as well as LoVo and RKO CRC cell lines (Figure 4). Using the EEL and TFSEARCH programs, we searched for transcription factors (TFs) likely to bind differentially to the genomic region defined by rs4444235. These programs respectively predicted SRY-box-7 (SOX7), which influences Wnt/β-catenin-stimulated transcription (Takash et al., 2001), and early growth response-1 (EGR1), a direct regulator of transforming growth factor-β1 (TGF-β1; Liu et al., 1996), as potential TFs binding at rs4444235. Using SOX7- and EGR1-specific antibodies, we sought to validate these TF predictions through ‘supershift’ EMSA. Neither antibody, however, produced a visible ‘supershift’ or disruption of the protein–DNA complex indicative of TF binding (data not shown).

Figure 4.

EMSA of rs4444235 showing differential binding of nuclear protein to the T-allele (protective allele) and the C-allele (risk allele). Autoradiographs for binding of double-stranded T-allele and C-allele probes to lymphoblastoid, LoVo and RKO CRC nuclear extracts are shown, indicating a marked reduction in DNA–protein binding associated with the risk allele in the lymphoblastoid nuclear extract and a slight but visible reduction in the LoVo and RKO CRC nuclear extracts.


To examine for a cis-acting effect of rs4444235 on BMP4 expression, and for somatic selection of the risk allele in CRC, we studied the allele-specific expression (ASE) of BMP4 in three CRC cell lines heterozygous for rs4444235 (CaCo2, HCA7 and HT29). The strong correlation between rs4444235 and the coding SNP rs17563 (r² = 0.74, D′ = 0.92), whereby the risk allele of rs4444235 is in LD with the A-allele of rs17563 (a SNP causing a conservative valine-to-alanine amino-acid substitution), allowed us to use real-time PCR to quantify ASE (Figure 5). In each of the three CRC cell lines, which were heterozygous for both rs4444235 and rs17563, expression of the A-allele transcript was significantly higher than that of the G-allele transcript, consistent with rs4444235 having a cis-acting regulatory influence on BMP4 (Table 2). Allelic imbalance is a potential confounder in ASE-based studies. However, the CRC cell lines analyzed have not previously been documented to harbor chromosome 14q22.2 anomalies: HT29 (Wellcome Trust Sanger Institute, CGP LOH and Copy-Number Analysis: http://www.sanger.ac.uk/cgi-bin/CGP/cghviewer/; Abdel-Rahman et al., 2001; Kawai et al., 2002), CaCo2 (Tsushimi et al., 2001; Gaasenbeek et al., 2006) and HCA7 (Abdel-Rahman et al., 2001). Furthermore, sequencing of the three cell lines provided no evidence for allelic imbalance at rs4444235 or rs17563 (data not shown).

Figure 5.

Allele-specific BMP4 expression analysis using real-time quantitative PCR. (a) The log10 of the FAM/VIC intensity ratio was plotted against the log10 of the FAM/VIC allele ratio for DNAs known to be homozygous for either the risk allele or the protective allele at rs17563, mixed at various ratios (4:1, 2:1, 1.5:1, 1.4:1, 1.3:1, 1.2:1, 1.1:1, 1:1, 1:1.1, 1:1.2, 1:1.3, 1:1.4, 1:1.5, 1:2 and 1:4; VIC allele/FAM allele). The average allelic expression ratio for each heterozygous CRC cell line cDNA sample was interpolated from the constructed standard curve after real-time quantitative PCR. (b) Real-time quantitative PCR amplification of cDNA samples from the HT29 and CaCo2 colorectal cancer cell lines, indicating a subtle increase in expression associated with the risk allele (red) compared with the protective allele (blue) for rs17563 within BMP4. A full colour version of this figure is available at the Oncogene journal online.




Dysregulation of the BMP pathway, which acts in part through downstream suppression of Wnt signaling (He et al., 2004), and aberrant β-catenin activation are common events in the pathogenesis of many cancers (Polakis, 2000). In terms of CRC development, tightly controlled regulation of BMP signaling has been proposed to be vital for the maintenance of the Wnt signaling that inhibits differentiation of basal crypt cells of the colonic epithelium (Kosinski et al., 2007). Indeed, in CRC, overexpression of BMP4 confers a more invasive and migratory phenotype (Deng et al., 2007). Here we have demonstrated that rs4444235 shows cis-acting regulation of BMP4, with the risk allele associated with increased expression.

We have shown previously that the CRC risk allele at 18q21 is associated with impaired enhancer function of SMAD7 (Pittman et al., 2009). Reduced levels of this antagonist of BMP signaling would also be expected to increase BMP pathway activity. Collectively, these data provide evidence that the generation of high BMP signaling levels provides a mechanism both for CRC predisposition and for maintenance of the malignant phenotype of CRC.

Although our studies are consistent with CRC risk being associated with increased BMP4 expression caused by the risk allele of rs4444235 affecting an enhancer, we do not exclude the possibility that the regulatory region we identified influences other genes through cis- or trans-acting effects. Furthermore, our findings do not preclude other risk alleles mapping to 14q22.2 also having a functional effect on BMP4 expression and hence on CRC risk. Given the evidence for an additional independent risk locus for CRC at 14q22.2 (Tomlinson et al., 2011), the architecture of inherited susceptibility to CRC through genetic variation at BMP4 is likely to be complex.

We found that reduced nuclear protein–DNA complex formation in EMSA coincided with increased enhancer activity of the regulatory element annotated by rs4444235 in the presence of the risk allele. These results are compatible with the risk allele reducing the affinity of a repressor factor for this regulatory element. Although both EGR1 and SOX7 represented attractive TF candidates, we were unable to demonstrate that either TF acts on the enhancer element annotated by rs4444235. Hence, elucidation of the transcriptional basis of enhancer control at this locus will require further studies.

We have recently provided SNP association data supporting the existence of an additional independent CRC susceptibility variant (rs1957636) mapping ~136 kb centromeric to the BMP4 gene (Tomlinson et al., 2011). Previous studies of the regulation of BMP4 transcription have identified multiple long-range cis-acting enhancer elements mapping up to 200 kb telomeric and centromeric to the BMP4 promoter (Pregizer and Mortlock, 2009). Collectively, these data raise the possibility that multiple risk alleles at independent loci at 14q22.2 may influence CRC risk through differential BMP4 expression.

The risk allele of the 8q24 variant rs6983267 has been reported to be preferentially amplified during the development of CRC. A similar mechanism of allele-specific somatic selection in tumorigenesis does not appear to operate with respect to 14q22.2 variation, as sequencing of the three heterozygous CRC cell lines showed no evidence of allelic imbalance at rs4444235 or rs17563.

Several CRC risk alleles map to non-coding genomic regions in the vicinity of other TGF-β-signaling pathway genes, notably GREM1, BMP2, SMAD7 and LAMA5 (Broderick et al., 2007; Houlston et al., 2008, 2010; Jaeger et al., 2008; Tenesa et al., 2008). On the basis of our analyses of the 14q22.2 and 18q21 risk variants, we predict that, if linked to cis-regulatory regions, alleles at these loci are likely to confer CRC risk by influencing gene expression through mechanisms similar to that described here.

Here we have used the relative expression levels of the two alleles of a SNP within BMP4 in the same sample, rather than total expression levels, to demonstrate the cis-acting regulatory effect of rs4444235. A major advantage of the ASE approach is that the two alleles are measured in the same cellular environment, so that each serves as an internal control for the other, controlling for trans-acting genetic factors and environmental factors that may cause differences in expression levels between samples. Given that many cancer associations appear to be mediated through differential gene expression rather than through changes in protein sequence, ASE is likely, as previously discussed (Milani et al., 2007), to offer a powerful means of establishing allele-specific effects on gene expression at other disease-causing loci.

In conclusion, we have demonstrated that rs4444235 acts as a cis-regulator of BMP4, providing a basis for CRC risk through differential expression and further emphasizing the importance of genetic variation in BMP pathway genes as determinants of CRC risk.


Materials and methods


Medical ethical committee approval for this study was obtained from the UK Multi-centre Research Ethics Committee (protocol number MREC 02/0/97).

Re-sequencing SNP discovery panel

Ninety patients with CRC were ascertained through the Royal Marsden Hospital Trust/Institute of Cancer Research Family History and DNA Registry (50 male; mean age at diagnosis 59 years). All patients were unrelated UK residents with self-reported European ancestry.

Genotyping cohort

A total of 3665 CRC cases (2142 male; mean age at diagnosis 59 years) were ascertained through The National Study of Colorectal Cancer Genetics (Penegar et al., 2007). In all cases CRC was defined according to the ninth revision of the International Classification of Diseases by codes 153–154 and all cases had histologically proven colorectal adenocarcinoma. A total of 2891 healthy individuals (1102 male; mean age at sampling 59 years) were recruited as part of ongoing National Cancer Research Network genetic epidemiological studies, The National Study of Colorectal Cancer Genetics (n=1192), the Genetic Lung Cancer Predisposition Study (1999–2004; n=561) (Eisen et al., 2008) and the Royal Marsden Hospital Trust/Institute of Cancer Research Family History DNA Registry (1999–2004; n=1138). These controls were spouses or unrelated friends of patients with malignancies. None had a personal history of malignancy at the time of ascertainment. All cases and controls were UK residents and had self-reported European ancestry, and there were no obvious differences in the demography of cases and controls in terms of place of residence.


Sequence changes in the chromosome 14q22.2 interval 53477192–53494200 bp (UCSC March 2006 assembly, NCBI build 36.1) were identified by direct sequencing. PCR and sequencing primers were designed using the Primer3 software (PCR primer sequences and conditions are available on request). Amplicons were sequenced using ABI BigDye chemistry (v3.1; Applied Biosystems, Foster City, CA, USA) and run on ABI 3730xl DNA analyzers (Applied Biosystems). Sequence reads were analyzed using the Mutation Surveyor software v3.10 (Softgenetics, State College, PA, USA).


DNA was extracted from EDTA-venous blood samples using conventional methodologies and quantified using PicoGreen (Invitrogen, Renfrew, UK). Custom genotyping was conducted using KASPar (KBiosciences, Hertfordshire, UK), or Pyrosequencing technology (Qiagen, Crawley, UK) where KASPar primers could not be accurately designed. Assay details are available on request. Genotyping quality was assessed using duplicate DNA samples within studies and SNP assays; for all SNPs, >99.9% concordant results were obtained. A sample was excluded from analysis if it failed genotyping for more than two SNPs.
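The two quality-control rules described above (duplicate concordance and exclusion of samples failing more than two SNPs) can be sketched as follows. This is an illustrative Python sketch, not the pipeline used in the study; genotype calls are assumed to be strings such as "CT" with a consistent allele order, None marks a failed call, and all names and data are hypothetical.

```python
from __future__ import annotations

# Hypothetical QC sketch. Calls are genotype strings (e.g. "CT"); None marks a
# failed call. Names and data layout are illustrative only.


def duplicate_concordance(calls_1: list[str | None], calls_2: list[str | None]) -> float:
    """Fraction of SNPs called in both duplicate samples that agree."""
    pairs = [(a, b) for a, b in zip(calls_1, calls_2) if a is not None and b is not None]
    if not pairs:
        return float("nan")
    return sum(a == b for a, b in pairs) / len(pairs)


def samples_passing_qc(genotypes: dict[str, list[str | None]], max_failed: int = 2) -> list[str]:
    """Keep samples that failed genotyping for no more than `max_failed` SNPs."""
    return [sample for sample, calls in genotypes.items()
            if sum(call is None for call in calls) <= max_failed]


panel = {
    "sample_1": ["CC", None, None, "TT"],   # two failed calls: retained
    "sample_2": [None, None, None, "CT"],   # three failed calls: excluded
}
print(samples_passing_qc(panel))
print(duplicate_concordance(["CC", "CT", None, "TT"], ["CC", "CT", "CT", "TT"]))
```

Concordance is computed only over SNPs called in both duplicates, so failed calls reduce call rate without being counted as discordance.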

Luciferase assay

The allele-specific fragments of each region were amplified from human DNA using primers (detailed in Supplementary Table 3), cloned into the pCR8/GW/TOPO vector and then transferred into pGL3 luc2 vectors using Gateway technology. The Gateway-compatible pGL3 luc2 vectors used in this study have been described recently (Houlston et al., 2010). pGL3 luc2 constructs were amplified in Escherichia coli, followed by purification of plasmid DNA using Qiagen Endotoxin-free Maxi-prep kits. LoVo (human colon adenocarcinoma) and RKO CRC cell lines (ECACC, Salisbury, UK) were grown in F12 (Ham's) and McCoy's 5a culture medium, respectively, supplemented with 10% fetal calf serum (37°C, 100% relative humidity, 5% CO2). Cultured cells were seeded at 2.7 × 10⁵ cells/well in 96-well tissue culture plates (Greiner, Stonehouse, UK) in 200 μl of medium and grown for ~24 h until 80% confluence. Transient transfection was performed with the TransFast transfection reagent (Promega, Southampton, UK) at a 1:1 charge ratio of transfection reagent to DNA in serum-free medium. In each well, cells were transfected with 150 ng of pGL3-construct DNA and 5 ng of the internal control plasmid DNA (pRL-CMV; Promega), which encodes the Renilla luciferase gene under the control of the cytomegalovirus (CMV) promoter. Six replicate wells of both LoVo and RKO cells were transfected with each reporter construct, and each transfection experiment was repeated twice. Transiently transfected cells were grown for 48 h, after which the luciferase assay was performed using the Dual-Glo luciferase assay system (Promega) according to the manufacturer's instructions. Firefly luciferase (from the pGL3 constructs) and Renilla luciferase (from the pRL-CMV internal control) activities were measured sequentially on a 96-well plate luminometer (Dynex Inc., West Sussex, UK). The ratio of luminescence from the experimental reporter to that from the control reporter was calculated for each sample and defined as the relative luciferase activity.
Differences in relative luciferase activity between constructs in each experiment were assessed using the Mann–Whitney test.
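The relative luciferase activity and the Mann–Whitney comparison can be illustrated with a minimal Python sketch using only the standard library. The function names, input layout and readings are hypothetical. Here the p-value is obtained by exact enumeration of all rank assignments, which is practical only for small groups such as six replicate wells per allele; in practice a statistics package would normally be used.

```python
from __future__ import annotations

from itertools import combinations


def relative_activity(firefly: list[float], renilla: list[float]) -> list[float]:
    """Firefly/Renilla luminescence ratio for each well."""
    return [f / r for f, r in zip(firefly, renilla)]


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks of `values`, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_exact(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U statistic for x, exact two-sided permutation p-value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    offset = n1 * (n1 + 1) / 2
    u_obs = sum(ranks[:n1]) - offset
    centre = n1 * n2 / 2
    extreme = total = 0
    # Enumerate every way of assigning n1 of the pooled ranks to group x
    for idx in combinations(range(n1 + n2), n1):
        u = sum(ranks[i] for i in idx) - offset
        total += 1
        if abs(u - centre) >= abs(u_obs - centre) - 1e-9:
            extreme += 1
    return u_obs, extreme / total


# Hypothetical readings from six replicate wells per allele
risk = relative_activity([9.1, 8.7, 9.4, 8.9, 9.6, 9.2], [1.0] * 6)
protective = relative_activity([6.2, 5.9, 6.8, 6.1, 6.5, 6.0], [1.0] * 6)
u, p = mann_whitney_exact(risk, protective)
print(u, round(p, 4))
```

With complete separation of two groups of six, U reaches its maximum of 36 and the exact two-sided p-value is 2/924 (about 0.002), the smallest attainable with this design; the exact distribution avoids relying on a normal approximation at such small sample sizes.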

Electrophoretic mobility-shift assay

Biotin end-labeled and unlabeled complementary oligonucleotide probes (5′-ACAGCCCTGATACTA[T/C]GTCCAGGCAGCTTAA-3′-biotin and 5′-TTAAGCTGCCTGGAC[A/G]TAGTATCAGGGCTGT-3′) (Invitrogen, Crawley, UK) were annealed to generate double-stranded EMSA probes. Nuclear protein was extracted from a lymphoblastoid cell line using NE-PER nuclear and cytoplasmic extraction kits (Thermo Scientific, Loughborough, UK). EMSA experiments were performed using the LightShift Chemiluminescent EMSA kit (Pierce, Thermo Scientific). Each 20-μl binding reaction contained 20 fmol of biotin end-labeled target DNA, 10× binding buffer, 50 ng poly(dI·dC), 2.5% glycerol, 0.05% NP-40 and ~5 μg of nuclear protein extract. After a 20-min incubation, reactions were electrophoresed for 1 h at 100 V in a 6% polyacrylamide gel (0.5× TBE running buffer) and then electroblotted for 1 h at 30 V. Chemiluminescent detection of biotin end-labeled DNA was performed with a streptavidin–horseradish peroxidase conjugate, captured on X-ray film and developed according to the manufacturer's instructions. Reactions without nuclear extract, and reactions with a 1000-fold excess of unlabeled probe, served as controls. ‘Supershift’ EMSAs were conducted using antibodies specific for SOX7 and EGR1 (Abcam, Cambridge, UK), adopting a methodology similar to that described previously (Fried and Crothers, 1981; Garner and Revzin, 1981). Antibodies at a concentration of 1–10 μg/μl were individually added to the EMSA reaction mixture and incubated for a further 10 min after the initial 20-min incubation. Reactions were then electrophoresed and detected by chemiluminescence as described above.
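As a quick consistency check on the probe sequences given above, the following Python sketch substitutes each allele into the bracketed SNP position and confirms that the two oligonucleotides are reverse complements of each other (T pairing with A, and C with G). The helper names are hypothetical; the sequences are copied from the Methods, with the 3′ biotin label omitted.

```python
# Consistency check of the EMSA probe sequences quoted in the Methods
# (3' biotin label omitted). Helper names are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

SENSE = "ACAGCCCTGATACTA[T/C]GTCCAGGCAGCTTAA"
ANTISENSE = "TTAAGCTGCCTGGAC[A/G]TAGTATCAGGGCTGT"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def with_allele(template: str, allele: str) -> str:
    """Replace the bracketed SNP, e.g. '[T/C]', with a single allele."""
    start, end = template.index("["), template.index("]") + 1
    return template[:start] + allele + template[end:]


# rs4444235: protective T pairs with A, risk C pairs with G on the other strand
for sense_allele, antisense_allele in (("T", "A"), ("C", "G")):
    sense = with_allele(SENSE, sense_allele)
    antisense = with_allele(ANTISENSE, antisense_allele)
    print(sense_allele, reverse_complement(sense) == antisense)  # prints "T True", then "C True"
```

Both 31-nt oligonucleotide pairs anneal perfectly, with the SNP at the central position (15 nt of flanking sequence on each side).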

Allele-specific expression analysis

DNA and total RNA were extracted from five human CRC cell lines (HT29, HCA7, CaCo2, LoVo and RKO). Sequencing showed that three of the five cell lines (HT29, HCA7 and CaCo2) were heterozygous at both rs17563 and rs4444235, and that the LoVo and RKO cell lines were homozygous for the G and A alleles of rs17563, respectively. Peak heights were measured from sequencing chromatograms using the Mutation Surveyor software v3.10 (Softgenetics) and were used to calculate allelic ratios when assessing allelic imbalance at the 14q22.2 locus. Complementary DNA was synthesized in duplicate using the Transcriptor High Fidelity cDNA synthesis kit according to the manufacturer's instructions (Roche, Hertfordshire, UK).

An Applied Biosystems genotyping assay for rs17563 was used to examine allele-specific BMP4 expression, following the methodology of Lo et al. (2003). Using various dilutions of cDNA and DNA from LoVo and RKO, a standard curve of the ratios of gene expression between the G (VIC-labeled) and A (FAM-labeled) alleles was derived from measurements of fluorescence intensity (Lo et al., 2003). DNA and cDNA were diluted to 2.5–5.0 ng/μl. Genomic DNA from LoVo and RKO was mixed together in the following ratios: 4:1, 2:1, 1.5:1, 1.4:1, 1.3:1, 1.2:1, 1.1:1, 1:1, 1:1.1, 1:1.2, 1:1.3, 1:1.4, 1:1.5, 1:2 and 1:4 (VIC allele/FAM allele), chosen so that small differences in expression could be detected. We used the ABI PRISM 7900HT Sequence Detection System, and the TaqMan SNP Sequence Detection Systems software (SDS 2.0; Applied Biosystems) was used to automatically collect and analyze data and generate genotype calls. For each mixing ratio, the log of the fluorescence intensity ratio (FAM intensity/VIC intensity) at the end of cycle 40 was used to construct a standard linear regression curve, y=a+bx, where y is the log of the FAM intensity/VIC intensity ratio at a given mixing ratio, x is the log of the mixing ratio, a is the intercept and b is the slope. Each cDNA duplicate was assayed five times, and the average allelic expression ratio for each heterozygous CRC cell line cDNA sample was interpolated from the standard curve using the observed log FAM intensity/VIC intensity ratio. Differences in expression were determined by comparing the observed G-allele/A-allele intensity ratios of rs17563 in the CRC cell lines with the ratio expected under equal expression, that is, the intensity ratio of samples mixed at a 1:1 allelic ratio.
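The standard-curve step, y = a + bx, and the interpolation of an allelic expression ratio can be sketched in Python as follows. This is a minimal illustration assuming log10 transforms and an ordinary least-squares fit; the intensity values are hypothetical, generated to lie exactly on a line, and are not data from this study. Following the Methods, the G allele is taken as VIC-labeled and the A allele as FAM-labeled.

```python
import math

# Mixing ratios from the Methods, written as (VIC allele, FAM allele)
MIX_RATIOS = [(4, 1), (2, 1), (1.5, 1), (1.4, 1), (1.3, 1), (1.2, 1), (1.1, 1),
              (1, 1), (1, 1.1), (1, 1.2), (1, 1.3), (1, 1.4), (1, 1.5), (1, 2), (1, 4)]


def fit_standard_curve(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def interpolate_allele_ratio(fam_vic_intensity, a, b):
    """Convert an observed FAM/VIC intensity ratio into a FAM/VIC allele ratio."""
    x = (math.log10(fam_vic_intensity) - a) / b
    return 10 ** x


# x: log10 of the FAM/VIC mixing ratio for each genomic DNA standard
xs = [math.log10(fam / vic) for vic, fam in MIX_RATIOS]
# Hypothetical end-point intensity ratios lying exactly on y = 0.05 + 0.9x
ys = [0.05 + 0.9 * x for x in xs]
a, b = fit_standard_curve(xs, ys)

# Hypothetical cDNA reading implying 1.5-fold more A (FAM) than G (VIC) transcript
observed = 10 ** (0.05 + 0.9 * math.log10(1.5))
print(round(interpolate_allele_ratio(observed, a, b), 3))  # prints 1.5
```

Because genomic DNA standards carry the two alleles in known proportions, the fitted curve absorbs any difference in dye brightness or probe efficiency, so an interpolated ratio above 1 indicates relative overexpression of the A (FAM) allele rather than a labeling artifact.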

Statistical and bioinformatic analysis

Statistical analyses were performed using Stata v10 (StataCorp, College Station, TX, USA). Deviation of genotype frequencies in the controls from those expected under Hardy–Weinberg equilibrium was assessed by χ²-test. Unconditional logistic regression was used to calculate the per-allele odds ratio of CRC and associated 95% confidence intervals for each SNP.
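The two tests named above can be illustrated with a minimal, dependency-free Python sketch. It is not the Stata code used in the study. The Hardy–Weinberg test is the standard 1-df χ² comparison of observed and expected genotype counts; the per-allele odds ratio is obtained by fitting the log-additive logistic model logit P(case) = α + βg (g = number of risk alleles) to grouped genotype counts by Newton–Raphson, with a Wald 95% confidence interval. Function names and inputs are hypothetical, and the model includes no covariates.

```python
import math


def hwe_chi2(n_hom_ref, n_het, n_hom_alt):
    """Chi-square test (1 df) for deviation from Hardy-Weinberg equilibrium."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def per_allele_or(cases, controls, max_iter=50):
    """Per-allele (log-additive) odds ratio with a Wald 95% CI.

    cases, controls: counts for genotypes carrying 0, 1 and 2 risk alleles.
    Fits logit P(case) = alpha + beta * g by Newton-Raphson on grouped data.
    """
    alpha = beta = 0.0
    for _ in range(max_iter):
        u_a = u_b = 0.0            # score vector
        i_aa = i_ab = i_bb = 0.0   # Fisher information matrix
        for g, (c, k) in enumerate(zip(cases, controls)):
            n = c + k
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * g)))
            w = n * p * (1 - p)
            u_a += c - n * p
            u_b += (c - n * p) * g
            i_aa += w
            i_ab += w * g
            i_bb += w * g * g
        det = i_aa * i_bb - i_ab * i_ab
        step_a = (i_bb * u_a - i_ab * u_b) / det
        step_b = (i_aa * u_b - i_ab * u_a) / det
        alpha += step_a
        beta += step_b
        if abs(step_a) + abs(step_b) < 1e-10:
            break
    # SE of beta from the information of the last iteration (effectively at convergence)
    se = math.sqrt(i_aa / det)
    return math.exp(beta), (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))


chi2, p_hwe = hwe_chi2(30, 40, 30)          # hypothetical control genotype counts
odds_ratio, ci = per_allele_or([50, 30, 0], [60, 20, 0])
print(round(chi2, 2), round(p_hwe, 4), round(odds_ratio, 3))
```

When only two genotype classes are observed, the log-additive fit reduces to the familiar 2×2 cross-product odds ratio (here 30 × 60 / (50 × 20) = 1.8), which provides a simple check on the implementation.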

Haplotype analysis was performed using the Haploview software (v4.1), and haplotypes were tested for association by likelihood ratio test. LD metrics were also calculated using Haploview (v4.1). Imputation of untyped SNPs in the case–control data was performed with MACH 1.0, using phased reference haplotypes from HapMap phase II (January 2007, NCBI B35 assembly, dbSNP build 125) and the SNP discovery panel. To predict TF-binding sites, we used the EEL (Palin et al., 2006) and TFSEARCH (Heinemeyer et al., 1998) programs.


Electronic References

dcode ECR Browser: http://ecrbrowser.dcode.org/

UCSC Genome Browser: http://genome.ucsc.edu/

Wellcome Trust Sanger Institute, CGP LOH and copy-number analysis: http://www.sanger.ac.uk/cgi-bin/CGP/cghviewer/


Conflict of interest

The authors declare no conflict of interest.



  1. Aaltonen L, Johns L, Jarvinen H, Mecklin JP, Houlston R. (2007). Explaining the familial colorectal cancer risk associated with mismatch repair (MMR)-deficient and MMR-stable tumors. Clin Cancer Res 13: 356–361.
  2. Abdel-Rahman WM, Katsura K, Rens W, Gorman PA, Sheer D, Bicknell D et al. (2001). Spectral karyotyping suggests additional subsets of colorectal cancers characterized by pattern of chromosome rearrangement. Proc Natl Acad Sci USA 98: 2538–2543.
  3. Broderick P, Carvajal-Carmona L, Pittman AM, Webb E, Howarth K, Rowan A et al. (2007). A genome-wide association study shows that common alleles of SMAD7 influence colorectal cancer risk. Nat Genet 39: 1315–1317.
  4. Deng H, Makizumi R, Ravikumar TS, Dong H, Yang W, Yang WL. (2007). Bone morphogenetic protein-4 is overexpressed in colonic adenocarcinomas and promotes migration and invasion of HCT116 cells. Exp Cell Res 313: 1033–1044.
  5. Eisen T, Matakidou A, Houlston R. (2008). Identification of low penetrance alleles for lung cancer: the GEnetic Lung CAncer Predisposition Study (GELCAPS). BMC Cancer 8: 244.
  6. Fried M, Crothers DM. (1981). Equilibria and kinetics of lac repressor–operator interactions by polyacrylamide gel electrophoresis. Nucleic Acids Res 9: 6505–6525.
  7. Gaasenbeek M, Howarth K, Rowan AJ, Gorman PA, Jones A, Chaplin T et al. (2006). Combined array–comparative genomic hybridization and single-nucleotide polymorphism–loss of heterozygosity analysis reveals complex changes and multiple forms of chromosomal instability in colorectal cancers. Cancer Res 66: 3471–3479.
  8. Garner MM, Revzin A. (1981). A gel electrophoresis method for quantifying the binding of proteins to specific DNA regions: application to components of the Escherichia coli lactose operon regulatory system. Nucleic Acids Res 9: 3047–3060.
  9. Gomez-Skarmeta JL, Lenhard B, Becker TS. (2006). New technologies, new findings, and new concepts in the study of vertebrate cis-regulatory sequences. Dev Dyn 235: 870–885.
  10. He XC, Zhang J, Tong WG, Tawfik O, Ross J, Scoville DH et al. (2004). BMP signaling inhibits intestinal stem cell self-renewal through suppression of Wnt–beta-catenin signaling. Nat Genet 36: 1117–1121.
  11. Heinemeyer T, Wingender E, Reuter I, Hermjakob H, Kel AE, Kel OV et al. (1998). Databases on transcriptional regulation: TRANSFAC, TRRD and COMPEL. Nucleic Acids Res 26: 362–367.
  12. Houlston RS, Cheadle J, Dobbins SE, Tenesa A, Jones AM, Howarth K et al. (2010). Meta-analysis of three genome-wide association studies identifies susceptibility loci for colorectal cancer at 1q41, 3q26.2, 12q13.13 and 20q13.33. Nat Genet 42: 973–977.
  13. Houlston RS, Webb E, Broderick P, Pittman AM, Di Bernardo MC, Lubbe S et al. (2008). Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer. Nat Genet 40: 1426–1435.
  14. Jaeger E, Webb E, Howarth K, Carvajal-Carmona L, Rowan A, Broderick P et al. (2008). Common genetic variants at the CRAC1 (HMPS) locus on chromosome 15q13.3 influence colorectal cancer risk. Nat Genet 40: 26–28.
  15. Kawai K, Viars C, Arden K, Tarin D, Urquidi V, Goodison S. (2002). Comprehensive karyotyping of the HT-29 colon adenocarcinoma cell line. Genes Chromosomes Cancer 34: 1–8.
  16. Kosinski C, Li VS, Chan AS, Zhang J, Ho C, Tsui WY et al. (2007). Gene expression patterns of human colon tops and basal crypts and BMP antagonists as intestinal stem cell niche factors. Proc Natl Acad Sci USA 104: 15418–15423.
  17. Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M et al. (2000). Environmental and heritable factors in the causation of cancer—analyses of cohorts of twins from Sweden, Denmark, and Finland. N Engl J Med 343: 78–85.
  18. Liu C, Calogero A, Ragona G, Adamson E, Mercola D. (1996). EGR-1, the reluctant suppression factor: EGR-1 is known to function in the regulation of growth, differentiation, and also has significant tumor suppressor activity and a mechanism involving the induction of TGF-beta1 is postulated to account for this suppressor activity. Crit Rev Oncog 7: 101–125.
  19. Lo HS, Wang Z, Hu Y, Yang HH, Gere S, Buetow KH et al. (2003). Allelic variation in gene expression is common in the human genome. Genome Res 13: 1855–1862.
  20. Lubbe SJ, Webb EL, Chandler IP, Houlston RS. (2009). Implications of familial colorectal cancer risk profiles and microsatellite instability status. J Clin Oncol 27: 2238–2244.
  21. Milani L, Gupta M, Andersen M, Dhar S, Fryknas M, Isaksson A et al. (2007). Allelic imbalance in gene expression as a guide to cis-acting regulatory single nucleotide polymorphisms in cancer cells. Nucleic Acids Res 35: e34.
  22. Palin K, Taipale J, Ukkonen E. (2006). Locating potential enhancer elements by comparative genomics using the EEL software. Nat Protoc 1: 368–374.
  23. Penegar S, Wood W, Lubbe S, Chandler I, Broderick P, Papaemmanuil E et al. (2007). National study of colorectal cancer genetics. Br J Cancer 97: 1305–1309.
  24. Pittman AM, Naranjo S, Jalava SE, Twiss P, Ma Y, Olver B et al. (2010). Allelic variation at the 8q23.3 colorectal cancer risk locus functions as a cis-acting regulator of EIF3H. PLoS Genet 6: e1001126.
  25. Pittman AM, Naranjo S, Webb E, Broderick P, Lips EH, van Wezel T et al. (2009). The colorectal cancer risk at 18q21 is caused by a novel variant altering SMAD7 expression. Genome Res 19: 987–993.
  26. Polakis P. (2000). Wnt signaling and cancer. Genes Dev 14: 1837–1851.
  27. Pomerantz MM, Ahmadiyeh N, Jia L, Herman P, Verzi MP, Doddapaneni H et al. (2009). The 8q24 cancer risk variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nat Genet 41: 882–884.
  28. Pregizer S, Mortlock DP. (2009). Control of BMP gene expression by long-range regulatory elements. Cytokine Growth Factor Rev 20: 509–515.
  29. Takash W, Canizares J, Bonneaud N, Poulat F, Mattei MG, Jay P et al. (2001). SOX7 transcription factor: sequence, chromosomal localisation, expression, transactivation and interference with Wnt signalling. Nucleic Acids Res 29: 4274–4283.
  30. Tenesa A, Farrington SM, Prendergast JG, Porteous ME, Walker M, Haq N et al. (2008). Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nat Genet 40: 631–637.
  31. Tomlinson I, Carvajal-Carmona L, Dobbins SE, Tenesa A, Jones A, Howarth K et al. (2011). Multiple common susceptibility variants near BMP pathway loci GREM1, BMP4 and BMP2 explain part of the missing heritability of colorectal cancer. PLoS Genet 7: e1002105.
  32. Tomlinson I, Webb E, Carvajal-Carmona L, Broderick P, Kemp Z, Spain S et al. (2007). A genome-wide association scan of tag SNPs identifies a susceptibility variant for colorectal cancer at 8q24.21. Nat Genet 39: 984–988.
  33. Tomlinson IP, Webb E, Carvajal-Carmona L, Broderick P, Howarth K, Pittman AM et al. (2008). A genome-wide association study identifies colorectal cancer susceptibility loci on chromosomes 10p14 and 8q23.3. Nat Genet 40: 623–630.
  34. Tsushimi T, Noshima S, Oga A, Esato K, Sasaki K. (2001). DNA amplification and chromosomal translocations are accompanied by chromosomal instability: analysis of seven human colon cancer cell lines by comparative genomic hybridization and spectral karyotyping. Cancer Genet Cytogenet 126: 34–38.
  35. Tuupanen S, Turunen M, Lehtonen R, Hallikas O, Vanharanta S, Kivioja T et al. (2009). The common colorectal cancer predisposition SNP rs6983267 at chromosome 8q24 confers potential to enhanced Wnt signaling. Nat Genet 41: 885–890.
  36. Zanke BW, Greenwood CM, Rangrej J, Kustra R, Tenesa A, Farrington SM et al. (2007). Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nat Genet 39: 989–994.


We acknowledge National Health Service (NHS) funding to the National Institute for Health Research (NIHR) Biomedical Research Centre. Cancer Research UK (C1298/A8362, supported by the Bobby Moore Fund) provided principal funding for the study. JLG-S acknowledges grants from the Spanish Ministry of Education and Science (BFU2010-14839 and CSD2007-00008) and the Junta de Andalucía (CVI-3488). SJL is in receipt of a PhD studentship from Cancer Research UK. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant no. 258236, FP7 collaborative project SYSCOL. Finally, we are grateful to all patients and individuals for their participation.

Supplementary Information accompanies the paper on the Oncogene website.