Erratum: A new GWAS and meta-analysis with 1000Genomes imputation identifies novel risk variants for colorectal cancer

Al-Tassan, Nada A., Whiffin, Nicola, Hosking, Fay J., Palles, Claire, Farrington, Susan M., Dobbins, Sara E., Harris, Rebecca, Gorman, Maggie, Tenesa, Albert, Meyer, Brian F., Wakil, Salma M., Kinnersley, Ben, Campbell, Harry, Martin, Lynn, Smith, Christopher G., Idziaszczyk, Shelley Alexis, Barclay, Ella, Maughan, Timothy Stanley, Kaplan, Richard, Kerr, Rachel, Kerr, David, Buchannan, Daniel D., Ko Win, Aung, Hopper, John, Jenkins, Mark, Lindor, Noralane M., Newcomb, Polly A., Gallinger, Steve, Conti, David, Schumacher, Fred, Casey, Graham, Dunlop, Malcolm G., Tomlinson, Ian P., Cheadle, Jeremy Peter and Houlston, Richard S. 2015. A new GWAS and meta-analysis with 1000Genomes imputation identifies novel risk variants for colorectal cancer. Scientific Reports 5, 10442. 10.1038/srep10442


Ethics statement
Collection of blood samples and clinico-pathological information from all subjects was undertaken with informed consent and ethical review board approval in accordance with the tenets of the Declaration of Helsinki.

Subjects and datasets
In all cases CRC was defined according to the 9th revision of the International Classification of Diseases (ICD) by codes 153-154.

COIN
The COIN GWAS was based on 2,244 CRC cases (64% male, mean age 61 years, SD=10) ascertained through two independent Medical Research Council clinical trials of advanced/metastatic CRC: COIN and COIN-B (9). COIN patients were randomised 1:1:1 to receive continuous oxaliplatin and fluoropyrimidine chemotherapy, continuous chemotherapy plus cetuximab, or intermittent chemotherapy. COIN-B patients were randomised 1:1 to receive intermittent chemotherapy and cetuximab or intermittent chemotherapy and continuous cetuximab. All patients gave informed consent for their samples to be used for bowel cancer research (approved by REC [04/MRE06/60]).
DNA was extracted from EDTA-venous blood samples using conventional methods and quantified using Nanodrop spectrophotometry (Thermo Scientific, MA, USA). Cases were genotyped using Affymetrix Axiom Arrays according to the manufacturer's recommendations (Affymetrix, Santa Clara, CA 95051, USA) at the King Faisal Specialist Hospital and Research Center, Saudi Arabia (under IRB approval 2110033). Genotyping quality control was tested using duplicate DNA samples, together with direct sequencing of significantly associated SNPs in a subset of samples to confirm genotyping accuracy. For all SNPs, >99% concordant results were obtained.
For controls, we made use of publicly accessible Affymetrix 6.0 array data generated by the Wellcome Trust Case Control Consortium 2 (WTCCC2) on 2,674 individuals from the UK Blood Service Control Group. We excluded individuals from analysis if they failed one or more of the following thresholds: overall successfully genotyped SNPs < 95% (n = 122), discordant sex information (n = 8), classed as out of bounds by Affymetrix (n = 30), duplication or cryptic relatedness (identity by descent >0.185, n = 4), and evidence of non-[...]
Meta-analyses were carried out using META v2.4-1, using the genotype probabilities from IMPUTEv2 where a SNP was not directly typed. We calculated Cochran's Q statistic to test for heterogeneity and the I² statistic to quantify the proportion of the total variation that was caused by heterogeneity (35). I² values ≥75% are considered characteristic of large heterogeneity (35-37). Associations by sex, age and clinico-pathological phenotypes were examined by logistic regression in case-only analyses. LD blocks were defined on the basis of HapMap recombination rate (cM/Mb), as defined using the Oxford recombination hotspots, and on the basis of the distribution of CIs defined by Gabriel et al. (37).
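The heterogeneity statistics described above can be illustrated with a short sketch. It assumes a fixed-effects inverse-variance model applied to per-study log odds ratios and their standard errors; it is not the META v2.4-1 implementation, and the function name and example numbers are hypothetical.

```python
import math


def heterogeneity(betas, ses):
    """Pool per-study log-ORs and return (pooled beta, pooled SE, Cochran's Q, I2 in %).

    Assumption: fixed-effects inverse-variance weighting; this is a sketch of
    the statistics, not a reproduction of the META software.
    """
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need matching estimates from at least two studies")
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I2: share of total variation beyond what chance alone would produce, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, q, i2


# Hypothetical example: two studies with log-ORs 0.1 and 0.3, both with SE 0.1.
pooled, pooled_se, q, i2 = heterogeneity([0.1, 0.3], [0.1, 0.1])
print(f"pooled OR = {math.exp(pooled):.3f}, Q = {q:.2f}, I2 = {i2:.0f}%")
```

Under the threshold quoted in the text, an I² of 75% or more from such a calculation would be read as large heterogeneity.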
The familial relative risk of CRC attributable to a variant was calculated using the formula (38):
$$\lambda^* = \frac{p(pr_2 + qr_1)^2 + q(pr_1 + q)^2}{\left[p^2 r_2 + 2pq r_1 + q^2\right]^2}$$
where p is the population frequency of the minor allele, q = 1 − p, and r₁ and r₂ are the relative risks (approximated by ORs) for heterozygotes and the rarer homozygotes relative to the more common homozygotes, respectively. From λ*, it is possible to quantify the influence of the locus on the overall familial risk of CRC in first-degree relatives of CRC patients. Assuming a multiplicative interaction between risk alleles, the proportion of the overall familial risk attributable to the locus is given by log(λ*)/log(λ₀), where λ₀, the overall familial risk of CRC shown in epidemiological studies, is 2.2 (39).
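As a worked illustration of this calculation, the sketch below assumes the commonly used single-locus expression λ* = [p(pr₂ + qr₁)² + q(pr₁ + q)²] / [p²r₂ + 2pqr₁ + q²]² together with the ratio log(λ*)/log(λ₀) and λ₀ = 2.2 from the text. The function names and the example allele frequency and odds ratios are hypothetical.

```python
import math


def familial_relative_risk(p, r1, r2):
    """Familial relative risk (lambda*) attributable to one biallelic locus.

    p  : minor allele frequency
    r1 : relative risk for heterozygotes (approximated by the OR)
    r2 : relative risk for rarer homozygotes (approximated by the OR)
    Assumption: the standard single-locus formula stated in the lead-in.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = 1.0 - p
    numerator = p * (p * r2 + q * r1) ** 2 + q * (p * r1 + q) ** 2
    denominator = (p ** 2 * r2 + 2 * p * q * r1 + q ** 2) ** 2
    return numerator / denominator


def proportion_of_familial_risk(lam_star, lam_0=2.2):
    """Share of the overall familial risk explained, assuming multiplicative loci."""
    return math.log(lam_star) / math.log(lam_0)


# Hypothetical variant: MAF 0.30, per-allele OR 1.10 (so r2 = r1 squared).
lam = familial_relative_risk(0.30, 1.10, 1.10 ** 2)
print(f"lambda* = {lam:.5f}, proportion = {proportion_of_familial_risk(lam):.4%}")
```

A variant with no effect (r₁ = r₂ = 1) gives λ* = 1 and therefore a proportion of zero, which is a quick sanity check on the formula.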
To explore epigenetic profiles of association signals, we used ChromHMM (40). States were inferred from ENCODE histone modification data on the CRC cell line HCT116 (DNase, H3K4me3, H3K4me1, H3K27ac, Pol2 and CTCF), binarized and modelled using a multivariate Hidden Markov Model.
To examine whether any of the SNPs or their proxies (i.e. r² > 0.8 in the 1000 Genomes CEU reference panel) annotate putative transcription factor binding/enhancer elements, we used the CADD (combined annotation dependent depletion) webserver (16), which integrates information from the Ensembl Variant Effect Predictor (VEP) (41) and ENCODE (42). We assessed sequence conservation using PhastCons (score >0.3 indicative of conservation) and Genomic Evolutionary Rate Profiling (GERP). GERP scores (−12 to 6, with 6 being indicative of complete conservation) reflect the proportion of substitutions at that site rejected by selection compared with the substitutions expected under a neutral evolutionary model, based on sequence alignment of 34 mammalian species (43). We also derived CADD scores to assess functionality of non-coding changes (CADD score >10.0 deemed to be deleterious).

Relationship between SNP genotype and mRNA expression.
To examine the relationship between SNP genotype and mRNA expression, we made use of The Cancer Genome Atlas (TCGA) RNA-seq expression and Affymetrix 6.0 SNP data (dbGaP accession number: phs000178.v7.p6) on 223 colon adenocarcinoma (COAD) and 75 rectal adenocarcinoma samples, using a best proxy where SNPs were not represented directly. Association between normalised RNA counts per gene and SNP genotype was quantified using the Kruskal-Wallis trend test.
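The genotype-expression comparison can be sketched as follows. This is the standard Kruskal-Wallis H statistic with tie correction, not a trend-specific variant and not the authors' code; it assumes three genotype groups, for which the chi-square approximation with 2 degrees of freedom reduces to p = exp(−H/2). Function names and the toy expression values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of non-empty groups of numbers."""
    values = [v for g in groups for v in g]
    n = len(values)
    ranks = average_ranks(values)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    correction = 1.0 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0.0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


def p_value_three_groups(h):
    """Chi-square survival function for 2 degrees of freedom (three groups only)."""
    return math.exp(-h / 2.0)


# Toy normalised counts for genotypes AA, AB and BB (illustrative values only).
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H = {h:.2f}, p = {p_value_three_groups(h):.4f}")
```

With small genotype groups, as is common for rare homozygotes, the chi-square approximation is rough and an exact or permutation p-value may be preferable.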

Mutation frequency.
The frequency of somatic mutations in CRC was obtained using the cBioPortal for Cancer Genomics and TumorPortal web servers.

Pathway analysis
To determine whether any genes mapping to the three newly identified regions act in pathways already over-represented in GWAS regions we utilized the NCI pathway interaction database (http://pid.nci.nih.gov/index.shtml). All genes within the LD block containing each tagSNP, or linked to the SNP through functional experiments (MYC) were submitted as a Batch query using the NCI-Nature curated data source.

Assignment of microsatellite instability (MSI), KRAS, NRAS and BRAF status in cancers
Tumour MSI status in CRCs was determined using the mononucleotide microsatellite loci BAT25 and BAT26, which are highly sensitive MSI markers. Briefly, 10 μm sections were cut from formalin-fixed paraffin-embedded CRC tumours, lightly stained with toluidine blue, and regions containing at least 60% tumour were microdissected. Tumour DNA was extracted using the QIAamp DNA Mini kit (Qiagen, Crawley, UK) according to the manufacturer's instructions and genotyped for the BAT25 and BAT26 loci using 32P-labelled oligonucleotide primers or FAM-labelled BAT26 and HEX-labelled BAT25 primers, with visualisation on an ABI 3100 (Life Technologies, CA, USA). Samples showing five or more novel alleles, when compared with normal DNA, at either or both markers were assigned as MSI-H (corresponding to MSI-high) (44).
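The assignment rule in the last sentence can be written as a small predicate. This is a sketch only: the function and parameter names are hypothetical, and the text does not say how samples below the threshold were labelled, so the function returns a plain boolean rather than a named non-MSI-H class.

```python
def is_msi_high(novel_alleles_bat25, novel_alleles_bat26, threshold=5):
    """Return True when either marker shows at least `threshold` novel alleles.

    Counts are novel alleles in tumour DNA relative to matched normal DNA.
    The default threshold of 5 follows the rule stated in the text.
    """
    if novel_alleles_bat25 < 0 or novel_alleles_bat26 < 0:
        raise ValueError("allele counts cannot be negative")
    return novel_alleles_bat25 >= threshold or novel_alleles_bat26 >= threshold


# Illustrative calls with made-up counts.
print(is_msi_high(0, 6))  # one marker at or above the threshold
print(is_msi_high(4, 4))  # neither marker reaches the threshold
```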

SUPPLEMENTARY TABLE AND FIGURE LEGENDS
Supplementary
Samples were excluded due to call rate (<95% or failed genotyping), ethnicity (principal components analysis or other samples reported not to be of white European descent), relatedness (any individuals found to be duplicated or related within or between data sets through IBS), sex discrepancies, or other reasons (cases found to carry a previously reported susceptibility allele, controls with a first-degree relative with CRC, low concordance of genotyping in duplicates, or samples subsequently withdrawn from a study).

Supplementary
For a single allele (i) of frequency p, relative risk R and log risk r, the variance (V_i) of the risk distribution due to that allele is given by
$$V_i = (1-p)^2 E^2 + 2p(1-p)(r-E)^2 + p^2(2r-E)^2$$
where E is the expected value of r, given by
$$E = 2p(1-p)r + 2p^2 r$$
For multiple risk alleles the distribution of risk in the population tends towards the normal with variance
$$V = \sum_i V_i$$
The total genetic variance (V) for all susceptibility alleles has been estimated to be √2.2. Thus the fraction of the genetic risk explained by a single allele is given by
$$V_i / V$$

(A): Previously published risk loci discovered in European populations. Shown for each region are the GWAS tagSNP, the most associated variant in the imputation, the odds ratio and P-value associated with each, and the linkage disequilibrium metrics between the SNPs. Imputation was not carried out on the X chromosome, so this locus was not included. Genes with an asterisk (*) were also found to be significant in East Asian populations.