The COGS project brought together four consortia whose design of the custom iCOGS genotyping array facilitated taking variants from suggestive associations to robust identification in large replication studies. The dense genomic coverage on this array, combined with genotyping across studies for multiple cancers, allowed for the identification of overlapping susceptibility regions suggesting shared mechanisms. The demonstrated benefits of using the iCOGS array have inspired the design of a next-generation cancer genotyping platform to identify risk variants for five of the most prevalent cancers.
The COGS project was designed to improve understanding of genetic susceptibility to three hormone-related cancers: breast, ovarian and prostate cancers. To this end, the consortium worked together with Illumina to design a custom iSelect SNP genotyping array (the iCOGS array) suited for genotyping in large case-control studies for these three cancers. The project goals included the identification of common variants contributing to susceptibility to each of these cancers, as well as variants associated with several relevant subtypes of disease and disease outcome. The major strategy included replication of genome-wide association study (GWAS)-identified associations, and secondary studies included dense genotyping of SNPs for the fine mapping of associated regions.
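COGS is a European Union (EU)-funded collaboration between four consortia: the Breast Cancer Association Consortium (BCAC), the Ovarian Cancer Association Consortium (OCAC), the Prostate Cancer Association Group to Investigate Cancer Associated Alterations in the Genome (PRACTICAL) and the Consortium of Investigators of Modifiers of BRCA1/2 (CIMBA). These consortia have been active since 2005 and have been central to the identification and characterization of susceptibility loci for these cancers.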
COGS study design and development of the iCOGS array
The COGS project brought together four consortia to undertake a detailed investigation of the genetics of these hormone-related cancers. Its specific aims are: to identify genetic variants associated with susceptibility to breast, ovarian and prostate cancers; to determine the risks associated with these variants; to evaluate the interactions between genetic and lifestyle risk factors in determining risk; to identify variants associated with subtypes of disease and clinical outcome; to develop risk models for these cancers; and to investigate the cost effectiveness of using such risk models in prevention strategies and the associated organizational, ethical, legal and social implications.
The COGS project had initially planned for the evaluation of common variation contributing to these three cancers within a three-stage design, including (i) combining GWAS data for each cancer to identify potential risk-associated loci, (ii) genotyping 1,536 SNPs using the Illumina GoldenGate assays in case-control studies for each cancer, and (iii) genotyping at least 50 SNPs for each cancer in the remaining cases and controls. For this project, Illumina developed custom iSelect arrays, allowing the genotyping of over 200,000 SNPs on a single array in 12-sample format. This approach was similar to that used by other disease study consortia to develop the MetaboChip and subsequently the ImmunoChip (Nat. Genet. 43, 1193–1201, 2011).
However, during this time, there was a fundamental redesign of the project, owing to technological developments as well as the acquisition of additional funding. Taking into account these considerations, the COGS collaborators decided that it would be more cost-effective to design a single array relevant to all three cancers, the iCOGS array. Additional funding from several other sources, notably Cancer Research UK and the US National Institutes of Health, meant that genotyping of over 150,000 samples would be possible. This allowed the majority of samples in BCAC, OCAC, PRACTICAL and CIMBA to be genotyped on this single iCOGS array. In addition, the ability to include a much larger number of SNPs on the iCOGS array meant that it would be possible not only to replicate a much larger number of suggestive associations from GWAS but also to investigate a wider variety of phenotypes. The COGS collaborators also opted to investigate associations for several disease subtypes, including estrogen receptor (ER)-negative disease, aggressive prostate cancer and serous ovarian cancer. Also included were SNPs showing suggestive evidence for association with endpoints of disease survival. In addition, it was possible to include comprehensive coverage of SNPs in regions known to be associated with these cancers and additional regions of general interest. The array also included candidate SNPs proposed through the consortia (including rarer variants that would not be captured through GWAS), SNPs associated with other cancers and SNPs associated with relevant quantitative traits such as body mass index (BMI) and age at menarche.
SNP selection for the iCOGS array
The iCOGS array is an Illumina Custom Infinium array including over 200,000 SNPs. Further details on SNP selection for inclusion on the array and the contributions from each of the consortia are available on the iCOGS project website.
Sakoda, L.C., Jorgenson, E. & Witte, J.S.
Investigators of each consortium selected SNPs for inclusion on the iCOGS array, in particular, markers (i) associated with cancer susceptibility or survival in previous GWAS, including those specific to particular subtypes (for example, aggressive prostate cancer) and subgroups (for example, BRCA1 and BRCA2 mutation carriers); (ii) for fine mapping genomic regions of interest to each cancer and across cancers (for example, 8q24 region, TERT, CDKN2A–CDKN2B and ESR1); (iii) associated with cancer-related quantitative traits (for example, age at menarche and mammographic density); (iv) in selected candidate genes or pathways; and (v) associated with other cancers (for example, lung, endometrial, melanoma or testicular). These SNPs were classified into one of three categories: GWAS replication, fine-mapping and candidate SNPs. Space on the iCOGS array was shared among the consortia, with approximate initial allocations of 25% each to BCAC, OCAC and PRACTICAL, 17.5% to CIMBA and 7.5% to markers of mutual interest.
The iCOGS SNPs were selected to enhance SNP genotyping success, with the majority having Illumina design scores of ≥ 0.8. SNPs were chosen preferentially in the following order: (i) SNPs previously genotyped by Illumina (with design scores of 1.1); (ii) SNPs with linkage disequilibrium (LD) of r² = 1 with the index (previously associated) SNP and the best design score; and (iii) SNPs with r² > 0.8 with the index SNP and the best design score. SNPs in strong LD with other selected SNPs (r² > 0.9) were excluded, although, for GWAS-identified SNPs with association P value < 1 × 10⁻⁵, two surrogate SNPs were also included. The final set of SNPs was compiled by first including the selected fine-mapping SNPs, followed by the addition of selected GWAS replication and candidate SNPs. The penultimate list included 220,123 SNPs, and, of these, 211,155 were successfully included on the iCOGS array.
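The preference order described above can be illustrated with a minimal Python sketch. It is not the consortia's actual selection pipeline: the `Candidate` class, the `tier` and `pick_best` helpers and the SNP names are hypothetical, and the sketch assumes that a SNP must have r² > 0.8 with the index SNP to be eligible in any tier. The subsequent exclusion of SNPs in strong LD with other selected SNPs (r² > 0.9) and the addition of surrogate SNPs are not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str             # hypothetical SNP identifier
    design_score: float   # Illumina design score (1.1 = previously genotyped)
    r2_with_index: float  # LD (r^2) with the index SNP


def tier(c: Candidate) -> Optional[int]:
    """Return the preference tier (0 is best), or None if the SNP is not eligible."""
    if c.r2_with_index <= 0.8:
        return None   # assumption: every tier requires r^2 > 0.8 with the index SNP
    if c.design_score >= 1.1:
        return 0      # (i) previously genotyped by Illumina
    if c.r2_with_index >= 1.0:
        return 1      # (ii) perfect LD with the index SNP
    return 2          # (iii) r^2 > 0.8 with the index SNP


def pick_best(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Pick the preferred SNP: lowest tier first, then highest design score."""
    eligible = [c for c in candidates if tier(c) is not None]
    if not eligible:
        return None
    return min(eligible, key=lambda c: (tier(c), -c.design_score, -c.r2_with_index))


# Made-up candidates for a single index SNP.
candidates = [
    Candidate("snp_a", 0.95, 1.0),
    Candidate("snp_b", 1.1, 0.85),
    Candidate("snp_c", 0.99, 0.9),
    Candidate("snp_d", 1.1, 0.5),
]
best = pick_best(candidates)
```

In this made-up example, `snp_b` is preferred because it falls in tier (i), whereas `snp_d` is never eligible despite its design score because its LD with the index SNP is too weak.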
How the iCOGS array has proved useful
The studies in this collection of COGS papers have provided a wealth of data. The clear benefit of using this custom iCOGS array for these studies has been that, by including a large number and high density of SNPs, it has been possible to follow up the results from many GWAS in greater depth, selecting all SNPs showing even quite weak evidence of association and still being able to genotype all these variants in a very large replication study. A second advantage is that, by genotyping with this same array across studies for multiple different cancers, it has been possible to explore in much greater depth than before the overlapping susceptibilities between the different cancers. In particular, because the array includes fine-mapping SNPs for known susceptibility regions for any of the cancers, it has been possible to identify susceptibility variants for multiple cancers in the same region, pointing to shared mechanisms, exemplified by the fine mapping of the TERT and HNF1B regions (Nat. Genet. doi:10.1038/ng.2566, 27 March 2013, Hum. Mol. Genet. doi:10.1093/hmg/ddt086, 27 March 2013 and Shen, H. et al. Nat. Commun. doi:10.1038/ncomms2629, 27 March 2013). A third advantage is that the iCOGS data provide an excellent basis for risk profiling. Because the iCOGS array includes a large panel of SNPs selected from GWAS, it is possible to develop risk profiles based not just on the established risk loci but also using larger panels, including variants with disease associations that fall below traditional genome-wide significance levels.
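As an illustration of the risk-profiling point, the sketch below computes a simple log-additive risk score over panels chosen at different significance thresholds. The text does not specify which risk model is used, so the log-additive form is an assumption, as are the helper names `select_panel` and `risk_score`, the SNP names and all numerical values. The threshold of 5 × 10⁻⁸ is used here as the conventional genome-wide significance level.

```python
import math
from typing import AbstractSet, Mapping, Set


def select_panel(p_values: Mapping[str, float], threshold: float) -> Set[str]:
    """Keep SNPs whose association P value is below the chosen threshold."""
    return {snp for snp, p in p_values.items() if p < threshold}


def risk_score(
    allele_counts: Mapping[str, int],
    log_odds: Mapping[str, float],
    panel: AbstractSet[str],
) -> float:
    """Sum per-allele log odds ratios weighted by risk-allele count (0, 1 or 2)."""
    return sum(log_odds[snp] * allele_counts.get(snp, 0) for snp in panel)


# Made-up association results and one made-up genotype.
p_values = {"snp_a": 2e-9, "snp_b": 3e-6, "snp_c": 0.2}
log_odds = {"snp_a": math.log(1.2), "snp_b": math.log(1.1), "snp_c": math.log(1.01)}
allele_counts = {"snp_a": 2, "snp_b": 1, "snp_c": 2}

strict_panel = select_panel(p_values, 5e-8)    # established loci only
relaxed_panel = select_panel(p_values, 1e-5)   # adds sub-threshold variants

strict_score = risk_score(allele_counts, log_odds, strict_panel)
relaxed_score = risk_score(allele_counts, log_odds, relaxed_panel)
```

Relaxing the threshold adds `snp_b` to the panel, so the relaxed score includes its contribution while the strict score does not.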
Future plans and the OncoChip
Many further analyses of the iCOGS data sets are still ongoing and will be reported in subsequent publications. These include more detailed fine-mapping analyses for the additional susceptibility regions; analyses of variants selected from candidate genes or pathways; analyses of rare variants in susceptibility genes such as BRCA1, BRCA2 and PALB2; analyses of gene-gene and gene-environment interactions; more risk profiling analyses; and analyses of SNPs related to quantitative traits, such as breast density.
The iCOGS array itself has now been retired, but a second-generation chip is being designed. The OncoChip will draw on many of the features of the iCOGS array but will be used in an even larger experiment. It will include approximately 600,000 SNPs, selected for relevance to five cancers: breast, ovarian, prostate, colorectal and lung. In addition to further follow-up of the results of GWAS and iCOGS, the OncoChip will include fine mapping of all the new susceptibility regions identified with the iCOGS array, as well as associated variants identified through whole-genome, whole-exome or targeted sequencing studies.