The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of more than 25,000 cancer genomes at the genomic, epigenomic and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.
The genomes of all cancers accumulate somatic mutations1. These include nucleotide substitutions, small insertions and deletions, chromosomal rearrangements and copy number changes that can affect protein-coding or regulatory components of genes. In addition, cancer genomes usually acquire somatic epigenetic ‘marks’ compared to non-neoplastic tissues from the same organ, notably changes in the methylation status of cytosines at CpG dinucleotides.
A subset of the somatic mutations in cancer cells confers oncogenic properties such as growth advantage, tissue invasion and metastasis, angiogenesis, and evasion of apoptosis2. These are termed ‘driver’ mutations. The identification of driver mutations will provide insights into cancer biology and highlight new drug targets and diagnostic tests. Knowledge of cancer mutations has already led to the development of specific therapies, such as trastuzumab for HER2 (also known as NEU or ERBB2)-positive breast cancers3 and imatinib, which targets BCR-ABL tyrosine kinase for the treatment of chronic myeloid leukaemia4,5. The remaining somatic mutations in cancer genomes that do not contribute to cancer development are called ‘passengers’. These mutations provide insights into the DNA damage and repair processes that have been operative during cancer development, including exogenous environmental exposures6,7. In most cancer genomes, it is anticipated that passenger mutations, as well as germline variants not yet catalogued in polymorphism databases, will substantially outnumber drivers.
Large-scale analyses of genes in tumours have shown that the mutation load in cancer is abundant and heterogeneous8,9,10,11,12,13. Preliminary surveys of cancer genomes have already demonstrated their relevance in identifying new cancer genes that constitute potential therapeutic targets for several types of cancer, including PIK3CA14, BRAF15, NF1 (ref. 10), KDR10, PIK3R1 (ref. 9), and histone methyltransferases and demethylases16,17. These projects have also yielded correlations between cancer mutations and prognosis, such as IDH1 and IDH2 mutations in several types of gliomas13,18. Advances in massively parallel sequencing technology have enabled sequencing of entire cancer genomes19,20,21,22.
Following the launch of comprehensive cancer genome projects in the United Kingdom (Cancer Genome Project)23 and the United States (The Cancer Genome Atlas)24, cancer genome scientists and funding agencies met in Toronto (Canada) in October 2007 to discuss the opportunity to launch an international consortium. Key reasons for its formation were: (1) the scope of the effort is huge; (2) independent cancer genome initiatives could lead to duplication of effort or incomplete studies; (3) lack of standardization across studies could diminish the opportunities to merge and compare data sets; (4) the spectrum of many cancers is known to vary across the world; and (5) an international consortium will accelerate the dissemination of data sets and analytical methods into the user community.
Working groups were created to develop strategies and policies that would form the basis for participation in the ICGC. The goals of the consortium (Box 1) were released in April 2008 (http://www.icgc.org/files/ICGC_April_29_2008.pdf). Since then, working groups and initial member projects have further refined the policies and plans for international collaboration.
ICGC members agreed to a core set of bioethical elements for consent as a precondition of membership (Box 2). The Ethics and Policy Committee has created patient consent templates for both prospective collection and retrospective use of samples and data for ICGC projects. Differences in project-specific requirements and national legal frameworks may require some local amendments, while still reflecting the core principles of ICGC.
The ICGC recognizes a delicate balance between protecting participants’ personal data and sharing these data to accelerate cancer research. Data access policies have been drawn up that respect the rights of the donors, while allowing ICGC data derived from samples to be shared ethically among a wide research community. Two levels of access have been implemented. Data that cannot be used to identify individuals are released as publicly available ‘open access’ data sets. These include data such as gender, age range, histology, normalized gene expression values, epigenetic data sets, somatic mutations, summaries of germline data, and study protocols. ‘Controlled access’ data sets contain germline genomic data and detailed clinical information that are associated with a unique individual whose personal identifiers have been removed. To access controlled data sets, researchers must seek authorization by contacting the Data Access Compliance Office (DACO) (http://www.icgc.org/daco). An independent International Data Access Committee (IDAC) oversees the work of the DACO and provides assistance with resolving issues that arise.
Pathology and clinical annotation
Large-scale genomic studies of human tumours rely on the availability of freshly frozen tumour tissue. To address the paucity of samples that meet ICGC standards, many projects have initiated prospective collections of high-quality source material. Accordingly, the ICGC has recommended procedures that promote consistent sample processing throughout the consortium and ensure quality features such as high tissue integrity and high tumour cell content. Each project will need to include diverse data types, such as environmental exposures, clinical history of participants, tumour histopathology, and clinical outcomes.
Tumours show considerable clinical and biological heterogeneity, which has resulted in a variety of tumour classifications. Within the ICGC, special measures are taken to promote the consistency of diagnosis. These include the coordination of diagnostic criteria among groups investigating related tumours, and a policy that all samples be reviewed by at least two independent reference pathologists. Furthermore, images of the stained tumour sections (or blood smears or cytospins for haematological neoplasias) from which diagnoses were made will be stored and made available to the community.
Although different tumour types may require specific procedures for tumour acquisition or compilation of clinical and environmental data, the ICGC has set guidelines on the use of common definitions and data standards. This will allow ICGC data users to identify correlations between tumour-specific molecular changes and clinical and histopathological data, including prognosis, prediction of therapy response and tumour classification schemes for diagnosis.
Study design and statistical issues
To identify cancer-related genes, one needs to detect genes that are mutated at a higher frequency than the background mutation rate. Given that several driver genes have been found to be mutated at low frequencies, the ICGC will identify somatic mutations observed in at least 3% of tumours of a given subtype. The ICGC determined that 500 samples would be needed per tumour type (although for rare tumour types, a smaller sample size may be justified). In practice, the degree of heterogeneity of a given tumour type is difficult to know in advance, such that some particularly heterogeneous tumour types may require larger sample collections.
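The text does not describe how the 500-sample figure was derived. As a rough illustration of why sample size matters, the sketch below assumes a simple binomial model: each of n tumours carries a mutation in a given gene with probability f (a driver present in 3% of tumours) or p0 (a hypothetical background rate of 0.2% per gene), and a gene is called significant when its mutation count reaches a Bonferroni-corrected threshold over roughly 20,000 genes. The background rate, gene count, test and function names are illustrative assumptions, not ICGC parameters; real analyses also model gene length, sequence context and sample-specific mutation rates.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(i: int, n: int, p: float) -> float:
    """P(X = i) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_coef = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coef + i * log(p) + (n - i) * log1p(-p))


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k), summed directly so that tiny tail probabilities stay accurate."""
    if k <= 0:
        return 1.0
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def critical_count(n: int, p0: float, alpha: float) -> int:
    """Smallest count c such that P(X >= c) <= alpha under the background rate p0."""
    c = 0
    while binom_sf(c, n, p0) > alpha:
        c += 1
    return c


def detection_power(n: int, f: float, p0: float, alpha: float) -> float:
    """Probability that a gene mutated in a fraction f of tumours reaches the critical count."""
    return binom_sf(critical_count(n, p0, alpha), n, f)


if __name__ == "__main__":
    # Illustrative assumptions: ~20,000 genes tested, 0.2% background rate per gene.
    alpha = 0.05 / 20_000
    for n in (100, 250, 500):
        print(n, round(detection_power(n, f=0.03, p0=0.002, alpha=alpha), 3))
```

Under these assumptions the estimated power rises steeply between 100 and 500 tumours. Lowering f, as happens when a driver is confined to one subtype within a heterogeneous tumour type, reduces power at a fixed n, so such tumour types need larger collections.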
Cancer genome analyses
High-quality catalogues of somatic mutations from whole cancer genomes will ultimately be the ICGC standard. Shotgun sequencing using second-generation technologies can detect all classes of somatic mutation implicated in cancer. Moreover, if the level of coverage is sufficient, comprehensive high-quality catalogues of somatic mutations from individual cancer genomes can be acquired with >90% sensitivity and >95% specificity. To achieve this, it will be necessary to sequence the genomes of both the cancer and a normal tissue from the same individual to distinguish somatic mutations from germline variants. Although a few genomes of this standard have already been generated, the cost and continuing technology development mean that interim analyses of particularly informative sectors of the genome will be carried out, for example of all coding exons and microRNAs.
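As a minimal sketch of why a matched normal is needed, the code below compares read counts at candidate sites in a tumour and its matched normal, keeps only variants that are well covered in both, present in the tumour and essentially absent from the normal, and then scores the calls against a validated set. The thresholds, the `SiteCounts` record and all function names are hypothetical; production callers also model sequencing error, tumour purity and ploidy. The passage does not define how sensitivity and specificity were measured, so the sketch reports sensitivity and precision (the fraction of calls that are confirmed), two commonly used metrics.

```python
from dataclasses import dataclass

Variant = tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


@dataclass(frozen=True)
class SiteCounts:
    """Read counts supporting a candidate variant in matched tumour and normal samples."""
    chrom: str
    pos: int
    ref: str
    alt: str
    tumour_alt: int
    tumour_depth: int
    normal_alt: int
    normal_depth: int

    @property
    def key(self) -> Variant:
        return (self.chrom, self.pos, self.ref, self.alt)


def is_somatic(site: SiteCounts, min_depth: int = 10,
               min_tumour_vaf: float = 0.10, max_normal_vaf: float = 0.02) -> bool:
    """Hypothetical thresholds: enough coverage in both samples, present in the
    tumour, essentially absent from the matched normal."""
    if site.tumour_depth < min_depth or site.normal_depth < min_depth:
        return False  # too little coverage to tell somatic from germline
    tumour_vaf = site.tumour_alt / site.tumour_depth
    normal_vaf = site.normal_alt / site.normal_depth
    return tumour_vaf >= min_tumour_vaf and normal_vaf <= max_normal_vaf


def somatic_calls(sites: list[SiteCounts]) -> set[Variant]:
    return {s.key for s in sites if is_somatic(s)}


def evaluate(calls: set[Variant], validated: set[Variant]) -> tuple[float, float]:
    """Return (sensitivity, precision) of the calls against a validated mutation set."""
    true_positives = len(calls & validated)
    sensitivity = true_positives / len(validated) if validated else 0.0
    precision = true_positives / len(calls) if calls else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    sites = [
        SiteCounts("chr1", 1000, "C", "T", 12, 40, 0, 35),   # somatic candidate
        SiteCounts("chr1", 2000, "G", "A", 20, 40, 18, 36),  # germline heterozygote
        SiteCounts("chr2", 3000, "A", "G", 3, 6, 0, 30),     # tumour coverage too low
    ]
    calls = somatic_calls(sites)
    print(calls)  # {('chr1', 1000, 'C', 'T')}
    print(evaluate(calls, {("chr1", 1000, "C", "T"), ("chr3", 4000, "T", "C")}))  # (0.5, 1.0)
```

The germline heterozygote is rejected only because the normal sample shows the same allele; without the matched normal it would be indistinguishable from a somatic mutation.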
For each individual cancer genome, the catalogue of somatic mutations will be supplemented by genome-wide information on the methylation state of CpG dinucleotides. The optimal strategies and technologies to achieve this are not yet clear. Moreover, the genomes of individual cancers will be accompanied, where possible, by analyses of the transcriptome. Although conventional array-based approaches predominate at present, it is preferable for RNA sequencing to become the standard, because sequencing has a greater dynamic range25 and provides further information, including new transcripts and sequence variants26.
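Normalized gene expression values are among the open-access data types listed earlier. One widely used normalization for RNA-seq read counts is RPKM (reads per kilobase of exon model per million mapped reads), introduced in the RNA-Seq study cited as ref. 25. The sketch below is a minimal illustration; the text does not say which normalization ICGC projects use, and the gene names and counts are made up.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon model per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # count / (length in kb * library size in millions) == count * 1e9 / (length * reads)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def rpkm_table(counts: dict[str, int], lengths: dict[str, int],
               total_mapped_reads: int) -> dict[str, float]:
    """Apply rpkm() to every gene in a count table."""
    return {gene: rpkm(n, lengths[gene], total_mapped_reads) for gene, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical genes with equal read counts but different exon-model lengths.
    table = rpkm_table({"GENE_A": 500, "GENE_B": 500},
                       {"GENE_A": 1_000, "GENE_B": 5_000},
                       total_mapped_reads=10_000_000)
    print(table)  # {'GENE_A': 50.0, 'GENE_B': 10.0}
```

Equal raw counts translate into a five-fold difference in RPKM because the longer gene yields more reads per transcript; dividing by library size likewise makes samples sequenced to different depths comparable.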
ICGC data sets
The distributed nature of the consortium coupled with the large size of the data sets makes it cumbersome to store all data in a single centralized repository. For this reason, the ICGC has adopted a ‘franchise’ database model for integrating the information and making it available to the public. Under this model, each member project releases tumour information by copying it into its local franchise database after it has been quality checked. Each franchise database shares a common schema to describe the specimens, the associated clinical information, and their genome characterization data. ICGC primary data files are sent to the National Center for Biotechnology Information (NCBI) and/or the European Bioinformatics Institute (EBI) for archiving, while interpreted data sets, such as somatic mutation calls, are stored in franchise databases. The ICGC franchise databases and web portal use BioMart27, a data federation technology originally developed for use in Ensembl28, and since adopted for use by several model organism and genome databases. The management of the ICGC data flow is the responsibility of the ICGC Data Coordination Center (DCC) located at the Ontario Institute for Cancer Research.
The DCC also operates the ICGC data portal that allows researchers to access both open and controlled access portions of the ICGC data. The portal provides a variety of user interfaces that range from simple gene-oriented queries (‘show me all the non-silent coding mutations identified in PIK3R1 for all cancers’) to queries that integrate genomic, clinical and functional information (‘show me all members of the Toll-receptor pathway having deletions in stage III breast cancer’). These queries will be distributed across the franchise databases in a manner that is invisible to the user. The portal will also provide links to the primary files at the NCBI and EBI, interfaces for generating tabular reports, data dumps in common bioinformatics formats, and other visualizations including genome browser tracks, pathway diagrams and survival curves. The portal is available via a link at http://www.icgc.org.
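The franchise model can be pictured as several local stores that share one schema, each accepting only quality-checked records, with the portal fanning a query out to every store and merging the answers. The sketch below is illustrative only: it does not use BioMart, and the `MutationRecord` fields, class and function names, project labels and consequence categories are all hypothetical. It mimics the first example query above for a single gene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MutationRecord:
    """One somatic mutation in a shared schema (fields are a hypothetical subset)."""
    donor_id: str
    cancer_type: str
    gene: str
    consequence: str  # e.g. "missense", "nonsense", "synonymous"


class FranchiseDatabase:
    """A member project's local store; records are accepted only after a QC check."""

    def __init__(self, project: str):
        self.project = project
        self._records: list[MutationRecord] = []

    def release(self, record: MutationRecord, passed_qc: bool) -> None:
        if not passed_qc:
            raise ValueError(f"{self.project}: record failed quality control")
        self._records.append(record)

    def query(self, gene: str, consequences: set[str]) -> list[MutationRecord]:
        return [r for r in self._records
                if r.gene == gene and r.consequence in consequences]


def portal_query(franchises: list[FranchiseDatabase], gene: str,
                 consequences: set[str]) -> list[tuple[str, MutationRecord]]:
    """Send the same query to every franchise and merge the results, tagging each
    hit with its source project so the user sees one combined answer."""
    results = []
    for db in franchises:
        results.extend((db.project, r) for r in db.query(gene, consequences))
    return results


# Hypothetical set of consequence labels treated as non-silent coding changes.
NON_SILENT = {"missense", "nonsense", "frameshift", "splice_site"}

if __name__ == "__main__":
    project_a = FranchiseDatabase("project-A")
    project_b = FranchiseDatabase("project-B")
    project_a.release(MutationRecord("D1", "breast", "PIK3R1", "missense"), passed_qc=True)
    project_a.release(MutationRecord("D2", "breast", "PIK3R1", "synonymous"), passed_qc=True)
    project_b.release(MutationRecord("D3", "liver", "PIK3R1", "nonsense"), passed_qc=True)
    hits = portal_query([project_a, project_b], "PIK3R1", NON_SILENT)
    print([(p, r.donor_id) for p, r in hits])  # [('project-A', 'D1'), ('project-B', 'D3')]
```

Because every store uses the same record type, the portal can merge results without per-project translation, which is the property the common schema is meant to provide.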
At the time of this publication, the following cancer and reference data sets will be available through the ICGC web portal: (1) initial data releases from ICGC members for breast cancer (UK), liver cancer (Japan), and pancreatic cancer (Australia and Canada); (2) a whole-genome data set of a metastatic melanoma cell line (COLO829)6; (3) open data sets from The Cancer Genome Atlas (TCGA) for glioblastoma multiforme (GBM) and serous cystadenocarcinoma of the ovary (see later); (4) whole-exome somatic mutation data from 68 individuals with breast, colorectal or pancreatic cancer or GBM11,12,13; (5) links to the human reference genome (http://www.genomereference.org/) and gene annotations from the GENCODE project (http://www.sanger.ac.uk/gencode/), which include the CCDS gene set29; (6) links to the single nucleotide polymorphism database (dbSNP)30 and the HapMap31 databases, providing access to common patterns of variation in reference population samples; (7) links to Reactome32, a curated database of human biological pathways; and (8) a set of reference gene models, mirrored from Ensembl28.
The current version of the web portal provides an entry point to the open access data tier by interactive query as well as bulk download of data files. We expect that in mid-2010 both open access and controlled data will be available.
The ICGC recently established a bioinformatics analysis working group to compare pipelines and analytic methods, assess consistency within and among algorithms, and establish guidelines or best practices for the consortium. Over time, significant resources will be deployed to develop strategies to analyse the large, complex data sets generated by ICGC member projects, and to provide value-added views of cancer genomic data by integrating them with other biological and epidemiological data sets.
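One simple way to quantify consistency among algorithms is to compare the mutation calls that different pipelines make on the same tumour. The sketch below assumes calls are represented as (chromosome, position, ref, alt) tuples and computes pairwise Jaccard overlap plus a consensus set supported by a minimum number of pipelines. The working group's actual metrics are not described here; the function names and the pipeline names in the test are placeholders.

```python
from collections import Counter
from itertools import combinations

Variant = tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def jaccard(a: set[Variant], b: set[Variant]) -> float:
    """Calls shared by two pipelines divided by all calls made by either one."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_concordance(calls: dict[str, set[Variant]]) -> dict[tuple[str, str], float]:
    """Jaccard overlap for every pair of pipelines, keyed by sorted pipeline names."""
    return {(x, y): jaccard(calls[x], calls[y]) for x, y in combinations(sorted(calls), 2)}


def consensus(calls: dict[str, set[Variant]], min_support: int) -> set[Variant]:
    """Variants reported by at least `min_support` of the pipelines."""
    support = Counter(v for call_set in calls.values() for v in call_set)
    return {v for v, n in support.items() if n >= min_support}
```

Low pairwise overlap flags pipelines whose calls deserve closer inspection, while the consensus set gives a conservative call set that more than one method supports.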
Data release and intellectual property policies
The data release policies of the ICGC are intended to maximize public benefit while, at the same time, protecting the interests and rights of sample donors and their relatives. Members of the ICGC are committed to the principles of rapid data release (with appropriate controlled access mechanisms), in concordance with the Toronto statement33. ICGC members encourage the scientific community to use, without restriction, any data that target specific genes and mutations. To allow ICGC members the opportunity to be the first to publish global analyses of the data sets they generate, the consortium has also agreed that member projects may specify conditions that include a time limit during which other data users are asked to refrain from publishing global analyses (defined by several ICGC member projects as 100 tumours and matched controls), a provision referred to as a ‘publication moratorium’. To allow time for a data set to be analysed and submitted for publication, ICGC members will have at most one year after released data sets reach the specified threshold before third parties are permitted to submit manuscripts describing global analyses. Further details on data release guidelines for data producers, users and reviewers are available at http://www.icgc.org. Users of ICGC data are expected to respect these terms and to cite this manuscript and the source of pre-publication data, including the version of the data set. In cases of uncertainty, scientists using ICGC data are encouraged to contact the member projects to discuss publication plans.
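The timing rule can be made concrete with a small calculation. Assuming a project uses the 100-tumour threshold mentioned above and releases data in dated batches, the sketch finds the date on which the cumulative release first reaches the threshold and adds one year, giving the latest date on which the moratorium could still apply. Project-specific conditions may set a shorter period; the batch format, the function names and the handling of 29 February are assumptions.

```python
from datetime import date
from typing import Optional


def add_one_year(d: date) -> date:
    """Same calendar date one year later; 29 February maps to 28 February (an assumption)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def moratorium_end(releases: list[tuple[date, int]], threshold: int = 100) -> Optional[date]:
    """Given (release date, tumours released) batches, return one year after the date on
    which cumulative releases first reach the threshold, or None if not yet reached."""
    released = 0
    for release_date, n_tumours in sorted(releases):
        released += n_tumours
        if released >= threshold:
            return add_one_year(release_date)
    return None


if __name__ == "__main__":
    # Hypothetical release batches: 40 + 50 + 30 tumours.
    batches = [(date(2010, 1, 15), 40), (date(2010, 3, 1), 50), (date(2010, 6, 30), 30)]
    print(moratorium_end(batches))  # 2011-06-30
```

The clock starts at the batch that crosses the threshold, not at the first release, so early partial releases do not shorten the window available to the data producers.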
ICGC members believe that maximum public benefit will be achieved if the data remain publicly accessible without patent restrictions, hence no claims to possible intellectual property derived from primary data (including somatic mutations) will be made. Users of ICGC data (including ICGC members) may elect to perform further research and to exercise their intellectual property rights on these downstream discoveries. If this occurs, users are expected to implement licensing policies that do not obstruct further research.
Initial ICGC projects
At present, ten countries and two European consortia have initiated cancer genome projects under the umbrella of the ICGC. The initial projects, listed in Supplementary Table 1, will analyse tumour types found around the globe and throughout the human body, affecting a diversity of organs, including blood, brain, breast, kidney, liver, pancreas, stomach, oral cavity and ovary. Over time, the ICGC will investigate 50 or more types and subtypes of cancer in adults and children. In the case of tumours with several subtypes, analyses should be focused on subtypes that may be defined by pathological, molecular, aetiological or geographical differences. It is expected that some cancer types will be studied in parallel in different parts of the world, as the mutation profiles may differ among populations. The consortium has enabled the coordination of initial projects analysing similar cancers in different countries, and in some cases, the redirection of resources to launch new projects.
The Cancer Genome Atlas
TCGA is a comprehensive program in cancer genomics that is jointly supported and managed by the National Cancer Institute and the National Human Genome Research Institute of the US National Institutes of Health. TCGA began in 2006 as a pilot focused on three projects, glioblastoma multiforme (GBM), serous cystadenocarcinoma of the ovary, and lung squamous carcinoma, and has recently expanded with the aim of producing comprehensive genomic data sets for at least ten other cancers in the next two years. Given TCGA’s contributions to launching the ICGC and its cooperation to ensure that its policies (posted at http://cancergenome.nih.gov) are coordinated with those of the ICGC, TCGA’s participation in the ICGC is considered to be equivalent to that of a full member. TCGA, however, is not able to join the ICGC formally at this time because of technical and legal issues in the US related to the mechanisms for distributing controlled-access data, although such data are directly available to investigators at http://cancergenome.nih.gov/dataportal. The National Institutes of Health (NIH) policies relating to distribution of controlled-access data sets are being reviewed with the intent of enabling researchers to integrate and analyse data across databases, for example, using the franchise model adopted by the ICGC. Meanwhile, TCGA is ensuring that its projects are coordinated and its data sets are compatible with those of the consortium.
ICGC in the next decade
A large proportion of common cancers affecting patients around the world have been or will soon be selected for comprehensive cancer genome studies. Further efforts will be needed to leverage support and expertise to tackle the remaining tumour types, including rare cancers. The challenges of the ICGC are daunting owing to the scope of the initiative, the complexity that is inherent to the heterogeneity of cancer, and the limitations of current technologies to provide accurate long-range assemblies of highly rearranged chromosomes found in tumour cells. These challenges underscore the importance of continued international coordination and further engagement of the scientific community in the next decade.
Moving towards clinical applications
ICGC catalogues, which are expected to grow exponentially, will have immediate relevance to the cancer research community. Early insight into the biology of somatic mutations will come from functional studies in cell-based and animal models of tumours. Mutation screens in retrospective tumour banks linked to registries or clinical trials with substantial clinical data will help establish the potential clinical utility of somatic mutations as biomarkers for prognosis or drug response. Germline variants identified by ICGC projects may allow the discovery of genes predisposing to familial malignancies, such as PALB2 in pancreatic cancer12,34. High-throughput screens of RNA interference or small-molecule libraries, and the adaptation of existing model systems, will have a major role in refining potential therapeutic candidates for further study35.
Translating these discoveries into clinical practice will require more sophisticated clinical trials that account for the increasing subdivision of tumours into phenotypic subtypes, further coordination to identify subjects whose tumours have similar profiles, and increased use of biomarkers, genomic analyses, informatics and other technologies in the clinical development of new therapeutics. Given the tremendous potential for relatively low-cost genomic sequencing to reveal clinically useful information, we anticipate that in the not-so-distant future, partial or full cancer genomes will routinely be sequenced as part of the clinical evaluation of cancer patients and as part of their continuing clinical management. The successful and appropriate translation of cancer genome research into clinical practice will raise important social and ethical questions. It will be essential to combine the expertise of oncologists, biostatisticians, pathologists, geneticists, policy-makers and members of the biopharmaceutical industry to meet this challenge by developing new policies and clinical models that enable rapid translation of many new biomarkers and cancer targets into new clinical tests and therapeutic interventions that will benefit cancer patients.
Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724 (2009)
Hanahan, D. & Weinberg, R. A. The hallmarks of cancer. Cell 100, 57–70 (2000)
Slamon, D. J. et al. Use of chemotherapy plus a monoclonal antibody against HER2 for metastatic breast cancer that overexpresses HER2. N. Engl. J. Med. 344, 783–792 (2001)
Druker, B. J. et al. Efficacy and safety of a specific inhibitor of the BCR-ABL tyrosine kinase in chronic myeloid leukemia. N. Engl. J. Med. 344, 1031–1037 (2001)
Druker, B. J. et al. Activity of a specific inhibitor of the BCR-ABL tyrosine kinase in the blast crisis of chronic myeloid leukemia and acute lymphoblastic leukemia with the Philadelphia chromosome. N. Engl. J. Med. 344, 1038–1042 (2001)
Pleasance, E. D. et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463, 184–190 (2010)
Pleasance, E. D. et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 191–196 (2010)
Greenman, C. et al. Patterns of somatic mutation in human cancer genomes. Nature 446, 153–158 (2007)
Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 (2008)
Ding, L. et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature 455, 1069–1075 (2008)
Wood, L. D. et al. The genomic landscapes of human breast and colorectal cancers. Science 318, 1108–1113 (2007)
Jones, S. et al. Core signaling pathways in human pancreatic cancers revealed by global genomic analyses. Science 321, 1801–1806 (2008)
Parsons, D. W. et al. An integrated genomic analysis of human glioblastoma multiforme. Science 321, 1807–1812 (2008)
Samuels, Y. et al. High frequency of mutations of the PIK3CA gene in human cancers. Science 304, 554 (2004)
Davies, H. et al. Mutations of the BRAF gene in human cancer. Nature 417, 949–954 (2002)
van Haaften, G. et al. Somatic mutations of the histone H3K27 demethylase gene UTX in human cancer. Nature Genet. 41, 521–523 (2009)
Dalgliesh, G. L. et al. Systematic sequencing of renal carcinoma reveals inactivation of histone modifying genes. Nature 463, 360–363 (2010)
Yan, H. et al. IDH1 and IDH2 mutations in gliomas. N. Engl. J. Med. 360, 765–773 (2009)
Ley, T. J. et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 456, 66–72 (2008)
Mardis, E. R. et al. Recurring mutations found by sequencing an acute myeloid leukemia genome. N. Engl. J. Med. 361, 1058–1066 (2009)
Shah, S. P. et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature 461, 809–813 (2009)
Stephens, P. J. et al. Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature 462, 1005–1010 (2009)
Dickson, D. Wellcome funds cancer database. Nature 401, 729 (1999)
Collins, F. S. & Barker, A. D. Mapping the cancer genome. Pinpointing the genes involved in cancer will help chart a new course across the complex landscape of human malignancies. Sci. Am. 296, 50–57 (2007)
Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 5, 621–628 (2008)
Shah, S. P. et al. Mutation of FOXL2 in granulosa-cell tumors of the ovary. N. Engl. J. Med. 360, 2719–2729 (2009)
Haider, S. et al. BioMart Central Portal–unified access to biological data. Nucleic Acids Res. 37, W23–W27 (2009)
Hubbard, T. J. et al. Ensembl 2009. Nucleic Acids Res. 37, D690–D697 (2009)
Pruitt, K. D. et al. The consensus coding sequence (CCDS) project: identifying a common protein-coding gene set for the human and mouse genomes. Genome Res. 19, 1316–1323 (2009)
Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001)
International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861 (2007)
Matthews, L. et al. Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res. 37, D619–D622 (2009)
Toronto International Data Release Workshop Authors. Prepublication data sharing. Nature 461, 168–170 (2009)
Jones, S. et al. Exomic sequencing identifies PALB2 as a pancreatic cancer susceptibility gene. Science 324, 217 (2009)
Chin, L. & Gray, J. W. Translating insights from the cancer genome into clinical practice. Nature 452, 553–563 (2008)
We thank research participants who are generously donating samples and data, as well as physicians and clinical staff contributing to sample annotation and collection. A complete list of organizations that support ICGC projects is in Supplementary Table 1.
Author Contributions: See the list of consortium authors below.
The authors declare no competing financial interests.
A list of participants and their affiliations appears at the end of the paper.