There is a growing tendency to restrict one's research agenda along the lines of Crick's central dogma, such that one has expertise in either genomics, transcriptomics or proteomics. The high degree of intellectual and technical specialization required by emergent technologies may make this stratification necessary. However, a thorough exploration of biological questions requires the integration of data and ideas obtained from varied approaches. This need for a multi-pronged approach to understand the biology of cancer formed the rationale for Oncogenomics 2002*, a meeting co-hosted by Nature Genetics and The American Association for Cancer Research. The meeting provided a forum for discussing the current challenges faced by cancer researchers and the means by which these problems are being surmounted.
Identifying low-risk susceptibility genes.
One of the greatest problems facing geneticists today is the identification of genes underlying disorders of complex inheritance. As only a minority of familial cancers segregate in a mendelian fashion, identifying cancer genes through family studies poses the same difficulties encountered with all polygenic diseases. And, as most cases of cancer occur sporadically, the problem is vastly more difficult. If one then considers all the nongenetic risk factors that may be involved (such as diet and other forms of environmental exposure), the identification of genetic determinants underlying this multifactorial class of disorders represents a herculean task.
With three-quarters of the finished human genome sequence completed, and the completion date of April 2003 confirmed, Francis Collins (National Human Genome Research Institute, NHGRI) spelled out the rationale for generating haplotype maps to facilitate genome-wide association studies. This approach is based on the hypothesis that common ancestral alleles underlie common human diseases, the so-called common disease/common variant hypothesis. Recent studies have revealed the block-like structure of the human genome, in which, Collins estimates, haplotype blocks average 25 kb. Using hap-SNPs, each of which is representative of one haplotype, a back-of-the-envelope calculation suggests that one would need 120,000 markers to provide coverage of the genome. Even so, such an approach would require advances in current genotyping technology and the optimization of DNA pooling techniques so as to allow discrimination of small allelic differences in case–control studies. There is certainly no shortage of nucleotide variation in the human genome, as reported by David Bentley (Sanger Institute), who described a map of 2.2 million SNPs.
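The marker estimate above can be reproduced with a short sketch. The genome size of 3 billion base pairs and the function name `tag_snps_needed` are assumptions introduced here for illustration; only the 25 kb average block length, the 120,000-marker figure and the 2.2 million SNP map come from the talks.

```python
def tag_snps_needed(genome_bp: float, block_bp: float, snps_per_block: int = 1) -> int:
    """Markers needed if each haplotype block is tagged by a fixed number of SNPs."""
    blocks = genome_bp / block_bp
    return round(blocks * snps_per_block)


GENOME_BP = 3.0e9        # assumed genome size; not stated in the report
BLOCK_BP = 25_000        # average haplotype block length quoted by Collins
MAPPED_SNPS = 2_200_000  # size of the SNP map described by Bentley

markers = tag_snps_needed(GENOME_BP, BLOCK_BP)
fraction_of_map = markers / MAPPED_SNPS

print(markers)                   # 120000
print(f"{fraction_of_map:.1%}")  # share of the mapped SNPs that would be typed
```

Under these assumptions, one representative SNP per block would use only a small fraction of the mapped variation. If blocks turned out to be shorter on average, or needed more than one SNP each, the marker count would rise in proportion.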
Bruce Ponder (Cambridge Univ.) has been carrying out candidate gene linkage disequilibrium studies and reported the identification of several genes that confer a small risk for developing cancer. The odds ratios for these genes range between 1.2 and 1.4, which raises the question of their clinical usefulness. Ponder estimates that known cancer genes account for only 1.7% of the population risk, suggesting that the vast majority of disease risk alleles are yet to be determined. To this end, family studies continue to have an important role, as illustrated by John Carpten's (NHGRI) report of the identification of RNASEL mutations in families with multiple cases of prostate cancer. But, as Carpten pointed out, the high sporadic rate of prostate cancer poses a serious hindrance to family studies, as disease occurrence within a single pedigree can be due to both inherited and sporadic genetic defects.
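To give a feel for odds ratios in the range quoted above, the following sketch computes an odds ratio and an approximate 95% confidence interval (Woolf's log-based method) from a case–control table. The counts are hypothetical and chosen only for illustration; they are not data from any study presented at the meeting, and this is not a description of Ponder's analysis.

```python
import math


def odds_ratio_with_ci(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Odds ratio and approximate confidence interval (Woolf's method)."""
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical study: 1,000 cases and 1,000 controls
or_, lower, upper = odds_ratio_with_ci(240, 760, 200, 800)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With these made-up counts the odds ratio is about 1.26, and the interval is wide even with 2,000 participants, which illustrates why detecting alleles of such modest effect calls for large case–control series.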
The use of mouse models can minimize the factors that contribute to a particular type of cancer. Allan Balmain (Univ. of California, San Francisco) described an approach to mapping quantitative trait loci in inbred mouse lines that have varying degrees of resistance to carcinogens. Balmain has generated congenic mouse strains to refine a quantitative trait locus. The orthologous region of the human genome was then tested in a case–control study to identify STK6 as a low-penetrance tumor modifier gene. Allan Bradley (Sanger Institute) described the generation of targeted haploinsufficient mice to test for the existence of tumor-suppressor genes within defined intervals. The extrapolation of this approach to a genome-wide analysis would require a huge number of mouse strains. Thus, his group is using the Blm−/− mouse, which has a high rate of loss of heterozygosity owing to an increased rate of mitotic recombination, as a means of identifying tumor-suppressor genes.
Genomic approaches to identifying human cancer risk alleles will certainly continue to benefit from model organisms. Thus, the rapid progress in the sequencing of the rat genome, as described by Richard Gibbs (Baylor Univ. College of Medicine), and the nearing completion of the mouse genome are good news for the cancer research community.
Molecular and clinical phenotyping.
Well-defined phenotypes are essential to genetic analyses. Charis Eng (Ohio State Univ.) described three clinically distinct disorders known to be associated with mutations in PTEN, which comprise the PTEN hamartoma tumor syndrome. She stressed the importance of meticulous clinical characterization coupled with genotype analysis, a point all too often neglected in molecular genetic studies and one that should be addressed by improved dialogue between clinicians and laboratory-based researchers.
Gene expression microarrays provide molecular phenotypes that have potential both for diagnosis and for providing insights into biology. Several high-profile papers published over the past year would seem to presage a robust diagnostic application of expression arrays. Emerging insights into cancer biology are also being provided by expression arrays. Todd Golub (Dana Farber Cancer Institute) reported that a 21-gene set from metastatic lung tumors can predict which other breast tumors and medulloblastomas are likely to metastasize. Laura van't Veer (The Netherlands Cancer Institute) described a correlation between distinct expression profiles of breast cancer and prognosis. The emerging theme from these findings is that primary tumors are 'born' with metastatic potential, which is detectable as a distinct gene expression signature. Consistent with these findings is an observation made by Barbara Weber (Univ. Pennsylvania): genes upregulated in full-blown breast cancer, such as ERBB2, are also upregulated in the precursor lesions of ductal carcinoma in situ. In attempting the leap from patterns of gene expression to mechanistic insight, however, David Botstein (Stanford Univ.) cautioned that one must discern cause from effect. For example, cell-cycle genes are upregulated in metastatic tumors, which is not entirely unexpected given the rapid proliferation of tumor cells.
Whereas global gene expression analysis provides a molecular portrait of a tumor, a global picture of protein levels may constitute a more complete description of the tumor environment. The complexity of the proteome, however, makes this characterization a formidable task. Nonetheless, nascent approaches offer a means of addressing this question. Roy Jensen (Vanderbilt Univ.) described how matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) analysis could be applied directly to micro-dissected tumors to produce protein profiles. The demonstration that this can be carried out on brain slices provides a direct link between protein profiles and anatomical information. The limitation of this approach is that it does not allow quantitation of proteins. Addressing this problem is an approach described by Rudolf Aebersold (Institute for Systems Biology): the use of isotope-coded affinity tags with tandem mass spectrometry to identify and quantify proteins in tumors.
Mechanisms and pathways.
The pursuit of risk alleles and patterns of gene expression in cancers ideally leads to the identification of pathways relevant to the biology of cancer. Whereas global analysis of expression provides hypothesis-generating studies, it is usually necessary to carry out hypothesis-driven research to identify the key players and events in molecular pathways.
The way in which epigenetic modifications influence tumorigenesis provides a good case in point. Peter Jones (Univ. Southern California, Los Angeles) affirmed that gene silencing caused by the methylation of promoters can provide the 'second hit' to tumor-suppressor genes. Studies have also revealed extensive methylation in coding regions; however, this does not seem to have a silencing effect. To identify the genes that are silenced by hypermethylation, Stephen Baylin (Johns Hopkins Univ.) described a screen in which expression arrays are used to identify genes upregulated in cells following treatment with a low dose of the demethylating agent 5-aza-2′-deoxycytidine and trichostatin A, a histone deacetylase inhibitor (see page 141 of this issue).
Transformation of cells provides an in vitro model of cancer. Tom Curran (St Jude Children's Research Hospital) described a system in which transformation can be alternately induced and repressed. By identifying gene expression patterns that are altered, he is able to identify molecules and epigenetic events that correlate with transformation. Michael Stratton (The Sanger Institute) described a genomic approach involving heteroduplex assay of genes in already-implicated pathways. Using this approach, he has discovered mutations of BRAF (encoding a kinase) in non–small-cell lung carcinoma, melanoma, colorectal cancer and ovarian cancer. On analyzing the status of RAS (Ras activates B-Raf) in the same tumors and taking into account protein sequence and kinase assays, he speculates that mutations in the region of BRAF encoding the kinase domain mimic the activation of B-Raf by Ras.
Studies such as this build on the many years of work that have gone into elucidating networks of molecular interactions. Charles Sherr (St Jude Children's Research Hospital) discussed the importance of the Ink4a–Arf locus in the Rb and p53 pathways and suggested that perturbation of this network is common to all cancer cells.
The determination of comprehensive binding maps of transcription factors represents a global approach to identifying networks of interacting genes. Work described by Vishy Iyer (Univ. of Texas) entails the construction of microarrays containing all intergenic regions of yeast, which are used to hybridize genomic fragments pulled out by chromatin immunoprecipitation. Applying this approach to the human genome is attractive, but the huge amount of non-coding DNA presents a significant obstacle to a comprehensive analysis. Iyer is currently testing bioinformatic methods to identify regions of the genome that are likely to regulate transcription, thus enabling him to prioritize regions for arraying—and is validating this approach on yeast before adapting it to the human genome.
Array technology provides a means of observing the effect of different drugs at the cellular level. Examples of this approach were described by Paul Meltzer (NHGRI), who is investigating the response of breast cancer cell lines to various chemical agents. Patrick Johnston (Queen's Univ., Belfast) described the search for targets of 5-fluorouracil, a common chemotherapeutic agent, using expression arrays. Those unfamiliar with pharmacological research might benefit from a study design described by Paul Workman (Institute of Cancer Research), who stressed the importance of time-course analysis, concentration dependence analysis and repetition in different cell lines when carrying out expression array analysis in response to different drugs. Lynn Matrisian's (Vanderbilt Univ.) discussion of the disappointing performance of synthetic inhibitors of the matrix-degrading metalloproteinases in clinical trials underscored the challenges in designing appropriate clinical trials using preclinical data.
With the vast amount of data now being generated through expression array studies, careful planning of data management, analysis and archiving is required. Thus, the recent launch of ArrayExpress, which provides a web-based infrastructure for sharing microarray data, comes as welcome news. This initiative, led by Alvis Brazma (European Bioinformatics Institute), is currently ramping up, as is the Gene Expression Omnibus repository (of the National Center for Biotechnology Information), and should lead to a reduction in the current practice of storing microarray data on personal websites.
To permit facile comparison of data sets, standards must be established. As outlined by John Quackenbush (The Institute for Genomic Research) and Christian Stoeckert (Univ. Pennsylvania), these standards apply not only to data, but also to ancillary information. Moreover, a standardized language to describe both study design and the data is being developed by the Microarray and Gene Expression Databases (MGED) group.
A recent report in Cancer (94, 2766–2792; 2002) indicates that cancer deaths in the US decreased by 1% per year from 1993 to 1999. The report attributes this reduction to a decrease in tobacco use, earlier detection through screening programs and more effective treatments. Despite this trend, the report states that the increasing age of the US population will result in the number of cancer cases doubling by 2050. One can imagine that this will be the case in many countries. Thus, cancer continues to be one of the greatest public health concerns of the western world, although significant advances in understanding the biology of cancer are apparent. This is most noticeable in the maturation of expression array technology, which has begun to provide some of the dividends that have been touted for so long. To realize the full potential of expression array analysis, it must be wisely integrated with other approaches. Although this can occur within a single laboratory, the advancement of science derives enormous benefit from collaboration. It was with this in mind that co-organizer Jeffrey Trent (NHGRI) closed the meeting by emphasizing the value of getting everybody sitting in the same room. Such occasions ensure that the flow of information from different areas of research defies Crick's central dogma.