  • Review Article

Sequence-based cancer genomics: progress, lessons and opportunities

Key Points

  • The genomics revolution has catalysed the development of new technologies that can be applied to provide a comprehensive view of the molecular changes that occur during cancer development.

  • Three independent projects — the Cancer Genome Anatomy Project (CGAP), the Human Cancer Genome Project (HCGP) and the Cancer Genome Project (CGP) — have applied sequence-based technologies to produce synergistic data sets that are amenable to integration.

  • The data of these projects are derived from the human genome (through sequencing of gene exons to identify cancer mutations), as well as from the human transcriptome in the form of expressed sequence tags (ESTs) and serial analysis of gene expression (SAGE) tags that are generated from tumours and normal tissues.

  • The CGAP has facilitated the interface of the human genome sequence with the cytogenetic map through FISH-mapping of BAC clones that were substrates for generating the finished genome sequence. This linkage facilitates the characterization of chromosomal aberrations that are associated with cancer.

  • A suite of informatics tools that allows in silico analysis of CGAP and HCGP gene-expression data, polymorphisms and chromosomal aberrations of cancer is accessible through the CGAP website. In the future, these data sets will be integrated with the mutation analysis of the CGP.

  • The data sets that are generated by these projects are a platform for a variety of applications in cancer research, such as the design and generation of microarrays.

Abstract

Technologies that provide a genome-wide view offer an unprecedented opportunity to scrutinize the molecular biology of the cancer cell. The information that is derived from these technologies is well suited to the development of public databases of alterations in the cancer genome and its expression. Here, we describe the synergistic efforts of research programmes in Brazil, the United Kingdom and the United States towards building integrated databases that are widely accessible to the research community, to enable both basic and applied cancer research.

Figure 1: The contribution of sequence-based genomic data to cancer research.

Acknowledgements

R.L.S. thanks R. Klausner for initiating the Cancer Genome Anatomy Project and for vigorous encouragement and support during the implementation. A.J.G.S. is indebted to F. Perez and L. Old, respectively the Scientific Directors of the State of São Paulo Research Foundation (FAPESP) and the Ludwig Institute for Cancer Research, for their enthusiastic support of the Human Cancer Genome Project. R.W. thanks M. Stratton, A. Futreal and the Wellcome Trust.

Author information

Corresponding author

Correspondence to Robert L. Strausberg.

Related links

DATABASES

LocusLink

ABL

ARAF1

BCR

BRAF

BRCA1

BRCA2

RAF1

OMIM

chronic myeloid leukaemia

FURTHER INFORMATION

Cancer Genome Anatomy Project

Cancer Genome Project

dbEST

Gene Expression Omnibus

Human Cancer Genome Project

IMAGE Consortium

Mitelman Database of Chromosome Aberrations in Cancer

SAGE Genie

SAGEmap

Spectral Karyotyping/Comparative Genomic Hybridization Database

Glossary

TRANSCRIPTOME

The complete catalogue of all the RNA species of a cell, tissue or organism.

cDNA

Complementary DNA that is produced from an RNA template by an RNA-dependent DNA polymerase.

HYPOXIA

A physiological state in which insufficient oxygen reaches a tissue.

FLUORESCENCE IN SITU HYBRIDIZATION

(FISH). A technology in which chromosomes (or chromosomal segments) are painted with fluorescent molecules.

BACTERIAL ARTIFICIAL CHROMOSOME

(BAC). A DNA molecule that can be propagated in bacteria and is useful for cloning large (100–200 kb) segments of DNA from other species.

PHASE I TRIAL

The first stage in a clinical trial, which is designed to assess the safety and dosage levels of a new treatment, and usually involves only a few patients.

BLAST CRISIS

The progression of myeloid leukaemia from a clonal proliferation of myeloid cells to a highly refractory progressive disease with >30% blast cells in the peripheral blood and bone marrow, and a one-year survival of <10%.

HETERODUPLEX ASSAY

A rapid method to detect mutations that relies on the fact that double-stranded DNA molecules with a single base-pair mismatch migrate to a different location compared with molecules that do not have a mismatch.

PASSENGER ALTERATIONS

A phenomenon that refers to the fact that some mutations do not seem to have any functional benefit for the tumour; they probably arise owing to faulty DNA-repair machinery in the tumour, and are 'just along for the ride'.

COMPARATIVE GENOMIC HYBRIDIZATION

(CGH). A technology through which tumour and reference DNA are differentially labelled to show copy-number changes in tumour genomes.

SPECTRAL KARYOTYPING

(SKY). A technique for painting each chromosome in a different colour, which is useful for looking at chromosomal aberrations such as translocations.

About this article

Cite this article

Strausberg, R., Simpson, A. & Wooster, R. Sequence-based cancer genomics: progress, lessons and opportunities. Nat Rev Genet 4, 409–418 (2003). https://doi.org/10.1038/nrg1085

  • DOI: https://doi.org/10.1038/nrg1085
