Cancer and genomics

Futreal, P. Andrew; Kasprzyk, Arek; Birney, Ewan; Mullikin, James C.; Wooster, Richard; Stratton, Michael R.

doi:10.1038/35057046

Analysis
Published: 15 February 2001

P. Andrew Futreal¹, Arek Kasprzyk³, Ewan Birney³, James C. Mullikin², Richard Wooster¹,⁴ & Michael R. Stratton¹,⁴

Nature volume 409, pages 850–852 (2001)

Abstract

Identification of the genes that cause oncogenesis is a central aim of cancer research. We searched the proteins predicted from the draft human genome sequence for paralogues of known tumour suppressor genes, but no novel genes were identified. We then assessed whether it was possible to search directly for oncogenic sequence changes in cancer cells by comparing cancer genome sequences against the draft genome. Apparently chimaeric transcripts (from oncogenic fusion genes generated by chromosomal translocations, the ends of which mapped to different genomic locations) were detected to the same degree in both normal and neoplastic tissues, indicating a significant level of false positives. Our experiment underscores the limited amount and variable quality of DNA sequence from cancer cells that is currently available.

Main

All cancers are caused by abnormalities in DNA sequence. Throughout life, the DNA in human cells is exposed to mutagens and suffers mistakes in replication, resulting in progressive, subtle changes in the DNA sequence in each cell. Occasionally, one of these somatic mutations alters the function of a critical gene, providing a growth advantage to the cell in which it has occurred and resulting in the emergence of an expanded clone derived from this cell. Additional mutations in the relevant target genes, and consequent waves of clonal expansion, produce cells that invade surrounding tissues and metastasize. Cancer is the most common genetic disease: one in three people in the western world develop cancer, and one in five die from it¹.

Around 30 recessive oncogenes (tumour suppressor genes) and more than 100 dominant oncogenes have been identified. In the past, the most successful way to identify such genes was to narrow their location to a small part of the genome using mapping strategies, and then to screen candidate genes in the region for mutations in cancer cases. However, this strategy has its limitations. Mapping information can be confusing or misleading. Moreover, some cancer genes leave no obvious ‘identifiers’ in the genome and therefore cannot be readily positioned using such maps (for example, dominant oncogenes that are activated by single nucleotide substitutions leading to a single amino-acid change). These mutated genes are essentially invisible to conventional detection and their identification has usually depended upon selection of likely candidates on the basis of the biological features associated with cancer.

How will the genome sequence help us to identify the remaining cancer genes? One possibility is to generate a longer list of plausible candidates by searching for paralogues of known cancer genes. To this end we searched a protein set which represents the current ‘best guess’ of proteins encoded by the human genome² for paralogues of known recessive oncogenes/tumour suppressor genes (see Supplementary Information; Table 1). Although we detected most of the known family members, we found no convincing evidence of new ones at the level of stringency of this search (50% or greater amino-acid identity over a minimum of 50 amino acids). The lack of detectable paralogues could be due to deficiencies of the protein set, despite the fact that new paralogues have been found for many other families². To confirm our results, we ran an exhaustive search of one of the genes, TP53, against the total draft genome sequence, considering every possible gene prediction and reading frame³. This additional search revealed no unknown paralogues.

Table 1 Recessive oncogenes used in the search for paralogues

The lack of novel paralogues may reflect the biological and medical importance of these gene families; most of their members may have already been found. But additional paralogues could be hiding in unsequenced regions of the genome or may have gone undetected owing to the deficiencies of gene prediction. Clearly, detection of paralogues also depends upon the search criteria used. Predictably, when we lowered the stringency of the search the number of hits increased substantially—but many of these have questionable biological validity.
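
The dependence on search criteria can be illustrated with a minimal sketch. Only the threshold (50% or greater amino-acid identity over a minimum of 50 amino acids) comes from the text; the `Hit` record, the gene and protein names, and all alignment values below are hypothetical placeholders, not results from the search described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One protein alignment between a query gene and a predicted protein."""
    query: str
    subject: str
    identity: float  # fraction of identical residues, 0.0 to 1.0
    length: int      # aligned length in amino acids


def candidate_paralogues(hits, min_identity=0.50, min_length=50):
    """Keep non-self hits that meet both the identity and length thresholds."""
    return [
        h for h in hits
        if h.query != h.subject
        and h.identity >= min_identity
        and h.length >= min_length
    ]


# Hypothetical alignment results for a single query (illustrative values only).
HITS = [
    Hit("geneA", "geneA", 1.00, 400),     # self-match, always excluded
    Hit("geneA", "pred_001", 0.62, 180),  # passes the stringent criteria
    Hit("geneA", "pred_002", 0.55, 40),   # too short at min_length=50
    Hit("geneA", "pred_003", 0.35, 120),  # identity too low at 0.50
    Hit("geneA", "pred_004", 0.20, 300),  # fails even a relaxed identity cut
]

strict = candidate_paralogues(HITS)
relaxed = candidate_paralogues(HITS, min_identity=0.30, min_length=30)

print([h.subject for h in strict])
print([h.subject for h in relaxed])
```

Relaxing either threshold admits more hits, which is the behaviour described above; whether the extra hits are biologically meaningful cannot be decided from the thresholds alone.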

Was this really a useful way of generating likely candidate cancer genes? Mutated cancer genes do recur in certain gene families (for example, signalling kinases and GDP binding proteins), but these families are often large and, in most cases, only a minority of members are implicated in cancer. In fact, the diversity of structure and function of cancer genes is striking. For example, hardly any of the known recessive oncogenes have strong homology to any other, and their proteins are associated with diverse biological and biochemical functions. Moreover, many close relatives of important cancer-related genes (for example, the genes for p73 and p63, which show sequence similarity to TP53 (ref. 4)) are not known to be mutated in cancer (although it is possible that they are significantly altered by changes in expression). So we may learn more about the mutations driving cancer if we are not too heavily influenced by past experience. Instead, we should persevere in exploring every gene or protein, whatever its structure or putative function, as a possible candidate.

By simply mining the genome sequence for similarity to known cancer genes, we may miss an opportunity. In addition to many more gene sequences, the working draft contains information concerning the organization of genes, their structures and ordering along the chromosomes. Cancers are characterized by disruption and disorganization of genomes. Perhaps we could use the genome sequence as a template against which we can detect structural alterations of the genome in cancer cells.

To do this we need to compare the working draft (and ultimately the finished sequence) with corresponding sequences from cancer cell genomes. However, there is very little cancer genome sequence available, and what is available is patchy. The largest body of sequence from cancer genomes originates from programs that sample clones in complementary DNA libraries constructed from neoplastic tissues (for example, the Cancer Genome Anatomy Project (CGAP, http://www.ncbi.nlm.nih.gov/CGAP/)). In principle, we could try to compare these with the working draft for the somatic base substitutions and small insertions and deletions that often result in inactivation of tumour suppressor genes or activation of dominantly acting oncogenes. In practice, however, these databases contain relatively little sequence and do not sample most transcripts in any single tumour. Moreover, the available sequence is mostly from untranslated regions of genes (whereas the cancer-causing mutations cluster in the coding regions). Even if the rare, meaningful somatic mutations could be detected, they would be buried in the debris of sequence errors (both in the cDNA libraries and in the genomic sequence) or camouflaged in a forest of innocuous polymorphisms.

We attempted to use these cDNA library sequences in conjunction with the working draft to look for a different type of alteration in cancer. Gene rearrangements that result in activation of oncogenes can arise as a result of chromosomal translocation. This type of abnormality is common in leukaemias, lymphomas and sarcomas⁵,⁶, and often results in the formation of a chimaeric transcript, the product of a fusion gene derived from portions of transcribed genes on either side of the chromosomal breakpoints (although in some instances the translocation simply results in dysregulation of an intact transcript). Intriguingly, this pattern of oncogenesis has not been frequently documented amongst most of the common, adult epithelial cancers. Whether the rearrangements exist but are hidden in the disorganized complexity of epithelial cancer cell genomes or are simply not present in epithelial cancers is a question that may be addressed in the near future.

To look for such gene rearrangements we obtained all the sequences from the CGAP program (derived from cDNA libraries constructed from both normal tissue samples and neoplastic samples) and selected those clones from which two sequences had been obtained (normal, 215,889 clones; cancer, 25,446 clones). Most of these sequences were from either end of the clones. We then looked in the genomic sequence for matches to these paired cDNA sequences that would allow their chromosomal localization (see Supplementary Information).

Most pairs of sequences from the same cDNA clone mapped to the same position in the genome, as would be expected if they had originated from a single normal transcript. But there were a few pairs derived from a single cDNA clone that mapped to two different parts of the genome. Could they represent transcripts of chimaeric genes generated by chromosomal translocation? Possibly, but this strategy has limitations. Even this conceptually simple experiment was dogged by the intrinsic complexities of the genome, such as low-frequency repeats and multiple, high-fidelity copies of some genes. Moreover, our analyses yielded proportionally the same number of apparently chimaeric transcripts derived from normal tissues as from cancers (3% of starting clones in both cases), indicating a significant rate of false positives. These could result from chimaeric clones arising as artefacts of cDNA library construction, mistracking of sequencing gels, errors in annotating and curating databases or misassembly of the draft sequence.
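
The logic of this paired-end comparison can be sketched as follows. This is an illustration under stated assumptions, not the procedure given in the Supplementary Information: the `Placement` record, the `max_span` cutoff for two ends on the same chromosome, and every coordinate below are hypothetical. The only elements taken from the text are that clones whose two end sequences map to different parts of the genome are flagged, and that the flagged fraction is expressed relative to the starting clones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Placement:
    """Genomic location of one end sequence of a cDNA clone."""
    chrom: str
    pos: int


def classify_clone(end_a: Optional[Placement],
                   end_b: Optional[Placement],
                   max_span: int = 1_000_000) -> str:
    """Label a clone by where its two end sequences map.

    max_span is an assumed cutoff: ends on the same chromosome but
    further apart than this are also treated as discordant.
    """
    if end_a is None or end_b is None:
        return "unmapped"
    if end_a.chrom != end_b.chrom:
        return "discordant"
    if abs(end_a.pos - end_b.pos) > max_span:
        return "discordant"
    return "concordant"


def discordant_rate(clones) -> float:
    """Fraction of all starting clones whose ends map discordantly."""
    if not clones:
        return 0.0
    labels = [classify_clone(a, b) for a, b in clones]
    return labels.count("discordant") / len(clones)


# Hypothetical clone sets (coordinates are made up for illustration).
normal = [
    (Placement("chr1", 1_000), Placement("chr1", 3_500)),
    (Placement("chr2", 50_000), Placement("chr2", 52_000)),
    (Placement("chr3", 10_000), Placement("chr9", 77_000)),  # apparent chimaera
    (Placement("chr5", 500), None),                          # one end unplaced
]
cancer = [
    (Placement("chr4", 100), Placement("chr11", 900)),       # apparent chimaera
    (Placement("chr7", 1_000), Placement("chr7", 2_500)),
    (Placement("chr8", 20_000), Placement("chr8", 26_000)),
    (Placement("chrX", 300), Placement("chrX", 4_000)),
]

print(discordant_rate(normal), discordant_rate(cancer))
```

When the discordant fraction is the same in libraries from normal and neoplastic tissue, as in this toy data and as reported above, the flagged clones are more plausibly artefacts than translocation products.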

Of course, for this experiment we used a resource that had not been designed to support such investigations, so it is not surprising that it did not bear much fruit. In addition to the problems of false positives, relatively few clones have been sequenced from most of the libraries, many of the libraries are not normalized (leading to undersampling of less abundant transcripts) and many of the sampled cDNAs are not full length. This was illustrated by the fact that when we searched the CGAP annotations of libraries constructed from five cancer types with known chimaeric genes, we found only one of the ten genes involved in the five chimaeric transcripts. This exercise underscores the point that elucidating the complexity of cancer at the genomic level will require much more sequence data from cancer genomes, which will need to be configured appropriately for the task at hand.

So, the working draft will not immediately reveal the natures of the abnormalities in cancer cell genomes. To facilitate these analyses we will need the finished sequence, which will form a structural framework for a new generation of massive-scale comparisons of cancer cell and normal genomes. Ultimately, it may also prompt a shift away from strategies that depend upon primary genomic localization and allow systematic genome-wide searches for mutations. New technology will be required; there is no single technology at present that will detect all the types of abnormality (large deletions, rearrangements, base substitutions, small insertions and deletions, amplifications, and epigenetic changes such as methylation) that are present in cancer cells. Sequencing of genomic libraries constructed from cancer genomes would come closest to this goal, but given the diversity of cancers and the effort and cost required to obtain reasonable coverage of a human genome this is a daunting challenge.

References

1. Higginson, J., Muir, C. & Munoz, N. Human Cancer: Epidemiology and Environmental Causes (Cambridge Monographs on Cancer Research, Cambridge, UK, 1992).
2. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
3. Birney, E. & Durbin, R. Using GeneWise in the Drosophila annotation experiment. Genome Res. 10, 547–548 (2000).
4. Kaelin, W. G. Jr The p53 gene family. Oncogene 18, 7701–7705 (1999).
5. Rowley, J. D. The critical role of chromosome translocations in human leukemias. Annu. Rev. Genet. 32, 495–519 (1998).
6. Bell, R. S., Wunder, J. & Andrulis, I. Molecular alterations in bone and soft-tissue sarcoma. Can. J. Surg. 42, 259–266 (1999).

Author information

Authors and Affiliations

Cancer Genome Project, Sanger Centre, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
P. Andrew Futreal, Richard Wooster & Michael R. Stratton
Informatics Division, Sanger Centre, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
James C. Mullikin
EBI-EMBL, Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
Arek Kasprzyk & Ewan Birney
Institute of Cancer Research, Sutton, SM2 5NG, Surrey, UK
Richard Wooster & Michael R. Stratton

Corresponding author

Correspondence to Michael R. Stratton.

Supplementary information

Supplementary Method.

About this article

Cite this article

Futreal, P., Kasprzyk, A., Birney, E. et al. Cancer and genomics. Nature 409, 850–852 (2001). https://doi.org/10.1038/35057046

Issue Date: 15 February 2001
DOI: https://doi.org/10.1038/35057046
