Rules of evidence for cancer molecular-marker discovery and validation

Ransohoff, David F.

doi:10.1038/nrc1322

Opinion
Published: 01 April 2004

David F. Ransohoff^1,2

Nature Reviews Cancer volume 4, pages 309–314 (2004)

Abstract

According to some claims, molecular markers are set to revolutionize the process of evaluating prognosis and diagnosis for cancer. Research about cancer markers has, however, been characterized by inflated expectations, followed by disappointment when original results cannot be reproduced. Even now, disappointment might be expected, in part because rules of evidence to assess the validity of studies about diagnosis and prognosis are both underdeveloped and not routinely applied. What challenges are involved in assessing studies and how might problems be avoided so as to realize the full potential of this emerging technology?

**Figure 1: Method of dividing original sample to assess reproducibility and overfitting.**

Acknowledgements

Thanks to many colleagues at the National Cancer Institute, The University of North Carolina at Chapel Hill and elsewhere for reviewing and commenting on earlier versions of the manuscript.

Author information

Authors and Affiliations

Departments of Medicine and Epidemiology, University of North Carolina at Chapel Hill, CB# 7080, Bioinformatics Bldg. 4103, Chapel Hill, 27599-7080, North Carolina, USA
David F. Ransohoff
Division of Cancer Prevention, National Cancer Institute, National Institutes of Health, Bethesda, 20892–7354, Maryland, USA
David F. Ransohoff

Ethics declarations

Competing interests

The author declares no competing financial interests.

Glossary

CROSS-VALIDATION: A technique used in multivariable analysis that is intended to reduce the possibility of overfitting and of non-reproducible results. The method involves sequentially leaving out parts of the original sample ('split-sample') and conducting a multivariable analysis; the process is repeated until the entire sample has been assessed. The results are combined into a final model that is the product of the training step.
DISCOVERY-BASED RESEARCH: Research in which large amounts of data are examined, without prior hypothesis, to discover markers or patterns that might discriminate among groups of individuals.
HIGH-THROUGHPUT ANALYSIS: Research in which large numbers of variables are analysed simultaneously. RNA expression analysis using microarrays simultaneously examines expression levels of tens of thousands of genes. Proteomic analysis of serum using mass spectrometry simultaneously examines thousands of peaks related to proteins and peptides.
MICROARRAY: A solid surface on which thousands of specimens, such as synthetic oligonucleotides representing different genes, can be placed in separate locations and used to assess the status of genotype or gene expression for one individual.
MULTIVARIABLE MODELS: Models that simultaneously consider how multiple variables — such as age, gender, co-morbidity, symptoms and gene expression — relate to an outcome such as diagnosis or prognosis.
OVERFITTING: Finding a discriminatory pattern by chance, which can happen when large numbers of variables are assessed for a small number of outcomes.
POLYMERASE CHAIN REACTION: (PCR). A method to replicate or amplify small amounts of DNA into larger amounts that can be used in chemical analysis.
RULES OF EVIDENCE: Rules that are used to evaluate the strength or validity of research results by considering problems such as heterogeneity, complexity, bias and 'generalizability'. Rules vary depending on the subject or purpose of the study: diagnosis, prognosis, therapy or aetiology.
SERIAL ANALYSIS OF GENE EXPRESSION: (SAGE). A method to estimate numbers of copies of genes.
SINGLE-NUCLEOTIDE POLYMORPHISM: (SNP). A variation in DNA sequence involving a single base.
SPLIT-SAMPLE VALIDATION: Split-sample validation is used, confusingly, to mean two different things. It can refer to the method in the training step by which the sample is divided during the process of cross-validation. It can also refer to the method used to divide the original sample of subjects into two groups for use in training and then in independent validation.
VALIDITY: Refers in general to efforts that are made to confirm the accuracy, precision or effectiveness of results, including reproducibility.
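
The glossary entries for overfitting, cross-validation and split-sample validation can be illustrated with a small simulation. The sketch below is not taken from the article. It assumes pure-noise markers, a hypothetical single-marker threshold rule as the "model", and illustrative sample sizes. All function names are invented for this example. It also simplifies cross-validation by reporting a pooled accuracy estimate rather than combining the results into a final model.

```python
import random

def make_noise_data(n_subjects, n_markers, seed):
    """Simulate markers that are pure noise, unrelated to the outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_markers)]
         for _ in range(n_subjects)]
    y = [i % 2 for i in range(n_subjects)]  # alternating control/case labels
    return X, y

def count_correct(rule, X, y, idx):
    """Number of subjects in idx that a single-marker rule classifies correctly."""
    marker, threshold, flipped = rule
    correct = 0
    for i in idx:
        predicted = int((X[i][marker] > threshold) != flipped)
        correct += int(predicted == y[i])
    return correct

def fit_best_marker(X, y, idx):
    """Training step: search all markers for the best threshold rule on idx."""
    best_rule, best_correct = None, -1
    for marker in range(len(X[0])):
        values = sorted(X[i][marker] for i in idx)
        threshold = values[len(values) // 2]  # cut at (roughly) the median
        for flipped in (False, True):
            rule = (marker, threshold, flipped)
            correct = count_correct(rule, X, y, idx)
            if correct > best_correct:
                best_rule, best_correct = rule, correct
    return best_rule, best_correct / len(idx)

def split_sample(n_subjects, seed):
    """Divide the original sample once into training and validation halves."""
    order = list(range(n_subjects))
    random.Random(seed).shuffle(order)
    half = n_subjects // 2
    return order[:half], order[half:]

def k_folds(idx, k, seed):
    """Partition idx into k disjoint folds for cross-validation."""
    order = list(idx)
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]

def cross_validated_accuracy(X, y, idx, k, seed):
    """Leave each fold out in turn, refit on the rest, score the left-out fold."""
    folds = k_folds(idx, k, seed)
    correct = 0
    for fold in folds:
        rest = [i for other in folds if other is not fold for i in other]
        rule, _ = fit_best_marker(X, y, rest)  # marker search repeated per fold
        correct += count_correct(rule, X, y, fold)
    return correct / len(idx)

if __name__ == "__main__":
    # Many markers, few subjects: the setting in which overfitting is likely.
    X, y = make_noise_data(n_subjects=40, n_markers=500, seed=0)
    training, validation = split_sample(len(X), seed=1)
    rule, apparent = fit_best_marker(X, y, training)
    cv = cross_validated_accuracy(X, y, training, k=5, seed=2)
    independent = count_correct(rule, X, y, validation) / len(validation)
    print("apparent accuracy in training:", apparent)
    print("cross-validated accuracy:", cv)
    print("independent validation accuracy:", independent)
```

Because no marker here is truly related to the outcome, any high apparent accuracy in the training half arises by chance, which is the sense of overfitting given above. The cross-validated and independent-validation estimates would be expected to fall closer to 0.5, although individual runs vary. Repeating the marker search inside every fold is a deliberate choice: selecting the marker once on the full training set and cross-validating only the threshold would let the selection step see the left-out subjects.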

About this article

Cite this article

Ransohoff, D. Rules of evidence for cancer molecular-marker discovery and validation. Nat Rev Cancer 4, 309–314 (2004). https://doi.org/10.1038/nrc1322

Issue Date: 01 April 2004
DOI: https://doi.org/10.1038/nrc1322
