Machine learning classifies cancer

Brain tumours are often classified by visual assessment of tumour cells, yet such diagnoses can vary depending on the observer. Machine-learning methods to spot molecular patterns could improve cancer diagnosis.
Derek Wong is in the Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia V6T 2B5, Canada.

Stephen Yip is in the Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia V6T 2B5, Canada.

Accurate diagnosis is essential for appropriate disease treatment. A core technique used to diagnose brain cancer today is the microscope-based analysis of tumour samples on glass slides, termed histology. However, this requires the appraisal of subtle cellular alterations, which in some cases may lead different individuals to classify the same sample differently. Nowadays, technological developments enable vast amounts of molecular data to be obtained and assessed for a tumour without the need for such subjective diagnostics. Machine-learning approaches are being developed to aid the diagnosis of clinical samples, and in a paper in Nature, Capper et al.1 report such a method for classifying brain tumours on the basis of molecular patterns.

In 1926, a publication entitled A Classification of the Tumors of the Glioma Group on a Histo-Genetic Basis with a Correlated Study of Prognosis2 by neurosurgeons Percival Bailey and Harvey Cushing provided early insight into the development, cellular characteristics and clinical consequences of glioma, a type of cancer of the central nervous system (CNS). The book’s title was prophetic and ambitious, given that the microscope-based diagnostic approach they advocated was not common then. The authors’ ideas were ahead of their time — for example, the word ‘histo-genetic’ in the book’s title points to a link between cellular changes and genetics. Bailey and Cushing’s obsessive attention to detail allowed them to identify gross and microscopic tumour features that correlated with clinical outcomes, and the book reported the classification of 14 types of tumour.

Today, many brain tumours are identified by analysis of both histological and molecular features3–5. The identification6,7 of biologically relevant, tumour-type-defining and clinically informative genetic alterations in brain tumours prompted the World Health Organization (WHO) to update its diagnostic guidelines for certain brain tumours in 2016 to recommend an integrative diagnostic approach that combines both histology and molecular information8,9. However, diagnoses that rely predominantly on histology remain common for many types of rare tumour, owing to a lack of molecular identifiers. Yet histological diagnoses face many challenges, including possible cellular variations in tumours that are a mosaic of cells containing different genetic alterations, or the fact that similar histological features can be shared by many different types of brain tumour. Questions remain about how well histological similarity reflects tumour similarity, given that tumours that have similar histology can progress in different ways, and tumours that have contrasting histology can progress in the same way.

A key development for histological analysis is the expansion of computational tools that allow machine-learning processes to analyse histological data10,11. In this approach, a computer is ‘trained’ using a data set of sample images of tumours that have been classified by a physician. The computer uses the classification information to develop its own pattern-recognition criteria with which to identify tumour types. However, a challenge arises if clearly defined diagnostic criteria for certain tumours are lacking, or if different types of tumour are histologically indistinguishable.
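
The training step described above can be illustrated with a minimal sketch. It assumes that each image has already been reduced to a short list of numbers, and it uses a nearest-centroid rule chosen purely for brevity; the feature values, the labels `type_A` and `type_B`, and the function names are all hypothetical and are not taken from the studies cited.

```python
from math import dist
from statistics import fmean

# Hypothetical training set: each sample is a short feature vector (for example,
# numbers summarising an image) paired with the label a physician assigned.
TRAINING = [
    ((0.9, 0.1, 0.2), "type_A"),
    ((0.8, 0.2, 0.1), "type_A"),
    ((0.1, 0.9, 0.8), "type_B"),
    ((0.2, 0.8, 0.9), "type_B"),
]


def train(samples):
    """Learn one centroid (mean feature vector) per physician-assigned label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def classify(centroids, features):
    """Assign the label whose centroid is closest to the new sample."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = train(TRAINING)
prediction = classify(centroids, (0.85, 0.15, 0.15))
print(prediction)
```

The sketch also hints at the limitation noted above: if two tumour types produce nearly identical feature vectors, their centroids almost coincide and the rule cannot separate them reliably.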

Capper and colleagues decided to focus on molecular information whose classification does not require complex visual assessments. They took a machine-learning approach to tumour classification based on changes in DNA methylation — the addition of methyl groups to DNA — and compared such diagnoses with those made by pathologists using histological analysis.

DNA methylation is a type of modification known as an epigenetic change. This category of alteration does not change the DNA sequence but can affect gene expression or cell fate. The role of aberrant DNA methylation and other epigenetic changes in cancer is becoming increasingly evident12,13. In many cancers, the genome-wide pattern of epigenetic changes, known as the epigenome, can be substantially altered. For example, mutations in the genes IDH1 or IDH2 in gliomas cause genome-wide dysregulation of DNA-methylation patterns that can be correlated with specific clinical outcomes12.

Previous studies14–16 have highlighted the diagnostic advantages of profiling DNA methylation for certain types of brain tumour because, compared with histology or the testing of specific genetic alterations, an epigenome-wide analysis of DNA methylation offers an unbiased diagnostic approach. Yet routine epigenome-wide methylation profiling remains relatively uncommon for clinical diagnosis for several reasons, including: cost; sample requirements; a shortage of staff with the necessary data-analysis expertise; and the question of whether the findings would have implications for the clinical treatments used. However, some progress is being made. For example, techniques are now available to use DNA extracted from the most common type of chemically preserved tumour tissue on glass slides, called formalin-fixed, paraffin-embedded (FFPE) specimens.

The authors provided the computer with genome-wide methylation data for samples of almost every CNS tumour type classified by the WHO. The computer used supervised machine learning to recognize methylation patterns present in the pathologist-classified samples, as well as unsupervised machine learning, which involved the computer searching the data sets for patterns that it could use to assign samples into its own computer-generated classification categories.
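
To make the unsupervised part of this description concrete, the following sketch groups unlabelled methylation profiles into clusters. It is an illustration under stated assumptions only: the article does not say which algorithm the authors used, so k-means is swapped in here as a familiar stand-in, and the profile values, the choice of two clusters and the function name `k_means` are hypothetical.

```python
from math import dist
from statistics import fmean

# Hypothetical methylation profiles: each value is the fraction of methylated
# copies (0 to 1) at one genomic site. No diagnostic labels are supplied.
PROFILES = [
    (0.90, 0.85, 0.10),
    (0.88, 0.80, 0.15),
    (0.12, 0.20, 0.90),
    (0.10, 0.15, 0.95),
    (0.92, 0.90, 0.05),
]


def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly reassigning and averaging."""
    # Deterministic start: the first point, then the point farthest from
    # the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every profile to its nearest centroid.
        assignment = [
            min(range(k), key=lambda i: dist(p, centroids[i])) for p in points
        ]
        # Move each centroid to the mean of the profiles assigned to it.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(fmean(col) for col in zip(*members))
    return assignment


clusters = k_means(PROFILES, k=2)
print(clusters)
```

In this toy data set, the three profiles that are heavily methylated at the first two sites end up in one cluster and the two remaining profiles in the other, without any diagnosis having been supplied.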

After training, the computer could classify tumours into 82 distinct classes on the basis of specific methylation profiles. Only 29 of these corresponded to a specific tumour type as defined by the WHO and another 29 represented subclasses of the WHO-defined tumour types.

Yet perhaps the most interesting discoveries made by Capper and colleagues were tumour classifications that grouped together histologically similar types of tumour comprising more than one tumour type as classified by the WHO, or classifications of tumour types that did not match the WHO groupings. Such discoveries might provide insight into tumour similarities that are independent of tumour histology and could aid the development of treatment options or diagnostic tools.

The authors used the computer to classify 1,104 test cases of tumours that had been diagnosed by pathologists using standard histological or molecular techniques (Fig. 1). For 60.4% of these test cases, the computer-based classification was identical to the pathologist’s classification, and for 15.5% of the test cases the computer and pathologist assigned the same type of tumour but the computer could also assign the tumour into a subclass. In 12.6% of the test cases, the computer diagnosis did not match the pathologist’s diagnosis. Remarkably, further rigorous analysis of these cases — by, for example, gene sequencing — resulted in the classification of 92.8% of these unmatched tumours being switched from the original clinical diagnosis to the computer-based classification. Moreover, 71% of the reclassified tumours were assigned to a different tumour grade, a recategorization that might have implications for prognosis or treatment. The remaining test cases (11.5%) could not be classified by the computer. Additional computational analysis indicates that one-third of the tumours in this group might represent rare tumours for which the computer had yet to encounter enough examples to generate a classification grouping.
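
The percentages in this paragraph can be converted into approximate numbers of cases with some simple arithmetic. The counts below are an assumption: they are obtained by rounding the reported percentages of the 1,104 test cases to whole tumours, and are not figures stated in the text.

```python
TEST_CASES = 1104

# Percentages as reported in the paragraph above.
OUTCOMES = {
    "same classification as pathologist": 60.4,
    "same type, plus a subclass": 15.5,
    "different from pathologist": 12.6,
    "not classifiable": 11.5,
}

# The four outcomes should cover every test case.
total_percent = sum(OUTCOMES.values())

# Approximate case counts, rounded to whole tumours (an assumption: only
# percentages are given in the text).
counts = {name: round(TEST_CASES * pct / 100) for name, pct in OUTCOMES.items()}

# Of the mismatched cases, 92.8% were revised to the computer's classification,
# and 71% of those revisions changed the tumour grade.
revised = round(counts["different from pathologist"] * 92.8 / 100)
regraded = round(revised * 71 / 100)

for name, n in counts.items():
    print(f"{name}: about {n} cases")
print(f"revised after further analysis: about {revised}")
print(f"revised with a change of grade: about {regraded}")
```

On these assumptions, the four outcome groups account for all 1,104 cases, and the 12.6% of mismatches corresponds to roughly 139 tumours, of which about 129 were reclassified.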

Figure 1 | Tumour classification using a machine-learning approach. Capper et al.1 used a machine-learning approach to classify brain tumours on the basis of genome-wide patterns of a type of DNA alteration called methylation. The computer was trained using methylation data for tumour samples that had been diagnosed by pathologists using standard microscopy-based analysis or analysis of selected genes. After training, the computer was given 1,104 test cases. The authors compared the diagnoses made by the computer and by the pathologists. Although the machine was unable to diagnose all specimens, of the specimens that it classified, the machine-based diagnosis was more accurate or could assign tumours to more-specific subcategories than the classifications made by the pathologists.

Does Capper and colleagues’ approach represent a probable future standard for tumour diagnosis, given the advantages, such as a low cost per sample that is comparable to that of standard cancer diagnostics; the compatibility with universally available FFPE material; and a website that facilitates data entry, analysis and tumour classification? And, if so, will histological analysis fall by the wayside?

Obtaining a comprehensive molecular profile of a tumour specimen is certainly useful, especially when combined with microscopic examination, and might be the way forward as medical treatments become ever-more personalized to the characteristics of an individual’s tumour. However, for now, histology remains indispensable for disease classification because the standard approaches for specimen preservation and examination by microscopy offer the most accessible and universal entry point in the routine diagnostic workflow used in clinical laboratories worldwide. A disease can manifest itself in both molecular and cellular changes; therefore, an approach that integrates both molecular analysis and visual inspection might strengthen diagnostic capabilities.

Routine and widespread use of the platform developed by Capper et al. might not be practical for many laboratories at present, so the most likely immediate application of this technology would be in assessing cases with ambiguous histological characteristics. Nevertheless, Capper and colleagues’ approach complements, extends and, in some cases, supersedes the tumour-diagnostic potential of microscopic examination.

Nature 555, 446–447 (2018)

doi: 10.1038/d41586-018-02881-7

1. Capper, D. et al. Nature 555, 469–474 (2018).
2. Bailey, P. & Cushing, H. A Classification of the Tumors of the Glioma Group on a Histo-Genetic Basis with a Correlated Study of Prognosis (Lippincott, 1926).
3. Cancer Genome Atlas Research Network. N. Engl. J. Med. 372, 2481–2498 (2015).
4. Eckel-Passow, J. E. et al. N. Engl. J. Med. 372, 2499–2508 (2015).
5. Sturm, D. et al. Cell 164, 1060–1072 (2016).
6. Yan, H. et al. N. Engl. J. Med. 360, 765–773 (2009).
7. Sturm, D. et al. Cancer Cell 22, 425–437 (2012).
8. Louis, D. N., Ohgaki, H., Wiestler, O. D. & Cavenee, W. K. (eds) WHO Classification of Tumours of the Central Nervous System 4th edn (International Agency For Research on Cancer, 2016).
9. Aldape, K., Nejad, R., Louis, D. N. & Zadeh, G. Neuro Oncol. 19, 336–344 (2017).
10. Kleppe, A. et al. Lancet Oncol. 19, 356–369 (2018).
11. Ehteshami Bejnordi, B. et al. J. Am. Med. Assoc. 318, 2199–2210 (2017).
12. Turcan, S. et al. Nature 483, 479–483 (2012).
13. Schwartzentruber, J. et al. Nature 482, 226–231 (2012).
14. Wiestler, B. et al. Acta Neuropathol. 128, 561–571 (2014).
15. Sahm, F. et al. Lancet Oncol. 18, 682–694 (2017).
16. Korshunov, A. et al. Acta Neuropathol. 134, 965–967 (2017).
