Intraoperative diagnosis is essential for providing safe and effective care during cancer surgery[1]. The existing workflow for intraoperative diagnosis, based on hematoxylin and eosin staining of processed tissue, is time-, resource- and labor-intensive[2,3]. Moreover, interpretation of intraoperative histologic images depends on a contracting, unevenly distributed pathology workforce[4]. In the present study, we report a parallel workflow that combines stimulated Raman histology (SRH)[5,6,7], a label-free optical imaging method, and deep convolutional neural networks (CNNs) to predict diagnosis automatically at the bedside in near real time. Specifically, our CNNs, trained on over 2.5 million SRH images, predict brain tumor diagnosis in the operating room in under 150 s, an order of magnitude faster than conventional techniques (for example, 20–30 min)[2]. In a multicenter, prospective clinical trial (n = 278), we demonstrated that CNN-based diagnosis of SRH images was noninferior to pathologist-based interpretation of conventional histologic images (overall accuracy, 94.6% versus 93.9%). Our CNNs learned a hierarchy of recognizable histologic feature representations to classify the major histopathologic classes of brain tumors. In addition, we implemented a semantic segmentation method to identify tumor-infiltrated diagnostic regions within SRH images. These results demonstrate how intraoperative cancer diagnosis can be streamlined, creating a complementary pathway for tissue diagnosis that is independent of a traditional pathology laboratory.
A University of Michigan Institutional Review Boards protocol (no. HUM00083059) was approved for the use of human brain tumor specimens in the present study. To obtain these samples or SRH images, contact D.A.O. A code repository for network training, evaluation and visualizations is publicly available at https://github.com/toddhollon/srh_cnn.
Sullivan, R. et al. Global cancer surgery: delivering safe, affordable, and timely cancer surgery. Lancet Oncol. 16, 1193–1224 (2015).
Novis, D. A. & Zarbo, R. J. Interinstitutional comparison of frozen section turnaround time. A College of American Pathologists Q-Probes study of 32868 frozen sections in 700 hospitals. Arch. Pathol. Lab. Med. 121, 559–567 (1997).
Gal, A. A. & Cagle, P. T. The 100-year anniversary of the description of the frozen section procedure. JAMA 294, 3135–3137 (2005).
Robboy, S. J. et al. Pathologist workforce in the United States: I. Development of a predictive model to examine factors influencing supply. Arch. Pathol. Lab. Med. 137, 1723–1732 (2013).
Freudiger, C. W. et al. Label-free biomedical imaging with high sensitivity by stimulated Raman scattering microscopy. Science 322, 1857–1861 (2008).
Orringer, D. A. et al. Rapid intraoperative histology of unprocessed surgical specimens via fibre-laser-based stimulated Raman scattering microscopy. Nat. Biomed. Eng. 1, ii (2017).
Ji, M. et al. Rapid, label-free detection of brain tumors with stimulated Raman scattering microscopy. Sci. Transl. Med. 5, 201ra119 (2013).
Top 100 Lab Procedures Ranked by Service (2017); https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MedicareFeeforSvcPartsAB/Downloads/LabCHARG17.pdf?agree=yes&next=Accept
Metter, D. M., Colgan, T. J., Leung, S. T., Timmons, C. F. & Park, J. Y. Trends in the US and Canadian Pathologist Workforces From 2007 to 2017. JAMA Netw. Open 2, e194337 (2019).
Hollon, T. C. et al. Rapid intraoperative diagnosis of pediatric brain tumors using stimulated Raman histology. Cancer Res. 78, 278–289 (2018).
Louis, D. N. et al. The 2016 World Health Organization classification of tumors of the central nervous system: a summary. Acta Neuropathol. 131, 803–820 (2016).
Ji, M. et al. Detection of human brain tumor infiltration with quantitative stimulated Raman scattering microscopy. Sci. Transl. Med. 7, 309ra163 (2015).
Krizhevsky, A. et al. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems 25, 1097–1105 (Curran Associates, Inc., 2012).
Gulshan, V. et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316, 2402–2410 (2016).
Titano, J. J. et al. Automated deep-neural-network surveillance of cranial images for acute neurologic events. Nat. Med. 24, 1337–1341 (2018).
Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).
Litjens, G. et al. Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis. Sci. Rep. 6, 26286 (2016).
Coudray, N. et al. Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559–1567 (2018).
He, K., Zhang, X., Ren, S. & Sun, J. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. Proc. 2015 IEEE International Conf. Computer Vision (ICCV) 1026–1034 (IEEE Computer Society, 2015).
Szegedy, C., Ioffe, S., Vanhoucke, V. & Alemi, A. A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. AAAI 4, 12 (2017).
Topol, E. J. High-performance medicine: the convergence of human and artificial intelligence. Nat. Med. 25, 44–56 (2019).
Ostrom, Q. T. et al. CBTRUS statistical report: primary brain and other central nervous system tumors diagnosed in the United States in 2010–2014. Neuro-Oncology 19, v1–v88 (2017).
Lee, K., Lee, K., Lee, H. & Shin, J. A Simple Unified Framework for Detecting Out-of-distribution Samples and Adversarial Attacks. Proc. 32nd International Conference on Neural Information Processing Systems 7167–7177 (2018).
Erhan, D., Bengio, Y., Courville, A. & Vincent, P. Visualizing Higher-Layer Features of a Deep Network. Technical Report, Université de Montréal (2009).
Lu, F.-K. et al. Label-free neurosurgical pathology with stimulated Raman imaging. Cancer Res. 76, 3451–3462 (2016).
Kohe, S., Colmenero, I., McConville, C. & Peet, A. Immunohistochemical staining of lipid droplets with adipophilin in paraffin-embedded glioma tissue identifies an association between lipid droplets and tumour grade. J. Histol. Histopathol. 4, 4 (2017).
Chen, P.-H. C. et al. An augmented reality microscope with real-time artificial intelligence integration for cancer diagnosis. Nat. Med. 25, 1453–1457 (2019).
Viola, K. V. et al. Mohs micrographic surgery and surgical excision for nonmelanoma skin cancer treatment in the Medicare population. Arch. Dermatol. 148, 473–477 (2012).
Hoesli, R. C., Orringer, D. A., McHugh, J. B. & Spector, M. E. Coherent Raman scattering microscopy for evaluation of head and neck carcinoma. Otolaryngol. Head Neck Surg. 157, 448–453 (2017).
Carter, C. L., Allen, C. & Henson, D. E. Relation of tumor size, lymph node status, and survival in 24,740 breast cancer cases. Cancer 63, 181–187 (1989).
Ratnavelu, N. D. G. et al. Intraoperative frozen section analysis for the diagnosis of early stage ovarian cancer in suspicious pelvic masses. Cochrane Database Syst. Rev. 3, CD010360 (2016).
Sottoriva, A. et al. Intratumor heterogeneity in human glioblastoma reflects cancer evolutionary dynamics. Proc. Natl Acad. Sci. USA 110, 4009–4014 (2013).
Dammers, R. et al. Towards improving the safety and diagnostic yield of stereotactic biopsy in a single centre. Acta Neurochir. 152, 1915–1921 (2010).
Zeiler, M. D. & Fergus, R. Visualizing and Understanding Convolutional Networks. Computer Vision – ECCV 2014 818–833 (2014).
Freudiger, C. W. et al. Stimulated Raman scattering microscopy with a robust fibre laser source. Nat. Photonics 8, 153–159 (2014).
Liu, Y. et al. Detecting cancer metastases on gigapixel pathology images. arXiv [cs.CV] (2017). https://arxiv.org/abs/1703.02442
Yosinski, J., Clune, J., Bengio, Y. & Lipson, H. How transferable are features in deep neural networks? Proc. 27th International Conference on Neural Information Processing Systems 2, 3320–3328 (2014).
Abadi, M. et al. Tensorflow: a system for large-scale machine learning. OSDI 16, 265–283 (2016).
Hou, L. et al. Patch-based convolutional neural network for whole slide tissue image classification. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2424–2433 (2016).
Qin, Z. et al. How convolutional neural networks see the world: a survey of convolutional neural network visualization methods. Math. Found. Comput. 1, 149–180 (2018).
We thank T. Cichonski for manuscript editing. This work was supported by the National Institutes of Health National Cancer Institute (grant no. R01CA226527-02), Neurosurgery Research Education Fund, University of Michigan MTRAC and The Cook Family Foundation.
D.A.O. is an advisor and shareholder of Invenio Imaging, Inc., a company developing SRH microscopy systems. C.W.F., Z.U.F. and J.T. are employees and shareholders of Invenio Imaging, Inc.
Peer review information B. Benedetti and J. Carmona were the primary editors on this article, and managed its editorial process and peer review in collaboration with the rest of the editorial team.
a,b, Class distribution of the training (a) and validation (b) set images, shown as numbers of patches and patients. Class imbalance results from the different incidence rates of human central nervous system tumors. The training set contains over 50 patients for each of the five most common tumor types (malignant gliomas, meningioma, metastasis, pituitary adenoma, and diffuse lower grade gliomas). To maximize the number of training images, no medulloblastoma or pilocytic astrocytoma cases were included in the validation set, and oversampling was used to augment the underrepresented classes during CNN training. c, Training and validation categorical cross-entropy loss and patch-level accuracy are plotted for the training session that yielded the model used in our prospective clinical trial. Training accuracy converges to nearly 100%, with a peak validation accuracy of 86.4% after epoch 8. The training procedure was repeated 10 times with similar accuracy and cross-entropy convergence. Additional training did not improve validation accuracy, and the early stopping criteria were met.
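The caption does not say how oversampling was performed. The sketch below assumes simple random oversampling with replacement, duplicating minority-class patches until every class matches the largest one; `oversample` is a hypothetical helper, not code from the authors' repository.

```python
import random
from collections import defaultdict


def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples (with replacement) until every
    class has as many samples as the largest class. Returns shuffled (sample, label) pairs."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        # Draw the shortfall with replacement from this class's existing samples.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((sample, label) for sample in items + extra)
    rng.shuffle(balanced)
    return balanced


# Toy example: 5 "meningioma" patches versus 2 "medulloblastoma" patches.
pairs = oversample(["m1", "m2", "m3", "m4", "m5", "b1", "b2"],
                   ["meningioma"] * 5 + ["medulloblastoma"] * 2)
```

Duplicating samples keeps the loss function unchanged; a class-weighted loss is a common alternative with a similar effect.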
Extended Data Fig. 2 A taxonomy of intraoperative SRH diagnostic classes to inform intraoperative decision-making.
a, Representative SRH images from each of the 13 diagnostic classes. Both diffuse astrocytoma and oligodendroglioma are shown as examples of diffuse lower grade gliomas. Classic histologic features (e.g., piloid processes in pilocytic astrocytoma, whorls in meningioma, and microvascular proliferation in glioblastoma) can be appreciated, in addition to features unique to SRH images (e.g., axons in gliomas and normal brain tissue). Scale bar, 50 μm. b, The taxonomy of diagnostic classes was selected specifically to inform intraoperative decision-making, rather than to match the WHO classification. Essential intraoperative distinctions, such as tumoral versus nontumoral tissue or surgical versus nonsurgical tumors, allow for safer and more effective surgical treatment. Inference node probabilities inform these distinctions by providing a coarser classification with potentially higher accuracy, because each inference node pools the probabilities of its daughter nodes[16]: the probability of any inference node is the sum of all of its daughter node probabilities.
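The summation rule for inference nodes can be sketched recursively. The tree below is a hypothetical, illustrative grouping built from class names mentioned in this article (it is not the published taxonomy and lists only 12 leaves); `TAXONOMY` and `node_probability` are names introduced here for illustration.

```python
# Hypothetical, illustrative taxonomy: this grouping is an assumption,
# not the tree used in the study.
TAXONOMY = {
    "tissue": ["tumor", "nontumor", "nondiagnostic"],
    "tumor": ["glial tumor", "meningioma", "metastasis", "pituitary adenoma",
              "lymphoma", "medulloblastoma"],
    "glial tumor": ["malignant glioma", "diffuse lower grade glioma",
                    "pilocytic astrocytoma"],
    "nontumor": ["gray matter", "white matter", "gliosis"],
}


def node_probability(node, leaf_probs, tree=TAXONOMY):
    """Probability of a node: the leaf probability, or the sum over its daughter nodes."""
    children = tree.get(node)
    if children is None:  # leaf node: one CNN output class
        return leaf_probs.get(node, 0.0)
    return sum(node_probability(child, leaf_probs, tree) for child in children)


# The two glioma leaves are nearly tied, but the coarser "glial tumor" node is confident.
leaf_probs = {"malignant glioma": 0.45, "diffuse lower grade glioma": 0.40,
              "metastasis": 0.05, "gray matter": 0.10}
print(round(node_probability("glial tumor", leaf_probs), 2))  # 0.85
print(round(node_probability("tumor", leaf_probs), 2))        # 0.9
```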
A patch-based classifier that uses high-magnification, high-resolution images for diagnosis requires a method to aggregate patch-level predictions into a single intraoperative diagnosis. Our inference algorithm performs a feedforward pass on each patch from a patient, filters out the nondiagnostic patches (line 12) and stores the output softmax vectors in an N × 13 array, where N is the number of retained patches. Each column of the array, corresponding to one class, is summed, and the result is renormalized (line 22) to produce a probability distribution. We then apply a thresholding procedure: if more than 90% of the probability mass is assigned to the nontumor/normal classes, that distribution is returned unchanged. Otherwise, the normal/nontumor class probabilities (gray matter, white matter and gliosis) are set to zero (line 31), and the distribution is renormalized and returned. This algorithm leverages the observation that normal brain and nondiagnostic tissue imaged using SRH have similar features across patients, resulting in high patch-level classification accuracy. Using the expected value of the renormalized patient-level probability distribution as the intraoperative diagnosis eliminates the need to train an additional classifier on patch predictions.
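The aggregation steps above can be sketched in plain Python. This is a minimal sketch, not the authors' code: it assumes a patch counts as nondiagnostic when its most probable class is the nondiagnostic class, and `patient_distribution` and its parameters are hypothetical names. The line numbers cited in the text refer to the authors' algorithm listing, which is not reproduced here.

```python
def patient_distribution(patch_probs, normal_idx, nondiag_idx, threshold=0.9):
    """Aggregate per-patch softmax vectors into one patient-level distribution.

    patch_probs: list of softmax vectors (one per patch, same class order).
    normal_idx:  set of indices of the normal/nontumor classes.
    nondiag_idx: index of the nondiagnostic class.
    """
    # Filter: drop patches whose most probable class is nondiagnostic (assumed rule).
    kept = [p for p in patch_probs
            if max(range(len(p)), key=p.__getitem__) != nondiag_idx]
    if not kept:
        raise ValueError("all patches were nondiagnostic")
    # Sum each class column over the retained patches, then renormalize.
    totals = [sum(col) for col in zip(*kept)]
    z = sum(totals)
    dist = [t / z for t in totals]
    # Mostly normal tissue: return the distribution unchanged.
    if sum(dist[i] for i in normal_idx) > threshold:
        return dist
    # Otherwise zero out the normal/nontumor classes and renormalize.
    dist = [0.0 if i in normal_idx else d for i, d in enumerate(dist)]
    z = sum(dist)
    return [d / z for d in dist]


# Toy 4-class example: [tumor A, tumor B, normal, nondiagnostic].
patches = [
    [0.60, 0.10, 0.20, 0.10],
    [0.20, 0.20, 0.50, 0.10],
    [0.05, 0.05, 0.10, 0.80],  # filtered out: most probable class is nondiagnostic
]
dist = patient_distribution(patches, normal_idx={2}, nondiag_idx=3)
print([round(d, 3) for d in dist])  # [0.615, 0.231, 0.0, 0.154]
```

Because only the normal/nontumor classes are zeroed, the residual nondiagnostic mass from retained patches stays in the returned distribution.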
a, Minimum sample size was calculated under the assumptions that pathologists’ multiclass diagnostic accuracy ranges from 93% to 97%, based on our previous experiments[6], and that an accuracy lower bound below 91% would be clinically significant. We therefore selected an expected accuracy of 96% and an equivalence/noninferiority limit (delta) of 5%, yielding a noninferiority threshold accuracy of 91% or greater. Using an alpha of 0.05 and a power of 0.9 (beta = 0.1), the minimum sample size was 264 patients (black point). b, Flowchart of specimen processing in the control and experimental arms. c, A total of 302 patients met inclusion criteria and were enrolled for intraoperative SRH imaging. Eleven patients were excluded at the time of surgery because their specimens were below the quality necessary for SRH imaging. A total of 291 patients were imaged intraoperatively, and 13 were subsequently excluded on the basis of a Mahalanobis distance-based confidence score (see Extended Data Fig. 5), leaving 278 patients in the analysis. d, Meningioma, pituitary adenoma, and malignant glioma were the most common diagnoses in our prospective cohort. The University of Michigan, University of Miami, and Columbia University recruited 55.0%, 26.6%, and 18.4% of the total patients, respectively.
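The caption gives the inputs to the sample-size calculation but not the formula. One standard normal-approximation formula for noninferiority of two proportions, n = (z₁₋α + z₁₋β)² · 2p(1 − p) / δ², reproduces the stated minimum of 264 with p = 0.96, δ = 0.05, one-sided α = 0.05 and power 0.9. We assume, but the text does not confirm, that this is the calculation used; `noninferiority_n` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist


def noninferiority_n(p, delta, alpha=0.05, power=0.9):
    """Normal-approximation sample size for a noninferiority comparison of two
    proportions with equal expected accuracy p and margin delta (one-sided alpha)."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    n = (z(1 - alpha) + z(power)) ** 2 * 2 * p * (1 - p) / delta ** 2
    return ceil(n)  # round up to whole patients


print(noninferiority_n(0.96, 0.05))  # 264
```

The unrounded value is about 263.1, which rounds up to 264.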
a, Pairwise comparison and b, principal component analysis of the class-conditional, Mahalanobis distance-based confidence score for each layer output included in the ensemble. The confidence scores from the mid- and high-level hidden features are correlated, demonstrating that out-of-distribution samples result in greater Mahalanobis distances throughout the network. As previously described and observed in our results, out-of-distribution samples (i.e., rare tumors) are better detected in the representation space of deep neural networks than in the “label-overfitted” output space of the softmax layer[23]. c, Specimen-level predictions (black hashes, n = 478) and kernel density estimates from the trained LDA classifier for all specimens imaged during the trial period, projected onto the linear discriminant axis. Trial and rare tumor cases were linearly separable, so all 13 rare tumor cases imaged during the trial period were correctly identified. d, SRH mosaics of rare tumors imaged during the trial period. Germinoma shows classic large, round neoplastic cells with abundant cytoplasm and fibrovascular septae with mature lymphocytic infiltrate. Choroid plexus papilloma shows fibrovascular cores lined with columnar cuboidal epithelium. Papillary craniopharyngioma has fibrovascular cores with well-differentiated, monotonous squamous epithelium. Clival chordoma has unique bubbly cytoplasm (i.e., physaliferous cells). Scale bar, 50 μm.
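A class-conditional Mahalanobis confidence score of the kind cited[23] fits one Gaussian per class with a shared (tied) covariance to in-distribution features from one layer, then scores a new feature vector by its negative squared Mahalanobis distance to the closest class mean. The sketch below assumes that formulation; it omits the input preprocessing used in that reference and the per-layer LDA combination described in the caption, and the function names and toy data are hypothetical.

```python
import numpy as np


def fit_tied_gaussians(features, labels):
    """Class means and the inverse of a shared (tied) covariance matrix,
    estimated from in-distribution training features of one layer."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Center each sample on its own class mean, then pool across classes.
    centered = features - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(features)
    return means, np.linalg.pinv(cov)


def mahalanobis_confidence(x, means, precision):
    """Negative squared Mahalanobis distance to the closest class mean.
    Higher values mean the sample looks more in-distribution."""
    diffs = means - x  # shape (n_classes, n_features)
    sq_dists = np.einsum("cd,de,ce->c", diffs, precision, diffs)
    return -sq_dists.min()


# Toy 2-D "features" for two in-distribution classes.
rng = np.random.default_rng(0)
train = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(5.0, 1.0, (200, 2))])
labels = np.repeat([0, 1], 200)
means, precision = fit_tied_gaussians(train, labels)

in_dist = mahalanobis_confidence(np.array([0.2, -0.1]), means, precision)
out_dist = mahalanobis_confidence(np.array([20.0, -20.0]), means, precision)
```

Computing one such score per hidden layer yields the per-layer confidence values that the caption describes combining in an ensemble.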
a, True class probability and intersection over union (IOU) values for each prospective clinical trial patient incorrectly classified by the pathologists. All 17 were correctly classified using SRH plus CNN. All incorrectly classified cases underwent secondary review by two board-certified neuropathologists (S.C.P., P.C.) to ensure that the specimens (1) were of sufficient quality to make a diagnosis and (2) contained tumor tissue. b, SRH mosaic from patient 21 (glioblastoma, WHO grade IV). The pathologist’s classification was metastatic carcinoma; however, the CNN metastasis heatmap does not show high probability. The malignant glioma probability heatmap shows high probability over the majority of the SRH mosaic, with a patient-level malignant glioma probability of 73.4%. High-magnification views show regions of hypercellularity due to tumor infiltration of brain parenchyma, with damaged axons, activated lipid-laden microglia, mitotic figures, and multinucleated cells. c, SRH mosaic from patient 52, diagnosed with diffuse large B-cell lymphoma, which the pathologist classified as metastatic carcinoma. Although the CNN identified patchy areas of metastatic features within the specimen, the majority of the image was correctly classified as lymphoma. High-magnification views show atypical lymphoid cells with macrophage infiltration. Regions with large neoplastic cells share cytologic features with metastatic brain tumors, as shown in Fig. 3. Scale bar, 50 μm.
Extended Data Fig. 7 Activation maximization to elucidate SRH feature extraction using Inception-ResNet-v2.
a, Schematic diagram of Inception-ResNet-v2, with repeated residual blocks compressed. Residual connections and increased depth resulted in better overall performance than previous Inception architectures. b, To elucidate the feature representations learned by training the CNN on SRH images, we used activation maximization[24]. Images that maximally activate the specified filters from the 159th convolutional layer are shown as a time series of gradient ascent iterations. A stable and qualitatively interpretable image results after 500 iterations, both for the CNN trained on SRH images and for the CNN trained on ImageNet images. The same set of filters from the CNN trained on ImageNet is shown to provide a direct comparison of the trained feature extractors for SRH versus natural image classification. c, Activation maximization images for filters from the 5th, 10th, and 159th convolutional layers of CNNs trained using SRH images only, SRH images after pretraining on ImageNet images, and ImageNet images only. The resulting activation maximization images for the ImageNet dataset are qualitatively similar to those found in previous publications using similar methods[34]. The CNN trained using only SRH images achieved classification accuracy similar to that of the pretrained CNN, and its activation maximization images are more interpretable than those generated by a network pretrained on ImageNet weights.
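Activation maximization performs gradient ascent on the input image, starting from noise, to increase one unit's response while a regularizer keeps the image bounded. The toy sketch below assumes a single linear unit (one 3 × 3 filter at one position) with an L2 penalty, so the gradient can be written by hand. Filters deep in Inception-ResNet-v2 (e.g., the 159th convolutional layer) would instead require automatic differentiation through the trained network and usually additional regularizers. The filter and all names here are hypothetical.

```python
import numpy as np


def unit_objective(x, w, pos, l2):
    """Linear response of one unit (filter w at position pos) minus an L2 penalty."""
    r, c = pos
    kh, kw = w.shape
    return float(np.sum(x[r:r + kh, c:c + kw] * w) - l2 * np.sum(x ** 2))


def activation_maximization(w, shape=(9, 9), pos=(3, 3), steps=500, lr=1.0, l2=0.05, seed=0):
    """Gradient ascent on the input image, starting from small random noise."""
    rng = np.random.default_rng(seed)
    x = 0.01 * rng.standard_normal(shape)
    r, c = pos
    kh, kw = w.shape
    history = [unit_objective(x, w, pos, l2)]
    for _ in range(steps):
        grad = -2.0 * l2 * x           # gradient of the L2 penalty
        grad[r:r + kh, c:c + kw] += w  # gradient of the unit's linear response
        x = x + lr * grad
        history.append(unit_objective(x, w, pos, l2))
    return x, history


# Hypothetical 3 x 3 vertical-edge filter standing in for a learned filter.
edge_filter = np.array([[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]])
image, history = activation_maximization(edge_filter)
# For a single linear unit, the optimum is w / (2 * l2) inside the receptive field
# and zero elsewhere: the input that maximally activates it is the filter itself.
```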
Extended Data Fig. 8 t-SNE plot of internal CNN feature representations for clinical trial patients.
We used the 1536-dimensional feature vector from the final hidden layer of the Inception-ResNet-v2 network to determine how individual patches and patients are represented by the CNN, using t-distributed stochastic neighbor embedding (t-SNE), an unsupervised dimensionality reduction method for visualizing high-dimensional data. a, One hundred representative patches from each trial patient (n = 278) were sampled for t-SNE and are shown as small, semi-transparent points. Each trial patient is plotted as a large point located at the mean position of their patches. Recognizable clusters form that correspond to individual diagnostic classes, indicating that tumor types have similar internal CNN representations. b, Gray and white matter form clusters that are separable from tumoral tissue and also from each other: lipid-laden myelin in white matter has SRH features markedly different from those of gray matter, which has axons and glial cells in a neuropil background. c, Diagnostic classes that share cytologic and histoarchitectural features form neighboring clusters, such as malignant glioma, pilocytic astrocytoma, and diffuse lower grade glioma (i.e., glial tumors). Lymphoma and medulloblastoma are adjacent and share features of hypercellularity, high nuclear:cytoplasmic ratios, and little to no glial background in dense tumor.
a, A 1000 × 1000-pixel SRH image with the corresponding grid of probability heatmap pixels that results from using a 300 × 300-pixel sliding window with a 100-pixel step size in both the horizontal and vertical directions. Scale bar, 50 μm. b, An advantage of this method is that most heatmap pixels are contained within multiple image patches, and the probability distribution assigned to each heatmap pixel is a renormalized sum of the overlapping patch predictions. This pools the local prediction probabilities and generates a smoother prediction heatmap. c, In this example, each pixel of the inner 6 × 6 grid has 9 overlapping patches from which its probability distribution is determined. d, An SRH image of a meningioma (WHO grade I) from our prospective trial, shown as an example. Scale bar, 50 μm. e, The meningioma probability heatmap after bicubic interpolation to scale it to the original image size. The nondiagnostic prediction and ground truth for the same SRH mosaic are also shown. f, Semantic segmentation results for the full prospective cohort (n = 278). The upper plot shows the mean and standard deviation of the IOU (averaged over the SRH mosaics from each patient) for the ground truth class (i.e., output class). Note that the more homogeneous or monotonous histologic classes (e.g., pituitary adenoma, white matter, diffuse lower grade gliomas) had higher IOU values than heterogeneous classes (e.g., malignant glioma, pilocytic astrocytoma). The lower plot shows the mean and standard deviation of the inference class IOU (i.e., either the tumor or normal inference class) for each trial patient. The mean normal inference class IOU for the full prospective cohort was 91.1 ± 10.8, and the mean tumor inference class IOU was 86.4 ± 19.0.
g, As expected, mean ground truth class IOU values for the prospective patient cohort (n = 278) were correlated with patient-level true class probability (Pearson correlation coefficient, 0.811).
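The sliding-window pooling and IOU computation can be sketched with NumPy. This sketch assumes that the window size is a multiple of the step, so each heatmap pixel covers one step × step region; it skips the bicubic upsampling and computes IOU at heatmap resolution against a hypothetical ground-truth mask. `pooled_heatmap` and `iou` are illustrative names. With the caption's settings (1000-pixel image, 300-pixel window, 100-pixel step), the sketch yields 8 × 8 windows, a 10 × 10 heatmap, and 9 overlapping windows for each pixel of the inner 6 × 6 grid.

```python
import numpy as np


def pooled_heatmap(window_probs, window=300, step=100):
    """Pool overlapping sliding-window predictions into a coarse heatmap.

    window_probs: array (n_rows, n_cols, n_classes); entry [r, c] is the softmax
    output for the window whose top-left corner is at (r * step, c * step).
    Assumes window is a multiple of step, so each heatmap pixel is step x step.
    """
    n_rows, n_cols, n_classes = window_probs.shape
    k = window // step  # heatmap pixels spanned by one window along each axis
    heat = np.zeros((n_rows + k - 1, n_cols + k - 1, n_classes))
    counts = np.zeros(heat.shape[:2], dtype=int)
    for r in range(n_rows):
        for c in range(n_cols):
            heat[r:r + k, c:c + k] += window_probs[r, c]
            counts[r:r + k, c:c + k] += 1
    # Renormalize the summed predictions into one distribution per heatmap pixel.
    heat /= heat.sum(axis=-1, keepdims=True)
    return heat, counts


def iou(pred_mask, true_mask):
    """Intersection over union of two boolean masks (1.0 if both are empty)."""
    inter = np.logical_and(pred_mask, true_mask).sum()
    union = np.logical_or(pred_mask, true_mask).sum()
    return float(inter / union) if union else 1.0


# 1000 x 1000 image, 300-pixel window, 100-pixel step -> 8 x 8 windows, 10 x 10 heatmap.
rng = np.random.default_rng(0)
window_probs = rng.dirichlet(np.ones(3), size=(8, 8))
heat, counts = pooled_heatmap(window_probs)

pred_mask = heat.argmax(axis=-1) == 0        # heatmap pixels predicted as class 0
true_mask = np.zeros((10, 10), dtype=bool)   # hypothetical ground-truth annotation
true_mask[:, :5] = True
print(round(iou(pred_mask, true_mask), 2))
```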
a, Full SRH mosaic of a specimen collected at the brain–tumor margin of a patient with a metastatic brain tumor (non-small cell lung adenocarcinoma). b, Metastatic rests with glandular formation are dispersed among gliotic brain with normal neuropil. c, A three-channel RGB transparency of the CNN predictions is overlaid on the SRH image for intraoperative pathologist review. d, Associated patient-level diagnostic class probabilities. e, Class probability heatmaps for metastatic brain tumor (IOU 0.51), nontumor (IOU 0.86), and nondiagnostic (IOU 0.93) regions within the SRH image, shown with the ground truth segmentation. Scale bar, 50 μm.
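A three-channel prediction overlay can be produced by alpha blending. The sketch below assumes that each RGB channel carries one class probability map, mapped as (tumor, nontumor, nondiagnostic) to (red, green, blue), and that the SRH image is a single grayscale channel already at heatmap size. The study's channel assignment, SRH coloring and blending weight are not stated, so `rgb_overlay` and `alpha` are hypothetical.

```python
import numpy as np


def rgb_overlay(srh_gray, class_probs, alpha=0.4):
    """Alpha-blend a 3-channel class-probability map onto a grayscale image.

    srh_gray:    (H, W) intensities in [0, 1].
    class_probs: (H, W, 3) probabilities in [0, 1]; channels assumed to be
                 (tumor, nontumor, nondiagnostic) -> (red, green, blue).
    """
    base = np.repeat(srh_gray[..., np.newaxis], 3, axis=-1)  # gray -> RGB
    return (1.0 - alpha) * base + alpha * class_probs


rng = np.random.default_rng(0)
image = rng.random((4, 4))                       # stand-in for an SRH image
probs = rng.dirichlet(np.ones(3), size=(4, 4))   # stand-in for upsampled heatmaps
overlay = rgb_overlay(image, probs)
```

In practice the probability heatmaps would first be upsampled to the image size (the heatmap caption describes bicubic interpolation) before blending.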
Supplementary Fig. 1.
Supplementary Table 1. Diagnostic information for clinical trial patients.
Intraoperative video of clinical SRH and automated diagnosis using CNN. https://youtu.be/UZZ08_fC7UU or https://umich.box.com/s/01lhqicadan63unwoha4iqfi47jb1o3c. The video shows the automated tissue-to-diagnosis pipeline described in Fig. 1 and used in the clinical trial. Each of the three steps—(1) image acquisition, (2) image processing and (3) diagnostic prediction—is labeled on screen. At the time of surgery, the CNN-predicted diagnosis was diffuse lower-grade glioma (unnormalized probability, 30.6% (shown in video); renormalized probability, 83.0%). Conventional intraoperative H&E diagnosis was ‘atypical glial cells, favor glioma’ and final histopathologic diagnosis was diffuse glioma, WHO grade II.
About this article
Cite this article
Hollon, T.C., Pandian, B., Adapa, A.R. et al. Near real-time intraoperative brain tumor diagnosis using stimulated Raman histology and deep neural networks. Nat Med 26, 52–58 (2020). https://doi.org/10.1038/s41591-019-0715-9