The breast stromal microenvironment is a pivotal factor in breast cancer development, growth and metastases. Although pathologists often detect morphologic changes in stroma by light microscopy, visual classification of such changes is subjective and non-quantitative, limiting its diagnostic utility. To gain insights into stromal changes associated with breast cancer, we applied automated machine learning techniques to digital images of 2387 hematoxylin and eosin stained tissue sections of benign and malignant image-guided breast biopsies performed to investigate mammographic abnormalities among 882 patients, ages 40–65 years, that were enrolled in the Breast Radiology Evaluation and Study of Tissues (BREAST) Stamp Project. Using deep convolutional neural networks, we trained an algorithm to discriminate between stroma surrounding invasive cancer and stroma from benign biopsies. In test sets (928 whole-slide images from 330 patients), this algorithm could distinguish biopsies diagnosed as invasive cancer from benign biopsies solely based on the stromal characteristics (area under the receiver operator characteristics curve = 0.962). Furthermore, without being trained specifically using ductal carcinoma in situ as an outcome, the algorithm detected tumor-associated stroma in greater amounts and at larger distances from grade 3 versus grade 1 ductal carcinoma in situ. Collectively, these results suggest that algorithms based on deep convolutional neural networks that evaluate only stroma may prove useful to classify breast biopsies and aid in understanding and evaluating the biology of breast lesions.
The diagnostic classification of benign breast diseases, putative breast cancer precursors, and breast cancer is based largely on the histopathological appearance and molecular characteristics of epithelial cells. Although the appearance of breast stroma contributes to pathologists’ diagnostic impressions, including recognition of invasion, these subjective assessments have not been formally classified. Given that the tumor microenvironment is important in tumor growth, angiogenesis, and metastasis [2,3,4], and that stromal-epithelial interactions [5,6,7,8] contribute to progression of ductal carcinoma in situ to invasive breast cancer, we hypothesize that morphologic analysis of stroma could have importance in understanding breast carcinogenesis and diagnosis. This view is supported by evidence that the transition from ductal carcinoma in situ to invasion is characterized by greater changes in gene expression of stromal cells than in epithelial tumor cells [9, 10].
Apart from evaluation of lymphoid infiltrates, which are a diagnostic feature of medullary carcinoma and can be graded, stromal alterations are often subtle and difficult to characterize and quantify by light microscopy alone. Emerging data suggest that automated pattern recognition systems could be used to characterize stromal changes. For example, in a computer-generated automated analysis of routinely prepared hematoxylin and eosin (H&E) stained breast cancer tissue sections, Beck et al. reported that stromal features were associated with breast cancer survival, and were more predictive of prognosis than epithelial features. Development of an automated computerized tool to identify and characterize tumor-associated stroma could have utility in pathologic diagnosis with respect to evaluating tumor margins and cancer field effects or in predicting the potential of ductal carcinoma in situ to progress to invasion, if occult neoplastic cells persist after treatment.
Development of robust computerized algorithms for discriminating patterns of normal stroma and tumor-associated stroma in histopathology images is a complex task, partly because validated morphologic criteria for distinguishing tumor-associated stroma are undefined. Machine learning approaches, and more specifically deep learning algorithms, could prove very suitable for accomplishing this objective as they are capable of learning the most discriminative features directly from a large set of classified diagnostic images, and therefore, do not require pre-defined morphologic criteria [13,14,15]. Thus, in a study of women who underwent image-guided breast biopsy to investigate a radiologic abnormality, we aimed to develop a clinically applicable algorithm reflecting the nature of breast biopsy specimens that pathologists receive for diagnosis. The objectives of the present study were: (1) to generate a deep learning algorithm that can identify and distinguish tumor-associated stromal alterations from stroma associated with benign breast disease in H&E stained sections of breast biopsies; and (2) to apply the deep learning algorithm to assess stromal characteristics in varying grades of ductal carcinoma in situ, which may represent a proxy for risk of invasion.
Materials and methods
This analysis included 882 women, ages 40–65 years, referred for diagnostic image-guided breast biopsies (including ultrasound-guided needle core biopsy and stereotactic vacuum-assisted biopsy), and who participated in the Breast Radiology Evaluation and Study of Tissues (BREAST) Stamp Project undertaken between 2007 and 2010 at the University of Vermont Larner College of Medicine and the University of Vermont Medical Center. Women provided informed consent, which included access to medical records, self-reported breast cancer risk information, blood and saliva donations, access to radiological images and pathological tissues for research and follow-up. The study was approved by appropriate ethics review boards at the University of Vermont and at the National Cancer Institute (National Institutes of Health).
Breast biopsies were performed as ultrasound-guided core needle biopsies (14-gauge) or as stereotactically-guided vacuum-assisted biopsies (9-gauge) that were routinely fixed in formalin, prepared as paraffin-embedded tissue sections, and stained with H&E for diagnosis. For study purposes, biopsies were classified as non-proliferative benign breast disease, proliferative benign breast disease without atypia, atypical hyperplasia, ductal or lobular carcinoma in situ or invasive carcinoma. When biopsies included multiple tissue blocks, reflecting target and surrounding non-target tissues, we attempted to collect sections from both types of blocks, yielding a total of 2387 H&E stained sections that were scanned at 20× (Aperio, ScanScope CS or Hamamatsu) as digital images (resulting specimen level pixel size 0.455 µm × 0.455 µm).
We manually annotated tissue structures to train our deep learning algorithms. We annotated breast tissue components such as stroma, epithelium, and fat, and also stromal regions in benign biopsies and adjacent to invasive cancer in the whole-slide images. Our analysis focused on the extracellular matrix stroma rather than focusing on specific features of stromal composition; thus, we analyzed all stromal areas in specific diagnostic contexts (benign, ductal carcinoma in situ and cancer) and in topographical proximity to diagnostic lesions. To analyze the pattern of stroma surrounding ductal carcinoma in situ lesions, the whole-slide images containing only ductal carcinoma in situ, and whole-slide images containing ductal carcinoma in situ with concurrent invasive cancer were annotated by a pathologist (MES). For each case, a subset of ducts containing ductal carcinoma in situ lesions was annotated on whole-slide images with point annotations in the center of the lesion and graded using standard criteria based on nuclear size and appearance, mitoses and detection of necrosis. Ductal carcinoma in situ lesions in slides with concurrent invasive cancer were annotated if they were peripheral to the invasive component and its associated stroma.
Deep learning algorithms
Deep learning is a subfield of machine learning, where very general algorithms learn features directly from data for prediction and classification. Our whole-slide image classification system is based on multiple deep convolutional neural networks. To enable an unbiased assessment of the performance of our algorithm, the dataset was randomly split into a training set containing 62% of the whole-slide images (1459 whole-slide images from 552 patients) and a testing set with the remaining slides (928 whole-slide images from 330 patients; Table 1).
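The split described above can be illustrated with a minimal sketch. It assumes that randomization was done at the patient level, so that all slides from one patient land in the same set; the text reports both slide and patient counts but does not state this explicitly. The function name `split_by_patient`, the `slide_to_patient` mapping, and the seed are hypothetical.

```python
import random


def split_by_patient(slide_to_patient, train_fraction=0.62, seed=0):
    """Randomly assign whole patients (with all of their slides) to train or test.

    slide_to_patient: dict mapping a slide id to its patient id (hypothetical layout).
    train_fraction: fraction of patients placed in the training set (assumed value).
    """
    patients = sorted(set(slide_to_patient.values()))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_train = round(train_fraction * len(patients))
    train_patients = set(patients[:n_train])
    # Every slide follows its patient, so no patient contributes to both sets.
    train = [s for s, p in slide_to_patient.items() if p in train_patients]
    test = [s for s, p in slide_to_patient.items() if p not in train_patients]
    return train, test
```

Keeping each patient on one side of the split avoids leaking patient-specific staining or tissue characteristics from training into testing.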
Briefly, using representative input patches from the annotated areas, a convolutional neural network model denoted “Network I” was trained, using the approach we previously described, to classify fat, stroma, and epithelium. Next, a second model, “Network II”, was trained to operate on stromal regions recognized by Network I. Network II generated a probability that an image represented cancer-associated stroma. This model was trained using manually identified regions of stroma adjacent to invasive cancer and stroma in whole-slide images not containing tumor. Examples of ductal carcinoma in situ-associated stroma were not used in the training phase. To classify whole-slide images into normal/benign vs invasive cancer, a third model, “Network III”, was constructed, consisting of a small convolutional neural network stacked on top of Network II. Network III was trained to generate a score for the entire whole-slide image indicating the probability that the slide contained invasive cancer. More details about this network are described below. Figure 1 shows an overview of the entire classification system.
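The three-stage cascade can be summarized in a short structural sketch. This is not the authors' implementation: the three networks are passed in as plain callables, and the names `classify_slide`, `tissue_net`, `stroma_net`, and `slide_net` are hypothetical stand-ins for Networks I, II, and III.

```python
def classify_slide(patches, tissue_net, stroma_net, slide_net, k=8):
    """Chain the three stages for one whole-slide image.

    tissue_net(patch) -> "fat", "stroma" or "epithelium"   (stand-in for Network I)
    stroma_net(patch) -> (tumor_stroma_probability, features) (stand-in for Network II)
    slide_net(list_of_features) -> slide-level cancer score   (stand-in for Network III)
    """
    # Stage 1: keep only the patches that the tissue classifier labels as stroma.
    stroma_patches = [p for p in patches if tissue_net(p) == "stroma"]
    # Stage 2: score each stromal patch and keep its feature representation.
    scored = [stroma_net(p) for p in stroma_patches]
    # Keep the k patches with the highest tumor-associated stroma probability.
    top = sorted(scored, key=lambda pf: pf[0], reverse=True)[:k]
    if not top:
        return None  # no stroma found, so no slide-level score can be produced
    # Stage 3: the slide-level model sees only features of the selected patches.
    return slide_net([features for _, features in top])
```

Returning `None` for slides without stroma is a design choice of this sketch; how the published system handles such slides is not described in this section.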
Network I and Network II in this study possess a VGG-Net-like architecture. VGG-Net is a neural network architecture developed by Oxford’s Visual Geometry Group (VGG), which won the object localization task of the 2014 ImageNet Large Scale Visual Recognition Challenge. Details of our network configuration and training procedures are presented in Supplementary methods sections “Convolutional neural network architecture”, “Preprocessing of whole-slide images and ground truth ROIs” and “Training procedure”.
Description of convolutional neural network III for the classification of whole-slide images into normal/benign vs invasive cancer
To illustrate the potential of stroma characterization, Network III was constructed to identify cancerous biopsies based on the output of Network II only. The output feature map of the penultimate layer of Network II (the hidden layer whose output is fed to the final classification layer) is a compact representation of the input stromal image. Network III takes as input the feature maps from Network II for eight non-overlapping stromal tissue regions (size 152.9 µm × 152.9 µm), which were identified by Network II as harboring the strongest tumor-associated alterations, and predicts the whole-slide image diagnosis (normal/benign vs invasive breast cancer; Fig. 1b). Details of the procedure for selecting those regions can be found in Supplementary methods section “Selection of candidate stromal regions as input to convolutional neural network III”.
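To make the region selection concrete, the sketch below converts the stated region size to pixels and greedily picks the highest-scoring regions that do not overlap. The actual selection procedure is described in the Supplementary methods and may differ; the greedy rule, the function names, and the assumption that regions are measured at the scanned pixel size of 0.455 µm are illustrative only.

```python
def um_to_px(size_um, pixel_um=0.455):
    """Convert a physical length to a whole number of pixels (assumed pixel size)."""
    return round(size_um / pixel_um)


def select_top_regions(candidates, size_px, k=8):
    """Greedy selection of the k highest-probability, mutually non-overlapping squares.

    candidates: list of (x, y, probability) with (x, y) the top-left corner in pixels.
    """
    chosen = []
    for x, y, prob in sorted(candidates, key=lambda c: c[2], reverse=True):
        # Two equal-sized squares overlap when both axis offsets are below the side.
        overlaps = any(
            abs(x - cx) < size_px and abs(y - cy) < size_px for cx, cy, _ in chosen
        )
        if not overlaps:
            chosen.append((x, y, prob))
        if len(chosen) == k:
            break
    return chosen
```

Under these assumptions a 152.9 µm region corresponds to `um_to_px(152.9)`, that is 336 pixels per side at 0.455 µm per pixel.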
To generate the final score for each slide representing the probability of being a cancerous biopsy, we used an ensemble of two networks comprising Network III and a modified version of Network III without the last two fully connected layers. The average probability of the two networks was taken as the final score.
We additionally compared the performance of Network III, developed in the present study, with our recently published approach for whole-slide image classification. Our previous system derived a total of 71 features from the outputs of both Network I and Network II and used these as input for a random forests classifier. These features include the global tissue amount for epithelium, stroma, and fat as well as morphological features of epithelial areas and the spatial distribution of epithelial areas in the whole-slide image derived from two region adjacency graphs: Delaunay triangulation and area-Voronoi diagram.
Classification of breast tissue whole-slide images as invasive carcinoma versus benign breast disease
The training data (1459 whole-slide images from 552 patients) used for this classification task were further divided into two sets: a preliminary set to define parameters, and a second (validation) set (comprising 10% of slides) that was used to perform final model selection and hyper-parameter optimization. The performance of our model was evaluated on the independent test set (928 whole-slide images from 330 patients) described above.
Analysis of ductal carcinoma in situ-associated stroma
In this experiment, we analyzed the stromal patterns surrounding ductal carcinoma in situ lesions on breast cancer slides. The ductal carcinoma in situ-associated stroma was analyzed using Network II which was trained to discriminate between normal and tumor-associated stroma. We first classified all the stromal pixels adjacent to annotated ductal carcinoma in situ lesions using Network II. Subsequently, we extracted two measures to quantify ductal carcinoma in situ-associated stroma. These measures are the mean and standard deviation of all tumor-associated stroma probabilities for the pixels surrounding ductal carcinoma in situ lesions. They were computed for a range of distances from the lesion’s margin. This analysis was performed independently on test slides with ductal carcinoma in situ lesions only and with test slides containing ductal carcinoma in situ accompanied by invasive cancer.
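The two stromal measures can be sketched as follows. The code assumes a per-pixel probability map from Network II, binary masks for the annotated lesion and for stroma, and a brute-force Euclidean distance to the nearest lesion pixel; the function name `rim_statistics`, the rim boundaries, and the use of the population standard deviation are assumptions made for illustration.

```python
import math
import statistics


def rim_statistics(prob, lesion, stroma, rim_edges_um, pixel_um=0.455):
    """Mean and SD of tumor-associated stroma probability in rims around a lesion.

    prob, lesion, stroma: 2D lists of equal shape (probabilities and 0/1 masks).
    rim_edges_um: increasing distances; rim i covers (edges[i], edges[i + 1]].
    Returns {rim_index: (mean, population_sd)} for rims that contain stroma.
    """
    lesion_px = [(r, c) for r, row in enumerate(lesion) for c, v in enumerate(row) if v]
    rims = {i: [] for i in range(len(rim_edges_um) - 1)}
    for r, row in enumerate(stroma):
        for c, is_stroma in enumerate(row):
            if not is_stroma or lesion[r][c]:
                continue
            # Distance from this stromal pixel to the closest lesion pixel, in µm.
            d = min(math.hypot(r - lr, c - lc) for lr, lc in lesion_px) * pixel_um
            for i in range(len(rim_edges_um) - 1):
                if rim_edges_um[i] < d <= rim_edges_um[i + 1]:
                    rims[i].append(prob[r][c])
                    break
    return {
        i: (statistics.mean(v), statistics.pstdev(v)) for i, v in rims.items() if v
    }
```

The brute-force distance search is only suitable for small examples; on real whole-slide images a distance transform would be the usual substitute.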
Statistical analysis
The area under the receiver operator characteristic curve was used to evaluate the performance of the system in discriminating between invasive carcinoma and benign breast disease biopsies. The receiver operator characteristics curve plots the sensitivity versus the false positive fraction (1 − specificity). The area under the receiver operator characteristics curve ranges from 0 to 1 (with 1.0 representing a perfect classifier and 0.5 corresponding to chance). The 95% confidence intervals for the receiver operator characteristics curves were obtained using the percentile bootstrap method.
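As an illustration of these two quantities, the sketch below computes the area under the curve from pairwise comparisons of positive and negative scores, then derives a percentile bootstrap interval by resampling slides with replacement. It is a simplified stand-in, not the authors' code: function names, the number of resamples, and the seed are assumptions, and resampling individual slides ignores that several slides can come from one patient.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve as the probability that a positive outranks a negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[i] for i in idx]
        # Skip resamples that lack one of the two classes (AUC undefined).
        if 0 < sum(resampled_labels) < n:
            stats.append(auc(resampled_labels, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

For example, `auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` evaluates to 0.75, because three of the four positive-negative pairs are ranked correctly.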
The significance test for comparing two correlated receiver operator characteristics curves, used to compare the performance of the proposed whole-slide image classification system with our previously described system, was performed using the bootstrap method in the R package “pROC”. This method is based on the approach described by Hanley and McNeil that takes into account the correlation that is induced by the paired nature of the data.
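The idea behind a paired bootstrap comparison can be sketched in Python. This is an approximation written for illustration, not the pROC source: both classifiers are evaluated on the same resampled cases so that their correlation is preserved, the observed difference in area under the curve is divided by the standard deviation of the bootstrap differences, and the result is compared with a standard normal distribution. Resampling within each class, the function name, and the handling of a zero standard deviation are assumptions.

```python
import random
from statistics import NormalDist, stdev


def paired_bootstrap_test(labels, scores_a, scores_b, n_boot=2000, seed=0):
    """Two-sided test for a difference in AUC between two classifiers on the same cases."""
    rng = random.Random(seed)
    pos_idx = [i for i, l in enumerate(labels) if l == 1]
    neg_idx = [i for i, l in enumerate(labels) if l == 0]

    def auc_for(scores, p_idx, n_idx):
        wins = sum(
            (scores[p] > scores[n]) + 0.5 * (scores[p] == scores[n])
            for p in p_idx
            for n in n_idx
        )
        return wins / (len(p_idx) * len(n_idx))

    observed = auc_for(scores_a, pos_idx, neg_idx) - auc_for(scores_b, pos_idx, neg_idx)
    diffs = []
    for _ in range(n_boot):
        # The same resampled cases are used for both classifiers (paired design).
        p = [rng.choice(pos_idx) for _ in pos_idx]
        n = [rng.choice(neg_idx) for _ in neg_idx]
        diffs.append(auc_for(scores_a, p, n) - auc_for(scores_b, p, n))
    sd = stdev(diffs)
    if sd == 0:
        # Degenerate case: every resample gave the same difference.
        return observed, (1.0 if observed == 0 else 0.0)
    z = observed / sd
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return observed, p_value
```

Because both score lists are indexed by the same resampled cases, slides that are difficult for both systems move both curves together, which is what makes the test appropriate for correlated curves.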
The one-way analysis of variance (ANOVA) and the Tukey post hoc test were used to compare the computed stromal measures described in results section “Analysis of ductal carcinoma in situ-associated stroma” for the patients with different ductal carcinoma in situ grades. A p-value < 0.05 was considered significant. All analyses were two-tailed.
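For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be computed directly from group means, as in the sketch below. It covers only the F statistic and its degrees of freedom; the p-value and the Tukey post hoc comparisons require the F and studentized range distributions, which are normally taken from a statistics package. The function name and the toy values are hypothetical and unrelated to the study data.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numeric values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

In the setting described above, each group would hold the patient-level stromal measure for one ductal carcinoma in situ grade.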
Results
Classification of breast tissue whole-slide images as invasive carcinoma versus benign breast disease
Network I, subdividing whole-slide images into regions consisting of epithelium, stroma, and fat achieved a pixel-level 3-class classification accuracy of 95.5% compared to reference standard, computed on a balanced subset of annotated pixels in the independent test set. Representative examples of tissue classification results are shown in Figure 2. Network II, used for classifying stroma into normal stroma and tumor-associated stroma, achieved a binary classification accuracy of 92.0% compared to reference standard, computed on a balanced subset of annotated pixels in the independent test set. Figure 2 shows the representative output probability map for a slide containing invasive cancer (Figs. 2a–c) and a normal slide (Figs. 2d–f).
Figure 3 shows the receiver operator characteristics curves for the whole-slide image classification of invasive cancer vs. non-cancer using our proposed system and our previously published method. Our newly developed convolutional neural network model achieved an area under the receiver operator characteristics curve of 0.962 (95% CI, 0.936–0.983), which was slightly higher (but not statistically significantly, p = 0.48) than our previously described approach, which achieved an area under the receiver operator characteristics curve of 0.948 (95% CI, 0.915–0.977).
In a subjective post hoc review, our study pathologist (MES) reviewed all misclassified cases: benign biopsies for which the algorithm score indicated a high cancer probability (“false positives”: probability > 0.70, n = 23), and invasive carcinomas for which the algorithm yielded a low probability of cancer (“false negatives”: probability < 0.27, n = 6). In addition, our pathologist reviewed a randomly selected group of correctly classified (true negative and true positive) cases within each diagnostic category. Given that we do not know what specific stromal features may be driving the machine classifier, all cases were reviewed in an un-masked fashion. The 23 benign biopsies that were misclassified as probable cancer included 10 (43%) diagnoses of sclerosing adenosis, two (9%) of atypical hyperplasia and 11 (48%) of non-proliferative benign breast disease. Many of these benign specimens showed reactive stroma; this was evident in sclerosing adenosis as part of the diagnostic criteria for this entity, but other benign samples showed reactions in relation to fat necrosis, suggestive of prior biopsy or ruptured ducts. The six biopsies of invasive carcinoma misclassified by the algorithm as probably benign contained minimal stroma, which likely reflects the combined effects of targeting the hypercellular center of the lesion and the small size of the biopsies.
Analysis of ductal carcinoma in situ-associated stroma
Figure 4 shows representative ductal carcinoma in situ lesions with different histological grades and their corresponding probability maps for tumor-associated stroma. Figure 5 shows examples of stroma patches for different grades of ductal carcinoma in situ harboring tumor-associated alterations. Boxplots for the mean and standard deviations of ductal carcinoma in situ-associated stroma probabilities for the pixels ≤ 175 µm from the ductal carcinoma in situ margin are shown in Figure 6a, b. In Figure 6a, b, each point is a single ductal carcinoma in situ lesion. Overall, the amount of tumor-associated stroma increased with increasing lesion grade. In addition, ductal carcinoma in situ lesions in slides without an invasive component presented less tumor-associated stroma compared to ductal carcinoma in situ lesions in slides demonstrating ductal carcinoma in situ with invasive cancer. Figures 6c, d show similar boxplots at the patient level where the average tumor stroma probability is shown as the mean of the scores of the highest grade foci per patient. A statistically significant difference was observed for the patient level means and standard deviations among the patients with different ductal carcinoma in situ grades (p = 0.023 and 0.005, respectively; one-way ANOVA). For slides containing ductal carcinoma in situ only, average tumor stromal probabilities were higher for higher grade ductal carcinoma in situ lesions, but this relationship was not evident for ductal carcinoma in situ associated with invasive cancer. Values for tumor stromal probabilities showed greater variability for higher grade lesions. After multiple comparisons adjustment, the mean ductal carcinoma in situ-associated stroma probabilities (Fig. 6c) were significantly different between ductal carcinoma in situ grades 1 and 3 (p = 0.028). Mean differences between grades 2 and 3 and grades 1 and 2 were not statistically significant (p = 0.217 and p = 0.329).
For the standard deviation of ductal carcinoma in situ-associated stroma probabilities (Fig. 6d), we observed statistically significant differences between grades 1 and 3 as well as grades 2 and 3 (p = 0.021 and p = 0.028). No statistical significance in standard deviation was observed between grades 1 and 2 (p = 0.619).
Figures 7a, b show the mean and standard deviation of ductal carcinoma in situ-associated stroma for rims of stroma at different distances from the ductal carcinoma in situ perimeter. As the distance from the ductal carcinoma in situ margin increases, the mean of the ductal carcinoma in situ-associated stroma probabilities decreases, but only slightly, and curves for different grades of ductal carcinoma in situ remain parallel up to 500 µm from the lesion periphery.
Discussion
In this study, we developed a state-of-the-art deep convolutional neural network for distinguishing benign breast disease from invasive breast cancer based on the identification and characterization of tumor-associated stromal alterations. In an independent test set, classification of breast biopsies as benign or malignant based solely on convolutional neural network analysis of stroma achieved an area under the receiver operator characteristics curve of 0.962, consistent with highly accurate discrimination. A subjective post hoc review suggested that false positive results were associated with sclerosing adenosis, a benign lesion often associated with stromal changes, and with fat necrosis, whereas false negative specimens often demonstrated tightly packed cancer cells with minimal intervening stroma for evaluation. Without training the convolutional neural network on ductal carcinoma in situ lesions, we subsequently assessed whether tumor-associated stroma could be identified in tissues surrounding ductal carcinoma in situ and whether its extent varied with clinically important pathologic features. We detected greater amounts of tumor-associated stroma in grade 3 versus grade 1 ductal carcinoma in situ and also found that ductal carcinoma in situ associated with an invasive component generally possessed higher amounts of tumor-associated stroma compared to slides containing ductal carcinoma in situ only. Thus, our work provides support for including morphological analysis of breast stroma in studies aiming to understand risk of ductal carcinoma in situ progressing to invasion and in defining the biology of invasive breast cancer.
To date, most previous work [28,29,30,31,32,33] using automated image analysis approaches to detect and classify breast cancer in histological images involved assessment of the morphology and arrangement of epithelial structures (e.g. nuclei, ducts). Generally, the aim of this research was to objectify, standardize and quantify features that are already appreciated as important by pathologists. Although subjective evaluation of stroma may provide cues that pathologists use in the histopathologic diagnosis of breast lesions, stroma is difficult to assess microscopically, and formal criteria for classifying stromal changes have not been developed and used clinically. Accordingly, agnostic approaches, such as using deep learning techniques, are well-suited to investigating the morphology of breast stroma because visual characterization or feature selection is not required.
In the surgical management of breast cancer, it may be important to excise malignant epithelium and tumor-associated stroma. The ability of the system to objectively identify regions of altered stroma associated with tumor may additionally complement the pathologist’s diagnosis and may assist in identifying stromal tissue that should be included in tumor margins.
A key goal of our project was to use an unbiased data-driven approach to examine potential relationships between the patterns of stroma surrounding ductal carcinoma in situ lesions and ductal carcinoma in situ grade. It is hypothesized that transformation of the stroma starts in an early phase of ductal carcinoma in situ development [5,6,7], and there is growing evidence that stroma contributes importantly to the transformation of ductal carcinoma in situ to invasion [5, 6, 9]. Thus, we tested the hypothesis that stromal alterations may serve as a proxy for the potential for ductal carcinoma in situ to undergo an invasive transformation.
Although we did not train our model on ductal carcinoma in situ, we found that tumor-associated stroma probabilities were significantly higher in grade 3 ductal carcinoma in situ, with the amount of tumor-associated stroma generally increasing with increasing lesion grade. Although we were unable to distinguish pathologically defined ductal carcinoma in situ grade 2 from ductal carcinoma in situ grade 1 or grade 3, data show that reproducibility of ductal carcinoma in situ grade 2 is poor, suggesting that this comparison may have limited value. Despite this limitation, data show that high-grade ductal carcinoma in situ may have a higher risk of recurrence after surgical excision than low-grade ductal carcinoma in situ, and when recurrences occur after ductal carcinoma in situ treatment [34,35,36], they occur earlier for higher grade lesions [36,37,38]. Studies also suggest that occult invasion is more common among women with image-guided biopsies diagnosed with higher grades of ductal carcinoma in situ [39, 40] and that this may be important because grade of invasive cancer is generally matched with grade of accompanying ductal carcinoma in situ [37, 41]. Further, a low percentage of high-grade ductal carcinoma in situ has been associated with positive axillary nodes or later metastases, suggesting that at least a subset of such lesions are associated with occult invasion or disseminate through an undefined mechanism. Finally, ongoing prospective trials (LORIS, LORD, and COMET) are assessing conservative management of low-risk ductal carcinoma in situ, given indirect evidence that many such lesions will never cause harm during a woman’s lifetime. Our data suggest that evaluating stromal changes as a potential biomarker of recurrence risk may have value in such trials.
There are several limitations to our study. Our dataset was limited to one study population; thus, repeating this analysis in other populations is important. Additionally, this study was limited to breast tissue sections obtained at time of biopsy and further insights might be obtained by assessing the stromal patterns on whole-slide images from subsequent matched breast tissue surgical resections. Although our comparison of tumor-associated stroma in pure ductal carcinoma in situ versus ductal carcinoma in situ with invasive cancer attempted to focus on areas of slides that were further away from invasive cancer, because of the limited amount of tissue in some biopsies, there was a risk that stromal changes associated with some ductal carcinoma in situ areas reflected nearby invasion. Analyzing whole-slide images of resected specimens would help alleviate this risk, provided that avoiding changes associated with the prior biopsy site does not pose insurmountable challenges. Additionally, by virtue of the deep learning process, it is unclear what components of the stroma may be driving the machine classifier. Additional studies are needed to understand the biology of the stroma surrounding ductal carcinoma in situ, including the role of the vasculature, which we previously showed was increased in ductal carcinoma in situ versus benign biopsies from this study, with the highest microvessel density in invasive carcinoma. Further experiments on larger cohorts of ductal carcinoma in situ with long-term clinical follow-up are needed to study the potential prognostic value of stromal features. For example, stromal analysis may help define which ductal carcinoma in situ, grade 2, will behave indolently like grade 1 versus more aggressively like grade 3.
In conclusion, we have developed a deep learning approach utilizing convolutional neural networks to identify the presence of cancer in whole-slide images based on tumor-associated stromal alterations in diagnostic image-guided breast biopsies. Further, we demonstrated that deep learning techniques can define stromal features that are related to ductal carcinoma in situ grade. Additional studies using these approaches with follow-up of ductal carcinoma in situ cases may be useful.
This project was funded in part by the Intramural Research Program of the National Cancer Institute, National Institutes of Health, Bethesda, Maryland and a competitive award to MES and LAB funded through the sale of breast cancer awareness postage stamps. The authors wish to acknowledge the financial support by the European Union FP7 funded VPHPRISM project under the grant agreement n601040. Pamela Vacek and Donald Weaver are currently funded under a U01 exploring stromal contributions to tumor progression (U01 CA196383).