Label-free quantitative screening of breast tissue using Spatial Light Interference Microscopy (SLIM)

Breast cancer is the most common type of cancer among women worldwide. The standard histopathology of breast tissue, the primary means of disease diagnosis, involves manual microscopic examination of stained tissue by a pathologist. Because this method relies on qualitative information, it can result in inter-observer variation. Furthermore, for difficult cases the pathologist often needs additional markers of malignancy to help in making a diagnosis. We present a quantitative method for label-free tissue screening using Spatial Light Interference Microscopy (SLIM). By extracting tissue markers of malignancy based on the nanostructure revealed by the optical path-length, our approach provides an objective and potentially automatable means of rapidly flagging suspicious tissue. We demonstrated our method by imaging a tissue microarray comprising 68 different subjects, 34 with malignant and 34 with benign tissue. Three-fold cross-validation results showed a sensitivity of 94% and a specificity of 85% for detecting cancer. The quantitative biomarkers we extract provide a repeatable and objective basis for determining malignancy. These disease signatures can therefore be classified automatically through machine learning packages, since our images do not vary from scan to scan or instrument to instrument, i.e., they represent intrinsic physical attributes of the sample, independent of staining quality.

Introduction:
The latest World Health Organization (WHO) figures report breast cancer as the second most common form of cancer worldwide, with 522,000 deaths in 2012 1. Within the US, over 200,000 new cases of the disease are expected for women in 2017 according to the American Cancer Society 2. Effective treatment strategies require timely and accurate diagnosis of the disease. It has been reported that, in the US, the 5-year average survival rates for patients with invasive breast cancers increase from 90% to 99% when the disease is detected at a localized (non-metastatic) stage 3.
The standard tissue evaluation method for diagnosing breast cancers involves microscopic examination of a hematoxylin and eosin (H&E) counter-stained tissue biopsy. The biopsy specimen is obtained from the patient when suspicion of disease is noted during a screening procedure such as X-ray mammography. Since cells and histological tissue sections are transparent, the H&E stain provides the necessary contrast for assessing tissue morphology using a conventional bright-field microscope. This standard histopathology process has two important shortcomings: reliance on qualitative markers leads to intra- and inter-observer variation, while manual examination can lower the throughput of the evaluation. Quantitative microscopy could help pathologists by offering an objective assessment of the physical properties of tissue. Furthermore, quantitative markers can be interpreted by machine learning classifiers for rapid analysis and automated detection 4.
In this work, we present a method for extracting quantitative markers of malignancy in breast tissue biopsies using Spatial Light Interference Microscopy (SLIM) 5. SLIM is a quantitative phase imaging (QPI) 6 modality that generates contrast by measuring the variation of optical path-length difference (OPD) across the tissue specimen. OPD reports on the product of the refractive index and thickness of tissue at each pixel. Malignant transformation involves physical changes in epithelial cell size and density as well as in tissue organization, both of which affect OPD maps of tissue. These maps have, therefore, been used in the past for several clinical investigations 7. These include applications in histopathology and cytopathology such as diagnosis of prostate 8 and colorectal cancers 9,10, prediction of recurrence in prostate cancer 11, analysis of Gleason grade 12, assessment of metastatic pancreatic cells 13 and detection of pre-malignancy in colorectal tissue 14. Furthermore, human blood cells have also been investigated using QPI for morphological 15,16, chemical 16-18 and mechanical markers of disease 19,20.
To date, the majority of quantitative image analysis on breast tissue biopsies has relied on color images of stained tissue. Image classification in these cases has involved computing a wide range of histological features including geometric features 21,22, texture-related features 23,24 and radiometric features 23,25,26 [see 27 for a review of methods]. However, the feature extraction process relies heavily on tissue staining, which can vary from sample to sample and instrument to instrument, affecting the robustness of the classifier 28. The label-free approach we propose makes classification through machine learning easier since the instrument does not require calibration for inconsistency in pixel values due to variations in staining, tissue changes caused by harsh solvents, and similar factors. Other label-free quantitative methods for tissue image classification have been proposed in the literature, including Fourier transform infrared spectroscopy (FTIR) 29-31, Raman spectroscopy 32-34, optical coherence tomography (OCT) 35,36 and second-harmonic generation (SHG) imaging 37,38. However, these techniques differ from our QPI-based method in terms of speed, resolution, and compatibility with the current diagnostic pipeline.
We demonstrated in our previous work 39 that SLIM captures sufficient tissue morphology to separate benign from malignant tissue via visual investigation by trained pathologists. In this work, we demonstrate the quantitative analysis capabilities of our tissue screening system by imaging a tissue microarray (TMA) comprising 68 different cases (34 benign and 34 malignant). For each epithelial region within a tissue core, we extracted scattering, geometric, and texture-related markers of tissue malignancy from the SLIM maps (see Materials and Methods). A linear-discriminant analysis (LDA) classifier was trained to separate benign cases from malignant cases, and three-fold cross-validation was performed to measure the classification accuracy of the learned model 40,41. Using Receiver Operating Characteristic (ROC) curve analysis for validation, our results revealed a sensitivity of 94% and a specificity of 85%.

Figure 1 illustrates the SLIM optical setup, which has been discussed in detail in previous publications 5,42. The setup comprises a module (CellVista SLIM Pro, Phi Optics, Inc.) coupled to the output port of a commercial phase contrast microscope (Carl Zeiss, Axio Observer Z1). This compatibility with existing microscopes promises to reduce barriers to clinical adoption since optical microscopes are commonly available in pathology labs. In the SLIM module, the conjugate image plane outside the microscope is relayed onto a CCD camera (Andor, Zyla) using a 4f system comprising lenses L1 and L2. At the Fourier plane of L1, a spatial light modulator (Boulder Nonlinear Systems) is used to modulate the phase difference between the scattered and unscattered components of light in increments of π/2. Four different modulations are applied [Fig. 1(b)] and the resulting phase image is reconstructed using a previously published algorithm 5.
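As a rough illustration of the four-frame acquisition described above, the sketch below recovers a phase difference from four intensity frames using the textbook four-step phase-shifting relation. It assumes the idealized model I_k = A + B cos(Δφ + kπ/2); the function name and the synthetic frames are hypothetical, and the full SLIM reconstruction of ref. 5 involves further steps that are not shown here.

```python
import numpy as np


def phase_from_four_frames(i0, i1, i2, i3):
    """Estimate the phase difference from frames with added shifts 0, pi/2, pi, 3pi/2.

    Assumed model (not taken from the paper): I_k = A + B*cos(dphi + k*pi/2).
    Under that model I3 - I1 = 2B*sin(dphi) and I0 - I2 = 2B*cos(dphi).
    """
    return np.arctan2(i3 - i1, i0 - i2)


# Synthetic check: build four frames from a known phase map and recover it.
rng = np.random.default_rng(0)
true_dphi = rng.uniform(-3.0, 3.0, size=(8, 8))  # kept inside (-pi, pi)
frames = [2.0 + 0.5 * np.cos(true_dphi + k * np.pi / 2) for k in range(4)]
estimated_dphi = phase_from_four_frames(*frames)
```

Because the background term A cancels in both differences, the estimate does not depend on uniform changes in illumination under this model.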
Using a software platform developed in-house, the SLIM module has been upgraded with full-slide scanning capabilities 9,39. The acquisition speed is in the range of existing commercial tissue scanners, which, in turn, only perform bright-field imaging 39. Throughout our experiments, a 40x/0.75 NA phase contrast objective was used for imaging.

H&E images of the TMA were acquired using a bright-field microscope (Carl Zeiss, Axio Observer Z1) outfitted with a color camera (Carl Zeiss, Axiocam MRC). The H&E images were used to assist with annotation of epithelial regions in tissue, discussed below.

c. Annotation of epithelial regions in tissue images
Glands or continuous epithelial regions within each core were manually annotated using the region of interest (ROI) tool of ImageJ to allow feature extraction for each gland. A consistent criterion for annotation was used, in which groups of epithelial cells bounded by stroma on all sides were considered a single gland. Other tissue components within the epithelium (such as lumen) were considered part of the gland if bounded on all sides by epithelial cells. Glands from cores in the invasive ductal carcinoma (IDC) cohort were labelled as malignant, while those from cores in the tumor-adjacent normal cohort were labelled as benign.

d. Extraction of geometric and scattering features
Malignant transformation in breast tissue affects the size, shape and density of epithelial cells as well as the shape and organization of epithelial tissue. As a result, both the geometry and scattering properties of the gland are affected. We used the gland perimeter curvature $C$, as well as the mean scattering length $l_s$, as part of the feature set used for separating benign and malignant tissue. The parameter extraction process is illustrated in Fig. 2 and a detailed description of each parameter is provided below.
The extrinsic curvature $C$ of a two-dimensional plane curve $P(x, y)$, parametrized by Cartesian coordinates $x(t)$ and $y(t)$ with parameter $t$, is given by the expression 44

$$C = \frac{x'y'' - y'x''}{\left(x'^2 + y'^2\right)^{3/2}}, \qquad (1)$$

where $x'$, $y'$ and $x''$, $y''$ refer to the first and second derivatives in $t$, respectively. In the above parametrization, $t$ indexes each pixel comprising the curve $P(x, y)$, with coordinates $x(t)$ and $y(t)$. This curvature can be interpreted as the magnitude of the rate of change of a vector tangent to $P(x, y)$. We computed $C$ for the perimeter $P(x, y)$ of each annotated gland by using an open source MATLAB code 45. The code approximates $P(x, y)$ as a polygon before computing $C$ for each point defining the gland perimeter, as described in Eq. (1). To speed up computation, the image of each core was first down-sampled from the raw image size of 8000 x 8000 to 2048 x 2048 pixels. The perimeter $P(x, y)$ was then further down-sampled by a factor of 20 before computing $C$ in order to remove any pixel-level errors due to manual annotation. The median gland curvature $C$ was then used as a feature for separating benign and malignant cases. Figs. 2(c) and (d) illustrate $C$ for representative benign and malignant glands.
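The curvature computation described above can be sketched in Python as follows. This is a minimal illustration and not the MATLAB code of ref. 45: it assumes a closed perimeter given as ordered vertex coordinates, uses periodic central differences for the derivatives, and takes the absolute value before the median so that the result does not depend on traversal direction. The function names and the `step` parameter are hypothetical.

```python
import numpy as np


def closed_curve_curvature(x, y):
    """Signed curvature at each vertex of a closed polygonal curve.

    Derivatives with respect to the vertex index t are approximated by
    central differences with wrap-around (an assumption of this sketch).
    Repeated consecutive vertices would give a zero denominator.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = (np.roll(x, -1) - np.roll(x, 1)) / 2.0
    dy = (np.roll(y, -1) - np.roll(y, 1)) / 2.0
    ddx = np.roll(x, -1) - 2.0 * x + np.roll(x, 1)
    ddy = np.roll(y, -1) - 2.0 * y + np.roll(y, 1)
    return (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5


def median_gland_curvature(x, y, step=20):
    """Median absolute curvature after keeping every `step`-th perimeter vertex."""
    x = np.asarray(x, dtype=float)[::step]
    y = np.asarray(y, dtype=float)[::step]
    return float(np.median(np.abs(closed_curve_curvature(x, y))))


# Sanity check on a circle of radius 50, whose curvature is 1/50 everywhere.
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
circle_x, circle_y = 50.0 * np.cos(theta), 50.0 * np.sin(theta)
kappa = closed_curve_curvature(circle_x, circle_y)
median_c = median_gland_curvature(circle_x, circle_y, step=20)
```

On the circle, the full-resolution estimate is very close to 1/50, while the decimated perimeter gives a slightly larger value because the coarse polygon overestimates curvature; the down-sampling trades that bias for robustness to pixel-level annotation noise.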
The mean scattering length $l_s$ is a bulk scattering parameter that defines the length scale over which a single scattering event occurs on average. Assuming that the tissue slice captures the refractive index spatial fluctuation statistics, i.e., assuming statistical homogeneity, $l_s$ can be computed through the scattering-phase theorem using the expression 46

$$l_s = \frac{L}{\langle \Delta\phi^2(x, y) \rangle_{(x, y)}}, \qquad (2)$$

where $\phi(x, y)$ is the measured phase map, $L$ is the thickness of the tissue section and $\langle \Delta\phi^2(x, y) \rangle_{(x, y)}$ is the spatial variance of the phase [...]

[...] Clustering was then performed on the response vectors (number of clusters, K = 50) generated from all cores within the training set, and the computed cluster centroids were referred to as 'textons' 47,48. Since each pixel in the core belongs to a texton, the histogram of textons in the vicinity of each pixel (window size 60 x 60 pixels) was generated and used to characterize the local texture in that neighborhood. In this way, a 50-dimensional feature vector $T$ was generated to characterize texture in a pixel's neighborhood. An open source MATLAB code was used for generating the LM filter bank for this work 49.
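The texton histogram step described above can be sketched as follows. This is a minimal illustration under stated assumptions: filter responses are taken as an array of shape (H, W, F), each pixel is assigned to the nearest texton by Euclidean distance, and the local histogram is normalized to sum to one (the normalization is an assumption, not stated in the text). The function names and the toy data are hypothetical, and the filter bank itself is not shown.

```python
import numpy as np


def assign_textons(responses, textons):
    """Label each pixel with the index of its nearest texton.

    responses: (H, W, F) filter-response vectors; textons: (K, F) centroids.
    The full (H*W, K, F) distance array is built at once, so this is only
    suitable for small images.
    """
    h, w, f = responses.shape
    flat = responses.reshape(-1, f)
    d2 = ((flat[:, None, :] - textons[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1).reshape(h, w)


def local_texton_histogram(labels, row, col, n_textons, window=60):
    """Normalized histogram of texton labels in a window centered near (row, col)."""
    half = window // 2
    r0, r1 = max(row - half, 0), min(row + half, labels.shape[0])
    c0, c1 = max(col - half, 0), min(col + half, labels.shape[1])
    patch = labels[r0:r1, c0:c1]
    hist = np.bincount(patch.ravel(), minlength=n_textons).astype(float)
    return hist / hist.sum()


# Toy example with 3 textons and 2 filter responses per pixel.
textons = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
responses = np.zeros((10, 10, 2))
responses[:, 5:, :] = 5.0  # right half of the image matches texton 2
labels = assign_textons(responses, textons)
hist = local_texton_histogram(labels, row=5, col=5, n_textons=3, window=4)
```

With K = 50 textons, the same call would return the 50-dimensional texture vector for the chosen pixel.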

f. Classifier training and validation
Since our work involves classifying each gland within a tissue core as benign or malignant, a feature vector for each gland was next generated by concatenating geometric, scattering and texture-related features. This procedure is illustrated in Fig. 4. After pixel-wise computation of gland curvature $C$, scattering length $l_s$ and texture vector $T$, the median of each feature was computed over each gland in a core and a combined 52-dimensional feature vector was generated for training. For each gland, this feature vector was then used as a predictor for training a linear-discriminant analysis (LDA) classifier [Fig. 4(a)]. Class labels, either benign or malignant, were used as the ground truth for each gland during the training process.
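A minimal sketch of how the per-gland feature vector could be assembled is given below. The array layouts (a 1-D array of perimeter curvature values, a scattering-length map, a texture map with 50 channels and a boolean gland mask) and the function name are assumptions made for illustration; only the median aggregation and the 1 + 1 + 50 = 52 layout come from the text.

```python
import numpy as np


def gland_feature_vector(curvature_values, ls_map, texture_map, gland_mask):
    """Concatenate per-gland medians into a single 52-dimensional vector.

    curvature_values: 1-D array of curvature along the gland perimeter
    ls_map:           (H, W) scattering-length map
    texture_map:      (H, W, 50) texton-histogram vector at each pixel
    gland_mask:       (H, W) boolean mask of the annotated gland
    """
    c_median = np.median(curvature_values)
    ls_median = np.median(ls_map[gland_mask])
    # Boolean indexing gives (n_pixels, 50); the median is taken per channel.
    t_median = np.median(texture_map[gland_mask], axis=0)
    return np.concatenate(([c_median, ls_median], t_median))


# Toy example with constant maps.
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
feature_vec = gland_feature_vector(
    np.array([0.1, 0.2, 0.3]),
    np.full((4, 4), 10.0),
    np.full((4, 4, 50), 0.02),
    mask,
)
```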
The feature extraction for validation purposes, illustrated in Fig. 4(b), followed a nearly identical procedure to that used during training. The only difference was that, instead of finding new textons (cluster centroids) for validation data, the texture feature vector $T$ was computed by using the same textons as determined during training. As in training, a 52-dimensional feature vector was input to the LDA classifier, which then used the model learned during training to generate a likelihood score for a gland being benign or malignant. Finally, the mean of the likelihood scores of all glands within a core was computed and used as the likelihood score of a core being benign or malignant. These scores were then used to generate a receiver operating characteristic (ROC) curve to select an operating point for separating benign and malignant cases (see Results and Discussion).
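The core-level scoring and the choice of an operating point can be sketched as follows. The gland scores would come from the trained classifier, which is not shown; the function names, the convention that a higher score means more likely malignant, and the toy numbers are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def core_scores(gland_scores, gland_core_ids):
    """Average the gland-level likelihood scores within each core."""
    per_core = defaultdict(list)
    for score, core_id in zip(gland_scores, gland_core_ids):
        per_core[core_id].append(score)
    return {core_id: mean(scores) for core_id, scores in per_core.items()}


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when cores with score >= threshold are called malignant.

    labels: 1 for malignant, 0 for benign. Both classes must be present.
    """
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= threshold)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < threshold)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: five glands from three cores (A and C malignant, B benign).
scores_by_core = core_scores([0.9, 0.7, 0.2, 0.4, 0.6], ["A", "A", "B", "B", "C"])
core_ids = ["A", "B", "C"]
core_labels = [1, 0, 1]
core_values = [scores_by_core[c] for c in core_ids]
sens_low, spec_low = sensitivity_specificity(core_values, core_labels, 0.5)
sens_high, spec_high = sensitivity_specificity(core_values, core_labels, 0.7)
```

Sweeping the threshold over all observed core scores and plotting sensitivity against 1 - specificity traces out the ROC curve from which an operating point is chosen.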

Results and Discussion
The classification results of our analysis are summarized in Fig. 5. In order to evaluate the accuracy of our method, we performed three-fold cross-validation 50 as illustrated in Fig. 5(a). The cases were divided into three (nearly) equal groups. In each trial, two groups were used for training while the remaining one was used for validation. Thus, three validation trials were performed, each time selecting a different validation/training set combination.

[...] This geometric feature is similar to the previous measurement of the gland perimeter fractal dimension that has been used for histopathology 21,27.
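The three-fold partition described above can be sketched as follows. The split is made at the case level so that all glands from one case fall in the same fold; the shuffling seed, the absence of class stratification and the function name are assumptions of this sketch rather than details given in the text.

```python
import random


def three_fold_splits(case_ids, seed=0):
    """Yield (training, validation) case lists for three-fold cross-validation."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::3] for i in range(3)]  # three (nearly) equal groups
    for k in range(3):
        validation = folds[k]
        training = [c for j, fold in enumerate(folds) if j != k for c in fold]
        yield training, validation


# With 68 cases the folds hold 23, 23 and 22 cases.
splits = list(three_fold_splits(range(68)))
fold_sizes = sorted(len(validation) for _, validation in splits)
```

Each case appears in exactly one validation set, so every case is scored once by a model that did not see it during training.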

Summary and Conclusions
In summary, we presented a new method for screening tissue biopsies obtained from patients under investigation for breast cancer. Since our method relies on measurement of OPD maps, an intrinsic property of tissue, the basis for classification is objective and not subject to inter-observer variation. While in the past much of quantitative histopathology has relied on analysis of stained tissue, our method performs image processing and machine learning on unlabeled images, making it insensitive to variability due to staining. Thus, automating the entire method is feasible and is the subject of our future efforts.
While other label-free diagnosis methods have been proposed for these types of investigations, they affect the standard diagnostic pipeline in terms of speed, resolution or compatibility with the established workflow. SLIM, on the other hand, requires minimal changes to a conventional microscopic optical train due to its modular design. Equipped with a slide-scanning feature for rapid acquisition, a SLIM tissue scanner can potentially carry out high-throughput automated histopathology, not only reducing the case-load for pathologists but also providing complementary information through new markers. This carries the potential for incorporation into the daily practice of diagnostic surgical pathology, either as a screening method to point out areas of the slide that need additional attention, or for difficult cases where pathologists need supporting tests to make a final diagnostic decision.

Acknowledgements:
We

Competing financial interests:
G. P has financial interest in Phi Optics, Inc., a company that develops quantitative phase imaging technologies.

Author contributions statement:
H. M imaged the tissue samples and performed the image processing and analysis. He also wrote the main manuscript text and prepared the figures. T. N helped with the image processing and analysis while M. K developed the tissue scanner used for imaging the samples. A. B provided inputs from a clinical perspective and G. P supervised the project.

Data availability statement
The datasets generated or analysed during this work are available from the corresponding author on reasonable request.