Main

Advances in imaging instrumentation and data management provide the foundation for computational approaches to analyze digitized images of tissue sections and derive objective, quantitative measurements at the tissue, cellular, subcellular, and molecular levels.1 Computational pathology approaches offer a cost-effective platform to increase throughput, accuracy, and reliability of diagnoses of tissue samples.2, 3 Further, the quantitative nature of computational pathology can be used in combination with other assays to improve pathologists’ knowledge of disease and help inform treatment strategies and further stratify patient prognosis. It has been shown that, by integrating information derived from computational pathology with a patient’s clinical data, a better prognostic model can be derived for many diseases, including prostate cancer,4, 5, 6 lung cancer,7 breast cancer,8, 9, 10, 11, 12 glioblastoma,13, 14 basal cell carcinoma,15, 16 and ovarian cancer.17, 18

One of the central challenges of computational biology, which limits its large-scale application, is that images of tissue sections frequently vary in color appearance across research laboratories and medical facilities because of differences in tissue fixation, staining protocols, and imaging instrumentation. This wide spectrum of color appearance makes it difficult to robustly extract representative images of different tissue components, such as nuclei.19 Previous studies have shown that technician variance or technique differences can lead to marked differences in staining.20 For example, the conventional hematoxylin and eosin (H&E) staining techniques have been modified to reduce material use and processing time21 or to improve the contrast and detail in the digital image.22 These technique differences provide some advantage to the pathologist, but they also introduce variation in slide staining that computational pathology approaches must address.

Several computational stain normalization approaches, including color deconvolution (CD),23 histogram equalization,24 and the use of the CMYK space,25 have been developed to correct for differences in image appearance and facilitate the separation of tissue types.19, 20 Of these, CD is the most commonly used to extract nuclear and cellular images in both H&E and immunohistochemically (3,3′-diaminobenzidine, DAB) stained images.2, 9, 23, 26, 27, 28 CD uses the method of singular value decomposition, which seeks to linearly separate the color space to identify regions rich in each particular dye. However, a major disadvantage of CD is that it requires prior knowledge of each dye’s color spectrum to accurately visualize tissue components.29 Owing to differences in color appearance between images, using the same stain vector across images will introduce variance in the representative image for each dye. Although there are automated methods to determine the stain vector for individual images, the additional processing step leads to a significant increase in processing time across large image data sets.30 Furthermore, CD only decouples the concentration of dye in the histopathological image, and further processing is needed to separate individual tissue components such as blood, nuclei, and extracellular matrix- and cytoplasmic-rich regions for quantification.

In this work, we propose a novel nonlinear tissue-component discrimination (NLTD) method to automatically register the color space of histopathology images and obtain representative images for individual tissue components, such as the nuclei or cytoplasm, irrespective of perceptual color differences between images. We demonstrate that the nuclei images obtained from NLTD display a consistent appearance for histopathology images (including those with distinct color differences) taken from different tissue types and prepared at different institutions, including The Cancer Genome Atlas project (TCGA, http://cancergenome.nih.gov/). Importantly, the processing time of NLTD is comparable to that of CD for small images, and NLTD is much more efficient for large images, notably whole slide images. Further, we demonstrate that the nuclei images derived using NLTD produce highly accurate nucleus tracing and counting, and that NLTD allows for quantitative analysis of antigen presence in immunohistochemical images. Taken together, we show that NLTD is an effective approach to obtain quantitative tissue-component images that can be easily integrated in emerging computational pathology applications.

MATERIALS AND METHODS

The NLTD method consists of five main steps (Figure 1a), detailed further here: (1) color joint-histogram creation; (2) ridge detection; (3) ridge set registration; (4) transformation function creation; and (5) tissue component image creation.

Figure 1

Brief overview of the nonlinear tissue-component discrimination (NLTD) approach. (a) NLTD applied to an image of a hematoxylin and eosin (H&E)-stained section (top) and an immunohistochemically (IHC) stained image (bottom). Shown are a typical H&E image of a small artery, exhibiting multiple tissue components (nuclei (N), extracellular matrix (ECM)-rich and cytoplasm (E), blood (B)) and a typical IHC image, stained for LINE-1 ORF1p expression,31 exhibiting two tissue components (antigen (A) and nuclei (N)). The NLTD method is schematically shown in the center. Briefly, the red-blue joint histogram is first segmented to identify each region in the red-blue color space. The x axis corresponds to the red intensity, the y axis to the blue intensity, and the color axis represents the frequency of each discrete color combination. Ridges for each tissue component are overlaid on the red-blue color joint histogram (RBJH). The ridge set is registered and transformed to yield the pseudocolored transformation function for each component. The pseudocolored grayscale images are shown for the nuclei, non-nuclei, and blood components (purple, pink, and red, respectively) in the far right box. (b and c) Grayscale correlation values for the red-blue joint histogram, blue-green joint histogram, and red-green joint histogram, with a value of 1 corresponding to a completely correlated colorset. (b) Pancreatic cancer H&E data set (n=45). (c) Ovarian immunohistochemistry data set (n=81). (d) Separation of red-blue color space into individual tissue components: nuclei (purple box); ECM and cytoplasm-rich (pink box).

Color Joint-Histogram Creation

In a typical 8-bit tissue image, I, the color of an individual pixel, p, at location (xp, yp) is expressed by three intensities (rp, gp, bp), each of which takes a discrete value from 0 to 255, that is,

$$I(x_p, y_p) = (r_p, g_p, b_p), \qquad r_p, g_p, b_p \in \{0, 1, \ldots, 255\}$$

For example, if all the intensities of a pixel are zero, the resultant color is black; conversely, if all are 255, the resultant color is white. The color joint histogram is a three-dimensional histogram created by counting the occurrence of pixels at each distinct set of red, green, and blue (RGB) intensities in an image. However, calculating every color combination in RGB color space and analyzing the three-dimensional RGB color joint histogram is a highly computationally intensive process: an 8-bit image can contain more than 16 million unique combinations (256³ = 16 777 216). To reduce computational time, it would be advantageous to consider only two of the three color axes, reducing the number of unique combinations 256-fold (to 256² = 65 536).

In a cohort of 45 H&E images, we found that the blue and green color components are highly correlated within individual images (Figure 1b). Furthermore, we found that the red and green color components are highly correlated in a set of 81 immunohistochemically stained images (DAB)31 (Figure 1c). These observations show that, in both H&E and DAB images, the green color channel encodes information that is highly correlated with another color channel, which implies that the red-blue color joint histogram (RBJH) can be a representative simplification of the RGB color space of a histopathological image. The RBJH is a two-dimensional matrix, created by counting the frequency (n) of pixels at different red (r) and blue (b) intensity values in the image (I), that is,

$$n_{r,b} = \left| \{\, p \in I : r_p = r,\ b_p = b \,\} \right|$$

The resultant RBJH can be visualized as a three-dimensional surface, with the x and y axes corresponding to the red and blue color space values, respectively, and the z axis corresponding to the incidence rate of each red-blue intensity combination.
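
As a minimal sketch of this counting step (not the MATLAB implementation used in this work), the RBJH can be built by tallying red-blue pairs and ignoring the green channel. The function name `red_blue_joint_histogram` and the four-pixel image are hypothetical and serve only as an illustration.

```python
from collections import Counter

def red_blue_joint_histogram(pixels):
    """Count how often each (red, blue) pair occurs; the green channel is ignored."""
    return Counter((r, b) for r, _g, b in pixels)

# Hypothetical four-pixel image given as (r, g, b) tuples of 8-bit intensities.
pixels = [(200, 120, 180), (200, 90, 180), (60, 40, 130), (255, 255, 255)]
rbjh = red_blue_joint_histogram(pixels)

# Size of the full RGB color space versus the red-blue plane.
rgb_bins = 256 ** 3  # 16,777,216 possible RGB combinations
rb_bins = 256 ** 2   # 65,536 possible red-blue combinations
print(rbjh[(200, 180)], rgb_bins // rb_bins)
```

The first two pixels differ only in green, so they fall into the same red-blue bin, which is the simplification the RBJH relies on.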

Ridge Detection

The RBJH shows the most abundant color combinations in the red-blue color space for an image. In the RBJH, distinct populations of red-blue combinations are readily observed, corresponding to different tissue components (Figure 1d). However, detecting and segmenting individual tissue components in the RBJH is challenging because the elongated, nonlinear distribution of red-blue color combinations complicates separation of the populations; consequently, common approaches, such as Gaussian mixture models or k-means clustering, do not work well. Gaussian mixture models fail because the RBJH of most images does not follow a strictly Gaussian distribution, often having one major peak along with a long sparse tail. Similarly, k-means clustering is not effective at detecting sparse areas in the RBJH. Additionally, both models require prior knowledge of the number of components present, which we have found can vary from 1 to 3 in most images. Successful extraction of the locations of individual tissue components in the red-blue color space needs to account for the asymmetric and elongated distribution of red-blue color combinations observed in the RBJH. Therefore, we propose to identify the locations of major tissue components in the red-blue color space by tracing the ridges of distinct populations in the RBJH (Supplementary Figure S2).

We first identify the major orientation of the signal in the RBJH using weighted principal component analysis (PCA) (Supplementary Figure S1a). PCA is applied to the red and blue indices of the RBJH, with the frequency (nr,b) associated with each color combination used as a weight. The first principal component provides the major direction of the RBJH color space, which can be combined with the location of the global maximum of the RBJH to create a major axis (v0). Next, we identify the local maxima tangent along the major axis in the RBJH (Supplementary Figure S1b). To ensure that all local maxima are detected, this routine is repeated along two other vectors at angles of ±15° from the major direction. A map of all local maxima is then created by counting the frequency of local maxima identified at each red-blue index (Supplementary Figure S1c). This map is further processed through morphological dilation and thinning operations to provide a binarized location of ridges for all distinct populations in the RBJH (Supplementary Figure S1d).
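
The weighted PCA step can be sketched as follows, under the assumption that the RBJH is stored as a mapping from (r, b) to frequency. For a two-dimensional covariance matrix the principal direction has a closed form, which is used here in place of a general eigendecomposition. The function name `weighted_major_axis` and the toy histogram are hypothetical; the local-maximum search and the morphological operations are not shown.

```python
import math

def weighted_major_axis(rbjh):
    """Return (peak, angle): the global-maximum bin and the principal direction in radians."""
    total = sum(rbjh.values())
    mean_r = sum(r * n for (r, b), n in rbjh.items()) / total
    mean_b = sum(b * n for (r, b), n in rbjh.items()) / total
    # Frequency-weighted covariance of the red and blue indices.
    c_rr = sum(n * (r - mean_r) ** 2 for (r, b), n in rbjh.items()) / total
    c_bb = sum(n * (b - mean_b) ** 2 for (r, b), n in rbjh.items()) / total
    c_rb = sum(n * (r - mean_r) * (b - mean_b) for (r, b), n in rbjh.items()) / total
    # Orientation of the leading eigenvector of a symmetric 2 x 2 matrix.
    angle = 0.5 * math.atan2(2.0 * c_rb, c_rr - c_bb)
    peak = max(rbjh, key=rbjh.get)
    return peak, angle

# Toy histogram whose mass lies on the red = blue diagonal.
rbjh = {(10, 10): 5, (20, 20): 10, (30, 30): 5}
peak, angle = weighted_major_axis(rbjh)

# Search directions: the major direction and the two vectors at +/-15 degrees.
search_angles = [angle + math.radians(offset) for offset in (-15, 0, 15)]
print(peak, round(math.degrees(angle), 1))
```

For this diagonal example the major direction is 45° and the major axis passes through the peak bin (20, 20).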

Ridge Set Registration

To register the ridge set maps in the RBJH to different tissue components, we developed a robust algorithm based on each ridge’s proximity to specified reference color combinations (Supplementary Figure S2a). Four reference color combinations are used: Red (r=255, b=0), Blue (r=0, b=255), Black (r=0, b=0), and White (r=255, b=255). The Euclidean distance transform32 is calculated for each reference color combination, and the minimum distance along each ridge is found. The component with the smallest distance from a reference color combination is determined to be the closest. In H&E staining, in general, nuclei would be closer to black and blue, whereas extracellular matrix- and cytoplasm-rich areas would be closer to white. Similar logic can be applied to DAB chromogen staining, where antigen-rich areas are closer to red than nuclei, which are closer to blue.
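
The proximity rule can be illustrated with a small sketch. It assumes each ridge is a list of (r, b) points and computes the minimum distance by brute force rather than through a distance transform, which gives the same minimum for a handful of points. The ridge coordinates and the names `min_distance` and `closest_ridge` are hypothetical.

```python
import math

# Reference color combinations in (red, blue) coordinates, as listed in the text.
REFERENCES = {"red": (255, 0), "blue": (0, 255), "black": (0, 0), "white": (255, 255)}

def min_distance(ridge, ref):
    """Smallest Euclidean distance from any point on the ridge to the reference color."""
    return min(math.dist(point, ref) for point in ridge)

def closest_ridge(ridges, ref_name):
    """Name of the ridge that comes closest to the given reference color."""
    ref = REFERENCES[ref_name]
    return min(ridges, key=lambda name: min_distance(ridges[name], ref))

# Two hypothetical ridges traced in the red-blue color space of an H&E image.
ridges = {
    "ridge_a": [(90, 140), (120, 160), (150, 180)],
    "ridge_b": [(200, 190), (225, 215), (245, 240)],
}
nuclei_ridge = closest_ridge(ridges, "blue")   # nuclei lie toward blue and black
ecm_ridge = closest_ridge(ridges, "white")     # ECM and cytoplasm lie toward white
print(nuclei_ridge, ecm_ridge)
```

For a DAB image the same helper could be queried with "red" to pick the antigen-rich ridge.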

In some cases, where the RBJH is more homogeneous, it is possible that only one ridge is found (Supplementary Figure S2b). For H&E staining, in general, each image contains at least two distinct components: nuclei and extracellular matrix and cytoplasm-rich components. The portion of the ridge closer to white and red represents extracellular matrix and cytoplasm-rich areas, and the remaining portion of the ridge represents the nuclei-rich regions; this necessitates further segmentation of the ridge. To segment the ridge, the intensity profile (ie, frequency of color combinations) of the RBJH along the ridge is first extracted. A peak, corresponding to the most frequently occurring color combination, commonly appears and represents the central location of extracellular matrix and cytoplasm-rich regions on the ridge. Thus, we fit this intensity profile with a Gaussian distribution to measure the distribution of the extracellular matrix and cytoplasm-rich areas along the ridge, and segment the single identified ridge at a distance of 2 s.d. from the peak into two distinct ridges corresponding to the nuclei and the extracellular matrix and cytoplasm-rich components. For DAB chromogen staining, the identification of only one ridge is most likely due to the absence of the probed antigen in the tissue section, and hence no further segmentation is needed.

Transformation Function Creation

We formulated tissue transformation functions (TF) to convert the red-blue color space to the intensity of different tissue components (k=1,2,…,N). We assume the red-blue color space has different regions that exclusively correspond to different tissue components based on the proximity to each ridge in the ridge set. A watershed segmentation is applied to the ridges of the RBJH to identify regions of the red-blue color space that represent the unique tissue components (Supplementary Figure S3a). Additionally, the regions of the red-blue color space with the most absorption (ie, lower r and/or b indices) correspond to the strongest signal within each tissue region. For each particular tissue component, the red and blue indices that are closest to the tissue’s ridge indicate a higher likelihood of belonging to that tissue and also contribute to a stronger signal.

To account for these three factors, we developed a transformation function, TFk, that accounts for the tissue component’s region in red-blue color space (fregion), its absorption (fabsorption), and the distance from each tissue component’s ridge (fridge) (Supplementary Figure S3b), expressed by

where k=1,2,…,N tissue component.

The tissue region function, fregion, uses the watershed segmentation result as a basis to exclude any part of the red-blue color space not belonging to the same tissue component. The region of the red-blue color space corresponding to the kth tissue component is defined as Wk. A Gaussian filter, g, can be applied to the edge of the region to allow for a smoother transition between components (Supplementary Figure S3b(i)), that is,

The absorption function, fabsorption, for the kth tissue component is obtained by first calculating the Euclidean distance transform32 (Ddark) of all points in red-blue color space from the point with the highest absorption (ie, darkest) on the kth tissue-component ridge (Rk), defined as the point on the ridge closest to black (r=0, b=0). To scale the distance with the level of dye absorption, the absorption function (Supplementary Figure S3b(ii)) is expressed by

The ridge function, fridge, is derived from the Euclidean distance transform,32 D, which is the minimum distance of any point in the red-blue color space to a point on the ridge of the kth tissue component, Rk. To scale the likelihood with distance, the ridge function (Supplementary Figure S3b(iii)) is expressed by

Tissue Component Image Creation

To obtain the kth tissue-component image, Tk, the red and blue pixel intensities (rp, bp) at each location (xp, yp) in the original image, I, were used to create a grayscale image according to the transformation function, TFk, that is,

$$T_k(x_p, y_p) = TF_k(r_p, b_p)$$
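
In practice this step amounts to a table lookup. The sketch below assumes the transformation function is stored as a 256 × 256 table indexed as `tf_k[r][b]`; the table contents are a purely illustrative placeholder (darker red-blue pairs give a stronger signal) and are not the transformation function defined in this work. The name `tissue_component_image` is hypothetical.

```python
def tissue_component_image(image, tf_k):
    """Map each pixel's (red, blue) pair through the lookup table tf_k; green is unused."""
    return [[tf_k[r][b] for (r, _g, b) in row] for row in image]

# Placeholder transformation function, NOT the one derived from ridges:
# a 256 x 256 table in which stronger absorption (lower r and b) maps to higher signal.
tf_k = [[1.0 - (r + b) / 510.0 for b in range(256)] for r in range(256)]

# Hypothetical 2 x 2 RGB image.
image = [[(0, 0, 0), (255, 255, 255)],
         [(255, 10, 0), (51, 30, 51)]]
t_k = tissue_component_image(image, tf_k)
print(t_k)
```

Because the table is computed once per image, the per-pixel cost is a single lookup regardless of how the table was built.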

Sample Acquisition

Histopathological images were acquired from pathologists at the Johns Hopkins University. The tissue samples were formalin fixed and paraffin embedded. Tissue sections were fixed for 3 h in formalin on a tissue processor, followed by 1–2 h of gross room fixation. Paraffin sections were cut at 5 μm thickness. Sections were then stained with H&E and digitized using a DP27 5MP color camera. Sections of pancreatic cancer, colon cancer, ovarian cancer, and glioblastoma were included. Immunohistochemically (DAB) stained tissue was acquired through an ovarian cancer tissue microarray, as described previously.31 Additional tissue images were acquired from The Cancer Genome Atlas project (http://cancergenome.nih.gov) and published sources.33, 34

Nuclei Detection

To compare segmentation results between CD and the NLTD method, a publicly available data set,33 including both tissue images and ground-truth nuclei locations, was analyzed. For the NLTD method, the corresponding nuclei image was obtained and nuclei locations were determined using the following procedure:

  1. Binarize each image using a dynamic threshold, calculated using Otsu’s method.35
  2. Remove small objects based on a size threshold of 50 pixels.
  3. Watershed segmentation to separate clusters of nuclei.
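
The first step of this procedure can be sketched in a few lines. The following is a plain implementation of Otsu's between-class variance criterion on a flat list of 8-bit intensities; it is an illustration rather than the MATLAB routine used in this work, and the small-object removal and watershed steps are not shown. The name `otsu_threshold` and the sample values are hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximizes between-class variance (foreground: value > t)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # intensity sum of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the variance is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t

# Hypothetical nuclei-image intensities: four background and four nuclear pixels.
values = [10, 12, 11, 13, 200, 210, 205, 198]
threshold = otsu_threshold(values)
mask = [v > threshold for v in values]
print(threshold, mask)
```

With two well-separated populations, every threshold between them gives the same variance, and this sketch keeps the lowest such value.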

The same segmentation approach was used for the CD image corresponding to the hematoxylin dye. For each segmented nucleus identified, the nearest ground-truth nucleus was found. If two segmented nuclei were attached to the same ground-truth nucleus, the nearest was counted as a true positive and the other as a false positive. Any segmented nucleus whose nearest ground-truth nucleus was more than one average cell diameter away was counted as a false positive. Conversely, any ground-truth nucleus that did not have any segmented nuclei within one average cell diameter was counted as a false negative.
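
These counting rules can be written out as a short sketch, assuming nuclei are represented by centroid coordinates and that `max_dist` stands for one average cell diameter. The function name `match_detections` and the coordinates are hypothetical.

```python
import math

def match_detections(detected, truth, max_dist):
    """Return (tp, fp, fn) for detected and ground-truth nucleus centroids."""
    if not truth:
        return 0, len(detected), 0
    matched = set()  # indices of ground-truth nuclei that received a true positive
    fp = 0
    for d in detected:
        nearest = min(range(len(truth)), key=lambda i: math.dist(d, truth[i]))
        if math.dist(d, truth[nearest]) > max_dist:
            fp += 1  # nearest ground-truth nucleus is too far away
        elif nearest in matched:
            fp += 1  # a second detection on the same nucleus: only one counts as true
        else:
            matched.add(nearest)
    tp = len(matched)
    # A ground-truth nucleus is missed when no detection lies within max_dist of it.
    fn = sum(1 for t in truth if all(math.dist(d, t) > max_dist for d in detected))
    return tp, fp, fn

truth = [(10, 10), (50, 50), (90, 90)]
detected = [(11, 10), (12, 12), (52, 49), (200, 200)]
tp, fp, fn = match_detections(detected, truth, max_dist=8)
print(tp, fp, fn)
```

Here two detections fall on the first nucleus (one true positive, one false positive), one detection is far from every nucleus, and the third nucleus is missed.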

Immunohistochemistry Scoring

A tissue microarray (TMA) of ovarian cancer tissue stained using an antibody for LINE-1 ORF1p31 was used to evaluate the utility of NLTD as an immunohistochemistry scoring aid. Each image in the TMA was separated into two images using the NLTD method, a nuclei-rich and an antigen-rich image (Supplementary Figure S5). Preprocessing steps were performed to analyze only nuclei-rich regions where antigen staining was present, and to avoid background areas where no staining should occur. Briefly, the nuclei-rich image was segmented using Otsu’s thresholding technique.35 Small objects were removed from the image, followed by morphological opening and closing operations and another removal of small objects. After preprocessing, a transformation score was derived based on the ratio of antigen intensity to nuclei intensity (Equation (8)). Importantly, only antigen and nuclei intensities in the areas from the segmented, preprocessed image were counted.
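
A minimal sketch of such a ratio score is given below. It assumes that intensities are summed over the masked pixels before taking the ratio; the exact form of the score is not reproduced here, so this aggregation is an assumption. The function name `antigen_score` and the 2 × 2 images are hypothetical.

```python
def antigen_score(antigen, nuclei, mask):
    """Ratio of summed antigen intensity to summed nuclei intensity inside the mask.

    Summation over masked pixels is an assumption made for this sketch.
    """
    antigen_sum = 0.0
    nuclei_sum = 0.0
    for a_row, n_row, m_row in zip(antigen, nuclei, mask):
        for a, n, keep in zip(a_row, n_row, m_row):
            if keep:  # only pixels in the segmented, preprocessed region are counted
                antigen_sum += a
                nuclei_sum += n
    return antigen_sum / nuclei_sum if nuclei_sum else 0.0

# Hypothetical component images with intensities in [0, 1] and a binary tissue mask.
antigen = [[0.2, 0.8], [0.4, 0.0]]
nuclei = [[0.5, 0.5], [0.5, 0.9]]
mask = [[True, True], [True, False]]
score = antigen_score(antigen, nuclei, mask)
print(round(score, 3))
```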

Hardware and Software

All image processing was performed using MATLAB 2015 (MathWorks). To determine statistical significance, two-tailed t-tests were performed using GraphPad Prism 6. All computations were performed on Windows 7 Professional with an Intel Core i7-3820 processor and 16 GB RAM.

Statistics

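To quantify the segmentation results, precision, recall, and F-score statistics were used.36 For this data set, it is not possible to assess accuracy or other statistics that rely on true-negative counts, because the classification system includes no negative result and only positive occurrences (ie, nuclei) are identified. With TP, FP, and FN denoting the numbers of true positives, false positives, and false negatives, respectively, each statistic is defined as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F\text{-score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$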
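
For illustration, these statistics can be computed from the three counts as in the sketch below. The function name `detection_scores` and the counts are hypothetical and are not taken from the data set analyzed here.

```python
def detection_scores(tp, fp, fn):
    """Return (precision, recall, f_score) from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # The F-score is the harmonic mean of precision and recall.
    f_score = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score

# Hypothetical counts for a single image.
precision, recall, f_score = detection_scores(tp=90, fp=10, fn=30)
print(round(precision, 3), round(recall, 3), round(f_score, 3))
```

No true-negative count appears anywhere in the calculation, which is why these three statistics suit a detection task with only positive occurrences.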

RESULTS

Overview of the NLTD Method

The NLTD method presented in this work consists of five major steps, as illustrated with an H&E-stained image and an immunohistochemically stained image in Figure 1a. First, the RBJH is created. This joint histogram represents the frequency at which each red and blue pixel intensity combination occurs in a histopathological image, and serves as the basis for tissue-component discrimination. The RBJH is reduced to a set of curves representing the ridges, or local maxima, using an iterative approach. This ridge set is further registered with corresponding individual tissue components (eg, nuclei, extracellular matrix, and cytoplasm-rich, etc). Further, the ridge set serves as a basis for the creation of a set of transformation functions used to create individual, grayscale images from the original image representative of each tissue component present in the image (see more details in Materials and Methods section). The resulting set of tissue-component images can then be used for additional tissue processing and analysis, including nuclei detection algorithms and quantitative scoring of immunohistochemically stained samples. The MATLAB package is available upon request.

Robustness of NLTD

To demonstrate the robustness of the NLTD method, we applied it to a set of histopathological images with a wide range of apparent colors and examined the uniformity of the extracted nuclei images (Figure 2). The image set spans multiple tissue types, along with several different image sources: the Johns Hopkins School of Medicine; images from previous studies performed at University of California, Santa Barbara34 and the University of Berlin;33 and publicly available images from the TCGA image database (http://cancergenome.nih.gov). The results show that even though the RBJH of each image has a unique distribution, the NLTD method can successfully identify and register each tissue component and extract nuclei images consistently and robustly.

Figure 2

Application of nonlinear tissue-component discrimination (NLTD) across a wide variety of tissue types. The NLTD method is applied on many different types of tissue. The original tissue image, red-blue color joint histogram (RBJH), registered RBJH, and the nuclei component grayscale image (pseudocolored purple) are shown (left to right). The registered RBJH shows a purple line for the nucleus component, a pink line for the extracellular matrix (ECM)/cytoplasmic component, and a red line for the blood component. The sample tissue types are: (a) colon cancer, (b) kidney cancer, (c) ovarian cancer, (d) lung adenocarcinoma, (e) gastric mucosa, (f) astrocytoma, (g) skin cutaneous melanoma, and (h) breast cancer.

Processing Time

For most computational pathology applications, the time needed to process each image and glean important information can quickly become a barrier as image size and/or cohort size increases. Previous work has compared stain normalization processing time for smaller images (256 × 256, 512 × 512, and 1024 × 1024).29 Whole slide imaging, however, often results in much larger images (10 000 × 10 000 or greater), and it is important for image processing time to scale well with the size of each image. In our work, we compared CD, CD using Macenko’s color normalization method,37 and the NLTD approach. Macenko’s approach involves an additional preprocessing step to determine each individual image’s optimal stain vector and uses the optimal stain vector for CD. Both CD approaches were faster than the NLTD method at small image sizes (up to 2500 × 2500), but, as the image size approached whole slide levels (15 000 × 15 000), the NLTD method was much faster than both CD-based methods and took only a quarter of the time to process each image (Figure 3). This result suggests that NLTD can more efficiently analyze larger images, which can be very useful for large data sets, such as the TCGA. Since CD is more time efficient at smaller sizes, it is possible to partition one large image into many smaller images (ie, one 10 000 × 10 000 image into one hundred 1000 × 1000 images). However, this additional processing step would still lead to an increase in processing time compared with the NLTD approach (3.8 s for NLTD on one 10 000 × 10 000 image, 11.85 s for CD on one hundred 1000 × 1000 images). Therefore, the NLTD method can be more efficiently applied to whole slide images and can reduce the time needed to analyze large cohorts of images.

Figure 3

Processing time of nonlinear tissue-component discrimination (NLTD) and color deconvolution (CD). Comparison of processing time for images of various sizes using three different color normalization techniques: NLTD (squares), CD (circles), and CD using Macenko’s method of automated stain vector determination (MMCD, diamonds). Each image used was a three-dimensional red, green, and blue (RGB) image, with side lengths defined by the x axis. Processing time is shown on the y axis in seconds as the median of 10 runs for each method at each image size.

Improving Nuclei Detection with NLTD

Nuclei detection in histopathological images is a critical and frequently used step in computational pathology approaches to develop prognostic and diagnostic models.7, 8, 9, 11, 12, 17, 33 Currently, CD is commonly used to extract a representative nuclei image (corresponding to the hematoxylin dye levels) to which nuclei detection algorithms are applied.9, 28, 29, 30, 33, 38 Here, we show that using the nuclei image derived from the NLTD method improves the detection of nuclei over the CD approach. We first evaluated the contrast of individual nuclei images created from both the NLTD and the CD method (Figures 4a–c). By examining the intensity profile along one axis across nuclei, we found that the nuclei image obtained from NLTD has a substantial decrease in intensity at the periphery of the nucleus compared with nuclei images from CD. This result suggests that segmentation algorithms applied to the NLTD nuclei image would be less sensitive to the intensity threshold value, which could improve the accuracy and robustness of nuclei segmentation.

Figure 4

Evaluation of the nonlinear tissue-component discrimination (NLTD) method. (a–c) Nuclei intensity comparison between NLTD and color deconvolution (CD) approaches. Representative nuclei from several tissue types are shown, along with the NLTD and CD nuclei transformations. The intensity of each color space is integrated along the dotted lines shown, with the NLTD intensity shown in purple and the CD intensity in brown. Intensity values are normalized linearly between 0 and 1, with 0 corresponding to the minimum value in the input image, and 1 corresponding to the maximum. (d) Typical breast cancer image.33 (e) Example of segmentation results from Otsu thresholding of the nuclei NLTD color space. Detected nuclei are overlaid on top of the image from panel (d). True positives are represented by a green dot, false positives by a red dot, and false negatives by a yellow dot. (f) Precision, (g) sensitivity, and (h) F-score values for segmentation results from 35 images. (i) Receiver-operating characteristic curve for change in segmentation parameterization (threshold value) for nuclei detection. Recall (sensitivity) is shown on the x axis, with precision shown on the y axis. Results from the NLTD method are shown in black, with CD shown in gray.

To quantitatively examine the performance of the NLTD and CD methods in nuclei detection and segmentation, we applied the previously described detection algorithm (see more details in Materials and Methods section) after applying both color normalization methods (NLTD and CD) to a published set of 35 images.33 This data set included nuclei locations that had been previously registered by a pathologist and were used as ground truth (Figures 4d and e).

To assess each method, the precision, sensitivity, and F-score were measured. High precision and sensitivity are both valuable in a nuclei detection system. A system that lacks precision identifies too many nuclei, leading to unnecessary calculation and validation by an observer. Conversely, a system that is not sensitive will miss many nuclei and potentially distort the values of nuclei counts or exclude rare nuclei events, such as mitotic or atypical nuclei. The F-score is the harmonic mean of sensitivity and precision and serves as an overall measure of how accurate the system is.

Among the 35 images tested, we found that, overall, the images normalized using NLTD have significantly higher sensitivity in detecting nuclei than the corresponding CD images (NLTD=0.868; CD=0.753), but slightly lower precision (NLTD=0.938; CD=0.976) (Figures 4f–h). The overall accuracy, as represented by the F-score, for NLTD images is 0.860 and is significantly higher than the CD images (F-score=0.805). The slightly lower precision in our NLTD system correlates to an overdetection, with more nuclei identified by the NLTD detection system than the ground truth. The higher sensitivity, however, means that the NLTD detection system leaves fewer ground truth nuclei undetected. Taken together, these results suggest that the NLTD method is able to provide more accurate nuclei segmentation results, compared with conventional CD methods.

NLTD for Quantitative Immunohistochemistry Analysis

In addition to providing a platform for image appearance normalization and nuclei detection, the NLTD method can be used as a companion diagnostic for quantitative and objective analysis of immunohistochemical labeling. Pathologists use the intensity level of DAB chromogen labeling to assess the level of antigen presence in a tissue sample. We applied our method to an ovarian cancer tissue microarray cohort that had been immunolabeled for L1ORF1p, a cytoplasm-localizing protein associated with cancer31 (Figure 5 and Supplementary Figure S4). Each tissue sample in this cohort was scored by a trained pathologist using a discrete scoring system (0, 1, 2, or 3). A score of 0 indicates no significant protein expression, whereas a score of 3 was given for high expression. We applied our NLTD method to individual tissue images of the TMA to create component images for antigen- and nuclei-rich regions. These images were then used to calculate an overall score corresponding to the level of antigen, normalized by nuclei intensity (see more details in Materials and Methods section). Our results showed a strong correlation (Spearman’s ρ=0.8122) between our automated scoring platform and the scoring by the pathologist. Minor overlap exists between tissues with a score of 1 and 2, but both high expression (3) and very low expression (0) scores were well stratified. This result shows the utility of our NLTD method as a nonparametric tool to assess immunolabeling.

Figure 5

Nonlinear tissue-component discrimination (NLTD) method as a quantitative descriptor for immunohistochemistry (IHC). Ovarian tissue samples were stained with an antibody for LINE-1 ORF1p and manually scored by a pathologist31 on a discrete scale of 0 (no expression) to 3 (high expression). A quantitative score is calculated using the NLTD grayscale images. The scores correlate well, with a Spearman's ρ=0.8122.

DISCUSSION

CD23 and other associated methods9, 26, 27, 28, 29, 37 are routinely used for dye separation in histopathological images, but are limited by differences in dye appearance between images, potentially time-consuming automated image processing, and a need for further postprocessing to identify specific tissue components, such as the nuclei. The NLTD approach presented here is able to bypass these limitations, specifically the requirement of prior knowledge of color information for different batches of histopathological images. The NLTD approach makes no inherent assumptions about the histopathological image’s color space, and yields consistent, batch-invariant tissue component separation in histopathological images. We demonstrate that the NLTD method can successfully identify nuclei for a wide variety of histopathological images despite large variations in the perceptual color space (Figure 2). Importantly, no prior knowledge or user input is required, as our algorithm automatically registers the location of each tissue component, and the method can be used across multiple batches of images without additional user input. Therefore, the NLTD method can be seamlessly integrated in computational pathology pipelines that aim to analyze large cohorts of images, such as the TCGA project (http://cancergenome.nih.gov/) or the Human Protein Atlas Project.39 The TCGA project also provides the opportunity to link morphological features of the histopathological images with genomic information, with potential for a better understanding of the effect that changes in gene expression can have on the morphology of the tissue.

The tissue component images created through the NLTD method can be readily analyzed to yield additional information, such as nuclei information and immunohistochemical grading. We found that the NLTD method performs nuclei segmentation better than the CD approach. The segmentation approach presented here is based on a simple implementation of Otsu’s thresholding, but more refined approaches, as mentioned in reviews of computational pathology,19, 20 should lead to greater accuracy using tissue-component images from the NLTD method. We have demonstrated that the nuclei component images generated using the NLTD method have greater separation of signal from background compared with CD, suggesting that simpler processes for nuclei edge detection can be used, leading to significantly reduced segmentation times. Nuclei detection requires very fast computation because an individual tissue image can have millions of nuclei, leading to large increases in total processing time with each additional nuclei detection step.

The field of computational pathology is rapidly growing, and there are many opportunities for computational approaches to provide additional prognostic and diagnostic information that cannot be provided by pathologists alone.40, 41, 42 The NLTD method presented here provides a framework that can be easily implemented for many different applications, including nuclei detection and immunohistochemistry grading. In addition to these applications, NLTD could be used as a visualization tool to normalize tissue appearance across batches, provide texture information for abundance of certain tissue components in a sample, or identify rare occurrences in whole slide images, such as mitotic nuclei. Further, the NLTD method requires no prior knowledge of an image’s color space and requires no parameterization from the user, which can allow for pathologists or medical technicians to apply this approach without requiring more sophisticated knowledge that may be needed for optimization methods or complex, linear algebraic approaches. Taken together, the proposed NLTD method presents an opportunity to establish a pipeline for classification and analysis of histopathological images that, in combination with pathologists’ expertise, can lead to better diagnosis and treatment planning for patients in the future.