Introduction

The cell types, basement membranes, and connective structures that organize tissues and tumors are present on length scales ranging from subcellular organelles to whole organs (<0.1 to >10⁴ µm). Microscopy using Hematoxylin and Eosin (H&E) complemented by immunohistochemistry1 has long played a primary role in the study of tissue architecture2,3. Moreover, histopathology remains the primary means by which diseases such as cancer are staged and clinically managed4. However, classical histology provides insufficient molecular information to precisely identify cell subtypes, study mechanisms of development, and characterize disease genes. High-plex imaging (Supplementary Table 1)5,6,7,8,9 of normal and diseased tissues (sometimes called spatial proteomics) yields subcellular-resolution data on the abundance of 20–60 antigens, which is sufficient to identify cell types, measure cell states (quiescent, proliferating, dying, etc.), and interrogate cell signaling pathways. High-plex imaging also reveals the morphologies and positions of acellular structures essential for tissue integrity in a preserved 3D environment. High-plex imaging methods differ in resolution, field of view, and multiplicity (plex), but all generate 2D images of tissue sections, which in current practice are usually 5–10 µm thick.

When multiplexed images are segmented and quantified, the resulting single-cell data are a natural complement to single-cell RNA sequencing (scRNASeq) data, which have had a dramatic impact on our understanding of normal and diseased cells and tissues10,11. Unlike dissociative RNASeq, multiplex tissue imaging preserves morphology and spatial information. However, high-plex imaging data are substantially more challenging to analyze computationally than images of cultured cells, which have been the primary emphasis of biology-focused machine vision systems to date. In particular, single-cell analysis of imaging data requires segmentation, a computer vision technique that subdivides an image by assigning class labels in an instance-wise or pixel-wise manner. The resulting segmentation mask is then used to quantify the intensities of different markers by integrating fluorescent signal intensities across each object (cell) identified by the mask or across a shape (usually an annulus) that outlines or is centered on the mask12. Extensive work has gone into the development of methods for segmenting metazoan cells grown in culture, but segmentation of tissue images is a more difficult challenge due to cell crowding and the diverse morphologies of different cell types. Recently, segmentation routines that use machine learning have become standard, paralleling the widespread use of convolutional neural networks (CNNs) in image recognition, object detection, and synthetic image generation13. Architectures such as ResNet, VGG16 and, more recently, UNet and Mask R-CNN14,15 have gained widespread acceptance for their ability to learn millions of parameters and generalize across datasets, as evidenced by excellent performance in a wide range of segmentation competitions as well as in hackathon challenges16 using publicly available image datasets17,18.

In both cultured cells and tissues, localizing nuclei is an optimal starting point for segmenting cells since most cell types have one nucleus (cells undergoing mitosis, muscle and liver cells, and osteoclasts are important exceptions), and nuclear stains with high signal-to-background ratios are widely available. The nucleus is generally quite large (5–10 µm) relative to the resolution of wide-field fluorescence microscopes (~0.5 µm for a 0.9 numerical aperture (NA) objective lens), making it easy to detect at multiple magnifications. Nuclei are also often found at the approximate center of a cell. There are advantages to using additional markers during image acquisition; for example, Schüffler et al.19 used multiplexed IMC data and watershed methods for multi-channel segmentation. However, it is not clear which proteins are sufficiently widely expressed across cell types and tissues to be useful in segmentation. Methods based on random forests, such as Ilastik and Weka20,21, exploit multiple channels by using an ensemble of decision trees to assign pixel-wise class probabilities in an image. However, random forest models have far less capacity for learning than CNNs, which is a substantial disadvantage. Thus, the possibility of using CNNs with multi-channel data to enhance nuclei segmentation has not been widely explored.

A wide variety of metrics are used to quantify the performance of segmentation routines. These can be broadly divided into pixel-level and instance-level metrics; the former measure overlap in the shape and position of segmentation masks at the pixel level, whereas the latter measure whether there is agreement on the presence or absence of a mask. The sweeping intersection over union (IoU; the Jaccard index)16 is an example of a pixel-level performance metric; it is calculated by measuring the overlap between a mask derived from ground truth annotation and a predicted mask as the ratio of the intersection of their pixels to their union. The greater the IoU, the higher the accuracy, with an ideal value of 1 (although this is very rarely achieved). The F1-score is an example of an instance-level metric that uses the weighted average of the precision (true positives normalized to predictions) and recall (true positives normalized to ground truth). A ‘positive’ in this case is commonly scored as ≥50% overlap (at the pixel level) between a predicted mask and the ground truth; the F1-score therefore accommodates substantial disagreement about the shape of the mask. In this context, it is important to note that supervised learning relies on the establishment of a ground truth by human experts. As described in detail below, for tissue imaging, the reported level of agreement among human experts for pixel-level annotation is only about 0.6 (that is, only ~60% of nuclei overlap at an IoU threshold of 0.6), suggesting that experts are themselves unable to determine the precise shape of segmentation masks (and the cells they represent). Not surprisingly, inter-observer agreement is substantially higher (0.7–0.9) when evaluated using an instance-level metric such as the F1-score because it is relatively simple to decide whether a nucleus is present or not. As mentioned above, segmentation masks in high-plex imaging are commonly used to compute the integrated intensities of antibodies against nuclear, cytoplasmic, and cell-surface proteins, and this places a premium on correctly determining the shape of the mask. Thus, the use of stringent pixel-level metrics such as IoU is essential for evaluating segmentation accuracy in single-cell analysis of multiplex tissue images.
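
For illustration, the pixel-level IoU between a ground truth mask and a predicted mask reduces to a few lines of code; the sketch below assumes binary NumPy masks and is not the benchmarking code used in this study.

```python
import numpy as np

def pixel_iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Pixel-level intersection over union (Jaccard index) of two binary masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # neither mask contains any foreground pixels
    return float(np.logical_and(a, b).sum()) / float(union)
```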

The accuracy of segmentation by humans and computational methods is crucially dependent on the quality of the original images. In practice, many images of human and murine tissues have focus artefacts (blur), and images of some cells are saturated (with intensities above the linear range of the camera). This is particularly true of whole-slide imaging, in which up to 1000 sequentially acquired image tiles are used to create mosaic images of specimens as large as several square centimeters. Whole-slide imaging is a diagnostic necessity22 and essential for achieving sufficient power for rigorous spatial analysis23. However, many recent papers addressing the segmentation of tissue images restrict their analysis to the clearest in-focus fields. This is logical because, in the setting of supervised learning, it is easier to obtain training data and establish a ground truth when images are clear and inter-observer agreement is high. In practice, however, all microscopy images of tissue specimens have issues with focus: the depth of field of objective lenses capable of high-resolution imaging (high-NA lenses) is typically less than the thickness of the specimen, so that objects above and below the plane of optimal focus are blurred. Images of human biopsy specimens are particularly subject to blur and saturation artefacts because the tissue sections are not always uniformly co-planar with the cover slip. Since most research on human tissues is incidental to diagnosis or treatment, it is rarely possible to reject problematic specimens outright. Moreover, reimaging of previously analyzed tissue sections is rarely possible due to tissue disintegration. Thus, image segmentation with real-world data must compensate for common image aberrations.

The most common way to expand training data to account for image artefacts is via computational augmentation24, which involves pre-processing images via random rotation, shearing, flipping, etc. This is designed to prevent algorithms from learning irrelevant aspects of an image, such as orientation. To date, focus artefacts have been tackled using computed Gaussian blur to augment training data25,26,27. However, Gaussian blur is only an approximation of the blurring inherent to any optical imaging system having limited bandpass (that is, any real microscope), plus the effects of refractive index mismatches and light scattering.
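
Computed blur augmentation of this kind is typically implemented by convolving in-focus images with Gaussian kernels of increasing width. The following is a minimal sketch assuming SciPy and grayscale NumPy images; the sigma values are illustrative, not the ones used in the experiments described below.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_blur_augment(image: np.ndarray, sigmas=(1.0, 2.0, 4.0)):
    """Yield computationally blurred copies of an in-focus image.

    Approximates defocus with an isotropic Gaussian kernel; real optical
    blur (point spread function, scattering, refractive index mismatch)
    is more complex, which motivates the real augmentations used here.
    """
    for sigma in sigmas:
        yield gaussian_filter(image.astype(np.float32), sigma=sigma)
```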

In this paper, we investigate ways to maximize the accuracy of image segmentation by machine learning algorithms in multiplexed tissue images containing common imaging artefacts. We generate a set of training and test data with ground-truth annotations via human curation of multiple normal tissues and tumors, and use these data to score the segmentation accuracy achieved by three deep learning networks, each of which was independently trained and evaluated: UNet, Mask R-CNN, and Pyramid Scene Parsing Network (PSPNet). The resulting models comprise a family of Universal Models for Identifying Cells and Segmenting Tissue (UnMICST) in which each model is based on the same training data but a different class of ML network. Based on our analysis, we identify two ways to improve segmentation accuracy for all three networks. The first involves adding images of nuclear envelope staining (NES) to images of nuclear chromatin acquired using DNA-intercalating dyes. The second involves adding real augmentations, defined here as intentionally defocused and over-saturated images (collected from the same specimens), to the training data to make models more robust to the types of artefacts encountered in real tissue images. We find that augmentation with real data provides a statistically significant improvement in model robustness over conventional Gaussian blur augmentation. Across a range of tissue types, the improvements from adding NES data and real augmentations are cumulative.

Results

Data sets and ground truth annotation of nuclear boundaries

One challenge in supervised machine learning on tissue images is a lack of sufficient freely available data with ground truth labeling. Experience with natural scene images14 has shown that the acquisition of labels can be time consuming and rate limiting28. It is also well established that cells in different types of tissue have nuclear morphologies that vary substantially from the spherical and ellipsoidal shapes observed in cultured cells29. Nuclear pleomorphism (variation in nuclear size and shape) is even used in histopathology to grade cancers30. To account for variation in nuclear morphology, we generated training, validation, and test datasets from seven different tissue and tumor types (lung adenocarcinoma, non-neoplastic small intestine, normal prostate, colon adenocarcinoma, glioblastoma, non-neoplastic ovary, and tonsil) found in 12 cores from EMIT (Exemplar Microscopy Images of Tissue31, RRID: SCR_021052), a tissue microarray assembled from clinical discards. The tissues contained cells with a wide range of nuclear morphologies: large and small, round and narrow, and densely or irregularly packed versus organized in clusters. A total of ~10,400 nuclei were labeled by a human expert for nuclear contours, centers, and background. In addition, two human experts labeled a second dataset from a whole-slide image of human melanoma32 to establish the level of inter-observer agreement and to provide a test dataset that was disjoint from the training data.

Evaluating the performance of ML segmentation algorithms and models

We implemented and then evaluated two semantic segmentation algorithms and one instance segmentation algorithm based on deep learning CNNs (UNet and PSPNet, and Mask R-CNN, respectively). Semantic segmentation is a coarse-grained ML approach that assigns each pixel to one of a set of trained classes, while instance segmentation is fine-grained and identifies individual instances of objects. We trained each of these models (UnMICST-U, UnMICST-P, and UnMICST-M, respectively) on manually curated and labeled data from seven distinct tissue types. The models were not combined but were tested independently in an attempt to determine which network exhibited the best performance.

We evaluated performance using both pixel- and instance-level metrics, including the sweeping intersection over union (IoU) threshold described by Caicedo et al.16, which is based on images of cell lines and implemented in the widely used COCO dataset33. The IoU (the Jaccard index) is calculated by measuring the overlap between the ground truth annotation and the prediction via the ratio of the intersection to the union of the pixels in the two masks. The IoU threshold is swept over a range of values from the least stringent (0.55) to the most stringent (0.8)16. Unlike a standard pixel accuracy metric (the fraction of pixels in an image that were correctly classified), IoU is not sensitive to class imbalance. IoU is a particularly relevant measure of segmentation performance for the analysis of high-plex images: when masks are used to quantify marker intensities in other channels, we are concerned not only with whether a nucleus is present at a particular location but also with whether the masks are the correct size and shape.

Instance-level metrics classify predicted objects as true positives (TP) if they overlap a ground truth object by 50% or greater; predictions and ground truth objects left unmatched are deemed false positives (FP) and false negatives (FN), respectively. The frequencies of these states are used to calculate the F1-score and the average precision (AP). The F1-score is the weighted average of precision (true positives normalized to predictions) and recall (true positives normalized to ground truth), and AP considers the number of true positives together with the total numbers of ground truth objects and predictions.
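
To make these definitions concrete, the sketch below greedily matches predicted instances to ground truth instances at a fixed IoU threshold and derives the F1-score and AP (in the TP/(TP + FP + FN) form used in cell segmentation benchmarks); it assumes lists of binary NumPy masks and simplifies the matching procedure used in published benchmarking code.

```python
import numpy as np

def instance_scores(gt_masks, pred_masks, iou_threshold=0.5):
    """Greedy matching of predictions to ground truth at a fixed IoU threshold.

    Returns (F1, AP), where AP = TP / (TP + FP + FN). Each prediction may
    claim at most one unmatched ground truth object.
    """
    matched, tp = set(), 0
    for p in pred_masks:
        best, best_iou = None, iou_threshold
        for i, g in enumerate(gt_masks):
            if i in matched:
                continue
            union = np.logical_or(p, g).sum()
            score = np.logical_and(p, g).sum() / union if union else 0.0
            if score >= best_iou:
                best, best_iou = i, score
        if best is not None:
            matched.add(best)
            tp += 1
    fp, fn = len(pred_masks) - tp, len(gt_masks) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    ap = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return f1, ap
```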

The accuracy expected for these methods was determined by having multiple human experts label the same set of data and then measuring the level of inter-observer agreement. We assessed inter-observer agreement using both the F1-score and sweeping IoU scores with data from whole-slide images of human melanoma32. For a set of ~4900 independently annotated nuclear boundaries, two experienced microscopists achieved a mean F1-score of 0.78 (Supplementary Information 1), and only 60% of nuclei agreed at an IoU threshold of 0.6. In the discussion, we compare these data to values obtained in other recently published papers and address the discrepancy between F1-scores and IoU values. We also discuss how these values might be increased to achieve super-human performance24,34.

Real augmentations increase model robustness to focus artefacts

To study the impact of real and computed augmentations on the performance of segmentation methods, we trained models with different sets of data involving both real and computed augmentations and then tested the resulting models on images that were acquired in focus, acquired out of focus, or blurred using a Gaussian kernel. Where dataset sizes were unbalanced, we supplemented the smaller sets with rotation augmentations. We assessed segmentation accuracy quantitatively based on IoU and qualitatively by visual inspection of predicted masks overlaid on image data. Real augmentation involved adding empirical, rather than computed, training data having the types of imperfections most commonly encountered in tissue images. This was accomplished by positioning the focal plane 3 µm above and below the specimen, resulting in defocused images. A second set of images was collected at long exposure times, thereby saturating 70–80% of pixels. Because blurred and saturated images were collected sequentially without changing stage positions, it was possible to use the same set of ground truth annotations. For computed augmentations, we convolved a Gaussian kernel with the in-focus images using a range of standard deviations chosen to cover a broad spectrum of experimental cases (Fig. 1a). In both scenarios, the resulting models were evaluated on a test set prepared in the same way as the training set.

Fig. 1: Comparing the use of real augmentations (defocused and overexposed images) and Gaussian blur.
figure 1

a Schematic diagram showing the approach used to compare test images on models trained with Gaussian-blurred or defocused image data. Higher-contrast probability maps signify more confidence; areas of interest are highlighted with red arrows. Corresponding probability maps indicate that a model trained with defocused images performs better on defocused test images than a Gaussian-blurred model. Scale bar denotes 20 μm. b Plots show that incorporating real augmentations (red curve) into the training set is statistically significantly superior to training sets with Gaussian blur (yellow curve) or without real augmentations (blue curve) for UnMICST-U, UnMICST-M, and UnMICST-P. Simulating defocused images with Gaussian blur is only marginally better than not augmenting the training data at all. c Comparison of UnMICST-U model accuracy when the training dataset size was held constant by replacing defocused augmentations (red curve) with 90 and 180° rotations (blue curve). Error bars are standard error of the mean.

In an initial set of studies, we found that models created using training data augmented with Gaussian blur performed well on Gaussian-blurred test data. However, when evaluated against test data involving defocused and saturated images, we found that Gaussian blur augmentation improved accuracy only slightly relative to baseline models lacking augmentations (Fig. 1b). In contrast, the use of training data supplemented with real augmentations increased the fraction of cells retained at an IoU threshold of 0.6 by 40–60%. Statistically significant improvement was observed up to an IoU cutoff of 0.8 with all three learning frameworks (UnMICST-U, UnMICST-M, and UnMICST-P models). To perform a balanced comparison, we created two sets of training data having equal numbers of images: the first contained the original data plus computed 90 and 180° rotations, and the second contained the original data plus defocused data collected from above and below the specimen. Again, we found that models trained with real augmentations substantially outperformed rotationally augmented models when tested on defocused test data (Fig. 1c). Thus, for all three deep learning architectures, training with real augmentation generated models that outperformed models trained with computed augmentation when tested on data containing commonly encountered artefacts.

Addition of NES improves segmentation accuracy

When we stained our TMA panel (the Exemplar Microscopy Images of Tissues and Tumors (EMIT) TMA), we found that antibodies against lamin A and C (Fig. 2a) (different splice isoforms of the LMNA gene) stained only approximately half as many nuclei as antibodies against lamin B1 (Fig. 2b) or lamin B2 (Fig. 2c) (products of the LMNB1 and LMNB2 genes). Staining for the lamin B receptor (Fig. 2e) exhibited poor image contrast. A pan-tissue survey showed that a mixture of antibodies against nucleoporin NUP98 (Fig. 2d) and lamin B2, conjugated to the same fluorophore (Alexa Fluor 647), generated nuclear envelope staining (NES) for nearly all nuclei across multiple tissues (Fig. 2f–h). We judged this to be the optimal antibody cocktail. However, only some cell types, epithelia in colorectal adenocarcinoma for example, exhibited the ring-like structure that is characteristic of the nuclear lamina in cultured epithelial cells. The nuclear envelope in immune and other cells has folds and invaginations35, and in our data, NES could be irregular and diffuse, further emphasizing the difficulty of finding a broadly useful NES stain in tissue.

Fig. 2: Comparing different nuclear envelope stains in colon adenocarcinoma.
figure 2

Showcasing a lamin A/C, b lamin B1, c lamin B2, d NUP98, and e the lamin B receptor in the same field of view. Lamin B1 and B2 appear to stain similar proportions of nuclei while lamin A/C stains fewer nuclei. The stain against the lamin B receptor was comparatively weaker. Lamin B2 (f) and NUP98 (g) are complementary and, when used in combination, maximize the number of cells stained. h Composite of lamin B2 (purple) and NUP98 (green). Scale bar denotes 100 μm.

The value of NES images for model performance was assessed quantitatively and qualitatively. In images of colon adenocarcinoma, non-neoplastic small intestine, and tonsil tissue, we found that the addition of NES images resulted in significant improvements in segmentation accuracy based on IoU with all three learning frameworks; improvements in other tissues, such as lung adenocarcinoma, were more modest and sporadic (Fig. 3a, Lung). For nuclear segmentation of fibroblasts in prostate cancer tissue, UnMICST-U and UnMICST-M models with NES data were no better than models trained on DNA staining alone. Most striking were cases in which NES data slightly decreased performance (UnMICST-P segmentation of prostate fibroblasts and UnMICST-U segmentation of glioblastoma). Inspection of the UnMICST-P masks suggested that the segmentation of well-separated fibroblast nuclei was already optimal with DNA images alone (~60% of nuclei retained at an IoU of 0.6), implying that the addition of NES images afforded little improvement. With UnMICST-U masks in glioblastoma, the problem appeared to involve atypical NES morphology, which is consistent with a high level of nuclear pleomorphism and the presence of giant cells, both of which are well-established features of high-grade glioblastoma36,37. We also note that NES data alone were inferior to DNA staining as a sole source of training data and should therefore be used in combination with images of DNA (Supplementary Information 2). Thus, adding NES to training data broadly, but not universally, improves segmentation accuracy.

Fig. 3: NES with DNA improves nuclear segmentation.
figure 3

NES – nuclear envelope staining. Assessment of the effect of adding NES as a second marker to DNA on segmentation accuracy on a per-tissue and per-model basis. a Variable IoU plots comparing the DNA-only model (blue curve) and the DNA + NES model (red curve) across frameworks. Adding NES increased accuracy for densely packed nuclei such as those in colon, small intestine, tonsil, and, to some extent, lung tissue. Error bars are standard error of the mean. b Representative grayscale images of tissues stained for DNA and NES comparing their variable morphologies, followed by UnMICST-U mask predictions (green) overlaid onto ground truth annotations (purple). In tissue with sparse nuclei, such as fibroblasts from prostate tissue, NES did not add a benefit beyond DNA alone. In tissues where NES does not exhibit the characteristic nuclear ring, as in glioblastoma, accuracy was similarly not improved. Scale bar denotes 20 μm.

Combining NES images and real augmentation has a cumulative effect

To determine whether real augmentation and NES would combine during model training to achieve superior segmentation precision relative to the use of either type of data alone, we trained and tested models under four different scenarios (using all three learning frameworks; Fig. 4). We used images from the small intestine, a tissue containing nuclei having a wide variety of morphologies, and then extended the analysis to other tissue types (see below). Models were evaluated on defocused DNA test data to increase the sensitivity of the experiment. In Scenario A, we trained baseline models using in-focus DNA image data and tested the models on unseen in-focus DNA images. With tissues such as the small intestine, which are challenging to segment because they contain densely packed nuclei, Scenario A resulted in slightly under-segmented predictions. In Scenario B and all subsequent scenarios, defocused DNA images were included in the test set, giving rise to contours that were substantially misaligned with ground truth annotations and to more pronounced under-segmentation. False-positive predictions and imprecise localization of the nuclear boundary were observed in areas devoid of nuclei and with very low contrast (Fig. 4a). When NES images were included in the training set (Scenario C), nuclear boundaries were more consistent with ground truth annotations, although false-positive predicted nuclei still remained. The most robust performance across ML frameworks and tissues was observed when NES images and real augmentation were combined (Scenario D): nuclear boundaries were generally well aligned with ground truth annotations in both shape and size. Observable differences in the placement of segmentation masks were reflected in improvements in IoU: for all three deep learning frameworks, including NES data and real augmentations increased the fraction of nuclei retained at an IoU threshold of 0.6 by 50% (Fig. 4b). The accuracy of UnMICST-P (blue curve) trained on in-focus DNA data alone was higher than that of the other two baseline models at all IoU thresholds, suggesting that UnMICST-P has a greater capacity to learn. UnMICST-P may therefore have an advantage in experiments in which staining the nuclear envelope proves difficult or impossible.

Fig. 4: Combination of NES and real image augmentations on segmentation performance.
figure 4

NES - nuclear envelope staining. a Models trained with in-focus DNA data alone produced probability maps that were under-segmented, especially in densely packed tissue such as small intestine (Scenario A). When tested on defocused data, nuclei borders were largely incorrect (Scenario B). Adding NES restored nuclei border shapes (Scenario C). Combining NES and real augmentations reduced false-positive detections and produced nuclei masks better resembling the ground truth labels (Scenario D). Scale bar denotes 20 μm. Table legend shows the conditions used for each scenario (A–D). Yellow arrow indicates a blurry cell of interest where accuracy improves with NES and real augmentation. b Graphs compare accuracy, represented as the number of cells retained across varying IoU thresholds, for all models from UnMICST-U (top), UnMICST-M (center), and UnMICST-P (bottom). In all models, more nuclei were retained when NES and real augmentations were used together during training (yellow curves) compared to using NES without real augmentations (red curves) or DNA alone (blue curves). Error bars are standard error of the mean.

Combining NES and real augmentation is advantageous across multiple tissue types

To determine whether improvements in segmentation would extend to multiple tissue types, we repeated the analysis described above using three scenarios for training and testing with both in-focus (Fig. 5a) and defocused images (Fig. 5b). Scenario 1 used in-focus DNA images for training (blue bars), scenario 2 used in-focus DNA and NES images (pink bars), and scenario 3 used in-focus DNA and NES images plus real augmentation (green bars). While the magnitude of the improvement varied with tissue type and test set (panel a vs. b), the results as a whole support the conclusion that including both NES and real augmentations during model training confers a statistically significant improvement in segmentation accuracy across multiple tissue types and models. The accuracy boost was greatest when baseline models performed poorly (e.g., in scenario 1, where models were tested on defocused colon image data; Fig. 5b, blue bars), so that segmentation accuracy became relatively uniform across tissue and cell types. As a final test, we re-examined the whole-slide melanoma image described above (which had not been included in any training data) and evaluated IoU, AP, and F1-scores. The data were consistent regardless of metric and showed that all three models benefitted from the inclusion of training data containing NES images and real augmentations (Supplementary Information 3). The improvement in accuracy, however, was modest and similar to that for lung adenocarcinoma. We attribute this to the fact that, like lung adenocarcinoma, melanoma has less dense regions, on which our baseline models already performed well.

Fig. 5: Assessing different training strategies on (a) in-focus and (b) defocused test data for different tissue types.
figure 5

a In all tissue types apart from GBM, the addition of NES (pink bars) and the use of real augmentations combined with NES (green bars) in the training data offered superior accuracy compared to using DNA alone (blue bars). b When the models were tested on defocused data, all tissues (unexpectedly, including GBM) showed benefits from using NES (pink bars) combined with real augmentations (green bars). The line plot indicates the highest accuracy achieved for each tissue when tested on in-focus data from panel (a).

Applying UnMICST to highly multiplex whole-slide tissue images

To investigate the overall improvement achievable with a representative UnMICST model, we tested UnMICST-U with and without real or computed augmentations and NES data on all six tissues as a set, including in-focus, saturated, and out-of-focus images (balancing the total amount of training data in each case). A 1.7-fold improvement in accuracy was observed at an IoU threshold of 0.6 for the fully trained model (i.e., with NES data and real augmentations; Fig. 6a). Inspection of segmentation masks also demonstrated more accurate contours for nuclei across a wide range of shapes. The overall improvement in accuracy was substantially greater than any difference observed between semantic and instance segmentation frameworks. We therefore focused subsequent work on the most widely used framework: UNet.

Fig. 6: Applying UnMICST models to highly multiplexed image data.
figure 6

a Accuracy improvement of UnMICST-U models trained with and without NES (nuclear envelope staining), as compared to DNA alone, and with real augmentations, as compared to computed blur (GB; Gaussian blur). To balance training dataset size, GB was substituted for NES data, and computed 90/180° rotations were substituted for real augmentations. Error bars are standard error of the mean. b A 64-plex CyCIF image of a non-neoplastic small intestine TMA core from the EMIT dataset. Dashed box indicates the region of interest for panels (d, e). c UMAP projection using single-cell staining intensities for 14 marker proteins (see methods). The color of the data points represents the intensity of E-cadherin (top left) or CD45 (bottom left) across all segmented nuclei. Density-based clustering using HDBSCAN identified distinct clusters (each denoted by a different color) that were positive for either E-cadherin or CD45, as well as a small number of double-positive cells (blue dashed circle). d Enlarged region of the yellow dashed box from b showing segmentation mask outlines (magenta) overlaid onto the DNA channel (green). e Composite image of DNA, E-cadherin, and CD45 in the same region. Nuclei centroids from segmentation are denoted by brown dots. Cells positive for both E-cadherin and CD45 (from the blue dashed circle in panel c) are marked with yellow arrows and yellow dots. Inset: enlarged view of the boxed region showing overlapping immune and epithelial cells.

We also tested a fully trained UnMICST-U model on a 64-plex CyCIF image of non-neoplastic small intestine tissue from the EMIT TMA (Fig. 6b). Staining intensities were quantified on a per-cell basis, and the results were visualized using Uniform Manifold Approximation and Projection (UMAP; Fig. 6c). Segmentation masks were found to be well-located, with little evidence of under- or over-segmentation (Fig. 6d). Moreover, whereas 21% of cells with segmented nuclei stained positive (as determined using a Gaussian-mixture model) for the immune cell marker CD45, and 53% stained positive for the epithelial cell marker E-cadherin, less than 3% were positive for both. No known cell type is actually positive for both CD45 and E-cadherin, and the very low abundance of these double-positive cells is evidence of accurate segmentation. When we examined some of the 830 double-positive cells (blue dashed circle in Fig. 6c), we found multiple examples of CD3+ T cells (yellow arrowheads; light yellow dots in Fig. 6e) tightly associated with, or lying between, the epithelial cells of intestinal villi (the green kiwi-like structure visible in Fig. 6e). This is consistent with the known role of the intestinal epithelium in immune homeostasis38. In these cases, the ability of humans to distinguish immune and epithelial cells relies on prior knowledge, multi-dimensional intensity features, and subtle differences in shape and texture, none of which were aspects of model training. Thus, future improvements in tissue segmentation are likely to require the development of CNNs able to classify rare but biologically interesting spatial arrangements, rather than simple extensions of the general-purpose segmentation algorithms described here.
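
Positive/negative marker calls of this kind are commonly made by fitting a two-component Gaussian-mixture model to log-transformed per-cell intensities and calling cells in the higher-mean component positive; the scikit-learn sketch below illustrates the idea, although the exact gating details used in this study may differ.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gate_positive(log_intensity: np.ndarray) -> np.ndarray:
    """Call cells positive/negative for one marker with a two-component GMM.

    log_intensity: 1D array of log-transformed per-cell mean intensities.
    Returns a boolean array in which cells assigned to the higher-mean
    mixture component are called positive.
    """
    x = log_intensity.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
    positive_component = int(np.argmax(gmm.means_.ravel()))
    return gmm.predict(x) == positive_component
```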

Some tissues still pose a challenge for nuclei segmentation

Of all the tissue types annotated and tested in this paper, non-neoplastic ovary was the most difficult to segment (Supplementary Information 4a), and the addition of ovarian training data to models trained on data from other tissues decreased overall accuracy (Supplementary Information 4b). We have previously imaged ovarian cancers at even higher resolution (60×/1.42 NA, sampled at a 108 nm pixel size)39 using optical sectioning and deconvolution microscopy; inspection of these images reveals nuclei with highly irregular morphology, poor image contrast, and dense packing (Supplementary Information 4c), unlike colon adenocarcinoma (Supplementary Information 4d). Thus, additional research, possibly involving different NES antibodies, will be required to improve performance with ovarian and other difficult-to-segment tissues. Until then, caution is warranted when combining training data from tissues with very different nuclear morphologies.

Discussion

This paper makes four primary contributions to the growing literature on the segmentation of tissue images, which is an essential step in single-cell data analysis. First, it explicitly considers training and test data that contain the types of focus and intensity artefacts that are commonly encountered in whole-slide images, particularly images of human tissues acquired in the course of clinical care and treatment. This contrasts with other recent papers that focus on optimal fields of view. Second, it shows that it is often possible to increase segmentation accuracy by including additional data (NES) on nuclear envelope morphology, and it proposes a broadly useful antibody cocktail. Third, and most significantly, it shows that the addition of real augmentations comprising defocused and saturated images to model training data improves segmentation accuracy to a significant extent, whereas augmentations based on Gaussian blurring provide substantially less benefit. These results extend to deep learning frameworks based on instance segmentation (UnMICST-M) and on semantic segmentation (UnMICST-U and UnMICST-P). Finally, using newly generated labeled training data for multiple tissue types, it shows that real augmentation and NES combine to improve the robustness and accuracy of segmentation across many tissues; these improvements are directly applicable to the real-world task of segmenting high-dimensional tissue and tumor images. The magnitude of the improvement observed from the inclusion of NES data or real augmentation is substantially greater than the differences observed between ML frameworks. UnMICST models therefore represent a good starting point for performing image segmentation on rapidly growing tissue data repositories. Errors remaining when multiplexed images are segmented using optimized UnMICST models appear to have a subtle biological basis. The development of additional physiology-aware machine-learning models may be necessary to reduce these apparent errors.

One of the surprises in the current work was the seemingly low level of agreement achieved by two human experts annotating the same image data; we estimated that only 60% of the nuclei annotated by both experts had an overlap of 60% or greater (an IoU threshold of 0.6). Poor agreement is almost certainly a consequence of our use of a stringent sweeping IoU scoring criterion that measures the fraction of pixels that overlap between two segmentation masks. The alternative, widely used F1-score, which determines whether two observers (or an observer and a machine) agree on the presence of a nucleus, yields an inter-observer and automated segmentation accuracy of 0.78, which is comparable to the highest F1-score for any tissue reported for Mesmer40, another deep learning model applied to tissue images. Moreover, our results with IoU values are similar to those recently reported by Kromp et al.17 (when IoU thresholds are adjusted to enable direct comparison). The authors of CellSeg41 also report comparable segmentation accuracies and note the difficulty of achieving a high IoU value with cells that vary dramatically in shape and focus.

It would therefore appear that many studies have achieved similar levels of inter-observer agreement and that our results are not an outlier, even though we include problematic data. This points to a fundamental challenge for all supervised learning approaches whose solution is not immediately clear. Collection of precise 3D data, followed by the imposition of different levels of blurring and the addition of intensity artefacts, will be needed to understand the origins of inter-observer disagreement in tissue images and to achieve higher-quality training and test data. It also seems likely that practical improvements in segmentation will come from combining recently described advances. For example, Greenwald et al.40 use a clever community-based approach to acquire much more training data than in the current work, Kromp et al.17 combine tissue images with ground truth annotation acquired from cultured cells (by a team of undergraduate students), whereas the current work focuses on the use of NES and real augmentations to improve the robustness of segmentation algorithms across the board.

From a machine learning perspective, the value of adding additional image channels to training data is self-evident; experimental feasibility is not always so clear. A key tradeoff is that the greater the number of fluorescence channels used for segmentation, the fewer the channels available for the collection of data on other markers. Fortunately, the development of highly multiplexed imaging has made this less relevant because the collection of 20–40 or more image channels (each corresponding to a different fluorescent antibody) has become routine. This makes it straightforward to reserve two channels for segmentation. The cost-benefit ratio of adding extra segmentation data will be different in high-content screening of cells in multi-well plates, for which inexpensive reagents are generally essential, than in tissue imaging. In tissues, the morphology of nuclear lamins changes with disease state42, cell type, activation state, and numerous other biological processes. While this complicates segmentation, imaging lamins is also likely to provide valuable biological information, further arguing for routine collection of these data43. To allow others to build on the current work, we are releasing all training and test images, their segmentation masks and annotations, and real augmentations for multiple types of tissue (tonsil, ovary, small intestine, and cancers of the colon, brain, lung, and prostate) via the EMIT resource; models are released as components of the UnMICST model resource (see data availability and code availability information).

The most immediately generalizable finding from this work is that real augmentation outperforms computed augmentation generated using Gaussian kernels. Blurring and image saturation are an inevitable consequence of the limited bandwidth of optical systems, the thickness of specimens relative to the depth of field, light scattering, diffraction, the use of non-immersion objective lenses and consequent refractive index mismatches, and a variety of other physical processes. Real out-of-focus blur also differs depending on whether the focal plane lies above or below the specimen. Future applications of real augmentation could include inhomogeneous light sources and stage jitter. It will undoubtedly be useful to determine kernels for more effective computed augmentation, but collecting real augmentation data imposes a minimal burden in a real-world setting. Our observation that real augmentation outperforms computed augmentation may also have general significance outside of the field of microscopy: with any high-performance camera system, real out-of-focus data will inevitably be more complicated than Gaussian blur.

Methods

Sample preparation for imaging

To generate images for model training and testing, human tissue specimens from multiple patients were used to construct a multi-tissue microarray (HTMA427) under an excess (discarded) tissue protocol approved by the Institutional Review Board (IRB) at Brigham and Women’s Hospital (BWH IRB 2018P001627). One or two 1.5 mm diameter cores were taken from tissue regions with the goal of acquiring one or two examples of different healthy or tumor types including non-neoplastic medical diseases and secondary lymphoid tissues such as tonsil. Slides were stained with reagents from Cell Signaling Technologies (Beverly MA, USA) and Abcam (Cambridge UK) as shown in Table 1.

Table 1 Antibodies used for immunofluorescence staining.

Before imaging, slides were mounted with 90% glycerol and a #1.5 coverslip. Prior to algorithmic evaluation, the images were split into three mutually disjoint subsets and used for training, validation, and testing.

Acquisition of image data and real augmentations

The stained TMA was imaged on an INCell 6000 (General Electric Life Sciences) microscope equipped with a 20×/0.75 NA objective lens (370 nm nominal lateral resolution at 550 nm wavelength) and a pixel size of 0.325 µm. Hoechst and lamin-A647 were excited with 405 and 642 nm lasers, respectively. Emission was collected with the DAPI (455/50 nm) and Cy5 (682/60 nm) filter sets with exposure times of 60 and 100 ms, respectively. Whole-slide imaging involved the acquisition of 1215 tiles with an 8% overlap, which is recommended for stitching in ASHLAR, a next-generation stitching and registration algorithm for large images (https://github.com/labsyspharm/ashlar). To generate defocused data, we acquired images from above and below the focal plane by offsetting the Z-axis by 3 µm in each direction. To generate saturated images of DNA staining, a 150 ms exposure time was used. These two types of suboptimal data were then used for real augmentation during model training, as described below.

Representative cores for lung adenocarcinoma, non-neoplastic small intestine, normal prostate, colon adenocarcinoma, glioblastoma, non-neoplastic ovary, and tonsil were extracted from image mosaics and down-sampled by a factor of 2 to match the pixel size of images routinely acquired and analyzed in MCMICRO31. Images were then cropped to 256 × 256-pixel tiles, and in-focus DNA and NES were imported into Adobe Photoshop to facilitate human annotation of nuclear boundaries. We labeled contours and background classes on separate layers while swapping between DNA and NES as necessary. To save time, we drew complete contours of nuclei and filled these in using the Matlab imfill operation to generate nuclei centers. For nuclei at the image borders where contours would be incomplete, we manually annotated nuclei centers. As described by Ronneberger et al. (2015), a fourth layer was used to mark areas between clumped cells. These additional annotations made it possible to specifically penalize models that incorrectly classified these pixels. During image review, we observed that certain nuclei morphologies appeared more frequently than others. To account for this imbalance, we annotated only characteristic nuclei of each tissue type in each image in an effort to balance the occurrence of nuclei shapes in our training, validation, and test sets. For example, small intestine and colon images displayed both round and elongated nuclei, and since the former shape was already present in other tissues (such as lung) in our dataset, we only annotated the latter shape for small intestine and colon tissues. Full dense annotations on a held-out test dataset were validated by a second annotator and measured using the F1-score. The F1-score evaluation between both annotated ground truths was high and demonstrated excellent agreement (Supplementary Information 1).
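
For readers working in Python rather than Matlab, the fill-and-subtract step described above can be sketched with SciPy as follows; the function is illustrative and assumes a binary mask in which each annotated nuclear contour forms a closed curve.

```python
import numpy as np
from scipy.ndimage import binary_fill_holes

def centers_from_contours(contour_mask: np.ndarray) -> np.ndarray:
    """Derive the nuclei-centers class from annotated closed contours.

    Analogous to the Matlab imfill step: fill each closed contour, then
    subtract the contour itself so that only the interior (the 'center'
    class) remains.
    """
    contours = contour_mask.astype(bool)
    filled = binary_fill_holes(contours)
    return np.logical_and(filled, np.logical_not(contours))
```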

Because original, defocused, and saturated images of DNA were all acquired in the same image stack, it was possible to use a single registered set of DNA annotations across all augmented image channels. To produce the training set, each image was cropped into 64 × 64 patches, normalized to use the full dynamic range, and further augmented using 90° rotations, reflections, and 20% upscaling. Consistent with the training set, the validation and test sets also include defocused and saturated examples but were not augmented with standard transformations. The ratio of data examples present in the training, validation, and test set split was 0.36:0.24:0.4. For a fair comparison across models, the same dataset and split were used for the three deep learning frameworks described in this manuscript (Supplementary Table 2).
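
A minimal sketch of this patch-extraction and augmentation step is shown below; it assumes NumPy arrays with DNA and NES stacked on the last axis, uses per-tile min-max scaling as one interpretation of the dynamic-range normalization, and omits the 20% upscaling augmentation for brevity.

```python
import numpy as np

def crop_and_augment(image: np.ndarray, patch: int = 64):
    """Tile an image into patch x patch crops, normalize each crop to the
    full dynamic range, and yield 90-degree rotations and reflections.

    image: 2D (DNA only) or 3D (DNA + NES on the last axis) array.
    """
    h, w = image.shape[:2]
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            tile = image[y:y + patch, x:x + patch].astype(np.float32)
            lo, hi = tile.min(), tile.max()
            tile = (tile - lo) / (hi - lo) if hi > lo else tile * 0.0
            for k in range(4):                      # 0/90/180/270 degrees
                rot = np.rot90(tile, k, axes=(0, 1))
                yield rot
                yield np.flip(rot, axis=1)          # mirror reflection
```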

Model implementation

To facilitate model training, three distinct state-of-the-art architectures were separately implemented, trained, and evaluated. They are, in no particular order, UNet, Mask R-CNN, and PSPNet, and were adopted from their original references without modification to their architectures. UNet was selected for its prior success in the biomedical domain, Mask R-CNN for its ability to perform both object detection and mask generation, and PSPNet for its capacity to integrate image features from multiple spatial scales. Training, validation, and test data were derived from 12 cores spanning 7 tissues and a total of 10,359 nuclei (colon: 1142; glioblastoma (GBM): 675; lung: 1735; ovarian: 956; fibroblast: 922; small intestine: 1677; tonsil: 3252). To maintain consistency of evaluation across segmentation algorithms, segmentation accuracy was calculated by counting the fraction of cells in a held-out test set that passed a sweeping Intersection over Union (IoU) threshold. The NES channel was concatenated with the DNA channel into a three-dimensional array used as input to each architecture.

UnMICST-U model training

A three-class UNet model14 was trained based on annotations of nuclei centers, nuclei contours, and background. The neural network comprises 4 layers and 80 input features. Training was performed using a batch size of 32 with the Adam optimizer and a learning rate of 0.00005, decayed at a rate of 0.98 every 5000 steps, until there was no improvement in accuracy or ~100 epochs had been reached. Batch normalization was used to improve training speed. During training, the bottom layer had a dropout rate of 0.35, and L1 regularization and early stopping were used to minimize overfitting44,45. Training was performed on workstations equipped with NVidia GTX 1080 or NVidia TitanX GPUs.
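
These optimizer settings translate directly into a standard training step; in the hedged PyTorch sketch below, a small stand-in network takes the place of the actual 4-layer UNet (defined in the released code), and the dropout and L1 terms are omitted.

```python
import torch
import torch.nn as nn

# Stand-in for the 4-layer UNet, included only so the sketch is runnable.
# Two input channels (DNA + NES); three output classes
# (centers, contours, background).
model = nn.Sequential(nn.Conv2d(2, 80, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(80, 3, 1))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
# Multiply the learning rate by 0.98 every 5000 optimizer steps, as above.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5000,
                                            gamma=0.98)

def train_step(images, labels):
    """One step; images: (B, 2, H, W) float, labels: (B, H, W) class ids."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()  # called per step so decay fires every 5000 steps
    return loss.item()
```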

UnMICST-M model training

Many segmentation models are based on the Mask R-CNN architecture15, which has previously exhibited excellent performance on a variety of segmentation tasks. Mask R-CNN begins by detecting bounding boxes of nuclei and subsequently performs segmentation within each box. This approach eliminates the need for an intermediate watershed, or equivalent, segmentation step; thus, Mask R-CNN directly calculates a segmentation mask, significantly reducing the overhead of traditional segmentation pipelines. We adopted a ResNet50 backbone46 in the UnMICST-M implementation and initialized the weights using pretrained values from the COCO object instance segmentation challenge33 to improve convergence. For efficient training, we upsampled the original input images to 800 × 800 pixels and trained the model for 24 epochs using a batch size of 8. The Adam optimizer, with a weight decay of 0.0001 to prevent overfitting, was used with a variable learning rate, initially set to 0.01 and decreased by a factor of 0.1 at epochs 16 and 22. Training was performed on a compute node cluster using 4 NVidia TitanX or NVidia Tesla V100 GPUs. For evaluation and comparison, we used the model with the highest performance on the validation set, following standard practice.
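
The optimization schedule above maps onto a torchvision Mask R-CNN roughly as follows; this is an illustrative reconstruction rather than the released UnMICST-M code, and the head replacement for the two-class (background/nucleus) task and the training-loop body are elided.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# COCO-pretrained Mask R-CNN with a ResNet50 backbone. For nucleus
# segmentation, the box/mask predictor heads would be replaced with
# two-class heads; that step is omitted in this sketch.
model = maskrcnn_resnet50_fpn(weights="DEFAULT")
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-4)
# Decrease the learning rate tenfold at epochs 16 and 22 of 24, as above.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[16, 22], gamma=0.1)

for epoch in range(24):
    # ... one pass over 800 x 800 upsampled training batches of size 8 ...
    scheduler.step()
```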

UnMICST-P model training

We trained a three-class PSPNet model47 to extract nuclei centers, nuclei contours, and background from a wide variety of tissue types. PSPNet is one of the most widely used CNNs for the semantic segmentation of natural scene images in the computer vision field. The network employs a so-called pyramid pooling module whose purpose is to learn global as well as local features. The additional contextual information used by PSPNet allowed the segmentation algorithm to produce realistic probability maps with greater confidence. We used ResNet101 as a backbone. Training was performed using a batch size of 8 with an image size of 256 × 256 pixels for 15,000 iterations or until the minimum-loss model was obtained. A standard cross-entropy loss function was used during training. Gradient descent was performed using the Adam optimizer with a learning rate of 0.0001 and a weight-decay parameter of 0.005 via L2 regularization. Batch normalization was employed for faster convergence, and a dropout probability of 0.5 was used in the final network layer to mitigate overfitting. Model training was performed on a compute cluster node equipped with NVidia Tesla V100 GPUs.
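
A comparable configuration can be sketched with the segmentation_models_pytorch package, although that library choice is an assumption made for brevity; the original implementation follows the PSPNet reference directly.

```python
import torch
import segmentation_models_pytorch as smp

# Illustrative stand-in for the UnMICST-P configuration: PSPNet with a
# ResNet101 backbone, two input channels (DNA + NES), and three output
# classes (centers, contours, background).
model = smp.PSPNet(encoder_name="resnet101", in_channels=2, classes=3)
criterion = torch.nn.CrossEntropyLoss()
# Adam with L2 regularization expressed through the weight_decay parameter.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=0.005)
```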

Analysis of multi-dimensional data

For the analysis shown in Fig. 6, a 64-plex CyCIF image of non-neoplastic small intestine tissue from the EMIT TMA (https://www.synapse.org/#!Synapse:syn22345748/) was stained with a total of 45 antibodies as described in the protocols at https://www.protocols.io/view/ffpe-tissue-pre-treatment-before-t-cycif-on-leica-bji2kkge and https://doi.org/10.17504/protocols.io.bjiukkew. Images were segmented using the UnMICST-U model trained on DNA with NES data and real augmentations. Mean fluorescence intensities across 45 markers for 27,847 segmented nuclei were quantified as described in ref. 31. E-cadherin-positive and CD45-positive cells were identified using Gaussian-mixture models on log-transformed data. For multivariate clustering, log-transformed mean intensities of all single cells for 14 selected protein markers (E-cadherin, pan-cytokeratin, CD45, CD4, CD3D, CD8, RF3, PML, GLUT1, GAPDH, TDP43, OGT, COLL4, and EPCAM) were pre-processed using Uniform Manifold Approximation and Projection (UMAP)48 and clustered using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN)49. Clusters expressing high levels of both E-cadherin and CD45 were identified and overlaid on a false-colored image showing the staining of DNA, E-cadherin, and CD45.
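
A minimal sketch of this projection-and-clustering step, using the umap-learn and hdbscan packages, is shown below; the log1p transform and the min_cluster_size value are illustrative assumptions rather than the exact parameters used here.

```python
import numpy as np
import umap
import hdbscan

def cluster_cells(intensities: np.ndarray):
    """Cluster single cells from per-cell mean marker intensities.

    intensities: (n_cells, n_markers) array, e.g., the 14 selected markers.
    Returns the 2D UMAP embedding and HDBSCAN cluster labels.
    """
    x = np.log1p(intensities)                     # log-transform intensities
    embedding = umap.UMAP(random_state=0).fit_transform(x)
    labels = hdbscan.HDBSCAN(min_cluster_size=50).fit_predict(embedding)
    return embedding, labels                      # label -1 marks noise points
```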