White blood cells (WBCs) are essential components of the immune system and of the body's protection against infections. The five key forms of WBCs are lymphocytes (including B and T cells), eosinophils, neutrophils, monocytes and basophils. In healthy individuals, these WBC populations have specific concentration ranges and any deviations from these parameters are clinically informative1.

Wright's staining of blood smears is one of the most common methods for detecting white blood cell aberrations2. However, it is time-consuming and laborious for a clinical pathologist to detect and identify individual cells in order to diagnose leukocyte abnormalities. Furthermore, the staining procedure is involved, which ties diagnosis to a clinical infrastructure.

Another way to calculate the relative percentages of WBCs is flow cytometry, in which fluorescently labeled antibodies are used to differentially mark WBC populations3. With long-established laboratory procedures, this approach is widely used in clinical practice. Modern cytometers, such as mass cytometers, can evaluate up to 40 parameters in a single measurement. However, like fluorescence-based cytometers, they are limited to analyzing cell phenotypes based on the expression level of antibody labels4.

Recent efforts to further facilitate these tasks by analyzing label-free white blood cells include the use of intensity-based imaging flow cytometry3,5, in which the benefits of digital microscopy, such as the measurement of morphology, are combined with the high throughput and statistical certainty of a flow cytometer. Using this technique, white blood cells can be detected, classified and counted. However, this method requires expensive equipment and involved sample preparation, has low resolution, and does not provide quantitative information on cellular components.

Quantitative phase imaging (QPI)6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25 is a label-free imaging method that can evaluate pathlength changes in biological samples at the nanometer scale. QPI has a variety of medical diagnostic applications26 : Di Caprio et al. have applied QPI to study sperm morphology27 ; Marquet et al. have used QPI to examine living neurons28; Lee et al. used it to investigate cell pathophysiology29; and Din et al. have used it to perform research on macrophages and hepatocytes30.

Phase imaging approaches typically use coherent light sources, which compromise multiple aspects of image quality, such as signal-to-noise ratio (SNR) and contrast, due to speckles. Spatial light interference microscopy (SLIM) overcomes this drawback by using a broadband field to derive nanoscale details and dynamics in live cells using interferometry31. We previously used color spatial light interference microscopy (cSLIM) to examine piglet brain tissue32,33. This system uses a brightfield objective and an RGB camera, and generates 4 intensity images, one of which is a regular color micrograph. Thus, cSLIM simultaneously produces both a brightfield image and a phase map. The phase map, φ(x,y), is a data matrix relating to the nanoarchitecture of the imaged sample.

There has recently been a surge of interest in using AI to analyze relevant datasets in medical fields34,35,36,37,38,39,40,41. AI has unique image processing capabilities allowing it to detect multi-dimensional features that qualified pathologists would otherwise miss. Deep convolutional networks enable thousands of image-related feature sets to be tested to recognize complex biological data42,43.

Here, we apply phase imaging with computational specificity (PICS)44,45,46, a novel microscopy technique that combines AI computation with quantitative data, to analyze WBCs in blood smears. Specifically, we combine deep learning networks with cSLIM micrographs to detect, classify and segment four types of white blood cells: neutrophils, lymphocytes, monocytes, and eosinophils. To the best of our knowledge, this is the first time such a strategy has been implemented. Such a system does not require staining of cells, as we convert phase maps, which contain biologically relevant data, into Wright’s stain brightfield images. This capability is valuable for standard clinical procedures that require the accurate assessment of WBCs without tedious preparations and extraneous labels.


Phase imaging with computational specificity

Our label-free SLIM scanner comprises custom hardware and in-house developed software. The cSLIM principle of operation relies on phase shifting interferometry applied to a phase contrast setup (see Ref.31 for details). We shift the phase delay between the incident and scattered field in increments of π/2 and acquire 4 respective intensity images, which is sufficient to compute the phase image unambiguously. Figure 1A shows the optical diagram of the cSLIM system. Figure 1B and C show an example of the four phase-shifted color frames and extracted quantitative image of a blood smear, respectively. Using our in-house software and the traditional ‘stop-and-stare’ scanning method, we acquire an individual frame in roughly 0.25 s, and capture a scan of 625 frames in 10 min. The samples used here are fixed blood smears and therefore don’t move regardless of acquisition speed.
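The four-frame acquisition described above can be illustrated with a generic four-step phase-shifting calculation. This is a minimal sketch under an assumed frame model, I_k = a + b·cos(Δφ + kπ/2), and it recovers only the phase difference Δφ between the two interfering fields; the full SLIM reconstruction is described in Ref.31 and is not reproduced here. The function name and the synthetic values are hypothetical.

```python
import numpy as np

def phase_from_four_frames(i0, i1, i2, i3):
    """Recover the phase difference from frames taken at shifts 0, pi/2, pi, 3pi/2.

    Assumed frame model (not taken from the paper): I_k = a + b*cos(dphi + k*pi/2).
    """
    # Under this model: I_3 - I_1 = 2b*sin(dphi) and I_0 - I_2 = 2b*cos(dphi),
    # so the four-quadrant arctangent returns dphi wrapped to (-pi, pi].
    return np.arctan2(i3 - i1, i0 - i2)

# Synthetic check with a known phase map (illustrative values only).
rng = np.random.default_rng(0)
true_phase = rng.uniform(-np.pi, np.pi, size=(4, 4))
a, b = 2.0, 0.5  # hypothetical background and modulation amplitudes
frames = [a + b * np.cos(true_phase + k * np.pi / 2) for k in range(4)]
recovered = phase_from_four_frames(*frames)
```

Because the background term a and the modulation b cancel in the ratio, four frames spaced by π/2 are enough to determine the wrapped phase without knowing either amplitude.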

Figure 1

Schematic setup for cSLIM. (A) The cSLIM module is attached to a commercial phase contrast microscope and uses a brightfield objective with an RGB camera. (B) The four phase-shifted color interferograms, with the initial unshifted frame corresponding to a brightfield image. (C) Computed SLIM phase image.

It should be noted that the cSLIM setup does not remove the halo artifact, as each of the four frames is acquired with the ring illumination. This artifact is therefore still present in the final quantitative map, causing spurious shading around low-frequency features.

Blood smear preparation

Eighteen blood smears stained with Wright’s stain were used for our analysis. Each smear was scanned in a configuration of 12 × 12–40 frames, depending on the size of the ‘zone of morphology,’ to avoid clumped areas at the application point as well as sparse areas near the feathered edge. Examples of brightfield and counterpart phase images (3 × 4 frame stitches) of the smears are shown in Fig. 2A, B, with zoomed-in instances of a neutrophil, monocyte, eosinophil, and lymphocyte in Fig. 2C–F. Each slide was imaged to produce both brightfield and quantitative channels. The annotation of cell boundaries and segmentations was performed manually in MATLAB using the ‘Image Labeler’ application. Basophils were omitted from the analysis because too few were present for training and validation. The ground truth classification of the WBCs was performed by a board-certified hematopathologist. The procedures used in this study for conducting experiments using human subjects were approved by the institutional review board at the University of Illinois at Urbana–Champaign (IRB Protocol Number 13900). Furthermore, all blood smear slides were received fully prepared and dissociated from patient statistics. All methods were carried out in accordance with relevant guidelines and regulations, and informed consent was obtained from all subjects and/or their legal guardians.

Figure 2

Blood smear images. (A) 3 × 4 stitch of brightfield images from a single scan. Scale bar: 200 µm. (B) SLIM images corresponding to (A). Examples of (C) a neutrophil, (D) a monocyte, (E) an eosinophil, and (F) a lymphocyte in brightfield and phase channels. Scale bars: 5 µm.

Image-to-image translation

Converting phase images to their brightfield counterparts is intended for situations in which an unstained slide is imaged or a slide is captured with a grayscale camera. The phase map created with cSLIM is equivalent to the images obtained in these situations. To our knowledge, a translation from QPI to Wright’s stain has not been achieved previously.

The conversion of SLIM micrographs to artificial brightfield images of stained WBCs was accomplished using a conditional generative adversarial network (GAN) called pix2pix47. GAN models have been very successful at converting micrographs from one modality to another34,48. The pix2pix model was chosen because converting SLIM images to brightfield counterparts is a purely pixel-wise transformation in intensity and retrieving quantitative data is not required. A total of 504 images were used for processing, of which 50 were held out as test images. The remaining images were split between training and validation in an 8:1 ratio. The pix2pix network has two components: a generator (G) and a discriminator (D). The task of the discriminator is to distinguish between a real image and the fake image generated by the generator. The generator and discriminator play an adversarial min–max game, and the GAN loss can be written as:

$$L_{cGAN} (G,D) = E_{x,y} [\log D(x,y)] + E_{x,z} [\log (1 - D(x,G(x,z)))]$$

where \(E_{x,y}\) and \(E_{x,z}\) are the expected values over real and fake instances, respectively, x is the input image, y is the corresponding real brightfield image, and z is the random noise. This trains the generator G to create artificial images intended to fool the discriminator. An L1 loss is also combined with this GAN loss for more stable training. An Adam optimizer49 with a learning rate of 0.0002 was used to train the generator and the batch size was set to 2. Input SLIM images were downsampled to 512 × 512 to fit the GPU memory.
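To make the combined objective concrete, the sketch below evaluates the GAN terms and the L1 term numerically for given discriminator outputs. It is an illustration only, not the training code used in this study: the function names are hypothetical, and the L1 weight `lam=100.0` is the default reported in the original pix2pix work, assumed here because the weight used in this study is not stated.

```python
import numpy as np

def discriminator_objective(d_real, d_fake, eps=1e-12):
    """Value of the cGAN loss, which the discriminator tries to maximize.

    d_real: D(x, y) on real pairs; d_fake: D(x, G(x, z)) on generated pairs.
    Both are probabilities in (0, 1); eps guards against log(0).
    """
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

def generator_objective(d_fake, fake_img, real_img, lam=100.0, eps=1e-12):
    """Quantity the generator tries to minimize: GAN term plus weighted L1 term.

    lam is an assumed weight (pix2pix default), not a value from this paper.
    """
    gan_term = np.mean(np.log(1.0 - d_fake + eps))
    l1_term = np.mean(np.abs(real_img - fake_img))
    return gan_term + lam * l1_term

# Toy values: a confident discriminator and a perfect reconstruction.
img = np.ones((2, 2))
d_value = discriminator_objective(np.array([0.9]), np.array([0.1]))
g_value = generator_objective(np.array([0.5]), img, img)
```

The L1 term penalizes per-pixel intensity differences between the generated and real brightfield images, which is what ties the generator output to the specific target rather than to any plausible-looking image.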

The semantic segmentation was performed in multiple steps, as shown schematically in Fig. 3. The EfficientDet model50 was used for localization and classification of the different WBC classes. A U-Net was used to generate binary segmentation maps of WBCs. The localizations and binary maps were combined to generate semantic segmentation maps through a process described below. The same test images were also used in these steps. The training and validation data were split randomly 5 times, and with each trained model the localization, classification and segmentation were performed on the same set of test images. Detailed descriptions of these steps are given below.
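The data partitioning described above (504 images, 50 held out for testing, the remainder split 8:1 into training and validation, repeated five times) can be sketched as follows. The function name, the seed, and the rounding rule are assumptions; with rounding, the 454 remaining images give 404 training and 50 validation images per partition, but the exact counts used in this study are not stated.

```python
import numpy as np

def make_partitions(n_images=504, n_test=50, train_to_val=8, n_repeats=5, seed=0):
    """Return a fixed test set and several random (train, val) index partitions."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_images)
    test_idx = indices[:n_test]        # held out once, shared by all repeats
    remaining = indices[n_test:]
    # An 8:1 ratio means one part in nine goes to validation (rounded, assumed).
    n_val = round(len(remaining) / (train_to_val + 1))
    partitions = []
    for _ in range(n_repeats):
        shuffled = rng.permutation(remaining)
        partitions.append((shuffled[n_val:], shuffled[:n_val]))  # (train, val)
    return test_idx, partitions

test_idx, partitions = make_partitions()
```

Keeping the test indices fixed while reshuffling only the remaining images is what allows the five trained models to be compared, and later combined, on identical unseen frames.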

Figure 3

Image processing. (A) The procedure for analyzing WBCs begins with an image-to-image translation with pix2pix from SLIM to brightfield. (B) The translated image is then passed to EfficientDet to locate and classify all cell types, (C) in parallel with a U-Net that produces binary masks of the WBCs. (D) Finally, combining both networks enables the semantic segmentation of different WBCs in each frame.

Localization and classification of WBCs

The localization and classification of white blood cells were performed by a state-of-the-art deep learning-based object detection model, EfficientDet50. The EfficientDet model took an image as input and predicted localization bounding boxes with associated WBC class labels. An example of a rectangular label is shown in Fig. S1A. In Fig. 3, a generic schematic is shown where the input of the EfficientDet model is a translated brightfield image. In this study, the architecture of the EfficientDet was specified as EfficientDet-D0, which uses EfficientNet-B0 as the backbone network for feature extraction. The weights of EfficientNet-B0 in the EfficientDet model were initialized with an EfficientNet-B0 that was pre-trained on the ImageNet dataset51 for an image classification task. The whole EfficientDet network was subsequently fine-tuned using the translated images in the training set. In the fine-tuning process, the EfficientDet was trained by minimizing a compound focal and smooth L1 loss that measures the classification errors and bounding box prediction errors, respectively50. The loss function was minimized with an Adam optimizer49 with a batch size of 8. The learning rate was set to \(5\times 10^{-5}\), which was determined based on the network performance on the validation set. The network training was stopped if the mean average precision (mAP) of the validation set did not increase for 10 consecutive epochs. The EfficientDet network weights that yielded the highest validation mAP during the training process were selected to establish the final EfficientDet model.
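The stopping and model-selection rule described above can be expressed independently of any deep learning framework. The sketch below takes a list of per-epoch validation mAP values and reports which epoch's weights would be kept and at which epoch training would stop; the function name and the example scores are hypothetical.

```python
def select_best_epoch(val_scores, patience=10):
    """Apply early stopping to a higher-is-better validation metric.

    Returns (best_epoch, stop_epoch): the epoch whose weights are kept and the
    last epoch that is actually run. Epochs are zero-indexed.
    """
    best_score = float("-inf")
    best_epoch = -1
    stop_epoch = len(val_scores) - 1  # default: training runs to the end
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            # `patience` consecutive epochs have passed without an increase.
            stop_epoch = epoch
            break
    return best_epoch, stop_epoch

# Illustrative mAP trace: improvement for three epochs, then a plateau.
example_scores = [0.1, 0.3, 0.5] + [0.4] * 12
best_epoch, stop_epoch = select_best_epoch(example_scores)
```

The same routine would apply to a validation loss by negating the scores and changing `patience`, since only the direction of improvement differs.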

The trained EfficientDet was tested on the unseen testing set with mAP as the performance metric. The mean classification and localization time per frame was 140 ms. For a more robust evaluation, the training process described above was repeated five times corresponding to the five random partitions of training and validation data. These five EfficientDet models were tested on the same unseen testing set. The corresponding outcomes are discussed in the Results section.

Binary segmentation of WBCs

For binary segmentation of WBCs, a U-Net network was used. The U-Net model took a translated brightfield image as input and predicted a binary map in which each pixel represented either the WBC class or the background class. An example of how cells were labeled for this is shown in Fig. S1B. In Fig. 3, a general schematic is shown where the input to the U-Net model is a translated brightfield image and the output is a binary segmentation map.

The architecture of the U-Net consists of 5 blocks in the contraction path and 4 blocks in the expansion path. Each block in the contraction path includes a convolutional layer, a max-pooling layer, and a BatchNorm layer; each block in the expansion path includes a convolutional layer, an up-sampling layer, and a BatchNorm layer.

The training and validation sets used for the U-Net were the same as those used for training the EfficientDet model. In this step, the ground truth for each image sample was the binary mask map of WBCs. The U-Net was trained by using an Adam optimizer49 to minimize the mean squared error loss that measures the difference between the ground truth segmentation map and the prediction of the U-Net. The learning rate and batch size were \(3\times 10^{-5}\) and 2, respectively. The validation loss was monitored during training. The network training was stopped if there was no decrease in validation loss for 5 consecutive epochs, and the weights corresponding to the lowest validation loss were selected. The training process was repeated five times based on the same partitions of training and validation data described previously. The testing dataset was the same as that described in the image-to-image translation section, and the mean processing time was 110 ms per frame.

Semantic map generation

In the previous two steps, the localization and classification of WBCs using the EfficientDet model and the binary segmentation by the U-Net model were described. This step combined the results of these two steps to generate semantic maps. In Fig. 3, for a given translated brightfield image, the outputs of the EfficientDet model and the U-Net model were combined to generate the semantic segmentation map. The semantic segmentation map was generated by combining the predicted labeled boxes and binary maps in a pixel-wise manner, as follows. Pixels outside cell regions in the binary map were classified into the background class. Pixels inside cell regions that also lay inside exactly one labeled box were classified into the WBC class of that box. If a pixel was contained within two or more labeled boxes, which was observed to be a very rare case in our studies, the pixel was classified into the WBC class of the labeled box with the highest confidence score. In all other cases, the pixels were assigned to the background class.
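The pixel-wise rules above can be sketched with array operations. This is a minimal illustration rather than the code used in this study: the label convention (0 for background, positive integers for WBC classes), the box format `(x0, y0, x1, y1, class_id, score)` with half-open pixel ranges, and the function name are all assumptions.

```python
import numpy as np

BACKGROUND = 0  # assumed convention: 0 = background, 1..4 = WBC classes

def build_semantic_map(binary_mask, boxes):
    """Combine a binary cell mask with labeled detection boxes.

    binary_mask: (H, W) boolean array, True inside predicted cell regions.
    boxes: iterable of (x0, y0, x1, y1, class_id, score); ranges are half-open.
    """
    semantic = np.full(binary_mask.shape, BACKGROUND, dtype=np.int64)
    best_score = np.full(binary_mask.shape, -np.inf)
    for x0, y0, x1, y1, class_id, score in boxes:
        region = np.zeros(binary_mask.shape, dtype=bool)
        region[y0:y1, x0:x1] = True
        # Label only cell pixels; where boxes overlap, the higher score wins.
        update = region & binary_mask & (score > best_score)
        semantic[update] = class_id
        best_score[update] = score
    return semantic

# Toy example: one 4 x 4 cell region and two overlapping boxes.
mask = np.zeros((6, 6), dtype=bool)
mask[1:5, 1:5] = True
boxes = [(0, 0, 3, 3, 1, 0.9), (2, 2, 6, 6, 2, 0.6)]
semantic = build_semantic_map(mask, boxes)
```

Cell pixels that fall in no box keep the background label, and pixels covered by both boxes take class 1 because its confidence (0.9) exceeds that of class 2 (0.6), regardless of the order in which the boxes are processed.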

With the five pairs of EfficientDet and U-Net models trained in the previous two steps, five semantic maps were generated for a given translated image. The five semantic maps were then combined into one by a pixel-wise majority-voting process to obtain a more reliable prediction. The combined semantic map was the final predicted semantic map for the image. Pixel-wise recall, precision, and F1 scores were used to evaluate the semantic segmentation performance of the proposed approach. These scores were computed between the predicted semantic maps and their ground truth values for all images in the unseen test set. The corresponding results are discussed in the Results section.
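The majority-voting and scoring steps can be sketched as follows. This is an illustrative implementation under assumptions: labels are small non-negative integers, ties in the vote go to the lowest class index (the tie-breaking rule used in this study is not stated), and the function names are hypothetical.

```python
import numpy as np

def majority_vote(maps, n_classes):
    """Fuse label maps of shape (n_models, H, W) into one (H, W) map."""
    maps = np.asarray(maps)
    # Count, for every pixel, how many models voted for each class.
    votes = np.stack([(maps == c).sum(axis=0) for c in range(n_classes)])
    return votes.argmax(axis=0)  # ties resolve to the lowest class index

def pixel_scores(pred, truth, class_id):
    """Pixel-wise precision, recall and F1 for one class."""
    tp = np.sum((pred == class_id) & (truth == class_id))
    fp = np.sum((pred == class_id) & (truth != class_id))
    fn = np.sum((pred != class_id) & (truth == class_id))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: five 1 x 3 maps from five models, and a ground truth map.
five_maps = [[[1, 0, 2]], [[1, 0, 2]], [[1, 1, 0]], [[2, 0, 2]], [[1, 1, 0]]]
fused = majority_vote(five_maps, n_classes=3)
precision, recall, f1 = pixel_scores(fused, np.array([[1, 1, 2]]), class_id=1)
```

In the toy example the fused map is `[[1, 0, 2]]`; for class 1 this gives one true positive and one false negative, so precision is 1.0, recall is 0.5 and F1 is 2/3.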


Our dataset included 504 images that were selected from the scans to include white blood cells. We analyzed 267 neutrophils, 117 eosinophils, 192 lymphocytes, and 82 monocytes. There were insufficient basophils to include in the set. Because of the natural proportions of white blood cells, with neutrophils comprising 40–60%, lymphocytes 20–40%, monocytes 2–8%, and eosinophils 1–4% of total WBCs52,53, it was difficult to obtain an equal number of cells in each category. Although the SLIM images contribute new structural information, the color contrast of some cellular components, such as WBC nuclei, is sometimes diminished.

In Fig. 4, a sample translated brightfield image, generated by the image-to-image translation model from an input SLIM image, is shown along with the corresponding original brightfield image. Visually, the two images appear nearly identical. Upon closer inspection, it can be seen that some of the WBC nuclei are not as dark in hue as in the original images, and the background is slightly grainy in some areas. Figure 4D shows the combined GAN and L1 loss plots of the generator. The model weights corresponding to the lowest validation loss were chosen to generate the translated brightfield images from the test set SLIM images. The quantitative results for localization, classification and segmentation are described in the following sections.

Figure 4

Image-to-image translations. (A) Example of an input SLIM micrograph. (B) Corresponding brightfield image that serves as ground truth. (C) The translated brightfield output of the model. (D) Loss plot of the training sessions, combining GAN and L1 losses. Scale bar: 25 µm.

Localization and classification

The mean and standard deviation of mAPs for the four categories (neutrophils, eosinophils, lymphocytes, and monocytes) corresponding to five sets of testing are shown in Table 1. For a comparative analysis, we employed the same strategy to train five EfficientDet models with annotated SLIM images and ground truth brightfield images, respectively. The corresponding results are shown in Table 1.

Table 1 Localization and Classification.

In general, the brightfield and translated images both produced better results than the SLIM images for the four WBC categories in terms of the average mAPs over the five test sets. An example of a labeled translated image is shown in Fig. 5A. In this case, there is an eosinophil in the bottom left corner that is recognized and correctly labeled with 99% certainty. In the top right corner, a neutrophil is correctly labeled with 91% certainty. These values pertain only to these specific cases and are not indicative of all cells of the respective types. The Precision-Recall curves for each WBC type in all tested translated images are shown in Fig. 5B, with comparisons of original image types in Fig. S2. The best performance is that of the eosinophils, with a precision score of 90.09%, likely due to a very distinctive red granular cytoplasm, and the lowest is that of lymphocytes, with 56.6%, likely because many large lymphocytes appear similar to small monocytes, even to a trained pathologist. These translated images were produced using only quantitative phase image input and can therefore be regarded as being equivalent to unlabeled blood smears.

Figure 5

Localization and classification. (A) Example of an input translated brightfield micrograph with an eosinophil and neutrophil located and classified with 99% and 91% certainty, respectively. (B) Precision-Recall curves for all cell categories in all images. Scale bar: 20 µm.


The results of the semantic segmentation on translated images are listed in Tables 2 and 3. These numbers are based on pixel-wise F1 scores. Eosinophils had the highest scores, with 86.7%, and neutrophils and lymphocytes had the lowest, with 68.5%, for reasons similar to those in the localization and classification tasks. These results support the reliability of the translation model in converting unseen, unstained quantitative phase images into typical Wright’s stain brightfield images.

Table 2 Semantic Segmentation.
Table 3 Majority Voting.

An example of a WBC segmentation is shown in Fig. 6 for SLIM, brightfield and translated cases. In this example there is a neutrophil in the top right corner, one in the bottom left corner, and a lymphocyte in the bottom left corner. Figure 6D–F are the predicted labels from the model, and Fig. 6G–I are the ground truth, with further examples presented in Fig. S3. Both brightfield and translated images have predicted labels similar to the ground truth, with all three cells correctly identified.

Figure 6

Semantic segmentation results. (A) Example SLIM input, (B) example brightfield input, and (C) example translated input. (D–F) Corresponding predicted labels, and (G–I) corresponding ground truth labels. Scale bars: 20 µm.


Here, we present evidence that our method of combining AI with color spatial light interference microscopy (cSLIM) can quickly identify four types of white blood cells, namely neutrophils, monocytes, lymphocytes and eosinophils, without manual analysis. This is an important contribution to blood smear analysis, especially given the significance and multitude of leukocyte disorders. We demonstrated that applying AI to cSLIM images delivers excellent performance in first artificially generating brightfield micrographs and afterwards localizing and classifying different WBCs. The results for all four categories indicate that the proposed method may be useful in quick screenings for cases of suspected leukocyte disorders.

Not only does this technique offer automatic screening, but multiple blood smear slides can also be evaluated rapidly, as the overall throughput of the cSLIM scanner is comparable with that of commercial whole slide scanners. Beyond simply adding more images, inferring additional information through digital staining would be one way to improve upon these results while keeping the samples label-free. This has recently been accomplished with phase images54, RI tomography55, and autofluorescence56. In our case, artificially recreating various fluorescent tags to identify specific components of white blood cells, such as a molecular tag for human neutrophil elastase (HNE), could help enhance our results. Future work includes evaluating more images with sufficient instances of basophils and bands to add to the current list of WBC categories, as well as imaging blood smears with specific leukocyte abnormalities, such as autoimmune neutropenia and leukemia.