## Introduction

Rapid and accurate estimation of the viability of biological cells is important for assessing the impact of drugs, physical or chemical stimulants, and other potential factors on cell function. The existing methods to evaluate cell viability commonly require mixing a population of cells with reagents to convert a substrate to a colored or fluorescent product1. For instance, using membrane integrity as an indicator, live and dead cells can be separated by the trypan blue exclusion assay, where only nonviable cells are stained and appear as a distinctive blue color under a microscope2,3. MTT and XTT assays estimate the viability of a cell population by measuring the optical absorbance caused by formazan concentration due to alteration in mitochondrial activity4,5,6. Starting in the 1970s, fluorescence imaging has developed as a more accurate, faster, and reliable method to determine cell viability7,8,9,10. Similar to the principle of the trypan blue test, this method identifies individual nonviable cells by using fluorescent reagents only taken up by cells that have lost their membrane permeability barrier. Unfortunately, the step of exogenous labeling generally requires some incubation time for optimal staining intensity, making all these methods difficult to use for quick evaluation. Importantly, the toxicity introduced by stains eventually kills the cells and thus prevents long-term investigation.

Quantitative phase imaging (QPI) is a label-free modality that has gained significant interest due to its broad range of potential biomedical applications11,12. QPI measures the optical phase delay across the specimen as an intrinsic contrast mechanism and thus allows visualizing transparent specimens (e.g., cells and thin tissue slices) with nanometer-scale sensitivity, which makes this modality particularly useful for nondestructive investigations of cell dynamics (e.g., growth, proliferation, and mass transport) in both 2D and 3D13,14,15,16,17,18. In addition, the optical phase delay is linearly related to the non-aqueous content in cells (referred to as dry mass), which directly yields biophysical properties of the sample of interest19,20,21,22. More recently, with the concomitant advances in deep learning, we have witnessed exciting avenues for label-free imaging. In 2018, Google presented “in silico labeling”, a deep-learning-based approach that can predict fluorescent labels from transmitted-light (bright-field and phase-contrast) images of unlabeled samples23. Around the same time, researchers from the Allen Institute showed that individual subcellular structures such as DNA, cell membrane, and mitochondria can be obtained computationally from bright-field images24. Because a QPI map quantitatively encodes structure and biophysical information, it is possible to apply deep learning techniques to extract subcellular structures25,26, perform signal reconstruction27,28, correct image artifacts29,30, convert QPI data into virtually stained or fluorescent images31,32, and diagnose and classify various specimens33,34.

In this article, we demonstrate that a rapid viability assay can be conducted in a label-free manner using spatial light interference microscopy (SLIM)35,36, a highly sensitive QPI method, and deep learning. We apply the concept of phase imaging with computational specificity (PICS) to digitally stain for the live and dead markers. Working with live adherent HeLa and CHO cell cultures, we predict the viability of individual cells measured with SLIM by using a joint EfficientNet37 and transfer learning38 strategy. Using standard fluorescent viability imaging as ground truth, the trained neural network classifies the viability state of individual cells with 95% accuracy. Furthermore, tracking the cell morphology over time shows that unstained HeLa cells have significantly higher viability than cells stained with viability reagents. These findings suggest that the PICS method enables rapid, nondestructive, and unbiased cell viability assessment, potentially valuable to a broad range of biomedical problems, from drug testing to the production of biopharmaceuticals.

## Results

The procedure of image acquisition is summarized in Fig. 1. We employed spatial light interference microscopy (SLIM)35 to measure the quantitative phase map of cells in vitro. The system is built by attaching a SLIM module (CellVista SLIM Pro, Phi Optics, Inc.) to the output port of an existing phase-contrast microscope (Fig. 1a). By modulating the optical phase delay between the incident and the scattered field, a quantitative phase map is retrieved from four intensity images via phase-shifting interferometry39. SLIM employs a broadband LED as the illumination source and a common-path imaging architecture, which yields sub-nanometer sensitivity to optical pathlength changes and high temporal stability39,40. By switching to epi-illumination, the optical path of SLIM is also used to record the fluorescent signals over the same field of view. Detailed information about the microscope configuration can be found in Methods.

To demonstrate the feasibility of the proposed method, we imaged and analyzed live cell cultures. Before imaging, 40 μL of each cell-viability-assay reagent (ReadyProbes Cell Viability Imaging Kit, Thermo Fisher) was added into 1 mL of growth media, and the cells were then incubated for approximately 15 min to achieve optimal staining intensity. The viability-assay kit contains two fluorescently labeled reagents: NucBlue (the “live” reagent) stains the nuclei of all cells and can be imaged with a DAPI fluorescent filter set, and NucGreen (the “dead” reagent) stains only the nuclei of cells with compromised membrane integrity and is imaged with a FITC filter set. In this assay, live cells produce a blue-fluorescent signal, whereas dead cells emit both green and blue fluorescence. The procedure of cell culture preparation can be found in Methods.

After staining, the sample was transferred to the microscope stage and measured by SLIM and epi-fluorescence microscopy. In order to generate a heterogeneous cell distribution that shifts from predominantly alive to mostly dead cells, the imaging was performed under room conditions, such that the low temperature and imbalanced pH level in the media would injure the cells and eventually cause necrosis. Recording one measurement every 30 or 60 min, the entire imaging process lasted for approximately 10 h. We repeated this experiment four times to capture the variability among different batches. Figure 1b shows the SLIM images of HeLa cells measured at t = 1, 6, and 8.5 h, and the corresponding fluorescent measurements are shown in Fig. 1c, d. The results in Fig. 1 show that the adverse environmental conditions progressively injure the cells, with blebbing and membrane disruption observed during cell death. Our QPI measurements agree with the results reported in previous literature41. These morphological alterations are correlated with the changes in fluorescence signals, where the intensity of NucGreen (“dead” fluorescent channel) continuously increases as cells transition to dead states. By comparing the relative intensity between NucGreen and NucBlue signals, semantic segmentation maps are generated to label each individual cell as either live or dead, as shown in Fig. 1e. The procedure of generating the semantic maps can be found in Supplemental Note 1. All collected image sequences were combined to form a dataset for PICS training and testing, where each sequence is a time-lapse recording of cells from live to dead states. We then randomly split the sequences with a ratio of approximately 6:1:1 to obtain the training, validation, and testing datasets, respectively. We split the data by sequence rather than by frame, so that frames from one time-lapse recording never appear in more than one subset, which ensures a fair assessment of generalization. In addition, we combined data across all measurements to take underrepresented cellular activities into account, which makes the proposed method generalizable.
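The sequence-level split described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `split_by_sequence`, the sequence identifiers, and the random seed are hypothetical, and only the approximate 6:1:1 ratio comes from the text.

```python
import random

def split_by_sequence(sequence_ids, ratios=(6, 1, 1), seed=0):
    """Shuffle whole time-lapse sequences and split them into three subsets.

    Every frame of a sequence follows its sequence, so no recording
    contributes frames to more than one subset.
    """
    ids = sorted(set(sequence_ids))
    random.Random(seed).shuffle(ids)
    total = sum(ratios)
    n_train = round(len(ids) * ratios[0] / total)
    n_val = round(len(ids) * ratios[1] / total)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }

# Hypothetical example: 16 time-lapse sequences named seq00 ... seq15.
splits = split_by_sequence([f"seq{i:02d}" for i in range(16)])
```

Splitting a flat list of frames instead would place near-identical neighboring frames of one recording in both the training and testing sets, which tends to inflate the test score.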

### Deep neural network architecture, training, validation, and testing

With fluorescence-based semantic maps as ground truth, a deep neural network was trained to assign “live”, “dead”, or background labels to pixels in the input SLIM images. We employed a U-Net based on EfficientNet (E-U-Net)37, with its architecture shown in Fig. 2a. Compared to conventional U-Nets, the E-U-Net uses EfficientNet37, a powerful network of relatively lower complexity, as the encoding part. This architecture allows for learning an efficient and accurate end-to-end segmentation model, while avoiding training a very complex network. The network was trained using a transfer learning strategy38 with a finite training set. At first, the EfficientNet of E-U-Net (the encoding part) was pre-trained for image classification on a publicly available dataset ImageNet42. The entire E-U-Net was then further fine-tuned for a semantic segmentation task by using labeled SLIM images from the training and validation set.

The network was trained by updating the E-U-Net weights with an Adam optimizer43 to minimize a loss function computed on the training set. More details about the EfficientNet module and loss function can be found in the Methods and Supplemental Note 2. The network was trained for 100 epochs. At the end of each epoch, the loss of the network under training was evaluated on the validation set, and the weights that yielded the lowest validation loss were selected for the E-U-Net model. Figure 2d shows training and validation loss vs. the number of epochs, using 899 and 199 labeled images as training and validation datasets, respectively. The Methods section and Fig. 2a–c present more details about the E-U-Net architecture and network training.

To demonstrate the performance of phase imaging with computational specificity (PICS) as a label-free live/dead assay, we applied the trained network to 200 SLIM images not used in training and validation. Figure 3a shows three representative testing phase maps, whereas the corresponding ground truth and PICS predictions are shown in Fig. 3b, c, respectively. This direct comparison indicates that PICS successfully classifies the cell states. We found that, most often, the incorrect predictions were caused by cells located at the boundary of the FOV, where only a portion of their cell bodies was measured by SLIM. In addition, PICS may fail when cells become detached from the well plates. In this situation, the suspended cells appear out of focus, which gives rise to inaccurate predictions. As reported in previous publications, the conventional deep learning evaluation metrics focus on assessing pixel-wise segmentation accuracy, which overlooks some biologically relevant instances44. Here, we adopted an object-based evaluation metric, which compares, for each individual nucleus, the dominant semantic label in the prediction with that in the ground truth. The confusion matrix and the corresponding evaluation metrics (precision, recall, and F1 score) are shown in Table 1. A comparison with standard pixel-wise evaluation and the procedure of object-based evaluation are included in Supplemental Note 3. The entries of the confusion matrix are normalized with respect to the number of cells in each category. Using the average F1 score across all categories as an indicator of the overall performance, this PICS strategy achieves a score of 96.7% in distinguishing individual live and dead HeLa cells.
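To make the object-based metric concrete, the following sketch assigns each nucleus its dominant class and then computes per-class precision, recall, and F1 score from the per-nucleus labels. It is an illustration under stated assumptions, not the procedure of Supplemental Note 3: the function names, the way nucleus masks are obtained, and the toy labels are hypothetical. Class indices follow the convention in Methods (0 = live, 1 = dead, 2 = background).

```python
import numpy as np

LIVE, DEAD, BACKGROUND = 0, 1, 2

def dominant_label(label_map, nucleus_mask, n_classes=3):
    """Most frequent class among the pixels that belong to one nucleus."""
    counts = np.bincount(label_map[nucleus_mask].ravel(), minlength=n_classes)
    return int(np.argmax(counts))

def per_class_scores(true_labels, pred_labels, n_classes=3):
    """Precision, recall, and F1 for each class from per-nucleus labels."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    scores = {}
    for c in range(n_classes):
        tp = np.sum((true_labels == c) & (pred_labels == c))
        fp = np.sum((true_labels != c) & (pred_labels == c))
        fn = np.sum((true_labels == c) & (pred_labels != c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (float(precision), float(recall), float(f1))
    return scores

# Toy predicted label map and one hypothetical nucleus mask (top-left 2 x 2).
pred_map = np.array([[0, 0, 1], [0, 1, 2], [2, 2, 2]])
nucleus = np.zeros((3, 3), dtype=bool)
nucleus[:2, :2] = True
label = dominant_label(pred_map, nucleus)

# Toy per-nucleus labels for five cells (not data from the study).
scores = per_class_scores([0, 0, 0, 1, 1], [0, 0, 1, 1, 1])
mean_f1 = (scores[LIVE][2] + scores[DEAD][2]) / 2
```

Counting one vote per nucleus keeps a few mislabeled boundary pixels from affecting the score, which is the motivation given above for preferring an object-based metric over a pixel-wise one.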

### PICS on CHO cells

Chinese hamster ovary (CHO) cells are often used for recombinant protein production and have received U.S. FDA approval for bio-therapeutic protein production. Here, we demonstrate that our label-free viability assay approach is applicable to other cell lines of interest in pharmaceutical applications. CHO cells were plated on a glass-bottom 6-well plate for optimal confluency. In addition to NucBlue/NucGreen staining, 1 μM of staurosporine (an apoptosis-inducing reagent) solution was added to the culture medium. This potent reagent permeates the cell membrane, disrupts protein kinase and cAMP, and leads to apoptosis in 4–6 h. The cells were then measured by SLIM and epi-fluorescence microscopy. The cells were maintained in regular incubation conditions (37 °C and 5% concentration of CO2) throughout the experiment. In addition, we verified that the cells were not affected by necrosis and lytic cell death (see Supplemental Note 4). After image acquisition, the E-U-Net (EfficientNet-B7) was trained. In the training process, 1536 and 288 labeled SLIM images were used for network training and validation, respectively. The structure of EfficientNet-B7 and the training and validation loss can be found in Fig. S3a, b, respectively. The trained E-U-Net was finally applied to 288 unseen testing images to test the performance of the viability assay. The procedure of imaging, ground truth generation, and training was consistent with the previous experiments.

Figure 4a shows the time-lapse SLIM images of CHO cells measured at t = 0, 2, and 10 h after adding the apoptosis reagent, and the corresponding viability maps determined by fluorescence signal and PICS are plotted in Fig. 4b, c, respectively. In contrast to necrosis, the cell bodies became gradually fragmented during apoptosis. The visual comparison in Fig. 4 suggests that PICS yields good performance in extracting cell nuclei and predicting their viability state. Running an evaluation on individual cells, as shown in Table 2, the network gives an average F1 score of 94.9%. Again, the inaccurate predictions are mainly caused by cells at the boundary of the FOV. We also found rare cases where cells show features of cell death at an early stage45,46,47 but were identified as live by traditional fluorometric evaluation (for example, see Fig. S5 in the Supplemental Information). Furthermore, because most of the cells stay adherent, the PICS accuracy was not affected by cell confluence. The evaluation metrics at different confluence levels are included in Supplemental Note 4.

### PICS on unlabeled HeLa cells

Performing a viability assay on unlabeled cells circumvents the cell injury caused by exogenous staining and produces an unbiased evaluation. To demonstrate this feature, a fresh HeLa cell culture was prepared in a 6-well plate, transferred to the microscope stage, and maintained under room conditions. Half of the wells were mixed with viability assay reagents, and the viability of these cells was determined by both PICS and fluorescence imaging. The remaining wells did not contain reagents, such that the viability of these cells was only evaluated by PICS. The procedures for cell preparation and staining and the microscope settings were consistent with the previous experiments. We took measurements every 30 min, and the entire experiment lasted for 12 h.

Figure 5a, c shows SLIM images of HeLa cells with and without fluorescent reagents at t = 0, 2.5, and 12 h, respectively, whereas the resulting PICS predictions are shown in Fig. 5b, d. Supplemental Video 1 shows a time-lapse SLIM measurement, PICS prediction, and standard live-dead assay based on fluorescent measurements. Supplemental Video 2 shows HeLa cells without reagents. As expected, the PICS method depicts the transition from live to dead state. In addition, the visual comparison from Fig. 5a–d suggests that HeLa cells with viability stains in the media appear smaller in size and enter the injured state more rapidly than their label-free counterparts. Using TrackMate48, an ImageJ plugin, we were able to extract the trajectory of individual cells and track their morphology over time. As a result, the cell nuclear area and dry mass at each time point can be obtained by integrating the pixel values over the segmented area in the PICS prediction and the SLIM image, respectively. We successfully tracked 57 labeled and 34 unlabeled HeLa cells. Figure 5e, f shows the area and dry mass change (mean ± standard error), where the values are normalized with respect to the value at t = 0. Our tracking results agree with the physiological description49,50 and are consistent with previously reported experimental validations46,51. However, the short swelling time in the reagent-treated cells suggests that the toxicity of the chemical compounds may accelerate cell death. Running two-sample t-tests, we found a significant difference in cell nuclear area between the labeled and unlabeled cells between t = 2 h and t = 7 h (p < 0.05). Similarly, cell dry mass showed significant differences between the two groups between t = 2 h and t = 5 h (p < 0.05). In this study, we focus on optimizing the PICS performance in classifying live/dead markers at the cellular level. At the pixel level, the trained network can reveal the cell shape change, but its performance in capturing the nucleus shape and area is limited, which makes the current approach subject to segmentation error. This is largely due to the low contrast between the nucleus boundary and cytoplasm in injured cells.
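The area and dry mass extraction described above can be sketched as follows. The conversion from integrated phase to dry mass uses the standard QPI relation, dry mass = λ/(2πγ) × (sum of phase over the mask) × (pixel area), with a refractive increment γ of about 0.2 mL/g (equal to 0.2 µm³/pg). All numerical defaults here are assumptions, not values stated in this article: the 0.55 µm center wavelength is a placeholder, and the 0.325 µm sample-plane pixel is only an estimate from the 6.5 µm camera pixel, the 40× objective, and the 2× downsampling, ignoring any additional relay magnification.

```python
import numpy as np

def area_um2(mask, pixel_size_um=0.325):
    """Segmented area in square micrometers (pixel size is an assumed value)."""
    return float(mask.sum()) * pixel_size_um ** 2

def dry_mass_pg(phase_rad, mask, wavelength_um=0.55,
                pixel_size_um=0.325, gamma_um3_per_pg=0.2):
    """Dry mass in picograms from a phase map (radians) and a boolean mask."""
    phase_sum = float(phase_rad[mask].sum())
    scale = wavelength_um / (2.0 * np.pi * gamma_um3_per_pg)
    return scale * phase_sum * pixel_size_um ** 2

def normalize_to_first(values):
    """Normalize a time series by its value at t = 0, as in Fig. 5e, f."""
    values = np.asarray(values, dtype=float)
    return values / values[0]

# Toy example: a uniform 1 rad phase over a 10 x 10 pixel mask.
phase = np.ones((10, 10))
mask = np.ones((10, 10), dtype=bool)
mass = dry_mass_pg(phase, mask, pixel_size_um=1.0)
relative = normalize_to_first([40.0, 44.0, 30.0])
```

Because both quantities are computed over the predicted mask, any segmentation error propagates directly into the area and dry mass curves, which is the limitation noted above.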

Although the effect of the fluorescent dye itself on the optical properties of the cell at the imaging wavelength is negligible52,53,54,55, training on images of tagged cells may potentially alter the cell death mechanism and introduce bias when optimizing the E-U-Net. To investigate this potential concern, we performed a set of experiments where the unlabeled cells were imaged first by SLIM, then tagged and imaged by fluorescence for ground truth. As described in Supplemental Note 4, we found that the performance of PICS in this case was consistent with the results shown in Figs. 3 and 4, where SLIM was applied to tagged cells. The data indicated that the live and dead cells were classified with 99% and 97% sensitivity, respectively, suggesting that the proposed live-dead assay method can be used efficiently on cells that were never labeled. SLIM imaging of already stained cells, followed by fluorescence imaging, is a more practical workflow, as the input-ground truth image pairs can be collected continuously. On the other hand, training on unlabeled cells allows us to achieve a truly label-free assay, which is most valuable in applications.

## Discussion

We demonstrated PICS as a method for high-speed, label-free, unbiased viability assessment of adherent cells. This approach utilizes quantitative phase imaging to record high-resolution morphological structures of unstained cells, combined with deep learning techniques to extract intrinsic viability markers. Tested on HeLa and CHO adherent cultures, our optimized E-U-Net method achieves average F1 scores of 96.7% and 94.9%, respectively, in segmenting the cell nuclei and classifying their viability state. In Supplemental Note 5, we compared the E-U-Net accuracy with the outcomes from other networks or training strategies. By running the trained network on NVIDIA graphics processing units, the proposed label-free method enables real-time acquisition and viability prediction (see Supplemental Video 3 for a demonstration). One SLIM measurement and deep learning prediction take ~100 ms, which is approximately 8 times faster than the acquisition time required for fluorescence imaging with the same camera. In addition, the cell staining process itself takes time, approximately 15 min in our case. The real-time in situ feedback is particularly useful in investigating viability state and growth kinetics in cells, bacteria, and samples in vivo over extended periods of time56,57,58,59. Furthermore, our results suggest that PICS rules out the adverse effect on cell function caused by exogenous staining, which is beneficial for the unbiased assessment of cellular activity over a long time (e.g., many days). In principle, this approach can be applied to other cell types and cell death mechanisms.

Prior studies typically tracked QPI parameters associated with individual cells over time to identify morphological features correlated with cell death45,46,51. In contrast, our approach provides a real-time classification of cells based on single frames, which is a much more challenging and rewarding task. Compared to these previous studies, our PICS method avoids intermediate steps of feature extraction, manual annotation, and separate algorithms for training and cell classification. We employ a single DNN architecture with direct QPI measurement as input, and the prediction accuracy is significantly improved over the previously reported data47. The labels output by the network can be used to create binary masks, which in turn yield dry mass information from the input data. The accuracy of these measurements depends on the segmentation process. Thus, we anticipate that future studies will further optimize the segmentation algorithms to yield high-accuracy dry mass measurements over long periods of time.

Label-free imaging methods are valuable for studying biological samples without destructive fixation or staining. For example, by employing infrared spectroscopy, the bond-selective transient phase imaging measures molecular information associated with lipid droplets and nucleic acids60. In addition, harmonic optical tomography can be integrated into an existing QPI system to report specifically on non-centrosymmetric structures61. These additional chemical signatures would potentially enhance effective learning and produce more biophysical information. We anticipate that the PICS method will provide high-throughput cell screening for a variety of applications, ranging from basic research to therapeutic development and protein production in cell reactors11. Because SLIM can be implemented as an upgrade module onto an existing microscope and integrates seamlessly with fluorescence, one can implement this label-free viability assay with ease.

## Methods

### Cell preparation

HeLa cervical cancer cells (ATCC CCL-2™) and Chinese hamster ovary (CHO-K1, ATCC CCL-61™) cells were purchased from ATCC and kept frozen in liquid nitrogen. Before the experiments, we thawed the cells and cultured them in a T75 flask in Dulbecco’s Modified Eagle Medium (DMEM with low glucose) containing 10% fetal bovine serum (FBS), incubated at 37 °C with 5% CO2. When the cells reached 70% confluence, the flask was washed thoroughly with phosphate-buffered saline (PBS) and trypsinized with 3 mL of 0.25% (w/v) Trypsin EDTA for three minutes. Once the cells started to detach, they were suspended in 5 mL DMEM and passaged onto a glass-bottom 6-well plate to grow. To evaluate the effect of confluency on PICS performance, CHO cells were plated at three different confluency levels: high (60,000 cells), medium (30,000 cells), and low (15,000 cells). HeLa and CHO cells were then imaged after two days.

### SLIM imaging

The SLIM optical setup is shown in Fig. 1a. In brief, the microscope is built upon an inverted phase-contrast microscope with a SLIM module (CellVista SLIM Pro; Phi Optics) attached to the output port. Inside the module, a spatial light modulator (Meadowlark Optics) is placed at the system pupil plane via a Fourier transform lens to modulate the phase delay between the scattered and incident light. By recording four intensity images with phase shifts of 0, π/2, π, and 3π/2, a quantitative phase map, φ, can be computed by combining the four acquired frames in real time.
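As an illustration of how four phase-shifted frames determine a phase value, the sketch below applies the generic four-step phase-shifting formula. It assumes each frame follows the model I_k = a + b·cos(Δφ + δ_k) with δ_k = 0, π/2, π, 3π/2; the sign convention and the function name are assumptions. It recovers only the phase difference Δφ between the scattered and incident fields, and any further step that a full SLIM reconstruction uses to convert Δφ into the specimen phase map φ is omitted here.

```python
import numpy as np

def phase_difference(i_0, i_half_pi, i_pi, i_three_half_pi):
    """Four-step phase shifting for frames I_k = a + b*cos(dphi + delta_k).

    Under this model, I(3*pi/2) - I(pi/2) = 2b*sin(dphi) and
    I(0) - I(pi) = 2b*cos(dphi), so arctan2 returns dphi in (-pi, pi].
    """
    return np.arctan2(i_three_half_pi - i_half_pi, i_0 - i_pi)

# Synthetic check with a known phase difference (not measured data).
rng = np.random.default_rng(0)
true_dphi = rng.uniform(-3.0, 3.0, size=(8, 8))
shifts = (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)
frames = [2.0 + 1.0 * np.cos(true_dphi + d) for d in shifts]
recovered = phase_difference(*frames)
```

The background term a and the modulation depth b cancel in the two differences, which is why the result is insensitive to uniform illumination changes across the four frames.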

For both SLIM and fluorescence imaging, cultured cells were measured by a 40× objective, and the images were recorded by a CMOS camera (ORCA-Flash 4.0; Hamamatsu) with a pixel size of 6.5 μm. For each sample, we randomly selected a cellular region of approximately 800 × 800 µm² to be measured by SLIM and fluorescence microscopy (NucBlue and NucGreen). The acquisition times of each SLIM and fluorescence measurement are 50 ms and 400 ms, respectively, and scanning across all 6 wells takes roughly 4.3 min, where the delay is caused by mechanical translation of the motorized stage. For deep learning training and prediction, the recorded SLIM images were downsampled by a factor of 2. This step saves computational cost and does not sacrifice information content. We would like to point out that the acquisition of the fluorescence data is needed only for the training stage. For real-time inference, our acquisition rate is up to 15 frames per second for SLIM images, while the inference takes place in parallel.

### E-U-Net architecture

The E-U-Net is a U-Net-like fully convolutional neural network that performs an efficient end-to-end mapping from SLIM images to the corresponding probability maps, from which the desired segmentation maps are determined by the use of a softmax decision rule. Different from conventional U-Nets, the E-U-Net uses a more efficient network architecture, EfficientNet37, for feature extraction in the encoding path. Here, EfficientNet refers to a family of deep convolutional neural networks that possess a powerful capacity for feature extraction but require far fewer network parameters than other state-of-the-art network architectures, such as VGG-Net, ResNet, and Mask R-CNN. The EfficientNet family includes eight network architectures, EfficientNet-B0 to EfficientNet-B7, with increasing network complexity. EfficientNet-B3 and EfficientNet-B7 were selected for training the E-U-Net on HeLa cell images and CHO cell images, respectively, because they yielded the most accurate segmentation performance on the validation set among all eight EfficientNets. See Supplemental Note 2 and Fig. 2b, c for more details about EfficientNet-B3 and EfficientNet-B7.
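The softmax decision rule mentioned above can be illustrated with a short NumPy sketch: the per-pixel network outputs are converted to class probabilities, and each pixel takes the class with the highest probability. The array layout (height, width, class) and the function names are assumptions made for illustration.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def segment(logits):
    """Return per-pixel probabilities and the index of the most probable class."""
    probs = softmax(logits)
    return probs, probs.argmax(axis=-1)

# Toy network output for a 1 x 2 image with three classes per pixel.
logits = np.array([[[2.0, 0.1, -1.0], [0.0, 3.0, 0.5]]])
probs, labels = segment(logits)
```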

### Loss function and network training

Given a set of $$B$$ training images of $$M\times N$$ pixels and their corresponding ground truth semantic segmentation maps, the loss function used for network training is defined as the combination of focal loss62 and dice loss63:

$$L_{Focal\_loss}=-\frac{1}{B}\sum_{i=1}^{B}\frac{1}{MN}\sum_{x\in \varOmega }{\left[1-{y}_{i}{(x)}^{T}{p}_{i}(x)\right]}^{\gamma }\,{y}_{i}{(x)}^{T}{\log }_{2}{p}_{i}(x), \tag{1}$$

$$L_{Dice\_loss}=1-\frac{1}{3}\sum_{c=0}^{2}\frac{2T{P}_{c}}{2T{P}_{c}+F{P}_{c}+F{N}_{c}}, \tag{2}$$

$$L_{combined}=\alpha L_{Focal\_loss}+\beta L_{Dice\_loss}. \tag{3}$$

In the focal loss $$L_{Focal\_loss}$$, $$\varOmega =\{(1,1),(1,2),\ldots ,(M,N)\}$$ is the set of spatial locations of all the pixels in a label map. $${y}_{i}(x)\in \{{[1,0,0]}^{T},{[0,1,0]}^{T},{[0,0,1]}^{T}\}$$ represents the ground-truth label of the pixel x in the ith training sample, and the three one-hot vectors correspond to the live, dead, and background classes, respectively. Accordingly, the probability vector $${p}_{i}(x)\in {{\mathbb{R}}}^{3}$$ represents the corresponding predicted probabilities of belonging to the three classes. $${[1-{y}_{i}{(x)}^{T}{p}_{i}(x)]}^{\gamma }$$ is a classification error-related weight that reduces the relative cross-entropy $${y}_{i}{(x)}^{T}{\log }_{2}{p}_{i}(x)$$ for well-classified pixels, putting more focus on hard, misclassified pixels. In this study, γ was set to the default value of 2, as suggested in Ref. 62. In the dice loss $$L_{Dice\_loss}$$, $$T{P}_{c}$$, $$F{P}_{c}$$, and $$F{N}_{c}$$ are the numbers of true positives, false positives, and false negatives, respectively, over all pixels of viability class $$c\in \{0,1,2\}$$ in the B images. Here, c = 0, 1, and 2 correspond to the live, dead, and background classes, respectively. In the combined loss function, $$\alpha ,\beta \in \{0,1\}$$ are two indicators that control whether to use focal loss and dice loss in the training process, respectively. In this study, [α, β] was set to [1, 0] and [1, 1] for training the E-U-Nets on the HeLa cell dataset and CHO cell dataset, respectively. The choices of [α, β] were determined by the segmentation performance of the trained E-U-Net on the validation set.
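Equations (1) to (3) can be sketched in NumPy as follows. This is an illustrative reading of the definitions rather than the training code used in the study. Two choices are assumptions: the dice term uses soft counts (sums of predicted probabilities) for the true positives, false positives, and false negatives, which is a common way to keep the loss differentiable, and a small constant `eps` is added for numerical stability.

```python
import numpy as np

def focal_loss(y, p, gamma=2.0, eps=1e-7):
    """Eq. (1). y: one-hot labels, p: probabilities, both of shape (B, M, N, 3)."""
    p_true = np.clip(np.sum(y * p, axis=-1), eps, 1.0)  # y^T p for every pixel
    per_pixel = (1.0 - p_true) ** gamma * np.log2(p_true)
    return float(-per_pixel.mean())  # mean over B, M, N = (1/B) sum (1/MN) sum

def dice_loss(y, p, eps=1e-7):
    """Eq. (2) with soft TP/FP/FN counts accumulated over all B images."""
    axes = (0, 1, 2)
    tp = np.sum(y * p, axis=axes)
    fp = np.sum((1.0 - y) * p, axis=axes)
    fn = np.sum(y * (1.0 - p), axis=axes)
    dice = (2.0 * tp + eps) / (2.0 * tp + fp + fn + eps)
    return float(1.0 - dice.mean())  # mean over the three classes

def combined_loss(y, p, alpha=1, beta=1, gamma=2.0):
    """Eq. (3): alpha and beta switch the focal and dice terms on or off."""
    return alpha * focal_loss(y, p, gamma) + beta * dice_loss(y, p)

# Toy batch: B = 2 images of 4 x 4 pixels with random class labels.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=(2, 4, 4))
y = np.eye(3)[labels]                 # one-hot, shape (2, 4, 4, 3)
uniform = np.full(y.shape, 1.0 / 3.0)

perfect = combined_loss(y, y)         # prediction equals ground truth
focal_uniform = focal_loss(y, uniform)
```

With [α, β] = [1, 0] only the focal term is active, which matches the HeLa setting described above, whereas [1, 1] adds the dice term as in the CHO setting.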

The E-U-Net was trained with randomly cropped patches of 512 × 512 pixels drawn from the training set by minimizing the loss function defined above with an Adam optimizer43. For the Adam optimizer, the exponential decay rates for the 1st and 2nd moment estimates were set to 0.9 and 0.999, respectively, and the small constant ɛ for numerical stability was set to $$10^{-7}$$. The batch sizes were set to 14 and 4 for training the E-U-Nets on the HeLa cell images and CHO cell images, respectively. The learning rate was initially set to $$5\times 10^{-4}$$. At the end of each epoch, the loss of the E-U-Net under training was computed on the whole validation set. When the validation loss did not decrease for 10 training epochs, the learning rate was multiplied by a factor of 0.8. This validation loss-aware learning rate decay strategy helps mitigate the overfitting issue that commonly occurs in deep neural network training. Furthermore, data augmentation techniques, such as random cropping, flipping, shifting, and adding random noise and brightness, were employed to augment training samples on the fly to further reduce the overfitting risk. The E-U-Net was trained for 100 epochs. The parameter weights that yielded the lowest validation loss were selected and subsequently used for model testing and further model investigation.
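The learning rate decay and checkpoint selection described above can be sketched in plain Python. The initial rate (5 × 10⁻⁴), the factor (0.8), and the patience (10 epochs) come from the text. The function name is hypothetical, and the reading of "did not decrease" as "did not improve on the best validation loss so far, with the counter reset after each decay" is an assumption.

```python
def replay_schedule(val_losses, lr=5e-4, factor=0.8, patience=10):
    """Replay per-epoch validation losses.

    Returns the learning rate used in each epoch and the index of the epoch
    with the lowest validation loss (the checkpoint that would be kept).
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    rates = []
    for epoch, loss in enumerate(val_losses):
        rates.append(lr)
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # weights would be saved here
            wait = 0
        else:
            wait += 1
            if wait >= patience:  # no improvement for `patience` epochs
                lr *= factor
                wait = 0
    return rates, best_epoch

# Toy validation curve: improves twice, then plateaus for 25 epochs.
losses = [1.0, 0.5] + [0.6] * 25
rates, best_epoch = replay_schedule(losses)
```

In this toy run the rate drops to 4 × 10⁻⁴ after ten stagnant epochs and to 3.2 × 10⁻⁴ after ten more, while the kept checkpoint stays at the epoch with the lowest validation loss.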

The E-U-Net was implemented in Python 3.6 using TensorFlow 1.14. The model training, validation, and testing were performed on an NVIDIA Tesla V100 GPU with 32 GB of VRAM.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.