Deep transfer learning-based hologram classification for molecular diagnostics

Lens-free digital in-line holography (LDIH) is a promising microscopic tool that overcomes several drawbacks (e.g., limited field of view) of traditional lens-based microscopy. However, extensive computation is required to reconstruct object images from the complex diffraction patterns produced by LDIH. This limits LDIH utility for point-of-care applications, particularly in resource-limited settings. We describe a deep transfer learning (DTL) based approach to process LDIH images in the context of cellular analyses. Specifically, we captured holograms of cells labeled with molecular-specific microbeads and trained neural networks to classify these holograms without reconstruction. Using raw holograms as input, the trained networks were able to classify individual cells according to the number of cell-bound microbeads. The DTL-based approach including a VGG19 pretrained network showed robust performance with experimental data. Combined with the developed DTL approach, LDIH could be realized as a low-cost, portable tool for point-of-care diagnostics.

human experts for many data sets [15][16][17][18]. In this paper, we took a deep transfer learning (DTL) [19][20][21][22][23][24][25] approach to classify raw holograms and compared it with other ML schemes, including convolutional neural networks (CNN) [26][27][28]. DTL extracts feature information from input data using the convolutional part of pretrained networks and subsequently feeds the information to classifiers. Pretrained networks are known to be usable as general-purpose feature extractors 20. In this DTL approach, we used a VGG19 29 model that was pretrained with a large number of ordinary images (i.e., not holograms) available in ImageNet 30, and fine-tuned the classifier to obtain high-performance classification. We applied these approaches to classifying holograms generated from cells and microbeads without a reconstruction process. Specifically, algorithms were developed to i) automatically detect the holograms of cells labeled with microbeads, ii) classify detected cells according to the number of cell-bound beads, and iii) construct the histogram of the cell-bound beads from the entire hologram. We found that the DTL approach offered more reliable, robust, and efficient performance in hologram classification than the conventional CNN.

Results
System and assay setup. Figure 1A shows the schematic of the LDIH system 3 . As a light source, we used a light-emitting diode (LED; λ = 420 nm). The light passes through a circular aperture (diameter, 100 µm), generating a coherent spherical wave on the sample plane. The incident light and the scattered light from the sample interfere with each other to generate holograms which are then recorded by a CMOS imager 10,31 . The system has a unit (×1) optical magnification, resulting in a field-of-view equal to the imager size.
To enable molecular-specific cell detection, we used antibody-coated microbeads (diameter, 6 µm) for cell labeling. The number of attached beads is proportional to the expression level of a target marker, allowing for quantitative molecular profiling 3 . Diffraction patterns from unlabeled and bead-bound cells have subtle differences that are hard to detect with human eyes (Fig. 1B). Only after image reconstruction can beads and cells be differentiated and counted; cells have high amplitude and phase values, whereas microbeads have negligible phase values.
Reconstruction-free ML approaches. Conventional LDIH reconstruction ( Fig. 2A) requires multiple repetitions of back-propagation, constraint application, and transformation 8 . This iterative algorithm is computationally intensive, either incurring long processing time or requiring high-end resources (e.g., a high-performance graphical processing unit server) for faster results 3 . Furthermore, human curation is occasionally needed to correct for stray reconstruction (e.g., debris, twin images). In contrast, our ML-based approach is a reconstruction-free classification method (Fig. 2B). As an off-line task, we first built a training dataset by automatically detecting cell candidates and cropping them from the entire holograms. Then, we labeled each cropped hologram with the number of the cell-bound beads using reconstructed image as ground truth. Next, we trained a network using annotated holograms of bead-bound cells. After the training was complete, the network was used for on-line classification tasks; cell candidate holograms were detected and their holograms, without any image preprocessing, were entered as input for classification based on the number of cell-bound beads. Finally, the histograms of the cell-bound beads from the entire holograms were created for molecular diagnosis.
Both off-line and on-line tasks in the ML approach (Fig. 2B) required the automatic detection of holographic patterns of cells. To achieve this task, we implemented a computational method which identifies the center of individual diffraction patterns 32. First, the images of the gradient magnitude of the holograms were generated. Next, they were thresholded based upon their intensities. The converging directions of the gradients were used to estimate the positions of cell candidates in the holograms (Fig. 3A; see Methods). Using this method, we detected 2729 potential cell candidates from 31 holograms. The samples for these holograms were prepared by labeling SkBr3 breast carcinoma cells with polystyrene beads conjugated with control, EpCAM, and HER2 antibodies. Then, we reconstructed object images and cropped the holograms and the object images (270 × 270 pixels). We labeled the cropped holograms (Fig. 3C) and their reconstructed object images (Fig. 3D) with the number of beads attached to a cell (N B : 0, 1, 2, 3, ≥4). There were also images of floating beads, multiple cells, and artifacts, which were collectively labeled as 'background' (BG). The distribution of the classes in the final training set is shown in Fig. 3B.
Visualization of hologram features. We first tested the feasibility of the reconstruction-free classification by visualizing the features extracted from the holograms. Using the VGG19 pretrained model, we extracted features from the training set of holograms (Fig. 4A). Since VGG19 was trained using color images (RGB channels) and our data were grayscale, we copied the same image into each input channel of the VGG19 pretrained model. Then, using PCA (Principal Component Analysis), we reduced the feature dimension from 32,768 to 500 and visualized their distribution using t-SNE plots 33. In both holograms (Fig. 4B) and object images (Fig. 4C), each class of bead-bound cells was visually more segregated than in the cases where only the same PCA was applied without using VGG19 (Fig. 4D,E), suggesting that VGG19-based features of the holograms could discriminate the numbers of cell-bound beads.
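The 32,768-dimensional feature vector is consistent with flattening the last pooling output of VGG19 for a 270 × 270 input. The short sketch below checks that arithmetic. It assumes the standard VGG19 convolutional stack (five 2 × 2 max-pool stages with stride 2 and 512 channels in the final block) and that the cropped patches were fed to the network without resizing; the text does not state the latter, so treat it as an assumption. The function name is illustrative.

```python
def vgg19_feature_size(height, width, n_pool=5, out_channels=512):
    """Flattened size of the last pooling output of a VGG19-style stack.

    Assumes 'same'-padded convolutions (spatial size unchanged) and
    2x2 max pooling with stride 2, which floors odd sizes.
    """
    for _ in range(n_pool):
        height //= 2
        width //= 2
    return height * width * out_channels, (height, width)


# 270 -> 135 -> 67 -> 33 -> 16 -> 8, so the final grid is 8 x 8 x 512.
size, grid = vgg19_feature_size(270, 270)
print(grid, size)  # (8, 8) 32768
```

Under these assumptions, the PCA step compresses each 8 × 8 × 512 feature map to 500 components.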
Classification results by deep transfer learning. Using the features from VGG19-PCA or PCA alone, we trained the multilayer perceptron (MLP, Fig. 5A) separately for holograms or object images. Since the training data were imbalanced (Fig. 3B), we took the following approaches. To balance the training set of cell-bound beads, we applied data augmentation (rotation and zoom-in) to increase the data size in the case of N B ≥1 (see Methods). Then, to address the imbalance between bead-bound cells (N B : 0 to ≥4) and background (BG) data, we used a weighted cost function based on the proportion of bead-bound cells to BG data. We randomly split the whole dataset, consisting of 2729 cropped images from 31 holograms, into training, validation, and testing datasets with a 64:16:20 ratio. The model was selected based on the validation loss, and the model performance was evaluated based on the testing data. For statistical analysis, we repeated the training 20 times with different data splitting (see Methods for detail).
Since cells with two or more bead attachments are considered positive for a given target biomarker 3, we first performed the binary classification (N/P) based on the bead number (N B ): negative (N B ≤ 1) vs. positive (N B ≥ 2). The accuracies of VGG19-PCA-MLP in N/P were 90.2% for holograms and 93.4% for object images, whereas the accuracies of PCA-MLP were only 79.5% for holograms and 76.8% for object images (Fig. 5B and Supplementary Table 1). Since the background (BG) data are included in real situations, we also trained the classifiers after adding the BG class (N/P + BG). The accuracies remained similar to those of the N/P classification (VGG19-PCA-MLP: 90.4% for holograms and 92.3% for object images; PCA-MLP: 79.1% for holograms and 76.4% for object images) (Fig. 5B and Supplementary Table 1). The sensitivity and specificity also showed that VGG19-PCA-MLP outperformed PCA-MLP in all cases (Fig. 5C, Supplementary Tables 2 and 3).
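As a small worked example of the N/P evaluation, the sketch below maps bead counts to the binary labels (negative for N B ≤ 1, positive for N B ≥ 2) and computes sensitivity and specificity. The bead counts are made-up toy values, and the value 4 stands in for the ≥4 class.

```python
import numpy as np


def to_binary(n_beads, threshold=2):
    """1 (positive) if N_B >= threshold, else 0 (negative)."""
    return (np.asarray(n_beads) >= threshold).astype(int)


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)


# Toy bead counts (not experimental data).
true_nb = [0, 1, 2, 3, 4, 0, 2, 1]
pred_nb = [0, 2, 2, 3, 4, 0, 1, 1]
sens, spec = sensitivity_specificity(to_binary(true_nb), to_binary(pred_nb))
print(sens, spec)  # 0.75 0.75
```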
While this binary classification for the negative and positive cells can be applied to molecular diagnostics, the actual number of the beads and their distribution from a given patient sample can provide more detailed information including cancer stages and patient sub-types. Therefore, we trained the classifiers based on the number of the cell-bound beads. When we performed the classification using the number of beads (0, 1, 2, 3, ≥4),  To quantitatively compare the classification performance among all classification cases, we employed the Cohen's kappa coefficient 34   classifiers has better performance than N B classifiers since multi-category classification is more prone to error than binary classification. ii) The VGG19-PCA-MLP outperforms PCA-MLP in all cases. iii) While the classification using holograms showed good performance, the classification using object images performs marginally better than holograms. To see the performance of the multi-category classification more closely, we computed the average confusion matrices of the bead classification (N B , N B + BG). The prediction accuracy was high when the number of beads is 0 or ≥4, or the BG class, and the accuracies decreased when the bead number was between 1 and 3 ( Fig. 5F-I).
Since the high occurrence values of the confusion matrices were near the diagonals, the misclassification mainly happened among neighboring numbers for both holograms and object images. This property makes the molecular profiling from the entire holograms less susceptible to misclassification error. To compare the performances of different classifiers in our DTL approach, we also trained SVM (Support Vector Machine) and RF (Random Forest) models using the same VGG19-PCA features. In both N/P and N B classifications, MLP outperformed SVM and RF significantly (see p-values in Supplementary Tables 6 and 7). We overlaid the actual and predicted distributions of cell-bound beads from 18 different samples, each with more than 15 detected cell candidates, excluding BG (Fig. 6A). We also plotted the histograms of the actual and predicted numbers of the cell-bound beads in each sample (Fig. 6B). These show that the predicted bead proportions matched well with the actual distribution. Also, the mean difference between the actual and predicted proportions was within 5% (Fig. 6C). This suggests that our multi-category classification based on the number of the cell-bound beads can be used to characterize the molecular profiles of the cancer cell population from a patient sample.
Roles of VGG19 pretrained model. To evaluate the role of the pretrained model in our classification, we also trained a conventional convolutional neural network (CNN) de novo for N B classification (Fig. 7A). Whereas there were no statistical differences in accuracy, RCI, and kappa between the CNN and the VGG19-PCA-MLP (Fig. 7B, Supplementary Table 8), we observed that the validation loss and accuracy of the training curves of this CNN were far more fluctuating than those of the VGG19-PCA-MLP (Fig. 7C,D). Therefore, the standard deviation of the accuracy and the loss in the steady state of the CNN training was significantly larger than those of the VGG19-PCA-MLP (p-values: 2.30 × 10 −107 for the loss, and 1.03 × 10 −255 for the accuracy by two sample F-test) (Fig. 7E,F). Since the CNN has many more parameters to learn than VGG19-PCA-MLP, the cost function of the validation set may have many more local minima than that of the training set, which makes the validation loss and accuracy fluctuate during the training process. This suggests that DTL is more robust to the data variability and can produce a more generalizable classifier. The DTL also used significantly less computational resources than the CNN. For one-time training, the VGG19-PCA-MLP took 30% less time than the CNN (Fig. 7G; NVIDIA GTX 1080Ti was used). In the VGG19-PCA-MLP training, the majority of time was spent in the feature extraction (VGG-PCA) rather than MLP training (Fig. 7G). Once the features of the training set were extracted, the repeated training was highly efficient whereas the CNN training required the feature extraction in every step of the training. Optimizing VGG19-PCA-MLP was much more efficient compared to the CNN, which could allow for training VGG19-PCA-MLP with computationally limited POC devices. 
Moreover, after combining the automatic cell candidate identification and our DTL based prediction, it took 7.7 seconds to process the whole FOV image (3000 × 3500 pixels, the number of cell candidates: 100). In summary, these results demonstrate the feasibility of hologram classification without reconstruction, simplifying the workflow and decreasing the computational cost for a POC application.

Discussion
We have demonstrated that DTL approaches can effectively classify holograms of bead-bound cells without reconstructing original object images. The conventional reconstruction involves heavy computation, executing iterative phase recovery processes. Our DTL approach requires much less computational power, which could allow for POC devices to train and predict raw holograms. Intriguingly, our neural networks reliably handled overlapping interference patterns among cells or between cells and unbound beads. In our training set, the target cells were positioned at the centers of the images, and other cells or unbound beads were away from the image centers. More than 70% of intensity is concentrated in the first inner circle of a hologram, whereas interferences between two holograms usually happen in the fringes and have much weaker signal strength. Conceivably, the trained networks placed more weight on the hologram center, effectively ignoring fringe patterns.
Our DTL approach could offer appealing new directions to further advance LDIH: (i) deep learning-based training/classification can be executed at the local device level without complex computation; (ii) because it does not rely on high-resolution reconstructed images, the classification network is robust to experimental noise such as reconstruction errors or artifacts; and (iii) the network is elastic and can be continuously updated for higher accuracy in POC devices. With these merits, we envision that the developed ML networks will significantly empower LDIH, realizing a truly POC diagnostic platform.

Methods
Data Collection. Samples were prepared by labeling cancer cells (SkBr3, A431) with polystyrene beads (diameter, 6 µm). We prepared four different sets of beads. Three sets were conjugated with antibodies against different molecular targets: EGFR, EpCAM, and HER2; the fourth set was conjugated with control IgG antibodies. Aliquots of cells were labeled with each set of beads. Labeled cells, suspended in buffer, were loaded on a microscope slide, and their holograms were imaged using the LDIH system 3. To prepare the dataset for classification, we reconstructed object images from holograms using a previously developed algorithm 3. We cropped holograms (270 × 270 pixels) around the positions of the automatically detected cell candidates (see Cell Candidate Detection below). Three researchers manually annotated the holograms of the cropped cell candidates with the following labels: the number (0, 1, 2, 3, ≥4) of the beads attached to cells, an unattached bead, multiple cells, and artifacts. Later, we collectively labeled the beads unattached to cells, multiple cells, and artifacts as 'background'.
Cell Candidate Detection. We implemented computational methods which automatically localized the single-cell candidates in the hologram images based on the diffracted patterns of concentric circles in the holograms 32. The algorithm uses the fact that the gradient directions of holograms on concentric circles converge to the centers of the diffraction patterns. The detailed detection procedure is as follows: (1) We normalized the holograms by dividing the pixel values by the background intensity values and then rescaled them into the range [0, 255]. (2) We denoised the normalized hologram using Gaussian blurring with a 6-pixel size (MATLAB function imgaussfilt()). Then, we calculated the gradient direction and magnitude of the denoised holograms using the MATLAB built-in function imgradient() with the 'prewitt' method. (3) We thresholded the gradient magnitude images using a threshold value of 8.0, which removed the pixels with small gradient magnitude and generated a binary mask. Then, the gradient direction images were masked by the gradient magnitude binary mask. (4) Along each direction in the masked gradient direction images, the frequencies of the gradient directions were accumulated within a specified range (50-pixel length), which generated the frequency maps of the gradient direction. (5) We denoised the frequency accumulation map using Gaussian blurring with a 3-pixel size (MATLAB function imgaussfilt()). Then, we thresholded the denoised frequency accumulation map using the top 1% of the pixel values and located the center candidates. Finally, we cropped 270 × 270 hologram and object image patches around the detected candidate center positions.
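The five steps can be sketched in Python as below. This is not the MATLAB implementation used in the study: it swaps in scipy.ndimage for imgaussfilt() and imgradient(), reads the "6-pixel" and "3-pixel" sizes as Gaussian sigmas, estimates the background as the image median when none is given, and lets each pixel vote in both senses along its gradient line. All of these are assumptions, and the function name is illustrative.

```python
import numpy as np
from scipy import ndimage


def detect_cell_candidates(hologram, background=None, blur_sigma=6.0,
                           grad_threshold=8.0, vote_length=50,
                           vote_sigma=3.0, top_percent=1.0):
    """Return (row, col) centers where gradient directions converge."""
    img = np.asarray(hologram, dtype=float)
    # (1) Normalize by the background intensity and rescale to [0, 255].
    if background is None:
        background = np.median(img)  # assumption: scalar background estimate
    img = img / background
    img = 255.0 * (img - img.min()) / (img.max() - img.min())
    # (2) Denoise, then compute Prewitt gradients.
    img = ndimage.gaussian_filter(img, blur_sigma)
    gy = ndimage.prewitt(img, axis=0)
    gx = ndimage.prewitt(img, axis=1)
    mag = np.hypot(gx, gy)
    # (3) Keep only pixels with a strong gradient.
    rows, cols = np.nonzero(mag > grad_threshold)
    dy = gy[rows, cols] / mag[rows, cols]
    dx = gx[rows, cols] / mag[rows, cols]
    # (4) Each kept pixel votes along its gradient line, in both senses.
    acc = np.zeros_like(img)
    steps = np.arange(1, vote_length + 1)
    for sign in (1.0, -1.0):
        r = np.rint(rows[:, None] + sign * steps * dy[:, None]).astype(int)
        c = np.rint(cols[:, None] + sign * steps * dx[:, None]).astype(int)
        ok = (r >= 0) & (r < img.shape[0]) & (c >= 0) & (c < img.shape[1])
        np.add.at(acc, (r[ok], c[ok]), 1.0)
    # (5) Smooth the vote map, keep the top fraction, return blob centroids.
    acc = ndimage.gaussian_filter(acc, vote_sigma)
    mask = acc >= np.percentile(acc, 100.0 - top_percent)
    blobs, n_blobs = ndimage.label(mask)
    return ndimage.center_of_mass(acc, blobs, np.arange(1, n_blobs + 1))


# Synthetic concentric-ring pattern centered at (64, 64); not real data.
yy, xx = np.mgrid[0:128, 0:128]
hologram = 128.0 + 100.0 * np.cos(np.hypot(yy - 64, xx - 64) / 3.0)
centers = detect_cell_candidates(hologram)
```

Because scipy's Prewitt filter and MATLAB's imgradient() may scale gradients differently, the threshold of 8.0 would likely need retuning on real holograms.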
Labelling Training Set. Three annotators independently labeled cropped holograms and their corresponding object images. In order to balance the class distribution, we augmented the image data labeled with N B = 1, 2, 3, and ≥4. The augmentation was performed by two strategies: rotation with a range of [0, 40] and zooming-in with the maximum value, 0.2, using the Keras library.
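The two augmentation strategies can be illustrated without Keras as follows. This sketch uses scipy.ndimage as a stand-in and interprets the stated values as a rotation angle drawn from 0 to 40 degrees and a zoom-in that crops up to 20% of each side before resampling to the original size. Both interpretations are assumptions, and the sampling used by the Keras generator may differ.

```python
import numpy as np
from scipy import ndimage


def augment_patch(patch, rng, max_angle=40.0, max_zoom=0.2):
    """Randomly rotate (0..max_angle degrees) and zoom in (up to max_zoom)."""
    patch = np.asarray(patch, dtype=float)
    angle = rng.uniform(0.0, max_angle)
    out = ndimage.rotate(patch, angle, reshape=False, order=1, mode="nearest")
    zoom = rng.uniform(0.0, max_zoom)
    h, w = out.shape
    ch, cw = int(round(h * (1.0 - zoom))), int(round(w * (1.0 - zoom)))
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = out[top:top + ch, left:left + cw]
    # Resample the central crop back to the original patch size.
    return ndimage.zoom(crop, (h / ch, w / cw), order=1)


rng = np.random.default_rng(0)
patch = rng.random((270, 270))  # placeholder for a cropped hologram
augmented = [augment_patch(patch, rng) for _ in range(3)]
```

Keeping the crop centered preserves the property that the target cell sits at the center of each training patch.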

Machine Learning Classification.
Using the VGG19 pretrained model, we extracted image features from cropped holograms and object images. Since VGG19 was originally trained on color images (RGB channels) and our data were grayscale, the same data were used in each channel. For data preprocessing, we performed standard normalization, in which the mean value of each image patch was subtracted and the result was divided by the patch's standard deviation. After the features were extracted with VGG19, PCA (Principal Component Analysis) was performed to reduce the dimensionality of the data from 32,768 to 500. After the feature extraction step, we used an MLP (Multilayer Perceptron) consisting of three fully-connected neural network blocks for the classification. The first two blocks each have a fully-connected (FC) layer, a Batch Normalization layer, a ReLU activation, and a Dropout layer (parameter: 0.5). The FC layers in the first two blocks have sizes of 128 and 64, with an L2 norm regularizer (parameter: 0.05). The third block has an FC layer with 'softmax' activation. Support Vector Machine (SVM) and Random Forest (RF) classifiers were also applied to compare their performance with that of the MLP. The parameters of SVM and RF were optimized using the grid-search method from the sklearn package shown below.
The parameters of the grid search for RF
The parameters of the grid search for SVM
To show the role of the pretrained VGG19 model, we trained a CNN using the same dataset (Fig. 7A). The CNN has three feature extraction blocks, each consisting of two convolution layers and one max-pool layer (4 × 4). After this feature extraction, the same MLP structure was used for the classification.
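The preprocessing and dimensionality-reduction steps of this section can be sketched as follows. To keep the example light, the VGG19 feature extraction is replaced by a simple flattening placeholder, and the sizes are scaled down (32 × 32 patches, 10 principal components instead of 270 × 270 and 500). Both substitutions, the random input data, and the helper names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA


def standardize_patch(patch):
    """Subtract the patch mean and divide by the patch standard deviation."""
    patch = np.asarray(patch, dtype=float)
    return (patch - patch.mean()) / patch.std()


def to_three_channels(patch):
    """Copy a grayscale patch into three identical channels (H, W, 3)."""
    return np.repeat(patch[..., None], 3, axis=-1)


rng = np.random.default_rng(0)
patches = rng.normal(100.0, 20.0, size=(40, 32, 32))  # toy grayscale patches
inputs = np.stack([to_three_channels(standardize_patch(p)) for p in patches])

# Placeholder for the pretrained convolutional features (NOT VGG19 output).
features = inputs.reshape(len(inputs), -1)

pca = PCA(n_components=10, random_state=0)
reduced = pca.fit_transform(features)
print(inputs.shape, reduced.shape)  # (40, 32, 32, 3) (40, 10)
```

In a real pipeline the PCA would be fitted on the training split only and then applied unchanged to the validation and test features.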
Performance evaluation of the classifiers. We split the augmented dataset into three groups in a stratified fashion using the class labels: training, validation, and test sets (64:16:20). The training set (64%) was used for training the network. The validation set (16%) was used for model selection. After the training, the classification performance was evaluated using the test set (20%). For robust statistical analysis, we repeated the training 20 times. The performance measures were the accuracy, Cohen's Kappa coefficient (Kappa) 34, and relative classifier information (RCI) 36. Kappa is a standard metric for multi-categorical classification, and RCI, as an entropy-based measure, is also suitable for evaluating performance because it measures how much the classifier reduces the uncertainty relative to the prior class distribution. For the classifications involving negative and positive bead attachment, we also measured sensitivity and specificity using the sklearn.metrics Python package. The samples with positive bead attachment were treated as 'positive', and the other cases were treated as 'negative'. For the statistical testing, we used unpaired two-tailed Wilcoxon rank sum tests, which do not rely on the assumption of normality.
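For reference, both agreement measures can be computed from a confusion matrix as sketched below. Kappa follows the standard definition. The RCI code assumes the common entropy-based form, in which the prior entropy of the true classes is compared with the entropy that remains once the predicted class is known; the exact variant used in the study is not stated, so this is an assumption.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1.0)
    return cm


def cohen_kappa(cm):
    """(p_observed - p_expected) / (1 - p_expected)."""
    n = cm.sum()
    p_obs = np.trace(cm) / n
    p_exp = np.sum(cm.sum(axis=1) * cm.sum(axis=0)) / n ** 2
    return (p_obs - p_exp) / (1.0 - p_exp)


def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))


def relative_classifier_information(cm):
    """(H_prior - H_given_prediction) / H_prior (assumed definition)."""
    n = cm.sum()
    h_prior = entropy(cm.sum(axis=1) / n)
    h_post = 0.0
    for j in range(cm.shape[1]):
        col = cm[:, j]
        if col.sum() > 0:
            h_post += (col.sum() / n) * entropy(col / col.sum())
    return (h_prior - h_post) / h_prior


perfect = confusion_matrix([0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2], 3)
chance = confusion_matrix([0, 0, 1, 1], [0, 1, 0, 1], 2)
```

A perfect classifier gives Kappa = RCI = 1, while predictions that are independent of the true labels give 0 for both.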

Molecular Profiling.
To quantify the distribution of the proportion or the frequency of the number of attached beads (N B ), we chose 18 images, each with more than 15 detected cell candidates. For each image, we calculated the proportion and the frequency of the predicted and the actual numbers of attached beads in each hologram.
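The per-image comparison can be sketched as below. The bead counts are toy values for a single hypothetical image with 16 cells, the value 4 stands in for the ≥4 class, and the summary statistic is taken to be the mean absolute difference across the five bins. That last choice is an assumption, since the signed differences between two sets of proportions always sum to zero.

```python
import numpy as np


def bead_proportions(n_beads, n_bins=5):
    """Fraction of cells in each N_B bin (0, 1, 2, 3, >=4 encoded as 4)."""
    counts = np.bincount(np.asarray(n_beads), minlength=n_bins)[:n_bins]
    return counts / counts.sum()


# Toy labels for one image (not experimental data).
actual = [0] * 6 + [1] * 4 + [2] * 3 + [3] * 2 + [4] * 1
predicted = [0] * 5 + [1] * 5 + [2] * 3 + [3] * 1 + [4] * 2

p_actual = bead_proportions(actual)
p_predicted = bead_proportions(predicted)
mean_abs_diff = float(np.mean(np.abs(p_actual - p_predicted)))
print(mean_abs_diff)  # 0.05
```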
Comparison between VGG19-PCA-MLP and CNN. To compare the performance of VGG19-PCA-MLP and the CNN, the performance measures including accuracy, Kappa, and RCI were used as described above. Then, the fluctuations of the validation accuracy and loss were measured as follows: we selected the last 20 epochs of each training process and calculated the residuals by subtracting the sample mean value. We repeated the training twenty times with random data splitting. The statistical test for the difference of variance was performed using a two-sample F-test.
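A minimal sketch of this fluctuation analysis is given below. It assumes that the residuals of all runs are pooled before the test and that the p-value is two-sided; neither detail is stated in the text. The curves are synthetic stand-ins with different noise levels, not the measured validation curves, and the function names are illustrative.

```python
import numpy as np
from scipy import stats


def steady_state_residuals(curves, last=20):
    """curves: (n_runs, n_epochs). Subtract each run's mean over its tail."""
    tail = np.asarray(curves, dtype=float)[:, -last:]
    return (tail - tail.mean(axis=1, keepdims=True)).ravel()


def f_test_variance(a, b):
    """Two-sample F-test for equal variances; returns (F, two-sided p)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    f = np.var(a, ddof=1) / np.var(b, ddof=1)
    dfa, dfb = a.size - 1, b.size - 1
    p = 2.0 * min(stats.f.cdf(f, dfa, dfb), stats.f.sf(f, dfa, dfb))
    return f, min(p, 1.0)


# Synthetic validation-accuracy curves: 20 runs x 100 epochs each.
rng = np.random.default_rng(0)
cnn_curves = 0.8 + rng.normal(0.0, 0.05, size=(20, 100))   # noisier
dtl_curves = 0.8 + rng.normal(0.0, 0.005, size=(20, 100))  # steadier

f_stat, p_value = f_test_variance(steady_state_residuals(cnn_curves),
                                  steady_state_residuals(dtl_curves))
```

Pooling mean-subtracted residuals slightly overstates the degrees of freedom, so the resulting p-value should be read as approximate.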

Code Availability Statement
The code used in the current study is available from the corresponding author upon reasonable request.

Data Availability Statement
The datasets used in the current study are available from the corresponding author on reasonable request.