Deep Cytometry: Deep learning with Real-time Inference in Cell Sorting and Flow Cytometry

Deep learning has achieved spectacular performance in image and speech recognition and synthesis. It outperforms other machine learning algorithms in problems where large amounts of data are available. In the area of measurement technology, instruments based on the photonic time stretch have established record real-time measurement throughput in spectroscopy, optical coherence tomography, and imaging flow cytometry. These extreme-throughput instruments generate approximately 1 Tbit/s of continuous measurement data and have led to the discovery of rare phenomena in nonlinear and complex systems as well as new types of biomedical instruments. Owing to the abundance of data they generate, time-stretch instruments are a natural fit for deep learning classification. We previously showed that high-throughput, label-free cell classification with high accuracy can be achieved by combining time-stretch microscopy, image processing, and feature extraction, followed by deep learning, to find cancer cells in the blood. Such a technology holds promise for early detection of primary cancer or metastasis. Here we describe a new deep learning pipeline that entirely avoids the slow and computationally costly signal processing and feature extraction steps by using a convolutional neural network that operates directly on the measured signals. The improvement in computational efficiency enables low-latency inference and makes this pipeline suitable for cell sorting via deep learning. Our neural network takes less than a few milliseconds to classify the cells, fast enough to provide a decision to a cell sorter for real-time separation of individual target cells. We demonstrate the applicability of our new method in the classification of OT-II white blood cells and SW-480 epithelial cancer cells with more than 95% accuracy in a label-free fashion.

Deep learning provides a powerful set of tools for extracting knowledge that is hidden in large-scale data. In image classification and speech recognition, deep learning algorithms have already made major inroads scientifically and commercially, creating new opportunities in medicine and bioinformatics [1]. In medicine, deep learning has been used to identify pneumonia from chest X-ray images [2], heart arrhythmias from electrocardiogram data [3], and malignant skin lesions at accuracy levels on par with trained dermatologists [4]. The predictive potential of deep neural networks is also revolutionizing related fields such as genetics and biochemistry, where the sequence specificities of DNA- and RNA-binding proteins have been determined algorithmically from extremely large and complex datasets [5]. Recently, a deep-learning-assisted image-activated sorting technology was demonstrated. It used a frequency-division-multiplexed microscope to acquire fluorescence images of labeled samples and successfully sorted microalgal cells and blood cells [6]. Moreover, deep learning models have helped analyze water samples to monitor the ocean microbiome [7].
Flow cytometry is a biomedical diagnostics technique that classifies each cell in a streaming cellular suspension using information gathered from its interaction with lasers. Cell size and granularity are measured through forward- and side-scattered signals (elastic scattering), and fluorescence characteristics are measured through the emission wavelengths of fluorescent biomarkers used as marker-specific cellular labels (inelastic scattering) [8, 9]. One application of this technology is fluorescence-activated cell sorting (FACS), which physically separates cells of interest from undesired cells within a heterogeneous mixture, using multiple fluorescent labels to apply increasingly stringent light-scattering and fluorescence-emission criteria to identify and collect target cell populations.
Despite the growing utility of flow cytometry in biomedical research and therapeutics manufacturing, the use of this platform can be limited by its reliance on labeling reagents, which may alter the behavior of bound cells through inadvertent activation or inhibition prior to collection, or which may target unreliable markers for cell identification. CD326/EpCAM [10] is one example of the latter. This protein was initially accepted as a generic biomarker for cancer cells of epithelial origin (or their derivatives such as circulating tumor cells, CTCs) but was later found to be heterogeneously expressed on, or even absent from, the most malignant CTCs [11], demonstrating some limitations of this approach. While these findings provide a rationale for the development of label-free cellular analysis and sorting platforms, sole reliance on forward- and side-scattered signals in the absence of fluorescence labeling information has been challenging as a cellular classification modality due to poor sensitivity and selectivity.
As a solution, label-free cell sorting based on additional physical characteristics has gained popularity [12, 13]. This approach is compatible with flow cytometry, but requires rapid data analysis and multiplexed feature extraction to improve classification accuracy. To achieve feature expressivity, parallel time-stretch quantitative phase imaging (TS-QPI) methods are employed [14–17] to assess additional parameters such as cell protein concentration (correlated with refractive index) and categorize unlabeled cells with increased accuracy.
We have recently introduced a novel imaging flow cytometer that analyzes cells using their biophysical features [18]. Label-free imaging is implemented by photonic time stretch [19, 20], and the trade-off between sensitivity and speed is mitigated by using amplified time-stretch dispersive Fourier transform [19, 21–24]. In time-stretch imaging [25], the target cell is illuminated by spatially dispersed broadband pulses, and the spatial features of the target are encoded into the pulse spectrum within a sub-nanosecond pulse duration. Both phase and intensity quantitative images are captured simultaneously, providing abundant features including protein concentration, optical loss, and cellular morphology [26–29]. This procedure was successfully used to classify OT-II hybridoma T-lymphocytes and SW-480 colon cancer epithelial cells in mixed cultures, as well as distinct sub-populations of algal cells, with immediate ramifications for biofuel production [18]. However, the image processing pipeline that extracts morphological and biophysical features from label-free images has proven costly in time, taking several seconds to extract the features of each cell. This relatively long processing time prevented the further development of a time-stretch imaging flow cytometer into a sorter, because classification decisions need to be made in under a second, before the target cells exit the microfluidic channel. Even when deep learning is used for cell classification after the biophysical features are determined, the conversion of waveforms to phase and intensity images and the feature extraction are still required to generate the input datasets for the neural network.
To remove the time-consuming steps of image formation and hand-crafted feature extraction, we developed a deep convolutional neural network that directly processes the one-dimensional time-series waveforms from the imaging flow cytometer and automatically extracts the features using the model itself. By eliminating the image processing pipeline that previously preceded the classifier, the running time of cell analysis is reduced significantly, and cell sorting decisions can be made in less than a millisecond, orders of magnitude faster than previous efforts [18].
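To make the idea of classifying a raw one-dimensional waveform concrete, the following is a minimal sketch of a single convolutional layer followed by global max pooling and a linear readout. It is not the architecture used in this work: the waveform length, kernel count, kernel size, random weights, and the function names (`conv1d`, `classify_waveform`) are all hypothetical and chosen only for illustration. The three output classes follow the dataset described in the Results (SW-480, OT-II, and blanks), although their ordering here is an assumption.

```python
import random

def conv1d(signal, kernel, stride=1):
    """Valid 1-D cross-correlation of a waveform with a single kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]

def relu(xs):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in xs]

def global_max_pool(xs):
    """Collapse a feature map to its single strongest response."""
    return max(xs)

def classify_waveform(waveform, kernels, class_weights):
    """Return the index of the highest-scoring class for one waveform."""
    # One learned feature per kernel, taken directly from the raw samples.
    features = [global_max_pool(relu(conv1d(waveform, k))) for k in kernels]
    # Linear readout: one score per class.
    scores = [sum(w * f for w, f in zip(ws, features)) for ws in class_weights]
    return scores.index(max(scores))

# Hypothetical sizes and untrained random weights, for illustration only.
CLASS_NAMES = ["SW-480", "OT-II", "blank"]  # assumed ordering
random.seed(0)
waveform = [random.gauss(0.0, 1.0) for _ in range(256)]
kernels = [[random.gauss(0.0, 0.5) for _ in range(9)] for _ in range(4)]
class_weights = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]

label = classify_waveform(waveform, kernels, class_weights)
print(CLASS_NAMES[label])
```

The point of the sketch is that the input to the first layer is the digitized waveform itself, so no image reconstruction or hand-crafted feature step sits between acquisition and the classifier.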
Furthermore, we find that some features may not be represented in the phase and intensity images extracted from the waveforms, but can be observed by the neural network when the data is provided as raw time-series waveforms. These hidden features, which are not available in manually designed image representations, enable the model to classify cells more accurately. The balanced accuracy and F1 score of our model reach 95.74% and 95.71%, respectively, for an accelerated classifier of SW-480 and OT-II cells, achieving a new state of the art in accuracy while enabling cell sorting by time-stretch imaging flow cytometry for the first time.
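For readers unfamiliar with balanced accuracy, the sketch below uses the common multi-class definition, the unweighted mean of per-class recall. Whether the reported 95.74% was computed with exactly this definition is an assumption, and the labels and predictions in the example are made up for illustration; they are not data from this study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)

# Made-up labels: 4 SW-480, 4 OT-II, 2 blank examples.
y_true = ["SW-480"] * 4 + ["OT-II"] * 4 + ["blank"] * 2
y_pred = ["SW-480", "SW-480", "SW-480", "OT-II",
          "OT-II", "OT-II", "OT-II", "OT-II",
          "blank", "SW-480"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(plain, balanced)
```

In this toy example the small "blank" class is classified poorly, which pulls the balanced accuracy below the plain accuracy; this is why the balanced form is informative when class sizes differ.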

Results
In order to study the learning behavior of the model, the neural network is evaluated on the training and validation datasets at every epoch, for each class and for their averaged forms (Fig. 1). There are multiple ways to measure the performance of this model; tracking the F1 score is one example. The F1 score is the harmonic mean of precision and recall, where precision is the positive predictive value, measuring the correctness of the classifier, and recall measures its completeness. The F1 score is therefore considered a very effective means of measuring classification performance. Since the examples in the dataset are categorized into three classes (SW-480, OT-II, and blanks), the task for the neural network is multi-class classification, which is evaluated by calculating the F1 score per class and also their averaged forms. Three forms of F1 score averaging are taken into account: (1) the micro-averaged F1 score, which considers aggregate true positives for the precision and recall calculations; (2) the macro-averaged F1 score, which evaluates precision and recall of each class individually and then assigns equal weight to each class; and (3) the weighted-averaged F1 score, which assigns a different weight to each class to account for imbalance in the dataset. Orange curves show the training F1 score while green curves show the validation F1 score. Comparing the classification performance for each class, the neural network successfully recognizes SW-480 colorectal cells and OT-II hybridoma cells upon completion of the first training epoch. Interestingly, classification of the acellular dataset requires approximately 10 epochs to achieve similar performance. The overall performance is determined by the averaged F1 scores of these three classes. The F1 scores of the training and validation datasets continue to improve until a maximum is reached at approximately epoch 50. Meanwhile, the similar performance on the training and validation datasets indicates good generalization of the neural network. Ultimately, the weighted-averaged validation F1 score reached 97.01%. To evaluate the reproducibility of the results obtained by this neural network, the training procedure was repeated five times starting from randomly initialized weights and biases, and the runs showed close agreement (the standard deviation of the results was less than 0.82% at the last epoch).

Discussion
In order for label-free real-time flow cytometry to become a feasible methodology, imaging and data analysis need to be completed while the cell is traveling the distance between the sample inlet of the microfluidic channel and the cell sorting mechanism (Fig. 2). During imaging, the time-stretch

Conclusion
In this manuscript, a deep convolutional neural network for direct processing of flow cytometry waveforms was presented. The results demonstrate record performance in label-free classification of cancerous cells, with a test F1 score of 95.71% and accuracy of 95.70% for all classes evaluated. The system achieves this accurate classification in less than a millisecond, enabling real-time label-free cell sorting.
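The three F1 averaging schemes described in the Results (micro, macro, and weighted) can be illustrated with a minimal sketch. The macro average here is taken as the unweighted mean of the per-class F1 scores and the weighted average uses each class's number of true examples as its weight; these are common conventions and are assumed rather than confirmed for this study. The labels and predictions are made up for illustration.

```python
def f1(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def averaged_f1(y_true, y_pred):
    """Return per-class, micro-, macro-, and weighted-averaged F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    counts = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        counts[c] = (tp, fp, fn)

    per_class = {c: f1(*counts[c]) for c in classes}
    support = {c: sum(1 for t in y_true if t == c) for c in classes}

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(v[0] for v in counts.values()),
               sum(v[1] for v in counts.values()),
               sum(v[2] for v in counts.values()))
    # Macro: every class contributes equally.
    macro = sum(per_class.values()) / len(classes)
    # Weighted: each class contributes in proportion to its support.
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    return per_class, micro, macro, weighted

# Made-up labels: 4 SW-480, 4 OT-II, 2 blank examples.
y_true = ["SW-480"] * 4 + ["OT-II"] * 4 + ["blank"] * 2
y_pred = ["SW-480", "SW-480", "SW-480", "OT-II",
          "OT-II", "OT-II", "OT-II", "OT-II",
          "blank", "SW-480"]

per_class, micro, macro, weighted = averaged_f1(y_true, y_pred)
print(per_class, micro, macro, weighted)
```

With imbalanced classes, as in this toy example, the three averages differ, which is why reporting all of them gives a fuller picture than any single number.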

Figure 1: Convergence of the network training. F1 score, as a measure of the classification

Figure 2: Potential application of deep learning in cell sorting. A microfluidic channel with