Deep Learning for Imaging Flow Cytometry:

Abstract

networks with non-linear dimension reduction. DeepFlow uses learned features of the neural network to visualize, organize and biologically interpret single-cell data. Dissecting the cell cycle as a source of cell-to-cell variability is crucial for quantitative single-cell biology. We demonstrate DeepFlow for a large dataset of cell-cycling Jurkat cells. First, we reconstruct the cells' continuous progression through cell cycle from raw image data. This shows that DeepFlow can learn a continuous distance measure between categorical phenotypes. Second, we are able to detect and separate a subpopulation of dead cells, although the data set had been cleaned using established approaches. DeepFlow detects this morphologically abnormal subpopulation in an unsupervised manner. Third, in label-free classification of cell cycle phases, we reach a 6-fold reduction in error rate as compared to a recent approach based on boosting on a series of image features. In contrast to previous methods, DeepFlow's predictions are fast enough to consider integration with the imaging flow cytometry measurement process.

Author Summary

We present DeepFlow, a deep learning based data analysis workflow optimized for the requirements of imaging flow cytometry. We use it to analyze a large data set of a

Further, IFC generates high-dimensional information for each cell, including spatially-mapped intensity information for thousands of pixels for each of several channels: brightfield and darkfield (which require no staining procedure) and, optionally, several fluorescence channels. This means a dramatic increase in information content as compared to the measurement of a single spatially integrated fluorescence intensity value for each channel, as in conventional flow cytometry [3]. Finally, IFC provides one image for each single cell, and hence does not require whole-image segmentation.

It is often not known in advance which morphological features are useful to distinguish specific, often rare, phenotypes in IFC. Classical computer vision algorithms are unlikely to extract sufficient metrics to capture all relevant morphological features. Deep learning, by contrast, potentially captures many more subtleties of image data. Here, we present the deep learning based data analysis workflow DeepFlow (deep learning for imaging flow cytometry). It consists of a deep convolutional neural network combined with a standard softmax classifier and a visualization tool based on non-linear dimension reduction (Fig. 1).

DeepFlow enables improved data analysis capabilities for IFC as compared to prior traditional machine learning methods [4][5][6][7]. This is mainly due to three general advantages of deep learning over traditional machine learning: there is no need for cumbersome preprocessing and manual feature definition, classification accuracy is improved and learned features can be visualized to uncover their biological meaning.

Other recent work on deep learning in high-throughput microscopy either relied on engineered features [8] or focused on whole-image segmentation without addressing visualization of network features [9]. Reference [10] is most closely related to the present work, but neither presents an optimized solution for imaging flow cytometry data nor addresses the particular challenges of a continuous biological process, like the cell cycle.

Representative images for the cell cycle stages as measured in brightfield, darkfield and fluorescence channels. Seven cell cycle stages define seven classes. We only show one representative image for the interphase classes G1, S, and G2, which can hardly be distinguished by eye.

We used a data set of 32,266 asynchronously growing immortalized human T

Recent advances in deep learning have shown that deep neural networks are able to learn powerful feature representations [12][13][14][15]. For DeepFlow, we adapt the widely used "Inception" architecture [14], and optimize it for treating the relatively small input dimensions that occur in IFC data. The architecture consists of 13 three-layer "dual-path" modules (Suppl. Fig. 7), which process and aggregate visual information at an increasing scale. These 39 layers are followed by a standard convolution layer, a fully connected layer and the softmax classifier. Training this 42-layer deep network does not present any computational difficulty, as the first three modules are "reduction dual-path" modules (Suppl. Fig. 7), which strongly reduce the original input dimensions prior to convolutions in the following "normal dual-path" modules. The number of kernels used in each layer increases towards the end, until 336 feature maps with size 8 × 8 are obtained. A final average pooling operation collapses the spatial resolution of these maps and generates the last, 336-dimensional layer, which serves as an input for both classification and visualization. The neural network operates directly on the uniformly resized images from an arbitrary number of channels of the imaging flow cytometer. It is trained with cell images that have been labeled as described above, using stochastic gradient descent with standard parameters (see Suppl. Notes). Here, we focus on the case in which only the brightfield and darkfield channels are used as input for the network, during training, visualization and prediction. This case is interesting as fluorescent labeling might sometimes affect the biological process under study and should then be avoided. Also, this case provides much less information than when using all channels, and hence provides a difficult benchmark test for DeepFlow. We note, however, that technical imperfections in the IFC data capture might always lead to a minor amount of fluorescence signal, activated by a fluorescence channel, in the darkfield and brightfield channels, a phenomenon known as "bleed through" (see Suppl. Notes).
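The layer count and the final average pooling step described above can be illustrated with a minimal sketch. It assumes the last convolutional output is stored as a nested list of 336 feature maps of size 8 × 8; the function name `global_average_pool` and the toy input are hypothetical and are not taken from the DeepFlow code.

```python
# Layer count as stated in the text: 13 three-layer modules,
# plus one convolution layer, one fully connected layer and the softmax.
n_layers = 13 * 3 + 3


def global_average_pool(feature_maps):
    """Average every 2-D feature map down to a single number."""
    pooled = []
    for fmap in feature_maps:
        total = sum(sum(row) for row in fmap)
        count = sum(len(row) for row in fmap)
        pooled.append(total / count)
    return pooled


# Toy input (made up): 336 maps of size 8 x 8, map k filled with the value k.
maps = [[[float(k)] * 8 for _ in range(8)] for k in range(336)]
features = global_average_pool(maps)
print(n_layers, len(features))
```

Each cell is thus represented by one 336-dimensional vector, regardless of where in the 8 × 8 grid a feature responded.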

To show how learned features of the neural network can be used to visualize, organize and biologically interpret single-cell data, we study the activations in the last layer of the neural network [10,16]. We refer to this as studying the activation space representation of the data. The approach is motivated by the fact that the neural network strives to organize data in the last layer in a linearly separable way, given that it is directly followed by a softmax classifier. Euclidean distances in this space can be

Still, the activation space of DeepFlow's last layer has 336 dimensions; it is much too high-dimensional to be accessible for human interpretation. As the fine-grained geometric structure of data in this space is highly complex, non-linear dimension reduction methods are best suited to visualize the data in a lower dimensional space. We thus use t-distributed stochastic neighbor embedding (tSNE) [17] to visualize the activation space representation [16] of a validation data set.

In this visualization, we observe that the Jurkat cell data is organized in a long stretched cylinder along which cell cycle phases are ordered in the chronologically correct order (Fig. 3a). This is remarkable as the network has been provided with neither structure within the class labels nor the relation among classes but simply with categorical class labels. The learned features evidently allow reconstructing the continuous temporal progression from the raw IFC data, and by that allow defining a continuous distance between the phenotypes of different cell cycle phases. We separately visualized just those cells annotated as being in the interphase classes (G1, S, G2) (Fig. 3b). We overlaid on this structure a color map displaying the DNA content of cells as obtained after segmentation of images from one of the fluorescent channels (PI). The DNA content reflects the continuous progression of cells in the cell cycle on a more fine-grained level. Its correspondence with the longitudinal direction of the cylinder found by tSNE demonstrates that the temporal order learned by the neural network is accurate even beyond the categorical class labels.

DeepFlow detects abnormal cells.

Both tSNE visualizations (Fig. 3a,b) produce a small, separate cluster, highlighted with an arrow in Fig. 3b. This cluster is learned in an unsupervised way as cell cycle phase labels provide no information about it: it contains cells from all three interphase classes. While cells in the bulk have high circularity and well defined borders (Fig. 3c), cells in the small cluster are characterized by morphological abnormalities such as broken cell walls and outgrowths, signifying dead cells (Fig. 3d).

Comparison with previous approaches.

For comparison, we show the tSNE visualization of cells from the interphase classes in a space of image-analysis based features (Fig. 3e), used in Ref. [5]. The data is neither organized in a continuous way that reflects cell cycle progression, nor can one detect a cluster of abnormal cells.

Analysis of intermediate-layer activation patterns.

We interpret the data representation encoded in one of the trained intermediate layers of the neural network by inspecting its activation patterns using exemplary input data from several classes (Fig. 4). These activation patterns are the essential information

DeepFlow outperforms boosting for cell cycle classification.

We study the classification performance of DeepFlow on the validation data set shown in Fig. 3. We first focus on the case in which G1, S and G2 phases are considered as a single class. Using five-fold cross-validation on the 32,266 cells, we obtain an accuracy of 98.73%±0.16%. This means a 6-fold improvement in error rate over the 92.35%

off-diagonal entries of the two lower rows of the matrix (Fig. 5a). This means high sensitivity: most cells from mitotic phases are correctly classified as such. Still, this comes at the price of low precision: many cells from the interphase class are classified as mitotic phases, as indicated by the high numbers in the off-diagonal entries of the first row of the matrix (Fig. 5a). DeepFlow, by contrast, achieves high sensitivity and precision, leading to an almost diagonal confusion matrix (Fig. 5b).
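The quoted 6-fold figure follows directly from the two accuracies, and sensitivity and precision can be read off a confusion matrix whose rows are true classes and whose columns are predicted classes. The sketch below is illustrative only: the function names are hypothetical and the 2 × 2 confusion matrix is made up to show the "high sensitivity, low precision" pattern; it is not data from Fig. 5.

```python
def error_rate_fold(acc_new, acc_old):
    """Ratio of error rates (old / new); accuracies are given in percent."""
    return (100.0 - acc_old) / (100.0 - acc_new)


def sensitivity_and_precision(confusion, k):
    """confusion[i][j] = number of cells of true class i predicted as class j."""
    true_pos = confusion[k][k]
    n_actual = sum(confusion[k])                     # row sum
    n_predicted = sum(row[k] for row in confusion)   # column sum
    return true_pos / n_actual, true_pos / n_predicted


# Accuracies quoted in the text: error rates of 1.27% versus 7.65%.
fold = error_rate_fold(98.73, 92.35)
print(round(fold, 2))

# Made-up matrix: class 0 = interphase, class 1 = mitotic.
toy = [[90, 10],
       [1, 9]]
sens, prec = sensitivity_and_precision(toy, 1)
print(sens, round(prec, 3))
```

In the toy matrix, nearly all mitotic cells are found (sensitivity 0.9), but fewer than half of the cells predicted as mitotic really are (precision about 0.47), because the much larger interphase class spills over into the mitotic column.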

DeepFlow enables separation of all seven cell cycle classes.

We also evaluated the full seven-class problem in which the three interphase classes are considered individually. Here, we obtain an accuracy of 79.40%±0.77%. This number

distinguish (see Fig. 2), even when using information from the fluorescence channels.

The accuracy might therefore be affected by wrong labelling; it might be higher if all fluorescence channels were used as input for the neural network, and it might be slightly lower if "bleed through" enriched brightfield and darkfield images. If high classification accuracy is of importance, and one is not only interested in visualizing and interpreting the data, these questions have to be answered from case to case. Their answer depends in particular on how labels are generated and how many channels of the IFC are used. Here, we confirm that the considerably lower accuracy as compared to the five-class problem results primarily from cells in the S phase being wrongly classified as either G1 or G2 (Fig. 6a). This is also shown by the Receiver Operating Characteristic (ROC), which relates the true positive rate (sensitivity) to the false positive rate (fall-out) as the classification threshold changes (Fig. 6b). Integrating the curve yields the standard performance metric "area under the curve" (AUC). Even though the AUC for the S phase is still high with 0.87, it is the lowest among the majority classes (G1, S, G2), and therefore has a strong effect on the accuracy. Overall we find that all seven classes yield high values, greater than 0.85, and four of the seven classes yield very high values, greater than 0.95.
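For readers unfamiliar with the metric, the following sketch computes an ROC curve and its AUC for one class against all others. It assumes per-cell scores (for example the softmax probability of the class) and binary labels; the function names and the six toy values are hypothetical, and the trapezoidal rule is one common way to integrate the curve.

```python
def roc_curve(scores, labels):
    """Return (false positive rate, true positive rate) points obtained by
    lowering the classification threshold past each distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all samples tied at this score are counted.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Trapezoidal integration of the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example (made up): label 1 = cell belongs to the class of interest.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(round(auc(roc_curve(scores, labels)), 4))
```

An AUC of 1 corresponds to a perfect ranking of positives above negatives, and 0.5 to a random ranking.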

The visualization of the data as encoded by the last layer of the network using tSNE demonstrates how DeepFlow overcomes a well known issue of traditional machine learning. When trained on a continuous biological process using discrete class labels, conventional machine learning often fails to resolve the continuum [4]. We confirmed this for the present data set of cell-cycling Jurkat cells (Fig. 3e), but note that resolving the continuous cell cycle progression based on features such as those of Ref. [5] has recently been enabled [18] by combining feature extraction with an elaborate trajectory learning algorithm [19]. While this approach still suffers from many other disadvantages of traditional machine learning, as mentioned before, DeepFlow's approach is conceptually much simpler. The learned features of the neural network, which can also be used for classification, directly generate a feature space in which data is continuously organized [20]. DeepFlow learns in an unsupervised way that some cells within the G1 phase are at the very beginning of the cell cycle, whereas others in G1 are already transitioning into the S phase. This is possible as adjacent classes are morphologically more similar to each other than classes that are temporally further separated. Of course, if this assumption of continuity fails, then DeepFlow will not reconstruct the continuous progression, a limitation in common with the other mentioned algorithmic approaches [18]. Also, note that DeepFlow's simple reconstruction of a continuum is quite exceptional compared to other fields of single-cell biology. For example, in the analysis of single-cell transcriptomic and proteomic data, much research effort has been applied to solve precisely this task [19,21,22].

The unsupervised detection of a discrete cluster of abnormal cells indicates that the network learns the cluster of abnormal cells independently of the cell-cycle-label based training. The model is therefore not only capable of resolving the biological process of the cell cycle, but generates features that are general enough to separate even incorrectly labeled cells, which do not belong to this process. This shows the ability of DeepFlow to find completely unknown phenotypes and processes without knowledge about features or even labels in cell populations from IFC data. There is also a high practical use of the detection of damaged cells. The data set used in this paper has been preprocessed using the IDEAS® (Merck Millipore Inc.) analysis software to remove images of abnormal cells. In particular, we removed out of focus cells by gating for images with gradient RMS and debris by gating for circular objects with a large area.

The discovery of a cluster of abnormal cells shows the limitations of this approach and provides a solution to it.

An advantage of using a neural network for cell classification in IFC is its speed.

We also expect DeepFlow to be helpful for a wide variety of image data, including images from high-throughput microscopy. Although generally lower-throughput in terms of the number of cells processed, conventional microscopy is nevertheless still high-throughput and can usually provide higher resolution images than IFC, providing advantages for some biological processes. Furthermore, given that multi-spectral methods are advancing rapidly, imaging mass spectrometry is allowing dozens of labeled channels to be acquired [23,24]. Due to its basic structure and high flexibility,

Preprocessing.

Our algorithmic workflow of cell cycle analysis with Deep Neural Networks begins with brightfield and darkfield images from the cells. In order to allow uniform training of our network on the whole dataset, we resize the images to 66 × 66 pixels by stretching the border pixels. We choose this method over individual image rescaling to avoid the destruction of possibly important size relation information between cells.
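One plausible reading of "stretching the border pixels" is edge replication: the image keeps its original scale and the outermost rows and columns are repeated outward until the target size is reached. The sketch below implements that reading under stated assumptions. The function name is hypothetical, centering the original image is an assumption, and images larger than the target are rejected because the text does not say how they are handled.

```python
def pad_by_edge_replication(image, size=66):
    """Pad a 2-D image (list of rows) to size x size by repeating border pixels.

    The original pixels are kept at their original scale and centered
    (centering is an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    if height > size or width > size:
        raise ValueError("image exceeds target size; cropping is not handled here")
    top = (size - height) // 2
    left = (size - width) // 2
    padded = []
    for r in range(size):
        src_r = min(max(r - top, 0), height - 1)  # clamp to nearest valid row
        row = image[src_r]
        padded.append(
            [row[min(max(c - left, 0), width - 1)] for c in range(size)]
        )
    return padded


# Tiny demonstration with a 2 x 2 "image" padded to 4 x 4.
small = [[1, 2],
         [3, 4]]
for line in pad_by_edge_replication(small, size=4):
    print(line)
```

Because no interpolation takes place, a small cell stays small in the padded image, which preserves the size relations between cells that rescaling each image individually would destroy.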

The data set used in this paper has been preprocessed using the IDEAS® software (Merck Millipore Inc.) to remove images of abnormal cells. In particular, we removed out of focus cells by gating for images with gradient RMS and removed debris by gating for circular cells with a large area.

The DeepFlow network architecture consists of 42 layers, which results in a total number of parameters of about 2 million. It is built up starting with 3 dual-path reduction modules, followed by 10 normal dual-path modules, one pooling layer, one fully connected layer and the softmax layer. Each dual-path module consists of 3 layers: a convolution layer, a batch normalization layer and an activation layer. Although there is no "big" fundamental difference between dual-path and standard convolution modules, dual-path based networks tend to converge a little better in practice, since the gradient flow from pooling and convolution in the reduction module counteracts the vanishing gradient problem: not the entire gradient gets multiplied by a convolutional weight of approx. 10^-4; pooling just lets it through.

In the first (input) layer, all IFC channels are combined in a linear operation by feeding them into the channel dimension (which equals the color dimension) of the convolution input. This means the convolution uses kernels which convolve over all channels simultaneously. The number of 3×3 kernel weights then is nine times the number of channels. Increasing the number of channels simply increases the "kernel depth" in the color dimension, and hence is trivial.
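The weight count and the "convolve over all channels simultaneously" statement can be made concrete with a small sketch. It evaluates a single kernel at a single position of a multi-channel image; the function names, the data layout `image[channel][row][column]` and the toy values are assumptions for illustration, and bias terms are ignored.

```python
def kernel_weight_count(n_channels, kernel_size=3):
    """Weights in one kernel: kernel_size^2 per input channel (bias ignored)."""
    return kernel_size * kernel_size * n_channels


def conv_at(image, kernel, row, col):
    """Response of one kernel at position (row, col).

    image[c][y][x] and kernel[c][dy][dx]; the sum runs over all channels
    at once, so the channels are combined linearly.
    """
    total = 0.0
    for c, channel_kernel in enumerate(kernel):
        for dy, kernel_row in enumerate(channel_kernel):
            for dx, weight in enumerate(kernel_row):
                total += weight * image[c][row + dy][col + dx]
    return total


# Toy two-channel input (made up): brightfield all 1.0, darkfield all 2.0.
image = [[[1.0] * 3 for _ in range(3)],
         [[2.0] * 3 for _ in range(3)]]
kernel = [[[0.5] * 3 for _ in range(3)],
          [[1.0] * 3 for _ in range(3)]]

print(kernel_weight_count(2))        # brightfield + darkfield
print(conv_at(image, kernel, 0, 0))
```

Adding a fluorescence channel would only append one more 3×3 slice to each first-layer kernel; the rest of the network is unchanged.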

The network was trained for 100 epochs using stochastic gradient descent with standard parameters: 0.9 momentum, a fixed learning rate of 0.01 up to epoch 85 and of 0.001 afterwards, as well as a slightly regularizing weight decay of 0.0005. Training took around 7 h and was stopped manually upon inspecting the convergence of the cross-entropy.
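The stated hyperparameters can be written down as a step schedule and a momentum update. This is a sketch of one common formulation of SGD with momentum and L2 weight decay, applied to a single scalar weight; deep learning frameworks differ in the exact update rule, and counting epochs from 1 with epoch 85 still at the higher rate is an assumption. The function names are hypothetical.

```python
def learning_rate(epoch):
    """0.01 up to and including epoch 85, 0.001 afterwards (epochs from 1)."""
    return 0.01 if epoch <= 85 else 0.001


def sgd_step(weight, grad, velocity, lr, momentum=0.9, weight_decay=0.0005):
    """One momentum-SGD update with weight decay folded into the gradient."""
    grad = grad + weight_decay * weight
    velocity = momentum * velocity - lr * grad
    return weight + velocity, velocity


# Single illustrative update with made-up weight and gradient values.
w, v = sgd_step(weight=1.0, grad=0.5, velocity=0.0, lr=learning_rate(1))
print(w, v)
```

Dropping the learning rate by a factor of ten for the last epochs lets the weights settle once the loss has largely converged.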

For the results presented in this paper, we implemented DeepFlow using the MxNet

have also implemented and successfully tested our architecture using TensorFlow, which is available from https://github.com/tensorflow/tensorflow. The user might choose the software package according to personal preferences.

Nonlinear dimension reduction.

We use the tSNE implementation of Ref.

The data acquired using the ImageStream was fully compensated using typical control images (see Ref. [26]) so the image tiffs would have minimal bleed through between channels. We could not detect even a slight indication of bleed through in the Jurkat cell data, neither upon inspection by eye, nor upon correlating the integrated intensity of each fluorescence channel with the integrated intensity of bright and darkfield channels, respectively. We then checked the existence of bleed through in the cytometer used for data generation by switching off the light source of the brightfield channel, while keeping the fluorescence excitation on. We would then expect zero intensity in the brightfield images, but instead measured a slight intensity stemming from the fluorescence channels. This common technical aspect of IFC measurements merits its own investigation, which will appear elsewhere. Here, our aim is to compare methodologies rather than to claim absolute levels of accuracy.
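The correlation check mentioned above can be sketched as follows, assuming that "integrated intensity" means the sum of all pixel values of an image and that the Pearson coefficient is used; the text specifies neither, so both are assumptions, as are the function names and the toy images.

```python
import math


def integrated_intensity(image):
    """Sum of all pixel values of a 2-D image (list of rows)."""
    return float(sum(sum(row) for row in image))


def pearson(xs, ys):
    """Pearson correlation coefficient; undefined if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy per-cell images (made up), one fluorescence and one brightfield each.
fluorescence = [[[1, 0]], [[1, 1]], [[2, 1]], [[2, 2]]]
brightfield = [[[5, 0]], [[4, 3]], [[3, 4]], [[0, 5]]]

fl = [integrated_intensity(img) for img in fluorescence]
bf = [integrated_intensity(img) for img in brightfield]
print(pearson(fl, bf))
```

A coefficient near zero across many cells gives no indication of bleed through, whereas a clearly positive value would suggest that fluorescence signal leaks into the brightfield or darkfield channel.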