Stain-free identification of cell nuclei using tomographic phase microscopy in flow cytometry

Quantitative Phase Imaging (QPI) has gained popularity in bioimaging because it can avoid the need for cell staining, which in some cases is difficult or impossible. However, as a result, QPI does not provide labelling of various specific intracellular structures. Here we show a novel computational segmentation method based on statistical inference that makes it possible for QPI techniques to identify the cell nucleus. We demonstrate the approach with refractive index tomograms of stain-free cells reconstructed through the tomographic phase microscopy in flow cytometry mode. In particular, by means of numerical simulations and two cancer cell lines, we demonstrate that the nucleus can be accurately distinguished within the stain-free tomograms. We show that our experimental results are consistent with confocal fluorescence microscopy (FM) data and microfluidic cytofluorimeter outputs. This is a significant step towards extracting specific three-dimensional intracellular structures directly from the phase-contrast data in a typical flow cytometry configuration.


S1. CSSI algorithm
In order to describe the steps of the proposed CSSI algorithm, sketched in Fig. 3c, we exploit the 3D numerical cell phantom shown in Figs. 3a,b.
1. The 3D RI tomogram of the analyzed cell is centered in its × × array, that is then divided into distinct cubes, each of which has an edge measuring pixel, as shown in the central xz-slice in Fig. S1a.
The parameter is the resolution factor at which the 3D array is firstly analyzed. It must be an even number and, after dividing each side of the 3D array by , an odd number must be obtained. Therefore, each distinct cube contains 3 voxels (i.e. RI values). The cubes completely contained within the cell shell are the investigated cubes (yellow cubes within the blue cell shell in Fig. S1a). The central cube is not an investigated cube, since it is taken as a reference cube (green cube in Fig. S1a), which vertices have x-, y-, and z-coordinates taken from pairs ( 1 , 2 ), ( 1 , 2 ), and ( 1 , 2 ), respectively. In Fig. S1b, we also display the 3D array from which the central xz-slice of Fig. S1a has been selected.
As discussed in the Main Text, the CSSI algorithm is based on the WMW test 35,36 . It is a rank-based non-parametric statistical test, thus distributions do not have to be normal. With a certain significance level γ, it allows rejecting or not the null hypothesis H0 for which two sets of values have been drawn from the same distribution. The significance level γ is the probability of making an error of 1st species, i.e. of rejecting the null hypothesis H0 when it is true. The confidence level is defined as 1-γ, i.e. it is the probability of not rejecting the null hypothesis H0 when it is true. An important parameter in a statistical test is the p-value, which ranges from 0 to 1. The p-value is the observed significance level, i.e. the smallest significance level at which H0 is rejected. It can be also defined as the probability of obtaining results at least as extreme as the results actually observed, when the null hypothesis H0 is true. Therefore, a low p-value leads to reject the null hypothesis H0, because it means that such an extreme observed result is very unlikely when the null hypothesis H0 is true. In fact, if the p-value≥γ, H0 is not rejected with significance level γ, while if p-value<γ, H0 is rejected with significance level γ. Therefore, the greater the p-value the greater the confidence level with which two sets of values have been extracted from the same population. Hence, our algorithm performs multiple comparisons between the investigated cubes and the reference one through the WMW test, because we are assuming that the voxels belong to the subcellular structure of interest, thus if a certain has been drawn from its same distribution, then also the voxels belong to the subcellular structure of interest. Without loss of generality, here we are describing and testing the method in the case of the nucleus segmentation. As discussed in the Main Text, for many kinds of suspended cells (e.g., cancer cell lines) the central voxels of the cell belong to the nucleus, therefore we associate the reference cube to the central cube of the 3D RI tomogram.
2. An adaptive threshold is set according to the p-values computed through the WMW test between the investigated cubes and the reference cube . It is chosen as the maximum value less than or equal to , such that for at least one it happens that p-value≥ . h. After computing thresholds 1 , 2 , 3 , 4 , and 5 , the filtered nucleus set reported in Fig. S1g is formed by cubes , that satisfy one of the following conditions where & is the logical and operator, and are elements of vectors ̅ and ̅ , respectively, with = 1,2, … , , and is the maximum value of vector ̅ .
However, to build a filtered nucleus set , a strong spatial and statistical filtering has been made, in order to store only cubes that belong to the nucleus with high probability, thus leading to a strong underestimation of the nucleus region. Moreover, to increase the statistical power of the WMW test, the resolution factor should not be too small. As a consequence, the -cubic structuring element leads to a low spatial resolution. ii. If the computed p-value≥ α , the examined sub-cube is inserted into the refined nucleus set .
6. All the possible pairs of sub-cubes in the refined nucleus set are linked through a line segment. 7. A morphological closing is performed to smooth the corners of the resulting 3D polygonal and fill its holes, thus finally obtaining the partial nucleus set , displayed in Fig. S1i.

8.
Steps 1-7 are repeated times on the same cell, thus obtaining partial nucleus sets , with = 1, 2, … , , that are slightly different from each other, because in some of the performed WMW tests, the reference set is randomly drawn from a greater one.
9. The sum of all the partial nucleus sets provides a tomogram of occurrences, in which each voxel can take integer values ∈ [0, ] since each voxel may have been classified nucleus times. In Fig.   S2a, the central slice of this tomogram of occurrences is reported.
10. An adaptive threshold * is set to segment the tomogram of occurrences. Let be the number of voxels that have been classified nucleus at least times, with = 1, 2, … , . Therefore, 1 is the number of voxels of logical or among all the partial nucleus sets , while is the number of voxels of logical and among all the partial nucleus sets .
a. A vector ̅ of percentage volumes is created, which elements are computed as with = 1,2, … , , as reported in Fig. S2b by blue dots.
The 3D segmented nuclear OCH should be computed as the set of voxels that have occurred at least times. The parameter should maximize simultaneously the accuracy, sensitivity, and specificity of the proposed CSSI method. In Fig. S2c, these performances are reported for each 3D segmented region composed by voxels that have occurred at least times, with = 1,2, … , , along with the value, highlighted by the vertical green line. However, in a real experiment these trends are unknown, thus the intersection point cannot be computed. Therefore, a criterion is requested to find * threshold, i.e., a suitable estimate of the threshold.
b. The * threshold (red vertical line in Fig. S2b) is found as the index at which the percentage volume vector ̅ is nearest to a threshold (orange horizontal line in Fig. S2b).
In Fig. S2c, where the * threshold is highlighted by the red vertical line, it is clear that, despite * ≠ , * is located in the same quasi-constant region of , hence this estimated threshold leads to very little differences in terms of clustering performances with respect to the optimal one. 11. The final 3D nuclear OCH is made of voxels that have been classified nucleus at least * times, as shown in Fig. 3d.
In Fig. 3d, it is evident that the proposed CSSI algorithm allows segmenting a 3D nuclear OCH very close to the original one, as underlined by accuracy, sensitivity, and specificity reported below the tomograms.
Moreover, these values are very close to the optimal ones that could be obtained by using the optimal threshold instead of the estimated one * , i.e.
All the parameters involved in the proposed CSSI algorithm are described in Table S1. It is worth underlining that, in our experiments, a resolution factor =10 px has been set to analyse arrays made of at least 190 × 190 × 190 voxels, since it was an optimum compromise between the need of having both high resolution in nucleus segmentation and high statistical power in WMW test (see the analysis about the imaging spatial resolution in Section S3).
Our methodology requires as first step the processing of tens of images for computing the 3D tomographic reconstruction and as second step the implementation of the CSSI algorithm for recovering the 3D nuclear OCH for each cell. The overall computational processing takes tens of minutes on a conventional desktop computer. Instead, the numerical processing to recover nuclear organelles from the recorded images takes few minutes in the 2D case of Amnis ImageStreamX due to the use of a commercial software. However, our computational processing can be easily speeded up using Graphic Processing Units, parallel processing strategies, and faster programming languages (in our implementation we used MATLAB ® R2020a).

Fig. S1. CSSI of the stain-free nuclear OCH in a 3D numerical cell phantom (Supplementary Video 1). a
Central xz-slice of the 3D array divided into distinct cubes with edge =10 px. The central reference cube is highlighted in green while the investigated cubes are highlighted in yellow within the cell shell in blue. b 3D array from which the central xz-slice in (a) has been selected. c Preliminary nucleus set made of -cubes classified nucleus after a first rough clustering. d Vector of sorted p-values ̅ computed through the WMW test between each cube , in the preliminary nucleus set and the reduced nucleus set − , with = 1,2, … , . e Vector of sorted and normalized Euclidean distances ̅ (blue dots) between each cube , in the preliminary nucleus set and the centroid of all cubes in , along with the fourth-degree polynomial fitting ̅ (red line), with = 1,2, … , . f First difference ̅ (red line) of vector of fitted sorted distances ̅ in (b), with highlighted in black the lowest value with null slope. g Filtered nucleus set made of -cubes classified nucleus after a spatial and statistical filtering of the preliminary nucleus set in (c). h Refined nucleus set made of /2-cubes classified nucleus after increasing resolution in the filtered nucleus set in (d) through sub-cubes of size /2. i Partial nucleus set obtained by linking sub-cubes in (e) through segment lines and by using morphological closing. In (b,c,g-i), the blue region is the cell shell and the red region is the segmented nucleus at different steps of the CSSI algorithm. (blue dots), i.e. number of voxels classified nucleus at least times normalized to 20 , along with the threshold (horizontal orange line) used to find the estimated threshold * (vertical red line) by an intersection analysis. c CSSI performances associated to each possible threshold for creating the final 3D nucleus set , expressed in terms of accuracy (blue), sensitivity (magenta), and specificity (orange), along with the optimum threshold (vertical green line in which performances are simultaneously maximized) and its estimation * (vertical red line) computed in (b).

Fig. S3. 2D cyto-fluorimetric images of SK-N-SH cells recorded by Amnis ImageStreamX®.
Three cells recorded simultaneously in brightfield images (top) and fluorescent images with the stained nucleus (bottom). The contour of the nucleus segmented by using the fluorescence information is overlapped in red. Scale bar is 5 μm.

S2. CSSI performances of the stain-free nucleus identification in presence of nucleoli
The nucleolus is the biggest structure inside the nucleus and it is a region particularly dense of genetic material, therefore its malfunctioning has been related to different human diseases S1,S2 . For example, the nuclear-nucleolar volume ratio is expected to change in a cancer cell S3 , as well as the number of nucleoli S4 . Moreover, due to its denser genetic composition, on average the nucleolus has a higher RI than the surrounding nucleus 32 . Therefore, we integrated the nucleolus to the same 1000 numerical 3D cell phantoms described in the Methods section in order to check whether its presence could negatively alter the correct identification of the stain-free nucleus by CSSI. Since in diploid human cells there are up to ten possible nucleoli per cell S4 , for each phantom we randomly drawn the number of nucleoli from the uniform distribution 3 { 3 , 3 }. The overall volume of all the nucleoli in a cell is computed from the nucleus-nucleolus volume ratio randomly drawn from the uniform distribution 4 { 4 , 4 } S3 , and then it is divided equally among all the nucleoli, each of them being simulated as a sphere. The nucleolus RIs are randomly drawn from the gaussian distribution 5 ( 5 , 2 ), where the mean value 5 , in turn randomly drawn from the uniform distribution 5 { 5 , 5 }, depends on the nucleus RIs in order to emulate the biological condition in which the nucleolus average RI is higher than the nucleus one. All the described parameters are reported in Table S2. Within the 1000 3D numerical cell phantoms, the multiple nucleoli are randomly localized inside the nucleus. In Figs.
S5a,d,g,j, four numerical cell phantoms are displayed with the same 15 mitochondria and the same nucleus, and with 0, 3, 6, and 9 nucleoli, respectively. As reported in the corresponding RI histograms in Figs. S5b,e,h,k, respectively, the RI values assigned to the nucleoli are greater than the corresponding outer nuclei but still included in their RI distributions in order to consider the worst case condition for segmentation. In Fig. S5c, the nuclear OCH segmented by the CSSI method without considering the presence of nucleoli is mostly overlapped to the nuclear OCHs displayed in Figs. S5f,i,l obtained by the CSSI method in presence of 3, 6, and 9 nucleoli, respectively, as also suggested by the values of accuracy, sensitivity, and specificity reported in Figs. S5c,f,i,l. After applying the CSSI algorithm to the 1000 numerical 3D cell phantoms, we calculated the values of 9 metrics to quantify the CSSI performances in segmenting the stain-free nucleus in the presence of the nucleoli (see the second column of Table S3). The values of the 9 metrics can be directly compared with the corresponding ones in the first column of Table S3 regarding the segmentation of the stainfree nucleus without the simulation of the nucleoli. Moreover, the comparison in Fig. S4 between the histograms of these 9 metrics about these two case studies clearly show that, despite the presence of the nucleolus, even if performances of the CSSI in identifying the stain-free nucleus have slightly decreased, they still keep very high, over the 90 %. Therefore, the presence of the nucleoli inside the nucleus does not substantially worsen the performances of the CSSI nucleus segmentation algorithm.   S5. Numerical assessment of the CSSI algorithm applied to segment the 3D nuclear OCH after the addition of the nucleoli. a, d, g, j Isolevels representation of the 3D cell model, simulated with the same cell membrane, cytoplasm, nucleus, and 15 mitochondria, and with 0, 3, 6, and 9 nucleoli, respectively. b, e, h, k Histogram of the RI values assigned to the nucleus and the nucleoli in (a, d, g, j), respectively. c, f, i, l Nuclear OCH segmented by the CSSI algorithm within the cell phantoms in (a, d, g, j), respectively. The corresponding CSSI accuracy, sensitivity, and specificity are reported at top right.

S3. Extension of the CSSI algorithm to other organelles
The CSSI algorithm is based on the computation of statistical similarities among groups of voxels inside the same cell by starting from an initial guess on the organelle location. To implement it, the sole property that must be satisfied consists in having different statistical distributions of the reconstructed quantity among the several intracellular organelles, as occurs for example in the RI case 32 . For this reason, the same strategy, based on statistics approach, can be also exploited to segment other cell organelles. In particular, the steps of the CSSI algorithm described in Section S1 have been developed on the basis of this statistics working principle at the aim of identifying a single compact organelle having a distinctive RI distribution inside the cell, and it has been demonstrated for segmenting the nucleus. Here we show that the same algorithm can be implemented to segment another single compact organelle, like the nucleolus in case a cell has a single nucleolus (e.g., in slowly cycling cells S5 ). As regards the nucleolus location, two different situations have to be analyzed, namely case 1 and case 2. To describe them, in Fig. S6 we consider a numerical cell phantom with 15 mitochondria and a spherical nucleolus with a 27 times smaller volume than the surrounding nucleus. In the case 1, the nucleolus is not in the center of the cell, as shown by the violet sphere in Fig. S6a. In the case 2, the nucleolus is located in the center of the cell, as displayed by the violet sphere in Fig. S6g. As reported in the RI histograms in Figs. S6b,h, the RI values assigned to the nucleoli simulated in the cases 1 and 2, respectively, are greater than the corresponding outer nuclei but still included in their RI distributions in order to consider the worst case condition for segmentation. As regards the case 1, the CSSI algorithm starts looking for nuclear voxels from the central cube (i.e., the yellow reference cube in Fig. S6c overlapped to the red simulated nucleus).
As the CSSI algorithm is able to segment the nuclear OCH, i.e. the best convex hull that on average overlaps to the cell nucleus, the hole left by the presence of the nucleolus is automatically closed.
Consequently, the correct nucleus segmentation shown in Fig. S6d, directly comparable with the simulated nucleus in Fig. S6c, can be obtained. Then, as the nucleolus has the highest RIs inside the nucleus, the CSSI algorithm can be easily adapted to search for its voxels by setting as starting reference cube the cube having the highest average RI within the segmented nucleus (i.e., the yellow cube in Fig. S6e overlapped to the simulated nucleolus). As displayed in violet in Fig. S6f, the CSSI output is the nucleolar OCH, i.e. the best convex hull that on average overlaps to the cell nucleolus simulated in Fig. S6f. Instead, the case 2 is exactly the opposite of case 1. In fact, if the nucleolus is in the center of the cell (violet simulated region in Fig. S6i) and the CSSI algorithm normally starts from the central reference cube (yellow cube in Fig. S6i), the first output is the nucleolar OCH, as reported in violet in Fig. S6j. In such a case, it can be easily inferred that the segmented region is the nucleolus rather than the nucleus due to its much smaller size. Then, the second step consists in searching for the nucleus by starting from a reference cube that is adjacent to the outer side of the segmented nucleolus (e.g., the yellow cube in Fig. S6k). Again, the nuclear OCH is correctly segmented, as shown in red in Fig. S6l with respect to the red simulated nucleus in Fig. S6k. Therefore, starting from the same 1000 3D numerical cell phantoms described in the Methods section, we simulated 500 phantoms for the case 1 and 500 phantoms for the case 2 by using the same distributions presented in Section S2 by fixing one single nucleolus per cell. In the third column of Table S3, we report the values of 9 metrics to quantify the CSSI performances in segmenting the stain-free nucleolus (the corresponding histograms are displayed in Fig. S7), which are just slightly worse than the stain-free nucleus ones because of the much smaller number of voxels representative of this organelle. In fact, the principal factor that could limit the success of the CSSI algorithm in case of a single compact organelle is a low imaging spatial resolution with respect to the size of the analyzed organelle, which means that the organelle is represented by a very low number of voxels. In such a case, to take into account a lower number of voxels representing a certain organelle, the resolution factor ε can be reduced. However, it cannot be excessively decreased otherwise the opposite effect is obtained, i.e. a bad segmentation of the organelle due to a too low statistical power of the WMW test. The statistical power is indeed the probability of correctly classifying a voxel as non-organelle when it actually does not belong to that organelle, and it is well known that the statistical power of a hypothesis test increases with the number of observations. To demonstrate this property, we simulated a 3D numerical cell phantom with 15 mitochondria and we used it to create 8 possible resolution scenarios, i.e. we simulated inside it the same spherical organelle with a radius changing from 15 to 50 pixels (step of 5 pixels). In Fig. S8a, we show three of these phantoms corresponding to the 50, 30, and 15 pixels radii. For each of these 8 resolution scenarios, we implemented the CSSI algorithm by using three resolution factors, i.e. ε=10 px, ε=8 px, and ε=6 px, which segmentation results are reported in Figs. S8b-d, respectively, for the 50, 30, and 15 pixels radii. In order to quantitatively compare the CSSI segmentation at different organelle sizes (i.e., imaging spatial resolutions) and resolution factors ε, we computed the F1 scores through the formula in Table S3. In Fig. S8e, it is evident that by fixing a too low resolution factor (i.e., ε=6 px), the segmentation performances significantly drop because of the low statistical power of the WMW test.
Instead, the segmentation performances are much higher in the ε=10 px and ε=8 px cases, which means that starting from the resolution factor ε=8 px, the statistical power (i.e., the number of observations) of the WMW test is enough to guarantee a good segmentation output. Moreover, as can be expected, the zoom-in of the F1 score plot in Fig. S8f highlights a growing trend of the classification performances with the organelle pixel size (i.e., with the imaging spatial resolution). In particular, the ε=8 px curve shows the best behavior since it has the lowest slope, which means that, by reducing the imaging spatial resolution, the segmentation performance decreases more slowly than the ε=10 px. In fact, after reaching a sufficient statistical power by selecting ε>6 px, the ε=8 px resolution factor allows following better the external organelle shape due to a better spatial resolution despite the number of observations is lower than the ε=10 px case. It is worth underlying two other important points. First, by using a resolution factor ε at least equal to 8 px, an organelle with an equivalent radius less than 15 pixels cannot be examined since the organelle would be covered by very few distinct cubes. Furthermore, the CSSI computational time increases in a nonlinear way with the resolution factor ε, therefore we selected ε=10 px as a good trade-off to analyze our simulated and experimental tomograms. In summary, remarkably the CSSI algorithm can be used to segment other single compact organelles different from the nucleus but, in case they are very small, the imaging spatial resolution must be able to provide an equivalent radius >15 pixels to implement the CSSI algorithm with at least a resolution factor ε=8 px. Otherwise, the statistics of the WMW test are not robust and this greatly worsens the segmentation performances. Instead, in case of multiple organelles, the same working principle based on the statistical similarities can be exploited. However, the algorithm implemented in Section S1 must be slightly modified to avoid the constraints herein used to gather all the candidate cubes at the aim of segmenting a single compact region, i.e. the nucleus. Fig. S6. Numerical assessment of the CSSI algorithm applied to segment a single 3D nucleolar OCH when far from the center (case 1 in a-f) or in the center (case 2 in g-l) of the 3D numerical cell phantom. a, g Isolevels representation of the 3D cell model, simulated with five sub-cellular components, i.e., cell membrane, cytoplasm, nucleus, nucleolus, and 15 mitochondria. b, h Histogram of the RI values assigned to the nucleus and the nucleolus in (a, g), respectively. c, d Respectively, simulated nucleus and corresponding nuclear OCH (red) segmented by the CSSI algorithm despite the presence of the not-centered nucleolus in (a). In (c), the starting central reference cube C R of the CSSI algorithm is overlapped in yellow. e, f Respectively, simulated nucleolus and corresponding nucleolar OCH (violet) segmented by the CSSI algorithm after the nucleus identification (red) in (d). In (e), the starting reference cube C R of the CSSI algorithm is overlapped in yellow in the zone of the segmented 3D nuclear OCH in (d) having the highest RIs. i, j Respectively, simulated nucleolus and corresponding nucleolar OCH (violet) segmented by the CSSI algorithm because, by starting from the central reference cube C R overlapped in yellow in (i), the centered nucleolus in (g) is found. k, l Respectively, simulated nucleus and corresponding nuclear OCH (red) segmented by the CSSI algorithm after recognizing that the region segmented in (j) is the nucleolus due its small size. In (k), the starting reference cube C R of the CSSI algorithm is overlapped in yellow in the adjacent zone to the outer side of the segmented 3D nucleolar OCH in (j).  S8. CSSI performances in segmenting a generic organelle with respect to the imaging spatial resolution (i.e., its number of voxels) and the CSSI resolution factor . a Three 3D numerical cell phantoms with 15 mitochondria (green) and a spherical organelle (red) having an equivalent radius of 50 px (left), 30 px (center), and 15 px (right). b-d CSSI segmentation of the red organelles in (a) by using the resolution factors =10 px, =8 px, and =6 px, respectively. e F1 score in segmenting the red organelle in (a) by using the resolution factors =10 px, =8 px, and =6 px while changing its equivalent radius (i.e., the imaging spatial resolution) between 15 px and 50 px. f Zoom-in of the =10 px and =8 px curves in (e).