Label-free identification of non-activated lymphocytes using three-dimensional refractive index tomography and machine learning

Identification of lymphocyte cell types is crucial for understanding their pathophysiologic roles in human diseases. Current methods for discriminating lymphocyte cell types primarily rely on labelling techniques with magnetic beads or fluorescence agents, which require time-consuming and costly sample preparation and carry a potential risk of altering cellular functions. Here, we present label-free identification of non-activated lymphocyte subtypes using refractive index tomography. From the measurements of three-dimensional refractive index maps of individual lymphocytes, the morphological and biochemical properties of the lymphocytes are quantitatively retrieved. Machine learning methods establish an optimized classification model using the retrieved quantitative characteristics of the lymphocytes to identify lymphocyte subtypes at the individual cell level. We show that our approach enables label-free identification of three lymphocyte cell types (B, CD4+ T, and CD8+ T lymphocytes) with high specificity and sensitivity. The present method will be a versatile tool for investigating the pathophysiological roles of lymphocytes in various diseases including cancers, autoimmune diseases, and virus infections.

To understand the roles of different types of lymphocytes, several methods based on labelling techniques have been developed to identify and discriminate lymphocyte cell types. Because different kinds of lymphocytes have very similar cellular morphology such as a large nucleus with small cytosolic regions and round shapes, conventional optical methods such as bright-field microscopy or phase contrast microscopy have limited ability in classifying lymphocyte cell types 10 .
To overcome this, specific surface membrane proteins, known as surface markers, are recognized and tagged with magnetic beads or fluorescent molecules via antigen-antibody binding. Specific types of lymphocytes are then identified and separated by magnetic forces or fluorescence signals 11 . Targeting surface markers is a precise and efficient way to determine cell types; however, labelling methods risk altering cellular functions by modifying membrane protein structures. Moreover, labelling methods cannot simultaneously identify many cell types because the number of distinguishable labelling agents is limited 12 .
Label-free approaches such as mass spectroscopy 12 and Raman spectroscopy 10 have also been introduced to overcome the limitations of labelling methods because these spectroscopic methods exploit intrinsic biochemical properties of lymphocytes. Mass spectroscopy measures cellular biochemical properties which enable the profiling of lymphocyte proteins as well as the identification of lymphocyte subtypes. However, it has a limitation in live-cell analysis due to the homogenization process of the cells. Raman spectroscopy measures molecular vibrations and characterizes biochemical properties of a sample. Raman spectroscopy permits label-free live-cell analysis of lymphocytes with high accuracy; however, it requires a bulky optical system and long acquisition time (several seconds) to measure 2-D Raman signals, which limits its broad use in clinics.
Here, we present a label-free method to identify lymphocyte cell types by exploiting optical diffraction tomography (ODT) and machine learning. ODT is a label-free imaging technique that measures a three-dimensional (3-D) refractive index (RI) tomogram of a sample, which quantitatively provides morphological and biochemical information 13,14 . Previously, ODT has been widely used to study various biological samples including red blood cells [15][16][17][18][19][20][21][22] , white blood cells (WBCs) 23 36 , and hair 37 . In our previous study, we reported that ODT enables the quantitative analysis of WBCs including lymphocytes and macrophages 23 . We demonstrated that lymphocytes and macrophages could be discriminated based on quantitative morphological and biochemical properties; however, we were unable to identify lymphocyte cell types due to their indistinguishable cellular morphology and biochemical characteristics.
In the present study, to find morphological and biochemical differences among otherwise indistinguishable lymphocyte cell types, we used machine learning techniques. Machine learning methods construct classification models by combining multiple features in a data-driven manner rather than a hypothesis-driven manner. This approach is especially powerful for high-dimensional data that are extremely difficult for a human to process manually because of their complexity and large size 38 . In biomedical research fields, machine learning methods have been widely used to solve complex biological problems: identification of bacterial species 39 , discrimination of WBC types 40,41 , and investigation of pathophysiologic conditions [42][43][44] . However, there have been no applications that use both 3-D RI tomography and machine learning for biomedical studies. Here, we exploit machine learning techniques to establish classification models using the quantitative morphological and biochemical information of lymphocytes, which is retrieved from the 3-D RI tomograms of individual cells, and show that the established classification models enable the identification of three lymphocyte subtypes (B, CD4+ T, and CD8+ T lymphocytes). The present approach is a versatile tool for determining lymphocyte cell types in a label-free manner.

Results
The overall procedure for label-free identification of lymphocytes is summarised in Fig. 1. The present method involves three steps: (i) measurement of the 3-D RI tomograms of individual lymphocytes (Fig. 1a), (ii) establishment of a statistical classification model using the quantitative biochemical and morphological features of the lymphocytes obtained from the 3-D RI tomograms (Fig. 1b), and (iii) identification of the lymphocyte types using the established classifier at a single cell level (Fig. 1c). Figure 1a shows the procedures for reconstructing the 3-D RI tomograms of the lymphocytes. To reconstruct the 3-D RI tomograms, multiple 2-D holograms of a cell are measured at various angles of illuminations using an interferometric microscope (Fig. 1d). A coherent laser beam is split into two arms by a beam splitter. One arm passes through a sample, and then, the diffracted light from the sample is projected onto a camera plane with a microscope. At the camera plane, the sample beam interferes with the other arm and generates a spatially modulated hologram. The angle of the beam impinging onto the sample is controlled by a dual-axis galvanomirror. From the measured holograms, complex optical fields consisting of both the amplitude and quantitative phase images are retrieved using a field retrieval algorithm (See Methods).
Then, a 3-D RI tomogram of a lymphocyte is reconstructed using the retrieved multiple optical amplitude and phase information via an optical diffraction tomography algorithm 14,45 .
To test whether 3-D RI tomograms of lymphocytes enable the identification of their cell types, three types of lymphocytes (B, CD4+ T, and CD8+ T lymphocytes) were obtained from mouse peripheral blood through specific surface marker staining and flow cytometry (see Methods) prior to measuring their 3-D RI tomograms.  pg for the B cells, CD4+ and CD8+ T cells, respectively (Fig. 3e). There were significant differences in the total cellular dry mass among the cell types (p-values < 0.001). The B cells had a smaller cellular mass compared to the T cell subsets.
Moreover, the CD8+ T cells were significantly heavier than the CD4+ T cells. These results reveal that the biochemical characteristics of the lymphocyte types differ significantly. However, even though the quantitative morphological and biochemical analyses show statistical differences among the lymphocyte populations, the cell type of a single lymphocyte cannot be identified from a single retrieved quantitative characteristic because of cell-to-cell variations.
To identify the lymphocyte types at an individual cell level, machine learning methods were exploited to construct classification models using multiple quantitative characteristics of the lymphocytes. As a proof of concept, a training-and-identification method, called statistical classification or supervised machine learning, was used. Seventy percent of the total lymphocytes were randomly chosen and systematically analysed to extract the unique characteristics of each lymphocyte type (Fig. 1b), and then the cell types of the remaining lymphocytes were identified with the extracted features (Fig. 1c). For the statistical model, a k-nearest neighbour (k-NN) algorithm was used 46,47 , which has been widely used for classification purposes in various research fields. The k-NN algorithm is a supervised machine learning method that establishes a classifier based on the quantitative characteristics of training lymphocytes of known cell types. In this study, a four-nearest-neighbour (k-NN, k = 4) algorithm was used to establish the classification models.
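The training-and-identification procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the synthetic two-feature data, the random seeds, and the tie-breaking rule (a 2-2 vote is possible with k = 4) are all assumptions made for illustration, and real features would normally be rescaled before computing distances.

```python
import math
import random
from collections import Counter


def knn_predict(train_features, train_labels, query, k=4):
    """Label `query` by majority vote among its k nearest training cells."""
    ranked = sorted(
        zip(train_features, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in ranked[:k]]
    votes = Counter(nearest_labels)
    top = max(votes.values())
    # Assumed tie-break: among tied labels, the one with the closest cell wins.
    for label in nearest_labels:
        if votes[label] == top:
            return label


def split_train_test(features, labels, train_fraction=0.7, seed=0):
    """Randomly assign a fraction of the cells to training, the rest to test."""
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    cut = round(train_fraction * len(order))
    train, test = order[:cut], order[cut:]
    return (
        [features[i] for i in train], [labels[i] for i in train],
        [features[i] for i in test], [labels[i] for i in test],
    )


# Synthetic, well-separated two-feature data (hypothetical, not measured values).
rng = random.Random(1)
features, labels = [], []
for label, centre in (("B", (0.0, 0.0)), ("T", (10.0, 10.0))):
    for _ in range(50):
        features.append((rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)))
        labels.append(label)

train_x, train_y, test_x, test_y = split_train_test(features, labels)
predictions = [knn_predict(train_x, train_y, x) for x in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(f"test accuracy on synthetic data: {accuracy:.2f}")
```

Because the synthetic clusters are far apart, this toy example says nothing about the accuracy achievable on real lymphocytes; it only shows the 70/30 split and the four-neighbour vote.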
To exploit the structural and biochemical information of the intracellular organelles of the lymphocytes as features of the k-NN (k = 4) algorithm, the quantitative characteristics of the lymphocytes were calculated at various threshold RI values by increasing the threshold from 1.34 to 1.378 in increments of 0.002. We found that the 3-D RI distribution above the threshold RI increasingly reveals information about the intracellular components as the threshold RI value increases (Supplementary Fig. 1). All combinations of structural and biochemical information obtained at a single RI threshold or at two different RI thresholds were tested with cross-validation to establish an optimized classifier, and the classifier with the best performance was selected. The remaining lymphocytes were then identified using the established classifier, and the identification accuracy (sensitivity and specificity) was measured. Figure 4 and Table 1 show the overall cross-validation accuracy, sensitivity (true positive results over all positive inputs), and specificity (true negative results over all negative inputs) of the training and test results. We performed statistical classification on three different combinations of lymphocyte types: (i) B and T lymphocytes, (ii) T lymphocyte subsets (CD4+ and CD8+), and (iii) all three types of lymphocytes. First, the T cell subsets were considered as one T cell type, and the classification model was optimized to have the best overall cross-validation accuracy. Features used for establishing the classifier were the surface area, sphericity, and dry mass (RI threshold: 1.342) and all the quantitative information (RI threshold: 1.368). The overall accuracy of the optimized classifier for identifying the B and T cells was 93.15%, and the test accuracy was 89.81% (Fig. 4a). Second, the CD4+ and CD8+ T cells were analysed.
To optimize the statistical classification model, the surface area and sphericity (RI threshold: 1.342 and 1.362) of the CD4+ and CD8+ T cells, respectively, were used as features. The classification accuracy was 87.41% and 84.38% for the training and test, respectively (Fig. 4b). Lastly, the statistical classification was performed to identify the three types of lymphocytes. The classification model was optimized with the surface area and dry mass density (RI threshold: 1.340) and the sphericity, dry mass density, and dry mass (RI threshold: 1.370) as features. The training and test accuracies were 80.65% and 75.93%, respectively. These results indicate that machine learning enables the identification of lymphocyte cell types with an accuracy of over 75%.
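As a rough illustration of the search space and the reported metrics, the sketch below enumerates the RI thresholds stated in the text (1.340 to 1.378 in steps of 0.002) and computes sensitivity and specificity for one cell type treated as the positive class. The function name, the one-versus-rest treatment, and the toy label lists are assumptions for illustration; how feature subsets were combined within each threshold is not reproduced here.

```python
from itertools import combinations

# RI thresholds from 1.340 to 1.378 in increments of 0.002, as stated in the text.
thresholds = [round(1.340 + 0.002 * i, 3) for i in range(20)]

# Candidate settings use either a single threshold or two different thresholds.
threshold_pairs = list(combinations(thresholds, 2))
n_threshold_choices = len(thresholds) + len(threshold_pairs)


def sensitivity_specificity(true_labels, predicted_labels, positive):
    """One-vs-rest sensitivity and specificity for the `positive` cell type."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)  # true positives over all positive inputs
    specificity = tn / (tn + fp)  # true negatives over all negative inputs
    return sensitivity, specificity


# Toy labels (hypothetical): five cells, "B" treated as the positive class.
true_labels = ["B", "B", "T", "T", "T"]
predicted_labels = ["B", "T", "T", "T", "B"]
sens, spec = sensitivity_specificity(true_labels, predicted_labels, "B")
print(n_threshold_choices, sens, spec)
```

With 20 thresholds there are 20 single-threshold and 190 two-threshold settings, so 210 threshold choices are scanned before feature subsets are considered.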

Discussion
We demonstrated label-free identification of lymphocyte cell types at a single cell level using ODT and statistical classification. ODT provides quantitative morphological and biochemical information on the lymphocytes by measuring their 3-D RI distribution. We found that there were significant differences in the quantitative characteristics among the lymphocyte cell population; however, individual lymphocytes in different lymphocyte cell types are indistinguishable due to cell-to-cell variations. To overcome this limitation, the k-NN (k = 4) algorithm was exploited as a statistical classification method to establish classification models using the quantitative characteristics of the lymphocytes. The optimized classification models can discriminate B and T cells with high accuracy. In addition, the T cell subsets were identified using the k-NN (k = 4) algorithm with an overall accuracy of over 80%. Moreover, the three types of lymphocytes were identified using the classification model with an overall accuracy of over 75%.
The identification results show that the classification models discriminate between B and T lymphocytes more precisely than between the T cell subsets, which implies that differences in cellular morphology and biochemical properties between B and T cells are more distinct than those between the CD4+ and CD8+ T cells. The results are consistent with previous knowledge of the lymphocyte-differentiation pathway 48 . B and T lymphocytes originate from hematopoietic stem cells and then mature in different organs. Thus, lymphocytes have similar cellular phenotypes such as one large nucleus and spherical shapes; however, B and T lymphocytes have entirely different cellular functions. Even though our method established the classifiers by optimizing features without biological relevance, the machine learning algorithm found distinct differences in the morphological and biochemical properties among the lymphocyte subtypes.
The present method combines 3-D RI tomography and machine learning, which provides several advantages. First, the present method enables label-free identification of lymphocyte subtypes which cannot be achieved by optical microscopic techniques without using fluorescent methods because of similar phenotypes in lymphocyte subtypes. ODT measures the 3-D RI distribution of the lymphocytes, and machine learning methods find significant differences among the lymphocyte subsets to identify their cell types. Second, the present method has a simple and cost-effective optical setup compared to flow cytometry or other label-free techniques such as Raman spectroscopy. Recently, a 3-D holographic microscope has become commercially available, which simplifies the optical system and reduces the required time for measuring a 3-D RI tomogram by exploiting a digital micromirror device 49 . Thus, the present method can be easily transferred to basic research facilities and clinics. Lastly, there is no limitation in applying the present method to discriminate other types of cells including WBCs, cancer cells, neurons, and glial cells. Because ODT has been widely used to measure various biological samples, the present approach can be readily used to classify various cell types.
There are several points to be improved in future work. The present study provides a proof of concept; however, the overall accuracies for identifying each lymphocyte type should be enhanced. Several classification algorithms including two k-NN algorithms (k = 4 and k = 6), linear discrimination, quadratic discrimination, naïve Bayes, and decision tree were tested, and the k-NN (k = 4) algorithm showed the best performance for identifying the lymphocyte types (Supplementary Fig. 2).
However, these classifiers are at the basic level of machine learning methods. Recently, deep learning methods have been introduced and widely used in various research fields including image recognition 50 , speech recognition 51 , and biomedical research [52][53][54] . Thus, the identification accuracy could be further improved by combining ODT with deep learning methods. In addition, the speed of measuring the 3-D RI tomograms could be improved. For single lymphocyte imaging, we located and measured lymphocytes sparsely placed on a cover glass, which limits the speed of the tomogram measurements. We expect that microfluidic approaches might increase the speed of measuring the 3-D RI tomograms of lymphocytes and make the present approach a practical method for discriminating cell types.
In summary, we envision that ODT combined with machine learning will be a useful tool in biomedical research. ODT quantitatively provides the morphological and biochemical characteristics of samples, and machine learning enables the classification of cell types using the measured quantitative information. The present method can be further used to study immunology, cancer, and neuroscience.

Mice
C57BL/6J mice (gender and age-matched, 6-8 weeks) were purchased from Daehan Biolink (Korea). Animal care and experimental procedures were performed under approval from the Animal Care Committee of KAIST (KA2014-01 and KA2015-03). All the experiments in this study were carried out in accordance with the approved guidelines.

Flow cytometry and Lymphocyte Sorting
White blood cells were isolated from the blood harvested from the heart of mice. Erythrocytes were removed by ACK lysis.

Measurement of 3-D refractive index tomograms
To reconstruct the 3-D RI tomograms of lymphocytes, a Mach-Zehnder interferometric microscope was used 55 (Fig. 2). A laser beam from a diode-pumped solid-state laser (λ = 532 nm, 100 mW, Shanghai Dream Laser Co., Shanghai, China) is split into two arms using a beam splitter. One arm illuminates a sample with various illumination angles ranging from −60° to 60° in air at the sample plane with respect to the optic axis, which is systematically controlled with a dual-axis galvanomirror (GVS012, Thorlabs, Newton, NJ, USA), and the other is used as a reference beam. The sample is placed between a condenser lens (UPLSAPO Water 60×, numerical aperture (NA) = 1.2, Olympus, Japan) and an objective lens (PLAPON Oil 100×, NA = 1.4, Olympus, Japan). The diffracted light from the sample is then collected by the objective lens and projected onto the camera plane. At the camera plane, the sample beam interferes with the reference beam, generating spatially modulated holograms, which are then captured by a CMOS camera (1024 PCI, Photron USA Inc., San Diego, CA, USA). For reconstructing a 3-D RI tomogram, a total of 300 holograms of a sample are measured by changing the illumination angle, which takes less than 1 s. Then, the optical field information (amplitude and phase) of the measured holograms is retrieved using a field retrieval algorithm based on Fourier transformation 56,57 . From the retrieved multiple amplitude and phase information, a 3-D RI tomogram is reconstructed using an optical diffraction tomography algorithm. An iterative regularization algorithm with a non-negativity constraint was used to fill the missing cone information, which results from the limited NA of the condenser and objective lenses 58 . Details on reconstructing 3-D RI tomograms can be found elsewhere 15,59 .
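The Fourier-transform-based field retrieval mentioned above can be illustrated with a one-dimensional analogue. This is only a sketch of the general off-axis demodulation idea under stated assumptions, not the algorithm of refs 56,57: the hologram is synthetic, the carrier frequency is assumed to be known and to fall exactly on a Fourier bin, and the helper names (`dft`, `retrieve_field`) and all numerical values are hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive discrete Fourier transform (O(N^2)); adequate for a short demo."""
    n = len(values)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(
            values[j] * cmath.exp(sign * 2j * math.pi * j * k / n)
            for j in range(n)
        )
        result.append(total / n if inverse else total)
    return result


def retrieve_field(hologram, carrier, half_width):
    """Isolate the sideband around `carrier`, move it to zero frequency, invert."""
    n = len(hologram)
    spectrum = dft(hologram)
    baseband = [0j] * n
    for offset in range(-half_width, half_width + 1):
        baseband[offset % n] = spectrum[(carrier + offset) % n]
    return dft(baseband, inverse=True)


# Synthetic sample: unit amplitude with a smooth phase bump (assumed values).
n_pixels, carrier = 128, 40
phase = [0.5 * math.exp(-((j - 64) ** 2) / (2 * 8.0 ** 2)) for j in range(n_pixels)]
sample = [cmath.exp(1j * p) for p in phase]
# Tilted plane-wave reference; the tilt sets the spatial modulation frequency.
reference = [cmath.exp(-2j * math.pi * carrier * j / n_pixels) for j in range(n_pixels)]
# The camera records intensity only: |sample + reference|^2.
hologram = [abs(s + r) ** 2 for s, r in zip(sample, reference)]

field = retrieve_field(hologram, carrier, half_width=15)
max_phase_error = max(abs(cmath.phase(f) - p) for f, p in zip(field, phase))
print(f"maximum phase error: {max_phase_error:.2e} rad")
```

In a real measurement the same steps are applied in two dimensions with a fast Fourier transform, and the retrieved phase generally needs unwrapping and background subtraction, which this sketch omits.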

Quantitative characterization of the structural and biochemical information of the lymphocytes
Quantitative structural and biochemical information of the lymphocytes was calculated from the measured 3-D RI tomograms. To calculate the cellular volume $V$ and surface area $S$ of a lymphocyte, the voxels of its 3-D RI tomogram with RI values higher than the threshold RI value were selected. From the selected voxels, the cellular volume and surface area were calculated from the number of voxels and the cellular boundary, respectively. Sphericity, a dimensionless parameter that indicates the roundness of a lymphocyte, was obtained from the measured cellular volume and surface area as follows: Sphericity = $\pi^{1/3}(6V)^{2/3}/S$. The biochemical information (dry mass density and cellular dry mass) was obtained from the RI values using the linear relation between the RI value and the local concentration of non-aqueous molecules (i.e., proteins, lipids, and nucleic acids inside cells). RI values were converted to the concentration of non-aqueous molecules (mostly proteins) with the following relationship: $n = n_0 + \alpha C$, where $n$ is the RI value of a voxel, $n_0$ is the RI value of the background, $C$ is the concentration of non-aqueous molecules, and $\alpha$ is the refractive index increment (RII) of proteins. Because most proteins have similar RII values, we used an RII value of 0.2 mL/g in this study. The total dry mass of a lymphocyte was calculated by integrating the dry mass density over the cellular volume. Details on calculating the quantitative information of a sample can be found elsewhere 23,25 .
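The relations in this section can be checked with a short numerical sketch. It assumes the tomogram is available as a flat list of voxel RI values and that the surface area has already been measured; the function names, the medium RI of 1.337, and the voxel values are hypothetical. The unit bookkeeping uses the fact that 1 g/mL multiplied by 1 µm³ equals 1 pg.

```python
import math


def sphericity(volume, surface_area):
    """Sphericity = pi^(1/3) * (6V)^(2/3) / S; equals 1 for a perfect sphere."""
    return math.pi ** (1 / 3) * (6 * volume) ** (2 / 3) / surface_area


def dry_mass_features(ri_voxels, threshold, n_medium, voxel_volume_um3,
                      rii_ml_per_g=0.2):
    """Volume (um^3), mean dry mass density (g/mL), and dry mass (pg) of a cell.

    Voxels with RI above `threshold` are taken to belong to the cell, and each
    RI value is converted to a concentration via C = (n - n_medium) / RII.
    """
    selected = [n for n in ri_voxels if n > threshold]
    volume_um3 = len(selected) * voxel_volume_um3
    # Sum concentration over the selected voxels: (g/mL) * um^3 gives pg.
    dry_mass_pg = sum((n - n_medium) / rii_ml_per_g for n in selected) * voxel_volume_um3
    mean_density = dry_mass_pg / volume_um3 if volume_um3 else 0.0
    return volume_um3, mean_density, dry_mass_pg


# Toy tomogram (hypothetical values): 10 background voxels and 5 cell voxels.
ri_voxels = [1.337] * 10 + [1.377] * 5
volume, density, mass = dry_mass_features(
    ri_voxels, threshold=1.340, n_medium=1.337, voxel_volume_um3=2.0
)
print(volume, density, mass)

# Sanity check: a sphere of radius 3 has a sphericity of 1.
radius = 3.0
sphere_value = sphericity(4 / 3 * math.pi * radius ** 3, 4 * math.pi * radius ** 2)
```

In the toy tomogram each cell voxel exceeds the medium RI by 0.04, which corresponds to 0.2 g/mL at an RII of 0.2 mL/g, so five voxels of 2 µm³ each carry about 2 pg of dry mass.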

Image processing and statistical analysis
Image processing was performed with MATLAB R2014b and ImageJ. Statistical analysis was performed with GraphPad Prism software. The RI isosurfaces were rendered with commercial software (TomoStudio, Tomocube Inc., Korea).

COMPETING FINANCIAL INTERESTS
Prof. Park has financial interests in Tomocube Inc., a company that commercializes optical diffraction tomography and quantitative phase imaging instruments and is one of the sponsors of the work.