Introduction

Myelodysplastic syndromes (MDS) are heterogeneous clonal hematopoietic stem cell disorders characterized by ineffective, neoplastic hematopoiesis and dysplasia of one or more of the major hematopoietic lineages, associated with a variable risk of progression to acute leukemia1,2,3. Although several susceptibility genes have recently been identified (e.g., DNMT3A, TET2, ASXL1, TP53, and RUNX1), the pathogenesis is not yet completely understood4,5. Therefore, the diagnostic workup relies on conventional tests, including a complete blood count (CBC), morphological examination of peripheral blood (PB) smears and bone marrow (BM) aspirates and biopsies, and flow cytometry6,7. The first two tests are valuable initial diagnostic steps, being much less invasive and costly than the other examinations.

Currently, normal leukocytes can be differentiated using automated hematology analyzers equipped with optical sensors and mathematical computer-based algorithms (e.g., the Sysmex XN series)8,9. However, because the morphology of dysplastic blood cells in patients with hematological disorders is much more elaborate than that of normal cells, manual microscopic examinations, which are time-consuming, demanding, and subjective, remain the mainstay of diagnosis. Thus, many industrial and academic researchers have sought to develop efficient and accurate automated diagnostic systems. Recent advances in computer technology have been used to derive automated diagnostic systems for leukemia. Over the past decade, more than 20 studies have attempted to diagnose hematological malignancies, mainly acute lymphoblastic leukemia (ALL), using various mathematical algorithms to recognize and classify cell images10,11,12. This process requires several complex steps, such as preprocessing, segmentation, feature extraction, and classification13. Recently, convolutional neural networks (CNNs), advanced forms of deep learning, have been used to optimize the parameters automatically, without the need for mathematical algorithms14. CNNs classify cell images more accurately than conventional neural networks or machine-learning systems14.

In this study, we first developed an automated blood cell image-recognition system using a deep learning system (DLS) powered by CNNs that simultaneously classifies 17 blood cell types and 97 morphological features of such cells. Second, we created an automated MDS diagnostic support system by combining the CNN-based image-recognition system with a form of extreme gradient boosting (XGBoost). We then evaluated the diagnostic system using PB smear samples obtained from patients with MDS or aplastic anemia (AA). We chose AA for the comparison because, in contrast to MDS, dysmorphic cells are not often evident in PB samples of AA, although both diseases are characterized by reticulocytopenic anemia and variable neutropenia and thrombocytopenia due to BM failure15. Our diagnostic system successfully differentiated MDS from AA with high accuracy when benchmarked against human diagnoses. Here, we describe in detail how we developed this new MDS diagnostic system.

Results

Performance of the DLS in terms of morphological classification of blood cell types

The DLS performance in terms of morphological classification of blood cell types was validated using the validation datasets generated as described in Materials and Methods (Table 1). Table 2 shows that the DLS cell differentiation sensitivity ranged from 93.9 to 99.8%, and the specificity from 96.0 to 100%. We compared the DLS performance with that of the DI-60, a conventional computer-based image-recognition system for automated hematology analyzers (Sysmex), and observed that the DLS was more sensitive and specific (Supplemental Table S1). Figure 1 shows the DLS confusion matrix for the 17 blood cell types, compared to the reference classification of the validation dataset. The DLS tended to misclassify segmented neutrophils as band neutrophils, lymphocytes as variant lymphocytes, band neutrophils as metamyelocytes, metamyelocytes as myelocytes, promyelocytes as myelocytes, and large platelets as thrombocyte aggregations.
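The per-class sensitivity and specificity reported in Table 2 can be derived directly from a multi-class confusion matrix like that of Fig. 1. The following is a minimal illustrative sketch (toy numbers, not the study's code or data):

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class sensitivity and specificity from a square multi-class
    confusion matrix (rows = reference class, columns = predicted class)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp  # reference cells the classifier missed
    fp = cm.sum(axis=0) - tp  # other cells wrongly assigned to the class
    tn = cm.sum() - (tp + fn + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Toy 3-class confusion matrix (illustrative values only)
cm = [[90, 8, 2],
      [5, 92, 3],
      [1, 4, 95]]
sens, spec = per_class_metrics(cm)
```

For class i, sensitivity is computed along row i (TP/(TP+FN)) and specificity from all remaining cells (TN/(TN+FP)), which is how a single 17x17 matrix yields the 17 sensitivity/specificity pairs of Table 2.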

Table 1 Images in the datasets.
Table 2 Cell classification performance of the DLS.
Figure 1
figure 1

Confusion matrix for the 17 types of differentiated blood cells. The 17 blood cell types differentiated by the DLS are compared to the reference classification of the validation dataset. SNE, segmented neutrophil; LY, lymphocyte; MO, monocyte; BNE, band neutrophil; EO, eosinophil; BA, basophil; MY, myelocyte; MMY, metamyelocyte; ERB, erythroblast; BL, blast; PMY, promyelocyte; LP, large platelet; ART, artifact; SMU, smudge; TAG, thrombocyte aggregation; VLY, variant lymphocyte; MEK, megakaryocyte.

To dissect such misclassifications in the confusion matrix, we examined the internal features learned by the DLS using t-distributed stochastic neighbor embedding (t-SNE)16. Figure 2 shows cell images projected from the 2,048-dimensional output of the last hidden layer of the DLS onto two dimensions. Blasts (red dots) lie in the center of the field. Three types of cells (granulocytes, lymphocytes, and monocytes) surround the blasts. Granulocytes are distributed to the left of the blasts, all the way from the most differentiated segmented neutrophils (top) to the most immature promyelocytes (bottom). In contrast, lymphocytes are located to the right of the blasts and are distributed from immature variant lymphocytes (top) to mature lymphocytes (bottom). Eosinophils, basophils, and monocytes occupy relatively discrete locations. Some band neutrophils lie within the metamyelocyte cluster; the DLS may thus be unable to differentiate these two cell types. Megakaryocytes lie adjacent to blasts, which might compromise the accuracy of image recognition. Large platelets and platelet aggregations lie at the extreme right of the field.
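The projection in Fig. 2 follows a standard recipe: collect the activations of the network's last hidden layer for each cell image and reduce them to two dimensions with t-SNE. A minimal sketch with scikit-learn, using synthetic Gaussian clusters as a stand-in for the study's actual 2,048-dimensional activations:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for the 2,048-dimensional last-hidden-layer activations:
# 150 synthetic "cells" drawn from three separated Gaussian clusters.
features = np.vstack([
    rng.normal(loc=c, scale=0.5, size=(50, 2048)) for c in (0.0, 3.0, 6.0)
])

# Project the high-dimensional activations onto two dimensions, as in Fig. 2.
embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(features)
```

In the study's setting, `features` would instead hold the penultimate-layer outputs for the validation images, with each embedded point colored by its reference cell type.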

Figure 2
figure 2

t-SNE visualization of the “DLS last hidden layer” representations of 17 blood cell types. The 17 types of cell clusters are colored and labeled with abbreviations of the cell types (inset).

DLS performance in terms of recognizing morphological abnormalities

Next, we explored how accurately the DLS automatically detected dysmorphic features of peripheral blood cells of each hematopoietic lineage in the validation datasets generated as described in Materials and Methods. Table 3 shows the sensitivity, specificity, and area under the curve (AUC) calculated from the receiver operating characteristic (ROC) curve. The sensitivity was high (80 to 98%) except for the detection of giant platelets. Representative images of dysmorphic peripheral blood cells in the validation datasets are shown in Supplementary Fig. 1.
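The AUC values in Table 3 summarize each detector's ROC curve: the AUC equals the probability that a randomly chosen dysmorphic cell receives a higher detector score than a randomly chosen normal cell. A minimal sketch via the rank (Mann-Whitney) statistic, with toy numbers rather than the study's data:

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC of the ROC curve computed as the rank (Mann-Whitney) statistic:
    the probability that a random positive scores above a random negative."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Count pairwise wins; ties count half.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Toy example: dysmorphic (1) vs. normal (0) cells with detector scores.
y = [1, 1, 1, 0, 0, 0]
s = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(y, s)
```

An AUC of 1.0 means every dysmorphic cell outscores every normal cell; 0.5 means the detector is no better than chance.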

Table 3 Classification performance of our system in terms of dysmorphic blood cells.

DLS performance in terms of the differential diagnosis of MDS and AA

Although both MDS and AA can trigger pancytopenia, dysmorphic blood cells are not often evident in AA, in contrast to MDS17. In MDS, neutrophils undergo degranulation or abnormal granulation and may exhibit the pseudo-Pelger-Huet anomaly and/or hypo- or hyper-segmentation; giant neutrophils and platelets are also evident17,18. To allow automated diagnosis of MDS, 114 image-pattern parameters derived from smears of MDS and AA patients were fed to XGBoost, which automatically analyzed the extent and nature of normal and dysmorphic images and then diagnosed MDS or non-MDS using the test datasets.

Figure 3 shows a heat map of dysmorphic cell features based on the SHapley Additive exPlanations (SHAP) values analyzed by our system for each case (MDS: cases 1–26; AA: cases 1–11). The darker the color, the more dysmorphic the cells. The detection rates of abnormal neutrophil degranulation, the pseudo-Pelger-Huet anomaly, and giant platelets were significantly higher in MDS samples than in AA samples. However, dysmorphic features of lymphocytes, basophils, eosinophils, and promyelocytes did not assist in differentiating MDS from AA, consistent with the diagnostic features of MDS evident in BM aspirates15.
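SHAP values attribute a model's prediction to its individual input features as Shapley values from cooperative game theory: each feature's contribution is its average marginal effect over all subsets of the other features. The toy sketch below computes exact Shapley values for a hypothetical two-feature "diagnostic score" (the feature names, weights, and synergy term are illustrative, not the study's model):

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values for a small feature set: each feature's
    weighted average marginal contribution to value_fn over all subsets."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value_fn(set(subset) | {f})
                                   - value_fn(set(subset)))
        phi[f] = total
    return phi

# Hypothetical "diagnostic score": degranulation contributes 2, the
# pseudo-Pelger anomaly 1, and together they add a synergy of 1.
def score(present):
    s = 2.0 * ("degranulation" in present) + 1.0 * ("pseudo_pelger" in present)
    if {"degranulation", "pseudo_pelger"} <= present:
        s += 1.0
    return s

phi = shapley_values(["degranulation", "pseudo_pelger"], score)
```

The attributions sum to the full score (2.5 + 1.5 = 4.0), the efficiency property that makes per-case SHAP heat maps additive; in practice, the shap library computes such values efficiently for tree ensembles like XGBoost35.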

Figure 3
figure 3

A heat map indicating the extent of blood cell dysmorphic features. Each row indicates the dysmorphic features of cells of a single case (MDS cases 1–26 and AA cases 1–11). SHAP values were calculated and used to generate the heat map. GN, neutrophils; LY, lymphocytes; EO, eosinophils; BA, basophils; MY, myelocytes; MMY, metamyelocytes; EB, erythroblasts; BL, blasts; PMY, promyelocytes; PLT, large platelets; VLY, variant lymphocytes; MK, megakaryocytes.

The sensitivity and specificity of the DLS performance in terms of the differential diagnosis of MDS and AA were 96.2 and 100%, respectively. The AUC of the ROC curve was 0.990 (Fig. 4).

Figure 4
figure 4

Differential diagnostic performance of the DLS. The AUC was used to measure performance; the maximum value is 1.

Discussion

We developed a novel MDS diagnostic support system using PB smears. The system featured a CNN-based image-recognition DLS and a decision-making algorithm based on extreme gradient boosting (EGB), XGBoost.

Conventional computer-based image-recognition systems rely on separate algorithms for preprocessing, segmentation, feature extraction, and classification, mimicking how images are recognized by the human eye. These systems use many mathematical algorithms: (1) histogram equalization, Gaussian filtering, or median filtering for preprocessing; (2) K-means clustering or fuzzy C-means for segmentation; (3) geometrical or shape features for feature extraction; and (4) support vector machines (SVMs), artificial neural networks, or random forests (RFs)13 for classification. However, optimization of these parameters is not straightforward because the algorithms vary in format, scaling, and bit size and are difficult to tune, which triggers communication mismatches between them. In contrast, neural networks perform all of these complicated tasks simultaneously, with no explicit need for complex mathematical models. Neural networks consist of stacks of layers, each loosely analogous to a layer of neurons in the brain.
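As an illustration of step (2) above, K-means segmentation amounts to repeatedly assigning each pixel to its nearest cluster center and recomputing the centers. A minimal grayscale sketch (illustrative only, not the study's pipeline):

```python
import numpy as np

def kmeans_segment(image, k=3, iters=20):
    """Minimal Lloyd's K-means on pixel intensities, as used in the
    segmentation step of conventional pipelines (grayscale only)."""
    pixels = np.asarray(image, dtype=float).ravel()
    # Deterministic initialization: spread centers across the intensity range.
    centers = np.quantile(pixels, np.linspace(0, 1, k))
    for _ in range(iters):
        # Assign each pixel to its nearest center, then update the centers.
        labels = np.abs(pixels[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean()
    return labels.reshape(np.asarray(image).shape), centers

# Toy grayscale "smear": background ~0.9, cytoplasm ~0.5, nucleus ~0.1.
img = np.array([[0.9, 0.9, 0.5],
                [0.9, 0.5, 0.1],
                [0.5, 0.1, 0.1]])
labels, centers = kmeans_segment(img, k=3)
```

Even this single step carries a tuned parameter (k): a poor choice merges nucleus and cytoplasm into one segment, and the error propagates to every downstream algorithm, illustrating the tuning burden described above.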

Recently, a deep CNN featuring five convolutional layers was employed to detect ALL cells and to classify them into three morphological subtypes (L1, L2, and L3 of the French-American-British classification), achieving 95–99% sensitivity and specificity10. The performance of this CNN was superior to that of previous studies using mathematical algorithms such as support vector machines, the K-nearest-neighbor approach, and hybrid hierarchical classifiers12,19.

Detection and classification of myeloid malignant cells, including those of MDS, requires the capability to differentiate normal and abnormal morphological features across three hematopoietic lineages (myeloid cells, erythroblasts, and platelets) in PB smears. Therefore, our CNN-based image-recognition DLS featured eight convolutional layers in total, to detect and classify images more complicated than those of ALL. Ultimately, our system recognized over 100 patterns in cell size and cytoplasmic morphology, and achieved >90% sensitivity and specificity in the diagnosis of MDS compared to the human eye. Why not 100%? As shown in the t-SNE plots (Fig. 2), it may be very difficult to differentiate cells that are continuously differentiating within the same lineage. For example, even for the human eye, it is difficult to clearly distinguish band neutrophils from less mature metamyelocytes. Further training, however, may improve the accuracy of the DLS beyond that of the human eye, and with higher reproducibility.

We then created an MDS diagnostic support system featuring the highly trained cell image-recognition system combined with an XGBoost-based decision-making algorithm, which afforded 96% sensitivity and 100% specificity in the differential diagnosis of MDS and AA. These results are consistent with a recently developed automated diagnostic system for dermatological disease based on well-trained CNNs, which demonstrated performance comparable to human diagnosis20.

It is often difficult to distinguish the hypoplastic form of MDS (hMDS) from AA because both present with hypocellular BM. However, the risk of progression to acute leukemia is greater in hMDS, making the differential diagnosis important21. Although BM aspiration and biopsy are essential for a definite diagnosis, quantitative assessment of peripheral blood polymorphs, including dysplastic features of granulocytes, has been reported to be a simple and valuable diagnostic tool in MDS22. Dysmorphic WBCs found in the PB, such as hypogranular neutrophils or pseudo-Pelger-Huet cells, can help differentiate hMDS from AA23,24.

Our work has several limitations: (1) although the accuracy of automated MDS diagnosis exceeded 90%, our system remains adjunctive in nature, because BM examination, clinical information, flow cytometric data, and genetic tests are essential for a definite diagnosis of MDS7; (2) this was a single-center study with a relatively small number of samples, and the training sample patterns may have been incomplete; (3) we tested only one combination of DLS, CNNs, and XGBoost; and (4) infectious diseases were not studied here, yet it is important to distinguish MDS from AA with infection, which can be accompanied by dysmorphic WBCs showing toxic granulation, Döhle bodies, and toxic vacuolation. In addition, other inflammation markers such as CRP are important for diagnosing infectious diseases. Therefore, as a next step, we plan to construct an advanced DLS trained on extended data including serum biochemistry. Training this DLS on a larger number of cases will be indispensable for covering the various morphological changes of blood cells and improving accuracy. We also plan to develop a DLS to analyze images of BM samples.

The morphological approach will remain fundamental at the beginning of the diagnostic algorithm, even as new molecular technologies, including gene mutation and gene expression profiling, are integrated with morphological examination in the future4,5. Our approach might be applicable to the development of new automated diagnostic systems for various hematological disorders.

Materials and Methods

Sample selection

The study was approved by the Juntendo University Hospital Medical Ethics Committee (Tokyo, Japan). As part of the approval, the ethics committee explicitly waived the need for informed consent from individual patients because all samples were de-identified, in line with the Declaration of Helsinki. A total of 3,261 peripheral blood (PB) smears, including 1,165 from patients with hematological disorders, were prepared at Juntendo University Hospital (Tokyo, Japan) from 2017 to 2018. The slides were stained with May-Grünwald-Giemsa using an SP-10 device (a fully automated slide-maker; Sysmex, Kobe, Japan). A total of 703,970 digitalized (preprocessed) cell images were collected with the DI-60 automated digital cell image analyzer (Sysmex). The hematological disorders included MDS (n = 94), myeloproliferative neoplasms (n = 127), acute myeloid leukemia (n = 38), acute lymphoblastic leukemia (ALL, n = 27), malignant lymphoma (n = 324), multiple myeloma (n = 82), and AA (n = 42). Of all images, 695,030 were used to train the CNN-based image-recognition system, and 8,940 were used for validation. To develop an automated diagnostic system for MDS, 75 MDS and 36 AA cases were used for training. The gold standard of this study was diagnosis by hematopathologists in accordance with the latest guidelines17. All diagnoses were confirmed by independent hematopathologists based on clinical information; laboratory, flow cytometric, and genetic data; and BM aspiration and biopsy findings25.

Data preparation

The training datasets were prepared for the recognition of image patterns by the deep learning system (DLS). The datasets were classified into 17 cell types and 97 abnormal morphological features by two laboratory technologists board-certified in hematology and one senior hematopathologist, using the morphological criteria of the Clinical and Laboratory Standards Institute (CLSI) H20-A2 guideline and the 2016 revised WHO classification of myeloid neoplasms and acute leukemia18. After the image patterns had been accumulated using the training datasets, the performance of the DLS was evaluated using validation datasets generated by two laboratory technologists board-certified in hematology and one senior hematopathologist who were different from those who worked on the training datasets. Table 1 summarizes the types and numbers of cell images used for training and testing.

The deep convolutional neural network and training using individual cell images

To classify cells and identify morphological abnormalities simultaneously, we created a DLS-based cell image-recognition system composed of a CNN module that extracts features from preprocessed images and a classification module that analyzes those features and classifies cell images into 17 cell types exhibiting some of 97 abnormal morphological characteristics (cell and nuclear size and shape, and cytoplasmic patterns). Figure 5 shows the overall structure of our image-recognition system. The "feature extraction module" comprises two submodules. The first (upstream) submodule has three consecutive blocks, each containing two parallel pathways consisting of several convolutional network layers. These layer stacks optimize feature extraction from the image data and output parameters to the next block. The second (downstream) submodule has eight consecutive blocks, each of which also contains parallel pathways: one consists of a series of convolutional layers, whereas the other lacks convolutional components and serves as a residual (shortcut) connection, functioning as a buffer to avoid saturation of the system.

Figure 5
figure 5

Schematic of the cell image-recognition system featuring convolutional neural networks. The deep learning system consists of three principal blocks, each of which contains piles of CNNs. The colors indicate different components of the CNNs.

Each layer plays a different role: separable convolution (a specific type of convolutional layer; Conv 2D), batch normalization (BN), and activation (ACT). Separable convolution is a variant of regular convolution in which spatial convolution is performed independently on each channel26. Conv 2D is a key component of neural networks that optimizes the parameters used to extract features and then processes images into "feature maps"27,28. BN normalizes the input data distribution29. ACT follows, using a rectified linear unit (ReLU)30. The first submodule was connected to the second to create feature maps. Conv 2D was bypassed in the second module to avoid unwanted deep-layer saturation; this allows weights to be calculated effectively via back-propagation. The architecture was implemented using Keras31 and TensorFlow32.
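The separable convolution described above can be sketched in plain NumPy: a depthwise stage convolves each channel independently, followed by a 1x1 pointwise stage that mixes channels. This is an illustrative sketch only; the study's implementation uses Keras/TensorFlow layers:

```python
import numpy as np

def depthwise_separable_conv(x, depth_k, point_w):
    """Depthwise-separable convolution: a spatial filter applied to each
    channel independently (depthwise), then a 1x1 convolution mixing
    channels (pointwise). x: (H, W, C); depth_k: (kh, kw, C);
    point_w: (C, C_out). 'Valid' padding, stride 1."""
    H, W, C = x.shape
    kh, kw, _ = depth_k.shape
    oh, ow = H - kh + 1, W - kw + 1
    depthwise = np.zeros((oh, ow, C))
    for c in range(C):  # spatial convolution, one channel at a time
        for i in range(oh):
            for j in range(ow):
                depthwise[i, j, c] = np.sum(
                    x[i:i + kh, j:j + kw, c] * depth_k[:, :, c])
    return depthwise @ point_w  # 1x1 pointwise channel mixing

# Toy input: 4x4 image with 2 channels; 3x3 depthwise kernels; 2->3 pointwise.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4, 2))
dk = rng.normal(size=(3, 3, 2))
pw = rng.normal(size=(2, 3))
y = depthwise_separable_conv(x, dk, pw)
print(y.shape)  # (2, 2, 3)
```

The factorization uses far fewer parameters than a full convolution (kh*kw*C + C*C_out versus kh*kw*C*C_out), which is why architectures built on separable convolutions favor it26.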

Extreme Gradient Boosting (EGB) to create a diagnostic algorithm for MDS

Next, we developed a system differentiating MDS from AA using the cell image features extracted by the CNNs. To this end, we employed XGBoost, which uses a large ensemble of weak predictive models (such as decision trees) to recognize and classify the dysmorphic features/patterns of various blood cells33. XGBoost is one of the fastest and most efficient algorithms for identifying optimal decision-making parameters34. First, we fed XGBoost various cell image parameters, including the 17 cell classifications and 97 dysmorphic features identified by the CNN-based image-recognition algorithm. Then, we trained XGBoost using smears from the 75 MDS and 36 AA patients; XGBoost analyzed and learned the diagnostic cell patterns and dysmorphic features. Next, we used 26 MDS and 11 AA samples to test the system. To determine how XGBoost made diagnostic decisions, the SHAP values for the dysmorphic extents of various cell types were plotted on heat maps35.
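The core idea of gradient boosting, of which XGBoost is an optimized implementation, is to add weak learners that each fit the residual errors of the current ensemble. A minimal squared-error sketch with decision stumps, using toy data (it omits XGBoost's regularization, second-order gradients, and other refinements):

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-feature threshold split (a depth-1 regression tree)."""
    best = None
    for f in range(x.shape[1]):
        for t in np.unique(x[:, f]):
            left = x[:, f] <= t
            if left.all() or (~left).all():
                continue  # degenerate split
            pred = np.where(left, residual[left].mean(), residual[~left].mean())
            sse = ((residual - pred) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, f, t, residual[left].mean(), residual[~left].mean())
    _, f, t, lv, rv = best
    return lambda z, f=f, t=t, lv=lv, rv=rv: np.where(z[:, f] <= t, lv, rv)

def gradient_boost(x, y, rounds=20, lr=0.3):
    """Squared-error gradient boosting: each stump fits the residual
    (the negative gradient) of the current ensemble's predictions."""
    pred = np.zeros(len(y), dtype=float)
    stumps = []
    for _ in range(rounds):
        stump = fit_stump(x, y - pred)  # residual = negative gradient
        pred += lr * stump(x)
        stumps.append(stump)
    return lambda z: sum(lr * s(z) for s in stumps)

# Toy "MDS (1) vs. AA (0)" separation on two dysmorphism rates
# (illustrative feature values, not the study's 114 parameters).
x = np.array([[0.1, 0.0], [0.2, 0.1], [0.8, 0.6], [0.9, 0.7]])
y = np.array([0.0, 0.0, 1.0, 1.0])
model = gradient_boost(x, y)
```

In the study's setting, each row of `x` would hold the 114 image-pattern parameters of one case and `y` the MDS/non-MDS label; per-feature SHAP values then decompose each case's predicted score into the contributions shown in the Fig. 3 heat map.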