High-accuracy morphological identification of bone marrow cells using deep learning-based Morphogo system

Accurate identification and classification of bone marrow (BM) nucleated cell morphology are crucial for the diagnosis of hematological diseases. However, the subjective and time-consuming nature of manual identification by pathologists hinders prompt diagnosis and patient treatment. To address this issue, we developed Morphogo, a convolutional neural network-based system for morphological examination. Morphogo was trained using a vast dataset of over 2.8 million BM nucleated cell images. Its performance was evaluated using 508 BM cases that were categorized into five groups based on the degree of morphological abnormalities, comprising a total of 385,207 BM nucleated cells. The results demonstrated Morphogo’s ability to identify over 25 different types of BM nucleated cells, achieving a sensitivity of 80.95%, specificity of 99.48%, positive predictive value of 76.49%, negative predictive value of 99.44%, and an overall accuracy of 99.01%. In most groups, Morphogo cell analysis and pathologists' proofreading showed high intraclass correlation coefficients for granulocytes, erythrocytes, lymphocytes, monocytes, and plasma cells. These findings further validate the practical applicability of the Morphogo system in clinical practice and emphasize its value in assisting pathologists in diagnosing blood disorders.

between acute myeloid leukemia (AML) and healthy cells and to predict the status of the Nucleophosmin 1 (NPM1) mutation, the most prevalent mutation in AML. However, this system requires the manual selection of areas for disease classification as judged by the pathologist, making the results potentially erroneous 20. To detect acute lymphoblastic leukemia (ALL) in microscopic blood images, Atteia et al. used CNN models optimized with the Bayesian optimization technique. On a holdout test set, the best CNN model determined by the Bayesian optimization approach for ALL detection recorded 100% accuracy, specificity, and sensitivity 21.
The Morphogo system we have developed overcomes many of these limitations, enabling efficient and accurate identification and classification of BM nucleated cells. According to our previous research, the Morphogo system integrates digital imaging of BM smears with artificial intelligence-based automatic BM cell differential counting and has shown high accuracy in identifying various cell types, including granulocytic cells, erythroid cells, lymphoid cells, plasma cells, monocytic cells, and even metastatic cancer cells 6,8,22,23. We are committed to further improving the Morphogo system to enhance its performance and clinical value in assisting with the diagnosis of hematologic diseases.

Methods
Sources and classification of samples. This was a retrospective study. A total of 508 BM cases were collected from Kingmed Diagnostics from October 2021 to December 2021. Following the recommendations of pathologists, the BM smears were divided into five groups, denoted G1-G5, based on the extent of pathological and cell morphological changes. The diseases grouped within each category are as follows: G1: relatively normal cases; G2: disorders with quantitative abnormalities primarily affecting mature cells, including anemia, bleeding/thrombosis, myeloproliferative neoplasms (MPN), and chronic myeloid leukemia (CML); G3: disorder follow-up cases; G4: malignant hematological disorders characterized by a substantial proliferation of blasts and immature cells, including acute leukemia (AL) and multiple myeloma (MM); G5: disorders associated with a higher occurrence of abnormal cells, including megaloblastic anemia (MgA), myelodysplastic syndrome (MDS), and chronic lymphoproliferative disease (CLPD). All BM smears were stained using the Wright-Giemsa method, with quality that met the recommendations of the National Guide to Clinical Laboratory Procedures (NGCLP, fourth edition) or the International Council for Standardization in Hematology (ISH) 8. The study was approved by the Ethics Committee of Guangzhou Kingmed Diagnostics Medical Laboratory Center. Detailed information on the enrolled BM cases is listed in Table 1.

System workflow.
The Morphogo system is a CNN-based artificial intelligence (AI) system developed by Hangzhou Zhiwei Information and Technology Ltd. that automatically performs a differential count of BM nucleated cells.
The workflow is as follows: (1) The Morphogo system initiates an automated scan of the BM smear using a 40 × objective lens, capturing a whole slide image (WSI) in the process. This enables the system to count megakaryocytes and identify an appropriate area for cell analysis. (2) Subsequently, the system switches to a 100 × objective lens to capture images of the designated area. Using the CNN, the system identifies BM nucleated cells within this area and performs a differential cell count until a specified number of cells is obtained. (3) Before the cell morphology report is finalized and released, the data are reviewed by a pathologist (Fig. 1).
To evaluate the cell classification performance of the Morphogo system in different hematological diseases, the BM nucleated cells were categorized into 25 categories: proerythroblast, early erythroblast, intermediate erythroblast, late erythroblast, myeloblast, promyelocyte, neutrophilic myelocyte, neutrophilic metamyelocyte, band neutrophil, segmented neutrophil, eosinophilic myelocyte, eosinophilic metamyelocyte, band eosinophil, segmented eosinophil, basophil, monoblast, promonocyte, monocyte, lymphoblast, prolymphocyte, mature lymphocyte, plasmablast, immature plasma cell, plasma cell, and others including smudge cell, histiocyte, and mast cell, according to the WHO classification. Cell classification performance was evaluated in terms of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy 14,24. Accurately identifying individual morphological categories can be challenging, particularly when closely related categories exhibit morphological similarities. Recognizing this uncertainty in the morphological identification of BM nucleated cells, we incorporated the concept of tolerance classes, wherein certain mispredictions by the CNN model are deemed acceptable even if they differ from the precise labels provided by pathologists 25. The tolerance classes are illustrated in Fig. 2, where the light blue color indicates tolerable mix-ups. For example, the confusion between myelocyte and promonocyte falls within the realm of tolerance. Furthermore, we collected the results of pathologists' proofreading of all BM smears and compared them with the output of the Morphogo system, using kappa values as a metric to assess the agreement between the two approaches in disease diagnosis.
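The five performance measures named above can be derived per class from a confusion matrix in a one-vs-rest fashion. The following is a minimal sketch, not the authors' implementation: the function names `per_class_metrics` and `apply_tolerance` are hypothetical, and the way tolerable mix-ups are credited (moved onto the diagonal of the pathologist's class) is an assumption, since the text does not state how tolerance classes enter the calculation. The matrix orientation follows Fig. 2 (rows: Morphogo, columns: pathologists).

```python
import numpy as np

def per_class_metrics(cm):
    """One-vs-rest metrics for each class.

    cm[i, j] = number of cells predicted as class i (rows: system)
    whose reference label is class j (columns: pathologist).
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=1) - tp   # predicted as this class, reference differs
    fn = cm.sum(axis=0) - tp   # reference is this class, prediction differs
    tn = total - tp - fp - fn
    # Classes with no cells yield NaN instead of raising a warning.
    with np.errstate(divide="ignore", invalid="ignore"):
        return {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp),
            "npv": tn / (tn + fn),
            "accuracy": (tp + tn) / total,
        }

def apply_tolerance(cm, tolerant):
    """Assumed tolerance rule: a tolerable mix-up counts as correct.

    tolerant[i, j] = True marks the off-diagonal cell (i, j) as tolerable;
    its count is moved to the diagonal entry of the reference class j.
    """
    cm = np.asarray(cm, dtype=float).copy()
    mask = np.asarray(tolerant, dtype=bool) & ~np.eye(len(cm), dtype=bool)
    moved = np.where(mask, cm, 0.0)
    cm[mask] = 0.0
    cm[np.diag_indices_from(cm)] += moved.sum(axis=0)
    return cm

# Toy 3-class example (illustrative counts, not data from the study).
toy = [[8, 2, 0],
       [1, 9, 0],
       [0, 0, 10]]
strict = per_class_metrics(toy)
tolerance_mask = [[False, True, False],
                  [True, False, False],
                  [False, False, False]]
lenient = per_class_metrics(apply_tolerance(toy, tolerance_mask))
```

Averaging each array over the classes gives the kind of summary value reported for the whole system; whether the study's averages are weighted by class size is not stated, so that choice is left to the reader.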

Establishment of algorithms.
Intelligent algorithms play an important role when Morphogo scans and analyzes BM smears, and several key algorithms are involved. The first is the slide scanning area algorithm, which extracts the slide area to be observed by mimicking the human task-based visual object attention mechanism to determine the 40 × scanning coverage. The second is the auto-focal plane algorithm. The camera rapidly captures more than 100 images with varying sharpness at different object distances, and the Sobel operator is applied to extract the gradient values of the images in different directions. By quantifying image clarity using a dedicated function, the algorithm identifies the clearest regions within each image, and an image fusion algorithm is then employed to merge these regions, ensuring the best clarity for every nucleated cell within the field of view. Then, the 40 × full-slide assembling algorithm extracts scale-invariant features: the key points of the image are identified by the Gaussian differential function and matched based on the RANSAC algorithm, achieving seamless assembly of the images and generating a WSI. Once the WSI is obtained, an area selection algorithm is used to select an optimal area for 100 × cell imaging. In the 100 × cell images, a cell segmentation method based on saturation clustering is employed to accurately separate and locate the nucleated cells for the differential count. Finally, the classification of BM nucleated cells is realized by a deep learning algorithm. This algorithm utilizes expert-labeled cell images and incorporates the morphological characteristics of different cell types. By leveraging the continually updated big data platform, which provides an expanding dataset, the algorithm achieves accurate classification and analysis of BM nucleated cells.
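The auto-focal plane step can be illustrated with a small sketch. It assumes a common Sobel-based clarity function (the mean squared gradient magnitude, often called the Tenengrad measure) and a simple tile-wise fusion; the actual clarity function and fusion method used by Morphogo are not specified in the text, and the names `sobel_sharpness` and `fuse_by_tiles` as well as the tile size are hypothetical.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter3x3(img, kernel):
    """Apply a 3x3 kernel to a 2-D image without padding (valid region only)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return out

def sobel_sharpness(img):
    """Assumed clarity score: mean squared Sobel gradient magnitude."""
    img = np.asarray(img, dtype=float)
    if min(img.shape) < 3:
        return 0.0  # too small to evaluate a 3x3 gradient
    gx = filter3x3(img, SOBEL_X)
    gy = filter3x3(img, SOBEL_Y)
    return float(np.mean(gx ** 2 + gy ** 2))

def sharpest_index(stack):
    """Index of the frame with the highest clarity score in a focal stack."""
    return int(np.argmax([sobel_sharpness(frame) for frame in stack]))

def fuse_by_tiles(stack, tile=16):
    """Build one image by taking each tile from the frame where it is sharpest."""
    stack = np.asarray(stack, dtype=float)
    _, h, w = stack.shape
    fused = np.empty((h, w))
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            tiles = stack[:, y:y + tile, x:x + tile]
            best = int(np.argmax([sobel_sharpness(t) for t in tiles]))
            fused[y:y + tile, x:x + tile] = tiles[best]
    return fused

# Synthetic example: two frames, each in focus in one half only.
pattern = np.tile(((np.arange(16) // 4) % 2).astype(float), (16, 1))
flat = np.full((16, 16), 0.5)
frame_a = flat.copy()
frame_a[:, :8] = pattern[:, :8]
frame_b = flat.copy()
frame_b[:, 8:] = pattern[:, 8:]
fused = fuse_by_tiles([frame_a, frame_b], tile=8)
```

A fixed tile grid is the simplest possible fusion rule; a production system would more plausibly blend regions around each detected cell to avoid visible seams at tile borders.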
Training of algorithm. The Morphogo system, which has been trained on more than 2.8 million BM nucleated cell images, has now been developed and refined to the point where it can automatically scan and analyze BM smears in less than 10 min while detecting more than 35 different types of nucleated cells (Table 2). The training of the algorithm was run on a server equipped with an Intel Core i9-10900X, 4 × 16 GB ADATA DDR4 memory, NVIDIA GeForce RTX 2080 Ti cards, and CUDA version 10.2. The optimal algorithm for cell categorization was obtained after several training sessions. Subsequently, the 385,207 BM cell images in this paper were used as the validation dataset.
To interpret the correlation, the r-value was graded as follows: r less than 0.09, no correlation; 0.1-0.3, weak correlation; 0.3-0.5, moderate correlation; 0.5-1.0, high correlation 26. The relationship between the K value and consistency is as follows: K = 0-0.20, extremely weak consistency; K = 0.21-0.40, weak consistency; K = 0.41-0.60, moderate consistency; K = 0.61-0.80, strong consistency; and K = 0.81-1.0, almost perfect consistency 27. Unless otherwise indicated, all data are displayed as mean and standard deviation (x̅ ± s) and were analyzed by two-tailed Student's t-test. Differences with p < 0.05 were considered statistically significant.
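The kappa statistic and the consistency bands listed above can be sketched as follows. This is an illustrative implementation of unweighted Cohen's kappa for two raters; the text does not state whether a weighted variant was used, and the function names `cohen_kappa` and `interpret_kappa` are hypothetical. Values at or below 0.20, including negative values that the bands do not cover, are grouped into the lowest band as an assumption.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equally long label sequences."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and of equal length")
    # Observed agreement: fraction of cases where both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal frequencies per label.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters always use the same single label
    return (observed - expected) / (1.0 - expected)

def interpret_kappa(k):
    """Map a kappa value onto the consistency bands given in the text."""
    bands = [(0.20, "extremely weak"), (0.40, "weak"),
             (0.60, "moderate"), (0.80, "strong")]
    for upper, label in bands:
        if k <= upper:
            return label
    return "almost perfect"

# Toy example with binary diagnoses (illustrative values, not study data).
system = [1, 1, 0, 0]
pathologist = [1, 1, 0, 1]
kappa = cohen_kappa(system, pathologist)   # observed 0.75, chance 0.5
grade = interpret_kappa(kappa)
```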
Statement. All of the above methods were performed in accordance with the relevant guidelines and regulations.
Ethical approval. This study was approved by the Ethics Committee of Guangzhou Kingmed Diagnostics Medical Laboratory Center. Because discarded samples from routine clinical testing were collected and clinical case information was used, the Ethics Committee of Guangzhou Kingmed Diagnostics Medical Laboratory Center approved the study with an exemption from informed consent for all participants.

Results
Highly accurate classification of BM nucleated cells by Morphogo system. High-resolution digital images of BM nucleated cells from the region of interest (ROI) were acquired using the Morphogo system. These cell images were categorized into 25 categories (Fig. 3). The cell classification results predicted by the Morphogo system and annotated by pathologists are shown in a confusion matrix (Fig. 2). The dataset consisted of 385,207 single-cell images. Rows display the cell classification results from the Morphogo system, and columns show the results from pathologists' proofreading. The dark blue cells on the diagonal give the number of nucleated cells whose classification by the Morphogo system was entirely consistent with pathologists' proofreading. The white cells represent nucleated cells that were classified as different types by the Morphogo system and by pathologists' proofreading. Numbers shown in light blue represent cells that were easily confused, either between different maturation stages within the same lineage or between morphologically related cell types, so their misclassification was considered tolerable.
To evaluate the cell classification performance of the Morphogo system under different pathological conditions, the Morphogo system was applied to patient cases with more than 14 types of hematological diseases. The  3. The average sensitivity of the Morphogo system in the classification of BM nucleated cells was 80.95%. The Morphogo system exhibited a sensitivity of more than 95% in the identification of 9 categories of BM nucleated cells. For specificity, the test sample yielded an average of 99.48% across all classes of BM nucleated cells. The PPV varied greatly among the classes of BM nucleated cells, ranging from 30.45% to 99.69%, with an average value of 76.49%. The Morphogo system showed a PPV of more than 95% for neutrophilic metamyelocytes, band neutrophils, segmented neutrophils, intermediate erythroblasts, monocytes, and others. The average NPV was more than 99%, ranging from 95.43% to 100.00%. The NPVs of eosinophilic metamyelocytes, band eosinophils, and plasmablasts were the highest, at 100.00%. The accuracy of the Morphogo system in the classification of BM nucleated cells ranged from 95.55% to 99.98%, with an average value of 99.01%. Therefore, the results of our study showed that the Morphogo system had high sensitivity, specificity, PPV, NPV, and accuracy in the classification and counting of BM nucleated cells.

The Morphogo system has high application value in the diagnosis of hematological diseases.
To further verify the application value of the Morphogo system in the diagnosis of hematological diseases, the diagnoses made based on the Morphogo system were compared with the pathologists' proofreading. The evaluation was made for each sample group (G1-G5) in terms of the intraclass correlation coefficient (ICC) and its 95% CI. As shown in Table 5, except for the progenitors, the ICC between the two methods was high for granulocytes, erythrocytes, lymphocytes, monocytes, and plasma cells in the G1, G2, G3, and G5 groups (ICC ≥ 0.818, P < 0.01), and slightly lower for G4. Based on these results, the diagnostic results of the Morphogo system should be correct for most hematological diseases.
The Morphogo system automatically records the time it takes to scan BM smears and identify BM cells. The Morphogo system can complete automatic scanning continuously and efficiently, with a success rate of 99.4%. The average time of a single slide scan is 7:46 (min), and most slide scanning times fall within 5-9 min. The Morphogo system takes 7.46 ± 0.002 min/sheet to identify and count BM cells (Table 6). These results suggest that the Morphogo system can assist in the manual diagnosis of hematologic diseases, which greatly saves time.
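For readers who want to reproduce an agreement analysis of the kind summarized in Table 5, the sketch below computes one common form of the ICC from paired cell percentages. The text does not say which ICC form was used; this example assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1), and the function name `icc2_1` is hypothetical. The 95% CI is not computed here.

```python
import numpy as np

def icc2_1(ratings):
    """ICC(2,1) from an (n subjects x k raters) array of measurements.

    Here each row would be one BM smear and the two columns the percentage
    of a given cell lineage from the system and from the pathologist.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((x - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Illustrative values only: the second rater reads one unit higher throughout.
paired = [[1, 2], [2, 3], [3, 4], [4, 5]]
icc_offset = icc2_1(paired)
icc_identical = icc2_1([[10, 10], [20, 20], [35, 35], [50, 50]])
```

Because this form measures absolute agreement, a constant offset between the two raters lowers the coefficient even when their values are perfectly correlated, which is why `icc_offset` is below 1 while `icc_identical` equals 1.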

Discussion
One of the most challenging steps in the diagnostic workup of blood diseases is the morphological classification of BM nucleated cells, and the effectiveness of the classifier determines its utility in blood disorder diagnostics. CNN models, currently the leading classification framework, have shown superior performance compared to manual detection of cellular morphological features 8,25,28 in recognizing and classifying diverse medical images. Our results, obtained using Morphogo, a cell morphology analysis system built on CNN models, indicate that rapid advancements in artificial intelligence will enable automated hematologic disease screening systems to realize their full potential.
To enhance the CNN's ability to discern potential relationships between BM nucleated cells during the learning process, we trained the CNN on the discriminative features of BM nucleated cells using 2.3 million BM cell images.We then tested the trained model on over 0.5 million cell images collected from various hospitals.This extensive database is beyond the reach of most models.The Morphogo system can now identify more than 35 classes of BM nucleated cells, including certain pathological cell types, and a few non-hematopoietic cells.Our results showed that the Morphogo system achieves high sensitivity, specificity, PPV, NPV, and accuracy in the classification and counting of 25 classes of BM nucleated cells.Moreover, the Morphogo system's cell differential results were in substantial agreement with those of pathologists' proofreading.Furthermore, the Morphogo system has the capability to automatically scan, identify and count BM nucleated cells, with an average processing time of 7.46 min.This indicates a substantial potential for the Morphogo system to enhance the efficiency of BM cell morphology analysis.
The study provided pathologists with a potential application of AI in the morphological examination of BM smears. However, as previous research has reported, even experienced pathologists find it challenging to identify small differences between cells with similar morphological characteristics 25. For example, a promonocyte is often misidentified as a monocyte. Both manual counting and smart counting are affected by staining differences, and increasing the training data does not significantly improve accuracy 29. The current Morphogo system cannot accurately distinguish subtle differences between morphologically similar cells. This limitation may explain why the sensitivity and PPV were not satisfactory in the identification of promonocytes. Furthermore, the image quality of BM nucleated cells depends on several factors, such as the quality of BM smear preparation, the pathological condition, and the imaging process 13. These factors can contribute to inaccuracies in BM cell identification. The morphology of blasts in AL (G4) is more uniform, while in MDS (G5) blasts tend to be polymorphic and malformed 13. Consequently, blasts are easier to identify and classify in AL and more difficult to identify in MDS, which might be the cause of the higher misdiagnosis rate in some cases when using the Morphogo system compared to pathologists' manual review. However, due to the large number of BM samples processed daily and the laborious and time-consuming nature of BM cell differential counting, some laboratories only count 100-200 cells in each BM smear. By utilizing the Morphogo system, they can review AI-based cell differential count results on a computer screen, dramatically improving the efficiency of laboratory work. The Morphogo system can analyze a larger number of cells in a shorter time, allowing pathologists to review more cells and avoid missing critical morphological changes, ultimately reducing the misdiagnosis rate. Furthermore, the Morphogo system provides a standardized and digital approach to cell differential counting, enabling more reliable and repeatable assessment of morphology and enhancing the overall quality control of BM morphology assessment. It also facilitates better comparison and communication among technicians and pathologists, ultimately leading to more effective patient care.
This study employed a single-center design, in which all BM smears were prepared in the same laboratory and digitally processed. The performance evaluation of the Morphogo system focused on identifying BM nucleated cells in common hematopathological conditions, and the dataset reasonably reflects the morphological changes of most cell types. However, this study still had some limitations. Firstly, it was limited to 14 common diseases, and the number of cases was insufficient to determine whether the Morphogo system's AI performance would be consistent across all common hematopathological diseases and rare conditions. Secondly, efforts should be made to minimize the impact of staining variations on categorization strategies. Last but not least, we did not specifically collect samples of cells with dysplastic abnormalities during the initial development of the algorithm,

Figure 1. Workflow design for Morphogo analysis and pathologist review.

Figure 2. The summary of cell classification results obtained by Morphogo pre-classification and pathologists' proofreading. The confusion matrix displays the count of cell images within each of the 25 morphological categories of BM nucleated cells. Rows represent the preliminary classifications by the Morphogo system, while columns reflect the pathologists' review. Diagonal entries in the matrix indicate the instances where the Morphogo system's classification aligns with the pathologists' review. Mix-ups that are considered tolerable are highlighted in light blue.

Figure 3. Sample images of BM cells classified by Morphogo.

Figure 4. The correlation analysis of Morphogo pre-classification and pathologists' proofreading. (A)-(Y) show scatter plots with linear regression lines of the percentages of BM cells obtained by paired counting of BM smears from 508 patients.

Table 1. Grouping and basic information of BM smear samples. MPN, myeloproliferative neoplasms; CML, chronic myeloid leukemia; AL, acute leukemia; MM, multiple myeloma; MgA, megaloblastic anemia; MDS, myelodysplastic syndrome; CLPD, chronic lymphoproliferative disease.

Table 2. Classes of BM cells pre-classified by Morphogo.