Fast and accurate automated recognition of the dominant cells from fecal images based on Faster R-CNN

Fecal samples can easily be collected and are representative of a person’s current health state; therefore, the demand for routine fecal examination has increased sharply. However, manual operation may contaminate the samples, and its low efficiency limits the overall examination speed; therefore, automatic analysis is needed. Nevertheless, detection time and accuracy remain major challenges in automatic testing. Here, we introduce a fast and efficient cell-detection algorithm based on Faster R-CNN with a ResNet-152 convolutional neural network backbone. Additionally, a region proposal network and a classification network combined with principal component analysis are proposed for cell location and recognition in microscopic images. Our algorithm achieved a mean average precision of 84% and a detection time of 723 ms per image on 40,560 fecal images. Thus, this approach may provide a solid theoretical basis for real-time detection in routine clinical examinations while accelerating the process to satisfy increasing demand.

Fecal sample collection. In total, 676 positive samples were collected from the Fourth Affiliated Hospital of Nanchang University. These samples were diluted, stirred, allowed to stand and finally sent to a flow cell. To obtain clear sample images, an OLYMPUS CX31 microscope was used as the basic optical structure with a 40× objective lens [numerical aperture (NA): 0.65, working distance: 0.6 mm]. An EXCCD01400KMA CCD camera was used to capture images at 6.45 µm resolution, and a standard halogen lamp was chosen for illumination. Ten to 15 images were collected from each subject in different visual fields. The size of the collected images was 1600 × 1200 pixels. Each image was annotated manually to provide the ground truth: the locations and sizes of red blood cells (RBCs), white blood cells (WBCs), pyocytes (PYOs), and mildews (Mids) were recorded. Only cells with a standard structure were annotated, and defocused objects were not marked, to reduce false detection of impurities. A total of 8785 images containing formed components were collected. Training a model on a small number of images can degrade its test performance; therefore, to reduce the effect of overfitting, data augmentation was performed using random vertical and horizontal flipping and random contrast and saturation adjustments, as sketched below.
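As an illustration, the following is a minimal sketch of such an augmentation step, assuming PIL-format images and box annotations in [xmin, ymin, xmax, ymax] form; the flip probabilities and jitter ranges are assumptions for illustration, not the values used in this study.

```python
import random
from PIL import Image, ImageEnhance

def augment(image, boxes):
    """Randomly flip an annotated image and jitter its contrast/saturation.

    image: PIL.Image; boxes: list of [xmin, ymin, xmax, ymax] annotations.
    Flipping the image requires flipping the box coordinates as well.
    """
    w, h = image.size
    if random.random() < 0.5:  # horizontal flip (probability is an assumption)
        image = image.transpose(Image.FLIP_LEFT_RIGHT)
        boxes = [[w - x2, y1, w - x1, y2] for x1, y1, x2, y2 in boxes]
    if random.random() < 0.5:  # vertical flip
        image = image.transpose(Image.FLIP_TOP_BOTTOM)
        boxes = [[x1, h - y2, x2, h - y1] for x1, y1, x2, y2 in boxes]
    # Random contrast and saturation adjustments; ranges are illustrative.
    image = ImageEnhance.Contrast(image).enhance(random.uniform(0.8, 1.2))
    image = ImageEnhance.Color(image).enhance(random.uniform(0.8, 1.2))
    return image, boxes
```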
Proposal. Four main elements must be identified during routine fecal examination: RBCs, WBCs, PYOs, and Mids. Other components, such as calcium oxalate crystals, starch granules, pollen, plant cells, plant fibers, and food residues, are classified as impurities with less clinical significance. For details, please see Fig. 1a-h.
Faster R-CNN20 consists of three main parts: (1) a feature extraction layer, (2) a region proposal network (RPN), and (3) a classification and regression network; see Fig. 2 for a detailed schematic of the model. The RPN and the classification and regression network share the preceding feature extraction layer, as shown in Fig. 2a. The feature extraction layer is a series of convolutional neural networks composed of convolutional, pooling, and activation layers. From the feature map generated by the feature extraction layer, the RPN generates anchors of different sizes and aspect ratios, which are then used to generate region proposals. The proposal regions generated by the RPN are input into the classification and regression network for type recognition and accurate box regression. Because the scale of the feature map region corresponding to different foreground regions is inconsistent, Fast R-CNN adopts a region of interest (ROI) pooling strategy to unify the dimensions. Although this simplifies the calculation, some features are lost; therefore, we propose PCA dimension reduction to normalize the feature dimensions. The feature extraction layer uses ResNet20, a 152-layer network composed of four residual network blocks; the first three residual network blocks are selected as the feature extractor (see Fig. 2b).
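For illustration, a minimal sketch of such a truncated backbone is shown below, assuming PyTorch/torchvision, in which the ResNet stages layer1 through layer3 correspond to the first three residual blocks; the paper's exact implementation is not specified, so this is one plausible realization.

```python
import torch
import torchvision
from torch import nn

# ResNet-152 truncated after its third residual stage; the resulting
# feature map is shared by the RPN and the detection head.
resnet = torchvision.models.resnet152(weights=None)
feature_extractor = nn.Sequential(
    resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
    resnet.layer1,  # residual block 1
    resnet.layer2,  # residual block 2
    resnet.layer3,  # residual block 3 -> shared feature map
)

with torch.no_grad():
    feats = feature_extractor(torch.zeros(1, 3, 1200, 1600))
print(feats.shape)  # torch.Size([1, 1024, 75, 100]); 1/16 spatial stride
```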
The RPN is used to generate a batch of proposals, similar to the selective search used in R-CNN22 and Fast R-CNN23. The network structure is consistent with the RPN used in Faster R-CNN: a 3 × 3 convolutional layer applied after the feature map layer (conv4b_35) generates a 256-channel output, fusing information both spatially around each feature and across channels. The fused layer is connected to two branches, termed the SoftMax classification head and the box location regression head; for details, see Fig. 3a. In contrast to the RPN in Faster R-CNN, whose anchor dimensions are hand selected, the generated anchors are based on the average size of the foreground targets, which allows the regression network to learn smoothly and predict good locations; for details, see Fig. 3b.
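A minimal sketch of such data-driven anchor generation is given below; the scale and aspect-ratio sets are illustrative assumptions, with only the idea of centering the anchors on the average annotated cell size taken from the text.

```python
import numpy as np

def make_anchors(mean_w, mean_h, scales=(0.5, 1.0, 2.0), ratios=(0.5, 1.0, 2.0)):
    """Generate anchor (width, height) pairs around the average foreground size.

    mean_w / mean_h: average width/height of the annotated cells in the
    training set. scales and ratios are illustrative, not the paper's values.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            # r skews the box toward wider (r > 1) or taller (r < 1) shapes
            w = mean_w * s * np.sqrt(r)
            h = mean_h * s / np.sqrt(r)
            anchors.append((w, h))
    return anchors
```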
In the training process, the RPN module is trained jointly, rather than alternately, with the object recognition network. Since the structure of Faster R-CNN is end to end, both the RPN and the object recognition network can provide feedback to the feature extraction layer. During backpropagation, the loss functions from the RPN and the Fast R-CNN head are combined and calculated together. Moreover, we introduce the PCA strategy into the classification and regression component of Faster R-CNN, which must be trained separately. Starting from the original Faster R-CNN model, denoted M0, we improve the RPN (Sect. 3.1.2) and the ROI pooling strategy; the resulting PCA-based Faster R-CNN is denoted M1. The training process is shown in Fig. 4.
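As an illustration of the PCA step, the sketch below projects flattened per-region features onto their top-k principal components instead of pooling them to a fixed size; the value of k and the data used to fit the projection are assumptions, since the paper does not specify them.

```python
import numpy as np

def pca_reduce(roi_feats, k=256):
    """Project ROI feature vectors onto their top-k principal components.

    roi_feats: (n_rois, d) matrix of flattened per-region features.
    Returns an (n_rois, k) matrix of normalized-dimension features.
    """
    mean = roi_feats.mean(axis=0)
    centered = roi_feats - mean
    # SVD of the centered data yields the principal directions in vt.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]             # (k, d) top-k principal directions
    return centered @ components.T  # (n_rois, k) reduced features
```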

Results
In total, 676 biological samples were obtained from the Fourth Affiliated Hospital of Nanchang University, and 40,560 fecal images were used to develop the detection algorithm based on Faster R-CNN. All images were collected independently from the microscopic imaging system, and the 12 images with the best resolution were selected for each sample. To further validate the algorithm, experienced laboratory experts annotated the cells (see Table 1).
After training, the network was tested. WBCs are marked with blue squares and percentages (Fig. 5a-c), RBCs with green squares and percentages (Fig. 5a), and PYOs with light blue squares and percentages (Fig. 5a). The remaining components, Mids, are marked with gray squares and percentages (Fig. 5b,d); for details, please see Fig. 5.
Average precision (AP) and mean average precision (mAP) were used to evaluate the detection and localization of cells in the microscopic images. Due to the insufficient sample size during training, the detection recognition rate of some classes was low; for RBCs, WBCs and Mids, the detection results reflected the good performance of the model, and the overall mAP value was 84%. Two established classes of methods are used for object detection in images: one based on morphology segmentation or selective search, as used in R-CNN and Fast R-CNN, and one based on a region proposal network, as used in Faster R-CNN and R-FCN (see Table 2). Clearly, the selective search segmentation method used by R-CNN and Fast R-CNN consumed substantial time. With the introduction of PCA after the feature extraction layer, the principal components of the features were retained during the classification and regression process, whereas in Faster R-CNN and R-FCN some features were filtered out by the pooling strategy. These results also indicate that the Faster R-CNN method based on PCA had the highest overall recognition rate.

Table 1. Overview of the dataset. The dataset is divided randomly. As some samples are negative, they contain fewer cells.
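The paper does not state which AP variant was computed; as one common choice, the sketch below implements VOC-style all-point interpolated AP from a class's precision and recall values ordered by decreasing detection confidence.

```python
import numpy as np

def average_precision(recalls, precisions):
    """VOC-style all-point interpolated AP.

    recalls/precisions: arrays ordered by decreasing detection confidence.
    """
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    # Make the precision envelope monotonically decreasing.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the area under the stepwise precision-recall curve.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```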
The large number of impurities in the fecal samples made the background of the images complex, and the formed components in the images were inevitably difficult to recognize. However, our algorithm can effectively distinguish adherent components, a task that morphological and selective search methods cannot accomplish. For instance, when an RBC and a Mid in an image were stuck together, our algorithm could still distinguish the two components (see Fig. 6).

Discussion
In summary, 676 fecal samples and 40,560 microscopic images were prepared for algorithm development. Our algorithm performed well in identifying four kinds of cells and their locations in microscopic images. The algorithm has two major advantages: a short average analysis time per sample and high accuracy. Our algorithm consumes significantly less time than R-CNN and Fast R-CNN, which may be due to the introduction of the RPN. The R-CNN and Fast R-CNN models use selective search to segment foreground objects, which requires considerable running time. In R-CNN, each segmented foreground target is propagated forward individually to extract features22, while Fast R-CNN shares the convolutional layer and can extract features with a single forward pass23. However, no significant difference in detection time was found between Faster R-CNN and R-FCN. R-FCN uses position-sensitive score maps to avoid the fully connected layer and simplify the training parameters; consequently, its time consumption is slightly lower than that of Faster R-CNN. The time consumption of PCA-Faster-R-CNN is slightly higher, mainly because of the introduction of the PCA strategy after feature extraction.
With respect to the AP performance for the four kinds of cells in a single image, the AP of RBCs was the best (0.92), which we attribute to the distinctive characteristics of RBCs and the absence of significant morphological variation among them. The number of RBCs in the collected dataset is large, and data augmentation further improved the training for RBCs.
The AP values of WBCs and Mids were 0.85 and 0.81, respectively. This reduced performance may be caused by the varying appearance of these cells in different views: in different samples, leukocytes may be round or, influenced by osmotic pressure, shaped as irregular ellipses. Similarly, different Mids have different spore numbers, sizes and shapes after budding, so their recognition rate is lower than that of RBCs. Meanwhile, owing to the larger sample size of WBCs, their accuracy is slightly better than that of Mids. Furthermore, the AP of PYOs was 0.78, likely a result of the small sample size and insufficient training. PYOs are usually composed of many WBCs with large irregular shapes. Due to the small sample size, the training model suffered from a certain degree of overfitting.
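This per-class breakdown is consistent with the reported overall performance: taking mAP as the unweighted mean of the four per-class APs gives (0.92 + 0.85 + 0.81 + 0.78) / 4 = 3.36 / 4 = 0.84.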
Notably, our algorithm achieved a better mAP (0.84) than the other methods. The results indicate that PCA plays an important role in feature selection. After introducing PCA into the model, we adopted a training method that does not follow the end-to-end architecture of the original Faster R-CNN. The disadvantage is that the model does not represent imbalanced samples well: for example, the number of PYOs was small, and their AP was relatively low compared with that of the other cell types. The PCA-Faster-R-CNN model can also be applied to the recognition of components in other kinds of microscopic images, such as target detection in leucorrhea, formed component detection in urine, and cell counting in blood.

Conclusion
A deep learning model for cell detection is proposed for locating and identifying objects in microscopy images. The algorithm achieves the highest mAP among the compared methods and can rapidly detect and locate RBCs, WBCs, Mids, and PYOs. The mAP is approximately 84%, and the detection time is 723 ms per image (1600 × 1200 resolution).

Limitation.
Due to the small sample size in the collected dataset, fat globules were not considered in this analysis. When the number of samples belonging to a certain category is small (for example, PYOs), the model can easily suffer from overfitting as training proceeds. Artificially adhering leukocytes could be used to expand the number of PYO samples via data augmentation.