Improved small blob detection in 3D images using jointly constrained deep learning and Hessian analysis

Imaging biomarkers are being rapidly developed for early diagnosis and staging of disease. The development of these biomarkers requires advances in both image acquisition and analysis. Detecting and segmenting objects from images are often the first steps in quantitative measurement of these biomarkers. The challenges of detecting objects in images, particularly small objects known as blobs, include low image resolution, image noise and overlap between the blobs. The Difference of Gaussian (DoG) detector has been used to overcome these challenges in blob detection. However, the DoG detector is susceptible to over-detection and must be refined for robust, reproducible detection in a wide range of medical images. In this research, we propose a joint constraint blob detector from U-Net, a deep learning model, and Hessian analysis, to overcome these problems and identify true blobs from noisy medical images. We evaluate this approach, UH-DoG, using a public 2D fluorescent dataset for cell nucleus detection and a 3D kidney magnetic resonance imaging dataset for glomerulus detection. We then compare this approach to methods in the literature. While comparable to the other four comparing methods on recall, the UH-DoG outperforms them on both precision and F-score.

improved small blob detection in 3D images using jointly constrained deep learning and Hessian analysis Yanzhe Xu 1 , teresa Wu 1* , fei Gao 1 , Jennifer R. charlton 2 & Kevin M. Bennett 3 Imaging biomarkers are being rapidly developed for early diagnosis and staging of disease. The development of these biomarkers requires advances in both image acquisition and analysis. Detecting and segmenting objects from images are often the first steps in quantitative measurement of these biomarkers. The challenges of detecting objects in images, particularly small objects known as blobs, include low image resolution, image noise and overlap between the blobs. The Difference of Gaussian (DoG) detector has been used to overcome these challenges in blob detection. However, the DoG detector is susceptible to over-detection and must be refined for robust, reproducible detection in a wide range of medical images. In this research, we propose a joint constraint blob detector from U-Net, a deep learning model, and Hessian analysis, to overcome these problems and identify true blobs from noisy medical images. We evaluate this approach, UH-DoG, using a public 2D fluorescent dataset for cell nucleus detection and a 3D kidney magnetic resonance imaging dataset for glomerulus detection. We then compare this approach to methods in the literature. While comparable to the other four comparing methods on recall, the UH-DoG outperforms them on both precision and F-score.
There is great interest in tailoring diagnostic and therapeutic tools to individual patients. This concept reflects the growing recognition that there is significant variability between individuals. As therapies focus on molecular targets, diagnostic medical imaging tools must reveal focal pathologies and the effects of therapy in each patient. High-resolution object detection and image segmentation are thus critical to obtaining meaningful data in a heterogeneous image.
In image analysis, detection is used to identify objects such as organs and tumors, and segmentation is used to isolate the objects from an image. While large objects can often be automatically or semi-automatically isolated, small objects (blobs) are difficult to detect and segment. Blobs can range in size and location in images. Examples of blobs include cells or cell nuclei in images from optical microscopy 1 , exudative lesions in images of the retina 2 , breast lesions in ultrasound images 3 , and glomeruli in magnetic resonance (MR) images of the kidney [4][5][6] . Major challenges to detecting these blobs include low image resolution and high image noise. The small blobs are often numerous and can overlap each other. Many approaches have been proposed for blob detection [7][8][9] of which intensity thresholding is among the most common 10 . Intensity thresholding assumes that the blobs have consistently different intensities from the background. Global differences can be addressed with a fixed threshold and local differences can be addressed with an adaptive threshold 11,12 . However, the assumptions required for consistent thresholding are often violated, and thresholding alone can lead to erroneous detection or segmentation. To address this, researchers have proposed multi-step pipelines 13,14 in which thresholding is only the first step. Intensity-based features are then derived using filters for improved detection. One popular class of filters is based on mathematical morphology 15,16 . Operators such as erosion, dilation, opening and closing allow geometrical and topological properties of objects. This approach often begins with selected seed points in the image and iteratively adds connected points to form labeled regions. Mathematical morphology is preferred when the blobs are relatively large in size and small in number. Weaknesses of this approach include the tendency to under-segment and diminished performance in the presence of noise. Under-segmentation occurs when multiple blobs within close proximity are detected as one, resulting in an erroneously low detected number. Another type of filter is based on space transformation. For example, Radial-Symmetry 17 , a point detector for small blobs, uses radially symmetric space as a transformation space to detect radially symmetric blobs. SIFT 18 , SURF 19 and BRISK 20 are region detectors. Each of the region detectors extracts scale invariant features to detect small objects but may suffer from poor performance in optical imaging 21 . Recently, the Laplacian of Gaussian (LoG) detector 22,23 , from scale space theory, has attracted attention in blob detection 8,24 . Similar to the radially symmetric detector, the LoG detector is unreliable in detecting rotationally asymmetric blobs. To solve this, LoG extensions have been proposed, including the Difference of Gaussian (DoG) 18,[25][26][27] and the Generalized Laplacian of Gaussian (gLoG) 21 . While each approach detects small blobs to some extent, non-blob objects are detected as false blob candidates resulting in over-detection. A post-pruning procedure can remove false blob candidates, but results have been inconsistent 28 .
Here we focus on detecting individual glomeruli in MR images of the kidney as a specific blob detection problem. To date, most biomarkers of kidney pathology have come from histology using destructive techniques that estimate glomerular number [29][30][31] . A non-destructive imaging approach to measuring nephron endowment provides a new marker for renal health and susceptibility to kidney disease. Cationic ferritin enhanced MRI (CFE-MRI) enables the detection of glomeruli in animals 32,33 and in human kidneys 5,[32][33][34] . Because each glomerulus is associated with a nephron, CFE-MRI may provide an important imaging marker to detect changes in the number of nephrons and susceptibility to renal and cardiovascular disease 5 . Glomerulus detection by CFE-MRI presents difficulties because glomeruli are small and have a spatial frequency similar to image noise. Zhang et al. developed the Hessian-based Laplacian of Gaussian (HLoG) detector 1 and the Hessian-based Difference of Gaussian (HDoG) detector 28 to automatically detect glomeruli in CFE-MR images. They employed the LoG or DoG to smooth the images, followed by Hessian analysis of each voxel for pre-segmentation. Since LoG and DoG suffer from over-detection, a Variational Bayesian Gaussian Mixture Model (VBGMM) was implemented as a final step. LoG and DoG were the first two detectors applied to MR images of the kidney to identify glomeruli. However, deriving Hessian-based features from each blob candidate is computationally expensive, limiting high-throughput studies. In addition, unsupervised learning using the VBGMM in the post-pruning procedure requires a number of carefully tuned parameters for optimal clustering. Here we propose a new approach, termed UH-DoG, which applies joint constraints from spatial probability maps derived from U-Net, a deep learning model, and Hessian convexity maps derived from Hessian analysis on the DoG detector. The theoretical foundation of Hessian analysis guarantees that pre-segmentation will recognize all true convex blobs and some non-blob convex objects, resulting in a blob superset. Joining probability maps allows us to distinguish true blobs from the superset. The joint-constraint extension of the detector requires no post-pruning and thus is robust, generalizable and computationally efficient.
Within the field of deep learning, the Convolutional Neural Network (CNN) has been successfully implemented in medical imaging applications ranging from object detection and segmentation to classification [35][36][37] . The first generation of CNN models was used to classify images through fully connected layers. Shelhamer et al. 38 first proposed a Fully Convolutional Network (FCN) that transfers the fully connected layers to deconvolutional layers and provides a dense class map with arbitrarily-sized input image. The FCN changes "image-label" mapping to "pixel/voxel-label" mapping for object detection and image segmentation. One limitation of the FCN for medical imaging is the need for large training datasets. A lightly weighted FCN model, the U-Net 39 , employs a modified FCN architecture to require fewer training images but yield precise, fast segmentation. U-Net has been implemented in various medical segmentation tasks such as nucleus, cell, and breast lesion segmentation [40][41][42] , all drawn from limited datasets. The U-Net yields a probability map where each pixel or voxel indicates the likelihood of being within the imaging object. However, based on our previous study 43 , U-Net does not reliably separate glomeruli within close proximity. Therefore, we choose to adopt the probability map as part of UH-DoG in conjunction with Hessian analysis for glomerulus detection from CFE-MR images.
There are three main advantages of the UH-DoG method. First, a global blob likelihood constraint from the U-Net probability map reduces over-detection by DoG. Second, a local convex constraint from the Hessian convexity map reduces under-segmentation. Third, integrating the probability map constraint with the Hessian convexity map eliminates the need for post-pruning. To validate the performance of UH-DoG, four methods were chosen from the literature: HLoG 1 , gLoG 21 , LoG 22 , and Radial-Symmetry 17 . We tested these on dataset of 2D fluorescent images (n = 200) where the locations of blobs were known. UH-DoG outperformed the other four methods in F-score and performed comparably to the other four methods in recall. Next, we compared blob detection of these methods on a 3D kidney MR dataset against the HDoG method. The differences between UH-DoG and HDoG were negligible but the average computation time of UH-DoG was 35% shorter than that of HDoG.

Methods
We propose UH-DoG, a joint constraint-based detector for glomeruli detection. UH-DoG consists of three steps (Fig. 1).
Step 1 is to use the Difference of Gaussian (DoG) to smooth the images, followed by Hessian analysis to identify possible blob candidates based on local convexity. Step 2 is to use a trained U-Net to generate a probability map, which captures the most likely blob locations.
Step 3 is to combine the probability map from Step 2 with blob candidates from Step 1 as joint constraints to identify true blobs. Each step is discussed in detail in the following sections.
Hessian analysis and hessian convexity map. Before implementing Hessian analysis, DoG is used to smooth the images. By employing a convolution operator, DoG can filter image noise and enhance objects at the selected scale 24 . DoG is a fast approximation of the LoG filter to highlight blob structure 4 and is thus computationally efficient 18 .
Let a 3D image be → . f R R : 3 The scale-space representation σ L x y z ( , , ; ) at point (x, y, z), with scale parameter σ, is the convolution of image f(x, y, z) with the Gaussian kernel σ G x y z ( , , ; ): where * is the convolution operator and the Gaussian kernel According to 18 , σ σ σ∇ = .
σ L x y z L x y z ( , , ; ) ( , , ; ) 2 We approximate the partial derivative σ σ L x y z ( , , ; ) by a one-sided difference quotient, the DoG approximation of LoG is: To locate an optimum scale for the blobs, similar to 1 , we add γ-normalization to form the normalized DoG , which is: where γis introduced to automatically determine the optimum scale for the blobs. We set γ to 2 here. For details on tuning γ, refer to 1 . The normalized DoG transformation underlies Hessian-based convexity analysis to detect blobs.
After the image is smoothed by the normalized DoG, for a voxel (x, y, z) in the normalized DoG image σ DoG x y z ( , , ; ) nor at scale σ, the Hessian matrix for this voxel is: . σ * is used to generate the optimum Hessian convexity map σ ⁎ HI x y z ( , , ; ). This map is the local convexity constraint for detecting the convex blob regions. Result is a set of convex objects including all true blobs and some non-blob convex objects.

U-Net and Probability Map.
A classical CNN usually consists of multiple convolutional layers followed by pooling layers, activation layers, and fully connected layers. Convolutional layers learn hierarchical and high-level feature representation. Pooling layers can reduce feature dimensions and capture spatial feature invariance. The final fully-connected layers categorize the images into different groups. Compared to classical CNNs, FCNs www.nature.com/scientificreports www.nature.com/scientificreports/ replace all fully-connected layers with a fully convolutional layer. There are several advantages to this approach. First, the input image size can be arbitrary because all models consist of convolution layers, so output size only depends on input size. Second, the FCN can be trained from whole images without patch sampling, thus the effects of patch-wise training need not be considered. However, FCNs require a large dataset for training. To address this issue, a modified FCN model, the U-Net, was proposed. Interested readers are referred to U-Net 39 for details. The output of U-Net is a probability map in [0, 1].
In U-Net, let the input images be ∈ × × X R I I I 1 2 3 , and the output map be ∈ × × Y R I I I . A binary cross entropy loss function is used in the training process to obtain the output map: where Y k is the true label and Ŷ k is the predicted probability for voxel k. In our U-Net output, we obtain a probability map U x y z ( , , ), where x and y are dimensions of each 2D image slice and z is the slice number. The probability map is the global blob likelihood constraint in our joint constraints operations to detect the most likely blob regions. By setting a probability threshold, most noise is removed. However, some touching blobs could have a higher probability than the threshold in the boundary and might not be split up, resulting in reduced detection known as under-segmentation. Joining the Hessian convexity map with U-Net probability map will address the challenges of small blob detection.

Joint constraint operation for true blob identification. Given a 3D image
, 3 Hessian analysis is applied to render a convexity map σ ⁎ HI x y z ( , , ; ), U-Net is applied to render a probability map U x y z ( , , ). We introduce a joint operator σ =  ⁎ UH x y z HI x y z I x y z ( , , ) ( , , ; ) ( , , ), where I x y z ( , , ) is a binary indicator matrix. Given a probability threshold δ I x y z , ( , , ) b = 1 when δ > U x y z ( , , ) b ; otherwise, I x y z ( , , ) = 0. We define the true blob candidate as a 27-connected voxel 44 , and the blob set is represented as: To illustrate, Fig. 2 shows images of blobs detected during the joint constraint operation of the U-Net probability map and the Hessian convexity map. The blue circle in Fig. 2a shows only one blob. The same blue circle on Fig. 2b, after application of the Hessian convexity map, shows there is one "bigger" blob in the middle and a number of smaller blobs around the boundary of the blue circle. Figure 2c shows the correct outcome, only one blob in the middle. This clearly illustrates the sensitivity of the Hessian matrix to noise. Even though the Hessian analysis guarantees the detection of the convex object, some non-blob convex objects (noise) will also be detected, resulting in over-detection. This noise can be readily filtered by the U-Net probability map (Fig. 2c). We conclude that U-Net may be useful for denoising, which alleviates the over-detection of Hessian analysis.
The red circle in Fig. 2a shows overlapped blobs. They are still overlapping in the U-Net probability map from Fig. 2c. But they are split up in the Hessian convexity map from Fig. 2b. By joining the Hessian convexity map and U-Net probability map with a single global threshold, the overlapped blobs in the red circle are visualized as distinct entities, as shown in Fig. 2d. We conclude Hessian analysis could alleviate the under-segmentation issue from U-Net.
Our proposed UH-DoG integrates the probability map from U-Net and convexity map from Hessian analysis to guarantee robustness to noise and effective blob detection. The detailed steps of UH-DoG are shown in Table 1.

Experiments.
Two experiments were conducted to validate of the performance of our proposed UH-DoG detector. The first experiment validated the UH-DoG on 200 fluorescence, 2D light microscopy images for cell detection 45 . The 2D cell images were of interest because (1) to the best of our knowledge, there are no 3D small blob datasets available for comparison; (2) the blobs from these images are small and each image could be used

Results
Training dataset and data augmentation. We used a public dataset 46 to train our deep learning model, based on optical images of cell nuclei. This dataset has 141 optical microscopy pathology images (2,000 × 2,000 pixels), as shown in Fig. 3a. The 12,000 ground truth annotations are typically done by an expert, which involves delineating object boundaries over 40 hours 46 . Due to the large amount of time and effort required, the annotated nuclei in this dataset only represents a small fraction of the total number of nuclei present in all images. Since we aim to facilitate U-Net to denoise our blobs images based on the ground truth labeled images, as shown in Fig. 3b, we generated Gaussian distributed noise with µ = 0 noise and σ = . 0 01 noise 2 and we added it to the ground truth labeled images, resulting in 141 simulated training images, as shown in Fig. 3c. Data were augmented to increase the in variance and robustness properties of U-Net 39 . We generated the augmented data by a combination of rotation shift, width shift, height shift, shear, zoom, and horizontal flip. Figure 4 illustrates an example fluorescent image (256 × 256 pixels). Since this was a 2D image, our proposed UH-DoG must incorporate a modified 2D DoG because comparison algorithms were from the 2D LoG and its extensions.

Experiment I: Validation experiments using 2D fluorescent images.
To revise the DoG to a 2D version, for 2D images f x y ( , ) with the Gaussian kernel σ G x y ( , ; ), we modified the normalized 3D DoG detector from Eq. 4 in a 2D format: Then the corresponding Hessian matrix were modified from Eq. 5 as follows: The parameter settings for Hessian analysis and DoG were as suggested in 28 . γ is set to 2. σ varies from 0.5 to 3 with step-size 0.5. Δσ is set to 0.001.
We used precision, recall and F-score to evaluate the performance of our proposed algorithm. Precision measures the fraction of retrieved candidates confirmed by the ground-truth. Recall measures the fraction of ground-truth data retrieved. F-score measures overall performance. Since ground truth data were provided in the form of dots (the coordinates of the blob centers), as in the literature 1,28 , a candidate was considered a true positive if its intensity centroid was within a threshold d of the corresponding ground truth dot. Specifically, if the Euclidian distance D ij between dot i and blob candidate j was less than or equal to d, the blob was considered a true positive. To avoid duplicate counting, the number (#) of true positives TP was calculated by Eq. 11. Precision, recall, and F-score are calculated by Eqs. 12, 13, 14 respectively: and find the optimum scale section by.
6. Join the probability map with Hessian convexity map to identify true blobs.  (14) where m is the number of ground-truth and n is the number of blob candidates; d is a thresholding parameter set to a positive value + ∞ (0, ) . If d is small, fewer blob candidates are counted since the distance between the blob candidate centroid and ground-truth should be small. If d is too large, more blob candidates are counted. Here, since local intensity extremes could be anywhere within a small blob with an irregular shape, we set d to the average diameter of the blobs .
Since the results of detection by the complete versions of HLoG, gLoG, Radial-Symmetry and LoG on 200 pathological images are available online 1,17,21 , the results were directly used from these papers for comparison. Figure 5 shows a comparison of UH-DoG to the HLoG, gLoG, LoG and Radial-Symmetry algorithms. While UH-DoG is comparable to HLoG, gLoG and Radial Symmetry algorithms in recall, it significantly outperforms the four algorithms in both precision and F-score ( Table 2). The standard deviation of F-score in UH-DoG was 0.025, compared to 0.0377 with the HLoG method, compared to 0.1436 with the gLoG method, 0.0795 with the Radial-Symmetry method, and 0.0385 with the LoG method. We conclude that UH-DoG provides more accurate and robust detection of blobs in this dataset. In addition, statistical analysis was performed with the results summarized in Table 2. While comparable to the four algorithms on recall, our approach statistically outperformed the others on precision and F-score. Experiment II: Validation experiments using 3D Kidney MRI. In this section, we conducted experiments on CF-labeled glomeruli from a dataset of 3D magnetic resonance images (256 × 256 × 256 voxels) to measure number (N glom ) and apparent size (aV glom ) of glomeruli in diseased kidneys and healthy control kidneys. Acute kidney injury was induced in adult male C57Bl/6 mice using an intraperitoneal injection of folic acid (125 mg). A subset of the group receiving folic acid, the AKI group (n = 4) was euthanized 4 days after the  www.nature.com/scientificreports www.nature.com/scientificreports/ folic acid was administered and the remainder of those that received folate were euthanized 4 weeks later and termed the chronic kidney disease (CKD) group, n = 3. The control groups for AKI (n = 5) and CKD (n = 6) were age-matched adult male C57Bl/6 mice that received intraperitoneal sodium bicarbonate.
For improved detection, we adopted a preprocessing step to segment the medulla from the image because no glomeruli are located there. Based on the segmented kidney image, shown in Fig. 6a, we converted it to a binary mask (Fig. 6b). Then we generated a distance mask, seen in Fig. 6c. With the map showing the distance between each kidney's voxel and the kidney boundary, we set up a distance threshold to remove regions farther from the boundary than this threshold. Figure 6d shows the 2D image slice after removing the medulla.
Then we performed the proposed UH-DoG method to segment the kidney glomeruli in Fig. 6d. The parameter settings are as follows: γ is set to 2. σ varies from 0.5 to 1.8 with step-size 0.1. Δσ is set to 0.001. Example segmentation results are shown in Figs. 7 and 8. The number of glomeruli (N glom ), mean apparent glomerular volume (aV glom ) and median aV glom are reported in Table 3, where the UH-DoG method is compared to the HDoG method. We used the method of calculating apparent glomerular volume from the paper 34 . Similarly, Table 4 summarizes the results from the AKI and control groups.
We performed quality control by visually checking the identified glomeruli in kidney images. Figure 7 shows glomerular identification for CKD and control kidneys. Figure 8 shows glomerular identification for kidneys in the AKI and control groups.
Discussion: computation cost. UH-DoG significantly decreases computation time compared to the HDoG algorithm 28 , as shown in Tables 5 and 6. The training time of U-Net is not included in the estimates of computation time as it is trained beforehand and can be used to test on all images.
Discussion: Clinical translation. The use of imaging biomarkers in humans has increased both for disease early detection and disease severity assessment. Additionally, imaging biomarkers can serve as surrogate endpoints in clinical trials, reducing cost and burden associated with these studies. For example, total kidney volume has recently been accepted as a surrogate marker for disease progression in autosomal dominant polycystic  Table 2) with others. For recall, UH-DoG has significant difference with gLoG and LoG.  www.nature.com/scientificreports www.nature.com/scientificreports/ kidney disease trials 47 . Although the importance of glomerular number has been universally accepted, the detection of glomerular number and size has been limited because the only methodology to obtain these metrics were destructive stereological approaches that could only be performed post mortem. With the advent of CFE-MRI, the need for image analysis tools is paramount. However, it is critical to the success of any imaging biomarker that   www.nature.com/scientificreports www.nature.com/scientificreports/ the marker be accurate and rapidly obtained. This study demonstrates some of the challenges in detecting small objects, such as glomeruli, particularly in the settings of low image resolution, image noise and overlap of objects. It also shows the promise of rapid acquisition where data can be used in a timeframe to influence patient care.   Table 4. Glomerular number (Nglom) and volume (aVglom) for the AKI and control mice kidneys using the proposed UH-DoG method comparing with HDoG method (*aVglom unit mm 3 × 10 −4 ). (2020) 10:326 | https://doi.org/10.1038/s41598-019-57223-y www.nature.com/scientificreports www.nature.com/scientificreports/ Further work is necessary to validate the accuracy of the detection of diseased glomeruli to apply this algorithm to a wider range of renal disease models.

Conclusion
Discovering imaging biomarkers is important to inform disease diagnosis, prognosis, therapy development and treatment assessment. Of particular interest in this research is to identify quantitative glomeruli biomarkers from CFE-MR image. This is a challenging problem because the number of glomeruli is large, the size is small. In addition, the limitation from imaging acquisition such as hardware and variable acquisition parameters often renders the images with less desirable resolution resulting the overlapping glomeruli. In this paper, we demonstrated a new small blob detector by joining the Hessian convexity map and probability map from U-Net. This joint constraint-based approach overcomes under-segmentation by U-Net and over-detection by Hessian analysis. While it was successfully implemented in segmenting the kidney glomeruli, there are still some limitations. First, the assumption that the blobs are convex and similar in size may not be robust for non-convex objects with difference sizes. A future possible improvement is to enhance ability of U-Net to detect both convex and non-convex small objects. Second, the probability map is sensitive to the threshold. We plan to explore the use of thresholding to improve UH-DoG.

Data availability
The datasets generated during and/or analyzed during the current study will be made available upon request.    Table 6. Computation time for AKI and Control kidneys using HDoG and the proposed method with scale = 1 (Intel Xeon 3.6 GHz CPU and 16 GB of memory, NVIDIA TITAN XP and 12 GB of memory).