An image inpainting-based data augmentation method for improved sclerosed glomerular identification performance with the segmentation model EfficientNetB3-Unet

He, Songping; Zou, Yi; Li, Bin; Peng, Fangyu; Lu, Xia; Guo, Hui; Tan, Xin; Chen, Yanyan

doi:10.1038/s41598-024-51651-1


Article
Open access
Published: 10 January 2024


Songping He¹, Yi Zou², Bin Li¹, Fangyu Peng², Xia Lu³, Hui Guo³, Xin Tan⁴ & Yanyan Chen⁵

Scientific Reports volume 14, Article number: 1033 (2024)


Abstract

The percentage of globally sclerosed glomeruli (percent global glomerulosclerosis) is a key factor in determining the outcome of kidney transplantation. At present, this percentage is typically computed manually by pathologists, which is labour intensive and nonstandardized. With the development of Deep Learning (DL), DL-based segmentation models can better identify and segment normal and sclerosed glomeruli, so that percent global glomerulosclerosis can be quantified more reliably and the discard rate of donor kidneys reduced. We used 51 whole slide images (WSIs) from different institutions that are publicly available on the internet. However, sclerosed glomeruli are far less numerous than normal glomeruli in these WSIs, and this imbalance can reduce the effectiveness of Deep Learning. To improve sclerosed glomerular identification and segmentation, we modified and trained a GAN (generative adversarial network)-based image inpainting model to obtain more synthetic sclerosed glomeruli. In the area of the generated sclerosed glomeruli, our inpainting method achieved an average SSIM (Structural Similarity) of 0.8086 and an average PSNR (Peak Signal-to-Noise Ratio) of 22.8943 dB. Adding synthetic sclerosed glomerular images improved sclerosed glomerular segmentation, and our modified Unet model achieved the best glomerular segmentation Dice on the different test sets.


Introduction

A large number of patients are waiting for donor kidneys, and this demand is still growing¹. Meanwhile, many studies have shown that chronic damage in donor kidney biopsy specimens is closely related to transplant outcomes; consequently, approximately 17–20% of procured donor kidneys are discarded after pathologist evaluation². Additionally, in daily practice, time pressure and the subjective biases of individual pathologists can affect the evaluation of sections, potentially leading to the unnecessary discarding of organs. Given the shortage of donor kidneys, such discards should be minimized.

Among the many indicators considered in kidney transplant evaluation, the percentage of global glomerulosclerosis is regarded as a key entry criterion for kidney transplantation³. Because a biopsy contains many glomeruli, assessing this percentage is very time-consuming and poorly reproducible among pathologists; it also demands considerable expertise and is prone to human error. Therefore, automatic image processing methods that can accurately detect and classify glomeruli are needed.

Recently, owing to the strong feature extraction ability of Deep Learning, an increasing number of studies have used it to detect or segment objects in pathological images. Convolutional neural networks (CNNs) are particularly widely used in imaging tasks. Unet, a CNN-based architecture introduced by Ronneberger et al.⁴, has proven useful in many tissue image segmentation and classification tasks.

However, the performance of Deep Learning often depends on the quantity and quality of the training data. Medical training images are relatively difficult to obtain because their acquisition involves patient privacy and their annotation requires experts. Moreover, in the publicly available glomerular data, sclerosed glomeruli are far less numerous than normal glomeruli, and this class imbalance makes Deep Learning more difficult.

In our study, we proposed a GAN-based image inpainting framework that generates new sclerosed glomeruli from masks. The generated sclerosed glomeruli follow the diverse shapes of the masks and are consistent with the surrounding contextual information. Furthermore, we improved a Unet-based segmentation network and trained it on the original data combined with the new synthetic data, realizing the automatic segmentation and classification of normal and globally sclerosed glomeruli in digital pathological sections.

Related research

GAN and medical image generation

Since Ian Goodfellow proposed the GAN in 2014, it has become possible to generate realistic images by training a generator and a discriminator against each other in an adversarial game⁵. Because of their powerful data generation capabilities, GANs are increasingly used to generate pathological and medical images for data augmentation. In⁶, a combination of a VAE (variational autoencoder) and StyleGAN was proposed: the VAE encodes an image into a latent code, which serves as the input of StyleGAN to generate realistic cell images. In⁷, a medical image augmentation method, the texture-constrained multichannel progressive generative adversarial network (TMP-GAN), was proposed. In⁸, Lei et al. proposed a lesion attention conditional generative adversarial network (LACGAN) that synthesizes retinal images with realistic lesion details to improve the training of disease detection models. Amirrajab et al. proposed a framework combining image segmentation and synthesis based on mask-conditional GANs to generate high-fidelity and diverse cardiac magnetic resonance (CMR) images⁹. Although various GAN-based frameworks have been applied to medical image generation, their performance on pathological images still needs improvement, and few studies have generated glomerular pathological images to improve segmentation performance.

Deep learning on glomerular identification and classification

In recent years, an increasing number of Deep Learning methods have been proposed to identify and classify glomeruli in digital pathological images, each with its own advantages and drawbacks. Jon N. Marsh et al. used a fully convolutional neural network based on the VGG16 architecture for glomerular segmentation and achieved aggregate Dice coefficients of 0.784 for nonglobally sclerosed glomeruli and 0.600 for globally sclerosed glomeruli¹⁰. Jaime Gallego et al. trained the Unet model on PAS-stained and H&E-stained WSIs. On the PAS-stained WSIs, normal and sclerosed glomeruli were classified with F1-scores of 97.5% and 68.8%, respectively; on the H&E-stained WSIs, F1-scores of 90.8% and 78.1% were achieved¹¹. Gloria Bueno proposed a sequential CNN segmentation-classification strategy (SegNet-AlexNet); this two-step framework achieved 98.16% accuracy in classifying normal and sclerosed glomeruli when trained on 47 PAS-stained WSIs¹². Lei Jiang et al. trained a cascade mask region-based CNN to detect, classify and segment glomeruli into three categories: (i) GN, structurally normal; (ii) global sclerosis; and (iii) glomeruli with other lesions. They achieved F1 scores of 0.839, 0.806 and 0.753, respectively, in the whole-slide image group¹³. Tianyuan Yao et al. developed and released Glo-In-One, an open-source toolkit for holistic glomerular detection, segmentation and lesion characterization¹⁴. Kawazoe et al. developed an automated computational pipeline that detects glomeruli on PAS-stained WSIs and then segments Bowman’s space, the glomerular tuft, and the crescentic and sclerotic regions inside the glomeruli¹⁵. Silva et al. proposed DS-FNet, an end-to-end network that combines the strengths of semantic segmentation and semantic boundary detection networks via an attention-aware mechanism; it showed consistently high performance in one-to-many-stain glomerulus segmentation¹⁶.

Most existing studies have not focused on improving the segmentation of sclerosed glomeruli. However, in fields such as kidney transplantation, the evaluation of sclerosed glomeruli is necessary and meaningful. Therefore, this paper focuses on improving the identification and segmentation of sclerosed glomeruli while addressing the automatic identification and segmentation of glomeruli in general.

Materials

Data source

In our study, we collected 51 WSIs from open sources, described as follows. Thirty-one WSIs generated by the European project AIDPATH (http://aidpath.eu) were chosen; the tissue samples were stained with periodic acid–Schiff (PAS) and scanned at 20× with a Leica Aperio ScanScope CS scanner¹⁷. The remaining 20 WSIs, representing various human kidney pathologies, came from four sources: three independent medical centres and TCGA. The data from the three independent medical centres were collected by¹⁰ and include four H&E-stained slides from the Military Institute of Medicine in Warsaw, Poland, six PAS-stained slides from Hospital Universitario Vall d’Hebron in Barcelona, Spain, and five H&E-stained slides from Cedars-Sinai Medical Center in Los Angeles, USA. Another five H&E-stained slides came from the publicly available TCGA repository¹⁸. All slides were prepared from formalin-fixed paraffin-embedded (FFPE) sections with a thickness of 4 μm. The slide preparation method is described in detail in¹¹.

The 31 WSIs from¹⁷ are organized into two folders. DATASET_A contains the raw data, 31 WSIs in SVS format, which we converted to PNG format. DATASET_B contains 2340 PNG images of single glomeruli (1170 normal and 1170 sclerosed) detected from DATASET_A. Because the repository provides only patches of normal and sclerosed glomeruli, we located each glomerulus in the WSIs with the help of professional pathologists and used QuPath software to label its category and draw its outline. For the 20 WSIs from the three independent medical centres and TCGA, the pathologists in¹¹ first identified 78 ROIs and then delineated 1,184 glomeruli within them. The ROIs were extracted at ×10 magnification, which corresponded to a pixel size of ~10 μm. The URLs from which the data can be obtained are given in the data availability section of this article.

Data processing

We used 25 PAS-stained WSIs from AIDPATH and 20 PAS-stained ROIs from the three independent medical centres and TCGA as training data for glomerular identification and classification; the remaining six WSIs from AIDPATH and 58 ROIs were used as separate test sets to verify the effectiveness of the glomerular identification algorithm. As shown in Fig. 1, we refer to the six PAS-stained WSIs from AIDPATH as Test1, the five PAS-stained ROIs from Zenodo as Test2 and the 53 H&E-stained ROIs from Zenodo as Test3. Because the training data did not include H&E-stained slides, Test3 also measures how well the algorithm transfers to H&E-stained images. Since a single WSI or ROI is too large to train on directly, we performed overlapping cropping with a crop size of 1024 × 1024 and a stride of 512; the test sets were processed in the same way. To reduce training and testing time, all slides were downsampled by a factor of two before cropping. The number of patches obtained from each dataset is shown in Fig. 1. We converted all glomerular contour annotations into pixelwise masks: each WSI corresponds to two binary masks, one for normal glomeruli and one for sclerosed glomeruli, in which black represents the background and white represents the glomeruli of that class. Figure 2 shows the masks of normal and sclerosed glomeruli on a patch.
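The overlapping cropping can be sketched as a computation of patch coordinates. This is a minimal illustration, not the authors' code: the handling of image borders (snapping a last tile to the far edge, padding axes shorter than one tile) is an assumption, because the text only gives the 1024 × 1024 crop size, the stride of 512 and the 2× downsampling.

```python
def tile_origins(length, tile=1024, stride=512):
    """Top-left offsets of overlapping tiles along one image axis.

    Assumption (not stated in the paper): when the regular grid leaves a strip
    uncovered, one extra tile is snapped to the far edge; an axis shorter than
    `tile` yields a single origin and would need padding before cropping.
    """
    if length <= tile:
        return [0]
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] + tile < length:
        origins.append(length - tile)
    return origins


def tile_boxes(height, width, tile=1024, stride=512, downsample=2):
    """(top, left, bottom, right) patch boxes on a slide downsampled by `downsample`."""
    h, w = height // downsample, width // downsample
    return [
        (y, x, y + tile, x + tile)
        for y in tile_origins(h, tile, stride)
        for x in tile_origins(w, tile, stride)
    ]


# A 4096 x 6144 region becomes 2048 x 3072 after 2x downsampling,
# giving a 3 x 5 grid of 1024 x 1024 patches that overlap by 512 pixels.
boxes = tile_boxes(4096, 6144)
print(len(boxes))  # 15
```

With a stride of half the tile size, every interior pixel is seen by up to four patches, which gives the network more context around glomeruli that straddle patch borders.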

Methods

Framework of sclerosed glomerular generation

Cropped sclerosed glomerular masks

To generate sclerosed glomeruli that respect both shape and contextual information, we first created sclerosed glomerular datasets with corresponding masks to train the image inpainting network. Inspired by¹⁹,²⁰, we synthesized these datasets from existing data, as shown in Fig. 3. Notably, our cropping method places each sclerosed glomerulus as close to the centre of the cropped image as possible, which may assist the subsequent training of the generative network.

First, all sclerosed glomerular masks are obtained from the segmentation training set. In Fig. 3, the bottom left portion shows the mask of part of the sclerosed glomeruli in the full section image (top half of Fig. 3). Based on the sclerosed glomerular labels in the masks of the open data source, the minimum enclosing circle of each sclerosed glomerulus was computed, as shown by the red circles in the bottom left of Fig. 3. The centre of this circle was taken as the centre of a 256 × 256 cropping rectangle, shown as the green box in the bottom left section. The position of the rectangle is mapped back to the original section, as shown in the bottom right of Fig. 3, and both the rectangular masks and the corresponding images were cropped. The final training data are shown in Fig. 4.
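The centring step above can be sketched in pure Python: a randomised incremental (Welzl-style) minimum enclosing circle over a glomerulus contour, followed by a 256 × 256 box centred on the circle. This is an illustrative sketch only; in practice a library routine such as OpenCV's `cv2.minEnclosingCircle` would typically be used, and shifting boxes that overhang the image back inside is our assumption, since the text does not say how edge cases were handled.

```python
import math
import random


def _circle_from_two(a, b):
    """Smallest circle through two points: centred on their midpoint."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, math.dist(a, b) / 2)


def _circle_from_three(a, b, c):
    """Circumcircle of three points (falls back to the widest pair if collinear)."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        p, q = max([(a, b), (a, c), (b, c)], key=lambda pq: math.dist(*pq))
        return _circle_from_two(p, q)
    sa, sb, sc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), a))


def _covers(circle, p, eps=1e-9):
    return math.dist(circle[:2], p) <= circle[2] + eps


def min_enclosing_circle(points, seed=0):
    """Minimum enclosing circle (cx, cy, r) of (x, y) contour points."""
    pts = list(points)
    random.Random(seed).shuffle(pts)
    circle = (pts[0][0], pts[0][1], 0.0)
    for i, p in enumerate(pts):
        if _covers(circle, p):
            continue
        circle = (p[0], p[1], 0.0)
        for j in range(i):
            q = pts[j]
            if _covers(circle, q):
                continue
            circle = _circle_from_two(p, q)
            for k in range(j):
                if not _covers(circle, pts[k]):
                    circle = _circle_from_three(p, q, pts[k])
    return circle


def centred_crop_box(cx, cy, img_h, img_w, size=256):
    """(left, top, right, bottom) box centred on (cx, cy), shifted inward if it overhangs."""
    x0 = min(max(int(round(cx)) - size // 2, 0), img_w - size)
    y0 = min(max(int(round(cy)) - size // 2, 0), img_h - size)
    return (x0, y0, x0 + size, y0 + size)


contour = [(120, 80), (180, 90), (200, 150), (150, 210), (100, 160)]  # toy (x, y) contour
cx, cy, r = min_enclosing_circle(contour)
print(centred_crop_box(cx, cy, img_h=1024, img_w=1024))
```

The enclosing circle's centre is less sensitive to irregular, elongated outlines than, for example, the bounding-box centre, which helps keep the whole glomerulus inside the fixed-size crop.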

ROIs are usually synthesized in both the foreground and the background to keep the synthesis fair and unbiased; for example, the authors of²¹,²² adopted this approach. In this article, however, we did not generate both the glomeruli and the adjacent tissue (background), for the following reasons. In the segmentation task, the sclerosed glomerular regions are small compared with the background, so when the network segments and classifies the glomeruli, non-glomerular tissue makes up most of each image. The diversity of the other tissues is therefore already ensured by the existing data. On this basis, we adopted an image inpainting approach to generation, so that the generated glomeruli blend well with the existing adjacent tissue and fewer parameters need to be trained. In future work, the adjacent tissue could also be generated and combined with the generated glomeruli, which may make our model more robust.

Architecture of sclerosed glomerular inpainting network

The training phase of the sclerosed glomerular inpainting network is shown in Fig. 5. It is divided into the following four modules. (1) The sclerosed glomerular mask input module controls the area in which sclerosed glomeruli are generated. (2) The generator module is based on an autoencoder consisting of an encoder and a decoder. (3) The discriminator module determines whether an input image is real or generated and, in turn, drives the training of the generator. (4) The sclerosed glomerular attention loss module combines the loss of the global image with the loss of the sclerosed glomerular foreground itself.

A. Sclerosed glomerular mask input

An image containing only the background, $X_{\text{gap}}$, is obtained by Eq. (1), where $X_{\text{ori}}$ is a cropped image of the sclerosed glomerulus with its background, $X_{\text{mask}}$ is the corresponding mask, and $\odot$ denotes pixelwise multiplication. According to Eq. (2), the network input $X_{\text{input}}$ is obtained by concatenating $X_{\text{gap}}$ and $X_{\text{mask}}$ along the channel dimension, where $\mathrm{merge}(\cdot)$ denotes this concatenation. Passing $X_{\text{input}}$ through the generator produces a sclerosed glomerulus in the vacant region.

$$X_{\text{gap}} = X_{\text{ori}} \odot \left(1 - X_{\text{mask}}\right)$$

(1)

$$X_{\text{input}} = \mathrm{merge}\left(X_{\text{gap}}, X_{\text{mask}}\right)$$

(2)
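Eqs. (1) and (2) translate directly into array operations. In the minimal numpy sketch below, the channels-last layout and the [0, 1] value range are assumptions. A PyTorch implementation would instead concatenate `(N, C, H, W)` tensors along dimension 1.

```python
import numpy as np


def make_inpainting_input(x_ori, x_mask):
    """Build X_gap (Eq. 1) and X_input (Eq. 2).

    x_ori:  (H, W, 3) float image.
    x_mask: (H, W, 1) binary mask, 1 inside the sclerosed glomerulus.
    """
    x_gap = x_ori * (1.0 - x_mask)                       # Eq. (1): blank out the glomerulus
    x_input = np.concatenate([x_gap, x_mask], axis=-1)   # Eq. (2): channel-wise merge
    return x_gap, x_input


rng = np.random.default_rng(0)
x_ori = rng.random((256, 256, 3))
x_mask = np.zeros((256, 256, 1))
x_mask[96:160, 96:160] = 1.0          # toy square "glomerulus"
x_gap, x_input = make_inpainting_input(x_ori, x_mask)
print(x_input.shape)  # (256, 256, 4)
```

Feeding the mask as a fourth channel tells the generator explicitly which zero-valued pixels are holes to fill rather than genuinely dark tissue.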

B. Generator

The generator consists of an encoder, a stack of building blocks, a self-attention block and a decoder, with skip connections between the encoder and the decoder. It takes the 256 × 256 $X_{\text{input}}$ as input. In the encoder, the input first passes through a convolutional layer with a 7 × 7 kernel, batch normalization and a LeakyReLU activation, followed by two 4 × 4 convolutional layers with a stride of 2 that downsample the feature maps.

These layers are followed by eight AOT blocks, all with the same parameter settings to reduce the amount of computation required. The AOT block was proposed in²³, and its architecture is shown in Fig. 6a. AOT blocks adopt a three-step split-transformation-merge strategy²⁴. During the transformation step, each subkernel transforms the input feature $x_1$ with a different dilation rate. Inspired by ResNet, a gated residual connection first calculates the spatially variant gate value $\beta$ from $x_1$ with a standard convolution and a sigmoid operation; the AOT block then aggregates the input feature $x_1$ and the learned residual feature $x_2$ by a weighted sum with $\beta$.

The decoder mirrors the encoder, and two deconvolution layers restore the output to the size of the input image. A self-attention block²⁵ with a 64 × 64 input precedes the first upsampling layer. As shown in Fig. 6b, the self-attention feature maps relate local regions of the image to the whole image, which addresses long-range dependencies. Finally, a tanh function is applied in the output layer.
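The following is a toy, single-channel numpy sketch of the AOT block idea described above: split into branches, transform with dilated 3 × 3 convolutions, merge, and then apply a gated residual. It is a simplified illustration, not the architecture of²³. Each branch produces one channel, a fixed weighted sum stands in for the fusing convolution, normalisation before the sigmoid is omitted, and the dilation rates (1, 2, 4, 8) are an assumption. The gate follows the AOT-GAN convention that $\beta$ weights the residual feature $x_2$.

```python
import numpy as np


def dilated_conv3x3(x, kernel, rate):
    """'Same'-padded 3x3 cross-correlation of a 2-D map with dilation `rate`."""
    h, w = x.shape
    xp = np.pad(x, rate)  # zero padding of `rate` pixels on every side
    out = np.zeros_like(x, dtype=float)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * xp[i * rate:i * rate + h, j * rate:j * rate + w]
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def toy_aot_block(x1, branch_kernels, rates, fuse_weights, gate_kernel, gate_bias=0.0):
    """Single-channel toy AOT block: split-transform-merge plus a gated residual."""
    # split + transform: each branch views x1 with a different dilation rate
    branches = [np.maximum(dilated_conv3x3(x1, k, r), 0.0)
                for k, r in zip(branch_kernels, rates)]
    # merge: fuse the branch outputs into the residual feature x2
    x2 = sum(wt * b for wt, b in zip(fuse_weights, branches))
    # spatially variant gate beta computed from x1 by a convolution and a sigmoid
    beta = sigmoid(dilated_conv3x3(x1, gate_kernel, 1) + gate_bias)
    # weighted sum of the input feature and the learned residual feature
    return x1 * (1.0 - beta) + x2 * beta


rng = np.random.default_rng(0)
x1 = rng.random((64, 64))          # e.g. one 64 x 64 feature map after the encoder
rates = (1, 2, 4, 8)               # assumed dilation rates, one per branch
kernels = [rng.normal(0.0, 0.1, (3, 3)) for _ in rates]
gate_k = rng.normal(0.0, 0.1, (3, 3))
out = toy_aot_block(x1, kernels, rates, [0.25] * 4, gate_k)
print(out.shape)  # (64, 64)
```

Where $\beta$ is close to 0 the block passes $x_1$ through unchanged, so the network can learn, pixel by pixel, whether to keep known context or replace it with the multi-scale residual.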

C. Discriminator

Two discriminators are used in this task: a local discriminator and a global discriminator. The local discriminator judges only the generated sclerosed glomeruli, whereas the global discriminator judges the complete generated image, including the sclerosed glomeruli and the background. For the local discriminator, the non-glomerular region is filled with white so that the local glomerular image has the same size as the whole input image. In this way, the local discriminator can share the network structure of the global discriminator, which reduces the amount of computation. The input size of each discriminator is 256 × 256 pixels. There are six convolutional layers in total, each using a 4 × 4 kernel with a stride of 2 (Convolution + LeakyReLU + Batch normalization) to reduce the size of the feature representations. The numbers of channels in the discriminator are set to 64, 128, 256, 512 and 1. The last layer of both discriminators produces N × N output patches representing classification scores (‘real’ or ‘fake’).
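The patch grid size N follows from the standard convolution output-size formula. The text does not state the padding, so the sketch below assumes a padding of 1 for each 4 × 4, stride-2 layer. Under that assumption, six layers reduce a 256 × 256 input to a 4 × 4 grid of real/fake scores.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def discriminator_sizes(size=256, layers=6, **conv_kwargs):
    """Feature-map side length after each of `layers` identical convolutions."""
    sizes = [size]
    for _ in range(layers):
        sizes.append(conv_out(sizes[-1], **conv_kwargs))
    return sizes


print(discriminator_sizes())           # [256, 128, 64, 32, 16, 8, 4]
print(discriminator_sizes(padding=0))  # [256, 127, 62, 30, 14, 6, 2]
```

Each output score therefore judges a large, overlapping receptive field of the input instead of the whole image at once.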

D. Sclerosed glomerular attention loss module

As shown in Fig. 5, an image loss and a sclerosed glomerular loss are set up to keep the whole image consistent and to give the sclerosed glomeruli clear texture and staining. Because this loss module attends separately to the glomerular region, the generator balances the generation of the glomeruli themselves against the inpainting of the complete image.

The adversarial losses of the global image and of the local sclerosed glomeruli are given in Eqs. (3) and (4), respectively, where $D$ is the global discriminator and $D_l$ is the local discriminator. To reduce the amount of computation required, we set the network parameters of $D$ and $D_l$ to be the same. $X_{\text{rec}}$ is the generated global image, obtained from the generator $G$ as shown in Eq. (5). $R_{ori}$ and $R_{rec}$ are the original and the generated local sclerosed glomeruli, respectively, obtained as shown in Eqs. (6) and (7).

$$\begin{array}{c}{L}_{{\text{advg}}}=E\left[D{\left({X}_{{\text{rec}}}\right)}^{2}\right]+E\left[{\left(1-D\left({X}_{{\text{ori}}}\right)\right)}^{2}\right]\end{array}$$

(3)

$$\begin{array}{c}{L}_{{\text{advl}}}=E\left[{D}_{l}{\left({R}_{{\text{rec}}}\right)}^{2}\right]+E\left[{\left(1-{D}_{l}\left({R}_{{\text{ori}}}\right)\right)}^{2}\right]\end{array}$$

(4)

$$\begin{array}{c}{X}_{{\text{rec}}}=G\left({X}_{{\text{input}}}\right)\end{array}$$

(5)

$$\begin{array}{c}{R}_{ori}= {X}_{{\text{ori}}}\odot {X}_{{\text{mask}}}+\left(1- {X}_{{\text{mask}}}\right)\end{array}$$

(6)

$$\begin{array}{c}{R}_{rec}= {X}_{{\text{rec}}}\odot {X}_{{\text{mask}}}+\left(1- {X}_{{\text{mask}}}\right)\end{array}$$

(7)
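Eqs. (3)–(7) can be illustrated with a short numpy sketch. It builds the white-filled local regions and evaluates the least-squares adversarial loss from arrays of patch scores. The discriminators themselves are not modelled; random arrays stand in for images and for $G(X_{\text{input}})$. Eqs. (3) and (4) as written push $D$ toward 0 on generated images and toward 1 on real ones. The generator-side term is not spelled out in the text, so `lsgan_g_loss` uses the standard LSGAN form as an assumption.

```python
import numpy as np


def local_region(x, x_mask):
    """Eqs. (6)/(7): keep the glomerulus and fill everything else with white.

    1.0 is white whether images are scaled to [0, 1] or to the tanh range [-1, 1].
    x: (H, W, 3) image; x_mask: (H, W, 1) binary mask.
    """
    return x * x_mask + (1.0 - x_mask)


def lsgan_d_loss(d_fake, d_real):
    """Eqs. (3)/(4) as written; E[.] is taken as the mean over N x N patch scores."""
    return float(np.mean(d_fake ** 2) + np.mean((1.0 - d_real) ** 2))


def lsgan_g_loss(d_fake):
    """Standard LSGAN generator term (assumed; not given explicitly in the text)."""
    return float(np.mean((1.0 - d_fake) ** 2))


rng = np.random.default_rng(0)
x_ori = rng.uniform(-1.0, 1.0, (256, 256, 3))
x_rec = rng.uniform(-1.0, 1.0, (256, 256, 3))   # stands in for G(X_input), Eq. (5)
x_mask = np.zeros((256, 256, 1))
x_mask[96:160, 96:160] = 1.0
r_ori, r_rec = local_region(x_ori, x_mask), local_region(x_rec, x_mask)
```

Because $R_{ori}$ and $R_{rec}$ keep the full 256 × 256 size, the same discriminator architecture can score both the global and the local views.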

In image generation, a pixel reconstruction loss ($L_1$) is commonly used to measure the pixel difference between images. Eqs. (8) and (9) give the global and local $L_1$ losses, $L_{1g}$ and $L_{1l}$, respectively.

$$L_{1g} = \left\| X_{\text{ori}} - X_{\text{rec}} \right\|_{1}$$

(8)

$$L_{1l} = \left\| R_{\text{ori}} - R_{\text{rec}} \right\|_{1}$$

(9)

Following the success of generative methods in image style transfer, image features extracted by convolutional networks have been widely used as part of the objective function. We use the perceptual loss and the style loss of the global image, given in Eqs. (10) and (11), respectively.

$$L_{per} = \sum_{i} \frac{\left\| \phi_i\left(X_{\text{ori}}\right) - \phi_i\left(X_{\text{rec}}\right) \right\|_{1}}{N_i}$$

(10)

where $\phi_i$ is the activation map from the $i$-th layer of a pretrained network (e.g., VGG19²⁶) and $N_i$ is the number of elements in $\phi_i$. Similarly, the style loss is defined as the $L_1$ distance between the Gram matrices of the deep features of the inpainted and real images.

$$L_{sty} = \mathbb{E}_i\left[\left\| \phi_i\left(X_{\text{ori}}\right)^{T}\phi_i\left(X_{\text{ori}}\right) - \phi_i\left(X_{\text{rec}}\right)^{T}\phi_i\left(X_{\text{rec}}\right) \right\|_{1}\right]$$

(11)
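Given feature maps $\phi_i$ that have already been extracted, Eqs. (10) and (11) reduce to a few array operations. In the sketch below, feature extraction with a pretrained VGG19 is not shown, and random arrays stand in for the features. The Gram matrix is left unnormalised, exactly as Eq. (11) is written. Many implementations divide it by $C \cdot H \cdot W$, but the text does not specify this.

```python
import numpy as np


def gram(phi):
    """phi^T phi for a (C, H, W) feature map viewed as an (H*W, C) matrix."""
    c = phi.shape[0]
    f = phi.reshape(c, -1)   # (C, H*W)
    return f @ f.T           # (C, C), equal to phi^T phi in Eq. (11)


def perceptual_loss(feats_ori, feats_rec):
    """Eq. (10): sum over layers of the L1 distance divided by N_i."""
    return sum(np.abs(a - b).sum() / a.size for a, b in zip(feats_ori, feats_rec))


def style_loss(feats_ori, feats_rec):
    """Eq. (11): mean over layers of the L1 distance between Gram matrices."""
    return float(np.mean([np.abs(gram(a) - gram(b)).sum()
                          for a, b in zip(feats_ori, feats_rec)]))


rng = np.random.default_rng(0)
# toy stand-ins for phi_i from two layers of a pretrained network
feats_ori = [rng.random((64, 32, 32)), rng.random((128, 16, 16))]
feats_rec = [f + 0.01 for f in feats_ori]
print(perceptual_loss(feats_ori, feats_rec), style_loss(feats_ori, feats_rec))
```

The perceptual term compares features location by location, whereas the Gram matrix discards spatial layout and keeps only channel co-activations. This is why the style term mainly constrains texture and staining.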

The individual loss terms are weighted and summed to obtain the final loss function, as shown in Eq. (12).

$$L_{total} = \lambda_{adv}\left(L_{\text{advg}} + L_{\text{advl}}\right) + \lambda_{1g}L_{1g} + \lambda_{1l}L_{1l} + \lambda_{per}L_{per} + \lambda_{sty}L_{sty}$$

(12)

where $L_{total}$ is the total loss, $\lambda_{adv} = 0.02$, $\lambda_{1} = 1$ (the weight of the $L_1$ terms), $\lambda_{per} = 0.1$ and $\lambda_{sty} = 150$.
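Eqs. (8), (9) and (12) can be combined in a small helper. Applying $\lambda_1 = 1$ to both $L_{1g}$ and $L_{1l}$ is our reading of the text. `l1` follows Eqs. (8)/(9) literally as a sum of absolute differences, whereas common implementations such as PyTorch's `torch.nn.L1Loss` average by default (`reduction='mean'`).

```python
import numpy as np

# Weights reported for Eq. (12); lambda_1 = 1 is applied to both L1 terms here.
WEIGHTS = {"adv": 0.02, "l1g": 1.0, "l1l": 1.0, "per": 0.1, "sty": 150.0}


def l1(a, b):
    """Eqs. (8)/(9) as written: the L1 norm (sum of absolute differences)."""
    return float(np.abs(a - b).sum())


def total_loss(l_advg, l_advl, l_1g, l_1l, l_per, l_sty, w=WEIGHTS):
    """Eq. (12): weighted sum of the adversarial, L1, perceptual and style terms."""
    return (w["adv"] * (l_advg + l_advl) + w["l1g"] * l_1g + w["l1l"] * l_1l
            + w["per"] * l_per + w["sty"] * l_sty)


print(total_loss(1.0, 1.0, 1.0, 1.0, 1.0, 1.0))
```

The large style weight relative to the small adversarial weight means that texture statistics and pixel fidelity dominate the objective, while the adversarial terms mainly sharpen the result.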

Process of synthesizing datasets

As shown in Fig. 7, in the sclerosed glomerular synthesis stage, we used a Deep Convolutional Generative Adversarial Network (DCGAN)²⁷ to generate masks of different shapes and sizes based on the existing masks. Because coloured pixel values are likely to appear in the generated masks, we converted them to greyscale and applied a threshold to eliminate isolated regions whose area was smaller than the threshold. The glomerular contours in the masks were scaled so that contours of different sizes were evenly distributed. Following the pathologist's recommendation, we located potential positions for sclerosed glomeruli and cropped 256 × 256 squares at these positions. As during training, masked images ($X_{\text{gap}\_\text{t}}$) are obtained by pixelwise multiplication, as shown in Eq. (13), from randomly selected generated masks ($X_{\text{gmask}}$) and the cropped images ($X_{\text{ori}\_\text{t}}$). $X_{\text{gmask}}$ and $X_{\text{gap}\_\text{t}}$ are concatenated as the input of the inpainting model. Finally, the generated images were merged into the original cropped areas, yielding a new ROI with several sclerosed glomeruli at different positions.

$$X_{\text{gap}\_\text{t}} = X_{\text{ori}\_\text{t}} \odot \left(1 - X_{\text{gmask}}\right)$$

(13)
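The synthesis stage can be sketched in two steps: cleaning a DCGAN-generated mask, then inpainting and pasting a crop via Eq. (13). In this sketch, the binarisation level and minimum area are hypothetical values because the text reports no thresholds. `inpaint_fn` is a hypothetical stand-in for the trained generator. Pasting only the masked pixels back follows the description of Fig. 9 (generated glomeruli combined with the original background).

```python
from collections import deque

import numpy as np


def clean_generated_mask(rgb_mask, level=0.5, min_area=100):
    """Greyscale + binarise a generated mask, then drop small 4-connected regions.

    rgb_mask: (H, W, 3) floats in [0, 1]; `level` and `min_area` are hypothetical.
    """
    binary = rgb_mask.mean(axis=-1) > level      # channel-average greyscale, then threshold
    keep = np.zeros_like(binary)
    seen = np.zeros_like(binary)
    h, w = binary.shape
    for sy, sx in zip(*np.nonzero(binary)):
        if seen[sy, sx]:
            continue
        # breadth-first search over the connected component containing (sy, sx)
        comp, queue = [], deque([(sy, sx)])
        seen[sy, sx] = True
        while queue:
            y, x = queue.popleft()
            comp.append((y, x))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(comp) >= min_area:                # keep only sufficiently large regions
            ys, xs = zip(*comp)
            keep[list(ys), list(xs)] = True
    return keep.astype(float)


def synthesize(roi, top, left, x_gmask, inpaint_fn):
    """Eq. (13) plus pasting: inpaint one crop of `roi` and merge it back.

    inpaint_fn: hypothetical callable mapping the (S, S, 4) input to an (S, S, 3) image.
    """
    size = x_gmask.shape[0]
    x_ori_t = roi[top:top + size, left:left + size]
    m = x_gmask[..., None]
    x_gap_t = x_ori_t * (1.0 - m)                              # Eq. (13)
    x_rec = inpaint_fn(np.concatenate([x_gap_t, m], axis=-1))
    out = roi.copy()
    # generated pixels inside the mask, original background elsewhere
    out[top:top + size, left:left + size] = x_rec * m + x_ori_t * (1.0 - m)
    return out
```

As a usage example, a cleaned mask can be passed to `synthesize` together with a crop position suggested by a pathologist. Repeating this at several positions yields an ROI containing multiple synthetic sclerosed glomeruli.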

Glomerular segmentation network

For the glomerular image segmentation network, we adopt an encoder-decoder architecture with a Unet decoder. The skip connections in Unet fuse multiscale features from the encoder with upsampled features; connecting shallow and deep features in this way reduces the spatial information loss caused by downsampling. We selected EfficientNet as the encoder backbone. We did not adopt a more advanced transformer structure because its performance relies heavily on pretraining and it requires a large amount of computation: with a comparable number of parameters, its training, computation and prediction times are longer than those of a CNN model. We expect EfficientNet to produce results faster while maintaining accuracy, which is very important for slide evaluation. EfficientNet, proposed in²⁸, scales the depth, width and input resolution of the network jointly, and comes in eight versions, B0–B7. To balance speed and accuracy in network training, we use EfficientNetB3 as the encoder backbone. The input size of the network is 1024 × 1024, and the inputs are normalized before training. The network architecture is shown in Fig. 8.

When training the segmentation model, we used a batch size of 8 and the Adam optimizer. For the learning-rate schedule we adopted CosineAnnealingLR in the PyTorch framework, with a minimum learning rate of 0.00001. The total loss ($L_{seg}$) is a weighted sum of the Binary Cross-Entropy loss ($L_{BCE}$) and the Dice loss ($L_{Dice}$)²⁹, as shown in Eq. (14), where $\lambda = 0.5$. We trained our segmentation models on a single Tesla P100 (16 GB) for 200 epochs, with a training-to-validation ratio of 8:2. For evaluation on the test sets, we selected the model with the highest Dice coefficient on the validation set.

$$L_{seg} = \lambda L_{BCE} + L_{Dice}$$

(14)
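The segmentation objective in Eq. (14) and the cosine-annealing schedule can be sketched as follows. The Dice smoothing constant is an assumption. The maximum learning rate is hypothetical because the initial rate is not reported, and taking `t_max` as the 200 training epochs is also an assumption. `cosine_lr` is the closed form documented for PyTorch's `CosineAnnealingLR`.

```python
import math

import numpy as np


def bce_loss(prob, target, eps=1e-7):
    """Mean binary cross-entropy on predicted probabilities."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


def dice_loss(prob, target, smooth=1.0):
    """Soft Dice loss, 1 - Dice; `smooth` is an assumed stabiliser."""
    inter = np.sum(prob * target)
    return float(1.0 - (2.0 * inter + smooth) / (np.sum(prob) + np.sum(target) + smooth))


def seg_loss(prob, target, lam=0.5):
    """Eq. (14): L_seg = lambda * L_BCE + L_Dice, with lambda = 0.5."""
    return lam * bce_loss(prob, target) + dice_loss(prob, target)


def cosine_lr(epoch, lr_max, lr_min=1e-5, t_max=200):
    """Cosine annealing from lr_max down to lr_min over t_max epochs."""
    return lr_min + (lr_max - lr_min) * (1.0 + math.cos(math.pi * epoch / t_max)) / 2.0


lr_max = 1e-3  # hypothetical initial learning rate
print([round(cosine_lr(e, lr_max), 6) for e in (0, 50, 100, 150, 200)])
```

BCE supplies dense per-pixel gradients, while the Dice term measures overlap at the region level and is therefore less dominated by the abundant background pixels.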

Experiments and results

Details and evaluation methods

Using the sclerosed glomerular images and corresponding masks obtained in the previous section, we trained and tested the inpainting network with a training-to-test ratio of 8:2. We trained the models on a single Tesla P100 (16 GB), using the Adam optimizer with an initial learning rate of 0.0001 and betas of (0.5, 0.999). The model was trained for 100 epochs with a batch size of 16, and the network output size was 256 × 256.

To characterize the model’s glomerular segmentation ability, especially on sclerosed glomeruli, a quantitative evaluation is needed. In segmentation, each pixel in the image belongs to one of two categories, positive or negative. A correctly predicted positive or negative pixel is a true positive (TP) or true negative (TN); an incorrectly predicted one is a false positive (FP) or false negative (FN). Other commonly used metrics derived from these four values are shown in Table 1.

Table 1 Equations of the performance metrics.

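Table 1 is not reproduced in this text, so the sketch below uses the standard definitions of the usual pixel-level metrics and computes them from the four counts. Returning 0 or 1 when a denominator is zero is an assumption. For binary masks, the Dice coefficient and the F1 score take the same value whenever TP + FP + FN > 0.

```python
import numpy as np


def confusion_counts(pred, truth):
    """TP, FP, FN, TN for binary masks (any nonzero value counts as positive)."""
    pred, truth = np.asarray(pred).astype(bool), np.asarray(truth).astype(bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))
    return tp, fp, fn, tn


def _ratio(num, den, empty):
    return num / den if den else empty


def segmentation_metrics(tp, fp, fn, tn):
    """Standard metric definitions from pixel counts."""
    precision = _ratio(tp, tp + fp, 0.0)
    recall = _ratio(tp, tp + fn, 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall, 0.0),
        "dice": _ratio(2 * tp, 2 * tp + fp + fn, 1.0),  # same value as F1 if TP+FP+FN > 0
        "iou": _ratio(tp, tp + fp + fn, 1.0),
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn, 0.0),
    }


print(segmentation_metrics(*confusion_counts([[1, 1], [0, 0]], [[1, 0], [1, 0]])))
```

Accuracy is dominated by the many background pixels, which is why overlap-based scores such as Dice and IoU are the more informative measures for small structures like sclerosed glomeruli.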

Generation results and analysis

To evaluate the quality of the generated sclerosed glomeruli quantitatively, we used the mean absolute error (MAE), peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) to compare the generated glomeruli with the original ones. Table 2 shows the values of the three metrics.

Table 2 Values of inpainting performance.

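MAE and PSNR restricted to the generated area can be computed as in the sketch below, which assumes 8-bit images (a peak value of 255). SSIM involves local windowed statistics and is usually taken from a library, for example scikit-image's `skimage.metrics.structural_similarity`. It is not re-implemented here.

```python
import math

import numpy as np


def masked_mae_psnr(generated, original, mask, max_val=255.0):
    """MAE and PSNR (dB) over the pixels where mask == 1."""
    region = np.asarray(mask).astype(bool)
    diff = generated[region].astype(float) - original[region].astype(float)
    mae = float(np.mean(np.abs(diff)))
    mse = float(np.mean(diff ** 2))
    psnr = math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)
    return mae, psnr


original = np.zeros((8, 8, 3))
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1
generated = original.copy()
generated[2:6, 2:6] = 10.0     # every generated pixel is off by 10 grey levels
print(masked_mae_psnr(generated, original, mask))
```

Restricting the metrics to the mask keeps the unchanged background from inflating the scores, which matches the reporting of SSIM and PSNR "in the area of generated sclerosed glomeruli" in the abstract.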

As shown in Fig. 9, we obtained the synthetic images by combining the cropped glomeruli from the generated images with the background of the original images.

Glomerular segmentation and analysis

Before adding generated sclerosed glomeruli to the training data, we evaluated the effect of traditional data augmentation, namely random flipping and rotation, by comparing glomerular segmentation with and without it on the three test sets. The augmentation is applied online: each input training image has a certain probability of being flipped or rotated. We performed ten experiments, each with a different random split into training and validation sets. Table 3 shows the glomerular segmentation performance of our segmentation model with and without traditional data augmentation; “√” indicates that traditional data augmentation was used.
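A minimal sketch of such online augmentation follows. The key point is that the image and its mask receive exactly the same transform. The flip probability of 0.5 and the restriction to multiples of 90° are assumptions, because the text says only that each image has "a certain chance" of being flipped or rotated.

```python
import numpy as np


def augment(image, mask, rng, p=0.5):
    """Random flips and a random multiple-of-90-degree rotation, applied jointly.

    image: (H, W, C) array; mask: (H, W) array; p is an assumed probability.
    """
    if rng.random() < p:
        image, mask = np.flip(image, axis=1), np.flip(mask, axis=1)  # horizontal flip
    if rng.random() < p:
        image, mask = np.flip(image, axis=0), np.flip(mask, axis=0)  # vertical flip
    k = int(rng.integers(4))                                         # 0, 90, 180 or 270 degrees
    image, mask = np.rot90(image, k, axes=(0, 1)), np.rot90(mask, k, axes=(0, 1))
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)


rng = np.random.default_rng(0)
mask = (rng.random((8, 8)) > 0.5).astype(float)
image = np.stack([mask, 2 * mask, 3 * mask], axis=-1)  # toy image aligned with its mask
aug_image, aug_mask = augment(image, mask, rng)
```

Flips and right-angle rotations are label-preserving for histology patches, since tissue has no canonical orientation. Applying them on the fly gives a different variant of each patch in every epoch without storing extra copies.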

Table 3 Performance of glomerular segmentation based on our segmentation model with and without traditional data augmentation (NG, normal glomeruli; SG, sclerosed glomeruli).


As Table 3 shows, applying traditional data augmentation to the training data improves overall glomerular segmentation on the different test sets to a certain extent, although precision may decline. The experiments also show that the segmentation of normal glomeruli is much better than that of sclerosed glomeruli, which is consistent with most current studies and confirms the need for our data generation. In addition, performance on Test3 is lower than on Test1 and Test2. Because Test3 is H&E-stained whereas the training data are PAS-stained, we conclude that the algorithm's ability to transfer to renal pathological images with different stains needs to be improved.

Therefore, to evaluate the effect of our generated sclerosed glomeruli on image segmentation, we analysed, on top of traditional data augmentation, how different amounts of synthetic data influence the model's ability to identify sclerosed glomeruli, as shown in Table 4.

Table 4 Performance comparison of sclerosed glomerular segmentation based on our segmentation model when adding different amounts of synthetic data.


Table 4 shows that adding different amounts of synthetic data generated by our algorithm greatly improves the segmentation performance for sclerosed glomeruli. We attribute this improved recognition to the generated sclerosed glomeruli having different shapes and sizes and being distributed at different locations. However, adding more synthetic data does not always yield better performance. In our analysis, the reason is that, although the generated shapes are far more diverse, the internal features of the sclerosed glomeruli are still generated from the existing data, so their feature distribution remains consistent with that of the original data.

Additionally, we compared our segmentation model with other classical models to verify the advantages of our method in the task of glomerular segmentation, as shown in Table 5.

Table 5 Performance comparison of glomerular segmentation based on three models, including our proposed method.


We compared our method with other medical semantic segmentation algorithms, Unet and Unet++³⁰, all trained with 100% generated data, and calculated the mean of each metric over the two classes on the different test sets. Our algorithm performs better than the other algorithms on every test set. Its advantage is largest on the H&E-stained test set, which indicates better generalization and transfer ability.
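Averaging "each metric of the two classes" can be sketched per pixel as follows. The label convention (0 = background, 1 = normal, 2 = sclerosed) and the function names are hypothetical. For a binary pixel mask, pixel-wise F1 equals Dice, so the paper's F1 may be computed differently, for example per object. This sketch does not cover that case.

```python
import numpy as np


def per_class_metrics(pred: np.ndarray, target: np.ndarray, class_id: int,
                      eps: float = 1e-7) -> dict:
    """Pixel-wise precision, recall and Dice for one class of a label mask."""
    p = pred == class_id
    t = target == class_id
    tp = np.logical_and(p, t).sum()   # predicted and annotated
    fp = np.logical_and(p, ~t).sum()  # predicted but not annotated
    fn = np.logical_and(~p, t).sum()  # annotated but missed
    # Note: a class absent from both masks scores 0 here; some
    # implementations score it as 1 or skip it instead.
    return {
        "precision": float(tp / (tp + fp + eps)),
        "recall": float(tp / (tp + fn + eps)),
        "dice": float(2 * tp / (2 * tp + fp + fn + eps)),
    }


def mean_over_classes(pred: np.ndarray, target: np.ndarray,
                      class_ids=(1, 2)) -> dict:
    """Unweighted mean of each metric over the foreground classes."""
    per_class = [per_class_metrics(pred, target, c) for c in class_ids]
    return {k: float(np.mean([m[k] for m in per_class])) for k in per_class[0]}
```

Taking the unweighted mean gives the minority sclerosed class the same weight as the normal class. This is why the mean Dice is sensitive to the class-imbalance problem that the synthetic data targets.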

Figure 10 visualizes the output of our final model, with the data annotations in the left column and the model output in the right column. Blue represents normal glomeruli, and red represents sclerosed glomeruli. Our model labels sclerosed glomeruli that were missed in the original annotations, especially at the margins, which demonstrates its identification ability. However, some smaller sclerosed glomeruli are still missed, and this needs further improvement.
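An overlay in the style of Figure 10 (blue for normal, red for sclerosed) can be rendered by alpha-blending a class colour into the RGB image wherever the label mask contains that class. This is a minimal sketch. The label convention (0 = background, 1 = normal, 2 = sclerosed), the `PALETTE` and `overlay` names, and the blending weight are our assumptions, not the authors' plotting code.

```python
import numpy as np

# Hypothetical label convention: 0 = background, 1 = normal, 2 = sclerosed.
PALETTE = {1: (0, 0, 255), 2: (255, 0, 0)}  # RGB: blue = normal, red = sclerosed


def overlay(image: np.ndarray, labels: np.ndarray, alpha: float = 0.4) -> np.ndarray:
    """Blend class colours into an H x W x 3 uint8 image using an H x W label mask."""
    out = image.astype(np.float32)  # astype returns a new array
    for class_id, color in PALETTE.items():
        region = labels == class_id  # boolean H x W selection -> N x 3 pixels
        out[region] = (1 - alpha) * out[region] + alpha * np.array(color, dtype=np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)
```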

Conclusion

In glomerular identification and classification, obtaining large amounts of training data is difficult and costly. The available data are also class-imbalanced, because sclerosed glomeruli are much less numerous than normal glomeruli. We therefore proposed a sclerosed glomerular generation method based on image inpainting. From the existing masks, we generated diverse masks and scaled them to obtain more small sclerosed glomeruli. Using the proposed inpainting-based method and the generated masks, we synthesized multiple images of sclerosed glomeruli. These blend well with the background and have clear, realistic textures. We combined the synthesized data with the original data and passed them into the segmentation network. This network is based on Unet, with EfficientNetB3 as the encoder backbone. Adding synthetic sclerosed glomeruli on top of traditional data augmentation improved sclerosed glomerular identification. Compared with other segmentation models, our EfficientNetB3-Unet achieved the best mean F1 and Dice coefficients over the two classes.
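The conclusion mentions scaling existing masks to obtain more small sclerosed glomeruli. The exact transforms (scale range, placement rules, interpolation) are not restated in this section. The sketch below assumes nearest-neighbour resampling, which keeps the mask binary, and uniform random placement on an empty canvas. Both are our own choices, and the function names are hypothetical.

```python
import numpy as np


def scale_mask_nearest(mask: np.ndarray, scale: float) -> np.ndarray:
    """Resize a 2-D binary mask by `scale` using nearest-neighbour sampling."""
    h, w = mask.shape
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    # Map each output row/column back to a source index (floor), always < h / < w.
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return mask[rows[:, None], cols[None, :]]


def place_mask(small: np.ndarray, canvas_shape: tuple,
               rng: np.random.Generator) -> np.ndarray:
    """Paste `small` at a uniformly random position on an empty canvas."""
    H, W = canvas_shape
    h, w = small.shape
    if h > H or w > W:
        raise ValueError("scaled mask does not fit on the canvas")
    top = int(rng.integers(0, H - h + 1))   # upper bound is exclusive
    left = int(rng.integers(0, W - w + 1))
    canvas = np.zeros(canvas_shape, dtype=small.dtype)
    canvas[top:top + h, left:left + w] = small
    return canvas
```

The resulting canvas marks the region the inpainting network would fill with sclerosed-glomerulus texture. Downscaling before placement is what yields the additional small glomeruli.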

Because our identification algorithm is trained mainly on globally sclerosed and normal glomeruli, its performance on other glomerular classes needs to be improved. The glomeruli we generate are also globally sclerosed. In future work, we can explore controlled generation of glomeruli with different degrees of sclerosis, which may reduce the number of missed glomeruli.

Data availability

The 31 WSIs from AIDPATH are hosted by Mendeley at: https://data.mendeley.com/datasets/k7nvtgn2x6/3, and the 78 ROIs from 21 WSIs are hosted by Zenodo at: https://zenodo.org/record/4299694.

References

1. Tullius, S. G. & Rabb, H. Improving the supply and quality of deceased-donor organs for transplantation. N. Engl. J. Med. 378(20), 1920–1929 (2018).
2. Moeckli, B. et al. Evaluation of donor kidneys prior to transplantation: An update of current and emerging methods. Transpl. Int. 32(5), 459–469 (2019).
3. Stewart, D. E. & Klassen, D. K. Early experience with the new kidney allocation system: A perspective from UNOS. Clin. J. Am. Soc. Nephrol. CJASN 12(12), 2063 (2017).
4. Ronneberger, O., Fischer, P. & Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III 18. 234–241 (Springer, 2015).
5. Goodfellow, I. J. et al. Generative adversarial networks. Proc. Adv. Neural Inf. Process. Syst. 3, 2672–2680 (2014).
6. Liu, K., Shuai, R. & Ma, L. Cells image generation method based on VAE-SGAN. Proc. Comput. Sci. 183, 589–595 (2021).
7. Guan, Q. et al. Medical image augmentation for lesion detection using a texture-constrained multichannel progressive GAN. Comput. Biol. Med. 145, 105444 (2022).
8. Lei, H. et al. LAC-GAN: Lesion attention conditional GAN for ultra-widefield image synthesis. Neural Netw. 158, 89–98 (2023).
9. Amirrajab, S. et al. Label-informed cardiac magnetic resonance image synthesis through conditional generative adversarial networks. Comput. Med. Imaging Graph. 101, 102123 (2022).
10. Marsh, J. N. et al. Deep learning global glomerulosclerosis in transplant kidney frozen sections. IEEE Trans. Med. Imaging 37(12), 2718–2728 (2018).
11. Gallego, J. et al. A U-Net based framework to quantify glomerulosclerosis in digitized PAS and H&E stained human tissues. Comput. Med. Imaging Graph. 89, 101865 (2021).
12. Bueno, G., Fernandez-Carrobles, M. M., Gonzalez-Lopez, L. & Deniz, O. Glomerulosclerosis identification in whole slide images using semantic segmentation. Comput. Methods Programs Biomed. 184, 105273 (2020).
13. Jiang, L., Chen, W., Dong, B., Mei, K., Zhu, C., Liu, J. & Shi, H. A deep learning-based approach for glomeruli instance segmentation from multistained renal biopsy (2021).
14. Yao, T. et al. Glo-In-One: Holistic glomerular detection, segmentation, and lesion characterization with large-scale web image mining. J. Med. Imaging 9(5), 052408 (2022).
15. Kawazoe, Y. et al. Computational pipeline for glomerular segmentation and association of the quantified regions with prognosis of kidney function in IgA nephropathy. Diagnostics 12(12), 2955 (2022).
16. Silva, J. et al. Boundary-aware glomerulus segmentation: Toward one-to-many stain generalization. Comput. Med. Imaging Graph. 100, 102104 (2022).
17. Bueno, G., Gonzalez-Lopez, L., Garcia-Rojo, M., Laurinavicius, A. & Deniz, O. Data for glomeruli characterization in histopathological images. Data Brief 29, 105314 (2020).
18. Weinstein, J. N. et al. The cancer genome atlas pan-cancer analysis project. Nat. Genet. 45(10), 1113–1120 (2013).
19. Shin, Y., Qadir, H. A. & Balasingham, I. Abnormal colon polyp image synthesis using conditional adversarial networks for improved detection performance. IEEE Access 6, 56007–56017 (2018).
20. Qadir, H. A., Balasingham, I. & Shin, Y. Simple U-net based synthetic polyp image generation: Polyp to negative and negative to polyp. Biomed. Signal Process. Control 74, 103491 (2022).
21. Efros, A. A. & Freeman, W. T. Image quilting for texture synthesis and transfer. In Seminal Graphics Papers: Pushing the Boundaries. Vol. 2. 571–576 (2023).
22. Glotsos, D., Kostopoulos, S., Ravazoula, P. & Cavouras, D. Image quilting and wavelet fusion for creation of synthetic microscopy nuclei images. Comput. Methods Programs Biomed. 162, 177–186 (2018).
23. Zeng, Y., Fu, J., Chao, H. & Guo, B. Aggregated contextual transformations for high-resolution image inpainting. IEEE Trans. Vis. Comput. Graph. 29, 3266 (2022).
24. Xie, S., Girshick, R., Dollár, P., Tu, Z. & He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1492–1500 (2017).
25. Zhang, H., Goodfellow, I., Metaxas, D. & Odena, A. Self-attention generative adversarial networks. In International Conference on Machine Learning. 7354–7363 (PMLR, 2019).
26. Sengupta, A., Ye, Y., Wang, R., Liu, C. & Roy, K. Going deeper in spiking neural networks: VGG and residual architectures. Front. Neurosci. 13, 95 (2019).
27. Radford, A., Metz, L. & Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015).
28. Tan, M. & Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning. 6105–6114 (PMLR, 2019).
29. Milletari, F., Navab, N. & Ahmadi, S. A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV). 565–571 (IEEE, 2016).
30. Zhou, Z., Rahman Siddiquee, M. M., Tajbakhsh, N. & Liang, J. UNet++: A nested U-Net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 20, 2018, Proceedings. Vol. 4. 3–11 (Springer, 2018).


Acknowledgements

This research was supported by the National Natural Science Foundation of China in 2021 (Num. 72104085).

Author information

Authors and Affiliations

Digital Manufacturing Equipment National Engineering Research Center, Huazhong University of Science and Technology, Wuhan, China
Songping He & Bin Li
National NC System Engineering Research Center, Huazhong University of Science and Technology, Wuhan, China
Yi Zou & Fangyu Peng
Key Laboratory of Organ Transplantation of Ministry of Education, Institute of Organ Transplantation, Tongji Hospital, Tongji Medical College, National Health Commission and Chinese Academy of Medical Sciences, Huazhong University of Science and Technology, Wuhan, China
Xia Lu & Hui Guo
Wuhan Intelligent Equipment Industrial Institute Co Ltd, Wuhan, China
Xin Tan
Department of Information Management, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
Yanyan Chen

Contributions

H.S. and Y.S. wrote the main manuscript text. Y.Z. prepared all figures. B.L., F.Y., X.L., H.G., X.T. and Y.C. provided guidance on the article. All authors reviewed the manuscript.

Corresponding author

Correspondence to Yanyan Chen.

Ethics declarations

Competing interests

The authors declare no conflict of interest regarding the manuscript entitled ‘An Image Inpainting-based Data Augmentation Method for Improved Sclerosed Glomerular Identification Performance with the Segmentation Model- EfficientNetB3-Unet’.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

He, S., Zou, Y., Li, B. et al. An image inpainting-based data augmentation method for improved sclerosed glomerular identification performance with the segmentation model EfficientNetB3-Unet. Sci Rep 14, 1033 (2024). https://doi.org/10.1038/s41598-024-51651-1


Received: 23 July 2023
Accepted: 08 January 2024
Published: 10 January 2024
DOI: https://doi.org/10.1038/s41598-024-51651-1
