An image inpainting-based data augmentation method for improved sclerosed glomerular identification performance with the segmentation model EfficientNetB3-Unet

The percent global glomerulosclerosis is a key factor in determining the outcome of renal transplant surgery. At present, this rate is typically computed by pathologists, which is labour intensive and nonstandardized. With the development of Deep Learning (DL), DL-based segmentation models can be used to better identify and segment normal and sclerosed glomeruli. On this basis, percent global glomerulosclerosis can be quantified more reliably, which helps to reduce the discard rate of donor kidneys. We used 51 whole slide images (WSIs) from different institutions that are publicly available on the internet. However, the number of sclerosed glomeruli is much smaller than that of normal glomeruli in the different WSIs, which can reduce the effectiveness of Deep Learning. For better sclerosed glomerular identification and segmentation performance, we modified and trained a GAN (generative adversarial network)-based image inpainting model to obtain more synthetic sclerosed glomeruli. Our proposed inpainting method achieved an average SSIM (Structural Similarity) of 0.8086 and an average PSNR (Peak Signal-to-Noise Ratio) of 22.8943 dB in the area of generated sclerosed glomeruli. Adding synthetic sclerosed glomerular images improved sclerosed glomerular segmentation performance, and the modified Unet model achieved the best Dice of glomerular segmentation on the different test sets.


Songping He 1, Yi Zou 2, Bin Li 1, Fangyu Peng 2, Xia Lu 3, Hui Guo 3, Xin Tan 4 & Yanyan Chen 5*
There are a large number of patients in need of kidney transplantation waiting for kidney donors, and this demand is still growing 1. Meanwhile, many studies have shown that chronic damage to donor kidney biopsy specimens is closely related to transplant outcomes, so approximately 17-20% of collected donor kidneys need to be discarded after pathologist evaluation 2. Additionally, in daily practice, often under time pressure, different pathologists will have subjective biases when evaluating sections, potentially resulting in unnecessary discarding of organs. Given the shortage of kidney donors, such discarding needs to be minimized.
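In kidney transplant evaluation, there are many indicators to consider, among which the percent of global glomerulosclerosis is considered to be the entry point for kidney transplantation 3. Due to the large number of glomeruli, the assessment of percent global glomerulosclerosis is very time-consuming and shows poor reproducibility among pathologists. The professional knowledge requirements of pathologists are high, and human error easily occurs.
Recently, due to the strong feature extraction ability of Deep Learning, an increasing number of studies have begun to use it to detect or segment objects in pathological images. In imaging tasks, CNNs in particular are widely used. Unet, introduced by Ronneberger et al. based on CNN, has proven to be useful in many tasks of tissue image segmentation and classification.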

However, the performance of Deep Learning often depends on the quantity and quality of the training data. The acquisition of medical images involves the privacy of patients and requires annotation by experts, so it is relatively difficult to obtain training data for medical images. Meanwhile, in the publicly available data for glomerular studies, the number of sclerosed glomeruli is much smaller than that of normal glomeruli. Class imbalance can bring difficulties to Deep Learning.
In our study, we proposed a GAN-based image inpainting framework to generate more new sclerosed glomeruli from masks. Innovatively, newly generated sclerosed glomeruli were obtained based on the diverse shapes of the masks and the surrounding contextual information. Furthermore, we improved the segmentation network based on Unet and trained the model by combining the original data with new synthetic data. We realized the automatic segmentation and classification of normal and globally sclerosed glomeruli in digital pathological sections.

Related research
GAN and medical image generation
Since Ian Goodfellow proposed GAN in 2014, it has become possible to generate realistic images by designing the game process of the generator and discriminator 5. Because of their powerful data generation capabilities, an increasing number of GANs have been used in the generation of pathological and medical images to perform data augmentation. In 6, the combination of VAE and StyleGAN was proposed. The network generated the hidden code of the image through VAE as the input of StyleGAN to generate realistic cell images. In 7, a medical image augmentation method, namely, a texture-constrained multichannel progressive generative adversarial network (TMP-GAN), was proposed. In 8, Lei et al. proposed a lesion attention conditional generative adversarial network (LACGAN) to synthesize retinal images with realistic lesion details to improve the training of the disease detection model. Amirrajab, S. et al. proposed a novel framework consisting of image segmentation and synthesis based on mask-conditional GANs for generating high-fidelity and diverse Cardiac Magnetic Resonance (CMR) images 9. Although different GAN-based frameworks have been applied in the generation of medical images, their performance in the field of pathological images still needs to be improved, and there are also few studies on the generation of glomerular pathological images to improve segmentation performance.

Deep learning on glomerular identification and classification
In recent years, an increasing number of methods based on Deep Learning have been proposed to realize the identification and classification of glomeruli in digital pathological images. Each of these approaches has its own advantages and drawbacks. Jon N. Marsh et al. used a fully convolutional neural network based on the VGG16 architecture for glomerular segmentation and achieved aggregate Dice coefficients of 0.784 for nonglobally sclerosed glomeruli and 0.600 for globally sclerosed glomeruli 10. Jaime Gallego et al. trained the Unet model on PAS-stained WSIs and H&E-stained WSIs. On the PAS-stained WSIs, normal and sclerosed glomeruli were classified with F1-scores of 97.5% and 68.8%, respectively. On H&E-stained WSIs, F1-scores of 90.8% and 78.1% were achieved 11. Gloria Bueno proposed a sequential CNN segmentation-classification strategy (SegNet-AlexNet), and this two-step framework achieved 98.16% accuracy in classifying normal and sclerosed glomeruli when trained on 47 PAS-stained WSIs 12. Lei Jiang et al. trained a cascade mask region-based CNN architecture to detect, classify, and segment glomeruli into three categories: (i) GN, structurally normal; (ii) global sclerosis; and (iii) glomeruli with other lesions. They achieved F1 scores of 0.839, 0.806, and 0.753, respectively, in the whole-slide image group 13. Tianyuan Yao et al. developed and released a holistic Glo-In-One open-source toolkit to provide holistic glomerular detection, segmentation, and lesion characterization 14. Kawazoe et al. developed an automated computational pipeline for detecting glomeruli on PAS-stained WSIs, followed by segmenting Bowman's space, the glomerular tuft, the crescentic region, and the sclerotic region inside the glomeruli 15. Silva et al. proposed an end-to-end network, named DS-FNet, combining the strengths of semantic segmentation and semantic boundary detection networks via an attention-aware mechanism, and it showed consistently high performance in one-to-many-stain glomerulus segmentation 16.
Most of the existing studies have not focused on improving the performance of sclerosed glomerular segmentation. However, in fields such as kidney transplantation, the evaluation of sclerosed glomeruli is necessary and meaningful. Therefore, this paper focuses on improving the identification and segmentation performance of sclerosed glomeruli while solving the problem of automatic identification and segmentation of glomeruli.

Materials
Data source
In our study, we collected 51 WSIs from open sources, described as follows. Thirty-one WSIs generated by the European project AIDPATH (http://aidpath.eu) were chosen. The tissue samples were stained using periodic acid-Schiff (PAS) and were scanned at 20× with a Leica Aperio ScanScope CS scanner 17. The remaining 20 WSIs representing various human kidney pathologies came from four sources: three independent medical centres and TCGA. The data from the three independent medical centres were collected by 10, including four H&E-stained slides from the Military Institute of Medicine in Warsaw in Poland, six PAS-stained slides from Hospital Universitario Vall d'Hebron, Barcelona in Spain and five H&E-stained slides from Cedars-Sinai Medical Center in Los Angeles in the USA. The other five H&E-stained slides came from the publicly available TCGA repository 18.
www.nature.com/scientificreports/
All slides were prepared from formalin-fixed paraffin-embedded (FFPE) sections with a thickness of 4 μm. The literature 11 specifically describes the method followed for making the slides.
The 31 WSIs from 17 are organized in two folders. DATASET_A contains the raw data: 31 whole slide images (WSIs) in SVS format, which we converted to PNG format. DATASET_B contains 2340 images, each with a single glomerulus (1170 normal glomeruli and 1170 sclerosed glomeruli). All of them are in PNG format and were detected from DATASET_A. As the repository only provided normal and sclerosed glomerular patches, with the help of professional pathologists we had to find the specific locations of these glomeruli in the WSIs and use QuPath software to label their categories and draw their outlines. For the 20 WSIs from the three independent medical centres and TCGA, the pathologists in 11 first identified 78 ROIs and then delineated 1,184 glomeruli within the ROIs. ROIs were extracted at ×10, which corresponded to a pixel size of ~10 μm. In the availability of materials and data section of this article, we provide the URL where the data can be obtained.

Data processing
We used 25 PAS-stained WSIs from AIDPATH and 20 PAS-stained ROIs from the three independent medical centres and TCGA as training data for glomerular identification and classification. The remaining six WSIs from AIDPATH and 58 ROIs were used as separate test sets to verify the effectiveness of the glomerular identification algorithm. As shown in Fig. 1, the six PAS-stained WSIs from AIDPATH form Test1, the five PAS-stained ROIs from Zenodo form Test2, and the 53 H&E-stained ROIs from Zenodo form Test3. Because the training data did not include H&E-stained slides, Test3 also allowed us to test how well the algorithm transfers to H&E-stained images. Since the resolution of a single WSI or ROI was very high, training on whole images directly was impractical, so we performed overlapping cropping on the WSIs and ROIs. The crop size was 1024 × 1024, and the step length was 512. The test sets were processed in the same way as the training sets. The number of patches obtained from the different datasets is shown in Fig. 1. To reduce the training and testing time, we downsampled all slides two times to reduce the image size before cropping. We converted all glomerular contour annotations into pixelwise masks. Specifically, each WSI corresponded to two masks, one for normal glomeruli and one for sclerosed glomeruli, with black representing the background and white representing the glomeruli of the respective class. Figure 2 shows the masks of normal and sclerosed glomeruli on a patch.
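The overlapping cropping with a 1024 × 1024 window and a step length of 512 can be illustrated with a minimal sketch. The function names and the handling of image borders (a final window placed flush with the border) are assumptions made for illustration; the text does not state how borders were treated.

```python
import numpy as np

def crop_origins(length, patch=1024, stride=512):
    """Top-left offsets of overlapping windows along one image axis."""
    if length <= patch:
        return [0]
    origins = list(range(0, length - patch + 1, stride))
    # Assumption: add a last window flush with the border so that no
    # pixels are left uncovered when the size is not a multiple of stride.
    if origins[-1] != length - patch:
        origins.append(length - patch)
    return origins

def overlapping_patches(image, patch=1024, stride=512):
    """Yield ((row, col), patch) pairs covering the whole image."""
    h, w = image.shape[:2]
    for y in crop_origins(h, patch, stride):
        for x in crop_origins(w, patch, stride):
            yield (y, x), image[y:y + patch, x:x + patch]

if __name__ == "__main__":
    slide = np.zeros((2048, 2560), dtype=np.uint8)  # stand-in for a downsampled ROI
    print(len(list(overlapping_patches(slide))))
```

With a step of half the window size, every interior pixel is covered by up to four patches, so a glomerulus cut by one patch border is likely to appear whole in a neighbouring patch.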

Cropped sclerosed glomerular masks
To realize the generation of sclerosed glomeruli considering shapes and contextual information, we must create sclerosed glomerular datasets and corresponding masks to train the image inpainting network. Inspired by 19,20, we synthesized our datasets using existing data. Specifically, the datasets were created as shown in Fig. 3. It is worth noting that our cropping method is designed to place the sclerosed glomeruli in the centre of the cropped image as much as possible, thus potentially assisting in the subsequent training of the generative network.
First, we need to obtain all sclerosed glomerular masks from the segmentation training set. In Fig. 3, the bottom left portion shows the mask of a certain part of sclerosed glomeruli that is shown in the full section image (Fig. 3 top half). Based on the sclerosed glomerular labels provided by the masks of the open data source, the minimum enclosing circle was drawn for each sclerosed glomerulus, as shown by the red circles in the bottom left of Fig. 3. The centre of this circle was taken as the centre of the cropped rectangular picture, whose size is 256 × 256, as shown by the green rectangular box in the bottom left section. The position of the rectangular box is mapped to the position in the original slice, as shown in the bottom right of Fig. 3. The rectangular masks and the corresponding pictures were cropped. The final training data are shown in Fig. 4.
The ROIs are usually synthesized in both the foreground and the background to be fair and unbiased. For example, in 21,22, the authors adopted this idea for synthesis. However, when considering the synthesis of ROIs in this article, we did not generate both the glomeruli and the adjacent tissues (background). The reasons are as follows. In the segmentation task, the area of the sclerosed glomerular regions is relatively small compared to the area of the background. Thus, when the deep learning network segments and classifies the glomeruli in the image, the fraction of other tissue is much higher than the fraction of glomerular regions, so the diversity of the other tissues can be guaranteed. Based on this, we adopted an inpainting-style generation approach so that the generated glomeruli fuse well with the existing adjacent tissue and the number of training parameters is reduced. In future work, the adjacent tissue could also be generated and combined with the generated glomeruli, which may make the model more robust.
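The centred 256 × 256 cropping of a sclerosed glomerulus (Fig. 3) can be sketched as follows. Note that this sketch substitutes the bounding-box centre for the centre of the enclosing circle used in the paper, purely to stay dependency-free; the function name and the clamping of the box to the image borders are likewise assumptions.

```python
import numpy as np

def centred_crop_box(mask, size=256):
    """Return (top, left, bottom, right) of a size x size box centred on the
    foreground of a single-glomerulus binary mask.

    Assumption: the bounding-box centre stands in for the centre of the
    enclosing circle; the box is shifted inwards if it would leave the image.
    """
    ys, xs = np.nonzero(mask)
    cy = (int(ys.min()) + int(ys.max())) // 2
    cx = (int(xs.min()) + int(xs.max())) // 2
    h, w = mask.shape
    top = int(np.clip(cy - size // 2, 0, max(h - size, 0)))
    left = int(np.clip(cx - size // 2, 0, max(w - size, 0)))
    return top, left, top + size, left + size

if __name__ == "__main__":
    m = np.zeros((1000, 1000), dtype=np.uint8)
    m[400:461, 500:541] = 1          # one synthetic glomerulus
    print(centred_crop_box(m))
```

The same box would be applied to both the slide image and its mask so that each training pair stays aligned.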

Architecture of sclerosed glomerular inpainting network
The training phases of the sclerosed glomerular inpainting network are shown in Fig. 5. The network is divided into four modules as follows. (1) The sclerosed glomerular mask input module controls the area of sclerosed glomerular generation. (2) The generator module is mainly based on an AutoEncoder, which consists of an encoder and a decoder. (3) The discriminator module determines whether the input picture is a real picture or a generated picture and, in turn, promotes the training of the generator. (4) The sclerosed glomerular attention loss module includes the global image loss and the loss of the sclerosed glomerular foreground itself.

A. Sclerosed glomerular mask input
A picture with only background, X_gap, is obtained by Eq. (1), where X_ori represents a cropped picture of the sclerosed glomerulus with background, X_mask represents the corresponding mask, and ⊙ represents pixelwise multiplication. According to Eq. (2), we can obtain the network input X_input by merging the image X_gap and the mask X_mask in the channel dimension, where merge(•) is the function realizing dimension concatenation. By passing X_input into the generator, we can realize sclerosed glomerular generation at the vacancy.
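The construction of the generator input can be sketched in a few lines. This is an illustration only: it assumes the mask equals 1 inside the glomerulus, so that the background-only picture is obtained by multiplying with (1 − X_mask), and it assumes a channel-last array layout; neither detail is stated explicitly in the text.

```python
import numpy as np

def build_inpainting_input(x_ori, x_mask):
    """Blank out the glomerulus and append the mask as an extra channel.

    x_ori  : (H, W, 3) float image
    x_mask : (H, W) array, assumed 1 inside the glomerulus and 0 elsewhere
    """
    keep = 1.0 - x_mask[..., None]          # 1 on background, 0 in the hole
    x_gap = x_ori * keep                    # pixelwise multiplication
    x_input = np.concatenate([x_gap, x_mask[..., None]], axis=-1)  # merge(.)
    return x_gap, x_input

if __name__ == "__main__":
    img = np.ones((256, 256, 3))
    msk = np.zeros((256, 256))
    msk[100:150, 100:150] = 1.0
    gap, net_in = build_inpainting_input(img, msk)
    print(net_in.shape)
```

The mask channel tells the generator where content has to be synthesized, while the three image channels carry the surrounding context.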

B. Generator
The generator consists of an encoder, a stack of building blocks, a self-attention block and a decoder. In addition, we use skip connections between the encoder and the decoder. The generator takes the 256 × 256 X_input as the input. In the encoder section, the input first passes through a convolutional layer with a 7 × 7 kernel, batch normalization and a LeakyReLU activation function, followed by two 4 × 4 convolutional layers with a stride of 2 to downsample the image. This is followed by eight AOT blocks, all with the same parameter settings to reduce the amount of computation required. The AOT block was proposed in 23, and the architecture is shown in Fig. 6a. AOT blocks adopt the split-transformation-merge strategy in three steps 24. During the transformation, each subkernel performs a different transformation of the input feature x_1 by using a different dilation rate. Inspired by ResNet, a gated residual connection first calculates the spatially variant gate value β from x_1 by a standard convolution and a sigmoid operation, and then the AOT block aggregates the input feature x_1 and the learned residual feature x_2 by a weighted sum with β. The network structure of the decoding part is consistent with that of the encoding part, and two deconvolution layers are used to make the size of the masked picture the same as the size of the input image. Before the first layer of the upsampling network, there is a self-attention block, proposed in 25, whose input size is 64 × 64. As shown in Fig. 6b, by obtaining the self-attention feature maps, we can explore the relationship between local regions of the picture and the whole to solve the problem of long-distance dependence. Finally, the tanh function is applied in the output layer.
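The gated residual aggregation inside an AOT block can be illustrated numerically. In this sketch, gate_logits is a hypothetical stand-in for the output of the standard convolution applied to x_1, and the choice of which feature is weighted by β and which by 1 − β is an assumption, since the text only states that a weighted sum with β is used.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_residual(x1, x2, gate_logits):
    """Blend the input feature x1 and the residual feature x2 per position.

    beta is spatially variant: one gate value per element of the feature map.
    Assumption: beta weights x1 and (1 - beta) weights x2.
    """
    beta = sigmoid(gate_logits)
    return beta * x1 + (1.0 - beta) * x2

if __name__ == "__main__":
    x1 = np.full((4, 4), 2.0)
    x2 = np.zeros((4, 4))
    print(gated_residual(x1, x2, np.zeros((4, 4))))  # equal blend of x1 and x2
```

Compared with a plain residual sum, the gate lets the block keep the input feature where the context is already reliable and rely on the learned residual inside the hole.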

C. Discriminator
Two discriminators were used in this task, namely, the local discriminator and the global discriminator. The local discriminator only discriminated the generated sclerosed glomeruli, and the global discriminator discriminated the complete generated images, including the sclerosed glomeruli and the background. For the local discriminator, the nonglomerular region is filled in white so that the size of the local glomerular image is consistent with that of the whole input image. In this way, the local discriminator has the same network structure as the global discriminator, which reduces the amount of calculation. The input size of each discriminator is 256 × 256 pixels. There are a total of six convolutional layers, and each convolutional layer uses a 4 × 4 kernel with a stride of 2 (Convolution + LeakyReLU + Batch normalization) to decrease the size of the feature maps.
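The white filling used to prepare the local discriminator input can be sketched as below. The function name is hypothetical, and the white level of 1.0 assumes images scaled so that 1.0 is the maximum intensity (true for both a [0, 1] and a [-1, 1] range).

```python
import numpy as np

def local_discriminator_input(image, mask, fill=1.0):
    """Keep only the glomerulus and paint everything else white.

    image : (H, W, 3) float image (real or generated)
    mask  : (H, W) array, 1 inside the sclerosed glomerulus
    The output has the same size as the input, so the local and global
    discriminators can share one architecture.
    """
    m = mask[..., None].astype(image.dtype)
    return image * m + fill * (1.0 - m)

if __name__ == "__main__":
    img = np.full((256, 256, 3), 0.3)
    msk = np.zeros((256, 256))
    msk[96:160, 96:160] = 1
    print(local_discriminator_input(img, msk).shape)
```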

D. Sclerosed glomerular attention loss module
As shown in Fig. 5, the image loss and the sclerosed glomerular loss are set up to ensure that the whole picture remains consistent and that the sclerosed glomeruli show clear texture and staining. Based on the designed loss module with the nature of attention, the generative network achieved a balance between the generation of the glomeruli themselves and the inpainting of the complete image. The adversarial losses of the global image and the local sclerosed glomeruli are shown in Eqs. (3) and (4), respectively, where D is the global discriminator and D_l is the local discriminator. To reduce the amount of computation required, we set the network parameters of D and D_l to be the same. X_rec is the generated global image, which is obtained by the generator as shown in Eq. (5), where G represents the generator. R_ori and R_rec are the original and generated sclerosed glomerular regions, as shown in Eqs. (6) and (7).
Usually, in the field of image generation, the pixel reconstruction loss (L_1) is used to describe the pixel difference between images. As shown in Eqs. (8) and (9), L_1g and L_1l represent the global L_1 and the local L_1, respectively.
Given the good effect of generative algorithms in the field of image style transformation, the image features extracted by convolutional networks have been widely used as part of the objective function. We use the perceptual loss and the style loss of the global image, which are shown in Eqs. (10) and (11), respectively.
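The distinction between the global and the local L_1 terms can be made concrete with a short sketch. The normalization (mean over all pixels for the global term, mean over the glomerular pixels for the local term) is an assumption for illustration and may differ from the definitions used in Eqs. (8) and (9).

```python
import numpy as np

def l1_losses(x_ori, x_rec, x_mask):
    """Return (global L1, local L1) between an original and a generated image.

    x_ori, x_rec : (H, W, C) float images
    x_mask       : (H, W) array, 1 inside the sclerosed glomerulus
    """
    m = x_mask[..., None]
    l1_global = np.abs(x_rec - x_ori).mean()
    r_ori, r_rec = x_ori * m, x_rec * m            # glomerular regions only
    n_local = max(m.sum() * x_ori.shape[-1], 1)    # masked elements, all channels
    l1_local = np.abs(r_rec - r_ori).sum() / n_local
    return l1_global, l1_local

if __name__ == "__main__":
    a = np.zeros((4, 4, 3))
    b = np.zeros((4, 4, 3))
    b[0, 0] = 1.0                                  # one wrong pixel inside the mask
    k = np.zeros((4, 4))
    k[0:2, 0:2] = 1
    print(l1_losses(a, b, k))
```

Because the glomerulus occupies only part of the image, the same pixel error weighs more in the local term than in the global one, which is what steers the generator towards the foreground.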
Here, φ_i is the activation map from the i-th layer of a pretrained network (e.g., VGG19 26) and N_i is the number of elements in φ_i. Similarly, the style loss is defined as the L1 distance between the Gram matrices of the deep features of the inpainted and real images.
The loss values of each part are weighted and summed to obtain the final loss function, as shown in Eq. (12).
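The perceptual and style terms can be sketched on precomputed feature maps. In this illustration the arrays stand in for VGG19 activations φ_i, and the normalization of the Gram matrix and the averaging in the style term are assumptions; only the overall structure (L1 distance between features, and L1 distance between Gram matrices) follows the text.

```python
import numpy as np

def perceptual_loss(feats_rec, feats_ori):
    """Sum over layers of the L1 feature distance divided by N_i."""
    return sum(np.abs(a - b).sum() / a.size for a, b in zip(feats_rec, feats_ori))

def gram_matrix(feat):
    """Channel-by-channel correlation of a (C, H, W) feature map."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (c * h * w)   # normalization is an assumption

def style_loss(feats_rec, feats_ori):
    """Sum over layers of the mean L1 distance between Gram matrices."""
    return sum(np.abs(gram_matrix(a) - gram_matrix(b)).mean()
               for a, b in zip(feats_rec, feats_ori))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = [rng.normal(size=(8, 16, 16)), rng.normal(size=(16, 8, 8))]
    fake = [f + 0.1 for f in real]
    print(perceptual_loss(fake, real), style_loss(fake, real))
```

The Gram matrix discards spatial position and keeps only feature co-occurrence, which is why it captures texture and staining style rather than layout.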

Process of synthesizing datasets
As shown in Fig. 7, in the stage of sclerosed glomerular synthesis, we used a Deep Convolutional Generative Adversarial Network (DCGAN) 27 to generate masks of different shapes and sizes based on the existing masks.
Since coloured pixel values are likely to appear during mask generation, it is necessary to convert the generated masks to grayscale and, at the same time, to set a threshold that eliminates isolated regions in the masks whose area is smaller than the threshold. The glomerular contours in the masks are scaled so that contours of different sizes are evenly distributed. Based on the pathologist's recommendation, we located the potential positions for sclerosed glomeruli and cropped out squares of 256 × 256 at these positions. Similar to the operation during training, masked images (X_gap_t) are obtained by pixelwise multiplication, as shown in Eq. (13), based on randomly selected generated masks (X_gmask) and cropped images (X_ori_t). X_gmask and X_gap_t are concatenated as the input of the inpainting model. Finally, the generated images were merged into the original cropped area, and a new ROI with several sclerosed glomeruli in different positions was obtained.
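The clean-up of a generated mask (grayscale conversion, binarization, removal of small isolated regions) can be sketched as follows. The binarization level, the minimum area, the use of 4-connectivity and the function name are all illustrative assumptions; the text does not report the values that were used.

```python
from collections import deque
import numpy as np

def clean_generated_mask(rgb_mask, level=0.5, min_area=50):
    """Convert a generated (H, W, 3) mask in [0, 1] to a clean binary mask."""
    gray = rgb_mask.mean(axis=-1)              # simple grayscale conversion
    binary = gray > level
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    out = np.zeros((h, w), dtype=bool)
    for sy, sx in zip(*np.nonzero(binary)):
        if seen[sy, sx]:
            continue
        # Flood fill one 4-connected region starting at (sy, sx).
        queue, region = deque([(sy, sx)]), []
        seen[sy, sx] = True
        while queue:
            y, x = queue.popleft()
            region.append((y, x))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(region) >= min_area:            # keep only large enough regions
            ys, xs = zip(*region)
            out[list(ys), list(xs)] = True
    return out

if __name__ == "__main__":
    g = np.zeros((64, 64, 3))
    g[10:30, 10:30] = 1.0                      # plausible glomerulus shape
    g[50:52, 50:52] = 0.9                      # small isolated speck
    print(clean_generated_mask(g).sum())
```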

Glomerular segmentation network
For the design of the glomerular image segmentation network, we adopt an encoder-decoder architecture in which the decoder follows the Unet structure. The skip connections in Unet are used to fuse multiscale features from the encoder with upsampled features. Here, shallow features and deep features are connected together to reduce the spatial information loss caused by downsampling. In the encoder part, we select EfficientNet as our encoder backbone. The more advanced transformer structure is not adopted here because its performance heavily relies on pretraining and it requires a large amount of computation. Thus, its training time and computation time will be higher than those of a CNN model with the same number of parameters, and its prediction time will also be longer. We expect EfficientNet to produce results faster while maintaining segmentation quality, which is very important for slice evaluation. EfficientNet was proposed in 28 and takes into account both the depth and width of the network. There are currently several versions of EfficientNet, from B0 to B7. As the model name EfficientNetB3-Unet indicates, the B3 version serves as the encoder in this work.

Generation results and analysis
To evaluate the quality of the generated sclerosed glomeruli quantitatively, we used the mean absolute error (MAE), peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) to compare the differences between the generated glomeruli and the originals. Table 2 shows the values of the three metrics.
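Two of the three metrics, MAE and PSNR, restricted to the generated glomerular area, can be sketched as below. The 8-bit intensity range (max_val = 255) and the restriction via a binary mask are assumptions; SSIM is omitted because it requires a windowed computation that is better taken from an established image-processing library.

```python
import numpy as np

def masked_mae_psnr(original, generated, mask, max_val=255.0):
    """MAE and PSNR (in dB) computed only where mask is nonzero.

    original, generated : (H, W, 3) images
    mask                : (H, W) array, nonzero inside the glomerulus
    """
    m = mask.astype(bool)
    diff = original[m].astype(np.float64) - generated[m].astype(np.float64)
    mae = np.abs(diff).mean()
    mse = (diff ** 2).mean()
    psnr = float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)
    return mae, psnr

if __name__ == "__main__":
    ref = np.zeros((8, 8, 3), dtype=np.uint8)
    gen = np.full((8, 8, 3), 10, dtype=np.uint8)
    print(masked_mae_psnr(ref, gen, np.ones((8, 8))))
```

Casting to floating point before subtracting avoids the wrap-around that unsigned 8-bit arithmetic would otherwise introduce.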
As shown in Fig. 9, we obtained the synthetic images by combining the cropped glomeruli from the generated images with the background of the original images.

Glomerular segmentation and analysis
Before adding the generated sclerosed glomeruli to the training data, we verified the effect of traditional data augmentation, including random flipping and rotating. We evaluated the performance of glomerular segmentation with and without traditional data augmentation on the three test sets. The data augmentation strategy we adopted is an online method in which each input training picture has a certain chance of being flipped or rotated. We performed ten experiments, with the training set and validation set randomly divided in each experiment. Table 3 shows the performance of glomerular segmentation based on our segmentation model with and without traditional data augmentation. "√" represents the use of traditional data augmentation.
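An online flip-and-rotate augmentation that keeps image and mask aligned can be sketched as follows. The probability p, the restriction to rotations by multiples of 90 degrees and the function name are assumptions for illustration; the text only states that each picture has a certain chance of being flipped or rotated.

```python
import numpy as np

def random_flip_rotate(image, mask, rng, p=0.5):
    """Apply the same random flip/rotation to an image and its mask.

    image : (H, W, C) array, mask : (H, W) array, rng : numpy Generator
    """
    if rng.random() < p:                       # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < p:                       # vertical flip
        image, mask = image[::-1], mask[::-1]
    if rng.random() < p:                       # rotation by 90, 180 or 270 degrees
        k = int(rng.integers(1, 4))
        image = np.rot90(image, k, axes=(0, 1))
        mask = np.rot90(mask, k, axes=(0, 1))
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    msk = np.zeros((8, 8), dtype=np.uint8)
    msk[1:3, 2:7] = 1
    img = np.stack([msk, msk, msk], axis=-1)
    aug_img, aug_msk = random_flip_rotate(img, msk, rng)
    print(aug_img.shape, aug_msk.shape)
```

Because the transform is drawn anew every time a sample is loaded, the network rarely sees exactly the same picture twice, without any extra images being stored.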
As seen from Table 3, for the different test sets, applying traditional data augmentation to the training data can improve the overall glomerular segmentation performance to a certain extent, although precision may decline. The experiments show that the segmentation performance for normal glomeruli is much better than that for sclerosed glomeruli, which is consistent with most current studies and confirms the need for our data generation. At the same time, the performance on Test3 is lower than that on Test1 and Test2, from which we conclude that the ability of the algorithm to transfer to renal pathological images with different staining needs to be improved, owing to the characteristics of the different staining methods.
Therefore, to better evaluate the effect of the generated sclerosed glomeruli on image segmentation, we analysed the influence of different amounts of synthetic data on the ability of the model to identify sclerosed glomeruli on the basis of traditional data augmentation, as shown in Table 4.
Table 4 shows that adding different amounts of synthetic data generated by our algorithm greatly improves the segmentation performance for sclerosed glomeruli. This indicates that the ability to recognize sclerosed glomeruli is improved, as our generated sclerosed glomeruli have different shapes and sizes and are distributed in different locations. However, adding more synthetic data does not always yield better performance. Our analysis suggests that, although the diversity of the generated shapes is greatly improved, the features inside the sclerosed glomeruli are still generated from the existing data, so their distribution remains consistent with that of the original glomeruli.
Additionally, we compared our segmentation model with other classical models to verify the advantages of our method in the task of glomerular segmentation, as shown in Table 5.
We compared our model with other medical semantic segmentation algorithms, including Unet and Unet++ 30, all of which were trained with 100% generated data, and calculated the mean value of each metric over the two classes on the different test sets. Our algorithm performs better than the other algorithms on the different test sets. In addition, on the H&E-stained test set, our algorithm shows a larger advantage over the others, indicating better generalization and transfer ability.
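For reference, pixelwise Dice, precision and recall for one class can be computed from a predicted and an annotated binary mask as in the sketch below. These are the standard definitions and are assumed here; they may differ in detail from the equations listed in Table 1.

```python
import numpy as np

def dice_precision_recall(pred, target):
    """Pixelwise Dice, precision and recall for a single class."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = int(np.logical_and(pred, target).sum())
    fp = int(np.logical_and(pred, ~target).sum())
    fn = int(np.logical_and(~pred, target).sum())
    # Convention (assumption): an empty prediction of an empty target scores 1.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return dice, precision, recall

if __name__ == "__main__":
    p = np.array([[1, 1, 0, 0]])
    t = np.array([[1, 0, 1, 0]])
    print(dice_precision_recall(p, t))
```

Averaging such per-class values over the normal and sclerosed classes gives a single figure per test set of the kind compared across models.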
Figure 10 shows the visualization of our final model output, with the annotations of the data in the left column and the output of the model in the right column. Blue represents normal glomeruli, and red represents sclerosed glomeruli. Our model can label sclerosed glomeruli that were missed in the original annotations, especially near the margins, which demonstrates its strong identification ability. At the same time, however, some smaller sclerosed glomeruli are still missed, which remains to be improved.

Conclusion
In the task of glomerular identification and classification, it is difficult and costly to obtain large amounts of data for training the model, and there is a problem of class imbalance because the number of sclerosed glomeruli is much smaller than that of normal glomeruli in the available data. Therefore, we proposed a sclerosed glomerular ...

Figure 1. Data processing and settings.

Figure 3. Explanation of the sclerosed glomerular cutting method.

Figure 6. Key modules in the inpainting network. (a) Architecture of the AOT block. (b) Architecture of the self-attention block.

1 Digital Manufacturing Equipment National Engineering Research Center, Huazhong University of Science and Technology, Wuhan, China.
2 National NC System Engineering Research Center, Huazhong University of Science and Technology, Wuhan, China.
3 Key Laboratory of Organ Transplantation of Ministry of Education, Institute of Organ Transplantation, Tongji Hospital, Tongji Medical College, National Health Commission and Chinese Academy of Medical Sciences, Huazhong University of Science and Technology, Wuhan, China.
4 Wuhan Intelligent Equipment Industrial Institute Co Ltd, Wuhan, China.
5 Department of Information Management, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
* email: yychen@tjh.tjmu.edu.cn

Table 1. Equations of metrics performance.

Table 3. Performance of glomerular segmentation based on our segmentation model with and without traditional data augmentation, where NG denotes normal glomeruli and SG denotes sclerosed glomeruli. The values represent the mean and standard deviation over the ten training runs.

Table 4. Performance comparison of sclerosed glomerular segmentation based on our segmentation model when adding different amounts of synthetic data. The values represent the mean and standard deviation over the ten training runs.