Generation of microbial colonies dataset with deep learning style transfer

We introduce an effective strategy for generating an annotated synthetic dataset of microbiological images of Petri dishes that can be used to train deep learning models in a fully supervised fashion. The developed generator employs traditional computer vision algorithms together with a neural style transfer method for data augmentation. We show that the method is able to synthesize a dataset of realistic-looking images that can be used to train a neural network model capable of localising, segmenting, and classifying five different microbial species. Our method requires significantly fewer resources to obtain a useful dataset than collecting and labeling a whole large set of real images with annotations. We show that, starting with only 100 real images, we can generate data to train a detector that achieves results (detection mAP = 0.416, counting MAE = 4.49) comparable to those of the same detector trained on a real, several dozen times bigger dataset (mAP = 0.520, MAE = 4.31),
containing over 7 k images. We demonstrate the usefulness of the method for microbe detection and segmentation, but we expect it to be general and flexible, and thus also applicable in other domains of science and industry for detecting various objects.

Synthetic microbial colonies dataset. In our study, we develop another strategy for generating annotated synthetic datasets without the need for large input resources. We propose a method combining traditional computer vision and DL-based style transfer concepts for generating synthetic images of Petri dishes. We utilise the neural style transfer technique to change the style of an image in one domain (synthetic) to the style of an image in another domain (real), and thus improve the realness of the generated data. Photorealistic image stylization does not require many resources and typically uses a single style image during the stylization process 38 . The idea was introduced by Gatys et al. 39 and further developed by Fei-Fei and others [40][41][42][43] .
We used the higher-resolution subset of the recently introduced AGAR microbial colony dataset 12 to conduct a synthetic dataset generation experiment. We randomly selected 100 annotated images from this subset and used them to feed the colony extraction part. Additionally, 10 empty dishes were taken to serve as a background on which colonies were deposited. Using the extracted colonies, we generated a big synthetic dataset of Petri dishes augmented using style transfer with 20 different styles. To obtain diverse styles, we used 20 fragments with different lighting conditions, selected from these 100 input images. Each generated image contains a selected number of colonies (corresponding to the real data to be mimicked) and is of size 512 × 512 pixels, which corresponds to the size of fragments (patches) used for training colony detectors when evaluating the AGAR dataset 12 . To cover the whole dish, we would need approximately 8 × 8 patches, which gives a resolution similar to that of the whole-dish images in the AGAR samples with 4000 × 4000 pixels (higher-resolution subset). Note, however, that during the generation process we synthesize individual patches, not the entire dish at once, but after simply adjusting the method, we could also generate the whole dish. Finally, a deep learning detector was trained entirely on this synthetic dataset and evaluated on the test part of the higher-resolution AGAR subset. This way, the obtained results are comparable with the ones from 12 , obtained for the same DL detector architecture, but trained entirely on the real dataset, i.e. the training part of the higher-resolution subset containing 7360 images.

Methods
In this section we present a detailed description of the method for generating synthetic images with microbial colonies. Then we provide an overview of the neural network models used to make the generated images more realistic and to implement the colony detector. Our source code with the Python implementation of the generation framework is publicly available at 44 . Let us start with a general scheme of the introduced method. In Fig. 1, our strategy to generate a synthetic dataset is presented. The method consists of three subsequent steps. We start with labeled real images of Petri dishes and perform colony segmentation using traditional computer vision algorithms, including proper filtering, thresholding, and energy-based segmentation. To get a balanced working dataset, we randomly select 20 images for each of the 5 microbial species (giving 100 images in total) from the higher-resolution subset of the AGAR dataset 12 .
In the second step, segmented colonies and clusters of colonies are randomly arranged on fragments of an empty Petri dish (we call them patches). We select a random fragment of one of the 10 images of a real empty dish. We repeat this step many times, placing subsequent clusters in random places and making sure they do not overlap. Simultaneously, we store the position of each colony from the cluster placed on the patch, together with its segmentation mask, creating a dictionary of annotations for that patch.
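The placement loop described above can be sketched as follows. This is a minimal illustration with hypothetical names, assuming axis-aligned bounding boxes and simple rejection sampling; the authors' actual implementation may differ:

```python
import random

def place_clusters(patch_size, cluster_sizes, max_tries=50):
    """Randomly place clusters on a square patch, rejecting positions
    that would overlap an already placed cluster.

    cluster_sizes: list of (width, height) of the clusters to place.
    Returns the list of accepted (x, y, w, h) boxes.
    """
    placed = []  # boxes already on the patch
    for (w, h) in cluster_sizes:
        for _ in range(max_tries):
            x = random.randint(0, patch_size - w)
            y = random.randint(0, patch_size - h)
            # reject the candidate if it overlaps any placed box
            if not any(x < px + pw and px < x + w and
                       y < py + ph and py < y + h
                       for (px, py, pw, ph) in placed):
                placed.append((x, y, w, h))
                break
    return placed
```

A cluster that cannot be placed after `max_tries` attempts is simply skipped, which mirrors the fact that fewer large colonies fit on a single patch.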
Finally, in the third step, we apply the augmentation method via deep learning style transfer. We transfer the style of one of the selected real images, which serve as style carriers, to a given raw patch. We select 20 real fragments with very different lighting conditions as style carriers to increase the diversity of the generated patches.
Colony segmentation. At first, we read the annotations, each stored as 4 numbers (x, y, width, height) defining a bounding box that represents a single labelled colony in a given image. Using this information, we check which colonies overlap by more than 0.01 of their area and, in this way, build an adjacency matrix between colonies. This step is important because colonies naturally overlap in highly populated dishes. Consequently, to capture the geometrical details of their mutual shapes, we want to segment whole clusters of overlapping colonies, not just individual ones. Using the adjacency matrix and the breadth-first search algorithm on the graph (representing connections between colonies), we obtain connected clusters of bacterial colonies (clearly, some clusters contain a single colony) that we segment in the next step. In this step, we cut out the rectangular fragment of the image which contains a single cluster (and repeat this step for every cluster), and add a binary alpha channel m_bx with zero values on areas not covered by any of the bounding boxes from the cluster. We call it the boxes' mask; obviously, m_bx has the same size as the rectangular fragment. We are now ready to perform a segmentation procedure to remove the colony background and, consequently, refine the alpha channel for that fragment. Examples of colonies with removed background, ready to be arranged on an empty dish, are presented in Fig. 1.
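The clustering step above can be sketched in a few lines. This is an illustrative implementation, assuming the 0.01 overlap threshold is taken relative to the smaller box's area (the paper does not specify which box's area is meant):

```python
from collections import deque

def overlap_area(a, b):
    """Intersection area of two (x, y, w, h) bounding boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    return ix * iy

def colony_clusters(boxes, min_frac=0.01):
    """Group bounding boxes into clusters of mutually overlapping colonies.

    Boxes overlapping by more than min_frac of the smaller box's area are
    adjacent; clusters are connected components found by breadth-first search.
    """
    n = len(boxes)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            smaller = min(boxes[i][2] * boxes[i][3], boxes[j][2] * boxes[j][3])
            if overlap_area(boxes[i], boxes[j]) > min_frac * smaller:
                adj[i].append(j)
                adj[j].append(i)
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        queue, comp = deque([start]), []
        seen.add(start)
        while queue:  # BFS over the adjacency lists
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        clusters.append(sorted(comp))
    return clusters
```

Each returned cluster is then cut out and segmented as a whole, so overlapping colonies keep their mutual geometry.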
The segmentation procedure consists of multiple substeps using computer vision methods. Such methods have already been used to detect and segment microbial objects 10,11,[45][46][47] . The first step consists of filtering out unwanted artifacts. At first, we apply the unsharp mask filter and convert the given fragment to the CIELab colorspace. Secondly, we implement simple removal of dark objects (i.e., contamination and characteristic text labels across the dish substrate): they are detected via luminance and b-value thresholding 10 in the CIELab colorspace. Then, we dilate the resulting mask m_d for better coverage of artifacts (dark regions). Finally, each pixel in the detected dark region is replaced with the nearest valid (i.e., not belonging to the dark region mask m_d) pixel found by a random walk algorithm. An example of the method's operation is shown in Fig. 2. Such an approach successfully removes the above-mentioned artifacts, but unfortunately the random walk introduces a lot of speckle noise to a given fragment. Therefore, in the last step we apply non-local means denoising 48 to remove this speckle noise. Now we are ready to perform colony cluster segmentation. We use the powerful Chan-Vese algorithm 49,50 , based on signal energy minimization, as provided by the scikit-image library 51 . Then we add some margins to the resulting segmentation mask m_s to be sure that we segment whole colonies together with their edges; see examples in Fig. 3. To get better future blending with the empty dish background, we create an additional mask m_b with values for each pixel (in a range of [0 : 255]) proportional to its distance from the average background color (i.e., outside the segmentation mask m_s) in the CIELab colorspace. Such a mask has lower values in areas where the pixel color is similar to the patch background average color.
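The blending mask m_b can be computed as sketched below. This is a dependency-free illustration working directly in RGB; the paper computes the distances in the CIELab colorspace (in practice one would first convert with, e.g., `skimage.color.rgb2lab`):

```python
import numpy as np

def blending_mask(fragment, seg_mask):
    """Blending mask m_b: per-pixel values in [0, 255] proportional to the
    colour distance from the average background colour, where the background
    is defined as the pixels outside the segmentation mask m_s.

    fragment: (H, W, 3) colour image of a colony cluster.
    seg_mask: (H, W) boolean segmentation mask m_s (True inside colonies).
    """
    frag = fragment.astype(float)
    background = frag[~seg_mask]          # pixels outside m_s
    mean_bg = background.mean(axis=0)     # average background colour
    # Euclidean colour distance of every pixel from the background mean
    dist = np.linalg.norm(frag - mean_bg, axis=-1)
    m_b = np.clip(255.0 * dist / max(dist.max(), 1e-9), 0, 255)
    return m_b.astype(np.uint8)
```

Pixels similar to the background get low values, so multiplying this mask into the alpha channel fades them out and improves blending with the empty dish.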
The idea in this step is to increase transparency (lower the alpha channel) in areas that are similar to the background and contain a lower amount of interesting information, i.e., we assume that colonies differ from their background. This step can significantly improve the realism of the visual look of the generated patches. Next, this blending mask multiplies the segmentation and box masks: using the Hadamard product m_bx ∘ m_s ∘ m_b, we obtain the final alpha channel. Finally, we use it to remove the colony background.

Patch generation. After extracting numerous clusters of colonies for each of the 5 types of microbes, we are ready to generate a synthetic dataset. To generate a single patch, we select one of the 10 empty dishes, randomly rotate it, and take a 512 × 512 sized patch from it. Moreover, one of the microbe types is selected. Then, we take a random colony cluster of that type, rotate and flip it randomly, and put it in a random place on the patch. The position of each placed colony (i.e., its bounding box) and its segmentation mask are stored. Next, we repeat this step many times, placing subsequent clusters in random places but, at the same time, making sure they do not overlap. We select the number of colonies placed on the patch using the exponential probability distribution λe^(−λx) with mean 1/λ = 10. This corresponds to the distribution of the number of colonies per patch in the AGAR dataset. Note, however, that in the case of larger colonies, typically fewer colonies will fall into a single patch. We did not observe such phenomena in our experiments, but in some cases, to avoid biasing different classes by the size of colonies, it may also be helpful to provide an additional augmentation technique by varying (scaling) the size of placed colonies during the generation. We repeat the whole patch-generating process and get a dataset of 50k patches.
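The two numeric ingredients of this step, the final alpha channel and the sampled colony count, can be sketched as follows (an illustrative implementation with hypothetical names):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only for reproducibility here

def final_alpha(m_bx, m_s, m_b):
    """Final alpha channel as the Hadamard (element-wise) product
    m_bx * m_s * m_b, with the binary masks m_bx and m_s in {0, 1}
    and the blending mask m_b in [0, 255]."""
    return (m_bx.astype(float) * m_s.astype(float)
            * m_b.astype(float)).astype(np.uint8)

def sample_colony_count(mean=10):
    """Number of colonies per patch drawn from the exponential
    distribution lambda * exp(-lambda * x) with mean 1/lambda = 10."""
    return int(rng.exponential(scale=mean))
```

Because the exponential distribution is heavy on small values, most generated patches contain few colonies, matching the per-patch statistics of the AGAR dataset.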
All subsequent steps of the patch generation method together with the number of parameters that need to be tuned are listed in Table 1.
Neural style transfer. Having described how to generate raw patches, we move on to neural style transfer.
We utilize the stylization method described in 38 , with an architecture similar to the one used in the seminal work 40 , but with the HRNet 52 high-resolution representation network instead of a standard convolutional encoder-decoder architecture. We chose this approach after several trials because it gives the most realistic stylization of our raw Petri dish images without introducing unwanted artifacts, which was the case for other tested methods, e.g., the original one introduced in 39 . We base our style transfer implementation on the code repository provided in 38 . The architecture consists of two deep networks: an image generation network, HRNet, and a pretrained VGG19 53 used to calculate the content and style reconstruction losses. The deep convolutional neural network VGG19 is used because we are not interested in exactly matching the pixels of the output y to the style y_s or content y_c images; instead, we force them to have similar feature representations and compare the VGG activations between the output and the style or content. The resulting feature maps are combined via the Gram matrix 39 G, and the weighted differences between the Gram matrices of the output and the style image define the style loss.

Neural object detector. We trained two widely used deep learning object detectors from the R-CNN family, Faster 18 and Cascade 54 , using synthetic data, and show the usefulness of the generation method by testing the performance of microbe detection on real images of Petri dishes, i.e. the test part of the higher-resolution subset of the AGAR dataset. For detecting and counting microbial colonies, we used Faster R-CNN with ResNet-50 55 as a backbone, and Cascade R-CNN with an HRNet 52 backbone. The same architectures were used as the baseline (Faster R-CNN) and a more advanced model (Cascade R-CNN) during experiments with the real AGAR dataset 12 , which makes them good networks for performance comparison. Moreover, the Faster detector was extended to Mask R-CNN in the case of instance segmentation experiments. Both detectors are top deep learning models with quite complex structure, e.g. the Cascade R-CNN model with HRNet has 956 different layers in total.
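The Gram-matrix style loss mentioned above can be sketched as follows. This is a minimal NumPy illustration for a single VGG layer; the normalisation convention is one common choice, and the full loss in the paper is a weighted sum of such terms over several layers plus a content term:

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix G of a feature map with shape (C, H, W): the matrix of
    channel-wise inner products, normalised by C * H * W."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def style_loss(feat_out, feat_style):
    """Squared Frobenius distance between the Gram matrices of the output
    and style feature maps (one layer's contribution to the style loss)."""
    g_out, g_style = gram_matrix(feat_out), gram_matrix(feat_style)
    return float(((g_out - g_style) ** 2).sum())
```

Because the Gram matrix discards spatial positions and keeps only channel correlations, matching it transfers textures and colour statistics rather than exact pixels, which is exactly why VGG features are compared instead of raw images.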
We trained the networks for 20 epochs, with a batch size of 8 (for Faster) or 3 (for Cascade), and an adaptive learning rate initialized to 0.0001. During training, the Stochastic Gradient Descent (SGD) method was used to optimize the network weights. Calculations were performed on an NVIDIA Titan RTX GPU, and the training process lasted from about 2 (Faster) to 4 (Cascade) days, depending on the neural network model used. We used the detector implementations from the MMDetection 56 toolbox. The backbones' weights were initialized using models pre-trained on ImageNet 57 , available in the torchvision package of the PyTorch library.
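The reported settings could be expressed as an MMDetection-style config fragment along these lines. The field names follow MMDetection conventions, but the momentum, weight decay, and learning-rate schedule are illustrative assumptions, not values reported in the paper:

```python
# Hypothetical MMDetection-style config fragment reproducing the reported
# training settings (20 epochs, SGD, initial lr 0.0001, batch size 8 or 3).
# Momentum and weight_decay below are common defaults, assumed here.
optimizer = dict(type='SGD', lr=0.0001, momentum=0.9, weight_decay=0.0001)
runner = dict(type='EpochBasedRunner', max_epochs=20)
data = dict(samples_per_gpu=8)  # 8 for Faster R-CNN, 3 for Cascade R-CNN
```

Such a fragment would be merged into a full detector config (model, dataset, and schedule sections) before launching training with MMDetection's tools.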

Results
In this section we present results for the generation of synthetic images with microbial colonies, and the results for training deep learning detectors using the generated data.
Generation and stylization of synthetic dataset. In the first step, we take 100 annotated real images of Petri dishes (20 for each microbe type in the AGAR dataset) and perform colony segmentation using traditional computer vision algorithms. In the second step, segmented colonies and clusters of colonies are randomly arranged on fragments of an empty Petri dish. Examples of generated patches together with their masks denoting different colonies are shown in Fig. 1. Further examples of generated patches with different microbe types are shown in Fig. 5. In the third step, we apply the augmentation method via deep learning style transfer. Examples of stylization of five different raw synthetic patches using five different real style images are presented in Fig. 6. We introduce stylizing for two visual reasons. First, it makes the generated images look more realistic: they become difficult to distinguish from real images of Petri dishes. Second, stylizing mitigates the mismatch at the edges of pasted colonies, an artifact that strongly impacts detector performance, as the detector would undesirably learn to detect the nonrealistic edges of microbial colonies. However, another important factor is that the addition of stylizing, and thus augmenting using these 20 different styles, simply increased the diversity of the quite homogeneous raw dataset and had a big impact on detector training.
Using the method, we generated 50k raw patches with different microbe types and numbers of colonies. To test the stylization impact, we use two different stylization strengths controlled by the style weight (see the "Methods" section for further details) and apply them separately to the raw patches. Weaker stylization with a weight of 0.02 gives the semi-stylized set, while a stronger one with a weight of 0.05 gives the fully-stylized set. We also store the raw set with no stylization. Each of the three sets contains 50k patches, which corresponds to about 65 k patches in the training part of the higher-resolution subset of the AGAR dataset.
Deep learning detection on real data. The idea behind the conducted experiments is to train a neural network model using synthetic data to detect microbial colonies, and then test its performance on real images with bacterial colonies in a Petri dish.
Examples of detection on real data. We train Faster R-CNN and Cascade R-CNN on the three synthetic sets separately. The best results are obtained for training on the fully-stylized set; examples of tests on real patches in the case of Faster R-CNN are presented in Fig. 7. The detector performs quite well, but one may notice missing bounding boxes for some colonies, especially in crowded samples where colonies frequently overlap. The worst performance occurs for the blurred P. aeruginosa microbes that form the biggest colonies; the best occurs for the sharp, small colonies of S. aureus and C. albicans. In some cases, excessive (false positive) bounding boxes also appear. In the case of Cascade R-CNN, the results are similar, with a slightly lower tendency of the detector to generate excessive bounding boxes, which results in better counting statistics; see Table 2.
Automatic instance segmentation is a problem that occurs in many biomedical applications. During the patch generation, we also store a segmentation mask at the pixel level for each colony. This additional information can be used to train a deep learning instance segmentation model. We use the Mask R-CNN 19 model, which extends the Faster R-CNN detector that we have already trained, and train it using pretrained weights.

Colony counting metrics. One of the main applications of object detection in microbiology is to automate the process of counting microbial colonies grown on a Petri dish. We verify the proposed method of synthetic dataset generation by comparing it with a standard approach where we collect a big real dataset 12 and train the detector for colony detecting and counting tasks. There are several types of metrics typically used in the context of object detection and counting. To characterize the quality of detection, we calculate the standard mean Average Precision (mAP) established by the famous COCO competition 58 . The Average Precision (AP) of detection is calculated for a given Intersection over Union (IoU) threshold. IoU describes the level of overlap between the ground-truth and predicted bounding boxes. The mean value of AP over different IoU thresholds and classes gives the mAP. To measure the effectiveness of colony counting, two separate metrics were used, namely the standard Mean Absolute Error (MAE) and the symmetric Mean Absolute Percentage Error (sMAPE), which additionally weights the errors using information about the number of instances present on a given dish. Precise definitions of both measures can be found in the Supplementary Material of 12 .
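The metrics above can be sketched in a few lines. IoU and MAE follow the standard definitions; the sMAPE shown here is one common form, while the paper's exact definition is the one given in the AGAR supplement:

```python
def iou(a, b):
    """Intersection over Union of two (x, y, w, h) bounding boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def mae(true_counts, pred_counts):
    """Mean Absolute Error between true and predicted per-dish counts."""
    return sum(abs(t - p) for t, p in zip(true_counts, pred_counts)) / len(true_counts)

def smape(true_counts, pred_counts):
    """Symmetric Mean Absolute Percentage Error (in %); one common form,
    guarding against division by zero for empty dishes."""
    terms = [abs(t - p) / ((abs(t) + abs(p)) or 1)
             for t, p in zip(true_counts, pred_counts)]
    return 100 * sum(terms) / len(terms)
```

A detection counts as a true positive when its IoU with a ground-truth box exceeds the chosen threshold; averaging the resulting AP over thresholds and classes yields the mAP reported in Table 2.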
Let us now discuss the results of counting microbial colonies on a Petri dish. We train the chosen detectors (Faster R-CNN with a ResNet-50 backbone and Cascade R-CNN with HRNet) on the 50k-patch dataset generated using 100 images from the training part of the higher-resolution AGAR subset, and test microbial colony counting on the same task as conducted in 12 (the testing part of the AGAR higher-resolution subset). The resulting detection metrics are presented in Table 2, and the counting results in Fig. 9 (the detectors were trained on the raw, semi-stylized, and fully-stylized sets separately). It turns out that the detection precision (mAP) and counting errors (MAE and sMAPE) for the fully-stylized set are only slightly worse (especially the MAE) than for the same detector trained on the whole big set containing over 7360 real images, giving about 65 k patches. Cascade R-CNN detects colonies slightly better than Faster R-CNN, and this is true for training on both synthetic and real data. It is also interesting to note that while Cascade R-CNN is better on the most stylized set, for the others, semi-stylized and raw, Faster R-CNN performs better.
It is also clear that introducing style transfer augmentation greatly improves the detection quality, and without stylization the results are rather poor; see the results for the raw set in Fig. 9 (left). The top row in Fig. 9 shows counting results for the Faster detector, while the bottom row shows those for the Cascade detector. Moreover, in Fig. 9 (right), showing results for the fully-stylized set, we observe the typical problem where the detector underestimates the number of colonies for highly populated dishes. This is because in such cases colonies (especially bigger ones) frequently overlap, which makes detection (and counting) harder.

Discussion
Object counting problems, such as estimating the number of cells in a microscopic image or the number of humans or cars in a video frame, have become one of the major tasks in computer vision 59 . Generation of synthetic microbiological images using simple geometrical transformations that emulate microscopic views of bacterial colonies has been used to validate various algorithms for automated image analysis 60 , or to train a high-resolution convolutional neural network to build density-map-based cell counters 61 . In the latter, the authors use methods similar to ours, where the model was trained using synthetic data and tested on real images. However, in our approach we train a much more complicated but thus more flexible (and more likely to generalize) object detector. Therefore, we need to use a much bigger and more diverse dataset containing objects of different sizes and with different lighting conditions. We also verified that further enlarging the dataset by adding another 10k patches practically did not improve the detection fidelity. Deep CNNs were also trained to count cells 62 and microbial colonies 16,46 , but in these approaches the network training was done entirely in a supervised manner using real datasets. Simple augmentation techniques were applied successfully to extend microbial datasets 46,63 in bacteria classification tasks. We also flipped or rotated the images, or added salt-and-pepper noise or speckle noise, but with no significant effect on the training process. According to our experiments, only advanced augmentation using style transfer significantly affects the training of the detector. It is also worth mentioning that adding more styles did not help much; during the generation we choose from 20 style images, but when we increased this number to 50, it did not give further improvement.
On the other hand, the degree of styling was more important, but if it was too high, i.e., with a weight above 0.05, the fidelity of microbe classification would decrease. Other groups involved in the generation of synthetic microbial images have also reported the usefulness of style transfer methods 35 . They use the style transfer algorithm in the microbial colony segmentation task to improve the realism of synthetic images of Petri dishes generated using GANs. However, we additionally show that style transfer allows us not only to improve realism, but also to introduce different color or lighting domains, and thus enhance the trained detector's ability to generalize. We saw in Fig. 7 that, by using different style images during training, the detector can learn to count in slightly different lighting conditions (domains) at the same time. Therefore, our approach can be used to quickly change domains, e.g., we can collect styles and empty dishes in different lighting, and then generate synthetic images in that domain. It is also worth emphasizing that standard GANs are not applicable in our case, because to train the detector in a supervised manner we need to know exactly where each colony will appear. We need to precisely control their placement and number on a dish, which is not possible with a generic, unconstrained GAN. On the other hand, there are variations of GANs, such as CycleGAN or conditional GANs (e.g., pix2pix translation 24 ), that would allow colonies to be generated at desired locations. However, our preliminary experiments have shown that while we are able to generate realistic-looking images in a single style with pix2pix, training this type of network is difficult and it is much harder to achieve sufficient diversity of domains with this approach. As a result, the detector trained on such data performs worse.
On the other hand, one-shot style transfer is simply faster (especially for high-resolution images), easier to train (without mode collapse or convergence failure), and able to transfer to different domains at high resolution. Moreover, data leakage, which is especially problematic for medical investigations due to privacy issues, can be controlled with the proposed generation strategy. Therefore, to generate the data, we used a hybrid of traditional computer vision and image style transfer rather than standalone generative models.

Conclusions
Training modern deep learning models requires large and diverse datasets containing thousands of labeled images 58 . By using traditional computer vision techniques complemented by a deep learning style transfer algorithm, we were able to build a microbial data generator, supplied with only 100 real images, that demands much less effort and resources than collecting and labeling a vast dataset containing thousands of real images.
In principle, any object with some local differences from the background (in colorspace or brightness) can be segmented and then used to generate images in our scheme, together with the appropriate transfer of styles. We showed that, once generated, our synthetic dataset allows us to train state-of-the-art deep learning object detectors, making the approach applicable not only to object detection tasks in microbiology, but also to other scientific or industrial applications.