Annotated dataset for deep-learning-based bacterial colony detection

Makrai, László; Fodróczy, Bettina; Nagy, Sára Ágnes; Czeiszing, Péter; Csabai, István; Szita, Géza; Solymosi, Norbert

doi:10.1038/s41597-023-02404-8

Download PDF

Data Descriptor
Open access
Published: 28 July 2023

Annotated dataset for deep-learning-based bacterial colony detection

Scientific Data volume 10, Article number: 497 (2023) Cite this article

4749 Accesses
2 Citations
1 Altmetric
Metrics details

Subjects

Abstract

Quantifying bacteria per unit mass or volume is a common task in various fields of microbiology (e.g., infectiology and food hygiene). Most bacteria can be grown on culture media. The unicellular bacteria reproduce by dividing into two cells, which increases the number of bacteria in the population. Methodologically, this can be followed by culture procedures, which mostly involve determining the number of bacterial colonies on the solid culture media that are visible to the naked eye. However, it is a time-consuming and laborious professional activity. Addressing the automation of colony counting by convolutional neural networks in our work, we have cultured 24 bacteria species of veterinary importance with different concentrations on solid media. A total of 56,865 colonies were annotated manually by bounding boxes on the 369 digital images of bacterial cultures. The published dataset will help developments that use artificial intelligence to automate the counting of bacterial colonies.

Virtual reality-empowered deep-learning analysis of brain cells

Article Open access 22 April 2024

A visual-language foundation model for computational pathology

Article 19 March 2024

AI in health and medicine

Article 20 January 2022

Background & Summary

In microbiology, the colony-forming unit (CFU) is used to determine the number of viable bacteria that can grow on solid media¹. In all cases, CFU values can only be interpreted when normalized to a unit volume (e.g., ml). In clinical microbiology, food hygiene, and vaccine research, quantification of CFU is essential. The CFU count is most commonly estimated by counting the number of colonies on solid culture media. As the estimation of the number of living bacteria is often a key, but at the same time the process of colony counting is rather time-consuming and labour-intensive, there have been several attempts in the literature to automate the procedure. A number of tools (EBImage², ImageJ³, OpenCFU⁴, AutoCellSeg⁵, CFUCounter⁶) have been developed and are used for colony counting, which has some predefined threshold (e.g., color) and counts the resulting objects. Although they can be of great help in laboratory work, it is important to be aware of their drawbacks. A general limitation of these solutions is that objects in the image that are not colonies (e.g., pieces of the wall of a Petri dish, air bubbles) may also appear in the result as colonies. Although some tools allow these erroneous detections to be corrected manually, this again requires time-consuming expert work. Also limiting their everyday use is that most of them cannot count colonies if the number of colonies in the Petri dish is too high⁶. The use of artificial intelligence (AI) to automate colony counting seems obvious. In the AI approach, colony counting is first an object detection problem. A further task could be the differentiation of bacterial species, which requires classification solutions. By these approaches, one can obtain the total and per-class CFU count by counting the detected and classified objects to estimate the total and per-species CFU counts. There are several machine-learning approaches available to solve this kind of problem. Nowadays, convolutional neural networks (CNNs) are probably the most efficient tools in this field^7,8,9,10, and there are efforts to use CNNs to automate colony counting^11,12,13. In line with these authors, the aim of our research group was to train CNNs to estimate CFU automatically. The availability of as many digital images of annotated bacterial cultures as possible is a prerequisite for colony detection and classification with CNN. We could not find a similar public, freely available dataset to use for our own CNN-based development when we started our work.

The aim of creating the dataset presented here was to build a collection of digital records of bacterial cultures performed under everyday laboratory conditions on solid media. In creating such datasets, the question arises as to whether the digital images should be produced under some highly controlled, standardized conditions or in a way that could presumably be produced anywhere. The former solution may obviously lead to more accurate results on a given dataset, but the latter may open up the possibility of extendibility. In creating the dataset presented here and made freely available, we chose the latter approach, using mobile phones to take 369 digital images of cultures of 24 bacterial species on solid media, annotating a total of 56,865 bacterial colonies.

Methods

Culturing of bacterial species

Our studies have cultured 24 bacterial species of veterinary importance (Table 1). These are species whose disease processes can cause significant economic damage in farm animals, which can cause disease in companion animals, and which are important for the safety of food products. Bacterial cultures were obtained from the bacterial strain collection of the Bacteriology Laboratory, Department of Microbiology and Infectious Diseases, University of Veterinary Medicine, where the bacterial strains were stored in an ultra-low freezer at −80 °C. Before each strain is stored in the collection, its bacterial species is identified by MALDI-TOF MS. Different media were used depending on the requirements of each bacterial species (Table 1).

Table 1 The bacterial species included in the data set.

Full size table

Several steps were necessary to obtain the bacterial cultures we later used to make digital images. On the first day, the frozen strains were inoculated onto the appropriate culture medium for the bacteria and incubated under conditions appropriate to the requirements of the bacteria. On the second day, a typical colony from the culture was inoculated onto a fresh medium and incubated. On the third day, a colony of bacteria was inoculated into tryptone soy broth (TSB) using a sterile cotton swab and incubated at 37 °C for 24 hours. The cultures were then used to prepare a dilution series on a decimal basis using sterile physiological saline suspension. In the first step of the dilution (basic dilution), 0.1 ml of the initial culture was first pipetted into a test tube containing 9.9 ml sterile saline, and the suspension was thoroughly homogenized (10⁻² dilution). Then 0.5 ml of this suspension was pipetted into a test tube containing 4.5 ml of sterile physiological saline solution. This gave the 10⁻³ dilution. The latter step of the dilution was carried out up to the 10⁻⁶ dilution (further dilutions).

Each member of the dilution series was homogenized by vortexing for 10 seconds. Subsequently, 50 μl per dilution of the dilutions was taken from each medium and distributed over the surface of the medium using a sterile glass rod with circular movements. After a final incubation at 37 °C for 24–48 hours, digital images of the Petri dishes containing the cultures were taken. The bacteria were inoculated, and dilutions were performed in a BSL2 safety cabinet. Incubation was done in a thermostat.

Digitalization and annotation

For the digitalization, three different mobile phones (LG Nexus 5X, iPhone 6, and HUAWEI P30 Lite) were used so that the variability of the devices could be accounted for in the data set. For the same purpose, black and white backgrounds for the dishes were used to take the photos. Care was taken to ensure that the camera on the phone was parallel to the plane of the Petri dish.

The digital images were uploaded to a server where an expert using COCO Annotator v0.11.1 (https://github.com/jsbroks/coco-annotator/) drew a bounding box around each colony and labeled the identified unit with the bacterial species (Fig. 1). After annotation, the COCO¹⁴ structured JSON¹⁵ files containing the bounding boxes and labels were downloaded and subjected to further validation steps.

Data Records

The number of images, annotations per bacterial species, and the distribution of the number of annotations per image are summarized in Fig. 2. The curated 369 digital images of 24 bacterial species cultures are freely available in the Figshare repository¹⁶. The filenames describe their origin. The first member of the file name is the bacterial species identifier (the ID column in Table 1), and the second member is the serial number of the image associated with that species. Accordingly, the naming file sp21_img04.jpg is the 4th image of the Staphylococcus aureus cultures. In addition to the images, the repository¹⁶ contains one metadata and five annotation files. In the first sheet of the images.xls metadata file, one line for a digital image contains the bacterial species ID, the file name, whether it was taken on a white or black background, and how many CFUs it contains. All the technical characteristics of the images and their recording are listed on the second sheet. The annot_COCO.json, annot_tab.csv, annot_tab.tsv, annot_VOC_XML.zip and annot_YOLO.zip files contain the 56,865 annotation data in COCO JSON, comma-separated, tab-separated, Pascal VOC XML¹⁷ and YOLO¹⁸ formats respectively.

Technical Validation

The annotated images of the bacterial cultures were curated by two experts with PhDs in bacteriology, and images they considered inappropriate were excluded from the final collection. The criteria for retaining images was whether the bacterial colonies morphologically matched the criteria for the species completely.

The annotations exported from the COCO annotator were reviewed by another expert using the Make Sense v1.11.0-alpha (https://github.com/jsbroks/coco-annotator/) tool, and the necessary corrections were made. In some cases, two identical images of the same culture were included in the initial collection, and these redundancies were filtered by findimagedupes v2.19.1 (https://gitlab.com/opennota/findimagedupes).

As our previous experience has shown that annotation bounding boxes exported from some annotation software can shift, especially for large numbers of annotated objects, we checked these separately. Since our CNN training designed on the dataset will be performed in the Detectron2 (https://github.com/facebookresearch/detectron2) environment, we tested whether the position of the annotation bounding boxes on the images placed with Detectron2 is correct based on the COCO format JSON files generated from the CSV files exported from Make Sense. This was done using a Python script that placed the associated bounding boxes on each digital image. The resulting images were curated one by one, and in all cases, the annotation bounding box positions were found to be correct.

Further technical validation was performed using independent data. Bärr et al.¹⁹ recorded images of S. aureus cultures every 10 minutes to estimate the colony growth rate. For this object detection challenge, not only the accuracy of colony detection but also the error of colony size estimation is also an important model selection criterion. We performed colony predictions on their publicly available images using CNNs trained on our dataset¹⁰. The colony growth rate presented by Bärr et al.¹⁹ and our predicted colony growth rate showed a difference of 0.1 μm/h (~0.2%).

Usage Notes

As we have more experience in object detection and classification with Detectron2, we recommend this environment for using the data. As several other efficient solutions are available, we have placed the annotation data in the repository¹⁶ in various formats to facilitate wider use of the data.

We believe the dataset can be used for three types of object detection and classification tasks. The first option is to train neural networks to detect bacterial colonies separately per species. A second option is to treat colonies of 24 species with different morphologies as one class and train CNNs on the whole dataset to detect a “general colony-forming unit” type¹⁰. A third option is to train the CNN on all the bacterial culture images and annotations but using the 24 classes, allowing the classification of bacterial colonies in addition to detection.

Code availability

As mentioned above, the correct position of the annotations was verified by drawing the corresponding bounding boxes on the images using Detectron2. The Python script used for this is in the file bbox_placement_test.py. The input annotation file for this run is a COCO JSON one. This was also generated from the tab-delimited annotation file using a Python script provided in TSV_to_COCO.py. Both script files are available in the Figshare repository¹⁶.

References

McVey, D. S., Kennedy, M., Chengappa, M. M. & Wilkes, R. Veterinary Microbiology 4th edn (John Wiley & Sons, 2022)
Pau, G., Fuchs, F., Sklyar, O., Boutros, M. & Huber, W. EBImage–an R package for image processing with applications to cellular phenotypes. Bioinformatics 26, 979–981, https://doi.org/10.1093/bioinformatics/btq046 (2010).
Article CAS PubMed PubMed Central Google Scholar
Schneider, C. A., Rasband, W. S. & Eliceiri, K. W. NIH Image to ImageJ: 25 years of image analysis. Nature Methods 9, 671–675, https://doi.org/10.1038/nmeth.2089 (2012).
Article CAS PubMed PubMed Central Google Scholar
Geissmann, Q. OpenCFU, a new free and open-source software to count cell colonies and other circular objects. PloS ONE 8, e54072, https://doi.org/10.1371/journal.pone.0054072 (2013).
Article ADS CAS PubMed PubMed Central Google Scholar
Torelli, A. et al. AutoCellSeg: robust automatic colony forming unit (CFU)/cell analysis using adaptive image segmentation and easy-to-use post-editing techniques. Scientific Reports 8, 1–10, https://doi.org/10.1038/s41598-018-24916-9 (2018).
Article CAS Google Scholar
Zhang, L. Machine learning for enumeration of cell colony forming units. Visual Computing for Industry. Biomedicine, and Art 5, 1–8, https://doi.org/10.1186/s42492-022-00122-3 (2022).
Article CAS Google Scholar
Ribli, D., Horváth, A., Unger, Z., Pollner, P. & Csabai, I. Detecting and classifying lesions in mammograms with deep learning. Scientific Reports 8, 4165, https://doi.org/10.1038/s41598-018-22437-z (2018).
Article ADS CAS PubMed PubMed Central Google Scholar
Kilim, O. et al. Physical imaging parameter variation drives domain shift. Scientific Reports 12, 21302, https://doi.org/10.1038/s41598-022-23990-4 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Nagy, S. Á., Kilim, O., Csabai, I., Gábor, G. & Solymosi, N. Impact evaluation of score classes and annotation regions in deep learning-based dairy cow body condition prediction. Animals 13, 194, https://doi.org/10.3390/ani13020194 (2023).
Article PubMed PubMed Central Google Scholar
Nagy, S. Á. et al. Bacterial colony size growth estimation by deep learning. Preprint at https://doi.org/10.1101/2023.04.25.538361 (2023).
Ferrari, A., Lombardi, S. & Signoroni, A. Bacterial colony counting with convolutional neural networks in digital microbiology imaging. Pattern Recognition 61, 629–640, https://doi.org/10.1016/j.patcog.2016.07.016 (2017).
Article ADS Google Scholar
Huang, L. & Wu, T. Novel neural network application for bacterial colony classification. Theoretical Biology and Medical Modelling 15, 1–16, https://doi.org/10.1186/s12976-018-0093-x (2018).
Article Google Scholar
Beznik, T., Smyth, P., de Lannoy, G. & Lee, J. A. Deep learning to detect bacterial colonies for the production of vaccines. Neurocomputing 470, 427–431, https://doi.org/10.1016/j.neucom.2021.04.130 (2022).
Article Google Scholar
Lin, T-Y. et al. Microsoft COCO: Common objects in context. In European conference on computer vision, 740–755, https://doi.org/10.1007/978-3-319-10602-1_48 (Springer, 2014).
Pezoa, F., Reutter, J. L., Suarez, F., Ugarte, M. & Vrgoč, D. Foundations of JSON schema. In Proceedings of the 25th international conference on World Wide Web, 263–273, https://doi.org/10.1145/2872427.2883029 (2016).
Makrai, L. et al. Annotated dataset for deep-learning-based bacterial colony detection. Figshare https://doi.org/10.6084/m9.figshare.22022540.v3 (2023).
Everingham, M. et al. The PASCAL visual object classes challenge: A retrospective. International Journal of Computer Vision 111, 98–136, https://doi.org/10.1007/s11263-014-0733-5 (2015).
Article Google Scholar
Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. Preprint at https://doi.org/10.48550/arXiv.1506.02640 (2016).
Bär, J., Boumasmoud, M., Kouyos, R. D., Zinkernagel, A. S. & Vulin, C. Efficient microbial colony growth dynamics quantification with ColTapp, an automated image analysis application. Scientific Reports 10, 16084, https://doi.org/10.1038/s41598-020-72979-4 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This work has been supported by the European Union project RRF-2.3.1-21-2022-00004 within the MILAB Artificial Intelligence National Laboratory framework.

Funding

Open access funding provided by University of Veterinary Medicine.

Author information

Authors and Affiliations

Department of Microbiology and Infectious Diseases, University of Veterinary Medicine, 1143, Budapest, Hungary
László Makrai, Bettina Fodróczy & Péter Czeiszing
Centre for Bioinformatics, University of Veterinary Medicine, 1078, Budapest, Hungary
Bettina Fodróczy, Sára Ágnes Nagy, Péter Czeiszing, Géza Szita & Norbert Solymosi
Department of Physics of Complex Systems, Eötvös Loránd University, 1117, Budapest, Hungary
István Csabai & Norbert Solymosi

Authors

László Makrai
View author publications
You can also search for this author in PubMed Google Scholar
Bettina Fodróczy
View author publications
You can also search for this author in PubMed Google Scholar
Sára Ágnes Nagy
View author publications
You can also search for this author in PubMed Google Scholar
Péter Czeiszing
View author publications
You can also search for this author in PubMed Google Scholar
István Csabai
View author publications
You can also search for this author in PubMed Google Scholar
Géza Szita
View author publications
You can also search for this author in PubMed Google Scholar
Norbert Solymosi
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

N.S. takes responsibility for the data’s integrity. N.S., L.M., G.S. and I.C. conceived the concept of the work. L.M., B.F. and P.C. performed the bacteria culturing. B.F. and P.C. made the digital images. B.F. and S.Á.N. annotated the images. L.M. and G.S. curated the digital images. N.S. curated and edited the annotations. N.S., S.Á.N., B.F. and L.M. participated in the drafting of the manuscript. N.S., S.Á.N., L.M., G.S. and I.C. carried out the manuscript’s critical revision for important intellectual content. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Norbert Solymosi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Makrai, L., Fodróczy, B., Nagy, S.Á. et al. Annotated dataset for deep-learning-based bacterial colony detection. Sci Data 10, 497 (2023). https://doi.org/10.1038/s41597-023-02404-8

Download citation

Received: 18 May 2023
Accepted: 21 July 2023
Published: 28 July 2023
DOI: https://doi.org/10.1038/s41597-023-02404-8

This article is cited by

Assessment of airborne bacteria in the indoor of public-use facilities concentrated on influencing factors and opportunistic pathogenic bacteria
- Hyesoo Lee
- Bong Gu Lee
- Min-Kyeong Yeo
Air Quality, Atmosphere & Health (2024)
Bacterial colony size growth estimation by deep learning
- Sára Ágnes Nagy
- László Makrai
- Norbert Solymosi
BMC Microbiology (2023)