Background & Summary

Weeds are plants that, although not specifically cultivated, are adapted to grow on arable land. Typically, weeds are considered to be an undesirable element in crop production. Their negative impact on crop development can be described in terms of competition with the crop for resources (nutrients, sunlight, space and water), reduction in productivity, increased challenges during harvesting and an overall increase in the cost of agricultural production. In addition, weeds can be hosts for insects and diseases1,2, which might further increase the necessity for control strategies. Nevertheless, weeds might also have positive effects on biodiversity3 and soil structure4. Therefore, only highly competitive and invasive weed species should be removed, which might lead to more sustainable agriculture5.

Over centuries, several crop management strategies were established to mitigate the negative impact of weeds, which can be divided into five main categories6: ‘preventative’ (preventing weeds from establishing), ‘cultural’ (maintaining field hygiene with a low weed seed bank), ‘mechanical’ (removing weeds by mowing, mulching or tilling), ‘biological’ (using natural enemies such as insects or animals), and ‘chemical’ (applying herbicides). Disadvantages of these approaches include, to varying degrees, financial burden and additional time and effort. In addition, control treatments may impact the health of people, plants, soil, animals, and the environment7,8,9.

Sustainable strategies for managing weeds are critical to feed the world’s population while conserving ecosystems and biodiversity9. The limited and rational use of herbicides is an important principle of sustainable farming, as spraying herbicides leads to waste and can pollute soil and water sources. Furthermore, agrochemical residues are one of the most important food-related concerns. Therefore, additional non-chemical and site-specific weed management (SSWM) strategies10 are needed, which should be linked with the farm management system. One key aspect is to automatically and precisely detect weeds in order to mitigate the additional time and effort for either site-specific or weed-specific herbicide application or mechanical weed control. Numerous studies have demonstrated methods to automatically detect weeds in the field or in greenhouses, among which computer vision-based methods seem the most promising11,12,13,14,15,16,17,18,19. These methods can be grouped into different tasks with varying ground truth information, as shown in Fig. 1. Starting from an image as shown in Fig. 1a, image classification provides the least spatial information (Fig. 1b): it is applied to a single plant cut-out without location information and thus cannot be used in SSWM tasks, where the simultaneous detection and localization of multiple plants is desired. For localization, the object detection20 task might be utilized. However, this task produces rectangular bounding boxes (Fig. 1c), which is unsatisfactory given the complex and irregular shapes of plants. Also, these models are prone to occlusion21, which diminishes their performance in areas with high weed infestation22. Nevertheless, bounding boxes suffice for analyzing a plant’s growth dynamics by tracking multiple objects through time (see Fig. 1d). Moreover, by using segmentation masks, additional tasks can be performed. Here, by convention, any countable entity (e.g., a plant or person) is called a ‘thing’ and any uncountable region (e.g., soil or sky) is called ‘stuff’: The semantic segmentation task23 assigns a class label to every pixel in the image, providing a precise delineation of stuff, but it cannot separate different plants of the same class24 (see Fig. 1e). Separating individuals, however, is crucial in selective weed management25, which conserves biodiversity by removing only competitive weeds. Consequently, instance segmentation24 (Fig. 1f) can generate accurate detections of individual things, which can be used in many downstream tasks such as weed density assessment or biomass estimation26. Nevertheless, this type of model only detects countable objects (things) and does not consider stuff regions. Finally, panoptic segmentation27 combines the concepts of both semantic and instance segmentation and assigns two labels (a semantic label and an instance id) to each pixel in an image (Fig. 1g).
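To make the panoptic convention concrete, the following minimal sketch (an illustration only, not part of the dataset tooling) shows one common way to pack a semantic label and an instance id into a single integer per pixel and to recover both afterwards:

```python
import numpy as np

# Toy panoptic labels: every pixel carries a semantic class ("stuff" such
# as soil, or "things" such as plant species) and, for things only, an
# instance id distinguishing individual plants.
semantic = np.array([[0, 0, 1],
                     [0, 1, 1],
                     [2, 2, 0]])  # 0 = soil (stuff), 1/2 = plant species (things)
instance = np.array([[0, 0, 1],
                     [0, 1, 1],
                     [2, 2, 0]])  # 0 = no instance, 1..n = individual plants

# A common encoding packs both labels into one integer per pixel.
OFFSET = 1000
panoptic = semantic * OFFSET + instance

# Decoding recovers both labels exactly.
assert np.array_equal(panoptic // OFFSET, semantic)
assert np.array_equal(panoptic % OFFSET, instance)
```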

Fig. 1

Comparison of different computer vision tasks used in plant phenotyping. Plants (a) can be classified (b) as individual cut-outs, detected by bounding boxes (c), tracked through time (d), or segmented to obtain pixel-wise information. Segmentation can be done either semantically (e), by instance (f), or by a combination of both, called panoptic segmentation (g).

The basis for the development, validation and assessment of such systems is the availability of high-quality data on weed diversity in a particular area of interest. Several datasets are publicly available, but they lack several aspects required for precise plant phenotyping, as summarized in Table 1.

Table 1 Comparison of ground-based image datasets for plant detection.

Most available datasets lack variability in the data (a low number of individuals or plant species), limiting their usability across studies. Only a few datasets contain more than 100,000 annotated plant samples, including the Open Plant Phenotype Database (OPPD)28 and Pl@ntNet-300k29. However, Pl@ntNet-300k can only be used for classification tasks without tracking plant growth stages, and OPPD is missing semantic and instance segmentation masks, which are important for precise phenotyping. Also, bounding box information alone is often insufficient, as it is too coarse for most weed management applications. Therefore, semantic segmentation or even more precise instance segmentation masks are required. Finally, tracking a plant over time provides valuable insight into growth dynamics.

In this work, we have created a high-quality dataset of different plant species with high temporal and spatial resolution. We added manually curated semantic and instance segmentation masks for a subset to make this dataset suitable for weed management tasks. For this purpose, we used a high-throughput phenotyping system equipped with controlled illumination and an automatic irrigation system, ensuring a high degree of automation. In our dataset, we included images of plants captured multiple times per day, including captures in the evening, when the appearance of some species changes due to their dependency on sunlight. In addition, we generated data from different varieties of sorghum and maize, focusing on a wide range of seedling weeds that are also common in agricultural sites where these crops are grown.

Methods

The methodology can be summarized into three steps, as shown in Fig. 2. First, we will describe the experimental setup (Fig. 2a). Second, we will illustrate the image generation (Fig. 2b) and conclude with the labeling process (Fig. 2c).

Fig. 2

Outline of the experimental workflow. (a) Experimental setup using an automated phenotyping facility. (b) Generation of images with high temporal and spatial resolution of 30 different plant species. (c) Labeling process illustrated by bounding box information for the object detection task (weeds illustrated in red and sorghum plants in green).

Experimental Setup

To generate a dataset consisting of high-quality images that capture the initial growth dynamics of individual plants of several weed species, a greenhouse experiment was performed at the Moving Fields facility (https://www.lfl.bayern.de/verschiedenes/ueberuns/272457/index.php) of the Bavarian State Research Center for Agriculture in Freising, Germany. In the experiment, plants were grown in micro-plots that were watered and photographed automatically on at least a daily basis from the day of sowing until harvest, which took place around the onset of shooting. Built into a greenhouse, the Moving Fields facility (LemnaTec GmbH) consists of a conveyor belt system, three irrigation stations and four ‘Scanalyzer 3D’ photo cabins, which together enable experimental units consisting of plants growing on micro-plots to be automatically moved, watered, weighed and photographed. The greenhouse can be climatically controlled with regard to humidity and temperature and can be illuminated by 48 sodium-vapor lamps (Philips Son-T AGRO). The conveyor belt system (Bosch Rexroth TS2plus) accommodates and enables the movement of 390 carriers (micro-plots). At the three measuring stations, digital scales (Bizerba ST) and high-pressure pumps (Watson-Marlow) enable carriers, together with any plants transported by them, to be weighed and watered to a unit-specific target weight.

The plant species included in the dataset, listed in Table 2 and Table 3, were selected as weed species common to fields of sorghum grown in Germany. Additional selection criteria were 1) commercial availability and 2) the ability to grow under the climatically controlled conditions of a greenhouse. Seeds of the species involved were acquired from commercial breeders in Germany, the Netherlands and France. All plants were grown in boxes of size 40 × 30 × 22 cm (outer dimensions). The boxes were blue to facilitate subsequent image analysis. Each box was filled to about half height (roughly 11 cm after compression) with a commercial peat-free substrate (Höfter GmbH), primarily consisting of coconut fibers. Plants were grown as monocultures; each box contained plants of one species only. To yield data for enough individual plants per species despite different germination rates, the number of boxes varied between species.

Table 2 Selected weed species with corresponding amount of data captured and labeled.
Table 3 Selected crop varieties with corresponding amount of data captured and labeled.

The number of seeds planted in each box was made dependent on the expected germination rate, which was adjusted throughout the experiment. Thus, seed density varied both within species over time and between species. Following breeder recommendations, some seeds were kept in a vernalization room (at 4 °C) or treated with gibberellic acid (GA3) to ensure germination success. Units were sent to an automatic watering station as often as the imaging schedule allowed, in practice at least twice a day. Each unit was watered to its unique target weight, which initially corresponded to the unit’s weight at sowing. Throughout the experiment, target weights were adjusted repeatedly to prevent boxes from becoming either too dry or too wet. Twice a week, all boxes were examined to score seedling emergence, to thin the standing stock in order to minimize overlap between individual plants, and to harvest plants that either started shooting or became too big for the box they were growing in.
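The gravimetric watering logic can be summarized in a few lines; the sketch below is a simplified illustration of the principle only (the facility’s actual control software is not part of this work):

```python
def water_amount_ml(current_weight_g: float, target_weight_g: float) -> float:
    """Amount of water to add so the unit reaches its target weight.

    Assumes 1 g of water corresponds to 1 ml; water is only ever added,
    never removed, so the result is clipped at zero.
    """
    return max(0.0, target_weight_g - current_weight_g)

# Example: a unit weighing 9,850 g with a target weight of 10,000 g
# receives 150 ml of water.
print(water_amount_ml(9_850, 10_000))  # -> 150.0
```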

In addition, different varieties of maize (Zea mays) and sorghum (Sorghum bicolor) were grown and captured, as shown in Table 3.

Image Generation

To generate well-illuminated, high-resolution top-down images of the experimental units, one of the Scanalyzer 3D imaging cabins of the Moving Fields facility was used. In this cabin, one RGB camera (Basler piA2400-17gm) is mounted 2.8 m perpendicularly above the conveyor belt. The camera captures images of 2456 × 2058 pixels and is equipped with a motorized zoom lens (Pentax C6Z1218M3-5); throughout the experiment, however, this lens was fixed at a single position, which resulted in a ground resolution of 0.17 mm per pixel. The micro-plots were illuminated by 14 fluorescent lamps (Osram HE 28 W/865) that were also mounted perpendicularly above the conveyor belt. Units were imaged as often as possible, in practice at least once a day on the two days per week on which plant maintenance took place and at least twice a day on all other days. The units were tracked over a development period from sowing until the last plant either started shooting or became too big for the setup, representing the stages relevant for weed control in sorghum fields. The collected images were stored in a LemnaTec-specific raw format and subsequently converted to PNG format. Each experimental unit was marked with a unique numerical barcode composed of identity codes for 1) the species, 2) the treatment (with or without GA3) and 3) the replicate. Each image was saved together with the barcode of the imaged unit as well as the date and time of image acquisition.
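As an illustration of how such metadata can be recovered programmatically, the sketch below parses a hypothetical file name; the actual naming schema is defined by the dataset itself, so the pattern, the treatment codes and the example name used here are assumptions:

```python
import re
from datetime import datetime

# Hypothetical file name combining a unit barcode (species code, GA3
# treatment flag, replicate) with the acquisition date and time. The real
# schema is documented with the dataset; this pattern is an assumption.
fname = "CHEAL_GA3_07_2021-06-14_09-30-00.png"

pattern = re.compile(
    r"(?P<species>[A-Z0-9]+)_(?P<treatment>GA3|NONE)_(?P<replicate>\d+)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_(?P<time>\d{2}-\d{2}-\d{2})\.png"
)

match = pattern.match(fname)
if match:
    acquired = datetime.strptime(
        f"{match['date']} {match['time']}", "%Y-%m-%d %H-%M-%S"
    )
    print(match["species"], match["treatment"], match["replicate"], acquired)
```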

Labeling Process

The complete dataset was labeled using the open-source software CVAT (Computer Vision Annotation Tool; https://www.cvat.ai/) as a self-hosted solution. This software made the labeling time-efficient, as it provides an easy interface for multiple object tracking by adding time series information per plant. Each species was labeled individually, using its EPPO code as the label. Although only one species was seeded per tray, further weed species germinated during the experiment (see Table 4), as the seed assortment was not completely pure. Therefore, the additional label “weed” was used to annotate these plants, and the unknown species was identified by an expert in a second step. The correct species could not be specified for all plants for several reasons (e.g., poor germination resulting in very small plants, or occlusion by other plants), especially when they were not part of our initial assortment.

Table 4 Additional germinating weed species (dicots) that were not sown explicitly.

The instance segmentation masks, which provide pixel-level information, were manually drawn using another open-source software, GIMP (GNU Image Manipulation Program). Because of the complexity and time required for this manual work, not all species could be labeled; instead, we selected one tray from each of 14 plant species (see Fig. 1g).

Data Records

The Moving Fields Weed Dataset30 (MFWD) is deposited at the digital library of the Technical University of Munich (https://mediatum.ub.tum.de/1717366). The dataset consists of 94,321 images of 30 different plant species at high temporal and spatial resolution (see Fig. 3).

Fig. 3

Example of each plant species with corresponding EPPO code.

Additional ground truth data is provided, consisting of the plant species, a bounding box per plant, and time series information to track the same plant individual through growth. We labeled a large subset of these images, resulting in 200,148 records for 5,068 plants in the current version. Additional information is shown in Table 5.

Table 5 Summary of the current version of our dataset.

Image data is stored in PNG format to ensure the highest possible quality without compression artifacts. Compressed (JPEG) images are also provided to ensure accessibility with lower Internet bandwidth. All object detection and object tracking information is stored in a separate CSV file named “gt.csv”. The segmentation masks are stored in the folder “masks”. An additional folder is provided containing all images without ground truth annotations. The contents of the CSV file are explained in Table 6:

Table 6 Description of the CSV file containing ground truth information.
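As a starting point for working with these records, the sketch below loads the ground truth and crops one plant from its source image; the column and path names used here are placeholders (the actual names are given in Table 6), so adjust them accordingly:

```python
import numpy as np
import pandas as pd
from PIL import Image

# Load the detection/tracking ground truth. The column names used below
# (image_name, x_min, y_min, x_max, y_max) are placeholders; the actual
# names are documented in Table 6.
gt = pd.read_csv("gt.csv")

# Crop a single plant cut-out from its source image via its bounding box.
row = gt.iloc[0]
image = Image.open(f"images/{row['image_name']}")
cutout = image.crop((row["x_min"], row["y_min"], row["x_max"], row["y_max"]))

# Segmentation masks are stored in the "masks" folder; the unique pixel
# values enumerate the annotated labels.
mask = np.array(Image.open("masks/example_mask.png"))
print(np.unique(mask))
```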

Technical Validation

The growth experiments were conducted using multiple trays, resulting in different numbers of replicates per species; a minimum of nine replicates per species was used. Seeds were treated according to breeders’ recommendations. Some weeds germinated only when treated with gibberellic acid; the optimal treatment procedure was therefore determined in a preliminary experiment.

The quality of the bounding boxes and labels was ensured in several ways. First, bounding boxes were validated during the labeling process, both directly with the tools provided by CVAT and by an additional human inspector performing quality control. Second, using the time series information, all cut-outs of one plant individual could be plotted as an image series to visually inspect the bounding boxes. Finally, plants of species that were not sown could be assessed and classified in an additional step; the remaining instances were labeled as class “Weed”, as they were too small to be identified to species level.
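Such a visual inspection of a plant’s track can be reproduced with a few lines of matplotlib; as before, the column names in this sketch (plant_id, image_name and the bounding box coordinates) are placeholders:

```python
import matplotlib.pyplot as plt
import pandas as pd
from PIL import Image

# Plot all cut-outs of one plant individual side by side to visually
# inspect its bounding boxes over time (placeholder column names).
gt = pd.read_csv("gt.csv")
track = gt[gt["plant_id"] == gt["plant_id"].iloc[0]].sort_values("image_name")

fig, axes = plt.subplots(1, len(track), figsize=(2 * len(track), 2), squeeze=False)
for ax, (_, row) in zip(axes[0], track.iterrows()):
    image = Image.open(f"images/{row['image_name']}")
    ax.imshow(image.crop((row["x_min"], row["y_min"], row["x_max"], row["y_max"])))
    ax.axis("off")
plt.show()
```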

The germination rates of the different weed species varied strongly, which resulted in a very diverse dataset, as shown in Fig. 4.

Fig. 4

Distribution of plant samples (separate images) and plant individuals (multiple images of one plant through time) per class.

Usage Notes

The dataset can be downloaded via custom Python scripts (https://github.com/grimmlab/MFWD) and used as a resource for precision agriculture, smart farming, and computer vision-related tasks. In addition, we encourage the development of algorithms that take plant phenotyping data into account. Since the dataset consists of high-resolution images, we also provide a custom Python script to easily resize the images (a minimal stand-in is sketched below). The dataset could be a useful resource for the computer science community in general to develop novel machine learning and computer vision algorithms for automatic weed detection. Here, the data could be used in classification as well as object detection and segmentation tasks. Furthermore, the additional time series information makes our dataset suitable for multiple object tracking. Finally, the inherent inter- and intra-class variance can be used to evaluate new algorithms that address class imbalance, which is a challenge in many machine learning tasks31,32,33.
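The provided resizing script lives in the GitHub repository; a minimal equivalent using Pillow might look as follows (source and destination paths and the target size are example values, not the script’s actual defaults):

```python
from pathlib import Path
from PIL import Image

# Downscale all PNG images from one folder into another; paths and target
# size are example values; see the repository for the provided script.
src, dst, size = Path("images"), Path("images_small"), (1228, 1029)
dst.mkdir(exist_ok=True)
for png in src.glob("*.png"):
    Image.open(png).resize(size, Image.LANCZOS).save(dst / png.name)
```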

To demonstrate the usefulness of the dataset, we provide a simple baseline experiment on the image classification task. For this purpose, we use plant cut-outs from the JPEG-compressed images and rescale them to an image size of 224 × 224 pixels. Here, we focused on multi-species classification in a sorghum setting, i.e., all sorghum varieties were pooled under the label “SORVU” and all maize images were excluded. Additionally, the generic weed class was excluded from this experiment, as it mostly contained small plants that could not be reliably classified even by the human eye. We also excluded POLAV and VICVI because they contained fewer than three plant individuals and thus could not be separated into training, validation, and test sets. We strongly recommend stratifying the data by plant individuals, as the temporal resolution is high and models may overfit if the data are split randomly. The final dataset of 27 plant species contained 167,505 images and was split into training (~60%), validation (~20%), and test (~20%) sets.
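Such an individual-level split can be implemented, for example, with scikit-learn’s GroupShuffleSplit; the sketch below again assumes a placeholder plant_id column identifying each plant individual:

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Split so that all images of a plant individual land in the same set,
# preventing leakage between near-duplicate frames of the same plant.
gt = pd.read_csv("gt.csv")  # placeholder column: plant_id

outer = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_val_idx, test_idx = next(outer.split(gt, groups=gt["plant_id"]))
train_val = gt.iloc[train_val_idx]

# 0.25 of the remaining 80% yields the ~20% validation share.
inner = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, val_idx = next(inner.split(train_val, groups=train_val["plant_id"]))
train, val, test = train_val.iloc[train_idx], train_val.iloc[val_idx], gt.iloc[test_idx]
```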

We selected two different deep learning-based model architectures, ResNet-1034 and EfficientNet_b035, to evaluate the classification performance. For hyperparameter optimization, we used a grid search over five different learning rates (sampled log-uniformly from the range between 1e-4 and 1e-3). Due to the high class imbalance, we sampled batches of 512 plant cut-outs by oversampling the minority classes. The networks were initialized with weights from ImageNet. The Adam36 optimizer with a learning rate scheduler and the cross-entropy loss37 were used to train the models. We validated the models on the validation set by calculating the weighted f1-score38,39, which accounts for the class imbalance. Each model was trained for a maximum of 50 epochs, with early stopping40 used as a regularization technique to avoid overfitting. After training, EfficientNet_b0 with a learning rate of ~5.4 × 10−4 gave the best results on the validation set with an f1-score of 90.00%. The summary of the hyperparameter optimization is shown in Table 7.

Table 7 Summary of the hyperparameter optimization calculated on the validation set.
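A condensed sketch of this training setup is given below. It is a simplified illustration only (the actual training code is in the GitHub repository): the data path is a placeholder, the validation loop with the weighted f1-score and early stopping is omitted, and a folder-per-class layout is assumed for loading the cut-outs.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import datasets, models, transforms

# Assumed folder-per-class layout of 224x224 cut-outs (placeholder path).
train_set = datasets.ImageFolder(
    "train",
    transform=transforms.Compose(
        [transforms.Resize((224, 224)), transforms.ToTensor()]
    ),
)

# EfficientNet_b0 initialized with ImageNet weights; the classification
# head is replaced to predict the plant species classes.
model = models.efficientnet_b0(weights="IMAGENET1K_V1")
model.classifier[1] = nn.Linear(
    model.classifier[1].in_features, len(train_set.classes)
)

# Oversample minority classes so each batch of 512 is roughly balanced.
targets = torch.as_tensor(train_set.targets)
sample_weights = 1.0 / torch.bincount(targets).float()[targets]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(targets))
loader = DataLoader(train_set, batch_size=512, sampler=sampler)

optimizer = torch.optim.Adam(model.parameters(), lr=5.4e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
criterion = nn.CrossEntropyLoss()

for epoch in range(50):  # validation, f1-score and early stopping omitted
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    # scheduler.step(val_loss)  # step the scheduler on a validation metric
```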

Finally, the best-performing model was applied to the hold-out test set to evaluate its generalization ability on unseen data. Here, the model achieved a weighted f1-score of 90.57%, indicating good generalization performance within the MFWD dataset. The complete code for training and testing the model is publicly available in our GitHub repository.

However, deep learning models trained on this dataset may not transfer directly to out-of-context data, such as weed detection in drone imagery. Here, pre-training a model on our dataset and fine-tuning it on the target task might be a feasible strategy to scale up weed detection in agricultural landscapes. Ultimately, the main target application of our dataset is to encourage the research community to develop new computer vision algorithms on a unified dataset, thus increasing the reproducibility of results.