Multibeam forward-looking sonar (MFLS) plays an important role in underwater detection. Research on underwater object detection with MFLS faces several challenges. Firstly, there is a lack of publicly available datasets. Secondly, the sonar image, generally processed at the pixel level and transformed into a sector representation to suit human visual habits, is disadvantageous to research in artificial intelligence (AI). To address these challenges, we present a novel dataset, the underwater acoustic target detection (UATD) dataset, consisting of over 9000 MFLS images captured with a Tritech Gemini 1200ik sonar. Our dataset provides the raw data of the sonar images with annotations of 10 categories of target objects (cube, cylinder, tyre, etc.). The data was collected from a lake and shallow coastal water. To verify the practicality of UATD, we applied state-of-the-art detectors to the dataset and provide the corresponding benchmarks for accuracy and efficiency.
Background & Summary
Object detection is becoming faster and more accurate with the development of AI technology. This helps underwater robots achieve better performance in accident rescue, facility maintenance, biological investigation and other underwater applications. Onshore AI algorithms are developing rapidly on the basis of rich, high-quality datasets. In order to transfer AI achievements from land to underwater, appropriate underwater datasets are required. There are already several underwater optical datasets, such as the Brackish1 dataset, the Segmentation of Underwater IMagery2 (SUIM) dataset and the Detecting Underwater Objects3 (DUO) dataset, for research on object detection, semantic segmentation and other AI applications. Due to the scattering and attenuation of light in water, underwater optical imaging is difficult and often yields low-quality images, so acoustic sensors are widely used for perceiving the underwater environment. MFLS is portable enough for underwater robots while providing dynamic, real-time image data at high resolution, which makes it well suited to scenarios requiring close and detailed inspection.
There have been previous works on underwater object detection and related applications with MFLS in AI areas. Haoting Zhang et al. proposed MFLS image target detection models based on the You Only Look Once (YOLO) v5 network4. Zhimiao Fan et al. proposed a modified Mask Region Convolutional Neural Network (Mask RCNN) for MFLS image object detection and segmentation5. Longyu Jiang et al. proposed three simple but effective active-learning-based algorithms for MFLS image object detection6. Alan Preciado-Grijalva et al. investigated the potential of three self-supervised learning methods (RotNet, Denoising Autoencoders, and Jigsaw) to learn sonar image representations7. Divas Karimanzira et al. proposed an underwater object detection solution with MFLS based on RCNN and deployed it on an NVIDIA Jetson TX28. Neves et al. proposed a multi-object detection system using two novel convolutional neural network-based architectures that output object position and rotation from sonar images to support autonomous underwater vehicle (AUV) navigation9. Xiang Cao et al. proposed an obstacle detection and avoidance algorithm for an AUV with MFLS, using the YOLOv3 network for obstacle detection10. The datasets used in these studies have various defects, such as small sample sizes, few object categories, and virtual image data generated by style transfer. Most importantly, most of these datasets are not public.
Underwater data collection often comes with high economic, labor and time costs. Operating MFLS devices and annotating sonar images require professionals, while most researchers are inexperienced with MFLS, which is why there are few public related datasets11,12,13,14,15. Developing MFLS-based algorithms requires a large number of sonar images to verify and improve the approaches; for example, object detection methods in deep learning require data to train neural networks. Some researchers have applied sonar simulation11,12 and image translation technology13,14,15 as a remedy for the lack of data, but owing to the complexity of acoustic propagation and the instability of the underwater environment, there have always been differences between the generated images and real images. In contrast to the situation with optical images, the lack of datasets obstructs research on object detection with MFLS images in AI areas. Our proposed dataset aims to improve this situation.
To match the recognition habits of human vision, MFLS generally provides images processed with filters and pseudo-coloring, which may cause the loss of effective data. Based on acoustic propagation characteristics, the MFLS provides range and azimuth-angle information, so an image in sector representation achieves better visual perception. However, this transformation imports invalid information into the areas beyond the sector. Figure 1 shows the MFLS raw image and the processed images; they contain the same three target objects: a ball, a square cage and a human body model.
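The raw-to-sector transformation described above is a polar-to-Cartesian resampling: each (range, azimuth) sample in the raw image is mapped to a point in the fan-shaped view, and pixels outside the fan carry no data. The following is a minimal nearest-neighbour sketch, not the Gemini software's actual implementation; the 120° field of view and output size are illustrative assumptions.

```python
import numpy as np

def raw_to_sector(raw, fov_deg=120.0, out_size=512):
    """Map a raw range-azimuth sonar image (rows = range bins,
    columns = beams) to a fan-shaped Cartesian view.

    Nearest-neighbour sketch; fov_deg and out_size are illustrative
    assumptions, not values taken from the sonar specification.
    """
    n_ranges, n_beams = raw.shape
    half_fov = np.deg2rad(fov_deg) / 2.0
    # Cartesian grid: x across the fan, y away from the sonar head.
    xs = np.linspace(-np.sin(half_fov), np.sin(half_fov), out_size)
    ys = np.linspace(0.0, 1.0, out_size)
    x, y = np.meshgrid(xs, ys)
    r = np.sqrt(x ** 2 + y ** 2)   # normalised range
    theta = np.arctan2(x, y)       # azimuth from the centre beam
    # Points outside the fan stay zero (the "invalid" sector margins).
    valid = (r <= 1.0) & (np.abs(theta) <= half_fov)
    ri = np.clip((r * (n_ranges - 1)).astype(int), 0, n_ranges - 1)
    bi = np.clip(((theta + half_fov) / (2 * half_fov)
                  * (n_beams - 1)).astype(int), 0, n_beams - 1)
    sector = np.zeros((out_size, out_size), dtype=raw.dtype)
    sector[valid] = raw[ri[valid], bi[valid]]
    return sector
```

Running the transformation in the opposite direction is lossy, which is one reason the dataset distributes the raw range-azimuth data rather than the sector images.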
There have been several MFLS datasets providing processed image data. Erin McCann et al. provided a sonar dataset containing 8 fish species for fish classification and fishery assessment16. Deepak Singh et al. provided a sonar dataset containing typical household marine debris and distractor marine objects in 11 classes for semantic segmentation17. Matheus M. Dos Santos et al. published the ARACATI 2017 dataset, which provides optical aerial and acoustic underwater images for cross-view and cross-domain underwater localization18; pontoon objects and moving boats are present in its MFLS data. Figure 2 shows a comparison of these three datasets and our UATD dataset.
Our UATD dataset directly addresses the two issues above. Starting in 2020, we collected MFLS data in Maoming and Dalian, China; the environments included a lake and shallow coastal water. The dataset provides the raw data of high-resolution MFLS images with annotations of 10 categories of target objects. A corresponding benchmark of SOTA detectors on UATD, including efficiency and accuracy indicators, is provided. This dataset can promote research on underwater object detection based on MFLS. Our work has supported three consecutive editions of the China Underwater Robot Professional Contest (URPC), providing the dataset for the underwater target object detection algorithm competition; for URPC2022, see https://challenge.datacastle.cn/v3/cmptDetail.html?id=680.
Collecting MFLS data
Tritech Gemini 1200ik (website: https://www.tritech.co.uk/product/gemini-1200ik) multibeam forward-looking sonar was used for data collection. The sonar operates at two acoustic frequencies, 720 kHz for long-range target detection, and 1200 kHz for enhanced high-resolution imaging at shorter ranges. Table 1 shows the acoustic specifications of the sonar. The Gemini software development kit providing the raw data of sonar images is available for Windows and Linux operating systems.
We designed a mechanical structure for the sonar to collect data, as shown in Fig. 3a. The sonar is fixed to a box structure, which is mounted at the end of a metal rod. A connecting piece installed in the middle of the rod fixes the collection equipment to the hull and allows us to adjust the rod, controlling the depth of the sonar below the surface and its tilt angle towards the bottom during collection. The sonar data collection structure mounted on the boat is shown in Fig. 3b.
We performed the experiments in two places: Golden Pebble Beach at Dalian (39.0904292°N, 122.0071952°E) and Haoxin Lake at Maoming (21.7011602°N, 110.8641811°E). The experimental environments and satellite maps with the experimental areas marked are shown in Fig. 4. The experimental waters range from 4 to 10 meters deep at Dalian and are about 4 meters deep at Maoming.
Ten categories of target objects were selected: cube, ball, cylinder, human body model, tyre, circle cage, square cage, metal bucket, plane model and ROV, as shown in Fig. 5 with their scales. The objects were tied to a floating ball with a long rope individually so that the rough location of the objects could be distinguished according to the ball floating on the water. As a result, the objects might suspend in water or lay on the bottom.
After deploying the objects, we drove the boat mounted with sonar and cruised around the selected sites, searching the target objects and recording data by adjusting the sonar direction.
The shape of the same target may change when the sonar images it from different positions and angles, which makes it difficult for an annotator to judge the target category only by experience and intuition. Therefore, we developed an annotation tool for sonar images named the forward-looking sonar label tool (OpenSLT). Compared with other annotation tools, OpenSLT has two new features: 1) input as an image stream; 2) real-time annotation. These features allow us to overcome the problem mentioned above. OpenSLT consists of three modules: the toolbar, the image display area and the annotation display area, as shown in Fig. 6a. The tool first receives the raw sonar data as input and plays it as a video stream. The playback speed can be accelerated or slowed down until a target is found. The annotator can then press the pause button and annotate in the image display area with the mouse; the annotation is automatically generated and saved locally. This method enables the annotator to continuously track the target object while annotating, as shown in Fig. 6, avoiding situations where the target position cannot be confirmed or the target type cannot be judged when it reappears. Using the data protocol provided by Tritech, we extract the sonar working information and image information from every frame of the original sonar data, including working range, frequency, azimuth, elevation, sound speed and image resolution, and store this information in a CSV file. OpenSLT loads the CSV file and retains this information during annotation. In addition, OpenSLT records file path information in the annotation for batch processing.
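The per-frame metadata extracted above (working range, frequency, azimuth, elevation, sound speed and image resolution) is stored in a CSV file that OpenSLT loads during annotation. A minimal loader could look like the following; note that the column names here are hypothetical, since the exact CSV schema is defined by OpenSLT and should be checked against a real file.

```python
import csv

# Hypothetical column names -- the actual CSV header written by OpenSLT
# is not documented here, so treat this schema as an assumption.
FIELDS = ["frame", "range_m", "frequency_khz", "azimuth_deg",
          "elevation_deg", "sound_speed_mps", "width", "height"]

def load_frame_metadata(path):
    """Read per-frame sonar working parameters from a CSV file."""
    rows = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            rows.append({
                "frame": int(row["frame"]),
                "range_m": float(row["range_m"]),
                "frequency_khz": float(row["frequency_khz"]),
                "azimuth_deg": float(row["azimuth_deg"]),
                "elevation_deg": float(row["elevation_deg"]),
                "sound_speed_mps": float(row["sound_speed_mps"]),
                # Image resolution as (width, height) in pixels.
                "size": (int(row["width"]), int(row["height"])),
            })
    return rows
```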
The UATD dataset is openly available to the public in a figshare repository19. The dataset contains 9200 image files in BMP format with the same number of annotation files in XML format, and is divided into three ZIP archives: “UATD_Training.zip”, “UATD_Test_1.zip” and “UATD_Test_2.zip”. “UATD_Training” contains 7600 pairs of images and annotations; the two test parts contain 800 pairs each. Each part consists of two folders: the image files are in the folder named “image”, and the annotation files are in the folder named “annotation”.
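Given this layout, a quick integrity check that every image has a matching annotation (and vice versa) can be written in a few lines. This sketch assumes the folder names described above and matches files by their common filename stem.

```python
from pathlib import Path

def check_pairs(part_dir):
    """Return (paired, missing_xml, missing_bmp) filename-stem sets for
    one part of the dataset (e.g. the unpacked UATD_Training folder).

    Folder names "image" and "annotation" follow the dataset layout
    described in the text.
    """
    part = Path(part_dir)
    images = {p.stem for p in (part / "image").glob("*.bmp")}
    annos = {p.stem for p in (part / "annotation").glob("*.xml")}
    return images & annos, images - annos, annos - images
```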
The class distribution of the objects is shown in Fig. 7a, and the statistics of the collection ranges and frequencies are shown in Fig. 7b. A total of 2900 images were collected at 720 kHz and 6300 at 1200 kHz. The sonar working range varied from 5 meters to 25 meters during data collection, so we count the number of images in 1-meter range bins; the images are also counted separately for the 720 kHz and 1200 kHz working frequencies, as shown in Fig. 7b.
A pair of UATD data is shown in Fig. 8 as an example. The echo intensity data is stored in the first channel of the BMP image file; the other two channels duplicate the first. The annotation file can be divided into four sections. The “sonar” section provides the basic sonar working information at the moment the corresponding image was collected. In the example, the range is 14.9941 m, the azimuth is 120°, the elevation is 12°, the sound speed is 1582.4 m/s and the frequency is 1200 kHz; all of these parameters are parsed directly from the sonar output data stream. The “file” section provides the relative paths of the image file and annotation file, and the “filename” parameter gives the common filename prefix of an image-annotation pair. The “size” section provides the image information; in this example, the image resolution is 1024 × 1428 with 3 channels. The “object” section provides the category name under the “name” tag and the bounding box of the object, in pixels, under the “bndbox” tag.
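The annotation structure described above can be read with standard-library code alone. The sketch below assumes Pascal VOC-style xmin/ymin/xmax/ymax child tags inside “bndbox”; verify the exact tag names against a real UATD annotation file before relying on it.

```python
import xml.etree.ElementTree as ET

def parse_uatd_annotation(xml_path):
    """Parse one UATD annotation file into a plain dict.

    Assumption: the "bndbox" children follow the Pascal VOC convention
    (xmin/ymin/xmax/ymax) -- check against a real file.
    """
    root = ET.parse(xml_path).getroot()
    objects = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            "bbox": [int(float(bb.findtext(t)))
                     for t in ("xmin", "ymin", "xmax", "ymax")],
        })
    # Collect the sonar working parameters (range, azimuth, ...) as text.
    sonar = root.find("sonar")
    meta = {c.tag: c.text for c in sonar} if sonar is not None else {}
    return {"sonar": meta, "objects": objects}
```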
The appearance of the same underwater target object generally differs across MFLS imaging angles, which poses a great challenge to the subsequent labeling work. To address this, we adopted three measures to ensure the accuracy of the sonar image annotations. Firstly, three members of our team were responsible for annotation; after the collected data was randomly assigned, each completed the labeling work individually, and cross-checking with the data playback function in OpenSLT was then performed to reduce manual annotation errors. Secondly, we recorded the video stream of the processed (human-vision-oriented) images from the Gemini 1200ik synchronously with the raw data during collection, and OpenSLT played the raw data back as a stream in correspondence with the recorded video; it was therefore convenient to detect and track an object by comparing against the video while annotating the raw data, avoiding labeling errors caused by losing the target. Finally, we handed the annotated data over to a professional data management company cooperating with us, whose staff checked the data again to ensure correctness.
Object detection benchmarks
A benchmark based on our dataset is given in Table 2. MMDetection20 is currently one of the best open-source object detection toolboxes based on PyTorch; it provides a variety of SOTA detectors and is simple to use. The benchmarks were therefore generated with MMDetection (v2.25.0). We chose Faster R-CNN21 and YOLOv322 with various backbones as our object detectors; they are the most popular two-stage and one-stage SOTA detectors, respectively.
The evaluation considers both accuracy and efficiency. We first adopted the evaluation metrics mean average precision (mAP) and mean average recall23 (mAR) to measure the accuracy of the detectors on UATD. Efficiency was then tested on a local computer, with the FPS, Params and FLOPs indicators reported. Details are as follows:
mAP - The mean AP over all categories (10 in UATD) at an intersection-over-union (IoU) threshold of 0.5.
mAR - Corresponds to the mean recall rate on total categories (10 in UATD).
APname - AP of class (name belongs to the classes in UATD).
Params - The parameter size of models.
FLOPs - The number of floating-point operations with an input image size of 512 × 512.
FPS - Frames per second.
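The IoU threshold underlying the mAP metric is computed per box pair: a detection counts as a true positive when its IoU with a ground-truth box of the same class is at least 0.5. A reference implementation for axis-aligned [xmin, ymin, xmax, ymax] boxes, matching the bounding-box format of the annotations:

```python
def iou(box_a, box_b):
    """Intersection over union of two [xmin, ymin, xmax, ymax] boxes."""
    # Coordinates of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```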
The detectors were trained on a local machine with an NVIDIA GeForce GTX 1080 GPU. The height of the input sonar image is resized to 512 while the width is scaled to preserve the original aspect ratio, in both training and inference. Parameters pretrained on ImageNet24 were used to initialize the backbones. During training, the initial learning rate was set to 0.0005 and decayed by a factor of 0.1 at the 8th and 11th epochs (12 epochs in total). A warm-up strategy was adopted, with the learning rate increasing linearly from a warm-up ratio of 0.0001 over the first 500 iterations. In addition, the Adam optimizer was used.
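The learning-rate schedule above can be sketched as a plain function. This assumes MMDetection's warm-up convention (the rate ramps linearly from base_lr × warmup_ratio up to base_lr) and counts epochs from zero; adjust the step epochs if your framework counts from one.

```python
def learning_rate(epoch, it, base_lr=0.0005,
                  warmup_iters=500, warmup_ratio=0.0001):
    """Learning rate for the benchmark runs: linear warm-up over the
    first 500 iterations, then step decay by 0.1 at epochs 8 and 11.

    `it` is the global iteration index; epochs are counted from 0 here
    (an assumption -- adapt to your framework's convention).
    """
    if it < warmup_iters:
        # Linear ramp from base_lr * warmup_ratio to base_lr.
        k = it / warmup_iters
        return base_lr * (warmup_ratio + (1 - warmup_ratio) * k)
    decay_steps = sum(epoch >= e for e in (8, 11))
    return base_lr * (0.1 ** decay_steps)
```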
According to the benchmark, Faster R-CNN with a ResNet-18 backbone achieves the best mAP of 83.9% and the best mAR of 89.7%. On the other hand, YOLOv3 with a MobileNetV2 backbone performs well in efficiency, with only 3.68 M Params and 4.22 G FLOPs, as well as the fastest inference speed of 93.4 FPS on the local machine.
The UATD dataset is published in a figshare repository19. The annotation tool OpenSLT is published alongside the dataset, archived as “UATD_OpenSLT.zip”. OpenSLT is developed with Qt 5.9 and worked well on Ubuntu 18.04/20.04 during our annotation work. In addition, we provide an example (a small dataset with several sonar image files and corresponding CSV files) for users to test the tool. The README.md file distributed with the tool serves as the user guide.
Pedersen, M., Haurum, J. B., Gade, R. & Moeslund, T. B. Detection of marine animals in a new underwater dataset with varying visibility. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 18–26, https://openaccess.thecvf.com/content_CVPRW_2019/html/AAMVEM/Pedersen_Detection_of_Marine_Animals_in_a_New_Underwater_Dataset_with_CVPRW_2019_paper.html (2019).
Islam, M. J. et al. Semantic segmentation of underwater imagery: Dataset and benchmark. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 1769–1776, https://doi.org/10.1109/IROS45743.2020.9340821 (2020).
Liu, C. et al. A dataset and benchmark of underwater object detection for robot picking. In 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 1–6, https://doi.org/10.1109/ICMEW53276.2021.9455997 (2021).
Zhang, H., Tian, M., Shao, G., Cheng, J. & Liu, J. Target detection of forward-looking sonar image based on improved yolov5. IEEE Access 10, 18023–18034, https://doi.org/10.1109/ACCESS.2022.3150339 (2022).
Fan, Z., Xia, W., Liu, X. & Li, H. Detection and segmentation of underwater objects from forward-looking sonar based on a modified mask rcnn. Signal, Image and Video Processing 15, 1135–1143, https://doi.org/10.1007/s11760-020-01841-x (2021).
Jiang, L., Cai, T., Ma, Q., Xu, F. & Wang, S. Active object detection in sonar images. IEEE Access 8, 102540–102553, https://doi.org/10.1109/ACCESS.2020.2999341 (2020).
Preciado-Grijalva, A., Wehbe, B., Firvida, M. B. & Valdenegro-Toro, M. Self-supervised learning for sonar image classification. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 1498–1507, https://doi.org/10.1109/CVPRW56347.2022.00156 (2022).
Karimanzira, D., Renkewitz, H., Shea, D. & Albiez, J. Object detection in sonar images. Electronics 9, 1180, https://www.mdpi.com/2079-9292/9/7/1180 (2020).
Neves, G., Ruiz, M., Fontinele, J. & Oliveira, L. Rotated object detection with forward-looking sonar in underwater applications. Expert Systems with Applications 140, 112870, https://doi.org/10.1016/j.eswa.2019.112870 (2020).
Cao, X., Ren, L. & Sun, C. Research on obstacle detection and avoidance of autonomous underwater vehicle based on forward-looking sonar. IEEE Transactions on Neural Networks and Learning Systems https://doi.org/10.1109/TNNLS.2022.3156907 (2022).
Choi, W.-S. et al. Physics-based modelling and simulation of multibeam echosounder perception for autonomous underwater manipulation. Frontiers in Robotics and AI 8, https://doi.org/10.3389/frobt.2021.706646 (2021).
Cerqueira, R., Trocoli, T., Albiez, J. & Oliveira, L. A rasterized ray-tracer pipeline for real-time, multi-device sonar simulation. Graphical Models 111, 101086, https://doi.org/10.1016/j.gmod.2020.101086 (2020).
Sung, M., Kim, J., Kim, J. & Yu, S.-C. Realistic sonar image simulation using generative adversarial network. IFAC-PapersOnLine 52, 291–296, https://doi.org/10.1016/j.ifacol.2019.12.322 (2019).
Sung, M. et al. Realistic sonar image simulation using deep learning for underwater object detection. International Journal of Control, Automation and Systems 18, 523–534, https://doi.org/10.1007/s12555-019-0691-3 (2020).
Liu, D. et al. Cyclegan-based realistic image dataset generation for forward-looking sonar. Advanced Robotics 35, 242–254, https://doi.org/10.1080/01691864.2021.1873845 (2021).
McCann, E., Li, L., Pangle, K., Johnson, N. & Eickholt, J. An underwater observation dataset for fish classification and fishery assessment. Scientific data 5, 1–8, https://doi.org/10.1038/sdata.2018.190 (2018).
Singh, D. & Valdenegro-Toro, M. The marine debris dataset for forward-looking sonar semantic segmentation. In 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 3734–3742, https://doi.org/10.1109/ICCVW54120.2021.00417 (2021).
Dos Santos, M. M., De Giacomo, G. G., Drews-Jr, P. L. & Botelho, S. S. Cross-view and cross-domain underwater localization based on optical aerial and acoustic underwater images. IEEE Robotics and Automation Letters 7, 4969–4974, https://doi.org/10.1109/LRA.2022.3154482 (2022).
Yang, J. & Xie, K. Underwater acoustic target detection (UATD) dataset. Figshare https://doi.org/10.6084/m9.figshare.21331143.v3 (2022).
Chen, K. et al. Mmdetection: Open mmlab detection toolbox and benchmark. Preprint at https://doi.org/10.48550/arXiv.1906.07155 (2019).
Ren, S., He, K., Girshick, R. & Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 1137–1149, https://doi.org/10.1109/TPAMI.2016.2577031 (2017).
Redmon, J. & Farhadi, A. Yolov3: An incremental improvement. Preprint at https://doi.org/10.48550/arXiv.1804.02767 (2018).
Everingham, M., Van Gool, L., Williams, C. K., Winn, J. & Zisserman, A. The pascal visual object classes (voc) challenge. International journal of computer vision 88, 303–338, https://doi.org/10.1007/s11263-009-0275-4 (2010).
Deng, J. et al. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, 248–255, https://doi.org/10.1109/CVPR.2009.5206848 (2009).
This work was supported by the National Natural Science Foundation of China (Grant No. 62027826). We thank the Dalian Key Laboratory of Underwater Robot of Dalian University of Technology for their support during the data collection. We thank the OpenI Community (https://openi.pcl.ac.cn) for its support during data processing.
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
Xie, K., Yang, J. & Qiu, K. A Dataset with Multibeam Forward-Looking Sonar for Underwater Object Detection. Sci Data 9, 739 (2022). https://doi.org/10.1038/s41597-022-01854-w