Unsupervised underwater shipwreck detection in side-scan sonar images based on domain-adaptive techniques

Underwater object detection based on side-scan sonar (SSS) suffers from a lack of finely annotated data. This study aims to avoid the laborious task of annotation by achieving unsupervised underwater object detection through domain-adaptive object detection (DAOD). In DAOD, there is a conflict between feature transferability and discriminability, which suppresses detection performance. To address this challenge, a domain collaborative bridging detector (DCBD), comprising an intra-domain consistency constraint (IDCC) and domain collaborative bridging (DCB), is proposed. On the one hand, the static domain labels used in previous adversarial-based methods hinder the domain discriminator from discerning subtle intra-domain discrepancies, thus decreasing feature transferability. IDCC addresses this by introducing contrastive learning to refine intra-domain similarity. On the other hand, DAOD encourages the feature extractor to extract domain-invariant features, overlooking potential discriminative signals embedded within domain attributes. DCB addresses this by complementing domain-invariant features with domain-relevant information, thereby bolstering feature discriminability. The feasibility of DCBD is validated using unlabeled underwater shipwrecks as a case study. Experiments show that our method achieves accuracy comparable to fully supervised methods in unsupervised SSS detection (92.16% AP50 and 98.50% recall), and achieves 52.6% AP50 on the widely used benchmark dataset Foggy Cityscapes, exceeding the previous state of the art by 4.5%.

Experiments show that method S1c can be integrated effectively into the current DAOD framework to achieve superior performance. Algorithm 1 provides a detailed description of the training procedure, which is termed DCB. Additional ablation experiments are presented in Table S1.

DCBD for unsupervised SSS detection
Due to the inherent differences in imaging mechanisms and the contrasting environments of the instances captured, a substantial domain gap exists between real images and SSS images. This gap poses significant limitations on the unsupervised training of SSS images. Table S2 details the ablation studies and intermediary processes in the training of DCBD for the optical to SSS task, emphasizing our approach's efficacy. Our experiments reveal that style-transfer and semi-supervised methods are indispensable components. By combining these cross-domain techniques, our foundational framework pioneers unsupervised SSS image detection (89.20% AP50). The integration of IDCC and DCB further refines the model, enhancing its accuracy and recall and contributing an additional performance gain of 2.96% AP50.
Moreover, we compare DCBD with the Weighted Box Fusion (WBF) [1] method. WBF is a prevalent model prediction fusion technique that integrates outputs from multiple models during post-processing for more robust predictions. While WBF demonstrates marginally higher accuracy than DCBD in the first stage, it requires loading multiple models during inference, thereby increasing the computational burden during model deployment. In contrast, DCBD efficiently leverages the identical structures of DRD and DID along with their domain information discrepancies. Through iterative refinement, DCBD continues to enhance model accuracy in subsequent stages, outperforming WBF by 3.01% mAP and 0.85% AP50.
Table S3. Comparison results of different generative algorithms during the burn-in phase.
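To make the WBF baseline discussed above concrete, the following is a minimal single-class sketch of the general idea: detections from several models are clustered by IoU, each cluster is merged into one box by confidence-weighted averaging, and the fused score is scaled down when fewer models agree. The function names, the IoU threshold of 0.55, and the input layout are illustrative assumptions, not the configuration used in our experiments.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Det = Tuple[Box, float]                  # (box, confidence); confidence assumed > 0


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse(cluster: List[Det]) -> Det:
    """Confidence-weighted average of the boxes, mean of the confidences."""
    total = sum(score for _, score in cluster)
    box = tuple(sum(b[i] * s for b, s in cluster) / total for i in range(4))
    return box, total / len(cluster)


def weighted_box_fusion(per_model: List[List[Det]], iou_thr: float = 0.55) -> List[Det]:
    """Simplified single-class WBF over the detections of several models."""
    n_models = len(per_model)
    # Process all detections from highest to lowest confidence.
    dets = sorted((d for model in per_model for d in model),
                  key=lambda d: d[1], reverse=True)
    clusters: List[List[Det]] = []
    fused: List[Det] = []
    for box, score in dets:
        best, best_iou = -1, iou_thr
        for i, (fused_box, _) in enumerate(fused):
            overlap = iou(box, fused_box)
            if overlap > best_iou:
                best, best_iou = i, overlap
        if best < 0:
            # No sufficiently overlapping cluster: start a new one.
            clusters.append([(box, score)])
            fused.append((box, score))
        else:
            clusters[best].append((box, score))
            fused[best] = fuse(clusters[best])
    # Penalize clusters that were supported by fewer models.
    return [(b, s * min(len(c), n_models) / n_models)
            for (b, s), c in zip(fused, clusters)]
```

Note that every contributing model must be run at inference time before this fusion step can be applied, which is the deployment cost mentioned above.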
The combined simulated data from SDM and CycleGAN yields the best results, significantly improving the model performance during the burn-in phase. Based on this data simulation strategy, the accuracy of DCBD during training is presented in Table S4. We observed that using a better generative model can significantly improve the performance of the DRD and DID branches. However, the overall performance improvement of the DAD model was not significant, with only a 0.56% mAP and 0.01% AP50 improvement. On the one hand, this phenomenon validates the effectiveness of using SDM for shipwreck simulation, indicating that employing better data simulators can enhance the effectiveness of domain-adaptive object detectors in unsupervised underwater shipwreck detection tasks. On the other hand, the minor improvement in DAD performance indirectly reflects the superiority of our algorithm. In the absence of data simulation, the performance of domain-adaptive object detection algorithms is only influenced by three factors: the quality of optical images, the quality of sonar images, and the effectiveness of the algorithm. Assuming the use of an optimal domain-adaptive object detection algorithm, the accuracy of the detector is only affected by the quality of the images. Our proposed algorithm, which balances feature transferability and discriminability, learns domain-relevant and domain-invariant features from images captured by different branches, maximally utilizing all knowledge from optical and SSS images and achieving optimal results. Therefore, using burn-in models with higher performance only improves the stability of model training and fails to significantly enhance the model's performance further.

Analysis of failure cases
To identify areas for improvement, we analyzed failure cases of DCBD on underwater shipwreck detection. Several typical failure cases are illustrated in Figure S5. Error cases (a), (b), and (c) lower the evaluation accuracy of the model but do not affect the performance of the detector in practical detection tasks. Cases (a) and (b) are caused by the fragmentation of shipwrecks, which makes it difficult even for annotators to decide how to annotate fragmented shipwrecks accurately. In case (c), the false shadows in SSS images are an integral part of the target being detected; this failure case demonstrates the potential of domain-adaptive object detection to bridge the domain differences between optical and sonar images while remaining unrestricted. In such cases, using accuracy to evaluate detector performance is not entirely reliable, which highlights the need for researchers to propose more effective performance evaluation methods.
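As a hedged illustration of why such cases lower AP50 without reflecting a practical failure, recall that AP50 counts a prediction as correct only if its IoU with a ground-truth box is at least 0.5. The sketch below uses made-up box coordinates (not taken from our data) in which the detector outlines a whole fragmented wreck while the annotation covers a single fragment.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive_at_50(pred: Box, gt: Box) -> bool:
    """AP50 matching rule: a prediction counts only if IoU >= 0.5."""
    return iou(pred, gt) >= 0.5


# Hypothetical example: the prediction spans the whole wreck,
# while the annotation covers only one fragment of it.
prediction = (0.0, 0.0, 100.0, 40.0)
annotation = (0.0, 0.0, 40.0, 40.0)
overlap = iou(prediction, annotation)
matched = is_true_positive_at_50(prediction, annotation)
```

Here the prediction fully contains the annotated fragment, yet its IoU is below 0.5, so it is scored as a false positive even though the wreck was found.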
Case (d) illustrates the limitations of our algorithm, where the detector misclassifies background regions as instances of shipwrecks. This deficiency may stem from two main reasons:
(1) Unsupervised training strategies: During training, the model lacks accurate SSS image labels to correct its predictions. Learning from erroneous pseudo-labels makes the detector more confident in its incorrect predictions, and this cumulative effect exacerbates detector errors.
(2) Insufficient dataset quality: The availability of bird's-eye view data in optical images is limited, which constrains both the quantity and quality of optical images. This limitation in training data directly impacts the performance ceiling of the detector.
Table S2. Analysis of each component and training process of DCBD in the Optical → SSS task. 'Semi-sup' refers to the semi-supervised method. The terms 'Domain invariant' and 'Domain relevant' correspond to the 'DRD' and 'DID' models, respectively.

Different generative methods during the burn-in phase
Considering that using different generative methods during the burn-in phase may affect the performance of domain-adaptive object detection, we analyzed the performance obtained with different generative methods. We simulated underwater SSS data using CycleGAN [2], StyleGANv3 [3], and the Stable Diffusion Model (SDM) [4], respectively, and the results are shown in Figures S3, S2, and S4.

Figure S5. Four categories of failure cases.

Table S1. Comparison results of different domain collaboration algorithms on Cityscapes → Foggy Cityscapes.
Algorithm 1: Domain Collaborative Bridging
Input: training dataset (X_s, Y_s, X_t); batch dataset (x_s, y_s, x_t); teacher models M_DRD, M_DID; EMA momentum α; alternating round R; max_iterations.
for k ← 1 to max_iterations do
    Get source data x_s, y_s and target data x_t;
    Get fake-target images x_ft, y_ft;
    Optimize M_DRD, M_DID;
    Update EMA model:
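The EMA update rule itself is not reproduced in Algorithm 1 as shown here. As an illustration only, the sketch below gives the conventional mean-teacher form, teacher ← α · teacher + (1 − α) · student, on a plain dictionary of scalar parameters. Whether DCB uses exactly this rule, and how the alternating round R assigns the teacher and student roles to M_DRD and M_DID, are assumptions; `ema_update` is a hypothetical helper name.

```python
from typing import Dict


def ema_update(teacher: Dict[str, float],
               student: Dict[str, float],
               alpha: float) -> Dict[str, float]:
    """Conventional exponential moving average of model parameters.

    Assumed form (not confirmed by Algorithm 1):
        teacher <- alpha * teacher + (1 - alpha) * student
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name]
            for name in teacher}


# Toy parameters standing in for real network weights.
teacher_params = {"w": 1.0, "b": 0.0}
student_params = {"w": 0.0, "b": 1.0}
updated = ema_update(teacher_params, student_params, alpha=0.9)
```

With a momentum close to 1, the teacher changes slowly, which is the usual motivation for using EMA teachers to stabilize pseudo-label generation.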

Table S4. Comparison results of different generative algorithms in the underwater SSS shipwreck detection task.