Introduction

Synthetic aperture radar (SAR), an all-weather imaging radar, plays a crucial role in fields such as remote sensing, reconnaissance, and space surveillance. SAR receives echo signals, which are then processed by imaging algorithms1. The purity of the echo signals significantly influences the imaging results. However, during signal collection, SAR may inadvertently capture spurious signals in adjacent frequency bands, collectively referred to as radio frequency interference (RFI)2. RFI can mask the scattering effects of weak signals, impeding the analysis of SAR imaging results. Consequently, suppressing RFI in SAR signals has become a pivotal research direction.

Currently, scholars have proposed various algorithms for the suppression of RFI3. These algorithms encompass traditional parameterized algorithms4,5,6,7, non-parameterized algorithms8,9,10, and semi-parameterized algorithms11,12. With the accumulation of data, the application of deep learning-related technologies for RFI suppression has witnessed rapid development13,14,15. By leveraging various neural network architectures, requirements for RFI suppression can be effectively addressed across multiple domains including the time domain, time-frequency domain, and image domain16,17,18.

The aforementioned RFI suppression algorithms process a signal after determining whether it is affected by RFI. However, they do not consider how RFI processing affects data points that do not contain RFI in the various domains. Chojka et al.19 employed a convolutional network as a classifier to preliminarily differentiate images with and without RFI in the image domain. Artiemjew et al.20 used a LeNet-type convolutional neural network to identify various levels of RFI damage in SAR images. Nevertheless, these two algorithms can only indicate the presence of RFI in images and do not precisely mark RFI regions. Tao et al.21 proposed a time-series-based RFI extraction model using Sentinel-1 data in the same area, involving matrix decomposition and hyperparameter settings that incur significant computational complexity. Li et al.22 constructed RFI-free backgrounds for Sentinel-1 ground range detected (GRD) products and then analyzed the differences between the constructed background and the images to identify GRD products containing RFI. While this algorithm is concise and can rapidly screen many GRD products, its robustness is relatively low. Although image-domain RFI processing algorithms are convenient and intuitive, their accuracy falls short of that of algorithms applied in the signal and time-frequency domains.

Fan et al.16 treated RFI detection as a binary classification problem in the time-frequency domain, utilizing VGG-16 as the primary network to screen time-frequency spectrograms containing RFI. This algorithm filters spectrograms that include RFI but does not localize the RFI precisely, which limits its usefulness for subsequent RFI suppression. Li et al.23 introduced a time-domain detector for detecting and locating impulsive RFI based on its high energy and short duration. However, this algorithm performs poorly in detecting weak RFI. Subsequently, they proposed a time-frequency detector combining eigenvalue decomposition (EVD) and short-time Fourier transform (STFT)24, which, compared with their time-domain detector, improves the detection and localization of weak RFI but involves complex spectral analysis. Ding et al.25 conducted statistical analysis on SAR signals after STFT, treating them as Rayleigh-distributed. They employed a constant false alarm rate (CFAR) algorithm to detect RFI, aiming to optimize the final suppression effect. However, this threshold-based approach faces challenges in balancing false-negative and false-positive probabilities.

Most studies treat the detection and suppression of RFI as relatively independent issues. This paper proposes LDNet, a lightweight deep learning-based algorithm that precisely detects RFI in the time-frequency domain. This algorithm can be combined with time-frequency domain RFI suppression algorithms to enhance the suppression effect without altering the underlying suppression algorithm. The specific research contributions are summarized as follows:

  (1) The algorithm can effectively distinguish whether RFI exists in time-frequency spectrograms. It accurately labels the RFI positions for spectrograms containing RFI and generates the corresponding binary matrices.

  (2) To mitigate the impact of the RFI detection algorithm on the overall operational speed of the suppression system, we introduce lightweight modules in the neural network design. Further reduction in the size of the network model is achieved through pruning operations while maintaining accuracy.

  (3) Experimental results demonstrate that the algorithm achieves excellent RFI detection performance while remaining lightweight.

The remainder of this paper is structured as follows: “Methods” provides a detailed introduction to the LDNet model. “Results” evaluates the detection performance of the network against RFI and how lightweight the model is. “Discussion” discusses the detection and segmentation performance of LDNet, along with its limitations. Finally, “Conclusion” summarizes the content of this paper.

Methods

Fig. 1

Overall workflow of the RFI suppression system. RFI detection: the time-frequency spectrograms of SAR echoes fall into two scenarios, “Without RFI” and “With RFI”, both of which are fed into LDNet. LDNet employs a lightweight design, encompassing local information extraction and global information extraction processes. LDNet outputs the regions of the time-frequency spectrogram containing data points contaminated by RFI. RFI suppression: based on the results of RFI detection, the time-frequency spectrograms identified as affected by RFI are processed by the RFI suppression algorithm. The suppression algorithm also alters data points that are not contaminated by RFI. Therefore, in the final output spectrogram, the data values inside the detected RFI region matrix are taken from the output of the RFI suppression algorithm, while the original time-frequency spectrogram data are retained in non-RFI regions.

The detection of RFI plays a pivotal and foundational role in implementing RFI suppression. Current RFI detection algorithms primarily screen for the presence or absence of RFI without considering how detection can be integrated with RFI suppression algorithms. In light of these shortcomings, this study introduces an RFI detection algorithm, termed LDNet, based on deep neural networks. Firstly, the algorithm accomplishes fundamental RFI filtering by effectively screening the signal in the time-frequency domain. Secondly, the algorithm accurately labels RFI regions and generates corresponding binary matrices for time-frequency spectrograms containing RFI. Using the generated binary matrices, the regions with RFI take their values from the time-frequency spectrogram produced by the RFI suppression algorithm, while the regions without RFI retain the original time-frequency spectrogram data. This strategy enhances the suppression effect without altering the RFI suppression algorithm. Finally, to mitigate the impact of the RFI detection step on the overall operational speed of the RFI suppression process, we employ depthwise separable convolutions and a “cheap” attention mechanism to construct the network structure. This design allows for obtaining local and global information simultaneously while reducing the computational complexity of the network. Additionally, network pruning is applied to remove redundant parameters, further reducing the computational complexity of the network. The complete processing flow is shown in Fig. 1.
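The mask-guided merging step described above can be sketched in a few lines. This is only an illustration under assumptions: the function name `merge_with_mask` is hypothetical, the spectrograms are plain nested lists, and a mask value of 1 is assumed to mark a point flagged as RFI.

```python
def merge_with_mask(original, suppressed, rfi_mask):
    """Take suppressed values where the mask flags RFI, original values elsewhere."""
    return [
        [s if m else o for o, s, m in zip(o_row, s_row, m_row)]
        for o_row, s_row, m_row in zip(original, suppressed, rfi_mask)
    ]


# Toy 2x2 spectrogram magnitudes (illustrative values only).
original = [[1.0, 9.0], [2.0, 3.0]]
suppressed = [[0.8, 1.5], [1.7, 2.6]]   # output of any suppression algorithm
rfi_mask = [[0, 1], [0, 0]]             # assumed convention: 1 = RFI detected

merged = merge_with_mask(original, suppressed, rfi_mask)
print(merged)  # [[1.0, 1.5], [2.0, 3.0]]
```

Only the flagged point is replaced, so the changes that the suppression algorithm makes to clean points are discarded.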

Network architecture

The objective of LDNet is to extract regions affected by RFI from input time-frequency spectrograms. The architecture of this network is illustrated in Fig. 2, with its input being the time-frequency spectrogram of SAR signals. The output consists of a binary matrix indicating whether RFI influences data points.

Fig. 2

Schematic diagram of LDNet structure. LDNet consists of three primary components: Stem, IEM, and Head. Time-frequency spectrogram serves as the input to LDNet, generating the RFI region matrix as the output.

The input of LDNet is defined as \(\mathcal {S}\in \mathbb {R}^{3\times H\times W}\). The Stem module expands the input in both the channel and spatial dimensions to capture edge information of RFI regions in the time-frequency spectrogram, so that crucial information is not lost during feature extraction. Subsequently, the channel dimension is reduced, integrating the extracted channel information, while the spatial dimension is restored to its original size to reduce data volume. The output of the Stem module is denoted as \(\mathcal {S}_1\in \mathbb {R}^{8\times H\times W}\). The feature mapping of the Stem module can be represented as:

$$\begin{aligned} {{\mathcal {S}}_1} = Conv_{1 \times 1}\left( {{f_R}\left( {Con{v_{3 \times 3} }\left( {{f_R}\left( {Con{v_\downarrow }\left( {{f_R}\left( {Con{v_ \uparrow }\left( {\mathcal {S}} \right) } \right) } \right) } \right) } \right) } \right) } \right) . \end{aligned}$$
(1)

where \(Conv_\uparrow\) represents the convolutional and batch normalization operations that expand spatial dimensions, and \(Conv_\downarrow\) represents the convolutional and batch normalization operations that reduce spatial dimensions. \(Conv_{3 \times 3}\) represents a convolutional operation of size \(3 \times 3\) without adjusting spatial dimensions, and \(Conv_{1 \times 1}\) represents a convolutional operation of size \(1 \times 1\). \({f_R}\) represents the ReLU6 activation function. ReLU6 is a special type of ReLU function, calculated as follows:

$$\begin{aligned} ReLU6\left( x \right) = \min \left( {6,\max \left( {0,x} \right) } \right) . \end{aligned}$$
(2)
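As a quick illustration of Eq. (2), the following scalar sketch clamps its input to the interval [0, 6]. The function name `relu6` is chosen here for illustration; deep learning frameworks provide their own tensor versions.

```python
def relu6(x: float) -> float:
    """ReLU6 as in Eq. (2): min(6, max(0, x))."""
    return min(6.0, max(0.0, x))


# Negative inputs map to 0, inputs above 6 saturate at 6.
samples = [-2.0, 0.0, 3.5, 6.0, 10.0]
print([relu6(v) for v in samples])  # [0.0, 0.0, 3.5, 6.0, 6.0]
```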

Following this, the network integrates local and global information through multiple information extraction modules (IEM), generating the feature map \(\mathcal {S}_2\in \mathbb {R}^{C\times H\times W}\). The structure of the IEM will be detailed in the next subsection.

Finally, point-wise convolution is directly applied in the Head module to compress the channel information, reducing the channel dimension of \(\mathcal {S}_2\) to 2. This operation classifies every point of the feature map into one of two classes (presence or absence of RFI), and the result is denoted as \(\mathcal {S}_3\in \mathbb {R}^{2\times H\times W}\). Its feature mapping can be represented as:

$$\begin{aligned} {{\mathcal {S}}_3} = Con{v_{1 \times 1}}\left( {{{\mathcal {S}}_2}} \right) . \end{aligned}$$
(3)

In post-processing, the values in different channels of \(\mathcal {S}_3\) are interpreted as probabilities indicating the presence or absence of RFI, where 0 represents the presence of RFI and 1 represents the absence of RFI, ultimately generating a binary matrix.

Information extraction module (IEM)

The input feature map is bifurcated into two branches in the information extraction module (IEM) to compute new feature maps. The mapping process of the IEM is illustrated in Fig. 3. For ease of analysis, the input feature map to the n-th IEM is defined as \({\mathcal {S}}_{2n}\), and the output as \({\mathcal {S}}_{2n}^{\prime }\).

Fig. 3

Schematic diagram of the IEM structure. Local information extraction: first, a \(1 \times 1\) convolution is applied along the channel dimension, followed by a \(3 \times 3\) convolution operation to capture spatial features, and finally, another \(1 \times 1\) convolution is performed along the channel dimension. Global information extraction: initially, a \(1 \times 1\) convolution is conducted along the channel dimension, followed by convolution operations in the horizontal direction to capture horizontal correlations, and finally, convolutions are performed in the vertical direction to obtain vertical correlations.

For local information extraction, we employ a structure akin to inverted residuals. Initially, point-wise convolution is conducted in the channel-depth direction to amalgamate information across channels at the same spatial position, generating a new feature map. At the same time, we widen the channel dimension to represent more detailed information. Subsequently, depth-wise convolution is applied in the spatial dimension to each input channel, reducing network parameters while still learning both channel and spatial information. Finally, point-wise convolution is again utilized to compress the channel number of the feature map that has acquired spatial information, reducing the volume of data transferred. The mapping process for local information extraction can be represented as:

$$\begin{aligned} {\mathcal {S}}_{2n}^l = Conv_{1 \times 1}\left( {{f_R}\left( {Conv_{3 \times 3}\left( {{f_R}\left( {Conv_{1 \times 1}\left( {{{\mathcal {S}}_{2n}}} \right) } \right) } \right) } \right) } \right) . \end{aligned}$$
(4)

Regarding the use of the ReLU non-linear activation function, we apply it in high-channel dimensions based on the “manifold of interest” theory26.

In the absence of RFI, the time-frequency spectrogram of SAR signals is of low rank. Even in the presence of RFI, the time-frequency spectrogram can be decomposed into a low-rank matrix and a sparse matrix. Hence, there is no need to densely compute relationships between all pairs of data points in the feature map. This aligns with the assumption of the decoupled fully connected attention mechanism (DFC)27, which assumes a low-rank condition for images. Therefore, for global information extraction, we employ DFC to compute cheap global attention. First, ordinary convolution is used to integrate the spatial and channel information of the feature map; information from different positions is then aggregated along the horizontal and vertical directions to compute the attention value for each point. Because the attention value of one data point is computed from the points in its row and column, which have in turn aggregated information along their own rows and columns, this design can encompass all data points, extract global information, and reduce network parameters. The mapping process for global information extraction can be represented as:

$$\begin{aligned} {\mathcal {S}}_{2n}^g = Conv_{k \times 1}\left( {Conv_{1 \times k}\left( {Conv_{3 \times 3}\left( {{{\mathcal {S}}_{2n}}} \right) } \right) } \right) . \end{aligned}$$
(5)

where \(Conv_{1 \times k}\) and \(Conv_{k \times 1}\) represent convolution operations with \(1 \times k\) kernels in the horizontal and vertical directions, respectively. Since there is no significant increase in the channel dimension during the process of extracting global information, no activation function is applied. In the backend of the IEM, local and global information extraction results are fused through element-wise summation, completing the feature mapping process for this module. The final output feature mapping process for the IEM can be represented as:

$$\begin{aligned} {\mathcal {S}}_{2n}^{\prime } = {\mathcal {S}}_{2n}^l + {\mathcal {S}}_{2n}^g. \end{aligned}$$
(6)

Computational analysis

To achieve a lightweight architecture, we refrain from using common residual blocks and transformer attention mechanisms in the IEM. For the design of the local information extraction network, assuming the input feature map dimensions are \([C_{in},H,W]\), we retain its spatial dimensions, increase the channel dimension, and transform the feature map to \([C_{hid},H,W]\), followed by reducing the channel dimension to yield a feature map of dimensions \([C_{out},H,W]\). If using a standard \(3\times 3\) convolution operation, the computational cost is given by:

$$\begin{aligned} FLOPs=9\cdot H\cdot W\cdot C_{in}\cdot C_{hid}+9\cdot H\cdot W\cdot C_{hid}\cdot C_{out}. \end{aligned}$$
(7)

For the local information extraction mechanism in IEM, the computational cost is:

$$\begin{aligned} FLOPs=H\cdot W\cdot C_{in}\cdot C_{hid}+9\cdot H\cdot W\cdot C_{hid}+H\cdot W\cdot C_{hid}\cdot C_{out}. \end{aligned}$$
(8)

Assuming \(C_{hid}=4\cdot C_{in}\), when \(C_{out}=2\cdot C_{in}\), the ratio of the computational cost of ordinary convolution to that of the IEM is \({9\cdot C_{in}}/{(C_{in}+3)}\), indicating that when \(C_{in}>3/8\), the computational cost of the local information extraction mechanism in IEM is less than that of ordinary convolution. Similarly, when \(C_{out}=C_{in}/2\), the computational cost ratio is \({9\cdot C_{in}}/{(C_{in}+6)}\), signifying that when \(C_{in}>3/4\), the computational cost of the local information extraction mechanism in IEM is less than that of ordinary convolution. Both conditions always hold in the network design, because the channel dimensions of the input feature maps are always greater than 1.
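The two ratios can be checked numerically. The sketch below simply evaluates Eqs. (7) and (8) with exact rational arithmetic; the function names and the example sizes (256 by 256 feature map, 8 input channels) are assumptions made for illustration, and the second case is read as \(C_{out}=C_{in}/2\), which is the value that reproduces the stated ratio \({9\cdot C_{in}}/{(C_{in}+6)}\).

```python
from fractions import Fraction


def standard_conv_flops(h, w, c_in, c_hid, c_out):
    """Eq. (7): two standard 3x3 convolutions (C_in -> C_hid -> C_out)."""
    return 9 * h * w * c_in * c_hid + 9 * h * w * c_hid * c_out


def iem_local_flops(h, w, c_in, c_hid, c_out):
    """Eq. (8): 1x1 expansion, 3x3 depth-wise convolution, 1x1 projection."""
    return h * w * c_in * c_hid + 9 * h * w * c_hid + h * w * c_hid * c_out


def cost_ratio(h, w, c_in, c_out):
    """Ratio of Eq. (7) to Eq. (8) under the assumption C_hid = 4 * C_in."""
    c_hid = 4 * c_in
    return Fraction(
        standard_conv_flops(h, w, c_in, c_hid, c_out),
        iem_local_flops(h, w, c_in, c_hid, c_out),
    )


c_in = 8
ratio_double = cost_ratio(256, 256, c_in, 2 * c_in)   # C_out = 2 * C_in
ratio_half = cost_ratio(256, 256, c_in, c_in // 2)    # C_out = C_in / 2

print(ratio_double, ratio_half)  # 72/11 36/7
```

Both values match the closed forms \(9 C_{in}/(C_{in}+3)\) and \(9 C_{in}/(C_{in}+6)\) for \(C_{in}=8\), and neither depends on \(H\) or \(W\).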

For global information extraction, assume that the input feature map is \([C_{in},H,W]\) and the output feature map is \([C_{out},H,W]\). In the transformer28, calculating the attention value for one data point requires the simultaneous involvement of all data points, resulting in an excessively large network model. If the complete attention mechanism of the transformer is used, the computational cost is given by:

$$\begin{aligned} FLOPs=2\cdot {(H\cdot W)}^2\cdot C_{out}+H\cdot W\cdot ({C_{out}}^2+3\cdot C_{in}\cdot C_{out}). \end{aligned}$$
(9)

For the global information extraction mechanism in IEM, assuming a \(3\times 3\) convolution is used, followed by convolutions of size \(1\times 5\) and \(5\times 1\) in the horizontal and vertical directions of the spatial dimension, respectively, the computational cost is:

$$\begin{aligned} FLOPs=10\cdot H\cdot W\cdot C_{out}+9\cdot H\cdot W\cdot C_{in}\cdot C_{out}. \end{aligned}$$
(10)

The computational cost ratio between the two mechanisms is \({(2\cdot H\cdot W+3\cdot C_{in}+C_{out})}/{(10+9\cdot C_{in})}\). Given that the spatial dimensions of the feature map in IEM remain uncompressed, the large value of \(H\cdot W\) indicates that the computational cost of the global information extraction mechanism in IEM is significantly lower than that of the transformer attention mechanism.
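To get a feel for the size of this ratio, the sketch below evaluates Eqs. (9) and (10) directly. The function names, the kernel-length parameter `k` (set to 5 so that the first term of Eq. (10) becomes \(10\cdot H\cdot W\cdot C_{out}\)), and the example sizes are assumptions for illustration only.

```python
from fractions import Fraction


def transformer_attention_flops(h, w, c_in, c_out):
    """Eq. (9): full self-attention over all H*W positions."""
    n = h * w
    return 2 * n**2 * c_out + n * (c_out**2 + 3 * c_in * c_out)


def iem_global_flops(h, w, c_in, c_out, k=5):
    """Eq. (10): 3x3 convolution plus 1xk and kx1 directional convolutions."""
    n = h * w
    return 2 * k * n * c_out + 9 * n * c_in * c_out


h, w, c_in, c_out = 256, 256, 8, 8
ratio = Fraction(
    transformer_attention_flops(h, w, c_in, c_out),
    iem_global_flops(h, w, c_in, c_out),
)
closed_form = Fraction(2 * h * w + 3 * c_in + c_out, 10 + 9 * c_in)

print(float(ratio))  # roughly 1.6e3 for these example sizes
```

For a 256 by 256 feature map with 8 channels the full attention mechanism costs on the order of a thousand times more, which is dominated by the \(2\cdot H\cdot W\) term in the numerator.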

Network pruning

In the initial stages of network design, a considerable amount of redundancy is often introduced into the structure to enhance the network’s learning capability. However, this design approach clearly contradicts the original intention of making the network lightweight. To address this issue, we employ a structured pruning approach that effectively reduces redundant information within the network.

Specifically, we use the weights of convolutional kernels as indicators of their importance, measuring it through the \(L_2\) norm. This metric captures each kernel’s contribution to network learning, enabling the identification of kernels that contribute less to network performance. Subsequently, we prune the network by removing kernels with lower importance, thereby obtaining a lightweight network structure. To make the network as lightweight as possible, we iteratively optimize the network structure through multiple rounds of pruning and fine-tuning. This cyclic process reduces the model’s parameter size while preserving network performance on both the training and testing datasets. Table 1 presents the specific architecture of each layer in the network and the resulting structure after pruning. Table 2 presents the variations in channel dimensions across four IEMs.
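The ranking step of this pruning scheme can be sketched as follows. This is a minimal illustration, not the training pipeline used here: the function names, the flattened toy kernels, and the `prune_ratio` parameter are all hypothetical, and the fine-tuning that follows each pruning round is omitted.

```python
import math


def l2_norm(kernel):
    """L2 norm of one kernel given as a flat list of weights."""
    return math.sqrt(sum(w * w for w in kernel))


def select_kernels_to_keep(kernels, prune_ratio):
    """Return indices of kernels kept after dropping the lowest-norm fraction."""
    n_prune = int(len(kernels) * prune_ratio)
    ranked = sorted(range(len(kernels)), key=lambda i: l2_norm(kernels[i]))
    return sorted(ranked[n_prune:])


# Four toy kernels; two have near-zero weights and contribute little.
kernels = [
    [0.9, -0.8, 0.7],
    [0.01, 0.02, -0.01],
    [0.5, 0.4, -0.6],
    [0.05, -0.03, 0.02],
]
kept = select_kernels_to_keep(kernels, prune_ratio=0.5)
print(kept)  # [0, 2]
```

In a structured pruning setting, removing a kernel also removes the corresponding output channel and the matching input channel of the next layer, which is why the channel counts in the pruned network change.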

Table 1 Network architecture and pruning results of LDNet.
Table 2 Network architecture and pruning results of the IEM.

Results

We incorporated simulated RFI into real SAR echoes to emulate disturbed scenarios. The simulated RFI considered parameters such as the signal’s frequency, bandwidth, chirp rate, modulation index, and quantity. Three types of signals were designed: narrowband interference (NBI), chirp-modulated wideband interference (\(\hbox {WBI}_{\text {CM}}\)), and sinusoidal-modulated WBI (\(\hbox {WBI}_{\text {SM}}\)). Additionally, the range of signal-to-interference ratio (SIR) was set between \(-1\) and \(-10\) dB. In order to enhance the network’s ability to detect RFI, we devised scenarios involving the mixture of multiple instances of the same category of RFI within a single signal and the combination of diverse RFI within a single signal. To prepare the data for training and evaluation, we transformed the simulated signals into the time-frequency domain using STFT. The resulting spectrograms were resized to dimensions of 256 \(\times\) 256 pixels and saved as RGB images with dimensions of 256 \(\times\) 256 \(\times\) 3. During training and testing, the data values were scaled to a range of 0 to 1 by dividing by 255.
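For readers who want to reproduce a comparable setup, the sketch below generates one sample of each interference type with commonly used signal models and scales the interference to a target SIR. The exact models and parameter ranges used in our simulation are not reproduced here; the function names, the unit-amplitude complex exponentials, the placeholder echo, and all numeric values are assumptions for illustration.

```python
import cmath
import math


def nbi(t, f0):
    """Narrowband interference: a single complex tone (assumed model)."""
    return cmath.exp(2j * math.pi * f0 * t)


def wbi_cm(t, f0, chirp_rate):
    """Chirp-modulated wideband interference (assumed model)."""
    return cmath.exp(1j * (2 * math.pi * f0 * t + math.pi * chirp_rate * t * t))


def wbi_sm(t, f0, mod_index, f_mod):
    """Sinusoidal-modulated wideband interference (assumed model)."""
    phase = 2 * math.pi * f0 * t + mod_index * math.sin(2 * math.pi * f_mod * t)
    return cmath.exp(1j * phase)


def mean_power(x):
    return sum(abs(v) ** 2 for v in x) / len(x)


def scale_to_sir(echo, rfi, sir_db):
    """Scale rfi so that 10*log10(P_echo / P_rfi) equals sir_db."""
    target_power = mean_power(echo) / (10 ** (sir_db / 10))
    gain = math.sqrt(target_power / mean_power(rfi))
    return [gain * v for v in rfi]


fs = 1000.0
times = [n / fs for n in range(1000)]
# Placeholder standing in for a real SAR echo.
echo = [complex(math.cos(0.3 * n), math.sin(0.7 * n)) for n in range(1000)]
rfi = scale_to_sir(echo, [wbi_cm(t, 50.0, 200.0) for t in times], sir_db=-6.0)
contaminated = [e + r for e, r in zip(echo, rfi)]

achieved_sir_db = 10 * math.log10(mean_power(echo) / mean_power(rfi))
```

A negative SIR means the interference is stronger than the echo, so at \(-6\) dB the scaled RFI carries roughly four times the echo power.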

For the RFI detection task, we annotated the spectrograms such that regions containing RFI were marked with a value of 1, while non-RFI regions were labeled with 0.

During network training, the mean squared error (MSE) loss function guided learning, with dynamic learning rate adjustment using the adaptive moment estimation (Adam) optimizer.

To validate the RFI detection performance of LDNet, we conducted comparisons with threshold-based CFAR, segmentation networks based on deep learning, and AC-UNet29 dedicated to RFI detection. We compared LDNet with deep learning-based networks to affirm its lightweight characteristic.

Comparison of RFI detection results

With CFAR, RFI detection can be treated as a binary state detection problem. Typically, the maximum false alarm probability (MFAP) is chosen within the range of \(\left[ 10^{-8},10^{-3}\right]\). We set the MFAP to \(10^{-6}\) and tested under an SIR of \(-6\) dB. As shown in Fig. 4, LDNet outperformed CFAR in RFI detection. CFAR labels every spectrogram point exceeding the threshold as RFI, regardless of its true nature.
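A threshold detector of this kind can be sketched as follows, assuming that the spectrogram magnitudes of the RFI-free signal are Rayleigh-distributed. For a Rayleigh variable with scale \(\sigma\), the probability of exceeding \(T\) is \(\exp(-T^2/(2\sigma^2))\), so the threshold for a false alarm probability \(P_{fa}\) is \(T=\sigma\sqrt{-2\ln P_{fa}}\). The function names and the global scale estimate are illustrative assumptions, not the exact CFAR configuration used in the comparison.

```python
import math


def rayleigh_cfar_threshold(sigma, p_fa):
    """Threshold T with P(x > T) = p_fa for a Rayleigh variable of scale sigma."""
    return sigma * math.sqrt(-2.0 * math.log(p_fa))


def estimate_rayleigh_sigma(magnitudes):
    """Maximum-likelihood scale estimate: sigma^2 = mean(x^2) / 2."""
    return math.sqrt(sum(m * m for m in magnitudes) / (2 * len(magnitudes)))


def cfar_detect(magnitudes, p_fa):
    """Flag every magnitude above the threshold as RFI (1) and the rest as 0."""
    sigma = estimate_rayleigh_sigma(magnitudes)
    threshold = rayleigh_cfar_threshold(sigma, p_fa)
    return [1 if m > threshold else 0 for m in magnitudes]


# Toy data: 99 background points and one strong outlier.
magnitudes = [1.0] * 99 + [50.0]
flags = cfar_detect(magnitudes, p_fa=1e-6)
unit_threshold = rayleigh_cfar_threshold(1.0, 1e-6)
```

Note that the scale is estimated from all points, including contaminated ones, which inflates the threshold when RFI is strong; this is one reason a pure threshold rule cannot tell a high-amplitude clean point from interference.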

Fig. 4

Comparison results with CFAR. MFAP set to \(10^{-6}\). SIR = \(-6\) dB. (a) Time-frequency spectrogram with RFI. (b) RFI regions. (c) CFAR. (d) LDNet.

We created a test dataset spanning SIR values from \(-1\) to \(-10\) dB and evaluated eight algorithms: CFAR, FCN, LR-ASPP, three algorithms based on DeepLabV3+ with MobileNetV226, ResNet-10130, and Xception31 backbones, JSLCNN32, and AC-UNet. We treated RFI data points as positive and non-RFI data points as negative, allowing us to obtain the false positive rate (FPR) and false negative rate (FNR) of each algorithm’s segmentation results.

Table 3 compares the mean intersection over union (MIoU), FNR, and FPR of each algorithm with LDNet. CFAR outperformed FCN and LR-ASPP due to the data imbalance between non-RFI and RFI points. DeepLabV3+, JSLCNN, and AC-UNet enhanced RFI detection by fusing features, but suffered from reduced pixel accuracy. After multiple iterations of pruning and fine-tuning, the pruned LDNet exhibits slightly better RFI detection performance than the unpruned LDNet. LDNet’s FPR and FNR were not the lowest, but its overall misclassification rate was the lowest.
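The three metrics can be computed from a predicted and a ground-truth binary matrix as sketched below, with RFI points labeled 1 (positive) and non-RFI points labeled 0 (negative). The MIoU formula shown is the common two-class definition (mean of the RFI and non-RFI IoU), which is assumed here; the function names and toy matrices are illustrative, and the sketch assumes both classes occur in the ground truth so that no denominator is zero.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, TN, FN with RFI (label 1) as the positive class."""
    tp = fp = tn = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 0 and p == 0:
                tn += 1
            else:
                fn += 1
    return tp, fp, tn, fn


def detection_metrics(pred, truth):
    """Return FPR, FNR, and two-class MIoU (assumes both classes are present)."""
    tp, fp, tn, fn = confusion_counts(pred, truth)
    iou_rfi = tp / (tp + fp + fn)
    iou_background = tn / (tn + fp + fn)
    return {
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
        "MIoU": (iou_rfi + iou_background) / 2,
    }


truth = [[1, 1, 0, 0], [0, 0, 0, 0]]
pred = [[1, 0, 0, 0], [0, 1, 0, 0]]
metrics = detection_metrics(pred, truth)
```

In this toy case one of two RFI points is missed and one of six clean points is falsely flagged, giving an FNR of 1/2 and an FPR of 1/6.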

Table 3 Comparison of segmentation results for algorithms.
Fig. 5

Confusion matrices of the different algorithms. SIR \(= -6\) dB. (a) CFAR, MFAP set to \(10^{-6}\). (b) FCN. (c) LR-ASPP. (d) MobileNetV2. (e) ResNet-101. (f) Xception. (g) JSLCNN. (h) AC-UNet. (i) LDNet.

We use confusion matrices to analyze the classification performance of the algorithms in detail. Fig. 5 illustrates the detection results of the various algorithms at SIR \(= -6\) dB. CFAR exhibits high RFI precision but suffers from many non-RFI misclassifications. Deep learning algorithms frequently mislabel RFI. FCN and LR-ASPP had poor RFI classification. DeepLabV3+-based algorithms and JSLCNN showed comparable effectiveness. AC-UNet avoided non-RFI misclassifications but frequently mislabeled RFI. Such errors are more consequential than non-RFI misclassifications. LDNet had the lowest misclassification rate, demonstrating superior RFI detection accuracy.

Comparison of networks’ comprehensive performance

Rapid RFI detection algorithms can significantly enhance the operational speed of the entire RFI suppression system. Table 4 provides a detailed overview of all deep learning algorithms utilized in this study, including their parameters, floating-point operations (FLOPs), frames per second (FPS), and latency. LDNet has the smallest model size; because it extracts local and global information in parallel, its FLOPs are higher than those of LR-ASPP but lower than those of the other networks. LDNet also shows the fastest prediction speed. Among all models, AC-UNet and LDNet perform best. Compared to AC-UNet, LDNet reduces parameters, FLOPs, and latency by 99.03%, 95.19%, and 24.53%, respectively, while increasing FPS by 32.59%, indicating its significant advantage.

Table 4 Complexity of deep learning-based algorithms.

We evaluated network scale using parameters, prediction speed using latency, and RFI detection quality using MIoU. Fig. 6 shows a three-dimensional scatter plot visualizing the networks’ performance. Points nearer to (1,0,0) indicate better overall performance. JSLCNN, AC-UNet, and LDNet all outperform the other algorithms significantly, with LDNet demonstrating higher accuracy, lower latency, and a smaller network scale.

Fig. 6

Comprehensive performance of the neural networks involved in the experiments.

Discussion

Segmentation performance of LDNet

A lower SIR indicates a greater distinction between RFI and SAR signals, theoretically facilitating detection. This is because, at lower SIR levels, the characteristics of the RFI are more pronounced, making them easier to identify and segment. We evaluated the segmentation performance of LDNet under the SIR conditions specified in our experimental environment, and the results are shown in Fig. 7. It can be observed that within the SIR range of the dataset, LDNet demonstrates effective and stable detection performance.

Fig. 7

MIoU of LDNet under different SIR.

To further evaluate LDNet’s performance beyond the SIR range defined in the dataset, we conducted additional tests at 0 dB, 5 dB, and 10 dB, as illustrated in Fig. 8. At 0 dB, LDNet maintains robust segmentation performance, effectively identifying and segmenting RFI signals. However, at 5 dB, we observed instances where LDNet misclassifies some RFI points as non-RFI when the SAR signal is in a high-amplitude region. This suggests a reduction in LDNet’s accuracy under specific conditions. At 10 dB, there is a noticeable decline in segmentation performance. Nevertheless, it is important to note that at 10 dB, the influence of RFI on SAR imaging is minimal. Thus, even with less precise segmentation, the overall imaging quality remains largely unaffected.

Fig. 8

Results of RFI detection for out-of-dataset SIR. (a) Time-frequency spectrogram with SIR = 5 dB. (b) Detection results with SIR = 0 dB. (c) Detection results with SIR = 5 dB. (d) Detection results with SIR = 10 dB.

Impact of misclassification

Misclassification can occur in two scenarios: RFI data points misclassified as non-RFI, and non-RFI data points misclassified as RFI. Misclassifying RFI as non-RFI results in incomplete RFI suppression, potentially leaving residual RFI in the time-frequency spectrograms, which can compromise subsequent data processing accuracy. Conversely, misclassifying non-RFI as RFI alters non-RFI data during suppression, leading to integrity and authenticity issues.

Based on the FNR and FPR results in Table 3, LDNet does not achieve the lowest rate for either type of misclassification. However, in terms of the impact on RFI suppression, at similar misclassification rates the FNR has a greater influence than the FPR. An exception is CFAR, which, being threshold-based rather than semantic-based, may misclassify high-amplitude non-RFI information as RFI, thereby failing to protect such data. In summary, LDNet demonstrates superior overall performance when the combined effects of both types of misclassification are considered.

Limitations and future directions

One limitation of LDNet is that some of its channel numbers are not powers of two, potentially reducing the efficiency of hardware parallel computation. To overcome this limitation, future research could concentrate on designing specialized hardware accelerators optimized for handling non-power-of-two channel numbers. These accelerators would involve tailored designs in processor architecture, memory access patterns, and computational units, thereby enhancing support for such channels and improving hardware parallel computation efficiency.

RFI in real-world environments exhibits diverse and complex characteristics, which may differ significantly from the training data distribution. Addressing this challenge requires the development of adaptive and online learning mechanisms for LDNet. These mechanisms would enable dynamic adjustment based on signals received in real time, thereby enhancing robustness against previously unseen interference signals. Additionally, enriching the training dataset through data augmentation and synthesis techniques can further improve LDNet’s ability to generalize across varied environments and scenarios, and thus its performance on complex and variable RFI.

The integration of LDNet into existing RFI suppression systems necessitates careful consideration of compatibility and interface issues with other system components. Future research should therefore prioritize integrating LDNet into practical systems, conducting extensive testing to resolve compatibility challenges, and validating its performance and reliability in real-world environments.

Conclusion

This paper introduces a lightweight neural network for detecting and segmenting RFI, providing a novel perspective for enhancing the effectiveness of RFI suppression algorithms in the time-frequency domain. Unlike approaches that treat detection and suppression separately, this approach can filter time-frequency spectrograms containing RFI and identify the pixel regions affected by RFI, thereby optimizing the results of RFI suppression algorithms. Experimental validation demonstrates that the proposed algorithm is lightweight and capable of effectively identifying RFI regions in the time-frequency domain.