Introduction

Early information on the condition of fetal heart can help to prevent half of all stillbirths1. Fetal monitoring is crucial in providing valuable information for timely intervention and avoiding damage to vital fetal organs. Cardiotocography is a conventional obstetrical monitoring technique used to assess fetal distress by processing the pulse and uterine contractions recorded simultaneously from the fetus and the mother respectively2. Although cardiography is a standard, non-invasive technique for fetal heart monitoring, the signals recorded do not always reflect abrupt changes in fetal heart rate3,4. This can lead to misinterpretation and misdiagnosis, and therefore devalues cardiotocography clinically5.

To accurately diagnose fetal cardiac pathologies, the morphology of the electrocardiographic signal is required in the early stages of pregnancy to enrich the information provided by cardiotocography3,6 and other sonographic techniques7,8,9. Electrocardiography techniques in which an electrode is placed directly on the fetal scalp can provide this morphological information. The electrocardiogram (ECG) obtained on the scalp is more reliable for monitoring the fetal heart, but it can only be recorded during labor after the amniotic fluid membrane has ruptured, and may pose risks of infection for both fetus and mother10. Non-invasive fetal electrocardiography, which involves placing electrodes on the maternal abdomen, offers a promising alternative to standard fetal monitoring methods11,12,13. Combining all the advantages of the aforementioned techniques, it is non-invasive, safe, and comfortable for both fetus and mother. It provides more accurate information and can be used during pregnancy, unlike the scalp ECG recording method. However, the use of non-invasive fetal ECG in clinical practice remains limited due to its low signal-to-noise ratio (SNR)10,14. Noises mixed with non-invasive fetal ECG that complicated its extraction include instrumental noise, maternal ECG, fetal and maternal electromyographic signals, and motion artifacts15,16. These noises can distort the waveform of the fetal ECG, leading to misinterpretation of fetal heart status17,18. It is therefore imperative to develop denoising methods that provide ECG signals with an accurate representation of fetal heart status.

To remove noise from fetal ECG signals, several methods have been proposed in the literature. These methods are generally used as post-processing tools after removal of the maternal ECG from the abdominal recordings, in order to improve the quality of the extracted fetal ECG. They can be grouped into model-based methods and learning-based methods. Model-based methods include most classical signal denoising algorithms such as low-pass filtering19, non-local means based methods20,21, signal decomposition techniques in frequency and time domains, such as spectral decomposition22,23, empirical mode decomposition24,25 and discrete wavelet transform26,27. Most model-based methods are simple to implement, but suffer from their inability to perform well under a variety of noise conditions.

Recently, deep learning-based methods have achieved incredible success in many fields, such as image restoration28,29,30,31, image segmentation32,33,34,35, audio filtering36,37,38,39, and audio classification40,41,42,43. These success stories have attracted more and more in-depth research in the field of ECG noise reduction. Methods designed for ECG signal quality restoration and enhancement44,45,46,47, QRS detection48,49,50,51, and fetal ECG signal extraction52,53,54,55,56 are representative examples. Fotiadou et al.57 were the first to propose a fully convolutional deep neural network for the post-processing of non-invasive fetal ECG signals recorded both in single-channel and multi-channel configuration. Their method significantly improves the quality of fetal ECG signals and successfully preserves morphological signal changes. However, in the case of channel signals heavily corrupted by noise, it was not able to reconstruct the relevant morphological features of the fetal ECG, leading to an erroneous diagnosis due to added waves in the reconstructed signals that should not have been there. Singh et al.58 proposed a method based on a generative adversarial neural network for denoising adult ECG. Their model showed good generalization ability for multiple noise scenarios but failed in the case of highly corrupted signals.

In this work, to overcome the aforementioned limits, we introduced a deep neural network with skip connections capable of denoising multi-channel fetal ECG signals. The proposed network learns, in a supervised and adversarial way, a sequence of nonlinear transformations that can take the noise-corrupted ECG signals as input and deliver the noise-free ECG signals as output. The trained network eliminates noise in the extracted fetal ECG signals by condensing them into a latent representation, followed by upsampling to recover details.

This paper is organized as follows: In the following section, we explain the proposed approach to fetal ECG denoising and describe the datasets used for training and evaluation of the proposed method. We also describe the other denoising methods used for a comparative study with the proposed method. The “Results and discussion” section is focused on results and discussion, and the paper ends with a conclusion followed by some perspectives.

Material and methods

Methodology

Proposed approach

Our approach aims to train, in a supervised way, a neural network that denoise highly corrupted fetal ECG signals using a dataset of paired clean and noisy fetal ECG signals. More formally, this problem can be stated as follows:

Given a paired signal training dataset \(\left\{ (\hat{x},x)_1, \ldots , (\hat{x},x)_N\right\}\), where \(\hat{x}\) represented the noisy version of the clean fetal ECG signal x for a given sample, our goal is to learn the right parameter values of a neural network so that the network can map the noisy signal \(\hat{x}\) to the clean signal x. With these optimized parameters, the neural network can be used to denoise all fetal ECG signals, including those not available in the training database.

Figure 1 illustrated the overview of the proposed approach at training and inference time.

Figure 1
figure 1

Training and inference schemes of the proposed approach for fetal ECG denoising.

Three networks involve in training. The first two sub-networks are the encoder and the decoder networks, respectively \(f_E\) and \(f_D\). They are highlighted in blue and yellow respectively in Fig. 1 and are components of the final network f used to denoise the signals at inference time. The last one, highlighted in green, is the discrimnator network (\(f_{disc}\)).

The encoder network (\(f_E\)) is composed of eight dilated convolutional layers. Each one performs a sequence of three operations. These operations are a 1D convolution, a leaky rectified linear unit (leakyReLU) activation function, and an instance normalization operations. The encoder downscales the input noisy fetal ECG signal \(\hat{x}\), compressing it to a vector of lower dimensions \(\hat{z}\), hypothesized to live in space of the best features of x and thus referred to as the latent vector of x. That means by mapping \(\hat{x}\) to \(\hat{z}\), \(f_E\) only retains useful features of the signal and thus ignores useless features as noise. The decoder network upscales the latent representation \(\hat{z}\) to a signal \(\tilde{x}\) with the same dimension as x for consistent comparison. It consists of nine layers. The first eight perform a sequence of three operations each. These operations are the same as those performed in the encoder layer, except that the dilated convolution is replaced by a transposed convolution operation. The last layer of the decoder is a regular convolution layer performing a convolution operation with no activation function or normalization operation. The discriminator network (\(f_{disc}\)) is composed of four layers, each ones performing a regular linear operation followed by a rectified linear unit (ReLU) operation. except the last one in which . Its role is to distinct the \(\tilde{x}\) signal generated by the decoder from the clean signal x. The \(f_{disc}\) network is the part responsible for the adversarial learning in the proposed approach.

Figure 2 depicts the architectures of the encoder, decoder and the discriminator networks. This figure shows that every two convolutional and mirrored transposed convolutional layers are linked with dashed arrows. These dashed arrows are known as “skip” connections and help to deal with the gradient vanish problem occurring in deep architectures59 such as the one proposed. The input dimensionality of features at each layer is also presented in Fig. 2, and a more detailed view with the output dimensionality, the number of trainable parameters and all the variables taken into account by the different operations at each layer is shown in Table 1.

During training, the encoder learns to retain the useful features of a noisy input signal \(\hat{x}\) by mapping it to a much lower-dimensional vector \(\hat{z}=f_E(\hat{x})\). In addition, the encoder is again used to map the clean signals x to the corresponding latent lower-dimensional vector \(z=f_E(x)\). Since z and \(\hat{z}\) are of equal dimension, they can be used for consistent comparison, thus enhancing the encoder’s ability to eliminate unnecessary features when encoding noisy signals. This second use of the encoder is one of the unique aspects of the proposed approach. The decoder takes the latent vector \(\hat{z}\) as input and learns to generate a signal \(\tilde{x}=f_D(\hat{z})\) as close as possible to the clean signal x. The generated signal and the clean signal are then fed into the discriminator which, by discriminating between the generated signal and the clean signal, helps the decoder to generate a better signal which finally fools the discriminator. The loss functions for training the respective networks are presented in the next section.

The convolution operation in the encoder and the decoder networks uses a dilation factor to increase the perceptual field of our networks so that the temporal structure of the noisy input ECG signal can be captured at multiple scales. In addition, the encoder network is designed to handle a four-channel ECG signal, enabling the spatial information provided by each channel to be exploited. In this way, spatio-temporal information is exploited to eliminate noise and provide clean signals while preserving the morphology of the individual ECG components.

Based on the above, after training, the encoder and decoder networks can be used to design the final filter f defined as \(f=f_D \circ f_E\) for denoising unseen noisy fetal ECG.

Figure 2
figure 2

The Encoder, Decoder, and Discriminator architectures of the proposed approach.

Table 1 Detailed description of the proposed networks architectures.

Objective function and training parameters

To train the networks parameters of the proposed approach, we formulate an objective function by combining three loss functions, each optimizing the individual sub-networks of the proposed approach.

The first is a contractive denoising auto-encoder loss \(\mathscr{L}_{ctr}\) introduced in Refs.60,61 and used in Ref.62 for ECG filtering purposes. It is formed by the addition of two terms, a classical denoising auto-encoder loss \(\mathscr{L}_{rec}\) computing the mean squared error (MSE) between the clean fetal ECG and the denoised fetal ECG, and a Frobenius norm \(\Omega\) of Jacobian matrix computed from the noisy input fetal ECG. Hence, the contractive loss \(\mathscr{L}_{ctr}\) is defined as follows:

$$\begin{aligned} \mathscr{L}_{ctr} = \mathscr{L}_{rec} + \Omega \end{aligned}$$
(1)

with

$$\begin{aligned} \mathscr{L}_{rec} = \left\| x-\tilde{x}\right\| ^2_2 \end{aligned}$$
(2)

and,

$$\begin{aligned} \Omega = \left\| \frac{\partial f_E(\hat{x})}{\partial \hat{x}} \right\| ^2_F \end{aligned}$$
(3)

The next is responsible for adversarial learning introduced in generative adversarial networks (GANs)63. The GAN framework commonly comprises two sub-networks, a generator network and a discriminator, with competing objectives63. They have achieved considerable success in image processing and are still being explored for time series applications such as time classification and synthesis64,65,66,67,68. However, it has been shown that many GAN-based algorithms trained with classical adversarial loss may not converge63,69. To avoid this problem and continue benefit from the advantages offered by GANs, several techniques for computing loss functions have been introduced to encourage GAN-based algorithms convergence70,71,72. We use the one called feature matching proposed in Ref.70, which uses the internal representation of the discriminator network to update the generator parameters. In our reformulation of the GAN framework, the decoder network plays the role of the generator network, hence the adversarial loss \(\mathscr{L}_{adv}\) is defined as the MSE between the feature representation of the clean fetal ECG and the generated fetal ECG, respectively:

$$\begin{aligned} \mathscr{L}_{adv} = \left\| f_{disc}(x)-f_{disc}\left( f_D(\hat{z})\right) \right\| ^2_2 \end{aligned}$$
(4)

where \(f_{disc}\) is a function that outputs an intermediate layer of the discriminator network.

The \(\mathscr{L}_{ctr}\) and \(\mathscr{L}_{adv}\) introduced above encourage the neural network f, especially the decoder part \(f_D\) to produce realistic and similar signals to clean fetal ECG. This work can be done only under the condition that the encoder part produces a vector \(\hat{z}\) capturing the best representations of x. To enforce that, we introduced an additional loss function \(\mathscr{L}_{enc}\) defined as the MSE between the latent vector of the clean fetal signal and the latent vector of the noisy fetal signal:

$$\begin{aligned} \mathscr{L}_{enc} = \left\| f_E(x)-f_E(\hat{x})\right\| ^2_2 \end{aligned}$$
(5)

Overall, the objective function to train the proposed approach is defined as the weighted average of the above loss functions:

$$\begin{aligned} \mathscr{L} = w_{enc}\mathscr{L}_{en} + w_{adv}\mathscr{L}_{adv} + w_{rec}\mathscr{L}_{rec} + w_{\omega }\Omega \end{aligned}$$
(6)

where \(w_{enc}\), \(w_{adv}\), \(w_{rec}\), and \(w_{\omega }\) are the hyper-parameters adjusting the contribution of individual losses to the overall objective function. Since each fetal signal has four channels, the overall objective is computed along each channel and then averaged.

In our experiments, we organized signals in batches of size \(B=8\) and used as optimization algorithm the variant of the Adam algorithm proposed in Ref.73 with weight decay and learning rate parameters set to \(5.10^{-2}\) and \(10^{-5}\), respectively. We empirically found that setting the weighting parameters \(w_{enc}\), \(w_{adv}\), \(w_{rec}\), \(w_{\omega }\) to 4, \(10^{-2}\), 25, \(10^{-4}\), respectively, allowed fast convergence. The network was implemented in Pytorch and trained for 20 epochs on a 64-bit Intel Core i7 processor with 8 GB of RAM.

Data for training and evaluation

Two datasets were used for training purposes and to evaluate the proposed method, as well as the other methods chosen for benchmarking. The first one is a synthetic fetal ECG dataset created by employing the open-source fecgsyn toolkit developed in Ref.74. The fecgsyn toolbox is used to create a dataset suitable for network training by dividing each sample in the dataset into an abdominal noisy signal and a fetal ECG clean signal of thirty-two channels each. Using the fecgsyn toolbox, we simulate different physiological event cases with different noise levels ranging from \(-\,12\) to 12 dB and from \(-\,30\) to 0 dB respectively, for training and evaluation purposes. The physiological event cases considered are described in Table 2, they are similar to events simulated in the Fetal ECG Synthetic Database74,75 except that we excluded the case of twin pregnancy as the proposed model was not developed to handle this case. Each simulation was run, for statistical purposes, five times independently using a five-minute signal at a sampling rate of 250 Hz.

The second one is an open-access dataset called the Abdominal and Direct Fetal Electrocardiogram Database76. It consists of real multi-lead abdominal fetal ECG signals obtained from five women in labor, between 38 and 41 weeks of gestation. Each signal has been recorded with 16-bit resolution at a sampling rate of 1000 Hz for five minutes and was processed by digital filters to eliminate baseline and power-line inference76. For each record, four-channel abdominal mixtures and the corresponding direct fetal ECG signals are given.

Table 2 Description of physiological events of the synthetic signals of synthetic dataset.

Data preprocessing

Before applying the proposed algorithm to data, certain preprocessing steps are necessary. Firstly, using the method based on the Extended Kalman Filtering77, the cancellation of the maternal ECG was performed for the abdominal part of all recordings in the synthetic and real databases.

Then, for each thirty-two channel signal in the synthetic dataset, eight channels were selected to form two four-channel signals. The channels were selected according to recommendations given in Ref.74. All channels were taken into account in the real database since there are only four. Finally, all the fetal ECG signals were resampled to 500 Hz to have a common frequency and they were divided into sequences of \(4 \times 1920\) samples. The signals were also standardized along each channel to have a mean of zero and a unit standard deviation.

Performance evaluation

Performance metrics

In this study, two sets of performance measures were used to evaluate the proposed approach. The first set measures the divergence between the signal output by the proposed method and the ground truth signal. These measures were only applied to the synthetic dataset, since for each noisy fetal ECG in the synthetic dataset, its clean version is available. These metrics include the signal-to-noise ratio improvement (SNR\(_{imp}\)), the root mean-square error (RMSE) and the percent-root distortion (PRD).

SNR\(_{imp}\) measures the difference of SNR between a denoised signal and the corresponding noisy input signal. A higher value of SNR\(_{imp}\) indicates better denoising performance. It can be expressed for a channel, c, of a signal as follows:

$$\begin{aligned} \textit{SNR}_{imp} = \textit{SNR}_{out} - \textit{SNR}_{in} \end{aligned}$$
(7)

where SNR\(_{in}\) and SNR\(_{out}\) are defined as:

$$\begin{aligned} \textit{SNR}_{in}= & {} 10\times \log _{10}{\left( \frac{\sum _{t=1}^T x_{c,t}^2}{\sum _{t=1}^T (\hat{x}_{c,t}-x_{c,t})^2}\right) } \end{aligned}$$
(8)
$$\begin{aligned} \textit{SNR}_{out}= & {} 10\times \log _{10}{\left( \frac{\sum _{t=1}^T x_{c,t}^2}{\sum _{t=1}^T (\tilde{x}_{c,t}-x_{c,t})^2}\right) } \end{aligned}$$
(9)

RMSE measures the variance between the denoised signal and the corresponding clean signal. A lower value of RMSE corresponds to a smaller difference and is desired as it indicates better performance. For a channel, c, of a signal, RMSE is defined as:

$$\begin{aligned} \textit{RMSE} = \sqrt{\frac{1}{T}\sum _{t=1}^T (x_{c,t}-\tilde{x}_{c,t})^2} \end{aligned}$$
(10)

PRD can be used as an indicator of the recovery quality of the compressed signal. A lower value of PRD indicates a better quality of the denoised signal. For a channel, c, of a signal, it is defined as:

$$\begin{aligned} \textit{PRD} = 100\times \sqrt{\frac{\sum _{t=1}^T (x_{c,t}-\tilde{x}_{c,t})^2}{\sum _{t=1}^T x_{c,t}^2}} \end{aligned}$$
(11)

These metrics are computed for each channel and then averaged.

In a dataset containing the real fetal ECG, the above metrics are useless since there are no clean fetal ECG signals that can be used as ground truth signals. However, the dataset provides the abdominal recording with the simultaneously recorded scalp ECG annotated by the experts76. Although recorded directly from the fetal head, scalp measurements contain a considerable amount of noise. Moreover, they are recorded on a different lead than the abdominal leads; thus, even in the case of perfect denoising, the denoised signals and the scalp signals can not be matched. Nevertheless, the manifestations of major cardiac electrical events such as ventricular depolarization should coincide and be aligned in the ECG signals from the abdomen and scalp. It is therefore possible to base the evaluation of the proposed method on the correctness of detecting the QRS complexes. The calculation of the correctly detected QRS complexes is performed with a tolerance of \(\pm \,50\) milliseconds of the scalp R-peak location78. Based on correct and incorrect detection, the following metrics can be defined:

$$\begin{aligned} \textit{PPV}= & {} \frac{\textit{TP}}{\textit{TP}+\textit{FP}} \end{aligned}$$
(12)
$$\begin{aligned} \textit{SE}= & {} \frac{\textit{TP}}{\textit{TP}+\textit{FN}} \end{aligned}$$
(13)
$$\begin{aligned} \textit{F}_1= & {} \frac{2 \textit{PPV}\cdot \textit{SE}}{\textit{PPV} + \textit{SE}} \end{aligned}$$
(14)

where TP is True Positive, i.e. the correctly detected QRS complexes, FN is False Negative, i.e. the undetected QRS complexes; and FP is False Positive, i.e. the incorrectly detected QRS complexes. PPV measures the accuracy of the detection algorithm by evaluating the ratio of detections of true-positive QRS complexes to the total number of QRS complexes detected; SE measures the ability of the detection algorithm to detect true-positive QRS complexes relative to the total number of QRS complexes annotated, and \(\textit{F}_1\) is the harmonic mean of sensitivity and accuracy, providing a balanced measure of the performance of the detection algorithm. These metrics are widely used to evaluate QRS complex detection and higher values of them are indicators of better performance denoising48,51.

Comparative methods

A benchmarking study is carried out to evaluate the proposed approach against three other denoising methods. The first comparative method is wavelet-based. The wavelet transform is widely studied and used for ECG signal analysis because, by expanding the signal in terms of a localized wavelet function in both time and frequency, it provides good time resolution at high frequency and good frequency resolution at low frequency79. Wavelet-based denoising methods consist of three main steps consisting of decomposing the signal into coefficients, comparing these coefficients with a certain threshold, and reconstructing the signal with the thresholded coefficients. We selected the sixth-order symlet wavelet as the mother wavelet because it works well on ECG noise80 and the sureshrink method was used for thresholding81,82.

The next comparative method is a supervised learning-based method. It is a denoising convolutional autoencoder (FCN-DAE) proposed in Ref.83 for noise reduction in ECG signals. The architecture of FCN-DAE has three main parts; the encoder part and decoder part using convolution and deconvolution operations together with batch normalization and exponential linear unit (ELU) as activation function, respectively84. The last part is the output layer performing only a deconvolution operation. A more exhaustive description can be found in Ref.83. It is important to note that FCN-DAE was originally proposed for single-channel ECG filtering83, so we slightly modified it to adapt it to four-channel signals configuration.

The last method is the supervised learning-based method designed in57, especially for denoising multichannel fetal ECG. It is a deep convolutional neural network (DCNN) consisting of an encoder of eight convolutional layers symmetrically connected with eight transposed convolutional layers of a decoder. The encoder’s layers perform convolution operations such that the signal is downsampled by two after each layer, while the decoder’s layers perform transposed convolutions that upsample the signal by two after each layer. The leaky rectified linear units with a slope of 0.2 are used as a non-linearity operation at each layer. A more exhaustive description of this method can be found in Ref.57.

For a fair comparative study, it is important to note that the comparative denoising methods and the proposed method do not have the same complexity. The calibration of wavelet-based methods relies solely on the choice of wavelet family and order, and the choice of threshold, and is done through a trial-and-error process, usually in a short time. Methods based on deep learning require more parameters, more computational resources and more time to train these parameters. The proposed model uses 49,229,492 parameters to estimate the denoised signal, while the DCNN model uses 93,649,796 parameters, just under twice as many, for the same task. It should be noted, however, that training of the proposed model is more complex due to the dual use of the encoder and the presence of the discriminator (Fig. 2). The FCN-DAE model is the simplest of the deep learning models used in this study. It has 86,211 parameters and uses no residual connection like the other two.

Results and discussion

Evaluation on synthetic dataset

The proposed method and the existing methods were evaluated on the synthetic dataset, and the results are presented in Fig.  3. Figure 3 illustrates the improvement in SNR, the RMSE and the PRD for input SNR ranging from \(-\,30\) to 0 dB. It can be observed that the proposed method outperforms the existing methods by providing the highest values of \(\textit{SNR}_{imp}\) and the lowest values of RMSE and PRD throughout the whole range of input \(\textit{SNR}\). More explicitly, for input SNR between \(-\,10\) and 0 dB the proposed method and the DCNN method57 have comparable performances, while the performances of the FCN-DAE method83 and the wavelet denoising method gradually become similar. As the input SNR decreases, the DCNN and wavelet denoising methods perform less well, rendering them useless in cases of very low-quality signals, while the FCN-DAE method still performs well but not as well as the proposed method.

Figure 3
figure 3

Performance of the proposed network in comparison with the existing methods in terms of (a) the improvement in signal-to-noise ratio (SNR\(_{imp}\)), (b) the root mean-square-error (RMSE), and (c) the percent root distortion (PRD). Each metric was computed along each channel and then averaged.

Figure 3 shows that the wavelet denoising method gives the worst results. This is not surprising because, in the presence of heavily noisy fetal ECG signals, the wavelet denoising method is not able to preserve individual variations between ECG complexes and is inclined to distort the signal amplitude, whereas the proposed and existing learning-based methods, DCNN and FCN-DAE, achieve better results. Figure 4 illustrates this phenomenon for a typical signal from our synthetic test dataset. The values of performance metrics before and after denoising are provided in Table 3. It can be seen that the signal output by the proposed method is clearer and closer to the ground truth signal than the signal output by other denoising methods. For this particular example, the SNR result of the proposed method is around 5 dB higher than the SNR result of the second best method, FCN-DAE.

These results clearly show that the proposed method significantly improves the quality of the noisy fetal ECG signals. It preserves morphology, amplitude, and variations among individual fetal ECG complexes, even when most signal channels are severely corrupted. This result is not surprising, and is explained by the loss function (Rq. 5), which explicitly forces the network to capture the best representations of the fetal ECG signal by examining both the noisy fetal ECG and the clean fetal ECG. Moreover, the multi-lead configuration of the input signal enables the network to capture spatio-temporal information in the fetal ECG. These information are beneficial in cases of severely corrupted signal channels since the network has sufficient features to recover each channel.

Figure 4
figure 4

Qualitative comparison of denoising a synthetic signal from the test dataset by the proposed and existing denoising methods. (a) the noisy input fetal ECG, (b) the ground truth fetal ECG. The signal produced by (c) the proposed method, (d) the DCNN method, (e) the FCN-DAE method, (f) the wavelet method. The SNR values before and after applying denoising methods are given in Table 3.

Table 3 Quantitative comparison in terms of SNR of the signal depicted in Fig. 4, before (\(\textit{SNR}_{in}\)) and after (\(\textit{SNR}_{out}\)) denoising.

Evaluation of real dataset

Figure 5
figure 5

Qualitative comparison of the peak detection over the proposed and existing denoising output samples. In (a) the fetal scalp ECG, (b) the noisy input fetal ECG. The output signal by (c) the proposed method, (d) the DCNN method, (e) the FCN-DAE method, (f) the wavelet method. Only the first channels of the output signal are displayed for better visualization.

As previously stated, the denoising performance of the proposed and existing methods cannot be directly measured in the case of real fetal ECG signals, as there are no clean reference signals. Instead, we examine the performance of QRS complexes detection facilitation. To quantify the above performance, we applied the denoising methods and then the Hamilton peak detector to the signal extracted from the r10 recording of the real fetal ECG dataset. Only the r10 recording was selected from the five available because, after applying the extraction method, others were too noisy to allow denoising and peak detection.

Figure 6
figure 6

Performance of the proposed method and the existing methods in case of real fetal ECG signals. The considered metrics are (a) the positif predicted value (PPV), (b) the sensitivity (SE), and (c) the F\(_1\) score. Each metric was computed along each channel and then averaged.

Figure 5 illustrates a comparison of QRS complexes detection on a piece of extracted noisy fetal ECG and on the denoised signals after the application of the proposed and existing methods. A corresponding piece of scalp ECG is also provided. The red dots in the plots indicate the positions of the QRS complexes detected by the Hamilton peak detector, with the exception of those in the scalp ECG signal, which are provided by the database. For reasons of simplicity and visualization, only the first channels of the denoised signals are shown.

Using the QRS complexes detected on the denoised signals by the Hamilton peak detector and those of the scalp ECG provided by the database, we use the performance metrics described by the equations Eqs. (12), (13), (14) to quantitatively evaluate the methods. These metrics are calculated for each channel and for all samples in record r10 in the real dataset. Figure 6 shows the average values of these metrics, while Table 4 depicts in more detail the values of the performance metrics for each channel.

Table 4 QRS complexes detection performance based on Hamilton Peak Detector.

These results indicate that the number of true detected QRS complexes (TP) and the QRS complexes detection errors (FP and FN) are respectively increased and reduced over the signal output by the proposed method and the existing methods, with the exception of the DCNN method. The proposed method performs significantly better than the others, achieving the best scores in all performance measures for QRS complex detection with significant difference scores (around \(29\%\), \(18\%\), \(26\%\) difference in PPV, SE and F\(_1\) respectively) compared to the second best method, the FCN-DAE. The proposed denoising method considerably increases the number of true detected QRS complexes (TP) (over 400 complexes on average) and reduces, on average, the detection errors (FP and FN) by around \(50\%\).

Conclusion

We have presented a method based on adversarial learning for denoising fetal ECG signals. The proposed denoiser network simultaneously takes as input four fetal ECG leads, encodes them into an ECG latent signal representation, and then decodes this latent representation to recover and output clean and reliable fetal ECG leads. Qualitative and quantitative evaluations on synthetic and real data have shown that the proposed method is superior to the existing ECG denoising methods, such as wavelet-based and learning-based methods, for severely corrupted ECG signals with an SNR value higher than \(-\,30\) dB, thus enabling a substantial improvement in the quality of noisy multi-channel fetal ECG signals. All these performances make the proposed method suitable for clinical applications. A future outcome will be the direct denoising of raw abdominal signals, without suppression of the maternal ECG.