Introduction

Fetal heart rate (fHR) analysis is important during pregnancy because it provides critical information about fetal health, such as the presence of fetal hypoxia. Cardiotocography (CTG) is currently the so-called gold standard for fetal monitoring. CTG was the first method used for monitoring fetal heart activity and helped to reduce fetal mortality due to hypoxia1. However, since CTG came into common use, the number of caesarean sections performed for presumed hypoxia has increased2. For this reason, and also because of the ultrasound energy involved, alternative methods have been tested, such as fetal electrocardiography (fECG)3,4,5, fetal phonocardiography (fPCG)6,7,8 and fetal magnetocardiography (fMCG)9,10,11. Analysis of the fECG is useful for fHR detection and for identifying many adverse factors during pregnancy and childbirth. The fECG can be measured invasively or non-invasively. The invasive approach is risky and can be performed only during labour. It is therefore preferable to measure the fECG non-invasively using electrodes placed on the maternal abdomen. The signal obtained in this way is called the abdominal ECG (aECG) and typically contains a large amount of noise1,12.

Overall, non-invasive fECG extraction is a challenging task because of several factors related to physiology and to the technical aspects of signal acquisition and processing. The main problem is the low signal-to-noise ratio (SNR) of the fECG component. The fetal component of the aECG is usually weak compared with the maternal ECG (mECG) component, which complicates the extraction of a clear fetal signal. In addition, the mECG and fECG signals may overlap, leading to contamination and difficulty in distinguishing between the two.

Another problem is the correct placement of the electrodes, which is crucial for obtaining aECG signals containing a sufficiently distinct fetal component (no standardised electrode layout exists). In general, this concerns the quality of the input aECG signal. Correct placement can be difficult because of the small size of the fetus, its position and its location. Fetal and maternal movements can also cause artifacts, making it difficult to obtain a stable and reliable signal. It is important to note that fECG signals change with gestational age, so most extraction algorithms need their input parameters adjusted to these changes. All these problems make it difficult to determine automatically the quality of the input aECG signal for subsequent processing (in case it is necessary to select a certain number of the measured input signals to be processed).

Using multiple electrodes and processing signals from different locations can improve signal quality, but it also introduces problems related to spatial resolution. Moreover, it is sometimes preferable to process only one input aECG signal (for example, for fECG signal extraction on a mobile device). To make the processing of fECG signals more accurate, standardized protocols for data acquisition, signal processing and validation metrics need to be developed. However, because of the problems described above, this remains unsolved.

Currently, the aim is to test the applicability of fECG for home fetal monitoring. If the outstanding issues could be resolved and hardware and software developed to extract the critical information accurately, non-invasive fECG could be used in clinical practice. This would allow a more accurate determination of fetal hypoxia than CTG, as the fECG signal allows short-term changes in fHR to be determined. For example, physiological accelerations sometimes occur in the fHR signal, but with CTG they may be evaluated as a possible problem. In addition, an accurately extracted fECG makes it possible to perform ST analysis, which can determine possible hypoxia even more accurately and could also lead to the replacement of the ST analyzer.

There are already several commercially available devices for measuring non-invasive fECG. These are mainly the Monica AN24 (2012) and Monica Novii Wireless Patch System (2014) (Monica Healthcare Ltd., Nottingham, UK)13, MERIDIAN M110 Fetal Monitoring System (2017) (MindChild Medical, Inc., North Andover, MA, USA)14, and PUREtrace (2017) (Nemo Healthcare, Veldhoven, the Netherlands)15.

A large number of extraction methods currently exist, and researchers are still trying to find a method that extracts the fECG reliably and efficiently16,17,18,19,20,21,22. Extraction approaches can be divided into single-channel and multi-channel, each type having its own advantages and disadvantages. The most commonly used methods for fECG extraction are multi-channel and include methods based on blind source separation (BSS), such as independent component analysis (ICA) and principal component analysis (PCA)1,23,24,25. Methods based on adaptive algorithms, such as least mean squares (LMS) and recursive least squares (RLS), are also very commonly used26,27,28. Among single-channel methods, however, template subtraction (TS) based methods appear suitable in terms of simplicity and efficiency. There is a simple variant of the TS method without any template adaptation, as well as several variants based on different approaches to template adaptation.

TS is an fECG extraction method that better suppresses misaligned fetal R-peaks and re-estimates missing individual R-peaks, thereby trying to find overlapping intervals of the mECG and fECG signals17. The authors of study29 explored the use of singular value decomposition to create the template (TS\(_{\textrm{SVD}}\)). Study30 focused on predicting the upcoming complex from the previous ones by linear prediction (TS\(_{\textrm{LP}}\)). The authors of study31 took a different approach to template creation by using a scaling factor (TS\(_{\textrm{SF}}\)). Sequential analysis (SA) is an extraction method that uses a priori information about the maternal R-peaks to detect the mECG signal and to create a template by averaging and scaling, which improves the extraction success rate32. In study33, the authors compared BSS methods, adaptive filters and three different TS-based methods, namely TS\(_{\textrm{SVD}}\), TS\(_{\textrm{SF}}\), and TS using an extended Kalman filter (TS\(_{\textrm{EKF}}\)). They found that the TS methods achieved a median F1 value of 96.0%, which was lower than that of adaptive filtering (97.9%) and BSS methods (99.9%). However, it should be emphasized that they used synthetic data from the fECGSYNDB and noted that a given algorithm may work well in some particular cases and fail in some non-stationary cases. The use of different TS-based methods was also mentioned in study34, which aimed to create a practical guide for non-invasive fECG signal processing. Other methods that use a template include, for example, dynamic time warping (DTW). This method takes into account the diffeomorphism of each period by adjusting the subtraction template12. The template-matching (TM) approach aims at localizing fetal R waves that overlap with the maternal QRS complex (mQRS). This method uses several principal components from the multi-QRS subspace decomposition using SVD to construct the mQRS and fetal QRS (fQRS) templates35.
BSS methods are used to estimate or separate the mECG from the sensor signals without knowing the characteristics of the transmission signal, and the ICA approach is very often used for this purpose36. However, these methods require multichannel signal sources. The non-negative matrix factorization (NMF) method separates the fECG using activation scaling: by scaling a specific row of the activation matrix, the signal of interest can be emphasized in the mixed signal37. The use of a time-frequency analysis combining the fractional Fourier transform (FrFT) and the discrete wavelet transform (DWT), called FrFT-DWT, for fECG extraction is discussed in study38. Last but not least, the Stockwell transform (ST) is used to represent the signal in the time-frequency domain. It is an extension of the short-time Fourier transform with a Gaussian window of scalable width. The maternal R-peaks are identified from a time-frequency domain mapping converted into a one-dimensional unipolar signal39.

In addition to fECG extraction and signal filtering, R-peak detection is a very important part of fHR estimation. This step is critical because, even with very accurate fECG extraction, poor detection alone can invalidate the entire result. Currently, a detector based on the continuous wavelet transform (CWT) is considered very accurate40,41. However, many experiments show that the main influence on the efficiency of the CWT detector is the correct choice of the mother wavelet type.

The efficiency of TS-based methods varies greatly depending on the input aECG signals, so in this paper we focus not only on a comparison of TS-based methods used for mECG elimination or reduction, but also on performing the experiment on different databases containing real recordings. As mentioned, the final estimated fHR also depends on the R-peak detector used, so the influence of the mother wavelet type on the accuracy of the CWT detector is examined as well. The main contributions of this study include:

  • Comparative analysis of mother wavelets used for the CWT detector.

  • Comparative analysis of TS based methods.

  • Experiment on real datasets with reference annotations.

  • Determination of suitable input aECG signals from each recording for fECG extraction.

The rest of the paper is organized as follows: “Material and methods” provides the state of the art on TS-based methods. “Proposed methodology” describes the materials and methods used, along with the methodology of the experiment conducted. “Results” contains the results of the experiments along with the resulting comparison of methods. The discussion and conclusion are presented in “Discussion” and “Conclusion”.

Material and methods

Extracting the fECG signal using TS methods has repeatedly been shown to be very accurate. However, the different TS-based methods have not yet been compared sufficiently to demonstrate which one is appropriate. A major problem is testing on only one recording or one type of dataset, because the result may turn out differently on another dataset; the accuracy of extraction methods depends strongly on the input signals used. For this reason, two different datasets containing real signals were used in this study. Each recording in these datasets contains four aECG signals, so we also test the extraction result on each signal and determine the most suitable one. Furthermore, as already mentioned, the extraction result depends strongly on the detector used and its setup; therefore, this work includes an experiment focused on selecting a suitable mother wavelet for the CWT detector.

Dataset

We used signals from two real datasets that are available on a public server and were recorded under clinical conditions as part of research projects at the Department of Obstetrics and Gynecology of the Medical University of Silesia in Katowice, Poland. The research was approved by the University’s Bioethics Committee (Commission approval number NN-013-345/02). The subjects read the informed consent form and gave written consent to participate in the study. The datasets analysed during the current study are available in the figshare repository integrated with the Scientific Data journal; detailed information can be found in Refs.42,43.

Each recording in these datasets consists of four aECG signals obtained by non-invasive measurement (Ag/AgCl electrodes were placed on the maternal abdomen). The recording of the signals was always supervised by qualified, trained medical personnel43. Both datasets are annotated with the exact positions of the fQRS complexes, which were determined by the authors using automatic detection of R-peaks and verified by clinical experts.

The signals from both datasets were digitized with 16-bit resolution at a sampling rate of 500 Hz. All captured aECG signals were preprocessed using a filter with multiple notches located every 50 Hz. To eliminate low-frequency interference, the cutoff frequency was set at 5 Hz, and to eliminate power line interference, the cutoff frequencies were set between 45 and 55 Hz. The Labour dataset contains 12 recordings, each 5 min long, from women between the 38th and 42nd week of pregnancy, taken in an advanced stage of labour. The Pregnancy dataset contains 10 recordings, each 20 min long, from women between the 32nd and 42nd week of pregnancy. Figure 1 shows samples of the r1 recordings from both datasets.

Figure 1

Sample aECG signals from the datasets used for the experiments.

Evaluation parameters

In this work, the objective evaluation is performed by calculating the accuracy of R-peak detection. To compute the accuracy parameters, we first need to extract the fECG signal and estimate the R-peak positions in it. Furthermore, the datasets under test need to have reference annotations of the correct R-peak positions determined by experts. The numbers of true positives (TP), false positives (FP) and false negatives (FN) are then determined. Detected R-peaks in the extracted signal that lie within ± 50 ms of a reference annotation are counted as TP. Detected R-peaks that fall outside this interval are counted as FP. Finally, reference R-peaks for which no detection falls within this interval are counted as FN. From TP, FP and FN, it is possible to calculate the sensitivity (SE) using Eq. (1), the positive predictive value (PPV) using Eq. (2) and the F1 score using Eq. (3)44,45,46,47.

$$\begin{aligned} SE&= \frac{TP}{TP+FN} \cdot 100. \end{aligned}$$
(1)
$$\begin{aligned} PPV&= \frac{TP}{TP+FP} \cdot 100. \end{aligned}$$
(2)
$$\begin{aligned} F1&= 2 \cdot \frac{SE \cdot PPV}{SE+PPV}. \end{aligned}$$
(3)
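The matching rule and Eqs. (1)–(3) can be illustrated with a minimal Python sketch (the study itself used MATLAB). The function names `count_matches` and `scores` are hypothetical, and the one-to-one pairing of each reference annotation with its nearest unused detection is an assumption, since the text does not specify how multiple detections inside one ± 50 ms window are handled.

```python
def count_matches(detected, reference, tol=0.050):
    """Return (TP, FP, FN) for peak times given in seconds.

    Assumption: each reference peak is paired with at most one detection,
    the nearest one that is still unused and lies within +/- tol seconds.
    """
    unused = sorted(detected)
    tp = 0
    for ref in sorted(reference):
        candidates = [d for d in unused if abs(d - ref) <= tol]
        if candidates:
            best = min(candidates, key=lambda d: abs(d - ref))
            unused.remove(best)
            tp += 1
    fp = len(unused)            # detections left without a reference peak
    fn = len(reference) - tp    # reference peaks left without a detection
    return tp, fp, fn


def scores(tp, fp, fn):
    """Return (SE, PPV, F1) in percent, following Eqs. (1)-(3)."""
    se = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    ppv = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    f1 = 2.0 * se * ppv / (se + ppv) if se + ppv else 0.0
    return se, ppv, f1
```

For example, with reference peaks at 0.40, 0.82, 1.25 and 1.66 s and detections at 0.41, 0.80, 1.40 and 1.67 s, three detections fall inside a window, one falls outside and one reference peak is missed, so SE, PPV and F1 all equal 75%.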

Proposed methodology

In this section, each significant part of the experiment is described in more detail. All methods were performed in accordance with the relevant guidelines and regulations. Figure 2 shows the procedure of the experiment conducted in this study. The procedure is demonstrated for a single recording from the datasets used, containing four input aECG signals. The whole experiment can be divided into several steps:

  1. Preprocessing of input aECG signals (four signals are always used for a single recording from used datasets).

  2. Detection of maternal R-peaks from input aECG signals (use of PCA, rules and CWT detector).

  3. Input signal selection for further processing (processing sequentially all four input aECG signals measured by the electrodes AE\(_1\)–AE\(_4\)).

  4. Extraction of fECG signal using TS method (using one type of TS-based method, because the experiment was always performed for each method separately).

  5. Detection of fetal R-peaks from extracted fECG signal (labeled as fECG\(_{\text{i}}\) in the flowchart because it depends on the input aECG signal being processed) using CWT detector.

  6. Evaluation by F1 score, which indicates the harmonic mean between sensitivity and positive predictive value, and storing the result.

  7. Repeating steps 3–6 for the remaining aECG signals.

  8. Determine the input aECG signal with the highest F1 value from the tested input aECG signals.
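Steps 3–8 amount to a loop over the input channels of one recording. The following Python sketch shows only this control flow; `select_best_channel` and the three callables `extract`, `detect` and `evaluate` are hypothetical placeholders standing in for a TS-based extraction method, the CWT detector and the F1 computation, none of which are implemented here.

```python
def select_best_channel(channels, extract, detect, evaluate):
    """Run steps 3-8 for one recording.

    channels : dict mapping a channel name (e.g. "AE1") to its aECG samples
    extract  : callable, aECG samples -> extracted fECG samples (step 4)
    detect   : callable, fECG samples -> fetal R-peak positions (step 5)
    evaluate : callable, R-peak positions -> F1 score in percent (step 6)
    """
    f1_by_channel = {}
    for name, aecg in channels.items():   # steps 3 and 7
        fecg = extract(aecg)
        peaks = detect(fecg)
        f1_by_channel[name] = evaluate(peaks)
    # Step 8: the channel with the highest stored F1 value.
    best = max(f1_by_channel, key=f1_by_channel.get)
    return best, f1_by_channel
```

Passing the processing stages in as callables keeps the loop identical for every TS-based method, which mirrors the way the experiment is repeated once per method.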

Figure 2

Flowchart of experiment used for fECG extraction (AE\(_1\)–AE\(_4\) are active electrodes, AE\(_0\) is reference electrode, and N is active ground).

Thus, the experiment was performed repeatedly for each tested TS-based method so that their performance could subsequently be compared. The following subsections describe the important parts of the presented experiment: Preprocessing, Maternal R-peaks detection, and Template subtraction.

Preprocessing

Both technical and physiological interference is present in aECG measurements. Physiological interference is associated with manifestations of the organism, such as motion artifacts (at high frequencies), breathing activity (at low frequencies) or interference from other biological signals. Technical interference, on the other hand, mainly comprises power line interference (50 Hz or 60 Hz). However, apart from these artifacts, the biggest problem during fECG extraction is the maternal signal, which is several times larger in amplitude than the fetal signal. In addition, the spectrum of the mECG overlaps with that of the fECG, making fECG extraction more complicated. The main frequency content of maternal QRS (mQRS) complexes lies in the range of 0.5–35 Hz and that of fetal QRS (fQRS) complexes in the range of 10–15 Hz17,43,48,49.

In this study, we chose a finite impulse response (FIR) filter for preprocessing. Since the data had already been band-stop filtered (45–55 Hz) and high-pass filtered (5 Hz) by the dataset authors50,51, we used only a band-pass filter. Considering the aforementioned frequency band of fetal QRS complexes and the filtering applied by the dataset authors, we used a pass band of 5–70 Hz and a filter order of 500.
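As an illustration of such a filter, the sketch below designs a 5–70 Hz band-pass FIR filter of order 500 for a 500 Hz sampling rate. The design method (windowed sinc with a Hamming window) is an assumption, because the text specifies only the filter type, band and order; `bandpass_fir` and `gain` are hypothetical helper names.

```python
import math


def bandpass_fir(order=500, low_hz=5.0, high_hz=70.0, fs=500.0):
    """Return order + 1 taps of a linear-phase band-pass FIR filter.

    Assumption: windowed-sinc design with a Hamming window; the source
    states only "FIR, 5-70 Hz, order 500".
    """
    def sinc(x):
        return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

    mid = order / 2.0
    f1 = 2.0 * low_hz / fs    # band edges normalised so that 1.0 is Nyquist
    f2 = 2.0 * high_hz / fs
    taps = []
    for n in range(order + 1):
        k = n - mid
        # Band-pass = low-pass at the upper edge minus low-pass at the lower edge.
        ideal = f2 * sinc(f2 * k) - f1 * sinc(f1 * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / order)
        taps.append(ideal * window)
    return taps


def gain(taps, freq_hz, fs=500.0):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2.0 * math.pi * freq_hz / fs
    re = sum(h * math.cos(w * n) for n, h in enumerate(taps))
    im = sum(h * math.sin(w * n) for n, h in enumerate(taps))
    return math.hypot(re, im)
```

A symmetric FIR filter of order 500 delays the signal by 250 samples, that is 0.5 s at 500 Hz. How this delay was handled in the study is not stated, so any compensation (for example, shifting the output or filtering forwards and backwards) would be a further assumption.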

Maternal R-peaks detection

TS-based methods require accurate determination of the maternal R-peak positions, because without this step it is not possible to create a template for adaptation and subsequent subtraction. If the maternal R-peak positions are determined inaccurately, a large error is introduced into the extraction process itself. The datasets used include annotations of the exact maternal R-peak positions established by experts. However, we could not use them for our purposes, because in practice, when measuring and analyzing signals, the exact maternal R-peak positions are not known and must be determined.

The algorithm for detecting and determining maternal R-peak positions is based on the following procedure. Since we always have four input aECG signals for our experiment, we can perform a more accurate maternal R-peaks detection. First, all aECG signals are used as input to PCA to find the main source signals and eliminate the problem of poor input selection where the detection would not be accurate. PCA is a dimensionality reduction technique and its primary goal is to transform a dataset with potentially correlated variables into a new set of uncorrelated variables, known as principal components. These components are linear combinations of the original variables and are ordered by the amount of variance they capture in the data1.

Subsequently, the CWT detector is used to detect R-peaks. The CWT detector is based on the decomposition of the signal by CWT to the 5th level. A search for local minima and maxima is then performed in the resulting signal, and the local minima and maxima found are adjusted using adaptive thresholding. Next, zero-crossing detection is performed between adjusted local minima and maxima that are separated by at most 120 ms (a modulus pair). Finally, the maximum (R-peak) is located in the neighborhood of each detected zero crossing52,53,54.

The CWT detector is applied to the first and second outputs of the PCA method. These two outputs have the highest energy, and by using both of them we avoid the problem of the first estimated signal containing only the fetal signal without the maternal component. The algorithm then decides which of the PCA outputs yielded the smaller number of R-peaks, and this output is selected as the correct one. The determined positions are stored and passed to the TS-based methods.

We subsequently checked the accuracy of the maternal R-peak detection against the reference annotations provided with the databases. Using the F1 score for the evaluation, we achieved an accuracy of 99.75%, confirming that the proposed algorithm is sufficient for the purpose of this study. The minor inaccuracies are caused not by the algorithm and its rules, but by the quality of the input signals and the behaviour of the CWT detector. For the CWT detector, the gaus1 mother wavelet was chosen.

Template subtraction

The TS method is a simple and effective single-channel fECG extraction method. Figure 3 shows a diagram of how the TS method works. First, the positions of the maternal R-peaks in the input aECG signal must be detected. Then, based on these positions, individual mQRS complexes are cut out (0.25 s to the left and 0.45 s to the right of each determined R-peak). Subsequently, a template is created as the median of all extracted mQRS complexes17. Finally, the template is subtracted at all locations where maternal R-peaks were originally detected. This removes the maternal signal from the input aECG signal, ideally leaving only the fECG signal.
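The basic TS procedure described above (without template adaptation) can be sketched in a few lines of Python. The function name `template_subtraction` is hypothetical. Skipping beats whose window does not fit inside the recording is an assumption, and the sketch does not treat the case where consecutive windows overlap (maternal RR interval shorter than 0.7 s), in which the template would be subtracted twice in the overlapping region.

```python
from statistics import median


def template_subtraction(aecg, r_peaks, fs=500, pre_s=0.25, post_s=0.45):
    """Basic TS: median mQRS template subtracted at every maternal R-peak.

    aecg    : list of aECG samples
    r_peaks : maternal R-peak positions as sample indices
    Returns (residual, template), where residual approximates the fECG.
    """
    pre = int(round(pre_s * fs))
    post = int(round(post_s * fs))
    # Assumption: beats whose window leaves the recording are skipped.
    windows = [(r - pre, r + post) for r in r_peaks
               if r - pre >= 0 and r + post <= len(aecg)]
    if not windows:
        return list(aecg), []
    beats = [aecg[start:stop] for start, stop in windows]
    # Sample-wise median across all cut-out mQRS complexes.
    template = [median(column) for column in zip(*beats)]
    residual = list(aecg)
    for start, _ in windows:
        for i, value in enumerate(template):
            residual[start + i] -= value
    return residual, template
```

The median is robust to fetal beats that fall at different positions inside the individual maternal windows, which is why they survive the subtraction while the repeating maternal waveform is removed.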

There are many variations of this method aimed at template adaptation. Unlike the classical TS method described above, which takes the template and subtracts it unchanged at the individual locations of the mQRS complexes, these variations additionally adapt the shape of the template to the actual mQRS complex to be subtracted. This can increase the accuracy of the estimated fECG signal. The template-based methods selected for this study are described below:

  • Template subtraction using singular value decomposition (TS\(_{\textrm{SVD}}\)): SVD is a factorization of an input matrix into matrices U, \(\Sigma\) and V, where U and V are orthonormal matrices and \(\Sigma\) is a zero matrix except for possible non-negative numbers on the main diagonal (these numbers are called the singular values of the input matrix). The disadvantage is that the computational complexity of constructing the singular value decomposition increases with the third power of the dimension of the matrices. The TS\(_{\textrm{SVD}}\) method estimates the matrix U from the matrix of detected mQRS complexes with a selected number of source components, see Eq. (4) for the SVD calculation. This matrix U is then used to create the template TECG relative to the actual \(mQRS_i\) complex from the input aECG signal to be subtracted, see Eq. (5)29.

    $$\begin{aligned} SVD&=U \cdot \Sigma \cdot V^T. \end{aligned}$$
    (4)
    $$\begin{aligned} TECG&=mQRS_i \cdot (U \cdot U^T). \end{aligned}$$
    (5)
  • Template subtraction using linear prediction (TS\(_{\textrm{LP}}\)): This method uses linear prediction to determine the template (it predicts the upcoming complex from the previous ones). The template is constructed by weighting the previous cycles so as to minimize the root mean square error (unlike other TS-based methods, where the weights of all cycles are the same). To adapt the TECG template to the actual \(mQRS_i\) complex, this method uses Eqs. (6) and (7), where \(mQRS_i\) is the actual mQRS complex, mQRS is a matrix whose rows are the individual mQRS complexes and the vector \(\lambda\) contains the weights30.

    $$\begin{aligned} \lambda&=(mQRS^T \cdot mQRS)^{-1} \cdot mQRS^T \cdot mQRS_i. \end{aligned}$$
    (6)
    $$\begin{aligned} TECG&=\lambda \cdot mQRS. \end{aligned}$$
    (7)
  • Template subtraction using scaling factor (TS\(_{\textrm{SF}}\)): This method is based on determining a scaling factor for template adaptation. After the template has been prepared using the median, Eq. (8) is used to calculate the scaling factor \(a\) for the actual complex \(mQRS_i\). Then, according to Eq. (9), the TECG template is adjusted to the actual complex \(mQRS_i\) and used for subtraction. The scaling reduces the discrepancy between the average and the true mQRS complex, which is affected by the time-varying morphology of the mECG signal31.

    $$\begin{aligned} a&=(TECG^T \cdot TECG)^{-1} \cdot TECG^T \cdot mQRS_i. \end{aligned}$$
    (8)
    $$\begin{aligned} TECG&=a \cdot TECG. \end{aligned}$$
    (9)
  • Sequential analysis (SA): SA is based on the TS\(_{\textrm{SF}}\) method and focuses on improving the scaling procedure. Scaling is not performed on the entire mQRS complex; instead, the P wave, QRS complex and T wave are scaled separately. In this way, the temporal variability of the morphology of the mECG signal is taken into account. The prepared template TECG is divided into the P wave (0–0.2 s of the template), the QRS complex (0.2–0.3 s of the template) and the T wave (0.3–0.7 s of the template). Scaling factors \(a_p\), \(a_{QRS}\) and \(a_T\) are then determined for each segment using Eq. (8) and used to adapt the template (using Eq. (9)) before it is subtracted from the actual \(mQRS_i\) complex32.
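The two scaling-based adaptations, TS\(_{\textrm{SF}}\) and SA, can be sketched as follows. For a single template vector, Eq. (8) reduces to a ratio of two dot products, which is what `scale_factor` computes. The function names are hypothetical, and the segment boundaries follow the 0–0.2 s, 0.2–0.3 s and 0.3–0.7 s split given for SA. The TS\(_{\textrm{SVD}}\) and TS\(_{\textrm{LP}}\) variants require a linear algebra routine and are not covered by this sketch.

```python
def scale_factor(template, beat):
    """Least-squares scalar a that minimises ||beat - a * template||, cf. Eq. (8)."""
    denom = sum(t * t for t in template)
    if denom == 0.0:
        return 0.0
    return sum(t * b for t, b in zip(template, beat)) / denom


def adapt_template_sf(template, beat):
    """TS_SF: one scaling factor for the whole template, cf. Eq. (9)."""
    a = scale_factor(template, beat)
    return [a * t for t in template]


def adapt_template_sa(template, beat, fs=500, bounds_s=(0.0, 0.2, 0.3, 0.7)):
    """SA: separate scaling factors for the P wave, QRS complex and T wave."""
    edges = [int(round(b * fs)) for b in bounds_s]
    edges[-1] = len(template)  # last segment runs to the end of the template
    adapted = []
    for start, stop in zip(edges[:-1], edges[1:]):
        adapted.extend(adapt_template_sf(template[start:stop], beat[start:stop]))
    return adapted
```

In both cases the adapted template, rather than the raw median template, is subtracted from the actual \(mQRS_i\) complex. Because SA fits each segment independently, its residual on a given beat can never be larger (in the least-squares sense) than that of the single-factor TS\(_{\textrm{SF}}\) fit.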

Figure 3

Block diagram of TS method functionality.

Results

The whole experiment on real data from the Labour and Pregnancy datasets was performed on all signals of each recording. MATLAB R2023a was used. In the first part of the results, the effect of the mother wavelet used on the accuracy of R-peak detection was tested. All extracted signals were successively used as input to the CWT detector, for which the following mother wavelets were tested:

  • Biorthogonal: bior1.1, bior1.3, bior1.5, bior2.2, bior2.4, bior2.6, bior2.8, bior3.1, bior3.3, bior3.5, bior3.7, bior3.9, bior4.4, bior5.5, bior6.8.

  • Coiflet: coif1–coif5.

  • Daubechies: db1–db45.

  • Fejer-Korovkin: fk4, fk6, fk8, fk14, fk18, fk22.

  • Gaussian: gaus1–gaus8.

  • Reverse biorthogonal: rbio1.1, rbio1.3, rbio1.5, rbio2.2, rbio2.4, rbio2.6, rbio2.8, rbio3.1, rbio3.3, rbio3.5, rbio3.7, rbio3.9, rbio4.4, rbio5.5, rbio6.8.

  • Symlet: sym1–sym30.

A total of 124 different mother wavelets were tested. To summarize the extraction efficiency, the mean of all F1 values obtained on all signals of one dataset was computed. The same was then done for the second dataset and, finally, for both datasets together. From the resulting table, which had 124 rows, only the best results were selected; the other results can be found in the supporting material. The selected results are shown in Table 1, where the highest value in each column is highlighted in bold. From this table it can be deduced that the wavelets with a lower width index achieved better results than those with a higher index. This implies that narrower wavelets are preferable for R-peak detection. It can also be seen that the Gaussian family wavelets achieved the highest accuracy, with the gaus2 wavelet achieving the highest accuracy for the Labour dataset and the gaus3 wavelet for the Pregnancy dataset. Moreover, the gaus3 wavelet achieved the highest accuracy when the experiment was performed on both datasets together.

Table 1 Effect of mother wavelet selection for CWT detector on R-peak detection accuracy.

Based on the initial experiment on the influence of the mother wavelet, the gaus3 wavelet was used for the rest of the experiment. Tables 2 and 3 show the R-peak detection accuracy, expressed as the F1 score, on each dataset. Using each TS-based method, fECG signal extraction was performed on all signals of each recording of both datasets. For clarity, results greater than 90% (high) are highlighted in bold and results less than 80% (low) are highlighted in italic. The remaining results in the 80–90% interval have been left in black (medium). These results indicate which signals of each recording are applicable for fECG signal extraction, and also which TS-based method provides the highest accuracy.

Table 2 Results of the accuracy of R-peaks (F1) determination from extracted signals using individual tested TS-based methods on the Labour dataset (using the gaus3 mother wavelet).
Table 3 Results of the accuracy of R-peaks (F1) determination from extracted signals using individual tested TS-based methods on the Pregnancy dataset (using the gaus3 mother wavelet).

From Tables 2 and 3, we can see that using each TS-based method, the recordings were labeled the same in most cases (high, medium, and low). When we take a closer look at the difference in accuracy between the methods for a particular channel of a recording, we can see that in most cases there is less than 5% difference between tested methods. However, in four cases the accuracy difference between tested methods was greater than 10% (Labour dataset, recording 4, channel 3; Pregnancy dataset, recording 1, channel 4; Pregnancy dataset, recording 9, channel 1, and Pregnancy dataset, recording 9, channel 4), in one case greater than 20% (Labour dataset, recording 10, channel 1) and in one case even greater than 50% (Pregnancy dataset, recording 10, channel 2). It can be seen that these were channels of recordings that achieved low accuracy for all methods. Thus, it can be concluded that there is no significant difference between the tested TS-based methods for the channels that achieved high accuracy. For the channels that achieved low accuracy, it can be hypothesized that some TS-based methods can extract the fECG signal better (at least to some degree) for these signals.

From Tables 2 and 3, a summary Table 4 was created, which contains for each TS-based method only the best result among the signals of each recording. The individual rows of this table correspond to the tested recordings, and the last row gives the mean extraction accuracy of each method. It can be seen from the table that the SA method achieved the highest accuracy on both datasets. However, performance may vary depending on the specific recording, indicating the importance of accounting for individual differences in the fECG signal extraction process. Nevertheless, it should be noted that the other TS-based methods did not achieve statistically significantly lower accuracy. When averaged over all recordings of both datasets together, the TS method achieved F1 = 95.71%, the TS\(_{\textrm{SVD}}\) method F1 = 95.93%, the TS\(_{\textrm{LP}}\) method F1 = 95.30%, the TS\(_{\textrm{SF}}\) method F1 = 95.82% and the SA method F1 = 95.99%. The difference in accuracy between the methods for individual recordings in Table 4 ranges from 0.4 to 3%. The largest difference was for recording 2 of the Pregnancy dataset.

Table 4 Highest extraction accuracies within individual recordings of both datasets for the tested TS-based methods.

Discussion

The quality of the input signal has a great influence on the resulting extraction. The main factor that affects the resulting signal quality is the arrangement of the electrodes and their correct mounting. Poor electrode placement results in noise that can affect the fECG signal and its extraction. The success of extraction may also be affected by the gestational age and by the position of the fetus in the pregnant woman’s abdomen, as the fHR changes during development. Recordings that contain aECG signals of substandard quality produced low fQRS complex detection accuracy. This was because the level of the fetal component was very low compared with the maternal component and in some cases even invisible. Some signals also suffered from noise. For these signals, effective extraction is almost impossible, and it is therefore important to pay close attention to the correct positioning of the sensing electrodes and the setup of the measurement system when acquiring them. Table 5 shows, for each recording of both datasets, which input aECG signals have low, medium and high quality. This table is intended to help future authors select input signals for extraction methods and can also serve as a check for automatic classifiers of input aECG signals based, for example, on signal quality index (SQI) parameters. The table was created from the results in Tables 2 and 3, with the criteria determined as follows:

  • Low-quality signals: F1 = 0–80%.

  • Medium-quality signals: F1 = 80–90%.

  • High-quality signals: F1 = 90–100%.
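
The criteria above can be expressed as a simple threshold rule, sketched here in Python. The function name `classify_quality` is hypothetical. Because the listed ranges share their boundary values, the sketch assumes that F1 values of exactly 80% and 90% fall into the higher class; the source does not state which convention was used.

```python
def classify_quality(f1_percent):
    """Map an F1 score (in %) to a quality label for an input aECG signal.

    Assumption: the boundary values 80 and 90 are assigned to the
    higher class, since the source ranges overlap at these points.
    """
    if not 0.0 <= f1_percent <= 100.0:
        raise ValueError("F1 must lie between 0 and 100 %")
    if f1_percent < 80.0:
        return "low"
    if f1_percent < 90.0:
        return "medium"
    return "high"


# Example calls with arbitrary F1 values (not results of this study).
for value in (42.0, 80.0, 89.9, 97.5):
    print(value, classify_quality(value))
```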

Table 5 Determination of the quality of input aECG signals for subsequent fECG signal extraction.

The biggest problem with signals marked as low quality was frequency noise, which remained in the signal despite filtering with a band-pass FIR filter with cutoff frequencies of 5–70 Hz (see Fig. 4a). The data had also been filtered by the dataset authors themselves, as mentioned in the section describing the datasets used. It was at least apparent from the signals that powerline interference had been removed effectively. Furthermore, there was no isoline fluctuation (baseline wander) in the signals, because these low frequencies had also been removed effectively by the authors of the datasets.

Unfortunately, for some of the signals we identified as low quality, the low detection efficiency was due to the aECG signals containing no visible fECG signal (see Fig. 4b). As already mentioned, this could have been caused by the fetal position. This applies in particular to recordings in which only one or two signals were marked as low quality and the others as medium or high quality. However, for recording r3 from the Labour dataset and r10 from the Pregnancy dataset, most of the signals were marked as low quality. In these cases, all signals, and therefore both recordings as a whole, were poorly measured, which could have had various causes.

Figure 4 Example of low-quality aECG input signals leading to insufficient fECG signal extraction.

Furthermore, it was of interest to analyze the effect of applying the TS-based methods on the amplitude of fetal R-peaks. TS-based methods have the additional problem that fetal R-peaks can be partially removed during maternal subtraction. Therefore, the amplitudes of fetal R-peaks were determined for all input aECG signals of both datasets using the reference annotations, and the amplitudes were then averaged for each input aECG signal. The same was done for the extracted signals, again using the reference annotations to eliminate the effect of minor inaccuracies of the CWT detector used. For this analysis, we used the fECG signals extracted with the SA method because it achieved the best result in our study. The mean change in amplitude of the fetal R-peaks is shown for the Labour dataset in Table 6 and for the Pregnancy dataset in Table 7. The tables show that the assumed amplitude change is present in the extracted signals. In most cases, the amplitude changed by a few µV. However, for some signals, especially from the Pregnancy dataset, the amplitude was reduced by almost one half.

Table 6 Mean change in fetal R-peak amplitude in extracted fECG signals relative to input aECG signals (Labour dataset).
Table 7 Mean change in fetal R-peak amplitude in extracted fECG signals relative to input aECG signals (Pregnancy dataset).
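
The amplitude comparison described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure actually implemented in the study: the amplitude is taken as the absolute sample value at each annotated R-peak position, the function names are hypothetical, and the toy signals are made up.

```python
from statistics import mean


def mean_r_peak_amplitude(signal, r_peak_indices):
    """Mean absolute amplitude of the signal at the annotated fetal R-peaks.

    Assumption: the amplitude is read directly at the annotated sample;
    the source does not specify how the amplitude was measured.
    """
    return mean(abs(signal[i]) for i in r_peak_indices)


def amplitude_change(input_aecg, extracted_fecg, r_peak_indices):
    """Return the absolute and relative (%) change of mean R-peak amplitude."""
    a_in = mean_r_peak_amplitude(input_aecg, r_peak_indices)
    a_out = mean_r_peak_amplitude(extracted_fecg, r_peak_indices)
    diff = a_out - a_in
    return diff, 100.0 * diff / a_in


# Toy signals in µV for illustration only; NOT data from Tables 6 and 7.
toy_input = [0.0, 0.0, 20.0, 0.0, 0.0, 0.0, 22.0, 0.0]
toy_extracted = [0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 12.0, 0.0]
toy_annotations = [2, 6]  # sample indices of the reference fetal R-peaks

diff, relative = amplitude_change(toy_input, toy_extracted, toy_annotations)
print(diff, relative)
```

Using the same reference annotations for both signals keeps detector errors out of the comparison, which matches the motivation given in the text.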

Table 8 compares the results of the proposed method with studies dealing with single-channel fECG signal processing. It is evident from this table that an accurate comparison of the results is very difficult, because different studies use different evaluation parameters, datasets or even approaches to fECG signal extraction. The results obtained in our study were in most cases higher than (or comparable to) those of other studies focusing on single-channel signal processing methods. For comparison, we mainly selected studies that used at least one of the evaluation parameters ACC, SE, PPV and F1. Higher accuracy was achieved in two of the studies mentioned38,39. However, closer examination of the experiments performed by the authors of these studies shows that they used only those signals from the tested datasets that provide good results. For example, for the similar dataset ADFECGDB, which is essentially an older version of the Labour dataset, the authors selected only the input aECG channels from 5 recordings that contain high-quality signals. These signals are also of high quality according to Table 5 in this study. It is therefore clear that the resulting accuracy in this study is reduced by the results from the lower-quality recordings.

Table 8 Results comparison with studies focused on single-channel fECG extraction.
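
For orientation, the evaluation parameters named above can be computed from the numbers of true positive (TP), false positive (FP) and false negative (FN) fetal R-peak detections, as sketched below in Python. The sketch assumes the definitions commonly used in the fQRS detection literature, in particular ACC = TP/(TP + FP + FN); individual studies in Table 8 may define the parameters differently. The function name `detection_metrics` and the example counts are hypothetical.

```python
def detection_metrics(tp, fp, fn):
    """Return ACC, SE, PPV and F1 (all in %) from detection counts.

    Assumed definitions (common in fQRS detection, not confirmed here):
      SE  = TP / (TP + FN)
      PPV = TP / (TP + FP)
      ACC = TP / (TP + FP + FN)
      F1  = 2 * TP / (2 * TP + FP + FN)
    """
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("at least one reference and one detected peak are required")
    return {
        "ACC": 100.0 * tp / (tp + fp + fn),
        "SE": 100.0 * tp / (tp + fn),
        "PPV": 100.0 * tp / (tp + fp),
        "F1": 100.0 * 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts for illustration only.
metrics = detection_metrics(tp=95, fp=5, fn=5)
print(metrics)
```

Under these definitions ACC is never higher than F1 for the same counts, which is one reason why results reported with different parameters cannot be compared directly.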

Figure 5 shows examples (first 5 s of the signals) of successful extractions of fECG signals using the SA method. In both subfigures, the upper graph shows a sample of the input aECG signal (grey waveform) and the extracted fECG signal (black waveform). The second graph from the top shows the fHR calculated from the reference annotations (grey plot) and from the R-peaks detected in the extracted fECG signal (black plot). In these plots, slight deviations (peaks) from the reference can be seen in the estimated fHR, which are due to a slight shift of the detected R-peaks. In the following graph, a moving average with a window length of 5 samples is applied to the estimated fHR, which removes these peaks and preserves the trend of the fHR with respect to the reference. In the last graph, the estimated fHR is subtracted from the reference fHR to show the error signal. These error signals have a small amplitude, which indicates only a small deviation from the reference.

Figure 5 Demonstration of fECG signal extraction using SA method and subsequent fHR estimation.
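
The fHR processing shown in Figure 5 can be sketched in Python as follows. This is an illustrative sketch only: the function names, the sampling frequency and the toy R-peak positions are assumptions, the moving average is assumed to be centred with a shortened window at the edges, and the error is computed here after smoothing both curves, which the source does not specify.

```python
def fhr_from_r_peaks(r_peak_samples, fs):
    """Instantaneous fHR in bpm from consecutive R-peak sample indices."""
    return [60.0 * fs / (b - a) for a, b in zip(r_peak_samples, r_peak_samples[1:])]


def moving_average(values, window=5):
    """Centred moving average; the window is shortened at the edges (assumption)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def fhr_error(reference_fhr, estimated_fhr):
    """Error signal: reference fHR minus estimated fHR, sample by sample."""
    return [r - e for r, e in zip(reference_fhr, estimated_fhr)]


# Toy example: fs and peak positions are made up, not taken from the datasets.
fs = 1000  # Hz, assumed sampling frequency
reference_peaks = [0, 400, 800, 1200, 1600, 2000, 2400]
detected_peaks = [0, 400, 810, 1200, 1600, 2000, 2400]  # one peak shifted by 10 samples

reference_fhr = fhr_from_r_peaks(reference_peaks, fs)
estimated_fhr = fhr_from_r_peaks(detected_peaks, fs)

raw_error = fhr_error(reference_fhr, estimated_fhr)
smoothed_error = fhr_error(moving_average(reference_fhr), moving_average(estimated_fhr))
print(max(abs(e) for e in raw_error), max(abs(e) for e in smoothed_error))
```

In the toy example, a single R-peak shifted by 10 samples produces two opposite deviations in the raw fHR, and the 5-sample moving average reduces their magnitude while keeping the overall trend.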

The results of this study could be improved if a signal smoothing method were used as a final step55 or if an optimization technique were applied56. However, this study focused solely on the comparison of different TS-based methods for fECG signal extraction. The results of this study can be used in the design of an efficient hybrid system in which the SA method would form the main part of the extraction system.

Future research will focus on testing new signal processing methods that can be used as a sub-part of a hybrid system. Along with testing methods, the aim will be to test new optimisation algorithms, especially those inspired by nature. Much attention will also be paid to testing single-channel signal processing methods. However, a major problem is the selection of input signals, so a parallel research goal is SQI testing. The goal is to develop a system that evaluates whether the input aECG signal is suitable for extraction, i.e. whether it contains a sufficiently visible fetal signal and does not contain too much noise. This problem has been addressed by many authors, but no system yet exists that automatically evaluates input aECG signals with an accuracy approaching 100%. Promising results were achieved in study57, where a supervised machine learning approach was used for automatic selection. The authors performed an experiment on 10336 5-second signal segments obtained from a real dataset of multi-channel transabdominal recordings from 55 volunteer pregnant women between 21 and 27 weeks of pregnancy. They achieved an accuracy of over 86%, and more than 88% of the channels marked as informative were correctly identified.

Next, attention will be paid to the multichannel determination of fHR. When single-channel fECG signal extraction is performed on multiple input signals, multiple fECG signals are obtained. The goal will be to detect R-peaks separately in these signals and then compare them with each other using different methods to achieve a more accurate fHR estimate. Alternatively, the goal will be to adjust the fHR using multiple fHR curves detected from the extracted fECG signals. In summary, the goal will be to achieve the most accurate estimate of fHR relative to the reference when multiple extracted fECG signals are available. Another aim of the research will be to perform morphological analysis, i.e. analysis of the ST segment, QT interval length, etc. These segments and intervals are very important sources of information about fetal health status. A major advantage of TS-based methods is that they do not interfere with the morphology of the extracted fECG signal, whereas a large number of classical signal processing methods, such as WT or EMD, suffer from morphology corruption. TS-based methods can therefore be considered suitable with regard to the possibility of performing ST analysis.

Conclusion

This study dealt with fECG signal processing using TS-based methods. Two datasets containing real signals with annotations were used for the experiments: the Labour dataset and the Pregnancy dataset. The aim of the study was to compare several methods (TS, TS\(_{\textrm{SVD}}\), TS\(_{\textrm{LP}}\), TS\(_{\textrm{SF}}\) and SA) with each other and to determine the accuracy achieved on the individual signals of the datasets used. In addition, many types of mother wavelets for the CWT detector were tested to determine their effect on the detection accuracy. The testing showed that the best performance was achieved using the Gaussian family of wavelets, with the best result obtained using the gaus3 mother wavelet. The accuracy of the selected methods was evaluated using the statistical parameter F1. The highest mean extraction accuracy on the two datasets was achieved using the SA method (F1 = 95.99%). In addition, the quality/usability of the input signals of the individual recordings of the datasets used in this work was determined. This work supports the claim that TS-based methods are suitable for fECG extraction. Based on their effectiveness, these methods could be used in the future as part of hybrid systems. By combining these methods with another signal processing method and exploiting its advantages, even higher fECG signal extraction accuracy could be achieved (Supplementary Information S1 and S2).