Introduction

Remote photoplethysmography (rPPG) is a non-invasive, low-cost biosensing technique for monitoring vital signs such as heart rate (HR) and HR variability (HRV)1. It relies on the RGB cameras already built into everyday devices, which makes vital sign monitoring more accessible than ever before.

The potential of rPPG in enabling remote health monitoring has garnered significant interest, especially in telemedicine and personal health tracking domains2. Its applications span from clinical patient monitoring to consumer health, showcasing its versatility. Recent research in rPPG has explored the evaluation of red, green, and blue channels for heart rate detection3, the development of less complex methods for improved heart rate measurement via rPPG4, the evaluation of biases in rPPG methods5, and investigations into the effectiveness of various rPPG methods in different settings6,7. Additionally, there have been studies on the use of machine learning for blood pressure detection using rPPG8,9. However, the accuracy of rPPG is often compromised by artifacts, primarily due to motion and external light interference, which significantly impact signal quality10. This presents a substantial challenge, especially in scenarios where precision and reliability are crucial.

Signal quality indices (SQIs) have thus become critical in the field of biosensing for assessing the integrity of biosignals11. These indices, including the widely recognized signal-to-noise ratio (SNR), offer a quantitative measure of a signal’s reliability, making them essential tools in the evaluation of rPPG signals12. They help in distinguishing high-quality signals from those corrupted by noise and artifacts, ensuring the accuracy of health monitoring.

In recent years, the clinical relevance of conventional PPG measurements has attracted considerable research interest13,14. Despite this growing attention, it remains unclear which SQIs are most effective for rPPG, especially in mobile health applications. To address this gap, our study investigates eight SQIs previously examined for conventional PPG signals15. We aim to reduce the complexity of real-time analysis by identifying a single SQI that can flag video segments of sufficient quality for accurate heart rate detection and cardiac assessment. We also establish a practical threshold for this SQI, so that it can be implemented immediately and simply on smartphones, wearables, or other portable devices. This enables efficient real-time video analysis in rPPG, a task traditionally characterized by high complexity and resource demands.

Results

In Fig. 1, the POS rPPG method has a value of zero at the 1-s window because it does not produce a usable signal with such a short window.

Fig. 1: Impact of window size on rPPG method accuracy.

This figure displays the average absolute beats-per-minute difference (|ΔBPM|) across all activities and subjects for various remote photoplethysmography (rPPG) methods, as the window size increases from 1 to 3 s (from left to right). |ΔBPM| serves as a key metric for evaluating the precision of each rPPG technique over short analysis intervals. The data was obtained from the PURE dataset. ΔBPM beats-per-minute difference, rPPG remote photoplethysmography.

We then examined how the mean absolute beats-per-minute difference (|ΔBPM|) of the CHROM, GREEN, and OMIT rPPG methods evolves as the window size increases (Fig. 2).

Fig. 2: Comparison of rPPG method performance across PURE and LGI-PPGI datasets.

This figure illustrates the average absolute beats-per-minute difference (|ΔBPM|) for three remote photoplethysmography (rPPG) methods: CHROM, GREEN, and OMIT, across all activities and subjects within the PURE (a) and LGI-PPGI (b) datasets. The analysis covers a range of window sizes from 1 to 12 seconds. |ΔBPM| represents the difference in beats per minute, serving as a measure of accuracy for each rPPG method.

In Fig. 2, |ΔBPM| drops sharply as the window size increases from 1 to 3 s. This drop aligns with our expectations and was already visible in Fig. 1. Increasing the window size beyond 5 s does not appear to substantially improve the accuracy of heart rate estimation. Moreover, at a window size of 3 s, |ΔBPM| is already close to its lowest observed value.

It is worth noting that in a prior study that focused on PPG (not rPPG), researchers found that the best window size for assessing the quality of a PPG waveform was 2 s. This 2-s window size was able to distinguish between recordings that were considered ‘Excellent’ and those that were ‘Unfit’.

To refine the analysis, we labeled the rPPG signals using dynamic time warping (DTW) and the Pearson correlation coefficient (r) for window sizes from 3 to 12 s. Each signal, for each rPPG method, window size, and dataset, was assigned to one of three classes: ‘Unfit,’ ‘Acceptable,’ or ‘Excellent.’

We excluded window sizes smaller than 3 s from this analysis. As the previous figures indicate, none of the signals obtained with these windows met the predefined criteria for the ‘Acceptable’ label. All of them would therefore have been labeled ‘Unfit,’ making these data uninformative for our purpose.

For larger window sizes, we still observed a preponderance of signals in the ‘Unfit’ class. However, certain rPPG methods, such as LGI, GREEN, and OMIT, exhibited a more balanced distribution between ‘Excellent’ and ‘Acceptable’ signals. This balance became more pronounced as the window size increased, as evident in Fig. 3.

Fig. 3: Distribution of reconstructed signals by rPPG methods across different window sizes.

This figure presents a comparison of the signal distributions obtained from eight remote photoplethysmography (rPPG) methods, utilizing window sizes of 4, 8, and 12 s, respectively. Each panel illustrates how the labeled signals are reconstructed under the specified window size, providing insights into the variability and consistency of each rPPG method’s performance. In the leftmost figure, we have a total of 75 Excellent signals, 35 Acceptable signals, and 362 Unfit signals. In the central figure, we have a total of 125 Excellent signals, 74 Acceptable signals, and 273 Unfit signals. In the rightmost figure, we have a total of 143 Excellent signals, 85 Acceptable signals, and 245 Unfit signals.

The subsequent phase of our analysis involved evaluating the effectiveness of SQIs in classifying the quality of rPPG signals. To achieve this, we employed four classifiers and utilized leave-one-out cross-validation (LOOCV) for three specific pairwise comparisons: ‘Excellent’ vs. ‘Unfit,’ ‘Unfit’ vs. ‘Acceptable,’ and ‘Excellent’ vs. ‘Acceptable.’

In some cases, the three classes were imbalanced, and in certain instances one or more classes had no samples at all, so we first had to check whether each comparison was feasible. For example, Fig. 3 shows that with a 4-s window, none of the comparisons for signals derived from the POS, ICA, PBV, and PCA methods would yield meaningful results, because the ‘Excellent’ class was either empty or contained a single sample, which is insufficient for reliable LOOCV.
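The feasibility check described above can be sketched as follows. This is a minimal illustration under an assumed rule, not the authors' code: a pairwise comparison is kept only if both classes have at least two samples, because with a single sample the left-out fold would remove that class from the training set entirely. The function name `feasible_comparisons` and the example counts are hypothetical.

```python
from itertools import combinations


def feasible_comparisons(counts, min_per_class=2):
    """Return the pairwise class comparisons that can be run with LOOCV.

    counts: mapping from class label to number of labeled signals.
    A pair is kept only if both classes have at least `min_per_class`
    samples (assumed rule, see the lead-in).
    """
    return [
        (a, b)
        for a, b in combinations(sorted(counts), 2)
        if counts[a] >= min_per_class and counts[b] >= min_per_class
    ]


# Hypothetical class counts for one rPPG method and one window size.
example = {"Excellent": 1, "Acceptable": 6, "Unfit": 40}
print(feasible_comparisons(example))
```

With these counts only the ‘Acceptable’ vs. ‘Unfit’ comparison survives, mirroring the situation described for the 4-s windows.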

Conversely, when we examined the class distribution for the 8-s window size case, we found that all three comparisons would generate meaningful results for seven out of eight rPPG methods. Our analysis encompassed a comprehensive investigation of rPPG signals, including labeling, SQI-based classification, and considerations of class distribution.

Treating the eight rPPG methods as individual annotators, we calculated the inter-rater agreement as the proportion of signals assigned to the same class by all eight methods. For the PURE dataset, the highest agreement occurred with a 3-s window (0.40), whereas for the LGI-PPGI dataset it occurred with a 6-s window (0.47). With the datasets combined, a 5-s window yielded the highest agreement (0.41) among the eight rPPG methods.

We also used kappa statistics to assess the agreement between the eight rPPG methods across the three quality classes. Cohen’s kappa is defined as \(\kappa=\frac{P_{A}-P_{E}}{1-P_{E}}\), where \(P_{A}\) is the observed agreement between two raters (the proportion of items on which they agree) and \(P_{E}\) is the agreement expected by chance. The average pairwise Cohen’s kappa between the CHROM and GREEN methods was κ = 0.37, which corresponds to fair agreement on the commonly used Landis and Koch scale. To assess agreement among more than two methods, we used Fleiss’ kappa, \(\kappa=\frac{P_{\rm{obs}}-P_{\rm{rnd}}}{1-P_{\rm{rnd}}}\), where \(P_{\rm{obs}}\) is the mean, over all signals, of the proportion of rater pairs that assign the signal to the same category, and \(P_{\rm{rnd}}=\sum_{j}p_{j}^{2}\) is the chance agreement, with \(p_{j}\) the proportion of all assignments that fall into category j. The average Fleiss’ kappa for the CHROM, GREEN, and OMIT methods was κ = 0.17, indicating slight agreement on the same scale.
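The agreement measures used in this section can be computed from a label matrix with one row per signal and one column per rPPG method. The plain-Python sketch below follows the standard definitions of unanimous agreement, Cohen’s kappa, and Fleiss’ kappa; the function names and the toy labels are illustrative and are not data from this study.

```python
from collections import Counter


def unanimous_agreement(labels):
    """Proportion of signals that all raters (rPPG methods) put in the same class.

    labels: list of rows, one row per signal, one label per rater.
    """
    return sum(len(set(row)) == 1 for row in labels) / len(labels)


def cohen_kappa(a, b):
    """Cohen's kappa for two raters: (P_A - P_E) / (1 - P_E)."""
    n = len(a)
    p_a = sum(x == y for x, y in zip(a, b)) / n                  # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[c] * cb[c] for c in set(a) | set(b)) / n ** 2   # chance agreement
    return (p_a - p_e) / (1 - p_e)


def fleiss_kappa(labels):
    """Fleiss' kappa for a fixed number of raters per signal."""
    n_items, n_raters = len(labels), len(labels[0])
    totals = Counter()
    p_obs = 0.0
    for row in labels:
        counts = Counter(row)
        totals.update(counts)
        # share of ordered rater pairs that agree on this signal
        p_obs += sum(k * (k - 1) for k in counts.values()) / (n_raters * (n_raters - 1))
    p_obs /= n_items
    # chance agreement from the overall proportion of each category
    p_rnd = sum((v / (n_items * n_raters)) ** 2 for v in totals.values())
    return (p_obs - p_rnd) / (1 - p_rnd)


# Toy labels: rows are signals, columns are three rPPG methods.
labels = [["E", "E", "U"], ["U", "U", "U"], ["A", "U", "U"], ["E", "E", "E"]]
print(unanimous_agreement(labels))
print(cohen_kappa([row[0] for row in labels], [row[1] for row in labels]))
print(fleiss_kappa(labels))
```

In practice, library routines such as scikit-learn’s `cohen_kappa_score` compute the two-rater case; the hand-written versions are shown only to make the formulas concrete.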

Based on the results of the kappa agreement, we tested the ability of SQIs to classify the quality of rPPG signals obtained through the GREEN, CHROM, and OMIT rPPG methods individually. We focused on the window size with the highest agreement, which was 3 s for the PURE dataset and 6 s for the LGI-PPGI dataset.

To assess how well the SQIs evaluate rPPG signal quality, we performed leave-one-out cross-validation (LOOCV) with four supervised classifiers: support vector machine (SVM), linear discriminant analysis (LDA), decision tree (TREE), and logistic regression (LR).

We report the sensitivity (SE), positive predictivity (PP), F1 score (F1), and overall F1 score (OF1) of the four classifiers after LOOCV on the normalized SQIs in Table 1 (PURE dataset) and Table 2 (LGI-PPGI dataset). The results are sorted in decreasing order of OF1, the mean of the F1 scores of the four classifiers. Notably, the gold-standard PSQI generally showed no discriminative power in the three comparisons. In contrast, NSQI outperformed all other SQIs in the ‘Excellent’ vs. ‘Unfit’ and ‘Excellent’ vs. ‘Acceptable’ comparisons in both datasets, while KSQI was the most effective index for the ‘Unfit’ vs. ‘Acceptable’ comparison in both datasets. Overall, NSQI and KSQI consistently achieved the highest OF1 scores among the SQIs.
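To make the evaluation protocol concrete, the sketch below runs LOOCV for a single SQI feature and reports SE, PP, and F1 (the harmonic mean of SE and PP); OF1 is then the mean of the F1 values over the classifiers. For brevity it implements only a one-feature, two-class LDA in NumPy; the SVM, tree, and logistic-regression classifiers would be plugged into the same loop (for example via scikit-learn). The function names and the synthetic data are hypothetical.

```python
import numpy as np


def lda_fit(x, y):
    """Two-class LDA on one feature: class means, pooled variance, priors."""
    classes = np.unique(y)
    means = np.array([x[y == c].mean() for c in classes])
    ss = sum(((x[y == c] - m) ** 2).sum() for c, m in zip(classes, means))
    var = ss / (len(x) - len(classes))            # pooled within-class variance
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, var, priors


def lda_predict(model, x):
    classes, means, var, priors = model
    # Linear discriminant per class: x*mu/var - mu^2/(2 var) + log(prior)
    scores = np.outer(x, means) / var - means ** 2 / (2 * var) + np.log(priors)
    return classes[np.argmax(scores, axis=1)]


def loocv_predict(x, y):
    """Leave-one-out: refit on all samples but i, then predict sample i."""
    preds = np.empty_like(y)
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        preds[i] = lda_predict(lda_fit(x[keep], y[keep]), x[i:i + 1])[0]
    return preds


def se_pp_f1(y_true, y_pred, positive):
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    se = tp / (tp + fn) if tp + fn else 0.0
    pp = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * se * pp / (se + pp) if se + pp else 0.0
    return se, pp, f1


def overall_f1(f1_scores):
    """OF1: mean F1 over the classifiers evaluated."""
    return float(np.mean(f1_scores))


# Synthetic normalized SQI values: 1 = 'Excellent', 0 = 'Unfit'.
x = np.array([0.10, 0.15, 0.20, 0.25, 0.60, 0.70, 0.75, 0.80])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])
se, pp, f1 = se_pp_f1(y, loocv_predict(x, y), positive=1)
print(se, pp, f1)
```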

Table 1 Performance metrics of four classifiers (SVM, LDA, LR, and TREE) evaluated through leave-one-out cross-validation on the PURE dataset.
Table 2 Performance metrics of four classifiers (SVM, LDA, LR, and TREE) evaluated through leave-one-out cross-validation on the LGI-PPGI dataset.

We also established straightforward linear thresholds using the support vectors obtained from the linear SVM applied to the labeled NSQI and KSQI features. Figure 4 displays the decision boundaries for the three comparisons, derived from a linear SVM utilizing data from the two combined datasets.
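For a single SQI, the linear SVM decision boundary \(w x + b = 0\) reduces to one cut-off, \(x^{*} = -b/w\). The sketch below illustrates this by fitting a soft-margin linear SVM in its primal form with plain subgradient descent on the hinge loss. It is a stand-in for a library solver such as scikit-learn’s `LinearSVC`, the hyperparameters are assumptions, and the data are synthetic, so the resulting threshold is unrelated to the value reported in this study.

```python
import numpy as np


def fit_linear_svm_1d(x, y, lam=1e-3, lr=0.1, epochs=5000):
    """Soft-margin linear SVM on one feature (labels y in {-1, +1}).

    Minimizes lam/2 * w**2 + mean(max(0, 1 - y * (w * x + b)))
    by full-batch subgradient descent; b is not regularized.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        active = y * (w * x + b) < 1              # samples inside the margin
        grad_w = lam * w - np.mean(np.where(active, y * x, 0.0))
        grad_b = -np.mean(np.where(active, y, 0.0))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def svm_threshold(w, b):
    """Point where w * x + b = 0, i.e. the single-feature cut-off."""
    return -b / w


# Synthetic NSQI-like values: +1 = 'Excellent' (low values), -1 = 'Unfit'.
x = np.array([0.10, 0.15, 0.20, 0.25, 0.60, 0.70, 0.75, 0.80])
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])
w, b = fit_linear_svm_1d(x, y)
thr = svm_threshold(w, b)
print(f"w={w:.2f}, b={b:.2f}, threshold={thr:.3f}")
```

Because \(w < 0\) here, the learned rule reads “label as ‘Excellent’ if the SQI is below the threshold”, which is the same form as the NSQI rule discussed below.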

Fig. 4: Decision boundary and KDE of rPPG signals classified by a linear SVM.

This figure shows the decision boundary determined by a linear support vector machine (SVM) for classifying remote photoplethysmography (rPPG) signals, alongside the kernel density estimation (KDE) for signals from each class. It illustrates the separation achieved by the linear SVM and the density of rPPG signals in each class. We considered labeled signals reconstructed with the CHROM, GREEN, and OMIT rPPG methods from the combined PURE and LGI-PPGI datasets with a window size of 5 s. For each comparison, we used the best SQI (see Tables 1 and 2): a NSQI for Excellent vs. Acceptable signals; b KSQI for Unfit vs. Acceptable signals; and c NSQI for Excellent vs. Unfit signals.

Furthermore, we have identified three distinct thresholds, as illustrated in Fig. 5, which serve as valuable tools for assessing signal quality based on the normalized index values from previous observations. The first threshold distinguishes between Excellent and Unfit signals, the second threshold separates Unfit from Acceptable signals, and the final threshold demarcates the boundary between Excellent and Acceptable signals.

Fig. 5: Application of thresholds in quality assessment of rPPG for near contact-level cardiac monitoring.

This figure illustrates the application of predetermined quality thresholds to assess the signal quality of remote photoplethysmography (rPPG) signals, reconstructed from facial videos for real-time cardiac monitoring, aiming for a precision comparable to traditional contact-based PPG signals from the fingertip. It showcases how Signal Quality Indices (SQIs) thresholds are utilized to differentiate among rPPG signals. This approach provides immediate signal quality feedback, offering a practical tool for researchers to refine rPPG-based cardiac analysis technologies.

Figure 5 summarizes the central contribution of our study: a simple approach to rPPG signal quality assessment. Real-time video analysis in this context has traditionally been resource-intensive, with its effectiveness constrained by the time complexity of the algorithms used. We address this challenge with a single SQI and a defined threshold, NSQI < 0.293, selected to identify the parts of the video that carry high-quality cardiac information.
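Applying the reported threshold at run time amounts to computing NSQI on each filtered rPPG window and keeping only the windows below 0.293. The sketch below assumes NSQI is computed as in the Methods (the variance of |y| divided by the variance of y) and that windows arrive as rows of a NumPy array. The function names and the two synthetic windows are hypothetical; the spiky pattern is simply an arbitrary signal that falls above the threshold, not a model of a real artifact.

```python
import numpy as np

NSQI_THRESHOLD = 0.293  # value reported in this study for 'Excellent' windows


def nsqi(y):
    """NSQI = var(|y|) / var(y) for one filtered rPPG window."""
    return np.var(np.abs(y)) / np.var(y)


def select_excellent(windows, threshold=NSQI_THRESHOLD):
    """Return NSQI per window and a boolean mask of windows to keep."""
    scores = np.array([nsqi(w) for w in windows])
    return scores, scores < threshold


fs, win_s = 30, 5                                 # PURE frame rate, 5-s window
t = np.arange(fs * win_s) / fs
clean = np.sin(2 * np.pi * 1.2 * t)               # 72 BPM sinusoid
spiky = np.tile([5.0, -1.0, -1.0, -1.0, -1.0], fs * win_s // 5)
scores, keep = select_excellent(np.stack([clean, spiky]))
print(np.round(scores, 3), keep)
```

For an ideal sinusoid the ratio tends to \(1 - 8/\pi^{2} \approx 0.19\), so it passes the threshold; the cost per window is a few passes over the samples, which is what makes this gate cheap enough for on-device use.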

This innovative approach markedly reduces the computational burden, simultaneously maintaining high accuracy in the extraction of cardiac data from video streams. The ability to selectively filter video frames based on their assessed quality, focusing exclusively on those classified as excellent, is a distinctive feature of our study. Such precision in identifying high-quality frames is instrumental in optimizing rPPG signal processing, paving the way for more efficient and accurate remote cardiac monitoring technologies.

Discussion

This study introduces a novel approach for assessing the quality of rPPG signals using eight SQIs across three signal quality classes. While the findings are promising, we recognize several limitations:

  1. Dataset: The dataset used in this study was limited in both size and diversity, featuring a small number of subjects with limited variation in age and skin color. These constraints were due to practical limitations at the time of the study. Future research with larger and more diverse datasets is crucial to enhance the generalizability and applicability of our findings.

  2. Equipment: We employed only a standard webcam and an industrial camera for signal collection. This setup may not fully represent real-world scenarios where a variety of camera types and sensors, especially those in smartphones, are commonly used. Future studies should assess the applicability of our method using a broader range of sensing modalities to ensure its relevance in different contexts.

  3. Methodology: Our methodology relies on handcrafted features and traditional machine learning algorithms. Although this approach was effective for our specific dataset, its performance may vary in other datasets or scenarios that require different features or algorithms. Investigating advanced techniques, such as deep learning, could potentially improve the accuracy and robustness of rPPG signal quality assessment in future research.

  4. Cardiometrics: The current study focused on the quality of rPPG signals without exploring their correlation with specific physiological parameters like heart rate, respiratory rate, and blood oxygen saturation. Understanding this relationship is essential to determine the clinical relevance of rPPG measurements. We recommend that future studies investigate this correlation and explore the potential clinical applications of rPPG technology.

While this study aimed to identify a singular optimal SQI, it opens avenues for future research in several areas:

  1. Combining SQIs: Future research could explore the feasibility of combining multiple SQIs to potentially enhance the quality of rPPG signal assessment. This exploration could lead to more accurate and reliable biosensing applications.

  2. Applications: Additional studies are encouraged to explore the broader applications of rPPG technology in clinical and non-clinical settings, further expanding the utility of this promising biosensing method.

In conclusion, this study introduces a straightforward methodology for rPPG signal quality assessment that simplifies real-time video analysis, traditionally a complex and resource-intensive task. Using a single SQI with a specific threshold (NSQI < 0.293), we consistently identified high-quality cardiac information in video frames. This methodology makes the extraction of cardiac data from video more efficient and supports higher accuracy by restricting analysis to excellent-quality frames, which our findings suggest can be reliably used for accurate heart rate and cardiac analysis. Using a single SQI in this way offers a practical solution to one of the most challenging aspects of remote biosensing.

Methods

For our study, we utilized two prominent datasets: the LGI-PPGI16 dataset from Pilz et al. and the PURE17 dataset from Stricker et al. Together, these datasets offer a diverse range of scenarios and subjects, encompassing 16 individuals (3 females and 13 males) performing various head motions, resulting in a total of 84 videos. Importantly, each subject’s pulse measurements are available, providing crucial data for our analysis.

The LGI-PPGI dataset provides an in-depth look at the rPPG signal processing in real-world conditions. It includes video recordings of six participants, each performing four distinct activities, yielding 24 videos in total. These activities, ranging from minimal head movement to active scenarios, are critical in understanding the versatility and challenges of rPPG technology:

  1. Resting: Minimal movement, indoor setting.

  2. Gym: Significant movement, indoor setting.

  3. Talk: Outdoor setting with minimal movement.

  4. Rotation: Head rotation, indoor setting.

Video lengths vary, capturing realistic scenarios. The pulse oximeter used for reference signals operates at 60 Hz, while the RGB camera records at 25 Hz.

The PURE dataset complements our study with 10 participants engaged in controlled head movements, recorded in 60 sequences. These sequences cover a range of motions, providing valuable insights into the robustness of rPPG methods:

  1. Steady: Stationary, direct camera gaze.

  2. Talking: Minimal movement, simulated conversation.

  3. Slow Translation: Parallel head movements.

  4. Fast Translation: Increased speed of head movements.

  5. Small Rotation: Head orientation towards nearby targets.

  6. Medium Rotation: Broader head rotation.

Each video, approximately one minute long, was recorded with an eco274CVGE camera and a finger clip pulse oximeter under varying natural light conditions. The detailed recording setup enhances the reliability of our findings.

We explored various algorithms for reconstructing remote photoplethysmogram (rPPG) signals from RGB data, each offering unique insights into signal processing. Utilizing the comprehensive pyVHR Python framework18,19, we implemented a range of rPPG methods. Below is an overview of these methods, detailing their distinct approaches to extracting rPPG signals:

  • CHROM20: CHROM builds two chrominance signals from the temporally normalized color channels and combines them to suppress distortions caused by specular reflection and motion.

  • PBV21: PBV uses the characteristic blood volume pulse signature, i.e., the direction of pulse-induced color changes in normalized RGB space, to separate pulse-related variations from motion-induced ones.

  • ICA22: ICA applies independent component analysis to the RGB signal and selects the component that most prominently carries the rPPG signal.

  • PCA23: This technique separates the rPPG signal from the overall RGB signal using principal component analysis.

  • POS24: POS projects the temporally normalized RGB signal onto a plane orthogonal to the skin tone to derive the rPPG signal.

  • LGI16: LGI derives the rPPG signal from a projection that is invariant to local transformations, which improves robustness.

  • OMIT25: OMIT reconstructs the rPPG signal through matrix decomposition, ensuring that the components are linearly uncorrelated.

  • GREEN26: GREEN uses the green color channel as the PPG estimate, because this channel carries the strongest pulsatile component.

These methods represent the diverse approaches in rPPG signal processing, each contributing to a more comprehensive understanding of this advanced technology. Note that each rPPG method was applied independently, followed by a unified filtering step. This approach ensures that the final signal quality assessment is not biased towards any single extraction method, providing a comprehensive evaluation.
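Three of the listed methods have short closed-form definitions, sketched below in NumPy for a single window of spatially averaged RGB values. They follow the published formulations of GREEN, CHROM (de Haan and Jeanne), and POS (Wang et al.) as we understand them, but omit details such as CHROM’s internal band-pass filtering and the overlap-add over sliding sub-windows used by POS; they are not the pyVHR implementations. The synthetic input adds a pulse along an assumed normalized blood-volume-pulse direction of roughly (0.33, 0.77, 0.53); the skin color values are hypothetical.

```python
import numpy as np


def green(rgb):
    """GREEN: the mean-removed green channel. rgb has shape (n_frames, 3)."""
    g = rgb[:, 1]
    return g - g.mean()


def chrom(rgb):
    """CHROM: combine two chrominance signals from temporally normalized RGB."""
    r, g, b = (rgb / rgb.mean(axis=0)).T
    xs = 3 * r - 2 * g
    ys = 1.5 * r + g - 1.5 * b
    alpha = xs.std() / ys.std()                   # balances the two chrominance signals
    s = xs - alpha * ys
    return s - s.mean()


def pos(rgb):
    """POS: project normalized RGB onto the plane orthogonal to the skin tone."""
    c = rgb / rgb.mean(axis=0)
    proj = np.array([[0.0, 1.0, -1.0], [-2.0, 1.0, 1.0]])
    s1, s2 = proj @ c.T                           # two projected signals
    h = s1 + (s1.std() / s2.std()) * s2
    return h - h.mean()


# Synthetic 10-s window at 30 fps: constant skin color plus a 1.2 Hz pulse.
fs = 30
t = np.arange(10 * fs) / fs
pulse = np.sin(2 * np.pi * 1.2 * t)
skin = np.array([120.0, 90.0, 70.0])              # hypothetical mean R, G, B
signature = np.array([0.33, 0.77, 0.53])          # assumed pulse direction
rgb = skin * (1 + 0.01 * np.outer(pulse, signature))
for name, fn in [("GREEN", green), ("CHROM", chrom), ("POS", pos)]:
    print(name, round(abs(np.corrcoef(fn(rgb), pulse)[0, 1]), 3))
```

The absolute correlation is used because CHROM’s output has the opposite sign to the injected pulse under this signature; polarity conventions differ between methods.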

Here, we describe the pipeline used for extracting the rPPG (remote photoplethysmography) signal along with the corresponding SQI (Signal Quality Index) values from video recordings. We employed the pyVHR toolbox developed by Boccignone et al.18 to carry out this signal reconstruction. Figure 6 provides a visual representation of the pipeline, where the RGB signal represents the time series of the average color channel values, and the rPPG method refers to the techniques employed to derive the rPPG signal.

Fig. 6: Pipeline from a selfie video to the quality detection of rPPG after applying SQIs.

Note: RGB Red, Green, and Blue color channel values, rPPG remote photoplethysmogram.

The initial step of our pipeline involves the frame-by-frame identification of facial patches on the subjects. We achieve this using MediaPipe Face Mesh27, a real-time solution that estimates 468 3D facial landmarks. The selection of specific facial regions is based on their influence on reflected light due to blood volume changes. Factors such as light conditions28, motion29, and makeup30 are also considered. In our datasets, we encountered varying light conditions, motion, and subjects without makeup. Recent research has shown that the cheeks and forehead are optimal regions for rPPG extraction31,32, as supported by their frequent use in the literature33,34.

Consequently, our study assesses a total of 44 landmarks across three different facial regions: the forehead, left cheek, and right cheek. These landmarks are denoted by specific numbers within the pyVHR framework: (10, 67, 69, 104, 108, 109, 151, 299, 337, 338) for the forehead; (36, 47, 50, 100, 101, 116, 117, 118, 119, 123, 126, 147, 187, 203, 205, 206, 207, 216) for the left cheek; (266, 280, 329, 330, 346, 347, 347, 348, 355, 371, 411, 423, 425, 426, 427, 436) for the right cheek.

Each landmark value is the mean computed across all pixels within a 20 × 20 patch surrounding it. This yields 44 RGB values per video frame, i.e., a 1500 × 44 × 3 matrix for the LGI-PPGI dataset and an 1800 × 44 × 3 matrix for the PURE dataset. These dimensions follow from the video length and camera frame rate (60 s at 25 fps gives 1500 frames; 60 s at 30 fps gives 1800 frames), the number of landmarks, and the three RGB channels.
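The patch-averaging step can be sketched as follows, assuming the landmarks have already been converted to integer pixel coordinates (MediaPipe returns normalized x, y values that must be scaled by the image width and height). Patches that extend past the image border are clipped. The function names and the synthetic frames are hypothetical, and this is not the pyVHR implementation.

```python
import numpy as np


def landmark_patch_means(frame, points, half=10):
    """Mean RGB of a (2*half x 2*half) patch around each landmark.

    frame: (H, W, 3) RGB image; points: (n_landmarks, 2) integer (row, col).
    With half=10 this is the 20 x 20 patch described in the text.
    """
    h, w, _ = frame.shape
    out = np.empty((len(points), 3))
    for k, (r, c) in enumerate(points):
        r0, r1 = max(r - half, 0), min(r + half, h)   # clip at the image border
        c0, c1 = max(c - half, 0), min(c + half, w)
        out[k] = frame[r0:r1, c0:c1].reshape(-1, 3).mean(axis=0)
    return out


def video_patch_means(frames, points_per_frame):
    """Stack per-frame patch means into an (n_frames, n_landmarks, 3) array."""
    return np.stack([landmark_patch_means(f, p)
                     for f, p in zip(frames, points_per_frame)])


# Synthetic example: 25 frames (1 s at 25 fps), 44 landmarks per frame.
rng = np.random.default_rng(0)
frames = rng.uniform(0, 255, size=(25, 120, 160, 3))
points = [np.column_stack([rng.integers(0, 120, 44), rng.integers(0, 160, 44)])
          for _ in range(25)]
patches = video_patch_means(frames, points)       # (25, 44, 3)
rgb_trace = patches.mean(axis=1)                  # (25, 3): average over landmarks
print(patches.shape, rgb_trace.shape)
```

For a full 60-s LGI-PPGI video the same code would return the 1500 × 44 × 3 array described above; averaging over the landmark axis gives the per-frame RGB time series used in the next step.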

Next, an RGB time series is created by averaging the 44 RGB values in each frame. In the third step, we assess the ability of SQIs to classify signal quality for short window sizes: we apply the eight rPPG methods with window sizes ranging from 1 to 12 s and then filter the resulting signals with a sixth-order band-pass filter (0.7–3.5 Hz). This yields filtered rPPG time series with dimensions (60/window size) × (25 × window size) for the LGI-PPGI dataset and (60/window size) × (30 × window size) for the PURE dataset, corresponding to the number of windows and the number of frames per window.
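The windowing and filtering step can be sketched with NumPy and SciPy as below. Two choices here are assumptions, since the text does not specify them: we pass the stated filter order of six as `N=6` to `scipy.signal.butter` with `btype="bandpass"`, which yields a 12th-order band-pass filter, and we filter with zero phase using `sosfiltfilt`. With its default padding, `sosfiltfilt` rejects windows that are not longer than the padding, which at this order excludes the 1-s windows. The function names and the synthetic input are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def split_windows(x, fs, win_s):
    """Non-overlapping windows: len(x) // (fs*win_s) rows of fs*win_s samples."""
    n = int(round(fs * win_s))
    n_win = len(x) // n
    return x[: n_win * n].reshape(n_win, n)


def bandpass(windows, fs, low=0.7, high=3.5, order=6):
    """Zero-phase Butterworth band-pass applied to each window (last axis)."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, windows, axis=-1)


fs = 30                                           # PURE frame rate
t = np.arange(60 * fs) / fs                       # 60-s recording
raw = 5.0 + np.sin(2 * np.pi * 1.2 * t)           # DC offset + 72 BPM component
windows = split_windows(raw, fs, win_s=5)         # (12, 150)
filtered = bandpass(windows, fs)
print(windows.shape, round(float(np.abs(filtered.mean(axis=1)).max()), 3))
```

The resulting shapes match the (60/window size) × (fs × window size) layout given in the text; the band-pass removes the DC offset while keeping the 1.2 Hz component.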

Note that the rPPG reconstruction incorporates color channel transformations within the selected rPPG methods, and that filtering is used to extract the rPPG signal from the RGB signal; this filtering step occurs after the eight rPPG methods are applied. In the final step, we extract eight SQI features from the filtered rPPG time series, each calculated as the mean value across all windows.

This comprehensive pipeline allows us to accurately reconstruct the rPPG signal and evaluate its quality in various scenarios.

Here, we present an evaluation of eight SQIs used to assess the quality of PPG (Photoplethysmography) signals. Each SQI is discussed along with its respective mathematical formula. The selection of these eight SQIs for PPG signal quality assessment is based on their demonstrated relevance and effectiveness in existing literature. These SQIs have been widely explored and proven to be informative indicators of PPG signal quality, making them suitable candidates for our study. It is important to note that the equations and definitions of these SQIs are adapted from15.

  • Perfusion (PSQI): The perfusion index is a standard measure for evaluating PPG signal quality. It is calculated as the ratio of pulsatile blood flow to static blood in peripheral tissue, often obtained from a pulse oximeter. The formula for the perfusion index is given by:

    $$P_{\rm{SQI}}=\frac{y_{\rm{max}}-y_{\rm{min}}}{|\bar{x}|}\times 100,$$

    where \(\bar{x}\) represents the statistical mean of the raw PPG signal, and y is the filtered PPG signal. In our study, we evaluate the perfusion index using rPPG methods such as CHROM and POS, known for their effectiveness in filtering out noise caused by light reflection.

  • Skewness (SSQI): Skewness is a measure of the asymmetry of a probability distribution and is related to distorted PPG signals. It is defined as:

    $$S_{\rm{SQI}}=\frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_{i}-\widehat{\mu}_{x}}{\sigma}\right)^{3},$$

    where \({\widehat{\mu }}_{x}\) represents the empirical estimate of the mean of xi, σ is the standard deviation, and N is the number of samples in the PPG signal.

  • Kurtosis (KSQI): Kurtosis measures how the tails of a distribution differ from those of a normal distribution. It determines if extreme values are present in the distribution and has been found to be a good indicator of PPG signal quality. Kurtosis is defined as:

    $$K_{\rm{SQI}}=\frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_{i}-\widehat{\mu}_{x}}{\sigma}\right)^{4},$$

    with \(\widehat{\mu}_{x}\), σ, and N defined as for SSQI.

  • Entropy (ESQI): Entropy quantifies the uncertainty in a signal’s probability density function (PDF) and is another effective indicator of PPG signal quality. Its formula is given as:

    $$E_{\rm{SQI}}=-\sum_{n=1}^{N}x[n]^{2}\log_{e}\left(x[n]^{2}\right),$$

    where x represents the raw PPG signal, and N is the number of data points.

  • Zero crossing rate (ZSQI): The zero crossing rate reflects how often the signal changes sign. Following15, it is computed as the fraction of samples of the filtered signal that are negative:

    $$Z_{\rm{SQI}}=\frac{1}{N}\sum_{n=1}^{N}\mathbb{1}\{y_{n}<0\},$$

    where y is the filtered PPG signal of length N, and \(\mathbb{1}\{A\}\) is the indicator function.

  • Signal-to-noise Ratio (NSQI): This SQI compares the power of the desirable signal to the power of undesired background noise. The formula is:

    $$N_{\rm{SQI}}=\frac{\sigma_{\rm{signal}}^{2}}{\sigma_{\rm{noise}}^{2}},$$

    where σsignal is the standard deviation of the absolute value of the filtered PPG signal (y), and σnoise is the standard deviation of the y signal.

  • Matching of multiple systolic wave detection algorithms (MSQI): Different systolic wave detectors are sensitive to different types of noise. This SQI compares the outputs of two PPG systolic wave detection algorithms to assess how consistently they detect the same events. The formula is defined as:

    $$M_{\rm{SQI}}=\frac{|S_{\rm{Billauer}}\cap S_{\rm{AT}}|}{|S_{\rm{Billauer}}|},$$

    where SBillauer represents systolic waves detected by Billauer’s algorithm, and SAT represents systolic waves detected with an algorithm based on the first derivative with adaptive thresholds.

  • Relative power (RSQI): This SQI assesses PPG signal quality in the frequency domain. It is the ratio of the power spectral density (PSD) in the 1–2.25 Hz band to the PSD over the entire 0–8 Hz range:

    $$R_{\rm{SQI}}=\frac{\sum_{f=1}^{2.25}\mathrm{PSD}(f)}{\sum_{f=0}^{8}\mathrm{PSD}(f)},$$

    where PSD is calculated using Welch’s method.
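Seven of the eight SQIs translate directly into NumPy and SciPy, as sketched below for one window; per-signal values are then obtained by averaging over windows, as in the pipeline. MSQI is omitted because it requires two specific systolic-peak detectors (Billauer’s method and an adaptive-threshold detector) that are not specified in enough detail here. The Welch segment length, the handling of 0·log 0 in ESQI, and all function names are our assumptions.

```python
import numpy as np
from scipy.signal import welch


def _z(x):
    return (x - x.mean()) / x.std()


def psqi(x_raw, y):
    """Perfusion: filtered peak-to-peak range over |mean of raw signal|, in %."""
    return (y.max() - y.min()) / abs(x_raw.mean()) * 100


def ssqi(x):
    return np.mean(_z(x) ** 3)                    # skewness


def ksqi(x):
    return np.mean(_z(x) ** 4)                    # (non-excess) kurtosis


def esqi(x):
    e = x ** 2
    e = e[e > 0]                                  # treat 0 * log(0) as 0
    return -np.sum(e * np.log(e))


def zsqi(y):
    return np.mean(y < 0)                         # fraction of negative samples


def nsqi(y):
    return np.var(np.abs(y)) / np.var(y)


def rsqi(y, fs):
    f, pxx = welch(y, fs=fs, nperseg=min(len(y), 256))
    band = (f >= 1.0) & (f <= 2.25)
    full = f <= 8.0
    return pxx[band].sum() / pxx[full].sum()


def mean_over_windows(sqi, windows, **kwargs):
    """Average an SQI over all windows of one signal."""
    return float(np.mean([sqi(w, **kwargs) for w in windows]))


fs = 30
t = np.arange(5 * fs) / fs
y = np.sin(2 * np.pi * 1.2 * t)                   # idealized filtered window
x_raw = 100.0 + y                                 # raw trace with a DC level
print(round(psqi(x_raw, y), 2), round(ssqi(y), 3), round(ksqi(y), 3),
      round(zsqi(y), 3), round(nsqi(y), 3), round(rsqi(y, fs), 3))
```

For example, `mean_over_windows(rsqi, windows, fs=30)` would give the per-signal RSQI for an array of 30-fps windows.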

These eight SQIs provide comprehensive insights into the quality of PPG signals, allowing for the assessment and improvement of data reliability in various scenarios.

Here, we describe the process of automatically labeling rPPG signals into three distinct classes. These labels help assess the quality of the signals obtained through our pipeline.

  1. Excellent: rPPG signals from which the heart rate (HR) can be estimated to within ±5 beats per minute (BPM) and in which both systolic and diastolic waves are discernible.

  2. Acceptable: Signals whose HR estimate is accurate to within ±5 BPM but whose systolic and diastolic waves are not discernible.

  3. Unfit: Signals that do not provide a reliable HR estimate and whose systolic and diastolic waves are not discernible.

Our classification relies on quantitative analyses using dynamic time warping (DTW) and Pearson correlation (r). Specifically, the ‘Excellent’ class is assigned to rPPG signals in which both systolic and diastolic waves are discernible and the HR estimate is accurate to within ±5 BPM. Each evaluation metric is defined as follows:

  1. Beats-per-minute difference (ΔBPM): We start by distinguishing between ‘Unfit’ and ‘Acceptable’ rPPG signals. ΔBPM is the mean absolute difference between the predicted (rPPG) BPM and the ground-truth (PPG) BPM across all signal windows, where the predicted BPM is given by the frequency of the maximum in the power spectrum (PS) of the rPPG signal. Signals with ΔBPM ≤ 5 are considered ‘Acceptable,’ in line with the error ranges accepted for HR estimation by commercial wearables.

  2. Dynamic time warping (DTW): DTW is a robust algorithm that handles temporal fluctuations and noise. We use DTW to quantify the similarity and alignment between the rPPG and reference PPG signals, evaluating how well their temporal features correspond. A lower DTW score indicates better alignment.

  3. Correlation (r): Pearson correlation measures the linear relationship between two variables. We calculate r between the PPG and rPPG samples within each window to assess their linear association.
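The three metrics can be sketched as follows. The HR estimate takes the frequency of the power-spectrum maximum within the 0.7–3.5 Hz band used elsewhere in the pipeline. The zero-padding length, the classic O(nm) DTW with an absolute-difference cost, and the linear resampling of the 60-Hz reference onto the camera time base (needed before comparing samples) are our assumptions, since the text does not specify them; all function names and signals are hypothetical.

```python
import numpy as np


def bpm_from_spectrum(y, fs, band=(0.7, 3.5), n_fft=2048):
    """HR in BPM from the power-spectrum peak inside `band`."""
    n = max(len(y), n_fft)                        # zero-pad for a finer grid
    power = np.abs(np.fft.rfft(y - y.mean(), n=n)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return 60.0 * freqs[in_band][np.argmax(power[in_band])]


def mean_abs_delta_bpm(rppg_windows, ppg_windows, fs_rppg, fs_ppg):
    """|delta BPM| averaged over all windows."""
    return float(np.mean([abs(bpm_from_spectrum(a, fs_rppg) - bpm_from_spectrum(b, fs_ppg))
                          for a, b in zip(rppg_windows, ppg_windows)]))


def resample(x, fs_in, fs_out):
    """Linearly interpolate x onto a time grid sampled at fs_out."""
    t_in = np.arange(len(x)) / fs_in
    t_out = np.arange(int(len(x) * fs_out / fs_in)) / fs_out
    return np.interp(t_out, t_in, x)


def dtw_distance(a, b):
    """Classic dynamic time warping with |a_i - b_j| as the local cost."""
    n, m = len(a), len(b)
    d = np.full((n + 1, m + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return d[n, m]


def pearson_r(a, b):
    return float(np.corrcoef(a, b)[0, 1])


fs_cam, fs_ref = 25, 60                           # LGI-PPGI camera and oximeter rates
t_cam = np.arange(5 * fs_cam) / fs_cam
t_ref = np.arange(5 * fs_ref) / fs_ref
rppg = np.sin(2 * np.pi * 1.2 * t_cam)            # synthetic 72 BPM rPPG window
ppg = np.sin(2 * np.pi * 1.25 * t_ref)            # synthetic 75 BPM reference window
ppg_cam = resample(ppg, fs_ref, fs_cam)
print(mean_abs_delta_bpm([rppg], [ppg], fs_cam, fs_ref),
      dtw_distance(rppg, ppg_cam), pearson_r(rppg, ppg_cam))
```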

The final labeling decision is based on the overall score (OS), calculated as:

$$OS=\frac{1}{2}\left(1-\mathrm{DTW}+r\right).$$

Here, DTW and r are the normalized values averaged across all signal windows for each of the eight rPPG methods. Signals with OS > 0.5 are labeled ‘Excellent,’ while those below this threshold are labeled ‘Acceptable’ or ‘Unfit’ according to their ΔBPM. We observed that window sizes below 3 s lead to unreliable BPM estimates and often result in ‘Unfit’ labels.
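The labeling rule can be written as a small function. The text fixes the OS formula, the OS > 0.5 cut-off, and the ΔBPM ≤ 5 tolerance; how these are combined for the ‘Excellent’ class (here: OS > 0.5 and ΔBPM ≤ 5) and the assumption that DTW has already been normalized to [0, 1] are our reading rather than a stated specification. The function names are hypothetical.

```python
def overall_score(dtw_norm, r):
    """OS = (1 - DTW + r) / 2 with DTW normalized to [0, 1] and r in [-1, 1]."""
    return 0.5 * (1.0 - dtw_norm + r)


def label_signal(dtw_norm, r, mean_abs_dbpm, os_cut=0.5, bpm_tol=5.0):
    """Assign 'Excellent', 'Acceptable' or 'Unfit' (assumed combination of rules)."""
    hr_ok = mean_abs_dbpm <= bpm_tol
    if hr_ok and overall_score(dtw_norm, r) > os_cut:
        return "Excellent"
    if hr_ok:
        return "Acceptable"
    return "Unfit"


print(label_signal(0.2, 0.8, 2.0))    # OS = 0.8 and small HR error
print(label_signal(0.9, 0.1, 3.0))    # OS = 0.1, HR still within tolerance
print(label_signal(0.9, 0.1, 12.0))   # HR error too large
```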

We define ‘inter-rater agreement’ as the consensus among the eight rPPG methods, each treated as an individual ‘rater,’ in labeling rPPG signal quality as ‘Excellent,’ ‘Acceptable,’ or ‘Unfit’ using the OS-based labeling rule described above.

Our evaluations consider factors such as color channel transforms, window length, and class distribution of reconstructed signals. Notably, window sizes below 3 s are associated with unreliable BPM estimations. We further analyze the impact of increasing window sizes on signal labeling and the balance between classifications. This comprehensive approach enables a thorough assessment of rPPG method performance and signal quality.