Synthetic photoplethysmogram generation using two Gaussian functions

Evaluating the performance of photoplethysmogram (PPG) event detection algorithms requires a large number of PPG signals with different noise levels and sampling frequencies. As publicly available PPG databases provide few options, artificially constructed PPG signals can also be used to facilitate this evaluation. Here, we propose a dynamic model to synthesize PPG over specified time durations and sampling frequencies. In this model, a single pulse was simulated by two Gaussian functions. Additionally, the beat-to-beat intervals were simulated using a normal distribution with a specific mean value and a specific standard deviation value. To add periodicity and to generate a complete signal, the circular motion principle was used. We synthesized three classes of pulses by emulating three different templates: excellent (systolic and diastolic waves are salient), acceptable (systolic and diastolic waves are not salient), and unfit (systolic and diastolic waves are noisy). The optimized model fitting of the Gaussian functions to the templates yielded 0.99, 0.98, and 0.85 correlations between the template and synthetic pulses for the excellent, acceptable, and unfit classes, respectively, with mean square errors of 0.001, 0.003, and 0.017, respectively. By comparing the heart rate variability of real PPG and randomly synthesized PPG for 5 min in 116 records from the MIMIC III database, strong correlations were found in SDNN, RMSSD, LF, HF, SD1, and SD2 (0.99, 0.89, 0.84, 0.89, 0.90 and 0.95, respectively).

Scientific RepoRtS | (2020) 10:13883 | https://doi.org/10.1038/s41598-020-69076-x www.nature.com/scientificreports/ is associated with the closing of the aortic valve 10 . Note that a number of factors, such as gender, age, and disease, can change the morphology and duration of the PPG pulse, making it appear different from a normal PPG pulse. There are four publicly available PPG databases: Multiparameter Intelligent Monitoring in Intensive Care (MIMIC III) 11 , BioSec.Lab 12 , PPG-BP 13 , and Wrist PPG 14 . The MIMIC database signals are recorded at a sampling rate of 125 Hz and have different time durations, but the subject's condition at the time of measuring is not clear. The signals were recorded at 100 Hz in the BioSec.Lab PPG dataset with a 3-min duration of each segment, and these subjects are measured at the fingertip, and in different conditions, such as relax, after exercise, short time-lapse and long time-lapse. The signals recorded in the PPG-BP database are sampled at a rate of 1 kHz for 2.1 s, and the measuring site is the fingertip. In the Wrist PPG database, the signals were recorded at a sampling rate of 256 Hz and varying durations, and recorded during exercise, the measuring site is the wrist. Typically, algorithms developed and tested on these databases are database-dependent. To evaluate developed algorithms reliably, it needs to test them in different conditions. Although the recording conditions of different databases may differ, there are few options for different conditions, which makes it challenging to evaluate the performance of currently available algorithms in different clinical settings over a range of noise levels and sampling rates. There is a need for a synthesizer that can generate PPG dynamically (with variable lengths, different sampling rates, different types of morphologies, and noise levels). Besides, synthesized PPG signals can also be used to reconstruct missing segments of the PPG signal.
Modeling of PPG pulses has been attempted earlier studies. Shariati and Zahedi compared four linear parametric models 15 . Wang et al. used multi-Gaussian functions to fit a single PPG waveform 16 . They compared the simulation effects of different numbers of Gaussian functions. However, these only addressed a single heartbeat. Other papers generated a PPG signal from a sequence of pulses. Martin-Martinez et al. 17 proposed stochastic modeling to synthesize PPG signals. They designed a single-pulse model based on two Gaussian functions comprising 10 parameters. The mean of two autoregressive moving average models was used to approximate these 10 parameters; however, the original PPG signal is required for the time evolution of the 10 parameters in the model. Sološenko et al. proposed a model for simulating PPG during atrial fibrillation 18 . They used one log-normal and two Gaussian waveforms to simulate a single pulse and extract the RR intervals from the ECG in order to connect individual PPG pulses according to the RR intervals; however, their model required an ECG signal as an input parameter. In contrast, this paper proposes a dynamic model that can generate synthetic PPGs with different lengths and sampling frequencies.

Results
The dataset we used was 116 subjects from the MIMIC III database 11 . The PPG for each subject was recorded at a sampling frequency of 125 Hz for a duration of 5 min.

Similarity in single pulses.
To test the ability of the model to simulate different shapes of PPG pulse, the model was used to emulate three classes 19 of normalized PPG templates (referred to as excellent PPG, acceptable PPG, and unfit PPG). Excellent PPG occurs when the systolic and diastolic waves are salient. Acceptable PPG occurs when the systolic and diastolic waves are not salient, but the heart rate can be determined. Unfit PPG occurs when the heart rate cannot be determined and the systolic and diastolic waves cannot be distinguished. The circular motion principle. ω is the angular velocity; it is fixed per pulse, but changes with different pulses. This figure shows how the circular motion will be used to generate a PPG waveform in a periodical manner.
Scientific RepoRtS | (2020) 10:13883 | https://doi.org/10.1038/s41598-020-69076-x www.nature.com/scientificreports/ There is one template for each class of PPG pulse. These templates are extracted manually from the database we used and from different subjects. The mean square error (MSE) and the correlation coefficient were used to evaluate the model-fitting accuracy of the algorithm. MSE is expressed as follows: where z p (n) and s(n) represent individual points of the synthetic PPG and real PPG, respectively, and l is the length of the signal. Figure 2 shows the morphology of the excellent and acceptable template PPG and the synthetic PPG using the optimization parameters. And Fig. 5 shows the result of simulating the unfit template. The corresponding optimized parameters, MSEs, and correlations are shown in Table 1. This model achieved a high correlation and low MSE when simulating the excellent and acceptable templates. Note that the length of these templates was different.
Variability of beat-to-beat intervals. Valley-to-valley intervals are always similar to peak-to-peak intervals in PPGs; both of them correspond to heart rate 20 . To test the pulse duration generated by the proposed model, we compared heart rate variability (HRV) of the synthetic PPG and the real PPG. The real PPGs are the 116 subjects from the MIMIC III database. For each real PPG, we synthesized a PPG by the proposed model with the same mean heart rate and standard deviation of the beat to beat intervals. To look for the peaks, the real PPG is filtered by the Chebyshev II filter at the frequency range 0.5-15 Hz. And then, a simple algorithm that looks for local maxima within a small window was used to detect the peaks. After synthesizing the PPG without noise, the synthesized PPG will undergo the same processing to obtain peaks. These peaks were used to calculate the HRV parameters. Note that the synthetic PPG is the same length as real PPG.
HRV is defined as the variation in the time interval between heartbeats. We compared the HRV parameters of real PPG and synthetic PPG for the following: standard deviation of beat to beat intervals (SDNN) and root Figure 2. Synthesized PPG signal using (a) excellent pulse template and (b) acceptable template. The optimal parameters for generating synthesized PPG signals based on these templates are shown in Table 1. An excellent PPG template means that the systolic and diastolic waves are salient (more clinical features, including heart rate, can be determined). An acceptable PPG template means that the systolic and diastolic waves are not salient (only heart rate can be determined). Here, MSE is the mean square error between the synthetic PPG and the template PPG, and r is the correlation coefficient. This figure shows that the model can generate excellent and acceptable PPG pulses with high correlation and low MSE.
Scientific RepoRtS | (2020) 10:13883 | https://doi.org/10.1038/s41598-020-69076-x www.nature.com/scientificreports/ mean square of the successive differences between adjacent beat-to-beat intervals (RMSSD) in the time domain, low-frequency (LF) power and high-frequency (HF) power in the frequency domain, the standard deviation of the Poincare plot (PP) perpendicular to the line of identity (SD1) and the standard deviation of the PP along to the line of identity (SD2) 21,22 . LF is the total spectral power of all beat-to-beat intervals between 0.04 and 0.15 Hz, while HF is total spectral power between 0.15 and 0.4 Hz. Figure 3 shows the comparison of the HRV between the synthetic PPG without noise and the real PPG over a 5-min interval of data for the 116 records. Pulse parameters of the synthetic PPG corresponded to those of the excellent pulse template. Noise addition. Figure 4 shows the 10-s synthetic PPG generated by the dynamic model. Different levels of noise were added to Fig. 4a-c. Min-max normalization was performed after adding noise. Three different values of low-frequency noise were added in Fig. 4a, five different values of high-frequency noise were added in Fig. 4b, and a mixture of five different values, including low and high-frequency noise, was added in Fig. 4c.

Discussion
This work developed a dynamic model for generating synthetic PPGs with any sampling frequency and duration that are characteristic of real PPGs. The average morphology can be controlled by specific parameters as in dynamic ECG models 23 . Table 2 compares our work to other studies. The reason behind using only two Gaussian functions to represent the main two phases of the cardiac cycle: Diastole and systole. Increasing the number of Gaussian functions can improve the fit accuracy 16 , but the advantage of using only two Gaussian functions is that the function parameters correspond to the features of the systolic and diastolic peaks. We used MSE and correlation as similarity measures to compare between the original PPG signal and synthesized. The reason behind using MSE is it is a very popular distance measure for quantifying similarity 24 , and the reason behind using Pearson's Correlation coefficient is the robustness in quantifying morphological changes in time series physiological signals such as PPG 25 . To obtain optimal parameters for the model, the two similarity measures were used simultaneously. In other words, we optimized the model's parameters based on MSE and Pearson's correlation coefficient at the same time. By using the optimized parameters, this model can simulate both excellent and acceptable real PPG pulses even if only two Gaussian functions are used; however, it is not highly correlated for unfit templates. It is difficult to accurately extract the features using the morphology of unfit PPG beats. If a higher fit accuracy is required, we could easily add more Gaussian functions in a single pulse to the model.
The dynamic model converts the independent variable t in the Gaussian functions from time to angle using the arctan2 function. The six model parameters are then used as is without additional tuning for generating PPG waveforms. Optimized model parameters (obtained from tuning a certain PPG template) can be used for synthesizing modified forms of the PPG signal by changing the sampling rate, morphology and duration. This formulation could be considered as an improvement over previous formulations. Moreover, the formulation of the model is more clinically appealing and interpretable as the two Gaussian functions are representing systole and diastole.
Another advantage of the proposed model is that it can generate PPG signals without using additional biosignals such as ECG. The proposed model generates beat-to-beat intervals in a random fashion to simulate real PPG signals. Martin-Martinez 17 used two autoregressive moving average models to approximate beat-to-beat intervals based on real PPG signals; however, it requires the original PPG to generate a synthesized PPG signal. Sološenko 18 used an additional biosignal (from an ECG) to induce variability in the beat-to-beat durations of the synthesized PPG signal. In contrast, our proposed model only requires the mean HR and the standard deviation of the beat-to-beat intervals to synthesize a PPG without the need for additional biosignals.
The objective of this work was to define a model that could generate synthetic PPGs representing excellent and acceptable PPG waveforms. The model could not simulate unfit forms. As shown in Fig. 5, the MSE between the synthetic PPG and the real PPG was 0.0166 and the correlation was 0.85 for the unfit template. The optimization Table 1. The optimal parameters for each PPG template obtained using the interior point method. θ 1 and θ 2 are the position of the peak centers of the first Gaussian function and the second Gaussian function, respectively; a 1 and a 2 are the peak heights, b 1 and b 2 are the standard deviations of the first and second Gaussian functions; respectively. MSE is the mean square error, and r is the correlation coefficient. An excellent PPG template means that the systolic and diastolic waves in pulse are salient. An acceptable PPG template means that the systolic and diastolic waves are not salient, but heart rate can be determined. Unfit means that the heart rate cannot be determined and the systolic and diastolic waves cannot be distinguished.  Table 1. The optimization algorithm stopped when the current step size was less than the selected value of the step-size tolerance (1 × 10 −10 ) . The value of the objective function cannot be close to zero when the PPG is unfit. Using two Gaussian functions to represent a single PPG waveform is not sufficient to adequately capture the morphological changes of the unfit template. Therefore, one of our next steps is to increase the number of Gaussian functions to generate different PPG morphologies, including unfit waveforms. Note that the beauty of the proposed method is the clinical interpretability of the model as the Gaussian functions are representing the systolic and diastolic waves in PPG waveform. The increase in Gaussian functions could be less clinically meaningful.
Another limitation of this work is that arrhythmia is not considered due to the lack of clinical information available for arrhythmic subjects. It is known that arrhythmia causes changes in the duration and amplitude of the PPG pulse 18 . These changes impact the morphology of arrhythmia PPG waveforms different from that of normal ones. We plan to use other databases to determine the effect of different types of PPG waveform morphologies, and induce arrhythmic beats into the normal synthetic signal. We will also test event detectors (e.g., to detect systolic waves) based on this model. However, the proposed model could allow us to understand the underlying changes in abnormalities (e.g., changes of parameters for generating systolic and diastolic waves in subjects with and without hypertension).
It is essential to have the ability to add different types of noise to the simulated PPG signals to simulate different real-life conditions. For example, in the case of signal-to-noise ratio greater than 2, adding low-frequency and SD2, respectively. Average = (simulate ′ + real ′ )/2 and difference = real ′ − simulate ′ where real ′ and simulate ′ are the normalized values of HRV parameters for real PPG and synthetic PPG signals. The equation of normalization is X ′ = (X−mean of X) /standard deviation of X , where X stands for the HRV parameters. (b), (d), (f), (h), (j) and (l) are correlations of SDNN, RMSSD, LF, HF, SD1 and SD2, respectively. r is the correlation coefficient. p is the p value. ms = milliseconds. This figure shows that the model can obtain a synthetic PPG with a similar HRV to a real PPG by given the same mean heart rate and standard deviation of beat-to-beat intervals.

Methods
The PPG generator consists of two main parts: modeling a single PPG waveform and generating a PPG signal. The main idea behind the work is to generate a sequence of PPG waveforms based on the circular motion principle, as shown in Fig. 1.
Modeling single PPG pulses. In this model, the PPG waveform is the trajectory of motion in the threedimensional space described using a Cartesian coordinate system (x, y, z). The periodicity of PPG is represented by circular motion, as shown in Fig. 6a. The trajectory of motion in the (x, y) plane is mapped to the unit circle. One sweep of the circle corresponding to a peak-to-peak interval or heartbeat. (x, y) is defined as: where t is the time, and ω is the angular velocity used to control the duration of the pulse, calculated by: where T is the duration of the PPG pulse. The trajectory in the z direction is the resulting PPG signal. θ is introduced as an independent variable for motion in the z direction. θ is the four-quadrant inverse tangent of (x, y), defined as: Regardless of the changes to (x, y), θ is considered to the range of (− π, π) . Peaks on the PPG, such as the systolic peak and the diastolic peak, were simulated by Gaussian functions, defined as: where A is the peak height, µ is the position of the peak center, and σ is the standard deviation of the two Gaussian functions.
These two peaks were placed along the unit circle at fixed angles θ 1 and θ 2 . In this model, z is the sum of the Gaussian functions for the variable θ . A periodical PPG waveform is generated through circular motion as follows: (2) x(t) = cos(ωt) y(t) = sin(ωt), Figure 5. Synthesized PPG waveform based on the unfit template. The optimal parameters of the synthesized unfit PPG waveform are shown in the row corresponding to the unfit group in Table 1. Unfit means that the heart rate cannot be determined and the systolic and diastolic waves cannot be distinguished. Here, MSE is the mean square error between the synthetic PPG and the template PPG, and r is the correlation coefficient. This figure shows that the proposed model cannot synthesize the unfit PPG pulses.
Scientific RepoRtS | (2020) 10:13883 | https://doi.org/10.1038/s41598-020-69076-x www.nature.com/scientificreports/ where t 0 is the end time of the previous beat, π is used to align the initial point of this model to the position of the onset (θ = −π) in a PPG waveform. The method used for the selection of parameters is introduced in the next section. The corresponding changes to x , y , θ , and z over a single period are shown in Fig. 6b-e; these are repeated in the next pulses. In this figure, the pulse duration was one second, and the sample frequency was 125 Hz.
Parametric optimization. The peak height, position of the peak center, and standard deviation of the two Gaussian functions were used to determine the morphology of the synthetic PPG. In this study, three types of real PPG pulse templates (described earlier) were used to evaluate the model. There was one PPG pulse in each type of template. The objective of the optimization step was to determine model parameters that would result in matching the synthetic PPG to the real PPG as closely as possible. The corresponding objective function was expressed as follows: s.t. p = {a 1 , θ 1 , b 1 , a 2 , θ 2 , b 2 } . with the constraints: where z p (n) is the synthetic PPG, l is the length of the real PPG s(n) , and corr is Pearson's linear correlation coefficient, to test the correlation between the synthetic PPG and the real PPG, which in turn, calculated as follows: p * = arg min p ((1 − corr(z p (n), s(n))) + l n=1 (z p (n) − s(n)) 2 )  Table 1). The blue curve represents the first Gaussian, which corresponds to the systolic wave, and the red curve represents the second Gaussian, which corresponds to the diastolic wave. www.nature.com/scientificreports/ where n is the signal length. α i , β i are the individual points with index i, and α , β are the mean value of α , β . In this study, the interior-point method 26 was used to solve the optimization problem.
Pulse duration generator. It is clear from the model that the pulse durations are equal to the onset-toonset intervals, also called valley-to-valley intervals in synthetic PPG signals. The difference between valleyto-valley intervals is represented by the angular velocity ω . The dynamic model can repeat the morphology to generate the required signal length. By providing a series of valley-to-valley intervals, the model can synthesize a continuous PPG. The pulse durations in the synthetic PPG are generated using the normal distribution random function with mean and standard deviation. The mean value and the standard deviation are calculated according to the mean heart rate and the standard deviation of beat-to-beat intervals provided by the user. Variability of parameters on a beat-to-beat basis was not considered in this study.
Noise addition. After synthesizing the clean signal, the noise was incorporated into the signal based on the specific need identified. In this work, a simple way to incorporate noise signal was introduced. The noise signal was defined as: where B is the amplitude of the noise, and f is the noise frequency. For instance, an example of incorporating low-frequency noise, such as baseline wander, would be to adopt an amplitude of 0.4 and frequency of 0.2 Hz. Another example of incorporating high-frequency noise would be to adopt an amplitude of 0.02 and frequency of 50 Hz. If necessary, one can add a combination of different frequencies and amplitudes, even including different types of noise to the same synthesized PPG signal.