Artificial Intelligence Estimation of Carotid-Femoral Pulse Wave Velocity using Carotid Waveform

In this article, we present an artificial intelligence method to estimate the carotid-femoral Pulse Wave Velocity (PWV) non-invasively from a single uncalibrated carotid waveform measured by tonometry and a few routine clinical variables. Because the signal processing inputs to this machine learning algorithm are sensor agnostic, the method can accompany any medical instrument that provides a calibrated or uncalibrated carotid pressure waveform. Our results show that, on an unseen hold-out test set drawn from a population aged 20 to 69, the model estimates PWV with a Root-Mean-Square Error (RMSE) of 1.12 m/sec relative to the reference method. These results suggest that the model is a reliable surrogate for measured PWV. Our study also showed that estimated PWV was significantly associated with an increased risk of cardiovascular disease (CVD).


Data Description
We used data from the Framingham Heart Study (FHS), a longitudinal epidemiological cohort study 21 . The participants were part of FHS Cohorts Gen 3 Exam 1 22 , Offspring Exam 7 23 , and Original Exam 26 24 . They underwent a comprehensive, noninvasive assessment of central hemodynamics, yielding a total of N = 6698 tonometry recordings. PWV was calculated from a right carotid tonometry pressure waveform and a right femoral tonometry pressure waveform, each recorded simultaneously with an electrocardiogram, combined with body surface measurements of the participants 7 . This approach is sometimes called sequential measurement 25 ; a comparison of this method with the reference method for measuring PWV can be found in 25 . For a broader description of the FHS data, see 7 and the references therein. Some participants had missing or erroneous tonometry waveform data; we marked these recordings as "faulty record" (N = 1011). Among the remaining recordings (N = 5687), some recorded PWV values were implausible (≥ 30 m/sec, or 0 m/sec); we labeled these as "measurement error" (N = 21). We also excluded participants aged 70 or older (N = 661) to minimize the effects of CVD treatment; in addition, individuals in this age group already exhibit age-related arterial stiffening. These filtering steps left a total of N = 5020 usable observations. For the prognostic study, we further excluded individuals with CVD prior to, or on, their tonometry exam date, leaving N = 4798 participants.
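The exclusion steps above are applied in sequence, each one acting on the records that survived the previous step. The sketch below shows that ordering under stated assumptions: the `Record` fields are hypothetical and are not FHS variable names, and each record is counted under the first exclusion it meets. The prognostic-study CVD exclusion is not included.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical record layout; field names are illustrative, not FHS variable names."""
    age: float           # age in years at the tonometry exam
    pwv: float           # recorded carotid-femoral PWV, m/sec
    waveform_ok: bool    # False for missing or erroneous tonometry waveforms


def apply_exclusions(records):
    """Apply the exclusions in the order described and count each category."""
    counts = {"faulty record": 0, "measurement error": 0, "age >= 70": 0}
    kept = []
    for r in records:
        if not r.waveform_ok:
            counts["faulty record"] += 1
        elif r.pwv == 0 or r.pwv >= 30:      # implausible recorded PWV
            counts["measurement error"] += 1
        elif r.age >= 70:
            counts["age >= 70"] += 1
        else:
            kept.append(r)
    return kept, counts
```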

Artificial Intelligence Methodology
General Signal Processing. Each uncalibrated tonometry recording from the FHS data contained a 10 to 20 second trace. Some signal processing parameters were provided with the FHS data; for example, the unitless Augmentation Index (AIx) and Mean Carotid Shape Factor (MCSF) were used as provided (see 26 and references therein). As a quick reference, MCSF is the average value of an arterial cardiac cycle signal normalized by its range. The Reflected Wave Arrival Time (RWAT) was also among the variables provided with the FHS data (see 26 and references therein).
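As a hedged illustration of the quick-reference definition of MCSF, the sketch below reads "average value normalized by its range" as (mean − min)/(max − min) of a single cycle. That reading is an assumption, since FHS supplied MCSF directly. It does have a useful property for uncalibrated tonometry: the ratio is unchanged by any positive gain and any offset of the signal.

```python
import numpy as np


def shape_factor(cycle):
    """(mean - min) / (max - min) of one cardiac cycle (assumed reading of the definition).

    Invariant to a positive gain and an additive offset, so it can be computed
    on an uncalibrated waveform.
    """
    x = np.asarray(cycle, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("a flat cycle has no range")
    return float((x.mean() - lo) / (hi - lo))
```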
To apply the IF algorithm, however, we first had to extract arterial cycles from the raw signal. First, a short-window moving average was used to suppress unwanted noise in the signals. The window length was 0.02 sec, which places the first null of the filter's frequency response at 1/0.02 sec = 50 Hz, so noise at and above roughly 50 Hz is attenuated. The blood pressure waveform can be encoded with frequencies below ~25 Hz, half of this 50 Hz value 27 . In mathematical terms, the moving average $\bar{s}(t)$ of a signal $s(t)$ used in our study can be expressed as

$$\bar{s}(t) = \frac{1}{\Delta}\int_{t-\Delta/2}^{t+\Delta/2} s(\tau)\,d\tau, \qquad \Delta = 0.02\ \text{sec}. \tag{1}$$

Our method does not depend on this choice of noise filter, and other low-pass filtering approaches could also be used. The signals were then normalized to remove the effects of breathing and other motion artifacts; the normalization used the locations of the maxima and minima of each recording 28 . We then used a modified version of the automatic cycle selection introduced in 29 to pick cycles. The dicrotic notch was located using the derivatives and filtering of the picked cycles 28 . These cycles were then fed into the IF algorithm.
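A minimal sketch of this filtering step, assuming uniformly sampled data and a discrete boxcar kernel; the function names are illustrative. It also evaluates the magnitude response of a continuous boxcar of width 0.02 sec, |sin(πfΔ)/(πfΔ)|, which is what places the first null at 50 Hz.

```python
import numpy as np


def moving_average(s, fs, window_sec=0.02):
    """Boxcar moving average over about window_sec seconds of a signal sampled at fs Hz.

    mode="same" keeps the output aligned with the input (to within half a sample
    for an even window length); samples near the ends are biased by zero padding.
    """
    n = max(1, int(round(window_sec * fs)))
    kernel = np.ones(n) / n
    return np.convolve(np.asarray(s, dtype=float), kernel, mode="same")


def boxcar_gain(f_hz, window_sec=0.02):
    """|H(f)| of a continuous boxcar average of width window_sec."""
    return float(abs(np.sinc(f_hz * window_sec)))   # np.sinc(x) = sin(pi x) / (pi x)
```

The response is not flat below the null: for example, `boxcar_gain(25.0)` is 2/π ≈ 0.64, so the window length trades noise suppression against some attenuation of the upper part of the waveform band.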
Intrinsic Frequency. A typical arterial pressure waveform consists of a systolic and a diastolic part. The systolic part corresponds to the period when the aortic valve is open and the heart is pumping blood into the aorta and arterial system. The diastolic part corresponds to the period when the aortic valve is closed, preventing blood from flowing back into the left ventricle. The closure of the aortic valve appears on the pressure waveform trace as the dicrotic notch. The IF method 18,19 assumes that there are two constant dominant dynamical frequencies, one before and one after the closure of the aortic valve. These frequencies are called Intrinsic Frequencies (IFs). The method does not need a calibrated aortic or carotid pressure signal and can even be applied to signals collected by a smartphone 20 .
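In the IF method 18,19 , the instantaneous frequencies are assumed to be piecewise constant throughout the cardiac cycle, with the dicrotic notch separating the two pieces. For an aortic pressure waveform, the IF problem can be formulated as

$$S(t) = \chi_{[0,T_0)}(t)\left(a_1\cos\omega_1 t + b_1\sin\omega_1 t\right) + \chi_{[T_0,T)}(t)\left(a_2\cos\omega_2 t + b_2\sin\omega_2 t\right) + \bar{p}, \tag{2}$$

with a continuity condition at $T_0$ (the time of the dicrotic notch) and periodicity at $T$ (the duration of the cardiac cycle). Here, the indicator function is defined as

$$\chi_{[x,y)}(t) = \begin{cases} 1, & x \le t < y, \\ 0, & \text{otherwise}. \end{cases} \tag{3}$$

Also, $a_1$, $b_1$, $a_2$ and $b_2$ are the envelopes of the IF model fit, $\omega_1$ and $\omega_2$ are the Intrinsic Frequencies (IFs) of the waveform, and $\bar{p}$ is the mean pressure over the period $[0, T)$. The goal of the IF model (2) is to extract a fit, called the Intrinsic Mode Function (IMF), that carries most of the energy (information) of a pressure waveform $s(t)$ over one period. This is done by solving the following optimization problem 19 :

$$\min_{a_1,\, b_1,\, a_2,\, b_2,\, \omega_1,\, \omega_2} \left\| s(t) - \chi_{[0,T_0)}(t)\left(a_1\cos\omega_1 t + b_1\sin\omega_1 t\right) - \chi_{[T_0,T)}(t)\left(a_2\cos\omega_2 t + b_2\sin\omega_2 t\right) - \bar{p} \right\|_2^2 \tag{4}$$

subject to

$$a_1\cos\omega_1 T_0 + b_1\sin\omega_1 T_0 = a_2\cos\omega_2 T_0 + b_2\sin\omega_2 T_0, \qquad a_1 = a_2\cos\omega_2 T + b_2\sin\omega_2 T. \tag{5}$$

Here, $\|\cdot\|_2$ is the $L^2$-norm defined on $[0, T)$. One assumption in this optimization is that the extracted IMF is continuous at the dicrotic notch time $T_0$ (the first condition in (5)); the other is that the extracted IMF is periodic (the second condition in (5)). The solution method for the optimization problem (4) and (5) can be found in 19 .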
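The optimization in (4) and (5) can be illustrated with a brute-force sketch. For fixed $\omega_1$ and $\omega_2$, the model is linear in $(a_1, b_1, a_2, b_2)$ and both conditions in (5) are linear constraints, so each grid point reduces to an equality-constrained least-squares problem, solved here in the null space of the constraints. This grid search is a swapped-in technique for illustration, not the solution method of 19; $\bar{p}$ is taken as the sample mean of the cycle, and the frequency grid is an assumed input.

```python
import numpy as np


def fit_intrinsic_frequencies(s, fs, T0, omega_grid):
    """Grid-search sketch of the IF fit (4)-(5) for one cardiac cycle.

    s          : samples of one cycle on [0, T), with T = len(s) / fs
    fs         : sampling rate in Hz
    T0         : dicrotic-notch time in seconds, 0 < T0 < T
    omega_grid : candidate frequencies in rad/s (assumed input), used for omega1 and omega2
    """
    s = np.asarray(s, dtype=float)
    n = s.size
    t = np.arange(n) / fs
    T = n / fs
    p_bar = s.mean()                      # mean pressure over [0, T)
    y = s - p_bar                         # part the two sinusoids must explain
    in_sys = (t < T0).astype(float)       # indicator of [0, T0)
    in_dia = 1.0 - in_sys                 # indicator of [T0, T)
    best = None
    for w1 in omega_grid:
        for w2 in omega_grid:
            # For fixed (w1, w2) the model is linear in x = (a1, b1, a2, b2).
            A = np.column_stack([in_sys * np.cos(w1 * t), in_sys * np.sin(w1 * t),
                                 in_dia * np.cos(w2 * t), in_dia * np.sin(w2 * t)])
            # Rows of C x = 0: continuity at T0 and periodicity at T, as in (5).
            C = np.array([
                [np.cos(w1 * T0), np.sin(w1 * T0), -np.cos(w2 * T0), -np.sin(w2 * T0)],
                [1.0, 0.0, -np.cos(w2 * T), -np.sin(w2 * T)],
            ])
            # Parametrize the feasible set as x = N z, with the columns of N spanning null(C).
            _, sv, vt = np.linalg.svd(C)
            rank = int(np.sum(sv > 1e-12 * sv[0]))
            N = vt[rank:].T
            z = np.linalg.lstsq(A @ N, y, rcond=None)[0]
            x = N @ z
            r = y - A @ x
            cost = float(r @ r) / fs      # Riemann-sum estimate of the squared L2 norm
            if best is None or cost < best[0]:
                best = (cost, float(w1), float(w2), x)
    cost, w1, w2, x = best
    a1, b1, a2, b2 = (float(v) for v in x)
    return {"omega1": w1, "omega2": w2, "a1": a1, "b1": b1, "a2": a2, "b2": b2,
            "p_bar": float(p_bar), "cost": cost}
```

The grid spacing bounds how precisely the frequencies are resolved; a finer grid or a local refinement around the best grid point would be natural extensions.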
Statistical Learning. Dimensionless parameters and combinations of parameters can be extracted from the solutions of (4) and (5). These parameters are of both mathematical and physiological importance, as shown in our recent work on noninvasive iPhone measurement of left ventricular ejection fraction 20 . For example, we can normalize $\omega_1$ and $\omega_2$ with respect to the systolic and diastolic periods, $T_0$ and $T - T_0$, respectively 20 , or by the whole cardiac cycle $T$. More generally, there is a systematic way to create new variables from a set of given features: we used the method introduced in 30 to create new features from $\omega_1$, $\omega_2$, $T$ and $T - T_0$, and some of its outputs were used in our PWV model. The original set of variables used for feature extraction included the IFs and their variants, carotid waveform shape factors such as reflected wave arrival time and augmentation index, and clinical features such as blood pressure and age. We used these clinical variables specifically because a 2010 study published in the European Heart Journal emphasized that PWV is affected by age and blood pressure 31 . In short, the original set of features is

$$V_0 = \{\mathcal{F}(\omega_1, \omega_2, T, T - T_0),\ \mathrm{MCSF},\ \mathrm{AIx},\ \mathrm{SSN},\ \mathrm{RWAT},\ \mathrm{Age},\ P_s,\ P_d\}. \tag{6}$$

In (6), $\mathcal{F}(\omega_1, \omega_2, T, T - T_0)$ stands for the new features constructed from $\{\omega_1, \omega_2, T, T - T_0\}$ using the methodology introduced in 30 together with field expert knowledge. Furthermore, in (6), MCSF is the mean carotid shape factor, AIx is the augmentation index, SSN is the supra-sternal notch to femoral site length, RWAT is the reflected wave arrival time for a cardiac arterial waveform cycle, Age is the age of the participant at the time of the tonometry reading, $P_s$ is the brachial systolic pressure, and $P_d$ is the brachial diastolic pressure. To feed the most useful subset of these variables into the PWV model, we applied a combination of best subset variable selection methods based on multiple linear regression 32 .
Using this approach, we ended up with the variables set c n r s d 2 1 1 ω ω ω ρ = .
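Exhaustive best-subset selection over a small feature pool can be sketched as follows. This is one possible instance, not the authors' exact combination of methods: it fits ordinary least squares with an intercept to every subset and ranks subsets by BIC, which is an assumed criterion because the text does not name one. The number of subsets grows as 2^p, so this is practical only for modest feature counts.

```python
import itertools

import numpy as np


def best_subset(X, y, names, max_size=None):
    """Exhaustive best-subset selection for multiple linear regression, ranked by BIC."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    max_size = p if max_size is None else max_size
    best_bic, best_cols = np.inf, ()
    for k in range(1, max_size + 1):
        for cols in itertools.combinations(range(p), k):
            A = np.column_stack([np.ones(n), X[:, list(cols)]])   # intercept + subset
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            bic = n * np.log(rss / n) + (k + 1) * np.log(n)       # Gaussian-likelihood BIC
            if bic < best_bic:
                best_bic, best_cols = bic, cols
    return [names[c] for c in best_cols]
```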
After this stage, we kept 20% of the data as a hold-out test set (N = 1004) and the remaining 80% as a training set (N = 4016). The sets were drawn at random from the original data, but stratified so that the PWV distribution in both the training and test sets followed the population distribution. On the training set, we then performed bootstrap aggregation (bagging) without replacement (sub-sampling) 33  in each network. Here, |V| is the number of elements in V. Each bagging iteration sampled 66% of the training set, and a total of 1000 iterations were used; alternatively, the iterations could be stopped once the out-of-bag RMSE reaches a plateau. At each iteration, the neural networks were trained for 100 epochs, and a squared (L2) penalty of 0.01 was used to reduce over-fitting. MATLAB's implementation of Levenberg-Marquardt backpropagation was used to train the neural networks [34][35][36] .
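Sub-sampled bagging can be sketched as below. To keep the example self-contained, a ridge-regularized linear model stands in for the neural networks trained with Levenberg-Marquardt in MATLAB, so only the resampling scheme (66% of the training set drawn without replacement per iteration, with out-of-bag evaluation) mirrors the text. The function names and the use of 0.01 as a ridge weight are illustrative assumptions. The running out-of-bag RMSE it returns is the quantity one would monitor for the plateau-based stopping rule.

```python
import numpy as np


def fit_linear(X, y, ridge=0.01):
    """Stand-in base learner: ridge regression with an unpenalized intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    P = ridge * np.eye(A.shape[1])
    P[0, 0] = 0.0                          # do not penalize the intercept
    return np.linalg.solve(A.T @ A + P, A.T @ y)


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def subbagging(X, y, n_iter=1000, frac=0.66, seed=0):
    """Bagging without replacement; returns the members and the running out-of-bag RMSE."""
    rng = np.random.default_rng(seed)
    n = len(y)
    m = int(round(frac * n))
    models, oob_rmse = [], []
    oob_sum, oob_cnt = np.zeros(n), np.zeros(n)
    for _ in range(n_iter):
        idx = rng.choice(n, size=m, replace=False)      # 66% sub-sample
        oob = np.setdiff1d(np.arange(n), idx)           # rows this member never saw
        coef = fit_linear(X[idx], y[idx])
        models.append(coef)
        oob_sum[oob] += predict_linear(coef, X[oob])
        oob_cnt[oob] += 1
        seen = oob_cnt > 0
        oob_pred = oob_sum[seen] / oob_cnt[seen]
        oob_rmse.append(float(np.sqrt(np.mean((oob_pred - y[seen]) ** 2))))
    return models, np.array(oob_rmse)


def predict_ensemble(models, X):
    """Average the predictions of all bagged members."""
    return np.mean([predict_linear(c, X) for c in models], axis=0)
```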
The whole machine learning pipeline can be summarized as follows:
1. The uncalibrated waveforms were analysed to extract cardiac cycles;
2. IF parameters and waveform features, such as shape factors, were extracted from the selected cardiac cycles;
3. The waveform features were combined with the routine clinical parameters to construct the original feature set $V_0$;
4. The best subset variable selection method was applied to reduce the feature space to the subset $V$;
5. A sub-sampled, bagged ensemble of neural networks was trained and tested.
To further analyze our method and the prognostic value of estimated PWV, we conducted a prospective cohort study and used proportional hazards regression models to evaluate the association between PWV and incident CVD. We evaluated this relationship for measured PWV as well as for estimated PWV values produced by our noninvasive IF method, and then compared the hazards of PWV for CVD with those of estimated PWV. The baseline population consisted of participants free of CVD. Adjusted models included the components of the Framingham risk score: sex, age, total cholesterol, HDL cholesterol, blood pressure, diabetes and smoking.
Smoking was defined as regular use within the 12 months prior to the examination date. The proportional hazards assumption was met. All continuous variables were log-transformed to address skewness. Predictive value was evaluated with the likelihood ratio test and the Akaike Information Criterion (AIC). Only complete cases without missing data were studied. Kaplan-Meier plots of the cumulative probability of a first major CVD event were constructed with participants grouped by tertiles of measured PWV and, separately, by tertiles of estimated PWV. The log-rank test was used to compare the unadjusted Kaplan-Meier curves. p-values < 0.05 were considered significant.
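The tertile grouping and Kaplan-Meier failure curves described above can be sketched in a few lines. This minimal version, with hypothetical function names, returns the cumulative probability of a first event, 1 − S(t), at each event time. The Cox models and the log-rank test would normally come from a survival-analysis package and are not reproduced here.

```python
import numpy as np


def tertile_groups(x):
    """Label each value 0, 1 or 2 by the sample tertiles of x (ties go to the lower group)."""
    q1, q2 = np.quantile(x, [1 / 3, 2 / 3])
    return np.digitize(x, [q1, q2], right=True)


def km_cumulative_incidence(time, event):
    """Kaplan-Meier estimate of the cumulative probability of a first event, 1 - S(t).

    time  : follow-up time of each participant
    event : True if follow-up ended with an event, False if censored
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv = 1.0
    incidence = []
    for t in event_times:
        at_risk = np.sum(time >= t)          # still under observation just before t
        d = np.sum((time == t) & event)      # events occurring at t
        surv *= 1.0 - d / at_risk
        incidence.append(1.0 - surv)
    return event_times, np.array(incidence)
```

Per-tertile curves follow by calling the estimator on each group, e.g. `km_cumulative_incidence(time[g == 0], event[g == 0])` with `g = tertile_groups(pwv)`.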

Results and Discussion
PWV Model Results. The population demographics of these variables are shown in Table 1. Convergence of the model ensemble was indicated by flat RMSE curves for both the training (RMSE = 1.04 m/sec) and test (RMSE = 1.12 m/sec) sets over the iterations (Fig. 1). The 0.09 gap between the training and test sets in Fig. 1 shows that the model has acceptable generalization capability. The RMSE on the whole dataset, including both the training and test sets, was 1.05 m/sec. Experiments with decision trees (DT), boosted DTs and boosted neural networks 33 gave similar but marginally larger RMSE values.
The estimated PWV is plotted against the measured values in Fig. 2. The correlation between the estimated PWV and the FHS sequentially measured PWV is r = 0.85. The Bland-Altman plot of the results is shown in Fig. 3; the limits of agreement are approximately ±2.07 m/sec.
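The agreement statistics reported above can be computed as in the sketch below (the function name is hypothetical). The Bland-Altman limits of agreement are bias ± k·SD of the paired differences; k is left as a parameter because both 1.96 and 2 are used in practice.

```python
import numpy as np


def agreement_stats(estimated, measured, k=1.96):
    """RMSE, Pearson r, and Bland-Altman bias / limits of agreement for paired values."""
    est = np.asarray(estimated, dtype=float)
    ref = np.asarray(measured, dtype=float)
    diff = est - ref                        # Bland-Altman y-axis
    bias = diff.mean()
    sd = diff.std(ddof=1)                   # sample SD of the differences
    return {
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "r": float(np.corrcoef(est, ref)[0, 1]),
        "bias": float(bias),
        "sd": float(sd),
        "loa": (float(bias - k * sd), float(bias + k * sd)),
        "mean_pair": (est + ref) / 2,       # Bland-Altman x-axis
    }
```

For example, an SD of 0.88 m/sec gives limits of ±1.72 m/sec with k = 1.96 and ±1.76 m/sec with k = 2.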
Prognosis Results. The study exclusion criteria resulted in a sample of 4798 usable observations, comprising individuals 19 to 69 years old without CVD at the baseline examination. The characteristics of the study sample are presented in Table 2. Within a follow-up period of 10 years, 171 participants had a CVD event. Cox proportional hazards models for PWV and estimated PWV are presented in Table 3. After adjusting for standard risk factors, both PWV and estimated PWV were significantly associated with an increased risk for a first major CVD

where std() is the standard deviation operator. From the model results, we also have

We can again observe that larger values of PWV correspond to larger errors. It is reasonable to assume that the error between the reference and sequential methods is independent of the error between our model and the sequential method. As a result, using this fact and Equations (17)

With Complior, the pressure pulse is measured simultaneously at two sites. The technician places one probe at the patient's carotid artery and a second probe at the femoral artery. The distance between these two locations is then measured and entered into the Complior software, together with the cuff blood pressure. A proprietary algorithm measures the pulse transit time between the two locations, from which PWV is computed. With PulseTrace, a stiffness index is estimated by analyzing the photoplethysmographic waveform recorded at the fingertip; the index is calculated by dividing the participant's height by the time delay between the first systolic peak and the early diastolic peak of the signal. In a prospectively designed validation study led by Feistritzer et al. 46 , aortic PWV estimates from an oscillometric technique were obtained in 40 participants free of CVD, aged 24 to 55 years.
These measurements were compared with aortic PWV values obtained by the reference method, cardiac magnetic resonance. Agreement analysis between the two methods showed Bias = 0.57 m/sec, LoA = ±1.92 m/sec and r = 0.86. In that study, only 28% of the participants were female, and the median age of the cohort was 34 years. According to the authors, the study population does not meet the Artery Society guidelines for PWV validation; specifically, the Artery Society requires a balanced sex distribution (at least 40% of either sex) as well as a balanced distribution across age groups. In another study, Hametner et al. 45 compared oscillometric estimates of aortic PWV against intra-aortic PWV measurements in 120 patients undergoing elective cardiac catheterization for suspected coronary artery disease (22 patients aged ≤ 50 years and 29 patients aged ≥ 70 years). Exclusion criteria were unstable clinical conditions, arrhythmias and valvular heart disease. To estimate PWV, they combined a number of variables from pulse wave analysis and wave separation in a mathematical model whose major determinants were age, central pressure and aortic characteristic impedance. The authors reported Bias = 0.43 m/sec and LoA = ±2.45 m/sec, with a correlation of r = 0.81. This study also does not follow the Artery Society guidelines, as only 10% of the participants were female.
In a recent study, Campo et al. 47 showed that aortic PWV can be measured non-invasively with a bathroom scale. The authors combined the principles of ballistocardiography and impedance plethysmography on a single foot to estimate aortic PWV, and compared their estimates with measured PWVs in a group of 205 participants. On the validation set, they reported r = 0.7, Bias = 0.25 m/sec and LoA = [−2.48, 2.98] m/sec. According to the authors, the technique has a few limitations, including gait instability, which is more common in frail elderly individuals and in some neurological diseases. Other conditions, such as atrial fibrillation or skin diseases, may also limit the applicability of the measurement. The study population also had several exclusions; for example, pregnant participants and participants with morbid obesity (BMI > 35) were excluded.
In another recent publication 48 , Greve et al. argued that non-invasive PWV estimation is urgently needed because devices such as high-quality applanation tonometry are relatively inaccessible. They proposed estimating PWV with an equation based on age and mean arterial pressure. In a healthy group without cardiovascular risk factors, the correlation of their estimate with measured values was r = 0.52, which could be seen as a major limitation of that study. In apparently healthy patients with cardiovascular risk factors, the correlation was r = 0.67, and in the group with known CVD it was very low (r = 0.37). They also concluded that the estimated PWV could predict cardiovascular events independently of the traditional cardiovascular risk factors. However, in another, smaller study 49 , Greve et al. reported that the estimated PWV predicts CVD only in apparently healthy individuals. Moreover, the estimated PWV reclassifies apparently healthy participants into a higher risk category.
Innovative aspects of proposed method. Compared with Complior, PulseTrace and oscillometric methods, the IF technique is less intrusive and easier to operate, offering a more practical solution for PWV estimation.
The presented study uses an artificial intelligence technique to provide an accurate estimate of central arterial stiffness (carotid-femoral PWV). The sample is large and includes both healthy participants and participants with CVD, which provides adequate statistical power. The study population also has a homogeneous age and sex distribution.
We further analyzed our results based on the Artery Society guidelines for validation of non-invasive hemodynamic measurement devices 50 . Before generating updated Bland-Altman plots, we excluded individuals with BMI ≥ 30 (because of difficulties in measuring an accurate path length) and with PWV ≥ 15 m/sec. After this filtering, the results were Bias = 0.03 m/sec and LoA = ±1.76 m/sec (SD = 0.88 m/sec); see Figs 4 and 5. These results would be graded as acceptable by the Artery Society.
Risk Evaluation. The results (Table 3) suggest that PWV and our estimated PWV convey comparable risks for incident CVD in a model adjusted for standard risk factors. Kaplan-Meier failure curves showed that, when participants were grouped by tertiles of PWV, the probability of developing a CVD event increased with higher PWV (log-rank test p < 0.0001); see Fig. 6. The same pattern was observed when participants were grouped by tertiles of estimated PWV: the group with estimated PWV of 7.86 m/sec or higher was at an increased risk of developing a CVD event (log-rank test p < 0.0001);

Model Performance. Following the Kaplan-Meier analysis, we segmented the PWV data into three groups to check the model performance on subsets defined by PWV values. Figure 8 shows that in all three tertiles (PWV ≤ 6.5, 6.5 < PWV < 7.9 and PWV ≥ 7.9 m/sec), the model presented in this paper performs acceptably.

Study Limitations.
The major limitations of this study are:
1. The study population was not racially diverse (mostly Caucasian); a more diverse population is needed to draw more general conclusions.
2. The automatic cycle selection used in this study is prone to misidentifying cardiac cycles and dicrotic notches. However, we consider the effect of the resulting error on the overall findings of this study to be insignificant.
3. The sequential method used to calculate PWV may introduce errors, either in the body surface measurements or in the wave travel times, at high PWV values.

Conclusions
In this paper, we have introduced a novel artificial intelligence method to estimate the carotid-femoral pulse wave velocity. The method is based on the recently introduced Intrinsic Frequency method 19 and uses as inputs only an uncalibrated carotid pressure waveform and typical clinical variables such as blood pressure. The main advantage of estimating PWV from an uncalibrated carotid pressure waveform and a few typical clinical variables is that neither an ECG nor a femoral tonometry recording is needed. As a result, the measurement is easier and could potentially be performed with a smartphone, since we have shown in a previous publication that the carotid waveform can be measured with a regular iPhone camera 20 .
In this article, we have addressed the need for PWV estimation by providing an accurate and precise statistical model of pulse wave velocity. The model presented in this manuscript estimates PWV with an RMSE of 1.12 m/sec compared with the reference method. We have provided an error analysis and a comparison with other methods currently in use to support the conclusion that the presented model is an acceptable surrogate for arterial stiffness. Furthermore, we conducted a prospective investigation of the predictive value of estimated PWV for the onset of CVD, which showed that estimated PWV was significantly associated with an increased risk of CVD.

Data Availability. The data used in this study can be requested directly from FHS and are publicly available to qualified investigators; investigators with an approved research proposal may receive the de-identified data. In general, FHS data can be requested by submitting a research application to one of the following:
• the Framingham Heart Study directly (https://www.framinghamheartstudy.org/),
• BioLINCC (https://biolincc.nhlbi.nih.gov/home/), or
• dbGaP (https://www.ncbi.nlm.nih.gov/gap).
The manuscript data can be found using the following links:

[Figure: Bland-Altman plots by PWV tertile. The upper-right plot is for 6.5 < PWV < 7.9; the lower-left plot is for 7.9 ≤ PWV. All three tertiles show that the model presented in this paper has acceptable performance.]