Efficient detection of aortic stenosis using morphological characteristics of cardiomechanical signals and heart rate variability parameters

Recent research has shown promising results for the detection of aortic stenosis (AS) using cardio-mechanical signals. However, existing methods are limited by two main factors: the lack of a physical explanation for the decision on the existence of AS, and the need for auxiliary signals. The main goal of this paper is to address these shortcomings through a wearable inertial measurement unit (IMU), where the physical causes of AS are determined from IMU readings. To this end, we develop a framework based on seismo-cardiogram (SCG) and gyro-cardiogram (GCG) morphologies, where highly-optimized algorithms are designed to extract features deemed potentially relevant to AS. The extracted features are then analyzed through machine learning techniques for AS diagnosis. It is demonstrated that AS could be detected with 95.49–100.00% confidence. Based on the ablation study on the feature space, the GCG time-domain feature space holds higher consistency (95.19–100.00%) with the presence of AS than the HRV parameters, whose contribution is low (66.00–80.00%). Furthermore, the robustness of the proposed method is evaluated by conducting analyses on the classification of the AS severity level. These analyses resulted in a high confidence of 92.29%, demonstrating the reliability of the proposed framework. Additionally, game theory-based approaches are employed to rank the top features, among which the GCG time-domain features are found to be highly consistent with both the occurrence and the severity level of AS. The proposed framework contributes to reliable, low-cost wearable cardiac monitoring owing to its accurate performance and its reliance on inertial sensors alone.


Methods
Motivation. Following the occurrence of AS, and in turn the changes in the forces against which the heart has to contract to eject blood, the morphology of the cardiac signals of AS patients is expected to differ from that of their normal state 2 . For instance, it has been demonstrated that the progression of stenosis is correlated with ECG ST-T wave changes in a retrospective study on 29 patients 37 . Another study recognized a prolonged Q-T interval as an independent predictor of mortality among AS patients 38 . Furthermore, the occurrence of stenosis causes HRV parameters to change accordingly 39 , which will be scrutinized as a common characteristic among AS patients in this work. Figure 1 depicts the ECG, GCG X , and GCG Y from top to bottom, respectively, wherein GCG X and GCG Y are annotated according to the method discussed in the following sections. These axes of GCG provide useful information about cardiac activity timing intervals, as demonstrated in previous research works 21,22 . Hence, they are expected to provide potential insights into the diagnosis of AS.
Experimental protocol and setup. This study includes thirty-two AS patients (sixteen males and sixteen females), and thirteen healthy subjects (seven females and six males). Among the AS patients, eleven, twelve, and nine patients were diagnosed with the severity levels of mild, moderate, and severe, respectively. The average (standard deviation) ages of the AS and healthy groups are 84.18 (9.61) years and 68.38 (17.68) years, respectively. The demographic information of the AS and non-AS groups is summarized in Table 1.
Linear and angular vibrations of the chest wall were recorded using a commercial wearable sensor node (Shimmer3 from Shimmer Sensing) secured by a band strap on the mid-sternum along the third rib. A three-axis accelerometer records SCG, and a three-axis gyroscope records GCG. Each modality is recorded in three dimensions: the x-axis corresponding to the shoulder-to-shoulder direction (along the coronal axis), the y-axis corresponding to the head-to-toe direction (along the longitudinal axis), and the z-axis corresponding to the dorso-ventral direction (along the anterior-posterior axis). In this paper, the dimensional letters X, Y, Z appended as subscripts to SCG/GCG denote the signal from the corresponding axis. Simultaneously, a four-lead ECG was recorded as a reference sensor. All waveforms were recorded synchronously at a 256 Hz sampling rate. The experimental setup is shown in Fig. 2. All data were collected at the Cardiac Care Unit of the Columbia University Medical Center (CUMC). The subjects were seated at rest on a bed or a chair for at least five minutes. They breathed naturally without controlling their breathing depths. The patient experimental protocol was approved by the Institutional Review Board of CUMC under protocol number AAAR4104. All methods were carried out in accordance with relevant guidelines and regulations. All participants provided written informed consent to take part in the study. Collected data were transferred to a computer and processed in a Python framework. The flow graph of the pre-processing and feature extraction procedure is illustrated in Fig. 3.
Signal pre-processing. As shown in Fig. 3, all channels of the SCG and GCG signals were initially band-pass filtered using a 4th-order Butterworth filter with cut-off frequencies of 1-45 Hz and 1-20 Hz, respectively. Subsequently, motion artifacts associated with movements during the recordings were removed using a root-mean-square (RMS) filter. In most of the literature, the RMS filter is applied over an M = 500 ms sliding window for signal segmentation, whereas in this work M is optimized as discussed in the following sections. Meanwhile, the segment removal threshold was set to twice the median value of the filter output. It is worth noting that after motion artifact removal, signal chunks were not attached to each other, but processed separately. Therefore, if the duration of a chunk was less than N seconds, it was not taken into account for further processing. Here, N represents the size of the chunks out of which the desired features are extracted. Inspired by 35 , a peak detection algorithm, shown in Fig. 4, was designed to detect the GCG X and GCG Y peaks and annotate the fiducial points according to Fig. 1. To this end, the three axes of each of SCG and GCG were combined using the root-mean-square (RMS) function, generating the linear and angular resultant vectors, respectively, as shown in Fig. 4. This is then followed by an envelope detection technique leveraging the Hilbert transform 40 .

Figure 3. Left panel: pre-processing; right panel: feature extraction flow graphs. In the pre-processing step, motion artifacts and baseline wandering are removed from the signals, followed by signal segmentation, peak detection, and annotation. Feature extraction is carried out for time- and frequency-domain HRV parameters as well as the GCG morphological features. The features are eventually concatenated to create a feature vector.
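A minimal sketch of this pre-processing chain is shown below. The filter orders, cut-off frequencies, and the twice-the-median removal threshold follow the text; the function names and the sliding-window RMS implementation are illustrative assumptions rather than the authors' exact code.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

FS = 256  # sampling rate (Hz), as in the recording setup

def bandpass(sig, lo, hi, fs=FS, order=4):
    """4th-order Butterworth band-pass (1-45 Hz for SCG, 1-20 Hz for GCG)."""
    b, a = butter(order, [lo, hi], btype="bandpass", fs=fs)
    return filtfilt(b, a, sig)

def rms_mask(sig, m_ms=500, fs=FS):
    """Flag motion-artifact samples: sliding-window RMS above twice its median."""
    w = int(m_ms * fs / 1000)
    rms = np.sqrt(np.convolve(sig ** 2, np.ones(w) / w, mode="same"))
    return rms > 2.0 * np.median(rms)  # True where a segment should be removed

def resultant(x, y, z):
    """Combine the three axes into one resultant vector via the RMS function."""
    return np.sqrt((x ** 2 + y ** 2 + z ** 2) / 3.0)

def envelope(sig):
    """Amplitude envelope via the analytic signal (Hilbert transform)."""
    return np.abs(hilbert(sig))
```

Applying `rms_mask` per chunk and discarding surviving chunks shorter than N seconds reproduces the segmentation logic described above.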
Next, a 2nd-order Butterworth low-pass filter with a cut-off frequency of 2 Hz was applied to eliminate abrupt changes in the signal. Afterwards, an adaptive peak detection algorithm based on the Pan-Tompkins method was used to discriminate the real peaks in the signal resulting from the summation of the linear and angular envelopes 41 . Fig. 5 illustrates the six channels of the SCG and GCG modalities together with the corresponding beats detected by the algorithm. Finally, to locate the exact positions of the peaks in each of the six axes, a 50-ms window centered at each detected peak of the summation signal was used to find the local maximum associated with each axis.
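The smoothing and per-axis refinement steps can be sketched as follows. Note that `coarse_beats` uses SciPy's generic `find_peaks` as a simple stand-in for the adaptive Pan-Tompkins-based detector the paper describes; the 0.4 s minimum spacing and mean-level threshold are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

FS = 256  # sampling rate (Hz)

def smooth_envelope(env_sum, fs=FS):
    """2nd-order Butterworth low-pass (2 Hz cut-off) to remove abrupt changes
    from the summed linear + angular envelope."""
    b, a = butter(2, 2.0, btype="lowpass", fs=fs)
    return filtfilt(b, a, env_sum)

def coarse_beats(smoothed, fs=FS):
    """Stand-in beat detector: peaks at least 0.4 s apart, above the mean."""
    peaks, _ = find_peaks(smoothed, distance=int(0.4 * fs),
                          height=smoothed.mean())
    return peaks

def refine_peaks(axis_sig, beats, fs=FS, win_ms=50):
    """Local maximum of one SCG/GCG axis inside a 50-ms window centred at
    each detected peak of the summation signal."""
    half = int(win_ms * fs / 1000) // 2
    refined = []
    for p in beats:
        lo = max(0, p - half)
        refined.append(lo + int(np.argmax(axis_sig[lo:p + half + 1])))
    return np.array(refined)
```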
After peak detection within an N-second chunk, followed by a 1-10 Hz band-pass filter, signal annotation was carried out for GCG X and GCG Y , where the I, J, K, L, and A points were characterized as proposed in 33 and marked in Fig. 1. Along with the mentioned points, two other parameters, which we named the maximum acceleration point (P) and its corresponding amplitude, the maximum acceleration (MA), were also extracted.

Time-domain HRV parameters. For temporal HRV parameters, a few time-domain analyses were applied to the series of successive inter-beat intervals (IBIs) 42 . The normal-to-normal IBI (NN) is defined as the interval between consecutive J peaks in the GCG signal 43 . A few HRV parameters were extracted from the NN time series, such as the average (AVNN), the standard deviation (SDNN), the root-mean-square of successive differences (RMSSD), and the proportion of the number of adjacent NN intervals whose durations differ by more than 50 ms (NN50) to the total number of NNs (pNN50). It is worth mentioning that SDNN, RMSSD, and pNN50 are of great clinical significance as they allow for measuring cardiac risk, respiratory arrhythmia, and parasympathetic nervous activity 42,44 . Additionally, to further explore the impact of NN on AS detection, we introduced the median, skewness, kurtosis, entropy (ENN), self-entropy (SENN), and conditional entropy (CENN) values associated with the NNs as new features. Due to the nonlinearity underlying the dynamics of HRV, we also extracted the vector angular index (VAI), the vector length index (VLI), SD1, and SD2 from the Poincare map, a scatter plot of NN at time t in terms of NN at time t + 1 45 . These features were calculated according to Eqs. (1) to (4):

VAI = (1/B) Σ_{i=1..B} |θ_i − 45°|   (1)

VLI = √[ (1/B) Σ_{i=1..B} (l_i − L)² ]   (2)

SD1 = √ Var[ (NN_t − NN_{t+1}) / √2 ]   (3)

SD2 = √ Var[ (NN_t + NN_{t+1}) / √2 ]   (4)

where θ_i denotes the angle of the i-th scatter point with the x-axis, and l_i and L indicate the distance of the i-th point to the origin and the mean of the distances of all B points to the origin, respectively.
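The time-domain HRV and Poincaré-map features above can be computed directly from an NN series. The sketch below implements the standard definitions of AVNN, SDNN, RMSSD, pNN50, VAI, VLI, SD1, and SD2 (the function name and dictionary layout are illustrative assumptions):

```python
import numpy as np

def time_domain_hrv(nn_ms):
    """Time-domain HRV and Poincare-map features from NN intervals (ms)."""
    nn = np.asarray(nn_ms, dtype=float)
    diff = np.diff(nn)
    feats = {
        "AVNN": nn.mean(),
        "SDNN": nn.std(ddof=1),
        "RMSSD": np.sqrt(np.mean(diff ** 2)),
        "pNN50": 100.0 * np.mean(np.abs(diff) > 50.0),  # NN50 as a percentage
    }
    # Poincare map: scatter of NN at time t versus NN at time t+1
    x, y = nn[:-1], nn[1:]
    theta = np.degrees(np.arctan2(y, x))   # angle of each point with the x-axis
    l = np.hypot(x, y)                     # distance of each point to the origin
    feats["VAI"] = np.mean(np.abs(theta - 45.0))              # Eq. (1)
    feats["VLI"] = np.sqrt(np.mean((l - l.mean()) ** 2))      # Eq. (2)
    feats["SD1"] = np.std((x - y) / np.sqrt(2), ddof=1)       # Eq. (3)
    feats["SD2"] = np.std((x + y) / np.sqrt(2), ddof=1)       # Eq. (4)
    return feats
```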

Frequency-domain HRV parameters.
Frequency-domain analysis was carried out based on the estimation of the power spectral density (PSD) of the SCG and GCG signals, where the oscillation powers of the very-low-frequency (VLF), low-frequency (LF), and high-frequency (HF) bands were extracted as frequency-domain HRV parameters. It has been shown that parasympathetic activities are manifested in the HF band (0.15-0.4 Hz), whereas sympathetic activities belong to the LF (0.04-0.15 Hz) and VLF (0.0033-0.04 Hz) ranges 46 . The total power of the PSD was also calculated as a feature.
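Extracting these band powers from a Welch PSD estimate can be sketched as below. The band edges follow the text; the use of `scipy.signal.welch`, the trapezoid-free bin summation, and the window length are illustrative assumptions (resolving the VLF band in practice requires a segment several minutes long):

```python
import numpy as np
from scipy.signal import welch

# HRV frequency bands (Hz), as defined in the text
BANDS = {"VLF": (0.0033, 0.04), "LF": (0.04, 0.15), "HF": (0.15, 0.40)}

def band_powers(sig, fs, nperseg=256):
    """VLF/LF/HF oscillation power and total power from a Welch PSD."""
    f, pxx = welch(sig, fs=fs, nperseg=min(len(sig), nperseg))
    df = f[1] - f[0]  # frequency resolution of the estimate
    powers = {name: pxx[(f >= lo) & (f < hi)].sum() * df
              for name, (lo, hi) in BANDS.items()}
    powers["total"] = pxx.sum() * df
    return powers
```

For example, a 0.25 Hz oscillation sampled at 4 Hz concentrates almost all of its power in the HF band.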

GCG timing intervals.
A few timing intervals describing the cardiac system were calculated from the GCG signal. It has been demonstrated that the isovolumetric contraction time (IVCT), the isovolumetric relaxation time (IVRT), and the left ventricular ejection time (LVET) can be estimated using the J-I, L-K, and K-J intervals, respectively 33 . Furthermore, we investigated whether MA, the P point, and its associated intervals have any impact on AS detection. Other parameters, such as the intervals between each pair of the fiducial points depicted in Fig. 1 along with their mean, median, standard deviation, skewness, and kurtosis values, were extracted as auxiliary features. The logic behind such an exhaustive feature extraction is to characterize the GCG timing intervals most relevant to AS, i.e., those resulting in the highest accuracy of AS diagnosis. The extracted features are summarized in Table 2.
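Given per-beat fiducial indices, the interval estimates above reduce to simple sample-index differences. A sketch, assuming the fiducial points arrive as arrays of sample indices keyed by point name (a hypothetical data layout):

```python
import numpy as np

FS = 256  # sampling rate (Hz)

def gcg_timing_intervals(fid, fs=FS):
    """Cardiac timing intervals (ms) from per-beat GCG fiducial indices.
    `fid` maps point names ('I', 'J', 'K', 'L') to arrays of sample
    indices, one entry per detected beat."""
    ms = lambda a, b: 1000.0 * (np.asarray(a) - np.asarray(b)) / fs
    return {
        "IVCT": ms(fid["J"], fid["I"]),  # isovolumetric contraction ~ J-I
        "IVRT": ms(fid["L"], fid["K"]),  # isovolumetric relaxation ~ L-K
        "LVET": ms(fid["K"], fid["J"]),  # LV ejection time ~ K-J
    }

def interval_stats(iv):
    """Per-subject summary statistics of an interval series (cf. Table 2)."""
    iv = np.asarray(iv, dtype=float)
    return {"mean": iv.mean(), "median": np.median(iv),
            "std": iv.std(ddof=1), "min": iv.min(), "max": iv.max()}
```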
Training and hyperparameter optimization. Figure 6 shows the general schematic of the proposed machine learning framework, including feature engineering, data split, training, and feature selection. As demonstrated in this figure, AS detection was performed in two different cases: subject-level and chunk-level. The former uses each subject as a single sample, whereas the latter uses each signal chunk as a sample for training the predictive models. In the chunk-level feature space, frequency-domain features were avoided since a chunk is not long enough to accurately calculate the spectral parameters, while the whole signal could be used to measure the spectral features. In either scenario, the entire feature space was split into two parts, training (80%) and test (20%) datasets. Following the data split, we trained the predictive models, where the hyperparameters were optimized through 10-fold cross-validation (10-CV). For subject-level 10-CV, one tenth of the subjects were held out at each fold, and the model was optimized using the remaining nine tenths of held-in subjects; in the chunk-level 10-CV, one tenth of the total chunks were held out at each fold.

Classification techniques and evaluation metrics. Given the two datasets, we used four machine learning techniques for the diagnosis of AS: decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), and support vector machine (SVM) 47,48 . These classifiers are widely employed in a variety of biomedical applications, including opioid patient monitoring 49 , heart failure prediction 50 , and cancer prognosis 51 , owing to their robustness. During the 10-CV, the hyperparameters of each model were tuned in terms of performance. Table 3 presents the parameters with respect to which the ML models were optimized.
For instance, the DT model was optimized in terms of the maximum depth of the tree, the minimum number of samples in a leaf, the minimum number of samples for splitting a node, the maximum number of features, and the criterion used for root selection. Similar parameters were optimized for RF and XGBoost. For XGBoost, however, the model must also be tuned in terms of the learning rate, since training XGBoost follows a gradient-based pattern. Furthermore, all predictive models were optimized in terms of class weights to tackle the data imbalance.

Table 2. Feature space by group types. *Mean, median, standard deviation, skewness, kurtosis, entropy, min, and max were calculated for every parameter in this column.

Figure 6. Subject-level and chunk-level datasets. In the subject-level dataset, each subject is considered a sample. In the chunk-level dataset, every chunk represents a sample. The hyperparameters of the ML classifiers are fine-tuned through 10-fold cross-validation. The trained models are evaluated using the remaining 20% of the datasets. The top features are ranked according to their contribution to the classifier output.
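As a minimal sketch of this tuning step, the grid below mirrors the DT hyperparameters listed above, using scikit-learn's GridSearchCV with 10-fold CV; the synthetic feature matrix, labels, and candidate grid values are illustrative assumptions, not the authors' exact settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                  # stand-in feature vectors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in AS / non-AS labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

grid = {
    "max_depth": [3, 5, None],             # maximum depth of the tree
    "min_samples_leaf": [1, 5],            # minimum samples in a leaf
    "min_samples_split": [2, 10],          # minimum samples to split a node
    "max_features": [None, "sqrt"],        # maximum number of features
    "criterion": ["gini", "entropy"],      # root/split selection criterion
    "class_weight": [None, "balanced"],    # to tackle class imbalance
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), grid,
                      scoring="f1", cv=10)
search.fit(X_tr, y_tr)
```

`search.best_params_` then gives the fold-averaged F1-optimal configuration, and `search.score(X_te, y_te)` evaluates the refit model on the held-out 20%.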
The following metrics were calculated to evaluate the performance of the classification algorithms: precision (PR), recall (RE), accuracy (AC), and F1-score (F1). AC is a simple metric that measures the fraction of correct predictions, but it provides no information about how the model misclassifies samples. PR and RE address this by quantifying false alarms and missed detections, respectively. As a combination of PR and RE, F1 offers a more comprehensive summary of performance, and it was also used for filter optimization.
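These four metrics follow directly from the confusion-matrix counts; a self-contained sketch (the function name is illustrative):

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy, and F1 from confusion-matrix counts."""
    pr = tp / (tp + fp) if tp + fp else 0.0   # fraction of alarms that are true
    re = tp / (tp + fn) if tp + fn else 0.0   # fraction of true cases caught
    ac = (tp + tn) / (tp + fp + fn + tn)      # overall fraction correct
    f1 = 2 * pr * re / (pr + re) if pr + re else 0.0  # harmonic mean of PR, RE
    return pr, re, ac, f1
```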
Filter optimization. As explained in previous sections, we employed an RMS filter of length M (ms) for motion artifact removal and segmented the GCG signals into N-second windows. These parameters tend to be fixed at M = 500 ms and N = 10 s in the literature 22,52 , which are empirical values. We analyzed the effects of changing these parameters on the performance of the XGBoost model. To this end, the performance metrics were assessed for different values of M and N separately in Fig. 7. Fig. 7a and b illustrate the chunk-level performance for N = 10 s as a function of M, and for M = 500 ms as a function of N, respectively. The same characteristics are investigated in Fig. 7c and d for the subject level. As depicted in these figures, suggesting optimum values for M and N is not a straightforward process. For instance, Fig. 7a suggests N = 10 s and M = 500 ms for the maximum F1-score, whereas according to Fig. 7b the best performance requires N = 18 s, which implies that these parameters need to be tuned simultaneously. We therefore defined an optimization problem to jointly optimize M and N:

(M*, N*) = arg max_{M, N} F1(M, N)

where M* and N* indicate the optimum values resulting in the highest possible F1-score. To solve the optimization problem, the Bayesian optimization method is employed. This method introduces a surrogate for the cost function and measures its uncertainty through a Bayesian learning technique and Gaussian process regression. It then defines an acquisition function over the surrogate to determine the sampling locations and find the optimum values 53 . In this way, an end-to-end training procedure results in the optimized objective function and its corresponding filter parameters.

Experimental results and discussion
In this section, the experimental results are discussed and compared with the literature.
Datasets, features, and filter optimization. As mentioned earlier, two datasets were prepared. The subject-level dataset included 45 subjects, of whom 36 were used to train the models and the remaining 9 were held out as the test dataset. The held-out group consists of two mild AS, one moderate AS, two severe AS, and four healthy individuals. Out of the 36 training subjects, at each fold of the 10-CV, 32 subjects were held in for training and 4 subjects were held out for hyperparameter optimization. For each subject, the mean, median, standard deviation, skewness, kurtosis, entropy, min, and max (8 statistics) were calculated for each of the 11 GCG features tabulated in Table 2; thus, the total number of GCG features was 11 × 8 = 88. Furthermore, 5 frequency-domain HRV parameters were extracted for each channel axis (6 axes) per subject, amounting to 6 × 5 = 30 features. In addition, 15 time-domain HRV parameters were extracted from the NNs for each subject. Therefore, the total number of extracted features for each sample in the subject-level feature space was 123. The chunk-level feature space included 1272 chunks, for each of which the 88 GCG timing parameters as well as the 15 time-domain HRV parameters (103 features in total) were calculated. For the chunk-level dataset, 1017 chunks were used for training and 255 for testing. Subsequently, 102 chunks were held out for 10-CV hyperparameter optimization, whereas 915 chunks were held in for fine-tuning the model. The RMS filter length and the chunk window length were optimized by the Bayesian optimization technique during training. It should be noted that these parameters were optimized using the training dataset only, not the test dataset. The optimum values achieved for the filter parameters were M* = 1582 ms and N* = 11.2 s. The optimized RMS filter length is roughly three times the value proposed in the literature (M = 500 ms), whereas the chunk length is almost consistent with the literature.
Feature selection for models. We employed both the Shapley Additive exPlanations (SHAP) technique, a game theory-based approach for interpreting the output of a model in terms of its input feature space, and the feature-importance f-score from XGBoost to provide an explainable model 54 . SHAP is employed as it calculates the feature importance in terms of the impact of every single observation on the output performance, whereas the f-score from XGBoost represents the importance of a feature for the whole dataset. The score of each feature in the SHAP method is calculated by the following formula:

α_i = Σ_{γ ⊆ Γ∖{i}} [ |γ|! (|Γ| − |γ| − 1)! / |Γ|! ] · [ F1(γ ∪ i) − F1(γ) ]

where α_i , γ , i, and Γ denote the score value of the i-th feature, any subset of the feature space excluding the i-th feature, the i-th feature, and the entire feature space, respectively. F1 represents the F1-score, so that [F1(γ ∪ i) − F1(γ )] indicates the change in F1-score resulting from incorporating the i-th feature into the feature subset. For every feature, α_i is calculated for all samples in the training dataset and the resulting values are averaged to obtain the feature score. The f-score from XGBoost, on the other hand, evaluates each feature based on its impact on the output for all samples at once. Thus, the SHAP values validate the feature ranking obtained by the XGBoost feature importance and vice versa: if the scores from both methods agree, one can conclude that the model is robust for the prediction task. In the following sections, we assess the importance of the features through both the XGBoost f-scores and the SHAP values.
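The Shapley score above can be computed exactly by brute force for a small feature set, which makes the weighting explicit. The sketch below is a generic illustration: `value` stands in for the F1-score of a model trained on a given feature subset (the production SHAP library approximates this rather than enumerating all subsets).

```python
from itertools import combinations
from math import factorial

def shapley_scores(features, value):
    """Exact Shapley score of each feature: the weighted average, over all
    subsets not containing it, of the gain in the value function when the
    feature is added (here `value` stands in for a subset's F1-score)."""
    n = len(features)
    scores = {}
    for i in features:
        rest = [f for f in features if f != i]
        total = 0.0
        for r in range(n):  # r = size |gamma| of the subset
            w = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(rest, r):
                s = frozenset(subset)
                total += w * (value(s | {i}) - value(s))
        scores[i] = total
    return scores
```

For an additive value function, each feature's Shapley score recovers exactly its individual contribution, and the scores always sum to the value of the full feature set (the "efficiency" property).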
Performance evaluation. After training and hyperparameter optimization, the performance of the proposed method is evaluated by applying the models to the test dataset to predict the existence of AS. By comparing the predicted values with the true labels, the performance is reported using the metrics introduced in the previous section. Table 4 summarizes the performance results for the two datasets. With the optimized parameters, DT, XGBoost, and SVM achieved 100.00% accuracy, F1-score, precision, and recall at the subject level. As reported, DT outperforms RF, indicating 77.78% accuracy and 87.50% F1-score for the models with literature-based parameters (M = 500 ms and N = 10 s). Similarly, DT suggests a higher F1-score than RF (100.00% vs. 80.00%) for the models with optimized parameters (M* = 1582 ms and N* = 11.2 s). Comparing the models with optimized parameters against the literature-based ones at the subject level, the accuracy and F1-score of DT improved by 22.22% and 12.50%, respectively. Furthermore, XGBoost shows an 85.71% F1-score, the highest among the models with literature-based parameters at the subject level, yet XGBoost with the optimized filter parameters reaches 100.00% accuracy and F1-score. Such an improvement demonstrates the benefit of the filter optimization. According to the optimized chunk-level analyses in Table 4, XGBoost and RF outperform the other methods, with F1-scores of 96.49% and 90.34%, respectively.

Figure 8 depicts the top 20 features at both the chunk and subject levels. Figure 8a and b illustrate the top 20 features ranked by the XGBoost f-score and the SHAP values for the subject-level analyses, respectively; Figure 8c and d show the same for the chunk-level analyses.
Among the top 20 features, eighteen (90.00%) are consistent between the two criteria of importance at the subject level, whereas seventeen (85.00%) are common between the two methods at the chunk level. According to Fig. 8b, sixteen of the top subject-level features belong to the GCG morphological features, three to the time-domain HRV, and one to the frequency-domain HRV parameters, demonstrating the higher representativeness of the GCG features in comparison to the others. Moreover, the highest occurrences belong to MA and the P-point-included intervals, each with five features among the top 20. From the time-domain HRV parameters, NN shows the highest correlation with AS, whereas the only frequency-domain HRV feature is the total power of SCG Z . For the chunk-level analyses, nineteen features belong to the GCG time-domain features, whereas only one feature, NN, belongs to the time-domain HRV feature space. Among the top GCG features, MA, P-J, and L-J each appear three times, the highest frequencies among the top 20 features. Furthermore, the top features presented in Fig. 8b and d contribute almost equally to the decisions on the AS and healthy states (blue and red bars), denoting that the top features are representative of both classes.
Ablation study on feature space. An ablation study was conducted on the feature spaces to investigate their predictive performance for AS diagnosis. XGBoost was used in this section due to its superior performance on both datasets, with the optimized parameters for both the subject-level and chunk-level classification tasks. Three feature spaces were assessed for the subject-level dataset and two for the chunk-level dataset (frequency-domain HRV parameters are not included at the chunk level). Thus, the performance of the model was evaluated separately for the GCG time-domain parameters, the time-domain HRV parameters, and the frequency-domain HRV parameters. The results are summarized in Table 5. For the chunk-level study, the best performance belongs to the GCG timing intervals (95.19% F1-score), whereas the weakest performance is obtained with the time-domain HRV parameters (74.56% F1-score). Furthermore, at the subject level, the frequency-domain HRV parameters carry more information regarding AS than the time-domain HRV parameters (F1-score: 80.00% vs. 60.00%; accuracy: 66.67% vs. 55.56%).
Furthermore, the top five features for the datasets in the ablation study are extracted and shown in Table 6. According to the time-domain HRV parameters listed for the chunk-level and subject-level studies, NN, SD1, and SD1/SD2 contribute more than the others to AS diagnosis, whereas the top features from the frequency-domain HRV space include the total power, the LF/HF ratio, and the HF power. Interestingly, three of the five GCG features in both cases are associated with MA and the intervals involving the P point. Also, IVRT is ranked twice as the best feature at the subject level, showing the importance of this feature.

Classification of severity level on a per-patient basis. In this section, the robustness of the proposed method is evaluated for diagnosing the severity level of aortic stenosis on a per-patient basis. For this purpose, a 4-class classification was conducted to observe the failure modes of the proposed method. The classification involves the healthy group as well as the mild, moderate, and severe aortic stenosis cases. For the 4-class classification, we augmented the size of the dataset by considering 50% overlap between every two consecutive signal chunks. This practice helps the predictive model generalize better to the test dataset. As a result, a total of 2336 chunks were generated, of which 1868 and 468 were used for the training and test datasets, respectively. The 4-class classification results suggest 92.72% precision, 91.95% recall, 93.80% accuracy, and 92.29% F1-score. In terms of the F1-score, the performance of the 4-class classification is slightly degraded compared to the binary AS classification (92.29% vs. 96.49%).
This small difference is due to the small size of the dataset with respect to the number of classes (4). To investigate the failure modes, the classification results are summarized in the confusion matrix depicted in Fig. 9. As illustrated in this figure, 98.16% of the healthy group were classified correctly, whereas the remaining 1.84% were classified as mild cases. According to the confusion matrix, the mild, moderate, and severe cases were classified correctly at rates of 80.77%, 92.93%, and 95.95%, respectively. These results suggest that severe patients' recordings hold distinctive characteristics that serve to distinguish them from the other cases. The 4.05% of misclassified severe cases were reported as moderate, which can be attributed to the morphological similarities between moderate and severe cases. Interestingly, a severe case was never misclassified as healthy or mild AS, which demonstrates the robustness of the proposed feature space with respect to the number of classes. As for the mild cases, 8.97% and 10.26% were classified as healthy and moderate, respectively, which could have been caused by either the size of the training dataset or the morphological similarity of mild AS to the healthy and moderate cases. As for the moderate cases, a small aggregate of 7.07% was misclassified as either mild or severe, which supports the practicality of the proposed framework. Considering the aforementioned points, it is concluded that the proposed method offers high reliability in detecting and classifying AS. Similar to the AS detection task, the top features for the 4-class classification were obtained through the feature importance and the SHAP value methods, as summarized in Fig. 10a and b, respectively. Comparing the two feature sets, a high consistency can be observed in 16 common features, most of which are from the GCG timing intervals. Fig. 10b also helps interpret the impact of each feature on the predicted classes. As shown, five MA-related features are among the top features, which demonstrates their importance in classifying the severity level. According to Fig. 10b, MA_mean, IVRT_median, SDNN, and PJ_skewness indicate higher consistency with the severe cases than the other features, which can be considered in future studies on AS. Certain features, such as SDNN, MA_std, IVCT_std, and IVCT_median, contribute equally to both the healthy and mild classes, which may explain the 8.97% of mild cases misclassified as healthy.
Comparison with literature. AS detection has been considered in only a limited number of research works, which motivates us to compare the proposed framework with works addressing heart disease detection/classification through SCG/GCG modalities, including 21 . Our method predicts AS without any auxiliary signals, whereas in 21 , the ECG signal was used for feature extraction. Furthermore, the proposed framework suggests an analogous or higher F1-score (0.96) and recall (0.97) in comparison to the methods for the detection of other CVDs. The proposed framework also goes beyond disease detection by offering explainable modeling using the SHAP values and the XGBoost feature importance to extract the physical meaning of the readings. To recapitulate, the proposed method is an accurate, computationally efficient, and interpretable approach for AS detection that contributes to low-cost wearable sensing systems.

Conclusions and future work
This paper reports on the design and development of a novel reference-less framework for the detection of aortic stenosis based on SCG and GCG morphological characteristics and HRV parameters. The model is optimized in terms of filter design, and two datasets are prepared at the subject and chunk levels. Furthermore, new parameters, namely MA and the P-included intervals, are introduced and shown to have higher consistency with AS among the top features ranked by the SHAP values and the XGBoost f-scores. Other features, such as the time-domain and frequency-domain HRV parameters, are also extracted; however, a low correlation is demonstrated between the HRV parameters and AS. On the contrary, the ML models trained on the GCG timing intervals perfectly discriminate the AS cohort from the non-AS group. The most accurate ML model for both datasets is XGBoost, with F1-scores of 100.00% and 96.49% for the subject-level and chunk-level analyses, respectively. It is shown that the proposed optimized filter design is suitable in both the subject-level and chunk-level settings, driving our methods to outperform previous works in the literature. Finally, the proposed framework was demonstrated to be robust enough for classifying the severity level of AS, offering 93.80% accuracy and 92.29% F1-score. In this work, data were collected from two groups of AS and non-AS subjects at senior ages. Future work involves a larger number of subjects, including subjects at younger ages. Also, due to the close relationship between AS and the GCG parameters, there is promise in tracking the progress and severity of aortic stenosis at different stages using gyroscopic parameters.