Heart rate variability helps classify phenotype in systemic sclerosis

We aimed to develop a systemic sclerosis (SSc) subtypes classifier tool to be used at the patient’s bedside. We compared the heart rate variability (HRV) at rest (5-min) and in response to orthostatism (5-min) of patients (n = 58) having diffuse (n = 16, dcSSc) and limited (n = 38, lcSSc) cutaneous forms. The HRV was evaluated from the beat-to-beat RR intervals in time-, frequency-, and nonlinear-domains. The dcSSc group differed from the lcSSc group mainly by a higher heart rate (HR) and a lower HRV, in decubitus and orthostatism conditions. Stand-up maneuver lowered HR standard deviation (sd_HR), the major axis length of the fitted ellipse of Poincaré plot of RR intervals (SD2), and the correlation dimension (CorDim) in the dcSSc group while increased these HRV indexes in the lcSSc group (p = 0.004, p = 0.002, and p = 0.004, respectively). We identified the 5 most informative and discriminant HRV variables. We then compared 341 classifying models (1 to 5 variables combinations × 11 classifier algorithms) according to mean squared error, logloss, sensitivity, specificity, precision, accuracy, area under curve of the ROC-curves and F1-score. F1-score ranged from 0.823 for the best 1-variable model to a maximum of 0.947 for the 4-variables best model. Most specific and precise models included sd_HR, SD2, and CorDim. In conclusion, we provided high performance classifying models able to distinguish diffuse from limited cutaneous SSc subtypes easy to perform at the bedside from ECG recording. Models were based on 1 to 5 HRV indexes used as nonlinear markers of autonomic integrated influences on cardiac activity.


Heart rate variability study
The subjects were instructed to refrain from exercising, drinking alcohol, coffee or tea, and sleep deprivation for at least 24 h before their participation.The day of the HRV study, room experiment was quiet and its temperature kept constant.Subjects were asked to lie down in the examination bed and to relax with free breathing.Three lead electrocardiogram (ECG) was acquired (ECG100C, UIM100C, and MP150A-CE Biopac equipment, BIOPAC Systems Inc., Santa Barbara, CA, USA) during 15 min in supine resting condition and then during 10 min in orthostatic resting condition after stand-up.The data were digitized and stored on a personal computer for further analysis.Two periods of 5 min were extracted and analyzed: (1) 5 min of supine rest (Decubitus) after at least 5 min of stable ECG signal, and (2) 5 min of orthostatic rest (Orthostatism) starting immediately after stand-up phase (< 10 s) noise stabilization.Signals were conditioned and processed as follow and previously described 10 .Raw signals were sampled at 1000 Hz and digitized with a 24-bit analog-to-digital converter.Non-evident non-stationarity, such as very slow drifting of the mean or sudden changes of the variance, was observed after 3-order polynomial detrending.No additional filtering technique such as integration was used.Pan and Tompkins real-time QRS detection algorithm was used to automatically detect R-waves and build the RRI series.Tachograms were then firstly visually checked and corrected for artifacts (including ectopic beat management, i.e., detection, cancelation, and interpolation) when necessary.Then, all cardiovascular data analysis was performed from the 5 min beat-to-beat time series constituted from inter-beat intervals from ECG R waves, i.e., the RR intervals (RRI) that were extracted from raw signals.Discrete original RRI series were resampled by cubic spline interpolation with a 4 Hz sampling rate to generate equidistantly sampled time series x(i), i = 1, 2, …, N. The HRV was assessed according to the consensual standards 13 , using Kubios software (Kubios HRV, 2.1, Biosignal Analysis and Medical Imaging Group, Kuopio, Finland), computing HRV metrics in Decubitus (_S1) as well as in Orthostatism (_S2).We also assessed stand-up induced HRV changes (_delta) calculating the difference between orthostatic and decubitus values for each variable.

Time-domain analysis of HRV
Several classical metrics were used as indexes of the total variability that arises from both random and periodic sources 14 , including: the mean RRI (mean_RR) and mean HR (mean_HR), the standard deviation of the RRI (sd_RR) and of the HR (sd_HR), the square root of the mean squared differences between adjacent normal RRI (RMSSD), the count of successive normal beat lengths that differed more than 50 ms (NN50), and the percentage of successive normal beat lengths that differed more than 50 ms (pNN50).

Frequency-domain analysis of HRV
Different metrics were used to characterize the periodic oscillations of the studied time series using the estimated power spectrum by fast Fourier transform Welch's periodogram technique.These metrics included the centroid frequency (expressed in Hz) and power (_power, expressed in ms 2 ) in 2 frequency bands of interest: high frequencies (HF, 0.15-0.4Hz) and low frequencies (LF, 0.04-0.15Hz).The total power (tot_power) and LH_power/HF_power ratio (LF_HF_power) were also computed.The HF_power and LF_power were also expressed as the percentage of tot_power (HF_power_perc and LF_power_perc, respectively) and normalized units (HF_power_nu and LF_power_nu, respectively).These variables were said [13][14][15] to be indexes of the total variability of the heart rate (tot_power), the modulation of the sinus node activity by the parasympathetic component of the ANS (HF_power_nu), the modulation of the sinus node activity by both the parasympathetic and sympathetic components of the ANS (LF_power_nu), the influence of the temperature, and the sympathetic/ parasympathetic balance (LF_HF_power).

Nonlinear-domain analysis of HRV
A set of metrics are known to catch and highlight non-linear properties of the studied time series.We plotted the RRI of rank n + 1 as a function of the RRI of rank n (lag 1 Poincaré plot) and calculated its SD1, SD2 and SD1/ SD2 defined as the standard deviation of the instantaneous beat-to-beat RRI variability (minor axis of the fitted ellipse), the standard deviation of the continuous long-term RRI variability (major axis of the fitted ellipse) and the axis ratio, respectively 16 .They were respectively used as non-linear indexes of (1) the rapid changes in the RRI and thus of the parasympathetic sinus node control 17 , (2) the effects of both parasympathetic and sympathetic components on the sinus node activity 18 , and (3) the relationship between these components, which is the ratio of the short interval variation to the long interval variation.We also used other non-linear tools to characterize the RRI dynamics, as previously described 10 .Recurrence plot analysis quantification (minimal line length (Lmin), mean line length (Lmean), maximal line length (Lmax), divergence (DIV), recurrence rate (REC), determinism (DET) and Shannon entropy (ShanEn)) was performed with embedding dimension, lag, and threshold distance set to m = 10, τ = 1, and r = √ mSD of the RRI time series analyzed, respectively.We computed detrended fluc- tuation analysis α1 and α2 coefficients (alpha1 and alpha2, respectively) with segment length set to n ∈ (4,16) and n ∈ (16,64), respectively.Approximate Entropy (ApEn) and Sample Entropy (SampEn) estimates of each RRI time series were computed with m (embedding dimension) and r (filtering level) set to 2 and 0.2 SD of the RRI time series analyzed, respectively.Finally, we also estimated correlation dimension (CorDim) of the RRI times series 19 .Indeed, first, non-linear deterministic measures of heartbeats better detect the autonomic changes related to mortality than the stochastic indexes; and second, what the time-dependent non-linear metrics show as indicators of cardiac vulnerability to lethal arrhythmias are transient non-stationary shifts of dimension of the heartbeat dynamics to a low value 20 .CorDim quantified the time self-similarity of a signal, i.e., of the RRI time series.CorDim can be considered as an estimation of the lower threshold of the number of degrees of freedom for the underlying system generating the observed data and is typically used as an index of the overall complexity of the system dynamics estimated from a time series 21 .CorDim was computed as described by Grassberger and Procaccia 22 with r = 15% from the standard deviation of the RRI.Accordingly, the first step is to construct the correlation integral (N,m,r) function.The correlation integral counts the fraction of pairs (Xi, Xj) whose distance is smaller than r and is defined as: where Xi and Xj represent phase-space trajectory points, N represents the total amount of phase-space points, and H represents the Heaviside step function, i.e., H(α) = 0 if α < 0 and H(α) = 1 if α ≥ 0. D2 is subsequently computed as:

Outcomes
The main outcome was the ability to express the SSc subtype as a function of HRV, i.e., to classify the SSc clinical form as diffuse or limited of a patient using heart rate dynamics metrics as markers.Secondary outcomes were the link between HRV metrics and SSc skin extent, and between HRV metrics and lung function.

Statistical analysis
All statistical computations were performed with Python 3.10 LTS.

Descriptive statistics
The quantitative variables were first tested for normality distribution with the Shapiro-Wilk normality test.Accordingly, means or medians groups' differences (i.e., dcSSc vs lcSSc) were tested performing independent t-tests, paired t-tests, or Wilcoxon-Mann-Whitney tests when necessary for continuous variables, and Wilcoxon-Mann-Whitney or Kruskal-Wallis tests for ordinal variables.The qualitative categorical variables were assessed through modalities headcounts and groups' differences were tested performing Chi-square or Fisher exact tests.Linear correlations between quantitative variables were estimated computing Pearson's and Spearman's correlation coefficients when necessary and hypothesis of correlation was tested performing a t-test.A p-value < 0.05 was considered as statistically significant, nevertheless all variables having an unadjusted univariate p-value < 0.1 (n = 33) were considered for feature engineering.Furthermore, specifically for HRV analysis, and with a dual objective of simplifying the interpretation, particularly in terms of physiological outcomes, and limiting the expansion of type I error, a two-way repeated measures ANOVA analysis with Bonferroni correction was employed.The first factor considered was the clinical form (dcSSc or lcSSc), while the second factor was the position (decubitus or orthostatism).Non-parametric ANOVA (Kruskal-Wallis) was conducted when necessary.For secondary outcomes, simple and multiple linear regressions were performed to predict quantitative variables while simple and multiple logistic regressions were performed to predict bimodal categorical variables.Ordinary Least Squares optimizing method were used in both cases.Significance was set to p-value < 0.05.

Features engineering
To limit effects of collinearity on features extraction procedure, we identified all pairs of variables being highly correlated (R 2 > 0.9) and, according to their physiological explainability, excluded the less appropriate variables (n = 9).Quantitative data were then normalized (min-max scaling) for subsequent steps.On one hand, five features selection algorithms (Random Forest, Extra Tree Classifier, Select K Best, Generic Univariate Selection, and Recursive Feature Elimination) were used to rank the 24 selected HRV variables.Ranking was performed on variable importance (explained variance) through a homemade redundancy weighted score taking into account, for each variable, the rank attributed by each algorithm and the dispersion of the rank attributed through the five algorithms.On the other hand, the gamma analysis technique was used to rank the most inter-groups discriminative variables 23,24 .

Models and classification
We used the Synthetic Minority Over-sampling Technique (SMOTE) to manage unbalanced groups' size to avoid any bias in models' learning and classification processes.To limit the number of candidate models and their complexity in order to be compatible with a physiological explainability and a bedside application, we generated only combinations of 1 to 5 variables among the 24 selected HRV variables, i. e. we generated i=5 i=1 C 5 i = 31 combinations among the i=5 i=1 C 24 i = 55454 possible combinations.From the 31 combinations of the most discriminative variables generated, 341 classifying models were generated applying on each combination 11 machine learning algorithms that were: logistic regression, decision tree, random forest, linear support vector machine (SVM), radial based function SVM, neuronal network, gradient boosting machine, stacked ensembles, extremely randomized trees, k nearest neighbor, and extreme gradient boosting.Database was randomly split into training (80% of sample size) and validation (20% of sample size) sets.Performances of the 341 classifying models for SSc subtypes classification purposes, i.e., dcSSc or lcSSc, were assessed according to 8 usual metrics that are: mean squared error, logloss, sensitivity, specificity, precision, accuracy, area under curve of the ROC-curves and F1-score.The cross validation was performed with k-fold = 5 and repeated 10 times, providing information on how well a classifier generalizes for test data and further classifications, in particular the expected error range of the classifier thanks to the cross-validation score.www.nature.com/scientificreports/

Ethical approval
The study was approved by the Ethical Committee of Marseille University Hospital and was conducted in accordance with the Declaration of Helsinki.Each subject provided written consent for data analysis.Agreement number is PADS19-352/WZTVF7.

Population description
Between January 01, 2018, and December 31, 2021, 60 patients having SSc and assessed for dysautonomia were enrolled.16 had dcSSc and 42 lcSSc clinical forms according to the internist experts.Two were excluded because of missing data.General characteristics of the studied population and each subgroup are summarized in Table 1.By definition and as expected, the two groups differed clinically mainly in the higher proportion of patients with severe symptoms in the dcSSc group compared with the lcSSc group, as observed through the higher Rodnan (t-statistic = 8.885, df = 20, p < 0.001) and Medsger (U-statistic = 626.5,p < 0.001) scores, and through the presence of sphincter atonia (χ 2 -statistic = 26.72,df = 2, p = 0.001) and nonspecific interstitial pneumonia (χ 2 -statistic = 44.86,df = 2, p = 0.031).Similarly, esophageal peristaltism dysfunction and diffuse parenchymal lung disease tended to be more frequent in the dcSSc group.As also expected, the two groups biologically differed mainly in the higher proportion of patients with anti-Scl-70 (χ 2 -statistic = 209.56,df = 2, p < 0.001) and anti-SSA (χ 2 -statistic = 20.5, df = 2, p = 0.001) antibodies in the dcSSc group compared with the lcSSc group that had higher proportion of patients with anti-centromere antibodies (χ 2 -= 14.05, df = 2, p = 0.001).

Lung function
Respiratory parameters measured during lung function test and their interpretation as functional syndromes are summarized in Table 2.As expected, in terms of lung function, the two groups differed only in the higher proportion of patients with restrictive lung disease in the dcSSc group compared with the lcSSc group (χ 2statistic = 26.74,df = 2, p = 0.001).

Autonomic influences on sinus node activity
HRV indexes in decubitus then in orthostatism conditions as well as the stand-up induced HRV changes are summarized in Table 3.The two groups differed mainly in a higher HR in the dcSSc group compared with the lcSSc group (mean_HR, clinical form main effect, H-statistic = 11.13,p = 0.003), in decubitus and even more so in orthostatism position (mean_HR, interaction of main effects, H-statistic = 29.93,p < 0.001), HR (mean_HR) being inversely proportional to RRI duration (mean_RR).On the contrary, the differences in observed values for all other HRV markers, especially those derived from spectral and nonlinear analysis, appear to be less systematic.Except for ShanEn, which is similarly higher in the dcSSc group compared with the lcSSc group (ShanEn, clinical form main effect, F-statistic = 6.50, df = 1, p = 0.049) in both decubitus and orthostatic positions (no interaction effect), the other HRV markers differ from one group to another depending on the position considered.This differential effect of verticalization according to patients' group is highlighted through the interaction effect, as seen with LF_power, HF_power, or tot_power.Similarly, a higher length of the mean and the longest diagonal line of the recurrence plot was observed in the dcSSc group compared with the lcSSc group according to position (Lmax, interaction of main effects, H-statistic = 28.92,p < 0.001 and Lmean, interaction of main effects, H-statistic = 13.46,p = 0.011), Lmax being inversely linked to divergence (DIV).Finally, the two groups differed also in terms of HRV parameters changes induced by stand-up.Mainly, mean_HR increase was higher in dcSSc compared to lcSSc groups (t-statistic = 3.034, df = 24, p = 0.006) while all others time domain parameters were lowered during orthostatism and even more lowered in the dcSSc group than in the lcSSc group.Particularly, sd_HR was lowered in the dcSSc group while increased in the lcSSc group (t-statistic = − 3.329, df = 18, p = 0.004).Similarly, SD2 and CorDim were lowered in the dcSSc group while increased in the lcSSc group (t-statistic = − 3.586, df = 18, p = 0.002 and t-statistic = − 3.247, df = 20, p = 0.004, respectively).

Features importance
Collinearity and correlations between all variables that have been significantly different in the dcSSc group compared to the lcSSc group with a univariate approach are summarized in Figs. 1 and 2. We can note that many HRV indexes are positively or negatively correlated, and mostly of interest, some HRV parameters are correlated to clinical and respiratory quantitative parameters.Particularly, first, mean_HR_S2, SD2_delta, sd_HR_delta, SD2_delta and CorDim_delta are correlated with Rodnan and Medsger scores and with most of the respiratory function parameters, and second, most of the HRV parameters changes (delta) are correlated with clinical scores and lung function parameters.A synthetic view of features importance to describe data variance is proposed in Fig. 3. Features are ranked according two different metrics, the relative importance of the feature (explained data variance) and the gamma-metric (discriminant power).The first four most efficient features identically identified are SD2_delta, sd_HR_delta, CorDim_delta, and SD2_S2.The fifth important feature is either LF_power_S2 either mean_HR_S2 according to the used metric.As fifth feature we chose mean_HR_S2 because of a more physiological and measuring robustness.

Models' performances
Table 4 summarizes the performances of the classifying models generated including 1 to 5 selected variables.
Concerning sensibility and specificity we can note that models range from 0.87 to 1 and from 0.25 to 0.99 respectively.The most efficient variable taken alone and combining the best sensibility and specificity combination is CorDim with 1 and 0.99 values respectively.But on the other hand, precision, accuracy, and F1-score are the lowest of all tested models (0.749, 0.771, and 0.823 respectively).Besides, it should be noted that the proposed models, except for the one including only CorDim, appear to have a rather average sensitivity ranging from 0.25 to 0.81.Moreover and interestingly, we can note that F1-score that is an integrative metrics (harmonic mean of precision and accuracy) is not a linear function of the number of included variables.F1-score ranges from 0.823 for the best 1-variable model to a maximum of 0.947 for the 4-variables best model while 0.946 is reached by the 3-variables best model.If specificity and/or precision is an objective we can note that best models are two 3-variables and two 4-variables models including SD2_delta, SD2_S2, sd_HR_delta, and CorDim_delta.

Major findings
We aimed to provide classifying models of SSc subtypes, i.e., models that distinguish dcSSc from lcSSc clinical forms, from HRV linear and nonlinear metrics, and easy to compute at the patient's bedside.As expected, at rest in decubitus condition, only HR was clearly different in SSc subgroups, HR being higher in dcSSc.Transition  www.nature.com/scientificreports/from decubitus to orthostatism increased HR in all SSc patients.HR reached higher magnitude in dcSSc and was associated with a lower overall variability (tot_power).Stand-up-induced HRV changes revealed differential effects of orthostatism in SSc subtypes, effects that were mainly higher HR increase associated with (1) loss of time-domain variability and (2) loss of nonlinear proprieties of HRV (sd_RR, sd_HR, SD1 and SD2, ApEn, CorDim, and recurrence plot quantifying indexes) in dcSSc compared to lcSSc patients.Accordingly, efficient classifying models of SSc subtypes have been provided using from 1 to 4 HRV metrics as input variables reaching F1-score of overall performance up to almost 0.95.To resume, it's possible to distinguish dcSSc from lcSSc clinical forms at the patient's bedside with a simple digital ECG analysis.

Autonomic influences on sinus node activity
Time domain and frequency domain HRV metrics have been used to investigate the cardiovascular consequences of SSc since 1994 2 .On one hand, SSc is known to alter autonomic nervous system and its cardiovascular component 25 and on the other hand HRV has been considered to inform on autonomic nervous system modulation of sinus node rhythmic activity 13 .High frequency (around 0.25 Hz) being related to respiration via the parasympathetics and low frequency (around 0.1 Hz) to a sympathetic Eigen-oscillation of the baroreflex to blood pressure and heart rate system 26 .Typically, SSc is described as increasing resting HR through a predominant sympathetic activity and vagal withdrawal compared to healthy controls 9,[27][28][29] and as blunting cardiovascular response to orthostatism 9,29 .The HR and HRV we observed in dcSSc and lcSSc subtypes at rest and during orthostatism does not evidence a so clear phenotype-dependent sympathetic activation neither parasympathetic withdrawal.In decubitus position, when comparing dcSSc and lcSSc groups using multiple univariate unpaired t-tests as well as ANOVA, we found no evidence for any difference in autonomic modulation of sinus activity, as commonly accepted 13 .For 5-min RR intervals time series this usually includes some HRV spectral indexes that are LF_power (ms 2 ), LF_power_prc (%), LF_power_nu (n.u.), HF_power (ms 2 ), HF_power_prc (%), HF_power_nu (n.u.) and LF_HF_power.On the contrary, in dcSSc compared to lcSSc, we found that orthostatism was paradoxically associated with significantly lower low frequency spectral power (LF_power as explicated by the main effects interaction) while high frequency spectral power (HF_power as explicated by the main effects interaction) tends to be lower.Orthostatism is known to physiologically induce a temporary adaptative response increasing sympathetic and decreasing parasympathetic activities.Oppositely to previous descriptions 9,29 , our data do not reveal any major alteration in the sympathovagal response, as expressed by LF_prc, LF_nu, HF_prc, HF_nu and LF_HF ratio, to orthostatism.These normal responses are explicitely evident in Table 3 of the results.LF_power_nu and even more so HF_power, HF_power_prc, HF_power_nu, as well as LF_HF_power, are significantly modified during orthostatism compared to decubitus (position effect, p < 0.001).No clinical form effect (group effect) was observed but only main effects interactions (main effects interactions, p < 0.001).The effect of orthostatism was different according to clinical form.Yet, indexes from RRI time series spectral analysis are known to be markers of the autonomic nervous system influences on the www.nature.com/scientificreports/sinus node rhythmic activity.Particularly LF and HF are thought to reflect sympathovagal balance as well as sympathetic and parasympathetic sinus influences, respectively, when expressed in normalized units.According to first HRV guidelines 13 , one could interpret our results as a sympathovagal balance functionality.On the contrary, there are at least two factors that should prompt us to approach the interpretation more judiciously.Firstly, a recent and highly regarded review 30 emphasizes that scientific evidence in the literature supports a predominantly parasympathetic-determined Heart Rate Variability (HRV) when assessed through spectral analysis indexes.This implies that, based on the results of HFnu and LFnu, as well as others spectral analysis outcomes, it appears that parasympathetic-determined HRV does not play a major role in partitioning the clinical forms of SSc patients.Secondly, the mathematical constructs of normalized HRV spectral metrics can lead to a confusing interpretation, depending on whether LF and HF relatively vary with LFnu = LF/(TP − VLF) ≈ LF/(LF + HF) and HFnu = HF/(TP − VLF) ≈ HF/(LF + HF).According to these knowledges and our observations, it is difficult to interpret our results as a simple and unique parasympathetic blunted response (withdrawal) to transition from decubitus to orthostatic conditions.It seems that in orthostatism, sympathetic (LF_power (ms2), LF_power_prc (%)) modulation of sinus activity is lower in dcSSc compared to lcSSc that is in favor of an unpaired adaptation to orthostatism all the more so as it should increase.As LFnu = LF/(TP − VLF), this impairment seems to be supported by modifications impacting LF, TP, but also VLF that is said to reflect mid-and long-term reninangiotensin system, blood pressure regulation, thermoregulation, and others long-term humoral and hormonal activities 13 .And second, Poincaré plot short term variability index SD1 was not significantly different between dcSSc and lcSSc groups (no clinical form effect) but orthostatism had differential effects according the groups (main effects interactions, p < 0.002).SD1, that is the standard deviation of the plot data on the minor axis is mathematically equal to root mean square of successive differences of RRI duration described the instantaneous beat-to-beat variability that is neurally supported by the only vagal nerve 31 .Oppositely, Poincaré plot long term variability index SD2 that is said to reflect sympathetic and baroreflex HR modulation 32,33 was significantly different between dcSSc and lcSSc groups during orthostatism but decubitus (main effects interactions, p < 0.034).All these elements converge with the exclusion of a simple vagal-mediated impairment differential main effect between lcSSc and dcSSc, but mostly toward a baroreflex loop complex impairment.Probably for this reason, metrics of HRV linear dynamics lacked to clearly distinguish dcSSc and lcSSc subtypes.

HR nonlinear dynamics and system failure
Three metrics quantifying the nonlinear properties of HR dynamics have been found to be more deeply impacted in dcSSc vs. lcSSc groups by orthostatic stress according to stand-up-induced changes results: SD2, ApEn, and CorDim.SD2 that is the long/major axe of the elliptic fit of Poincaré plot is also said to reflect nonlinear longterm RR modulation 31,32 .ApEn reflects the complex structure of a time series and the complex behavior of the system generating the time series, i.e., the system complexity and its components interactions [34][35][36][37][38] .CorDim quantifies the time self-similarity of a signal, i.e., of the RRI time series.Thus, CorDim can be considered as an estimation of the lower threshold of the number of degrees of freedom for the underlying system generating the observed data and is typically used as an index of the overall complexity of the system dynamics estimated from a time series 21 .In healthy people, stand-up induces a loss of HRV because of the arterial baroreflex, an adaptative autonomic nervous system response associating sympathetic activation and parasympathetic withdrawal 39 .
But the complexity of a physiological system is increased when its components are interacting to adequate physiological response 34 .Taken together, the combined stand-up-induced changes of these three metrics that explore complementary nonlinear properties highlight that the cardiovascular homeostatic system is impaired in dcSSc but not in lcSSc.Stand-up stimulus should increase nonlinearity of HR dynamics, which is a wellness index of physiological systems.But in our study, stand-up decreased HR nonlinearity and complexity, which is an impairment and failure marker of dynamic systems including physiological complex systems 34 .Accordingly, our results converge to highlight that orthostatism stress lowers dramatically physiological flexibility (adaptability) in dcSSc patients compared to lcSSc.DcSSc are less adaptative to orthostatism than lcSSc.These results seem to be in accordance with previous data on baroreflex impairment and orthostatism intolerance in SSc patients.Normal vasoconstrictive response to an increase in venous transmural pressure is almost abolished in tissues with sclerodermic changes, probably due to sympathetic neuropathy 40 .Besides, advanced and fibrotic forms of SSc are associated with a blunted orthostatic stress response 9 .Finally, orthostatic stress intolerance is significantly associated with microvascular damage as assessed by videocapillaroscopy 41 .Moreover, in addition to the above-mentioned pathophysiological elements, we cannot exclude but have even to retain the potential role of the pulmonary abnormalities mainly observed in the dcSSc patients and characterizing restrictive lung disease in the loss of HRV nonlinear properties.It is well-known that lung expansion magnitude determines respiratory sinus arrythmia that is a major component of heart rate variability through vagal activity [42][43][44] .Now, we have precisely documented that dcSSc exhibited more frequently restrictive syndrome than lcSSc and that the presence of a restrictive syndrome was correlated with HRV differences at rest, during orthostatism, and changes induced by stand-up test.

SSc subtypes and SSc phenotyping
SSc phenotyping into dcSSc and lcSSc remains a daily challenge for practitioners as clinical form phenotype is a determinant of prognosis and care strategy of the disease 1 .In 2019, Sobanski et al. used 24 clinical and serological variables to cluster 6927 SSc patients in an elegant data analysis study 45 .The first 2 clusters overlapped the usual dcSSc and lcSSc subtypes but authors showed that these 2 clusters may not capture the complete heterogeneity of the disease, several other clusters being associated with differential survival rates.They concluded that organ damage should be taken into consideration.Among other criteria that could help SSc phenotyping, HRV has been previously studied [3][4][5][6]9 . Neertheless, no major difference had been previously robustly documented and no operational tool has been developed and tested for a real-life application.Through this study we provided SSc subtypes clinical forms classifying models based on HRV parameters usable at the patient's bedside.Several models are provided, more or less complex, but reasonably easy to compute for a daily use, with different performances allowing the clinician to choose the most appropriate for his medical purpose.The easiest models are that provided by our secondary outcomes.HRV can model SSc skin extent trough linear regressions and model respiratory function, mainly the presence of a restrictive syndrome, through logistic regressions.The first being part of the very definition of SSc clinical forms, the second being a highly known pathological status associated with dcSSc.Nevertheless, the efficient models (R 2 > 0.5) are quickly complex in terms of number of included variables and interpretability.As shown in Table 2, most of the lung function test variables were significatively different in lcSSc and dcSSc.But most of these variables are correlated and mostly were highly predictable by linear regression models (results not shown).Because of this data high collinearity, probably because of a causal link that could be a limited stretch reflex from the restrictive lungs (neither demonstrated nor documented in this work), we had to provide a specific methodology to extract the most informative variables that we performed through features engineering step.To our knowledge, we propose for the first time a whole functional test of cardiovascular function, usable at the patient's bedside that includes: (1) a 15-min (10 min decubitus, stan-up, 5 min orthostatism) 1-lead numerical ECG recording, (2) the estimation of 1 to 5 HRV metrics easy to compute, and (3) a simple classifying model using machine learning approaches.Locally, we have chosen the GBM 4-variables classifying model including SD2_delta, CorDim_delta, SD2_S2, and mean_HR_S2 HRV indexes as it has the best specificity and F1-score compromise.Nevertheless, the 1-variable model including the only CorDim has the best sensibility-specificity combination while its F1-score is the lowest (0.82) of the tested models that is, in data science an honorable performance.Finally, the choice of the model has to be driven by the underlying medical question and expectation.

Limits
Physiological interpretation A first potential limitation of this study is that we did not directly assess sympathetic and parasympathetic tones by neurography.Physiological and physio-pathological interpretation of the models is then not obvious.We used indirect indexes (spectral powers) from spectral analysis of RRI time series that are known to capture only linear properties of HR dynamics.Nevertheless, in stationary conditions, LF_power and HF_power, expressed in ms 2 as well as in normalized units, are largely thought to reflect sympathetic and parasympathetic modulatory effects on sinus node activity 13 .As previously discussed, compiling knowledge on effects of sympathetic and vagal activity on HRV suggests that the HRV power spectrum, including its low frequency component, is mainly determined by the parasympathetic system 30 .Thus, LF-power and HF_power are said to be of limited interest in some physiological conditions, especially during interacting processes that, by definition, lead to nonlinear behavior 46,47 .
In relation to the complexity of the sinus node activity modulation system, predominantly nonlinear behavior must be assumed, explored, and interpreted.

Length of time series
A second potential limitation of this study is the number of points constituting the analyzed time series.On one hand, the length of RRI time series used to perform linear time-and frequency-domains HRV analysis is usually set to 5 min 13 .The value of the scaling exponent is mostly defined by the dynamics of the short-time variability.
On the other hand, and oppositely, exploring nonlinear dynamics as the dimensionality of the space spanned by the data, particularly when using CorDim, requires long time series to be reliably computed 48 .Time series are supposed to include 10 CorDim data points, i.e., around 10,000 data points should be used.Using only and approximately 400 data points per time series, we can't consider that CorDim we computed represent robustly and reliably the whole concept and the underlying properties defined by the correlation dimension.But decidedly, the computation we performed (that of CorDim, i.e., the correlation sum) on 400 data points time series led to a metrics that was statistically characterized as sensitive, reliable, stable and discriminant marker.Accordingly, we then highlight that the interpretation of the time behavior of the RRI time series and HR dynamics should be made with caution: The CorDim changes we measured are not necessarily the results of the changes in the scaling behavior of the heart rate dynamics.Specific studies should test this hypothesis.

Design of the study
A third limitation of the study could be the necessary precaution to take before generalizing these results.Most of the previous studies were designed as sex and age matched groups analysis.On the contrary, we have taken 60 consecutive patients and did not perform specific matching.As shown in Table 1, no age or sex difference appeared but height was significantly different in dcSSc and lcSSc.Three points needs to be addressed to consolidate conclusions: even if our sample size was sufficient for data analysis and to provide, in fine, a validated set of models, sample size has not been calculated on all variables of interest of the study that has not been designed accordingly; no matching has been made on any variable; and no early-stage disease group has been identified since all patient had already their diagnosis.Ormea et al. 25 showed that autonomic degeneration was independent of tissue and vascular alterations, as severe nerve damage was already present in apparently healthy skin.He suggested that ANS modification takes place before vascular damage and tissue fibrosis occur.Moreover, Ormea stated that ANS dysfunction is a main factor in the development of the disease.Accordingly, further studies on a larger dataset are needed to confirm and generalize our first results, particularly with the aim to use the classifying models as a predicting tool when used at the early-stage of the disease. https://doi.org/10.1038/s41598-024-60553-1

Figure 1 .
Figure1.Correlations between heart rate variability metrics.The HRV metrics that were significantly different (p < 0.05) in the dcSSc and lcSSc subgroups were tested for correlation and are represented in the correlation matrix.Correlations between HRV metrics from time-, frequency-, and nonlinear-domains measured in supine (_S1) as orthostatism (_S2) conditions and HRV metrics stand-up induced changes (_delta) are reported.R 2 is color-coded from -1 (dark red) to + 1 (dark blue) when correlation was significant (p < 0.05).

Figure 2 .
Figure 2. Correlations between clinical features, heart rate variability and lung function metrics.Correlations between HRV metrics from time-, frequency-, and nonlinear-domains measured in supine (_S1) as orthostatism (_S2) conditions and HRV metrics stand-up induced changes (_delta), and clinical features as well lung function metrics are reported.R 2 is color-coded from -1 (dark red) to + 1 (dark blue) when correlation was significant (p < 0.05).

Figure 3 .
Figure 3. Features importance.The features importance of heart rate variability metrics is represented according to two different features importance ranking techniques: relative importance (top histograms), and gamma-metric (bottom histograms).Features importance of HRV metrics from time-, frequency-, and nonlinear-domains measured in supine (_S1) as orthostatism (_S2) conditions and HRV metrics stand-up induced changes (_delta) was assessed.The four most important features (red arrows) were: SD2_delta, stand-up induced change of the major axis length of the fitted ellipse on RR intervals Poincaré plot; sd_HR_ delta, stand-up induced change of the heart rate standard deviation; CorDim_delta, stand-up induced change of the correlation dimension; SD2_S2, major axis length of the fitted ellipse on RR intervals Poincaré plot in orthostatism condition.

Table 1 .
General characteristics of the studied population.Values are expressed as mean ± standard deviation or modality size (percentage of the whole population) for quantitative and categorical variables, respectively.

Table 3 .
Heart rate variability of the studied population.Values are expressed as mean ± standard deviation.Diffuse (dcSSc) and located (lcSSc) cutaneous systemic sclerosis are described for decubitus and orthostatism and adjusted (Bonferroni's correction) p-values of the clinical form and position main effects as their interaction are reported.Stand-up-induced changes of each group are also reported.P < 0.05 was considered as significant (bold).Unadjusted p-values<0.1 from univariate analysis ( • mark) was used as threshold for further analysis.Time domain variables are: mean_RR, mean RR intervals duration; sd_RR, standard deviation of RR intervals duration; mean_HR, mean heart rate; sd_HR, standard deviation of heart rate; RMSSD, root mean square of successive differences of RR intervals duration; NN50, count of successive RR intervals duration differing from more than 50 ms; pNN50, percentage of RR intervals duration differing from more than 50 ms.

Table 4 .
Performances of classifying models.SSc subtypes classifying models including 1 to 5 most informative and discriminant HRV variables are presented.Variables used are: SD2_delta, stand-up induced change of the major axis length of the fitted ellipse on RR intervals Poincaré plot; sd_HR_delta, stand-up induced change of the heart rate standard deviation; CorDim_delta, stand-up induced change of the correlation dimension; SD2_S2, major axis length of the fitted ellipse on RR intervals Poincaré plot in orthostatism condition; mean_HR_S2, mean heart rate in orthostatism condition.Classifier algorithms used were neuronal networks; stacked ensembles; and GBM, gradient boosting machines.Models' performances were assessed through MSE, mean squared error; LogLoss, logarithmic loss; Sensibility; Specificity; Precision; Accuracy; AUC, area under curve; and F1-score.