Introduction

Heart disease has one of the highest mortality rates of any condition, comparable to that of cancer1,2,3,4,5. Early detection of cardiac disorders might help prevent fatalities. One of the most basic means of detecting irregularities is pulse assessment, which was first used in ancient China and Egypt centuries B.C.6,7,8. In the modern era, the electrocardiogram (ECG) provides a detailed assessment of cardiac electrical activity, reflecting depolarization and repolarization of the atria and ventricles9. Based on this information, a variety of diagnoses can be made, such as premature ventricular contraction (PVC), premature atrial contraction (PAC), and atrial fibrillation (AF).

Though effective, ECG examinations are costly and inconvenient. While electrical signals are efficiently analyzed by automated algorithms, comprehensive diagnoses must ultimately be made by medical practitioners10,11. Accordingly, ECG examinations are typically carried out only once a year for each individual during physical checkups, and a Holter ECG examination, which records continuously for 1 day, is recommended only for the smaller group in whom some cardiac anomaly is found during the checkup. More seriously, those who tested negative in these ECG examinations may still develop cardiac disorders.

Thus it is desirable to be able to alert anyone to cardiac dysfunction using a handy device such as a smartwatch. While full ECG information is not available, a handy device can constantly monitor a series of pulse intervals, which may provide sufficient information for alerting to cardiac disorders12. This led us to the idea of identifying cardiac disorders by analyzing heart rate variability (HRV) through heart periods in as much detail as possible, just as expert medical practitioners have been doing for centuries.

HRV is quantified by measuring fluctuations in heartbeat intervals, using metrics such as the standard deviation of intervals between successive cardiac R waves or RR intervals (SDRR) and the coefficient of variation (Cv), which is defined as the ratio of the standard deviation to the mean13,14. However, the variation in heartbeat intervals is caused not only by cardiac disorders but also by slow fluctuations in the heart rate that occur even in healthy people15,16. There have been many attempts to isolate cardiac disorders from such miscellaneous factors, using a variety of analysis methods such as linear, frequency domain, wavelet domain, and nonlinear methods17,18,19,20,21. Nevertheless, diagnosing cardiac symptoms still requires medical experts to consider various possible conditions when making comprehensive judgments.

Recently, artificial intelligence-aided methods based on deep learning algorithms have been adopted for detecting cardiac disorders from heartbeat signals5,22,23,24,25. While these methods achieve high performance, individual decisions depend on a huge number of parameters that were empirically determined with given training datasets. A problem of such artificial intelligence-aided methods is that we cannot explain the reason for individual diagnoses made for each case. Another problem is that such machine-learning diagnoses may depend sensitively on a huge number of model parameters, and accordingly, identical results may not be reproduced even from the same set of training data.

To address these issues, here we design an automatic diagnosing system by simply combining two metrics that measure specific aspects of heartbeat variability. On a plane of the two metrics, we specify decision boundaries for different types of cardiac disorders. From this, we can characterize the differences among various cardiac diseases explicitly in terms of two metrics.

As the first metric, we have adopted the metric of local variation (Lv), which was previously used to analyze neuronal firing in the brain26,27,28. This is because Lv captures the difference in adjacent pulse intervals by removing the influence of slow fluctuations in heart rate. We have confirmed that Lv was superior to the conventional Cv in detecting premature contraction. The working of Lv is similar to that of an HRV index SD1\(^2\), in that they measure the variation of consecutive RR intervals. The superiority of Lv to Cv is similar to the superiority of SD1\(^2\) to other linear HRV indices such as SDRR or SDSD29,30.

Nevertheless, we also realized that Cv may still have captured some aspect independent of Lv particularly for atrial fibrillation. By identifying the differences between Lv and Cv, we sought to combine them in a manner that would maximize the ability to characterize cardiac status. Through this process, we invented a new metric, the local-global variation ratio (Lg), that effectively discriminates between premature contractions and fibrillation. Using these two metrics Lv and Lg, we constructed an algorithm to automatically diagnose cardiac symptoms. The new method exhibits outstanding performance in alerting AF, which is associated with serious diseases31,32,33,34.

For this analysis, we used Holter ECG recordings obtained from more than 1,000 outpatients in total who had cardiological examinations at Clinic, Kyoto Industrial Health Association. The recordings contain a series of RR intervals monitored for 1 day, and are accompanied by diagnoses made by medical doctors and technicians, including not only AF but also PVC and PAC, both of which may take place even in healthy people35,36. By assuming that the heart periods measurable by a handy device are noisy observations of RR intervals37,38, we attempted to infer diagnoses of PVC, PAC, and AF from a set of RR intervals and examined how well the inference withstands fluctuations added to the RR intervals to mimic noisy observations. To examine the generality of the analysis, we also applied the current method to the publicly available MIT-BIH databases, which have been conventionally used as standard datasets.

Results

For Holter ECG data recorded from 1,017 subjects in total (863 independent persons), we computed the variability metrics from each series of RR intervals and compared their ability to alert cardiac conditions, such as PVC, PAC, and AF.

Comparison of variability metrics

Figure 1 depicts the distributions of PVC, PAC, and AF plotted against the RR interval metrics: the heart rate Hr, the logarithm of coefficient of variation \(\log _{10} Cv\), and the logarithm of the local variation \(\log _{10} Lv\) (METHODS), each of which is calculated every 10 minutes and averaged over 18 hours. Because Cv and Lv computed for heartbeats are much smaller than unity, we take their logarithm to focus on their difference.

Many subjects have nonzero PVC and PAC, in which a premature heartbeat is initiated in the ventricles and atria, respectively. These events may occur even in healthy people. In our dataset, individual PVC and PAC events were identified by an automatic ECG detection algorithm, and subjects with an occurrence probability higher than \(10^{-3}\) comprised 49% and 33% of all subjects, respectively. In contrast, AF, which is accompanied by irregular beating of the atrial chambers, is considered a serious symptom leading to cardiac dysfunction. In our dataset, the symptom was determined manually by expert medical doctors and technicians, and 72 of 1,017 subjects (7%) exhibited nonzero AF.

For PVC, PAC, and AF, we categorized the diagnostic status of each subject as 0 or 1 according to whether the ratio of each condition was lower or higher than a threshold of 0.1, and predicted the dichotomic status based on Cv or Lv as obtained from pulsation signals. The numbers of the 1,017 subjects at high risk of PVC, PAC, and AF were 128 (12%), 34 (3%), and 53 (5%), respectively. Only one subject (0.1%) exhibited multiple symptoms (PAC and AF).

Figure 1

Cardiac disorders plotted against Hr, \(\log _{10} Cv\), and \(\log _{10} Lv\). Each dot represents the average of 10-minute statistics for a single subject over 18 hours. Vertical axes are the logarithm of (a) premature ventricular contraction (PVC); (b) premature atrial contraction (PAC); and (c) atrial fibrillation (AF). Values of the Pearson correlation r are indicated above. Yellow zones represent regions of \(>0.1\).

Figure 2

The true positive rate (TPR), true negative rate (TNR), and the Matthews correlation coefficient (MCC) measuring the ability of two individual metrics, thresholds of \(\log _{10} Cv\) and \(\log _{10} Lv\), to detect cardiac disorders: (a) PVC, (b) PAC, and (c) AF. For each cardiac disorder, each subject was categorized into positive or negative, according to whether the ratio of anomalous states identified by an automatic detection algorithm of ECG was higher or lower than 10%. TPR and TNR indicate the probability of each person being positive and negative according to whether the metric Cv or Lv is higher and lower than a given value, respectively. MCC measures a balance between true and false positives and negatives.

A conventional HRV metric Cv and our newly introduced metric Lv were strongly correlated with cardiac symptoms; subjects who exhibited the higher Cv or Lv were more likely to exhibit cardiac disorders. The average heart rate Hr was relatively weakly correlated with cardiac symptoms, and accordingly, we shall concentrate on Cv and Lv in the following analysis.

We first examined whether the high value of Cv or Lv could signal each cardiac disorder. We categorized data as “positive” or “negative” according to whether or not \(\log _{10} Cv\) or \(\log _{10} Lv\) was greater than a given threshold. Positive data were categorized as “true positive (TP)” or “false positive (FP)” according to whether or not subjects exhibited a given cardiac disorder (PVC, PAC, or AF). In contrast, negative data were categorized as “false negative (FN)” or “true negative (TN)” according to whether or not subjects exhibited a given cardiac disorder.

The upper rows in Fig. 2 depict the true positive ratio \(\mathrm{TPR}=N_{\mathrm{TP}}/(N_{\mathrm{TP}}+N_{\mathrm{FN}})\) and the true negative ratio \(\mathrm{TNR}=N_{\mathrm{TN}}/(N_{\mathrm{TN}}+N_{\mathrm{FP}})\), where \(N_{\mathrm{TP}}\), \(N_{\mathrm{FN}}\), \(N_{\mathrm{TN}}\), and \(N_{\mathrm{FP}}\) are the numbers of true-positive, false-negative, true-negative, and false-positive cases, respectively. A higher threshold of Cv or Lv results in a higher TNR, implying that subjects without cardiac disorders mostly fall below the threshold. However, a high threshold lowers TPR, implying that many cardiac disorders are missed.

One means of achieving a reasonable balance between true and false positives and negatives is to maximize the Matthews correlation coefficient (MCC)39 defined as

$$\begin{aligned} \mathrm{MCC} = \frac{ N_{\mathrm{TP}} N_{\mathrm{TN}} - N_{\mathrm{FP}} N_{\mathrm{FN}} }{\sqrt{ (N_{\mathrm{TP}}+ N_{\mathrm{FP}}) (N_{\mathrm{TP}}+ N_{\mathrm{FN}}) (N_{\mathrm{TN}}+ N_{\mathrm{FP}}) (N_{\mathrm{TN}}+ N_{\mathrm{FN}}) } }. \end{aligned}$$

The lower rows in Fig. 2 depict the MCC values for PVC, PAC, and AF; peaks occur at an intermediate value for a threshold of \(\log _{10} Cv\) or \(\log _{10} Lv\). For PVC and PAC, the new metric Lv is more efficient than Cv in achieving higher MCC values. By contrast, Cv results in a higher MCC than Lv for AF.
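
As an illustration of the threshold analysis described above, the following minimal sketch counts the four outcome types for a given threshold, evaluates TPR, TNR, and MCC, and picks the threshold with the largest MCC from a candidate grid. The function names, the toy values of \(\log _{10} Lv\), the labels, and the grid are made up for illustration; reporting 0 when a ratio is undefined is also an assumption of this sketch.

```python
import math

def confusion_counts(scores, labels, threshold):
    """Count (TP, FN, TN, FP) when 'positive' means score > threshold."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score > threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fn, tn, fp

def rates_and_mcc(tp, fn, tn, fp):
    """Return (TPR, TNR, MCC); undefined ratios are reported as 0.0 (assumption)."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return tpr, tnr, mcc

def best_threshold(scores, labels, candidates):
    """Return the candidate threshold that maximizes MCC."""
    return max(
        candidates,
        key=lambda t: rates_and_mcc(*confusion_counts(scores, labels, t))[2],
    )

# Toy data (hypothetical): log10 Lv per subject and a 0/1 disorder status.
log_lv = [-3.1, -2.9, -2.7, -2.4, -1.6, -1.2, -1.0, -0.8]
status = [0, 0, 0, 0, 1, 0, 1, 1]
grid = [-3.0, -2.5, -2.0, -1.5, -1.1, -0.9]

t_best = best_threshold(log_lv, status, grid)
print(t_best, rates_and_mcc(*confusion_counts(log_lv, status, t_best)))
```

On this toy data the MCC is low at both ends of the grid and peaks at an intermediate threshold, which is the qualitative behavior described for Fig. 2.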

Improving the detection of cardiac disorders

In the aforementioned analysis, we have seen that Cv and Lv had different strengths for different disorders. Because these metrics capture different aspects of the RR-variability, their performance in alerting cardiac disorders might be increased if these metrics were suitably combined.

Figure 3

Different cardiac disorders plotted on a plane spanned by the new and old metrics \(\log _{10} Lv\) and \(\log _{10} Cv\). (a) Dots represent 1,017 subjects, who were diagnosed with atrial fibrillation (AF), premature ventricular contraction (PVC), or premature atrial contraction (PAC), or as negative, colored in magenta, blue, green, and black, respectively. The prediction zone for each cardiac disorder, given by the set of constraints \(\log _{10} Lv > a\) and \(\log _{10} Lg = \log _{10} Lv - 2 \log _{10} Cv > b \,\, (\text {or} < b)\), is depicted in the corresponding color. (b) Representative 1-minute pulse sequences taken from the four different cases. NS stands for no symptom.

Figure 3a displays the distribution of data points representing subjects with different cardiac diseases plotted on a plane spanned by \(\log _{10} Lv\) and \(\log _{10} Cv\). The distribution seemingly comprises different groups; there is one big cluster at the lower left, centered at low \(\log _{10} Lv \approx -3\) and low \(\log _{10} Cv \approx -1.2\), while the datasets with higher \(\log _{10} Lv > -2\) tended to align along lines of constant \(\log _{10} Lv - 2 \log _{10} Cv\).

Measurements associated with cardiac disorders are located in the upper right region defined by high values of \(\log _{10} Lv\) and \(\log _{10} Cv\). On further analysis, the data points representing different cardiac disorders are distributed differently in this region; fibrillation (AF) cases are located on the upper side, whereas premature contraction (PVC and PAC) cases are located on the lower side. Representative pulse sequences of 1 minute taken from these diagnostic cases are depicted in Fig. 3b.

To determine how the local variation Lv and the coefficient of (global) variation Cv contribute to the separation of premature contractions and fibrillation, we newly introduce a metric of the local-global variation ratio (Lg), defined as

$$\begin{aligned} Lg = \frac{Lv}{Cv^2}. \end{aligned}$$

In the denominator, Cv, which measures the standard deviation of intervals, is squared to match the power of the deviation in the numerator, because Lv measures the squared deviation (METHODS). For a Poisson random pulse train, Lg takes the value unity because both Lv and Cv take the value unity. For a locally regular pulsation whose rate is slowly modulated, \(Lg \ll 1\), because Lv is much smaller than \(Cv^2\). For a perfectly regular pulsation in which RR intervals are identical, Lg is undefined because both the numerator Lv and the denominator \(Cv^2\) are zero. We shall prove in the METHODS that Lg takes the value 3/2 for a sequence in which RR intervals of similar durations are arranged randomly. For a pulsation in which long and short intervals alternate, however, Lg takes the even higher value 3. In this way, the local-global variation ratio Lg can discriminate how intervals are arranged even if the intervals are close to each other (as is the case with heartbeats).
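
The limiting values of Lg listed above can be checked numerically. The sketch below is an illustration under stated assumptions rather than the analysis code of this study: the function names, the random seed, the sequence length, the base interval of 0.8 s, and the use of the population standard deviation in Cv are all choices made here for the example.

```python
import math
import random

def cv(intervals):
    """Coefficient of variation: population standard deviation over mean."""
    n = len(intervals)
    mean = sum(intervals) / n
    var = sum((x - mean) ** 2 for x in intervals) / n
    return math.sqrt(var) / mean

def lv(intervals):
    """Local variation: 3/(n-1) times the sum over adjacent interval pairs."""
    pairs = zip(intervals, intervals[1:])
    total = sum(((a - b) / (a + b)) ** 2 for a, b in pairs)
    return 3.0 * total / (len(intervals) - 1)

def lg(intervals):
    """Local-global variation ratio Lg = Lv / Cv^2."""
    return lv(intervals) / cv(intervals) ** 2

rng = random.Random(0)  # fixed seed so the example is repeatable
n = 20000

# Poisson pulse train: exponentially distributed intervals.
poisson = [rng.expovariate(1.0) for _ in range(n)]
# Similar intervals in random order: small Gaussian scatter around 0.8 s.
jittered = [0.8 + rng.gauss(0.0, 0.02) for _ in range(n)]
# Long and short intervals alternating: 0.82 s, 0.78 s, 0.82 s, ...
alternating = [0.8 + (0.02 if i % 2 == 0 else -0.02) for i in range(n)]
# Locally regular pulsation with a slowly modulated rate.
ramp = [0.8 + 0.1 * math.sin(2 * math.pi * i / n) for i in range(n)]

for name, seq in [("poisson", poisson), ("jittered", jittered),
                  ("alternating", alternating), ("ramp", ramp)]:
    print(name, lg(seq))
```

The printed values should come out close to 1, close to 3/2, equal to 3 up to rounding, and far below 1, respectively. The first two carry sampling error because they are estimated from random sequences.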

It is noteworthy that in Fig. 3, premature contractions (PVC and PAC) and fibrillation (AF) are well separated by the line \(\log _{10} Lg = \log _{10} Lv - 2 \log _{10} Cv \approx 0.15\). It is helpful to recall that \(Lg = 3/2\) for randomly arranged RR intervals and that \(\log _{10} 3/2 \approx 0.18\). As we have seen, Lg can be as large as 3 if long and short intervals alternate. This means that long and short pulse intervals tend to alternate in premature contractions, as they are distributed on the lower side of the line \(\log _{10} Lg = \log _{10} Lv - 2 \log _{10} Cv \approx 0.18\).

Considering these features, it is advantageous to infer the presence of each cardiac disorder based on a combination of variability metrics given by the following set of constraints,

$$\begin{aligned} \log _{10} Lv > a \end{aligned}$$

and

$$\begin{aligned} \log _{10} Lg > b \,\, (\text {or} < b). \end{aligned}$$

We selected the parameters a and b and the inequality direction so that the goodness index MCC is maximized for each cardiac disorder. The selected parameters are as follows: (PVC): \(\log _{10} Lv >-1.3\), \(\log _{10} Lg > 0.14\); (PAC): \(\log _{10} Lv >-1.5\), \(\log _{10} Lg > 0.15\); (AF): \(\log _{10} Lv >-1.3\), \(\log _{10} Lg < 0.15\). The selected zones are depicted in different colors in Fig. 3.
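
The selected constraints can be written as a small rule table. The following sketch uses the parameter values quoted above for the 18-hour data; the function name `alerts`, the table layout, and the example inputs are hypothetical. Each disorder is tested independently, so a subject can fall into more than one zone (the PVC and PAC zones overlap).

```python
import math

# (a, b, direction of the Lg inequality) per disorder, as quoted in the text.
RULES = {
    "PVC": (-1.3, 0.14, ">"),
    "PAC": (-1.5, 0.15, ">"),
    "AF": (-1.3, 0.15, "<"),
}

def alerts(lv_value, cv_value):
    """Return the disorders whose prediction zone contains (Lv, Cv)."""
    log_lv = math.log10(lv_value)
    # log10 Lg = log10 Lv - 2 log10 Cv, since Lg = Lv / Cv^2.
    log_lg = log_lv - 2.0 * math.log10(cv_value)
    flagged = []
    for name, (a, b, direction) in RULES.items():
        lg_ok = log_lg > b if direction == ">" else log_lg < b
        if log_lv > a and lg_ok:
            flagged.append(name)
    return flagged

# Hypothetical inputs chosen to land in different zones.
print(alerts(0.1, 0.2))     # high Lv, high Lg
print(alerts(0.1, 0.3))     # high Lv, low Lg
print(alerts(0.001, 0.06))  # low Lv
```

Both Lv and Cv must be strictly positive for the logarithms to be defined, so a perfectly regular sequence has to be handled separately.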

The method of combining Lv and Lg achieves MCC performances much higher than those obtained from simple thresholding of either Cv or Lv alone (Table 1). In particular, the ability to alert fibrillation (AF) was drastically improved by the combination method. Note that the obtained MCC performances would be practically unchanged even if we performed leave-one-out cross-validation because the number of parameters (one or two) is much lower than the number of data points (higher than 1,000).

Table 1 MCC values measuring the performances of each alerting method.

Alerting cardiac disorders based on a shorter recording period

So far we have compared alerting methods using 18-hour data of RR intervals extracted from 1-day Holter ECG. As the Holter recording is costly and burdensome, it would be ideal to be able to detect cardiac disorders even with a shorter recording. Here we are interested in how the detection performance degrades if the recording duration is shortened to 1 hour, 10 minutes, or even 1 minute. We clipped out such shorter recordings from full-length data and estimated the MCC values for detection performance using the single metrics Cv and Lv as well as the combination method. Figure 4 shows that our new combination method may provide good performance even based on shorter recordings, with an accuracy comparable to or even higher than that obtained by applying a conventional single metric such as Cv to 18-hour data of RR intervals.

Figure 5 depicts distributions of HRV metrics computed from recording periods of 1 minute, 10 minutes, and 1 hour. While datasets of shorter durations are more scattered on a plane spanned by \(\log _{10} Lv\) and \(\log _{10} Cv\), the distributions of premature contraction (PVC and PAC), fibrillation (AF), and negative cases remain well separated.

Figure 4

The MCCs for Cv and Lv, and the combination method based on datasets with various recording durations: 1 minute, 10 minutes, 1 hour, and 18 hours. (a) PVC, (b) PAC, and (c) AF. The MCCs for the shorter recording durations are represented with the mean and standard deviation.

Figure 5

Two metrics \(\log _{10} Lv\) and \(\log _{10} Cv\) computed from shorter recording periods. (a), (b), and (c) Datasets of 1 minute, 10 minutes, and 1 hour, plotted on a plane spanned by \(\log _{10} Lv\) and \(\log _{10} Cv\). The parameters a and b of each prediction zone were adapted to respective datasets.

Figure 6

The MCCs for the combination method applied to recording duration of 1 minute and 18 hours. (a) PVC, (b) PAC, and (c) AF. Original pulse times were jittered with Gaussian noises of the standard deviation of 10, 50, 100, 200, and 500 ms.

Figure 7

Noisy datasets of different cardiac disorders plotted on a plane spanned by \(\log _{10} Lv\) and \(\log _{10} Cv\). (a), (b), and (c): Datasets were obtained by jittering original pulse times with Gaussian noises of the standard deviation of 10 ms, 50 ms, and 100 ms. The parameters a and b of each prediction zone were adapted to respective noisy datasets.

Robustness against erroneous observation

If we monitor pulsations using a handy device such as a smartwatch, the measurement will be inaccurate since the pulse intervals are not identical to RR intervals. Considering possible fluctuations in the measurements, we created noisy data by jittering the original RR intervals with zero-mean noise with standard deviations of 10, 50, 100, 200, and 500 ms. Figure 6 demonstrates how the MCC values degrade with increasing noise level. We can see that the combination method withstands the fluctuations if their noise level is less than 100 ms.
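
One way to generate such noisy data is sketched below. It assumes that independent zero-mean Gaussian noise is added to each pulse time and that the intervals are then recomputed from the sorted noisy times; the sorting step, the function name `jitter_rr`, the seed, and the constant 0.8 s test sequence are assumptions of this example, not details given in the text.

```python
import random
import statistics
from itertools import accumulate

def jitter_rr(rr_s, sd_s, rng):
    """Jitter pulse times with Gaussian noise and return the new intervals.

    rr_s: RR intervals in seconds; sd_s: noise standard deviation in seconds.
    """
    # Pulse times are the cumulative sums of the intervals, starting at 0.
    times = [0.0] + list(accumulate(rr_s))
    # Sorting keeps intervals non-negative when large noise reorders pulses.
    noisy = sorted(t + rng.gauss(0.0, sd_s) for t in times)
    return [b - a for a, b in zip(noisy, noisy[1:])]

rng = random.Random(1)  # fixed seed so the example is repeatable
rr = [0.8] * 1000       # perfectly regular pulsation at 75 beats per minute
noisy_rr = jitter_rr(rr, 0.010, rng)  # 10 ms jitter

mean_noisy = statistics.fmean(noisy_rr)
sd_noisy = statistics.pstdev(noisy_rr)
print(mean_noisy, sd_noisy)
```

Because each interval is the difference of two independently jittered pulse times, its standard deviation is about \(\sqrt{2}\) times the jitter, so even a regular sequence acquires nonzero Cv and Lv. This is consistent with the shift of noisy datasets toward higher \(\log _{10} Cv\) and \(\log _{10} Lv\).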

Figure 7 depicts HRV metrics of noisy datasets, in which the original series of RR intervals of 18 hours were jittered with noises of the standard deviations of 10, 50, and 100 ms. While datasets of noisy RR intervals were shifted toward higher values of \(\log _{10} Cv\) and \(\log _{10} Lv\), the distributions of premature contraction (PVC and PAC), fibrillation (AF), and negative cases remain well separated for these cases.

Application to public databases

To examine whether our method is also valid for other datasets, we applied it to the data of the MIT-BIH databases, which have been deemed standard. Here we considered the “Atrial Fibrillation Database” as representing symptoms similar to the AF cases selected by Kyoto Industrial Health Association, while the “Arrhythmia Database” and the “Normal Sinus Rhythm Database” represent those without AF. Figure 8 represents the distributions of \(\log _{10} Lv\) versus \(\log _{10} Cv\) of 1-minute, 10-minute, 1-hour, and 18-hour data of the MIT-BIH datasets. For each dataset, the two parameters a and b of our prediction model were selected so that the MCC is maximized. The MCC values obtained for the AF cases of the MIT-BIH datasets were much higher than those obtained for our Kyoto datasets; for instance, the MCC value for 18-hour data was as high as 0.761. The difference may have arisen because the diagnostic criteria and the population ratio of cardiac cases were different between these datasets. Nevertheless, it is noteworthy that the two parameters a and b were similar to those of the Kyoto database; for instance, \(a = -0.115\) and \(b = 0.27\) for 18-hour data. This implies that a similar criterion may be applicable even for different datasets.

Figure 8

Two metrics \(\log _{10} Lv\) and \(\log _{10} Cv\) computed for three kinds of MIT-BIH databases. (a), (b), (c), and (d): Datasets of 1 minute, 10 minutes, 1 hour, and 18 hours plotted on a plane spanned by the two metrics. The parameters a and b of the prediction zone (magenta) for the AF cases were adapted to respective datasets.

Discussion

In this study, we constructed an automated algorithm for alerting cardiac symptoms from pulsation signals. We first compared Lv with Cv and found that Lv was superior for detecting premature contractions, while Cv was superior for identifying fibrillation. Considering the specificity of each metric, we introduced the local-global variation ratio \(Lg = Lv/Cv^2\) and found that this metric effectively discriminates between premature contractions and fibrillation. With a method for combining Lv and Lg, we have obtained a performance far superior to that obtained by simple thresholding with single metrics.

Considering the cost and time needed to perform 1-day Holter ECG, it is desirable to be able to make a reasonable inference from a shorter recording, such as the few-minute ECG carried out during a regular medical checkup. We confirmed that applying our new combination method to 1-minute recordings provided detection performance that was comparable to or even better than that of the conventional HRV metric Cv applied to a 1-day recording.

Nevertheless, this favorable performance might not be exclusive to our method. Experienced medical practitioners should be able to achieve similar performance through pulse diagnosis, or by making full use of the many existing HRV metrics. Some of these metrics can detect features similar to Cv and Lv. For instance, if we map consecutive RR intervals, as was done in40,41,42, the variability along a diagonal line in the map corresponds to the global variability as represented by Cv, while the variability along the orthogonal axis is similar to the local variability of consecutive intervals as represented by Lv, in which slow rate fluctuation is effectively mitigated.

We have seen that longer recording of pulses provides more reliable inference. This is partly because the estimation of variability metrics becomes more accurate with time. But the main reason may be that cardiac symptoms occur intermittently during 1-day recording and it is difficult to identify disorders if the analysis is performed while symptoms are absent. But if we can use a more convenient device such as a smartwatch that can constantly monitor heartbeats, cardiac disorders can be easily detected using our analysis algorithm.

By assuming that pulses reflect the RR intervals of cardiac beats, being accompanied by noises occurring downstream, we tested whether our analysis method was robust against noisy data. By jittering the original RR intervals with fluctuating noises, we have confirmed that the detection of cardiac symptoms did not deteriorate if the fluctuation in the measurements was shorter than 100 ms. Thus our method might work for alerting to cardiac disorders from pulse signals obtained with a smartwatch. However, it should be noted that HRV analysis is limited in that it cannot classify arrhythmic events such as isolated premature ventricular contraction43,44. Once any malfunction is detected by HRV analysis, one should move on to ECG analysis for diagnosing the more detailed morphological features.

Here we selected the parameters for determining cardiac symptoms so that the performance was maximized in terms of the MCC. However, the MCC was based on a balance between true and false positives and negatives, in the sense that positives and negatives are considered to have equal weights. We may lower the threshold of Lv if we wish to be more cautious in predicting cardiac disorders, even though this results in an increase of false positives.

To test the generality of the analysis method, we applied it to the data of the MIT-BIH databases. We have confirmed that our method may detect AF cases with much higher MCC values, indicating the higher classification performance. It is noteworthy that we obtained a similar criterion for this dataset, although the diagnostic criteria may generally be different between different medical institutions. The advantage of our method may be its simplicity in representing and diagnosing cardiac status; our method has only two parameters a and b, and we may check whether they have been dependent on training datasets, which may be largely dependent on the diagnostic criterion of different institutions.

In this study, we achieved high detection performance by combining Cv and Lv, particularly by paying attention to the local-global variation ratio Lg. The essential point of improvement was to combine multiple metrics that detect different features of heartbeats. If we can obtain more sample data, it might be worthwhile to look for more detailed combinations of additional variability metrics.

Methods

ECG data

The data of Holter ECG were obtained from outpatients who had cardiological examinations at Clinic, Kyoto Industrial Health Association, Kyoto 604-8472, Japan. Japanese institutions or corporations are requested to let workers undergo a medical checkup once a year. If potentially concerning results are observed during the ECG recording performed during the physical checkup, individuals are recommended to undergo 1-day Holter ECG monitoring. Accordingly, subjects who received Holter ECG may have been more likely to have cardiac disorders than the overall population. Nevertheless, the majority of these individuals were still “healthy” in that they were not diagnosed with heart failure, and some were asymptomatic.

In the analysis of 1-day Holter recording, each pulse was automatically diagnosed as PVC and/or PAC by software provided by Fukuda Denshi Co. Ltd. Based on the summary data, we calculated the fraction of pulses exhibiting PVC and/or PAC contained in an entire set of pulses. Medical doctors analyzed the full records of individual subjects, and if abnormalities were detected, they manually determined the periods of AF. We calculated the ratio of the AF period to the total measurement time, 18 hours. The ECG data obtained from a CM5 lead were analyzed. From the recorded RR intervals, we computed Cv and Lv. We discarded 14 datasets whose recording period was shorter than 18 hours. Accordingly, we had datasets of 1,017 subjects in total (863 independent persons, men: 625, women: 238), and analyzed the initial 18 hours. The ages of the subjects ranged from 20 to 90.

All methods were carried out in accordance with relevant guidelines and regulations. Informed consent was obtained from all subjects or their legal guardians. Categorization of cardiac symptoms of outpatients was performed by medical doctors who acquired the license as Physician and Surgeon in Japan. The present study was approved by Institutional Research Board, Kyoto Industrial Health Association (Permission No. S18-0006), and the Ethics Review Committee for Medical and Health Research involving Human Subjects, Ritsumeikan University (Permission No. BKC-LSMH-2021-039).

MIT-BIH ECG databases

In addition to the Kyoto databases, we examined the public MIT-BIH databases provided by the Harvard-MIT Division of Health Sciences and Technology45,46,47. Here we adopted three databases: the MIT-BIH Atrial Fibrillation Database (2-channel Holter ECG recorded at a sampling frequency of 250 Hz for 10 hours); the MIT-BIH Arrhythmia Database (2-channel Holter ECG recorded at a sampling frequency of 360 Hz for 24 hours, of which we adopted the 30-minute waveform data); and the MIT-BIH Normal Sinus Rhythm Database (2-channel ambulatory ECG recorded at a sampling frequency of 128 Hz for 24 hours). We obtained RR intervals from the waveform data of the first channel. R-peaks were detected using the ecg_peaks function of the neurokit2 software48. For datasets whose recording period was shorter than the specified period (1 minute, 10 minutes, 1 hour, or 18 hours), we analyzed the entire recording period.
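
Once R-peak positions are available as sample indices, converting them to RR intervals and clipping to an analysis period is straightforward. The sketch below is a hypothetical helper, not part of neurokit2: it assumes the peak indices are sorted, that time is measured from the start of the recording, and that a period longer than the recording simply leaves the data untouched.

```python
def rr_from_peaks(peak_samples, fs_hz, max_duration_s=None):
    """Convert sorted R-peak sample indices to RR intervals in seconds.

    peak_samples: R-peak positions as sample indices.
    fs_hz: sampling frequency of the ECG channel in Hz.
    max_duration_s: keep only peaks within this time from the recording
        start; None keeps the entire recording.
    """
    times = [p / fs_hz for p in peak_samples]
    if max_duration_s is not None:
        times = [t for t in times if t <= max_duration_s]
    return [b - a for a, b in zip(times, times[1:])]

# Hypothetical peak indices at 250 Hz (the sampling rate quoted above for
# the Atrial Fibrillation Database).
peaks = [0, 250, 500, 760, 1000]
rr_full = rr_from_peaks(peaks, 250)
rr_clip = rr_from_peaks(peaks, 250, max_duration_s=2.5)
rr_long = rr_from_peaks(peaks, 250, max_duration_s=100.0)
print(rr_full, rr_clip, rr_long)
```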

Metrics for measuring heartbeats

Given a sequence of RR intervals, we computed variability metrics defined as follows.

Figure 9

Synthetic pulse sequences and the computed values of the coefficient of variation Cv, the local variation Lv, and the local-global variation ratio Lg. (a) Pulsation is locally regular while the rate is slowly modulated from low to high and then back to low. (b) Pulsation is irregular, although the average rate is nearly constant. (c) Long and short RR intervals alternate with each other. Cv has the same value for (a) and (b), which are composed of identical sets of RR intervals, whereas Lv is small for (a), in which the RR intervals are well organized, and is large for (b), in which RR intervals are arranged randomly. Lg is very small for (a), in which the rate is slowly modulated. \(Lg \approx 3/2\) for (b), in which RR intervals are randomly arranged. \(Lg \approx 3\) for (c), in which long and short RR intervals alternate.

  • Average heart rate (Hr)

    The most basic metric for measuring heartbeats is the average heart rate Hr, defined as

    $$\begin{aligned} Hr = \text {number of heartbeats per minute}. \end{aligned}$$

    The 18-hour average value was adopted as the Hr statistic for each subject.

  • Coefficient of variation (Cv)

    There are many conventional approaches for detecting HRV, such as linear, frequency domain, wavelet domain, and nonlinear methods. As a representative HRV metric, we used the coefficient of variation of RR intervals, defined by

    $$\begin{aligned} Cv = \frac{\Delta I}{\overline{I}}, \end{aligned}$$

    where \(\Delta I\) and \(\overline{I}\) represent the standard deviation and the mean of RR intervals, respectively. These statistics are typically measured every 10 minutes (references). Unless specified otherwise in the main text, the 18-hour mean of the logarithm of the 10-minute mean Cv was adopted. In Fig. 4, the 1-minute Cv and the 10-minute Cv were each computed from one-shot data, taken over the first 1 minute and the first 10 minutes after a 10-minute transient, respectively. The 1-hour Cv was estimated as the logarithm of the 10-minute mean Cv averaged over the first 1 hour after the 10-minute transient. The coefficient Cv takes a value of unity for a Poisson random pulse train and is zero for a perfectly regular pulsation signal.

  • Local variation (Lv)

    The conventional method of measuring HRV is adversely affected by slow variations in the heart rate and is also sensitive to artifacts and errors. There have been efforts to remove these artifacts, and modern methods employ machine learning techniques such as the state-space method49,50. Here we employ a simple metric called the local variation Lv, which was introduced for measuring the firing irregularity of the neurons in the brain26,27,28. The local variation Lv is defined as,

    $$\begin{aligned} Lv = \frac{3}{n-1} \sum _{i=1}^{n-1} \left( \frac{I_i-I_{i+1}}{I_i+I_{i+1}}\right) ^2, \end{aligned}$$

    where \(I_i\) and \(I_{i+1}\) are the ith and \((i+1)\)st RR intervals, respectively, and n is the total number of intervals in a given duration. Note that a pair of consecutive intervals was included when the heartbeat they share, which ends the ith RR interval and starts the \((i+1)\)st RR interval, was within the duration. The coefficient 3 in the definition of Lv was chosen so that Lv takes the value of unity for a Poisson pulse train26. Lv is zero for a regular pulsation.
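
The three metrics defined above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the function names (`heart_rate`, `cv`, `lv`, `windowed_log_mean`) are hypothetical, RR intervals are assumed to be given in seconds, the population standard deviation is assumed for Cv, the natural logarithm is assumed for the log-averaging, and each interval is assigned to the 10-minute window containing the heartbeat that ends it.

```python
import numpy as np

def heart_rate(rr):
    """Average heart rate in beats per minute (assumes RR intervals in seconds)."""
    rr = np.asarray(rr, dtype=float)
    return 60.0 / rr.mean()

def cv(rr):
    """Coefficient of variation: standard deviation over mean of the RR intervals."""
    rr = np.asarray(rr, dtype=float)
    return rr.std() / rr.mean()  # population std (ddof=0) is an assumption

def lv(rr):
    """Local variation: 3 times the mean squared contrast of adjacent intervals."""
    rr = np.asarray(rr, dtype=float)
    contrast = (rr[:-1] - rr[1:]) / (rr[:-1] + rr[1:])
    return 3.0 * np.mean(contrast ** 2)

def windowed_log_mean(rr, metric, window_s=600.0, min_intervals=3):
    """Mean of log(metric) over consecutive windows (hypothetical helper).

    Assumption: an interval belongs to the window that contains the beat
    ending it; windows with too few intervals or a zero metric are skipped.
    """
    rr = np.asarray(rr, dtype=float)
    beat_times = np.cumsum(rr)
    window_index = (beat_times // window_s).astype(int)
    logs = []
    for w in np.unique(window_index):
        chunk = rr[window_index == w]
        if len(chunk) < min_intervals:
            continue
        value = metric(chunk)
        if value > 0:
            logs.append(np.log(value))
    if not logs:
        raise ValueError("no window contained enough intervals")
    return float(np.mean(logs))
```

For example, `windowed_log_mean(rr, cv)` and `windowed_log_mean(rr, lv)` return the log-averaged 10-minute Cv and Lv for a recording `rr` of any length; the 18-hour duration is simply the length of the input.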

Whereas Cv represents the global variability of an entire sequence and is sensitive to rate fluctuations, Lv detects the instantaneous variability of intervals. To demonstrate the difference in the workings of these metrics, we created synthetic pulse sequences. In Fig. 9a, RR intervals are lined up in a regular manner from long to short and then short to long, while in Fig. 9b, the identical set of RR intervals is presented in a random sequence. Accordingly, (a) represents a locally regular pulsation while the rate is slowly modulated, whereas (b) represents irregular pulsation.

The coefficient of variation Cv has identical values for (a) and (b) because the standard deviation and the mean are the same for both sequences. In contrast, Lv is insensitive to the slow heart rate modulation and therefore distinguishes the two sequences by their local irregularity. Note that we made the variations in RR intervals much larger than those of real cardiac beats so that their differences would be apparent.

In addition to these metrics, we have newly introduced the local-global variation ratio \(Lg = Lv/Cv^2\). Its characteristics are analyzed in the following subsection. Lg takes a value much smaller than unity for a locally regular pulsation whose rate is slowly modulated (Fig. 9a). Lg takes the value 3/2 for a sequence in which RR intervals of mutually similar values are arranged randomly (Fig. 9b). Lg takes an even higher value of 3 for a pulsation in which long and short intervals alternate (Fig. 9c).
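
The behavior of the three metrics on the three types of synthetic sequence can be reproduced with a short sketch. The interval values (0.6 to 1.2 s), the sequence length, and the random seed below are illustrative assumptions and are not the values used to draw Fig. 9; the function names are hypothetical.

```python
import numpy as np

def cv(rr):
    """Coefficient of variation (population standard deviation over mean)."""
    return np.std(rr) / np.mean(rr)

def lv(rr):
    """Local variation of a sequence of intervals."""
    rr = np.asarray(rr, dtype=float)
    contrast = (rr[:-1] - rr[1:]) / (rr[:-1] + rr[1:])
    return 3.0 * np.mean(contrast ** 2)

def lg(rr):
    """Local-global variation ratio Lg = Lv / Cv^2."""
    return lv(rr) / cv(rr) ** 2

rng = np.random.default_rng(0)          # seed chosen arbitrarily
ramp = np.linspace(1.2, 0.6, 100)       # intervals shrinking from long to short

seq_a = np.concatenate([ramp, ramp[::-1]])  # (a) slow modulation, locally regular
seq_b = rng.permutation(seq_a)              # (b) same intervals, random order
seq_c = np.tile([1.2, 0.6], 100)            # (c) long and short strictly alternate

for name, seq in (("a", seq_a), ("b", seq_b), ("c", seq_c)):
    print(f"({name}) Cv={cv(seq):.3f}  Lv={lv(seq):.4f}  Lg={lg(seq):.3f}")
```

Because (a) and (b) contain the same set of intervals, their Cv values coincide, while Lv and Lg separate them. For the strictly alternating sequence (c), Lg equals 3 up to rounding error, independently of the two interval values chosen.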

Analytical calculations of Cv, Lv, and Lg for some limiting cases

Renewal process with gamma-distributed interval

Expectation values are analytically available for a wide class of renewal processes in which pulse intervals are derived from the gamma distribution,

$$\begin{aligned} p_{z}(I) = \frac{1}{\tau \, \Gamma (z)} (I/\tau )^{z-1}\exp (-I/\tau ), \end{aligned}$$

where \(\Gamma (z)\) is the gamma function defined as \(\Gamma (z) = \int _{0}^{\infty } dt \, t^{z-1} \exp (-t)\). The Poisson process corresponds to the case of \(z=1\), and larger values of z generate more regular pulse trains.

In this case, the expectation value of Lv is obtained analytically as \(3/(2z+1)\), and the expectation value of Cv is obtained as \(1/\sqrt{z}\). Accordingly, the local-global variation ratio Lg is

$$\begin{aligned} Lg = Lv/Cv^2 = 3z/(2z+1). \end{aligned}$$

While \(Lg = 1\) for the Poisson process with \(z=1\), Lg approaches 3/2 as the pulse train becomes regular in the limit \(z \rightarrow \infty\).
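
These expectation values can be checked numerically by sampling a gamma renewal process. One way to see the result for Lv is that, for two independent gamma-distributed intervals with the same shape z, the ratio \(I_1/(I_1+I_2)\) follows a Beta(z, z) distribution, whose variance is \(1/(4(2z+1))\). The sketch below is a Monte Carlo check under stated assumptions: the function name `gamma_renewal_metrics`, the sample size, and the seed are hypothetical choices, and the scale \(\tau\) is set to 1 because Cv, Lv, and Lg do not depend on it.

```python
import numpy as np

def gamma_renewal_metrics(z, n_intervals=200_000, seed=0):
    """Sample i.i.d. gamma intervals with shape z and return (Cv, Lv, Lg)."""
    rng = np.random.default_rng(seed)
    rr = rng.gamma(shape=z, scale=1.0, size=n_intervals)  # tau = 1 (assumption)
    cv = rr.std() / rr.mean()
    contrast = (rr[:-1] - rr[1:]) / (rr[:-1] + rr[1:])
    lv = 3.0 * np.mean(contrast ** 2)
    return cv, lv, lv / cv ** 2

for z in (1, 2, 5, 20):
    cv, lv, lg = gamma_renewal_metrics(z)
    print(
        f"z={z:2d}  Cv={cv:.3f} (theory {z ** -0.5:.3f})  "
        f"Lv={lv:.3f} (theory {3 / (2 * z + 1):.3f})  "
        f"Lg={lg:.3f} (theory {3 * z / (2 * z + 1):.3f})"
    )
```

The sampled values should agree with the analytical expressions up to sampling error, with Lg rising from about 1 at \(z=1\) toward 3/2 as z grows.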

Sequences consisting of long and short pulse intervals

The local-global variation ratio Lg is undefined for a perfectly regular pulse train, for which both Lv and \(Cv^2\) are zero. Although we obtained \(Lg=3/2\) in the limit \(z \rightarrow \infty\) of the renewal process with gamma-distributed intervals, in which the pulse train approaches regularity, this value may depend on how the pulse intervals are arranged. Here we consider long pulse trains in which equal numbers of long and short intervals are arranged in various orders. Let the long and short intervals be denoted as \(\tau + \delta\) and \(\tau - \delta\), respectively (\(\tau>\delta >0\)). Because the mean and the standard deviation of the intervals are \(\tau\) and \(\delta\), respectively, the coefficient of variation Cv is \(\delta /\tau\). Let p denote the switching probability between long and short intervals, that is, the probability that two consecutive intervals differ. Each such pair contributes \(\left( 2\delta /2\tau \right) ^2 = \delta ^2/\tau ^2\) to the sum in the definition of Lv, whereas a pair of equal intervals contributes zero, so the local variation Lv is given as \(3 p \, \delta ^2/\tau ^2\). Accordingly, the local-global variation ratio Lg is obtained as

$$\begin{aligned} Lg = Lv/Cv^2 = 3p. \end{aligned}$$

Thus \(Lg = 3/2\) if long and short intervals alternate randomly, in which case the switching probability between long and short is half, \(p=1/2\). But Lg can be as high as 3 if the long and short intervals always alternate (\(p=1\)). Lg can be close to zero for a long pulse train in which switching between long and short intervals occurs very rarely (\(p \approx 0\)). In this way, the local-global variation ratio may discriminate the order of long and short intervals, even if the difference between the long and short intervals is very small (\(\delta \ll \tau\)).
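
The relation \(Lg = 3p\) can be illustrated by simulating a two-level interval sequence in which each interval switches type with probability p. This is a sketch under stated assumptions: the function name `two_level_lg`, the values \(\tau = 1\) and \(\delta = 0.05\), the sequence length, and the seed are hypothetical choices. For finite sequences with small p, the numbers of long and short intervals are only approximately equal, so the agreement is approximate.

```python
import numpy as np

def two_level_lg(p, tau=1.0, delta=0.05, n_intervals=200_000, seed=0):
    """Lg of a sequence of intervals tau+delta / tau-delta switching with probability p."""
    rng = np.random.default_rng(seed)
    # True where the next interval differs in type from the current one.
    switches = rng.random(n_intervals - 1) < p
    # State 0 = long, 1 = short; the first interval is long by convention.
    state = np.concatenate([[0], np.cumsum(switches) % 2])
    rr = tau + delta * (1 - 2 * state)
    cv = rr.std() / rr.mean()
    contrast = (rr[:-1] - rr[1:]) / (rr[:-1] + rr[1:])
    lv = 3.0 * np.mean(contrast ** 2)
    return lv / cv ** 2

for p in (0.1, 0.5, 1.0):
    print(f"p={p:.1f}  Lg={two_level_lg(p):.3f}  (theory {3 * p:.3f})")
```

With \(p=1\) the sequence alternates strictly and Lg equals 3 up to rounding error, even though \(\delta\) is only 5% of \(\tau\) here.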