Inter- and intra-researcher reproducibility of heart rate variability parameters in three human cohorts

Heart rate variability (HRV) is a valid and non-invasive indicator of cardiac autonomic nervous system functioning. Short-term HRV recordings (e.g., 10 min long) produce data that usually is manually processed. Researcher subjective decision-making on data processing could produce inter- or intra-researcher differences whose magnitude has not been previously quantified in three independent human cohorts. This study examines the inter- and intra-researcher reproducibility of HRV parameters (i.e., the influence of R–R interval selection by different researchers and by the same researcher in different moments on the quantification of HRV parameters, respectively) derived from short-term recordings in a cohort of children with overweight/obesity, young adults and middle-age adults. Participants were recruited from 3 different studies: 107 children (10.03 ± 1.13 years, 58% male), 132 young adults (22.22 ± 2.20 years, 33% males) and 73 middle-aged adults (53.62 ± 5.18 years, 48% males). HRV was measured using a Polar RS800CX heart rate monitor. The intraclass correlation coefficient (ICC) ranged from 0.703 to 0.989 and from 0.950 to 0.998 for inter-and intra-researcher reproducibility, respectively. Limits of agreement for HRV parameters were higher for the inter-researcher processing compared with the intra-researcher processing. On average, the intra-researcher differences were 31%, 62%, and 80% smaller than the inter-researchers differences based on Coefficient of Variation in children, young and middle-aged adults, respectively. Our study provides the quantification of the inter-researcher and intra-researcher differences in three independent human cohorts, which could elicit some clinical relevant differences for HRV parameters. Based on our findings, we recommend the HRV data signal processing to be performed always by the same trained researcher and we postulate a development of algorithms for an automatic ECG selection.

Anthropometric measurements. Body weight and height were measured with an electronic scale and a stadiometer, respectively (SECA instruments, model 799, Electronic Column Scale, Hamburg, Germany) prior to the HRV assessment. Body mass index (BMI) was calculated as kg/m 2 . Body weight, height and waist circumference were measured twice consecutively by the same trained researcher, and the average for each parameter was used. www.nature.com/scientificreports/ Study design. Study design is graphically presented in Fig. 1. To assess the inter-researcher reproducibility of HRV parameters derived from short-term recordings, two different trained researchers processed the data (more than 3 years of experience; background in sport sciences and exercise physiology) from the same shortterm HRV recordings, (the two trained researchers could not be the same in all cohorts; see Fig. 1). To assess the intra-researcher reproducibility, the same HRV signal was processed twice by the same trained researcher 2 months apart (wash-out period). For this purpose, one of the researchers who first-processed the HRV signals was chosen to re-process the ≈ 50 to 60% of the recordings. Those recordings were randomly selected and blinded by an external researcher.
HRV recording and data processing protocol. Participants lay in a supine position for 10-15 min while the HRV signal was recorded. HRV assessment took place in a quiet room with dim lighting, controlled ambient temperature and humidity (ACTIBATE and FIT-AGEING studies) between 8 AM and 12 PM (specifically: ActiveBrains project: 9 AM-12 PM; ACTIBATE and FIT-AGEING studies: 8 AM-9 AM). The Polar RS800CX heart rate monitor (Polar Electro Oy Inc., Kempele, Finland) was used to record the HRV signal during 10-15 min at a sampling frequency of 1,000 Hz. Participants were instructed to breathe normally, and not to fidget, talk or sleep during recordings. For HRV signal analyses we used the normal R-R intervals after excluding the extreme values with a medium filter 18 using the Kubios software, standard version 3.2 (HRV analysis, University of Eastern Finland) 22,23 . The R-R intervals series were detrended using the smoothness prior method with lambda set at 500 and a cubic interpolation at the default rate of 4 Hz. A visual inspection was performed on the 10-15 min recorded, with each researcher manually selecting 5 min of the register (e.g., from minute 3 to minute 8) after considering that the selected 5 min met the top-quality criteria: (1) no large R-R interval outliers (i.e., a R-R interval extremely higher or lower than the whole R-R signal, based on the visual inspection of the HRV recording performed by the researcher), (2) R-R intervals equidistance (i.e., large or short distance between R-R intervals of the HRV recording, visually inspected by the researcher), and (3) Gaussians R-R intervals and heart rate distribution graphics 13,14 .
Under the Guidelines of Task Force of The European Society of Cardiology and The North American Society of Pacing and Electrophysiology recommendations 1 , we derived the most used time-and frequency-domain HRV parameters obtained from short-term recordings. In the time-domain, we computed the squared root of the mean of the sum of the squares of successive normal R-R interval differences (RMSSD) in milliseconds (ms), number of pairs of adjacent normal R-R intervals differing by more than 50 ms in the entire recording, expressed as a percentage (pNN50), and the standard deviation of all normal R-R intervals in ms (SDNN). In the frequency-domain, we performed spectral analyses using the non-parametric and parametric methods, fast Fourier transformation (FFT) and autoregressive (AR) algorithms respectively. FFT algorithm with Welch's periodogram method (50% overlap Hanning window as pre-processing technique and calculating area under the curve with an integration) and AR with a model order of 16 as previously recommended 24 . We derived the power in the high frequency (HF: 0.15-0.4 Hz), the power in the low frequency (LF: 0.04-0.15), both in absolute units (ms 2 ) and the LF/HF ratio.
The physiological interpretation of HRV parameters in time-and frequency-domain should be briefly mentioned. Usually, higher HRV values on time-domain parameters (i.e., RMSSD, pNN50 and SDNN) are considered as higher vagal tone associated with a lower cardiovascular disease risk and mortality 7 . However, the time-domain HRV parameters measure autonomic nervous system activity but cannot differentiate changes on HRV due to withdrawal of vagal tone or increased sympathetic tone and are less precise in terms of frequency delimitation 3 . On the other hand, HRV parameters in frequency-domain can quantify cyclic fluctuations of R-R intervals. The spectral analyses divide HRV into different oscillatory components (such as HF or LF). HF component reflects respiratory sinus arrhythmia, which is mainly influenced by vagal activity. LF fluctuations are due to parasympathetic and sympathetic control. The ratio LF/HF is often considered as an indicator of sympatho-vagal balance www.nature.com/scientificreports/ (e.g., parasympathetic withdrawal is accompanied by sympathetic activation), however, this interpretation has been recently challenged 25 .
Statistical analysis. Descriptive data are presented as means and standard deviations (SD) for continuous variables, and frequencies and percentages for categorical variables. Normal distribution of the included variables was tested using the Kolmogorov-Smirnov test, and visual inspection of histograms was also performed. HRV parameters derived from short-term recordings did not exhibit normal distributions, however, since sensibility analyses did not show differences after normalizing the variables, data were analyzed and presented as non-normalized values. Intraclass correlation coefficient (ICC) confidence interval (95% IC), inter-and intracoefficient of variation (CV), Bland-Altman plots (mean difference [bias] and limits of agreement) 26 and the root mean square error (RMSE) were used to study the inter-and intra-researcher reproducibility for HRV parameters derived from short-term recordings. Heteroscedasticity, or whether the variability (error) increases or decreases as the magnitude of the measure increase, was calculated using linear regressions between the mean values and the absolute values of HRV parameters differences. Of note, all the analyses were performed separately because we observed an interaction age × study population effect, influencing most of the HRV parameters (P < 0.05). All the analyses were conducted using the Statistical Package for Social Sciences (SPSS, v. 21.0, IBM SPSS Statistics, IBM Corporation). Table 1 shows participants' characteristics. Table 2 shows the inter-researcher reproducibility of HRV parameters derived from short-term recordings in time-and frequency-domains. The ICCs of HRV parameters in timedomain ranged from 0.822 to 0.989, while for the frequency-domain ranged from 0.703 to 0.985. Overall, the CVs for HRV parameters in the time-domain were lower than 5%, except for the pNN50 in the young and middle-aged adults' cohorts, in which it was lower than 6 and 9%, respectively. CVs for HRV parameters in frequency-domain were lower than 16% for children, 17% for young adults and 21% for middle-aged adults in all cases. RMSE of the HRV parameters in time-domain ranged from 2.66 to 9.17, while for the frequency-domain ranged from 154.19 to 839.51 (HF and LF respectively) and from 0.32 to 2.38 (ratio LF/HF). Table 3 shows the intra-researcher reproducibility of HRV parameters derived from short-term recordings. The ICCs of HRV parameters in time-domain ranged from 0.972 to 0.998, while for the frequency-domain ranged from 0.950 to 0.996. The CVs of HRV parameters in time-domain were lower than 4% in most HRV parameters, except for pNN50 in the middle-aged adult's cohort (11%). CVs of the HRV parameters in frequency-domain were lower than 4% for children, 11% for young adults and 14% for middle-aged adults in all cases. On average, the intra-researcher differences were 31%, 62%, and 80% smaller than the inter-researcher differences based on CVs in children with OW/OB, young and middle-aged adults, respectively (data not shown). RMSE of the HRV parameters in time-domain ranged from 1.09 to 3.21, while for the frequency-domain ranged from 113.05 to 402.13 (HF and LF respectively) and from 0.22 to 0.93 (ratio LF/HF).

Discussion
The main findings of our research are: (1) this study provides the quantification of the inter-researcher and intra-researcher differences (i.e., the influence of R-R interval selection by different researchers and by the same researcher in different moments on the quantification of HRV parameters, respectively) in three different human cohorts. Although the inter-and intra-researcher differences were not very large, the inter-researcher www.nature.com/scientificreports/ www.nature.com/scientificreports/ www.nature.com/scientificreports/ reproducibility could elicit clinical relevant differences for some HRV parameters (e.g., some differences on SDNN values ≥ 10 ms 27 based on limits of agreement in Bland Altman plots); (2) overall, HRV parameters in frequency-domain derived from short-term recordings presented lower reproducibility compared with those in time-domain in both the inter-and intra-researcher processing. Therefore, the data obtained from our three cohorts support the idea that HRV signal processing should be performed by the same trained researcher to obtain consistent and reproducible HRV parameters derived from short-term recordings in time-and frequencydomain. Alternatively, automatic algorithms for an ECG segment selection should be developed to overcome the reproducibility problem. www.nature.com/scientificreports/ To our knowledge, only three previous studies focused on inter-and intra-researcher reproducibility for HRV derived parameters from long-term recordings [28][29][30] , whilst only two previous study focused on short-term recordings 15,16 . We observed similar findings like those reported by Farah et al. 15 and Bassi et al. 16 in a sample of 27 healthy adolescents and 44 middle-aged adults with type 2 diabetes, respectively. Of note, Farah et al. 15 registered the HRV signal using the same heart rate monitor (Polar RS800CX) and performed the same statistical tests to study inter-and intra-researcher reproducibility of HRV parameters derived from short-term recordings. www.nature.com/scientificreports/ In the study conducted by Farah et al. 15 carried out in adolescents, ICCs ranged from 0.915 to 0.996 and 0.990 to 0.993 for inter-and intra-researcher reproducibility, respectively, while for children with OW/OB our ICCs ranged from 0.823 to 0.989 and 0.950 to 0.998 for inter-and intra-researcher reproducibility, respectively. Likewise, CVs reported by Farah et al. 15 were lower than 20% and 7% for inter-and intra-researcher reproducibility, respectively. We found similar CVs in children, i.e., lower than 15.9% and 4.3% for inter-and intra-researcher reproducibility, respectively. The marginal differences observed across studies could be partially explained by the different biological characteristics' between them (e.g., children with OW/OB vs. adolescents) and/or by some methodological inconsistencies in HRV data collection (i.e., standard operation procedure) and/or HRV data signal processing.
In regards to young adults, in our study, ICCs ranged from 0.822 to 0.985 and 0.990 to 0.996 for inter-and intra-researcher reproducibility respectively. Likewise, the CVs were lower than 16.7% and 10.9% for inter-and intra-researcher reproducibility. On the other hand, respect to middle-aged adults, ICCs ranged from 0.850 to 0.987 (with the exception of the FFT_LF/HF parameter; ICCs 0.703) and 0.951 to 0.995 for inter-and intraresearcher reproducibility, respectively. Similarly to our findings in healthy middle-aged adults, Bassi et al. 16 reported ICC ranging from 0.730 to 0.970 and 0.910 to 0.990 for inter-and intra-researcher reproducibility in middle-aged adults with type 2 diabetes. The little variation observed of ICCs between both studies in middleaged adults could be partially explained by the health status of the study populations (i.e., healthy vs. type 2 diabetes) and by some methodological considerations, e.g., different model of the heart rate monitors used (Polar RS800CX vs. Polar S810i model).
The inter-researcher reproducibility of HRV parameters derived from short-term recordings was lower than intra-researcher reproducibility in our study, which could be explained by one issue related to the HRV data signal processing. Based on our previous experience in the HRV data signal processing from short-term recordings using the Kubios software, we found that, in HRV recordings presenting a high HRV (e.g., children), a Gaussian distribution of R-R intervals and HR distribution graphs is difficult to be found. Moreover, this higher HRV could complicate the decision-taking from researchers that guarantee the selection of the same best-quality 5 min, and this fact could produce the mentioned differences between HRV parameters derived from short-term recordings.
In regards to intra-researcher reproducibility, we reported CVs lower than 4.3%, 10.9% and 13.7% for children with OW/OB, healthy young and middle-aged adults respectively, in some of the HRV parameters derived from short-term recordings. Thus, in general the intra-researcher reproducibility was higher (i.e., lower CVs) in children with OW/OB in comparison with young and middle-aged adults. We expected that individuals with a higher HRV (e.g., children and young adults) presented worse intra-researcher reproducibility, in other words a higher variability between test and retest, than in individuals with lower HRV values (e.g., middle-aged or elderly adults). Hence, a higher HRV might complicates the selection of the same best-quality 5 min across researchers. In fact, in our study, the differences observed across cohorts in the intra-researcher reproducibility could be explained, in a greater or lesser extent, by methodological aspects, e.g., a different researcher performs the intra-researcher HRV signal data processing in each cohort (Fig. 1). However all of them were trained in the HRV data signal processing using the Kubios software (more than 3 years of experience; background in sport sciences and exercise physiology).
On the other hand, the Bland-Altman plots for time-domain HRV parameters derived from short-term recordings, i.e., RMSSD and SDNN (Figs. 2, 3), showed wider limits of agreement in the inter-researcher HRV signal data processing in comparison with the intra-researcher. These results are in concordance with the aforementioned statement, a higher HRV can complicate the selection of the same fragment best-quality 5 min' period between different researchers and, this could negatively affect the reproducibility of HRV parameters derived from short-term recordings. We should consider that the ICC and CVs previously mentioned and discussed against the Farah et al. 15 study in adolescents, include HRV parameters derived from short-term recordings in time-and frequency-domain. On the other hand, for Bland-Altman plots, we only took into account in the main text the HRV parameters derived from short-term recordings in time-domain more frequently used such as RMSDD and SDNN 1,10 (see Figs. 2,3).
Clinicians and researchers care about whether the magnitude of the differences elicited by the inter-and intra-researcher HRV data signal processing are clinically relevant. However, it is difficult to determine due to several methodological differences between studies related to the HRV data collection and HRV data signal processing (e.g., instrument employed to record the HRV signal, length of the recordings, HRV parameters derived reported, etc.) 31 .
Despite those methodological issues, it is relevant to take into consideration the absolute values of interand intra-researcher differences (see Figs. 2, 3 for a graphical visualization). In this regard, a mean increase of 6-10 ms on RMSSD has been reported after different kinds of interventions in children with OB, and in healthy and unhealthy adults [32][33][34] . For example, for RMSSD inter-and intra-researcher HRV signal data processing, we reported a mean difference of − 1.71 ms and 0.32 ms in children with OW/OB, respectively. Thus, based on the previous studies 32-34 , we can assume that these differences are not clinically relevant. However, limits of agreement describe inter-researcher differences up to 17.31 ms, i.e., twice the change reported in the previous mentioned interventions [32][33][34] , reaching intra-researcher differences a maximum of 5.15 ms. These findings question whether the nature of the change reported in the intervention could be produced by the intervention itself in a same extent that it could be produced by unreliable HRV data signal processing. To define the trained level or experience of the researchers and how many researchers were in charge of the HRV data signal processing in each time point of an intervention is a matter of main interest that should be reported in all the intervention studies.
Otherwise, Bilchick et al. 27 reported that each increase of 10 ms in SDNN supposed a reduction of 20% of the risk of mortality in adults with ischemic cardiomyopathy (24 h-holter recordings). Importantly, the mean values of the inter-and intra-researcher differences for SDNN in our study were 1.05-1.69 ms and 0.04-0.50 ms, respectively. However, the limits of agreement showed that some inter-researcher differences could reach 11.28 ms in www.nature.com/scientificreports/ young adults, where intra-researcher analysis can reach a maximum difference of 6.59 ms. Thus, attending to our findings, inter-researcher differences could be clinically relevant whereas intra-researcher differences could not. We encourage researchers to report the HRV signal data processing in detail (i.e., level of artefact correction, smoothness prior alpha value, interpolation value) including if the same or a different researcher carried out the HRV processing, as well as their experience (i.e., trained or not). Further studies are needed to test the impact of other decisions related to the HRV signal data processing such as different levels of artefact correction or experience status of the researchers on the quantification of HRV derived parameters.
Some limitations need to be mentioned: (1) the sample from ActiveBrains project were children with OW/OB, so we cannot extrapolate our findings to thinner children; (2) the intra-researcher HRV signal data processing was not performed by the same researcher in the three different cohorts, however, all the researchers in our study were trained in HRV signal data processing using the Kubios software; (3) the experimental conditions during the data acquisition were not identical in the different studies as different standard operation procedures were used (e.g., the hour and dates of the HRV assessments, etc.). On the other hand, the strengths of our study were: (1) relatively larger sample size compared to previous studies that analyzed inter-and intra-researcher reproducibility of HRV parameters derived from short-term and long-term recordings; (2) inter-and intra-researcher reproducibility of HRV parameters derived from short-term recordings were tested in three different cohorts and performing an identical methodology of HRV signal data processing and following the current guidelines for HRV assessment 1,35 . conclusion In conclusion, our study provides the quantification of the inter-researcher and intra-researcher differences (i.e., the influence of R-R interval selection by different researchers and by the same researcher in different moments on the quantification of HRV parameters, respectively) in three different human cohorts. Although the inter-and intra-researcher differences were not very large, the inter-researcher reproducibility could elicit some clinically relevant differences for some HRV parameters in time-domain. Based on our findings, we recommend that the HRV signal data processing should be performed by the same trained researcher to obtain more reproducible and consistent HRV parameters derived from short-term recordings. Alternatively, automatic algorithms for an ECG segment selection should be developed to overcome the reproducibility problem. These findings need to be considered with caution due to the three different cohorts selected from three studies with different objectives and inclusions criteria.