Introduction

Physical activity reduces the risk of chronic diseases and premature death1,2,3. Assessing physical activity in humans in epidemiological studies is challenging, and has mostly relied on self-reported questionnaires with low to moderate validity as compared to objective measures4, 5. Today, by measuring the body’s acceleration in all three spatial axes, accelerometers enable an improved physical activity assessment under free-living conditions. While most previous studies used accelerometry assessed during waking hours only6,7,8,9, 24 h-accelerometry over multiple days is now being used in several large cohort studies such as the German National Cohort, making it possible to capture a comprehensive view of individual habitual physical activity10, 11.

One important premise for the interpretation of accelerometer data in terms of habitual physical activity is knowledge about whether or not the device has been worn appropriately. In this context, detection of accelerometer non-wear time (NWT) is essential to correctly classify habitual physical activity. Assessment of NWT using diaries or questionnaires is time consuming for study participants and researchers, and may therefore not be feasible in large-scale observational studies. Automated detection from accelerometry data may be an efficient alternative way for assessment of NWT. Most previous studies have defined NWT as periods of >60 consecutive minutes of zero-acceleration readings6, 12,13,14. However, this definition, based on an algorithm proposed by Troiano et al., was developed using uniaxial accelerometers worn during waking hours only9, 15. While some previous studies already indicated that definitions with longer periods of zero-acceleration readings might more accurately detect NWT during waking hours16,17,18, complexity is added when recordings are conducted over 24 h periods and thus include periods of sleeping, since in sleeping phases motionless periods might be longer than 60 minutes. It is thus unclear to what extent the 60-min algorithm is suitable to detect NWT in accelerometry data captured over 24 h periods.

The aim of this study was therefore to assess to what extent algorithms of periods of >60 minutes of acceleration zero-readings are appropriate to define NWT in 24 h-accelerometry data assessed over multiple days in the general adult population. This was accomplished in two epidemiological studies on adults from the North (ActivE, Berlin; convenience sample) and the South of Germany (KORA, Augsburg; population-based sample).

Methods

Study populations

We used cross-sectional data of the ActivE study (Berlin, Germany) and the KORA (KORA - Cooperative Health Research in the Region of Augsburg, Germany) FF4 cohort. The ActivE study originally aimed to quantify activity-related energy expenditure using 24 h-accelerometry over two weeks. A convenience sample of 50 participants aged 20–69 years with a body-mass index (BMI) of 18.5–35.0 kg/m² was recruited. Participants with physiological conditions interfering with energy metabolism or weight stability as well as mobility impairments were excluded. The initial KORA S4 cohort included 4,261 participants aged 25–74 years from the general population in the area of Augsburg, Germany, recruited between 1999–200116. For the present analysis, data were used from the second follow-up, KORA FF4 (in the following referred to as ‘KORA’), which was performed in 2013–2014, enrolling 2,279 participants. Of those, 1,043 participants were asked to take part in the 24 h-accelerometry; 562 agreed to participate. The study protocols of ActivE and KORA were approved by the ethics committee of the Charité - Universitätsmedizin Berlin and the Bavarian Medical Association, respectively, and by the local data protection officers. All investigations were carried out in accordance with the relevant guidelines and regulations. All participants gave written informed consent.

Data collection

At the study centers, anthropometric measurements were taken, and information on socio-demographic, economic, and lifestyle factors was collected. Participants were provided with the triaxial accelerometer ActiGraph GT3X+ (ActivE) or GT3X (KORA) (both ActiGraph LLC, Pensacola, FL, USA), and were asked to wear the device for a period of two weeks (ActivE) and one week (KORA) for all waking and sleeping phases except for water activities, sauna, or high contact sports. Previous studies have shown good validity and reliability of the ActiGraph GT3X+ and GT3X for assessment of habitual physical activity17,18,19. The only difference between the GT3X+ and the GT3X is that the former is waterproof, whereas the latter is not. In ActivE, accelerometers were put on during the first study visit, and participants were instructed to wear the device on the right hip. Because of limited recording capability, each participant was provided with a second pre-initialized accelerometer to be worn during the second week and data of both devices were later on joined to have two weeks of 24 h-accelerometry. In KORA, participants started wearing the accelerometer the morning after the study visit, and they were instructed to wear the accelerometer at the hip on the side of the dominant hand during daytime and to move it from hip to the wrist of the non-dominant hand overnight20. All participants kept activity diaries to record sleep/wake phases and starting/ending time and reasons of any NWT period. Due to missing or inconsistent diary or accelerometry data three ActivE and three KORA participants were excluded from all analyses, resulting in 47 ActivE and 559 KORA participants included in the present analysis.

The ActiLife software was used to initialize accelerometers and to download activity data as ‘counts’ (ActivE study, version 6.11.0; KORA study, version 6.11.2; ActiGraph LLC, Fort Walton Beach, FL, USA). Raw accelerometry data were sampled by a 12 bit analog to digital converter (dynamic range; ActivE: ±6G, KORA: ±3G) at a constant rate of 30 Hz (KORA, stored at a 1 Hz rate) or 100 Hz (ActivE, stored at a 100 Hz rate) using all three spatial axes (filter set to default, ‘normal’, in both studies). Data were converted into 60-sec epochs and extracted as ‘vector magnitude counts/minute’ (resulting from acceleration detected via the vertical, horizontal, and perpendicular axis).
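The text does not give the formula behind ‘vector magnitude counts/minute’. As a minimal sketch, assuming the commonly used definition (the square root of the sum of the squared per-axis counts in each epoch), the combination of the three axes could look as follows. The function name and the example values are illustrative only and are not taken from the study.

```python
import math

def vector_magnitude(x_counts, y_counts, z_counts):
    """Combine per-epoch counts of three axes into one value per epoch.

    Assumption: vector magnitude = sqrt(x^2 + y^2 + z^2) per epoch.
    """
    return [
        math.sqrt(x * x + y * y + z * z)
        for x, y, z in zip(x_counts, y_counts, z_counts)
    ]

# Three hypothetical 60-sec epochs (made-up counts for illustration).
vm = vector_magnitude([0, 3, 0], [0, 4, 0], [0, 0, 2])
print(vm)  # [0.0, 5.0, 2.0]
```

Under this assumption, an epoch has a vector magnitude of zero only if all three axes recorded zero counts, which is the condition the zero-count NWT algorithms rely on.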

Statistical analysis

Age, height, weight, BMI, NWT minutes, and number of NWT periods are presented as median and interquartile range (IQR). NWT parameters were averaged per participant over the total time of accelerometry recorded.

For each participant we determined (a) the total time of assessment as period between first and last recorded time point of the diary, and (b) each waking and sleeping phase of the measured days/nights over the study period as recorded in the diary. Thus, we were able to compare NWT between accelerometry and diary over the total time of assessment and separately over waking or sleeping phases (Fig. 1). Three approaches were used to compare NWT detected based on diary and accelerometry data:

Figure 1

Exemplary description of non-wear time (NWT) validation using consecutive time periods for calculation of sensitivity and specificity of NWT algorithms to detect NWT periods >60 to >180 minutes (black rectangles, wear time based on 24 h-accelerometry or diary data, respectively; white rectangles, NWT based on 24 h-accelerometry or diary data, respectively). As an example, in the original diary data (panel I), five NWT periods during waking were reported being 30, 30, 115, 130, and 30 minutes, respectively. When applying the 60-min NWT algorithm (panel II), two of all NWT periods in diary were detected (115 and 130 minutes, respectively); at the same time, three NWT periods occurred based on the 24 h-accelerometry data (accelero.) during waking (65 and 140 minutes, respectively) and sleeping (75 minutes). Using the 90-min NWT algorithm (panel III), two NWT periods in diary were still detected (115 and 130 minutes, respectively); based on accelerometry, one NWT period was detected (140 minutes). No NWT was detected in diary or accelerometry data when applying the 150-min algorithm (panel IV). For each NWT algorithm, detected NWT periods were classified according to the fourfold table in panel V. Sensitivity (proportion of true positively identified NWT) and specificity (proportion of true negatively identified NWT) were then calculated using the formulas in panel VI.

1. Sensitivity and specificity of NWT algorithms >60 to >180 minutes based on accelerometry compared to diary.

We determined each participant’s NWT based on the diaries’ information by identifying consecutive time periods of reported NWT between >60 to >180 minutes, respectively. To derive NWT based on accelerometry data, we used the extracted data of the ‘vector magnitude counts/minute’ and determined periods of >60, >90, >120, >150, or >180 minutes of consecutive acceleration zero-counts without allowing for interruptions. Based on the different algorithms, each identified NWT period was compared between accelerometry and diary, with the diary set as reference. If the overlap in length of NWT periods between accelerometry and diary was ≥50%, this was coded as true positively classified NWT (Fig. 1, panel I–VI, classification ‘a, true positive’). If the overlap was <50%, the period was recorded as ‘not assigned’ (not included in analyses of sensitivity and specificity). If an NWT period in the accelerometry data did not correspond to a diary period, this period was assigned as false positively classified (Fig. 1, panel I–VI, classification ‘b, false positive’). An NWT period detected in the diary that was not detected in the accelerometry data was assigned as false negative (Fig. 1, panel I–VI, classification ‘c, false negative’). If both the diary and the accelerometry data indicated continuous wear time during the period considered for analysis, it was classified as true negative (Fig. 1, panel I–VI, classification ‘d, true negative’). Based on the total number of NWT periods identified, sensitivity (proportion of true positively identified NWT) and specificity (proportion of true negatively identified NWT) were calculated for each NWT algorithm (Fig. 1, panel I–VI)21.
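The steps described above can be sketched in Python. This is an illustration under stated assumptions, not the code used in the study (the analyses were run in SAS, R, and MS Access). All function names are hypothetical. The overlap is expressed here relative to the diary period, which is an assumption because the text does not state which period length serves as the denominator. Sensitivity and specificity use the standard fourfold-table formulas a/(a+c) and d/(b+d).

```python
def zero_count_runs(counts, min_minutes):
    """Return (start, end) pairs, end exclusive, of uninterrupted runs of
    zero counts that last MORE than `min_minutes` one-minute epochs."""
    runs, start = [], None
    for i, c in enumerate(counts):
        if c == 0:
            if start is None:
                start = i          # a new zero run begins
        else:
            if start is not None and i - start > min_minutes:
                runs.append((start, i))
            start = None           # any non-zero count interrupts the run
    if start is not None and len(counts) - start > min_minutes:
        runs.append((start, len(counts)))
    return runs

def overlap_fraction(acc_period, diary_period):
    """Share of the diary period covered by the accelerometry period.
    Assumption: the diary period (the reference) is the denominator."""
    shared = min(acc_period[1], diary_period[1]) - max(acc_period[0], diary_period[0])
    return max(shared, 0) / (diary_period[1] - diary_period[0])

def sensitivity_specificity(a, b, c, d):
    """a = true positive, b = false positive, c = false negative,
    d = true negative (counts of classified periods)."""
    sensitivity = a / (a + c) if (a + c) else float("nan")
    specificity = d / (b + d) if (b + d) else float("nan")
    return sensitivity, specificity

# Hypothetical minute-level counts: 65 zero minutes, then 30 zero minutes.
counts = [5] * 10 + [0] * 65 + [7] * 5 + [0] * 30 + [2] * 10
runs_60 = zero_count_runs(counts, 60)   # only the 65-minute run qualifies
runs_90 = zero_count_runs(counts, 90)   # no run exceeds 90 minutes
frac = overlap_fraction(runs_60[0], (20, 80))
sens, spec = sensitivity_specificity(8, 2, 2, 8)
```

Because the criterion is strictly "more than" the threshold, a run of exactly 60 zero minutes is not flagged by the 60-min algorithm in this sketch.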

2. Overlap of accelerometry and diary NWT periods identified using the >60-min to >180-min algorithms.

We determined the overlap of the detected NWT between accelerometry and diary on a minute-by-minute basis for each NWT algorithm for NWT-periods >60 minutes (Fig. 2). We calculated for each algorithm the length (minutes) of: (1) NWT >60 minutes detected in both, diary and accelerometry data (Fig. 2, panel II-III, case ‘accelero. +diary’), (2) NWT >60 minutes detected in diary only (Fig. 2, panel II-III, case ‘diary only’), and, (3) NWT >60 minutes detected by accelerometry only (Fig. 2, panel II-III, case ‘accelero. only’). We then calculated the relative contribution of the sum of NWT minutes in NWT periods >60 minutes identified as case 1 (overlap, i.e., NWT minutes detected in both, diary and accelerometry), case 2 (diary only), and case 3 (accelerometry only), to the potential total NWT which is NWT detected by either diary, accelerometry, or both.
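The minute-by-minute comparison can be illustrated with sets of minute indices. The following is a minimal sketch, assuming NWT periods are given as (start, end) minute pairs with the end exclusive. The function name, the dictionary keys, and the example periods are illustrative and not taken from the study.

```python
def overlap_shares(acc_periods, diary_periods):
    """Relative contribution (%) of the three cases to the potential total
    NWT, i.e., all minutes flagged by diary, accelerometry, or both."""
    acc = {m for start, end in acc_periods for m in range(start, end)}
    diary = {m for start, end in diary_periods for m in range(start, end)}
    total = acc | diary
    if not total:
        return None  # no NWT detected by either source
    n = len(total)
    return {
        "accelero_and_diary": 100 * len(acc & diary) / n,   # case 1, overlap
        "diary_only": 100 * len(diary - acc) / n,           # case 2
        "accelero_only": 100 * len(acc - diary) / n,        # case 3
    }

# Hypothetical example: 150 min flagged by accelerometry, 100 min in diary,
# of which 50 min coincide; potential total NWT is 200 min.
shares = overlap_shares([(0, 150)], [(100, 200)])
print(shares)
# {'accelero_and_diary': 25.0, 'diary_only': 25.0, 'accelero_only': 50.0}
```

By construction the three shares sum to 100% of the potential total NWT. For the third approach, the same function could be applied with all diary periods, regardless of their length, passed as `diary_periods`.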

Figure 2

Exemplary description of non-wear time (NWT) validation using a minute-by-minute evaluation for calculation of overlap in length (minutes) of NWT detected based on accelerometry (accelero.) as compared to diary (black rectangles, wear time based on 24 h-accelerometry or diary data, respectively; white rectangles, NWT based on 24 h-accelerometry or diary data, respectively). As an example, in the original diary data (panel I), five NWT periods during waking were reported being 30, 30, 115, 130, and 30 minutes, respectively. When applying the 60-min NWT algorithm (panel II) to both accelerometry and diary data two of all NWT periods in diary (115 and 130 minutes, respectively), and three NWT periods in accelerometry were detected (75, 65, and 140 minutes, respectively). When applying the 60-min NWT algorithm (panel III) to accelerometry data only while any NWT regardless of a minimal length reported in the diary was assessed, still three NWT periods in accelerometry (75, 65, and 140 minutes, respectively) but all five ‘original’ NWT periods in diary (30, 30, 115, 130, and 30 minutes, respectively) were detected. For each NWT algorithm and both approaches (panel II and III), NWT minutes detected by accelerometry only, in diary only and in both accelerometry and diary were assessed. We then calculated the relative contribution of each of these to the potential total NWT (i.e., NWT detected in either diary, accelerometry, or both). Overlap was defined as the NWT detected in both accelerometry and diary.

3. Overlap of any diary NWT with accelerometry NWT identified using the >60-min to >180-min algorithms.

To investigate the ability of the algorithms to detect NWT at all and to assess the contribution of NWT periods <60 minutes to total NWT in 24 h-accelerometry data, analyses analogous to the second approach were performed on a minute-by-minute basis, comparing any NWT detected in the diary (regardless of a minimal length) with NWT detected by the algorithms >60 to >180 minutes in the accelerometry data (illustrated in Fig. 2, panel III, for the 60-min algorithm).

Analyses were performed using SAS® Enterprise Guide®, version 4.3 (SAS Institute Inc., Cary, NC) as well as the R Statistical Programming Language for Windows, version 3.2.022 and MS Access 2010/SQL via Microsoft Visual Basic for Application 7.0.

Results

Characteristics of the study populations of ActivE (N = 47; men, 51.1%) and KORA (N = 559; men, 46.9%) are summarized in Table 1. According to the participants’ diary, the median number of NWT periods/24 h as well as the median total length of NWT/24 h was similar between studies (number, 0.8/24 h in ActivE, 0.9/24 h in KORA; length, 20.8 min/24 h in ActivE, 23.9 min/24 h in KORA) (Table 1). There were no significant differences in number of NWT periods/24 h or length of NWT/24 h between age groups of 20–39 years, 40–59 years, and 60+ years in ActivE (p = 0.36 and p = 0.88, respectively) and between age groups of <60 years and 60+ years in KORA (p = 0.54 and p = 0.96, respectively). Over the total time of assessment, 9.1% and 15.4% of the total NWT reported in the diary were >60 min in ActivE and KORA, respectively. Consequently, around 85% of the reported NWT periods were <60 minutes and therefore not detectable by the algorithms. Common reasons reported for NWT were showering, personal hygiene, changing clothes etc. NWT periods >180 minutes made up <5% of all NWT reported. During waking, the proportion of reported NWT >60 to >180 minutes was comparable to the results seen over the total time of assessment, while during sleeping the proportions were considerably larger.

Table 1 Characteristics of the study populations of the ActivE and KORA FF4 study.

In ActivE, 657 waking and 656 sleeping phases of 47 participants were available for separate analyses. In KORA, data of 559 participants were available for the total time of assessment, and 3,483 waking (546 participants) and 3,352 sleeping phases (551 participants) were included for separate analyses.

1. Sensitivity and specificity of NWT algorithms >60 to >180 minutes based on accelerometry compared to diary.

Overall, sensitivity and specificity to detect NWT in 24 h-accelerometry data were low for the 60-min algorithm in both studies (Table 2). Particularly, specificity was low for the total time of assessment (ActivE, 0.00; KORA, 0.08) and during sleeping (ActivE, 0.59; KORA, 0.77). In general, specificity increased with length of NWT algorithms. Sensitivity also increased with length of NWT algorithms for the total time of assessment. During sleeping phases, sensitivity already reached a maximum for the 60-min algorithm, being 1.00 in ActivE and 0.90 in KORA, respectively; sensitivity for longer NWT algorithms did not or only slightly decreased.

Table 2 Sensitivity and specificity: NWT algorithms >60 to >180 minutes using accelerometry versus diary.

For the total time of assessment, sensitivity and specificity were both high when applying the 180-min algorithm in ActivE and the 120-min algorithm in KORA, respectively.

During waking phases, both, sensitivity and specificity were high using the 90-min algorithm in ActivE and the 120-min algorithm in KORA, respectively.

During sleeping phases, both, sensitivity and specificity were high for NWT algorithms of >90 to >180 minutes of acceleration zero-counts with no or only slight differences in sensitivity and specificity between these algorithms in both studies.

2. Overlap of accelerometry and diary NWT periods identified using the >60-min to >180-min algorithms.

Results for the length (minutes) of NWT detected by diary or accelerometry, and their overlap, are shown in Table 3. Overall, in both studies the overlap was lowest when applying the 60-min algorithm: for the total time of assessment the length of NWT detected in both, diary and accelerometry contributed 18.0% in ActivE and 28.4% in KORA to all NWT minutes detected by either diary, accelerometry, or both (i.e., of potential total NWT). In contrast, the length of NWT detected in diary only (but not in accelerometry) contributed 4.2% in ActivE and 10.9% in KORA, whereas that detected by accelerometry only (but not in diaries) contributed 77.8% in ActivE and 60.7% in KORA, suggesting that a substantial proportion of NWT detected by accelerometry was false positively identified. This was mainly related to sleeping, where about 90% of NWT was detected by accelerometry only, compared with about 40% during waking.

Table 3 Overlap of accelerometry and diary NWT periods identified using the >60-min to >180-min algorithms.

When focusing on the total time of assessment, the 180-min and 120-min algorithm that showed high sensitivity and specificity also showed the highest degree of overlap in NWT minutes detected by both, diary and accelerometry in ActivE (63.5%) and KORA (42.1%), respectively. Applying these algorithms, 26.7% and 43.7%, respectively, of total NWT time were detected by accelerometry only.

For waking phases, applying the most sensitive and specific 90-min and 120-min algorithm in ActivE and KORA, respectively, also resulted in a comparably high NWT minutes overlap, i.e. 44.4% and 45.2%, respectively. About 40% of total NWT time was detected by accelerometry only when using these algorithms.

During sleeping phases, using algorithms >90 to >180 minutes showing high sensitivity and specificity to detect NWT, the degree of overlap in NWT minutes identified by both, diary and accelerometry increased with length of the algorithm in both studies, from about 20% to 50% in ActivE and 26% in KORA. However, even with algorithms >180 minutes about 50% to 70% of the NWT were detected by accelerometry only.

3. Overlap of any diary NWT with accelerometry NWT identified using the >60-min to >180-min algorithms.

The overlap in NWT minutes identified by both, diary and accelerometry was low when using all reported NWT minutes from the diary (i.e., also less than 60 min) in comparison to accelerometry based NWT minutes using algorithms: for the total time of assessment the overlap ranged between 14.7% and 21.9% in ActivE and 21.8% and 26.7% in KORA (Table 4). The highest overlap was achieved in both studies when applying the 90-min and 120-min algorithm to identify accelerometry NWT. During sleeping, the overlap increased with algorithm length from 5.0% to a maximum of 49.4% in ActivE, and from 9.1% to 26.0% in KORA, respectively. Interestingly, when using the 60-min algorithm the overlap was highest during waking, i.e. 20.8% in ActivE and 27.3% in KORA, and decreased with increasing NWT algorithm length to 4.8% in ActivE and 17% in KORA, respectively.

Table 4 Overlap of any diary NWT with accelerometry NWT identified using the >60-min to >180-min algorithms.

Over the total time of assessment the proportion of NWT minutes detected in diary only increased with increasing algorithm length in both studies, i.e. from 23.8% to 73.7% in ActivE and from 29.9% to 55.7% in KORA. Comparing waking and sleeping revealed that this is mainly due to NWT during waking where up to 92.4% (ActivE) and 68.0% (KORA) were detected in diary only.

The proportion of NWT minutes detected by accelerometry only decreased in both studies with increasing algorithm length, for total time of assessment, waking, and sleeping. Particularly during sleeping the proportion was very high, about 90% for the 60-minute algorithm and still about 70% for the 120-minute algorithm.

Considering the algorithms that showed highest sensitivity and specificity (Table 2) during waking phases, the 90-min and 120-min algorithm, the overlap in NWT minutes detected in diary and accelerometry was about 20% only. Assessing the same for the sleeping phases the overlap in NWT minutes between diary and accelerometry was between 20% and 50% when applying NWT algorithms >90 to >180 minutes to accelerometry data.

Discussion

In this study, we tested the suitability of different algorithms to detect NWT in 24 h-accelerometry in two independent epidemiological studies in the general adult population. Overall, periods longer than 60 minutes made up only 10–15% of all NWT periods reported, whereas around 85–90% of total NWT minutes were spent in periods <60 minutes. This indicates that a considerable amount of NWT minutes was missed by all algorithms tested. When using the 60-min algorithm, sensitivity and specificity to detect NWT based on 24 h-accelerometry were limited in comparison to diary information. The same was true when analyzing the overlap in NWT minutes between diary and accelerometry. Further, applying the 60-min algorithm resulted in high rates of false positive NWT detection based on accelerometry data. Algorithms between >90 and >180 minutes showed higher sensitivity, specificity, and larger overlap between NWT based on diary and accelerometry. However, even with these algorithms we still found considerable differences in overlap between diary and accelerometry derived NWT. Thus, our data suggest that the 60-min algorithm is less suitable for NWT detection in 24 h-accelerometry. Among the different algorithms assessed here, the 120-min algorithm seems to be a compromise between high sensitivity, specificity, large overlap, and the detection of as much NWT as possible. This holds true for continuous 24 h-accelerometry data as well as for waking and sleeping phases only.

Correct and precise detection of NWT periods is important for unbiased accelerometry based estimation of habitual physical activity. When NWT periods are unknowingly included in activity estimates, they lead to an overestimation of the time spent in the lowest physical activity intensity23. Conversely, when true wear periods without movements are classified as NWT, the subsequent exclusion of this time leads to an underestimation of sedentary behavior and overestimation of relative time in physical activity; this effect was shown in a study on sedentary behavior in children24.

The most common approach to identify NWT is to define >60 consecutive minutes of zero-counts as an objective criterion; this approach was originally developed for accelerometry conducted during waking hours only9, 15. However, 24 h-accelerometry is increasingly applied, particularly in large-scale epidemiological studies, since it provides a comprehensive picture of activity habits encompassing also periods in between being awake and asleep and further reduces the likelihood of forgetting to put on the accelerometer after sleeping. Thus, we assessed algorithms to detect NWT periods of >60 to >180 minutes over the total time of assessment as well as separately for waking and sleeping phases. In general, as one may have expected, we detected fewer NWT periods in diary and accelerometry data the longer the NWT definition was.

Waking phases. Although the common 60-min algorithm was developed for waking hours, our data showed that only a small part of the total NWT periods during waking phases reported in diaries is actually longer than 60 minutes, indicating that a substantial proportion of NWT will be missed with this algorithm. Further, sensitivity and especially specificity of the 60-min algorithm was low. In addition, the overlap in NWT minutes identified by both, diary and accelerometry was lowest for the 60-min algorithm and the rate of false positively detected NWT highest comparing accelerometry and diary (Table 3). We found higher sensitivity, specificity, and larger overlap for the 90- and 120-min algorithm. Several previous studies indicate that NWT algorithms longer than 60 minutes improve NWT detection during waking hours. Peeters et al. recommended a 90-min algorithm that has already been applied in few field studies7, 25, 26. Choi et al. showed lower rates of NWT misclassification for a 90-min than for a 60-min algorithm27, 28. Hutto et al. found 60- and 90-min algorithms to substantially overestimate NWT and underestimate sedentary time and suggested a 120-min NWT algorithm29. Finally, Oliver et al. concluded that a 180-min algorithm is most suitable for NWT detection during waking hours; however this study was focused on a sedentary population30. Nevertheless, even when we applied the ‘best’ algorithms according to our analyses, only about 20% of all NWT reported regardless of their length were detected using accelerometry (Table 4).

Sleeping phases. In contrast to waking phases, periods with no detectable movement over 60 minutes or longer are likely to occur during sleeping phases. However, studies investigating NWT during sleeping phases are scarce. In our study, a minority of all NWT reported in the diaries occurred during sleeping phases, and the periods that did occur were mainly longer than the thresholds of all algorithms applied. Thus, sensitivity was generally high. However, specificity of the 60-min algorithm was substantially lower compared to the longer NWT algorithms, indicating that wearing time during sleep is often falsely assigned as NWT; indeed the proportion of NWT minutes false positively detected by accelerometry only was remarkably higher for the 60-min than for all longer NWT algorithms. At the same time, overlap of NWT minutes was extremely low for the 60-min algorithm and, even with the 180-min algorithm, reached only 50% and 26% in ActivE and KORA, respectively. Specificity during sleeping phases was slightly higher in KORA than in ActivE. This might be explained by the fact that KORA participants wore the accelerometer on the wrist during sleeping phases; moving the arm/hand during sleeping is more likely and thus the probability of false negative NWT detection lower for wrist- than for waist-worn devices27. Taking our results into account and considering that very few NWT periods occur during sleeping, the algorithms tested have to be interpreted with caution, as they might introduce a bias that is larger than that of simply assuming no NWT during sleeping.

Total time of assessment. In our analysis, the 60-min algorithm showed moderate sensitivity and extremely low specificity to detect NWT periods when analyzing 24 h-accelerometry data. The high degree of false positively detected NWT during sleeping might substantially contribute to the extremely low specificity of NWT detection in 24 h-accelerometry data; however, surprisingly, even during waking phases, the 60-min algorithm failed to reliably detect NWT. Further, we observed the overlap of the length of NWT detected by both, diary and accelerometry to be lowest and the proportion of NWT detected by accelerometry only to be highest when applying the 60-min NWT algorithm over the total time of assessment, indicating only little agreement between accelerometry and diary based NWT. Again, we found considerable limitations of this algorithm during waking and especially sleeping, resulting in the poor overlap found for the total time of assessment. Consequently, our data indicate that the 60-min algorithm is less suitable for NWT detection in continuous 24 h-accelerometry data. Falsely classifying longer periods of true wear time without activity as NWT might lead to a substantial underestimation of sedentary time (including sleeping) and overestimation of relative time in activity. Among those tested, the algorithms that had high sensitivity, specificity, and degree of overlap in NWT minutes over the total time of assessment were the 180-min and 120-min algorithms in ActivE and KORA, respectively; however, a large proportion of reported NWT was still not detected when applying these algorithms. Accordingly, the detection of shorter periods of NWT is limited.

In our studies, short NWT periods (i.e., <60 minutes) accounted for the majority of all NWT periods, which is comparable to the study by Oliver et al. assessed during waking hours30. Further, our data show that even with the ‘best’ of the tested algorithms, most of these NWT minutes were not detected in 24 h-accelerometry as compared to diary data. In case the accelerometer was removed and physical activity conducted during these periods of NWT, their inclusion would result in an underestimation of physical activity. However, the main reasons for NWT reported in the diaries of the ActivE study were showering and changing clothes, and, as such, not related to substantial physical activity. We therefore speculate that inclusion of these shorter periods of NWT is likely to have only little effect on physical activity estimates.

When assessing sensitivity and specificity of the NWT algorithms, we considered the diaries to be the ‘gold standard’ reference. This seems reasonable since we expected participants to report the occurrence of NWT quite accurately. However, we speculate that the reported information on the length of NWT might be less precise than information derived from accelerometers, e.g., participants may tend to round or estimate time points. Thus, there may be the dilemma that while the diary may be the gold standard to detect the presence or absence of true NWT, accelerometers may be more valid to estimate the exact time of its beginning, ending, and length.

A strength of our study was that we analyzed two independent studies on the general adult population from different regions in Germany covering a broad spectrum of participants’ characteristics including highly motivated, voluntary, as well as population-based participants. In contrast to most other studies, we had 24 h-accelerometry as well as diary data recorded over 1–2 weeks, allowing a differentiated analysis for total time of assessment and for waking and sleeping phases. Nevertheless, our study has some limitations. First, it is limited to adults. Although we had two independent studies that should cover the range of characteristics seen in the general population, specific phenotypes, e.g. extreme obesity, advanced cardiovascular or respiratory diseases, and narrower, younger, or older ages might show other frequencies and durations of (non-)moving phases as well as different compliance with wearing the accelerometer. Further, participants of the ActivE study constantly wore the accelerometer on the right hip, while KORA participants changed the devices from the hip to the wrist for sleeping; however, absolute values and relative differences between algorithms were similar between both studies. Thus, this methodological aspect in the comparative analysis might not be substantial31. Additionally, wear time periods were defined as absence of any NWT; thus the specificity for the total time of assessment cannot directly be compared to the specificity of waking or sleeping phases, and the absolute values should be interpreted cautiously. Further, we assumed the participants’ diary data as ‘gold standard’ and any misreporting of our study participants may therefore dilute our findings. Finally, a general limitation of any NWT algorithm >60 minutes is that shorter NWT periods are not detectable at all, although they made up a substantial part of NWT. Thus, further studies on alternative approaches to detect NWT <60 minutes are warranted to enable a reliable and comprehensive NWT detection.

In conclusion, our data indicate that the 60-min algorithm is not suitable to detect NWT in 24 h-accelerometry data in epidemiological studies. This was due to a high rate of false positively detected NWT by accelerometry. NWT detection during sleeping should be interpreted with caution given the weak performance of all assessed algorithms and the low number of NWT periods during sleeping. All algorithms assessed may miss a substantial proportion of short NWT, which made up the vast majority of all NWT reported. Even the most sensitive and specific algorithms still false positively identified a large proportion of NWT based on accelerometry. Although each has limitations, among the different algorithms assessed here, applying the 120-min algorithm seems to be a compromise between accuracy and detection of as much NWT as possible in 24 h-accelerometry data.