Introduction

Sleep disordered breathing (SDB) is a highly prevalent secondary complication following spinal cord injury (SCI). As many as 91% of people with complete tetraplegia have SDB, with or without clinical signs [1, 2]. Tetraplegia substantially reduces health, but SDB confers additional, statistically and clinically significant impairments to health status [3] and neuropsychological function [4, 5]. As such, appropriate diagnosis and treatment of SDB following SCI is of high importance.

Sleep disordered breathing diagnosis and severity are typically quantified by the apnoea hypopnoea index (AHI), a measure of the number of apnoeas (cessations of breathing) and hypopnoeas (reductions in breathing) per hour of sleep. Although an elevated prevalence of SDB in people with SCI has been widely reported in the literature, differing patient characteristics, study methodology and, importantly, changes over time in the rules defining respiratory events (apnoeas and hypopnoeas) are important challenges in pooling [6] and ideally meta-analysing the population prevalence data. In 1968 Rechtschaffen and Kales (R&K) produced the first standardised methodology for analysing sleep from polysomnography (PSG) [7], and in 1999 the American Academy of Sleep Medicine (AASM) produced a consensus set of rules for defining respiratory events during sleep (AASM1999) and recommendations for SDB diagnosis (AHI ≥ 5) and severity thresholds (mild SDB AHI ≥ 5–15; moderate SDB AHI > 15–30; severe SDB AHI > 30) [8]. In 2007 the AASM released an updated, integrated manual for both sleep staging and respiratory scoring (AASM2007) [9], with a further revision in 2012 (AASM2012) [10]. Each rule set revision has changed respiratory event scoring criteria, particularly for hypopnoeas [10], with AASM2007 having two hypopnoea definitions, recommended (REC) and alternative (ALT). The change from AASM1999 to AASM2007 was essentially one of stricter criteria, resulting in lower AHIs for any given severity of SDB, while the change from AASM2007 to AASM2012 was more inclusive and resulted in higher AHIs, although not as high as AASM1999 [11]. These changes have necessitated altered clinical interpretation of AHI values, yet the diagnostic and severity thresholds have not been specifically updated to reflect the changed respiratory event criteria.
As Chiodo et al. observed [6], heterogeneity in testing technologies and the aforementioned rule-set changes have limited pooling of previous studies in SCI. We speculate that rule-set changes have undermined our ability to define the burden of SDB in SCI precisely, and have thereby contributed to a failure to adequately diagnose and treat the disease. This paper aims to address this issue by quantifying the impact of rule-set changes on the AHI, and thus providing a framework for interpretation over time.
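The AHI definition and the AASM1999 diagnostic and severity bands described above can be illustrated with a minimal sketch. The function names are hypothetical, and the handling of the exact boundaries (an AHI of exactly 15 or 30) follows the bands as quoted above (mild ≥ 5–15; moderate > 15–30; severe > 30); this is an illustration of the arithmetic, not scoring software.

```python
def apnoea_hypopnoea_index(n_apnoeas, n_hypopnoeas, total_sleep_hours):
    """Apnoeas plus hypopnoeas per hour of sleep."""
    if total_sleep_hours <= 0:
        raise ValueError("total_sleep_hours must be positive")
    return (n_apnoeas + n_hypopnoeas) / total_sleep_hours


def severity_aasm1999(ahi):
    """Severity band using the AASM1999 thresholds of 5, 15 and 30."""
    if ahi < 5:
        return "none"      # below the diagnostic threshold
    if ahi <= 15:
        return "mild"      # AHI >= 5 up to 15
    if ahi <= 30:
        return "moderate"  # AHI > 15 up to 30
    return "severe"        # AHI > 30


# Hypothetical night: 30 apnoeas and 90 hypopnoeas over 6 hours of sleep.
ahi = apnoea_hypopnoea_index(30, 90, 6.0)
print(ahi, severity_aasm1999(ahi))
```

Because the count of hypopnoeas depends on the rule set used to score them, the same recording can yield a different AHI, and therefore a different band, under each rule set.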

Studies conducted with full PSG that directly compared AASM rule sets have analysed change in AHI in sleep laboratory samples [11,12,13], lean (body mass index (BMI) in the normal range) obstructive sleep apnoea (OSA) patients [14], chronic SCI patients [2], and community samples [15, 16]. Other research has also compared AHIs using respiratory criteria which did not directly align with any AASM rule set [17, 18], with only respiration measured (not sleep) [17, 19], and within one rule-set only (the two AASM2007 criteria) [20]. The Australasian Sleep Technologist and Australasian Sleep Associations recommend [21] using the ‘alternative’ AASM2007 hypopnoea definition (AASM2007-ALT rather than AASM2007-REC) and as such, the current study aimed to generate new AHI cut-points for diagnosis and severity of SDB in SCI using three AASM rule-sets commonly reported in, and recommended for use by, the literature. Of note, cut-points for SDB severity (mild, moderate and severe) were defined in the 1999 AASM document [8] and as such comparisons of severity should be made relative to this rule-set. Specifically, the AASM1999 AHI (AHI1999) was compared with the AASM2007 ‘alternative’ AHI (AHI2007-ALT), and the AHI1999 with the AASM2012 AHI (AHI2012), in two research samples that sought to diagnose SDB in people with tetraplegia. Further, the current study provides a framework for consideration of the changes in respiratory event scoring, and the effect of rule-set modifications on comparisons between SCI and able-bodied samples.

Methods

Two previously collected datasets from two separate studies in people with tetraplegia were re-analysed with a different rule set from that used in the original analyses. The first study was a randomised controlled trial of continuous positive airway pressure for people with acute tetraplegia (the inclusion criterion was injury duration < 1 year) [22]. The second study was a population cohort study [1] conducted to develop and validate a sleep apnoea screening model for people with chronic tetraplegia (the inclusion criterion was injury duration > 1 year) [23]. Both projects employed portable, unattended, full PSG with a Type 2 ambulatory device (Somte; Compumedics, Victoria, Australia) performed in participants’ acute or rehabilitation hospital beds or homes. The signals collected with the Type 2 (home) device are of the same type and employ the same technologies, filter settings, etc. as in-laboratory studies. Both research projects received ethics approval from the Austin Health Human Research Ethics Committee.

Analysis 1 examined a convenience sample of 24 PSGs, drawn from 88 PSGs from the first study, and compared the AHI1999 and AHI2007-ALT. This sample of 24 PSGs was split into terciles according to AHI with eight PSGs randomly selected from each tercile. Analysis 2 examined 78 PSGs from the second study in which the AHI1999 was compared to AHI2012. For both analyses montages and filters were set as per each scoring manual, and apnoeas and hypopnoeas were scored from nasal flow and respiratory inductance plethysmography. Sleep and respiratory events were manually staged and scored according to each rule set, and summary indices generated with Profusion 3 software (Compumedics, Victoria, Australia). The AHIs were compared using paired t-tests, the rate of misclassification was tabulated, and equivalent AHI severity cut-offs (for AASM1999 cut-points of 5, 15 and 30) were calculated using receiver operating characteristic (ROC) curves and accompanying sensitivity and specificity values. The results of Analysis 1 and 2 are presented alongside data from similar ‘rule-set comparison’ studies conducted in the able-bodied [11, 12]. Previously published data are presented for Ruehland, Rochford [12], and raw de-identified data were obtained from Duce, Milosavljevic [11] to calculate AHI severity cut-offs via ROC curves, sensitivity and specificity.
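The ROC-based derivation of an equivalent cut-off can be sketched as follows. The text does not state how the operating point on each ROC curve was chosen, so this sketch assumes the cut-off that maximises Youden's J (sensitivity + specificity − 1); the function name and the paired AHI values are hypothetical and are not study data.

```python
def equivalent_cut_point(ahi_reference, ahi_new, reference_threshold):
    """Find the new-rule-set AHI cut-off that best reproduces a reference threshold.

    A PSG is a reference positive when its reference AHI (e.g. AHI1999) is at
    or above `reference_threshold`. Each observed new-rule-set AHI is tried as
    a candidate cut-off; the one maximising Youden's J is returned with its
    sensitivity and specificity. Choosing by Youden's J is an assumption here.
    """
    pairs = list(zip(ahi_reference, ahi_new))
    positives = [new for ref, new in pairs if ref >= reference_threshold]
    negatives = [new for ref, new in pairs if ref < reference_threshold]
    if not positives or not negatives:
        raise ValueError("need PSGs on both sides of the reference threshold")

    best = None
    for cut in sorted(set(ahi_new)):
        sensitivity = sum(v >= cut for v in positives) / len(positives)
        specificity = sum(v < cut for v in negatives) / len(negatives)
        youden_j = sensitivity + specificity - 1
        # Strict '>' keeps the lowest cut-off when several tie on J.
        if best is None or youden_j > best[0]:
            best = (youden_j, cut, sensitivity, specificity)
    return best[1], best[2], best[3]


# Made-up paired AHIs for six PSGs, for illustration only.
ahi_1999 = [2.0, 4.0, 6.0, 8.0, 20.0, 40.0]
ahi_new = [1.0, 2.0, 3.5, 5.0, 14.0, 30.0]
print(equivalent_cut_point(ahi_1999, ahi_new, reference_threshold=5))
```

Repeating the call with reference thresholds of 15 and 30 would give the equivalent cut-offs for the moderate and severe bands.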

Analysis 1

The Australasian Sleep Technologist and Australasian Sleep Associations commentary recommendations [21] on the AASM2007 criteria were applied, and the ‘alternative’ hypopnoea definition used (AASM2007-ALT rather than AASM2007-REC). Prior to any analyses, three experienced (>10 years) sleep scientists undertook a quality control and technique alignment exercise to optimise scoring consistency.

Each PSG was de-identified, previous analyses were deleted, and all PSGs were analysed with one rule-set at a time (a 'block' of PSGs). The order of PSG blocks was randomly allocated for each scientist. The order of the PSGs within each block was also randomised each time. The scientist technique alignment exercise was repeated after the first block of PSGs. Each PSG was thus analysed twice by each scientist, once each with the AASM1999 and the AASM2007-ALT respiratory criteria. The AASM1999 respiratory scoring criteria were applied in conjunction with R&K sleep staging and American Sleep Disorders Association arousal scoring (defined as an abrupt shift in electroencephalography frequency, with frequency greater than 16 Hz but not spindle activity, at least three seconds in duration, and in REM sleep with concurrent electromyography increase) [24], while the AASM2007-ALT respiratory scoring was applied in conjunction with the accompanying AASM2007 rules for sleep staging and arousal scoring.

Change in AHI between the two analysis techniques was the primary outcome of interest. Event-by-event respiratory concordance, change in sleep staging, arousal, and respiratory event related arousal, are presented with further detail online. Inter-rater reliability for AHI was analysed with two-way, random, absolute agreement, intraclass correlation coefficients (ICC) for each rule-set. Data were averaged across the three scorers prior to analyses, to provide mean values for each separate PSG for each rule-set.
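For readers unfamiliar with the two-way, random, absolute agreement ICC named above, a minimal sketch of the calculation from ANOVA mean squares is given here. It computes the single-measure form; whether the reported values are single or average measures is not stated in the text, so that choice, like the function name, is an assumption.

```python
def icc_two_way_random_absolute(ratings):
    """Single-measure ICC, two-way random effects, absolute agreement.

    `ratings` holds one row per PSG and one column per scorer, for example
    the AHI assigned to each PSG by each of three scientists.
    """
    n = len(ratings)       # number of PSGs
    k = len(ratings[0])    # number of scorers
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with >= 2 rows and columns")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into PSG, scorer and residual parts.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Unlike a consistency ICC, the absolute agreement form penalises systematic offsets between scorers (the scorer mean square in the denominator), which is the relevant property when the AHI itself is compared against fixed thresholds.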

Analysis 2

The inter-rater reliability of the scorers in Analysis 1 was excellent (see Results). As such, a single experienced sleep scientist (above) reviewed each respiratory event within the PSGs previously marked according to the AASM1999 criteria to establish whether it met the AASM2012 respiratory event criteria as previously described [12].

Results

Analysis 1: comparison of AASM1999–AASM2007-ALT

Participant details are included in Table 1 and event-by-event comparisons between rule-sets are detailed online. Briefly, event-by-event respiratory analyses showed substantial changes to the hypopnoeas scored between rule-sets. Seventy percent of hypopnoeas scored with AASM1999 were scored differently using the AASM2007-ALT criteria (Online Tables 1 and 2). Respiratory event related arousals and respiratory disturbance indices are presented online (Online Table 3 and Fig. 1). The AASM2007 rule-set produced significantly less Stage 2 non-Rapid Eye Movement sleep (N2) and more Rapid Eye Movement (REM or Stage R) sleep than R&K. Twenty percent of R&K Stage 2 sleep was classified into alternative stages using AASM2007, most commonly Stage 1 (N1) and REM (Stage R). Nineteen percent of AASM2007 Stage R was classified differently using R&K, most frequently as Stage 1 and 2 (Online Tables 4 and 5). Overall, event-by-event analyses demonstrated good reliability between scorers and, where differences were observed, they were predominantly in the marking of arousals.

Table 1 Participant demographic and injury information
Fig. 1 Illustrated change in American Academy of Sleep Medicine (AASM) respiratory event scoring definitions, and the effects on the average apnoea hypopnoea index (AHI) across various samples. Figure note: Changes in the apnoea hypopnoea index are calculated from overall group mean or median AHI (as provided) for each study and rule set [2, 11–16]

Excellent AHI reliability was observed between scientists (AASM1999 ICC = 0.995, 95%CI = 0.991–0.998; AASM2007-ALT ICC = 0.997, 95%CI = 0.993–0.997). The AHI2007-ALT was significantly lower than the AHI1999 across all SDB severities (Table 2). The mean AHI2007-ALT was 78% of the mean AHI1999, and the proportional effect of the reduction with AHI2007-ALT was largest in those with milder disease (Table 2).

Table 2 Change in apnoea hypopnoea index (AHI) and significant differences between the 1999 American Academy of Sleep Medicine (AASM) AHI (AHI1999) and the 2007 AASM ‘alternative’ AHI (AHI2007-ALT) in people with acute tetraplegia. Data are presented as mean (SD) and ranges (minimum to maximum) for each sleep apnoea severity group, and overall

The change from AASM1999 to the AASM2007-ALT resulted in 17% of the participants dropping below the SDB diagnostic threshold (AHI of 5), and a quarter of the participants moved to a less severe SDB category (Table 3). Table 4 provides equivalent cut-points using AASM2007-ALT criteria for AASM1999 AHI values of 5, 15 and 30, for current and past [11, 12] study findings.

Table 3 Sleep apnoea severity classification change from 1999 American Academy of Sleep Medicine (AASM1999) to 2007 AASM ‘alternative’ criteria (AASM2007-ALT) in people with acute tetraplegia
Table 4 2007 American Academy of Sleep Medicine (AASM) ‘alternative’ (AASM2007-ALT) criteria, equivalent apnoea hypopnoea index (AHI) cut-points for AASM1999 sleep apnoea severity classifications, across current and previous study findings

Analysis 2: comparison of AASM1999–AASM2012

The calculated AHI2012 values were on average 83% of the AHI1999 values and significantly lower across all SDB severities (Table 5). Approximately half of the participants previously classified as having mild (AHI 5 to < 15) or moderate (AHI 15 to < 30) SDB were reclassified into a lower severity using the AASM2012 criteria (Table 6); 48% of those previously classified as ‘mild SDB’ would no longer meet criteria for having SDB based on the AHI alone. Equivalent AHI cut-points of AASM1999 severity criteria were calculated at 3 (equivalent to mild), 10 (moderate) and 21 (severe) for the AASM2012 criteria (Table 7).

Table 5 Change in apnoea hypopnoea index (AHI) and significant differences between the 1999 American Academy of Sleep Medicine (AASM) AHI (AHI1999) and the 2012 AASM AHI (AHI2012) in people with chronic tetraplegia. Data are presented as mean (SD) and ranges (minimum to maximum) for each sleep apnoea severity group, and overall
Table 6 Change in sleep apnoea severity classification from 1999 American Academy of Sleep Medicine (AASM) apnoea hypopnoea index (AHI) (AHI1999) to 2012 AASM AHI (AHI2012) in people with chronic tetraplegia
Table 7 2012 American Academy of Sleep Medicine (AASM) criteria (AASM2012) equivalent apnoea hypopnoea index (AHI) cut-points for AASM1999 sleep apnoea severity classifications, for current and previous study findings

Discussion

These data demonstrate the effect of different sleep staging and respiratory scoring rules on measures of SDB severity and diagnosis. Sleep disordered breathing is highly prevalent and frequently severe in tetraplegia, but if sleep scoring rules change the indices, with no change in underlying disease, it is difficult for patients, clinicians and policy makers to interpret the results and decide how to act on them. A similar process of examining comparability of measures of injury completeness was undertaken with the American Spinal Injury Association classification standards in 2002 [25]. The current study demonstrates the magnitude of the effect of change in respiratory criteria over time and across multiple rule sets for people with and without SCI. Additionally, we provide diagnostic and severity thresholds for people with SCI, which could guide clinical practice. These 'threshold for disease' changes over time are particularly important in SCI for patients with milder disease, where clinically important changes in diagnosis and treatment of disease might occur.

Inconsistencies have arisen in the literature around the prevalence of SDB in people with SCI. For example, the prevalence of SDB in people with acute tetraplegia has been reported at 83% when the AASM1999 criteria are employed and at 53% with the AASM2007 criteria [26]. This difference in prevalence would in large part be attributable to the different criteria used; however, this is often not adequately considered in the discussion of such findings. Similarly, longitudinal change in SDB severity for an individual is difficult to interpret from an AHI that may have been scored differently over time.

The AASM1999 criteria were the only criteria to provide cut-off recommendations for diagnosis and severity thresholds of SDB [8]. With changes to criteria, the AASM has advised that thresholds should be adjusted depending on the respiratory event definition being used; however, it has not provided revised cut-off estimates [10]. People with SCI are a unique population group in their SDB pathophysiology, incidence and severity [27,28,29]. Figure 1 illustrates respiratory event definition changes over time and the resultant impact on AHI in people with SCI [2], in sleep laboratory samples [11,12,13], lean OSA [14], and the general able-bodied population [15, 16]. Overall, the change from AASM1999 to AASM2007 resulted in the largest changes in AHI, while the introduction of the AASM2012 criteria has returned the AHI closer to the original AHI1999. The AASM2012 hypopnoea definition is intermediate between the AASM2007-ALT and the previous AASM1999 criteria, and importantly includes both arousal and desaturation, which each have detrimental effects and individually impact treatment outcome [14]. Each AASM revision simply calls the apnoea hypopnoea index the “AHI”, with no designation as to which rule-set was applied. It is likely that future AASM revisions will alter the landscape again and, as such, cut-off revisions may be needed again.

Sankari et al. [2] investigated the change in AHI from AASM2007-REC to AASM2012 rule-sets in 26 participants with thoracic and cervical SCI. The people with SCI investigated by Sankari et al. [2] had more severe SDB overall than other able-bodied samples from sleep laboratory populations, and the overall percentage change in AHI was less (Fig. 1). The current study examined the PSGs of 102 people with acute and chronic tetraplegia. Similarly, with more severe SDB, the change from AASM1999 to AASM2007-ALT in people with acute SCI was less than that of sleep laboratory samples, and aligned more closely with the lean OSA patients who were of similar BMI [14].

Reclassification of SDB according to new respiratory criteria (AASM2012 and AASM2007-ALT), with use of the only pre-existing thresholds (set out by the AASM1999 rule-set) for diagnosis and severity [8], resulted in a non-diagnosis of SDB for 48–67% of people previously classified as having mild SDB according to the AASM1999 respiratory criteria, highlighting the need for adjusted thresholds. The variability in degree of change between populations highlighted in Fig. 1 also demonstrates the utility of population-specific thresholds. For people with SCI, pre-existing AHI thresholds of 5, 15 and 30 are revised to 2.4, 8.1 and 16.3 for AASM2007-ALT and 3.2, 10.0 and 21.2 for AASM2012. These are important not only for appropriate clinical practice, but also for interpretation of research and prevalence data published over time.
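To show how the revised thresholds would be applied in practice, the sketch below classifies an AHI using the cut-points for people with SCI quoted above, keyed by the rule set used for scoring. The dictionary, function name and boundary handling (an AHI exactly on a cut-point) are illustrative assumptions rather than part of the study.

```python
# Cut-points (mild, moderate, severe) for people with SCI, as quoted in the text.
CUT_POINTS = {
    "AASM1999": (5.0, 15.0, 30.0),
    "AASM2007-ALT": (2.4, 8.1, 16.3),
    "AASM2012": (3.2, 10.0, 21.2),
}


def classify_sdb(ahi, rule_set):
    """Return the SDB severity band for an AHI scored under `rule_set`."""
    mild, moderate, severe = CUT_POINTS[rule_set]
    if ahi < mild:
        return "none"
    if ahi <= moderate:   # boundary handling is an assumption
        return "mild"
    if ahi <= severe:
        return "moderate"
    return "severe"


# An AHI of 4.0 scored with AASM2012 falls below the original threshold of 5
# but above the revised AASM2012 cut-point of 3.2.
print(classify_sdb(4.0, "AASM1999"), classify_sdb(4.0, "AASM2012"))
```

Keying the thresholds by rule set makes the scoring criteria an explicit input, so an AHI cannot be interpreted without stating how it was scored.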

A limitation of the current study was that PSGs were scored using nasal pressure only, and not with the recommended standard for apnoea detection, both nasal pressure and thermistor. However, Thornton et al. [30] found that AHI from nasal pressure alone was 3% higher than with nasal pressure and thermistor, and concluded that analysis without a thermistor signal would have only a modest impact. Reliability between scorers in the current study was consistent with other research of similar methodology investigating inter-rater reliability in sleep analyses [31, 32], and both current and previous research have demonstrated that arousal scoring is less reliable than sleep staging and respiratory scoring [32, 33]. The AASM2007-ALT and AASM2012 have put greater importance on the arousal compared to AASM1999, which did not require an arousal (or desaturation) to mark a hypopnoea if the decrease in breathing was greater than 50%. The marking of arousals merits significant attention in training and concordance measurement for sleep staging and respiratory scoring. Differences in PSG recording sensors and technologies also contribute to differences in severity estimates in the literature. For example, a meta-analysis has shown that home sleep studies can underestimate sleep apnoea severity [34]; however, this meta-analysis included many ‘home sleep studies’ that were not Type 2 ‘full PSG’ (and therefore did not measure the same signals as Type 1 in-laboratory PSG) and only estimated sleep and respiration rather than measuring them. In the current study this confounder was avoided by standard use of a Type 2 PSG device (measuring the same signals as an in-laboratory PSG) and as such our results reflect comparisons in AASM rules only. However, we cannot discount the possibility of differential results due to the ambulatory nature of the sleep studies and so this remains a possible limitation of the current study.

Understanding change in AHI is important with every change of rules, including in clinically distinct populations such as SCI. Regardless of the underlying factors contributing to differences between the population groups investigated to date, the variable degree of change in AHI across population groups highlights the need for population-specific prevalence data to inform index interpretation. Clinicians treating patients with SCI must ensure current and accurate interpretation of AHI values and the shifting thresholds for diagnosis and severity. Additionally, interpreting research and prevalence data collected over different time periods, with different analysis (and data collection) rule-sets, requires knowledge of the relationship between each rule-set over time. At an individual patient care level, understanding that the AHI may have changed in a person over time because the rules changed, rather than because the person changed, is important. It is recommended that future updates to respiratory event definitions provide distinct nomenclature for each iteration of the ‘AHI’ and be accompanied by research which quantifies change in AHI from the outset, providing immediate knowledge of required adjustment of interpretation. There may also be a role for the development of SCI-specific guidance documentation.

Data archiving

Deidentified group data, as per informed consent, are all made available within the paper and supplementary material.