Abstract
There are numerous reports of cardiorespiratory patterns in infants on home monitors, but no data to determine whether "experts" agree on the description of these patterns. Therefore, we evaluated agreement among four experienced investigators and five trained technicians who independently assessed the same sample of physiologic waveforms recorded from infants enrolled in a multicenter study. The monitor used respiratory inductance plethysmography and recorded waveforms for apnea ≥16 s or a heart rate <80 beats/min for ≥5 s. The investigators and technicians initially assessed 88 waveforms. After additional training, the technicians assessed 113 additional waveforms. In categorizing waveforms as apnea present or absent, agreement among technicians improved considerably with additional training (κ 0.65 to 0.85). For categorizing waveforms as having bradycardia present versus absent, the trends were the same. Agreement in measurement of apnea duration also improved considerably with additional training (intraclass correlation 0.33-0.83). Agreement in measurement of bradycardia duration was consistently excellent (intraclass correlation 0.86-0.99). With additional training, the technicians achieved total agreement in measurement of the lowest heart rate during a bradycardia. When classifying apnea as including ≥1, ≥2, ≥3, or ≥4 out-of-phase breaths, agreement was initially low, but after additional training it improved, especially in categorization of apneas with ≥3 or ≥4 out-of-phase breaths (κ 0.67 and 0.94, respectively). Although researchers and clinicians commonly describe events based on cardiorespiratory recordings, agreement among experienced individuals may be poor, which can confound interpretation. With clear guidelines and sufficient training, raters can attain a high level of agreement in describing cardiorespiratory events.
Main
For more than 20 years, a variety of monitors has been used at home to warn caregivers of potentially life-threatening events among infants at perceived risk for sudden death. More recently, monitors have been developed that record waveforms of cardiorespiratory data before, during, and after such events; these data potentially permit identification of factors associated with serious morbidity and mortality. It is commonplace for researchers and clinicians to assess infants based on a review of these waveforms, and there have been a number of reports describing cardiorespiratory patterns among infants placed on home monitors (1–6). However, none of these reports provides a sufficient description of the methods of assessment of waveforms to assure reproducibility of results, nor do any describe efforts to assure acceptable agreement between raters. To interpret these data and have a common basis for comparisons between studies, it is essential to establish reliable methods to describe and assess cardiorespiratory variables based on waveforms recorded from home monitors. Although several studies have reported a high level of agreement between observers when summarizing the total amount of apnea during a continuous overnight recording (7–11), studies have not assessed agreement between observers for categorization of specific waveforms as having apnea or bradycardia present, nor determined whether observers agree on the specific characteristics of apnea or bradycardia events.
Early in the design and development of a multicenter study of the utility of home infant monitoring (CHIME) we found marked disparity in interpretation of such data among senior clinical investigators at the various study centers. Using 80 physiologic waveforms of beat-to-beat heart rate (derived from two lead ECGs) and respiration (thoracic impedance signal and inductance signal from thorax, abdomen, and sum of thorax and abdomen), seven senior investigators frequently disagreed when classifying the data as obstructive, central, or mixed apnea or no apnea (there was complete agreement only 4% of the time). From these preliminary data two issues became apparent: 1) each senior investigator used different criteria to define categories of apnea and 2) because of the uncertainty in identifying obstructed breathing, assignment to a type of apnea was often problematic.
Accordingly, the purposes of the present study, after the establishment of definitions, were: 1) to determine the level of agreement among observers in describing the features of cardiorespiratory waveforms recorded during home monitoring, 2) to identify the factors that interfere with agreement, and 3) to determine whether, with training focused on the sources of disagreement, CHIME technicians could achieve a high level of agreement for categorizing events (κ > 0.8) and for measuring event duration (ICC >0.8). Demonstration of this level of agreement in the assessment of home cardiorespiratory recordings is of paramount importance both for the conduct of our multicenter study and for the comparison of data among different studies.
Several factors can contribute to the lack of agreement among investigators. The cardiorespiratory data obtained with a home monitor, compared with those obtained in a laboratory, may be limited by fewer channels of physiologic data, lack of corroboration of an event by an observer, and signal artifact; e.g. the distinction between movement artifact and airflow obstruction may be difficult without an end-tidal CO2 tracing, an airflow channel, or a trained observer. Some events, e.g. a "normal" breath, are inherently difficult to define no matter how much information is available. Even if algorithms were developed to enhance consistency in detecting or judging cardiorespiratory events, a clear definition of the physiologic behavior would be needed to develop the rules.
Thus, to conduct our multicenter study we needed to discern the factors that interfere with agreement and develop strategies to reach an acceptable level of agreement. To attain these goals we developed stringent rules to describe and assess cardiorespiratory data, trained a group of scorers to use these rules, determined the reliability of their scoring, and, using an iterative approach, refined the rules to enhance interrater reliability. In this report we describe our efforts in achieving reliability of the description and assessment of cardiorespiratory events obtained with a home infant monitor. Additionally, we tested the concordance between events detected on the home monitor and those observed with a multichannel recorder.
METHODS
The reliability assessment included two groups of raters: four CHIME investigators (J.M.S., L.J.B., S.L.D.W., C.E.H.) with considerable experience (from 10 to 20 y each) scoring physiologic signals, and five technicians trained to perform the CHIME centralized scoring at the Data Coordinating and Analysis Center.
Our approach to the assessment of interrater reliability involved: development of an event classification protocol and training of personnel in the use of this protocol; initial testing of interrater reliability among four CHIME investigators and five trained scoring technicians; review of the sources of disagreement between raters; retraining of the technicians; and finally, subsequent reassessment of interrater reliability.
Overview of approach. First, from a large array of physiologic waveforms recorded at home that met thresholds (i.e. events detected by the monitor) for either apnea or bradycardia, we selected representative samples for use in assessment of interrater agreement.
Second, for a given tracing each rater assessed the following: apnea recordings, 1) the longest interbreath interval based on the sum channel, and 2) occurrence of out-of-phase efforts on the abdominal and rib cage channels; bradycardia recordings, 1) duration of heart rate below 80 beats/min, and 2) heart rate nadir.
Third, we used two types of statistical analyses: 1) κ coefficient for categorical variables (i.e. presence or absence of events: apnea, bradycardia, and out-of-phase breath) and 2) ICC coefficient for the magnitude of differences in measurement of continuous variables (i.e. event duration and heart rate nadir).
Fourth, we recognized that disagreement in measurement of the duration of events had the potential to exaggerate disagreements in categorizing events as present or absent (e.g. raters might only differ in measurement by 1 or 2 s but fall on opposite sides of the event threshold). To address this issue we adopted a strategy where raters measured all interbreath intervals ≥10 s and any duration of a heart rate <80 beats/min, even though the monitor thresholds for recording waveforms had been set at apnea ≥16 s or bradycardia ≥5 s. The result of this approach was that: 1) we were able to assess the impact on agreement of varying the threshold for categorization of apnea or bradycardia as present (i.e. to what extent does a tight cutoff exaggerate disagreement), and 2) we were able to evaluate quantitatively (with ICC) the reliability in measurement of duration using a wide span of data.
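The effect of this strategy can be illustrated with a minimal sketch; the rater names, durations, and thresholds below are invented for illustration and are not study data:

```python
# Hypothetical sketch: how a tight duration cutoff exaggerates categorical
# disagreement. Durations (s) for two raters are invented, not study data.
rater_a = [9.2, 15.5, 16.4, 20.1, 12.0]
rater_b = [9.8, 16.2, 15.8, 19.7, 11.5]

def categorize(durations, threshold):
    """Apnea present (True) when the measured duration meets the cutoff."""
    return [d >= threshold for d in durations]

# At the monitor's 16-s recording threshold, sub-second measurement
# differences flip events across the cutoff...
disagree_tight = sum(a != b for a, b in
                     zip(categorize(rater_a, 16), categorize(rater_b, 16)))
# ...while the liberal 10-s scoring threshold absorbs them.
disagree_liberal = sum(a != b for a, b in
                       zip(categorize(rater_a, 10), categorize(rater_b, 10)))
```

Here the two raters' measurements differ by at most 0.7 s per event, yet the tight cutoff produces categorical disagreement on two of five events while the liberal cutoff produces none.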
Physiologic recordings. Infants were enrolled in one of four groups: SIDS siblings (infants who had a prior sibling who died of SIDS as documented at autopsy); ALTE (infants who had experienced an idiopathic apparent life-threatening event that met specified severity criteria); preterm infants (born at <34 wk of gestation and <1750-g birth weight); and healthy term infants (born at term who had no problems requiring treatment during the newborn period). Written informed consent was obtained from the parent(s) or guardian(s) of all infants, and the study was approved by the institutional review board at each CHIME study site.
Each infant had cardiorespiratory waveforms recorded in the home using the CHIME study monitor (NonInvasive Monitoring Systems, Miami, FL). Ribcage and abdominal respiratory effort were recorded by respiratory inductance plethysmography bands and a third channel (proportional to tidal volume) was calculated based on their weighted algebraic sum (sum channel) (12,13). Heart rate was determined using an R-wave detection algorithm from the ECG taken using standard disposable infant ECG electrodes.
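The sum channel described above can be sketched as a weighted algebraic sum of the two inductance signals; the gains and signal values below are hypothetical placeholders, not the CHIME calibration:

```python
import numpy as np

# Hypothetical sketch of the sum channel: tidal volume is approximated by a
# weighted algebraic sum of the ribcage and abdominal inductance signals.
# The gains and signal values are illustrative, not the CHIME calibration.
rib = np.array([0.2, 0.5, 0.3, -0.1])   # ribcage excursion (arbitrary units)
abd = np.array([0.3, 0.4, 0.2, 0.0])    # abdominal excursion
k_rc, k_ab = 1.0, 1.0                    # calibration weights (assumed)

sum_channel = k_rc * rib + k_ab * abd    # proportional to tidal volume
```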
The monitor was set to record the physiologic waveforms whenever it detected apnea ≥16 s in duration or a heart rate that fell below 80 beats/min for ≥5 s. When the monitor identified an event as having met the recording criteria, physiologic waveforms were recorded for 75 s preceding onset of the event and for 30 s after resolution of the event.
Selection of waveforms for reliability studies from physiologic recordings. To compile a representative set of waveforms for evaluation of reliability, a member of the study team who did not participate in the reliability assessment screened a series of recordings. This screening verified the presence of apnea and/or bradycardia as well as false events. Two data sets were created such that each set contained waveforms from each of the four study groups and each subject contributed a similar number of waveforms. Ten infants (two ALTE, four preterm, two SIDS siblings, and two healthy term) contributed waveforms to data set no. 1, which included 62 waveforms recorded as having met the monitor's apnea threshold (i.e. potential apnea) and 26 waveforms recorded as having met the monitor's bradycardia threshold (i.e. potential bradycardia). In four of the above waveforms the potential apnea and bradycardia overlapped. In these cases the waveforms were assessed twice, once to assess the potential apnea and once to assess the potential bradycardia. Eight infants (two ALTE, three preterm, two SIDS siblings, and one healthy term) contributed to data set no. 2, which included 73 waveforms recorded as potential apnea and 40 recorded as potential bradycardia, none of which overlapped. Both the CHIME investigators and the technicians each assessed data set no. 1. After a period of additional training, the technicians each assessed only data set no. 2.
Evaluation of waveforms. A protocol was developed that described in detail the criteria for assessment of waveforms. Figure 1 provides a summary of these guidelines.
For each waveform recorded as potential apnea by the monitor, each rater initially determined whether the longest interbreath interval was ≥10 s and, if so, used the software tool to measure and store 1) duration of apnea and 2) number of out-of-phase deflections on the abdominal and rib cage channels within the apnea where there was minimal or no deflection on the sum channel. Waveforms with interbreath intervals of <10 s were noted as having apnea absent and were not assessed further. The purpose of recording the number of out-of-phase efforts is that they may have value in identifying some periods of obstructed breathing. We recognize that out-of-phase deflections are not equivalent to obstructed breathing and that ultimately we shall need additional data (e.g. airflow) to understand better the relationship between out-of-phase efforts and obstructed breathing. However, it is important here to assess whether we can reliably identify out-of-phase breathing.
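The interbreath-interval rule can be sketched as follows; the breath times are invented for illustration:

```python
# Hypothetical sketch of the interbreath-interval rule: given the times (s)
# of breaths detected on the sum channel, find the longest interbreath
# interval and apply the >=10-s scoring cutoff. Times are invented.
breath_times = [0.0, 1.8, 3.5, 5.1, 17.3, 19.0]

gaps = [b - a for a, b in zip(breath_times, breath_times[1:])]
longest_ibi = max(gaps)            # 12.2-s gap between breaths 4 and 5
apnea_present = longest_ibi >= 10.0
```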
For each waveform recorded as potential bradycardia (i.e. met monitor's threshold of <80 beats/min for ≥5 s) each rater initially determined if there was any period where the heart rate fell below 80 beats/min and, if so, recorded the duration of the bradycardia (i.e. the number of seconds heart rate <80 beats/min) and the heart rate nadir. Waveforms judged to have no period of time below the heart rate threshold were noted as having bradycardia absent and were not assessed further.
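The two bradycardia measures can likewise be sketched; the heart rate trace and sampling interval are invented for illustration:

```python
import numpy as np

# Hypothetical sketch of the bradycardia measures: seconds spent below the
# 80-beats/min threshold and the heart rate nadir, from a 1-Hz heart rate
# trace. The sample values are invented, not study data.
hr = np.array([120, 110, 95, 78, 72, 65, 62, 70, 79, 85, 100, 115])
sample_interval = 1.0                       # s between heart rate samples

below = hr < 80
duration = below.sum() * sample_interval    # duration of bradycardia (s)
nadir = int(hr[below].min()) if below.any() else None
```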
Assessment of waveforms was performed on a computer with a 21-inch video display using a software tool developed for this project. The physiologic montage included the following five channels: abdominal inductance, thoracic inductance, sum channel, ECG, and heart rate. The assessment procedure involved transfer of the contents of a single removable hard drive (i.e. all of the waveforms recorded from an individual subject during a 2-4-wk time interval) into the software tool on the base station computer. The software tool was used to display and assess each waveform in a consistent manner according to our protocol. After the rater assessed all of the waveforms recorded on a particular removable hard drive, a new removable hard drive, obtained from another subject, was transferred to the base station computer and evaluated.
Training. The four investigators received their training as part of the process of developing the criteria to be followed in the assessment of waveforms. This process involved approximately 30 h during which the investigators reviewed a large number of waveforms, reached a consensus regarding the assessment criteria for the CHIME study, and verified that they each understood and were able to apply these criteria. At the time of this formal assessment of reliability (approximately 2 mo later) each of the investigators was provided with a written summary of the criteria that they developed.
The five technicians were all newly hired for the centralized scoring for the CHIME study, located at the Data Coordinating and Analysis Center in Boston, MA. All of these individuals had some prior experience assessing physiologic signals, but not in the format used for the CHIME study and not in the laboratories of any of the CHIME investigators. Before the assessment of data set no. 1, the technicians received approximately 120 h of formal training and practice scoring according to the previously described criteria. This training was under the supervision of two individuals at the Data Coordinating and Analysis Center (M.J.C. and M.P.) who both participated in the development of the assessment criteria and had over 10 y of experience in the assessment of physiologic signals.
After the initial assessment of agreement (data set no. 1) the technicians received an additional training session. This consisted of approximately 40 h during which the technicians reviewed the specific reasons for disagreement and were instructed in strategies designed to improve the consistency in interpretation of the most difficult types of signals.
Data analysis. Agreement among raters was assessed with κ for categorical variables, including: apnea present or absent, bradycardia present or absent, and number of out-of-phase deflections exceeding or not exceeding a specific threshold; and with ICC for continuous variables, including: duration of apnea, duration of bradycardia, and heart rate nadir (14). κ and ICC have the following properties: +1 indicates perfect agreement, 0 indicates chance-level agreement, values >0 indicate better than chance agreement, and values <0 indicate poorer than chance agreement. There are published "arbitrary benchmarks" for describing the relative strength of agreement with κ: <0.00 poor, 0.00-0.20 slight, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, and 0.81-1.00 almost perfect (14).
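A minimal sketch of these statistics follows: Cohen's κ for two raters, a one-way random-effects ICC, and the Landis and Koch benchmarks. The ratings are invented, and since the ICC form used in the study is not specified here, ICC(1,1) is an assumption:

```python
import numpy as np

def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters' categorical calls (a minimal sketch)."""
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n          # observed agreement
    cats = set(r1) | set(r2)
    p_exp = sum(r1.count(c) * r2.count(c) for c in cats) / (n * n)  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

def icc_oneway(x):
    """One-way random-effects ICC(1,1); rows = events, columns = raters.
    The ICC form used in the study is not stated; ICC(1,1) is an assumption."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    msb = k * ((x.mean(axis=1) - x.mean()) ** 2).sum() / (n - 1)      # between-event MS
    msw = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))  # within-event MS
    return (msb - msw) / (msb + (k - 1) * msw)

def strength(kappa):
    """Landis and Koch benchmarks for kappa."""
    if kappa < 0:
        return "poor"
    for cut, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                       (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= cut:
            return label

# Invented present(1)/absent(0) calls for two raters on eight waveforms:
k = cohen_kappa([1, 1, 0, 0, 1, 0, 1, 1], [1, 0, 0, 0, 1, 0, 1, 1])
```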
Agreement was initially assessed for data set no. 1 followed by a review of the results in an attempt to identify sources of major disagreements among the raters. After additional training to reduce these sources of disagreement, data set no. 2 was scored by the five technicians and κ and ICC values were calculated. The goal of this second assessment was to determine whether it was possible to train individuals to improve agreement. For this purpose it was not necessary to reassess both groups of observers. There is ample reason to believe that if we achieved success with additional training of one group of individuals that similar training among other groups of individuals would likewise meet with success. We chose to perform this second assessment among the technicians because these were the individuals designated to perform the scoring in the data analysis phase of the entire CHIME study. We elected to use centralized scoring for this large multicenter study because we recognized that this would provide the best opportunity to achieve and sustain a high level of agreement among raters.
RESULTS
Categorizing waveforms as true or false events. Because each rater measured the duration of any apnea ≥10 s and any bradycardia ≥1 s (i.e. even when the rater's measurement was at a duration lower than the threshold set on the monitor), we could assess the impact on agreement of the use of more versus less stringent thresholds to categorize the presence of apnea or bradycardia. Figure 2A provides the agreement in categorizing apnea as present as the threshold varies from 10 (i.e. most liberal) to 16 s (i.e. most stringent). Two effects are demonstrated: 1) agreement improved as the threshold decreased below the monitor's recording threshold, and 2) with training, the technicians achieved a high level of agreement (κ = 0.83), even with use of the most stringent threshold.
Figure 2B provides the agreement in categorizing bradycardia as present as the threshold varies from 1 (i.e. most liberal) to 5 s (i.e. most stringent). For bradycardia events the trends are the same, although the impact of changing the cutoff value is less apparent than that for apnea. The agreement among technicians was always near perfect.
Agreement in the characteristics of apnea and bradycardia events. Figure 2C illustrates the agreement in identifying out-of-phase efforts during an apnea using four different thresholds (i.e. ≥1, ≥2, ≥3, or ≥4 out-of-phase efforts). Agreement improved with training and also improved as the threshold value increased. The technicians achieved our goal of κ > 0.8 after the final training only with the threshold of ≥4 out-of-phase efforts. Figure 3 provides an example of a waveform where there was considerable rater disagreement regarding the number of out-of-phase breaths. The additional training received by the technicians focused on standardizing the use of the software's zoom and gain adjustment features.
Figures 4 and 5 illustrate agreement in measurement of the duration of apnea and bradycardia with Bland-Altman plots (15). These plots show the duration measured for each event and the extent to which each rater's measurement differed from the median. For data set no. 1, at least two investigators noted an apnea duration of ≥10 s in 23 waveforms, and at least two technicians did so in 25 waveforms. For data set no. 2, at least two technicians noted an apnea duration of ≥10 s in 51 waveforms. The median apnea duration measurements were similar for the two data sets (10-28 s).
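The plotted quantity can be sketched as each rater's deviation from the per-event median; the event labels and durations below are invented for illustration:

```python
import statistics

# Hypothetical sketch of the plotted quantity: for each event, each rater's
# measured duration minus the median across raters (durations are invented).
event_durations = {                 # event id -> duration (s) per rater
    "e1": [16.0, 17.5, 16.5],
    "e2": [22.0, 21.0, 28.0],
}
deviations = {
    ev: [d - statistics.median(ds) for d in ds]
    for ev, ds in event_durations.items()
}
# Tight clustering of deviations around zero corresponds to a tight
# Bland-Altman fit; large outliers (e.g. 6 s) appear as scattered points.
```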
As described before, we assessed agreement in measuring apnea duration by calculating the ICC coefficient. We included in this analysis all apnea events for which two or more observers found a longest interbreath interval of at least 10 s. This approach permits assessment of agreement over the widest possible range of measurement. Agreement in measurement of duration was initially relatively low for both investigators and technicians (ICC 0.33 and 0.36, respectively). However, the technicians improved markedly after additional training (ICC = 0.85). One can appreciate the practical impact of this improvement by contrasting the wide dispersion on the Bland-Altman plots shown in Figure 4, A and B, with the relatively tight fit found in Figure 4C. Figure 6 provides an example of a waveform with rater disagreement in measurement of apnea duration. The additional training received by technicians focused on strategies to differentiate various types of artifact (especially due to movement and poor signal quality) from genuine breaths.
Figure 5, A-C, gives Bland-Altman plots for bradycardia duration. For data set no. 1, at least two investigators noted a bradycardia duration of ≥1 s in 10 waveforms, and at least two technicians did so in 11 waveforms. For data set no. 2, at least two technicians noted a bradycardia duration of ≥1 s in 13 waveforms. The bradycardia duration measurements were similar for the two data sets (4-24 s). Agreement in measurement of bradycardia duration was quite high in all cases and was higher for technicians than for investigators (ICC 0.99 versus 0.83). This suggests that it is easier to achieve consistency in measurement of bradycardia duration than in apnea duration.
The nadir in heart rate during bradycardia events was a mean of 62 beats/min in both data sets. Agreement in measurement of heart rate nadir was better for the technicians than for the investigators (ICC 0.77 versus 0.65). With additional training focused on measurement of heart rate in artifact-free regions, technicians achieved total agreement (ICC = 1.0).
DISCUSSION
The purpose of this study was to quantify the level of agreement between raters when describing the features of cardiorespiratory waveforms recorded from infants on home monitors, to identify factors that impede interrater agreement, and to determine whether CHIME study technicians could achieve a high (κ and ICC coefficients >0.8) level of agreement. We have shown that despite considerable initial training we were unable to achieve a high level of agreement among either investigators or technicians. We have shown, however, that additional training of technicians, directed at specific sources of disagreement, did result in improved agreement, to levels generally considered sufficient for research purposes. Thus, from these data, the benefits of rigorous scoring algorithms and the need for intensive training are quite clear. Our assessment before establishing clear guidelines indicated that experienced investigators disagreed in their categorization of cardiorespiratory events. Published descriptions of cardiorespiratory patterns in infants, except for assessment of agreement in the total amount of apnea during overnight studies, have not included formal attempts to assess interrater agreement. We can reasonably assume that experts would exhibit considerable disagreement in their judging of certain published data. Conflicting findings, within and across studies, may result from such disagreement, even with use of similar methods. In this form of clinical research, one must define events carefully and meticulously instruct and train those responsible for assessment of waveforms.
We have also demonstrated the impact on agreement of varying the threshold for defining the presence of apnea and bradycardia. As expected, when there is not a high level of agreement in measuring event duration, disagreement in judging categorically the presence of apnea or bradycardia is exaggerated. This exaggeration is minimized by using more liberal thresholds. A practical strategy that reduces the impact of this problem, often used in home monitoring, is to set the recording threshold on a monitor 5 s below the alarm threshold.
Several factors may contribute to disagreement between observers. First, the signals provided do not represent a "gold standard" for assessing cardiorespiratory events and, therefore, may be insufficient for an observer to define events accurately. Second, raters may differ in their understanding of the definition of variables and the criteria they employ when describing the waveforms. This may result from insufficient definition of a variable or insufficient training of raters. Hunt et al. (16) have described how varying definitions for apnea onset (i.e. peak of inspiration versus end of expiration) may have a substantial impact on the assessment of apnea during overnight pneumogram recordings. Finally, inherent ambiguities in the signals make it difficult for raters to agree even when they use similar criteria in making judgments. An example of a physiologic pattern that is inherently difficult to interpret is the identification of out-of-phase breathing efforts during apnea (defined as efforts in opposite directions on the abdominal and rib cage channels with no substantial deflection of the sum channel). Single out-of-phase breathing efforts often have low-amplitude excursions and may be viewed variably as obstructive, central, or mixed apnea. Only with four or more out-of-phase breaths during a period of apnea of ≥16 s did we find high agreement between raters. The relationship between out-of-phase efforts and obstructed breathing remains to be determined, as does the biologic relevance of repetitive out-of-phase efforts compared with single out-of-phase breaths during the same time frame. Even "breaths" with no thoraco-abdominal asynchrony (i.e. in-phase) but low amplitude are judged variably. Events with low-amplitude excursions contribute substantially to disagreement among observers. Only with computer-based scoring that follows established rules is this type of disagreement likely to be reduced.
Computer-based automated scoring offers the potential to eliminate inconsistencies in scoring. The development of such a scoring system, however, is difficult and first requires consensus of the criteria used by the computer.
When interpreting the level of agreement observed in this study, several factors need consideration. First, the waveforms displayed continuous physiologic data beginning 75 s before an event and continuing until 30 s after resolution of the event (typically a total of 2 min). This is roughly twice the duration usually displayed by commercially available monitors. The increased ability to assess each epoch within its overall context may have favorably influenced reliability. Alternatively, in some instances, depending on the quality of the recorded signals, even a 2-min display time may be insufficient to understand the context of the displayed signals. Second, the criteria developed for assessment of waveforms, although based on commonly used standards, were developed uniquely for this project. Our reliability may be strongly influenced by the specific criteria used. In addition, waveforms were displayed on a computer screen using unique software for display of waveforms and measurement of the variables. The display and measurement techniques clearly affect the level of agreement achieved. One must also consider the selection process for the waveforms used in the data sets. Although we selected waveforms to provide a representative sample of CHIME data, it is possible that we would have achieved different levels of reliability with waveforms that were either harder or easier to interpret than those we selected. Finally, the impact of assessment of breathing by respiratory inductance plethysmography needs consideration. Most home monitors use transthoracic impedance to assess breathing movements. Although the levels of interrater agreement observed in this study might not seem applicable to assessment of waveforms of transthoracic impedance, in our judgment interrater agreement in assessment of waveforms of transthoracic impedance is as difficult to achieve as that for respiratory inductance plethysmography.
Each of the factors contributing to disagreement apply similarly to both methods, except that one cannot assess out-of-phase breathing by transthoracic impedance.
In summary, although researchers and clinicians commonly describe events based on cardiorespiratory recordings, agreement among experienced individuals may be poor. This lack of agreement may result in conflicting findings, within and across studies, and mandates careful definition of variables, intensive training, and demonstration of reliability in the conduct of studies. However, when clear guidelines are developed, and sufficient training is provided, raters can attain a high level of agreement in judging the type and duration of cardiorespiratory events. The CHIME study, using a respiratory inductance plethysmography-based home monitor for breath detection, has developed guidelines and tools for assessment of cardiorespiratory waveforms and implemented training strategies that have achieved a high level of agreement in the assessment of cardiorespiratory waveforms.
Abbreviations
- CHIME: collaborative home infant monitoring evaluation
- SIDS: sudden infant death syndrome
- ALTE: apparent life-threatening event
- ICC: intraclass correlation
References
1. Nathanson I, O'Donnell J, Commins MF 1989 Cardiorespiratory patterns during alarms in infants using apnea/bradycardia monitors. Am J Dis Child 143: 476–480
2. Weese-Mayer DE, Brouillette RT, Morrow AS, Conway LP, Klemka-Walden LM, Hunt CE 1989 Assessing validity of infant monitor alarms with event recording. J Pediatr 115: 702–708
3. Weese-Mayer DE, Morrow AS, Conway LP, Brouillette RT, Silvestri JS 1990 Assessing clinical significance of apnea exceeding 15 seconds with event recording. J Pediatr 117: 568–574
4. Steinschneider A, Santos V 1991 Parental reports of apnea and bradycardia: temporal characteristics and accuracy. Pediatrics 88: 1100–1105
5. Silvestri JM, Weese-Mayer DE, Kenny AS, Hauptman SA 1994 Prolonged cardiorespiratory monitoring of children more than twelve months of age: characterization of events and approach to discontinuation. J Pediatr 125: 51–56
6. Meny RG, Currey K, Scheel JN, Harrington D, Vice FL 1996 Asystoles during infancy recorded by home memory monitors. Benign events? Arch Pediatr Adolesc Med 150: 901–905
7. Kahn A, Blum D, Rebuffat E, Sottiaux M, Levitt J, Bochner A, Alexander M, Grosswasser J, Muller MF 1988 Polysomnographic studies of infants who subsequently died of sudden infant death syndrome. Pediatrics 82: 721–727
8. Gordon D, Southall DP, Kelly DH, Wilson A, Akselrod S, Richards J, Kenet B, Kenet R, Cohen RJ, Shannon DC 1986 Analysis of heart rate and respiratory patterns in sudden infant death syndrome victims and control infants. Pediatr Res 20: 680–684
9. Laximinarayan S, Mills O, Michelson L, Cornwell AC, Marmarou A, Costigan EF Jr, Weitzman ED 1983 Sudden infant death syndrome: a digital computer-based apnoea monitor. Med Biol Eng Comput 21: 191–196
10. Whyte KF, Allen MB, Fitzpatrick MF, Douglas NJ 1992 Accuracy and significance of scoring hypopneas. Sleep 15: 257–260
11. Macey PM, Ford RPK, Brown PJ, Larkin J, Fright WR, Garden KL 1995 Apnoea detection: human performance and reliability of a computer algorithm. Acta Paediatr 84: 1103–1107
12. Adams JA, Zabaleta IA, Stroh D, Johnson P, Sackner MA 1993 Tidal volume measurements in newborns using respiratory inductive plethysmography. Am Rev Respir Dis 148: 585–588
13. Brooks LJ, DiFiore JM, Martin RJ, and the CHIME Study Group 1997 Assessment of tidal volume over time in preterm infants using respiratory inductance plethysmography. Pediatr Pulmonol 23: 429–433
14. Landis JR, Koch GG 1977 The measurement of observer agreement for categorical data. Biometrics 33: 159–174
15. Bland JM, Altman DG 1986 Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1: 307–310
16. Hunt CE, Brouillette RT, Hanson D 1988 Apnea-onset definition significantly affects pneumogram results. Sleep 11: 286–290
Additional information
Supported by National Institutes of Health, NICHD Grants HD 29067 (M.J.C.), 29071 (T.M.B.), 28971 (D.E.W.-M.), 29073 (D.H.C.), 29060 (T.T.H.), 29056 (C.E.H.), and 34625 (M.R.N.).
Corwin, M., Lister, G., Silvestri, J. et al. Agreement among Raters in Assessment of Physiologic Waveforms Recorded by a Cardiorespiratory Monitor for Home Use. Pediatr Res 44, 682–690 (1998). https://doi.org/10.1203/00006450-199811000-00010