Methodological issues on interrater reliability of the modified Sarnat examination in preterm infants

Maleki, Shokofeh; Naderi, Mehdi

doi:10.1038/s41390-019-0741-9

Correspondence
Published: 02 January 2020

Pediatric Research volume 87, page 614 (2020)

Dear Editor,

We carefully read the influential article by Pavageau and colleagues published in Pediatric Research in 2019.¹ The purpose of that study was to determine the reliability of the modified Sarnat neurologic examination in late and moderately preterm neonates born at 32–36 weeks’ gestational age. Kappa values were used to assess agreement between examiners.¹ According to the results, the reliability of the neurologic exam between the gold standard (GS) study investigator and groups of attending neonatologists was good to excellent (k > 0.72) in most categories, except for Moro and tone. Agreement for the tone and Moro categories was poor to fair in infants born at 32–34 weeks’ gestation (k = 0.20–0.60), whereas at 35–36 weeks’ gestation it was perfect (k = 1.0). However, when the GS examiner was compared with groups of attending examiners, agreement in the Moro and tone categories was fair (k = 0.46).¹

Depending on the type of variable, reliability can be analyzed in different ways. One of these is the kappa coefficient, which is used to assess agreement for qualitative variables. However, kappa can give misleading results in three circumstances. The first is when prevalence differs between groups, because a difference in prevalence can change the kappa value substantially. The second is when there are more than two categories.²⁻⁶ The third is when the marginal distributions of the raters' responses differ.²⁻⁶ In such situations, we strongly recommend using weighted kappa. Table 1 illustrates these circumstances with a hypothetical example and shows how much kappa can change (from 0.44, interpreted as moderate, to 0.80, interpreted as very good) across different prevalence rates and numbers of categories.³⁻⁶
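For reference, the letter does not write out the formulas. The standard definitions of Cohen's kappa and weighted kappa for two raters are sketched below; the symbols are notation assumed here, with p_ij the proportion of subjects placed in category i by one rater and category j by the other, p_i+ and p_+j the marginal proportions, c the number of categories, and w_ij a disagreement weight (the linear form is shown; the quadratic form squares it).

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad
p_o = \sum_{i=1}^{c} p_{ii}, \qquad
p_e = \sum_{i=1}^{c} p_{i+}\, p_{+i}

\kappa_w = 1 - \frac{\sum_{i,j} w_{ij}\, p_{ij}}
                    {\sum_{i,j} w_{ij}\, p_{i+}\, p_{+j}}, \qquad
w_{ij} = \frac{|i - j|}{c - 1}
```

Because the chance-agreement term p_e is built from the marginal proportions, the same observed agreement p_o can give different kappa values when prevalence or the raters' marginal distributions change.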

Table 1 Kappa and weighted kappa values for the reliability between two examiners with more than two categories and at different prevalence rates.
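The dependence of kappa on prevalence and on the weighting scheme can be checked numerically. The following is a minimal Python sketch; the function name `cohen_kappa` and all counts are hypothetical, chosen for illustration only, and are not the data of Table 1 or of the original study. Weighting assumes the categories are ordered.

```python
def cohen_kappa(table, weights=None):
    """Cohen's kappa from a square table of counts.

    table[i][j] is the number of subjects placed in category i by the first
    rater and category j by the second. weights is None (unweighted),
    "linear" or "quadratic"; weighting assumes the categories are ordered.
    """
    if weights not in (None, "linear", "quadratic"):
        raise ValueError("weights must be None, 'linear' or 'quadratic'")
    k = len(table)
    n = sum(sum(row) for row in table)
    # Marginal proportions for each rater
    row_p = [sum(row) / n for row in table]
    col_p = [sum(row[j] for row in table) / n for j in range(k)]

    def disagreement(i, j):
        # 0 on the diagonal; 1 (unweighted) or a distance-based weight elsewhere
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = sum(disagreement(i, j) * table[i][j] / n
                   for i in range(k) for j in range(k))
    expected = sum(disagreement(i, j) * row_p[i] * col_p[j]
                   for i in range(k) for j in range(k))
    return 1.0 - observed / expected


# Hypothetical 2x2 tables: both have 90% observed agreement,
# but the second has a very uneven prevalence.
balanced = [[45, 5], [5, 45]]
skewed = [[85, 5], [5, 5]]
print(f"balanced prevalence: kappa = {cohen_kappa(balanced):.2f}")  # 0.80
print(f"skewed prevalence:   kappa = {cohen_kappa(skewed):.2f}")    # 0.44

# Hypothetical 3x3 table with ordered categories; all disagreements
# are between adjacent categories.
ordinal = [[20, 5, 0], [5, 20, 5], [0, 5, 20]]
print(f"unweighted:          kappa = {cohen_kappa(ordinal):.2f}")               # 0.62
print(f"linear weights:      kappa = {cohen_kappa(ordinal, 'linear'):.2f}")     # 0.71
print(f"quadratic weights:   kappa = {cohen_kappa(ordinal, 'quadratic'):.2f}")  # 0.80
```

With these counts the observed agreement is identical in the two 2x2 tables, yet kappa falls from 0.80 to 0.44 once prevalence is skewed. In the 3x3 table, giving partial credit to near-misses raises the coefficient relative to unweighted kappa.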

The authors concluded that the modified Sarnat examination had strong reliability in preterm infants, with the exception of Moro and tone. However, the experience of the examiners can influence these results and may improve agreement for tone and Moro after 35 weeks. Their institution subsequently adopted two goals: first, to provide education that targets the assessment of tone in preterm infants, and second, to test a new neurological examination form that omits the Moro and gives details for evaluating tone, adapted from the Dubowitz and Hammersmith infant neurological examinations.¹ Such conclusions may stem from inappropriate use of the statistical test, which can ultimately lead to a misleading message. In this letter, we have pointed out the disadvantages of using kappa to assess agreement.

References

1. Pavageau, L., Sánchez, P. J., Steven Brown, L. & Chalak, L. F. Inter-rater reliability of the modified Sarnat examination in preterm infants at 32–36 weeks' gestation. Pediatr. Res. https://doi.org/10.1038/s41390-019-0562-x (2019).
2. Szklo, M. & Nieto, F. J. Epidemiology Beyond the Basics 3rd edn (Jones and Bartlett Publisher, Manhattan, New York, 2014).
3. Sabour, S. & Dastjerdi, E. V. Reliability of four different computerized cephalometric analysis programs: a methodological error. Eur. J. Orthod. 35, 848 (2013).
4. Naderi, M. & Sabour, S. Reproducibility of diagnostic criteria associated with atypical breast cytology: a methodological issue. Cytopathology 29, 396 (2018).
5. Sabour, S. & Ghassemi, F. The validity and reliability of a signal impact assessment tool: statistical issue to avoid misinterpretation. Pharmacoepidemiol. Drug Saf. 25, 1215–1216 (2016).
6. Naderi, M. & Sabour, S. Inter and intraobserver reliability and critical analysis of the FFP classification of osteoporotic pelvic ring injuries: methodological issue. Injury 50, 1261–1262 (2019).

Author information

Authors and Affiliations

Clinical Research Development Centre, Taleghani and Imam Ali Hospital, Kermanshah University of Medical Sciences, Kermanshah, I.R., Iran
Shokofeh Maleki & Mehdi Naderi

Contributions

M.N. and S.M. conceptualized and designed the study, wrote the first draft of the manuscript, reviewed the revisions, and approved the final manuscript as submitted.

Corresponding author

Correspondence to Mehdi Naderi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Maleki, S., Naderi, M. Methodological issues on interrater reliability of the modified Sarnat examination in preterm infants. Pediatr Res 87, 614 (2020). https://doi.org/10.1038/s41390-019-0741-9

Received: 24 September 2019
Accepted: 07 October 2019
Published: 02 January 2020
Issue Date: March 2020
DOI: https://doi.org/10.1038/s41390-019-0741-9

This article is cited by

Correspondence on statistical rigor and kappa considerations: which, when, and clinical context matters
- Mary Ann O’Riordan
- Cynthia Bearer
Pediatric Research (2020)
Statistical rigor and kappa considerations: which, when and clinical context matters
- Lina F. Chalak
- Lara Pavageau
- Linda Hynan
Pediatric Research (2020)
