Current evidence supports the provision of therapeutic hypothermia (TH) to those with moderate or severe neonatal encephalopathy (NE).1 However, determining the true grade of encephalopathy is challenging in the newborn infant in the first 6 h of life.2,3 Therefore, each of the randomized controlled trials (RCTs) described modified encephalopathy exams for TH eligibility that aimed to identify those ultimately categorized as moderate or severe NE.4,5 This led to minor variations in eligibility exams between the RCTs, and subsequently between the various jurisdictions that incorporated these evidence-based assessments into their practice.

In North America, TH eligibility is most commonly based on the clinical encephalopathy criteria described by the NICHD RCT (NICHD-Neonatal Research Network (NRN) system), with training available for researchers through the NRN. In the United Kingdom, eligibility is often determined using the methodology described in the TOBY RCT, and recently reaffirmed by the British Association of Perinatal Medicine (TOBY-BAPM system).6 Both of these systems require infants to demonstrate evidence of perinatal asphyxia, plus clinical findings consistent with moderate or severe encephalopathy using a standardized exam.4,5 If both of these criteria are met, the infant is eligible for TH by the NICHD-NRN system, while TOBY-BAPM next requires the infant to demonstrate a moderately to severely abnormal amplitude-integrated electroencephalography (aEEG). If an aEEG is not available, BAPM guidelines advise that TH should proceed if the clinical exam criteria are met. Therefore, tremendous importance is placed upon the clinical neurological exam, given its critical role in determining TH eligibility.

The clinical neurological exams used in the two systems differ, and these differences could change the level of encephalopathy assigned to an infant. An individual infant’s TH eligibility could therefore vary between centers and nations depending on which system is applied. The aim of this study was to determine whether the differences between the NICHD-NRN and TOBY-BAPM clinical neurological exams impacted an infant’s TH eligibility.


This is a secondary analysis of infants with NE who underwent TH between July 2014 and May 2019 in a large tertiary-level neonatal intensive care unit. The need to initiate TH was determined by the clinician and based upon an adaptation of standard criteria (Table 1). These modified criteria include providing TH to those with milder encephalopathy, such that those with mild, moderate, and severe NE were included in this cohort.7 All infants had a systematic neurological exam performed and documented by both an attending neonatologist and a pediatric neurologist within the first 6 h of life, prior to the initiation of TH. If the grade of NE differed between the two examiners, the worst grade was used.

Table 1 The Brigham and Women’s Hospital Neonatal Encephalopathy Scale.

For inclusion in this analysis, all infants had to be ≥36 weeks gestation, have signs of perinatal hypoxia–ischemia, and have sufficient documentation of the neurological exam to allow determination of TH eligibility by both the NICHD-NRN and TOBY-BAPM neurological exam criteria. Perinatal hypoxia–ischemia was defined using the TOBY-BAPM A criteria, which require at least one of: pH of <7.00 or base deficit of ≥16 mmol/L on cord blood or any gas within 60 min of birth; continued need for resuscitation at 10 min after birth (intubation or IPPV); or 10 min Apgar score of ≤5.4 Institutional Review Board approval was obtained to conduct this analysis.
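The inclusion rule above can be written as a short check. This is only an illustrative sketch: the record type and its field names are hypothetical and are not taken from the study database, and missing values are simply treated as "criterion not met".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PerinatalData:
    """Hypothetical record; field names are illustrative only."""
    gestation_weeks: float
    ph_first_hour: Optional[float] = None            # cord blood or any gas within 60 min
    base_deficit_first_hour: Optional[float] = None  # mmol/L, same sampling window
    resuscitation_at_10_min: bool = False            # intubation or IPPV still required
    apgar_10_min: Optional[int] = None


def meets_criteria_a(d: PerinatalData) -> bool:
    """True if at least one sign of perinatal hypoxia-ischemia is present."""
    acidosis = (
        (d.ph_first_hour is not None and d.ph_first_hour < 7.00)
        or (d.base_deficit_first_hour is not None
            and d.base_deficit_first_hour >= 16)
    )
    low_apgar = d.apgar_10_min is not None and d.apgar_10_min <= 5
    return acidosis or d.resuscitation_at_10_min or low_apgar


def meets_inclusion(d: PerinatalData, exam_documented: bool) -> bool:
    """Gestation, A criteria, and adequate exam documentation must all hold."""
    return d.gestation_weeks >= 36 and meets_criteria_a(d) and exam_documented
```

For example, an infant of 39 weeks with a cord pH of 6.95 and a documented exam would pass `meets_inclusion`, whereas the same infant at 35 weeks would not.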

Demographic, clinical, and laboratory data were collected from the medical records. All infants included in this cohort underwent TH, had multichannel electroencephalography (EEG) during cooling, and had a magnetic resonance imaging (MRI) scan following re-warming. The continuous EEGs were placed as soon as possible after the decision to initiate TH and were maintained throughout TH and re-warming. For this analysis, the clinical neurophysiologist report from the first 24 h of age was used to define the EEG grade of encephalopathy. The presence of electrographic seizures at any point during the EEG monitoring was recorded.


The NICHD neurological exam (B criteria) indicates that infants with moderate or severe NE are eligible for TH. The NICHD-NRN neurological exam assesses six domains: level of consciousness, spontaneous activity, tone, posture, primitive reflexes (suck and Moro reflex, two sub-domains assessed independently, with the worst score providing the global grade for primitive reflexes), and autonomic activity (pupillary reaction, heart rate, and respirations, three sub-domains assessed independently, with the worst score providing the global grade for autonomic activity) (Fig. 1).5 If ≥3 domains are consistent with moderate or severe NE, then the infant is eligible for TH. Eligible infants are graded as moderate NE if more of their domains are moderate than severe, and as severe NE if more of their domains are severe than moderate. In addition, infants who are encephalopathic on exam but do not have sufficient moderate or severe criteria to meet the TH threshold are categorized as moderate NE if they develop seizures in the first 6 h of life.
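The decision rule described above can be sketched as follows. This is an illustration, not the trial's scoring software. Three points are assumptions not stated in the text: each finding is recorded as one of "normal", "mild", "moderate", or "severe"; "encephalopathic on exam" is read as any domain that is not normal; and a tie between the number of moderate and severe domains is left ungraded.

```python
from typing import Optional, Tuple

# Assumed ordering of exam findings, used only to pick the worst sub-domain.
RANK = {"normal": 0, "mild": 1, "moderate": 2, "severe": 3}


def worst(*levels: str) -> str:
    """Return the most abnormal of the supplied findings."""
    return max(levels, key=lambda level: RANK[level])


def nichd_nrn_exam(
    consciousness: str = "normal",
    activity: str = "normal",
    tone: str = "normal",
    posture: str = "normal",
    suck: str = "normal",
    moro: str = "normal",
    pupils: str = "normal",
    heart_rate: str = "normal",
    respirations: str = "normal",
    seizures_first_6h: bool = False,
) -> Tuple[bool, Optional[str]]:
    """Return (eligible_for_TH, grade) for the six-domain exam."""
    domains = [
        consciousness,
        activity,
        tone,
        posture,
        worst(suck, moro),                        # primitive reflexes
        worst(pupils, heart_rate, respirations),  # autonomic activity
    ]
    n_moderate = sum(d == "moderate" for d in domains)
    n_severe = sum(d == "severe" for d in domains)

    if n_moderate + n_severe >= 3:
        if n_severe > n_moderate:
            return True, "severe"
        if n_moderate > n_severe:
            return True, "moderate"
        return True, None  # tie: grading not specified in the text (assumption)

    # Below the three-domain threshold: seizures upgrade an abnormal exam.
    encephalopathic = any(d != "normal" for d in domains)
    if encephalopathic and seizures_first_6h:
        return True, "moderate"
    return False, None
```

For instance, `nichd_nrn_exam(consciousness="moderate", tone="moderate", suck="moderate")` returns `(True, "moderate")`, because the weak suck sets the primitive reflexes domain to moderate and three domains then meet the threshold.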

Fig. 1: Comparison of TH eligibility B criterion in NICHD and TOBY RCTs.

Panel A: NICHD exam (B) criterion. Panel B: TOBY exam (B) criterion.


The TOBY neurological exam criteria also indicate that infants with moderate or severe NE are eligible for TH. The TOBY-BAPM neurological exam requires that a moderate or severely abnormal level of consciousness (lethargy, stupor, or coma) must be present plus one additional finding of either clinical seizure, weak or absent suck, hypotonia, or abnormal reflexes including oculomotor or pupillary reflexes (Fig. 1).4 The TOBY trial defined infants as TH eligible or not eligible, but did not define the clinical grade of NE within those that were eligible for TH.1
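The TOBY-BAPM exam rule is a gate on level of consciousness plus at least one further finding, which a short sketch makes explicit. The function and parameter names below are hypothetical, and the aEEG step that follows the exam in the full TOBY-BAPM pathway is deliberately left out.

```python
ALTERED_CONSCIOUSNESS = {"lethargy", "stupor", "coma"}


def toby_bapm_exam_eligible(
    consciousness: str,
    clinical_seizure: bool = False,
    weak_or_absent_suck: bool = False,
    hypotonia: bool = False,
    abnormal_reflexes: bool = False,  # includes oculomotor or pupillary reflexes
) -> bool:
    """True if the clinical exam (B) criterion is met; aEEG is not considered."""
    altered = consciousness in ALTERED_CONSCIOUSNESS
    additional_finding = any(
        [clinical_seizure, weak_or_absent_suck, hypotonia, abnormal_reflexes]
    )
    return altered and additional_finding
```

Under this rule an infant with hypotonia, a weak suck, and abnormal reflexes but a level of consciousness outside the three listed categories is not eligible, regardless of how many other findings are present.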

Magnetic resonance imaging

All infants had a cerebral MRI performed within the first week of life. All scans were performed on a 3-T Siemens scanner (Siemens, Erlangen, Germany). The standard clinical imaging protocol included T1, T2, and diffusion-weighted imaging. The images for this study were analyzed independently by a pediatric neuroradiologist and a neonatologist (E.Y., T.E.I.), who were blinded to the clinical grades of encephalopathy. The presence and type of any MRI abnormalities were detailed. Analysis of the pattern and severity of brain injury was classified according to the grading system developed by Barkovich et al.8 A score of ≥2 in the deep nuclear gray matter, or a score of ≥3 in a watershed pattern, was considered consistent with moderate–severe MRI injury.
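The MRI outcome definition above reduces to a two-threshold rule. The sketch below assumes the two regional scores have already been assigned by the readers; the function and argument names are illustrative only.

```python
def moderate_severe_mri_injury(deep_nuclear_gray_score: int,
                               watershed_score: int) -> bool:
    """Apply the study's thresholds to two pre-assigned regional scores."""
    return deep_nuclear_gray_score >= 2 or watershed_score >= 3
```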

Statistical analysis

Statistical analysis was performed using PASW Statistics 18.0. Nonparametric data were reported as median values with interquartile range (IQR) and comparisons were performed using the Mann–Whitney U test or Kruskal–Wallis H test, as appropriate. The χ2 test was used when comparing proportions. The TOBY trial defined infants as TH eligible or not eligible, but did not define the grade of NE within those that were eligible for TH.1,4 Therefore, the primary comparison between the eligibility methods was whether each infant met the clinical neurological exam (B criteria) threshold for TH eligibility (yes/no) under each exam method. Agreement between assessment methods was described using κ values. Statistical significance was taken as p < 0.05.


One hundred and one infants met our defined inclusion criteria between 2014 and 2019. Of these, ten were excluded: four because they were <36 weeks gestational age and six due to insufficient exam documentation. Thus, the final study population consisted of 91 infants with NE: 45 mild, 43 moderate, and 3 severe NE per standardized exam (Table 1). The median age at the neurological exam used for analysis was 2 h (IQR 0.75–4 h). The clinical and demographic details by grade of NE are provided in Table 2.

Table 2 Screening criteria and TH eligibility based on Brigham and Women’s grade of NE.

TH eligibility by NICHD-NRN and TOBY-BAPM neurological exam criteria

Forty-six (50.5%) infants were eligible for TH per the NICHD-NRN exam, while 35 (38.5%) infants were eligible for TH per the TOBY-BAPM exam (Table 3). This represents good agreement between the systems, with a κ value of 0.715, p < 0.001. However, the differences in TH eligibility between the systems were significant (p < 0.001), with the NICHD system identifying more infants as eligible. The two neurological assessments differed in determining TH eligibility for 13 infants.

Table 3 Agreement between TOBY-BAPM and NICHD-NRN TH eligibility exams.

Figure 2 demonstrates the distribution of the exam components for those who did and did not meet both the (a) NICHD-NRN and (b) TOBY-BAPM exam criteria. From Fig. 2, it is clear that the disagreement between the systems was predominantly related to the categorization of the level of consciousness. Twelve (26%) of the 46 infants eligible per NICHD-NRN had a level of consciousness that did not meet TOBY-BAPM criteria and were therefore ineligible for TH per TOBY-BAPM. Table 4 provides further details of initial screening and short-term outcomes specifically for the 13 infants for whom the exams disagreed on their TH eligibility.
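The reported κ can be checked from the counts given in the text. The 2×2 cell counts below are not quoted directly from Table 3; they are inferred from the stated totals (46 and 35 eligible of 91, 13 discordant, 12 of them eligible per NICHD-NRN only), so this is a consistency check under that assumption rather than a reproduction of the authors' analysis.

```python
def cohen_kappa_2x2(both_yes: int, a_only: int, b_only: int, both_no: int) -> float:
    """Cohen's kappa for two binary ratings summarized as a 2x2 table."""
    n = both_yes + a_only + b_only + both_no
    observed = (both_yes + both_no) / n
    a_yes = both_yes + a_only
    b_yes = both_yes + b_only
    # Agreement expected by chance from the marginal totals.
    expected = (a_yes * b_yes + (n - a_yes) * (n - b_yes)) / n**2
    return (observed - expected) / (1 - expected)


total, nichd_eligible, toby_eligible = 91, 46, 35
nichd_only = 12                                       # eligible per NICHD-NRN only
both = nichd_eligible - nichd_only                    # eligible per both exams
toby_only = toby_eligible - both                      # eligible per TOBY-BAPM only
neither = total - both - nichd_only - toby_only       # eligible per neither exam

kappa = cohen_kappa_2x2(both, nichd_only, toby_only, neither)
print(both, nichd_only, toby_only, neither)  # 34 12 1 44
print(round(kappa, 3))                       # 0.715
```

The inferred table has 13 discordant infants (12 + 1) and yields κ ≈ 0.715, matching the value reported above.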

Fig. 2: TH Eligibility Exam Findings.

Distribution of exam sub-components among infants who were and were not eligible for TH based upon (a) NICHD-NRN B criteria and (b) TOBY-BAPM B criteria.

Table 4 Details of infants with a discrepancy in TH eligibility per NICHD-NRN and TOBY-BAPM neurological exams.

Comparing screening criteria, seizure frequency, and MRI outcomes associated with TOBY-BAPM vs. NICHD-NRN TH eligibility exams

There was no difference in demographic, screening criteria, or short-term outcome measures between the 35 infants who were eligible per TOBY-BAPM compared to the 46 infants that were eligible per the NICHD-NRN exam (Table 5).

Table 5 Comparing demographics, screening criteria, and short-term outcomes for infants who were and were not eligible for TH depending on neurological exam criteria used (NICHD-NRN vs. TOBY-BAPM).

Similarly, there was no difference in demographics or screening criteria for the 56 infants who were not eligible per TOBY-BAPM compared to the 45 infants who were not eligible per the NICHD-NRN neurological exam. However, the clinical grade of NE did differ between those not eligible by the two different exams (p < 0.001), reflecting the disagreement between the methods discussed above. Electrographic seizures developed after 6 h of age in 3 (5%) infants who were not eligible per TOBY-BAPM exam and in 1 (2%) infant who was not eligible per NICHD-NRN exam. There was hypoxic–ischemic cerebral injury on MRI in 13 (23%) infants who were not eligible per TOBY-BAPM and 6 (13%) infants who were not eligible per NICHD-NRN. Neither of these differences was statistically significant (Table 5).


Our results demonstrate that there is a notable difference in TH eligibility depending on which standardized neurological exam is used for the evaluation of clinical encephalopathy. The TOBY-BAPM neurological exam defined 24% fewer infants as being eligible for TH compared to the NICHD-NRN exam. There were no differences in short-term outcomes between the methods of assessment, and neither method identified all infants who developed seizures or had cerebral MR injury. However, among the 12 infants who were determined eligible for TH by NICHD-NRN exam, but were ineligible per TOBY-BAPM exam, two infants developed electrographic seizures and seven infants demonstrated cerebral injury. Although not statistically significant, these differences are potentially clinically relevant.

The primary driver of the difference in TH eligibility was the evaluation and weighting of the level of consciousness in the TOBY-BAPM criteria. In this cohort, a quarter of those who met NICHD-NRN TH eligibility had a level of consciousness that did not meet TOBY-BAPM criteria. Unlike the TOBY-BAPM exam, the NICHD-NRN exam does not weight the level of consciousness differently from other domains. This is consistent with the original work by Sarnat and Sarnat.3 The Sarnats did not identify any one domain that was a prerequisite for defining severity, or that should be weighted over the other exam components. Furthermore, Robertson and Finer did not include the level of consciousness at all when describing moderate NE in their seminal work, rather defining it as consisting of “hypotonia and suppressed primitive reflexes.”2 Therefore, although the level of consciousness is one of the more overt signs of encephalopathy, the seminal work on defining grade of NE did not identify it as being superior to other components of the exam.

Both the TOBY-BAPM and NICHD-NRN neurological exams are evidence-based for the assessment of TH eligibility and are appropriate for use in clinical settings.4,5 This paper cannot determine if either is superior to the other, and indeed showed no difference in short-term outcomes between the two. Rather, we report that there is a notable difference in TH eligibility depending on which of the two systems is applied.

This has several implications. First, this has the potential to impact the care an infant receives dependent on the location in which they are born, with more infants eligible for TH in North America than in the United Kingdom. Second, TH rates are frequently used by centers and health systems for benchmarking, and in some instances the rate of moderate–severe NE is inferred from these rates. However, this paper demonstrates that comparison of TH rates between health systems, or the inference of rates of moderate–severe NE based upon TH rates, may be fundamentally flawed if different eligibility methods are used. The potential difference between the incidence of TH and moderate/severe HIE was recently highlighted by Shipley et al.9 They demonstrated that while the rate of TH in the United Kingdom is 1.26/1000 live births, the rate of moderate/severe HIE in infants ≥36 weeks in the Shipley study was 2.03/1000 live births for the same time period. Shipley reported that, excluding those who died prior to initiation of TH, 37.9% of those ≥36 weeks with a discharge diagnosis of moderate or severe HIE in the UK did not receive TH. Therefore, the appropriate metric for benchmarking is to report the grade of NE defined by the cumulative data and derived at the time of discharge.

Our center’s TH eligibility criteria (as detailed in Table 1) allow treatment to be provided to those with milder encephalopathy. This practice is becoming more common, with the use of TH in mild NE increasing internationally.10,11,12,13 While there is currently no evidence of treatment benefit among this population, Oliveira et al. reported that this was being driven by concern that these infants are at risk of injury and that the early grade of NE was not sensitive enough to discern which infant will have an adverse outcome.10 Supporting the first point, the evidence of injury among mild NE is now well established, with numerous groups demonstrating both significant risk of cerebral injury and adverse neurodevelopmental outcomes in this population.7,14,15,16,17 Regarding the early neurological assessment, as discussed previously, the NICHD-NRN and TOBY-BAPM TH eligibility assessments were developed to identify infants at high risk of cerebral injury. Now as clinicians consider the question of managing those with milder encephalopathy, we must first acknowledge that these standard assessment methods do not identify these children nor do they even define mild NE.

For this reason, the PRIME study published a novel modified NICHD assessment defining mild NE.18 Furthermore, in an attempt to improve the sensitivity of the early assessment to detect the minimum threshold for injury, PRIME also proposed applying a scaled score (Total Sarnat Score [TSS]) to their modified NICHD exam.19 Rather than defining an infant as mild, moderate, or severe, the TSS uses the same exam sub-components to score them from 1 to 18. This approach was applied in recognition that encephalopathy is a spectrum, with a range of severity existing both within individual grades and between them.

The PRIME study reported that a score of ≥5 in the first 6 h provided the optimum sensitivity for identifying those who would have cognitive deficits at 2 years. This equated to an infant with mild encephalopathy who did not meet the threshold for moderate NE. This approach was replicated by Morales et al. using the MARBLE study cohort.20 They reported that they could identify all who developed cognitive deficits at 2 years using a TSS threshold score of ≥4, again by capturing those at the more severe end of the mild NE spectrum. More recently, our own group has further validated this approach, reporting that both the TSS and a scaled score based upon the SIBEN exam each had a superior sensitivity to standard HIE grades for identifying infants at increased risk of cerebral injury.21 Similar to the MARBLE cohort, we found that the optimum threshold was a TSS of ≥4. In the current manuscript, we could not determine if either of the standard methods of TH eligibility assessment is superior, rather we demonstrated the discrepancies that exist between them. However, now that TH is the standard of care for encephalopathic infants at high risk of injury, we must look to further optimize our eligibility criteria. Potentially this will require a re-evaluation of the use of classical grades of NE to define risk thresholds, with the possibility that the greater granularity offered by numerical scoring systems may improve our risk stratification.
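The scaled-score approach discussed above can be illustrated with a small sketch. The per-domain point values are an assumption made here for illustration (0 for normal, 1 for mild, 2 for moderate, 3 for severe across six domains, so that any infant with at least one abnormal domain scores between 1 and 18); the text does not give the scoring table. The default threshold of 4 is the value reported for the MARBLE cohort and our own cohort, and 5 is the value reported by PRIME.

```python
from typing import Sequence

# Assumed point values per domain; not specified in the text.
POINTS = {"normal": 0, "mild": 1, "moderate": 2, "severe": 3}


def total_scaled_score(domain_levels: Sequence[str]) -> int:
    """Sum the per-domain points across the six exam domains."""
    if len(domain_levels) != 6:
        raise ValueError("expected findings for exactly six domains")
    return sum(POINTS[level] for level in domain_levels)


def at_or_above_threshold(score: int, threshold: int = 4) -> bool:
    """Flag infants whose scaled score reaches the chosen risk threshold."""
    return score >= threshold


# An exam with four mild domains and two normal domains scores 4: a mild NE
# exam by grade, yet at the >=4 threshold and below the >=5 threshold.
exam = ["mild", "mild", "mild", "mild", "normal", "normal"]
score = total_scaled_score(exam)
print(score, at_or_above_threshold(score), at_or_above_threshold(score, threshold=5))
```

The example shows why a numerical score can separate infants who share the same categorical grade: the exam above would be graded mild, yet it falls on different sides of the two published thresholds.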

There are several limitations to our study. The first is that the synthesis of the neurological exam criteria was performed retrospectively. However, the exams themselves were performed and documented prospectively by experienced clinicians, which should limit potential bias. An additional limitation, as discussed above, is that our center provides TH to those with milder encephalopathy. As such all infants included in this study received TH. This would have no impact on the neurological exams performed, as in all cases the exams were performed prior to the initiation of TH. It could have impacted the frequency of seizures and MR injury in the population; however, given that all infants underwent TH, it should not have impacted on differences between assessment methods. Lastly, it is recognized that we did not include aEEG assessment in determining TH eligibility. However, while a component of TOBY-BAPM, this would not have increased the agreement, and would if anything have further decreased agreement, as only those who meet the neurological exam criteria by TOBY-BAPM are then assessed to see if they meet the aEEG criteria.

In conclusion, in our cohort, we have demonstrated that there was a notable difference in TH eligibility depending on which evidence-based clinical encephalopathy exam was applied. This has significant implications for the care of the individual infant, for benchmarking between health systems, and must be considered as we look to optimize future TH eligibility criteria.