Transition from fetal to neonatal condition is the most complex physiologic adaptation in life as it is associated with rapid cardiorespiratory, hemodynamic, and metabolic changes.1,2 Whereas the majority of newborns adapt without significant problems, up to 15% of all newborns need some medical support during their transitional period. To identify infants needing support, Virginia Apgar developed a scoring system to assess postnatal condition by converting simple clinical observations into numerical data. The system included the following parameters: skin color, heart rate (HR), reflex irritability, muscle tone, and respiration.3

Almost 70 years after Apgar’s seminal paper, the European Resuscitation Council (ERC) states that HR, muscle tone and respiration in particular “may help identify infants likely to need resuscitation”.4 However, although the score is widely used, there are major concerns regarding the reliability of these and other Apgar score parameters.5,6 Clinical assessment is particularly challenging in infants born prematurely or receiving medical interventions.7

We propose that, in order to further improve the assessment of a newborn’s postnatal condition in the delivery room, contemporary obstetrical practices and advances in neonatology should be taken into account. Although alternatives for assessing the clinical condition of the newborn have been suggested,7 the validity and interobserver reliability/reproducibility of these modifications to the conventional Apgar score are still unknown. This paper aims to critically appraise the components of the Apgar score, as well as other currently used parameters including those of recently published adaptations to the conventional Apgar score.

Individual parameters of the conventional Apgar score

Skin color

Apgar assigned a score of two for skin color only “…when the entire child was pink”. The clinical definition of being blue or purple involves examination of the face, trunk, fingers, and mucous membranes. Apgar also noted that “…the inherited pigmentation of the skin of colored children” interfered with this sign and that “this is by far the most unsatisfactory sign and caused the most discussion among the observers”.3

Physiological considerations

Deoxygenated hemoglobin absorbs light in the red-orange spectrum and reflects blue light. Thus, deoxygenated blood gives the skin and mucous membranes a blue appearance, called cyanosis. Lundsgaard8 originally described cyanosis as occurring at deoxygenated hemoglobin ≥5 g/dL. More recent studies indicate that cyanosis is already visible at 3 and 4–5 g/dL of deoxyhemoglobin in arteries and capillaries, respectively.9 As these values are absolute numbers, cyanosis is more prominent in polycythemic than in anemic patients. ‘Peripheral cyanosis’, also called acrocyanosis, is seen periorally and on the fingers and toes, and is caused by reduced circulation due to, for example, constricted small vessels. As Apgar stated, many children persist in having cyanotic hands and feet for several minutes in spite of excellent ventilation. “A score of two should only be given when the entire child is pink”.3
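
The dependence of cyanosis on an absolute deoxyhemoglobin concentration can be illustrated with a short calculation. The sketch below assumes that arterial deoxyhemoglobin is approximately total hemoglobin × (1 − arterial saturation), ignoring dyshemoglobins, and uses the arterial threshold of 3 g/dL mentioned above. The function name and the hemoglobin values are illustrative assumptions, not data from the cited studies.

```python
def saturation_at_cyanosis(total_hb_g_dl, deoxy_threshold_g_dl=3.0):
    """Arterial saturation (as a fraction) at which deoxyhemoglobin reaches the threshold.

    Assumes deoxyHb = total Hb * (1 - saturation); a simplification for illustration.
    """
    if total_hb_g_dl <= 0:
        raise ValueError("total hemoglobin must be positive")
    if total_hb_g_dl <= deoxy_threshold_g_dl:
        # With so little hemoglobin the threshold is reached only at zero
        # saturation (or never), so report 0.0 under this simple model.
        return 0.0
    return 1.0 - deoxy_threshold_g_dl / total_hb_g_dl


# Hypothetical anemic, intermediate and polycythemic hemoglobin values (g/dL)
for hb in (10.0, 15.0, 22.0):
    sat = saturation_at_cyanosis(hb)
    print(f"Hb {hb:4.1f} g/dL: threshold reached at SaO2 of about {sat:.0%}")
```

Under these assumptions, the infant with the highest hemoglobin reaches the deoxyhemoglobin threshold at the highest saturation, which is why cyanosis is more readily seen in polycythemia than in anemia.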

Clinical considerations

Clinical assessment of color is not easy because the color reflected from the infant’s skin depends on the color and intensity of the light illuminating the infant. A room with green window curtains will result in an apparently different skin color than the same room with pink curtains. As a result, color is an unreliable parameter to estimate oxygenation and contributes to a high interobserver variability in the Apgar score.5 O’Donnell et al.10 demonstrated not only disagreement about whether a newborn was judged to be pink, but also a wide variation in SpO2 in infants considered to be pink.

A prospective observational study of healthy term infants investigated visual assessment of the tongue and simultaneous SpO2 measurements. If the tongue was pink, infants were likely to have SpO2 > 70% and did not require supplemental oxygen.11 However, ERC states that oxygenation is better assessed using pulse oximetry.4

Predictive value

As skin color is not reliable for judging oxygenation, it is a poor predictor of outcome. Recommendations have been made that after birth color should change to pink within 30 s of effective breathing.4,12 At 5 min of age SpO2 should be >80%13 because there is evidence of poorer outcomes in very preterm infants where this is not achieved.14 The ERC does not mention skin color assessment as a means to identify infants at risk and in need of resuscitation.4

Interference with the parameter

Skin color is mainly affected by oxygenation, but also by skin thickness, pigmentation,15 perfusion, blood hemoglobin concentration, and environmental factors such as light color and intensity.16 Oxygenation is affected by mode of delivery13,17,18 and sex.19

Interventions affecting the parameter

Oxygenation and thereby skin color is affected by oxygen administration and other measures to support ventilation and perfusion as well as by changes in light color and intensity.

Heart rate

“…a HR of 100–140 was considered good and given a score of two, a rate of under 100 received a score of one, and if no heart beat could be seen, felt or heard the score was zero (…)”.3

Physiological considerations

During fetal development, blood circulation begins with the first heartbeats occurring approximately on day 22 of gestation. The ability to adjust fetal cardiac output, defined as HR × stroke volume, is mainly limited to changes in HR: Fetal tachycardia is an indicator of fetal distress and increases cardiac output. Fetal bradycardia indicates acute fetal hypoxia and decreases cardiac output20 to reduce fetal oxygen consumption.21 Cardiovascular transition is closely linked to the establishment of lung aeration and pulmonary gas exchange,22 and bradycardia after birth is also considered to occur due to vagal stimuli, especially if umbilical cord clamping is performed before aeration of the lungs.23,24
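
The relation cardiac output = HR × stroke volume can be made concrete with a minimal sketch. It assumes a constant stroke volume so that only the effect of HR is visible; the stroke volume and heart rates below are hypothetical values chosen for illustration, not measurements from the cited work.

```python
def cardiac_output_ml_min(heart_rate_bpm, stroke_volume_ml):
    """Cardiac output (mL/min) as heart rate (beats/min) times stroke volume (mL)."""
    if heart_rate_bpm < 0 or stroke_volume_ml < 0:
        raise ValueError("heart rate and stroke volume must be non-negative")
    return heart_rate_bpm * stroke_volume_ml


STROKE_VOLUME_ML = 1.5  # hypothetical value, held constant to isolate the effect of HR

# Hypothetical bradycardic, baseline and tachycardic heart rates
for hr in (80, 140, 180):
    print(hr, "bpm ->", cardiac_output_ml_min(hr, STROKE_VOLUME_ML), "mL/min")
```

With stroke volume fixed, output scales linearly with HR, which is the sense in which fetal cardiac output is mainly HR-dependent.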

Clinical considerations

HR is a more objective parameter with low interobserver variability.25 HR is counted by palpation of the umbilical cord or cardiac auscultation via stethoscope; however, these methods may underestimate HR. Better options could be technology-based measurements including pulse oximetry or electrocardiography (ECG). Potentially because pulse oximetry is influenced by low cardiac output, reduced peripheral circulation or irregular HR, ECG accuracy is higher than that of pulse oximetry.26,27 Also, a faster HR acquisition may be possible with novel ECG application methods.28

Predictive value

Studies have postulated using neonatal HR to distinguish between physiological and pathological transition.28,29 The ERC states that neonatal HR is an indicator of hypoxia4 and ≥100 beats per minute (bpm) is considered satisfactory, whereas a HR 60–100 bpm signifies a requirement for positive pressure ventilation (PPV) because the most likely underlying reason is insufficient lung aeration. However, some authors suggested that in the first 2 min of life, HR < 100 bpm should not lead to immediate PPV if breathing and muscle tone are normal.29 Very slow, i.e., <60 bpm or absent HR is defined as critical and cardiac compressions are recommended.4
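
The ERC thresholds summarized above can be written as a simple classification. This is a sketch of the thresholds as stated in this section only: the function name and category labels are hypothetical, and it does not encode the caveat that an HR < 100 bpm in the first 2 min may not require immediate PPV if breathing and muscle tone are normal.

```python
def classify_heart_rate(hr_bpm):
    """Return a category for a neonatal heart rate in beats per minute.

    Pass None (or 0) for an absent heart rate. Category labels are hypothetical.
    """
    if hr_bpm is not None and hr_bpm < 0:
        raise ValueError("heart rate cannot be negative")
    if hr_bpm is None or hr_bpm < 60:
        return "critical"      # very slow or absent: cardiac compressions recommended
    if hr_bpm < 100:
        return "ventilate"     # 60 to <100 bpm: positive pressure ventilation
    return "satisfactory"      # >=100 bpm


for hr in (None, 45, 80, 100, 150):
    print(hr, "->", classify_heart_rate(hr))
```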

Interference with the parameter

HR variability during pre- and perinatal life corresponds with central nervous system maturation and mental development.30 Fetal distress, perinatal hypoxia, and acidosis may result in bradycardia after birth.23,24,31 However, neonatal HR can also be affected by conditions that do not necessarily indicate hypoxia or respiratory impairment, e.g., gestational age, birth weight, and delivery mode. Dawson et al.29 found that HR increases more slowly in the first minute of life in preterm infants and infants delivered by C-section compared with vaginally delivered full term newborns. Infections including chorioamnionitis can lead to fetal and neonatal tachycardia.32

Interventions affecting the parameter

An initial action after birth is tactile stimulation by drying the infant or rubbing the soles of its feet or the back to stimulate spontaneous breathing, and consequently supporting HR.33 Because the most common reason for bradycardia is insufficient lung aeration, effective PPV will most likely result in an increased HR.34,35,36

Reflex irritability

“… refers to response to some form of stimulation. The usual testing method was suctioning the oropharynx and nares with a soft rubber catheter which called forth a response of facial grimaces, sneezing, or coughing”.3

Physiological considerations

Hegyi et al.37 postulated that reflex irritability, together with respiratory effort and muscle tone reflect the neurological integrity of the infant.

Clinical considerations

Interobserver reliability in the assessment of reflex irritability is low, but higher than that of color, muscle tone, and respiratory effort.5

Predictive value

Reflex irritability correlates with respiratory effort and muscle tone,37,38 and may be predictive of cerebral palsy.39 A Swedish registry study showed that reduced reflex irritability was associated with increased risk of neonatal mortality only in term infants.40 On the other hand, Ensing et al.41 studied 276 infants admitted to the neonatal intensive care unit after a total 5 min Apgar < 7 and showed for the subgroup of 131 infants scoring zero for reflex irritability that outcome was predicted only by HR and respiratory effort.

Interference with the parameter

Gestational age has a significant impact on reflex irritability,7,42 which indicates that reflex irritability changes with physiological maturation40 and that grimaces, sneezing or coughing may be absent in extremely premature infants. Furthermore, assessment is affected by the mode of inducing reflexes. Until 2005, airway suctioning was recommended as an initial step of resuscitation,43 but it has been discouraged since.4 Thus, assessment of reflex irritability may have become less standardized during the last decade.

Interventions affecting the parameter

Not all infants are actively stimulated after birth, and as previously noted, routine nasal and oropharyngeal suctioning is discouraged.4 This raises the question of how assessment of this parameter is currently performed.5

Muscle tone

“…a completely flaccid infant received a zero score, and one with good tone, and spontaneously flexed arms and legs which resisted extension were rated two points”.3

Physiological considerations

Muscle tone is part of the evaluation of an infant’s physiological and neurological maturity, i.e., gestational age.44 Higher brain functions mature with advancing gestation and the cerebral cortex plays a central role in active muscle movement and tone, whereas passive muscle tone and primary reflexes are regulated in the brain stem,45 especially the pons. The pons is also central to sleep regulation, and fetal and neonatal muscle tone in hypoxia resembles different sleep stages in immature animals. In non-rapid eye movement (REM) sleep, there is a loss of muscle tone in the neck and trunk. In REM sleep, an inhibitory action of the pontine locus coeruleus results in a complete absence of muscular action potentials and generalized hypotonia.46 In fetal sheep, hypoxia-induced hypotonia results from a shift in pH and intracellular calcium that, via glutamatergic, adrenergic and pontine neuroinhibitory mechanisms, inhibits signaling through the spinal motor tracts.47

Adrenaline and other catecholamines increase in asphyxiated fetuses and play a central role in the pathophysiology of perinatal asphyxia.48 Low umbilical cord catecholamines correlate with reduced muscle tone in infants delivered by C-section.49 In perinatal asphyxia and hypoxic-ischemic encephalopathy (HIE), elevated catecholamines may explain the hyperalertness and normal or increased muscle tone characteristic of mild HIE.50 However, in severe HIE, depletion of cortical neurotransmitters and central neurological dysfunction cause hypotonia.50

Clinical considerations

Although Apgar stated that muscle tone is very easily classified with low interobserver variability,3 O’Donnell et al.5 and Rüdiger et al.51 found interobserver variability in muscle tone assessment in both term and preterm infants. Bashambu et al.52 found good interobserver agreement in the evaluation of a term infant video case, but not in preterm infants ≤28 weeks of gestation.

Predictive value

Muscle tone may be a predictor of neurological outcome including cerebral palsy.39 However, in infants with absent muscle tone, only 43% had a total Apgar score ≤3.40 This implies that more than half of infants assessed to have no tone had relatively reassuring values for other parameters of the score. Thus, tone was a less specific marker of compromise, potentially because the conventional Apgar score does not take gestational age into consideration.

Interference with the parameter

Muscle tone may be reduced secondarily to impaired breathing and pulse, and is closely linked to reflex irritability and color.37 While muscle tone has not been investigated as an independent ‘vital sign’ to guide immediate delivery room interventions, ERC recommends using it for initial evaluation after birth.4

Perinatal asphyxia and HIE result in changes in muscle tone,3 and low umbilical cord pH has been associated with a poor neurological exam.53 Muscle tone is also affected by maturity42,44 and birth weight.37 Cnattingius et al.40 found that more than half of very preterm infants had reduced muscle tone, and Pavageau et al.53 showed that tone could be reliably assessed as part of the Apgar score in infants 32–36 weeks of gestation if the infant’s developmental stage was taken into account.

Furthermore, muscular hypotonia is often associated with cardiac, neuromuscular and other genetic syndromes,50,54 other causes of neonatal encephalopathy than HIE,50,55 birth trauma,56 hypoglycemia, hypocalcemia, and infections.55 Finally, maternal anesthesia3,57,58 and delivery mode49,59 may also affect neonatal muscle tone.

Interventions affecting the parameter

Although only the effect of resuscitation on respiration and HR has been systematically addressed,60 it can be inferred that cardiorespiratory improvement positively affects muscle tone. The ERC states that “A very floppy infant is likely to need ventilatory support”, implying that assisted ventilation indirectly improves tone. Tactile stimulation to improve spontaneous breathing may improve muscle tone by some unknown mechanism, but poor tone is not explicitly stated as an indication to perform tactile stimulation.61 The physiological interplay between stimulation and tone is poorly understood.

Application of pressure to the face stimulates the sensory branches of the trigeminal nerve and may induce a parasympathetic reflex resulting in apnea and reduced muscle tone.62 Apnea inversely correlated with gestational age, lasted >30 s63 and occurred more often after the first application of binasal prongs or facemask.

Respiration


Apgar evaluated a child that presented apneic at 60 s postnatally with the lowest grade of zero. An infant who breathed and cried received the highest score of two. Other types of respiratory effort, such as irregular, shallow ventilation received a score of one.3
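
With respiration, all five of Apgar's signs have been described. A minimal sketch of how the component scores combine into a total is given below, assuming the conventional rule that each sign is scored 0, 1 or 2 and the total is their sum (0–10). The class and field names are hypothetical, and the component scores are taken as already assigned by the observer.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ApgarAssessment:
    """Five component scores, each 0, 1 or 2, as assigned by the observer."""
    skin_color: int
    heart_rate: int
    reflex_irritability: int
    muscle_tone: int
    respiration: int

    def __post_init__(self):
        # Reject any component outside the 0-2 range
        for f in fields(self):
            value = getattr(self, f.name)
            if value not in (0, 1, 2):
                raise ValueError(f"{f.name} must be 0, 1 or 2, got {value!r}")

    def total(self):
        """Sum of the five component scores (0-10)."""
        return sum(getattr(self, f.name) for f in fields(self))


# Hypothetical infant with acrocyanosis (color = 1) and otherwise full scores
print(ApgarAssessment(1, 2, 2, 2, 2).total())
```

Because the total is a plain sum, very different clinical pictures can produce the same number, which is one reason the individual components are appraised separately in this review.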

Physiological considerations

Respiratory transition depends on lung aeration and is proposed to occur in three overlapping phases: (a) liquid airway clearance, (b) liquid accumulation within the lungs’ interstitial tissue compartment,64 and the final phase of (c) respiratory gas exchange and metabolic homeostasis.65

Lung development and growth depend on fetal lung liquid and fetal breathing movements,66 causing expansion of the lung tissues to stimulate growth.67 During intrauterine hypoxia, fetal breathing movements can be reduced or absent.20,68 In addition, maldevelopment of the lungs including lung hypoplasia, e.g., in infants with oligohydramnios69,70 and intrauterine growth retardation,71 may cause impaired respiration after birth.

Clinical considerations

Reliability of respiratory assessment immediately after birth shows a large variation in very low birth weight infants.51

Predictive value

Since delivery of oxygen to the circulation depends on sufficient ventilation, it can be assumed that poor ventilation is predictive of an adverse outcome. However, there are no data available looking solely at the predictive value of respiration. Pulmonary hypoplasia, which is associated with poor ventilation, increases the risk of neonatal morbidity and mortality,72,73,74 but this may be due to concomitant factors.

Interference with the parameter

Neonatal respiration can be affected by lung immaturity, pulmonary hypoplasia,69 hypothermia, neurological and muscular disorders,69 time of cord clamping, birth injury, and infection.

Interventions affecting the parameter

Pre- and postnatal exposure to drugs, as well as type of respiratory interface, may influence infant respiration immediately after birth.

Parameters not included in the Apgar score

When introducing her scoring system,3 Apgar selected five signs from a list “of all the objective signs which pertained in any way to the condition of the infant at birth”. The ones she did not select have never been revealed. However, she advised against using “breathing time, a satisfactory cry; and mild, moderate and severe depression”. Her five parameters quickly became the predominant assessment tools and remained so for over half a century. Only recently have publications explored alternatives or additions, including parameters that Apgar rejected, i.e., time to first breath and cry. Primitive reflexes have also recently been explored as additional delivery room assessment criteria.

Time to first breath

Prior to Apgar’s score, a commonly used variable was “time from delivery of the head to the first respiration”.75 This was abandoned until the early 1990s, when Saugstad et al.76 included it as a secondary outcome in their studies comparing air and 100% oxygen for neonatal resuscitation. From their Resair-2 trial,77 they noted that median time to first breath was significantly longer in infants exposed to hyperoxia during resuscitation compared to those receiving air. Time to first breath has also been used to assess the effects of delayed cord clamping78 and in low-resource settings by the Helping Babies Breathe program.79

Physiological considerations

Discontinuous variable fetal breathing movements quickly become regular and deeper after birth in order to clear lung liquid and open lung spaces. Stimuli include changes in the physical environment, increased pCO2, and chemoreceptor activation. These signals are processed in the two primary respiratory brainstem nuclei leading to efferent signaling to respiratory muscles.80 Hypoxia can depress this system and cause fetal primary apnea81 making the absence of breathing a potential marker of perinatal asphyxia.

Predictive value

In Resair-2,77 median time to first breath was related to 1 min Apgar score, potentially because breathing is a component of the score. There are no reports relating this parameter to short- or long-term outcome, or describing its accuracy, interobserver reliability, or confounders such as delivery mode, maternal status, or gestational age.

Time to first cry

This was also a Resair-2 outcome, and as with time to first breath, 100% oxygen use was associated with a longer time to first cry.77 However, the parameter was not included in any prognostic analysis. More recently, the Helping Babies Breathe program, which includes absence of cry as a decision point,82 used time to first cry in studies of the program. In one study 95.2% of 2059 babies cried within a minute while only 0.34% took more than 2 min or failed to cry.83 In another smaller study, as many as 18 out of 130 babies failed to cry.84

Physiological considerations

Crying is a major factor, but not an absolute requirement,5 in postnatal lung aeration due to its prolonged expiratory phase and expiratory braking.85 This braking enhances the distribution of gas via pendelluft flow. In addition, crying raises the intra-airway pressure,86 which helps keep the larynx open.

Predictive value

There are limited data on reliability or prognostic significance of time to cry, but an Indian report using parent recall of 311 deliveries demonstrated a higher risk of perinatal death if crying was delayed or absent.87

Somannavar et al.83 found discrepancies between time to cry measured using a recording device vs. trained observers in only 3 of 430 pairs, i.e., an accuracy of 99.3%. As with time to first breath, there is an absence of data regarding clinical factors that could influence this parameter.
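
The accuracy figure quoted above follows directly from the number of discrepant pairs. A minimal sketch of the arithmetic, in which the function name is a hypothetical helper and only the counts (3 discrepancies in 430 pairs) come from the text:

```python
def percent_agreement(n_pairs, n_discrepant):
    """Percentage of paired measurements on which two methods agree."""
    if n_pairs <= 0 or not 0 <= n_discrepant <= n_pairs:
        raise ValueError("need n_pairs > 0 and 0 <= n_discrepant <= n_pairs")
    return 100.0 * (n_pairs - n_discrepant) / n_pairs


print(f"{percent_agreement(430, 3):.1f}%")  # prints 99.3%
```

Simple percent agreement does not correct for agreement expected by chance, so it is not a substitute for a formal interobserver reliability analysis.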

Primitive reflexes

Primitive reflexes are automatic motor responses present in newborns throughout the first few months of life (Table 1).88 Their presence or absence may reflect neurological status and risk of cerebral palsy.

Table 1 Approximate gestational ages at which primitive reflexes first appear, and gestational age when consistently observed.88,90,91

Physiological considerations

Primitive reflexes are presumed to be mediated through the brainstem, influenced by inhibitory cortical pathways. They appear between 25 and 35 weeks of gestation (Table 1)88 and develop with increasing gestational age.89

Predictive value

When primitive reflexes were included in a comprehensive neurological discharge exam of 210 premature infants, the negative/positive predictive values for an abnormal exam were for cerebral palsy 91%/38% and for developmental delay 73%/69%.90 The palmar grasp reflex had low interobserver reliability in infants <33 weeks, but was equal to the other Neonatal Resuscitation and Adaptation Score (NRAS) parameters and to the Apgar score parameters for older newborns. Given this observation and the variable onset of the primitive reflexes before 25–26 weeks gestation, and their inconsistency prior to 30–32 weeks gestation, their inclusion in any newborn assessment methodology is questionable.
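
For readers less familiar with the predictive values quoted above, the sketch below shows how positive and negative predictive values are derived from a 2×2 table. The counts are hypothetical round numbers chosen for illustration; they are not the data of the cited study, and the function name is an assumption.

```python
def predictive_values(true_pos, false_pos, true_neg, false_neg):
    """Return (PPV, NPV) as fractions from the four cells of a 2x2 table."""
    abnormal_exams = true_pos + false_pos   # all infants with an abnormal exam
    normal_exams = true_neg + false_neg     # all infants with a normal exam
    if abnormal_exams == 0 or normal_exams == 0:
        raise ValueError("both exam groups must contain at least one infant")
    ppv = true_pos / abnormal_exams  # abnormal exam and outcome present
    npv = true_neg / normal_exams    # normal exam and outcome absent
    return ppv, npv


# Hypothetical counts: 50 abnormal exams (20 with the outcome), 100 normal exams (10 with it)
ppv, npv = predictive_values(true_pos=20, false_pos=30, true_neg=90, false_neg=10)
print(f"PPV {ppv:.0%}, NPV {npv:.0%}")
```

A high negative but low positive predictive value, as reported for cerebral palsy, means that a normal exam is reassuring whereas an abnormal exam is often a false alarm.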

Interference with the parameter

Reflexes depend on maturation. Although not in the delivery room, Allen and Capute91 found that all five infants born at 25 weeks of gestation demonstrated a grasp reflex, which led Jurdi et al.92 to include palmar grasp as the neurological parameter in their NRAS.

Future perspectives

Assessment of an infant’s condition in the delivery room is a prerequisite for applying adequate medical support. Virginia Apgar described five clinical parameters for the initial infant assessment. However, since that time maternal and neonatal care have changed; interventions have improved and infants may be more premature. Within the context of these changes, a new scoring system that guides clinicians in assessing infants and helps to decide how to support an infant’s postnatal adaptation has to be critically appraised and finally tested in clinical trials.

HR seems to be the most objective criterion with the lowest interobserver variability. Assessment of respiration shows interobserver variations especially in very low birth weight infants,51 but remains one of the most important parameters to guide delivery room interventions. Other parameters of respiration such as time to first breath and cry do have some merits, but data on reliability or prognostic significance are limited. In accordance with previous reports, color seems to be unreliable and should be replaced by an appropriately applied pulse oximetry SpO2 measurement. However, color could be of some benefit in low-resource settings where SpO2 cannot be measured until, for example, a smartphone application has been validated to reliably determine SpO2. Reflex irritability and muscle tone may be of some relevance, but are gestational age-dependent, which should be taken into consideration when judging them. Furthermore, all of these clinical parameters are significantly affected by medical interventions. In order to improve assessment in the future, the following four issues have to be addressed:

What is the assessment aiming at?

Since Apgar’s first aim, to “draw attention to the infant”, has been achieved, a re-focusing of assessment is needed. The most important function of delivery room assessment should be to determine how much medical support is needed and how the infant responds to these interventions. Thus, validation of a new assessment tool should be based on its value in guiding clinical management. However, all historical and recent clinical trials of newborn assessment tools93,94,95,96,97 have used short-term (in-hospital) morbidities as outcomes, making it difficult to parse out a predictive function from immediate postnatal status and response to therapy. An acceptable outcome measure for guiding delivery room management is needed before any new score can be properly validated. This is unlikely to be the typical neonatal outcome at 2 years corrected age, because too many confounders may intervene between the two events and/or a very large number of infants would have to be included, making such a study difficult to carry out.

How to account for developmental stages during assessment?

The condition of the infant depends on its maturity. The conventional Apgar score does not account for this and thus, low scores will reflect not only poor postnatal adaptation but also potentially immaturity. To overcome some of the shortcomings of the conventional Apgar score, the ‘Specified-Apgar’ score has been suggested to account for the infant’s developmental stage, assigning two points if the parameter is “appropriate for gestational age”.7,98 Although “appropriateness for gestational age” is not well defined, it is currently the only evaluated method that considers the infant’s maturity during assessment. Nomograms for different parameters might be incorporated into a new tool to minimize subjectivity, but could also interfere with the ease of use of the tool.

How to account for interventions required to achieve the condition?

Currently, there is no agreement on how to account for medical interventions when assessing an infant’s condition, partially explaining why Apgar scoring of extremely preterm infants systematically differed between units.25 If a new scoring system is to help decide how much medical support is needed and how the infant responds to these interventions, the interventions themselves should be taken into account. To limit the effects of the administered interventions on the infant’s condition, it has been suggested to stop intervention at 1, 5 and 10 min when the infant’s condition is assessed,60 an approach that is not practicable in clinical routine. As an alternative, the ‘Expanded-Apgar’ takes into consideration each intervention that is required to achieve a certain condition.98 Such an approach has been validated in preterm and term infants and seems to be a very concise and easily applicable tool to score interventions used during neonatal support.

How to prove the validity of a new tool?

In the past, the predictive capacity has been used to validate the conventional Apgar. Similarly, a good prediction of neonatal mortality and morbidity has been shown for the “Combined-Apgar”, which consists of “Specified-Apgar” and “Expanded-Apgar” and takes not only prematurity into account, but also the interventions needed to achieve the condition.7 However, the trials of the modified Apgar scores or the NRAS92 were performed in subject samples that were significantly skewed to high-risk infants, which does not reflect the universal way the Apgar score is used, and might obscure problems with false positive results when done in every delivery. In addition, as noted, there is no accepted outcome with which to determine how well a new tool would assess the condition of the newborn and the effect of interventions, or, if surrogate measures such as mortality or morbidities are used, how well these surrogates reflect the primary goal of the assessment-tool. Beyond that, any new tool will require validation studies to examine interobserver variability of the new scoring tool.

In the future, rapid advances in electronic hardware and computational power may help to deal with the increasing complexity of delivery room interventions through the development of devices that capture currently used and additional parameters and analyze them using artificial intelligence. Such a device should be able not only to adjust FiO2 and respiratory support, but also to give immediate feedback on the best approach to the individual infant.

Conclusion


Seventy years after Apgar’s seminal paper, there is not only a need for, but also a great opportunity to design, a new scoring system that can replace the conventional Apgar score. Such a scoring system should account for the infant’s maturity and the interventions administered. Finally, clinical validation studies should be undertaken to investigate the ability of such a system to define the need for interventions, to describe the efficacy of interventions, and to identify its potential to limit subjectivity and interrater variability.