To characterize acoustic features of an infant’s cry and use machine learning to provide an objective measurement of behavioral state in a cry-translator. To apply the cry-translation algorithm to colic, hypothesizing that these cries sound painful.
Assessment of 1000 cries in a mobile app (ChatterBaby™). Training a cry-translation algorithm by evaluating >6000 acoustic features to predict whether an infant cry was due to pain (vaccinations, ear-piercings), fussiness, or hunger. Using the algorithm to predict the behavioral state of infants with reported colic.
The cry-translation algorithm was 90.7% accurate for identifying pain cries, and achieved 71.5% accuracy in discriminating cries from fussiness, hunger, or pain. The ChatterBaby cry-translation algorithm overwhelmingly predicted that colic cries were most likely from pain, compared to fussy and hungry states. Colic cries had average pain ratings of 73%, significantly greater than the pain measurements found in fussiness and hunger (p < 0.001, 2-sample t test). Colic cries outranked pain cries by measures of acoustic intensity, including energy, length of voiced periods, and fundamental frequency/pitch, while fussy and hungry cries showed reduced intensity measures compared to pain and colic.
Acoustic features of cries are consistent across a diverse infant population and can be utilized as objective markers of pain, hunger, and fussiness. The ChatterBaby algorithm detected significant acoustic similarities between colic and painful cries, suggesting that they may share a neuronal pathway.
All infants cry to motivate their caregivers to respond to their needs.1 As a result, caregivers tend to interpret a baby crying as a signal of distress or need. Infants follow a predictable cry curve with a peak in intensity at around 6–8 weeks, and persistence after 3 months may be considered pathologic.2 The ability to distinguish pathological cries in infants using acoustic feature extraction and classification algorithms is validated in the literature; 27 prior studies were able to discriminate pathological infant cries (Down’s syndrome, brain damage, Cri du Chat) with an average accuracy rate of 96.9%.3
Acoustic analyses of an infant’s cry could be instrumental in the home setting. Despite caregivers’ best intentions, interpretation of infant cries can be difficult. The perceptions of the listener can be influenced by their sleep habits, mental state, their own physiologic response to the cry, and other sociodemographic factors.4,5 Machine learning could offer an objective assessment of the acoustic features of infant cries to translate their behavioral states.6 This would contribute significantly to infant care by distinguishing if an infant was experiencing pain or if they were responding to another behavioral state (i.e., hunger or being fussy).
It is not only in the home environment that machine learning could aid in infant care. Clinical care and especially hospital settings focus on mitigation of infant pain. Historically, it was believed that infants were incapable of feeling pain.7 However, recent research into the developmental physiology of nociception indicates that the opposite is true. Untreated pain in neonates can leave a lasting neurophysiological footprint associated with decreased brain8,9 and body growth,10 altered neural connections and organization,11,12 poorer cognitive and motor function,13 impaired visual–motor integration, and poorer executive functioning skills.14,15 To assess pain, providers rely upon rating scales such as the Neonatal Infant Pain Scale,16 Premature Infant Pain Profile,17 Face, Legs, Activity, Cry, and Consolability scale,18 and Crying, Oxygenation, Vital signs, facial Expression, and Sleeplessness scale,19 among others. Most estimates of inter-rater reliability of infant scales are high,16,20,21 although some studies show poor agreement in measurements across these scales,22,23 suggesting that both clinical factors and the choice of scale may strongly influence the magnitude and the reliability of these pain measurements. In addition to measurement of pain using subjective infant pain scales, smaller-sample studies have found that infants in pain cry differently from infants who are not experiencing pain, with algorithms showing between 74% and 90% accuracy, as discussed further in the Supplementary Material.
These small-sample algorithms were not portable by nature; this leaves room for a universally applicable machine learning program to help home caregivers and medical providers accurately assess infant cry and determine when the infant is experiencing pain vs. another behavioral state. Seeking a quantitative measure of infant cries, we created a free phone app, ChatterBaby™, as an accessible and portable algorithm deployment to predict whether a baby’s cry was due to one of three behavioral states: pain, hunger, or fussiness. The algorithms were then applied to infant cries where parents reported their infants as having colic. This process simulates an initial clinical visit where the parent has complaints of colic and a workup for conditions like reflux esophagitis or infantile migraine may be initiated and diagnosed. We hypothesized that colic cries would be acoustically similar to pain cries, a finding that would explain and validate caregiver distress regarding caring for an infant with colic.
This ChatterBaby study was approved by the UCLA Institutional Review Board (IRB#15-000931) and conducted according to its requirements. Painful stimuli were defined by needles: routine vaccinations (without analgesia) and elective ear-piercings. Because audio was recorded in the natural environment, infants were in a variety of settings while being recorded, with naturally occurring ambient background noise (adult voices, etc.), and were recorded using different devices (e.g., cell-phones). Full details on data acquisition and statistical methodology are provided in Supplementary Material. In Supplementary Material, we also present a secondary cry detection algorithm that screens out cries from baby neutral/baby laughing/nuisance sounds.
After quality control, the study population for the primary training dataset of cry states (Fussy, Hungry, Pain) included 691 infants (36% female) who were between the ages of 0 and 24 months (average age 3 months). Approximately 55% of infants’ ages were missing because submission of this variable was voluntary. In all, 75% of the infants assessed were <6 months of age. All primary cries were from unique episodes and users. The colic population included 64 infants between the ages of 2 days and 4 months, with a median age of 2 months.
Pain cries (n = 353) were captured during two acutely painful stimuli (vaccinations, ear-piercings). Caretakers characterized other cries as “fussy” or “hungry,” followed by independent characterization of each cry sound by two multiparous raters (authors A.E.A. and B.S.). No cries in the ChatterBaby training database were from any of the authors’ children. Cries without unanimous agreement among the three-member rating panel (11.8%) were excluded from further analyses and were not reclassified. This process resulted in 171 fussy cries, 167 hungry cries, and 353 pain cries in the final training cohort. Colic cries were nominated by the parent/caretaker. Multiple colic cry samples (n = 380, 64 babies) were acquired across each cry episode, including ending periods where whimpering/fussing may have been present, to avoid selection bias in sample collection and assessment. This method of data acquisition yielded roughly 30 s (6 samples) of cry time from each child, a process that provided a wider range of time than is typically seen in most studies. Spectrograms for a single cry from each type are presented in Fig. 1.
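The unanimity rule described above can be illustrated with a minimal sketch. It assumes each cry carries three labels (caretaker plus two raters); the record layout, the function name, and the example records are hypothetical and are not study data or the authors' code.

```python
from typing import NamedTuple


class RatedCry(NamedTuple):
    cry_id: str
    caretaker: str  # label supplied by the caretaker
    rater_a: str    # label from the first independent rater
    rater_b: str    # label from the second independent rater


def keep_unanimous(cries):
    """Return (cry_id, label) pairs for cries on which all three labels agree."""
    kept = []
    for cry in cries:
        labels = {cry.caretaker, cry.rater_a, cry.rater_b}
        if len(labels) == 1:  # unanimous: only one distinct label
            kept.append((cry.cry_id, cry.caretaker))
    return kept


# Made-up example records for illustration only.
example = [
    RatedCry("c1", "fussy", "fussy", "fussy"),
    RatedCry("c2", "hungry", "fussy", "hungry"),
    RatedCry("c3", "hungry", "hungry", "hungry"),
]
kept = keep_unanimous(example)
excluded_fraction = 1 - len(kept) / len(example)
```

Cries that fail the check are dropped rather than relabeled, mirroring the statement that excluded cries were not reclassified.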
Infant cries were summarized using the acoustic feature set previously used to identify pathological vocal patterns in neurological disorders such as Parkinson’s Disease24 and Amyotrophic Lateral Sclerosis,25 extracting >6000 acoustic features from each cry. Supra-segmental (utterance-level) acoustic features were extracted from 5-s cry clips26,27,28 using IS13_ComParE.conf in OpenSmile.29
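As a rough illustration of the extraction step, the sketch below splits a recording into 5-s windows and builds an openSMILE command line for each clip. The non-overlapping windowing, the file names, and the helper names are assumptions made for illustration; only the configuration file name comes from the text, and `-C`, `-I`, and `-O` are the standard `SMILExtract` options for configuration, input, and output.

```python
def clip_windows(duration_s, clip_s=5.0):
    """Return (start, end) times of consecutive, non-overlapping full-length clips."""
    n_clips = int(duration_s // clip_s)
    return [(i * clip_s, (i + 1) * clip_s) for i in range(n_clips)]


def smilextract_command(wav_path, out_path, config="IS13_ComParE.conf"):
    """Build a SMILExtract argument list (config, input audio, output file)."""
    return ["SMILExtract", "-C", config, "-I", wav_path, "-O", out_path]


# A hypothetical 30-s recording yields six 5-s clips.
windows = clip_windows(30.0)

# Hypothetical file names; the list could be passed to subprocess.run
# on a machine where openSMILE is installed.
command = smilextract_command("cry_clip_000.wav", "cry_clip_000_features.arff")
```

Building the argument list separately from running it keeps the sketch testable without an openSMILE installation.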
To create the cry translation algorithm, a probabilistic random forests classifier was used to predict the category of a cry (fussy, hungry, pain) given its acoustic features using default parameter settings in R (500 trees, 1/3 of features sampled with replacement as possible predictors to construct individual trees).30 The random forests out-of-sample classification accuracy, analogous to the cross-validation error, was computed to estimate the testing accuracy of the algorithm on new data (Table 1). Further technical details are presented in Supplementary Material.
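The out-of-sample estimate works because each tree in a random forest is fit on a bootstrap sample, so some cries are left out of that tree and can serve as held-out test cases for it. The sketch below is not the authors' R code; it only simulates the bootstrap step, using the 691 training cries and 500 trees stated in the text, to show how much data is held out per tree on average. The function name and seed are illustrative.

```python
import random


def out_of_bag_fraction(n_samples, n_trees, seed=0):
    """Average fraction of samples left out of each tree's bootstrap sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        # Draw n_samples indices with replacement; the set holds the unique ones.
        in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
        total += (n_samples - len(in_bag)) / n_samples
    return total / n_trees


oob_fraction = out_of_bag_fraction(n_samples=691, n_trees=500)

# Theoretical value: the chance a given cry is never drawn in n draws.
expected = (1 - 1 / 691) ** 691
```

Roughly a third of the cries are out-of-bag for any given tree, so every cry is scored by many trees that never saw it during training.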
Using only the 200 most predictive features, the algorithm was retrained on the primary cries and tested on the colic cries, with roughly 6 colic cries obtained from the same cry episode per child (~30 s). Testing longer cry segments from the colic infants reduces the probability of selection bias; the acoustic samples included segments of milder fussiness and whimpering following extreme bouts of crying, when available. The average pain probability from colic cries was compared with the out-of-sample pain-level predictions from the primary cries (fussy, hungry, pain) to test the hypothesis that colic cries were more closely associated with pain than with the hungry or fussy states. We additionally assessed for longitudinal/age effects by testing for temporal drift within a single child who was not used for algorithm training, using cry recordings collected six separate times during routine vaccinations between 87 and 618 days of age, without analgesia.
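Selecting the most predictive features before retraining can be sketched as a simple ranking by importance score. The function name, feature names, and scores below are hypothetical; the study's importance measure is described in its Supplementary Material and is not reproduced here.

```python
def top_k_features(importances, k=200):
    """Return the names of the k features with the highest importance scores."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Made-up importance scores for illustration only.
example_importances = {
    "feat_a": 0.02,
    "feat_b": 0.31,
    "feat_c": 0.00,
    "feat_d": 0.12,
}
selected = top_k_features(example_importances, k=2)
```

The retained feature names would then be used to subset the feature matrix before the classifier is refit on the primary cries.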
The primary cry algorithm achieved an overall accuracy of 71.5% in classifying among the three states, with the confusion matrix shown in Supplementary Material. The primary cry algorithm, trained as a multiclass classifier, was then treated as a binary classifier for obtaining Pain accuracy rates by pooling the Fussy and Hungry predictions as a “No Pain” category. The predictive accuracies for painful cries are shown in Table 1: sensitivity/recall of 0.91 (95% confidence interval (CI) = 0.876, 0.937), specificity = 0.68 (95% CI = 0.628, 0.727), positive predictive value = 0.75, negative predictive value = 0.87. The prevalence of Pain was 0.51, with the algorithm performing significantly above chance (p < 0.001). The area under the curve (AUC) was 0.88, as shown in the receiver operating characteristic curve in Supplementary Material. The AUC measures how effective the algorithm is at separating true positives and false positives over a range of decision thresholds.
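The reported predictive values follow from sensitivity, specificity, and prevalence through Bayes' rule. The sketch below recomputes them from the rounded figures in the text (sensitivity 0.907, specificity 0.68, and prevalence taken as 353 pain cries out of 691), so it reproduces the published values only to within rounding; the function name is illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence              # true positive rate in the population
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    return tp / (tp + fp), tn / (tn + fn)


# Prevalence of Pain in the training cohort: 353 pain, 171 fussy, 167 hungry.
prevalence = 353 / (353 + 171 + 167)
ppv, npv = predictive_values(0.907, 0.68, prevalence)
```

Because the classes are nearly balanced (prevalence close to 0.5), the predictive values stay close to what the sensitivity and specificity alone would suggest; in a setting where pain cries are rarer, the positive predictive value would be lower.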
Although the random forests algorithm had access to >6000 features, many of these features had low importance values and were not useful to discriminate among the different cry states, as shown in Supplementary Material. When testing the algorithm on colic cries, the probability of pain was significantly different across the three predictive states (p < 0.0001, analysis of variance), with the typical colic cry assigned a 73% probability of being painful. When comparing colic cries to fussy/hungry cries, the pain levels in colic were significantly greater than the pain levels seen in fussy and hungry cries (p < 0.001, 2-sample t test, Bonferroni corrected). As shown in Fig. 2, the average pain rating in colic was 0.73 (sd = 0.21), while the average out-of-sample predicted pain rating for fussy was 0.30 (sd = 0.18), hungry = 0.38 (sd = 0.19), and pain = 0.67 (sd = 0.20).
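The size of these group differences can be checked approximately from the summary statistics alone. The sketch below makes several assumptions that are not taken from the study: it uses the Welch (unequal-variance) form of the 2-sample t statistic, treats the 64 colic infants as the colic sample size (the analysis may instead have used the 380 clips), approximates the p value with a normal distribution rather than a t distribution, and applies a Bonferroni factor of 2 for the two comparisons shown.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and approximate degrees of freedom from summary statistics."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


def normal_two_sided_p(t):
    """Two-sided p value under a normal approximation (an assumption for large df)."""
    return math.erfc(abs(t) / math.sqrt(2))


def bonferroni(p, n_comparisons):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p * n_comparisons)


# Means and sds from the text; sample sizes are assumptions (see lead-in).
t_fussy, df_fussy = welch_t(0.73, 0.21, 64, 0.30, 0.18, 171)
t_hungry, df_hungry = welch_t(0.73, 0.21, 64, 0.38, 0.19, 167)
p_fussy = bonferroni(normal_two_sided_p(t_fussy), 2)
p_hungry = bonferroni(normal_two_sided_p(t_hungry), 2)
```

Even with the smaller, more conservative sample size for colic, both adjusted p values fall well below 0.001 under these assumptions, in line with the reported result.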
Previous literature demonstrated an increased pitch (fundamental frequency) in both pain and colic31 cries compared to fussy and hungry states, which we confirmed here (p < 0.05; 2-sample t test with Bonferroni correction, see Supplementary Material). The pitch did not significantly differ between colic and pain (p > 0.05, 2-sample t test with Bonferroni correction; see Supplementary Material). For many acoustic features such as loudness, energy, and pitch, the ordinal values fell in a spectrum ranging from fussy, hungry, pain, to colic. This spectrum suggests that colic cries are more intense acoustically than vaccination cries, although the clinical interpretation of this acoustic relationship is unknown. Fussy cries were the mildest acoustically across many acoustic metrics.
It is possible to use mobile recording methods to provide accurate and usable clinical information on an infant’s cry and behavioral state. An easily accessible mobile app, built on prior knowledge of the acoustical features of pathological cries in infancy, achieved 70–90% accuracy. It was used to further explore a common diagnosis of infancy affecting one in five neonates and defined entirely by excessive crying: colic.
The acoustic markers of pain were multiple and complex, extending far beyond changes in pitch as was reported previously in the literature.31 The colic cries were not different from pain cries in their fundamental frequency (2-sample t test, p > 0.05), but the colic fundamental frequency was significantly elevated compared to hungry and fussy vocalizations (p < 0.05, Bonferroni corrected). This confirms the earlier findings of Lester et al. and St James-Roberts,32,33 which relied on significantly smaller sample sizes than those assessed here.
Our work demonstrated that colic cries are more similar to pain cries than to either fussy or hungry cries, suggesting that colic could be a painful condition for infants or share similar source processes.34 Often colic occurs in the evening and clinicians do not observe it and have to rely on caregivers’ reports of the crying. Positive reinforcement and support for caregivers is considered the standard of clinical care in colic2 and focuses on helping caregivers through a stressful period. In 95% of cases of colic, a thorough workup for underlying medical disorders fails to uncover a definitive explanation for the infant’s presentation, and these infants will develop normally once they “outgrow” their colic.
Despite treatment through reassurance from providers, infantile colic is associated with increased rates of maternal anxiety and depression.35,36,37,38,39 Our results suggest that parents may be distressed by the cries of infants with colic because they may hear acoustic signatures indicative of pain, as demonstrated by the algorithm. As such, clinicians might consider pain control (e.g., appropriate Tylenol dosing or behavioral pain control methods) as part of the management of colic.
There are several limitations to this study. Not all infants may respond to pain with a cry; thus, a subset of infants experiencing pain may not have been included. Our pain cries were in response to acutely painful stimuli; chronic pain may not show the same acoustic features. This could be elucidated with more pain samples from infants experiencing chronic pain (e.g., hospitalization with need for multiple procedures, such as intravenous access and lumbar punctures). Pain is also a subjective feeling, and the degree of pain experienced by infants in the study could not be assessed. In addition, colic cries were labeled using parental assessment. It is not known whether these infants carried a clinical diagnosis of colic or whether they ever underwent any treatment for underlying medical conditions. Future studies will focus on clinically determined colic, rather than relying solely upon parental assessment. Of note, the diagnosis of colic is often based on history; so although this is a limitation, parental report is likely also used diagnostically in the pediatrician’s office. The environment of the data collection was varied because recordings were made by the caretakers: infants were in a variety of positions while vocalizing, naturalistic background noises (including adult voices and small children) were present, and a variety of recording devices such as cell-phones were used. However, the absence of a controlled environment simulates the variability of the testing environment in which these algorithms ultimately will be used, providing a more realistic estimate than previously published work on how these algorithms will fare when applied to new infants in new environments. Finally, we intentionally did not optimize the machine learning parameters within this algorithm, in order to avoid biasing the testing accuracy estimate. Our results are likely a lower bound for predictive accuracy, which we will refine with new data using deep learning algorithms.
Cry profiles may differ by age, but secondary testing suggested that age was unlikely to have affected our results. Within the longitudinal vaccination recordings from a single child who was not used to create the algorithm (six recordings taken between 87 and 618 days of age), the Fussy/Hungry/Pain algorithm predicted similarly and consistently that the baby was experiencing pain for all six trials (average pain probability = 0.63, sd = 0.04). This suggests that the algorithm was not sensitive to aging effects within the age range evaluated (Fig. 3; see also Supplementary Material for spectrograms of vaccine cries across age). Five-second audio clips from this child’s vaccinations over an 18-month period are available online at https://www.youtube.com/watch?v=eu332YZFTkA. Infant age and demographics were voluntarily provided, resulting in missing data. Because of this, we could not determine whether the predictive accuracy of our algorithm depends upon an infant’s age or whether our algorithm performs differently on preterm infants or those with developmental disorders. Although the cry patterns of the single longitudinally recorded child were consistent across age (Fig. 3), we cannot rule out different growth patterns in other children.
Although infant pain has both short- and long-term consequences, previously there was no automated quantitative device for pain or behavioral assessment in the home environment where most crying occurs. We developed a solution as a free smartphone app, ChatterBaby, available at https://chatterbaby.org. The measurements derived from the ChatterBaby algorithm may have in-hospital functions as well, a direction for future research. Passive acoustic pain assessment could serve as a complement to infant pain scales or a baseline metric for comparison of existing infant pain scales. With machine learning, we explored the acoustical features of excessive crying or colic. Future work will explore further evidence of whether colic is painful or whether colic merely shares similar neuronal connections with pain sensations. Such a distinction would identify whether pain control merits inclusion in colic treatment. The benefits and utility of a cry-translation algorithm have yet to be realized in clinical practice but are promising and wide-reaching, meriting further investigation.
LaGasse, L. L., Neal, A. R. & Lester, B. M. Assessment of infant cry: acoustic cry analysis and parental perception. Ment. Retard. Dev. Disabil. Res. Rev. 11, 83–93 (2005).
Douglas, P. & Hill, P. Managing infants who cry excessively in the first few months of life. BMJ 343, d7772 (2011).
Saraswathy, J., Hariharan, M., Yaacob, S. & Khairunizam, W. Automatic classification of infant cry: a review. In 2012 International Conference on Biomedical Engineering (ICoBE) 543–548 (IEEE, 2012).
Esposito, G., Nakazawa, J., Venuti, P. & Bornstein, M. H. Judgment of infant cry: the roles of acoustic characteristics and sociodemographic characteristics. Jpn. Psychol. Res. 57, 126–134 (2015).
Joosen, K. J. et al. Physiological reactivity to infant crying and observed maternal sensitivity. Infancy 18, 414–431 (2013).
Cabon, S. et al. Video and audio processing in paediatrics: a review. Physiol. Meas. 40, 02TR02 (2019).
Bellieni, C. V. Pain assessment in human fetus and infants. AAPS J. 14, 456–461 (2012).
Smith, G. C. et al. Neonatal intensive care unit stress is associated with brain development in preterm infants. Ann. Neurol. 70, 541–549 (2011).
Brummelte, S. et al. Procedural pain and brain development in premature newborns. Ann. Neurol. 71, 385–396 (2012).
Grunau, R. E., Holsti, L. & Peters, J. W. Long-term consequences of pain in human neonates. Semin. Fetal Neonatal Med. 11, 268–275 (2006).
Anand, K. & Scalzo, F. M. Can adverse neonatal experiences alter brain development and subsequent behavior? Neonatology 77, 69–82 (2000).
Anand, K. J. & Hickey, P. R. Pain and its effects in the human neonate and fetus. N. Engl. J. Med. 317, 1321–1329 (1987).
Mitchell, A. & Boss, B. J. Adverse effects of pain on the nervous system of newborns and young children: a review of the literature. J. Neurosci. Nurs. 34, 228 (2002).
Hall, R. W. & Anand, K. J. Pain management in newborns. Clin. Perinatol. 41, 895–924 (2014).
Anand, K. J., Papanicolaou, A. C. & Palmer, F. B. Repetitive neonatal pain and neurocognitive abilities in ex-preterm children. Pain 154, 1899–1901 (2013).
Lawrence, J. et al. The development of a tool to assess neonatal pain. Neonatal Netw. 12, 59–66 (1993).
Stevens, B., Johnston, C., Petryshen, P. & Taddio, A. Premature infant pain profile: development and initial validation. Clin. J. Pain 12, 13–22 (1996).
Manworren, R. C. & Hynan, L. S. Clinical validation of FLACC: preverbal patient pain scale. Pediatr. Nurs. 29, 140 (2003).
Krechel, S. W. & Bildner, J. CRIES: a new neonatal postoperative pain measurement score. Initial testing of validity and reliability. Pediatr. Anesth. 5, 53–61 (1995).
Jonsdottir, R. B. & Kristjansdottir, G. The sensitivity of the premature infant pain profile–PIPP to measure pain in hospitalized neonates. J. Eval. Clin. Pract. 11, 598–605 (2005).
Taddio, A. et al. Evaluation of the reliability, validity and practicality of 3 measures of acute pain in infants undergoing immunization injections. Vaccine 29, 1390–1394 (2011).
Ahn, Y., Kang, H. & Shin, E. Pain assessment using CRIES, FLACC and PIPP in high-risk infants. J. Korean Acad. Nurs. 35, 1401–1409 (2005).
Suraseranivongse, S. et al. A comparison of postoperative pain scales in neonates. Br. J. Anaesth. 97, 540–544 (2006).
Grósz, T., Busa-Fekete, R., Gosztolya, G. & Tóth, L. Assessing the degree of nativeness and Parkinson’s condition using Gaussian processes and deep rectifier neural networks. In Sixteenth Annual Conference of the International Speech Communication Association (INTERSPEECH, 2015).
Wang, J. et al. Predicting intelligible speaking rate in individuals with amyotrophic lateral sclerosis from a small number of speech acoustic and articulatory samples. Workshop Speech Lang. Process. Assist. Technol. 2016, 91–97 (2016).
Schuller, B. et al. The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism. In INTERSPEECH (Conference of the International Speech Communication Association) (INTERSPEECH, 2013).
Schuller, B. W. et al. The INTERSPEECH 2014 computational paralinguistics challenge: cognitive & physical load. In INTERSPEECH (Conference of the International Speech Communication Association) 427–431 (INTERSPEECH, 2014).
Schuller, B. et al. The INTERSPEECH 2015 computational paralinguistics challenge: nativeness, Parkinson’s & eating condition. In Sixteenth Annual Conference of the International Speech Communication Association (INTERSPEECH, 2015).
Eyben, F., Wöllmer, M. & Schuller, B. Opensmile: the Munich versatile and fast open-source audio feature extractor. In Proc. 18th ACM International Conference on Multimedia 1459–1462 (ACM, 2010).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Barr, R. G., Paterson, J. A., MacMartin, L. M., Lehtonen, L. & Young, S. N. Prolonged and unsoothable crying bouts in infants with and without colic. J. Dev. Behav. Pediatr. 26, 14–23 (2005).
Lester, B. M., Boukydis, C. Z., Garcia-Coll, C. T., Hole, W. & Peucker, M. Infantile colic: acoustic cry characteristics, maternal perception of cry, and temperament. Infant Behav. Dev. 15, 15–26 (1992).
St James-Roberts, I. What is distinct about infants’ “colic” cries? Arch. Dis. Child. 80, 56–62 (1999).
Harb, T., Frederiksen, N. & Hill, R. J. Is sensory processing an issue for infants with colic? Infant Behav. Dev. 48, 105–113 (2017).
Humphry, R. A. & Hock, E. Infants with colic: a study of maternal stress and anxiety. Infant Ment. Health J. 10, 263–272 (1989).
Stifter, C. & Bono, M. The effect of infant colic on maternal self-perceptions and mother–infant attachment. Child Care Health Dev. 24, 339–351 (1998).
Vik, T. et al. Infantile colic, prolonged crying and maternal postnatal depression. Acta Paediatr. 98, 1344–1348 (2009).
Carey, W. B. Maternal anxiety and infantile colic: is there a relationship? Clin. Pediatr. (Phila.) 7, 590–595 (1968).
Maxted, A. E. et al. Infant colic and maternal depression. Infant Ment. Health J. 26, 56–68 (2005).
This research was supported by a subaward from NIH/National Center for Advancing Translational Science (NCATS #UL1TR001881 to A.E.A.), University of California, Los Angeles (UCLA) Clinical and Translational Science Institute (CTSI Grant Number UL1TR000124 to A.E.A.), the Autism Center for Excellence (NICHD #2P50HD055784-11 to A.E.A.), Jane and Terry Semel Institute for Neuroscience and Human Behavior at UCLA, the David Weil Endowment Fund, and the Burroughs Wellcome Fund (Career Award at the Scientific Interface). We thank Malina Beatrice, Herbert Powell, Shafali S. Jeste, Delbert Whetter, Sanaz Whetter, Mirella Diaz Santos, Yvette Quezada, and Enzo Acerbi for additional comments and feedback on our approach. We gratefully acknowledge the support of all ChatterBaby app users and of our many additional collaborators on this project, who are listed in detail at https://chatterbaby.org.
ChatterBaby™ technology is owned by the University of California, Los Angeles and is available as a free smartphone app for iOS and Android devices. The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Parga, J.J., Lewin, S., Lewis, J. et al. Defining and distinguishing infant behavioral states using acoustic cry analysis: is colic painful?. Pediatr Res 87, 576–580 (2020). https://doi.org/10.1038/s41390-019-0592-4