The behavioral consequences of anesthetic neurotoxicity in the developing brain were first described in rats a decade ago (Jevtovic-Todorovic et al, 2003). Exposure to an anesthetic combination consisting of nitrous oxide, isoflurane, and midazolam caused widespread neuronal death and impaired learning and memory months later (Jevtovic-Todorovic et al, 2003). Almost immediately, scientists, clinicians, and the public began to ask whether the same could happen in humans, a question that remains unanswered. Some studies have suggested that anesthesia has long-term effects, while others have not (Bartels et al, 2009; Block et al, 2012; DiMaggio et al, 2009; DiMaggio et al, 2011; Hansen et al, 2013; Ing et al, 2012; Kalkman et al, 2009; Satomoto et al, 2009; Sprung et al, 2012; Sprung et al, 2009; Wilder et al, 2009). The most important limitation of this body of literature is the inability to dissociate general anesthesia from the underlying disease or the surgical procedure that required it. Also, it is unclear to what degree coexisting conditions known to affect cognition in children, such as low birth weight (Pyhala et al, 2011) or prematurity (for review see Volpe, 2009), affected the results. In rats, anesthesia can be administered in the absence of surgical procedures. Further, randomization should distribute rats with preexisting conditions, if any, evenly between anesthesia and control groups. If anesthesia caused a similar cognitive outcome in humans and rats, it could reasonably be assumed that this outcome is indeed due to the anesthetic.

Animal studies have linked anesthesia to impairments on recognition memory tests (Jevtovic-Todorovic et al, 2003; Kodama et al, 2011; Sanders et al, 2009; Satomoto et al, 2009; Shih et al, 2012; Stratmann et al, 2009b; Stratmann et al, 2009c; Zhu et al, 2010). Recognition memory judgments can depend on either recollecting specific event details or experiencing familiarity for the event (for review see Yonelinas et al, 2010). Recollection is supported by the hippocampus, anterior thalamic nuclei, frontal cortex, retrosplenial/posterior cingulate cortex, and the white matter tracts connecting these regions (Yonelinas et al, 2010). Some of these anatomic areas are prime targets of anesthesia-induced neurodegeneration in the developing brain of animals (for review see Loepke and Soriano, 2008). Familiarity, by contrast, is supported by anterior medial temporal cortices, including the perirhinal cortex (for review see Eichenbaum et al, 2007), a region that is generally not affected by anesthesia-induced neurodegeneration (for review see Loepke and Soriano, 2008). We tested the hypothesis that anesthesia in infancy impairs recognition memory and that this impairment arises from reduced recollection rather than reduced familiarity. Recollection and familiarity can be tested in humans, and recollection- and familiarity-like memory can be tested in rats, using similar cognitive assays (Eichenbaum et al, 2007; Fortin et al, 2004; Sauvage et al, 2008; Yonelinas, 1997). Here we show that, in both humans and rats, general anesthesia in infancy impairs recognition memory performance through impaired recollection but not familiarity.


Human Study

With IRB approval, the 2004 anesthesia billing databases of the University of California San Francisco and the University of California Davis were queried for eligible participants. Retrieved information included name, date of birth, gender, American Society of Anesthesiologists (ASA) physical status, body weight, date of surgery, type of surgery, diagnosis, anesthesia start time, time of entry into the operating room, and time of exit from the operating room. Children were eligible if they had received a general anesthetic before age 2, were 6–11 years of age at the time of testing, and were generally healthy (ASA physical status 1 or 2 without diagnoses or operations that might be associated with cognitive impairment, such as neurosurgical procedures or congenital heart disease). The lower age limit of 6 years was chosen because younger children do not reliably perform the recognition memory tests. The upper age limit of 11 years was chosen because older children were more likely to have received anesthetic agents that are no longer commonly used, such as thiopental, halothane, or enflurane.

The chart was reviewed for further inclusion criteria: an anesthetic dose greater than 120 minimum alveolar concentration (MAC) × min; induction of general anesthesia with either a volatile agent (± nitrous oxide) or propofol; and maintenance with either a volatile agent or a combination of a volatile agent and nitrous oxide. MAC is a measure of anesthetic potency: it is the alveolar concentration of an inhaled anesthetic required to prevent movement in response to a painful stimulus in 50% of subjects. We chose 120 MAC × min because this anesthetic dose caused robust neurodegeneration in the developing rodent brain, whereas 60 MAC × min did not (Stratmann et al, 2009a). Owing to a clerical error, two patients whose anesthetic doses were less than 120 MAC × min (40 and 59 MAC × min) were enrolled. Their data were included because excluding them did not change the results or conclusions.
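A cumulative dose in MAC × min can be illustrated as the sum, over time intervals, of the delivered concentration expressed as a fraction of the agent's MAC, multiplied by the interval length. The sketch below is one plausible way to do this arithmetic, not necessarily how the study's doses were computed from the charts. It assumes end-tidal concentrations recorded per interval, treats MAC fractions of co-administered agents as additive, and uses hypothetical MAC values (real MAC depends strongly on age in infancy). The names `ILLUSTRATIVE_MAC_PERCENT`, `mac_minutes`, and `meets_inclusion_threshold` are illustrative.

```python
# Minimal sketch: cumulative anesthetic dose in MAC x min.
# ASSUMPTION: the MAC values below are illustrative placeholders, not study values.
ILLUSTRATIVE_MAC_PERCENT = {
    "sevoflurane": 3.0,      # hypothetical infant MAC, % atm
    "isoflurane": 1.6,       # hypothetical
    "nitrous_oxide": 104.0,  # hypothetical
}


def mac_minutes(intervals):
    """intervals: list of (minutes, {agent: end-tidal concentration in %}).

    Returns the summed dose in MAC x min, treating the MAC fractions of
    co-administered agents as additive (an assumption)."""
    total = 0.0
    for minutes, concentrations in intervals:
        mac_fraction = sum(conc / ILLUSTRATIVE_MAC_PERCENT[agent]
                           for agent, conc in concentrations.items())
        total += mac_fraction * minutes
    return total


def meets_inclusion_threshold(dose, threshold=120.0):
    """Inclusion criterion from the text: dose greater than 120 MAC x min."""
    return dose > threshold


# 60 min at 1.0 MAC sevoflurane, then 90 min at 0.5 MAC sevoflurane + 0.5 MAC N2O
example = [
    (60, {"sevoflurane": 3.0}),
    (90, {"sevoflurane": 1.5, "nitrous_oxide": 52.0}),
]
dose = mac_minutes(example)
print(dose, meets_inclusion_threshold(dose))  # 150.0 True
```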

Parents of eligible participants were contacted to evaluate exclusion criteria and to obtain consent. Exclusion criteria, identified from a phone interview and chart review, included ASA physical status 3 or greater, impairments of attention or learning, preexisting conditions possibly associated with neurocognitive impairment (eg, a history of disease or trauma affecting the central nervous system), cancer, premature delivery (<37 weeks gestation), low birth weight (<5 lb), and known genetic syndromes. Further exclusion criteria were potentially confounding intraoperative physiologic derangements persisting for more than 5 min: hypotension (systolic, diastolic, or mean arterial pressure more than 30% below baseline), bradycardia (heart rate more than 30% below baseline), hypoxemia (oxygen saturation <93%), hypercarbia (PaCO2 >60 mm Hg), or dysthermia (temperature <35 °C or >35 °C). Finally, patients were excluded when meaningful participation in the study was unlikely (eg, color blindness or inability to speak English).

The control sample was recruited from a registry of parents who had previously expressed interest in having their children participate in research. Control children were selected to match the anesthesia group for age and gender. Potential control participants were excluded if the child had received general anesthesia, met any exclusion criterion described for the anesthesia group, or was outside the targeted age range.

Recognition memory was tested using two separate tasks, a color task and a spatial task. Afterward, the Wechsler Abbreviated Scale of Intelligence was administered to obtain verbal, performance, and full-scale IQ scores. While participants were tested, their parents provided demographic and disease-related information on a questionnaire, which was consolidated with information from medical records. Parents also completed the Child Behavior Checklist (CBCL), a standardized parent-report questionnaire on which a child is rated for behavioral adjustment.

Recognition Memory Measures

Two separate tests were used to assess recognition memory as previously described (Ghetti et al, 2010). The examiner was blind to exposure status, and interactions with participants were standardized and scripted. The tests yield a recollection index, a familiarity index, and a measure of source memory. Recollection and familiarity were determined using receiver operating characteristic (ROC) analysis of recognition memory data (Ghetti and Angelini, 2008; Yonelinas, 1994; Yonelinas et al, 2002).

For each task, eighty black ink drawings on a white, square background were presented sequentially on a computer screen. In the color task, the drawing was presented with one of four colored borders (red, blue, yellow, or green). In the spatial task, the drawing was presented in one of four quadrants of the screen. Subjects were instructed to remember the drawing and the border color or spatial location associated with it. The four conditions of color or location were used with equal frequency and in random order, and the sequence of the two tests (color and spatial) was counterbalanced across participants. Items were chosen from a set of 320 unambiguous line drawings that were validated with child participants for familiarity, visual complexity, and name agreement (Cycowicz et al, 1997).
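The study-list constraints described above (four conditions, each used equally often, in random order, with task order counterbalanced across participants) can be sketched as follows. This is an illustration, not the study's presentation software: the item identifiers, the seeded random generator, and the simple alternating counterbalancing rule are all assumptions, and `build_study_list` and `task_order` are hypothetical names.

```python
import random

COLORS = ["red", "blue", "yellow", "green"]  # the four border colors in the color task


def build_study_list(drawings, conditions=COLORS, seed=None):
    """Pair each drawing with one of the conditions so that every condition
    is used equally often, in random order."""
    if len(drawings) % len(conditions) != 0:
        raise ValueError("number of drawings must be a multiple of the number of conditions")
    rng = random.Random(seed)
    per_condition = len(drawings) // len(conditions)
    assignments = list(conditions) * per_condition  # 20 of each color for 80 drawings
    rng.shuffle(assignments)
    order = list(drawings)
    rng.shuffle(order)  # random presentation order of the drawings
    return list(zip(order, assignments))


def task_order(participant_index):
    """Counterbalance task sequence by alternating across participants (assumed rule)."""
    return ("color", "spatial") if participant_index % 2 == 0 else ("spatial", "color")


drawings = [f"item_{i:03d}" for i in range(80)]  # hypothetical item IDs
study = build_study_list(drawings, seed=1)
```

For the spatial task, the same function can be called with four quadrant labels in place of `COLORS`.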

Five minutes after viewing all 80 drawings, participants were given a self-paced recognition test that included the 80 original (‘old’) drawings and 80 previously unseen (‘new’) drawings. Images were presented in the center of the screen without a colored border. Participants first judged whether they had seen the drawing before. Next, they rated their confidence in this judgment (not at all, a little, or very confident). If they judged the drawing as old, participants were further asked to recall the color of its border or the quadrant of the screen in which it had appeared.
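One way to turn these responses into ROC points is to combine each old/new decision and its three-level confidence rating into a single 1–6 rating, then accumulate hit and false alarm rates from the strictest criterion to the most lenient. The sketch below does this. The exact mapping of confidence levels onto ratings is an assumption inferred from the scale endpoints (1 = very sure new, 6 = very sure old), and `to_rating` and `roc_points` are hypothetical names.

```python
# Assumed mapping of (decision, confidence) onto the 1-6 scale:
# new/very -> 1, new/a little -> 2, new/not at all -> 3,
# old/not at all -> 4, old/a little -> 5, old/very -> 6.
CONFIDENCE_STEP = {"not at all": 0, "a little": 1, "very": 2}


def to_rating(said_old, confidence):
    """Combine an old/new judgment and its confidence into one 1-6 rating."""
    step = CONFIDENCE_STEP[confidence]
    return 4 + step if said_old else 3 - step


def roc_points(old_item_ratings, new_item_ratings):
    """Cumulative (false alarm rate, hit rate) pairs.

    The criterion is relaxed one step at a time: first only ratings of 6
    count as 'old', then ratings >= 5, and so on down to ratings >= 2,
    which gives five points for a six-point scale."""
    points = []
    for k in range(6, 1, -1):
        hit_rate = sum(r >= k for r in old_item_ratings) / len(old_item_ratings)
        fa_rate = sum(r >= k for r in new_item_ratings) / len(new_item_ratings)
        points.append((fa_rate, hit_rate))
    return points


# Made-up ratings for five old and five new items
old_ratings = [6, 6, 5, 4, 2]
new_ratings = [1, 1, 2, 3, 5]
print(roc_points(old_ratings, new_ratings))
# [(0.0, 0.4), (0.2, 0.6), (0.2, 0.8), (0.4, 0.8), (0.6, 1.0)]
```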

Recollection and familiarity were determined using ROC curves generated by plotting the rate of correct ‘old’ judgments (hits) against the rate of incorrect ‘old’ judgments (false alarms) as a function of response confidence, ranging from 1 (‘very sure this is new’) to 6 (‘very sure this is old’) (Ghetti and Angelini, 2008). A curve was then fitted to these data points by least squares. Based on the dual-process model (Yonelinas, 1994), the hit rate at each confidence level $i$ is described by $P(\text{hit}_i) = R + (1 - R)\,\Phi(d'/2 - c_i)$, where $\Phi$ is the standard normal cumulative distribution function. This equation reflects the independent contributions of the single threshold process of recollection ($R$) and the continuous process of familiarity, $\Phi(d'/2 - c_i)$ (ie, the proportion of the old item distribution that exceeds the response criterion $c_i$). The false alarm rate, by contrast, is described by $P(\text{false alarm}_i) = \Phi(-d'/2 - c_i)$, reflecting only the contribution of familiarity. Because recollection and familiarity are assumed to remain constant along the ROC curve while only the criterion varies, the set of equations above (ie, two for each confidence level) can be solved to derive $R$ and $d'$.
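The dual-process equations can be made concrete with a small fitting sketch. It is a simplified stand-in for the least-squares fit described above, not the authors' procedure. For each candidate pair ($R$, $d'$) on a grid, each criterion $c_i$ is set so that the predicted false alarm rate equals the observed one, which gives $c_i = -d'/2 - \Phi^{-1}(\text{FA}_i)$ and a predicted hit rate of $R + (1-R)\,\Phi(d' + \Phi^{-1}(\text{FA}_i))$. The pair with the smallest squared hit-rate error is kept. The grid ranges, the step count, and the names `dpsd_point` and `fit_dpsd` are assumptions.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1
PHI = _STD_NORMAL.cdf
PHI_INV = _STD_NORMAL.inv_cdf


def dpsd_point(R, d_prime, c):
    """Dual-process prediction for one criterion c: (false alarm rate, hit rate)."""
    hit = R + (1 - R) * PHI(d_prime / 2 - c)
    false_alarm = PHI(-d_prime / 2 - c)
    return false_alarm, hit


def fit_dpsd(points, steps=200):
    """Simplified grid-search estimate of R and d' from (FA, hit) points.

    Each criterion is chosen so the predicted FA rate matches the observed
    one (-d'/2 - c_i = PHI_INV(FA_i)); the predicted hit rate then reduces to
    R + (1 - R) * PHI(d' + PHI_INV(FA_i)). Observed FA rates must lie
    strictly between 0 and 1."""
    z_fa = [PHI_INV(fa) for fa, _ in points]
    best_sse, best_R, best_d = float("inf"), 0.0, 0.0
    for i in range(steps):              # R = 0, 0.005, ..., 0.995
        R = i / steps
        for j in range(steps + 1):      # d' = 0, 0.02, ..., 4.0
            d_prime = 4.0 * j / steps
            sse = sum((hit - (R + (1 - R) * PHI(d_prime + z))) ** 2
                      for (_, hit), z in zip(points, z_fa))
            if sse < best_sse:
                best_sse, best_R, best_d = sse, R, d_prime
    return best_R, best_d


# Usage: recover parameters from noise-free points generated by the model itself
criteria = [1.0, 0.5, 0.0, -0.5, -1.0]  # strict -> lenient (illustrative)
observed = [dpsd_point(0.3, 1.0, c) for c in criteria]
R_hat, d_hat = fit_dpsd(observed)
```

A full least-squares fit would also let the criteria float and penalize false alarm residuals. Fixing the criteria from the observed false alarm rates keeps this example short and dependency-free.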

In addition, source recollection, or the ability to recollect specific details, was evaluated by measuring the proportion of trials in which individuals correctly remembered the associated color or location of images. Source recollection was calculated by dividing the number of correctly identified sources (color or spatial location) by the number of ‘old’ judgments.

Rat Study


With institutional animal care and use committee approval, cross-fostered 7-day-old male Sprague–Dawley rats (n=42) were randomized to receive either sevoflurane anesthesia at 1 MAC for 4 h (n=25), as previously described (Shih et al, 2012), or no anesthesia (n=17). Because MAC is not stable in immature rodents (Kodama et al, 2011; Stratmann et al, 2009b), roughly half of the sevoflurane-treated animals were tail clamped during anesthesia to titrate the anesthetic concentration to the clinically relevant endpoint of MAC. Tail clamping also produces tissue inflammation and scarring similar to that caused by surgery. The anesthesia protocol has been published in detail elsewhere (Shih et al, 2012; Stratmann et al, 2009b), and the dose and duration were chosen based on prior experiments in which they resulted in neuronal death and an identifiable cognitive deficit.

Odor recognition task training

Odor recognition testing was performed as previously described by experimenters blind to group assignment (Fortin et al, 2004; Robitsek et al, 2008; Sauvage et al, 2008). Rats were trained to perform a distinct behavioral sequence for (1) an ‘old’ odor that had been presented earlier on the same day or (2) a ‘new’ odor that had not been previously experienced. The rats' responses were biased across five conditions, yielding distinct false alarm rates. Biasing conditions were based on cup height, food reward for ‘old’ judgments, and food reward for ‘new’ judgments, as shown in Table 1. After the delay, two unscored odors (one old, one new) were presented first to allow the rat to experience the biasing conditions, followed by 20 scored odors in random order (10 old, 10 new).
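The test-phase order for one session (two unscored warm-up odors, then 10 old and 10 new scored odors in random order) can be sketched as below. The biasing conditions of Table 1 are not modeled. The odor labels, the randomized order of the two warm-up odors, and the name `build_test_sequence` are assumptions.

```python
import random


def build_test_sequence(old_odors, new_odors, seed=None):
    """Order of odors in the test phase of one session (sketch).

    Two unscored odors (one old, one new) come first so that the rat can
    experience the biasing conditions, followed by 20 scored odors
    (10 old, 10 new) in random order. Each entry is (odor, status, scored)."""
    if len(old_odors) != 11 or len(new_odors) != 11:
        raise ValueError("expected 11 old and 11 new odors (1 unscored + 10 scored each)")
    rng = random.Random(seed)
    warm_up = [(old_odors[0], "old", False), (new_odors[0], "new", False)]
    rng.shuffle(warm_up)  # assumption: the order of the two warm-up odors is random
    scored = [(o, "old", True) for o in old_odors[1:]]
    scored += [(n, "new", True) for n in new_odors[1:]]
    rng.shuffle(scored)
    return warm_up + scored


old = [f"odor_A{i}" for i in range(11)]  # hypothetical odor labels
new = [f"odor_B{i}" for i in range(11)]
sequence = build_test_sequence(old, new, seed=7)
```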

Table 1 Strategy for Biasing Responses Toward ‘old’ or ‘new’ Judgments

For each bias level, a ‘hit’ rate and a ‘false alarm’ rate were obtained. Once false alarm rates varied by less than 20% over five consecutive sessions, at least five responses per bias level were scored and averaged. Because the time rats needed to learn the rules of the task varied (between 3 and 9 months), only the last 5 weeks of testing were scored. Thus, all animals were of similar age, between 9 and 10 months, at the time of data collection. Recollection and familiarity were determined using ROC analysis of recognition memory data (Fortin et al, 2004; Robitsek et al, 2008; Sauvage et al, 2008).
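The stability rule and the averaging of scored data can be sketched as follows. Two readings are assumptions and are labeled in the code. A "session" is taken to be one complete round in which every bias level was run once. "Varied by less than 20%" is taken to mean that the max−min spread of each bias level's false alarm rate stays below 0.20 (20 percentage points). `is_stable` and `average_roc_points` are hypothetical names.

```python
def is_stable(fa_by_round, window=5, max_spread=0.20):
    """Stability criterion (one possible reading of the text).

    fa_by_round: one list per testing round (ASSUMPTION: a round is a
    complete cycle in which every bias level was run once), holding the
    false alarm rate for each bias level. Returns True if, over the last
    `window` rounds, every bias level's false alarm rate varied
    (max - min) by less than `max_spread` (ASSUMPTION: absolute units)."""
    if len(fa_by_round) < window:
        return False
    recent = fa_by_round[-window:]
    for level_rates in zip(*recent):  # one tuple of rates per bias level
        if max(level_rates) - min(level_rates) >= max_spread:
            return False
    return True


def average_roc_points(rounds):
    """Mean (false alarm rate, hit rate) per bias level over scored rounds.

    rounds: list of rounds, each a list of (fa, hit) tuples, one per bias level."""
    n = len(rounds)
    n_levels = len(rounds[0])
    return [
        (sum(r[i][0] for r in rounds) / n, sum(r[i][1] for r in rounds) / n)
        for i in range(n_levels)
    ]
```

The averaged (false alarm, hit) pairs, one per bias level, are the five points to which the ROC curve is fitted.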

Odor dilution test

We investigated whether anesthesia affects rodents' sense of smell by assessing odor identification in stepwise dilutions of the original scent (0.5 g of scented sand per 100 g of clean, unscented sand), starting with a 1:10⁵ dilution and ending with a final dilution of 1:10⁹. This allowed us to compare the ability of control and anesthetized rats to identify scents at concentrations lower than those used in the recognition memory testing. If anesthesia impairs the sense of smell, anesthetized rats would identify odors less accurately than control rats at the same dilutions. Following the dilution series, the scent was reintroduced at the highest dilution associated with normal performance (>80% accuracy). Eight randomly selected rats (five control, three anesthetized) were tested by experimenters blinded to group assignment.
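The tenfold dilution series and the choice of the reintroduction step ("the highest dilution associated with normal performance", that is, the most dilute step with >80% accuracy) can be sketched with exact fractions. The accuracy values are made-up inputs, and the function names are hypothetical.

```python
from fractions import Fraction


def dilution_series(first_exponent=5, last_exponent=9):
    """Tenfold steps from 1:10^5 to 1:10^9, as exact fractions of the original scent."""
    return [Fraction(1, 10 ** k) for k in range(first_exponent, last_exponent + 1)]


def highest_passing_dilution(accuracy_by_dilution, criterion=0.80):
    """Most dilute step (smallest fraction) at which accuracy exceeds the criterion.

    Returns None if no step passes."""
    passing = [d for d, acc in accuracy_by_dilution.items() if acc > criterion]
    return min(passing) if passing else None


# Made-up accuracies for one rat across the series
accuracies = dict(zip(dilution_series(), [0.95, 0.90, 0.90, 0.85, 0.85]))
print(highest_passing_dilution(accuracies))  # 1/1000000000
```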

Statistical Analysis

Sample size determination

For the human study, we determined that a total of 52 participants (26 per group) would be required to detect a difference of 0.16 in the mean spatial recollection index with 80% power at a significance level of 0.05, assuming an s.d. of the recollection index of 0.20. The sample size was calculated with SAS (version 9.2, SAS, Cary, NC) Proc Power for two-sample t-tests based on mean differences. This effect size was derived from pilot data from six patients and six matched controls. We planned to enroll 28 patients per group to provide a small safety margin.
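The stated sample size can be checked approximately with a closed-form formula. The sketch below is not SAS Proc Power, which uses the noncentral t distribution. It uses the normal approximation for a two-sided, two-sample comparison plus a commonly used small-sample correction of $z_{1-\alpha/2}^2/4$ per group, which is typically close to the exact t-based result. With a difference of 0.16 and an s.d. of 0.20 (standardized effect 0.8), α = 0.05, and 80% power, it gives 26 per group. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample t-test with equal groups.

    Normal approximation 2 * (z_{1-alpha/2} + z_{power})^2 / (delta/sd)^2,
    plus the small-sample correction z_{1-alpha/2}^2 / 4."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    effect = delta / sd
    n = 2 * (z_alpha + z_beta) ** 2 / effect ** 2 + z_alpha ** 2 / 4
    return math.ceil(n)


n = n_per_group(delta=0.16, sd=0.20)
print(n, 2 * n)  # 26 52
```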

In rats, a pilot experiment with five animals per group to determine the anesthetic effect size revealed a statistically significant difference in recollection between groups, making a formal sample size calculation unnecessary. To validate these results, the experiment was repeated twice with sample sizes similar to those of the pilot experiment.

Data fulfilling parametric assumptions were expressed as means and 95% confidence intervals of the means. Nonparametric data were expressed as medians and interquartile ranges, or as medians and ranges (IQ data). Group differences in the primary outcome measures (recollection and familiarity indices) were tested using Student's t-test in humans and the Mann–Whitney U-test in rats (data in the sevoflurane group did not fulfill parametric assumptions because half of the animals in that group had a recollection index of zero). Analyses were conducted using SPSS (version 20, IBM, Somers, NY) and Prism 5 for Mac OS X (GraphPad, San Diego, CA). P<0.05 was considered statistically significant.
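The Mann–Whitney U-test handles the many tied zero recollection indices by ranking, as the sketch below shows. It computes U from average ranks and a two-sided P-value from the normal approximation with a tie correction (no continuity correction). SPSS and Prism may report exact P-values for small samples, so this illustrates the statistic rather than reproducing any reported P-value. The example data are made up, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, U for y, two-sided P) using the normal approximation
    with a correction for ties."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    n = n1 + n2
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:  # every value tied: no evidence of a difference
        return u_x, u_y, 1.0
    z = (u_x - n1 * n2 / 2) / math.sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u_x, u_y, p


# Made-up recollection indices with several ties at zero
control = [0.0, 0.1, 0.2, 0.25, 0.4, 0.5]
sevo = [0.0, 0.0, 0.0, 0.05, 0.1, 0.3]
print(mann_whitney_u(control, sevo))
```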

The effects of demographic variables (age at testing, gender, socioeconomic status) and of clinically relevant variables (anesthetic duration, age at first exposure, anesthetic agent, cumulative dose expressed as MAC × hours) were explored using analysis of covariance (ANCOVA), with Bonferroni correction applied to tests of the interaction effects described below. The effect of test type (spatial vs color) on source recollection was assessed with a two-way repeated measures analysis of variance (ANOVA). Further exploratory analyses of clinically relevant variables included linear regression and Spearman's rank-order correlation analyses. In addition, Spearman's correlation analyses were conducted to evaluate whether familiarity or recollection outcomes in either task were associated with the total anesthetic dose (MAC × min).
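Spearman's rank-order correlation, used here to relate cumulative dose (MAC × min) to recollection and familiarity, is the Pearson correlation computed on ranks. The Bonferroni correction multiplies each P-value by the number of tests. The sketch below shows both. It does not compute P-values for rho, and the function names and example data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman_rho(x, y):
    """Spearman's rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bonferroni(p_values):
    """Bonferroni-adjusted P-values: each P times the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up doses (MAC x min) and recollection indices
doses = [150, 203, 240, 325, 410]
recollection = [0.45, 0.30, 0.50, 0.20, 0.35]
print(spearman_rho(doses, recollection))
```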


Human Data

Patient enrollment is summarized in Figure 1. Eight hundred thirty-four records were screened for anesthetic exposure before age 2, and 350 potential participants met enrollment criteria. Of these, 58 families had valid contact information and were contacted, and 30 of these families agreed to participate (52%). Over a period of 20 months (May 2011–January 2013), a total of 21 boys and 9 girls were enrolled and tested. Two enrolled patients were excluded from analysis because they were unable to comply with test instructions. By chance, all 28 included patients received their first anesthetic before age 1. Cases (n=28) and controls (n=28) were well matched for age, gender, IQ, and CBCL Total Problems scores; the one exception was family income, which was significantly higher in patient families than in control families (Table 2). The mean age at first exposure was 6 months (95% CI: 5.3 to 7.6 months, range: 2.2 to 11.2 months), with boys exposed earlier than girls (5.7 vs 8.3 months, respectively; 95% CI of the difference between means: 0.5 to 4.8 months, P=0.02).

Figure 1

Flow chart summarizing participant enrollment. CBCL, Child Behavior Checklist.

Table 2 Demographic Characteristics of Sample

A summary of the patient characteristics, including details of the anesthetic exposures and surgeries performed, is given in Table 3. The median anesthetic dose was 203 MAC × min (interquartile range: 155–325). The mean anesthetic duration, defined as the time from entry into to exit from the operating room, was 148 min (95% CI: 119–178). Eighteen patients received a single anesthetic, with a median dose of 238 MAC × min (interquartile range: 179–355). Ten patients received multiple anesthetics before testing, with a median initial anesthetic dose of 169 MAC × min (interquartile range: 148–203). IQ scores did not differ between anesthetized children (median 112, range: 66–146) and controls (median 110, range: 66–133; rank sum difference 4, P=0.98, Mann–Whitney U-test).

Table 3 Patient Characteristics Including Details about Anesthetic Exposures and Surgeries as well as Primary Outcomes for each Patient

The anesthetics consisted of a mixture of nitrous oxide and a volatile anesthetic (all patients were exposed to nitrous oxide). Propofol was used in three patients but never as the sole agent. The most frequently used volatile anesthetic was sevoflurane (n=26) followed by isoflurane (n=13) and halothane (n=7). Eleven patients were exposed to only sevoflurane and one patient was exposed to only halothane. Most patients were exposed to a combination of volatile agents, including sevoflurane and isoflurane (n=10), sevoflurane and halothane (n=3), halothane and isoflurane (n=1), or all three agents (n=2).
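The per-agent counts in this paragraph can be cross-checked by tallying the reported combinations. The short sketch below uses only the numbers given above.

```python
# Volatile agent combinations and patient counts, as reported in the text
combinations = {
    ("sevoflurane",): 11,
    ("halothane",): 1,
    ("sevoflurane", "isoflurane"): 10,
    ("sevoflurane", "halothane"): 3,
    ("halothane", "isoflurane"): 1,
    ("sevoflurane", "isoflurane", "halothane"): 2,
}


def tally_agents(combos):
    """Number of patients exposed to each agent (alone or in combination)."""
    totals = {}
    for agents, n_patients in combos.items():
        for agent in agents:
            totals[agent] = totals.get(agent, 0) + n_patients
    return totals


total_patients = sum(combinations.values())                                   # 28
combination_patients = sum(n for a, n in combinations.items() if len(a) > 1)  # 16
agent_totals = tally_agents(combinations)
# {'sevoflurane': 26, 'halothane': 7, 'isoflurane': 13}
```

The totals match the reported 26, 13, and 7 exposures, and 16 of the 28 patients received more than one volatile agent, consistent with "most patients".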

Recognition memory

For item recognition memory, ROC curves were obtained for each participant, and estimates of the recollection index (R, the y intercept of the ROC curve) and the familiarity index (d′, the degree to which the ROC curve bows upward) were determined for anesthetized and non-anesthetized children (Figure 2a and b). The recollection index of anesthetized children was significantly lower than that of controls in both the color task (anesthesia mean 0.34 vs control mean 0.47, P=0.02, t-test) and the spatial task (anesthesia mean 0.38 vs control mean 0.49, P=0.04, t-test). This difference remained statistically significant when family income, the only variable that distinguished the groups, was included as a covariate in a 2 (patients vs controls) × 2 (color recollection index vs spatial recollection index) mixed-measures ANCOVA (F(1,53)=8.37, P=0.006). The familiarity index, however, did not differ between groups in either the color task (anesthesia mean 0.75 vs control mean 0.77, P=0.88, t-test) or the spatial task (anesthesia mean 0.84 vs control mean 0.78, P=0.77, t-test).

Figure 2

ROC curves and derived estimates of recollection and familiarity for a color recognition memory task in humans (a), a spatial recognition memory task in humans (b), and an odor recognition memory task in rats (c). The composite ROC curve for each group is fitted to the six (human) or five (rat) data points representing the mean ‘hit’ and ‘false alarm’ rates for each of the confidence (human) or bias (rat) categories. The y intercept of this curve is the recollection index R, and the degree to which the curve bows upward is the familiarity index F. ROC curve error bars are SEM. The diagonal straight line represents chance performance. Recollection and familiarity data are medians and interquartile ranges. (d) Group ROC curves and recollection/familiarity estimates for rats anesthetized with and without tissue injury caused by tail clamping during anesthesia, showing that tissue injury does not change the effect size of the anesthesia-induced impairment in recollection. Tissue injury was caused by tail clamping 50% of rats under anesthesia to determine anesthetic depth. SEM, standard error of the mean.

To further evaluate the nature of the memory impairments, separate analyses were conducted on hit rates (ie, the probability of correctly recognizing an old item as old) and false alarm rates (ie, the probability of incorrectly recognizing a new item as old) (Figure 3). Hit rates alone, however, cannot reveal the nature of the deficits because they reflect the contributions of both recollection and familiarity. For the color task, the hit rate was significantly lower in the anesthesia group (anesthesia median 54.4% vs control median 71.3%, P=0.002, Mann–Whitney U-test, Figure 3a). The hit rate in the spatial task did not differ significantly between groups (anesthesia median 63.8% vs control median 73.8%, P=0.15, Mann–Whitney U-test, Figure 3b). False alarm rates did not differ between groups in either task (Figure 3c, d).

Figure 3

Human memory testing results. (a–d) Hit rates (ie, the probability of correctly recognizing an old item as old) and false alarm rates (ie, the probability of incorrectly recognizing a new item as old) are shown for each task. For the color task, the hit rate was significantly lower in the anesthesia group. False alarm rates did not differ between groups in either task. (e and f) Source recollection performance in the color task was significantly impaired in the anesthesia group.

Source recollection in the color task was significantly impaired in the anesthesia group (correct source recollection in the color task: anesthesia mean 18.2% vs control mean 22.7%, P=0.03, t-test, Figure 3e). In the spatial task, the reduction in source recollection in the anesthesia group approached, but did not reach, statistical significance (correct source recollection in the spatial task: anesthesia mean 31% vs control mean 39.2%, P=0.08, t-test, Figure 3f). A direct comparison indicated that the location memory deficit did not differ significantly from the color memory deficit (treatment F(1,54)=5.96, P=0.02; test F(1,54)=35.84, P<0.0001; interaction F(1,54)=0.54, P=0.47; two-way repeated measures ANOVA). Overall, these results show that anesthetized children exhibit reduced recognition memory performance.

Exploratory Analyses

We collected data of clinical relevance that may help guide future investigations. To examine the effects of current age and sex, we conducted two separate 2 (participant group: patient vs control) × 2 (sex: male vs female) ANCOVAs, entering age at testing as a covariate and recollection in the color or spatial task as the dependent variable. In the color task, there was a significant main effect of participant group, F(1,51)=4.92, P=0.03, whereas the main or interactive effects of sex and age did not reach statistical significance, Fs(1,51)<3.79, Ps>0.06. The spatial task, by contrast, showed a significant interaction between sex and participant group, F(1,51)=4.60, P=0.04. Males in the patient group performed significantly worse than males in the control group (means adjusted for covariate: patients 0.34, 95% CI: 0.26–0.43 vs controls 0.54, 95% CI: 0.44–0.63); this was not the case in females (means adjusted for covariate: patients 0.45, 95% CI: 0.31–0.59 vs controls 0.41, 95% CI: 0.30–0.53). No other significant main or interactive effects emerged, Fs(1,51)<2.17, Ps>0.15.

To explore the effects of age at first exposure and anesthetic duration, we created two median-split variables identifying (1) patients below or above the median age at first exposure (6.4 months) and (2) patients whose exposure was shorter or longer than the median duration (132 min). We then conducted a 2 (sex: male vs female) × 2 (duration: short vs long) × 2 (exposure timing: early vs late) ANCOVA, entering age at testing as a covariate and recollection in each task as repeated measures. This analysis confirmed the main effect of sex on recollection and also revealed a significant interaction between sex and age at first exposure, F(1,20)=7.63, P=0.01. In females, those with early exposures exhibited better overall recollection than those with late exposures (means adjusted for covariate: early 0.78, 95% CI: 0.51–0.97 vs late 0.37, 95% CI: 0.27–0.47), and females had higher recollection than males regardless of when they received anesthesia. There was also a significant interaction between age at first exposure and duration of exposure, F(1,20)=8.86, P=0.01. Children with early, short exposures had higher recollection than either those with early, long exposures (means adjusted for covariate: short 0.66, 95% CI: 0.46–0.85 vs long 0.39, 95% CI: 0.27–0.50) or those with late exposures regardless of duration (means adjusted for covariate: short 0.31, 95% CI: 0.20–0.41; long 0.37, 95% CI: 0.28–0.46).

We also computed nonparametric Spearman's rank-order correlation coefficients among anesthetic status, age at testing, CBCL total problems, gender, full-scale IQ, family income, and recollection and familiarity estimates for the spatial and color tasks (Table 4). Notably, family income (higher in patients) was significantly associated with IQ but not with recognition memory measures. Thus, although income correlated positively with IQ, this association did not extend to memory performance and did not alter the relationship between anesthetic exposure and recollection. In addition, separate Spearman's correlation analyses did not identify a significant correlation between anesthetic dose (MAC × min) and memory outcomes: color task recollection (r=−0.098, P=0.62), color task familiarity (r=0.040, P=0.84), spatial task recollection (r=−0.095, P=0.63), or spatial task familiarity (r=−0.022, P=0.91).

Table 4 Spearman Rank Order Correlation Coefficients with Characteristics and Outcome Variables

Because family income was the only variable that differed between patients and controls, we further conducted four separate linear regressions in which anesthesia status, age at testing, CBCL score, gender, IQ, and family income were used to predict color recollection, spatial recollection, color familiarity, or spatial familiarity. Results of these analyses are presented in Table 5. Only participant status and age predicted recollection indices; the other variables did not significantly predict either recollection or familiarity indices. These exploratory results are consistent with our primary analyses, in which a history of anesthetic exposure was associated with lower recollection but not lower familiarity.
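The regressions described above can be sketched as ordinary least squares solved through the normal equations, $(X^\top X)\,b = X^\top y$. This standard-library sketch only estimates coefficients. SPSS also reports standard errors and P-values, which are omitted here, and the predictor coding (eg, anesthesia status as 0/1) is an assumption. `solve_linear` and `ols` are hypothetical names.

```python
def solve_linear(A, b):
    """Solve A x = b for a small square system by Gaussian elimination
    with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def ols(predictor_rows, outcome):
    """Least-squares coefficients [intercept, b1, ..., bp] via (X'X) b = X'y."""
    X = [[1.0] + [float(v) for v in row] for row in predictor_rows]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, outcome)) for i in range(p)]
    return solve_linear(xtx, xty)


# Usage with made-up data generated as y = 1 + 2*x1 - 3*x2 (no noise)
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [1 + 2 * a - 3 * b for a, b in rows]
print(ols(rows, y))  # approximately [1.0, 2.0, -3.0]
```

If Table 5 reports standardized betas, those would correspond to fitting the same model after converting every predictor and the outcome to z-scores. How the table was produced is not stated here, so this is an assumption.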

Table 5 Regression Betas of Demographic Characteristics on Recollection and Familiarity Estimates

Rat Data

Recognition memory was tested in rats using odors as memory items. Hits and false alarms were recorded for each of the five response bias levels, which recurred in random order once a week. One sevoflurane-treated rat was excluded because it could not learn the rules of the task. Once stable, performance at each bias level did not decrease over time. Data averaged over the last 5 weeks of testing were included in the analysis.

The ROC curves, as well as recollection and familiarity estimates, for rats anesthetized with sevoflurane at a dose of 240 MAC × min (4 h at 1 MAC) on day 7 of life (n=16) and non-anesthetized controls (n=17) are shown in Figure 2c. Anesthetized rats had significantly reduced recollection relative to controls. The group recollection index of controls was 0.36, whereas the group recollection index of anesthetized rats was zero. This impression was confirmed by analysis of data derived from the individual ROC curves: 8 of 16 rats in the sevoflurane group had a recollection index of zero, whereas only 2 of 17 controls had a recollection index of zero. The median recollection index of controls was 0.2 compared with 8.9 × 10⁻⁵ for sevoflurane (P=0.027, Mann–Whitney U-test). Familiarity indices, on the other hand, were not different between groups (control median: 0.73 vs anesthesia median: 0.67; P=0.69, Mann–Whitney U-test).

An exploratory analysis revealed that group ROC curves, as well as individual recollection and familiarity indices, for rats that had been tail clamped during anesthesia (n=8) did not differ from those of rats that had received anesthesia without tail clamping (Figure 2d).

Odor dilution test

We performed separate experiments to investigate whether the rodent recognition memory results may have been affected by the sense of smell. A random sample of anesthetized rats (n=3) and control rats (n=5) underwent odor recognition testing with stepwise dilution of odors. Even at a dilution of 10⁻⁹ of the original scented sand, a much weaker concentration than that used in the recognition memory testing, both control and anesthetized rats detected the odor with greater than 80% accuracy. Only when new (unscented) playground sand was used did performance drop to chance in both groups, as expected (Supplementary Figure 1). Performance returned to >80% in both groups when the odors were reintroduced at a dilution of 10⁻⁹. It remains possible that anesthetized rats have an altered sense of smell; however, any such deficit would only become apparent at dilutions beyond 1:10⁹ of the concentration used in the study.


The main finding of this study is that general anesthesia in infancy impairs recollection but not familiarity in humans, although neither IQ nor CBCL scores were adversely affected. A comparable deficit in recollection-like memory was identified in recognition memory experiments in rats. That a single episode of anesthesia in infancy impairs certain aspects of brain function in animals is now widely accepted (for review see Stratmann, 2011). However, the relevance to humans and the particular cognitive effects that might follow anesthesia have been unclear (DiMaggio et al, 2009; Kalkman et al, 2009; Wilder et al, 2009). The present study provides valuable insight into these questions.

Episodic memory is the memory of past experiences and, as it was first described, involves the conscious recollection of an event (Tulving, 1983). It cannot be confirmed whether episodic memory occurs in nonhuman species, as it requires introspection and consciously re-experiencing the past. However, it has been shown that animals may have episodic-like memory that can be demonstrated through tests involving memory for ‘what,’ ‘where,’ and ‘when’ details of an event. This was first described in birds (Clayton and Dickinson, 1998) and more recently in rodents (Dere et al, 2005; Eacott et al, 2005; Eacott and Norman, 2004; Fortin et al, 2004; Kart-Teke et al, 2006).

In rodents, as in humans, recognition memory may be further explained by a dual-process model in which recognition comprises familiarity and recollection, although this model is not uniformly accepted (Eichenbaum et al, 2008; Wixted and Squire, 2008). Animal models have since been developed that support the existence of recollection-like memory in rodents and show that it can be distinguished from familiarity (Eacott et al, 2005; Easton and Eacott, 2010; Eichenbaum et al, 2010; Sauvage, 2010). These findings suggest that rodents, like humans, rely on recollection and familiarity to perform recognition tasks. Despite reduced recollection, rats can accurately perform recognition memory tasks by compensating with increased reliance on familiarity (Sauvage et al, 2008). Nevertheless, a recollection deficit remains important: under certain circumstances, for instance when the delay between memory encoding and retrieval is increased, recognition memory relies to a greater extent on recollection and could thus be impaired (Fortin et al, 2004).
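One widely used formalization of the dual-process account is the dual-process signal-detection (DPSD) model of Yonelinas (1997), cited in the introduction. We present it here for illustration only and assume, without confirmation from this section, that the recollection and familiarity estimates discussed below follow this form. For each confidence criterion c_i, the cumulative hit and false-alarm rates are:

```latex
\begin{aligned}
P(\text{hit}_i) &= R + (1 - R)\,\Phi\!\left(\frac{d'}{2} - c_i\right) \\
P(\text{false alarm}_i) &= \Phi\!\left(-\frac{d'}{2} - c_i\right)
\end{aligned}
```

Here R (0 ≤ R ≤ 1) is the probability that an old item is recollected, d′ ≥ 0 indexes familiarity as the separation between the strength distributions of old and new items, and Φ is the standard normal cumulative distribution function. In a receiver operating characteristic (ROC) plot, recollection raises the y-intercept and familiarity produces the curvature.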

The finding that anesthesia impairs recollection in children and recollection-like memory in rats has important implications. The human data alone cannot rule out a contribution of the surgical procedure and/or the underlying disease to the memory deficits; however, the comparable deficits in rats suggest that anesthesia in infancy is responsible for impairing recollection later in life. We studied a cohort of children who were likely free of confounding medical conditions. It is reasonable to assume that the rats, too, were either free of underlying diseases or that any medical conditions were randomly distributed between anesthesia and control groups. We then examined a defined cognitive outcome, recognition memory, and its underlying processes of recollection and familiarity. Recognition memory accuracy was impaired in the anesthesia group, which may be attributed to reduced recollection, and this occurred despite the higher socioeconomic backgrounds of the anesthetized children. Consistent with the literature (Noble et al, 2007), family income was positively correlated with IQ in our subjects, and, although not significant, there was a trend toward better recognition memory performance with higher income. If anything, these socioeconomic differences would have biased the results against the hypothesis.

Rats show a similar decrease in recollection but not familiarity in the absence of concurrent tissue damage, which supports the notion that the anesthetic, rather than the surgical procedure or underlying condition, causes the recollection deficit. Consistent with this, we previously found that both sevoflurane and isoflurane impair short-term memory in a water maze task in rats, regardless of tissue injury (Shih et al, 2012; Stratmann et al, 2009a). The tissue injury used in half of the rats to assess anesthetic depth (tail clamping) is at least as noxious as surgical tissue injury (Eger et al, 1965). Because comparable recognition memory tests detect an anesthetic effect in both species, they provide a translational model of anesthetic neurotoxicity in infant humans and rats. Such a model should allow a more focused therapeutic approach to this problem, in which the effects of interventions are first assessed in rats before clinical trials are designed.

Among 28 anesthetized subjects, nine had a familiarity index of zero, and two of these also had a recollection index of zero. These zero-value indices, however, occurred in different tasks: the color task for one individual and the spatial task for the other. The higher frequency of zero-value familiarity indices does not indicate that anesthetized subjects were primarily affected in familiarity: eight individuals in the control group also had a familiarity index of zero, and only the recollection indices of the anesthetized group were significantly lower than those of the control group. A familiarity index of zero may partly reflect the constraints of the model when an individual's performance is fitted to its parameters; more likely, it reflects variability between individuals in their use of familiarity and recollection to recognize items, as one process may compensate for the other to achieve successful recognition (Sauvage et al, 2008; Fortin et al, 2004).
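How a model constraint alone can produce a familiarity index of exactly zero can be illustrated with a minimal sketch. It assumes, for illustration only, that the familiarity index is the d′ parameter and the recollection index is the R parameter of the Yonelinas (1997) dual-process signal-detection model, estimated under the bounds 0 ≤ R ≤ 1 and d′ ≥ 0. Under those bounds, a participant whose ROC points fall on a straight line receives a familiarity estimate pinned at the boundary value of zero, because the model cannot express curvature below a straight line. The function names, the grid search, and the least-squares criterion are hypothetical choices for this sketch; the study's actual fitting procedure is not described in this section and may differ (for example, maximum-likelihood estimation on the full confidence-rating data).

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, SD 1); supplies Phi and its inverse.
_STD_NORMAL = NormalDist()


def dpsd_hit_rate(fa: float, r: float, d_prime: float) -> float:
    """Predicted hit rate at false-alarm rate `fa` under the DPSD model.

    Eliminating the criterion c from FA = Phi(-d'/2 - c) and
    H = R + (1 - R) * Phi(d'/2 - c) gives H = R + (1 - R) * Phi(d' + z(FA)).
    """
    return r + (1.0 - r) * _STD_NORMAL.cdf(d_prime + _STD_NORMAL.inv_cdf(fa))


def fit_dpsd(points, r_steps=100, d_max=3.0, d_steps=150):
    """Hypothetical least-squares grid search for (R, d').

    `points` is a list of (false_alarm_rate, hit_rate) pairs with each rate
    strictly between 0 and 1 (inv_cdf is undefined at 0 and 1). The grid
    enforces the bounds 0 <= R <= 1 and 0 <= d' <= d_max.
    """
    best = None
    for i in range(r_steps + 1):
        r = i / r_steps
        for j in range(d_steps + 1):
            d = d_max * j / d_steps
            sse = sum((h - dpsd_hit_rate(fa, r, d)) ** 2 for fa, h in points)
            if best is None or sse < best[0]:
                best = (sse, r, d)
    return best[1], best[2]  # (recollection R, familiarity d')


if __name__ == "__main__":
    fa_rates = [0.05, 0.15, 0.30, 0.50]
    # A perfectly linear ROC: all above-chance performance is attributed to R.
    linear_roc = [(fa, 0.3 + 0.7 * fa) for fa in fa_rates]
    print(fit_dpsd(linear_roc))
```

For the linear ROC the fitted d′ sits at its lower bound of zero, while simulated points generated with R = 0.2 and d′ = 1.0 are recovered at those values, so a zero familiarity estimate can arise from the shape of an individual's ROC and the parameter bounds rather than from an absence of familiarity-based responding.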

Clinical significance of the observed primary outcome

What problems might an impairment in recollection-based memory present in day-to-day life? Recollection has demonstrated roles in autobiographical memory, prospection, classroom learning, and reading comprehension, among other functions (Ghetti and Bauer, 2012). Thus, even subtle recollection deficits may have immediate consequences and reduce a child's potential to learn over time, which future studies should examine more closely. However, other factors such as motivation, attention, and intelligence also determine a child's ability to learn and succeed. Anesthesia has been reported to impair motivation in primates years after exposure (Paule et al, 2011), and it has been associated with attention deficit disorder (Sprung et al, 2012) and with disabilities in language and cognition (Ing et al, 2012) in humans. The contributions of these potential anesthetic effects to cognition in general, and their significance for recognition memory in particular, will have to be unraveled in future studies. Also, the cumulative lifetime impact of the 20–25% impairment in recollection observed in this study may be substantially greater than is apparent at 6–11 years of age, which also warrants further study.

Demographic and clinically relevant variables

There were gender differences with respect to age at first anesthetic exposure, anesthetic duration, and vulnerability to anesthesia-induced impairment in recollection. Boys were overrepresented in the anesthetic cohort, which is expected, given the male preponderance among infants requiring surgery (Bartels et al, 2009; Block et al, 2012; DiMaggio et al, 2009; Kalkman et al, 2009; Sprung et al, 2012; Wilder et al, 2009). Boys were exposed roughly 2.5 months earlier than girls, and boys appeared to be more vulnerable to the effect of anesthesia on recollection than girls, which is consistent with recently published results from an animal study (Lee et al, 2014). Although recollection deficits were observed in both sexes, only the males exhibited a significant impairment in the spatial task. In addition, there was some indication that females who received anesthesia before 6 months of age performed better than those exposed later in life and that longer durations of anesthesia were associated with worse memory outcomes. The small sample size of the current study, however, limits the conclusiveness of these potential differences, and future studies evaluating gender, anesthetic duration, and age of exposure will be important.

Study limitations and areas of further study

Missing data resulting from the inability to contact or recruit participants are a potential source of bias, and we do not know how this might have affected the results of this study. However, the substantial agreement between the rodent and human data lends support to the anesthetic effects. Further, the absence of an effect of tissue injury in rats suggests that anesthesia, and not the surgical procedure, causes the recollection impairment. We cannot rule out, though, that the underlying condition or surgical procedure caused the deficit in humans while anesthetic exposure alone led to the same outcome in rodents. This possibility, although unlikely, must be considered in future studies.

A causal link between anesthesia during infancy and long-term impairment of recollection would be further supported if the memory deficit in humans was observed in the absence of surgery (for instance, in children who only underwent magnetic resonance imaging scan under anesthesia). Another possible approach for testing the strength of the association between anesthesia and impaired recollection would be to compare a purely regional anesthetic vs general anesthesia for a given underlying condition or surgical procedure. Alternatively, monozygotic twins discordant for anesthetic exposure might be compared using cognitive endpoints.

It has been previously suggested that multiple anesthetic exposures, but not single exposures, are associated with adverse cognitive outcomes (Sprung et al, 2012; Wilder et al, 2009). We found no difference between patients receiving single (n=18) vs multiple (n=10) anesthetics in any of the measured outcomes. This is consistent with a recent report that a single anesthetic exposure is associated with deficits in language and cognition (Ing et al, 2012), and it is evidence against the notion that a single anesthetic exposure in infancy is innocuous.

Finally, the anesthetic administered in the human arm of this study always included nitrous oxide and nearly always included sevoflurane and a narcotic. Exploratory analyses did not reveal an effect of the anesthetic agent used, which may be related to the small sample sizes of subgroups receiving a particular anesthetic agent. Given that individual agents act on unique receptors, understanding their differential contributions to the impairment in recollection is an important area of further investigation.

In conclusion, anesthesia in infancy impairs recognition memory performance. Recollection, specifically, accounts for this deficit and is impaired in both humans and rats anesthetized during infancy, while familiarity is unaffected.


The human study was funded by the International Anesthesia Research Society Clinical Scholars Award to GS. The rat study was funded by the John Severinghaus Research Award (Department of Anesthesia and Perioperative Care, UCSF) to GS. Further support was from the NIH R03HD054636 to SG and K08 GM06511 to JWS, the Foundation for Anesthesia Education and Research (Medical Student Anesthesia Research Fellowships to AA and EC) and the Department of Anesthesia and Perioperative Care, UCSF (informal grant to NL, Hamilton Award to JWS). The authors declare no conflict of interest.