Introduction

The number of surgical procedures performed worldwide is estimated at 312 million1. Nitrous oxide (N2O) is used in a significant number of procedures. In the USA, for example, around 35% of all general anesthetics include nitrous oxide2. N2O is also increasingly administered with oxygen (50%) for pain and stress management in obstetrics during delivery, and in pediatrics and ambulatory medicine for minor peripheral surgery. For general anesthesia, because of its weak anesthetic properties, it is used as an adjuvant that enables dose reduction of other anesthetic drugs and limits their side effects. Although N2O was considered innocuous for decades, there is emerging evidence that it carries a number of potential side effects3. It enlarges natural air spaces (bowels, lungs, tympanic cavity). It can cause transitory leucopenia4 and postoperative nausea and vomiting5. N2O also increases plasma homocysteine for up to a week after surgery6,7. Elevation of plasma homocysteine causes endothelial dysfunction and mismatches between cerebral metabolism and blood flow8,9. As a result, N2O may also lead to cerebrovascular dysfunction resulting in delirium, delayed neurocognitive recovery or persisting neurocognitive disorders3,10. Existing evidence on the true impact of N2O on postoperative cognitive performance and recovery is, however, controversial. Whilst some reports (both animal and human studies) attribute postoperative learning difficulties, loss of memory, disorientation11,12,13 and reduced psychomotor performance to N2O14,15, others, including randomized trials, fail to identify any detrimental effect of N2O on cognitive performance16,17,18,19,20. Some authors even report a neuroprotective effect of N2O in animal studies21.

Thus, the role of N2O in the development of delayed neurocognitive recovery remains to be determined. As part of the ENIGMA-II randomized multicenter trial, initially designed to assess cardiovascular complications in patients undergoing major non-cardiac surgery, we performed a nested study assessing postoperative neurocognitive recovery. Using three computerized neuropsychological tests (thirteen outcome scores) of the Cambridge Neuropsychological Test Automated Battery (CANTAB), we compared the postoperative neurocognitive recovery of patients receiving N2O in the anesthetic gas mixture administered during surgery with that of patients receiving N2O-free anesthesia.

Methods

Study design and participants

The International ENIGMA-II study on Postoperative Cognitive Disorders (ISEP) trial was a randomized, controlled, multicenter parallel-group study performed to assess the effect of N2O on postoperative neurocognitive recovery. It was a study nested in the original ENIGMA-II trial in two participating centers (Hong Kong and Geneva), both university-affiliated hospitals. Adults aged at least 45 years, at risk of cardiovascular complications and having general anesthesia for non-cardiac surgery exceeding 2 h were eligible. We excluded patients with untreated vitamin B6, vitamin B12 or folic acid deficiency, those with marked impairment of gas exchange requiring an inspiratory oxygen concentration > 0.5, and those with specific circumstances where N2O is contraindicated (e.g. colorectal or thoracic surgery). In addition, we excluded patients with a Mini-Mental State Examination (MMSE) score ≤ 24 or advanced Parkinson’s disease, and those suffering from alcohol dependency or taking tricyclic antidepressants or neuroleptics. Patients with a handicap (e.g. visual impairment) likely to hinder the correct performance of the CANTAB computerized neuropsychological tests were also excluded. The study protocol was approved by the Central Ethics Committee of the Geneva University Hospitals CER: 08–075 (NAC 08–021) and the Joint Chinese University of Hong Kong – New Territories East Cluster Clinical Research Ethics Committee (CRE-2012.197-T). It was also approved by the Swiss Agency for Therapeutic Products (Swissmedic). The research project was performed in accordance with institutional standards and regulations. The study is registered with ClinicalTrials.gov, numbers NCT00430989 (registered 02.02.2007) and NCT02489097 (registered 02.07.2015).

Randomization, treatment allocation, blinding and data collection

The study was nested in the ENIGMA-II trial22 but was extended beyond the end of the original trial until May 2016. After providing written informed consent, study participants were randomized to receive a general anesthetic with a mixture of either N2O (70%) or air (70%) in 30% O2. We used computer-generated randomization in permuted blocks of 10 patients, stratified by site. Group allocation could be accessed via an automated telephone voice-recognition service or, in case of malfunction, was sent directly by email to research staff. Patients, surgical team members and research staff, including postoperative interviewers and cognitive testing assistants, were blinded to group allocation. Only the anesthesiologist in charge was aware of the gas mixture provided during anesthesia.
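As an illustration of permuted-block randomization stratified by site, the following minimal sketch builds one allocation list per center from balanced blocks of 10. It is not the trial's actual randomization software: the function names, the seeds and the number of blocks are assumptions made for the example, and only the block size, the two arms and the two sites come from the text.

```python
import random

def permuted_block_schedule(n_blocks, block_size=10, arms=("N2O", "Air"), seed=None):
    """Return an allocation list made of randomly permuted, balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # seeded generator so the list is reproducible
    per_arm = block_size // len(arms)
    schedule = []
    for _ in range(n_blocks):
        # Each block holds the same number of allocations for every arm.
        block = [arm for arm in arms for _ in range(per_arm)]
        rng.shuffle(block)
        schedule.extend(block)
    return schedule

def stratified_schedules(sites, n_blocks, block_size=10, seed=0):
    """Build one independent schedule per site (stratification by site)."""
    return {
        site: permuted_block_schedule(n_blocks, block_size, seed=seed + i)
        for i, site in enumerate(sites)
    }

# Hypothetical usage: three blocks of 10 for each of the two centers.
schedules = stratified_schedules(["Geneva", "Hong Kong"], n_blocks=3)
```

Because every block is balanced, the two arms can never differ by more than half a block within a site, whatever the point at which recruitment stops.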

On the day before surgery, after an initial screening for pre-existing cognitive disorders (MMSE score ≤ 24), Parkinson’s disease, visual or auditory impairment and other study exclusion criteria, a specially trained research assistant and a certified neuropsychologist administered the initial CANTAB battery of cognitive tests. The overall computerized battery includes 7 different categories of tests assessing the different domains of cognitive function (attention, executive functions, memory, visuospatial processing functions and language). These tests are designed to detect subtle changes in cognition. The CANTAB testing battery was preferred over other types of cognitive tests since it is language-independent, culturally neutral and has been validated for the diagnosis of a wide range of cognitive disorders and syndromes23. For the study, we initially selected five tests to be performed at 7 and 90 days: the screening test MOT (Motor Screening Task), the visual memory test PAL (Paired Associates Learning), the episodic memory test PRM (Pattern Recognition Memory), the decision making and processing speed test RTI (Reaction Time) and the executive function test OTS (One Touch Stockings of Cambridge). All these tests, except the MOT test, were chosen because they assess the cognitive domains most likely to be influenced by anesthesia24. The MOT test is a standard screening test that is systematically performed before all other CANTAB tests in order to familiarize participants with the computer and touch screen technology24. It was therefore not selected as a main study outcome. The PAL test proved to be particularly challenging for patients when added to all the other tests (average duration of testing sessions without the PAL test: 45 min) and it was finally removed from the testing battery. Thus only the PRM test was used to assess memory, a choice which was approved by the neuropsychologists of our research team.
Details of the CANTAB tests are provided in “Supplementary 1: Appendix 1”.

On the day of surgery, patients were randomly allocated (1:1) to N2O or N2O-free anesthesia. Group allocation prepared by study coordinators on each study site was provided in opaque sealed envelopes to the anesthesia team in charge. All patients received standard anesthetic and perioperative care. Choice of anesthetic agents, muscle relaxants, perioperative analgesia and prophylactic antibiotics was left to the discretion of the anesthesiologist in charge, and only N2O administration and concentration were defined according to random group allocation. Additional neuraxial and other regional anesthetic techniques were accepted. Anesthesiologists were expected to maintain oxygenation, heart rate and blood pressure within the patient’s usual range at all times perioperatively and were advised to avoid intraoperative hypothermia (temperature < 36 °C). Supplemental O2 could be administered at any time if required by impaired gas exchange.

On the day of surgery, patients were assessed by a neuropsychologist in recovery for the presence of acute postoperative delirium according to the DSM-IV criteria [Agitation/restlessness; disorientation; speech confusion; attention deficit].

On day 7 following surgery, cognitive test administrators used the same computerized battery of tests (CANTAB) as in the preoperative period. To minimize learning effect and testing administrator bias, a parallel 1 version of PRM tests was used and the same administrator performed the testing process. On day 90 following surgery, the same computerized battery of tests (CANTAB) was used in a parallel 2 version of PRM test.

The CANTAB tests were administered either at the hospital or at home, if patients had been discharged. Quality of life was also measured using the EuroQol test (EQ-5D questionnaire, http://www.euroqol.org/). When adverse events or unexpected outcomes were detected, further testing and clinical review were also organized.

We recorded patient demographics, risk factors, ASA scores, medication and all perioperative events and complications. To identify possible confounders for cognitive disorders, we measured anxiety and depression using the hospital anxiety and depression scale (HAD) and pain using the verbal pain score (VRS), and recorded all use of benzodiazepines, cortisone, opiates, antidiabetic drugs, non-tricyclic antidepressant drugs and antihypertensive drugs, as well as alcohol consumption and natremia, before each cognitive testing session. We also recorded preoperative blood glucose, hemoglobin level, blood pressure, heart rate and body temperature.

Study outcomes

The primary outcome was postoperative neurocognitive recovery at 7 days, with testing repeated at 90 days. It was defined as the within-subject change between preoperative (baseline) and postoperative scores for each of the outcome measures selected within the following CANTAB tests: 1) PRM (Pattern Recognition Memory) test; 2) RTI (Reaction Time) test; 3) OTS (One Touch Stockings of Cambridge) test. Details of the tests are provided in “Supplementary 1: Appendix 1”.

The secondary outcomes were postoperative delirium, the number of unplanned intensive care unit (ICU) admissions, duration of hospital stay and quality of life using the EQ-5D questionnaire25. We also measured all adverse events occurring after surgery in both groups. Study data collected were stored locally in a locked database before being all securely transferred to the Geneva Study Centre. After all queries from the database manager were answered, individual center data were cleaned, aggregated and finally transferred to the statistical unit of the Center for Clinical Research.

Sample size calculation

The sample size calculation was based on a cohort study assessing deterioration in cognitive performance (more specifically memory) following non-cardiac surgery16. To detect an 11.8% absolute difference in cognitive impairment at 7 days (an increase from 15.7% to 27.5%) between patients receiving and not receiving N2O during anesthesia, we calculated a need for 190 patients per group, with 80% power and a significance level of 5%. To account for two planned interim analyses and loss to follow-up (assumed to be limited), we targeted an overall recruitment of 420 patients. A constant likelihood group sequential method with formal futility boundaries was used with a two-sided Pocock stopping rule. There was no contingency for early termination for efficacy. An acceptance region plot (or a futility region plot) was generated using Spotfire SeqTrial for S+, TIBCO Spotfire S+ Version 8.2.0 for Windows, TIBCO Software Inc, Palo Alto, CA, USA https://edelivery.tibco.com/storefront/eval/tibco-spotfire-s-/prod10222.html. The two-sided futility boundary (for the differences in proportions between the N2O and the Air/Oxygen group) at planned interim analysis T1 (N = 140) was from -0.0244 to +0.0244 and at analysis T2 (N = 280) from -0.0696 to +0.0696 (Supplementary 2: Appendix 2).
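The figure of 190 patients per group can be checked with the classical normal-approximation formula for comparing two independent proportions. The sketch below assumes that formula (pooled variance under the null hypothesis, two-sided test, no continuity correction); the text does not state which formula or software produced the estimate, and the function name is chosen for the example only.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Patients per group needed to compare two proportions (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2                          # pooled proportion under H0
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    # Round up: a fraction of a patient cannot be recruited.
    return math.ceil(numerator / (p1 - p2) ** 2)

# Proportions taken from the text: 15.7% versus 27.5% cognitive impairment.
n = n_per_group(0.157, 0.275)
print(n)  # 190
```

Under these assumptions the unrounded value is about 189.7, which rounds up to the 190 patients per group reported above.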

Statistical analysis

The trial data and safety monitoring committee monitored compliance and the analysis of study results. Because repeated cognitive testing following anesthetic care can generate significant anxiety in patients, two interim analyses assessing episodic memory (PRM test) were scheduled, one after enrolment of 140 patients and another after enrolment of 280 patients. The interim analyses were adjusted according to the Pocock Type I error function. The futility boundaries were ±0.0244 at the first and ±0.0696 at the second analysis for the between-group difference at seven days in the primary outcome (proportion of successful results for immediate and delayed PRM tests)26.
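The futility rule can be pictured as a simple comparison of the observed between-group difference in proportions with the boundary of the current interim analysis. The sketch below uses the two boundaries given in the text; the function name and the observed differences used in the example calls are illustrative assumptions, not the interim data reviewed by the committee.

```python
# Two-sided futility boundaries from the text, keyed by interim analysis number.
FUTILITY_BOUNDS = {1: 0.0244, 2: 0.0696}

def in_futility_region(observed_difference, analysis):
    """True when the difference in proportions lies inside the futility band."""
    return abs(observed_difference) < FUTILITY_BOUNDS[analysis]

# Hypothetical observed differences, for illustration only.
small_at_t1 = in_futility_region(-0.015, analysis=1)  # inside +/-0.0244
large_at_t1 = in_futility_region(0.050, analysis=1)   # outside +/-0.0244
large_at_t2 = in_futility_region(0.050, analysis=2)   # inside +/-0.0696
```

The band widens at the second analysis, so a difference that would allow the trial to continue at T1 can still fall within the futility region at T2.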

Categorical variables were summarized as frequencies and percentages and continuous variables as means with standard deviation (SD). For primary and secondary outcomes, changes from baseline were compared between groups using the χ2 test or Student t test or Mann–Whitney test as appropriate. A modification of the intervention effect between days 7 and 90 was investigated by using mixed effect linear regression models with a random intercept considering only timing (day 90 or day 7) and study group (N2O vs Air/O2) as independent variables with fixed effects and an interaction term. For co-variates analysis we used multivariate linear regressions and an interaction between timing and study group was tested in a linear model with mixed effects27. Group allocation, duration of surgery and average concentration of sevoflurane used were introduced into the model.

Possible collinearity was tested and could be formally excluded. The final results are expressed as adjusted 95% CI and P values. A P-value of < 0.05 was considered statistically significant. We used the Statistical Package for Social Sciences-SPSS (Version 22, SPSS, Inc., Chicago-Illinois/US) and R (release 2.13.1; R Foundation for Statistical Computing, Vienna, Austria).

Results

Study course

At interim analysis 1, investigators submitted data to the Data Safety and Monitoring Committee. For the primary outcome used for sample size calculation (episodic memory [PRM test]: proportion of correct answers, immediate and delayed), the between-group difference observed at 7 days was < 0.0244 and within the stopping boundaries for futility at T1. Study recruitment also proved to be particularly difficult and resources were falling short. Based on both arguments, the committee decided to stop the trial.

During the study period, 609 patients were found eligible for the study, of whom 140 consented to be randomized; 68 patients were in the Air/O2 study arm and 72 in the N2O/O2 arm. By day 90, seven patients were lost to follow-up (5 in the N2O group and 2 in the Air/O2 group). The whole cognitive testing process (preoperative, day 7, day 90) could be achieved in 114 patients. In the remaining group of 28 patients, only preoperative, day 7 or day 90 testing could be performed. The study flow chart is provided in Fig. 1.

Figure 1 Study flow chart.

All randomized patients were included in the statistical analysis and analyzed as complete cases. Both intention-to-treat and as-treated analyses for outcome measures of the CANTAB test were performed. Results are provided in “Supplementary 3: Appendix 3”.

Study participants’ characteristics

Baseline characteristics such as age, education level, MMSE and HAD tests scores were similar between groups (Table 1). Study patient mean age was 70.5 years in the Air/O2 group and 69.4 years in the N2O group. About two-thirds of patients were men. Alcohol consumption, benzodiazepine or antidepressant drugs use was similar in both groups. The mean inspired oxygen concentration did not differ between groups (P value = 0.18).

Table 1 Demographic data comparing patients in the N2O versus N2O-free anesthesia group.

There was no between-group difference in procedure-related data except a longer duration of surgery (35 min on average) and a higher expired concentration of sevoflurane in the Air/O2 group (1.2% vs. 0.7%) (Table 2).

Table 2 Procedure and postoperative related data comparing patients in the N2O versus N2O-free anesthesia group.

Between group difference for primary outcomes

For outcome measures of the PRM test (episodic memory), the mean (95% CI) between-group difference for the change in the proportion of correct answers (immediate recall) at Day 7 was -1.5% (-7.1 to 4.0), P = 0.583. At Day 90, it was -0.9% (-6.8 to 4.9), P = 0.744. For delayed recall at Day 7, the mean (95% CI) difference was 2.9% (-4.1 to 10.0), P = 0.406. At Day 90, it was 4.4% (-2.1 to 11.0), P = 0.182 (Table 3). These differences fell within the futility margins of the Pocock boundaries (Supplementary 2: Appendix 2).

Table 3 Group differences for outcome measures of the pattern recognition memory test (PRM).

For outcome measures of the RTI test (decision making and processing speed) there was no between group difference at day 7 or day 90. At 7 days, the mean (± SD) reaction time change from baseline (single stimulus mode) was 19.4 (85.5) msec in the Air/O2 group and 43.3 (164.7) msec in the N2O group. The mean (95% CI) between group difference was -23.9 (-75.2 to 27.4), P = 0.356 (Table 4).

Table 4 Group differences for outcome measures of the reaction time test (RTI).

For outcome measures of the OTS test (executive functions), we found significant between-group differences at 7 days (Table 5). The mean (± SD) change from baseline for the total number of problems successfully solved on first choice (out of 20) was -0.1 (2.6) in the Air/O2 group and 1.2 (2.1) in the N2O group. The mean (95% CI) between-group difference was -1.3 (-2.7 to -0.1), P = 0.048, in favor of N2O. The mean (± SD) change from baseline for the number of box choices made to correctly solve the problem was 0.01 (0.1) in the Air/O2 group and -0.1 (0.1) in the N2O group. The mean (95% CI) between-group difference was 0.11 (0.01 to 0.20), P = 0.029, in favor of N2O. At Day 90, the between-group differences had disappeared, with only marginal differences from preoperative values, particularly for the RTI test, suggesting a full recovery of preoperative function at 3 months.

Table 5 Group differences for outcome measures of the One Touch Stockings of Cambridge test (OTS).

Covariates analysis and secondary outcomes

To adjust for confounding factors, we selected covariates that were imbalanced between treatment and control groups and likely to be significantly related to the outcome (cognitive function)27. Following multivariate adjustment for duration of surgery and the concentration of sevoflurane (higher in the Air/O2 group), we found a persisting difference for the OTS test (P = 0.042 and P = 0.026). Results are provided in Table 6. As-treated and intention-to-treat analyses did not meaningfully differ (Supplementary 2: Appendix 2).

Table 6 Adjusted Group differences for outcome measures of the OTS.

In the analysis of secondary outcomes, we found that the number of unplanned ICU admissions was higher in the Air/O2 group than in the N2O group, 12 (17.6%) vs 5 (6.9%). This difference just failed to reach statistical significance (P = 0.05).
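The comparison of ICU admissions can be retraced with a Pearson χ2 test on the 2 × 2 table formed by the counts above and the group sizes of 68 and 72 patients. The sketch below assumes an uncorrected χ2 test with one degree of freedom; the text does not specify which test produced the reported P value, and the function name is chosen for the example.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square test for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom, the upper-tail probability of the chi-square
    # distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: Air/O2 (12 admitted of 68) and N2O (5 admitted of 72).
chi2, p_value = pearson_chi2_2x2(12, 68 - 12, 5, 72 - 5)
print(round(chi2, 2), round(p_value, 3))
```

Under this assumption the statistic is about 3.75 and the P value about 0.053, which is consistent with the rounded value of 0.05 reported above and with a result that falls just short of the 0.05 threshold.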

The duration of hospital stay, the number of patients with acute postoperative delirium, and the utility and pain scores for EuroQoL 5D were similar in both groups. The number of adverse events did not differ between groups either (P = 0.66; Table 7).

Table 7 Group differences for secondary outcomes and adverse events.

Discussion

Main results

The study purpose was to compare postoperative neurocognitive recovery in patients receiving N2O or N2O-free anesthesia. For all outcome measures of the CANTAB tests used for episodic memory and decision making/processing speed assessment, there were no between group differences at 7 and 90 days following surgery, suggesting harmlessness of N2O. Surprisingly, outcome measures of executive function tests significantly differed at 7 days. Patients receiving N2O had improved postoperative neurocognitive recovery compared with patients receiving Air/Oxygen. This finding cannot be explained by preexisting differences in the level of preoperative anxiety and depression (HAD), age, cognitive reserve or education level28,29 since all were equal between groups. For secondary outcomes (duration of hospital stay, delirium, unplanned admissions to the ICU, utility and pain scores for EuroQoL 5D) no difference was found between the two groups.

Possible explanations for associations identified or not between cognitive tests results and N2O administration

Our study findings differ markedly from those of animal studies, which describe impaired memory and learning as well as neurobehavioral disturbances following anesthesia with N2O. A detrimental effect of nitrous oxide on brain function is advocated in these studies to explain these findings13,30,31,32. However, in these animal studies, subjects were submitted to long durations of exposure to N2O (between 4 and 8 h) and to the combined administration of several different anesthetic drugs and gases such as isoflurane. As a result, the specific contribution of N2O to cognitive changes following anesthesia is difficult to demonstrate33. In humans, only a limited number of studies (case reports and small studies) seem to support a noxious effect of N2O on brain function14,34,35. These studies find modified psychomotor performance after surgery, particularly reduced reaction time and short term memory in patients having received N2O during their anesthesia. In one case report, severe lesions of the brain with nerve demyelination following the administration of N2O are even described36. However, large observational studies and randomized trials do not seem to confirm these findings, including in the long term (i.e. 3 months)16,18,19,20,37. Likewise in our trial, we did not find evidence of a detrimental effect of N2O on memory or reaction time, including at 90 days. In contrast, we found that patients receiving N2O compared with patients receiving Air/O2 had an improved recovery process of executive functions at seven days after surgery.

Other authors describe a neuroprotective effect of N2O in animal studies21, and Leung et al.19 found, in a randomized trial involving 228 elderly patients, a higher (but non-significant, P = 0.59) incidence of postoperative cognitive disorder in the Air/O2 group compared with the N2O group (18.6% vs 14.8%). This finding, similar to the one in our study, requires further discussion. The association between an improved recovery process of executive functions following surgery and the use of N2O for anesthesia is quite unexpected.

Several hypotheses can be formulated to explain this finding. One is the sparing effect of N2O when used in combination with other anesthetic agents such as isoflurane or sevoflurane. It allows the use of lower concentrations of volatile anesthetics and consequently decreases the risk of detrimental cognitive side effects often observed when high doses of isoflurane or sevoflurane are used38,39. However, this cannot explain our study findings. Following multivariate adjustment for volatile anesthetic concentration and duration of surgery, a significant difference could still be observed for the OTS test. This suggests an independent effect of N2O on the recovery of executive functions after anesthesia. Another possible explanation is a regression to the mean phenomenon. Patients in the N2O group started with lower preoperative scores for the OTS test compared with patients receiving Air/O2. Their postoperative improvement could be simply explained by a natural variation following repeated testing. Another explanation could be the selective blockade of N-methyl-D-aspartate (NMDA) receptors and a possible neuroprotective effect of N2O. The NMDA channel allows the influx of Ca2+ into specific brain cells. This mechanism is considered critical for synaptic plasticity and it can affect both circuit and brain function40,41. When extrasynaptic NMDA receptors are over-activated by high glutamate secretion (e.g. stroke, brain ischemia), an excessive influx of Ca2+ occurs, leading to excitotoxicity and progressive cellular death42. Blocking NMDA receptors could therefore protect ischemic neurons from cellular death and promote functional recovery43,44.

Since a high number of NMDA receptors are located in the prefrontal cortex involved in executive functions, their selective blockade by N2O may have protected this cerebral area from the effect of intraoperative stress or hypoxemia. Subgroup analyses of the IHAST Trial45 show that patients in the N2O group could be discharged home earlier and had improved recovery46. However further studies are needed to confirm this hypothesis. Another interesting finding of our study, although just not statistically significant, is the lower number of ICU admissions in patients receiving N2O. While this may be due to chance, it could also be the result of a lower number of postoperative complications in patients receiving N2O47. The mechanism to explain this finding is however unclear.

Strengths and limitations

A limitation of our study is the premature discontinuation of patient recruitment at the first interim analysis. The trial fell within the futility margins of the Pocock boundaries for outcome measures of the PRM test, with no difference identifiable in episodic memory change between the two groups compared. There was, however, a significant difference in cognitive function recovery using the OTS test. Yet this outcome had not been used for the sample size and interim analysis boundary estimations. Thus this result may be due to chance following interim analysis, and a fully completed trial might not confirm this finding. Furthermore, we found that patients in the N2O group not only recovered better but even significantly improved their OTS score following surgery. This may seem counterintuitive and suggest a spurious finding, which cannot be excluded. However, a meta-analysis of patients having coronary artery bypass graft (CABG) surgery48 confirms that cognitive performance can improve following surgery. This may be due to the beneficial effect of surgery on overall inflammatory status (once diseased organs have been removed) or to a learning effect of the cognitive tests administered29. This is particularly true when using tests sensitive to changes in the speed of psychomotor function. The OTS test used in our study is very sensitive to such changes. It assesses a fine set of cognitive abilities such as planning, decision making and impulse control. These functions largely involve brain cells that are located in the prefrontal cortex and that can easily be altered by perioperative stress, inflammation or transitory hypoxemia49,50. Significant improvement may be observed, particularly in patients receiving N2O. A third limitation of the trial is the confounding effect of the learning phenomenon when the same tests are performed several times. Participants’ performance automatically improves by learning.
Score differences in cognitive testing may therefore reflect a learning effect rather than a true difference in cognitive function. But since we used, whenever possible, parallel versions of the CANTAB cognitive tests, a learning phenomenon is quite unlikely to explain our study results. The fourth limitation is the “ceiling effect”. It is observed in tests based on relatively easy tasks to perform. Patients can easily achieve a high score during the initial phase of the testing (for example preoperatively in our study) and a slight change in further testing may consequently not be detected since initial scores are already quite high. This increases the risk of type 2 errors in studies. In our trial, preoperative results of the reaction time test (RTI) were relatively high with a simple accuracy score of respectively 10.8 (3.3) and 11.2 (3.5) in the Air/O2 and N2O groups. This is however quite unlikely to have had an impact on our study results since we used several different outcome measures to compare each group. Many of these outcome measures are unlikely to be affected by a “ceiling effect” since they include no predefined upper limits (i.e. processing speed, latency time). A ceiling effect cannot occur since no maximum score can be reached.

Another limitation is the relatively low level of education of participants (two thirds had lower secondary or elementary school level). This may be responsible for performance limitations at the lower ranges of the tests, making decline more difficult to detect, particularly for tests that are less sensitive such as the PRM test. To minimize this effect, we chose to define cognitive change following administration of N2O as a within-individual and between-group difference of either 1 SD or at least 25% in 1 or 2 tests. We also compared group differences using t tests for statistical differences51.

Despite these limitations we found that N2O had no impact on postoperative episodic memory and processing speed functions at 7 days and 3 months following surgery. Patients who received N2O appeared even to have improved recovery of executive function at seven days. Due to the limitations of this interim study analysis finding, further studies are however needed to confirm a possible neuroprotective effect of N2O administered during anesthesia.

In conclusion, while confirming the harmlessness of N2O with respect to episodic memory and processing speed, this study opens interesting clinical and research perspectives on a possible use of N2O for high-risk surgery, for patients with brain trauma or for those having prolonged sedation in the ICU and requiring neuroprotection.