Introduction

Adverse effects of carbon dioxide (CO2) on cognitive processes have been reported,1,2,3 but the effects observed occurred at CO2 concentrations that were considerably higher than those deemed safe by regulatory agencies. However, studies4,5 using the Strategic Management Simulation (SMS) test to assess complex decision making demonstrated effects of CO2 on decision-making performance at or below 2500 ppm, a level that is half that of the permissible exposure limit for CO2 set by the Occupational Safety and Health Administration. The SMS detects cognitive deficits resulting from traumatic brain injury at decrement levels well below the threshold of sensitivity of traditional psychometric methods.6,7 Therefore, effects on cognitive functions observed with the SMS4,5 at surprisingly low levels of CO2 may be an outcome enabled by the greater sensitivity of the SMS to cognitive impairments.

On the other hand, the findings of effects of low concentrations of CO2 upon cognition are controversial and the literature is unsettled. No statistically significant effects on acute health symptoms or cognitive performance were seen during exposures of college students for 4.25 h to pure CO2 at 1000, 3000, or 5000 ppm.8,9,10 However, significant decrements in cognitive performance were found when subjects were exposed to metabolically produced CO2 at 3000 and 5000 ppm.8,9,10 Zhang8,9,10 concluded that exposures to moderate concentrations of bioeffluents (BEs), but not CO2, will cause deleterious effects upon cognitive performance.

Disparities in outcomes among studies that have assessed effects of CO2 on cognition are not limited to studies that have employed different methods of assessment. Recently, a study conducted at the Naval Submarine Medical Research Laboratory with 36 US submariners produced no significant differences in any SMS measures when results from CO2 exposures at 2500 and 15,000 ppm were compared to those at 600 ppm.11 The conflicting outcomes between that study11 and others4,5 that have used the SMS to assess effects of CO2 upon cognition recapitulate the conflict in outcomes obtained with traditional psychometric methods.12 This suggests that the reason for the disparate outcomes among studies is likely less related to differences among the cognitive tests used than to differences among other features of the studies.

It may be that characteristics of the subjects are important determinants in the outcome of the studies in which effects of CO2 upon complex decision making are assessed. The study of Satish4 involved a cohort of college-age students. In a different study, which included professional class employees, Allen5 found that performance on the SMS was adversely affected at concentrations as low as 950 ppm. On the other hand, a study performed by Rodeheffer11 using submariners of the US Navy, who are highly motivated and accomplished and who were admitted to their chosen profession after being screened by highly stringent processes that select applicants for their ability to maintain very high levels of performance while operating under duress in an extremely hostile environment, found no performance decrement on the SMS when the submariners were subjected to 2500 or 15,000 ppm CO2. It has been well established that different experience levels and age have an effect on the choice of decision making paradigm.13,14,15,16,17 Given the disparity in outcomes among the various studies, however, there is no basis for predicting how CO2 would affect cognitive processes of astronauts.

Because it is not unusual for CO2 levels aboard the International Space Station (ISS)18 to exceed levels at which cognitive effects of CO2 were observed by Satish,4 and because thresholds for some clinically significant effects of CO2 are considerably lower in space than they are on the ground,18 it was important to determine whether the cognitive functions associated with complex decision making of crew-like subjects are affected by acute exposures to CO2 at concentrations that are routinely encountered aboard the ISS. Therefore, to examine the significance of the effects of acute exposures to CO2 on cognition within the contexts of NASA’s needs for behavioral health management and toxicity assessment, we have used the SMS to determine if acute exposures to CO2, at or below operationally relevant concentrations, affect cognitive functions of astronaut-like subjects.

The Spaceflight Cognitive Assessment Tool for Windows (WinSCAT) has been used operationally on the ISS on all expeditions. It provides crew surgeons with a tool to assess an astronaut’s cognitive status. WinSCAT is scheduled to be taken monthly but may be taken whenever a crewmember desires a self-assessment.19,20 However, WinSCAT may suffer from a ceiling effect, which occurs when high-performing subjects achieve perfect scores with no measurable difference between subjects at the ceiling level. Therefore, reduced performance variance near the ceiling levels will result in an unreliable estimate of population performance variability. Traditional psychometric tests may show effects of severe trauma but not be sufficiently sensitive to assess or predict changes in operational efficiency that could have impacts on crew health or safety.19,21 Thus, for several reasons, including small sample size, learning effects, and lack of sensitivity, “our knowledge about cognitive effects of spaceflight is superficial”.22

A cognitive test battery, called Cognition, has been designed specifically to avoid a ceiling effect when assessing spaceflight crews. The 10 tests included in the battery cover a range of cognitive domains relevant for successful spaceflight operations and have been mapped to underlying neural substrates by functional magnetic resonance imaging (fMRI).23,24 Therefore, this tool provides bridges between cognitive models, neuroscience, and behavior, and is likely more sensitive in astronauts than tools that have been designed for a standard clinical population.

Spaceflight crews have often reported symptoms, such as problems concentrating, headaches, and on some occasions, dissatisfaction with their cognitive performance.21 The potential causes for performance decrements during space missions are many (e.g., CO2, fluid shifts, poor sleep, fatigue, stress, high workloads), but it is not possible to independently assess the effect of each in a space vehicle. Therefore a ground-based study, free of potential confounders that would be present during a space mission, was conducted in which effects of operationally relevant concentrations of CO2 on cognitive functions of astronaut-like subjects were assessed with Cognition24 and with the SMS.4 Because several components of Cognition assess cognitive functions that are important to adaptive decision-making, findings from Cognition also provide context for the interpretation of assessments of complex decision-making made with the SMS. This study provides a baseline terrestrial dataset for effects of CO2 on cognitive functions against which data collected with these tests during spaceflight may be compared.

Results

Participants were randomly assigned to one of four groups. Six subjects were recruited for each of the first three groups and four for the last. The 22 subjects included 14 men and 8 women. The average age of all participants was 38.8 years (ranges 31–53 years for men and 31–51 years for women).

Subjects continuously wore wrist activity monitors (Actigraph wActiSleep-BT) for assessing sleep–wake patterns starting 1 week prior to the first exposure until after the last exposure. Actigraphy demonstrated very good compliance with the requirement that subjects maintain their normal sleep durations (determined prior to the first exposure) during the course of the study. Neither the average amount of night sleep during the week preceding exposure nor the total sleep during the night preceding each exposure differed significantly among the targeted CO2 concentrations. Nevertheless, when investigated as a covariate, the amount of sleep obtained by an individual preceding each exposure was found to be a significant covariate for the variable Initiative (p = 0.0332).

The data demonstrate that 600 ppm CO2 was maintained within ± 10% and the other three concentrations were maintained well within ± 5% of the targeted concentrations. The means and standard deviations for environmental variables at each of the targeted CO2 concentrations are given in Table 1. None of the environmental variables differed significantly among the targeted CO2 concentrations. Oxygen was maintained between 20.9% and 21.1%. Atmospheric pressure varied from a minimum of 755 mmHg to a maximum of 765 mmHg. Temperature and relative humidity of the subject-occupied area of the chamber were maintained in the ranges 67–72°F and 58–70%, respectively. With respect to noise levels in the chamber, the total number of instances per hour in which the maximum level with A-weighted frequency response and slow time constant (LAS,max) exceeded 70 dB on any of three sound dosimeters over the course of the exposures ranged from 3 to 6.5, and the average level of LAS,max in excess of 70 dB ranged from 71.5 to 74 dB among the targeted CO2 concentrations.

Table 1 Environmental parameters

Estimated means of each of the SMS measures at each of the targeted concentrations of CO2 are shown in Fig. 1. All measures of complex decision-making changed significantly from their baseline values at 600 ppm when CO2 was increased to 1200 ppm (Fig. 1). For eight of the nine measures, scores decreased; however, for Information Utilization, the score increased at 1200 ppm. At 2500 ppm, only Task Orientation and Applied Activity scores were significantly different from baseline measures, and both measures exceeded their baseline values. At the highest concentration of 5000 ppm, again only two of the measures differed significantly from baseline. At this concentration, Focused Activity Level exceeded the baseline value, and Basic Activity was less than baseline.

Fig. 1

Means ± 95% confidence intervals of SMS measures at each targeted concentration of CO2. The raw scores assigned for each measure are linearly related to performance, with a higher score indicating better performance. Values are based on the relationship to established independent standards of performance among thousands of previous SMS participants.4 Measures for Initiative are the log-transformed values. *The threshold for significance used for post hoc comparisons by pairwise contrasts of adjusted predictions was p < 0.008, which was derived by dividing 0.05 by 6, the number of post hoc pairwise comparisons made
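The threshold quoted in this caption follows from dividing the family-wise alpha by the number of pairwise contrasts among the four targeted concentrations. The short Python sketch below only reproduces that arithmetic; the variable names are illustrative and are not taken from the study's analysis code.

```python
from itertools import combinations

# Targeted CO2 concentrations (ppm) reported in the study.
concentrations_ppm = [600, 1200, 2500, 5000]

# Every unordered pair of concentrations is one post hoc contrast.
pairs = list(combinations(concentrations_ppm, 2))

# Family-wise alpha divided by the number of contrasts.
family_alpha = 0.05
per_comparison_alpha = family_alpha / len(pairs)

print(len(pairs))                      # number of pairwise contrasts
print(round(per_comparison_alpha, 4))  # per-comparison threshold
```

Four concentrations give six unordered pairs, so the per-comparison threshold is 0.05/6, or about 0.0083, which the caption reports as 0.008.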

Raw scores that have been normalized to the percentile ranks are illustrated in Fig. 2. In contrast to a prior report,4 percentile ranks on all measures were always average or higher at all concentrations of CO2 targeted. Average percentile ranks were most often observed when subjects were exposed to 1200 ppm CO2, and better than average percentile ranks were the norm at the other concentrations tested.

Fig. 2

Mean ± 95% confidence intervals of percentile ranks for SMS measures at targeted concentrations of CO2. Decision-making performance scores were converted to percentile ranks by indexing against scores of performance measured in more than 20,000 subjects ages 16–83 years who were chosen to represent the working population of the US.4 The baseline is composed of responses by a variety of members of this population, including students, professionals, homemakers, and laborers
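As an illustration of what indexing a raw score against a reference population involves, the sketch below computes a percentile rank in Python. It is not the SMS scoring procedure: the reference scores are hypothetical, and defining the percentile rank as the share of reference scores at or below the given score is an assumption made here for simplicity.

```python
from bisect import bisect_right

def percentile_rank(score, reference_scores):
    """Percentage of reference scores less than or equal to `score`."""
    ordered = sorted(reference_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)

# Hypothetical reference sample; the normative set described in the
# caption contains more than 20,000 subjects and is not reproduced here.
reference = [4, 6, 7, 9, 10, 12, 13, 15, 18, 20]

print(percentile_rank(12, reference))  # 6 of 10 reference scores are <= 12
```

Under this definition, a score equal to the reference median maps to roughly the 50th percentile, which corresponds to the "average" band referred to in the text.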

The effect on most SMS measures, as CO2 was increased from baseline to 1200 ppm, was a decrease in performance that was comparable to those observed in other studies.4,5 When viewed as a percentage change from the baseline (Fig. 3), the SMS measures that were most adversely affected differed among the studies but pairs of studies were similar to each other. Similarities in the set of most affected measures were greater when the study of Satish4 was compared to that of Allen5 and when this study was compared to that of Rodeheffer.11

Fig. 3

Percent change of SMS scores from baseline at elevated concentrations of indoor pollutants determined in several studies. When viewed as a percentage change from the baseline, the SMS measures that were most adversely affected differed among the studies, but similarities in the set of most affected measures were greatest between the reports of Satish4 and Allen.5 In the study of Allen,5 the most affected measures were the same for CO2 and VOCs. VOCs: volatile organic compounds
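The normalization behind this figure, a score expressed as a percentage change from its baseline value, can be sketched in a few lines of Python. The function name and the example scores are hypothetical and serve only to show the calculation.

```python
def percent_change_from_baseline(score, baseline):
    """Return the change of `score` relative to `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (score - baseline) / baseline

# Hypothetical raw scores for one SMS measure.
baseline_score = 20.0   # e.g. at 600 ppm
exposed_score = 15.0    # e.g. at an elevated concentration

print(percent_change_from_baseline(exposed_score, baseline_score))  # -25.0
```

Expressing each study's scores relative to its own baseline allows the pattern of most affected measures to be compared across studies whose raw scores are on different scales.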

Raw scores for all Cognition tests were examined for outliers by multiple methods. Removal of data points flagged by the majority of methods as potential outliers produced no effect on outcomes, and therefore the analyses were conducted using the complete data set. Estimated means for accuracy and for speed for all Cognition measures, at each of the CO2 concentrations targeted, are shown in Fig. 4.

Fig. 4

Mean ± 95% confidence intervals of accuracy (a) and speed (b) for the 10 Cognition measures by group at each of the targeted CO2 concentrations (600, 1200, 2500, 5000 ppm). p-Values refer to Type-III tests of fixed effects (with p < 0.05 indicating that at least one concentration differs from the overall mean)

The p-values for summary statistics of Cognition results are provided in Table 2. Mixed models were used to estimate group least-square means and their differences, and to determine whether the difference was significantly different from 0 (LSMEANS statement in SAS). Only one of the 10 measures showed a statistically significant (p = 0.0019) difference from baseline (600 ppm). This was an improved score (Percentage Correct [PC]) on the Visual Object Learning Task (VOLT) at 2500 ppm. This difference remained significant at p < 0.05 after correcting for multiple testing with the false discovery rate method (N = 20 tests).25 Digit Symbol Substitution Task (DSST) and Psychomotor Vigilance Test (PVT) accuracy outcomes were transformed to binary outcomes (1 indicating 100% correct on the DSST and > 90% of non-lapse and non-false start responses on the PVT) and non-linear mixed effect models equivalent to model 1, described in Methods, were run. Likelihood ratio tests based on the full model and a model with CO2 condition removed indicated a significant effect of CO2 condition for the DSST (p = 0.0260). Regression model contrasts indicated that subjects were more likely to achieve 100% accuracy in the 2500 ppm condition relative to 5000 ppm (p = 0.0078). The estimated probabilities for 100% accuracy on the DSST were 72.3%, 72.8%, 80.9%, and 56.6% for 600, 1200, 2500, and 5000 ppm, respectively (estimates for test 1, session 1, and average pre-exposure probability of 61.4%). For the PVT, the probability of achieving an accuracy score of >90% decreased in a dose–response-like fashion from 79.5% to 74.7%, 73.4%, and 64.0% for 600, 1200, 2500, and 5000 ppm, respectively (estimates for test 1, session 1, and average pre-exposure probability of 72.7%). However, there was no significant main effect of CO2 condition for the PVT (p = 0.4114).
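To illustrate why a p-value of 0.0019 can survive a false discovery rate correction over 20 tests, the Python sketch below applies the Benjamini-Hochberg step-up adjustment. This is an assumption: the text cites a false discovery rate method without naming the exact procedure, and the study's analyses were run in SAS, not Python. Only the value 0.0019 comes from the text; the other 19 p-values are hypothetical placeholders.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# 0.0019 is the reported VOLT p-value; the remaining 19 are hypothetical.
p_values = [0.0019] + [0.10 + 0.04 * k for k in range(19)]
adjusted = benjamini_hochberg(p_values)

print(round(adjusted[0], 4))  # adjusted p-value for the VOLT contrast
```

When 0.0019 is the smallest of 20 p-values, its adjusted value under this procedure cannot exceed 0.0019 × 20 = 0.038, which is consistent with the statement that the difference remained significant at p < 0.05.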

Table 2 Cognition summary statistics

The Cognition battery was administered early and late during the exposure period. Expected practice effects were noted for 5 of the 10 Cognition speed outcomes (Average Response Time [ART]) and accuracy on the Fractal 2 Back test (F2B) and Emotion Recognition Task (ERT) (Table 2), but no significant interaction between CO2 concentration and exposure duration could be found for any of the Cognition outcomes (all p > 0.05). Finally, Cognition performance post-exposure did not differ significantly between CO2 concentrations (adjusting for pre-exposure performance, all p > 0.05).

In addition to analyzing results of performance on the individual tests, analysis of aggregated standardized scores across tests was also performed. These data are summarized in Fig. 5. No significant main effects on speed (p = 0.0921), accuracy (p = 0.6304), or efficiency (p = 0.2976) were found, but speed, accuracy, and efficiency were lowest during exposures to 1200 ppm. While the overall effects were not statistically significant, they do indicate a trend for reduced accuracy, speed, and efficiency at 1200 ppm. However, performance across tests did not differ between baseline (600 ppm) and the two higher concentrations (2500 and 5000 ppm).

Fig. 5

Evaluation of standardized scores of speed, accuracy, and efficiency across tests (higher scores reflect better performance). The p-values for significant differences in overall speed across tests achieved at different CO2 concentrations are given on the graphs for Overall Speed. Error bars indicate the 95% confidence intervals
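A rough Python sketch of how scores from different tests can be placed on a common scale and aggregated is given below. How the study standardized its scores is not described in this section, so the within-test z-scoring, the sign flip for response times, and all numbers shown are assumptions made for illustration only.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values to mean 0 and population standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("values must not all be identical")
    return [(v - mu) / sigma for v in values]

# Hypothetical mean response times (ms) on two tests, ordered by
# condition: 600, 1200, 2500, 5000 ppm.
response_times = {
    "test_a": [500.0, 540.0, 510.0, 505.0],
    "test_b": [900.0, 960.0, 910.0, 905.0],
}

# Negate so that a higher standardized score means faster responses.
speed_z = {name: [-z for z in z_scores(rt)]
           for name, rt in response_times.items()}

# Aggregate across tests by averaging standardized scores per condition.
overall_speed = [mean(column) for column in zip(*speed_z.values())]

print([round(s, 2) for s in overall_speed])
```

Negating the standardized response times matches the convention in the caption that higher scores reflect better performance.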

The number of subjects that we assessed was based upon a power analysis of data from the study of Satish.4 Table 3 shows that the average coefficient of variation (CV) in our study at 2500 ppm (0.35) and 5000 ppm (0.34) was less than the CV at 1200 ppm (0.52) and also less than that of Satish4 at 2500 ppm (0.49), at which concentration effects on the SMS were pronounced. The CVs in the study of Rodeheffer,11 which reported no effects of CO2 upon performance on the SMS, ranged between 0.47 and 0.53. Therefore, the comparison of CVs among the studies indicates that the absence of significant effects at our two higher concentrations was not a consequence of greater variability, and hence less power to detect significant differences, at those concentrations.
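For reference, the coefficient of variation is the standard deviation divided by the mean. The Python sketch below computes it for a hypothetical set of scores; whether the study used the sample or the population standard deviation is not stated, so the sample form used here is an assumption.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean must be non-zero")
    return stdev(values) / m

# Hypothetical scores for one measure at one CO2 concentration.
hypothetical_scores = [10.0, 12.0, 14.0, 16.0, 18.0]

print(round(coefficient_of_variation(hypothetical_scores), 3))  # 0.226
```

Because the CV is unitless, it allows the spread of scores to be compared across measures and studies that differ in scale, which is the comparison drawn in Table 3.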

Table 3 Coefficients of variation of measures of the SMS from several studies in which performance was assessed during exposures to CO2

None of the subjectively assessed outcomes differed significantly between CO2 exposure concentrations (p > 0.05). The estimated means of all outcomes were in the bottom half or third of the scale.

Discussion

A principal aim of this study was to determine if the adverse effect of low concentrations of CO2 on the decision-making abilities of predominantly young college-age adults4 could be replicated in older astronaut-like individuals. Clearly, the dose-dependent, monotonic, reciprocal relationship between CO2 concentration and performance on the SMS that was demonstrated in earlier studies4,5 was not replicated in this study, which included concentrations within the ranges used in those earlier studies (Figs. 1 and 2). Interestingly, the response from baseline to 1200 ppm, for most measures, exhibited a decrease in performance that was comparable to those observed in other studies (Fig. 3). However, this trend did not hold in this study population at higher concentrations.

Our findings at 2500 and 5000 ppm diverge from those anticipated by the findings of earlier studies that demonstrated substantial effects of CO2 upon performance on the SMS at lower concentrations,4,5 but the absence of an effect at 2500 ppm replicates the finding of Rodeheffer11 at that concentration. On the other hand, we detected effects at 1200 ppm, as have other studies.4,5 Therefore, our findings both comport with and diverge from the findings of others.4,5,11 Several factors, discussed below, may contribute to our unusual and unexpected findings.

One potential variation among the studies that could contribute to differences in performance is the amount of sleep obtained by subjects preceding their exposures to CO2. Although the amount of sleep during the night preceding each of the exposures did not differ significantly among the targeted CO2 concentrations, the amount of sleep obtained by an individual was a significant covariate for the variable Initiative (p = 0.0332). The sleep status of the subjects was not reported in the study of Satish.4 In a study26 in which the SMS was utilized, it was observed that an improvement of 25% in sleep score was associated with a 2.8% increase in cognitive function scores. If decrements follow a reciprocal relationship to that shown for improvements, then, because our subjects averaged only 6.3 h of sleep during the nights preceding their exposures (the nightly average of the general population is 6.8 h27), the difference between the large percent decrease in cognitive scores seen in the study of Satish4 and the absence of similar effects in this study at the same concentrations of CO2 could be attributed to differences in the sleep status of the subjects of the two studies only if sleep scores among Satish’s4 subjects were well below those of this study.

Differences in characteristics of various subject populations may account for diverging outcomes in studies assessing effects of CO2 upon decision-making. It may be that astronaut-like operations personnel and submariners, who are high-level performers, are more likely to have heightened situational awareness because of their stringent training. Therefore, these groups may develop faster adaptive patterns of responses and be more perceptive of their cognitive decline, and therefore may compensate more efficiently for self-perceived drops in performance than subjects drawn from the general population. Such distinctions could explain the differences in outcomes between college students4 and submariners11 to elevated CO2, but the decrements in performances of astronaut-like subjects that occurred when they were exposed to 1200 ppm CO2 are inconsistent with this account.

There is abundant evidence that the default decision-making paradigms of young and/or novice individuals differ from those of older and/or experienced individuals.13,14,15,16,17 The former most often make use of expected utility or compensatory decision-making paradigms and the latter are more likely to employ heuristics or noncompensatory mechanisms.13,14,15,16,17 If the decision-making paradigms were different among different subject populations and the SMS provides a more sensitive measure of one paradigm than the other, such circumstances could produce the disparities in outcomes that have occurred among studies that have utilized the SMS to assess the effect of CO2 upon complex decision making.4,5,11 Populations that are using similar decision-making paradigms may be more likely to share similarities in the subset of SMS measures most affected by CO2 than those that are using different decision-making paradigms. Fig. 3 illustrates greater similarity in most affected measures between the studies of Satish4 and Allen5 and between our study and that of Rodeheffer11 when the effect is measured as a percentage deviation from baseline values. A post hoc analysis of variance with the data from Table 3 showed that the means of individual measures of the SMS are most often significantly different between the study of Satish4 and that of Rodeheffer11 and between Satish’s study4 and our study, whereas there were few measures with a significant difference between our study and that of Rodeheffer11 (Table 4). The subjects of Satish’s study4 were predominately college students whereas subjects of the studies of Rodeheffer11 (US submariners) and this study (astronaut-like subjects) were older and principally from operations-oriented disciplines. 
The performance scores reported by Allen’s study,5 which involved professional-grade employees (architects, designers, programmers, engineers, creative marketing professionals, managers), were normalized to a unique experimental condition and so could not be directly compared to those of other studies. Our subjects exhibited performance decrements at 1200 ppm comparable in magnitude to those observed in Satish4 and Allen5 at similar concentrations. This finding indicates that the SMS is also sensitive to CO2-induced decrements in the decision-making paradigm that may be shared by astronaut-like subjects and submariners, which likely differs from that of the subjects of the studies of Satish4 and Allen.5 Therefore, we conclude that it is unlikely that disparities in outcomes among the studies that have assessed effects of CO2 on complex decision making with the SMS are due to differences in the sensitivity of the SMS to different decision-making paradigms used by the various subject populations. The disparities are more likely due to differing characteristics of the various subject populations and differences in the aggregation of unrecognized stressors, in addition to CO2.

Table 4 Significant differences in measures of the SMS among several studies in which performance was assessed during exposures to CO2

Because the decrements in performance on the SMS observed when 1200 ppm CO2 was targeted were not observed at higher concentrations of CO2, the possibility that the effect observed could have arisen from circumstances that were unique to conditions during exposure at 1200 ppm was considered. Ventilation rates differed between the exposures at 600 ppm and those at the three higher concentrations of CO2. When 600 ppm was targeted, CO2 produced metabolically by the subjects was prevented from accumulating by continuous operation of a blower that brought outside air into the third level of the chamber at 4.5 m3/min. CO2, when required, was introduced via the heating, ventilation, and air conditioning (HVAC) system, which at this targeted concentration was operated continuously at 5.4 m3/min. With all other targeted CO2 concentrations, the fresh air blower was disengaged and the HVAC flow was operated continuously at 5.1 m3/min. Therefore, accumulation of volatile organic compounds (VOCs) and/or BEs emitted by the subjects would be expected to be lowest when 600 ppm CO2 was targeted and higher during exposures to the other concentrations of CO2, during which no outside air was brought into the exposure chamber. Because accumulation of VOCs or BEs has measurable effects on performance on the SMS,5,26,27,28,29 it would be expected that if these agents contributed to the depressed performance at 1200 ppm, then their effects should also have been evident when the two higher concentrations of CO2 were targeted, unless these effects were alleviated by the higher concentrations of CO2. 
Increased CO2 blood concentrations elicit a number of physiological responses triggered by a pH-induced stimulation of central and peripheral chemoreceptors, including increases in heart rate and minute ventilation, cerebral arterial vasodilation, and central nervous system (CNS) arousal.30,31,32,33 For these reasons, it is plausible that a slight to moderate increase in CO2 levels increases CNS arousal and cognitive performance. However, the possibility of mitigation of effects of BEs by the higher levels of CO2 seems disallowed by reports of adverse effects on performance on the SMS4,5 in subjects exposed to CO2 at lower concentrations and ventilation rates sufficiently high to effectively purge BEs4,5 (Table 5).

Table 5 Exposure parameters

Findings converse to those discussed above4,5 have been reported8,9,10 from studies in which moderate accumulations of metabolically produced CO2 and accompanying BEs, but not exposures to identical concentrations of pure CO2, caused decrements in cognitive performances.8,9,10 The findings by Zhang8,9,10 provide no support for the hypothesis that adverse effects of VOCs and BEs may be mitigated by CO2 at our higher concentrations, unless the comparable levels of CO2 in Zhang’s studies were accompanied by substantially greater levels of BEs than those in our study. The levels of BEs were not reported by Zhang, but they could have been well in excess of the levels of BEs accumulated in this study because our targeted concentrations of CO2 were attained in a chamber volume that exceeded that of Zhang by a factor >2.5 (Table 5) and, unlike Zhang,8,9,10 exogenous CO2 had to be added to achieve our high targeted concentrations.

Mitigation by CO2 of the effects of VOCs and BEs at the higher concentrations in this study may be refuted by the observation of performance decrements among office workers in locations described as afflicted with sick building syndrome. In these locations, high levels of VOCs and BEs are accompanied by elevated levels of CO2. However, in these settings, the sources of VOCs are potentially much greater than those in exposure chambers of controlled studies, and other environmental factors may be influencing performance as well.28

Although it is possible that CO2 at higher concentrations mitigates effects of BEs and/or VOCs in this study, in view of the disparate outcomes among this study and the various studies that have assessed the effects of CO2 upon complex decision making4,5,11 or general cognitive performance,8,9,10 it seems most probable that differing characteristics of the various subject populations and differences in the aggregation of unrecognized stressors, in addition to CO2, were responsible for the varied, disparate, and conflicting outcomes among these studies.

A principal objective in utilizing Cognition was to investigate whether performance on this test battery, which was specifically designed for the high-performing astronaut population, is affected by short-term exposure to levels of CO2 routinely occurring on the ISS. This ground-based study avoided other environmental stressors typically encountered on the ISS that could have confounded the effects of CO2 on cognition (e.g., fatigue, stress, high workloads) and permitted a direct assessment of the effects of brief exposures to low concentrations of CO2 on cognitive functions assessed by Cognition. A significant CO2 main effect was only observed for accuracy on the VOLT and for the probability to achieve perfect accuracy on the DSST. However, there was no clearly discernible dose–response pattern for any of the individual measures of Cognition.

When the results obtained with all Cognition measures were taken in aggregate, a slight decrease in performance at 1200 ppm relative to 600 ppm was observed. Performance with higher, but still modest, CO2 concentrations (2500 and 5000 ppm) was similar to performance at baseline (600 ppm). With effect sizes <0.2, the differences between CO2 conditions were small. This “dose–response” of performance on Cognition to CO2 recapitulates the dose–response obtained with the SMS test, which was administered during the same exposure sessions. It seems likely that the factors that were responsible for the dose–response pattern seen with the SMS, identified in the earlier discussion of results of the SMS, also produced the similar pattern in the aggregated scores of Cognition. The convergence of results obtained with Cognition and with the SMS provides confidence in results that differ significantly from those anticipated by the findings of Satish.4

The effects of short-term exposure to CO2 concentrations of up to 5000 ppm on Cognition performance were small and with no dose–response function that would indicate decreasing performance levels with increasing CO2 levels. Past studies on the effects of elevated CO2 levels on cognitive performance investigated substantially higher CO2 concentrations, and only some studies found effects on cognitive performance.12 As noted earlier, it is plausible that a slight to moderate increase in CO2 levels increases CNS arousal and cognitive performance. Based on the paucity of literature, symptom reports related to increased levels of CO2, and the CNS arousing properties of CO2, both positive and negative associations between CO2 levels and cognitive performance were plausible outcomes of our study.

The current findings suggest that performance on Cognition is not relevantly affected if astronaut-surrogate subjects are exposed to CO2 concentrations of up to 5000 ppm for less than 3 h. On the other hand, it could be that none of the 10 Cognition tests was sensitive enough to detect subtle CO2-induced changes in cognitive performance, or that the 10 tests did not cover those cognitive domains that would be considerably affected by elevated CO2. This is unlikely, however, as Cognition covers a range of cognitive domains and has been shown to be sensitive to other stressors like sleep loss,34,35 recovery from anesthesia,36 and head-down tilt bed rest.35 It is thus more likely that any observed effects induced by short-term exposure to CO2 concentrations of up to 5000 ppm were simply too subtle to induce relevant changes in performance on the measures of Cognition.

Interestingly, a recently published study on the effects of 12° head-down tilt with and without elevated levels of CO2 also found the VOLT as the most sensitive test relative to 5000 ppm CO2 levels.35 Therefore, it could be that the medial temporal cortex and the hippocampus are especially sensitive to changes in CO2 concentration, with concomitant changes in memory performance.

SMS and Cognition test performances assessing a range of cognitive domains important for safe spaceflight operations suggest minor effects of an exposure for <3 h to CO2 concentrations of up to 5000 ppm in the investigated ground-based population. Both the SMS and Cognition demonstrated a slight performance decrease at 1200 ppm relative to 600 ppm. Our results are unique and comport with neither those of Satish4 nor those of Rodeheffer,11 which conflict with each other in their conclusions regarding the effect of CO2 on complex decision-making as assessed by the SMS. It is possible that the effects we observed on both the SMS and Cognition may be due to accumulated VOCs and BEs, and the recovery of performance with higher but still modest CO2 concentrations may be related to the excitatory and vasodilatory properties of CO2. However, in view of the disparate outcomes among this study and the various studies that have assessed the effects of CO2 upon complex decision making4,5,11 or general cognitive performance,8,9,10 it seems most probable that differing characteristics of the various subject populations and differences in the aggregation of unrecognized stressors, in addition to CO2, were responsible for the conflicting outcomes among these studies. Environmental control and life support systems of spacecraft are required to avoid accumulation of VOCs and BEs. Additional studies of acute exposures, along with studies of longer exposure durations and studies that evaluate the effects of acute CO2 spikes on top of an elevated background, are needed to further evaluate potential adverse impacts of CO2 on decision-making and cognition during spaceflight operations.

Methods

The study was reviewed and approved by the Institutional Review Board (IRB) of the Johnson Space Center (JSC). Written informed consent was obtained from the human participants who took part in the study. Twenty-two healthy, astronaut-like persons at the JSC were recruited by the Human Subject Test Facility at JSC to participate in this investigation. Volunteer subjects were selected according to inclusion criteria that are used in the selection of astronaut candidates (see Subject Criteria in Supplementary Method). Exclusion criteria (see Subject Criteria in Supplementary Method) were used to avoid potential risks to the subject or study.

A double-masked format was used in which the subjects, the experimenters, and the data analysts were all unaware of the CO2 concentrations used during any of the exposure sessions. Our experimental design involved four groups, each composed of 4–6 subjects who participated in repeated trials of the experiment under varying concentrations of CO2. Subjects participated in each of four different conditions: 600, 1200, 2500, and 5000 ppm CO2. Each group was exposed to one concentration of CO2 on 1 day in each of 4 consecutive weeks. We randomized groups to one of four different dose exposure sequences (dose orders: A, C, B, D; B, D, A, C; D, A, C, B; C, B, D, A).
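
The four dose orders listed above appear to form a Latin square, with each condition occurring once in every sequence and once in every weekly position. The following is a minimal sketch that checks this property. The mapping of the letters A to D onto the four CO2 concentrations is not stated here, so the letters are kept as they are, and the function name is hypothetical.

```python
# Dose-order sequences as listed in the text. The mapping of letters A-D
# to the four CO2 concentrations is not given there, so letters are kept.
SEQUENCES = [
    ["A", "C", "B", "D"],
    ["B", "D", "A", "C"],
    ["D", "A", "C", "B"],
    ["C", "B", "D", "A"],
]


def is_latin_square(sequences):
    """Return True if every condition occurs once per row and once per column."""
    conditions = set(sequences[0])
    n = len(conditions)
    if len(sequences) != n:
        return False
    # Each sequence (row) must contain every condition exactly once.
    rows_ok = all(len(row) == n and set(row) == conditions for row in sequences)
    # Each weekly position (column) must also contain every condition once.
    cols_ok = all(set(col) == conditions for col in zip(*sequences))
    return rows_ok and cols_ok


print(is_latin_square(SEQUENCES))  # True
```

A Latin square balances conditions across session positions, but it does not by itself balance which condition precedes which.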

Each group was exposed for ~3 h in the morning, on 1 day each week, for 4 consecutive weeks. Groups 1 and 2 completed their full sequence of exposures before exposure sessions were begun with groups 3 and 4. Each session included the steps and intervals illustrated in Fig. 6.

Fig. 6 Sequence and durations of events on days of exposure. The sequences and durations of tests and intervening rest periods on days of exposure are indicated on the timeline.

Significant work responsibilities prevented some subjects from attending all sessions. After permission of the IRB at the JSC was secured, additional sessions were scheduled with subjects who were willing to reschedule a missed session. As with the regularly scheduled sessions, only the chamber operators were cognizant of CO2 concentrations targeted in the rescheduled sessions. Two of these sessions targeted 1200 ppm for members of group 2, others targeted 600 and 2500 ppm for members of group 4, and 5000 ppm for a member of group 3. Sessions could not be rescheduled for three subjects, members of groups 1, 2, and 4. Two of the subjects missed sessions in which 5000 ppm was the targeted exposure concentration and the third missed a session in which 1200 ppm was targeted. The use of makeup sessions resulted in no unique exposure sequences. In all cases, subjects who did not complete the full complement of exposures had the best performances (aggregate score for all SMS measures) at 2500 ppm. The actual subject-groupings and exposure sequences are shown in Supplementary Table 1 that is available on-line.

Information on the quantity and quality of sleep of each subject was provided by data from actigraphy and sleep logs. Subjects were required to wear an actigraphy watch (Actigraph wActiSleep-BT) for 7 days before their first exposure and throughout their entire participation in the study.

Exposures were performed on the first floor of a human-rated three-story, 20-foot chamber (229 m3 total volume) at JSC. The facility was configured to support the safe evaluation of human subjects at elevated concentrations of CO2 for a period up to 4 h at sea level, with normal O2, and room temperature conditions. A Pressure Control System was modified (both mechanically and via firmware) to provide the introduction, monitoring, and control of CO2 for the chamber. The chamber has an adjustable HVAC system and a dedicated two-speed positive pressure blower. Both were used to maintain temperature and humidity, and CO2 in the desired ranges. To maintain CO2 at the lowest concentration targeted, a blower was used to prevent accumulation of CO2 produced metabolically by the subjects. This blower augmented ventilation provided by the HVAC (5.4 m3/min) by bringing outside air into the third level of the chamber at a rate of 4.5 m3/min. For all other targeted CO2 concentrations, the fresh air blower was disengaged, and the HVAC flow was decreased to 5.1 m3/min. Two high-resolution (0–5000 ppm) and two low-resolution (0–7000 ppm) sensors were located on the first level, and two low-resolution sensors were located on the unoccupied second and third levels. In addition to CO2, oxygen content, relative humidity, pressure, and temperature were monitored and recorded for the first level of the chamber. Three noise dosimeters were distributed in the exposure chamber. These dosimeters were accurate between 70 and 140 dB.

The primary outcomes for the study are the cognitive performance measures provided by the same SMS software that had been used to examine the effects of elevated levels of CO2 on aspects of cognitive decision-making by college students.4 In that study, nine cognitive scores (each derived from multiple measurements built into a computer program that subjects interact with) were assessed under three different CO2 levels: 600, 1000, and 2500 ppm. The factor scores resulting from the SMS software are continuously scaled and normally distributed, and appropriate for analysis by standard parametric statistical methods. From these data, we extracted the means, variability measures, and correlations among repeated measures necessary to derive power curves that associate the likelihood of detecting effects of similar magnitude among these three levels of CO2 on the nine cognitive factors. Power analysis indicated that a minimum n of 20–25 would be sufficient to exceed 80% power to detect differences between 1200 and 2500 ppm on all nine of these cognitive factors, and five of the nine factors in the 600 vs. 1200 ppm comparisons.

We used the SMS4 (Upstate Medical University, State University of New York) in our assessment of effects of each of four concentrations of CO2 on cognitive functions. SMS test simulations are broad, open-ended performance-based test scenarios that assess a wider range of neural substrates than tests that assess one or a small subset of executive functions. Therefore, a broader survey may provide a greater range within which to detect decrements.4 The SMS is unique in that it assesses the process of adaptive decision-making (planning, execution, and monitoring), whereas other psychometric tests typically assess individual or more limited sets of executive functions. Executive functions are high-level abilities that influence more basic functions, and include initiation, planning, sequencing, monitoring (attention), problem solving, working memory, divided attention, flexibility, and motor skills.37 Executive functions are important for adaptation and performance in real-life situations. In real world settings, options, priorities, and requirements are not always evident, outcomes depend on self-initiated actions and monitoring, and the effects of choices and actions may not be apparent. The SMS test simulations expose subjects to situations in which decisions must be made in conditions of volatility, uncertainty, complexity, and with delayed feedback.38 Decision-making competence is assessed in the SMS by how information is applied to make a decision. This is in contrast to assessments of decision-making that assess what was decided.

Prior to the first testing session, subjects were provided with a training session in which they were familiarized with the operation of the SMS during an abbreviated presentation of a scenario. Four scenarios were used during the study. Each of the four scenarios was used once with each group; the order of presentation of the scenarios was the same in each group, and therefore the CO2 concentrations during which each scenario was presented differed among groups. The availability of multiple scenarios allowed retesting of subjects, which greatly reduces bias due to experience and learning effects, and intra-subject variability is low.4 Scenarios were presented to subjects via personal computer along with a variety of options to deal with the circumstances presented, including the option to do nothing. All subjects received the same quantity of information at fixed points in the simulated time, but actions could be taken and decisions made at any time during the simulation. Subjects, therefore, as in the real world, were not constrained to a particular action, plan, or strategy style. The SMS calculated raw scores based on the actions taken in response to incoming information and to information available earlier, on outcomes, and on subjects’ stated plans. More than 80 computer-gathered measures, which have been identified in earlier simulation studies as optimal predictors of success in complex decision making and subjected to multiple stepwise regression procedures to identify intercorrelations among simulation measures, are loaded on reliable and independent factors based on factor analytic varimax rotation of data collected from more than 20,000 subjects.7,38,39,40,41,42,43 The validated measures, which are derived from complexity theory, vary from assessments of simple competencies, such as speed of response and task orientation, to initiative, use of information, breadth of approach to problems, planning capacity, and strategy (Table 6).
The measures have been validated by successfully predicting success among individuals engaged in positions exercising considerable complex perceptual and decision-making tasks.7,38,39,40,41,42,43 Decision-making performance scores were converted to percentile ranks by indexing against scores of performance measured in more than 20,000 subjects ages 16–83 who were chosen to represent the working population of the US.4 The baseline is composed of responses by a variety of members of this population, such as from students, professionals, homemakers, and laborers.
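
Conversion of a raw score to a percentile rank against a reference sample can be illustrated with a short sketch. It assumes the common mid-rank definition, in which ties count as half. The exact indexing procedure used by the SMS software is not described here, and the function name and reference values below are hypothetical, made-up numbers.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, reference_scores):
    """Percent of reference scores below `score`, counting ties as half."""
    ordered = sorted(reference_scores)
    below = bisect_left(ordered, score)          # scores strictly lower
    ties = bisect_right(ordered, score) - below  # scores exactly equal
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical reference sample (made-up numbers, not SMS data).
reference = [10, 12, 15, 15, 18, 20, 22, 25, 30, 35]
print(percentile_rank(15, reference))  # 30.0
```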

Table 6 Descriptions of measures of the SMS4,39

Raw scores for all measures at all points during all sessions were examined for outliers by inspection of scatter plots, box plots, and plots of Cook’s distance, covariance ratios, robust regression residuals vs. robust distance, and examinations of Studentized residuals. Removal of data points flagged by at least three methods as potential outliers (11 of 850 data points) produced no effect on outcomes, and therefore the analyses were conducted using the complete data set. Statistical analysis software (Stata 14.1, College Station, TX; SAS 9.4, Cary, NC) was used for analyses, employing hypothesis-driven two-tailed alpha to reject the null hypothesis at 0.05. The main-effect variable examined, concentration of CO2, was treated as a categorical variable with values of 600, 1200, 2500, and 5000 ppm. Statistical assumptions were tested in concert with all techniques, and appropriate data transformations were used as needed to meet these assumptions. Values of the variable Initiative were transformed to their logarithms to meet criteria required for parametric analyses.

All of our primary outcome measures described above are continuously scaled, and all followed a normal distribution (or could be normalized) so that standard parametric statistical techniques were used. For these outcomes, we submitted the data to separate (per outcome) mixed-effects analyses that included both repeated-measures ANOVA (with SAS) and repeated-measures (subject) random intercept restricted maximum-likelihood method (Stata) to accommodate the repeated-measures experimental design. Our preliminary models included main effects and an interaction term for a variable in order to determine if the variable influenced performance outcomes. We independently assessed age, gender, session, and sleep durations preceding exposures as covariates. The amount of sleep by an individual preceding each exposure was found to be a significant covariate for the variable Initiative. Otherwise no effect for any factor was observed, so we reverted to a primary model comparing effects of the various concentrations of CO2 on each of the SMS factors. When significant differences were determined to exist among effects of concentrations, post hoc analyses among multiple pairs of concentrations were conducted using both Diffograms (mean—mean scatter plot) produced with Proc GLIMMIX (SAS) and pairwise contrasts of adjusted predictions (Stata) to determine which concentrations differed. The threshold for significance used was 0.008, which was derived by dividing 0.05 by 6, the number of post hoc pairwise comparisons made.
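
The post hoc threshold follows from a Bonferroni-style division of alpha by the number of pairwise comparisons among the four concentrations. A minimal sketch of that arithmetic is shown below; the variable names are illustrative only.

```python
from itertools import combinations

CONCENTRATIONS_PPM = [600, 1200, 2500, 5000]
ALPHA = 0.05

# Every unordered pair of concentrations is one post hoc comparison.
pairs = list(combinations(CONCENTRATIONS_PPM, 2))
threshold = ALPHA / len(pairs)

print(len(pairs))           # 6
print(round(threshold, 4))  # 0.0083
```

The unrounded value is 0.05/6 ≈ 0.0083, so the reported threshold of 0.008 is slightly more conservative than the exact quotient.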

Before taking each Cognition test battery, subjects filled out a 10-item Likert-type (range 0–10) survey that asked, “How are you feeling now?”. The questions had the following anchors: Not sleepy at all–Very sleepy, Happy–Unhappy, No headache–Severe headache, Energetic–Physically exhausted, Mentally sharp–Mentally fatigued, Not stressed at all–Very stressed, Not confused at all–Very confused, No shortness of breath–Severe shortness of breath, No problems concentrating–Severe problems concentrating, Heart beating normally–Heart racing. The survey also asked subjects to identify items consumed, including food, drink, smoking, medications, and to indicate the quantities and times of consumption. The times of the start and ending of any strenuous activities were also requested.

We implemented a version of the Cognition battery of psychometric tests as described by Basner23 and by Moore.44 The tasks are “touch-based cognitive tasks” administered via an iPad. Data (metrics, metadata, and configuration data), as well as comments that can be entered by subjects, were recorded at the completion of each task. The component tasks of the Cognition battery, the cognitive domains involved, the primary brain areas recruited for each task, and the average duration for each task are shown in Table 7.

Table 7 Cognition Tasks: The table identifies the cognitive domain, brain areas primarily recruited in performing the task and the time required to administer the task23

The Cognition test battery consists of 10 brief neurocognitive tests (tasks) that cover a range of cognitive domains (Table 7). These include executive control, memory, attention, emotional processing, risk decision-making, abstraction, and sensorimotor speed. It was specifically designed for high-performing astronauts, and consists of 15 unique versions that allowed repeated administration of the battery with minimal re-use of the same stimuli. Importantly, brain regions involved in performing each of the Cognition tests have been established with fMRI, and the tests that are the basis for Cognition have been well validated in both healthy individuals (e.g., 60,000 soldiers in the Army STARRS project)45 and patient populations.46 Cognition was performed on a fourth-generation iPad in this study.

Cognition consists of the following 10 cognitive tests (for a detailed description of the battery see Basner23): The Motor Praxis Task is a measure of sensory-motor speed and taps the sensorimotor cortex.47 Participants had to mouse click on ever-shrinking blue boxes that appeared in varying locations on the screen. The VOLT is a measure of visual object learning and memory, and links to the medial temporal cortex and the hippocampus.48 Participants had to remember and later recognize ten 3D Euclidean shapes. The Fractal-2-Back is a measure of attention and working memory related to the dorsolateral prefrontal cortex, cingulate cortex, and hippocampus.49 Fractal images were projected at 1 Hz and participants were asked to press the spacebar whenever the fractal on the screen was the same as the fractal before the previous one (2 back). The Abstract Matching Task is a measure of abstraction and recruits prefrontal cortex.50 Participants were asked to pair a central target object with two objects on either the left or the right lower side of the screen. The Line Orientation Task is a measure of spatial orientation ability, based on Benton’s test, and activates the right temporo-parietal cortex and the visual cortex.51 In each trial, participants were asked to rotate a moveable blue line of variable length so that it is parallel to a fixed black line. The Emotion Recognition Task recruits the cingulate cortex, amygdala, hippocampus, and fusiform face area.52 Participants were shown a series of faces and asked to determine what emotion each face was showing: happy, sad, anger, fear, or no emotion. Difficulty was varied by emotion intensity. 
The Matrix Reasoning Task is a measure of abstract reasoning and consists of increasingly difficult pattern-matching tasks.47,53,54 It is analogous to Raven Progressive Matrices55 and recruits prefrontal, parietal, and temporal cortices.54 The Digit Symbol Substitution Task involves matching numbers to symbols and is a measure of complex scanning, visual tracking, and processing speed.56,57,58 It relates to temporal, prefrontal, and motor cortices. The Balloon Analog Risk Task is a measure of risk decision-making and recruits the orbital frontal cortex, amygdala, hippocampus, and anterior cingulate cortex.59 Participants bet by inflating 30 computerized balloons, with larger balloons offering greater but riskier rewards since no reward is given if the balloon “explodes”. The 3-min Psychomotor Vigilance Test measures vigilant attention by recording reaction times to visual stimuli that appeared at random inter-stimulus intervals.60,61,62 It relates to prefrontal, motor, and visual cortices. Cognition was administered before, during (early and late), and after each exposure session.

For each of the 10 Cognition tests, one key accuracy outcome and one key speed outcome were analyzed using a linear mixed-effects model with restricted maximum-likelihood estimation. Random-effects intercept terms per subject were used to accommodate the repeated-measures experimental design. For each outcome variable, we calculated four separate models:

1. Discrete CO2 effect model: Independent variables included CO2 condition (four levels), experimental session (four levels), time in CO2 (two levels), and pre-exposure performance (continuous variable).
2. CO2 effect by time in CO2 interaction model: As model 1, but including a CO2 condition/time in CO2 interaction term.
3. Recovery model: Independent variables included CO2 condition (four levels), experimental session (four levels), and pre-exposure performance (continuous variable).
4. Continuous CO2 effect model: Independent variables included CO2 exposure level (continuous), CO2 exposure level squared (continuous), and pre-exposure performance (continuous variable).
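
As one way to read the continuous model (model 4) with its per-subject random intercept, the specification could be written as below. The notation is illustrative and not the authors' own: y_ij is the outcome for subject i at measurement j, c_ij the CO2 exposure level, p_ij the pre-exposure performance, u_i the random subject intercept, and ε_ij the residual. The normality assumptions shown are the usual ones for a linear mixed-effects model.

```latex
y_{ij} = \beta_0 + \beta_1 c_{ij} + \beta_2 c_{ij}^{2} + \beta_3 p_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^{2}), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})
```

The squared term allows a non-monotonic relation between CO2 level and performance, such as a dip at an intermediate concentration.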

For models 1, 2, and 4, data were restricted to the measurements performed in the chamber. For model 3, data were restricted to the post-exposure measurement. Least-squares estimation was used to produce predicted average scores and confidence limits for each dose level by predicting the marginal means over a balanced population. Q–Q plots of model residuals were checked for normality. Only residuals for models with DSST percent correct and the PVT accuracy as outcomes did not follow a normal distribution. These outcomes were transformed to binary outcomes (100% accuracy was coded as 1, and 0 otherwise). We then ran non-linear mixed effect models for model 1 above. Four subjects were identified as being potentially non-compliant on one test (N = 3 subjects) or two tests (N = 1 subject). In sensitivity analyses, analyses were repeated without these subjects. A total of 111 (or 3.0%) out of 3740 expected test bouts were missing due to absent subjects or subjects logging in with the wrong ID.
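
The binary recoding of the two non-normal accuracy outcomes, and the reported share of missing test bouts, can be illustrated with a short sketch. The function name and the example scores are hypothetical; only the counts 111 and 3740 come from the text.

```python
def binarize_accuracy(percent_correct):
    """Code a perfect score (100%) as 1 and anything lower as 0."""
    return 1 if percent_correct >= 100.0 else 0


scores = [100.0, 98.5, 100.0, 87.0]  # made-up values, not study data
print([binarize_accuracy(s) for s in scores])  # [1, 0, 1, 0]

# Share of missing test bouts, using the counts reported in the text.
missing, expected = 111, 3740
print(round(100 * missing / expected, 1))  # 3.0
```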

Speed and accuracy scores across tests were generated by first z-transforming each outcome based on the mean and standard deviation of the four pre-exposure tests calculated using the data of all subjects, and then averaging z-transformed scores across the 10 tests (speed scores were multiplied by −1 so that higher scores reflected faster speed). MPT, DSST, BART, and PVT were not included in the calculation for the accuracy score: subjects were not asked to hit the center of the square (MPT), PVT and DSST primarily address speed, and BART primarily addresses risk taking and not accuracy. For the ERT and MRT, we used weighted scores based on Item Response Theory analyses of individual stimuli. Efficiency scores were calculated by averaging speed and accuracy scores. Data from tests of non-compliant subjects were excluded from standardization and analysis (i.e., 0.6% of data excluded). All Cognition data were analyzed using SAS v9.4.
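
The standardization and averaging steps can be sketched as follows. This is a simplified illustration, not the authors' SAS code: the test names, baseline values, and observed values are made up, only two hypothetical tests are used, and the Item Response Theory weighting for the ERT and MRT is omitted.

```python
from statistics import mean, stdev


def z_score(value, baseline):
    """Standardize a value using the mean and SD of the pre-exposure tests."""
    return (value - mean(baseline)) / stdev(baseline)


def composite(z_values, flip_sign=False):
    """Average z-scores across tests; flip the sign for response-time outcomes."""
    sign = -1.0 if flip_sign else 1.0
    return mean([sign * z for z in z_values])


# Made-up pre-exposure data pooled over subjects, for two hypothetical tests.
baseline_rt = {"test_a": [500, 520, 480, 500], "test_b": [900, 950, 850, 900]}
baseline_acc = {"test_a": [0.80, 0.90, 0.70, 0.80], "test_b": [0.60, 0.70, 0.50, 0.60]}

# Made-up in-chamber observations for one subject.
observed_rt = {"test_a": 480, "test_b": 850}
observed_acc = {"test_a": 0.90, "test_b": 0.70}

# Speed: response times are multiplied by -1 so higher means faster.
speed = composite(
    [z_score(observed_rt[t], baseline_rt[t]) for t in baseline_rt],
    flip_sign=True,
)
accuracy = composite([z_score(observed_acc[t], baseline_acc[t]) for t in baseline_acc])
efficiency = mean([speed, accuracy])

print(round(speed, 3), round(accuracy, 3), round(efficiency, 3))
```

Because every outcome is expressed in baseline standard deviations, tests with different raw units can be averaged into one score.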

Reporting Summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.