Introduction

Goal progress is seldom straightforward. Mistakes, competing priorities, distractions, and temptations are common setbacks. The volatility of goal progress necessitates internal regulatory systems to represent our intentions, monitor for conflicting events, and exert control to create behavior that is aligned with our intentions. These systems are commonly identified as self-regulation, and typically involve a discrepancy-reducing process that reduces the distance between intended and actual states1,2.

Self-regulation is the focus of intense investigation in psychology and neuroscience, and is variously identified as self-regulation, self-control, executive functioning, and/or cognitive control. This terminology has largely arisen in different sub-disciplines, and often co-occurs with differences in both methods and level of analysis, ranging from self-reported traits and reported everyday experience, to behavioral and neuroscientific analyses of speeded tasks. Despite obvious differences, each perspective agrees that self-regulation is not unitary, but that it instead relies on multiple processes, including goal setting, planning, reward sensitivity, performance monitoring, and inhibitory control, among others2,3. Each perspective also addresses a common overarching problem: how do we guide our thoughts, feelings, and behaviors out of trouble and towards an intended outcome? Testing this core question requires both reliable measurement of the various components of self-regulation, as well as high-quality tests of their predictive validity for everyday goal pursuit.

Some approaches to trait self-regulation have long grappled with these questions. Statistically powerful investigations have established the predictive validity of scale measures of self-regulation (e.g., trait conscientiousness, grit, and trait self-control) across multiple domains, including health behavior, academic attainment, morbidity, and mortality4,5,6,7,8,9,10. While theoretical differences exist among these traits, they also show considerable conceptual overlap and correlate strongly with each other (rs > 0.7; refs. 8,11). In short, these traits lie within a conceptual space that behaves as self-regulation should; they correlate with other measures of self-regulation and predict real-world outcomes.

Experimental psychologists and cognitive neuroscientists have also conducted numerous laboratory experiments that propose to reveal the mental and biological processes underlying the various components of self-regulation. This research has been incredibly generative, revealing a range of highly replicable behavioral phenomena, computational models, and neural correlates of self-regulation that are associated with sensitivity to reward, detecting the need for self-regulation (e.g., performance-monitoring in the anterior midcingulate cortex, aMCC), and signaling to other brain areas to increase goal-directed top-down control12,13,14.

Many theorists, including ourselves, have enthusiastically incorporated cognitive neuroscience methods to shed light on the mechanisms underlying self-regulation15,16,17,18. These approaches suggest that the neural mechanisms underlying the various components of cognitive control observed in the laboratory might predict self-regulation outside the lab. While numerous studies have investigated the brain as a predictor of real-world self-regulation [e.g., refs. 14,15,19,20,21], not one study has examined the predictive validity of these measures in the real world longitudinally beyond a week or two. We tested this premise in the current study. Specifically, we tested the predictive validity of trait and neural measures associated with various aspects of self-regulation in a large sample that combined measures of multiple components of self-regulation across diverse methods, including EEG, behavioral tasks, ecological momentary assessment of self-regulatory processes (e.g., desire frequency, intensity, and resistance), and longer-term goal progress assessed at 1-, 3-, and 6-month follow-ups.

We investigated three event-related potentials (ERPs) that have been mapped onto components of self-regulation in which one top-down control system is capable of (down)regulating automatic processes that arise from a bottom-up habitual or reward-driven system [e.g., refs. 12,22]. First, the error-related negativity (ERN23) is a response-locked ERP that differentiates error from correct responses within 100 ms and has been localized to the anterior midcingulate cortex24. The ERN might be one of the most replicable effects in all of cognitive neuroscience, and its relationship to internal performance-monitoring has led several theorists to directly implicate the ERN as a marker of a neurocognitive process underlying self-regulation in its broadest sense15,16,18. Empirical investigations of the real-world impact of the ERN suggest it might predict broad self-regulatory outcomes in the real world [e.g., refs. 21,25].

ERP research has also revealed components related to the processing of appetitive stimuli. The reward positivity (RewP) arises 250–350 ms after feedback stimuli, and is potentiated to reinforcing signals26. The Late Positive Potential (LPP27) is a positive ERP that develops over several seconds at parietal midline electrode sites from 300 ms after stimulus onset, and is maximal when highly arousing, motivationally relevant images are presented28.

Unlike the ERN, which signals when control is needed, the RewP and LPP to positive images are better aligned with the appetitive processes that could undermine self-regulation (i.e., temptation, desire) if they come into conflict with a currently pursued goal22. That is, the RewP and LPP to positive images might reflect how reward sensitive a person is, and thus how motivated they are by appetitive stimuli. The ERN, in contrast, might reflect how attuned a person is to potential conflicts with their longstanding goals, including conflicts brought about by their responses to temptations in their environments.

Bridging the gap between laboratory and everyday self-regulation, and doing so longitudinally, allows us to test that the neural correlates of the various components of self-regulation elicited by laboratory tasks do not fall foul of several validity challenges. First, while well-controlled laboratory paradigms might allow for causal tests of predictions made from theories, internal validity sometimes comes at a cost to external validity29,30. The so-called mutual internal validity problem can become particularly acute if outcomes from lab experiments are used to develop theories whose predictions are tested through an iterative process of further lab experimentation that eventually focuses on the explanation of artificial lab tasks at the expense of ecological validity31. As a result, established lab tasks can become unmoored from the reality they are trying to model.

Neural measures that are commonly related to aspects of self-regulation might suffer from the mutual internal validity problem. After all, they are often studied in tightly controlled laboratory behavioral tasks (e.g., the Stroop, Flanker, or Go/no-go task) that were designed for the detailed examination and explanation of the behavioral, neural, and computational correlates of cognitive control [e.g., refs. 12,32,33] rather than as predictors of real-world outcomes. Avoiding the mutual internal validity problem in the current context requires testing the ability of various putative neural correlates to predict self-regulation outcomes outside the laboratory.

Partial support for the candidacy of the ERN, RewP, and LPP as predictors of real-world outcomes comes from demonstrations that ERP scores typically possess psychometric properties (i.e., internal consistency, heritability, test-retest reliability) that are consistent with stable individual differences34,35,36,37,38,39. While such reliability is only a precondition for establishing construct validity, it is noteworthy that metrics derived from laboratory behavioral measures often show relatively poor reliability40,41. This difference in reliability between scores on neural and behavioral measures of aspects of self-regulation, then, opens the possibility that while task measures have limited real-world predictive value42,43, the ERPs derived from the very same tasks are nevertheless plausible longitudinal predictors.

The validity of these ERPs as neural correlates of aspects of self-regulation also depends on the extent to which they behave as trait self-regulation should. This can be established by testing their relationship to a network of other measures that sit within the conceptual space occupied by self-regulation44. In short, these ERP components should relate to other trait measures related to self-regulation (i.e., convergent validity) as well as to outcomes that theoretically result from good self-regulation.

There is mixed support for the suggestion that the ERN sits well within this broad view of self-regulation. Many tests have demonstrated that the ERN is related to outcomes such as academic attainment, obesity, and everyday emotion regulation25,45,46,47, all of which might reflect good self-regulation (among other things). The component also shows lower amplitude in groups associated with a range of self-regulation problems, including addictions48,49,50,51. Even though the ERN relates to one aspect of self-regulation (error monitoring), this aspect is thought to be at the very core of the self-regulation process15. It is why the numerous previous studies have sought to associate it with broad self-regulatory outcomes with no direct mechanistic relationship with error monitoring.

However, there is little evidence that the ERN is correlated with self-regulatory personality traits. Instead, increased ERN amplitudes are commonly associated with anxious psychopathology and neuroticism52,53,54,55. These findings question the positioning of the ERN within the traditional conceptual space of self-regulation, as good self-regulation is commonly associated with increased psychological adjustment9 and satisfaction with life56. These factors prompt some caution regarding a sweeping hypothesis that larger ERNs necessarily predict better self-regulation, although it is possible that the ERN is linked to self-regulation through separate mechanisms that involve the integration of negative affect and neural monitoring57,58,59.

Both the RewP and LPP seem to fit more conventionally within a network of broad self-regulation given their clear association with reward sensitivity and approach motivation. The LPP to positive images is elevated in multiple groups with substance use disorders60,61,62, and is associated with extroversion63,64 and trait behavioral approach65. The RewP is positively correlated with subjective liking of rewards66, reward sensitivity67, trait approach motivation68, and extroversion69. Both reward sensitivity and approach motivation are strongly implicated in trait impulsivity70, suggesting that people who have high RewPs or LPPs might struggle to control their impulses, and struggle with self-regulation.

One last factor that urges caution when making predictions from the existing literature is low statistical power that is known to inflate published effects as well as increase rates of both false positives and false negatives71. Underpowered studies are a challenge for many disciplines; however, recent estimates suggest that cognitive neuroscience might be especially underpowered to detect all but unrealistically large effect sizes72,73,74,75. Large effect sizes are generally implausible across psychology76, and are particularly unlikely in the current case when a relatively narrowly defined measure (e.g., neural reactivity to mistakes in a flanker task) would be unlikely to explain large amounts of variance in noisy real-world outcomes [cf., ref. 77]. Thus, studies investigating real-world prediction, we believe, should be powered to detect small-to-medium effect sizes.

While self-report measures are highly scalable, neuroscience is considerably more resource intensive78. Indeed, ERP sample sizes tend to be smaller for frequently studied individual differences (e.g., mean N = 66 in a meta-analysis of the anxiety-ERN relationship, ref. 54), and studies that have tested the predictive validity of the ERN have returned mixed results [e.g., refs. 21,25,79]. Thus, the existing evidence provides inconsistent support for the real-world predictive validity of the ERN, often in studies that are likely underpowered to detect small-to-medium effect sizes, do not include longer-term follow-ups (i.e., beyond a few weeks), and/or were not preregistered. We aimed to provide a well-powered test of the predictive validity of the ERN in the context of a wide conceptualization of self-regulation including scale trait measures, everyday experience sampling, and the longitudinal attainment of personal goals.

Here, we took a mixed-method approach to explore the predictive validity of self-report (trait self-control, conscientiousness, behavioral approach system), neural (ERN, RewP, LPP), and behavioral task (cognitive control on the flanker task) measures of aspects of self-regulation as predictors of real-world goal processes unfolding both in the moment (assessed through experience sampling) and over longer (1-, 3-, and 6-month) periods (see Fig. 1). Critically, we preregistered a number of our hypotheses in advance of data analysis.

Fig. 1: Schematic overview of our longitudinal mixed-methods design.

The laboratory measures were assessed at intake (top panel), the subsequent week-long experience sampling at time two (middle panel), and the three follow-ups on personal goal progress at time-points 3–5, corresponding to 1-, 3-, and 6-month follow-ups, respectively. ERN: error-related negativity; RewP: reward positivity; LPP: late positive potential; ISI: inter-stimulus interval; BIS: Behavioral Inhibition System; BAS: Behavioral Activation System; IAPS: International Affective Picture System.

Our mixed-method design and large sample size also allowed us to describe broadly the relationships among a range of neural, behavioral, and self-report assessments of aspects of self-regulation. This had multiple elements, including exploring convergence among diverse self-regulation measures, testing the predictive validity of each measure as a determinant of long-term goal attainment, and aiming to replicate previous relationships (e.g., that temptation and not self-control predicts goal success, ref. 80). However, the central focus of this paper is the relationship between ERPs and real-world self-regulation (i.e., experience sampling, long-term goal attainment). The ERP hypotheses were preregistered (https://osf.io/v8jzd/) as follows:

  1. ERPs related to the processing of positive/appetitive stimuli (i.e., the LPP and RewP) will be associated with reduced long-term goal progress.

  2. The LPP and RewP will be associated with in-the-moment experiences indicative of poor self-regulation (increased desires, increased enactment, reduced resistance). If significant, momentary self-regulation variables will be tested as mediators of the relationship between the LPP/RewP and goal progress.

  3. Based on prior findings that the ERN relates to everyday self-regulation, we predicted that higher ERN amplitudes should relate to greater goal progress.

  4. A self-regulation account of the ERN also suggests that the component will be associated with enhanced self-regulation (i.e., reduced desires, reduced enactment, enhanced resistance). If significant, these relationships will be tested as mediators of the relationship between the ERN and goal progress.

  5. Hypotheses regarding the ERN were presented with the caveat that this component is often associated with forms of anxiety/negative affectivity that are often inversely related to self-regulation. We preregistered our ambivalence about the ERN-regulation relationship, suggesting that this component may be unrelated to momentary self-regulation and long-term goal attainment.

Results

Descriptive statistics and convergence among self-regulation measures

Table 1 presents all the descriptive statistics and intercorrelations for the personality, EEG, and cognitive measures assessed at intake. Figures 2–4 show the canonical ERP effects for the ERN, RewP, and LPP. Further analyses of our ERP results and cognitive task performance are presented in the Supplementary Information (also available on the Open Science Framework: https://osf.io/kmj9w/). As can also be seen in Fig. 5, self-report measures related to self-regulation (conscientiousness and trait self-control) were strongly correlated with each other. The neural measures, however, were uncorrelated with self-reported personality traits. The RewP and LPP were also uncorrelated with a self-report measure of behavior activation (the Behavior Activation Scale; BAS).

Table 1 Descriptive statistics and correlations for baseline personality and EEG measures.
Fig. 2: The ERN and RewP at electrode FCz.

a, b Central lines depict the grand average ERP amplitude across participants (dotted line: correct; dash-dotted line: error; solid line: difference wave), and the shaded error bands denote 95% between-subjects confidence intervals (red: correct; blue: error; black: difference wave). Timepoint zero in the ERN waveform refers to the participant's button-press response, whereas timepoint zero in the RewP waveform refers to the onset of the external feedback stimulus. The ERN was significantly more negative on error than correct trials, two-sided paired samples Student's t test: t(169) = 20.42, p < 0.001, d = 1.57, 95% CIs [1.34, 1.79], and the RewP was significantly more positive to correct feedback than error feedback on correct trials, two-sided paired samples Student's t test: t(188) = 16.73, p < 0.001, d = 1.22, 95% CIs [1.03, 1.40].
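As a quick consistency check on the effect sizes in this caption, the sketch below recovers a paired-samples Cohen's d from the reported t statistics. It assumes d was computed as t divided by the square root of the number of pairs (with n = df + 1); the caption does not state the formula, so this is an assumption, and the function name `paired_cohens_d` is hypothetical.

```python
import math


def paired_cohens_d(t_value: float, df: int) -> float:
    """Recover Cohen's d for a paired-samples t test as t / sqrt(n).

    Assumption: n (number of pairs) equals df + 1, and d is the
    standardized mean of the paired differences.
    """
    n = df + 1
    return t_value / math.sqrt(n)


# t statistics and degrees of freedom as reported in the caption.
ern_d = paired_cohens_d(20.42, 169)
rewp_d = paired_cohens_d(16.73, 188)

print(round(ern_d, 2), round(rewp_d, 2))  # prints: 1.57 1.22
```

Under that assumption, both values agree with the reported d = 1.57 (ERN) and d = 1.22 (RewP).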

Fig. 3: Figure depicting LPP at electrode Pz.

Central lines depict the grand average ERP amplitude across participants, and the shaded error bands denote 95% between-subjects confidence intervals (red: high arousal negative; orange: high arousal positive; green: low arousal negative; blue: low arousal positive; purple: neutral). a ERPs showing the LPP for every condition in the experiment. A 2 (valence: positive vs. negative) × 2 (arousal: high vs. low) × 2 (window: early LPP vs. late LPP) repeated-measures ANOVA assessing the LPP across image types revealed a significant three-way interaction, F(1, 189) = 23.34, p < 0.001. b ERPs showing the effect of valence (positive minus negative; dashed black line) and arousal (high minus low; solid black line) on the LPP amplitude. HA, high arousal image; LA, low arousal image; D denotes difference wave.

Fig. 4: Topographic maps showing the scalp distribution of the grand average ERPs.

Increasing red intensity depicts positive-going ERP amplitudes; increasing blue intensity depicts negative-going ERP amplitudes. a The early LPP for high-arousal positive images (top), the difference score for high arousal minus low arousal (middle), and the difference score for positive valence minus negative valence (bottom). b The equivalent topographic maps as in (a), but for the late LPP window. c Topographic maps in the time course of the ERN for correct trials (top), error trials (middle), and the difference score subtracting correct trials from errors (bottom). d The same pattern of topographic maps as in (c), but this time for the time course of the RewP after feedback stimuli. LPP: late positive potential; ERN: error-related negativity; RewP: reward positivity.

Fig. 5: Pearson's correlation plots showing associations among dependent variables in our study.

Increasing size and saturation of squares depict the effect size; negative correlations are shown in blue and positive correlations in red. BIS: Behavioral Inhibition System; BAS: Behavioral Activation System; Trait SC: Trait Self-Control Scale; LPP HA: LPP to high arousal positive images; LPP Diff Arousal: LPP difference score (high arousal − low arousal); LPP Diff Valence: LPP difference score (positive valence − negative valence); Diff ERN: ERN difference score (error − correct); RewP: Reward Positivity; Diff Rew P: RewP difference score (correct − error); Diff RT: reaction time difference on the flanker task (incompatible − compatible); Diff Errors: error rate difference on the flanker task (incompatible − compatible).

Many potential variables could be extracted from ERP averages to associate with aspects of self-regulation. We selected six ERPs to represent theoretically relevant constructs while avoiding multiple comparisons including redundant, highly correlated variables (e.g., including the ERN, CRN, and ΔERN). The difference ERN (ΔERN) was chosen as a measure reflecting the sensitivity of a monitoring system that differentiates between error and correct trials. Two variables represented neural responses to feedback: the RewP to capture neural reactivity to positive reinforcement, and the ΔRewP to reflect the feedback monitoring system's relative reactivity to feedback valence (correct − error). For the LPP, we included the early LPP to positive stimuli to capture initial orienting to appetitive stimuli, as well as two difference waves on the entire LPP window (early and late) that reflected the arousal (high − low) and valence (positive − negative) dimensions of affect81. We also conducted various exploratory analyses using alternate operationalizations of the ERPs, never finding results in disagreement with our main conclusions (see https://osf.io/kmj9w/).
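The difference scores described above can be sketched as a subtraction of condition means for a single participant. This is a minimal illustration only: the function name `difference_score` and the amplitude values are hypothetical, not taken from the study, and the actual processing pipeline (time windows, electrodes, trial rejection) is not represented.

```python
from statistics import mean


def difference_score(condition_a, condition_b):
    """Return mean(condition_a) - mean(condition_b) for one participant.

    For the difference ERN this would be error minus correct; for the
    difference RewP, correct minus error feedback.
    """
    return mean(condition_a) - mean(condition_b)


# Hypothetical single-trial mean amplitudes (microvolts) for one participant.
error_trials = [-6.0, -8.0, -7.0]
correct_trials = [1.0, 2.0, 3.0]

delta_ern = difference_score(error_trials, correct_trials)
print(delta_ern)  # prints: -9.0
```

A more negative value here corresponds to a larger differentiation between error and correct trials.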

Table 2 presents the descriptive statistics and intercorrelations for the experience sampling and goal progress variables. On average, 60% of participants' desires conflicted with at least one goal. As in previous research80,82, greater resistance was related to reduced enactment of a desire, at least in the moment. However, unlike in past research, greater desires were not significantly related to goal progress months later. Surprisingly, enactment had a small, positive relationship with goal progress at 1 and 3 months, suggesting that those who gave in to their desires reported more goal progress at later intervals. As in previous research80, experiencing greater conflict with personally important goals (i.e., more temptations) was related to lower progress on those goals months later; stronger resistance, on the other hand, was unrelated to goal progress. Additional analyses were conducted examining conflicting desires only (looking both at desires conflicting with goals, and desires which are at least somewhat resisted (i.e., problematic)); results from these analyses do not change any of our conclusions (see Table S2 in the Supplementary Information). Finally, we conducted additional analyses examining our preregistered hypothesis regarding desire and resistance as predictors of goal progress and report them in the Supplementary Information (see Table S1).

Table 2 Descriptive statistics and correlations for experience sampling and goal progress variables.

Relations with ESM

Table 3 presents the correlations along with corrected and uncorrected p-values, and Bayes factors; Fig. 5 shows the heat map associated with that table. Consistent with previous research, trait self-control was related to lower average resistance; the higher people's trait self-control, the less they relied on self-control in the moment. However, that correlation was not robust to multiple comparisons or the Bayes factor analyses. Only two correlations remained significant when controlling for multiple comparisons: both agreeableness and BAS were related to experiencing stronger desires. Exploratory analyses with conflict strength showed that neuroticism was related to greater conflict and openness to experience with lower conflict. None of our a-priori hypotheses regarding neural correlates of desire were supported. Furthermore, looking at Bayes factors for those correlations, the data were 4–10 times more likely under the null hypotheses than under the alternatives, suggesting moderate evidence in favor of the null hypotheses.
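To illustrate what it means for a correlation to be "not robust to multiple comparisons", the sketch below adjusts a set of p-values with Holm's step-down procedure. The text here does not say which correction was applied, so Holm is used purely as an illustrative assumption; the function name `holm_adjust` and the example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order.

    Assumption: Holm's procedure stands in for whichever correction the
    study actually used, which is not specified in this passage.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical uncorrected p-values for three correlations.
raw = [0.01, 0.04, 0.03]
print(holm_adjust(raw))
```

In this toy case, a correlation with an uncorrected p of 0.04 no longer falls below 0.05 once the other tests are taken into account.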

Table 3 Time one individual difference measures predicting the experience sampling measures.

Relation with goal progress

Finally, we examined preregistered hypotheses related to goal progress. Table 4 presents the correlations along with corrected and uncorrected p-values, and Bayes factors; Fig. 5 shows the heat map associated with that table. As expected, trait self-control was related to goal progress at all time points (although the relation was non-significant at the second follow-up after we corrected for multiple comparisons; see also Fig. 6). Surprisingly, conscientiousness was not significantly related to progress (despite a strong correlation with trait self-control). Unexpectedly, agreeableness was positively related to goal progress and neuroticism negatively related to goal progress at two follow-ups. Neither the neural nor the behavioral indicators were related to goal progress. This is contrary to our hypotheses predicting a relationship between the RewP and goal progress and between the LPP and goal progress, but in line with one of the competing predictions we preregistered for the ERN and for the behavioral task measures. Bayes Factors show that for neural indicators, the evidence in favor of the null is moderate (3 to 9 times stronger than for the alternative). For behavioral measures extracted from the flanker task, the evidence in favor of the null is weak to moderate (BF01 of 1.3 to 9.1). Note that we preregistered an examination of whether in-the-moment desire and resistance mediate potential relations between personality/neural indicators (at baseline) and goal progress. However, given that there are no main effects or relations between self-regulatory variables and goal progress, we did not conduct those analyses.
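The Bayes factors reported above can be read with the following sketch, which converts BF01 into a posterior probability for the null and attaches a verbal label. Two assumptions are made that are not stated in the text: equal prior odds for the null and the alternative, and the conventional cut-offs of 3 and 10 for "moderate" and "strong" evidence. The function names are hypothetical.

```python
def posterior_prob_null(bf01: float, prior_prob_null: float = 0.5) -> float:
    """Posterior probability of the null given BF01 and a prior probability.

    Assumption: the default prior gives the null and the alternative
    equal weight (prior odds of 1).
    """
    prior_odds = prior_prob_null / (1.0 - prior_prob_null)
    posterior_odds = bf01 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def evidence_label(bf01: float) -> str:
    """Verbal label for evidence favoring the null (assumed cut-offs: 3, 10)."""
    if bf01 < 1.0:
        return "favors alternative"
    if bf01 < 3.0:
        return "weak"
    if bf01 < 10.0:
        return "moderate"
    return "strong"


# BF01 range reported for the flanker-task measures.
for bf in (1.3, 9.1):
    print(bf, evidence_label(bf), round(posterior_prob_null(bf), 2))
```

Under these assumptions, a BF01 between 3 and 9 corresponds to a posterior probability for the null of roughly 0.75 to 0.90, which is why such values are described as moderate rather than decisive evidence.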

Table 4 Time one measures predicting longitudinal goal progress 1, 3, and 6 months later.
Fig. 6: Scatterplots showing Pearson's correlation coefficients between the key individual difference predictors and longitudinal goal progress at 1-, 3-, and 6-month follow-ups.

The relationships with goal progress at 1, 3, and 6 months are depicted in separate plots for trait self-control (a–c), the difference ERN amplitude (d–f), the difference RewP (g–i), and the LPP difference between arousal levels (j–l). Lines depict simple linear regression to aid visual interpretation; error bands depict the standard error. Trait SC: trait self-control; ERN: error-related negativity; RewP: reward positivity; LPP: late positive potential.

Discussion

We investigated diverse neural, behavioral task, and self-report measures broadly related to various aspects of self-regulation as predictors of everyday goal pursuit, both in the moment (i.e., during a week of experience sampling) and longitudinally through 1-, 3-, and 6-month assessments of personal goal progress. None of the neural indicators (RewP, LPP, or ERN) were related to self-reported traits, experienced desires, desire resistance, or long-term goal progress. In fact, Bayes Factors indicated that the data were 4–10 times more likely under the null hypothesis that neural indicators are unrelated to momentary or long-term self-regulation than under the alternative. These results are consistent with widespread jingle fallacies in self-regulation research, where various measures labeled as related to aspects of self-regulation are largely uncorrelated with each other. In addition, we found variation in predictive validity among self-regulation measures. While our ERP measures were unrelated to other assessments of self-regulation, trait self-control predicted greater goal progress up to six months later, and trait behavioral approach was associated with subsequent desire intensity during experience sampling.

Here, we present a large preregistered study assessing the predictive validity of ERPs related to performance monitoring (ERN), feedback processing (RewP), and motivated attention to external stimuli (LPP) in the context of longitudinal goal attainment. Neural reactivity to positively-valenced events was not associated with a higher prevalence or intensity of daily desires during experience sampling, or with long-term goal attainment. The ERN was also not associated with reduced enactment of desires or with eventual goal attainment. There are strong theoretical reasons to believe the ERN should predict real-world goal attainment; after all, it is only via continuous monitoring of one's behavior that discrepancies between longstanding goals and current desires can be noted and resolved15. Our null results provide a cautionary note to previous theorists, including ourselves, who have put forward the ERN as an analog to the self-monitoring processes emphasized by traditional cybernetic models of self-regulation [e.g., refs. 15,18]. Our results do not challenge the role of the ERN within existing cognitive neuroscience models (e.g., the conflict monitoring account12), but do suggest that the ERN is potentially often mischaracterized as a measure of self-regulation. Given the common conceptualization of self-regulation as a psychological construct that predicts human health, wellbeing, and goal attainment2, our results suggest that the ERN does not behave as a measure of self-regulation should.

Our ERP results complement suggestions that behavioral measures of cognitive control do not predict the types of self-regulation that are relevant for everyday life11,43. While basic psychometric limitations might explain the null associations between behavioral tasks and everyday self-regulation40,41,42, our ERPs demonstrated good-to-excellent internal consistency. Thus, the non-convergence points more directly to problems in the conceptualization of these variables themselves as measuring aspects of self-regulation. Scale measures ask individuals to report how they generally act across a spectrum of regulation-relevant items (e.g., "I am good at resisting temptation"; "I am lazy"). Thus, the broader bandwidth of these scales could account for the relation we observe between trait self-control and goal attainment (see ref. 77). Conversely, ERPs were assessed in a one-off laboratory setting largely dissimilar to the context in which everyday control occurs. Perhaps ERPs, then, measure a narrower, less ecologically valid construct than self-report measures. Such considerations have not prevented scholars from exploring whether ERPs predict self-regulation-related outcomes, but we now wonder whether such examinations were asking these ERPs to predict far more than they are capable of.

Previous work has found mixed support for the relationship between the ERN and various outcomes [e.g., refs. 21,25,79]. Direct comparison with previous studies is difficult because of several methodological differences. That said, we did operationalize the ERN in a traditional manner and our study features a large sample, preregistered analysis plan, and multiple longitudinal follow-ups. Thus, if the ERN were predictive of personal goal pursuit, our design should have found evidence supporting this hypothesis. Our ERN results were consistent across measures; however, this consistency was in the form of repeated support for a null relationship between the ERN and all other measures that could be identified as either directly related to established measures of self-regulation (e.g., trait self-control, daily resistance) or as an expected outcome of successful self-regulation (reduced enactment, attainment of personal goals). While no single study can provide a definitive answer to the broadest question of the general predictive validity of the ERN, our results (in addition to considerations of the likely low statistical power of past investigations; ref. 83) raise questions about the ability of lab-derived ERN scores to predict the longer-term pursuit of personal goals. Future work would benefit from conducting larger-scale confirmatory attempts to replicate previously identified ERN-outcome associations in other, perhaps more specific, domains (e.g., between the ERN and academic grades, or between the ERN and everyday emotion regulation).

It is worth noting that individual differences in ERN amplitudes often fail to predict between-subject variation during the performance of laboratory cognitive control tasks themselves, such as slowing after initial mistakes to become more cautious (i.e., post-error slowing84). These findings, in addition to our own, suggest that the ERN might serve poorly as an individual difference predictor of subsequent behavior, even when this behavioral adjustment and the ERN arise within the context of the same task. In contrast, within-person variation in ERN amplitude does predict within-person adjustments in response caution on a trial-by-trial basis84,85. Future studies might test this idea outside of the lab by, for example, pairing ambulatory assessments of the ERN with experience sampling to test if within-person error-related brain activity tracks with enhanced self-regulation on a moment-to-moment basis.

While trait self-control was associated with long-term goal progress, we found surprisingly few correlations between most self-reported traits and both momentary experience sampling and longer-term goal attainment. One exception was the BAS, which was positively related to greater desire strength, replicating previous work82,86. Unlike past research [e.g., refs. 82,87], trait self-control was not significantly related to frequency of desire or desire strength; for resistance and conflict there was a relationship in the expected direction (more self-control related to less resistance and less conflict), but it was not robust to correction, and Bayes Factor (of 1.0 or less) showed no support either for or against it. We also found virtually no relations between big-five personality and momentary measures related to aspects of self-regulation. One other study reported on the interaction between prior self-control and each of the big five on desire enactment, but did not report main effects88. Prior research on state manifestation of personality finds that in typical behavior there is a lot of within-person variability, such that resisting desires may not be a behavioral manifestation of conscientiousness89. We were also surprised that agreeableness was related to experiencing stronger desires; future research needs to replicate this finding and explore it further.

Past research suggests that most of the variance in goal progress is at the goal level (i.e., within-person; see ref. 90, for an overview), perhaps explaining the few associations between individual differences and goal attainment. Besides self-control, agreeableness was also positively related to goal progress, and neuroticism negatively related to progress (at 2 out of the 3 follow-ups). These were exploratory results (not preregistered), but they do replicate some past research on these traits and personal goal progress91,92,93. Conscientiousness was unrelated to goal progress, adding to the mixed literature on the role of conscientiousness in personal goal pursuit91,92,94,95.

Our work suggests several avenues for future research. We examined self-regulation in the real world by examining goal pursuit, broadly construed. ERPs might predict outcomes in the real world, but perhaps only in those specific outcomes that closely resemble the processes thought to be tapped by these ERPs. Future studies, therefore, might benefit from examining error-monitoring or reward-monitoring outside the lab (e.g., probing awareness of errors or reward) and then investigating whether this awareness is correlated with their attendant ERPs. Building on recent suggestions that self-regulation resembles value-based choice96, one fMRI study recently found that the neural correlates of subjective valuation processes did predict everyday self-regulation in a sample of almost 200 participants97. Thus, examining self-regulation in the real world by examining psychological variables more closely aligned with theorized processes (in this case, taking a decision-making approach to self-regulation [e.g., ref. 98]) may be fruitful for future research.

In addition, even though experience sampling results in many observations per participant, here we aggregated these observations to examine individual differences (i.e., in how they generally feel and act across the week). This comes at a cost to power. Although our sample sizes are large for an EEG study, we only had 80% power to detect correlations greater than r = 0.23 at the one-month follow-up, rising to r = 0.27 at six months. As such, our study meets common power conventions to detect small effect sizes99, but has lower power to detect even smaller effects. It is important to note that our Bayesian analyses did provide evidence in favor of null relationships, suggesting that samples were not too small to provide evidence in relation to our hypotheses. However, these considerations highlight that potentially even larger, preregistered studies are required to continue to investigate the predictive validity of ERPs for everyday outcomes. Increased statistical power will not, however, overcome the possibility (as discussed above) that individual differences extracted from behavioral cognitive control tasks, including their neural correlates, are not particularly valid predictors of everyday self-regulation.

In conclusion, for studies and theories to be meaningful, they must iteratively move between the laboratory and the real world to avoid generating elaborate theories and tasks that account for little variability in the real world, the so-called mutual internal validity problem31. Our results suggest that some research on self-regulation has fallen prey to this validity issue, casting doubt on the broad applicability of past research. Our findings provide evidence that a range of ERPs related to performance monitoring (i.e., the ERN) and neural reactivity to positive stimuli (RewP, LPP) are not associated with measures of self-regulation, including self-reported self-control, everyday desire, and resistance during experience sampling, or the long-term attainment of personally selected goals. These findings challenge the ecological validity of brain measures thought to assess aspects of self-regulation. These measures do not appear to predict goal-directed behavior in the real world, and our results challenge an overly simplified view of self-regulation.

Methods

Open science statement

All hypotheses and analytical plans were registered on the Open Science Framework (https://osf.io/v8jzd/) after the data was collected but before anyone looked at the longer-term follow-up data (date of latest registration: 16 July 2020). Our original analysis plan also included exploratory analyses including time-frequency analyses in the theta and delta band. However, we opted to only conduct and report the confirmatory ERP analyses to reduce the complexity of our report. The exploratory analyses have not been conducted to date, and, as such, we cannot make any claims about the predictive/ecological validity of time-frequency approaches to studying cognitive control as a measure of self-regulation.

Participants and procedure

Participants (N = 226) were predominantly recruited using a convenience sampling approach through an undergraduate participant pool, though a smaller number were also recruited through on-campus and local advertisements. Participants were compensated with up to $75 CAD for participating in the study ($25 for the initial lab portion and the experience sampling, with an additional opportunity to earn up to $20 for completing 85% or more of the experience sampling signals, or $10 for completing 75–85%; participants also received $5 for each follow-up, and a $5 bonus for completing all follow-ups). Participants could choose between receiving their compensation in cash or as an online shopping voucher. Participants were 37% male and 63% female, with a mean age of 20.4 years (SD = 5.93). 92.4% of the participants reported being a current student.

Although we did not conduct an a priori power analysis, a sensitivity analysis showed that this sample size would allow us to find effects as small as r = 0.19 with 80% power for the between-subject analyses. Participants came into the lab for a 2-h session during which they completed questionnaires and computerized tasks while their brain activity was recorded with EEG. Three computerized tasks were administered, in addition to baseline measures of EEG activity: a flanker task (to assess error-related negativity, ERN), a passive image viewing task (to assess late positive potential, LPP), and a time estimation task (to assess reward positivity, RewP). All materials can be found at OSF. A week later, participants began the experience sampling portion of the study: each day for seven days, participants received seven signals with brief surveys. Using SurveySignal100, these signals were sent at random times in seven equal intervals between 9:30 am and 9:30 pm. Please note that we also collected additional data on in-the-moment self-regulation strategies, which were reported in Milyavskaya et al.101. As part of a larger data collection, we also collected data from a nightly survey sent at 10:15 pm each day of the experience sampling portion of the study. These questionnaires were included to answer questions beyond the scope of this paper and will not be reported further. Participants received online follow-ups 1 month, 3 months, and 6 months after their lab session. There was also another follow-up survey sent out 12 months after intake; however, we wanted to examine goal pursuit only for those goals that participants were still pursuing, and participants did not complete measures of goal progress for all their goals at the 12-month follow-up; as such, we did not include it in the present paper. Each follow-up survey piped in the goals that participants indicated in the intake survey and asked them a series of questions about goal pursuit, including a measure of goal progress. Participants provided informed consent, and we complied with a protocol that was reviewed and approved by the University of Toronto Research Ethics Board, Social Sciences, Humanities, and Education Committee (approval number: 30380).
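
The signal schedule described above (seven signals per day, one at a random time within each of seven equal intervals between 9:30 am and 9:30 pm) can be illustrated with a short sketch. This is not the SurveySignal implementation; the function name `daily_signal_times` and its parameters are hypothetical, and uniform sampling within each interval is an assumption.

```python
import random
from datetime import datetime


def daily_signal_times(day, n_signals=7, start=(9, 30), end=(21, 30), rng=None):
    """Draw one random signal time inside each of n_signals equal blocks.

    Hypothetical helper: uniform sampling within each block is an assumption,
    not a documented property of the scheduling software used in the study.
    """
    rng = rng or random.Random()
    window_start = day.replace(hour=start[0], minute=start[1], second=0, microsecond=0)
    window_end = day.replace(hour=end[0], minute=end[1], second=0, microsecond=0)
    block = (window_end - window_start) / n_signals  # length of one interval
    # Block i spans [start + i * block, start + (i + 1) * block).
    return [window_start + block * i + block * rng.random() for i in range(n_signals)]


if __name__ == "__main__":
    for t in daily_signal_times(datetime(2020, 7, 16), rng=random.Random(1)):
        print(t.strftime("%H:%M"))
```

Sampling within blocks, rather than across the whole day, guarantees that signals are spread over the waking day while remaining unpredictable to the participant.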

Following data cleaning and participant non-response to follow-up surveys (see Supplementary Information: https://osf.io/g759u/), we were left with 201 participants at baseline (first survey and experience sampling), 149 at the 1-month follow-up, 132 at the 3-month follow-up, and 107 at the 6-month follow-up. Follow-up sensitivity analyses showed that these sample sizes were sufficient to detect effects as small as r = 0.23 at the first follow-up, r = 0.24 at the second follow-up, and r = 0.27 at the third follow-up with 80% power for the between-subject analyses.
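
As a rough check on these sensitivity figures, the smallest correlation detectable with a given sample size can be approximated with the Fisher z transformation. The sketch below is not the procedure used in the study (the software and exact method are not stated here); it assumes a two-tailed test at alpha = 0.05, and the function name `min_detectable_r` is hypothetical.

```python
from math import sqrt, tanh
from statistics import NormalDist


def min_detectable_r(n, power=0.80, alpha=0.05):
    """Approximate smallest |r| detectable with the given power (two-tailed).

    Uses the Fisher z approximation, in which atanh(r) has a standard
    error of 1 / sqrt(n - 3). The two-tailed alpha of 0.05 is an assumption.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return tanh((z_alpha + z_power) / sqrt(n - 3))


if __name__ == "__main__":
    for n in (226, 149, 132, 107):
        print(n, round(min_detectable_r(n), 2))
```

Under these assumptions the approximation gives 0.19, 0.23, 0.24, and 0.27 for samples of 226, 149, 132, and 107, in line with the values reported in the text.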

Materials

Personality

Trait self-control was measured using the trait self-control scale9, consisting of 13 items (e.g., “People would say that I have iron self-discipline”) rated on a scale of 1 (not at all) to 7 (very much). A scale score was computed as the average of the 13 items (after recoding reversed items), with higher scores representing greater self-control.
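
The scoring rule (recode reversed items, then average) can be sketched as follows. Which items are reverse-keyed is not listed in this section, so the indices passed to the hypothetical `score_scale` function below are placeholders for illustration only.

```python
def score_scale(responses, reversed_items, low=1, high=7):
    """Average item responses after recoding reverse-keyed items.

    responses: item ratings in questionnaire order.
    reversed_items: zero-based positions of reverse-keyed items
    (placeholder values here; the real key is defined by the scale).
    """
    recoded = [
        (low + high - r) if i in reversed_items else r
        for i, r in enumerate(responses)
    ]
    return sum(recoded) / len(recoded)


if __name__ == "__main__":
    # Three illustrative items on a 1-7 scale; the second is reverse-keyed.
    print(score_scale([7, 1, 4], reversed_items={1}))
```

On a 1 to 7 scale, recoding maps 1 to 7, 2 to 6, and so on, so that higher scores consistently indicate greater self-control before averaging.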

Participants then completed the Big Five Inventory102, which consists of the stem “I am someone who…” followed by 44 items to assess conscientiousness (e.g., ‘does a thorough job’); extraversion (e.g., ‘is talkative’); openness to experience (e.g., ‘is ingenious, a deep thinker’); agreeableness (e.g., ‘has a forgiving nature’); and neuroticism (e.g., ‘worries a lot’). Each item was rated on a scale of 1 (Disagree strongly) to 5 (Agree strongly). Separate scores were calculated for each subscale by taking the average of the corresponding items.

To assess behavioral approach and inhibition, participants completed the BIS/BAS scale103. This scale consists of 24 items. Seven items assess the strength of the behavioral inhibition system (BIS; e.g., ‘Criticism or scolding hurts me quite a bit.’), and 13 items assess different aspects of the behavioral activation system (BAS), including BAS drive (four items, e.g., ‘When I want something I usually go all-out to get it.’), BAS fun seeking (four items, e.g., ‘I crave excitement and new sensations.’), and BAS reward responsiveness (five items, e.g., ‘It would excite me to win a contest.’). Reliabilities were operationalized as Cronbach’s alpha, and were acceptable for each variable, ranging from 0.65 to 0.87 (see Table 1 for descriptive statistics).

Goal setting

As part of the survey, participants were asked to describe four goals that they planned to pursue over the coming year. In line with past research (e.g., refs. 80,104), goals were described as follows: “Personal goals are projects and concerns that people think about, plan for, carry out, and sometimes (though not always) complete or succeed at. They may be more or less difficult to implement; require only a few or a complex sequence of steps; represent different areas of a person’s life; and be more or less time consuming, attractive, or urgent.” Participants were then asked to write down their most important personal goal that they planned to pursue over the coming year, followed by three other goals they planned to pursue. Examples of goals include “achieve a 3.5 GPA”, “be more social”, and “work out at the gym”.

Tasks

Flanker task

Participants performed an arrow version of the flanker task in which five arrowheads were presented in white on a black screen and the stimulus arrays could be either compatible (<<<<<, >>>>>) or incompatible (<<><<, >><>>). Participants had to respond to the direction of the central arrowhead while ignoring the flanking arrows. Participants responded to right- and left-facing central arrowheads using the right and left arrow keys, respectively, of a millisecond-accurate QWERTY keyboard (Empirisoft DirectIN Millisecond Accurate Keyboard). Trials commenced with the presentation of a fixation cross (250 ms) that was followed by the presentation of a flanker target stimulus until response (max: 1000 ms). The target trial was followed by a blank screen for between 600 and 1000 ms before the start of the next trial. Participants performed a total of 420 trials with equal proportions of compatible and incompatible trials. Participants were given self-paced breaks after every 60 trials. Participants were instructed to respond as fast as possible. This task was programmed in MediaLab (v2012, Empirisoft, New York, NY).

Time estimation task

The time-estimation task was used to elicit the RewP. The task was programmed in E-Prime 2.0 (Psychology Software Tools, Pittsburgh, PA) and started with a fixation cross for 250 ms that was followed by a blank screen. The participants’ task was to press the X key once they estimated that one second had elapsed since the presentation of the initial fixation cross. External feedback was presented at a fixed interval of 2000 ms after the initial fixation cross. Correct feedback was provided visually by a plus sign in the center of the screen, and incorrect feedback was provided by a minus sign. Correct feedback was provided if the participant’s response was within a pre-defined window that was centered around 1 s after the initial fixation cross. The duration of this window was titrated adaptively throughout performance: it was initially set at 100 ms, and was reduced or increased by 10 ms following accurate and inaccurate responses, respectively. This adaptive procedure ensured that participants received roughly equal numbers of correct and incorrect feedback. Participants first completed 20 practice trials of this task, followed by 168 experimental trials. The experimental trials were further divided into four blocks of equal length, separated by self-paced breaks.
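
The adaptive window can be illustrated with a minimal staircase sketch. It is not the E-Prime script used in the study. It assumes that the 100 ms figure is the full width of the window (so responses within 50 ms either side of 1000 ms are initially correct) and adds a hypothetical 10 ms floor so that the window cannot shrink to zero; the function name `run_staircase` is also hypothetical.

```python
def run_staircase(response_times_ms, target_ms=1000.0, window_ms=100.0,
                  step_ms=10.0, min_window_ms=10.0):
    """Return per-trial feedback and the final window width.

    Assumptions (not stated in the source): window_ms is the full width of
    the window centered on target_ms, and min_window_ms is a safety floor.
    """
    feedback = []
    for rt in response_times_ms:
        correct = abs(rt - target_ms) <= window_ms / 2
        feedback.append(correct)
        if correct:
            # Accurate estimate: make the next trial harder.
            window_ms = max(min_window_ms, window_ms - step_ms)
        else:
            # Inaccurate estimate: make the next trial easier.
            window_ms += step_ms
    return feedback, window_ms


if __name__ == "__main__":
    print(run_staircase([1000, 1049, 1046, 900]))
```

Because the window narrows after each hit and widens after each miss, the procedure settles around the width at which a participant is correct on about half of the trials, which is what yields similar numbers of correct and incorrect feedback.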

Picture viewing task

To obtain the LPP, participants viewed 210 images (presented in random order): 30 each of negative high-arousal, negative low-arousal, positive high-arousal, and positive low-arousal images that were analyzed here (full image list: https://osf.io/c283f/), as well as 30 neutral images from the IAPS that were not used in the current analyses (using the same materials and protocol as in ref. 105). An additional 60 images were included to assess neural responses to healthy and unhealthy foods. These images were taken from the food-pics database (a large database of food and control images rated on characteristics such as valence and palatability, as well as micronutrient information; ref. 106); as these food images were not relevant to the current paper, we will not consider them further. This task was programmed in MediaLab (v2012, Empirisoft, New York, NY).

EEG measures

EEG activity was recorded continuously throughout the entire in-lab session as the participants completed each task. The EEG was recorded from 36 Ag/AgCl sintered electrodes arranged according to the international 10–20 system in a stretch-lycra cap (Electro-Cap International, Eaton, OH). Vertical electro-oculography (VEOG) was recorded via a supra- to sub-orbital bipolar montage surrounding the right eye. Impedances were monitored during recording and kept at less than 5 kΩ. The continuous EEG signal was digitized at 512 Hz using ASA acquisition hardware and software (asalab 4.9.4 software, TMSi Refa8 device; Advanced Neuro Technology, Enschede, the Netherlands). Recordings used the average earlobe and forehead electrodes as reference and ground, respectively. All subsequent data analyses were conducted offline using Brain Vision Analyzer (v.2.2; Brain Products, GmbH, Gilching, Germany). The offline EEG processing was pre-registered before our analyses began (https://osf.io/znsw8/).

The data were band-pass filtered offline using zero phase shift Butterworth filters (24 dB/octave roll-off) with a high-pass filter of 0.1 Hz and a low-pass filter of 20 Hz. Eye-blinks were corrected for using regression-based procedures107. Semiautomatic procedures were then used to identify and reject EEG artifacts. The artifact criteria were a voltage step of more than 25 µV between sample points, a voltage difference of 150 µV within 200 ms intervals, voltages above 85 µV and below −85 µV, and a maximum voltage difference of less than 0.05 µV within 100 ms intervals. Intervals were rejected on an individual channel basis to maximize data retention for the subsequent ERP analyses. Averaged ERPs were rejected if they comprised fewer than 5 trials for the response-locked ERPs, fewer than 20 trials for the RewP, and fewer than 8 trials for the LPP (see https://osf.io/g759u/).
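
The four artifact criteria can be expressed as simple checks on a single-channel segment. The sketch below is illustrative only and does not reproduce the Brain Vision Analyzer routines; the function names are hypothetical, windows are assumed to slide one sample at a time, and the 150 µV criterion is assumed to mean a max-min difference exceeding that value.

```python
def _any_window(samples, srate_hz, win_ms, is_bad):
    """True if any sliding window of win_ms has a max-min range that is_bad."""
    n = max(2, round(srate_hz * win_ms / 1000.0))
    for start in range(0, len(samples) - n + 1):
        window = samples[start:start + n]
        if is_bad(max(window) - min(window)):
            return True
    return False


def artifact_reasons(samples_uv, srate_hz=512.0):
    """Return the set of criteria violated by one channel segment (in µV).

    Segments shorter than a window are never flagged by the windowed checks.
    """
    reasons = set()
    # Voltage step of more than 25 µV between adjacent sample points.
    if any(abs(b - a) > 25.0 for a, b in zip(samples_uv, samples_uv[1:])):
        reasons.add("voltage_step")
    # Absolute amplitude beyond +/-85 µV.
    if any(abs(v) > 85.0 for v in samples_uv):
        reasons.add("amplitude")
    # Max-min difference above 150 µV within a 200 ms interval (assumed reading).
    if _any_window(samples_uv, srate_hz, 200.0, lambda rng: rng > 150.0):
        reasons.add("range")
    # Low activity: max-min difference below 0.05 µV within a 100 ms interval.
    if _any_window(samples_uv, srate_hz, 100.0, lambda rng: rng < 0.05):
        reasons.add("low_activity")
    return reasons


if __name__ == "__main__":
    clean = [(i % 2) * 1.0 for i in range(120)]
    spiky = list(clean)
    spiky[60] = 100.0
    print(artifact_reasons(clean), artifact_reasons(spiky))
```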

Reliability was assessed for each ERP score using split-half reliability assessment. This was achieved by ordering all viable epochs at the electrode of interest and creating separate averages for odd and even trials. A reliability statistic was then calculated by first conducting a Pearson’s correlation between the odd and even ERPs (i.e., rodd-even) and applying the Spearman-Brown correction, rSB = 2 × rodd-even / (1 + rodd-even), to adjust for the smaller number of trials per condition as a result of creating the split halves of the data [see ref. 108]. Reliability for the difference waves was calculated by first subtracting the odd/even ERN split-half averages from the odd/even CRN split-half averages, and then subjecting these to the same analysis steps as the ERN/CRN to compute reliability.
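
The split-half procedure can be sketched in a few lines. The example below assumes that each participant's data have already been reduced to one amplitude value per viable epoch (an illustrative simplification; the study averaged waveforms before scoring), and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_brown(r_half):
    """Step the half-length correlation up to full test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(trials_by_participant):
    """Odd/even split-half reliability across participants.

    trials_by_participant: one list of per-epoch amplitudes per participant,
    in presentation order (assumed input format).
    """
    odd_means, even_means = [], []
    for trials in trials_by_participant:
        odd, even = trials[0::2], trials[1::2]  # 1st, 3rd, ... vs 2nd, 4th, ...
        odd_means.append(sum(odd) / len(odd))
        even_means.append(sum(even) / len(even))
    return spearman_brown(pearson_r(odd_means, even_means))


if __name__ == "__main__":
    print(spearman_brown(0.5))
```

For instance, an odd-even correlation of 0.5 corresponds to a corrected reliability of 2 × 0.5 / 1.5, or about 0.67.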

The ERPs were then operationalized as follows. For the response-locked ERPs, epochs were created that started 400 ms before each response and lasted for 1400 ms. Epochs were then averaged separately for error and correct trials for each participant. The peak of the ERN was then operationalized by first creating a grand average ERN across all participants to find the peak ERN amplitude at electrode FCz. A 50 ms window surrounding this peak (26–76 ms) was then used to obtain a mean amplitude measure of the ERN (reliability = 0.86) and its correct-trial equivalent, the correct-related negativity (CRN, reliability = 0.98) for each participant. The difference ERN (∆ERN, reliability = 0.89, see Table 1) was obtained by subtracting error epochs from correct epochs and then extracting a mean amplitude measure in the same time-window used for the ERN. The ∆ERN was used for our primary confirmatory tests and is presented in the main manuscript; however, we also present results for the CRN and ERN in the Supplementary Information (see Table S3). Importantly, the choice of ERP did not influence the conclusions drawn in the manuscript.

Feedback-related ERPs were identified using epochs that started 400 ms before a feedback stimulus and lasted for 1400 ms. These epochs were baseline corrected using a 200 ms window that started 200 ms before the onset of the feedback stimulus. The RewP was operationalized as a 50 ms mean amplitude window at FCz that was identified using a collapsed localizer method that collapsed across condition (i.e., correct feedback, error feedback) and participant. We extracted mean amplitude measures 255–305 ms after feedback onset separately for correct (reliability = 0.97) and error (reliability = 0.95) trials. We used a difference wave (correct minus error; ∆RewP, reliability = 0.79) as our primary measure for confirmatory analyses and we also report the RewP to correct feedback in all tables (see Supplementary Information Table S3 for results with the RewP to error feedback). As with the ERN results, the choice of ERP operationalization did not influence the conclusions of our study.

The LPP was also operationalized in a window that started 400 ms before each emotional image and lasted for 2400 ms. The LPP was extracted as two adjacent 500 ms mean amplitude windows to reflect the early and late aspects of this component (early LPP: 350–850 ms; late LPP: 850–1350 ms). As with the Pe and RewP, these aspects of the LPP were operationalized using a collapsed localizer method across conditions and participants at electrode Pz. Our analyses focused on three operationalizations of the LPP. First, we analyzed initial orienting to the high arousal positive stimuli using the early LPP to high arousal positive images (split-half reliability = 0.82; see Table 1). We additionally computed difference waves (e.g., positive minus negative; high arousal minus low arousal), see Figs. 3 and 4. However, these variables demonstrated poor split-half reliability (ranging from 0.23 to 0.54) and should be treated with caution in the subsequent analyses. We also present supplementary ERP analyses for the early and late LPP to every image type in the Supplementary Information (see Table S3). These analyses supported the same conclusions as those in the main manuscript.

Experience sampling

In the experience sampling survey, participants were first asked about whether they were currently experiencing a desire or had experienced one in the past 30 min. When participants indicated that they were experiencing or had recently experienced a desire, they then indicated what the desire was for, choosing from among 23 categories (adapted from ref. 82; see all materials on OSF). They then reported on desire strength (‘how strong is/was the desire?’) using a slider scale ranging from 1 (very weak) to 7 (very strong) and whether they had the opportunity to satisfy the desire (y/n). If they indicated that they had the opportunity to satisfy the desire, they were asked about resistance (‘did you try to resist the desire’), using a slider with the anchors 1 (did not try to resist at all) and 7 (tried very hard to resist). Those who reported resisting at least somewhat (did not select 1) were asked about the strategies that they used to resist. Participants then reported whether they gave in to the desire (y/n). Please note that we had initially proposed to operationalize desire strength using a variable that accounts for instances where participants did not experience a desire. We created a variable where all reports of desire strength for experienced desires were copied over, and instances where participants did not experience a desire were recorded as 0 on this new variable (so that the new desire strength variable now ranged from 0 to 7). However, this variable was almost redundant with proportion of desires (r = 0.92), so we kept the original measure of desires (departing from our pre-registration).
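
The relation between the zero-filled desire strength variable and the proportion of desires can be seen in a small sketch for one participant. The record layout (dictionaries with `desire` and `strength` keys) and the function name `summarize_desires` are hypothetical and chosen only for illustration.

```python
def summarize_desires(signals):
    """Summarize one participant's completed signals.

    signals: list of dicts with 'desire' (bool) and 'strength'
    (1-7 when a desire was reported, otherwise None). Hypothetical format.
    """
    n = len(signals)
    strengths = [s["strength"] for s in signals if s["desire"]]
    return {
        # Share of completed signals on which a desire was reported.
        "proportion_desires": len(strengths) / n,
        # Mean strength over reported desires only.
        "mean_strength": sum(strengths) / len(strengths) if strengths else None,
        # Mean strength with no-desire signals scored as 0.
        "zero_filled_strength": sum(strengths) / n,
    }


if __name__ == "__main__":
    week = [
        {"desire": True, "strength": 6},
        {"desire": False, "strength": None},
        {"desire": True, "strength": 4},
        {"desire": False, "strength": None},
    ]
    print(summarize_desires(week))
```

Algebraically, the zero-filled mean equals the proportion of desires multiplied by the mean strength of reported desires, which helps explain why it tracked the proportion of desires so closely.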

For supplemental (non-pre-registered) analyses, we also examined the extent to which the desires conflicted with or helped the four personal goals that participants reported at baseline. Ratings for each goal ranged from −3 (conflicts with goal pursuit) to 3 (helps with goal pursuit); a measure of conflict strength was computed for each desire by averaging across the four goals and reverse-scoring it (such that larger numbers indicated greater conflict). We also computed additional measures of desire strength, resistance strength, and desire enactment for only those desires that conflicted at least somewhat with at least one of the goals; results with these variables are reported in the Supplementary Information (see Table S2).
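
A minimal sketch of the conflict score for a single desire follows. It assumes that reverse-scoring on the symmetric −3 to 3 scale amounts to flipping the sign of the average, and that "conflicted at least somewhat" means a rating below zero for at least one goal; both function names are hypothetical.

```python
def conflict_strength(goal_ratings):
    """Average the ratings across goals and reverse-score the result.

    goal_ratings: one rating per goal, from -3 (conflicts) to 3 (helps).
    Sign flipping as the reverse-scoring rule is an assumption.
    """
    return -sum(goal_ratings) / len(goal_ratings)


def conflicts_with_any_goal(goal_ratings):
    """True if the desire conflicts at least somewhat with at least one goal."""
    return any(r < 0 for r in goal_ratings)


if __name__ == "__main__":
    ratings = [-3, -1, 0, 2]
    print(conflict_strength(ratings), conflicts_with_any_goal(ratings))
```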

Follow-up goal progress

At each of three follow-ups (1, 3, and 6 months later), participants were reminded of the four goals that they set at the baseline assessment and asked about goal progress using three items previously used in goal research (e.g., “I have made a lot of progress towards this goal”10,104) rated on a scale of 1 (strongly disagree) to 7 (strongly agree). The average across the three items, and the four goals, at each time constituted the goal progress variable.

Supporting the psychometric validity of these measures, the internal consistency across these items (12 items per time point, 3 items for each of the 4 goals) was high (Cronbach’s α = 0.83 at 1 month, 0.80 at 3 months, and 0.84 at 6 months). As noted in Table 2, we also found that goal progress measures were correlated positively with each other (rs = 0.37–0.53), suggesting that self-reported goal progress was moderately stable over the 6-month period.

Planned analyses

Given that here we were interested in person-level variables, we aggregated our key desire variables by person, computing for each person new variables representing the proportion of desires reported (out of all the signals that the person completed), the average desire strength, the average resistance strength, and the proportion of desires that were enacted (out of all the desires reported). We then correlated these variables with our personality and neural indices. Given that we conducted 16 correlations for each DV, a False Discovery Rate correction109 was applied to reduce the proportion of false positives. In addition, to corroborate results, we used Bayesian analyses to compute a Bayes factor for each correlation (using JASP, version 0.14.1). The Bayesian Pearson’s correlation analyses used default settings in JASP to test for an alternative hypothesis that variable pairs were correlated using a stretched beta prior width of 1.0.
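
To illustrate how a False Discovery Rate correction operates on a family of correlations, the sketch below implements the Benjamini-Hochberg step-up procedure and returns adjusted p-values. That this specific procedure was the one applied is an assumption, since the text names only the correction and its reference; the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumed procedure for illustration; the example inputs are made up.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

With 16 correlations per dependent variable, the same function would be applied to the 16 p-values within each family, and a correlation would be treated as robust to correction if its adjusted p-value stayed below the chosen threshold.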

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.