Task compliance predicts suppression-induced forgetting in a large sample

Suppression-induced forgetting (SIF) refers to a memory impairment resulting from repeated attempts to stop the retrieval of unwanted memory associates. SIF has become established in the literature through a growing number of reports built upon the Think/No-Think (TNT) paradigm. Not all individuals and not all reported experiments yield reliable forgetting, however. Given the reliance on task instructions to motivate participants to suppress target memories, such inconsistencies in SIF may reasonably owe to differences in participants' compliance or in their expectations as to whether they will again need to retrieve those items (on, say, a final test). We tested these possibilities on a large (N = 497) sample of TNT participants. In addition to successfully replicating SIF, we found that the magnitude of the effect was significantly and negatively correlated with participants' reported non-compliance during the No-Think trials. This pattern held true on both same- and independent-probe measures of forgetting, as well as when the analysis was conditionalized on initial learning. In contrast, test expectancy was not associated with SIF. Supporting previous intuition and more limited post-hoc examinations, this study provides robust evidence that a lack of compliance with No-Think instructions significantly compromises SIF. As such, it suggests that diminished effects in some studies may owe, at least in part, to non-compliance, a factor that should be carefully tracked and/or controlled. Motivated forgetting is possible, provided that one is sufficiently motivated and capable of following the task instructions.


Method
Participants. For Experiment 1, 146 participants (40 male and 106 female, all between 16 and 24 years of age) from Southwest University in China participated as paid volunteers. Five participants were removed from the eventual data analysis because they asked to withdraw from the study before the final test, having become uncomfortable in the MRI scanner during the TNT phase. This left 141 participants contributing to the analysis of SIF and facilitation effects. Three participants failed to complete the compliance questionnaire, while three did not complete the test-expectancy questionnaire because of experimenter error. These participants were included in the above analyses but were necessarily excluded from analyses involving compliance and test expectancy. For Experiment 2, 351 (97 male and 254 female) undergraduate and postgraduate students (18-25 years of age) from Southwest University in China participated. Five participants failed to complete the compliance questionnaire, while three failed to complete the test-expectancy questionnaire; these participants were excluded from analyses involving compliance and test expectancy. Across both studies, all participants were right-handed native Chinese speakers with normal or corrected-to-normal vision. None reported a history of neurological or psychiatric disorders. All participants provided informed consent prior to the study, which was approved by the Institutional Human Participants Review Board of Southwest University Imaging Center for Brain Research. The experimental procedures were approved by the Academic Committees of Southwest University in China and conducted in accordance with relevant guidelines and regulations.
Design. The experiment used a 3 × 2 × 3 mixed design, with Task (Baseline, Think, and No-Think) and Test Type (Same-Probe vs. Independent-Probe test) manipulated within participants, and Item Counterbalancing manipulated between participants. We assessed the percentage of items correctly recalled on the final test as our primary dependent measure. We computed this measure in two ways: (1) based on all of the studied items, irrespective of whether they had been demonstrably learned prior to entering the TNT phase (Unconditionalized recall data); and (2) considering only those items that had been correctly recalled on the initial pre-test prior to the TNT phase (Conditionalized recall data).
Materials. We selected 66 weakly related word pairs from previous studies 11,12 and translated these into Chinese. We divided 48 of the pairs into three subsets of 16, which were then separately assigned to the Think, No-Think, and Baseline conditions, counterbalanced across participants. We reserved the remaining 18 pairs as fillers to be used in the practice TNT phase and in the practice test used to reinstate the initial learning context before the critical final memory tests (see Procedure).
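The two recall measures described under Design can be illustrated with a minimal sketch. The `Item` record, the function names, and the example data below are hypothetical and are not taken from the study's analysis scripts. SIF is computed here as Baseline recall minus No-Think recall, so that positive values indicate below-baseline forgetting.

```python
from dataclasses import dataclass


@dataclass
class Item:
    condition: str        # "Baseline", "Think", or "No-Think"
    learned: bool         # recalled on the criterion test before the TNT phase
    recalled_final: bool  # recalled on the final test


def recall_pct(items, condition, conditionalized):
    """Percentage of items in `condition` recalled on the final test."""
    pool = [i for i in items if i.condition == condition]
    if conditionalized:
        # Keep only items that were demonstrably learned before the TNT phase.
        pool = [i for i in pool if i.learned]
    if not pool:
        return float("nan")
    return 100.0 * sum(i.recalled_final for i in pool) / len(pool)


def sif(items, conditionalized):
    """Baseline minus No-Think recall (positive = forgetting)."""
    return (recall_pct(items, "Baseline", conditionalized)
            - recall_pct(items, "No-Think", conditionalized))


# Made-up data for one participant (illustration only).
example_items = [
    Item("Baseline", True, True), Item("Baseline", True, True),
    Item("Baseline", False, False), Item("Baseline", True, False),
    Item("No-Think", True, True), Item("No-Think", True, False),
    Item("No-Think", False, False), Item("No-Think", True, False),
]

unconditionalized_sif = sif(example_items, conditionalized=False)
conditionalized_sif = sif(example_items, conditionalized=True)
```

In this toy example the conditionalized measure drops the unlearned item from each condition before computing the percentages, which is why the two SIF values differ.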
Procedure. We used the conventional TNT paradigm 12, which consists of three phases: a study phase, the TNT phase, and a final test phase. Both experiments used the same procedure, except that participants in Experiment 1 performed the TNT phase in the MRI scanner (hereinafter referred to as the fMRI sample), whereas participants in Experiment 2 performed the TNT phase outside the scanner (hereinafter referred to as the behavioral sample). We used only the behavioral data from these samples in this study; the imaging data were reported elsewhere 35. In the sections below, we describe the details of each phase.
Study phase. In the study phase, we instructed participants to learn 66 cue-target word pairs so that they could recall the target as soon as they saw a cue word. The study phase took place in three stages. First, participants studied the word pairs one by one, with each word pair presented in a white font in the middle of a black screen for 3.4 s (0.6 s ITI). Second, participants had up to three test-feedback cycles to achieve at least 50% accuracy in recalling the associations. During this stage, we presented each cue word for up to 3.4 s (0.6 s ITI). After participants recalled the target word aloud or the 3.4 s had elapsed, participants viewed the correct target as feedback for 1 s. As a final step, we tested participants on all 66 pairs: We presented each cue word on the screen for 4 s and asked participants to report the target word aloud. This final "criterion test" was used to establish which pairs had been learned successfully, and performance on this test was used to decide which pairs would be analyzed in our conditionalized recall measure (see Design section).
Think/no-think phase. Although the TNT phase for Experiment 1 was conducted in the MRI scanner, rather than in a behavioral testing room, the TNT phase unfolded similarly across both experiments. Specifically, in this phase, each trial presented a single cue word from either the Think or No-Think conditions, which were randomly intermixed. We told participants that some of the cue words would appear in green (Think trials), and that their task for these items would be to recall the associated target as soon as possible and keep it in mind for the duration of the trial. In contrast, other cue words instead would appear in red (No-Think trials), and, for these trials, their task would be to prevent the associated target word from coming into awareness by blocking out all thoughts about it without replacing it with any other thoughts. As such, these No-Think instructions are consistent with the Direct Suppression technique described elsewhere 36 . During each trial, the cue word appeared for 3 s with a jittered ISI (1 s, 3 s, 5 s, 7 s) that helped optimize the efficiency of the event-related fMRI design. Participants viewed a fixation cross during the ISI. There was no jittered ISI in Experiment 2.
Before the TNT phase proper, we led participants through a practice TNT phase with filler pairs to make sure that they fully understood and complied with the instructions at this stage. Not only are these efforts a standard part of what has become the TNT paradigm's typical implementation, but we also wanted to establish that any reported non-compliance in the TNT phase proper was not due to a misunderstanding of the instructions at the outset of the task. The practice phase consisted of two short blocks. After each block, we administered a diagnostic questionnaire to ensure that participants understood the procedure, and we gave corrective feedback as necessary (e.g., if they covertly rehearsed the target words for No-Think trials or if they did not always actively push the target word out of mind if it did come to mind during the red cue). Participants received a 5-min break between the practice and the formal TNT phase. All the participants received a refresher presentation of all the word pairs before the TNT phase in the scanner (1 s per pair).
The TNT phase proper was divided into 6 blocks, each lasting 6.7 min. Each block contained 16 Think items and 16 No-Think items, with each item presented twice. We inserted a 30-40 s break after each block, and we administered an additional diagnostic questionnaire after the first three blocks to ensure that the participants continued to follow the instructions. We obtained the diagnostic questionnaire from Michael Anderson and translated it into Chinese.
Final test phase. We tested participants' memory for the Think, No-Think, and Baseline items in two ways, in two separate test blocks, the order of which was counterbalanced across participants: a Same-Probe (SP) test and an Independent-Probe (IP) test. On each trial of the SP test block, we presented a cue word from one of the studied pairs on the screen for 3.4 s (ISI 0.6 s) and asked participants to recall aloud the word they had learned to associate with it during the initial study phase. On each trial of the IP test, in contrast, we presented a category or a semantically related cue of the target word on the screen for 3.4 s (ISI 0.6 s) and asked participants to recall a studied response word that fit those cues. Before these two tests, we administered to participants a practice test block of 18 filler word pairs containing a mixture of filler Baseline, Think, and No-Think items. This practice helped ensure participants understood the task procedure and also helped reinstate the context of the original study phase, in which they had learned the Baseline pairs, as well as the Think and No-Think pairs.
www.nature.com/scientificreports/
Compliance questionnaire. After the memory tests, participants filled out a questionnaire to assess their compliance with the No-Think instructions. First, participants were asked to provide honest ratings from 0 (Never) to 4 (Always) on three statements to indicate whether they ever intentionally made an effort to think about the targets during No-Think trials: (a) When I saw the red cue word, I quickly checked to see if I remembered the target word; (b) After a red cue word went off the screen, I checked to see if I still remembered the target word; (c) When I saw a red cue word, I thought about the target word that went with it to purposely improve my memory for that word pair.
We computed a summary compliance score across the three main "cheating" or "memory checking" behaviors (i.e., checking during the No-Think trial, checking after the No-Think trial, and intentional rehearsal of No-Think items). This score served as a key dependent measure that we used to examine a possible relationship between non-compliance and suppression-induced forgetting. We also analyzed these items separately.
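As a minimal sketch of how such a summary score could be formed, the function below sums the three ratings (each from 0 to 4) into a total from 0 to 12, where higher totals mean more memory checking and therefore less compliance. The function and argument names are hypothetical and chosen only for illustration.

```python
def memory_checking_score(during, after, rehearsal):
    """Sum the three questionnaire ratings (each 0-4) into a 0-12 total.

    during    -- checking memory while the red cue was on screen (item a)
    after     -- checking memory after the red cue disappeared (item b)
    rehearsal -- intentionally rehearsing the No-Think target (item c)
    """
    ratings = (during, after, rehearsal)
    for rating in ratings:
        if not isinstance(rating, int) or not 0 <= rating <= 4:
            raise ValueError(f"rating must be an integer from 0 to 4, got {rating!r}")
    return sum(ratings)


fully_compliant = memory_checking_score(0, 0, 0)  # no reported checking
occasional_checker = memory_checking_score(2, 1, 0)
```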
Test-expectancy questionnaire. Participants were then asked to indicate the extent to which they expected a final memory test. Specifically, they were asked to provide a rating from 0 to 4 (0 = "No, I did not think that"; 2 = "Unsure if I did think that"; 4 = "Yes, I definitely thought that") in response to the following prompt: In the main phase of the experiment, we asked you to not think about the associated target word for cue words colored in RED. During this phase, did you suspect that you would later be asked to recall the target for these RED cue words? In other words, did you anticipate some form of a final test?

Results
Training phase performance.
Because we used the same experimental design in the two experiments, we first combined data from both to increase the statistical power and to establish the general pattern of results. We followed this with separate analyses of the two experiments. In addition, we report these analyses using both the conditionalized and the unconditionalized recall data (see Design). We present the mean recall percentage for each condition for each sample in Table 1.
One reason for this lack of stability in the observed facilitation effect is that it interacted with Test Type in the overall sample. Examining the test types separately, the facilitation effect in the overall sample was significant on the SP test in both the conditionalized data, F(1, 491) = 42.104, p < 0.001, ηp² = 0.079, and the unconditionalized data, F(1, 491) = 104.032, p < 0.001, ηp² = 0.175. However, this effect was not significant for the IP test, and, indeed, was reversed in the unconditionalized data, F(1, 491) = 44.195, p < 0.001, ηp² = 0.083. Similar patterns arose in each individual experiment. These findings show that facilitation due to retrieval practice is specific to the practiced association and does not generalize to independent cues. Indeed, retrieval practice can impair retention of a memory when it is tested via a novel cue, an effect sometimes observed in prior studies and which has been attributed to increasing encoding specificity due to retrieval practice 23.

Most participants reported complying with suppression instructions.
Given that we provided participants with practice on the TNT task prior to the critical phase and gave them extensive feedback on the instructions for the No-Think task (via repeated administration of the diagnostic questionnaire), one might expect that most participants would report compliance with the No-Think instructions after the experiment. Consistent with this supposition, the results from our compliance questionnaire indicated that most participants at least claimed to be compliant with task instructions during the TNT phase (see Fig. 1). To obtain the non-compliance (memory checking) score, we summed the ratings across the three non-compliance questions (each on a scale from 0 to 4). This yielded a score ranging from 0 to 12. The mean (± standard deviation) non-compliance score from the overall sample was 1.352 ± 1.424.
For the fMRI sample and the behavioral sample, the non-compliance scores were 1.259 ± 1.583 and 1.388 ± 1.388, respectively. So, although compliance was not perfect (which would be represented by a non-compliance score of exactly 0), participants did, on the whole, appear to avoid any intentional efforts to think of the No-Think items.
Nevertheless, non-compliance occurred in some cases. To characterize the nature of this behavior, we first examined the frequency of different types of non-compliance measured on the three relevant questions. For the overall sample, intentional checking of memory for No-Think items during a No-Think trial (the score of the first question in the compliance questionnaire) was more frequent than intentional checking just after the No-Think trial was over (the score of the second item of the compliance questionnaire), t(477) = 4.281, p < 0.001 (all significant p values in this set of analyses were Bonferroni corrected for repeated measurements). The results from the fMRI sample and the behavioral sample (respectively) also revealed this pattern individually, t(134) = 1.907, p < 0.05; t(342) = 3.836, p < 0.001.
Moreover, we found that checking during a trial was more likely than intentional rehearsal of the No-Think items (the rating for the third item of the compliance questionnaire) in the overall sample, t(477) = 14.128, p < 0.001. The data of the fMRI and behavioral samples (respectively) also showed this effect individually, t(134) = 5.793, p < 0.001; t(342) = 13.260, p < 0.001.
We also found that checking after a No-Think trial (the second item of the compliance questionnaire) was more likely than was intentional rehearsal (the third item of the compliance questionnaire) in the overall sample, t(477) = 10.193, p < 0.001. The results of the fMRI and behavioral samples individually showed that same pattern of results, t(134) = 4.653, p < 0.001; t(342) = 9.135, p < 0.001. Together, these results show that participants' urge to quickly "check their memory" for No-Think items during or after a No-Think trial was more common than outright attempts to intentionally rehearse the items for the later test. This suggests that many people may view such "quick checks" of their memory for No-Think items as distinct from deliberate rehearsal of No-Think items, even though both behaviors are clear violations of the instruction to avoid awareness of the memory.
We had hypothesized that compliance would be higher in the fMRI sample than in the behavioral sample because, before the TNT phase, we repeatedly emphasized the very high cost of the fMRI experiment to the participants who were scanned. To test this, we performed an independent-samples t-test comparing the total memory checking score in the fMRI sample to that of the behavioral sample. Although total non-compliance was numerically higher in the behavioral sample, the groups did not differ significantly, t(476) = −0.888, p > 0.05.
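For readers who want to see the mechanics of this comparison, the sketch below computes a pooled-variance (Student) independent-samples t statistic. The reported degrees of freedom (476) equal the combined sample size minus 2, which is what a pooled-variance test gives, but the choice of a pooled test here is our assumption; the software used by the authors is not stated. The function name and the example scores are hypothetical.

```python
import math
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Made-up memory checking totals (illustration only, not the study's data).
fmri_scores = [0, 1, 1, 2, 3]
behavioral_scores = [0, 1, 2, 2, 4]
t_value, df = independent_t(fmri_scores, behavioral_scores)
```

A negative t value in this arrangement means the first group (here, the fMRI scores) has the lower mean, matching the sign convention of the statistic reported above.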

Most participants expected a final memory test.
We report the test-expectancy data in Fig. 2, with the proportion of participants giving each rating, separately for the fMRI and behavioral samples. The results showed that most participants claimed some suspicion that they would be tested. The mean test expectancy of the overall sample was 2.102 ± 1.426.
Test expectancy was significantly higher in the behavioral sample (2.236 ± 1.357) than it was in the fMRI sample (1.763 ± 1.541), t(476) = −3.300, p < 0.001, suggesting that the scanner context may have altered participants' perceptions of the purpose of the study. Nevertheless, although we described the TNT procedure to all participants before the experiment as being about attention (rather than memory), many participants still expected a final test.
Memory checking is associated with greater test expectancy. If participants expected a final memory test, they might have been more motivated to check their memory of the learned word pairs during or after No-Think trials. To test this, we computed a Pearson correlation between participants' test expectancy ratings and their total non-compliance scores (memory checking). Using 1000 bootstrap samples to test for significance, we found a reliable correlation, r = 0.185 (95% CI = [0.100, 0.268]), p < 0.001. This positive relationship was also significant within the behavioral sample, r = 0.196 [0.105, 0.282], p < 0.001, but it was only marginally significant in the fMRI sample (see Fig. 3).
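A bootstrapped confidence interval for a Pearson correlation can be sketched as follows. This is a simple percentile bootstrap over participants with 1000 resamples; the specific bootstrap variant used in the study is not stated, so the percentile method, the function names, the fixed seed, and the example data are all assumptions made for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for r, resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        # Skip degenerate resamples in which one variable has no variance.
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue
        estimates.append(pearson_r(xs, ys))
    estimates.sort()
    lower = estimates[round((alpha / 2) * n_boot)]
    upper = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up ratings for 20 participants (illustration only).
expectancy = [v % 5 for v in range(20)]
checking = [(v % 5) + (v % 3) for v in range(20)]
r_value = pearson_r(expectancy, checking)
ci_low, ci_high = bootstrap_ci(expectancy, checking)
```

An interval that excludes zero corresponds to the kind of reliable correlation described above; with real data, each index would represent one participant's pair of scores.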

Memory checking is associated with reduced suppression-induced forgetting.
We tested whether participants' tendency to check their memories during and after No-Think trials (contrary to the instructions) was associated with the amount of SIF observed on both the SP and IP tests. We also examined which individual behaviors identified in our memory checking questionnaire most strongly moderated SIF. We report the results for both the conditionalized and unconditionalized data below.

Relationship between memory checking and suppression-induced forgetting.
We found that suppression-induced forgetting declined with increasing memory checking. To test this, we first examined the correlation between overall SIF (collapsed over Test Type) and memory checking using the unconditionalized data from the full sample, testing for significance based on 1000 bootstrap samples. We observed a significant negative correlation between memory checking and forgetting, r = −0.209 (95% CI = [−0.289, −0.135]), p < 0.001 (see Table 2). This effect was significant individually for the fMRI sample.
We next analyzed the correlation between memory checking and SIF separately for the SP and IP tests (see Fig. 4A). For the SP test, there was a significant negative correlation between SIF and memory checking using the unconditionalized data from the overall sample.

Identifying a memory checking threshold for future studies.
Next, we sought to identify a reasonable cutoff score for task compliance to use as an exclusion criterion in future TNT studies. We grouped participants from the overall (combined) sample by their memory checking score (summed across the component items) and compared the SIF effect in each group to zero using one-sample t-tests, separately for the Same-Probe and Independent-Probe tests. Significant SIF was observed for the subsamples of participants who had a total memory checking score of either 1, 2, or 3 (lower scores reflect less checking during No-Think trials and, therefore, greater compliance); no SIF was observed when checking scores exceeded 3. This was true of the unconditionalized data from the IP test, as well as for the conditionalized data in the IP test. The same tendency was observed in the unconditionalized data in the SP test, and for the conditionalized data in the SP test.
Based on our exploration of the data from our large sample, we would therefore recommend that researchers consider excluding (or analyzing separately) participants whose total TNT memory checking score is ≥ 4 (see Table 3). We also conducted a Bayes factor analysis of SIF on the SP and IP tests, according to compliance (see Supplementary Table S1). The results were consistent with those of the one-sample t-tests.
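The recommended cutoff and the one-sample test against zero can be sketched as below. The helper names and the example values are hypothetical, and only the t statistic is computed; obtaining a p value would require a t distribution from a statistics package, which we leave out to keep the sketch dependency-free.

```python
import math
from statistics import mean, stdev


def one_sample_t(values):
    """t statistic and degrees of freedom for a test of the mean against zero."""
    n = len(values)
    return mean(values) / (stdev(values) / math.sqrt(n)), n - 1


def split_by_cutoff(participants, cutoff=4):
    """Split (memory_checking_score, sif) pairs at the recommended cutoff.

    Returns the SIF values of participants scoring below the cutoff and of
    those scoring at or above it (candidates for exclusion or separate analysis).
    """
    retained = [sif for score, sif in participants if score < cutoff]
    flagged = [sif for score, sif in participants if score >= cutoff]
    return retained, flagged


# Made-up (score, SIF in percentage points) pairs, for illustration only.
sample = [(0, 12.5), (1, 6.25), (2, 6.25), (3, 0.0), (4, -6.25), (7, 0.0)]
retained_sif, flagged_sif = split_by_cutoff(sample)
t_retained, df_retained = one_sample_t(retained_sif)
```

Entering the memory checking score as a covariate, rather than applying a hard cutoff, is an alternative that avoids discarding participants; which approach suits a given study is a design decision.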

Memory checking during or after a No-Think trial is negatively correlated with suppression-induced forgetting.
Although memory checking, overall, was associated with reduced SIF, the preceding analyses left it unclear as to which specific checking behaviors contributed to the observed effect. To examine this, we correlated responses from each item of the memory checking questionnaire with SIF (see Fig. 4B). In the conditionalized recall data, we observed a negative correlation between SIF and both the first item of the questionnaire ("When I saw the red cue word, I quickly checked to see if I remembered the target word.") and the second item ("After a red cue word went off the screen, I checked to see if I still remembered the target word.").

Table 3. One-sample t-tests of SIF on the SP and IP tests, according to compliance. One-sample t-tests compared to zero were used to identify when final recall on the various test measures reliably fell below Baseline (indicating positive suppression-induced forgetting); MCR = Memory Checking Rating (total compliance score across questionnaire items); CI = Confidence Interval; bootstrap results are based on 1000 bootstrap samples.

Test expectancy is not associated with suppression-induced forgetting.
Given the correlation between test expectancy and memory checking, one might assume that test expectancy also predicted the amount of SIF. To our surprise, however, we found a significant correlation between test expectancy and overall SIF in neither the unconditionalized data, r = 0.

Discussion
Intuitively, the mnemonic consequences of instructions to suppress retrieval should depend upon whether those instructions actually lead participants to suppress. The number 12 and quality 37,38 of the suppression attempts previously have been shown to influence the magnitude of suppression-induced forgetting. However, whereas suppression practice, either in the laboratory or through naturally occurring events 4, tends to be associated with greater levels of SIF, intentional retrieval practice facilitates recall of the practiced items on standard memory tests. Thus, it makes sense that investigators have reported initial signs that intentionally subverting memory suppression instructions in the TNT paradigm may water down or even reverse SIF 29. Such observations have encouraged the use of relevant questionnaire-based exclusion criteria in subsequent protocols designed to better focus on the consequences of intentional memory stopping. Here we tested the merits of this concern using two very large samples of healthy young adults participating in the TNT paradigm. Consistent with previous work 10, we replicated the canonical below-baseline SIF effect for No-Think items across our two samples using direct suppression instructions. Indeed, SIF arose regardless of whether recall performance was analyzed for all studied items, or only those that were demonstrably learned during the initial study phase. Critically, this evidence for SIF generalized to a test using independent probes. That cue-independent forgetting arose supports the argument that inhibition contributes to SIF 6,10,11,39. Relative to SIF, above-baseline facilitation owing to repeatedly practicing the retrieval of Think associates appeared less generalizable to the independent probe variant of the final recall test. No reliable facilitation was detected on the IP test.
This result is consistent with previous suggestions that the benefits of rehearsal tend to be most apparent on tests with cues matching those that were originally trained and practiced (i.e., on same probe tests 11,40,41 ). Such findings may reflect a feature of the encoding specificity principle: Because the initial encoding process biases the meaning of the items to the original cue, a different final test probe would be expected to reduce recall probability 23,42 . Indeed, the more strongly that a target is associated with its original cue, the more detrimental the effect of shifting cues should be 43 . Our criterion test and measures of Baseline SP recall were consistent with such a strong association having been established through the initial study and test-feedback training. Think items were then subjected to continued practice with the original cue throughout the TNT phase, thereby emphasizing the original bias even further and, presumably, making it especially difficult to retrieve the Think targets given the independent probes.
The primary aim of this study, however, was to determine the extent to which the relative difficulty in recalling associates that participants had been instructed to repeatedly suppress (rather than retrieve) is related to their self-reported level of task compliance and/or test expectancy. Indeed, the data revealed that participants' self-reported non-compliance with No-Think task instructions was negatively associated with the magnitude of their SIF effect. Although participants' memory checking increased when a memory test was expected to occur later in the experiment, test expectancy itself was not directly associated with SIF. The present work is consistent with the concern that task compliance could influence variability in the magnitude of SIF. As such, the present findings indicate that researchers making use of the TNT paradigm should closely monitor participants' compliance. Moreover, we used a data-driven analysis and found that participants who had a total memory checking score ≥ 4 disproportionately influence the SIF measure, providing a strong rationale for excluding participants meeting this criterion in future studies.
Some might argue that the non-compliance rate in this study is too low for any practical concern, given that we nevertheless observed reliable SIF in our overall sample. Indeed, most of our participants reported complying with suppression instructions. However, not all studies making use of the TNT paradigm are able to achieve the power afforded by the nearly 500 participants, 48 critical TNT pairs, and 12 repetitions of the critical Think and No-Think cues during the critical phase of the present work. Studies with relatively less power to detect a standard SIF effect would be more sensitive to the distorting effects of non-compliance, even if they undertook all the other measures we employed to ensure understanding of the instructions and shape expectations. Thus, we believe caution is warranted.
Our participants' overall level of non-compliance was negatively associated with the magnitude of SIF, as were the specific measures of non-compliance (intentionally checking their memory for No-Think associates they had been instructed not to think about) both during and after No-Think trials. Even when test expectancy was at its highest, this negative relationship was still observed. Despite an association between memory checking behaviors and test expectancy, test expectancy itself failed to reliably predict the magnitude of SIF on its own, suggesting that task compliance during and around suppression windows is a more powerful determinant of individuals' memory control scores than are expectations about testing. While expectation of a final test may encourage non-compliant behaviors (e.g., checking one's memory), many expectant participants apparently were able to resist the urge to do so and went on to demonstrate their control abilities in the form of measurable SIF.
In this, as in other studies of retrieval suppression, several methods were employed to encourage and track participants' compliance. First, we masked the true focus of the study using a cover story in which participants were told that we were interested in their ability to pay attention and ignore distracting things (e.g., the learned associates whenever cues were presented in red during the main TNT phase). This framing made the avoidance of "distraction" by No-Think items a key goal. To support this framing, we carefully avoided references to "memory" and any hints of a final memory test at all stages of research, from advertisements, to consent forms, to laboratory context (e.g., no memory books on shelves; no memory decor), to instructions or computer displays. By eliminating such references, we avoided encouraging participants to adopt a contraindicated strategy that they might have assumed would improve their retention of No-Think items.
Second, we administered a series of diagnostic questionnaires throughout the practice and critical TNT phases to reaffirm the task instructions, correct any apparent misunderstandings, and assess compliance at early stages. These efforts presumably curbed checking behaviors to some extent and contributed to the high level of compliance, overall (i.e., non-compliance would have been even higher, overall, without these procedures in place).
Nevertheless, despite these efforts, the present work documents that a subset of participants still admitted to engaging in some degree of memory-checking behavior (during and/or after No-Think trials), although deliberate attempts to intentionally rehearse the items were relatively rare. We therefore anticipate that, whatever precautions are taken, some percentage of participants will still fail to comply with the instructions. While no self-report measure is perfect in capturing non-compliance, we believe that our questionnaire, together with an established exclusion threshold (or a covariate entered into the statistical model), could allow researchers to better focus on the aftereffects of actual suppression attempts, if that is their aim. The results of our analyses lead us to conclude that it is critical that compliance be encouraged, measured, and considered during the analysis of suppression-induced forgetting in future studies using the TNT paradigm. Of course, we must acknowledge that there are alternative interpretations of the correlations reported here. It is possible, for example, that participants who were naturally bad at suppressing retrieval sometimes misreported intrusions of the associate as non-compliance (although this contribution was possibly limited by our repeated emphasis that checking needed to be intentional). Alternatively, the correlation between participants' compliance and the SIF effect might be driven by a third variable that caused both the lack of compliance and poor SIF. With further study, perhaps using manipulations or objective measures to supplement after-the-fact, self-reported compliance scores, the directionality and causality of this observed linkage could be determined.
Researchers intending to examine the aftereffects of retrieval suppression using the TNT paradigm may gain substantial added traction and clarity by focusing on participants who are naturally motivated to comply with the instructions; over and above that, though, efforts to foster a higher level of intrinsic motivation in participants to suppress the targets stand to increase the efficiency of this research and potentially translate to real-world applications. Variants of the paradigm incorporating negatively valenced or personally relevant No-Think items may represent one means to this end. Other approaches may involve increasing the stakes of a distraction popping to mind, perhaps by tying online success measures to reward (or failures to low-grade but salient forms of punishment).

Conclusion
In conclusion, we reported the results from two large, independent samples of healthy participants, demonstrating that access to encoded memories was reliably impaired through retrieval suppression prompted by the TNT paradigm. Importantly, the results further revealed that participants' self-reported compliance with the suppression instructions, but not test expectancy, predicted SIF. As such, the current study provides clear evidence consistent with task compliance being a likely source of variability in SIF, as measured via the TNT paradigm. To better isolate the forgetting effect of interest, future TNT studies should ensure participants' compliance with the suppression instructions through the careful administration of task instructions and regular use of diagnostic questionnaires. Moreover, strategies designed to improve participants' intrinsic motivation to suppress the memory could be considered. Directions to suppress, on their own, may not always be sufficient to induce forgetting, especially when participants may be motivated to retain the suppressed content (as when they expect a later test). Given the proper commitment to push unwanted memories out of mind, suppression reliably yields suppression-induced forgetting.