Introduction

Mindfulness-based interventions (MBIs) have demonstrated efficacy in improving a range of clinical outcomes, such as depression and anxiety1. In a rigorous randomized controlled trial, mindfulness-based stress reduction (MBSR) was even found to be non-inferior to antidepressant medication2. However, MBI delivery and impact remain limited by various factors, two important ones being barriers to access and difficulties with sustained engagement. That is, for many individuals, MBIs remain inaccessible for the same reasons that mental health treatment remains inaccessible, including cost, stigma, a shortage of clinicians, and various logistical barriers (e.g., lack of transportation, lack of childcare)3,4. In addition, MBIs necessitate practice outside of session, which contributes to outcomes5; however, many struggle to sustain a consistent mindfulness practice on their own outside of in-person sessions.

Technology can bridge the gap in both of these situations. Mindfulness apps can provide an alternative when in-person MBIs are inaccessible, and integrating mindfulness apps into in-person treatment can facilitate practice and increase intervention impact6,7. Yet most commercially available mindfulness apps have not been scientifically evaluated8, and most mental health apps struggle to keep users engaged9. Relatedly, uptake of mindfulness apps in treatment is low, despite interest from clinicians10 and their patients11,12. One commonly cited barrier is a lack of knowledge about which apps are credible and effective13. To address these barriers and stimulate more research into building the mindfulness app evidence base, we conducted a systematic review to assess these apps’ effectiveness in shifting psychological processes of change related to mindfulness.

Recent reviews suggest that mindfulness app effects on clinical outcomes are often inconsistent. For example, one review found generally small app effects on depression and contradictory results for anxiety14. However, the common approach of evaluating app effects on such distal psychological outcomes as psychiatric disorders is problematic because app intervention periods tend to be too brief for these types of outcomes to demonstrate significant and consistent change. A recent meta-analysis of 23 mindfulness app evaluations found that only nine studies used intervention periods that adhered to the recommended eight weeks of such MBIs as MBSR and mindfulness-based cognitive therapy (MBCT)15. Therefore, a more suitable approach to reviewing mindfulness app efficacy may be to focus on the more proximal processes of change, or mechanisms, that have been empirically demonstrated to explain the effects of mindfulness practice on more distal psychological outcomes. Temporally, mechanisms shift first16; thus, focusing on these intermediary outcomes may provide a clearer picture of the efficacy of mindfulness apps.

Adopting a mechanisms-as-outcomes approach has three additional benefits. First, the knowledge gained from such an approach can lead to more targeted apps, which may enhance their efficacy. Second, current evidence suggests that mHealth app engagement in the general public falls to near zero after two weeks9. Given this reality, it is key to understand whether the brief periods in which apps tend to be evaluated have any impact on mechanistic targets. If they do not, it will be important to focus efforts on sustaining engagement for longer in the hopes of seeing a substantial impact on these important targets. Third, this approach provides valuable insights for clinicians specializing in evidence-based treatments as many of the mechanisms of mindfulness practice (e.g., emotion regulation) are also the transdiagnostic mechanisms targeted in such therapies17,18. Therefore, knowledge gained from this approach can aid clinicians in evaluating such apps as potential complements to ongoing treatment goals.

To date, no mindfulness app review of which we are aware has focused on the mechanisms of mindfulness training as outcomes. Thus, a systematic review is warranted to investigate the evidence of mindfulness app effects on the mechanistic processes through which mindfulness training has been demonstrated to influence transdiagnostic symptom change19.

Methods

This systematic review was conducted according to PRISMA guidelines20 and registered on the International Platform of Registered Systematic Review and Meta-Analysis Protocols (#202350017). To identify mechanisms of mindfulness practice, we first searched for papers that proposed likely mechanisms based on a thorough rationale. We searched for these papers in PubMed (using the keyword “mindful*” in the Title field and “mechanism” or “mediat*” in the Text field; an illustrative rendering of this query appears below). This method yielded four theory papers21,22,23,24, from which we extracted the proposed mechanisms. For each proposed mechanism, we then searched the literature for empirical support (obtained through mediation analysis). Our list of theoretically and empirically supported mechanisms of mindfulness practice appears in Table 1. (For an overview of corresponding theories, see eTable 1).
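For illustration, a query of this form can be expressed in PubMed’s field-tag syntax roughly as follows (an approximate reconstruction, not the registered search string; the full strategy for the main study search appears in eTable 2):

    mindful*[Title] AND (mechanism[tw] OR mediat*[tw])

Here, [Title] restricts a term to article titles, [tw] (Text Word) searches the indexed text of the record, and the asterisk truncates a term to capture variants (e.g., “mediation,” “mediator”).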

Table 1 List of outcomes of interest.

To be included in this review, a study had to (a) use a randomized controlled trial design, (b) evaluate a mindfulness-based mobile app, (c) assess change in one or more of our identified mechanisms using a validated, reliable measure, (d) focus on adults (≥18 years), (e) be peer-reviewed, and (f) be written in English. A mindfulness-based app was defined as any app designed for the sole purpose of facilitating mindfulness practice. We excluded studies of Web-only or text-based interventions, as we were most interested in apps for their accessibility and scalability. To avoid sample biases, we also excluded studies of non-smartphone technology (e.g., VR, wearables, tablet apps), which is not yet widely adopted. We also excluded studies on adolescents because many mindfulness apps limit use to adults in their terms and conditions, and because some recent evidence suggests that mindfulness practice may affect adolescents differently than it does adults25. Finally, regarding validated measures, we made an exception for ecological momentary assessment (EMA) studies, which tend to use few items to reduce participant burden.

An electronic literature search was performed by the first author on October 26, 2022, in PubMed, APA PsycINFO, and Web of Science. The search was updated on August 7, 2023. (For the search strategy, see eTable 2). Identified studies were divided among four pairs of reviewers (NM & ZM, NM & TG, NM & ER, NM & SL). Reviewers independently assessed studies based on title and abstract and gave inclusion/exclusion recommendations, which were subsequently compared; any disagreements were resolved through discussion in each pair, consulting JT if consensus could not be reached. The same process was followed for full-text review, data extraction, and quality assessment (QA). The Quality Assessment Tool for Quantitative Studies, which has evidence of validity and reliability26, guided the quality assessment process. The tool outlines assessment criteria for eight domains of bias. Overall QA ratings and domain-specific ratings for each study appear in eTable 3 and eTable 4, respectively.

The range of clinical and methodological characteristics in the included studies precluded a meta-analysis, so we employed a narrative synthesis of the data. We first grouped studies by thematic similarity. Within each group, we assessed studies by findings, searching for similarities and differences. When findings were contradictory within a group of studies, we examined potential contributors (e.g., differences across studies in sample and study characteristics, such as control group strength, type of app evaluated, and measurement instruments). The results of this process are described in the subsequent sections.

Results

A PRISMA flow diagram summarizing the results of our study selection process appears in eFigure 1. In total, data were collected from 5963 adults across 28 studies that varied widely in terms of location. The mean age across the 23 studies that reported it was ~33 years (SD = 8.98). Only 17 studies described the racial/ethnic composition of the sample; samples were predominantly White, and none were nationally representative. Approximately 79% of participants identified as female (across the 24 studies that reported on female gender) and 19% as male (across the 17 studies that reported on male gender). Only one study reported on sexual orientation. See Table 2 for detailed sample characteristics.

Table 2 Sample characteristics.

Study characteristics

Studies assessed Headspace (n = 12), VGZ Mindfulness Coach (n = 3), Unwinding Anxiety (n = 2), Healthy Minds Program (n = 2), Calm (n = 1), Stop, Breathe & Think (n = 1), Craving to Quit (n = 1), MediTrain (n = 1), Balloon App (n = 1), REM Volver a Casa (n = 1), Spirits Healing (n = 1), Wildflowers (n = 1), and Mindfulness (n = 1). These apps are available on both Apple and Android phones, except two: one offered on iPhones only (Mindfulness app19) and one that was commercially available at the time of investigation but now appears to be defunct (Wildflowers app27). (For more details on these apps, see eTable 7).

Most studies (n = 20) prescribed a specific dose, or amount, of app-delivered mindfulness practice: 10 minutes a day (n = 9), several exercises a day (n = 5), daily (n = 3) or weekly (n = 1) practice, or 10–20 minutes daily with gradual increases in use (n = 2). (For more details on app features designed to facilitate mindfulness practice, see eTable 7).

All 28 studies had at least one control group. Active control groups tended to be digital, with most involving non-mindfulness apps (n = 10), one offering a WeChat-based health consultation, one a multimedia stress-related psychoeducation website, and one comparing against in-person MBSR. Non-mindfulness apps used to control for cognitive expectancies and attention included emotion self-monitoring apps (n = 3), cognitive training apps such as the 2048 app and the Peak app (n = 2), apps delivering other psychological interventions such as behavioral activation and progressive muscle relaxation (n = 2), a list-making app (n = 1), a music app (n = 1), and directions to split time equally among three apps (i.e., Duolingo, a Tai Chi app, and logic games) identified in a prior study as matched in cognitive outcome expectancy (n = 1). Passive control group participants were either waitlisted (n = 15), offered treatment as usual (n = 2), or provided with no intervention (n = 1). See Table 3.

Table 3 Study characteristics.

The average intervention phase lasted ~5.46 weeks (SD = 2.23). In all studies, participants were asked to train with the mindfulness app on their own (rather than in a controlled lab environment). Outcomes were measured with pre- and post-intervention self-report questionnaires in all studies but three. These three studies used objective behavioral tasks to measure outcomes, with one administering a gamified app remotely28 and two administering cognitive tasks in a lab environment27,29. Only 10 studies included follow-up assessments (i.e., assessments taking place at least one month after the end of the intervention period) to examine whether changes in the outcomes of interest to this review were sustained in the long term. (See Table 3).

Reported app engagement metrics varied widely, including average minutes of app use (total, or per day or week), average days practiced, and average number of app sessions/exercises completed (total or per day). This variability made it difficult to determine patterns of engagement across studies. To identify patterns, we grouped studies with similar metrics by intervention length and computed ratios based on the two metrics most often reported (a sketch of this computation appears below). Results indicated that engagement was generally low (see eTable 5).
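To make the ratio logic concrete, the following minimal sketch (in Python) illustrates how engagement can be normalized by intervention length. The metric pairing and all study values are invented for illustration; the actual groupings and ratios appear in eTable 5.

    # Illustrative engagement-ratio computation (hypothetical values only).
    studies = [
        # (study_id, intervention_weeks, mean_days_practiced, mean_total_minutes)
        ("Study A", 4, 11, 120),
        ("Study B", 8, 20, 310),
    ]

    for study_id, weeks, days_practiced, total_minutes in studies:
        days_available = weeks * 7
        practice_ratio = days_practiced / days_available  # share of days with any practice
        minutes_per_week = total_minutes / weeks          # intensity of app use
        print(f"{study_id}: practiced on {practice_ratio:.0%} of days, "
              f"~{minutes_per_week:.0f} min/week")

Ratios of this kind allow studies with different intervention lengths and reporting conventions to be compared on a common scale.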

Methodological quality

Overall, study quality was rated as moderate to weak, with all studies having some concerns (see eTable 3). Most studies minimized measurement, allocation, and detection bias, as they assessed outcomes with valid and reliable measures or tasks, used appropriate allocation methods, and ensured research staff were blinded to condition. Bias tended to arise from selection, attrition, and insufficient attention to minimizing potential confounders. Most studies used self-referred convenience samples from one setting, and attrition rates ranged from moderate (i.e., 21%–40%) to high (i.e., >40%), with an average of 23% (SD = 13%) across studies. Most studies did not adjust for important confounders (see eTable 4 note). In addition, 12 studies were underpowered. Implementation bias was difficult to detect, as most studies did not report the percentage of participants who received the allocated intervention as intended (i.e., the recommended dose of app use).

Outcomes and findings

Across the 28 studies, 67 outcome comparisons were made between intervention and control groups. Of these 67 comparisons, 35 (52%) revealed a between-group difference favoring the intervention group. Most of these 35 effects emerged when the mindfulness app was evaluated against a passive (n = 28; 65% of passive-control comparisons) rather than an active (n = 7; 30% of active-control comparisons) control group. (Note: Passive, or inactive, control groups involved either waitlisting participants or offering them treatment as usual or no intervention. Active control groups offered participants a comparable task to engage in, such as a non-mindfulness app.) Effect sizes tended to be moderate to large across domains, and gains from using mindfulness apps were generally sustained at follow-up. Results by outcome domain appear in Table 4 and Fig. 1.

Table 4 Study findings by outcome category.
Fig. 1: Summary of results.

Dark green represents a between-group effect favoring the mindfulness app group; orange represents a between-group effect favoring the control group. Light green denotes studies that found no between-group effect (i.e., both groups improved or within-group effect favoring the mindfulness app group present); gray denotes studies that found no between-group effect (i.e., neither group improved or within-group effect favoring the control group). Light blue represents no between-group effect but unclear whether both or neither group improved. Rep. Neg. Thkg = Repetitive Negative Thinking.

Awareness. The most frequently examined outcome was awareness, assessed in 15 comparisons and measured with the Acting With Awareness subscale of the Five Facet Mindfulness Questionnaire (FFMQ30)31,32,33,34,35,36 or of its short-form version (FFMQ-SF37)38,39,40, a one-item measure based on the FFMQ Acting With Awareness subscale in an experience sampling study41, the Awareness subscale of the Philadelphia Mindfulness Scale (PHLMS42)43, the Multidimensional Assessment of Interoceptive Awareness (MAIA44)45, or the Interoceptive Respiration Task27. Findings were mixed: seven comparisons found an effect favoring the intervention group (small to large effect sizes), five found that both groups improved, and three that neither improved. Studies that found an effect favoring the intervention (versus those that did not) used passive control groups and tended to have samples with a greater female composition (see eTable 6). The four studies that used active control groups found that either both groups improved39,41 or neither did27,36.

Nonreactivity was assessed in 12 comparisons and measured with the nonreactivity subscale of the Five Facet Mindfulness Questionnaire (FFMQ31) in all but two studies that instead used the nonreactivity subscale from its 24-item short-form version (FFMQ-SF37)38,40. Findings were mixed, with six comparisons yielding an effect favoring the mindfulness app (medium to large effect sizes)34,35,38,39,45,46, three showing that both groups improved33,39,40, two that neither did36,40, and one yielding an effect favoring the control group30. All six comparisons that yielded an effect favoring the mindfulness app were made with passive control groups and tended to have samples with a greater female composition. Two studies that used active control groups found that either both groups improved39 or neither did36. The study finding an effect favoring the control group had a very small sample size and was underpowered30. No consistent associations between intervention length and outcomes were apparent across studies.

Non-judgment was assessed in 10 comparisons, using the non-judging of inner experience subscale from either the Five Facet Mindfulness Questionnaire (FFMQ31) or its short-form version (FFMQ-SF37). Findings were mixed, with four finding an effect favoring the mindfulness app30,34,35,39, three that both groups improved36,39,40, and three that neither improved33,40,47. Only two studies used active control groups, both finding that both groups improved36,39.

Positive affect was examined in five studies and measured with the Positive and Negative Affect Scale48,49 or one-item measures in EMA studies36,50,51. Findings were mixed, with two finding an effect favoring the intervention group48,49, two that both groups improved36,51, and one that neither group improved52. All five studies used an active control group, although in two, control groups were non-equivalent48,52. Two of the three that found no between-group differences were underpowered51,52, and in one, the intervention app dose varied across participants, with some receiving it for 40 days and some for 60 days51. The two studies that found a between-group difference had samples with a greater female composition.

Repetitive negative thinking. Ten comparisons assessed repetitive negative thinking styles, including worry (n = 7), perseverative thinking (n = 2), and rumination (n = 1). Worry was assessed with the Penn State Worry Questionnaire. Three studies found an effect favoring the intervention group, with small to large effect sizes45,46,53, and one of these had an active control group53. Two studies that found that neither group improved were underpowered52,54. Studies that found a between-group difference (versus none) had samples with a greater female composition.

Two studies examined perseverative thinking32,55, assessing it with the Perseverative Thinking Questionnaire (PTQ56), a measure of both worry and rumination, and using a waitlist control group. Both studies found an effect favoring the mindfulness app. Only one study examined rumination directly53, measuring it with the brooding subscale of the Ruminative Response Scale (RRS57); no significant between-group differences were found.

Attention regulation was evaluated in only three studies (that yielded four group comparisons) and measured with behavioral tasks, including the Centre for Research on Safe Driving-Attention Network Test (CRSD-ANT58)27, which is a validated briefer version of the Attention Network Test (ANT59); the validated sustained attention task Test of Variables of Attention (TOVA60)29; and a gamified sustained attention task (“Go Sushi Go”)28 based on the validated Sustained Attention to Response Task (SART61). All four yielded an effect favoring the intervention group, with effect sizes ranging from small to large. All studies used an active control group.

Decentering/defusion was examined in three studies. Two32,55 used the Drexel Defusion Scale62 and one36 the decentering subscale of the Toronto Mindfulness Scale63. All three found a between-group difference favoring the intervention group; one had an active control group36.

Acceptance/psychological flexibility was examined in three studies and measured with the acceptance subscale of the Philadelphia Mindfulness Scale (PHLMS64)43, or with the English65 or Dutch52 version of the Acceptance and Action Questionnaire—II (AAQ-II66). No between-group differences were found; one study that used an active control group (a behavioral activation app) found that both groups improved65. The two other studies found that neither group improved43,52, although one was underpowered52.

Finally, self-regulation, reappraisal, suppression, values, and extinction were each examined in only one study. One study examined the first three against a waitlist control group67, using the Self-Regulation Scale68 and the German version of the Emotion Regulation Questionnaire69; it found a between-group effect favoring the app group for self-regulation and reappraisal, but not suppression. One study assessed behavioral enactment of values30 with the Valuing Questionnaire70 and used a waitlist control group; results favored the intervention over the control group. The study that examined extinction71 used a two-day lab-based aversive Pavlovian conditioning and extinction procedure and a waitlist control group. Results showed that after using the mindfulness app for 4 weeks, the intervention (versus waitlist control) group had greater retention of extinction learning, as demonstrated by less spontaneous recovery of conditioned threat responses one day after extinction training.

Mediation analysis

Only two studies conducted mediation analysis with a psychological disorder as an outcome. One study found that worry partially mediated the relationship between mindfulness practice and anxiety45 and the other that worry fully mediated the association between mindfulness training and worry-related sleep disturbance46.
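For readers less familiar with mediation analysis, the sketch below (in Python) illustrates the product-of-coefficients logic behind such claims. It uses simulated data and hypothetical variable names, not the models fitted in the two cited studies.

    # Product-of-coefficients mediation on simulated data (illustration only).
    # X = amount of mindfulness practice, M = worry (mediator), Y = anxiety.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    x = rng.normal(size=n)                      # practice amount
    m = -0.5 * x + rng.normal(size=n)           # path a: practice reduces worry
    y = 0.6 * m - 0.2 * x + rng.normal(size=n)  # path b, plus a direct effect

    def slopes(predictors, outcome):
        # Ordinary least squares with an intercept; returns predictor coefficients.
        design = np.column_stack([np.ones(len(outcome))] + predictors)
        coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
        return coefs[1:]

    (c_total,) = slopes([x], y)           # total effect of X on Y
    (a_path,) = slopes([x], m)            # X -> M
    b_path, c_direct = slopes([m, x], y)  # M -> Y and X -> Y, estimated jointly
    indirect = a_path * b_path

    print(f"total={c_total:.2f}, direct={c_direct:.2f}, indirect={indirect:.2f}")
    # Partial mediation: the indirect effect is nonzero but a direct effect remains.
    # Full mediation: the direct effect shrinks toward zero once M is modeled.

In practice, the cited studies would have estimated such paths with their own covariates and tested the indirect effect formally (e.g., with bootstrapped confidence intervals).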

Heterogeneity & certainty of evidence

The range of populations in which apps were evaluated and inconsistent app engagement likely contributed to heterogeneity in findings. Methodological quality was also a likely contributor to inconsistent findings, as quality was moderate to low across studies. In the awareness domain, for example, of the studies that found no between-group differences, one was underpowered27, one used a single-item measure that did not correlate highly with the full measure41, another had a 45% dropout rate39, and in another, data came from only 4% of eligible patients who enrolled43. Such methodological weaknesses, found across domains, likely increased the heterogeneity of findings and lowered confidence that the lack of effects was due to a lack of app efficacy.

Methodological weaknesses also lower the certainty of evidence in domains with more consistent findings. In most domains, when effects favoring the mindfulness apps were found, most or all came from studies with passive, rather than active, control groups. In only two domains did all studies use active control groups: positive affect and attention regulation. However, in the positive affect domain, the studies finding an effect favoring the mindfulness app group had relatively high attrition rates (38% and 35%), lowering confidence in the findings. (For context, the average attrition rate in a recent meta-analysis of mHealth studies was 24%72; attrition rates of up to 20% are generally considered ideal, and those nearing 40% are deemed high, as they risk introducing bias26).

The attention regulation domain comprised the strongest set of studies. All studies in this domain employed not just an active digital control group but also objective task measures to assess outcomes, increasing the certainty of evidence, although more studies are needed.

Discussion

This systematic review identified 28 RCTs that evaluated a mindfulness app and examined as an outcome at least one theoretically and empirically supported mechanism of mindfulness practice. By focusing on mechanisms, this review aimed to provide a more nuanced understanding of the psychological impact of mindfulness apps. Overall, more research is needed in most outcome domains assessed in this review. Effects tended to favor the mindfulness app (versus control) group in the domains of attention regulation, repetitive negative thinking, and decentering/defusion, and findings were mixed in the domains of awareness, nonreactivity, non-judgment, positive affect, and acceptance/psychological flexibility. Various methodological issues, population characteristics, and app engagement problems likely contributed to the heterogeneity of findings.

The attention regulation domain comprised the strongest set of studies. Results favoring the mindfulness app group in this domain are promising and consistent with other findings suggesting that in-person MBIs have positive effects on executive function73,74. They are also consistent with study findings suggesting that those with (versus without) meditation experience exhibit greater cognitive flexibility75.

A trend apparent across most sets of studies was that studies with more female participants tended to find effects favoring the mindfulness app group more consistently. This trend is in line with other recent findings suggesting that females (versus males) may benefit more from mindfulness-based interventions14,76,77,78. Some have suggested that this difference may arise because mindfulness targets rumination, a problematic emotion regulation strategy used more often by females than males; in contrast, men tend to rely more on distraction, and the focus on present-moment experience that mindfulness training requires may initially increase negative affect for men76. More research into these potential gender differences is therefore warranted. If this finding is replicated, gender-specific modifications in app delivery for males (e.g., an emphasis on non-judgmental observation of experience) may be beneficial.

Another likely moderator of mixed findings was app engagement. Engagement metrics reported across studies varied widely, making it difficult to assess overall engagement across the majority of studies. From the available metrics, however, engagement appeared to be generally low. The lack of consensus on engagement metrics is a recognized challenge in the mHealth space79,80, as is the difficulty of sustaining engagement over time81. Notably, some studies that found no between-group differences did find a mindfulness app effect at higher engagement rates38,40. Such findings are in line with evidence of a dose-response relationship between home practice and outcomes in in-person MBIs. In-person MBIs also show problems with adherence to at-home mindfulness practice, with participants completing, on average, only about 64% of the assigned home practice5; even so, this amounts to a much higher rate of daily practice than seen in the mindfulness app studies in this review, underscoring the importance of incorporating strategies to increase app engagement so that the efficacy of these apps can be better evaluated.

It is also worth noting two other potential contributors to heterogeneity that relate to broader issues in the field. There is a lack of consensus on the definition of mindfulness, and the resulting diverse conceptualizations of mindfulness82 may lead different teams to emphasize different aspects of mindfulness practice during intervention implementation; such differences may have contributed to heterogeneity in outcomes. In addition, despite more mechanism-driven research into in-person MBIs over the past decade, these mechanisms are not yet well understood83, with some leading mindfulness mechanism theories at times yielding mixed support84. A better understanding of the transdiagnostic factors through which in-person MBIs effect change in mental health outcomes will lead not just to more refined mHealth interventions but also to stronger evidence for the theories informing these interventions.

Limitations of body of evidence and future directions

To advance this literature, we propose several future directions and research recommendations. First, future studies replicating these findings should employ strategies that foster app engagement. Sustained app engagement is key to obtaining accurate estimates of apps’ impact on various outcomes. In addition, although the use of incentives is acceptable in (and in line with the goals of) earlier stages of research, it is not a scalable strategy for real-world dissemination. Selecting theory-based strategies (e.g., goal-setting features, support) and building them into an app’s design, even in earlier stages of research, paves the way toward creating efficacious apps that have a greater likelihood of successful dissemination.

Relatedly, more fine-grained details on app engagement would likely aid in resolving some of the inconsistent findings. Even mindfulness apps have a variety of features, some of which do not necessarily strengthen practice (e.g., the soothing sounds or music that several apps offered, as seen in eTable 7). Better understanding of how participants were using apps could help clarify why app use, in some cases, was less impactful. In addition, some people stop engaging with apps as they achieve their mental health goals, a phenomenon referred to in the literature as “e-attainment.”85 Thus, in some cases, discontinuation could be associated with positive outcomes, as some participants may have stopped using the app because mindfulness practice had become part of their routines. Assessing reasons for app discontinuation can therefore also help clarify inconsistent outcomes.

Second, future studies should better control for digital placebo effects. Many of the studies that found app effects used passive control groups, an approach that provides encouraging evidence but does not rule out the possibility that improvements were due simply to using an app rather than to the mindfulness-specific aspects of the app. At the same time, active control groups should be chosen with careful consideration. For example, one study used a progressive muscle relaxation app as an active control and found no between-group differences in positive affect51. This finding may be expected, however, as relaxation has also been found to increase positive affect86.

Third, future studies should carefully consider the measurement of mindfulness-related constructs. There is growing concern that the conceptualization of mindfulness—and thus its measurement—is culturally biased, with some evidence suggesting that such widely used measures as the FFMQ may not actually perform well in non-Western populations87. Without this awareness, researchers risk continuing to build a body of evidence based on mindfulness definitions that are not necessarily universally accessible. Fortunately, alternative, more culturally relevant measures are starting to be developed88. In addition, although objective outcome measures are often not widely available, when they are, they should be used in future studies. Some examples of objective outcome measures include app-based cognitive games that are gamified versions of validated neuropsychological paradigms28, implicit tasks (e.g., the IPANAT for positive affect89), wearables to measure physiological reactivity (which, when combined with self-reported arousal, can be a measure of experiential avoidance90), or rumination induction tasks91 to assess whether participants who have been practicing mindfulness more are better able to exit such repetitive negative thinking states. Confidence in findings from self-report measures can be strengthened by the addition of objective measures.

With respect to study population, future studies should evaluate apps in nationally representative samples to increase the generalizability of findings. Studies should also continue to evaluate apps in specific populations, testing population-specific, theory-driven hypotheses about the mechanisms most pertinent to each population. Doing so can help inform ways to tailor app delivery to each population to better target mechanisms. Relatedly, greater empirical focus is needed on evaluating mindfulness apps in minoritized populations, who continue to be underrepresented in mHealth research92, a trend also apparent in the studies included in this review. Some evidence suggests that being African American is associated with lower odds of accessing and continuing to use a leading commercially available mindfulness app93, as is lower educational attainment93. It is critical that future research focus on minoritized populations to avoid perpetuating disparities and introducing new ones in the form of digital inequities.

In addition, most studies did not report on implementation details, including how mindfulness was explained to participants. Yet how an intervention is introduced affects engagement and outcomes94,95, and calls have been made for mindfulness intervention studies to report the explicit instruction given to participants regarding mindfulness82. This is especially important given evidence that core aspects of mindfulness practice are often misunderstood by the general public96 and given the different conceptualizations of mindfulness82 that may lead to differences in intervention design and implementation. Better reporting of instruction details may elucidate some heterogeneity in findings. Researchers can also focus on aspects of delivery beyond instructions, such as tailored recommendations regarding timing and practice. For example, in samples of socioeconomically disadvantaged individuals facing multiple daily stressors, special attention could be paid to creating a tailored practice schedule. Such tailoring would help integrate mindfulness practice into participants’ daily routines and better relate the practice to their specific challenges (e.g., constant worry regarding financial strain). This strategy may increase app relevance to each population’s contextual factors and heighten the app’s impact on hypothesized mechanisms.

Finally, moderators should be conceptualized and measured. While heterogeneity is often viewed as a signal of low efficacy, it is, in fact, normal and expected97. Aside from main and mediating effects, it is also important to consider when and for whom app effects are strengthened or weakened. Population-specific moderator hypotheses can relate to the technology (e.g., app features), the individual (e.g., beliefs about technology), and their context (e.g., app integration into lifestyle). Special consideration should be paid to gender differences to increase our understanding of how gender influences mindfulness app outcomes. Individual differences have also received little empirical focus in the broader MBI literature98, a gap that needs to be addressed in both areas of research.

Guidance for clinicians: integrating apps into care

Although this review focuses on mindfulness apps’ clinical foundation, evidence of efficacy is just one of five factors clinicians need to consider when selecting apps to recommend to patients. The other four factors are described in the APA app evaluation model99, a framework for helping clinicians choose suitable apps: accessibility (e.g., app cost, offline features), privacy and safety (i.e., data protection), app usability, and data integration toward the therapeutic goal (e.g., whether app data can be easily shared with the provider). To ease the process of evaluating these factors, clinicians can use an app database such as mindapps.org, a continually updated database designed to make the APA framework easily actionable for public use. Such tools can empower clinicians to integrate mindfulness apps that may improve outcomes into care.

Limitations

Several limitations of this review are worth noting. First, we did not extend the search into gray literature, which may bias results toward published evidence. Second, despite efforts to be inclusive of mindfulness mechanisms, we did not include self-compassion, a mechanism that is also theoretically and empirically supported100. Future research should examine this important potential intermediary outcome of mindfulness app use. Third, our review did not cover SMS-based interventions, which are also promising digital mental health tools that can enhance the impact of MBIs101 and thus warrant future empirical attention. Finally, because our research question focused on discrete mechanistic targets that theories suggest would change after the onset of mindfulness practice, we excluded studies that reported only composite measures of mindfulness (e.g., FFMQ, MAAS). Because these scales measure several of our constructs of interest together, they were deemed out of scope for this review. Although this limitation was partially addressed by a recent meta-analysis of composite mindfulness measures as outcomes of mindfulness app interventions14, that review did not report whether included studies examined mindfulness as a mechanism. A future review on this topic may therefore prove fruitful.

Conclusion

Mindfulness-based mobile apps can not only enhance mental health treatment but also offer scalable solutions to barriers to in-person MBI access. The literature on the psychological impact of mindfulness apps is still nascent, but it suggests that mindfulness-based apps are promising, especially for regulating attention, reducing repetitive negative thinking, and promoting decentering/defusion. Continuing to elucidate mindfulness apps’ impact on the processes of change that account for transdiagnostic symptom reduction is crucial for optimizing app design, enhancing app efficacy, and realizing the potential of these apps as viable complements to routine care.