Introduction

Mental disorders (MDs) are highly prevalent worldwide1. Globally, every fifth person is affected, and roughly one-third of adults have experienced mental illness at least once2. MDs constitute a substantial burden for individuals and society. Meta-analytic evidence shows an elevated risk of mortality in people with MDs3,4 and low quality of life5. In addition, MDs appear to be correlated with several physical illnesses6 such as stroke, pain, cancer, diabetes mellitus, asthma, heart disease, hypertension, and insomnia7. According to the World Health Organization, disease burden as expressed in disability-adjusted life years (DALYs) associated with MDs is substantial and has remained constant over time and across countries8. In 2016, Vigo et al. argued that the “true” estimate of the global burden caused by MDs will double compared with earlier estimates and will account for 13% of total DALYs. Hence, the burden of MDs is comparable with those of cardiovascular and circulatory diseases9.

MDs are associated with substantial economic costs for society. Productivity losses due to absenteeism and presenteeism, earlier retirement, and increased healthcare utilization have a major influence on society. In 2010, the global costs associated with MDs were estimated at US$2.5 trillion10. Indirect costs, such as productivity losses or premature death, were twice as high as direct medical costs related to health service use. In the EU, MD-associated costs were estimated at €798 billion in 201011. Moreover, costs are expected to double by 203010 because of increasing demand and rising costs.

Despite the availability of effective psychological interventions12, the majority of individuals with MDs remain untreated13 or receive delayed treatment, often initiated several years after MD onset14. The reasons are multifaceted. Attitudinal barriers, such as low perceived need or a stigma-related desire to handle one’s problems on one’s own, seem to be more important than structural barriers, such as availability of treatment and expenses both for initiating and continuing treatment15. One promising approach to overcome these barriers to traditional psychological interventions is internet- and mobile-based interventions (IMIs). IMIs can address these barriers, as they are anonymous, effective, and accessible 24/716,17. Additionally, IMIs can be implemented as stand-alone self-help interventions, as blended care (a face-to-face therapy extended with psychoeducation delivered via the internet), or as part of a stepped care approach in which the amount of support is adjusted to the patient’s needs. IMIs were shown to be effective for treating common MDs across various settings and age groups18,19,20.

Although the initial costs of developing IMIs can be substantial, the low marginal costs of providing IMIs to additional users can result in lower overall expenditure because of an economies of scale effect16. However, intervention costs largely vary based on the following four aspects: development phase (new product vs. modified version), scaling-up effects (small vs. large number of users), overestimation of costs (small number of study participants), and efficiency (improving productivity vs. additional costs when newly implemented)21. In addition, IMIs are likely to reduce healthcare costs compared with traditional face-to-face treatment, as IMIs reduce costs stemming from therapist’s time and patient’s travel to health services22. Hence, IMIs are often touted to be cost-effective despite the weak evidence base for their cost-effectiveness.

Several systematic reviews have attempted to establish the cost-effectiveness of IMIs for MDs in comparison with various control groups. However, the presented evidence on whether IMIs for MDs provide good value for money is inconclusive because some reviews included only a few internet-based studies: n = 323, n = 424, n = 1225, n = 126, and n = 527. In addition, 6 of 8 reviews can be considered obsolete today, with the latest primary study stemming from 201622,23,24,25,27,28, whereas many more studies have since been published, e.g., 26 identified ongoing cost-effectiveness studies for major depression25. Moreover, previous reviews used broad definitions of IMIs, e.g., any internet or web enabled platform for diagnosis, screening, treatment, prevention, training, education, or facilitating self-management of MDs29. Finally, previous reviews have not always included full health economic evaluations, but have reported costs and effects without relating them to each other23,29, and if they did, they only focused on internet-based cognitive behavioral therapy (iCBT)22. Likewise, there exist only a few economic evaluations for common treatment options (different types of psychotherapy, pharmacological interventions, such as antidepressants) for depression30 and anxiety disorders24. Some evidence shows that psychotherapy might be cost-effective compared with pharmacological interventions.

Therefore, a comprehensive overview of the state-of-the-art evidence on IMIs across MDs and symptoms, including studies with good methodological quality and full economic evaluations, is needed to enable better comparisons and obtain reliable conclusions on guidance, cost perspective, and psychological interventions other than iCBT.

In view of the disease and economic burden of MDs, first, we evaluated whether IMIs for the prevention and treatment of common MDs represent good value for money. Second, we assessed whether these interventions have a good methodological quality. In this respect, our review provides additional evidence to decision makers31 to make informed decisions on the allocation of scarce resources to provide sustainable healthcare.

Results

Study selection

A total of 4044 articles were identified, of which 2951 duplicates and non-relevant studies were removed. Of the 277 full text articles, 36 were eligible for inclusion (Fig. 1), referring to 32 studies. One study was assessed by three articles, and two studies were assessed by two articles. These articles differed by perspectives taken32,33,34,35, time horizons used36,37, or type of analysis36,38 used for the evaluation.

Fig. 1: PRISMA flow diagram.

Using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses as a screening process, a total of 36 studies were included in the study.

Study characteristics

Table 1 lists relevant study characteristics. Of the 32 studies, 5 have 3 and 1 has 4 comparison groups, whereas 27 only compare 2 groups. In three studies, the same IMI was evaluated39,40,41,42. The included studies encompassed a total of 10,083 participants. The studies were published between 2010 and 2021 and originated from Australia (n = 2), Canada (n = 1), Germany (n = 7), Netherlands (n = 8), United Kingdom (n = 6), Spain (n = 1), and Sweden (n = 10). On average, studies were published in 2015, and most studies were published in 2014 (n = 7) and 2017 (n = 6). All studies targeted an adult population, except for four studies that were either directed at adolescents (aged 12–19 years, n = 2) or people aged >65 years (n = 2). Participants were recruited from primary care (n = 3481), workplace (n = 1260), general population (n = 4581), or a mixed setting (n = 1057, primary/secondary care and general population). Most of the participants were female (n = 7282; 72%) and aged 40 years (mean age 42, SD = 13). The majority of the studies targeted major depressive disorder (MDD) or depressive symptoms (n = 15), followed by anxiety disorders (n = 7), and obsessive-compulsive disorder (OCD, n = 4). Other studies have evaluated sleep disorders (n = 2), elevated stress levels (n = 2), posttraumatic stress disorder (PTSD, n = 1), and suicidal ideation (n = 1). Most studies evaluated guided (n = 21) or unguided (n = 9) interventions, and only two evaluated both guided and unguided IMIs. Most IMIs were based on iCBT (n = 35), problem-solving therapy (iPST; n = 3), mixed approaches combining different aspects such as problem-solving and emotion regulation (iMA; n = 2), positive psychology (iPPI; n = 1), and preventive cognitive therapy (iPCT, n = 1). On average, an intervention consisted of 7.9 (2–15) sessions and was most often compared with a wait-listed control group (WLC; n = 12). Further details of the studies are presented in Table 1.

Table 1 Study characteristics.

Most studies (n = 16) conducted both a cost-effectiveness analysis (CEA) and a cost-utility analysis (CUA). Other studies focused solely on either CUAs (n = 10) or CEAs (n = 4). Three studies conducted a cost-benefit analysis (CBA) in addition to CEA and CUA. The included studies differed in perspectives taken: societal (n = 15), healthcare (n = 6), and both perspectives (n = 9). In the remaining studies, the employer’s perspective (n = 3) was applied alone or in combination with other perspectives. One study conducted a cost-minimization analysis (CMA). Three studies did not report the study perspective. The time horizon of the follow-ups varied across studies, ranging from ≤3 months (n = 12), >3 to ≤6 months (n = 8), and >6 to ≤12 months (n = 9) to 2 years (n = 4).

Quality assessment

Table 2 contains the Consensus on Health Economic Criteria (CHEC) quality scores. The quality of studies was mainly good (average total score = 85%, range 56–100%). Three studies met all CHEC criteria34,43,44, whereas three studies showed average quality41,45,46. Common reasons for the lower quality were the lack of reporting on the generalizability of the results (n = 29), an insufficient time horizon (n = 16), or lack of sensitivity analyses (n = 8). All studies met the items on appropriateness of the economic study designs and outcome measurement.

Table 2 CHEC quality assessment.

Regarding risk of bias (RoB), most studies showed good (n = 22), and only a few studies showed fair (n = 10) or poor (n = 4) quality (Fig. 2 and Table 3). Detection, attrition, and selection bias were low. By contrast, reporting bias (n = 9) and other biases were high (n = 14). Selective reporting may arise when outcomes for a CEA are not sufficiently described in study protocols and outcome papers. Other biases may arise when there is insufficient information or when there are limitations because of the high complexity of assessing outcomes, e.g., the annualization of short-term costs. The agreement for CHEC and RoB between the two raters, with Cohen’s kappa (κ) = 0.90–0.91, can be considered almost perfect47.
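
As an illustration of the inter-rater agreement statistic reported above, the following minimal sketch computes Cohen’s kappa for two raters. The function name and the example ratings are hypothetical and are not taken from the review; they only show how observed agreement is corrected for chance agreement.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who rated the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed share of
    agreement and p_e is the agreement expected by chance from each
    rater's marginal category frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical quality ratings for six studies (not data from the review).
    rater_1 = ["good", "good", "fair", "poor", "good", "fair"]
    rater_2 = ["good", "good", "fair", "poor", "fair", "fair"]
    print(round(cohens_kappa(rater_1, rater_2), 3))
```

In this toy example the raters disagree on one of six studies, so kappa is well below the 0.90–0.91 reported for the review.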

Fig. 2: Risk of bias assessment.

The graph displays the authors’ judgments on the risk of bias of each included study, presented as percentage totals according to the Cochrane Collaboration’s tool.

Table 3 Risk of bias assessment.

Findings of included studies

Supplementary Table 1 displays the following characteristics and outcomes for each of the included health economic evaluations: perspective taken, cost categories used, type of health outcome and measurements, mean incremental cost-effectiveness ratio (ICER) or cost-utility ratio (ICUR) and its position in the quadrant of the cost-effectiveness plane, and probabilities of the intervention being cost-effective given various willingness to pay (WTP) thresholds. This table lists all costs in national currency units and for the index year as published by the primary studies. In the next section, probabilities are only listed if reported in the studies: CUA, WTP threshold of £30,000 per QALY gained; CEA, WTP of £0 per additional unit of effect, e.g., per treatment responder.
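
The ICER and its position on the cost-effectiveness plane can be sketched as follows. This is a minimal illustration under assumed inputs: the function names and the example figures are hypothetical, and the compass labels (NE, SE, NW, SW) follow the usual convention of plotting incremental effects on the horizontal axis and incremental costs on the vertical axis.

```python
def icer(delta_cost, delta_effect):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the incremental effect is zero")
    return delta_cost / delta_effect


def ce_plane_quadrant(delta_cost, delta_effect):
    """Quadrant of the cost-effectiveness plane for an intervention vs. control.

    NE: more costly, more effective (decision depends on the WTP threshold)
    SE: less costly, more effective (intervention dominates)
    NW: more costly, less effective (intervention is dominated)
    SW: less costly, less effective (decision depends on the WTP threshold)
    """
    if delta_cost == 0 or delta_effect == 0:
        return "on axis"
    north_south = "N" if delta_cost > 0 else "S"
    east_west = "E" if delta_effect > 0 else "W"
    return north_south + east_west


if __name__ == "__main__":
    # Hypothetical example: the intervention costs 300 more and yields 0.02 more QALYs.
    d_cost, d_qaly = 300.0, 0.02
    ratio = icer(d_cost, d_qaly)
    print(ce_plane_quadrant(d_cost, d_qaly), round(ratio))
    print("below a threshold of 30,000 per QALY:", ratio < 30_000)
```

A point in the SE quadrant needs no threshold comparison, which is why dominant interventions are reported separately from those that are merely below the WTP threshold.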

MDD

Treatment of MDD, minor/subthreshold depression, and depressive symptoms

Fifteen studies evaluated IMIs for MDD (n = 8) and depressive symptoms (n = 5), whereas two studies focused on depression onset and relapse prevention. The control conditions consisted of alternative guidance formats: iPST, iPPI, iPCT, standard care, stepped care pathway, treatment as usual (TAU), WLC, and attention control (AC). Depressive symptom severity at baseline had no recognizable effect on cost-effectiveness.

One-third of the studies (n = 5) evaluated unguided IMIs based on CBT (n = 4) or positive psychology (n = 148). As for unguided IMIs compared with TAU (n = 3), results from the CUA conducted from the healthcare perspective after 1–2 years did not suggest an economic merit40,49 (at a WTP threshold of £30,000, the probability of cost-effectiveness varied: CUA = 4–38%). However, findings from the societal perspective suggested that one IMI50 had an acceptable likelihood of being cost-effective (at WTP = 0, CEA = 70%; at WTP = £30,000; CUA = 55%). Compared with WLC or AC (n = 2), unguided IMIs from the societal perspective provided only little and unclear evidence for cost-effectiveness (at the WTP = 0, CEA = 20%48; CUA was not reported41).

Six of the 15 studies evaluated guided IMIs based on iCBT (n = 4) or iPST (n = 2). Two guided IMIs were compared with TAU and showed opposing results after 6–12 months. Findings from the societal perspective showed a moderate-to-acceptable likelihood of being cost-effective (at WTP = 0, CEA = 4851–62%52), one above52 and one below51 the proposed threshold of £30,000. From the employer’s perspective, one IMI was the dominant treatment option (WTP = 0, CEA = 55%)52.

Four guided53,54,55,56 IMIs, compared with WLC, were considered cost-effective (<£30,000 per QALY gained, probabilities ranging from 5553 to 98%55) from the societal and healthcare perspective. Results of the cost-effectiveness analyses were unclear54 or showed a low likelihood of being cost-effective at a WTP of nil from a societal perspective (CEA = 30–38%53).

Two studies compared similarly effective guided and unguided IMIs after 12 months. In one study, from the societal perspective, both IMIs generated lower costs than usual care and were judged cost-effective57 (<£30,000 per QALY gained, probabilities were not reported). In the other study, from the NHS perspective, the guided IMI resulted in more QALYs gained at lower costs than the unguided IMI (considered cost-effective, at WTP = £30,000, CUA = 55%39).

Prevention of MDD onset and relapse prevention

The remaining studies evaluating guided IMIs (n = 2) focused on the prevention43 or relapse44 of MDD in comparison with usual care. Findings from cost-effectiveness analyses employing a societal perspective suggested a moderate likelihood of them being cost-effective, with probabilities ranging from 38% to 40% at a WTP of nil. CUA showed a moderate (CUA = 40%44) to acceptable (CUA = 60%43; ICUR < £30,000 per QALY gained) likelihood of them being cost-effective. From the healthcare perspective, one IMI43 showed a small likelihood of being cost-effective per depression-free year gained (WTP = 0, CEA = 17%) but was considered cost-effective when below the cost-utility threshold (at WTP = £30,000, CUA = 64%).

Anxiety disorders or symptoms

Eight studies evaluated guided (n = 5) and unguided (n = 3) IMIs for anxiety disorders based on CBT compared with TAU, AC, WLC, group-administered CBT (gCBT), or iMA. The included studies targeted panic disorder (n = 1), generalized anxiety disorder (GAD) (n = 1), health anxiety (n = 2), social anxiety (n = 2), any anxiety disorder (n = 1), and PTSD (n = 1).

In three studies comparing guided IMIs with AC or WLC in the short term (8–12 weeks), the IMIs were judged cost-effective from the societal and healthcare perspectives (<£30,000 per QALY gained, probabilities >90%46,58,59). Cost-effectiveness analyses showed that the IMIs dominated the control group by generating lower costs at higher effects from the societal perspective (at WTP = 0, CEA = 6458–95%46).

Two studies comparing guided IMIs with gCBT after 6 months to 4 years provided good evidence for their cost-effectiveness. The first IMI was cost-effective from the societal perspective in the short and long term (<£30,000 per QALY gained, CUA = 3437–79%36). Results of the cost-effectiveness analyses showed that the IMI produced lower costs at higher effects (WTP = 0, CEA = 81%36) in the short term and increased costs with a lower probability of being cost-effective in the long term (WTP = 0, CEA = 62%37). From a healthcare perspective, the same IMI was cost-effective based on a CMA (WTP = £30,000, CMA = 67%38). The second IMI was likewise cost-effective from the healthcare perspective, being the dominant treatment option (WTP = 0, CEA = 75%45).

By contrast, in two studies evaluating unguided IMIs, the IMIs were considered cost-effective based on the cost-utility analyses (yet no probabilities were reported), but the CEA did not support these findings. The first IMI60 was compared with unguided iMA from a societal perspective, which resulted in higher costs per responder, showing low probabilities of being cost-effective (at WTP = 0, CEA = 8%), but being below the £30,000 threshold per QALY gained. The second IMI generated lower costs per QALY gained than WLC from both healthcare and societal perspectives61. A third study compared an unguided IMI (self-help app) targeting posttraumatic stress62 with TAU from a healthcare perspective and showed a low probability of cost-effectiveness (≈27% at WTP = £30,000 per QALY gained).

OCD

Three studies evaluated guided IMIs for OCD based on CBT in comparison with either self-help book with guidance, WLC, AC, or a booster session. The evidence for cost-effectiveness was contradictory regarding QALYs and moderate regarding clinical outcomes because of heterogeneous control conditions.

From the societal and healthcare perspective, one IMI was cost-effective compared with AC being below the acceptable threshold per QALY gained (at WTP £30,000, CUA = 90–95%63). By contrast, the IMI was judged not cost-effective per additional remission in the short term (at WTP = 0, CEA = 0–15%) nor per relapse prevented after 2 years when a booster session was offered in a crossover design (at WTP = 0, CEA = 0–18%64).

Two studies compared IMIs with WLC after 3 months. From the societal and healthcare perspectives, one study reported neither probabilities of cost-effectiveness nor an ICUR65, and in the other, the IMI was neither cost-effective compared with WLC (ICUR > £30,000 per QALY gained66, CUA = 35–52%) nor more effective than guided self-help.

Other mental disorders

Most of the remaining five studies used CBT (guided, n = 4; unguided, n = 1), and only one intervention used iMA. The IMIs targeted insomnia, perceived stress or stress-related disorders, or suicidal ideation and showed a moderate to high probability of cost-effectiveness.

IMIs targeting insomnia were cost-effective per QALY gained but unconvincing regarding cost-effectiveness analyses. One IMI was cost-effective compared with WLC and below the threshold per QALY gained (at WTP = £30,00034, CUA = 99%) from the societal and healthcare perspectives. Cost-effectiveness analyses also showed a high probability of being cost-effective, dominating the WLC per additional treatment responder (CEA = 87%, employer’s perspective35) or symptom-free status (CEA = 94%, societal perspective34), but generating higher costs from the healthcare perspective, leading to a low probability of cost-effectiveness (CEA = 6%34).

Another IMI67 was compared with gCBT from a societal perspective. Both treatments showed similar effects, and the IMI led to a high probability of cost-savings while trading off health gains (at WTP = 0, CEA = 95%) but generating more QALYs (at WTP = £30,000, CUA = not reported).

IMIs targeting adjustment or exhaustion disorder, or perceived stress, were mostly cost-effective compared with WLC. Based on findings of the cost-utility analyses, two IMIs were below the threshold of £30,000, showing high probabilities of being cost-effective from the societal perspective (CUA = 7568–79%33). In addition, findings of the cost-effectiveness analyses showed that both IMIs dominated the WLC, yielding acceptable probabilities of cost-effectiveness at a WTP of nil from the employer’s (CEA = 67%69) and societal (CEA = 70%33) perspectives, but not from the healthcare perspective (CEA = 12%68) where higher costs were generated.

The only unguided IMI70 targeting suicidal ideation dominated the WLC, generating a high probability of being cost-effective at a WTP of nil from the societal perspective (CEA = 92%).

Workplace setting

Cost-benefit analyses evaluating costs relevant to the employer yielded a benefit-to-cost ratio (BCR) > 1 (1.6–3.1) and a net benefit greater than zero (181–417), which indicates that guided IMIs were cost-effective when compared with TAU and WLC for the treatment of insomnia35, elevated stress69, and depression52.
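
The two cost-benefit measures mentioned above can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up monetary figures; it assumes that benefits (e.g., avoided productivity losses) and intervention costs are expressed in the same currency and year.

```python
def benefit_cost_ratio(benefits, costs):
    """Monetary benefits returned per unit of money invested (BCR)."""
    if costs <= 0:
        raise ValueError("costs must be positive to compute a BCR")
    return benefits / costs


def net_benefit(benefits, costs):
    """Monetary benefits remaining after the costs have been recovered."""
    return benefits - costs


if __name__ == "__main__":
    # Hypothetical per-participant figures from an employer's perspective.
    intervention_costs = 200.0
    productivity_gains = 500.0
    print(benefit_cost_ratio(productivity_gains, intervention_costs))  # 2.5
    print(net_benefit(productivity_gains, intervention_costs))         # 300.0
```

A BCR above 1 and a net benefit above zero are equivalent conditions: both state that the monetary benefits exceed the costs.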

Guidance and comparators

The majority of studies evaluated guided IMIs (n = 24), which were mostly cost-effective, indicated by ICURs < £30,000/QALY gained, irrespective of the types of control conditions. However, unguided IMIs (n = 11) showed little evidence of cost-effectiveness.

Discussion

This review presents a comprehensive overview of trial-based economic evaluations providing evidence regarding the cost-effectiveness of IMIs for the prevention and treatment of MDs and symptoms. This review identified 32 studies applying societal (n = 24), healthcare (n = 15), and employer’s perspectives (n = 3) in 65 full economic evaluations (CBA, n = 3; CEA, n = 31; CMA, n = 1; CUA, n = 30).

In half of the CEAs (N = 14; MDD, n = 3; anxiety, n = 5; stress, n = 3; sleep n = 2; suicidal ideation, n = 1), the IMI was the dominant treatment option, which means that more health effects were generated at lower costs in comparison with control conditions. Of these, two did not report a WTP and five showed a high probability (≥80%) of being more cost-effective than control conditions at a WTP of nil. For all CEAs, the probability at a WTP of nil ranged from 0 to 95%. Regarding cost-utility, most interventions were cost-effective, being either dominant (n = 13) and/or below the WTP threshold of £30,000 per QALY gained (n = 26) compared with any control condition and often regardless of the perspectives taken. By applying the criterion that an IMI showed at least an 80% probability of cost-effectiveness at a WTP of £30,000 compared with a control condition (if reported), 11 IMIs were judged to be cost-effective. Cost-benefit analyses from the employer’s perspective (n = 3) yielded positive net benefits representing the money gained after costs were recovered. In addition, the overall quality of studies (CHEC) was good (n = 30); only a few were excellent (n = 3) or average (n = 3). Reasons for a low rating were no discussion of generalizability, a short time horizon, or a lack of sensitivity analyses. Regarding RoB, most studies showed good quality (n = 22), and only a few studies (n = 6) showed at least one item at high risk of bias.
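
The probability of cost-effectiveness at a given WTP, as used in the 80% criterion above, is commonly derived from bootstrap replicates of incremental costs and effects. The following minimal sketch assumes that approach; the primary studies may have used other methods. It counts the share of replicates with a positive net monetary benefit (WTP × incremental effect − incremental cost). The function names and the four replicates are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def prob_cost_effective(replicates, wtp):
    """Share of (delta_cost, delta_effect) replicates with positive net monetary benefit."""
    if not replicates:
        raise ValueError("at least one replicate is required")
    favourable = sum(
        1 for delta_cost, delta_effect in replicates
        if wtp * delta_effect - delta_cost > 0
    )
    return favourable / len(replicates)


def acceptability_curve(replicates, thresholds):
    """Points of a cost-effectiveness acceptability curve: (WTP, probability)."""
    return [(wtp, prob_cost_effective(replicates, wtp)) for wtp in thresholds]


if __name__ == "__main__":
    # Hypothetical bootstrap replicates: (incremental cost, incremental QALYs).
    replicates = [(-100.0, 0.01), (200.0, 0.02), (500.0, 0.01), (300.0, -0.01)]
    for wtp, prob in acceptability_curve(replicates, [0, 30_000]):
        print(wtp, prob)
    # Criterion of at least 80% probability at a WTP of 30,000 per QALY gained.
    print(prob_cost_effective(replicates, 30_000) >= 0.80)
```

At a WTP of nil, only cost-saving replicates count as favourable, which is why CEA probabilities at WTP = 0 can be low even when the same intervention looks favourable at £30,000 per QALY gained.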

Our findings expand and strengthen the evidence base for the cost-effectiveness of IMIs. First, our findings support the evidence of cost-effectiveness of guided IMIs for depression and anxiety24,25,27,28,29. Second, our review includes new evidence related to under-researched disorders such as OCD (n = 4), PTSD (n = 1), stress (n = 3), and sleep (n = 2). However, given the limited number of studies, more evidence is needed.

The strength of this review is related to the comprehensive and systematic search strategy in several electronic databases for common MDs and problems, and the resulting health-economic comparisons. The quality of studies was assessed on the methodology of cost-effectiveness analyses and RoB. To further improve comparability and clarity, economic outcomes were converted to Pound Sterling for the reference year 2020 and mapped to the quadrant of the cost-effectiveness plane in which the mean ICER fell (as far as reported in the primary studies). Likewise, unified thresholds and transparent criteria proposed by the authors were used.

However, the comparability of evidence across the studies was hampered by the high heterogeneity stemming from different study designs, methods, study populations, outcome measures, time horizons, comparators, economic perspectives, cost items, and their evaluation. As a case in point, the operationalization of societal costs and intervention costs varied widely. The costs of development and maintenance of the IMIs were often not included or incompletely reported, leading to a possible underestimation of intervention costs. Half of the studies (n = 16) did not report intervention costs or only valued the time for the therapist needed to support the participants.

Another limitation is the lack of interpretability regarding cost-effectiveness, as the WTP for diagnosis-specific measures (e.g., symptom-free, reliable change) is unknown and the WTP threshold for QALYs is somewhat arbitrary, as universally accepted thresholds are unavailable71. For healthcare decision-making, several countries compare the ICER to a reference value (generic cost-effectiveness threshold) that represents the maximum cost the health system is willing to pay for a health outcome. These generic thresholds vary widely depending on the methods (e.g., per capita income, benchmarking interventions, and league tables: ranking the ICERs of interventions given a specific budget) and setting71. An international survey assessing the individual WTP for one additional QALY gained showed that the thresholds vary between countries (e.g., Taiwan 2.14 times the UK’s per QALY gained)72. Consequently, interventions are adopted earlier in countries with higher thresholds than in countries with lower thresholds. Beyond the narrow cost-effectiveness arguments, other criteria of health technology assessment should also be considered for decision-making purposes (e.g., disease burden, prognosis, medical ethics, access, equity, feasibility of implementation and scale-up of the interventions, and acceptability of the intervention by its intended recipients)73. Furthermore, most health-economic evaluations alongside randomized controlled trials (RCTs) are not powered to detect differences in costs or QALYs. This might result in non-significant differences in costs and QALYs, which can lead to wider uncertainty intervals surrounding the ICER estimates74. Moreover, some studies (n = 3) only collected data over a short period of the study duration and annualized effects and costs. In addition, in some studies (n = 6), the uncertainty surrounding the ICER point estimates was not clear because neither the CEA plane nor the cost-effectiveness acceptability curve was reported.

As all studies were conducted in Western countries, especially in northwestern Europe, the generalizability of results is restricted to these regions. In this regard, selection bias could have been introduced, as only studies published in German and English were included.

The results may have several clinical implications. The review could be important for decision-makers when allocating scant resources to meet the demands of the many in need of sustainable healthcare. With the increasing use of economic data in decision-making in public mental health and the increasing societal and economic burden of MDs, consideration of the cost-effectiveness of psychological preventive interventions and treatments is becoming increasingly important. IMIs might be an important way forward. Moreover, since the COVID-19 pandemic, increasing numbers of patients and health services have had to shift toward IMIs for the receipt and delivery of mental healthcare. This may have paved the way for scaled-up uptake of IMIs.

Despite the high heterogeneity stemming from intervention types and comparators of the included studies, some promising trends toward specific mental health targets were seen. Recommendations for policy makers and relevant stakeholders can be made, relating to existing NHS guidelines75 for the application of low-intensity psychosocial interventions in depression and anxiety. Based on our results, guided IMIs for MDD and anxiety disorders should be offered as a treatment option. The evidence regarding the cost-effectiveness of IMIs for under-researched disorders (e.g., OCD, sleep, and stress) and of unguided interventions is limited, and offering such interventions should rely on case-by-case decisions. However, unguided IMIs are scalable and easy to implement, showing a high potential to make an impact at a population level.

Besides these recommendations related to financial aspects, the implementation setting, target population, symptom severity and disorders should be considered. In addition, knowledge about diverse stakeholders’ views and values relevant to priority setting enables decision-makers to make better-informed decisions and appropriate judgments about allocation of scant resources.

In practice, most healthcare providers are receptive to the advantages of IMIs as part of their treatment. However, to become sustainable, IMIs should meet the criteria of government reimbursement mechanisms, such as those of the National Institute for Health and Care Excellence (NICE) in the UK or those for digital health applications in medical and psychotherapeutic care in Germany. Such criteria include evidence on effectiveness, interoperability, safety, and data security76.

Following this, we provide several recommendations for future research. First, various anxiety disorders such as panic disorder, GAD, and social anxiety were underrepresented, and disorders such as specific phobias were not found for this review. Moreover, studies were only conducted in resource-rich high-income countries. Hence, we recommend focusing on under-researched disorders and conducting research in low- and middle-income countries.

Second, we recommend publishing study protocols that adhere to economic evaluation guidelines (ISPOR77 and CHEERS78) and quality checklists (Drummond31 and CHEC79), thereby minimizing biases and improving study quality (e.g., reporting of uncertainty, sensitivity analysis and combined reporting of disease-specific and generic health outcomes to facilitate comparability, and interpretation for decision-making).

Third, the cost-effectiveness of IMIs for MDs and symptoms was frequently based on short-term findings (6–16 weeks, n = 13), whereas the remaining studies reported findings based on moderate (6–12 months, n = 14) to long follow-up periods (2–4 years, n = 3). We recommend conducting economic evaluations over longer follow-up periods to better capture longer-term productivity losses and gains, especially for preventive interventions in remittent disorders, such as anxiety disorders.

Fourth, more research is needed on IMIs compared with active control conditions across all disorders to establish the cost-effectiveness of IMIs as a possible alternative to face-to-face treatments.

Fifth, studies should carefully choose the perspectives taken depending on the decision maker, target population, disorder, or setting. For employers, productivity losses are most important, whereas from a healthcare system’s perspective, a high healthcare coverage for people affected by disorders is prioritized.

Finally, the acceptability of an IMI among patients and relevant stakeholders is worth investigating to provide more insights pertinent for the implementation, uptake, and use thereof.

In conclusion, this systematic review provides an overview of economic evaluations of internet-based interventions for the treatment and prevention of MDs. Guided iCBTs for anxiety disorders and MDD showed a high probability of being cost-effective. IMIs for insomnia, suicidal ideation, and stress had the potential of being cost-effective, whereas the evidence base for the cost-effectiveness of IMIs in OCD was not very firm. Although many studies were identified, more robust conclusions about the cost-effectiveness of IMIs could not be reached given the high heterogeneity across the studies with regard to methodologies, interventions, and comparators in a range of disorders and symptoms among various populations and age groups. More cost-effectiveness research is warranted in unguided and preventive IMIs that are proven to be effective, specifically in under-researched disorders and symptoms and preferably over longer time horizons. From a methodological perspective, future studies should more stringently adhere to existing health-economic guidelines to increase comparability and enhance their value for decision-making purposes in healthcare.

Methods

The guidelines of Preferred Reporting Items for Systematic Reviews and Meta-Analyses80 and preparation for systematic reviews of economic evaluations81 were followed. This systematic review was registered in the international prospective register of systematic reviews, PROSPERO (CRD4201809380882).

Search strategy

An extensive literature search was conducted using the following electronic databases: MEDLINE, PsycINFO, Cochrane Central Register of Controlled Trials (CENTRAL), PSYNDEX, and National Health Service (NHS) Economic Evaluations Database. Relevant articles published before 10/05/2021 were identified using standardized subject terms. A search strategy consisting of four main categories was applied to each database, selecting articles referring to (1) intervention, treatment, prevention, or psychotherapy; (2) MDs; (3) internet, online, or mobile-based; and (4) economic evaluation (Supplementary Table 2).

Eligibility criteria

Studies were eligible for inclusion if they met the following inclusion criteria:

Population: participants of any age with a diagnosis or symptoms of an MD, such as MDD, dysthymia, bipolar disorder, social phobia, panic disorder, GAD, PTSD, OCD, specific phobia, separation anxiety, or sleep disorders, or with transdiagnostic key symptoms such as suicidal thoughts and psychological distress, all of which were required to be assessed with validated self-report questionnaires or diagnostic interviews.

Intervention: psychological interventions that are provided in an online setting, defined as internet-, online-, web-, or mobile-based and grounded in CBT, interpersonal therapy, problem-solving therapy, positive psychology intervention, psychodynamic therapies, behavior therapy or behavior modification, systemic therapies, third-wave cognitive behavioral therapies, humanistic therapies, or integrative therapies. Internet-based interventions can be “guided”, offering patients human support by a psychotherapist via email or chat or automated feedback delivery, or “unguided”, only offering self-help interventions without any additional human support.

Comparator: included one of the following control groups: another psychological intervention, TAU, WLC, or AC group.

Outcome measures: reported economic evaluation estimates based on CEA, CUA, CBA, and CMA of a full economic evaluation, which means that the study compared both costs and effects (e.g., QALYs, treatment response, relapse avoided, and remission) of two or more alternatives.

Study types: RCTs whose full texts are accessible as peer-reviewed papers in English or German.

Studies were excluded if the intervention was not delivered online. IMIs were excluded when provided in combination with face-to-face or video-based sessions delivered by a therapist (i.e., blended interventions). Studies were excluded if they did not report a meaningful outcome measure for economic evaluation (e.g., point improvement on an ordinal scale). Health-economic modeling studies were excluded because of methodological differences compared with trial-based economic evaluations (e.g., not directly based on observational data) limiting internal validity of the review. Conference abstracts, protocol papers, non-peer-reviewed papers, cost-of-illness studies, observational studies, cohort studies, case studies, pilot studies, and feasibility studies were also excluded.

Study selection and extraction

First, titles and abstracts of the identified articles were screened. Then, two independent researchers (F.K. and C.B.) evaluated the full texts to determine whether the studies met the eligibility criteria. Disagreements were discussed and, where necessary, a third reviewer (D.D.E.) was consulted. Interrater agreement between the two reviewers was quantified using Cohen’s kappa.
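As an illustration of the agreement statistic named above, the following is a minimal sketch of Cohen’s kappa for two reviewers’ include/exclude decisions. The function name `cohens_kappa` and the example decisions are hypothetical and are not taken from the review’s data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per category.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical screening decisions for ten full texts.
reviewer_1 = ["include", "include", "exclude", "exclude", "include",
              "exclude", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "include",
              "exclude", "exclude", "include", "include", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

In this hypothetical example the reviewers agree on 8 of 10 studies, chance agreement is 0.52, and kappa is therefore about 0.58.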

Data of eligible studies were extracted using the Consolidated Health Economic Evaluation Reporting Standards Checklist78: (1) characteristics of participants (setting, age, sex, and screened symptoms/diagnosis), (2) study design (sample size, trial arms, and assessment points), (3) intervention (psychological approach, guidance, and length of intervention), (4) economic outcome measures, (5) type of economic evaluation, (6) characteristics of derived costs (cost categories, cost data sources, price year, currency, and mean incremental costs), (7) perspective of economic evaluation, and (8) cost-effectiveness estimates, such as incremental costs (i.e., cost difference between IMI and comparator), incremental effects, ICER, and ICER acceptability for various WTP levels.

Summary measures

Only base-case analyses adhering to the intention-to-treat (ITT) principle were reported. An intervention is considered cost-effective when it dominates the alternative, i.e., it is both more effective and less costly, or when it provides a greater outcome at higher costs that society is willing to pay for31. In practice, interventions often show greater effects for higher costs. The efficacy of an intervention is one of the indicators of its cost-effectiveness, as the incremental effect represents the denominator of the ICER. Consequently, most often, the investment required for obtaining a favorable health outcome decreases with increasing effectiveness. Therefore, more effective treatments have a higher probability of being cost-effective. The relative effectiveness of an intervention is further influenced by its comparator, ranging from smaller incremental effects against active comparator interventions to larger incremental effects against passive control groups4. Similarly, the level of therapist-led guidance in IMIs induces some effect moderation because it adds costs to an IMI but may also enhance its effectiveness4,83. This is important when drawing conclusions about incremental cost-effectiveness. In this review, IMIs were judged to be cost-effective when:

  • the IMI was dominant, i.e., the IMI’s effect was better, and its costs were lower than those of the comparator;

  • the cost per QALY was below the WTP of £30,000 as suggested by the NICE84;

  • studies using disease-specific clinical outcomes, such as treatment response or reliable change, were judged to be cost-effective when the probability of cost-effectiveness at a WTP of £0 was 80% or higher, which provides a high level of certainty for decision-making.

This means that the intervention is estimated to be more effective and less costly in 80% of the cases. This criterion can be seen as conservative, as most interventions show higher effects at higher costs than alternative interventions. Again, as no thresholds for the WTP of these units of effect exist, applicable studies should be judged individually by decision-makers.
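The decision rules above can be sketched in code. This is a minimal illustration, not the review’s analysis procedure: the function names are hypothetical, the handling of less effective interventions is a simplification, and estimating the probability of cost-effectiveness from resampled (e.g., bootstrapped) pairs of incremental costs and effects via the net monetary benefit is a common approach assumed here rather than one described in the text.

```python
NICE_WTP_PER_QALY = 30_000.0  # £ per QALY, the threshold named in the review
PROBABILITY_CUTOFF = 0.80     # required probability at a WTP of £0


def icer(delta_cost, delta_effect):
    """Incremental cost-effectiveness ratio: incremental costs per unit of incremental effect."""
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the incremental effect is zero")
    return delta_cost / delta_effect


def is_dominant(delta_cost, delta_effect):
    """The intervention is more effective and less costly than the comparator."""
    return delta_cost < 0 and delta_effect > 0


def cost_effective_per_qaly(delta_cost, delta_qaly, wtp=NICE_WTP_PER_QALY):
    """Apply the dominance rule first, then the cost-per-QALY threshold."""
    if is_dominant(delta_cost, delta_qaly):
        return True
    if delta_qaly <= 0:
        return False  # simplification: no health gain is treated as not cost-effective
    return icer(delta_cost, delta_qaly) < wtp


def probability_cost_effective(replicates, wtp):
    """Share of (delta_cost, delta_effect) replicates with positive net monetary benefit.

    Net monetary benefit = wtp * delta_effect - delta_cost. At a WTP of 0 it
    reduces to the cost saving, so only cheaper replicates count.
    """
    if not replicates:
        raise ValueError("at least one replicate is required")
    hits = sum(1 for d_cost, d_effect in replicates if wtp * d_effect - d_cost > 0)
    return hits / len(replicates)


# Hypothetical replicates of (incremental cost in £, incremental effect).
replicates = [(-50.0, 0.01), (-20.0, 0.02), (10.0, 0.01), (-5.0, -0.01), (-30.0, 0.0)]
meets_cutoff = probability_cost_effective(replicates, wtp=0.0) >= PROBABILITY_CUTOFF
```

With these hypothetical values, an IMI costing £500 more and gaining 0.02 QALYs has an ICER of about £25,000 per QALY and would pass the threshold, whereas one costing £900 more for the same gain (£45,000 per QALY) would not.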

To facilitate comparison between countries, all national currencies were converted to Pound Sterling for the price year 202085. First, the currency of the study was indexed to a 2020 equivalent by country-specific gross domestic product inflators (e.g., euro area 19) and then converted to Pound Sterling (£) using purchasing power parities86.
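The two-step conversion can be sketched as follows. All index values and PPP rates in the example are hypothetical placeholders, and the function and parameter names are illustrative; the sketch assumes that PPP rates are expressed as local currency units per international dollar, which is the usual convention but is not stated in the text.

```python
def to_gbp_2020(amount, index_price_year, index_2020,
                ppp_local_per_intl_dollar, ppp_gbp_per_intl_dollar):
    """Convert a cost in local currency and study price year to 2020 Pound Sterling.

    Step 1: index the amount to 2020 using the country-specific GDP price index.
    Step 2: convert to Pound Sterling via purchasing power parities.
    """
    if index_price_year <= 0 or ppp_local_per_intl_dollar <= 0:
        raise ValueError("index and PPP values must be positive")
    amount_2020_local = amount * index_2020 / index_price_year
    amount_2020_intl_dollar = amount_2020_local / ppp_local_per_intl_dollar
    return amount_2020_intl_dollar * ppp_gbp_per_intl_dollar


# Hypothetical inputs: a cost of 1,000 local currency units in the study's price year.
converted = to_gbp_2020(
    amount=1000.0,
    index_price_year=90.0,          # hypothetical GDP price index, study price year
    index_2020=108.0,               # hypothetical GDP price index, 2020
    ppp_local_per_intl_dollar=0.8,  # hypothetical PPP rate, local currency
    ppp_gbp_per_intl_dollar=0.7,    # hypothetical PPP rate, Pound Sterling
)
```

With these placeholder values, 1,000 units are first indexed to 1,200 units in 2020 prices and then converted to about £1,050.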

Quality assessment

The quality of health-economic evaluations was assessed using the CHEC79. This 20-item checklist was developed to evaluate the methodological quality (internal and external validity) of economic evaluations. The total score is expressed as the percentage of the maximum score for each study. This summary quality score (percentage of criteria met by each study [range: 0–100%]) was calculated24 from item ratings of “yes” (= 1), “suboptimal” (= 0.5), “no” (= 0), or “not applicable” (NA). The following quality categories were used: excellent (95–100%), good (75–94%), average (50–74%), and poor (<50%).
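The scoring scheme can be illustrated with a short sketch. It assumes that items rated NA are excluded from the maximum score, which the text does not state explicitly, and that scores between 94% and 95% fall into the "good" category; the function names and the example ratings are hypothetical.

```python
ITEM_SCORES = {"yes": 1.0, "suboptimal": 0.5, "no": 0.0}


def chec_summary_score(ratings):
    """Percentage of the maximum score, ignoring items rated "NA" (assumption)."""
    applicable = [r for r in ratings if r != "NA"]
    if not applicable:
        raise ValueError("at least one applicable item is required")
    return 100.0 * sum(ITEM_SCORES[r] for r in applicable) / len(applicable)


def quality_category(score_percent):
    """Map a summary score to the quality categories used in the review."""
    if score_percent >= 95:
        return "excellent"
    if score_percent >= 75:
        return "good"
    if score_percent >= 50:
        return "average"
    return "poor"


# Hypothetical ratings for one study on the 20 checklist items.
ratings = ["yes"] * 15 + ["suboptimal"] * 2 + ["no"] * 2 + ["NA"]
score = chec_summary_score(ratings)
category = quality_category(score)
```

In this hypothetical example, 19 items are applicable and 16 points are reached, giving a score of about 84% and a rating of "good".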

In addition, Cochrane Collaboration’s tool for assessing RoB was used87 to determine selection, performance, detection, attrition, reporting, and other bias in research studies. Each item was rated as high, low, unclear RoB, or NA. Performance bias was not assessed, as participants and personnel cannot be blinded due to the nature of IMIs. Furthermore, detection bias was always rated as low, as IMIs commonly rely on self-report instruments. Incomplete outcome data were rated as low risk when data analysis was conducted in accordance with the ITT principle. RoB was converted to the Agency for Healthcare Research and Quality88 standards (i.e., good, fair, or poor quality). RoB and CHEC were rated independently by F.K. and C.B. Disagreement was discussed or resolved by a third reviewer (D.D.E.).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.