A pragmatic randomized waitlist-controlled effectiveness and cost-effectiveness trial of digital interventions for depression and anxiety

Utilization of internet-delivered cognitive behavioural therapy (iCBT) for treating depression and anxiety disorders in stepped-care models, such as the UK’s Improving Access to Psychological Therapies (IAPT), is a potential solution for addressing the treatment gap in mental health. We investigated the effectiveness and cost-effectiveness of iCBT when fully integrated within IAPT stepped-care settings. We conducted an 8-week pragmatic randomized controlled trial with a 2:1 (iCBT intervention: waiting-list) allocation, for participants referred to an IAPT Step 2 service with depression and anxiety symptoms (Trial registration: ISRCTN91967124). The primary outcomes measures were PHQ-9 (depressive symptoms) and GAD-7 (anxiety symptoms) and WSAS (functional impairment) as a secondary outcome. The cost-effectiveness analysis was based on EQ-5D-5L (preference-based health status) to elicit the quality-adjust life year (QALY) and a modified-Client Service Receipt Inventory (care resource-use). Diagnostic interviews were administered at baseline and 3 months. Three-hundred and sixty-one participants were randomized (iCBT, 241; waiting-list, 120). Intention-to-treat analyses showed significant interaction effects for the PHQ-9 (b = −2.75, 95% CI −4.00, −1.50) and GAD-7 (b = −2.79, 95% CI −4.00, −1.58) in favour of iCBT at 8-week and further improvements observed up to 12-months. Over 8-weeks the probability of cost-effectiveness was 46.6% if decision makers are willing to pay £30,000 per QALY, increasing to 91.2% when the control-arm’s outcomes and costs were extrapolated over 12-months. Results indicate that iCBT for depression and anxiety is effective and potentially cost-effective in the long-term within IAPT. Upscaling the use of iCBT as part of stepped care could help to enhance IAPT outcomes. The pragmatic trial design supports the ecological validity of the findings.


INTRODUCTION
Stepped-care models have been proposed as a potential solution 1 to bridge the substantial gap between the prevalence of common mental health disorders, including depression and anxiety, and the access rates for evidence-based treatments 2,3 . Stepped-care seeks to up-scale treatment initiatives by matching treatment intensity and duration to clients' presenting needs, thereby optimizing outcomes and service capacity utilization. Investing in upscaling initiatives for mental health treatments is projected to produce large returns at a benefit-to-cost ratio of 3.3-5.7 to 1 when accounting for economic benefits and the value of health returns 4 . The Improving Access to Psychological Therapies (IAPT) programme in the UK is one of the first examples of a mental health stepped-care model implemented nationwide 5 .
IAPT services offer evidence-based treatments to individuals experiencing depression and/or anxiety, providing low-intensity interventions alongside traditional treatments (e.g. face-to-face therapy) 5 . Specifically, at Step 2, low-intensity interventions are offered to patients presenting with mild to moderate depression and anxiety symptoms, while those with more severe or complex presentations of depression and anxiety are assigned to step 3 high-intensity treatments. Low-intensity interventions include empirically established treatments like guided bibliotherapy and internet-delivered cognitive behavioural therapy (iCBT). Similar multicomponent models of care exist in other countries, such as the collaborative care models in the USA 6 . A key aspect of IAPT is routine outcome monitoring, which is used to improve individual clinical outcomes by aiding ongoing treatment decisions, but also to establish publicly available service-level clinical performance reports 7 . In the period 2018-19, IAPT received 1.6 million new referrals, of which 1.09 million were seen at least once for assessment and guidance and 582,556 received a course of therapy (defined as two or more sessions) 8 .
Political and policy initiatives that helped establish IAPT promised significant economic benefits, claiming that making evidence-based psychological treatments available would have no net cost to the Treasury 9 , yet envisioned economical returns from IAPT remain debated 10 . Internet-delivered interventions may be one way to improve IAPT outcomes in a cost-effective way 11 . However, currently iCBT accounts for only 7% of treatments completed within IAPT 8 . Clark et al. found that amongst services that achieve lower treatment rates, engaging more users in treatment could improve recovery and reliable improvement by 33% and 90%, respectively 7 , highlighting the potential for increased use of iCBT within IAPT to help achieve this aim.
Generally, iCBT for depression and anxiety has been found to significantly reduce symptoms and produce medium to large effect sizes at post-treatment, with a maintenance of effects at follow-up 12 . As a result, iCBT has established itself as a viable mode of treatment for depression and anxiety. Still, most research has explored iCBT's efficacy under more controlled settings, with effectiveness trials in routine care finding mixed outcomes 12,13 .
Most effectiveness trials were situated in specialized iCBT clinics. Research to assess iCBT's effectiveness in non-specialized settings is scarce. In the UK, iCBT has previously been investigated within a primary care context (the REEACT trials) 13,14 . Outcomes from these trials were discouraging, finding iCBT to be neither more effective nor cost-effective than treatment as usual by general practitioners alone. Implementation challenges resulting in low up-take and usage of iCBT and less than optimal support provided during treatment (i.e. only minimal or non-therapeutic support when research emphasises the superiority of supported-iCBT) 15 likely affected the trials' outcomes.
We sought to evaluate the effectiveness and cost-effectiveness of iCBT for depression and anxiety in a pragmatic clinical trial within IAPT routine stepped care. We hypothesized that iCBT would reduce anxiety and depression severity over an 8-week treatment period compared to waiting-list control and be considered cost-effective based on criteria set out by the National Institute for Health and Care Excellence (NICE; UK). Effectiveness and cost-effectiveness were hypothesized to be maintained throughout follow-up.
Follow-up outcomes Follow-up models demonstrated maintained or improved symptom levels across all follow-up time-points (Fig. 3 (12/90) in the control arm. All between-group differences were significant (p < 0.01). Amongst 3-month M.I.N.I.7.0.2 completers (n = 179), 60% (18/30) with depression, 50% (24/48) with anxiety, and 46% (30/65) with comorbid depression and anxiety diagnoses, did not meet diagnostic criteria anymore at 3-months. Overall, 56.4% (101/179) of participants did not meet diagnostic criteria for any disorder at 3-months (see Supplementary Tables 3 and 4  Cost-effectiveness In terms of preference-based health status, the EQ-5D-5L tariff score in the ITT analysis suggest a sustained health improvement over 12 months relative to baseline for those using iCBT, and at a higher magnitude relative to control at 8-weeks (Supplementary  Tables 5 and 6). To give an indication of reported resource-use and associated costs, descriptive statistics (i.e. response rates, resourceuse, and associated costs) by trial-arm in both, Complete-Case and ITT, analyses are presented in Supplementary Tables 7-9.
The incremental CEA and within trial-group results are presented in Supplementary Tables 10 and 11. Figure 4 presents the cost-effectiveness acceptability curve suggesting probability of iCBT cost-effectiveness dependent on willingness-to-pay per QALY, by time-horizon of the baseline-adjusted analyses. The base-case CEA (baseline-adjusted costs/baseline-adjusted QALYs) suggests over 8-weeks, iCBT has a high probability of producing QALY gains (>96.6%), but low probability of cost-savings (<0.5%), at an estimated iCBT intervention cost of £94.63 per person (Supplementary Table 12) against the no-intervention-cost waiting-list control. Across base-case analyses, probability of cost-effectiveness at NICE's upper £30,000 per QALY threshold ranged from 46.6% (baseline-adjusted QALY and baselineadjusted costs ITT analysis; ICER = £29,764) to 65.5% (unadjusted ITT analysis; ICER = £20,310); however, these probabilities decrease as the willingness-to-pay threshold decreases e.g. 21.0-47.9% within the same aforementioned analyses at NICE's lower £20,000 per QALY threshold over 8-weeks.
In the scenario analyses at 6, 9 and 12 months, the probability of cost-effectiveness increases as the time horizon increases (Fig. 4). The probability of cost-effectiveness at 12-months ranges from 91.2% to 92.0% at £30,000, or 88.5-90.1% at £20,000, per QALY gained dependent on analysis conducted (Fig. 4 only shows baseline-adjusted costs/baseline-adjusted QALYs). An interpretation of these scenario analyses relative to base-case results is provided in the Supplementary (section 4).    Table 13).

DISCUSSION
This large-scale RCT conducted within IAPT showed that iCBT for depression and anxiety is effective as a standalone intervention when fully integrated and operated in non-specialised routine stepped-care settings. The pragmatic trial design increases the ecological validity of the observed effectiveness of iCBT for depression and anxiety. The primary hypothesis of the study was confirmed by showing iCBT to be effective at the end of the 8week intervention period compared to waiting-list control, with further statistically significant improvements in the primary outcomes over the 12-month follow-up. The probability of costeffectiveness was predicted to be much higher over longer timehorizons than that observed over the 8-week treatment period alone, driven by QALY-based health improvement, which in part aligns with our clinical and longer-term cost-effectiveness hypotheses.
The results build upon prior effectiveness evidence of supported iCBT interventions for individuals with depression and anxiety disorders in primary care 12,15 . The observed effects also align with previous reviews in terms of the type of comparator, where waitlists comparators lead to larger effects that when compared to treatment as usual 12 . In the context of UK's primary care, the observed effects outperform the findings of the REEACT trials 13,14 , which could be explained by the stepped-care model within which these interventions were embedded, with contextual factors favouring the deployment and uptake of the interventions (i.e. trained personnel for the use of iCBT, routine outcome monitoring). The significant improvements on depression and anxiety symptoms observed in the follow-up compared to the end of treatment period have also been shown in prior reviews 12,15 . In addition, the statistically significant improvement in functional impairment is coherent with the intervention's effects on depression and anxiety 16 .
Regarding clinical diagnoses, in the current study between 46% and 60% of individuals did not meet diagnostic criteria anymore for one or multiple diagnoses after receiving the iCBT intervention, which is comparable to rates observed in previous studies of iCBT 17 . In this regard, the rates of recovery from diagnosis align with the rates of reliable improvement and recovery as assessed using the primary outcomes. It is important to consider that in our calculations of clinical improvement, we did not rely on the continuous assessments from the IAPT routine outcome monitoring system but we used different methods to account for missing data 5 . Therefore, our outcomes are not directly comparable to publicly reported IAPT outcomes as they are calculated differently.
Over 8 weeks, the baseline-adjusted ITT cost-effectiveness result suggests an ICER of £29,764 which is below the £30,000 per QALY NICE ICER upper threshold. This produced a probability of costeffectiveness of 46.6% due to the skewed nature of the estimates driven by a high probability of QALY gains, but low probability of  cost-savings. Short comparative trial time-horizons, which are often associated with waiting-list designs due to ethical reasons when with-holding care for longer-periods, limit potential QALY gains which are time dependent (i.e. more potential for QALY gains over 12 months relative to 8 weeks). Thus, a statistical extrapolation of costs and outcomes in the control were used to predict potential cost-effectiveness beyond 8-weeks, which suggested much higher probabilities of cost-effectiveness (i.e. >91% at £30,000 per QALY over 12 months) driven by the sustained health improvement observed in the intervention-arm than predicted in the control. These findings support the potential for internet-delivered interventions to be a cost-effective complement to established interventions when assessing costeffectiveness beyond the 8-week treatment period 11 . Decision makers wanting to use the results from the scenario analyses based on the statistical extrapolation should first consult the extended discussion and exploration of the scenario analyses and what these results represent in Supplementary (section 4). The results align with previous findings from economic evaluations of e-health interventions in different settings, which support the potential of iCBT to be a cost-effective complement to established interventions 11,18 . Although there was a low estimated probability of cost-savings from a health and social care perspective, this aligns with evidence from previous studies. Effective treatments of depression (in particular) tend to show an increase in 'direct' costs (e.g. intervention, hospital, primary care), with cost-savings often associated with decreased 'indirect' costs (e.g. lost productivity from absenteeism and presenteeism) 19 . Our costing perspective (as per the NICE reference case) does not account for wider societal costs and so does not capture these cost aspects.
Participants use of the interventions were similar to that observed in a previous trial of the same platform (i.e. around 5 h of usage and 14.7 logins), which also produced positive outcomes compared to a waiting-list group 16 . These results align with previous literature of iCBT, where it has been found that users do not need to complete all the intervention to benefit from it 20,21 . In this regard, there should be a shift in how adherence, and implicitly attrition, to iCBT is conceptualised, from a more traditional conceptualisation of completing all the modules, towards determining how much exposure is enough to produce clinical benefits 20 .
The execution of the trial within an IAPT service where iCBT is currently used as per normal service procedures is a key strength of the study, since it increases the ecological validity of the findings, by utilizing a real-world context 22 . In addition, the service population where we undertook the study has socio-demographic and clinical characteristics which are representative of the overall IAPT population, although female gender was slightly over represented compared to IAPT reports (70% females in this trial vs. 65% in IAPT referrals) 8 . The addition of clinician-administered   This study also has several limitations. First, the impossibility of following-up the waiting-list group for ethical reasons prevents observed analysis of treatment effects and cost-effectiveness results beyond 8-weeks; however, the maintained iCBT treatment effect (relative to baseline) and scenario CEA suggests promising, positive long-term outcomes (see also Supplementary section 4, for an extended discussion around the scenario analyses). Second, a waiting-list comparator was used instead of care as usual or an active comparator, which would be more reflective of what usually would happen in the absence of this iCBT intervention. Furthermore, the use of a WL group might also produce an overestimation of the treatment effects, since research has shown larger effects sizes in favour of this type of control group when compared to treatment as usual comparators 12 . Despite being common in anxiety research on psychological interventions, there are also other inherent issues with wait-list comparators such as issues of expectancy 23 .
Third, inter-rater reliability for the M.I.N.I.7.0.2. was not assessed in this study which calls into question the internal validity of the results, although the high coefficients observed in validation studies of the M.I.N.I. and the training and supervision provided to the PWPs who were involved ensured that they properly administered the interview. Fourth, asking participants to selfreport resource-use over the past three months using the CSRI might have produced recall errors and subsequent recall-bias. Fifth, the fact that the trial was conducted within routine care settings caused that some users were still in treatment by the end of the 8-week period, which could have influenced the effects of the intervention at this timepoint. Sixth, although support was stated on the protocol to be offered online, some reviews were conducted over the phone (as per service-related reasons). Future secondary analyses will explore potential confounding effects of telephone reviews further. Finally, the study was powered to detect between-groups differences based on F-tests; however, after powering the study but before trial completion and unblinding of the data, it was decided to apply linear mixed models, since they are more robust analyses (see "Supplementary Methods"). Post-hoc power analyses for the primary outcomes indicated power to have been in excess of the assumed 80% and therefore adequate in detecting moderate between-groups effect sizes for the primary outcomes.
Limitations related to the CEA include that the predicted costs and effects beyond 8-weeks for waiting-list control account for the intervention's treatment effect, which potentially underestimates the QALYs gained and cost-differences between trial-arms. Trial comparisons against waiting-list controls tend to estimate higher intervention-related QALY gains than against 'active' controls (e.g. face-to-face CBT), which limits the findings of this study compared to other active interventions (e.g. other delivery methods of CBT). Increased productivity could well be a source of cost-savings from this iCBT intervention which can't, or are unlikely to, be observed in our CEA given the costing perspective and short trial timehorizon. We applied a mean intervention cost to all participants, rather than costing the intervention to the specific participant, due to the inability to retrospectively link specific PWPs to trial participants which will have an effect on how the uncertainty around the intervention cost is accounted for in the bootstrap replicates and subsequent CEAC.
Given that we compared iCBT to a waiting-list group in IAPT, it would be valuable for future studies to examine the relative efficacy of iCBT compared to other low-intensity interventions offered within IAPT (i.e. guided self-help, psychoeducational groups). Furthermore, a trial analysing effectiveness and costeffectiveness of iCBT compared to face to face CBT would shed more light into the potential of iCBT to produce cost-savings and thus make further recommendations on service funding and delivery. To complement quantitative findings, we included qualitative interviews examining the facilitators and barriers that affected the implementation of the current intervention in IAPT services and these will be published separately. Distinct support models (for example support-on-demand) could shed light on the role of the supporter within a stepped-care model. Usage and engagement metrics, more easily measurable within an iCBT modality, can add to the understanding of the therapeutic process.
The results of the current study, conducted within a routine, NHS IAPT setting, show statistically significant and sustained reduction in depression and anxiety severity which translated into QALY-based health improvements, considered cost-effective by criteria typically used by NICE (albeit against a waiting-list control) when assessed beyond the 8-week treatment period. These results consolidate iCBT for depression and anxiety as an effective treatment alternative within stepped-care models. The results of this trial support the findings observed in trials from other countries where iCBT was implemented and assessed as part of stepped care and collaborative care models 24,25 . In a society where the prevalence for depression and anxiety is rising and demand is outpacing what mental health services are able to offer, embedding digital interventions as treatment alternatives increases accessibility and service efficiency, which has the potential to be cost-effective for the health care system.

Study design
An 8-week randomized controlled trial (RCT) with 2:1 (iCBT intervention: waiting-list control) allocation, and 12-month follow-up in the interventionarm. This study uses a pragmatic design where the trial procedures were built around the standard service procedures used at step 2 of IAPT and the support was offered by service employees. The study setting was an NHS IAPT Trust in England (i.e., a local health service provider covering the population within a geographical area). At Step 2 IAPT offers low-intensity psychological interventions to individuals experiencing mild to moderate symptoms of depression and anxiety. The trial protocol 26

Participants
Participants were chosen based on IAPT standard operating procedures to best represent participants who may be eligible and require iCBT as part of usual IAPT care. Participants were new IAPT referrals, included if aged between 18-80 years, screened for psychological pathology using clinical thresholds for Patient-Health Questionnaire-9 (PHQ-9 ≥ 9) 27 and/or Generalised Anxiety Disorder-7 (GAD-7 ≥ 8) 28 , and suitable for iCBT (i.e. willing to engage in iCBT, internet access). Exclusion criteria included: suicidal ideation/intended (PHQ-9 question 9 score > 2 and/or expressed during clinical interview); psychotic illness; organic mental health disorder; alcohol and/or drug misuse; and currently receiving psychological treatment. Recruited participants signed informed consent via digital signature.

Randomization and masking
Participants were randomized at an individual level using an algorithm developed by a computer scientist (Sealed Envelope, 2016) and executed independently of the research team, employing random permuted blocks using block sizes of 9, and including stratification within a 2:1 allocation ratio between treatment and waiting list control groups. The Psychological Wellbeing Practitioners (PWP) who carried out the support and assessment of the patients could not be blinded to allocation for practical reasons.

Procedures
As per routine practice, clients attending the IAPT service received an assessment of their needs which determined their initial presentation (depression/anxiety), suitability for Step 2 and for iCBT, and allocation to one of the available programmes. Subsequently, eligible participants were invited to the trial and thereafter, as part of this research trial, signed informed consent via digital signature, completed the baseline research assessments and completed the M.I.N.I.7.0.2 with a PWP to determine a primary diagnosis of depression or anxiety with initial treatment assignment as per routine service's practice (most suitable intervention based on their symptomology). Given the higher prevalence of anxiety disorders compared to depression 29 , primary diagnosis was used for randomization to ensure a balanced sample; this data will inform future secondary analyses. Those randomized to the intervention-arm commenced their treatment with their PWP supporter.
The iCBT programs were SilverCloud Health's 'Space from Depression', 'Space from Anxiety' and 'Space from Depression and Anxiety' interventions, whose efficacy has been previously supported 16 and which have been described in detail elsewhere 26 . The interventions share core CBT content with customisations for depression and specific anxiety-disorder presentations (e.g., social anxiety, generalized anxiety, panic disorder, etc.). Participants were supported during their iCBT treatment by PWPs, who are psychology graduates specifically trained in the provision of lowintensity CBT interventions 5 . PWPs were located in offices across several localities in the area, which are separate to primary care premises. Assessment and triage for services typically occurs over the phone but can sometimes occur in face-to-face settings. PWPs assessed participants for suitability, assigned participants to a specific iCBT programme, monitored their progress throughout the trial and provided regular reviews. During the reviews, PWPs provided feedback to clients based on their work from week-to-week (e.g. modules completed, tools used, shared journal entries) and encouragement in order to promote meaningful engagement with the programme 30 . PWPs were instructed to provide six online reviews (15 min per participant per review) over the 8-week intervention period. Potential risk to participants was monitored in line with routine service procedures (see protocol for full description of risk management). Fidelity to the treatment was ensured through checklists and clinical supervision offered by case managers. Specifically, the checklist assessed if the content of the review offered by the supporter considered the client, the tools used and the previous levels of engagement of the client when writing the review. In total, 176 checklists were completed that represented 85 cases with an average of 2 supervisions per case. One-hundred-and-seventy (n = 170) of the checklists were marked as compliant with the content of online reviews. 6 cases were either not indicated as compliant (reason not stated) or marked as the client not attending the session.  effect (8 weeks), plus one month that is required by diagnostic criteria for some disorders (e.g. panic disorder, social anxiety disorder) to be asymptomatic in order to meet the criteria. Thereafter, online selfreported measures were administered at 3-, 6-, 9-, and 12-month followup timepoints. In order to enhance retention of participants for research purposes only, participants received financial incentives in the form of vouchers (up to £97.50) for the completion of research measures ("Supplementary Methods"). For trial purposes, PWPs were formally trained in the procedures they would be undertaking that were outside of normal service (i.e. diagnostic interview, checking for consent, monitoring fidelity).

Outcomes
The primary outcome measures were the PHQ-9 for depressive symptoms 27 and the GAD-7 for anxiety symptoms 28 .
Secondary outcomes included rates of 'significant improvement' and 'recovery', based on PHQ-9 and GAD-7 scores at 8-weeks. These were calculated as follows: (a) significant improvement: a score change of PHQ-9 ≥ 6 and/or GAD-7 ≥ 4, where the thresholds are calculated based on Jacobson's & Truax's criteria for clinically significant change, provided they do not reliably deteriorate on either measure (b) recovery: moving from 'caseness' (defined thresholds as set by IAPT being PHQ-9 ≥ 10 and/or GAD7 ≥ 8) to 'non-caseness' (below the threshold on both measures) 32 . The Work and Social Adjustment Scale (WSAS) was used to measure functional impairment levels 33 . The M.I.N.I.7.0.2 was administered to determine diagnosis of depression and/or anxiety disorder at baseline and 3-months follow-up.
Rates of deterioration at post-treatment (increase in PHQ-9 ≥ 6 and/or GAD-7 ≥ 4) and an increase in the number of diagnoses at 3-months were considered as adverse events. Metrics of programme usage such as time spent in the platform, number of logins and percentage of programme completion were collected for those participants assigned to the iCBT group.
Cost-effectiveness analysis was based on the preference-based EuroQoL Five-Dimension Five-Level (EQ-5D-5L) 34 health status measure using the EQ-5D-5L cross-walk algorithm as NICE's interim-position 34,35 for 'cost per quality-adjusted life year (QALY)' analysis. An adapted Client Service Receipt Inventory (CSRI; routinely completed at study site, see supplementary Fig. 1) 36 was used to collect self-reported care resource-use over the past 3 months (note, the CSRI at 8-weeks asked about resource-use over the past 3 months, rather than previous 8 weeks as intended, and was not administered at 3-month follow-up).
Measures were administered at baseline and 8-weeks forming the primary endpoint between trial-arms, with further intervention-arm followup at 3-, 6-, 9-, and 12-month. At baseline, demographic details were collected.

Statistical analysis
Sample size was determined separately for primary presentations of depression and anxiety. In total, 360 participants were required to detect a moderate between-group effect size (d = 0.5) on repeated-measures for PHQ-9 and GAD-7, based on F-tests, with a two-tailed α of .05, power of 80%, and 2:1 randomisation procedure (to reduce likelihood of having many people waiting). The expected between-group difference and an aggregated 25% uplift to ameliorate against attrition were based on a previous study and a meta-analysis of iCBT for depression 16,37 . Due to a discrepancy between the original formula for statistical power, based on Ftests, and the statistical analysis actually applied (i.e. linear mixed models; "Supplementary Methods"), statistical power of the mixed effects models was checked through the R package nlmeU and it was found adequate.
Baseline demographic and outcome variable differences between groups, and measure responders versus non-responders, were investigated through Chi-squared, Mann-Whitney and t-tests. Missing data mechanisms were assessed separately at 8-weeks (across and within trial-arms) and throughout follow-up (intervention-arm only), using Little's 'missing completely at random' (MCAR) test. Missing data was deemed to be 'MCAR' at 8-weeks (see Supplementary Table 3). Across follow-up, data was deemed to be 'missing at random (MAR)' (see Supplementary Table 4).
Linear mixed and marginal models were used to evaluate intervention effectiveness, both considered robust intention-to-treat (ITT) analyses. Analyses included all available data from all participants. Graphical and statistical checks were conducted to assess model assumptions. These checks did not flag any major violations of assumptions. Six models were built, one up to 8-weeks across arms ('8-week model') and one including all intervention-arm data ('follow-up model') per outcome variable (PHQ-9, GAD-7, and WSAS). 8-week models were random intercept linear mixed models and included maximum likelihood estimation. Follow-up models were marginal models with an unstructured correlation structure as these produced superior model fit than corresponding linear mixed models. Given marginal model assumptions and MAR data at follow-up, follow-up analyses were preceded by multiple-imputation via multilevel joint modelling 38,39 . Bonferroni adjusted paired comparisons based on estimated marginal means were conducted to explore change between timepoints and differences between the intervention-arms, and any subgroups. Between-group Cohen's d effect sizes were calculated utilising raw score standard deviations 40 . Analyses were conducted in IBM SPSS version 26 and R version 3.6.1 using the 'nlme', 'nlmeU', 'mitml', 'geepack' and 'emmeans' packages.

Cost-effectiveness analysis
Cost-effectiveness analyses were based on NICE's reference case 41 from a health and social care perspective, with QALYs elicited using the area under the curve (AUC) method 42 from the EQ-5D-5L cross-walk algorithm as NICE's interim-position 34,35 . Application of unit costs for 2017/18 in Great British Pounds (GBP, £) to per patient resource-use to estimate intervention and downstream costs is described in Supplementary Tables  14 and 15.
The CEA were conducted on a complete-case (CC) and ITT basis, with missing cases imputed for the latter using multiple-imputation by chained equation (MICE) for MAR data 43 . Incremental mean-point estimates of mean cost differences (intervention minus control) over mean QALY differences between trial-groups was used to determine incremental costeffectiveness ratios (ICERs). Baseline adjustments (BA) were made using ordinary least squares (OLS) regression models with covariates including trial-group and: (1) 3-months pre-baseline costs for costs; (2) baseline EQ-5D-5L cross-walk score for QALYs; with both unadjusted and adjusted estimates reported to aid transparency in the reported results 43 . Nonparametric bootstrapping was used to calculate bootstrapped 95% confidence intervals (bCIs) and standard errors (bSE) around costs and effects, and for plotting cost-effectiveness acceptability curves (CEACs). CEACs present the probability of intervention cost-effectiveness compared to control across a range of decision maker willingness-to-pay (WTP) thresholds e.g. NICE's £20,000 to £30,000 per QALY thresholds 41 .
The base-case CEA is conducted over the initial 8-weeks. As part of scenario analyses, regression-based extrapolations were used to estimate QALYs and costs beyond 8-weeks for the waiting-list control. In the ITT dataset for the intervention-arm, OLS models were fitted to the (EQ-5D-5L) tariff score/(downstream) costs at 3 (not for costs), 6, 9 and 12 months as the response variable, independently, and tariff score/costs at baseline and 8 weeks as the explanatory variables. The predicted models were then fitted to the waiting-list control's data to estimate their potential tariff score/costs at the aforementioned time-points from which total costs and QALYs could be estimated, and CEA conducted. A step-by-step guide describing the scenario analyses is provided in Supplementary (section 4). Analyses were conducted in Stata version 15.

Data Monitoring and Trial Managing Committee
The Data Monitoring and Trial Managing Committee (DMTMC) was composed of the following personnel: the SilverCloud data manager and two trial managers, Berkshire NHS representatives including the PI and Co-PI at Berkshire, two R&D clinician research assistants, and two R&D research assistants. The DMTMC met weekly and minutes were recorded at each of these meetings. Each week participants progress was monitored for safety and routine clinical decisions e.g. stepping up a participant from lowintensity to high-intensity treatment were communicated to the DMTMC and recorded (see consort flow diagram). To ensure the validity and integrity of the data collected throughout the trial period, the data manager provided weekly progress reports to the DMTMC.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article. D. Richards et al.