Introduction

Excessive alcohol use is associated with an increased risk of injury, liver disease, heart disease, and cancer, resulting in roughly 140,000 U.S. deaths and 3.6 million years of potential life lost annually1,2. Approximately 17% of U.S. adults binge drink (drinking 5 or more drinks on an occasion for men or 4 or more drinks for women)1,2. There has been an acceleration of alcohol-induced deaths across the U.S. population in the first two decades of the 21st century3, with notable increases in mortality associated with alcohol-associated liver disease in women4.

Alcohol use disorder (AUD) can be conceptualized along a spectrum of risk, with appropriate interventions dependent on an individual’s risk of progressing to severe AUD over the lifetime5. Improving access to effective interventions across a broad base of the population is warranted for a disorder as pervasive and damaging as AUD1,4. Patient self-management can be fundamental to addressing AUD because it allows individuals with heavy use to initiate their own positive behavior change, and technology can play an important role. Riper et al.6 completed a systematic review highlighting the potential of internet-based interventions in offering scalable, accessible, and effective support for individuals with AUD in both community and healthcare settings6. Digital medicine apps hold promise for the treatment and continuing care of patients with AUD because such systems can provide a widely accessible vehicle for both self-management and clinical monitoring7. However, there has been little research on the implementation of digital medicine technologies for AUD to reach populations that could benefit from early intervention5. A key, unanswered research question relates to the role of human support (e.g., medical providers, health coaches) in enhancing the effectiveness of digital medicine apps8,9,10.

This study tested several pragmatic digital medicine support models that systematically varied the level of human support used to facilitate the implementation of an evidence-based digital medicine intervention for AUD. The primary aim was to assess the effectiveness of each support model on self-reported heavy drinking days (HDD: 5 or more standard drinks on any day for men under 65, 4 or more standard drinks for women and men over 65) and quality of life (QOL). We conducted a 12-month randomized controlled trial using a hybrid type 1 effectiveness-implementation study design. Hybrid type 1 designs are intended to test the effects of an intervention while gathering information on implementation11. Our rationale for this design is that a low-touch model is the least costly and easiest to implement. However, if the active involvement of either peer support specialists or health coaches increases the intervention’s effectiveness, the additional cost may represent a worthwhile investment. Our hypotheses were that, for patients with mild-to-moderate AUD, (1) group-oriented peer mentorship would be more effective than self-monitoring; (2) 1:1 health coaching would be more effective than self-monitoring; and (3) 1:1 health coaching would be more effective than group peer mentorship.

Results

Study flow and survey completion

Among 2317 individuals who completed the eligibility screening, 1254 (55%) met the screening criteria and were invited to join the onboarding process; 558 (44%) completed the onboarding process and were randomized (Fig. 1). The two main reasons for exclusion during onboarding (n = 696) were not creating a Tula account (n = 414, 59%) and not scheduling a randomization call (n = 156, 22%). The overall completion rate of the follow-up surveys was 78% (1756/2232), ranging from 93% at the end of month 3 to 67% at month 12. The mean number of follow-up surveys completed per person across groups was 3.14 out of 4 (SD = 1.31). The clinically integrated (CI) group participants (M = 2.75, SE = 0.11) completed fewer follow-up surveys than the self-monitored (SM) or peer-supported (PS) group participants (Wald Chi-square (2) = 14.179, p < 0.001).

Fig. 1: Study flow diagram.

The study flow diagram shows the number of participants from the initial screening to randomization through completion of follow-up surveys. Participants who contacted the research team to drop out of the study: SM = 0, PS = 1, CI = 9.

Characteristics of the study population

Table 1 shows the baseline characteristics of the study population. The majority (91%) of the study population was white, slightly higher than the proportion of white patients in the overall patient population in the health system (82%). The proportion of black participants enrolled in the study was roughly the same as their representation in the health system population (4.8%) and the Madison, WI metropolitan area. Females were overrepresented, comprising almost 2/3 of the study population. The population was highly educated, reflecting that Madison, WI, is the 5th most educated city in the US. Except for education level, all other demographics and outcomes at baseline were similar between groups. The CI group had a slightly higher percentage with a high school degree/GED as their highest education level but a slightly lower percentage of those who completed a vocational or associate degree. The PS group had a slightly lower percentage with master’s degrees.

Table 1 Baseline characteristics of the study population

Intervention utilization

Figure 2 depicts the percentage of participants who logged into Tula at least once during each study week. Use declined over time, most notably among the CI participants. Participants clicked an average of 806 links/pages in the Tula app during the 12-month study period with a wide range from 6 to 4844 pages. An average of 11.2 out of 12 weekly check-in surveys were submitted during the 3-month intervention period (with incentives), and 28.3 out of 40 weekly surveys were submitted during the rest of the 9-month follow-up period (without incentives). The PS and CI group participants could read and post messages on Tula discussion forums. Most PS and CI participants (96%, 178/186 and 97%, 182/187, respectively) read Tula discussion forum messages, and 96 (52%) PS and 52 (28%) CI participants posted one or more messages. Health coaching visits were utilized by 125 (67%) CI participants, with a mean number of 1.7 visits and a SD of 1.3.

Fig. 2: Percentage of weekly Tula users over time.

Illustrates the percentage of Tula users each week during the 52-week (or 12-month) study period. The numerators are the number of participants who used the Tula app in a particular study week while the denominators are the number of participants actively enrolled in the study during that week.

Descriptive analysis–primary outcomes

Mean values of primary outcomes from baseline to month 12 for all three groups are listed in Table 2 and visualized in Fig. 3.

Table 2 Outcomes of the study population

Fig. 3: Primary outcomes over time.

Presents the group means of the primary outcomes over time (percentage of heavy drinking days, quality of life-physical health, and quality of life-mental health). The primary outcomes were collected in quarterly surveys at 0, 3, 6, 9, and 12 months after randomization.

Absolute and relative risk reductions in outcomes from baseline to 12 months are listed in Table 2. Across all study groups, the percentage of heavy drinking days (PHDD) declined over time, from an average of 38.4 PHDD (95% CI [35.8, 41]) at baseline to 22.5 PHDD (95% CI [19.5, 25.5]) at month 12, an absolute reduction of 15.9 percentage points or a 41% relative reduction. However, PHDD in the SM group rebounded slightly at month 12, while the PS group remained the same and the CI group continued to decrease. The physical health scores showed little change except for a slight improvement for the PS group from month 6 (48.7) to month 9 (49.6). In the mental health domain, all three groups showed slight improvements, with more noticeable improvement for the CI (5.9%) and PS groups (4.8%) from baseline to month 12 compared to the SM group (2.6%).
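The absolute and relative reductions quoted above follow directly from the two reported means. The short sketch below reproduces that arithmetic; the function name `reductions` is illustrative and not part of the study's analysis code.

```python
def reductions(baseline: float, follow_up: float) -> tuple[float, float]:
    """Return (absolute reduction, relative reduction in percent)."""
    absolute = baseline - follow_up
    relative = 100.0 * absolute / baseline
    return absolute, relative


# Overall mean PHDD reported in the text at baseline and at month 12.
absolute, relative = reductions(38.4, 22.5)
print(f"absolute: {absolute:.1f} points, relative: {relative:.0f}%")
```

The absolute figure is a difference in percentage points, whereas the relative figure expresses that difference as a share of the baseline value.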

Primary statistical analysis

Percentage of heavy drinking days (PHDD)

The results of the General Linear Mixed Model (GLMM) analysis are listed in Table 3. There was no group effect on PHDD (F(2,1729) = 0.375, p = 0.688). However, a significant overall time effect (F(2,1729) = 6.142, p < 0.001) and a trending group by time effect (F(6,1729) = 1.936, p = 0.072) were found. The overall PHDD declined significantly from month 3 to months 9 and 12 (p < 0.001 and p = 0.005, respectively). The pair-wise comparisons for the group by time effect showed that PHDD decreased significantly from month 3 to months 9 and 12 among the CI group participants (F(3,1729) = 6.773, p < 0.001) but not in the other two groups (SM: F(3,1729) = 2.124, p = 0.095; PS: F(3,1729) = 0.474, p = 0.7).

Table 3 Treatment and time contrast on primary outcomes

Quality of life-physical health (QOL-PH)

No group effect (F(2,1728) = 1.345, p = 0.261) was found in the mixed model analysis for QOL-PH. The overall time effect was significant (F(3, 1728) = 2.856, p = 0.036), but the group by time effect was not (F(6,1728) = 0.695, p = 0.654). The pair-wise comparisons between the time points and the group-by-time interactions were not statistically significant.

Quality of life-mental health (QOL-MH)

An overall group effect (F(2,1728) = 4.277, p = 0.014) and time effect (F(3,1728) = 4.597, p = 0.003) were found for QOL-MH. The group-by-time effect was not significant (F(6,1728) = 0.984, p = 0.434). The pair-wise contrasts showed that the CI participants overall reported a better level of mental health than the SM group (p = 0.011). A significant group difference was also found at 9 months (F(2,1728) = 3.989, p = 0.019), with trending significance at 6 months (F(2,1728) = 2.862, p = 0.057) and 12 months (F(2,1728) = 2.644, p = 0.071). The overall QOL-MH across the three groups improved significantly from 3 to 6 months and from 3 to 12 months (p = 0.006 and 0.023, respectively). The PS participants reported significant improvement in QOL-MH (F(3,1728) = 4.681, p = 0.003), specifically from 3 months to 6 and 12 months (p = 0.01 and 0.015, respectively).

Discussion

This hybrid implementation-effectiveness trial systematically varied the degree of human touch offered to support the implementation of an evidence-based digital health app for alcohol use reduction in participants with mild-to-moderate AUD. All three randomized groups had reductions in PHDD—from 38.4% (95% CI [35.8%, 41%]) at baseline to 22.5% (19.5%, 25.5%) at 12 months, representing a statistically significant time effect. The CI group showed a significant reduction in alcohol use from 3 to 9 and 12 months (p < 0.001 and p = 0.005, respectively) while the other two groups did not. There were no statistically significant differences in PHDD reduction among the SM, PS, and CI groups. The clinical and societal significance of a 16-percentage point reduction (41% relative reduction) in PHDD is difficult to gauge. For context, the CDC estimates that excessive alcohol use is responsible for more than 140,000 annual deaths and an economic cost of $249 billion1. AUD is a chronic, relapsing, and remitting condition that can worsen over the lifespan, and arresting problematic use across a broad base of the population before alcohol-related complications manifest could have significant population health benefits.

Concerning the primary outcome of QOL, the null hypothesis was not rejected, with the SM group showing similar effectiveness to the PS and CI groups. The CI group showed significant improvements in mental health QOL scores compared to the SM group (p = 0.011). However, higher rates of attrition in the CI group warrant caution in interpreting the results (see Supplementary Information). It appears that younger participants dropped out at a higher rate than older participants, and a pattern mixture model conducted as a sensitivity analysis (as opposed to the base case assumption of data missing at random) negated the positive effect on mental health QOL in the CI group (i.e., improved mental health observed among CI participants may be due to different drop-out patterns vs. the intervention itself).

There has been limited published research on digital medicine interventions targeted at this population. Crane et al.12 examined the Drink Less app in the U.K., which enrolled 672 participants12. In that study, only 179 (27%) participants completed the primary outcome measure, and no main effects on alcohol use were found. The literature suggests that systematic tracking of alcohol use and mindful awareness of drinking patterns can serve as effective interventions per se6,13,14, and this type of functionality was supported by all three groups, including the self-monitoring group.

With 558 participants, this study is the largest controlled trial featuring a version of A-CHESS. An implementation study of an A-CHESS variant in 3 federally qualified health centers showed positive effects on alcohol and substance use outcomes but was not sustained after the trial ended15. A 2 × 2 factorial trial featuring a head-to-head comparison of A-CHESS (a digital health intervention) vs. Telephone Monitoring and Counseling (a telehealth intervention) found that both interventions had positive and statistically equivalent effects on HDD, although there was no additional benefit found by combining the 2 interventions16. Kiluk et al.17 compared usual care to clinician-delivered CBT and to an internet-delivered program called CBT4CBT. After 12 weeks, all three arms showed reductions in substance use, with higher reductions in the clinician-delivered CBT arm and the CBT4CBT arm when compared to the usual care. However, the clinician-delivered CBT arm had a higher drop-out rate than the other two arms17. Tetrault et al.18 compared CBT4CBT to in-person treatment and found positive effects for both groups but no significant difference between them18.

It is essential to highlight differences in the current study population compared to prior studies of A-CHESS16,19. In contrast to prior studies that enrolled participants in recovery for severe AUD, our study participants had mild-to-moderate AUD. Connection with peers has been integral to prior versions of A-CHESS, and this dynamic was modeled in the PS group via peer mentorship and the discussion forum. However, peer mentoring had minimal additional effect compared to self-monitored use in this study, perhaps because isolation is not as prevalent as it often becomes for persons with more severe AUD.

With 558 randomized participants and an overall response rate of 78%, this study was among the most expansive and rigorously conducted digital health trials for alcohol use reduction. Follow-up rates were high compared to norms in digital health research, where a 2022 systematic review found median completion rates of only 48% across 37 studies (a 78% response rate would correspond to the first quartile)20. Follow-up rates did vary by group, however. The research team took note of higher drop-out rates in the CI group early in the trial and made repeated efforts to contact participants to increase follow-up, with limited success. We don’t know why attrition was higher in the CI group. While representing only a subset of participants lost to follow-up (Fig. 1), 10/558 participants asked the research team to be withdrawn from the study, and 9/10 were in the CI group. The study team is also conducting qualitative interviews to examine participants’ experiences with the study interventions. Early indications are that attrition may have resulted from mismatches between the intervention and participants’ preferences surrounding issues such as privacy and intervention intensity. This preliminary finding is consistent with evidence that some individuals with behavioral health issues are more comfortable sharing personal information with a computerized intervention vs. a human interventionist21. Our findings are also consistent with findings by Kiluk et al.17, showing unexpectedly high attrition in human-delivered vs. automated cognitive behavioral therapy17.

This study focused on implementation of digital health in a primary healthcare system and was intentionally pragmatic vs. explanatory, as outlined by PRECIS-2 criteria22. Pragmatic study design decisions were made in partnership with health system leadership, including self-reporting of alcohol use without biomarker verification, unblinded assignment for implementers and participants, and implementation by healthcare and community-based practitioners vs. research staff. Importantly, the study did not include a pure control group. Doing so would require a “sham device,” which was deemed impractical given the aims of the study: no plausible placebo digital medicine support model for AUD could be envisioned that would be acceptable to the health system. Participation in research can have effects, regardless of the specific interventions tested, and other factors may explain the alcohol use reductions observed (e.g., temporal trends, responding to monetary incentives).

Prior research suggests that people who use complementary and alternative medicine interventions tend to be female, of middle age, and have more education23. Our sample reflects this. Our use of convenience sampling may limit generalizability to other populations and highlights the need to actively engage communities to keep health equity concerns at the forefront of digital health implementation research24.

Paying mindful attention to alcohol use may be the first step in contemplating and addressing problematic habits and patterns of use that may lead to future problems. While adding peer support or 1:1 health coaching did not clearly yield more effectiveness in alcohol use reduction, incorporating patient preferences for using digital health apps may be worth considering when choosing digital medicine support models, particularly in light of the mental health improvements observed in the CI group. Additionally, the more intensive nature of the CI intervention may have represented a lack of fit for persons with mild/moderate AUD. Choice in selecting an intervention has been shown to improve outcomes in other areas of research, such as heart disease25. Thus, one possible implication for health systems would be to consider offering a self-guided version of the app with an option for supplemental health coaching for individuals who desire a personal connection or additional support.

The trial results suggest that self-guided use may be as effective as more intensive digital health support models guided by peer mentors and health coaches in a population with mild-to-moderate AUD. A self-guided support model is certainly less expensive and simpler for health systems to adopt than human-guided support models, and the differential attrition in the CI group suggests that persons with mild-to-moderate AUD may value autonomy and anonymity over personalized and more intensive health coaching. In conclusion, offering an evidence-based digital health intervention focused on wellness, mindful awareness, and prevention of AUD progression may be a viable option for health systems looking to promote evidence-based interventions for AUD.

Methods

Theoretical and empirical foundations

The protocol was approved by the University of Wisconsin Health Sciences Minimal Risk Institutional Review Board (2019-0337). This study was prospectively registered at clinicaltrials.gov on 7/03/2019 (NCT04011644). The study protocol was previously published and is summarized here26. This trial used a digital medicine app called Tula (Sanskrit for “balance”) adapted from the Addiction-Comprehensive Health Enhancement Support System (A-CHESS), one of the first digital medicine apps whose efficacy was supported in a trial of patients recovering from severe AUD after leaving 90-day residential treatment19. These results were replicated when A-CHESS was shown to be effective in reducing heavy drinking in a 2022 study comparing telephone- and smartphone-based continuing care for AUD16. A-CHESS has also been tested across a range of other populations15,27,28,29.

The theoretical framework supporting the A-CHESS app is self-determination theory, which asserts that meeting three fundamental psychological needs (competency, relatedness, and autonomous motivation) produces intrinsic motivation for behavior change30. Unlike A-CHESS, which focuses on abstinence as part of recovery from severe AUD, Tula focuses on reducing alcohol consumption for individuals with mild-to-moderate AUD. Tula is built upon the Whole Health Model, which focuses on “what matters to me” as a whole person and not just the disease (i.e., “what’s the matter with me?”)26. Tula places alcohol use within the context of integrative medicine, an adaptation made based on input from partnering healthcare stakeholders26. Tools and services included in Tula are listed in Table 4.

Table 4 Tula content and tools

Eligibility criteria

Eligibility criteria are listed in Table 5. To maintain anonymity, all participants were identified by usernames they created while setting up their accounts.

Table 5 Patient inclusion and exclusion criteria

Setting

Study participants were recruited within the geographic catchment area of an integrated health system serving more than 700,000 patients each year in Wisconsin and the Upper Midwest. Individuals interested in the study used a QR code or a link from the study website to complete a secure, web-based screening survey. IP addresses and location verification were used to confirm that research participants lived within the health system’s catchment area.

Digital medicine support models: study groups

The study was designed to examine the clinical integration of digital medicine, as described by Hermes et al.31, where we compared a fully automated support model with two human-guided support models, differentiated by graduated levels of external support. While all participants had access to the basic content in Tula, the three study arms varied by the level of human touch (self-monitored “low touch,” peer-supported “medium touch,” and clinically integrated “high touch”). See Fig. 4.

Fig. 4: Study groups.

The three study groups are listed in the columns, and their unique properties are compared in the rows.

Self-monitored (SM) group (low touch)

Participants in this group used the app on their own, like most commercially available health apps. They had no access to a discussion forum or private messaging nor any connection to an external support system (e.g., peer support specialist, health coach). Participants could communicate with the study team via email, phone, or a messaging function built into Tula. However, communication was limited to questions about the study and receiving tech support.

Peer-supported (PS) group (medium touch)

Participants in this group had access to social support from others in the same study arm via Tula’s discussion forum and to certified peer specialists who worked for a community-based non-profit organization that has historically partnered with the health system to provide peer mentoring services for patients with substance use disorders. Peer specialists are certified recovery coaches who have lived experience with substance use and/or mental health issues. They were not assigned to individual participants but were available to any participant who reached out via Tula’s private messaging feature. They also facilitated and moderated discussion forums and encouraged participants to utilize Tula’s resources.

Clinically integrated (CI) group (high touch)

Participants in this group worked one-on-one with health coaches employed within the healthcare system and had access to a discussion forum with other participants in their study arm. Health coaches are trained and certified in behavior change theories, motivational strategies, and health education. They worked with participants 1:1 to set goals for a healthier lifestyle through reduced alcohol consumption. Health coaches engaged with participants via Tula’s private messaging feature. Participants were offered three 1:1 health coaching sessions (via phone). Tula’s Tracker feature was also monitored by health coaches as part of goal setting.

Participant recruitment, retention, and timeline

We used a three-pronged recruitment strategy encompassing clinical settings, community-based organizations, and public media. We enlisted clinical study champions (primary care providers, behavioral health specialists, etc.) to provide information to potentially eligible patients. We also engaged local leaders from underrepresented communities to promote the study in ways that invite the inclusion of diverse perspectives. Lastly, we used targeted digital, television, and print media to promote the study broadly.

All participants who completed the eligibility screening survey received a $10 digital gift code. Study incentives were built into the first 12 weekly check-in surveys (as part of the self-monitoring feature of Tula) and four quarterly follow-up surveys at months 3, 6, 9, and 12. The incentives were distributed monthly in the form of digital gift codes sent via the message feature in Tula. By the end of the 12-month study period, participants could earn up to US $250 in digital gift codes for data submission.

Eligible participants were given a 72-hour window to download the app, complete a baseline survey, and schedule a phone call with research staff for enrollment. Recruitment lasted from March 2020 to September 2023, with the last participant completing their 12-month intervention in September 2024.

Patient randomization

A sequence of randomized assignments for each group was generated by the project statistician using the block randomization procedure in the Power Analysis and Statistical Software package32. Randomization was stratified on participant-reported biological sex (male/female) and alcohol use severity (mild/moderate) based on DSM-5 score (mild ≤ 3 and moderate = 4–5)33, using participants’ responses to the screening survey. The randomization list was masked and placed in a protected spreadsheet. The randomization process involved a phone call between a study team member and the participant. When the study arm assignment was unmasked, a study ID was entered into the next available placement in the randomization sequence. While on the call, the study team member configured the participant’s app permissions based on their group assignment and reviewed a second informed consent specific to that study arm.
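As an illustration of stratified block randomization in general, the sketch below builds one permuted-block allocation sequence per stratum (sex by severity). It is not the study's procedure: the trial used the PASS software package, and the block size, number of blocks, seed, and function names here are all hypothetical.

```python
import random

ARMS = ["SM", "PS", "CI"]


def block_sequence(n_blocks: int, block_size: int, rng: random.Random) -> list[str]:
    """Build an allocation sequence from permuted blocks with equal arms per block."""
    if block_size % len(ARMS) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    sequence: list[str] = []
    for _ in range(n_blocks):
        block = ARMS * (block_size // len(ARMS))
        rng.shuffle(block)  # random order within the block
        sequence.extend(block)
    return sequence


def stratified_sequences(n_blocks: int, block_size: int, seed: int) -> dict:
    """Generate one independent sequence for each sex-by-severity stratum."""
    rng = random.Random(seed)
    strata = [(sex, sev) for sex in ("male", "female") for sev in ("mild", "moderate")]
    return {stratum: block_sequence(n_blocks, block_size, rng) for stratum in strata}


# Hypothetical settings: 4 blocks of 6 per stratum, arbitrary seed.
sequences = stratified_sequences(n_blocks=4, block_size=6, seed=2019)
print(sequences[("female", "mild")][:6])
```

Each newly enrolled participant would take the next unused slot in the sequence for their stratum, which keeps the arms balanced within every stratum after each completed block.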

Blinding

Due to pragmatic and ethical considerations, neither implementers nor participants were blinded to the randomization assignment.

Outcomes

Demographics and severity

Participants reported biological sex and their AUD severity (mild or moderate; those with severe AUD were excluded) in the screening survey, which was used as strata for randomization. Participants reported additional demographics, including age, race/ethnicity, education, and income in the baseline survey.

Primary outcomes

The primary outcomes were HDD and QOL. HDD was measured using the 7-day timeline follow-back survey34,35. At baseline and during each quarterly survey, participants were asked how many days they had 5 or more standard drinks (for men under 65) or 4 or more standard drinks (for women and men over 65) on any day in the last 7 days. The definition of a standard drink was “One drink = one shot of hard liquor (1.5 oz.), 12-ounce can or bottle of beer, or 5-ounce glass of wine.” To allow for comparison with other studies using varying recall periods, we converted the number of HDD to a percentage (PHDD) by dividing by 7 (days/week).
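The conversion from a 7-day HDD count to PHDD described above can be written as a one-line calculation. The sketch below is illustrative only; the function name `phdd` and the input validation are assumptions and not part of the study's code.

```python
def phdd(heavy_days: int, recall_days: int = 7) -> float:
    """Convert a count of heavy drinking days into a percentage of the recall period."""
    if not 0 <= heavy_days <= recall_days:
        raise ValueError("heavy_days must be between 0 and recall_days")
    return 100.0 * heavy_days / recall_days


# A participant reporting 3 heavy drinking days in the past 7 days.
print(round(phdd(3), 1))
```

Expressing the outcome as a percentage makes it comparable with studies that use a different recall window, such as 30 or 90 days.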

Patient-reported QOL was measured using the four global physical health and four global mental health items in the Patient-Reported Outcome Measurement Information System (PROMIS) global health short form (SF10 ver.1.2)36. The raw scores for physical and mental health were converted to T-scores using the PROMIS scoring manual (p.16)36. T-scores are standardized scores that can be compared to the US general population and have a mean QOL score of 50 with a standard deviation of 10.

Participants received automated reminders to complete surveys in the app. Survey data were stored on a secure server at the University of Wisconsin.

Sample size

The study was designed to detect differences in two primary outcomes (HDD and QOL) among three study arms across a 12-month period. Sufficient power (1–β = 0.80, multiple comparison-adjusted α = 0.00833, two-tailed) to detect a conservative effect size of Cohen’s d = 0.25 with four repeated measurements and a first-order autoregressive covariance structure (correlation ρ = 0.2) required ~182 participants per study arm (546 total), assuming 28% attrition16. An effect size of 0.25 equated to a difference of ~0.24 HDD per week, 2.18 points for QOL-Physical Health, and 2.03 points for QOL-Mental Health36.
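Some of the figures in this paragraph can be checked with simple arithmetic. The sketch below assumes that the adjusted α is a Bonferroni-style split of 0.05 across six comparisons (three pair-wise contrasts for each of two outcomes); the text does not state this, although 0.05/6 matches the reported value. It also back-calculates the standard deviations implied by d = 0.25. It does not reproduce the power calculation itself.

```python
# Assumption: 0.05 split across 3 pair-wise contrasts x 2 primary outcomes.
n_comparisons = 3 * 2
alpha_adjusted = 0.05 / n_comparisons

# Enrollment target reported in the text.
per_arm = 182
total = per_arm * 3

# Standard deviations implied by the stated raw differences at Cohen's d = 0.25.
d = 0.25
implied_sd = {
    "HDD per week": 0.24 / d,
    "QOL-Physical Health": 2.18 / d,
    "QOL-Mental Health": 2.03 / d,
}

print(f"adjusted alpha: {alpha_adjusted:.5f}")
print(f"total sample: {total}")
for outcome, sd in implied_sd.items():
    print(f"{outcome}: implied SD = {sd:.2f}")
```

Because Cohen's d is the raw difference divided by the standard deviation, dividing each stated difference by 0.25 recovers the variability assumed for that outcome.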

Statistical methods

Descriptive analysis

We conducted descriptive analysis for demographic, utilization, and outcome variables across all three study arms. Group comparisons were conducted using chi-squared tests for categorical variables and t-tests for continuous variables. All statistical tests were two-tailed with α = 0.05.

Primary analysis

We constructed a longitudinal model of the outcome measures at months 3, 6, 9, and 12 after randomization using the GLMM with a random intercept and an autoregressive covariance structure for repeated measures. The stratification variables and the baseline values of the outcomes were included as covariates, with a separate model for each primary outcome. The primary analysis comprised the group fixed effects and pair-wise comparisons between study arms. Other significant fixed effects and pair-wise comparisons were also reported. The Sidak method was used to adjust p-values in pair-wise comparisons37. Because PHDD is positively skewed, we used the square-root transformed PHDD, consistent with prior analyses of A-CHESS16. Participants who could not be reached for follow-up surveys were analyzed following the intention-to-treat principle. Missing outcomes data were considered missing at random. Other secondary analyses (mediation, economic, and qualitative)26 will be reported separately.
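Two of the steps named above can be illustrated without fitting the mixed model: the square-root transformation of PHDD and the Sidak adjustment of p-values. The sketch below uses the standard single-step Sidak formula, p_adj = 1 - (1 - p)^m for m comparisons. The input values are hypothetical and the function names are illustrative; this is not the study's analysis code.

```python
import math


def sqrt_transform(phdd_values: list[float]) -> list[float]:
    """Apply a square-root transformation to reduce positive skew."""
    return [math.sqrt(value) for value in phdd_values]


def sidak_adjust(p_values: list[float]) -> list[float]:
    """Single-step Sidak adjustment: p_adj = 1 - (1 - p) ** m."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


# Hypothetical PHDD values (percent) and unadjusted pair-wise p-values.
transformed = sqrt_transform([0.0, 14.3, 42.9, 100.0])
adjusted = sidak_adjust([0.01, 0.02, 0.03])

print([round(t, 2) for t in transformed])
print([round(p, 4) for p in adjusted])
```

The Sidak adjustment is slightly less conservative than Bonferroni, which would multiply each p-value by the number of comparisons.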

Ancillary analyses

Sensitivity analyses are included to examine model assumptions regarding equivalence of groups at baseline, skewness of the primary outcome, and to account for differential attrition (see Supplementary Information).

Important changes after trial commencement

Originally, in-person health coaching was available, but due to COVID-19, coaching was only offered via telephone. We also added QOL as a co-primary outcome26 to provide a more holistic evaluation for individuals with mild-to-moderate AUD and performed a square-root transformation on PHDD to be consistent with prior analyses16.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.