Predictors of placebo response in three large clinical trials of the V1a receptor antagonist balovaptan in autism spectrum disorder

High rates of placebo response are increasingly implicated in failed autism spectrum disorder (ASD) clinical trials. Despite this, placebo response in ASD has received little investigation. We sought to identify baseline predictors of placebo response and quantify their influence on clinical scales of interest in three harmonized randomized clinical trials of balovaptan, a V1a receptor antagonist. We employed a two-step approach to identify predictors of placebo response on the Vineland-II two-domain composite (2DC) (primary outcome; caregiver-reported) and the Clinical Global Impression (CGI) scale (secondary outcome; clinician-rated). The initial set of candidate predictors covered participant-level, site-specific, and protocol-related factors. Step 1 identified influential predictors of placebo response using Least Absolute Shrinkage and Selection Operator (LASSO) regression, and Step 2 quantified their influence via linear regression. Results were validated by bootstrapping with 500 replications of the analysis dataset. The pooled participant-level dataset included individuals with ASD aged 5 to 62 years (mean age 21 [SD 10]), of whom 263 and 172 received placebo at Weeks 12 and 24, respectively. Although no influential predictors were identified for CGI, findings for Vineland-II 2DC were robust and informative. Decreased placebo response was predicted by higher baseline Vineland-II 2DC (i.e., more advanced adaptive function), longer trial duration, and European (vs United States) sites, while increased placebo response was predicted by commercial (vs academic) sites and by comorbid attention deficit hyperactivity disorder (ADHD) and depression. Identifying these factors may help anticipate and mitigate placebo response in drug development for ASD and across developmental and psychiatric conditions.


INTRODUCTION
Autism spectrum disorder (ASD) is a common, lifelong, and heterogeneous neurodevelopmental condition characterized by difficulties in social communication and interaction and by restricted/repetitive behaviors [1]. The experience of each autistic individual is unique in terms of clinical presentation, with varying symptom severity, associated symptoms, and comorbidities [1,2]. Therefore, there is no one-size-fits-all approach to ASD therapies [2].
Despite ASD prevalence estimates as high as 1 in 44 [3], no study has demonstrated conclusive pharmacotherapeutic efficacy for the core symptoms of either socialization and communication difficulties or restricted and repetitive behavior [4][5][6][7][8]. Accordingly, there are no US Food and Drug Administration (FDA)-approved pharmacotherapies for core symptoms [9], making ASD one of the most prevalent health conditions without a medication for its core features. This highlights an urgent need to develop novel pharmacotherapies, but ASD clinical trial design and methodology face several notable challenges [7]. First, the genetic architecture of ASD is heterogeneous, with contributions from both rare and common genetic variants [10], in addition to environmental factors [1]. Second, more sensitive, valid, and reliable outcome measures for core ASD symptoms are still needed [11,12]. Current outcome measures consist either of subjective reports by participants and their caregivers or of clinician rater observations, both of which can introduce unintentional bias [13,14].
Placebo response is a ubiquitous challenge in medicine. Though reported symptoms (e.g., pain, fatigue, and psychiatric symptoms) may be particularly vulnerable to expectancy bias and other forms of bias [15][16][17], placebo response has also been demonstrated in studies investigating more "objective" disease markers such as glycosylated hemoglobin in diabetes, hepatocyte histology in non-alcoholic steatohepatitis, body mass index (BMI), and blood pressure [18][19][20][21]. Unsurprisingly, high rates of placebo response have been observed across several ASD randomized controlled trials for multiple symptoms and endpoints [5,7,22-24]. By limiting the ability to discern treatment effects, placebo response may contribute to the low success rate of ASD trials and the lack of approved pharmacotherapies [7]. Understanding sources of placebo response would therefore advance efforts to discern treatment effects of new pharmacotherapies, which are desperately needed for at least a subgroup of autistic individuals [7,24,25]. While a few studies have used single samples or meta-analyses of published trials to assess potential predictors of placebo response, relatively small sample sizes and limited harmonization across studies have limited their impact [7,24,25]. These meta-analyses may be further limited by their use of study-level rather than participant-level data. Robust methodologies to identify predictors in larger samples are needed to better understand placebo response in ASD.
Balovaptan is a vasopressin 1a (V1a) receptor antagonist that has been investigated for the treatment of socialization and communication difficulties in autistic individuals. The balovaptan clinical development program comprised V1aduct, a phase 3 trial in adults; aV1ation, a phase 2 trial in children and adolescents; and VANILLA, a phase 2 trial in adults. Taken together, the evidence showed that balovaptan did not improve socialization and communication in ASD compared with placebo. Notably, placebo response was observed across several primary and secondary endpoints, including the Vineland-II two-domain composite (2DC), composed of the Vineland-II Communication and Socialization domains, and the Clinical Global Impression (CGI) scale [5,6,8] (see supplement for details).
To harness the large sample size represented by these three harmonized trials, we employed a two-step statistical approach to robustly identify predictors of placebo response and to quantify their influence on clinical scales of interest in a participant-level dataset.

MATERIALS AND METHODS
Experimental procedures
This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guideline for cohort studies. Ethics board approval was not required for this study as pooled anonymized data from clinical trials were used.
To explore predictor-endpoint relationships at different time points, separate Week 12 and Week 24 cohorts were created from the data available at each time point. Both the pooled cohort (aV1ation, V1aduct, and VANILLA participant data combined) and the individual study cohorts were analyzed to avoid missing cohort-specific signals. The main analyses were complete-case analyses: only participants with complete endpoint data and complete baseline data for the included predictors were retained. This led to a small reduction in sample size in both the Week 12 and 24 cohorts (Fig. 1A). Missing data were not imputed because they arose largely from trial non-completion, which would not contribute to placebo response, and because imputation of multiple correlated variables, if not thoroughly designed, may introduce bias into the analyses [26].
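The complete-case rule above can be illustrated with a minimal Python sketch (the analyses themselves were run in R). The record layout and field names (`vabs_2dc_chg`, `vabs_2dc_base`, `age`) are hypothetical, and `None` is assumed to mark a missing value; a participant is kept only if the endpoint and every included predictor are present, and nothing is imputed.

```python
from __future__ import annotations

from typing import Any


def complete_cases(records: list[dict[str, Any]], endpoint: str,
                   predictors: list[str]) -> list[dict[str, Any]]:
    """Keep participants whose endpoint and every listed baseline predictor are
    non-missing (None marks a missing value). Nothing is imputed."""
    required = [endpoint, *predictors]
    return [r for r in records if all(r.get(k) is not None for k in required)]


# Hypothetical placebo-arm records; field names are illustrative only.
placebo_week12 = [
    {"id": 1, "vabs_2dc_chg": 4.0, "vabs_2dc_base": 70.0, "age": 12.0},
    {"id": 2, "vabs_2dc_chg": None, "vabs_2dc_base": 65.0, "age": 25.0},  # did not complete
    {"id": 3, "vabs_2dc_chg": -1.0, "vabs_2dc_base": None, "age": 31.0},  # missing predictor
]
kept = complete_cases(placebo_week12, "vabs_2dc_chg", ["vabs_2dc_base", "age"])
print([r["id"] for r in kept])  # [1]
```

Because the predictor list is an argument, the same function gives a different retained set for each model, which is one reason sample sizes can differ slightly between analyses.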

Outcomes and covariates
The primary objective was to identify influential predictors of placebo response for Vineland-II 2DC change from baseline, at Week 12 and Week 24 separately. The secondary objective was to identify predictors of placebo response for a Clinical Global Impression-Improvement (CGI-I) score ≤3 (3 = minimally improved, 2 = much improved, 1 = very much improved), at Week 12 and Week 24 separately. The candidate predictor set included different dimensions from a conceptual model based on previously identified factors that may influence placebo response (Fig. 1B) [7,27]. In addition to basic demographic and site-specific factors, a broad range of candidate predictors were selected relating to symptom severity, diagnostic comorbidity, and family strain. Specifically, the candidate predictor set included baseline demographics (age, sex, intelligence quotient [IQ], BMI); the Social Responsiveness Scale raw total score; Clinical Global Impression-Severity (CGI-S; see supplement for details), dichotomized as low severity (<5: mildly or moderately ill) versus high severity (markedly, severely, or extremely ill); Vineland-II Socialization and Communication standard scores; the Pediatric Quality of Life™ Inventory Family Impact total score; Repetitive Behavior Scale-Revised (RBS-R) baseline subscale scores (compulsive, restricted, ritualistic, sameness, self-injurious, stereotyped); concomitant medications; and comorbidities. In addition to the number of sites per arm, individual site-specific factors included commercial versus academic status, number of participants enrolled, and percentage dropout. Percentage dropout was a post-hoc predictor used as a proxy summary for unmeasured or unobservable site-level characteristics.
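The coding of a few of these variables can be sketched as follows, in Python rather than the R used for the analyses. The CGI-S cut-off (<5 vs ≥5) and the CGI-I ≤3 outcome follow the text; the site-level helper is hypothetical and assumes that percentage dropout is computed over all participants enrolled at a site, which the text does not specify.

```python
from __future__ import annotations

from collections import defaultdict


def cgi_s_high_severity(cgi_s: int) -> int:
    """Dichotomize CGI-S (1-7): scores >= 5 (markedly, severely, or extremely ill)
    are coded 1 (high severity); scores < 5 are coded 0 (low severity)."""
    if not 1 <= cgi_s <= 7:
        raise ValueError("CGI-S must be an integer from 1 to 7")
    return int(cgi_s >= 5)


def cgi_i_improved(cgi_i: int) -> int:
    """Binary secondary outcome: 1 if CGI-I <= 3 (minimally, much, or very much improved)."""
    return int(cgi_i <= 3)


def site_features(participants: list[tuple[str, bool]]) -> dict[str, dict[str, float]]:
    """Hypothetical helper: per-site enrolment count and percentage dropout from
    (site_id, completed) pairs. Assumption: dropout is computed over all
    participants enrolled at the site."""
    enrolled: dict[str, int] = defaultdict(int)
    dropped: dict[str, int] = defaultdict(int)
    for site, completed in participants:
        enrolled[site] += 1
        if not completed:
            dropped[site] += 1
    return {
        site: {"n_enrolled": n, "pct_dropout": 100.0 * dropped[site] / n}
        for site, n in enrolled.items()
    }


features = site_features([("A", True), ("A", False), ("A", True), ("B", True)])
print(features["A"])
```

Here site A enrolls three participants, one of whom drops out, giving a percentage dropout of one third of 100; site B has no dropout.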

Statistical analysis
The analysis combined knowledge-based and data-driven approaches and was carried out in two steps.
Step 1 (variable selection) selected influential predictors of placebo response from the candidate set of variables. In Step 2 (predictor significance), the identified predictors were taken forward into linear regression analysis to quantify the size of their influence on placebo response. Analyses were repeated separately for the Week 12 and 24 cohorts and, in Step 1, for the pooled and individual cohorts across both endpoints (Vineland-II 2DC and CGI-I). All analyses used R version 3.1.4.
Step 1: variable selection. To achieve robust findings, several methods were considered to identify influential predictors, including LASSO [28], LASSO regression with added non-linear terms (non-linear LASSO) [28], linear regression, and machine learning methods (random forest, neural networks) [29,30]. Internal validation was performed with the bootstrap procedure with 500 replications [29][30][31]. All models were assessed on the following performance metrics: root mean squared error (RMSE), mean absolute error (MAE), and R² for the continuous outcome (Vineland-II 2DC), and specificity, sensitivity, and area under the receiver operating characteristic curve (AUC) for the categorical outcome (CGI-I). The best-performing method was then selected, and predictors were ranked by the magnitude of their coefficients. The most influential predictors, those with an absolute effect size greater than 1.96, were passed to the next phase of analysis (Step 2) to quantify effect size. Step 1 was performed for the pooled cohort and for each time point separately. In addition, to avoid missing potential predictors in individual datasets, the analysis was repeated for each study cohort separately, for a total of 16 analyses (two endpoints [Vineland-II 2DC and CGI-I] × four cohorts [aV1ation, V1aduct, VANILLA, and pooled] × two time points [Week 12 and Week 24]). All influential predictors across these analyses were collected as candidate predictors for Step 2.
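The following is a minimal NumPy sketch of bootstrap LASSO selection, not the authors' R implementation. Several details are assumptions: the penalty `lam` is fixed rather than tuned (the text does not state how it was chosen), predictors are standardized within each resample, and "absolute effect size greater than 1.96" is interpreted as the bootstrap mean coefficient divided by its bootstrap standard error. The LASSO itself is fitted by plain coordinate descent with soft-thresholding.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    return float(np.sign(z)) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, max_iter=500, tol=1e-6):
    """Coordinate-descent LASSO minimizing (1/2n)||y - Xb||^2 + lam * ||b||_1.
    X should be standardized and y centered, so no intercept is fitted."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()                       # current residual y - Xb
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0:                       # constant column in this resample
                continue
            old = b[j]
            rho = X[:, j] @ r / n + col_sq[j] * old  # correlation with partial residual
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * (b[j] - old)
            max_step = max(max_step, abs(b[j] - old))
        if max_step < tol:
            break
    return b


def standardize(X):
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                                # avoid dividing by zero
    return (X - X.mean(axis=0)) / sd


def bootstrap_lasso_select(X, y, names, lam=0.05, n_boot=500, z_cut=1.96, seed=0):
    """Refit LASSO on n_boot bootstrap resamples; keep predictors whose mean
    coefficient divided by its bootstrap SE exceeds z_cut in absolute value."""
    rng = np.random.default_rng(seed)
    n = len(y)
    coefs = np.empty((n_boot, X.shape[1]))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)             # resample participants with replacement
        yb = y[idx] - y[idx].mean()
        coefs[i] = lasso_cd(standardize(X[idx]), yb, lam)
    mean = coefs.mean(axis=0)
    se = coefs.std(axis=0, ddof=1)
    z = np.divide(mean, se, out=np.zeros_like(mean), where=se > 0)
    ranked = sorted(zip(names, z), key=lambda pair: -abs(pair[1]))
    return [(name, float(v)) for name, v in ranked if abs(v) > z_cut]


# Synthetic illustration: only x0 and x1 truly affect the outcome.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=300)
selected = bootstrap_lasso_select(X, y, ["x0", "x1", "x2", "x3"], n_boot=100)
print(selected)
```

Ranking by the bootstrap z-like ratio rather than by raw coefficient size is a design choice made for this sketch: it favors predictors that are consistently non-zero across resamples, which is the point of the bootstrap validation described above.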
Step 2: predictor effect. Predictor effect was determined by quantifying the association between influential predictors and placebo response.
Step 2 was necessary because estimates derived from the penalized approaches used in Step 1 (e.g., LASSO) are, by design, biased (shrunk toward zero), as these algorithms prioritize predictive performance [28]. The relationships between the influential predictors and the endpoints (change from baseline in Vineland-II 2DC, and CGI-I) were therefore evaluated using linear regression, separately for each time point, in the pooled cohort only. For completeness, results are reported for both the original model (including only the predictors specified in the trial protocols) and the updated model (adding the influential predictors). Only results that remained statistically significant after Bonferroni correction for multiple comparisons are discussed.
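Step 2 can be sketched as an ordinary least squares fit with a Bonferroni threshold. This NumPy version is illustrative only: the original analysis used R, the p-values here use a normal approximation to the t distribution (reasonable for samples in the hundreds), and the number of comparisons `m` is assumed to equal the number of predictors in the model, since the text does not state how it was counted.

```python
import math

import numpy as np


def ols_bonferroni(X, y, names, alpha=0.05):
    """OLS with intercept; two-sided Wald p-values via a normal approximation;
    Bonferroni: significant if p < alpha / m, with m = number of predictors tested."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - p - 1)                  # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    centered = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centered @ centered)
    results = {}
    for j, name in enumerate(names, start=1):             # index 0 is the intercept
        z = beta[j] / se[j]
        p_value = math.erfc(abs(z) / math.sqrt(2.0))      # P(|Z| > |z|)
        results[name] = {"coef": float(beta[j]), "se": float(se[j]),
                         "p": p_value, "significant": p_value < alpha / p}
    return results, float(r2)


# Synthetic illustration with hypothetical predictors x0 (true effect 3) and x1 (no effect).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 1.0 + 3.0 * X[:, 0] + rng.normal(size=200)
res, r2 = ols_bonferroni(X, y, ["x0", "x1"])
print(res["x0"], r2)
```

Unlike the penalized Step 1 estimates, these unpenalized coefficients are not shrunk toward zero, which is why they are the ones used to quantify effect size.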

RESULTS
Baseline characteristics
The pooled participant-level data included autistic individuals aged 6 to 62 years (mean age 21 [SD 10]), of whom 263 and 172 received placebo and 405 and 248 received balovaptan at Week 12 and 24, respectively (Fig. 1A). Baseline characteristics were well balanced across the balovaptan and placebo groups (Table 1A, B). Mean (SD) age in the Week 12 and 24 cohorts, respectively, was 21.7 (10.0) and 20.0 (10.1) years in the placebo group and 20.8 (9.5) and 18.6 (9.9) years in the balovaptan group. Baseline characteristics for the individual study cohorts (i.e., aV1ation, VANILLA, and V1aduct) are included in the Supplemental Materials (Tables S1-3). The distribution of comorbidities and concomitant medications for individual study cohorts is provided in the Supplemental Materials (Fig. S1). Fig. S2 shows the distribution of outcomes and predictors for participants in the placebo arm, separately for the Week 12 and 24 analyses. Apart from site-related variables, all clinical scales had similar distributions at Week 12 and 24, including the pattern of outliers (<5% of participants overall).

Predictors of placebo response
Step 1: variable selection. For Vineland-II 2DC, the comparison of linear regression, non-linear LASSO, random forest, and LASSO models for the pooled Week 12 and 24 cohorts is shown in Fig. S3 (see also Fig. S4).
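The performance metrics named in the Methods have standard definitions, sketched here in plain Python for reference. The 0.5 classification threshold is an assumption (the text does not give one), and AUC is computed as the probability that a randomly chosen responder receives a higher score than a randomly chosen non-responder, with ties counted as one half.

```python
import math


def regression_metrics(y_true, y_pred):
    """RMSE, MAE, and R^2 for a continuous outcome such as Vineland-II 2DC change."""
    n = len(y_true)
    resid = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in resid)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return {"RMSE": math.sqrt(ss_res / n),
            "MAE": sum(abs(e) for e in resid) / n,
            "R2": 1.0 - ss_res / ss_tot}


def classification_metrics(y_true, scores, threshold=0.5):
    """Sensitivity, specificity, and AUC for a binary outcome such as CGI-I <= 3."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for t, s in pairs if t == 1 and s >= threshold)
    fn = sum(1 for t, s in pairs if t == 1 and s < threshold)
    tn = sum(1 for t, s in pairs if t == 0 and s < threshold)
    fp = sum(1 for t, s in pairs if t == 0 and s >= threshold)
    pos = [s for t, s in pairs if t == 1]
    neg = [s for t, s in pairs if t == 0]
    # Rank-based AUC: share of (positive, negative) pairs ordered correctly.
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "AUC": wins / (len(pos) * len(neg))}


print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
print(classification_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

For the toy regression input the single residual of -1 gives an RMSE of 0.5, an MAE of 0.25, and an R² of 0.8; for the toy classification input three of the four responder/non-responder pairs are ordered correctly, giving an AUC of 0.75.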
Step 2: predictor effect. The original predictors included in the respective trial protocols were baseline Vineland-II 2DC, age, sex, country, and IQ. The additional variables identified in Step 1 were added to the linear regression model in Step 2 (Table 3).

DISCUSSION
This study aimed to identify and quantify the influence of predictors of placebo response in three large harmonized clinical trials of balovaptan in ASD. High rates of placebo response can mask therapeutic signal detection and are increasingly implicated in failed ASD trials [5,7,8]. However, there are few investigations into placebo response in ASD trials and, to our knowledge, only one has assessed participant-level data [7,24,25]. By leveraging participant-level data from three ASD multi-site trials of a single investigational medication across a wide age range (6-62 years), we found several participant, protocol, and site-related factors that influenced placebo response on the primary outcome, Vineland-II 2DC.
For CGI-I, clinical response was defined as a CGI-I score of 1 (very much improved) or 2 (much improved). For the Vineland-II, cut-offs for the minimal clinically important difference (MCID) in pediatric populations may vary by age. In adults, the Vineland-based MCID was set at 4 or 6 points based on prior work and clinician consultation [32]. Approximately 18% of participants receiving placebo across the three studies had a clinically significant response (CGI-I score of 1 or 2). In VANILLA, 37.9% of adult participants receiving placebo met the MCID criterion of ≥4 points on the Vineland-II composite score at Week 12, and in V1aduct, 48.5% met the MCID criterion of ≥6 points on the Vineland-II 2DC at Week 24 [5,6].
Among participant-related factors, higher baseline Vineland-II 2DC, i.e., better adaptive functioning, predicted reduced placebo response. This may be consistent with other studies that have demonstrated higher rates of placebo response with increased symptom severity in ADHD and in hyperactivity associated with ASD [24,33]. However, our finding of decreased placebo response in individuals with better adaptive functioning contrasts with other investigations linking increased placebo response to lower symptom severity in major depressive disorder, anxiety disorders, and several medical areas [34,35]. This raises the possibility that individuals with higher baseline adaptive functioning have less measurable room to improve upon already established adaptive skills [36]. Additionally, skills at more advanced levels are complex and may take longer to develop than a 6-month trial allows. Because adaptive function is a dimensional construct, raw scores on the Vineland-II 2DC vary by age and improve over time [37]. This may add noise to analyses of placebo response, at least with respect to adaptive functioning.
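The responder rates above are simple proportions; the sketch below shows the arithmetic in Python. The input values are made up for illustration and are not trial data. Note that the CGI-I responder definition used here (score of 1 or 2) is stricter than the CGI-I ≤3 outcome used in the predictor analyses.

```python
def responder_rate(changes, mcid):
    """Percentage of participants whose Vineland-II change from baseline is >= mcid points.
    Participants with a missing change (None) are excluded from the denominator."""
    valid = [c for c in changes if c is not None]
    return 100.0 * sum(c >= mcid for c in valid) / len(valid)


def cgi_i_responder_rate(cgi_i_scores):
    """Percentage with CGI-I of 1 (very much improved) or 2 (much improved)."""
    return 100.0 * sum(s <= 2 for s in cgi_i_scores) / len(cgi_i_scores)


# Hypothetical placebo-arm values, for illustration only.
changes = [5, 3, 6, -2]
print(responder_rate(changes, mcid=4))    # 50.0
print(responder_rate(changes, mcid=6))    # 25.0
print(cgi_i_responder_rate([1, 2, 3, 4, 4]))  # 40.0
```

The same change scores yield a lower responder rate under the 6-point MCID than under the 4-point MCID, so rates reported against different thresholds (as in VANILLA and V1aduct) are not directly comparable.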
A previous study assessing the use of citalopram for ASD identified that worse adaptive functioning (as measured by the Vineland-II Socialization domain only) predicted greater placebo response, a parallel to the results presented here [24].
Depression and ADHD comorbidities were associated in this study with increased placebo response. It is possible that individuals with comorbidity had positive past treatment experiences in managing their comorbidities that contributed to expectation bias and, subsequently, a higher placebo response [38]. Furthermore, participating in a clinical trial may encourage greater adherence to all concomitant medications, in addition to the study treatment, resulting in better outcomes. Though the presence of psychiatric comorbidity could simply be an indicator of higher overall impairment, it is notable that placebo response has been observed in both depression and ADHD randomized controlled trials. Importantly, depression is an episodic disorder more prone to spontaneous remission [39,40]. Depression and ADHD symptoms may also contribute to impairment in adaptive function, and improvements in these co-occurring conditions would be expected to manifest as better Vineland-II 2DC performance.
Higher BMI predicted reduced placebo response in the Week 24, but not the Week 12, analysis cohort. BMI was previously shown not to be a significant predictor of placebo response in a meta-analysis of 86 randomized ASD pharmacologic or dietary supplement placebo-controlled trials [7]. However, as average BMI varies with age and sex, differences in baseline characteristics between studies may have led to this inconsistency.
Higher baseline RBS-R ritualistic and RBS-R compulsive scores were associated with increased placebo response. These findings are unexpected, as we would anticipate that individuals with less severe ritualistic or compulsive behavior would be less resistant to change. In contrast, RBS-R sameness, RBS-R restricted, and RBS-R repetitive scores had no significant influence on placebo response. Another possible explanation may lie in the statistical properties of the variables. Specifically, baseline RBS-R subscales were not correlated with other predictors, but correlations among the different RBS-R subscales were high (Fig. S5). Any statistical effect may therefore be distributed across the correlated subscales. Further research will be required to understand the mechanistic and clinical reasons underlying these findings. To our knowledge, no studies have identified a correlation between baseline severity on RBS-R scales and placebo response.
Among site/protocol-related factors, and in line with previous literature [41,42], placebo response was more likely at commercial than at academic sites. This could be due to differences in the participant populations or in the expertise at the respective sites [43]. Commercial sites may rely more heavily on study-specific recruitment efforts and advertisements, which may generate more expectation bias. Conversely, participants and their families at specialized academic sites may have more research familiarity and a more sophisticated understanding of the importance of objective reporting, and hence be less prone to placebo response. Likewise, academic investigators may be more experienced in working with autistic individuals and better able to mitigate expectation bias [43]. Longer trial duration predicted decreased placebo response, in agreement with previous studies in which shorter trial duration predicted higher placebo response [16].
Consistent with findings in a meta-analysis of 421 antidepressant trials [44], a higher dropout rate per site across the balovaptan trials was associated with increased placebo response. Dropout rate was used as a post-hoc factor acting as a proxy for unmeasurable features of site management and participant-related factors (e.g., expectation, heterogeneity, proximity to the site). It is possible that sites that better prevent dropout also set more modest expectations of potential benefit, leading to less disappointment if improvement is not seen. Sites with a higher dropout rate may have set greater response expectations, leading both to higher placebo response and to a greater likelihood of participants dropping out if a response is not seen. Participants who complete studies may also have stronger ties to the recruitment site and, given their past experiences as patients and research participants, may be less prone to placebo effects. Similarly, in trials with a lower dropout rate, raters may be more familiar with the participant and better able to provide consistent and accurate ratings, particularly at study entry. Fewer predictors and a lower overall predictor effect were identified in the Week 24 versus the Week 12 cohorts, suggesting that placebo response may be less likely in longer trials. This may indicate a learning curve for participants and their support providers in observing and reporting adaptive skills as measured by the Vineland-II 2DC. In parallel with our findings, shorter trials have been shown to increase placebo response in depression trials [45], although this has not previously been observed as a predictive factor in ASD trials [7].
Interestingly, although both Vineland-II 2DC and CGI-I were subject to placebo response in the balovaptan trials, the predictors of placebo response identified for the Vineland-II 2DC were not replicated for the CGI-I. While the scales share some similarities, the Vineland-II 2DC is specific to caregiver-reported adaptive functioning, whereas the CGI assesses global improvement through clinician observation [32,46]. Furthermore, because of the categorical structure of the CGI scale, modest improvements on the more dimensional Vineland-II 2DC may not be reflected in the CGI. Caregiver ratings have previously been reported to be more subject to placebo response than clinician ratings in ASD trials [7,47]; however, another meta-analysis reported the opposite [25].
Participants' treatment expectations are a known mediator of the placebo response [48]. Participants and their families in the aV1ation and V1aduct trials may have been impacted by press releases highlighting balovaptan's FDA breakthrough therapy status [49]. Within the balovaptan trials, steps were taken to reduce treatment expectation such as specific participant, caregiver, and site training on placebo response (given by an independent provider), with the goal of managing expectations. Additionally, within aV1ation and V1aduct trials, central review and consistency checks of Vineland-II administration were performed, which may have mitigated placebo response. Prior studies have emphasized the importance of reducing placebo response by modifying the design of future trials [7,47]. A recent meta-analysis of 122 major depressive disorder clinical trials indicated that adapting trial methodology to reduce placebo response, e.g., sequential parallel comparison design and placebo lead-in phases, may be less beneficial than focusing on other factors, such as enhanced participant selection, rater training, and treatment adherence monitoring [50]. The desired participant profile may vary depending on the outcomes and target symptoms being explored, requiring a balanced study-specific approach when considering participant-related factors. For example, a participant with higher symptom severity may have more room for improvement on the outcome measure of interest but greater probability of treatment resistance that could interfere with response.
An important potential modification to trial design is the development of biomarkers as outcome measures [13,51,52]. Clinician- and caregiver-rated scales can be highly subjective, unintentionally biased, or fail to comprehensively assess symptoms, particularly in heterogeneous conditions such as ASD [13,53]. It would therefore be beneficial to develop and use objective, robust, and quantifiable outcome measures, including fine-grained observation of behavior such as eye tracking, or biomarkers such as electroencephalography [12,51]. Several initiatives for the development of ASD biomarkers are ongoing, including the Autism Biomarkers Consortium for Clinical Trials [13]. Additionally, biomarkers may indicate distinct neurobiological drivers that could help refine participant selection based on known drug interactions [54].
Strengths of our analysis include the long (6-month) duration of two of the studies, which enabled analyses at both Week 12 and Week 24. The studies are also harmonized in their design (i.e., similar inclusion/exclusion criteria, common baseline measures, and common outcome measures), enabling pooling of participant-level data for a more robust analysis than prior meta-analyses that compare study results [7,25]. Furthermore, the diversity of sites and inclusive entry criteria make the findings more readily generalizable to a clinically identified autistic population. The analysis also used robust methodology, combining data-driven results with clinically driven insights, and the model and code developed herein are fully reproducible and can be readily repeated or adapted by other researchers to identify placebo response predictors. In contrast to other efforts, we also reported operations and goodness-of-fit statistics [55].
Study limitations include the fact that evaluation of placebo response factors was not an a priori goal of the balovaptan trials. Greater expectation bias may be observed in later-phase trials because of previous positive results. The pooled cohort spans two phases of the balovaptan clinical development program (phase 2 aV1ation and VANILLA, and phase 3 V1aduct), and greater expectation bias in the V1aduct trial may have skewed results toward increased placebo response [5]. The numbers of sites in the US and Europe were also unequal, making it difficult to differentiate the impact of each region on placebo response. Furthermore, we were unable to assess all variables identified in the conceptual model (Fig. 1B), as some data were unavailable or challenging to quantify. In addition, the overall R² for the analyses was low (18% for the Week 12 analysis and 21% for the Week 24 analysis); however, this reflects the current limited understanding of placebo response in ASD. In line with this, ASD comparator studies are lacking, and none to date has reported R² [7]. At present, no datasets are available to test the external validity of our results; this may become possible as more research is carried out on placebo response. Finally, findings on placebo response for the Vineland-II 2DC may not generalize to other outcomes, including self-report measures.

CONCLUSION
Our findings have identified several predictors of placebo response that can be further validated and ultimately considered for mitigating placebo response in future ASD trials. The application of our novel statistical methodology and associated findings may extend beyond ASD and contribute more broadly to psychiatric clinical trials. Better understanding of factors influencing placebo response may improve trial methods and lead to the development of efficacious therapies in ASD and other neuropsychiatric conditions.

DATA AVAILABILITY
For up-to-date details on Roche's Global Policy on the Sharing of Clinical Information and how to request access to related clinical study documents, see here: https://go.roche.com/data_sharing. Requests for data underlying this publication require a detailed, hypothesis-driven statistical analysis plan that is collaboratively developed by the requestor and company subject matter experts. Such requests should be directed to datarequest.autism@roche.com for consideration. Anonymized records for individual patients across more than one data source external to Roche cannot, and should not, be linked due to a potential increase in risk of patient re-identification.