Introduction

Considerable variation exists in the reported usual management (“standard of care”) of pediatric conditions between centers.1 Variability in practice is not uncommon, and may be due to several factors including differences in hospital budgets, physician and patient preferences, and the adoption of new treatment guidelines and technologies at variable rates.2 In light of this, for scientific reproducibility to be possible, it is essential that researchers provide adequate descriptions of “standard of care.” The use of the term “standard of care” (or treatments-as-usual, usual care, regular treatment, or existing practice) to describe control groups has been previously investigated.3,4,5,6,7 These studies have demonstrated that what constitutes “standard of care” is often unclear.3,4,5,6,7 An example of deficient descriptions of “standard of care” control arms was demonstrated in a previous evaluation of behavioral interventions for youth with type 1 diabetes.8 Control arm descriptions of the three trials included a “traditional” treatment involving routine access to a clinical psychologist and social worker,9 “traditional” treatment with the ability to contact a nurse or physician by phone,10 and “usual care.”11 Thus, it becomes impossible to compare these three trials accurately to assess superiority of the different behavior change interventions studied if the “standard” or “traditional” care is often completely undefined.

When authors fail to adequately describe a control arm, the interpretation of results, pooling of data in systematic reviews, replication of findings, and clinical application by physicians and patients are not possible.11,12,13,14,15 Ultimately, this contributes to increasing worldwide waste in clinical research resources.12,13,14,15 To date, this source of research waste has not been quantified in pediatric trials. This study aims to investigate the extensiveness of the reporting of the “standard of care” control arm compared to the intervention arm within the same published pediatric clinical trial. An exploratory second objective is to describe the clinical and methodological characteristics of these trials, and to evaluate whether the presence of certain characteristics is associated with more complete reporting.

Methods

Search strategy

To identify prospective controlled pediatric intervention trials, a search strategy validated by experienced research librarians at The Hospital for Sick Children was used. The search was limited to publications from the year 2014, with MeSH and non-MeSH terms relevant to pediatric clinical trials and study protocols. The study search strategy is presented in the Supplemental Materials S1.

Screening

Title and abstract screening was conducted by two independent reviewers. Abstracts pertaining to prospective intervention trials in pediatric patients were selected for full-text assessment. Criteria for inclusion were full-text prospective intervention trials with pediatric patients aged 18 years or younger, or with a mean study population age <21 years at the time of recruitment. Included studies were those that reported use of a “standard of care” control arm, or similar terms such as “usual,” “standard,” or “regular” treatment. Randomized and non-randomized trials were included. Trials without a control arm, without patients aged 18 years or younger, or with a mean age >21 years were excluded. Study protocols and systematic reviews were excluded. Institutional ethics committee approval was not required as all data were extracted from published literature.

Data extraction

Data from included full-text articles were extracted and managed using the REDCap (Research Electronic Data Capture) tool hosted at The Hospital for Sick Children.16 The following data were extracted independently in duplicate by two investigators: population, age group, disease area, intervention, control intervention, presence of study blinding, location of trial recruitment, reporting of study funding and source, reporting of a clinical trials registry, number of study sites, publishing journal, publishing journal impact factor, sample size, terms used to describe “standard of care” (usual treatment, regular care, etc.), and randomization. Interventions were classified as: drug, device, surgery or radiotherapy, behavioral/rehabilitation or psychosocial, vaccine, natural health product, communication/organization or education, prevention or screening, complex intervention, and other.17 Age groups of participants involved were classified as preterm neonate (birth at <37 weeks' gestation), term neonate (birth to 28 days), infant (29 days to <1 year), toddler (1 year to <2 years), early childhood (2–5 years), middle childhood (6–11 years), early adolescence (12–18 years), late adolescence (19–21 years), adult (over the age of 21 years), or unspecified.18, 19 Preterm neonates are all children born <37 weeks regardless of their current age; “infants” comprise all children 28 days–1 year chronologic age regardless of their gestational age.

Impact factors for journals were obtained from CiteFactor’s Impact Factor List 2014.20 Impact factors for journals not indexed on CiteFactor were obtained from Thomson Reuters 2014 journal citation reports.20

Reporting assessment using the TIDieR checklist

Assessments were conducted in duplicate by two independent authors using a modified 12-item TIDieR (Template for Intervention Description and Replication) checklist.21 To use the TIDieR checklist for our objectives, we separated the “who provided” treatment item (TIDieR #5) to include whether the authors described (a) who provided the therapy and (b) any specific training given to those administering the therapy. We added whether or not justification criteria in the form of reference(s) were provided to support the choice of intervention or control. We also combined adherence (planned; TIDieR #11 and actual; TIDieR #12) with modifications to the intervention (TIDieR #10). This item was scored as 1 (present) if (a) authors explicitly stated the presence of any deviation or modification from an a priori study design, or (b) authors explicitly stated there was no deviation or modification from a study protocol or a priori study design. This item was scored as 0 (not present) if there was no mention of adherence or modification to a study protocol or no mention of fidelity to the a priori study design. After reviewing our first few studies and piloting our data extraction form, we did not find that many studies distinguished planned or actual variability in adherence or therapy modifications. These changes were made specifically to gather more in-depth information about why the “standard of care” was selected and about what training was necessary to administer the therapy, which was felt to be particularly important for behavioral and organizational/educational interventions and to foster replicability.

Within each trial, modified TIDieR checklist items were assessed for both (1) the intervention arm and (2) the “standard of care” control arm. The same checklist was used to assess the reporting for both arms. The number of reported checklist items was recorded. Items were marked as 1 (item reported) or 0 (item not reported), with a maximum of 12 reported items for each individual trial arm (intervention and control). An item was marked as reported if sufficient details according to the TIDieR checklist were provided in the publication. Items were marked as incomplete if they were partially present and did not report key TIDieR elements.21 The sum of the total number of reported TIDieR checklist items was used to quantify the overall extensiveness of reporting for a trial, with a maximum of 24 total reported items per trial (12 per study arm). For trials evaluating more than one intervention, the number of reported TIDieR items of all experimental arms was averaged. For trials with adjuvant therapy in addition to “standard of care,” descriptions of “standard of care” were scored separately. Placebo arms were excluded from all analyses. Scores of the two investigators were compared for each study. If consensus could not be obtained between the two assessors, a senior author was consulted for agreement. For trials that indicated components of study methods were “reported elsewhere,” or were noted to be found in supplementary materials (i.e., published study protocol), those materials were obtained and included in all extensiveness of reporting analyses.
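The scoring scheme described above can be illustrated with a minimal Python sketch. The function names and the example trial are hypothetical and are not taken from the study data. The sketch assumes only what the text states: each arm is scored on 12 binary items, scores of multiple experimental arms are averaged, and the trial total is the sum of the intervention and control scores (maximum 24).

```python
from statistics import mean

N_ITEMS = 12  # modified TIDieR checklist items per arm


def arm_score(items):
    """Count reported items for one arm; each item is 1 (reported) or 0 (not reported)."""
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(items)}")
    if any(v not in (0, 1) for v in items):
        raise ValueError("items must be scored 0 or 1")
    return sum(items)


def trial_scores(intervention_arms, control_arm):
    """Return (intervention score, control score, trial total).

    intervention_arms is a list of item lists, one per experimental arm.
    When a trial has several experimental arms, their scores are averaged.
    """
    intervention = mean([arm_score(a) for a in intervention_arms])
    control = arm_score(control_arm)
    return intervention, control, intervention + control


# Hypothetical trial with two experimental arms and one control arm.
arm_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]    # 9 items reported
arm_b = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]    # 7 items reported
control = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # 5 items reported

intervention_score, control_score, total = trial_scores([arm_a, arm_b], control)
print(intervention_score, control_score, total)
```

Placebo arms would simply be left out of `intervention_arms` and `control_arm`, consistent with their exclusion from all analyses.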

Analysis

The full dataset is available from the senior author upon request. We counted the number of items reported on intervention and “standard of care” and compared these counts across all studies included. Statistical analysis was conducted using IBM SPSS Statistics version 20 software.22 Categorical data were Funding declared, Randomized, Described as blinded, and Reported a clinical trial registration number; continuous data were Sample size (total study sample size, control group sample size), Journal impact factor, Total number of TIDieR items reported, and Number of study sites. Categorical data were analyzed using contingency tables and two-tailed Fisher's exact analysis. As an exploratory analysis to identify factors which correlated with more complete reporting, multivariate linear regression was used to model the following covariates' relation with reporting score in control arms, intervention arms, and overall study scores: reporting of blinding of the intervention, reporting of study funding, reporting of registration on a clinical trials registry, reported number of study sites, journal impact factor, total study sample size, and reporting of study randomization. The association between the total extensiveness of reporting scores of “standard of care” control arms and intervention arms within the same trial was tested with a Pearson's correlation coefficient and two-tailed test for significance.
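For readers unfamiliar with the two tests named above, the following Python sketch shows how a two-tailed Fisher's exact p-value and a Pearson correlation coefficient can be computed from first principles. It is an illustration only: the analysis reported here was run in SPSS, and the function names, the 2x2 table, and the score lists below are hypothetical. The sketch does not compute the significance test for the correlation, which requires the t distribution.

```python
from math import comb, sqrt


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum probabilities of all tables no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical 2x2 table: item reported (yes/no) by arm (intervention/control).
p_value = fisher_exact_two_tailed(3, 1, 1, 3)

# Hypothetical per-trial reporting scores for control and intervention arms.
control_scores = [5, 6, 4, 7, 6]
intervention_scores = [8, 9, 8, 9, 7]
r = pearson_r(control_scores, intervention_scores)
print(p_value, r)
```

In practice a statistics package would be used for both tests; the hand-written versions are shown only to make the calculations explicit.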

Results

The MEDLINE (31 December 2013 to 01 January 2015, Ovid interface) search yielded 11,947 publications. Following title and abstract screening, potentially relevant studies pertaining to prospective intervention trials in children were selected for full-text assessment (N = 3375). Following full-text assessment, 214 studies were included for analysis in this study, representing a total of 91,961 pediatric clinical trial participants. A PRISMA study information flow diagram is presented in Fig. 1.23 Extensiveness of reporting scores and selected study characteristics are presented in Table 1; a full description of study characteristics is presented in Supplemental Materials S2. The median study sample size was 100 (range 11–7744) patients. A variety of terms were used to describe “standard of care” control arms, including: standard (41%), common (26%), conventional (10%), routine (8%), regular (7%), and other (8%), including traditional, gold standard, normal, typical, or expectant care.

Fig. 1 PRISMA flow diagram

Table 1 Select characteristics of included studies and extensiveness of reporting scores

There were nine modified TIDieR checklist items that were significantly more frequently reported in the intervention arm (Table 2). The mean number of reported TIDieR checklist items was 5.81 (SD 2.13) for “standard of care” control arms and 8.45 (SD 1.39) for intervention arms. The percentage of studies reporting each modified TIDieR checklist item for the intervention arm and for “standard of care” control arm can be found in Fig. 2.

Table 2 Modified TIDieR checklist21 used to evaluate the extensiveness of reporting for pediatric trials with standard of care control arms
Fig. 2 Percentage of pediatric studies (n = 214) reporting each modified TIDieR checklist item for the intervention arm and for the “standard of care” control arm

Exploratory analyses

We found no significant association between the total number of reported TIDieR checklist items of control and intervention arms within the same trial, with a Pearson's correlation of 0.090 (p = 0.189). The multivariate linear regression model (Table 3) produced an adjusted R2 value of 0.044 for prediction of reported checklist items for the control arm (p = 0.043), and an adjusted R2 value of 0.013 for the intervention arm (p = 0.247). Reporting of study funding yielded a significant increase in the number of reported checklist items for the intervention arm (p = 0.031), but not for the control arm. A greater number of study sites predicted fewer reported items for control arms (p = 0.022). Reporting of a clinical trials registry, presence of blinding, journal impact factor, total study sample size, and presence of study randomization did not predict the number of reported checklist items for “standard of care” control arms or intervention arms (p > 0.05).

Table 3 Multivariate linear regression of study characteristics

Discussion

In this study, the reported descriptions of “standard of care” control arms were more often incomplete and lacking detail than the reporting of intervention arms within the same study as measured by the TIDieR checklist. Numerous trials included in this study did not report adequate information for replication of the control arm. For example, a “standard of care” control arm was used in a randomized trial of a psycho-educational intervention in pediatric cancer patients24 and a cluster randomized trial of various school-based physical activity interventions.25 These trials reported only that “usual care” was used as a control; the actual name of the control arm intervention, its rationale, procedures involved, mode of delivery, location, and timing of administration were not described in either study, making replication and interpretation challenging and implementation near impossible.26

Deficiency in reporting details of “standard of care” control arms in trials presents a serious issue, as the level and quality of care received by control arm participants can significantly impact the results of clinical trials. Without a true understanding of the control arm intervention, biased interpretation of reported safety and efficacy effect measures may occur. For instance, an intervention compared with a highly effective “standard of care” control arm may yield a smaller effect size than the same intervention compared to a less effective “standard of care” control arm.26 Thus, it is difficult to interpret study results and ascertain whether a highly positive outcome is the result of an efficacious intervention, or a poorly selected “standard of care” control arm.3, 6 The results of our study highlight the necessity for researchers to systematically and transparently describe the interventions used in control arms, and to further justify their decision in selecting the chosen control arm.14

A similar issue can arise with multicenter research designs. In our study, we observed that the presence of more study sites predicted fewer reported TIDieR checklist items for the control arm. With inadequate reporting, it becomes difficult to ascertain whether each of the sites contributing to a single trial was able to provide equivalent care, and whether the reported treatment effect size is valid. This adds to the many challenges multicenter trials face, such as maintaining intervention fidelity, participant retention, and protocol adherence, especially in trials that span countries or even continents.27 Not all “standard of care” interventions lack specificity; for example, an intervention which is the sole gold standard treatment for a condition may not require an extensive description for accurate replication across multiple sites and studies. In our study, only 2/98 (2%) of the multicenter trials we reviewed commented on limitations in ensuring equivalent care was provided across sites for their standard care control arm. These findings suggest that a multisite setting should alert both researchers and publishers to provide information on how the “standard of care” was consistently implemented across sites to improve interpretability of results.

We found no association between the number of TIDieR items reported in intervention arms and control arms from the same trial. This seems surprising, but could have resulted from oversight or from word limits that leave too little space to describe the control arm. Reporting of a funding source was associated with more reported TIDieR checklist items for intervention arms, but not with more complete reporting of the “standard of care” control arm. This may result from direct funding incentives for evaluation of the intervention warranting thorough descriptions of the intervention without the same consideration for the control arm.

Omission of fundamental information about study interventions has been identified as an important and avoidable contributor to the worldwide waste in research.12, 13 In an analysis of randomized trials of non-pharmacological interventions published in 2009, researchers found that out of 137 interventions, only 39% were adequately described.28 In an examination of the web-based publishing instructions to authors of 106 journals, only 14% specifically cited the reporting of interventions.29 It is clear that researchers and publishers must work together to demand comprehensive and transparent reporting of all aspects of intervention and control arms in order for study results to be generalized or incorporated into clinical practice. Inadequate reporting can result in wasted research funding, time, and health-care resources, and an inability to incorporate research into synthesized data (systematic reviews, meta-analyses, clinical practice guidelines).12, 15 Ioannidis et al.30 describe the extensive resources involved in conducting a single study, including trial costs of paying researchers, using research infrastructure, the numerous human resources involved for a trial to occur (ethics review board, grant committees), and the invaluable time of researchers and study participants that may have been spent elsewhere.30 Overall, our findings highlight the need for higher standards for reporting of control groups, for researchers to justify their selection of trial control arms, and for publishers to demand more complete reporting of control arms prior to publication if research waste is to be reduced.14

Limitations

To our knowledge, this is the first time the TIDieR checklist has been used to quantitatively assess extensiveness of reporting of intervention and “standard of care” control arms within the same study. While the TIDieR checklist was not designed or validated for our specific study purpose, we believe comparing the number of items satisfied by the control and intervention arm descriptions within the same studies allowed us to avoid assigning an arbitrary number of reported items to represent an “adequate description”, which would be outside of the tool's intended purpose. We believe describing the difference in the number of items on the TIDieR checklist satisfied between control and intervention arm descriptions within the same studies is an appropriate use of the TIDieR checklist. Arguably, not all items are equally relevant for interpretation of control and intervention arms. We note that TIDieR was not designed to differentiate between control and intervention arms. In our study, we used the same items to score control and intervention arms, without weighting items differently according to their relevance to each arm. While we found that 9 out of the 12 items were significantly (p < 0.05) more frequently reported for the intervention arm, we cannot state whether the reporting of intervention arms was of adequate quality to begin with, as there is no gold-standard grading or cut-off definition for “adequate reporting” using this tool. This is a current limitation of the tool itself, and further work is necessary to develop an accepted minimal definition for adequate intervention or control arm description.

An additional limitation of this analysis is the lack of assessment of the impact of journal word limits for clinical trials, which may force authors to make decisions about which details are most important to report. Future research should compare reporting completeness according to journal page limit restrictions. Furthermore, while this study is focused on reporting, we were unable to distinguish problems of study design from deficiencies in reporting. For example, if usual care in a trial is not “protocolized,” detailed reporting of the range of usual care practices that occur in the comparison arm of a trial might make the study reporting better, but would not necessarily address the design issue that may jeopardize the validity of the trial.

Conclusion

In clinical trials, “standard of care” or similar terms are inadequate descriptors of control arm interventions, since “standard of care” can vary greatly between countries, sites, physicians, and through time. Poor reporting of intervention and control arms contributes to research waste, including the possibility of false positives, false negatives, or conflicting conclusions regarding interventions, which can impair clinical practice, the conduct of meta-analyses, and the updating of clinical practice guidelines. In the short term, requiring journal authors to adhere to reporting guidelines like TIDieR for all study arms, including “standard of care” control arms, may aid in reducing this research waste and increasing research impact.