Main

Biopsy for pharmacodynamic (PD) and biomarker analysis is increasingly common in early-phase cancer trials (Twelves, 2006; Goulart et al, 2007). In principle, PD end points can provide evidence of target effects for a drug, and support decision making for subsequent trials (Workman, 2003; Sarker et al, 2007; Sarker and Workman, 2007; Tan et al, 2009). However, many PD studies require invasive procedures like tumour biopsy.

Studies find that many patients are willing to undergo research biopsy (Seah et al, 2013) and that ethics review committees and oncologists may overestimate patient anxiety associated with biopsies (Agulnik et al, 2006). In one study, overall and major complication rates for tumour biopsies were 5.2% and 0.8%, respectively (Overman et al, 2012). However, the majority of patients describe their biopsies as being painful (Agulnik et al, 2006) and other studies indicate that 10% of patients receiving one common procedure – breast tumour biopsy – report moderate-to-severe pain (a more extended discussion of tumour biopsy risk and burden is available at Brown et al (2008); Hemmer et al (2008); Kimmelman et al (2012)). As biopsies often have no value for subjects in terms of clinical management, their ethical justification rests on an expectation that their performance will be redeemed by the value of the knowledge accrued (Olson et al, 2011).

Given that the burdens of such procedures are well understood, debates concerning their application revolve around conflicting views about the scientific utility of PD evidence. Some commentators question whether research biopsies return sufficient knowledge to justify their risks (Dowlati et al, 2001; Parulekar and Eisenhauer, 2004; Davis et al, 2005; Goulart et al, 2007; Ratain and Glassman, 2007). Such critics describe research biopsies as ‘taking without giving in return’ and an ‘expensive distraction’ (Helft and Daugherty, 2006; Ratain and Glassman, 2007; Olson et al, 2011). One critic argues, ‘given that biomarker support of mechanism, or lack thereof, has not contributed to go/no-go decisions in practice, sponsors should reconsider the value of including any biomarker evaluations in phase I oncological studies’ (Ratain and Glassman, 2007). Others insist that the procedures are safe and feasible, and stress the importance of gathering mechanistic evidence in drug development; defenders point to examples where enrichment trials involving biopsy enabled rapid translation of cancer strategies (Kelloff and Sigman, 2005; Agulnik et al, 2006; Cannistra, 2007; Brown et al, 2008; Peppercorn et al, 2010).

Such debates are hampered by a paucity of systematic evidence concerning the knowledge value of PD studies. In part, this reflects the fact that there are no widely accepted measures of knowledge value. In this report, we sought to highlight measures that could improve the risk/knowledge value of tumour biopsies and associated PD analyses. In particular, we measured two objective proxies of knowledge value: reporting practices and outcome diversity. In order for ‘knowledge value’ to accrue, scientific findings must be reported in sufficient detail to permit readers to form or update beliefs. They must also enable others to reproduce findings in studies addressing similar questions. For the first proxy, we measured the extent to which publications reported on study elements that were viewed as important in similar studies – those involving tumour prognostic biomarkers. The second proxy builds on the premise that a population of studies is more informative when it reflects a diversity of outcomes for tested hypotheses. Pharmacodynamic studies generally set out to test well-formulated hypotheses about specified target effects. A finding that hypothesised target effects are almost always confirmed in a population of PD studies would suggest publication bias, limited information gain (as outcomes were predicted in advance of the PD study), or both. Our studies highlight the potential value of reporting standards for PD studies in cancer.

Materials and methods

Our primary objective was to describe the reporting practices in a convenience sample of recent invasive PD studies embedded within cancer trials. Our secondary objectives were to measure diversity of study outcomes and to identify characteristics of studies that correlate with better reporting.

Sample

Our study utilised a convenience sample of studies involving tumour biopsy. To capture a sample of studies that involved PD analyses and invasive tissue procurement while excluding the very large volume of studies involving minimally invasive collection (for example, venipuncture), we devised a search strategy that was highly specific. Briefly, we used keywords like ‘biopsy’ and ‘pharmacodynamic’ to search PubMed for articles published from 2000 to 2010 (inclusive) reporting on the use of invasive, non-diagnostic tissue procurement in cancer trials. We excluded articles where (a) non-diagnostic status of tissue procurement was ambiguous; (b) biopsy was not performed; (c) trials did not involve cancer patients; or (d) tissue procurement was minimally invasive (for example, venipuncture). Our search methods are described in greater detail elsewhere (Freeman and Kimmelman, 2012). After an initial screening by title and abstract, eligibility was confirmed using the full report.

Extraction elements

We developed a data extraction form for assessing study reporting and outcomes. Our form (Appendix 1) covered three domains: (1) study characteristics (for example, the year of publication, phase of trial, drug identity); (2) PD study practices and reporting (for example, description of assays, patient flow through study, use of blinded analysis); and (3) study outcomes (for example, confirmation status of PD hypotheses, author conclusions).

Elements within the second domain were adapted from the REMARK criteria (McShane et al, 2005) and supplemented with items described in Eisenhauer et al (2006). Extraction elements and coding conventions were initially developed by JK, and then discussed, refined, and approved by JD and JGM. After piloting extraction against 15 studies, we refined our form and coding criteria.

Extraction

All articles were extracted using paper forms by two reviewers (GF and JK), each blinded to the other’s extractions (but not to author identities). We interpreted the absence of an affirmative practice statement as the absence of that practice (that is, studies not reporting blinded assessment were coded as not having implemented blinded outcome assessment). Studies were classified as implementing mandatory biopsy when this was explicitly stated in the report or when tissue samples were collected from all subjects. Data from extractions were entered into an Excel spreadsheet for analysis. Inter-rater agreement was calculated using Cohen’s κ; values exceeded 0.8, which we considered ‘good agreement’ (Fleiss, 1981; Toulmonde et al, 2011). Disagreements were resolved through discussion.
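
The agreement statistic named above can be illustrated with a minimal sketch. It assumes two reviewers coding the same items with categorical labels; the function name `cohens_kappa` and the example codes are hypothetical and are not taken from the study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items with categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which the two raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over codes.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for one extraction item across ten articles (illustration only).
reviewer_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
reviewer_2 = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))
```

In this made-up example the reviewers agree on nine of ten items, but κ falls below the raw agreement rate because part of that agreement is expected by chance.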

Reporting score

We developed a reporting score (RS) in order to explore the range of reporting quality, and to enable a series of tests concerning relationships between study characteristics and reporting. Our score was modelled after those used for prognostic tumour biomarker studies and randomized trials (Lai et al, 2006; Kyzas et al, 2007; Rios et al, 2008; Toulmonde et al, 2011) and was developed through discussions with all authors. It consisted of eight reporting domains: (1) goal and hypothesis; (2) subject eligibility; (3) specimen characteristics; (4) assay protocol; (5) statistics; (6) subject flow; (7) results; (8) discussion. Domains contained one or more evenly weighted reporting variables. Reporting on any item within a domain would result in a fractional score and each domain had a potential score of one. Scores in each domain were summed to calculate an overall RS for each study.
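
The scoring rule described above (evenly weighted items within a domain, a maximum of one point per domain, and domains summed) can be sketched as follows. This is an illustration under stated assumptions: the domain names come from the text, but the number of items per domain and the example article are hypothetical.

```python
# The eight reporting domains listed in the text.
DOMAINS = (
    "goal and hypothesis",
    "subject eligibility",
    "specimen characteristics",
    "assay protocol",
    "statistics",
    "subject flow",
    "results",
    "discussion",
)


def reporting_score(items_by_domain):
    """Sum per-domain fractions of reported items; each domain contributes at most 1."""
    total = 0.0
    for domain in DOMAINS:
        items = items_by_domain[domain]  # list of booleans, one per reporting item
        if not items:
            raise ValueError(f"domain {domain!r} needs at least one item")
        # Items are evenly weighted, so the domain score is the fraction reported.
        total += sum(items) / len(items)
    return total


# Hypothetical article; item counts per domain are assumptions for illustration.
example_article = {
    "goal and hypothesis": [True, True],
    "subject eligibility": [True, False],
    "specimen characteristics": [True, False, True, False],
    "assay protocol": [True],
    "statistics": [False, False],
    "subject flow": [False],
    "results": [True, True],
    "discussion": [True],
}
score = reporting_score(example_article)
print(score)
```

Because every domain is capped at one, a domain with many items does not outweigh a domain with a single item; this mirrors the uniform weighting that the Discussion flags as a limitation.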

Outcome reporting

Studies were assessed along three outcome categories. The first was results of hypothesis tests. Results were coded as positive where a treatment caused hypothesised changes in targets (for example, an increase in apoptosis assessed by TUNEL staining with a proapoptotic drug) and negative where hypotheses failed confirmation (but were not necessarily disconfirmed). As most studies tested many markers, we coded each report according to whether some, all, or no tested hypotheses were positive. The second outcome category was discussion of results in light of hypotheses. Discussions were coded as ‘positive’ when they indicated that PD results were consistent with the predicted molecular effects of the agent, as ambiguous where they gave no clear indication as to whether PD results supported the predicted effect of the agent, and as negative where they suggested that PD results did not support the predicted molecular effects. The third outcome category was discussion of results in light of future study planning. Discussions were coded as informative where PD results (whether themselves positive or negative) were said to inform planning of future studies, and as uninformative where they gave no clear indication of how PD results related to future investigations.
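
The first outcome category collapses several per-marker results into one report-level code. A minimal sketch of that rule, assuming each tested hypothesis is recorded as a boolean (True for positive); the function name and labels are hypothetical rather than the wording of the extraction form:

```python
def code_hypothesis_results(marker_results):
    """Collapse per-marker results (True = positive) into a report-level code."""
    if not marker_results:
        raise ValueError("a report must test at least one PD hypothesis")
    positives = sum(marker_results)
    if positives == len(marker_results):
        return "all positive"
    if positives == 0:
        return "none positive"
    return "some positive"


# Hypothetical report testing three markers, two of which behaved as hypothesised.
report_code = code_hypothesis_results([True, False, True])
print(report_code)
```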

In a post hoc analysis, we studied the effect of industry funding on PD outcome reporting, focusing on the proportion of positive assay results and the discussion of those results in light of hypotheses and planning for future studies. Fisher’s exact test of independence was used to calculate significance (McDonald, 2009).
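
For readers unfamiliar with the test, a two-sided Fisher's exact test on a 2×2 table can be sketched with the Python standard library alone. The two-sided rule used here (summing the probabilities of all tables no more likely than the observed one) is one common convention and is an assumption; the counts are hypothetical and are not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum over tables at least as extreme (no more probable) than the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts: rows = industry funded / not, columns = positive / not positive.
p = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p, 4))
```

The exact test is a reasonable choice for small cell counts, where the chi-squared approximation may be unreliable.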

Statistics

As this was an exploratory study, we used a convenience sample of PD studies rather than a prospectively determined sample size. We tested a priori-formulated hypotheses of correlation between RS and the following seven variables: (1) the year of publication; (2) public funding; (3) journal impact factor; (4) separate publication for PD results; (5) use of a non-novel test drug; (6) mandatory biopsy; and (7) author assessment of the trial outcome (negative outcome defined as studies recommending that further trials of the investigational agent should not be undertaken). Significance of relationships was tested using one-way ANOVA with SPSS software. We defined significance as P≤0.05. We did not correct for multiple comparisons.
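
The analysis itself was run in SPSS. As an illustration of what a one-way ANOVA computes, the following sketch derives the F statistic and its degrees of freedom for groups of scores. The function name and the scores are hypothetical; converting F to a P value needs the F distribution, which is omitted here.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA across the groups."""
    k = len(groups)
    n = sum(len(group) for group in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = fmean([x for group in groups for x in group])
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical reporting scores for studies with and without mandatory biopsy.
mandatory = [5.0, 6.0, 7.0]
optional = [4.0, 5.0, 6.0]
f_stat, df_b, df_w = one_way_anova_f([mandatory, optional])
print(f_stat, df_b, df_w)
```

With only two groups, as for a binary study characteristic, the one-way ANOVA F statistic equals the square of the equal-variance two-sample t statistic.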

Results

Sample

Our PubMed search produced a sample of 68 eligible articles reporting results from early-phase cancer trials utilising non-diagnostic biopsy for PD analysis (flow of articles is described in Figure 1; see Appendix 2 for an inventory of studies). Table 1 displays the characteristics of the trials in our sample; Table 2 reports biopsy characteristics within our sample. Ten studies in our sample (15%) actively reported safety events related to biopsy; of these, one reported a single adverse event at or above grade 3.

Figure 1 Diagram of flow of the published articles selection process.

Table 1 Characteristics of early-phase cancer trials included in sample (n=68)
Table 2 Biopsy characteristics of early-phase cancer trials included in sample (n=68)

Our sample captured a total of 2644 patients receiving invasive non-diagnostic biopsies. Although reporting of patient flow through PD studies was poor, we recorded author explanations for discrepancies between patients approached for biopsy, samples collected, and samples analysed. The most common reason for discrepancy was insufficient quality or quantity of sample for analysis (84%), followed by patient refusal (19%) and medical contraindication for biopsy (19%). Samples were also missed because of patient death (3%).

Reporting score

We calculated the RS for each article in our sample. Scores were centred around 5.5 (Figure 2). Some variables, like description of causal pathway and biopsy location, were consistently reported (Table 3). However, there was broad variation within specific domains of the RS. A fifth of articles did not report results for all PD analyses performed; 57% did not report the status of blinding for pathological analysis and 62% did not provide information about the dimensions of the biopsy sample.

Figure 2 Distribution of RSs for sample of early-phase cancer trials utilising biopsy for PD study.

Table 3 Reporting score (RS) outcomes (n=68)

Reporting predictors

The use of a non-novel study drug showed a positive but non-significant trend towards a higher RS (5.6 vs 5.2, P=0.219). Pharmacodynamics as a primary end point showed a significant positive relationship with RS (5.8 vs 5.1, P=0.04), as did the use of mandatory biopsy (5.9 vs 5.1, P=0.023). We found no relationship between RS and the year of publication, journal impact factor, funding source, or author assessment of the trial outcome.

Pharmacodynamic study outcome

The majority of articles (66%) reported some negative PD results and 10% of the articles reported all negative PD results. Fifty-six percent of studies reported at least one positive PD parameter. The majority of studies (61%) described their PD results as ‘positive’ in discussions (for example, PD results provided evidence of the investigational agent having intended effects on molecular targets).

A large majority of articles (78%) contained a discussion of PD results in relation to the direction of future studies. Among these, 72% discussed possible amendments to the conduct or direction of future studies based on the PD findings of the current study.

Industry funding vs results positivity

Industry-funded trials were more likely to report all or some positive PD results than non-industry-funded studies. No industry-funded trial reported all negative results for PD parameters tested. Trials with industry funding trended towards greater positivity in discussion both in terms of support for the predicted method of action of the drug (75% vs 53%, P=0.11) and planning for future studies (80% vs 67%, P=0.359).

Discussion

Biopsies for PD in anticancer drug trials are often burdensome and entail non-trivial costs. Justification of procedures rests on a favourable gain of scientific knowledge (Weijer and Miller, 2004). Poor PD reporting does not adequately redeem burdens and can produce biased findings that lead to unsuccessful clinical development (Tan et al, 2009). At present, there is little systematic evidence to inform the planning, implementation, and ethical evaluation of PD studies involving invasive tissue procurement.

Our study explored two relatively objective proxies of knowledge value in a convenience sample of PD studies using research biopsies. Encouragingly, a large fraction of studies reported tissue location, procurement method, and discussion of PD results. However, many important items were reported sporadically, including results of all planned tests, use of blinded histopathological assessment, biopsy dimensions, and description of patient flow through the PD portion of the trial. Previous studies of prognostic marker research reporting showed that over 90% of studies reported ‘positive’ outcomes (Kyzas et al, 2005; Kyzas et al, 2007). Disproportionate reporting of positive results was also observed in genetic association studies (Ioannidis et al, 2001). We entered this study expecting near-uniform positivity among PD reports. Instead, we found that two-thirds of articles contained negative outcomes, and a similar proportion described PD analysis as affirming hypotheses in discussion. This is evidence that PD is not characterised by overwhelming publication bias, and that results are not overdetermined at study inception. Nevertheless, the disparity between the fraction of studies reporting uniform positivity (34%) and the fraction reporting uniform negativity (10%) suggests, in our view, the presence of some bias. Whether this bias reflects publication bias or enhanced pre-test probability, we are unable to say. Analysis of positivity would be greatly aided if studies declared their primary hypothesis; the only instance where this occurred was in studies that reported only a single PD marker analysis. We further take the fact that a large fraction of PD studies were described as informing decisions for future studies as support for invasive PD evaluation. Future studies should investigate the fraction of PD findings that motivate actual new investigations.

Our study has several limitations. First, some might question the premises guiding our proxy indicators of knowledge value. Poorly reported studies can still hold value, and uniformly positive results can convert modest degrees of belief in drug effects into higher degrees of belief. Still, uniform confirmation would seem a modest gain of information for considerable burden. Second, some items in the RS, such as blinded outcome assessment, straddle ‘good reporting’ and ‘good methodological practice,’ and high quality reporting can mask poor methodological practice (Huwiler-Muntener et al, 2002; Toulmonde et al, 2011). Third, in line with the exploratory orientation, our study did not capture a comprehensive sample of studies involving research biopsies. A larger sample might have produced different findings and our sample may have been underpowered to detect relationships between study characteristics and reporting quality. Fourth, although our article points to ways that reporting of PD might improve, nothing in our premises, data, or analysis provides a clear basis for deciding whether current research biopsy and PD study practices meet an adequate threshold of knowledge value. Last, our RS scale should be interpreted with caution. It was not the result of a consensus building process (unlike CONSORT and REMARK) (Harris, 2005; McShane et al, 2005; Lai et al, 2006; ‘How CONSORT began’, 2008; Rios et al, 2008; Toulmonde et al, 2011). Furthermore, it gave uniform weighting for each criterion, which may not be appropriate, given that some items probably matter more than others with respect to valid study interpretation. Nevertheless, our scale was at least modelled on validated criteria and we believe its application is justified in the context of this exploratory exercise. Finally, although this study identifies deficiencies in current reporting practices and may aid in the development of consensus guidelines, it must be noted that delay to publication means that current study practices may not be accurately represented in our study.

Our study suggests several avenues investigators, funders, or IRBs might consider for improving the risk–benefit balance of PD studies. First, we recommend the research community develop formalized reporting guidelines similar to REMARK and CONSORT. Second, given our observation that separate PD-reporting trends towards higher quality, and that reporting quality for PD studies may be constrained by word counts at journals, we encourage investigators to consider separate PD publication, using standard methods described in a reference or reporting methods in supplementary materials (Toulmonde et al, 2011). Journal editors may have a role in limiting ‘text limitation bias.’ Third, given that PD components might not be registered in http://www.clinicaltrials.gov, IRBs might have a more active role in promoting reporting and publication by asking investigators to provide a detailed reporting plan for PD studies. A recent article recommended the creation of an online biomarker study registry similar to http://www.clinicaltrials.gov (Andre et al, 2011). We support extending this initiative to PD.

Together with a previous study by our team, our results offer a complex picture of the quality of reporting for PD studies involving non-diagnostic biopsy. A preponderance of positive results, coupled with a finding that 63% of PD studies go unreported, suggests biases. Low perceived quality of reports, and low reporting of basic factors like patient flow, suggest considerable room for improvement. On the other hand, some studies demonstrate careful reporting, many negative results are reported, and a large fraction of studies report that PD findings will help guide future investigations. In the end, we conclude that the evidence gathered above provides ammunition for proponents as well as opponents of research biopsies in cancer. In any event, our findings and analysis provide grounds for developing and disseminating PD-reporting standards.