A systematic review of PDE-5 inhibitors for erectile dysfunction was performed to evaluate the utility of quantitative methods for identifying and exploring the influence of bias and study quality on pooled outcomes from meta-analyses. We included 123 randomized controlled trials (RCTs). Methodological quality was poorly reported. All three drugs appeared highly effective. Indirect adjusted analyses showed no differences between the three drugs. Funnel plots and statistical tests showed no evidence of small-study effects for sildenafil, whereas there was evidence of such bias for tadalafil and vardenafil. Adjustment for missing studies using trim and fill techniques did not alter the pooled estimates substantially. The exclusion of previous sildenafil nonresponders was associated with larger treatment effects for tadalafil. This investigation was hampered by poor reporting of methodological quality, a low number of studies, heterogeneity and large effect sizes. Despite such limitations, a comprehensive assessment of biases should be routine in systematic reviews.
The introduction of the oral drugs sildenafil (Viagra), tadalafil (Cialis) and vardenafil (Levitra) has had a major impact on the treatment of male erectile dysfunction (ED). The efficacy of these PDE-5 inhibitors is well known,1, 2, 3 but evidence from comparative meta-analyses is inconclusive.4, 5
Bias can distort results of systematic reviews and meta-analyses,6, 7 and this may hide potential real differences in outcome between the three drugs. For example, many studies have shown that trials with highly positive results have a tendency to be published ahead of less favorable results, which creates publication bias.8, 9 Selection bias in individual studies may arise if men who were previously unresponsive to one type of PDE-5 inhibitor are excluded from studies of other PDE-5 inhibitors. Broderick et al.10 concluded that the exclusion of such men does not substantially affect the outcome in tadalafil trials, and Gehr et al.11 presented evidence of ‘fading of reported effectiveness’, which represents decreasing effectiveness of medical therapies over time. When comparing several interventions, such bias may impair the validity of meta-analyses and distort results favoring the newer therapies.
Our analysis aimed to evaluate the utility of quantitative methods for identifying and exploring the influence of bias and study quality on pooled outcomes in meta-analyses. We chose to apply these methods to a systematic review of PDE-5 inhibitors since this provided an area where an established product had been available in the market for several years prior to the introduction of two recent competitors.12 Such a scenario might provide a good substrate from which to explore the presence of bias, since the competitive environment could intensify the need to yield positive data.
Fifteen electronic databases were searched up to July 2006: MEDLINE, EMBASE, The Cochrane Library, Cinahl, Pascal, BIOSIS, Chemical Industry Notes, Derwent Drug File, Drug Information Fulltext, Gale Group PharmaBiomed Business Journals, Health Periodicals Database, International Pharmaceutical Abstracts, Pharm-line, SciSearch and ToxFile. The search strategy included the terms impotence or erectile dysfunction combined with the following words describing the drugs: sildenafil, Viagra, tadalafil, Cialis, vardenafil, Levitra, PDE-5 inhibitors and Phosphodiesterase adj 4 inhibitors. The search was limited by a highly sensitive filter to identify randomized controlled trials (RCTs).13
In addition, seven trial registries and the websites of the United States Food and Drug Administration and the European Agency for the Evaluation of Medicinal Products were searched, and manufacturers were contacted for missing studies. Finally, reference lists of all included studies and of recent systematic reviews were screened for missed references.1, 2, 3, 5, 10 There was no language restriction, and published as well as unpublished studies and abstracts were included.
Studies were eligible for inclusion if they (1) were randomized trials, (2) included men with ED, (3) examined the efficacy or safety of sildenafil, tadalafil or vardenafil versus placebo or versus another PDE-5 inhibitor and (4) reported measures of efficacy or safety. Only the first part of crossover trials was included in the review, provided that all inclusion criteria were met. Integrated analyses were excluded.
One reviewer (GEB) screened the title and abstract of all references located by the search strategy. All potentially relevant studies were retrieved. One reviewer (GEB) read all retrieved papers in full to reconfirm their suitability for inclusion.
Data extractions were performed by one reviewer and checked by a second reviewer (GEB, AMAS) using a standardized data extraction form. Data on study design, study participants, intervention, previous treatments, follow-up period and methodological quality were extracted. Authors were contacted if the reported data were unclear or if necessary information was missing.
The following factors were a priori defined to be potential confounders or effect modifiers: patient age, etiology of ED, duration of ED at baseline, severity of ED at baseline (using the International Index of Erectile Function (IIEF)—erectile function (EF) domain score), inclusion criteria (inclusion versus exclusion of nonresponders to previous PDE-5 inhibitors), pre-existent conditions (such as diabetes, hypertension, and so on).
Assessment of methodological quality
Eight items of methodological quality were extracted: generation of a random sequence, concealment of allocation, blinding of the patient, blinding of the caregiver, blinding of the assessor, whether the blinding was formally tested, whether groups were comparable at baseline and whether an intention-to-treat (ITT) analysis was used. An analysis was considered ITT when all randomized patients were included in the analysis.14 In addition, the proportion of men that were randomized but not included in the analysis was extracted.
Quantitative data synthesis
For this study we selected the most commonly reported outcomes for the three drugs. These outcomes were final IIEF EF domain score (range 0–30) and general efficacy question (GEQ) ‘Has the treatment improved your erections?’ (Yes or no).
Adjusted indirect comparisons were calculated based on random effects models using the method of Bucher et al.15, 16 Odds ratios (ORs) were calculated for dichotomous data and weighted mean differences (WMD) for continuous data (including 95% confidence intervals; 95% CI).
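The adjusted indirect comparison can be illustrated with a minimal sketch. It assumes the usual form of the Bucher method: two drug-versus-placebo estimates on an additive scale (a WMD, or a log OR) are subtracted, and their variances are added. The function names and the recovery of standard errors from 95% CIs are illustrative assumptions, not the authors' code. The inputs are the pooled IIEF WMDs for sildenafil and tadalafil reported in the Results, and the output is not claimed to reproduce Table 4.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def se_from_ci(lower, upper, z=Z95):
    """Recover a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


def bucher_indirect(effect_a, se_a, effect_b, se_b, z=Z95):
    """Indirect comparison of A versus B through a common comparator.

    effect_a and effect_b are A-vs-placebo and B-vs-placebo estimates on an
    additive scale (WMD, or log odds ratio for dichotomous outcomes).
    """
    diff = effect_a - effect_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)  # variances add for independent estimates
    return diff, se, (diff - z * se, diff + z * se)


# Pooled IIEF WMDs versus placebo, taken from the Results of this paper.
sild = (7.55, se_from_ci(6.88, 8.22))
tada = (7.27, se_from_ci(5.70, 8.85))

diff, se, ci = bucher_indirect(*sild, *tada)
print(f"indirect WMD = {diff:.2f} (95% CI {ci[0]:.2f}, {ci[1]:.2f})")
```

Because the variances add, the indirect estimate is always less precise than either of the two placebo-controlled estimates it is built from.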
The difference in outcome between intervention and placebo was calculated for each study and then combined through random effects meta-analysis using the inverse variance method. Our analysis only includes the highest dose of the intervention within dosing recommendations used in the study.
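The pooling step can be sketched as follows. The paper specifies a random effects model with inverse variance weights but does not name the estimator of the between-study variance; the sketch assumes the DerSimonian-Laird moment estimator, which is a common default. The function name and the per-study values are hypothetical.

```python
import math


def random_effects_pool(effects, ses, z=1.959964):
    """Inverse-variance random effects pooling (DerSimonian-Laird, assumed)."""
    k = len(effects)
    w = [1.0 / s ** 2 for s in ses]  # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (s ** 2 + tau2) for s in ses]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "se": se,
        "ci": (pooled - z * se, pooled + z * se),
        "tau2": tau2,
        "q": q,
    }


# Hypothetical per-study WMDs (drug minus placebo) and their standard errors.
effects = [6.0, 8.0, 7.0, 9.5]
ses = [0.8, 1.0, 0.6, 1.2]

result = random_effects_pool(effects, ses)
print(f"pooled WMD = {result['pooled']:.2f}, tau^2 = {result['tau2']:.2f}")
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effect weights and both models give the same pooled estimate.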
Forest plots were used to display the results of all meta-analyses, and to visualize heterogeneity between studies. In addition to the χ² test for homogeneity,17 the amount of heterogeneity was quantified using the I² statistic.18 An I² value greater than 50% was considered to represent substantial heterogeneity.19
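As a brief illustration of the heterogeneity rule, the sketch below computes I² from Cochran's Q and the number of studies, following the definition of Higgins and Thompson, and applies the 50% threshold stated above. The function names and the example Q values are hypothetical.

```python
def i_squared(q, k):
    """I-squared (in percent) from Cochran's Q and the number of studies k."""
    df = k - 1
    if q <= 0 or df <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0  # negative values are set to zero


def is_substantial(i2_percent, threshold=50.0):
    """Threshold used in this review: I-squared greater than 50%."""
    return i2_percent > threshold


# Hypothetical example: Q = 25 from 11 studies gives I-squared = 60%.
i2 = i_squared(25.0, 11)
print(f"I^2 = {i2:.1f}%, substantial: {is_substantial(i2)}")
```

Applied to the values reported in the Results, an I² of 55.8% would be classed as substantial and an I² of 44.4% would not.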
For comparisons that had at least 10 studies, funnel plots were used to examine ‘small-study effects’, which is the tendency for smaller studies in a meta-analysis to show larger effects.20 In this plot the treatment effect on the x axis was plotted against the standard error (s.e.) on the y axis.21 Funnel plot asymmetry was also tested using the following statistical methods: Begg's test22 and Egger's test.23
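Egger's test can be sketched as an ordinary least-squares regression of the standardized effect (effect divided by its standard error) on precision (the reciprocal of the standard error); an intercept away from zero suggests funnel plot asymmetry. The sketch below is a minimal illustration with hypothetical data and a hypothetical function name. It returns the intercept and its t statistic only; a P value would additionally require the t distribution with k − 2 degrees of freedom.

```python
import math


def egger_test(effects, ses):
    """Egger's regression of standardized effect on precision."""
    n = len(effects)
    if n < 3:
        raise ValueError("at least three studies are needed")
    x = [1.0 / s for s in ses]                 # precision
    y = [e / s for e, s in zip(effects, ses)]  # standardized effect
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar            # the asymmetry coefficient
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_int = math.sqrt(rss / (n - 2) * (1.0 / n + xbar ** 2 / sxx))
    t = intercept / se_int if se_int > 0 else float("nan")
    return {"intercept": intercept, "slope": slope,
            "se_intercept": se_int, "t": t, "df": n - 2}


# Hypothetical data in which the smaller studies (larger s.e.) show larger effects.
effects = [9.8, 8.9, 8.1, 7.6, 7.4, 7.2]
ses = [2.0, 1.6, 1.2, 0.9, 0.6, 0.4]

res = egger_test(effects, ses)
print(f"intercept = {res['intercept']:.2f}, t = {res['t']:.2f}, df = {res['df']}")
```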
When there was evidence of small-study bias, for each drug the trim and fill technique was used to estimate the number of “missing” studies and subsequently to estimate the adjusted treatment effect.24, 25 Evidence of small-study bias was defined as: asymmetry in the funnel plot and P<0.2 or no asymmetry in funnel plot and P<0.1.
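The decision rule for small-study bias can be written out as a short function. This is a sketch of the rule as stated above; the visual judgment of asymmetry is passed in as a flag, and the paper does not specify which test's P value takes precedence when Begg's and Egger's tests disagree, so the caller is assumed to supply the relevant one. The function name is hypothetical.

```python
def small_study_bias(funnel_asymmetric, p_value):
    """Rule used in this review to flag evidence of small-study bias.

    Asymmetric funnel plot: P < 0.2 counts as evidence.
    Symmetric funnel plot:  P < 0.1 counts as evidence.
    """
    threshold = 0.2 if funnel_asymmetric else 0.1
    return p_value < threshold


# Tadalafil, IIEF: asymmetric funnel plot, Egger's test P = 0.19 (from the Results).
print(small_study_bias(True, 0.19))

# The same P value with a symmetric funnel plot would not meet the criterion.
print(small_study_bias(False, 0.19))
```

The rule deliberately uses lenient thresholds, reflecting the low power of asymmetry tests when the number of studies is small.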
Meta-regression analyses were used to examine the influence of selective exclusion and methodological quality on treatment effect. Analyses were univariate for tadalafil and vardenafil due to the limited number of studies. The results on the GEQ were converted to ORs by exponentiating the regression coefficient, exp(β).
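The conversion from a meta-regression coefficient to a ratio of ORs can be illustrated briefly. The sketch assumes the regression is fitted on the log OR scale, so that exp(β) is the multiplicative change in the OR per unit of the covariate and the CI limits are exponentiated in the same way. The function names are hypothetical; the figure of 0.92 per year is the vardenafil estimate reported in the Results, and the coefficient and limits in the first example are hypothetical.

```python
import math


def ratio_of_odds_ratios(beta):
    """Convert a log-scale meta-regression coefficient to a ratio of ORs."""
    return math.exp(beta)


def ci_on_ratio_scale(beta_lower, beta_upper):
    """Exponentiate the CI limits of the coefficient."""
    return math.exp(beta_lower), math.exp(beta_upper)


def cumulative_ratio(ratio_per_unit, units):
    """Multiplicative change after several units of the covariate."""
    return ratio_per_unit ** units


# A hypothetical coefficient of 0.10 with limits (-0.05, 0.25) on the log scale.
print(round(ratio_of_odds_ratios(0.10), 3), ci_on_ratio_scale(-0.05, 0.25))

# Vardenafil, GEQ: ratio of ORs of 0.92 per year, compounded over 5 years.
five_year = cumulative_ratio(0.92, 5)
print(f"ratio of ORs after 5 years = {five_year:.2f}")
```

A ratio below 1 compounds multiplicatively, so a 0.92 ratio per year corresponds to roughly a one-third reduction in the OR over 5 years.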
Overall, 123 unique RCTs were included in the review and 88 were included in the meta-analysis (see Figure 1). For each study an average of two references were located, ranging from 1 to 9.
Sixty percent of sildenafil trials were conducted in a broad-spectrum population. Other populations included men with spinal cord injury, renal failure, diabetes, cardiovascular disease, postoperative ED and depression. Five studies (10%) excluded previous sildenafil nonresponders. Trial size ranged from 26 to 847 men (mean: 240).
Eighty-nine percent of tadalafil trials were conducted in a broad-spectrum population. Other populations were men with diabetes and postoperative ED. Eleven studies (58%) excluded sildenafil nonresponders. Trial size ranged from 120 to 483 men (mean: 248).
Sixty-nine percent of vardenafil trials were performed in a broad spectrum population. Other populations were postoperative ED, depression, spinal cord injury and diabetes. Seven (44%) trials excluded previous sildenafil nonresponders. Trial size ranged from 229 to 805 men (mean: 486). Tables 1, 2 and 3 present some characteristics of trials on sildenafil, tadalafil and vardenafil, respectively. The tables illustrate a poor reporting of methodological quality items.
Direct and indirect comparison
VAR03 examined the efficacy of vardenafil in three treatment arms (5, 10 and 20 mg) versus placebo. One additional group received sildenafil 50 mg. WMD between sildenafil (50 mg) and vardenafil (10 mg) was −0.36 (95% CI −2.08, 1.36) on IIEF.
No differences were found when indirectly comparing the three interventions (Table 4).
The random effects WMD for the efficacy of sildenafil versus placebo was 7.55 on IIEF (95% CI 6.88, 8.22) (n=5583). There was evidence of heterogeneity (χ² P<0.001; I²=55.8%). For tadalafil, WMD was 7.27 (95% CI 5.70, 8.85; n=2329). There was evidence of heterogeneity (χ² P<0.001; I²=73.7%). This estimate was 7.47 (95% CI 6.93, 8.00) for vardenafil (n=5469) without evidence of heterogeneity (χ² P=0.57; I²=0%).
The results for the GEQ were as follows: sildenafil versus placebo (n=8523); random effects OR 10.56 (95% CI 9.10, 12.25). There was evidence of some heterogeneity (χ² P=0.001; I²=44.4%). Tadalafil versus placebo (n=2470) OR 10.26 (95% CI 7.60, 13.83). There was evidence of some heterogeneity (χ² P=0.02; I²=49.5%). Vardenafil versus placebo (n=3923) OR 10.34 (95% CI 8.80, 12.16). There was no evidence of heterogeneity (χ² P=0.34; I²=10.5%). Forest plots and funnel plots are available as Supplementary Information.
For IIEF, no evidence of small-study bias was found for sildenafil or vardenafil. The funnel plot for tadalafil appeared asymmetric, and Egger's test gave P for bias=0.19. A trim and fill analysis filled five studies and the filled random effects WMD was 5.86 (95% CI 4.28, 7.44). The adjusted estimate did not differ substantially from the original results.
For the GEQ, no evidence of small-study bias was found for sildenafil. However, evidence of small-study bias was found for tadalafil (Begg's test P=0.14) and vardenafil (Begg's test P=0.03). The trim and fill analysis filled two studies for tadalafil; the filled random effects OR was 9.36 (95% CI 6.95, 12.62). The trim and fill analysis filled three studies for vardenafil; the filled random effects OR was 9.43 (95% CI 7.91, 11.25). For both drugs, the adjusted estimates did not differ substantially from the original estimates.
For sildenafil, analyses showed no need to adjust for patient characteristics. The results on IIEF decreased over time (β −0.35; 95% CI −0.60, −0.10); this means that IIEF is estimated to decrease each year by 0.35. No association was found between results and trial size, blinding of the patient, whether groups were similar at baseline and whether an ITT analysis was performed on IIEF and GEQ. A nonsignificant association was found between the exclusion of previous nonresponders and IIEF (β −1.82; 95% CI −3.90, 0.16); thus results of studies that excluded previous nonresponders were estimated to be 1.82 points on IIEF lower compared to studies without such exclusion criterion.
For tadalafil, results on IIEF also decreased over time (IIEF β −0.76; 95% CI −1.41, −0.11). Studies that excluded previous nonresponders showed larger effects than studies without such exclusion criterion. For IIEF, β was 3.35 (95% CI 0.96, 5.70). For the GEQ, the ratio of ORs between studies that excluded and studies that did not exclude previous nonresponders was 1.77 (95% CI 1.00, 3.13). Studies with a larger proportion of patients not included in the analyses tended to show larger effects. Results on the GEQ were estimated to increase by a ratio of odds ratios of 1.20 (95% CI 0.99, 1.46), for a 1% increase in exclusions. It should be noted that there were only eight studies included in this analysis. No other associations were found.
For vardenafil, results on the GEQ decreased over time. The GEQ was estimated to decrease by a ratio of ORs of 0.92 (95% CI 0.86, 0.995) for each year. No other associations were found.
This meta-analysis showed that all three PDE-5 inhibitors are highly effective for male ED. There were no differences identified between the three treatments. More studies in more discrete populations have been conducted for sildenafil.
There was no evidence of small-study bias for sildenafil. Publication bias, if present, appeared to have a low impact on vardenafil and tadalafil studies. Evidence of ‘fading of reported effectiveness’ was found for all drugs, which may have impaired the validity of the meta-analyses. Selective inclusion was more frequent in tadalafil and vardenafil trials, a practice that was associated with larger treatment effects for tadalafil.
The paucity of details reported in each study relating to study design characteristics resulted in low sensitivity regarding methodological quality. Contacting authors and manufacturers to obtain more information had little success. This hampered our investigation of the relationship between study design features and outcomes. Future study reports should meet the requirements of the CONSORT statement.26
Our extensive search resulted in the inclusion of 22 additional studies compared to previously performed meta-analyses.1, 2, 3, 5, 10 There are indications that more trials have been performed as some reviews included several unpublished trials that could not be obtained by us. It is not clear whether results of those studies are published or not. The impact of these missing studies on our meta-analyses remains unknown.
This study did not find strong evidence of publication bias. The funnel plot is widely used as a test for publication bias. However, asymmetry of the funnel plot, whether assessed visually or statistically, may not accurately predict publication bias.27 The more advanced techniques of Egger and Begg are similarly influenced by low study numbers or by a substantial treatment effect.20 Both of these phenomena were present in our analyses. In addition, the number of studies for tadalafil and vardenafil was low compared to sildenafil and consequently these tests have lower power for those two drugs.
Simulation studies have found that the trim and fill technique can predict missing studies even when there is little evidence of bias.28 One cause of this can be the presence of substantial heterogeneity, as was detected in the analysis of tadalafil for IIEF. To allow for these influences, we compared results with and without adjustment to see whether the predicted missing studies made a significant difference to the results, an approach recommended by Sutton et al.29 In our study, the adjusted estimates did not differ substantially from the original results, suggesting that the impact of publication bias, if present, may be small.
Meta-regression failed to show an association between patient blinding and study results.
Blinding of patients is critical, especially in trials where the outcomes are based on subjective measurements, as in this review. The judgment of whether blinding was adequate was based on reported information, and no studies reported results of formal tests of blinding adequacy. However, blinding may be difficult to maintain during the study when drugs are highly effective with clear and evident effects, as is the case for PDE-5 inhibitors. It is possible that real differences between the three drugs are being masked by large measurement biases. Future trials would benefit from formally testing the adequacy of the blinding of patients.
Our analysis suggested that studies that excluded more patients from their analyses reported larger treatment effects. The majority of trials excluded patients after randomization and many of these trials defined intention-to-treat analysis as analyzing only those patients who took at least one dose of the drug and who had at least one postbaseline efficacy measurement (for example, Lewis et al.,30 Brock et al.31) or only those patients who had a baseline and at least one postbaseline observation (for example, Carson et al.32). Hence, patients who withdrew from the study or those who were lost to follow-up were excluded. Although it is common practice to refer to intention-to-treat as the analysis of all available subjects as randomized,33 this approach may lead to bias unless the data are missing at random.33, 34 We believe that for a correct intention-to-treat analysis all randomized patients should be analyzed.14
Melander et al.35 concluded that selective reporting—the tendency to publish the more favorable per protocol analyses—was a major cause for bias in studies sponsored by pharmaceutical industry. Although the definitions of analyses in Melander's study are not entirely clear, this could be an important source of bias as up to 27% of randomized patients were excluded from the analysis.
We found evidence of a decrease of effectiveness over time or ‘fading of reported effectiveness’11 for all three drugs. The decreased effectiveness did not appear to be mediated by change in baseline severity, although baseline IIEF was not reported for about 25% of the studies and for almost half of the trials the timing was estimated using date of publication. The validity of these meta-analyses may be impaired by this phenomenon.
Relation to other literature
Two other meta-analyses were performed that compared the three PDE-5 inhibitors. Moore et al.5 presented an indirect comparison of efficacy and harm using published RCTs. They reported similar efficacy between PDE-5 inhibitors. However, the drugs were apparently compared subjectively through a visual comparison of study results. This approach lacks the rigor of statistical testing.
Berner et al.4 meta-analyzed a selection of trials aiming to assess the best evidence. The restricted inclusion criteria meant that only eight trials for tadalafil and three trials for both sildenafil and vardenafil were considered. The authors conclude that there is evidence that sildenafil might be more efficacious than vardenafil. The selective inclusion of studies limits the generalizability of Berner's results. In addition, the low number of studies is problematic when it comes to publication bias tests as illustrated above.
A comprehensive assessment of biases should be a routine component of systematic reviews. Although the methods may be hampered by issues such as low number of studies, heterogeneity and large effect sizes, such an approach ensures that the reader can be more confident that the evidence presented is robust and that simple causes of confounding have been considered and explored. Without such an assessment the conclusions of a systematic review may be insecure and could lead to inappropriate practices being applied in healthcare settings.
Fink HA, Mac DR, Rutks IR, Nelson DB, Wilt TJ . Sildenafil for male erectile dysfunction: a systematic review and meta-analysis. Arch Intern Med 2002; 162: 1349–1360.
Markou S, Perimenis P, Gyftopoulos K, Athanasopoulos A, Barbalias G . Vardenafil (Levitra) for erectile dysfunction: a systematic review and meta-analysis of clinical trial reports. Int J Impot Res 2004; 16: 470–478.
Carson CC, Rajfer J, Eardley I, Carrier S, Denne JS, Walker DJ et al. The efficacy and safety of tadalafil: an update. BJU Int 2004; 93: 1276–1281.
Berner MM, Kriston L, Harms A . Efficacy of PDE-5-inhibitors for erectile dysfunction. A comparative meta-analysis of fixed-dose regimen randomized controlled trials administering the International Index of Erectile Function in broad-spectrum populations. Int J Impot Res 2006; 18: 229–235.
Moore RA, Derry S, McQuay HJ . Indirect comparison of interventions using published randomised trials: systematic review of PDE-5 inhibitors for erectile dysfunction. BMC Urol 2005; 5: 18.
Schulz KF, Chalmers I, Hayes RJ, Altman DG . Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 1995; 273: 408–412.
Moher D, Pham B, Jones A, Cook DJ, Jadad AR . Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet 1998; 352: 609–613.
Dickersin K, Min YI, Meinert CL . Factors influencing publication of research results. Follow-up of applications submitted to two institutional review boards. JAMA 1992; 267: 374–378.
Easterbrook PJ, Berlin JA, Gopalan R, Matthews DR . Publication bias in clinical research. Lancet 1991; 337: 867–872.
Broderick GA, Donatucci CF, Hatzichristou D, Torres LO, Valiquette L, Zhao YL et al. Efficacy of tadalafil in men with erectile dysfunction naive to phosphodiesterase 5 inhibitor therapy compared with prior responders to sildenafil citrate. J Sex Med 2006; 3: 668–675.
Gehr BT, Weiss C, Porzsolt F . The fading of reported effectiveness. A meta-analysis of randomised controlled trials. BMC Med Res Methodol 2006; 6: 25.
US Food and Drug Administration, http://www.fda.gov/cder/. Accessed: September 4, 2007.
Robinson KA, Dickersin K . Development of a highly sensitive search strategy for the retrieval of reports of controlled trials using PubMed. Int J Epidemiol 2002; 31: 150–153.
Altman DG . Practical Statistics for Medical Research, 2nd edn. Chapman & Hall: London, 1997.
Bucher HC, Guyatt GH, Griffith LE, Walter SD . The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J Clin Epidemiol 1997; 50: 683–691.
Song F, Altman DG, Glenny AM, Deeks JJ . Validity of indirect comparison for estimating efficacy of competing interventions: empirical evidence from published meta-analyses. BMJ 2003; 326: 472.
Deeks JJ, Altman DG, Bradburn J . Statistical methods for examining heterogeneity and combining results from several studies in meta-analysis. In: Egger M, Davey SG, Altman DG (eds). Systematic Reviews in Health Care. Meta-Analysis in Context, 3rd edn. BMJ: London, 2003, pp 285–312.
Higgins JP, Thompson SG . Quantifying heterogeneity in a meta-analysis. Stat Med 2002; 21: 1539–1558.
Higgins JP, Thompson SG, Deeks JJ, Altman DG . Measuring inconsistency in meta-analyses. BMJ 2003; 327: 557–560.
Sterne JA, Gavaghan D, Egger M . Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. J Clin Epidemiol 2000; 53: 1119–1129.
Sterne JA, Egger M . Funnel plots for detecting bias in meta-analysis: guidelines on choice of axis. J Clin Epidemiol 2001; 54: 1046–1055.
Begg CB, Mazumdar M . Operating characteristics of a rank correlation test for publication bias. Biometrics 1994; 50: 1088–1101.
Egger M, Davey SG, Schneider M, Minder C . Bias in meta-analysis detected by a simple, graphical test. BMJ 1997; 315: 629–634.
Duval S, Tweedie R . Trim and fill: a simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics 2000; 56: 455–463.
Duval SJ, Tweedie RL . A non-parametric ‘trim and fill’ method of assessing publication bias in meta-analysis. J Am Stat Assoc 2000; 95: 89–98.
Begg CB, Cho MK, Eastwood S, Horton R, Moher D, Olkin I et al. Improving the quality of reporting of randomized controlled trials: the CONSORT statement. JAMA 1996; 276: 637–639.
Lau J, Ioannidis JPA, Terrin N, Schmid CH, Olkin I . The case of the misleading funnel plot. BMJ 2006; 333: 597–600.
Sterne JA, Egger M . Re: High false positive rate for trim and fill method. Website only: http://www.bmj.com/cgi/eletters/320/7249/1574 2000.
Sutton AJ, Duval SJ, Tweedie RL . Re: Re: High false positive rate for trim and fill method. Website only: http://www.bmj.com/cgi/eletters/320/7249/1574 2000.
Lewis R, Bennett CJ, Borkon WD, Boykin WH, Althof SE, Stecher VJ et al. Patient and partner satisfaction with Viagra (sildenafil citrate) treatment as determined by the Erectile Dysfunction Inventory of Treatment Satisfaction Questionnaire. Urology 2001; 57: 960–965.
Brock G, Nehra A, Lipshultz LI, Karlin GS, Gleave M, Seger M et al. Safety and efficacy of vardenafil for the treatment of men with erectile dysfunction after radical retropubic prostatectomy. J Urol 2003; 170 (4 Part 1): 1278–1283.
Carson C, Shabsigh R, Segal S, Murphy A, Fredlund P, Kuepfer C et al. Efficacy, safety, and treatment satisfaction of tadalafil versus placebo in patients with erectile dysfunction evaluated at tertiary-care academic centers. Urology 2005; 65: 353–359.
Gravel J, Opatrny L, Shapiro S . The intention-to-treat approach in randomized controlled trials: are authors saying what they do and doing what they say? Clin Trials 2007; 4: 350–356.
Hollis S, Campbell F . What is meant by intention to treat analysis? Survey of published randomized controlled trials. BMJ 1999; 319: 670–674.
Melander H, Ahlqvist-Rastad J, Meijer G, Beermann B . Evidence b(i)ased medicine—selective reporting from studies sponsored by pharmaceutical industry: review of studies in new drug applications. BMJ 2003; 326: 1171–1173.
This paper is based on original research conducted by Kleijnen Systematic Reviews. The paper, its content, financial support and editorial control were at all times under the control of Kleijnen Systematic Reviews and its lead authors. However, to ensure full transparency we would like to declare that we have been involved in a systematic review of PDE-5 inhibitors that was supported and funded by Pfizer Ltd.
Bekkering, G., Abou-Setta, A. & Kleijnen, J. The application of quantitative methods for identifying and exploring the presence of bias in systematic reviews: PDE-5 inhibitors for erectile dysfunction. Int J Impot Res 20, 264–277 (2008). https://doi.org/10.1038/sj.ijir.3901626
- publication bias
- erectile dysfunction
- PDE-5 inhibitors