
Statistical power calculations reflect our love affair with P-values and hypothesis testing: time for a fundamental change

Statistical power calculations are still considered essential when planning trials in spinal cord injury (SCI), and some may think it heresy to suggest otherwise.1 Power calculations are reported in grant applications, ethics submissions and journal publications. The power of a trial is the probability (usually set at 80%) that the trial will reach statistical significance if there truly is the specified difference between the two groups.2 Power calculations rely on estimating variability and nominating an α-level (usually 0.05); as variability increases, power decreases. Power is a relevant concept only during the planning stages of a trial. Once a trial is completed, power is meaningless.2, 3 Determining whether a completed trial had a sufficient sample size requires examining the precision of the treatment estimates (for example, the 95% confidence intervals of the mean between-group differences).
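
To make the conventional approach concrete, the following is a minimal sketch of a power-based sample size calculation for a two-group parallel trial using the normal approximation, assuming SciPy is available; the between-group difference and standard deviation are hypothetical values chosen purely for illustration.

    # A minimal sketch of a conventional power-based sample size calculation
    # (normal approximation, two-sample comparison of means). The difference
    # and standard deviation below are hypothetical illustration values.
    from scipy.stats import norm

    alpha = 0.05   # two-sided significance level
    power = 0.80   # desired probability of detecting the effect
    delta = 5.0    # hypothesised between-group difference (hypothetical)
    sd = 10.0      # assumed standard deviation of the outcome (hypothetical)

    z_alpha = norm.ppf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = norm.ppf(power)          # about 0.84 for 80% power

    # Participants required per group so the trial has the chosen power
    n_per_group = 2 * ((z_alpha + z_power) * sd / delta) ** 2
    print(round(n_per_group))  # about 63 per group

Because the required sample size scales with the inverse square of the hypothesised difference, halving that difference quadruples the number of participants needed, which is why optimistic effect sizes produce deceptively small trials.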

Power calculations are problematic for one fundamental reason: they perpetuate an outdated focus on the hypothesis-testing approach to statistical analysis.2, 4 The hypothesis-testing approach dichotomises results as significant or not significant, making interpretation misleadingly simplistic. Clinical trialists and ‘statisticians have for a long time argued that this [approach] is inappropriate and that trials should be used to estimate treatment effects and not to test hypotheses’ (p. 806).5 This might go against everything some have been taught. However, the push away from P-values and hypothesis testing is not new, and journals are increasingly trying to encourage researchers and readers to focus on estimates of treatment effects6 (the uncertainty of those estimates can be quantified with confidence intervals). One leading journal tried abolishing P-values altogether, and a number of others insist that estimates of treatment effects be reported in an effort to stop researchers and readers simply dichotomising results as significant or not.6

An alternative to statistical power calculations is called ‘precision for planning’ (p. 372).3 This involves calculating the sample size required to ensure a pre-set level of precision in the estimates; precise estimates require larger sample sizes than imprecise ones. This approach still requires predicting variability, which can be problematic, but importantly it does not focus on P-values. Before we see a shift to ‘precision for planning’ there will need to be a fundamental change in attitudes and understanding. Trials will need to be valued for their ability to provide estimates of treatment effects rather than their ability to provide statistically significant findings. This shift in attitudes and understanding is important not only for how we determine sample size requirements for trials but also for how we conduct and interpret clinical trials, systematic reviews and clinical practice guidelines. We need this shift to advance evidence-based care for people with SCI.
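
As a contrast with the power-based sketch above, here is a minimal sketch of ‘precision for planning’ under the same hypothetical standard deviation: the sample size is chosen so that the 95% confidence interval for the mean between-group difference has a pre-set half-width, with no reference to P-values. The target half-width is again a hypothetical illustration value.

    # A minimal sketch of 'precision for planning': choose n so that the
    # 95% confidence interval for the mean between-group difference has a
    # pre-set half-width. The half-width and standard deviation below are
    # hypothetical illustration values.
    from scipy.stats import norm

    conf = 0.95        # confidence level for the interval
    half_width = 3.0   # target CI half-width (margin of error), hypothetical
    sd = 10.0          # assumed standard deviation of the outcome, hypothetical

    z = norm.ppf(1 - (1 - conf) / 2)  # about 1.96 for a 95% CI

    # The standard error of a difference in means with n per group is
    # sd * sqrt(2 / n), so solving z * SE = half_width for n gives:
    n_per_group = 2 * (z * sd / half_width) ** 2
    print(round(n_per_group))  # about 85 per group

Under these hypothetical numbers the precision target demands more participants than the 80%-power target above, illustrating the point that precise estimates require larger sample sizes.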

References

  1. Schulz KF, Grimes DA. Sample size calculations in randomised trials: mandatory and mystical. Lancet 2005; 365: 1348–1353.

  2. Goodman SN, Berlin JA. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Ann Intern Med 1994; 121: 200–206.

  3. Cumming G. Understanding the New Statistics: Effect Sizes, Confidence Intervals and Meta-Analysis. Routledge: London, UK, 2012.

  4. Schmidt F, Hunter J. Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In: Harlow L, Mulaik S, Steiger J (eds). What If There Were No Significance Tests? Erlbaum: Mahwah, NJ, 1997.

  5. Edwards SJL, Lilford RJ, Braunholtz D, Jackson J. Why ‘underpowered’ trials are not necessarily unethical. Lancet 1997; 350: 804–807.

  6. Fidler F, Thomason N, Cumming G, Finch S, Leeman J. Editors can lead researchers to confidence intervals, but can’t make them think: statistical reform lessons from medicine. Psychol Sci 2004; 15: 119–126.


Acknowledgements

Thanks to Geoff Cumming, Joanne Glinsky and Ian Cameron for their valuable feedback.

Author information

Corresponding author

Correspondence to L A Harvey.

Ethics declarations

Competing interests

The author declares no conflict of interest.


About this article

Cite this article

Harvey, L. Statistical power calculations reflect our love affair with P-values and hypothesis testing: time for a fundamental change. Spinal Cord 52, 2 (2014). https://doi.org/10.1038/sc.2013.117

