Halsey et al. reply:

We agree with Lazzeroni et al. that researchers often believe P values are infallible1. If the intervals Lazzeroni et al. propose were required alongside every reported P value, the unthinking use of the unqualified P value would be undermined2. In theory this would be an excellent outcome.

However, in practice, simply providing tools for quantifying the fickleness of P will highlight an endemic problem without offering any treatment. Whereas Lazzeroni et al. suggest providing information to support P, we have suggested using measures that supersede P for interpreting data3,4. Effect sizes can be standardized, are not based on dichotomous decision making (the flaws of which severely limit the value of statistical power5) and address the more natural research question of how big the effect is, rather than simply asking whether there is an effect3,6. And 95% confidence intervals for the effect size provide a more consistent indication of the true (population-level) condition than does P. Thus comparing the effect sizes and confidence intervals of several similar studies typically uncovers a coherent pattern that is masked when only the P values of those studies are compared2. Furthermore, and crucially, the sample effect sizes and confidence limits of multiple studies can be combined for meta-analysis, enabling researchers to home in on the true effect.
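As a rough illustration of the approach we advocate, the Python sketch below computes a standardized effect size (Cohen's d, using the pooled SD) with an approximate 95% confidence interval for each of several simulated studies, and then combines the studies with a fixed-effect, inverse-variance meta-analysis. The function names, the simulated data and the true effect of d = 0.5 are hypothetical; the sampling variance of d uses a common large-sample approximation, and a random-effects model would often be preferred when studies differ systematically.

```python
import math
import random
import statistics
from statistics import NormalDist


def cohens_d_with_ci(a, b, level=0.95):
    """Cohen's d (pooled SD) with a normal-approximation confidence interval.

    Returns (d, var_d, (lower, upper)); var_d is a large-sample approximation.
    """
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # n - 1 denominators
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    d = (statistics.mean(a) - statistics.mean(b)) / pooled_sd
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = math.sqrt(var_d)
    return d, var_d, (d - z * se, d + z * se)


def fixed_effect_meta(estimates, variances, level=0.95):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its CI."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return pooled, (pooled - z * se, pooled + z * se)


if __name__ == "__main__":
    rng = random.Random(1)
    true_d = 0.5  # hypothetical true standardized effect
    ds, vs = [], []
    for study in range(5):  # five hypothetical replicate studies
        control = [rng.gauss(0.0, 1.0) for _ in range(20)]
        treated = [rng.gauss(true_d, 1.0) for _ in range(20)]
        d, var_d, (lo, hi) = cohens_d_with_ci(treated, control)
        ds.append(d)
        vs.append(var_d)
        print(f"study {study + 1}: d = {d:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
    pooled, (lo, hi) = fixed_effect_meta(ds, vs)
    print(f"pooled d = {pooled:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Because each study's weight is the reciprocal of its variance, the pooled interval is narrower than any single study's interval, which is the sense in which combining studies lets researchers home in on the true effect.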

Although we do not encourage the use of power analysis, Lazzeroni et al.'s figure supports our own illustration of the variability in P. As both our models7 and Lazzeroni et al.'s models demonstrate, unless the results of an experiment show a very marked pattern in the data, the reported P value will be accompanied by limits so broad as to render P uninterpretable. Put simply, P is untrustworthy unless the statistical power is very high (above 90%), a limitation that outweighs advantages of P such as its simplicity. As researchers better appreciate the typically artificial nature of the null hypothesis3 and the limited capacity of P to support hypothesis testing, we believe that P will become much less highly valued.
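To make the dependence of P on power concrete, the Python sketch below simulates many replicates of the same two-group experiment and reports the spread of the resulting P values at two levels of power. It assumes normally distributed data with a known SD of 1 and a two-sided two-sample z-test; the function names, the effect size d = 0.5 and the per-group sample sizes are illustrative choices, not the models used in either letter, and the empirical percentile range shown is not Lazzeroni et al.'s interval method.

```python
import math
import random
from statistics import NormalDist, mean, quantiles

STD_NORMAL = NormalDist()


def z_test_p(a, b, sigma=1.0):
    """Two-sided P value of a two-sample z-test, assuming a known common SD."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (mean(a) - mean(b)) / se
    return 2 * STD_NORMAL.cdf(-abs(z))


def power_z(d, n, alpha=0.05):
    """Analytic power of the same test for standardized effect d, n per group."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    delta = d * math.sqrt(n / 2)  # expected z statistic under the true effect
    return STD_NORMAL.cdf(delta - z_crit) + STD_NORMAL.cdf(-delta - z_crit)


def replicate_p_values(d, n, reps=2000, seed=0):
    """P values from `reps` independent replicates of one hypothetical experiment."""
    rng = random.Random(seed)
    ps = []
    for _ in range(reps):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(d, 1.0) for _ in range(n)]
        ps.append(z_test_p(treated, control))
    return ps


if __name__ == "__main__":
    d = 0.5  # hypothetical true standardized effect
    for n in (32, 85):  # per-group sizes giving roughly 52% and 90% power
        ps = replicate_p_values(d, n)
        cuts = quantiles(ps, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
        print(f"n = {n}, power = {power_z(d, n):.2f}: "
              f"middle 95% of P values spans {cuts[0]:.1e} to {cuts[-1]:.2f}")
```

Running both scenarios side by side shows the pattern described above: at moderate power the same experiment can yield anything from a very small P to a clearly nonsignificant one, and the range tightens substantially only as power approaches 90%.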