Widespread poor research practices raise difficult questions about how to bring about improvements. Unfortunately, I believe that the Analysis article by Button et al. (Power failure: why small sample size undermines the reliability of neuroscience. Nature Rev. Neurosci. 14, 365–376 (2013))1, along with previous similar discussion of sample size2, misidentifies small sample size as a fundamental cause of problems in research and at the same time uncritically accepts a very harmful overemphasis on whether p < 0.05.

Much of their argument is undercut by the fact that the positive predictive value (PPV) of a result with p < 0.05 is an unacceptably poor measure of the evidence that a study provides. PPV ignores distinctions between different p values below 0.05, such as p = 0.049 versus p < 0.0001, and therefore wastes information. Estimated effects, confidence intervals and exact p values should be considered when interpreting a study's results, and these make power irrelevant for interpreting completed studies3,4,5. In addition, a specific result (for example, p = 0.040) is not weaker evidence, because of small sample size per se, than the same p value obtained with a larger sample size6.
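To make the first point concrete, here is a minimal sketch in Python of the standard PPV calculation (the prior probability and power used are illustrative choices, not values from the article, and the article's bias term is omitted): once results are dichotomized at p < 0.05, PPV is fixed by the design and the assumed prior, so it cannot distinguish p = 0.049 from p < 0.0001.

```python
# PPV of a 'significant' result in the dichotomized framework:
# PPV = (power * prior) / (power * prior + alpha * (1 - prior))
# It is a property of the design (alpha, power) and the assumed prior,
# so p = 0.049 and p < 0.0001 contribute identically once both are
# simply counted as 'p < 0.05'.

def ppv(power, alpha=0.05, prior=0.2):
    """Positive predictive value of a result declared significant at `alpha`."""
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

# Illustrative numbers (not from the article): prior = 0.2, power = 0.8.
# The value is the same whether the study observed p = 0.049 or p < 0.0001.
print(ppv(power=0.8))
```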

Button et al.1 rightly distinguish the inherent consequences of small sample size from associated characteristics, but they do not acknowledge that questioning the validity of all small studies on the basis of associated factors is likely to be both ineffective and unfair. Associated problems should be addressed directly; trying to mitigate them by advocating larger sample sizes is distracting and confusing. Indeed, the concept of 'adequate' sample size promotes misinterpretation7 of study results by focusing attention only on whether p < 0.05. Importantly, the 'winner's curse' is caused by selection and is not an inherent problem for small studies if their results will be disseminated no matter what they turn out to be.
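A small simulation sketch (in Python, with a hypothetical true effect, sample size and number of studies chosen only for illustration) supports that last point: effect estimates from small studies are inflated only within the subset selected for p < 0.05, while the average over all studies stays centred on the true effect.

```python
# Simulate many small two-group studies with a modest true effect and show
# that inflation of the estimated effect ('winner's curse') appears only
# when studies are filtered on p < 0.05, not from small n per se.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect, n_per_group, n_studies = 0.3, 15, 20000   # illustrative values

estimates, pvalues = [], []
for _ in range(n_studies):
    a = rng.normal(true_effect, 1.0, n_per_group)
    b = rng.normal(0.0, 1.0, n_per_group)
    estimates.append(a.mean() - b.mean())
    pvalues.append(stats.ttest_ind(a, b).pvalue)

estimates, pvalues = np.array(estimates), np.array(pvalues)
print("mean estimate, all studies:       ", estimates.mean())                   # close to the true effect
print("mean estimate, only p < 0.05 ones:", estimates[pvalues < 0.05].mean())   # inflated by selection
```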

The discussion of ethics in the article1 neglects a fundamental fact about power (and any other measure of a study's projected value): diminishing marginal returns8,9. Each additional subject produces a smaller increment in projected scientific or practical value than the previous one. This implies that efficiency defined by projected value per animal sacrificed will be worse with a larger planned sample size8.
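One way to see the arithmetic (a sketch under the simplifying assumption, not made in the article, that projected value grows in proportion to the precision of the estimate, and hence to the square root of the sample size) is that each added subject then contributes less than the one before, so value per subject falls as the planned sample size grows.

```python
# Diminishing marginal returns: taking projected value, purely for illustration,
# to be proportional to the precision of the estimate, value(n) ~ sqrt(n),
# each extra subject adds less value than the previous one and
# value per subject declines with the planned sample size.
import math

def projected_value(n):
    return math.sqrt(n)   # illustrative concave value function

for n in (10, 20, 40, 80):
    marginal = projected_value(n) - projected_value(n - 1)
    print(f"n={n:3d}  value={projected_value(n):5.2f}  "
          f"gain from last subject={marginal:5.3f}  "
          f"value per subject={projected_value(n) / n:5.3f}")
```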

In addition, Button et al.1 do not fully acknowledge the many conceptual and practical difficulties of power-based sample size planning. The fact of diminishing marginal returns precludes any meaningful definition of 'adequately powered' versus 'underpowered' (Ref. 7); the goal of 80% power is only an arbitrary convention10. Moreover, specifying the 'right' alternative effect size, along with the other assumptions needed for the calculations (such as the standard deviation), is often difficult; the true effect is not always a sensible choice for power calculations (for example, see the Xuan row in table 1 in the article1) and cannot be known with good accuracy in advance7. Power calculations therefore should not overrule cost–efficiency and feasibility9, and in real research practice they cannot do so anyway.
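The practical difficulty is easy to illustrate with the usual normal-approximation formula for the per-group sample size when comparing two means (a sketch using the conventional 80% power and two-sided alpha of 0.05; the assumed effect sizes are hypothetical, not taken from the article): modest changes in the assumed effect size swing the 'required' n severalfold.

```python
# Required per-group n for a two-sample comparison of means, using the usual
# normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2.
# Small changes in the assumed effect size (delta) or standard deviation (sd)
# move the 'required' sample size a lot, which is the practical difficulty noted above.
from scipy.stats import norm

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * (z * sd / delta) ** 2

for delta in (0.2, 0.3, 0.5):          # illustrative assumed effect sizes, in sd units
    print(f"assumed effect {delta}: n per group ~ {n_per_group(delta, sd=1.0):.0f}")
# Halving the assumed effect size roughly quadruples the 'required' n per group.
```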

Manipulation of the design, conduct, analysis and interpretation of studies towards producing more 'interesting' results is a serious problem, as is selective dissemination of studies' results, but these are not caused by small sample size. In addition, it is counterproductive to analyse power and PPV, the very definitions of which contain the assumption that a study's results will be dichotomized. Trying to improve research while conceding that a study's information will be reduced to a single bit — whether p < 0.05 — is like starting an armistice negotiation by offering unconditional surrender.