Test for reliability of results ‘too easy to pass’, say editors.
A controversial statistical test has finally met its end, at least in one journal. Earlier this month, the editors of Basic and Applied Social Psychology (BASP) announced that the journal would no longer publish papers containing P values because the statistics were too often used to support lower-quality research [1].
Authors are still free to submit papers to BASP with P values and other statistical measures that form part of ‘null hypothesis significance testing’ (NHST), but the numbers will be removed before publication. The decision quickly drew reactions on Twitter from researchers, among them Nerisa Dozo, a PhD student in psychology at the University of Queensland in Brisbane, Australia.
Jan de Ruiter, a cognitive scientist at Bielefeld University in Germany, tweeted: “NHST is really problematic”, but added that banning all inferential statistics is “throwing away the baby with the p-value”.
P values are widely used in science to test null hypotheses. For example, in a medical study looking at smoking and cancer, the null hypothesis could be that there is no link between the two. Many researchers interpret a lower P value as stronger evidence that the null hypothesis is false. Many also accept findings as ‘significant’ if the P value comes in at less than 0.05. But P values are slippery, and sometimes, significant P values vanish when experiments and statistical analyses are repeated (see Nature 506, 150–152; 2014).
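The mechanics of such a test can be illustrated with a small, self-contained sketch (not from the article; the data are made up). A permutation test asks: if the null hypothesis were true and group labels were arbitrary, how often would a difference in means at least as large as the observed one arise by chance? That fraction is the P value.

```python
import random
import statistics

def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: both samples come from the same distribution.
    The P value is the fraction of random relabellings whose mean
    difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)])
                   - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    return count / n_perm

# Hypothetical samples with clearly separated means:
treated = [5.1, 5.4, 4.9, 5.6, 5.3, 5.8]
control = [4.2, 4.0, 4.5, 3.9, 4.3, 4.1]
p = permutation_p_value(treated, control)
print(p)  # a small P value: 'significant' under the p < .05 convention
```

Under the conventional threshold, a result like this would be declared significant because p falls below 0.05; the journal's objection is to treating that single number as a stamp of quality.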
In an editorial explaining the new policy, editor David Trafimow and associate editor Michael Marks, who are psychologists at New Mexico State University in Las Cruces, say that P values have become a crutch for scientists dealing with weak data. “We believe that the p < .05 bar is too easy to pass and sometimes serves as an excuse for lower quality research,” they write.
Speaking to Nature, Trafimow says that he would be happy if null hypothesis testing disappeared from all published research: “If scientists are depending on a process that’s blatantly invalid, we should get rid of it.” He admits, however, that he does not know which statistical approach should take its place.
Some puzzled over how scientists are supposed to judge whether work has validity without some statistical rules, and the suggestion that scientists could do away entirely with P values met with some derision online. Sanjay Srivastava, a psychologist at the University of Oregon in Eugene, wryly tweeted that conclusions should be the next thing to be banned. But Srivastava also sees a serious side to the new policy.
Srivastava told Nature that he was pleased to see that several psychology journals — including Psychological Science and the Journal of Research in Personality — recently adopted different standards for data analysis, and that he is keeping an open mind about BASP’s change of course. “A pessimistic prediction is that it will become a dumping ground for results that people couldn’t publish elsewhere,” he says. “An optimistic prediction is that it might become an outlet for good, descriptive research that was undervalued under the traditional criteria.”
De Ruiter says that he doesn’t harbour much love for P values, mostly because they don’t accurately reflect the quality of evidence and can lead to false positives. But he is still “baffled” by the move to get rid of them completely. “I predict this will go wrong,” he says. “You can’t do science without some sort of inferential statistics.”
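The false-positive behaviour de Ruiter describes is easy to demonstrate in a quick simulation (a sketch, not from the article): even when the null hypothesis is true by construction, a test at the 0.05 threshold still flags roughly one experiment in twenty as significant.

```python
import random
import statistics

def false_positive_rate(n_experiments=2000, n=30, seed=1):
    """Simulate experiments where the null hypothesis is TRUE
    (both groups drawn from the same normal distribution) and
    count how often a simple z-style test still comes out
    'significant' at the 0.05 level."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        # standard error of the difference in means
        se = ((statistics.variance(a) + statistics.variance(b)) / n) ** 0.5
        z = (statistics.mean(a) - statistics.mean(b)) / se
        # two-sided test at alpha = 0.05 -> |z| > 1.96
        if abs(z) > 1.96:
            hits += 1
    return hits / n_experiments

rate = false_positive_rate()
print(rate)  # close to 0.05, by construction of the threshold
```

This is not a flaw in the arithmetic: a 5% false-positive rate is exactly what the threshold promises. The problem critics point to is that, across many published studies, those chance hits are the ones most likely to look interesting and get reported.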
Trafimow responds that experiments and hypothesis testing had been around for centuries before P values were invented. “I’d rather not have any inferential statistics at all than have some that we know aren’t valid,” he says.
1. Trafimow, D. & Marks, M. Basic Appl. Soc. Psych. 37, 1–2 (2015).
Woolston, C. Psychology journal bans P values. Nature 519, 9 (2015). https://doi.org/10.1038/519009f