Testing for statistical significance should be an aid to interpreting scientific results, and — when applied sensibly — to decision-making. It should not be a mindless quest for verification (see V. Amrhein et al. Nature 567, 305–307; 2019). In my experience, the correction of P values for multiple testing — a valuable tool in the fight against P hacking and in the proper interpretation of genome-wide association studies, for example — is being comparably abused through ignorance.
Too often, I find myself up against criticisms from reviewers who draw no distinction between tests carried out on evidence-weighted, mechanistically legitimate risk variables and tests applied to ad hoc collections of measurements (roughly akin to grandmothers’ dogs’ tail lengths). The distinction was spelt out more than 20 years ago (T. V. Perneger Br. Med. J. 316, 1236–1238; 1998). That nobody took any notice shows how tight a grip the lust for certainty — neatly dubbed by Amrhein and colleagues as “dichotomania” — has on a researcher’s psyche.
Nature 569, 192 (2019)