We agree that arbitrarily branding experimental findings as significant or non-significant generates a false sense of certainty (V. Amrhein et al. Nature 567, 305–307; 2019). However, when done properly, hypothesis testing is an important precondition for estimating an effect size.
In his 1928 book Statistical Methods for Research Workers, British statistician Ronald Fisher remarked that “it is a useful preliminary before making a statistical estimate … to test if there is anything to justify estimation at all”. And British polymath Harold Jeffreys declared in Theory of Probability in 1939 that “variation must be taken as random until there is positive evidence to the contrary”. Hence, testing and estimation are complementary. Testing establishes whether there is an effect, and that in turn determines whether its magnitude needs to be estimated.
What happens when statistical testing is skipped and the null hypothesis is ignored? Noise is interpreted as structure, and every difference between observations is treated as meaningful. Parameters then need to be estimated for all of these differences, resulting in a “mere catalogue” of data “without any summaries at all”, as Jeffreys put it.
Without the restraint provided by testing, an estimation-only approach will lead to overfitting of research results, poor predictions and overconfident claims.
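The overfitting argument can be illustrated with a small simulation. The sketch below is not taken from the cited sources; the function name `simulate`, the group count, the sample size and the seed are all illustrative assumptions. Every group is drawn from the same distribution, so the null hypothesis is true by construction. An estimation-only analysis fits a separate mean to each group, whereas the null model fits one pooled mean. Because the true mean is known to be zero here, the expected squared error for a new observation is simply the noise variance plus the squared estimation error.

```python
import random
import statistics


def simulate(num_groups=20, n_per_group=5, sigma=1.0, seed=1):
    """Compare per-group estimation with the null model on pure noise.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Every group shares the same true mean (0), so any observed
    # difference between groups is noise, not structure.
    data = [
        [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        for _ in range(num_groups)
    ]

    # Estimation-only approach: one fitted mean per group.
    group_means = [statistics.fmean(group) for group in data]

    # Null model: a single pooled mean shared by all groups.
    grand_mean = statistics.fmean(x for group in data for x in group)

    # Expected squared error when predicting a new observation:
    # noise variance plus squared estimation error (true mean is 0).
    err_groups = sigma**2 + statistics.fmean(m**2 for m in group_means)
    err_null = sigma**2 + grand_mean**2
    return err_groups, err_null


err_groups, err_null = simulate()
print(f"per-group estimates: {err_groups:.3f}")
print(f"single pooled mean:  {err_null:.3f}")
```

With equal group sizes the pooled mean is the average of the group means, so the per-group model can never predict better than the null model in this setting; the extra parameters only encode noise. A test that retains the null hypothesis here would steer the analysis towards the simpler and better-predicting summary.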
Nature 567, 461 (2019)