These are just some of the phrases that my red pen has traversed in recent months:
Such statements are often innocently inserted, but the underlying suggestion is that the non-significant findings might be significant if the study were to be done again or had used a larger sample.
Perhaps statements like this originate largely from the mistaken belief that if the study was nearly significant this time then it will probably be significant next time. However, there is no basis for this belief. A p value does not get progressively smaller with replication. On the contrary, p values are random variables: if a study were to be exactly replicated many times, the p value would jump around from one replication to the next [1, 2]. This has been called the “dance of the p values”. Even if the next study has a larger sample size than the first, there is no guarantee that a “nearly significant” result will become a “significant” result.
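The way p values jump around under exact replication can be illustrated with a small simulation. What follows is only a sketch under stated assumptions and is not drawn from any of the studies discussed here: two groups of 32 participants, normally distributed outcomes with a known standard deviation of 1, a true difference in means of 0.5, and a two-sided z test. The function names `two_sample_p` and `replicate` and all parameter values are hypothetical, chosen purely for illustration.

```python
import math
import random


def two_sample_p(n, effect, rng):
    """Two-sided p value from a z test comparing two groups of size n.

    Assumes both groups have a known standard deviation of 1 and that
    the true difference in means is `effect` (illustrative values only).
    """
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(effect, 1.0) for _ in range(n)]
    diff = sum(b) / n - sum(a) / n
    z = diff / math.sqrt(2.0 / n)  # standard error of the difference
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2.0))


def replicate(n_studies, n=32, effect=0.5, seed=1):
    """Run the same hypothetical study n_studies times; return the p values."""
    rng = random.Random(seed)
    return [two_sample_p(n, effect, rng) for _ in range(n_studies)]


# Twenty exact replications of one study design.
p_values = replicate(20)
print(" ".join(f"{p:.3f}" for p in p_values))
```

Under these assumed settings the test has roughly 50% power, so about half of a long run of exact replications would be expected to fall below 0.05 and half above it, even though the true effect never changes. Changing `seed` gives a different sequence of p values.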
The “nearly significant” terminology may in part be due to the arbitrary nature of setting the critical p value at 0.05. There is no good reason, beyond neatness, for setting the critical p value at 0.05 rather than 0.06 or 0.07. So, surely, near enough is good enough. This logic might be acceptable if it were used consistently and in both directions: the researcher would need to be prepared to say that a p value of 0.04 was nearly insignificant, or on the brink of insignificance, just as often as they say that a p value of 0.06 was nearly significant. Of course that is never going to become accepted practice! After all, a critical p value is a line drawn in the sand. Researchers are free to draw different lines before they start a study, but they are not free to change the rules when their data do not oblige.
Spinal Cord will continue to remove any statements that imply that a p value is on the brink of, close to, or approaching significance because: