Significant confusion in scientists' grasp of statistics

Sir

I agree with the points you make about statistical significance under the heading 'Significant' in your News Feature 'Disputed definitions' (Nature 455, 1023–1028; 2008). However, you do imply that the term 'significant' means simply above or below the 5% level — a figure chosen by the statistician R. A. Fisher for practical reasons and used in the days when people did arithmetic by hand and referred to printed tables.

Nowadays, of course, personal computers do more general calculations and report probability (P) values directly. A P-value may be exact (obtained from counting permutations), an approximation based on asymptotic theory, or derived from a model by repeated simulation. It then has to be reported and interpreted. Too many scientists, and editors, take the line you reproach and use statistical significance as a criterion of importance.
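To make the "exact" case concrete, here is a minimal sketch of a P-value obtained by counting permutations. It assumes a two-sided comparison of group means, which is only one of many possible test statistics. The function name `exact_permutation_p` and the measurements are invented for illustration and do not come from any real study.

```python
from fractions import Fraction
from itertools import combinations


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation P-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        # Fractions keep the comparison exact, so ties are counted correctly.
        return abs(Fraction(sum_a) / n_a - Fraction(total - sum_a) / n_b)

    observed = abs_mean_diff(sum(group_a))
    as_extreme = 0
    n_splits = 0
    # Enumerate every way of relabelling n_a of the pooled values as group A.
    for indices in combinations(range(len(pooled)), n_a):
        n_splits += 1
        if abs_mean_diff(sum(pooled[i] for i in indices)) >= observed:
            as_extreme += 1
    return Fraction(as_extreme, n_splits)


# Hypothetical measurements, invented for illustration only.
p = exact_permutation_p([1, 2, 3], [4, 5, 6])
print(p)
```

With three values per group there are only 20 possible relabellings, and 2 of them are at least as extreme as the observed split, so the exact P-value is 1/10. Even perfectly separated groups cannot fall below the 5% level at this sample size, which is one reason the number needs interpreting rather than merely comparing with a threshold.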

In addition, significance is calculated in respect of a null model, chosen by the researcher and often in the knowledge that it is untenable. Why would you make measurements to compare groups if you expected to find no differences? A small P-value may therefore be pure fiction as a measure of knowledge gained. This comes on top of any undisclosed history of data selection and of cherry-picking results during the data analysis.

Conversely, numbers obtained from small surveys rarely demonstrate clear-cut (significant) results for individual questions, and a pattern of non-significant results in an expected direction across a range of questions could still be worth reporting as indicative. When the null hypothesis is a straw man, it may be more interesting not to be able to demonstrate the anticipated effect — for example, in a pay survey that finds no gender differences.
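One way to put a number on such a pattern is a simple sign test across questions, sketched below. It assumes that the questions are independent and that, under the null model, each result is equally likely to fall in either direction. Independence is a strong assumption for questions answered by the same respondents, so the result is indicative at best. The function name `sign_test_p` and the counts are hypothetical.

```python
from fractions import Fraction
from math import comb


def sign_test_p(in_expected_direction, n_questions):
    """One-sided sign-test P-value.

    Probability of seeing at least `in_expected_direction` results in the
    expected direction out of `n_questions`, if each direction is equally
    likely and the questions are independent (both are assumptions).
    """
    tail = sum(
        comb(n_questions, k)
        for k in range(in_expected_direction, n_questions + 1)
    )
    return Fraction(tail, 2 ** n_questions)


# Hypothetical survey: 9 of 10 questions lean the expected way,
# none of them individually significant.
p = sign_test_p(9, 10)
print(p, float(p))
```

For these invented counts the probability is 11/1024, roughly 0.011, although no single question need reach significance on its own.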

I endorse your view that what may seem to be sophistry is a crucial distinction. Compare, for example, the statement “The observed differences could occur 5% of the time if the true effect is zero” with the statement “The probability that the true effect is zero is 5%”. Not only is the latter statement wrong, it does not match the scientific question, which should be to estimate, at a given probability, the minimum size of the effect. Another common variation is to report “no differences between groups” on the basis of t-tests that check for a difference only between the group means.
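The last point can be illustrated with a small sketch: two invented groups whose means are identical, so that a t statistic is exactly zero, yet whose spreads differ greatly. The Welch form of the statistic is used here as an assumption, and the data and the helper `welch_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic: it responds only to a difference in group means."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical data: both groups have mean 10, but very different spread.
group_a = [9, 10, 11, 10, 10]
group_b = [2, 18, 6, 14, 10]

t_statistic = welch_t(group_a, group_b)
variance_ratio = variance(group_b) / variance(group_a)
print(t_statistic, variance_ratio)
```

Here the t statistic is zero while the sample variance of the second group is 80 times that of the first, so "no difference between the means" is the most that the test can support, not "no differences between groups".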

For scientists, talking statistics can be more dangerous than what your interviewee described as “talking Swahili in Louisiana” — unless they grasp the grammar as well as the words.

Reese, R. Significant confusion in scientists' grasp of statistics. Nature 456, 315 (2008) doi:10.1038/456315b