Know when your numbers are significant

Vaux, David L.

doi:10.1038/492180a


Comment
Published: 12 December 2012

Research methods


Nature volume 492, pages 180–181 (2012)


Experimental biologists, their reviewers and their publishers must grasp basic statistics, urges David L. Vaux, or sloppy science will continue to grow.

The incidence of papers in cell and molecular biology that have basic statistical mistakes is alarming. I see figures with error bars that do not say what they describe, and error bars and P values for single, 'representative' experiments. So, as an increasingly weary reviewer of many a biology publication, I'm going to spell out again¹ the basics that every experimental biologist should know.


Simply put, statistics and error bars should be used only for independent data, and not for identical replicates within a single experiment. Because science represents the knowledge gained from repeated observations or experiments, these have to be performed more than once — or must use multiple independent samples — for us to have confidence that the results are not just a fluke, a coincidence or a mistake. To show only the result of a single experiment, even if it is a representative one, and then misuse statistics to justify that decision, erodes the integrity of the scientific literature.

It is eight years since Nature adopted a policy of insisting that papers containing figures with error bars describe what the error bars represent². Nevertheless, it is still common to find papers in most biology journals — Nature included — that contain this and other basic statistical errors. In my opinion, the fact that these scientifically sloppy papers continue to be published means that the authors, reviewers and editors cannot comprehend the statistics, that they have not read the paper carefully, or both.

Why does this happen? Most cell and molecular biologists are taught some statistics during their high-school or undergraduate years, but the principles seem to be forgotten somewhere between graduation and starting in the lab. Often, the type of statistics they learnt is not relevant to the kinds of experiment they are now doing. And, once in the lab, people generally just do what everyone else does, without always understanding why.

Even if experimental biologists do not need to use statistical evidence for their own experiments, they should have an understanding of the basics so that they can interpret others' work critically. They don't all need to understand complex statistics, or to hire professional statisticians, but there would be fewer sloppy papers if every author, reviewer and editor understood statistical concepts such as standard deviation, standard error of the mean (s.e.m.), sampling error and the difference between replicate and independent data (see 'Statistics glossary').

Table 1 Statistics glossary: Some common statistical concepts and their uses in analysing experimental results.


Back to basics

In the life sciences there are typically two types of publication: those that use large data sets and rely mostly or wholly on statistical evidence (for example, epidemiology, psychology, clinical trials and genome-wide association studies), and those that do not — such as much cell and molecular biology, biochemistry and classical genetics.

For papers with large data sets that rely purely on statistical evidence, recommendations exist for computing sample size, reporting on outlying results and other issues³,⁴. But these guidelines do not serve authors of the other category of papers. Cell and molecular biologists have the luxury of being able to probe their experimental systems in multiple, independent ways and can therefore often get by with Ns of three, without the need for sophisticated statistics.

The first figure in a typical paper in cell or molecular biology, for example, might show the difference in phenotype between three wild-type and three gene-deleted mice. The second figure might compare the levels of proteins in cells derived from the mice, looking at both the deleted protein and one of its substrates, or the effects of treating wild-type cells with an inhibitor of the protein encoded by the deleted gene. If the evidence from these experiments is consistent, and gives support to a coherent model, it would be unnecessary to analyse 30 mice of each type, or to repeat the Western blots of protein levels 30 independent times. Watson and Crick's paper on the structure of DNA⁵ does not contain statistics, graphs with error bars or large Ns.

Understanding the rudiments of statistics would stop experimental biologists from calculating a P value and an s.e.m. from triplicates from one representative experiment, and might stop the reviewers and editors from letting these pass unquestioned. If the results from one representative experiment are shown, then N = 1 and statistics do not apply. Besides, it is always better to include a full data set, rather than withholding results that are not representative. When N is only 2 or 3, it would be more transparent to just plot the independent data points, and let the readers interpret the data for themselves, rather than showing possibly misleading P values or error bars and drawing statistical inferences.

If the data in an experiment are equivocal, or the effect size is small, it is much better to come up with an extra, mechanistically different, experiment to test the hypothesis, than to repeat the same experiment until P is less than 0.05.

If statistics are shown, it should be for a good reason. Descriptive statistics, such as range or standard deviations, are only necessary when there are too many data points to visualize easily. Inferential statistics (an s.e.m., confidence interval or P value) should be shown only if they make it easier to interpret the results, and they should not detract from other key considerations such as the magnitude of the effects or their biological significance.

Figure legends should state the number of independent data points and, for experiments in which replicates were performed, only the mean of the replicates should be shown as a single independent data point. For replicates, no statistics should be shown, because they give only an indication of the fidelity with which the replicates were created: they might indicate how good the pipetting was, but they have no bearing on the hypothesis being tested⁶.
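The rule above can be sketched in a few lines of Python. This is only an illustration, not a method taken from the article: the function names `independent_points` and `summarize` and the numbers are hypothetical. Each inner list stands for the replicates of one independent experiment. Replicates are collapsed to a single mean, and N, the standard deviation and the s.e.m. are computed only across those independent means.

```python
from math import sqrt
from statistics import mean, stdev


def independent_points(experiments):
    """Collapse the replicates of each experiment to one mean (one data point)."""
    return [mean(replicates) for replicates in experiments]


def summarize(experiments):
    """Summarize across independent experiments, never across replicates."""
    points = independent_points(experiments)
    n = len(points)
    summary = {"n": n, "points": points}
    # With a single experiment N = 1, so no spread or s.e.m. is reported.
    if n >= 2:
        sd = stdev(points)  # sample standard deviation of the independent means
        summary["mean"] = mean(points)
        summary["sd"] = sd
        summary["sem"] = sd / sqrt(n)
    return summary


# Hypothetical data: three independent experiments, each with triplicate wells.
experiments = [
    [10.0, 11.0, 12.0],
    [14.0, 15.0, 16.0],
    [18.0, 19.0, 20.0],
]
result = summarize(experiments)
print(result["n"], result["points"])

# One 'representative' experiment: N = 1, so only the single point is returned.
single = summarize([[10.0, 11.0, 12.0]])
print(single)
```

In this made-up data the replicates within each experiment agree closely, yet the three independent means differ by several units. Only that second spread bears on the hypothesis, and for N of 2 or 3 the `points` list itself is what would be plotted.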


All experimental biologists and all those who review their papers should know what sort of sampling errors are to be expected in common experiments, such as determining the percentages of live and dead cells or counting the number of colonies on a plate or cells in a microscope field. Otherwise, they will not be able to judge their own data critically, or anyone else's.
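As a rough guide to the size of such sampling errors, the sketch below uses two standard textbook approximations. They are assumptions made for illustration and are not stated in the article: a count of independently arising objects, such as colonies on a plate, is treated as Poisson-distributed, and the number of dead cells among those counted is treated as binomial. The function names and example numbers are hypothetical.

```python
from math import sqrt


def poisson_count_sd(count):
    """Expected standard deviation of a count under a Poisson assumption."""
    return sqrt(count)


def binomial_percent_se(positive, total):
    """Standard error, in percentage points, of a percentage from `total` counted cells."""
    p = positive / total
    return 100.0 * sqrt(p * (1.0 - p) / total)


# Hypothetical plate with 100 colonies: sampling error of roughly 10 colonies.
colonies = 100
sd = poisson_count_sd(colonies)

# Hypothetical count of 20 dead cells among 100: roughly 4 percentage points.
dead, counted = 20, 100
se = binomial_percent_se(dead, counted)

print(f"{colonies} colonies, expected SD about {sd:.1f}")
print(f"{100 * dead / counted:.0f}% dead, standard error about {se:.1f} points")
```

Under these assumptions, a plate with 90 colonies and one with 110 are not obviously different, and counting only 100 cells cannot reliably separate 20% dead from 25% dead.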

Repeat after me

How can the understanding and use of elementary statistics be improved? Young researchers need to be taught the practicalities of using statistics at the point at which they obtain the results of their very first experiments.

To encourage established researchers to use statistics properly, journals should publish guidelines for authors, reviewers and editors on the use and presentation of data and statistics that are relevant to the fields they cover. All journals should follow the lead of the Journal of Cell Biology⁷ and make a final check of all figures in accepted papers before publication. They should refuse to publish papers that contain fundamental errors, and readily publish corrections for published papers that fall short. This requires engaging reviewers who are statistically literate and editors who can verify the process. Numerical data should be made available either as part of the paper or as linked, computer-interpretable files so that readers can perform or confirm statistical analyses themselves.

When William Strunk Jr, a professor of English, was faced with a flood of errors in spelling, grammar and English usage, he wrote a short, practical guide that became The Elements of Style (also known as Strunk and White)⁸. Perhaps experimental biologists need a similar booklet on statistics.

References

1. Cumming, G., Fidler, F. & Vaux, D. L. J. Cell Biol. 177, 7–11 (2007).
2. Vaux, D. L. Nature 428, 799 (2004).
3. Landis, S. C. et al. Nature 490, 187–191 (2012).
4. Nakagawa, S. & Cuthill, I. C. Biol. Rev. Camb. Philos. Soc. 82, 591–605 (2007).
5. Watson, J. D. & Crick, F. H. Nature 171, 737–738 (1953).
6. Vaux, D. L., Fidler, F. & Cumming, G. EMBO Rep. 13, 291–296 (2012).
7. Rossner, M. The Scientist 20, 24–25 (2006).
8. Strunk, W. Jr & White, E. B. The Elements of Style (5th edn) (Allyn & Bacon, 2009).

Author information

David L. Vaux is professor of cell biology at the Walter and Eliza Hall Institute of Medical Research and at the University of Melbourne, Parkville, Victoria 3052, Australia.

Vaux, D. Know when your numbers are significant. Nature 492, 180–181 (2012). https://doi.org/10.1038/492180a

Issue Date: 13 December 2012
