Collection 09 May 2017

Statistics for Biologists

There is no disputing the importance of statistical analysis in biological research, but too often it is considered only after an experiment is completed, when it may be too late.

This collection highlights important statistical issues that biologists should be aware of and provides practical advice to help them improve the rigor of their work.

Nature Methods' Points of Significance column on statistics explains many key statistical and experimental design concepts. Other resources include an online plotting tool and links to statistics guides from other publishers.

Image Credit: Erin DeWalt

Statistics in biology

Number crunch

The correct use of statistics is not just good for science — it is essential.

Editorial 12 Feb 2014 Nature
Know when your numbers are significant

Experimental biologists, their reviewers and their publishers must grasp basic statistics, urges David L. Vaux, or sloppy science will continue to grow.
- David L. Vaux
Comment 12 Dec 2012 Nature
Scientific method: Statistical errors

P values, the 'gold standard' of statistical validity, are not as reliable as many scientists assume.
- Regina Nuzzo
News Feature 12 Feb 2014 Nature
Weak statistical standards implicated in scientific irreproducibility

One-quarter of studies that meet a commonly used statistical cutoff may be false.
- Erika Check Hayden
News 11 Nov 2013 Nature
The fickle P value generates irreproducible results

The reliability and reproducibility of science are under scrutiny. However, a major cause of this lack of repeatability is not being considered: the wide sample-to-sample variability in the P value. We explain why P is fickle to discourage the ill-informed practice of interpreting analyses based predominantly on this statistic.
- Lewis G Halsey
- Douglas Curran-Everett
- Gordon B Drummond
Commentary 26 Feb 2015 Nature Methods
Vital statistics

As the data deluge swells, statisticians are evolving from contributors to collaborators. Sallie Ann Keller urges funders, universities and associations to encourage this shift.
- Sallie Ann Keller
Comment 20 Oct 2010 Nature
Sometimes Bayesian statistics are better
- Stefan Herzog
- Dirk Ostwald
Correspondence 6 Feb 2013 Nature
A call for transparent reporting to optimize the predictive value of preclinical research

Deficiencies in methods reporting in animal experimentation lead to difficulties in reproducing experiments; the authors propose a set of reporting standards to improve scientific communication and study design.
- Story C. Landis
- Susan G. Amara
- Shai D. Silberberg
Perspective Open Access 10 Oct 2012 Nature

Practical guides

Statistical power and significance testing in large-scale genetic studies

This Review discusses the principles and applications of significance testing and power calculation, including recently proposed gene-based tests for rare variants.
- Pak C. Sham
- Shaun M. Purcell
Review Article 17 Apr 2014 Nature Reviews Genetics
Power failure: why small sample size undermines the reliability of neuroscience

Low-powered studies lead to overestimates of effect size and low reproducibility of results. In this Analysis article, Munafò and colleagues show that the average statistical power of studies in the neurosciences is very low, discuss ethical implications of low-powered studies and provide recommendations to improve research practices.
- Katherine S. Button
- John P. A. Ioannidis
- Marcus R. Munafò
Analysis 10 Apr 2013 Nature Reviews Neuroscience
Basic statistical analysis in genetic case-control studies
- Geraldine M Clarke
- Carl A Anderson
- Krina T Zondervan
Protocol 3 Feb 2011 Nature Protocols
Erroneous analyses of interactions in neuroscience: a problem of significance

The authors analyze a large corpus of the neuroscience literature and demonstrate that nearly half of the published studies considered incorrectly compared effect sizes by comparing their significance levels.
- Sander Nieuwenhuis
- Birte U Forstmann
- Eric-Jan Wagenmakers
Perspective 26 Aug 2011 Nature Neuroscience
Analyzing 'omics data using hierarchical models

Hierarchical models provide reliable statistical estimates for data sets from high-throughput experiments where measurements vastly outnumber experimental samples.
- Hongkai Ji
- X Shirley Liu
Primer 1 Apr 2010 Nature Biotechnology
Advantages and pitfalls in the application of mixed-model association methods

Alkes Price, Peter Visscher and colleagues provide recommendations on the application of mixed-linear-model association methods across a range of study designs.
- Jian Yang
- Noah A Zaitlen
- Alkes L Price
Perspective 29 Jan 2014 Nature Genetics
Quality control and conduct of genome-wide association meta-analyses

This protocol provides guidelines on the organizational aspects of genome-wide association meta-analyses and on implementing quality control at the study file level, the meta-level across studies, and the meta-analysis output level.
- Thomas W Winkler
- Felix R Day
- The Genetic Investigation of Anthropometric Traits (GIANT) Consortium
Protocol 24 Apr 2014 Nature Protocols
Circular analysis in systems neuroscience: the dangers of double dipping

This perspective illustrates some of the problems involved in analyzing the complex data yielded by systems neuroscience techniques, such as brain imaging and electrophysiology. Specifically, when test statistics are not independent of the selection criteria, common analyses can produce spurious results. The authors suggest ways to avoid such errors.
- Nikolaus Kriegeskorte
- W Kyle Simmons
- Chris I Baker
Perspective 26 Apr 2009 Nature Neuroscience
A solution to dependency: using multilevel analysis to accommodate nested data

The authors examine papers in high-profile journals and find that, although collecting multiple observations from a single research object is common practice, such nested data are often analyzed using inappropriate statistical techniques. The authors show that this results in increased Type I error rates, and propose multilevel modelling to address this issue.
- Emmeke Aarts
- Matthijs Verhage
- Sophie van der Sluis
Perspective 26 Mar 2014 Nature Neuroscience
How does multiple testing correction work?

When prioritizing hits from a high-throughput experiment, it is important to correct for random events that falsely appear significant. How is this done and what methods should be used?
- William S Noble
Primer 1 Dec 2009 Nature Biotechnology
What is Bayesian statistics?

There seem to be a lot of computational biology papers with 'Bayesian' in their titles these days. What's distinctive about 'Bayesian' methods?
- Sean R Eddy
Primer 1 Sep 2004 Nature Biotechnology
What is a hidden Markov model?

Statistical models called hidden Markov models are a recurring theme in computational biology. What are hidden Markov models, and why are they so useful for so many different problems?
- Sean R Eddy
Primer 1 Oct 2004 Nature Biotechnology
Importance of being uncertain

Statistics does not tell us whether we are right. It tells us the chances of being wrong.
- Martin Krzywinski
- Naomi Altman
This Month 29 Aug 2013 Nature Methods
Error bars

The meaning of error bars is often misinterpreted, as is the statistical significance of their overlap.
- Martin Krzywinski
- Naomi Altman
This Month 27 Sep 2013 Nature Methods
Significance, P values and t-tests

The P value reported by tests is a probabilistic significance, not a biological one.
- Martin Krzywinski
- Naomi Altman
This Month 30 Oct 2013 Nature Methods
Power and sample size

The ability to detect experimental effects is undermined in studies that lack power.
- Martin Krzywinski
- Naomi Altman
This Month 26 Nov 2013 Nature Methods
Visualizing samples with box plots

Use box plots to illustrate the spread and differences of samples.
- Martin Krzywinski
- Naomi Altman
This Month 30 Jan 2014 Nature Methods
Comparing samples—part I

Robustly comparing pairs of independent or related samples requires different approaches to the t-test.
- Martin Krzywinski
- Naomi Altman
This Month 27 Feb 2014 Nature Methods
Comparing samples—part II

When a large number of tests are performed, P values must be interpreted differently.
- Martin Krzywinski
- Naomi Altman
This Month 28 Mar 2014 Nature Methods
Nonparametric tests

Nonparametric tests robustly compare skewed or ranked data.
- Martin Krzywinski
- Naomi Altman
This Month 29 Apr 2014 Nature Methods
Designing comparative experiments

Good experimental designs limit the impact of variability and reduce sample-size requirements.
- Martin Krzywinski
- Naomi Altman
This Month 29 May 2014 Nature Methods
Analysis of variance and blocking

Good experimental designs mitigate experimental error and the impact of factors not under study.
- Martin Krzywinski
- Naomi Altman
This Month 27 Jun 2014 Nature Methods
Replication

Quality is often more important than quantity.
- Paul Blainey
- Martin Krzywinski
- Naomi Altman
This Month 28 Aug 2014 Nature Methods
Nested designs

For studies with hierarchical noise sources, use a nested analysis of variance approach.
- Martin Krzywinski
- Naomi Altman
- Paul Blainey
This Month 29 Sep 2014 Nature Methods
Two-factor designs

When multiple factors can affect a system, allowing for interaction can increase sensitivity.
- Martin Krzywinski
- Naomi Altman
This Month 25 Nov 2014 Nature Methods
Sources of variation
- Naomi Altman
- Martin Krzywinski
This Month 30 Dec 2014 Nature Methods
Split plot design

When some factors are harder to vary than others, a split plot design can be efficient.
- Naomi Altman
- Martin Krzywinski
This Month 26 Feb 2015 Nature Methods
Bayes' theorem

Incorporate new evidence to update prior information.
- Jorge López Puga
- Martin Krzywinski
- Naomi Altman
This Month 31 Mar 2015 Nature Methods
Bayesian statistics

Today's predictions are tomorrow's priors.
- Jorge López Puga
- Martin Krzywinski
- Naomi Altman
This Month 29 Apr 2015 Nature Methods
Sampling distributions and the bootstrap
- Anthony Kulesa
- Martin Krzywinski
- Naomi Altman
This Month 28 May 2015 Nature Methods
Bayesian networks
- Jorge López Puga
- Martin Krzywinski
- Naomi Altman
This Month 28 Aug 2015 Nature Methods
Association, correlation and causation
- Naomi Altman
- Martin Krzywinski
This Month 29 Sep 2015 Nature Methods
Simple linear regression
- Naomi Altman
- Martin Krzywinski
This Month 29 Oct 2015 Nature Methods
Multiple linear regression

When multiple variables are associated with a response, the interpretation of a prediction equation is seldom simple.
- Martin Krzywinski
- Naomi Altman
This Month 1 Dec 2015 Nature Methods
Analyzing outliers: influential or nuisance?

Some outliers influence the regression fit more than others.
- Naomi Altman
- Martin Krzywinski
This Month 30 Mar 2016 Nature Methods
