Statistics for Biologists

The correct use of statistics is not just good for science — it is essential.
There is no disputing the importance of statistical analysis in biological research, but too often it is considered only after an experiment is completed, when it may be too late.
This collection highlights important statistical issues that biologists should be aware of and provides practical advice to help them improve the rigor of their work.
Nature Methods' Points of Significance column on statistics explains many key statistical and experimental design concepts. Other resources include an online plotting tool and links to statistics guides from other publishers.
Statistics in biology
Experimental biologists, their reviewers and their publishers must grasp basic statistics, urges David L. Vaux, or sloppy science will continue to grow.
P values, the 'gold standard' of statistical validity, are not as reliable as many scientists assume.
One-quarter of studies that meet a commonly used statistical cutoff may be false.
The reliability and reproducibility of science are under scrutiny. However, a major cause of this lack of repeatability is not being considered: the wide sample-to-sample variability in the P value. We explain why P is fickle to discourage the ill-informed practice of interpreting analyses based predominantly on this statistic.
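The fickleness of P can be seen in a minimal simulation (pure Python, standard library only; all numbers here are illustrative choices, not from the article): the same experiment, with a genuine one-standard-deviation effect and ten samples per group, is repeated 100 times, and the resulting P values range over several orders of magnitude. A two-sample z-test with known standard deviation is used purely to keep the example self-contained.

```python
import math
import random

def z_test_p(sample_a, sample_b, sigma=1.0):
    """Two-sided two-sample z-test p-value, assuming a known standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    diff = sum(sample_b) / n_b - sum(sample_a) / n_a
    se = sigma * math.sqrt(1 / n_a + 1 / n_b)
    z = diff / se
    # p-value from the standard normal CDF via the error function
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(1)
true_effect = 1.0   # a substantial, perfectly real effect (illustrative value)
p_values = []
for _ in range(100):  # 100 identical repeats of the same experiment
    a = [random.gauss(0, 1) for _ in range(10)]
    b = [random.gauss(true_effect, 1) for _ in range(10)]
    p_values.append(z_test_p(a, b))

print(f"min p = {min(p_values):.5f}, max p = {max(p_values):.3f}")
```

Even though every repeat samples the same underlying effect, some runs fall far below 0.05 and others far above it — exactly the sample-to-sample variability the article warns about.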
As the data deluge swells, statisticians are evolving from contributors to collaborators. Sallie Ann Keller urges funders, universities and associations to encourage this shift.
Animal studies have contributed immensely to our understanding of diseases and have assisted the development of new therapies, but inadequate experimental reporting can sometimes render such studies difficult to reproduce and to translate into the clinic. This year, a US National Institute of Neurological Disorders and Stroke workshop addressed this issue, and its conclusions are discussed in a Perspective piece in this issue of Nature. The main workshop recommendation is that at a minimum, studies should report on randomization, blinding, sample-size estimation and how the data were handled.
This Review discusses the principles and applications of significance testing and power calculation, including recently proposed gene-based tests for rare variants.
Low-powered studies lead to overestimates of effect size and low reproducibility of results. In this Analysis article, Munafò and colleagues show that the average statistical power of studies in the neurosciences is very low, discuss ethical implications of low-powered studies and provide recommendations to improve research practices.
The authors analyze a large corpus of the neuroscience literature and demonstrate that nearly half of the published studies considered incorrectly compared effect sizes by comparing their significance levels.
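The error the authors document — declaring effect A "significant" and effect B "not significant," then concluding A and B differ — can be made concrete with invented numbers: two effects measured with equal standard errors can straddle the 0.05 threshold even though a direct test of their difference shows no evidence that they differ.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Two experiments estimate the same kind of effect, each with SE = 1 (illustrative).
effect_a, effect_b, se = 2.5, 1.5, 1.0

p_a = two_sided_p(effect_a / se)          # ~0.012 -> "significant"
p_b = two_sided_p(effect_b / se)          # ~0.134 -> "not significant"

# The correct comparison: test the difference between the two effects directly.
z_diff = (effect_a - effect_b) / math.sqrt(se**2 + se**2)
p_diff = two_sided_p(z_diff)              # ~0.48 -> no evidence the effects differ
```

The difference between "significant" and "not significant" is not itself statistically significant; only the explicit interaction test answers whether the two effects differ.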
Hierarchical models provide reliable statistical estimates for data sets from high-throughput experiments where measurements vastly outnumber experimental samples.
Alkes Price, Peter Visscher and colleagues provide recommendations on the application of mixed-linear-model association methods across a range of study designs.
A protocol providing guidelines on the organizational aspects of genome-wide association meta-analyses and on implementing quality control at the study file level, at the meta-level across studies, and at the meta-analysis output level.
This perspective illustrates some of the problems involved in analyzing the complex data yielded by systems neuroscience techniques, such as brain imaging and electrophysiology. Specifically, when test statistics are not independent of the selection criteria, common analyses can produce spurious results. The authors suggest ways to avoid such errors.
The authors examine papers in high profile journals and find that while collection of multiple observations from a single research object is common practice, such nested data are often analyzed using inappropriate statistical techniques. The authors show that this results in increased Type I error rates, and propose multilevel modelling to address this issue.
When prioritizing hits from a high-throughput experiment, it is important to correct for random events that falsely appear significant. How is this done and what methods should be used?
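One widely used answer is the Benjamini–Hochberg procedure, which controls the false discovery rate rather than the chance of any single false positive. A minimal sketch (the five p-values are invented for illustration):

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of hypotheses rejected at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Compare each sorted p-value with its stepped-up threshold rank*q/m
        if p_values[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])  # reject everything up to the largest passing rank

# Five tests: four modest p-values and one clear null.
p = [0.01, 0.02, 0.03, 0.04, 0.50]
print(benjamini_hochberg(p))   # rejects the first four at FDR 0.05
```

By contrast, a Bonferroni correction at the same level (0.05/5 = 0.01 per test) would reject only the first hypothesis — the trade-off between stringency and power that the question above is really about.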
There seem to be a lot of computational biology papers with 'Bayesian' in their titles these days. What's distinctive about 'Bayesian' methods?
Statistical models called hidden Markov models are a recurring theme in computational biology. What are hidden Markov models, and why are they so useful for so many different problems?
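A toy version of the classic "dishonest casino" example conveys the idea: the hidden states (which coin is in play) are never observed, only the flips, and the Viterbi algorithm recovers the most probable state path. This is a generic textbook sketch, not code from the article; all probabilities are invented.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for an observation sequence (log space)."""
    v = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = []
    for o in obs[1:]:
        scores, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda r: v[-1][r] + math.log(trans_p[r][s]))
            ptr[s] = prev
            scores[s] = v[-1][prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][o])
        v.append(scores)
        back.append(ptr)
    path = [max(states, key=lambda s: v[-1][s])]  # best final state
    for ptr in reversed(back):                    # follow back-pointers
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy model: a casino switches between a fair coin and a heads-loaded coin.
states = ("F", "L")
start = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.1, "L": 0.9}}
emit = {"F": {"H": 0.5, "T": 0.5}, "L": {"H": 0.9, "T": 0.1}}

print(viterbi("HHHH", states, start, trans, emit))  # all 'L': heads favor loaded
print(viterbi("TTTT", states, start, trans, emit))  # all 'F': tails favor fair
```

The same machinery, with nucleotides or amino acids as emissions, underlies gene finding, profile alignment and many other sequence-analysis methods.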
Statistics does not tell us whether we are right. It tells us the chances of being wrong.
The meaning of error bars is often misinterpreted, as is the statistical significance of their overlap.
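A common misreading is that overlapping 95% confidence-interval bars rule out a significant difference. A small worked example (invented means and SEMs, normal approximation) shows otherwise:

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

mean_a, mean_b, sem = 0.0, 3.0, 1.0      # two group means with equal SEMs
half_width = 1.96 * sem                  # half-width of each 95% CI error bar

# The bars overlap: the top of A's bar sits above the bottom of B's bar...
bars_overlap = (mean_a + half_width) > (mean_b - half_width)

# ...yet the difference is significant, because the SE of the difference
# adds the two SEMs in quadrature rather than end to end.
z = (mean_b - mean_a) / math.sqrt(sem**2 + sem**2)
p = two_sided_p(z)                       # ~0.034
```

The bars overlap, yet p is below 0.05 — error-bar overlap is a poor proxy for a formal comparison of the means.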
The P value reported by tests is a probabilistic significance, not a biological one.
The ability to detect experimental effects is undermined in studies that lack power.
Use box plots to illustrate the spread and differences of samples.
Robustly comparing pairs of independent or related samples requires different approaches to the t-test.
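One robust alternative for independent samples is an exact permutation test: every reassignment of the pooled values to the two groups is enumerated, and the P value is the fraction of reassignments with a mean difference at least as extreme as the observed one. A minimal sketch with invented data (feasible here because the samples are tiny):

```python
from itertools import combinations

def exact_permutation_p(a, b):
    """Exact two-sided permutation test for a difference in group means."""
    pooled = a + b
    observed = abs(sum(b) / len(b) - sum(a) / len(a))
    count = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        diff = abs(sum(group_b) / len(group_b) - sum(group_a) / len(group_a))
        total += 1
        if diff >= observed - 1e-12:   # count splits at least as extreme
            count += 1
    return count / total

a = [1.1, 1.5, 1.7, 2.0]   # e.g. control measurements (invented)
b = [2.6, 2.8, 3.0, 3.1]   # e.g. treated measurements (invented)
print(exact_permutation_p(a, b))   # 2/70 ~ 0.029
```

For related (paired) samples the analogous exact test instead flips the signs of the within-pair differences — a reminder that independent and paired designs call for different procedures, not the same t-test applied twice.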
When a large number of tests are performed, P values must be interpreted differently.
Nonparametric tests robustly compare skewed or ranked data.
Good experimental designs limit the impact of variability and reduce sample-size requirements.
Good experimental designs mitigate experimental error and the impact of factors not under study.
Quality is often more important than quantity.
For studies with hierarchical noise sources, use a nested analysis of variance approach.
When multiple factors can affect a system, allowing for interaction can increase sensitivity.
When some factors are harder to vary than others, a split plot design can be efficient.
Incorporate new evidence to update prior information.
Today's predictions are tomorrow's priors.
When multiple variables are associated with a response, the interpretation of a prediction equation is seldom simple.
Some outliers influence the regression fit more than others.
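Why position matters can be shown with a small leverage demonstration (invented data): the same-sized error shifts the fitted slope far more when it occurs at the edge of the x range, where leverage is high, than near the centre.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

x = list(range(1, 9))               # x = 1..8
y_clean = [float(xi) for xi in x]   # points lying exactly on the line y = x

# The same error (+5) placed near the centre vs. at the edge of the x range.
y_centre = y_clean[:]; y_centre[3] += 5.0    # outlier at x = 4, near the mean of x
y_edge = y_clean[:];   y_edge[7] += 5.0      # outlier at x = 8, far from the mean

print(ols_slope(x, y_clean))    # 1.0
print(ols_slope(x, y_centre))   # ~0.94: small shift
print(ols_slope(x, y_edge))     # ~1.42: the high-leverage point drags the fit
```

Diagnostics such as leverage and Cook's distance quantify exactly this: a point's influence depends on its residual and on how far its x value sits from the bulk of the data.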