Functional neuroimaging techniques have transformed our ability to probe the neurobiological basis of behaviour and are increasingly being applied by the wider neuroscience community. However, concerns have recently been raised that the conclusions that are drawn from some human neuroimaging studies are either spurious or not generalizable. Problems such as low statistical power, flexibility in data analysis, software errors and a lack of direct replication apply to many fields, but perhaps particularly to functional MRI. Here, we discuss these problems, outline current and suggested best practices, and describe how we think the field should evolve to produce the most meaningful and reliable answers to neuroscientific questions.
There is growing concern about the reproducibility of scientific research, and neuroimaging research suffers from many features that are thought to lead to high levels of false results.
Statistical power of neuroimaging studies has increased over time but remains relatively low, especially for group comparison studies. An analysis of effect sizes estimated from Human Connectome Project data demonstrates that most functional MRI studies are not sufficiently powered to detect plausible effects.
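As an illustration of the power calculations underlying this point, the sketch below (not taken from the article; it uses statsmodels and an assumed medium effect size) shows how required sample sizes and achieved power can be estimated for a simple two-group comparison.

```python
# A minimal power-analysis sketch (illustrative assumptions only) using statsmodels.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed to detect a medium effect (Cohen's d = 0.5)
# with 80% power in a two-sample t-test at alpha = 0.05.
n_per_group = analysis.solve_power(effect_size=0.5, power=0.8, alpha=0.05)
print(f"Required sample size per group: {n_per_group:.1f}")

# Conversely, the power achieved by a small group study (n = 20 per group).
achieved_power = analysis.solve_power(effect_size=0.5, nobs1=20, alpha=0.05)
print(f"Power with n = 20 per group: {achieved_power:.2f}")
```

Under these assumed values, roughly 64 participants per group are needed for 80% power, whereas 20 participants per group yields power of only about 0.34.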
Neuroimaging analysis has a high degree of flexibility in analysis methods, which can lead to inflated false-positive rates unless controlled for. Pre-registration of analysis plans and clear delineation of hypothesis-driven and exploratory research are potential solutions to this problem.
The use of appropriate corrections for multiple tests has increased, but some common methods can have highly inflated false-positive rates. The use of non-parametric methods is encouraged to provide accurate correction for multiple tests.
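A minimal sketch of one such non-parametric approach, maximum-statistic permutation correction for a one-sample design, is shown below (simulated data and sign-flipping; an illustrative example, not code from the article).

```python
# Permutation-based familywise error correction using the maximum statistic,
# on simulated data for a one-sample design.
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_voxels = 20, 1000
data = rng.normal(0, 1, (n_subjects, n_voxels))   # null data
data[:, :50] += 0.8                                # add a true effect in 50 voxels

observed_t = data.mean(axis=0) / (data.std(axis=0, ddof=1) / np.sqrt(n_subjects))

# Build a null distribution of the maximum statistic by randomly flipping signs.
n_perm = 1000
max_null = np.empty(n_perm)
for i in range(n_perm):
    signs = rng.choice([-1, 1], size=(n_subjects, 1))
    flipped = data * signs
    t_perm = flipped.mean(axis=0) / (flipped.std(axis=0, ddof=1) / np.sqrt(n_subjects))
    max_null[i] = t_perm.max()

# Voxels exceeding the 95th percentile of the max-statistic null distribution are
# significant at a familywise error rate of 0.05.
threshold = np.percentile(max_null, 95)
print("FWE-corrected threshold:", threshold)
print("Significant voxels:", np.sum(observed_t > threshold))
```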
Software errors have the potential to lead to incorrect or irreproducible results. The adoption of improved software engineering methods and software testing strategies can help to reduce such problems.
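For example, a simple unit test can guard against regressions in analysis code; the sketch below uses a hypothetical helper function and the pytest convention (neither is taken from any specific package).

```python
# A minimal software-testing sketch: a small analysis helper plus a unit test.
# Run with `pytest`.
import numpy as np


def percent_signal_change(timeseries):
    """Convert a voxel time series to percent signal change around its mean."""
    baseline = np.mean(timeseries)
    return 100.0 * (timeseries - baseline) / baseline


def test_percent_signal_change_is_zero_mean():
    data = np.array([100.0, 110.0, 90.0, 100.0])
    psc = percent_signal_change(data)
    # The converted series should average to zero and scale correctly.
    assert np.isclose(psc.mean(), 0.0)
    assert np.isclose(psc.max(), 10.0)
```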
Reproducibility will be improved through greater transparency in methods reporting and through increased sharing of data and code.
R.A.P., J.D., J.-B.P. and K.J.G. are supported by the Laura and John Arnold Foundation. J.D. has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 706561. M.R.M. is supported by the Medical Research Council (MRC) (MC UU 12013/6) and is a member of the UK Centre for Tobacco and Alcohol Studies, a UK Clinical Research Council Public Health Research Centre of Excellence. Funding from the British Heart Foundation, Cancer Research UK, the Economic and Social Research Council, the MRC and the National Institute for Health Research, under the auspices of the UK Clinical Research Collaboration, is gratefully acknowledged. C.I.B. is supported by the Intramural Research Program of the US National Institutes of Health (NIH)–National Institute of Mental Health (NIMH) (ZIA-MH002909). T.Y. is supported by the NIMH (R01MH096906). P.M.M. acknowledges personal support from the Edmond J. Safra Foundation and Lily Safra and research support from the MRC, the Imperial College Healthcare Trust Biomedical Research Centre and the Imperial Engineering and Physical Sciences Research Council Mathematics in Healthcare Centre. T.E.N. is supported by the Wellcome Trust (100309/Z/12/Z), NIH–National Institute of Neurological Disorders and Stroke (R01NS075066) and NIH–National Institute of Biomedical Imaging and Bioengineering (NIBIB) (R01EB015611). J.-B.P. is supported by the NIBIB (P41EB019936) and by NIH–National Institute on Drug Abuse (U24DA038653). Data were provided (in part) by the Human Connectome Project, WU-Minn Consortium (principal investigators: D. Van Essen and K. Ugurbil; 1U54MH091657), which is funded by the 16 Institutes and Centers of the NIH that support the NIH Blueprint for Neuroscience Research, and by the McDonnell Center for Systems Neuroscience at Washington University. The authors thank J. Wexler for performing annotation of Neurosynth data, S. David for providing sample-size data, and R. Cox and P. Taylor for helpful comments on a draft of the manuscript.
A depiction of the data from Figure 1 showing all data points.
- Linear mixed-effects analysis
An analysis in which some measured independent variables are treated as randomly sampled from the population, in contrast to a traditional fixed-effects analysis, in which all predictors are treated as fixed and known.
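For illustration, a minimal linear mixed-effects fit on simulated data (using statsmodels; the variable names are hypothetical) might look like this:

```python
# Fixed effect of condition, random intercept per subject, on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subjects, n_trials = 30, 40
subject = np.repeat(np.arange(n_subjects), n_trials)
condition = np.tile([0, 1], n_subjects * n_trials // 2)
subject_offset = rng.normal(0, 1, n_subjects)[subject]   # random intercept per subject
response = 0.5 * condition + subject_offset + rng.normal(0, 1, n_subjects * n_trials)

df = pd.DataFrame({"response": response, "condition": condition, "subject": subject})

model = smf.mixedlm("response ~ condition", df, groups=df["subject"])
result = model.fit()
print(result.summary())
```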
- Familywise error
(FWE). The probability of at least one false positive among multiple statistical tests.
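For illustration (simulated numbers, not from the article), the familywise error rate grows quickly with the number of independent tests, and a simple Bonferroni correction controls it:

```python
# How familywise error accumulates across independent tests, and the Bonferroni fix.
n_tests, alpha = 100, 0.05

# Probability of at least one false positive if all nulls are true and tests are independent.
fwe = 1 - (1 - alpha) ** n_tests
print(f"Uncorrected FWE across {n_tests} tests: {fwe:.3f}")   # ~0.994

# Bonferroni correction keeps the familywise error rate at or below alpha.
print(f"Bonferroni-corrected per-test threshold: {alpha / n_tests}")
```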
- Random field theory
The theory describing the behaviour of smooth random fields defined over space; in neuroimaging, it is used to compute statistical thresholds that correct for multiple tests.
- Euler characteristic
A topological measure that is used to describe the set of thresholded voxels in the context of random field theory.
- False discovery rate
(FDR). The expected proportion of false positives among all significant findings when performing multiple statistical tests.
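For illustration, the Benjamini-Hochberg FDR procedure as implemented in statsmodels, applied to simulated p-values (a sketch, not code from the article):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
p_null = rng.uniform(size=900)              # p-values from true null tests
p_signal = rng.uniform(size=100) * 0.01     # p-values from true effects
p_values = np.concatenate([p_signal, p_null])

# Benjamini-Hochberg procedure controlling the FDR at 5%.
rejected, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
print("Tests declared significant:", rejected.sum())
```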
- Functional localizer
An independent scan that is used to identify regions on the basis of their functional response; for example, identifying face-responsive regions by their response to faces.
- Bayesian methods
An approach to statistical analysis in which beliefs about parameters or hypotheses are expressed as probability distributions and updated in the light of observed data, allowing candidate models to be compared on an equal footing.
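For illustration, a minimal Bayesian updating sketch (hypothetical counts, using scipy): a Beta prior on a success probability is updated by observed data to give a posterior distribution.

```python
from scipy import stats

prior = stats.beta(a=1, b=1)          # uniform prior belief about a probability
successes, failures = 12, 8           # hypothetical observed data

# Conjugate update: the posterior is also a Beta distribution.
posterior = stats.beta(a=1 + successes, b=1 + failures)
print("Prior mean:", prior.mean())
print("Posterior mean:", posterior.mean())                 # ~0.59
print("95% credible interval:", posterior.interval(0.95))
```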
- Mass univariate testing
An approach to the analysis of multivariate data in which the same model is fit to each element of the observed data (for example, each voxel).
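For illustration, a mass univariate analysis on simulated data (a sketch, not code from the article) fits the same one-sample t-test independently at every voxel:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_subjects, n_voxels = 20, 5000
data = rng.normal(0, 1, (n_subjects, n_voxels))

# One t-test per voxel; each voxel gets its own statistic and p-value.
t_values, p_values = stats.ttest_1samp(data, popmean=0, axis=0)
print(t_values.shape, p_values.shape)   # (5000,) (5000,)
```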
- Permutation tests
Also known as randomization tests. Approaches for testing statistical significance in which the observed statistic is compared with a null distribution obtained by rearranging the labels of the observed data.
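For illustration, a minimal two-group permutation test on simulated data (a sketch, not code from the article):

```python
import numpy as np

rng = np.random.default_rng(3)
group_a = rng.normal(0.5, 1, 25)
group_b = rng.normal(0.0, 1, 25)
observed = group_a.mean() - group_b.mean()

# Build the null distribution by repeatedly shuffling the group labels.
pooled = np.concatenate([group_a, group_b])
n_perm = 10000
null_diffs = np.empty(n_perm)
for i in range(n_perm):
    shuffled = rng.permutation(pooled)
    null_diffs[i] = shuffled[:25].mean() - shuffled[25:].mean()

# Two-sided p-value: how often a shuffled difference is at least as extreme.
p_value = np.mean(np.abs(null_diffs) >= np.abs(observed))
print(f"Permutation p-value: {p_value:.4f}")
```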
- 'Not invented here' philosophy
The philosophy that any solution to a problem that was developed by someone else is necessarily inferior and must be re-engineered from scratch.
- Interpolation
The operation by which a function is applied to sampled data to obtain estimates of the data at positions where data have not been sampled.
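For illustration, a minimal one-dimensional interpolation sketch with NumPy (hypothetical sample positions and values):

```python
import numpy as np

sampled_x = np.array([0.0, 1.0, 2.0, 3.0])   # positions where data were acquired
sampled_y = np.array([0.0, 2.0, 1.0, 3.0])   # measured values at those positions

# Estimate values at positions between the original samples (linear interpolation).
new_x = np.array([0.5, 1.5, 2.5])
estimated_y = np.interp(new_x, sampled_x, sampled_y)
print(estimated_y)                            # [1.  1.5 2. ]
```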
- Software container
A self-contained software tool that encompasses all of the necessary software and dependencies to run a particular program.