Using Bayes factor hypothesis testing in neuroscience to establish evidence of absence

Abstract

Most neuroscientists would agree that for brain research to progress, we have to know which experimental manipulations have no effect as much as we must identify those that do have an effect. The dominant statistical approaches used in neuroscience rely on P values and can establish the latter but not the former. This makes non-significant findings difficult to interpret: do they support the null hypothesis or are they simply not informative? Here we show how Bayesian hypothesis testing can be used in neuroscience studies to establish both whether there is evidence of absence and whether there is absence of evidence. Through simple tutorial-style examples of Bayesian t-tests and ANOVA using the open-source project JASP, this article aims to empower neuroscientists to use this approach to provide compelling and rigorous evidence for the absence of an effect.

Fig. 1: P value of a t-test and BF+0 as a function of effect size and sample size.
Fig. 2: Hypothesis testing under the Bayesian framework.
Fig. 3: Illustration of the data for the two simulated scenarios.
Fig. 4: Screenshot from the ‘Bayesian Independent Samples T-Test’ in JASP.
Fig. 5: Screenshot of the Bayesian repeated measures ANOVA of muscimol1.
Fig. 6: Further outputs for the Bayesian t-test on muscimol1.csv.

Data availability

All data and code can be downloaded at https://osf.io/md9kp/.

References

  1. Benjamin, D. J. et al. Redefine statistical significance. Nat. Hum. Behav. 2, 6–10 (2018).
  2. Dienes, Z. Using Bayes to get the most out of non-significant results. Front. Psychol. 5, 781 (2014).
  3. Gallistel, C. R. The importance of proving the null. Psychol. Rev. 116, 439–453 (2009).
  4. Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D. & Iverson, G. Bayesian t tests for accepting and rejecting the null hypothesis. Psychon. Bull. Rev. 16, 225–237 (2009).
  5. Love, J. et al. JASP: Graphical statistical software for common statistical designs. J. Stat. Softw. 88, 1–17 (2019).
  6. Wagenmakers, E.-J. et al. The need for Bayesian hypothesis testing in psychological science. in Psychological Science Under Scrutiny: Recent Challenges and Proposed Solutions (eds. Lilienfeld, S. O. & Waldman, I.) 123–138 (Wiley, 2017).
  7. Altman, D. G. & Bland, J. M. Absence of evidence is not evidence of absence. Br. Med. J. 311, 485 (1995).
  8. Edwards, W., Lindman, H. & Savage, L. J. Bayesian statistical inference for psychological research. Psychol. Rev. 70, 193–242 (1963).
  9. Jeffreys, H. Theory of Probability (Oxford University Press, 1961).
  10. Szucs, D. & Ioannidis, J. P. A. Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature. PLoS Biol. 15, e2000797 (2017).
  11. Etz, A. & Wagenmakers, E.-J. J. B. S. Haldane’s contribution to the Bayes factor hypothesis test. Stat. Sci. 32, 313–329 (2017).
  12. Kass, R. E. & Raftery, A. E. Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995).
  13. Lee, M. D. & Wagenmakers, E.-J. Bayesian Cognitive Modeling: A Practical Course (Cambridge University Press, 2013).
  14. Morey, R. D. & Rouder, J. N. BayesFactor: computation of Bayes factors for common designs. v. 0.9.12–4.2 https://cran.r-project.org/package=BayesFactor (2018).
  15. Carrillo, M. et al. Emotional mirror neurons in the rat’s anterior cingulate cortex. Curr. Biol. 29, 1301–1312.e6 (2019).
  16. Jeffreys, H. Theory of Probability (Oxford University Press, 1939).
  17. Nieuwenhuis, S., Forstmann, B. U. & Wagenmakers, E.-J. Erroneous analyses of interactions in neuroscience: a problem of significance. Nat. Neurosci. 14, 1105–1107 (2011).
  18. Gelman, A. & Stern, H. The difference between “significant” and “not significant” is not itself statistically significant. Am. Stat. 60, 328–331 (2006).
  19. Morey, R. D. & Rouder, J. N. Bayes factor approaches for testing interval null hypotheses. Psychol. Methods 16, 406–419 (2011).
  20. Rouder, J. N., Morey, R. D., Speckman, P. L. & Province, J. M. Default Bayes factors for ANOVA designs. J. Math. Psychol. 56, 356–374 (2012).
  21. Rouder, J. N., Engelhardt, C. R., McCabe, S. & Morey, R. D. Model comparison in ANOVA. Psychon. Bull. Rev. 23, 1779–1786 (2016).
  22. Myung, I. J. & Pitt, M. A. Applying Occam’s razor in modeling cognition: a Bayesian approach. Psychon. Bull. Rev. 4, 79–95 (1997).
  23. Efron, B. Why isn’t everyone a Bayesian? Am. Stat. 40, 1–5 (1986).
  24. Lee, M. D. & Vanpaemel, W. Determining informative priors for cognitive models. Psychon. Bull. Rev. 25, 114–127 (2018).
  25. Bayarri, M. J., Berger, J. O., Forte, A. & Garcia-Donato, G. Criteria for Bayesian model choice with application to variable selection. Ann. Stat. 40, 1550–1577 (2012).
  26. Cremers, H. R., Wager, T. D. & Yarkoni, T. The relation between statistical power and inference in fMRI. PLoS ONE 12, e0184923 (2017).
  27. Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D. & Wagenmakers, E.-J. The fallacy of placing confidence in confidence intervals. Psychon. Bull. Rev. 23, 103–123 (2016).
  28. Marsman, M., Waldorp, L., Dablander, F. & Wagenmakers, E.-J. Bayesian estimation of explained variance in ANOVA designs. Stat. Neerl. 73, 351–372 (2019).
  29. van Doorn, J., Marsman, M., Ly, A. & Wagenmakers, E.-J. Bayesian rank-based hypothesis testing for the rank sum test, the signed rank test, and Spearman’s ρ. J. Appl. Stat. https://doi.org/10.1080/02664763.2019.1709053 (2020).
  30. Wagenmakers, E.-J., Morey, R. D. & Lee, M. D. Bayesian benefits for the pragmatic researcher. Curr. Dir. Psychol. Sci. 25, 169–176 (2016).
  31. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (The MIT Press, 1998).
  32. Wrinch, D. & Jeffreys, H. On certain fundamental principles of scientific inquiry. Philos. Mag. 42, 368–374 (1923).
  33. Rozeboom, W. W. The fallacy of the null-hypothesis significance test. Psychol. Bull. 57, 416–428 (1960).
  34. Stefan, A. M., Gronau, Q. F., Schönbrodt, F. D. & Wagenmakers, E.-J. A tutorial on Bayes factor design analysis using an informed prior. Behav. Res. Methods 51, 1042–1058 (2019).
  35. Wagenmakers, E.-J. et al. Bayesian inference for psychology. Part II: example applications with JASP. Psychon. Bull. Rev. 25, 58–76 (2018).
  36. Rouder, J. N. Optional stopping: no problem for Bayesians. Psychon. Bull. Rev. 21, 301–308 (2014).
  37. Schönbrodt, F. D. & Wagenmakers, E.-J. Bayes factor design analysis: planning for compelling evidence. Psychon. Bull. Rev. 25, 128–142 (2018).
  38. Consonni, G., Fouskakis, D., Liseo, B. & Ntzoufras, I. Prior distributions for objective Bayesian analysis. Bayesian Anal. 13, 627–679 (2018).
  39. Gronau, Q. F., Ly, A. & Wagenmakers, E.-J. Informed Bayesian t-tests. Am. Stat. 74, 137–143 (2019).


Acknowledgements

C.K. is funded by NWO VICI grant 453-15-009; V.G. is funded by ERC grant 758703 and NWO VIDI grant 452-14-015; E.J.W. is funded by NWO VICI grant 453-16-003. We thank F. Bartos for help with Fig. 2.

Author information

Contributions

All authors conceived the project together and contributed to the writing of the manuscript. E.J.W. coordinates the development of JASP.

Corresponding author

Correspondence to Christian Keysers.

Ethics declarations

Competing interests

E.J.W. declares that he coordinates the development of the open-source software package JASP (https://jasp-stats.org), a non-commercial, publicly-funded effort to make Bayesian statistics accessible to a broader group of researchers and students. C.K. and V.G. declare no competing interests.

Additional information

Peer review information: Nature Neuroscience thanks Denise Cai, Zhe Dong, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 The relationship between BF, p, and effect size values.

a, This log-log plot shows the BF+0 values corresponding to familiar critical p values for a one-tailed one-sample t-test at different sample sizes (n). The curves show the BF+0 values obtained in a Bayesian t-test based on the critical t-value that provides P=0.05 (yellow), P=0.01 (green), P=0.005 (black) and P=0.001 (black). The yellow dashed horizontal line indicates the BF+0=3 bound for moderate evidence considered by Jeffreys9 to be similar to P=0.05, the green one the BF+0=10 bound for strong evidence considered similar to P=0.01. The two black dashed lines mark BF+0=1, i.e. the line of no evidence, and BF+0=1/3, the bound for moderate evidence of absence. The background gradient reminds the reader that the BF reference values of 3 and 10 should not be considered hard bounds. Instead, the BF should be interpreted as a continuous value, with values diverging more from 1 supporting stronger conclusions. This panel makes two points. First, there is no simple equivalence between p and BF that holds over all sample sizes. This is because in a frequentist t-test, the observed effect size (d) sufficient to generate a specific p value decreases with \(\sqrt{n}\) more rapidly than for the BF. As a result, at large n, very small effect sizes generate a ‘significant’ t-test: at n=1000, the critical t-value for a one-tailed P=0.05 is 1.65, corresponding to d = 1.65/\(\sqrt{n}\) ≈ 0.05. For the BF, such a minuscule effect is about 4 times more likely under H0 than under H+ (BF+0=0.26). Hence, for small sample sizes p and BF support similar conclusions (e.g., P=0.05 at n=4 corresponds to BF+0>3, supporting the same conclusion of evidence for an effect), but for large sample sizes the frequentist and Bayesian conclusions can diverge in the presence of very small effect sizes (e.g., P=0.05 at n=1000 corresponds to BF+0<1/3, see Jeffreys, H. Some Tests of Significance, Treated by the Theory of Probability. Proc. Cambridge Philos. Soc. 31, 203–222 (1935)). Considering confidence or credible intervals of the effect size in addition to p or BF values helps interpret such cases. Second, the fact that the dashed lines are above the curve of the same color for all n>4 shows that BF+0=3 and BF+0=10 indeed protect against Type I errors in a frequentist sense at least at P=0.05 or P=0.01, respectively. In other words, if BF10>3 then p<0.05, and if BF10>10 then p<0.01, but how much lower than 0.05 or 0.01 the exact P value is depends on n.

b, BF+0 (left) and p (right) values as a function of measured effect sizes and sample sizes. These panels illustrate the measured effect sizes necessary to provide evidence for an effect at different sample sizes in a one-sample one-tailed t-test using the BF vs. traditional p values. Each curve connects the results at different sample sizes for the specified value of d. The logarithmic BF and p scales are aligned so as to place BF=3 next to P=0.05, and BF=10 next to P=0.01.

Extended Data Fig. 2 Evidence for or against a factor in a Bayesian ANOVA.

A Bayesian ANOVA is a form of model comparison. This figure illustrates how the Bayes factor can provide evidence for a simpler model by concentrating its predictions on a single parameter value. This example ANOVA determines whether or not the data D depend on the value of the factor Group by comparing the Null Model D=0*Group (left) against the Group Model D=β*Group, with a Cauchy prior on β (right). The top row illustrates the prior probability attributed to the different values of β under the two competing models. Note how both models include β = 0 as a possibility, but given that the probability values must integrate to 1 over the entire β space, for the Null Model p(β = 0) = 1 while for the Group Model, the probability is distributed across all plausible alternative values. The middle row shows the predicted t-values based on these priors, where t represents the difference between the data from the two groups as in Fig. 2. Note how these predictions are more peaked for the Null Model than for the Group Model. The bottom row compares the predicted probability of finding particular t-values under the two models, and shows how values close to zero (i.e., small or no difference between the groups) are predicted more often by the Null Model than by the Group Model, while the opposite is true for large t-values. If conducting the experiment reveals a measured t-value close to zero, the Bayes factor for including the factor Group would be substantially below 1, providing evidence for the absence of an effect of Group, while the inverse would be true for high t-values.
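The comparison of predicted t-values described above can be sketched for the two-group case. This is an illustration under stated assumptions, not the computation JASP performs for an ANOVA: it uses a hypothetical design with 20 observations per group, a Cauchy prior on the standardized group difference with an assumed scale of √2/2, and the function name `inclusion_bf` is made up for this example. The Bayes factor for including Group is the ratio of the two predictive densities at the observed t.

```python
import numpy as np
from scipy import integrate, stats


def t_likelihood(t_value, df, nc):
    """Noncentral t density; uses symmetry so that nc is never negative."""
    if nc < 0:
        return stats.nct.pdf(-t_value, df, -nc)
    return stats.nct.pdf(t_value, df, nc)


def inclusion_bf(t_value, n_per_group, r=np.sqrt(2) / 2):
    """Predictive density of t under the Group Model divided by that under
    the Null Model, for two equal-sized groups (hypothetical design)."""
    df = 2 * n_per_group - 2
    root_n_eff = np.sqrt(n_per_group / 2.0)  # sqrt(n1 * n2 / (n1 + n2))

    # Null Model: all prior mass on zero difference, so t is central.
    null_density = stats.t.pdf(t_value, df)

    # Group Model: average the likelihood over the Cauchy prior (assumed scale r).
    def integrand(delta):
        return (t_likelihood(t_value, df, delta * root_n_eff)
                * stats.cauchy.pdf(delta, loc=0.0, scale=r))

    peak = t_value / root_n_eff
    width = 8.0 * np.sqrt((df + t_value**2) / df) / root_n_eff
    group_density = sum(
        integrate.quad(integrand, a, b)[0]
        for a, b in [(peak - width, peak), (peak, peak + width)]
    )
    return group_density / null_density


n_per_group = 20  # hypothetical sample size, chosen only for illustration
for t_value in (0.0, 1.0, 2.0, 4.0):
    print(f"t = {t_value:.1f}  BF(Group vs Null) = {inclusion_bf(t_value, n_per_group):.2f}")
```

Because the Null Model concentrates its predictions near t = 0, the ratio drops below 1 for small t-values and rises above 1 for large ones, which mirrors the bottom row of the figure.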

Extended Data Fig. 3 Examples of how to report results.

Supplementary information

Supplementary Information

Supplementary Note on continuous testing and Supplementary Fig. 1.

Cite this article

Keysers, C., Gazzola, V. & Wagenmakers, E. Using Bayes factor hypothesis testing in neuroscience to establish evidence of absence. Nat Neurosci 23, 788–799 (2020). https://doi.org/10.1038/s41593-020-0660-4
