Calibrating the experimental measurement of psychological attributes


Behavioural researchers often seek to experimentally manipulate, measure and analyse latent psychological attributes, such as memory, confidence or attention. The best measurement strategy is often difficult to intuit. Classical psychometric theory, mostly focused on individual differences in stable attributes, offers little guidance. Hence, measurement methods in experimental research are often based on tradition and differ between communities. Here we propose a criterion, which we term ‘retrodictive validity’, that provides a relative numerical estimate of the accuracy of any given measurement approach. It is determined by performing calibration experiments to manipulate a latent attribute and assessing the correlation between intended and measured attribute values. Our approach facilitates optimising measurement strategies and quantifying uncertainty in the measurement. Thus, it allows power analyses to define minimally required sample sizes. Taken together, our approach provides a metrological perspective on measurement practice in experimental research that complements classical psychometrics.


Fig. 1: Retrodiction and calibration.
Fig. 2: Power analysis.
Fig. 3: The retrodiction approach.
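Fig. 2 concerns power analysis. As a rough sketch of how a calibrated validity estimate can feed into sample-size planning, the snippet below attenuates a hypothesized true effect size by the validity coefficient and applies the standard normal-approximation formula for a two-sample comparison. The attenuation step is a simplification for illustration, not the paper's exact derivation:

```python
from math import ceil
from statistics import NormalDist

def min_n_per_group(d_true, validity, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sample test when a
    true standardized effect d_true is attenuated to d_true * validity
    by imperfect measurement (illustrative simplification)."""
    z = NormalDist()
    d_obs = d_true * validity
    n = 2.0 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d_obs) ** 2
    return ceil(n)

# A perfectly valid measure of a medium effect (d = 0.5) needs ~63
# participants per group; validity 0.7 roughly doubles the requirement.
n_perfect = min_n_per_group(0.5, 1.0)   # 63
n_noisy = min_n_per_group(0.5, 0.7)     # 129
```

The design choice here is that measurement noise is folded into the effect size rather than modelled separately; lower retrodictive validity therefore translates directly into a larger minimally required sample.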




Acknowledgements

D.R.B. is supported by funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. ERC-2018 CoG-816564 ActionContraThreat). S.M.F. is supported by a Sir Henry Dale Fellowship from the Wellcome Trust and Royal Society (206648/Z/17/Z). The Wellcome Centre for Human Neuroimaging is funded by core funding from the Wellcome Trust (203147/Z/16/Z). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information




D.R.B., F.M., S.M.F. and M.C.V. contributed to the conception of the work. D.R.B. wrote the mathematical derivation, with contributions from F.M. and M.C.V. D.R.B., F.M., S.M.F. and M.C.V. wrote and revised the paper.

Corresponding author

Correspondence to Dominik R. Bach.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: primary handling editor, Jamie Horder.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods, Supplementary Results, Supplementary Discussion, Supplementary Fig. 1 and Supplementary References.

About this article

Cite this article

Bach, D.R., Melinščak, F., Fleming, S.M. et al. Calibrating the experimental measurement of psychological attributes. Nat Hum Behav (2020).
