Toward dynamic phenotypes and the scalable measurement of human behavior


Precision psychiatry demands the rapid, efficient, and temporally dense collection of large-scale, multi-omic data across diverse samples, for better diagnosis and treatment of dynamic clinical phenomena. To achieve this, we need approaches for measuring behavior that are readily scalable, both across participants and over time. Efforts to quantify behavior at scale are impeded because our methods for measuring human behavior are typically developed and validated for single time-point assessment, in highly controlled settings, and with relatively homogeneous samples. As a result, when taken to scale, these measures often suffer from poor reliability, generalizability, and participant engagement. In this review, we attempt to bridge the gap between gold-standard behavioral measurements in the lab or clinic and the large-scale, high-frequency assessments needed for precision psychiatry. To do this, we introduce and integrate two frameworks for the translation and validation of behavioral measurements. First, borrowing principles from computer science, we lay out an approach for iterative task development that can optimize behavioral measures based on psychometric, accessibility, and engagement criteria. Second, we advocate for a participatory research framework (e.g., citizen science) that can accelerate task development as well as make large-scale behavioral research more equitable and feasible. Finally, we suggest opportunities enabled by scalable behavioral research to move beyond single time-point assessment and toward dynamic models of behavior that more closely match clinical phenomena.

Fig. 1: The need for scale in behavioral research.
Fig. 2: Iterative task development.



Author information




LG drafted the manuscript based on input from RS, SS, and MJS. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Laura Germine.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Germine, L., Strong, R.W., Singh, S. et al. Toward dynamic phenotypes and the scalable measurement of human behavior. Neuropsychopharmacol. 46, 209–216 (2021).
