Rapid advancements in digital technologies and data analytics allow us to both understand people’s health-related behavior and provide personalized health care resources in unprecedented ways.

Digital phenotyping [1] refers to the moment-by-moment quantitative measurement of an individual’s behavior, using data collected via personal digital devices as individuals move through their daily lives. Given the ubiquity of access to digital technology worldwide, digital technologies afford new opportunities to examine individuals’ health behavior through intensive data collection using mobile devices, wearable sensors, continuous monitoring, and mapping of digital footprints. Digital technologies can also extend the reach and scalability of evidence-based, personalized interventions by making them accessible anytime and anywhere [2].

Specifically, mobile technologies enable ecological momentary assessment (EMA), a method that prompts individuals to respond to queries on mobile devices and that enables “automated hovering” [3], or real-time monitoring, of individuals’ behavior over time as they engage in everyday activities. The frequent, longitudinal assessment afforded by EMA in naturalistic contexts may clarify the dynamic role of known mechanisms of health behavior (or risk behavior), or reveal new ones.

Digital technologies also enable passive sensing and inference from smartphones or sensing devices worn on the body, which are transforming how we understand human behavior, including health behavior. These mobile sensing technologies enable the continuous measurement of physiological and behavioral data in daily life. Sensor data can be wirelessly streamed to a smartphone and processed in real time on the mobile device to infer information about an individual’s behavior, health, and environment. Sensor data can also be combined with data from self-report EMA assessments embedded on mobile devices for real-time inference of environmental, social, and behavioral conditions. Collectively, this information can then be used to prompt the delivery of interventions in real time (e.g., interventions that are directly responsive to the health needs of an individual, such as a depressive state or craving for a substance of abuse).
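
As a purely illustrative sketch of how passive sensor data and EMA self-report might be combined to prompt an intervention, consider the toy decision rule below. Every field name, threshold, and the rule itself are hypothetical assumptions made for illustration; they are not drawn from any study cited here and have no clinical validation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    """One moment of combined data. All fields are hypothetical."""
    minutes_since_last_move: float   # passive: e.g., derived from accelerometer or GPS
    outgoing_messages_today: int     # passive: e.g., derived from communication logs
    ema_mood: Optional[int]          # self-report: 1 (very low) to 5 (very good); None if unanswered


def should_prompt_intervention(s: Snapshot) -> bool:
    """Toy rule: prompt when self-reported mood is low AND passive data suggest withdrawal."""
    if s.ema_mood is None:
        # No self-report available; this sketch does not act on passive data alone.
        return False
    low_mood = s.ema_mood <= 2                      # assumed cutoff
    inactive = s.minutes_since_last_move >= 180     # assumed cutoff (3 hours)
    isolated = s.outgoing_messages_today == 0       # assumed cutoff
    return low_mood and (inactive or isolated)


# Example: low mood plus four hours without movement triggers a prompt.
print(should_prompt_intervention(Snapshot(240.0, 3, 2)))
```

In practice, such rules would need to be derived from and validated against empirical data rather than set by hand as they are here.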

Digitally derived, empirical data can markedly refine and advance our understanding of health behavior. These data allow for the development of dynamic models that capture behavior in real time and in response to changing environmental, social, physiological, and intrapersonal factors. They can also help us understand when individuals may be most vulnerable and most receptive to health promotion interventions. They can, in turn, inform optimal delivery of “Just-in-Time Adaptive Interventions” that provide the right type and amount of therapeutic support at the right time by adapting to an individual’s changing internal and contextual state.

To date, research on digital phenotyping methodologies, although still in its infancy, promises to enable new insights into several areas of health behavior, including mental health disorders and physical well-being [4].

In this issue, Barnett and colleagues [5] used digital phenotyping to examine clinical relapse events in a pilot study of individuals with schizophrenia. Relapse events were defined as either psychiatric hospitalization or an increase in the level of psychiatric care provided to a given individual. Specifically, the research team used moment-by-moment quantification of an array of individual-level data (assessing individuals’ mobility, sociability, and clinical outcomes) collected via both periodic EMA and frequent passive sensing on participants’ own smartphones.

Although the sample size in this study was small, relapse events were rare, and the observation window was limited, the paper describes some promising associations between the behavior of individuals with schizophrenia and clinical relapse. Specifically, individuals who had a clinical relapse event had a 71% higher rate of anomalies detected in their passive data streams in the two weeks prior to the relapse (relative to the rate of anomalies detected at dates further away from the relapse event).
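
To make the arithmetic behind a statement such as “a 71% higher rate” concrete, the sketch below compares two event rates. The counts are invented solely to illustrate the calculation and are not data from Barnett and colleagues; the function names are likewise hypothetical.

```python
def rate(events: int, days: int) -> float:
    """Events per day of observation."""
    if days <= 0:
        raise ValueError("days must be positive")
    return events / days


def percent_higher(rate_a: float, rate_b: float) -> float:
    """How much higher rate_a is than rate_b, as a percentage."""
    if rate_b <= 0:
        raise ValueError("reference rate must be positive")
    return (rate_a / rate_b - 1.0) * 100.0


# Hypothetical counts (NOT study data): anomalies flagged in the two weeks
# before relapse versus anomalies flagged on all other observed days.
near_relapse = rate(events=12, days=140)
elsewhere = rate(events=30, days=600)

print(round(percent_higher(near_relapse, elsewhere), 1))  # 71.4
```

A rate ratio of this kind says nothing by itself about statistical uncertainty, which matters a great deal when relapse events are rare.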

Although these findings are interesting and promising, this area of scientific inquiry is still at quite an early stage, and clearer predictive relationships between data captured via digital technology and clinical outcomes are needed before this approach is ready to inform intervention delivery.

That said, this paper highlights an especially important issue that is infrequently addressed in the digital health literature, namely, that a main bottleneck in much of the current digital phenotyping work stems not from technical challenges but from the lack of sufficient statistical methodology. Despite rapid advances in digital technologies that allow for novel, granular, and voluminous data capture about individuals’ health status, statistical methods to analyze and derive meaning from these intensive data have not advanced at a comparable pace. Although some pioneering work is being conducted to develop quantitative methods to glean insights from digitally derived data [6], many researchers continue to use statistical approaches that were developed for more traditional, less intensive data and/or different data structures [7]. As highlighted in the paper by Barnett and colleagues, data analytic models need to embrace the “volume, variety, and velocity” of smartphone data. Failing to do so can lead to partial or incorrect conclusions from the data and to a lack of replicability of findings.

Barnett et al. thoughtfully discuss their data analytic approach and appropriately recognize the risk of overfitting (and thus limited generalizability of results) that may result from some statistical models. The analytic model they selected (focused on anomaly detection) is an example of the types of statistical methods that may be applied and refined to more precisely analyze and interpret digital health data.
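
For readers unfamiliar with the general idea of anomaly detection, the following minimal sketch flags days on which a single daily feature departs sharply from that individual’s own recent baseline. It is a generic rolling z-score heuristic, not the method used by Barnett and colleagues; the feature, window length, threshold, and data values are all assumptions chosen for illustration.

```python
from statistics import mean, stdev
from typing import List


def flag_anomalies(values: List[float], window: int = 7, threshold: float = 3.0) -> List[int]:
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` observations."""
    flagged = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            continue  # a flat baseline gives no scale for comparison; skip
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily feature (e.g., hours spent away from home); invented values.
daily_hours = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8, 5.1, 5.0, 0.5, 5.1]
print(flag_anomalies(daily_hours))  # [8]
```

Even this simple rule exposes design choices that affect results: the window length and threshold are tunable, and once an anomalous day enters the baseline it inflates the standard deviation used for subsequent days. Tuning such choices on a small sample is one route to the overfitting noted above.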

Identifying appropriate quantitative analytic models is not the only remaining gap in the realm of digital phenotyping. Evolving research in this realm will help us understand how accurately digital data can characterize individuals’ health status and clinical trajectory. It will also help us understand whether such data will indeed be useful in triggering in-the-moment, responsive interventions (including increasing our understanding of the array of false positives and false negatives that may emerge when interventions are delivered automatically in response to an individual’s data patterns).

Further, and perhaps most importantly, increased attention to privacy and ethics is sorely needed as the digital phenotyping space evolves. Key questions include the following: Who owns the data? When, where, and with whom are data shared? How do consumers, clinicians, payers, and governmental and industry stakeholders develop best practices for digital health in the space of public health? And how do we engineer digital health technology to be secure, respect individual privacy, and be usable by people with limited technology expertise? These remain critical research opportunities and needs in the future of digital phenotyping.