Psychomotor function measured via online activity predicts motor vehicle fatality risk

Impaired psychomotor performance severely increases the risk of fatal and non-fatal car accidents. However, we currently lack methods to continuously and non-intrusively monitor psychomotor performance. We show we can estimate psychomotor function at population scale from 16 billion observations of typing speeds during the input of web search queries. We show that these estimates exhibit diurnal variation with a substantial increase during typical sleep times, matching published accident risk rates. Further, we show that psychomotor impairment, as measured by keystroke timing, predicts motor vehicle fatality risk on a population level (Spearman ρ = 0.61; p ≪ 10⁻¹⁰). The methods and results highlight a promising direction of harnessing ambient streams of data, such as patterns of interactions with devices, as large-scale sensors to continuously and non-intrusively monitor human psychomotor performance at population scale.


INTRODUCTION
Motor vehicle crashes are responsible for 1.25 million deaths annually and are the leading cause of death for people of ages 15-29. 1 The risk of car crashes based on operator error increases significantly with insufficient sleep. 2,3 Recent advances in inferring psychomotor function using measures of typing speed of queries during web search enable population-scale estimation of psychomotor impairment. 4 We examine whether psychomotor performance as indicated by slower typing speeds during web search predicts population-level motor vehicle fatalities by locale.

RESULTS
County-level average keystroke timing is strongly correlated with motor vehicle fatalities across 2723 counties (Spearman ρ = 0.61; p ≪ 10⁻¹⁰; Fig. 1a). Controlling for potential confounding factors in a multivariate linear regression, keystroke timing remains a statistically significant predictor of motor vehicle fatalities (p ≪ 10⁻¹⁰; t-test; Adj. R² = 0.554; N = 2,555). This correlation is further illustrated through the ordering by keystroke timing for the five largest California counties by population (Fig. 1b). For example, San Bernardino County has the highest average keystroke times and also the largest number of deaths due to traffic accidents (11.57 per 100,000 population). Additionally, the diurnal variation in keystroke timing matches that of published accident risk rates with a substantial increase during typical sleep times. 5
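The rank correlation reported above can be illustrated with a minimal sketch. It computes Spearman's ρ as the Pearson correlation of average ranks, using only the Python standard library. The function names and the six data points are hypothetical and invented for illustration; they are not values from the study, and this is not the authors' analysis code.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Synthetic, made-up county values (NOT data from the study).
keystroke_ms = [182.0, 190.5, 175.2, 201.3, 188.8, 195.0]
fatalities_per_100k = [7.1, 9.8, 6.0, 12.4, 10.2, 9.9]

rho = spearman_rho(keystroke_ms, fatalities_per_100k)
print(f"Spearman rho on toy data: {rho:.3f}")
```

A rank-based measure is a natural fit here because it only assumes a monotonic relationship between keystroke timing and fatality rates, not a linear one.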

DISCUSSION
Interactions with a web search engine enable inferences about accident risk on a national scale. The study was limited by its cross-sectional and correlational design. Search queries and accident risk data were collected during different time periods, and the included subjects do not necessarily overlap. However, fatality rates have been highly correlated from year to year.

METHODS
Average keystroke timings across all US counties over a 4-month interval (April-July 2016) were computed from archival, de-identified search query logs from the Bing web search engine. These data are routinely collected for improving search results, as permitted through Bing's Terms of Service. Only queries from desktop and notebook computers were used; queries from mobile devices were excluded. Keystroke timing is defined as the time in milliseconds between two key-down events and is estimated from consecutive search engine requests as detailed in Althoff et al. 4 Search engine requests from counties with fewer than 10,000 keystrokes in total (4.7%) were excluded. The sample includes 16.1 billion timed keystrokes over 2723 counties. Keystroke timing estimates have high precision due to the large number of keystrokes from each individual county (e.g., 393 million keystrokes from Los Angeles County alone).
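The aggregation described above can be sketched as follows. This is a simplified illustration, not the estimation procedure of Althoff et al.: it assumes key-down timestamps are directly available, whereas the study estimates timing from consecutive search engine requests. The function names and the toy inputs are hypothetical, and counting each interval as one timed keystroke is an assumption.

```python
from statistics import mean

MIN_KEYSTROKES = 10_000  # exclusion threshold stated in the text


def inter_key_intervals(key_down_ms):
    """Time in ms between consecutive key-down events of one query.

    Assumption: raw key-down timestamps are available; the study instead
    infers these intervals from consecutive search engine requests.
    """
    return [b - a for a, b in zip(key_down_ms, key_down_ms[1:])]


def county_keystroke_timing(intervals_by_county, min_keystrokes=MIN_KEYSTROKES):
    """Mean interval per county, dropping counties below the threshold.

    Assumption: each interval counts as one timed keystroke.
    """
    return {
        county: mean(intervals)
        for county, intervals in intervals_by_county.items()
        if len(intervals) >= min_keystrokes
    }


# Toy example with a lowered threshold so the filter is visible.
# County labels and timestamps are made up.
toy = {
    "County A": inter_key_intervals([0, 180, 350, 560]),
    "County B": inter_key_intervals([0, 200]),
}
timing = county_keystroke_timing(toy, min_keystrokes=2)
print(timing)  # County B is excluded: it has only one interval
```

With the default threshold, a county contributes an estimate only if it has at least 10,000 timed keystrokes, which mirrors the exclusion rule in the text.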
To control for potential confounding, a multivariate linear regression analysis was performed that included the following factors 6 (using all 2555 those who commute to work alone, who drive longer than 30 min to work each day), fraction of population living in rural area, insufficient sleep (% reporting sleeping less than 7 h per night), fraction of population reporting excessive drinking, and fraction of traffic accident deaths with alcohol involvement.
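A regression of this kind can be sketched with ordinary least squares solved through the normal equations. The sketch below uses only the standard library and reports the adjusted R² alongside the coefficients. The helper names, the choice of two predictors, and all numbers are hypothetical. The study's actual model includes the full set of covariates listed above, and a real analysis would more likely use a statistics package.

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows, y):
    """Fit y ≈ b0 + b1*x1 + ...; return (coefficients, adjusted R²)."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])  # number of coefficients, including the intercept
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    y_bar = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - y_bar) ** 2 for t in y)
    n = len(y)
    adj_r2 = 1 - (ss_res / (n - p)) / (ss_tot / (n - 1))
    return beta, adj_r2


# Synthetic county rows (NOT study data):
# (mean keystroke time in ms, % reporting insufficient sleep)
rows = [(182.0, 30.1), (190.5, 33.4), (175.2, 28.0), (201.3, 36.2),
        (188.8, 31.5), (195.0, 35.9), (179.4, 32.2), (198.1, 29.7)]
fatalities_per_100k = [7.1, 9.8, 6.0, 12.4, 10.2, 9.9, 7.5, 11.0]

beta, adj_r2 = ols(rows, fatalities_per_100k)
print("coefficients:", beta, "adjusted R^2:", adj_r2)
```

The coefficient on keystroke timing estimates its association with fatalities while holding the other included predictor fixed, which is the sense in which the regression controls for confounding.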
This study was conducted in accordance with guidance from the Microsoft Ethics Review Board.

Data availability
The data that support the findings of this study are available from Microsoft but restrictions apply to the availability of the data. Data are available from the authors on reasonable request and with permission of Microsoft. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.