Addiction is a complex disease that involves neuropathological, psychiatric, and environmental determinants. As the primary federal agency supporting scientific research on drug use and its consequences, the National Institute on Drug Abuse (NIDA) has a mission to advance science on the causes and consequences of drug use and addiction. NIDA strives to improve individual and public health by strategically supporting and conducting basic and clinical research on drug use, its consequences, and the various underlying mechanisms involved.

Like many other fields, biomedical research on addiction is generating substantial data of various types (imaging, genetic, physiological, electronic health records, etc.). This research aims to understand how drugs of abuse alter brain biology and function to engender a state of physical dependence and/or promote the compulsive behavior that characterizes addiction. The untapped power of the data emerging from these studies lies in their integration, mining, and analysis (e.g., effectively integrating genome-wide molecular profiling datasets for addiction with other datasets). The interdisciplinary field of data science evolved from the necessity to extract knowledge and insights from increasingly large and/or complex datasets using new quantitative and analytical approaches. Intra/inter-university and multi-disciplinary collaborations are encouraged for data science projects.

NIDA’s vision for how big data science can be leveraged to reveal new aspects of addiction biology is closely aligned with the NIH Strategic Plan for Data Science. Specifically, NIDA is focusing on: (1) the integration of existing datasets and tools with those that are being newly developed; (2) making datasets and resources findable, accessible, interoperable, and reusable (FAIR) [1]; (3) ensuring that the algorithms and tools being developed in academia meet industry standards for ease of use and efficiency of operation; (4) data storage and management; and (5) enacting appropriate policies to promote stewardship and sustainability. This approach increases opportunities for secondary data analysis, both to answer new research questions and to examine alternative perspectives on the original questions of previous studies, and it offers cost and time benefits (e.g., reusing and integrating the Treatment Episode Data Set with other studies).

To carry out these goals, NIDA is continuously creating opportunities for the latest computational capabilities to be combined with biomedical research. The NIDA website lists the latest big data funding opportunities (e.g., NOT-DA-19-041). Big data science is inextricably linked with artificial intelligence, particularly machine learning approaches, and such algorithms have played an important role in recent years in making sense of addiction data. Advancements in supercomputing from the petascale to the exascale levels are helping to address issues of scale, complexity, and processing speed, and are creating even more opportunities to analyze and model complex biological systems to increase our understanding of addiction. Quantum computing is on the horizon, and collaboration and stewardship are encouraged. With the incorporation of data science as a new tool for the study of substance use disorders comes the need for individuals with expertise in computer science, bioinformatics, and mathematics. The integration of data of many different types will enable scientific discovery with regard to the biological and behavioral complexity underlying addiction.

Funding and disclosure

The authors have nothing to disclose.