The current policy discourse over federal infrastructure has reignited interest in the potential for better data and modeling methods to help address the US opioid crisis. Continued increases in overdose deaths, which appear to have accelerated during the COVID-19 pandemic, highlight a need for fundamental change in the collection and use of surveillance data [1, 2]. For example, provisional data from the National Vital Statistics System (NVSS) estimate 90,722 overdose deaths in the 12-month period ending in November 2020, up from 70,357 in the preceding 12 months [3]. The evolving nature of the epidemic further requires linking timely surveillance data to implementation of effective service, treatment, and prevention approaches [4]. Yet at present, the quality and timeliness of US surveillance data often limit data-driven policy and intervention approaches, despite their utility in other health crises [5, 6]. For example, rapid identification of an HIV outbreak in Scott County, Indiana, led to declaration of a public health emergency and establishment of a syringe-exchange program to curtail the outbreak [7], and rapid identification of COVID-19 cases continues to be key to determining public health responses to the disease [8]. Yet similar approaches are not currently available at scale for the opioid epidemic.

In this Perspective, we focus on surveillance data needed to contain and eventually end the opioid crisis. We outline current information needs, summarize limitations of existing data, propose complementary surveillance resources, and provide examples of promising approaches designed to meet the needs of data end-users. We do not address use of health-related surveillance resources for other purposes, such as development of etiological models or evaluation of service access, though these are important challenges and could benefit from the approaches we consider.

Surveillance data needs

Estimation of healthcare needs and rational allocation of resources are central surveillance data goals. Without timely data, resource allocation falls prey to nonscientific considerations and their attendant inefficiencies or inequities. Surveillance data and modeling of opioid use disorder (OUD) can improve allocation of prevention and treatment resources by stratifying population groups by risk level or identifying groups with special needs. Although surveillance data can suggest biological risk factors, most known risk factors are environmental. Some examples include sociodemographic characteristics [9], geographical location [10], co-occurring health crises [6, 11], and personal history, such as adverse childhood events [12].

Cross-sectional prevalence estimates and correlates become more valuable when they are complemented with trends in substance- and mental health-related outcomes, psychosocial function, and mortality information [13,14,15]. Trend data can help anticipate needs and redirect resources, as well as help evaluate the effectiveness of interventions, including policy changes and resource re-allocation. Surveillance of emerging hot spots, modeling, and geographic visualization can be particularly useful in understanding spatial-temporal trends and patterns of opioid use and OUD [16, 17].

Data end-users

One way to think about surveillance is as a goal in itself. An alternative is to consider the needs of its end-users and organize data collection to meet their needs [18]. Among the most important data end-users are public health systems leaders; public, nonprofit and prepaid healthcare systems, such as health maintenance organizations; and other systems, including the criminal justice system, public policy makers, and interested researchers. We summarize some examples below (see also Table 1).

Table 1 Examples of data, potential end-users, and potential policy actions to improve response to the opioid epidemic.

Although there is considerable variation regarding the scope of services they provide, public health systems need data to advocate for resources, and to inform how and where to deploy them [19]. For example, location of naloxone distribution centers can be informed by the geographic distribution of overdoses [20]. Changes in patterns of drug use can also inform resource allocation. Increases in prescription opioid use disorder (POUD), for instance, may require somewhat different approaches than increases in heroin or fentanyl use [21]. While improved training in pain management and reduction of opioid diversion may help curtail POUD, improving detection of fentanyl lacing may be more important to prevention of heroin overdoses.

Public, nonprofit, and prepaid healthcare systems could benefit from timely data to estimate treatment needs, unmet need for emergency services [22, 23], changes in needs [24] and, in the case of population-oriented health systems, plans for prevention [2, 23, 25].

Policy makers need timely data to make resource allocation decisions. This typically involves balancing competing needs, such as other health-related needs, education, or infrastructure [26, 27]. Yet at present, there are no formal mechanisms to reconcile the needs and interests of surveillance systems and data end-users and modelers. This results in inefficiencies and mismatches between data collection efforts and the informational needs of decision-makers. For example, services funding decisions can be preferentially based on wealth or political influence rather than on the number and distribution of overdoses or OUD prevalence.

Constraints on resources for surveillance combined with multiple data end-users, which may have competing interests, suggest the need for a dialogue between data collecting entities and end-users. Such dialogue could lead to better understanding of end-user needs, increased transparency concerning the goals of data collection, better tailoring of actionable data, and improved dissemination of the results to end-users who can then implement informed policies [2, 18]. To our knowledge, there are currently no systematic mechanisms to ensure this dialogue. Appointment of representatives of end-users (e.g., associations of public health officials or health system administrators) to advisory boards overseeing publicly funded surveys could help bridge this chasm. Being explicit about survey goals and about data needs of end-users could also improve convergence between the goals and needs of data generators and end-users. For example, more timely information about overdoses could inform distribution of naloxone, while changes in the prevalence or treatment-seeking patterns for OUD could inform location of new clinics or deployment of mobile treatment units.

In some cases, health systems or public health departments collect data but do not have the infrastructure or resources necessary to analyze them beyond simple descriptive reports, as such analyses would be considered research, and thus outside their purview. Challenges exist in creating cultures within health and public health agencies and systems in which data analysis is viewed as central to their mission. Too often this work is viewed as a diversion of resources rather than fundamental to the evaluation of practices and allocation of resources that are key to targeting data collection and evidence-based approaches to public health. One potential solution for localities that are too small to have their own data analytic departments is to join forces with larger nearby jurisdictions, universities, or other data analytic processing centers to share these tasks.

Data sources

Existing data sources commonly used for opioid surveillance can be classified into five broad categories [28]: (1) national surveys, (2) electronic health records (EHR) and claims data, (3) mortality records, (4) prescription drug-monitoring data, and (5) contextual and policy data.

National surveys with population-based sampling frames collect information on substance use (including opioids), lifetime or current mental disorders (including substance use disorders), treatment patterns, and sociodemographic characteristics of participants. Examples of these include the National Survey on Drug Use and Health [29], Monitoring the Future [30], and several surveys and data collection systems funded by the Centers for Disease Control and Prevention (CDC) [31], such as the Behavioral Risk Factor Surveillance System [32] and the National Health Interview Survey [33]. Other national surveys collect information from treatment providers, including clinicians, hospitals, and other treatment facilities. Examples of these include the National Survey of Substance Abuse Treatment Services [34], the National Ambulatory Medical Care Survey [35], and the Healthcare Cost and Utilization Project [36]. These data systems monitor healthcare use and therefore do not directly reflect the underlying incidence or prevalence of disorders, as many people do not use the healthcare system. There are many factors beyond the disorder and its severity that influence treatment seeking and access to care, including racial/ethnic disparities, socioeconomic status, insurance, and service availability [37].

EHR data are available in real-time and often contain information, such as health behaviors or results of laboratory tests that may indicate opioid misuse, but are still challenged by quality issues and limited generalizability due to biases implicit in specific healthcare recording processes [38]. Claims data, which may provide greater coverage than EHR data, tend to have less detailed clinical information and are often expensive to obtain.

Mortality records include CDC WONDER, the NVSS, and the National Death Index (NDI) [39]. CDC WONDER and NVSS support linkages and county-level analyses. The NDI allows for person-level linkages through Social Security Numbers and other personal health information with multiple other databases, including some population-based surveys and treatment utilization data. These linkages have a cost, although the fee can be waived in certain cases for NIH grantees.

Prescription drug-monitoring programs (PDMPs), generally managed by a state authority, collect information on patterns in opioid analgesic prescribing, dispensing, and use. Finally, policy data sources capture information on state opioid policies and thus are generally analyzed and linked using state as the unit of analysis. Contextual data sources are generally used in opioid research to assess state- or county-level factors associated with opioid-related outcomes or to account for time-varying state- or county-level factors that may confound estimation of outcomes in analyses of opioid-related policies.

In addition to these five broad categories, there are a variety of other national, state, or local data sources (e.g., online state opioid dashboards) and proprietary sources, such as national poison control center data, data on surveillance of abuse or diversion (such as the Researched Abuse, Diversion and Addiction-Related Surveillance, or RADARS, programs), and survey panels, such as Ipsos [40] and NORC through its AmeriSpeak panel [41].

Existing challenges

While existing surveys provide surveillance information on the opioid epidemic, there are several limitations and challenges.

Data timeliness

In light of the fast-moving nature of local outbreaks, such as shifting patterns of illicit fentanyl use, more nimble surveillance methods are critically needed. Current data processing and quality control lead to findings becoming available typically more than 6 months and often more than a year after data collection. These delays limit the value of data for planning and resource allocation. Trade-offs persist between speed and accuracy of data. Faster reporting is associated with lower accuracy, but delayed reporting decreases the ability of data to inform service and policy decisions [42].

One solution involves generation of provisional estimates. For example, the CDC generates provisional estimates of overdoses [3], which are revised as more reliable information becomes available. Models that combine data from provisional estimates and alternative indicators and then are calibrated against slower but more reliable data might be able to generate faster yet relatively accurate estimates [43]. Convenience samples, such as internet panels, can also provide rapid information, but approaches to calibrate them against population-based samples and reweight them to obtain generalizable estimates remain underdeveloped. Analysis of internet search terms or analyses of data from social media are other potential surveillance information streams [44].
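As a rough illustration of what reweighting a convenience sample can look like, the sketch below applies simple post-stratification: each respondent is weighted so that the weighted share of each stratum matches a reference population. The panel data, the age strata, and the reference shares are all hypothetical, and the function names are ours; real applications would typically use more strata, raking, or model-based adjustment, and reweighting cannot correct for selection on unmeasured characteristics.

```python
from collections import Counter


def poststratify(sample_strata, population_shares):
    """Return one weight per respondent so that weighted stratum shares
    match the reference population shares (simple post-stratification)."""
    n = len(sample_strata)
    counts = Counter(sample_strata)
    weights = []
    for stratum in sample_strata:
        sample_share = counts[stratum] / n
        weights.append(population_shares[stratum] / sample_share)
    return weights


def weighted_prevalence(outcomes, weights):
    """Weighted mean of a 0/1 outcome indicator."""
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


# Hypothetical internet-panel respondents: age group and a 0/1 indicator
# of past-year opioid misuse. Younger adults are over-represented.
panel_strata = ["18-34", "18-34", "18-34", "35-64", "65+"]
panel_outcome = [1, 0, 1, 0, 0]

# Hypothetical reference shares, e.g. from a population-based survey.
reference_shares = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

weights = poststratify(panel_strata, reference_shares)
unweighted = sum(panel_outcome) / len(panel_outcome)
reweighted = weighted_prevalence(panel_outcome, weights)

print(f"unweighted prevalence: {unweighted:.3f}")
print(f"reweighted prevalence: {reweighted:.3f}")
```

In this toy example the reweighted estimate is lower than the unweighted one because the over-represented stratum also has the highest observed prevalence.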

In some cases, state-level data using distributed research networks can generate data faster than through federal institutions, while addressing concerns about data sharing and privacy [45].

Sample representativeness

Representativeness of estimates is increasingly constrained by declining rates of survey response [46], non-response bias, and exclusion of important populations at increased risk for substance use and substance use disorders, including especially those in jail or prison, or without stable housing [47]. Exclusion or undercounting of certain populations in the census can distort the survey sampling framework and decrease its representativeness [48]. Furthermore, the accuracy of some data elements, such as cause of death, may be subject to inaccuracies and thereby introduce systematic geographical or racial/ethnic biases in data quality [49, 50]. Establishment of national standards for data collection and reporting has been suggested as a way to improve data quality and minimize geographical variation and biases [28].

Several surveys are focused on the general population. Although they provide useful information regarding prevalence of several mental disorders and behaviors, they underestimate prevalence of opioid use, OUD, and related outcomes. This is because some groups, such as justice-involved populations or homeless individuals, have higher OUD prevalence [48], but are frequently undersampled. Capture–recapture epidemiological methods can improve estimates of prevalence of OUD by leveraging how often an individual with OUD is identified in one or more relevant databases (e.g., healthcare or justice systems) [51]. In the future, it will be important to develop and implement approaches that capture information from high-risk or hard-to-reach populations [52], to obtain more accurate overall prevalence estimates and ensure appropriate service planning for these populations. A complementary strategy involves triangulation, such as reweighting convenience samples using reference populations [53, 54] or combining information from different sources, e.g., population-based surveys with treatment data or with data from digital surveillance methods (Twitter, Google, and others) [44].
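To make the capture–recapture idea concrete, the following sketch uses the two-source Chapman estimator, a textbook variant of the Lincoln–Petersen estimator. It is a deliberately simple stand-in and not necessarily the approach of the cited study; analyses with more than two databases generally rely on richer models. The identifiers and list sizes are hypothetical, and the estimator assumes a closed population, independent lists, and error-free matching of individuals across lists.

```python
def chapman_estimate(list_a, list_b):
    """Two-source capture-recapture (Chapman) estimate of total population
    size, including people who appear in neither list."""
    a, b = set(list_a), set(list_b)
    n1, n2 = len(a), len(b)
    m = len(a & b)  # people identified in both sources
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical de-identified person IDs for people with OUD seen in
# two systems during the same period.
treatment_ids = {f"p{i:03d}" for i in range(0, 60)}   # 60 people
justice_ids = {f"p{i:03d}" for i in range(40, 90)}    # 50 people, 20 shared

observed = len(treatment_ids | justice_ids)
estimated_total = chapman_estimate(treatment_ids, justice_ids)

print(f"people observed in at least one source: {observed}")
print(f"estimated total, including unobserved: {estimated_total:.1f}")
```

The gap between the observed count and the estimated total is the estimated number of people with OUD who were missed by both systems; the smaller the overlap between lists, the larger that gap.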

Population variability

Geographic or sociodemographic variations in prevalence of opioid use, OUD, and overdoses pose challenges regarding assessment and service planning needs. Jurisdictions often seek local data to inform their decision-making. Except for highly prevalent disorders, however, local data can be sparse and lead to unstable estimates. There is a need to develop and implement methods that combine different data sources, such as Bayesian approaches, that could use national data as priors and incorporate local data to better approximate the prevalence in specific jurisdictions. Combining data from general population surveys with data from the healthcare or justice systems can also generate more stable and precise local estimates.
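One simple way to picture a Bayesian approach of this kind is a beta-binomial model in which the national prevalence defines the prior and sparse local counts update it. The sketch below is a minimal illustration under that assumption; the prevalence value, the prior strength (how many "pseudo-observations" the national estimate is worth), and the local counts are all hypothetical, and a real analysis would more likely use hierarchical or small-area models with covariates.

```python
def posterior_prevalence(prior_prev, prior_strength, local_cases, local_n):
    """Posterior mean prevalence under a Beta prior centered on the
    national estimate, updated with local binomial data."""
    alpha0 = prior_prev * prior_strength          # prior pseudo-cases
    beta0 = (1 - prior_prev) * prior_strength     # prior pseudo-non-cases
    alpha = alpha0 + local_cases
    beta = beta0 + (local_n - local_cases)
    return alpha / (alpha + beta)


national_prev = 0.01     # hypothetical national OUD prevalence
prior_strength = 500     # assumed weight given to the national estimate
local_cases, local_n = 3, 100   # hypothetical sparse local survey

raw_local = local_cases / local_n
shrunk = posterior_prevalence(national_prev, prior_strength,
                              local_cases, local_n)

print(f"raw local estimate:      {raw_local:.4f}")
print(f"posterior mean estimate: {shrunk:.4f}")
```

The posterior mean falls between the national and the raw local estimate, and it moves closer to the local data as the local sample grows. The choice of prior strength is a modeling decision that should be made explicit to end-users.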

Simulation models, which have been widely used to estimate the spread and transmission of the COVID-19 pandemic and forecast its population incidence after the implementation of interventions [55, 56], can also be useful for similar purposes in the opioid epidemic [57, 58]. An important area of research will be development of methods to generalize results of simulations and project rates of the epidemic and effects of the interventions in populations other than those used to generate the initial models. More generally, discussions between developers of the simulation models and end-users may be necessary to optimize trade-offs between dedicating resources to collect ever more local data versus funding other community needs, such as increased availability of treatment or prevention services [18].

Limited flexibility

Most existing data collection systems lack the necessary flexibility to adapt to shifts in the environment, including during pandemics or other disasters. Infrastructure or procedures that increase flexibility and responsiveness to changing conditions without substantially increasing the cost may help data collection efforts to become more relevant to changing environments.

Database linkage

Data linkage is critical to achieve most goals of interest to end-users. For example, linking individual overdose rescue data with medical records could provide new insights about risk factors for overdose, as well as more systematic information on the consequences of nonfatal overdoses. Linkage to the location of emergency departments and treatment facilities could inform deployment of additional resources, while combination of EHR, claims data and population-based surveys could shed light on which types of individuals are seeking treatment and which populations may face barriers to treatment. PDMPs could help identify the proportion of individuals with overdoses who receive opioids from their clinicians versus illegal markets, whereas contextual and policy data may help determine whether changes in policies are effective in decreasing overdoses.

Despite these potential advantages, the existing infrastructure makes linking databases time-consuming and resource-intensive, especially when it involves crossing data systems, raising issues of interoperability [59]. Because linkages are expensive, data collectors and end-users need to establish priorities regarding which data linkages are most likely to yield novel insights and inform critical public health decisions. For example, the benefits of linking individual overdose data with a variety of databases need to be compared with those of other investments, such as collecting better data on unreported naloxone rescues (which could help identify populations at risk), the overlap between suicide and overdose, the complexity of overdose deaths with multiple substances detected, or the number of individuals who die alone of an overdose.

Linkages can also pose threats to confidentiality that must be carefully managed. Achieving the right balance between data availability and privacy protection has become more challenging as increased computational power and availability of data derived from the Internet have made it easier to reidentify participants. Ongoing efforts, including blockchain-based approaches [60], seek to develop privacy-preserving record linkages that do not require sharing personally identifying information among disparate organizations. Yet these technologies are still years away [61]. In other cases, such as studying the effects of policies on populations, it may not be necessary to link databases at the individual level, thereby reducing cost and eliminating concerns over re-identification. Linking at the supra-individual level is generally less resource-intensive, making these linked datasets more readily available to a broad range of researchers and other end-users. Developing infrastructure and methods to facilitate linkages at different levels could leverage use of existing data that are currently suboptimally used.
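For readers unfamiliar with the general idea of linking records without exchanging raw identifiers, the sketch below shows one basic, long-established technique: each organization converts identifiers into keyed hash tokens (HMAC-SHA256) and only the tokens are compared. This is not the blockchain-based approach cited above, and it is far weaker than the privacy-preserving methods under development: it supports exact matches only, and anyone holding the key could attempt to recover identities by hashing candidate names. The records, field choices, and key are hypothetical.

```python
import hashlib
import hmac


def normalize(name, dob):
    """Standardize identifiers so trivial formatting differences still match."""
    return f"{name.strip().lower()}|{dob}"


def link_token(name, dob, shared_key):
    """Keyed hash of the normalized identifiers; raw values are not shared."""
    message = normalize(name, dob).encode("utf-8")
    return hmac.new(shared_key, message, hashlib.sha256).hexdigest()


# Assumption: the key is agreed in advance and held only by the data owners.
SHARED_KEY = b"hypothetical-project-key"

# Hypothetical records held by two separate organizations.
ems_records = [("Alex Doe", "1980-01-02", "naloxone administered")]
clinic_records = [
    (" alex doe", "1980-01-02", "buprenorphine started"),
    ("Sam Roe", "1975-05-06", "intake assessment"),
]

# Each organization tokenizes locally and shares only token -> event pairs.
ems_tokens = {link_token(n, d, SHARED_KEY): e for n, d, e in ems_records}
clinic_tokens = {link_token(n, d, SHARED_KEY): e for n, d, e in clinic_records}

# The linking party matches on tokens without seeing names or birth dates.
linked = {
    token: (ems_tokens[token], clinic_tokens[token])
    for token in ems_tokens.keys() & clinic_tokens.keys()
}

for token, events in linked.items():
    print(token[:12], events)
```

Using a different key for each project yields tokens that cannot be matched across projects, which is one way to limit the scope of any single linkage.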

Longitudinal data

All of these challenges become more difficult in longitudinal surveys and cohorts followed within EHR and claims data. For example, representativeness is further threatened by loss to follow up from non-response in surveys or departures by patients from health systems in administrative data, which adds to cost and complicates design and analysis. As compared to cross-sectional data, longitudinal data also require more complex and timely data processing and analysis. Longitudinal information can also increase risks of re-identification, requiring additional levels of privacy protection.

Promising examples and future directions

There are some promising developments in infrastructure and methods. For example, the Washington/Baltimore High Intensity Drug Trafficking Area (W/B HIDTA) has developed the Overdose Detection Mapping Application Program (ODMAP) [62]. ODMAP provides near real-time suspected overdose surveillance data across jurisdictions by linking first responders and relevant record management systems (e.g., EMS, law enforcement and healthcare data) to a mapping tool that tracks overdoses in order to stimulate real-time response and strategic analysis. ODMAP is currently only available to government (state, local, federal, or tribal) agencies serving the interests of public safety and health. However, government agencies may choose to provide access to ODMAP to nonprofit agencies, by registering individuals under the government agency. ODMAP offers a promising example of how different data streams can be integrated for better analyses and faster public health response.

Automatic data extraction from electronic health records (EHRs) or other databases, sometimes combined with natural language processing methodologies, can broaden information sources for opioid surveillance. Distributed data networks, such as those used in the Medicaid Outcomes Distributed Research Network [45] or the FDA Sentinel System [63], allow the combination of EHR data from multiple health systems, while preserving the privacy of the individuals in those systems.

A series of projects have also sought to better integrate data collection, statistical modeling, and public health interventions. For example, a NIDA-funded study is developing predictive analytics models in Rhode Island to forecast future overdose mortality at the neighborhood-level, using publicly available information and data from a multicomponent overdose surveillance system [64]. The results might better inform resource allocation to communities in greatest need of prevention, treatment, recovery, and overdose rescue services.

As part of another initiative, the Massachusetts Department of Public Health has linked 28 government databases to establish an integrated data warehouse to inform the state’s response to the opioid epidemic, including clinical guidance and policy decisions [19]. This initiative uses project-specific identification numbers to minimize risk of re-identification. Partnerships with researchers have led to several influential publications [51, 65]. This model could also be replicated in other jurisdictions.

Other developments rely on alternative or complementary sources of information. For example, wastewater [66] or drug testing samples [67] are being studied as means of estimating population changes in opioid use. Cell phones and virtual technologies also offer alternative means of data collection. Use of wearables that monitor patients’ activities without physical contact (e.g., sleep and heart rate variability) and mailing of biological specimens can also be used in selected samples for more intensive data collection.


Given fiscal constraints on federal and state government budgets and urgent competing public health needs posed by COVID-19, it remains unclear how public health authorities and private healthcare systems will respond to the surveillance challenges that must be met to combat the opioid epidemic more effectively. Traditional data collection systems are slow, labor-intensive, and expensive, and have been insufficient to provide the information needed to control the epidemic. There is a pressing need to develop and implement more efficient and nimble approaches. Because improved surveillance would benefit multiple public and private constituencies and end-users, a public debate will likely ensue over how to finance development and implementation of these activities.

Accelerating data reporting will require increasing the capacity of states and other relevant jurisdictions to collect, process, and transmit data, while maintaining quality standards. Because of financial constraints, efforts to improve interoperability and link databases will require difficult decisions about prioritizing linkages. Although focused on opioid surveillance, many of these themes apply, with small variations, to the surveillance of cocaine, methamphetamine, and other substances.

Despite these challenges, there is increased awareness of the need to use data in real-time to inform policy and public health decisions. There is growing interest in collaborations and keen interest in developing new data collection and modeling methods. These trends, together with progress in public health learning systems of care, offer new possibilities to advance science, combat the opioid epidemic, and save lives.