Within the field of epidemiological research on drug dependence (DD) in cross-sectionally observed community samples, there is a tradition of plotting the number of days of drug use (DU) on the x axis and the estimated probability of being drug dependent on the y axis (Apelberg et al, 2014; Chen et al, 1997; Chen and Kandel, 2002; Esser et al, 2014). The result is a figure that resembles a dose–response or dose–effect curve from laboratory experiments designed so that dosage level or number of dosage days are ‘fixed effect’ exogenous explanatory variables that are under the control of the experimenter. In truth, the variable on the x axis in epidemiological research generally is an endogenous variable that is interdependent with the probability of being drug dependent, with likely feedback loops, as explained elsewhere (Anthony, 2010).

In this research project, we try to push this tradition of DD epidemiology in a new direction, harnessing a functional analysis approach that can be extended from cross-sectional snapshots to dynamic longitudinal data. With focus on newly incident users (as opposed to prevailing or ‘prevalent’ users), our hope is to enhance understanding of heterogeneous published estimates and to present new estimates needed to influence study designs and to guide future studies. Our primary research focus is the drug-by-drug comparison, but as explained below, we also illustrate how the approach might be used to study subgroups of drug users, with an example based on possible male–female differences.

At this stage in the research, we cannot solve the problem of feedback loops, but we can impose a partial constraint by focusing estimation on the first months after onset of newly incident DU when feedback effects might be trivial. During these months, no more than a small minority subset of drug users has developed patterns of sustained daily use that sometimes typify DD cases.

To illustrate, in the first 1–2 years after onset of cocaine use, no more than ~5%–6% of cocaine users develop a syndrome of cocaine dependence. For cannabis and for alcohol, the corresponding transition probabilities are well below 4% (Wagner and Anthony, 2002). For this reason, in this project we restrict the epidemiological samples to individuals for whom no more than 12 months has passed since first extra-medical use of the drug. Here, the adjective ‘extra-medical’ refers to DU for feelings such as ‘to get high’ and otherwise using the drug outside boundaries that a prescribing clinician might intend in relation to approved indications and acceptable clinical practice.

By using the term ‘functional analysis,’ we refer to a nonlinear statistical model based on a known mathematical equation with parameters for relationships of interest. In the present study, we investigate occurrence of DD syndromes as clinically meaningful responses as might be influenced by the frequency of drug taking soon after initiation of extra-medical DU. The relationship of interest is expected to be ‘S’-shaped (a sigmoid), as in most dose–response curves of pharmacology or toxicology. Our approach is to estimate a parsimonious set of epidemiologically meaningful parameters that characterize this ‘S’-shaped relationship, based on a nonlinear four-parameter Hill equation. Using this approach, we seek to simplify drug-by-drug comparisons in the comparative epidemiology of DD, specifying our response variable to be the empirical probability of a DD syndrome. The approach can be extended readily to other responses (eg, different dependence features).

As distinct from concepts of LD50 and ED50, one of the Hill function parameters estimated under this model describes the estimated number of days of DU corresponding to the midpoint DD location or half-probability of dependence, which we label as PD50. The PD50 estimate can be interpreted as a half-way point in the direction of an upper asymptote, Pmax, where Pmax describes a limiting probability of becoming drug dependent when there is daily DU, as observed when the newly incident drug user is assessed (with an interval of up to 12 months since first use). The observed Pmax estimate for cocaine might well approach 100%, but our expectation is that Pmax might have lower values for other drugs such as cannabis and alcohol. In some respects, the upper asymptote is akin to a ‘saturation point’ in pharmacokinetics data, possibly not attained within the range of the observed x values (ie, within the observed range of continuous daily DU). Nonetheless, these PD50 estimates can still be derived using Hill equation fits and used for rank-order analyses drug-by-drug (eg, Engel et al, 2013; Frank, 2013; Prins et al, 1999).

Most epidemiological methods used to study sigmoid relationships have relied either on dose–response epidemiologic meta-analysis (Bagnardi et al, 2014; Berlin et al, 1993) or on one or more nonparametric techniques (Bagnardi et al, 2004; Royston, 2014; Royston and Sauerbrei, 2007). A restrictive meta-analysis assumption is that dose–response relationships are linear on the natural logarithm scale. The nonparametric approach suffers, because its lower and upper asymptotes for the ‘S’-shaped curve are not limited to the (0,1) interval, whereas a practical model should allow neither negative probabilities of DD for small amounts of DU nor probabilities over 1 for large doses. In addition, the nonparametric method essentially is a ‘connect the dots’ approach. It fails to offer a set of multiple parameter estimates we can use to compare one drug versus another.

Pharmaceutical scientists often use parametric functional analysis approaches (Ankomah and Levin, 2012; Black and Leff, 1983; Regoes et al, 2004; Zernig et al, 2007). The same is true for preclinical drug research (Ahmed and Koob, 2005) and for weed science (Brain and Cousens, 1989; Seefeldt et al, 1995). Nonetheless, nonlinear functional modeling largely has been ignored in epidemiological field studies.

In order to draw attention to new lines of research opened up with this functional approach, we provide an illustration from our research on differences in the ‘S’-shaped curve for four different drug subtypes: alcoholic beverages (hereinafter, alcohol), cannabis, cocaine, and prescription opioid pain relievers (PPRs). In the case of PPRs, the domain of inquiry is restricted to extra-medical PPR use as defined above and elsewhere (Anthony et al, 1994).

In addition, pursuing general NIH program interest in male–female differences, we considered how parameter estimates might differ for male and female drug users (Mello, 1986; Sartor et al, 2014; Wagner and Anthony, 2007; Wetherington, 2007; Wilhelm et al, 2014). In prior work, female–male contrasts for risk of becoming dependent assume that age is held constant (eg, see Chen and Kandel, 2002) or hold constant elapsed time since first onset of DU (eg, see Sartor et al, 2014; Wagner and Anthony, 2007). In contrast, here we explore male–female differences in chances of becoming drug dependent conditional on the same rate of DU counted in relation to days of DU in a specified interval before the date of assessment (ie, the 30-day interval just before the date of survey assessment).


Population Under Study, Sampling, and Measurements

The study population consists of non-institutionalized civilian residents of the United States, age 12 years and older, with a range of dwelling units that includes homeless shelters and other non-institutional group quarters, as well as households. Each year from 2002 through 2011, large independent multi-stage area probability samples of this study population were drawn for the National Surveys on Drug Use and Health (NSDUH; n>65 000 each year). NSDUH protocols, approved by cognizant Institutional Review Boards, protected the human subject participants.

The main NSDUH measurement approach has involved completion of audio computer-assisted self-interviews, in either English or Spanish. Standardized multi-item modules have been used to assess a broad range of DU and health indicators.

The module items on DU and DD, with criteria from the penultimate Diagnostic and Statistical Manual (DSM-IV; American Psychiatric Association, 2013), are available online. These and other details about the NSDUH have been published in multiple prior journal articles (eg, Seedall and Anthony, 2013; Vsevolozhskaya and Anthony, 2014; Ryan et al, 2012; Vaish et al, 2013) and are described in readily available online publications.

Data Source

Owing to confidentiality concerns, non-governmental researchers typically do not have direct access to the NSDUH restricted-use micro-data. For the present investigation, we fit multiparametric models to data from tabulated summary statistics from the NSDUH Restricted-use Data Analysis System (R-DAS), based on fieldwork completed between 2002 and 2011. In 2016, once NSDUH data from 2012 through 2015 have become available, cross-replication of this study’s estimates will become possible. (For this reason, to create an opportunity for independent replication, we do not use the available data from 2012–2013 in this research report.)

Statistical Analysis: Focus on Newly Incident Drug Users

Within these cross-sectional R-DAS data sets, there are month-by-month histories about DU initiation, making it possible to identify ‘newly incident drug users’ or ‘recent initiates’ (ie, those for whom no more than 12 months have passed between initiation of use and the date of survey assessment). The assessment indicates which newly incident users developed DSM-IV dependence within that first year after initiation, as well as frequencies of DU. The list of the R-DAS variables used to generate tabulated estimates and SEs is available online, as are the R-DAS-tabulated data presented in a user-friendly format. SEs are from Taylor series expansion, as described by Heeringa et al (2010).
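The 12-month restriction that defines a recent initiate can be sketched in a few lines of Python. This is only an illustration of the definition: R-DAS returns tabulations rather than person-level records, and the (year, month) tuples and function names below are hypothetical.

```python
def months_between(start_year, start_month, end_year, end_month):
    """Whole calendar months elapsed from the start date to the end date."""
    return (end_year - start_year) * 12 + (end_month - start_month)


def is_recent_initiate(first_use, assessed, max_months=12):
    """True when no more than max_months separate first use and assessment.

    first_use and assessed are hypothetical (year, month) tuples.
    """
    elapsed = months_between(*first_use, *assessed)
    return 0 <= elapsed <= max_months


# Hypothetical respondents: (month of first use, month of survey assessment).
print(is_recent_initiate((2010, 3), (2011, 3)))  # exactly 12 months elapsed
print(is_recent_initiate((2010, 2), (2011, 3)))  # 13 months elapsed
```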

Statistical Analysis: Hill Model-Fitting

We fit ‘S’-shaped curves to model relationships linking frequency of DU in the month just before assessment and the probability of developing DD within the first year after DU initiation. The mathematical equation describing this sigmoid relationship is a Hill function:

$$y = P_{\min} + \frac{(P_{\max} - P_{\min})\,x^{k}}{PD_{50}^{\,k} + x^{k}} \qquad (1)$$

where y ∈ [0, 1] is the DD incidence rate, x=0, 1, …, 30 is the frequency of DU as governed by the number of days of DU in the month just before the NSDUH interview date, Pmin is the lower asymptote, Pmax is the upper asymptote, k is the Hill coefficient that measures the steepness of the curve, and PD50 denotes the number of days of DU at which the curve reaches the half probability of dependence, ie, the point midway between the two asymptotes, Pmin+(Pmax−Pmin)/2. Figure 1 provides geometric interpretation of the parameters.

Figure 1

The ‘S’-shaped curve with the count of days of drug use on the x axis and estimated risk of developing drug dependence on the y axis. The four parameters that determine the shape of the curve are Pmin, Pmax, PD50 and k. Data from newly incident drug users found in the United States National Surveys on Drug Use and Health, 2002–2011.
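To make the roles of the four parameters concrete, the following minimal Python sketch evaluates a standard four-parameter Hill curve. The parameter values are illustrative assumptions: Pmin=0.05, PD50=18 days and Pmax near 1 loosely echo the cocaine estimates discussed in the text, whereas k=4 is an arbitrary steepness chosen for the example.

```python
def hill(x, p_min, p_max, pd50, k):
    """Four-parameter Hill curve: probability of dependence at x days of use."""
    if x <= 0:
        return p_min  # with no recent use the curve sits at the lower asymptote
    return p_min + (p_max - p_min) * x**k / (pd50**k + x**k)


# Illustrative values only; k = 4.0 is an assumed steepness, not an estimate.
params = dict(p_min=0.05, p_max=1.0, pd50=18.0, k=4.0)
for days in (0, 5, 18, 30):
    print(days, round(hill(days, **params), 3))
```

At x=PD50 the curve is exactly halfway between the two asymptotes, and larger values of k make the rise around PD50 steeper.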

The mathematical expression in Equation (1) approximates an ‘ideal’ situation from a deterministic model in which both the predictor values x and the response values y are observed without errors and are ‘true’. In practice, however, the observed survey data come with measurement errors and only approximations of true population values. Combining all measurement errors into a single term ε, we can rewrite equation (1) as

$$y = P_{\min} + \frac{(P_{\max} - P_{\min})\,x^{k}}{PD_{50}^{\,k} + x^{k}} + \varepsilon \qquad (2)$$

Equation (2) is an example of a nonlinear statistical regression model, because the response is a nonlinear function of the unknown parameters (Pmin, Pmax, k, and PD50). This nonlinear model can be fit using a standard least-squares minimization algorithm implemented in multiple statistical software packages, eg, the nls() function in R. For readers not familiar with R statistical software, we refer to the text by Ritz and Streibig (2008). We are also sharing our work with others via online access to our R scripts for estimating the parameters (Pmin, Pmax, k, and PD50).
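The least-squares idea can be sketched without any statistical package. The Python example below is not the R nls() routine mentioned above: it swaps in a simple derivative-free coordinate search so that it runs with the standard library alone, and it fits synthetic, noise-free rates generated from assumed parameter values rather than NSDUH estimates. All names, bounds, and starting values are hypothetical.

```python
def hill(x, p_min, p_max, pd50, k):
    """Four-parameter Hill curve evaluated at x days of recent use."""
    if x <= 0:
        return p_min
    return p_min + (p_max - p_min) * x**k / (pd50**k + x**k)


def sse(theta, xs, ys):
    """Sum of squared residuals between observed rates and the Hill curve."""
    return sum((y - hill(x, *theta)) ** 2 for x, y in zip(xs, ys))


def coordinate_search(theta, xs, ys, lower, upper,
                      step=0.25, tol=1e-6, max_sweeps=2000):
    """Derivative-free least-squares search within box bounds (not Gauss-Newton)."""
    theta = list(theta)
    best = sse(theta, xs, ys)
    for _ in range(max_sweeps):
        if step <= tol:
            break
        improved = False
        for i in range(len(theta)):
            for direction in (1.0, -1.0):
                trial = list(theta)
                # Relative step, so day-scale and probability-scale parameters share one loop.
                trial[i] += direction * step * max(abs(theta[i]), 0.01)
                trial[i] = min(max(trial[i], lower[i]), upper[i])
                value = sse(trial, xs, ys)
                if value < best:
                    theta, best, improved = trial, value, True
        if not improved:
            step /= 2.0  # shrink the step once no single-parameter move helps
    return theta, best


# Synthetic, noise-free rates from assumed "true" values (p_min, p_max, pd50, k).
true_theta = (0.05, 0.90, 12.0, 3.0)
days = list(range(31))
rates = [hill(x, *true_theta) for x in days]

start = (0.02, 0.70, 10.0, 2.0)   # rough "eyeball" starting values
lower = (0.0, 0.0, 0.5, 0.1)      # keeps pd50 and k strictly positive
upper = (1.0, 1.0, 30.0, 20.0)
fitted, fitted_sse = coordinate_search(start, days, rates, lower, upper)
print([round(v, 3) for v in fitted], fitted_sse)
```

A gradient-based solver of the kind found in statistical packages would normally converge faster; the coordinate search is used here only because it is short and easy to verify.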

Finally, as this work is a ‘proof-of-concept’ illustration at this stage, we have not modeled multiple covariates or suspected influences such as age-of-onset variations as might be incorporated in future research. The contrasts of male and female subgroups serve to illustrate an approach that can be used when estimating these suspected effects.


Development of DD Syndrome

Nonlinear regression approaches require users to start by supplying a set of starting values for the parameters (ie, Pmin, Pmax, k, and PD50). For starting values in this project, we produced ‘eyeball’ estimates while plotting R-DAS-generated DD rate estimates on the y axis and the count of days of DU on the x axis. For all four drug subtypes, we then derived estimates of the risk of developing DD syndrome by recent frequency of DU with corresponding 95% confidence intervals (CIs), as shown in Figure 2. As it happens, few newly incident users progressed to frequent DU within the first year, such that observed 95% CIs became quite wide across increasing values of the count of DU days. We took this condition into account by using inverse empirical variances as weights. As noted above, we also restricted parameter values of Pmin and Pmax to the allowable closed interval [0,1]. As shown in Figure 2, for all drug subtypes the estimated probability of DD among newly incident drug users was observed to be larger across levels of increasing recent DU frequency, with eventual leveling off.
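The weighting and the constraint on the asymptotes can be illustrated as follows. This is a minimal sketch under stated assumptions: the standard errors, observed rates, and predicted rates are hypothetical numbers, and simple clamping stands in for whatever bounded optimization the fitting software applies.

```python
def inverse_variance_weights(standard_errors):
    """Weight each drug-use stratum by 1 / SE^2 so imprecise strata count less."""
    return [1.0 / se**2 for se in standard_errors]


def clamp_probability(value):
    """Keep an asymptote (Pmin or Pmax) inside the closed interval [0, 1]."""
    return min(max(value, 0.0), 1.0)


def weighted_sse(observed, predicted, weights):
    """Weighted least-squares criterion: sum of w * (observed - predicted)^2."""
    return sum(w * (y - f) ** 2 for y, f, w in zip(observed, predicted, weights))


# Hypothetical standard errors for low-, mid-, and high-frequency strata.
standard_errors = [0.005, 0.02, 0.10]
weights = inverse_variance_weights(standard_errors)
observed = [0.05, 0.30, 0.90]
predicted = [0.06, 0.28, 0.70]
print(weights)
print(weighted_sse(observed, predicted, weights))
print(clamp_probability(1.07), clamp_probability(-0.02))
```

With these weights, a large residual in the sparsely populated high-frequency stratum contributes less to the criterion than a much smaller residual in the precisely estimated low-frequency stratum.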

Figure 2

Dots and vertical bars are observed empirical estimates for risk of drug dependence (DD) with the corresponding 95% confidence intervals (95% CI). Lines show estimated probability of having developed DD across drug-frequency strata, as estimated via nonlinear regression, and shaded regions show corresponding 95% bootstrap CIs. Data from newly incident drug users found in the United States National Surveys on Drug Use and Health, 2002–2011.

Comparison of the Estimated Parameters Across Drugs

Table 1 shifts focus from qualitative comparison of ‘S’-shaped curves toward the four estimated parameters from the Hill equation, Pmin, Pmax, PD50 and k, with 95% weighted residual bootstrap CIs. Supplementary Information accompanying this paper provides a detailed discussion of the weighted bootstrap procedures we used to obtain the CIs.
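For readers unfamiliar with residual bootstrapping, the Python sketch below shows one common form of a weighted residual bootstrap with percentile CIs. It is an assumption about the general technique, not a reproduction of the procedure in the Supplementary Information, and a weighted mean stands in for the Hill model so that the example stays short. All data values are hypothetical.

```python
import random


def weighted_mean_fit(ys, weights):
    """Toy 'model': a single weighted mean stands in for the Hill fit."""
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def weighted_residual_bootstrap(ys, weights, fit, n_boot=1000, seed=0):
    """Percentile CI from resampled, variance-standardized residuals."""
    rng = random.Random(seed)
    estimate = fit(ys, weights)
    # Standardize residuals so that they are comparable across strata.
    std_resid = [(y - estimate) * w**0.5 for y, w in zip(ys, weights)]
    replicates = []
    for _ in range(n_boot):
        # Draw residuals with replacement, then rescale to each stratum's variance.
        y_star = [estimate + rng.choice(std_resid) / w**0.5 for w in weights]
        replicates.append(fit(y_star, weights))
    replicates.sort()
    low = replicates[int(0.025 * n_boot)]
    high = replicates[int(0.975 * n_boot) - 1]
    return estimate, (low, high)


# Hypothetical rates and standard errors for five strata.
rates = [0.10, 0.12, 0.09, 0.15, 0.11]
ses = [0.01, 0.01, 0.02, 0.05, 0.03]
weights = [1.0 / se**2 for se in ses]
estimate, (low, high) = weighted_residual_bootstrap(rates, weights, weighted_mean_fit)
print(round(estimate, 4), round(low, 4), round(high, 4))
```

In a full analysis the refit inside the loop would re-estimate all four Hill parameters, giving a separate percentile interval for each.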

Table 1 Parameter Estimates with the 95% Bootstrap CIs

In terms of the four estimated parameters, one might expect cocaine to have the largest estimated values of Pmin, Pmax, PD50 and k among the drug subtypes considered, given cocaine’s exceptional functional value as a reinforcer and previously observed epidemiological estimates (eg, see Wagner and Anthony, 2002). The observed estimates are consistent with this expectation. For those who used cocaine at least once in the past year but with no use in the month of the interview, ie, x=0, the estimated probability of having become dependent is 5% (95% CI=4%, 6%). Nonetheless, the PD50 mid-value is estimated as 18 days of recent cocaine use (95% CI=9, 27 days). For the very small subset of daily users, the DD probability is essentially 1 (100%).

By comparison, parameter estimates for newly incident alcohol users are smaller but noteworthy. Among newly incident drinkers with no recent drinking, the estimated probability of alcohol dependence (AD) is 1%; alcohol’s PD50 estimate is 4 days of recent drinking (95% CI=3, 10 days). Among newly incident users consuming alcohol essentially every day, an estimated 10% qualified as AD cases (95% CI=6%, 17%).

We also present estimates for prescription pain relievers (generally opioid PPRs) and for cannabis, which resemble one another. We note, however, that the estimates for PPRs are based on relatively small numbers of days of DU and a small number of cases. The result is low statistical precision and very wide CIs.

Female–Male Contrasts

Figure 3 depicts estimated female–male differences for AD probabilities as a function of recent drinking days. The contrasting Hill equation parameter estimates are not too distant from one another with one exception, PD50. Conditional on the same rate of recent alcohol use, the estimated PD50 is 7 days for females and 11 days for males (p-value for the difference=0.002).

Figure 3

Female–male contrasts in estimated probability of alcohol dependence across levels of recent drinking. Data from newly incident drug users found in the United States National Surveys on Drug Use and Health, 2002–2011.

Figure 4 shows estimates for cocaine dependence. Relative to males, newly incident cocaine-using females are more likely to develop cocaine dependence soon after first cocaine use, even among the lowest-frequency newly incident users (Pmin=7% for females vs 3% for males; p-value <0.001). With frequency of cocaine use held constant, women seem to be more likely to become newly incident cases of cocaine dependence. The estimated PD50 is 14 days of recent use for females and 22 days for males (p-value for the difference=0.027).

Figure 4

Female–male contrasts in estimated probability of cocaine dependence across levels of recent cocaine use. Data from newly incident drug users found in the United States National Surveys on Drug Use and Health, 2002–2011.

For cannabis and prescription pain relievers, no statistically significant differences can be seen (p>0.05), even though the number of newly incident users is substantial. For the PPRs, one explanation for the lack of statistical significance at the conventional level in this female–male contrast is the relatively small number of newly incident PPR users who become dependent within the first year after onset of extramedical PPR use.


By turning to functional analysis of Hill equations, we offer a novel approach not previously seen in DD epidemiology’s drug-by-drug comparisons. Based on theory and prior studies, our approach assumes that clinically significant dependence syndromes generally do not appear suddenly. Many extramedical drug users try the drug once or a few times and then never use the drug again. In other users, the dependence syndrome takes form in the days and months after initial DU, with incremental strengthening and eventually leveling off of the probability of becoming drug dependent, with a resulting ‘S’-shaped curve resembling dose–response curves of pharmacology and toxicology. The Hill function yields four parameter estimates that help enable drug-by-drug comparisons not previously seen in epidemiological research.

How well do our Hill equation parameter estimates for newly incident drug users replicate what others have found in prior studies with samples of all prevailing users (ie, when no distinctions are drawn between those who just started to self-administer versus those who have been using the drug for more than 1 year)? For cocaine, our Pmin estimate for the probability of cocaine dependence being seen among newly incident users with low DU frequency is 5%, not too distant from the 5% to 6% estimate observed elsewhere in the first 1–2 years after first cocaine use, but substantially lower than the estimated 16%–20% value observed when cocaine users are studied many years after first cocaine use, irrespective of DU frequency (Reboussin and Anthony, 2006; Wagner and Anthony, 2002). Studying more recent epidemiological samples of cocaine users, Lopez-Quintero et al (2011) produced a corresponding estimate of 20%–21% for ever-users of cocaine, aggregating ever-users with cases found during two separate assessments. Studying a prevalence sample of daily cocaine users, Chen and Kandel (2002) found that 74% were cocaine dependent.

With respect to alcohol, the life-table approach used by Wagner and Anthony (2002) indicated that as many as 12%–13% of drinkers develop AD within the first 10 years after onset of drinking. The Lopez-Quintero study (Lopez-Quintero et al, 2011) estimated that 22%–23% of ever-drinkers had developed AD, with cases identified via two separate assessments. In Ridenour et al (2006), the estimate of active AD among daily drinkers is 15%. This study’s Hill function estimate for newly incident daily drinkers is 10%, not too distant from the Ridenour estimate for all daily drinkers identified in a prevalence sample (ie, with no restriction to newly incident drinkers).

Turning to cannabis, we note that Wagner and Anthony (2002) found that roughly 8% had become cannabis dependent within 10 years after first cannabis smoking. The Lopez-Quintero study estimate was 9%, once baseline and follow-up cases were counted (Lopez-Quintero et al, 2011). Estimates from Chen et al (2005) suggest that 4% of newly incident cannabis users develop cannabis dependence within 24 months after initiation, with somewhat larger values seen among adolescent-onset users. Studying a prevalence sample of adolescents using cannabis daily, Chen et al (1997) found that 30% were cannabis dependent. More tightly focused on cannabis users studied within 12 months of first use, this study produced a corresponding Hill function estimate for Pmin of 2%, and for daily cannabis users the estimated Pmax is 36%, irrespective of age.

This study’s Pmin and Pmax values for newly incident extramedical users of prescription pain relievers are 4% and 38%, respectively. The corresponding PPR estimate from the Wagner–Anthony life-table approach was 9% for all users, based on experiences accumulated during the first 10–15 years after initiation (Wagner and Anthony, 2002). Studying all prevalent PPR users cross-sectionally, Becker et al (2008) and Novak et al (2009) produced estimates of 13% and 20% for the probability of opioid dependence (OD) among current PPR users. Martins et al (2007) estimated the odds of OD for frequent PPR users (50+ days in the past year) and the odds of OD for infrequent PPR users (1 day in the past year), and derived an odds ratio estimate of 14. The corresponding OR based on our Hill function analysis estimates of Pmin and Pmax for newly incident PPR users is 15.
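The odds ratio quoted for newly incident PPR users follows directly from the reported asymptotes. The short Python check below uses the rounded values Pmin=4% and Pmax=38% given in the text, so the result is approximate.

```python
def odds(probability):
    """Convert a probability into odds, p / (1 - p)."""
    return probability / (1.0 - probability)


p_min, p_max = 0.04, 0.38  # rounded PPR estimates reported in the text
odds_ratio = odds(p_max) / odds(p_min)
print(round(odds_ratio, 1))
```

The ratio of roughly 14.7 rounds to the value of 15 cited above.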

As for female–male differences, there is a relatively limited epidemiology trace of published estimates (eg, Bobzean et al, 2014; Chen and Kandel, 2002; Kasperski et al, 2011; Sartor et al, 2014). Overall, for cocaine, these investigations suggest that women, in the past, have been less likely to become newly incident cocaine users, and that there has been no pronounced female–male difference in the probability of transitioning from initial cocaine use into the clinically significant state of cocaine dependence (eg, see Wagner and Anthony, 2007). For alcohol, there is some evidence of a more rapid transition from alcohol onset until AD for females (eg, see Hernandez-Avila et al, 2004; Mello, 1986; Zilberman et al, 2003). For cannabis, the female–male contrasts for these estimates are mixed (eg, Chen et al, 1997; Hernandez-Avila et al, 2004). More research on female–male differences is needed for all of these drug compounds, especially for cannabis and for PPR, where the available published evidence is spotty.

Readers might wish to know why we did not include tobacco products in this comparative analysis. The reason is that NSDUH assesses dependence on tobacco products only in relation to the 30 days before assessment. The result is an ‘apples versus oranges’ comparison across drug subtypes. For tobacco, the focus is on dependence in the past month, ie, on the 30 days before assessment. For all other drugs, the focus is on dependence in past year, ie, on the 12 months before assessment. There is no way to recalibrate the NSDUH 30-day interval with its 12-month interval.

Before any further discussion of the Hill model for comparative epidemiological research on this topic, several study limitations deserve attention. Of special concern is the self-report interview data from NSDUH. Most likely, in the context of nationally representative sample surveys on this scale, there are few logistically feasible and affordable alternatives to self-report. Counter-balancing our acknowledgment of limitations in self-report interview data, there is a body of evidence indicating that self-report assessment methods can have reasonable levels of reliability and validity (Del Boca and Darkes, 2003; Vignali et al, 2012). In addition, the NSDUH team conducted detailed method studies on these topics. These methodological studies do not indicate perfection in the assessments, but the findings from the methods research have been generally supportive.

In addition, cross-sectional data always will be inferior to longitudinal data when fitting Hill functions of this type. Nonetheless, as noted in our introduction, this study’s cross-sectional estimates can serve as starting values for future longitudinal analyses and should help guide newly planned longitudinal research on this topic. To illustrate, from this cross-sectional work, it can be anticipated that exceptionally large samples of newly incident PPR users will be required for effective estimates of Hill function parameters in the longitudinal context. Furthermore, within a few years, it will be possible to seek replication of the cross-sectional estimates, once newer NSDUH data from surveys in 2012–2015 become available. We note that replication of large sample longitudinal study estimates is rare, making us wonder whether large-sample longitudinal study estimates will ever be replicated with independent samples. Nonetheless, cross-sectional study estimates might be of value while we await systematic replication of longitudinal study estimates, because it always will be possible to compare estimates from cross-sectional studies with those obtained via longitudinal research designs.

One additional limitation of cross-sectional data might be avoided by asking newly incident drug users to give a month-by-month, or perhaps a week-by-week report about frequency of use and occurrence of clinical features (ie, week-by-week since onset of first use). In this way, it might be possible to tease apart paths that run from frequency of use toward onset of clinical features versus paths running from clinical features toward frequency of use. To the best of our knowledge, no research project has attempted this intensity of measurement for fine-grained time-sequenced drug experiences. It remains on an agenda for potential future research.

It is also important to remember that this study’s estimates are based on ‘DD’ assessments based on pre-DSM-5 criteria. Large-sample epidemiological studies, often planned years in advance of the actual field assessments, tend to adapt slowly to changes in diagnostic criteria advocated by clinical psychiatrists. NSDUH is no exception. To date, the NSDUH DU disorder assessments are based on pre-DSM-5 diagnostic criteria (American Psychiatric Association, 2013). Once NSDUH assessments are adapted to DSM-5 criteria, it should become possible to repeat analyses of this type and to produce estimates for DSM-5 DU disorders in aggregate, and for ‘DSM-5 drug addiction’ as a separable category.

Conceptualizing extensions beyond this initial study, we propose that one way to accelerate this line of research is to formalize a multivariate vector of DD clinical features, each of which represents a constituent unit of response to drug exposure. We then can substitute this multivariate response vector in place of standardized diagnosis of DD of the type considered in this illustration. An advantage of this substitution is that there is a larger number of newly incident users who experience subjectively felt tolerance or time displacement of other valued non-drug activities, as the drug user starts to ‘spend more time’ on drug activities, as compared with those who qualify as diagnosable cases of DD. Moreover, when the constituent clinical features of DD are studied as responses, an allowance can be made for the possibility of a shift of the response curve to the left for clinical features that take form quite quickly after onset of extramedical use (eg, subjectively felt tolerance, ‘displacement,’ or ‘spending more time (with the drug)’) and a corresponding shift to the right for clinical features that take more time to appear (eg, manifestations of withdrawal on abrupt discontinuation of use). Ultimately, elaboration of the modeling process to allow for heterogeneity of parameter estimates across clinical features will be useful. Here again, estimates based on already available cross-sectional data can be used to frame specific hypotheses against non-null alternatives that otherwise would be constrained to tests for departures from the null.

Other forms of heterogeneity in parameter estimates should be anticipated beyond this work’s initial focus on drug-by-drug variations and male–female differences. To illustrate another source of heterogeneity, recent studies (eg, Kandel and Kandel, 2014) suggest that an allowance should be made for heterogeneity of cocaine dependence processes in relation to ‘pre-treatment’ with nicotine products such as tobacco cigarettes or smokeless tobacco, with the minority of newly incident cocaine users having no past nicotine exposures showing different parameter estimates than the majority of newly incident cocaine users who generally have started to smoke tobacco and use nicotine products before the onset of cocaine use. Even though our project has not taken into account heterogeneity introduced when multiple drugs are used, this initial work with readily available cross-sectional data should have future impact on the field of ‘polydrug use research’ (Lopez-Quintero and Anthony, 2015), as manifest in general NIH policy to promote research on male–female differences of the type seen here when each drug compound has been considered one at a time, without reference to ‘pre-treatment’ effects or concurrent and simultaneous DU.


In addition to peer review assignments mentioned in his financial disclosure statement, Dr Anthony recently completed a grant application review for the American Association for the Advancement of Science, for which he received a small honorarium compensation. The authors declare no conflict of interest.