Substance use disorders (SUDs) are common worldwide, resulting in significant health and economic costs1. It is well established that both genetic and environmental influences shape substance use2, with 50–60% of SUD risk attributed to heritable contributions3. Although specific genetic influences on alcohol use disorders have been widely investigated and documented, genetic influences on other SUDs such as cocaine, opioids, and cannabis are limited despite their high prevalences, leading researchers to highlight the need for additional research on the genetics of those SUDs4. Concurrently, in light of evidence demonstrating immune system influences on neural and behavioral aspects of addiction5, there has been increasing emphasis on immunological and psychoneuroimmunological aspects of SUDs5,6. Here, we bridge those lines of research by evaluating the immunogenetics of six SUDs—namely, alcohol, amphetamine, cannabis, cocaine, opioid, and “other” dependence (a residual category including hallucinogens, inhalants, sedatives, and solvent dependence)—according to their human leukocyte antigen (HLA) profiles.

The HLA region of chromosome 6 codes for two classical types of cell surface proteins that are instrumental in immune system surveillance and elimination of non-self antigens. Class I HLA molecules (HLA-A, B, C) bind and export small peptides from proteolytically degraded cytosolic foreign antigens to the cell surface for presentation to CD8+ cytotoxic T cells, signaling cell destruction. Class II HLA molecules (HLA-DPB1, DQB1, DRB1) present larger peptides derived from endocytosed exogenous antigens to CD4+ T cells to facilitate B cell mediated antibody production and adaptive immunity. The HLA region is the most highly polymorphic region of the human genome7, and variation in HLA has been shown to contribute to variation in disease susceptibility8. HLA-disease associations have been most widely established for autoimmune disorders9; however, HLA associations have been increasingly documented for diseases not traditionally characterized primarily by immune system dysregulation including various psychiatric conditions10. With regard to SUDs, HLA has been implicated as an important genetic factor associated with alcohol dependence11,12 and alcohol-related liver disease13, although methodological limitations have rendered findings of HLA associations with alcohol dependence across studies largely inconsistent14. HLA associations with other SUDs have received modest attention in humans (c.f., ref.15); however, major histocompatibility complex class I (MHCI; HLA Class I equivalent) expression in dopaminergic neurons has been shown to play a key role in suppressing reward-seeking behavior related to cocaine use in mice16, and morphine administration in rats has been shown to suppress MHCII (HLA Class II equivalent) expression17, highlighting the interactions between addictive substances and immunogenetics. Here, we used an immunogenetic epidemiological approach to evaluate associations between the population frequencies of a large number of HLA alleles and the population prevalences of SUDs in Continental Western Europe to begin to elucidate immunogenetic profiles for SUDs (SUDHLA). Furthermore, since SUDs frequently co-occur18, we evaluated associations between SUDHLA profiles to identify immunogenetic influences underlying their co-occurrence. Finally, since each individual carries 12 HLA alleles (two alleles per HLA gene), we used the population scores to estimate individual SUD risk.


Immunogenetics profiles

The immunogenetic scores of the 6 SUDs and the alleles of 6 classical HLA genes A, B, C, DPB1, DQB1, and DRB1 (127 alleles in total) are given in Tables 1, 2, 3, 4, 5, 6, respectively, their frequency distributions are plotted in Fig. 1, and their descriptive statistics are given in Table 7. In the permutation test, where SUD prevalences were randomly paired with HLA allele frequencies, not a single case (out of 1,000,000 runs) was found to match the observed SUDHLA profiles of any of the 6 SUDs, thus rejecting the null hypothesis that the observed profile could be accounted for by chance (P < 1 × 10–6). The same results were obtained in the ranks version of the random permutations test, which relaxed the requirement of an exact SUDHLA match and focused instead on a match of the ranked SUDHLA scores: no cases of an exact match was found, thus rejecting again the null hypothesis that the SUDHLA profiles could be accounted for by chance (P < 10–6). Therefore, we analyzed the 6 sets of SUDHLA scores with the following results.

Table 1 Immunogenetic SUDHLA scores of HLA Class I gene A alleles.
Table 2 Immunogenetic SUDHLA scores of HLA Class I gene B alleles.
Table 3 Immunogenetic SUDHLA scores of HLA Class I gene C alleles.
Table 4 Immunogenetic SUDHLA scores of HLA Class II gene DPB1 alleles.
Table 5 Immunogenetic SUDHLA scores of HLA Class II gene DQB1 alleles.
Table 6 Immunogenetic SUDHLA scores of HLA Class II gene DRB1 alleles.
Figure 1
figure 1

Frequency distributions of immunogenetic SUDHLA scores for the 6 SUDs studied. N = 127 for each distribution.

Table 7 Descriptive statistics of the 6 SUDHLA scores.

Associations between SUDHLA scores

All 15 pairwise correlations of the SUDHLA scores of the 6 SUDs are given in Table 8. Most notable are the high positive correlations between the cannabis and cocaine scores (Fig. 2A), and between the opioid and “other” scores (Fig. 2B).

Table 8 Pearson correlations and associated statistics for the 15 pairwise associations of the SUDHLA scores of the 6 SUDs.
Figure 2
figure 2

(A) cannabis SUDHLA scores are plotted against cocaine use disorder scores (P < 0.001, Table 3). (B) opioid use disorder scores are plotted against other substance use disorder (P < 0.001, Table 3).

Factor analysis of SUDHLA scores

The factor analysis yielded 2 components (with eigenvalue > 1) which accounted for 72.9% of the variance (Table 9; Fig. 3A, scree plot). The correlation between the components was very low (r = 0.099). The specific assignment of the 6 SUDs to the 2 factor analysis components was inferred from the factor analysis structure matrix, which provides the correlations between SUD type and factor analysis component (Table 10). It can be seen that alcohol, amphetamine, opioid and “other” SUDs were primarily associated with Component 1, whereas cannabis and cocaine use disorders were primarily associated with Component 2. This is illustrated in the component plot of Fig. 3B, where it can be seen that alcohol, amphetamine, opioid and other disorders project at high values on Component 1, whereas cannabis and cocaine use disorders project highly on Component 2.

Table 9 Variance explained by components of SUDHLA scores.
Figure 3
figure 3

(A) scree plot of factor analysis of the 6 SUDHLA scores. (B) component plot of factor analysis of the 6 SUDHLA scores. The first component (green cluster) comprises alcohol, amphetamine, opioid, and other SUDs; the second component (magenta cluster) comprises cannabis and cocaine use disorders. See text for details.

Table 10 Structure matrix of the 2 factor analysis components.

Application to individuals: assessment of SUD risk based on the individual’s whole HLA profile

Since each individual carries a total of 12 HLA alleles (2 per 6 classical HLA genes), we used the average (τ, Eq. 2) of the 12 SUDHLA scores of an individual as an estimate of the risk of that individual for a particular SUD. In order to be able to interpret this risk measure, we standardized τ with respect to a large simulated population by generating, for each SUD, a large sample of expected τ* values (N = 1,000,000) using a bootstrap procedure, where τ* was the sum of 12 randomly selected SUDHLA scores (2 per gene). The resulting frequency distributions of τ* were unimodal, approximating a normal distribution; descriptive statistics of these distributions of the 6 SUD τ* values are given in Table 11. The Pearson correlations between the 6 SUD τ* distributions were very similar to those of SUDHLA (Table 8) and are given in Table 12. Similarly, the same factor analysis applied to the τ* distributions yielded the same number of components (Table 13) and component structure matrix (Table 14) as the distributions of SUDHLA scores (Tables 9 and 10).

Table 11 Descriptive statistics of the 6 SUDHLA τ*scores (N = 1,000,000 per row).
Table 12 Pearson correlations and associated statistics for the 15 pairwise associations of the SUDHLA τ* scores of the 6 SUDs.
Table 13 Total variance explained by components of SUDHLA τ* scores .
Table 14 Structure matrix of the 2-component FA for SUD types.

The τ* distribution for alcohol is shown in Fig. 4. The red line is at the level of mean + 2 SD, thus providing an estimated threshold of excessive alcohol SUD risk, along the rationale of using T score in estimating risk related to bone density. We employed a similar approach here and used the z-score of the τ* distribution as a continuously varying risk score. The relevant computations are shown in Table 15, where T thresholds for excessive risk (> mean + 2SD) are given for each SUD. For a given individual, the only information needed to compute their T score is the set of the 12 HLA alleles the individual carries: then, using Tables 1, 2, 3, 4, 5, 6, the average τ score is calculated and its z-score (for a particular SUD) is computed and referred to the threshold(s) in Table 15 for assessment of the risk. As an applied exercise, we calculated the T scores for the best (lowest SUDHLA) and worst (highest SUDHLA) cases by averaging the 2 smallest (for the former case) and the 2 largest (for the latter case) SUDHLA scores from each one of the 6 genes, yielding the τmin and τmax, respectively. The relevant data, τ, and T values are given in Tables 16 and 17, for lowest and highest risk assessments.

Figure 4
figure 4

Frequency distribution of τ* for Alcohol (N = 1,000,000). See text for details.

Table 15 T-score calculation for estimated individual SUD risk.
Table 16 Lowest risk HLA genotypes for the 6 SUD studied.
Table 17 Highest risk HLA genotypes for the 6 SUD studied.


Here we used an epidemiological approach to evaluate the immunogenetic profiles of 6 SUDs and their associations and to estimate individual SUD risk. We documented robust immunogenetic associations between SUDs at both the population and individual level characterized by two groupings—one comprised solely of cannabis and cocaine dependence and the other of alcohol, amphetamine, opioid, and other dependence. These findings, which provide novel evidence of immunogenetic associations with SUDs, are discussed below.

Relatively few studies have focused on HLA-SUD associations, and most previous HLA-SUD association studies have been limited to alcohol use disorders. Several studies in 1980s documented HLA associations with alcohol use disorders and sequelae, although findings across studies were inconsistent14 and that line of research subsequently dwindled; however, recent advances in genetic association studies have renewed interest in the role of HLA in alcohol use disorders. To that end, a recent candidate gene association study identified several single nucleotide polymorphisms (SNPs) in the HLA-DRA gene that were associated with alcohol dependence12, and epigenetic changes of several genes related to inflammation and immune system regulation including HLA have been reported among those with alcohol use disorders19. Our findings suggest that HLA-SUD associations extend beyond alcohol to other addictive substances and highlight immunogenetic groupings among SUDs.

We found evidence of two distinct groups of SUDs based on their immunogenetic profiles. Previous research evaluating genetic and environmental risk for SUDs in twins identified two genetic factors for SUDs that, remarkably, correspond with findings from the present study20. Specifically, similar to our findings, Kendler et al.20 found that cocaine and cannabis loaded onto 1 genetic factor, whereas other SUDs (licit substances including alcohol, nicotine, and caffeine dependence) loaded onto a separate, albeit highly intercorrelated, genetic factor. We did not evaluate licit SUDs other than alcohol in our study although previous research indicates caffeine and nicotine are largely influenced by unique genetic factors20. Two large twin studies of illicit drugs found predominantly common genetic risk shared across illicit SUDs21,22, with modest specific genetic influences on risk for some drugs22. Our findings, which utilize a different approach based on population immunogenetics, extend the literature by evaluating the influence of HLA genes on population and individual risk for SUDs, and document that genes involved in the immune response to foreign antigens are associated with two clusters of SUDs based on immunogenetic profiles.

Cannabis and cocaine dependence formed one HLA-based SUD group that was distinguished from the other grouping containing all four of the other SUDs. It is notable that cannabis and cocaine SUDHLA profiles were very highly correlated both at the population level and for individual SUD risk. Their correspondence was further reflected in analyses identifying the 2 alleles (out of 127) associated with the highest risk of each SUD. For cannabis and cocaine, the high risk alleles were virtually identical (9 out of 12 alleles) and minimally overlapped with high risk HLA alleles for SUDs in the other group (Table 12). In fact, in some cases the genes associated with the highest risk for an SUD from one group were associated with the lowest risk for an SUD in the other group as exemplified by A*02:01 conferring high risk for alcohol and low risk for cocaine (Table 11). Prior studies reviewed elsewhere23 have highlighted links between cannabis and cocaine including similar neuropharmacological actions of cannabinoids and cocaine, and endocannabinoid system involvement in cocaine addiction. In contrast, cannabidiol has been shown to inhibit the reward-facilitating effect of opioids and other substances that were part of the second cluster in the present study24,25. Beyond differences in brain reward effects, the separate clustering of cocaine and cannabis from the other SUDs investigated here documents immunogenetic differences between the two clusters of SUDs. Similar to the cannabis-cocaine grouping, the finding that alcohol, amphetamines, opioids, and “Other” dependencies clustered together suggests common HLA associations amongst those SUDs that differ from the cannabis-cocaine cluster. Indeed, the SUDHLA correlations among alcohol, amphetamine, opioid, and “Other” were considerably stronger than their associations with cannabis or cocaine. Taken together, the present findings highlight similarities and difference in the immunogenetic profiles of SUDs. The HLA-SUD associations documented here are particularly interesting in light of research on immunotherapies for treatment of addictions including several Phase I and II clinical trials evaluating the effectiveness of anti-addiction vaccines and antibodies aimed at preventing drugs from reaching the brain and activating reward centers26.

To our knowledge, this is the first study to evaluate immunogenetic profiles of SUDs and their associations. The findings of this immunogenetic epidemiological study provide novel insights regarding HLA-SUD associations; however, the findings must be considered within the context of study limitations. First, this was an epidemiological study. We utilized the population level SUDHLA scores to estimate individual risk although future studies are warranted to determine whether the individual risk estimates are corroborated in vivo. Second, the current study focused on Continental Western Europe. Since geographic and ethnic variation in HLA are well-established27,28 and SUD prevalence varies globally1, the HLA-SUDs associations identified here may vary in other regions. An additional consideration involves reporting of illicit substances. Some individuals may be hesitant to disclose problematic substance use which may impact estimates; however, potential reporting biases are somewhat mitigated by the fact that population estimates of SUDs used in the present analyses were obtained from the Global Burden of Disease study which is the most comprehensive epidemiological study of diseases including SUDs. Finally, many other genetic and environmental factors not investigated here contribute to SUDs; how those factors interact with HLA to influence SUD prevalence remains to be investigated.

Materials and methods

Epidemiological data

Prevalence of substance use disorders

The population prevalence of alcohol use disorder, amphetamine use disorder, cannabis use disorder, cocaine use disorder, opioid use disorder, and other drug use disorders in 2019 was computed for each of the following 14 countries in Continental Western Europe (CWE): Austria, Belgium, Denmark, Finland, France, Germany, Greece, Italy, Netherlands, Portugal, Norway, Spain, Sweden, and Switzerland. Specifically, the total number of people with each SUD in each of the 14 CWE countries was identified from the Global Health Data Exchange29, a publicly available catalog of data from the Global Burden of Disease study, the most comprehensive worldwide epidemiological study of more than 350 diseases. The number of people with each SUD in each country was divided by the total population of each country in 201630 and expressed as a percentage. We have previously shown that life expectancy for these countries is virtually identical31; therefore, life expectancy was not included in the current analyses.


The frequencies of all reported HLA alleles of classical genes of Class I (A, B, C) and Class II (DPB1, DQB1, DRB1) for each of the 14 CWE countries were retrieved from the website (Estimation of Global Allele Frequencies)32,33 on October 20, 2020. As we reported previously31, there were 844 distinct alleles, i.e. alleles that occurred in at least one country. Of those, 127 alleles occurred in 9 or more countries and were used in further analyses. This criterion is somewhat arbitrary but reasonable; it was partially validated in a previous study34.

Data analysis

HLA-SUD profiles

HLA-SUD profiles for each SUD disorder above were derived by computing the covariance between the nonparametric normal scores35 of the prevalence of a SUD and those of the population frequency of an allele, comprising 69 HLA Class I and 58 Class II alleles, for a total of 127 alleles. The covariance can be negative or positive, indicating a negative or positive association, respectively. The equation for the HLA-SUD score is:

$${\text{SUD}}^{{{\text{HLA}}}} \;{\text{score}} = \frac{1}{N - 1}\sum\limits_{i}^{i = 1,N} {\left( {f_{i} - \overline{f}} \right)\left( {p_{i} - \overline{p}} \right)}$$

where \({f}_{i}, {p}_{i}\) denote the normal scores of allele frequency and SUD prevalence for the ith country, respectively, and \(\overline{f },\overline{p }\) are their means. Thus a SUDHLA profile is a vector with 127 HLA-SUD scores. It should be noted that covariance is a descriptive measure of interdependence not subject to formal statistical testing and one that has been used routinely for many years routinely and successfully in other fields, including evolutionary biology36 and finance37.

Standard statistical methods were used to analyze the HLA-SUD scores, including parametric univariate (mean, standard deviation, etc.), bivariate (Pearson correlation), multivariate (factor analysis), and permutations-based statistics.

Random permutations test for assessing the statistical significance of the HLA-SUD profiles

In this analysis, we tested the null hypothesis that the HLA-SUD profiles may be due to chance by performing a permutation test, where the pairing of allele frequencies and SUD prevalences was randomly scrambled. More specifically, let \(H\) be HLA-SUD profile for a specific SUD, and let H′ be the profile obtained after random pairing of pairing of alleles and countries. If the profiles are identical, the sum S of the absolute paired differences between them (\(H\),H′) would be zero. We carried out this procedure 1,000,000 times for each one of the 6 HLA-SUD profiles and counted the number of times M for which the sum S was equal to zero, indicating that the randomly obtained profile would be the same as the observed one. Then, the ratio \(w=\frac{M}{\mathrm{1,000,000}}\) is the probability that the observed profile \(H\) could be due to chance. In a relaxed variation of the test, we computed the sum of the absolute differences between the ranked profiles.

Factor analysis

A factor analysis (FA) were performed to identify potential groupings (“components”) of SUDHLA scores. The method of principal components was used for extraction and the method of direct oblimin (delta = 0) with Kaiser normalization was used for factor rotation.

Application to individuals

Since every individual carries k = 12 classical HLA alleles (2 of each 3 HLA Class I and 3 Class II genes), average SUDHLA scores were calculated:

$$\uptau =\frac{1}{12}\sum_{k}^{k=\mathrm{1,12}}{\mathrm{SUD}}^{\mathrm{HLA}}(k)$$

We obtained expected estimates of τ using a bootstrap procedure38, as follows. For each HLA gene and SUD, two SUDHLA scores were drawn randomly (with replacement) from the pool of available alleles and were averaged to yield bootstrap values of τ* for a simulated “individual”. The procedure was repeated 1 million times for a total of 1,000,000 τ* values which were used for further analyses. The same random seed was used for each draw of the 12 SUDHLA values, such that SUDHLA values for all 6 SUDs referred to the same set of alleles, thus allowing for an assessment of associations between τ* distributions.

Implementation of analysis procedures

The IBM-SPSS statistical package (version 27) was used for implementing standard statistical analyses. All P values reported are 2-sided. The permutation test and bootstrap procedure was implemented using FORTRAN (Geany, version 1.38, built on or after 2021–10-09) and 64-bit Mersenne Twister random number generator with a large random double-precision odd seed.