## Main

We developed a multi-ancestry meta-regression method to meta-analyse ancestrally diverse genome-wide association study (GWAS) summary statistics from 60 cohorts with 3,383,199 individuals (Supplementary Table 1; see Supplementary Fig. 1 for an overview of the project), representing major clines of recent human ancestry (Fig. 1a). The meta-analytic method used here uses meta-regression to account for per study axes of genetic ancestry variation combined with a random effect to capture further unexplained heterogeneity in the effect of a given genetic variant. Although ancestry here is continuous, we also performed secondary analyses of continental groups reflecting four ancestry clines, including individuals of African (AFR; maximum n = 119,589) and American (AMR; n = 286,026) recently admixed ancestries primarily from the United States; individuals of East Asian ancestries (EAS; n = 296,438) primarily from the United States, People’s Republic of China and Japan; and individuals of European ancestries (EUR; n = 2,669,029) from the United States, Europe and Australia (see Extended Data Fig. 1 and Supplementary Note). Smoking phenotypes were selected to represent different stages of tobacco use and addiction, including initiation, the onset of regular use, amount smoked and cessation. Measures of onset included whether an individual ever smoked regularly (smoking initiation (SmkInit); n = 3,383,199) and the age at which the individual began smoking regularly (AgeSmk; n = 728,826). Amount smoked among current and former regular smokers was measured as cigarettes smoked per day (CigDay; n = 784,353). Smoking cessation (SmkCes; n = 1,400,535) contrasted current versus former smokers. Alcohol use was widely available across most studies, measured as drinks per week (DrnkWk; n = 2,965,643).

## Multi-ancestry meta-analysis

Using our multi-ancestry meta-analysis, we identified 2,143 associated loci across all phenotypes (sentinel variant P < 5 × 10−9), with 3,823 independently associated variants (Extended Data Fig. 2, Supplementary Tables 2 and 3 and Supplementary Figs. 2 and 3). Of these, 1,346 loci and 2,486 independent variants were associated with SmkInit, 33 loci (39 variants) with AgeSmk, 140 loci (243 variants) with CigDay, 128 loci (206 variants) with SmkCes and 496 loci (849 variants) with DrnkWk. Approximately 64% (n = 1,364) of loci were phenotype-specific, five loci were associated with all four smoking phenotypes but not with DrnkWk, and five loci were associated with all five phenotypes. All sentinel variants within identified loci had high posterior probabilities that their effect would replicate in a sufficiently powered study according to a trans-ancestry extension of our GWAS cross-validation technique6. Only 17 sentinel variants (0.7%) had such posterior probabilities of less than 0.99 and were therefore removed from the counts above and from further consideration (additional details on these 17 variants are shown in Supplementary Fig. 4).

Inclusion of diverse ancestry may improve the discovery of new variants through a combination of increased genetic variation, larger sample sizes and improved fine-mapping due to diverse patterns of linkage disequilibrium (LD). We quantified gains in power from the use of our multi-ancestry model over a simpler ancestry-naive fixed-effects model excluding the ancestry meta-regression. Comparing the number of associated variants, we found 721 additional independent variants that were identified only by the multi-ancestry meta-regression analysis. Both sets of models were fit to the same data, such that the larger number of significantly associated variants identified with the multi-ancestry model indicates increased power from accounting for axes of genetic variation and residual heterogeneity. Included among these 721 were newly associated variants in genes related to nervous system function (for example, NRXN1) including glutamatergic (GRIN2A) neurotransmission, which is of relevance to neurocircuitry in addiction7,8.

To isolate likely causal variants, we used a fine-mapping procedure (see Supplementary Note) that leverages variation in LD across ancestry groups to construct 90% credible intervals. We identified 597 loci (27.9%) in which the 90% credible intervals included fewer than five variants, including 192 loci (9.0%) with a single fine-mapped variant. Overall, credible intervals contained medians of 9–19 variants and median spans of 32–78 kb across phenotypes (Supplementary Table 4). Compared with the EUR-stratified GWAS (described in the next section), the trans-ancestry fine-mapping increased the number of 90% credible intervals containing fewer than five variants by 27.6%, and containing a single variant by 41.2%. Across all 2,143 loci, 1,330 (62.1%) loci had a reduced number of variants in the credible intervals in the multi-ancestry analysis. To determine the gain in resolution attributable to increased sample size (versus LD differences), we ‘downsampled’ the multi-ancestry analysis by removing EUR ancestry cohorts until the total sample size was approximately equal to that of the EUR-stratified analysis and regenerated fine-mapping results. Using the 1,330 loci with improved resolution in multi-ancestry analysis, we found that the credible intervals were reduced from a median of 22 variants in the EUR-stratified analysis to 12 variants in the downsampled multi-ancestry analysis, suggesting that approximately 55% of the observed improvement in fine-mapping is attributable to larger multi-ancestry sample sizes alone. These findings highlight the utility of both increased sample size and diverse ancestry in fine-mapping variants for these complex behavioural phenotypes. To characterize genes prioritized from fine-mapping, we conducted a series of functional enrichment analyses. We first selected intervals fine-mapped to fewer than five variants from the multi-ancestry results and mapped each variant to the nearest gene to identify ‘high-priority’ genes. Relative to genes mapped from variants with posterior inclusion probabilities (PIP) < 0.01, the high-priority genes were enriched across brain and nerve tissues (Extended Data Fig. 3a and Supplementary Table 5). Within the brain, cell-type enrichment of the high-priority genes was observed for projecting glutamatergic neurons from the cortex, hippocampus and amygdala (telencephalon excitatory projection neurons) and projection GABA neurons from medium spiny neurons of the striatum (telencephalon inhibitory projecting neurons), along with neurons in various subcortical structures such as the hypothalamus and midbrain, consistent with aspects of the mesolimbic theory of addiction7,8 (Extended Data Fig. 3b). Finally, these high-priority genes that were strongly associated with substance use were enriched in gene pathways related to neurogenesis, neuronal development, neuronal differentiation and synaptic function. The neurodevelopmental aspect of the high-priority genes could indicate a role for these genes in processes that predispose individuals to risk of substance use and/or may contribute to brain circuit rewiring during drug use.

The multi-ancestry meta-analysis method also allowed for tests of whether a variant effect size differed (that is, was moderated) by ancestry along four ancestry dimensions estimated from multidimensional scaling (MDS) of allele frequencies from each participating study (Fig. 1a). Roughly, the first axis represents an EAS ancestry cline, the second axis an AFR cline, the third a EUR cline (north to south EUR) and the fourth an AMR cline. There was minimal evidence of effect size moderation by ancestry for most independent variants, ranging from 76.6% (187 variants) in CigDay to 85.0% (175 variants) in SmkCes. Another 7.7–18.1% showed modest evidence for moderation. Finally, roughly 3.6% of all independent variants, reflecting 136 variants from 84 distinct loci, showed strong evidence of effect size moderated by ancestry (complete results are shown in Supplementary Table 2). Comparisons between the variants with strong evidence for effect size moderation by ancestry and those with no evidence suggested that the identification of these 136 variants was not driven to a large extent by differences in imputation quality, LD scores or Fst (fixation index) across ancestries.

Across phenotypes, 88 of these 136 variants showed moderation by the first axis of ancestry variation (approximate EAS cline; Fig. 1b, left), 29 variants by the second axis (approximate AFR cline; Fig. 1b, middle) and 10 variants by the fourth axis (approximate AMR cline; Fig. 1b, right). Nine variants showed differences in effect size moderated by the third axis (EUR cline). Only the effect of one variant was moderated by three or more ancestry clines (EAS, AFR and AMR): rs1229984, a missense variant in the alcohol dehydrogenase gene ADH1B, which has been shown to be protective against alcohol consumption9. An increase on any of these clines was associated with a reduced effect size of this allele, on average. For example, if there are two people who both carry one copy of the protective T allele for this variant but are separated by 1 s.d. on MDS component 1 (EAS cline), the person with a lower value on that MDS cline would be expected to drink 0.3 fewer drinks than the person with a higher MDS value, despite the same rs1229984 genotype in ADH1B.

To further evaluate causal genes and relevant tissues through which associated variants may be operating, we applied a trans-ancestry transcriptome-wide association study (TWAS) analysis to each phenotype across 49 tissues derived from the GTEx Consortium10. Using a P value threshold Bonferroni-corrected for the total number of genes and tissues within a phenotype, we found 1,167 genes significantly associated with SmkInit, 21 genes with AgeSmk, 203 genes with CigDay, 188 genes with SmkCes and 504 genes with DrnkWk (resulting in 1,705 unique genes across phenotypes; Supplementary Table 6). For each of our five phenotypes, matrix decomposition parallel analysis11 of the per-tissue P value correlation matrix suggested two components: one explaining 53.7–55.2% of the variance in P values, and another explaining 3.5–3.8% of the variance in P values. Similar loading patterns were observed for all phenotypes such that all tissues loaded strongly (all loadings > 0.12) on the first component, suggesting that it represents a general effect across tissues, whereas only brain tissues had strong loadings on the second component (all loadings > 0.12), indicating the importance of brain-specific gene expression effects for these tobacco and alcohol use phenotypes. Pathway enrichment analyses of the TWAS-associated genes identified 1,029 unique gene pathways across phenotypes that were broadly enriched across tissues (Supplementary Table 7), including many of obvious relevance to neurotransmission and neurodevelopment.

To further illustrate several variants within genes of interest, we integrated findings described above to select variants for which there was evidence of association across analytic methods and for which the availability of diverse ancestries was clearly relevant. Illustrative variants were chosen in a similar way as described for the enrichment analyses above: (1) we extracted variants from multi-ancestry fine-mapped credible intervals containing less than five variants, and (2) we cross-referenced the resulting variants with the multi-ancestry TWAS cis-expression quantitative trait loci and their significantly associated genes. We highlight five of the 52 genes that resulted from this process.

We found the nicotinic gene cluster CHRNA5–A3–B4 to be significantly associated with SmkInit12 with a fine-mapped 90% credible interval that shrank from 53 variants in EUR-stratified results to just two variants in multi-ancestry results (rs2869055 and rs28438420; Supplementary Table 4). These variants are not in high LD (r2 = 0.31 for both variants) with the well-known variant rs16969968 in this gene cluster. By contrast, this locus was fine-mapped to two variants in high LD with rs16969968 for CigDay (r2 = 0.84 and 0.86), suggesting that the variants underlying this signal for smoking initiation may be distinct from those for cigarettes per day. We also found a novel association between SmkInit and CACNA1B, which encodes a voltage-gated calcium channel (Cav2.2) that controls neuronal neurotransmitter release and has been associated with cocaine reinstatement13, increased aggression and vigilance, and reduced startle and exploration14. CACNA1B is linked to multiple psychiatric disorders, including schizophrenia, bipolar disorder and autism spectrum disorders15,16,17.

CigDay was associated with variants in neurturin (NRTN), a type of glial cell line-derived neurotrophic factor involved in the development and survival of dopamine neurons18. This gene has been studied in relation to Parkinson disease for its potential to restore dopamine neurocircuitry19. Likewise, PAK6 was another novel gene strongly associated with CigDay in TWAS results and was fine-mapped to just three variants in the 90% credible interval. PAK6 encodes a p21-activated kinase that is highly expressed in the striatum and hippocampus, has been implicated in the migration of GABAergic interneurons20 as well as the modulation of dopaminergic neurotransmission21, and is involved in locomotor activity and cognitive function22. PAK6 has been robustly associated with schizophrenia23 and neurodegenerative diseases24,25, such as Parkinson disease and Alzheimer disease, further highlighting its role in synaptic changes. Finally, we found a novel association between ECE2 and DrnkWk. ECE2 is involved in cortical development26 as well as the processing of several neuroendocrine peptides, including neurotensin and substance P27, and may also have a role in amyloid-β processing28. ECE2 also generates peptides such as BAM 12 (which shows κ-opioid receptor selectivity) and BAM 22 (which shows μ-opioid receptor selectivity), suggesting a link with pain transmission27.

## Genetic correlation and polygenic scores

To evaluate heritability, genetic correlation and polygenic scoring, we generated ancestry-stratified GWAS meta-analysis results for each of the four continental groups: AFR, AMR, EAS and EUR (Supplementary Table 2 lists ancestry-stratified loci). Heritability and cross-phenotype genetic correlations were generally similar in sign and modest in magnitude in each ancestry (Fig. 2a and Supplementary Tables 8 and 9). Smoking phenotypes were moderately genetically correlated with each other (|rg| = 0.30–0.63) and with DrnkWk (|rg| = 0.16–0.27). Genetic correlations for the same phenotype between each of the largest contributing cohorts and all remaining cohorts (restricted to EUR ancestries only) were generally high for each smoking phenotype (mean rg of 0.93) and DrnkWk (mean rg of 0.72), indicating that these measures were reliable across cohorts (Supplementary Table 9).

To characterize the multifactorial genetic aetiology of tobacco and alcohol use, we computed genetic correlations of our EUR-stratified results with 1,141 medical, biomarker and behavioural phenotypes from the UK Biobank29 (Supplementary Tables 10 and 11). An affinity propagation clustering algorithm30 was used to aid interpretability by grouping UK Biobank phenotypes such that each of the five current phenotypes were exemplars (Supplementary Fig. 5). SmkInit and AgeSmk clustered together, as did SmkCes and CigDay, with all four forming a broad higher-level smoking cluster. Phenotypes with high positive genetic correlations with SmkInit included addiction to any substance, neighbourhood material deprivation, diagnosis of chronic obstructive pulmonary disease, and a negative correlation with age at first sexual intercourse (|rg| = 0.57–0.64). For AgeSmk, the largest genetic correlations were with reproductive phenotypes such as age at first birth (rg = 0.69–0.71) and measures of years of education and attainment (rg = 0.58–0.69). CigDay and SmkCes were most highly positively correlated with respiratory and cardiovascular diseases and cancers (rg = 0.52–0.72), highlighting their genetic link to adverse disease outcomes. Finally, DrnkWk was most strongly correlated with problematic drinking behaviours (rg = 0.52–0.70), indicating extensive overlap in the genetic architecture of DrnkWk and measures of alcohol use, problems and alcohol use disorder. This is consistent with previous findings of strong but imperfect genetic correlations (for example, rg = 0.8) between alcohol consumption and alcohol use disorder from large-scale GWAS31,32. We note, however, that genetic correlations can be difficult to interpret33,34 as they may be affected by genetic confounding, mediation effects or sampling bias.

We used the ancestry-stratified meta-analysis results to construct ancestry-specific polygenic risk scores in Add Health35, an independent target sample of individuals of diverse ancestries from the United States (n = 2,199 AFR, 1,132 AMR, 525 EAS and 6,092 EUR). To evaluate within-ancestry and across-ancestry performance of polygenic scores, we iteratively fit a multiple regression model and evaluated the incremental predictive accuracy of each ancestry-based score, over and above scores already entered into the model (that is, first including the AMR-based score, then adding the AFR-based, EAS-based and EUR-based scores one at a time to evaluate incremental prediction accuracy). EUR-based scores in EUR ancestries outperformed ancestry-matched scores in non-EUR ancestries (Fig. 2a) and showed significantly stronger associations with most phenotypes in EUR ancestries than in non-EUR ancestries (described by decile plots and tested by modelling an interaction between the EUR-based polygenic risk score and the target sample ancestry group), consistent with expectations36 (Fig. 2b,c). For each ancestry and phenotype, the EUR-based score on its own outperformed the ancestry-matched score on its own (Supplementary Table 12). These results highlight the relative utility of current polygenic scores for EUR ancestries versus all others. In interpreting these results, however, we note that some comparisons may be underpowered to identify differences in the variance explained by polygenic scores between ancestries. Finally, EUR-based scores overpredicted tobacco and alcohol use for individuals of non-EUR ancestry and underpredicted for individuals of EUR ancestry, although this prediction bias is readily eliminated through statistical correction with genetic principal components.

## Summary

Tobacco and alcohol use are heritable behaviours that can be radically affected by environmental factors, including cultural context37 and public health policies38,39. Despite this, we found that a large majority of associated genetic variants showed homogeneous effect size estimates across diverse ancestries, suggesting that the genetic variants associated with substance use affect such individuals similarly. The limited extent of variant effect size heterogeneity, coupled with similar heritability estimates and cross-trait genetic correlations, indicates that the genetic architecture underlying substance use is not markedly different across ancestries. There are some potentially interesting exceptions of ancestrally heterogeneous effects in genes such as ADH1B and CACNA1B. By contrast, polygenic scores generally performed well in EUR ancestries but with mixed-to-limited results in other ancestries, suggesting that portability of such scores across ancestries remains challenging, even when discovery sample sizes across all ancestries are more than 100,000. Explanations for this apparent discrepancy have been proposed40, but more stringent and sensitive tests will be required to draw strong conclusions about such patterns of heredity.

Most individuals of EUR, AFR and AMR ancestries in the current study live in the United States and Europe and share somewhat similar environments regarding tobacco and alcohol availability and policies surrounding use of these substances, and all included individuals were adults. Further increases in genetic diversity and consideration of environmental moderators, including cultural factors, will continue to add to our understanding of the genetic architecture of both substance use and related behaviours and diseases.

## Methods

Here we describe an overview of the methods used to conduct the association, fine-mapping and downstream in silico functional analysis. Additional details can be found in the Supplementary Note.

### Generation of summary statistics and ancestry considerations

Except for TOPMed studies, in which the genetic data were derived from deep whole-genome sequencing, participants in all studies were genotyped on genome-wide arrays. The majority of studies imputed their genotypes to the Haplotype Reference Consortium41 (for EUR ancestries) or 1000 Genomes42 (Supplementary Table 1). GWAS summary statistics were generated in each study sample typically using RVTESTS43, BOLT-LMM44 or SAIGE45 with covariates of sex, age, age squared and genetic principal components according to an analysis plan detailed in the Supplementary Note. Studies composed primarily of closely related individuals (for example, family studies) first regressed out covariates, inverse-normalized the residuals as necessary and then tested additive variant effects under a linear mixed model with a genetic kinship matrix for all phenotypes. Some studies of unrelated individuals followed the same analysis for quasi-continuous phenotypes (AgeSmk, CigDay and DrnkWk), but estimated additive genetic effects under a logistic model for binary phenotypes (SmkInit and SmkCes).

We used terminology and acronyms from the 1000 Genomes Project42 to describe ancestry. The majority of participating cohorts stratified their sample by ancestry before generation of summary statistics. Cohorts composed of substantial samples of multiple ancestry groups provided summary statistics stratified by ancestry, as well as results based on all individuals regardless of ancestry for use in the multi-ancestry meta-analyses. As TOPMed served multiple functions in the present study, including as an LD reference panel, we detailed the ancestry analyses and classification of TOPMed data in the Supplementary Note. For example, for both ancestry-stratified and multi-ancestry conditional analysis, we created TOPMed reference panels for estimating LD. We first created ancestry-stratified reference samples, resulting in matched ancestry reference sample sizes of n = 28,665 AFR, n = 19,737 AMR, n = 4,918 EAS and n = 51,656 EUR. To create a TOPMed-based reference sample for multi-ancestry analyses, we combined the matched ancestry individuals, resulting in a diverse ancestry reference panel (n = 104,976) that matches the ancestry proportions of the included cohorts to estimate LD.

Extensive quality control and filtering were performed on the summary statistics from each cohort. We removed studies with a sample size of less than 100, and those with genomic control values greater than 1.1 or less than 0.9 and a sample size of less than 10,000 (per study sample size and genomic control values are listed in Supplementary Table 1), as well as variants with an imputation quality of less than 0.3.

### Ancestry-stratified meta-analyses

Ancestry-stratified meta analyses were performed using the software package rareGWAMA (see URLs for software use). Specifically, the method aggregated weighted Z-score statistics, that is,

$${Z}_{{\rm{META}}}=\frac{{\sum }_{k}{w}_{k}{Z}_{k}}{{({\sum }_{k}{w}_{k}^{2})}^{1/2}},$$

where Zk is the Z-score statistic in study k. The weight wk is defined by $${w}_{k}=\sqrt{{N}_{k}{p}_{k}(1-{p}_{k}){R}_{k}^{2}},$$ where $${p}_{k}$$ is the variant allele frequency, and $${R}_{k}^{2}$$ is the imputation quality in study k. This method accounts for between-study heterogeneity in phenotype measures, imputation accuracy, allele frequencies and sample sizes.

### Multi-ancestry meta-analyses

Multi-ancestry meta-analyses were performed using mixed-effects meta-regression for optimal trans-ancestry meta-analysis (MEMO) implemented in rareGWAMA (see URLs for software use). The full model is $${b}_{{jk}}={\sum }_{l=0}^{L}{C}_{{lk}}{\gamma }_{{jl}}+{e}_{{jk}}+{{\epsilon }}_{{jk}},$$ where $${b}_{{jk}}$$ is the genetic effect estimate for the jth variant in the kth study, and $${C}_{{lk}}$$ is the lth ancestry component for the kth study. Note that we set $${C}_{0k}=1$$, so $${\gamma }_{j0}$$ serves as the intercept. The regression coefficient $${\gamma }_{{jl}}$$ captures the effect of the lth axis of genetic variation for the jth variant, with $${\gamma }_{j0}$$ as an intercept in the model, and $${e}_{{jk}}\sim N\left(0,{\tau }^{2}\right)$$ is the random effect that captures unexplained effect size heterogeneity after adjusting for genetic variation. Finally, $${{\epsilon }}_{{jk}}\sim N\left(0,{s}_{{jk}}^{2}\right)$$ is the random error term, where $${s}_{{jk}}^{2}$$ is the variance of the genetic effect estimate $${b}_{{jk}}$$. This method models heterogeneity of effects attributable to ancestry as well as a random effect to capture residual heterogeneity. The MEMO model contains fixed-effect, random-effect and meta-regression models as special cases. Specifically, removing the random effect $${e}_{{jk}}$$ results in a regular meta-regression model, removing the covariates of genetic variation ($${C}_{{lk}})$$, but retaining $${e}_{{jk}}$$ results in a random-effect meta-analysis model, whereas removing both $${e}_{{jk}}$$ and $${C}_{{lk}}$$ results in a fixed-effect meta-analysis model.

Per study ancestry variation, $${C}_{{lk}}$$ is calculated using MDS on the basis of allele frequency. We defined the genetic distance between two studies, that is, study k and k′, with J variants, as $${d}_{k{k}^{{\prime} }}=\sqrt{{\sum }_{j}{\left({f}_{{jk}}-{f}_{j{k}^{{\prime} }}\right)}^{2}},$$ where $${f}_{{jk}}$$ and $${f}_{j{k}^{{\prime} }}$$ are the allele frequency for the jth variant for study k and k′, respectively. We fit models with 0, 1, 2, 3 and 4 MDS components and combined the results using a minimal P value approach (see Extended Data Fig. 1a for a visual representation of the first four MDS components).

To better ensure robustness, for each phenotype, we filtered variants from the meta-analytic results to variants that were present in at least three studies, had an effective sample size (sample size multiplied by imputation accuracy) to maximum sample size ratio of ≥ 0.1, and minor allele frequency (MAF) > 0.001 in the multi-ancestry and EUR-stratified meta-analysis or MAF > 0.01 for AMR-stratified, AFR-stratified and EAS-stratified meta-analysis, given the expected drop off in imputation accuracy for those ancestries. These filters reduce potential artefacts arising from sparse data or poor imputation and retain variants with reasonable statistical power.

With increasing imputation accuracy and the inclusion of variants with MAF down to 0.1% (for EUR), genome-wide significant variants were identified using a threshold of P < 5 × 10−9, to account for approximately 10 million independent tests. The threshold was chosen based on previous work on low-frequency variants5,46,47. All statistical tests are two-sided unless otherwise stated.

### Robustness and replicability of signals

We applied genomic control correction for low-frequency variants (MAF < 1%) in both multi-ancestry and ancestry-stratified meta-analyses. Genomic control correction for common variants was not applied given that elevation of genomic control values is expected with high polygenicity (that is, it assumes sparsity) and very large sample sizes48; such a correction may be overly conservative. To evaluate this decision, we estimated the replicability of associated loci using a trans-ancestry extension of an existing method6. This method, ‘RATES’, incorporates cohort-level summary statistics (single-nucleotide polymorphism (SNP) effect sizes and their corresponding standard errors), along with allele frequency-based MDS components per study to assign a posterior probability that each sentinel variant effect would replicate in a sufficiently powered study. To further evaluate robustness of our results, we estimated LD score regression (LDSC) intercepts and attenuation ratios to account for bias in the intercept test when sample sizes become extreme, as in the present case. Results were within expected limits and consistent with a limited effect of population stratification on the meta-analysis results44 (Supplementary Table 8). Then, we compared the sign of SNP effect size estimates between EUR-stratified results and within-sibling GWAS results from the UK Biobank, finding sign concordance estimates of 63.4–80% across phenotypes, all of which were significantly higher than would be expected if our results were driven entirely by population stratification or cryptic relatedness and were consistent in magnitude with other large-scale association studies49. Finally, given reduced power in the within-sibling GWAS, we additionally compared the sign of SNP effect size estimates between EUR-stratified 23andMe summary statistics (the largest participating cohort) and EUR-stratified summary statistics with all cohorts except 23andMe, finding sign concordance estimates of 94.3–100%. See the Supplementary Note for further details on the methods and full results, including the list of excluded variants and loci.

### Conditional analyses and locus definitions

We performed sequential forward selection to identify independently associated variants in each locus50 for ancestry-stratified and multi-ancestry results. The procedure begins by including only the top association signal into a set of independently associated variants (ϕ) per locus. Conditional analysis is then conducted on the remaining variants, conditioning on variants in ϕ. If any of these conditional signals remained significant (that is, P < 5 × 10−9), we added the top signal to the set ϕ. The process iterates until there are no remaining significantly associated variants. The method requires an external genomic reference panel to estimate LD patterns. For ancestry-stratified conditional analyses, we used ancestry-matched individuals from TOPMed to estimate LD (sample sizes given previously). For multi-ancestry conditional analyses, we used the diverse ancestry TOPMed reference panel (n = 104,976) that matched the ancestry proportions of the included cohorts.

Loci were defined based in part on the conditional analysis, using a multi-step approach. First, consistent with previous GWAS meta-analysis5 in EUR ancestries, we identified all 1-Mb windows surrounding sentinel variants and collapsed overlapping windows. This resulted in a total of 1,449 such windows. For each window, we then used our ancestry-aware conditional analysis51 (described previously) with an ancestry-matched reference panel from TOPMed to enumerate all independent variants within each window. Then, for each independent variant, we defined a locus as the region including all variants in LD of r2 > 0.1, based on the same ancestry-matched TOPMed reference panel (Supplementary Table 3 and Supplementary Fig. 3). Overlapping loci were then collapsed. This procedure avoids conventional definitions of a locus based on work in EUR ancestries and is tailored to the multi-ancestry data at hand.

### Allelic effect size moderation

We evaluated evidence of effect size moderation by ancestry in the multi-ancestry model for each independent variant. To do so, we extended the MEMO model into a mixture model that separated variants with homogenous effects (models with only an intercept term) from those with possible heterogeneous effects (on at least one axis of genetic variation). We considered six sub-models including the null model, and the models in which the number of included components varied from 0 to 4.

$$L(\,y)=\prod _{a}{p}_{a}^{{\rm{N}}{\rm{U}}{\rm{L}}{\rm{L}}}p({b}_{j}|{\rm{N}}{\rm{U}}{\rm{L}}{\rm{L}})+{{\rm{p}}}_{{\rm{a}}}^{{\rm{A}}{\rm{L}}{\rm{T}}}\sum _{{\rm{j}}\in {{\rm{S}}}_{{\rm{a}}}}[{{\rm{q}}}_{{\rm{j}}0}\,{\rm{p}}({{\rm{b}}}_{{\rm{j}}}|{\rm{M}}{{\rm{R}}}_{0}(\,{\rm{j}}))+\ldots +{{\rm{q}}}_{{\rm{j}}4}\,{\rm{p}}({{\rm{b}}}_{{\rm{j}}}|{\rm{M}}{{\rm{R}}}_{4}(\,{\rm{j}}))],$$

where $$p({b}_{j}|{\rm{N}}{\rm{U}}{\rm{L}}{\rm{L}})$$ and $$p\left({b}_{j}| M{R}_{l}\right)$$ are the likelihoods of the variant j effect sizes under the null model and the meta-regression models with l axes of genetic variation, respectively; $${p}_{a}^{{\rm{NULL}}}$$ and $${p}_{a}^{{\rm{ALT}}}$$ are the probabilities of locus a carrying zero or at least one causal variant, respectively. The term $${q}_{{jl}}$$ is the probability that the model with l axes of genetic variation best fit the data. We selected the model with the largest posterior probability for each variant as the best-fitting model to capture the genetic effect heterogeneity. Variants in which the zero component model was selected (that is, all models with at least one component were rejected) were considered to have homogeneous effects across ancestry. Among the remaining variants, we considered which one of the meta-regression models (that is, 1–4 components) best described the extent of effect heterogeneities based on the posterior probabilities for each model. In addition, we required that strongly heterogeneous variants had an MDS component effect that was significantly different from zero and were polymorphic in two or more ancestry-stratified cohorts to ease interpretation of heterogeneous effects. For example, a variant in which the model with two components best fit the data was considered at least weakly heterogeneous. If this variant also had a component two effect significantly different than zero ($${\gamma }_{j2}\ne 0,$$ from above) and was polymorphic in at least two ancestries, it was considered strongly heterogeneous.

### Fine-mapping

On the basis of the selected genetic effect model (above), for each variant in a locus, we calculated the Bayes factor by $${\varLambda }_{j}={\rm{\exp }}\left[\frac{{X}_{j}-\left(T+1\right){\rm{\log }}K}{2}\right]$$, where $${X}_{j}$$ denotes the chi-squared test statistic for variant j, T denotes the number of axes of genetic variation included in the best-fitting model (that is, 0–4 MDS components) and K denotes the number of studies contributing to the GWAS. Using the approximate Bayes factor, we then calculated the posterior inclusion probability for each variant as $${\pi }_{j}=\frac{{\varLambda }_{j}}{{\sum }_{i}{\varLambda }_{i}}$$, where i indexes each locus. Finally, we derived 90% credible intervals by ranking variants within a locus by their single posterior estimate and selecting variants until the cumulative posterior inclusion probability reached 0.90.

For EUR-stratified fine-mapping, we approximated the Bayes factor as above with T set to 0. Fine-mapping was conducted in EUR-stratified results, using identical loci as in multi-ancestry fine-mapping, to describe the increased resolution attributable to diverse ancestry inclusion and differences in sample size.

Functional enrichment analysis was conducted to test whether high-priority genes identified in the fine-mapping results were expressed in specific tissue types or enriched in certain cell types or gene pathways. High-priority genes were defined as those mapped from variants in credible intervals containing less than five variants. That is, for each variant in credible intervals with less than five variants, we used the UCSC genome annotation database to assign genes. We assigned intergenic variants to the nearest gene. We mapped genes from variants with PIP < 0.01 (as ‘control’ genes) in the same way. Functional enrichment was then evaluated by estimating a relative risk (as described and implemented previously52), defined as the ratio of the proportion of genes mapped from variants in credible intervals with less than five variants that are in a given annotation category to the proportion of genes mapped from variants, within associated loci, with PIP < 0.01 in the same annotation category. Annotation categories were derived from GTEx tissue expression53, central nervous systems cell types50 and gene pathways54.

### TWAS

TWAS were performed using a trans-ancestry method. In brief, this method fits a series of meta-regression models including the first four axes of genetic variation (MDS components), similar to that of our multi-ancestry meta-analysis model minus the random-effect term. Genetic effect estimates from these four models were then used to estimate phenotypic effects of each variant. Together, with variant weights taken from PrediXcan55 based on 49 tissues from GTEx10 release version 8 (which includes up to 15% of individuals of non-EUR ancestry), the phenotypic effect estimates were used to construct a single TWAS statistic for each MDS component. A minimum P value approach56 was then applied to combine all four TWAS statistic P values. Finally, we used a Cauchy combination test57 to combine P values across all available tissues for each gene. The final, combined P value was subjected to a Bonferroni correction for 22,121 genes in 49 tissues. We present our TWAS results based on per gene P values combined across all available tissues, resulting in a 5 (phenotype) × 22,121 (gene) matrix of P values. Pathway enrichment was also conducted using a weighted regression approach58 with the TWAS per-tissue P values to quantify the enrichment of identified genes in each pathway.

### Heritability and genetic correlations

LDSC59 was used to estimate heritability of our five phenotypes for EAS and EUR ancestries using a standard 1-cM window size. For ancestries with more recent admixture (AFR and AMR ancestries), we used covariate-adjusted LDSC60 for the same analyses in which in-sample LD scores were calculated using ancestry-matched TOPMed reference samples and adjusted by the first 50 principal components. For more recently admixed AFR and AMR ancestries, which tend to show longer-range LD, we used a 20-cM window size when calculating LD scores. For both LDSC and covariate-adjusted LDSC, variants were subset to HapMap3 (ref. 61) with MAF > 0.05, as recommended for this approach.

We calculated genetic correlations between our five phenotypes and 4,065 UK Biobank phenotypes (both restricted to EUR ancestry) using bivariate LDSC with 1000 Genomes-based pre-calculated EUR LD scores for HapMap3 variants. We excluded phenotypes with heritability Z-scores less than 3 (reflecting near-zero heritability), genetic correlations with our phenotypes less than −0.8 or greater than 0.8, to remove phenotypes approaching redundancy with our target tobacco and alcohol use measures (for example, cigarettes per day versus packs per day), and those whose genetic correlations were unable to be estimated largely due to negative heritability estimates, leaving 1,141 UK Biobank phenotypes. Affinity propagation clustering62, a message-passing algorithm based on exemplars that identifies their corresponding set of clusters, was then used to further interpret the pattern of genetic correlations and multifactorial nature of substance use. A Bonferroni-corrected P value threshold for 1,141 UK Biobank phenotypes was used to identify genetic correlations that were significantly different from zero.

### Polygenic scoring

Polygenic risk scores were computed using LDpred for each ancestry group separately, an approach that incorporates the correlation between genetic variants to re-weight effect size estimates63. We used an independent prediction cohort, Add Health35, to validate each score. Add Health is a nationally representative sample of US adolescents enrolled in grades 7 through 12 during the 1994–1995 school year. The mean birth year of respondents was 1979 (s.d. = 1.8) and the mean age at assessment (here, wave 4) was 29.0 years (s.d. = 1.8), which is comparable, in general, to the age of participants in the 23andMe cohort but younger, on average, than those in other cohorts. Add Health is composed of individuals from the same four major ancestral groups (defined with reference to 1000 Genomes; see Supplementary Note for details) comprising our ancestry-stratified results (EUR, AFR, AMR and EAS). Phenotypic descriptive statistics are given in Supplementary Table 12. Across the full Add Health sample, approximately 41% ever smoke regularly and reported an average of 7.3 cigarettes per day. For each polygenic score, we used only HapMap3 variants and those with MAF > 0.01. We used each Add Health ancestry group as its own LD reference panel for construction of each polygenic score, after removing related individuals, except for EAS in which we use 1000 Genomes due to the small sample size in Add Health.

Prediction accuracy of each polygenic score was estimated by taking the difference in the coefficient of determination (R2) between a base model that included only the covariates of age, sex, age × sex interaction, and the first ten genetic principal components, and a full model that additionally included the polygenic score. All scores were scaled to have a mean of zero and standard deviation of one.

### Ethics

Ethical review and approval were provided by the University of Minnesota institutional review board. All human participants provided informed consent.

### Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.