Introduction

Schizophrenia is a common (~0.6–1%), chronic and debilitating neuropsychiatric syndrome for which most of the variability in liability is attributable to genetic factors (~80%) [1]. While rare genetic variants play a role in the underlying liability [2,3,4,5,6,7,8,9,10,11], most of the currently explained liability is harbored in common variation [7, 12,13,14]. Common variants assayed genome-wide by commercially available genotyping arrays can explain up to 20% of the variability in liability to schizophrenia. However, the genetic architecture of schizophrenia is highly complex and multifactorial: the strongest associations from large GWAS of schizophrenia collectively account for only 6.2% of the explainable heritability in individuals of European descent [15].

Over the past decade, psychiatric GWAS have achieved many successes, including the first definitive demonstration of polygenic influences on schizophrenia risk and of its shared genetic basis with bipolar disorder [14], and a steadily growing number of robust, replicated SNP associations, culminating in the identification of 108 physically distinct risk loci for schizophrenia [12], a number that has since grown to 145 [16]. This progress can be credited to collaboration on an unprecedented scale, as exemplified by the Psychiatric Genomics Consortium (PGC), and to a culture of data sharing that has enabled widespread meta-analysis and replication [17].

The largest genome-wide association studies (GWAS) have disproportionately focused on cohorts of European descent [12, 14, 18, 19]. This European bias is not unique to psychiatric genetics research but is systemic within the GWAS literature. Although the proportion of non-European GWAS participants has increased to ~20%, this increase is primarily due to greater representation of Asian populations [20, 21]. Importantly, empirical evidence indicates that at least some of the risk attributable to common variants is shared among populations of European, East Asian, and African ancestry [14, 22, 23], suggesting that variation predating the divergence of European and African populations may harbor most of the heritability of schizophrenia.

To our knowledge, the largest genotyped schizophrenia cohort of African ancestry is the Molecular Genetics of Schizophrenia (MGS-AA) study (N = 2259 African-American individuals) [14, 24, 25]. In the first definitive demonstration of polygenic influences on schizophrenia risk, aggregate genetic scores constructed from International Schizophrenia Consortium GWAS results had significant but attenuated predictive value in African-ancestry individuals: individual-level scores explained ~2–3% of the variance in schizophrenia risk in European samples but less than 0.5% in MGS-AA [14]. It is now well understood that both specific and aggregate GWAS findings generalize incompletely across diverse populations [26,27,28], owing largely to population differences in genome-wide allele frequencies and patterns of linkage disequilibrium [29, 30].

We have undertaken the largest GWAS of admixed African individuals to date, with a combined sample size of 6152 schizophrenia and schizoaffective disorder cases and 3918 screened controls from the Genomic Psychiatry Cohort (GPC). With available sample sizes now on a par with earlier European GWAS that yielded the first replicated, genome-wide significant associations with single nucleotide polymorphisms (SNPs), we evaluate evidence for novel genetic associations and assess trans-ancestry replication support for 128 independent associations (representing 108 physical loci) identified in the landmark study of the Psychiatric Genomics Consortium Schizophrenia Working Group (PGC-SCZ2). We consider the implications of the pronounced underrepresentation of African and Latino populations in psychiatric GWAS and highlight the potential for improved fine-mapping resolution at identified risk loci when data from diverse populations are incorporated.

Methods

Subject ascertainment and diagnosis

The GPC is a large cosmopolitan sample of repository and newly ascertained schizophrenia and bipolar disorder cases and screened controls, with considerable representation of individuals with African, European, and Latino ancestries. In the present analysis, we considered as cases all individuals with a diagnosis of schizophrenia or schizoaffective disorder. Details of ascertainment and diagnosis are given in the Supplemental Material.

Single nucleotide polymorphism (SNP) genotyping and imputation

Genotyping of N = 33,422 participants was performed on Illumina Infinium arrays in a total of 11 “batches” (Table 1): four of these cohorts were ascertained as being primarily of African ancestry (OmniExpress 2.5 and Multi-Ethnic Global Array); three cohorts were of broadly Latino background (OmniExpress 2.5 and Multi-Ethnic Global Array); one included participants of any background (Global Screening Array); and three consisted mainly of European participants (OmniExpress and PsychArray) selected as part of parallel research initiatives. Typed variants were aligned to the human reference genome (GRCh37). Within each genotyping batch, we excluded any variant with missingness greater than 2% or a Hardy−Weinberg equilibrium P value < 10⁻⁶. Our scripts for pre-processing GWAS array data are downloadable from https://github.com/freeseek/gwaspipeline.

Table 1 GPC sample sizes by genotyping batch and assigned ancestry. For constituent datasets in the current analysis (Genotyping Wave/Batch), the commercial genotyping array and the numbers of individuals assigned to African, Latino, and European ancestry groups are displayed. Within each ancestry group, the reported total is based on those quantities appearing in boldface

Computational phasing was performed for each genotyping batch using Eagle (v2.3.5) [31] with default parameters. Statistical genotype imputation was then performed for each batch using Minimac3 (v2.0.1) [32] with default parameters and publicly available reference haplotypes from the 1000 Genomes Project (1KGP) Phase 3 [33].

Relationship inference, population structure and ancestry assignment

We used the KING software package [34] to identify duplicates and infer familial relationships in the full GPC cohort using a set of overlapping, genotyped variants. Within genotyping batches, we excluded from each pair of duplicates the sample with the larger fraction of missing genotypes. Next, we retained one sample from each remaining pair of duplicates or first-degree relatives (i.e., parent−offspring or sibling pairs), preferentially retaining the case from affected/unaffected relative pairs. For diagnostically concordant pairs, we considered the degree (and direction) of case−control imbalance in each of the originating batches in terms of effective sample size, Neff = 4/(1/Ncases + 1/Ncontrols). We preferentially assigned samples to batches with smaller ratios of Neff/N when doing so reduced case−control imbalance, and updated batch-wise values of Ncases, Ncontrols, and Neff after each assignment.
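The effective sample size, and one possible reading of the batch-assignment rule for concordant relative pairs, can be sketched as follows. This is a minimal illustration rather than the GPC pipeline: the `Batch` class and `assign_concordant_pair` helper are hypothetical, and the decision rule (retain the sample in the more imbalanced batch only if that raises its Neff/N ratio, otherwise in the other batch) is our assumption about how the reduction in imbalance was operationalized.

```python
from dataclasses import dataclass


def effective_n(n_cases: int, n_controls: int) -> float:
    """Effective sample size: Neff = 4 / (1/Ncases + 1/Ncontrols)."""
    if n_cases == 0 or n_controls == 0:
        return 0.0
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


@dataclass
class Batch:
    name: str
    n_cases: int
    n_controls: int

    def ratio(self) -> float:
        """Neff/N: equals 1.0 for a balanced batch and shrinks with imbalance."""
        n = self.n_cases + self.n_controls
        return effective_n(self.n_cases, self.n_controls) / n if n else 0.0

    def ratio_if_added(self, is_case: bool) -> float:
        """Neff/N this batch would have after retaining one more sample."""
        trial = Batch(self.name,
                      self.n_cases + int(is_case),
                      self.n_controls + int(not is_case))
        return trial.ratio()

    def add(self, is_case: bool) -> None:
        if is_case:
            self.n_cases += 1
        else:
            self.n_controls += 1


def assign_concordant_pair(a: Batch, b: Batch, is_case: bool) -> Batch:
    """Hypothetical rule: keep the retained relative in the batch with the
    smaller Neff/N if that improves its balance, otherwise in the other batch."""
    first, second = sorted((a, b), key=lambda batch: batch.ratio())
    chosen = first if first.ratio_if_added(is_case) > first.ratio() else second
    chosen.add(is_case)  # update batch-wise counts after each assignment
    return chosen


# Neff for the GPC-AA cohort and for the deCODE replication sample.
print(round(effective_n(6152, 3918)))   # 9574
print(round(effective_n(1513, 66236)))  # 5917
```

Applied to case and control counts reported in this paper, the formula reproduces the Neff values of about 9574 (GPC-AA) and 5917 (deCODE) quoted in the Discussion.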

Principal components analysis (PCA) was performed with GCTA (v1.2.4) [35], using a genome-wide genetic relatedness matrix (GRM) estimated for the full GPC dataset and reference samples from the 1KGP Phase 3 data [33] based on 34,918 genotyped SNPs. For each individual, we estimated genome-wide average proportions of African (AFR), European (EUR), Admixed American (AMR), East Asian (EAS), and South Asian (SAS) ancestry from global ancestry PCs using a simple linear mixed model. Using these estimated proportions and defining significant admixture as 25% or more of a given continental origin, we assigned individuals to three broad ancestry groups: 10,070 African (≥25% AFR and <25% AMR, <25% EAS, <25% SAS); 4324 Latino (≥25% AMR and <25% AFR, <25% EAS, <25% SAS); and 10,580 European (<25% AFR, <25% AMR, <25% EAS, <25% SAS) (Fig. 1). Clustering of individuals in each broad ancestry group with the 1KGP reference populations is shown in Supplemental Figs. 1–3. We refer to the admixed African and Latino ancestry GPC cohorts as GPC-AA and GPC-Latino, respectively.
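The assignment thresholds above amount to a small decision rule. The sketch below assumes the per-individual continental proportions have already been estimated; the function name and dictionary keys are hypothetical, and individuals matching none of the stated rules are returned as unassigned.

```python
from typing import Optional


def assign_ancestry(props: dict, cutoff: float = 0.25) -> Optional[str]:
    """Assign a broad ancestry group from estimated continental proportions.

    props: fractions keyed by "AFR", "EUR", "AMR", "EAS", "SAS".
    Returns "African", "Latino", "European", or None when no rule applies.
    """
    afr_high = props["AFR"] >= cutoff
    amr_high = props["AMR"] >= cutoff
    asian_high = props["EAS"] >= cutoff or props["SAS"] >= cutoff
    if asian_high:
        return None  # >= 25% East or South Asian ancestry: not assigned
    if afr_high and not amr_high:
        return "African"
    if amr_high and not afr_high:
        return "Latino"
    if not afr_high and not amr_high:
        return "European"  # EUR itself is not thresholded in the stated rules
    return None  # >= 25% of both AFR and AMR: not covered by the stated rules
```

Note that the European group is defined only by the absence of substantial non-European ancestry, which is why EUR is never tested directly.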

Fig. 1

Ancestry assignment and Manhattan plots for trans-ancestry meta-analyses of GPC-AA and GPC-Latino with PGC-SCZ2. a PCA-based clustering of GPC participants shaded by broad ancestry assignment. b Red and blue dashed lines denote thresholds for genome-wide significance (P < 5 × 10⁻⁸) and replication follow-up in PGC-SCZ2 (P < 10⁻⁶). For newly genome-wide significant regions, the top SNP within a 3 Mb region is displayed as a diamond; nearby SNPs in linkage disequilibrium (r² > 0.1) are highlighted.

Genome-wide association and trans-ancestry meta-analysis

Within each broadly defined ancestry group, we tested for association between imputed genotype dosages and a diagnosis of schizophrenia or schizoaffective disorder (SAD) by logistic regression using PLINK [36, 37], including the first six ancestry PCs and site/cohort indicator variables as covariates. Within each analysis, we retained variants with imputation quality (INFO) of 0.3 or greater and minor allele frequency (MAF) of at least 1%, based on average values calculated for the combined ancestry cohort. We combined association results across ancestry groups under fixed-effects (i.e., inverse-variance weighted) and Han and Eskin’s random-effects (RE2) models, as implemented in METASOFT [38]. The Han and Eskin random-effects model is optimized to detect allelic associations in the presence of heterogeneity [38]. We also applied this method to combine male- and female-specific association results for X chromosome variants.
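For a single SNP, the fixed-effects (inverse-variance weighted) combination mentioned above reduces to a few lines. The sketch assumes per-study log odds ratios and standard errors as input and reports a two-sided P value from the normal approximation; it does not reproduce METASOFT’s Han and Eskin RE2 statistic, which uses a more involved likelihood-ratio form and null distribution.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis of one SNP.

    betas: per-study log odds ratios; ses: their standard errors.
    Returns (pooled beta, pooled SE, z statistic, two-sided P value).
    """
    weights = [1.0 / se ** 2 for se in ses]  # each study weighted by 1/SE^2
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return beta, se, z, p
```

Because weights scale with 1/SE², a large study such as PGC-SCZ2 dominates the pooled estimate unless the smaller cohorts have comparable precision.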

In our primary trans-ancestry meta-analyses, we combined genome-wide summary statistics for the African and Latino ancestry GWAS with the PGC-SCZ2 study results. The discovery phase of PGC-SCZ2 included 34,241 cases and 45,604 controls from 46 European and 3 East Asian case-control studies, and 1235 parent–affected-offspring trios from 3 family-based samples of European ancestry [12]. The PGC-SCZ2 summary statistics are publicly available (https://www.med.unc.edu/pgc/results-and-downloads) and have been used in dozens of follow-up studies; they thus represent a meaningful benchmark for genetic analysis. We applied the same filters to SNP association results as in the original study (INFO ≥ 0.6, MAF ≥ 1%, and present in at least 20 of 49 studies) and interpret the PGC-SCZ2 results as broadly representative of findings in European populations.

Consistency of directions of allelic effects

Linkage disequilibrium (LD)-based “clumping” was used to obtain approximately independent sets of SNPs (r² < 0.1 within a 500 kilobase (kb) window) using the 1KGP Phase 3 European (EUR) data, preferentially retaining the most significant SNP in the PGC-SCZ2 analysis (among those meeting filtering criteria in the relevant GPC analysis). For varying P value thresholds applied to the PGC-SCZ2 results, we used a binomial sign test to determine whether the proportion of same-direction effects in the admixed African or Latino analyses was greater than expected by chance (i.e., a one-sided test of whether this fraction is greater than 0.5). Reciprocal analyses, selecting SNPs by P value in the African and Latino ancestry results and comparing their directions of effect in PGC-SCZ2, were also performed, with LD-clumping based on the corresponding 1KGP reference population.
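The one-sided sign test described above can be computed exactly with integer arithmetic. In this sketch the helper names are hypothetical; it assumes the SNPs passed in have already been clumped and thresholded, so that each SNP counts as an independent Bernoulli(0.5) trial under the null.

```python
from math import comb


def count_same_direction(betas_a, betas_b):
    """Number of SNPs whose effect estimates share a sign in both studies."""
    return sum(1 for a, b in zip(betas_a, betas_b) if a * b > 0)


def sign_test_one_sided(n_same: int, n_total: int) -> float:
    """Exact one-sided binomial sign test.

    H0: each SNP has probability 0.5 of the same direction; H1: probability > 0.5.
    Returns P(X >= n_same) for X ~ Binomial(n_total, 0.5).
    """
    if not 0 <= n_same <= n_total:
        raise ValueError("require 0 <= n_same <= n_total")
    tail = sum(comb(n_total, k) for k in range(n_same, n_total + 1))
    return tail / 2 ** n_total  # exact integers until this final division
```

Keeping the binomial tail in exact integers avoids the underflow that summing many tiny floating-point terms can cause at large SNP counts.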

Polygenic risk score profiling

We performed polygenic risk score profiling based on the PGC-SCZ2 summary statistics (the “training” dataset), testing these scores for association with case−control status in African, Latino, and European cohorts from the GPC (the “target” datasets). For each pair of training and target datasets, results for overlapping SNPs (or indels) meeting quality control requirements (imputation quality ≥ 0.3 and MAF ≥ 1%) were subjected to LD-based clumping in the appropriate 1KGP reference population (r² < 0.1 within a 500 kb window); for analyses of African, Latino, and European cohorts, we used reference data for the AFR, AMR, and EUR populations, respectively. For SNPs significant at varying P value thresholds (PT) in the training dataset, individual-level scores were constructed by summing, across SNPs, the number of copies of each scored allele multiplied by its effect estimate (i.e., the log-transformed odds ratio in the training dataset). We evaluated the significance of case−control differences using logistic regression, with ancestry-based principal components (PCs) and a study indicator variable as covariates. The predictive value of these scores is reported both as Nagelkerke’s pseudo-R² (fmsb package in R) [39] and on the liability scale, adjusting for the sample prevalence and an assumed population prevalence of 1% for schizophrenia or bipolar disorder [40]. We examined how the strength of LD among SNPs used to construct a polygenic score influences within- and cross-ancestry genetic prediction by repeating these procedures with the pairwise r² threshold for clumping correlated markers increased to 0.5 and 0.8.
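The score construction above (LD clumping, then P value thresholding, then a weighted allele count) can be sketched as follows. This is an illustration under stated assumptions, not the software used in these analyses: pairwise r² values come from a caller-supplied function standing in for the 1KGP reference panel, the SNP dictionaries and helper names are hypothetical, and missing dosages are treated as zero for simplicity, whereas dedicated scoring tools usually impute them.

```python
import math


def clump(snps, r2, r2_max=0.1, window_bp=500_000):
    """Greedy LD clumping.

    Walks SNPs from most to least significant and keeps a SNP only if its r^2
    with every already-kept SNP on the same chromosome within window_bp is
    below r2_max.
    snps: list of dicts with keys "id", "chrom", "pos", "p", "or".
    r2:   callable (id1, id2) -> pairwise r^2 from a reference panel.
    """
    kept = []
    for snp in sorted(snps, key=lambda s: s["p"]):
        in_ld = any(
            k["chrom"] == snp["chrom"]
            and abs(k["pos"] - snp["pos"]) <= window_bp
            and r2(k["id"], snp["id"]) >= r2_max
            for k in kept
        )
        if not in_ld:
            kept.append(snp)
    return kept


def polygenic_scores(clumped, p_threshold, dosages):
    """Score_i = sum over SNPs with p < p_threshold of dosage_ij * log(OR_j).

    dosages: {sample_id: {snp_id: scored-allele dosage in [0, 2]}}.
    Missing dosages count as 0 here (a simplification).
    """
    weights = {s["id"]: math.log(s["or"]) for s in clumped if s["p"] < p_threshold}
    return {
        sample: sum(d.get(snp_id, 0.0) * w for snp_id, w in weights.items())
        for sample, d in dosages.items()
    }


# Toy example: rs2 is in strong LD with the more significant rs1 and is dropped.
snps = [
    {"id": "rs1", "chrom": 1, "pos": 1_000, "p": 1e-8, "or": 1.10},
    {"id": "rs2", "chrom": 1, "pos": 2_000, "p": 1e-5, "or": 1.08},
    {"id": "rs3", "chrom": 2, "pos": 5_000, "p": 0.01, "or": 0.95},
    {"id": "rs4", "chrom": 3, "pos": 5_000, "p": 0.30, "or": 1.02},
]
ld = {frozenset({"rs1", "rs2"}): 0.9}


def reference_r2(a, b):
    return ld.get(frozenset({a, b}), 0.0)


clumped = clump(snps, reference_r2)
scores = polygenic_scores(clumped, 0.05, {"s1": {"rs1": 2, "rs3": 0},
                                          "s2": {"rs1": 0, "rs3": 2}})
```

Raising `r2_max` to 0.5 or 0.8 keeps more correlated SNPs, which corresponds to the relaxed clumping thresholds examined above.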

Because genetic prediction is generally worse when training and testing datasets differ in ancestry, with greater attenuation of predictive value for more divergent populations [27, 28, 41], we constructed analogous polygenic scores based on the African and Latino GWAS results. For within-ancestry prediction, we maintained the independence of training and testing datasets via an iterative “leave-one-out” procedure in which each cohort was omitted in turn and the remaining samples were re-analyzed; the resulting summary statistics served as independent training datasets. For cross-ancestry prediction from the African or Latino GWAS, summary statistics from the primary mega-analysis were used.

Trans-ancestry fine-mapping of schizophrenia loci

We attempted to fine-map 276 autosomal and X-chromosome regions around statistically independent SNPs with association P < 10⁻⁶ in the publicly available PGC-SCZ2 summary statistics. For each index SNP, we considered SNPs within a 3 megabase window that were correlated with it at r² ≥ 0.6 and had P < 10⁻⁴ in the PGC-SCZ2 discovery analysis. Following the approach of Huang et al. [42], we constructed credible SNP sets by adding SNPs in decreasing order of posterior probability until the cumulative posterior probability exceeded 99%. Credible sets for meta-analytic models representing the PGC-SCZ2 discovery phase and its combined analysis with GPC-AA were compared on the basis of total interval length, number of credible SNPs, and the smallest observed P value among these SNPs. We followed up regions that attained greater significance in the combined PGC-SCZ2/GPC-AA analysis and for which the combined-analysis credible set spanned a shorter genomic interval than the corresponding PGC-SCZ2 credible set. We considered a region to be “fine-mapped” if the genomic interval spanned by the reduced credible set was smaller than the interval spanned by SNPs with LD r² ≥ 0.6 to the index SNP (based on 1KGP EUR reference data).
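The credible-set step can be sketched by assuming a single causal variant per region and computing per-SNP posterior probabilities from Wakefield’s approximate Bayes factors. Whether Huang et al. [42] used exactly this Bayes factor is not stated here; the prior standard deviation of 0.2 on the log odds ratio is a common choice for case-control studies and is an assumption, as are the function names.

```python
import math


def wakefield_log_abf(beta, se, prior_sd=0.2):
    """Natural-log approximate Bayes factor (association vs. null), Wakefield (2009).

    prior_sd is the assumed prior SD of the true log odds ratio.
    """
    v = se ** 2
    w = prior_sd ** 2
    r = w / (v + w)
    z2 = (beta / se) ** 2
    return 0.5 * (math.log(1.0 - r) + r * z2)


def credible_set(snps, coverage=0.99, prior_sd=0.2):
    """Return (id, posterior probability) pairs for the smallest set of SNPs whose
    cumulative posterior probability reaches `coverage`, assuming exactly one
    causal SNP in the region and equal prior probability for every SNP.

    snps: list of (snp_id, beta, se) tuples from one meta-analysis.
    """
    log_bfs = [wakefield_log_abf(b, s, prior_sd) for _, b, s in snps]
    top = max(log_bfs)
    bfs = [math.exp(x - top) for x in log_bfs]  # rescale to avoid overflow
    total = sum(bfs)
    ranked = sorted(
        ((sid, bf / total) for (sid, _, _), bf in zip(snps, bfs)),
        key=lambda t: t[1],
        reverse=True,
    )
    chosen, cumulative = [], 0.0
    for sid, pp in ranked:
        chosen.append((sid, pp))
        cumulative += pp
        if cumulative >= coverage:
            break
    return chosen
```

Running this separately on PGC-SCZ2 and on combined summary statistics, then comparing the number of SNPs and the span of their positions, mirrors the comparison described above.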

Results

Genome-wide association and trans-ancestry meta-analysis

Manhattan and quantile−quantile (QQ) plots for GWAS of admixed African and Latino ancestry individuals are presented in the Supplemental Material (Supplemental Figs. 4 and 5). We calculated the genomic control factor (λ) and its value scaled to a sample size of 1000 cases and 1000 controls (λ1000) from genome-wide distributions of test statistics; these values were 1.04 and 1.008 for the admixed African GWAS, and 1.055 and 1.031 for the Latino GWAS, indicating that our results are not likely to be confounded by population substructure.
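The genomic control factor is the median observed 1-df chi-square statistic divided by its expected median (about 0.4549). For the rescaled value, the sketch below assumes the commonly used formula λ1000 = 1 + (λ − 1) × (1/Ncases + 1/Ncontrols)/(1/1000 + 1/1000); with λ = 1.04 and the GPC-AA counts (6152 cases, 3918 controls) it reproduces the reported 1.008. Function names are hypothetical.

```python
import statistics
from statistics import NormalDist

# Median of a 1-df chi-square distribution: (Phi^-1(0.75))^2, about 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(chi2_stats):
    """lambda = median observed 1-df chi-square statistic / expected median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def lambda_1000(lam, n_cases, n_controls):
    """Rescale lambda to a study of 1000 cases and 1000 controls."""
    return 1 + (lam - 1) * (1 / n_cases + 1 / n_controls) / (1 / 1000 + 1 / 1000)


print(round(lambda_1000(1.04, 6152, 3918), 3))  # 1.008
```

Rescaling matters here because λ grows with sample size under polygenicity, so λ1000 makes the admixed African and Latino values comparable despite their different cohort sizes.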

Our primary GWAS in admixed African individuals did not yield any SNP associations reaching the accepted threshold for genome-wide significance (P < 5 × 10⁻⁸). In the Latino ancestry GWAS, we identified a novel genome-wide significant association with SNPs in GALNT13 on chromosome 2q23.3 (rs776877; OR = 1.420, 95% CI: [1.272, 1.585]; P = 9.62 × 10⁻⁹) (Supplemental Fig. 6); this SNP showed no evidence of association in the PGC-SCZ2 analysis (OR = 1.026, 95% CI: [0.994, 1.059]; P = 0.1215).

Meta-analysis of African ancestry GWAS and PGC-SCZ2 summary statistics yielded 107 independent genome-wide significant SNPs representing 93 physically distinct loci. Of these, 10 were not among the 108 loci reported in the PGC-SCZ2 study (Fig. 1b; Supplemental Table 1).

Combining PGC-SCZ2 and Latino summary statistics, we observed 114 associated SNPs representing 101 loci, 8 of which are newly significant in the current analysis (Fig. 1b; Supplemental Table 2).

Meta-analysis of PGC-SCZ2, African ancestry, and Latino summary statistics revealed two additional significant loci (Fig. 1b; Supplemental Table 3).

Consistency in directions of allelic effects

Across varying P value thresholds applied to the PGC-SCZ2 results, the fraction of same-direction effects in the African-ancestry cohort was significantly greater than expected by chance, and highly significant overall (Supplemental Table 4). We observed a similar pattern of consistency in the reciprocal analysis, which selected SNPs by P value in the African ancestry GWAS and counted same-direction effects in PGC-SCZ2. In this analysis, the observed fraction was significantly greater than expected by chance at more inclusive P value thresholds (PT < 5 × 10⁻⁴), which account for a larger fraction of the genome. This difference can be explained by the greater statistical enrichment of the PGC-SCZ2 results and their correspondingly larger number of independent significant findings. The larger number of statistically independent tests genome-wide in the African ancestry GWAS reflects the lower background LD in the African ancestry reference data from the 1KGP. We observed similar results when restricting analyses to individuals with at least 75% African ancestry genome-wide (Supplemental Table 4), and comparable results for comparisons of African and Latino ancestry results (Supplemental Table 5).

Polygenic risk score profiling

Consistent with previous reports demonstrating the generalizability of polygenic findings for schizophrenia across diverse populations [14, 43, 44], individual-level scores constructed from PGC-SCZ2 summary statistics were significantly associated with case−control status in the admixed African, Latino, and European cohorts in the current study (Fig. 2a). For scores constructed from approximately independent common variants (pairwise r² < 0.1), we observed the best overall prediction at a P value threshold (PT) of 0.05; these scores explained ~3.5% of the variance in schizophrenia liability among Europeans (P = 4.03 × 10⁻¹¹⁰), ~1.7% among Latino individuals (P = 9.02 × 10⁻⁵²), and ~0.5% among admixed African individuals (P = 8.25 × 10⁻¹⁹) (Fig. 2a; Supplemental Table 6). As expected, scores constructed from larger numbers of non-independent SNPs generally showed improved predictive value (Fig. 2b; Supplemental Table 7).
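Liability-scale R² values such as those above depend on the population prevalence K and on the proportion of cases P in the sample. The sketch follows the transformation described by Lee and colleagues, which we assume is the method cited as [40]; it takes an R² on the observed 0/1 scale (from a linear regression of case status on the score) as input. Which observed-scale R² was converted in these analyses is not specified, so the inputs and the function name are assumptions.

```python
from statistics import NormalDist


def r2_observed_to_liability(r2_obs, K, P):
    """Convert an observed-scale R^2 to the liability scale (Lee et al., 2012).

    r2_obs: R^2 on the observed 0/1 scale; K: population prevalence;
    P: proportion of cases in the sample (sample prevalence).
    """
    std = NormalDist()
    t = -std.inv_cdf(K)          # liability threshold
    z = std.pdf(t)               # normal density at the threshold
    m = z / K                    # mean liability among cases
    theta = m * (P - K) / (1 - K) * (m * (P - K) / (1 - K) - t)
    c = K * (1 - K) / z ** 2 * K * (1 - K) / (P * (1 - P))
    return r2_obs * c / (1 + r2_obs * theta * c)
```

Because case-control samples are heavily enriched for cases relative to a 1% population prevalence, the liability-scale value can be smaller than the observed-scale R² it is derived from.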

Fig. 2

Trans-ancestry association of polygenic risk scores with schizophrenia. For scores based on PGC-SCZ2, GPC-AA or GPC-Latino, and meta-analysis results, the variance in risk explained in the other study is shown on the y-axis in terms of R² on the liability scale. a Scores based on various P value inclusion thresholds are displayed as shaded bars; b scores based on PT < 0.5 and varying pairwise LD between SNPs are displayed as shaded bars. Analyses of PGC-SCZ2 and meta-analysis scores used an independent cohort of European ancestry GPC participants.

Polygenic scores based on African ancestry GWAS results were significantly associated with schizophrenia among admixed African individuals, attaining the best overall predictive value when constructed from approximately independent common variants (pairwise r² < 0.1) with PT ≤ 0.5 in the discovery analysis (Fig. 2a and Supplemental Table 6); this score explained ~1.3% of the variance in schizophrenia liability (P = 3.47 × 10⁻⁴¹). Scores trained on African ancestry GWAS results also significantly predicted case−control status across populations: scores based on PT ≤ 0.5 and pairwise r² < 0.8 explained ~0.2% of the variability in liability among Europeans (P = 2.35 × 10⁻⁷) and ~0.1% among Latino individuals (P = 0.000184) (Fig. 2b and Supplemental Table 7). Similarly, scores constructed from Latino GWAS results (PT ≤ 0.5) had the greatest predictive value among Latinos (liability R² = 2%; P = 3.11 × 10⁻¹⁹) and Europeans (liability R² = 0.8%; P = 1.60 × 10⁻⁹), while scores based on PT ≤ 0.05 and pairwise r² < 0.1 showed a nominally significant association with case-control status among African ancestry individuals (liability R² = 0.2%; P = 0.00513).

We next considered polygenic scores constructed from the trans-ancestry meta-analysis of PGC-SCZ2 summary statistics with our African and Latino GWAS, which showed increased significance and improved predictive value in all three ancestries. Among African ancestry individuals, meta-analytic scores based on PT ≤ 0.5 explained ~1.7% of the variance (P = 4.37 × 10⁻⁵³), while scores based on PT ≤ 0.05 accounted for ~2.1% and ~3.7% of the variability in liability among Latino (P = 1.10 × 10⁻⁵⁹) and European individuals (P = 1.73 × 10⁻¹¹⁴), respectively.

We then compared a “baseline” generalized linear model, including the PGC-SCZ2 score and covariates as predictors, with a joint model additionally incorporating African- and/or Latino-trained scores, using a likelihood ratio test. Consistent with our observation that polygenic scores constructed from trans-ancestry meta-analysis results yielded improved prediction at genome-wide PT, joint models incorporating both PGC-SCZ2 and ancestry-specific scores yielded significant improvements in goodness-of-fit (Supplemental Table 8).
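Both the likelihood ratio test described here and the Nagelkerke pseudo-R² used in the Methods need only the log-likelihoods of fitted logistic models. The sketch assumes those log-likelihoods are already available from whatever software fitted the models; it handles only 1 or 2 added scores (df = 1 or 2), for which the chi-square tail has simple closed forms, and all names are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for df = 1 or 2 (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")


def likelihood_ratio_test(ll_baseline: float, ll_joint: float, df: int):
    """Compare nested models: baseline (PGC-SCZ2 score + covariates) vs. a joint
    model adding df extra scores. Returns (LR statistic, P value)."""
    stat = 2.0 * (ll_joint - ll_baseline)
    return stat, chi2_sf(stat, df)


def nagelkerke_r2(ll_null: float, ll_full: float, n: int) -> float:
    """Nagelkerke pseudo-R^2 from null and full model log-likelihoods."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell
```

In polygenic score analyses, the variance attributed to the score is often taken as the difference between Nagelkerke R² values for models with and without the score, both including covariates.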

We also considered whether these schizophrenia polygenic risk scores indexed risk of bipolar disorder in independent cases from the GPC. We observed a similar pattern of findings to those reported above, albeit with the expected systematic attenuation of signal in terms of explained variance and statistical significance (Supplemental Table 9). Notably, scores constructed from PGC-SCZ2 and trans-ancestry meta-analysis results were significantly associated with a diagnosis of bipolar disorder in African, Latino, and European populations (P < 10⁻⁵).

Trans-ancestry fine-mapping of schizophrenia loci

We next sought to evaluate the extent to which combining PGC-SCZ2 summary statistics with GPC-AA and GPC-Latino results would yield improved fine-mapping resolution at implicated loci. For replicated associations from PGC-SCZ2 that increased in significance following trans-ancestry meta-analysis, we compared “credible sets” of SNPs constructed from trans-ancestry meta-analysis summary statistics to those based on the PGC-SCZ2 results alone. We interpreted any reductions in both the number of SNPs comprising the 99% credible set and the length of the corresponding genomic interval as evidence of improved fine-mapping resolution.

Among 128 statistically significant associations in the PGC-SCZ2 study, we successfully fine-mapped 12 regions by trans-ancestry meta-analysis with African ancestry GWAS summary statistics (Table 2). Meta-analysis of PGC-SCZ2 and Latino summary statistics yielded reductions in the credible set for nine regions (Supplemental Table 10), including one newly significant region. Combining PGC-SCZ2, African ancestry, and Latino summary statistics improved fine-mapping resolution for two additional PGC-SCZ2 regions (rs6670165 and chr11_46350213_D). In addition, for one of the two regions whose credible sets were reduced by meta-analysis with either the African or the Latino ancestry results, fine-mapping resolution was further improved in the combined analysis (Supplemental Table 11).

Table 2 Improved fine-mapping resolution at 12 established schizophrenia loci by trans-ancestry meta-analysis of PGC-SCZ2 and GPC-AA.

The degree of improvement in fine-mapping resolution varied between loci; in two instances, the credible set was reduced to a single SNP (rs9607782 and rs211829 in Table 2 and Supplemental Table 10, respectively). Importantly, for these fine-mapped regions, the genomic intervals spanned by the 99% credible sets were smaller than the corresponding intervals defined by SNPs with LD r² > 0.6 with the index SNP (based on 1KGP EUR reference data).

For selected regions on chromosomes 11p11.2 and 22q13.2, Fig. 3 displays regional association results for PGC-SCZ2 and trans-ancestry meta-analysis of PGC-SCZ2 and GPC-AA.

Fig. 3

Regional association plots for selected schizophrenia associations with improved fine-mapping resolution in trans-ancestry meta-analysis. For each selected region, association results for PGC-SCZ2 and meta-analysis of PGC-SCZ2 with GPC-AA are shown in the first and second panels, respectively. The strength of LD of each SNP with the “index” SNP, displayed as a large purple diamond, is indicated by its color. Genomic intervals corresponding to SNPs with LD r² > 0.6 to the index SNP (“rsq6”) and 99% credible sets in PGC-SCZ2 (“pgc”) and the present analysis (“meta”) are displayed. Plots were created using the LocusZoom standalone software [54].

Discussion

We have undertaken the largest genetic association study of schizophrenia in persons of African ancestry to date and provide important benchmarks for the generalizability of aggregate findings across diverse populations. We observed a significant excess of SNPs with consistent directions of allelic effect across studies and populations, as well as robust enrichment of identified risk alleles among cases compared with controls. We demonstrate that combining European and African ancestry data can generate empirical support for specific genetic variants and refine implicated risk loci by trans-ancestry fine-mapping. Critically, aggregate polygenic risk scores derived from the largest published GWAS of schizophrenia to date have markedly attenuated predictive value among non-Europeans, making increased diversity of participants in psychiatric genetics research an imperative.

Among admixed African cases and controls, we were able to explain a larger fraction of variance using polygenic scores constructed from our African ancestry GWAS results, and European and Latino cases were found to carry more of the African-derived score alleles than ancestry-matched controls. The predictive value of PGC-SCZ2 scores was comparable for European and Latino cohorts but considerably attenuated in admixed African ancestry individuals. Importantly, meta-analysis of PGC-SCZ2 and African ancestry GWAS results yielded the best “training” dataset overall, with resultant scores explaining more variance among European and African ancestry individuals than corresponding scores based on either ancestry alone. This recalls the seminal findings of the International Schizophrenia Consortium (2009), in which polygenic scores based on a larger European cohort showed attenuated effects in the MGS-AA sample [14], reflecting aggregate differences in allele frequencies and patterns of linkage disequilibrium. As expected, the overall predictive value of these scores improved, both within and across ancestries, with more complete coverage of the genome through the use of a larger set of variants. Taken together, these results highlight that the utility of polygenic scoring methodologies, in both basic research and potential clinical applications, relies on the availability of appropriately matched “training” and “testing” samples. Larger and more inclusive GWAS are necessary to ensure that advances in genomic medicine, including improved risk prediction, benefit all of humanity.

While GWAS of schizophrenia in admixed African ancestry individuals did not yield any genome-wide significant findings, our analysis of Latino cases and controls revealed a novel genome-wide significant association with SNPs in GALNT13 at 2q23.3. This gene encodes polypeptide N-acetylgalactosaminyltransferase 13, which has been shown to be specifically expressed in neurons and may be responsible for synthesizing the Tn antigen; its 3′ UTR contains two microRNA target sequences. The associated allele at the lead SNP at this locus (rs776877) has an odds ratio of ~1.4, which is larger than expected given that the majority of associated variants in PGC-SCZ2 have odds ratios below 1.2; this may be attributable to the “winner’s curse” [45]. This SNP showed significant evidence of heterogeneity of effect sizes in the trans-ancestry analysis of PGC-SCZ2, GPC-AA, and GPC-Latino (Cochran’s Q = 33.1605; P = 6.30 × 10⁻⁸; I² = 93.97%). It is also worth noting that a prior study of psychosis in Mexican and Central American families yielded some evidence of linkage to 2q33 [46]. However, confirmation of GALNT13 as a schizophrenia risk locus will require detailed follow-up and replication in an independent Latino cohort.
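The heterogeneity statistics quoted above can be checked with a short calculation. With three studies (df = 2), the chi-square P value for Q has the closed form exp(−Q/2), and I² = (Q − df)/Q. The sketch below assumes the standard definitions of Cochran’s Q and Higgins’ I²; applied to Q = 33.1605 it reproduces P ≈ 6.30 × 10⁻⁸ and I² ≈ 93.97%. Function names are hypothetical.

```python
import math


def cochran_q(betas, ses):
    """Cochran's Q: weighted squared deviations from the fixed-effects mean."""
    w = [1.0 / se ** 2 for se in ses]
    mean = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    return sum(wi * (b - mean) ** 2 for wi, b in zip(w, betas))


def i_squared(q: float, df: int) -> float:
    """Percentage of variation attributable to heterogeneity, floored at 0."""
    return max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0


def q_pvalue_even_df(q: float, df: int) -> float:
    """Upper-tail chi-square probability for even df (closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("closed form used here requires positive even df")
    half = q / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# Three studies (PGC-SCZ2, GPC-AA, GPC-Latino) give df = 3 - 1 = 2.
q = 33.1605
print(f"{q_pvalue_even_df(q, 2):.2e}", f"{i_squared(q, 2):.2f}")  # 6.30e-08 93.97
```

An I² this close to 100% indicates that nearly all of the variation in rs776877 effect estimates across the three studies exceeds what sampling error alone would produce.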

Meta-analysis of African ancestry results with PGC-SCZ2 summary statistics yielded 94 associated loci, of which 11 were not among the 108 previously reported, and 7 were newly genome-wide significant. These additional loci were significant at P < 10⁻⁶ in the PGC-SCZ2 discovery phase but did not attain genome-wide significance in a combined analysis with 1513 cases and 66,236 controls from deCODE genetics. Notably, the effective sample size (Neff) of the deCODE replication sample, i.e., the total sample size adjusted for imbalanced numbers of cases and controls, is smaller than that of the African ancestry cohort (Neff = 5917 vs. 9574). That we nonetheless observed fewer genome-wide significant associations overall (94 vs. 108 loci) is consistent with the expectation that gains in statistical power from increasing sample size will be largest when adding ancestry-matched subjects. With greater “genetic distance” (e.g., the fixation index, FST) between the discovery and replication samples, we would expect realized gains in statistical power to fall further short of those implied by the increase in sample size. For example, despite its smaller effective sample size (Neff = 3527), meta-analysis with the Latino results yielded 101 loci, including 12 newly replicated loci.

Comparing the 18 newly significant loci reported here (Supplemental Tables 1–3) with findings from a recently published meta-analysis of PGC-SCZ2 and CLOZUK2 [16], we observe only six overlapping genome-wide significant loci. Given the large size of the CLOZUK2 sample (5220 cases and 18,823 controls not included in the PGC-SCZ2 analysis), this limited overlap suggests that trans-ancestry meta-analysis has the potential to broaden the scope of GWAS findings and lead to the identification of novel associations.

Fine-mapping approaches often utilize functional annotations (e.g. predicted deleteriousness of nonsynonymous variants) to identify a likely causal variant at an associated locus [47, 48], and methods that leverage population differences in patterns of linkage disequilibrium have also been described [49, 50]. Our approach was to compare credible SNP sets constructed from PGC-SCZ2 and our trans-ancestry meta-analysis results, interpreting a reduction in the number of credible SNPs and length of the corresponding genomic interval as indication of improved fine-mapping resolution. Among 128 associated SNPs identified in the PGC-SCZ2 analysis, 41 increased in significance in the African ancestry meta-analysis; for 12 of these regions, we observed a concomitant reduction in the number of SNPs comprising the credible set. It can be expected that larger sample sizes will yield larger numbers of fine-mapped loci and enhanced fine-mapping resolution.

Limitations

We do not specifically model admixture within individuals or adjust for local ancestry in tests of common variant association, instead adjusting for global ancestry proportions within broadly defined ancestry groups. Importantly, our resultant test-statistic distributions do not suggest significant confounding by population substructure. It is likely that much larger samples of African and Latino ancestry are needed to capture the extensive genetic diversity present in these populations [51, 52].

We do not specifically consider enrichment of observed associations in particular biological pathways or other functional annotations (e.g., tissue-specific eQTLs), or evidence of cross-ancestry and cross-trait genetic correlations. This is partly because many current methods rely on reference LD information, and the suitability of such data for admixed populations remains an open empirical question.

Conclusions

We have conducted the largest GWAS of schizophrenia among admixed African individuals to date and demonstrate the potential of more diverse studies to refine the catalog of schizophrenia risk loci and enhance the generalizability of aggregate genetic findings. Addressing disparities in representation of African and Latino ancestries in psychiatric genetics research presents both scientific opportunities and imperatives [53], necessitating greater community engagement and population-scale genotyping initiatives.