Variability of 128 schizophrenia-associated gene variants across distinct ethnic populations

Schizophrenia is a common polygenic disease affecting 0.5–1% of individuals across distinct ethnic populations. PGC-II, the largest genome-wide association study investigating genetic risk factors for schizophrenia, previously identified 128 independent schizophrenia-associated genetic variants (GVs). The current study examined the genetic variability of these GVs across ethnic populations. To assess this variability, the 'variability indices' (VIs) of the 128 schizophrenia-associated GVs were calculated. We used 2504 genomes from the 1000 Genomes Project taken from 26 worldwide healthy samples comprising five major ethnicities: East Asian (EAS: n=504), European (EUR: n=503), African (AFR: n=661), American (AMR: n=347) and South Asian (SAS: n=489). The GV with the lowest variability was rs36068923 (VI=1.07); its minor allele frequencies (MAFs) were 0.189, 0.192, 0.256, 0.183 and 0.194 for EAS, EUR, AFR, AMR and SAS, respectively. The GV with the highest variability was rs7432375 (VI=9.46); its MAFs were 0.791, 0.435, 0.041, 0.594 and 0.508 for EAS, EUR, AFR, AMR and SAS, respectively. When we focused on the EAS and EUR populations, the allele frequencies of 86 GVs significantly differed between them (P<3.91 × 10⁻⁴). The GV with the largest difference was rs4330281 (P=1.55 × 10⁻¹³⁸), with MAFs of 0.023 and 0.519 for the EAS and EUR, respectively. The GV with the smallest difference was rs2332700 (P=9.80 × 10⁻¹), with similar MAFs in the two populations (0.246 and 0.247 for the EAS and EUR, respectively). Interestingly, the mean allele frequencies of the GVs did not significantly differ between these populations (P>0.05). Although genetic heterogeneities were observed in the schizophrenia-associated GVs across ethnic groups, the combination of these GVs might increase the risk of schizophrenia.


INTRODUCTION
Schizophrenia is a common, complex psychiatric disease with a lifetime prevalence of ~0.5-1% 1,2 and an estimated heritability of 80%. 3 The incidence of schizophrenia is uniform worldwide. 1,4,5 Hundreds of common genetic variants (GVs) have been weakly implicated in the pathogenesis of schizophrenia. 6,7 Genome-wide association studies (GWASs), which examine millions of GVs, are powerful tools for identifying common susceptibility variants associated with complex disorders (including schizophrenia) across diverse populations. The largest GWAS, conducted by the Schizophrenia Working Group of the Psychiatric Genomics Consortium (GWAS PGC-II) and including 36 989 patients with schizophrenia and 113 075 controls, identified 128 linkage disequilibrium (LD)-independent variants across 108 genomic loci. 7 However, most of these participants were of European (EUR) ancestry. The second largest ancestry group comprised case-control samples from East Asia (1866 cases and 3418 controls).
These 128 LD-independent schizophrenia-associated GVs contribute to the risk of schizophrenia across distinct populations. For example, the schizophrenia-associated GVs in the ZNF804A, NRGN, VRK2 and ITIH3/4 genes 7 are found in both EUR [7][8][9] and Asian [10][11][12] patients with schizophrenia; however, the associations in Asian populations are only marginal and do not reach genome-wide significance. In contrast, GV rs115329265 in the major histocompatibility complex region on chromosome 6, the most significantly associated GV in schizophrenia, 7 is not polymorphic in the Japanese population, according to the 1000 Genomes Project (1000GP: http://browser.1000genomes.org/index.html).
This finding appears to contradict the evidence that schizophrenia affects ~0.5-1% of individuals across distinct populations. We hypothesized that the sum of the allele frequencies of the 128 GVs would not differ across ethnic populations, even though the frequency of each individual GV would.
The human genome consists of three billion bases and over 88 million GVs (including 84.7 million single-nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels) and 60 000 structural variants), which can differ between any two genomes in different people. 13 The 1000GP, which was conducted between 2008 and 2015, sought to study these variations in many people; in doing so, it has provided a solid foundation upon which to build understanding of the genetic variation in humans. [13][14][15] The 1000GP consortium has analyzed 2504 genomes across 26 populations from five continental regions (East Asians (EAS), EUR, Africans (AFR), Americans (AMR) and South Asians (SAS)), by using a combination of low-coverage whole-genome sequencing, deep exome sequencing and dense microarray genotyping. The 1000GP has demonstrated that a typical genome differs from the reference human genome at between 4.1 million and 5.0 million sites. 15 The total number of observed non-reference sites differs greatly across populations. 15 Individuals of AFR ancestry harbor the greatest number of variant sites among the five ethnic populations. In addition, individuals from recently admixed populations show great variability in the number of variants. The present study tested the genetic variability of the 128 LD-independent schizophrenia-associated GVs, including SNPs and indels, detected in the most recently available data from the GWAS PGC-II 7 across the five ethnic populations studied in the 1000GP, with particular attention to EAS and EUR.

MATERIALS AND METHODS
Participants
The 1000GP is the largest public catalog of human variation and genotype data, comprising 2504 human genomes from 26 ethnic populations. [13][14][15] The healthy individual genomes are divided into five major ethnic populations, EAS (n = 504), EUR (n = 503), AFR (n = 661), AMR (n = 347) and SAS (n = 489), which were included in the current study and accessed via the 1000GP Phase 3 Browser (http://browser.1000genomes.org/index.  Table 1. According to the previous largest GWAS, 7 the 128 LD-independent schizophrenia-associated GVs were extracted from these populations using the 1000GP Phase 3 Browser.

Statistical analyses
All statistical analyses were performed using SPSS 21.0 (IBM SPSS Japan, Tokyo, Japan) and R 3.1.1 (http://www.r-project.org/). We defined a variability index (VI) to investigate the genetic variability of the 128 LD-independent schizophrenia-associated GVs among the five ethnic populations by using the following formula: where X_i represents each minor allele frequency (MAF) weighted for the sample size (number of minor alleles (sqrt)) in each ethnic population and X represents each mean MAF weighted for the sample size among the five ethnic populations. A high VI indicates high genetic variability among the ethnic populations, whereas a low VI indicates low genetic variability among the populations. The mean VI among the chromosomes was analyzed using analysis of variance with the VI as the dependent variable and chromosome as the independent variable. To compare the allele frequencies of the schizophrenia-associated GVs between the EAS and EUR populations, using the same individuals as in the VI calculation, the differences were analyzed using χ² or Fisher's exact tests. The mean allele frequencies of the GVs in these two populations were compared using non-parametric Mann-Whitney U-tests. To control for type I error (that is, false positives), P-values less than 3.91 × 10⁻⁴ were considered significant (α = 0.05/128 GVs).
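The significance threshold and the per-variant allele-frequency comparison described above can be sketched in a few lines. The following is a minimal illustration, not the authors' SPSS/R code: the helper names are hypothetical, the allele counts are approximated by rounding MAF × 2n (the exact counts are not given in the text), and the test shown is a Pearson χ² without continuity correction, which is an assumption about the variant of the test used. The example uses the reported MAFs of rs4330281 in EAS (0.023, n = 504) and EUR (0.519, n = 503).

```python
import math

# Bonferroni threshold from the text: alpha divided by the number of GVs tested.
ALPHA = 0.05
N_GVS = 128
threshold = ALPHA / N_GVS  # 0.000390625, reported as 3.91 x 10^-4


def allele_counts(maf, n_individuals):
    """Approximate (minor, major) allele counts from a MAF and a diploid sample size.

    Assumption: counts are recovered by rounding, since the text reports only MAFs.
    """
    n_alleles = 2 * n_individuals
    minor = round(maf * n_alleles)
    return minor, n_alleles - minor


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and P-value for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def odds_ratio(a, b, c, d):
    """Odds of the minor allele in the first population relative to the second."""
    return (a * d) / (b * c)


# rs4330281: MAF 0.023 in EAS (n = 504) and 0.519 in EUR (n = 503).
eas_minor, eas_major = allele_counts(0.023, 504)
eur_minor, eur_major = allele_counts(0.519, 503)

chi2, p = chi2_2x2(eas_minor, eas_major, eur_minor, eur_major)
or_ = odds_ratio(eas_minor, eas_major, eur_minor, eur_major)

print(f"Bonferroni threshold: {threshold:.2e}")
print(f"chi2 = {chi2:.1f}, P = {p:.2e}, OR = {or_:.3f}")
print("significant" if p < threshold else "not significant")
```

With these approximated counts the odds ratio rounds to 0.02 and the P-value falls far below the Bonferroni threshold. Because the counts are rounded, small deviations from the published statistics are to be expected.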

RESULTS
First, we investigated the genetic variability of the 128 independent schizophrenia-associated variants among the five major ethnic populations by using the VI. Exactly 122 of the 128 GVs were found in the 1000GP Phase 3 Browser. As shown in Supplementary Figure 1, principal component analysis of the allele frequencies of the 122 GVs shared among the five ethnic populations reflected the populations' structure. The VIs of these GVs ranged from 1.07 to 9.46 (Supplementary Table 2). The top 10 GVs with low or high variability are shown in Table 1. We also investigated whether the mean genetic variability differed among the chromosomes. As shown in Figure 2, it did not (F(20, 101) = 0.50, P = 0.96). The mean VI across all GVs was 4.34 ± 1.66 (n = 122); the highest mean VI was 5.19 ± 1.89 on chromosome X (n = 3) and the lowest was 2.84 ± 1.20 on chromosome 20 (n = 2). In addition, according to the GWAS rank of each GV (Supplementary Table 2), the 128 schizophrenia-associated GVs were divided into four groups to compare the mean ranks of the VI among groups, where high rank represents low genetic variability. The mean ranks of the VI did not differ among the four groups (first group (GWAS top 1-25% ranked GVs): mean rank of the VI ± s.d. = 73.3 ± 30.6, second (26-50%): 58.8 ± 39.6, third (51-75%): 50.0 ± 34.2, fourth (76-100%): 65.0 ± 34.0, z = 7.10, P = 0.069), suggesting that the VI of each GV is not associated with the strength of its association with schizophrenia in GWAS PGC-II.
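Two of the statistics reported above can be checked for internal consistency with a short sketch. The first check relates the degrees of freedom of the chromosome-wise analysis of variance (20 and 101) to the 122 GVs analyzed. The second rests on an assumption: the text does not name the test used to compare the four GWAS-rank groups, and the code treats the reported statistic of 7.10 as a χ² value with 3 degrees of freedom, as a Kruskal-Wallis-type rank test across four groups would give. The function name is hypothetical.

```python
import math

n_gvs = 122                       # GVs found in the 1000GP Phase 3 Browser
df_between, df_within = 20, 101   # degrees of freedom of the reported F statistic

# One-way ANOVA: df_between = groups - 1 and df_within = observations - groups.
n_groups = df_between + 1         # number of chromosome groups implied by the df
assert n_gvs - n_groups == df_within


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Assumption: the statistic 7.10 for the four-group comparison is chi-square distributed.
p_quartiles = chi2_sf_3df(7.10)
print(n_groups, round(p_quartiles, 3))
```

Under that assumption the tail probability rounds to 0.069, in line with the reported P-value, and the degrees of freedom correspond to 21 chromosome groups.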
Next, we focused on the genetic variability of the schizophrenia-associated variants between the EAS and EUR populations utilized in our first analysis because these groups represented the major ethnicities that participated in a previous GWAS. 7 The allele frequencies of 86 GVs significantly differed between the EAS and EUR populations (P < 3.91 × 10⁻⁴; Supplementary Table 3). The top 10 GVs with high or low variability between the EAS and EUR groups are shown in Table 2. The GV with the highest variability was rs4330281 on chromosome 3 (odds ratio (OR) = 0.02, 95% confidence interval = 0.01-0.03, P = 1.55 × 10⁻¹³⁸). The T-allele frequencies were 0.023 and 0.519 in the EAS and EUR populations, respectively. The GV with the lowest variability was rs2332700 on chromosome 14 (OR = 1.00, 95% confidence interval = 0.81-1.23, P = 9.80 × 10⁻¹). The C-allele frequency was similar between the

DISCUSSION
To the best of our knowledge, this study is the first to examine the genetic variability of the 128 LD-independent schizophrenia-associated GVs across ethnic populations. To compare genetic variability among ethnic populations, we calculated a VI. The VIs of the GVs ranged from 1.07 to 9.46. We successfully detected GVs with high or low genetic variability by using the VI. When we focused on the genetic variability between the EAS and EUR populations (the major ethnic groups included in the previous GWAS), 7 ~70% of the allele frequencies of the schizophrenia-associated GVs significantly differed between these populations. However, the mean allele frequency of the GVs did not differ between these populations. Consistent with the findings of polygenic risk score studies, 16,17 our results suggest that the sum of the GVs contributes to the pathogenesis of schizophrenia across ethnic populations. Several GVs showed genetic heterogeneity across ethnic populations. GVs with MAFs < 0.01 were identified in the EAS (n = 4), AFR (n = 14) and SAS (n = 4) populations. Four GVs with MAFs < 0.01 were shared by these ethnic groups. Given that the genetic risk for schizophrenia is due to many GVs with small effects, a cumulative GV effect might be associated with the pathogenesis of schizophrenia, rather than each genetic effect individually. However, as shown in Table 1, some GVs such as rs36068923 and chr3_180594593_I had low genetic heterogeneity across the ethnic populations. Genes near these GVs may be better targets for drug discovery because the number of individuals with these risk variants is consistent across populations.
As predicted by the out-of-Africa model of human origin, 18 AFR had a greater number of GV sites than the other ethnic populations. 15 Therefore, we excluded AFR individuals and recalculated the VIs of the schizophrenia-associated GVs in the remaining four populations. The VIs of the GVs in these four ethnic groups ranged from 0.83 to 8.22. The mean VI of the total GVs was 3.55. Although each VI and the range of the VIs in the four ethnicities were significantly lower than the VIs in the five ethnic groups (z = −3.87, P < 0.05), genetic heterogeneities were nevertheless observed. Some of these risk GVs may contribute to the onset of schizophrenia only under specific environmental backgrounds, such as climate and infection exposures. Given that environmental exposures as well as individual common genetic risk variants confer risk of schizophrenia, gene-environment interactions (G × E) could have an important role in the etiology of schizophrenia. 19 Further studies are needed to reveal G × E involving the GVs detected in the GWAS PGC-II.
The major histocompatibility complex on chromosome 6 is one of the strongest and most consistently replicated regions associated with schizophrenia according to previous GWASs. [7][8][9] Numerous genome-wide significant variants within the major histocompatibility complex region have been identified. However, it is difficult to analyze this region because of its high LD and ethnic heterogeneity. 20,21 We hypothesized that the rs115329265 GV in this region would show high genetic variability among the ethnic populations, and that GVs on chromosome 6 would have higher genetic variability than those on other chromosomes. As expected, the rs115329265 GV showed high genetic heterogeneity (VI = 5.16). The MAFs of this variant were 0.022, 0.151, 0.445, 0.146 and 0.088 in the EAS, EUR, AFR, AMR and SAS populations, respectively. The VI of this variant was the 36th highest among the 122 GVs. In contrast, the GVs on chromosome 6 (VI = 4.42) did not show significantly higher genetic variability than those on other chromosomes (VI = 4.34). Furthermore, no specific chromosome with high genetic variability was identified among the ethnic populations.
For the majority of these 108 loci, the molecular mechanisms that underlie susceptibility to schizophrenia are unknown. Although 75% of the 108 loci harbor protein-coding genes and 40% harbor a single gene, 7 most associated variants were not in LD with known protein-coding variants, splice sites or 3'/5' untranslated regions. In general, SNPs associated with common diseases and phenotypes identified by previous GWASs are enriched in regulatory regions of the genome. 22,23 These findings suggest that most GWAS-detected SNPs contribute to disease susceptibility by altering gene expression rather than the protein structure. Therefore, careful examinations of gene expression and its relationship to GVs have become a critical step in elucidating the genetic basis of schizophrenia. [24][25][26][27][28]
The current study sought to identify genetic variability in schizophrenia-associated GVs detected by a previous GWAS (PGC-II) among five major ethnic populations. As expected, numerous GVs showed genetic heterogeneities among these populations. In particular, 86 of 122 GVs showed significant genetic heterogeneities between the EAS and EUR populations. However, a composite of these GVs did not differ between these populations. Our findings suggest that the cumulative effect of GVs contributes to the risk of schizophrenia across ethnic populations.

CONFLICT OF INTEREST
The authors declare no conflicts of interest.

ACKNOWLEDGMENTS
This work was supported by a Japan Society for the Promotion of Science (JSPS) Grant-in-Aid for Young Scientists (B; 16K19784). We thank all individuals who participated in this study.