Introduction

With a global prevalence of ~1.1%, schizophrenia (SCZ) is a complex and severe chronic psychiatric disorder with a heterogeneous clinical phenotype characterized by auditory hallucinations, delusions and deficits in cognitive function.1 Early genetic epidemiological studies showed a familial clustering pattern for SCZ and estimated its heritability at 60 to 80%.2 Follow-up genetic epidemiological research has focused on association mapping and has identified several potential susceptibility genes for SCZ.3, 4, 5, 6, 7 Research interest in this approach has greatly increased with the wide application of genome-wide association studies of SCZ.8, 9 However, the specific biological pathways underlying an individual’s predisposition to SCZ, a complex mental disorder influenced by both genetic and environmental factors, remain unknown.10, 11, 12

Recently, a significant association between the H2AFZ gene and SCZ was reported in Japanese males.13 The H2AFZ gene contains five exons spanning ~2 kb of chromosome 4q24. These exons encode a replication-independent member of the histone H2A family, which has a major role in critical biological processes including chromosome segregation,14 cell cycle progression15 and the maintenance of heterochromatin/euchromatin status.16 The acetylation or ubiquitylation of H2AFZ leads to the activation or silencing of the gene, respectively,17 which in turn ensures proper regulation of gene expression. This suggests that H2AFZ may coregulate multiple biological pathways involved in the etiology of SCZ. Despite evidence of a strongly significant association in Japanese males, whether the H2AFZ gene confers a risk for SCZ remains to be seen. Furthermore, among the various cognitive deficits (working memory, executive function, verbal fluency and episodic memory) reported in SCZ,18 perseverative deficits are considered both as factors related to frontal dysfunction19 and as a vulnerability marker.20 Moreover, a previous study has shown that perseverative error processing appears to be a marker of frontal lobe dysfunction in SCZ, and this psychological parameter may also be measured as a vulnerability marker for the disorder.21 Therefore, it is necessary to evaluate the association between the H2AFZ gene and cognitive deficits in SCZ patients, as well as whether the reported association with SCZ holds in additional, distinct populations.

Large-sample genetic association analyses were conducted in the present study to investigate the relationship between the H2AFZ gene and SCZ in case–control subjects selected from the Han Chinese population. Furthermore, to better understand the interaction between the H2AFZ gene and perseverative error processing under pathological conditions, the secondary aim of this study was to evaluate this interplay in SCZ patients.

Materials and Methods

Subjects

Two separate data sets were included in this study, and a two-stage approach was used for the single-marker discovery analyses. The testing set consisted of 1115 patients (580 females of mean age 37.1±7.68 years and 535 males of mean age 35.8±5.57 years) and 2289 unrelated healthy controls (1182 females of mean age 36.7±9.13 years and 1107 males of mean age 36.1±9.32 years), whereas the validation set consisted of 1843 SCZ cases (950 females of mean age 36.9±11.9 years and 893 males of mean age 36.3±11.3 years) and 3155 healthy controls (1609 females of mean age 37.3±13.8 years and 1546 males of mean age 36.6±13.3 years). All patients were recruited from the in- and outpatient clinical services of the psychiatric unit at Xi'an Mental Health Center and were diagnosed by at least two experienced psychiatrists according to the Diagnostic and Statistical Manual of Mental Disorders, 4th edition, criteria for SCZ. All 1115 patients in the testing set completed the cognitive assessments by the Wisconsin Card Sorting Test, and usable data were obtained. All unrelated healthy controls were selected from a combination of local volunteers and blood transfusion donors; those with a personal or family history of mental illness in the previous three generations and those diagnosed with psychoses either currently or in the past were excluded from the present study. All subjects were from the city of Xi’an in Shaanxi Province and of Han descent, and we excluded those not born locally or whose family members from the previous three generations were not born locally. All participants provided written informed consent. All procedures for the study were approved by the Ethics Committee of Xi’an Jiaotong University.

Single-nucleotide polymorphism selection and genotyping

The dbSNP database of the National Center for Biotechnology Information was searched for all single-nucleotide polymorphisms (SNPs) with minor allele frequency ⩾0.01 in the H2AFZ gene region, and 10 SNPs were found (rs76263771, rs78656098, rs61203457, rs2276939, rs112445319, rs79002079, rs28635099, rs73832925, rs3756087 and rs17029673). These 10 SNPs, which completely cover the region of the H2AFZ gene (Figure 1), were included in our further analyses.

Figure 1

The genomic location of seven selected single-nucleotide polymorphisms (SNPs) across the H2AFZ gene. A full color version of this figure is available at the Journal of Human Genetics journal online.

DNA was extracted from whole blood by following the standard protocol of the DNA Isolation Kit for Mammalian Blood (Tiangen Biotech, Beijing, China). DNA was stored at −20 °C for SNP analysis. Genotyping was performed for all SNPs using the MassARRAY platform (Sequenom, San Diego, CA, USA). The final genotype call rate of each SNP was >96%, and the overall genotyping call rate was 98.8%, which supports the reliability of the subsequent statistical analysis. In addition, a random 5% of the samples were genotyped again, and the results showed 100% concordance.

Statistical analysis

The Hardy–Weinberg equilibrium test was conducted for each SNP in both the case and control groups using Haploview v.4.2 (supplied by the Broad Institute of MIT and Harvard, http://www.broadinstitute.org/scientific-community/science/programs/medical-and-population-genetics/haploview/haploview). CLUMP v.2.4 (supplied by Dave Curtis, http://www.davecurtis.net/software/dcurtis/clump24.zip) genetic analysis software was used for allelic and genotypic association tests. The SNP markers were imputed to increase their density using the IMPUTE2 software (supplied by Bryan Howie and Jonathan Marchini, https://mathgen.stats.ox.ac.uk/impute/impute_v2.html), with the 1000 Genomes data set (Phase 3, Human Genome 19 or National Center for Biotechnology Information Build 37) from a combined sample set (CHB and JPT) as a reference. A follow-up association analysis was performed with the SNPTEST software (supplied by Jonathan Marchini and Gavin Band, https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html). With gender added as a covariate, frequentist association tests were implemented for each imputed SNP. To ensure the accuracy of imputation, only the imputed SNPs with an average maximum posterior probability ⩾0.8 were used. Only the SNPs with minor allele frequency ⩾0.01 were included in the power considerations of the association test. The intermarker relationship was determined by a pairwise linkage disequilibrium (LD) analysis, with D′ and r² values computed using the Haploview v.4.2 software program. The haplotype frequencies were estimated using GENECOUNTING v.2.2, and a haplotypic association analysis that included a likelihood ratio test followed by permutation testing was performed for the common haplotypes (frequency >0.01). We used 1000 Genomes data as a reference data set to generate a regional LD plot and a regional association plot based on imputed and genotyped SNPs with SNP Annotation and Proxy Search (http://www.broadinstitute.org/mpg/snap/ldplot.php).
To investigate the potential effect of gender on these SNPs, the samples were stratified by gender. With age, gender and duration of illness as covariates, multivariate analysis of covariance was performed in patients to investigate the association between H2AFZ polymorphisms and Wisconsin Card Sorting Test parameters. All tests in this study were two-tailed, and a P-value <0.05 was considered significant in all analyses. The power calculations were performed using PGA v.2.0. In our study, the testing sample size could detect SNP and haplotype associations with 94% and 83% power, respectively. The power analysis for the additional replication analyses showed that the validation sample size could attain 96% power for the SNP association.
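
As a rough illustration of two quantities named in this section, the following Python sketch computes a Hardy–Weinberg chi-square test from genotype counts and the pairwise LD measures D′ and r² from haplotype and allele frequencies. It is not the Haploview implementation, whose own test may differ from this simple chi-square approximation. The function names and the example counts are hypothetical and are not data from this study.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square test (1 d.f.) for Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p in (0.0, 1.0):
        return 0.0, 1.0  # monomorphic marker: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 d.f. equals erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def ld_measures(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: frequencies of allele A and allele B; both loci must be polymorphic
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical inputs: genotype counts in exact equilibrium, and two loci in complete LD.
chi2, p_value = hwe_chi2(250, 500, 250)
d_prime, r2 = ld_measures(0.5, 0.5, 0.5)
print(f"HWE chi2 = {chi2:.2f}, P = {p_value:.3f}; D' = {d_prime:.2f}, r2 = {r2:.2f}")
```

In practice the haplotype frequency passed to `ld_measures` would have to be estimated from unphased genotypes, a step this sketch leaves out.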

Meta-analysis

The study by Jitoku et al.13 and the present study were included in the meta-analysis. All statistical analyses were performed using the RevMan 5.2 program (http://ims.cochrane.org/revman).

Results

Allelic and genotypic association analysis

In the testing data set, 10 SNPs within the H2AFZ gene were genotyped. The allelic and genotypic distributions of all SNPs in both the cases and controls, including the results of the Hardy–Weinberg equilibrium test, are shown in Table 1 and Supplementary Table S1. Three SNPs (rs76263771, rs78656098 and rs112445319) did not exhibit any polymorphisms, whereas the other seven SNPs were highly polymorphic in both samples; the allelic and the genotype distributions of all samples were in Hardy–Weinberg equilibrium (P>0.05).

Table 1 Allele and genotype frequency of single SNP association analysis

The testing data set was subjected to a single SNP association analysis, and association signals were observed for two SNPs (rs2276939 and rs73832925; P=0.002 and 0.011, respectively) (Table 1). After Bonferroni correction, however, only rs2276939 remained significantly associated with SCZ (corrected P=0.015). Genotypic association analyses gave similar results, with a significant corrected P-value of 0.039. The G allele of rs2276939 was observed more frequently in patients than in controls (odds ratio (OR)=1.22, 95% confidence interval (CI): 1.07–1.38); the other five SNPs did not differ significantly in their genotypic or allelic distributions (Supplementary Table S1). Because common alleles confer small effect sizes, whose detection requires large samples, the overall evidence for a given SNP is best summarized by association analyses across independent samples. Therefore, a single SNP association analysis was performed for these two SNPs in the validation data set described earlier. The significant association between rs2276939 and SCZ was replicated (corrected P=0.006), and the association with rs73832925 remained only nominal after Bonferroni correction (Table 1). To examine the role of gender in the associations from the testing and validation data sets, we analyzed males and females separately. We found that only rs2276939 showed allelic associations with SCZ in males from both the testing data set (corrected P=0.034) and the validation data set (corrected P=0.048), whereas rs73832925 showed no association with SCZ in females or males from either data set (Table 2). In addition, significant associations were observed between cognitive performance and both rs61203457 and rs2276939 (Supplementary Table S2).
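
The reported OR, CI and P-values for rs2276939 in the testing set can be checked against each other with a short back-of-envelope calculation. The sketch below is not the CLUMP procedure used in the study. It assumes that the 95% CI is a Wald interval on the log-odds scale and that the Bonferroni correction covers the seven polymorphic SNPs; both are assumptions made here for illustration, as are the function and variable names.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% CI


def wald_from_or_ci(odds_ratio, ci_low, ci_high):
    """Recover the Wald z statistic and two-sided P-value from an OR and its 95% CI."""
    # Assumption: the CI is symmetric on the log scale (Wald interval).
    se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = math.log(odds_ratio) / se_log_or
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# OR and CI reported in the text for the G allele of rs2276939 (testing set).
z, p = wald_from_or_ci(1.22, 1.07, 1.38)

N_TESTS = 7  # assumption: correction over the seven polymorphic SNPs
p_bonferroni = min(1.0, p * N_TESTS)

print(f"z = {z:.2f}, P = {p:.4f}, Bonferroni-corrected P = {p_bonferroni:.3f}")
```

Under these assumptions the recovered values should fall close to the reported P=0.002 and corrected P=0.015; small differences are expected because the published OR and CI are rounded.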

Table 2 Gender-specific allele and genotype frequency of single SNP association analysis

Imputation and haplotypic association analysis

A total of 2109 SNPs were imputed successfully. After the quality control parameters were applied (minor allele frequency ⩾0.01, certainty ⩾0.8), 22 imputed SNPs within a 500-kb genomic region covering the H2AFZ gene were included in the association test. The complete results of the association test based on the typed and imputed SNPs are summarized in Supplementary Table S3. Four SNPs showed a significant association after Bonferroni correction (0.05/22=0.00227): three imputed SNPs (rs28893880, rs17029667 and rs28375729; imputed P=0.00155284, 0.00154899 and 0.00155329, respectively) in the DNAJB14 gene and a typed SNP (rs2276939, imputed P=0.00219188) in the H2AFZ gene. The regional association plot for the 1000 Genomes CHB and JPT data sets is presented in Figure 2. Based on the regional LD plot (Supplementary Figure S1), the four SNPs (three imputed SNPs and a typed SNP) were probably in high LD with one another. For the haplotype-based association analyses, the LD structure in the genotype data of seven SNPs was examined and two haplotype blocks were identified: a three-SNP block (Block 1, rs61203457–rs2276939–rs79002079) and a two-SNP block (Block 2, rs73832925–rs3756087) (Figure 3). Haplotypic association analyses were performed to test the two LD blocks (Table 3), and significant P-values (global P<0.001 and P=0.002, respectively) were obtained for both blocks. Some haplotypes in the two LD blocks were positively associated with SCZ. For example, the CGT haplotype in Block 1 and the TG haplotype in Block 2 were significantly associated with SCZ (P<0.001 and P=0.026, respectively) (Table 3). It is worth noting that the frequency of the CGT haplotype in Block 1 was increased nearly fivefold in males but only 1.3-fold in females. These results further support the significant association between rs2276939 and SCZ, particularly in males.
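
The Bonferroni threshold quoted above can be reproduced directly from the numbers in the text. The following Python lines are a minimal check, not part of the original analysis pipeline; the variable names are illustrative.

```python
ALPHA = 0.05
N_TESTS = 22  # imputed SNPs passing quality control, as stated in the text

threshold = ALPHA / N_TESTS  # Bonferroni-corrected significance threshold

# P-values reported in the text for the four SNPs.
p_values = {
    "rs28893880": 0.00155284,
    "rs17029667": 0.00154899,
    "rs28375729": 0.00155329,
    "rs2276939": 0.00219188,
}

# A SNP passes the correction when its P-value is below the threshold.
significant = {snp: p < threshold for snp, p in p_values.items()}

print(f"threshold = {threshold:.5f}")
for snp, passed in significant.items():
    print(snp, "passes" if passed else "fails")
```

Note that rs2276939 clears the threshold by a narrow margin (0.00219 against 0.00227), which is worth keeping in mind when weighing the imputation-based result.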

Figure 2

The regional association plot for both imputed and genotyped single-nucleotide polymorphisms (SNPs) in 1000 Genomes CHB+JPT data. This plot shows association results of imputed and genotyped SNPs within a 500-kb genomic region covering the H2AFZ gene. The diamonds represent the imputed and typed SNPs. The horizontal axis is the genomic context of the region studied (along with the genes). The left vertical axis represents negative logarithm of the P-value, and the right vertical axis is the recombination frequency of the region. A full color version of this figure is available at the Journal of Human Genetics journal online.

Figure 3

Estimation of linkage disequilibrium (LD) between each pair of seven single-nucleotide polymorphisms (SNPs) genotyped in the H2AFZ gene in Han Chinese individuals. LD structure (D′) between marker pairs was indicated by the shaded matrices. A full color version of this figure is available at the Journal of Human Genetics journal online.

Table 3 Common haplotype frequency and association analysis

Meta-analysis

For the meta-analysis of the rs2276939 SNP with SCZ, the current study was combined with another independent case–control study by Jitoku et al.13 No significant heterogeneity was detected on conducting homogeneity tests (χ2=2.19, d.f.=2, P=0.33); therefore, the combined data used in the meta-analysis were based on a fixed-effect model. A significant difference in the G allele of rs2276939 was found between patients and controls (total OR=1.15, 95% CI=1.08–1.23, Z=4.19, P<0.0001) (Figure 4).
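
The heterogeneity and pooled-effect statistics reported above can be cross-checked with a few lines of Python. This is a consistency sketch, not a reproduction of the RevMan analysis: it assumes the pooled 95% CI is a Wald interval on the log-odds scale, and the I² value it computes is an additional standard summary that the text does not report. All variable names are illustrative.

```python
import math

q_stat, df = 2.19, 2  # heterogeneity chi-square and d.f. reported in the text

# For exactly 2 d.f. the chi-square upper tail has the closed form exp(-x / 2).
p_heterogeneity = math.exp(-q_stat / 2)

# Higgins' I^2: share of total variation attributed to heterogeneity (floored at 0).
i_squared = max(0.0, (q_stat - df) / q_stat)

# Pooled OR and 95% CI reported in the text.
pooled_or, ci_low, ci_high = 1.15, 1.08, 1.23

# Assumption: Wald interval on the log scale, so the SE follows from the CI width.
se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.959964)
z_overall = math.log(pooled_or) / se_log_or
p_overall = math.erfc(z_overall / math.sqrt(2))  # two-sided P-value

print(f"heterogeneity P = {p_heterogeneity:.2f}, I^2 = {100 * i_squared:.1f}%")
print(f"overall Z = {z_overall:.2f}, P = {p_overall:.1e}")
```

The heterogeneity P-value recovered this way agrees with the reported P=0.33, and the recovered Z should sit near the reported 4.19, with the remaining gap attributable to rounding of the published OR and CI.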

Figure 4

Meta-analysis of the association of the rs2276939 single-nucleotide polymorphism (SNP) in the H2AFZ gene with schizophrenia. A full color version of this figure is available at the Journal of Human Genetics journal online.

Discussion

In this study, evidence of an association between the rs2276939 SNP and SCZ is presented. Although Jitoku et al.13 did not find a similar significant association, they did report the T allele as the minor allele, in contrast to the G allele identified in the present study. Therefore, their interpretation of the association is likely to differ from ours. To guard against false-positive associations arising from the small sample size of a single case–control study, a second-stage study was conducted in 4998 subjects to follow up the findings from our first-stage study of 3404 subjects. The association between rs2276939 and SCZ was replicated and additionally confirmed by the positive findings from our imputation-based analysis. Several quality control criteria, including a threshold on the average maximum posterior probability, were applied in this study to reduce the potential effects of imputation inaccuracy on the association tests. The results of the haplotype analysis also indicated a significant association between rs2276939 and SCZ. Furthermore, rs2276939 showed significant allelic and genotypic associations with SCZ in males from the two-stage study, which was consistent with the study conducted by Jitoku et al.13 on the Japanese population. Gender-specific differences in SCZ susceptibility have been reported previously, for example, in the catechol-O-methyltransferase (COMT) gene22 and the PDE4B gene;8 however, the present data should be carefully interpreted because the gender–genotype biological interaction has not been verified experimentally. Despite some similarities in the general association patterns of the present study and the study by Jitoku et al.,13 the discrepancies between them can possibly be explained by differences in ethnicity, genetic heterogeneity and sample size.

The rs2276939 SNP identified in the present study is located in a high-LD block covering exons 3 and 4 of the H2AFZ gene. Given the potential effects of the LD block on the regulation of gene transcription, it is reasonable to assume that the significant association identified in our study would be of interest for future studies. Three SCZ-associated imputed SNPs in the DNAJB14 gene were found to be more statistically significant than rs2276939 in the imputation analyses. DNAJB14 is a single membrane-spanning J-protein located in the endoplasmic reticulum, capable of enhancing the endoplasmic reticulum-associated degradation of misfolded membrane proteins via the ubiquitin–proteasome system.23 A potential association with SCZ in the region of the DNAJB14 gene cannot be completely ruled out, despite the lack of evidence in the study of Jitoku et al., which was of low power with only two SNPs. In addition, based on large meta-analyses of SCZ genome-wide association studies,24, 25 the DNAJB14 gene was found to be only 1.3 kb away from the H2AFZ gene, the closest gene to a significantly associated SNP (rs13194053). Thus, we hypothesized that some SNPs could influence the expression levels of the DNAJB14 gene unpredictably, or they might be in LD with other undiscovered SNPs that are part of the regulation machinery conferring the risk for SCZ. All of our findings should be considered preliminary, and additional follow-up studies, including high-density mapping and targeted deep sequencing to uncover the fundamental characteristics of pathogenic mutations and any potential associations with SCZ, are required.

Recently, endophenotype approaches have been used in SCZ research, suggesting that endophenotypes are heritable and capable of segregating with the known risk loci of SCZ.26 Therefore, these endophenotype approaches could increase the chances of detecting the susceptibility loci in SCZ. The results in the present study revealed a significant association between two SNPs in the same LD block, rs61203457 and rs2276939, and performance on the Wisconsin Card Sorting Test. However, the molecular mechanisms of cognitive deficits are very complex, and the role of many genes, pathways and signaling molecules in these processes has been confirmed. Potential interactions with these processes are worth investigating further. In addition, some potential limitations of our study should be noted. First, other possible factors leading to population stratification cannot be ruled out completely and might affect our results, although only subjects with no migration history over the past three generations were recruited. Second, other potential confounding variables such as clinical symptoms (positive or negative), education duration, medical history and drugs administered during treatment cannot be ruled out conclusively. Moreover, the significance of the SNPs identified in this study for cognitive function or other neuropsychological functions in healthy individuals is also unclear. Therefore, our results should be replicated in other ethnic groups, and the effects of the H2AFZ gene, particularly the rs61203457 and rs2276939 SNPs, on executive function should be further investigated.

In summary, our study provides further evidence that supports the association between the H2AFZ gene and SCZ. However, the mechanisms involved in the etiology of SCZ remain poorly understood. For eventual clinical application, the present findings should be replicated, and the pathological mechanisms of the H2AFZ gene and its functional roles in SCZ should be elucidated, given the multiple variants with small effects and the molecular basis of the associations in the complex network underlying the etiology and pathophysiology of SCZ.