Introduction

Outcomes of psychotic disorder show substantial between-individual variation,1, 2 and genetic factors have been suggested to play a part in this.3, 4 Recent genome-wide association studies have suggested that the risk of psychosis may be conveyed by multiple genes of small effect, and that the ensemble of variants involved may be different for each individual.5 Although it is still uncertain whether these genetic variants define subgroups within psychosis,6 or symptoms that might be shared across multiple disorders,7 it remains likely that an individual’s genetic composition will also predict certain psychosis characteristics and outcomes.8

The single-nucleotide polymorphism (SNP) rs1344706 (A/C) in the second intron of the ZNF804A gene (OMIM 612282) on 2q32.1 is the most widely supported of all the schizophrenia risk factors identified by genome-wide association studies.9 However, the A allele of rs1344706, the risk variant in this gene, has consistently shown stronger associations when replicated in patient samples combining schizophrenia and bipolar disorder.10, 11 Although it is known that the ZNF804A gene encodes a zinc-finger family protein called ZNF804A, its function remains poorly characterised.

Despite putative associations with the risk of psychotic disorder, there has been little research to date on associations between ZNF804A and clinical outcomes. Capitalising on a data linkage between a large genetic database and an electronic mental-health record, we investigated the association between the ZNF804A genotype and two measures of early clinical outcomes: duration of in-patient admission and number of in-patient episodes. We hypothesised that the A allele, proposed to be associated with increased risk of developing psychosis, would also be associated with worse outcomes over a 2-year period following the first clinical presentation.

Materials and methods

Sample characteristics

The sample analysed comprised patients with first-episode psychosis from the Genetics and Psychosis (GAP) study.12, 13, 14 All cases were recruited between 2004 and 2010 at their first presentation to the South London and Maudsley NHS Foundation Trust: a large, near-monopoly provider of mental-health care to a geographic catchment of ~1.2 million residents in southeast London. All GAP participants were aged between 18 and 65 years, and met ICD10 diagnostic criteria for functional psychosis (F20–29 or F30–33). Diagnoses were validated using the Schedules for Clinical Assessment in Neuropsychiatry (WHO 1992) administered by trained researchers. Ethical approval for the GAP study was granted by the South London and Maudsley and Institute of Psychiatry Research Ethics Committee, and all participants provided informed written consent.

Data linkage

GAP study participant data were linked to the Clinical Record Interactive Search (CRIS) application, which derives full but de-identified data from the electronic patient records used by South London and Maudsley NHS Foundation Trust. CRIS has been described in detail,15 and currently accesses over 250 000 mental-health records. All GAP participants had provided prior consent for access to their medical records. Identifier fields from 291 incident cases of first-episode psychosis were thus successfully linked with the clinical records database.

Genotyping

DNA for these 291 patients had previously been extracted from either a blood sample or a cheek swab, using a standard phenol–chloroform method. Genotype status at the rs1344706 locus within ZNF804A was ascertained using an off-the-shelf Taqman SNP genotyping kit (http://www.appliedbiosystems.com; C_2834835_10). Reactions were run on a 7900HT sequence detection system (Applied Biosystems, Foster City, CA, USA). All 291 cases used in the study yielded an unambiguous genotyping result (100% call rate). The distribution of genotypes at rs1344706 was within Hardy–Weinberg equilibrium (P>0.05) for the 291 cases.14
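The Hardy–Weinberg check reported above can be illustrated with a short sketch. The text does not state which test was used, so the chi-square goodness-of-fit test with 1 degree of freedom is an assumption here, as are the function name `hwe_chi_square` and the genotype counts, which are hypothetical and are not the study's data.

```python
import math


def hwe_chi_square(n_aa, n_ac, n_cc):
    """Chi-square goodness-of-fit test (1 df) for Hardy-Weinberg equilibrium.

    Returns (chi2, p_value). Assumes a biallelic locus with alleles A and C.
    """
    n = n_aa + n_ac + n_cc
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Frequency of the A allele estimated from the observed genotype counts.
    p = (2 * n_aa + n_ac) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("locus is monomorphic; the test is undefined")
    observed = (n_aa, n_ac, n_cc)
    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variate with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts chosen only to sum to 291; NOT the study's genotype data.
chi2, p_value = hwe_chi_square(n_aa=110, n_ac=135, n_cc=46)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

A P value above 0.05 from such a test would be read, as in the text, as no evidence of departure from equilibrium.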

Genetic ancestry was derived using a panel of 57 ancestry informative genetic markers. These were genotyped using the iPLEX technology developed for the MassArray platform (Sequenom, San Diego, CA, USA). Further information on the makeup of the marker panel is available on request. Ancestry scores were derived using the program STRUCTURE to implement a model-based (Markov Chain Monte Carlo) clustering algorithm. Having determined the best solution for K (the probable true number of underlying genetic groups in the GAP study sample), a three-way ancestral axis for black African, white Caucasian and Asian ancestry was created, using an ancestry score of 98% as the membership criterion for the reference group. Genetic ancestry was used as a covariate in the regression model because differences in rs1344706 allele frequencies have been observed between populations (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=1344706).

Assessment of outcomes

A follow-up period of 2 years was defined for each GAP participant from their first contact with mental-health services for psychotic illness. The date of first contact with mental-health services was extracted from CRIS. The primary outcome, defined a priori, was the time spent in mental-health in-patient care during the first 2 years. This outcome and a secondary outcome, the number of in-patient episodes during the same period, were derived from CRIS. Any outcome data recorded before 1 April 2006 were manually cross-checked because the electronic patient records were less comprehensively structured before that date.
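The derivation of the two outcomes can be sketched as follows. This is a minimal illustration and not the extraction logic used in CRIS: the function name `inpatient_outcomes`, the representation of an episode as an (admission date, discharge date) pair, the 730-day window and the clipping of episodes at the window boundaries are all assumptions.

```python
from datetime import date, timedelta

FOLLOW_UP_DAYS = 730  # assumption: "2 years" taken as 730 days


def inpatient_outcomes(first_contact, episodes):
    """Return (total in-patient days, number of episodes) within follow-up.

    `episodes` is a list of (admitted, discharged) date pairs; `discharged`
    may be None for a patient still in hospital. Episodes are clipped to the
    follow-up window, and only episodes contributing at least 1 day count.
    """
    window_end = first_contact + timedelta(days=FOLLOW_UP_DAYS)
    total_days = 0
    n_episodes = 0
    for admitted, discharged in episodes:
        start = max(admitted, first_contact)
        end = window_end if discharged is None else min(discharged, window_end)
        if end > start:
            total_days += (end - start).days
            n_episodes += 1
    return total_days, n_episodes


# Hypothetical patient, for illustration only.
example_episodes = [
    (date(2008, 1, 10), date(2008, 1, 20)),   # 10 days, inside the window
    (date(2009, 12, 20), date(2010, 1, 15)),  # clipped at the window end
    (date(2010, 2, 1), date(2010, 2, 10)),    # after follow-up, ignored
]
print(inpatient_outcomes(date(2008, 1, 1), example_episodes))
```

Clipping at the window end keeps the primary outcome bounded by the follow-up length, so a long admission near the end of follow-up contributes only the days that fall inside the 2 years.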

Covariates

The following additional data were extracted and used in analyses: age at presentation, gender and marital status (categorised into married/cohabiting or other). Information on the first functional psychosis diagnosis recorded in the health record was also obtained from CRIS and classified into schizophreniform (ICD10 F20–29 codes) or non-schizophreniform illnesses. Further exploratory analyses made use of structured data from the Health of the Nation Outcomes Scales (HoNOS),16 which are routinely completed by keyworkers in UK mental-health services for administrative returns. Specifically, where recorded, the highest scores during the 2-year surveillance period were derived on the following HoNOS scales: (i) agitated behaviour, (ii) cognitive impairment, (iii) hallucinations and delusions, (iv) depressed mood and (v) problems with activities of daily living. Subscales are rated by clinicians on a 0–4 ordinal scale indicating no problems (0), minor problems requiring no action (1), mild problems but definitely present (2), moderately severe problem (3) and severe to very severe problem (4). For the purpose of these analyses, HoNOS subscale scores were condensed to categorise the defined features as present (score 2–4) or absent (0–1).
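The condensing of HoNOS subscale scores described above can be written as a small helper. This is a sketch under stated assumptions: the function name `honos_feature_present` is hypothetical, missing ratings are represented as `None`, and a patient with no recorded rating on a subscale is returned as `None` rather than as absent.

```python
def honos_feature_present(scores):
    """Dichotomise one HoNOS subscale over the surveillance period.

    `scores` holds every rating (0-4) recorded for a patient on a single
    subscale; None marks a missing rating. The highest recorded score is
    taken, and the feature is classed as present (2-4) or absent (0-1).
    Returns None when nothing was recorded.
    """
    recorded = [s for s in scores if s is not None]
    if not recorded:
        return None
    if any(s not in (0, 1, 2, 3, 4) for s in recorded):
        raise ValueError("HoNOS subscale scores must be integers from 0 to 4")
    return max(recorded) >= 2


# Hypothetical ratings for one patient on one subscale.
print(honos_feature_present([0, 1, None, 3]))
```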

Statistical analysis

Data were extracted using SQL, and analyses were performed using Stata 12 (StataCorp, College Station, TX, USA). The distributions of sociodemographic and diagnostic variables by genotype were described in a series of bivariate analyses. Initial analyses compared the duration of in-patient care calculated for the 2-year surveillance period between rs1344706 genotypes AA, AC and CC. Genotypes were then coded to reflect rs1344706 A allele dosage. Linear regression models with bootstrapping were fitted to analyse duration of in-patient care, entering sociodemographic and diagnostic covariates separately and in combination to assess the impact of these adjustments on the association of interest. A post hoc power calculation for a linear regression analysis (based on duration of in-patient admission) indicated that a small-to-medium effect size (f²=0.035) could be detected at 90% power (alpha 0.05) with the linked sample of 291 cases.
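The allele-dosage coding and the bootstrapped regression can be illustrated with a simplified sketch. The analyses themselves were run in Stata with covariate adjustment; the version below is unadjusted, uses a single predictor, and reports a percentile bootstrap interval, which is an assumption since the text does not specify the interval type. The names `DOSAGE`, `ols_slope` and `bootstrap_slope_ci`, and the toy data, are hypothetical.

```python
import random
import statistics

# Number of A alleles carried by each rs1344706 genotype.
DOSAGE = {"CC": 0, "AC": 1, "AA": 2}


def ols_slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def bootstrap_slope_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap interval for the slope."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        # Resample patients (rows) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # slope is undefined when every resampled x is equal
        ys = [y[i] for i in idx]
        slopes.append(ols_slope(xs, ys))
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_boot)]
    upper = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return ols_slope(x, y), (lower, upper)


# Toy data for illustration only; NOT the study's data.
genotypes = ["CC", "AC", "AA", "AC", "CC", "AA", "AC", "AA", "CC", "AC"]
days = [12, 40, 95, 33, 0, 120, 58, 76, 20, 61]
dosage = [DOSAGE[g] for g in genotypes]
slope, (lower, upper) = bootstrap_slope_ci(dosage, days)
print(f"days per A allele: {slope:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

Bootstrapping is a natural choice for an outcome such as in-patient days, which is typically right-skewed, because the interval does not rely on normally distributed residuals.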

Results

As displayed in Table 1, among the study sample of 291 patients with first-episode psychosis, no differences were observed in age, gender or diagnosis by rs1344706 genotypes. The A allele was positively associated with single status, with higher mean black African ancestry score and with lower white Caucasian and Asian ancestry scores.

Table 1 Sample characteristics by the rs1344706 genotype

Of the 291 cases, 245 (84.2%) experienced at least 1 day’s in-patient care between initial presentation and the end of the follow-up period (Table 2). This proportion did not differ significantly between genotypes (χ²=0.27 (1 degree of freedom), P=0.60; Fisher’s Exact Test P=0.90), and the median number of admissions was one in all three genotypes. However, the mean total duration of in-patient admission was higher with increasing A allele dose. This was confirmed in linear regression models with bootstrapping, whether the sample was analysed as a whole or restricted to those with at least one in-patient admission, and the association was little altered in strength following adjustments (Table 3). Among those who experienced at least one in-patient admission, the fully adjusted coefficients indicated an increased total duration of in-patient admission of 38 days for each A allele increment.

Table 2 Risk of admission and total duration of in-patient admission by the rs1344706 genotype
Table 3 Association between number of rs1344706 A alleles and total duration of in-patient admission

In further exploratory analyses of characteristics derived from HoNOS scores recorded during the 2-year observation period, no significant group differences were observed between genotypes (Table 4), and no meaningful differences were observed in either the strength or statistical significance of associations between genotype and duration of in-patient stay following adjustment for these characteristics.

Table 4 Distribution of characteristics ascertained through HoNOS by the ZNF804A genotype

Discussion

Using linked genetic and clinical data on a sample with first-episode psychosis, we found evidence for a strong association between the ZNF804A rs1344706 A allele and longer cumulative duration of in-patient admissions during the 2 years following first clinical presentation. Although this SNP has been implicated as a risk factor for developing psychotic illness, to our knowledge, ours is the first demonstration of a relationship with subsequent clinical outcome.

The analyses presented here were planned and carried out a priori, and at the time of manuscript submission, ZNF804A rs1344706 was the only SNP to have been evaluated against these clinical outcomes in this cohort. This study sought to determine whether the newly emerged common genetic architecture of schizophrenia has potential clinical value beyond simply conveying liability to illness. Given that genome-wide association study data were not available for the majority of this cohort, the choice of an a priori candidate was central to the hypothesis. The most objective way to select the candidate in this case was a principled approach, where the most representative common genetic risk factor for schizophrenia was sought. The most replicated genome-wide association study signal for schizophrenia is attributable to the rs1344706 locus in ZNF804A, and this was therefore the archetypal common risk factor for psychosis (that is, schizophrenia and bipolar disorder). In addition, with the exception of ancestry informative markers (n~55; see Materials and methods), genetic characterisation of the GAP cohort was extremely limited at the time of analysis.

The plausibility of the finding and potential causal pathways require consideration. With regard to functionality, rs1344706 maps to the noncoding genome (that is, intron 2 of ZNF804A) and thus is likely to have regulatory effects on gene expression17 rather than consequences for protein structure.18 The gene demonstrates a temporally sensitive pattern of expression over the human lifespan, with maximum expression levels in utero.19, 20 Association of the rs1344706 risk locus with allelic expression has been reported, possibly specific to in utero development,20, 21 and is potentially attributable to splicing.20 However, it remains unclear whether the relationship between rs1344706 and expressed ZNF804A levels is sustained into adulthood. Allelic variation at rs1344706 reflects varying affinity of the gene for nuclear proteins.22 As nuclear proteins are the main factors that drive the transcription of genes, these subtle differences can also affect the compartmental ratios of one allelic transcript to another. This type of molecular disturbance has previously been shown to occur across the human brain and in relation to ZNF804A.23 As a putative transcription factor, ZNF804A has a large number of potential targets in both the adult brain and that of the developing fetus. Hence, the pathogenic consequences of ZNF804A expression may take hold in early neurodevelopment and would probably require the coordinated response of multiple different target genes. The net response of these ZNF804A targets may therefore explain the strong effect of genotype on the observed outcome (38 additional hospitalisation days for each A allele increment). In support of this, manipulation of ZNF804A expression in gene ‘knock down’ experiments suggests that regulatory variation within the gene could potentially have effects on broader transcriptional networks. Hill et al.24 demonstrated, using a whole-genome approach, that ZNF804A silencing can affect over 150 different genes that pertain to neuronal migration, neurite outgrowth and synapse formation.

Considering symptom severity and the nature of the underlying psychotic disorder as a potential reason for associations with in-patient outcomes, these effects of rs1344706 variation on gene expression may explain previously observed psychosis-related structural changes associated with the risk genotype, such as impaired connectivity between the prefrontal cortex and the hippocampus,25 as well as other cognitive26 and symptomatic27 effects. A combination of these and other dysfunctional processes attributed to rs1344706 might, therefore, explain our observation of prolonged hospitalisations in the high genetic risk subgroup. An exploratory analysis comparing HoNOS subscales did not indicate substantial differences between genotypes, although it is perhaps noteworthy that those with the AA genotype were more likely than those with the CC genotype to have cognitive problems or impaired activities of daily living, and were less likely to have agitation or hallucinations/delusions. This might indicate a higher propensity to negative rather than positive symptoms, although these differences were not significant and did not account for the observed association with the duration of in-patient stay; furthermore, they are not consistent with previous research that found higher levels of positive symptomatology in A allele carriers, but no difference in negative or disorganisation symptoms.27 However, it is important to bear in mind that these HoNOS-derived data were not specifically recorded in relation to index admissions and may therefore not adequately reflect causal pathways between gene expression and in-patient outcome.

Although use of in-patient services is an important marker of overall disease severity in psychosis, it is influenced by a number of other factors that could conceivably account for the association of interest if they also vary between genotypes. Differences in response to pharmacotherapy might also account for the observations. Previous research has suggested that rs1344706 may have a role in the genetics of antipsychotic response: two studies, of Chinese and Caucasian first-episode psychosis patients, found that remission of positive symptoms after 4 weeks of supervised antipsychotic medication was significantly reduced in people with the ZNF804A risk allele.28, 29 Unfortunately, we did not have specific data on treatment response with which to test this hypothesis, which requires further evaluation.

Considering other explanations, we investigated age, gender, relationship status, ancestry and diagnosis as potential covariates for this clinical outcome; none of these accounted for the associations of interest. Specifically, although ancestry scores showed substantial variation between genotypes, adjustment for ancestry, if anything, strengthened rather than attenuated the associations between genotype and in-patient outcomes. Causal pathway factors primarily determined by ancestry are therefore unlikely explanations. It was beyond the scope of the study to investigate other potentially relevant factors such as social support,30 substance use,31 employment32 or accommodation status,33 which have also been found to influence duration of in-patient admission. Data on the duration of untreated psychosis were also insufficient for analysis. Although service-level factors are important predictors of length of stay, these are unlikely to be important for genetic assays in a sample drawn from a single health-care provider. The fact that the rs1344706 A allele was associated with longer cumulative duration of in-patient care, but not with the presence or absence of an in-patient admission, suggests that it might modify psychotic disorder presentation during the acute phase, or its response to treatment, rather than affecting the likelihood of a relapse or symptom levels during remission. Consistent with this, the number of in-patient episodes also did not vary by genotype. However, although some studies have used admission count as an outcome measure for psychosis, this is often poorly correlated with other prognostic indicators,34 and might not have been a sensitive enough outcome during the 2-year observation period.

Psychiatric genetic studies often look at narrow and well-defined outcome measures, such as neuropsychological test scores, that have distant clinical application. A major contribution of this study was that the outcomes were directly drawn from clinical data within a relatively large and representative sample of ethnically and socially diverse people receiving routine clinical care for their first psychotic episode. Moreover, use of in-patient services as an outcome has a high face validity and a direct utility in service planning,35 with important implications for an individual’s recovery potential,36 despite the limitations noted above. The sample size was relatively large for a first-episode psychosis case series and was adequate to demonstrate the outcome of interest. The number of outcomes chosen for investigation was relatively small, and only a single genotype of interest had been examined at the time of analysis, reducing the likelihood of type 1 statistical error arising through multiple testing. Limitations, as discussed, principally concern the availability of data to investigate causal pathways between the genotype and outcome. Generalisability beyond the first-episode period cannot be assumed and would need further empirical investigation. Although we could not exclude a small proportion of the patients having had in-patient admissions outside the geographic catchment during the follow-up period, this is unlikely to have influenced the findings of interest. Assuming our finding is replicated, further research is required into mechanisms underlying the association we found. Key targets for further evaluation would include symptom profiles and severity experienced during a relapse, delays in presentation and/or intervention, treatment resistance and premorbid adjustment and social relatedness. It is likely that ZNF804A rs1344706 is not alone as a gene modifying clinical outcome, but that it acts in combination with other genetic and environmental factors. Identifying these combinations may substantially improve the potential to predict prognosis in psychotic disorders.