Genome wide study of tardive dyskinesia in schizophrenia

Tardive dyskinesia (TD) is a severe condition characterized by repetitive involuntary movement of orofacial regions and extremities. Patients treated with antipsychotics typically present with TD symptomatology. Here, we conducted the largest GWAS of TD to date, by meta-analyzing samples of East-Asian, European, and African American ancestry, followed by analyses of biological pathways and polygenic risk with related phenotypes. We identified a novel locus and three suggestive loci, implicating immune-related pathways. Through integrating trans-ethnic fine mapping, we identified putative credible causal variants for three of the loci. Post-hoc analysis revealed that SNPs harbored in TNFRSF1B and CALCOCO1 independently conferred three-fold increase in TD risk, beyond clinical risk factors like Age of onset and Duration of illness to schizophrenia. Further work is necessary to replicate loci that are reported in the study and evaluate the polygenic architecture underlying TD.


Introduction
Tardive Dyskinesia (TD) is a persistent and potentially debilitating involuntary movement disorder characterized by choreiform, athetoid, and or dystonic movements 1,2 . TD is largely caused by antipsychotic treatment and was first described in 1957 3 . Although commonly observed in patients with schizophrenia, TD can occur in individuals with other psychiatric disorders, as long as they have been similarly exposed to prolonged antipsychotic treatment. The prevalence of TD in schizophrenia is estimated to be between 15 and 30%, although rates of TD have reduced with prescription of second-generation antipsychotics [4][5][6] . Nevertheless, second-generation antipsychotics still carry with them a risk of developing TD, and TD remains a clinically relevant phenotype as it has been associated with more severe schizophrenia illness, cognitive impairments, lowered quality of life and increased mortality [7][8][9][10] .
The pathophysiology of TD is currently unknown. Theories pertaining to dopamine receptor hypersensitivity, serotonergic dysfunction, GABA insufficiency, and free radical damage have been put forth 2,11,12 . It is postulated that TD is related to the schizophrenia disease process; recent reports have linked basal ganglia (caudate nucleus) volume reductions to TD in schizophrenia; notably, these samples were on second-generation antipsychotic-suggesting the pathological process towards TD is beyond neurochemical properties 13 . As TD is potentially irreversible with no effective treatment, there needs to be an added emphasis on the prevention and identification of genetic risk factors. Several genes (e.g.-, DRD2, DRD3, MnSOD, CYP2D6, GRIN2A, and GRIN2B) have been implicated in candidate gene studies, but replication remains equivocal [14][15][16][17][18] . Genome-Wide Association Studies (GWAS) performed on the TD phenotype suggested that the GLI family zinc finger 2 (GLI2), heparan sulfate proteoglycan 2 (HSPG2), dipeptidyl-peptidase 6 (DPP6), and GABA pathway genes could putatively be considered susceptibility genes for TD [19][20][21][22][23] . Nonetheless, these studies are limited by small samples and require further replication. Here, we report the findings of the largest GWAS of TD to date. We identified a novel locus at 16q24.1 (P = 3.01 × 10 −8 ) and three suggestive loci (1p36.22, 6q23.2, 12q13.13; P < 5 × 10 −7 ) associated with TD.

Methodology
Participants from the Singapore Clinical and Translational Research program in Singapore, and the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) [24][25][26] study were included in the current report. All participants met criteria for DSM-IV diagnosis for schizophrenia. Tardive dyskinesia was ascertained via the Abnormal Involuntary Movement Scale (AIMS) 27,28 . After quality control procedures, 1406 participants (280 with TD), and 6,291,020 SNPs remained. Linear Mixed Models GWA was performed via GEMMA 29 , and independent cohorts across the two studies were meta-analyzed via fixed effects inverse variance approach in METAL 30 . GWAS summary statistics were then subjected to functional annotation for GWAS identified loci 31 , eQTL lookups, pathway analysis 32 , and transcriptome-wide analysis 33 , which were conducted to characterize potential biological mechanisms underlying TD. Further finemapping analysis was also carried out to identify credible causal variants [34][35][36] . A series of polygenic risk score analyses were conducted to compare the genetic architecture of TD with related phenotypes 37 . We also carried out a series of post-hoc multivariate logistic regression analysis to investigate joint clinical and genetic factors that predict the emergence of TD. Finally, variants previously associated with TD were compared with the current GWAS results 11,18,22,23,[38][39][40][41][42][43][44][45][46] . Detailed methodological approaches are further reported in the Supplementary Information included with the current report.

Demographics and assessment of tardive dyskinesia
Demographics are reported in Table 1 and Supplementary Table 1. There were 71.1% males and 28.9% females with TD. There was no significant difference in gender proportion between individuals with TD and those without, χ 2 (1, n = 1406) = 0.142, P = 0.706. There were significant differences in age, t(1404) = −14.06, P = 4.0 × 10 −42 , age of illness onset, t(392.7) = −3.06, P = 0.0024, duration of illness, t(1390) = −11.28, P = 2.76 × 10 −28 , and antipsychotic dose measured in CPZ equivalent, t(516.3) = 3.73, P = 0.002 between individuals with and without TD. These differences in demographics and clinical characteristics were further modelled using a polygenic risk score approach that allows us to examine genetic effects for TD alongside demographics effects; results are reported in subsequent sections.

Gene-based and pathway analysis
MAGMA was used to conduct both the gene-based and geneset analysis 32 . Gene-based analysis tests for association   of TD with markers within the gene by mapping the SNPs to gene level, while geneset analysis aggregates individual genes to a collection of genes with overlapping characteristics 28 . Results of MAGMA gene-based analysis on 18,259 mapped autosomal genes after Bonferroni correction are presented in Supplementary Table 5. None of the genes were significant after Bonferroni correction. MAGMA competitive geneset analysis 32 was conducted using the most recent Molecular Signature Database version 6.1 48 . Although none of the 17,199 genesets survived Bonferroni correction, top pathways implicated regulation of immunoglobulin production and immunoglobulin isotype switching, expression of chemokine receptors, and regulation of cell growth, and monocytes (Supplementary Table 6).
Transcriptome-wide analysis was implemented via MetaXcan 33 . Unlike eQTL lookups, the MetaXcan approach further incorporates evidence from GWAS summary statistics with genome-wide gene expression profiles from the GTex 49 database. This provides information of potential functional enrichment within a genomic locus. Here, we found lower expression of a transcript, EPB41L2, in the esophagus muscularis at Bonferronicorrected significance (P = 0.02, Supplementary Table 8).

Fine-mapping analysis
Cross-Ancestry fine mapping was performed on PAINTOR v3.1 34,35 to identify putative causal variants on the 4 loci (1p36.22, 6q23.2, 12q13.13, and 16q24.1). The 99% cumulative posterior probability identified a total of 231 putative credible causal SNPs across three loci (1p36.22, 12q13.13, and 16q24.1; Supplementary Table 9, Supplementary Fig. 8). From these, highly credible SNPs were defined as posterior probability > 0.8. This identified a putative causal variant for each locus: 1p36.22 (rs499646, posterior probability = 0.99), 12q13.13 (rs4237808, posterior probability = 0.863), and 16q24.1 (rs28468398, posterior probability = 0.952). Notably, for loci on 1p36.22 and 12q13.13, the index SNP identified in GWAS is also a putative causal SNP identified by PAINTOR (See Fig. 3). Further annotations via the Variant Effect Predictor 50 revealed that (i) rs499646 is an intronic variant 3500 bp from the promotor of TNFRSF1B and is a known protein-coding variant and appears to be a loss-of-function intolerant variant, (ii) rs4237808 is an intergenic variant between ATP5G2 and CALCOCO1, but lies within 1000 bp of two CTCF binding sites, and 3200 bp from a known promotor flank, and (iii) rs28468398 is a regulatory region variant, which lies within a transcription factor binding site on CTC-786C10.1/GSE1 gene. Notably, additional joint finemap annotations revealed that all three variants were enriched by known brain level gene expression sites close by (See Fig. 3). GCTA-COJO 36 revealed no further signals present in the loci beyond independent variants identified via LD clumping or fine mapping (Supplementary Fig. 9).

Polygenic risk modelling of TD with other diseases and traits
Polygenic risk score modelling via PRSice2 37 (Supplementary Fig. 10) revealed best polygenic association of TD with amyotrophic lateral Sclerosis (ALS), albeit only two PRS thresholds remained significant after Bonferroni correction (P T = 0.5; P T = 1.0). ALS PRS-based pathway analysis indicated "misfolded protein" as a top pathway shared between TD and ALS ( Supplementary Fig. 11), although this remains a trend finding. We followed up on the possibility that there might be pleiotropic genetic effects of ALS and TD, given that both conditions implicated motor neurons. A post-hoc lookup of ALS variants and genes was carried out on GWAS catalog. We filtered GWAS SNPs previously reported to be associated with ALS based on p < 1e-6 and extracted corresponding mapped genes. These were then looked up in the MAGMA gene results reported earlier. We found that three previously known ALS genes, FBXO15, FAM19A1, and NP5 genes were nominally associated with TD genes (TD Gene P-values: FBXO15: 0.0210, FAM19A1: 0.0356 and NPS: 0.0455). Cross-Trait polygenic risk scores were also conducted on other psychopathological or autoimmune traits-such as schizophrenia, bipolar disorder, Alzheimer's disease, Parkinson's disease, depression, rheumatoid arthritis, and Crohn's disease but were not significant after multiple testing correction. It is, however, notable that aspect of the schizophrenia and rheumatoid arthritis did appear to be close to the multiple correction threshold for P T < 0.05.

Discussion
To our knowledge, here we report the largest GWAS for TD. A single novel locus at 16q24.1 was found to be associated with TD. We also identified three loci (1p36.22, 6q23.2, and 12q13.13) with suggestive evidence of association to TD. Of these, putative causal variants were identified for three of the loci. The top GWAS hit at 16q24.1, in the GSE1 coiled-coil protein gene, encodes for a proline rich protein which was reported to be a subunit of BRAF35-HDAC (BHC) histone deacetylase complex 51 . This gene has been known to be implicated in the proliferation, migration, and invasion of breast cancer cells 52 . A lookup in the GWAS catalog revealed GSE1 was also associated at GWAS significance with platelet count and distribution 53 , and suggestive GWAS significance with amyotrophic lateral sclerosis 54 and sulfasalazine-induced agranulocytosis 55 . GSE1 contributes to a geneset that are predicted targets of a microRNA biomarker for schizophrenia 56 .
Other suggestive associations with TD included TNFRSF1B (1p36.22), EPB41L2 (6q23.2), and CAL-COCO1 (12q13.13). The tumor necrosis factor receptor superfamily member 1B (TNFRSF1B) is a protein-coding gene that mediates antiapoptotic signaling. Expression of TNFRSF1B is specific to cells in the immune systems, specific neuronal subtypes, certain T-cells subtypes, and endothelial cells 57 . This appears to be supported by results from the competitive geneset enrichment analysis, albeit nonsignificant, revealed top pathways that implicated immunoglobulin production, chemokine receptors, and monocytes. These findings appear to support existing theories on the role of immune and inflammation in the pathogenesis of TD 11,12 .
The erythrocyte membrane protein band 4.1 like 2 (EPB41L2) is involved in actin and cytoskeletal protein binding. More recently, EPB41L2 deficiency was found to result in myelination abnormalities in the peripheral nervous system, leading to motor neuropathy in a mice study 58 . This putative association of EPB41L2 deficiency is consistent with the direction of effect found in our GWAS results, suggesting that the expression of EPB41L2 might confer a protective effect against motor neuropathy. Functional enrichment in and around the EPB41L2 based on the MetaXcan finding is intriguing, further research is needed to dissect the function of EPB41L2 in TD pathophysiology.
The calcium binding and coiled-coil domain 1 (CAL-COCO1) is a protein-coding gene that is involved in the activation of transcriptional activities of targets genes in the Wnt signaling pathway, neuronal receptor and aryl hydrocarbon receptor (AhR) 59 . Notably, one function of the AhR, a ligand-based transcription factor, is the regulation of transcriptional activity for drug metabolizing enzymes, including the family of cytochrome P450 (CYP) genes 60 . The family of CYP has been postulated to be a candidate for TD susceptibility 11 . Specifically, CYP enzymes, such as CYP1A2, CYP2D6, and CYP3A4, metabolize antipsychotics (e.g., clozapine, olanzapine, and haloperidol) and antidepressants (e.g., fluvoxamine) 60 . Meta-analysis of CYP1A2 from past and current study revealed that this gene was significantly associated with TD at trend level (P < 0.05; Supplementary Table 10).
Follow-up post-hoc investigation showed that individuals that were on Typical antipsychotics were expectedly more likely to be a TD case. We also demonstrated that clinical factors such as Age of Onset and Duration of illness of schizophrenia also significantly predicted TD. However, beyond clinical factors, top SNPs within CAL-COCO1 and TNFRSF1B were responsible for accounting for nearly three times the risk of having TD and that joint clinical and genetic model accounted for greater than 75% of variance for individuals having TD and were on atypical antipsychotics. Nevertheless, further investigation is required to confirm the replicability and generalizability of the observed phenomenon.
Polygenic risk score analysis on traits with polygenic architecture revealed overlapping genetic polygenic risk of TD and ALS. The shared protein misfolding pathway between TD and ALS is consistent with findings from our MAGMA pathway analysis. Post-hoc lookups of known ALS genes that might be associated with TD revealed that an F-Box protein-encoding gene FBXO15, chemo/neurokine encoding gene FAM19A1, and neuropeptide encoding gene, NPS. Proteinopathies commonly implicated in neurodegenerative conditions have been proposed to also underlie TD 61,62 . Prior reports have implicated the accumulative role of proteinopathies in neurotoxicity, synaptic dysfunction, and its bidirectional effect with oxidative stress, neuroinflammation, and its consequence on the immune system 61,62 . We do note, however, that genetic overlap between TD and other seemingly related traits is generally weak, such as neurodegenerative conditions and autoimmune conditions. Nevertheless, Schizophrenia and Rheumatoid Arthritis were close to the multiple correction threshold. Future work in more powered samples may be carried out to investigate how the potential biological mechanisms for ALS, rheumatoid arthritis, and schizophrenia might be related to TD. We note that potential ancestry differences in LD patterns and allele frequencies of the target and training dataset could result in poorer polygenic prediction. The generally small pharmacogenomic datasets utilized in the context of this report, and the use of multiple reference panels (i.e., HapMap and 1000 genomes project) could have contributed to the variability of the results. Nevertheless, pharmacogenomic datasets such as those reported here continue to allow more clues to be shed regarding the potential biological mechanisms that underlie complex clinical effects that arise from the prescription of psychotropic medication.
Emergent results from the current report demonstrate the polygenic architecture of TD. While the current study remains underpowered for GWAS analysis (Supplementary Fig. 12), we present the largest GWAS of TD to date. The findings reported here suggest that multiple overlapping biological systems might contribute to the etiopathogenesis of the condition. Taken together, results suggest that TD is associated with significant proteinopathies, disrupted neuronal function within the brain and potentially including muscular innervation, existing in a background of inflammation. Further work is necessary to identify the cascade of biological disruptions and events that trigger TD symptomatology. Further work is needed to replicate findings in the current report and further unravel the biology of these risk variants and pathways identified.