A genome-wide association study of anorexia nervosa suggests a risk locus implicated in dysregulated leptin signaling

We conducted a genome-wide association study (GWAS) of anorexia nervosa (AN) using a stringently defined phenotype. Analysis of phenotypic variability led to the identification of a specific genetic risk factor that approached genome-wide significance (rs929626 in EBF1 (Early B-Cell Factor 1); P = 2.04 × 10−7; OR = 0.7; 95% confidence interval (CI) = 0.61–0.8) with independent replication (P = 0.04), suggesting a variant-mediated dysregulation of leptin signaling may play a role in AN. Multiple SNPs in LD with the variant support the nominal association. This demonstrates that although the clinical and etiologic heterogeneity of AN is universally recognized, further careful sub-typing of cases may provide more precise genomic signals. In this study, through a refinement of the phenotype spectrum of AN, we present a replicable GWAS signal that is nominally associated with AN, highlighting a potentially important candidate locus for further investigation.

DSM-IV. Both types are characterized by below-normal weight and restricted food intake. Individuals diagnosed as restricting type do not experience binge-eating episodes and do not engage in purging, such as vomiting or use of laxatives. Standard quality controls measures were applied, specifically, excluding potential cryptic relatedness and checking for population stratification (details described elsewhere 8 ). The average age of onset of the case subjects was 16.3 years with a standard deviation (SD) of 3 years (Interquartile Range; IQR = 16 (14)(15)(16)(17)(18)). The control group included 3,570 female matched healthy adolescents of NHE ancestry that had an average age of 18.3 years at the time of data analysis (SD = 5.7; IQR = 19(13-23)) (Supplementary Table 1). Associations were assessed with 507,999 SNPs genotyped on either Illumina HumanHap550 or Human610-Quad BeadChips in an additive model using logistic regression analyses with principal components adjustment, based on the principal component analysis of cases and controls (Supplementary Figure 1), resulting in significantly low level of genomic control inflation factor of 1.03 (Supplementary Figure 2). The analysis yielded one SNP (rs929626) with a P value of 2.04 × 10 −7 and 4 other SNPs with marginally larger P values that are in strong linkage disequilibrium (r 2 > 0.8); these SNPs were selected for further analysis (Supplementary Figure 3; Supplementary Table 2).
Using imputation analysis based on data from the 1000 Genomes Project (Phase I integrated variant set, v2, March 2012), we subsequently tested associations with SNPs (imputed info > 0.5, minor allele frequency (MAF) > 0.05) located in a 200-kb window centered on the SNP rs929626. We observed association with a series of markers around this region, of which 34 SNPs supported suggestive associations (P < 1.0 × 10 −6 ) with both imputed and genotyped SNPs, which were in high LD with AN ( Fig. 1; Supplementary Table 3). This suggests that the single markers demonstrating nominal association in the GWAS are likely to be true positives.
We further explored this finding using the meta-analysis results from 15 previously reported AN cohorts 10 . Interestingly, two SNPs were also nominally significant (rs929626 with P = 0.037 and rs17543752 with P = 0.05) in the same direction as in the GWAS (Table 1). Meta-analysis results in a P value of 1.52 × 10 −7 .
We next used the ENCODE project 14 data to predict possible functional effect of the SNPs identified in this study. The top SNP, rs929626, and other significant markers located in the 6th intron of the EBF1 gene (Early B-Cell Factor 1), as well as two SNPs (rs113252656 and rs1081071) flanking the top SNP rs929626 at r 2 > 0.5 function as binding sites for EBF1 itself (HaploReg v4.1; ref. 15). This suggests that these genetic variants may modulate the expression of EBF1. Indeed, we observed a positive correlation with the rs929626 C allele carriers compared with TT homozygotes on the EBF1 expression level in nine independent subjects (the FPKM Figure 1. Region of genome-wide nominal association at 5q33.3. Regional plot of the EBF1-associated interval for the imputation analysis. Foreground shows scatter plot of the −log 10 P values plotted against physical position of human reference hg19. Background shows estimated recombination rates plotted to reflect the local LD structure. The color of the dots represents the strength of LD between the top SNP (rs929626) and its proxies (red, r 2 ≥ 0.8; orange, 0.8 > r 2 ≥ 0.6; green, 0.6 > r 2 ≥ 0.4; blue and navy, r 2 < 0.4). Genes, position of exons, and direction of transcription from UCSC genome browser (http://genome.ucsc.edu) are noted. value for TT homozygotes (3 subjects) versus C allele carriers (6 individuals) is 5.0 versus 6.4) with both whole genome sequencing data of blood and corresponding RNA-Seq data of heart right ventricle selected from the Pediatric Cardiac Genomics Consortium cohort (dbGaP Study Accession: phs000571.v3.p2). By using the Genotype-Tissue Expression Portal database (http://www.gtexportal.org), we also observed nominally significant expression quantitative trait loci (eQTLs) association (P = 0.0024, tested in 97 samples) in the putamen for rs929626 in the same direction. A few comorbid psychiatric disorders have been linked with the function of the putamen, such as anxiety, obsessive-compulsive disorder and attention deficit-hyperactivity disorder [16][17][18] . Taken together, these suggest the minor allele C carriers have relatively higher EBF1 expression.

Discussion
EBF1 encodes a transcription factor that originally thought to function as necessary for the development of the immune system 19 , but it has since been shown to regulate the development of both osteoblast and adipocyte lineages [20][21][22] . Two EBF1 variants, rs11953630-T and rs9313772-T, showed significant association at genome-wide level (P < 5 × 10 −10 ) in a study testing blood pressure in European whites 23,24 . In addition, rs17056278-C was also identified as a metabolic risk allele, interacting with psychosocial stress to contribute to increased hip circumference (P = 3 × 10 −8 ) 25 . However none of these is in LD with any markers in our identified locus. In animal studies, Ebf1−/− mice showed increased adipose tissue within marrow, whereas peripheral white adipose tissue was severely reduced. Circulating levels of leptin, a hormone released by adipocytes and one of the major players in food intake regulation, were also decreased in Ebf1−/− mice compared with controls 26 . This concurs with the reported generalized loss of accumulation of subcutaneous and visceral adipose accompanied by significant increases in yellow marrow in AN patients 27,28 . Also notable is the finding that circulating levels of leptin are very low in AN patients 29,30 and a decline in levels of circulating leptin can lead to changes in brain activity in areas involved in regulatory, emotional, and cognitive control of appetite 5 .
Understanding the genetics of AN is currently a major within-field initiative, in parallel to other neuropsychiatric/neurodevelopmental disorders such as schizophrenia, bipolar disorder, and autism spectrum disorders. Although the clinical and etiologic heterogeneity is universally recognized, in practice, many studies still failed to account for sample heterogeneity. In this study, by focusing on individuals with AN who have not crossed over to BN or BED, we have identified a marginally replicating GWAS signal that approached genome-wide significance. One limitation of our study is that all participants may not yet have experienced the full course of their eating disorder (The average duration of follow-up was 8.6 years with a SD of 7.0 years in the discovery cohort, while the average crossover time was 2.8 years with a SD of 2.6 years for the excluded AN patients), and a portion of the sample may develop BN or BED at later stages of illness. This would represent a conservative bias and underscores the importance of further investigation of this locus in the future focusing on individuals with lifetime AN who have never crossed over to other eating disorder presentations.

Methods
Discovery data set and quality control. We conducted a GWAS using data from our previously published study 8 consisting of 1,033 AN cases by excluding 212 patients with AN who experienced diagnostic crossover during the course of their illness (i.e. migrated from or to binge-eating disorder (BED) or bulimia nervosa (BN) as assessed with the Structured Interview for Anorexic and Bulimic Disorders 11 ) plus 100 patients without such information. A total of 692 female AN cases and 3,570 female matched controls that were carefully selected from Center for Applied Genomics (CAG) database were included in the analysis after Standard quality controls, namely, excluding potential cryptic relatedness and checking for population stratification by using the PLINK software 31 version 1.90a. The Research Ethics Board of CHOP and other participating centers approved the study. Informed consent was obtained from all adult participants and from a parent or legal guardian in the case of children and all work followed was in accordance with an IRB-approved protocol.
Association tests. For the genome-wide association analysis for SNPs, we utilized the PLINK software 31 version 1.90a, through Cochran-Armitage trend test.
Expression studies. The extended locus around associated SNP was then defined by identification of all SNPs showing r 2 > 0.5. Linkage disequilibrium (LD) was defined with the HaploReg v4.1 (ref. 15) based on Phase I of the 1000 Genomes project. Variants showing evidence of LD with associated AN variants were explored for impact on gene function via regulatory function (including eQTLs) by HaploReg v4.1, which both collate data from the Encyclopedia of DNA Elements (ENCODE) 14 . We also referred to the Genotype-Tissue Expression Portal database (http://www.gtexportal.org) for eQTLs analysis.