Introduction

Family studies of anorexia nervosa (AN) have consistently shown that first-degree relatives of AN sufferers have an increased risk of AN, compared with relatives of unaffected individuals.1, 2, 3, 4 Twin studies have estimated the heritability of AN at 56%,5 with the majority of remaining variance in liability attributed to non-shared environmental factors (38%).5

Three genome-wide association studies (GWAS) of AN have been conducted to date. The first comprised 1033 AN cases collected as part of the Price Foundation Genetic Study of Anorexia Nervosa and 3733 pediatric controls from the Children’s Hospital of Philadelphia.6 This study focused on common variation and identified 11 suggestive variants (P<1 × 10−5). None reached genome-wide significance in the primary analysis, although one variant (rs4479806) approached genome-wide significance in an associated secondary analysis. The second study (comprising 2907 cases and 14 860 controls) was carried out by the Genetic Consortium for AN, as part of the Wellcome Trust Case Control Consortium 3 (WTCCC3) effort.7 This study identified two suggestively associated variants (P<1 × 10−5). Notably, signals at P<1 × 10−5 were significantly more likely to have the same direction of effect in the replication as in the discovery cohorts (P=4 × 10−6), which implies that true signals exist within this data set, but that the study was underpowered for detection. Recently, a third study meta-analyzed samples from both of these studies, as well as some novel cases, comprising a total of 3495 cases and 10 982 controls. To our knowledge, this study identified the first genome-wide significant locus for AN (index variant rs4622308, P=4.3 × 10−9).8

These previous studies focused on common variation. Here, we conducted, to our knowledge, the first association study that also considered low-frequency (minor allele frequency (MAF)<5%) and rare exonic variants in addition to common variation.

Materials and methods

Sample collections

We conducted a GWAS across nine discovery data sets (the majority overlapping with Genetic Consortium for AN/Wellcome Trust Case Control Consortium 3 (WTCCC3) samples), resulting in a total of 2158 cases and 15 485 ancestrally matched controls (Table 1 and Figure 1). All AN cases were female. AN diagnosis was made via semistructured or structured interview, or population assessment strategy using Diagnostic and Statistical Manual of Mental Disorders (DSM)-IV criteria for AN. The amenorrhea criterion was not applied, as this has been shown not to be diagnostically relevant9 and has since been dropped from DSM-5.10, 11 All cases met criteria for lifetime AN.

Table 1 Final numbers of cases and controls after QC
Figure 1 Geographical distribution of samples across Europe. (a) Distribution of cases across Europe; 375 USA cases are not shown in this diagram. (b) Distribution of controls across Europe; 873 USA controls are not shown in this diagram.

Exclusion criteria included confounding medical diagnoses, for example, psychotic conditions, developmental delay or medical or neurological conditions causing weight loss.

Ancestry-matched controls were selected for each AN case set. Both male and female controls were used (Table 1). These were obtained either from existing collaborations, or through genotyping repository (dbGaP) access. Each site obtained ethical approval from the local ethics committee, and all participants provided written informed consent in accordance with the Declaration of Helsinki.

Population prevalence of AN in these populations ranged from 0.4 to 3% (refs 12, 13, 14, 15, 16, 17, 18; Table 1).

Genotyping

Cases were genotyped on either the ‘Infinium HumanCoreExome-12 BeadChip Kit’ (Illumina, San Diego, CA, USA),19 or the ‘Infinium HumanCoreExome-24 BeadChip Kit’ (Illumina),20 at the Wellcome Trust Sanger Institute. Where possible, controls were selected from existing studies with matching genotyping platforms to cases. Three control cohorts had been genotyped on the ‘Infinium HumanExome-12 BeadChip Kit’ (Table 1). To ameliorate potential confounding due to chip effects,21 chip-type quality control (QC) was carried out, and ~14 000 single-nucleotide polymorphisms (SNPs) were removed.

Quality control

Genotypes were called using the GenCall22 and Zcall23 algorithms. At each of these genotype-calling stages, QC was performed for each population and for cases and controls separately (Supplementary Table 1). The final number of SNPs included in the analyses is given in Table 2.

Table 2 Final number of SNPs per population

Controlling for population stratification

In order to account for population stratification, a principal components analysis was carried out for each cohort separately using the smartpca software.24

Population outliers were identified by merging each population with central European 1000 Genomes data.25

Variance explained by each PC was plotted for each population. In order to be both conservative and consistent across populations, the first 10 principal components were included as covariates in the association testing.

Association testing

Unbalanced case–control ratios can lead to anticonservative P-value estimates.26 This study includes a number of unbalanced strata (Table 1). The likelihood ratio test has been shown to have low type-I error rate across both balanced and unbalanced cohorts,26 and was chosen as the association test for this study.

A lower cutoff of minor allele count of 5 and MAF of 0.1% was used. Association testing was performed for each cohort separately using SNPtest.27 In the cohorts with mixed sex controls (all except Italy and Norway), sex was also included as a covariate.

The standard genome-wide significance threshold of P<5 × 10−8 was applied.

Meta-analysis

Summary statistics across cohorts were meta-analyzed using an inverse variance-based test in METAL.28 In order to test the heterogeneity of the results, Cochran’s Q and the I2 statistic were computed.
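As an illustration of this step, the following is a minimal sketch of a fixed-effects inverse-variance meta-analysis with Cochran’s Q and I2, assuming per-cohort log odds ratios and standard errors are available. The function name and the input values are hypothetical and chosen for illustration only; this is not the METAL implementation.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effects inverse-variance meta-analysis with heterogeneity statistics.

    betas: per-cohort effect estimates (e.g. log odds ratios)
    ses:   per-cohort standard errors
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    # Pooled effect is the weight-averaged per-cohort effect
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided P-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled effect
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I2: percentage of variation attributable to heterogeneity (floored at 0)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"beta": beta, "se": se, "z": z, "p": p, "q": q, "i2": i2}


# Hypothetical inputs for three cohorts (illustrative values, not study data)
result = inverse_variance_meta([-0.20, -0.15, -0.25], [0.08, 0.10, 0.12])
print(result)
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate is dominated by the larger strata.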

Assigning variants to genes

Variants associated at P<1 × 10−4 were assigned to genes using Ensembl (release 83; Ensembl Genome Browser).29, 30 For each variant, all predicted consequences (for example, missense, non-synonymous, and so on) and associated gene transcripts were downloaded and compared. Each variant was associated with only one predicted consequence and one Ensembl gene ID (Ensembl Genome Browser).29

Cluster plot checking

Cluster plots were created for all SNPs reaching P<1 × 10−4 in any analysis (cohort-specific or meta-analysis) using ScatterShot.31 SNPs were visually inspected for each cohort, and for cases and controls separately. In instances where multiple cohorts were merged (for example, UK cases), cluster plots were checked separately for each original cohort.

Burden testing

The potential aggregation of rare variants in cases compared with controls was investigated using a gene-based approach. Burden tests were carried out using the Zeggini–Morris burden test32 as implemented in rvtests (Rvtests - Genome Analysis Wiki).

All SNPs with MAF between 0.1 and 5% were included; similar to the single-point analysis, a lower bound of minor allele count=5 was used. A list of genes and locations was obtained from the UCSC genome browser (Table Browser: www.genome.ucsc.edu). All genes with at least two qualifying variants in at least two populations were used, resulting in a total of 9083 genes.

Burden tests were carried out for each population individually, and the results meta-analyzed using Stouffer’s method, weighted according to effective sample size.33
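The weighting scheme can be sketched as follows. This is a minimal illustration, assuming one-sided per-population P-values, the common effective sample size definition 4/(1/Ncases + 1/Ncontrols), and weights equal to the square root of the effective sample size; these specifics are assumptions for the example and are not stated in the text. Function names and inputs are hypothetical.

```python
import math
from statistics import NormalDist

_NORM = NormalDist()


def effective_sample_size(n_cases, n_controls):
    """Effective sample size for an unbalanced case-control stratum (assumed definition)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def stouffer_weighted(p_values, n_eff):
    """Combine one-sided P-values with Stouffer's method, weighted by sqrt(N_eff)."""
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("P-values must lie strictly between 0 and 1")
    weights = [math.sqrt(n) for n in n_eff]
    # Convert each P-value to a Z-score (upper tail)
    z_scores = [_NORM.inv_cdf(1.0 - p) for p in p_values]
    z = sum(w * zi for w, zi in zip(weights, z_scores)) / math.sqrt(
        sum(w * w for w in weights)
    )
    return 1.0 - _NORM.cdf(z)


# Hypothetical per-population gene-level P-values and cohort sizes
n_eff = [effective_sample_size(200, 2000), effective_sample_size(150, 900)]
print(stouffer_weighted([0.01, 0.20], n_eff))
```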

The genome-wide significance threshold for burden testing is computed in a similar manner to that for single-point analysis, using Bonferroni correction for the number of genes tested. This results in a genome-wide significance threshold of 5.5 × 10−6.
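For concreteness, the threshold can be reproduced with a short calculation, assuming a family-wise error rate of 0.05 (an assumption consistent with the reported value) and the 9083 genes tested.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# 9083 genes were tested; alpha = 0.05 is assumed here
threshold = bonferroni_threshold(0.05, 9083)
print(f"{threshold:.2e}")  # 5.50e-06
```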

Pathway analysis

One of the key motivations of studying complex psychiatric disorders such as AN is the desire to unearth biological pathways underlying disease development. Pathway analysis was performed using summary statistics from the meta-analysis for the full data set.

Four pathway databases were used: the Kyoto Encyclopedia of Genes and Genomes (KEGG),34, 35 the Reactome pathway database (REACTOME),36 PANTHER pathway (PANTHER)37, 38 and the Gene Ontology database (GO).39, 40 These were curated to remove redundancy, resulting in a total set of 1836 pathways.

The analysis was run once on a merged set of 235 KEGG,34, 35 REACTOME36 and PANTHER37, 38 pathways, and once for the 1601 GO pathways.39, 40

Pathway analysis was carried out using MAGMA.41 MAGMA was selected for its ability to deal robustly with linkage disequilibrium (LD) between markers, correct for gene length and deal accurately with rare variants. First, MAGMA was used to annotate SNPs to genes. This annotation was carried out twice. In the first analysis, variants were assigned only to the gene they were in, resulting in 68.73% of the variants being assigned to 13 400 genes. In the second analysis, variants were assigned allowing a 20 kb window in both directions from the gene. This procedure included 75.44% of variants across 18 118 genes.

SNP P-values were used to create gene scores. The European panel of the 1000 Genomes project was used as a reference set to estimate LD between SNPs. The analysis also requires the sample size of the study to be specified; because of the unbalanced nature of the study, the effective sample sizes were given here.

Gene P-values were calculated using MAGMA.41 The top 10% of SNPs per gene were used. Significance was defined using a false discovery rate of 5%.42
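A false discovery rate threshold of this kind can be illustrated with the Benjamini–Hochberg step-up procedure. The sketch below assumes that procedure; the text does not state which false discovery rate method was applied, and the function name and input values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the original input order."""
    m = len(p_values)
    # Indices of P-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone q-values
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical pathway P-values (illustrative values, not study data)
pathway_p = [0.01, 0.04, 0.03, 0.5]
q = benjamini_hochberg(pathway_p)
significant = [qi < 0.05 for qi in q]
print(q, significant)
```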

There is a risk when assigning SNPs to genes using MAGMA that some highly associated SNP might be assigned to multiple overlapping genes, and thus distort pathway results. SNP–gene assignments were checked for all pathways that reached false discovery rate-corrected significance. No instances of SNPs being assigned to multiple genes were found across these pathways.

Replication

SNPs reaching P<1 × 10−4 in the discovery stage were prioritized for replication. In total, 16 SNPs were selected.

Replication was carried out using two data sets: one existing in silico data set and one set for de novo genotyping. The in silico data set came from an existing GWAS of AN,6 genotyped on the Illumina HumanHap610 platform. This data set included 1033 cases and 3733 controls. All cases included in this study were female. Controls were both male and female. The de novo replication cohort consisted of 266 self-volunteered female UK cases, collected through the charity Charlotte’s Helix (www.charlotteshelix.net). All participants were adults and had been diagnosed with AN by their clinician. In addition, all participants completed an online questionnaire based on the Structured Clinical Interview43 for the Diagnostic and Statistical Manual of Mental Disorders-IV Module H. The Structured Clinical Interview has been used extensively in epidemiological investigations. The Structured Clinical Interview eating disorder module was modified to capture information on lifetime history of eating disorders including AN, and includes questions on body mass index, age of onset, and experience of eating disorders. DNA from the saliva samples was extracted using standard protocols and was quantified using pico-green. Samples were genotyped on the Infinium HumanExome 12 Beadchip; genotypes were called using the GenCall and Zcall algorithms, and stringent QC was performed pre- and post-call. In all, 1500 ancestry-matched controls (55% female) were obtained from the UK Household Longitudinal Study.

De novo genotyping was performed using the iPLEX Assay and the MassARRAY System (Agena Bioscience, San Diego, CA, USA) (formerly Sequenom). Sample and SNP QC were carried out within each replication data set, using an 80% sample call rate and a 90% SNP call rate threshold, and a Hardy–Weinberg equilibrium threshold of 10−4. Five samples and one SNP were removed using these criteria.
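The Hardy–Weinberg filter can be illustrated with a simple one-degree-of-freedom chi-square test on genotype counts. This is a sketch only: the text does not state whether a chi-square or an exact test was used, and the function name and genotype counts are hypothetical.

```python
import math


def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """Chi-square (1 df) P-value for deviation from Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 df
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical genotype counts; a SNP fails QC if the P-value is below 1e-4
p_value = hwe_chi_square_p(120, 100, 41)
passes_qc = p_value >= 1e-4
print(p_value, passes_qc)
```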

Post-QC, 15 SNPs and 261 de novo cases remained. The de novo replication analysis therefore included 15 SNPs, 261 cases and 1500 controls. Genotypes for 10/16 SNPs were available in the in silico replication cohort, across 1033 in silico cases and 3733 controls.

Expression analysis

Gene expression data were obtained from the Genotype-Tissue Expression (GTex project) web portal, data release version 6 (dbGap Accession phs000424.v6.p1).44, 45, 46

Power

The sample sizes used in this study are small in the context of other psychiatric phenotypes. Power to identify genome-wide significant signals was calculated using Quanto.47, 48 This study is adequately powered to detect low-frequency alleles with large effect sizes and common alleles with substantial effect sizes (80% power to detect common alleles with odds ratio (OR)>1.5; low-frequency alleles with OR>2, Supplementary Figure 1).

Data availability

Genotypes of European cases included in this study are publicly available through the European Genome-Phenome Archive (EGA), under accession number EGAS00001000913, data set EGAD00010001043, with the exception of German and Dutch genotypes. Genotypes for cases from the United States of America may be obtained through dbGaP. Summary statistics are available for download from the PGC website (https://www.med.unc.edu/pgc/results-and-downloads).

Results

GWAS and replication meta-analyses

Association testing was performed separately for each of the nine discovery cohorts within this study (2158 cases, 15 485 controls), and the results were meta-analyzed. No inflation was seen in the QQ plot (Figure 2b). Six variants were identified with P<1 × 10−5, and nine additional variants with P<1 × 10−4 (Figure 2a and Supplementary Table 5). Of these, one variant approached genome-wide significance (exm860538/rs199965409, P=9.97 × 10−8), although this variant is polymorphic only in the Finnish population within these data sets, in the Exome Aggregation Consortium49 and in the 1000 Genomes project panel data.25 Variants with P<1 × 10−4 were taken forward for replication.
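Inflation in a QQ plot is commonly summarized by the genomic inflation factor λ, the ratio of the median observed chi-square statistic to the median expected under the null. The following is a minimal sketch of that calculation from a list of P-values, assuming one-degree-of-freedom tests; the function name and the input are hypothetical.

```python
from statistics import NormalDist, median

_NORM = NormalDist()


def genomic_inflation(p_values):
    """Genomic inflation factor (lambda) from two-sided P-values of 1-df tests."""
    # Convert each P-value back to a 1-df chi-square statistic
    chi2 = [_NORM.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Median of a 1-df chi-square distribution (approximately 0.455)
    expected_median = _NORM.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


# Evenly spaced P-values mimic the null distribution, so lambda is close to 1
null_like = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(null_like), 3))
```

Values close to 1 indicate no inflation; values below 1, such as the λ=0.94 reported for this study, indicate slightly conservative test statistics.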

Figure 2 Results from discovery-phase meta-analyses. (a) Manhattan plot for meta-analyzed P-values, across all nine populations. (b) QQ plot (λ=0.94).

In total, 16 independent variants were selected for follow-up in one in silico cohort (1033 cases, 3733 controls) and one de novo genotyping cohort (261 cases, 1500 controls). Of these, five were low-frequency (MAF ~1%) and 11 were common variants.

Twelve signals passed QC and were polymorphic in the de novo genotyping cohort, of which four were nominally significant (Supplementary Table 6; P<0.05, minimum P=0.001). Eight of twelve SNPs had the same direction of effect as in the discovery GWAS, including three of the four nominally significant variants.

Ten of the sixteen variants were present in the in silico cohort, of which seven had the same direction of effect as in the discovery cohort, and one of these seven was associated with P=0.02 (Supplementary Table 7).

On the basis of the number of SNPs taken forward for replication, we would not expect to see any variants reaching P<0.05 by chance. We also see a higher concordance in direction of effect between discovery and replication cohorts (7/10 in the in silico analysis, 8/12 in the de novo analysis) than might be expected by chance; however, the number of SNPs tested was too small to achieve statistical significance (P=0.17, P=0.19, one-sided binomial test).
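The two concordance P-values can be reproduced with a one-sided binomial test under a null probability of 0.5 for a matching direction of effect. The sketch below uses only the counts given above; the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, prob=0.5):
    """One-sided binomial P-value: P(X >= successes) for X ~ Binomial(trials, prob)."""
    return sum(
        comb(trials, k) * prob ** k * (1.0 - prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )


p_in_silico = binomial_upper_tail(7, 10)  # 176/1024
p_de_novo = binomial_upper_tail(8, 12)    # 794/4096
print(f"{p_in_silico:.2f} {p_de_novo:.2f}")  # 0.17 0.19
```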

Five SNPs had the same direction of effect across the meta-analyzed discovery cohort and both replication cohorts. No SNPs reached genome-wide significance in the final global meta-analysis. Two variants were associated with the same direction of effect across discovery and replication cohorts, and reached P<0.05 in at least one replication cohort (Table 3).

Table 3 Global meta-analysis results

rs10791286 was associated with risk for AN across all discovery and replication cohorts (Figure 3a, global P=9.89 × 10−6, OR 0.84, 95% confidence interval 0.78–0.91). It resides in intron one of the opioid-binding protein/cell adhesion molecule-like (OPCML) gene. Data from the CommonMind Consortium project indicate that this variant is an eQTL for OPCML in the dorsolateral prefrontal cortex, and is associated with reduced expression (P=0.014 after correction for multiple testing).50 OPCML has a role in opioid-binding and opioid receptor function51, 52 and is expressed in a range of neuronal tissues, primarily the cerebellum and cerebellar hemispheres.44, 45, 46 OPCML has previous associations with body mass index,53 waist–hip ratio,54 visceral fat distribution55 and alcohol dependence,56 among other phenotypes. The variant itself has no previously reported associations in any phenotype.

Figure 3 Odds ratios for two notable single-nucleotide polymorphisms (SNPs) across discovery and replication cohorts. (a) rs10791286 and (b) rs7700147.

rs7700147 was associated with AN across all discovery and replication cohorts (global P=2.93 × 10−5, OR 1.2, 95% confidence interval: 1.1, 1.3; Figure 3b). It is an intergenic variant and has no previous associations.

Burden testing

Burden testing allows the contribution of multiple low-frequency variants to be aggregated across discrete units (for example, genes). Three genes were identified with P<1 × 10−4, although none reached genome-wide significance (Table 4). A further five genes reached P<1 × 10−4, but passed inclusion thresholds in one population only (Table 4), and as such are likely to be false-positives.

Table 4 Burden test results

FAM96A has previously been associated with low-density lipoprotein levels and cholesterol57 and is primarily expressed in the liver, lymphocytes and adrenal gland.44, 45, 46 KIF7 has no previous phenotype associations and has generally low expression across a wide range of tissues.44, 45, 46 C6orf10 has previous associations with visceral fat55 and childhood obesity,58 as well as a number of autoimmune disorders.59, 60, 61, 62, 63, 64 C6orf10 is expressed in testes44, 45, 46 (see Discussion).

Biological pathways associated with AN

Allowing a 20 kb window for SNP to gene assignment identified two pathways significant at q<0.05: ‘Phospholipase activator’ and ‘GTP-rho binding’ (Table 5).

Table 5 Pathway analysis results for full data set

Using the strictest assignment method of SNPs to genes for the full data set, no pathways were significant after multiple-testing correction. The highest ranking pathway was ‘Calcium ion import’ (q-value=0.069).

Discussion

To our knowledge, this work constitutes the first examination of low-frequency (<5% MAF) and rare exonic variation in AN in the context of a genome-wide scan. No replicating low-frequency or rare variant associations were identified, although this study was well powered to detect low-frequency variants with large effect sizes (Supplementary Figure 1). Although polymorphic only in the Finnish population, rs199965409 approached genome-wide significance. It is a non-synonymous variant with a MAF of 0.5% in the Finnish population.65, 66 The variant is within the WDR11 gene, which is associated with hypogonadotropic hypogonadism 14 with or without anosmia.67, 68, 69 The clinical features of the disease, such as delayed sexual maturation,68, 70, 71 suggest that it may be misdiagnosed as or comorbid with AN, which may explain its association in the analysis.

Two notable, but common, signals were identified with consistent direction of effect across discovery and replication cohorts (rs10791286 and rs7700147). These variants had been removed from the first Genetic Consortium for AN/Wellcome Trust Case Control Consortium 3 (WTCCC3) AN GWAS because of poor cluster plots; therefore, we were not able to compare effect sizes between studies. Burden tests to investigate an aggregation of rare variants within genes yielded three potentially interesting genes, which require further replication.

Studying rare variation presents a range of challenges. The sample sizes required to identify rare variants with modest effect sizes are substantially larger than for common variants. Further, the MAF spectra seen across trans-European populations differ more for rare variants than for common variants, especially when considering genetically distant populations such as Finland and Italy.25 This can reduce the power to detect a signal and achieve replication. There are also many technical challenges to consider when conducting a rare variant study; for example, the inflation seen in association tests at low minor allele count26 and the increased error rate of calling algorithms when applied to rare variants.22, 23, 72 We mitigated the latter challenge by comprehensively examining cluster plots of >10 000 variants that surpassed a P-value threshold of P<1 × 10−4 in any analysis.

Of the genes potentially implicated through the single-point and burden test analyses, three have associations with metabolic and anthropometric phenotypes (OPCML, C6orf10 and FAM96A). OPCML has previously been associated with waist-to-hip ratio, while C6orf10 has associations with childhood obesity.58 FAM96A has been shown to be associated with metabolic phenotypes such as low-density lipoprotein and cholesterol levels. The associations of these three genes with metabolic and obesity-related phenotypes may indicate some role for metabolic processes in AN development, although pathway analysis did not corroborate this observation. A growing body of evidence suggests involvement of metabolic processes in AN development, including appetite-satiety pathways, gut motility and gastric-emptying times.73, 74, 75, 76, 77, 78, 79 For example, application of the LD Score regression method revealed significant negative genetic correlations between AN and body mass index, insulin, glucose, and lipid phenotypes and significant positive genetic correlations between AN and HDL cholesterol phenotypes.1, 80

Notably, C6orf10 has been previously associated with childhood obesity.58 This finding is particularly interesting for a number of reasons. First, appetite and satiety dysregulation have been shown to be central to the development of childhood obesity.81, 82 In particular, reduced satiety responsiveness (experiencing an urge to eat despite internal ‘full’ signals) and heightened responsiveness to food have a role in increased adiposity. Aberrant responses to satiety signals and reduced responsiveness to food are also operative in AN, suggesting shared biological dysregulation between the two conditions.83, 84 Children with increased adiposity are at higher risk of eating disorders85 as they are more likely to engage in high-risk behaviors such as repeated and excessive dieting and erratic or overly rigid eating patterns.85, 86, 87, 88, 89 These children are also at higher risk of being bullied about their weight, which may increase weight and shape concerns, body dissatisfaction and a host of related risk factors for AN development.85, 86, 87, 88, 89, 90, 91, 92

The most significant pathway analysis association was with phospholipase activator pathways, which act to catalyze the hydrolysis of glycerophospholipids (GO:0016004 phospholipase activator activity). Phospholipase has a central role in the serotonin-triggered metabolism of arachidonic acid in the brain,93, 94, 95 which is a common target for mood stabilizers94, 95 such as lithium, carbamazepine (Tegretol), valproate and lamotrigine (Lamictal).96 These drugs have been shown to have varying efficacy in treating AN.97, 98, 99 Lithium has been used in treatment of AN (with varying success),97, 98, 99 while carbamazepine and valproate have been successfully used in individuals with complex comorbid eating disorder phenotypes.100, 101, 102, 103, 104 Finally, lamotrigine has been shown to significantly improve eating disorder and mood symptoms in individuals with binge-eating and purging behaviors.105

The second pathway identified as significantly associated with AN was GTP-rho binding. This pathway has a role in brain development, and is regulated by autism-susceptibility candidate gene 2 (AUTS2).106 This finding is consistent with the comorbidity between AN and autism.107 Moreover, individuals with AN may be socially withdrawn107 and exhibit elevated levels of autistic traits associated with lower social functioning.107, 108, 109 AUTS2 has also been well studied as a candidate gene for alcohol abuse,110 which is commonly comorbid with eating disorders.111 There is also a well-established link between GTP-rho activation and cognition.112 Mice with altered expression of genes regulating Rho-GTPases have been shown to have altered exploratory and anxiety-related behavior, decreased sociability and memory formation, and decreased body weight, among others.112 These findings are in line with some of the comorbidities and intermediate phenotypes noted in AN, for example, the high comorbidity with anxiety-related disorders.113

There is substantial evidence for the involvement of chromatin-modulating genes in the development of autism,114, 115, 116, 117, 118, 119 schizophrenia120, 121, 122, 123, 124 and body mass index changes.114 Given the comorbidity of these disorders with AN, and the potential overlap with autism indicated in the pathway analysis results, we tested for enrichment of chromatin-modulating genes in these results. We obtained a list of 340 genes involved in modifying chromatin accessibility and/or modifying histone marks from existing literature; of these, 30 reached nominal significance in our burden test, substantially more than expected by chance (binomial test, P=0.0026). Moreover, one of the variants identified in the global meta-analysis (exm540361) lies near a gene included in this list (UHRF1BP1). Together, these results may indicate a role for chromatin-modifying genes in AN, although more work will be needed to investigate this further.

A number of limitations should be borne in mind when evaluating these results. First, the sample size of this study is small. Psychiatric disorders in general require very large sample sizes in order to identify reliable genome-wide significant signals.125 The current study was powered to detect common variants with substantial OR, and rare variants conferring substantial increases in disease risk (OR>2). To our knowledge, this was the first time a study has specifically investigated the role of rare variation in AN, and the lack of replicating low-frequency findings may indicate that few advances are likely to be made in this particular genomic search space.

We did not see any overlap between the pathways identified here and those identified in the recent PGC pathway analysis;126 however, this may reflect the relatively small sample size of this study, as well as different pathway analysis methodologies used.

In this study we only examined female AN cases of European origin. It has been suggested that the genetics underlying AN development may be easier to assess in an all-male study,3 as there may be a greater genetic risk required to induce trait expression. The higher relative risk in male subjects may also reflect this.3 To date, this has not been possible because of the lower prevalence of the disorder in men, resulting in substantially smaller sample sizes. Moreover, if AN is heterogeneous between populations, in order to fully understand the genetic etiology of the disorder, it will be necessary to expand collection to include more diverse samples. Efforts are already underway in a number of Asian populations such as Taiwan, Japan, Korea and China, as well as some South American populations such as Argentina and Brazil.

A caveat to this study is that controls were not screened for AN, and that both male and female controls were used. Given the population prevalence of AN across populations of European descent, ~80 female and ~10 male controls would be expected to have AN diagnoses. Given the low rate of treatment seeking in AN,127 it would not be possible to confidently screen population-based or previously existing control cohorts for AN.

The underlying biological etiology of AN is complex and has not been elucidated yet. Here we have identified a number of variants that warrant follow-up in larger sample sizes, and which point to a role for metabolic, appetite-related and obesity-related effects, in line with a growing body of evidence for metabolic involvement in AN development. Substantially increased sample sizes and detailed phenotyping to reduce heterogeneity will be necessary to empower the characterization of the genetic architecture of AN.