Anorexia nervosa (AN) is among the psychiatric disorders with the highest mortality rates [1]. According to the Diagnostic and Statistical Manual of Mental Disorders (DSM), the symptomatology of AN is diverse, including anthropometric symptoms, such as low body weight, and cognitive-behavioral symptoms, such as a strong fear of weight gain or persistent behavior to avoid weight gain, and body image/self-worth contingent on physical appearance [2]. The etiology of AN is multifactorial, with substantial environmental and genetic influences [1, 3, 4, 5, 6, 7]. AN is heritable, with SNP-based and twin-based heritability estimates of approximately 20% and 58%, respectively [1, 8, 9]. Understanding the genetic basis for each of the AN symptom domains could improve diagnosis and treatment.

Under the earlier diagnostic criteria of the DSM-IV, an AN diagnosis could only be given if the patient was clinically underweight [10]. With the implementation of the DSM-5 [2], those criteria were modified to “a significantly low body weight in the context of age, sex, developmental trajectory, and physical health”. Nonetheless, low body weight remains a hallmark of the disorder. The criterion for low body weight can make it difficult for patients to receive a diagnosis, and therefore treatment, if they exhibit other symptoms of the disorder but do not have significantly low body weight. In some cases, individuals with subsyndromal forms of AN presenting with a normal or above normal body mass index (BMI) may be diagnosed with atypical AN. Compared to individuals with an AN diagnosis, these atypical AN patients usually have a longer duration of symptoms and greater weight loss before diagnosis [11], and are less likely to receive inpatient treatment for their condition [12].

In addition to the role of low body weight as a diagnostic criterion, emerging genetic studies have identified a possible metabolic component to genetic risk for anorexia [1, 13, 14]. For instance, AN is negatively genetically correlated with BMI [15], triglyceride levels, and fasting insulin, and positively correlated with metabolic markers like HDL cholesterol [16]. Furthermore, BMI heritability is heavily concentrated in the tissues of the central nervous system, which are also directly involved in the cognitive and behavioral aspects of anorexia [17], leading to the hypothesis that metabolic factors are involved in the development of AN [18]. However, questions remain regarding the nature of the relationship between metabolic control and AN. It is hypothesized that AN and anthropometric factors, such as BMI, may share a common genetic basis due to the pleiotropic effects of genes that simultaneously influence both phenotypes. Alternatively, it is also possible that the observed association between the polygenic risk score for BMI (PRSBMI) and AN diagnosis is in part due to the weight-related diagnostic criteria for anorexia. For example, because the GWAS for AN is reliant on diagnostic criteria requiring a low BMI, it is possible that BMI is acting as a collider variable [19], potentially inducing genetic correlations between AN and BMI. If true, then genetic correlates of BMI may not be involved in AN psychopathology per se, but may instead be critical for understanding the extent and rate of weight loss that can occur as a consequence of AN. For example, patients in treatment for AN often return to very low weights even following full recovery, which may also be, in part, explained by genetic variations that contribute to increased metabolism and a lower genetic “set-point” for body weight [20]. Therefore, in the present study, we set out to illuminate the genetic association between AN and BMI using health record data over time from cases and controls collected in a biobank setting.

Materials and methods

Study population and anorexia phenotype

Our cohort included 123 genotyped female subjects from the Vanderbilt University Medical Center biobank (BioVU) with a lifetime diagnosis of AN, as determined by at least one ICD-9 (307.1) or ICD-10 (F50*) code for AN. BioVU is a repository of leftover blood samples (~240,000 samples) from clinical testing, which are sequenced, de-identified, and linked to clinical and demographic data [21]. All BioVU participants have provided informed consent. The VUMC Institutional Review Board oversees BioVU and approved this project (IRB#160302).

The prevalence of AN in the BioVU population was ~0.65%, which is similar to that observed in the general population (~0.9%) [22]. Control subjects (N = 615) were identified as those without ICD codes for AN. A ratio of 5:1 controls to cases was used, as the effect of increased statistical power is negligible above that ratio [23]. We restricted the sample to females due to the very small number of male AN cases with genotype data available (N = 2).

We assessed the relationship between AN genetic risk and both mean and lowest BMI, as mean BMI better summarizes long-term BMI over the lifetime, while lowest BMI better reflects biological extremes. We defined two separate control samples to test hypotheses related to mean and lowest BMI, each containing 615 female subjects (5:1 ratio of controls to cases) without the ICD-9 or ICD-10 codes for AN. The first set was age-matched to cases based on the median age across the medical record for each individual, for use in analyses involving mean BMI. Median age is used to represent the age of the individual while they were a patient at VUMC. The second control set was age-matched to cases based on the age at lowest recorded BMI, for use in analyses involving lowest BMI. Because lowest BMI is a single incident measurement, the corresponding age at lowest BMI is the most appropriate age variable.
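The text does not state which matching algorithm was used, so the following is only a minimal sketch of one common approach: greedy nearest-age matching without replacement at a 5:1 ratio. The function name `match_controls` and the identifiers in the usage note are hypothetical.

```python
def match_controls(case_ages, control_pool, ratio=5):
    """Greedy nearest-age matching without replacement (illustrative only).

    case_ages:    dict mapping case id -> matching age
                  (median age, or age at lowest BMI)
    control_pool: dict mapping control id -> the same age variable
    Returns a dict mapping each case id to a list of `ratio` control ids.
    """
    available = dict(control_pool)  # copy so the caller's pool is not modified
    matches = {}
    for case_id, age in case_ages.items():
        # Rank remaining controls by absolute age difference;
        # ties are broken by id so the result is deterministic.
        ranked = sorted(available, key=lambda cid: (abs(available[cid] - age), cid))
        chosen = ranked[:ratio]
        if len(chosen) < ratio:
            raise ValueError(f"not enough controls left for case {case_id}")
        for cid in chosen:
            del available[cid]  # each control is used at most once
        matches[case_id] = chosen
    return matches
```

Calling `match_controls` once with median ages and once with ages at lowest BMI would yield two control sets of the kind described above. Greedy matching depends on the order in which cases are processed; an optimal matching procedure may have been used instead.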

Calculation of BMI

For every individual in BioVU, age and BMI measurements were collected from their de-identified EHR. After quality controls (QC, see Supplementary Materials), mean BMI and lowest BMI were calculated for each individual.
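As an illustration of the per-individual summaries described here, the sketch below derives mean BMI, lowest BMI, median age, and age at lowest BMI from a list of (age, BMI) measurements. The input layout and the function name `summarize_bmi` are assumptions, and the QC steps from the Supplementary Materials are not reproduced.

```python
from statistics import mean, median

def summarize_bmi(records):
    """Summarize one individual's (age, bmi) measurements.

    records: list of (age_in_years, bmi) tuples, assumed to have passed QC.
    """
    if not records:
        raise ValueError("no BMI measurements for this individual")
    ages = [age for age, _ in records]
    bmis = [bmi for _, bmi in records]
    # The record with the smallest BMI also gives the age at lowest BMI.
    age_at_lowest, lowest_bmi = min(records, key=lambda r: r[1])
    return {
        "mean_bmi": mean(bmis),
        "lowest_bmi": lowest_bmi,
        "median_age": median(ages),
        "age_at_lowest_bmi": age_at_lowest,
    }
```

For an individual measured at ages 20, 22, and 25, the median age (22) and the age at lowest BMI can differ, which is why the two control sets were matched on different age variables.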

Generation of polygenic scores

We calculated polygenic risk scores (PRS) using PRS-CS [24] “auto” version (i.e., the global shrinkage parameter phi was learned from the data in a Bayesian approach) for each of the defined AN groups, as well as for each of the 66,914 BioVU individuals genotyped on the Illumina MEGA-EX array for further exploratory analyses. Genotyping and QC of this sample have been described elsewhere [21, 25]. We used GWAS summary statistics for AN from the largest available study (N = 72,517) [1]. For BMI, we used the female-stratified GWAS summary statistics from the GIANT Consortium and UK BioBank meta-analysis (N ~430,000) [26]. Both PRSAN and PRSBMI were z-score standardized.
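The standardization step can be sketched as follows. This assumes raw per-individual scores have already been produced by the PRS software; whether the sample or population standard deviation was used is not stated, so the sample version here is an assumption.

```python
from statistics import mean, stdev

def zscore(scores):
    """Return scores rescaled to mean 0 and (sample) standard deviation 1."""
    mu = mean(scores)
    sd = stdev(scores)  # sample SD (n - 1 denominator); an assumption
    if sd == 0:
        raise ValueError("all scores are identical; z-scores are undefined")
    return [(s - mu) / sd for s in scores]
```

After this transformation, a regression coefficient for a PRS is interpreted per one standard deviation of the score.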

Statistical analyses

We tested a total of eight multivariable regression models, including four logistic regression models and four linear regression models. The first ten principal components calculated from the genetic data were included in all models to control for residual population stratification. To account for multiple testing, we used a Bonferroni corrected p value of 6.25 × 10−3 (0.05/8) to determine statistical significance.

We first examined if the PRSAN was significantly associated with the diagnosis of AN; and if PRSBMI was significantly associated with both mean and lowest BMI. We then hypothesized that PRSAN would be associated with mean lifetime BMI and with lowest BMI, regardless of whether the diagnosis of anorexia is present. To test this hypothesis, we regressed PRSAN on mean BMI and, separately, on lowest BMI, while controlling for AN diagnosis.

Next, we tested whether the PRSBMI variable was associated with AN diagnosis, and then assessed whether that association remained after controlling for the lowest measured BMI. Similarly, we also investigated the effect of including a covariate for the lowest BMI when regressing PRSAN on the AN diagnosis.
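The logic of this conditional analysis can be illustrated on simulated data. The sketch below is not the study's code: it fits logistic regressions by Newton-Raphson in plain Python, omits the ten principal components, and uses invented effect sizes. In the simulation, the score influences diagnosis only through BMI, so its coefficient should shrink toward zero once BMI is added as a covariate.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_logistic(X, y, iters=25):
    """Logistic regression by Newton-Raphson; rows of X include the intercept."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += p * (1.0 - p) * xi[a] * xi[c]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

def simulate(seed=0, n=2000):
    """Simulated cohort: the score raises BMI, and only low BMI raises diagnosis odds."""
    rng = random.Random(seed)
    prs = [rng.gauss(0, 1) for _ in range(n)]
    bmi = [0.6 * g + rng.gauss(0, 0.8) for g in prs]  # standardized BMI
    dx = [1 if rng.random() < 1 / (1 + math.exp(1.5 + 1.5 * b)) else 0 for b in bmi]
    return prs, bmi, dx

prs, bmi, dx = simulate()
unadjusted = fit_logistic([[1.0, g] for g in prs], dx)[1]
adjusted = fit_logistic([[1.0, g, b] for g, b in zip(prs, bmi)], dx)[1]
print(f"log-odds per SD of score: unadjusted={unadjusted:.3f}, adjusted={adjusted:.3f}")
```

A coefficient that remains clearly non-zero after adjustment would instead point to a contribution beyond the measured covariate.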

Mediation analysis

We performed two mediation analyses. First, we tested a model in which PRSBMI was the exposure, lowest BMI was the mediator, and AN diagnosis was the outcome. Second, we tested a model in which PRSAN was the exposure, lowest BMI was the mediator, and AN diagnosis was the outcome. Bootstrapping (10,000 iterations) was used to generate confidence intervals and determine statistical significance. The analyses were performed using the mediation R package v4.5.0 [25]. Due to the limitations of EHR data, we were unable to conclusively determine the chronological order of lowest BMI measurement and AN diagnosis.
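To make the idea of a bootstrapped proportion mediated concrete, the sketch below uses a deliberately simplified stand-in for the R package's procedure: ordinary least-squares models (a linear probability model when the outcome is binary), the product-of-coefficients estimate, and a percentile bootstrap. The function names are hypothetical and covariates are omitted.

```python
import random
from statistics import mean

def _cov(u, v):
    """Sample covariance (n - 1 denominator)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def proportion_mediated(x, m, y):
    """Indirect effect (a * b) divided by the total effect of x on y."""
    vx, vm = _cov(x, x), _cov(m, m)
    cxm, cxy, cmy = _cov(x, m), _cov(x, y), _cov(m, y)
    total = cxy / vx  # slope of y ~ x
    a = cxm / vx      # slope of m ~ x
    # Coefficient of m in y ~ x + m (closed form for two predictors).
    b = (cmy * vx - cxm * cxy) / (vx * vm - cxm ** 2)
    return a * b / total

def bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the proportion mediated."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample individuals
        stats.append(proportion_mediated([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Here `x` would be the standardized PRS, `m` the lowest BMI, and `y` the diagnosis indicator. The default of 1,000 resamples keeps the pure-Python example fast; the analysis described above used 10,000.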

PheWAS analyses

In the BioVU sample (N = 66,914), we fitted a logistic regression for each of the 1335 disease phenotypes available to estimate the odds of a diagnosis of that phenotype given the PRSAN. Each disease phenotype (commonly referred to as a “phecode”; Phecode Map 1.2) was classified using EHR and ICD diagnostic codes to establish “case” status. For an individual to be considered a case, they were required to have two separate ICD codes for the index phenotype, and each phenotype needed at least 100 cases to be included in the analysis.
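The case definition can be sketched as a simple filter. The sketch assumes that “two separate ICD codes” means code instances on at least two distinct dates, which is one common reading but is not stated explicitly; the event layout, the function name `define_cases`, and the phenotype labels in any usage are hypothetical.

```python
from collections import defaultdict

def define_cases(events, min_codes=2, min_cases=100):
    """Build phecode case sets from coded events.

    events: iterable of (person_id, phecode, date) tuples, where each ICD
            code has already been mapped to its phecode.
    Returns a dict mapping phecode -> set of person ids that qualify as
    cases, keeping only phecodes with at least `min_cases` cases.
    """
    dates = defaultdict(set)
    for person, phecode, date in events:
        dates[(person, phecode)].add(date)  # duplicate same-day codes collapse
    cases = defaultdict(set)
    for (person, phecode), seen in dates.items():
        if len(seen) >= min_codes:
            cases[phecode].add(person)
    return {pc: people for pc, people in cases.items() if len(people) >= min_cases}
```

Individuals with only a single code for a phenotype are neither cases nor clean controls under many PheWAS conventions; how they were handled here is not specified.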

We performed an exploratory phenome-wide association analysis (PheWAS) to examine genetic associations between PRSAN and thousands of other phenotypes in the medical phenome, including metabolic conditions. We repeated the analyses after adjusting for PRSBMI to determine the impact of genetic correlates of BMI on these associations. The covariates included in the analyses were sex, median age of the longitudinal EHR measurements, and the top ten principal components of ancestry. We repeated the analyses including AN diagnosis and PRSBMI, respectively, as additional covariates. We used the standard Benjamini–Hochberg false discovery rate (FDR 5%) to correct for multiple testing. PheWAS analyses were run using the PheWAS R package v0.12 [27].
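For reference, the Benjamini–Hochberg step-up procedure can be written in a few lines. This is a generic sketch of the standard procedure, not the PheWAS package's internal code; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Finds the largest rank k such that p_(k) <= (k / m) * fdr and rejects
    the k hypotheses with the smallest p values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            k_max = rank  # keep the largest rank that passes
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected
```

Because the procedure is step-up, a p value that fails its own rank threshold is still rejected whenever a larger p value passes a later threshold. With 1335 phecodes, the smallest p value is compared against 0.05/1335 and the largest against 0.05.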


Results

BMI distributions and sample characteristics

Mean and lowest BMI were significantly lower among cases compared to controls (cases (mean, SD) = 20.93, 4.47; (lowest, SD) = 18.14, 4.31; controls (mean, SD) = 26.27, 6.81; (lowest, SD) = 20.70, 6.48). Because the groups were age-matched, the average median age for both cases and controls was 26 (range 11–79), and the average age at the lowest BMI was also 26 (range 12–78).

In the case group, 65% of individuals had an underweight lowest BMI at some point in their BioVU medical record, while only 13% of controls had an underweight lowest BMI. Additionally, 82% of cases had a mean BMI that was normal or below, while only 48% of controls had a mean BMI that was normal or below normal.

Regression models

Figure 1 presents the results for all the tested regression models. As expected, PRSAN was associated with AN diagnosis (p = 6.25 × 10−4, OR = 1.05, CI = 0.97, 1.07). PRSBMI was associated with mean BMI (p = 2.00 × 10−16, β = 2.14, SE = 0.23) and with lowest BMI (p = 2.00 × 10−16, β = 2.40, SE = 0.23).

Fig. 1: Schematic representation of the associations tested.

Arrows between boxes denote associations tested. Beta values and significance (**p < 6.25 × 10−3) (*p < 0.05) are reported for associations with mean/lowest BMI, while odds ratios and significance are reported for associations with AN diagnosis.

PRSAN was not associated with mean BMI (p = 0.88, β = −0.001, SE = 0.01) but was associated with lowest BMI (p = 9.03 × 10−6, β = −0.03, SE = 0.01), even after controlling for AN diagnosis (p = 6.46 × 10−4, β = −0.02, SE = 0.01). As previously reported by others, we observed a negative association between PRSBMI and AN diagnosis (p = 1.12 × 10−3, OR = 0.96, SE = 0.01). After accounting for the lowest measured BMI, the association between PRSAN and AN diagnosis remained nominally significant (p = 0.01, OR = 1.03, SE = 0.01) indicating the PRSAN contributes to the AN diagnosis beyond contributions to body weight. In contrast, after controlling for lowest measured BMI, the association between PRSBMI and AN diagnosis was null (p = 0.84, OR = 1.02, SE = 0.01).

Using a mediation model, we found that nearly the entire effect of the PRSBMI on AN diagnosis was accounted for by the lowest BMI measured (proportion variance mediated = 95%, p = 2.00 × 10−16). Again, in contrast, the lowest measured BMI only accounted for 40% of the variance (p = 3.24 × 10−3) in AN diagnosis contributed by the PRSAN.

PheWAS results

PRSAN was initially significantly associated with metabolic and psychiatric phenotypes, including positive associations with anxiety (β = 0.07, p = 1.81 × 10−8) and mood disorders (β = 0.07, p = 7.81 × 10−8) and negative associations with obesity and diabetes (β = −0.08, p = 1.80 × 10−7, β = −0.09, p = 1.53 × 10−14, respectively; Fig. 2, Supplementary Table 1), even when controlling for AN diagnosis (Supplementary Table 2). However, when PRSBMI was included as a covariate, the magnitude and strength of the associations with metabolic phenotypes were substantially decreased (Fig. 2, Supplementary Table 3), suggesting that these initial observations were largely a consequence of the genetic association between the PRSAN and the PRSBMI.

Fig. 2: PheWAS analysis showing shared genetic associations between risk for anorexia nervosa and other phenotypes in females.

Significant positive correlations are shown with psychiatric conditions and significant negative correlations are shown with metabolic phenotypes (upper panel). After controlling for PRSBMI, most of the associations with metabolic phenotypes were attenuated (lower panel), while the associations with psychiatric phenotypes remained relatively unchanged.


Discussion

Here, we investigated the role of measured BMI as a mediator in the observed genetic relationship between AN and metabolic factors. Recent genetic studies suggest that AN should be recategorized as a metabo-psychiatric disorder due to the observed genetic associations between risk for AN and metabolic factors [1, 13, 18]. The present study does not suggest the complete absence of a link between AN and all metabolic factors, but proposes that there should be less emphasis on body weight as a diagnostic criterion as measured BMI mediates a significant portion of this link.

We provide evidence that suggests that the association between AN and anthropometric factors is potentially driven by the genetic predisposition of an individual to present with a low body weight (but that the PRSBMI is not necessarily involved in other aspects of AN symptomatology). In other words, PRSBMI may contribute to the AN diagnosis by decreasing the body’s “set-point” BMI, thus increasing the likelihood that extremely low body weight will be observed in addition to the other AN symptoms, thereby increasing odds of receiving an AN diagnosis. This finding has two important implications. First, it suggests that individuals with cognitive-behavioral symptomatology suggestive of AN, but with a higher genetically predicted BMI, may be under-diagnosed given current diagnostic criteria. Second, it suggests that the PRSBMI may be an important independent risk factor in the life-threatening consequences of extremely low BMI in the context of AN.

We found that PRSAN and PRSBMI were strongly associated with AN diagnosis and measured BMI, respectively, highlighting the validity of the EHR for genetic analysis for both traits. Intriguingly, PRSAN was not associated with mean BMI but was associated with lowest BMI, suggesting that some of the genetic factors involved in the diagnosis of AN may also contribute (behaviorally or otherwise) to loss of body weight [1], even in individuals who do not have a diagnosis of AN.

Both PRSBMI and PRSAN were associated with AN diagnosis, as previously speculated, but in this study, we were able to further dissect the contributions of PRSBMI. While initial results showed that PRSBMI was associated with AN diagnosis, the conditional analysis demonstrated that PRSBMI was not associated with AN diagnosis outside of its effect on BMI. In contrast, PRSAN remained nominally associated with AN diagnosis (p = 0.01) even after adjusting for the lowest BMI. We speculate that the PRSBMI mostly contributes to the severe consequences of AN (low body weight), which often bring people to clinical attention, while the PRSAN may represent an increased risk for the cognitive-behavioral processes that lead to the development of AN. Results from our mediation analyses showed that the lowest BMI almost entirely accounted for the association between PRSBMI and AN diagnosis, but only partially accounted for the association between PRSAN and AN diagnosis. This is further evidence for our hypothesis that individuals with higher PRSBMI may be underdiagnosed given the current diagnostic criteria for AN. These findings are consistent with a recent study that did not find any significant associations between eating disorder symptoms and metabolic PRSs, suggesting that the metabolic genetic factors could distinguish between symptoms of disordered eating and a clinical eating disorder diagnosis [28]. There is a well-documented association between PRSBMI and disordered eating behaviors, which has been shown to be mediated by measured BMI. Individuals with higher PRSBMI show both higher measured BMI and weight loss behaviors [29]. Additionally, higher PRSAN has been linked to weight loss trajectory in individuals without a clinical diagnosis of AN, and this association is not mediated through genetic risk for obesity, which likely has shared genetic architecture with PRSBMI [30].

In the PheWAS analyses, PRSAN was associated with numerous health outcomes, and metabolic conditions were strongly implicated. This replicates and augments recent evidence showing positive correlations for AN with psychiatric phenotypes and negative correlations with diabetes and metabolism phenotypes [1, 13, 16, 18]. Chronic hepatitis and chronic renal failure were also significantly associated with the PRSAN, as these conditions are associated with poor appetite [31, 32]. However, when we controlled for genetically predicted BMI (i.e., PRSBMI), the associations between PRSAN and metabolic factors significantly weakened. This is in contrast to a previous study that did not observe significant attenuation of the correlation between genetic associations of metabolic factors and AN when controlling for genetic associations of BMI [1]. Our findings further signal that the association between genetic risk for AN and metabolic outcomes is potentially largely attributable to BMI, which is difficult to disentangle since low BMI is a diagnostic criterion of AN. This is important because metabolic dysregulation in individuals with AN and low BMI may further increase difficulty in maintaining a healthy BMI [1].

In addition to phenotypes directly related to AN symptomatology, other phenotypes continued to be associated with PRSAN after adjustment for PRSBMI in the PheWAS analysis. Notably, chronic hepatitis was negatively associated with PRSAN. While reversible severe hepatitis is sometimes observed in severe AN [33], an association with chronic hepatitis is novel. Additionally, fractures and back pain were positively associated with PRSAN, which may reflect both the increased physical activity and the decreased bone density observed in AN patients [34]. Further studies are needed to investigate these associations and their role in the genetic architecture of AN.

These findings of decreased metabolic condition correlations when controlling for BMI are consistent with a recent study that stratified individuals into high, normal, and low BMI groups and performed three separate PheWAS analyses [35]. There were observed differences in the association between AN and BMI, with the strongest negative association in the high BMI group, demonstrating that measured BMI plays a role. However, our study controlled for PRSBMI rather than measured BMI, which allowed us to dissect the components of AN risk that are solely due to genetic body composition differences between individuals. This means that the collider effect of BMI on the correlation between AN risk and metabolic conditions extends farther than weight bias and extends into the realm of entangled genetic etiology.

Although the AN underweight diagnostic criteria were recently expanded upon the release of the DSM-5 in 2013, low body weight still remains a hallmark feature. Our findings emphasize that there needs to be a shift away from body weight as an important diagnostic criterion for AN, particularly in individuals with subsyndromal forms of AN (atypical AN), because it can lead to underdiagnosis and makes it difficult to disentangle the true genetic contributions to AN. Instead, low PRSBMI should be used to predict an increased risk of being underweight while exhibiting AN symptoms, consistent with research showing individuals with high PRSAN and low PRSBMI had significantly slower growth trajectories than those with high PRSAN and PRSBMI [36]. Additionally, results from a recent study suggest that the use of PRSBMI in addition to PRSAN could be a useful method to predict individuals that will develop severe and enduring eating disorders [37].

These ideas have been shown in clinical settings. A recent study on hospitalizations from AN and atypical AN (non-underweight patients) found that patients displayed similar medical complications regardless of weight and that duration of illness was a much stronger predictor for severity than body weight [38]. When present alongside AN behaviors, weight suppression of even five percent can be clinically significant [39]. Atypical AN occurs in up to 3% of the population, meaning that these “atypical” individuals represent the majority of cases by far [40]. In fact, under the DSM-IV, over half of all patients diagnosed with an eating disorder were given an ED-NOS (Eating Disorder Not Otherwise Specified) diagnosis due to the absence of one or more of the stringent criteria for the established disorders [41]. This may bias AN diagnoses towards only those with the most extreme BMI manifestations of the disease [42]. Patients with the same severity of symptoms who present at higher weights may not be able to receive insurance coverage/treatment due to inherent bias among health professionals [43, 44]. We predict that risk scores for AN will improve in the future if the focus continues to shift away from low body weight, due to the fact that more cases will be identified.

It is worth acknowledging the limitations of our study. First, larger sample sizes are still needed, as is validation across other populations and sample selection schemes. For example, it is possible that the observed trends are unique to AN diagnosis in an academic medical center and may be different in a community care setting [45]. Additionally, this study focused on individuals of European ancestry, leaving a research gap that needs to be filled with studies from diverse populations [46, 47]. The use of ICD codes as a diagnostic tool is also challenging. Because ICD codes are primarily used for billing, they do not always serve as an accurate predictor of a patient’s specific medical diagnosis [48]; however, we have shown strong genetic correlations between ICD codes and clinical diagnosis, and note that ICD codes were also used to define cases in the recent large GWAS of AN [1]. Lastly, it is difficult to assess the true BMI history of an individual through medical records. The lowest BMI recorded over a lifetime may not necessarily represent the BMI during the active illness period. Therefore, more comprehensive, or ideally prospective, studies on BMI that involve more frequent measurements would be an improvement [49, 50]. Gathering longitudinal data and performing a mediation analysis that takes the underlying timeline into account may help us further examine whether PRSBMI significantly contributes to AN beyond its effects on BMI. Future GWAS of AN symptomatology would allow us to test whether PRSBMI is associated with low body weight over the other symptoms.

Overall, there is a clear relationship between AN diagnosis and body composition. Our work speaks to the importance of exploring potential hypotheses to explain this complex relationship.