The global prevalence and disease burden of obesity continue to rise in the United States and worldwide1,2, posing a major threat to public health and quality of life. Variation in body mass index (BMI), the metric commonly used to define obesity, is highly heritable. Obesity and extreme obesity are strongly associated with a growing number of chronic diseases, including type 2 diabetes mellitus (T2DM) and coronary heart disease, that are leading causes of preventable morbidity and mortality. It remains uncertain whether obesity lies in the causal pathway, is a confounding factor, or shares a common etiology with these co-occurring disease conditions. The totality of trials of metabolic bariatric surgery provides consistent evidence of efficacy for weight loss and growing evidence of increased survival and other health benefits associated with surgery3,4,5. Randomized clinical trials of pharmacotherapy with newer glucose-lowering agents, including SGLT2 inhibitors and GLP-1 receptor agonists, have also shown evidence of weight loss associated with reduced risk of major cardiovascular outcomes in patients with type 2 diabetes6, and treatment with GLP-1 receptor agonists plus lifestyle interventions is associated with sustained weight loss in obese nondiabetics7, although further study is needed to assess tolerability and overall clinical benefits8. Nevertheless, data are sparse on the interconnectedness of BMI-related conditions that could be influenced by emerging surgical and pharmacological approaches to obesity prevention and treatment. Mendelian randomization (MR)9 provides estimates of the strength of associations with disease outcomes using genetic instruments for life-course exposure to BMI, providing an approach to investigate the potential causal role of BMI in cardiometabolic disease10 and many other common diseases11,12,13.

A recent meta-analysis of genome-wide association studies (GWAS) for BMI identified over 700 independent variants in European descent populations14, implicating a large number of genes and pathways regulating satiety, energy balance and metabolism in adipose tissue. Genetic risk scores (GRS) based on BMI genetic variants robustly predict BMI and have enabled their use as genetic instruments in MR studies to address the relationship between BMI and several individual clinical disorders10,12,15. Phenome-wide association studies (PheWAS) offer a complementary framework to investigate genetic associations across many common diseases simultaneously13,16. While there is strong and consistent genetic evidence for the association of BMI with a number of leading causes of death17, the comprehensive assessment for MR associations within a PheWAS in a large population may expand our knowledge of the breadth, depth and interconnectedness of conditions associated with obesity.

We report here thousands of genome-wide single nucleotide polymorphism (SNP) associations with BMI in both European (EA) and African (AA) descent populations in the Million Veteran Program (MVP) mega-biobank, including meta-analysis with other large-scale multi-ancestry consortia and UK Biobank. We harness the breadth of BMI genetic variation and medical disease phenotypes in the Veterans Health Administration electronic health record (EHR) to conduct a phenome-wide MR study of the association of increased BMI with an extensive range of conditions accounting for a substantial burden of morbidity and mortality in the population.


Observational analysis of the MVP participants

Among 215,734 EA (93.0% male) and 55,525 AA (87.6% male) MVP participants, the mean (SD) age at enrollment was 64.0 (13.1) and 57.9 (12.0) years, the mean (SD) BMI was 29.9 (5.9) and 29.2 (4.8) kg/m², and the prevalence of obesity (BMI ≥ 30) was 44.2% and 42.9%, respectively. Descriptive statistics of the EA and AA participants are summarized in Table 1.

Table 1 Characteristics of the Million Veteran Program non-Hispanic EA and AA participants

GWAS and genetic instruments in European and African Americans

The genetic ancestries of the MVP non-Hispanic EAs and AAs were consistent with the population structure and admixture represented by the top principal components (Supplementary Fig. 1). The major analytical procedures, including the GWAS, PheWAS and network analyses, are outlined in Supplementary Fig. 2. For EA participants, we identified 795 genome-wide significant loci (lead SNPs p < 5×10−8) in MVP, of which 285 were novel (distance > 500 kb and LD r2 < 0.1) compared with previously reported BMI-associated loci14. In the combined meta-analysis of MVP, UK Biobank and the GIANT Consortium, including up to 1,122,049 participants of European ancestry, we identified 2446 independent SNPs in 906 genome-wide significant loci, of which 364 loci were novel (Supplementary Fig. 3A, Supplementary Data 1). Using LDSC and GWAS summary statistics, we estimated the inflation factor (λ) and heritability (h2) of BMI. We observed a λ of 1.599, which is typical of GWAS with very large sample sizes, and an h2 of 0.188 (SE of 0.008) for BMI. The LDSC intercept of 1.053 (SE of 0.011) and the small ratio of 0.062 (SE of 0.013) indicate that the majority of the inflation was due to the polygenicity of BMI, not confounding. For AA participants, we identified 18 significant loci in the MVP. In the meta-analysis combining the MVP and the AAAGC, we discovered a total of 100 independent SNPs in 41 BMI-associated loci, of which 33 loci were novel for African ancestry and 6 loci were novel compared to previous BMI GWAS in any ancestry group (Supplementary Fig. 3B, Supplementary Data 2).

Using the total of 2,446 and 100 independent SNPs (lead SNPs and secondary SNPs: pair-wise LD with lead SNPs r2 < 0.1 and p < 5 × 10−8) in EA and AA participants, respectively, we conducted a weighted GRSBMI analysis for EA and AA participants using the meta-analysis beta coefficients, after removing MVP, as weights for the GRS. Several statistically significant SNPs from the main GWAS meta-analysis were only available in MVP; therefore, the final numbers of SNPs in the GRS were 2428 and 94 in EA and AA, respectively. The EA-specific GRSBMI was associated with BMI (p < 10−314) and explained 5.4% of BMI variance in the MVP EA participants. The AA-specific GRSBMI was also associated with BMI (p < 10−314) and explained 0.9% of BMI variance in the MVP AA participants. Using SNPs and beta coefficients identified in the EA-specific BMI GWAS, the weighted EA GRSBMI was associated with BMI but explained only 1.5% of BMI variance in the 55,525 AA participants.

Nongenetic BMI associations with outcomes

We tested associations of BMI with 1,244 disease codes, for which there were ≥200 cases and controls in EA participants, drawn from phenotype codes defined in previous PheWAS analyses18. In this cross-sectional analysis, 661 phecodes were associated with BMI after correction for multiple testing (see Supplementary Data 3). 75% (n = 493) of these phecodes were positively associated with BMI (increased risk with higher BMI). Expected findings included positive associations with obesity/overweight and T2DM, and negative associations with anorexia.

Phenome-wide Mendelian randomization of BMI and outcomes

We conducted MR analysis of the 1244 phecodes in up to 174,915 EA participants. Of the phecodes associated with standardized BMI, MR results were also associated (threshold p < 4.02 × 10−5) with 316 codes across 16 disease groups (Table 2, Fig. 1, Supplementary Data 4). The presence and magnitude of a number of associations, including T2DM (OR = 2.64 per SD of BMI [95% CI 2.54–2.75]), sleep apnea (OR = 2.36 [95% CI 2.26–2.46]), hypertension (OR = 2.20 [95% CI 2.10–2.30]), ischemic heart disease (OR = 1.67 [95% CI 1.60–1.74]), and asthma (OR = 1.25 [95% CI 1.17–1.33]), were consistent with recent MR studies of BMI for these diseases10,11,19. After accounting for multiple testing, there were a number of MR associations with circulatory system and metabolic diagnoses that were not previously reported to be significant. These included a broad set of cardiovascular diagnoses that confer increased mortality, such as the subtypes of heart failure with preserved ejection fraction and heart failure with reduced ejection fraction, heart block, and cardiomyopathy. In addition, we identified associations with many other major conditions that increase morbidity and mortality, including major respiratory, genitourinary, digestive, musculoskeletal and sensory conditions such as respiratory failure, cholelithiasis, chronic renal failure, diabetic retinopathy and macular degeneration (Table 2). In a sensitivity analysis, the MR associations were essentially unchanged when using an alternative GRSBMI based on beta coefficients derived from the meta-analysis that included MVP (correlation coefficient r = 0.999, Supplementary Figs. 4–6).

Table 2 Selected phenome-wide associations between GRSBMI and clinical diseases in the MR analysis of non-Hispanic EAs
Fig. 1: Summary of phenome-wide Mendelian randomization analysis using GRSBMI.

Red dots represent a statistically significant positive association with GRSBMI ( ↑ GRSBMI = ↑ disease risk), and black dots represent a statistically significant negative association in the MR analysis after multiple testing correction (two-sided test with nominal p-value less than 4.02 × 10−5). Top 100 associations with lowest p-values in the MR analysis are labeled including 98 positive associations (colored font) and two negative associations (black font) from 13 systems. Abbreviations (clockwise): Circulatory: circulatory system; Co: congenital anomalies; Derm: dermatologic; Endo/Metab: endocrine/metabolic; Hema: hematopoietic; Infect: infectious diseases; Injuries: injuries & poisonings; Mental: mental disorders; Neuro: neurological; Respir: respiratory; Sym: symptoms.

To further test the validity of the MR associations for the phecodes that were significant in the initial MR PheWAS analyses, we conducted a set of additional MR association tests, including MR inverse-variance weighted (IVW), median-weighted and Egger regression analyses (Supplementary Data 5). Associations remained significant for most of the phenotypes in one or more of the association tests. All but one phecode had a p-value less than 0.05 in the IVW MR analysis, and 78% of phecodes had a p-value less than 0.05 in the MR Egger analysis, which addresses pleiotropy but is known to be underpowered.

The burden of the 316 disease codes associated with genetically influenced BMI in EA increased across increasing BMI categories (Supplementary Data 6). The number of disease codes was higher in obese and severely obese persons (for BMI ≥ 30, mean 26.7 [median 23]; for BMI ≥ 40, mean 32.1 [median 29]) compared with BMI < 25 (mean 20.2 [median 17]). There was a trend (p < 0.001) for an increasing number of associated disease codes across BMI categories (Supplementary Data 6). Among individuals with BMI ≥ 40, 69.9% had more than 20 codes associated with genetically influenced BMI, compared with 40.4% in those with BMI < 25 (Supplementary Data 6).

The MR analysis also showed a negative (protective) direction of association with BMI for 3.5% (n = 11) of the phecodes (i.e., decreased risk with higher BMI), including inguinal hernia, osteoporosis, and alcoholism. An additional 197 phecodes from 16 disease systems were associated with BMI but were not associated (p > 0.05) with genetically influenced BMI (Supplementary Data 7).

For participants of African ancestry, we examined the association of the GRSBMI based on the 2428 genome-wide associated SNPs in the EA meta-analysis to increase strength of the instrumental variable. Of the phecodes associated with standardized BMI, MR association results were also statistically significant (p < 6.00 × 10−5) for 61 codes across 11 disease groups using weights from the EA meta-analysis excluding MVP (Supplementary Data 8). These included T2DM and several major diabetic complications, sleep apnea, hypertension, congestive heart failure and heart failure with preserved ejection fraction, ischemic heart disease, and chronic kidney disease. There were 34 phecodes associated using the 2428 SNP GRSBMI with weights from the AAAGC (Supplementary Data 8). Of these 34, all were associated with EA GRS with EA weights. We also constructed a GRSBMI based on 94 independent SNPs identified by the largest BMI meta-analysis of AA participants combining data from MVP and the AAAGC but weighted using the effect sizes from AAAGC alone20. Four phecodes were associated in a positive direction with genetically influenced BMI in AA, after controlling for multiple testing (Supplementary Data 8). Apart from BMI-related phecodes (overweight, obesity, morbid obesity), we also noted a relationship between genetically influenced BMI and sleep apnea (OR = 2.09, [95% CI: 1.48–2.96], p = 3.04 × 10−5).

Finally, to explore the potential for reverse causality in the associations between BMI and conditions in our PheWAS, we selected 10 traits across a range of diseases for which GWAS summary data are available (see Results for PheWAS and Network analysis in the Supplementary Methods). As expected from prior research, there was a strong bidirectional (two-way) “causal” effect between BMI and type 2 diabetes. However, there was no evidence of a significant (inverse-variance weighted MR p-value < 0.005) bidirectional effect with BMI for any of the other nine traits (see Supplementary Table 1 in the Supplementary Methods).

Analysis of disease comorbidity network

We created a human disease network of phecode-based diseases and disorders to evaluate the comorbidity patterns among 134 3-digit codes constructed from 316 phecodes associated with genetically influenced BMI from the phenome-wide MR findings. We included only conditions that are strongly associated with genetically defined BMI to focus on networks of conditions that are grounded in strong evidence for association consistent with a “causal” association with BMI and less likely to be confounded by other factors. The resulting network map (117 higher level phecodes with 360 links) identified many diseases associated with BMI in MR analyses that co-occur in several distinct patterns that might indicate the typical clinical burden for individual persons (Fig. 2). The highly connected nodes (i.e., hubs) represent diagnoses that tend to co-occur with many others (Supplementary Data 9), reflecting common patterns of symptoms or of specific diseases. The most connected diseases included disorders of attachment of ligaments or tendons to bone (“peripheral enthesopathies”) and respiratory symptoms, each strongly correlated with 20 disease codes. In a subset of phecodes with multiple connections in the network analysis (degree > 5 in Supplementary Data 9), evidence for MR association in one or more tests was strong and consistent with a similar magnitude and direction of effect (Supplementary Data 5). Among the common disease diagnoses associated with 10 or more conditions were conditions such as cardiovascular diseases, acute upper respiratory illness, and renal failure that account for leading causes of death in the United States21.

Fig. 2: Phenotypic network map.

Nodes represent phecodes at the three-digit level and the links represent significant disease-disease associations (ϕ-correlation). Node size is proportional to network connectivity (degree). Individual disease communities (A-G) are described in the Supplementary Materials.

We identified seven disease communities (see Supplementary Fig. 7A-G)—groups of diseases associated with genetically influenced BMI that tend to co-occur in the larger disease comorbidity network—using permutation-based statistical tests (p value < 0.0041, 0.05/12 communities). These communities were comprised of diseases from multiple disease systems (e.g., Community A: circulatory, endocrine, nervous systems, genitourinary, and general symptoms). Underscoring the extensive interconnectedness of the disease codes associated with genetically influenced BMI in this network, each of the top ten nodes was connected with over a dozen disease codes (ranging from 13 to 20) within a community and also with disease codes in up to six different communities (Supplementary Data 9).


We discovered a total of 370 novel genetic loci for BMI in samples of European (364) and African descent (6), through a large-scale ancestry-specific meta-analysis of European and African ancestry GWAS, including the MVP. We incorporated these newly discovered genetic variants into the largest phenome-wide MR analysis performed to date, identifying several hundred diseases from across 16 different disease categories in AA and EA US Veterans for which BMI is implicated as a genetically associated risk factor. While we confirmed a number of previously reported associations, including those reported in two recent MR analyses of the UK Biobank13,19, we also identified many conditions not associated with genetically influenced BMI in previous MR analyses. There was a striking increase in the burden of MR-associated disease codes across increasing categories of BMI. In our analysis of disease comorbidity networks, which incorporates all strongly associated conditions from our phenome-wide MR analysis, there were seven communities of diseases with extensive intra- and inter-community connections, providing new insights into the complex genetic underpinnings of obesity and its impact on diseases in the population.

The high prevalence of obesity has continued to grow worldwide and across the United States1,22,23, including users of the Veterans Administration health care system24,25. For example, from 2007–2008 to 2015–2016, the age-standardized prevalence of obesity increased in US adults from 33.7% to 39.6%, respectively23. In the Global Burden of Disease (GBD) analysis, high BMI accounts for 4 million excess deaths per year, the majority being related to cardiovascular disease1. In the same study, 20 disease endpoints were identified through conventional epidemiological studies but no MR analysis was cited to support causality. The urgent need to understand these links is underscored by evidence from the recent pandemic of SARS-CoV-2 for an increased risk of mortality in obese COVID-19 patients26 as well as increased risk for severe COVID-19 illness associated with obesity supported by two-sample MR studies27,28,29.

Through our MR analysis, we confirm and extend the findings for strong associations with common conditions in the GBD analysis in EA and AA US Veterans. Furthermore, we also provide novel strong evidence for associations of BMI with risk for many specific major circulatory system diseases, including heart failure with preserved ejection fraction, atrial fibrillation, aortic valve disease, venous thromboembolism, and abdominal aortic aneurysm. These strong associations highlight a broader impact of excess adiposity on cardiovascular disease morbidity than previously appreciated. Additionally, we observed associations in the MR analysis with a range of other conditions that are either life-threatening or adversely affect quality of life. These included life-threatening conditions of the genitourinary (chronic renal failure) and respiratory (respiratory failure, asthma) systems as well as conditions associated with substantial morbidity in the musculoskeletal and dermatologic systems. We noted a limited number of inverse associations that are consistent with prior observational studies, such as inguinal hernia30 and osteoporosis31. We also note other inverse disease associations in the MR analysis that warrant further investigation, such as protective associations with selected viral infectious diseases (herpes simplex virus) and substance abuse. Given the lack of published studies in populations of African descent, nearly all our MR association findings are novel in that population.

Our analysis of disease comorbidity networks incorporated over 300 conditions and identified seven communities with extensive intra- and inter-community associations of multiple BMI-associated conditions. While several communities were enriched for cardiovascular diseases, others were enriched for skin diseases, renal diseases, pulmonary diseases, and disorders of the eye and other sensory organs. This extensive set of conditions with evidence of association with genetically influenced BMI extends prior evidence for the clinical co-occurrence of multiple comorbidities conferred by obesity. Our findings are also consistent with a recent study showing substantial benefit on multiple outcomes in obese diabetic patients treated with metabolic weight reduction surgery32.

In summary, we harnessed genetic variation discovered in large-scale meta-analysis of both European and African ancestry GWAS, revealing associations of increased genetic risk of BMI for several hundred diseases in a phenome-wide Mendelian randomization analysis and intra- and inter-community connections in an extensive disease comorbidity network. Our findings underscore the broad impact of obesity on multiple interconnected chronic and acute diseases and highlight the public health imperative to prevent and treat obesity in order to reduce downstream morbidity and mortality from numerous obesity-associated diseases.


Study participants

The design of the MVP has been previously described33. Briefly, individuals aged 19 to 104 years, with a mean age of 62 years, have been recruited from over 60 Veterans Health Administration medical centers nationwide since 2011. Each Veteran’s EHR is being integrated into the MVP biorepository, including inpatient International Classification of Diseases (ICD9/10) diagnosis codes, Current Procedural Terminology (CPT) procedure codes, clinical laboratory measurements, and reports of diagnostic imaging modalities. The MVP research database integrates the extensive EHR data from each enrolled Veteran. MVP has received ethical and study protocol approval by the VA Central Institutional Review Board in accordance with the principles outlined in the Declaration of Helsinki.


EHR data from clinical examinations were available for MVP participants from as early as 2003. BMI was calculated as the weight (in kilograms) divided by the height (in meters) squared. We calculated the average BMI using all measurements within a three-year window around the date of MVP enrollment (i.e., 1.5 years before/after the date of enrollment), excluding height measurements that differed by >3 inches (0.0762 meters), or weight measurements that differed by >60 pounds (27.22 kilograms), from each participant's average25.
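The averaging rule above can be illustrated with a minimal Python sketch. It is not the MVP pipeline: the day count used for 1.5 years, the choice to compute each participant's average from the in-window measurements only, and the use of mean weight over squared mean height (rather than averaging per-visit BMI values) are all assumptions made for illustration, as are the function names.

```python
from datetime import date
from statistics import mean

WINDOW_DAYS = 548            # ~1.5 years either side of enrollment (assumed day count)
MAX_HEIGHT_DEV_M = 0.0762    # 3 inches, from the text
MAX_WEIGHT_DEV_KG = 27.22    # 60 pounds, from the text


def _in_window(measured_on, enrolled_on):
    """True if the measurement falls inside the three-year window."""
    return abs((measured_on - enrolled_on).days) <= WINDOW_DAYS


def _drop_outliers(values, max_dev):
    """Drop values further than max_dev from the participant's own mean."""
    if not values:
        return []
    centre = mean(values)
    return [v for v in values if abs(v - centre) <= max_dev]


def average_bmi(heights, weights, enrolled_on):
    """heights / weights: lists of (date, value) in metres / kilograms.

    Returns the BMI in kg/m^2, or None if no usable measurement remains.
    """
    h = _drop_outliers(
        [v for d, v in heights if _in_window(d, enrolled_on)], MAX_HEIGHT_DEV_M
    )
    w = _drop_outliers(
        [v for d, v in weights if _in_window(d, enrolled_on)], MAX_WEIGHT_DEV_KG
    )
    if not h or not w:
        return None
    return mean(w) / mean(h) ** 2
```

In this sketch a single implausible height (for example, 1.60 m among repeated readings of 1.80 m) is discarded before averaging, and measurements taken years before enrollment never enter the calculation.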

Genetic data and genome-wide association analysis

DNA extracted from participants’ blood was genotyped using a customized Affymetrix Axiom® biobank array, the MVP 1.0 Genotyping Array, with genotype imputation to the 1000 Genomes reference panel34 as detailed in the Supplementary Materials. We used both self-reported ethnicity and genetic ancestry to define non-Hispanic European American (EA) and non-Hispanic African American (AA) participants in the MVP.

Genetic association with BMI in the MVP cohort was examined separately among non-Hispanic EA (N = 215,734) and non-Hispanic AA (N = 55,525) participants. For each ancestry group, BMI was stratified by sex and adjusted for age, age-squared, and the top ten genotype-derived principal components of ancestry in a linear regression model. The resulting residuals were transformed to approximate normality using inverse normal scores. Imputed and directly measured genetic variants were tested for association with the inverse normal transformed residuals of BMI through linear regression assuming an additive genetic model.
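The inverse normal transformation step can be sketched as follows. This is a generic rank-based transform, not the study's code: the residuals are taken as given (the covariate regression is not shown), the function would be applied within each sex stratum, and the rank offset (0.5 here; 3/8 gives the Blom variant) is an assumption because the text does not state which was used.

```python
from statistics import NormalDist


def inverse_normal_scores(residuals, offset=0.5):
    """Rank-based inverse normal transform; ties receive their average rank."""
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        # Find the end of the current group of tied values.
        end = pos
        while end + 1 < n and residuals[order[end + 1]] == residuals[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # 1-based average rank of the tie group
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    std_normal = NormalDist()
    # Map each rank to a probability strictly inside (0, 1), then to a z-score.
    return [std_normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]
```

The transformed values keep the ordering of the residuals but follow an approximately standard normal distribution, which limits the influence of extreme BMI values on the per-variant linear regression.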

We performed ancestry-specific inverse-variance weighted fixed-effects meta-analysis using METAL35. For EA participants, we meta-analyzed GWAS results from MVP, UK Biobank, and the Genomic Investigation of Anthropometric Traits (GIANT) Consortium. For AA participants, we meta-analyzed GWAS results from MVP with the African Ancestry Anthropometry Genetics Consortium (AAAGC). GWAS results were summarized using FUMA. Novel loci were defined as those with genome-wide significance (p < 5 × 10−8) and a distance > 500 kb from previously published variants14,20,37,38.
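For a single variant, inverse-variance weighted fixed-effects meta-analysis reduces to a weighted average of the per-study effect estimates. The sketch below shows the standard formula only; it is not METAL's implementation, and the function name and example numbers are hypothetical.

```python
from math import erfc, sqrt


def fixed_effects_meta(betas, ses):
    """Combine per-study effect sizes with inverse-variance weights.

    betas, ses: per-study effect estimates and standard errors for one variant,
    all aligned to the same effect allele.
    Returns (pooled beta, pooled SE, z-score, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    z = beta / se
    p = erfc(abs(z) / sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p
```

For example, two studies with estimates 0.10 and 0.20 and equal standard errors of 0.1 pool to 0.15 with a standard error of about 0.071. Studies with smaller standard errors, typically the larger cohorts, dominate the pooled estimate.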

Phenome-wide association and Mendelian randomization analyses

We constructed EA- and AA-specific weighted genetic risk scores of BMI (GRSBMI) to perform phenome-wide MR analysis, using the independent and genome-wide significant SNPs from the respective BMI meta-analysis. Beta coefficients from the respective ancestry-specific meta-analyses, excluding MVP, were used as weights for each GRS in order to avoid overfitting.
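A weighted genetic risk score of this kind is a sum of effect-allele dosages multiplied by their external beta coefficients. The following sketch assumes dosages are already aligned to the effect allele used for the weights; the SNP identifiers in the usage example are hypothetical, and the squared Pearson correlation is shown as one simple way to express variance explained, which may differ from the covariate-adjusted estimate reported in the paper.

```python
from statistics import mean


def weighted_grs(dosages, weights):
    """Weighted genetic risk score for one participant.

    dosages: {snp_id: effect-allele dosage in [0, 2]}
    weights: {snp_id: beta from the external meta-analysis}
    SNPs without a dosage are skipped.
    """
    return sum(beta * dosages[snp] for snp, beta in weights.items() if snp in dosages)


def r_squared(x, y):
    """Squared Pearson correlation between score and trait (unadjusted)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical SNP identifiers and weights, for illustration only.
example_weights = {"rs_a": 0.02, "rs_b": -0.01, "rs_c": 0.05}
example_dosages = {"rs_a": 2.0, "rs_b": 1.0}
example_score = weighted_grs(example_dosages, example_weights)
```

Taking the weights from a meta-analysis that excludes the target cohort, as described above, keeps the score from being tuned to the same participants in whom it is evaluated.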

The phenotypes characterizing disease diagnoses include the full catalog of phecodes (N = 1813) from sixteen disease systems determined at participant enrollment16, and all analyses were limited to phecodes with at least 200 cases and 200 controls in the 174,531 EA or 49,695 AA participants (N = 1244 for EA and N = 833 for AA).

For the PheWAS, logistic regression models were used to assess the association between standardized BMI [(bmi-mean(bmi))/sd(bmi)] and phecodes, adjusted for age, sex and the top 10 genotype-derived principal components. Phenotypes were considered associated if they had a p-value less than 4.02 × 10−5 for EA and 6.00 × 10−5 for AA (Bonferroni correction of p = 0.05 for 1,244 and 833 traits, respectively).
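The standardization and the two significance thresholds can be reproduced in a few lines. The sketch assumes the sample standard deviation (n − 1 denominator), which the text does not specify; the thresholds follow directly from dividing 0.05 by the number of phecodes tested.

```python
from statistics import mean, stdev


def standardize(values):
    """Return (value - mean) / sd for each value (sample SD assumed)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


threshold_ea = bonferroni_threshold(1244)  # about 4.02e-05
threshold_aa = bonferroni_threshold(833)   # about 6.00e-05
```

With BMI on the standardized scale, each odds ratio from the logistic model is expressed per one standard deviation of BMI.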

To search for evidence of genetic association of BMI with BMI-associated traits, we then conducted the phenome-wide MR analysis in EA and AA participants separately using ancestry-specific GRSBMI as the genetic instrumental variable6. We used the same set of phecodes included in the PheWAS. To explore the validity of the MR associations in depth, we further tested for associations using inverse-variance weighted (IVW), MR Egger regression and weighted median MR analyses39 for all PheWAS phecodes that were significant in the phenome-wide PRS analysis. To identify significant phenotype associations, we used the same Bonferroni correction as for the PheWAS to account for the total number of phenotypes (N = 1244 for EA and N = 833 for AA) tested in the initial MR PheWAS. Details of phenomic data quality control, case definitions, and association analysis are described in the Supplementary Materials.
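Two of the sensitivity estimators named above, IVW and MR Egger, can be written compactly from per-SNP summary statistics. The sketch below uses the textbook forms: a weighted regression of SNP-outcome effects on SNP-exposure effects, through the origin for IVW and with an intercept for Egger. It is not the software used in the study; the fixed-effect standard error, the first-order weights (1/se_y²) and the function names are assumptions, and the weighted median estimator is omitted for brevity.

```python
from math import sqrt


def _orient(bx, by):
    """Flip allele coding so every SNP-exposure effect is non-negative."""
    flipped = [(-x, -y) if x < 0 else (x, y) for x, y in zip(bx, by)]
    return [x for x, _ in flipped], [y for _, y in flipped]


def ivw(bx, by, se_y):
    """IVW estimate: weighted regression of by on bx through the origin.

    Returns (causal estimate, fixed-effect standard error).
    """
    w = [1.0 / s ** 2 for s in se_y]
    den = sum(wi * x * x for wi, x in zip(w, bx))
    slope = sum(wi * x * y for wi, x, y in zip(w, bx, by)) / den
    return slope, sqrt(1.0 / den)


def mr_egger(bx, by, se_y):
    """MR Egger: weighted regression of by on bx with an intercept.

    Returns (slope, intercept); a non-zero intercept suggests
    directional pleiotropy.
    """
    bx, by = _orient(bx, by)
    w = [1.0 / s ** 2 for s in se_y]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, bx)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, by)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, bx))
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, bx, by))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar
```

For a binary outcome the estimates are on the log-odds scale, so exponentiating the slope gives an odds ratio per unit of the exposure. Because Egger regression estimates an extra parameter, its slope is less precise than the IVW estimate, which is consistent with the lower proportion of phecodes reaching p < 0.05 in the Egger analysis.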

Disease comorbidity networks

We created a phenotypic network map to evaluate the comorbidity patterns among the diagnosis codes for which there was plausible evidence (p < 4.02 × 10−5) for BMI in the causal pathway in the MR analysis. The nodes in the network represent disease diagnosis codes rounded up to their corresponding 3-digit level, and links between nodes represent the strength and significance of disease co-occurrence (comorbidity). To measure the comorbidity strength for diseases i and j, we used the ϕ-correlation ϕij40, and determined its significance using a t-test corrected for multiple testing. The ϕ-correlation, which is Pearson’s correlation for binary variables, can be expressed mathematically as the following equation (Eq. 1):

$$\phi_{ij} = \frac{C_{ij} N - P_i P_j}{\sqrt{P_i P_j (N - P_i)(N - P_j)}} \tag{1}$$

where Cij is the number of patients affected by both diseases, N is the total number of patients in the population and Pi and Pj are the prevalences of diseases i and j. Only pairs with adjusted p-value < 0.05 and ϕ ≥ 0.2 were considered. Next, we used the network structure to identify groups of diseases (i.e., communities) that show higher comorbidity (links in the network) among diseases in the group in comparison with the remaining diseases in the network. We applied the Louvain community detection algorithm41 and the qstest method42 for evaluation of community significance.
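A minimal sketch of the link-building step is given below. It uses the standard count-based form of the ϕ-correlation for two binary variables, ϕij = (Cij·N − Pi·Pj) / sqrt(Pi·Pj·(N − Pi)·(N − Pj)), which assumes that Pi and Pj are numbers of affected patients rather than proportions. The significance test, multiple-testing correction, and community detection are not shown, and the function names and diagnosis labels in the usage note are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def phi_correlation(c_ij, p_i, p_j, n):
    """Phi correlation from counts.

    c_ij: patients with both diseases; p_i, p_j: patients with each disease;
    n: total patients.
    """
    denom = sqrt(p_i * p_j * (n - p_i) * (n - p_j))
    if denom == 0:
        raise ValueError("phi is undefined when a disease affects nobody or everybody")
    return (c_ij * n - p_i * p_j) / denom


def comorbidity_links(patient_codes, min_phi=0.2):
    """Return {(code_a, code_b): phi} for pairs with phi >= min_phi.

    patient_codes: one set of diagnosis codes per patient.
    Only the effect-size filter is applied here; no p-value is computed.
    """
    n = len(patient_codes)
    prevalence = Counter(code for codes in patient_codes for code in codes)
    together = Counter(
        pair for codes in patient_codes for pair in combinations(sorted(codes), 2)
    )
    links = {}
    for a, b in combinations(sorted(prevalence), 2):
        phi = phi_correlation(together[(a, b)], prevalence[a], prevalence[b], n)
        if phi >= min_phi:
            links[(a, b)] = phi
    return links
```

For example, with four patients whose code sets are {"A", "B"}, {"A", "B"}, {"C"} and the empty set, only the pair ("A", "B") is retained, with ϕ = 1. The resulting dictionary can be passed as a weighted edge list to a graph library for community detection.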

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.