Cardiometabolic and renal phenotypes and transitions in the United States population

Lhoste, Victor P. F.; Zhou, Bin; Mishra, Anu; Bennett, James E.; Filippi, Sarah; Asaria, Perviz; Gregg, Edward W.; Danaei, Goodarz; Ezzati, Majid

doi:10.1038/s44161-023-00391-y


Article
Open access
Published: 15 December 2023


Victor P. F. Lhoste^1,2,
Bin Zhou ORCID: orcid.org/0000-0002-1741-8628^1,2,3,
Anu Mishra^1,2,
James E. Bennett^1,2,
Sarah Filippi⁴,
Perviz Asaria^1,2,
Edward W. Gregg^1,2,3,5,
Goodarz Danaei^6,7 &
…
Majid Ezzati ORCID: orcid.org/0000-0002-2109-8081^1,2,3,8

Nature Cardiovascular Research volume 3, pages 46–59 (2024)


An Author Correction to this article was published on 15 January 2024

This article has been updated

Abstract

Cardiovascular and renal conditions have both shared and distinct determinants. In this study, we applied unsupervised clustering to multiple rounds of the National Health and Nutrition Examination Survey from 1988 to 2018, and identified 10 cardiometabolic and renal phenotypes. These included a ‘low risk’ phenotype; two groups with average risk factor levels but different heights; one group with low body-mass index and high levels of high-density lipoprotein cholesterol; five phenotypes with high levels of one or two related risk factors (‘high heart rate’, ‘high cholesterol’, ‘high blood pressure’, ‘severe obesity’ and ‘severe hyperglycemia’); and one phenotype with low diastolic blood pressure (DBP) and low estimated glomerular filtration rate (eGFR). Prevalence of the ‘high blood pressure’ and ‘high cholesterol’ phenotypes decreased over time, contrasted by a rise in the ‘severe obesity’ and ‘low DBP, low eGFR’ phenotypes. The cardiometabolic and renal traits of the US population have shifted from phenotypes with high blood pressure and cholesterol toward poor kidney function, hyperglycemia and severe obesity.


Main

Diabetes, dementia, cardiovascular disease (CVD) and chronic kidney disease (CKD) are leading causes of death in the United States, in other high-income nations and, increasingly, in low-income and middle-income countries^1,2. Obesity, short stature, high blood pressure, high heart rate, hyperglycemia, non-optimal lipid profiles and poor kidney function are established risk factors for one or more of these diseases^{3,4,5,6,7,8,9,10,11,12,13,14,15} and, in some cases, for infections such as coronavirus disease 2019 (ref. ¹⁶). As a result, people who have optimal levels of all or most risk factors are at low risk of cardiovascular and renal disease and cancer and vice versa^{17,18,19,20,21}. Physiological risk factors can have complex correlations and co-occurrence patterns for at least two reasons. First, these physiological factors have shared as well as distinct genetic, behavioral, environmental and dietary determinants. For example, consumption of fruits and vegetables, meat, dairy, unsaturated versus saturated fats, processed versus whole grain carbohydrates and alcohol affect multiple cardiometabolic and renal traits beneficially or adversely, whereas others, such as sodium and potassium, affect only one or two traits (blood pressure and kidney function)^{22,23,24,25,26,27,28,29}. Furthermore, these factors may cluster differently among different subgroups of a population³⁰ and change over time³¹. Second, some of these physiological risk factors are themselves etiologically related; for example, obesity is a risk factor for dyslipidemia, elevated blood pressure and hyperglycemia^32,33.

At the population level, some studies have quantified trends in individual cardiometabolic risk factors in the US population, other countries or globally^{34,35,36,37,38,39,40,41,42,43}. Other studies have counted the number of cardiometabolic risk factors^44,45, with some also quantifying association with the risk of coronary heart disease⁴⁵. Some studies have used concepts such as metabolic syndrome⁴⁶, optimal cardiometabolic health⁴⁴ and metabolically healthy obesity^47,48,49 to identify groups of people with a specific pre-determined risk factor profile. Studies that used data-driven methods to identify cardiometabolic phenotypes were mostly based on data from specific subgroups of a population (for example, older adults)⁵⁰, users of specific health programs⁵¹ or people with a specific index disease, such as diabetes^52,53,54, sepsis⁵⁵ or cardiogenic shock⁵⁶. The only study analyzing health-related phenotypes in an entire national population⁵⁷ used a mix of behavioral, physiological and diagnostic variables at a single point in time for methodological assessment; it did not analyze change over time or the clinical or epidemiological characteristics of the clusters. Beyond cardiometabolic and renal health, some studies identified co-occurrences, or subtypes, of specific diseases in large cohorts, such as the UK Biobank⁵⁸, in primary care patients from different countries^59,60, especially using electronic health records^{61,62,63,64,65,66,67}. These studies used a range of clustering methods^66,68.

In the present study, we applied a data-driven approach to repeated nationally representative health examination surveys, namely the National Health and Nutrition Examination Survey (NHANES), from 1988 to 2018, to identify a comprehensive set of cardiometabolic and renal phenotypes in the United States adult population. We measured how the prevalence of these phenotypes has changed over time and characterized their sociodemographic, epidemiological and clinical predictors. This information is needed for planning and priority setting for population-based prevention programs and health system interventions to coherently and effectively prevent and manage conditions based on their co-occurrence in the population^69,70.

Cardiometabolic and renal phenotypes of the US population

We identified 10 clusters (phenotypes) for both men and women that collectively characterized the cardiometabolic and renal traits of the US population from 1988 to 2018 (Fig. 1). The reasons for using 10 clusters are stated in the Methods, and the results with other cluster numbers are presented below. The identified phenotypes were similar between men and women, even though we analyzed data for the two sexes separately.

**Fig. 1: Risk factor profiles of the cardiometabolic and renal clusters of US adults for women and men.**

For both sexes, we identified a ‘low risk’ phenotype with near-optimal risk factor levels, accounting for 15% and 13% of the sample for women and men, respectively. We also identified two clusters (‘mid risk short’ and ‘mid risk tall’) jointly accounting for 25% and 28% of the sample for women and men, respectively, with risk factor levels mostly around sample medians. These two clusters differed by their average height and, to a lesser extent, by blood pressure and estimated glomerular filtration rate (eGFR) levels, with the ‘mid risk short’ cluster having, on average, shorter height (median of 155 cm versus 167 cm for women; 168 cm versus 182 cm for men) (Supplementary Table 1), lower blood pressure and higher eGFR than the ‘mid risk tall’ cluster. We also identified a group (‘low BMI, high HDL’) characterized by low levels of body mass index (BMI) and waist-to-height ratio (WHtR) and high high-density lipoprotein (HDL) cholesterol relative to the rest of the NHANES sample but with other risk factors being around the sample median.

Five clusters, together accounting for 40% of the sample for both sexes, were characterized by high levels of one or two related risk factors. These were ‘high cholesterol’, ‘high blood pressure’, ‘severe hyperglycemia’, ‘high heart rate’ and ‘severe obesity’. For instance, the ‘severe hyperglycemia’ phenotype had a median glycated hemoglobin (HbA1c) of 9.9% for women and 9.8% for men, but its median BMI (and WHtR) was much lower than that of the ‘severe obesity’ cluster (median BMI of 31.8 kg m⁻² and 29.7 kg m⁻² in the ‘severe hyperglycemia’ cluster for women and men, respectively, compared to a median BMI of 41.1 kg m⁻² and 38.2 kg m⁻² in the ‘severe obesity’ cluster). Similarly, the ‘high blood pressure’ cluster had a median systolic blood pressure (SBP) of 159 mmHg for both sexes, and the ‘high cholesterol’ cluster had a median non-HDL cholesterol of 5.5 mmol L⁻¹ for both women and men, with other risk factor levels lying between the median and 75th percentiles of the entire NHANES sample. In all these clusters, the defining risk factor varied less among member participants than the other risk factors (Extended Data Fig. 1), further illustrating that its high value was the shared feature among participants who fell in the cluster. Finally, in both sexes, the last cluster (‘low DBP, low eGFR’) was characterized by low levels of diastolic blood pressure (DBP) and eGFR. For example, women who fell in the ‘low DBP, low eGFR’ cluster had a median DBP of 61 mmHg and a median eGFR of 63 ml/min/1.73 m².

Demographic and clinical characteristics of clusters

Most of the identified cardiometabolic and renal phenotypes had a mix of young (20–39 years), middle-aged (40–59 years) and old (60 years and older) adults. The exceptions were two clusters for men and three for women with predominantly young people (‘low risk’ and ‘mid risk short’ for both sexes and ‘high heart rate’ for women) and one with predominantly old people (‘low DBP, low eGFR’) (Table 1). Even though 73% of women and 77% of men in the ‘low risk’ phenotype were aged 20–39 years, 4% and 6%, respectively, were older than 60 years with near-optimal risk factor profiles similar to their younger peers, except for slightly lower eGFR and higher HbA1c. Similarly, although most (92% of women and 90% of men) in the cluster ‘low DBP, low eGFR’ were 60 years or older, a small percentage (1% and 2%, respectively) were aged 20–39 years. Within each cluster, individuals of different age groups generally had similar risk factor profiles, especially on the defining risk factors in the higher risk phenotypes (Extended Data Fig. 2).

Table 1 Demographic characteristics and medication use of cardiometabolic and renal clusters of US adults


The ‘low risk’ group had the lowest number of morbidities and medication use (Table 1 and Extended Data Table 1). As expected, 96% of women and 98% of men in the ‘high blood pressure’ cluster had hypertension, yet this condition was also prevalent in ≥50% of participants in some other clusters—for example, ‘low DBP, low eGFR’ and ‘severe hyperglycemia’ for both sexes and ‘severe obesity’ phenotype for men (most of those with hypertension in the ‘low DBP, low eGFR’ cluster had isolated systolic hypertension). Similarly, all participants in the ‘severe hyperglycemia’ cluster had diabetes; the next highest diabetes prevalence was in the ‘low DBP, low eGFR’ cluster (31% in both sexes), with the ‘severe obesity’ cluster having only the third highest prevalence (22% in women and 25% in men). Median HbA1c of people with diabetes in the ‘severe obesity’ cluster (6.88% for men and 6.77% for women) was much lower than median HbA1c of those in the ‘severe hyperglycemia’ cluster (9.9% for women and 9.8% for men). Finally, those in the ‘low DBP, low eGFR’ phenotype more frequently had a history of myocardial infarction (MI), stroke and congestive heart failure (CHF) than the other phenotypes—for example, 19% of men in this phenotype had a history of MI compared to 6% in the whole sample; similarly, 12% of men in this phenotype had a previous history of CHF compared to 4% in the whole sample.

The use of statins was relatively low in the ‘high cholesterol’ group—13% for women and 8% for men—with that of men being lower than the overall NHANES sample (Table 1). In contrast, statin and antihypertensive use was high in the ‘low DBP, low eGFR’ and ‘severe hyperglycemia’ groups (26–41% of participants in different cluster–sex combinations, which is 2–3 times more than in the overall samples), consistent with the clinical guidelines that recommend the use of these medicines among people with diabetes and history of MI and stroke, especially in older ages. In the ‘severe obesity’ cluster, antihypertensive and statin use was above average, which may partly account for this group having blood pressure and cholesterol levels around the population median. The use of most medicines was higher in the 2011–2018 period than over the entire analysis period, with the largest increase being that of statins (Extended Data Table 2). The increase in statin use was, however, less pronounced in the ‘high cholesterol’ phenotype (+38% relative increase for women and +4% for men) than in the whole sample (+48% for women and +45% for men), demonstrating that this phenotype was characterized by insufficiently treated or controlled levels of non-HDL cholesterol.
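The paragraph above reports changes in medication use as relative increases, whereas other results in the article are given in percentage points. The sketch below shows one plausible reading of the two quantities, assuming the relative increase is the difference between the later and baseline proportions divided by the baseline. The function names and the example figures are illustrative and are not taken from the study.

```python
def absolute_change(before: float, after: float) -> float:
    """Change in percentage points between two prevalences given in percent."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Change expressed as a percentage of the starting value."""
    if before == 0:
        raise ValueError("relative change is undefined when the starting value is 0")
    return 100.0 * (after - before) / before


# Made-up figures, not study results: medication use rising from 10% to 14%.
print(absolute_change(10.0, 14.0))  # change in percentage points
print(relative_change(10.0, 14.0))  # change as a percent of the starting value
```

The same absolute change corresponds to a larger relative change when the starting value is small, which matters when comparing a cluster with low baseline use against the whole sample.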

Trends over time

The cardiometabolic and renal risk profile of the US population changed from 1988 to 2018 (Fig. 2). The age-standardized prevalence of the ‘severe obesity’ phenotype more than tripled for both sexes and that of the ‘low DBP, low eGFR’ phenotype almost doubled over the entire analysis period. Most of the increase of the ‘low DBP, low eGFR’ phenotype occurred between 2000 and 2010, before plateauing after 2010 (P value for trend from 2010 to 2018 was 0.96 for women and 0.97 for men; Extended Data Table 3). In contrast, the prevalence of the ‘high blood pressure’ and ‘high cholesterol’ phenotypes more than halved in both sexes (P value for trend was <0.0001 for both sexes over the entire analysis period). However, since the late 2000s, there has been a reversal of the earlier declines in the prevalence of the ‘high blood pressure’ phenotype (P value for increasing trend from 2010 to 2018 was 0.0015 for women and 0.0346 for men). There was no statistically detectable change in the ‘severe hyperglycemia’ phenotype (P = 0.09 for women and 0.79 for men), which indicates that, despite the increase in the prevalence of diabetes in the United States, those at extreme values of HbA1c were stable. Rather, many of the additional people with diabetes fell in the ‘severe obesity’ and ‘low DBP, low eGFR’ clusters for which the prevalence increased over time. Most trends were consistent between the two sexes. A notable exception was the ‘low risk’ phenotype, which remained constant for men but decreased by 4.5 percentage points for women (P value for trend was 0.0006 over the entire analysis period), even though its prevalence remained higher in women than men throughout the analysis period. Trends in crude prevalence were nearly identical to the age-standardized trends (Extended Data Fig. 3).
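Age-standardized prevalence, as used in the trends above, can be illustrated with direct standardization: age-specific prevalences are averaged using the age shares of a fixed standard population, so that changes in the population's age structure do not drive the trend. The following is a minimal sketch under that assumption. The age bands, prevalences and standard weights are hypothetical, and this section does not state which standard population the study used.

```python
def age_standardized_prevalence(age_specific, standard_weights):
    """Weighted average of age-specific prevalences using standard-population weights."""
    if set(age_specific) != set(standard_weights):
        raise ValueError("age groups must match between prevalences and weights")
    total = sum(standard_weights.values())
    return sum(age_specific[g] * standard_weights[g] for g in age_specific) / total


# Hypothetical age-specific prevalences (%) of one phenotype in two survey periods.
early = {"20-39": 1.0, "40-59": 4.0, "60+": 10.0}
late = {"20-39": 1.5, "40-59": 6.0, "60+": 18.0}
# Hypothetical standard-population shares; a real analysis would use a published standard.
weights = {"20-39": 0.40, "40-59": 0.35, "60+": 0.25}

print(age_standardized_prevalence(early, weights))
print(age_standardized_prevalence(late, weights))
```

Because both periods are weighted by the same standard, the difference between the two printed values reflects changes in age-specific prevalence only.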

**Fig. 2: Trends in cardiometabolic and renal clusters from 1988 to 2018.**

Changes in age patterns of clusters

The various cardiometabolic and renal phenotypes had differing age associations (Fig. 3). The ‘low risk’ and ‘mid risk short’ phenotypes for both sexes, and the ‘high heart rate’ phenotype for women, were more common among younger adults, and their prevalence decreased with age, with a much steeper age association for the ‘low risk’ group. Conversely, the ‘low DBP, low eGFR’ and ‘high blood pressure’ phenotypes became more prevalent throughout the life course, with a steeper age association for the ‘low DBP, low eGFR’ group. Other phenotypes tended to peak in middle ages.

**Fig. 3: Age patterns of cardiometabolic and renal clusters.**

Both ‘high blood pressure’ and ‘high cholesterol’ phenotypes decreased sharply in people aged 50 years and older from 1991 to 2008, likely due to the increased use of statins and antihypertensive medication; however, the decreases may have slowed down or stagnated in the past decade. In contrast, for both sexes, the age association of the ‘low DBP, low eGFR’ phenotype became steeper over time.

Predictors of cardiometabolic and renal traits

We analyzed the sociodemographic, behavioral and clinical predictors of cluster membership in multivariate regressions as described in the Methods. Both education and ethnicity were associated with the partition of the participants into some of the cardiometabolic and renal phenotypes. Higher education was associated with lower odds of allocation to the ‘high cholesterol’ phenotype for both men and women, lower odds of allocation to the ‘severe hyperglycemia’ phenotype for men and lower odds of allocation to the ‘low DBP, low eGFR’ phenotype for women; it was associated with higher odds of being in the ‘low risk’ phenotype for women (Figs. 4 and 5). Hispanic and non-Hispanic Black women and men had higher odds of belonging to the ‘severe hyperglycemia’ and ‘high blood pressure’ phenotypes than non-Hispanic Whites; Hispanic and non-Hispanic Black women had lower odds of belonging to the ‘low risk’ phenotype than non-Hispanic Whites; and non-Hispanic Black men and women had lower odds of belonging to the ‘high cholesterol’ phenotype.

**Fig. 4: Predictors of the allocation to cardiometabolic and renal phenotypes in women.**

**Fig. 5: Predictors of the allocation to cardiometabolic and renal phenotypes in men.**

The use of statins was associated with lower odds of belonging to the ‘high cholesterol’ phenotype for both men and women, demonstrating its effectiveness in controlling hypercholesterolemia. In contrast, diabetes medications, both oral and insulin, were associated with the ‘severe hyperglycemia’ phenotype in both sexes, as were antihypertensive medications for the ‘high blood pressure’ phenotype, albeit with a smaller magnitude than the former association. This shows that many individuals in these two phenotypes have uncontrolled diabetes or hypertension despite being treated⁴¹. Individuals on antihypertensive medicines also had higher odds of belonging to the ‘severe obesity’ phenotype, which provides one explanation for this group having a blood pressure level around the population median, despite the association between obesity and hypertension³³. We also found that previous history of MI (both sexes) as well as previous history of CHF (women) were associated with the ‘low DBP, low eGFR’ phenotype even after adjusting for age and other predictors.

Influence of the number of clusters

As described in the Methods, our main results are based on 10 clusters, but we also investigated cluster membership and characteristics when sequentially changing the number of clusters (k) from 5 to 12. Even with five clusters (k = 5), four epidemiologically relevant cardiometabolic and renal phenotypes were identified (‘low risk’, ‘severe hyperglycemia’, ‘high blood pressure’ and ‘severe obesity’), along with a ‘mid risk’ cluster that captured all other participants (Fig. 6 and Supplementary Fig. 1). As the number of clusters increased, more refined and specific groups were identified as subsets of one or more of the existing clusters. For instance, the ‘high cholesterol’ cluster appeared at k = 7 for women, with participants coming from the clusters of ‘high blood pressure’ and ‘mid risk’ at k = 6. Similarly, the ‘mid risk’ group for men at k = 7 split into ‘mid risk tall’ and ‘mid risk short’ at k = 8. For both sexes, the ‘severe hyperglycemia’ cluster appeared at k = 5 and remained relatively unchanged as k increased, as did the ‘low DBP, low eGFR’ cluster after k = 6.
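Tracing where the members of a new cluster came from, as described above, amounts to cross-tabulating each participant's label at k against their label at k + 1. The sketch below shows that bookkeeping only. It assumes the cluster labels have already been produced by whatever clustering method is used, and the function names and toy labels are hypothetical rather than study data.

```python
from collections import Counter


def membership_flows(labels_k, labels_k_plus_1):
    """Count participants for each (cluster at k, cluster at k + 1) pair."""
    if len(labels_k) != len(labels_k_plus_1):
        raise ValueError("both label lists must cover the same participants")
    return Counter(zip(labels_k, labels_k_plus_1))


def sources_of(flows, new_cluster):
    """Return {origin cluster: count} for one cluster in the finer solution."""
    return {old: n for (old, new), n in flows.items() if new == new_cluster}


# Toy labels for eight participants; not study data.
at_k6 = ["mid risk", "mid risk", "high blood pressure", "high blood pressure",
         "mid risk", "low risk", "high blood pressure", "mid risk"]
at_k7 = ["mid risk", "high cholesterol", "high cholesterol", "high blood pressure",
         "mid risk", "low risk", "high blood pressure", "high cholesterol"]

flows = membership_flows(at_k6, at_k7)
print(sources_of(flows, "high cholesterol"))
```

A cluster that is stable across values of k shows up as a single dominant origin with almost no participants moving in or out.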

**Fig. 6: Changes in cardiometabolic and renal clusters in relation to the number of clusters.**

Strengths and limitations

The strengths of our study include a novel approach to identifying a comprehensive set of epidemiologically and clinically relevant phenotypes that characterize the entire national population, and coverage of four decades using repeated nationally representative samples with a largely consistent methodology, which allowed us to measure change and disparities in phenotype prevalence and its predictors. Our study has some limitations. First, we did not include any inflammation-related biomarkers, such as C-reactive protein, or other cardiometabolic or renal biomarkers, such as cystatin C or apolipoprotein B, because these data were not available in some rounds of NHANES. Second, this analysis was based on a series of repeated cross-sectional samples and was not designed to evaluate how an individual with a specific phenotype in one year may have shifted to another in a later year or how the identified phenotypes affect the risk of disease onset or death, which should be pursued with data from prospective cohort studies. Third, other clustering methods should be tested in future methodological assessments, especially probabilistic clustering methods that estimate the probabilities that each participant belongs to each cluster. Finally, although we analyzed some predictors of cluster allocation, future research should investigate how other factors, including genetics, diet, behaviors and the living environment, affect assignment to specific clusters.

Discussion

Application of data-driven clustering, which has been applied extensively to genomics data, to population-based risk factor data identified a comprehensive set of clinically relevant cardiometabolic and renal phenotypes in the US adult population over a period of four decades. The results showed an increase in the ‘severe obesity’ phenotype whose other cardiometabolic risks were not noticeably different from the average population, a stable prevalence of the ‘severe hyperglycemia’ phenotype and a sharp decrease in the ‘high cholesterol’ and ‘high blood pressure’ phenotypes. This improvement in vascular health has been partly offset by rising prevalence of those with poor kidney function in the ‘low DBP, low eGFR’ cluster.

To our knowledge, no study has applied data-driven clustering methods to repeated nationally representative data to identify multifactorial cardiometabolic and renal phenotypes, and to analyze their trends, in the US population. Our results were consistent with single-risk-factor trend studies on obesity, hypertension or blood lipids, which showed a rise in the former but a decline in the latter two risk factors, including in individuals with obesity^{34,35,36,42,43}. Our result on the higher prevalence of the ‘low risk’ phenotype in women than in men was also consistent with previous findings on cardiovascular health of the US population⁴⁴. We further observed a decrease in the ‘low risk’ phenotype in women and no detectable change for men, which was consistent with a reported statistically insignificant trend in the prevalence of optimal cardiometabolic health for both sexes combined⁴⁴. We did not observe an increase in the ‘severe hyperglycemia’ phenotype between 1988 and 2018 despite the reported rise in diabetes in the United States⁷¹. This was because the ‘severe hyperglycemia’ phenotype was characterized by very high HbA1c levels and included individuals with uncontrolled diabetes, consistent with previous findings on diabetes subgroups^53,54. The prevalence of people at such high levels of HbA1c has been relatively stable because improvements in diagnosis and management have countered the rise in total diabetes prevalence⁷². The ‘low DBP, low eGFR’ phenotype, which had two dominant features (high pulse pressure and poor kidney function), is consistent with the association between atherosclerosis and CKD⁷³. This phenotype was found predominantly in older ages, had a high prevalence of diabetes and was associated with a history of MI and CHF for women, consistent with high levels of vascular–renal comorbidity in older ages⁷⁴ and with the association of CHF with pulse pressure⁷⁵. 
The observed increase in the ‘low DBP, low eGFR’ phenotype, especially in the early 2000s, was also consistent with the previously reported rise in the prevalence of CKD in the United States⁷⁶. We did not identify a metabolically healthy obesity phenotype, which accounted for 9.7% of the US population in one study on this specific group⁷⁷, even after allowing 12 clusters to be formed. There may be two reasons for this apparent difference. First, half of the people classified as metabolically healthy in the aforementioned study⁷⁷ had one metabolic risk factor. Second, in our study, such people were clustered either in the ‘severe obesity’ phenotype or in the two mid-risk phenotypes. Finally, our results on ethnic and educational disparities in the prevalence of specific clusters were consistent with previous studies that considered risk factors either individually^36,78 or through the lens of optimal cardiometabolic health²³, but these studies did not examine disparities in a comprehensive set of cardiometabolic and renal phenotypes of risk factors. Our results are not directly comparable with those using electronic health records due to differences in the study population, methods and clinical conditions used in the clustering and because some of these studies aimed at identifying subtypes of specific diseases^{45,47,48,50,51,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68}. Among such studies, two studies in different populations identified phenotypes characterized by compromised kidney function and low DBP^50,56. Another study that used electronic health records in London also found a cluster with both CHF and CKD⁶², which is analogous to our ‘low DBP, low eGFR’ phenotype. One study using electronic health records found a subtype of type 2 diabetes characterized by very high HbA1c levels analogous to the ‘severe hyperglycemia’ phenotype identified in our study⁵⁴.

Our analysis coherently uncovered epidemiological subgroups of the US population characterized by distinct profiles of cardiometabolic and renal risk factors. Some of these phenotypes were characterized by high levels of one or two closely related risk factors, whereas others were more complex and based on multiple seemingly unrelated traits that may share upstream clinical and sociodemographic determinants. Although genetics influences individual or multiple risk factors^{79,80,81,82,83,84,85}, the risk factors that characterized the clusters identified in our study are also influenced by behavioral, environmental and dietary determinants as well as the use (or non-use) of medicines that lower risk factor levels. Future research combining these determinants with genetic data is needed to discern their contributions to the prevalence and trends in cardiometabolic phenotypes and their influence on the occurrence of disease. Our results apply to the US population, and future research should also compare cardiometabolic and renal phenotypes across populations with different diets, health behaviors, healthcare and genetics.

Although the prevalence of the phenotype characterized by very high BMI and WHtR has increased, this group had about average levels of other risk factors. Nonetheless, higher-than-median BMI was also a trait of the ‘severe hyperglycemia’ phenotype, which has not declined despite improvements in diabetes detection and treatment, reflecting the growth of incidence and prevalence of diabetes during the period examined³⁸. There was a substantial decline in phenotypes characterized by high levels of non-HDL cholesterol and SBP and DBP, despite the rise in the ‘severe obesity’ phenotype. The use of antihypertensive medicines, which increased over time, may be one of the reasons that those in the ‘severe obesity’ cluster have near-average blood pressure levels despite their high BMI and WHtR levels. The use of statins and antihypertensive medications may have also shifted some treated individuals from the ‘high blood pressure’ and ‘high cholesterol’ groups into the two mid-risk ones, as seen in the correlated trends in the prevalence of the ‘high cholesterol’ and ‘high blood pressure’ phenotypes with the use of statins and antihypertensive medications, respectively (Fig. 7)^86,87. These improvements have contributed to the decades-long decline in cardiovascular mortality in the United States through lower event rates and better survival^88,89. The delayed vascular events and better survival, however, may have engendered a rise in an older group with increasing vascular–renal comorbidities, represented by the ‘low DBP, low eGFR’ phenotype, among whom history of MI and stroke was common and the prevalence of CHF was high. The increase of the ‘high blood pressure’ phenotype since the late 2000s may be due to the fact that hypertension treatment and control in the United States, and in other high-income countries, has not improved over the past decade⁹⁰. This stagnation may be partly responsible for the recent deceleration in the decline of CVD mortality⁸⁹.
Public health actions, especially those that enhance access to healthier foods, such as fresh fruits and vegetables, legumes and unprocessed grains, as well as treatment of hypertension, high cholesterol and diabetes, can help shift an increasing share of the population from some of the high-risk phenotypes to low-risk and mid-risk ones and delay the onset of comorbid chronic conditions that characterized the ‘low DBP, low eGFR’ phenotype. New medicines for obesity, if their cost is lowered, may also reduce the prevalence of the ‘severe obesity’ phenotype, which has average levels of other risk factors, and also reduce BMI among people who fall in other high-risk clusters⁹¹. These interventions may be optimized and targeted in the future through precision public health approaches that use the entire risk factor profile or more efficient risk stratification and risk factor management through both clinical and community-based interventions.

**Fig. 7: Age-standardized trends in hypertension, treated hypertension and prevalence of the ‘high blood pressure’ phenotype (a) and in hypercholesterolemia, treated hypercholesterolemia and prevalence of the ‘high cholesterol’ phenotype (b).**

Methods

Data

The NHANES is a nationally representative survey of the US non-institutionalized civilian population aged 2 months or older with a multistage, stratified clustered probability sample design. The first round of NHANES was done in 1971, and, since 1999, it has been conducted in continuous 2-year rounds. Details of survey design and sampling are provided elsewhere⁹² and are summarized below.

We used 11 rounds of NHANES, including NHANES III (1988–1994) and the rounds of continuous NHANES from 1999 to 2018, for analyzing trends in cardiometabolic and renal traits. We did not use rounds before NHANES III because they did not measure HbA1c. NHANES participants are not re-enrolled in subsequent years, except by chance. Therefore, our results represent cardiometabolic and renal clusters present in successive US populations.

Participants in each round of NHANES were sampled to be collectively representative of the population in the survey year. Ethnic minorities as well as older adults were oversampled to provide stable estimates for these groups. Sample weights were calculated to account for the complex survey design, survey non-response and post-stratification adjustment to match total population counts from the Census Bureau.
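Because some groups are oversampled, population-level prevalence has to be computed with the sample weights rather than as a simple share of participants. The sketch below shows a weighted point estimate for membership in one cluster. The function name and toy numbers are hypothetical, and the sketch deliberately ignores variance estimation, which would additionally require the strata and sampling-unit information from the survey design.

```python
def weighted_prevalence(in_cluster, sample_weights):
    """Share of the weighted population that falls in a cluster (0 to 1)."""
    if len(in_cluster) != len(sample_weights):
        raise ValueError("need one weight per participant")
    total = sum(sample_weights)
    if total <= 0:
        raise ValueError("sample weights must sum to a positive value")
    return sum(w for flag, w in zip(in_cluster, sample_weights) if flag) / total


# Toy data: participants from oversampled groups carry smaller weights.
flags = [True, False, True, False]
weights = [1000.0, 3000.0, 500.0, 500.0]

print(weighted_prevalence(flags, weights))  # weighted share
print(sum(flags) / len(flags))              # unweighted share, for contrast
```

In this toy example the two shares differ because the cluster members represent fewer people in the population than their count in the sample suggests.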

We restricted the analysis to participants aged 20 years and older who had all the required biomarker measurements available. We used the following risk factors in our study, based on their relevance to cardiometabolic and renal diseases and their availability in NHANES data.

Anthropometric measures: we used height (cm); BMI, defined as weight divided by height squared (kg m⁻²); and WHtR, defined as waist circumference divided by height. Being taller is associated with a lower risk of CVDs and all-cause mortality but a higher risk of some cancers¹³. High BMI is a risk factor for diabetes, CVDs, several cancers and kidney and liver diseases^9,14. WHtR was included as a measure of abdominal obesity, which may increase the risk of disease and death independently of BMI⁹³.

Blood pressure and heart rate: we used SBP and DBP as they are associated with increased risk of CVDs, kidney disease and dementia⁸. We included resting heart rate (RHR), as higher values have been associated with increased risk of cardiovascular and all-cause mortality³. RHR was measured as 60-s pulse and referred to as pulse rate.

Lipids: we used HDL cholesterol and non-HDL cholesterol, the latter defined as total cholesterol (TC) minus HDL cholesterol. Non-HDL cholesterol is associated with higher risk of ischemic heart disease and stroke, and HDL cholesterol is a marker for lower risk¹¹.

Glycemia: we used HbA1c, a proxy for average blood glucose levels over recent weeks that has been associated with CVDs¹², as the marker of glycemic risk and control.

Kidney function: we used eGFR (calculated with the CKD-EPI creatinine equation) as a measure of kidney function; eGFR is a predictor of CKD and CVDs⁵,⁶.
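The derived variables defined above (BMI, WHtR and non-HDL cholesterol) can be sketched in a few lines. This is an illustrative Python sketch, not the authors' R code; the function name, argument names and the example values are assumptions made for illustration, and the cholesterol inputs only need to share the same unit.

```python
def derived_risk_factors(weight_kg, height_cm, waist_cm, tc, hdl):
    """Compute the derived variables described in the text.

    Hypothetical helper: names and units are assumptions for illustration.
    """
    height_m = height_cm / 100.0
    return {
        # BMI: weight divided by height squared (kg per square metre)
        "bmi": weight_kg / height_m ** 2,
        # WHtR: waist circumference divided by height (same unit, so unitless)
        "whtr": waist_cm / height_cm,
        # non-HDL cholesterol: total cholesterol minus HDL cholesterol
        "non_hdl": tc - hdl,
    }


example = derived_risk_factors(weight_kg=81.0, height_cm=180.0,
                               waist_cm=90.0, tc=5.2, hdl=1.3)
```

The eGFR calculation is left out on purpose: the text names the CKD-EPI creatinine equation but not which version was used.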

All the risk factors used in the clustering were measured directly. Physical examinations were conducted in a mobile examination center, and blood samples were drawn from a random subset of the participants. Blood pressure was measured three times on the right arm with a sphygmomanometer and appropriate cuff size in seated position after a 5-min rest period in all rounds. Both TC and HDL analyses were conducted on venous samples collected according to a standardized protocol. Although the laboratories, methods and instruments used to measure lipid concentrations changed across survey periods, measurements were standardized according to the criteria of the Centers for Disease Control and Prevention (CDC) or the National Heart, Lung, and Blood Institute Lipid Standardization Program of the CDC⁹⁴. HbA1c was measured in all NHANES cycles using high-performance liquid chromatography. We followed NHANES recommendations and did not apply any calibration correction based on cross-over regression. Before eGFR calculation, serum creatinine measurements were calibrated using a previously reported calibration equation⁹⁵ to account for potential drift in measurement methods. More information on NHANES measurement, laboratory procedures and careful quality controls can be found on the survey website: http://www.cdc.gov/nchs/nhanes.htm.

We did not use data on inflammation markers, such as C-reactive protein, because these data were only available in some rounds of NHANES. We also used data on age, sex, race and ethnicity, education, history of diseases and medication use for examining the demographic and clinical characteristics of the clusters; these data were collected through a questionnaire.

Data cleaning

Before analyses, we conducted the following data cleaning procedure. First, we removed measurements outside pre-defined plausibility ranges (Supplementary Table 2). Second, for blood pressure, we discarded the first measurement and used the average of the remaining measurements. Third, for all participants, we confirmed that SBP > DBP and TC ≥ HDL. Finally, we applied an outlier detection procedure based on Mahalanobis distance⁹⁶ to exclude risk factor pairs that had an implausible pairwise relationship relative to the overall data. This method uses the empirical relationship between risk factor pairs to detect extreme combinations, for example, a high SBP of 248 mmHg but low DBP of 40 mmHg or a high BMI of 42 kg m⁻² but small waist circumference of 74 cm. We applied this technique separately to all pairs of anthropometric variables (height, weight, BMI, waist circumference and WHtR), those of blood pressure (SBP and DBP) and those of lipids (TC and HDL). All variables except height and DBP were log transformed before outlier detection to account for their skewed distributions. For each pair considered, observations with a Mahalanobis distance larger than 40.08 (equivalent to a distance of six standard deviations from the mean) were excluded. The present analysis used data from 58,452 participants (28,272 men and 30,180 women) after applying the above steps (Extended Data Fig. 4).
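The pairwise outlier step can be illustrated with a minimal Python sketch. It rests on assumptions that the text does not state: that the reported cutoff applies to the squared Mahalanobis distance, that this squared distance is compared with a chi-square distribution with 2 degrees of freedom (one per variable in the pair), and that plain sample means and covariances are used. Under those assumptions, matching the two-sided tail probability of six standard deviations gives a cutoff of about 40.09, close to the reported 40.08. All function names are hypothetical.

```python
import math


def chi2_2df_threshold(n_sd=6.0):
    """Squared-distance cutoff whose chi-square (2 df) tail probability
    equals the two-sided normal tail beyond n_sd standard deviations."""
    tail = math.erfc(n_sd / math.sqrt(2.0))  # P(|Z| > n_sd)
    return -2.0 * math.log(tail)             # solves exp(-t / 2) = tail


def squared_mahalanobis(xs, ys):
    """Squared Mahalanobis distance of each (x, y) pair from the sample mean,
    using the sample covariance matrix (n - 1 denominator)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2 x 2 covariance matrix
    return [
        (syy * (x - mx) ** 2
         - 2.0 * sxy * (x - mx) * (y - my)
         + sxx * (y - my) ** 2) / det
        for x, y in zip(xs, ys)
    ]


def flag_outliers(xs, ys, threshold=40.08):
    """Indices of observations whose squared distance exceeds the cutoff."""
    return [i for i, d2 in enumerate(squared_mahalanobis(xs, ys)) if d2 > threshold]
```

In the analysis described above, most variables were log transformed before this step, so a caller would pass the transformed values for those variables.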

Statistical analysis—cluster identification

Our analytical objective was to divide the NHANES sample into groups of participants with risk factor levels that are similar to each other but distinct from those in other clusters. In extreme cases of one or more risk factors—for example, familial hypercholesterolemia or possibly type 1 diabetes—this task is relatively straightforward and may even be feasible based on prior knowledge or visual inspection of data. For national populations, however, such partitioning requires a method that operationalizes the analytical objective by partitioning the joint distribution of risk factors.

We used a k-means clustering algorithm to identify cardiometabolic and renal phenotypes of the US population in an unsupervised data-driven approach. The k-means algorithm partitions participants into non-overlapping clusters that are relatively homogeneous while maximizing the heterogeneity between clusters, by minimizing the sum of squared distances of all data points from the center of the cluster they belong to. The k-means algorithm is a specific form of Gaussian mixture method where only the means of the clusters are estimated but not their covariance⁹⁷. It is widely used and computationally efficient. We ran the algorithm from 50 different random sets of starting values to reduce the risk of converging to a poor local minimum, and used Euclidean distance and the Lloyd implementation of the algorithm.
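A minimal Python sketch of Lloyd's algorithm with multiple random starts is given below to make the procedure concrete. It is not the authors' implementation (the analyses were done in R); the function names, the seed, the iteration limit and the handling of empty clusters are assumptions made for illustration.

```python
import random


def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sqdist(p, centers[j]))
            for p in points]


def lloyd_kmeans(points, k, n_starts=50, max_iter=100, seed=0):
    """Run Lloyd's algorithm from n_starts random initialisations and keep
    the solution with the lowest within-cluster sum of squares (WSS)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        centers = rng.sample(points, k)  # random data points as starting centers
        for _ in range(max_iter):
            labels = assign(points, centers)
            new_centers = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                # keep the old center if a cluster happens to be empty
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                    if members else centers[j]
                )
            if new_centers == centers:  # assignments can no longer change
                break
            centers = new_centers
        labels = assign(points, centers)
        wss = sum(sqdist(p, centers[lab]) for p, lab in zip(points, labels))
        if best is None or wss < best[0]:
            best = (wss, labels, centers)
    return best  # (WSS, labels, centers)
```

Keeping the lowest-WSS run across starts is what guards against a poor local minimum; a single start returns whichever solution its initial centers happen to lead to.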

All analyses were conducted by pooling individual participant data across all survey rounds but separately for men and women to allow for potentially different clustering of cardiometabolic traits between them. We centered and scaled each risk factor by subtracting the overall mean and dividing by the standard deviation before clustering. In k-means, the number of clusters (k) must be pre-specified. Various heuristics have been suggested for selecting the optimal number of clusters (for example, the elbow method and the silhouette method), which compare measures of cluster cohesion and cluster separation for different choices of k. Neither the elbow nor the silhouette method provided a definitive optimal number of clusters (Supplementary Fig. 2). Therefore, we investigated cluster membership and characteristics when sequentially changing k from 5 to 12, and selected k based on these heuristics as well as on the epidemiological interpretability of the results.
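The centering and scaling step can be sketched as follows. This is an illustrative Python sketch rather than the authors' code; the function name is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, because the text does not say which estimator was used.

```python
import statistics


def standardize(columns):
    """Center and scale each risk factor: subtract its mean and divide by
    its standard deviation. `columns` maps a variable name to its values."""
    out = {}
    for name, values in columns.items():
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)  # sample SD; an assumption, see lead-in
        out[name] = [(v - mean) / sd for v in values]
    return out


z = standardize({"sbp": [110.0, 120.0, 130.0], "hdl": [1.0, 1.5, 2.0]})
```

Standardizing puts variables with different units, such as SBP and HDL cholesterol, on a common scale, so that no single risk factor dominates the Euclidean distances used by k-means.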

Stability of the clustering results

After selecting the number of clusters, we evaluated the stability of the resultant clusters by calculating the average Jaccard index⁹⁸ between the clustering results over the entire sample and those of 1,000 subsamples of 50% of the data drawn without replacement (Extended Data Table 4). The Jaccard index is a measure of similarity between two groups and ranges from 0 to 1, with 0 indicating no overlap and 1 indicating identical results. For men, all clusters had an average Jaccard index of 0.87 or above; for women, all clusters had an average Jaccard index of 0.80 or above, except for the ‘mid risk tall’ phenotype that had an average Jaccard index of 0.70. To evaluate whether our analysis met our analytical objective of partitioning the joint distribution of risk factors based on a true correlation structure, we also used k-means to cluster 30,180 simulated data points (the same number as used in the main analysis). The simulated data were generated from a 10-dimensional normal distribution with no correlation. All the resulting clusters were highly unstable with a Jaccard index below 0.30, which is much lower than those of clusters identified on NHANES data (Extended Data Table 4).
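The Jaccard-based stability check can be sketched in Python as below. The matching rule is an assumption made for illustration: each full-sample cluster is restricted to the subsampled participants and compared with its best-matching cluster in the subsample clustering. The function and argument names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def clusterwise_jaccard(full_labels, sub_indices, sub_labels):
    """For each full-sample cluster, the highest Jaccard index against any
    cluster found when re-clustering the subsample.

    full_labels: cluster label of every participant in the full sample
    sub_indices: positions of the subsampled participants
    sub_labels:  labels from re-clustering the subsample, same order as sub_indices
    """
    full = {}
    for i in sub_indices:
        full.setdefault(full_labels[i], set()).add(i)
    sub = {}
    for i, lab in zip(sub_indices, sub_labels):
        sub.setdefault(lab, set()).add(i)
    return {c: max(jaccard(members, s) for s in sub.values())
            for c, members in full.items()}
```

Averaging these per-cluster values over many subsamples gives one stability score per cluster, which is the kind of quantity reported in Extended Data Table 4.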

Intra-cluster and inter-cluster distances

We also report (Extended Data Fig. 5) the intra-cluster and inter-cluster distances as a measure of how the method achieves the analytical objective. The intra-cluster distance was calculated as the average Euclidean distance between all pairs of points in the same cluster, and the inter-cluster distance was calculated as the average Euclidean distance between all pairs of points from two different clusters. These metrics show that participants assigned to every cluster were, on average, more similar to one another in terms of their risk factor levels than they were to participants in any other cluster.
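The two distance summaries can be written directly from their definitions. The Python sketch below is illustrative only, with hypothetical function names; it enumerates all pairs, which is simple but quadratic in cluster size, so samples as large as those analyzed here would call for a vectorized or sampled version.

```python
import math
from itertools import combinations, product


def mean_intra_distance(points):
    """Average Euclidean distance over all pairs of points in one cluster."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def mean_inter_distance(points_a, points_b):
    """Average Euclidean distance over all pairs with one point per cluster."""
    total = sum(math.dist(p, q) for p, q in product(points_a, points_b))
    return total / (len(points_a) * len(points_b))
```

A well-separated clustering has a smaller intra-cluster value for each cluster than its inter-cluster value against every other cluster.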

Consistency of clusters over time

We investigated whether clusters emerging from the analysis of all rounds of NHANES from 1988 to 2018 were similar to those that would emerge if we repeated the analysis for subperiods consisting of NHANES III 1988–1994, NHANES 1999–2008 and NHANES 2009–2018 separately (Supplementary Fig. 3). The phenotypes identified in subperiods were similar to those identified when aggregating all rounds from 1988 to 2018 for men. For women, most of the phenotypes identified over the entire analysis period remained in subperiod clustering, except the ‘mid risk tall’ phenotype, which was replaced by either an ‘obesity’ phenotype or a ‘mid risk’ phenotype, and except the ‘low DBP, low eGFR’ phenotype in NHANES III, which was replaced with a ‘high risk’ phenotype with hazardous levels of all risk factors.

Statistical analysis—trends in prevalence and predictors of cluster membership

In addition to graphical presentation of how cluster prevalence has changed over time, we analyzed the presence of a trend in a regression analysis. We fitted one logistic regression per cluster, with time as the independent variable. We adjusted for age by 5-year age bands and report the P value for the coefficient of the time term. In addition to the entire analysis period, we analyzed trends for pre-specified time periods of 1988–2000, 2000–2010 and 2010–2018 (Extended Data Table 3).
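The per-cluster trend model can be illustrated with a deliberately simplified Python sketch: a weighted logistic regression of cluster membership (0 or 1) on time, fitted by Newton's method. It omits the age-band adjustment and the P value, it is not the authors' R code, and the function name and arguments are assumptions. Time should be centered (for example, years since the first round) to keep the fit numerically stable.

```python
import math


def fit_logistic_trend(times, outcomes, weights=None, n_iter=25, tol=1e-10):
    """Fit logit P(member) = b0 + b1 * time by Newton's method.

    Returns (b0, b1). Weights, if given, enter as pseudo-likelihood weights.
    """
    if weights is None:
        weights = [1.0] * len(times)
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for t, y, w in zip(times, outcomes, weights):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * t)))
            r = w * (y - p)            # contribution to the score (gradient)
            g0 += r
            g1 += r * t
            v = w * p * (1.0 - p)      # contribution to the information matrix
            h00 += v
            h01 += v * t
            h11 += v * t * t
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step: inverse(H) times g
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

A positive b1 indicates rising prevalence of the cluster over time. Valid standard errors under a complex survey design need design-based variance estimation, which this sketch does not attempt.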

We also used multivariate logistic regression to analyze the predictors of cluster membership. The predictors included age group, survey year, race or ethnicity (non-Hispanic White, non-Hispanic Black, Hispanic and Other ethnicity), education (below high school, high school and university or college), medication use (antihypertensive, statin, oral hypoglycemic diabetes medication and insulin), smoking (current smoking, never smoking and former smoking) and previous history of disease (MI, stroke and CHF).

When reporting the prevalence of clusters over time, and the potential predictors of cluster membership, we accounted for the sampling design through the use of sample weights in the regressions. In all regressions, we rescaled sample weights so that they summed to the same total in each round. We did this so that each round of NHANES contributes the same effective sample size to the analysis of trends and predictors. When evaluating trends over time and predictors of cluster membership, we also adjusted the sample weights by 5-year age bands to match the age distribution of the 2020 US census population. All analyses were done using R software version 4.0.3.
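The two weight adjustments described above can be sketched as follows. This Python sketch is illustrative and is not the authors' R code; the function names are hypothetical, and the target age shares passed by a caller would have to come from the reference population, which is not reproduced here.

```python
from collections import defaultdict


def rescale_by_round(rounds, weights, target_total=1.0):
    """Rescale weights so that they sum to the same total in every round."""
    totals = defaultdict(float)
    for r, w in zip(rounds, weights):
        totals[r] += w
    return [w * target_total / totals[r] for r, w in zip(rounds, weights)]


def match_age_distribution(rounds, age_bands, weights, target_share):
    """Within each round, rescale weights so that each age band carries its
    target share of the round's total weight.

    target_share maps an age band to its share in the reference population;
    shares should sum to 1 and every band should occur in every round.
    """
    band_totals = defaultdict(float)
    round_totals = defaultdict(float)
    for r, a, w in zip(rounds, age_bands, weights):
        band_totals[(r, a)] += w
        round_totals[r] += w
    return [
        w * target_share[a] * round_totals[r] / band_totals[(r, a)]
        for r, a, w in zip(rounds, age_bands, weights)
    ]
```

Applying the age adjustment first and the per-round rescaling second leaves every round with the same total weight and the same weighted age distribution.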

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data used for this analysis are publicly available and can be downloaded from the NHANES website: https://wwwn.cdc.gov/nchs/nhanes/Default.aspx.

Code availability

The computer code for the clustering and the multivariate analysis in this work is available at https://globalenvhealth.org/code-data-download/ and https://doi.org/10.5281/zenodo.10075387.

Change history

15 January 2024
A Correction to this paper has been published: https://doi.org/10.1038/s44161-024-00425-z

References

NCD Countdown 2030 Collaborators. NCD Countdown 2030: worldwide trends in non-communicable disease mortality and progress towards Sustainable Development Goal target 3.4. Lancet 392, 1072–1088 (2018).
NCD Countdown 2030 Collaborators. NCD Countdown 2030: pathways to achieving Sustainable Development Goal target 3.4. Lancet 396, 918–934 (2020).
Aune, D. et al. Resting heart rate and the risk of cardiovascular disease, total cancer, and all-cause mortality—a systematic review and dose–response meta-analysis of prospective studies. Nutr. Metab. Cardiovasc. Dis. 27, 504–517 (2017).
Cheng, G., Huang, C., Deng, H. & Wang, H. Diabetes as a risk factor for dementia and mild cognitive impairment: a meta‐analysis of longitudinal studies. Intern. Med. J. 42, 484–491 (2012).
Chronic Kidney Disease Prognosis Consortium. Association of estimated glomerular filtration rate and albuminuria with all-cause and cardiovascular mortality in general population cohorts: a collaborative meta-analysis. Lancet 375, 2073–2081 (2010).
Gansevoort, R. T. et al. Lower estimated GFR and higher albuminuria are associated with adverse kidney outcomes. A collaborative meta-analysis of general and high-risk population cohorts. Kidney Int. 80, 93–104 (2011).
Kannel, W. B., Dawber, T. R., Kagan, A., Revotskie, N. & Stokes, J. III Factors of risk in the development of coronary heart disease—six-year follow-up experience: the Framingham Study. Ann. Intern. Med. 55, 33–50 (1961).
Kennelly, S. P., Lawlor, B. A. & Kenny, R. A. Blood pressure and dementia—a comprehensive review. Ther. Adv. Neurol. Disord. 2, 241–260 (2009).
Kyrgiou, M. et al. Adiposity and cancer at major anatomical sites: umbrella review of the literature. BMJ 356, j477 (2017).
Lewington, S., Clarke, R., Qizilbash, N., Peto, R. & Collins, R. Age-specific relevance of usual blood pressure to vascular mortality: a meta-analysis of individual data for one million adults in 61 prospective studies. Lancet 360, 1903–1913 (2002).
The Emerging Risk Factors Collaboration. Major lipids, apolipoproteins, and risk of vascular disease. JAMA 302, 1993–2000 (2009).
The Emerging Risk Factors Collaboration. Glycated hemoglobin measurement and prediction of cardiovascular disease. JAMA 311, 1225–1233 (2014).
The Emerging Risk Factors Collaboration. Adult height and the risk of cause-specific death and vascular morbidity in 1 million people: individual participant meta-analysis. Int. J. Epidemiol. 41, 1419–1433 (2012).
The Global BMI Mortality Collaboration. Body-mass index and all-cause mortality: individual-participant-data meta-analysis of 239 prospective studies in four continents. Lancet 388, 776–786 (2016).
Tsilidis, K. K., Kasimis, J. C., Lopez, D. S., Ntzani, E. E. & Ioannidis, J. P. Type 2 diabetes and cancer: umbrella review of meta-analyses of observational studies. BMJ 350, g7607 (2015).
Mahamat-Saleh, Y. et al. Diabetes, hypertension, body mass index, smoking and COVID-19-related mortality: a systematic review and meta-analysis of observational studies. BMJ Open 11, e052777 (2021).
Angell, S. Y. et al. The American Heart Association 2030 impact goal: a presidential advisory from the American Heart Association. Circulation 141, e120–e138 (2020).
Kannel, W. B., McGee, D. & Gordon, T. A general cardiovascular risk profile: the Framingham Study. Am. J. Cardiol. 38, 46–51 (1976).
Lloyd-Jones, D. M. et al. Defining and setting national goals for cardiovascular health promotion and disease reduction: the American Heart Association’s strategic Impact Goal through 2020 and beyond. Circulation 121, 586–613 (2010).
Rasmussen-Torvik, L. J. et al. Ideal cardiovascular health is inversely associated with incident cancer: the Atherosclerosis Risk In Communities study. Circulation 127, 1270–1275 (2013).
Stamler, J. et al. Low risk-factor profile and long-term cardiovascular and noncardiovascular mortality and life expectancy: findings for 5 large cohorts of young adult and middle-aged men and women. JAMA 282, 2012–2018 (1999).
Carter, P., Gray, L. J., Troughton, J., Khunti, K. & Davies, M. J. Fruit and vegetable intake and incidence of type 2 diabetes mellitus: systematic review and meta-analysis. BMJ 341, c4229 (2010).
Filippini, T. et al. Blood pressure effects of sodium reduction: dose–response meta-analysis of experimental studies. Circulation 143, 1542–1567 (2021).
Filippini, T. et al. Potassium intake and blood pressure: a dose–response meta‐analysis of randomized controlled trials. J. Am. Heart Assoc. 9, e015719 (2020).
Gay, H. C., Rao, S. G., Vaccarino, V. & Ali, M. K. Effects of different dietary interventions on blood pressure: systematic review and meta-analysis of randomized controlled trials. Hypertension 67, 733–739 (2016).
Ley, S. H., Hamdy, O., Mohan, V. & Hu, F. B. Prevention and management of type 2 diabetes: dietary components and nutritional strategies. Lancet 383, 1999–2007 (2014).
Mensink, R. P. Effects of saturated fatty acids on serum lipids and lipoproteins: a systematic review and regression analysis. https://iris.who.int/bitstream/handle/10665/246104/9789241565349-eng.pdf (World Health Organization, 2016).
Mente, A. et al. Association of dietary nutrients with blood lipids and blood pressure in 18 countries: a cross-sectional analysis from the PURE study. Lancet Diabetes Endocrinol. 5, 774–787 (2017).
Sacks, F. M. & Campos, H. Dietary therapy in hypertension. N. Engl. J. Med. 362, 2102–2112 (2010).
Meader, N. et al. A systematic review on the clustering and co-occurrence of multiple risk behaviours. BMC Public Health 16, 657 (2016).
Bentham, J. et al. Multidimensional characterization of global food supply from 1961 to 2013. Nat. Food 1, 70–75 (2020).
Goodarzi, M. O. Genetics of obesity: what genetic association studies have taught us about the biology of obesity and its complications. Lancet Diabetes Endocrinol. 6, 223–236 (2018).
Lu, Y. et al. Metabolic mediators of the effects of body-mass index, overweight, and obesity on coronary heart disease and stroke: a pooled analysis of 97 prospective cohorts with 1.8 million participants. Lancet 383, 970–983 (2014).
Carroll, M. D., Kit, B. K., Lacher, D. A., Shero, S. T. & Mussolino, M. E. Trends in lipids and lipoproteins in US adults, 1988–2010. JAMA 308, 1545–1554 (2012).
Hales, C. M., Fryar, C. D., Carroll, M. D., Freedman, D. S. & Ogden, C. L. Trends in obesity and severe obesity prevalence in US youth and adults by sex and age, 2007–2008 to 2015–2016. JAMA 319, 1723–1725 (2018).
He, J. et al. Trends in cardiovascular risk factors in US adults by race and ethnicity and socioeconomic status, 1999–2018. JAMA 326, 1286–1298 (2021).
NCD Risk Factor Collaboration (NCD-RisC). A century of trends in adult human height. eLife 5, e13410 (2016).
NCD Risk Factor Collaboration (NCD-RisC). Worldwide trends in diabetes since 1980: a pooled analysis of 751 population-based studies with 4.4 million participants. Lancet 387, 1513–1530 (2016).
NCD Risk Factor Collaboration (NCD-RisC). Worldwide trends in body-mass index, underweight, overweight, and obesity from 1975 to 2016: a pooled analysis of 2416 population-based measurement studies in 128.9 million children, adolescents, and adults. Lancet 390, 2627–2642 (2017).
NCD Risk Factor Collaboration (NCD-RisC). Repositioning of the global epicentre of non-optimal cholesterol. Nature 582, 73–77 (2020).
NCD Risk Factor Collaboration (NCD-RisC). Worldwide trends in hypertension prevalence and progress in treatment and control from 1990 to 2019: a pooled analysis of 1201 population-representative studies with 104 million participants. Lancet 398, 957–980 (2021).
Saydah, S. et al. Trends in cardiovascular disease risk factors by obesity level in adults in the United States, NHANES 1999–2010. Obesity 22, 1888–1895 (2014).
Zhang, Y. & Moran, A. E. Trends in the prevalence, awareness, treatment, and control of hypertension among young adults in the United States, 1999 to 2014. Hypertension 70, 736–742 (2017).
O’Hearn, M., Lauren, B. N., Wong, J. B., Kim, D. D. & Mozaffarian, D. Trends and disparities in cardiometabolic health among U.S. adults, 1999–2018. J. Am. Coll. Cardiol. 80, 138–151 (2022).
Wilson, P. W., Kannel, W. B., Silbershatz, H. & D’Agostino, R. B. Clustering of metabolic factors and coronary heart disease. Arch. Intern. Med. 159, 1104–1109 (1999).
Mottillo, S. et al. The metabolic syndrome and cardiovascular risk: a systematic review and meta-analysis. J. Am. Coll. Cardiol. 56, 1113–1132 (2010).
Primeau, V. et al. Characterizing the profile of obese patients who are metabolically healthy. Int. J. Obesity 35, 971–981 (2011).
Stefan, N., Häring, H.-U., Hu, F. B. & Schulze, M. B. Metabolically healthy obesity: epidemiology, mechanisms, and clinical implications. Lancet Diabetes Endocrinol. 1, 152–162 (2013).
Wang, J.-S. et al. Trends in the prevalence of metabolically healthy obesity among US adults, 1999–2018. JAMA Network Open 6, e232145 (2023).
Miller, L. M. et al. Cardiovascular damage phenotypes and all-cause and CVD mortality in older adults. Ann. Epidemiol. 63, 35–40 (2021).
Liao, X., Kerr, D., Morales, J. & Duncan, I. Application of machine learning to identify clustering of cardiometabolic risk factors in US adults. Diabetes Technol. Ther. 21, 245–253 (2019).
Antonio-Villa, N. E. et al. Prevalence trends of diabetes subgroups in the United States: a data-driven analysis spanning three decades from NHANES (1988–2018). J. Clin. Endocrinol. Metab. 107, 735–742 (2022).
Bancks, M. P., Casanova, R., Gregg, E. W. & Bertoni, A. G. Epidemiology of diabetes phenotypes and prevalent cardiovascular risk factors and diabetes complications in the National Health and Nutrition Examination Survey 2003–2014. Diabetes Res. Clin. Pract. 158, 107915 (2019).
Xue, Q. et al. Subtypes of type 2 diabetes and incident cardiovascular disease risk: UK Biobank and All of Us cohorts. Mayo Clin. Proc. 98, 1192–1204 (2023).
Seymour, C. W. et al. Derivation, validation, and potential treatment implications of novel clinical phenotypes for sepsis. JAMA 321, 2003–2017 (2019).
Zweck, E. et al. Phenotyping cardiogenic shock. J. Am. Heart Assoc. 10, e020085 (2021).
Markovich Gordon, M., Moser, A. M. & Rubin, E. Unsupervised analysis of classical biomedical markers: robustness and medical relevance of patient clustering using bioinformatics tools. PLoS ONE 7, e29578 (2012).
Zemedikun, D. T., Gray, L. J., Khunti, K., Davies, M. J. & Dhalwani, N. N. Patterns of multimorbidity in middle-aged and older adults: an analysis of the UK Biobank data. Mayo Clin. Proc. 93, 857–866 (2018).
Violan, C. et al. Prevalence, determinants and patterns of multimorbidity in primary care: a systematic review of observational studies. PLoS ONE 9, e102149 (2014).
Prados-Torres, A., Calderón-Larrañaga, A., Hancco-Saavedra, J., Poblador-Plou, B. & van den Akker, M. Multimorbidity patterns: a systematic review. J. Clin. Epidemiol. 67, 254–266 (2014).
Alhasoun, F. et al. Age density patterns in patients medical conditions: a clustering approach. PLoS Comput. Biol. 14, e1006115 (2018).
Bisquera, A. et al. Identifying longitudinal clusters of multimorbidity in an urban setting: a population-based cross-sectional study. Lancet Reg. Health Eur. 3, 100047 (2021).
Landi, I. et al. Deep representation learning of electronic health records to unlock patient stratification at scale. NPJ Digit. Med. 3, 96 (2020).
Roso-Llorach, A. et al. Comparative analysis of methods for identifying multimorbidity patterns: a study of ‘real-world’ data. BMJ Open 8, e018986 (2018).
Zhu, Y., Edwards, D., Mant, J., Payne, R. A. & Kiddle, S. Characteristics, service use and mortality of clusters of multimorbid patients in England: a population-based study. BMC Med. 18, 78 (2020).
Yang, S., Varghese, P., Stephenson, E., Tu, K. & Gronsbell, J. Machine learning approaches for electronic health records phenotyping: a methodical review. J. Am. Med. Inform. Assoc. 30, 367–381 (2022).
De Freitas, J. K. et al. Phe2vec: automated disease phenotyping based on unsupervised embeddings from electronic health records. Patterns 2, 100337 (2021).
Loftus, T. J. et al. Phenotype clustering in health care: a narrative review for clinicians. Front. Artif. Intell. 5, 842306 (2022).
Multimorbidity: a priority for global health research. https://acmedsci.ac.uk/file-download/82222577 (Academy of Medical Sciences, 2018).
Pearson-Stuttard, J., Ezzati, M. & Gregg, E. W. Multimorbidity—a defining challenge for health systems. Lancet Public Health 4, e599–e600 (2019).
Wang, L. et al. Trends in prevalence of diabetes and control of risk factors in diabetes among US adults, 1999–2018. JAMA 326, 704–716 (2021).
Selvin, E., Parrinello, C. M., Sacks, D. B. & Coresh, J. Trends in prevalence and control of diabetes in the United States, 1988–1994 and 1999–2010. Ann. Intern. Med. 160, 517–525 (2014).
Sarnak, M. J. et al. Chronic kidney disease and coronary artery disease: JACC state-of-the-art review. J. Am. Coll. Cardiol. 74, 1823–1838 (2019).
Salive, M. E. Multimorbidity in older adults. Epidemiol. Rev. 35, 75–83 (2013).
Chae, C. U. et al. Increased pulse pressure and risk of heart failure in the elderly. JAMA 281, 634–643 (1999).
Coresh, J. et al. Prevalence of chronic kidney disease in the United States. JAMA 298, 2038–2047 (2007).
Wildman, R. P. et al. The obese without cardiometabolic risk factor clustering and the normal weight with cardiometabolic risk factor clustering: prevalence and correlates of 2 phenotypes among the US population (NHANES 1999–2004). Arch. Intern. Med. 168, 1617–1624 (2008).
Kanjilal, S. et al. Socioeconomic status and trends in disparities in 4 major risk factors for cardiovascular disease among US adults, 1971–2002. Arch. Intern. Med. 166, 2348–2355 (2006).
Dong, G., Feng, J., Sun, F., Chen, J. & Zhao, X.-M. A global overview of genetically interpretable multimorbidities among common diseases in the UK Biobank. Genome Med. 13, 110 (2021).
Evangelou, E. et al. Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits. Nat. Genet. 50, 1412–1425 (2018).
Graham, S. E. et al. The power of genetic diversity in genome-wide association studies of lipids. Nature 600, 675–679 (2021).
Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).
Mahajan, A. et al. Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Nat. Genet. 54, 560–572 (2022).
Wood, A. C. et al. Identification of genetic loci simultaneously associated with multiple cardiometabolic traits. Nutr. Metab. Cardiovasc. Dis. 32, 1027–1034 (2022).
Yengo, L. et al. A saturated map of common genetic variants associated with human height. Nature 610, 704–712 (2022).
Muntner, P. et al. Trends in blood pressure control among US adults with hypertension, 1999–2000 to 2017–2018. JAMA 324, 1190–1200 (2020).
Salami, J. A. et al. National trends in statin use and expenditures in the US adult population from 2002 to 2013: insights from the Medical Expenditure Panel Survey. JAMA Cardiol. 2, 56–65 (2017).
McGovern, P. G. et al. Trends in acute coronary heart disease mortality, morbidity, and medical care from 1985 through 1997: the Minnesota heart survey. Circulation 104, 19–24 (2001).
Shah, N. S. et al. Trends in cardiometabolic mortality in the United States, 1999–2017. JAMA 322, 780–782 (2019).
NCD Risk Factor Collaboration (NCD-RisC). Long-term and recent trends in hypertension awareness, treatment, and control in 12 high-income countries: an analysis of 123 nationally representative surveys. Lancet 394, 639–651 (2019).
Bessesen, D. H. & Van Gaal, L. F. Progress and challenges in anti-obesity pharmacotherapy. Lancet Diabetes Endocrinol. 6, 237–248 (2018).
Chen, T.-C., Clark, J., Riddles, M. K., Mohadjer, L. K. & Fakhouri, T. H. National Health and Nutrition Examination Survey, 2015−2018: sample design and estimation procedures. https://www.cdc.gov/nchs/data/series/sr_02/sr02-184-508.pdf (National Center for Health Statistics, 2020).
Ashwell, M., Gunn, P. & Gibson, S. Waist‐to‐height ratio is a better screening tool than waist circumference and BMI for adult cardiometabolic risk factors: systematic review and meta‐analysis. Obesity Rev. 13, 275–286 (2012).
Myers, G. L., Cooper, G. R., Winn, C. L. & Smith, S. J. The Centers for Disease Control-National Heart, Lung and Blood Institute Lipid Standardization Program: an approach to accurate and precise lipid measurements. Clin. Lab. Med. 9, 105–136 (1989).
Murphy, D. et al. Trends in prevalence of chronic kidney disease in the United States. Ann. Intern. Med. 165, 473–481 (2016).
Rousseeuw, P. J. & van Zomeren, B. C. Unmasking multivariate outliers and leverage points. J. Am. Stat. Assoc. 85, 633–639 (1990).
Bishop, C. M. Pattern Recognition and Machine Learning (Springer, 2006).
Hennig, C. Cluster-wise assessment of cluster stability. Comput. Stat. Data Anal. 52, 258–271 (2007).
Newcombe, R. G. Two‐sided confidence intervals for the single proportion: comparison of seven methods. Stat. Med. 17, 857–872 (1998).

Acknowledgements

This work was funded by a grant from the UK Medical Research Council (grant no. MR/V034057/1, to M.E.). B.Z. is supported by a fellowship from the Abdul Latif Jameel Institute for Disease and Emergency Analytics at Imperial College London, funded by a donation from Community Jameel. The funders had no role in the design and conduct of the study; in the collection, management, analysis and interpretation of the data; in the preparation, review or approval of the manuscript; or in the decision to submit the manuscript for publication.

Author information

Authors and Affiliations

Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London, UK
Victor P. F. Lhoste, Bin Zhou, Anu Mishra, James E. Bennett, Perviz Asaria, Edward W. Gregg & Majid Ezzati
MRC Centre for Environment and Health, School of Public Health, Imperial College London, London, UK
Victor P. F. Lhoste, Bin Zhou, Anu Mishra, James E. Bennett, Perviz Asaria, Edward W. Gregg & Majid Ezzati
Abdul Latif Jameel Institute for Disease and Emergency Analytics, Imperial College London, London, UK
Bin Zhou, Edward W. Gregg & Majid Ezzati
Department of Mathematics, Imperial College London, London, UK
Sarah Filippi
School of Population Health, Royal College of Surgeons in Ireland, Dublin, Ireland
Edward W. Gregg
Department of Global Health and Population, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Goodarz Danaei
Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Goodarz Danaei
Regional Institute for Population Studies, University of Ghana, Accra, Ghana
Majid Ezzati

Authors

Victor P. F. Lhoste
Bin Zhou
Anu Mishra
James E. Bennett
Sarah Filippi
Perviz Asaria
Edward W. Gregg
Goodarz Danaei
Majid Ezzati

Contributions

V.P.F.L., J.E.B., S.F. and M.E. conceived and designed the study. V.P.F.L., J.E.B., B.Z., A.M., S.F. and M.E. developed the analytical strategy. V.P.F.L. conducted analysis, in consultation with B.Z., J.E.B. and A.M. V.P.F.L., M.E., J.E.B. and A.M. interpreted the data and drafted the figures. V.P.F.L. and M.E. wrote the first draft of the manuscript. E.W.G., P.A. and G.D. provided input to finalize the paper. All authors had full access to all data used in this study. V.P.F.L. and B.Z. checked and verified the data used in the analysis. All authors were responsible for submitting the article for publication.

Corresponding author

Correspondence to Majid Ezzati.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Cardiovascular Research thanks Melissa Haendel, Simin Liu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Risk factor distribution within each cluster.

Each panel corresponds to a cluster. The color shows the distribution of each variable within that cluster, with darker color at the center of the distribution. The concentric circles show the minimum, the 25th, 50th and 75th percentiles, and the maximum in the whole sample, with the median shown in darker color. Each percentile is positioned relative to the distribution in the whole population so that the scale is common across all clusters. The scale is reversed for height, eGFR and HDL because lower values indicate higher risk. eGFR: estimated glomerular filtration rate; BMI: body-mass index; WHtR: waist-to-height ratio; HbA1c: glycated hemoglobin; HDL: high-density lipoprotein cholesterol; non-HDL: non-high-density lipoprotein cholesterol; SBP: systolic blood pressure; DBP: diastolic blood pressure.

Extended Data Fig. 2 Risk factor levels by age group in cardiometabolic and renal clusters.

Each panel corresponds to a cluster. Each line shows the median value of one biomarker for one age group within that cluster. The concentric circles show the minimum, the 25th, 50th and 75th percentiles, and the maximum in the whole sample, with the median shown in darker color. Each line is positioned relative to the distribution in the whole population so that the scale is common across all clusters and age groups. The scale is reversed for height, eGFR and HDL because lower values indicate higher risk. eGFR: estimated glomerular filtration rate; BMI: body-mass index; WHtR: waist-to-height ratio; HbA1c: glycated hemoglobin; HDL: high-density lipoprotein cholesterol; non-HDL: non-high-density lipoprotein cholesterol; SBP: systolic blood pressure; DBP: diastolic blood pressure.

Extended Data Fig. 3 Trends in crude prevalence of cardiometabolic and renal clusters from 1988 to 2018.

Crude prevalence was calculated as overall prevalence in each NHANES round without any adjustment for the age structure of the participants.

Extended Data Fig. 4 Flowchart of data cleaning.

Data cleaning per survey round.

Extended Data Fig. 5 Average intra- and inter-cluster distances for both women and men.

Each cell on the diagonal represents the average Euclidean distance between all pairs of individuals in a given cluster (intra-cluster distance). Each off-diagonal cell represents the average Euclidean distance between all pairs of individuals from two different clusters (inter-cluster distance).
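
The matrix described in this caption can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy data and the choice of two risk factors are hypothetical, and it assumes each individual is represented by a numeric vector of risk factor values on a common scale. Diagonal cells average the distances between pairs of individuals within one cluster, and off-diagonal cells average the distances between pairs drawn from two different clusters.

```python
import math
from itertools import combinations, product


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_distance_matrix(points, labels):
    """Return (cluster_ids, matrix) of average pairwise distances.

    matrix[i][j] is the mean Euclidean distance between all pairs of
    individuals drawn from clusters i and j. Diagonal cells (i == j) use
    pairs of distinct individuals within one cluster and are NaN when a
    cluster has fewer than two members.
    """
    cluster_ids = sorted(set(labels))
    members = {
        c: [p for p, lab in zip(points, labels) if lab == c] for c in cluster_ids
    }
    matrix = []
    for ci in cluster_ids:
        row = []
        for cj in cluster_ids:
            if ci == cj:
                # Within-cluster: every unordered pair of distinct members.
                pairs = list(combinations(members[ci], 2))
            else:
                # Between-cluster: every member of ci with every member of cj.
                pairs = list(product(members[ci], members[cj]))
            dists = [euclidean(a, b) for a, b in pairs]
            row.append(sum(dists) / len(dists) if dists else float("nan"))
        matrix.append(row)
    return cluster_ids, matrix


# Toy data (hypothetical): four individuals, two risk factors on a common scale.
points = [(0.0, 0.0), (0.0, 2.0), (3.0, 0.0), (3.0, 4.0)]
labels = ["A", "A", "B", "B"]
cluster_ids, matrix = average_distance_matrix(points, labels)
```

In this toy example the diagonal cells are smaller for the tighter cluster "A" than for "B". Whether the between-cluster averages exceed the within-cluster averages is one informal indication of how well separated the clusters are.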

Extended Data Table 1 Clinical characteristics of cardiometabolic and renal clusters

Extended Data Table 2 Medication use in cardiometabolic and renal clusters of US adults for 2011–2018 NHANES rounds

Extended Data Table 3 Changes in phenotype prevalence in pre-specified periods

Extended Data Table 4 Average Jaccard index of the clusters identified using NHANES data and of those identified using data simulated from independent normal distributions

Supplementary information

Supplementary Information

Supplementary Figs. 1–3 and STROBE checklist for cross-sectional studies.

Reporting Summary

Supplementary Table

Supplementary Table 1: Risk factor distributions in cardiometabolic and renal clusters. Supplementary Table 2: Pre-defined ranges used for data cleaning.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Lhoste, V.P.F., Zhou, B., Mishra, A. et al. Cardiometabolic and renal phenotypes and transitions in the United States population. Nat Cardiovasc Res 3, 46–59 (2024). https://doi.org/10.1038/s44161-023-00391-y

Received: 28 March 2023
Accepted: 13 November 2023
Published: 15 December 2023
Issue Date: January 2024
DOI: https://doi.org/10.1038/s44161-023-00391-y
