Abstract
Cardiovascular and renal conditions have both shared and distinct determinants. In this study, we applied unsupervised clustering to multiple rounds of the National Health and Nutrition Examination Survey from 1988 to 2018, and identified 10 cardiometabolic and renal phenotypes. These included a ‘low risk’ phenotype; two groups with average risk factor levels but different heights; one group with low body-mass index and high levels of high-density lipoprotein cholesterol; five phenotypes with high levels of one or two related risk factors (‘high heart rate’, ‘high cholesterol’, ‘high blood pressure’, ‘severe obesity’ and ‘severe hyperglycemia’); and one phenotype with low diastolic blood pressure (DBP) and low estimated glomerular filtration rate (eGFR). Prevalence of the ‘high blood pressure’ and ‘high cholesterol’ phenotypes decreased over time, contrasted by a rise in the ‘severe obesity’ and ‘low DBP, low eGFR’ phenotypes. The cardiometabolic and renal traits of the US population have shifted from phenotypes with high blood pressure and cholesterol toward poor kidney function, hyperglycemia and severe obesity.
Main
Diabetes, dementia, cardiovascular disease (CVD) and chronic kidney disease (CKD) are leading causes of death in the United States, in other high-income nations and, increasingly, in low-income and middle-income countries1,2. Obesity, short stature, high blood pressure, high heart rate, hyperglycemia, non-optimal lipid profiles and poor kidney function are established risk factors for one or more of these diseases3,4,5,6,7,8,9,10,11,12,13,14,15 and, in some cases, for infections such as coronavirus disease 2019 (ref. 16). As a result, people who have optimal levels of all or most risk factors are at low risk of cardiovascular and renal disease and cancer and vice versa17,18,19,20,21. Physiological risk factors can have complex correlations and co-occurrence patterns for at least two reasons. First, these physiological factors have shared as well as distinct genetic, behavioral, environmental and dietary determinants. For example, consumption of fruits and vegetables, meat, dairy, unsaturated versus saturated fats, processed versus whole grain carbohydrates and alcohol affect multiple cardiometabolic and renal traits beneficially or adversely, whereas others, such as sodium and potassium, affect only one or two traits (blood pressure and kidney function)22,23,24,25,26,27,28,29. Furthermore, these factors may cluster differently among different subgroups of a population30 and change over time31. Second, some of these physiological risk factors are themselves etiologically related; for example, obesity is a risk factor for dyslipidemia, elevated blood pressure and hyperglycemia32,33.
At the population level, some studies have quantified trends in individual cardiometabolic risk factors in the US population, other countries or globally34,35,36,37,38,39,40,41,42,43. Other studies have counted the number of cardiometabolic risk factors44,45, with some also quantifying association with the risk of coronary heart disease45. Some studies have used concepts such as metabolic syndrome46, optimal cardiometabolic health44 and metabolically healthy obesity47,48,49 to identify groups of people with a specific pre-determined risk factor profile. Studies that used data-driven methods to identify cardiometabolic phenotypes were mostly based on data from specific subgroups of a population (for example, older adults)50, users of specific health programs51 or people with a specific index disease, such as diabetes52,53,54, sepsis55 or cardiogenic shock56. The only study analyzing health-related phenotypes in an entire national population57 used a mix of behavioral, physiological and diagnostic variables at a single point in time for methodological assessment; it did not analyze change over time or the clinical or epidemiological characteristics of the clusters. Beyond cardiometabolic and renal health, some studies identified co-occurrences, or subtypes, of specific diseases in large cohorts, such as the UK Biobank58, in primary care patients from different countries59,60, especially using electronic health records61,62,63,64,65,66,67. These studies used a range of clustering methods66,68.
In the present study, we applied a data-driven approach to repeated nationally representative health examination surveys, namely the National Health and Nutrition Examination Survey (NHANES), from 1988 to 2018, to identify a comprehensive set of cardiometabolic and renal phenotypes in the United States adult population. We measured how the prevalence of these phenotypes has changed over time and characterized their sociodemographic, epidemiological and clinical predictors. This information is needed for planning and priority setting for population-based prevention programs and health system interventions to coherently and effectively prevent and manage conditions based on their co-occurrence in the population69,70.
Cardiometabolic and renal phenotypes of the US population
We identified 10 clusters (phenotypes) for both men and women that collectively characterized the cardiometabolic and renal traits of the US population from 1988 to 2018 (Fig. 1). The reasons for using 10 clusters are stated in the Methods, and the results with other cluster numbers are presented below. The identified phenotypes were similar between men and women, even though we analyzed data for the two sexes separately.
For both sexes, we identified a ‘low risk’ phenotype with near-optimal risk factor levels, accounting for 15% and 13% of the sample for women and men, respectively. We also identified two clusters (‘mid risk short’ and ‘mid risk tall’) jointly accounting for 25% and 28% of the sample for women and men, respectively, with risk factor levels mostly around sample medians. These two clusters differed by their average height and, to a lesser extent, by blood pressure and estimated glomerular filtration rate (eGFR) levels, with the ‘mid risk short’ cluster having, on average, shorter height (median of 155 cm versus 167 cm for women; 168 cm versus 182 cm for men) (Supplementary Table 1), lower blood pressure and higher eGFR than the ‘mid risk tall’ cluster. We also identified a group (‘low BMI, high HDL’) characterized by low levels of body mass index (BMI) and waist-to-height ratio (WHtR) and high high-density lipoprotein (HDL) cholesterol relative to the rest of the NHANES sample but with other risk factors being around the sample median.
Five clusters were characterized by having high levels of one or two related risk factors accounting together for 40% of the sample for both sexes. These were ‘high cholesterol’, ‘high blood pressure’, ‘severe hyperglycemia’, ‘high heart rate’ and ‘severe obesity’. For instance, the ‘severe hyperglycemia’ phenotype had a median glycated hemoglobin (HbA1c) of 9.9% for women and 9.8% for men, but their median BMI (and WHtR) was much lower than those of the ‘severe obesity’ cluster (median BMI of 31.8 kg m−2 and 29.7 kg m−2 in the ‘severe hyperglycemia’ cluster for women and men, respectively, compared to a median BMI of 41.1 kg m−2 and 38.2 kg m−2 in the ‘severe obesity’ cluster). Similarly, the ‘high blood pressure’ cluster had a median systolic blood pressure (SBP) of 159 mmHg for both sexes, and the ‘high cholesterol’ cluster had a median non-HDL cholesterol of 5.5 mmol L−1 for both women and men, with other risk factor levels lying between the median and 75th percentiles of the entire NHANES sample. In all these clusters, the defining risk factor varied less among member participants than the other risk factors (Extended Data Fig. 1), further illustrating that its high value was the shared feature among participants who fell in the cluster. Finally, in both sexes, the last cluster (‘low DBP, low eGFR’) was characterized by low levels of diastolic blood pressure (DBP) and eGFR. For example, women who fell in the ‘low DBP, low eGFR’ cluster had a median DBP of 61 mmHg and a median eGFR of 63 ml/min/1.73 m2.
Demographic and clinical characteristics of clusters
Most of the identified cardiometabolic and renal phenotypes had a mix of young (20–39 years), middle-aged (40–59 years) and old (60 years and older) adults. The exceptions were two clusters for men and three for women with predominantly young people (‘low risk’ and ‘mid risk short’ for both sexes and ‘high heart rate’ for women) and one with predominantly old people (‘low DBP, low eGFR’) (Table 1). Even though 73% of women and 77% of men in the ‘low risk’ phenotype were aged 20–39 years, 4% and 6%, respectively, were older than 60 years with near-optimal risk factor profiles similar to their younger peers, except for slightly lower eGFR and higher HbA1c. Similarly, although most (92% of women and 90% of men) in the cluster ‘low DBP, low eGFR’ were 60 years or older, a small percentage (1% and 2%, respectively) were aged 20–39 years. Within each cluster, individuals of different age groups generally had similar risk factor profiles, especially on the defining risk factors in the higher risk phenotypes (Extended Data Fig. 2).
The ‘low risk’ group had the lowest number of morbidities and medication use (Table 1 and Extended Data Table 1). As expected, 96% of women and 98% of men in the ‘high blood pressure’ cluster had hypertension, yet this condition was also prevalent in ≥50% of participants in some other clusters—for example, ‘low DBP, low eGFR’ and ‘severe hyperglycemia’ for both sexes and ‘severe obesity’ phenotype for men (most of those with hypertension in the ‘low DBP, low eGFR’ cluster had isolated systolic hypertension). Similarly, all participants in the ‘severe hyperglycemia’ cluster had diabetes; the next highest diabetes prevalence was in the ‘low DBP, low eGFR’ cluster (31% in both sexes), with the ‘severe obesity’ cluster having only the third highest prevalence (22% in women and 25% in men). Median HbA1c of people with diabetes in the ‘severe obesity’ cluster (6.88% for men and 6.77% for women) was much lower than median HbA1c of those in the ‘severe hyperglycemia’ cluster (9.9% for women and 9.8% for men). Finally, those in the ‘low DBP, low eGFR’ phenotype more frequently had a history of myocardial infarction (MI), stroke and congestive heart failure (CHF) than the other phenotypes—for example, 19% of men in this phenotype had a history of MI compared to 6% in the whole sample; similarly, 12% of men in this phenotype had a previous history of CHF compared to 4% in the whole sample.
The use of statins was relatively low in the ‘high cholesterol’ group—13% for women and 8% for men—with that of men being lower than the overall NHANES sample (Table 1). In contrast, statin and antihypertensive use was high in the ‘low DBP, low eGFR’ and ‘severe hyperglycemia’ groups (26–41% of participants in different cluster–sex combinations, which is 2–3 times more than in the overall samples), consistent with the clinical guidelines that recommend the use of these medicines among people with diabetes and history of MI and stroke, especially in older ages. In the ‘severe obesity’ cluster, antihypertensive and statin use was above average, which may partly account for this group having blood pressure and cholesterol levels around the population median. The use of most medicines was higher in the 2011–2018 period than over the entire analysis period, with the largest increase being that of statins (Extended Data Table 2). The increase in statin use was, however, less pronounced in the ‘high cholesterol’ phenotype (+38% relative increase for women and +4% for men) than in the whole sample (+48% for women and +45% for men), demonstrating that this phenotype was characterized by insufficiently treated or controlled levels of non-HDL cholesterol.
Trends over time
The cardiometabolic and renal risk profile of the US population changed from 1988 to 2018 (Fig. 2). The age-standardized prevalence of the ‘severe obesity’ phenotype more than tripled for both sexes and that of the ‘low DBP, low eGFR’ phenotype almost doubled over the entire analysis period. Most of the increase of the ‘low DBP, low eGFR’ phenotype occurred between 2000 and 2010, before plateauing after 2010 (P value for trend from 2010 to 2018 was 0.96 for women and 0.97 for men; Extended Data Table 3). In contrast, the prevalence of the ‘high blood pressure’ and ‘high cholesterol’ phenotypes more than halved in both sexes (P value for trend was <0.0001 for both sexes over the entire analysis period). However, since the late 2000s, there has been a reversal of the earlier declines in the prevalence of the ‘high blood pressure’ phenotype (P value for increasing trend from 2010 to 2018 was 0.0015 for women and 0.0346 for men). There was no statistically detectable change in the ‘severe hyperglycemia’ phenotype (P = 0.09 for women and 0.79 for men), which indicates that, despite the increase in the prevalence of diabetes in the United States, those at extreme values of HbA1c were stable. Rather, many of the additional people with diabetes fell in the ‘severe obesity’ and ‘low DBP, low eGFR’ clusters for which the prevalence increased over time. Most trends were consistent between the two sexes. A notable exception was the ‘low risk’ phenotype, which remained constant for men but decreased by 4.5 percentage points for women (P value for trend was 0.0006 over the entire analysis period), even though its prevalence remained higher in women than men throughout the analysis period. Trends in crude prevalence were nearly identical to the age-standardized trends (Extended Data Fig. 3).
Changes in age patterns of clusters
The various cardiometabolic and renal phenotypes had differing age associations (Fig. 3). The ‘low risk’ and ‘mid risk short’ phenotypes for both sexes, and the ‘high heart rate’ phenotype for women, were more common among younger adults, and their prevalence decreased with age, with a much steeper age association for the ‘low risk’ group. Conversely, the ‘low DBP, low eGFR’ and ‘high blood pressure’ phenotypes became more prevalent throughout the life course, with a steeper age association for the ‘low DBP, low eGFR’ group. Other phenotypes tended to peak in middle ages.
Both ‘high blood pressure’ and ‘high cholesterol’ phenotypes decreased sharply in people aged 50 years and older from 1991 to 2008, likely due to the increased use of statins and antihypertensive medication; however, the decreases may have slowed down or stagnated in the past decade. In contrast, for both sexes, the age association of the ‘low DBP, low eGFR’ phenotype became steeper over time.
Predictors of cardiometabolic and renal traits
We analyzed the sociodemographic, behavioral and clinical predictors of cluster membership in multivariate regressions as described in the Methods. Both education and ethnicity were associated with the partition of the participants into some of the cardiometabolic and renal phenotypes. Higher education was associated with lower odds of allocation to the ‘high cholesterol’ phenotype for both men and women, lower odds of allocation to the ‘severe hyperglycemia’ phenotype for men and lower odds of allocation to the ‘low DBP, low eGFR’ phenotype for women; it was associated with higher odds of being in the ‘low risk’ phenotype for women (Figs. 4 and 5). Hispanic and non-Hispanic Black women and men had higher odds of belonging to the ‘severe hyperglycemia’ and ‘high blood pressure’ phenotypes than non-Hispanic Whites; Hispanic and non-Hispanic Black women had lower odds of belonging to the ‘low risk’ phenotype than non-Hispanic Whites; and non-Hispanic Black men and women had lower odds of belonging to the ‘high cholesterol’ phenotype.
The use of statins was associated with lower odds of belonging to the ‘high cholesterol’ phenotype for both men and women, demonstrating its effectiveness in controlling hypercholesterolemia. In contrast, diabetes medications, both oral and insulin, were associated with the ‘severe hyperglycemia’ phenotype in both sexes, as were antihypertensive medications for the ‘high blood pressure’ phenotype, albeit with a smaller magnitude than the former association. This shows that many individuals in these two phenotypes have uncontrolled diabetes or hypertension despite being treated41. Individuals on antihypertensive medicines also had higher odds of belonging to the ‘severe obesity’ phenotype, which provides one explanation for this group having a blood pressure level around the population median, despite the association between obesity and hypertension33. We also found that previous history of MI (both sexes) as well as previous history of CHF (women) were associated with the ‘low DBP, low eGFR’ phenotype even after adjusting for age and other predictors.
Influence of the number of clusters
As described in the Methods, although our main results are based on 10 clusters, we also investigated cluster membership and characteristics when sequentially changing the number of clusters (k) from 5 to 12. Even with five clusters (k = 5), four epidemiologically relevant cardiometabolic and renal phenotypes were identified (‘low risk’, ‘severe hyperglycemia’, ‘high blood pressure’ and ‘severe obesity’), along with a ‘mid risk’ cluster that captured all other participants (Fig. 6 and Supplementary Fig. 1). As the number of clusters increased, more refined and specific groups were identified as subsets of one or more of the existing clusters. For instance, the ‘high cholesterol’ cluster appeared at k = 7 for women, with participants coming from the clusters of ‘high blood pressure’ and ‘mid risk’ at k = 6. Similarly, the ‘mid risk’ group for men at k = 7 split into ‘mid risk tall’ and ‘mid risk short’ at k = 8. For both sexes, the ‘severe hyperglycemia’ cluster appeared at k = 5 and remained relatively unchanged as k increased, as did the ‘low DBP, low eGFR’ cluster after k = 6.
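The way a new cluster draws its members from clusters at the previous value of k can be tabulated by cross-classifying the two sets of labels. The following is a minimal sketch of that bookkeeping only; the function names and the toy labels are hypothetical, it does not depend on any particular clustering method, and it ignores survey weights.

```python
from collections import Counter


def membership_flow(labels_small_k, labels_large_k):
    """Count participants for each (cluster at smaller k, cluster at larger k) pair."""
    if len(labels_small_k) != len(labels_large_k):
        raise ValueError("both labelings must cover the same participants")
    return Counter(zip(labels_small_k, labels_large_k))


def sources_of(flow, new_cluster):
    """Share of a cluster at the larger k that came from each cluster at the smaller k."""
    counts = {old: n for (old, new), n in flow.items() if new == new_cluster}
    total = sum(counts.values())
    return {old: n / total for old, n in counts.items()}


# Hypothetical toy labels for eight participants (not NHANES data).
labels_k6 = ["mid risk", "mid risk", "high blood pressure", "high blood pressure",
             "mid risk", "low risk", "low risk", "high blood pressure"]
labels_k7 = ["mid risk", "high cholesterol", "high cholesterol", "high blood pressure",
             "high cholesterol", "low risk", "low risk", "high blood pressure"]

flow = membership_flow(labels_k6, labels_k7)
shares = sources_of(flow, "high cholesterol")
print(shares)
```

In this toy example, two of the three participants in the new ‘high cholesterol’ cluster come from ‘mid risk’ and one from ‘high blood pressure’, mirroring the kind of split described for women between k = 6 and k = 7.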
Strengths and limitations
The strengths of our study include using a novel approach to identify a comprehensive set of epidemiologically and clinically relevant phenotypes that characterize the entire national population, and covering four decades using repeated nationally representative samples with a largely consistent methodology, which allowed us to measure change and disparities in phenotype prevalence and its predictors. Our study has some limitations. First, we did not include any inflammation-related biomarkers, such as C-reactive protein, or other cardiometabolic or renal biomarkers, such as cystatin C or apolipoprotein B, because these data were not available in some rounds of NHANES. Second, this analysis was based on a series of repeated cross-sectional samples and was not designed to evaluate how an individual with a specific phenotype in one year may have shifted to another in a later year or how the identified phenotypes affect the risk of disease onset or death, which should be pursued with data from prospective cohort studies. Third, other clustering methods should be tested in future methodological assessments, especially probabilistic clustering methods that estimate the probabilities that each participant belongs to each cluster. Finally, although we analyzed some predictors of cluster allocation, future research should investigate how other factors, including genetics, diet, behaviors and the living environment, affect assignment to specific clusters.
Discussion
Application of data-driven clustering, which has been applied extensively to genomics data, to population-based risk factor data identified a comprehensive set of clinically relevant cardiometabolic and renal phenotypes in the US adult population over a period of four decades. The results showed an increase in the ‘severe obesity’ phenotype whose other cardiometabolic risks were not noticeably different from the average population, a stable prevalence of the ‘severe hyperglycemia’ phenotype and a sharp decrease in the ‘high cholesterol’ and ‘high blood pressure’ phenotypes. This improvement in vascular health has been partly offset by rising prevalence of those with poor kidney function in the ‘low DBP, low eGFR’ cluster.
To our knowledge, no study has applied data-driven clustering methods to repeated nationally representative data to identify multifactorial cardiometabolic and renal phenotypes, and to analyze their trends, in the US population. Our results were consistent with single-risk-factor trend studies on obesity, hypertension or blood lipids, which showed a rise in the former but a decline in the latter two risk factors, including in individuals with obesity34,35,36,42,43. Our result on the higher prevalence of the ‘low risk’ phenotype in women than in men was also consistent with previous findings on cardiovascular health of the US population44. We further observed a decrease in the ‘low risk’ phenotype in women and no detectable change for men, which was consistent with a reported statistically insignificant trend in the prevalence of optimal cardiometabolic health for both sexes combined44. We did not observe an increase in the ‘severe hyperglycemia’ phenotype between 1988 and 2018 despite the reported rise in diabetes in the United States71. This was because the ‘severe hyperglycemia’ phenotype was characterized by very high HbA1c levels and included individuals with uncontrolled diabetes, consistent with previous findings on diabetes subgroups53,54. The prevalence of people at such high levels of HbA1c has been relatively stable because improvements in diagnosis and management have countered the rise in total diabetes prevalence72. The ‘low DBP, low eGFR’ phenotype, which had two dominant features (high pulse pressure and poor kidney function), is consistent with the association between atherosclerosis and CKD73. This phenotype was found predominantly in older ages, had a high prevalence of diabetes and was associated with a history of MI and CHF for women, consistent with high levels of vascular–renal comorbidity in older ages74 and with the association of CHF with pulse pressure75. 
The observed increase in the ‘low DBP, low eGFR’ phenotype, especially in the early 2000s, was also consistent with the previously reported rise in the prevalence of CKD in the United States76. We did not identify a metabolically healthy obesity phenotype, which accounted for 9.7% of the US population in one study on this specific group77, even after allowing 12 clusters to be formed. There may be two reasons for this apparent difference. First, half of the people classified as metabolically healthy in the aforementioned study77 had one metabolic risk factor. Second, in our study, such people were clustered either in the ‘severe obesity’ phenotype or in the two mid-risk phenotypes. Finally, our results on ethnic and educational disparities in the prevalence of specific clusters were consistent with previous studies that considered risk factors either individually36,78 or through the lens of optimal cardiometabolic health23, but these studies did not examine disparities in a comprehensive set of cardiometabolic and renal phenotypes of risk factors. Our results are not directly comparable with those using electronic health records due to differences in the study population, methods and clinical conditions used in the clustering and because some of these studies aimed at identifying subtypes of specific diseases45,47,48,50,51,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68. Among such studies, two studies in different populations identified phenotypes characterized by compromised kidney function and low DBP50,56. Another study that used electronic health records in London also found a cluster with both CHF and CKD62, which is analogous to our ‘low DBP, low eGFR’ phenotype. One study using electronic health records found a subtype of type 2 diabetes characterized by very high HbA1c levels analogous to the ‘severe hyperglycemia’ phenotype identified in our study54.
Our analysis coherently uncovered epidemiological subgroups of the US population characterized by distinct profiles of cardiometabolic and renal risk factors. Some of these phenotypes were characterized by high levels of one or two closely related risk factors, whereas others were more complex and based on multiple seemingly unrelated traits that may share upstream clinical and sociodemographic determinants. Although genetics influences individual or multiple risk factors79,80,81,82,83,84,85, the risk factors that characterized the clusters identified in our study are also influenced by behavioral, environmental and dietary determinants as well as the use (or non-use) of medicines that lower risk factor levels. Future research combining these determinants with genetic data is needed to discern their contributions to the prevalence and trends in cardiometabolic phenotypes and their influence on the occurrence of disease. Our results apply to the US population, and future research should also compare cardiometabolic and renal phenotypes across populations with different diets, health behaviors, healthcare and genetics.
Although the prevalence of the phenotype characterized by very high BMI and WHtR has increased, this group had approximately average levels of other risk factors. Nonetheless, higher-than-median BMI was also a trait of the ‘severe hyperglycemia’ phenotype, which has not declined despite improvements in diabetes detection and treatment, reflecting the growth of incidence and prevalence of diabetes during the period examined38. There was a substantial decline in phenotypes characterized by high levels of non-HDL cholesterol and SBP and DBP, despite the rise in the ‘severe obesity’ phenotype. The use of antihypertensive medicines, which increased over time, may be one of the reasons that those in the ‘severe obesity’ cluster have near-average blood pressure levels despite their high BMI and WHtR levels. The use of statins and antihypertensive medications may have also shifted some treated individuals from the ‘high blood pressure’ and ‘high cholesterol’ groups into the two mid-risk ones, as seen in the correlated trends in the prevalence of the ‘high cholesterol’ and ‘high blood pressure’ phenotypes with the use of statins and antihypertensive medications, respectively (Fig. 7)86,87. These improvements have contributed to the decades-long decline in cardiovascular mortality in the United States through lower event rates and better survival88,89. The delayed vascular events and better survival, however, may have engendered a rise in an older group with increasingly common vascular–renal comorbidities, represented by the ‘low DBP, low eGFR’ phenotype, among whom history of MI and stroke was common and the prevalence of CHF was high. The increase of the ‘high blood pressure’ phenotype since the late 2000s may be because hypertension treatment and control in the United States, and in other high-income countries, have not improved over the past decade90. This stagnation may be partly responsible for the recent deceleration in the decline of CVD mortality89.
Public health actions, especially those that enhance access to healthier foods, such as fresh fruits and vegetables, legumes and unprocessed grains, as well as treatment of hypertension, high cholesterol and diabetes, can help shift an increasing share of the population from some of the high-risk phenotypes to low-risk and mid-risk ones and delay the onset of comorbid chronic conditions that characterized the ‘low DBP, low eGFR’ phenotype. New medicines for obesity, if their cost is lowered, may also reduce the prevalence of the ‘severe obesity’ phenotype, which has average levels of other risk factors, and also reduce BMI among people who fall in other high-risk clusters91. These interventions may be optimized and targeted in the future through precision public health approaches that use the entire risk factor profile or more efficient risk stratification and risk factor management through both clinical and community-based interventions.
Methods
Data
The NHANES is a nationally representative survey of the US non-institutionalized civilian population aged 2 months or older with a multistage, stratified clustered probability sample design. The first round of NHANES was done in 1959, and, since 1999, it has been conducted in continuous 2-year rounds. Details of survey design and sampling are provided elsewhere92 and are summarized below.
We used 11 rounds of NHANES, including NHANES III (1988–1994) and various rounds of continuous NHANES from 1999 to 2018, for analyzing trends in cardiometabolic and renal traits. We did not use rounds before NHANES III because they did not measure HbA1c. NHANES participants are not re-enrolled in subsequent years, except through chance. Therefore, our results represent cardiometabolic and renal clusters present in successive US populations.
Participants in each round of NHANES were sampled to be collectively representative of the population in the survey year. Ethnic minorities as well as older adults were oversampled to provide stable estimates for these groups. Sample weights were calculated to account for the complex survey design, survey non-response and post-stratification adjustment to match total population counts from the Census Bureau.
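Because some groups were oversampled, an unweighted proportion of participants in a cluster would over-represent those groups. The following minimal sketch shows a weighted prevalence as the weighted share of participants in a cluster; the function name and the numbers are hypothetical, and design-based variance estimation (strata and sampling units) is not shown.

```python
def weighted_prevalence(in_cluster, weights):
    """Weighted share of participants in a cluster.

    in_cluster: booleans, True if the participant belongs to the cluster.
    weights:    survey sample weights, one per participant.
    """
    if len(in_cluster) != len(weights):
        raise ValueError("one weight is needed per participant")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w for flag, w in zip(in_cluster, weights) if flag) / total


# Hypothetical participants: the two cluster members come from an oversampled
# group, so each represents fewer people in the population (smaller weight).
in_cluster = [True, False, True, False]
weights = [1000.0, 3000.0, 500.0, 5500.0]

crude = sum(in_cluster) / len(in_cluster)
weighted = weighted_prevalence(in_cluster, weights)
print(crude, weighted)
```

Here the crude proportion is one half, whereas the weighted prevalence is much lower because the cluster members carry small weights.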
We restricted the analysis to participants aged 20 years and older who had all the required biomarker measurements available. We used the following risk factors in our study, based on their relevance to cardiometabolic and renal diseases and their availability in NHANES data.
Anthropometric measures: we used height (cm); BMI, defined as weight divided by height squared (kg m−2); and WHtR, defined as waist circumference divided by height. Being taller is associated with a lower risk of CVDs and all-cause mortality but a higher risk of some cancers13. High BMI is a risk factor for diabetes, CVDs, several cancers and kidney and liver diseases9,14. WHtR was included as a measure of abdominal obesity, which may increase the risk of disease and death independently of BMI93.
Blood pressure and heart rate: we used SBP and DBP, as elevated blood pressure is associated with increased risk of CVDs, kidney disease and dementia8. We included resting heart rate (RHR), as higher values have been associated with increased risk of cardiovascular and all-cause mortality3. RHR was measured as 60-s pulse and referred to as pulse rate.
Lipids: we used HDL cholesterol and non-HDL cholesterol, the latter defined as total cholesterol (TC) minus HDL cholesterol. Non-HDL cholesterol is associated with higher risk of ischemic heart disease and stroke, and HDL cholesterol is a marker for lower risk11.
Glycemia: we used HbA1c, a proxy of average blood glucose levels over recent weeks that has been associated with CVDs12, as the marker for glycemic risk and control.
Kidney function: we used eGFR (using the CKD-EPI creatinine equation) as a measure of kidney function, which is a predictor of CKD and CVDs5,6.
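The derived risk factors defined above can be sketched as follows. BMI, WHtR and non-HDL cholesterol follow the definitions in the text. For eGFR, the text names the CKD-EPI creatinine equation but not its version, so the sketch assumes the 2021 race-free form with serum creatinine in mg/dL; this choice, like all function names here, is an assumption for illustration and may differ from what was used in the analysis.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight divided by height squared (kg per m^2)."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def whtr(waist_cm, height_cm):
    """Waist-to-height ratio: waist circumference divided by height (same units)."""
    return waist_cm / height_cm


def non_hdl_cholesterol(total_cholesterol, hdl_cholesterol):
    """Non-HDL cholesterol: total cholesterol minus HDL cholesterol (same units)."""
    return total_cholesterol - hdl_cholesterol


def egfr_ckd_epi_2021(creatinine_mg_dl, age_years, female):
    """eGFR in ml/min/1.73 m^2, ASSUMING the 2021 CKD-EPI creatinine equation."""
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = creatinine_mg_dl / kappa
    value = (142.0
             * min(ratio, 1.0) ** alpha
             * max(ratio, 1.0) ** -1.200
             * 0.9938 ** age_years)
    return value * 1.012 if female else value


# Hypothetical participant (not NHANES data).
print(round(bmi(70.0, 175.0), 1))
print(whtr(80.0, 160.0))
print(round(non_hdl_cholesterol(5.2, 1.3), 1))
print(round(egfr_ckd_epi_2021(0.9, 50, female=False), 1))
```

Lower serum creatinine and younger age both give a higher eGFR in this form of the equation, which is why calibrating creatinine before computing eGFR matters for comparisons across survey rounds.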
All the risk factors used in the clustering were measured. Physical examinations were conducted in a mobile examination center, and blood samples were drawn from a random subset of the participants. Blood pressure was measured three times on the right arm with a sphygmomanometer and appropriate cuff size in seated position after a 5-min rest period in all rounds. Both TC and HDL analyses were conducted on venous samples collected according to a standardized protocol. Although the laboratories, methods and instruments used to measure lipid concentrations changed across survey periods, measurements were standardized according to the criteria of the Centers for Disease Control and Prevention (CDC) or the National Heart, Lung, and Blood Institute Lipid Standardization Program of the CDC94. HbA1c was measured in all NHANES cycles using high-performance liquid chromatography. We followed NHANES recommendations and did not apply any calibration correction based on cross-over regression. Before eGFR calculation, serum creatinine measurements were calibrated using a previously reported calibration equation95 to account for potential drift in measurement methods. More information on NHANES measurement, laboratory procedures and careful quality controls can be found on the survey website: http://www.cdc.gov/nchs/nhanes.htm.
We did not use data on inflammation markers, such as C-reactive protein, because these data were only available in some rounds of NHANES. We also used data on age, sex, race and ethnicity, education, history of diseases and medication use for examining the demographic and clinical characteristics of the clusters; these data were collected through a questionnaire.
Data cleaning
Before analyses, we conducted the following data cleaning procedure. First, we removed measurements outside pre-defined plausibility ranges (Supplementary Table 2). Second, for blood pressure, we discarded the first measurement and used the average of the remaining measurements. Third, for all participants, we confirmed that SBP > DBP and TC ≥ HDL. Finally, we applied an outlier detection procedure based on Mahalanobis distance96 to exclude risk factor pairs that had an implausible pairwise relationship relative to the overall data. This method uses the empirical relationship between risk factor pairs to detect extreme combinations, for example, a high SBP of 248 mmHg but low DBP of 40 mmHg or a high BMI of 42 kg m−2 but small waist circumference of 74 cm. We applied this technique separately to all pairs of anthropometric variables (height, weight, BMI, waist circumference and WHtR), those of blood pressure (SBP and DBP) and those of lipids (TC and HDL). All variables except height and DBP were log transformed before outlier detection to account for their skewed distributions. For each pair considered, observations with a Mahalanobis distance larger than 40.08 (equivalent to a distance of six standard deviations from the mean) were excluded. The present analysis used data from 58,452 participants (28,272 men and 30,180 women) after applying the above steps (Extended Data Fig. 4).
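The pairwise outlier step can be sketched as follows. This is a minimal illustration on synthetic data, not the study code. It assumes the classical sample mean and covariance (the cited method may use robust estimates) and reads the 40.08 cutoff as a squared Mahalanobis distance. Under that reading, the chi-square tail probability with two degrees of freedom, exp(−40.08/2), is close to the two-sided normal tail beyond six standard deviations. The function name `mahalanobis_sq` and the simulated values are hypothetical.

```python
import math
import random

CUTOFF = 40.08  # threshold quoted in the text, read here as a squared distance

def mahalanobis_sq(xs, ys):
    """Squared Mahalanobis distance of each (x, y) pair from the sample mean."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    return [
        (syy * (x - mx) ** 2
         - 2 * sxy * (x - mx) * (y - my)
         + sxx * (y - my) ** 2) / det
        for x, y in zip(xs, ys)
    ]

# Synthetic, positively correlated pair in standardized units, plus one
# implausible combination (very high on one variable, very low on the other).
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(2000)]
ys = [0.8 * x + 0.6 * rng.gauss(0, 1) for x in xs]
xs.append(4.0)
ys.append(-4.0)

flagged = [i for i, d2 in enumerate(mahalanobis_sq(xs, ys)) if d2 > CUTOFF]

# Tail probabilities behind the "six standard deviations" equivalence
tail_cutoff = math.exp(-CUTOFF / 2)        # P(chi-square with 2 df > CUTOFF)
tail_six_sd = math.erfc(6 / math.sqrt(2))  # P(|Z| > 6) for a standard normal
```

In the study, the variables would first be log transformed where indicated, and the procedure would be repeated for each pair of variables within a risk factor group.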
Statistical analysis—cluster identification
Our analytical objective was to divide the NHANES sample into groups of participants with risk factor levels that are similar to each other but distinct from those in other clusters. In extreme cases of one or more risk factors—for example, familial hypercholesterolemia or possibly type 1 diabetes—this task is relatively straightforward and may even be feasible based on prior knowledge or visual inspection of data. For national populations, however, such partitioning requires a method that operationalizes the analytical objective by partitioning the joint distribution of risk factors.
We used a k-means clustering algorithm to identify cardiometabolic and renal phenotypes of the US population in an unsupervised, data-driven approach. The k-means algorithm partitions participants into non-overlapping clusters that are relatively homogeneous while maximizing the heterogeneity between clusters, by minimizing the sum of squared distances of all data points from the center of the cluster to which they belong. The k-means algorithm is a specific form of Gaussian mixture model in which only the means of the clusters are estimated but not their covariance97. It is widely used and computationally efficient. We used 50 different random sets of starting values to reduce the risk of converging to a local minimum, and we used Euclidean distance and the Lloyd implementation of the algorithm.
All analyses were conducted by pooling individual participant data across all survey rounds but separately for men and women to allow for potentially different clustering of cardiometabolic traits between them. We centered and scaled each risk factor by subtracting the overall mean and dividing by the standard deviation before clustering. In k-means, the number of clusters (k) must be pre-specified. Various heuristics have been suggested for selecting the optimal number of clusters (for example, the elbow method and the silhouette method), which compare measures of cluster cohesion and cluster separation for different choices of k. Neither the elbow nor the silhouette method provided a definitive optimal number of clusters (Supplementary Fig. 2). Therefore, we investigated cluster membership and characteristics when sequentially changing k from 5 to 12, and selected k based on these heuristics as well as on the epidemiological interpretability of the results.
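The clustering procedure described above (standardization, Lloyd iterations with Euclidean distance and multiple random starts) can be sketched in a few lines. This is a minimal, standard-library illustration, not the study code, which was written in R. The function names `standardize`, `lloyd` and `kmeans` and the parameter `max_iter` are hypothetical.

```python
import random
import statistics

def standardize(rows):
    """Center each column on its mean and scale by its standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from k randomly chosen starting points."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update step: each center moves to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [statistics.fmean(c) for c in zip(*members)]
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wss

def kmeans(points, k, n_starts=50, seed=0):
    """Keep the run with the smallest within-cluster sum of squares."""
    rng = random.Random(seed)
    return min((lloyd(points, k, rng) for _ in range(n_starts)),
               key=lambda run: run[2])
```

Calling `kmeans(standardize(rows), k=10)` on a list of per-participant risk factor rows returns the labels, centers and within-cluster sum of squares of the best of 50 starts. Repeating the call for k from 5 to 12 gives the inputs for the elbow comparison.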
Stability of the clustering results
After selecting the number of clusters, we evaluated the stability of the resultant clusters by calculating the average Jaccard index98 between the clustering results over the entire sample and those of 1,000 subsamples of 50% of the data drawn without replacement (Extended Data Table 4). The Jaccard index is a measure of similarity between two groups and ranges from 0 to 1, with 0 indicating no overlap and 1 indicating identical results. For men, all clusters had an average Jaccard index of 0.87 or above; for women, all clusters had an average Jaccard index of 0.80 or above, except for the ‘mid risk tall’ phenotype that had an average Jaccard index of 0.70. To evaluate whether our analysis met our analytical objective of partitioning the joint distribution of risk factors based on a true correlation structure, we also used k-means to cluster 30,180 simulated data points (the same number as used in the main analysis). The simulated data were generated from a 10-dimensional normal distribution with no correlation. All the resulting clusters were highly unstable with a Jaccard index below 0.30, which is much lower than those of clusters identified on NHANES data (Extended Data Table 4).
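A cluster-wise Jaccard comparison for a single subsample might look like the sketch below. It assumes that each full-sample cluster, restricted to the subsampled participants, is matched to the most similar cluster found in the subsample; this matching rule follows the general idea of the cited method but is an assumption about the exact procedure. The function names `jaccard` and `clusterwise_jaccard` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of participant ids."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def clusterwise_jaccard(full_labels, sub_indices, sub_labels):
    """Best Jaccard match for each full-sample cluster within one subsample.

    full_labels: cluster label of every participant in the full-sample run
    sub_indices: positions of the subsampled participants in full_labels
    sub_labels:  labels from re-clustering the subsample, aligned with sub_indices
    """
    sub_clusters = {}
    for i, lab in zip(sub_indices, sub_labels):
        sub_clusters.setdefault(lab, set()).add(i)
    full_clusters = {}
    for i in sub_indices:  # restrict full-sample clusters to the subsample
        full_clusters.setdefault(full_labels[i], set()).add(i)
    return {
        lab: max(jaccard(members, s) for s in sub_clusters.values())
        for lab, members in full_clusters.items()
    }
```

Averaging these per-cluster values over the 1,000 subsamples would give one stability score per cluster. Cluster labels do not need to agree between runs because matching is done on membership.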
Intra-cluster and inter-cluster distances
We also report (Extended Data Fig. 5) the intra-cluster and inter-cluster distances as a measure of how well the method achieves the analytical objective. The intra-cluster distance was calculated as the average Euclidean distance between all pairs of points in the same cluster, and the inter-cluster distance was calculated as the average Euclidean distance between all pairs of points from two different clusters. These metrics show that participants assigned to every cluster were, on average, more similar to one another in terms of their risk factor levels than they were to participants in any other cluster.
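The two distance summaries can be written directly from their definitions. The sketch below is illustrative only. The function names `mean_intra_distance` and `mean_inter_distance` are hypothetical, and the exhaustive pairwise loops scale quadratically, so a vectorized or sampled version would be needed at the size of the full sample.

```python
import math
from itertools import combinations, product

def mean_intra_distance(cluster):
    """Average Euclidean distance over all pairs of points within one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def mean_inter_distance(cluster_a, cluster_b):
    """Average Euclidean distance over all pairs with one point from each cluster."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

# Two small, well-separated clusters in standardized risk factor space
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(10.0, 0.0), (10.0, 2.0)]
intra_a = mean_intra_distance(a)
inter_ab = mean_inter_distance(a, b)
```

A clustering meets the stated objective when, for every cluster, the intra-cluster value is smaller than each of its inter-cluster values.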
Consistency of clusters over time
We investigated whether clusters emerging from the analysis of all rounds of NHANES from 1988 to 2018 were similar to those that would emerge if we repeated the analysis for subperiods consisting of NHANES III 1988–1994, NHANES 1999–2008 and NHANES 2009–2018 separately (Supplementary Fig. 3). The phenotypes identified in subperiods were similar to those identified when aggregating all rounds from 1988 to 2018 for men. For women, most of the phenotypes identified over the entire analysis period remained in subperiod clustering, except the ‘mid risk tall’ phenotype, which was replaced by either an ‘obesity’ phenotype or a ‘mid risk’ phenotype, and except the ‘low DBP, low eGFR’ phenotype in NHANES III, which was replaced with a ‘high risk’ phenotype with hazardous levels of all risk factors.
Statistical analysis—trends in prevalence and predictors of cluster membership
In addition to graphical presentation of how cluster prevalence has changed over time, we analyzed the presence of a trend in a regression analysis. We fitted one logistic regression per cluster, with time as the independent variable. We adjusted for age by 5-year age bands and report the P value for the coefficient of the time term. In addition to the entire analysis period, we analyzed trends for pre-specified time periods of 1988–2000, 2000–2010 and 2010–2018 (Extended Data Table 3).
We also used multivariate logistic regression to analyze the predictors of cluster membership. The predictors included age group, survey year, race or ethnicity (non-Hispanic White, non-Hispanic Black, Hispanic and Other ethnicity), education (below high school, high school and university or college), medication use (antihypertensive, statin, oral hypoglycemic diabetes medication and insulin), smoking (current smoking, never smoking and former smoking) and previous history of disease (MI, stroke and CHF).
When reporting the prevalence of clusters over time, and the potential predictors of cluster membership, we accounted for the sampling design through the use of sample weights in the regressions. In all regressions, we rescaled sample weights so that they summed to the same total in each round. We did this so that each round of NHANES contributes the same effective sample size to the analysis of trends and predictors. When evaluating trends over time and predictors of cluster membership, we also adjusted the sample weights by 5-year age bands to match the age distribution of the 2020 US census population. All analyses were done using R software version 4.0.3.
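The per-round rescaling of sample weights can be illustrated as follows. This is a minimal sketch, not the study code. The function name `rescale_weights`, the `target_total` parameter and the example values are hypothetical, and the subsequent age standardization step is not shown.

```python
def rescale_weights(rounds, weights, target_total=1.0):
    """Rescale sample weights so that they sum to target_total within each round.

    rounds:  survey round identifier of each participant
    weights: original sample weight of each participant, aligned with rounds
    """
    totals = {}
    for rnd, w in zip(rounds, weights):
        totals[rnd] = totals.get(rnd, 0.0) + w
    return [w * target_total / totals[rnd] for rnd, w in zip(rounds, weights)]

# Two participants from one round and one from another (made-up weights)
rounds = ["1988-1994", "1988-1994", "2017-2018"]
weights = [100.0, 300.0, 50.0]
rescaled = rescale_weights(rounds, weights)
```

After rescaling, relative weights within a round are unchanged, while every round carries the same total weight in the pooled regressions.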
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The data used for this analysis are publicly available and can be downloaded on the NHANES website: https://wwwn.cdc.gov/nchs/nhanes/Default.aspx.
Code availability
The computer code for the clustering and the multivariate analysis in this work is available at https://globalenvhealth.org/code-data-download/ and https://doi.org/10.5281/zenodo.10075387.
Change history
15 January 2024
A Correction to this paper has been published: https://doi.org/10.1038/s44161-024-00425-z
References
NCD Countdown 2030 Collaborators. NCD Countdown 2030: worldwide trends in non-communicable disease mortality and progress towards Sustainable Development Goal target 3.4. Lancet 392, 1072–1088 (2018).
NCD Countdown 2030 Collaborators. NCD Countdown 2030: pathways to achieving Sustainable Development Goal target 3.4. Lancet 396, 918–934 (2020).
Aune, D. et al. Resting heart rate and the risk of cardiovascular disease, total cancer, and all-cause mortality—a systematic review and dose–response meta-analysis of prospective studies. Nutr. Metab. Cardiovasc. Dis. 27, 504–517 (2017).
Cheng, G., Huang, C., Deng, H. & Wang, H. Diabetes as a risk factor for dementia and mild cognitive impairment: a meta‐analysis of longitudinal studies. Intern. Med. J. 42, 484–491 (2012).
Chronic Kidney Disease Prognosis Consortium. Association of estimated glomerular filtration rate and albuminuria with all-cause and cardiovascular mortality in general population cohorts: a collaborative meta-analysis. Lancet 375, 2073–2081 (2010).
Gansevoort, R. T. et al. Lower estimated GFR and higher albuminuria are associated with adverse kidney outcomes. A collaborative meta-analysis of general and high-risk population cohorts. Kidney Int. 80, 93–104 (2011).
Kannel, W. B., Dawber, T. R., Kagan, A., Revotskie, N. & Stokes, J. III Factors of risk in the development of coronary heart disease—six-year follow-up experience: the Framingham Study. Ann. Intern. Med 55, 33–50 (1961).
Kennelly, S. P., Lawlor, B. A. & Kenny, R. A. Blood pressure and dementia—a comprehensive review. Ther. Adv. Neurol. Disord. 2, 241–260 (2009).
Kyrgiou, M. et al. Adiposity and cancer at major anatomical sites: umbrella review of the literature. BMJ 356, j477 (2017).
Lewington, S., Clarke, R., Qizilbash, N., Peto, R. & Collins, R. Age-specific relevance of usual blood pressure to vascular mortality: a meta-analysis of individual data for one million adults in 61 prospective studies. Lancet 360, 1903–1913 (2002).
The Emerging Risk Factors Collaboration. Major lipids, apolipoproteins, and risk of vascular disease. JAMA 302, 1993–2000 (2009).
The Emerging Risk Factors Collaboration. Glycated hemoglobin measurement and prediction of cardiovascular disease. JAMA 311, 1225–1233 (2014).
The Emerging Risk Factors Collaboration. Adult height and the risk of cause-specific death and vascular morbidity in 1 million people: individual participant meta-analysis. Int. J. Epidemiol. 41, 1419–1433 (2012).
The Global BMI Mortality Collaboration. Body-mass index and all-cause mortality: individual-participant-data meta-analysis of 239 prospective studies in four continents. Lancet 388, 776–786 (2016).
Tsilidis, K. K., Kasimis, J. C., Lopez, D. S., Ntzani, E. E. & Ioannidis, J. P. Type 2 diabetes and cancer: umbrella review of meta-analyses of observational studies. BMJ 350, g7607 (2015).
Mahamat-Saleh, Y. et al. Diabetes, hypertension, body mass index, smoking and COVID-19-related mortality: a systematic review and meta-analysis of observational studies. BMJ Open 11, e052777 (2021).
Angell, S. Y. et al. The American Heart Association 2030 impact goal: a presidential advisory from the American Heart Association. Circulation 141, e120–e138 (2020).
Kannel, W. B., McGee, D. & Gordon, T. A general cardiovascular risk profile: the Framingham Study. Am. J. Cardiol. 38, 46–51 (1976).
Lloyd-Jones, D. M. et al. Defining and setting national goals for cardiovascular health promotion and disease reduction: the American Heart Association’s strategic Impact Goal through 2020 and beyond. Circulation 121, 586–613 (2010).
Rasmussen-Torvik, L. J. et al. Ideal cardiovascular health is inversely associated with incident cancer: the Atherosclerosis Risk In Communities study. Circulation 127, 1270–1275 (2013).
Stamler, J. et al. Low risk-factor profile and long-term cardiovascular and noncardiovascular mortality and life expectancy: findings for 5 large cohorts of young adult and middle-aged men and women. JAMA 282, 2012–2018 (1999).
Carter, P., Gray, L. J., Troughton, J., Khunti, K. & Davies, M. J. Fruit and vegetable intake and incidence of type 2 diabetes mellitus: systematic review and meta-analysis. BMJ 341, c4229 (2010).
Filippini, T. et al. Blood pressure effects of sodium reduction: dose–response meta-analysis of experimental studies. Circulation 143, 1542–1567 (2021).
Filippini, T. et al. Potassium intake and blood pressure: a dose–response meta‐analysis of randomized controlled trials. J. Am. Heart Assoc. 9, e015719 (2020).
Gay, H. C., Rao, S. G., Vaccarino, V. & Ali, M. K. Effects of different dietary interventions on blood pressure: systematic review and meta-analysis of randomized controlled trials. Hypertension 67, 733–739 (2016).
Ley, S. H., Hamdy, O., Mohan, V. & Hu, F. B. Prevention and management of type 2 diabetes: dietary components and nutritional strategies. Lancet 383, 1999–2007 (2014).
Mensink, R. P. Effects of saturated fatty acids on serum lipids and lipoproteins: a systematic review and regression analysis. https://iris.who.int/bitstream/handle/10665/246104/9789241565349-eng.pdf (World Health Organization, 2016).
Mente, A. et al. Association of dietary nutrients with blood lipids and blood pressure in 18 countries: a cross-sectional analysis from the PURE study. Lancet Diabetes Endocrinol. 5, 774–787 (2017).
Sacks, F. M. & Campos, H. Dietary therapy in hypertension. N. Engl. J. Med. 362, 2102–2112 (2010).
Meader, N. et al. A systematic review on the clustering and co-occurrence of multiple risk behaviours. BMC Public Health 16, 657 (2016).
Bentham, J. et al. Multidimensional characterization of global food supply from 1961 to 2013. Nat. Food 1, 70–75 (2020).
Goodarzi, M. O. Genetics of obesity: what genetic association studies have taught us about the biology of obesity and its complications. Lancet Diabetes Endocrinol. 6, 223–236 (2018).
Lu, Y. et al. Metabolic mediators of the effects of body-mass index, overweight, and obesity on coronary heart disease and stroke: a pooled analysis of 97 prospective cohorts with 1.8 million participants. Lancet 383, 970–983 (2014).
Carroll, M. D., Kit, B. K., Lacher, D. A., Shero, S. T. & Mussolino, M. E. Trends in lipids and lipoproteins in US adults, 1988–2010. JAMA 308, 1545–1554 (2012).
Hales, C. M., Fryar, C. D., Carroll, M. D., Freedman, D. S. & Ogden, C. L. Trends in obesity and severe obesity prevalence in US youth and adults by sex and age, 2007–2008 to 2015–2016. JAMA 319, 1723–1725 (2018).
He, J. et al. Trends in cardiovascular risk factors in US adults by race and ethnicity and socioeconomic status, 1999–2018. JAMA 326, 1286–1298 (2021).
NCD Risk Factor Collaboration (NCD-RisC). A century of trends in adult human height. eLife 5, e13410 (2016).
NCD Risk Factor Collaboration (NCD-RisC). Worldwide trends in diabetes since 1980: a pooled analysis of 751 population-based studies with 4.4 million participants. Lancet 387, 1513–1530 (2016).
NCD Risk Factor Collaboration (NCD-RisC). Worldwide trends in body-mass index, underweight, overweight, and obesity from 1975 to 2016: a pooled analysis of 2416 population-based measurement studies in 128.9 million children, adolescents, and adults. Lancet 390, 2627–2642 (2017).
NCD Risk Factor Collaboration (NCD-RisC). Repositioning of the global epicentre of non-optimal cholesterol. Nature 582, 73–77 (2020).
NCD Risk Factor Collaboration (NCD-RisC). Worldwide trends in hypertension prevalence and progress in treatment and control from 1990 to 2019: a pooled analysis of 1201 population-representative studies with 104 million participants. Lancet 398, 957–980 (2021).
Saydah, S. et al. Trends in cardiovascular disease risk factors by obesity level in adults in the United States, NHANES 1999–2010. Obesity 22, 1888–1895 (2014).
Zhang, Y. & Moran, A. E. Trends in the prevalence, awareness, treatment, and control of hypertension among young adults in the United States, 1999 to 2014. Hypertension 70, 736–742 (2017).
O’Hearn, M., Lauren, B. N., Wong, J. B., Kim, D. D. & Mozaffarian, D. Trends and disparities in cardiometabolic health among U.S. adults, 1999–2018. J. Am. Coll. Cardiol. 80, 138–151 (2022).
Wilson, P. W., Kannel, W. B., Silbershatz, H. & D’Agostino, R. B. Clustering of metabolic factors and coronary heart disease. Arch. Intern. Med. 159, 1104–1109 (1999).
Mottillo, S. et al. The metabolic syndrome and cardiovascular risk: a systematic review and meta-analysis. J. Am. Coll. Cardiol. 56, 1113–1132 (2010).
Primeau, V. et al. Characterizing the profile of obese patients who are metabolically healthy. Int. J. Obesity 35, 971–981 (2011).
Stefan, N., Häring, H.-U., Hu, F. B. & Schulze, M. B. Metabolically healthy obesity: epidemiology, mechanisms, and clinical implications. Lancet Diabetes Endocrinol. 1, 152–162 (2013).
Wang, J.-S. et al. Trends in the prevalence of metabolically healthy obesity among US adults, 1999–2018. JAMA Network Open 6, e232145 (2023).
Miller, L. M. et al. Cardiovascular damage phenotypes and all-cause and CVD mortality in older adults. Ann. Epidemiol. 63, 35–40 (2021).
Liao, X., Kerr, D., Morales, J. & Duncan, I. Application of machine learning to identify clustering of cardiometabolic risk factors in US adults. Diabetes Technol. Ther. 21, 245–253 (2019).
Antonio-Villa, N. E. et al. Prevalence trends of diabetes subgroups in the United States: a data-driven analysis spanning three decades from NHANES (1988–2018). J. Clin. Endocrinol. Metab. 107, 735–742 (2022).
Bancks, M. P., Casanova, R., Gregg, E. W. & Bertoni, A. G. Epidemiology of diabetes phenotypes and prevalent cardiovascular risk factors and diabetes complications in the National Health and Nutrition Examination Survey 2003–2014. Diabetes Res. Clin. Pract. 158, 107915 (2019).
Xue, Q. et al. Subtypes of type 2 diabetes and incident cardiovascular disease risk: UK Biobank and All of Us cohorts. Mayo Clin. Proc. 98, 1192–1204 (2023).
Seymour, C. W. et al. Derivation, validation, and potential treatment implications of novel clinical phenotypes for sepsis. JAMA 321, 2003–2017 (2019).
Zweck, E. et al. Phenotyping cardiogenic shock. J. Am. Heart Assoc. 10, e020085 (2021).
Markovich Gordon, M., Moser, A. M. & Rubin, E. Unsupervised analysis of classical biomedical markers: robustness and medical relevance of patient clustering using bioinformatics tools. PLoS ONE 7, e29578 (2012).
Zemedikun, D. T., Gray, L. J., Khunti, K., Davies, M. J. & Dhalwani, N. N. Patterns of multimorbidity in middle-aged and older adults: an analysis of the UK Biobank data. Mayo Clin. Proc. 93, 857–866 (2018).
Violan, C. et al. Prevalence, determinants and patterns of multimorbidity in primary care: a systematic review of observational studies. PLoS ONE 9, e102149 (2014).
Prados-Torres, A., Calderón-Larrañaga, A., Hancco-Saavedra, J., Poblador-Plou, B. & van den Akker, M. Multimorbidity patterns: a systematic review. J. Clin. Epidemiol. 67, 254–266 (2014).
Alhasoun, F. et al. Age density patterns in patients medical conditions: a clustering approach. PLoS Comput. Biol. 14, e1006115 (2018).
Bisquera, A. et al. Identifying longitudinal clusters of multimorbidity in an urban setting: a population-based cross-sectional study. Lancet Reg. Health Eur. 3, 100047 (2021).
Landi, I. et al. Deep representation learning of electronic health records to unlock patient stratification at scale. NPJ Digit. Med. 3, 96 (2020).
Roso-Llorach, A. et al. Comparative analysis of methods for identifying multimorbidity patterns: a study of ‘real-world’ data. BMJ Open 8, e018986 (2018).
Zhu, Y., Edwards, D., Mant, J., Payne, R. A. & Kiddle, S. Characteristics, service use and mortality of clusters of multimorbid patients in England: a population-based study. BMC Med. 18, 78 (2020).
Yang, S., Varghese, P., Stephenson, E., Tu, K. & Gronsbell, J. Machine learning approaches for electronic health records phenotyping: a methodical review. J. Am. Med. Inform. Assoc. 30, 367–381 (2022).
De Freitas, J. K. et al. Phe2vec: automated disease phenotyping based on unsupervised embeddings from electronic health records. Patterns 2, 100337 (2021).
Loftus, T. J. et al. Phenotype clustering in health care: a narrative review for clinicians. Front. Artif. Intell. 5, 842306 (2022).
Multimorbidity: a priority for global health research. https://acmedsci.ac.uk/file-download/82222577 (Academy of Medical Sciences, 2018).
Pearson-Stuttard, J., Ezzati, M. & Gregg, E. W. Multimorbidity—a defining challenge for health systems. Lancet Public Health 4, e599–e600 (2019).
Wang, L. et al. Trends in prevalence of diabetes and control of risk factors in diabetes among US adults, 1999–2018. JAMA 326, 704–716 (2021).
Selvin, E., Parrinello, C. M., Sacks, D. B. & Coresh, J. Trends in prevalence and control of diabetes in the United States, 1988–1994 and 1999–2010. Ann. Intern. Med. 160, 517–525 (2014).
Sarnak, M. J. et al. Chronic kidney disease and coronary artery disease: JACC state-of-the-art review. J. Am. Coll. Cardiol. 74, 1823–1838 (2019).
Salive, M. E. Multimorbidity in older adults. Epidemiol. Rev. 35, 75–83 (2013).
Chae, C. U. et al. Increased pulse pressure and risk of heart failure in the elderly. JAMA 281, 634–643 (1999).
Coresh, J. et al. Prevalence of chronic kidney disease in the United States. JAMA 298, 2038–2047 (2007).
Wildman, R. P. et al. The obese without cardiometabolic risk factor clustering and the normal weight with cardiometabolic risk factor clustering: prevalence and correlates of 2 phenotypes among the US population (NHANES 1999–2004). Arch. Intern. Med. 168, 1617–1624 (2008).
Kanjilal, S. et al. Socioeconomic status and trends in disparities in 4 major risk factors for cardiovascular disease among US adults, 1971–2002. Arch. Intern. Med. 166, 2348–2355 (2006).
Dong, G., Feng, J., Sun, F., Chen, J. & Zhao, X.-M. A global overview of genetically interpretable multimorbidities among common diseases in the UK Biobank. Genome Med. 13, 110 (2021).
Evangelou, E. et al. Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits. Nat. Genet. 50, 1412–1425 (2018).
Graham, S. E. et al. The power of genetic diversity in genome-wide association studies of lipids. Nature 600, 675–679 (2021).
Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).
Mahajan, A. et al. Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation. Nat. Genet. 54, 560–572 (2022).
Wood, A. C. et al. Identification of genetic loci simultaneously associated with multiple cardiometabolic traits. Nutr. Metab. Cardiovasc. Dis. 32, 1027–1034 (2022).
Yengo, L. et al. A saturated map of common genetic variants associated with human height. Nature 610, 704–712 (2022).
Muntner, P. et al. Trends in blood pressure control among US adults with hypertension, 1999–2000 to 2017–2018. JAMA 324, 1190–1200 (2020).
Salami, J. A. et al. National trends in statin use and expenditures in the US adult population from 2002 to 2013: insights from the Medical Expenditure Panel Survey. JAMA Cardiol. 2, 56–65 (2017).
McGovern, P. G. et al. Trends in acute coronary heart disease mortality, morbidity, and medical care from 1985 through 1997: the Minnesota heart survey. Circulation 104, 19–24 (2001).
Shah, N. S. et al. Trends in cardiometabolic mortality in the United States, 1999–2017. JAMA 322, 780–782 (2019).
NCD Risk Factor Collaboration (NCD-RisC). Long-term and recent trends in hypertension awareness, treatment, and control in 12 high-income countries: an analysis of 123 nationally representative surveys. Lancet 394, 639–651 (2019).
Bessesen, D. H. & Van Gaal, L. F. Progress and challenges in anti-obesity pharmacotherapy. Lancet Diabetes Endocrinol. 6, 237–248 (2018).
Chen, T.-C., Clark, J., Riddles, M. K., Mohadjer, L. K. & Fakhouri, T. H. National Health and Nutrition Examination Survey, 2015−2018: sample design and estimation procedures. https://www.cdc.gov/nchs/data/series/sr_02/sr02-184-508.pdf (National Center for Health Statistics, 2020).
Ashwell, M., Gunn, P. & Gibson, S. Waist‐to‐height ratio is a better screening tool than waist circumference and BMI for adult cardiometabolic risk factors: systematic review and meta‐analysis. Obesity Rev. 13, 275–286 (2012).
Myers, G. L., Cooper, G. R., Winn, C. L. & Smith, S. J. The Centers for Disease Control-National Heart, Lung and Blood Institute Lipid Standardization Program: an approach to accurate and precise lipid measurements. Clin. Lab. Med. 9, 105–136 (1989).
Murphy, D. et al. Trends in prevalence of chronic kidney disease in the United States. Ann. Intern. Med. 165, 473–481 (2016).
Rousseeuw, P. J. & van Zomeren, B. C. Unmasking multivariate outliers and leverage points. J. Am. Stat. Assoc. 85, 633–639 (1990).
Bishop, C. M. & Nasrabadi, N. M. Pattern Recognition and Machine Learning (Springer, 2006).
Hennig, C. Cluster-wise assessment of cluster stability. Comput. Stat. Data Anal. 52, 258–271 (2007).
Newcombe, R. G. Two‐sided confidence intervals for the single proportion: comparison of seven methods. Stat. Med. 17, 857–872 (1998).
Acknowledgements
This work was funded by a grant from the UK Medical Research Council (grant no. MR/V034057/1, to M.E.). B.Z. is supported by a fellowship from the Abdul Latif Jameel Institute for Disease and Emergency Analytics at Imperial College London, funded by a donation from Community Jameel. The funders had no role in the design and conduct of the study; in the collection, management, analysis and interpretation of the data; in the preparation, review or approval of the manuscript; or in the decision to submit the manuscript for publication.
Author information
Contributions
V.P.F.L., J.E.B., S.F. and M.E. conceived and designed the study. V.P.F.L., J.E.B., B.Z., A.M., S.F. and M.E. developed the analytical strategy. V.P.F.L. conducted analysis, in consultation with B.Z., J.E.B. and A.M. V.P.F.L., M.E., J.E.B. and A.M. interpreted the data and drafted the figures. V.P.F.L. and M.E. wrote the first draft of the manuscript. E.W.G., P.A. and G.D. provided input to finalize the paper. All authors had full access to all data used in this study. V.P.F.L. and B.Z. checked and verified the data used in the analysis. All authors were responsible for submitting the article for publication.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Cardiovascular Research thanks Melissa Haendel, Simin Liu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Risk factor distribution within each cluster.
Each panel corresponds to a cluster; the color shows the distribution of each variable in that cluster, with darker color at the center of the distribution. The concentric circles show the minimum, 25th, 50th and 75th percentiles and the maximum in the whole sample, with the median shown in darker color. Each percentile is positioned relative to the distribution in the whole population so that the scale is common across all clusters. The scale is reversed for height, eGFR and HDL because lower values indicate higher risk. eGFR: estimated glomerular filtration rate; BMI: body-mass index; WHtR: waist-to-height ratio; HbA1c: glycated hemoglobin; HDL: high-density lipoprotein cholesterol; non-HDL: non-high-density lipoprotein cholesterol; SBP: systolic blood pressure; DBP: diastolic blood pressure.
Extended Data Fig. 2 Risk factor levels by age group in cardiometabolic and renal clusters.
Each panel corresponds to a cluster; each line shows the median value of one biomarker for one age group within that cluster. The concentric circles show the minimum, 25th, 50th and 75th percentiles and the maximum in the whole sample, with the median shown in darker color. Each line is positioned relative to the distribution in the whole population so that the scale is common across all clusters and age groups. The scale is reversed for height, eGFR and HDL because lower values indicate higher risk. eGFR: estimated glomerular filtration rate; BMI: body-mass index; WHtR: waist-to-height ratio; HbA1c: glycated hemoglobin; HDL: high-density lipoprotein cholesterol; non-HDL: non-high-density lipoprotein cholesterol; SBP: systolic blood pressure; DBP: diastolic blood pressure.
Extended Data Fig. 3 Trends in crude prevalence of cardiometabolic and renal clusters from 1988 to 2018.
Crude prevalence was calculated as overall prevalence in each NHANES round without any adjustment for the age structure of the participants.
Extended Data Fig. 4 Flowchart of data cleaning.
Data cleaning per survey round.
Extended Data Fig. 5 Average intra- and inter-cluster distances for both women and men.
Each cell on the diagonal represents the average Euclidean distance between all pairs of individuals in a given cluster (intra-cluster distance). Each cell off the diagonal represents the average Euclidean distance between all pairs of individuals from two different clusters (inter-cluster distance).
Supplementary information
Supplementary Information
Supplementary Figs. 1–3 and STROBE checklist for cross-sectional studies.
Supplementary Table
Supplementary Table 1: Risk factor distributions in cardiometabolic and renal clusters. Supplementary Table 2: Pre-defined ranges used for data cleaning.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lhoste, V.P.F., Zhou, B., Mishra, A. et al. Cardiometabolic and renal phenotypes and transitions in the United States population. Nat Cardiovasc Res 3, 46–59 (2024). https://doi.org/10.1038/s44161-023-00391-y
DOI: https://doi.org/10.1038/s44161-023-00391-y