Predicting hypertension onset: the current practice

Risk equations are essential tools that guide clinicians’ decisions about which health services to use or which preventative strategies to recommend when caring for patients. Knowing patients’ risk scores or estimated likelihoods of disease onset also helps clinicians communicate with their patients about the importance and, in some cases, the urgency of behavioral modification to delay disease onset or minimize further progression [1]. Clinicians can use risk equations routinely, leveraging data from various sources. At least 48 different hypertension risk equations have been developed to date [1]. The utility of risk equations and their likelihood of adoption depend not only on their predictive accuracy but also on their usability. In general, predictive accuracy means two things: (1) the degree to which risk equations can accurately distinguish between people who will and will not develop a disease (i.e., a measure of discrimination) and (2) whether they can predict disease for people at different levels of risk (i.e., a measure of calibration) [2]. Input variables for risk equations should also be readily available to clinicians (e.g., information collected during a regular office visit and stored in medical records). Risk equations that require extensive input variables may be used less given the fast turnaround of each visit [1]. The vast majority of the existing hypertension risk equations predict disease onset based on age, sex, current blood pressure (often systolic blood pressure or prehypertensive/high-normal status), body mass index (BMI), smoking status, exercise, and family history [1]; in other words, demographic, behavioral, and non-invasive biomarker data.

Another characteristic of risk equations that clinicians often overlook is whether the equations predict a disease equally well for people from different social groups. In recent years, the lack of attention to this characteristic has been highlighted by several empirical studies demonstrating that the disease status of Black patients can be systematically misclassified if a race variable (i.e., a socially constructed characteristic) [3] is embedded in risk equations to adjust for unexplained risk differences between Black and White patients [4]. Using a race-correction term to predict patients’ health trajectory without considering the health-harming effects of racism and differences in social needs and social determinants of health deprives Black patients in poor health of access to health services and compensation to which they are entitled [5, 6]. This evidence of inequitable care led to calls to remove race-correction terms from risk equations and other aspects of clinical practice [4, 7].

Work as a social determinant of hypertension

The work we do affects our risk of hypertension. Three occupational determinants of hypertension have been extensively researched in the past few decades: job insecurity, job loss, and the psychosocial work environment. Job insecurity and job loss, the level of which depends on workers’ occupations [8], increase the risk of hypertension among workers directly as added sources of stress and indirectly by disrupting access to essential health-promoting resources (e.g., income, healthcare coverage). Studies from the United States (US) and Scandinavian countries show that workers who have experienced job insecurity or job loss are more likely to be hypertensive than those who have not [9,10,11]. The degree to which these risk factors affect workers’ blood pressure varies based on the workers’ gender, age when job loss occurs, and the duration of unemployment [9,10,11].

While workers’ ability to maintain employment is an important predictor of hypertension, the nature of work and the workplace also predict its onset. Well-documented evidence shows that exposure to work-related psychosocial stressors, also referred to as the psychosocial work environment (PWE), predicts high stress and increases the risk of hypertension [12]. Three dimensions of PWE predict hypertension onset among workers: job demand, job control, and work-related support [13, 14]. Job demand is characterized by the worker’s workload, time pressure, and role conflict. Job control is characterized by the worker’s ability to control their work activities. This dimension of the PWE consists of two subcomponents: (1) skill discretion (levels of skill and creativity required on the job and the flexibility an employee is permitted in deciding what skills to use) and (2) decision authority (opportunities for workers to make decisions about their work). The third PWE dimension, work-related support, is characterized by levels of social interaction and support received from coworkers and supervisors. Workers whose occupation is associated with high job demand, low job control, and limited support face an increased risk of hypertension onset compared to workers whose occupation is associated with low job demand, high job control, and extensive support [13, 14].

Current study

Despite strong evidence that work characteristics predict the onset of hypertension, no existing risk equations account for these variables. This oversight may represent a missed opportunity for clinicians to better identify workers at high risk of hypertension onset. The lack of risk equations that account for work characteristics also hinders clinicians’ ability to look beyond their clinic walls to address a root cause of hypertension inequities in the communities they serve. The objective of our study is to examine the extent to which including job loss/job insecurity (measured by employment status) and PWE (measured by occupation-based measures derived from publicly available occupation rating data) improves the accuracy of the prediction of hypertension onset for workers. We purposefully excluded a race-correction term from our risk equations and instead evaluated social determinants of health that are the root cause of racial inequity in hypertension.

Materials and methods


Our analysis used public data from the Coronary Artery Risk Development in Young Adults (CARDIA) study, a multi-center prospective study designed to assess exposure to cardiovascular risk factors in young adulthood and patterns of health outcomes in later life. Detailed descriptions of the study design and data collection procedures are published elsewhere [15]. In this secondary analysis, we used participants’ work characteristics and relevant hypertension risk factors data from the follow-up assessment conducted in 2000 (Year 15) to predict their hypertension onset in 2005 (Year 20).

CARDIA collected data on a host of cardiovascular risk factors. At all assessments, certified technicians measured the blood pressure of the participants. Systolic (SBP) and diastolic blood pressure (DBP) were measured three times on the participant’s right arm using the Hawksley random-zero sphygmomanometer (Year 15) and the Omron HEM907XL machine (Year 20). We examined non-work-related risk factors as suggested by the current literature [16]. Primary risk factors consisted of each participant’s age, gender (male; female), educational attainment (less than high school; some college to having a college degree; higher than college), parental history of hypertension, BMI, and diagnoses of health conditions that are positively associated with hypertension, including high cholesterol, diabetes (type 1 or type 2), heart problems (e.g., heart attack, angina, rheumatic heart disease, mitral valve prolapse), and kidney problems (e.g., urine infection from kidney, kidney stone, kidney problems like nephritis or glomerulonephritis, kidney failure, dialysis, or a kidney transplant). We also included participants’ insurance coverage at the Year 15 assessment, as individuals who have difficulty accessing healthcare when needed are less likely to have their blood pressure and other conditions co-occurring with hypertension (e.g., diabetes) under control [17]. Lastly, we included three health behaviors that are associated with hypertension [16]: drinking status (current drinker; not current drinker), smoking (never smoker; former smoker; current smoker), and physical inactivity during the past year compared to people of the same gender and age (physically inactive; physically active).

As for work-related characteristics, CARDIA assigned participants the 3-digit 1990 US Census Occupational Classification (OCC) code based on their responses to questions about their industry of employment, occupation, and class of worker. To determine employment status, participants were asked to describe their current main daily activities and/or responsibilities, with possible responses including working full-time, working part-time, being unemployed/laid off, looking for work, and housekeeping or raising children full time. To assess PWE for workers in our study, we derived occupation-based measures using the occupation rating data from the Occupational Information Network (O*Net) database [18]. This publicly available online database contains ratings of over 200 occupational characteristics for approximately 900 occupations in the US economy. Ratings of occupational tasks, work activities, knowledge, education and training, work styles, and work context are based on questionnaires completed by a representative sample of current job incumbents. Trained occupational analysts also provide ratings on the skills required for particular occupations and on how important all rated characteristics are to the success of specific occupations. Since 2005, the O*Net data have been used in more than 60 published studies to examine the extent to which various work-related exposures predict workers’ health status. We used O*Net items similar to those used in previous studies to derive the job demand, job control, and support measures [19, 20]. The possible range is 0–4 for the job demand and job control measures and 0–2 for the support measure, with higher scores indicating a greater extent. For a description of the development process of our PWE measures, the values associated with specific occupations, and how we mapped these measures to the CARDIA data, please refer to the online supplements S1–S3.
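The linkage described above can be pictured with a minimal sketch: occupation-level PWE scores are attached to participants through their 3-digit occupation code, and participants whose code has no score are set aside. The names `PweScores` and `link_pwe` and the dictionary layout are assumptions made for this illustration and are not the authors' procedure, which is documented in the supplements. Only the score ranges (0–4 for job demand and job control, 0–2 for support) come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PweScores:
    """Occupation-level PWE scores (hypothetical container for this sketch)."""
    job_demand: float   # possible range 0-4
    job_control: float  # possible range 0-4
    support: float      # possible range 0-2

    def __post_init__(self) -> None:
        # Reject scores outside the ranges stated in the text.
        if not 0 <= self.job_demand <= 4:
            raise ValueError("job_demand must lie in [0, 4]")
        if not 0 <= self.job_control <= 4:
            raise ValueError("job_control must lie in [0, 4]")
        if not 0 <= self.support <= 2:
            raise ValueError("support must lie in [0, 2]")


def link_pwe(
    occ_by_participant: Dict[str, str],
    pwe_by_occ: Dict[str, PweScores],
) -> Tuple[Dict[str, PweScores], List[str]]:
    """Attach occupation-level scores to each participant.

    Returns the linked scores keyed by participant ID, plus the IDs of
    participants whose occupation code has no PWE scores.
    """
    linked: Dict[str, PweScores] = {}
    unmatched: List[str] = []
    for participant_id, occ_code in occ_by_participant.items():
        scores = pwe_by_occ.get(occ_code)
        if scores is None:
            unmatched.append(participant_id)
        else:
            linked[participant_id] = scores
    return linked, unmatched
```

Because the scores are keyed by occupation, every participant with the same occupation code receives identical PWE values, regardless of their individual experience at work.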

Exclusion criteria

The total number of participants who received an assessment in Year 15 was 3,671. We restricted our analytical sample to participants with non-military occupations [21] who reported working full-time, working part-time, or being unemployed, laid off, or actively looking for jobs in Year 15. Participants were excluded if they worked in occupations to which the PWE measures could not be assigned (see online supplement S4), were hypertensive or had a history of hypertension before Year 15, were pregnant at the time of the assessment (women only), had a missing hypertension status in Year 20, or had missing values for any of the covariates described above. Figure 1 displays the number of participants excluded at each step to arrive at the final analytical sample (N = 1,909). The age, gender, and educational makeup of our final analytical sample were not significantly different from those of the excluded group. However, the proportion of Black participants in our analytical sample was significantly smaller than that in the excluded group (37.4% vs. 54.3%, p < 0.001).

Fig. 1 Sample Exclusion.

Model estimation and validation

We defined hypertension as having SBP ≥ 140 mmHg, DBP ≥ 90 mmHg, and/or taking antihypertensive medication, consistent with the Seventh Report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure guideline (JNC-7) [22]. Participants with prehypertension were those with SBP of 120–139 mmHg and/or DBP of 80–89 mmHg [22]. We fitted a logistic regression to estimate the five-year probability of hypertension. Similar to other US-based models [23,24,25], we minimized the use of interaction terms in our models. We tested 2-way interactions of each of the three PWE measures with gender and age groups, based on evidence that these demographic characteristics likely affect one’s experience in the workplace [20]. We employed a backward selection algorithm to arrive at the most parsimonious model [26]. For each round of deletion, the algorithm calculated the Bayesian Information Criterion (BIC), which combines the model likelihood with a penalty for the number of predictors in the model [27]; the model with the lowest BIC was considered the most parsimonious model. To ensure that the final model reasonably predicted hypertension among workers and took employment status and PWE into consideration, we purposefully programmed the selection algorithm to generate several models: (1) a model that retained only variables that were significantly associated with the five-year probability of hypertension (simplified model); (2) a model that retained significant variables plus employment status and all three dimensions of PWE regardless of their p value (full PWE model); and (3) a model that retained significant variables, employment status, and only those PWE dimensions that were significantly associated with hypertension (partial PWE model). For each PWE model, we also tested whether retaining participants’ gender and age on top of the work-related variables further improved the model’s performance.
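The selection strategy can be sketched as a greedy backward elimination scored by BIC, with an option to force certain variables to stay in the model. This is a minimal illustration and not the authors' R code: the names `bic` and `backward_select`, the `fit_loglik` callable (standing in for fitting a logistic regression and returning its maximized log-likelihood), the stopping rule (stop once no single deletion lowers BIC), and the assumption of one coefficient per variable are all choices made for this example.

```python
import math
from typing import Callable, FrozenSet, Iterable, Tuple


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """BIC = k * ln(n) - 2 * ln(L); lower values are preferred."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def backward_select(
    candidates: Iterable[str],
    keep: Iterable[str],
    fit_loglik: Callable[[FrozenSet[str]], float],
    n_obs: int,
) -> Tuple[FrozenSet[str], float]:
    """Greedy backward elimination by BIC.

    `keep` lists variables that are never dropped (for example, employment
    status and the PWE dimensions in a "full PWE" style model).
    `fit_loglik` is a hypothetical callable that fits the model on the
    given variables and returns its maximized log-likelihood.
    """
    forced = frozenset(keep)
    current = frozenset(candidates) | forced
    # +1 for the intercept; assumes one coefficient per variable.
    best = bic(fit_loglik(current), len(current) + 1, n_obs)
    while True:
        trials = []
        for var in sorted(current - forced):
            reduced = current - {var}
            score = bic(fit_loglik(reduced), len(reduced) + 1, n_obs)
            trials.append((score, var, reduced))
        if not trials:
            break  # only forced variables remain
        score, _, reduced = min(trials, key=lambda t: (t[0], t[1]))
        if score >= best:
            break  # no single deletion lowers BIC any further
        current, best = reduced, score
    return current, best
```

Calling `backward_select` with an empty `keep` corresponds to an unrestricted selection, while passing the work-related variables in `keep` mirrors the idea of retaining them regardless of their p values.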

We validated our models using two metrics. Model discrimination was evaluated using the Harrell c-statistic; predictive models with a c-statistic of 0.7 or higher are typically considered to have good discrimination [28]. Model calibration was evaluated using the Hosmer–Lemeshow goodness-of-fit statistic, which is distributed as chi-squared. A statistically significant result for the goodness-of-fit test indicates that there is a statistically significant difference in the predicted number of outcomes compared to the observed number of outcomes for at least one group (i.e., poor calibration).
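To make the two validation metrics concrete, the sketch below computes both from observed outcomes and predicted probabilities. For a binary outcome, the c-statistic is the share of case/non-case pairs in which the case received the higher predicted probability (ties count as one half). The Hosmer–Lemeshow statistic groups participants by predicted risk and compares observed with expected event counts. The function names, the default of 10 groups, and the equal-size grouping rule are assumptions for this example; the sketch returns the statistic and its degrees of freedom and leaves the p value to a chi-squared table or a statistics package.

```python
from typing import List, Sequence, Tuple


def c_statistic(y_true: Sequence[int], p_hat: Sequence[float]) -> float:
    """Concordance between predicted risk and a binary outcome."""
    cases: List[float] = [p for y, p in zip(y_true, p_hat) if y == 1]
    controls: List[float] = [p for y, p in zip(y_true, p_hat) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for p_case in cases:
        for p_control in controls:
            if p_case > p_control:
                concordant += 1.0
            elif p_case == p_control:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))


def hosmer_lemeshow(
    y_true: Sequence[int], p_hat: Sequence[float], groups: int = 10
) -> Tuple[float, int]:
    """Hosmer-Lemeshow chi-squared statistic and degrees of freedom.

    Participants are sorted by predicted probability and split into
    `groups` bins of (nearly) equal size.
    """
    pairs = sorted(zip(p_hat, y_true), key=lambda t: t[0])
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / size
        if 0.0 < mean_p < 1.0:
            chi2 += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return chi2, groups - 2
```

A large statistic relative to the chi-squared distribution with the returned degrees of freedom signals that observed and predicted counts diverge in at least one risk group.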

Comparison of model prediction accuracy for Black and White participants

We compared the c-statistics evaluated for Black participants only and for White participants only. Because our analytical sample contained an unequal number of Black (n = 733) and White (n = 1,176) participants, this sample size difference may contribute to differences in model prediction accuracy between these groups. We therefore created 200 datasets of White participants of the same sample size as that of Black participants (i.e., n = 733) by sampling without replacement from the original sample of White participants. We then repeated the estimation process described above for the combined Black and White samples (total n = 1,466 for each dataset). The mean of the c-statistics estimated for White participants based on all 200 samples was compared to the c-statistic evaluated using the full sample of Black participants with two-sample t-tests.
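The resampling step can be sketched as follows. Each of the 200 datasets is drawn without replacement from the larger group so that it matches the size of the smaller group, and a statistic is then computed on every draw and averaged. The function names, the fixed seed, and the `evaluate` callable (a stand-in for refitting the model on the combined sample and returning a c-statistic) are assumptions made for this illustration.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def draw_subsamples(
    population: Sequence[T], size: int, n_datasets: int = 200, seed: int = 0
) -> List[List[T]]:
    """Draw `n_datasets` subsamples of `size`, each without replacement."""
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    return [rng.sample(list(population), size) for _ in range(n_datasets)]


def mean_statistic(
    subsamples: Sequence[Sequence[T]],
    evaluate: Callable[[Sequence[T]], float],
) -> float:
    """Average a statistic (e.g., a c-statistic) over all subsamples.

    `evaluate` is a hypothetical callable that would refit the model on a
    subsample combined with the smaller group and return the statistic.
    """
    return mean(evaluate(sample) for sample in subsamples)
```

Sampling without replacement within each draw means that no participant appears twice in the same dataset, although the same participant can appear in several of the 200 datasets.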

All data management and analyses were conducted in R version 3.5.2. The Institutional Review Board of the University of Minnesota reviewed this study protocol and designated it as non-human research.


Table 1 displays the characteristics and hypertension status of participants in our sample. After applying the exclusion criteria, our sample consisted of 1,909 participants with a mean age of 40.3 years who were relatively balanced in sex (54% female and 46% male) but unbalanced in race (38% Black and 62% White). A large majority of the participants in our sample worked full-time (81%) or part-time (12%). The mean job demand, job control, and support scores were 1.8 (SD = 0.3), 2.7 (SD = 0.6), and 1.2 (SD = 0.2), respectively. Overall, the participants were generally healthy, with a mean BMI of 27.9 and a prevalence of hypertension co-occurring conditions lower than the national average [29]. Most participants reported drinking alcohol, never smoking, and being physically active. At Year 20, 11.3% of participants who were hypertension-free at Year 15 had developed hypertension.

Table 1 Analytical Sample Characteristics at Year 15 and Their Hypertension Status at Year 20 (n = 1,909).

Table 2 displays the parsimonious models with the lowest BIC for the five backward selection strategies. When no restriction was imposed on the selection (simplified model), the backward selection identified the logarithm of the BMI, being a current smoker, being prehypertensive, and having a family history of hypertension as predictors of hypertension onset in the next five years. When we purposefully programmed the selection algorithm to retain employment status and all three dimensions of the PWE (full PWE model), all predictors identified in the simplified model remained significant. The level of job control associated with the worker’s occupation was the only PWE dimension that predicted hypertension onset in five years with p < 0.05. When participants’ gender and age were also retained (full PWE model with gender and age), the coefficients of the predictors changed slightly compared to those in the full PWE model. When we purposefully programmed the selection algorithm to retain only employment status and job control (partial PWE model), the final model identified the same set of predictors as in the full PWE model. The addition of gender and age (partial PWE model with gender and age) did not change the effect sizes of the model predictors significantly. All five models had high discrimination (c-statistics greater than 0.8) and were well-calibrated. The full PWE model with gender and age, the partial PWE model, and the partial PWE model with gender and age had the highest c-statistics (0.805), but the simpler full PWE model predicted hypertension onset almost equally well (c-statistic = 0.804) with slightly better calibration (Hosmer–Lemeshow χ2 = 6.09).

Table 2 Logistic Regression Models Predicting Five-year Hypertension Onset and Their Validation Statistics.

Table 3 displays the mean c-statistics of the five models evaluated with the 200 datasets consisting of all Black participants and an equal number of sampled White participants. All models predicted hypertension well for both Black and White participants, as indicated by c-statistics greater than 0.7. In all models, we observed significantly lower c-statistics when using data from Black participants than when using data from sampled White participants. This finding suggests that even when the prediction models are estimated using the same number of Black and White participants, our models predict hypertension less accurately for Black participants.

Table 3 Comparison of the Mean C-statistics of the 200 Regression Models Estimated Using the Datasets That Consisted of All Black Participants (n = 733) and the Same Number of Randomly Sampled White Participants (n = 733).


Our study investigated whether including employment status (as a proxy for job loss and instability) and psychosocial work environment in risk equations improves the accuracy of prediction of the short-term onset of hypertension. Despite work-related risk factors’ direct and indirect linkages with health and well-being, clinicians have never used such information to identify workers at elevated risk of hypertension. Unlike biomedical indicators (e.g., age, weight, height) or health behaviors routinely assessed during clinical intake (e.g., smoking, drinking, physical activity), consideration of work characteristics is only possible if a worker volunteers such information to their provider. As clinicians start to look beyond the walls of their clinics and collect data on social determinants of health that may affect their patients’ hypertension status [30], our study provides new insight into how this information can help improve hypertension care. We found that risk equations that incorporate workers’ biomedical risk factors as well as employment status and PWE predict the onset of hypertension more accurately than ones that use only biomedical risk factors. However, we also found that our enhanced risk equations predict hypertension onset more accurately for White than for Black workers, even when the data used to derive such equations consist of the same number of data points from the two racial groups.

One of the major barriers to using work characteristics to predict disease onset is ensuring that such information is collected appropriately. While asking workers about their employment status may be difficult on its own and may require better data collection strategies or linkage of data from other sources [31], assessing workers’ psychosocial work environment during a short clinical encounter can pose another challenge. The risk equations we presented in this paper used occupation-based measures of job demand, control, and support that can easily be coded from a worker’s response about the nature of the work they do. In light of the COVID-19 pandemic and the need to better surveil the health status of US workers, the National Institute for Occupational Safety and Health has released a guideline that employers and health professionals can use for occupational coding based on a response to the question “what is your occupation?” [32]. Compared to the 49-question Job Content Questionnaire (JCQ) [33], which is commonly used by occupational health scientists to assess PWE, the implementation of occupation-based PWE measures requires significantly less time, and the occupational coding process can be automated in the electronic medical records system. Institutional initiatives to streamline the collection of workers’ work-related data will support the effective adoption of risk equations like ours, and of others that may be in development, so that clinicians can predict hypertension trajectory and tailor hypertension care for workers more effectively. Future research should examine the benefits and costs of automating the calculation of hypertension risk and making this information readily available for clinicians to guide their decisions.

It is worth noting that occupation-based PWE measures used in our models assess the average level of exposure to work-related stressors for a particular occupation, but not necessarily the level that a specific worker experiences [20]. Past research suggests that workers from different racial backgrounds experience different levels of PWE, even when they have the same occupation [20, 34,35,36]. In a racist society like the US and countries with a colonial past, residential segregation and structural racism in education jointly produce the workforce in which Black workers, particularly Black women, disproportionately hold low-status occupations relative to White workers [5]. However, even Black workers holding high-status occupations experience a more stressful and challenging PWE than their White peers. These Black workers frequently report being tokenized by their White coworkers [36]. They are also more likely to be assigned “diversity work” and experience workplace discrimination and harassment than Black workers with low-status occupations, let alone their White coworkers [35]. Different lived experiences in the workplace between Black and White workers with the same occupation may explain why our risk equations that use occupation-based PWE predict hypertension onset less accurately for Black than White workers.

While we chose the occupation-based PWE measures for our hypertension risk equations with the ease of data collection and the likelihood of adoption in the fast-paced clinical setting in mind, our findings of lower prediction accuracy for Black workers signal the need for additional research on ways to utilize work-related information in the real-world clinical setting. Discrepancies in prediction accuracy by workers’ race cannot and must not be corrected with a race-correction term; while adding an interaction term between race and PWE would capture the heterogeneous effect of PWE on hypertension onset by race, it does not correct the fact that workers from different racial groups have different PWE experiences for a given occupation. Future research should investigate whether including other individual-level measures of social conditions, particularly those that have already been captured in electronic health records (e.g., financial resources, living arrangements, social connections) [37], will minimize discrepancies in prediction accuracy and promote equity in healthcare for workers of all backgrounds. Additionally, researchers should investigate whether the traditional JCQ can be abbreviated for a clinical intake form and evaluate whether the racial discrepancy in prediction accuracy also exists in risk equations that incorporate the JCQ responses.

Our findings should be considered in light of several limitations. First, because of our relatively small sample size after exclusion, we were unable to estimate and validate our models using two separate datasets. It is possible that the validation statistics we calculated are overly optimistic. Hence, our risk equations should be validated with data from other longitudinal cardiovascular studies. At the time of writing, this was a challenging task because although most, if not all, longitudinal studies measured participants’ employment status, they did not assess participants’ occupations in detail, hindering the merging of occupation-based PWE measures with biomedical risk data. Second, our study defined hypertension status based on the JNC-7 guideline. Although this procedure is consistent with the standard of care around the time the study data were collected, the American Heart Association (AHA) and the American College of Cardiology (ACC) released an updated guideline in 2017 that lowers the clinical threshold for hypertension to 130/80 mmHg and eliminates the prehypertension category altogether. We encourage future studies with data collected after the release of the AHA/ACC guideline to replicate our protocol to examine the sensitivity of our findings. Lastly, the data used to derive our risk equations came from Black and White workers only. Thus, the risk equations presented in this paper should be examined for their prediction accuracy with data from workers from other racial backgrounds before their adoption in real-world practice.

Work is an important social determinant of hypertension and should be included in the risk prediction equations. This effort will enhance clinicians’ ability to make better decisions for hypertension care for workers with varied working conditions, which has the potential to address population health inequities. We demonstrated that including workers’ employment status and psychosocial work environment in risk equations improves their prediction accuracy. To maximize the utility of risk equations that include social determinants of health like work conditions, we must also determine how to effectively capture this information in ways that do not disrupt a clinical workflow. It is also critical to pay attention to racial implications, big or small, that can arise when developing decision tools like risk equations and address such issues appropriately.


What is known about the topic

  • Despite extensive evidence that work is an important determinant of hypertension, none of the existing risk equations use patients’ work-related characteristics to predict disease onset.

  • Several risk equations incorporate a race-correction term. This embedded racist practice has been shown to produce racial inequity in patient care.

What this study adds

  • We estimated and compared the prediction accuracy of risk equations with and without variables reflecting work-related characteristics.

  • We did not treat race as a hypertension risk predictor. Instead, we fit the “raceless” model and tested if the resulting risk equations predict hypertension equally well for Black and White people.

  • We found that the risk equations with work-related variables predict hypertension onset more accurately than those without. Our models predict hypertension onset more accurately for White than Black people.