Abstract
We used machine learning methods to investigate whether body composition indices predict hypertension. Data from a cohort study were used: 4663 records were included (2156 male, 1099 with hypertension, age range 35–70 years). Body composition analysis was done using bioelectrical impedance analysis (BIA); weight, basal metabolic rate, total and regional fat percentage (FATP), and total and regional fat-free mass (FFM) were measured. We used machine learning methods such as Support Vector Classifier, Decision Tree, Stochastic Gradient Descent Classifier, Logistic Regression, Gaussian Naïve Bayes, K-Nearest Neighbor, Multi-Layer Perceptron, Random Forest, Gradient Boosting, Histogram-based Gradient Boosting, Bagging, Extra Tree, Ada Boost, Voting, and Stacking to classify the investigated cases and find the features most relevant to hypertension. FATP, AFFM, BMR, FFM, TRFFM, AFATP, LFATP, and older age were the top features in hypertension prediction. Arm FFM, basal metabolic rate, total FFM, trunk FFM, leg FFM, and male gender were inversely associated with hypertension, whereas total FATP, arm FATP, leg FATP, older age, trunk FATP, and female gender were directly associated with hypertension. AutoMLP, Stacking, and Voting had the best performance for hypertension prediction, achieving accuracy rates of 90%, 84%, and 83%, respectively. Using machine learning methods, we found that BIA-derived body composition indices predict hypertension with acceptable accuracy.
Introduction
Hypertension is one of the most important and preventable causes of cardiovascular disease (CVD), stroke, chronic kidney disease, and dementia, and caused approximately 8.5 million deaths in 2015 in low- and middle-income countries1. Reports show that hypertension prevalence is 25% in Iran2. Hypertension depends on well-known risk factors such as age, gender, family history, smoking, alcohol consumption, central obesity, overweight, and physical inactivity3,4. Obesity has gained significant attention over the past years5.
Body mass index (BMI) is widely used for anthropometric measurement and, despite its inaccuracy, is still commonly used to determine obesity and assess health risks such as hypertension6. Complementary measures such as waist circumference, waist-to-hip ratio (WHR), and body composition analysis improve the prognostic efficiency of BMI7. Evidence shows that body fat distribution is a more vital determinant of cardiovascular morbidity and mortality than increased fat mass8,9,10; further indicating that detailed assessment of body composition is beneficial for health risk estimations.
In the past few years, a growing number of researchers have used machine learning and data mining algorithms to diagnose and treat health conditions such as heart11 and brain12 diseases. Their non-invasive nature and accuracy have enabled health professionals to quickly identify at-risk individuals and apply more efficient preventive and management strategies13.
In this study, we used machine learning approaches to investigate whether BIA-derived body composition indices predict hypertension in a cohort of patients.
Methods
Study design and participants
The Fasa cohort study14 recruited at least 10,000 people and assessed predisposing factors for non-communicable diseases in rural regions of Fasa, Iran. In the present study, we used a subset of their data comprising 4663 records, of which 2156 were male, 1099 had HTN, and the age range was 35–70 years. Hypertension diagnosis was based on the blood pressure threshold defined by the ACC/AHA guidelines15. All participants had given informed consent, and the Shiraz University of Medical Sciences ethics committee approved this study.
Body composition analysis
Body composition analysis was performed using an eight-electrode BIA machine (Tanita Segmental Body Composition Analyzer BC-418 MA, Tanita Corp., Japan). The following variables were measured:
1. Fat mass (FATM): Total Fat Mass (FATM), Left and Right Leg Fat Mass (LLFATM & RLFATM), Left and Right Arm Fat Mass (LAFATM & RAFATM), and Trunk Fat Mass (TRFATM).
2. Fat percentage (FATP): Total Fat Percentage (TFATP), Left and Right Leg Fat Percentage (LLFATP & RLFATP), Left and Right Arm Fat Percentage (LAFATP & RAFATP), and Trunk Fat Percentage (TRFATP). Fat percentage is calculated as (fat mass)/weight × 100.
3. Fat-free mass (FFM): Total Fat-Free Mass (FFM), Left and Right Leg Fat-Free Mass (LLFFM & RLFFM), Left and Right Arm Fat-Free Mass (LAFFM & RAFFM), and Trunk Fat-Free Mass (TRFFM).
4. Basal metabolic rate (BMR).
Dataset
Our dataset included 4663 records, of which 1099 were hypertensive. Among the 2156 males and 2507 females, 430 and 669 cases were hypertensive, respectively. Input features were: age (between 35 and 70), gender ID (1: male, 2: female), BMR, FATM, FATP, FFM, LLFATP, RLFATP, LLFFM, RLFFM, LLFATM, RLFATM, LAFATP, RAFATP, LAFATM, RAFATM, LAFFM, RAFFM, TRFATP, TRFATM, and TRFFM. The target feature was the binary variable hypertension (yes or no).
It is noted that institutional approval was granted for the use of the patient datasets in research studies for diagnostic and therapeutic purposes. Approval was granted on the grounds of existing datasets. Informed consent was obtained from all of the patients in this study. All methods were carried out in accordance with relevant guidelines and regulations. Ethical approval for the use of these data was obtained from the Tehran Omid hospital.
Investigated machine learning and data mining algorithms
We utilized some of the most efficient classification algorithms, such as Support Vector Classifier (SVC)16, Decision Tree (DT)17, Stochastic Gradient Descent (SGD) Classifier18, Logistic Regression (LR)19, Gaussian Naïve Bayes (GNB)20, K-Nearest Neighbor (K-NN)21, Multi-Layer Perceptron (MLP)22, Random Forest (RF)23, Gradient Boosting (GB)24, Histogram-based Gradient Boosting (HGB)25, Bagging26, Extra Tree (ET)27, Ada Boost28, Voting29, and Stacking30.
These algorithms are briefly explained below, with references providing more detailed descriptions. We then introduce the metrics used to evaluate their effectiveness.
To classify the data, SVC tries to find the best hyperplane separating the different classes; the criterion for evaluating a hyperplane is the maximization of its distance to the sample points. SVC has a limitation that the Support Vector Machine (SVM) compensates for: in SVC, the hyperplane classifies the data linearly, whereas SVM can separate the dataset non-linearly31.
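As an illustration only, the following minimal scikit-learn sketch contrasts a linear and a non-linear (RBF-kernel) support vector classifier; the file name, column names, and train/test split are placeholders rather than the study's actual code or settings.

```python
# Minimal sketch: linear separating hyperplane vs. non-linear (RBF) kernel.
# The file and column names are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

df = pd.read_csv("body_composition.csv")                 # hypothetical dataset file
X = df.drop(columns=["hypertension"])                    # hypothetical target column
y = df["hypertension"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

linear_svc = make_pipeline(StandardScaler(), SVC(kernel="linear"))
rbf_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))

for name, model in [("linear", linear_svc), ("rbf", rbf_svm)]:
    model.fit(X_tr, y_tr)
    print(name, model.score(X_te, y_te))                 # mean accuracy on held-out data
```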
DT is a supervised learning algorithm used for classification and regression. This method tries to learn a model that can predict the value of a target feature by learning some decision rules inferred from the features of samples32.
The SGD classifier is a linear classifier optimized with stochastic gradient descent33.
LR is a classification algorithm used in machine learning; it uses a logistic function to model the dependent variable. This variable can only have two values. Therefore, LR is only used in solving problems with binary target features. Moreover, the sigmoid function in LR maps the predicted values to the probabilities34.
GNB is a probabilistic classification algorithm that utilizes the Bayes theorem. It assumes that the variables are independent of each other. This algorithm requires training data to estimate the parameters needed for classification. Since its implementation is simple, it is used to solve many classification problems20.
The K-NN algorithm is a non-parametric, supervised classifier that uses proximity to perform classification. In this algorithm, the assumption is that similar points are located near each other. A class label is assigned to a sample based on a majority vote among its K nearest samples35.
MLP is a supervised learning algorithm that tries to learn a function based on a data set. The learned function is used to predict the class for a new sample. This algorithm has a network structure consisting of several layers of nodes. Each layer is connected to the next layer in the network. Nodes in the first layer represent input data. Other nodes map inputs to outputs by linearly combining them using a set of weights and a bias and applying an activation function36.
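For illustration, a minimal sketch of such a network using scikit-learn's MLPClassifier is shown below; the layer sizes, iteration count, and the reuse of the placeholder split from the earlier sketch are assumptions, not the paper's settings.

```python
# Minimal sketch of a feed-forward network for binary classification.
# Layer sizes and max_iter are illustrative; X_tr, y_tr, X_te come from the
# placeholder split in the earlier SVC sketch.
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

mlp = make_pipeline(
    StandardScaler(),                                    # scaling helps gradient-based training
    MLPClassifier(hidden_layer_sizes=(64, 32),           # two hidden layers of weighted sums + activation
                  activation="relu",
                  max_iter=500,
                  random_state=0),
)
mlp.fit(X_tr, y_tr)
print(mlp.predict_proba(X_te)[:5])                       # predicted class probabilities
```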
RF is an ensemble learning method for classification that consists of many decision trees built from the training data. The output of the algorithm is the class suggested by most trees. This approach helps avoid over-fitting to the training set. Random forest performance is usually better than that of a single decision tree classifier, although the improvement depends on the data37.
Another machine learning algorithm is GB, which makes predictions based on a set of weak prediction models such as decision trees. GB is one of the most popular methods for structured classification and regression modeling and can cover a wide range of datasets. However, it suffers from slow training, mainly when used on large datasets (number of samples ≥ 10,000). To solve this problem, the trees added to the ensemble are trained after discretization (binning) of continuous input variables into a few hundred unique values24. This modification dramatically increases execution speed compared to the ordinary Gradient Boosting classifier; GB ensembles that implement this technique are referred to as HGB ensembles. HGB can also manage missing values. During training, at each split point, the tree learns whether samples with missing values should be assigned to the left or right child based on the potential gain. If there are no missing values for a given feature during training, samples with missing values are mapped to the child that has the most samples25.
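A minimal sketch of histogram-based gradient boosting with scikit-learn, showing the binning parameter and the native handling of missing values, is given below; the hyperparameters and the injected missing value are illustrative only.

```python
# Minimal sketch: histogram-based gradient boosting with native missing-value support.
# Hyperparameters are illustrative; X_tr, y_tr, X_te, y_te are the placeholder split above.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

X_with_nan = X_tr.copy()
X_with_nan.iloc[0, 0] = np.nan                           # HGB routes NaNs to a child at each split

hgb = HistGradientBoostingClassifier(max_bins=255,       # continuous features discretized into bins
                                     learning_rate=0.1,
                                     random_state=0)
hgb.fit(X_with_nan, y_tr)
print(hgb.score(X_te, y_te))
```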
A bagging classifier is an ensemble meta-classifier that consists of a set of base classifiers, each applied to a random subset of the original dataset. The predictions of these classifiers are collected, and a final prediction is derived from them. The base classifiers are trained in parallel on bootstrap resamples of the training set, so some of the original data may be repeated in a resulting training set while other data may be omitted38.
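A minimal sketch of such a bagging ensemble follows; in scikit-learn the base learner defaults to a decision tree, and the number of estimators here is illustrative.

```python
# Minimal sketch: bagging on bootstrap resamples (sampling with replacement, so some
# rows recur in a base training set while others are left out of it).
from sklearn.ensemble import BaggingClassifier

bag = BaggingClassifier(n_estimators=100,                # default base learner: decision tree
                        bootstrap=True,                  # sample with replacement
                        n_jobs=-1,
                        random_state=0)
bag.fit(X_tr, y_tr)                                      # placeholder split from the earlier sketch
print(bag.score(X_te, y_te))
```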
ET classifier is an ensemble learning technique, also known as an extremely randomized tree classifier. This algorithm uses the results of several uncorrelated decision trees collected in a forest to perform the classification process. The performance of this algorithm is very similar to an RF classification. However, building decision trees in the forest is different from RF. In this algorithm, each decision tree is built from the original training sample. At each test node, each tree is presented with a random sample containing a subset of the feature set. Each decision tree must select the best feature for splitting the data based on mathematical criteria such as the Gini index. Random selection of samples leads to multiple uncorrelated decision trees27.
An Adaptive Boosting or Adaboost classifier is a meta-classifier algorithm. This ensemble algorithm starts by fitting a classifier on the original data set. It then tries to classify the same data set again using additional copies of the classifier, except that the weights of the misclassified samples are adjusted so that subsequent classifiers focus more on complex cases. The outputs of these classifiers are combined using weighted summation to create the final classification output39.
The voting classifier is a meta-classifier that trains base models whose outputs are used to derive the final result. Aggregation of the base learners' results is done in two ways: hard voting and soft voting. In the former, voting is based on the output class declared by each base learner, while in the latter, the output class is based on the probabilities predicted by the base classifiers40.
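A minimal sketch contrasting hard and soft voting in scikit-learn follows; the choice of base learners is illustrative, not the study's configuration.

```python
# Minimal sketch: hard voting (majority of predicted labels) vs. soft voting
# (average of predicted probabilities). Base learners are illustrative.
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

base = [("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(random_state=0)),
        ("gnb", GaussianNB())]

for scheme in ("hard", "soft"):                          # soft voting needs predict_proba
    vote = VotingClassifier(estimators=base, voting=scheme)
    vote.fit(X_tr, y_tr)                                 # placeholder split from the earlier sketch
    print(scheme, vote.score(X_te, y_te))
```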
Stacking, or Stacked Generalization, is an ensemble meta-learning algorithm. It learns how to combine the results of two or more base machine learning algorithms in the best possible way. The advantage is that the capabilities of a range of well-performing algorithms can be exploited to achieve performance that none of them can achieve individually41.
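A minimal scikit-learn sketch of stacking is shown below; the base learners and the logistic-regression meta-learner are illustrative assumptions.

```python
# Minimal sketch: stacking feeds out-of-fold base-learner predictions to a meta-learner.
from sklearn.ensemble import StackingClassifier, RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),   # meta-learner combining the base outputs
    cv=5,                                                # out-of-fold predictions avoid leakage
)
stack.fit(X_tr, y_tr)                                    # placeholder split from the earlier sketch
print(stack.score(X_te, y_te))
```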
We will apply these algorithms to our dataset, but before that, some preprocessing must be performed on the training data.
Data preprocessing
To improve the performance of the algorithms, several feature selection algorithms were used. These algorithms select a subset of features for model construction and are commonly used to simplify the constructed models and make them easier to interpret. Using these techniques shortens training time and helps avoid the curse of dimensionality. The feature selection algorithms tested in our research were best first42, genetic algorithm43, greedy forward selection44, greedy backward elimination44, decision tree45, random forest46, and particle swarm optimization (PSO)47. Among them, the genetic algorithm showed the best performance, and the rest of this research was organized according to its results. This algorithm indicated that FATP, AFFM, BMR, FFM, TRFFM, AFATP, LFATP, and older age were the top features in hypertension prediction.
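To illustrate the idea of wrapper-style genetic feature selection (a generic sketch under simple assumptions, not the study's implementation), a binary chromosome can mark the selected features and cross-validated accuracy can serve as the fitness function:

```python
# Generic sketch of genetic feature selection: binary chromosomes mark selected
# features; fitness is cross-validated accuracy of a simple classifier on that subset.
# Population size, generations, and mutation rate are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(LogisticRegression(max_iter=1000), X[:, mask], y, cv=3).mean()

def ga_select(X, y, pop_size=20, n_gen=15, p_mut=0.05):
    n_feat = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n_feat)).astype(bool)
    for _ in range(n_gen):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]   # keep the fittest half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_feat)                          # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_feat) < p_mut                    # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, children])
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[scores.argmax()]                                    # best feature mask

# Example usage with the placeholder data from the earlier sketch:
# best_mask = ga_select(X_tr.to_numpy(), y_tr.to_numpy())
```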
Evaluation metrics
In this research, we used the confusion matrix to test and compare the algorithms’ effectiveness. This matrix is a popular metric to evaluate the performance of binary and multi-class classification problems. Figure 1 shows a confusion matrix48,49,50.
The confusion matrix shows how many outputs are correctly classified and how many are misclassified. In this table, "TN", for true negative, shows how many negative samples are correctly classified. Similarly, "TP" stands for true positive and indicates how many positive samples are correctly classified. The term "FP" stands for false positive and represents the number of samples misclassified as positive. Finally, "FN" stands for false negative and indicates the number of positive samples misclassified as negative. Based on the values of this matrix, one of the most common metrics used for evaluating classification algorithms, accuracy, is calculated based on Eq. (1)51,52.
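Equation (1) is not reproduced in this version of the text; in terms of the confusion-matrix entries defined above, it is the standard accuracy formula:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)$$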
Precision, sensitivity (or recall), specificity, and F1-score are some other performance metrics that are very popular. They are calculated according to the following equations:
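These equations are likewise missing from this version of the text; the standard definitions in terms of the confusion-matrix entries are:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Sensitivity\ (Recall)} = \frac{TP}{TP + FN},$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP}, \qquad \mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$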
Using these metrics, the above-mentioned classification algorithms are compared. The flowchart of the proposed method is shown in Fig. 2.
As there is an obvious class imbalance between normal individuals (negative cases) and diseased individuals (positive cases), model training may be biased toward predicting the normal class, resulting in high specificity and low sensitivity. To address this issue, three oversampling and two undersampling methods were applied to the dataset: SMOTE53, Random Oversampling54, and ADASYN55 for oversampling, and Random Undersampling56 and NearMiss57 for undersampling. The results of applying the classification methods to data oversampled by SMOTE and undersampled by NearMiss are reported because these gave the best performance. Using SMOTE, the number of cases was increased to 7128 with an equal number of positive and negative cases. When NearMiss was used for undersampling, the number of cases was decreased to 2198 with an equal number of samples in each class.
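As an illustration, the two reported resampling strategies can be applied with the imbalanced-learn package as sketched below; the random seed and default parameters are assumptions, and in practice resampling should be fit on the training data only to avoid leakage into the test set.

```python
# Minimal sketch of the two reported resampling strategies using imbalanced-learn.
# Defaults and the random seed are illustrative; apply only to training data.
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import NearMiss

X_over, y_over = SMOTE(random_state=0).fit_resample(X_tr, y_tr)    # synthesize minority samples
X_under, y_under = NearMiss().fit_resample(X_tr, y_tr)             # drop majority samples near the minority
print(y_over.value_counts())                                       # balanced class counts (pandas input assumed)
print(y_under.value_counts())
```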
In addition, the MetaCost58 method was used to increase the penalty of negative cases.
Experimental results
In this section, we report and compare the results of applying the classification algorithms described in the methodology section. These algorithms were implemented in Python version 3.10.0 using its ready-made modules and were run on the Windows 11 operating system. The default settings of the algorithms were used, except for those listed in Table 1.
Tables 2, 3, and 4 list the accuracy, precision, recall, F1-score, and AUC on the train and test data for these algorithms when oversampling, undersampling, and the original data (with an increased penalty for negative cases in the model) were used, respectively. In our research, the genetic algorithm showed the best feature selection performance; therefore, the results reported in Tables 2, 3, and 4 were calculated using this feature selection algorithm.
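For reference, the reported metrics for one fitted model can be computed with scikit-learn as sketched below; the model and data split are the illustrative placeholders from the earlier sketches, and a 0/1 encoding of the target is assumed.

```python
# Minimal sketch of computing the reported metrics for one model on the test split.
# Assumes a binary 0/1 target; 'stack' and the split are the earlier placeholders.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_pred = stack.predict(X_te)
y_prob = stack.predict_proba(X_te)[:, 1]                 # probability of the positive class
print("accuracy ", accuracy_score(y_te, y_pred))
print("precision", precision_score(y_te, y_pred))
print("recall   ", recall_score(y_te, y_pred))
print("F1       ", f1_score(y_te, y_pred))
print("AUC      ", roc_auc_score(y_te, y_prob))
```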
AutoMLP has the best accuracy, usually followed by Stacking and Voting. The performance of the different algorithms on the training set is also reported; this helps to check whether model training is over-fitting or under-fitting and supports better adjustment of model parameters to improve the classification results. As these tables show, performance with oversampling is better than with undersampling or the original sampling.
Discussion
In the present study, conducted on a cohort population, we used machine learning methods and found that BIA-derived body composition indices predict hypertension with acceptable accuracy. FATP, AFFM, BMR, FFM, TRFFM, AFATP, LFATP, and older age were the top features in hypertension prediction. FATP, AFATP, LFATP, TRFATP, higher age, and female gender were directly associated with HTN, whereas FFM, AFFM, LFFM, TRFFM, BMR, and male gender were inversely associated with HTN. AutoMLP, Stacking, and Voting had the best performance for hypertension prediction, with accuracy rates of 90%, 84%, and 83%, respectively.
Total FATP and FFM
Various other studies confirm the direct link of body fat mass (and percentage) with blood pressure59,60,61. Park et al.62, in a prospective cohort study, showed that a high body fat percentage (more than 19.9% in men and 32.5% in women) was associated with an increased risk of incident hypertension regardless of BMI, waist circumference, and WHR. Although body fat mass and percentage are superior to BMI in morbidity risk assessment, a study63 of an Iranian population showed that BMI predicts CVD better than body fat percentage. Another study64 of American postmenopausal women with normal BMI found no relation between whole-body fat mass or percentage and CVD risk, although regional body fat showed significant associations. These discrepancies may be due to different body composition analysis methods and ethnicity.
Contrary to our results, some investigations in adult and pediatric populations established that FFM is positively related to systolic, diastolic, or mean blood pressure65,66,67,68,69,70,71. Korhonen et al.66 attribute this finding to muscle mass properties; during daytime and contraction, skeletal muscles release myokines that may increase blood pressure. This explanation confirms the findings of Ye et al.60 in a Chinese population: total skeletal mass (TSM) indices, primarily arm lean body mass, are positively associated with blood pressure, pre-HTN, and HTN.
Trunk FATP and FFM
Previous studies have established the positive association of TRFATM with hypertension and CVD72, and our data further support that BIA-measured abdominal adiposity is positively associated with hypertension73. Chen et al.64 assessed CVD incidence in postmenopausal women with normal BMI during a median of 17.9 years. The authors used Dual X-ray Absorptiometry (DXA) and found that higher TRFATP and lower LFATP were associated with higher CVD risk.
In another study71, using DXA body measurements and machine learning methods, researchers showed that TRFAT correlates with both mean systolic and diastolic pressure, consistent with our findings. The authors did not provide trunk lean body mass results but state that total lean body mass positively correlates with mean systolic blood pressure. In general, evidence is lacking about the association between TRFFM and hypertension risk.
Appendicular FATP and FFM
There are conflicting data about arm and leg fat association with HTN. In a study of 3130 Chinese participants by Ye et al.60, fat mass percentage and lean body mass, especially in the arm, were positively associated with increased blood pressure. Nevertheless, leg lean mass showed no significant association with systolic and diastolic pressure. In another study74 on 399 participants, authors showed that: (1) arm fat was a positive predictor for blood pressure, (2) after full adjustment, loss of lean leg mass directly correlated with reductions in systolic blood pressure, (3) loss of leg fat and lean mass had direct beneficial changes in markers of CVD risk. More conflicting results exist: positive association of mid-upper arm circumference with increased hypertension risk75, and significant inverse association between the leg and arm total fat percentage with hypertension76.
The exact mechanism by which LFATP and LFFM modulate blood pressure is still unclear. Regional fat deposition in the legs, mainly subcutaneous, reduces fatty acid turnover and downregulates triglyceride production in the blood. Therefore, it acts as a “metabolic sink” and preserves other tissues from lipotoxicity, protects endothelium against damage, and maintains elasticity and compliance of arterioles74,77. Another possible mechanism is that as subcutaneous fat, it may decrease the activation of renin–angiotensin–aldosterone and the sympathetic system77. Also, for FFM, some studies declare that muscle mass has a protective role in blood pressure78,79. However, Ye et al.60 suggest that previous studies on appendicular lean mass or skeletal muscle did not control fat mass and fat distribution in their analysis, leading to inaccurate results.
Gender and age
Sex differences did not predict hypertension in our study population; however, the association was negative in males and positive in females. Previous studies showed that in men, lower body fat (thigh or gynoid) had a more protective effect on cardio-metabolic risks, such as elevated blood pressure. The effects of sex hormones on subcutaneous fat mass in these regions might explain this sex difference80.
Based on our results, age had a positive association with hypertension. Likewise, a study of a Chinese population indicated that age was independently associated with hypertension in both men and women81. However, results are not always positive; in a study of Brazilian children and adolescents, regardless of sex, the authors observed no significant association between age and systolic blood pressure82.
BMR
Our study demonstrated a strong inverse relationship between BMR and hypertension, which has not been reported elsewhere. A study of Bangladeshi adults showed a positive relation between BMR and blood pressure, suggesting that an upregulated BMR may elevate blood pressure by increasing thyroid hormone levels, sympathetic tone, and oxidative damage83. Further investigation is required.
Strengths and limitations
The application of machine learning to a cohort of patients is the main strength of our study. Machine learning methods are more precise than traditional ones, so we believe that our findings can help resolve the conflicting results regarding our research question. Nevertheless, this study has some limitations, including the lack of data about the use of anti-hypertensive drugs and about other anthropometric indices such as waist circumference. Also, BIA measurement of TRFAT does not differentiate between visceral and subcutaneous abdominal adipose tissue. However, we aimed to use an accessible method for body composition analysis, and BIA is a simple, safe, and readily available method, unlike DEXA, CT scan, and MRI. We suggest that future prospective studies use machine learning methods and body composition analyses to predict hypertension in different ethnic groups. In addition, this study can be extended to more clinical samples; consequently, the classification methods, especially AutoMLP, are expected to perform even better.
Conclusion
Given that body fat and its distribution are risk factors for hypertension, we used machine learning methods to study these relations. With acceptable accuracy, we confirmed that BIA-derived body composition predicts hypertension. Total and regional FATP, higher age, and female gender had a positive relation with hypertension, whereas the opposite was true for total and regional FFM, BMR, and male gender.
Data availability
Data are available from the authors upon reasonable request from the corresponding author, Hamed Bazrafshan Drissi.
References
Zhou, B. et al. Global epidemiology, health burden and effective interventions for elevated blood pressure and hypertension. Nat. Rev. Cardiol. 18(11), 785–802 (2021).
Oori, M. J. et al. Prevalence of HTN in Iran: Meta-analysis of published studies in 2004–2018. Curr. Hypertens. Rev. 15(2), 113–122 (2019).
Qiu, L. et al. Prevalence and risk factors of hypertension, diabetes, and dyslipidemia among adults in Northwest China. Int. J. Hypertens. 2021, 1–10 (2021).
Carson, A. P. et al. Ethnic differences in hypertension incidence among middle-aged and older adults: The multi-ethnic study of atherosclerosis. Hypertension 57(6), 1101–1107 (2011).
Goto, K. et al. An association between subcutaneous fat mass accumulation and hypertension. J. Gen. Fam. Med. 22(4), 209–217 (2021).
Nuttall, F. Q. Body mass index: Obesity, BMI, and health: A critical review. Nutr. Today 50(3), 117 (2015).
González-Muniesa, P. et al. Obesity. Nat. Rev. Dis. Primers 3, 17034 (2017).
Blüher, M. & Laufs, U. New concepts for body shape-related cardiovascular risk: Role of fat distribution and adipose tissue function. Eur. Heart J. 40(34), 2856–2858 (2019).
Yano, Y. et al. Regional fat distribution and blood pressure level and variability: The Dallas Heart Study. Hypertension 68(3), 576–583 (2016).
Gowri, S. M. et al. Distinct opposing associations of upper and lower body fat depots with metabolic and cardiovascular disease risk markers. Int. J. Obes. 45(11), 2490–2498 (2021).
Joloudari, J.H. et al. Application of artificial intelligence techniques for automated detection of myocardial infarction: A review. Physiological Measurement (2022).
Shoeibi, A. et al. Detection of epileptic seizures on EEG signals using ANFIS classifier, autoencoders and fuzzy entropies. Biomed. Signal Process. Control 73, 103417 (2022).
Chowdhury, M. Z. I. et al. Prediction of hypertension using traditional regression and machine learning models: A systematic review and meta-analysis. PLoS One 17(4), e0266334 (2022).
Farjam, M. et al. A cohort study protocol to analyze the predisposing factors to common chronic non-communicable diseases in rural areas: Fasa Cohort Study. BMC Public Health 16(1), 1–8 (2016).
Whelton, P. K. et al. Guideline for the prevention, detection, evaluation, and management of high blood pressure in adults. J. Am. Coll. Cardiol. https://doi.org/10.1016/j.jacc.2017.07.745 (2017).
Lau, K. & Wu, Q. Online training of support vector classifier. Pattern Recogn. 36(8), 1913–1920 (2003).
Chiu, P.K.-F. et al. Enhancement of prostate cancer diagnosis by machine learning techniques: An algorithm development and validation study. Prostate Cancer Prostatic Dis. 25(4), 672–676 (2022).
Song, S., Chaudhuri, K. & Sarwate, A. D. Stochastic gradient descent with differentially private updates. In 2013 IEEE Global Conference on Signal and Information Processing (IEEE, 2013).
Hosmer, D. W. Jr., Lemeshow, S. & Sturdivant, R. X. Applied Logistic Regression Vol. 398 (Wiley, 2013).
Ontivero-Ortega, M. et al. Fast Gaussian Naïve Bayes for searchlight classification analysis. Neuroimage 163, 471–479 (2017).
Wu, Y., Ianakiev, K. & Govindaraju, V. Improved k-nearest neighbor classification. Pattern Recogn. 35(10), 2311–2318 (2002).
Camacho Olmedo, M. T. et al. Geomatic Approaches for Modeling Land Change Scenarios. An Introduction (Springer, 2018).
Biau, G. & Scornet, E. A random forest guided tour. Test 25, 197–227 (2016).
Bentéjac, C., Csörgő, A. & Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 54, 1937–1967 (2021).
Friedman, J. H. Stochastic gradient boosting. Comput. Stat. Data Anal. 38(4), 367–378 (2002).
Zareapoor, M. & Shamsolmoali, P. Application of credit card fraud detection: Based on bagging ensemble classifier. Procedia Comput. Sci. 2015(48), 679–685 (2015).
Abhishek, L. Optical character recognition using ensemble of SVM, MLP and extra trees classifier. In 2020 International Conference for Emerging Technology (INCET), IEEE, 2020.
Schapire, R.E. Explaining adaboost. Empirical Inference: Festschrift in Honor of Vladimir N. Vapnik, p. 37–52 (2013).
Parhami, B. Voting algorithms. IEEE Trans. Reliab. 43(4), 617–629 (1994).
Sikora, R. A modified stacking ensemble machine learning algorithm using genetic algorithms. In Handbook of Research on Organizational Transformations Through Big Data Analytics (eds Tavana, M. & Puranam, K.) 43–53 (IGi Global, 2015).
Alizadehsani, R. et al. Coronary artery disease detection using computational intelligence methods. Knowl.-Based Syst. 109, 187–197 (2016).
Alizadehsani, R. et al. Machine learning-based coronary artery disease diagnosis: A comprehensive review. Comput. Biol. Med. 111, 103346 (2019).
Kabir, F. et al. Bangla text document categorization using stochastic gradient descent (sgd) classifier. In 2015 International Conference on Cognitive Computing and Information Processing (CCIP), IEEE, 2015.
Ayoobi, N. et al. Time series forecasting of new cases and new deaths rate for COVID-19 using deep learning methods. Results Phys. 27, 104495 (2021).
Alizadehsani, R. et al. Coronary artery disease detection using artificial intelligence techniques: A survey of trends, geographical differences and diagnostic features 1991–2020. Comput. Biol. Med. 128, 104095 (2021).
Shoeibi, A. et al. (2021) Applications of epileptic seizures detection in neuroimaging modalities using deep learning techniques: methods, challenges, and future works. Preprint at https://arxiv.org/arXiv:2105.14278
Khozeimeh, F. et al. Combining a convolutional neural network with autoencoders to predict the survival chance of COVID-19 patients. Sci. Rep. 11(1), 15343 (2021).
Nahavandi, D. et al. Application of artificial intelligence in wearable devices: Opportunities and challenges. Comput. Methods Programs Biomed. 213, 106541 (2022).
Asgharnezhad, H. et al. Objective evaluation of deep uncertainty predictions for covid-19 detection. Sci. Rep. 12(1), 1–11 (2022).
Moridian, P. et al. (2022) Automatic autism spectrum disorder detection using artificial intelligence methods with MRI neuroimaging: A review. Preprint at https://arxiv.org/arXiv:2206.11233
Khozeimeh, F. et al. RF-CNN-F: Random forest with convolutional neural network features for coronary artery disease diagnosis based on cardiac magnetic resonance. Sci. Rep. 12(1), 11178 (2022).
Xu, L., Yan, P. & Chang, T. Best first strategy for feature selection. In 9th International Conference on Pattern Recognition (eds Xu, L. et al.) (IEEE Computer Society, 1988).
Leardi, R., Boggia, R. & Terrile, M. Genetic algorithms as a strategy for feature selection. J. Chemom. 6(5), 267–281 (1992).
Caruana, R. & Freitag, D. Greedy attribute selection. In Machine Learning Proceedings 1994 28–36 (Elsevier, 1994).
Zhou, H. et al. A feature selection algorithm of decision tree based on feature weight. Expert Syst. Appl. 164, 113842 (2021).
Kursa, M. B. & Rudnicki, W. R. Feature selection with the Boruta package. J. Stat. Softw. 36, 1–13 (2010).
Xue, B., Zhang, M. & Browne, W. N. Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms. Appl. Soft Comput. 18, 261–276 (2014).
Nahavandi, S. et al. (2022) A Comprehensive Review on Autonomous Navigation. Preprint at https://arxiv.org/arXiv:2212.12808
Alizadehsani, R. et al. Swarm intelligence in internet of medical things: A review. Sensors 23(3), 1466 (2023).
Karami, M. et al. (2023) Revolutionizing Genomics with Reinforcement Learning Techniques. Preprint at https://arxiv.org/arXiv:2302.13268
Kakhi, K. et al. The internet of medical things and artificial intelligence: Trends, challenges, and opportunities. Biocybern. Biomed. Eng. https://doi.org/10.1016/j.bbe.2022.05.008 (2022).
Nasab, R.Z. et al. (2022) Deep Learning in Spatially Resolved Transcriptomics: A Comprehensive Technical View. Preprint at https://arxiv.org/arXiv:2210.04453
Torgo, L. et al. Smote for regression. In Progress in Artificial Intelligence: 16th Portuguese Conference on Artificial Intelligence, EPIA 2013, Angra do Heroísmo, Azores, Portugal, September 9–12, 2013. Proceedings 16, Springer, 2013.
Mohammed, R., Rawashdeh, J. & Abdullah, M. Machine learning with oversampling and undersampling techniques: Overview study and experimental results. In 2020 11th International Conference on Information and Communication Systems (ICICS) (IEEE, 2020).
He, H. et al. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), IEEE, 2008.
Prusa, J. et al. Using random undersampling to alleviate class imbalance on tweet sentiment data. In 2015 IEEE International Conference on Information Reuse and Integration, IEEE, 2015.
Bao, L. et al. Boosted near-miss under-sampling on SVM ensembles for concept detection in large-scale imbalanced datasets. Neurocomputing 172, 198–206 (2016).
Domingos, P. Metacost: A general method for making classifiers cost-sensitive. In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining (1999).
Li, R. et al. The association of body fat percentage with hypertension in a Chinese rural population: The Henan rural cohort study. Front. Public Health 8, 70 (2020).
Ye, S. et al. Associations of body composition with blood pressure and hypertension. Obesity 26(10), 1644–1650 (2018).
Chen, M. et al. Association between body fat and elevated blood pressure among children and adolescents aged 7–17 years: Using dual-energy X-ray Absorptiometry (DEXA) and bioelectrical impedance analysis (BIA) from a cross-sectional study in China. Int. J. Environ. Res. Public Health 18(17), 9254 (2021).
Park, S. K. et al. Body fat percentage, obesity, and their relation to the incidental risk of hypertension. J. Clin. Hypertens. 21(10), 1496–1504 (2019).
Sheibani, H. et al. A comparison of body mass index and percent body fat as predictors of cardiovascular risk factors. Diabetes Metab. Syndr. 13(1), 570–575 (2019).
Chen, G.-C. et al. Association between regional body fat and cardiovascular disease risk among postmenopausal women with normal body mass index. Eur. Heart J. 40(34), 2849–2855 (2019).
He, H. et al. Effect of fat mass index, fat free mass index and body mass index on childhood blood pressure: A cross-sectional study in south China. Transl. Pediatr. 10(3), 541 (2021).
Korhonen, P. E. et al. Both lean and fat body mass associate with blood pressure. Eur. J. Intern. Med. 91, 40–44 (2021).
Rao, K. M. et al. Correlation of Fat Mass Index and Fat-Free Mass Index with percentage body fat and their association with hypertension among urban South Indian adult men and women. Ann. Hum. Biol. 39(1), 54–58 (2012).
Takase, M. et al. Association between the combined fat mass and fat-free mass index and hypertension: The Tohoku Medical Megabank Community-based Cohort Study. Clin. Exp. Hypertens. 43(7), 610–621 (2021).
Vaziri, Y. et al. Lean body mass as a predictive value of hypertension in young adults, in Ankara, Turkey. Iran. J. Public Health 44(12), 1643 (2015).
Xu, R. et al. Percentage of free fat mass is associated with elevated blood pressure in healthy Chinese children. Hypertens. Res. 42(1), 95–104 (2019).
Nath, T., Ahima, R. S. & Santhanam, P. DXA measured body composition predicts blood pressure using machine learning methods. J. Clin. Hypertens. 22(6), 1098 (2020).
Goswami, B. et al. Role of body visceral fat in hypertension and dyslipidemia among the diabetic and nondiabetic ethnic population of Tripura—A comparative study. J. Fam. Med. Prim. Care 9(6), 2885 (2020).
Takeoka, A. et al. Intra-abdominal fat accumulation is a hypertension risk factor in young adulthood: A cross-sectional study. Medicine 95(45), e5361 (2016).
Clifton, P. M. Relationship between changes in fat and lean depots following weight loss and changes in cardiovascular disease risk markers. J. Am. Heart Assoc. 7(8), e008675 (2018).
Hou, Y. et al. Association between mid-upper arm circumference and cardiometabolic risk in Chinese population: A cross-sectional study. BMJ Open 9(9), e028904 (2019).
Visaria, A. et al. Leg and arm adiposity is inversely associated with diastolic hypertension in young and middle-aged United States adults. Clin. Hypertens. 28, 1–12 (2022).
Porter, S. A. et al. Abdominal subcutaneous adipose tissue: A protective fat depot?. Diabetes Care 32(6), 1068–1075 (2009).
AlKaabi, L. A. et al. Predicting hypertension using machine learning: Findings from Qatar Biobank Study. PLoS One 15(10), e0240370 (2020).
Butcher, J. T. et al. Increased muscle mass protects against hypertension and renal injury in obesity. J. Am. Heart Assoc. 7(16), e009358 (2018).
Yang, Y. et al. Sex differences in the associations between adiposity distribution and cardiometabolic risk factors in overweight or obese individuals: A cross-sectional study. BMC Public Health 21(1), 1232 (2021).
Liu, Y. et al. Gender stratified analyses of the association of skinfold thickness with hypertension: A cross-sectional study in general Northeastern Chinese residents. Int. J. Environ. Res. Public Health 15(12), 2748 (2018).
Zaniqueli, D. et al. Muscle mass is the main somatic growth indicator associated with increasing blood pressure with age in children and adolescents. J. Clin. Hypertens. 22(10), 1908–1914 (2020).
Ali, N. et al. Hypertension prevalence and influence of basal metabolic rate on blood pressure among adult students in Bangladesh. BMC Public Health 18(1), 1–9 (2018).
Author information
Contributions
M.A.N., S.J., A.A., M.S., A.D., M.M., M.R., G.G., R.A., M.B., and H.B. helped with writing the text. H.B.D. and S.M.S.I. supervised the writing of the paper with their invaluable comments. This manuscript resulted from the collaboration of all authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Nematollahi, M.A., Jahangiri, S., Asadollahi, A. et al. Body composition predicts hypertension using machine learning methods: a cohort study. Sci Rep 13, 6885 (2023). https://doi.org/10.1038/s41598-023-34127-6