Introduction

High altitude increases the risk of myocardial ischemia1, primarily caused by occlusive coronary artery disease, coronary microvascular dysfunction, or both. Exposure to high altitudes leads to reduced arterial oxygen saturation, worsening pre-existing cardiovascular disease1. In patients with coronary artery disease living at high altitudes, myocardial ischemia is driven by increased myocardial oxygen demand due to elevated heart rate, enhanced contractility, and increased afterload. This imbalance can worsen ischemia due to respiratory alkalosis or coronary artery vasoconstriction secondary to hypoxia or spasm2,3.

Currently, there is a scarcity of epidemiological data pertaining to populations exposed to high altitudes and the incidence of myocardial ischemia. Soldiers who are frequently obliged to undergo training and perform operations in high altitude areas for mission purposes, are vulnerable to both acute and chronic cardiovascular events. The ability to accurately predict the occurrence of myocardial ischemia in individuals entering high altitude environments would facilitate early interventions or appropriate personnel selection.

In this study, we aim to fill this gap by establishing a machine learning-based predictive model for the occurrence of myocardial ischemia within six months of entering high altitudes and identifying the key clinical features that significantly contribute to the model's predictive capabilities. By providing a reliable predictive model and identifying crucial clinical features, this study will aid in ensuring the safety of military personnel and others who are required to enter high altitude environments.

Methods

Study design and population

In this study, we employed a prospective cohort study design. The sample population was drawn from soldiers who received health examinations at the 920th Hospital of the Joint Logistic Support Force between January 2022 and June 2022, and who were scheduled to undergo high-altitude training (at an altitude of 3000–3500 m) within six months. Inclusion criteria: 1. Male and female military personnel aged ≥ 18 & ≤ 60; 2. Underwent health examination at the 920th Hospital between January 2022 and June 2022; 3. Scheduled to undergo high-altitude training within 6 months of health examination; 4. Completed first self-reported questionnaire; 5. Completed second health examination and questionnaire at high-altitude training site. Exclusion criteria: 1. History of structural heart disease, hypertension, coronary artery disease, chronic obstructive pulmonary disease, pneumonia, asthma; 2. Abnormal ECG findings. 3. Did not ultimately participate in high-altitude training abnormal electrocardiogram; 4. Developed acute mountain sickness, severe training injuries, or COVID-19 during study period.

In total, 4000 individuals participated in the health examination, of whom 3800 were men and 200 were women aged 18 to 54 years old. The examination included chest X-rays, electrocardiograms, ultrasounds, and hematological tests. After careful screening, we excluded a total of 1093 individuals from the study. Specifically, we excluded 189 cases (both male and female) with a history of pneumonia, asthma, hypertension, abnormal electrocardiogram findings (including ST-T changes, abnormal Q waves, and arrhythmia), and myocardial hypertrophy, as well as 904 individuals who did not ultimately participate in high-altitude training. The remaining 2907 soldiers completed the first self-reported electronic questionnaire, which included smoking status, duration and amount of smoking, recent physical ability test (3 km test, Sit-ups and Serpentine Run) scores (Table 1), Chesttightness or Chestpain, altitude of residence, education level, marital status, and content from the Scale for Outcomes in Parkinson′s Disease for Autonomic Symptom (SCOPA-AUT) score4. A medical team performed a health examination on the population at the high-altitude training site from December 2022 to January 2023, which included chest X-rays, electrocardiograms, heart rate (HR), systolic blood pressure (SBP) and diastolic blood pressure (DBP), oxygen saturation (OS), and a second self-reported electronic questionnaire. We also obtained electronic medical records from the previous six months and excluded 52 individuals with acute mountain sickness, severe training injuries, or COVID-19 infections, resulting in a final sample of 2855 individuals, including 2810 men and 45 women (Fig. 1; Additional file 1).

Table 1 Characteristics of participants in the training and test datasets.
Figure 1
figure 1

Workflow for data management and myocardial ischemia prediction model development. Data came from 4000 adults who completed physical exams at the 920th Hospital of Joint Logistics Support Force between January and June 2022 and prepared to enter the plateau within 6 months. After exclusion, there were 2855 people remaining. Participants' data (n = 2855) were randomly assigned to a prediction model training dataset (n = 2141) and a test dataset (n = 714) in a 3:1 ratio following preprocessing. fivefold cross-validation was used for training and selecting the prediction model, and five classification algorithms were evaluated. Feature selection was conducted using the RFE (Recursive Feature Elimination) algorithm. The final prediction model was validated using a test dataset. LR logistic regression, RF random forest, XGBoost eXtreme gradient boosting, KNN K-nearest neighbor, SVM support vector machines.

Outcome variable

In the present study, the diagnosis of myocardial ischemia was determined through evaluation of electrocardiogram (ECG) results, specifically based on the following criteria: (1) the presence of horizontal or downsloping ST depression of greater than 0.5 mm at the J-point in at least two consecutive leads, and (2) the presence of T wave inversion with a depth of at least 1 mm in at least two continuous leads with a dominant R wave (R/S ratio > 1)5.

Considering the challenges of large-scale population screening at high altitudes, we are restricted to the use of ECG alone for detecting myocardial ischemia. The definition of myocardial ischemia in this research is based on ECG examinations conducted during resting conditions throughout the high-altitude training period. Participants were required to abstain from any form of physical training, including various sports activities, for a minimum of 48 h prior to the ECG examination.

Candidate predictors

In this prospective cohort study, we incorporated clinical results from two health examinations conducted prior to and following entry into high-altitude areas as initial variables. Variables such as the number of health examinations and home address were excluded, as were highly correlated variables such as education level and marital status. Univariate analysis was performed and only hematological test results with a P-value < 0.001 were retained in the final variables. Subsequently, feature selection and engineering were applied to select the most pertinent features and create new variables that could prove useful for our predictive model. To achieve our goal of predicting the risk of myocardial ischemia upon entering high-altitude areas, we included 27 variables in our machine learning analysis, such as age, gender, body mass index (BMI), SBP, DBP, HR, OS, Highland acclimatization training, Altitude of original station, 3 km test in Non-High Altitude (Score), Sit-ups in Non-High Altitude (Score), Serpentine Run in Non-High Altitude (Score), and SCOPA-AUT SCORE in Non-High Altitude (Table 1).

Sample size estimation

The use of machine learning models in predicting binary clinical outcomes has gained significant attention in recent years, but determining the appropriate sample size for these models remains a challenge. While no established method for determining sample size currently exists, a conventional approach suggests a minimum of 10 outcome events per variable for building a binary clinical outcome prediction model6. In this study, we utilized the R pmsampsize package to calculate a sample size of 1100 for a traditional logistic regression model and our study utilized a sample size greater than this calculation.

Datasets

We preprocessed the data by standardizing it and dividing it randomly into a training set (75%) and a test set (25%). The outcome variable was dichotomized, and continuous and categorical variables were analyzed using appropriate statistical tests. Significant variables (p < 0.001) were included in subsequent analysis focused on hematological findings. Prediction models were constructed using algorithms and features and evaluated in the test set. The analysis was conducted using R software.

Identification and validation of the prediction model

Our study aims to develop a machine learning model to categorize individuals as having "myocardial ischemia" or not. Model efficacy will be evaluated using the area under the receiver operating characteristic curve (AUC)7. The dataset will be randomly divided into training and test sets using a stratified sampling approach. We will use LR8, RF9, XGBoost10, KNN11, and SVM12 algorithms to fit models to the training set and validate them on the test set. The best algorithm will be selected based on AUC scores and calibration curve performance.

We used the Recursive Feature Elimination (RFE) algorithm13 to identify significant variables for a more optimal and clinically feasible model. The resulting model, based on the most influential variables, was compared to the full dataset model and found to be well-suited for practical use.

We used the tidymodels 1.0.0 framework and R programming language (version 4.2.0), along with tidyverse, tidymodels, and caret packages for data analysis, implementing fivefold cross validation (Additional file 1).

Ethics approval and consent to participate

In accordance with the ethical guidelines of the Helsinki Declaration, the experimental protocol was developed and approved by the Human Ethics Committee of the 920th Hospital of the Joint Logistics Support Force. (Lot no. 2022-135-01). Written informed consent was obtained from individual or guardian participants.

Results

In our study, univariate analysis was used to identify clinical and hematological factors associated with myocardial ischemia. Results showed that individuals with myocardial ischemia were older, had a higher BMI and SBP, and tended to be female (OR 3.57, 95% CI 1.48–7.72, p = 0.005) and experience chest tightness or pain (OR 2.33, 95% CI 1.35–3.83, p = 0.002). Smoking duration for more than 10 years (OR 2.33, 95% CI 1.35–3.83, p = 0.002) was also found to be a risk factor for myocardial ischemia. Furthermore, individuals living at 1500–2500 m (OR 2.44, 95% CI 1.48–3.87, p = 0.002) experienced a greater risk of myocardial ischemia than those at 1500 m. Interestingly, scores of physical ability test and SCOPA-AUT did not differ significantly between the two groups. Among hematological markers, we identified 12 indicators that showed significant differences (p < 0.001) between the two groups, including Ca, Crea, Eos.Per, G, GLU, MCHC, P, PLT, AST, TP, Urea, and WBC. These factors were used to develop our algorithm model, though other 11 markers such as ALB, AST, ALP, and Eo also showed differences between the two groups (P < 0.05) (Additional file 2: Table S1).

All analyzed models exhibited high performance in predicting the outcome, with RF and XGBoost algorithms models outperforming the other three models. Both the mean AUC for models of RF in the training dataset and AUC for the model of RF in the test dataset were 0.86, demonstrating superior discrimination ability of RF. The accuracy and precision scores for all models were also provided for comprehensive evaluation (Additional file 2: Table S2). Overall, the RF model demonstrated the best predictive performance in both the training and test datasets (Fig. 2).

Figure 2
figure 2

Results from the model training and algorithm selection process. (A) The bar chart shows the average AUC score for each candidate algorithm in the training set. (B) The bar chart shows the AUC score of the best model for each candidate algorithm in the test set. (C) The best ROC curve for each candidate algorithm in the training set. (D) ROC plot of the best model for each candidate algorithm in the test set. Details of these analysis methods are provided in the appendix (Additional file 2: Table S2). AUC area under the receiver operating characteristic curve, ROC Receiver operating characteristic (ROC) curve.

In developing a machine learning model, many variables are considered, but not all variables are equally important. To determine the important variables and evaluate risk factors that influence the outcome, variable importance plots were utilized to rank the variables according to their importance, and we found that 27 important clinical features were able to predict myocardial ischemia in the RF and LR algorithm models, while the XGboost algorithm model identified 25 important clinical features. After analyzing the top 50% of clinical features from three algorithm models, we found that 9 variables were consistently identified as the most important. These variables, including Eos.Per, Ca, GLU, G, PLT, MCHC, AST, Crea, and TP were deemed critical for the predictive model (Fig. 3).

Figure 3
figure 3

Variable importance scores of RF, XGBoost and LR. (A)–(C) Variable importance scores of RF, XGBoost and LR, respectively. Details of these analysis methods are provided in the appendix (Additional file 3: Table S3). (D) The Venn diagram of the top 50% ranked variables in the RF, XGBoost and LR algorithm models, with 9 intersecting variables including Eos.Per, G, Ca, PLT, GLU, AST, TP, MCHC, and Crea.

To enhance the clinical feasibility of the RF algorithm in predicting myocardial ischemia, we investigated whether a smaller subset of 27 features could yield a more precise model. By utilizing the RFE method, we evaluated the performance of the RF algorithm with varying numbers of features selected from the full set of 27. Intriguingly, our findings indicated that the RF algorithm demonstrated a turning point in AUC with as few as 5 features (Fig. 4). Notably, the 5 features identified from the training dataset were Eos.Per, G, Ca, GLU, and AST. These results suggest that a simplified model containing only a small number of crucial features may be useful in a clinical setting.

Figure 4
figure 4

AUC versus the number of variables used in RFE and performance comparison of 27F model and 5F model under RF algorithm. (A) The performance of RFE with respect to different variables using the area under the ROC curves. The graph illustrates that the model achieved comparable performance with the highest AUC by utilizing 5 selected features. (B) The ROC curves of the 27F and 5F models based on the training dataset. On (C), we have shown the ROC curves for the 27F and 5F models for the testing dataset. The calibration curves for the 5F and 27F models for the training dataset (D) and test dataset (E) are presented.

Building on our findings, we developed a second prediction model using the RF algorithm in the training set, which refer to as the 5F model. We then compared its performance with the previously constructed 27F model. Both models performed well in the training dataset (mean AUC scores of 0.86 and 0.85, respectively) and showed similar performance in the test set (AUC score of 0.86). The calibration curves of the 5F and 27F models on both datasets demonstrated good predictive ability for myocardial ischemia (Fig. 4). In the training dataset, the calibration slopes of the 5F model (1.31, 95% CI 0.38–2.24) and 27F model (0.90, 95% CI 0.68–1.12) were compared. However, in the test set, the calibration slopes of the two models were both favorable (1.17, 95 CI 0.81–1.54 vs 0.84, 95% CI 0.62–1.05). Notably, the slopes of all curves were relatively close to one, and the intercepts in both datasets were close to zero, suggesting that the established models can feasibly predict myocardial ischemia at high altitude (Additional file 2: Table S5).

Finally, the violin diagram is used to show the comparison of important variable parameters in the 5F model in the overall datasets (Additional file 3: Fig. S1). Prior to entering high altitude, levels of Eos.Per, G, Ca, and GLU were significantly higher in the myocardial ischemia group (p < 0.001), while AST levels were lower in the myocardial ischemia group (p < 0.001). These differences between the two groups were also evident in the testing dataset, with the 5 variables showing distinct performance in predicting myocardial ischemia.

Discussion

We have successfully developed and validated a predictive model for myocardial ischemia, utilizing clinical and hematological data from 2855 soldiers who underwent training in high-altitude regions (at an altitude of 3000–3500 m), in combination with advanced machine learning algorithms. Our model demonstrated exceptional accuracy (AUC score of 0.86) in both the training and testing sets. While the main risk factors for cardiovascular disease include advanced age, hypertension, type 2 diabetes, hypercholesterolemia, obesity, and smoking14, the specific impact and epidemiological situation of these risk factors on cardiovascular disease in high-altitude regions, particularly those above 3000 m, remains largely unknown. Ischemic cardiomyopathy is the primary cause of death in cardiovascular disease, with myocardial ischemia and myocardial infarction being its primary manifestations15. A variety of factors, such as emotions, drugs, exercise, and cold exposure, can induce myocardial ischemia events and significantly increase the risk of cardiovascular death or non-fatal myocardial infarction16,17,18,19. Therefore, the early identification and intervention of the myocardial ischemia process in ischemic cardiomyopathy is of paramount importance20. Our high-altitude myocardial ischemia prediction model, based on the machine learning RF algorithm, demonstrates great predictive value for individuals entering high-altitude regions, particularly those above 3000 m. Validation of this model in women, the elderly, and individuals with a history of coronary heart disease could potentially aid even more people.

It is estimated that over 100 million individuals worldwide annually migrate from low-altitude areas to high-altitude regions, with the majority being middle-aged and young adults21. The participants in our study were primarily healthy individuals aged 18–54, representing the physically sound middle-aged and young adults who intend to venture into high-altitude regions from the plains. After arriving in these regions, they take on more physically demanding tasks related to basic infrastructure construction, military missions, or high-intensity sports, such as skiing, ice skating, and mountaineering22. Our machine learning-based high-altitude myocardial ischemia model can provide more precise predictions for this population. Machine learning techniques have gained widespread use in cardiovascular disease research23,24,25 and are believed to have significant advantages in clinical diagnostic model construction. In our study, we constructed a 5F model (5 main variables: Eos.Per, G, Ca, GLU, AST), which showed no significant AUC difference compared to models with more variables. Fewer clinical features may be more convenient for clinical services.

After healthy individuals ascend to high altitude areas, the cardiovascular system undergoes physiological changes due to the compensatory mechanisms of the body. These changes include an increase in pulmonary artery pressure, blood pressure, heart rate, and changes in cardiac output26,27,28. Coronary heart disease patients also experience corresponding physiological compensatory changes, which may increase the risk of death due to reduced coronary reserve capacity and inadequate oxygen supply. However, the high-altitude environment does not increase mortality rates29, indicating that the protective compensatory mechanisms of the heart are effectively resolving these risks. Myocardial cells are protected by these mechanisms by stimulating the sympathetic nervous system, reducing intracellular oxygen pressure, and initiating gene programs throughout the body. These gene programs include antioxidation, anti-inflammatory, anti-apoptosis, and Ca2 + transport proteins30. However, the high-altitude environment can intensify numerous cardiovascular disease risk factors, including higher triglycerides, higher cholesterol levels, and lower levels of high-density lipoprotein cholesterol (HDL-C)31,32,33. Blood lipids also change when healthy individuals enter high altitude areas34. Additionally, the high-altitude environment can increase systemic inflammatory response factor levels and affect platelet aggregation and fibrinogen levels35,36. Nevertheless, activities at high altitude can improve insulin function and maintain blood sugar stability37. Modifiable cardiovascular disease risk factors in the high-altitude environment may contribute to the development of high altitude-induced myocardial ischemia. Our research has found that five variables, including Eos.Per, Ca, AST, G, and GLU, may be involved in this process. Eosinophils are important immune cells involved in anti-parasitic and hypersensitivity reactions. It has been suggested that they may promote coronary artery spasm by releasing inflammatory mediators38. Clinical studies have also demonstrated that the percentage of eosinophils in peripheral blood is closely related to the subtype and severity of coronary artery disease39. Calcium, which is essential for cellular function in the human body, is primarily regulated through renal reabsorption and excretion, intestinal absorption, and bone exchange40. The presence of calcium may induce vasoconstriction by affecting nitric oxide and endothelial function, while magnesium, a natural calcium antagonist, can reverse this process41. Serum levels of calcium and magnesium have been found to be closely related to coronary artery disease, with the potential to predict the risk of coronary heart disease42. AST, a liver transaminase and myocardial biomarker, has unexplored predictive value for myocardial ischemia under plateau exposure43. Despite being independently associated with atherosclerosis and cardiovascular events, our study identifies lower AST prior to plateau entry as a risk factor for developing myocardial ischemia on plateau, challenging conventional understanding of AST's role in cardiovascular health. Components of the adaptive immune system, such as lymphocytes and immunoglobulins, are also involved in the development of coronary atherosclerosis44. Serum γ-globulin, primarily composed of immunoglobulins, has been found to have prognostic value for stable coronary heart disease patients45. Diabetes is a significant risk factor for cardiovascular disease, with changes in blood glucose closely linked to increased risk. Even changes in blood glucose within the normal range are significantly associated with metabolic syndrome, with elderly individuals displaying high normal blood sugar levels at an increased risk of developing cardiovascular disease46. Given the evidence, it is speculated that “5F” may enhance the risk factors of high-altitude cardiovascular disease, and that “5F” levels will increase significantly after entering high altitude37,47. Further research is necessary to investigate the potential mechanisms underlying the relationship between “5F” changes in high altitude and myocardial ischemia.

This study has certain limitations. Firstly, the majority of our study population were aged 18–35 years old, which may limit the generalizability of our findings to the wider population. Additionally, this age group tends to have a lower incidence of underlying diseases, which may affect the significance of high-altitude myocardial ischemia. Secondly, the male-to-female ratio in our study population was imbalanced, which may limit the representativeness of our results for women. Thirdly, this study location was chosen based on several factors, including the rarity of reports on altitudes above 3000 m, and the challenges associated with conducting health examinations at higher altitudes. While this altitude range may be representative to a certain extent, it is important to note that the number of residents living above 3500 m in China is relatively small.

Fourth, the use of resting ECG as the exclusive diagnostic tool for myocardial ischemia presents inherent limitations, which can escalate the risk of misclassification bias. Despite these limitations, our findings shed light on the clinical and hematological characteristics of high-altitude myocardial ischemia and contribute to a growing body of literature on this important topic.

Conclusion

With the help of machine learning techniques applied to cohort data of healthy individuals preparing to enter high-altitude regions, we have developed a precise and uncomplicated predictive model for myocardial ischemia. After external validation in diverse populations, including more female participants, the model can aid individuals in identifying their risk and preparing early.