Development and validation of a prediction model to estimate risk of acute pulmonary embolism in deep vein thrombosis patients

Venous thromboembolism (VTE) presents clinically as deep vein thrombosis (DVT) or pulmonary embolism (PE). Not all DVT patients carry the same risk of developing acute pulmonary embolism (APE). We aimed to develop and validate a prediction model that estimates the risk of APE in DVT patients from past medical history, clinical symptoms, physical signs, and electrocardiogram (ECG) signs. We analyzed data from a retrospective cohort of patients who were diagnosed with symptomatic VTE from 2013 to 2018 (n = 1582). Among them, 122 patients were excluded. The diagnosis was confirmed in all enrolled patients by pulmonary angiography or computed tomography pulmonary angiography (CTPA) and compression venous ultrasonography. Using LASSO and logistic regression, we derived a predictive model with 16 candidate variables to predict the risk of APE and completed internal validation. Overall, 52.9% of patients had DVT + APE (773 of 1460) and 47.1% had DVT only (687 of 1460). The APE risk prediction model included one pre-existing disease or condition (respiratory failure), one risk factor (infection), three symptoms (dyspnea, hemoptysis and syncope), five signs (cold clammy skin, tachycardia, diminished respiration, pulmonary rales and accentuation/splitting of P2), and six ECG indicators (SIQIIITIII, right axis deviation, left axis deviation, S1S2S3, T wave inversion and Q/q wave), all of which were positively associated with APE. The ROC curves of the model showed an AUC of 0.79 (95% CI, 0.77–0.82) in the training set and 0.80 (95% CI, 0.76–0.84) in the testing set. The model showed good predictive accuracy (calibration slope, 0.83; Brier score, 0.18). Based on a retrospective single-center population study, we developed a novel prediction model to identify DVT patients with different risks of APE, which may be useful for quickly estimating the probability of APE before definitive test results are available and for speeding up emergency management.


Scientific Reports | (2022) 12:649 | https://doi.org/10.1038/s41598-021-04657-y | www.nature.com/scientificreports/

diagnostic tests are typically unavailable because of a lack of equipment and trained personnel 4 and prohibitive costs 5 . Other relatively economical and accurate examinations are difficult to implement quickly.
The results of prospective studies and guidelines lend support to the concept that clinical probability assessment is a fundamental step in the diagnosis of pulmonary embolism 1,6,7 . The pulmonary embolism risk assessment scale recommended by the current guidelines mainly includes the Wells score 8 and the revised Geneva score 9 .
In the Chinese population, the diagnostic value of the Wells score and the revised Geneva score still needs to be verified by multi-center, prospective validation studies in a large cohort. Some studies have demonstrated the usefulness of these traditional scores for identifying suspected patients at risk of developing PE. However, although DVT is one of the most important causes of PE, these scores do not focus on accurately estimating risk in DVT patients. Based on clinical characteristics and pathogenesis, the primary aim of our study is to estimate the risk of APE in DVT patients. The secondary aim is to develop and validate a predictive model using clinical variables that are readily available in primary care institutions and different professional departments at the time of thrombotic events. To enhance visual presentation and facilitate subsequent clinical application, a heatmap shows the distribution of the predictor variables across all samples, and a nomogram provides a quick visual technique to assess the clinical probability of acute pulmonary embolism, which can direct personalised decision-making for preventative therapy.

Methods
Study design and data source. Figure 1 illustrates the workflow. The experiments, including any relevant details, were approved by the Ethics Committee of the First Affiliated Hospital of Xi'an Jiaotong University (No: XJTU1AF2018LSK-144) (Supplementary Information 1). The study was performed in accordance with relevant guidelines and regulations. Verbal informed consent was obtained from the patients for their anonymized clinical information to be published in this article. Consecutive patients who were diagnosed with symptomatic VTE between June 1, 2013 and June 1, 2018 (n = 1582) at The First Affiliated Hospital of Xi'an Jiaotong University were initially enrolled in this study. Among them, 122 patients were excluded: 26 DVT controls (incomplete data in 26) and 96 DVT + APE cases (admitted outside the study window in 86 and incomplete data in 10). Therefore, the final sample enrolled in the study comprised 1,460 patients (Supplementary Information 2). The Department of Peripheral Vascular Diseases is responsible for the diagnosis, treatment and follow-up of VTE patients across the whole hospital. Accordingly, about 30% of patients were suspected of and diagnosed with VTE while hospitalized for other diseases in other departments (such as the Department of Critical Care Medicine, Department of Respiratory Medicine, Department of Cardiovascular Medicine and Department of Oncology) or in the emergency department of The First Affiliated Hospital of Xi'an Jiaotong University, and were then transferred to the Department of Peripheral Vascular Diseases for treatment. Approximately 70% of patients presented directly to the Department of Peripheral Vascular Diseases with suspected VTE, which was diagnosed and treated during hospitalization. For all enrolled patients, diagnostic confirmation was carried out at our institution's dedicated diagnostic unit and included pulmonary angiography or CTPA, compression venous ultrasonography and electrocardiogram (ECG).
Each patient was examined at baseline according to a standardized protocol, following recommended international standards. Pulmonary angiography was performed after obtaining written informed consent from the patients. Exclusion criteria were: (1) recurrent pulmonary embolism, (2) incomplete clinical data, (3) contraindication to CTPA/pulmonary angiography, (4) patient refusal to complete diagnostic testing.
Predictor variables. In the first step, we searched the PubMed and Web of Science databases without language or time restrictions to retrieve relevant studies. The prediction factors were mainly derived from the 2019 ESC Guidelines for the diagnosis and management of acute pulmonary embolism 1 and a systematic review and meta-analysis that was designed to identify factors for VTE in hospitalized medical patients 10 . To maximize safety and model usability, we favored reasonable and clinically relevant predictors that are easily available in primary care institutions and different professional departments. Because some biochemical tests are not routinely available, we did not consider biomarkers endorsed by guidelines (D-dimer or pro-BNP). Age, a continuous variable, was transformed into a binary variable using a pre-specified cut-off (> 65 years vs < 65 years) derived from the literature 9 . Although the meta-analysis found low-certainty evidence of an association between risk of any VTE and central venous catheter (CVC) use 10 , we did not include CVC use because this risk factor is less common in our sample population. There is probably an association between risk of any VTE and elevated heart rate (> 100 beats per minute); therefore, we selected tachycardia (> 100 beats per minute) and heart rate (as a continuous variable).
Based on literature and research reports, we screened more than 10 kinds of ECG signs associated with APE 11-15 . The ECGs obtained within the first 24 h of hospital admission were included in the study. Acute cor pulmonale was deemed present if we identified at least one of the following: (1) SIQIIITIII, (2) T-wave inversion in right precordial leads, (3) S1S2S3, (4) pseudo-infarction, (5) transient right bundle branch block. Signs that had already appeared in the past were excluded.
To control bias and produce reliable data for research, the following rules were adopted: (1) Blinding was maintained throughout experimental design, study implementation and statistical analysis. (2) At the time of diagnosis, all eligible cases were assessed by trained clinicians to determine the presence or absence of signs and symptoms related to VTE, recorded as dichotomous variables (yes/no), including dyspnea, hemoptysis, chest pain, syncope, swelling and pain in the lower limbs and so on. (3) Clinicians took care to identify potential factors associated with APE and to exclude pre-existing conditions whose clinical manifestations resemble those of pulmonary embolism. (4) Research personnel completed a training course that explained all variables, to ensure that the same data collection methods were followed. (5) Clinical data and test results were abstracted from hospital medical records by trained research personnel using a standardized form; these personnel were unaware of the final diagnosis at the time of data collection. (6) The data collector was not allowed to take charge of data analysis and did not know the research protocol. (7) The data collector ensured that all eligible patients' clinical data were fully understood, and the survey was conducted in a quiet room without any disturbance. (8) The statistician analyzed the data independently without any disturbance.
Outcome variables. The primary outcomes of this study were as follows: (1) an easy-to-use predictive model for acute pulmonary embolism was derived and validated, (2) a reasonable pipeline of disease risk prediction and factor analysis was introduced.
Derivation and validation of the models. The initial cohort comprised 1582 symptomatic VTE patients. Thirty-six patients (10 DVT + APE cases and 26 DVT controls) were excluded due to incomplete data and 86 were excluded due to acute pulmonary embolism only; therefore, 1460 patients (DVT + APE vs DVT, 773:687) were included in this study. We then randomly assigned samples to a training set (n = 1095) and a testing set (n = 365) in a 3:1 ratio. The training set was used to generate the prediction model, and the testing set was used to evaluate its prediction performance. First, we performed univariate analysis to select predictor variables significantly linked with APE diagnosis, using a cutoff of p < 0.05 (Supplementary Tables 1 and 2). To avoid overfitting, LASSO regression analysis was used to screen those APE diagnostic-related variables. Next, all APE diagnostic-related predictor variables were included in the multivariate analysis to assess independent predictive factors using logistic regression (Supplementary Table 3). Ultimately, we selected sixteen APE diagnostic-related predictors as candidates for the prediction model. The area under the receiver operating characteristic (ROC) curve (AUC) was used to evaluate the diagnostic efficiency of the model. In addition to the AUC, the Brier score and calibration curves were used to evaluate the concordance between predicted and observed diagnostic outcomes in the training set and testing set. The distribution of patients at different risk levels under the prediction model, the number of censored patients, and the heatmap of APE diagnostic-related predictors were displayed. A nomogram based on the independent risk factors identified by multivariate logistic regression was established to predict the probability of APE for patients with DVT.
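The cohort arithmetic and the 3:1 random split described above can be checked with a short sketch. The authors worked in R and do not report how the random assignment was seeded or implemented, so the Python code below is only an illustration: the function name `split_three_to_one` and the seed value are assumptions, and only the counts come from the text.

```python
import random

# Counts reported in the text.
initial_cohort = 1582
excluded = 122
dvt_ape, dvt_only = 773, 687

# 1582 - 122 = 1460 enrolled patients, which must equal 773 + 687.
enrolled = initial_cohort - excluded
assert enrolled == dvt_ape + dvt_only


def split_three_to_one(n_patients, seed=0):
    """Randomly assign patient indices to training and testing sets (3:1).

    Hypothetical helper: the seed and the shuffling scheme are assumptions,
    not the procedure used in the paper.
    """
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)
    n_train = n_patients * 3 // 4
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_three_to_one(enrolled)
print(len(train_idx), len(test_idx))
```

With 1460 patients, three quarters of the sample is 1095 and the remaining quarter is 365, matching the set sizes reported above.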
All figures were created using R software version 4.0.2. LASSO logistic regression was performed with the 'glmnet' function of the 'glmnet' package. The AUC and Brier score for the model were calculated using the R package 'riskRegression'. The nomogram was constructed from the logistic regression analysis with the R package 'rms'.
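For orientation, the objective that 'glmnet' minimizes for a binary outcome with a pure LASSO penalty can be written as follows. This is the standard form from the package documentation, not a formula given in this paper; the symbols (N patients, outcome y_i coded 1 for APE, predictor vector x_i, intercept \beta_0, coefficients \beta, penalty weight \lambda) are notation introduced here, and the paper does not state how \lambda was chosen.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}
  \Bigl[\, y_i\bigl(\beta_0 + x_i^{\top}\beta\bigr)
        - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
  \;+\; \lambda \lVert \beta \rVert_1
```

The L1 penalty shrinks some coefficients exactly to zero, which is why LASSO can be used as a variable-screening step before multivariate logistic regression.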
Handling of missing data. Apart from age and gender, every variable had a small amount of missing data. We excluded the missing values and analyzed the complete data.
Statistical analysis. The

Model development.
A total of 54 variables previously reported to be associated with VTE were obtained from the systematic review and meta-analysis. Univariate regression analysis was performed on these 54 variables. We found that 34 variables were significantly linked with the diagnosis of APE in DVT patients (p < 0.05) (Supplementary Tables 1 and 2). LASSO regression analysis and multivariate logistic regression analysis were applied to the 34 APE diagnostic-related variables (Supplementary Table 3). Based on the results of the univariate analysis, 23 variables were included in the LASSO regression model (Fig. 2). When these 23 variables were entered into multiple logistic regression, 20 variables were independently associated with APE. We included the 16 variables with an OR > 1 to build a prediction model, which we named the APE risk prediction model (Table 2).
The APE risk prediction model included one pre-existing disease or condition (respiratory failure), one risk factor (infection), three symptoms (dyspnea, hemoptysis and syncope), five signs (cold clammy skin, tachycardia, diminished respiration, pulmonary rales and accentuation/splitting of P2), and six ECG indicators (SIQIIITIII, right axis deviation, left axis deviation, S1S2S3, T wave inversion and Q/q wave), all of which were positively associated with APE in DVT patients. The area under the ROC curve was 0.79 (95% CI, 0.77-0.82) (Fig. 3).

Internal validation.
To validate the APE risk prediction model, we used an internal validation procedure based on a random split of the sample. The ROC curves of the model showed an AUC of 0.79 (95% CI, 0.77-0.82) in the training set and 0.80 (95% CI, 0.76-0.84) in the testing set, and no significant difference was found between these values, indicating the reliability of the nomogram (Fig. 3). The model had a Brier score of 0.18 and a calibration slope of 0.83, indicating good predictive accuracy (Fig. 3).
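The two headline metrics can be illustrated with a minimal sketch. The authors computed them with the R package 'riskRegression'; the Python functions below are stand-alone re-implementations of the standard definitions, and the six-patient data set is invented purely to exercise the code, not taken from the study.

```python
def auc(y_true, y_prob):
    """Rank-based AUC: the probability that a randomly chosen case (1)
    receives a higher predicted risk than a randomly chosen control (0).
    Ties count as half a win."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome.
    Lower is better; 0 is a perfect prediction."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data (hypothetical): 1 = DVT + APE, 0 = DVT only.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

print(round(auc(y_true, y_prob), 3), round(brier_score(y_true, y_prob), 3))
```

The AUC measures discrimination only, whereas the Brier score also penalizes miscalibrated probabilities, which is why the two are reported together.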

Model presentation.
Since none of the prediction models performed well in all patients with APE, we tried to derive a new predictive model that better identifies patients at risk of deterioration. Our model had good discriminatory power for APE in DVT patients (AUC, 0.79; 95% CI, 0.77-0.82). The heatmap showed that high-risk patients had more kinds of risk factors, which suggested that the 16 diagnostic-related variables differed significantly between patients with high and low APE risk scores (Fig. 4). To generate and validate an APE risk prediction model that could be translated to the clinic, we developed a nomogram to predict the risk of APE in DVT patients (Fig. 5).
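A nomogram of this kind is a graphical reading of a logistic regression equation: the coefficients of the findings that are present are summed with the intercept, and the total is mapped to a probability by the logistic function. The sketch below shows that calculation. The intercept and coefficients are hypothetical placeholders chosen for illustration; they are not the values in Table 2, and only four of the sixteen predictors are shown.

```python
import math

# HYPOTHETICAL values for illustration only; NOT the fitted model in Table 2.
INTERCEPT = -1.5
COEFFICIENTS = {
    "dyspnea": 0.9,
    "hemoptysis": 1.1,
    "syncope": 0.8,
    "tachycardia": 0.6,
}


def predicted_probability(findings):
    """Return the logistic-model probability of APE for one patient.

    `findings` maps predictor names to True/False (present/absent).
    All predictors are binary, so each present finding simply adds its
    coefficient to the linear predictor.
    """
    linear_predictor = INTERCEPT + sum(
        COEFFICIENTS[name] for name, present in findings.items() if present
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


patient = {"dyspnea": True, "hemoptysis": False, "syncope": True, "tachycardia": False}
print(round(predicted_probability(patient), 3))
```

Because every coefficient retained in the model is positive (OR > 1), each additional finding can only raise the predicted probability.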

Discussion
Utilizing high-quality data from a retrospective cohort study, we derived an easy-to-use clinical score to predict the risk of developing APE in DVT patients. The APE risk prediction model was derived from a large cohort of consecutive inpatients with diagnostic examination, and is based entirely on past medical history, clinical symptoms, physical signs, and ECG signs. Sixteen clinical predictors identified patients with high-risk disease who may benefit from individualized management to improve clinical outcomes. The good discriminatory power of our model was confirmed by internal validation.
We purposefully chose readily available predictors to enhance clinical applicability and ease of use, particularly for primary care institutions and different professional departments. In our study, sudden-onset dyspnea and hemoptysis were powerful predictors of pulmonary embolism, which is consistent with previous reports 16,17 . Pulmonary embolism was identified in nearly one of every six patients hospitalized for a first episode of syncope 18 ; therefore, syncope was selected as a predictive variable and was eventually included in the predictive model. Right ventricular dysfunction is associated with thrombotic load and is one of the important prognostic factors of pulmonary embolism. In APE patients, there is always at least one ECG sign of right ventricular strain, including SIQIIITIII, right bundle branch block and T wave inversions 14 . Our model included a total of six ECG signs, all of which have previously been reported to be related to pulmonary embolism.
Why choose the electrocardiogram as a predictor of APE? The electrocardiogram is an irreplaceable examination method for exploring and measuring abnormal cardiac electrical activity. It has the advantages of being non-invasive, timely and simple to perform, and has become a necessary examination for patients with unexplained dyspnea or chest pain 19 . The changes in electrocardiogram frequency, rhythm and conduction in APE patients, throughout the disease course and during treatment phases, may better inform risk stratification, prognosis and outcome of the disease, and hence offer the opportunity for more applicable and balanced targeted preventative strategies 13,20,21 . Is this prediction model clinically generalizable? Firstly, as one of the most common clinical examination methods, the electrocardiogram is often used to assist early screening of suspected patients. Interpreting ECG signs typically requires physicians' medical expertise, and the electrocardiographic abnormalities of acute cor pulmonale have well-defined criteria that have been known and applied for many years 22 . Apart from the electrocardiogram, all the data required for the prediction model are routinely collected in the context of suspected acute pulmonary embolism and are available from the patient's history and physical examination. Since the model was derived from multidisciplinary patients, we believe that the prediction model is applicable to all clinical departments and easy to calculate. Is this prediction model valid and accurate? In terms of prediction accuracy, all patients received a diagnosis by a gold standard criterion, and our prediction model could be considered accurate for predicting pulmonary embolism. In fact, our prediction model is being extended and externally validated in multiple centers.
Preliminary results support the feasibility of this approach: the ability of the model to distinguish patients' risk for APE in the validation cohort is at least as good as in the original cohort. To facilitate clinical visualization management, instead of using points proportional to the beta regression coefficient values, we estimate the probability of acute pulmonary embolism directly from the nomogram.
There are potential limitations to our study. Firstly, the study is retrospective and is not a population-based study or nationwide survey, so selection bias was unavoidable. Secondly, the original intention of this model was to serve primary care institutions and simplify the diagnostic process, so we did not include the biochemical indicators recommended by the guidelines, such as D-dimer and pro-BNP. Finally, as is often the case in clinical diagnostic studies, we did not account for the uncertainty around predictions, but focused on the clinical probability assessment. Hopefully, this model will be further validated in a large, multicenter, prospective validation study before providing benefits for Chinese patients.
In conclusion, this study reports the derivation and initial validation of a sixteen-variable clinical prediction model that demonstrated good overall accuracy in predicting the risk of acute pulmonary embolism in patients with deep vein thrombosis. These features make the prediction model appear more suitable for primary care institutions and different professional departments. Pending external validation, this study provides a basis for risk assessment of acute pulmonary embolism in these patients.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.