Clinical symptoms and related risk factors in pulmonary embolism patients and cluster analysis based on these symptoms

Pulmonary embolism (PE) remains largely underdiagnosed because its symptoms are nonspecific. This study aimed to evaluate the typical symptoms of PE patients and their related predictors, and to identify typical clusters of patients and principal components of PE symptoms. Clinical data from a total of 551 PE patients between January 2012 and April 2016 were retrospectively reviewed. PE was diagnosed according to the European Society of Cardiology Guidelines. Logistic regression models, the system clustering method, and principal component analysis were used to identify potential risk factors, clusters of patients, and principal components of PE symptoms, respectively. The most common symptoms of PE were dyspnea, cough, and tachypnea, each present in more than 60% of patients. Several comorbid chronic conditions and laboratory and clinical indicators were related to these clinical symptoms. Our study also suggested that PE is associated with a broad list of symptoms, that some PE patients share similar symptoms, and that some PE symptoms usually co-occur. Based on ten symptoms recorded in our sample, we classified the patients into five clusters, which represent five groups of PE patients seen in clinical practice, and identified four principal components of PE symptoms. These findings improve our understanding of clinical symptoms and their potential combinations, which is helpful for the clinical diagnosis of PE.

some patients had a delayed presentation of related symptoms over weeks or days 15. Therefore, a good understanding of the typical clinical symptoms associated with PE, their potential predictors, and the potential co-occurrence and combination of these symptoms would improve the diagnosis and prognosis of PE. However, few studies have focused on this topic, and to date no study has clustered PE symptoms and patients. The few related studies were all based on Western populations 6,16, and Chinese patients might have different characteristics due to racial differences 17,18.
Based on a large sample of PE patients in a comprehensive hospital in China, this study aims to assess typical clinical symptoms, to identify related clinical and laboratory indicators of these symptoms, to group the patients into different clusters based on the presentations of symptoms, and to identify the principal components of these symptoms. This study will improve our understanding of clinical symptoms and their potential combinations which are helpful for clinical diagnosis of PE.

Ethics statement. This study was approved by the Medical Ethics Committee of Affiliated Dongyang Hospital of Wenzhou Medical University (Dongyang, China). Informed consent was obtained from all enrolled patients. Patient records and information were anonymized and de-identified prior to analysis. All experiments were performed in accordance with relevant guidelines and regulations.
Subjects and sample collection. Clinical data from a total of 551 patients hospitalized at the Affiliated Dongyang Hospital of Wenzhou Medical University between January 2012 and April 2016 were retrospectively reviewed. According to the criteria of the European Society of Cardiology Guidelines, PE was confirmed by an identified filling defect in the pulmonary artery system on CT pulmonary angiography (CTPA) or by a positive venous ultrasound for extremity deep venous thrombosis (DVT) in patients with typical symptoms of PE (pleuritic pain or dyspnea). In addition, a positive D-dimer test or ventilation-perfusion (V/Q) scintigraphy supported a high probability of PE, as recommended in those guidelines. All clinical data, including symptoms, vital signs, comorbidities, and laboratory indicators, were collected on admission. Ten symptoms were included in the analysis: dyspnea, tachypnea, hemoptysis, pleuritic pain, hydrothorax, syncope, hypoxia, fever, cough, and edema in lower extremities. Concurrent chest radiographs, CT scans, and echocardiographic examinations were performed and recorded.
CTPA examination. The thoracic CT scans were performed using a 64-detector multi-sectional CT scanner (Brilliance 64-slice; PHILIPS, Amsterdam, Netherlands) with an intravenously injected contrast agent. Scans were acquired with multi-slice spiral CT using a collimation of 0.6, a rotation time of 0.5 s, a slice thickness of 5 mm, and a pitch of 1.0; the contrast agent (100 ml) was injected at 4 ml/s. CTPA results were categorized as positive for PE if an intraluminal filling defect was observed within a pulmonary arterial vessel and as negative if no filling defect was seen. Scans were considered technically inadequate only if the main or lobar pulmonary vessels were not visualized.
Statistical Analysis. Characteristics were presented as mean ± standard deviation (SD) for continuous variables and as percentages for categorical variables. Multivariate logistic regression models were used to identify the associations between clinical indicators/predictors and PE symptoms. The system (hierarchical) clustering method was used to identify typical clusters of PE patients based on their symptoms. In this method, Euclidean distance was used to measure the similarity between patients in terms of their PE symptoms, and Ward's method was used to group patients into clusters. Four cluster evaluation statistics, namely R-squared, semi-partial R-squared, Pseudo F, and Pseudo T-squared, were plotted in the hierarchical analysis to determine the optimal number of clusters. According to the plots, we considered 4-6 clusters. Within this range, we chose the number of clusters with a smaller semi-partial R-squared, a larger R-squared, a larger Pseudo F, and a smaller Pseudo T-squared. Chi-square or Fisher's exact tests were used to determine whether the presentation of each symptom differed significantly across clusters. To determine the extent to which the clusters based on presentations of PE correspond to differences in other indicators, a series of one-way ANOVA tests was performed comparing these indicators across clusters. Principal component analysis (PCA) was used to identify potential principal components of PE symptoms. All principal components with an eigenvalue greater than 1 were retained. All significance levels were set at 0.05. Statistical analyses were run in SAS 9.4.
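The Ward step and the semi-partial R-squared statistic described above can be illustrated with a minimal Python sketch. The analysis in this study was run in SAS 9.4, so the code below is not the authors' procedure: the function name `ward_clusters`, the naive pairwise search, and the toy binary symptom matrix are all assumptions made for illustration. At each step the two clusters whose merger adds the least within-cluster sum of squares are joined, and that increase divided by the total sum of squares gives the semi-partial R-squared of the merge.

```python
import numpy as np

def ward_clusters(X, n_clusters):
    """Naive Ward agglomeration on the rows of X (hypothetical helper).

    Returns cluster labels and the semi-partial R-squared of each merge.
    """
    X = np.asarray(X, dtype=float)
    clusters = {i: [i] for i in range(len(X))}        # start with singletons
    total_ss = ((X - X.mean(axis=0)) ** 2).sum()      # total sum of squares
    sprsq = []
    while len(clusters) > n_clusters:
        best = None
        keys = list(clusters)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                ia, ib = clusters[a], clusters[b]
                diff = X[ia].mean(axis=0) - X[ib].mean(axis=0)
                # Ward merge cost: increase in within-cluster sum of squares
                cost = len(ia) * len(ib) / (len(ia) + len(ib)) * np.dot(diff, diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)   # merge b into a
        sprsq.append(cost / total_ss)
    labels = np.empty(len(X), dtype=int)
    for new_id, members in enumerate(clusters.values()):
        labels[members] = new_id
    return labels, sprsq

# Toy data (not study data): rows are patients, columns are 0/1 symptom flags.
toy = np.array([
    [1, 1, 0], [1, 1, 0], [1, 1, 1],
    [0, 0, 1], [0, 0, 1], [0, 0, 0],
])
labels, sprsq = ward_clusters(toy, n_clusters=2)
print(labels, [round(s, 3) for s in sprsq])
```

Plotting the semi-partial R-squared values against the number of clusters is one way to look for the point at which further merges start to cost noticeably more, which is the kind of inspection the evaluation plots support.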

Results
Characteristics of the study population. A total of 551 patients with PE were included in this study. Table 1 presents the characteristics of the study population. Patients presented a broad range of clinical symptoms. Dyspnea was the most common presenting symptom (64.1% of PE cases), followed by cough (60.4%), tachypnea (60.4%), and hypoxia (57.9%). The other symptoms were hydrothorax, edema in lower extremities, fever, syncope, hemoptysis, and pleuritic pain, accounting for 26.9%, 23.6%, 22.1%, 12.7%, 11.8%, and 8.2% of cases, respectively. The PE patients frequently had comorbid hypertension (45.2%) and chronic obstructive pulmonary disease (COPD) (32.1%).
Associations between comorbidities and symptoms. The associations between comorbidities and each symptom are shown in Table 2. PE patients with atrial fibrillation (AF) were more likely to present tachypnea, hemoptysis, and hydrothorax than those without AF. Patients with coronary heart disease (CHD) had higher odds of presenting dyspnea and tachypnea, and those with COPD had higher odds of presenting dyspnea, tachypnea, hypoxia, and cough. Patients with lower extremity thrombosis were more likely to present edema in lower extremities. However, PE patients with hypertension or diabetes, COPD, and lower extremity thrombosis had lower odds of presenting cough, syncope, and dyspnea, respectively.
Associations between laboratory indicators and symptoms. The associations between selected laboratory indicators and each symptom are summarized in Table 3. The counts of white blood cells (WBC), neutrophilic granulocytes, and red blood cells (RBC) and the level of D-dimer (D-D) were usually associated with increased odds of presenting these symptoms. For example, the WBC count was positively associated with the presentation of dyspnea, tachypnea, hydrothorax, syncope, hypoxia, and fever. The level of albumin (ALB) was usually associated with decreased odds of presenting symptoms, including dyspnea, tachypnea, hydrothorax, fever, and cough. Hemoglobin and platelet count were occasionally associated with some symptoms. Arterial carbon dioxide tension (PPOCD) and tricuspid valve regurgitation (TTVR) were associated with increased odds of presenting several symptoms, while the pH value was associated with decreased odds of presenting dyspnea, tachypnea, hydrothorax, and syncope (Table 4).
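For readers less familiar with how "higher odds" and "lower odds" are read from a logistic regression, the following Python sketch shows the usual conversion from a model coefficient to an odds ratio (OR) with a 95% confidence interval. It assumes a Wald-type interval, exp(beta ± 1.96·SE); the source does not state which interval type was used, and the function names and the numeric inputs are hypothetical, not values from Tables 2-4.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Convert a logistic coefficient and its standard error to (OR, lower, upper).

    Assumes a Wald-type interval on the log-odds scale (an assumption of this sketch).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def is_significant(lower, upper):
    """A 95% interval that excludes 1 corresponds to p < 0.05 for the Wald test."""
    return lower > 1.0 or upper < 1.0

# Hypothetical coefficient and standard error, for illustration only.
or_, lo, hi = odds_ratio_ci(beta=0.70, se=0.25)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f}), significant: {is_significant(lo, hi)}")
```

An OR above 1 with an interval entirely above 1 corresponds to increased odds of presenting the symptom, and an interval entirely below 1 to decreased odds.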

Five clusters of patients from cluster analysis.
Based on the ten main symptoms presented in our PE patient sample, we generated five clusters of patients by cluster analysis. We evaluated the cluster analysis using the R-squared, semi-partial R-squared, Pseudo F, and Pseudo T-squared statistics, and the plots suggested a noticeable improvement at around five clusters. Thus, five was the optimal number, where both the semi-partial R-squared and Pseudo T-squared were relatively small and both R-squared and Pseudo F were relatively large (Fig. 1). The cluster history results are presented in Table 1S in the supplementary file. Table 5 shows the number and proportion of each symptom in each of the five clusters. Almost half of the patients (N = 250) were classified into Cluster 3, and a quarter (N = 136) were classified into Cluster 2. Among patients in Cluster 3, almost all presented dyspnea (98.8%) and tachypnea (92.8%), and most presented hypoxia (70.8%) and cough (71.6%). Patients in Cluster 2 had no specific or typical symptoms. Among patients in Cluster 1, all presented syncope and almost half presented dyspnea, hypoxia, tachypnea, and cough, but few presented pleuritic pain, hemoptysis, or edema in lower extremities. Table 6 summarizes age, gender, comorbidities, and clinical indicators related to PE for the five clusters. For example, patients in Cluster 3 were the oldest, were the most likely to have COPD, and had the lowest arterial oxygen tension (PPO) and the highest PPOCD and pulmonary artery pressure (PAP).
PCA of PE symptoms. The principal components with an eigenvalue greater than 1 were retained to account for as large a proportion as possible of the total variability in the component measures. The eigenvalues of the correlation matrix from PCA are shown in Table 1S, and the graphical results of PCA are shown in Fig. 2. The loadings of the ten symptoms on all principal components are presented in Table 7. In the first principal component, which accounted for 25% of the variance, the symptoms with large loadings were tachypnea, dyspnea, cough, hydrothorax, and hypoxia.
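The eigenvalue-greater-than-1 retention rule can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not the SAS analysis reported here: the function name `kaiser_components` and the toy 0/1 matrix are hypothetical, and the data are not from the study. The sketch computes the correlation matrix of the symptom indicators, takes its eigen-decomposition, and keeps the components whose eigenvalue exceeds 1.

```python
import numpy as np

def kaiser_components(X):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1.

    Returns (kept eigenvalues, kept eigenvectors as columns, share of variance
    explained by each kept component). Hypothetical helper for illustration.
    """
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)        # columns are the variables
    eigvals, eigvecs = np.linalg.eigh(corr)    # eigh: symmetric matrix
    order = np.argsort(eigvals)[::-1]          # sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()        # eigenvalues sum to the number of variables
    keep = eigvals > 1.0
    return eigvals[keep], eigvecs[:, keep], explained[keep]

# Toy data (not study data): rows are patients, columns are 0/1 symptom flags.
toy_symptoms = np.array([
    [1, 1, 0], [1, 1, 1], [0, 0, 0],
    [0, 0, 1], [1, 0, 1], [0, 1, 0],
])
kept_vals, kept_vecs, kept_share = kaiser_components(toy_symptoms)
print(kept_vals.round(3), kept_share.round(3))
```

The entries of each retained eigenvector play the role of the loadings reported per symptom, and the share of variance is the eigenvalue divided by the number of variables.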

Discussion
In our study, the most common symptoms of PE were dyspnea, cough, and tachypnea, each present in more than 60% of patients. Several comorbid chronic conditions and laboratory and clinical indicators were related to these clinical symptoms among PE patients. The present study also suggested that PE is associated with a broad list of symptoms and that some PE patients share similar symptoms. Based on ten symptoms recorded in our sample, we classified the patients into five clusters, which represent five groups of PE patients seen in clinical practice. Four principal components of symptoms were identified, and tachypnea, dyspnea, cough, hydrothorax, and hypoxia had the largest loadings in the first principal component, which accounted for 25% of the variability of the PE symptoms.
The diagnosis of PE remains challenging, particularly because the symptoms and signs commonly associated with this disease are often absent. The prevalence of most well-known symptoms in our sample is similar to that described in prior studies 2,5,6,18-23. However, the results are still inconsistent across previous studies. For example, some studies showed that pleuritic pain and hemoptysis were the most frequent mode of presentation in PE patients 2. A recent study indicated that most PE patients featured at least one of the following four symptoms: sudden onset dyspnea, pleuritic pain, syncope, and hemoptysis 11. In the present study, however, fewer PE patients had pleuritic pain and hemoptysis. Hemoptysis has traditionally been taught as a classic presenting symptom of PE 12,20,24-27. Previous studies have reported that the occurrence of hemoptysis in PE can be as high as 20-25% 12. However, in our study, hemoptysis was noted in only 11.8% of PE patients. We hypothesize that the decrease in the incidence of hemoptysis might be related to the wide availability of CT scans, which allows early detection and timely anticoagulation in patients with PE and prevents further progression of the disease with resultant pulmonary infarction 28,29. These differences could also be explained by differences in age distribution and population 27.
It is interesting that dyspnea in PE patients was often accompanied by other symptoms such as tachypnea, hypoxia, cough, and hydrothorax. For example, in Cluster 3 of the five clusters generated by cluster analysis, almost all patients presented dyspnea and tachypnea, most presented hypoxia and cough, and some presented hydrothorax. In the PCA results, these five symptoms had the largest loadings in the first principal component. A reasonable explanation is that PE is more likely to be detected incidentally when patients have obvious symptoms 28. In addition, we recognized that almost half of the PE patients in Cluster 3 had COPD. PE patients with COPD had a significantly higher risk of dyspnea than those without COPD. The prevalence of dyspnea in PE patients with COPD was 91.3% in this study, which was similar to that in a recent study 30. Indeed, the patients in Cluster 3 were usually difficult to differentiate from COPD patients. However, if the duration and severity of dyspnea and tachypnea feel different from the patient's usual COPD condition, or if the symptoms or the hypoxia do not improve after treatment, the physician should strongly suspect PE. In our study, we also found that in Cluster 1 syncope was the main presentation, accompanied by high oxygen partial pressure (PPO about 96%), and 92.3% of the patients had no edema. The syncope was usually accompanied by hypoxia, which is not common in other nervous system diseases. A possible explanation is sudden blockage by an embolus in the lung. Such patients deserve particular attention because of the sudden onset of unknown cause, the lack of typical presentations of PE, and the ease of misdiagnosis. Therefore, we suggest that patients with this type of presentation should be evaluated as soon as PE is suspected, and that diagnostic tests for PE, such as D-dimer testing and CTPA, should be applied to these patients. Cluster 2 PE patients had no typical symptoms. This cluster might explain the under-diagnosis of PE. Possible reasons for the absence of typical presentations are that the embolus did not dislodge and travel to the lungs to form a PE, or that the patients were taking anticoagulant medicine after the formation of lower limb thrombosis. PE patients with cancer and PE patients from the obstetrical department usually have no obvious symptoms when PE occurs and might belong to this cluster. Among PE patients in Cluster 4, the most typical presentations of PE were hemoptysis and cough, accompanied by smoking and COPD.

Table 2. The associations between comorbidities and presentation of symptoms from logistic regression models (N = 551). Abbreviations: OR, odds ratio; CI, confidence interval; AF, atrial fibrillation; CHD, coronary heart disease; DM, diabetes mellitus; COPD, chronic obstructive pulmonary disease. Age and sex were adjusted in the models. Bold: p < 0.05.
Hemoptysis generally indicates massive PE, but massive PE is now relatively uncommon because of advances in diagnostic techniques. In our study, only 11.8% of PE patients had hemoptysis. Researchers recently reported low rates of hemoptysis in PE (only about 5%) 17,18. The difference might be due to differences in diagnostic procedures, the level of diagnostic techniques, and the clinical skills of physicians between our hospital and hospitals in the Western world. Therefore, we believe that improving the detection and diagnosis of PE, together with early intervention and treatment, would help to reduce the emergence of hemoptysis in patients with PE. Among PE patients in Cluster 5, the most typical presentations of PE were pleuritic pain and cough, accompanied by fever, dyspnea, and tachypnea. Clinicians should take care to differentiate such patients from those with acute myocardial infarction. The pleuritic pain of PE is difficult to describe, and if the pain cannot be explained by myocardial infarction or other related diseases, physicians should consider the possibility of PE. All these findings suggest that more attention should be paid to the under-diagnosis of PE.
Timely detection of PE is an essential prerequisite for prompt, effective treatment 31. The current study indicated that clinical symptoms combined with risk factors might provide useful information for identifying highly susceptible PE patients, although the clinical manifestations of PE are often nonspecific. The clinical presentation also varies depending on the distribution and size of the emboli occluding the pulmonary vasculature, as well as the age and pre-existing comorbidities of the patients 32,33. This study identified signs, symptoms, and clinical risk factors associated with the presentation of PE, which can help physicians diagnose this dangerous and potentially fatal disease. Based on our PCA results, we could establish a scoring system using [...] sensitive and specific enough to diagnose PE patients. However, our study was a useful attempt to cluster and organize PE symptoms, and we believe we could develop an accurate scoring system as we accumulate more data. With the development of artificial intelligence and machine learning techniques, it is possible to study these symptoms and their interactions and combinations in depth and to improve the diagnosis of PE.

Table 4. The associations between blood gas/ultrasonic indicators and presentation of symptoms from logistic regression models (N = 551). Abbreviations: OR, odds ratio; CI, confidence interval; PPO, arterial oxygen tension; PPOCD, arterial carbon dioxide tension; LAC, lactic acid; PAP, pulmonary artery pressure; TTVR, tricuspid valve regurgitation. Age and sex were adjusted in the models. Bold: p < 0.05.

Table 5. Prevalence of symptoms by the clusters of patients with pulmonary embolism (N = 551).
Moreover, in our future study, we aim to analyze all patients with suspected PE, identify the risk factors for PE incidence, and construct a risk scoring system for PE incidence rather than for PE patients only.
There were some limitations in this study. The major limitation is its retrospective design: data collection was based on the information available on review of the patient medical records. Second, we did not investigate the etiology of PE in this retrospective study. Third, the clinical findings could be under-represented because they depend on physician assessment and documentation, leading to recall bias. Further studies are therefore necessary to validate the diagnostic value of clinical characteristics.
In conclusion, different symptoms were associated with different clinical indicators among PE patients. PE patients could be grouped into different clusters of typical symptoms, which would improve the accuracy of diagnosis and prevent adverse events due to delayed diagnosis. The diagnosis of PE remains a challenging task, and our results improve our understanding of clinical symptoms and their potential combinations, which is helpful for the clinical diagnosis of PE.

Table 7. Eigenvectors of ten principal components from PCA. Abbreviations: PCA, principal component analysis; PC, principal component.