Introduction

Obstructive sleep apnea (OSA) refers to a chronic condition characterized by recurrent episodes of upper airway obstruction during sleep, leading to intermittent hypoxia, sympathetic nervous system activation, daytime sleepiness, and impaired cognitive function. OSA also has a significant impact on cardiovascular and cerebrovascular function1.

OSA diagnosis is typically based on in-laboratory full-night polysomnography (PSG), which collects multiple types of physiological data during sleep. Currently, the apnea–hypopnea index (AHI) is used to define and categorize the severity of OSA2. AHI is the most commonly reported statistically significant predictor of cardiovascular morality and all-cause mortality3,4. In PSG, other parameters besides AHI, such as oxygen desaturation, sleep architecture, position statistics, and periodic leg movements are reported. Moreover, these parameters are known to be independently related to disease outcome5,6,7,8.

In general, given the likelihood of associations between variables, patients with more severe OSA may have more severe degrees of hypoxia or sleep fragmentation9,10 However, the traits of OSA are complex,some patients may show no association between variables, reducing the overall probability of association between parameters. In this regard, cluster analysis using PSG variables may help to identify associations between variables and unique phenotypes. There several reports regarding cluster analysis to identify multiple phenotypes in OSA in Western countries11,12,13, however, not many studies had been published so far regarding only Asians. Therefore this study aimed to understand the characteristics of patients with OSA in Korea by identifying patient phenotypes based on multiple PSG parameters using unsupervised learning clustering methods. We also attempted to evaluate differences in the clinical outcomes of cardiovascular or cerebrovascular disease-related mortalities or all-cause mortality among the different phenotypes.

Methods

Study population

This was a single tertiary center study based on a retrospective review of medical records. Our study population consisted of patients who had visited the tertiary sleep center for snoring or sleep apnea from January 2007 to October 2017 and underwent a full-night in-laboratory PSG. Patients below 18 years of age or those with a total sleep time less than 240 min were excluded as in our previous study3. Follow up PSGs taken after treatment such as surgery, application of mandibular advancement devices, or PSG taken for titration of continuous positive airway pressure (CPAP) were also excluded. After PSG variables has been selected for further analysis which will be described in the following section, patients with missing variables were excluded. Thus, the final sample size in our study cohort was 4,603 patients (67.6%) out of 6,803 patients. Among the patients, 3,324 were men and the other 1,279 were women. Mean age was 51.9 ± 14.3 years. Duration of recording (time in bed) and total sleep time ranged from 322–720 min to 240–657 min, respectively. Medical condition of enrolled patients was analyzed by their diagnosis based on International Classification of Diseases and Related Health Problems, 10th Revision (ICD‐10). Serum total cholesterol, high density lipoprotein (HDL) levels and smoking status were collected during the study period. Smoking status was reported as never smoker or current/ex-smoker. Serum total cholesterol, HDL level that had been checked were averaged for each patient. Among the patients, 3,299 (71.7%), 2,381 (51.7%), and 1,978 (42.4%) patients were evaluated for serum total cholesterol, HDL, and smoking status. Since not all patients were evaluated, these factors were not adjusted for survival analysis.

All data had been acquired by the Clinical Data Warehouse of hospital electronic medical record system (Bestcare, Ezcaretech, Seoul, Korea)14. As this is a retrospective study with minimal risk for patients, requirement of informed consents was waived by the institutional review board of Seoul National University Bundang Hospital. All methods were carried out in accordance with the ethical standards of the 1964 Helsinki declaration and its later amendments. This study was approved by the institutional review board (IRB No. B-1804/465-104).

Full-night in-laboratory PSG

All subjects underwent a full-night in-laboratory PSG using an Embla N7000 recording system (Embla; Medcare, Reykjavik, Iceland) with the supervision of an experienced technician at the sleep center, as previously described15. PSG parameters included the following: electroencephalography, electrooculography, submental electromyography, lower leg electromyography, electrocardiography, leg movement, chest and abdominal movements, nasal airflow, mouth airflow (thermistor), pulse oximetry, and body position. Based on standard criteria, sleep stages were scored in 30-s epochs. Apnea was defined as an absence of airflow for ≥ 10 s, and were classified as obstructive, mixed, or central according to the American Academy of Sleep Medicine (AASM) recommended guidelines16. Hypopnea was defined as a > 50% decrease in airflow for at least 10 s or a moderate reduction in airflow for at least 10 s associated with arousals or oxygen desaturation (< 4%)17. AHI was calculated as the total number of apneas and hypopneas per hour sleep.

Variable reduction analysis

A two-step variable reduction analysis was conducted. Briefly, clinically relevant PSG variables were selected, followed by principal component analysis-based dimension reductions and K-means cluster analysis18.

PSG results consisted of 62 variables (Supplementary Table S1). Among them, 29 PSG variables were selected (Table 1). Three practicing sleep clinicians (JW Kim, IY Yoon, and SW Cho) at the Seoul National University Bundang Hospital reviewed and selected seemingly clinically relevant variables. To identify phenotypes solely using PSG parameters, other measurements, such as age, sex, and anthropometric measurements, including body mass index (BMI) and neck circumference, were excluded. Moreover, sleep questionnaire scores, including the Epworth Sleepiness Scale (ESS) score and Pittsburg Sleep Quality Index (PSQI) score, were not used as variables. Redundant variables (eg. AHI in non-supine position) that could be easily extracted from the other variables (AHI, supine AHI, and percent sleep time in supine position) were excluded. As we considered to analyze patients whose total sleep time is more than 4 h, total sleep time, time in bed (sum of sleep latency, time of wake after sleep onset, and total sleep time) were not considered. Instead, we selected sleep efficiency which could represent these time variables. PSG variables were categorized into the domains of sleep architecture, breathing disturbance, desaturation, limb movement, and arousal, based on known mechanisms of cardiovascular consequences of OSA4,19 which is similar to a recent study by Zinchuk et al.13. Although most of the variables are overlapping with those from an aforementioned study, several different variables were selected. Durations of respiratory events were included for analysis since these variables are known to differ among patients with similar AHI, with different cardiovascular outcome20,21. The central apnea index was not included in variable reduction or cluster analysis, since among the variables in the apnea index category, the distribution of the central apnea index was the most skewed [skewness = 19.4, with D (4,603) = 0.417 and P < 0.001 by Kolmogorov–Smirnov test for normality], and therefore, was not suitable for standard principal component analysis22. AHI is a composite of variables that it could be easily calculated as sum of apnea index and hypopnea index. However, as AHI is the most commonly reported statistically significant predictor of cardiovascular morality and all-cause mortality, AHI was included in our study. Finally, sleep position in OSA was also considered. Therefore, AHI in supine position, time percent of supine position during sleep was considered.

Table 1 Polysomnographic parameters used for analysis.

After variables for analysis has been selected, patients with missing variables were excluded.

Cluster analysis

Variables were standardized with a mean of zero and standard deviation of one. Using these variables as input, principal components analysis was used to derive a set of low-dimensional components while still preserving as much variance as possible. Selection of components that retained at least 75% of the total variance (the sum of variances of all individual principal components) were determined23 (Supplementary Table S2). After application of varimax rotation to maximize item variance and simplify interpretability, scores of given principal components representing PSG features were acquired for each subject24. The number of clusters from this dataset was then estimated via the gap static25 with 500 bootstraps. The suggested number of clusters was 4 (Supplementary Figure S1). Thereafter, K-means cluster analysis was performed using the factors. Cluster validation had been performed by calculation of Dunn index and average silhouette width index26 (Supplementary Table S3).

Mortality status

Outcomes of different clusters were evaluated using mortality status. Deaths that occurred up to December 31, 2016, were identified by matching each subject’s Korean Identification Number and name with death records provided by Statistics Korea (https://kostat.go.kr). The date and primary cause of death were collected from the national death statistics. Causes of death were classified based on the underlying causes described in the deceased’s death certificate, as recommended by the World Health Organization27.

Disease-specific causes of death in the following categories were evaluated: (1) cardiovascular or cerebrovascular diseases, such as stroke, acute myocardial infarction, acute ischemic heart disease, sudden cardiac arrest, and cardiac dysrhythmias, (2) cancer, (3) fatal events including car accidents injuries, falls, and suicide, (4) others, including pneumonia, gastrointestinal bleeding, end-stage renal disease, amyloidosis, etc3.

Statistical analysis

PSG and other quantitative clinical variables were described separately for each cluster using mean and standard deviation. Comparisons between clusters were performed using one way analysis of variance for continuous variables, and the chi-squared test for categorical variable. The relationships between clusters and both disease-specific and all-cause mortality rates were evaluated using Kaplan–Meier survival analysis. A Cox proportional hazards regression analysis was used to estimate hazard ratios (HR) and 95% confidence intervals (CI), which were adjusted for age, sex, BMI, and underlying medical condition. Cox–Snell residuals were used to evaluate the overall fitness of the model28. Medical condition of each patient was then grouped into three disease category; hypertension, diabetes, cardiovascular/cerebrovascular disease (such as atrial fibrillation, ischemic heart disease and stroke). A repeated analysis was performed using the different categories of AHI severity (normal, AHI < 5; mild, AHI < 15; moderate, 15 ≤ AHI < 30; and severe, AHI ≥ 30) to assess whether the risk of disease-specific and all-cause mortality can be observed by conventional OSA severity categories29.

Analyses were performed using SPSS 22.0 (IBM Corp., Armonk, NY, USA) and R 3.4.2 software (R Foundation for Statistical Computing; https://www.r-project.com). Values of P < 0.05 were considered statistically significant.

Results

General characteristics of patients

Final analysis was performed with 4,603 patients (PSGs). The mean AHI was 22.6 ± 22.3 events/h. According to current classification for OSA severity based on AHI, the number of patients in each severity category was 1,194 (25.9%), 1,081 (23.5%), 975 (21.2%), and 1,353 (29.3%) for AHI < 5 (normal), 5 ≤ AHI < 15 (mild), 15 ≤ AHI < 30 (moderate), and AHI ≥ 30 (severe), respectively. There were significant differences in age, sex, BMI, ESS, and PSQI according to current classification for OSA severity (P < 0.001). General patient characteristics are shown in Table 2

Table 2 Patients’ general characteristics.

Cluster characteristics

Twenty-nine PSG parameters were reduced into nine factors, and based on these factors, K-means cluster analysis resulted in four OSA subgroups. Subsequently, the four clusters were rearranged according to their mean AHI, similar to conventional categorization of OSA severity categorized based on AHI. A summary of cluster characteristics is described in Table 3. All 29 PSG variables used for analysis differed significantly among the four clusters (P < 0.001). Other clinical parameters that were not used for analysis, such as age, BMI, ESS, and PSQI, were also significantly different among the four clusters (P < 0.001). Cluster labeling was based on AHI severity and differences in PSG features between clusters. Distribution of patients’ OSA severity, according to the AHI within each cluster, is described in Fig. 1.

Table 3 Cluster characteristics.
Figure 1
figure 1

Stacked bar graph showing the distribution of OSA severity in each cluster based on AHI (normal: AHI < 5; mild: AHI < 15; moderate: 15 ≤ AHI < 30; and severe: AHI ≥ 30). AHI apnea–hypopnea index.

Cluster 1

Cluster 1 was the largest among the 4 clusters (n = 2,563, 55.7%). The mean AHI was 8.52 ± 7.43/h, the lowest among all other clusters. This cluster tended to have the lowest proportion of stage 1 non-rapid eye movement (NREM) sleep, 9.27 ± 4.85% of total time analyzed, while other sleep stages were longer compared to other clusters. This cluster had the highest spontaneous arousal index (5.28 ± 4.13/h) among all clusters. This cluster was labeled as “normal to mild OSA with spontaneous arousal”.

Cluster 2

Cluster 2 was the least common, comprising only 7.6% (352/4,603). The mean AHI was 12.16 ± 11.61/h, significantly higher than that of cluster 1 (P < 0.001). This cluster had the longest sleep latency (26.19 ± 31.42 min) and REM latency (155.99 ± 96.94 min) and the lowest sleep efficiency (74.65 ± 11.03%). The most significant PSG parameters in this cluster were periodic limb movement (PLM) index score (61.96 ± 33.34/h) and PLM associated arousal index (10.93 ± 11.58/h). This cluster was the oldest with a mean age of 63.91 ± 10.36 years. ESS score was the lowest (8.44 ± 4.98) in this cluster, while PSQI score was the highest (9.40 ± 7.15). This cluster was labeled as “normal to mild OSA with poor sleep and PLMs”.

Cluster 3

Cluster 3 was the second largest cluster, comprising 27.8% (1,278/4,603) of the patients. The mean AHI was 38.60 ± 13.65/h. Among the analyzed PSG parameters, hypopnea index scores were the highest in this cluster (16.63 ± 10.47/h). This cluster was labeled as “moderate to severe OSA with hypopnea”.

Cluster 4

Cluster 4 comprised 8.9% (410/4,603) of the patients and had the highest AHI with a mean value of 69.66 ± 14.54/h. This cluster had the shortest sleep latency (11.59 ± 16.22 min) and the highest sleep efficiency (84.40 ± 9.75%). Compared to other clusters, this cluster had the highest proportion of stage 1 NREM sleep (24.31 ± 11.54%) and the lowest proportion of stage 2 and 3 NREM sleep (44.09 ± 12.71%, 3.24 ± 4.43%, respectively). The oxygen desaturation index (ODI), percentage of time with SpO2 less than 90%, respiratory arousal index, mean apnea and hypopnea duration, were the highest in cluster 4 among all of the four clusters. Compared to all other clusters, this cluster was apnea dominant with the lowest proportion of hypopnea index over AHI (14.54 ± 13.81%). AHI during REM over AHI during NREM was the lowest in cluster 4 with a ratio of 0.86 ± 0.27. Regarding AHI according to sleep position, this cluster had the lowest ratio of supine AHI over lateral AHI (2.85 ± 7.80). Among non-PSG features, this cluster had the highest ESS score (11.69 ± 4.86), highest BMI (28.92 ± 4.24 kg/m2), and the youngest age (48.18 ± 14.30 years). This cluster was labeled as “severe OSA with hypoxemia”.

Obesity is an important risk factor for OSA and many PSG features show correlation with BMI30,31. Because the BMI of cluster 4 was the highest, the PSG features were compared for subgroup analysis according to BMI status (BMI ≥ 25 kg/m2, and BMI < 25 kg/m2) to evaluate the impact of obesity. Among 410 patients, 68 were non-obese (BMI < 25 kg/m2) while 342 were obese (BMI ≥ 25 kg/m2). The mean AHI was significantly higher in obese patients than in non-obese patients (71.02 ± 14.29 vs 62.80 ± 13.92, P < 0.001). However, there was no significant difference in AHI during REM over AHI during NREM, ratio of supine AHI over lateral AHI, and fraction of hypopnea index between the 2 groups. Interestingly, the mean total apnea–hypopnea duration was significantly higher in non-obese patients (34.39 ± 7.81 vs 29.83 ± 7.18, P < 0.001). The PSG features of obese and non-obese patients in cluster 4 are summarized in Supplementary Table S4.

Survival analysis

Of the 4,603 patients, survival data were acquired for 4,210 (91.5%) patients. The mean observational period was 53.49 ± 34.33 months and 119 deaths (2.82%) occurred during the observation period. Among them, 15.97% (19/119) of the deaths were due to cardiovascular or cerebrovascular disease. Other causes of death were cancer (35.2%, 42/119), fatal events (15.96%, 19/119), and others (32.8%, 39/119). The 10-year actuarial overall survival was 93.0% with an actuarial mean survival of 113.5 months (95% CI 112.8–114.1). Number of deaths and underlying medical condition, serum total cholesterol, HDL, and smoking status according to cluster are described in Table 4. Underlying comorbidities, serum cholesterol level, and smoking status were significantly different among clusters. Cluster 4 tended to have the highest prevalence of hypertension, diabetes, and smoking while having lowest HDL level. Similar table was made for OSA severity based on AHI as Supplementary Table S5. Kaplan–Meir survival analysis showed a significant difference in cardiovascular or cerebrovascular disease-specific mortality, according to clusters (Log rank, P value = 0.009, Fig. 2a). Clusters 2 and 4 showed 6.4- and 4.8-fold greater risk of mortality due to cardiovascular or cerebrovascular disease-specific mortality, respectively, compared to that in cluster 1 (Table 5). However, after adjustment for age, BMI, sex, and underlying medical condition only cluster 4 had a significantly higher risk for disease-specific mortality as compared to cluster 1 (HR = 7.580; 95% CI 1.852–31.029; P = 0.005). However, according to the conventional OSA classification based on AHI severity, there was no significant difference in disease-specific mortality (Log rank, P value = 0.331; Fig. 2b). Moreover, Cox proportional hazards model demonstrated no significant risk of mild to severe OSA compared to normal individuals with AHI less than 5/h after adjustment for age, BMI, sex, and comorbidities (Table 6). For all-cause mortality, both conventional OSA classification and clusters showed a significant difference. After adjustment for age, BMI, sex, and underlying medical condition, cluster 4 and severe OSA patients had significantly higher risk of all-cause mortality as compared to cluster 1 and normal individuals. However, the clusters showed no significant differences in mortalities due to other causes (Supplementary Figs. S2,S3; Tables S5,S6)

Table 4 Patient outcome according to clusters.
Figure 2
figure 2

Kaplan–Meier survival probability curves for risk of cardiovascular/cerebrovascular disease specific mortality according to clusters resulted from cluster analysis (a) and groups of patients categorized by conventional OSA severity classification (b). There is a significant difference in disease specific mortality when patients are grouped by cluster analysis (Log rank P = 0.009) while as when patients are grouped by conventional OSA severity classification, there is no such significant difference (Log rank P = 0.331).

Table 5 Cox hazard ratio and adjusted Cox hazard ratio for cardiovascular/cerebrovascular disease related mortality of OSA clusters.
Table 6 Cox hazard ratio and adjusted Cox hazard ratio for cardiovascular/cerebrovascular disease related mortality of OSA severity classification.

Discussion

The current classification of OSA is based on AHI severity only, since each OSA severity is known to have a different outcome as shown in a longitudinal cohort study4. The goal of this study was to identify multiple OSA phenotypes based on various PSG features and to assess whether these PSG features were distinct from each other. It is prudent to understand the meaning of PSG features, including AHI, to identify and understand the pathophysiology of OSA. Numerous studies have reported on the importance of other PSG parameters besides AHI5,6,7,8,however, these studies mainly focused on specific PSG parameters. There is a paucity of studies that have simultaneously analyzed and compared all PSG parameters.

All-cause mortality was significantly different for both conventional classification of OSA and cluster-based classification. The adjusted HR for all-cause mortality in severe OSA group was significantly higher compared to normal individuals. However, mortality differed significantly only among the clusters when cardiovascular or cerebrovascular disease were considered in the analysis. After adjustment for age, sex, BMI, and underlying comorbidities, cluster 4 had significantly higher risk of mortality due to cardiovascular or cerebrovascular diseases as compared to cluster 1, while patients with severe OSA did not have a significantly higher risk compared to normal individuals. The current result contradicts our previous report, which stated that patients with severe OSA had significantly higher risk of mortality due to cardiovascular and cerebrovascular diseases as compared to normal individuals. However, this contradiction seems to be related to the recent decreasing trends of mortality due to cardiovascular and cerebrovascular diseases in Korea32,33.

Although both cluster 1 and cluster 2 are mostly composed of patients with normal to mild OSA, they represent different clinical outcomes. Cluster 2 is significantly older, and is characterized by a high PLM index and low sleep efficiency. It is known that prevalence of PLM increases with age34,35 and it is also known to be associated with OSA, such that PLM syndrome (PLMS) is more common in patients with sleep-disordered breathing than in the general population36. One study demonstrated a decreased PLM after CPAP therapy in mild OSA, suggesting the development of PLM as one of the consequences of OSA37. However, PLM sometimes newly appears after the initiation of positive airway pressure therapy38 and therefore the direct causal relationship between PLMS and OSA is poorly understood

PLMs are followed by arousal-related nervous system events, which manifest as cortical activity, and heart rate increases significantly in the first 5 s after the PLM39. Previous studies have advocated the importance of PLM since events may be representative of decreased dopaminergic activity or increased activation of the autonomic nervous system, which may be related to increased vascular consequences7,13,34,40. In our study, cluster 2 had a significantly higher risk of disease-specific mortality compared to cluster 1. However, after adjustment for age, increased risk in cluster 2 was no longer significant. Therefore, whether poor outcomes in cluster 2 are due to age, or other clinical findings possibly related to characteristic PSG features needs to be further validated.

Another important cluster was cluster 4. This cluster was characterized by the most severe type of OSA with the highest AHI. Contrary to the other clusters, this cluster had the lowest proportion of hypopnea, ratio of REM AHI to NREM AHI, and ratio of supine AHI to lateral AHI. Although age, BMI, and ESS were not included in the clustering, this cluster had the highest ESS score, highest BMI, and the youngest age. Severe obesity and high ESS scores seem to be the cause and effect, respectively, of cluster 4. Increased sleep efficiency in this cluster may be the result of excessive daytime sleepiness41. In Korea, severe obesity is more prevalent in younger adults42, which may explain the relatively young age of cluster.

Patients in cluster 4 had the highest age-adjusted cardiovascular/cerebrovascular mortality. Similar trend was observed for patients with severe OSA (AHI ≥ 30), a statistical significance was lacking, suggesting that not all patients in severe OSA carry a similar cardiovascular/cerebrovascular risk. PSG measurements representing the cumulative exposure to hypoxemia during sleep (i.e. time of O2 saturation < 90%), or depth and duration of hypoxia induced by respiratory distress, are known to be strongly associated with cardiovascular mortality43,44. However these measurements are only moderately associated with AHI, and therefore may not be captured by frequency based metrics only. Time of O2 saturation < 90% and mean apnea hypopnea duration were highest in cluster 4 than in the other clusters suggesting that patients in cluster 4 are grouped by combinations of PSG measurements associated with adverse cardiovascular outcome. This seems to result in additional cardiovascular/cerebrovascular mortality risk in cluster 4.

Cluster 3, which had a mean AHI of 38.6 ± 13.65/h, had a higher level of fraction of hypopnea, ratio of REM AHI to NREM AHI, and ratio of supine AHI to lateral AHI compared with cluster 4. This cluster had lower mean apnea–hypopnea durations. The fraction of hypopnea, sleep stage dependency, position dependency, and duration of breathing disturbance are known to be related to other pathophysiological traits. For example, increased fraction of hypopnea may indicate decreased arousal threshold, while increased apnea–hypopnea duration with desaturation may indicate increased arousal thresholds45. The ratio of REM AHI to NREM AHI may also indicate alterations in muscle responsiveness with higher values suggesting higher muscle responsiveness45,46. Although most of the patients in clusters 3 and 4 belonged to the same category of severe OSA, the PSG characteristics and clinical outcomes were different between the two clusters, which may suggest different underlying pathophysiologies.

As the mean BMI was the highest in cluster 4, severe obesity may have resulted in different PSG characteristics and disease outcomes in this cluster. The AHI, ratio of REM AHI to NREM AHI, ratio of supine AHI to lateral AHI, and severity of nocturnal hypoexmia are known to be correlated to BMI30,31. However, severe obesity cannot explain all the observed features of cluster 4. In this cluster, 16.6% (68/410) of the patients were non-obese (BMI < 25 kg/m2). Although the mean AHI and mean ODI scores were higher in the obese group, mean AHI in the non-obese group was still high (62.85 events/h), and there was no significant difference in the ratio of REM AHI to NREM AHI and ratio of supine AHI to lateral AHI between the obese and non-obese groups.

Cluster analysis with variables limited to breathing disturbance and desaturation (such as AHI, ODI, time of O2 saturation < 90%, and lowest O2 saturation) may be a simpler approach compared to our study which used 29 PSG variable from 5 different domains. One recent study demonstrated that oximetric parameters were able to describe a different phenotype with a high risk of mortality among patients with moderate to severe OSA47. However, our study revealed how other PSG features were grouped together so that we could characterize each cluster from many other perspective. In either way, these analytic methods may ultimately help us to understand the pathophysiology of OSA by phenotyping patients with PSG features beyond AHI.

The current study has several limitations. First, no adjustment was made for patient treatments. CPAP therapy is the gold standard procedure to treat OSA; however, the long-term, definitive effects of CPAP to prevent adverse cardiac events, stroke, or death are still controversial48,49,50. Some of the patients in our study population may have been using CPAP therapy. However, until 2018, CPAP therapy was not covered by the insurance system in Korea. Therefore, CPAP treatment was used as an off-label therapy and no systematic data are available on the use of CPAP therapy. Presently, CPAP treatment is covered by insurance, therefore, future prospective studies to validate the effect of CPAP treatment in disease prevention are necessary. Our clustering algorithm may improve the efficacy of CPAP therapy for certain patient groups. Second, there were no adjustments for the treatment of underlying comorbid diseases. Although serum cholesterol, smoking history were collected, these were not adjusted in survival analysis. These are the limitations that originated from the retrospective study design. A relatively short follow up period and therefore a low event rate is another limitation. A longer follow up period or perhaps prospective multicenter studies, with more detailed clinical information may overcome these limitations. In addition, this is a single center study mainly composed by a Korean population, therefore, generalization should be avoided as ethnic factors such as diet, craniofacial morphological differences, obesity, and prevalence of comorbidities are known to be different from Western countries51,52. Finally, K-means clustering may not be the optimal clustering algorithm for the current dataset. Theoretically, K-means clustering assumes that all clusters lie within a sphere with the same radius. When this equal-radius, spherical assumption is violated, as in the case of elliptically distributed data, K-means clustering can behave non-intuitively. Moreover, the number of K of groupings is fixed, and therefore, groupings would be easily violated by a small number of outliers53. Therefore, our results need to be further validated by other clustering algorithm.

Conclusion

This study identified four distinct clusters of patients with OSA solely based on the various PSG features. There was a significant difference in disease outcome among the clusters, and such a difference could not be found in the standard classification of OSA based on AHI severity scores. Distinct characteristics among clusters imply different underlying pathophysiologies. Based on these phenotypes, further understanding and personalized treatment of OSA can be made. Our findings suggest that physicians should consider the AHI score along with other PSG parameters in their assessment of patients with OSA.