Introduction

Excessive upper airway narrowing reportedly contributes to the underlying pathogenesis of obstructive sleep apnea (OSA) and leads to sleep-related symptoms or systemic cardiovascular complications if not properly treated1,2,3,4,5,6,7,8. Upper airway narrowing can be caused by collapse at multiple levels, such as the soft palate, uvula, palatine tonsils, lateral pharyngeal walls, and tongue base3,9. So far, diverse sleep surgeries have been developed to correct the upper airway narrowing of OSA subjects; they focus on enhancing the tension of the upper airway muscles, thereby widening the pharyngeal lumen10,11,12,13,14,15.

The success rate of sleep surgery is known to vary widely, ranging from 45 to 78% in OSA subjects10,11,12, and it is essential to identify OSA subjects suitable for sleep surgery to avoid unnecessary surgical treatment and to provide optimal surgery. In particular, sleep surgery could be more effective if its likelihood of success could be estimated in advance. Until now, prediction of the success rate of sleep surgery has depended on the subjective experience of sleep surgeons, and objective prediction based on the results of polysomnography (PSG) or the subjects’ upper airway findings has rarely been attempted in the clinical field.

The development of a screening model based on clinical features collected from OSA subjects and sleep parameters would be extremely practical in predicting therapeutic outcomes. Such a model can also help sleep physicians provide more adequate therapeutic options to OSA subjects with a high pretest probability of OSA7,14,15,16,17,18. Prediction models reported in the literature were mostly built using clinical features including demographics (age, sex, smoking, and alcohol consumption), comorbidities, anthropometrics, OSA symptoms, physical findings, and physiologic measurements (blood pressure, overnight pulse oximetry, and pulmonary function) collected from either sleep lab- or community-based populations7,14,15,16,17,18,19,20. However, these models use only a limited number of variables, because building a predictive model from numerous anatomical and physiological parameters may not be effective with traditional statistical methods. Moreover, most prediction models for the outcome of sleep surgery tend not to achieve high sensitivity together with high specificity when guiding therapeutic options before treatment, and it is difficult to create standardized predictive models.

Machine learning, an application of artificial intelligence (AI), is the study and development of systems that can learn from and make predictions about data without being explicitly programmed. Machine learning and data-mining methods enable the detection of hidden patterns in a set of data. Thus, machine learning is a potential means of addressing problems in conventional predictive modeling because of its massive parallelism, self-organization, adaptive learning capability, and robustness21. Hence, we considered it possible to create a good prediction model of therapeutic outcomes for sleep surgery by using machine learning methods that repeatedly analyze the differences between subjects in whom sleep surgery succeeded and those in whom it failed. An objective program that predicts surgical success using machine learning could identify OSA subjects who are suitable for surgery and help provide more effective treatment.

The present study aimed to propose an easy-to-use and accurate machine learning model to predict the surgical outcome of OSA subjects. We developed a data mining-driven prediction model using a database with features routinely collected from PSG data and the findings from anatomical and physical examinations.

Methods

Ethics statement

One hundred sixty-three subjects who underwent sleep surgery for diagnosed OSA at the Department of Otorhinolaryngology, Seoul National University Hospital, from March 2010 to September 2019 were recruited and analyzed retrospectively. All subjects participated in the study voluntarily and the medical records of the participants were reviewed retrospectively. Written informed consent was obtained from each participant and the study complied with the Declaration of Helsinki. The protocol of the study was approved by the Institutional Review Board of Seoul National University Hospital (IRB number: 1801-084-915).

Subjects and study design

All subjects underwent sleep surgeries that combined tonsillectomy, palatal procedures, tongue base resection, and nasal surgeries to improve sleep-related symptoms and abnormal sleep parameters. Indications for sleep surgery were (1) age of 18 years or older, (2) a diagnosis of OSA (apnea–hypopnea index (AHI) ≥ 5) based on the International Classification of Sleep Disorders22, and (3) refusal or failure of positive airway pressure (PAP) therapy. Among the subjects who underwent sleep surgeries, only those who had undergone physical examinations and sleep tests (PSG or home sleep apnea testing, HSAT) before and after sleep surgery, and whose sleep parameters were available, were included. Subjects who had (1) a history of previous oropharyngeal OSA surgery, (2) morbid obesity (a body mass index (BMI) > 40 kg/m²), (3) craniofacial abnormality, (4) other significant conditions (genetic syndrome, neuromuscular disease), or (5) AHI < 10 were excluded.

Physical examination

All subjects underwent a preoperative upper airway examination, and the following variables were obtained for analysis: septal deviation, presence of elongated uvula, tonsil size, and palate position. Tonsil size and palate position were graded from 1 to 4 using the Friedman staging system23. Tonsil size grade 0 was defined as post-tonsillectomy status. The presence of nasal pathologies, including septal deviation and inferior turbinate hypertrophy, was evaluated based on intranasal endoscopic findings and was confirmed if the subject complained of nasal obstruction and frequent mouth breathing.

Sleep study

Pre- and postoperative PSG or HSAT studies were carried out in all subjects: PSG in 110 subjects and HSAT in 63 subjects. The results of the tests were analyzed according to the scoring guidelines of the American Academy of Sleep Medicine Task Force 2007 criteria24. An apnea episode was defined as a complete cessation of airflow or a ≥ 90% reduction in the peak thermal sensor signal for at least 10 s. A hypopnea episode was defined as a ≥ 50% reduction in the nasal pressure signal for at least 10 s and ≥ 3% desaturation from baseline or an arousal. The mean period between preoperative PSG or HSAT and surgery was 5.6 ± 6.8 months and the mean period between sleep surgery and postoperative PSG or HSAT was about 4.1 months.

Surgery and surgical outcomes

All subjects were treated with oropharyngeal surgery, nasal surgery, and/or hypopharyngeal surgery according to the results of their physical examination. In this study, oropharyngeal OSA surgery included uvulopalatoplasty, suspension lateral pharyngoplasty, uvuloplasty, and relocation pharyngoplasty. Nasal surgery included septoplasty, turbinoplasty, and endoscopic sinus surgery. Hypopharyngeal surgery included tongue base reduction and epiglottis surgery. Objective surgical outcomes were evaluated using the results of postoperative sleep studies. Surgical success was defined as a postoperative AHI < 20 and a ≥ 50% reduction from the preoperative AHI25. All subjects were divided into two groups: responders and non-responders. A responder was defined as a subject whose postoperative PSG or HSAT met the criteria for surgical success.
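The success criterion defined above can be written as a small function. This is a minimal sketch for illustration only; the function name `is_responder` and the example AHI values are hypothetical and are not taken from the study data.

```python
def is_responder(pre_ahi: float, post_ahi: float) -> bool:
    """Apply the study's success criterion: postoperative AHI < 20
    and at least a 50% reduction from the preoperative AHI."""
    if pre_ahi <= 0:
        raise ValueError("preoperative AHI must be positive")
    reduction = (pre_ahi - post_ahi) / pre_ahi  # fractional reduction
    return post_ahi < 20 and reduction >= 0.5


# Hypothetical examples (events/hour)
print(is_responder(36.5, 15.0))  # True: below 20 and reduced by more than half
print(is_responder(36.5, 21.5))  # False: postoperative AHI is not below 20
print(is_responder(30.0, 18.0))  # False: reduction is only 40%
```

Both conditions must hold, so a subject with a large relative improvement but a postoperative AHI of 20 or more is still counted as a non-responder.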

Prediction modeling and machine learning

Fifteen variables were used for prediction models: demographic parameters (age, sex, and BMI), anatomical parameters in physical examination (presence of septal deviation, tonsil size grade, palate position grade, presence of uvula elongation, and Friedman stage), and parameters from the preoperative PSG or HSAT (preoperative AHI, ratio of non-supine versus supine AHI, sleep efficiency, percent of rapid eye movement [REM] sleep, ratio of REM versus non-rapid eye movement [NREM] sleep AHI, lowest O2 percent, and sleep time with oxygen saturation ≤ 90%). The subjects were randomly divided into a training set, in which the prediction models were derived, and a test set, in which the models were applied and verified: 70% of subjects were in the training set and the remaining 30% were in the test set (Fig. 1). Stratified random sampling was applied to preserve the ratio of responders to non-responders in each set. Logistic regression (generalized linear model) and three different machine learning methods (random forest, gradient boosting machine [GBM], and support vector machine [SVM]) were used to predict surgical outcomes. Random forest and GBM are ensemble models that build a powerful model by combining several decision trees; they can be used for both regression and classification26,27. Random forest consists of several decision trees, each built from a randomly selected subset of the variables to prevent overfitting26. GBM differs from random forest in that it generates a series of trees, each emphasizing the cases misclassified or mispredicted by the previous tree27. SVM generates a line or hyperplane that separates the data into classes while maximizing the margin between classes28. Hyperparameters for the random forest, gradient boosting, and SVM models were determined by fivefold cross-validation and grid search on the training set to achieve the best performance. Missing data, which accounted for 1.2% of overall clinical parameters (e.g. the ratio of REM versus NREM sleep AHI in a patient with undetectable REM sleep), were handled by multiple imputation29. Variable importance was determined in each model30.
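The stratified 7:3 split described above can be sketched as follows. This is an illustrative sketch, not the authors' code (the study used R). The function name `stratified_split`, the seed, and the choice to round each group's training count up are assumptions of this example; the group sizes of 80 responders and 83 non-responders are those reported in the Results.

```python
import random


def stratified_split(labels, train_percent=70, seed=0):
    """Split indices into training and test sets while preserving the
    class ratio. Each class's training count is rounded up, which is
    an assumption of this sketch."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        # Integer ceiling division avoids floating-point rounding issues.
        n_train = -(-len(members) * train_percent // 100)
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return sorted(train), sorted(test)


# 1 = responder, 0 = non-responder
labels = [1] * 80 + [0] * 83
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 115 48
```

Sampling within each outcome group separately keeps the proportion of responders nearly identical in the two sets, which matters when the test set is small.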

Figure 1

Prediction modeling and evaluation process. The subjects (n = 163) were randomly divided into training and test sets in a 7:3 ratio. The training set (n = 115) was used to derive the four prediction models: logistic regression (generalized linear model), random forest, gradient boosting machine, and support vector machine. Each model was then applied to predict surgical success for the subjects in the test set (n = 48); surgical success was defined as a postoperative AHI < 20 and a ≥ 50% reduction from the preoperative AHI. The performance of each model was evaluated against the true reference (surgical outcomes in the postoperative PSG or HSAT). AUC area under curve, PPV positive predictive value, NPV negative predictive value.

Importance scores were used to characterize the general effect of each predictor on the model. For the logistic regression model, the absolute value of the t-statistic for each model parameter determined the importance score. For the random forest model, importance was based on the permutation principle31: the difference in accuracy after permuting each predictor variable was computed and then averaged over all trees. The importance scores in GBM were determined using the same approach as in the random forest model, except that the scores were summed over each boosting iteration. For the SVM model, receiver operating characteristic (ROC) curve analysis was conducted on each predictor, and the area under the curve (AUC) was used as the measure of variable importance.

A physician who specialized in otorhinolaryngology and performed OSA surgery predicted the surgical outcomes of the 163 subjects based on the 15 variables, including anatomical factors and preoperative AHI, and the physician’s prediction was compared with the performance of the machine learning models. The physician predicted the surgical outcome intuitively based on clinical experience, focusing mainly on preoperative OSA severity (AHI) and the anatomical factors of the subjects. If a subject with moderate or severe OSA had intranasal lesions, a tonsil size over grade 3, and less severe retroglossal narrowing, the surgery was predicted to succeed; this reflects how the physician decided on sleep surgery in real practice.
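The permutation principle behind the random forest importance scores can be illustrated with a simplified, model-agnostic sketch. It is an assumption-laden illustration, not the procedure used in the study: here one fitted model is scored on a fixed data set, whereas the random forest computes the accuracy difference per tree and averages over trees. The names `permutation_importance` and `toy_predict`, the cutoff of 80, and the toy data are all hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the model output matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    scores = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and outcome
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        scores.append(sum(drops) / n_repeats)
    return scores


# Toy data: column 0 stands in for lowest O2 saturation, column 1 is noise.
rows = [[72, 5], [75, 1], [78, 9], [81, 3], [84, 7], [88, 2], [90, 8], [93, 4]]


def toy_predict(row):
    """Hypothetical model that uses only the first feature."""
    return 1 if row[0] >= 80 else 0


labels = [toy_predict(r) for r in rows]
scores = permutation_importance(toy_predict, rows, labels)
print(scores)
```

A feature the model never uses has an importance of exactly zero, because shuffling it cannot change any prediction; a feature the model relies on loses accuracy when shuffled.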

Statistical analysis

All statistical analyses were performed using R for Windows version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria). The caret package (http://CRAN.R-project.org/package=caret) was used to obtain prediction models and variable importance30. Continuous variables, such as demographic parameters and PSG or HSAT data, are presented as the mean ± SD; two-sample t-tests were carried out to analyze these variables. Chi-squared tests were performed to assess categorical variables. Predicted probabilities of outcomes were computed for all prediction models. ROC curves were then generated to evaluate how well the models predicted responders and non-responders as confirmed by postoperative PSG or HSAT. The threshold yielding the point on each ROC curve closest to the top-left corner was selected and then used to compute the model accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Because multiple classifiers were trained on a single training set and compared on a single test set, McNemar's test was conducted to compare the four classifiers. McNemar's test compares the number of cases that method A classified correctly and method B incorrectly with the number of cases that method B classified correctly and method A incorrectly. A two-sample proportion test was performed to compare the accuracy of the prediction models with that of the physician’s prediction. A p-value < 0.05 was set as the threshold for statistical significance.
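The threshold selection and McNemar comparison described above can be sketched as follows. This is a minimal illustration on toy data, not the study's R code. The function names and the toy values are hypothetical, and the continuity-corrected form of the McNemar statistic is an assumption of this sketch (the text does not state which form was used).

```python
import math


def confusion(y_true, y_prob, threshold):
    """Return (TP, FP, FN, TN), calling a case positive when prob >= threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    return tp, fp, fn, tn


def closest_top_left_threshold(y_true, y_prob):
    """Pick the threshold whose ROC point is nearest to (FPR=0, TPR=1)."""
    best_t, best_d = None, math.inf
    for t in sorted(set(y_prob)):
        tp, fp, fn, tn = confusion(y_true, y_prob, t)
        sens, spec = tp / (tp + fn), tn / (tn + fp)
        d = math.hypot(1 - sens, 1 - spec)
        if d < best_d:
            best_t, best_d = t, d
    return best_t


def metrics(y_true, y_prob, threshold):
    """Accuracy, sensitivity, specificity, PPV, and NPV at a threshold."""
    tp, fp, fn, tn = confusion(y_true, y_prob, threshold)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }


def mcnemar(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar statistic and p-value (chi-square, 1 df)."""
    b = sum(1 for y, a, m in zip(y_true, pred_a, pred_b) if a == y and m != y)
    c = sum(1 for y, a, m in zip(y_true, pred_a, pred_b) if a != y and m == y)
    if b + c == 0:
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))


# Toy data: 1 = responder, 0 = non-responder
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
threshold = closest_top_left_threshold(y_true, y_prob)
result = metrics(y_true, y_prob, threshold)
print(threshold, result)

pred_a = [1, 1, 1, 0, 0, 0, 0, 0]
pred_b = [1, 0, 0, 0, 1, 1, 0, 0]
stat, p_value = mcnemar(y_true, pred_a, pred_b)
print(stat, p_value)
```

Only the discordant cases (one method right, the other wrong) enter the McNemar statistic, which is why it suits two classifiers evaluated on the same test subjects.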

Results

Clinical examination and sleep parameters of OSA subjects

We recruited 163 subjects who underwent sleep surgeries to resolve excessive narrowing or collapse of the upper airway, including nasal pathologies, palatal obstruction, tonsil hypertrophy, and tongue base narrowing. Of these, 148 subjects were men and 15 were women. The mean age was 43.5 years (range, 18–72) and the mean BMI was 26.2 kg/m². Based on preoperative endoscopic findings, 90.2% of the subjects exhibited septal deviation and 98.2% showed grade 1 or larger tonsils. In addition, 94.5% of the subjects had a palate position over grade I, and Friedman stage II (44.8%) and stage III (47.9%) were the most common stages. Most subjects showed anatomic narrowing at one or more structures of the upper airway, and 69.3% of subjects had excessive narrowing at both the nasal cavity and the oropharynx. Hypopharyngeal narrowing was not severe in these OSA subjects.

The severity of OSA was based on AHI, and the mean preoperative AHI was 36.5 events/hour. Of the subjects, 19 (11.7%) had mild OSA, 51 (31.3%) had moderate OSA, and 93 (57.1%) had severe OSA. PSG findings revealed that the mean lowest O2 saturation was 78.1% and the mean sleep time with O2 saturation < 90% was 7.8% (Table 1).

Table 1 Clinical characteristics and management of 163 subjects.

The clinical data revealed that nasal pathologies were corrected in 150 subjects (92.0%) and that 159 subjects (97.5%) underwent palatal surgery including tonsillectomy. Anatomic narrowing of the tongue base and hypopharynx was less frequent, and only 33.6% of subjects underwent hypopharyngeal surgery such as tongue base resection and partial epiglottectomy (Table 1). Because multi-level narrowing of the upper airway was common among the 163 subjects, 153 subjects (93.9%) underwent multi-level surgery and only 10 subjects (6.1%) underwent single-level surgery. Among the 153 subjects who underwent multi-level surgery, 147 had both nasal and palatal surgeries with or without hypopharyngeal surgery. The success rate of sleep surgery was determined using pre- and postoperative sleep parameters, and the overall severity of OSA improved significantly after sleep surgery in the 163 subjects (mean AHI, 36.5 versus 21.5 events/hour; p < 0.001). PSG data revealed that 80 subjects were classified into the responder group and 83 into the non-responder group.

Comparison of sleep parameters and clinical factors between responders and non-responders

Next, we compared the demographic data, clinical findings from physical examination, and sleep parameters between responders and non-responders to determine the factors that can predict the surgical outcome of OSA subjects (Table 2). There were significant differences in BMI, the presence of nasal pathologies, tonsil size grade, preoperative AHI, lowest O2 saturation, and sleep time with oxygen saturation ≤ 90% between the two groups.

Table 2 Comparison of clinical parameters between responder and non-responder groups.

We found that the mean BMI of the responders was lower than that of non-responders. Among the anatomical parameters, the degree of nasal septal curvature was more severe in non-responders and the mean tonsil grade was higher in responders. Among the parameters from the preoperative PSG, the responders had a higher preoperative AHI than non-responders, the lowest O2 saturation was relatively higher, and sleep time with oxygen saturation ≤ 90% was lower in the responder group. From these clinical data, we were able to determine the clinical factors that differed significantly between responders and non-responders, and we presume that preoperative analysis of these data through machine learning might be effective in predicting the therapeutic outcome of sleep surgery.

Prediction of surgical outcome using machine learning

Among the 163 subjects, 115 (70.6%) were randomly assigned to the training set and the remaining 48 (29.4%) to the test set. To predict surgical outcomes, four models were derived from the training set: logistic regression (Fig. 2a), random forest (Fig. 2b), gradient boosting (Fig. 2c), and SVM (Fig. 2d). Variable importance in each model shows the relative importance of the parameters in predicting the surgical outcome. Variable importance is measured on a scale from 0 to 100, with the most important variable having a value of 100 and the least important variable having a value of 0. In the logistic regression, septal deviation, age, lowest O2 level, and tonsil size were major contributors to surgical outcomes (Fig. 2a). Sleep time with oxygen saturation less than 90%, lowest O2 level, and age were major contributors in the other models (Fig. 2b-2d). BMI was also a major contributor in the random forest and gradient boosting models (Fig. 2b,c). In addition, tonsil size and septal deviation ranked highly in the random forest model (Fig. 2b). The majority of important contributors in the four models correspond to the parameters that were significant in the comparison between responder and non-responder groups: BMI, presence of septal deviation, tonsil size, preoperative AHI, sleep time with oxygen saturation ≤ 90%, and lowest O2 percent. However, some parameters such as age and sleep efficiency, which were not significant in that comparison, also seem to play an important role in the machine learning models. ROC curves were generated for the prediction of surgical outcomes in OSA subjects; the area under the curve (AUC) was highest in the gradient boosting model (Fig. 3). 
In addition, Table 3 presents the accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for each model. The gradient boosting model showed the highest accuracy, while the logistic regression model showed the lowest accuracy. In contrast to the traditional model (logistic regression), which does not analyze complex interactions between variables, machine learning models analyze the interactions and associations between variables, which may improve their predictive performance. The gradient boosting model showed higher accuracy than the logistic regression model (p = 0.033; Fig. 3). Although significance was not reached, the random forest and SVM models also had higher accuracy than the logistic regression model (p = 0.083 and p = 0.109, respectively). The physician predicted surgical outcomes in the full dataset of 163 participants based on the 15 clinical parameters used in the prediction models (Table 3), and the performance of the physician’s prediction was compared with that of the prediction models. There was no significant difference in accuracy between the physician’s prediction and the logistic regression model (p = 0.467). The machine learning models had higher accuracy than the physician’s prediction, although the difference was significant only for the gradient boosting model (p = 0.053, 0.017, and 0.053 in the random forest, gradient boosting, and SVM models, respectively).
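The 0 to 100 importance scale described above is consistent with a min-max rescaling of raw importance scores. The following is a minimal sketch under that assumption; the function name `scale_importance` and the raw values are hypothetical and do not come from the study.

```python
def scale_importance(raw):
    """Rescale raw importance scores so the largest is 100 and the smallest is 0."""
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        # All scores equal: no variable stands out, so report zeros.
        return {name: 0.0 for name in raw}
    return {name: 100.0 * (value - lo) / (hi - lo) for name, value in raw.items()}


# Hypothetical raw scores for three predictors
raw_scores = {"sleep_time_spo2_le_90": 8.0, "lowest_o2": 5.0, "age": 2.0}
scaled = scale_importance(raw_scores)
print(scaled)
```

Because the scale is relative within each model, a score of 0 marks the least important variable in that model rather than a variable with no effect, and scores are not directly comparable across models.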

Figure 2

Importance of the variables in each model. (a) Logistic regression, (b) random forest, (c) gradient boosting, and (d) support vector machine. AHI apnea–hypopnea index, BMI Body mass index, REM rapid eye movement, NREM non-rapid eye movement.

Figure 3

Receiver operating characteristic (ROC) curves for the prediction models. The area under the curve (AUC) is shown for each model. Line colors in the ROC plot indicate the algorithm: logistic regression (black), random forest (green), gradient boosting (red), and support vector machine (blue). GBM, gradient boosting machine; SVM, support vector machine.

Table 3 Performance metrics of machine learning models and physician’s prediction.

Discussion

Here, we found that the therapeutic outcomes of sleep surgeries could be predicted prior to surgery in OSA subjects by using machine learning models based on the demographic features, sleep parameters, and the characteristics of subjects’ upper airway structures. Our data also suggest that the predicted surgical outcome from preoperative machine learning might provide more adequate therapeutic options to OSA subjects in concert with avoidance of unnecessary sleep surgery.

Surgical modification of the upper airway is suitable for select subjects and is often recommended for symptomatic subjects unable to tolerate PAP therapy. There are many discrepancies in the literature regarding the therapeutic outcome of surgical intervention in OSA, and it is clinically important to recommend sleep surgery to OSA subjects who are judged to have a higher chance of success10,11,12. Until now, favorable candidates for sleep surgery have been selected through preoperative evaluation including endoscopic examination of the upper airway, drug-induced sleep endoscopy, cephalometry, and PSG. The predicted success rate of sleep surgery has been somewhat inaccurate and has been determined by the sleep surgeon’s subjective clinical experience. It is widely accepted that more methods for objective prediction of surgical outcome are needed and that the comparison of clinical factors in OSA subjects should focus on detecting the distinctive differences between responders and non-responders to predict the success rate prior to sleep surgery32,33,34,35. The mean age of the subjects in this study was 43.5 years, which may be low considering that the prevalence of OSA peaks at approximately 55 years of age36. However, as many young OSA subjects are reluctant to use PAP every night and seek surgical treatment because of low PAP compliance, the age of subjects who underwent OSA surgery was similarly low in other studies7,14,15. In addition, the ratio of male to female subjects (9:1) in the present study was higher than that reported in a previous study36 (range, 2:1 to 4:1). Therefore, the population who underwent OSA surgery in this study might be younger and more male-dominated than the general population with OSA. Regarding the surgical method, because upper airway obstruction was found at multiple sites, most subjects underwent multi-level surgery. The surgical method was not used as a parameter for deriving the machine learning models, because preoperative anatomical structures determine the surgical method. However, differences in detailed surgical technique and skill between surgeons were not considered, because these factors are too complicated and difficult to quantify.

We also compared clinical factors including sleep parameters and anatomic structures of the upper airway in OSA subjects between responders and non-responders. Our data showed that the mean value of BMI was relatively lower in responders and the success rate of sleep surgery was significantly lower in obese OSA subjects.

The current data also revealed that the presence of nasal pathologies might be closely related to the success or failure of surgery in OSA subjects and that surgical correction of nasal pathologies would help achieve a better outcome of sleep surgery. In addition, the tonsil grade of responders was higher than that of non-responders, and OSA subjects with relatively larger tonsils exhibited a higher success rate after sleep surgery.

Sleep parameters still represent essential factors in determining the therapeutic outcome of OSA subjects. Differences in sleep parameters before therapeutic trials in OSA subjects seem to predict the success rate of OSA treatment including sleep surgery7,14. In the present study, preoperative AHI was significantly lower and the lowest O2 saturation was relatively higher in responders. In addition, sleep time with oxygen saturation ≤ 90% was significantly longer in non-responders than in responders. Postoperative complications were not assessed because of the retrospective nature of the study. Moreover, it was difficult to quantify the subjects' subjective symptoms, and the rate of serious complications lasting 4 to 6 months after sleep surgery was very low, so we presumed that complications or side effects of surgery did not significantly affect the results of polysomnography performed 4 months later.

Next, we conducted a study to establish an appropriate machine learning model using clinical factors and sleep parameters of OSA subjects. The data processing in machine learning is different from traditional statistical analysis. In traditional statistical approaches, researchers commonly choose a pre-designed model that is most appropriate for the data to predict surgical outcome of sleep surgery. Thus, the major limitation of the traditional statistical model is that only theoretically relevant parameters based on previous studies and experience, or significant parameters in the univariate analysis are used. In addition, the studies related to sleep surgery that have already been published were more focused on anatomic factors to evaluate therapeutic outcome and many variables in PSG or HSAT had not been used to predict therapeutic outcomes of sleep surgery2,14,15.

In contrast, machine learning does not start from a pre-structured model; instead, the model is produced from the numerous variables in the data by detecting the underlying patterns. This approach avoids prior assumptions regarding the type of model and the interactions between variables. Considering these analytic advantages, machine learning may find hidden knowledge that remains undetected by conventional statistical analysis. In the present study, logistic regression and three different machine learning methods were used to predict the surgical outcomes of sleep surgery in OSA subjects, and the predicted surgical outcome was calculated in each model based on sleep parameters and the characteristics of subjects’ upper airway structures. Although the importance of variables differed between the models, sleep time with oxygen saturation less than 90% and lowest O2 level were highly ranked variables across all the models. In particular, sleep time with oxygen saturation less than 90% and the lowest O2 were the highest ranked variables in the random forest, gradient boosting machine, and SVM models. Because sleep surgery corrects anatomical structures in the upper airway of OSA subjects, surgeons have often underestimated physiological factors in sleep studies other than preoperative AHI. However, the consistent importance of sleep time with oxygen saturation less than 90% and lowest O2 saturation suggests that preoperative sleep parameters are important for sleep surgery. Among the four models, gradient boosting showed the best performance in AUC, accuracy, sensitivity, specificity, and positive predictive value for predicting surgical outcomes; the lowest performance was that of the logistic regression model. 
The gradient boosting model had higher accuracy than the physician’s prediction and the logistic regression model, which indicates the possible clinical application of machine learning in predicting the clinical outcomes of sleep surgery. The physician expected many surgical failures because of high AHI; therefore, the physician’s prediction showed low sensitivity. Feature selection strategies using information gain, gain ratio, and recursive feature elimination were also tested. However, they did not improve the performance of the models, and the performance of the gradient boosting model was even lower with the selected features (data not shown). Therefore, we used all the variables to derive the machine learning models.

We speculate that the complex interactions between anatomical and physiologic variables are an obstacle to predicting surgical outcome properly, because both anatomical and physiologic factors contribute to the pathophysiology of OSA37. With machine learning, more variables representing both anatomical and physiologic parameters could be used to derive prediction models than with a conventional statistical model such as the logistic regression models that were used in previous studies7,14,15.

Moreover, the machine learning approach may yield a better prediction model by detecting previously unknown associations between anatomical variables and sleep parameters. In particular, the subjects were divided into a training set and a test set in the present study, and the machine learning models derived from the training set were applied to a different cohort (the test set), where they predicted the actual surgical outcomes with acceptable performance. Recently, clustering analysis, a form of unsupervised machine learning, has been performed to investigate phenotypes of OSA38,39. Clustering analysis showed different phenotypes of OSA through various clinical clusters38,39. The present study aimed to predict surgical outcomes directly for candidates for sleep surgery and therefore used supervised machine learning techniques, which may be especially helpful for inexperienced surgeons when deciding on OSA surgery. However, clustering analysis may be helpful in the future for finding a distinct phenotype of OSA with a high surgical success rate.

There are some limitations in this study. First, sleep data were obtained from two different types of sleep study (PSG and HSAT). Therefore, some parameters that are only available in PSG (e.g. central apnea) were not utilized to derive the machine learning models. Further investigation with more OSA subjects who undergo preoperative and postoperative PSG is necessary. Second, subjects undergoing revision surgery were not evaluated, to keep the preoperative status consistent. Hence, these models are applicable only to subjects without a history of oropharyngeal OSA surgery. Third, the clinical data analyzed in this study do not represent the general population of OSA subjects in terms of age and sex ratio. However, differences in age and sex ratio between the general population with OSA and patients who underwent OSA surgery have also been reported in previous studies7,14,15. Lastly, the time point at which the surgical outcome was determined through postoperative PSG or HSAT might affect the measured surgical outcome. Depending on the subjects' personal circumstances, we performed PSG or HSAT at an average of 4 months after sleep surgery, but we were not able to analyze differences in the subjects' sleep parameters according to the time point of PSG or HSAT or to apply them to the machine learning methods. It would have been more appropriate to perform PSG or HSAT at fixed time points in a group of OSA patients with otherwise identical conditions, but it was impossible to recruit such subjects in a retrospective study using clinical data. The time point for determining the surgical outcome should be controlled in a further study with a prospective design.

In summary, machine learning models such as the gradient boosting model can provide accurate predictions of the outcome of sleep surgery based on demographics, anatomical characteristics, and sleep parameters obtained prior to surgery. The surgical outcome predicted by machine learning-derived analysis might support key clinical decisions about adequate therapeutic options for OSA.