Machine learning based algorithms to impute PaO2 from SpO2 values and development of an online calculator

We created an online calculator using machine learning (ML) algorithms to impute the partial pressure of oxygen (PaO2)/fraction of delivered oxygen (FiO2) ratio using the non-invasive peripheral saturation of oxygen (SpO2) and compared the accuracy of the ML models we developed to published equations. We generated three ML algorithms (neural network, regression, and kernel-based methods) using seven clinical variable features (N = 9900 ICU events) and subsequently three features (N = 20,198 ICU events) as input into the models. Data from mechanically ventilated ICU patients were obtained from the publicly available Medical Information Mart for Intensive Care (MIMIC III) database and used for analysis. Compared to seven features, three features (SpO2, FiO2 and PEEP) were sufficient to impute PaO2 from the SpO2. Any of the ML models enabled imputation of PaO2 from the SpO2 with lower error and showed greater accuracy in predicting PaO2/FiO2 ≤ 150 compared to the previously published log-linear and non-linear equations. To address potential hidden hypoxemia that occurs more frequently in Black patients, we conducted sensitivity analysis and show ML models outperformed published equations in both Black and White patients. Imputation using data from an independent validation cohort of ICU patients (N = 133) showed greater accuracy with ML models.

www.nature.com/scientificreports/ The ratio of the partial pressure of oxygen (PaO 2 ) to the fraction of oxygen (FiO 2 ) delivered, or the PaO 2 /FiO 2 , is the reference standard measurement for the assessment of low blood oxygen levels, or hypoxemia, in mechanically ventilated patients with respiratory failure. The PaO 2 /FiO 2 ratio (PF ratio) has predictive value for mortality in patients with acute respiratory distress syndrome (ARDS) 1 and is also part of a severity index scoring system called the Sequential Organ Failure Assessment (SOFA) score that is used to predict severity of illness in patients with critical illness [2][3][4] . Additionally, the PF ratio is relevant to clinical decision-making including the decision to initiate prone positioning in ARDS patients with PF ratios ≤ 150 5 . Currently, measurement of the PF ratio requires invasive arterial blood gas (ABG) sampling and does not provide a continuous measure of the patient's oxygenation. Increasingly, non-invasive monitoring with pulse oximetry is utilized instead of ABGs 6,7 , particularly in low-resource settings where ABG monitoring may not be readily available. In contrast to invasive blood gas sampling, the SpO 2 (peripheral saturation of oxygen)/FiO 2 ratio can be calculated without blood collection, arterial puncture, or blood gas analyzers and may serve as a surrogate for the PaO 2 /FiO 2 ratio. Notably several studies have evaluated the SF ratio in children where non-invasive measurements are increasingly favored [8][9][10] . A few studies have examined non-linear imputation of PaO 2 /FiO 2 from SpO 2 /FiO 2 measurements recorded at the same time 11,12 . These studies have reported that the accuracy of non-linear imputation is superior to loglinear or linear imputation, especially for moderate to severe hypoxemic respiratory failure with ARDS where the PF ratio is < 200 11,13 . 
However, in patients with respiratory failure requiring mechanical ventilation, the optimal equation for imputation of PaO2/FiO2 from the SpO2/FiO2 remains unclear. An algorithm to accurately impute the PaO2 from the SpO2 in mechanically ventilated patients would be beneficial for predictive modeling and for clinical research, where it could facilitate recruitment of patients for clinical trials if an ABG is not available. Ideally, this approach would include only variables that contribute to the relationship between SpO2 and PaO2 but would not require the same invasive ABG measurement as the PaO2. From the clinical perspective, the SF ratio can serve as a less invasive surrogate for the PF ratio in diagnosing ARDS or acute lung injury (ALI), with comparable reliability 14.
The objective of this study is to develop a calculator utilizing machine learning algorithms to impute PaO 2 using non-invasive SpO 2 measurements from mechanically ventilated patients in the Medical Information Mart for Intensive Care (MIMIC) III database 15 and to compare the accuracy of the machine learning models to the previously published non-linear and log-linear equations 11,13 . In this study, three common machine learning approaches (neural network 16 , regression 17 , and kernel-based methods 18,19 ) were tested for regression and classification tasks using data available in MIMIC III 20 with 7 clinical variable features and a subsequent 3-feature model. We created models to perform a regression task to impute PaO 2 from SpO 2 values and a classification task to predict patients with moderate to severe hypoxemic respiratory failure based on a cut-off of a predicted PF ratio ≤ 150 11 . Our overall hypothesis is that a machine learning algorithm would perform better in predicting the PaO 2 from SpO 2 across the entire span of SpO 2 values when compared to the previously published equations.

Methods
The MIMIC-III database v1.4 (https://mimic.physionet.org) is an openly available dataset developed by the Massachusetts Institute of Technology Lab for Computational Physiology 15. It contains de-identified health data associated with approximately 40,000 intensive care unit admissions for patients admitted to critical care units in the Beth Israel Deaconess Medical Center between 2001 and 2012. MIMIC-III is a relational database that contains information on demographics, vital signs, mechanical ventilation status, laboratory tests, medications, and mortality. We also utilized a validation cohort obtained from an existing database of de-identified clinical information from intensive care unit patients with Pseudomonas aeruginosa respiratory isolates from 2 hospitals within the University of Pittsburgh Medical Center (UPMC). This dataset similarly contains information on demographics, mechanical ventilation status, ventilator parameters and laboratory tests. Our study utilizing the MIMIC-III database was determined to be exempt by the University of Pittsburgh Institutional Review Board (STUDY19100068). The University of Pittsburgh Institutional Review Board approved the Pseudomonas aeruginosa ICU respiratory isolates database with a waiver of informed consent (STUDY21030010) and also approved the use of this database as an independent validation cohort (STUDY21090073). All methods were carried out in accordance with relevant guidelines and regulations.
Data processing. For the MIMIC-III database, we identified unique ICU encounters (icustay_id) with mechanical ventilation status. We next identified the lab event PaO2 and chart event SpO2 occurring at the same time as the mechanical ventilation status. To minimize error between matched PaO2 and SpO2, we constrained the time gap between the lab event PaO2 and the chart event SpO2 to be no more than 30 min. To minimize repeated sampling from the same subjects, we restricted the search of PaO2 measurements to the first 24 h of mechanical ventilation and obtained the first PaO2 recorded within this time frame. For chart events including tidal volume (TV), positive end-expiratory pressure (PEEP), FiO2, temperature, and mean arterial pressure (MAP), we constrained the time gap to within 2 h of the selected SpO2 measurement. Treatment with vasoactive infusions was recorded as a categorical variable. Data extraction and processing methods are available at https://github.com/renshuangxia/PaO2PredictorDjango 21. The online calculator is available at https://dikb.org/pa02-predictor.
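The 30 min matching constraint described above can be sketched as follows. This is a minimal illustration only, not the authors' extraction code: the function name `nearest_spo2`, the (time, value) tuple layout, and the sample timestamps are all hypothetical, and the published repository should be consulted for the actual method.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def nearest_spo2(pao2_time, spo2_events, max_gap=timedelta(minutes=30)):
    """Return the (time, value) SpO2 event closest to pao2_time, or None
    if no event lies within max_gap. spo2_events must be sorted by time.
    Hypothetical helper; the data layout is an assumption."""
    times = [t for t, _ in spo2_events]
    i = bisect_left(times, pao2_time)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = spo2_events[max(i - 1, 0):i + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda ev: abs(ev[0] - pao2_time))
    return best if abs(best[0] - pao2_time) <= max_gap else None


# Made-up timestamps and values, for illustration only.
t0 = datetime(2101, 1, 1, 8, 0)
spo2 = [
    (t0 - timedelta(minutes=50), 97),
    (t0 + timedelta(minutes=10), 95),
    (t0 + timedelta(minutes=70), 99),
]
match = nearest_spo2(t0, spo2)                          # event 10 min after the ABG
no_match = nearest_spo2(t0 + timedelta(hours=5), spo2)  # nothing within 30 min
```

The same pattern, with a 2 h window, would apply to the other chart events (TV, PEEP, FiO2, temperature, MAP) relative to the selected SpO2 measurement.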
For the 3-feature model in the UPMC validation cohort, the database was queried for unique ICU patients requiring mechanical ventilation. The validation set cases include 133 discrete individuals with ABGs obtained within 30 min of an SpO2 reading, similar to the constraints defined in the MIMIC-III derivation cohort.
Machine learning methods for regression task. For the regression task we implemented 3 different models: a neural network model, a linear regression model, and support vector regression (SVR), a type of kernel-based modeling. For each model, we applied tenfold cross-validation 22.
For the neural network model, we tested different network structures and various numbers of features to arrive at two models used for comparison with the linear and support vector regression models. One model used seven input features and three hidden layers (16, 8, and 5 neurons for layers 1-3). The other model used only three input features and two hidden layers (6 and 3 neurons for layers 1 and 2). Both final models used a tangent activation function for all layers except the output layer, which used a linear function. Both models were trained for 200 epochs with the Adam optimizer using gradient descent. The learning rate was 0.001 and the batch size was 50 for both models.
For the linear regression model, the output variable is computed as a linear combination of the input variables. We fitted the linear regression equation by the Ordinary Least Squares approach. We used the linear_model.LinearRegression method from scikit-learn 0.22 (https://scikit-learn.org/stable/) with default hyperparameters for predicting PaO2 values. For the SVR model, we tested multiple kernels including the linear kernel, polynomial kernel, and radial basis function (RBF) kernel. Based on the performance in the training data, the RBF kernel was selected.
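As a rough illustration of the regression setup, the sketch below runs tenfold cross-validation for an ordinary least squares model and an RBF-kernel SVR in scikit-learn. The data are synthetic stand-ins (not MIMIC-III), the toy target is arbitrary, and the feature scaling step before the SVR is an assumption that the text does not mention.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (NOT MIMIC-III): columns are SpO2 (%), FiO2, PEEP.
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    rng.uniform(85, 100, n),
    rng.uniform(0.3, 1.0, n),
    rng.uniform(5, 15, n),
])
# Arbitrary toy target standing in for PaO2; for illustration only.
y = 40 + 4.0 * (X[:, 0] - 85) + 30 * X[:, 1] - 1.5 * X[:, 2] + rng.normal(0, 10, n)

models = {
    "linear": LinearRegression(),  # OLS with default hyperparameters
    # Scaling before the SVR is an assumption, not stated in the text.
    "svr_rbf": make_pipeline(StandardScaler(), SVR(kernel="rbf")),
}

cv = KFold(n_splits=10, shuffle=True, random_state=0)
rmse = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv,
                             scoring="neg_root_mean_squared_error")
    rmse[name] = -scores.mean()  # mean RMSE across the ten folds
```

Comparing the entries of `rmse` mirrors the model comparison reported in the Results, although the numbers from synthetic data carry no clinical meaning.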
Machine learning methods for classification task. We utilized PaO 2 /FiO 2 ≤ 150, an accepted threshold previously utilized to capture patients with moderate to severe disease meeting the criteria for ARDS 11,13 . We utilized this cut-off to test machine learning methods to predict this diagnostic threshold PaO 2 /FiO 2 ≤ 150 for the different imputation techniques. We implemented three classification models including neural network, logistic regression, and a kernel-based model, SVM.
For each machine learning model, we applied a tenfold cross-validation and calculated the sensitivity, specificity, likelihood ratios, diagnostic Odds Ratio (OR), Area Under Receiver Operating Characteristic curve (AUROC), F1 score and Bayesian Information Criterion (BIC) to compare across models. The two neural network models for classification were similar to the neural networks used in regression, except the output layer used the sigmoid function. As with the regression models, various topologies were tested to arrive at the final two multi-layer perceptron (MLP) classifiers, one with an input size of seven features and the other with an input size of three features. The hidden layer size is (12,8,6,4,4) for the model with seven input features. For the other model which utilizes only three input features, we used two hidden layers of size 6 and 3. All hidden layers used the tangent activation function. We trained both models for 200 iterations with Adam optimizer, setting seven feature classifier momentum value as 0.8 and three feature classifier momentum value as 0.6. The learning rate was 0.001 and the batch size was 200 for both models.
In addition, we implemented a basic logistic regression model for classification as well as an SVM model, which classifies examples with an optimal hyperplane. Logistic regression uses the logistic function to model a binary dependent variable. We utilized the linear_model.LogisticRegression method provided in the scikit-learn library without regularization; all other arguments were left at their defaults. For the SVM model, we compared the results obtained with different kernels, and the RBF kernel outperformed the others. Methods were otherwise similar to those used in the regression task.
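The classification statistics listed above all derive from a 2 x 2 confusion matrix for the PaO2/FiO2 ≤ 150 threshold. The following sketch shows the arithmetic; the helper names and the eight sample cases are hypothetical, and the function assumes that none of the four confusion-matrix cells is zero.

```python
def pf_le_150(pao2, fio2):
    """True when the PaO2/FiO2 ratio is at or below the 150 cut-off."""
    return pao2 / fio2 <= 150


def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, likelihood ratios, diagnostic odds ratio
    and F1 from boolean labels. Assumes every confusion-matrix cell is
    non-zero (otherwise a division by zero occurs)."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "diagnostic_or": lr_pos / lr_neg,
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Made-up cases: (observed PaO2, FiO2) and the corresponding imputed PaO2.
observed = [(60, 0.6), (70, 0.5), (90, 0.4), (120, 0.5),
            (55, 0.5), (150, 0.6), (80, 0.8), (100, 0.4)]
imputed = [65, 80, 85, 70, 60, 140, 130, 95]

y_true = [pf_le_150(p, f) for p, f in observed]
y_pred = [pf_le_150(p_hat, f) for p_hat, (_, f) in zip(imputed, observed)]
metrics = diagnostic_metrics(y_true, y_pred)
```

AUROC and BIC are omitted here because they need predicted probabilities and a likelihood, respectively, rather than hard labels.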

Comparison of machine-learning based algorithm to published non-linear and log-linear equations.
We compared the performance of our machine learning algorithms to the previously published equations. For the non-linear equation from Brown et al. 11, the PaO2 was imputed from the SpO2 as shown in Eq. (1), where PO2 = PaO2, S = SpO2 and F = FiO2. Where the recorded SpO2 was 100% (i.e., 1.0), the SpO2 was substituted with 0.996 because the equation does not permit calculation at S = 1.0.
Non-linear equation to impute PaO 2 from the SpO 2 (Reprinted with permission -see Acknowledgment section).
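The equation body is not reproduced in this text. The non-linear imputation is commonly described as Ellis's closed-form inversion of the Severinghaus oxyhemoglobin dissociation equation, and the sketch below assumes that form; the exact expression printed in Brown et al. 11 should be checked before relying on it. Function names are hypothetical, and S is expressed as a fraction between 0 and 1.

```python
import math


def _cbrt(x):
    """Real cube root that also handles negative inputs."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def severinghaus_spo2(pao2):
    """Forward Severinghaus equation: saturation fraction from PaO2 (mmHg)."""
    return 1.0 / (23400.0 / (pao2 ** 3 + 150.0 * pao2) + 1.0)


def impute_pao2_nonlinear(spo2, substitute=0.996):
    """Assumed Severinghaus-Ellis inversion: PaO2 (mmHg) from a saturation
    fraction. A recorded saturation of 1.0 is replaced by 0.996, as in the
    text, because 1/S - 1 would otherwise be zero."""
    s = substitute if spo2 >= 1.0 else spo2
    a = 11700.0 / (1.0 / s - 1.0)
    b = math.sqrt(50.0 ** 3 + a * a)
    return _cbrt(a + b) + _cbrt(a - b)


def impute_pf_nonlinear(spo2, fio2):
    """Imputed PaO2/FiO2 ratio, with FiO2 given as a fraction."""
    return impute_pao2_nonlinear(spo2) / fio2


# Round trip: inverting the forward equation should recover the PaO2.
recovered = impute_pao2_nonlinear(severinghaus_spo2(60.0))
```

Because the inversion is the exact algebraic solution of the forward cubic, the round trip recovers the starting PaO2 up to floating-point error, which is a convenient self-check for any reimplementation.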

Sensitivity analysis.
To compare the performance of our machine learning algorithms to previously published equations, a sensitivity analysis was performed separately in patients of self-reported White race and of self-reported Black race. For each machine learning model, we implemented a tenfold cross-validation and calculated the sensitivity, specificity, likelihood ratios, diagnostic OR, AUROC, F1 score, RMSE (root-mean-square error), and BIC to compare across models.

Results
A parsimonious three features model is sufficient to impute PaO2/FiO 2 ratio using a large dataset. An overview of the machine learning tasks is outlined in Fig. 1. We initially chose seven relevant features from the chart events (SpO 2 , FiO 2 , TV, MAP, temperature, PEEP and vasopressor administration) representing recorded bedside measurements that were independent from an invasive arterial blood gas measurement. When applying the seven features to impute the PaO 2 , the final data set contained 9900 unique ICU encounters from 9302 mechanically ventilated patients (Supplementary Table e1). The relationship between SpO 2 /FiO 2 (S/F) and the PaO 2 /FiO 2 (P/F) was examined in dataset 1 containing 9900 unique ICU events from the MIMIC-III database and was best described by a log-linear relationship between the transformed logarithmic value of the SF and PF ratios as previously described by Pandharipande et al. 13 (Supplementary Fig. e1). The relationship between S/F and P/F ratios showed high variance across the distribution of mechanically ventilated subjects (R 2 = 0.21).
For the regression task, we derived the RMSE and BIC for each of the different seven feature machine learning models (neural network, linear regression, support vector regression) to assess the performance of the imputation techniques. The RMSE and BIC of the three machine learning methods are shown in Supplementary Table e2. All the machine learning models outperformed the previously published non-linear and log-linear equations as shown by lower RMSE score; the same was observed for subset 1 (SpO 2 < 97%). For the classification task, the three machine learning methods achieved similar classification performance according to F1 scores, as shown in Supplementary Table e3; the same pattern was observed for subset 1 (SpO 2 < 97%).
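For readers who want to reproduce the two regression statistics, a minimal sketch follows. The RMSE is standard; the BIC expression shown is the common Gaussian-error form, n ln(RSS/n) + k ln(n), which is an assumption because the text does not state which formulation was used. The sample values are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and imputed values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def bic_gaussian(y_true, y_pred, n_params):
    """BIC under a Gaussian error model: n*ln(RSS/n) + k*ln(n).
    Assumed formulation; the text does not specify one."""
    n = len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return n * math.log(rss / n) + n_params * math.log(n)


# Made-up observed and imputed PaO2 values (mmHg), for illustration only.
observed = [80, 95, 110, 70]
imputed = [85, 90, 100, 75]
err = rmse(observed, imputed)
bic = bic_gaussian(observed, imputed, n_params=3)
```

Lower values of both statistics indicate a better fit; the BIC additionally penalizes models with more parameters.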
To improve practicality of the method at the bedside, we attempted to use the smallest number of features possible to predict the PaO 2 or PaO 2 /FiO 2 ratio from the regression and classification tasks, respectively. Compared to the other measured variables, PEEP had the strongest correlation with PaO 2 /FiO 2 (r = − 0.31) outside of the SF ratio (SpO 2 /FiO 2 ) ( Table 1). Using this information, we created a 3-feature model using SpO 2 , FiO 2 and PEEP. As compared to seven features, three features were sufficient to impute PaO 2 /FiO 2 ratio with a similar degree of accuracy. The 3-feature model was therefore utilized in the remainder of the analysis for the machine learning algorithms. The final 3-feature data set (dataset 2) contained 20,198 ICU encounters from 17,818 unique patients ( Table 2). Forty percent of subjects were of female sex and the mean age was 64 years. The degree of hypoxemic respiratory failure, as measured by the PaO 2 /FiO 2 ratio 1 , showed a distribution in which 26% had mild respiratory failure (PaO 2 /FiO 2 = 201-300), 22% had moderate respiratory failure (PaO 2 /FiO 2 = 101-200), and 8% had severe respiratory failure (PaO 2 /FiO 2 ≤ 100).   Fig.  e2). There was decreasing accuracy at higher PaO 2 /FiO 2 ratios for all the methods examined.
Machine learning models show improved performance for the classification task. We compared the performance of the machine learning models with the log-linear and non-linear equations using F1 scores. Similar to the findings for the regression task, all three machine learning models performed better in the whole dataset than the log-linear and non-linear equations (Table 4). When the dataset was limited to SpO2 < 97% (subset 2), the machine-learning methods performed slightly better than the log-linear equation and better than the non-linear equation (Table 4). The F1 scores for all three machine learning methods were similar when using the whole dataset (dataset 2) and for subset 2 where SpO2 < 97%. As shown in Fig. 2, when comparing the 3 machine learning models to one another, the neural network performed slightly better in the whole dataset.
Sensitivity analysis. Hidden hypoxemia, or the discrepancy between peripheral oxygen saturation (SpO2) measurements and the arterial oxygen saturation (SaO2) measured by ABG, was recently identified to occur in 5.3-5.5% of patients in the ICU setting 23,24. Hidden hypoxemia, defined as SpO2 ≥ 88% despite an SaO2 ≤ 88%, was observed in all races and ethnic groups but occurs with higher prevalence in Black patients 23,24. We conducted a sensitivity analysis to compare the performance of the machine learning models between self-reported Black and White race in dataset 2. Among Black patients, machine learning algorithms outperformed both non-linear and log-linear equations in terms of the regression task.
Table 4. Prediction performance of machine learning classification models based on three features. Prediction performance statistics were calculated for the machine learning models based on three features and compared to the Log-linear and Non-linear methods for the entire dataset (20,198 ICU events; entire dataset 2) and for a subset of the events where SpO2 < 97% (3280 events; subset 2). Variables included in the 3-feature machine learning models are SpO2, FiO2, and PEEP. SVR Support vector regression, AUROC area under receiver operating characteristic curve, BIC Bayesian information criterion, LR likelihood ratio, OR odds ratio.

Entire dataset 2 | Neural network | Logistic regression | SVM | Log-linear | Non-linear
Subset 2 (SpO2 < 97%) | Neural network | Logistic regression | SVM | Log-linear | Non-linear
Total, No (entire dataset 2) | 20,198 | 20,198 | 20,198 | 20,198 | 20,198

Machine learning algorithms show a better accuracy in the validation cohort.
We developed an online calculator using the three machine learning algorithms requiring three inputs (SpO2, FiO2, and PEEP): https://dikb.org/pa02-predictor. The calculator was then utilized in an independent validation cohort of 133 mechanically ventilated ICU patients to impute the PaO2 in a regression task. The imputed PaO2 was compared to the actual PaO2 obtained by ABG. The accuracy of the machine learning algorithms was compared to the non-linear equation and was reported as the RMSE and adjusted R-squared (Table 5). The neural network and SVM models had lower RMSE than the previously published non-linear equation, demonstrating improved performance in the imputation of PaO2. Adjusted R-squared was also higher in the neural network and SVM models.
To clarify the models proposed in this study, the following example is worth mentioning: with the assumption of SpO 2 = 100%, FiO 2 = 0.6, and PEEP = 5 cmH 2 O (observed PaO 2 /FiO 2 = 190), the predicted PaO 2 is estimated as 203.0, 186.2, 188.4 using neural network, SVM, and regression models, respectively, while the estimate of conventional non-linear model is 167 (Table 6).

Discussion
We used the publicly available MIMIC-III database as a derivation cohort to develop and evaluate machinelearning algorithms to impute PaO 2 utilizing non-invasive SpO 2 in patients who are mechanically ventilated. We tested three machine learning models (neural network, linear regression and SVR) first using seven available clinical variables SpO 2 , FiO 2 , PEEP, TV, MAP, temperature, and vasopressor administration to impute the PaO 2. We subsequently used a parsimonious model with three clinical variables (SpO 2 , FiO 2 and PEEP) to noninvasively impute PaO 2 in both a derivation and validation cohort. The imputation of PaO 2 from the regression tasks enabled us to derive the PaO 2 /FiO 2 , a clinically meaningful ratio with predictive value 1,25 . Additionally, we performed a classification task to predict PaO 2 /FiO 2 ≤ 150, a cut off that has been used to capture those patients with moderate to severe respiratory failure in ARDS cohorts 11,13 and to guide patient management 5 . To increase the clinical applicability of our work, we also developed an open-access online calculator to impute the PaO 2 using the 3-feature model requiring only non-invasive bedside parameters in mechanically ventilated patients. Our calculator showed improved accuracy in the imputation of the PaO 2 when compared to the previously published non-linear equation in both our initial cohort and the validation cohort.
To develop the machine learning algorithms, we initially evaluated clinical variables such as PEEP, TV, MAP, temperature, and vasopressor administration that are easily obtained at the bedside. TV, MAP, temperature and vasopressor use demonstrated a stochastic distribution and did not significantly alter the accuracy of the machine-learning based algorithms and were therefore removed to create the 3-feature model (SpO2, FiO2, PEEP). This 3-feature model provides a framework for generalizability using large datasets of mechanically ventilated patients. We considered other clinical variables such as skin pigmentation, pulse oximeter location, oximeter manufacturer, vasopressor infusion, and laboratory variables such as serum bicarbonate, serum chloride, serum creatinine, and serum sodium; however, a prior prospective study showed that these variables provided negligible improvement in the accuracy of imputation 11, and they were therefore not included. It is nonetheless worth mentioning that recent studies showed a discrepancy between peripheral oxygen saturation (SpO2) measurements and the arterial oxygen saturation (SaO2) measured by ABG. This discrepancy, defined as SpO2 ≥ 88% despite an SaO2 ≤ 88% and referred to as hidden hypoxemia, was present in all racial and ethnic groups but showed higher prevalence in Black patients 23,24.
Table 5. RMSE of the 3-feature machine learning models regression task compared to the published non-linear equation. The PaO2 was imputed using an online calculator of the three machine learning models using SpO2, PEEP, and FiO2 from a validation cohort of 133 mechanically ventilated ICU patients. Subsequently, the RMSE and adjusted R2 for the 3-feature machine learning models were calculated and compared to the published non-linear equation. A lower RMSE and higher adjusted R2 indicate higher accuracy. SVR Support vector regression, RMSE root-mean-square error.
Considering this discrepancy between SpO 2 and arterial oxygen saturation occurs more frequently in Black patients 24 , we performed a sensitivity analysis showing that our machine learning algorithms outperform previously published equations both in the Black and White race.
Our study shows that a machine learning based method for both the regression and classification task, when applied to the MIMIC-III critical care database, improved the accuracy compared to the previously published non-linear and log-linear imputation methods. As is evidenced by comparing the F1 and discrimination measures in Table 4, the performance improvement was more modest for the classification task in subset 2 where SpO 2 < 97%. A possible explanation is that there were fewer ICU events (smaller N) per group in the subset.
Prior studies have examined the relationship between SF and PF ratios for patients with ARDS to determine whether the non-invasive SF ratio can be substituted for the invasively obtained PF ratio 11,13,26. Pandharipande et al. studied matched measurements of SpO2 and PaO2 in a heterogeneous population (i.e., patients undergoing general anesthesia and patients with ARDS) to determine the association between SF and PF ratios in order to calculate the respiratory parameter of the SOFA score 13. In their study, matched SpO2 and PaO2 values were obtained from two groups of patients: Group 1 comprised the derivation set and was obtained from patients undergoing general anesthesia at a single center, and Group 2 comprised a validation set utilizing data from patients enrolled in a multi-center randomized clinical trial examining low versus high tidal volume for acute respiratory management of ARDS (ARMA) 27. All SpO2 values > 97% were excluded from analysis in order to restrict matched data to those values likely to be within the linear range of the oxyhemoglobin dissociation curve. Data from 4728 matched SpO2 and PaO2 measurements showed that the relationship was best described by a log-linear equation with slight variation based upon the level of PEEP. In the setting of a more heterogeneous population, a poorer correlation was noted between SF and PF ratios. The regression equation of Log(PF) = 0.48 + 0.78 × Log(SF) yielded an R-squared of 0.31 13.
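The log-linear equation quoted above can be evaluated directly. The sketch below assumes base-10 logarithms (the text writes only "Log"; natural logarithms would give implausibly low PF ratios for typical SF values) and takes SpO2 as a percentage with FiO2 as a fraction. The function name is hypothetical.

```python
import math


def impute_pf_loglinear(spo2_pct, fio2, intercept=0.48, slope=0.78):
    """Imputed PF ratio from Log(PF) = 0.48 + 0.78 * Log(SF).
    Base-10 logarithm is an assumption; SpO2 in percent, FiO2 as a fraction."""
    sf = spo2_pct / fio2
    return 10 ** (intercept + slope * math.log10(sf))


# Illustrative input only: SpO2 95% on FiO2 0.5 gives an SF ratio of 190.
pf_hat = impute_pf_loglinear(95, 0.5)
```

Because the slope is below 1, the imputed PF ratio rises more slowly than the SF ratio, which is consistent with the flattening of the oxyhemoglobin dissociation curve at high saturations.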
Additionally, a retrospective analysis of arterial blood gas measurements from three ARDS Network studies compared the performance of non-linear, log-linear and linear imputation methods to derive PaO 2 from the SpO 2 12 . In all patients (N = 1184), the nonlinear imputation was equivalent to log-linear imputation. However, in those patients with SpO 2 < 97% (N = 707), the nonlinear imputation showed lower error than either linear or log-linear equations. A prospective study was subsequently conducted in patients enrolled in the Prevention and Early Treatment of Acute Lung Injury network 11 to assess the performance of the non-linear equation to impute PaO 2 from the SpO 2 and compare it to the prior log-linear and linear equations 11,13,26 . This study included 1034 arterial blood gases from 703 patients, of which 650 arterial blood gases had matched SpO 2 < 97%. The non-linear equation showed lower error and better identified moderate to severe ARDS patients (defined in the study as PaO 2 /FiO 2 ≤ 150) when compared to log-linear or linear imputation methods.
In our study, we similarly found a high degree of variance across SpO 2 values and corresponding measured PaO 2 values which was noted when we formally examined the relationship between SF and PF. This may be attributed to the retrospective nature of the data collection and the numerous variables that may confound the reliability of a recorded SpO 2 measured non-invasively to reflect the arterial SaO 2 8,10,12 . Despite this limitation, the machine learning algorithms performed better on both regression and classification tasks when compared to the log-linear and non-linear published equations.
We used a validation cohort to show improved accuracy for the neural network and kernel-based machine learning algorithms when compared to the previously published non-linear equation. Another strength of our study is the development of an online calculator that can be used to impute the PaO 2 from three noninvasive parameters (SpO 2 , FiO 2 and PEEP) and may serve as a tool for future studies in large electronic health record datasets. Additionally, our machine learning models allow for the evaluation of all mechanically ventilated patients with available data rather than narrowing the analysis to a specific population such as those with ARDS. Given the inclusion of all mechanically ventilated patients, a significant number of SpO 2 values were > 97% (N = 8510 for seven features and N = 16,918 for three features). While this reduced the accuracy of the imputed PF ratio, particularly above a certain threshold, the machine learning models were applied to the data without a pre-defined restriction placed upon the range of SpO 2 values and showed better performance than both the log-linear and non-linear equations on both the regression and classification tasks.
Imputation of PaO 2 from SpO 2 has been increasingly implemented in clinical and research settings using previously published equations for subjects that do not have invasive ABG measurements readily available. This underscores the need to improve upon existing published equations and the clinical importance of machine learning models proposed. Machine learning models are currently being used to answer numerous clinical questions; these models have substantially impacted different scopes of medicine from early-warning systems for sepsis to imaging diagnostics 24 . Herein, we proposed three machine learning algorithms which can provide a framework for future investigations. The online calculator, on the other hand, can provide feasible prediction of PF ratio from SF ratio at the bedside for clinicians working in the critical care settings.
We showed that machine learning models outperformed previously published equations in terms of imputing PaO2 from SpO2 in the mechanically ventilated adult population. Consistent with our findings, Sauthier et al. utilized neural network models to validate a continuous and noninvasive method of hypoxemia estimation in a pediatric population 28. They utilized a convolutional neural network (CNN), a long short-term memory network (LSTM), and a multilayer perceptron (MLP) to impute PaO2. Intriguingly, they concluded that bias was lowered when using neural network models compared to mathematical equations. In summary, any of the tested machine learning models applied to the MIMIC-III dataset enabled imputation of PaO2 from the SpO2 with lower error and provided greater accuracy in predicting PaO2/FiO2 ≤ 150 across the entire range of SpO2 examined when compared to that of published equations in two independent cohorts. All machine learning models proposed in this paper outperformed log-linear and non-linear equations. Future work will be required to prospectively test ML algorithms for use in clinical practice. Additionally, our study provides a clinically relevant online calculator for the imputation of the PaO2 from the 3-feature machine learning models. The calculator requires the input of SpO2, FiO2, and PEEP, all of which are non-invasive and readily available at the bedside of mechanically ventilated patients.