Continuous noninvasive blood gas estimation in critically ill pediatric patients with respiratory failure

Patients supported by mechanical ventilation require frequent invasive blood gas samples to monitor and adjust the level of support. We developed a novel, transparent blood gas estimation model that provides continuous monitoring of blood pH and arterial CO2 in the gaps between blood draws, using only noninvasive data sources readily available in ventilated patients. The model was trained on a derivation dataset (1,883 patients, 12,344 samples) from a tertiary pediatric intensive care center and tested on a validation dataset (286 patients, 4,030 samples) obtained later from the same center. The model uses pairwise non-linear interactions between predictors and provides point estimates of blood gas pH and arterial CO2 together with a range of prediction uncertainty. Estimates were within the Clinical Laboratory Improvement Amendments of 1988 (CLIA) acceptable limits for blood gas machines in 74% of pH samples and 80% of PCO2 samples. Prediction uncertainty improved estimation accuracy by 15% by identifying, and abstaining on, a minority of high-uncertainty samples. The proposed model estimates blood gas pH and CO2 accurately in a large percentage of samples. Its abstention recommendation, coupled with a ranked display of the top predictors for each estimate, lends itself to real-time monitoring in the gaps between blood draws, and the model may help users determine when a new blood draw is required and defer blood draws when they are not needed.

Patients in severe respiratory distress are often supported by intubation and mechanical ventilation. The correct level of ventilation is critical for life support without further lung injury. Blood gas pH and arterial CO2 partial pressure (PaCO2), obtained through invasive blood draws, are relied upon to help determine ventilator settings. In the acute phase of injury, frequent blood draws are needed to determine blood gases 1. This is especially difficult in pediatric patients, in whom arterial access, pain, and blood loss are major concerns 2; moreover, arterial catheters are an under-recognized source of infection 3. Improvements in pulse oximetry, which provides continuous monitoring of oxygenation, have proved helpful in children and have shifted practice patterns in pediatric intensive care toward reduced use of arterial catheters 1,2. With respect to ventilation, exhaled CO2 monitored through capnography is correlated with blood gas (BG) CO2 tension, but it has not been accepted as providing the accuracy that continuous pulse oximetry does. Nevertheless, BG sampling frequency decreases when capnography is used 4-6, indicating that clinicians informally use capnography to determine the direction of blood pH changes. There has long been interest in estimating BG pH and PCO2 from end-tidal CO2 (PetCO2) 7-9, and over the past few years several new investigations have estimated these values noninvasively in pediatric patients 10-13. These studies show that PetCO2, together with other noninvasive measurements, can be used to estimate blood pH and PCO2 without taking an invasive blood sample. Nonetheless, challenges to clinical adoption remain: prediction accuracy outside the normal pH range is low 10,11, and clinical confidence in the predicted values is lacking.
The goal of this study is to develop continuous BG estimation that is accurate across all pH ranges for mechanically ventilated children with a wide range of severity of lung injury and hemodynamic support. Special consideration was given to making the model suitable for clinical adoption. Estimates are accompanied by a prediction uncertainty range, and the model can abstain from estimation when prediction uncertainty is high, for example during large physiological fluctuations. An investigation of estimation accuracy over time provides guidance on the timeframe in which continuous noninvasive monitoring can be used. To further help users interpret and understand an estimated BG value, the predictors contributing most to the estimate are ranked and displayed. We developed the model on a large derivation dataset spanning 5 years using novel modeling techniques and tested it on unseen validation data.

Methods
Following the guidelines of Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) 14, we developed and validated a BG estimation model that either provides an estimate of the current pH and PCO2 or abstains from estimation.
Study population. The retrospective derivation and validation datasets were collected from critically ill pediatric patients admitted to a tertiary pediatric intensive care center with a multidisciplinary pediatric medical-surgical ICU (PICU) and a pediatric cardiothoracic intensive care unit (CTICU), as shown in Table 1. Figure 1 illustrates the data extraction steps for both cohorts. The derivation and validation cohorts spanned different time periods, and samples from the same patient could not appear in both cohorts. Use of the dataset was approved, with a waiver of informed consent, by the Children's Hospital Los Angeles Institutional Review Board, and the study protocol was approved by the Philips Internal Committee for Biomedical Experiments. All experiments were performed in accordance with relevant guidelines and regulations.
Derivation cohort extraction. The derivation dataset was collected from patient measurements made between September 2012 and May 2017 and stored prospectively in the hospital's dedicated critical care SQL Server database (Microsoft, Redmond, WA, USA). A dataset containing BG, granular physiological, and ventilator data collected within ± 1 min of BG sample time was extracted from these medical records. pH and PCO2 measurements were obtained from arterial and capillary blood gases, which made up 90% and 10% of the samples, respectively. In model development and analyses, arterial and capillary BG were used interchangeably given the closeness of capillary BG to arterial BG 15. Samples with a missing PetCO2 measurement or medical record number (MRN) were removed. Derivation data were resampled to balance the pH distribution and improve model performance in sparsely represented pH regions (eFig. 1). Patients on extracorporeal membrane oxygenation support were removed. A plausibility check was performed on measurement values as shown in eTable 1. Samples from the same patient were linked in time, and samples without a prior BG within 24 h were removed. Processing ensured that variables were correctly linked in time and that outcome variables (pH and PaCO2) were always linked to predictor variables measured earlier. The final derivation cohort was split into 5 outer and 5 inner folds for cross-validation (CV) using nested CV 16. […] using ridge regression to remove collinearity between predictors and spurious correlations between predictors and targets.
[Table: Predictor variables]
www.nature.com/scientificreports/
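The pH-balanced resampling and fold assignment described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the bin edges, the per-bin sample count, sampling with replacement, and assigning CV folds by patient (so that one patient's samples never straddle folds) are all assumptions, and `balance_by_ph` and `patient_folds` are hypothetical names.

```python
import numpy as np

def balance_by_ph(ph, bin_edges, n_per_bin, rng=None):
    """Return sample indices that resample every pH bin to n_per_bin samples.

    Hypothetical sketch: bin edges, per-bin count, and sampling with
    replacement are illustrative assumptions.
    """
    rng = np.random.default_rng(rng)
    ph = np.asarray(ph, dtype=float)
    # np.digitize maps each pH value to the index of its bin
    bins = np.digitize(ph, bin_edges)
    chosen = []
    for b in np.unique(bins):
        members = np.flatnonzero(bins == b)
        # Sampling with replacement lets sparse bins be upsampled
        chosen.append(rng.choice(members, size=n_per_bin, replace=True))
    return np.concatenate(chosen)

def patient_folds(patient_ids, n_folds=5, rng=None):
    """Assign each sample a fold index so all samples of a patient share a fold."""
    rng = np.random.default_rng(rng)
    ids = np.asarray(patient_ids)
    shuffled = rng.permutation(np.unique(ids))
    fold_of = {pid: k % n_folds for k, pid in enumerate(shuffled.tolist())}
    return np.array([fold_of[p] for p in ids.tolist()])
```

In a nested-CV setting one would typically resample only within training folds so that duplicated samples cannot leak into test folds; the text above does not state at which stage resampling was applied.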

Statistical analysis. Model building.
A novel pairwise regression model was developed to model interactions between one key predictor (previous pH) and the non-key predictors. This model allows differences in physiology between patients in different pH ranges to be modeled independently while representing monotonic relationships between non-key predictors and BG. The model is mathematically expressed as […], where $y$ denotes the predicted target, $x_i, i = 1, \dots, K$ denote the $K$ non-key predictors, $z$ denotes the key predictor, and $f_j(z), j = 1, \dots, M$ denote sigmoid functions centered at $M$ different values of the key predictor. The model is interpretable: pairwise interactions can be visualized 17 as shown in Fig. 2, and the contribution of each predictor can be separated to generate predictor importance rankings for each estimate. Data processing, modeling, and analyses were performed in Python.
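Because the model equation did not survive extraction, the following is only a minimal sketch of one pairwise form consistent with the definitions above, assuming $y = b + \sum_{j} v_j f_j(z) + \sum_{i}\sum_{j} w_{ij}\, x_i f_j(z)$ with logistic $f_j(z) = \sigma\big((z - c_j)/s\big)$. The sigmoid width $s$, the centers $c_j$, the separate $f_j(z)$ terms, and fitting by closed-form ridge regression are assumptions; `pairwise_features` and `fit_ridge` are hypothetical names.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def pairwise_features(X, z, centers, scale=0.05):
    """Build x_i * f_j(z) interaction features plus f_j(z) and an intercept.

    X: (n, K) non-key predictors; z: (n,) key predictor (previous pH).
    f_j(z) = sigmoid((z - c_j) / scale) is an assumed sigmoid form.
    """
    X = np.asarray(X, dtype=float)
    z = np.asarray(z, dtype=float)
    c = np.asarray(centers, dtype=float)
    F = sigmoid((z[:, None] - c[None, :]) / scale)               # (n, M)
    inter = (X[:, :, None] * F[:, None, :]).reshape(len(z), -1)  # (n, K*M)
    return np.hstack([inter, F, np.ones((len(z), 1))])

def fit_ridge(A, y, alpha=1e-3):
    """Closed-form ridge solution (A^T A + alpha I)^-1 A^T y (intercept penalized too)."""
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y)
```

For a fixed $z$ the prediction is linear, and therefore monotonic, in each $x_i$, while the effective slope $\sum_j w_{ij} f_j(z)$ can change across pH ranges of the key predictor.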
Prediction uncertainty around point estimates and abstention. Prediction uncertainty was modeled so that the model could abstain from making predictions on high-uncertainty samples and accept only predictions likely to be accurate. The abstention rate was empirically set to 25%, meaning that 75% of samples were estimated.
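Abstaining at a fixed rate on the most uncertain samples, and comparing the result against abstaining on the same fraction at random (the comparison reported in the Results), can be sketched as below. This assumes per-sample absolute errors and uncertainties are already available as arrays; averaging the random baseline over repeated draws is an illustrative choice, and the function names are hypothetical.

```python
import numpy as np

def p95_after_abstention(abs_err, uncert, rate):
    """95th-percentile absolute error after abstaining on the `rate`
    fraction of samples with the highest prediction uncertainty."""
    abs_err = np.asarray(abs_err, dtype=float)
    n_keep = int(round(len(abs_err) * (1.0 - rate)))
    keep = np.argsort(uncert)[:n_keep]  # indices of the least uncertain samples
    return float(np.percentile(abs_err[keep], 95))

def p95_random_abstention(abs_err, rate, n_rep=200, rng=None):
    """Baseline: abstain on a random `rate` fraction, averaged over repeats."""
    rng = np.random.default_rng(rng)
    abs_err = np.asarray(abs_err, dtype=float)
    n_keep = int(round(len(abs_err) * (1.0 - rate)))
    draws = [np.percentile(rng.choice(abs_err, size=n_keep, replace=False), 95)
             for _ in range(n_rep)]
    return float(np.mean(draws))
```

If uncertainty is informative, `p95_after_abstention(err, u, 0.25)` should fall below `p95_random_abstention(err, 0.25)`.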
Prediction uncertainty was modeled using bootstrap estimation of uncertainty 18, by building separate models for each derivation CV fold and quantifying the agreement between them. For a given sample, the prediction uncertainty is the variance of the target values predicted by the separate models. Only samples with uncertainty below a threshold generated an estimate; this threshold was determined by the predefined criteria of improving the 95th percentile of absolute pH error to ± 0.1 pH unit or lower while not abstaining on more than half of patients.
The point-estimate of BG is generated from a final model trained on all derivation data. Separate models from derivation CV folds provide a prediction uncertainty range around the point-estimate.
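The fold-ensemble uncertainty and threshold selection described above might look like the sketch below, assuming fold-model predictions are stacked into an array of shape (n_models, n_samples). Two simplifications are assumptions: abstention is counted per sample rather than per patient, and among thresholds meeting both criteria the most permissive one (least abstention) is chosen. `ensemble_uncertainty` and `choose_threshold` are hypothetical names.

```python
import numpy as np

def ensemble_uncertainty(fold_preds):
    """Variance across the CV-fold models' predictions, one value per sample."""
    return np.var(np.asarray(fold_preds, dtype=float), axis=0)

def choose_threshold(uncert, abs_err, target_p95=0.1, max_abstain=0.5):
    """Largest uncertainty threshold whose kept samples (uncertainty at or
    below the threshold) reach the target 95th-percentile error without
    abstaining on more than `max_abstain` of samples. Returns None if none does.
    """
    uncert = np.asarray(uncert, dtype=float)
    abs_err = np.asarray(abs_err, dtype=float)
    # Try candidate thresholds from most to least permissive
    for t in np.sort(np.unique(uncert))[::-1]:
        keep = uncert <= t
        if 1.0 - keep.mean() > max_abstain:
            break  # smaller thresholds would only abstain more
        if np.percentile(abs_err[keep], 95) <= target_p95:
            return float(t)
    return None
```

In use, the point estimate would come from the final model trained on all derivation data, and a sample would be displayed only if its `ensemble_uncertainty` value is at or below the chosen threshold.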
Predictor importance ranking. When an estimate is made, the contributions of the predictors to the estimate are ranked. The model can be rewritten as $y = \sum_{i=0}^{K} g(x_i)$, the sum of contributions from individual predictors […]; the importance $I_i$ of the $i$th predictor is […], where $\bar{x}_i$ denotes the population mean of the predictor. When $x_i$ is close to $\bar{x}_i$, $I_i$ is equal or close to 0, meaning that predictor $x_i$ contributes little to the overall estimate. When $x_i$ deviates from the population mean, $I_i$ moves away from 0, highlighting the increased contribution of […] 10 using PetCO2, FiO2, and mean airway pressure (MnAwP) was also tested on the validation dataset for comparison.
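The exact importance formula was lost in extraction. A minimal sketch consistent with the description above, assuming each non-key predictor contributes $g_i = x_i \sum_j w_{ij} f_j(z)$ and that $I_i = g_i(x_i) - g_i(\bar{x}_i)$, is shown below. Ranking by $|I_i|$ is also an assumption, as are the weights and the use of PetCO2, FiO2, and MnAwP as example predictor names; `importance_ranking` is a hypothetical name.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def importance_ranking(x, z, x_mean, W, centers, scale=0.05, names=None):
    """Rank non-key predictors by |I_i| for a single sample.

    Assumes g_i = x_i * sum_j W[i, j] * f_j(z) and I_i = g_i(x_i) - g_i(x_mean_i).
    """
    f = sigmoid((z - np.asarray(centers, dtype=float)) / scale)  # (M,) values of f_j(z)
    slope = np.asarray(W, dtype=float) @ f  # effective per-predictor slope at this z
    imp = (np.asarray(x, dtype=float) - np.asarray(x_mean, dtype=float)) * slope
    order = np.argsort(-np.abs(imp))  # largest magnitude first
    if names is None:
        names = [f"x{i}" for i in range(len(imp))]
    return [(names[i], float(imp[i])) for i in order]
```

A predictor sitting at its population mean receives zero importance, matching the behavior described in the text; the sign of $I_i$ indicates whether it pushes the estimate up or down.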
Performance comparison. Performance was evaluated by the 95th percentile of absolute error, i.e., the error bound exceeded only by the worst 5% of samples. Performance was also reported separately for each pH range. The percentage of samples with absolute error within 0.04 pH unit or 5 mmHg PCO2 was calculated, following the CLIA acceptance criteria for blood gas 19.
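The two metrics above can be computed as in this minimal sketch. Whether an error exactly equal to the tolerance counts as acceptable is an assumption (here it does), and the function and constant names are hypothetical.

```python
import numpy as np

PH_TOL = 0.04   # pH units
PCO2_TOL = 5.0  # mmHg

def p95_abs_error(y_true, y_pred):
    """95th percentile of absolute error (bounds all but the worst 5%)."""
    err = np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    return float(np.percentile(err, 95))

def within_tolerance(y_true, y_pred, tol):
    """Fraction of samples whose absolute error is at most `tol`."""
    err = np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    return float(np.mean(err <= tol))
```

For example, `within_tolerance(ph_lab, ph_est, PH_TOL)` gives the fraction of pH estimates within ± 0.04 pH unit of the laboratory value.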

Results
Cohort characteristics. Characteristics of the final derivation and validation cohorts are described in Table 3. The cohorts are representative of a general pediatric intensive care population. Sub-cohort criteria such as pediatric acute respiratory distress syndrome (PARDS) and respiratory acidosis are defined in the Supplementary material. Figure 3 plots estimates of pH and PCO2 against laboratory values for the validation dataset, and estimation performance before and after abstention is shown in Table 4. Overall, estimates were within CLIA acceptable blood gas machine equivalents 19 in 74% of pH samples (± 0.04 pH unit) and 80% of PCO2 samples (± 5 mmHg). Estimation accuracy was balanced across pH ranges, especially after abstention. On the validation data, the model outperformed the AVDSf and Baudin 10 models (eTable 4), with P < 0.001 by the Mann-Whitney U-test.
Prediction uncertainty and predictor importance. Estimates were not made when prediction uncertainty was above an acceptable threshold, as shown in the example in Fig. 3f, which also shows the top three predictors ranked by importance and their measured values. The uncertainty threshold for abstention was obtained by examining the trade-off between performance and abstention rate on derivation samples, as shown in Fig. 4a. Abstaining based on prediction uncertainty outperforms randomly abstaining on the same percentage of samples (Fig. 4a), indicating that prediction uncertainty is a useful measure of estimation confidence.
Prediction accuracy over time. Figure […] While the most recent BG was used for modeling, an additional analysis used the first available BG for each patient for all subsequent estimations. The 95th percentile of absolute error using the first available BG was ± 0.124 pH unit, compared with ± 0.078 pH unit when using the previous BG.
Safe classification of pH.
As the predicted values are used to guide ventilator settings, erroneous predictions across pH ranges could be dangerous for patients. Figure 4c examines whether the estimated pH range, spanned by the point estimate plus the uncertainty range, covers the correct pH range. Overall, 85% of all estimates cover the correct pH range, and coverage within each individual pH range is above 70%.
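The coverage check in Fig. 4c can be sketched as follows. The pH range edges used in the example (7.15, 7.30, 7.45) are illustrative assumptions rather than the study's confirmed cut points, and treating the uncertainty range as a symmetric half-width around the point estimate (for example, a multiple of the fold-model standard deviation) is also an assumption; `covers_true_range` is a hypothetical name.

```python
import numpy as np

def covers_true_range(ph_true, ph_est, half_width, edges):
    """True where [est - hw, est + hw] overlaps the pH range containing ph_true.

    Ranges are [edges[k-1], edges[k]), open-ended below the first edge
    and above the last edge.
    """
    ph_true = np.asarray(ph_true, dtype=float)
    est = np.asarray(ph_est, dtype=float)
    hw = np.asarray(half_width, dtype=float)
    lo_int, hi_int = est - hw, est + hw
    bounds = np.concatenate([[-np.inf], np.asarray(edges, dtype=float), [np.inf]])
    k = np.digitize(ph_true, edges)  # index of the range holding the true pH
    r_lo, r_hi = bounds[k], bounds[k + 1]
    # Interval [lo, hi] overlaps half-open range [r_lo, r_hi)
    return (lo_int < r_hi) & (hi_int >= r_lo)
```

The overall coverage is then the mean of the returned boolean array, and per-range coverage is the same mean restricted to samples whose true pH falls in each range.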

Validation performance.
Arterial and capillary blood gas. Estimates based on arterial BG were slightly more accurate than those based on capillary BG, but the difference was not statistically significant: a Mann-Whitney U-test did not reject the null hypothesis of no difference in absolute estimation errors (P = 0.07).
Model visualization. Figure 2 depicts two example non-linear relationships learned between non-key predictors and the key predictor, previous pH. Contribution to estimated pH is color-coded onto the two-dimensional predictor value space, with black indicating higher and white indicating lower contributions to estimated pH. The left plot shows that a lower etCO2 measurement contributes to a higher pH estimate, as seen in the dark color at the crosspoint where a low etCO2 measurement of 20 mmHg falls. If the same patient had a higher etCO2 measurement, the crosspoint would fall higher on the vertical line, in the lighter zone, and the predicted pH contribution would be lower. The right plot shows that the nonlinear relationship between ΔSpO2 and pH also varies with the key predictor, previous pH.
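For readers reproducing the arterial-versus-capillary comparison, the sketch below implements a two-sided Mann-Whitney U-test with the large-sample normal approximation, average ranks for ties, and no tie or continuity correction. These simplifications are assumptions, so P values can differ slightly from library implementations such as SciPy's `scipy.stats.mannwhitneyu`; `rankdata_avg` and `mann_whitney_u` are hypothetical names.

```python
import math
import numpy as np

def rankdata_avg(a):
    """1-based ranks, with tied values sharing their average rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average of ranks i+1..j+1
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U for x, two-sided P) using the normal approximation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    ranks = rankdata_avg(np.concatenate([x, y]))
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return float(u1), p
```

Applied to the absolute errors of arterial-based and capillary-based estimates, a P value above the chosen significance level means the null hypothesis of no difference is not rejected.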

Discussion
This study demonstrates that noninvasive parameters routinely available on most clinical monitors and ventilators can provide useful estimates of BG in intubated patients for up to 8 h without necessitating a new blood draw. The model outperformed previous models while providing prediction uncertainty and a predictor importance ranking, both of which can help users assess whether the model is likely to be accurate in a specific patient scenario. The model's built-in transparency enables interpretation of estimation results, encouraging trust in adopting novel data-driven solutions in clinical practice. The model estimated within CLIA acceptable blood gas machine equivalents 19 in 74% of pH samples (± 0.04 pH unit) and 80% of PCO2 samples (± 5 mmHg). The model achieved better performance than previously reported models 10,11, especially in low-pH samples. Prediction accuracy on validation data was comparable to that on derivation data, indicating that the model generalizes to new data from the same center.
The pH ranges in this study follow those used in the ARDSNet studies 20. While accuracy in the absolute value of a pH estimate is important, users may be more concerned with whether pH is estimated in the correct range. For example, it would be very detrimental for a pH of 7.15 (low) to be estimated as 7.45 (high), since ventilator management would likely differ substantially between the two clinical situations, whereas an inaccurate estimate of 7.25 (compared with 7.15) would result in a less impactful modification to treatment, in the same direction as that suggested for the lower pH of 7.15. Fig. 4c shows that the majority of estimates cover the correct pH range. Low-pH samples remain the most challenging to estimate, but with prediction uncertainty, 74% of low-pH samples fall in the correct range.
Using older BGs for prediction was less accurate than using more recent BGs, likely due in part to changes in patient condition over time. Estimation accuracy decreased with longer intervals between the time of estimation and the prior blood sample, but estimates remained accurate for up to 8 h, suggesting that blood draws may typically be deferred for up to 8 h after the previous BG.
The model uses data sources readily available in ventilated patients to provide continuous monitoring of BG through estimation. Estimates are accompanied by prediction uncertainty, which highlights inherent uncertainty in the model and prevents the display of potentially inaccurate predictions. Furthermore, the model displays the top predictors and their values, giving the user more context for each estimate.
One could argue that, in current practice, clinicians are already able to 'guesstimate' BG pH from the same data, and that any estimate that does not achieve laboratory-level performance is not useful. We propose that the model has merits even at its current performance level. First, the model provides automatic and continuous monitoring of BG pH without any human effort, saving time and mental calculation even if the estimate is not perceived as better than a 'guesstimate'. Second, the ranked top predictors can highlight patient measurements and changes that may not have occurred to bedside caregivers. Third, it can reassure clinicians who want to check that their 'guesstimate' matches trends from the thousands of prior blood gases from which the model was developed. Finally, estimation uncertainty is displayed, and clinicians can always ensure that the model does no harm by opting to obtain a BG.
There are several potential applications of the model. First, noninvasive estimates of pH can further decrease the number of blood draws and prompt users to obtain blood draws when they are most necessary. Second, continuously available estimates may facilitate standardized assessment of ventilator support and adherence to ventilator protocols, particularly those promoting lung-protective recommendations. This has been implemented in the management of PARDS at Children's Hospital Los Angeles 21,24. Clinicians can accept or reject the protocol's recommendation, or obtain a blood draw if not confident about the prediction. In addition, the majority of BG for ventilated children in respiratory failure with PARDS lie in a normal to high range, where the model performs well. Finally, the model has potential applications in closed-loop ventilation and may improve existing algorithms that use PetCO2 directly.
The model uses a recent BG under the assumption that the patient's respiratory and metabolic conditions have not changed drastically. Many external, contextual, and patient factors are not available or captured at the time of estimation, so the final decision at the bedside must be left to the expertise of clinicians. One potential direction for improvement is obtaining a large dataset of BG, physiological measurements, and ventilation parameters along with full volumetric capnography for all patients. Volumetric capnography may provide additional information about patients' respiratory states and prognoses 22,23 not present in PetCO2, which could enable more accurate estimation of BG. Lastly, although the model generalized to unseen patients from the same center, it is unknown whether it will generalize to other centers; this requires validation on data from additional centers.
The main application of the model is likely to be in ventilated patients who have been stabilized and are in a relatively steady clinical state. We hope it will prove a viable tool for avoiding blood draws and facilitating continuous BG monitoring, thereby supporting more lung-protective practices as ventilation and oxygenation management is currently understood 24. Ventilator decision support protocols based on arterial BG measurements have proven useful in the management of adult respiratory failure 20. Accurate noninvasive estimation of arterial or capillary PCO2, with subsequent prediction of pH, could allow more frequent ventilator changes to optimize lung- and diaphragm-protective ventilation without BG analysis, which would be particularly useful in pediatric practice, where fewer arterial lines are used 25.
The current model and results have some limitations. Specifically, the model only works for patients who have at least one recent invasively sampled blood gas. Abstention when model uncertainty is high defaults to invasive sampling and limits the cases in which noninvasive estimation can be used. Currently, there is no way to determine in advance which patients are likely to generate high-uncertainty samples. These questions may be better answered in a randomized controlled trial, which would also provide information on the usefulness and safety of such a model at the bedside.

Data availability
The datasets analysed during the current study are not publicly available due to restrictions in IRB approved usage.