## Introduction

Pulse oximetry is routinely used for non-invasive monitoring of oxygen saturation levels. A low oxygen level in the blood means low oxygen in the tissues, which can ultimately lead to organ failure. Oximetry can be used to measure the oxygen saturation level sporadically during a medical examination, or to monitor patients continuously in the intensive care unit (ICU) or overnight during a polysomnography (PSG) study. Identification of digital biomarkers extracted from the oxygen saturation time series can support the diagnosis and continuous monitoring of patients' pulmonary function and the prediction of deteriorations (prognosis). Specifically, studying the variability of the oxygen saturation signal may provide information on the underlying physiological control systems. Furthermore, it may enhance our understanding of the manifestation and etiology of diseases and help identify digital oximetry biomarkers (OBMs) for health monitoring. Sleep medicine routinely uses oximetry biomarkers, since overnight drops in oxygen saturation are characteristic of obstructive sleep apnea (OSA). Beyond the presence of OSA itself, repetitive nocturnal hypoxemia may cause oxidative stress, contributing to the pathogenesis of cardiovascular morbidity1. Similarly, patients with advanced chronic obstructive pulmonary disease (COPD) and no primary sleep-related breathing disorder commonly exhibit overnight hypoxemia2. Yet, in contrast to heart rate variability (HRV), a field that has benefited from the development of stable standards3 and advanced toolboxes and software4,5,6, there are currently no such standards or open tools for analyzing oxygen saturation time series in terms of their variability, dynamics, and the statistical characterization of specific patterns.

### This contribution

Research on the use of existing oximetry biomarkers and the development of new ones has mainly focused on the diagnosis of OSA, as echoed by five recent reviews in this field7,8,9,10,11. Although this paper naturally overlaps somewhat with these reviews, we present a new comprehensive review focusing on the physiological interpretation and clinical use of oximetry biomarkers, in the spirit of the work of Malik et al.3 in the field of HRV analysis. We also develop a complete Python toolbox (denoted “pobm”) and software interface (“PhysioZoo OBM”) for using these biomarkers, similar to our previous work in HRV analysis6. This will support rigorous research in oximetry time series analysis and ensure the reproducibility of research. We apply the developed toolbox to a large dataset of overnight recordings in order to demonstrate its usability and clinical value in the context of OSA diagnosis. While the work mainly focuses on OBMs developed in the field of sleep medicine, the reviewed OBMs can be applied to the analysis of continuous oximetry recordings for any other condition, and we thus introduce a general-purpose flow diagram for continuous oximetry analysis. We limit the scope of the biomarkers to single-channel oximetry analysis, thereby excluding OBMs whose engineering requires additional channels, such as airflow12. The paper defines categories of OBMs and reviews the literature to identify evidence-based OBMs in the field of sleep medicine, as well as some additional suggested OBMs. The biomarkers are applied to a large PSG dataset totaling 3806 individual oximetry recordings.

## Results

### pobm toolbox and PhysioZoo OBM interface

The pobm toolbox was implemented in Python. For quality control, functions were benchmarked against reference source code (Supplementary Table 1) or against ranges published in the literature (Supplementary Table 2). For the comparison with ranges reported in the literature, we compared the order of magnitude of some biomarkers with those reported by others for the non-OSA group. Figure 1 shows the PhysioZoo OBM interface for oximetry analysis. In PhysioZoo OBM, an SpO2 time series can be loaded (File → Open data file) and pre-filtered using one of the preprocessing filters introduced in section “Preprocessing”. After computation, the OBMs can be exported together with standard data representation figures. The PhysioZoo software handles data in .txt, .mat (The MathWorks, Inc., Natick, MA, USA), and WFDB13 formats. In addition, PhysioZoo OBM enables oximetry analysis of multiple segments, thereby enabling tracking of temporal changes in oximetry measures within a given record. The pobm toolbox and PhysioZoo software are available at https://physiozoo.com/.

### Standard ranges for oximetry biomarkers

Tables 1 and 2 summarize the median and interquartile range of all the OBMs implemented in the PhysioZoo software for individuals participating in SHHS1, providing a standard reference range for each oximetry biomarker. The null hypothesis of the Kruskal–Wallis test was rejected for most biomarkers (43/44), with the smallest p values obtained for Px, CAx, and CTx. Following Dunn's post hoc analysis, a total of 30 biomarkers were statistically discriminative between every pair of classes, i.e., p < 0.05 for all pairs (Tables 1 and 2).
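As an illustration of the group comparison step, the sketch below runs a Kruskal–Wallis test of one OBM across severity classes using SciPy's `scipy.stats.kruskal`. The helper name `kruskal_wallis_by_class`, the class labels, and the synthetic values are hypothetical and are not SHHS1 data; Dunn's post hoc test is not part of SciPy and is not shown.

```python
import numpy as np
from scipy.stats import kruskal


def kruskal_wallis_by_class(values, labels):
    """Kruskal-Wallis H test of one biomarker across the classes in `labels`.

    Hypothetical helper: groups `values` by label and passes each group
    to scipy.stats.kruskal. Returns the SciPy result object
    (fields: statistic, pvalue).
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    groups = [values[labels == c] for c in np.unique(labels)]
    return kruskal(*groups)


# Synthetic example only: one "OBM" whose location shifts with severity class.
rng = np.random.default_rng(0)
labels = np.repeat(["none", "mild", "moderate", "severe"], 100)
values = np.concatenate([rng.normal(loc, 1.0, 100) for loc in (0.0, 0.5, 1.0, 1.5)])
result = kruskal_wallis_by_class(values, labels)
```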

### Added value in combining multiple biomarkers

When performing simple regression analysis of individual OBMs against the AHI, CTMρ with ρ = 0.25 achieved the highest goodness of fit ($$\overline R ^2 = 0.77$$), followed by ODI3 ($$\overline R ^2 = 0.74$$). Combining the 10 oximetry biomarkers with the best scores in a multivariable linear regression further improved the goodness of fit to $$\overline R ^2 = 0.82$$ (Fig. 2).
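Reading $$\overline R ^2$$ as the adjusted coefficient of determination (our assumption from the notation), it penalizes R² for the number of predictors p: $$\overline R ^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$$. The hypothetical helper below fits an ordinary least-squares model with an intercept using NumPy and returns both values; it is a sketch, not the analysis code used for Fig. 2.

```python
import numpy as np


def regression_r2(y, X):
    """Fit y ~ 1 + X by least squares; return (R^2, adjusted R^2).

    X may be 1-D (simple regression) or 2-D (multivariable regression,
    one column per biomarker). Requires n > p + 1.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)  # penalize extra predictors
    return float(r2), float(adj_r2)
```

Because (n − 1)/(n − p − 1) > 1 whenever p ≥ 1, the adjusted value is always below the plain R² unless the fit is perfect, which is why it is preferred when comparing a 1-biomarker model with a 10-biomarker model.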

## Discussion

We showed that OBMs engineered from continuous oximetry recordings may provide discriminative information about groups of individuals suffering from respiratory disorders. Within the context of OSA, we found that CTMρ with ρ = 0.25 had the highest $$\overline R ^2$$ in estimating the AHI ($$\overline R ^2 = 0.77$$). Furthermore, we demonstrated that combining multiple oximetry biomarkers to estimate the AHI increased the $$\overline R ^2$$ to 0.82. This highlights the complementary value of using multiple OBMs rather than a single one.

Recent studies have shown that nocturnal hypoxemia correlates better with cardiovascular disease, cancer incidence, and mortality than traditional nocturnal respiratory disturbance indexes, such as the AHI12,14. This suggests that alternative nocturnal OBMs may provide important health information. Both intermittent hypoxia and sleep fragmentation are responsible for the clinical manifestations and most of the related comorbidities of OSA15. In OSA, recurrent collapse of the upper airway leads to a reduced tidal volume and to both intermittent hypoxemia and hypercapnia. Consequently, sympathetic nervous system activity increases and cortical arousals occur, leading to disrupted sleep architecture and restless sleep. In addition, repetitive hypoxemia–reoxygenation periods are linked to the production of free oxygen radicals, inflammation, and endothelial dysfunction16. In this regard, OBMs such as CT90 and overnight mean and minimum saturation have been significantly linked with dysfunction in cardiovascular modulation, arterial hypertension, atrial fibrillation, increased insulin resistance, higher incidence of lung cancer, and worse prognosis after myocardial infarction, as well as a higher risk of post-surgery complications in OSA patients17. However, it remains unclear which oximetry biomarkers (or combinations of biomarkers) are most predictive of clinical endpoints, such as metabolic and cardiovascular diseases. For example, the average duration and morphology of the events are not considered in routine OSA diagnosis. This is a limitation, since longer apnea or hypopnea events will likely result in longer and deeper desaturations, and therefore in more hypoxic stress and more severe cardiovascular consequences. At the same time, longer desaturation events may result in a lower AHI, that is, fewer events per hour. Thus, the relationship between the duration and morphology of events and a clinical endpoint (e.g., cardiovascular complications) remains unclear. For this reason, additional desaturation biomarkers may provide valuable information for disease phenotyping. This has been suggested, for example, by Kulkas et al.18, who showed, albeit on a very small sample (n = 19), that additional duration- and morphology-related oximetry biomarkers enhance OSA phenotyping.

Nocturnal hypoxemia can be present in many acute or chronic respiratory diseases. For example, OSA patients show cyclic desaturation–resaturation episodes during the night, which are linked with partial or complete obstruction of the upper airway, leading to the well-known chronic intermittent hypoxia pattern. On the other hand, COPD patients show slower and longer desaturations linked with sustained hypoventilation, mainly during rapid eye movement (REM) sleep, leading to a state of nocturnal chronic hypoxemia. Characterization of hypoxemia is therefore different during REM sleep. In subjects with OSA, oxygen desaturation indices (ODIs) of 3 and 4%, as well as mean saturation, minimum saturation, and CT90, are widely used in sleep medicine. By contrast, criteria for classifying a COPD patient as a nocturnal desaturator are not well established. Showing at least one episode with saturation <90% lasting for >5 min and reaching a minimum saturation of at least 85% has been proposed19, while some authors define nocturnal desaturators as patients with CT90 ≥30%20. This example highlights that a standard to characterize and quantify some respiratory conditions, such as COPD, using nocturnal oximetry remains to be defined. Using multiple OBMs may also enable the identification of patterns for different apnea types, such as central versus mixed versus obstructive apnea, as ongoing research studies such as SomnaPatch intend to do (Somnarus Inc., ClinicalTrials.gov Identifier: NCT02034175). Machine learning algorithms will play an important role in engineering models that can learn complex combinations of OBMs for regression or classification tasks. Such models may uncover the OBM combinations that best reflect the unique patterns of a given condition.

Because respiratory conditions may have different oximetry patterns/dynamics, and oximetry recordings may be of different durations, it is important to define a general methodology for continuous oximetry time series analysis using the OBM toolbox. The suggested flow for such an analysis is illustrated in Fig. 3. Following these steps, performance statistics relevant to the task at hand should be reported, together with a clear discussion of the biomarkers that were most relevant to the data-driven model, including their interpretation in terms of the underlying physiology.

Oximetry biomarkers may vary significantly with the technology used (transmission versus reflectance) as well as with the manufacturer. Most oximeters use two light-emitting diodes (LEDs) that face a translucent part of the body, such as the fingertip or earlobe, and a photodiode that receives the light. In most cases, one LED is red and the other infrared. The oximeter includes a processor that calculates the oxygen saturation from the ratio between the amount of light emitted and the amount received at each wavelength. Oximeters may be transmissive or reflective. In a transmissive oximeter, the photodiode and the LEDs are placed on opposite sides of the measurement site, and the light passes through the site. In a reflective oximeter, the LEDs and the photodiode are placed on the same side of the measurement site, and the light reflected by the tissue reaches the photodiode.

During the current coronavirus disease 2019 (COVID-19) pandemic, many individuals with suspected or confirmed, but mild, COVID-19 are told to monitor their symptoms at home or in government-managed facilities. Hospitalization is only an option if there is a medical need. Monitoring the blood oxygen level may be a meaningful way to remotely monitor individuals with mild COVID-1921. It could also be used for continuous monitoring of ICU patients with pneumonia, a common complication of COVID-19. However, there is a lack of smart algorithms that can exploit the information encoded in these oxygen saturation time series. The development of such algorithms would facilitate the continuous monitoring of COVID-19 ICU patients and the prediction of deteriorations. It remains to be determined how the information contained in the oxygen saturation time series can be exploited: are trends, absolute values, or the occurrence of specific patterns the most meaningful information for identifying the disease and predicting its course? The pobm toolbox developed in this publication can support research into novel biomarkers for the diagnosis and prognosis of COVID-19.

Additional OBMs, such as kernel entropy, bispectrum, and wavelet-based measures, among others, should be considered and added to the library in future work. Although we demonstrated the usage of the PhysioZoo OBM resource on OSA, the value of these biomarkers for other respiratory disorders remains to be assessed.

Typical oximetry biomarkers used in clinical practice include the ODI and CT90. While these indices are standardized, to some extent, and interpretable, they fail to capture important pathophysiological characteristics. We reviewed evidence-based oximetry biomarkers, suggested a classification system, and created a unique resource (pobm toolbox and PhysioZoo OBM interface) for performing oximetry time series analysis. This resource can be applied to gain novel physiological, clinical, and epidemiological insights.

## Methods

### Oximetry biomarker categorization

Various categorizations of OBMs have been previously suggested10,22. We introduce a five-category classification scheme of our own, which we believe best reflects the literature and usage of these biomarkers in medical practice.

1. General statistics: time-based statistics describing the distribution of the oxygen saturation time series.
2. Complexity: measures quantifying the regularity and the presence of long-range correlations in non-stationary time series.
3. Periodicity: measures quantifying the periodicity that consecutive events create in the oxygen saturation time series.
4. Desaturations: time-based descriptive measures of the desaturation patterns occurring throughout the time series.
5. Hypoxic burden: time-based measures quantifying the overall degree of hypoxemia imposed on the heart and other organs during the recording period.

A comprehensive summary of the OBMs reviewed and implemented in this research is presented in Table 3 for the general statistics, complexity, and periodicity categories and in Table 4 for the desaturation and hypoxic burden categories. A total of 44 oximetry biomarkers were engineered. A glossary of variable symbols and definitions is presented in Supplementary Table 3.

### Preprocessing

Raw oximetry data often contain missing values and artefacts caused, for example, by motion of the oximeter or poor contact between the finger and the probe. Therefore, the toolbox offers two optional preprocessing filters:

#### Delta filter

A delta filter is applied to the SpO2 time series: when two consecutive samples differ by >x%/s, they are considered non-physiological and are discarded. By default, x = 4%/s, as in the work of Taha et al.23. An example of the delta filter applied to an SpO2 time series is shown in Fig. 4a.
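A minimal NumPy sketch of the delta filter, assuming discarded samples are replaced by NaN and that, following the literal wording above, both samples of an offending pair are removed. The function name `delta_filter` and its parameters are illustrative and are not the pobm API.

```python
import numpy as np


def delta_filter(spo2, fs=1.0, max_rate=4.0):
    """Mark as NaN both samples of any consecutive pair changing by > max_rate %/s.

    spo2     : SpO2 samples in %
    fs       : sampling frequency in Hz (1 Hz assumed by default)
    max_rate : maximum physiological rate of change in %/s (x in the text)
    """
    x = np.asarray(spo2, dtype=float).copy()
    rate = np.abs(np.diff(x)) * fs      # rate of change between consecutive samples
    jump = rate > max_rate              # NaN comparisons evaluate to False
    bad = np.zeros(len(x), dtype=bool)
    bad[:-1] |= jump                    # first sample of each offending pair
    bad[1:] |= jump                     # second sample of each offending pair
    x[bad] = np.nan
    return x
```

For example, `delta_filter([95, 95, 80, 95, 95])` removes the isolated 80% drop together with its two neighbours, leaving `[95, nan, nan, nan, 95]`. A variant that discards only the second sample of each pair is an equally plausible reading.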

#### Block of data filter

A value <50% is considered an error value. For each error value, a small block of data of length x s (default x = 20 s) around it is discarded. Once the small blocks are removed, the mean is computed for each block of data of length 100 s around the original error value. The mean of the overall SpO2 signal is also computed. Each block with a mean <6% smaller than the overall mean is discarded. This technique was used by Buekers et al.24. An example of the block of data filter applied to an SpO2 time series is shown in Fig. 4b.
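The block of data filter can be sketched as below. Several details are our assumptions rather than statements from the text: blocks are centred on the error value, the overall mean is computed after the small blocks are removed, discarded samples become NaN, and the "<6% smaller" criterion is read as "mean more than 6 percentage points below the overall mean". The function name and parameters are hypothetical.

```python
import numpy as np


def block_data_filter(spo2, fs=1.0, error_level=50.0, small_block_s=20.0,
                      large_block_s=100.0, max_drop=6.0):
    """Hypothetical sketch of the block of data filter (discarded samples -> NaN)."""
    x = np.asarray(spo2, dtype=float).copy()
    n = len(x)
    err_idx = np.flatnonzero(x < error_level)        # error values (<50%)
    half_small = int(round(small_block_s * fs / 2))
    half_large = int(round(large_block_s * fs / 2))

    # Step 1: discard a small block centred on each error value.
    for i in err_idx:
        x[max(0, i - half_small): min(n, i + half_small + 1)] = np.nan

    # Step 2: overall mean of the remaining signal (assumed: after step 1).
    overall_mean = np.nanmean(x)

    # Step 3: inspect a larger block centred on each original error value.
    for i in err_idx:
        lo, hi = max(0, i - half_large), min(n, i + half_large + 1)
        block = x[lo:hi]
        if np.all(np.isnan(block)):
            continue
        # Assumed reading of the criterion: block mean more than max_drop
        # percentage points below the overall mean.
        if np.nanmean(block) < overall_mean - max_drop:
            x[lo:hi] = np.nan
    return x
```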

### General statistics

• Average (AV): average of the SpO2 values.

• Median (MED): median of the SpO2 values.

• Min (Min): minimum of the SpO2 values, representing the physiological minimum of the SpO2.

• Standard deviation (SD): standard deviation of the SpO2 values.

• Range (RG): difference between the maximal and minimal SpO2 values.

• Percentile (Px): xth percentile of the SpO2 values.

• BelowMedian (Mx): percentage of the signal x% below the median oxygen saturation.
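These statistics map directly onto NumPy reductions. The sketch below assumes NaN marks samples removed during preprocessing, uses the population standard deviation (ddof = 0), and reads Mx as the percentage of samples lying more than x percentage points below the median. The function name and the parameter values are illustrative, not pobm defaults.

```python
import numpy as np


def general_statistics(spo2, x_percentile=50.0, x_below_median=3.0):
    """General-statistics OBMs of an SpO2 series (NaN samples are ignored)."""
    s = np.asarray(spo2, dtype=float)
    s = s[~np.isnan(s)]                 # drop samples discarded by preprocessing
    med = np.median(s)
    return {
        "AV": float(np.mean(s)),
        "MED": float(med),
        "Min": float(np.min(s)),
        "SD": float(np.std(s)),         # population SD (ddof=0), an assumption
        "RG": float(np.max(s) - np.min(s)),
        "Px": float(np.percentile(s, x_percentile)),
        # Assumed reading of Mx: % of samples below (median - x) SpO2 points.
        "Mx": float(100.0 * np.mean(s < med - x_below_median)),
    }
```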

#### ZeroCrossing (ZCx)

ZCx, used by Xie et al.25, is the number of zero-crossing points of the SpO2 signal with respect to a baseline of x% SpO2. A crossing point is defined by two consecutive samples of the SpO2 signal, one lower than the baseline and the other greater, or vice versa. This biomarker helps to understand how the signal oscillates around a baseline. The intuition is that, compared with that of a non-OSA patient, the SpO2 time series of a patient with OSA will oscillate more around the baseline because of repeated desaturations followed by resaturations to higher values. A common baseline for this biomarker is the mean of the signal (i.e., by default, x = AV). ZCx is defined as:

$${\mathrm{ZCx}} = \mathop {\sum}\limits_{i = 1}^{N_{{\mathrm{SpO}}_2} - 1} {{\mathrm{ZC}}_i\left( x \right)},$$
(1)
$${\mathrm{ZC}}_i\left( x \right) = \left\{ {\begin{array}{ll} 1 & {\mathrm{if}}\left( {{\mathrm{SpO}}2_i - x} \right)\left( {{\mathrm{SpO}}2_{i + 1} - x} \right) < \, 0 \\ 0 & {\mathrm{else}} \end{array}}, \right.$$
(2)

where NSpO2 is the number of samples of the SpO2 time series.
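Eqs. (1) and (2) count sign changes of SpO2 − x between consecutive samples. A short NumPy sketch, assuming a NaN-free signal and defaulting the baseline to the signal mean (x = AV); the function name is illustrative.

```python
import numpy as np


def zero_crossings(spo2, baseline=None):
    """ZCx: number of i with (SpO2_i - x)(SpO2_{i+1} - x) < 0, Eqs. (1)-(2)."""
    s = np.asarray(spo2, dtype=float)
    if baseline is None:
        baseline = np.mean(s)          # default baseline x = AV
    c = s - baseline
    # A product of consecutive centred samples is negative only when they
    # lie on opposite sides of the baseline; samples exactly on it do not count.
    return int(np.sum(c[:-1] * c[1:] < 0))
```

For instance, `zero_crossings([95, 93, 96, 92, 97], baseline=94)` returns 4, since every consecutive pair straddles 94%.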

#### Delta index (ΔIx)

ΔIx26 corresponds to the sum of the absolute variations between two successive points, divided by the number of intervals. The original intuition was that SpO2 oscillations, induced by repeated sequences of apnea and resumption of ventilation, will lead to a high ΔI, whereas COPD-induced prolonged desaturations or a nearly constant SpO2 will lead to a low ΔI. In the original paper of Pepin et al.26, recordings from a total of 160 consecutive patients referred for PSG were used to set the ΔI threshold distinguishing between OSA and non-OSA. When testing on the prospective group of patients, i.e., n = 36 patients with p = 34 nights of recordings for each patient, they obtained a sensitivity of 0.75 and a specificity of 0.86. In Magalang et al.27, the ΔI was the best predictor of the AHI (r2 = 0.60) among the OBMs tested, including the ODI. The ΔI index is defined as:

$$\Delta {\mathrm{Ix}} = \frac{1}{{N_{{\mathrm{window}}} - 1}} \cdot \mathop {\sum}\limits_{i = 1}^{N_{{\mathrm{window}}} - 1} {\left| {{\mathrm{SpO}}2\_{\mathrm{window}}_{i + 1} - {\mathrm{SpO}}2\_{\mathrm{window}}_i} \right|},$$
(3)

where SpO2_windowi is the average of the level of oxygen saturation for the window i of length x s, and Nwindow is the number of windows. In their original work, Pepin et al.26 used x = 12 s. In our implementation, signals are re-sampled to 1 Hz by default, so, by default, a window will contain 12 samples.
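A sketch of ΔI under two assumptions not stated in the text: windows are non-overlapping, and a trailing partial window is dropped. It averages the signal in consecutive x-s windows and takes the mean absolute change between successive window averages, i.e., divides by the number of intervals. The function name is hypothetical.

```python
import numpy as np


def delta_index(spo2, fs=1.0, window_s=12.0):
    """Delta index: mean |difference| between successive window averages."""
    s = np.asarray(spo2, dtype=float)
    w = int(round(window_s * fs))            # 12 samples at 1 Hz by default
    n_windows = len(s) // w                  # trailing partial window dropped
    if n_windows < 2:
        raise ValueError("need at least two complete windows")
    window_means = s[: n_windows * w].reshape(n_windows, w).mean(axis=1)
    return float(np.mean(np.abs(np.diff(window_means))))
```

A 1 Hz signal alternating between 95% and 93% every 12 s gives ΔI = 2, whereas a flat signal gives ΔI = 0.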

### Complexity measures

Regularity quantifies how often similar patterns are observed in the oximetry signal10. In the context of physiological time series analysis, approximate entropy (ApEn)28 and sample entropy (SampEn)29 have commonly been used as measures of unpredictability (the opposite of regularity). OSA individuals typically have less regular oximetry patterns, leading to higher ApEn and SampEn values than non-OSA individuals. Loss of physiological complexity may be better captured by measures that can detect and quantify the presence of long-range correlations in non-stationary time series30, such as the Lempel–Ziv complexity (LZ)31. Fractal objects, generated by stochastic or nonlinear deterministic mechanisms30, may also be used to capture complexity, as they show self-similarity, i.e., the smaller-scale structure resembles the larger-scale form32. Detrended fluctuation analysis (DFA) has commonly been used for fractal analysis of physiological time series.

#### Approximate entropy

ApEn is a biomarker introduced in Pincus et al.28 that aims to capture the irregularity in the signal, with higher values indicating greater irregularity. This makes it useful for the detection of OSA: apneas and hypopneas introduce irregular desaturation–resaturation patterns into the SpO2 signal and are therefore associated with high ApEn values. ApEn(m, r, N) can be defined as:

$${\mathrm{ApEn}} = {\upvarphi}^m\left( r \right) - {\upvarphi}^{m + 1}\left( r \right),$$
(4)
$${\upvarphi}^m\left( r \right) = \frac{1}{{N - m + 1}}\mathop {\sum}\limits_{i = 1}^{N - m + 1} {\ln \left( {\frac{{N^m\left( i \right)}}{{\left( {N - m + 1} \right)}}} \right)},$$
(5)

where Nm(i) is the number of windows of length m for which the distance from the window beginning at the index i is lower than or equal to r. The distance between two windows can be defined as:

$$d\left( {X\left( i \right),\,X\left( j \right)} \right) = \mathop {{\max }}\limits_{1 \le k \le m} \left| {x\left( {i + k - 1} \right) - x\left( {j + k - 1} \right)} \right|.$$
(6)

This biomarker was first used in the context of OSA diagnosis from oximetry data, in a study by Hornero et al.33, with m = 1, r = 0.25·σ, where σ is the standard deviation of the data. The database was composed of SpO2 time series from subjects showing symptoms of sleep disordered breathing categorized into OSA-positive and OSA-negative groups according to the gold standard PSG. ApEn was used to diagnose OSA and reached 82.09% sensitivity and 86.96% specificity on a test set composed of n = 113 individuals.
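Eqs. (4)–(6) can be sketched directly in NumPy, with the defaults m = 1 and r = 0.25·σ taken from Hornero et al.33 as quoted above. This brute-force version builds an all-pairs Chebyshev distance matrix, so it is only suitable for short segments (a full night at 1 Hz would need far too much memory); it is an illustration, not the pobm implementation.

```python
import numpy as np


def approximate_entropy(x, m=1, r=None):
    """ApEn(m, r, N) following Eqs. (4)-(6); self-matches are counted."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.25 * np.std(x)               # r = 0.25 * sigma

    def phi(mm):
        # All N - mm + 1 windows of length mm.
        windows = np.array([x[i:i + mm] for i in range(n - mm + 1)])
        # Chebyshev distance between every pair of windows, Eq. (6).
        dist = np.max(np.abs(windows[:, None, :] - windows[None, :, :]), axis=2)
        counts = np.sum(dist <= r, axis=1)  # N^m(i), includes i itself
        return np.mean(np.log(counts / (n - mm + 1)))   # Eq. (5)

    return float(phi(m) - phi(m + 1))       # Eq. (4)
```

A constant or strictly alternating signal yields ApEn close to 0, whereas white noise yields a clearly larger value.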

#### Sample entropy

This is a non-linear biomarker that quantifies the irregularity in the data and is less biased than ApEn. It was introduced by Richman and Moorman29, who demonstrated its robustness within the context of physiological time series analysis (on neonatal HRV). This biomarker has also been used by Behar et al.6 for HRV analysis across different mammals. A pseudo-code for the implementation of SampEn is provided in Supplementary Methods.
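For readers without access to the Supplementary Methods, the sketch below follows the usual Richman–Moorman definition SampEn = −ln(A/B), where B and A are the numbers of template pairs of length m and m + 1 within tolerance r, self-matches excluded, using the same N − m template start positions for both lengths. The defaults m = 2 and r = 0.2·σ are common conventions assumed here, not values from this paper, and the brute-force distance matrix limits it to short segments.

```python
import numpy as np


def sample_entropy(x, m=2, r=None):
    """SampEn = -ln(A / B); returns inf when no matches are found."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)                 # assumed default tolerance

    def matching_pairs(length):
        # Same N - m start positions for lengths m and m + 1.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Unordered pairs within tolerance, excluding each template's self-match.
        return (np.count_nonzero(dist <= r) - len(templates)) / 2

    b = matching_pairs(m)
    a = matching_pairs(m + 1)
    if a == 0 or b == 0:
        return float("inf")                  # undefined for this segment
    return float(-np.log(a / b))
```

Unlike ApEn, excluding self-matches removes the bias toward regularity, so a perfectly periodic signal gives SampEn = 0 while white noise gives a large positive value.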

#### LZ complexity

This biomarker was introduced by Lempel and Ziv31 in 1976. Within the context of SpO2 analysis, LZ evaluates the degree of complexity of spatiotemporal patterns in the SpO2 signal. It has been widely used in biomedical signal analysis, for example by Amigó et al.34 on electroencephalogram time series and by Álvarez et al.35 to discriminate between OSA and non-OSA individuals. In the latter work, it achieved a sensitivity of 86.5%, a specificity of 77.6%, and an accuracy of 82.9% on a population of n = 187 patients (147 males and 40 females). A pseudo-code for the LZ measure is available in Supplementary Methods.
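One common way to compute LZ on a continuous signal, assumed here rather than taken from the Supplementary Methods, is to binarize it around its median, count the phrases of the LZ76 parsing with the Kaspar–Schuster algorithm, and normalize by n/log2(n). The function names are illustrative.

```python
import numpy as np


def lz76_count(symbols):
    """Number of phrases in the LZ76 parsing (Kaspar-Schuster algorithm)."""
    s = list(symbols)
    n = len(s)
    if n < 2:
        return n
    i, k, l = 0, 1, 1          # i: copy start, k: match length, l: phrase start
    c, k_max = 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:      # reached the end while copying
                c += 1
                break
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:         # no earlier copy extends further: new phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def lempel_ziv_complexity(spo2, normalize=True):
    """LZ of SpO2 binarized around its median (assumed binarization)."""
    x = np.asarray(spo2, dtype=float)
    binary = (x > np.median(x)).astype(int)
    c = lz76_count(binary)
    if not normalize:
        return c
    n = len(binary)
    return c * np.log2(n) / n
```

For example, the string 0001101001000101 parses as 0 · 001 · 10 · 100 · 1000 · 101, so `lz76_count` returns 6.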

#### Detrended fluctuation analysis

DFA is a scaling analysis method that aims to represent the autocorrelation properties of the signal. A major advantage of this method is its robustness against non-stationarity of the signal. This biomarker was introduced by Peng et al.36 to identify crossover behavior in signals. Larger fluctuations typical of repetitive desaturations lead to a higher DFA profile, while near-constant or slow, longer desaturations result in lower profiles. A pseudo-code for the implementation of DFA is provided in Supplementary Methods.
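A compact sketch of first-order DFA (DFA1): the mean-removed signal is integrated into a profile, split into non-overlapping boxes of size s, linearly detrended in each box, and the RMS residual F(s) is computed; the scaling exponent α is the slope of log F(s) against log s. The default scale range (10 samples to N/4) and returning α as the summary value are our assumptions and may differ from what pobm reports.

```python
import numpy as np


def dfa(x, scales=None, order=1):
    """Return (alpha, scales, F(s)) for detrended fluctuation analysis."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - np.mean(x))               # integrated, mean-removed series
    if scales is None:
        scales = np.unique(np.logspace(np.log10(10), np.log10(len(x) // 4), 15).astype(int))
    scales = np.asarray(scales, dtype=int)
    fluctuations = []
    for s in scales:
        n_boxes = len(profile) // s
        boxes = profile[: n_boxes * s].reshape(n_boxes, s)
        t = np.arange(s)
        sq_res = []
        for box in boxes:
            trend = np.polyval(np.polyfit(t, box, order), t)   # local polynomial trend
            sq_res.append(np.mean((box - trend) ** 2))
        fluctuations.append(np.sqrt(np.mean(sq_res)))          # F(s)
    fluctuations = np.asarray(fluctuations)
    alpha = np.polyfit(np.log(scales), np.log(fluctuations), 1)[0]
    return float(alpha), scales, fluctuations
```

As sanity checks, white noise gives α ≈ 0.5 and its cumulative sum (a random walk) gives α ≈ 1.5.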

#### Central tendency measure (CTMρ)

CTMρ is a non-linear method first proposed by Cohen et al.37 to assess the degree of variability in cardiac physiological data. It is the fraction of points of the second-order difference plot, i.e., the pairs (SpO2(i + 1) − SpO2(i), SpO2(i + 2) − SpO2(i + 1)), that fall within a circle of radius ρ centred at the origin. The higher the variability in the SpO2 signal (i.e., the more desaturations/apneas), the more dispersed these points are, the fewer of them fall within the circle, and thus the lower the CTMρ. This biomarker was used by Álvarez et al.35 for OSA diagnosis on a dataset of n = 187 patients, with ρ = 0.25. Their analysis resulted in a sensitivity of 90.01% and a specificity of 82.9%. CTMρ is calculated as:

$${\mathrm{CTM}}_\rho = \frac{{\mathop {\sum}\nolimits_{i = 1}^{N_{{\rm{SpO}}_2} - 2} {\delta \rho \left( i \right)} }}{{N_{{\rm{SpO}}_2} - 2}},$$
(7)
$$\begin{array}{l}\delta \rho \left( i \right) =\\\quad \left\{ \begin{array}{ll} 1 & {\mathrm{if}}\,\sqrt {\left( {{\mathrm{SpO}}_2\left( {i + 2} \right) - {\mathrm{SpO}}_2\left( {i + 1} \right)} \right)^2 + \left( {{\mathrm{SpO}}_2\left( {i + 1} \right) - {\mathrm{SpO}}_2\left( i \right)} \right)^2}\, < \,\rho \\ 0 & {\mathrm{else}} \end{array} \right.\end{array}.$$
(8)
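Eqs. (7) and (8) vectorize naturally with NumPy: build the two successive first differences, compute the distance of each second-order difference point from the origin, and take the fraction of points strictly inside radius ρ. The function name is illustrative, and a NaN-free signal is assumed.

```python
import numpy as np


def central_tendency_measure(spo2, rho=0.25):
    """CTM_rho: fraction of second-order difference points with radius < rho."""
    s = np.asarray(spo2, dtype=float)
    d_next = s[2:] - s[1:-1]     # SpO2(i+2) - SpO2(i+1)
    d_prev = s[1:-1] - s[:-2]    # SpO2(i+1) - SpO2(i)
    inside = np.sqrt(d_next ** 2 + d_prev ** 2) < rho   # delta_rho(i), Eq. (8)
    return float(np.mean(inside))                       # sum / (N - 2), Eq. (7)
```

A flat signal gives CTMρ = 1, while a signal alternating between 90% and 95% gives CTMρ = 0, consistent with lower values for more variable signals.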

### Periodicity measures

Consecutive apneic events create some periodicity in the oxygen saturation time series. This periodicity can be quantified through techniques, such as frequency analysis, phase-rectified signal averaging (PRSA), and autocorrelation.

#### Phase-rectified signal averaging

PRSA is a signal processing technique introduced by Bauer et al.38 to detect and quantify quasi-periodic oscillations in a noisy non-stationary signal. The method also identifies patterns in increasing and decreasing regions of the signal. A PRSA window can be defined as:

$$\begin{array}{*{20}{c}} {\overline x \left( k \right) = \frac{1}{M}\mathop {\sum}\limits_{i = 1}^M {X_i\left[ k \right],} } & {{\mathrm{for}} - {L}\, \le \,{k}\,<\,{L}} \end{array},$$
(9)

where Xi is the window of length 2L around the anchor point x(i) and M is the number of anchor points. An anchor point is a decreasing point in the signal: x(i) < x(i−1), such that the decreasing part (negative slope) of the desaturation is always within the window. It can also be defined as increasing points in order to investigate patterns in the resaturation part of the event. Figure 5 shows an example of PRSA computation on an oximetry time series for L = 10 and M = 10 anchor points.
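A minimal sketch of Eq. (9): every sample with x(i) < x(i−1) (or x(i) > x(i−1) for resaturation anchors) whose window x(i−L), …, x(i+L−1) fits inside the signal is used as an anchor, and the windows are averaged element-wise. Using all valid anchors, rather than a fixed M, is an assumption of this illustration, and the function name is hypothetical.

```python
import numpy as np


def prsa_curve(spo2, L=10, decreasing=True):
    """PRSA window x_bar(k), k = -L..L-1 (element k + L), from Eq. (9)."""
    if L < 1:
        raise ValueError("L must be >= 1")
    x = np.asarray(spo2, dtype=float)
    n = len(x)
    windows = []
    for i in range(L, n - L + 1):                 # window x[i-L : i+L] fits in the signal
        is_anchor = x[i] < x[i - 1] if decreasing else x[i] > x[i - 1]
        if is_anchor:
            windows.append(x[i - L: i + L])       # 2L samples around anchor x(i)
    if not windows:
        raise ValueError("no anchor points found")
    return np.mean(windows, axis=0)               # average over the M anchors
```

Because every decreasing anchor satisfies x(i) < x(i−1), the averaged curve always satisfies x̄(0) < x̄(−1).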

Within the context of OSA diagnosis, PRSA biomarkers were used in the study of Deviaene et al.22 and evaluated on three datasets: the Sleep Heart Health Study (SHHS) dataset39,40,41,42, the Apnea-ECG43 dataset, and a third set recorded at the sleep laboratory of the University Hospital Leuven. Five PRSA biomarkers were found to be significant22, and these are the ones we implemented:

• PRSAdc, the capacity of the window, defined as

$${\mathrm{PRSAd}}_{\mathrm{c}} = \frac{{\overline x \left( 0 \right) + \overline x \left( 1 \right) - \overline x \left( { - 1} \right) - \overline x \left( { - 2} \right)}}{4}.$$
(10)
• PRSAdad, the amplitude difference, i.e., the difference between the maximum and minimum values of the window.

• PRSAdos, the overall slope of the window; the window is linearly approximated, and the slope is retained.

• PRSAdsb, the slope before the anchor point.

• PRSAdsa, the slope after the anchor point.
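Given an averaged PRSA window x̄(k), k = −L, …, L − 1, such as one obtained from Eq. (9), the five biomarkers can be sketched as follows. Assumptions of this illustration: slopes are least-squares line fits, "before the anchor" means k < 0, and "after the anchor" means k ≥ 0 (including the anchor itself). The function and key names are hypothetical.

```python
import numpy as np


def prsa_biomarkers(curve):
    """Capacity, amplitude difference, and slopes of a PRSA window of length 2L."""
    curve = np.asarray(curve, dtype=float)
    L = len(curve) // 2
    if L < 2 or len(curve) != 2 * L:
        raise ValueError("curve must have even length >= 4")
    k = np.arange(-L, L)                  # lag of each element; curve[k + L] = x_bar(k)
    before, after = k < 0, k >= 0
    return {
        # Eq. (10): (x_bar(0) + x_bar(1) - x_bar(-1) - x_bar(-2)) / 4
        "capacity": float((curve[L] + curve[L + 1] - curve[L - 1] - curve[L - 2]) / 4),
        "amplitude_difference": float(curve.max() - curve.min()),
        "overall_slope": float(np.polyfit(k, curve, 1)[0]),
        "slope_before": float(np.polyfit(k[before], curve[before], 1)[0]),
        "slope_after": float(np.polyfit(k[after], curve[after], 1)[0]),
    }
```

For a purely linear window x̄(k) = k with L = 10, the capacity and all three slopes equal 1 and the amplitude difference is 19.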

#### Autocorrelation

AC(k) measures the degree of correlation between the values of a signal and the values of the same signal k samples later. It is obtained by correlating the original SpO2 time series with a version of itself shifted by k samples. Analysis of the AC can reveal repeating patterns, such as periodicity. Mathematically, AC can be defined as:

$${\mathrm{AC}}\left( k \right) = \mathop {\sum}\limits_{i = 1}^{N - k} {{\mathrm{SpO}}_2\left( i \right) \ast {\mathrm{SpO}}_2\left( {i + k} \right)}.$$
(11)
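Eq. (11) is a raw (unnormalized, non-mean-removed) lagged inner product. A direct sketch follows; the function name is illustrative. Mean removal and normalization by AC(0) are common variants, but they are not part of Eq. (11) and are not applied here.

```python
import numpy as np


def autocorrelation(spo2, k):
    """AC(k) = sum_{i=1}^{N-k} SpO2(i) * SpO2(i+k), Eq. (11)."""
    x = np.asarray(spo2, dtype=float)
    if not 0 <= k < len(x):
        raise ValueError("lag k must satisfy 0 <= k < N")
    return float(np.sum(x[: len(x) - k] * x[k:]))
```

For example, `autocorrelation([1, 2, 3], 1)` evaluates 1·2 + 2·3 = 8.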

#### Power spectral analysis (PSDtotal, PSDband, PSDratio, PSDpeak)

In Zamarrón et al.44, the authors analyzed the PSD curve of the oximetry time series. They defined a spectral band of interest for oximetry analysis within the context of OSA as 0.014–0.033 Hz. Zamarrón et al.45 assessed PSD biomarkers on a total of 250 subjects between the ages of 21 and 82 years and obtained a sensitivity of 78.2% and a specificity of 89.0%. Figure 6 shows the differences in the spectral signal between a non-OSA and an OSA patient.

• PSDtotal corresponds to the area defined by the power spectrum:

$${\mathrm{PSD}}_{{\mathrm{total}}} = \mathop {\sum}\limits_{i = 1}^{{\mathrm{NFFT}}} {X\left( i \right)},$$
(12)

where X is the amplitude of the PSD function estimated by Welch's method46 with a Hamming window, and NFFT is the number of points in the PSD.

• PSDband corresponds to the energy within the band 0.014–0.033 Hz:

$${\mathrm{PSD}}_{{\mathrm{band}}} = \mathop {\sum}\limits_{i = N_1}^{N_2} {X\left( i \right),}$$
(13)

where N1 and N2 are the limits of the summation between 0.014 and 0.033 Hz.

• PSDratio corresponds to the ratio between the power (area) within the spectral band 0.014–0.033 Hz and PSDtotal.

$${\mathrm{PSD}}_{{\mathrm{ratio}}} = \frac{{\mathop {\sum}\nolimits_{i = N_1}^{N_2} {X\left( i \right)} }}{{\mathop {\sum }\nolimits_{i = 1}^{{\mathrm{NFFT}}} X\left( i \right)}},$$
(14)
• PSDpeak corresponds to the peak amplitude of the PSD within the band 0.014–0.033 Hz.

$${\mathrm{PSD}}_{{\mathrm{peak}}} = \mathop {{{\mathrm{max}}}}\limits_{N_1 \,< \,i\, <\, N_2} \left\{ {X\left( i \right)} \right\},$$
(15)

where i is the index of the power spectrum signal.
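Eqs. (12)–(15) can be sketched with `scipy.signal.welch` using a Hamming window. The segment length (1024 samples, about 0.001 Hz resolution at 1 Hz), the inclusive band edges, and summing PSD bins without multiplying by the frequency step (as Eqs. (12)–(14) are written) are assumptions of this illustration; the function name is hypothetical, and a NaN-free signal is required.

```python
import numpy as np
from scipy.signal import welch


def psd_biomarkers(spo2, fs=1.0, band=(0.014, 0.033), nperseg=1024):
    """PSD_total, PSD_band, PSD_ratio and PSD_peak from a Welch estimate."""
    x = np.asarray(spo2, dtype=float)
    freqs, pxx = welch(x, fs=fs, window="hamming", nperseg=min(nperseg, len(x)))
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    psd_total = float(np.sum(pxx))             # Eq. (12)
    psd_band = float(np.sum(pxx[in_band]))     # Eq. (13)
    return {
        "PSD_total": psd_total,
        "PSD_band": psd_band,
        "PSD_ratio": psd_band / psd_total,     # Eq. (14)
        "PSD_peak": float(np.max(pxx[in_band])),   # Eq. (15)
    }
```

A 1-h signal oscillating with a period of about 49 s (0.0205 Hz, inside the OSA band) concentrates almost all of its power in the band, so PSD_ratio approaches 1.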

### Desaturation measures

Desaturations can occur as a consequence of conditions such as sleep-disordered breathing and can be characterized by descriptors such as their length and depth. For example, Kulkas et al.47 studied gender differences in the distribution of the lengths, depths, and areas of desaturations caused by hypopnea and apnea events. Because desaturations are not only caused by apnea or hypopnea events48, desaturation events and their statistical descriptors may also capture the expression of other conditions during sleep.

#### Oxygen desaturation index (ODIx)

The ODIx corresponds to the average number of desaturation events per hour. A desaturation is defined as a SpO2 drop of x% below the baseline. The ODIx is a widely used measure in the field of sleep medicine, where desaturations are characteristic of apnea and hypopnea events49. Indeed, obstruction of the airway leads to reduced entrance of oxygenated air to the lungs, which leads to a drop in oxygenated hemoglobin until airway patency is restored. These manifest as transient hypoxemic events or desaturations. Traditionally x = 3 or 4%. There exist many implementations of the ODI with variable definitions. In the present work, the implementation of the ODIx detection algorithm was developed and validated by Behar et al.50, building on the parent model and desaturation definition of Jung et al.51. Specifically, the model of Jung et al.51 defines three fiducial points A, B, and C, to determine the occurrence of a desaturation. Fiducial point A is defined as the point where the SpO2 value decreases by ≥1 and ≤3%, fiducial point B as the value that reaches a minimum of at least 3% below A, and fiducial point C as the point where the SpO2 value returns to a level either 1% below A or 3% above B. Some additional constraints are imposed, including that fluctuations in consecutive SpO2 values should be <1% between A and B and >−1% between B and C. Finally, the time interval between A and C must be ≥10 and ≤60 s51. In the original paper, 90 s was used as the time limit, but to ensure capture of resaturation, 60 s is used here. An additional fiducial point D was defined as a point posterior to C at which the SpO2 time series reaches a level of at least 1% below A and where the time interval between A and D is ≤60 s. From the detected desaturation events, several oximetry biomarkers can be computed (Fig. 7). Within our context of single-channel SpO2 analysis, i.e., when no reference EEG channel is available to quantify sleep time, the ODIx is defined as:

$${\mathrm{ODI}}_x = \frac{{N_{{\mathrm{desat}}}}}{{{\mathrm{TRT}}}},$$
(16)

where Ndesat is the number of desaturations in the signal and TRT is the total recording time in hours.
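To make the A/B/C logic and Eq. (16) concrete, the sketch below pairs a deliberately simplified desaturation detector with the ODIx computation. It is not the Behar et al.50 implementation or the pobm API. The function names `detect_desaturations` and `odi` are hypothetical, and the detector omits the 1–3% onset range at A and the consecutive-sample fluctuation constraints. Each event is returned as a hypothetical `(a, b, c)` tuple of sample indices for points A, B, and C. The remaining code sketches reuse this representation.

```python
import numpy as np


def detect_desaturations(spo2, fs=1.0, x=3.0, min_dur=10.0, max_dur=60.0):
    """Simplified A/B/C desaturation detector (illustrative, hypothetical).

    Returns a list of (a, b, c) sample indices: onset A, minimum B, recovery C.
    The A-onset range (1-3%) and the consecutive-sample fluctuation
    constraints of the published algorithm are deliberately omitted.
    """
    spo2 = np.asarray(spo2, dtype=float)
    n = len(spo2)
    min_len = int(round(min_dur * fs))  # shortest allowed A-to-C span, in samples
    max_len = int(round(max_dur * fs))  # longest allowed A-to-C span, in samples
    events = []
    i = 0
    while i < n - 1:
        if spo2[i + 1] < spo2[i]:  # the signal starts to fall: candidate A
            a = i
            end = min(n, a + max_len + 1)  # C must occur within max_dur of A
            b = a + int(np.argmin(spo2[a:end]))  # B: lowest value in the window
            if spo2[a] - spo2[b] >= x:  # drop of at least x% below A
                # C: first sample after B back to (A - 1%) or to (B + 3%)
                c = next(
                    (j for j in range(b + 1, end)
                     if spo2[j] >= spo2[a] - 1.0 or spo2[j] >= spo2[b] + 3.0),
                    None,
                )
                if c is not None and c - a >= min_len:
                    events.append((a, b, c))
                    i = c  # resume the scan after the recovery point
                    continue
        i += 1
    return events


def odi(spo2, fs=1.0, x=3.0):
    """ODI_x = N_desat / TRT, with TRT in hours (Eq. 16)."""
    trt_hours = len(spo2) / fs / 3600.0
    return len(detect_desaturations(spo2, fs=fs, x=x)) / trt_hours


# Toy 65-sample trace at 1 Hz: one 5% drop followed by a recovery.
signal = ([95.0] * 20 + [95.0 - 0.5 * k for k in range(1, 11)]
          + [91.0, 92.0, 93.0, 94.0, 95.0] + [95.0] * 30)
print(detect_desaturations(signal))  # [(19, 29, 32)]
```

On real overnight recordings, TRT spans hours, so the toy ODI value here (one event in 65 s) is only a unit check.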

#### Desaturation length (DLμ, DLσ)

DLμ and DLσ are, respectively, the mean and the standard deviation of the lengths of all desaturation events. These biomarkers capture the duration of the desaturation events in the SpO2 signal, which the ODIx ignores because it counts desaturations without considering their length. The length of a desaturation is particularly important because it reflects how long an individual is under hypoxic stress. Furthermore, desaturations of varying lengths lead to a higher DLσ. DLμ and DLσ are defined as:

$${\mathrm{DL}}_{\mu} = \frac{1}{{N_{{\mathrm{desat}}}}}\mathop {\sum}\limits_{i = 1}^{N_{{\rm{desat}}}} {{\tau }}_i,$$
(17)
$${\mathrm{DL}}_{\sigma} = \sqrt {\frac{{\mathop {\sum}\nolimits_{i = 1}^{N_{{\mathrm{desat}}}} {\left( {{\tau }}_i - {\mathrm{DL}}_{\mu} \right)^2} }}{{N_{{\mathrm{desat}}}}}},$$
(18)

where τi corresponds to the duration of the ith oxygen desaturation event.
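Eqs. (17) and (18) can be sketched in a few lines. The sketch assumes events are `(a, b, c)` sample-index tuples and that τi runs from onset A to recovery C. Both are our assumptions, and `desaturation_length` is a hypothetical name. Note that Eq. (18) divides by N_desat, which matches NumPy's default `ddof=0`.

```python
import numpy as np


def desaturation_length(events, fs=1.0):
    """Return (DL_mu, DL_sigma) in seconds from (a, b, c) sample-index events.

    Assumption: the duration tau_i of an event runs from onset A to recovery C.
    """
    if not events:
        return float("nan"), float("nan")
    tau = np.array([(c - a) / fs for a, _, c in events], dtype=float)
    # np.std defaults to ddof=0, i.e. division by N_desat as in Eq. (18)
    return float(np.mean(tau)), float(np.std(tau))


print(desaturation_length([(0, 5, 10), (20, 30, 40)]))  # (15.0, 5.0)
```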

#### Desaturation depth (DDmaxμ, DDmaxσ, DD100μ, DD100σ)

For a single desaturation, the desaturation depth is computed as the maximal minus the minimal SpO2 value within the desaturation event, from point A to point B. DDmaxμ and DDmaxσ are the mean and the standard deviation across all individual desaturation depths. A similar pair of biomarkers can be engineered by computing the depth with respect to the 100% SpO2 level, i.e., 100% minus the minimal SpO2 value of a desaturation event. These are denoted DD100μ and DD100σ for the mean and the standard deviation, respectively, computed across all desaturation depths. The idea of quantifying the desaturation depth has also been used by Terrill10. The desaturation depth may be an important factor in determining the severity of OSA because, for a given desaturation, it reflects the level of hypoxic stress imposed. Desaturation depth also varies with sleep stage, so there is value in capturing its variation across desaturations52. Furthermore, alternating shallow and deep desaturations will lead to high values of DDmaxσ and DD100σ. The two pairs of biomarkers are defined as:

$${\mathrm{DDmax}}_{\mu} = \frac{1}{{N_{{\mathrm{desat}}}}}\mathop {\sum}\limits_{i = 1}^{N_{{\mathrm{desat}}}} \left( {\mathrm{max}}_i - {\mathrm{min}}_i \right),$$
(19)
$${\mathrm{DD}}100_{\mu} = \frac{1}{{N_{{\mathrm{desat}}}}}\mathop {\sum}\limits_{i = 1}^{N_{{\mathrm{desat}}}} \left( 100 - {\mathrm{min}}_i \right),$$
(20)
$${\mathrm{DDmax}}_{\sigma} = \sqrt {\frac{{\mathop {\sum }\nolimits_{i = 1}^{N_{{\mathrm{desat}}}} \left( {{\mathrm{max}}_i - {\mathrm{min}}_i - {\mathrm{DDmax}}_{\mu}} \right)^2}}{{N_{{\mathrm{desat}}}}}},$$
(21)
$${\mathrm{DD}}100_{\sigma} = \sqrt {\frac{{\mathop {\sum}\nolimits_{i = 1}^{N_{{\mathrm{desat}}}} {\left( {100 - {\mathrm{min}}_i - {\mathrm{DD}}100_{\mu}} \right)^2} }}{{N_{{\mathrm{desat}}}}}},$$
(22)

where maxi is the maximum value of desaturation i and mini is the minimum value of desaturation i. The variables maxi and mini are illustrated in Fig. 7. The desaturation depth can also be described by extending the ODI to any threshold x and studying the cumulative frequency of the desaturations as a function of x53.
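Eqs. (19)–(22) can be illustrated with a short sketch. We assume hypothetical `(a, b, c)` sample-index events and take max_i and min_i over the samples from A to B inclusive, following the text. The function name is illustrative only.

```python
import numpy as np


def desaturation_depth(spo2, events):
    """Return (DDmax_mu, DDmax_sigma, DD100_mu, DD100_sigma).

    For each (a, b, c) event, max_i and min_i are taken over samples A..B.
    """
    if not events:
        return (float("nan"),) * 4
    spo2 = np.asarray(spo2, dtype=float)
    seg_max = np.array([spo2[a:b + 1].max() for a, b, _ in events])
    seg_min = np.array([spo2[a:b + 1].min() for a, b, _ in events])
    dd_max = seg_max - seg_min  # depth relative to the event's own maximum
    dd_100 = 100.0 - seg_min    # depth relative to 100% saturation
    # Population standard deviation (ddof=0), matching Eqs. (21) and (22)
    return (float(dd_max.mean()), float(dd_max.std()),
            float(dd_100.mean()), float(dd_100.std()))


spo2 = [96, 94, 92, 95, 97, 93, 89, 96]
print(desaturation_depth(spo2, [(0, 2, 3), (4, 6, 7)]))  # (6.0, 2.0, 9.5, 1.5)
```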

#### Desaturation slope (DSμ, DSσ)

For each desaturation, the downslope of the signal is calculated by linearly approximating the decreasing phase of the desaturation. DSμ and DSσ are the mean and the standard deviation of the slopes over all desaturation events. These biomarkers capture the slope of the desaturations, a factor distinct from their number, duration, or depth. Indeed, OSA and other pathologies may lead to sharp drops in SpO2, which would yield a large absolute value of DSμ (slopes are negative for a decreasing signal). The slope of a specific desaturation can be written as:

$${\mathrm{Slope}}_i = \frac{{B - A}}{{t_B - t_A}},$$
(23)

where (A, tA) are the amplitude and timestamp of the onset of the desaturation (fiducial point A), and (B, tB) are those of its minimum (fiducial point B). Accordingly, the mean and standard deviation of the slopes are computed as:

$${\mathrm{DS}}_{\mu} = \frac{1}{{N_{{\mathrm{desat}}}}}\mathop {\sum}\limits_{i = 1}^{N_{{\mathrm{desat}}}} {{\mathrm{Slope}}_i},$$
(24)
$${\mathrm{DS}}_{\sigma} = \sqrt {\frac{{\mathop {\sum}\nolimits_{i = 1}^{N_{{\mathrm{desat}}}} {\left( {{\mathrm{Slope}}_i - {\mathrm{DS}}_{\mu}} \right)^2} }}{{N_{{\mathrm{desat}}}}}}.$$
(25)

Slopei can be seen in Fig. 7.
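The sketch below computes DSμ and DSσ from Eqs. (23)–(25). By default it uses the two-point slope of Eq. (23). As an alternative reading of "linearly approximated", an optional least-squares line fit over the samples from A to B is included via `numpy.polyfit`. The `(a, b, c)` event format and the function name are hypothetical.

```python
import numpy as np


def desaturation_slope(spo2, events, fs=1.0, least_squares=False):
    """Return (DS_mu, DS_sigma) in %/s from (a, b, c) sample-index events.

    least_squares=False uses the two-point slope of Eq. (23);
    least_squares=True fits a line to all samples from A to B instead.
    """
    if not events:
        return float("nan"), float("nan")
    spo2 = np.asarray(spo2, dtype=float)
    slopes = []
    for a, b, _ in events:
        if least_squares:
            t = np.arange(a, b + 1) / fs  # timestamps in seconds
            slope = np.polyfit(t, spo2[a:b + 1], 1)[0]  # coefficients: [slope, intercept]
        else:
            slope = (spo2[b] - spo2[a]) / ((b - a) / fs)
        slopes.append(slope)
    slopes = np.array(slopes, dtype=float)
    return float(slopes.mean()), float(slopes.std())  # ddof=0 as in Eq. (25)


spo2 = [96, 94, 92, 95, 97, 93, 89, 96]
print(desaturation_slope(spo2, [(0, 2, 3), (4, 6, 7)]))  # (-3.0, 1.0)
```

For the perfectly linear toy segments above, both options agree. On real data, the least-squares fit is less sensitive to single noisy samples at A or B.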

#### Desaturation area (DAmaxμ, DAmaxσ, DA100μ, DA100σ)

The area of each desaturation is computed. Taking 100% SpO2 as the baseline, DA100μ and DA100σ are the mean and the standard deviation of the area across all individual desaturations. The area can also be computed by taking the maximum SpO2 value of each individual desaturation as the baseline, in which case DAmaxμ and DAmaxσ are the mean and the standard deviation of this area across all individual desaturations. The desaturation length biomarkers consider the duration of the events (reflecting the time under hypoxic stress), and the desaturation depth biomarkers consider their depth (reflecting the severity of hypoxia). The desaturation area combines both the depth and the length of the desaturations. The two pairs of biomarkers can be mathematically written as:

$${\mathrm{DA}}100_{\mu} = \frac{1}{{N_{{\mathrm{desat}}}}}\mathop {\sum}\limits_{i = 1}^{N_{{\mathrm{desat}}}} {{\mathrm{S}}100_i},$$
(26)
$${\mathrm{DA}}100_{\sigma} = \sqrt {\frac{{\mathop {\sum}\nolimits_{i = 1}^{N_{{\mathrm{desat}}}} {\left( {{\mathrm{S}}100_i - {\mathrm{DA}}100_{\mu}} \right)^2} }}{{N_{{\mathrm{desat}}}}}},$$
(27)
$${\mathrm{DAmax}}_{\mu} = \frac{1}{{N_{{\mathrm{desat}}}}}\mathop {\sum}\limits_{i = 1}^{N_{{\mathrm{desat}}}} {{\mathrm{Smax}}_i},$$
(28)
$${\mathrm{DAmax}}_{\sigma} = \sqrt {\frac{{\mathop {\sum}\nolimits_{i = 1}^{N_{{\mathrm{desat}}}} {\left( {{\mathrm{Smax}}_i - {\mathrm{DAmax}}_{\mu}} \right)^2} }}{{N_{{\mathrm{desat}}}}}},$$
(29)

where Smaxi is the area of desaturation event i measured with respect to the maximal (max) SpO2 value of that event, and S100i is the area of the same event measured with respect to 100% SpO2. Smaxi and S100i are illustrated in Fig. 7.
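As an illustration of Eqs. (26)–(29), the sketch below approximates each area with a rectangle-rule sum over the samples. It assumes each area spans the samples from onset A to recovery C inclusive, with events given as hypothetical `(a, b, c)` index tuples. The integration range and the rectangle rule are our assumptions, and areas are expressed in %·s.

```python
import numpy as np


def desaturation_area(spo2, events, fs=1.0):
    """Return (DAmax_mu, DAmax_sigma, DA100_mu, DA100_sigma) in %*s.

    Assumption: each area is integrated over samples A..C with the rectangle rule.
    """
    if not events:
        return (float("nan"),) * 4
    spo2 = np.asarray(spo2, dtype=float)
    smax, s100 = [], []
    for a, _, c in events:
        seg = spo2[a:c + 1]
        smax.append(np.sum(seg.max() - seg) / fs)  # Smax_i: area below the event maximum
        s100.append(np.sum(100.0 - seg) / fs)      # S100_i: area below 100% SpO2
    smax, s100 = np.array(smax), np.array(s100)
    return (float(smax.mean()), float(smax.std()),
            float(s100.mean()), float(s100.std()))


spo2 = [96, 94, 92, 95, 97, 93, 89, 96]
print(desaturation_area(spo2, [(0, 2, 3), (4, 6, 7)]))  # (10.0, 3.0, 24.0, 1.0)
```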

#### Time between desaturations (TDμ, TDσ)

The average and standard deviation of time elapsed between two consecutive desaturation events can be used to capture some aspect of the temporal distribution of desaturation events. The two biomarkers can be computed as:

$${\mathrm{TD}}_{\mu} = \frac{1}{{N_{{\mathrm{desat}}} - 1}}\mathop {\sum}\limits_{{\mathrm{i}} = 2}^{N_{{\mathrm{desat}}}} {\Delta t_i},$$
(30)
$${\mathrm{TD}}_{\sigma} = \sqrt {\frac{{\mathop {\sum}\nolimits_{i = 2}^{N_{{\mathrm{desat}}}} {\left( {\Delta t_i - {\mathrm{TD}}_{\mu}} \right)^2} }}{{N_{{\mathrm{desat}}} - 1}}},$$
(31)

where Δti is the time elapsed between desaturation i and desaturation i − 1. Δti can be seen in Fig. 7.
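A minimal sketch of Eqs. (30) and (31) follows. It assumes Δti is measured between the onsets (point A) of consecutive events, which is our assumption about the reference point. There are N_desat − 1 intervals, so dividing by N_desat − 1 in Eq. (31) is exactly NumPy's default `ddof=0` applied to the intervals.

```python
import numpy as np


def time_between_desaturations(events, fs=1.0):
    """Return (TD_mu, TD_sigma) in seconds from (a, b, c) sample-index events.

    Assumption: Delta t_i is the time between the onsets A of consecutive events.
    """
    if len(events) < 2:
        return float("nan"), float("nan")
    onsets = np.array([a for a, _, _ in events], dtype=float) / fs
    dt = np.diff(onsets)  # N_desat - 1 intervals
    return float(dt.mean()), float(dt.std())


print(time_between_desaturations([(0, 5, 10), (30, 35, 40), (90, 95, 100)]))  # (45.0, 15.0)
```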

### Measures of the hypoxic burden

#### The percentage of oxygen desaturation events (PODx)

The PODx is the overall duration of all desaturations, normalized by the total recording time. It was introduced by Kulkas et al.18 in order to estimate the severity of OSA from the SpO2 time series. It was used in the work of Watanabe et al.54 to study the prognostic importance of novel oxygen desaturation measures in heart failure and central sleep apnea population samples. Non-survivors had a higher PODx compared with survivors (19 ± 13 versus 11 ± 6.4%; p = 0.001). By contrast, non-survivors did not differ significantly from survivors with respect to the AHI and CT90%. An adjusted logistic regression analysis revealed that the PODx was the best independent predictor of mortality. In the work by Kulkas et al.18, the biomarker was computed on a dataset collected from 160 male patients with different levels of AHI severity. The correlation between AHI and PODx was high: r2 = 0.8718. The PODx can be mathematically defined as:

$${\mathrm{POD}}_x = 100 \cdot \frac{{\mathop {\sum}\nolimits_{i = 1}^{N_{{\mathrm{desat}}}} {{\tau }}_i }}{{{\mathrm{TRT}}}},$$
(32)

where τi (Fig. 7) is the duration of desaturation event i, expressed in the same time unit as TRT, and x is the level of the desaturation. In their original publication, Watanabe et al.54 set x = 4%.
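Eq. (32) reduces to a ratio of two durations, which must share the same unit for PODx to be a percentage. In the sketch below, both durations are in seconds. It assumes hypothetical `(a, b, c)` events, with τi running from A to C, and a recording of `n_samples` samples at `fs` Hz.

```python
def pod(events, n_samples, fs=1.0):
    """PODx in percent (Eq. 32): total desaturation time over total recording time.

    Assumption: tau_i runs from onset A to recovery C; both times are in seconds.
    """
    total_desat = sum(c - a for a, _, c in events) / fs  # sum of tau_i, seconds
    trt = n_samples / fs                                 # TRT, seconds
    return 100.0 * total_desat / trt


print(pod([(0, 5, 10), (20, 30, 40)], n_samples=300))  # 10.0
```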

#### The area under the oxygen desaturation curve (AODmax, AOD100)

The AODx was introduced by Kulkas et al.18 in the same context as the PODx, for estimating the severity of sleep apnea–hypopnea syndrome. It represents the sum of the areas of all desaturation events divided by TRT. In the work of Kulkas et al.18, this biomarker had a moderate correlation with the AHI (r2 between 0.581 and 0.689; p < 0.001). Watanabe et al.54 used it alongside the POD biomarker. Survivors in that study appeared to have a lower AODx than non-survivors (0.16 ± 0.2 versus 0.26 ± 0.2), although the difference did not reach statistical significance (p = 0.08). This index was also shown to be an independent modulator of increased epicardial fat volume (EFV) in an acute myocardial infarction population sample14 (n = 105). EFV is associated with adverse cardiovascular events after myocardial infarction. It can be mathematically defined as:

$${\mathrm{AODmax}} = 100 \cdot \frac{{\mathop {\sum}\nolimits_{i = 1}^{N_{{\mathrm{desat}}}} {{\mathrm{Smax}}_i} }}{{{\mathrm{TRT}}}},$$
(33)
$${\mathrm{AOD}}100 = 100 \cdot \frac{{\mathop {\sum}\nolimits_{i = 1}^{N_{{\mathrm{desat}}}} {{\mathrm{S}}100_i} }}{{{\mathrm{TRT}}}}.$$
(34)

Smaxi and S100i are illustrated in Fig. 7.
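Eqs. (33) and (34) can be sketched by summing the per-event areas and normalizing by TRT. The sketch makes several assumptions. Events are hypothetical `(a, b, c)` tuples, each area is a rectangle-rule sum over samples A..C in %·s, and TRT is in seconds. The exact units and normalization used in the cited studies may differ, so only relative comparisons between recordings processed the same way should be trusted.

```python
import numpy as np


def aod(spo2, events, fs=1.0):
    """Return (AODmax, AOD100) following Eqs. (33)-(34).

    Assumptions: areas integrated over samples A..C (rectangle rule, %*s),
    normalized by TRT in seconds.
    """
    spo2 = np.asarray(spo2, dtype=float)
    trt = len(spo2) / fs
    smax_total = 0.0
    s100_total = 0.0
    for a, _, c in events:
        seg = spo2[a:c + 1]
        smax_total += np.sum(seg.max() - seg) / fs  # accumulate Smax_i
        s100_total += np.sum(100.0 - seg) / fs      # accumulate S100_i
    return float(100.0 * smax_total / trt), float(100.0 * s100_total / trt)


spo2 = [96, 94, 92, 95, 97, 93, 89, 96]
print(aod(spo2, [(0, 2, 3), (4, 6, 7)]))  # (250.0, 600.0)
```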

#### Cumulative time (CTx)

CTx is the percentage of the total recording time spent below the x% oxygen saturation level. CT90 is typically used55, but other thresholds such as 80% or 84% have also been assessed27. This biomarker is evaluated on the overall signal, i.e., not only on the desaturation events, and may therefore capture hypoxic behaviors that differ from the desaturation events found in OSA. The biomarker is illustrated in Fig. 8. It is mathematically defined as:

$${\mathrm{CT}}_x = 100 \cdot \frac{{\mathop {\sum}\nolimits_{i = 1}^{N_{{\mathrm{SpO}}2}} {t\left( x \right)_i} }}{{{\mathrm{TRT}} \ast {\mathrm{fs}}}},$$
(35)
$$t\left( x \right)_i = \left\{ {\begin{array}{ll} 1 & {\mathrm{if}}\,{\mathrm{SpO}}{2_i}\,<\,{x} \\ 0 & {\mathrm{else}} \end{array}}, \right.$$
(36)

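where fs is the sampling frequency of the signal and NSpO2 is the total number of SpO2 samples. The denominator TRT ∗ fs equals NSpO2 when TRT is expressed in seconds.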
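Because TRT · fs is the number of samples, Eqs. (35) and (36) amount to the percentage of samples strictly below x. The minimal sketch below uses a hypothetical function name.

```python
import numpy as np


def cumulative_time(spo2, x=90.0):
    """CT_x in percent (Eqs. 35-36): share of samples with SpO2 strictly below x."""
    spo2 = np.asarray(spo2, dtype=float)
    return 100.0 * np.count_nonzero(spo2 < x) / spo2.size


print(cumulative_time([95, 89, 88, 91, 85], x=90.0))  # 60.0
```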

#### Cumulative area (CAx)

CAx is the total area under the x% oxygen saturation level. This biomarker was introduced by Watanabe et al.54 with x = 90%. OSA patients tend to have a greater area under the x% level than non-OSA patients and therefore obtain a higher value for this biomarker. The biomarker is illustrated in Fig. 8. It can be defined as:

$${\mathrm{CA}}_x = 100 \cdot \frac{{\mathop {\sum}\nolimits_{i = 1}^{N_{{\mathrm{SpO}}2}} {\left( {x - {\mathrm{SpO}}2\left( x \right)_i} \right)} }}{{{\mathrm{TRT}} \ast {\mathrm{fs}}}},$$
(37)
$${\mathrm{SpO}}2\left( x \right)_{\mathrm{i}} = \left\{ {\begin{array}{ll} {\mathrm{SpO}}2_i & {\mathrm{if}}\,{\mathrm{SpO}}2_i\, < \,{x} \\ x & {\mathrm{else}} \end{array}}. \right.$$
(38)
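Eq. (38) is simply the signal clipped from above at x, so Eq. (37) can be sketched with `numpy.minimum`. As for CTx, the sketch divides by the number of samples, which equals TRT · fs when TRT is in seconds. The function name is hypothetical.

```python
import numpy as np


def cumulative_area(spo2, x=90.0):
    """CA_x (Eqs. 37-38): scaled mean depth of the signal below the x% level."""
    spo2 = np.asarray(spo2, dtype=float)
    clipped = np.minimum(spo2, x)  # SpO2(x)_i: the signal where below x, else x
    return 100.0 * np.sum(x - clipped) / spo2.size


print(cumulative_area([95, 89, 88, 91, 85], x=90.0))  # 160.0
```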

### Evaluation database

To demonstrate the usability of the implemented oximetry biomarkers and to define some normality ranges, we used the Sleep Heart Health Study (SHHS) database39,40,41,42. SHHS was a multi-center cohort study conducted by the National Heart, Lung, and Blood Institute (ClinicalTrials.gov Identifier: NCT0000527) to determine the cardiovascular and other consequences of sleep-disordered breathing. In all, 6441 men and women aged ≥40 years were enrolled between November 1, 1995 and January 31, 1998. Institutional review board approval to use this database for research was obtained from the Technion-IIT Rappaport Faculty of Medicine under number 62-2019. The variable “ahi_a0h3a” was used as the AHI to define the classes. In this variable, the AHI is computed as the number of all apneas and hypopneas (with oxygen desaturation >3% or an arousal) per hour of sleep, following the American Academy of Sleep Medicine (AASM) 2012 rules56. OSA severity was defined with respect to the AHI as mild (5 ≤ AHI < 15), moderate (15 ≤ AHI < 30), or severe (AHI ≥ 30). Recordings were made with the Nonin XPOD 3011 pulse oximeter (Nonin, USA), and the signal was sampled at 1 Hz with a resolution of ±0.01%. The OBMs were computed for patients with available recordings and at least 4 h of continuous SpO2 tracing. This resulted in a total of 3806 individual patient recordings out of the 5793 patients who participated in the first study visit (SHHS1): 1195 non-OSA, 1303 with mild OSA, 833 with moderate OSA, and 475 with severe OSA. OBMs were evaluated on these recordings to report reference ranges for each OBM.
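The AHI thresholds above map directly to a small classification helper. In this sketch, recordings with AHI < 5 are labeled non-OSA. That lower bound is our reading, since the text defines mild OSA as starting at AHI = 5. The function name is hypothetical.

```python
def osa_severity(ahi):
    """Map an AHI value (events per hour) to the OSA severity classes used here.

    Assumption: AHI < 5 is labeled non-OSA, since mild OSA starts at AHI = 5.
    """
    if ahi < 5:
        return "non-OSA"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"


print([osa_severity(v) for v in (2.0, 5.0, 15.0, 30.0)])
```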

### Statistical and regression analysis

The median and interquartile range of the SpO2 biomarkers were computed for the following classes: non-OSA, mild, moderate, and severe OSA. A Kruskal–Wallis test was performed, followed by Dunn post hoc analysis between each pair of classes. Statistical significance or non-significance was indicated as “p < 0.05”, “p < 0.001”, or “p > 0.05”. Multivariable linear regression was performed to assess the added value of combining OBMs for estimating the AHI. To this end, linear regression was performed between individual and combined sets of biomarkers and the AHI. For each model, the adjusted R2 $$( {\overline R ^2} )$$ score was reported. $$\overline R ^2$$ is defined as:

$$\overline R ^2 = 1 - \frac{{\left( {1 - R^2} \right)\left( {{\mathrm{size}} - 1} \right)}}{{{\mathrm{size}} - {\mathrm{pred}} - 1}},$$
(39)

where R2 is the (unadjusted) coefficient of determination of the model, pred is the number of predictors, and size is the total sample size.
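The sketch below fits an ordinary least-squares model with an intercept using `numpy.linalg.lstsq` and then applies Eq. (39). It illustrates how the adjusted score is obtained for one model. It is not the exact regression pipeline used to produce the reported results, and the function name is hypothetical.

```python
import numpy as np


def adjusted_r2(X, y):
    """Fit y ~ 1 + X by least squares and return the adjusted R^2 of Eq. (39)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # a single biomarker becomes one predictor column
    y = np.asarray(y, dtype=float)
    size, pred = X.shape
    design = np.column_stack([np.ones(size), X])  # intercept + predictors
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (size - 1) / (size - pred - 1)


print(adjusted_r2([1, 2, 3, 4], [1, 3, 2, 4]))  # R^2 = 0.64, adjusted approx. 0.46
```

The adjustment penalizes each added predictor. A combined set of OBMs therefore only scores higher than a single OBM if the extra biomarkers explain enough additional AHI variance to offset the lost degrees of freedom.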

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.