New principle for measuring arterial blood oxygenation, enabling motion-robust remote monitoring

Finger-oximeters are ubiquitously used for patient monitoring in hospitals worldwide. Recently, remote measurement of arterial blood oxygenation (SpO2) with a camera has been demonstrated. Both contact and remote measurements, however, require the subject to remain static for accurate SpO2 values. This is due to the use of the common ratio-of-ratios measurement principle that measures the relative pulsatility at different wavelengths. Since the amplitudes are small, they are easily corrupted by motion-induced variations. We introduce a new principle that allows accurate remote measurements even during significant subject motion. We demonstrate the main advantage of the principle, i.e. that the optimal signature remains the same even when the SNR of the PPG signal drops significantly due to motion or limited measurement area. The evaluation uses recordings with breath-holding events, which induce hypoxemia in healthy moving subjects. The events lead to clinically relevant SpO2 levels in the range 80–100%. The new principle is shown to greatly outperform current remote ratio-of-ratios based methods. The mean-absolute SpO2-error (MAE) is about 2 percentage-points during head movements, where the benchmark method shows a MAE of 24 percentage-points. Consequently, we claim ours to be the first method to reliably measure SpO2 remotely during significant subject motion.

Scientific RepoRts | 6:38609 | DOI: 10.1038/srep38609

To provide a solution for subjects with extremely sensitive skin, e.g. preterm infants [4], the possibility of camera-based PPG measurement has been considered. This technique is referred to as remote PPG (rPPG). Most rPPG research has focused on robust extraction of the cardiac pulse signal to measure pulse rate or derived features, e.g. heart rate variability (HRV). However, camera-based SpO2 measurement has also been attempted in the last few years [5][6][7][8][9][10][11][12][13][14][15][16][17]. Besides the advantage that a camera does not require direct skin contact, it has the potential to be more robust by exploiting the spatial redundancy of the camera sensor, which is not possible with the single-spot measurement of a contact sensor. However, the lower SNR of the reflected light renders rPPG methods more susceptible to noise than PPG methods. Furthermore, the calibration of camera-based SpO2 is not trivial because of the fundamental difference between the geometries of the conventional contact source-detector and the contact-less illumination-detection; whereas the former geometry collects light that has travelled through relatively deep vasculature [18], the latter predominantly collects light that has travelled through much shallower tissue depths over much smaller distances [19]. Currently, it is not known whether the PPG signal measured in the camera geometry stems mostly from the deeper arterioles or also (partly) from the shallow capillaries. In fact, there is some controversy on whether the PPG signal stems directly from blood volume changes at all [20][21][22]. Kamshilin et al. [21] challenged the widely believed presumption that the pulsatile variations of the light absorption are mainly caused by arterial blood-volume pulsations and presented a new interpretation of remote PPG to explain their experimental observations of strongly pulsatile counter-phase PPG waveforms, which were detected as local hotspots in the amplitude and phase maps at the wrist. They proposed a new model of light interaction with biological tissue in-vivo in which pulse oscillations of arterial transmural pressure mechanically deform the connective-tissue components of the dermis, resulting in periodical changes of both the light absorption and scattering coefficient, which suggests that rPPG is an indirect measurement of arterial pressure variations. Moço et al. [22] showed, however, that a new physiological model is not required to explain the strongly pulsatile counter-phase PPG waveforms. They performed experiments in skin covered by opaque ink to prove that these effects are explained by the motion pattern of the skin: ballistocardiography (BCG). A thorough understanding of the origin of the remotely measured PPG waveform is important for pulse oximetry to ensure that the cardiac-synchronous intensity variations can solely be related to arterial blood volume variations and not to other physiological factors. In their recent study, Verkruysse et al. [23] showed that the fundamental difference in geometries of contact and non-contact methods does not harm the calibratibility of camera-based SpO2; a calibration curve determined for a population of 24 individuals was validated on 40 individuals with various skin-tones. The camera-based accuracy was found to be about ±3%, which is comparable to that of conventional transmissive probes, and therefore proves its feasibility.
The potential to measure SpO2 with a camera was first mentioned by Wieringa et al. [5], who investigated different wavelengths for rPPG, but did not show results on oxygen saturation due to poor SNR of the PPG waveforms. Humphreys et al. [6] were the first to estimate oxygen saturation with a single camera using dual-wavelength near-infrared illumination, where the PPG waveforms are obtained in transmissive mode. In ref. 7, attempts have been made to relate RGB signals to relative blood oxygen concentrations by performing Monte Carlo simulations of light transport. Scully et al. [8] showed the feasibility of using the RGB camera of a mobile phone to measure oxygen saturation, although it requires the finger to be placed on the lens, similar to ref. 15, where a setup consisting of two RGB cameras is proposed with dual wavelength illumination, and ref. 24, whose authors claim to be able to measure oxygen saturation with a smartphone camera without calibration and independent of the hardware and skin characteristics. The desire for non-contact camera-based methods was first addressed by Kong et al. [11], who attached narrow-band optical filters to the cameras. They demonstrated the feasibility of estimating SpO2 remotely with two monochrome cameras under ambient light conditions, which has been further elaborated on by Tarassenko et al. [12] by using the red and blue color channels of a single RGB camera. Another single camera approach has been presented by Shao et al. [16], but instead of using ambient light illumination, a trigger-controlled dual wavelength LED-array was proposed to measure the PPG signal at different wavelengths with a monochromatic camera. Recently, efforts have been made to minimize the effects of noise artifacts, which have a large influence on the accuracy of the described methods. Bal et al. [14] proposed using a skin detector to only select skin pixels from the detected face region, whereas Guazzi et al. [13] exploited the spatial redundancy of the camera by pruning distorted regions based on signal quality and phase information. All aforementioned methods are based on the principles of conventional pulse oximetry, which determine the blood oxygenation levels from parameters directly extracted from the PPG waveforms measured at two wavelengths. In other words, the current methods measure relative pulsatility at different wavelengths. This makes them susceptible to noise and hence their robustness is limited. The accuracy is further influenced by the sensitivity of rPPG methods to motion. None of the aforementioned methods has addressed this challenge, which limits their use in clinical practice.
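As a rough illustration of the ratio-of-ratios principle described above, the following Python sketch computes the relative pulsatility (AC/DC) at two wavelengths and maps their ratio to SpO2 with a linear calibration. The calibration coefficients `C1` and `C2`, the function names, and the use of the standard deviation as the AC estimate are assumptions made for illustration only; they are not taken from any of the cited methods.

```python
import math
import statistics

# HYPOTHETICAL linear calibration SpO2 = C1 - C2 * RR; real coefficients
# must come from a calibration study and depend on the wavelengths used.
C1, C2 = 110.0, 25.0


def relative_pulsatility(trace):
    """AC/DC of one PPG trace: std (assumed AC estimate) over mean (DC)."""
    return statistics.pstdev(trace) / statistics.fmean(trace)


def ratio_of_ratios(trace_a, trace_b):
    """Ratio of the relative pulsatilities at two wavelengths."""
    return relative_pulsatility(trace_a) / relative_pulsatility(trace_b)


def spo2_from_rr(rr):
    """Map the ratio-of-ratios to SpO2 (%) with the assumed calibration."""
    return C1 - C2 * rr


# Synthetic 10 s window at 15 FPS with a 1.2 Hz pulse: 1% pulsatility at
# the first wavelength and 2% at the second.
pulse = [math.sin(2 * math.pi * 1.2 * t / 15) for t in range(150)]
trace_a = [100.0 * (1 + 0.01 * p) for p in pulse]
trace_b = [100.0 * (1 + 0.02 * p) for p in pulse]

rr = ratio_of_ratios(trace_a, trace_b)
print(round(rr, 3), round(spo2_from_rr(rr), 1))
```

Because the AC amplitudes are only a small fraction of the DC level, any motion-induced variation enters the AC estimate directly, which is why this family of methods degrades quickly under noise.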
We introduce a new principle that allows accurate measurement even during significant subject motion. It is based on a recent technique [17] exploiting an a priori signature of relative pulsatile amplitudes at different wavelengths to extract the best quality pulse-signal from noisy data. Basically, we invert the principle of robust pulse-extraction by searching for the signature that produces the best pulse quality. This optimal signature can then be mapped to an SpO2 reading. The main asset of our method compared to all current methods is that it does not require clean PPG signals for accurate measurement; instead, it utilizes the fact that noise and motion artifacts affect the overall quality of the pulse signal, but not the optimal wavelength combination which minimizes distortions. This optimal combination is shown to remain stable, even when the signals are very noisy; results remain accurate with noise levels that can be 3 orders of magnitude higher than what the ratio-of-ratios method, used in conventional pulse oximetry, allows. Additionally, the spatial redundancy of the camera is exploited to further reduce the influence of artifacts on the measurement by using multi-site measurements. This improvement is shown to be exclusively applicable to the new principle, since the increased noise in the smaller sub-regions prohibits current ratio-of-ratios methods from profiting from such multi-site measurements. To our knowledge, this is the first non-contact camera-based method to measure blood oxygenation levels in the presence of motion and noise artifacts.
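The inverted search described above can be sketched in Python as follows. This is a minimal sketch under several assumptions that are not stated in this section: the pulse is extracted with PBV-style weights proportional to Q^-1 P (with Q the covariance-like matrix of the zero-mean channel signals), pulse quality is approximated by the dominance of the strongest spectral bin, and the candidate signatures are a linear interpolation between two hypothetical vectors `P_100` and `P_70`. None of these values is the calibration of the paper, and all function names are illustrative.

```python
import cmath
import math

N = 150  # one 10 s window at 15 FPS, as used in the text

# HYPOTHETICAL signatures for (760, 800, 840) nm, NOT the paper's calibration.
P_100 = (0.40, 0.55, 0.73)  # assumed relative pulsatilities at 100% SpO2
P_70 = (0.62, 0.55, 0.56)   # assumed relative pulsatilities at 70% SpO2


def signature(spo2):
    """Assumed linear interpolation of the signature for a given SpO2 (%)."""
    w = (100.0 - spo2) / 30.0
    return [a + w * (b - a) for a, b in zip(P_100, P_70)]


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(q, p):
    """Solve q w = p for a 3x3 system with Cramer's rule."""
    d = det3(q)
    return [det3([[p[r] if c == k else q[r][c] for c in range(3)]
                  for r in range(3)]) / d for k in range(3)]


def quality(signal):
    """Energy of the strongest spectral bin relative to all other bins."""
    n = len(signal)
    power = [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                     for t, x in enumerate(signal))) ** 2
             for k in range(1, n // 2)]
    peak = max(power)
    return peak / (sum(power) - peak + 1e-12 * peak)


def estimate_spo2(channels, candidates):
    """Pick the candidate SpO2 whose signature gives the cleanest pulse."""
    # Covariance-like matrix Q of the (zero-mean) channel signals.
    q = [[sum(a * b for a, b in zip(ci, cj)) for cj in channels]
         for ci in channels]
    best_score, best_spo2 = -1.0, None
    for spo2 in candidates:
        w = solve3(q, signature(spo2))  # assumed PBV-style weights
        pulse = [sum(w[i] * channels[i][t] for i in range(3))
                 for t in range(len(channels[0]))]
        score = quality(pulse)
        if score > best_score:
            best_score, best_spo2 = score, spo2
    return best_spo2


def synthetic_window(true_spo2):
    """Zero-mean channel signals: pulse + motion + a third distortion."""
    p = signature(true_spo2)
    motion = (0.58, 0.58, 0.58)  # equal intensity change in all channels
    other = (1.0, -1.0, 0.2)     # HYPOTHETICAL colour-distortion direction
    channels = [[], [], []]
    for t in range(N):
        phase = 2 * math.pi * t / N
        s = math.sin(12 * phase)                               # 1.2 Hz pulse
        d = 5.0 * (math.sin(3 * phase) + math.sin(5 * phase))  # slow motion
        e = 2.0 * math.sin(20 * phase)                         # other distortion
        for i in range(3):
            channels[i].append(p[i] * s + motion[i] * d + other[i] * e)
    return channels


estimate = estimate_spo2(synthetic_window(85), range(70, 101))
print(estimate)
```

In this synthetic window the motion component is several times stronger than the pulse, yet the search still depends only on which signature yields the cleanest extracted pulse, not on the absolute amplitudes of the individual channels.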

Results
The experimental results can be categorized into two sections. The first section consists of quantitative noise sensitivity results. A 4-minute recording without subject motion containing a rapid desaturation event has been selected, where multiplicative random noise with different noise levels has been added to the recording to verify the robustness of the method, both with and without exploiting multi-site measurements. The second section presents the results from a self-created dataset of healthy subjects with various skin-tones, who perform head movements to verify motion robustness. The experimental setup of all recordings consists of three identical monochrome cameras, type Manta of Allied Vision Technologies GmbH, which capture the scene simultaneously at a frame rate of 15 FPS, with a resolution of 968 × 728 pixels and with 8 bits depth. The cameras have 25 mm lenses with different optical filters mounted to them, each capturing a specific part of the light spectrum. For our benchmark dataset, optical filters with center wavelengths of 760, 800 and 840 nm are used. The reasoning for this wavelength selection is twofold: (1) the clinical desire to measure SpO2 in full darkness, e.g. during sleep in a hospital setting, and (2) the wavelengths are spectrally sufficiently spaced to provide the contrast necessary for robustness, while remaining within the spectral sensitivity of the camera sensor. [...] More details about these vectors and how they can be used to measure SpO2 can be found in the Methods section. Since the three monochrome cameras have slightly different viewpoints because of their physical spacing, the frames are aligned using an affine transformation to ensure that the pixel locations coincide for the different channels. For reference, a finger pulse-oximeter has been attached to the index finger of the subject. This data is synchronized with the camera frames.
The performance of the algorithms is evaluated with three different metrics: (1) Mean Absolute Error (MAE), (2) Root Mean Squared Error (RMSE) and (3) standard deviation (STD), where N indicates the number of evaluated samples. Additionally, correlation and Bland-Altman analysis have been performed. Our method, termed adaptive PBV (APBV), is benchmarked against a ratio-of-ratios-based (RR) algorithm, in line with the earlier proposed camera-based methods. This benchmark algorithm is extensively described in the Methods section, and is calculated with the 760 and 840 nm PPG waveforms. The SpO2 values have been estimated on time-windows with a duration of 10 seconds, with a step-size of 1 second.
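The three error metrics and the windowing scheme can be sketched in Python as below. The exact definitions are not reproduced in this section, so the sketch assumes the usual ones; in particular, STD is assumed to be the population standard deviation of the estimation error. The function names and the example values are illustrative.

```python
import math


def error_metrics(estimates, references):
    """Return (MAE, RMSE, STD) of the SpO2 error in percentage-points."""
    errors = [e - r for e, r in zip(estimates, references)]
    n = len(errors)  # N: the number of evaluated samples
    mae = sum(abs(err) for err in errors) / n
    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)
    mean_err = sum(errors) / n
    # ASSUMPTION: STD is the population standard deviation of the error.
    std = math.sqrt(sum((err - mean_err) ** 2 for err in errors) / n)
    return mae, rmse, std


def window_starts(n_frames, fps=15, window_s=10, step_s=1):
    """Start indices of the 10 s analysis windows, advanced in 1 s steps."""
    size, step = window_s * fps, step_s * fps
    return list(range(0, n_frames - size + 1, step))


# Made-up example values, for illustration only.
mae, rmse, std = error_metrics([98, 96, 91, 95], [97, 97, 93, 95])
starts = window_starts(4 * 60 * 15)  # a 4-minute recording at 15 FPS
print(mae, rmse, std, len(starts))
```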
Noise sensitivity analysis. As indicated in the introduction, motion and noise artifacts have a large influence on the performance of conventional pulse oximeters. Earlier proposed camera-based methods suffer from the same limitation. To verify the sensitivity of our method to these artifacts, a recording with a duration of 4 minutes has been selected which includes breath-holding after 1 minute to induce a hypoxemic event, resulting in a dip in oxygen saturation of more than 10 percentage-points (PP). The recording was made under incandescent lighting conditions, providing homogeneous illumination of the skin area. Incandescent light was used because of its continuous emission spectrum for both visible and infrared wavelengths. However, we only use invisible portions of the emitted spectrum, and consequently the light source can be replaced with nearly invisible LEDs. By evaluating the performance of our method for different noise levels, the accuracy of our method can be measured and compared with the benchmark algorithm. The noise-adding process operates on the value of the pixel at location $\vec{x}$ and time t for wavelength i, where η indicates the zero-mean random Gaussian noise term added to the spatial average of each sub-region at time t. The resulting multiplicative noise is similar to the distortions caused by intensity variations, typically seen during motion. An advantage of using a camera compared to a contact probe is that it allows multi-site measurements.
Regions with a distorted signal within the selected Region of Interest (RoI) which pollute the measurement can be pruned when exploiting this spatial redundancy. To individually assess the gain in performance of our principle and the contribution of using multi-site measurements, we first perform the analysis on the single-site measurement of the entire RoI, which is subsequently repeated using multi-site measurements, where the RoI is divided into equally-sized sub-regions. The details of how we exploit the spatial redundancy are described in the Methods section. Figure 1 contains the statistical results of the analysis.
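The noise-adding step used in this analysis can be approximated by the Python sketch below. It assumes that the noise level corresponds to the standard deviation σ of η and that each spatially averaged sample is scaled by (1 + η); both are interpretations of the description above, not a reproduction of the original definition. The function name and the example values are illustrative.

```python
import random


def add_multiplicative_noise(trace, sigma, rng):
    """Scale each sample by (1 + eta), with eta ~ N(0, sigma) per frame."""
    # ASSUMPTION: the "noise level" in the text is the std of eta.
    return [value * (1.0 + rng.gauss(0.0, sigma)) for value in trace]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = [100.0] * 150   # spatial mean of one sub-region over a 10 s window
noisy = add_multiplicative_noise(clean, sigma=0.006, rng=rng)
print(min(noisy), max(noisy))
```

Whether the same η is shared by all wavelengths or drawn independently per channel is not specified here; the sketch leaves that choice to the caller, who can reuse or reseed `rng` accordingly.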
The results show that the APBV method profits from multi-site measurements, in contrast to the RR method, which even shows a decrease in performance. This can be explained by the reduced SNR of the sub-regions compared to the SNR of the entire RoI. Our method is not much affected by noise in the individual PPG waveforms since the algorithm searches for the signature that provides the best SNR, even though it may be quite low. However, the performance of the RR method is highly affected by the signal quality. Averaging over more skin pixels improves signal quality and hence improves the accuracy of the RR method.
Our method requires at least two wavelengths. However, when using two instead of three wavelengths as used in the original formulation [17], one loses dimensionality to suppress distortions, which will negatively impact the performance of our method. To verify this hypothesis, we compared the performance of our method for both two and three wavelengths, where the two wavelengths are similar to the ones selected for the RR method. The results show that the RR-based benchmark algorithm is able to accurately estimate SpO2 for noise levels up to 0.006 when using a single-site measurement, whereas the APBV method is capable of estimating SpO2 for noise levels higher than 1, which indicates the large improvement in robustness of our method. The clinically acceptable accuracy criterion is specified in the International Standard for pulse-oximeter manufacture ISO 80601-2-61-2011, which requires an accuracy of ≤ 4% error in the range 70-100% SpO2 [25]. Comparing the results obtained with two and three wavelengths confirms our hypothesis that robustness decreases when reducing the number of wavelengths. Figure 2 provides a visual comparison for different noise levels when using multi-site measurements for the APBV method and an average-site measurement for the benchmark RR method, as these combinations provide the best results. These results are obtained from a simulation of an ideal case, where motion causes intensity-variations only. We expect the performance-drop to be more severe in the real-life case where the noise may have a different character. To verify this hypothesis, we repeated the previous analysis on a motion sequence. The results of this analysis are presented in Fig. 3 and the protocol used is described in Fig. 4. Real-life artifacts, like remaining parallax, tracking errors and specular reflection, may additionally lead to more harmful color-distortions. The results indeed show that the differences between the evaluated variants have increased, especially those between two and three wavelengths. Based on these results, we decided to create our dataset with three wavelengths.
Breath-holding events with motion. The results from the noise analysis show that our proposed method outperforms the benchmark algorithm in terms of robustness to multiplicative, random noise. Additionally, we found from analysis on a motion sequence that our method profits from multi-site measurements. To verify if this also holds on a broader population, we created a dataset where subjects were asked to perform continuous quasi-periodic head movements, similar to the protocol of the previous Section. A total number of 14 recordings were made of 4 healthy, non-smoking subjects with different skin-types in the range II-V of the Fitzpatrick scale [26]. We had to exclude 2 recordings from the dataset because one suffered from frame drops during acquisition and for one other our motion-tracker could not deal with the vigorous motion. An overview of the 9-minute protocol is visualized in Fig. 4. The subjects were asked to have their face focussed towards the cameras in front of them. During the first four minutes of the recording, the subjects were asked to keep their head stationary and hold their breath as long as possible starting one minute after the beginning of the test. Between the fifth and eighth minute, a second breath-holding event was timed similar to the first event, however the subjects were now asked to move their head for four minutes with a combination of translation and rotation movements. The last minute of the 9-minute protocol was stationary again with normal breathing. The duration of the entire dataset is 108 minutes with SpO2 values ranging between 80.2 and 100%.

[Figure caption] We compare the performance of our method, both with two and three wavelengths, with the benchmark RR method for different noise levels. The top row contains the results using a single-site measurement, the bottom row that of using multi-site measurements.

Figure 2. Noise sensitivity results after adding multiplicative Gaussian distributed random noise to a stationary sequence with a breath-holding event, whose start is indicated with the dotted line. It can be observed that the proposed method is much less sensitive to noise compared to the standard ratio-of-ratios (RR) method, and three wavelengths yield better robustness compared to two.
In Fig. 5, the results of all four subjects, indicated I-IV, are displayed visually. Based on the noise sensitivity results, the APBV method is evaluated using multi-site measurements, whereas the RR method uses single-site measurements. It can be observed that both the benchmark RR method and our APBV method are capable of estimating SpO2 during the static part of the sequence, when there is no head motion. During the head movements in the second part of the sequence, the RR method completely fails, whereas the APBV method is still able to estimate SpO2, although accuracy decreased compared to the static part. The statistical results are presented in Table 1. To quantify the difficulty of the sequences, both the pulse amplitude and the average noise amplitude are calculated for each subject, which are defined as the ratio of the pulsatile AC component and the stationary DC component. Here the AC component is determined by calculating the standard deviation of the concatenated spatial mean values of the 800 nm wavelength after filtering out the pulse frequency and its harmonics, and the DC component is determined by the first component of the Fourier transform. Besides the amplitudes, the ranges of pulse rates and motion frequencies present in the dataset are also listed for each subject.
During the static part of the sequences, the estimated SpO2 of our APBV method is within the clinically acceptable accuracy of ≤ 4% for 96.6% of the time, compared to 87.9% for the benchmark RR method. The inaccuracies in the static parts are mainly caused by physiological response differences between head and finger location, which will be further discussed in the next Section. In the challenging motion part of the sequences, the APBV method is clinically accurate according to the ISO standard [25] for 86.1% of the time, in contrast to the benchmark method which is clinically accurate for only 13.1% of the time. Correlation plots and Bland-Altman analysis are displayed in Fig. 6, whereby the 95% limits of agreement are set at 2σ. We split our analysis into two parts, static and motion, to distinguish between the performance of the methods for both scenarios. Overall, the average MAE on the complete dataset is 2.03 PP for the APBV method, compared to 24.2 PP for the benchmark method. This difference is mainly caused by the sensitivity to motion artifacts of the benchmark algorithm, and emphasizes the improvement of our proposed method to cope with these distortions.

[Figure caption] Results on a 9-minute motion sequence including two hypoxemic events (indicated with dotted lines). We compare the performance of our method, both with two and three wavelengths (λ), with the benchmark RR method. Furthermore, we investigate the gain in performance when using multi-site measurements (M-S) compared to a single-site measurement (S-S).
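The two summary statistics used above, the fraction of time within the clinically acceptable error and the Bland-Altman limits of agreement, can be sketched in Python as follows. The sketch assumes that the ≤ 4% criterion is applied per sample as an absolute error in percentage-points and that σ is the population standard deviation of the differences; the function names and the example values are made up for illustration.

```python
import statistics


def fraction_within(estimates, references, limit_pp=4.0):
    """Fraction of samples whose absolute error is within limit_pp."""
    hits = sum(abs(e - r) <= limit_pp for e, r in zip(estimates, references))
    return hits / len(estimates)


def bland_altman_limits(estimates, references):
    """Bias and the bias +/- 2 sigma limits of agreement."""
    diffs = [e - r for e, r in zip(estimates, references)]
    bias = statistics.fmean(diffs)
    sigma = statistics.pstdev(diffs)  # ASSUMPTION: population std
    return bias, bias - 2 * sigma, bias + 2 * sigma


# Made-up camera estimates and finger-oximeter references (% SpO2).
est = [98, 96, 91, 95, 89]
ref = [97, 97, 93, 95, 95]
share = fraction_within(est, ref)
bias, lower, upper = bland_altman_limits(est, ref)
print(share, bias, lower, upper)
```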

Discussion
The reference blood oxygenation values for our dataset are obtained with a pulse-oximeter attached to the index finger of the left hand. It is well known that finger-oximetry readings lag the patient's physiologic state; signal averaging of 4 to 20 seconds is typical of most monitors [27]. Delays due to sensor anatomic location and abnormal cardiac performance compound the lag relative to central SaO2. Forehead and ear probes are closer to the heart and therefore respond more quickly than distal extremity probes; an average delay of 15 seconds between ear and finger has been measured [28]. The response difference compared to central SaO2 is also compounded by hypoxemia and slower peripheral circulation such as low cardiac output states. As such, forehead reflectance probes are often preferred in critically ill patients. All of these response delays become clinically more important during rapid desaturation, such as those present in our dataset. Since we estimate SpO2 from the facial skin, an ear probe would be the optimal solution to minimize the response time without occluding skin pixels. However, since the subjects perform head movements, this probe location suffers from motion artifacts resulting in erroneous measurements. A finger can be easily isolated from these head movements and was therefore selected as the best alternative location. This measurement location as reference imposes two assumptions: (1) a fixed response delay between head and finger, and (2) a similar response at both measurement sites, which holds in steady-state but not during hypoxemia, e.g. during breath-holding events [29]. Therefore the comparison of saturation values between our method and the reference may not be accurate during these events, although a dip in the SpO2-curve is expected to occur at some, not too distant, point in time.
As can be observed from Fig. 7, when SpO2 decreases, the pulse amplitude at 760 nm increases whereas the amplitude at 840 nm decreases. Since the amplitude at 800 nm does not change as it is close to the isosbestic point of (oxy-)haemoglobin, the differences in amplitude decrease between the three channels. A consequence is that motion robustness of the PBV method decreases at lower blood oxygenation values. Assuming uniform, homogeneous illumination of the skin, motion will result in intensity variations which are equal in all three channels. Similar to the definition of $\vec{P}_{bv}$, these variations can be characterized with the 'motion' vector [0.58, 0.58, 0.58]. Since the pulse 'signature' $\vec{P}_{bv}$ at normal blood saturation levels is different from this motion signature, the PBV method is able to suppress these motion distortions. However, when the pulse and motion signatures become more similar, as occurs during de-saturation, it gets more difficult to discriminate between them, resulting in a pulse signal with lower SNR. The selection of wavelengths has a large influence on the robustness of the method. Motion robustness can be improved by maximizing the angle between the pulse and motion vector, as has been investigated in our previous study [30]. Our wavelength selection is mainly motivated by the clinical desire to measure SpO2 in full darkness. However, when full darkness is not a strict criterion, it is preferred to increase [...] Fig. 8, and confirm the improved robustness of our method with the 675 nm wavelength.

[Figure caption] It can be observed that the RR-based method achieves satisfactory results in the first static part of the recordings, but completely fails in the presence of motion artifacts. The proposed APBV method is capable of estimating SpO2 during motion, although a decrease in accuracy can be identified compared to the static part.
Because the RR method also profits from an increased contrast, the comparative results of this method are added to the figure. Similarly, some gain can be expected from choosing the longest wavelength even longer. There is a limitation here, in that the camera-sensitivity drops rapidly for longer wavelengths, while water-absorption may also start to play a role. Statistical results are presented in Table 2. As expected, the performance does not change considerably during the static part of the recording; an average error of 1.17 versus 1.21 PP. In the challenging motion part of the sequence, however, a noticeable gain in accuracy can be recognized; an average error of 1.40 compared to 1.98 PP when using 675 nm. Since the motion distortions are difficult to quantify in sequences where subjects are asked to move their head continuously, we performed a noise sensitivity analysis similar to the one described in the Results section. The results of this analysis are presented in Fig. 9, and confirm our previous results that clinically acceptable accuracy can be achieved at higher noise levels when the shortest wavelength is moved towards visible red. Please note that this change of wavelength improves motion robustness at normal oxygen saturation levels, but is not a solution to the reduced robustness at low saturation levels since the angle between the pulse and motion vector still decreases for decreasing SpO2 levels.

[Figure caption, truncated] ... 41. Right: The relative PPG amplitude spectra for 60% and 100% SpO2.
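The role of the angle between the pulse signature and the motion vector can be illustrated with the short Python sketch below. The motion vector [0.58, 0.58, 0.58] is taken from the text; the two pulse signatures are hypothetical values chosen only to mimic the described trend (the 760 nm amplitude rising and the 840 nm amplitude falling during de-saturation) and are not the calibrated signatures of the method.

```python
import math

MOTION = (0.58, 0.58, 0.58)  # motion vector quoted in the text


def angle_deg(u, v):
    """Angle in degrees between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


# HYPOTHETICAL signatures for (760, 800, 840) nm, for illustration only.
p_normal = (0.40, 0.55, 0.73)  # assumed signature at normal saturation
p_low = (0.62, 0.55, 0.56)     # assumed signature after de-saturation

print(angle_deg(p_normal, MOTION), angle_deg(p_low, MOTION))
```

With these assumed values the angle shrinks as saturation drops, which matches the argument that the pulse becomes harder to separate from motion at low SpO2 regardless of where the shortest wavelength is placed.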
A common problem in pulse-oximetry is the decreased accuracy for individuals with dark skin, especially the overestimation of SpO2 during de-saturation [31]. Because of the higher melanin content of their skin, more light is absorbed, resulting in reflected light with a lower pulsatile amplitude compared to skin with a lower pigmentation level. Skin reflectance is rather uniform in the infrared part of the light spectrum, as measured by ref. 32. Consequently, whereas both the AC and DC components decrease for increasing melanin content, the ratio of both remains stable, resulting in a stable signature-vector $\vec{P}_{bv}$ over the entire range of skin pigmentation levels, as verified in our large scale study [30]. As a result, the calibration coefficients, i.e. the values of the static and update signature-vectors of our method, are expected to be independent of the skin pigmentation level. Our dataset contains one subject, subject II, with Fitzpatrick skin-type V to challenge this hypothesis. The result of this sequence is visualized in Fig. 5. It can be observed that there is indeed no decrease in accuracy visible for our method, especially during the two de-saturation events, whereas the ratio-of-ratios based method overestimates SpO2 in the static part of the sequence, which is likely caused by the reduced SNR of the PPG waveforms. It should however be noted that in contrast to our previous large scale study with 40 subjects, this 'proof of concept' dataset only includes one dark-skinned subject with SpO2 values in a rather narrow range.
The values of the PBV signature-vector for an arbitrary wavelength combination can be calculated by Equation 16. In our method, we assumed the PPG amplitude spectrum to be the only time-varying factor in this expression. However, it may occur that the illumination spectrum I(λ) varies over time, e.g. due to ambient light interference, or the simulated light spectrum deviates from the actual light spectrum. From Equation 16, it can be observed that this will affect the values of the PBV vector, and jeopardizes the accuracy and calibratibility of our method, but also all existing methods will suffer from this problem. However, in most clinical environments illumination is strictly controlled, and chromaticity changes of the light source are not likely to occur. Moreover, the problem can be prevented with narrow-bandwidth optical filters. This makes results virtually independent of the illumination spectrum.
Our method focusses on application in near-infrared. However, our method could also be applied in visible light, in the wavelength range [400-700] nm. As can be observed from Fig. 7, it is mainly the red wavelengths which are affected by changes in SpO2, whereas the shorter wavelengths only vary slightly. The PBV method has originally been developed for applications in visible light using an RGB camera. By only adjusting the values of the PBV vectors, one could apply our method in visible light to arrive at a more motion-robust version of Guazzi et al. [13]. A critical remark should however be made about the choice for the blue and red color channels of their RR-based method. Because of the shallow penetration depth of blue, it suffers much more from specular reflections compared to the other two color channels. Since SpO2 is determined by measuring relative pulsatilities of arterial blood at different wavelengths, it is likely that the reported measurements are corrupted by these artifacts, which jeopardizes the calibratibility of the method.
A potential risk for measurements in visible light is the temperature dependency, which makes calibration much more difficult. Since green and particularly blue wavelengths have much shallower skin penetration depths compared to red and near-infrared, the reflected light is mostly determined by blood-absorption in the capillaries close to the surface, and much less by the blood-volume variations in the arterioles. Since capillaries are closer to the skin surface, environmental temperature will have a large influence on these, but much less on the arterioles, which are located in the deeper layers of the skin and will therefore have a temperature close or equal to the body temperature. It has been observed that the relative PPG-amplitude is positively correlated with skin-temperature [33]. At low ambient temperatures, the viscosity of blood increases, which together with sympathetic-mediated vasoconstriction will decrease blood flow in the cooled skin-region. This risk of environment temperature dependence may however not be an issue, because of the strictly controlled environmental temperature in most clinical settings.

In summary, our work builds on the calibratibility proof of camera-based SpO2 measurement in near-infrared, and uses the same signal acquisition as in ref. 23. Our new principle for SpO2-estimation, however, radically changes the signal processing to achieve significant motion robustness for the contact-less scenario. We first prove our new measurement principle to be an improvement over RR on noisy data. Next, we show the unique characteristic of the new principle that it can operate on noisy data, enabling us to add multi-site measurements on small regions which are problematic for RR due to lower SNR, and combine it with a relatively simple motion-tracker to show that this results in a much increased performance.
We would like to emphasize that the obtained results will depend on the quality of the tracker and the parameter choice of the multi-site measurement. We did not attempt to fully optimize these system aspects, and also cannot claim clinical validity of our proposal. The method has been validated on sequences with motion frequencies both inside and outside the heart rate frequency band. We recognize, however, that it is possible to imagine a scenario, e.g. continuous periodic head movements with a constant frequency similar to the pulse rate, in which our method will likely be inaccurate. This should be expected, since to determine oxygenation levels we estimate an SNR, which will be incorrect if the distortion frequency and the pulse frequency coincide. Although such a scenario might occur, e.g. in a fitness setting where exercise causes periodic motion, we expect it to be unlikely in a clinical setting. Since an SpO2 measurement seems less relevant for fitness, we did not include this scenario in our dataset.
Our results have been obtained on a dataset recorded in a controlled environment on healthy subjects to prove the concept. For future work, we are planning to perform a clinical validation of our method in a hospital setting on patients suffering from obstructive sleep apnea (OSA). These patients can have very low and rapidly varying oxygenation values in combination with body movements, which are not easy to simulate by healthy subjects. In current practice, OSA patients are monitored with contact sensors during sleep, causing stress and discomfort. Our method could be a valuable alternative to mitigate these side-effects while simultaneously eliminating the use of disposables.

Methods
Conventional pulse oximetry estimates blood oxygen saturation levels by extracting features from multiple PPG waveforms measured at different wavelengths. The earlier proposed contact-less camera-based methods are based on a similar principle, e.g. ref. 13, and will therefore be used as benchmark for our method. This section is organized as follows: (1) we will first concisely derive the principles of conventional pulse oximetry and present the adaptations made for our benchmark algorithm to improve robustness to noise and motion artifacts, (2) explain the robust pulse extraction method, and (3) present how we exploit this method to measure SpO2 robustly.
Ratio-of-ratios method. Arterial blood oxygen saturation is defined as the ratio of oxyhaemoglobin (HbO2) to total haemoglobin, the iron-containing protein which serves as the oxygen carrier in blood:

$$\mathrm{SpO_2} = \frac{C_{HbO_2}}{C_{HbO_2} + C_{Hb}} \times 100\%$$

where $C_{HbO_2}$ and $C_{Hb}$ are the concentrations of oxyhaemoglobin and deoxyhaemoglobin, respectively. The theory of conventional pulse oximetry has been described in several publications 2,34,35 and is referred to as the "ratio-of-ratios" method. Let us now derive this method, which is used as benchmark and will later be used to explain our method. Pulse oximetry measures SpO2 non-invasively and continuously by applying spectroscopic techniques and the Beer-Lambert law, which describes the transmission of light through a material as a function of incident light intensity ($I_0$), extinction coefficient ($\varepsilon(\lambda)$), path length of the light ($l$), and concentration of the substance ($C$):

$$I = I_0 \, e^{-\varepsilon(\lambda)\, C\, l} \tag{3}$$

Figure 7 shows the electromagnetic radiation extinction spectra for oxygenated and deoxygenated haemoglobin in the visible and near-infrared regions of the light spectrum. As can be observed, deoxygenated haemoglobin absorbs more red light than oxygenated haemoglobin, which in turn absorbs more infrared light for λ > 800 nm. By comparing the optical extinction at these two regions of the light spectrum, a pulse oximeter can distinguish between the two haemoglobin species. Typical wavelengths of pulse oximeters are 660 and 940 nm, and are selected based on locations in the spectra where relatively large extinction coefficient differences between the two haemoglobin species are present. When the light is emitted through a peripheral site, there are many tissues with different path lengths, concentrations and extinction coefficients contributing to the light attenuation. The Beer-Lambert law, Equation 3, can be split into three components: attenuation due to arterial Hb, attenuation due to arterial HbO2, and attenuation due to other tissues.
Their contributions to the overall light extinction are assumed to be additive:

$$I_v = I_0 \exp\!\left(-\left[\varepsilon_{Hb}(\lambda) C_{Hb} + \varepsilon_{HbO_2}(\lambda) C_{HbO_2}\right] l_b - \varepsilon_{tissue}(\lambda)\, C_{tissue}\, l_{tissue}\right) \tag{4}$$

where $l_b$ is the path length through the arterial blood, and $l_{tissue}$ is the path length through other tissues. This equation applies between pulses of arterial blood, i.e., at the valley of the PPG waveform ($I_v$). Pulse oximetry uses the pulsatile nature of arterial blood to isolate the Hb and HbO2 terms. When an arterial blood pulse enters the peripheral site, the arteries dilate and the path length through the arterial blood changes slightly ($\Delta l$). For the light attenuation at the peak of the arterial pulse ($I_p$), a second light absorption equation can be set up:

$$I_p = I_0 \exp\!\left(-\left[\varepsilon_{Hb}(\lambda) C_{Hb} + \varepsilon_{HbO_2}(\lambda) C_{HbO_2}\right] (l_b + \Delta l) - \varepsilon_{tissue}(\lambda)\, C_{tissue}\, l_{tissue}\right) \tag{5}$$

The Hb and HbO2 terms can now be isolated by dividing Equation 5 by Equation 4:

$$\frac{I_p}{I_v} = \exp\!\left(-\left[\varepsilon_{Hb}(\lambda) C_{Hb} + \varepsilon_{HbO_2}(\lambda) C_{HbO_2}\right] \Delta l\right) \tag{6}$$

whereby the effects of other tissues and the power of the incident light are cancelled out. This expression depends on the path length difference $\Delta l$, which is unknown. A new equation can be established by changing the incident wavelength. By examining Equation 6 for two different wavelengths, $\lambda_1$ and $\lambda_2$, and taking the ratio of the logarithms, its dependence on $\Delta l$ can be eliminated, assuming an equal path length through the arterial blood for both wavelengths:

$$\frac{\ln\left(I_p / I_v\right)_{\lambda_1}}{\ln\left(I_p / I_v\right)_{\lambda_2}} = \frac{\varepsilon_{Hb}(\lambda_1) C_{Hb} + \varepsilon_{HbO_2}(\lambda_1) C_{HbO_2}}{\varepsilon_{Hb}(\lambda_2) C_{Hb} + \varepsilon_{HbO_2}(\lambda_2) C_{HbO_2}}$$

[...] and motion artifacts. However, the signal-to-noise ratio (SNR) of camera-based PPG signals is much lower compared to those obtained by contact PPG. Therefore, we seek to minimize the distortions which pollute the measurement, similar to the proposed camera-based methods described in the Introduction, and use this method as benchmark algorithm for our analysis. First, the blind source separation technique Independent Component Analysis (ICA) is performed on the temporally-normalized traces of all three color channels. Similar to ref. 
36, the component with the largest energy peak in the heart rate band of the frequency spectrum is selected as pulse signal, $\vec{S}$, and the corresponding energy peak as pulse rate:

$$\hat{P} = \underset{f \in [f_1, f_2]}{\arg\max}\; \max_{i \in \{1, \dots, c\}} \left| \mathcal{F}(\vec{S}_i)(f) \right|$$

where c indicates the number of independent components, $\mathcal{F}$ is the Fourier transform, and $f_1$, $f_2$ are the minimum and maximum plausible pulse rates respectively, typically set at [0.8, 4] Hz for adults. The estimated pulse rate, $\hat{P}$, is used for the design of the narrow-band adaptive bandpass filter, which eliminates all frequencies not associated with the cardiac pulse signal. The AC components are subsequently estimated by calculating the median of the peak-to-valley values detected in the time domain of the filtered signals within the time-window, whereas the DC components are estimated by taking the median of the low-pass filtered signals, with a cut-off frequency of 0.05 Hz.
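The component selection described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the ICA step is assumed to have been run already, the function name `select_pulse_component` and the synthetic components are hypothetical, and only NumPy is used.

```python
import numpy as np

def select_pulse_component(components, fs, f_lo=0.8, f_hi=4.0):
    """Return (index, pulse_rate_hz) of the component with the largest
    spectral energy peak inside the heart rate band [f_lo, f_hi]."""
    components = np.asarray(components, dtype=float)
    freqs = np.fft.rfftfreq(components.shape[1], d=1.0 / fs)
    power = np.abs(np.fft.rfft(components, axis=1)) ** 2
    band = (freqs >= f_lo) & (freqs <= f_hi)
    band_power = power[:, band]
    # Joint maximum over components (rows) and in-band frequencies (columns).
    idx, peak_bin = np.unravel_index(np.argmax(band_power), band_power.shape)
    return int(idx), float(freqs[band][peak_bin])

# Hypothetical components standing in for the ICA output (30 s at 20 fps).
fs = 20.0
t = np.arange(600) / fs
components = np.vstack([
    3.0 * np.sin(2 * np.pi * 0.3 * t),  # strong, but below the heart rate band
    1.0 * np.sin(2 * np.pi * 1.5 * t),  # pulse-like component at 90 BPM
    0.1 * np.sin(2 * np.pi * 6.0 * t),  # weak, above the heart rate band
])
idx, pulse_rate = select_pulse_component(components, fs)
```

Restricting the search to the band before taking the maximum is what keeps the large out-of-band component from being selected.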
From the original three wavelengths, the two wavelengths with the largest contrast, i.e. the wavelengths which have the largest absorbance differences between Hb and HbO2, are typically selected for the calculation of SpO2. The calibration coefficients α and β are determined by performing linear regression on all static sequences from all subjects, and are therefore not patient specific. We applied the same calibration coefficients to all sequences and subjects, based on the findings of the recent calibration study of Verkruysse et al. 23. For the multi-site measurements, the RR method uses features similar to the ones presented in the Framework section to prune distorted sub-regions. However, instead of calculating an average pulse-signal from the non-pruned sub-regions, the AC and DC components are determined by calculating the mean of the AC and DC components from the non-pruned sub-regions.
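The ratio-of-ratios derivation and the linear calibration can be checked numerically with a small sketch. Everything numeric here is an assumption made for illustration: the extinction coefficients in `EPS` are arbitrary values (not tabulated haemoglobin data), the total haemoglobin concentration is set to one, and the calibration is assumed to have the linear form SpO2 = α + β · RR implied by the linear regression mentioned above.

```python
import numpy as np

# Hypothetical extinction coefficients (arbitrary units), chosen only so that
# Hb dominates at wavelength 1 (red) and HbO2 at wavelength 2 (infrared).
EPS = ({"Hb": 3.2, "HbO2": 0.3}, {"Hb": 0.7, "HbO2": 1.2})

def intensity(i0, eps, c_hb, c_hbo2, l_blood, tissue_att):
    """Beer-Lambert intensity with additive arterial and tissue attenuation."""
    arterial = (eps["Hb"] * c_hb + eps["HbO2"] * c_hbo2) * l_blood
    return i0 * np.exp(-arterial - tissue_att)

def ratio_of_ratios(spo2, delta_l, i0=(1.0, 1.0), tissue=(0.5, 0.5), l_b=0.1):
    """ln(I_p / I_v) at wavelength 1 divided by the same quantity at wavelength 2."""
    c_hbo2, c_hb = spo2, 1.0 - spo2  # SpO2 as a fraction, total haemoglobin = 1
    logs = []
    for eps, i0_k, tis_k in zip(EPS, i0, tissue):
        i_v = intensity(i0_k, eps, c_hb, c_hbo2, l_b, tis_k)            # valley
        i_p = intensity(i0_k, eps, c_hb, c_hbo2, l_b + delta_l, tis_k)  # peak
        logs.append(np.log(i_p / i_v))
    return logs[0] / logs[1]

# The ratio does not depend on delta_l, the incident power or the tissue term.
rr_a = ratio_of_ratios(0.9, delta_l=0.01)
rr_b = ratio_of_ratios(0.9, delta_l=0.03, i0=(2.0, 5.0), tissue=(0.9, 0.2))

# Assumed linear calibration SpO2 = alpha + beta * RR, fitted by least squares.
levels = np.linspace(0.7, 1.0, 7)
rr = np.array([ratio_of_ratios(s, delta_l=0.01) for s in levels])
beta, alpha = np.polyfit(rr, levels, 1)
max_error = np.max(np.abs(alpha + beta * rr - levels))
```

In this toy model the relation between RR and SpO2 is only approximately linear, which is why `max_error` is small but not zero.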
PBV method. Current research in the field of remote PPG mainly focusses on extracting the cardiac pulse signal in the presence of motion and noise artifacts. Recently, de Haan et al. 17 presented a method, the "PBV-method", which uses the unique 'signature' of the blood volume pulse signal. An interesting property of this method is that it utilizes the different relative pulsatile amplitudes in the color channels to differentiate between intensity variations induced by blood volume changes and variations which are not related to these. Since SpO2 affects the pulsatile amplitudes of the color channels, this can be exploited to measure blood oxygenation values, as will be shown later. We will first summarize the PBV method, and subsequently explain how this method can be adapted to measure SpO2 in the next Section. De Haan et al. showed that the minute optical absorption changes caused by blood volume variations in the skin occur along a very specific vector in a normalized RGB-space. This unique blood volume signature enables robust rPPG pulse extraction that minimizes the contribution to the pulse-signal of color variations with other signatures. Compared to the motion robust chrominance-based pulse-extraction method by de Haan et al. 37, no assumptions about the distortion signals have to be made. Instead, the known relative pulsatile amplitudes $\vec{P}_{bv}$ in the mean-centered normalized color channels are employed to discriminate between the pulse-signal and distortions.
We assume that the pulse-signal $\vec{S}$ can be constructed as a linear combination of the three normalized color channels $C_N$:

$$\vec{S} = \vec{W} C_N$$

Since the relative pulsatile amplitudes in the color channels of the camera are known based on physiology and optics, the aim is to find the weights, $\vec{W}$, that construct the pulse-signal $\vec{S}$, for which the correlation with the normalized color channels equals $\vec{P}_{bv}$:

$$\vec{S} C_N^T = k \vec{P}_{bv} \;\Leftrightarrow\; \vec{W}_{PBV}\, C_N C_N^T = k \vec{P}_{bv}$$

and therefore the weights $\vec{W}_{PBV}$ can be calculated using:

$$\vec{W}_{PBV} = k \vec{P}_{bv}\, Q^{-1}, \quad Q = C_N C_N^T$$

where scalar k is chosen to ensure that $\vec{W}_{PBV}$ is normalized in ℓ2-norm sense. To employ the PBV-method and extract the cardiac pulse-signal, the relative pulsatile amplitudes of the channels, compiled in the normalized blood volume pulse vector $\vec{P}_{bv}$, have to be known. Let us summarize the prediction of the pulse vector from physiology and optics following ref. 17. The relative, AC/DC, PPG amplitude as function of the wavelength λ has been modeled by Hülsbusch 38. Corral 39 measured the absolute, i.e. AC, PPG spectrum using a tungsten-halogen lamp as illumination, which emits radiation in both the visible and NIR section of the light spectrum. The relative PPG can be related to the absolute PPG by:

$$RPPG(\lambda) = \frac{PPG(\lambda)}{\rho_s(\lambda)\, I_h(\lambda)}$$

since the light-source and the skin determine the baseline component of the absolute PPG spectrum. Here $\rho_s(\lambda)$ and $I_h(\lambda)$ represent the skin reflectance spectrum and the emission spectrum of the tungsten-halogen illumination, respectively. The relative pulsatile amplitudes in the three channels of a camera, described by the blood volume pulse vector $\vec{P}_{bv}$, are given by: [...] 30. An example to illustrate the ability of the PBV method to suppress distortions present in the PPG signals is displayed in Fig. 10.
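The weight calculation of the PBV method can be illustrated with a short sketch on synthetic data. The signature `p_bv`, the distortion direction `p_motion`, and all amplitudes are hypothetical values chosen for the example; they are not the vectors used in this work.

```python
import numpy as np

def pbv_weights(c_n, p_bv):
    """W_PBV = k * P_bv * Q^-1 with Q = C_N C_N^T; k normalizes W to unit l2-norm."""
    q = c_n @ c_n.T
    w = np.linalg.solve(q, p_bv)  # Q is symmetric, so this equals P_bv Q^-1
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
fs, n = 20.0, 1200
t = np.arange(n) / fs
pulse = np.sin(2 * np.pi * 1.2 * t)                # 72 BPM pulse
motion = 5.0 * np.sin(2 * np.pi * 0.45 * t + 0.3)  # distortion, 5x the pulse amplitude

p_bv = np.array([0.3, 0.5, 0.8])      # hypothetical blood volume signature
p_motion = np.array([1.0, 1.0, 1.0])  # distortion that is equal in all channels

# Normalized, mean-centered color channels: pulse + distortion + sensor noise.
c_n = (np.outer(p_bv, pulse) + np.outer(p_motion, motion)
       + 0.05 * rng.standard_normal((3, n)))
c_n -= c_n.mean(axis=1, keepdims=True)

w_pbv = pbv_weights(c_n, p_bv)
s = w_pbv @ c_n                        # extracted pulse signal

# By construction, the correlation of S with the channels is parallel to P_bv.
corr_vec = s @ c_n.T
cosine = corr_vec @ p_bv / (np.linalg.norm(corr_vec) * np.linalg.norm(p_bv))
pulse_corr = abs(np.corrcoef(s, pulse)[0, 1])
raw_corr = abs(np.corrcoef(c_n[0], pulse)[0, 1])
```

Solving the linear system instead of forming the inverse of Q explicitly is a numerical design choice of this sketch; both give the same weights up to rounding.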
Adaptive PBV method. Up to now, we considered $\vec{P}_{bv}$ to be static. However, as can be observed from Equation 16, $\vec{P}_{bv}$ is partly determined by the PPG amplitude spectrum. Different blood oxygen values reflect different mixtures of the absorption spectra of HbO2 and Hb, leading to different PPG amplitude spectra. The PPG spectra for 60 and 100 percent SpO2 are visualized in Fig. 7. As can be observed from this figure, the PPG amplitude increases for λ < 800 nm in the near-infrared part of the light spectrum when SpO2 decreases, whereas the opposite holds for λ > 800 nm. Consequently, the values of the 'signature' vector $\vec{P}_{bv}$ change for different blood oxygenation levels. Our method exploits this observation by applying a collection of $\vec{P}_{bv}$ vectors, each corresponding to a specific SpO2 value. The values of $\vec{P}_{bv}$ can be determined by Equation 16, where the PPG amplitude spectrum, PPG(λ), is the only SpO2-dependent term. Since the PPG spectrum is partly determined by a linear mixture of the spectra of oxygenated and deoxygenated haemoglobin, the collection of examined $\vec{P}_{bv}$ vectors can be expressed as:

$$\vec{P}_{bv} = \vec{P}^{s}_{bv} + \alpha\, \vec{P}^{u}_{bv} \tag{17}$$

where $\vec{P}^{s}_{bv}$ is the static vector corresponding with 100 percent SpO2, α indicates the gain factor (not to be confused with the calibration coefficient α of the RR method) and $\vec{P}^{u}_{bv}$ the update vector, which describes the ratio of amplitude changes for decreasing SpO2 in the different channels. As can be observed from Equation 16, the values of Equation 17 depend on the selected wavelengths and the optical characteristics of the camera, where the PPG amplitude spectrum can be linearly interpolated within a clinically relevant range of blood oxygenation levels, as visualized in Fig. 7. As elaborated in the previous paragraph, the PBV method calculates the weights $\vec{W}_{PBV}$ for which the correlation between $\vec{S}$ and $C_N$ equals the blood volume pulse 'signature' $\vec{P}_{bv}$. 
When the relative pulsatile amplitudes of the color channels do not match $\vec{P}_{bv}$, the method will mix in noise to ensure that the correlation between the pulse signal $\vec{S}$ and the normalized color channels $C_N$ equals $\vec{P}_{bv}$, as has been mentioned in ref. 17. Consequently, the quality of the resulting pulse signal reduces when $\vec{P}_{bv}$ deviates from the correct vector. Since we can calculate $\vec{P}_{bv}$ for different blood oxygenation levels, the vector which provides the pulse signal with the highest SNR describes the data best, and can consequently be related back to an SpO2 value. An illustrative example of this principle on synthetic data is displayed in Fig. 11.
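The selection principle can be sketched on synthetic data as follows. The static vector, update vector, gain values and the mapping of vector index to SpO2 level are all hypothetical, and the SNR used here is a simplified spectral ratio around the fundamental only (no harmonics), so it should be read as an illustration of the idea rather than the metric used in this work. The pulse frequency is assumed to be known.

```python
import numpy as np

def pbv_pulse(c_n, p_bv):
    """Pulse signal for one candidate signature: S = W C_N, W = k P_bv Q^-1."""
    q = c_n @ c_n.T
    w = np.linalg.solve(q, p_bv)
    return (w / np.linalg.norm(w)) @ c_n

def spectral_snr(signal, fs, f_pulse, half_width=0.1):
    """Energy near the pulse frequency divided by the remaining spectral energy."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    in_band = np.abs(freqs - f_pulse) <= half_width
    return power[in_band].sum() / power[~in_band].sum()

def select_signature(c_n, candidates, fs, f_pulse):
    """Index of the candidate whose pulse signal has the highest SNR."""
    snrs = [spectral_snr(pbv_pulse(c_n, p), fs, f_pulse) for p in candidates]
    return int(np.argmax(snrs)), snrs

# Hypothetical collection P_bv = P_s + alpha * P_u (Equation 17).
p_static = np.array([0.3, 0.5, 0.8])
p_update = np.array([0.4, 0.0, -0.2])
gains = np.linspace(0.0, 1.0, 5)
candidates = [p_static + a * p_update for a in gains]
spo2_levels = np.linspace(100.0, 60.0, 5)  # assumed uniform index-to-SpO2 mapping

# Synthetic channels whose true signature is candidate 3.
rng = np.random.default_rng(1)
fs, n, f_pulse, true_index = 20.0, 1200, 1.2, 3
t = np.arange(n) / fs
c_n = (np.outer(candidates[true_index], np.sin(2 * np.pi * f_pulse * t))
       + 0.05 * rng.standard_normal((3, n)))
c_n -= c_n.mean(axis=1, keepdims=True)

best, snrs = select_signature(c_n, candidates, fs, f_pulse)
spo2_estimate = spo2_levels[best]
```

A mismatched signature forces the weights towards directions that contain mostly noise, so its SNR drops sharply relative to the matching candidate.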
Essentially, this proposed method inverts the process of measuring SpO2 compared to the conventional ratio-of-ratios principle; whereas the ratio-of-ratios principle estimates SpO2 from features of the PPG waveforms of the individual wavelengths, our proposed method examines a collection of 'signatures' of oxygenation levels, and determines SpO2 from the signature which describes the data best, based on the quality of the pulse signals. This has the advantage that artifacts present in the PPG waveforms, such as motion and noise, which are the main cause of erroneous measurements in conventional pulse oximetry, can be eliminated. Moreover, we are much less dependent on the overall quality of the pulse signal because the optimal signature remains stable, even when the pulse signal itself is very noisy. However, a requirement for the method is a correct determination of the pulse frequency, which is necessary to identify the differences in pulse quality for different $\vec{P}_{bv}$ vectors. In the next section we will describe how this "adaptive PBV method" (APBV) is incorporated in the general framework, which additionally exploits the spatial redundancy of the camera, i.e. multi-site measurements robustly combined in a single value, to further improve robustness.
Framework. The proposed framework is visualized in Fig. 12 and can be divided into four operations: (1) RoI tracking, (2) pulse extraction, (3) feature calculation, and (4) PBV selection. We will now discuss each operation separately.
Figure 11. Illustrative example to demonstrate the principle of the adaptive PBV method. Synthetic data with added noise (A pulse = 2 · 10^−3, A noise = 1 · 10^−2) is generated to simulate a slow, linear, desaturation event with oxygenation levels in the range 60-100 percent and a constant pulse rate of 70 BPM. Within this range, nine PBV vectors are uniformly sampled (1 = 100%, 9 = 60%). It can be observed from the spectrograms and the PBV indices (red) that the PBV vector corresponding with the pulse signal with the highest SNR can be mapped to the correct oxygenation level. The PBV indices (red) indicate the PBV vector with the highest SNR for each time-window of 8 seconds.
• RoI tracking: The Region of Interest (RoI) comprising skin area is manually initialized in the first frame and tracked over time using the CSK algorithm of Henriques et al. 40, similar to our earlier study 30. To exploit the spatial redundancy of the camera sensor, the rectangular RoI is divided into M equally sized sub-regions for which the spatial average is calculated. By concatenating these values over time for each sub-region and wavelength, traces are constructed, which are subsequently used to extract the pulse signals in the next processing step.
• Pulse extraction: The traces of the spatial averages are mean-centered normalized [...]
• Feature calculation: [...] non-skin pixels or suffer from local motion distortions. To minimize the effects of these sub-regions in the next, crucial SpO2 estimation step, a quality measure for each sub-region is calculated to prune distorted regions. This quality measure, Q, consists of two spatiotemporal features: (1) cross-spectral signal-to-noise ratio (SNR), and (2) spectral peak correspondence:
(1) Cross-spectral signal-to-noise ratio (SNR): The spectrum of a clean pulse signal consists of a peak at the pulse frequency and multiple smaller peaks at the locations of the harmonics. A common metric to express the quality of a signal is the signal-to-noise ratio (SNR), which expresses the ratio between the spectral energy of the fundamental frequency and other components present in the spectrum. Our SNR metric is similar to the one proposed in ref. 37, and can be expressed as: [...] where U is a binary template centered around the pulse peak location and its harmonics, ⊙ indicates element-wise multiplication, i, j are sub-region indices, and 1 ≤ p ≤ N_P. The peak location is determined by selecting the peak of the histogram, where the histogram is taken over the location of the peak frequency in each of the N_P × M pulse signals. Different from ref. 37, not only the SNRs of each individual sub-region are calculated, but also the SNRs of all sub-region combinations, which is closely related to spectral coherence. These cross-spectral SNR values express the spectral correlation of the sub-regions, and are subsequently normalized in the range [0,1]:

$$\widehat{\mathrm{SNR}} = \frac{\mathrm{SNR}}{\max_{p \in N_P} \max_{i,j \in M} \left(\mathrm{SNR}_{ij}^{p}\right)}$$
(2) Spectral peak correspondence: The pulse rate is expected to be similar for all sub-regions within the selected RoI, resulting in similar frequency peaks. Regions with a distorted pulse signal are likely to have a frequency peak different from the pulse frequency. This deviation can be expressed by calculating the spectral peak differences of all sub-region combinations, with the peak locations obtained by an argmax over the respective spectra: [...]
The final quality measure Q is defined as the element-wise multiplication of the normalized SNR and peak correspondence scores: $Q = \widehat{\mathrm{SNR}} \odot C$. A visualization of the features is displayed in Fig. 13 for different scenarios, with N_P = 9 and M = 30.
• PBV selection: By adding the row-elements of Q, a quality measure for each sub-region is obtained, $\vec{Q}_s$. Sub-regions with a quality measure smaller than $k\,\mu(\vec{Q}_s)$ are pruned (default k = 1). After pruning the distorted sub-regions, a single pulse signal is calculated for each pulse vector p by taking the mean of the pulse traces from the remaining sub-regions. From the N_P pulse signals, the pulse signal with the highest SNR is selected and its corresponding $\vec{P}_{bv}$ can be related back to an SpO2 value, as explained in the description of the APBV method.
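The pruning step of the PBV selection can be sketched as below for a single PBV vector. The normalized SNR matrix, the peak correspondence matrix and the pulse traces are small hypothetical inputs; how these features are computed is not part of the sketch, and the function name `prune_and_average` is an assumption.

```python
import numpy as np

def prune_and_average(snr_norm, peak_corr, pulse_traces, k=1.0):
    """Combine the two features, prune low-quality sub-regions, average the rest."""
    q = snr_norm * peak_corr      # element-wise product: quality matrix Q
    q_s = q.sum(axis=1)           # adding the row elements: one score per sub-region
    keep = q_s >= k * q_s.mean()  # prune scores smaller than k * mean(Q_s)
    return pulse_traces[keep].mean(axis=0), keep

# Hypothetical features for M = 4 sub-regions; sub-region 3 is distorted.
snr_norm = np.array([[1.00, 0.90, 0.80, 0.10],
                     [0.90, 1.00, 0.85, 0.10],
                     [0.80, 0.85, 0.90, 0.05],
                     [0.10, 0.10, 0.05, 0.20]])
peak_corr = np.array([[1.0, 1.0, 1.0, 0.2],
                      [1.0, 1.0, 1.0, 0.2],
                      [1.0, 1.0, 1.0, 0.2],
                      [0.2, 0.2, 0.2, 1.0]])
pulse_traces = np.array([[1.0, 2.0],
                         [3.0, 4.0],
                         [5.0, 6.0],
                         [100.0, 100.0]])  # outlier trace from the distorted region

mean_pulse, keep = prune_and_average(snr_norm, peak_corr, pulse_traces)
```

With the default k = 1, the distorted sub-region falls below the mean quality score and does not contribute to the averaged pulse signal.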
Finally, the estimated SpO2 values are low-pass filtered using a five-point moving average filter.
Figure 12. The proposed framework: (1) the selected RoI (forehead) is tracked over time and divided into rectangular sub-regions for which the spatial average is calculated, (2) the pulse signal is calculated for a collection of PBV vectors each reflecting an oxygen saturation level, (3) spatiotemporal features are calculated from the pulse signals to prune distorted regions, and (4) the PBV vector is selected from the pulse signals of the non-pruned regions.
Figure 13. Visualization of the features and quality measure for different scenarios with nine PBV vectors uniformly sampled in the range 60-100% SpO2, where PBV 1 corresponds to 100% and PBV 9 to 60% SpO2. This quality measure calculated for each sub-region is used to prune sub-regions with low SNR, which could corrupt the SpO2 measurements.
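The final smoothing step, a five-point moving average over the SpO2 estimates, can be written in a few lines. The boundary handling is an assumption of this sketch (only fully covered windows are returned), and the input values are hypothetical.

```python
import numpy as np

def moving_average(values, width=5):
    """Moving average that returns only fully covered windows."""
    kernel = np.ones(width) / width
    return np.convolve(values, kernel, mode="valid")

# Hypothetical per-window SpO2 estimates with one outlier.
spo2 = np.array([98.0, 97.0, 99.0, 80.0, 98.0, 97.0, 96.0])
smoothed = moving_average(spo2)
```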