Reliability of threshold determination using portable muscle oxygenation monitors during exercise testing: a systematic review and meta-analysis

Over the last few years, portable near-infrared spectroscopy (NIRS) technology has been suggested for determining metabolic/ventilatory thresholds. This systematic review and meta-analysis aimed to assess the reliability of a portable muscle oxygenation monitor for determining thresholds during exercise testing. The proposed PICO question was: Is the exercise intensity of muscle oxygenation thresholds, using portable NIRS, reliable compared with lactate and ventilatory thresholds for exercise intensity determined in athletes? A search of PubMed, Scopus and Web of Science was undertaken and the review was conducted following PRISMA guidelines. Fifteen articles were included. The domains which presented the highest biases were confounders (93% with moderate or high risk) and participant selection (100% with moderate or high risk). The intra-class correlation coefficient between exercise intensity of the first ventilatory or lactate threshold and the first muscle oxygenation threshold was 0.53 (obtained with data from only 3 studies), whereas for the second threshold it was 0.80. The present work shows that although a portable muscle oxygenation monitor has moderate to good reliability for determining the second ventilatory and lactate thresholds, further research is necessary to investigate the mathematical methods of detection, the capacity to detect the first threshold, the detection in multiple regions, and the effect of sex, performance level and adipose tissue in determining thresholds.

In many sports, various methods of exercise testing are performed for detecting metabolic/ventilatory thresholds. These zones or points are characterized by nonlinear increases in physiological outcomes (e.g., oxygen volume (V̇O2), blood lactate, heart rate, etc.), which determine two physiological breakpoints that allow the three-phase model of intensities to be applied [1-3]. These data are important to trainers and athletes for assessing physical condition and programming intensities to optimize training and improve cardiovascular fitness and endurance [4,5]. Therefore, it is of great importance to have a reliable method for threshold detection [6].
The ventilatory or metabolic threshold is usually determined by gas exchange or blood lactate data respectively, obtained during incremental tests [4,7]. Gas exchange is one of the most commonly used methods for assessing the evolution of gas exchange measurements (V̇O2, carbon dioxide volume (VCO2) and minute ventilation (VE)) that allow detection of the respiratory compensation point (also referred to as ventilatory
Modified Beer-Lambert law, Eq. (1), where "A" is the absorption, "I" is the luminous intensity (lm sr⁻¹), "ε" is the extinction coefficient for the light-absorbing compound of interest, "[C]" is the concentration of the compound of interest (e.g., [Hb], [Mb] and/or [cyt ox]), "L" is the source-detector distance (mm), "DPF" is the differential path length factor and "G" is the factor reflecting non-absorption.
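Equation (1) is referenced above, but its expression does not appear in this text. A common way of writing the modified Beer-Lambert law that is consistent with the symbols just defined is sketched below. The incident intensity $I_0$ and the base-10 logarithm are assumptions, since neither is defined in the text.

```latex
A = \log_{10}\left(\frac{I_0}{I}\right) = \varepsilon \, [C] \, L \cdot \mathrm{DPF} + G \tag{1}
```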
NIRS technology is being used in the sports field to observe changes in the metabolism of different muscles [19]. This has made it possible to measure local muscle performance during exercise, determining whether the muscles work optimally and whether deoxygenation occurs depending on exercise intensity [20,23,24]. Moreover, although several studies have suggested that portable NIRS technology can be used for determining muscle oxygenation thresholds [17,25,26], and many studies have been published over the last few years, as far as the authors know, no systematic review or meta-analysis validating the use of NIRS technology to detect thresholds has been undertaken.
Therefore, the aim of this systematic review and meta-analysis was to evaluate the reliability of determining the exercise intensity of the muscle oxygenation threshold (using portable NIRS) compared with detection using a gold standard method during laboratory and field tests.

Methods
Literature search methodology. This systematic review and meta-analysis was carried out following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [27]. The proposed PICO (Population, Intervention, Comparison and Outcomes) question was: Is the exercise intensity of muscle oxygenation thresholds, using portable NIRS, reliable compared with lactate and ventilatory thresholds for exercise intensity determined in athletes? Three databases (PubMed, Scopus and Web of Science) were electronically searched on 15 June 2023 using the following terms: "NIRS" OR "Near Infrared Spectroscopy" OR "muscle oxygenation" OR "oximetry", combined (AND) with the terms and synonyms "threshold" OR "breakpoint" OR "inflection point". Additionally, different terms such as "exercise" OR "sport" OR "physical activity" OR "running" OR "cycling" OR "swimming" were added (AND). Every database employed its own term mapping. The results were screened to identify relevant studies, first by abstract and finally by full text. Full texts underwent a thorough screening process to determine their eligibility for inclusion in the review. Only those texts that fulfilled all the predetermined criteria were considered for inclusion.
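The Boolean structure of this search can be sketched in a few lines of Python. The grouping of the three term blocks with AND is an interpretation of the description above, the function names are hypothetical, and each database has its own query syntax, so this is an illustration and not the string actually submitted.

```python
# Term groups copied from the search description; the AND grouping is an assumption.
TECHNIQUE = ["NIRS", "Near Infrared Spectroscopy", "muscle oxygenation", "oximetry"]
THRESHOLD = ["threshold", "breakpoint", "inflection point"]
ACTIVITY = ["exercise", "sport", "physical activity", "running", "cycling", "swimming"]


def or_group(terms):
    """Quote each term and join the group with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*groups):
    """Combine the parenthesized OR groups with AND."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(TECHNIQUE, THRESHOLD, ACTIVITY)
print(query)
```

Keeping each block in its own parenthesized group avoids the operator-precedence ambiguity that arises when OR and AND are mixed in a single flat string.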
The articles obtained were exported to Zotero (version 6.0.15, Corporation for Digital Scholarship, Vienna, VA, USA) to eliminate duplicates, and the abstracts were uploaded to JBI SUMARI (The University of Adelaide, Adelaide, Australia) to carry out the first screening.
Inclusion and exclusion criteria. The inclusion criteria established for the systematic review were as follows: (1) studies written in English, Spanish or Portuguese, (2) studies using a portable and commercial NIRS device for muscle oxygenation threshold detection, (3) studies using a gold standard (gas exchange or blood lactate methods) in addition to muscle oxygenation for threshold detection, (4) studies with a healthy population between 18 and 65 years of age, and (5) experimental and quasi-experimental studies.
Meta-analysis. A separate meta-analysis was performed for each threshold to examine the reliability of determining intensity using NIRS and the gold standard method (gas exchange and/or blood lactate). The intraclass correlation coefficient (ICC) and sample size were extracted for each study. For the studies that did not provide ICC values, the ICC was calculated from data obtained from the datasets, tables and figures of the article, or on request from the authors. In the case of figures, data were extracted from scatter plots using the plot digitizer application [31]. If the data were not provided by the authors, the study was excluded from the analysis. ICC values were calculated based on a single-rater/measurement, absolute-agreement, two-way random-effects model. For studies where it was possible to obtain more than one ICC value (e.g., because the intensity at the threshold was extracted using different automatic methods), these ICC values were averaged, using only one ICC value for each study to avoid statistical dependence [31,32]. ICC values were transformed to Fisher's z scale and a random-effects model with restricted maximum likelihood estimation was used for the analysis [33], assessing the type of gold standard compared (gas exchange or blood lactate) as a possible moderator. Q and I² statistics were used for the homogeneity analysis. I² values of around 25%, 50%, and 75% denoted low, moderate, and large heterogeneity, respectively. To assess publication bias, a funnel plot with Duval and Tweedie's trim-and-fill method for imputing missing data and Egger's test were used [34,35]. To facilitate the interpretation of the data, Fisher's z values were converted back to ICC values after completing the meta-analyses [33]. The ICC and associated 95% confidence intervals were interpreted as: poor (0.00-0.25), fair (0.26-0.50), moderate (0.51-0.75) and good (0.76-1.00) [36]. Statistical significance was established at p < 0.05. The meta-analysis was performed with the "metafor" package (version 4.2-0) [37] in RStudio (version 2023.06.0) [38].
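The statistical steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which was run with metafor in R. The function names are hypothetical, the sampling variance of Fisher's z is assumed to be 1/(n - 3), and the between-study variance uses the simpler DerSimonian-Laird estimator in place of restricted maximum likelihood. All input values are invented for illustration.

```python
import math


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `scores` holds one row per participant and one column per method.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def pool_icc(studies):
    """Pool (icc, n) pairs on Fisher's z scale.

    Returns (pooled ICC, lower 95% limit, upper 95% limit, Q, tau^2).
    Assumptions: variance 1 / (n - 3) and DerSimonian-Laird tau^2 (not REML).
    """
    z = [math.atanh(icc) for icc, _ in studies]  # requires |icc| < 1
    v = [1.0 / (n - 3) for _, n in studies]      # requires n > 3
    w = [1.0 / vi for vi in v]
    fixed = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    q = sum(wi * (zi - fixed) ** 2 for wi, zi in zip(w, z))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(studies) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]
    pooled = sum(wi * zi for wi, zi in zip(w_re, z)) / sum(w_re)
    half = 1.96 * math.sqrt(1.0 / sum(w_re))
    # Back-transform from Fisher's z to the ICC scale.
    return math.tanh(pooled), math.tanh(pooled - half), math.tanh(pooled + half), q, tau2


def interpret_icc(icc):
    """Label an ICC with the interpretation bands used in this review."""
    if icc <= 0.25:
        return "poor"
    if icc <= 0.50:
        return "fair"
    if icc <= 0.75:
        return "moderate"
    return "good"


# Hypothetical threshold intensities (W): [NIRS, gold standard] per participant.
example = [[250, 245], [265, 270], [240, 235], [280, 270], [255, 250]]
print(icc_2_1(example))

# Hypothetical (ICC, sample size) pairs, one per study.
pooled, lower, upper, q, tau2 = pool_icc([(0.62, 20), (0.85, 15), (0.91, 24)])
print(pooled, lower, upper, interpret_icc(pooled))
```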

Results
Study selection. A total of 1,131 articles were identified from the databases PubMed (237), Web of Science (507) and Scopus (387), and 559 articles remained after removing duplicates. Finally, after selecting studies by their abstracts, 129 full articles were reviewed, of which 15 were included in the systematic review (Fig. 1).
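The counts at each screening stage can be cross-checked with simple arithmetic. The exclusion counts below are not stated in the text; they are derived by subtraction, assuming no records were added from other sources.

```python
# Records retrieved per database, as reported in the text.
records = {"PubMed": 237, "Web of Science": 507, "Scopus": 387}

identified = sum(records.values())
after_deduplication = 559
full_text_reviewed = 129
included = 15

# Derived values (not reported directly in the text).
duplicates_removed = identified - after_deduplication
excluded_on_abstract = after_deduplication - full_text_reviewed
excluded_on_full_text = full_text_reviewed - included

print(identified, duplicates_removed, excluded_on_abstract, excluded_on_full_text)
```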

Participant characteristics.
The systematic review included a sample of 344 participants (216 males and 128 females). Among these participants, 33 were elite athletes, 208 highly trained athletes, 31 trained athletes and 72 recreationally active athletes. Moreover, athletes from various sports were included (soccer, cycling, running, triathlon and rowing), all tested with laboratory protocols, since no studies have yet been carried out in field tests. The study characteristics and the main findings are summarized in Table 1.
Methods used for determining muscle oxygenation threshold. The selected studies determined both muscle oxygenation thresholds (MOT) (first and second) using different methods (Table 2). Double linear regression was the most common method (42% of the studies), and wearable lactate threshold (WLT) was used in 25% of the studies included in the systematic review. Together, these two methods represented 67% of the studies included in the systematic review. Visual identification was also used in two studies (17%).
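As an illustration of the double linear regression approach, the sketch below fits two straight lines to a muscle oxygen saturation curve and takes their intersection as the breakpoint. It is a generic brute-force version with hypothetical function names and synthetic data. The exact procedures used in the included studies are not described in this text and may differ.

```python
def fit_line(xs, ys):
    """Ordinary least squares line; returns (slope, intercept, sse).

    Assumes the x values are not all identical.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def double_linear_breakpoint(xs, ys, min_points=3):
    """Intersection of the two fitted lines for the split with the lowest total SSE."""
    best = None
    for split in range(min_points, len(xs) - min_points + 1):
        s1, i1, e1 = fit_line(xs[:split], ys[:split])
        s2, i2, e2 = fit_line(xs[split:], ys[split:])
        if s1 == s2:
            continue  # parallel lines never intersect
        total = e1 + e2
        if best is None or total < best[0]:
            best = (total, (i2 - i1) / (s1 - s2))
    if best is None:
        raise ValueError("not enough points to fit two segments")
    return best[1]


# Synthetic SmO2 (%) against power (W): a slow decline followed by a steep one.
power = [100, 125, 150, 175, 200, 225, 250, 275, 300]
smo2 = [72 - 0.02 * p if p <= 200 else 108 - 0.2 * p for p in power]

breakpoint_w = double_linear_breakpoint(power, smo2)
print(breakpoint_w)
```

With noisy data, the chosen split depends on `min_points` and on the range of intensities included, which is one reason why different studies can report different thresholds from similar curves.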

Risk of bias evaluation.
The domains which presented the highest risk of bias were confounding (7% with critical risk, 33% with serious risk and 53% with moderate risk), selection of the participants (20% with serious risk and 80% with moderate risk), and selection of the reported results (40% with moderate risk) (Figs. 2 and 3). For the other domains, most of the studies presented a low risk of bias (> 85%).

Meta-analyses.
Of the 15 articles included in this review, ICCs were obtained from 13 for the meta-analysis (Table 3). Of these 13 articles, the ICC was provided in the article itself in 3, was calculated from data obtained from a dataset, table or figure in 8, and was provided directly by the authors in 2 (Table 3). A test of moderators was not performed for the first threshold due to the low number of studies (n = 3, Table 3). The Q test was not significant (Q(df = 2) = 1.01, p = 0.60) and the I² was 0%, showing low heterogeneity. The trim-and-fill method estimated 0 missing studies and Egger's test was not significant (p = 0.46). The ICC of the first threshold was moderate (ICC = 0.53) but with a wide 95% CI [0.31, 0.69] (Fig. 4A).
For the second threshold, no effect of moderators was observed (p = 0.94). Therefore, a meta-analysis was performed without differentiating between the ICCs obtained from comparisons with lactate or gas exchange. The Q test was significant (Q(df = 13) = 99.17, p < 0.001) and the I² was 86%, showing large heterogeneity. The trim-and-fill method estimated 0 missing studies and Egger's test was not significant (p = 0.54). The ICC of the second threshold was good (ICC = 0.80, 95% CI [0.65, 0.89]) (Fig. 4B).
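The link between the Q statistic and I² can be checked with the Q-based formula of Higgins and Thompson, sketched below with a hypothetical helper. For the second threshold, the Q-based value comes out slightly above the reported 86%. A small difference of this kind is expected if the reported figure was derived from the model's estimate of the between-study variance, as metafor does, and not directly from Q.

```python
def i_squared(q, df):
    """Q-based I^2 (%) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Q and df values as reported for the two meta-analyses.
first_threshold = i_squared(1.01, 2)
second_threshold = i_squared(99.17, 13)

print(first_threshold, round(second_threshold, 1))
```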

Discussion
The aim of this systematic review and meta-analysis was to evaluate the reliability of determining exercise intensity using the muscle oxygenation threshold (with portable NIRS) compared with a gold standard detection method during laboratory tests. The results of the review show that the methods most used to determine muscle oxygenation thresholds were double linear regression (46%), WLT (20%), and visual identification (20%). The meta-analysis revealed that, of the 13 studies where ICC was obtained, only 3 assessed the first threshold, with a mean ICC of 0.53 between the exercise intensity obtained at the first muscle oxygenation threshold (MOT1) and the first lactate threshold (LT1) or first ventilatory threshold (VT1). The mean ICC between the second muscle oxygenation threshold (MOT2) and the second lactate threshold (LT2) or second ventilatory threshold (VT2) was 0.80.
Our meta-analyses focused on showing whether the exercise intensity at which the first and second thresholds were detected using portable NIRS was reliable compared with the gold standard methods (gas exchange and blood lactate). Table 1 shows that the relationship between MOT and VT was analyzed in 7 studies [16,25,39-43] and between MOT and LT in 9 studies [17,26,41,44-49]. The studies of Feldmann et al. [16] and Van der Zwaard et al. [42] compared VT1 and LT1 with MOT1 in cycling and found moderate ICC values (ICC = 0.56-0.65). These results are in line with other studies that determined thresholds with non-portable NIRS in cycling [50]. Moreover, a poor to fair ICC was shown in running (ICC = 0.23-0.49) [16,44]. Fewer studies assessed the first threshold than the second one (3 vs. 12 studies), perhaps due to the difficulty of determining MOT1, since the slope changes very slightly and the ICC value is not as good as for the second threshold [42].
The second threshold was determined using the blood lactate concentration and muscle oxygenation in different sports such as cycling [16,17,46-48], running [44,45,49] and rowing [26]. ICC values showed a certain disparity and were fair, moderate or good (ICC = 0.29-0.90) in studies of running, whereas cycling studies showed a good ICC (ICC = 0.91-0.94). However, the ICC values of two studies were not obtained [46,48]. The remaining studies compared gas exchange with muscle oxygenation at the second threshold in cycling [16,39,42,43] and running [16,40]. The results of the different studies suggest that the relationship between both methods in threshold determination is affected by the region assessed by the NIRS device, as good values (ICC = 0.92-0.97) were observed on assessing the intercostalis during cycling [39]. Moreover, the vastus lateralis presented moderate or good ICC in different investigations [25,42], so the test or determination method chosen may also be critical. Different methods were developed to determine the thresholds in blood lactate concentration and gas exchange, which are commonly combined by users to find the optimal inflection point [51]. Despite recent research into the application of NIRS technology for the purpose of obtaining thresholds, there is a lack of research on its methods of determination. The articles included in this systematic review use different methods for determining thresholds: BSX Insight (20%, N = 3) [45,47,49], double linear regression (46%, N = 7) [16,26,39,41-44], visual method [17,25,40], Dmax or modified Dmax [46] and the application of the Humon Beta device [48].
BSX Insight, which determines the threshold by making a comparison with blood lactate concentration, presented good ICC values, although it uses a patented method to determine MOT based on the inflection point of SmO2 during incremental testing [45]. However, as this system is commercial and patented, specific details of the algorithm used for this detection are unknown. Another important method is visual identification, which could be the most accurate for detecting the thresholds [17] but is subject to human error, or can be used as a complement to another method, as was done by Turnes et al. [26]. We recommend that future studies explore different methods to analyze thresholds using NIRS technology, to provide evidence on which are optimal, whether several should be combined, or whether some are more suitable for certain populations or sports.
systematic review analyzed 1 or 3 muscles at most at the same time. Moreover, most of these studies focused on correlating the main muscles of exercise with blood lactate concentration or gas exchange, and it is important to take into account that lactate and gas exchange reflect systemic changes, while NIRS technology captures a more local response. For this reason, further studies that analyze different muscles simultaneously would be of interest in order to understand what is happening in each muscle during exercise testing, and how some may be more related to systemic changes while others show more specific alterations.
It is important to consider that the present meta-analysis is limited to only one measure of reliability (ICC), and more statistics are desirable (e.g., bias between methods) to improve the interpretation and application of the present results. Bias was not included due to the low number of studies that reported these data, and the different units used (W, km·h⁻¹ or percentage) also posed a challenge. This point should be regarded as a limitation of the present work, and future meta-analyses with a higher number of studies should incorporate more reliability statistics. Some of the articles included in this review report a mean bias between MOT2 and LT2 or VT2 ranging from 0.01 to 0.4 km·h⁻¹ [25,44,45,49], between 3.9 and 15.4 W [39,41], 0.05 W·kg⁻¹ [17] and 10.7% of the power output [26]. However, Batterson et al. [44] showed a higher mean bias for MOT and LT1 (1.1-1.2 km·h⁻¹), and Driller et al. [47] also demonstrated how the method of determination could affect the bias, with the lowest being for the Dmax method (17 W) and the highest for the OBLA method (37 W). Finally, the study of Feldmann et al. [16] stated that, in terms of power or speed, the bias represents one performance step (for this particular study, 25 W for cycling and 0.5 km·h⁻¹ for running).
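For readers unfamiliar with the bias statistic discussed here, the sketch below computes the mean difference between paired threshold intensities. The 95% limits of agreement are a Bland-Altman style addition that is not reported in this review, the function name is hypothetical, and the data are invented.

```python
import statistics


def mean_bias(nirs, reference):
    """Mean difference (NIRS minus reference) and its 95% limits of agreement."""
    diffs = [a - b for a, b in zip(nirs, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical power output (W) at the second threshold for five participants.
nirs_w = [250, 265, 240, 280, 255]
lactate_w = [245, 270, 235, 270, 250]

bias, limits = mean_bias(nirs_w, lactate_w)
print(bias, limits)
```

Because the result carries the unit of the input, biases expressed in W, km·h⁻¹ and percentages cannot be pooled without first converting them to a common scale.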
Although the included studies present a low risk of bias in most of the domains assessed, the analysis performed suggests that two domains presented a considerable risk of bias: confounders and the selection of the participants. The main issues related to the confounding domain were studies that did not consider the effect of the training level of participants, prior activity or sex in their results. In some cases, only the value of the correlation or intraclass correlation coefficient appears in the reported results, without the confidence interval. However, the majority of studies had a missing data count bias and bias in measurement outcomes. Future studies should take these aspects into account and control them as far as possible to improve their quality and reduce their biases. Moreover, these aspects are possible sources of the high heterogeneity found in the meta-analysis.
The main limitation of the present work is the small number of studies included in the meta-analysis (N = 13). In future, incorporating a higher number of studies into the current analysis could corroborate the results obtained. Moreover, there was high heterogeneity between the included studies: differences in the methodology, the regions or the sample assessed, with participants ranging from national and international level competitors [17] to recreational ones [42], could affect the results of the meta-analysis.
Considering all the analyses carried out, we think that the following lines of research should be prioritized in this area: exploring which are the most appropriate mathematical detection methods depending on the sports or populations for NIRS, investigating whether it is possible to detect the first threshold, analyzing multiple regions at the same time to find out which ones are most related to systemic thresholds and which have a more specific behavior of the muscle itself, and understanding the differences in the detection of thresholds depending on sex, performance level, amount of adipose tissue or the changing of muscle length during exercise.

Conclusion
The present systematic review and meta-analysis shows that, although using a portable muscle oxygenation monitor has moderate to good reliability for determining the second threshold, further research is necessary to investigate the mathematical methods of detection, the capacity to detect the first threshold, detection in multiple regions, and the effect of sex, performance level and adipose tissue on threshold determination.

Figure 1. Study selection from the systematic review and meta-analysis (PRISMA).

Figure 3. The risk of bias for each study. Created with 'robvis' application [54].

Figure 4. Forest plots of the meta-analysis performed for the intraclass correlation coefficient (ICC) of the exercise intensity obtained at the first (A) and second (B) threshold determination using NIRS and the gold standard (gas exchange or blood lactate).

Table 2. Methods for determining the muscle oxygenation threshold in the studies selected. MOT1: first muscle oxygenation threshold, MOT2: second muscle oxygenation threshold. a Also visually checked. b Inflection point at SmO2 values at the same point as the VT2.