Noninvasive prediction of Blood Lactate through a machine learning-based approach

We hypothesized that blood lactate concentration ([Lac]blood) is a function of cardiopulmonary variables, exercise intensity and some anthropometric elements during aerobic exercise. This investigation aimed to establish a mathematical model to estimate [Lac]blood noninvasively during constant work rate (CWR) exercise of various intensities. Thirty-one healthy participants were recruited and each underwent four cardiopulmonary exercise tests: one incremental and three CWR tests (low: 35% of peak work rate for 15 min; moderate: 60% for 10 min; high: 90% for 4 min). At the end of each CWR test, venous blood was sampled to determine [Lac]blood. The 31 trios of CWR tests were used to construct the mathematical model, which utilized exponential regression combined with Taylor expansion. Good fitting was achieved when the low- and moderate-intensity conditions were combined in one model and the high-intensity condition was placed in another. The standard deviation of the fitting error was 0.52 mmol/liter in the former model and 1.82 mmol/liter in the latter. Weighting analysis demonstrated that, besides heart rate, respiratory variables are required to estimate [Lac]blood in the low/moderate-intensity model. In conclusion, by measuring noninvasive cardiorespiratory parameters, [Lac]blood during CWR exercise can be determined with good accuracy. This should have application in endurance training and the future exercise industry.

Blood lactate during aerobic exercise is the result of glycolytic metabolism, an anaerobic energy production pathway in muscle cells. Its concentration in muscle and blood reflects the extent of involvement of anaerobic metabolism. Blood lactate levels are also an important concern in exercise training.
Aerobic exercise intensity has been divided into three zones by ventilatory threshold 1 (VT1) and ventilatory threshold 2 (VT2), identified by breath-by-breath gas exchange measurement during incremental exercise testing 1. In the moderate zone (between VT1 and VT2), [Lac]blood is elevated but production and elimination rates reach equilibrium. VT2 is the highest intensity at which a steady [Lac]blood can be maintained, termed the maximal lactate steady state [2][3][4]. Studies have demonstrated that endurance training at intensities between VT1 and VT2 significantly improves fitness among untrained subjects 5,6. In the high zone above VT2, lactate accumulates rapidly in the blood and fatigue is imminent. VT1 (also called the lactate threshold or anaerobic threshold) and VT2 (also called the respiratory compensation point or onset of blood lactate accumulation) correspond to blood lactate concentrations of 1~2 and 4 mmol/liter, respectively 2. Accordingly, it is valuable to obtain the numerical value of [Lac]blood noninvasively during endurance training.
The current investigation attempted to establish a novel mathematical model to estimate [Lac]blood noninvasively. During exercise, an extremely complicated relationship exists between [Lac]blood and tidal volume (VT), breathing frequency (BF), exercising heart rate (ExHR), resting heart rate (ReHR), and anthropometric characteristics such as body weight. Exercise intensity plays a key role in delineating the complex interaction among these physiologic variables. As intensity increases, HR, BF, VT and [Lac]blood all increase, but along different trajectories. Accordingly, we hypothesized that [Lac]blood is a function of cardiopulmonary variables, exercise intensity and anthropometric characteristics.

Methods
Thirty-one healthy male and female participants between 20 and 50 years old were recruited by convenience sampling (Table 1). Those with cardiovascular illness were excluded. The experimental protocol was approved by the Chang Gung Memorial Hospital Institutional Review Board. All subjects provided written informed consent after receiving an oral and printed explanation of the experimental procedures. This research was performed in accordance with the ethical standards of the Declaration of Helsinki.
Cardiopulmonary exercise test and blood lactate measurement. Every participant underwent four cardiopulmonary exercise tests on a cycle ergometer (Ergoselect 150 P, Germany) on different days: one incremental and three constant work rate (CWR) tests. The CWR tests were of low, moderate and high intensity. Each subject was instructed to refrain from exercise for 12 hours before each test. The incremental exercise test comprised 1 minute of unloaded pedaling followed by an increase in work rate of 15 watts per minute until exhaustion, from which the peak work rate was determined. VO2peak was defined by the following criteria: (i) VO2 increased by less than 2 mL/kg/min over at least 2 min, (ii) HR exceeded 85% of its predicted maximum, (iii) the respiratory exchange ratio exceeded 1.15, or (iv) other symptom or sign limitations occurred 7. Subsequently, each subject performed three CWR exercise tests: a 15-minute low constant load at 35% of peak work rate, a 10-minute moderate constant load at 60% of peak work rate and a 4-minute high constant load at 90% of peak work rate. The low (35%), moderate (60%) and high (90%) CWR intensities were chosen based on the three zones: below VT1, between VT1 and VT2, and above VT2 1,2. In the majority of healthy people, VT1 occurs at 40-60% of VO2max 8. The respiratory compensation point (RCP, corresponding to VT2) has been reported to be 61.3 to 85.4% of VO2max in healthy subjects 3,9. Minute ventilation (VE), oxygen uptake (VO2), and carbon dioxide production (VCO2) were measured breath-by-breath using a computer-based system (MasterScreen CPX, Cardinal-health Germany). Heart rate (HR) was determined from the R-R interval of a 12-lead electrocardiogram (CardioSoft, GE, Milwaukee, USA). Arterial blood pressure was measured every two minutes using an automatic blood pressure system (Tango, SunTech Medical, UK), and arterial O2 saturation was monitored continuously by a finger pulse oximeter (model 9500, Nonin Onyx, Plymouth, Minnesota).
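The three CWR targets follow directly from the peak work rate found in the incremental test. A minimal sketch of that arithmetic is given here; the function name `cwr_targets`, the dictionary layout and the rounding to whole watts are illustrative assumptions and are not part of the study protocol.

```python
# Fractions of peak work rate and durations (min) as stated in the protocol.
CWR_PROTOCOL = {
    "low": (0.35, 15),
    "moderate": (0.60, 10),
    "high": (0.90, 4),
}


def cwr_targets(peak_work_rate_watts):
    """Return {intensity: (target watts, duration in min)} for one participant.

    Rounding to whole watts is an assumption made for this illustration.
    """
    return {
        name: (round(fraction * peak_work_rate_watts), minutes)
        for name, (fraction, minutes) in CWR_PROTOCOL.items()
    }


# Hypothetical participant with a peak work rate of 200 W.
targets = cwr_targets(200)
for name, (watts, minutes) in targets.items():
    print(f"{name:8s}: {watts} W for {minutes} min")
```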
End-exercise values were determined as the average of the final 30 seconds of exercise. In the CWR tests, venous blood for the [Lac]blood assay was sampled 30-60 seconds after the end of exercise, mostly from an antecubital vein or, in a few cases, from a dorsal interosseous metacarpal vein. The sample was collected in NaF/K3EDTA tubes and then placed on ice. The whole blood was centrifuged within 90 minutes to obtain plasma, which was stored at 4 °C. [Lac]blood was measured by the enzymatic method (DXC880i) within 14 days after sampling.

Mathematical model for lactate estimation.
We propose a novel model to estimate [Lac]blood from noninvasively measured physiologic signals, including VT, BF, ReHR, ExHR, age, body mass index (BMI) and sex, at the end of CWR testing. The model is based on an exponential regression method combined with Taylor expansion 10 to find the best predictors of [Lac]blood.
Exponential function and Taylor expansion. An exponential regression model between [Lac]blood and the physiologic signals at CWR testing was examined, including BF, VT, BMI, age, ReHR and ExHR. We asserted that the connection between [Lac]blood and the various physiologic signals can be formulated as a poly-exponential function 11,12, defined by equation 1:

$$f(x) = e^{Ax} \tag{1}$$

where x is a matrix of the independent variables, A is the weighting matrix corresponding to each independent variable, and f(x) is [Lac]blood. Supervised gradient descent was applied to equation 1 to solve the model A with a full-rank matrix from x and f(x) 13. To solve equation 1 efficiently, Taylor expansion was applied to transform equation 1 into polynomial form for linear solution. The Taylor expansion about x = a is expressed in equation 2:

$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}\,(x-a)^{n} \tag{2}$$

Equation 1 can be transformed into equation 3, a cubic polynomial approximation, by Taylor expansion:

$$f(x) = e^{Ax} \approx 1 + Ax + \frac{(Ax)^{2}}{2!} + \frac{(Ax)^{3}}{3!} \tag{3}$$

Equation 3 is then rewritten in matrix form (equation 4) 14.
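To illustrate what a cubic Taylor approximation of an exponential does, the following sketch compares the truncated series with the exact exponential for a scalar argument. It assumes expansion about zero and a scalar z standing in for the product of weights and variables; the function name `exp_taylor_cubic` is hypothetical, and none of the numbers come from the study.

```python
import math


def exp_taylor_cubic(z):
    """Cubic Taylor approximation of exp(z) about z = 0 (assumed expansion point)."""
    return 1.0 + z + z**2 / 2.0 + z**3 / 6.0


# The approximation is tight near zero and degrades as |z| grows.
for z in (0.1, 0.5, 1.0, 2.0):
    exact = math.exp(z)
    approx = exp_taylor_cubic(z)
    print(f"z={z:3.1f}  exp={exact:7.4f}  cubic={approx:7.4f}  "
          f"abs_err={abs(exact - approx):.4f}")
```

Because the truncation error grows with the magnitude of the argument, a sketch like this suggests scaling the predictors to a modest range before fitting; whether the study did so is not stated.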
Linear regression. Equation 4 gives the relationship between the multiple independent variables X and [Lac]blood; the model A can be solved by linear regression analysis 15. Leave-one-out cross validation was also applied to avoid overfitting 8-10, with training and testing steps as follows. In the training step, the physiologic and anthropometric variables and [Lac]blood from the database of CWR tests at low, moderate and high intensity were used to construct the A matrix and solve for the regression coefficients. In the testing step, the estimated blood lactate concentration ([Lac]estimate), Y = AX (equation 5), was compared with the true lactate value, and the difference was assessed by error and variance.
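The error distance d between the true and the estimated lactate value is

$$d = f(x) - Y = f(x) - AX \tag{6}$$

and the sum of squares D over all k observations is

$$D = \sum_{i=1}^{k} d_i^{2} = k\,\sigma^{2} \tag{7}$$

where σ² is the variance of the error. The solution matrix A in equation 5 is the one that yields the minimum D, and therefore the minimum error variance.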
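The linear solution step can be sketched with ordinary least squares on polynomial terms. The example below runs on synthetic data only. The feature layout in `cubic_features` (per-variable powers up to the third order, without cross terms) is a simplifying assumption, as are all function names and the use of `numpy.linalg.lstsq`; the study's actual matrix form is not reproduced here.

```python
import numpy as np


def cubic_features(X):
    """Build [1, x, x^2/2!, x^3/3!] columns per predictor (assumed layout, no cross terms)."""
    X = np.asarray(X, dtype=float)
    ones = np.ones((X.shape[0], 1))
    return np.hstack([ones, X, X**2 / 2.0, X**3 / 6.0])


def fit_weights(X, y):
    """Solve for the weights A that minimize the sum of squared errors D."""
    A, *_ = np.linalg.lstsq(cubic_features(X), y, rcond=None)
    return A


def predict(A, X):
    """Estimated response Y for predictors X and weights A."""
    return cubic_features(X) @ A


# Synthetic stand-in data: 60 observations of 3 scaled predictors.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(60, 3))
true_A = rng.normal(size=1 + 3 * 3)
y = cubic_features(X) @ true_A + rng.normal(scale=0.1, size=60)

A_hat = fit_weights(X, y)
d = y - predict(A_hat, X)   # error distance per observation
D = float(np.sum(d**2))     # sum of squares
sigma2 = D / len(y)         # error variance, taken here as D / k
print(f"D = {D:.4f}, error variance = {sigma2:.4f}")
```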
Leave-one-out cross validation. Leave-one-out cross validation was applied to avoid overfitting [16][17][18], and is briefly described as follows:
• If k observations are recorded, one is used for testing and the other k-1 observations are used for training.
• This procedure is repeated k times, so that each of the k observations is used once for testing.

… 19 were employed to show the validity of the estimated blood lactate level. Descriptive statistics were also used. Data are presented as mean ± standard deviation.
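The leave-one-out procedure described in the bullets above can be sketched as follows. This is an illustration on synthetic data rather than the study's implementation: the function name `loocv_errors`, the plain least-squares fit inside the loop, and the choice of k = 31 observations with three columns are all assumptions.

```python
import numpy as np


def loocv_errors(Phi, y):
    """Leave-one-out errors: fit on k-1 rows, test on the held-out row, k times."""
    Phi = np.asarray(Phi, dtype=float)
    y = np.asarray(y, dtype=float)
    k = len(y)
    errors = np.empty(k)
    for i in range(k):
        train = np.arange(k) != i  # boolean mask excluding observation i
        A, *_ = np.linalg.lstsq(Phi[train], y[train], rcond=None)
        errors[i] = y[i] - Phi[i] @ A  # held-out prediction error
    return errors


# Synthetic stand-in data (not study data): intercept plus two predictors.
rng = np.random.default_rng(1)
k = 31
Phi = np.column_stack([np.ones(k), rng.normal(size=(k, 2))])
y = Phi @ np.array([2.0, 0.8, -0.5]) + rng.normal(scale=0.3, size=k)

errs = loocv_errors(Phi, y)
print(f"LOOCV error: mean = {errs.mean():.3f}, SD = {errs.std(ddof=1):.3f}")
```

Each held-out error comes from a model that never saw that observation, so the spread of these errors is a less optimistic estimate than the in-sample fitting error.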

Results
The average work rate corresponding to low, moderate and high CWR was 66 ± 29, 107 ± 46 and 146 ± 66 watts, respectively. The mean [Lac]blood was 3.7 ± 2.3, 6.9 ± 4.2 and 10.4 ± 4.1 mmol/liter for the three intensities (Table 2). The mathematical analysis showed that, if low, moderate and high intensity were processed together, the fitting error was large (Fig. 1G-I). However, when the low- and moderate-intensity CWR data were fitted together (Fig. 1A-C) and the high-intensity data were fitted separately (Fig. 1D-F), the fitting error became much smaller. Under these two conditions, both models fit the data quite well, especially the low/moderate-intensity model, in which the standard deviation of the fitting error is 0.52 mmol/liter (Fig. 1B). This indicates that a very different relationship exists between [Lac]blood and the measured variables under the two conditions: low/moderate intensity and high intensity. Figure 2A presents the absolute weighting of each variable in determining [Lac]blood. At low and moderate constant-load intensity, ReHR, BF and age have the greatest positive influence. In contrast, in the high-intensity condition, ExHR alone has a significant impact (Fig. 2B).

Discussion
In the present investigation, responses to 31 trios of CWR exercise tests in 31 subjects were employed to construct a mathematical model to estimate [Lac]blood from noninvasive measurements. The database comprises low-, moderate- and high-intensity exercise tests. The model is based on an exponential regression method combined with Taylor expansion. The independent variables included ExHR, ReHR, BF, VT, BMI, age and sex. Excellent fitting was achieved when the low- and moderate-intensity conditions were placed in one model and the high-intensity condition in another. The standard deviation of the fitting error is 0.52 mmol/liter in the former condition and 1.82 mmol/liter in the latter, which is acceptable for the purpose of specifying exercise training targets. This result implies that exercise intensity is a significant determinant of the complex relationship between [Lac]blood and cardiopulmonary variables. Poor fitting was obtained when we attempted to construct a single mathematical model covering all three exercise intensities. The three intensity zones divided by VT1 and VT2 have been reported to differ distinctly in sympathetic stress load, motor unit involvement, and duration to fatigue 2. The difference could be too large to be processed mathematically in a single model. Additionally, regarding model construction, the accuracy of the [Lac]blood estimate was also affected by the amount and consistency of the training data. The database in the present study is small compared with the sample sizes commonly employed in deep learning. The consistency of our data, as revealed by the leave-one-out validation, is good. Therefore, increasing the sample size is expected to improve the precision of the estimation, especially for the high-intensity database. It is also worth mentioning that multiple linear regression was attempted at first. Little or low correlation was found between almost all the independent variables and [Lac]blood (the highest correlation coefficient is 0.54). Accordingly, a more complicated mathematical method was adopted in the analysis.
Modeling weight distribution analysis showed that BF is indispensable, and even more important than heart rate, as an independent variable in the low/moderate-intensity condition. Wearable devices are advancing rapidly, and exercising BF may soon be acquired conveniently, although we are currently unaware of any wearable device that measures it. The mathematical model is constructed with a view to applying it in the exercise industry. Therefore, VO2 and VCO2 are not included, because they are practically impossible to obtain without a gas analysis system.
The algorithm of the current study could be applied in the threshold training model 2 of cycling endurance training. The suitable intensity is one that keeps [Lac]blood in the range between VT1 and VT2, especially for untrained people. The corresponding [Lac]blood values are 1~2 and 4 mmol/liter 2. The SD of the fitting error in the low/moderate model is 0.52 mmol/liter. Considering the width of the middle zone, this fitting error should be acceptable. Increasing the sample size should further reduce the fitting error. Another possible application could be in CWR testing. Estimated [Lac]blood may be used as a criterion to judge whether the subject approaches maximal effort when gas analysis to measure a VO2 plateau is not available. A variety of [Lac]blood cut-off values have been proposed. Most of the criteria are around 8 mmol/liter 20,21. Further study is needed to test this idea.
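As a rough illustration of how an estimated lactate value might be mapped onto the three training zones, the sketch below uses the threshold concentrations quoted above. Taking 2 mmol/liter (the upper end of the 1~2 range) as the VT1 boundary is an assumption, as are the function names; individual thresholds vary and this is not a validated classifier.

```python
VT1_MMOL_L = 2.0  # assumed boundary: upper end of the quoted 1~2 range
VT2_MMOL_L = 4.0  # concentration quoted for VT2


def lactate_zone(lac_estimate):
    """Classify an estimated lactate value (mmol/liter) into an intensity zone."""
    if lac_estimate < VT1_MMOL_L:
        return "low (below VT1)"
    if lac_estimate <= VT2_MMOL_L:
        return "moderate (VT1 to VT2)"
    return "high (above VT2)"


def zone_width_in_sd(sd_error=0.52):
    """Width of the moderate zone expressed in multiples of the fitting-error SD."""
    return (VT2_MMOL_L - VT1_MMOL_L) / sd_error


for value in (1.5, 3.0, 6.9):
    print(f"{value:.1f} mmol/liter -> {lactate_zone(value)}")
print(f"moderate zone spans {zone_width_in_sd():.1f} SDs of fitting error")
```

With these assumed boundaries the moderate zone is a little under four standard deviations wide, which gives a sense of scale for the acceptability argument.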
The mathematical methodology employed in this study should apply to other CWR conditions during exercise. The most common scenario is treadmill exercise in which speed and slope are fixed. Further, some steppers are provided with constant-power modes. Additionally, wearable devices that measure physiological responses are being developed. Our model to estimate [Lac]blood would be relevant to a free-ambulation constant work rate task in which cardiac and respiratory responses are measured.
There are several study limitations. First, the sample size is relatively small. However, good validity is still attainable, which suggests that the mathematical model employed in the present study works. A larger database should be acquired to increase the accuracy, especially for the high-intensity model. Second, the equation (regression coefficients, or A values) obtained in this study may only be relevant to CWR cycle ergometer exercise. Studies should be undertaken to test its validity during other exercise modalities (e.g., treadmill, walking). Third, prospective validation procedures were not performed in this study. Nonetheless, if prospective validation is performed, those data can be pooled into the learning model to generate new regression coefficients. Leave-one-out cross validation 8 was already applied to our model to assess overfitting in the statistical learning.

Conclusion
This is the first study to establish a mathematical model for predicting the numerical values of [Lac]blood during exercise. By measuring noninvasive cardiorespiratory parameters and including some anthropometric factors, [Lac]blood during constant work rate exercise can be determined with good validity by exponential regression combined with Taylor expansion. These experimental findings should have application in designing intensities for endurance training and in the future exercise industry.

Data Availability
The datasets generated and/or analysed during the current study are not publicly available because of their potential commercial interest, but are available from the corresponding author on reasonable request.