Finger sweat analysis enables short interval metabolic biomonitoring in humans

Metabolic biomonitoring in humans is typically based on the sampling of blood, plasma or urine. Although established in the clinical routine, these sampling procedures are often associated with a variety of compliance issues, which are impeding time-course studies. Here, we show that the metabolic profiling of the minute amounts of sweat sampled from fingertips addresses this challenge. Sweat sampling from fingertips is non-invasive, robust and can be accomplished repeatedly by untrained personnel. The sweat matrix represents a rich source for metabolic phenotyping. We confirm the feasibility of short interval sampling of sweat from the fingertips in time-course studies involving the consumption of coffee or the ingestion of a caffeine capsule after a fasting interval, in which we successfully monitor all known caffeine metabolites as well as endogenous metabolic responses. Fluctuations in the rate of sweat production are accounted for by mathematical modelling to reveal individual rates of caffeine uptake, metabolism and clearance. To conclude, metabotyping using sweat from fingertips combined with mathematical network modelling shows promise for broad applications in precision medicine by enabling the assessment of dynamic metabolic patterns, which may overcome the limitations of purely compositional biomarkers.


Supplementary
Supplementary Figure 4: Effect size plots with data from 47 individuals (n=47 for 10 time-points) for caffeine, paraxanthine, theophylline and 1-methylxanthine. A shared-control estimation plot was performed, which presents the mean differences between a single control (time-point before consumption, red line) and each of the intervention groups.
All nAUC datapoints are presented as a swarmplot (top). The effect size is presented as a bootstrap 95% confidence interval (bottom), where the effect size is displayed to the right of the raw data, and the mean of the test group is aligned with the effect size. Exact values for the mean difference, the lower and upper limits as well as the p-values can be found in the Source Data sheet. nAUC, normalised area under curve.    Create synthetic original data (c) for j time points.

Cell culture conditions
HepG2 cells (purchased from ATCC, HB-8065) were cultivated in Eagle's minimum essential medium (EMEM, Gibco) with 10% fetal calf serum (FCS, Gibco), 1% penicillin/streptomycin (Sigma-Aldrich, Austria), 2 mm L-glutamine (Sigma-Aldrich, Austria) and 1% MEM non-essential amino acids (Gibco). Cultivation was performed in a humidified atmosphere at 37°C and 5% CO 2 . For assessing caffeine metabolisation, HepG2 cells were seeded in 6-well plates with cell growth surface for adherent cells (Sarstedt, Austria) at a density of 7 × 10 5 cells/well in 1.5 mL complete medium. Cells were left over night. The growth medium was exchanged and the samples incubated for 24 h with either solvent control (con) or 100 µm caffeine. A caffeine stock solution (20 mm) was prepared in PBS. Additionally, HepG2 cells were pre-treated with benzo-[a]-pyrene (BaP, Sigma) at 0.5 µm or 5 µm for 6 h to induce cytochrome P450 enzymes. [4] A BaP stock solution (20 mm) was prepared in DMSO. After exchanging the medium, cells were again treated with solvent control (con) or caffeine (100 µm) and incubated for 24 h. All experiments were performed in triplicates. 4 Supplementary Note 2: Targeted approach using multiple reaction monitoring (MRM)

Chemicals
Caffeine and formic acid were obtained from Fluka. Caffeine-D9, paraxanthine, theobromine and theophylline were obtained from Sigma-Aldrich. Ultrapure water was obtained from a Millipore system (18.2 MΩ, 185 UV, Millipore), all other solvents were purchased from VWR (LC-MS grade). All chemicals and solvents were used as received.

Standard Solutions and Calibration Samples
The stable isotope-labelled caffeine-D9 was used as the internal standard. It was spiked to all samples in a concentration of 10 pg µL −1 . Additionally, a standard mixture containing caffeine, theobromine and theophylline (each 100 pg µL −1 ) was measured every thirtieth sample as a quality control. The limit of detection (LOD) was defined as the lowest concentration of the analyte that can be detected with a signal-to-noise ratio of 3:1. The lower limit of quantification (LLOQ) was defined as the lowest concentration of the analyte that can be detected with a signal-to-noise ratio of 10:1.

Samples
The temporal evolution of caffeine, theobromine, theophylline and paraxanthine in fingertips of five volunteers was investigated. This was the suggested number of volunteers by power analysis in order to obtain statistical relevant data (calculated using R-studio with a 10% error rate and a significance criterion of 0.05). Two female and three male volunteers were recruited between 25 and 30 of age. All volunteers were non-to moderate coffee consumers. The volunteers were asked to fast caffeine-containing products for 12 h before the start of each experiment. Subjects presented on 8 am on the study day before the first cup of coffee (equivalent to a double espresso, approx. 80 mg caffeine). Samples from the fingertips were collected before and 1, 3 and 5 h after coffee consumption. The experiment was performed independently on three different days.

Sample Preparation
Sample collection was performed similarly as described in the methods section of the main text. The filter paper was transferred into an Eppendorf tube and metabolites were extracted with acetonitrile (500 µL). First, the extraction solution was vortexed for 1 min and then stirred in am Eppendorf Thermomixer comfort 1.5 mL (40°C, 1400 rpm, 10 min). This process was repeated twice and finally the mixture was centrifuged (10 min, 20000 rpm). A fraction of the sample solution (250 µL) was transferred into a fresh Eppendorf tube. The samples were dried under a flow of nitrogen. The dried residues were reconstituted in water (250 µL) containing 0.2% formic acid. The extracted samples were sonicated (10 min) and transferred into 96-well plates for analysis.

Targeted LC-MS/MS
Targeted LC-MS/MS experiments were performed using a nano-and cap-pump (1260 Infinity, Agilent) together with a microfluidics-based separation system (Chip-Cube, Agilent) hyphenated to a triple quadrupole mass spectrometer (Agilent 6490). Liquid Chromatography. The analyte separation was performed using a chip-based integrated sample enrichment, separation and nanoESI sprayer tip (UHC-CHIP II, ZORBAX 80SB-C18, 5 µm, 25 mm×75 µm enrichment column and 150 mm×75 µm separation column, Agilent). The injection volume was 0.5 µL. The autosampler was held at 4°C. Mobile phase A was aqueous solution (0.2% FA) and mobile phase B was acetonitrile (0.2% FA). The gradient included a total run time of 25 min using a flow rate of 400 nL min −1 . A stepped-gradient was applied by starting with 0% B and increasing to 8% B within 0.1 min. The mobile phase B was linearly increased to 20% (3 min) and then to 80% (5 min), which was held constant for 4 min. Then, the mobile phase B was decreased to 0% and the system allowed to equilibrate for further 16 min. A mixture of isopropanol, acetonitrile, methanol and water (1:1:1:1) was used for needle washes.

Mass Spectrometry
The analytes were detected via multiple reaction monitoring (MRM) of three different transitions per molecule with a cycle time of 0.8 s and a dwell time of 50 ms per transition (Supplementary Table 4). Typical MS parameters were as follows: capillary voltage -1.7 to −1.9 kV, gas flow 13 L min −1 , dry gas temperature 200 °C. Experiments were performed and evaluated using Mass Hunter B.06.00 (Agilent).

Supplementary Note 3: Mathematical model
In this section we present the mathematical model used to described the metabolic network of Supplementary Figure 11 (Figure 5a in main manuscript). A list of symbols used in this section is given in Supplementary Table 6. Briefly, the model describes concentration time series of one the ingested free caffeine (C free caf ) and four sweat metabolites (C caf , C par , C bro , C phy ) within the constraints of the following assumptions: • caffeine metabolism can be described by mass-action kinetics in a one-compartment body model, • the uptake of the ingested caffeine (C free caf ) is instantaneous (i. e. no lag time between ingestion and absorption into the body), • the steady-state volume of distribution (V D ) of all four internal metabolites is instantaneously reached and time-independent, • concentration enrichment due to an increase in the water fraction from blood to sweat and dilution through the inability of bound caffeine to diffuse as described by ref. [5] are approximately equal and cancel each other out, • the apparent metabolite concentrations are proportional to the sweat volume, and finally, • sweat volumes are time-dependent and constant across all metabolites.
In the following equations, we differentiate between three types of variables: (i) true variables, denoted by capital letters (e.g. C, which denotes a measure, non-obscured concentration), (ii) apparent variables, denoted by a tilde (e.g.C), wich denotes a measured concentrations that is obscured by an unknown sweat volume, V sweat , and (iii) true specific variables, denoted as lower-case letters, such as normalised concentrations c that have been divided by another variable. Additionally, we use vector notation to collectively refer to a set of metabolites, e.g. C = C caf C par C bro C phy T , which denotes the vector of (true) concentrations for caffeine, paraxanthine, theobromine, and theophylline, respectively.
The fluxes described by the metabolic network can be separated into three, fluxes into the system (J in ), rearrangements of internal fluxes (NJ), and fluxes out of the system (J out ), Since we assumed first order kinetics, the differential equation can be written as with k 9 = k 2 + k 3 + k 4 + k 5 and J in (t) = C free caf Together with some initial conditions C 0 the system of ordinary differential equations (2) can be solved analytically. Note that due the preceding fasting period, we assumed that only the initial caffeine concentration C caf 0 in the sweat is zero, but not the other metabolites [6], i.e.
Thus, we explicitly kept the initial concentration of paraxanthine, theobromine, and theophylline as fitting parameters in the model. However, we do know the amount of ingested caffeine and thus can compute the initial (free) caffeine concentration, where M dose is the mass of ingested caffeine, M donor is the bodymass of the donor, f is the bioavailability, and v caf D is the specific volume of distribution of caffeine. The literature value for v D caf /f is listed in Supplementary Table 7.
With these settings the solution to (2) reads and where c, c i , and c i 0 denote corresponding normalised (and thus unit-less) metabolite concentrations.
The relation between the measured mass,M(t), and the metabolite concentration, C(t), is given over the sweat volume, V sweat , as were we implicitly made use of our central assumption that The measured mass is also given via the calibration curve (y = ax, see Supplementary Table 8 and Fig. 2c in the main text), which connects it to the measured concentration, ˜C (t), and the sample volume, V sample ,M Here, diag(a) denotes an 4 × 4 diagonal matrix whose entries are the 4 elements of (vector of) the slopes of the calibration curves listed in Supplementary Table 8. Finally, combining Equations 11, 12, and 14 gives where the left hand side only contains known constants or measured variables and the right hand side contains all unknown parameters. Thus, (15) can be used for fitting V sweat (t) as well as the kinetic constants in c(t).
With c(t) and V sweat (t) known from the fitting of (15), the absolute metabolite concentrations plotted in Figure 5c and f of the main manuscript is given by In literature the fractions of caffeine being degraded to paraxanthine, theobromine and theophylline are frequently reported. The fractional conversion (fc) can be calculated from the fitted kinetic constants using Equation 17. A comparison of individual fc constants from donor 1 and 2 to population averages is given in the Supplementary Table 3 fc 6 Supplementary Note 4: Sensitivity analysis 6

.1 Aim
In the caffeine capsule study we tried to standardize our experimental setup as much as possible. Specifically, we asked volunteers to fast foods that are known to modify the activity of caffeine processing cytochrome P450 (CYP) enzymes. However, the interpretation of the sweat metabolome is not only complicated by problems of experimental design or measurement errors but by a magnitude of intrinsic, confounding factors like the variability of the sweat volume across time courses and people or differences in precipitation between metabolites. Caffeine and its major degradation products have a similar chemical structure, and thus we assumed identical precipitation characteristics for all four of them in our model, cf. (13). Nevertheless, our mathematical model contains 11 + j free parameters -eight kinetic constants, three initial concentrations, and for each of the j time points one sweat volume. In the following we present our work flow and systematically investigate (i) the influence of variations in the sweat volume (ii) the influence of experimental errors (iii) the influence of different fitting methods on our model's ability to correctly identify these parameters. A list of symbols used in this section is given in Supplementary Table 6.

Work Flow
Similar to [7] we assessed the ability of our mathematical model to correctly fit free parameters in comparison to synthetic data. In short, concentration profiles for caffeine, paraxanthine, theobromine and theophylline were calculated according to Equations 6-9 with parameters listed in Supplementary Table 9 at j = 20 time points τ = {0, 0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 24} h (unless stated otherwise). The resulting original concentration time series was subsequently artificially obstructed. In our case the obstruction was two-fold,(i) trough a randomly sampled sweat volume, and (ii) trough a randomly sampled experimental error. Next, the obstructed data (m ) was used as input for data fitting and the model parameters were estimated. The fit was repeated n times for the same m (inner Monte Carlo sampling) and the best result (namely the one with the lowest loss value) of the n replicates was stored. To get statistical significance this process was done for i = 300 (unless stated otherwise) replicates (i. e. the outer Monte Carlo sampling).
Since the original parameters were initially known it was subsequently possible to compare true model parameters and fitted parameters and, therefore, estimate the error associated to the data fitting procedure. The workflow was implemented with Python 3.7 and a scheme of it is shown in Supplementary Figure 12.

Data Creation
With assumed reasonable kinetic parameters (Supplementary Table 9) the unit-less concentrations (c) of caffeine, paraxanthine, theobromine and theophylline were calculated using our model (Supplementary Figure 11, Equations 6 -9) at j = 20 time points (unless stated otherwise). The variability of sweat flux on the finger tips spans between 0.05 and 0.62 mg cm −2 min −1 [8,9]. The sweat volume, V sweat , is calculated from the sweat flux by multiplying by 2 cm 2 (sampling area) and 2 min (after washing hands 1 min of free sweating and 1 min of sweat sampling). We assumed a truncated normal distribution of the sweat volume, V sweat , with the mean of both literature values (µ Vsweat = 1.34 µL) and the standard deviation of both values halved (σ Vsweat = 0.57 µL).
In Supplementary Figure 13b the probability density of the V sweat values of all simulations is shown. 96.4 % of them are within literature bounds. Next, the experimental error was sampled. Unless stated otherwise, we assumed a normal error distribution with µ = 1 and σ = 10% for the experimental error, , We point out that unlike V sweat , is not the same for every metabolite at one time point, t, and is, therefore, a vector. Moreover, the measurements of the caffeine capsule study were performed in technical replicates, and thus is sampled twice for every time point. The probability density of is shown in Supplementary Figure 13d. Both, V sweat and are applied to the theoretical concentrations as multiplication,m As an example, the steps of data processing are visualized in Supplementary Figure 13. Supplementary Figure 13a shows, the size of the sampled V sweat s at their respective time points and Supplementary Figure 13b shows the true concentration (c) as black line and the artificially errored data points as blue dots or orange crosses (0 % and 10 % error, respectively). For simplicity reasons we do not show the final data points (m˜ ) which additionally includes the sweat volume and the remaining three metabolites.

Fitting Procedure
The data fitting procedure was implemented in Python using SciPy's curve fit function [10]. As minimization method the exact trust region reflective ('trf') algorithm was selected. Numerical tolerance settings were kept at default (10 −8 ). The general, adaptive, and robust loss [11] required two more fitting parameters which were optimized in a separate minimization problem inside of the loss function using SCiPy's minimize and the truncated Newton ('TNC') algorithm. The numerical tolerance was set to 10 −10 being more accurate than the main minimization problem. Moreover, each input data set was fitted n times as Monte Carlo (MC) replicates. The fit with the lowest overall loss was chosen as the best option and saved for subsequent analysis. The bounds for the model parameters are listed in Supplementary Table 9. All values reported in literature are well within these bounds [12,13,1,2,14,15,16,17]. Nevertheless, many hyperparameters still needed to be manually tuned and in this study we focused on finding optimal settings for (i) the choice of loss function,(ii) the number of Monte Carlo replicates, and (iii) the number of time points.

Data Analysis
Two quality parameters of the fit were calculated: (i) the coefficient of variation (CV in %) giving the relative width of the fitted parameter distribution around the true value (µ, Equation 21) and (ii) the median of the relative error (MRE in %) which gives the offset of the fitted parameter distribution to the true value (Equation 22).
for x m (m = 1, ..., i) Generally speaking CV is a measurement of precision, whereas MRE is a measurement of accuracy. In rare cases some extreme outliers were impacting the calculation of CV of V sweat .
This happened when true V sweat values were sampled outside of the bounds of the fit (close to 0 µL). To get better estimates of CV, all outliers with more than 1000% deviation from the true value were excluded. However, the number of excluded outliers never exceeded 0.1% of the whole sample set for all simulations except one (a comparison simulation without V sweat normalisation, Supplementary Table 17). To give an overall estimate of a certain fitting procedure's performance, the average of the absolute MREs and CVs of all model parameters was calculated. Error distributions were plotted as histograms with 21 bins. To test for statistical significance of error distribution we used Levene's test for the similarity of (error) variances [18]. Raw data and analysis code is available at GitHub (https://github.com/Gotsmy/finger_sweat) [19].

Loss Functions
Three losses (Cauchy, Huber, and Soft-l1) which are part of the SciPy package [10] and implemented in its curve fit function and a forth, general, adaptive, and robust loss described by Barron 2019 [11] (from now on referred to as Robust loss) were tested with n = 100 inner and i = 300 outer MC replicates. Additionally, we analysed differences in the performance with respect to accuracy when the loss was calculated with absolute errors or the maximum of either absolute or relative errors for every time point. The performances of the loss functions are shown in Supplementary Figure 14. For most cases, we found that estimating the loss from the maximum of absolute and relative error resulted in better (i.e. smaller) CVs and MREs than just taking the absolute errors into account. Moreover, the Max Robust loss was clearly performing best, followed by Max Cauchy and all others being more inaccurate. Therefore, we continued to use the Robust loss function calculated from the maximum of absolute and relative error (max Robust) in all subsequent simulations.

Inner Monte Carlo Replicates
As mentioned in the Section 6.2.3, MC replicates were performed for every time course data set, and the result with the best loss was selected. This was done to sample the whole solution space in an unbiased manner and to prevent the estimation of incorrect fitting parameters from convergence in local minima. Simulations (with Max Robust loss and i = 300 outer MC replicates) for different numbers of inner MC replicates, n, were performed. The results are shown in Supplementary Figure 15. Generally, there are big improvements of CV and MRE between n = 1 and n = 100. Less improvements are visible with more inner MC replicates. Therefore, and because of the additional computational load with increasing n we decided to use 100 inner MC replicates for the data fitting procedure and all subsequent simulations.

Sampling Time Points
Next, we considered different amounts of sampling time points and how they affect the error associated to data fitting. The sampling time points used in the simulations closely resemble the time points used for finger sweat data acquisition in the caffeine capsule study (Supplementary Table 12). In that study we found a high variability of late concentrations that were fitted. Thus we were interested whether it is advisable to not use the late values at all for the fitting procedure. The number of time points for which data was created was defined as j and simulations were performed with n = 100 inner and i = 300 outer MC replicates using the max Robust loss. An overview of the time points is shown in Supplementary Table 12. The change of the CV and MRE with different time points is plotted in Supplementary Figure 16. Taking more time points into account improves the CV for the time-independent kinetic constants and initial parameters. In contrast the CV across all time points of time-dependent V sweat (t) and c(t) stays constant or even worsens slightly for larger j. This indicates that the error variance of time-dependent parameters is not constant over time (as further investigated in Section 6.3.4). Interestingly, the number of sampling time points does not have a strong impact on MRE with some parameters improving and some deteriorating in a seemingly random pattern. One peculiarity was k 5 (constant of caffeine degradation) which was the most overestimated parameter with j = 11 and then went on to be most underestimated parameter with j = 20.

Time Dependence
Previous results indicated that, when including time points many hours after the initial ingestion of the caffeine capsule, the CV of time-dependent parameters worsened. Therefore, we calculated the distribution of relative error for V sweat (t) and c(t) on each time point of a simulation. All distributions were taken from one simulation with n = 100 inner MC replicates, i = 3000 outer MC replicates and j = 20 time points using the max Robust loss. The distribution of relative errors of the V sweat at different time points is shown in Supplementary Figure 17a. Interestingly, the distribution for early time points (0 h ≤ t ≤ 1 h) was comparably wide (CV roughly > 20 %). For time points in the middle of the curve (1.5 h ≤ t ≤ 11 h) CV was the lowest (10 % < CV < 20 %). Thereafter, the more time there was between sampling and caffeine intake, the worse the V sweat predictions got (i. e. wider error distribution, CV > 20 %). Finally, there is a clearly visible drop in precision and accuracy for t = 24 h (black line, CV > 30 %, |MRE| > 5 %).
Similarly to V sweat , a time-dependent change in error distribution is visible for unit-less metabolite concentrations as shown in Supplementary Figure 17b-e. Concretely, we observed higher CVs in early samples (0 h ≤ t ≤ 1 h), the smallest errors in the middle section (1.5 h ≤ t ≤ 6 h) and an increase in CV thereafter. Generally, the CVs for concentrations were smaller than for V sweat , however, time point t = 24 h clearly performed worst for all parameters (black lines, CV > 30%).
Early measurements of the caffeine capsule study were done with fewer sampling time points (simulation E15, Supplementary Table 12). To estimate the CV and MRE with this set-up a simulation with n = 100, i = 3000, and the max Robust loss was performed and the timedependent results that were used to calculate confidence intervals of Supplementary

Experimental Error
So far a fixed experimental error of 10% was assumed. In this section we investigate the influence of experimental error on the size of CV and MRE. The changes in size of fitting errors are shown in Supplementary Figure 18 for experimental errors between σ = 1% and 30% (i = 300, n = 100, max Robust loss). As expected, as the experimental errors propagated through the fitting procedure and lead to increased CVs and MREs. Interestingly, the propagation follows a linear relationship, however, the slopes of the increase of CVs were different for different parameters. Constants (full and dotted lines) seem to be impacted more than the time-dependent concentrations (dashed/dotted lines). Only the time-dependent V sweat (dashed line) increased at a similar rate as the time-independent parameters.

V sweat Normalisation
To analyse the impact of the V sweat normalisation, we compared two fitting procedures. First we used the procedure as described in the previous section. Second we assumed a constant sweat volume µ Vsweat = 1.34 µL (Equation 18) across all time points (being equivalent to no V sweat normalisation). Simulations were done with i = 3000 inner and n = 100 outer MC replicates and max Robust loss. The distribution of fitting errors around the true value for the kinetic constants of the model are shown in Supplementary Figure 19. MREs are found to be close to the true value in both cases. However, the distribution of errors, CV, is clearly narrower with V sweat normalisation than without (Supplementary Table 17). On average the V sweat normalisation improves the CV of fitting constants by 19%. The highest differences in CV are found for V sweat itself (79%), followed by the kinetic constants k i (25%) and the smallest differences for the concentrations (4.5%). Generally, the results showed that with V sweat normalisation the error variances for all model parameters were significantly smaller than without V sweat normalisation (Levene's test, all P -values < 10 −10 , sample size for time-independent constants = i, for V sweat , c par , c bro , and c phy = i · j, and for c caf = i · (j − 1)).

Discussion
Here we investigated the error associated to a fitting procedure that accounts for variable sweat volume includes an V sweat . We found that the best loss function (i.e. the loss function with the lowest CV and MRE) for this task is the general, adaptive, and robust loss [11]. Our results showed that using the maximum of absolute and relative error per time point for calculation of the loss gives better accuracy compared to just the absolute error. Next, we investigated the optimal number of innter Monte Carlo replicates. The idea behind taking many MC replicates is to avoid getting stuck in local minima. We found that up to 100 MC replicates the quality parameters of the fit improved. With more than 100 MC replicates, however, no substantial improvement of the fit's accuracy and precision was observed. Therefore, and for the reason of reducing computational expense, we chose 100 MC replicates for the data fitting procedure.
Error estimation for simulations with different number of sampling points clearly showed an advantage of increasing the number of samples. Thus, we concluded that even samples measured after more than 24 h after caffeine capsule ingestion can improve the precision of time-independent fitting parameters (compare j = 19 and j = 20 of Supplementary Figure 16a). However, this lead to an overall decrease of quality of estimation of time-dependent parameters. This decline, however, was mainly due to high errors in late concentration and V sweat estimates and, therefore, they should be treated very conservatively. This effect makes sense since so late the concentrations are relatively low and small absolute deviations can have big relative errors as consequence.
Additionally, we showed that the fitting error linearly increased with the assumed experimental error, which indicates that the method is robust, even to bigger disturbances of the input data. However, we argue that expecting an experimental error of 10% is reasonable since variations within one donor are expected to be small and potential technical variations are corrected for by division with the internal standard.
Finally, we compared the CV and MRE for a fitting procedure with and without V sweat normalisation and we demonstrated that indeed explicitly using V sweat values in the model significantly improves the precision of the fit.