Introduction

Blood constituent (analyte) monitoring forms a substantial component of medical diagnostics, ranging from critical-care to point-of-care testing. The concentration levels of these analytes are tightly controlled under normal circumstances, and thus any deviation from the well-established ranges can be immediately correlated with an abnormality in body function. The development of non-invasive, continuous measurement strategies for such analytes - particularly glucose in diabetic patients1,2 - is highly desirable, given the significant challenges and inconvenience associated with multiple blood withdrawals per day. Furthermore, such a measurement technology would significantly aid neonatal and ICU patient monitoring as well as the screening for pre-diabetes and gestational diabetes. Currently, the latter pathological conditions are diagnosed via functional loading tests (e.g. the oral glucose tolerance test (OGTT)3), where the insulin action is monitored by discrete finger-prick measurements over the duration of a few hours following an initial glucose stimulus.

To address this unmet clinical need for non-invasive, continuous measurement of blood analytes, vibrational spectroscopy, especially infrared (IR) absorption and Raman4,5,6, has been proposed owing to its ability to quantify the biochemical composition of the blood-tissue matrix without requiring the addition of exogenous labels. Raman spectroscopy, in particular, has been exploited due to its exquisite chemical specificity, which emanates from the characteristic frequency shifts of the photons following their interaction with the matrix molecule(s). This provides an inherent advantage in targeted analysis of a specific bioanalyte, as the congestion among the broad overlapping features in IR absorption spectra often washes out the information of interest. To use spectroscopic techniques for bioanalyte concentration prediction, chemometric methods, such as partial least squares (PLS) regression7 and support vector regression (SVR)8, are employed to develop calibration models from representative samples. The multivariate calibration models are then used in combination with the spectrum acquired from a prospective sample to compute the bioanalyte concentration in that sample.

Despite promising measurements of clinically relevant analytes (e.g. glucose, urea and cholesterol) in aqueous solutions9 and whole blood samples10, the translation of spectroscopic techniques to in vivo measurements in humans has proven to be challenging. The primary impediments to clinical translation have been attributed to sample-to-sample variability in optical properties, such as those due to variations in skin-layer thickness and hydration state11, and in physiological characteristics12. In view of the substantial inter-person variance, an alternate route to establishing the potential of vibrational spectroscopy would be to perform time-lapse measurements (in a continuous or semi-continuous manner) on a single individual. Specifically, it would be beneficial if the temporal evolution of the concentration profile could be obtained solely from spectral acquisitions without resorting to (intermediate) concentration measurements. This would allow for minimal sample perturbation, be it in a biomedical setting or in chemical reaction monitoring. Although the utility of such a protocol, which can function with little or no concentration information, is indisputable, there is currently a lack of analytic frameworks that can operate solely on the acquired spectroscopic and sample-specific kinetic information.

In this article, we propose a novel analytical formulation that enables spectroscopy-based prediction of analyte information, without necessitating reference concentration information for the development of the calibration model. The proposed framework is hereafter referred to as the improved concentration independent calibration (iCONIC) approach. We seek to solve this inverse concentration estimation problem by incorporating the kinetic model of the system to guide the spectroscopy-based concentration estimates. In other words, the kinetic model of the process supplies the “missing” concentration piece of the inverse problem of concentration estimation. While the fundamental principles of the iCONIC approach are generalizable to any spectroscopy-based quantification study, this work focuses on the development and application of the iCONIC framework using non-invasive glucose monitoring as the paradigm. Here, we characterize the physiological lag between the blood and interstitial fluid (ISF) glucose concentrations using a two-compartment mass transfer framework, which has been employed to model the analyte transport by us and others13,14,15. Inspired by indirect implicit calibration ideas16, minimization of the residual between the spectroscopy-derived estimate and the output of the kinetic model is then pursued in the concentration domain. The spectroscopic calibration step is executed inside the kinetic parameter estimation loop in an iterative fashion. This considerably alleviates the rigidity associated with prior methods that sought to determine a simultaneous solution to the kinetic modeling and the spectroscopic calibration components15.

Using concentration datasets obtained from a series of OGTTs in human subjects, we demonstrate the potential of the iCONIC approach in estimating blood glucose concentrations. We show that the iCONIC estimates conform more closely to the measured values than the predictions computed from conventional PLS calibration. Additionally, this study provides quantitative insights into the subject's physiological lag characteristics, potentially offering a new tool for the personalized assessment of diabetes onset and progression. Collectively, these findings open the door for a diverse range of spectroscopic monitoring applications - especially in clinical practice, where obtaining intermediate concentration information is always challenging and often impossible.

Results

Dynamic bioanalyte tracking by calibration-free approach

Motivated by the need for a spectroscopy-based monitoring algorithm that can work with limited or no reference concentration inputs, we explore the powerful, yet relatively underutilized, idea of indirect implicit calibration16,17 and report its first application to quantitative biological spectroscopy. Spectroscopy-based inference of concentration of system constituents belongs to the class of inverse, ill-posed problems, in the sense that there can be multiple solutions that are consistent with the experimental data18. Additionally, tracking the temporal evolution of a constituent necessitates analysis of the spectral time series often incorporating conservation equations of differential nature and constitutive equations of algebraic nature into the spectroscopic calibration framework. Continuous spectroscopy-based non-invasive glucose monitoring offers a representative case study, due to the physiological dynamics of glucose transport between the blood and ISF compartments. Specifically, the time lag between the two glucose levels gives rise to an inconsistency in classical spectroscopic calibration models, as the spectroscopic measurements primarily probe ISF glucose while blood glucose values are used as reference inputs19. This problem is particularly exacerbated when measurements are performed during rapid changes in glucose levels such as immediately after a meal ingestion (as is the case for OGTT) or insulin administration. Fig. 1 schematically illustrates the spectroscopic measurement process and shows how the photons interact with the glucose molecules in the two distinct compartments. We introduce here a novel route to address this important problem – and related class of monitoring applications – by minimizing the residual between two concentration profiles, namely the profile computed from the kinetic model and that obtained from transformation of the spectral information to the concentration domain.

Figure 1

A schematic illustration of the Raman spectroscopic measurement process for in vivo continuous glucose monitoring.

In the proposed iCONIC approach, calibration of the acquired spectra is done using the concentrations calculated with iteratively improved kinetic parameter(s) – and thus does not require the actual measurement of the reference concentration values, as detailed below. Here, the calculated concentrations are considered to be “measured” variables and the residual minimization is performed in concentration units. This represents a multivariate calibration framework with “floating data”16. Using the concentration values (which are a function of the kinetic parameters, k) and the recorded calibration spectral matrix Y, one can compute the corresponding regression matrix B using the least-squares solution to equation (1):

where E denotes the noise (error) in measurements.

Given the underdetermined nature of the system (as it has more variables, i.e. wavelengths, than equations, i.e. number of calibration data points), solution of the above equation requires calculation of a suitable pseudo-inverse of Y, from which the regression matrix estimate follows. The regression matrix estimate is obtained using singular value decomposition (SVD), PLS or principal component regression (PCR). The calibrated concentration profile is then determined by substitution in equation (1):

This formulation is employed to iteratively obtain the estimates of the kinetic parameters, k, by minimizing the following residual:

Equation (3), notably, employs two altogether different concentration profiles: the concentrations computed from the conservation equations governing the dynamic process, and the spectroscopy-based concentration estimate obtained from the calibration step. Evidently, both concentrations depend on the current value of the kinetic parameters.
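Written out, the calibration chain described above can take a form such as the following. The symbols C(k) for the kinetic-model concentrations, Y⁺ for the chosen pseudo-inverse and the use of a squared Euclidean norm are assumptions made here for illustration; only the verbal definitions of Y, B, E and k come from the text, and the numbering simply follows the equation references in the surrounding paragraphs.

```latex
\begin{align}
\mathbf{C}(k) &= \mathbf{Y}\,\mathbf{B} + \mathbf{E} \tag{1}\\
\hat{\mathbf{B}} &= \mathbf{Y}^{+}\,\mathbf{C}(k),
\qquad
\hat{\mathbf{C}}(k) = \mathbf{Y}\,\hat{\mathbf{B}} = \mathbf{Y}\,\mathbf{Y}^{+}\,\mathbf{C}(k) \tag{2}\\
Q(k) &= \bigl\lVert \mathbf{C}(k) - \hat{\mathbf{C}}(k) \bigr\rVert_2^{2} \tag{3}
\end{align}
```

In this reading, a full (untruncated) pseudo-inverse of an underdetermined, full-row-rank Y would reproduce any C(k) exactly and make Q vanish identically, which is one way to see why a reduced-rank estimate (SVD, PLS or PCR with a limited number of components) is needed for the residual to carry information about k.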

In order to reduce the impact of spectral baseline fluctuations and improve the contribution of each component of the spectral data during fitting, SVD of the spectral dataset Y is used to isolate the important time-trace information. A detailed explanation of the specific SVD procedure and its ability to alleviate the pernicious effect of baseline shifts is provided in the Supplementary Notes. Decomposing Y as Y = UΣV* (where U is the abstract time-trace matrix of concentration information, Σ is the diagonal matrix containing the singular values and V* is the abstract matrix of basis spectra) and substituting it into equation (3), we obtain:

While equation (4) represents the general framework for analysis of any time-resolved spectral data recorded from a dynamic system, the ensuing solution formalism is specialized for spectroscopy-based non-invasive monitoring of blood glucose. Here, the modeled concentration and regression matrices are replaced by the corresponding ISF glucose-specific vectors. Since the acquired spectral data are representative of the ISF glucose concentrations, this ensures consistency in the developed calibration models. Additionally, to remove any remaining ambiguity in the inversion problem, as well as to rule out unphysical and implausible solutions, a secondary convex goal is added by means of a regularization parameter λ20. This ensures that the minimization procedure converges on a robust solution, in the sense that small variations in the spectral dataset do not cause large variations in the computed kinetic parameters, k, and the resultant regression matrix.

Here, the modeled ISF glucose concentration is assessed from the reverse form of the mass conservation-based model that governs the blood and ISF glucose relationship, as detailed in the Methods section. The residual of equation (5), Qreg, is minimized using the Newton-Gauss-Levenberg/Marquardt (NGLM) algorithm21 for identification of the optimal kinetic parameters, kopt (see Supplementary Note 1).
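The damped Gauss-Newton iteration underlying NGLM-type minimization can be illustrated on a toy problem. The sketch below is a generic two-parameter Levenberg-Marquardt loop fitted to a hypothetical exponential decay; the model, the starting values and the damping schedule are all assumptions for illustration, and it is neither the iCONIC objective Qreg nor the authors' implementation.

```python
import math

def residuals(params, t, y):
    # Hypothetical toy model: y = amplitude * exp(-rate * t)
    amplitude, rate = params
    return [amplitude * math.exp(-rate * ti) - yi for ti, yi in zip(t, y)]

def numeric_jacobian(params, t, y, step=1e-6):
    # Forward-difference derivative of each residual w.r.t. each parameter
    base = residuals(params, t, y)
    columns = []
    for j in range(len(params)):
        shifted = list(params)
        shifted[j] += step
        moved = residuals(shifted, t, y)
        columns.append([(m - b) / step for m, b in zip(moved, base)])
    return base, columns

def levenberg_marquardt(start, t, y, damping=1e-2, max_iter=200):
    p = list(start)
    for _ in range(max_iter):
        r, (j0, j1) = numeric_jacobian(p, t, y)
        cost = sum(v * v for v in r)
        # Entries of J^T J and J^T r for the 2x2 normal equations
        a00 = sum(v * v for v in j0)
        a11 = sum(v * v for v in j1)
        a01 = sum(u * v for u, v in zip(j0, j1))
        g0 = sum(u * v for u, v in zip(j0, r))
        g1 = sum(u * v for u, v in zip(j1, r))
        # Marquardt damping scales the diagonal of J^T J
        m00 = a00 * (1.0 + damping)
        m11 = a11 * (1.0 + damping)
        det = m00 * m11 - a01 * a01
        d0 = (-g0 * m11 + a01 * g1) / det
        d1 = (-g1 * m00 + a01 * g0) / det
        trial = [p[0] + d0, p[1] + d1]
        trial_cost = sum(v * v for v in residuals(trial, t, y))
        if trial_cost < cost:
            # Accept the step and trust the Gauss-Newton direction more
            p, damping = trial, max(damping / 10.0, 1e-12)
        else:
            # Reject the step and move towards steepest descent
            damping *= 10.0
            if damping > 1e12:
                break
    return p

t = [0.5 * i for i in range(11)]
y = [2.0 * math.exp(-0.5 * ti) for ti in t]  # noise-free synthetic data
fitted = levenberg_marquardt([1.0, 0.2], t, y)
```

In the setting described above, the residual vector would instead be the regularized difference between the kinetic-model and spectroscopy-based ISF glucose profiles, with the kinetic parameters playing the role of `params`.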

Solution of equation (5) yields the set of optimal ISF glucose concentrations (via kopt), which in turn is used to calculate the ISF glucose-specific regression vector. Using this regression vector in conjunction with the spectrum measured at the prediction time point (spred), one can predict the ISF glucose concentration:

The set of predicted ISF glucose concentrations can be transformed using the forward form of the physiological glucose dynamics model and knowledge of the kinetic parameters kopt to construct the corresponding blood glucose estimates.
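The SVD-based projection and the subsequent prediction step can be sketched numerically. The example below is a minimal illustration on synthetic data, assuming a rank-truncated SVD as the pseudo-inverse; the function name, the synthetic time profiles and the synthetic basis spectra are all hypothetical. It shows that a concentration profile lying in the span of the leading time-trace vectors leaves essentially no residual, whereas an inconsistent profile does, and that the regression vector maps a spectrum to a concentration by a simple inner product.

```python
import numpy as np

def truncated_svd_calibration(Y, c, rank):
    """Return the regression vector and the projection residual for profile c."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    U_r, s_r, V_r = U[:, :rank], s[:rank], Vt[:rank].T
    b = V_r @ ((U_r.T @ c) / s_r)          # rank-truncated pseudo-inverse applied to c
    residual = c - U_r @ (U_r.T @ c)       # part of c not explained by the time traces
    return b, residual

# Synthetic data: 12 time points, 50 spectral channels, two components
t = np.linspace(0.0, 1.0, 12)
glucose = np.sin(np.pi * t)                # hypothetical analyte time profile
drift = t                                  # hypothetical interferent time profile
wl = np.linspace(0.0, 1.0, 50)
s_glucose = np.exp(-((wl - 0.3) / 0.05) ** 2)
s_drift = 1.0 + wl
Y = np.outer(glucose, s_glucose) + np.outer(drift, s_drift)

b, res_good = truncated_svd_calibration(Y, glucose, rank=2)
_, res_bad = truncated_svd_calibration(Y, t ** 2, rank=2)
q_good = float(np.sum(res_good ** 2))      # consistent profile: residual near zero
q_bad = float(np.sum(res_bad ** 2))        # inconsistent profile: residual clearly non-zero
c_pred = float(Y[5] @ b)                   # prediction for one spectrum via inner product
```

In the iterative scheme, the profile passed as `c` would come from the kinetic model at the current parameter values, so that the residual steers the parameters towards profiles that the spectral time traces can actually support.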

Calculation of Blood Glucose concentration

Fig. 2 shows the mean and ±1 standard deviation (SD) of representative Raman spectra acquired from a human volunteer undergoing an OGTT. The SD to mean ratio of the intensity values over the fingerprint region of the spectrum, 300–1700 cm−1, ranges from 0.03 to 0.1. The tissue spectral signatures can be attributed to the presence of Raman-active components (such as from blood analytes, collagen I and III, structural proteins in the epidermis and dermis and sub-cutaneous lipids) and endogenous fluorophores. While the near-infrared (NIR) excitation considerably reduces the autofluorescence levels, the presence of a broad background can still be observed in the acquired spectra. The strongest Raman peak is observed at ca. 1445 cm−1 and other prominent features are located at approximately 859, 938, 1004, 1273, 1302 and 1655 cm−1, which is consistent with prior in vivo tissue observations22. Expectedly, the Raman bands of glucose are masked in the myriad signals of other constituents and cannot be uniquely assigned by visual inspection alone. This necessitates the use of multivariate algorithms to identify the subtle changes and to link such changes to the glucose concentrations at different time points.

Figure 2

Representative Raman spectra acquired from a human subject undergoing OGTT.

The thick line shows the mean value and the shaded area represents ±1 standard deviation.

Here, we have used the iCONIC approach to predict the glucose concentrations with only the first reference concentration from each subject being used to develop the model. To understand the efficacy of the proposed method in comparison with more established approaches, PLS calibration was also used to estimate the glucose concentrations based on the acquired Raman spectra. Since PLS calibration (like any analogous implicit calibration technique, such as PCR or SVR8) requires significantly more reference concentrations to build a model, a cross-validation procedure is implemented to test the predictive power of the model. While the leave-one-out cross-validation (LOOCV) routine, explained in the Methods section, avoids some of the pitfalls encountered in autoprediction, it may yet result in an apparently functional model (due to “overtraining”) that cannot be used for prospective prediction23. Nevertheless, given the problem constraints, the PLS LOOCV procedure provides the best yardstick for comparison while also highlighting the need for an essentially “calibration-free” approach.

Fig. 3 displays the results of iCONIC prediction (red diamond) and PLS LOOCV (black circle) in a representative subject, where the blue squares depict the measured blood glucose values. The measured concentrations show the expected rise in glucose levels due to ingestion of the sugar-rich drink followed by the subsequent recovery to (nearly) euglycemic levels owing to the normal insulin response. The recovery would be delayed or absent if a diabetic subject were tested. We observe that the iCONIC model (root mean squared error of prediction, RMSEP = 5.14 mg/dL) exhibits significantly better prediction accuracy in comparison with the PLS LOOCV estimation (root mean squared error of cross validation, RMSECV = 13.64 mg/dL). The better estimation using the iCONIC approach can be attributed to two factors, namely suitable correction for the physiological lag between blood and ISF glucose and the isolation of the baseline shifts and system drifts (Supplementary Note 2). The effect of the former can be viewed in the initial 60 minutes when the glucose levels of the subject rise sharply. During this time frame, the iCONIC predictions match considerably better with the reference concentrations, in relation to the PLS estimates, by appropriately modeling the transient discrepancies. As noted in previous studies15,24, conventional calibration methods exhibit systematic errors during rapid excursions, even in the presence of a positive correlation between blood and ISF glucose.

Figure 3

Plot of prospective prediction (iCONIC, red diamond), LOOCV (PLS, black square) and reference glucose concentrations (blue squares) for a representative human subject.

To better illustrate the predictive power when multiple human subject data sets are included in the analysis, the results of the iCONIC model are plotted on the Clarke error grid (Fig. 4)25, a widely used method for quantifying the clinical usefulness of glucose predictions. Predictions in zones A and B are regarded as acceptable, and predictions in zones C, D and E are considered to be potentially dangerous if used for clinical management. The RMSEP and the R2 value (coefficient of determination) are computed to be 0.54 mM (1 mM of glucose = 18 mg/dL) and 0.97, respectively. Critically, all the glucose predictions over the entire human subject dataset reside in the clinically acceptable regions – even when the glucose levels are relatively low (4–6 mM). This result is of great value, as a key benefit of a continuous glucose monitoring system is the real-time detection of hypoglycemic states. A common motif in diabetes care is the lack of immediate knowledge regarding low blood glucose excursions in over-medicated patients, resulting in serious consequences including diabetic coma. The ability to non-invasively and continuously estimate blood glucose trends that are predictive of both hypoglycemic and hyperglycemic blood glucose excursions would address this pressing need.
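The summary metrics quoted above can be computed as in the following minimal sketch. The unit conversion uses the factor stated in the text (1 mM of glucose = 18 mg/dL); the function names are hypothetical, and the coefficient of determination is computed with one common definition (1 − SSres/SStot), which is an assumption since the exact definition used for the reported R2 is not spelled out here.

```python
import math

MGDL_PER_MM = 18.0  # conversion stated in the text: 1 mM glucose = 18 mg/dL

def mm_to_mgdl(concentration_mm):
    return concentration_mm * MGDL_PER_MM

def rmse(reference, predicted):
    # Root mean squared error between reference and predicted concentrations
    squared = [(p - r) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(reference, predicted):
    # Assumed definition: 1 - SS_res / SS_tot about the mean of the reference values
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot

rmsep_mgdl = mm_to_mgdl(0.54)  # the reported RMSEP expressed in mg/dL
```

Under this conversion, the reported RMSEP of 0.54 mM corresponds to roughly 9.7 mg/dL.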

Figure 4

Blood glucose predictions of the iCONIC model for the complete human subject dataset shown on the Clarke Error Grid.

Table 1 summarizes the results of the iCONIC model predictions (viewed in Fig. 4) as well as the corresponding PLS LOOCV estimates. We observe that the reduction in error on application of the iCONIC model, when compared to the PLS model estimates, ranges from nearly 18% to 59% with an average value of 35.5% computed over the 8 subjects. Even when compared with our previous dynamic concentration correction (DCC) model, which provided on average a 16% reduction in prediction error with respect to the corresponding PLS models for the same dataset21, the iCONIC model demonstrates much better predictive power.

Table 1 Summary of PLS LOOCV and iCONIC prediction results for the human subject dataset

Discussion

Our findings suggest that vibrational spectroscopy in combination with the proposed iCONIC approach can provide continuous glucose tracking information without necessitating substantial invasive blood glucose measurements. In the following, we discuss the validity and efficacy of this information content including the characterization of the glucose diffusion process.

The glucose diffusion process, which was previously modeled using a single lumped parameter13,14,15, is now more correctly characterized using a two-parameter model (k1, k2). This ensures that the rate of glucose uptake by the subcutaneous tissue is also addressed in the mass diffusion process. In each of the volunteers, k1 had a larger numerical value in relation to k2 signifying that the blood glucose rise was faster than the return to euglycemic levels. This is consistent with typical observations in glucose tolerance studies where the increase in blood glucose levels following ingestion of glucose solution is rapid in relation to the subsequent insulin-mediated glucose clearance from the blood (by the cells) and, thus, the corresponding return to normal blood glucose levels. One would anticipate that subjects with impaired glucose tolerance would exhibit significant changes in the determined rate constants, especially k2.

Critically, this allows us to model situations where, during the time of decreasing glucose levels, ISF glucose may fall in advance of blood glucose and reach nadir values that are lower than the corresponding blood glucose levels26,27. Some studies have indicated that ISF glucose levels can remain below blood glucose concentrations for fairly long periods of time following correction of insulin-induced hypoglycemia28. While our limited observations do not appear to support such reports, these findings could be explained by the so-called push-pull phenomenon, according to which glucose is pushed from the blood to the ISF compartment during the rising phases and is recruited from the ISF to the surrounding cells during the falling phases. If this were true for a given process, our model would re-calibrate itself by adjusting the corresponding k2 value.

Another pertinent question relates to the demonstration of causality of glucose concentration to the acquired spectral information, especially as the intrinsic glucose signal is significantly smaller than that of several other blood-tissue matrix constituents. Moreover, time-dependent physiological processes or variations specific to an instrument that happen to be correlated with the glucose levels have often been found to dominate classical implicit calibration models, especially for non-specific measurement modalities29. To investigate the robustness of our predictions to chance correlations (spurious factors), an F-test was used to compare the squared error of prediction (SEP) to the standard deviation of the glucose concentrations within the prediction data set (SDP) and, therefore, to assess if the variability of the predicted concentrations is greater than would be expected by chance. Here, from the values listed in Table 2, the F-values for the PLS LOOCV estimates and the iCONIC predictions were calculated to be 4.99 and 13.28, respectively. For the PLS computation, the SEP was replaced with SECV. Clearly, both F-values are statistically significant. Thus, the null hypothesis that the variance of errors of glucose predictions is the same as the variance of the reference glucose concentrations can be rejected. Table 2 also lists the results for linear regression analysis of the prediction points for the PLS and iCONIC cases. The y-intercept for the iCONIC predictions is lower, the slope is closer to unity and the R2 value is higher. These results strongly indicate that the current glucose predictions are based on the spectroscopic properties of glucose rather than on chance variations or correlative response between glucose and other matrix constituents.
Notably, since the iCONIC models do not use the standard input of an array of reference concentrations, the possibility of building an apparently functional model based on incidental correlations is largely eschewed.
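The variance-ratio test described above can be sketched as follows, assuming that the F statistic is formed as the squared ratio of SDP to SEP (or SECV for PLS). This definition and the function names are assumptions made for illustration; the individual SDP and SEP values of Table 2 are not used here, and only the reported F-values are taken from the text.

```python
import math

def f_ratio(sdp, sep):
    # Assumed form of the statistic: variance of the reference concentrations
    # divided by the squared prediction error
    return (sdp / sep) ** 2

def implied_sdp_to_sep(f_value):
    # Ratio SDP/SEP that would produce a given F under the assumed definition
    return math.sqrt(f_value)

ratio_pls = implied_sdp_to_sep(4.99)      # reported F for PLS LOOCV
ratio_iconic = implied_sdp_to_sep(13.28)  # reported F for iCONIC
```

Under this assumed definition, the reported F-values would correspond to a spread of reference concentrations that is about 2.2 times the PLS cross-validation error and about 3.6 times the iCONIC prediction error.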

Table 2 Performance characteristics for the PLS and iCONIC models (* indicates SECV is the correct error metric for PLS and is used here)

Difference plot analysis was also performed (Supplementary Fig. S1) for further comparison of the methods. The Bland-Altman plots of Supplementary Fig. S1 enable investigation of any systematic difference between the reference and Raman measurements and identification of possible outliers. Here, the mean difference is the estimated bias and is found to be 0.5 mg/dL and −4.73 mg/dL for the PLS and iCONIC estimates, respectively. The corresponding 2SD limits are determined to be 31.9 mg/dL (PLS) and 17.1 mg/dL (iCONIC). As per ISO 15197 guidelines, these 2SD limits should be less than 15 mg/dL for glucose concentrations below 75 mg/dL and should be lower than 20% for any value higher than 75 mg/dL. Our findings therefore suggest that the combination of the iCONIC approach and Raman spectroscopy provides clinically viable predictions, especially in terms of single-individual prediction.
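The bias and 2SD limits of a difference plot can be computed as in the sketch below. The sign convention (estimate minus reference), the use of the sample standard deviation and the example values are assumptions for illustration and are not taken from the study data.

```python
import math
import statistics

def bland_altman(reference, estimate):
    """Return the bias, the 2SD half-width and the limits of agreement."""
    diffs = [e - r for r, e in zip(reference, estimate)]  # estimate minus reference
    bias = statistics.mean(diffs)
    two_sd = 2.0 * statistics.stdev(diffs)                # sample standard deviation
    return bias, two_sd, (bias - two_sd, bias + two_sd)

# Hypothetical glucose values in mg/dL
reference = [100.0, 120.0, 140.0, 160.0]
estimate = [102.0, 118.0, 143.0, 159.0]
bias, two_sd, limits = bland_altman(reference, estimate)
```

A bias near zero with narrow limits indicates agreement without systematic offset; points falling outside the limits are candidates for outlier inspection.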

The results presented in this manuscript provide a proof-of-concept validation of the untapped potential of such a broad and widely generalizable approach. Specific to the problem of glucose monitoring, we envision that the iCONIC model can predict impending hypo- and hyperglycemic excursions, potentially allowing the diabetic patient to take necessary corrective action. It can also be gainfully employed in studying physiological changes (for example, in micro- and macro-vasculature) due to the onset of diabetes via its ability to characterize the glucose transport process in the circulation system and in the ISF. A large cohort of normal human volunteers and diabetic patients is currently being studied to test the feasibility of this method across different ages, ethnicities and, critically, in subjects with impaired glucose tolerance characteristics. While the present-day standard of care primarily involves interpretation of changes in blood glucose, we believe that in specific cases, such as the persistence of impaired cognition for prolonged periods of time after correction of hypoglycemia, measurement of ISF glucose levels may be more important clinically.

In the current study, we have proposed a spectroscopic method for tracking bioanalytes in a dynamic system with minimal a priori concentration information. The ability of the iCONIC approach to make accurate predictions in clinical datasets acquired from human subjects is demonstrated in the presence of myriad non-analyte specific variations. The performance metrics of the iCONIC algorithm exceed those of the conventional PLS calibration method, which we attribute to its twin advantages of accounting for the physiological lag between blood and ISF glucose and avoiding the baseline shifts and system drifts. While the initial pilot studies performed here provide the foundation, further clinical investigations - in single-center and subsequently in multi-center settings - will be pursued to validate the approach. Furthermore, the iCONIC formulation can be readily extended to quantify analytes using other spectroscopic signatures such as infrared absorption and thermal emission, which offer higher sensitivity in comparison to Raman acquisitions30,31,32.

Given our findings and the inherent non-invasive nature of vibrational spectroscopy, the combined method would be appropriate as a real-time clinical adjunct for continuous monitoring of glucose and other blood analytes, e.g. creatinine, urea and bilirubin, in critical care patients and in neonates, where frequent blood withdrawal is particularly problematic. Application of this minimally perturbative approach would also lay the foundation for a novel blood withdrawal-free spectroscopic assay for glucose tolerance testing in the near future. Moreover, the scope of application of this method extends beyond in vivo diagnostics to microfluidics investigations as well as recalcitrant industrial process monitoring, where intermediate sampling of the specimen would compromise its identity.

Methods

Clinical Studies on Human Subjects

To test the capability of the iCONIC approach in predicting concentrations from time-resolved spectra, clinical datasets comprising blood glucose concentrations and Raman spectra were used. These datasets, which were detailed in one of our prior reports33, were collected from healthy human volunteers undergoing OGTT. Raman spectra were recorded at regular 5 min intervals from the forearms of these volunteers. For Raman spectral acquisition, an 830 nm diode laser (Process Instruments) was used as an excitation source with an average power of ca. 300 mW in a ~1 mm2 spot. On the detection end, an f/1.8 spectrograph (Kaiser Optical Systems) was coupled to a liquid nitrogen-cooled CCD (1340 × 1300 pixels, Roper Scientific). Blood was drawn every 10 min and analyzed using a clinical glucose system (HemoCue, Inc.) to evaluate the subject's response. This study protocol was approved by the MIT Committee on the Use of Humans as Experimental Subjects and written informed consent was obtained from each of the volunteers in the study. All the experiments were carried out in accordance with the guidelines approved by the Committee. Data sets from volunteers exhibiting motional artifacts, inadequate SNR in the acquired spectra or impaired glucose tolerance characteristics were excluded from our analysis. Additionally, two subjects who underwent double OGTT were not considered in this study.

Data Analysis

To address clinical concerns arising from glucose monitoring in the subcutaneous interstitial fluid, we have previously developed a DCC method21. This method, which incorporates the mass transfer equations governing the diffusion of glucose between the blood and ISF compartments into the spectroscopic framework, provides a greater degree of consistency with the acquired spectra in the calibration model. Here, we re-formulate the solution method to allow for subcutaneous uptake of glucose in the ISF compartment by the cells.

Briefly, the transport of glucose from the blood to the ISF compartment occurs by a diffusion process across an established concentration gradient19,20. As detailed in the literature, this process can be mathematically written in the form of the following equation for the glucose component in the ISF space:

where cBG, cISF are the concentrations of glucose in the blood and ISF compartments, respectively; VBG, VISF are the volume of blood and ISF in the probed region; k21 and k12 are the forward and reverse flux rates for glucose transport across the capillaries; and k02 is the rate of glucose uptake into the surrounding tissue. This equation can be re-written to the following form by reducing the additional parameters into a two-parameter (k1, k2) system:

Re-arranging equation (8) forms the forward iCONIC model that is used to compute the blood glucose concentrations based on the ISF glucose values and knowledge of the system parameters. Additionally, integrating equation (8) provides the reverse form of the iCONIC model, which is plugged into equation (5) that in turn performs minimization of the objective function.
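The forward and reverse uses of the two-parameter model can be sketched as below, assuming that the reduced model takes the form dcISF/dt = k1·cBG − k2·cISF. This form, the explicit Euler discretization, the function names and the parameter values are assumptions for illustration and do not reproduce the authors' exact equation (8) or their integration scheme.

```python
def isf_from_blood(c_bg, k1, k2, dt, c_isf0):
    """Reverse form: integrate the assumed model to get ISF glucose from blood glucose."""
    c_isf = [c_isf0]
    for bg in c_bg[:-1]:
        prev = c_isf[-1]
        # Explicit Euler step of d(c_isf)/dt = k1 * c_bg - k2 * c_isf
        c_isf.append(prev + dt * (k1 * bg - k2 * prev))
    return c_isf

def blood_from_isf(c_isf, k1, k2, dt):
    """Forward form: re-arrange the same model to recover blood glucose from ISF glucose."""
    return [((nxt - cur) / dt + k2 * cur) / k1
            for cur, nxt in zip(c_isf[:-1], c_isf[1:])]

# Hypothetical OGTT-like blood glucose trace (mg/dL), sampled every 5 min
c_bg = [90.0, 120.0, 160.0, 150.0, 130.0, 110.0, 95.0]
k1, k2, dt = 0.10, 0.08, 5.0  # hypothetical rate constants (1/min) and time step (min)

c_isf = isf_from_blood(c_bg, k1, k2, dt, c_isf0=90.0)
c_bg_back = blood_from_isf(c_isf, k1, k2, dt)  # one element shorter than c_bg

# With constant blood glucose, the assumed model settles at (k1 / k2) * c_bg
steady = isf_from_blood([100.0] * 200, k1, k2, dt, c_isf0=0.0)[-1]
```

Because the forward form differentiates the ISF profile, it amplifies noise in the spectroscopic predictions; in practice some smoothing of the predicted ISF trace would likely be needed before applying it.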

For the conventional approach, PLS models were created based on the number of loading vectors that provides the least error in cross-validation10. The PLS models did not explicitly address the physiological dynamics issue. Here, the LOOCV approach was used to provide concentration estimates because of the limited number of data points available per individual. In LOOCV, the data from a particular time point are left out and the PLS model developed on all the other points is used to predict the concentration at that time point, with the number of loading vectors chosen to optimize agreement with the reference measurements.
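The leave-one-out loop can be sketched as follows. To keep the example self-contained, a one-variable least-squares line stands in for the PLS model, which is a deliberate simplification; the function names and data are hypothetical.

```python
import math

def fit_line(x, y):
    # Ordinary least-squares fit of y = slope * x + intercept
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loocv_rmsecv(x, y):
    """Leave each point out in turn, refit on the rest, and predict the held-out point."""
    squared_errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        prediction = slope * x[i] + intercept
        squared_errors.append((prediction - y[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

signal = [1.0, 2.0, 3.0, 4.0, 5.0]                 # hypothetical spectral feature
exact = [2.0 * s + 1.0 for s in signal]            # perfectly linear response
noisy = [3.2, 4.7, 7.4, 8.6, 11.3]                 # hypothetical noisy response
rmsecv_exact = loocv_rmsecv(signal, exact)
rmsecv_noisy = loocv_rmsecv(signal, noisy)
```

With a multivariate model such as PLS, the same loop would be repeated for each candidate number of loading vectors, and the number giving the lowest RMSECV would be retained.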