Determination of the composition of heterogeneous binder solutions by surface plasmon resonance biosensing

Surface plasmon resonance-based biosensors have been extensively applied to the characterization of the binding kinetics between purified (bio)molecules, thanks to robust data analysis techniques. However, data analysis for solutions containing multiple interactants is still at its infancy. We here present two algorithms for (1) the reliable and accurate determination of the kinetic parameters of N interactants present at different ratios in N mixtures and (2) the estimation of the ratios of each interactant in a given mixture, assuming that their kinetic parameters are known. Both algorithms assume that the interactants compete to bind to an immobilized ligand in a 1:1 fashion and necessitate prior knowledge of the total concentration of all interactants combined. The effectiveness of these two algorithms was experimentally validated with a model system corresponding to mixtures of four small molecular weight drugs binding to an immobilized protein. This approach enables the in-depth characterization of mixtures using SPR, which may be of considerable interest for many drug discovery or development applications, notably for protein glycovariant analysis.


Surface plasmon resonance-based biosensors have been extensively applied to the characterization of the binding kinetics between purified (bio)molecules, thanks to robust data analysis techniques.
However, data analysis for solutions containing multiple interactants is still at its infancy. We here present two algorithms for (1) the reliable and accurate determination of the kinetic parameters of N interactants present at different ratios in N mixtures and (2) the estimation of the ratios of each interactant in a given mixture, assuming that their kinetic parameters are known. Both algorithms assume that the interactants compete to bind to an immobilized ligand in a 1:1 fashion and necessitate prior knowledge of the total concentration of all interactants combined. The effectiveness of these two algorithms was experimentally validated with a model system corresponding to mixtures of four small molecular weight drugs binding to an immobilized protein. This approach enables the in-depth characterization of mixtures using SPR, which may be of considerable interest for many drug discovery or development applications, notably for protein glycovariant analysis.
In the last three decades, Surface Plasmon Resonance (SPR)-based biosensors have become the method of choice to study binding between an immobilized (bio)molecule-referred to as the ligand-and its soluble partner-the analyte. The said analyte may be purified 1 or in a crude extract [2][3][4] . Through the years, important advances were made in the development of optimized experimental protocols 5 and data analysis methods [6][7][8][9][10][11][12] . Robust data analysis software packages were created and contributed to the success of this label-free technology [13][14][15][16] . Recent efforts have brought the SPR technology at the forefront of drug development for (i) the off-line 17 or at-line 18 characterization of the critical quality attributes of biotherapeutics or (ii) for the screening of large libraries of drug candidates 19 . On that specific note, our group recently demonstrated that the throughput of a SPR-based experiment could be improved by injecting two analytes simultaneously, since the binding kinetics of both analytes could be identified in only one set of experiments 10,12 . These methods have yet to be expanded to the study of more than two analytes. We also demonstrated that analyte concentration could be identified from a SPR sensorgram, provided that the kinetics of its interaction to its immobilized ligand are known 20 . However, this approach, which uses parameter identification techniques, only considered pure analyte injections.
The SPR identification of the kinetic parameters for each individual analyte within a complex analyte mixture would also be of great interest to determine the individual kinetic parameters of various (bio)molecules that are hard to separate from each other. Such is the case, for instance, for the various glycoforms of therapeutic monoclonal antibodies produced in mammalian cell cultures, for which binding to their receptors is affected by their glycosylation state [21][22][23][24][25][26][27] . Hence, SPR data analysis techniques that can extract specific kinetic parameters for different glycoforms are required for the analysis of antibody SPR sensorgrams.
The present study aims at providing a twofold framework enabling the treatment of mixtures of more than two analytes. In a first part, we present an algorithm to extract meaningful kinetic parameters from SPR sensorgrams corresponding to the injections of N mixtures of N analytes. In a second part, we exploit the prior knowledge of the kinetic parameters of a set of analytes to estimate the composition of a mixture of these analytes by performing a sensorgram analysis. We validated our algorithms using a well-characterized model system [10][11][12]20 Single-analyte injections. All injections were performed in duplicate at a flow rate of 85μL/min. The data collection rate was set to 10 Hz and the temperature to 18 °C. For 'classical' single-analyte kinetic experiments, CBS, BDS, sulfanilamide and furosemide were injected alone at 5 different concentrations ranging from 1.06 μM to 52.98 μM, 210 nM to 10.49 μM, 1.01 μM to 50.27 μM and 1.01 μM to 50.52 μM, respectively. For double-referencing purposes, 2 buffer injections per analyte were also performed. The injection time was set to 60 s for all analytes and the dissociation time was 420 s for CBS, 240 s for BDS and sulfanilamide and 300 s for furosemide. No regeneration step was needed between injections as the dissociation was complete in every case. This agrees with previous studies that used this system [10][11][12]20,28 . The SPR biosensor used in this study (Biacore T100) uses a detection mode based on varying the incident angle while keeping the wavelength constant. The biosensor uses the geometry proposed by Kretschmann and Raether 29,30 .
Multiple-analyte injections. The four analytes were combined to create 12 mixtures each containing the 4 analytes, as described in Table 1. Four mixtures were rich in one analyte (70-10-10-10 mixtures, A-D), 4 mixtures had a dominating presence of one analyte, but at lower purity (40-20-20-20 mixtures, E-H) and 4 mixtures were created with random compositions (Random mixtures, I-L). The mixture creation was intended to mimic different types of mixtures one could wish to study in a practical application. For all mixtures, the analyte and the buffer injection phases were set to 240 and 320 s, respectively. Each mixture was injected in duplicate at 18 °C with a flow rate of 85μL/min at 7 different concentrations comprised between 300 nM and 15 μM with a data collection rate of 10 Hz. Again, no regeneration was necessary.

Data analysis.
In the case of 'classical' single-analyte experiments, each double-referenced set of sensorgrams was fitted to a 1:1 Langmuir model using a standard 'global fit' approach. Briefly, sensorgrams corresponding to different concentrations were simultaneously analyzed to obtain a set of global parameter estimates via a least square minimization using a custom program developed with the Matlab R2018b software platform (The Mathworks, Natick, USA) 9 . The parameters obtained with our platform matched closely those obtained with the Biacore T100 Evaluation Software 15 with deviations ranging from 0 to 2% for CBS, BDS and sulfanilamide and from 1 to 9% for furosemide (data not shown). The algorithms we developed to (1) extract the kinetic parameters from the sets of sensorgrams corresponding to the various analyte mixtures (Tables 1) and (2) estimate the composition of a given mixture based on the prior knowledge of the kinetic constants of each analyte are presented in detail in the next section. Both algorithms assumed that each analyte bound to CAII according to a 1:1 Langmuir model. Fitting the multi-analyte data and estimating mixture composition was also performed using custom-designed Matlab scripts. As sensorgrams corresponding to the different dilutions of the different mixtures needed to be analyzed simultaneously within the fitting procedure, this type of analysis will be further described in the text as the 'global-global' approach.

Multi-analyte modeling and kinetic parameter identification
The interaction model. We assume a Langmuir 1:1 interaction between each analyte ( A i ) and the ligand www.nature.com/scientificreports/ With k a,i and k d,i corresponding to the association and dissociation rate constants of the i th analyte, respectively. For N analytes, the system is described by the following ordinary differential equation (ODE) system: R max,i is the theoretical SPR response that would be obtained if an infinite concentration of analyte i were injected. F i is the fraction of analyte i in the injected mixture and C TOT is the total concentration of all analytes. We can then compute the predicted SPR response ( R TOT ) and the contribution of each analyte to the response ( R i ) by solving the system of ODE in (2). Furthermore, it can be shown that the contribution of every analyte at equilibrium is given by: where K A,i is the affinity of a given analyte for the immobilized ligand. Note that, for generality purposes, we here consider that each analyte has its own R max,i .
Identification of equilibrium parameters. The following algorithm was developed to identify the kinetic parameters ( k a,i and k d,i ) and the corresponding R max,i of N analytes. Some conditions on the data set to be analyzed must be met: • At least N sets of sensorgrams must be recorded with analyte mixtures of different compositions (at least N mixtures). • The composition of the N mixtures must be known and be linearly independent.
• Each mixture must be injected at least 3 times at different known C TOT .
• The sensorgrams must reach equilibrium.
• The sensorgrams must return to zero during the dissociation phase.
For a given mixture injected at known dilutions, the observed equilibrium responses can be described analogously to a single-analyte experiment by a set of observed equilibrium parameters ( K A,obs and R max,obs ): www.nature.com/scientificreports/ When fitting the model, adding an offset parameter ( R O ) can lead to better fits as it can account for bulk refractive index contributions. K A,obs and R max,obs can be obtained by simple nonlinear least square regression. This optimization problem and all others in this manuscript were solved using an interior-point algorithm ('fmincon' function in MATLAB).
By combining ( R eq,obs = N i=1 R eq,i ), we then have: Equations (5) and (6) imply that, given the experimental plateau values for the sets of sensorgrams of N mixtures of analytes, one can obtain the individual K A,i and R max,i by solving two systems of linear equations: where bold font indicates a matrix or a vector and a dot ( · ) denotes an element-wise multiplication. Each line of F corresponds to a mixture and each column to a given analyte (i.e., each line sums to unity). K A,obs and R max,obs are column vectors containing the observed parameters (1 per mixture). Solving (7) for K A returns a column vector containing each K A,i . Solving (8) for R , we can obtain the vector of R max,i ( R max ): Equations (7) and (8) highlight the need to use mixtures of independent compositions, as the F matrix needs to be invertible to solve the systems of linear equations, as previously mentioned.
Identification of kinetic parameters. During the dissociation phase, only buffer is injected, so that C TOT = 0 . If the equilibrium is reached during the analyte injection phase, the system of ODE further simplifies to: Here, t off is the time point at which buffer injection starts. This ODE system has a straightforward analytical solution: With the previously obtained estimates of the K A,i and R max,i , we can compute the R eq,i for each analyte and each sensorgram in the data set using (3). We then take the integral of the signal during the dissociation phase (11): If t end − t off is sufficiently large, i.e. if the dissociation phase lasts long enough that the observed signal tends to 0 towards the end of the sensorgram, the integral can be simplified to: The integral in (13) can be computed numerically using the trapezoidal method, for example. We can obtain an estimate of the dissociation rate constants ( k d,i ) by a simple linear regression over all sensorgrams of all www.nature.com/scientificreports/ mixtures of the data set. Note that the k d,i can be said to be 'global-global' parameters because they apply to all the sensorgrams of all the analyte mixtures in the data set. We can finally obtain an estimate of the association rate constants ( k a,i ) from the definition of the affinity: In theory, we then have extracted all the relevant kinetic parameters of the N analytes. However, as N increases, more sensorgrams are required. The parameter identification can be sensitive to the amount of noise contained in the sensorgrams. To make the approach more robust, two nonlinear optimization routines were added.
The first optimization pertains to the dissociation phase. Using the previously estimated k d,i as a starting point, we can obtain better estimates by least squares: where S and T are the numbers of sensorgrams and of time steps in a sensorgram, respectively, and k d is a column vector containing estimates of the k d,i . A better estimate of the k a,i can then also be obtained using (14).
The second optimization takes the whole sensorgram into account. We first consider a bulk refractive index contribution term ( R I ) such that: R I is a local offset parameter (1 per sensorgram). A first estimate for R I can be obtained by taking the difference between the observed and predicted equilibrium signals: We then proceed by least square over whole sensorgrams: With k a and R I being column vectors containing the association rate constants and the refractive bulk contributions. Estimates obtained in (9) ( R max,i ), (16) ( k a,i and k d,i ) and in (18) ( R I,s ) are used as a starting point for this optimization routine. The solution to the optimization problem in (19) gives the final estimates of the kinetic parameters of the N compounds. Figure 1 summarizes the kinetic parameter identification algorithm. Note that obtaining adequate starting points is critical to solve the optimization problem in Eq. (19). Otherwise, the optimization algorithm may remain trapped in a local minimum (data not shown). Evans et al. 31 showed that the 2-analyte heterogeneous mixture model is not structurally globally identifiable. Rather, it is structurally locally identifiable, but only if the concentration of each analyte is known. This implies that it is possible to obtain a global minimum to our optimization problem, but only if the starting point for the optimization routine is adequate. The proposed algorithm can provide such starting points using a negligible amount of time (less than a minute for N = 4 ) and computing capacity. One could also verify that a similar optimum is reached even if the starting point is slightly altered but remains adjacent to what is suggested by our method.
Composition of an N-analyte mixture. The kinetic parameters and the R max,i obtained from a training data set can now be used to infer the composition of an unknown mixture of the same N compounds. For this algorithm to be applied, the following conditions must be met: • The k a,i , k d,i and R max,i are known for the N compounds.
• Sensorgrams reaching plateaus are available for the unknown mixture.
• The total concentration of all the analytes being combined ( C TOT ) is known for all sensorgrams.
The problem of identifying the composition of a mixture comes down to determining the set of F i that best fit the data. With no prior information, the starting point is chosen to be: www.nature.com/scientificreports/ A first optimization is carried out by considering only the dissociation phase, as analytical solutions can be easily obtained for this part of the sensorgrams (see (15)).  where F is a column vector containing the F i . This provides coarse estimates for the F i that can be used as starting points for a second optimization routine performed over the whole sensorgrams: With the same constraints given in (22). Note that terms accounting for the bulk refractive contributions ( R I,s ) were added for every sensorgram. These are optimized together with F in (23). Initial estimates for these parameters can be obtained using (18). Figure 2 summarizes the composition identification algorithm.
Confidence intervals on the kinetic parameters. The confidence intervals can be computed based on the standard error 7,9 . For the k th parameter, the standard error is given by: where H is the hessian matrix and H −1 k,k corresponds to the k th element on the diagonal of the inverse of H . χ 2 , which depends on the number of estimated parameters ( p = 3N + S ) and on the number of data points ( n = ST ), is calculated as follows:  Once the standard error of a given parameter has been computed, a 100(1 − α)% confidence interval can be obtained using Student's T distribution: Confidence intervals on the fractions. Because the optimization problems for mixture composition estimation involve inequality constraints (all fractions must be between 0 and 1), it is necessary to obtain nonsymmetrical confidence intervals for the fractions. We have applied a method based on the Fisher F statistic 32 . Considering the optimization problem given in (23), we have: With J θ being the value of the objective function at the optimum point (estimated fractions) and J(θ )| F i =F i0 the value of the objective value obtained by optimizing with an added constraint F i = F i0 . The boundaries of the 100(1 − α)% confidence interval are such that:

Results
For the purpose of testing our data analysis algorithms for multiple-analyte SPR injections, carbonic anhydrase II (CAII) was immobilized at the surface of a SPR biosensor. CAII was selected because its interactions with many small molecular compounds have been extensively characterized by SPR. Among known and well-characterized CAII binders, CBS, BDS, sulfanilamide and furosemide were selected because they encompass a broad range of kinetic constants and refractive index increments (RII) 12 . In order to account for these RII discrepancies, our model considers an individual R max,i for each analyte.
A total of 16 SPR experiments were performed. The 4 compounds were first injected alone at 5 different concentrations, then 12 mixtures all containing the 4 compounds were injected at 7 different total concentrations. To determine if it is possible to extract reliable kinetic parameters from mixtures of analytes at any molar ratio, various sets of experiments were performed. First, we injected mixtures in which one of the analytes is dominant (i.e. 70% on a molar basis) while the other 3 analytes account for 10% each (mixtures A to D, Table 1). Second, to investigate whether mixtures of lesser purity would also allow to extract kinetics, each analyte was injected at a proportion of 40% while the others completed the mix (20% each, mixtures E to H, Table 1). Third, 4 mixtures (I to L, Table 1) were created with random compositions to determine if a non-orthogonal mixture set could lead to accurate kinetic estimations. Sensorgrams corresponding to these 16 experiments are shown in Fig. 3. Sensorgrams corresponding to single-analyte experiments (CBS, BDS, sulfanilamide and furosemide) were globally analyzed with a 1:1 Langmuir model. The three sets of sensorgrams obtained for analyte mixtures (i.e. those shown in panels A-D, E-H and I-L) were fitted assuming that (1) each mixture composition was known and (2) each analyte within each mixture competed to bind to CAII according to a 1:1 Langmuir model. In these cases, the fits can be said to be 'global-global' because sensorgrams of different concentrations and different mixtures are simultaneously analyzed to extract the kinetic parameters. In all cases, to mimic an automated data preparation routine, data were control-corrected without excluding any data point prior to fitting. This is reflected by the presence of spikes at the beginning of each injection (i.e., analyte and buffer) phase in Fig. 3. The fits are otherwise excellent ( χ 2 in the order of 0.1 for all data sets) with no prevalent trends in the residuals (see residual plots, Fig. 3). Table 2 shows the estimated kinetic parameter values and their 95% confidence intervals as well as the estimated affinity of each analyte for each fit. The width of the confidence intervals is less than 5% of the estimated parameter value for most cases. Using these estimated parameters and Eq. (2), it is possible to compute the contribution of each analyte to the SPR response. The contribution of each analyte is shown in Fig. 4 to facilitate the understanding of the competitive phenomena occurring at the surface during the multiple-analyte injection phase. Of considerable interest, BDS, the analyte with the highest association rate ( k a ) always binds to CAII first, before being progressively replaced by an analyte with slower kinetics but higher affinity for CAII (i.e., furosemide) or by analytes present in higher proportions in the injected mixture. Sulfanilamide, which has the lowest affinity for CAII among the analytes we used, has small to almost negligible contributions to the response in most mixtures, except for mixtures C and G where it is present in a dominating proportion (see Table 1). Figure 3 showed good fits between the calculated and observed SPR responses. In order to evaluate the accuracy of the kinetic parameter (26) k = k pred ± t n−p,α/2 * SE(k) www.nature.com/scientificreports/ determination when the parameters are extracted from mixtures of analytes, we compared them to the kinetic parameters determined from single-analyte experiments. For comparison sake, the deviation (or relative errorsee Table 3) was calculated as follows for a given parameter k:

Is it possible to extract kinetic parameters from a mixture of analytes?
The deviation between parameters obtained with single and multiple-analyte experiments should ideally be close to 0. The deviations for the 70-10-10-10 data set (mixtures A to D) were of the same magnitude as the repeatability errors. Those were evaluated by performing 3 independent single-analyte experiments using the same stock solution of CBS and the same CAII surface and ranged from 0 to 10% (data not shown). The error in the estimation of the kinetic parameters made by the fitting algorithm is indiscernible from the experimental error made by producing the 4 independent mixtures of analytes, thus indicating that the fitting algorithm was able to properly extract the kinetic parameters of the 4 analytes using the 70-10-10-10 data set.
How pure do the mixtures need to be? The limits of the approach were then tested with mixtures of lesser purity (the 40-20-20-20 data set, mixtures E to H). Once again, the estimated parameters were very close deviation(%) = k single − k multi k single * 100  Table 1 for exact composition), 40-20-20-20 mixtures (E to H, see Table 1 for exact composition) and random proportion mixtures (I to L, see Table 1 (Table 3). A bigger deviation was observed for sulfanilamide. This can be attributed to its low affinity for CAII, which leads to a limited or negligible contribution in the sensorgrams corresponding to the 40-20-20-20 mixtures data set (see plots E-H in Fig. 4). The goodness of fit of the model is then less sensitive to the quality of the estimated parameters of sulfanilamide. This leads to wider confidence intervals, especially for sulfanilamide (see Table 2), thus complicating the extraction of the kinetic parameters of this analyte. Nonetheless, our results indicate that reliable parameters could be obtained for CBS, BDS and furosemide even with mixtures of a lower purity.

Can a non-orthogonal set of mixtures be used?
We then tested the method with a non-orthogonal mixture set. Here, 'non-orthogonal' refers to the fact that the F matrix is not orthogonal (see Eqs. 7 and 8), although it needs to remain invertible. The compositions of the mixtures within this data set are given in Table 1 (Random mixtures data set, mixtures I-L). One could expect that mixtures for which none of the analytes has a dominating proportion would lead to confounding the effects of the different analytes, and hence poor estimates. However, Table 3 shows that this is most likely not an issue, as small deviations were observed for CBS, BDS and furosemide. Only the estimates for sulfanilamide deviated significantly, more likely because of the small contribution of sulfanilamide to the sensorgrams in the data set (see plots I to L in Fig. 4). We validated that our algorithm can extract the kinetic parameters of a group of analytes from sensorgrams corresponding to various mixtures of these analytes. The main limitation is that each analyte must have a nonnegligible contribution to the recorded SPR response in at least one of the mixtures being fitted. Failure to comply with this requirement leads to bigger estimation errors (especially for the analyte that contributes the less), and to larger confidence intervals. Better estimates (i.e. smaller deviations, narrower confidence intervals) are obtained when low-affinity compounds are present in higher proportion in at least one mixture of the data set being fitted.
Can we estimate the composition of a mixture of analytes? We then explored if prior knowledge of the kinetic parameters of each analyte could be used to infer the composition of each mixture of analytes. Using a given set of kinetic parameters we previously determined (see Table 2), we estimated the composition of all 16 mixtures. Figure 5 compares the estimated fractions to the actual ones. Note that vertical error bars on Fig. 5 correspond to the 95% confidence bounds for the estimated fractions and horizontal bars represent the uncertainty on the real fractions. The latter comes from the propagation of systematic instrument errors during weighting Table 2. Kinetic parameters identified from single-and multiple-analyte injections of mixtures of known composition. 95% confidence interval are given in parentheses. The estimated affinity ( K A ) is also reported. www.nature.com/scientificreports/ and pipetting. Of considerable interest, prior knowledge of the kinetic parameters, as estimated from any data set, allowed us to predict the compositions of every mixture with good accuracy.
To investigate whether prior knowledge of the kinetic parameters, as estimated from a given data set, led to more accurate fraction estimates, we computed the Root Mean Square Error (RMSE) between the estimated and actual fractions independently for each set of kinetic parameters (see Fig. 6). The RMSE can be strongly influenced by the presence of a few large errors, so we also computed the Mean Absolute Error (MAE), which can temper their contribution 33,34 .
We computed the mean prediction error for one analyte at a time and then for all the analytes at once. Results are shown in Fig. 6. For a given set of parameters, only the mixtures which were not used to identify the kinetic  Table 2) were used to obtain the contributions to related sensorgrams. The sum of these contributions gives the 'globalglobal' fits shown in Fig. 3, minus the bulk refractive index contribution R I . The total concentrations ( C TOT ) ranged from 300 nM to 15 μM for all mixtures. This figure was generated with the Matlab R2018b software platform (The Mathworks, Natick, USA, www.mathw orks.com/produ cts/matla b.html). Table 3. Deviation between the parameters identified from single-analyte experiments and parameters identified by fitting the multi-analyte model independently to the 3 data sets (70-10-10-10, 40-20-20-20 and random mixtures).

Data set
Compound k a (%) k d (%) R max (%) K A( %) www.nature.com/scientificreports/ parameters were used to compute the mean error (each set of kinetic parameters was thus identified using 4 mixtures and tested across 12 other mixtures). Figure 6 shows similar trends for the RMSE and the MAE. No apparent difference is perceived between the performance of the different sets of kinetic parameters in predicting the mixture compositions when considering all the compounds. This is of considerable interest as it indicates that, even when using mixtures of a lesser purity to identify the kinetic parameters (which leads to larger deviations in the estimated kinetic parameters-see Table 3), mixture compositions can still be estimated with similar accuracy. All sets of kinetic parameters except the one obtained with random mixtures led to larger errors in estimating the fraction of sulfanilamide. This is because of its low affinity to CAII and therefore small contribution to the recorded SPR response.

Discussion
An algorithm for the identification of the binding kinetics of multiple analytes from the SPR sensorgrams corresponding to mixtures of these analytes has been proposed (Fig. 1). Furthermore, we postulated that, using the identified kinetics, the composition of other mixtures of the same analytes could be estimated (Fig. 2). As a proof of concept, the binding kinetics of four small molecular-weight compounds (CBS, BDS, sulfanilamide and furosemide) to the enzyme CAII were studied. The kinetic parameters derived from classical single-analyte experiments (analyzed via standard numerical methods) and the kinetic parameters derived from multiple-analyte The horizontal error bars were obtained by propagating the systematic error of the pipettes (Pipetman Neo P10, P20, P200 and P1000 and Microman M1000E) and the balance (Dever Instrument Company AA-160) that were used to conduct the experiments. The vertical error bars correspond to 95% confidence intervals computed using the Fisher F statistic. Here, we assumed that the total concentration of all analytes ( C TOT ) is known, but the composition of each mixture (fraction of each analyte) is not. This figure was generated with the Matlab R2018b software platform (The Mathworks, Natick, USA, www.mathw orks.com/ produ cts/matla b.html).  (Fig. 3) analyzed using our algorithm were shown to match. This remained true even when mixtures of lower purity were used for the identification (Table 3). Additionally, the kinetic parameters obtained from one data set could be used to estimate the compositions of the other mixtures with excellent accuracy (Figs. 5, 6). Indeed, we generated 16 mixtures of 4 analytes, of which 4 were used for training (identification of the kinetic parameters), while the remainder was used for validation (estimating the composition of the other 12 mixtures). We pushed the study further by performing 3 other rounds of study in which the 4 mixtures used for training were changed and the remaining 12 were used for validation. For example, we identified kinetic parameters using mixtures A-B-C-D (as reported in Table 2), then used the estimated parameters obtained with mixtures A-B-C-D to estimate the composition of mixtures E-F-G-H-I-J-K-L and pure solutions (Fig. 5). With this approach, we were able to show that our algorithm was extremely robust since, no matter which data set were used for training, validation was good, or even excellent, with the other mixtures (Fig. 6).
In contrast to the previous body of work we proposed on kinetic parameter identification 10,12 or concentration determination 20 , the goal of the algorithms presented here is not to increase the throughput of SPR-based experiments. Indeed, for the identification of the kinetic parameters of N analytes, our approach requires N sets of sensorgrams, as for a classical single-analyte approach. Rather, we here demonstrate that for mixtures of macromolecules that are hard to purify, one may determine the kinetic parameters of each individual binder without any further purification. For instance, it is well-known that immunoglobulin G (IgG) production in bioreactors typically results in a heterogeneous distribution of glycoforms of the same IgG 26 , in turn impacting the IgG interactions with their Fcγ receptors, and the IgG therapeutic efficacy [21][22][23][24][25][26][27] . Hence, an algorithm similar to ours could potentially uncover meaningful binding kinetics of different IgG glycoforms without having access to pure IgG glycoform solutions. To uncover the kinetics of N different glycoforms, one would require N (potentially heterogeneous in terms of glycoforms) different samples with different known compositions (relative abundance of the glycoforms). Producing N differently glycosylated samples can be achieved by varying the cell culture conditions or through the use of various cellular and protein engineering strategies 35 . The compositions could be determined via mass spectroscopy, as it has already been performed in [21][22][23][24][25][26][27] . Once the glycoform kinetics have been uncovered, unknown glycoform mixtures could then be characterized.
The absence of mass transport limitations (MTL) 36,37 and the Langmuirian behavior of each analyte/ligand interaction we assumed was verified for the model system we used in this study (CAII/small molecular weight drugs). However, these assumptions may not always apply. It may thus be necessary to adjust the model to other mechanisms of binding so as to fit the experimental data. In the case of MTL, as plateau values of the sensorgrams would remain the same 36,37 , the determination of the equilibrium constants ( K A ) would still be possible with our approach.
High molecular weight analytes, such as proteins, usually have similar refractive index increments (RII), thus implying that the ratio of their respective R max,i should be proportional to the ratio of their molecular weights 10,15,[38][39][40] . Knowledge of the molecular weights of every analyte would then reduce the number of parameters to optimize using our strategy, as including only one R max value would be sufficient. In contrast, small www.nature.com/scientificreports/ molecular weight compounds often exhibit varying RII. Such is the case for those used in the present study 12 . Thus, it was necessary to account for one R max,i per analyte. We found an affinity between sulfanilamide and CAII that is almost an order of magnitude lower than that of the other compounds tested. An analyte with a considerably lower affinity for the ligand will often have a negligible contribution to the SPR response, especially in mixtures where it is not present at a high purity. This presumably explains why bigger deviations were observed in the identified kinetic parameters of sulfanilamide. Larger errors were also made for sulfanilamide when estimating mixture compositions. This phenomenon could be even more noticeable in systems where two or more analytes are found to have considerably lower affinities than the rest of the analytes. In such a system, the kinetic parameters identified for the low-affinity analytes could exhibit significant deviations and large confidence intervals. When possible, a mixture containing a high purity of at least one of these analytes should be added to the training data to facilitate the kinetic parameter identification process. If such a thing is not possible, we suggest pooling low affinity analytes together, i.e. treating two or more analytes as a single one. Pooling analytes with very similar kinetics could also be preferable, as it might be difficult to distinguish their respective effect on the SPR response. The kinetics obtained would then be slightly biased, but still usable to partially characterise an unknown mixture.
It is important to note that the experiments in this study aimed at validating the method and were not intended to be optimal in terms of experimental time and material consumption. Notably, the analyte injection time was kept very long to ensure that equilibrium was reached. This resulted in longer sensorgrams with more data points, in turn narrowing the confidence intervals on the kinetic parameters and decreasing uncertainties on the estimated compositions. Also, the number of sensorgrams used (7 duplicates per mixture) was intended to survey the experimental space (from a low concentration that gives a minimal observable response to a high concentration that is close to saturation of the surface). However, this may not be optimal and previous studies with single-analyte injections have shown that using a lower number of injections does not always result in larger parameter uncertainty 7 . One may thus minimize experimental time and material consumption necessary to apply our strategy in a similar fashion as performed for single-analyte 9 and for two-analyte experiments 10 .

Conclusion
We suggested a framework for the analysis of multiple-analyte injections over an SPR biosensor surface. The proposed algorithms were shown robust and efficient to extract the kinetics of at least four analytes, as well as to use these parameters to estimate the composition of unknown mixtures from their SPR sensorgrams. The limitations of the framework are its simple 1:1 competitive Langmuir binding scheme and the fact that knowledge of the total concentration of all analytes combined is required. However, the total concentration could be added as a variable to optimize when characterizing an unknown mixture. Our framework broadens the application of SPR biosensing to the detailed characterization of complex analyte mixtures, particularly in contexts where separating the constituents is challenging or impossible. Given the increased use of SPR biosensing, especially in the field of drug discovery and development, the findings of this study may be of interest to a broad community in both academia and industry.