Flexible modeling and control of capacitive-deionization processes through a linear-state-space dynamic Langmuir model

While black-box models such as neural networks have been powerful in many applications, direct physical modeling (white box) remains crucial in many fields where experimental data are difficult or time-consuming to obtain. Here, we demonstrate with an example from desalination by capacitive deionization (CDI), how an existing physical model could be strengthened by combining a general modeling framework with physical insights (gray box). Thus, a dynamic Langmuir (DL) model is extended to a linear-state-space DL model (LDL). Results obtained show the new LDL model could incorporate general structural and operational modes, including membrane CDI and constant-current operation. The formulation removes the need for direct measurements of detailed device properties without adding model complexity, and MATLAB code for automatically implementing the model is provided in the Supplementary Information. We conclude the new LDL model is widely applicable, offering great flexibility in calibration data, and enabling prediction over general operating modes.


INTRODUCTION
Although black-box models 1 , such as neural networks 2,3 , have seen widespread success in recent years for modeling complex systems using a lot of data 2 , physical modeling (white-box 4 ) remains a prominent tool in many fields where detailed system knowledge is required while data for learning is difficult or timeconsuming to extract 4 . A gray-box model leverages the system's known physical behavior to effectively use available data in generalized fitting methods, making it a trade-off between black and white-box approaches 4,5 , and extensive theory and software exist for systems that can be written in this form 6 . In this work, we present an example from desalination modeling, showing how an existing physical model can be lifted up to a gray-box form, making it possible to leverage the existing framework to significantly improve the model performance.
Because CDI is a technology strongly affected by material and operational conditions, physics-based white-box models 24,39,[50][51][52] have been important for describing, predicting, and efficiently optimizing the operations of CDI systems. One such model is the dynamic Langmuir (DL) 53-55 model, which has been shown to be applicable to a wide range of CDI systems and could predict device performance with respect to the applied voltage in CV mode 54 , different flow rates 54 , ion concentration 53,54 , ion composition in multi-ion solutions 55 , and electrode asymmetry 53 .
Among the various modeling approaches in CDI, the decoupled nature of the DL model 53 makes it a solid base for constructing a linear-state-space formulation to describe the CDI processes. In this work, we have constructed such a formulation (termed the LDL model) in order to take advantage of the existing framework of such gray-box models and significantly improve the performance of the simulations. Crucially, this also enables strong and flexible control systems for operating the CDI devices.

RESULTS
The DL model describes the salt removal in CDI through the macroscopic properties of adsorption and desorption strengths (see the method section). The salt ions are assumed to adsorb on voltage-induced sites, and the net adsorption is rapid at the beginning of the desalination process but slows down as the device approaches saturation.
Models can be valuable for CDI as they make it possible to describe, predict, and optimize device performance. This section shows the results when the derived LDL model is applied to various systems and experiments from the literature, in order to validate the model performance for describing and predicting CDI processes.  Fig. 2a demonstrates that the LDL model gives a great fit to data in the desalination phase and slightly underestimates the effluent ion concentration at the beginning of the regeneration phase. The basic modeling accuracy is similar to that of the mD model 51 , and the modeling error could be attributed to electrode starvation (low concentrations asymmetrically prolong the desalination phase) which is not accounted for in the LDL model 51 . Crucially, the model accurately predicts the variations in effluent concentration depending on the applied voltage (Fig. 2b).
For completeness, note that the model can also predict the CDI performance with varying flow rates ( Supplementary Fig. 1, data from ref. 54 ), and for batch-mode operations ( Supplementary Fig.  2, data from ref. 56 ).
Wider applicability Wang et al. investigated the performance of an MCDI system operating in either CV or CC modes, to compare which operational mode was most efficient 48 . The DL model applied to the data for the CV operation is shown in Fig. 3a. There is a very good agreement between the model and experiment for both the effluent concentration (Fig. 3a) and the current (Fig. 3b) over time.
Wang et al. 48 also investigated CC-mode operation with the same device. Because the same device was used for both operations, the fitting from Fig. 3 and the input voltage from Fig. 4b can be used the predict the MCDI performance during CCmode operation. (Fig. 4a). There is excellent agreement with reported experimental results, which demonstrates that the LDL model could be applied to CC-mode operations in MCDI and, more generally, operations with time-varying voltages.
As a side note, the LDL model is flexible in that it allows for fitting using only concentration, only current, or both. This is demonstrated in Supplementary Note 4, where either only effluent ion concentration is used for fitting ( Supplementary Fig.  3), both current and ion concentration is used ( Supplementary Fig.  4), or the current is used as input ( Supplementary Fig. 5). Although it is possible to fit using only ion concentration or current through the device, using both could reduce overfitting as discussed in Supplementary Note 4.

Control circuit
The linear-state-space form that has been developed in this work has for the DL model makes it possible to implement a proportional integral derivative (PID) controller to automate the cell operation. Crucially, MATLAB provides support and great flexibility for tuning such controllers based on the performance requirements 57 . MATLAB code for creating the state-space model and automatically tuning such a controller is provided in Supplementary Note 1.
A tuned controller can automatically choose the input (μ, the voltage in Eq. 11) to make the output value y approach some  given reference signal r. This can reliably improve the operation's efficacy in various ways depending on the operating targets, such as energy efficiency, desalination rate, a combination of these, or the volume of produced water with a given outlet concentration.
For instance, the CDI device could reliably produce clean water with the same specified quality (ion concentration) while requiring a short time to reach this state (Fig. 5a). Similarly, the operation could be either in CC (Fig. 5b) or CV mode (discussed in Supplementary Notes 2 and 4). As previous works have argued that CC mode is more energy-efficient than CV 58 , this already means the controller can improve energy efficiency compared to a CV operation. Moreover, any combination of CC and CV mode could be achieved by choosing y = I/I 0 + V/V 0 , for some chosen constants I 0 ,V 0 (Fig. 5c). Because this operation combines the CC and CV modes, it is at least as good as the best of the CC and CV modes, and future work may find a combined operation (values for I 0 ,V 0 ) that is even more energetically efficient for a given system.   Response from a CDI control system. A state-space model was derived from Eq. 11 and the parameter fitting in Fig. 3. A PID controller was automatically tuned to this system using the tool in Supplementary Note 1 so that the model output y will approach a supplied reference signal r. The graphs show simulated step responses for the controlled CDI system, under various outputs y. a The output is the concentration y = (c 0 − c)/c 0 and the reference value is r = 0.2. b The output is the current y = I/I 0 (I 0 = 100[mA]) and the reference value is r = 1. c As an example of combined CC/CV operation, the output is a combination of the current and the voltage I ) and the reference value is r = 1.
Note that the control system in Fig. 5 is meant as a proof-of concept and does not include extra boundary conditions that might be present in reality, such as an upper limit on the current or voltage the cell and circuit can handle. However, MATLAB also provides the option to set limits on the input through the Simulink control systems interface (Fig. 6a, Supplementary Note 1). Thus, an actuator (the part limiting the maximum voltage) connects to the PID controller and the extracted model structure from Fig. 5. Figure 6b shows that the complete control circuit works very well for effectively and safely pursue operational targets. In this example, the reference signal c ref /c 0 = 0.5 makes the effluent concentration quickly and reliably reach half the ion concentration of the inlet solution. Crucially, the limit on the applied voltage means the ion removal steadily begins to drop when the adsorption is so large that the maximum voltage is no longer enough to achieve the target ion removal rate. In effect, this means the device operates as close as possible to the stated goals without exceeding the critical input boundaries.
So far, the work has found a linear-state-space formulation that basically allowed us to construct control systems that effectively and safely operate toward generalized objectives and efficiency metrics. Having these control systems intuitively means future work could systematically investigate the general operational modes to find faster and more energy-efficient operations. Also, although this work uses a PID controller as a proof-of-concept, notice that a strength of the linear-state-space formulation is that it additionally provides a solid foundation for deriving even more effective control systems in the future.

DISCUSSION
This work demonstrates that converting an existing physical model to a gray-box form could have several advantages.
First, the new LDL model retains all the properties of the DL model, such as predicting device performance with respect to the applied voltage in CV mode 54 , different flow rates 54 , ion concentration 53,54 , ion composition in multi-ion solutions 55 , and electrode asymmetry 53 . Therefore, this work has focused on showing data sets that are new in the LDL model, such as MCDI systems, CC charging, and control systems. Future work could focus on further expanding the LDL model by, for instance, including Faradaic reactions and spatiotemporal resolution. Also, to create a linear formulation, the LDL model has integrated the main concentration dependence into the state-space constants, which complicates the simulations of processes where the variation in concentration is large. Thus, future non-linear control systems have the potential to further improve control in such processes.
Second, the existing framework for state-space models made LDL more generally applicable and more flexible than DL. The LDL model could predict the performance of MCDI systems without increasing the model complexity. Also, it could predict the performance under time-varying voltages. The model makes it possible to fit and predict device performances without making any direct measurements of device components, such as contact resistance or cell structures, which could make the model easier to implement for large systems where direct measurements could be difficult to perform. Furthermore, the model allows fitting to the initial state, thus circumventing the need for the calibration experiments to start at equilibrium. This could be valuable if it takes time and several cycles for the CDI system to reach a stable operation after the first cycle is initiated, meaning that erratic data at the beginning of the experiment could be ignored when providing data to the model for fitting. In summary, we are excited to learn that the CDI process has fundamental underlying principles that make it possible to describe a wide variety of operations and cell structures even with limited device-specific knowledge, and enables powerful system-identification and control methods.
Third, software exists that automatically fits state-space models to data. In this work, the system-identification and control systems toolboxes have been used in MATLAB. These made it possible for us to develop MATLAB programs for automatically fitting and predicting with the LDL model, as well as constructing control systems. These programs have been provided in Supplementary Note 1 to make CDI modeling more accessible to researchers.
It is hoped that the work presented here could aid in modeling and controlling complex CDI process while inspiring researchers in other areas where physical modeling is prominent to try to improve their modeling process by incorporating elements from gray-box modeling.

METHODS
The idea behind this work is that the decoupled nature of the DL model 53 makes it a good candidate for constructing a linear-state-space formulation, and such formulations are well-adapted for system-identification and control.
As a background, a compressed version of the theory behind the DL model is derived at the beginning of the methods section. The full derivation can be seen in refs [53][54][55]59,60 . Mainly, a linear-state-space formulation version of the DL model (LDL) is derived. Also, we describe how the LDL can be used in system identification for parameter fitting and control systems for achieving the desired system dynamics.

The DL model
The basis for the DL model is the Langmuir isotherm 61,62 . In the Langmuir isotherm, gas molecules adsorb on a surface, and the fractional surface coverage, θ, changes at a rate depending on the number of free sites, (1−θ), the partial pressure, p A , and the adsorption/desorption rate constants, k ads and k des (Eq. 1). The Langmuir isotherm has been used for explaining adsorption in liquids by exchanging the partial pressure with the ion concentration c 61 , and this formulation has been used to model the concentration dependence of the equilibrium adsorption in CDI [63][64][65][66][67][68][69] .
To construct a formulation that can incorporate the operational parameter in CDI electro-adsorption, several aspects should be noted. The number of sites can be interpreted as voltage-induced sites S, which depend proportionally on the applied voltage V, rather than physical sites. We will not consider electrode redox reactions, so the intended voltage operation range is in the non-Faradaic (V < 1.2) regime. Also, the fundamental mechanism during the electro-adsorption is the storage of charged species, σ, onto these sites. Thus, Eq. 1 can be simplified to Eq. 2, where subscript "ads" denote the corresponding adsorbed quantities (Eq. 2).
Ideally, the number of charged species, σ, would relate to the ion concentration, c, through the ion valency, z, as σ ads = zc ads . However, the effective number of surface sites available for ion adsorption, S′, is typically lower than the sites for charges (S′ = S−β, where β is a constant at a given initial concentration) (Eq. 3). This could be due to several reasons, but the main factor considered here is co-ion expulsion; that is, the blockage could be attributed to charge storage leading to co-ions being repelled from the surface rather than counterions being attracted. The magnitude of the blockage can be attributed to charges on the electrodes being neutralized 70 (reduces S by β 0 , a constant) and a passive presence of ions close to the pore wall before the CDI process starts (reduces S by β 1 c 0 , where β 1 is a constant and c 0 is the initial concentration), so that β = β 0 + β 1 c.
In experiments, typically the effluent concentration of the cell is measured rather than the number of ions adsorbed on the electrodes. Consider a uniform, well-stirred container (CDI cell) with a cell-free volume, v c . At any given time, all ions inside the cell are either adsorbed on the electrodes or free in the solution (total molar content at time t: v c c(t) = v c c ads (t)). Over a time period dt, the molar influx of ions (concentration c in , water volume flow Q) is Qc in dt while the outflux is Qcdt. Thus, a formula for the time-varying molar content inside the cell can be represented by Eq. 4, which can be rearranged into Eq. 5 when going to the limit of small dt. ν c c t þ dt ð Þþν c c ads t þ dt ð Þ¼ν c c t ð Þ þ ν c c ads t ð Þ þ Qc in dt À Qcdt (4) State-space formulation Linear-state-space formulations are desirable because they are relatively easy to use for system identification and control. They can be generally written as in Eqs. 6 and 7. Here x is the vector of internal states (typically c c ads σ ads ½ T in CDI), μ is the vector of system inputs (how the operator affects the system, typically [Vc in ] T , the voltage and inlet concentration), y is the vector of system outputs (what is measured, typically [cI] T , the outlet concentration and the current through the CDI device), and A, B, C, D are matrices.
Assuming the variation in the cell concentration during the experiment is not large enough to significantly change the adsorption/desorption rate, the concentration c in Eq. 3 can be exchanged for c 0 (Eq. 8).
dcads dt ¼ k ads c S À β 0 À β 1 c 0 À c ads z ð Þ À k des c ads % k ads S À β 0 ð Þ c 0 À k ads β 1 c 2 0 À Á À k ads c 0 z þ k des ð Þ c ads The parameter K a depends on one term that is proportional to the applied voltage and one that arises from unideal charge efficiency (K a ¼ maxð0; K ads V À K Λ Þ, where K ads and K Λ are constants corresponding to the terms in K a with S or β 0 and β 1 , respectively). Note that, the voltages applied to desalinate are typically such that, K a = K ads V−K Λ , can be used 53 (otherwise there would be no ion removal). In the regeneration phase, linearizing by directly removing the "max" induces a small error at 0 V discharge, unless the charge efficiency is high, which could be addressed by raising the discharge voltage during the regeneration phase while extracting the calibration data, or by fitting the model to data from the desalination phase only. Following the normal rules for matrix multiplication and the definition above, Eq. 9 summarizes the results from Eqs. 4, 7, and 2. Note that σ = zc (charge neutrality is assumed in bulk water) and that Q Q=ν c .
The expression in Eq. 9 can be rewritten in several different forms depending on the choice of input and the type of operation (elaborated in Supplementary Notes 2 and 3). For instance, it can be adapted to batchmode operations or rewritten to simplify parameter fitting when only concentration or current data is available.
Note that depending on the available data the output state can be chosen to incorporate either the effluent ion concentration or the current through the device, or both. Assuming the internal states and inputs in Eq. 9, only using the effluent ion concentration as output corresponds to y = c (C = [1000], D = [00])). We define the parameter relating the change in charge concentration to total current as, η ¼ Fzν c ¼ FzQ=Q, where F is the Faraday constant. Then, using the current passing through the cell corresponds to, ). Finally, using both the above formulations, we can simplify to y ¼ c η _ σ ads ½ T . Normalizing the model states might increase the stability of the fitting process (Eq. 10). Thus, we introduce the normalizations c ¼ c À c 0 ð Þ =c 0 , c ads ¼ c ads =c 0 , σ ads ¼ σ ads =c 0 , K ads ¼ K ads =c 0 , and K b ¼ K b . Here, c; σ ads ; and c ads are unitless, while the unit of K ads is s −1 V −1 and K b is s −1 . Notice that K b is supposed to be the same as K b because the normalized parameters come from Eq. 8 divided by c 0 . Thus, this unchanged parameter's altered notation just stresses that K b has the same unit as K ads with respect to concentration. Also, let η ¼ ηc 0 , be the factor that relates the normalized variation in the concentration of charges to the total current passing through the cell.
Because this work will move towards creating a controller, notice that a typical CDI operation will run in either continuous mode or batch mode, which prevents the controller from freely choosing the inlet concentration. This means a formulation with only V as input would be more appropriate for creating the controller. Thus, in Eq. 10, we have additionally assumed the inlet concentration to be constant (c in = c 0 ), which removes the concentration from the input matrix to make the voltage the sole system input.
Furthermore, reducing the number of unknown parameters could improve stability and reduce overfitting. Note, therefore, that if the charge efficiency is high, such as by utilizing ion-selective membranes 71 , increasing the charging voltage 39 , treating the electrodes 71 , or raising the discharge voltage 72 , this simplifies the model formulation (Eq. 11). Note also, that if a CV charging is used and only effluent concentration data are supplied to the model (data for the current through the cell is not included in the fitting), then Eq. 11 must be used instead of Eq. 10 since the contributions from the K Λ and K ads parameters become indistinguishable (not identifiable).
Regarding the output states, in Eq. 12, y 1 ¼ c thus corresponds to the normalized effluent concentration, whereas y 2 ¼ η _ c ads corresponds to the current through the cell at high charging efficiency. Note that if the cellfree volume v c is measured (and thus known), K ads and K b are the only unknown parameters. However, the model can be implemented without measuring v c separately by declaring Q as a fitting parameter in the provided program (it is assumed that the volume flow rate, Q, is known, so v c is uniquely determined by the relative volume flow rate, Q). The program will then find the value of Q that yields the best fit to data, considering both the direct dependence on Q in Eq. 11 and the indirect dependence on Q through η in Eq. 12.

Implementation
For a purely physical model, the model structure and the model parameters are either known or calculated in specific experiments. In contrast, when constructing the gray-box model, we will derive a physical structure where unknown model parameters are extracted through generalized comparisons with time-series data.
Here, The LDL model was implemented in MATLAB 73 using the systemidentification toolbox; specifically, the linear gray-box estimation (grayest) was used 6 . Such modeling requires the user to specify the input μ, the reference data y, and the A, B, C, D matrices. Based on that the unknown parameters are automatically extracted.
A control system was implemented in MATLAB using the control system toolbox 57 . The program takes as input a calibrated model, extracted with the system-identification software, that can be implemented to automatically tune a PID controller.
In Supplementary Note 1, the programs and all data used are disclosed. The program files include a "data" folder with all the data sets used in this manuscript, a "files" folder and a "main" script for running the systemidentification program, an "equation" folder with scripts describing different implementations of Eq. 8-10, and a "control" file for tuning the controller. An instruction document is also provided to facilitate learning and implementing the program for simulation.

Experimental validation
The derived model was validated solely using experimental data from reports available in the literature. These were chosen so that CDI/MCDI, continuous/batch mode, CV/CC charging were represented. In addition, data from the literature were chosen that reported multiple data sets, showing different operational modes, such that the model could be used for fitting one data set while predicting the others. Note that both the extracted data and the program used to implement the model are provided in Supplementary Note 1.

DATA AVAILABILITY
All data used can be found in Supplementary Note 1.

CODE AVAILABILITY
The code used to implement the LDL model can found in Supplementary Note 1.