Introduction

Activated sludge is the most widely used biological technology for treating domestic and industrial wastewater. After its development with 100 years of history, many novel and modified processes have been developed to meet the more and more stringent discharge and emission limits. However, most of operating systems are suffering some drawbacks, such as substantial energy consumption, excessive greenhouse gas emission and labour-intensive industry. As a powerful tool, Activated Sludge Models (ASMs) have proven to be very useful in process design, operation and optimization1,2. To date, ASMs for the simulation of biological nutrients removal processes have been updated from the first version of ASM1 to more complicated extensions, including ASM2, ASM2d and ASM33 and further to the extended ASM3s4,5,6,7,8,9,10 in order to satisfy various requirements.

However, ASMs are large and overparameterized models in terms of having many stoichiometric and kinetic parameters. Some of the model parameters as well as the model structure have to be adjusted, since microbial community structure and dominant species can vary in different wastewater treatment systems with different influent characteristics or operation schemes6,7,11,12,13,14. In addition, the collected data from full-scale plants as well as pilot- or lab- scale reactors can hardly provide reliable estimations of all the parameters simultaneously due to the well-known problem of poorly identifiable parameters15,16,17,18,19,20. Thus, the approach to properly select the subsets of parameters for model calibration plays a crucial role on simulation results and model applications21,22,23,24,25.

Until now, substantial studies have been conducted to develop effective model calibration approaches, which can be distinguished into two major categories: the conventional experience-based approaches and the systems analysis approaches22. The experience-based approaches that were proposed by WERF, BIOMATH, STOWA, CALAGUA, HSG etc., import programmatic flow based on experts' knowledge and experience22,26,27. However, these calibration protocols require specific experimental designs and data processing methods to decouple the ASMs to be small and simple sub ones. The data for model calibration and validation may be from different feeding and operation conditions and some of the parameters are fixed in order to obtain accurate estimation of the concerning ones. This approach might also ignore shifts in microbial community, eventually resulting in error estimations. Recently, the Good Modeling Practice (GMP) task group from IWA have emphasized that a standardized modeling procedure is needed to distinguish parameter subset that is identifiable with the available data for experience-based approaches28.

Systematical analysis approaches have attracted much more attention15,18,29,30,31,32, which mainly consist of parameter identification, sensitivity analysis and error propagation32. Conventional parameter subset identifiability analyzing methods are mainly based on the local sensitivity analysis21,22. Recently, global sensitivity analysis (GSA) becomes a promising method for parameter identifiability analysis33,34,35, such as the Morris screening method36, Fourier Amplitude Sensitivity Testing (FAST), Extended-FAST, the Sobol's method37,38,39,40,41 and standardized regression coefficient (SRC) method33. Structural identification through Taylor series expansion16, generating series42, similarity transformation approach43 and differential algebra approach17,44, have been proposed in order to screen out the identifiable combinations. Based on the symbolic algebra system, structural identification has been applied to ASM-type simple sub-models successfully16,17. However, structural identification based on symbolic algebra system could be time consuming and unreliable due to the large number of parameters in ASMs. Therefore, it is highly desired to develop a numerical integrated approach by including both proper global parameter sensitivity analysis method and correlation analysis method for the identifiability of the parameters of the ASMs.

The main objective of this study is to develop a novel systematical approach, namely Numerical Optimal Approaching Procedure (NOAP), for efficient calibration and validation of ASMs, taking an extended ASM3 as example. The NOAP integrates a new screening global parameter sensitivity analysis method (Derivative based Global Sensitivity Measures, DGSM) and a numerical correlation analysis method (based on the pseudo-global covariance matrix), with Genetic Algorithm (GA) as the global optimization (GO) method for parameter estimation. The validity and applicability of the approach for efficient model calibration are tested by independent experimental data from a Sequential Batch Reactor (SBR) activated sludge system45 and a Continuous Stirred-Tank Reactor (CSTR) activated sludge system46.

Results

Development of the NOAP approach

The iterative algorithm of NOAP approach, as shown in the main scheme flow in Fig. 1, mainly consists of four steps in succession.

Figure 1
figure 1

Scheme flow of the proposed NOAP (PE, parameter estimation; GO, global optimization).

Step I: Model selection and data preparation. The user should firstly choose an appropriate model and extend it to satisfy the specific requirement from the current available models in previous studies1,47. Data collected from lab-, pilot- and full-scale reactors are often used, but the amount and quality need reconciliation for the basic request of model calibration. The data collection and reconciliation can refer to the procedures suggested by Rieger et al.28.

Step II: GSA for factors fixing. Global parameter sensitivity analysis would be performed only if (i) the initial values of the state variables are set in a reasonable range based on data analysis and ii) parameter boundaries (lowest and highest) for global sensitivity analysis are proper. A fitting goodness criterion (such as A = 85%) between data and simulation results is used to judge parameter subset remained after factor fixing from DGSM analysis result. If the result is satisfied (R2 ≥ A), the parameters with lower sensitivity will be removed with trial, then repeating estimation trial until a proper small enough parameter subset is selected out. Otherwise, more non-sensitive ones should be added into the target subset one by one with the iterative parameter estimation trial. If parameters are all estimated simultaneously without a satisfied result, boundaries of the parameters or model structure might be improper and need further modifications.

Step III: For the scale reduced parameter subset, a pseudo-global correlation matrix is calculated. If the correlation coefficient of any parameter pair is high enough (>0.95 as example), then such parameter combinations are located to be highly correlated parameters due to the statistical average effect of the numerical algorithm. Afterwards, less interested members and correlation crossing ones (high correlated with not only just one parameter) can be fixed.

Step IV: The final parameter estimation is performed to check the efficiency of the procedure. Another fitting goodness criterion (such as B = 95%) is set to roughly test the quality of the observables' collected data.

Preliminary model calibration based on conventional approach

According to Kaelin et al.6, the parameter subset [μAOB, μNOB, , , , ] is firstly adopted and used to perform preliminary model calibration. Through this step, it is expected to: (1) determine whether the experimental data from the target system could be simulated by the proposed model; (2) suggest proper initial values of the state variables; and (3) verify directly whether the reported parameter subset from literature is efficient or not.

Scenario 1: SBR system

As shown in Fig. 2(a), the simulated results (Sim-Re series) have a good fitness with the experimental data in terms of the profiles of COD, nitrite and nitrate concentrations. However, an unacceptable accumulation of ammonium during anoxic period is observed in the model simulations, which might be due to improper parameter values. Moreover, the validation (Sim-Re series) illustrated in Fig. 2(b) further verifies that the simulated ammonium results are much higher than the experimental data (R2(SNH Sim-Re) = 0.5011).

Figure 2
figure 2

Calibration and validation results with the parameter subset suggested by the reference 6 and the proposed procedure NOAP for target SBR system.

(A, Calibration results with the parameter subset recommended by the reference (Sim-Re)6 and the NOAP procedure (Sim-Pr); B, Validation for the calibration results as Fig. 2(a) presented, legends are the same as Fig. 2(a); C, Validation adjusted for the calibration with the parameter subset recommended by the reference6 and the proposed procedure NOAP with the suggestions of the global parameter sensitivity and correlation analysis results (Here is added to the calibration subset), legends are the same as Fig. 2(a).)

Regarding the SBR system for nitrogen removal, the parameter subset selected from literature can partially calibrate the model to describe most of the experimental data. However, the extra efforts are needed to select other parameter subsets due to the failure of model validation.

Scenario 2: CSTR system

As shown in Fig. 3(a), the simulated (Sim-Re series) COD and ammonium concentrations in effluent fit well with the experimental data. However, the simulated nitrite concentrations are slightly higher than the experimental data (R2(Eff. SNO2 Sim-Re) = −0.3617) and the simulated nitrate concentrations are distinctly lower than the experimental data after the 30th day (R2(Eff. SNO3 Sim-Re) = −1.0290). The validation results (Sim-Re series) in Fig. 3(b) suggest that all the measured nitrogen species concentrations (including ammonium, nitrite and nitrate) deviate distinctly from the experiment data (R2(Eff. SNH Sim-Re) = −3.2151; R2(Eff. SNO2 Sim-Re) = −16.4224; R2(Eff. SNO3 Sim-Re) = −3.3828). This might be attributed to the improper parameter values of the dissolved oxygen (DO)-related kinetic coefficients, such as KH, O2, inh, KAOB, O2 and KNOB, O2. Hence, the model calibration is failed when applying the literature reported parameter subset6 into the CSTR system.

Figure 3
figure 3

Calibration and validation with the parameter subset recommended by the reference6 and the proposed procedure NOAP for target CSTR system (A, Calibration results with the parameter subset suggested by the reference6 and the NOAP procedure (Sim-Pr); B, Validation for the calibration results as Fig. 3(a) presented.)

The preliminary calibration results from SBR and CSTR systems indicate that the parameter sensitivity vary across different activated sludge systems in the application of same model. Moreover, the parameter subset reported in literature could lead to poor model calibration results if fixing the remained parameters (e.g., the DO related parameters in Scenario 2). Thus, reliable parameter sensitivity analysis should be imported for model application in different activated sludge systems. It should be noted that the parameter subset reported in literature could be still useful for qualitatively model predictions, although it may not serve as a precise and quantitative predictor to match all experiment data.

Parameter sensitivity and correlation analysis

According to the preliminary calibration and validation, it can be concluded that the extended ASM3 model could predict the data only if a proper parameter subset is selected. However, the parameter subset reported in literature6 cannot achieve satisfied calibration or validation results for both SBR and CSTR systems. Thus, numeric global parameter sensitivity and correlation analysis are performed to determine a more proper parameter subset.

In order to assess the identification of all the related parameters listed in Table S3 (SI), parameter sensitivity and correlation analysis are conducted for both SBR and CSTR systems. The proposed iterative trials (Fig. 1) are implemented in order to capture the best-fit parameter values. The results are shown in Fig. 4 and Fig. 5, respectively. It can be found that both sensitivity analysis and correlation analysis results are quite different between the SBR system and CSTR system. Furthermore, parameters indexed from 1 to 6 in Table S4 (SI) exhibit a similar low sensitivity for both SBR and CSTR systems, suggesting the composition coefficients could be fixed as default values due to their low sensitivity. However, most of the composition coefficients strongly depend on the influent wastewater characteristics. Therefore, it is recommended that the N content of biomass can be fixed as default values from references, while others (e.g. COD transformation fractions) need to be obtained experimentally.

Figure 4
figure 4

Global parameter sensitivity analysis results.

(A, Parameter sensitivity analysis for the SBR system (parameter 8, 9 and 10 reached 0.20, 0.13 and 0.46, respectively; B, Parameter sensitivity analysis results for the CSTR system; The indexes and their corresponding parameters are listed as the follows: 1- iN,SS, 2- iN,XI, 3- iN,BM, 4- fXI, , , , , , , 11-YAOB, 12-YNOB, 13- kH, 14- kSTO, 15-μH, 16-μAOB, 17-μNOB, , , 20-bAOB, 21-bNOB, , , , , 26-ηN,end, 27- KX, , , 30-KH,SS, , , , 34-KH,ALK, 35-KH,STO, , , , , 40-KN,ALK. The meanings of each parameter can be found in Table S4, SI).

Figure 5
figure 5

Global parameter correlation analysis results (color of the off-diagonal elements represents related parameters' correlation, between -1 and 1) (A, Parameter correlation analysis result matrix for the SBR system; B, Parameter correlation analysis result matrix for the CSTR system.

The indexes and their corresponding parameters are the same with Fig. 4.)

For the SBR system, parameters including, , , , YAOB, kSTO, μAOB, μNOB, and show high sensitivity, with their normalized relative influences reach 0.01 or even higher (Fig. 4(a)). Meanwhile, according to Fig. 5(a), the highly correlated parameter pairs are vs. (R2 = 0.98), YAOB vs. μNOB (R2 = 0.99), YAOB vs. (R2 = 0.99) and μAOB vs. (R2 = 0.99). In addition, the sensitivity of is much higher than and YAOB and is highly correlated with other parameters, thus, , YAOB and should be eliminated from the parameter subset. Finally, a parameter subset [, , , kSTO, μAOB, μNOB, ] has been selected for the SBR system. Similarly, the parameter subset selected for CSTR system has been determined as [, , , , , KH,NO3, , ]. The corresponding calibration and validation achieve satisfied results as shown in Fig. 2(b) and Fig. 3(b).

Optimal parameter subsets of SBR and CSTR systems for the calibration of the extended ASM3 are highly different. In the SBR system, three processes dominate the biological activities, which are storage of SS by heterotrophic organisms, growth of autotrophic organisms and denitrification via nitrite as the electron acceptor. These results are consistent with the experimental observations in the target SBR system, in which simultaneous nitrification and denitrification (SND) was observed and the nitrogen loss due to SND under aerobic condition was about 10 ~ 20 mg/L45. In contrast, for the CSTR system, the dominated biological activities include storage of SS by heterotrophic organisms, denitrification and inhibition of DO on heterotrophic denitrification and autotrophic growth. Hence, different optimal parameter subsets of SBR and CSTR systems may be mainly attributed to the different operation conditions, which lead to different dominant biological activities. From the modeling perspective, different initial values of the state variables, inputs like DO, influent COD and ammonium and the parameter boundaries would lead to different optimal parameter subsets.

Parameter estimation and model validation based on the NOAP approach

In order to compare with the results obtained through the experience approach (i.e., preliminary calibration), the parameter subsets for SBR and CSTR systems determined by NOAP are further utilized for calibration and validation in this section. All the parameter estimation results can be found in Table 1.

Table 1 Parameter estimation results of each model calibrations after global optimization using Genetic Algorithm

Scenario 1: SBR system

As illustrated in Fig. 2(a), the subset of parameters determined by the NOAP can efficiently calibrate the extended ASM3 through the GO algorithm (see Sim-Pr series). The results indicate that a satisfied calibration could be obtained by different parameter subsets under the condition of limited data availability. Compared to the results of the traditional experience-based approach's calibration results (Fig. 2(a) Sim-Re series), the accumulation of ammonium during anoxic period in the NOAP calibration (Fig. 2(a) Sim-Pr series) is less, demonstrating a better fitting ability of the proposed NOAP approach. Furthermore, the nitrate reduction rate becomes slower when the biodegradable COD concentration is insufficient for denitrification. The simulated nitrate profile deviate slightly from the trend of the experimental nitrate concentrations, which indicates the parameter subset determined by the NOAP is more sensitive than that suggested by the traditional experience-based approach.

Although the calibration is successfully achieved, the initial validation (Fig. 2(b) Sim-Pr series) is still failed (R2(SNH Sim-Pr) = 0.0997), similar with the validation by the experience-based parameter subset (Fig. 2(b) Sim-Re series). However, the proposed NOAP provides possible solutions to improve the simulation performance. After the iterative parameter subset selection procedure by NOAP, the parameter is further added into the parameter subset for re-calibration of the model. As shown in Fig. 2 (c), the performance of re-calibration for both parameter subsets (suggested by the traditional experience-based approach and the NOAP, respectively) are improved after adding the parameter in the parameter subset. In addition, the parameter subset [, , , kSTO, μAOB, μNOB, , ] determined by the NOAP can reach a better fitting goodness, as the simulated ammonium concentrations match better with the original data during the anoxic period (Fig. 2 (c)). Thus, the proposed NOAP demonstrates its ability to provide additional information for improving model calibration.

Scenario 2: CSTR system

As present as Fig. 3, the calibration results (see Sim-Pr series) with parameter subset determined by the NOAP procedure achieve a much better fitting goodness with the experimental data (as shown in Fig. 3(a)). The validation of the calibrated parameters (Fig. 3(b)) also shows a good fitting except the period from day 99 to day 101, in which a lower ammonium and higher nitrite concentrations are predicted. The reason for such phenomena might result from the use of DO as an input for simulation and the DO concentrations during the period are higher than the real concentrations. It can be concluded that parameter subset [, , , , , KH,NO3, , ] selected by the proposed NOAP could achieve a better fitting, with the same GO algorithm.

Discussion

In this work, a novel NOAP approach for the efficient calibration of activated sludge models with limited available data has been proposed. The proposed NOAP integrates a new numerical global parameter sensitivity analysis method (DGSM) for factor fixing and a numerical pseudo-global parameter correlation analysis method for non-identifiable parameter detection to determine the optimal parameter subset for model calibration. The validity and applicability of the approach for efficient model calibration is confirmed by two different activated sludge systems (SBR and CSTR systems). The model calibration results suggested that the optimal sensitive parameter subsets of the SBR and CSTR system are different despite with the same extended ASM3 model to calibrate. Even with the same biomass collected from a municipal WWTP, two SBR reactors finally result in different optimal parameter subsets due to different operational conditions. The results indicate that the parameter subsets determined by NOAP can tail with the state variation of the system. This outstands from the experience-based procedures in calibrating dynamic systems as activated sludge systems whose parameters and structure can vary gradually, which would facilitate modeling automation a lot to support more optimization applications of WWTP.

The optimal parameter subset is different and specific for various systems, because of differences in environmental conditions, influent characteristics, operation modes and biomass population. As a black-box method, conventional experience-based model calibration procedures construct a mapping relation between data and model parameter subset, in which mapping routines are based on experts' empirical knowledge. However, uncertainty would be inevitable due to the arbitrary subset selection. For example, these risks may be distinctly enlarged when modeling the dynamic SBR scenario. An efficient calibration procedure is not only simple to fit the trend of historical data by manually selecting a parameter subset, but should be competent for optimal parameter subset determination, with the aids of efficient parameter estimation algorithms. The proposed NOAP in this study could be a promising alternative to fulfill the described demands. Since the global sensitivity analysis possesses the ability to evaluate uncertainty impact of the concerning factors on the model outputs scientifically. In fact, through factors fixing, parameter estimation is performed for top uncertainty introducers, which would reduce outputs' uncertainty in maximization. Consequently, through parameter estimation of subset determined by the global sensitivity analysis, the possibility of successful model calibration and prediction can be maximized. Furthermore, parameter subset optimized by numerical global parameter correlation analysis would enhance the success of calibration and validation. Simultaneously, the proposed NOAP method can quickly capture the shifts of system states through continuous updating of the known Factors' values and unknown Factors' boundary variations. Thus, the proposed NOAP is a promising and useful tool for the efficient calibration of ASMs and could potentially apply to other ordinary differential equations models.

Currently, the proposed NOAP procedure is a decision-helping tool, rather than an automatic protocol. In fact, it is expected to develop a fully automatic calibration procedure in future. Firstly, the automatic calibration procedure should recognize the optimal parameter subset for any models and organize efficient parameter estimation automatically and robustly, through efficient numerical or symbolic algebra calculation approaches. Secondly, automatic methods are available for optimal experiments design and data collection, when uncertainty analysis of the parameter estimated is necessary in case of improving confidence intervals. Moreover, this calibration procedure can provide implementation procedure for automatic modification, selection and extension of model structure, when both model calibration and validation failed after data optimization. Most importantly, the calibrated model cannot only monitor and predict the overall process dynamics, but also facilitate the operators to achieve optimal control of a target system.

Methods

Parameter estimation, parameter sensitivity and correlation analysis

The parameter estimation is a critical step of the model calibration process. Stochastic global optimization algorithms can find the global minimum of the objective function given by equation (1)48,49, where y0 = [ COD(0) SNH4(0) SNO2(0) SNO3(0) ]T as the example. Fitting goodness criterion function for each observable is given by equation (2). Characterizations and explanations of the symbols presented in this section can be found in the Table S1 (Supporting Information, SI).

GA possesses the advantages of easy implementation and mature codes to reuse compared to other resembled technologies50,51,52. In this study, the MATLAB R2010a (Global Optimization Toolbox) is referred as the numerical function implementation of GA (The Mathworks Inc. USA).

To realize factor fixing, a Derivative based Global Sensitivity Measures (DGSM) method is introduced to perform the global sensitivity analysis53,54. By comparisons among DGSM, Morris and Sobol's method, it indicates that: a) DGSM shows much higher convergence rate and more accurate than Morris method for non-monotonic functions; b) there is a link between DGSM and Sobol' global sensitivity indices, but the computational time required for numerical evaluation of DGSM measures is many orders of magnitude lower55,56. Essentially, the DGSM method is based on the local sensitivity measure, but perform an average of the local sensitivity measure throughout the parameter space by introducing Quasi Monte Carlo sampling methods. The relative time varying sensitivity matrix is described as the following equation (3).

Average over the parameter space using Quasi Monte Carlo sampling methods, a measure can be defined as the equation (4).

The numerical computation format can be expressed as the equation (5).

To overcome the time varying character, the global sensitivity analysis indices of each parameter are defined as equation (6).

About the global Correlation Analysis, a pseudo-global correlation matrix is introduced54. The local Fisher Information Matrix (FIM) is described as equation (7).

The derivative covariance matrix is an approximation of the inverse of the FIM as equation (8).

To introduce the pseudo-global covariance matrix, the local covariance matrix needs to be averaged throughout the parameter space like DGSM done with each objective function's value as the weight as equation (9).

According to the pseudo-global covariance matrix, the correlation matrix is defined as equation (10).

Based on the equations (6) and (10), parameter sensitivity ranking order and correlation relationships would be produced systematically.

Activated sludge model and experimental data for NOAP testing

An extended ASM3 for two-step nitrification and denitrification6 is used for verifying the proposed procedure. The model inherits the basic mechanism settings of ASM3, in the frame of “Hydrolysis – Storage – Growth - Respiration”, nitrification and denitrification are extended to meet current need of description for main intermediate product – nitrite. The kinetic equations and stoichiometric matrix are presented in Table S2 and Table S3 (SI), respectively. Model structure and parameter settings are kept as the original for possibility of results comparison.

In addition, basic stoichiometric and kinetic parameters related information is presented in the Table S4 (SI), as well as parameter boundaries for GSA and parameter estimation. The validity and applicability of the approach is confirmed using experimental data obtained with two independent wastewater treatment systems, including SBR45 and CSTR46, respectively.

Experimental data of the SBR related scenario were collected from two reactors with a working volume of 14 L. Both reactors were seeded with the same inoculum from a full-scale municipal WWTP, but operated in different modes45. One was operated with the complete nitrification mode, while the other one was operated with the partial nitrification mode. The complete nitrification mode was operated in the aerobic-anoxic scheme with extensive aeration. Each cycle of the aerobic-anoxic scheme consisted of 3 min feeding, aeration, anoxic phase, 1 h settling, 6 min decanting and 1 min idling. Aeration was still provided for another 0.5 h after the ammonium has been completely oxidized to nitrite, which would offer ideal environment for the nitrite-oxidizing bacteria to oxidize the nitrite successively to nitrate completely. The data from the complete nitrification reactor are used for the preliminary model calibration because the kinetic properties of the microorganisms in the system can be properly captured by these data series. In addition, the reactor with partial nitrification mode was also operated in the aerobic-anoxic scheme, but aeration duration was controlled through a real-time control system. The data from the partial nitrification reactor with obvious nitrite accumulation are applied to validate the preliminary model calibration results45.

The lab-scale CSTR was set up to achieve partial nitritation. The reactor had an effective reaction volume of 4 L, followed by an clarifier with a working volume of 3.5 L. Sludge retention time (SRT) was kept at 12 days by wasting sludge from the secondary clarifier. Experimental data of the CSTR related scenario illustrated an obvious effect of DO on the nitrite accumulation in the CSTR system, which could be used to identify and estimate DO related switching function parameters. DO play an important role in biological nitrogen removal processes. Controlling DO at a proper level can not only reduce energy consumption, but also favor the partial nitrification for nitrogen removal via nitrite57. During the simulation in this case, experimental data from 113-day operation of the CSTR are divided into two groups. One group (0th to 75th day) is used for the preliminary calibration, while the other group (75th to 113th day) is used for the validation.