A modeling method for the development of a bioprocess to optimally produce umqombothi (a South African traditional beer)

Bioprocess development for umqombothi (a South African traditional beer), as with other traditional beer products, can be complex. As a result, beverage bioprocess development is shifting towards new systematic protocols of experimentation. Traditional optimization methods such as response surface methodology (RSM) require further comparison with a relevant machine learning system. The artificial neural network (ANN) is an effective non-linear multivariate tool in bioprocessing, with enormous generalization, prediction, and validation capabilities. Bioprocess development and optimization of umqombothi were done using RSM and ANN. The optimum condition values were 1.1 h, 29.3 °C, and 25.9 h for cooking time, fermentation temperature, and fermentation time, respectively. RSM was an effective tool for the optimization of umqombothi's bioprocessing parameters, as shown by coefficients of determination (R²) close to 1. The significant RSM parameters (alcohol content, total soluble solids (TSS), and pH) had R² values of 0.94, 0.93, and 0.99, respectively, while the significant parameters of the constructed ANN (alcohol content, TSS, and viscosity) had R² values of 0.96, 0.96, and 0.92, respectively. The correlation between experimental and predicted values suggested that both RSM and ANN were suitable bioprocess development and optimization tools.


Scientific Reports | (2021) 11:20626 | https://doi.org/10.1038/s41598-021-00097-w | www.nature.com/scientificreports/

… was used to generate 20 experimental runs. The input factors were cooking time (hr), fermentation temperature (°C), and fermentation time (hr) (Table 1). Subsequent experiments were conducted following the experimental combinations in Table 2. Samples were withdrawn after each experimental run (done in triplicate), and alcohol content (°P), TSS (g/100 g), TTA (% lactic acid), pH, and viscosity (cm/min) were determined. The Design-Expert software was also used to fit a second-order polynomial model that estimates and predicts response values over a range of input parameter values, by determining which input factors influenced the responses and the direction of that influence for the designed experiments, as depicted in Eq. (1):

$$Y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \sum_{i=1}^{n} \beta_{ii} x_i^2 + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \beta_{ij} x_i x_j + \varepsilon \tag{1}$$

where Y indicated the response variable (optimal production parameter), β₀ the intercept of the response variable, while βᵢ, βᵢᵢ, and βᵢⱼ were the linear, quadratic, and interaction coefficients corresponding to the factors xᵢ and xⱼ (i, j = 1, 2, …, n). The input variables that affected the response Y were x₁, x₂, and x₃. The random error was represented by ε.
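As a rough illustration of how a second-order model of this kind is evaluated for three factors, the sketch below sums the intercept, linear, quadratic, and two-factor interaction terms. The function name, the coefficient values, and the factor settings are hypothetical placeholders; they are not the fitted coefficients from this study, and the error term ε is omitted.

```python
from itertools import combinations


def second_order_response(x, b0, linear, quadratic, interaction):
    """Evaluate Y = b0 + sum(b_i x_i) + sum(b_ii x_i^2) + sum(b_ij x_i x_j)."""
    y = b0
    y += sum(b * xi for b, xi in zip(linear, x))
    y += sum(b * xi ** 2 for b, xi in zip(quadratic, x))
    # Each unordered factor pair (i < j) contributes one interaction term.
    for i, j in combinations(range(len(x)), 2):
        y += interaction[(i, j)] * x[i] * x[j]
    return y


# Hypothetical coefficients, for illustration only.
b0 = 1.0
linear = [0.5, -0.2, 0.1]
quadratic = [0.05, 0.0, -0.01]
interaction = {(0, 1): 0.02, (0, 2): 0.0, (1, 2): -0.03}

x = [1.0, 2.0, 3.0]  # hypothetical factor settings (x1, x2, x3)
y = second_order_response(x, b0, linear, quadratic, interaction)

# A full quadratic model in three factors has 1 + 3 + 3 + 3 = 10 coefficients.
n_coefficients = 1 + len(linear) + len(quadratic) + len(interaction)
print(round(y, 2), n_coefficients)
```

With three factors the model has ten coefficients to estimate, which is worth keeping in mind when only 20 runs are available.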
Neural network construction and fitting. Experimental data were organized and used for the development of ANN prediction models. MATLAB R2020a (MathWorks, Massachusetts, USA) was used to design a function-fitting neural network. A feed-forward neural network with two layers was used. The first layer was the input layer and the second layer was the output layer, both of which were triggered using the sigmoid activation function. Cooking time (hr), fermentation temperature (°C), and fermentation time (hr) were used as network inputs, and alcohol content (°P), TSS (g/100 g), TTA (% lactic acid), pH, and viscosity (cm/min) were each used as outputs to develop several networks and to determine the optimal network topology. Experimental data were randomly divided for training, validation, and testing: 14 instances (70%) were used for training, 3 (15%) for validation, and 3 (15%) for testing. The ANN model was then trained, validated, and tested using the Levenberg-Marquardt (LM) training algorithm. To further study the responses of the model, the Bayesian Regularization (BR) and Scaled Conjugate Gradient (SCG) training algorithms were also evaluated. The network was trained until the coefficient of correlation (R) was close to 1.
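The 70/15/15 division of the 20 runs can be sketched as a simple random partition of run indices. This is a minimal illustration in plain Python, not the routine MATLAB uses internally; the function name `split_runs` and the seed are assumptions made for the example.

```python
import random


def split_runs(n_runs=20, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly partition run indices into training, validation, and test sets."""
    indices = list(range(n_runs))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_train = round(fractions[0] * n_runs)
    n_val = round(fractions[1] * n_runs)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to testing
    return train, val, test


train, val, test = split_runs()
print(len(train), len(val), len(test))  # 14 3 3
```

With only three validation and three test instances, the reported validation and test statistics depend strongly on which runs happen to land in each subset, so repeating the split with different seeds is a reasonable sanity check.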
Total soluble solids. The total soluble solids of the finished beer were determined using a digital refractometer (Hanna Instruments (Pty) Ltd., Johannesburg, South Africa). A clean pipette was used to place 0.5-1 ml of the finished beer on the sample well. The refractive indices of the samples were then recorded accordingly.
Viscosity. The consistometer (Endecotts, London, United Kingdom) was used to determine the consistency of the finished beer (cm/min) by pouring 100 ml of the sample into the reservoir behind the gate of the consistometer. The lock release lever was released to instantaneously open the gate, allowing the liquid to flow over the instrument's graduated scale for 1 min.
Total titratable acidity. The American Association of Cereal Chemists (AACC) approved method 02-31 19 was used to determine the total titratable acidity: 10 g of the sample was dissolved in 100 ml of distilled water. The solution was mixed well and 0.5 ml of 1% phenolphthalein indicator was added. Finally, the prepared solution was titrated with standardized 0.1 N sodium hydroxide until a faint pink color was observed. Titratable acidity (% lactic acid) = volume of sodium hydroxide required (ml) / 20.
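The conversion stated above is a single division, sketched here for convenience. The function name and the example titre are hypothetical; the factor of 20 is taken directly from the relation given in the text.

```python
def titratable_acidity(naoh_volume_ml):
    """Return TTA as % lactic acid using the stated relation: volume (ml) / 20."""
    if naoh_volume_ml < 0:
        raise ValueError("titrant volume cannot be negative")
    return naoh_volume_ml / 20.0


tta = titratable_acidity(12.0)  # hypothetical titre of 12.0 ml
print(tta)  # 0.6
```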

Statistical analysis.
All experiments and analyses were conducted in triplicate. ANOVA was employed to determine the significance of the generated models. Design-Expert software version 11.0.0 (Stat-Ease Inc., Minneapolis, USA) was used to determine the response (Y) of the second-order polynomial equation, the coefficient of determination (R²), the 'predicted R-squared' and 'adjusted R-squared', the coefficient of variation (CV), and the 'probability F' value.

Statement on experimental research and field studies on plants.
We confirm that the use of plant-based cereals in our study complied with the relevant institutional, national, and international guidelines and legislation, in particular the IUCN Policy Statement on Research Involving Species at Risk of Extinction.

Results and discussion
The effects of cooking time, fermentation temperature, and fermentation time on the alcohol content, TSS, TTA, pH, and viscosity were investigated. Optimization of cooking time, fermentation temperature, and fermentation time is essential for maintaining consistent physicochemical properties, curbing undesired changes that may occur during bioprocessing, and understanding the interactions among these process variables at different conditions 1. In beer production, these are principal factors that influence the final product and its acceptance by consumers 20,21.
The effect of input factors on the physicochemical properties of the beer. Alcohol content. Samples fermented for a longer period (≥ 60 h) at a relatively higher temperature (≥ 30 °C) contained a lower alcohol content (Table 3, see experimental run numbers 1, 4, 7, 9, 15, and 20). Generally, a higher fermentation temperature affects the rate of sugar metabolism (i.e., leads to a rapid increase in alcohol content and other by-products such as volatile compounds) 21. On the contrary, in this study, a higher temperature accompanied by a longer fermentation time led to a lower alcohol content (Table 3). Given these conditions, the low alcohol content may be attributed to evaporative ethanol loss. It is not uncommon for product inhibition to occur during simultaneous saccharification and fermentation, whereby ethanol, a fermentation product, inhibits zymase over time while the products of saccharification inhibit hydrolytic enzymes 22. In addition, the synthesis of acetate and acids such as formic acid, acetic acid, and levulinic acid at concentrations above 100 mM may inhibit the bioconversion of biomass 22,23 and thus influence alcohol content.
TSS. Cooking the soured porridge for an adequate amount of time is essential for starch gelatinization and release of locked-up nutrients in yeast cells 24. The cooking time was found to influence the alcohol content, TSS, pH, and viscosity (Table 3). The proliferation of fermentative microbes is driven by the hydrolysis of cooked starch to fermentable sugars by endogenous amylolytic enzymes 25. As the endosperm protein enclosing the starch granules is softened (during gelatinization), soluble material moves from the grain into the retting water, thereby increasing the amount of TSS 26. This might explain the increasing trend in the amount of TSS with an increased cooking and fermentation time. As observed from Table 3, cooking for more than 1 h significantly increased the amount of TSS. A reverse trend was observed when the fermentation time was increased. This could be attributed to the growth patterns of microorganisms that correspond to the consumption of soluble solids over time 26. The fermentation time largely contributed to the final product's quality. The longer the fermentation was allowed to proceed, the lower the alcohol content, pH, and viscosity (Table 3). Fermentative microorganisms need sufficient time to adjust to environmental changes for optimal utilization of the substrate for building cellular components (RNA, enzymes, and metabolites) 27. As cells complete the cell cycle, they enter the exponential growth phase, where they are the healthiest and most uniform, rapidly driving alcoholic fermentation.
pH and TTA. The TTA and pH ranged between 0.50-1.54% lactic acid and 2.81-4.60, respectively (Table 3). Generally, umqombothi and other African traditional beers have a pH range of 3 to 4.2 and a lactic acid level of 0.26%, depending on how the beer is brewed 4,24. Changes in TTA may be a better measure of the success rate of the fermentation process than changes in pH 26.
A biochemical relationship between alcohol content, TTA, and pH, whereby a lower pH was associated with a higher TTA and alcohol content, was observed in this study (Table 3). According to ref. 25, as the microorganisms carry out alcoholic fermentation, a decrease in the TSS and pH is usually observed. Beers with decreased pH values, such as umqombothi (Table 3), have a longer shelf-life, better safety and quality, superior facilitation of microbial growth, and a higher concentration of antimicrobial agents 28. The low pH and elevated acidity in these beers aid in the elimination of certain pathogenic microorganisms that could pose safety threats 29,30.

Viscosity.
Cooking time had a direct influence on the final beer's viscosity. This is because cooking increases the availability of starch, which imparts viscidness to food and describes the clarity of the finished beer 31.
In addition, residual starch from incomplete hydrolysis into sugar contributes to a beer's viscosity 25. As TTA increases and the pH is lowered, the joint action of malt α- and β-amylases is reduced, thereby reducing the beer's viscosity and giving body to the final beer 25. An increase in the α-amylase Hitempase 2XL decreased the viscosity of beer produced from malted buckwheat 32. In western beers, filtration of the beer may be difficult due to high viscosity, thus leading to starch hazes in the final product 32, while in traditional beers such as umqombothi, filtering the beer may result in the loss of important fiber-imparting solids, giving the beer a higher viscosity 4,33,34.
Multi-response optimization of process parameters. ANOVA and Fisher's F-values were used to examine the fit of the generated RSM models. Model adequacy was determined by the coefficient of determination (R²) and lack-of-fit tests 1,20. For each response, R² described the percentage contribution of the process variables (i.e., the amount of variation around the mean explained by the model). For high-confidence prediction purposes, a usable model demands a percentage contribution of 88% (R² > 0.88) 35. The probability of significance was represented by p-values, with a high p-value indicating an inadequate model due to a significant lack of fit 36. The models for alcohol content, TSS, and pH all had p-values of 0.00, indicating that the lack of fit was insignificant at a 100% confidence level. Polynomial equations together with 3D response surface plots were used to describe the mathematical solutions of the models. Polynomial equations for alcohol content, TSS, TTA, pH, and viscosity are shown in Eqs. (2), (3), (4), (5), and (6), respectively.
[Fig. 2a,b]
Regression equations from the fitted models were used to generate 3D plots. The models for optimizing the alcohol content (°P), TSS (g/100 g), and pH of the beer were found to be significant, as implied by high model F-values (F ≥ 10) and low p-values (p ≤ 0.05) (Table 4). For the alcohol content and TSS models, X₃, X₁X₂, and X₁² were significant model terms (p ≤ 0.05) (Table 4). Significant model terms for pH were X₃, X₁X₃, and X₃², with p-values of 0.00, 0.047, and 0.00, respectively (Table 4). The predicted coefficient of determination (pred-R²) values for alcohol content and TSS were not close to the adjusted values (adj-R²), indicating a slight limitation of these models (Table 5). A consideration of outliers, model reduction, and response transformation may improve the empirical model 37. In contrast, the pred-R² of 0.89 in the pH optimization model was reasonably close to the adj-R² of 0.97, thus confirming the model's accuracy in correctly predicting responses (Table 5). Adequate precision values above 4 indicated an adequate signal-to-noise ratio. This means the optimization models for alcohol content, TSS, and pH were suitable to navigate the design space, and all of the models' parameters showed that the developed models were able to predict the responses correctly. The optimization models for alcohol content and TSS had reproducibility above 90% (R² ≥ 0.90) and low coefficient of variation (C.V. %) values (Table 5), indicating good precision for the capability of the process under evaluation. The models for optimizing TTA and viscosity were insignificant, as implied by low model F-values (F ≤ 10) and high p-values (p > 0.05) (Fig. 2a,b). Here, model reduction, consideration of outliers, and response transformation will not improve the model.
None of the TTA optimization model terms were significant, while X₁ and X₁² were significant model terms (p ≤ 0.05) for viscosity. The limitations of both models were reflected in large differences between the predicted and adjusted coefficients of determination. The TTA model had a pred-R² of -3.34 and an adj-R² of -0.09. Similarly, the model for viscosity had a pred-R² of -1.05 and an adj-R² of 0.49. A negative pred-R² implies that the overall mean may be a better predictor of the designed response than the current model 38. A higher-order model may also predict better in certain cases. An adequate precision value of 2.99 in the TTA model indicated an undesirable signal-to-noise ratio. This means the model was not suitable to navigate the design space. The viscosity optimization model had an adequate precision above 4, meaning the model was suitable for navigating the design space. The low reproducibility of 42% (Table 5) for the TTA optimization model was indicated by a low coefficient of determination (R² = 0.424). In contrast, the coefficient of determination for viscosity was 0.729, representing a 73% reproducibility. Although this reproducibility can be considered adequate, a C.V. % value of 15.22 may be alarming (Table 5). From the obtained experimental data, second-order polynomial equations showing the significance of linear, quadratic, and interactive terms in predicting the response were generated and are shown in Eqs. (2)–(6), where Y₁ = response for alcohol content (°P), Y₂ = response for TSS (g/100 g), Y₃ = response for TTA (% lactic acid), Y₄ = response for pH, Y₅ = response for viscosity (cm/min), X₁ = cooking time (hr), X₂ = fermentation temperature (°C), and X₃ = fermentation time (hr).
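The relationship between R², adjusted R², and predicted R² discussed above can be checked with a few lines of arithmetic. The sketch below uses the standard definitions and assumes a full quadratic model with 9 terms besides the intercept (3 linear, 3 interaction, 3 squared) fitted to the 20 runs; that term count is an assumption, since the fitted equations are not reproduced here. The function names are illustrative.

```python
def adjusted_r2(r2, n_runs, n_terms):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_runs - 1) / (n_runs - n_terms - 1)


def predicted_r2(press, ss_total):
    """Predicted R^2 = 1 - PRESS / SS_total; negative whenever PRESS > SS_total."""
    return 1.0 - press / ss_total


N_RUNS = 20   # number of experimental runs stated in the text
N_TERMS = 9   # assumption: full quadratic model in three factors

adj_tta = adjusted_r2(0.424, N_RUNS, N_TERMS)         # TTA model, R^2 = 0.424
adj_viscosity = adjusted_r2(0.729, N_RUNS, N_TERMS)   # viscosity model, R^2 = 0.729
print(round(adj_tta, 2), round(adj_viscosity, 2))     # -0.09 0.49
```

Under these assumptions the computed values agree with the reported adj-R² of -0.09 for TTA and 0.49 for viscosity. The second function shows why a negative pred-R² points to the mean as the better predictor: it occurs exactly when the leave-one-out prediction error (PRESS) exceeds the total variation around the mean.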
The effect of input factors on the physicochemical properties of the optimal beer brew. The independent variables cooking time (hr), coded as X₁, fermentation temperature (°C), coded as X₂, and fermentation time (hr), coded as X₃, were optimized. The optimization goal for all independent variables was set to 'target', as dictated by the nature of the study. The responses alcohol content (°P), TSS (g/100 g), TTA (% lactic acid), pH, and viscosity (cm/min) were considered for optimization. The software generated 100 optimization solutions, each with a desirability value of 1. To select a suitable solution, the prediction values of each solution were compared to the prediction values of the constructed ANN. Yeast survival and proliferation, under- and over-cooking, shelf-life-associated spoilage, and the applicability of the conditions in real life (study objectives) were also considered. Taking these variables into account, a solution that favored these considerations was selected. A cooking time of 1.1 h, a fermentation temperature of 29.3 °C, and a fermentation time of 25.9 h were the optimal bioprocessing conditions. The parameters (alcohol content, TSS, TTA, pH, and viscosity) were subsequently investigated and the results are provided in Table 6. The customary brew (CB) was prepared by cooking the mixed ingredients for 30 min and leaving the cooked slurry to ferment at 25 °C for 24 h. The CB was then compared with the optimized brew (OPB).
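A desirability value of 1 means that every optimized quantity sits exactly on its goal. The sketch below assumes the common Derringer-Suich form of a 'target' desirability function and combines individual values by their geometric mean; the limits, targets, and predicted values are hypothetical, and the function names are illustrative rather than part of any software API.

```python
def target_desirability(y, low, target, high, weight_low=1.0, weight_high=1.0):
    """Desirability for a 'target' goal: 1 at the target, 0 at or beyond the limits."""
    if not low < target < high:
        raise ValueError("expected low < target < high")
    if y < low or y > high:
        return 0.0
    if y <= target:
        return ((y - low) / (target - low)) ** weight_low
    return ((high - y) / (high - target)) ** weight_high


def overall_desirability(values):
    """Geometric mean of the individual desirabilities."""
    if not values:
        raise ValueError("at least one desirability value is required")
    product = 1.0
    for d in values:
        product *= d
    return product ** (1.0 / len(values))


# Hypothetical limits and predictions, for illustration only.
d_ph = target_desirability(3.3, low=3.0, target=3.3, high=4.2)
d_alcohol = target_desirability(13.0, low=10.0, target=13.0, high=15.0)
overall = overall_desirability([d_ph, d_alcohol])
print(overall)  # 1.0
```

Because the geometric mean is 1 only when every individual desirability is 1, many different factor settings can tie at the maximum, which is consistent with the need to apply additional selection criteria among the 100 solutions.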
The OPB was found to have a lower pH (3.27 ± 0.03) than the CB (4.23 ± 0.02) (Table 6). As a result, the OPB had a higher alcohol content (13.63 ± 0.12 °P) and a higher TTA (0.68 ± 0.02% lactic acid). In preparing high-quality umqombothi, a 60 min cooking time has been suggested to be ideal 39. A cooking time of 1.1 h did not under- or over-gelatinize the starch and provided adequate nutrients to yeast cells 24. In addition, the achieved gelatinization improved water absorption into the granules, thereby improving the viscosity 40. This was reflected in the viscosity obtained for the OPB, which had a more desirable viscosity value than the CB (Table 6). A fermentation temperature of 29.3 °C was optimal for higher production of alcohol in the OPB (Table 6). A higher TSS in the OPB (Table 6) described the type of sugar conversion and its dependence on temperature for a rich, finished beer 41. The slightly higher fermentation temperature and a relatively short fermentation time in …
Table 6. Physicochemical properties of umqombothi. CB customary brew, OPB optimized brew. Each value is a mean ± standard deviation of triplicates. Means with no common letters within a row significantly differ (p < 0.05).
ANN training, validation, and testing on experimental responses. An appropriate ANN construction involves the selection of network architecture, determination of hidden layers and number of neurons in each layer, learning-training-validation, and verification of the data 18. In building a better ANN model, the number of hidden layers between inputs and output must be appropriately trained and fitted 18. To achieve this, the number of neurons in the hidden layer was varied (i.e., 5, 10, and 20 neurons) (data not reported). To further study the responses of the model, three different training algorithms were evaluated.
When 10 neurons in the hidden layer were used, all the algorithms rapidly generated solutions with high R and R² values (data not reported). However, when the neurons were increased to 20, the number of iterations increased in the BR algorithm, which therefore took longer to generate a solution. In contrast, both the LM and SCG algorithms were not significantly affected by an increase or decrease in the number of neurons and continued to generate solutions rapidly. The SCG uses a second-order approximation, resulting in fewer iterations and faster learning 42. This may be due to the algorithm using a step-size scaling mechanism that avoids a time-consuming line search per learning iteration 43,44. Adequate training, validation, testing, and overall prediction accuracy were observed when the LM algorithm was used (Table 7). The LM algorithm, which may be the fastest of the three training algorithms, specifically works with loss functions presented in the form of a sum of squared errors (SSE) 45,46. Unfortunately, LM cannot be applied to the cross-entropy error and the root mean squared error functions 46. For function approximation problems, the LM training algorithm was able to obtain a lower MSE than all other algorithms among regularization techniques. As a result, the LM is the recommended choice, with better performance in terms of rapidity and the overfitting problem, when there are a few thousand instances and a few hundred parameters for training the ANN 46,47. In an unrelated study, the LM training algorithm was found to show the highest accuracy in comparison to different training algorithms in an MLP model that forecasted the distribution of chemical elements in the topsoil 45.

The ANN training using the LM algorithm stopped automatically when generalization stopped, indicated by an increase in the MSE of the validation samples. In measuring performance indices of the ANN, the MSE is the most used and simplest error function 48,49 . The MSE measures the ability of the model to predict responses accurately, with a lower MSE showing a higher modeling ability 18 . In combination, R 2 and MSE evaluated the overall accuracy of the model 18 . The coefficient of correlation (R) was used to measure the correlation between inputs and targets. R = 1 described a close relationship, and R = 0 described a random relationship. ANN models for alcohol content, TSS, TTA, and viscosity had overall R 2 values of 0.96, 0.96, 0.81, and 0.92, respectively (Table 7). These values were closer to 1, suggesting high reliability in model prediction accuracy. The overall R 2 value for pH was 0.50 representing a 50% reproducibility. Overall, a high correlation between inputs and targets was observed for alcohol content (0.98), TSS (0.98), TTA (0.90), and viscosity (0.96) (Fig. 3).
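The two performance indices named above can be computed directly, and the reported R and R² values can be cross-checked against each other. The sketch below is a plain-Python illustration; the helper names are assumptions, while the four R values are those reported in the text.

```python
import math


def mse(targets, predictions):
    """Mean squared error between experimental and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def pearson_r(targets, predictions):
    """Pearson correlation coefficient R between two equal-length sequences."""
    mean_t = sum(targets) / len(targets)
    mean_p = sum(predictions) / len(predictions)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(targets, predictions))
    ss_t = sum((t - mean_t) ** 2 for t in targets)
    ss_p = sum((p - mean_p) ** 2 for p in predictions)
    return cov / math.sqrt(ss_t * ss_p)


# Overall R values reported in the text (Fig. 3).
reported_r = {"alcohol": 0.98, "TSS": 0.98, "TTA": 0.90, "viscosity": 0.96}
r_squared = {name: round(r ** 2, 2) for name, r in reported_r.items()}
print(r_squared)
```

Squaring the reported R values gives 0.96, 0.96, 0.81, and 0.92, which matches the overall R² values quoted for the same four networks.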
Apart from MSE values, the ANN was further assessed using performance curves. Performance curves display the network's incremental training process and the direction in which it learns. These curves plot training record error values against the number of training epochs. The learning curve is therefore a plot describing a model's learning performance over time or experience 50. Performance curves are useful in diagnosing problems with learning, such as unrepresentative training datasets, underfitting models, unrepresentative validation datasets, and overfitting models 50. The ANN best validation performance curves for the responses are shown in Fig. 4. The ANN achieved the best learning and the lowest error after a few iterations (epochs). The best validation performance for each network was taken from the epoch with the lowest validation error. Alcohol and TTA required the fewest iterations before achieving the best validation performance. In contrast, TSS achieved its best validation performance at epoch 5. After more epochs of training, the error is generally reduced but may start to increase on the validation dataset as overfitting of the training data occurs 51. All the networks showed a good learning rate for the training stage and a high learning rate for the validation and testing stages 52. In addition, both the training and validation showed a good fit, displayed by training and validation MSE (loss) values that decreased to a point of stability with relatively nominal gaps between the two final MSE (or loss) values 50. Overall, better learning is described by error scores closer to 0, indicating that the training dataset was learned thoroughly and minimal mistakes were made 50.
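The stopping behaviour described above, where training halts once the validation error stops improving and the best epoch is the one with the lowest validation error, can be sketched as follows. The validation errors and the `patience` value are hypothetical and chosen only to illustrate the logic; they are not taken from Fig. 4.

```python
def best_validation_epoch(val_errors, patience=6):
    """Return (best_epoch, stop_epoch) for a sequence of validation errors.

    Training is considered stopped once the validation error has failed to
    improve on its best value for `patience` consecutive epochs.
    """
    if not val_errors:
        raise ValueError("val_errors must not be empty")
    best_epoch, best_error, fails = 0, float("inf"), 0
    for epoch, error in enumerate(val_errors):
        if error < best_error:
            best_epoch, best_error, fails = epoch, error, 0
        else:
            fails += 1
            if fails >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_errors) - 1  # never triggered: ran to the end


# Hypothetical validation MSE per epoch (epoch numbering starts at 0).
errors = [0.9, 0.5, 0.3, 0.2, 0.15, 0.1, 0.12, 0.13, 0.15, 0.2, 0.25, 0.3, 0.4]
best, stopped = best_validation_epoch(errors)
print(best, stopped)  # 5 11
```

In this made-up series the lowest validation error occurs at epoch 5, and training stops six epochs later; the weights from the best epoch, not the last one, would be the ones retained.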
Comparison between the RSM and ANN responses. The optimization prediction model developed by RSM was assessed by comparing its prediction accuracy with that of the ANN, which was also used to validate the entire process. Table 8 shows the error comparison obtained from both RSM and ANN predictions. The comparative error analysis was used to verify the prediction accuracy and generalization capacity of both models in optimizing the bioprocess 53,54. Overall, the ANN model showed lower error values than the RSM, indicating lower computational deviations and an advanced generalization capability 11,54. As a result, ANN displayed a higher prediction accuracy and better model fitting 18. On the other hand, RSM prediction values can be accepted with a higher degree of confidence since they are closer to experimental values and ANN prediction values 18,55. The results from Table 8 show a close correlation between the experimental values and the values predicted by RSM and ANN. Both RSM and ANN models showed a relatively high number of inexact predictions for viscosity.
The difference between predicted and experimental values directly determined the extent of deviation in the predictive capacity of each model. While RSM is recommended for modelling new processes, its sensitivity may be limited 55. Despite this limitation, RSM has an obvious way of showing the effect of individual elements and their interactions on a specific system 11. For example, the effect on a specific parameter is shown by a higher coefficient value in ANOVA 57. On the other hand, a higher number of inputs is required for ANN than for RSM to obtain better predictions 55. ANN cannot give such insights into the system directly since it is a 'black box' 56. Nonetheless, ANN can universally describe high-level interactions in non-linear systems without prior specification of a suitable fitting function 55,57. Additionally, ANN can calculate multiple responses in a single process 53. As depicted by the close agreement between the experimental and predicted values, RSM and ANN are adequate for developing a bioprocess that optimally produces umqombothi. Advanced soft computing approaches like ANN may be preferred for data sets with a limited number of observations, which regression models fail to capture reliably 18. The closeness of the experimental and predicted values suggests that the non-linear fitting effects of the model are good, recommending the use of the proposed procedure 18,57. A coupled modeling approach can thus be applied in bioprocess method development for umqombothi and related variations.

Conclusion
Both RSM and ANN were effective bioprocess development tools that facilitated the optimization of umqombothi. The effectiveness of RSM was shown by R² values close to 1. The R² values were 0.94, 0.93, 0.99, and 0.73 for alcohol, TSS, pH, and viscosity, respectively, showing reliability and reproducibility above 70%. Similarly, ANN displayed a high degree of accuracy. The constructed ANN models for alcohol, TSS, TTA, and viscosity had R² values of 0.96, 0.96, 0.81, and 0.92, respectively. As a result, a good correlation between the experimental and predicted values suggests that a coupled approach may positively impact the bioprocess and the final product. However, further investigation of other key parameters (i.e., starter culture, the content and ratio of raw materials, souring time and temperature, and cooking temperature) is still required. Furthermore, the use of an additional tool such as a genetic algorithm may resolve computational and modeling limitations.

Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.