Introduction

Nowadays, Peptides with their bioactive amino acids play a functional role at many pharmaceutecal and nutriceutical industries. In this trend, we need intelligent, accurate and fast bioanalytical measurements to assess the analytes of interest with variable concentrations in variable conditions. As we should promote specific production, enhance quality control processes and show food metabolic studies more clearly. Unfortunately, these industries face great technical problems with the bioactive amino acids production, the major problem is amino acid analysis in foodstuffs as they are destructed during acid hydrolysis in the preparation step, this problem can be greatest with the essential amino acids likely to be limiting in functional diets “methionine and cystine”, Lysine, threonine, and tryptophan. All amino acids have already been commercialized as nutraceuticals1. So, there is an urgent need to apply intelligent algorithms for not only detection but also the characterization of novel bioactive peptides in the protein2. Peptides from Fish protein hydrolysates differ so widely in their composition that “Lab” analytical methods would need to be more specific for each type, but these methods are time-consuming. Thus, compromises between the Lab and computerized analytical methods are often necessary, especially to promote the best utilization of great functional and nutritional benefits in protein3.

Fish proteins have variable but functional and biological applications4. They are the source of secretagogues, calciotropic hormones and growth factors5. Their bioactive amino acids provide important functional and biological roles such as antihypertensive, antioxidant and immune modulatory activities. They perform the regulation of the blood pressure through inhibition angiotensin converting enzyme activity. As well as the antihypertensive role, they perform antioxidant roles through scavenging activity that prevent oxidation process6. They also enhance the capacity of lymphocyte proliferation, percent of T-helper cells in spleen and secretion of interferon plus cytokines. So, they have a great role in clinical diet formations which used in specific diseases. In allergic patients, enzymatic protein hydrolysates and a mixture of specific amino acids have a great importance to decrease immune-mediated hypersensitive reactions7. Not only allergic patients but also patients with cancer and hepatic encephalopathies as they suffer from disorders in metabolism8.

Many studies have found that the functional properties of amino acids are related to the concentration in the diet and to the source of amino acids. As an example, fish-derived bioactive peptides are more functionally active than other sources9. In this paper, we estimated the concentration of bioactive amino acids in fish by-product protein hydrolysates with studying the effect of variable environmental temperature over the year. We aim to study the dynamic properties of functional amino acids as it has a great importance as it detects the functional quality of the extracted protein hydrolysates at different times10. This also has a vital role in pharmaceutical dynamic properties. So, we can target the produced protein hydrolysates to certain drugs based on amino acids concentration levels. As an example, Patients with the liver disease show a plasma amino acid imbalance with high levels of tyrosine and phenylalanine and low level of valine leucine and isoleucine11. Therefore, the new analytical algorithms help to choose the specific amino acids which has an essential role in the treatment of patients with chronic liver diseases as an example. The optimum supply of amino acids is also necessary to enhance hepatic regeneration and immunologic host defense12 as well as normalization of plasma amino acid profile13. Finally, The previous functional and bioactive properties struggle the challenge of many changes in the environmental conditions and variation in the temperature, so the optimal exploitation of bioactive amino acids for human nutrition and health possesses an exciting scientific and technological challenge while at the same time offering potential for commercially successful applications.

In this paper, we proposed an a new prediction approach based on adaptive neuro-fuzzy inference system (ANFIS)14, 15 to improve the performance of predicting the amino acids concentration in fish. However, determining the optimal values for the parameters of the memberships function and weights between layers of ANFIS model is the main problem in ANFIS. The gradient descent approaches are the popular algorithms that used to learn the parameters of ANFIS. However, the gradient is computed at each iteration and it can be stuck with local point and therefore not a global solution can be determined16. To solve these drawbacks, the meta-heuristics like genetic algorithms (GAs)17 and particle swarm optimization (PSO)18, 19 are used. However, GAs are slow convergence speed, whereas PSO is sensitive to neighborhood topology. So, the whale Optimizer (WO) algorithm is used to solve this problem20.WO is a new metaheuristic inspired that emulates the humpback whales20. In WO, there are three steps are used to mimic the hunting behavior: tracking, encircling and attacking the prey.

The main goal of this paper is to analyze the amino acid dynamics at variable temperature values with improving the performance of intelligent system (ANFIS based WO algorithm) to obtain the highest predictive importance.

Adaptive Neuro-Fuzzy Inference System (ANFIS)

The adaptive neuro-fuzzy inference system (ANFIS) is a hybrid of both neural network (NN) and fuzzy logic14, 15, 21. The structure of ANFIS is illustrated in Fig. 1, in which the ANFIS consists of five layers. The input data (x and y) are presented to each node in the first layer and the output is computed by using the generalized Gaussian membership function \(\mu (x)\) as:

$${O}_{1i}={\mu }_{{A}_{i}}(x),i=1,2,{O}_{1i}={\mu }_{{B}_{i-2}}(y),i=3,4,\mu (x)={e}^{-{((x-{\rho }_{i})/{\sigma }_{i})}^{2}}$$
(1)

where \({A}_{i},{B}_{i}\) are the membership values of the \({\mu }_{A}\) and \({\mu }_{B}\), respectively. \({\rho }_{i}\) and \({\sigma }_{i}\) are represent the mean and standard deviation of data respectively. The output of each node in the first layer is passed to the second layer and the firing strength of a rule (\({w}_{i}\)) is computed as:

$${w}_{2i}={\mu }_{{A}_{i}}(x)\times {\mu }_{{B}_{i-2}}(y)$$
(2)
Figure 1
figure 1

The five layers of ANFIS model.

Then in third layer the normalized firing strength (\({\bar{w}}_{i}\)) is computed for each node as:

$${O}_{3i}={\bar{w}}_{i}={w}_{i}/(\sum _{i\mathrm{=1}}^{2}{w}_{i}),$$
(3)

The normalized firing strength and the function \({f}_{i}\) is passed to each node in the fourth layer (an adaptive node) and its output is computed as:

$${O}_{4i}={\bar{w}}_{i}{f}_{i}={\bar{w}}_{i}({p}_{i}x+{q}_{i}y+{r}_{i})$$
(4)

where \({p}_{i},{q}_{i}\) and \({r}_{i}\) is the consequent parameters of the node. In the last layer, there is a single node and it is output is defined as.

$${O}_{5}=\sum _{i}{\bar{w}}_{i}{f}_{i}$$
(5)

The ANFIS parameters are divided into two sets, the consequent and premise parameters. All of these parameters are needed to update in learning process until the target is achieved. There are some approaches used to learn the ANFIS parameters such as the Least Square Method (LSM) is used to find the optimal values for both sets of the parameter. However, its convergence is slow and the hybrid algorithm that combines the LSM and the backpropagation (BP) algorithm is used to solve this problem17. This algorithm is susceptible to get stuck at local optima. To overcome this drawback, this paper introduces a new evolutionary technique, namely, Whale algorithm as in the following section.

The Whale Optimization Algorithm

The whale optimization (WO) algorithm is a new swarm technique that emulates the humpback whales20. In WO algorithm, the search starts by generating a random population of whales (solutions). These whales attacking (optimization) their prey (\({\overrightarrow{X}}^{\ast }\)) in either Encircling or Bubble-net method after determining the location of the prey.

In the encircling method20: The position of humpback whales are updated according to the best position as20:

$$\overrightarrow{D}=|\overrightarrow{C}\odot \overrightarrow{X}\ast (t)-\overrightarrow{X}(t)|$$
(6)
$$\overrightarrow{X}(t+\mathrm{1)}=|\overrightarrow{X}\ast (t)-\overrightarrow{A}\odot \overrightarrow{D}|$$
(7)

where \(\overrightarrow{D}\) is the distance between the position of the prey (\(\overrightarrow{X}{(t)}^{\ast }\)) and the and other whales (\(\overrightarrow{X}(t)\)) at the current iteration number \(t\). The two coefficient \(\overrightarrow{A}\) and \(\overrightarrow{C}\), and are calculated as follows:

$$\overrightarrow{A}=2\overrightarrow{a}\odot \overrightarrow{r}-\overrightarrow{a},\,\,\,\overrightarrow{C}=2\overrightarrow{r}$$
(8)

where \(r\in \mathrm{[0},\mathrm{1]}\) is random number, and the parameter \(\overrightarrow{a}\) is decreased linearly from 2 to 0 as the iteration increased.

There are two approaches to simulate the bubble-net behavior. The first approach is the shrinking encircling that achieved by using equation (8), also, \(\overrightarrow{A}\) is decreased. The second approach is the spiral updating position: This method is used to simulate the helix-shaped movement of humpback whales around prey:

$$\overrightarrow{X}(t+\mathrm{1)=}\overrightarrow{D}\text{'}\odot {e}^{bl}\odot cos\mathrm{(2}\pi l)+{\overrightarrow{X}}^{\ast }(t)$$
(9)

where \(\overrightarrow{D}\text{'}=|{\overrightarrow{X}}^{\ast }(t)-\overrightarrow{X}(t)|\) is the distance between the whale and prey, \(b\) is a constant for defining the shape of the logarithmic spiral, \(l\) is a random number in [−1, 1], and \(\odot \) is an element-by-element multiplication. The humpback whales can simultaneously swim around the prey through a shrinking circle and along a spiral-shaped path20.

$$\overrightarrow{X}(t+\mathrm{1)}=\{\begin{array}{cc}{\overrightarrow{X}}^{\ast }(t)-\overrightarrow{A}\odot \overrightarrow{D} & if\,\,p\ge 0.5\\ \overrightarrow{D}^{\prime} \odot {e}^{bl}\odot cos\mathrm{(2}\pi l)+{\overrightarrow{X}}^{\ast }(t) & if\,\,p < 0.5\end{array}$$
(10)

where a random probability \(p\in \mathrm{[0},\mathrm{1]}\) is used to switch between the spiral model or the shrinking encircling mechanism to improve the position of whales.

In exploration phase, the whales search about the prey in a random from. The position of a whale is updated by selecting a random whale rather than \({X}^{\ast }\) as follows:

$$\overrightarrow{D}=|\overrightarrow{C}\odot {\overrightarrow{X}}_{rand}-\overrightarrow{X}(t)|$$
(11)
$$\overrightarrow{X}(t+\mathrm{1)}=|{\overrightarrow{X}}_{rand}-\overrightarrow{A}\odot \overrightarrow{D}|$$
(12)

where \({\overrightarrow{X}}_{rand}\) is a random whale’s position selected from the population.

The proposed prediction Algorithm

In this section, the proposed algorithm for predicting the bioactive amino acids concentration in fish. This algorithm is the ANFIS based on WO (called ANFIS-WO), where this approach consists of five layers. The inputs variables to the first layer are (Moisture, fat, ash, Crude protein, and Temperature) and the output of layer 5 is the amino acids concentrations.

The proposed algorithm starts by normalizing dataset then the fuzzy c mean (FCM) is used to determine the number of membership functions. The next step is to construct the ANFIS based on the number of membership function. The parameters in ANFIS are updated based on WO algorithm, that used square euclidian distance as a fitness function is defined as:

$$fitness\,function={\Vert out-pred\Vert }^{2}$$
(13)

where the WO algorithm is started by generating a population with a random position for each whale that represents the parameters of ANFIS. Then the fitness function for all population is computed and the global objective function is determined. The value of \(a\) is decreased from 2 to 0 and for each whale in the population the \(A\) and \(C\) are computed based on equations (8) and (??) respectively. Then the position of current whale is updated based on the value of \(p\), where if \(p\,\mathrm{ > 0.5}\) then the current position becomes the best position otherwise the position is based on either equations (67) or equations (1112) based on if \(|A| < \,0.5\) or \(|A|\ge 0.5\) respectively. The WO still update the position until the stop condition is satisfied, the best solution is passed to ANFIS.

The training phase is finished, if the stop conditions (maximum number of iteration and error less than small value) are satisfied. In the predicting phase, the test data set in introduced to the ANFIS that predict the output and the performance of the output is evaluated. The proposed algorithm is illustrated in Fig. 2.

Figure 2
figure 2

Flowchart of Proposed model.

Experimental Results and Discussion

In the experiments, the data is divided into training and testing sets by using two methods, in the first method, the data is split randomly into 70% samples for the training set and the rest 30% as a testing set. However, the random division may be not accurate and can cause bias in the results of prediction, so, in order to avoid this limitation there are four strategies can be used as a second method. For example. the N-fold cross-validation test, sub-sampling test, independent dataset test and jackknife cross-validation test, in which these strategies have been widely used to examine the performance of a prediction model22,23,24,25,26. In this study, the N fold cross-validation test, (here 10 fold) was used to investigate the performance of the prediction model.

The ANFIS-WO algorithm was compared with five models, namely, ANFIS-PSO, ANFIS-GA, ANFIS, IBK, SMO, and SVM. The experiments were implemented in Matlab R2014b and Windows 10 (64-bit). The parameters are set as size of population is n = 25, the max iteration is \(100\).

Dataset collection

By-products of 120 fresh farmed tilapia (oreochromus niloticus) were collected every month over a year at Kafrelsheikh Governorate, Egypt (One of the most important areas in the production of tilapia in the world). We collected fish byproduct under measured parameters (weight, Sex, length, water quality, ration). Then, they were minced and stored at −30 °C till use. The following steps shows the Enzymatic hydrolysis reaction process for preparing the data samples.

  • Thawing the stored by product over night in cold place(4 °C).

  • 15% of the samples volume mixed with 50 ml phosphate buffer saline (pH 7.5).

  • Pre-incubation at 60 °C for 20 minutes.

  • Adding alcalase enzyme (2.5%)to initiate the enzymatic hydrolysis reaction.

  • Heating in water bath (90 °C) for 15 minutes.

  • Cooling in ice.

  • Centrifuge the cooling mixture for twenty minutes at 10000 rpm then the hydrolysis degree was measured to the supernatant according to27.

  • Supernatant extraction then freeze dried and characterized.

Preparation Phase

Analysis of Tilapia fish by-product and its hydrolysates powder

The contents of tilapia fish by-product and its hydrolysates were measured according to AOAC method28, The protein content was determined using kjeldal method. Moisture percentage was estimated with drying method. In addition, Ash content was measured by muffle furnace. Within our study, the protein hydrolysates were extracted with alcalase enzyme with appropriate PH and temperature29.

Amino acid sequence analysis

According to30, stacking and separating gel were prepared using gel buffer with percentage 4% and 16% respectively. Heating the sample mixture with the buffer till 90 °C for 10 min, then loading into specific wells. Protein standards (1.06 kDa to 26.6 kDa) were also performed on the gels. Fixing, staining and destaining solutions were mixed with gel, after electrophoresis then comparing the resulted protein bands with the standard ones31.

Tricine SDS-PAGE analysis

According to30, we performed Tricine-SDS-PAGE by preparing gel buffer with 4% and 16% stacking and separating gel respectively, then fixing solution was added to gels. After that staining solution was added before the destaining solution. Comparing the resulted bands with standard protein bands.

Evaluation criteria

The performance and efficiency of the ANFIS-WO model is evaluated by three statistical methods, namely, Average Absolute Percent Relative Error (AAPRE)and Root Mean Square Error (RMSE) as in Table 1: where \({x}_{i}\) is the \(i\)-th predicted element, \({y}_{i}\) is the \(i\)-th measured element, and \(N\) is the number of samples. \({y}_{i}\) is the average of the corresponding predicted value.

Table 1 Measure the performance of algorithms.

Results

The results of the proposed model compared with other models according to divided the data randomly are introduced in Table 2 and Figures S1S15 in Supplementary Material which are the average of 10 runs. Where Figure S1 is the average of the algorithm overall the concentration, and from this figure we can conclude that, in general, the proposed algorithm has the best values of RMSE and AAPRE which are 1.70 and 8.23 respectively. Also, its accuracy is higher than all other versions ANFIS model that have the values 8.81, 6.35 and 8.069 for ANFIS, ANFIS-GA, and ANFIS-PSO, respectively. Also, when compared the proposed algorithm with SMO, SVM, IBK, and RF, it also still has the best solution in term of all measures. Figures S2 and S15 which indicate that The ANFIS-WO output values are nearest to the target data (not testing target only).

Table 2 Comparison between algorithms based on RMSE and AAPRE using random division dataset for training and testing

Discussion

Preliminary studies were carried out in order to determine the concentration of bioactive amino acids (Figures S1 and S16) in crude protein by-products and protein hydrolysates by-products by alcalase enzyme hydrolysis in tilapia fish. Obviously, the main criteria for selection of the alcalase enzyme are its ability for high extraction of the target analytes (Figures S2 and S17). The characterization of the molecular weights of Protein hydrolysates by SDS-PAGE showed the presence of strong bands ranging between 3.5–26.7 kDa, which indicated that alcalase enzyme was able to produce small-sized peptides in 120 min. Our study shows the alcalase enzyme ability to produce low molecular weight peptides through a high degree of hydrolysis. Fish protein hydrolysates with high functional values must be rich in low molecular weight peptides, and the effective production of such peptides from Tilapia By-product indicated its potential application in functional food products32 Based on this, the measured proximate compositions of Tilapia by-product and Tilapia protein hydrolysates with special concern to environmental temperature effect were selected and tested.

The experimental results showed the significant effect of environmental temperature and proximate compositions on the concentration of amino acids as. According to33, crude protein as a proximate composition has the greatest effect on amino acid concentrations and this matched respectively with water temperature values as external factor, Fat, moisture and ash which have a great effect on the metabolism and gene expression of amino acids in fish, specially adapted to different thermal conditions Under these controlled conditions, amino acids biosynthesis represented by their concentrations can be predicted by different algorithms.

Development of ANFIS via WO algorithm shows better performance in the concentrations standard deviation of the differences between predicted and observed amino acids concentrations than showing incorrect amino acids quantity is from the true values within the biosynthesis process of whole amino acids at different temperature values. Figures S2 and S11 (and Figures S17S26) show the best biosynthesis process expressed by aspartic and glutamic amino acids concentrations obtained within the 27–29 °C with a marked decrease at 35 °C and 38 C. It is noted that these two amino acids have the same carboxylic acid on its side chain that gives it acidic (proton-donating) and functional properties. It is noted that Aspartate can be converted into methionine and threonine that, also, gives rise to isoleucine. Although these amino acids contain different mechanisms for their regulation and concentration, ANFIS-WO algorithm gives errors with larger absolute concentration values more weight than errors with smaller absolute concentration values with the same effect of temperature for aspartic acid and isoleucine (Figures S2 and S9, and Figures S17 and S24), otherwise, methionine which found SVM the best to show its predicted values (Table 2).

According to34, Alanine and Valine are produced by the transamination of pyruvate molecules as given in Figure S3 (Figure S18) and Figure S4 (Figure S19), respectively. These figures show that the lowest concentrations were in between the 12–15 C; as cold temperature may affect the glycolysis process and decrease the pyruvate production which is the precursor for the alanine and valine. Because leucine is synthesized by a diversion from the valine synthetic pathway, the feedback inhibition of valine on its pathway also can inhibit the synthesis of leucine.

This biosynthesis process of the previous nonpolar amino acids alanine, as well as leucine diversion from the valine synthetic pathway, were optimized based on their concentrations with higher performance by ANFIS-WO algorithm under variable temperature measurements as shown in Figure S8 (Figure S23), but valine concentrations were predicted with the highest standard metric values to be at the highest performance with SMO Tables 23. Likewise, the proline amino acid (non-polar amino acid) was predicted in the best performance value with SMO as in Table 2, however, based on the results in Table 3, the SMO algorithm is in the third rank after the proposed ANFIS-WO algorithm and RF algorithm.

Table 3 Comparison between algorithms based on RMSE and AAPRE using 10fold cross validation.

It is noted that ANFIS-WO gives the best performance predicted values for Phosphoryl creates group concentrations represented in Serine-glycine and Cysteine, Serine is the first amino acid in this family to be produced; it is then modified to produce both glycine and cysteine. In Figures S6S10 (Figures S21S25), the proposed algorithm shows the clear variation in the serine and glycine as well as tyrosine (Figure S4 and Figure S19) (as polar amino acids) concentrations at different times over the year which matched with the actual higher concentration values at 27–29 °C and lower concentration values at 12° C, but in Figures S12 and S27, although cysteine biosynthesis derived from serine amino acids, their concentration variation is not clear as serine. This may be due to down regulation of genes required for the synthesis of cysteine which is coded on the cys regulon. Cys regulon can actually down regulate its own transcription by binding to its own DNA sequence and blocking the RNA polymerase. In this case, N-acetyl-serine which is an effective inducer of this regulon act to disallow the binding of regulon to its own DNA sequence35.

In Figures SS13SS15 and SS28SS30, ANFIS-WO present the best-predicted value performance for basic amino acids (arginine and histidine) which possesses similar chemical property. Biosynthesis prediction of these amino acids via their concentration is so vital to give clear understand the biological dynamics of amino acids in the fish and consequently their products including protein hydrolysates.

From all previous figures we can conclude that the standard deviation, MSE and RMSE has small values for all amino- acids concentration, where the predictions are the closer to the actual data.

Conclusion and Future work

The Prediction algorithms enhance the practical properties of fish products that have an extraordinary therapeutic and industrial roles throughout our life. In this way, we assessed the concentration levels of bioactive amino acids in protein hydrolysates extracted biotechnologically from tilapia fish product with every settled parameter aside from the water temperature, planning to optimize their concentration and their interactions with each other. In addition, it is entirely costly and time-consuming to know these concentrations by the real experimental tests, especially to conduct a large-scale project. In this paper, we have introduced a new intelligent algorithm for predict the amino acids concentration. The proposed algorithm is the ANFIS based whale optimization algorithm, in which the whale is used to improve the performance of ANFIS. The results indicate that the higher performance of ANFIS-WO algorithm when compared with other algorithms in terms of RMSE and AAPRE.

For future work, further investigations are required to identify the behavior of the proposed algorithm in different applications such as water quality and other food applications. According to the potential of the proposed method, it can be applied to, other related problems such as DNA-binding protein prediction36, detection of tubule boundary37, methylation site prediction37, 38, phosphorylation site prediction39, and protein-protein interaction prediction40, 41. Moreover, since user-friendly and publicly accessible web-servers represent the future direction for developing practically more useful models42,43,44,45,46, we shall make efforts in our future work to provide a web-server for the method presented in this paper.