Linear matrix genetic programming as a tool for data-driven black-box control-oriented modeling in conditions of limited access to training data

In the paper, a new evolutionary technique called Linear Matrix Genetic Programming (LMGP) is proposed. It is a matrix extension of Linear Genetic Programming and its application is data-driven black-box control-oriented modeling in conditions of limited access to training data. In LMGP, the model is in the form of an evolutionarily-shaped program which is a sequence of matrix operations. Since the program has a hidden state, running it for a sequence of input data has a similar effect to using well-known recurrent neural networks such as Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU). To verify the effectiveness of the LMGP, it was compared with different types of neural networks. The task of all the compared techniques was to reproduce the behavior of a nonlinear model of an underwater vehicle. The results of the comparative tests are reported in the paper and they show that the LMGP can quickly find an effective and very simple solution to the given problem. Moreover, a detailed comparison of models, generated by LMGP and LSTM/GRU, revealed that the former are up to four times more accurate than the latter in reproducing vehicle behavior.

The main contributions of the paper are as follows: 1. the LMGP is proposed, which is a new technique for evolving data-driven black-box models in the form of LSTM/GRU-like recurrent neural networks, 2. the LMGP was experimentally verified by using it to construct a model of an AUV, 3. the LMGP was compared with different types of feed-forward and recurrent neural networks and training algorithms, 4. the models were built based on small sets of training data.
The rest of the paper is organized as follows: section two details LMGP, section three compares MOPs and LSTM/GRU networks, section four outlines HCMAE, section five reports the experiments, section six outlines limitations of the LMGP, section seven presents possible directions of further research, and the last, eighth section contains the conclusions from the research.
Since many abbreviations are used in the paper, for the convenience of the reader, the list of the most frequently used abbreviations is presented in Table 1.

Linear matrix genetic programming
The LMGP is an evolutionary technique meant for the construction of data-driven black-box models represented in the form of Matrix Operation Programs (MOPs). The MOPs evolve according to HCMAE, an algorithm for evolving a set of matrices which serves as the LMGP optimization engine. The LMGP scheme is presented in Fig. 1.
The task of a MOP is to predict the state of the modeled system based on a sequence of previous states and control signals. The current state of the system and the control signals are fed into the MOP input, and after executing the program, the predicted state is obtained at the MOP output.
A MOP consists of a sequence of matrix operations: MOP = <O_1, O_2, ..., O_{N_O}>, N_O ≤ N_O,max, where N_O,max is the maximum number of operations in all evolving MOPs. Executing a MOP involves executing all operations in sequence, starting from operation O_1 and ending with operation O_{N_O}.
The task of each operation is to change the state of one of the MOP registers. The MOP operates on a set of registers <r_0 = r_in, r_1 = r_out, r_2, ..., r_{N_R}>, where r_0 ∈ R^{I_length} and r_1, ..., r_{N_R} ∈ R^{O_length}, i.e. vectors that play the role of variables in the MOP. Each MOP has N_R + 1 registers at its disposal, including an input register r_in = r_0 and an output register r_out = r_1. LMGP defines six main operations that were used in the experiments reported further:
1. Input Multiplication (IM): r_i := M_IM,j r_in, where i = 1..N_R, j = 1..N_IM, and M_IM,j is a matrix of size O_length × I_length,
2. Register Multiplication (RM): r_i := M_RM,j r_k, where i, k = 1..N_R, j = 1..N_RM, and M_RM,j is a matrix of size O_length × O_length,
3. Register Addition (RA): r_i := r_j + r_k, where i, j, k = 1..N_R,
4. Register Addition and Element-Wise product (RAEW): r_i := r_i + r_j • r_k, where i, j, k = 1..N_R,
5. Register Logistic (RL): r_i := logistic(r_i), where i = 1..N_R,
6. Register Hyperbolic Tangent (RHT): r_i := tanh(r_i), where i = 1..N_R.
However, the above list of operations is not exhaustive and, if necessary, it can be extended to include other operations. The first operation in any MOP is always an IM. All other operations can be of any type.
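The execution of the operations listed above can be illustrated with a short sketch; the tuple-based encoding of operations and the register indexing below are illustrative assumptions for the example, not the paper's internal data layout.

```python
import numpy as np

def run_mop(ops, matrices_im, matrices_rm, x, n_registers=3, out_len=6):
    """Execute a MOP: a sequence of matrix operations on a register file.

    ops: list of tuples, e.g. ("IM", i, j) for r_i := M_IM,j r_in,
         ("RM", i, j, k), ("RA", i, j, k), ("RAEW", i, j, k),
         ("RL", i), ("RHT", i).  Register/matrix numbers are 1-based.
    """
    r_in = np.asarray(x, dtype=float)                       # register r_0
    regs = [np.zeros(out_len) for _ in range(n_registers)]  # r_1..r_NR (r_1 = output)

    for op in ops:
        kind = op[0]
        if kind == "IM":                  # r_i := M_IM,j r_in
            _, i, j = op
            regs[i - 1] = matrices_im[j - 1] @ r_in
        elif kind == "RM":                # r_i := M_RM,j r_k
            _, i, j, k = op
            regs[i - 1] = matrices_rm[j - 1] @ regs[k - 1]
        elif kind == "RA":                # r_i := r_j + r_k
            _, i, j, k = op
            regs[i - 1] = regs[j - 1] + regs[k - 1]
        elif kind == "RAEW":              # r_i := r_i + r_j * r_k (element-wise)
            _, i, j, k = op
            regs[i - 1] = regs[i - 1] + regs[j - 1] * regs[k - 1]
        elif kind == "RL":                # r_i := logistic(r_i)
            _, i = op
            regs[i - 1] = 1.0 / (1.0 + np.exp(-regs[i - 1]))
        elif kind == "RHT":               # r_i := tanh(r_i)
            _, i = op
            regs[i - 1] = np.tanh(regs[i - 1])
    return regs[0]                        # r_1 plays the role of the output register
```

A two-operation program, for instance, first multiplies the input by an IM matrix and then squashes the result with tanh, after which register r_1 holds the MOP output.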
In addition to the set of registers-variables r_i, i = 0..N_R, a MOP also has at its disposal a set of matrices-constants, i.e. the M_IM and M_RM matrices used by the IM and RM operations. To decode and create a MOP, it is necessary to decode all of its operations. However, the final MOP only contains operations whose o_g1 ≥ 1. Rows of M_MOP whose o_g1 ≤ 0 correspond to inactive operations and they are not analyzed. This is the case except for the first row, which always corresponds to the IM operation. The above mechanism enables LMGP to evolve variable-length MOPs.
To decode an operation, the operation type is decoded first, and then the other type-dependent components. The type of operation is decoded as follows: o_T := f_1(o_g1), where f_1 maps (abs(o_g1) mod N_T) + 1 to an element of {IM, RM, RA, RAEW, RL, RHT}, o_T is the operation type, and N_T = 6 is the number of operation types. In turn, registers and matrices used in the operation are decoded as follows: o_R2,3,4 := f_2(o_g2,3,4) := (abs(o_g2,3,4) mod N_max) + 1, where o_R is the register/matrix number and N_max, depending on the operation type and decoded component, is equal to N_R, N_IM or N_RM. For example, if o_T = RM, then o_R2 indicates the target register r_i, o_R3 indicates the matrix M_RM,j, and o_R4 indicates the source register r_k.
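The decoding scheme can be sketched as follows; the gene layout (four integers per row of M_MOP) and the zero-based indexing of operation types in the lookup table are assumptions made for the example.

```python
# Illustrative decoding of one MOP operation from a row of integer genes
# (a row of M_MOP); the exact gene layout used by LMGP may differ.
OP_TYPES = ("IM", "RM", "RA", "RAEW", "RL", "RHT")

def decode_operation(genes, n_registers=3, n_im=2, n_rm=2):
    g1, g2, g3, g4 = genes
    if g1 < 1:                       # rows with o_g1 <= 0 encode inactive operations
        return None
    # Assumed analogue of f_1: abs(o_g1) mod N_T, used here as a 0-based index.
    op_type = OP_TYPES[abs(g1) % len(OP_TYPES)]

    def f2(g, n_max):                # o_R := (abs(o_g) mod N_max) + 1
        return (abs(g) % n_max) + 1

    if op_type == "IM":              # r_i := M_IM,j r_in
        return (op_type, f2(g2, n_registers), f2(g3, n_im))
    if op_type == "RM":              # r_i := M_RM,j r_k
        return (op_type, f2(g2, n_registers), f2(g3, n_rm), f2(g4, n_registers))
    if op_type in ("RA", "RAEW"):    # three register arguments
        return (op_type, f2(g2, n_registers), f2(g3, n_registers), f2(g4, n_registers))
    return (op_type, f2(g2, n_registers))   # RL / RHT use a single register
```

Note that the first row of M_MOP is always decoded as an IM operation regardless of its genes; that special case is omitted here for brevity.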

Comparing MOP to LSTM/GRU
As already mentioned, a MOP is an LSTM/GRU-like neural network organized in the form of a sequence of matrix operations. However, LSTM/GRU networks can also be represented as a sequence of matrix operations. Such a representation of GRU and LSTM networks consisting of h units is depicted in Fig. 4.
The operation of GRU/LSTM networks can also be implemented in the form of appropriate MOP programs. Figure 5 depicts MOP counterparts of the GRU and LSTM programs specified in Fig. 4.
The above-mentioned MOPs and GRU/LSTM programs are equivalent in terms of the resources they use, i.e. registers/gates, weight matrices or bias vectors, and the output they generate. However, they differ in the way they operate. GRU/LSTM programs allow for greater parallelization of processing. The multiplication of the weight matrices with the input vector and gate vectors in each single operation (line in the program) can be done in parallel. Moreover, the calculation of the output of the individual gates is also mostly independent of each other and can be done in parallel.

Figure 4. GRU and LSTM programs (t is a point in time, x_t ∈ R^d is an input vector of size d (or input register like in a MOP), h_t ∈ <−1, 1>^h is an output vector of the GRU/LSTM layer (or output register like in a MOP), W ∈ R^{h×d}, U ∈ R^{h×h}, b ∈ R^h are weight matrices and bias vectors, z_t, r_t, o_t are different gate vectors (or hidden registers like in a MOP), and c_t, g_t are working vectors (or hidden registers like in a MOP)).

Figure 5. Example MOPs implementing GRU and LSTM networks; one block of operations corresponds to one operation in the GRU/LSTM program (r_in ∈ R^d is an input vector of size d, r_1 ∈ <−1, 1>^h is an output vector, M_IM ∈ R^{h×d}, M_RM ∈ R^{h×h}, M_B(•) ∈ R^h are weight matrices and bias vectors, and r_i, where i = 1..5(7), are registers).

Meanwhile, in a MOP we deal with the sequential execution of a series of simple operations, each of which changes the state of one register. MOP counterparts of GRU/LSTM programs simply break down the GRU/LSTM functionality into smaller pieces that are executed sequentially.
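The decomposition described above can be illustrated by writing one GRU step in terms of the elementary operation types a MOP provides (IM, RM, RA, RAEW, RL, RHT); the bias terms are omitted for brevity, so this is a sketch rather than the exact programs of Figs. 4 and 5.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# One GRU step, annotated with the MOP operation types that a sequential
# program would use for each sub-expression (biases omitted for brevity).
def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    z = sigmoid(Wz @ x + Uz @ h)          # update gate: IM + RM + RA, then RL
    r = sigmoid(Wr @ x + Ur @ h)          # reset gate:  IM + RM + RA, then RL
    c = np.tanh(Wh @ x + Uh @ (r * h))    # candidate:   element-wise product, RM, RA, RHT
    return (1.0 - z) * h + z * c          # blend via element-wise products and additions
```

Each line above corresponds to a block of several single-register MOP operations executed one after another, which is exactly the sequential "smaller pieces" view described in the text.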
However, while GRU/LSTM programs have a fixed architecture, and their operation can only be influenced by modifying the weight matrices and bias vectors, the advantage of MOPs is their flexibility, which results from the fact that they can implement different behaviors not only by changing the content of the IM and RM matrices, but also by adjusting the program code itself to the problem to be solved. MOPs can be very simple but also more complex, like those presented in Fig. 5. The complexity of MOPs depends on the hyper-parameter setting, i.e. the types of operations that are allowed in the programs, the maximum number of IM and RM matrices, the maximum number of registers, and the decisions of the HCMAE optimization engine, which designs the matrices and then assembles MOPs from ready-made parts.

Hill climb modular assembler encoding
In order to construct a MOP, a program code and constants have to be defined, which in LMGP take the form of matrices. This means that the MOP optimization problem can be represented as a search for a set of optimal matrices. For this purpose, Hill Climb Modular Assembler Encoding (HCMAE) can be used, i.e. an algorithm for the evolution of modular neural networks in which each module is encoded in the form of a matrix. Since the algorithm is already detailed in 49, in the current paper only its outline is given.
The HCMAE pseudo-code, presenting a variant of the algorithm dedicated to the evolution of matrices (not neural networks), is shown in Algorithm 1. Generally, HCMAE represents an incremental approach, meaning that it improves the matrices in many subsequent iterations. Initially, they are initiated as sparse matrices (line 2 in Algorithm 1), and then, step by step, they are improved through gradually modifying their content (line 9 in Algorithm 1). In each iteration, only one matrix is modified; all other matrices remain unchanged. The selection of the matrix to update is random (line 8 in Algorithm 1). The probability of selecting a matrix, initially the same for all matrices (line 3 in Algorithm 1), changes in each HCMAE iteration and depends on the successes and failures in searching for its improved variant (line 18 in Algorithm 1). In each iteration, only matrix modifications leading to improved results are accepted (line 13 in Algorithm 1).
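The outer loop described above can be sketched as follows; the concrete probability-update increments, the lower bound on selection probability, and the fixed iteration budget are assumptions for the example, not the values used in Algorithm 1.

```python
import random

# Minimal sketch of the HCMAE outer loop: pick one matrix (with
# success-dependent probabilities), propose a local modification, and keep it
# only if fitness improves. `fitness` and `propose` are caller-supplied
# placeholders for model evaluation and AEP-style matrix modification.
def hill_climb(matrices, fitness, propose, iterations=1000, rng=random):
    probs = [1.0] * len(matrices)              # initially uniform selection
    best = fitness(matrices)
    for _ in range(iterations):
        i = rng.choices(range(len(matrices)), weights=probs)[0]
        candidate = propose(matrices[i])       # modify only one matrix
        trial = matrices[:i] + [candidate] + matrices[i + 1:]
        score = fitness(trial)
        if score > best:                       # accept only improvements
            matrices, best = trial, score
            probs[i] += 0.1                    # success raises selection probability
        else:
            probs[i] = max(0.1, probs[i] - 0.01)   # failure lowers it
    return matrices, best
```

Because only improving modifications are accepted, the returned fitness can never fall below the fitness of the initial matrices.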
A key element of HCMAE is the way each of the selected matrices is modified to obtain improvement. Modification of the matrix in one iteration does not change its entire content; it usually concerns some fragment of the matrix. This means that what was achieved earlier is not lost. The goal is to slightly change, add, or remove some matrix items while keeping all the positive changes obtained in the previous iterations.
Such an effect is achieved thanks to the evolutionarily shaped Assembler Encoding Program (AEP), which is a simple assembler-like program consisting of two/three operations and a sequence of data. The task of each operation is to transfer selected values from the AEP data sequence to the matrix, to the place indicated by the operation parameters. The number of transferred values also depends on the parameters of the operation. Due to the small number of operations in an AEP, matrix modifications usually have a small range. Moreover, the small size of the AEP enables its effective evolution.
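The effect of an AEP can be illustrated with a minimal sketch in which each operation copies a run of values from the data sequence into consecutive items of one matrix column; the exact operation parameters and transfer pattern used by HCMAE may differ.

```python
import numpy as np

# Hypothetical sketch of applying an AEP to a matrix. Each operation is a
# tuple (row, col, start, count): copy `count` values from the data sequence,
# beginning at `start`, into consecutive rows of column `col` starting at
# `row`. Indices wrap around, so a short AEP always modifies only a small
# fragment of the matrix.
def apply_aep(matrix, operations, data):
    m = matrix.copy()                       # earlier achievements are preserved
    rows, cols = m.shape
    for row, col, start, count in operations:
        for n in range(count):
            r = (row + n) % rows
            m[r, col % cols] = data[(start + n) % len(data)]
    return m
```

An AEP with two or three such operations touches at most a handful of items, which is exactly the "small range" of modification described above.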
Algorithm 1. Pseudo-code of HCMAE.

In HCMAE, the evolution of AEPs, and consequently matrices, follows the Cooperative Co-Evolutionary Genetic Algorithm (CCEGA) 53,54 (line 9 in Algorithm 1). Each operation as well as the data sequence evolve in separate populations. Operations are encoded in the form of integer strings (operation parameters indicating where to change the matrix, and what and how many values to copy), while the data sequences are encoded in the form of real-number strings.
In the individual populations, evolution follows the classical genetic algorithm, with tournament selection, one-point crossover, and mutation. The probability of crossover is constant throughout the evolution, while the probability of mutation varies depending on the state of the evolutionary process. The operations and data sequences are crossed and mutated with different probabilities.
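The per-population genetic operators mentioned above can be sketched as follows (mutation and the probability schedule are left out; the tournament size of two is an assumption for the example).

```python
import random

# Tournament selection: sample k individuals and return the fittest one.
def tournament(pop, fitnesses, k=2, rng=random):
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: fitnesses[i])]

# One-point crossover: cut both parent strings at the same point and swap tails.
def one_point_crossover(a, b, rng=random):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]
```

The same operators apply to both the integer strings (operations) and the real-number strings (data sequences); only the mutation operator and its probability differ between the two kinds of population.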

Experiments
In order to verify the effectiveness of LMGP, the algorithm was applied to evolve a model of an AUV based on previously collected recordings of its behavior in an underwater environment with disturbances. The AUV carried out a mission by following a lawnmower trajectory a short distance from the bottom. Since the registration of the AUV behavior took place in simulation conditions, the task of LMGP was to recreate the mathematical model of the AUV used during the simulations 50.

Modeling problem formulation
The state of the modeled vehicle S = <x, y, z, H, V, β> (see Fig. 6) was determined by six parameters: (x, y, z) is the AUV position in a local coordinate system, H is the heading (direction), V is the speed, and β is the pitch angle. Moreover, the AUV was controlled by three control signals C = <T, R, E>: T is the thrust of the propeller (forward), R is the rudder angle (left/right), and E is the elevator angle (down/up). This means that the task of the model was to predict the state of the vehicle S(t + 1) for the known current state of the vehicle S(t), the previous states, the control signal C(t), and the previous control signals.

For the feed-forward networks to be able to perform the above task, the network input was slightly reconfigured. Namely, the networks were not fed with combined vectors of the state and control signals S(t)C(t), but with the change of the state in two consecutive time steps concatenated with the control signals: ΔS_<−1,1>(t + 1)C_<0,1>(t + 1) (see Figs. 7 and 8), where ΔS_<−1,1>(t + 1) := N_<−1,1>(ΔS(t + 1)), ΔS(t + 1) := S(t + 1) − S(t), C_<0,1>(t + 1) := N_<0,1>(C(t + 1)), and N_<−1,1> and N_<0,1> are normalization functions.
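The input construction described above can be sketched as follows; the linear normalization and the min/max bounds used here are assumptions for the example, since the paper does not state the exact normalization procedure.

```python
import numpy as np

def normalize(v, lo, hi, a, b):
    """Linearly map values from [lo, hi] to [a, b] (assumed normalization)."""
    return a + (b - a) * (np.asarray(v, dtype=float) - lo) / (hi - lo)

# Build the 9-element model input: the normalized state change of the six
# state parameters concatenated with the three normalized control signals.
def model_input(s_prev, s_curr, c_curr, ds_bound=1.0):
    ds = np.asarray(s_curr, dtype=float) - np.asarray(s_prev, dtype=float)  # delta S
    ds_n = normalize(ds, -ds_bound, ds_bound, -1.0, 1.0)   # N_<-1,1>
    c_n = normalize(c_curr, 0.0, 1.0, 0.0, 1.0)            # N_<0,1>
    return np.concatenate([ds_n, c_n])                     # 6 + 3 = 9 inputs
```

With 6 state parameters and 3 control signals this yields the 9-input, 6-output interface used by the networks and MOPs in the experiments.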

Neural networks
In the experiments reported further, LMGP and MOP models were compared with: 1. monolithic feed-forward (FFANN) and recurrent neural networks (RANN) evolving according to HCAE, 2. modular feed-forward (MFFANN) and recurrent neural networks (MRANN) evolving according to HCMAE, 3. LSTM networks trained with the Stochastic Gradient Descent algorithm, 4. GRU networks trained with the Stochastic Gradient Descent algorithm.
The list of all neural networks and types of MOPs applied in the experiments is given in Tables 2 and 3.

Figure 7. Example normalized state of the modeled AUV during a 400-second voyage (each normalized state parameter was calculated according to the normalization procedure described above).

Monolithic networks have a loose, non-layered architecture and contain neurons with the logistic sigmoid activation function. In feed-forward networks, all but recurrent connections are allowed, while in recurrent networks there are no restrictions on the inter-neural connections. In each evolved network, the number of neurons and connections is determined evolutionarily. However, the total number of neurons, including input and output neurons, cannot exceed the maximum value, which in the experiments amounted to 30: 9 input, 6 output, and a maximum of 15 hidden neurons. Modular networks had three different architectures, depicted in Fig. 9. In all the networks, regardless of the architecture, the maximum number of neurons in each module amounted to 20: 9 input, 6 output, and a maximum of 5 hidden neurons. Like in the monolithic networks, the exact number of neurons and connections depended on the evolutionary process. The inner architecture of the modules was the same as the architecture of the monolithic networks.
The architecture MANN1 (see Fig. 9a) had two variants, i.e. feed-forward (FFMANN1) and recurrent (RMANN1). It consisted of two separate network-modules, each fed with the entire input vector ΔS_<−1,1>(t + 1)C_<0,1>(t + 1). In FFMANN1 both modules were feed-forward, whereas in RMANN1 both modules were recurrent. The task of module M1 was to predict half of the AUV next-state vector, while the task of module M2 was to predict the other half of this vector.
The MANN2 architecture consisted of three modules (see Fig. 9b). Like MANN1, MANN2 also had feed-forward (FFMANN2) and recurrent (RMANN2) variants. In FFMANN2, all the modules were feed-forward, whereas in RMANN2, modules M1 and M3 were recurrent and M2 was feed-forward. The task of modules M1 and M3 was to predict the next state of the AUV, whereas M2 was responsible for selecting one of the predictive modules.
The MANN3 architecture was a combination of neural networks and the KMeans clustering algorithm. The input space of the modeled system (state vector plus control signals) was clustered into N_C = 2, 3 or 4 groups, each represented by a center (C1..C4) and a module (M1..M4). To predict the state of the AUV at time-point t + 2, the input vector corresponding to time-point t + 1 was assigned to one of the groups. The module associated with this group then made the prediction. In the feed-forward variant (FFMANN3), all the modules were feed-forward, whereas in the recurrent variant (RMANN3) all the modules were recurrent.
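The MANN3 gating idea can be sketched as follows; the cluster centres and modules below are placeholders, and the nearest-centre assignment is the standard KMeans prediction rule.

```python
import numpy as np

# Sketch of cluster-gated prediction: assign the input vector to the nearest
# cluster centre and delegate the prediction to that cluster's module.
# `centres` come from KMeans on the input space; `modules` are callables
# standing in for the per-cluster networks.
def predict_mann3(x, centres, modules):
    d = [np.linalg.norm(np.asarray(x, dtype=float) - c) for c in centres]
    return modules[int(np.argmin(d))](x)
```

Only one module is evaluated per prediction, so each module specializes in one region of the input space.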
All the networks specified above were compared to six different MOPs: MOP1, MOP2, ..., and MOP6. MOP1 had the following parameters: N_O,max = 9, N_R = 3, N_IM = 2, N_RM = 2, I_length = 9, O_length = 6, and during evolution it was encoded in the form of five separate matrices, i.e. one M_MOP matrix, two M_IM matrices, and two M_RM matrices.
MOP2 was an extension of MOP1 with two further M_IM matrices and two M_RM matrices (N_IM = 4, N_RM = 4). Moreover, MOP2 was encoded as two separate matrices, i.e. one M_MOP matrix and one super-matrix of size 4·O_length × (I_length + O_length) that included the four M_IM matrices and the four M_RM matrices. This procedure aimed to reduce the number of matrices processed by HCMAE. In MOP2, instead of nine separate matrices, HCMAE dealt with only two matrices. Another reason for the evolution of MOP2 as two separate matrices is a limitation of the current GPU HCMAE implementation, which does not allow the evolution of as many matrices as in MOP2.
MOP3 extended MOP2 with two further registers (N_R = 5). The evolution of MOP3 took place in two separate matrices, like in MOP2.
MOP4 was MOP2 with a slight change in the process of evolving the M_MOP matrices. Normally, HCMAE treats all matrices the same way. It does not pay attention to the fact that they represent different entities, i.e. either a list of MOP operations or MOP matrices. In HCMAE, each genotype represents a change in a selected matrix.

Figure 9. Architectures of MANN1, MANN2, and MANN3 (Input = ΔS_<−1,1>(t + 1)C_<0,1>(t + 1), where i is the number of the module).
The change is made on selected items in the matrix; the items can be updated or set to zero. Which items are modified does not depend on what information these items store. The only adjustable parameter that can be set for each matrix is the range of changes. Usually, larger matrices need greater ranges than smaller matrices. In MOP4, a different approach was applied, namely, the M_MOP matrices evolved differently from the remaining matrices. In this case, each HCMAE genotype had three options, i.e.: (i) to remove the last operation from the MOP, (ii) to modify the last operation in the MOP, or (iii) to add a new operation at the end of the MOP. Parameters for a new/modified operation were taken from the genotype.
MOP5 and MOP6 were modular programs which consisted of two MOP modules. In MOP5, the MOP modules were organized like modules M1 and M2 in MANN1 (see Fig. 9a), that is, each module was responsible for predicting half of the AUV state vector. In turn, MOP6 had a two-layered architecture like the LSTM2/GRU2 networks, that is, the first-layer MOP module produced the input for the second-layer MOP module.
Regardless of the modular option, all MOP modules had a similar architecture: N_O,max = 5, N_R = 3, N_IM = 2, N_RM = 2. They differed in I_length and O_length. In MOP5, both modules had the same input and output sizes: I_length = 9 and O_length = 3, whereas in MOP6, the input module was larger than the output module: I_length = 9 and O_length = 6 for the input module and I_length = 6 and O_length = 6 for the output module.
MOP5 and MOP6 evolved in three separate matrices, i.e. two M_MOP matrices (one matrix per module) and one super-matrix including the M_IM and M_RM sub-matrices, four sub-matrices per module.

Dataset
In the experiments, 40 recordings of the modeled AUV were used. The behavior of the AUV was recorded at a frequency of 10 Hz. Each recording corresponded to a 400-s voyage. 20 recordings were used to evolve/train the models and the other 20 recordings were applied to verify their effectiveness.
From the point of view of the high demand of deep learning algorithms for training data, the number of twenty 400-s recordings intended for training the networks seems very small. However, such an amount of data is a deliberate choice resulting from the desire to recreate the real situation in which, due to the high costs of recordings at sea, the amount of available training data is limited.

Evaluation
To evaluate the models evolved according to LMGP, HCAE, and HCMAE, the following fitness function was used:

fitness = 1 / E_sum, E_sum = Σ_{l=1..N_20} Σ_{t=1..N_400} E_entry(l, t),   (1)

where Out_l,t,k is the kth output of the model in the lth recording and tth recording entry, N_400 is the number of entries in a single recording, N_400 = 3998, and N_20 is the number of recordings used in the training process, N_20 = 20.

The function (1) is the inverse of the error E_sum, that is, the sum of the errors E_entry determined for all recordings used in the training process and all entries in these recordings. In turn, the error E_entry is either the total error of reproducing all parameters of the modeled object, Σ_{k=1..|S|} (E(l, t, k))^2, or the penalty 10|S| when the model is a local maximum which generates a value of 0.5 on each of its outputs Out_k, k = 1..|S|, regardless of the modeled object's state. In the case of LSTM/GRU networks, the standard MSE loss function was used in the network training process. The final evaluation of the models was performed using the following function:

fitness_eval = 1 / dist(E_0, E_max), l = 21..40,   (5)

where l = 21..40 means the evaluation on the twenty recordings that were not used in the training process. Function (5) corresponds to a scalarization of a multi-objective optimization problem in which the goal is to minimize the Euclidean distance dist(E_0, E_max) between the maximum error point E_max = <E_max(1), E_max(2), ..., E_max(|S|)>, which represents the combination of the largest errors for all tested cases and all vehicle parameters, and the ideal point E_0 = <0, 0, ..., 0>, which indicates the perfect situation when all errors are equal to zero for each k, l, and t.
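The training fitness described above can be sketched as follows; the exact per-output error term E(l, t, k) and the handling of a zero total error are assumptions for the example.

```python
import numpy as np

# Per-entry error: the summed squared output errors, replaced by a penalty of
# 10|S| when the model sits in the degenerate local maximum that outputs 0.5
# on every output regardless of the modeled object's state.
def entry_error(out, target):
    if np.allclose(out, 0.5):
        return 10.0 * len(out)
    return float(np.sum((np.asarray(out) - np.asarray(target)) ** 2))

# Fitness: the inverse of the error summed over all recordings and entries.
# Mapping a zero total error to infinity is an assumption for the sketch.
def fitness(outputs, targets):
    e_sum = sum(entry_error(o, t)
                for rec_o, rec_t in zip(outputs, targets)
                for o, t in zip(rec_o, rec_t))
    return float("inf") if e_sum == 0.0 else 1.0 / e_sum
```

Here `outputs` and `targets` are nested as recordings of entries of output vectors, mirroring the l, t, k indexing in the text.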

HCMAE setting
In general, the HCMAE parameters remained largely unchanged compared to the research reported in 49. The only parameters that were tuned to the modeling problem were the ranges of matrix modifications (the number of updated items) by individual AEP operations. For the matrices M_IM and M_RM, the maximum and minimum ranges amounted to 50 and 5, respectively, while for the matrix M_MOP they amounted to 5 and 1. The remaining HCMAE parameters are given in the last section of the paper.

Implementation
During the research, our own parallel (GPU, CUDA, the NVIDIA parallel computing platform) implementation of LMGP in C++ was used. Both the optimization/evolutionary engines HCAE/HCMAE and the neural networks FFANN, RANN, MFFANN, MRANN, and the MOPs had their parallel implementations in C++.
In turn, the LSTM/GRU networks as well as the process of their learning and testing were implemented in Python on GPU servers using the Keras library.

Results
In order to compare all the modeling methods, each of them was used to generate thirty different models, each representing one run of the training process. Each run delegated a single most effective model, i.e. the one with the highest evaluation according to function (1) or the MSE loss function. The final evaluations in the form of the mean and maximum values of function (5) are given in Table 4 and Fig. 10.
The maximum values in the table and the figure show the result of the best model that was generated for a given modeling method out of the thirty available models after the learning process. It turns out that LMGP, regardless of the MOP type, is the most effective in this respect. All MOPs except MOP1 achieved a fitness greater than 0.38. Only the evolutionary neural networks FFANN and FFMANN2 achieved similar effectiveness.
However, if, when evaluating the methods, we take into account the average result, which shows the true potential of a given modeling method to generate effective models, then the most effective methods are LMGP(MOP1, MOP2, MOP5), HCAE(FFANN) and HCMAE(FFMANN1).Each of the above methods resulted in an average fitness above 0.35.
Analyzing the state prediction errors generated by the most effective models mentioned above (examples of absolute errors are presented in Fig. 11, while examples of actual values are shown in Fig. 6), it should be stated that the errors of position (x, y, z) are at a very low level. This is probably due to the fact that the position of the vehicle, regardless of the coordinate, is slowly changing. The error of heading H is also very small, with two major spikes when the heading crosses 360 degrees and there is a drastic jump in value. The error of vehicle speed V is also at a small, satisfactory level compared to the real values. The biggest errors appear for the pitch angle β. The reason is the large spikes in the input values of the models for this angle, which can be seen in Fig. 7. During the tests, all models were supplied not with the vehicle state parameters but with the change of the state in two consecutive time steps normalized to the range <−1, 1>, i.e. ΔS_<−1,1>. As Fig. 7 shows, the largest spikes in the input values to the models occur precisely for the pitch angle.

Noteworthy are the extremely poor results of the LSTM and GRU networks. Surprisingly, during the simulations it was not possible to obtain even a single LSTM/GRU network that would be as effective as the other modeling techniques. There are two most likely reasons for this state of affairs. First, the training dataset may not have been large enough to make the networks more efficient. Second, they are the only networks that were not trained evolutionarily. The use of the gradient training algorithm in their case could cause the objective function to get stuck in local minima, which in turn could affect the mean value of the error.
Due to the population nature of the HCAE and HCMAE evolutionary algorithms, the learning process of the MOPs and the FFANN, RANN, MFFANN, MRANN neural networks was slower than that of the LSTM/GRU networks trained with the use of gradient algorithms, in which a single instance of the network is trained. Regardless of the algorithm, the learning process ended each time no progress was observed for a longer period.
However, learning speed is not a key parameter in the construction of a reliable model of any real object. The most important in this case are two other parameters. The first is the quality of the model, which is presented in Table 4 and Fig. 10 and is understood as its accuracy in reproducing the behavior of the real object. The second parameter is the speed of the model, which is important, for example, during its use in the process of building/learning the ACS system for the modeled object. The speed of the models is influenced by the implementation, which was different for LSTM/GRU and the other models, but also by the "size" of each model and the potential of each of them for parallel processing and optimal use of GPU resources. In this case, it should be noted that although during the tests the differences between the various models were barely noticeable, the MOPs, due to their sequential nature, will always be slightly slower than other models of similar size.
Although MOPs are not significantly more effective than evolutionary neural networks, they have one feature that can make them a more attractive modeling tool. Namely, they constitute a comprehensible representation of the modeled object. Neural networks are a classic black-box model. The complex internal architecture of the networks makes it difficult or even impossible to analyze the modeled object based on the analysis of the networks. In the case of simple networks consisting of a few neurons, it is possible to analyze the impact of individual inter-neuron connections and the neurons themselves on the state of the modeled object. It is similar for larger networks with clearly separated building blocks, in which analysis is possible at the level of blocks, not neurons or connections. However, for more complex networks, for example, those constructed using HCAE/HCMAE, which are monolithic or modular networks with monolithic modules without clearly separated layers or blocks, the analysis of their operation comes down to the level of a single neuron and synapse, which for people unfamiliar with neural networks seems to be impossible.
Unlike the networks, MOPs offer this possibility. Representing a modeled object in the form of a MOP, which is a sequence of well-known matrix operations, makes the object easier to understand. In the case considered in this paper, the neural models are encoded in the form of a matrix in which each entry indicates the weight of a connection between neurons. In addition, the matrix also includes neuron parameters such as the bias or the type of transfer function. Analysis of such a model, even for the medium-size networks that were used in the experiments, is at least problematic. Meanwhile, the most effective MOPs, regardless of their type, consisted of a single IM operation which generated the output of the entire program; the result of the operation was saved in register no. 1, which in all MOPs played the role of the output register: r_1 := M_IM r_in. Such MOPs revealed that the behavior of the modeled vehicle can be approximated with high accuracy by a simple linear system, and complex neural networks were unnecessary in this case.
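The single-operation linear MOP mentioned above reduces prediction to one matrix-vector product; a sketch of iterating such a model over a control sequence is given below (the input-building function and the accumulation of the predicted state change are assumptions for the example).

```python
import numpy as np

# The most frequently evolved model was a single-operation linear MOP,
# r_1 := M_IM r_in, so one prediction is a single matrix-vector product.
def linear_mop_step(m_im, x):
    return m_im @ np.asarray(x, dtype=float)   # r_1 := M_IM r_in

# Roll the linear MOP out over a control sequence. `make_input` is a
# placeholder for the input construction described in the modeling-problem
# section; here the model output is treated as a state change to accumulate.
def rollout(m_im, make_input, s0, controls):
    s = np.asarray(s0, dtype=float)
    states = [s]
    for c in controls:
        ds = linear_mop_step(m_im, make_input(s, c))  # predicted state change
        s = s + ds
        states.append(s)
    return states
```

The entire model is thus a single 6 × 9 matrix, which is what makes these evolved MOPs so easy to inspect compared to their neural counterparts.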
In addition to the simple MOP models mentioned above, LMGP also evolved more complex models which were equally as accurate as the simple ones. Examples of such models are given in Fig. 12. Despite their greater complexity, these models still seem simpler to analyze than their neural black-box counterparts.
The examples of MOPs depicted in Fig. 12 also show that although each of them is a sequence of operations, they enable the implementation of multiple parallel processing streams, each of which is associated with a different register. The more registers and matrices from which the MOPs can be assembled, the greater the possibility of building more complex models. However, increasing the number of potential MOP components simultaneously increases the complexity of the task of combining these components into a single model and makes it difficult to construct effective models. This is visible in the case of MOP3, which is the option with the largest number of available registers and matrices.
The experiments also revealed the key influence of the slow, gradual evolution of the M_MOP matrices, which encode the lists of MOP operations, on LMGP effectiveness. Faster and deeper changes of these matrices in subsequent iterations of the evolutionary process significantly hinder the generation of effective programs, as demonstrated by preliminary experiments with LMGP. For that reason, in all LMGP options, genetic operations on M_MOP are very delicate and are made on small fragments of the matrices. Most options simply modify some vertical or horizontal patch in the matrix, whereas MOP4 adds or removes one operation at the end of the program.
In contrast to single-program MOPs, their modular counterparts MOP5 and MOP6 very rarely implemented simple linear models. In MOP5, which, like MANN1, implemented two parallel streams of processing and appeared to be an effective modeling solution, there was not even a single linear module, although each module was responsible for only a piece of the entire task. What is more, many modules consisted of the maximum number of operations. Example MOP5s are depicted in Fig. 13.
The most likely reason for this is the difficulty in evolving an appropriate matrix M_IM. In all the modules presented in Fig. 13, the first operation represents a linear model. Despite this, none of the modules is ultimately linear. This means that HCMAE was unable to produce an appropriate matrix M_IM and more operations were required to obtain an accurate model.
In turn, the difficulties of HCMAE were most likely due to the placement of the matrices M IM and M RM, used by two separate modules, in one common super-matrix, and the evolution of this matrix as a single consolidated module. To obtain two separate linear modules, it would be necessary to create them in two separate locations in the super-matrix. As it turned out, it was easier to expand the MOP modules with further operations than to obtain such a super-matrix.
MOP6, with its sequential modular architecture, turned out to be the least effective variant of LMGP. The most reasonable explanation for this result is that constructing a reliable model as a sequence of the simple operations used by the current version of LMGP is not an easy task. The linear model is simply the most natural solution in the case considered in the paper, as shown by the tests with the monolithic LMGP variants. As already mentioned, most of the models generated by these variants were simple linear models encoded as MOPs consisting of a single operation; more complex models were very rare. Meanwhile, in MOP6, LMGP was forced to construct models composed of a sequence of operations. An additional difficulty was, of course, the same factor as in MOP5, i.e. the use of one super-matrix to represent all the matrices applied in both MOP6 modules.
The reliable models that follow the MOP6 architecture (see Fig. 14) seem to confirm the above considerations. Each of these models has a simple linear input module and a more complex output module. The output module cannot be neutral and pass the input signal on without processing, which would be equivalent to the solutions obtained for the monolithic LMGP variants. It has to somehow transform the output of the linear module into the final output of the entire model, and the complex architecture of this module, and generally the results of MOP6, show that this task is a serious challenge for LMGP.

Limitations
The research reported in the paper shows that LMGP has great potential for effective time-series processing, in particular black-box modeling. However, LMGP also has limitations that may hinder its application to more complex problems, which may require more complex MOP programs and a larger number of evolving matrices M IM and M RM.
LMGP uses HCMAE as its optimization engine. One iteration of HCMAE modifies only one matrix from the entire set of matrices. The choice of matrix to change depends on how previous changes to this matrix affected the MOP result. If there are many of these matrices, the process of constructing MOPs may require many iterations of the algorithm.
However, a solution to the above problem may be to use one shared matrix, say SM, to encode all the matrices required by LMGP, except the matrix encoding the MOP (M MOP). In this case, the task of HCMAE will be to construct two matrices: one matrix with the list of operations (M MOP) and a second, large matrix encoding all the other matrices used in MOP operations (SM). An interesting issue in this case may be the possibility of sharing information among many different matrices used by different MOP operations but encoded in one common SM matrix modified by HCMAE. This situation will occur if individual MOP matrices overlap in the matrix modified by HCMAE.
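The overlap-based information sharing described above can be illustrated with NumPy views. This is a hypothetical sketch: the size of SM and the offsets of the sub-matrices are assumptions made for illustration, not part of the proposed method.

```python
import numpy as np

# All weight matrices used by MOP operations are cut out of one common
# matrix SM as (possibly overlapping) sub-matrices. SM would be the single
# object evolved by HCMAE.
SM = np.arange(36, dtype=float).reshape(6, 6)

def submatrix(sm, row, col, rows, cols):
    """Return a view into SM; edits through the view change SM itself."""
    return sm[row:row + rows, col:col + cols]

M_IM1 = submatrix(SM, 0, 0, 3, 3)   # input matrix of module 1
M_RM1 = submatrix(SM, 2, 2, 3, 3)   # recurrent matrix, overlaps M_IM1

# Because the two sub-matrices overlap, a single mutation of SM changes
# both of them at once, i.e. information is shared between them.
SM[2, 2] += 100.0
```

After the mutation, the same updated value appears in both M_IM1[2, 2] and M_RM1[0, 0], which is precisely the overlap effect that may either help or hinder the evolution of two independent linear modules.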
Abbreviations
ACS: Autonomous control system
AMR: Autonomous mobile robot
AUV: Autonomous underwater vehicle
CNN: Convolutional neural networks
GRU: Gated recurrent unit
HCAE: Hill climb assembler encoding
HCMAE: Hill climb modular assembler encoding
LMGP: Linear matrix genetic programming
LSTM: Long short-term memory
MOP: Matrix operation program
RNN: Recurrent neural networks

Figure 1. Scheme of LMGP for N IM = 2 and N RM = 2. In each iteration, HCMAE generates a population of matrix sets. Each set consists of N IM matrices M IM, N RM matrices M RM, and one matrix M MOP. Each M MOP is decoded into a MOP, which is then evaluated. The evaluation of all MOPs is used by HCMAE to generate a new matrix set.

Figure 2. MOP operation over time. Like LSTM/GRU, a MOP processes data sequentially and keeps its hidden state in registers r 2, . . ., r N R through time (t is the time step).
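The sequential, stateful processing in the caption above can be sketched as follows. The concrete two-operation program (one linear IM operation plus one recurrent RM update, fused into a single line) is an assumption made purely to show how registers persist across time steps.

```python
import numpy as np

def run_mop(inputs, W_im, W_rm):
    """Run a toy two-operation MOP over a sequence, RNN-style.

    Register r2 is the hidden register: it persists between time steps,
    while the input register is reloaded with x_t at every step.
    """
    r2 = np.zeros(W_im.shape[0])               # hidden register, kept through time
    outputs = []
    for x_t in inputs:                         # t = 1, 2, ... (sequential processing)
        r2 = np.tanh(W_im @ x_t + W_rm @ r2)   # IM operation + RM operation
        outputs.append(r2.copy())              # r2 doubles as the output register
    return outputs

xs = [np.ones(3), np.zeros(3)]
ys = run_mop(xs, W_im=np.zeros((2, 3)), W_rm=np.eye(2))
```

Running the same program over a sequence therefore has the same structural effect as unrolling an LSTM/GRU cell: the output at step t depends on all inputs up to t through the hidden register.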

whose state does not change while the program is running. Matrices M IM i are used in IM operations, while M RM j are used in RM operations. MOPs are constructed in an evolutionary manner using HCMAE. HCMAE decides on the number of operations N O ≤ N O,max, the type of the individual operations, the assignment of registers r i, i = 2..N R, and matrices M IM i, M RM j, i = 1..N IM, j = 1..N RM, to individual operations, as well as the content of the matrices M IM i and M RM j. At the genotypic level, each operation is encoded as an integer vector o^g = <o^g_1, . . ., o^g_4> ∈ Z^4, where o^g_1 determines the type of operation, o^g_2 indicates the left-hand-side register, and o^g_3 and o^g_4 encode the right-hand-side matrices, registers, and constants. All operations of a MOP are grouped in a matrix M MOP of size N O,max × 4, in which each row encodes a single operation (see Fig. 3).
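The row-per-operation encoding can be sketched as a small decoder. The operation codes, register layout, and argument semantics below are assumptions for illustration; the paper only specifies that each row <o^g_1, o^g_2, o^g_3, o^g_4> encodes the operation type, the left-hand-side register, and the right-hand-side matrices/registers/constants.

```python
import numpy as np

N_R = 4                                  # number of registers r_1..r_N_R (assumed)
OP_NAMES = {0: "IM", 1: "RM", 2: "ADD"}  # assumed operation codes

def decode_row(row):
    """Turn one integer row of M_MOP into a human-readable operation."""
    op, lhs, rhs1, rhs2 = (int(v) for v in row)
    name = OP_NAMES[op % len(OP_NAMES)]
    lhs_reg = 2 + lhs % (N_R - 1)        # left-hand side: one of r_2..r_N_R
    return f"r{lhs_reg} := {name}(arg{rhs1 % N_R}, arg{rhs2 % N_R})"

# Each row of M_MOP (size N_O,max x 4) encodes a single operation.
M_MOP = np.array([[0, 0, 1, 2],
                  [1, 1, 2, 3]])
program = [decode_row(row) for row in M_MOP]
```

The modulo operations make every integer genotype decodable, so any matrix produced by HCMAE yields a valid program; this is one common way to keep an integer encoding closed under mutation.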

Figure 4. GRU and LSTM programs (t is a point in time, x t ∈ R^d is an input vector of size d (or input register, like in MOP), h t ∈ <-1, 1>^h is an output vector of a GRU/LSTM layer (or output register, like in MOP), W ∈ R^(h×d), U ∈ R^(h×h), b ∈ R^h are weight matrices and bias vectors, z t, r t, o t are different gate vectors (or hidden registers, like in MOP), and c t, g t are working vectors (or hidden registers, like in MOP)).

Figure 6. Example state of the modeled AUV during a 400 s voyage: (x, y, z) AUV position in a local coordinate system, H heading (direction), V speed, and β pitch angle.

Figure 12. Example effective MOP models consisting of more than one operation.

Table 2. List of neural networks used in the experiments (FF feed-forward, R recurrent).

Table 3. List of MOPs used in the experiments.

Table 4. Results of the experiments (the best results are bolded; FF means FFANN, R means RANN, and similarly for the other neural networks).