Combining multi-objective genetic algorithm and neural network dynamically for the complex optimization problems in physics

Neural networks (NNs) have been tentatively combined with multi-objective genetic algorithms (MOGAs) to solve optimization problems in physics. However, computationally expensive physical evaluations and limited computing resources usually lead to training sets of unsatisfactory size, which in turn makes the combined algorithms handle strict constraints ineffectively. Here, the dynamically used NN-based MOGA (DNMOGA) is proposed for the first time; it dynamically redistributes the number of evaluated individuals among different operators and includes several other improvements. As an example, a radio frequency (RF) cavity is designed by this algorithm, with four objectives and an equality constraint (a kind of strict constraint) considered simultaneously. Compared with the baseline algorithms, DNMOGA considerably improves both the number and the competitiveness of the final feasible individuals. In general, DNMOGA is instructive for dealing with strict constraints and preference in complex multi-objective optimization problems in physics.

However, the problem of relying on estimated indicators to select parents once the training of the NN begins [14][15][16] still exists in these algorithms. Once a strict constraint is considered in the optimization, feasible individuals are easily estimated as infeasible because of the small training set, and this is a major challenge for these algorithms.
To deal with this problem, a penalty operation that becomes progressively stricter over generations is applied in the fitness function so that the constraints are fulfilled gradually. Meanwhile, the NN is embedded in an operator. This operator not only produces several individuals to be further evaluated, just as the mutation and crossover operators do, but also internally screens a huge number of estimated individuals. Because the performance of these operators differs as the penalty changes, the number of individuals from each operator that are further evaluated is dynamically redistributed; hence the algorithm is called the dynamically used NN-based MOGA (DNMOGA). In addition, an accessibility algorithm is proposed as a new way of handling preference in NSGA-II; unlike other algorithms 17,18, it does not depend on manually set reference points toward which the nondominated front is led.
The shape of a spherically shaped (SS) normal conducting cavity, a kind of RF cavity used in PEP-II 19, is optimized to demonstrate the advantage of DNMOGA. Differences among various algorithms are compared, and two NN models are combined into DNMOGA respectively to illustrate the relationship between the performance of the optimizer and the accuracy of the NN. Some other details of DNMOGA are also discussed below. As a result, DNMOGA shows the potential to completely replace manual processing in this problem, which also indicates its ability to solve other optimization problems in physics with similar features.

Results
NN models. One of the two NN models mentioned above is a simple artificial neural network (ANN) with 5 neurons in one hidden layer [20][21][22][23]. The other one, shown in Fig. 1, combines the ANN with a Transformer 24,25 and has better accuracy. Note that neither of these models can accurately estimate the indicator of the equality constraint in these experiments. The accuracy of the models is expressed by $R^2$, a criterion unrelated to MOGAs, defined as

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}, \qquad (1)$$

where $y_i$ is the label, $\hat{y}_i$ is the estimated value and $\bar{y}$ is the average of the $y_i$. The indicator estimated by the NN is more precise when $R^2$ is closer to 1. The values of $R^2$ of these models on the 5 indicators are listed in Table 1.
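The $R^2$ criterion is straightforward to compute; the minimal sketch below (NumPy, names illustrative) implements the definition directly:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# A perfect estimator gives R^2 = 1.
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # -> 1.0
```

An estimator that always predicts the mean of the labels gives $R^2 = 0$, which is why values close to 1 indicate a precise surrogate.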
The 5 indicators are, in turn, $f_{\mathrm{FM}}$, $R/Q$ of the FM ($R/Q_{\mathrm{FM}}$), $R_a$ of the FM ($R_{a\,\mathrm{FM}}$), the $Q$ factor of the FM ($Q_{\mathrm{FM}}$), and the frequency of the higher order modes ($f_{\mathrm{HOM}}$). From this table, the Transformer combined with the ANN estimates all indicators more accurately. The specific illustration of these models and the idea behind their design are described in the Supplementary materials.
Fitness function for dealing with constraints. Although feasible individuals are favored by the decision maker, infeasible individuals can still provide hidden information about global optima and lower constraint violation (cf. Eq. 9). Therefore, how to balance the numbers of feasible and infeasible individuals among the parents is an important problem. Generally, there are four categories of methods 26: adding a penalty to the fitness function 27,28, judging the constraint condition before adding the penalty 29,30, proposing a novel selection strategy [31][32][33], and regarding the constraint as an objective 34,35. The popular approaches in engineering are the first two categories, in which the penalty operation punishes infeasible individuals by increasing their fitness. Unfortunately, in these algorithms the numbers of feasible and infeasible parents are uncontrolled and can differ dramatically when the constraint range or the number of generations changes, which results in a failure of balance.
To solve this problem, the feasible and infeasible parents are put into two independent groups (a detail of setting parents in this work), and a fitness function with a penalty operation that makes individuals fulfill the constraints gradually is used. In the mathematical model below, $\vec{x} = (x_1, x_2, \ldots, x_l) \in S$ is an $l$-dimensional decision vector, where $S \subset \mathbb{R}^l$ is the decision space:

$$\min \vec{F}(\vec{x}) = \big(F_1(\vec{x}), \ldots, F_m(\vec{x})\big), \quad \text{s.t.}\ \ g_j(\vec{x}) \le 0,\ \ h_j(\vec{x}) = 0,\ \ x_q^{\min} \le x_q \le x_q^{\max}. \qquad (2)$$

$\vec{F}(\vec{x})$ is a fitness vector consisting of the fitness values of the $m$ objectives, and $F_i(\vec{x})$ ($i \in \{1, 2, \ldots, m\}$) is the fitness function. $g_j(\vec{x})$ and $h_j(\vec{x})$ are the inequality and equality constraints respectively, $n$ is the total number of inequality and equality constraints, and $x_q^{\min}$ and $x_q^{\max}$ are the bound constraints on $x_q$ ($q \in \{1, 2, \ldots, l\}$).
The general way to penalize the fitness function is 27,30:

$$F_i(\vec{x}) = f_i(\vec{x}) + \sum_{j=1}^{n} r'_j\, G_j(\vec{x}), \quad G_j(\vec{x}) = \begin{cases} \max\!\big(0,\ g_j(\vec{x})\big) & \text{(inequality)} \\ \max\!\big(0,\ |h_j(\vec{x})| - \delta\big) & \text{(equality)}, \end{cases}$$

where $f_i(\vec{x})$ is the value of the $i$th objective, and $\delta$ is a very small positive tolerance that relaxes the equality constraints. The penalty coefficient $r'_j$ is usually adjusted for different problems to keep the balance 28,36. Based on this experience, $r'_j$ is redesigned here to match the operator with the NN, and the whole penalty operation is described below.
The $F_i(\vec{x})$ used in Eq. (2) is set as

$$F_i(\vec{x}) = \overline{f_i}(\vec{x}) + r_j\, \overline{CV}(\vec{x}),$$

where $\overline{(\cdot)}$ denotes the normalization of Eq. (10), $CV(\vec{x})$ described in Eq. (9) is the overall value of constraint violation, and $r_j$ is the penalty coefficient of Eq. (8).
In Eq. (8), $pre$ is the number of evaluated generations at present (Fig. 3), and $total$ is the total number of evaluated generations. $k_j$ is an adjustable value, written as $k$ later because there is only one constraint in this specific problem. $const$ is a constant that keeps the convergence speed in a suitable range; it is set to $1 \times 10^4$. With this penalty operation, $F_i(\vec{x})$ grows as $CV(\vec{x})$ grows. $r_j(pre)$ has a good mathematical property: it is a continuous, differentiable convex function when $pre$ is regarded as the independent variable. $CV(\vec{x})$ is calculated as

$$CV(\vec{x}) = \sum_{j} \overline{\max\!\big(0,\ g_j(\vec{x})\big)} + \sum_{j} \overline{\max\!\big(0,\ |h_j(\vec{x})| - \delta\big)}, \qquad (9)$$

where the normalization is calculated according to Eq. (10). Note that the values of $F_i(\vec{x})$, $CV(\vec{x})$ and $f_i(\vec{x})$ must be normalized. In each generation, the values that come from the same function form a set, and the normalization within each set is

$$\overline{Y} = \frac{Y - \min(Y)}{\max(Y) - \min(Y)}, \qquad (10)$$

where $Y$ is an element of the set, and $\max(Y)$, $\min(Y)$ are the maximum and minimum of the set respectively.
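A minimal sketch of the per-generation normalization of Eq. (10) and the penalized fitness described above; the additive combination of normalized objective and penalized violation follows the description of $F_i(\vec{x})$ in the text, and all names are illustrative:

```python
import numpy as np

def normalize(values):
    """Min-max normalization of Eq. (10): (Y - min) / (max - min),
    applied within the set of values from one function in one generation."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    # If all values coincide, return zeros instead of dividing by zero.
    return (values - values.min()) / span if span > 0 else np.zeros_like(values)

def penalized_fitness(f_i, cv, r_j):
    """Penalized fitness for one objective over the current population:
    normalized objective values plus r_j times the normalized overall
    constraint violation (Eq. 9). f_i: raw objective values; cv: overall
    constraint violations; r_j: the penalty coefficient of Eq. (8)."""
    return normalize(f_i) + r_j * normalize(cv)
```

Because both terms are normalized to [0, 1] within each generation, $r_j$ directly controls how much a unit of constraint violation costs relative to the worst objective value.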

Dynamically redistributing the numbers of individuals to operators. The way to redistribute is defined as

$$N_{v+1} = T \cdot \frac{PN_v}{PT_v}. \qquad (11)$$

In Eq. (11), $T$ is fixed as the total number of evaluated individuals in each generation, while $PT_v$ is the total number of individuals selected as parents in the $v$th generation. For each operator, $N_{v+1}$ is the redistributed number of individuals in the $(v+1)$th generation, and $PN_v$ is the number of individuals from this operator that are selected as parents in the $v$th generation.
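Eq. (11) can be sketched as follows. The rounding scheme that keeps the allocations summing exactly to $T$ is an assumption, since the text does not specify one:

```python
def redistribute(T, parent_counts):
    """Eq. (11): N_{v+1} = T * PN_v / PT_v for each operator.
    parent_counts maps operator name -> PN_v; PT_v is their sum."""
    PT = sum(parent_counts.values())
    quotas = {op: T * n / PT for op, n in parent_counts.items()}
    # Round down, then hand leftover slots to the largest fractional parts
    # (largest-remainder rounding, our assumption).
    alloc = {op: int(q) for op, q in quotas.items()}
    leftover = T - sum(alloc.values())
    for op in sorted(quotas, key=lambda o: quotas[o] - alloc[o], reverse=True)[:leftover]:
        alloc[op] += 1
    return alloc

print(redistribute(50, {"crossover": 10, "mutation": 5, "nn": 35}))
# -> {'crossover': 10, 'mutation': 5, 'nn': 35}
```

An operator whose offspring are selected as parents more often thus automatically earns a larger share of the expensive evaluations in the next generation.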
The accessibility algorithm. Sometimes the decision maker may rank the degrees of importance of the objectives, which means preference exists. The common idea for dealing with preference is to introduce manual reference points 17,18,37, calculate the distances between individuals and these points, and lead the nondominated front to approach them. But these methods concentrate only on the narrow regions around the reference points, where too much hidden information about global optima is lost, so they converge to local optima in complex problems. In DNMOGA, accessibility, which is frequently used in traffic problems 38, is introduced in place of the original crowding distance, and it can be expressed as

$$A_1 = \sum_{o=1}^{M} A_{1,o} = \sum_{o=1}^{M} \frac{S_o}{T_{1,o}^{Y}} = \sum_{o=1}^{M} \frac{\vec{H} \cdot \vec{F}_o}{D_{1,o}}. \qquad (12)$$

The first three terms of Eq. (12) are expressed in 38 as well, in which $M$ is the number of areas, $A_1$ is the accessibility of the first area, and $A_{1,o}$ is the accessibility from the first area to the $o$th. $S_o$ represents the vitality of the $o$th area, which can be described by population, gross domestic product, etc., while $T_{1,o}$ is the travel time between these areas and $Y$ is an exponent describing the effect of travel time. In DNMOGA, $T_{1,o}^{Y}$ is replaced by $D_{1,o}$, the Euclidean distance in objective space between individuals. Accessibility can then handle preference by relating $S_o$ to every objective through a number larger than 1. In the last term of Eq. (12), $\vec{H}$ is the preference vector, each of whose values corresponds to a specific objective; a larger value in $\vec{H}$ means the corresponding objective is more important. $\vec{F}_o$ is the fitness vector of these objectives. There are two sets in the accessibility algorithm: the pending set, in which all individuals of the nondominated front are initially included as the pending parents of the next generation, and the selected set, which collects the selected individuals.
After normalizing the objective values and moving the boundary solutions from the pending set to the selected set, the accessibility of each pending individual to all selected individuals is calculated, and the most inaccessible one is moved to the selected set. This process is repeated until the size of the selected set is satisfied. Figure 2 shows the advantage of accessibility. The pseudocode is given in the Supplementary materials.
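The greedy selection loop described above can be sketched as follows. The dot-product weighting $\vec{H} \cdot \vec{F}_o$ and the simplified boundary seeding (using the first point instead of the true boundary solutions) are assumptions for illustration:

```python
import numpy as np

def select_by_accessibility(front, H, n_select):
    """Greedy selection sketch: 'front' is an (N, m) array of normalized
    fitness vectors on the nondominated front, H the preference vector
    (larger = more important), n_select the target size of the selected set."""
    front = np.asarray(front, dtype=float)
    H = np.asarray(H, dtype=float)
    pending = list(range(len(front)))
    selected = [pending.pop(0)]          # simplified boundary seeding (assumption)
    while len(selected) < n_select and pending:
        def accessibility(p):
            # Eq. (12) with travel time T^Y replaced by Euclidean distance D
            return sum(float(H @ front[s]) / (np.linalg.norm(front[p] - front[s]) + 1e-12)
                       for s in selected)
        worst = min(pending, key=accessibility)   # the most inaccessible individual
        pending.remove(worst)
        selected.append(worst)
    return selected
```

Individuals far from everything already selected (low accessibility) are picked first, which preserves diversity while the preference vector $H$ biases the spread toward the important objectives.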
The complete process of DNMOGA. The flow chart of DNMOGA is shown in Fig. 3. Before describing the complete process, it is necessary to detail the operator that includes the NN. In this operator (the orange blocks in Fig. 3), crossover, mutation and Latin hypercube sampling (LHS) 39 are used together to produce a vast number of individuals, and the NN is then used to estimate the indicators of these individuals. After the fast nondominated sort, the nondominated front is taken as the parents of the next generation. This series of operations constitutes an estimated generation, which is carried out five times in succession. The accessibility algorithm is then executed on the nondominated front of the fifth estimated generation to pick a certain number of individuals for further evaluation.
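A compact, self-contained sketch of this NN-based operator under stated assumptions: the breeding step below (Gaussian mutation plus uniform sampling) is a stand-in for the actual crossover/mutation/LHS combination, and the surrogate is any cheap estimator of the indicators:

```python
import numpy as np

rng = np.random.default_rng(0)

def nondominated_front(F):
    """Indices of points not dominated by any other point (minimization)."""
    F = np.asarray(F, dtype=float)
    return [i for i in range(len(F))
            if not np.any(np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1))]

def nn_operator(population, surrogate, pool_size=1000, n_estimated_gens=5):
    """Run n_estimated_gens estimated generations: breed a large candidate
    pool, score it with the cheap surrogate (never the actual evaluator),
    and keep the estimated nondominated front as the next parents."""
    for _ in range(n_estimated_gens):
        # Stand-in breeding: perturb random parents, plus fresh samples.
        mutants = population[rng.integers(len(population), size=pool_size // 2)]
        mutants = mutants + rng.normal(scale=0.05, size=mutants.shape)
        samples = rng.uniform(0.0, 1.0, size=(pool_size // 2, population.shape[1]))
        pool = np.vstack([population, mutants, samples])
        scores = surrogate(pool)             # NN-estimated indicators
        population = pool[nondominated_front(scores)]
    return population
```

Only the handful of individuals that survive these five estimated generations (further filtered by the accessibility algorithm) ever reach the expensive actual evaluator, which is the point of embedding the NN in an operator.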
The first step of the complete process of DNMOGA is to evaluate a certain number of individuals produced by LHS as the training set of the NN and the first generation of NSGA-II, after which the parents are selected and the main loop begins. In this loop, crossover and mutation are used again as two of the operators (the green and yellow blocks in Fig. 3); together with the NN operator, they produce the individuals to be passed directly to the actual evaluator. All individuals evaluated in this generation are then mixed with those evaluated earlier, after which the fitness function is executed. Once the fast nondominated sort and the accessibility algorithm have been executed on the feasible and infeasible individuals respectively, the two groups of parents for the next generation are obtained. As the last step, the numbers of individuals generated by the operators in the next generation are dynamically redistributed. This generation produces new individuals through the actual evaluator and is therefore called the evaluated generation. The pseudocode of the complete process is given in the Supplementary materials.

In this study, some specific details also help to obtain competitive individuals. To begin with, as an assisting method to satisfy the constraint, the standard mutation operation is replaced by adjusting $f_{\mathrm{FM}}$ to produce more feasible individuals; the specific operation of adjusting the frequency is shown in Eq. (13). It is assumed that $f_{\mathrm{FM}}$ is related only to $R_{eq}$, and because of the locally linear physical relationship between $R_{eq}$ and $f_{\mathrm{FM}}$, an appropriate coefficient is empirically set as 2.55 MHz/mm. The $R_{eq}'$ in Eq. (13) is the new decision variable obtained by this operation. The comparison between this method and standard mutation is described below.
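The frequency-tuning assist of Eq. (13) can be sketched as below. The sign of the slope is an assumption on our part (for a pillbox-like cavity, $f_{\mathrm{FM}}$ falls as $R_{eq}$ grows), so only the structure of the correction, not the exact coefficient handling, should be read from it:

```python
def tune_req(req_mm, f_fm_mhz, f_target_mhz, dfdr=-2.55):
    """Frequency-tuning mutation sketch (cf. Eq. 13). Assumes a locally
    linear f_FM(Req) with slope dfdr in MHz/mm; the negative sign is an
    assumption (frequency falls as the equator radius grows)."""
    return req_mm + (f_target_mhz - f_fm_mhz) / dfdr
```

For example, an individual whose fundamental-mode frequency is 0.65 MHz above the target gets its equator radius enlarged by roughly 0.25 mm, moving it toward feasibility in a single step instead of relying on random mutation.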
For optimization problems in physics, tuning the frequency may not suit all scenarios, but when the relationship between the objectives and the decision variables can be expressed in a formula, a locally positive or negative correlation can be found and used as the basis for designing an assisting method, even if the objectives sometimes cannot be calculated exactly from this relationship. The second detail concerns the two groups of parents. To increase the variety of parents, the number of parents in each group is the greater of 90% of the size of its nondominated front and a constant. Note that all parent numbers mentioned below refer to this constant, and the two-group operation is not used when selecting estimated individuals because of the unsatisfactory accuracy of the NN.
Optimization of the SS cavity. The shape and geometric parameters of the SS cavity are shown in Fig. 5c, in which the tube parameters Rt and Lt, which we do not care about, are fixed at 200 and 16 mm respectively. To obtain shapes with performance as good as possible, a larger space should be explored, which means the limits among the geometric parameters must be considered. For example, R0_l and R0_r should be smaller than Req, and the sum of Rt, R3_l and nose should also be smaller than Req. These limits are handled by nonlinear transformations that relate the geometric parameters to independent variables with the same degrees of freedom as the parameters. As a result, 13 independent variables are set as the decision variables of DNMOGA. The specific procedure is discussed in the Supplementary materials; Leq ranges from 11 to 630 mm, while Req is between 116 and 216 mm.
Besides the indicators mentioned in the introduction, there are many other indicators in RF cavity optimization problems, such as the $Q$ factor and the normalized peak electric field on the cavity surface. In principle, the $Q$ factor and $R_a$ of the FM should be maximized, but according to previous papers 9,10,40 and the mathematical relationship among $R/Q$, $R_a$ and the $Q$ factor,

$$R_a = (R/Q) \cdot Q, \qquad (14)$$

$R_a$ and $R/Q$ are usually taken as the objectives. In addition, the normalized peak electric field on the surface of an optimized normal conducting cavity can always satisfy the decision maker's requirement, so it is meaningless as an objective. Moreover, the HOM indicators are important for the cavities of 4th-generation synchrotron radiation sources, for which $R/Q_{\mathrm{HOM}}$ should be minimized and $f_{\mathrm{HOM}}$ maximized. As a result, the four indicators shown in Eq. (15) are selected as the objectives of the optimization. Note that the indicators marked with * should be minimized, so they are transformed to fit the mathematical model above, as also shown in Eq. (15). The HOM here is the first higher order mode encountered during the calculation. The $\delta$ of the equality constraint is 0.05.
With the FM frequency of the 1000 initial individuals ranging from 294.44 to 954.05 MHz, a span about 6600 times wider than the constraint, approximately 300 feasible individuals are found after evaluating 2000 individuals within 40 evaluated generations. The algorithm takes about 34 h in total when two CST processes run simultaneously.
For the following discussion, a suitable $k$ (mentioned in Eq. 8) is set as 0.1, and fit sizes of the two groups of parents are set as 100 and 50 for the feasible and infeasible groups respectively. The trends when changing $k$ and the group sizes are discussed in the Supplementary materials. These parameters mainly influence the convergence speed of DNMOGA; the performance of the nondominated individuals is affected dramatically only under harsh settings.
The results of the experiments are shown in Fig. 4. The first comparison is between the two groups of parents and the general way of setting parents. In Fig. 4a,b, the constants (mentioned above) setting the total number of parents are both 150. The nondominated individuals produced with two groups of parents perform approximately like those of the general method, but the former creates a larger nondominated front. Figure 4c illustrates the dynamic redistribution of evaluated individuals. As the algorithm proceeds, the penalty becomes stricter, so the individuals from the NN perform worse in fitness; as a result, fewer individuals from the NN are picked. This is why DNMOGA outperforms the others when the accuracy of the NN is unsatisfactory.
The results of DNMOGA with two different preferences are shown as well; their $\vec{H}$ vectors (described in Eq. 12) are set as (5, 1, 1, 0) (Fig. 4d) and (10, 1, 1, 0) (Fig. 4e) respectively. The four values in $\vec{H}$ correspond to the four objectives F1 to F4 in turn. As the preference for F1 increases, more individuals that perform well in this objective are obtained. In Fig. 4f,i, the individuals from Fig. 4a,d,e are mixed and ranked by F1 and F4 respectively. From these figures, the individuals generated with (10, 1, 1, 0) perform well in F1 and poorly in F4.
By comparing the results of DNMOGA with the two different NN models, it is concluded that the accuracy of the NN only influences the convergence speed. When the total number of evaluated generations is 40, the performance of DNMOGA with the ANN is not good (Fig. 4g). But after evaluating 20 more generations (Fig. 4h), it is nearly the same as the result of 40 total generations with the better NN model (Fig. 4a).
The results of the different algorithms (NSGA-II, DNMOGA and the NBMOGA of 15) are shown in Fig. 5a,b, where the method proposed in 29 is combined into NSGA-II and NBMOGA to deal with constraints. All these experiments evaluate 50 individuals per generation, with 3000 individuals in total. The initial population and the total number of generations in NSGA-II are the same as in DNMOGA, while these values are 50 and 500 for NBMOGA to keep the same number of estimated generations as DNMOGA (the training of the NN in NBMOGA begins at the 10th generation). From Fig. 5a,b, adjusting the frequency is a useful assisting method for obtaining more feasible individuals, while the gaps in the distributions and sizes of the nondominated fronts between the two baseline algorithms and DNMOGA are clear. Representative individuals are compared in Table 2, in which the advantage of DNMOGA can be seen. $R/Q_{\mathrm{FM}}$ is improved by about 24% and 14% compared with NBMOGA and NSGA-II respectively, while $R_{a\,\mathrm{FM}}$ is increased by approximately 55% and 22%. Besides, only the $R/Q_{\mathrm{HOM}}$ of the DNMOGA individual tends to zero, which is beneficial for further analysis of the HOM. The geometric parameters of the individual from DNMOGA are shown in Fig. 5c.
Some benchmark optimization problems, such as CEC2009, DTLZ, and CMOP, are also used to validate the performance of DNMOGA; the results are given in the Supplementary materials.

Discussion
Various machine learning models have the potential to be combined with MOGAs to solve multi-objective optimization problems, but the way they are combined is the key factor influencing performance. In this paper, DNMOGA, in which the NN is dynamically used in a novel combination, is proposed and demonstrated. It is good at optimization problems in physics, especially complex problems with constraints and preference. The advantage of DNMOGA over NBMOGA and NSGA-II is easy to see in the RF cavity design. At the same time, assisting methods are effective in making individuals more feasible in problems with strict constraints, and such methods are easier to construct in physically meaningful problems.
Not only can all kinds of RF cavity optimizations be handled well, such as multi-cell cavities 11 and heavy ion cavities 41, but other accelerator optimizations are in principle also suitable, such as free electron lasers 5,6, nonlinear beam dynamics 4 and so on. Looking at optimization designs in other fields of physics, such as radio apparatus 42 and structural components of materials 43, more problems could be solved. In aerospace, MOGAs have been used to optimize 3D wing shapes, with response surface methodology adopted to cope with limited computing resources 44; if a suitable estimator were combined into the MOGA, the limit on computing resources might be overcome. All in all, combining machine learning and MOGAs has great untapped potential, and DNMOGA is one promising methodological outcome.

Methods
Actual evaluator. The actual evaluator in these experiments is CST Studio Suite 45, software that parametrically produces 3D models and obtains the electromagnetic field of the cavity by finite element analysis. Its powerful post-processing functions can calculate various indicators. In these experiments, the mesh of the finite element analysis is set to 20 cells per wavelength.
Calculation facility. The calculation facility used is a workstation with a Xeon W-2265 CPU and 64 GB of memory; it takes about 45 s for this workstation to evaluate one cavity.