Introduction

A drug is a substance, other than a nutrient, that has a temporary and/or long-term physiological effect on the body. Based on their mechanisms of action and therapeutic properties, drugs can be categorized using schemes such as the anatomical therapeutic chemical (ATC) classification and the biopharmaceutics classification system (BCS). Because of their importance, many researchers have proposed methods for drug design1. Nonetheless, designing a new drug is a costly and time-consuming process that takes over 15 years. Moreover, many drug discovery and development projects fail, in large part because of the rigorous controls applied during the development phases. Researchers have therefore sought other approaches to treating diseases, such as drug repurposing, a cost- and time-effective strategy that finds new uses for existing drugs. Several computational approaches have been suggested for drug repurposing. They can be grouped into the following classes:

  1. i)

    Molecular docking methods: These methods, which search for ligands that can bind to proteins based on their three-dimensional structures, are the most popular approaches in the drug repositioning field2. However, they cannot be used if the structure of the protein or the ligand is unknown.

  2. ii)

    Metabolic pathway-based methods: These procedures are usually used for orphan or rare diseases. First, the metabolic pathways related to the disease are identified. Next, drugs that can affect these pathways are investigated3 and, if they qualify, proposed for treating the disease. Since the metabolic pathways of many orphan and rare diseases have not been determined, these methods have a low success rate.

  3. iii)

    Connectivity-MAP (CMAP) methods: These approaches, which process large amounts of genomic data, are used to discover relationships between diseases and genes4. Their limitations include differences in cell lines, platforms, etc., which make the data inconsistent.

  4. iv)

    Data-mining methods: These methods, which include procedures such as text mining and machine learning, are the most powerful ones for finding novel uses of drugs. Because they build on existing data, they increase the success rate of drug repositioning, and many researchers have adopted them5. Nevertheless, the validity of the acquired results remains a primary challenge.

The existing machine learning methods can achieve acceptable results, but more effective approaches yield better predictions. To this end, we propose an improved machine learning method that predicts drug-target interactions (DTIs) efficiently and belongs to the fourth of the classes mentioned above. The proposed method, called ANNTR, is a multi-layer artificial neural network trained by a novel optimization algorithm called “Trader”, which yields a model with a higher predictive ability. Besides introducing an improved machine learning approach for predicting DTIs, two other considerations motivate the introduction of the Trader optimization algorithm. First, an efficient algorithm that addresses the limitations of existing optimization algorithms and can be applied to different fields such as engineering, biology, and computer science is useful and essential. Second, a comprehensive comparison of optimization algorithms with one another can determine their actual performance in real-world use.

Related Works

Our proposed method, which combines an artificial neural network with the Trader optimization algorithm (ANNTR), falls into the data-mining class of drug repositioning and focuses on predicting DTIs. This section reviews the related literature from the data-mining viewpoint. The existing studies are categorized into six classes, as follows:

  1. i)

    Learner-based methods: In these studies, learners such as deep learning6,7, support vector machines8,9,10,11, regression algorithms12, K-nearest neighbors13, rotation forest11, and relevance vector machines14 are trained on labeled datasets to learn the relationship between inputs and outputs. The acquired model is evaluated and then applied to predict unknown DTIs. Since every learner separates samples in a different way, their results differ from each other. The biggest weakness of these works is that they generate negative datasets and build a model on them. The error rate rises because a sample in the generated negative dataset may in fact be a positive interaction between a drug and a target. To tackle this restriction, one-class classification approaches can be used15. Although the results reported in the related literature are acceptable, their accuracy leaves room for improvement. To enhance the prediction accuracy, we introduce an efficient machine learning method based on an artificial neural network and a new optimization algorithm called “Trader”.

  2. ii)

    Network-based methods: These works model drugs and their various targets (genes, proteins, enzymes, metabolic pathways, etc.) as a network and then analyze it to obtain new information. In a series of related works, the designed network is examined by algorithms such as random walk16,17 and random forest18. Unlike the first class of related works, which depends on a negative dataset19, the second class considers only the existing information. As a result, the error of the second class is lower than that of the first. Nevertheless, the performance of the first class is higher than that of the second.

  3. iii)

    Prioritization-based methods: These studies calculate drug-drug, network-network, or target-target similarities. The candidates are ranked by the acquired scores, and the top-ranked drugs are suggested for treating diseases. To compute the scores, the chemical information of drugs, the topological information of networks, and the sequence information of targets are examined20. Different studies indicate that similarity is not the only determining factor in drug repositioning. Hence, the false positive rates of prioritization-based methods are high. To overcome this restriction, some studies integrate different sources of information before calculating the similarity scores21.

  4. iv)

    Mathematics and probabilistic-based methods: These studies formulate the problem as a graph and then mine it to obtain new information22. They run into difficulties when the generated graph contains orphan nodes. To deal with this constraint, a matrix regulation and factorization method may be useful23.

  5. v)

    Ensemble-based methods: It has been shown that a proper combination of machine learning methods usually leads to better results in computer science problems. Inspired by this idea, some researchers have predicted DTIs using a combination of the above-mentioned classes24,25,26. Although these methods enhance the separability power of a drug-target predictor, they increase the error rate and inherit the disadvantages of the combined methods.

  6. vi)

    Review-based approaches: A large number of drug-target prediction studies are review articles that survey the problem from various viewpoints such as applied tools27, methods28, databases, and software applications29. These articles usually discuss the advantages and disadvantages of the proposed methods and give directions for future work30.

Methods and Materials

Preparing the datasets

We integrate chemical and genomic spaces and gather information about drugs and targets into a dataset, similar to the work carried out by Yamanishi et al.31. The targets are divided into four classes: enzymes (EN), ion channel proteins (IC), G-protein coupled receptors (GP), and nuclear receptor proteins (NR). The datasets are prepared in the following steps:

  1. i)

    The chemical information on drugs and ligands is obtained from the KEGG DRUG and KEGG LIGAND databases32. Then, the similarity scores between drugs are calculated by Eq. (1)14. For this purpose, the pharmacological effects of medications on 17,109 molecular properties are taken into consideration.

    $${\rm{SIM}}(D,{\rm{D}}^{\prime} )=\frac{{\sum }_{i=1}^{n}{W}_{i}{F}_{i}{F^{\prime} }_{i}}{\sqrt{{\sum }_{i=1}^{n}{W}_{i}{F}_{i}^{2}}\sqrt{{\sum }_{i=1}^{n}{W}_{i}{F^{\prime} }_{i}^{2}}}$$
    (1)

    where n is the total number of molecular properties (17,109), Fi and F′i are the ith molecular feature of drugs D and D′, SIM is the similarity score between D and D′, and Wi is the weight of the ith feature, calculated by Eq. (2)14.

    $${{\rm{W}}}_{{\rm{i}}}=\exp (-{{{\rm{d}}}_{{\rm{i}}}}^{2}/({{\rm{\sigma }}}^{2}{{\rm{h}}}^{2}))$$
    (2)

    where di is the frequency of the ith feature, σ is the standard deviation of dk (k = 1 through n), and h is a constant set to 0.1. Using Eq. (1), a matrix of effect similarity scores is created for every pair of drugs.

  2. ii)

    Amino acid sequences of protein targets are obtained from the DrugBank33 and KEGG GENE databases. Further, we have developed an integrated relational database named DrugR+34 (http://www.drugr.ir), which contains all the data of DrugBank and some data of KEGG. Next, the similarity score between every pair of targets is computed by the normalized Smith-Waterman alignment scoring method35, and a matrix of target-target similarity scores is generated.

  3. iii)

    The interaction information between drugs and targets is obtained from the DrugR+ database.
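The drug-drug similarity of Eqs. (1) and (2) in step i is a weighted cosine similarity and can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the function names, the use of the population standard deviation for σ, and the fallback to uniform weights when σ = 0 are all assumptions made here.

```python
import math


def feature_weights(freqs, h=0.1):
    """Eq. (2): W_i = exp(-d_i^2 / (sigma^2 * h^2)) for feature frequencies d_i."""
    n = len(freqs)
    mean = sum(freqs) / n
    # Assumption: sigma is the population standard deviation of the frequencies.
    sigma = math.sqrt(sum((d - mean) ** 2 for d in freqs) / n)
    if sigma == 0:
        # Assumption: with identical frequencies, fall back to uniform weights.
        return [1.0] * n
    return [math.exp(-(d ** 2) / (sigma ** 2 * h ** 2)) for d in freqs]


def weighted_similarity(f, g, w):
    """Eq. (1): weighted cosine similarity between two feature profiles f and g."""
    num = sum(wi * fi * gi for wi, fi, gi in zip(w, f, g))
    norm_f = math.sqrt(sum(wi * fi * fi for wi, fi in zip(w, f)))
    norm_g = math.sqrt(sum(wi * gi * gi for wi, gi in zip(w, g)))
    den = norm_f * norm_g
    return num / den if den > 0 else 0.0
```

Rare features receive weights close to 1 and frequent features weights close to 0, so two drugs that share an uncommon property are scored as more similar than two drugs that share a common one.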

For every type of target, a dataset is created by the pseudocode presented in Fig. 1. These datasets can be used as gold standard datasets by researchers who want to predict drug-target interactions using machine learning approaches. Table 1 shows the attributes of the generated datasets.

Figure 1

Pseudocode for generating the datasets. The generated datasets include only positive drug-target interactions and are built from the chemical similarity scores of drugs and the Smith-Waterman alignment scores of targets.

Table 1 Properties of the generated datasets.

The machine learning approach

Our proposed method, whose framework is depicted in Fig. 2, creates a prediction model using a multi-layer perceptron (MLP) artificial neural network (ANN) with two hidden layers. Each generated dataset is divided into (i) a training set and (ii) a testing set. For all the generated datasets, the ANN is trained by Trader, with every candidate solution consisting of 38 variables. There are 8, 3, 2, and 1 neurons in the input, first hidden, second hidden, and output layers, respectively. All the neurons of a layer are connected to all the neurons of the next layer; hence, the total number of synapses (edges) of the ANN is 8*3 + 3*2 + 2*1 = 32. Moreover, since there are six biases, which are specified in Fig. 2, a candidate solution has 32 + 6 = 38 variables in total. The objective function is the root mean square error (RMSE), computed by Eq. (3):

$${\rm{RMSE}}=\sqrt{\frac{{\sum }_{i=1}^{S}{({P}_{i}-{O}_{i})}^{2}}{S}}$$
(3)

where S is the total number of samples, and Pi and Oi are the predicted and real-world values of the ith sample, respectively.
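To make the variable count and the objective concrete, the following sketch decodes a flat 38-variable candidate solution into the 8-3-2-1 network and evaluates Eq. (3). It is an illustration only: the sigmoid activation and the ordering of weights and biases inside the flat vector are assumptions, since neither is specified in the text, and the function names are hypothetical.

```python
import math

LAYERS = [8, 3, 2, 1]  # input, first hidden, second hidden, output (from the text)


def n_variables(layers=LAYERS):
    """Count the variables of one candidate solution: edge weights plus biases."""
    n_weights = sum(a * b for a, b in zip(layers, layers[1:]))  # 8*3 + 3*2 + 2*1
    n_biases = sum(layers[1:])                                   # 3 + 2 + 1
    return n_weights + n_biases


def sigmoid(z):
    """Numerically stable logistic function (assumed activation)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(cs, x, layers=LAYERS):
    """Decode a flat candidate solution and propagate one input vector.

    Assumed layout: for each layer, the incoming weights of every neuron
    in turn, followed by that layer's biases.
    """
    pos, act = 0, list(x)
    for n_in, n_out in zip(layers, layers[1:]):
        sums = []
        for _ in range(n_out):
            w = cs[pos:pos + n_in]
            sums.append(sum(wi * ai for wi, ai in zip(w, act)))
            pos += n_in
        biases = cs[pos:pos + n_out]
        pos += n_out
        act = [sigmoid(s + b) for s, b in zip(sums, biases)]
    return act[0]


def rmse(predicted, observed):
    """Eq. (3): root mean square error over S samples."""
    s = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / s)
```

With this encoding, an optimizer only needs to propose vectors of length `n_variables()` and score them with `rmse` over the training samples; no gradient of the network is required.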

Figure 2

The framework of the proposed method for drug repurposing. After the datasets are generated, Trader trains the ANN on them. Once the ANN is appropriately trained, the resulting model is applied to predict unknown drug-target interactions. IN, H, D, and T denote neurons of the input layer, neurons of the hidden layers, a drug, and a target, respectively.

Trader optimization algorithm

Our proposed algorithm, Trader, is inspired by the intelligent behavior of traders who seek more profit and property through operations such as retailing, importing, exporting, and many other activities. Figure 3 shows the flowchart of Trader. Trader consists of the following steps:

  1. i)

    Creating the first population of candidate solutions: Like other optimization algorithms, Trader starts with a set of potential answers, each of which consists of several variables and can be represented as an array. Equation (4) shows a candidate solution (CS) with n variables:

    $${\rm{Variable}}=\{{{\rm{v}}}_{1},\,{{\rm{v}}}_{{\rm{2}}},\,\ldots ,\,{{\rm{v}}}_{{\rm{n}}},\,{\rm{G}}\}$$
    (4)

    where vi is the ith variable and G is the group of the CS, that is, the trader to which it belongs. The groups are not specified at the beginning of the algorithm. For the drug repurposing problem, a CS holds the weights of the ANN: each variable corresponds to one edge weight or one bias, so the number of variables equals the number of edges plus biases of the ANN (32 + 6 = 38 for the network used here).

  2. ii)

    Calling the objective function: After the first population of CSs is created, the worthiness (fitness) of each CS is calculated by an objective function (OF), which is defined according to the nature of the problem. For example, when training an artificial neural network, the fitness of a CS is the value of the error (Eq. (3)).

  3. iii)

    Grouping the candidate solutions: The groups are formed according to the number of traders and their properties. At the start of the algorithm, all the traders have the same property, which is updated during the algorithm’s iterations. Equation (5) gives the number of CSs assigned to a specific trader (group):

    $${{\rm{NB}}}_{{\rm{i}}}=2+{\rm{round}}\left(\frac{{P}_{i}}{{\sum }_{j=1}^{T}{P}_{j}}\times (C-2\times T)\right)$$
    (5)

    where NBi, Pi, C, and T are the number of CSs assigned to the ith trader (group), the property of the ith trader, the number of existing CSs, and the number of traders, respectively. The constant 2 guarantees that at least two CSs remain in every group, so that no trader or group is eliminated during the iterations. Figure S2 illustrates an example of the competition among traders for CSs.

  4. iv)

    Changing the candidate solutions: After the candidate solutions are grouped, the best CS of each group, named the Master CS, is selected, and its variable values are distributed to the other CSs of the group, named Slave CSs, using Eq. (6):

    $${\sum }_{j=1}^{Ck}({\sum }_{i=1}^{R}(CS\_slave\_j(rand(n))=CS\_master\_k(rand(n))))$$
    (6)

    where n is the total number of variables in a CS, R is a random integer in [1, n], Ck is the number of CSs of the kth group, CS_slave_j is the jth Slave CS of the kth group, and CS_master_k is the Master CS of the kth group. If the changes made by Eq. (6) increase the value of the OF (RMSE), they are discarded; otherwise, they are accepted. In addition to Eq. (6), which helps the Slave CSs improve their OF value, another operator changes the Slave CSs based on their own contents. These changes are applied using Eq. (7).

    $${\sum }_{i=1}^{R}(C{S}_{slave(M)}=C{S}_{slave(M)}+k\times rand(C{S}_{slave(M)}))$$
    (7)

    where R is a random integer between 1 and n/10, M is a random integer between 1 and n, CSslave is a Slave CS, and k is chosen arbitrarily as either 1 or −1. As with the previous operator, the changes are accepted only if they improve the value of the OF. Unlike Eqs (6) and (7), which change only Slave CSs, Eq. (8) alters Master CSs. This operator exchanges variable values among Master CSs: some values of the best CS of another group are randomly chosen and imported into the selected Master CS.

    $${\sum }_{j=1}^{T}{\sum }_{i=1}^{R}(CS\_master\_j(rand(n))=CS\_master\_k(rand(n)))$$
    (8)

    where R is a random integer between 1 and n, j and k indicate the importer and exporter groups, CS_master_j is the Master CS of the importer group, and CS_master_k is the Master CS of the exporter group. The value of k is given by Eq. (9).

    $${\rm{K}}=\{{\rm{a}}|{\rm{a}}\ne {\rm{j}}\,{\rm{and}}\,{\rm{a}}\,{\rm{is}}\,{\rm{an}}\,{\rm{integer}}\,{\rm{random}}\,{\rm{value}}\,{\rm{in}}\,1\le {\rm{a}}\le {\rm{n}}\}$$
    (9)

    As with the other operators of Trader, the changes induced by Eq. (8) are accepted only if the imported values improve the value of the OF. Through Eqs (6) to (8), the weights of the ANN’s edges are altered and a new drug-target predictor is obtained. The weight changes are kept only if the new predictor reduces the RMSE (Eq. (3)). Figures S3 through S5 illustrate how the changes are applied to the CSs.

  5. v)

    Updating property: The operators of Trader, given by Eqs (6) through (8), may change the CSs. Hence, the total objective function value of a group, which is computed using Eq. (10), varies. Accordingly, the property of the groups must be updated.

    $${{\rm{Property}}}_{{\rm{i}}}=\{{\sum }_{j=1}^{B}OF(j)|CS(j,\,G)={G}_{i}\}$$
    (10)

    where Propertyi is the property of the ith trader (group), B is the number of CSs, Gi is the ith group, and G is the group to which the jth CS belongs.

  6. vi)

    Termination condition: As in other optimization algorithms, any of the following can serve as the termination condition of Trader: (i) reaching a predefined number of iterations; (ii) reaching a specified value of accuracy or error; (iii) exceeding a given amount of time; (iv) the best answer remaining unchanged over recent iterations. For training the ANN, a predefined number of iterations is used as the termination condition.

  7. vii)

    Selecting the best answer: When the termination condition is satisfied, the CS with the best OF value is selected as the solution to the problem. For the DTI prediction problem, the CS with the minimum RMSE is chosen to predict unknown DTIs. Figure 4 shows the pseudocode of Trader.
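The grouping rule of Eq. (5) and the property update of Eq. (10) from steps iii and v can be sketched as follows. This is a minimal illustration with hypothetical function names, not the published MATLAB code. The text does not spell out how an error-type OF such as the RMSE is turned into a property that rewards better groups, so the sketch simply sums the OF values exactly as Eq. (10) is written.

```python
def group_sizes(properties, n_cs):
    """Eq. (5): NB_i = 2 + round(P_i / sum_j P_j * (C - 2T)).

    properties: property P_i of each of the T traders
    n_cs:       total number of candidate solutions C
    """
    t = len(properties)
    total = sum(properties)
    return [2 + round(p / total * (n_cs - 2 * t)) for p in properties]


def group_properties(of_values, groups, n_groups):
    """Eq. (10): property of group i = sum of OF values of the CSs labelled i.

    of_values: objective function value of every CS
    groups:    zero-based group label G of every CS
    """
    props = [0.0] * n_groups
    for of, g in zip(of_values, groups):
        props[g] += of
    return props
```

Because of the rounding in Eq. (5), the group sizes do not always add up to exactly C, so an implementation needs some rule for the leftover or missing CSs; which rule is used is not stated in the text. Python's `round` also rounds exact halves to the nearest even integer, which can differ from other environments.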

Figure 3

The flowchart of Trader. The proposed optimization algorithm starts with a set of candidate solutions, each of which determines the weights of the ANN. Next, they are placed into several groups and improved by Eqs (6) through (8) (see the text for details). The steps of Trader are repeated until the termination condition is satisfied. As the steps are repeated, the RMSE is reduced and a suitable predictor model is obtained.

Figure 4

The pseudocode of Trader. For training the ANN, Trader produces a set of potential answers, each consisting of several variables (the edges of the ANN). Trader includes three operations, given by Eqs (6) through (8), which change the weights of the ANN’s edges in different ways. For instance, Eq. (7) alters them based on their own content, whereas Eq. (8) tries to improve them by importing values from the best solutions.
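The three operators of Eqs (6) through (8), together with their greedy acceptance rule, can be sketched in Python as below. Several details are assumptions made for this illustration and are not confirmed by the text: values are copied between the same position of the two CSs, rand(CS_slave(M)) is read as a uniform draw between 0 and the magnitude of the variable, the exporter index of Eq. (9) is drawn from the group indices, and a toy sphere function stands in for the RMSE. All function names are hypothetical.

```python
import random


def accept_if_better(current, candidate, objective):
    """Greedy acceptance: keep the candidate only if it lowers the OF."""
    return candidate if objective(candidate) < objective(current) else current


def import_values(receiver, donor, rng):
    """Eqs. (6) and (8): copy R randomly chosen variables from donor to receiver."""
    n = len(receiver)
    child = list(receiver)
    for _ in range(rng.randint(1, n)):      # R is a random integer in [1, n]
        i = rng.randrange(n)                # assumed: same position in both CSs
        child[i] = donor[i]
    return child


def perturb(slave, rng):
    """Eq. (7): shift R randomly chosen variables by a random share of their value."""
    n = len(slave)
    child = list(slave)
    for _ in range(rng.randint(1, max(1, n // 10))):     # R in [1, n/10]
        m = rng.randrange(n)                             # position M
        k = rng.choice((1, -1))                          # direction k
        child[m] += k * rng.uniform(0.0, abs(child[m]))  # assumed reading of rand(.)
    return child


def pick_exporter(j, n_groups, rng):
    """Eq. (9): a random group index different from the importer j."""
    return rng.choice([a for a in range(n_groups) if a != j])


def sphere(v):
    """Toy objective standing in for the RMSE of Eq. (3)."""
    return sum(x * x for x in v)


rng = random.Random(0)
master, slave = [0.1] * 20, [1.0] * 20
slave = accept_if_better(slave, import_values(slave, master, rng), sphere)
slave = accept_if_better(slave, perturb(slave, rng), sphere)
print(round(sphere(slave), 4))
```

For Slave CSs, `import_values` is called with the Master CS of the same group as donor (Eq. (6)); for Master CSs, the donor is the Master CS of the group returned by `pick_exporter` (Eq. (8)). Since every move passes through `accept_if_better`, the OF value of a CS never gets worse from one iteration to the next.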

Results

The proposed machine learning approach has been implemented in the MATLAB programming language, and all the source code is available at https://github.com/LBBSoft/Trader. This section contains three categories of results, as follows:

Trader in comparison with the other optimization algorithms

Besides Trader, ten state-of-the-art optimization algorithms (PSO36, WCC37, TGA38, TE39, EPO40, ION41, VIR42, DVBA43, HTS44, and CEFOA45) were implemented. These algorithms were then applied to 20 benchmark functions that are used in the studies in which the above-mentioned algorithms were introduced. These standard test functions, which are listed in Table S1 (Supplementary File), fall into unimodal, multimodal, fixed-dimension, expanded, penalized, and hybrid categories. Since optimization algorithms produce different results in different executions, it is recommended that they be executed at least 30 times on a given problem and that the best result obtained be reported46. Hence, each of the above-mentioned algorithms was executed 50 times on the selected benchmark functions with high dimensions. Further, the algorithms were executed under similar conditions, such as the number of iterations and the number of OF calls, and their parameters were set so as to maximize their performance. The algorithms are evaluated by criteria such as convergence and the stability of the acquired results. Figure 5 shows the convergence of the algorithms on the test functions for their best result over the 50 executions. For functions with similar convergence behavior, the average outcome is drawn; for example, the results of F11 and F12 are merged into one plot. In addition, the convergence of the algorithms on each of the benchmark functions is presented in Fig. S6 (Supplementary File).

Figure 5

The convergence of the algorithms on the test functions, where Fi denotes the ith test function. (a) The average convergence of the algorithms on F1 through F9 and F15. (b) The average convergence of the algorithms on F11 and F12. (c) The convergence of the algorithms on F13. (d) The average convergence of the algorithms on F10, F14, and F16 through F20. Among the test functions, F10, F14, and F16 through F20 are small benchmark functions, whereas the others have a large number of variables with a wider range. These diagrams show that Trader behaves more stably than the others across the benchmark functions, whereas EPO, TGA, and ION fall into local optima for some of them, such as F11, F12, and F13. The results also indicate that the algorithms perform almost the same when the size of the problem, that is, the number of variables, is small.

The acquired results lead to the following findings:

  1. i)

    Trader, TGA, TE, and EPO converge faster than the others and can reach better results. However, EPO converges more slowly than TGA, Trader, and TE in the early steps; its convergence depends on the fourth quarter of its iterations, in which the range of the variables becomes smaller and smaller (Fig. 5a). Therefore, EPO converges faster in the last quarter. TE, TGA, and ION use a method similar to that of EPO and narrow the range of the variables as the iterations proceed; they therefore produce better results for some special problems such as F1 through F9.

  2. ii)

    EPO, TGA, and ION cannot produce the desired results for some problems such as F11 and F12 (Fig. 5b,c). The other algorithms outperform these three when the number of iterations is increased and can acquire better results.

  3. iii)

    For the small-sized benchmark functions such as F17 through F20, the algorithms have similar performance, and all of them can obtain the optimal solution (Fig. 5d).

  4. iv)

    The convergence of VIR, HTS, WCC, and CEFOA is slower than that of the other algorithms for some of the test functions (Fig. 5a). Nonetheless, they can reach acceptable results when the allocated time or the number of iterations is increased, whereas EPO, TE, and TGA cannot, because they fall into local optima.

For an accurate evaluation of the algorithms, we summarized their results over 50 distinct executions in Tables S2 through S4 (Supplementary File), reported to two decimal places, using the one-way ANOVA test. Table 2 lists the P-values of the algorithms compared with Trader as the test base. These P-values were obtained with the Wilcoxon rank sum test, which indicates to what extent two sets of results are the same47, and they show that the null hypothesis can be strongly rejected.
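For readers unfamiliar with the Wilcoxon rank sum test, a self-contained sketch using the normal approximation is given below. It is an illustration of the general test, not the routine used to produce Table 2; it uses average ranks for ties and applies neither a tie correction nor a continuity correction, and the function name is hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns (z, p): the standardized rank sum of sample x and its p-value.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1          # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, label) in zip(ranks, pooled) if label == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided tail of the standard normal
    return z, p
```

In the setting described above, `x` and `y` would hold the best OF values of two algorithms over their 50 executions; a small p-value indicates that the two sets of results are unlikely to come from the same distribution.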

Table 2 The obtained P-values of the algorithms based on their best results in different executions with Trader as a test base.

From the viewpoint of the average and standard deviation (Table 3), Trader performs well, but its results are close to those of EPO, TGA, and TE for the test functions F1 through F9. However, because of the nature of their operators, EPO, TGA, and TE are suitable only for problems whose optimal answer is 0. For this reason, their performance is the same for all of the benchmark functions. From the standard deviation (STD) aspect, HTS is the best algorithm and the best option when the range of the variables in a problem is small.

Table 3 The obtained mean and standard deviation values of the algorithms with [mean] ± [standard deviation] pattern.

The proposed machine learning method against the others

In the second part of this section, the performance of the proposed method (ANNTR) is evaluated on four gold standard datasets48 and compared against three state-of-the-art methods: the rotation forest-based drug-target (RFDT) predictor11, the Bayesian (BAY) ranking-based method22, and a relevance vector machine-based method14 (RVM). The datasets, named Enzyme, Ion channel, G-protein, and Nuclear receptor, consist of 4,449, 2,029, 1,268, and 168 DTI samples, respectively. The samples carry positive and negative labels, which show whether a given drug and target interact. The results, shown in Table 4, indicate that the proposed method outperforms the other methods overall. For every criterion on each dataset, the best outcome is shown in boldface.

Table 4 A comprehensive comparison between the 5-fold cross-validation results of the proposed method and the others.

Figures 6 and 7 show the receiver operating characteristic (ROC) and precision-recall (PR) curves, respectively, based on the 5-fold cross-validation test. The figures also report the area under the curve (AUC), which compares the performance of the methods on the datasets. Except for the enzyme dataset, ANNTR achieves better results than the three other methods. The proposed method obtains average AUC values of 0.9457 and 0.9708 for the ROC and PR curves, respectively, which are better than those of the three others. As shown in Figs 6 and 7, RFDT, RVM, and BAY respectively obtain average AUC values of 0.8736, 0.9216, and 0.9215 for the ROC, and 0.9248, 0.9581, and 0.9634 for the PR.
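As a reminder of what the ROC AUC measures, it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from this definition. It illustrates the metric only and is not the evaluation code behind Figs 6 and 7; the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for an interacting drug-target pair, 0 otherwise
    scores: predicted score of each pair (higher means more likely to interact)
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of samples, which is acceptable for datasets of the size used here; a rank-based formulation gives the same value in less time for larger data.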

Figure 6

The ROC curves of the methods on the four gold-standard datasets. (a) The ROC curves of the algorithms on the enzyme dataset. (b) The ROC curves of the algorithms on the ion channel dataset. (c) The ROC curves of the algorithms on the G-protein dataset. (d) The ROC curves of the algorithms on the nuclear receptor dataset. The AUC values are given alongside the four plots. Except for the enzyme dataset, the proposed method has obtained better results than the others. Furthermore, the average AUC of the proposed method is higher than that of the three others. ANNTR: Trader-based Artificial Neural network; RFDT: Rotation Forest-based Drug-Target predictor; RVM: Relevance Vector Machine; BAY: Bayesian ranking-based.

Figure 7

The PR curves of the methods on the gold-standard datasets. (a) The PR curves of the algorithms on the enzyme dataset. (b) The PR curves of the algorithms on the ion channel dataset. (c) The PR curves of the algorithms on the G-protein dataset. (d) The PR curves of the algorithms on the nuclear receptor dataset. The positive and negative datasets have the same size. The PR curves show the proper performance of the proposed method relative to the others. The average AUC of the proposed method is also higher than theirs. ANNTR: Trader-based Artificial Neural network; RFDT: Rotation Forest-based Drug-Target predictor; RVM: Relevance Vector Machine; BAY: Bayesian ranking-based.

The acquired results on the generated datasets

In the third part of the results, we investigated the performance of Trader on the generated DTI datasets (Table 1). On the testing datasets, 751 of 800 DTI samples were correctly predicted by the proposed method. For a more detailed evaluation, we compared the proposed method with three other popular and efficient classification methods: the support vector machine (SVM), the decision tree (DT), and the artificial neural network trained by the error back-propagation method (ANNEBP)15. The acquired results are shown in Table 5. Since the datasets contain only known DTIs, the problem is treated as a one-class classification problem. Thus, true positive and false positive rates are reported in Table 5.

Table 5 Acquired results using 10-fold cross-validation test on the generated datasets.

As reported in Table 5, ANNTR displays a higher DTI detection capability than the other methods. For a comprehensive evaluation of the methods, we used a 10-fold cross-validation test in which each dataset is divided into ten distinct sets. In each of the 10 iterations, nine sets are used as the training set and the remaining one is used as the test set. The convergence behavior of Trader on the generated datasets is shown in Fig. 8.
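In k-fold cross-validation, every fold serves exactly once as the test set while the remaining k − 1 folds form the training set. A minimal sketch of how such splits can be generated is shown below; the function name, the shuffling, and the seed are assumptions made for illustration and do not describe the authors' exact partitioning.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # assumed: samples are shuffled once
    folds = [idx[i::k] for i in range(k)]     # k nearly equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

Each sample appears in exactly one test set, so the metrics averaged over the k iterations use every sample once for testing and k − 1 times for training.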

Figure 8

The convergence behavior of Trader on all the generated datasets in training of the ANN. (a) The Convergence of Trader on the EN dataset. (b) The Convergence of Trader on the IC dataset. (c) The Convergence of Trader on the GP dataset. (d) The Convergence of Trader on the NR dataset. The results relate to the best-obtained outcomes from 50 distinct executions. For all the datasets, Trader has led to an acceptable value of the RMSE.

Furthermore, we generated the dataset of all potential drug-target interactions using the pseudocode (Fig. 1), resulting in 119,743 records. We then applied the obtained models to this dataset, and ANNTR predicted 47 new DTIs (Table 6).

Table 6 The detected drug-target interactions.

Discussion

The proposed ANNTR machine learning method, which is based on the new optimization algorithm, was developed and compared with well-known and efficient machine learning methods, and the acquired results were analyzed. Although many optimization algorithms have been proposed, they suffer from some limitations. Our proposed algorithm, which addresses these shortcomings, behaves much more stably than the others and trains the ANN appropriately. The findings also show that Trader, VIR, HTS, DVBA, CEFOA, ION, PSO, and WCC perform better than TGA, TE, and EPO. The main reasons are as follows: (i) Their operators make no assumptions about the optimal answer to a problem, whereas TGA, TE, and EPO include operators that make the range of the variables smaller and smaller and can therefore reach optimal answers faster for some of the problems. (ii) Their behavior is almost the same on the different benchmark functions, whereas TE, EPO, and TGA fall into local optima for some test functions.

As a case study, we applied the proposed ANNTR method to some biological datasets and to all the potential DTI data to find drugs that may affect targets, with the result that 47 DTIs were discovered. The predicted results can be used in three ways. First, they propose some unknown DTIs. If a disease is due to the predicted target, the related drug can be considered as an option for treating the disease. Further, the side effects of drugs can be determined by investigating the predicted relation between a drug and a target. Second, they show that some medications, such as D00086 and D00145, have an identical predicted target. They may therefore have similar functionality and could be used as alternatives to each other. These results can also be useful to pharmaceutical chemists who look for novel potential efficacy of drugs and to researchers who want to validate their predicted outcomes. Third, the predicted DTIs might reveal the real mechanism of action (MOA) of drugs49, which describes the pharmacological effect(s) of a drug.

For example, we predicted that diazoxide interacts with the angiotensin-I-converting enzyme (ACE). Previous studies on the clinical impact of diazoxide and ACE report that diazoxide can be used for the treatment of severe hypertension50, while ACE is responsible for controlling blood pressure51. Our results therefore suggest, for the first time, that diazoxide can affect ACE, whereas the MOA of diazoxide has been reported differently by others52. Similarly, ketotifen, which is used to relieve the effects of allergic conjunctivitis, is predicted to interact with ACE. In line with this prediction, a study conducted by Sanchez-Patan et al.53 showed that ketotifen decreases hypertension in rats.

Another example is erlotinib, which is used for treating epithelial lung cancer. Our proposed method has predicted that erlotinib interacts with muscle, skeletal, receptor tyrosine kinase (MuSK), antibodies against which are found in neuromuscular diseases. These diseases lead to various phenotypes such as less eye involvement, weakness, and pain in the neck. Some studies on the side effects of erlotinib54 may validate the predicted interaction.

Conclusion

A new optimization algorithm, named Trader, was introduced and compared with ten state-of-the-art optimization algorithms based on various statistical criteria. The results show that Trader outperforms the other optimization algorithms and addresses their limitations. As an empirical evaluation, we examined the performance of Trader in training a multi-layer perceptron artificial neural network to discover potential DTIs on the gold-standard and generated datasets. Averaged over the 5-fold cross-validation, the predictive model obtained with Trader achieved 94.62%, 94.24%, 94.80%, and 94.10% for accuracy, sensitivity, specificity, and precision, respectively. These values are better than the results acquired by the other methods. Furthermore, the proposed method predicted 47 potential DTIs. We envision that the outcomes of the proposed model may be used for managing possible side effects of medications, understanding the MOA of drugs, and finding new research opportunities. Taken together, this study may pave the way for de novo applications of computer-aided methods in drug discovery and development.