Improved Classification of Blood-Brain-Barrier Drugs Using Deep Learning

The Blood-Brain-Barrier (BBB) is a strict permeability barrier that maintains the homeostasis of the Central Nervous System (CNS). One of the most important criteria for judging a CNS drug is whether it can permeate the BBB. Over the past 20 years, prediction approaches have usually been based on the physical characteristics and chemical structure of drugs. However, these methods are usually applicable only to small-molecule compounds that cross the BBB by passive diffusion. To deal with this problem, one of the best-known methods is the multi-core SVM method, which predicts drug penetration of the BBB from clinical phenotypes, namely drug side effects and drug indications. This paper proposes a Deep Learning method to predict BBB permeability based on clinical phenotype data. Validation on three datasets shows that the Deep Learning method achieves better performance than the other existing methods. The average accuracy of our method reaches 0.97, the AUC reaches 0.98, and the F1 score is 0.92. The results show that Deep Learning methods can significantly improve the prediction accuracy of drug BBB permeability, which can help researchers reduce clinical trials and find new CNS drugs.

comparing the experiment results with other methods. What's more, the accuracy of the multi-core SVM method only reached 0.76, the AUC was 0.739 and the F1 score was 0.76, which leaves considerable room for improvement. For thousands of candidate drugs, every 1% increase in accuracy can save a lot of clinical testing time. The 0.76 accuracy of existing SVM-based methods is far from satisfying practical requirements.
This paper proposes a Deep Learning method, based on clinical features, for predicting the BBB permeability of drugs. At present, Deep Learning is widely used in image, sound and text recognition, where it has achieved many remarkable results. In recent years, researchers have also proposed many Deep Learning methods for drug prediction and achieved excellent results [30][31][32] . However, Deep Learning is still rarely applied to predicting the BBB permeability of CNS drugs. Our paper therefore also examines whether a Deep Learning method is effective in predicting a drug's BBB penetration from clinical features. Compared with existing methods, our method has the following advantages: (i) Across the experiments on three datasets, the average prediction accuracy reaches 0.97, the average AUC is 0.98 and the F1 score is 0.91. This is significantly better than the multi-core SVM, Decision Tree and KNN methods, which can help researchers save experiment time and discover new drugs. (ii) The accuracy, AUC and F1 scores of the SVM methods fluctuate greatly across datasets, whereas the accuracy of the Deep Learning method proposed in this paper is very stable and adaptable. (iii) The Deep Learning method can be applied both to small-molecule compounds that cross the BBB by simple diffusion and to other compounds that cross it through complex pathways. In summary, this paper proposes a Deep Learning method, based on clinical features, for predicting drug BBB permeability, and our results are better than those of previous research such as multi-core SVM methods. In the future, we will experiment with more types of drug data and hope our method can be applied to different diseases.
The remaining sections are organized as follows. Section III introduces the datasets and how we established them. Section IV describes the Deep Learning method designed for predicting BBB permeability. Section V presents the performance analysis, which compares the Deep Learning method with multi-core SVM, KNN and Decision Tree (DT) on the three datasets. Section VI discusses the advantages of the Deep Learning method proposed in this paper. Section VII concludes and describes future work.

Results
The experiments compare the Deep Learning method with Sigmoid-Support Vector Machine (Sigmoid-SVM), POLY-Support Vector Machine (POLY-SVM), Radial Basis Function-Support Vector Machine (RBF-SVM), K-Nearest Neighbor (KNN) and Decision Tree (DT). We tested the three datasets independently. Each test used 1000 random splits of the samples into mutually exclusive training sets (70%) and validation sets (30%). We also applied 5-fold cross-validation to the training and validation datasets.
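The repeated random-split protocol can be sketched in plain Python. This is a minimal sketch rather than the authors' code: the helper names `random_split` and `repeated_split_scores` are hypothetical, and so is the `evaluate` callback, which is assumed to train a model on the training indices and return one validation score.

```python
import random
import statistics

def random_split(n_samples, train_fraction=0.7, rng=None):
    """Return disjoint (train, validation) index lists for one random split."""
    rng = rng or random.Random()
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = int(round(train_fraction * n_samples))
    return indices[:cut], indices[cut:]

def repeated_split_scores(n_samples, evaluate, n_repeats=1000, seed=0):
    """Call evaluate(train_idx, val_idx) on many random splits.

    Returns the mean and standard deviation of the returned scores,
    which is the "average ± std" form used to report results.
    """
    rng = random.Random(seed)
    scores = [evaluate(*random_split(n_samples, 0.7, rng)) for _ in range(n_repeats)]
    return statistics.mean(scores), statistics.stdev(scores)

# One 70/30 split of a dataset with 91 samples (the size of Dataset 1).
train_idx, val_idx = random_split(91, rng=random.Random(0))
print(len(train_idx), len(val_idx))  # 64 27
```

Fixing the seed makes the sequence of splits reproducible, so different classifiers can be compared on exactly the same partitions.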
We adopted several evaluation measures to ensure the reliability of the results. First, we calculated the accuracy on the training and validation datasets to evaluate the learning methods. However, accuracy is not always a valid measure of learning performance, especially when the numbers of true and false samples in a dataset differ greatly. We therefore also calculated the F1 score, a statistical indicator of the accuracy of binary classification models that takes both the precision and the recall of the model into account. Finally, to judge the performance of the learning models intuitively, we drew the Receiver Operating Characteristic (ROC) curve and calculated the Area Under the Curve (AUC). We calculated all the indicators for both the training and prediction datasets. Because the analysis only requires the results on the prediction dataset, we do not list the results on the training dataset in the manuscript. The detailed results of Datasets 1-3 are shown in the Supplementary Table S1.

Figure 1. Mechanisms of drugs passing BBB and the applicable scope of prediction methods 29 . The right part presents the blood vessel, which shows the mechanisms for drug passing BBB, and the left part is the brain, which shows the scope of clinical drug phenotype based and chemical feature based BBB permeability prediction methods 29 .

Predictive performance of different methods with Dataset 1 and Dataset 2. We first established Datasets 1 and 2 and validated the performance of the different learning methods on them, using features based on drug side effects, drug indications, and drug side effects (SE) + indications. We then collected and analyzed the results of each individual test. Table 1 and Fig. 2 show the results of the different methods on Dataset 1.
According to Table 1, apart from the Deep Learning method, the RBF-SVM method achieved the best results, with an accuracy of 0.84, an AUC of 0.84 and an F1 score of 0.73. The Deep Learning method nevertheless performs best: compared with RBF-SVM, its AUC increases by 13.9%, its accuracy by 12% and its F1 score by 17.4%. The results therefore show that the Deep Learning method performs better than the other methods in the experiments with Dataset 1.
To further verify the performance of the learning models, we ran the experiments with Dataset 2, which has a larger number of samples. The predictive performance on Dataset 2 is shown in Table 2. The ROC curves of Dataset 2 for drug side effects, indications, and drug side effects + indications are shown in Fig. 3.

The predictive performance on the Independent Dataset is shown in Table 3. The ROC curves for drug side effects, indications, and drug side effects (SE) + indications on the Independent Dataset are shown in Fig. 4. According to Table 3

Inter dataset validation. To verify the versatility of the Deep Learning method, we also performed inter dataset validation. We used Dataset 2 as the training dataset and Dataset 1 as the validation dataset. The inter dataset validation results are shown in Table 4.
The results show that the Deep Learning method proposed in this paper achieves the desired effect. The best accuracy is 0.97, the AUC is 0.98, and the F1 score is 0.92. This indicates that, among the compared methods, the Deep Learning method has the best versatility across different datasets.
In summary, the prediction accuracy of the Deep Learning method is very stable, always between 0.96 and 0.98. In contrast, the best-performing method other than Deep Learning differs from dataset to dataset, and its accuracy fluctuates noticeably with the number of samples in the dataset and the numbers of positive and negative samples. Moreover, the AUC and F1 score of the Deep Learning method also remain at a relatively high level. We also performed inter dataset validation to demonstrate the versatility of the Deep Learning method. Therefore, under the conditions described in this paper, we consider the performance of the Deep Learning method to be better than that of the other existing methods.

Each test data field shows the average ± std of 1000 random splits of training and test data.
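For reference, the three indicators used throughout this section (accuracy, F1 score and AUC) can be computed as in the sketch below. It is an illustration under stated assumptions, not the authors' evaluation code: labels are assumed to be 1 for BBB permeability true and 0 for false, the function names are hypothetical, and the AUC is computed through its pairwise-ranking interpretation instead of by integrating a drawn ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as one half. This equals the area under the ROC curve.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

Unlike accuracy, the F1 score and the AUC do not reward a classifier for simply predicting the majority class, which is why they matter for the imbalanced Datasets 1 and 2.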

Discussion
Research on neurological diseases has a long history, and such research can help to cure these diseases. At present, most researchers still use various data mining algorithms based on different chemical characteristics to predict drugs' BBB permeability 17,18 . To further improve the performance of drug prediction models, researchers are still experimenting with many new physical and chemical features, such as 2D molecular descriptors and molecular fingerprints, and with machine learning methods such as Gaussian processes, the Synthetic Minority Oversampling Technique (SMOTE) and SMOTE + edited nearest neighbor 19,[33][34][35][36] . In fact, to improve the predictive methods, scientists have tried more than 1,000 chemical descriptors, many of which rely on esoteric quantum chemical calculations for which accurate data are difficult to obtain with existing techniques 37 . Besides computational complexity, there are situations in which chemical features are not available: some drugs and biologics have no precisely defined structure, and most nutrients, nutrient analogs and certain physiologically important macromolecules pass through the BBB by more complex biologically active mechanisms 27,28,38 . In such cases, a model trained on agents that cross the BBB by passive diffusion will predict BBB penetration with low accuracy. On the other hand, without the support of elaborate in vivo experiments, scientists can predict neither the mechanism by which a drug penetrates the BBB nor the applicability of the model. Researchers have made many attempts to solve this problem, such as establishing in vitro models, which can clarify the mechanism of BBB development and help researchers predict the BBB permeability of drugs 39,40 . However, these methods still cannot completely solve the problem that small molecule drugs cannot be predicted.
In this case, researchers considered using drug side effect and drug indication information to predict BBB penetration. The advantage is that most drugs have undergone extensive clinical application and accumulated a wealth of information. Methods of this kind can greatly broaden the prediction range of CNS drugs.
For a long time, researchers often overlooked the relation between the clinical phenotype and the efficacy of CNS drugs. To cross this barrier, Gao et al. showed that data mining methods can effectively connect these two features 29 . However, prediction with data mining methods still has a problem: the accuracy is relatively low, which means that clinical researchers still need to spend more time and effort verifying the effectiveness of a drug.
We think that, unlike features based on physics and chemistry, the relation between drug side effects and indications is more abstract and deeper. Traditional machine learning methods might therefore not find the relation between the data and the results very efficiently, which is why their classification results are not ideal. Deep Learning, by contrast, is well suited to handling data with abstract relations. To cope with the small amount of clinical drug data, we tried several Deep Learning networks of different depths. The results show that datasets of this kind are not suitable for very deep networks and call for a moderate-size Deep Learning model. The purpose of our research is therefore to find a novel classification method that predicts drug BBB permeability from the clinical phenotype more effectively. The experimental results support our view that a Deep Learning model of appropriate size and depth can capture an effective relation between the clinical performance and the efficacy of drugs. Because these relations lie at a deep level, general machine learning models give unsatisfactory results, whereas a Deep Learning model performs better. The experimental results show that the Deep Learning method proposed in this paper greatly improves the final classification results. We think the method proposed in this paper is very helpful for computational CNS drug prediction and for saving the time and cost of clinical trials.
Although the Deep Learning method proposed in this paper has many advantages, it is worth noting that it still cannot predict how a drug penetrates the BBB. This question is of great significance to biology, because without an answer we cannot distinguish between the side effects and the secondary effects caused by the penetration of the compound into the BBB. Therefore, in the future, we will consider combining drug clinical phenotypic effects with drug chemical structure characteristics to determine the general route by which a drug penetrates the BBB. For example, if a drug appears to be permeable in a clinical phenotype-based model but not permeable in a physical and chemical feature-based model, the drug may enter the body indirectly through other means.

Conclusion
This paper proposes a Deep Learning method to predict the permeability of the Blood-Brain-Barrier based on clinical phenotypes. Independent tests on three datasets show that the Deep Learning method performs better than multi-core SVMs, KNNs and Decision Trees. Moreover, the prediction accuracy for CNS drugs with our Deep Learning method increases by more than 15%. The Deep Learning method proposed in this paper adopts the clinical phenotypic approach, which means that our method has a wider applicable scope and can reduce the workload of many clinical drug trials.

Materials and Methods
Datasets of clinical drug phenotypes. Based on the existing literature, we collected the names of drugs whose BBB permeability has been clinically established as true or false, together with the corresponding SIDER data.
The SIDER (http://sideeffects.embl.de/) dataset is a public dataset that contains a large number of drug side effects and drug indications 41 . We extracted the characteristics of the drugs from this dataset. No complete BBB permeability dataset is currently available on the Internet, so we refer to the study published in 2016 29 that collected experimental datasets from six other academic papers 20,37,[42][43][44][45] . Based on this drug dataset, we classified the drugs into two categories: BBB permeability true and BBB permeability false.
The clinical drug phenotypes (side effects and indications) in the SIDER database are formatted according to the Medical Dictionary for Regulatory Activities (MedDRA, http://www.meddra.org/). MedDRA divides clinical phenotypes into 5 levels: Lowest Level Term (LLT), Preferred Term (PT), High-Level Term (HLT), High-Level Group Term (HLGT) and System Organ Class (SOC). A PT is a distinct descriptor that carries information about symptoms, therapeutic indications, diagnoses and so on. From the High-Level Group Terms (HLGT) for neurological diseases, we selected 43 terms as the clinical phenotypic characteristics of drugs; the details are listed in Supplementary Table S5. Each HLGT also contains specific side effects and indications (PT). We then took, for each drug, the number of matches under each specific HLGT group as the training features 29 . More details are listed in Supplementary Table S6.

In summary, we established three datasets. The first dataset follows the paper by Doniger et al. 20 and has 91 samples in total, of which 38 are BBB permeability true and 53 are BBB permeability false. The second dataset follows the papers 29,37,[42][43][44]46 and has 210 samples in total, of which 136 are BBB permeability true and 74 are BBB permeability false. However, the sample distributions of Dataset 1 and Dataset 2 are imbalanced.
To address the imbalance in the sample numbers of these datasets, we established a third, Independent Dataset. It has 161 samples in total, of which 76 are BBB permeability true and 85 are BBB permeability false. The basic information of these datasets is shown in Table 5. The details of these datasets are given in Supplementary Tables S7-S9. The drug side effects and indications based on the SIDER dataset are listed in Supplementary Tables S10 and S11.
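The count-based features described in this subsection (for each drug, the number of matching PTs under each selected HLGT group) can be sketched as follows. The function name, the mapping `pt_to_hlgt` and the toy terms are hypothetical and chosen only for illustration; they are not taken from SIDER or from the Supplementary Tables, and the sketch assumes that each PT is assigned to a single HLGT group.

```python
from collections import Counter

def hlgt_count_features(drug_terms, pt_to_hlgt, hlgt_groups):
    """Build one feature vector for a drug.

    drug_terms  : preferred terms (PT) reported for the drug
    pt_to_hlgt  : dict mapping a PT to its HLGT group
    hlgt_groups : ordered list of the selected HLGT groups
    Returns the number of the drug's PTs that fall in each group,
    in the order given by hlgt_groups.
    """
    counts = Counter(pt_to_hlgt[pt] for pt in drug_terms if pt in pt_to_hlgt)
    return [counts.get(group, 0) for group in hlgt_groups]

# Toy example with three illustrative groups instead of the 43 selected terms.
groups = ["Headaches", "Seizures", "Sleep disturbances"]
mapping = {
    "Migraine": "Headaches",
    "Tension headache": "Headaches",
    "Insomnia": "Sleep disturbances",
}
side_effects = ["Migraine", "Insomnia", "Nausea", "Tension headache"]
print(hlgt_count_features(side_effects, mapping, groups))  # [2, 0, 1]
```

Terms outside the selected groups ("Nausea" in the toy example) are ignored, so every drug yields a vector of the same length regardless of how many terms it has.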
System model of Deep Learning method. Deep Learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state of the art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep Learning discovers complicated structure in datasets by using the back-propagation algorithm, which indicates how the model should change its internal parameters. These parameters are used to compute the representation in each layer from the representation in the previous layer 47 . The number of layers required varies with the complexity of the dataset. We think that although the relation between the clinical side effects and the indications of drugs may not be strong, there may be a deeper relation between clinical expressiveness and final efficacy; that is, clinical expressiveness affects final efficacy. Such a relation fits the main idea of Deep Learning, which is to discover deeper relations in the data through a multi-layer network and the back-propagation algorithm. We therefore established a Deep Learning model to verify this idea. Based on the number of samples and the dimensions of the drug datasets processed in this paper, we propose a four-layer Deep Learning model to deal with these datasets. The proposed model is shown in Fig. 5.
Hidden layer selection. Given the numbers of nodes in the input layer and the output layer of the Deep Learning network, we calculated the number of nodes in the hidden layer using the following equation:

$$h = \sqrt{m + n} + \alpha \tag{1}$$

where $h$ is the number of hidden layer nodes, $m$ is the number of input layer nodes, $n$ is the number of output layer nodes, and $\alpha$ is an adjustment constant between 1 and 10; generally, $\alpha = 1$.
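As a worked example of the hidden layer selection, the sketch below assumes the commonly used empirical rule h = sqrt(m + n) + α and rounds the result to the nearest integer. Both the rounding and the function name are assumptions made for illustration, and the input size of 43 (one feature per selected HLGT term) with a single output node is only an example configuration.

```python
import math

def hidden_layer_size(n_inputs, n_outputs, alpha=1):
    """Empirical rule h = sqrt(m + n) + alpha, rounded to the nearest integer."""
    if not 1 <= alpha <= 10:
        raise ValueError("alpha is expected to lie between 1 and 10")
    return round(math.sqrt(n_inputs + n_outputs) + alpha)

# Example: 43 input features, 1 output node, alpha = 1.
print(hidden_layer_size(43, 1))  # 8
```

Larger values of α give wider hidden layers; with few training samples, a small α keeps the number of trainable weights low.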
Forward pass subprocess. Let $w_{ij}$ be the weight between node $i$ and node $j$, $b_j$ the threshold of node $j$, and $x_j$ the output value of node $j$. The output value of each node in the current layer is determined by the output values of all nodes in the previous layer, the connecting weights, the threshold of the node and an activation function:

$$S_j = \sum_{i} w_{ij} x_i + b_j \tag{2}$$

$$x_j = f(S_j) \tag{3}$$

where $S_j$ is the weighted input of node $j$ and $f$ is the activation function, here the sigmoid function:

$$f(x) = \frac{1}{1 + e^{-x}} \tag{4}$$

The computation proceeds from top to bottom and then from left to right, and this order must be followed strictly to complete the entire forward pass.
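The forward pass of a single layer can be sketched in a few lines of Python. The sketch assumes the standard form in which each node applies the sigmoid function to the weighted sum of the previous layer's outputs plus its threshold; the function names and the nested-list layout of the weights are illustrative choices.

```python
import math

def sigmoid(x):
    """Sigmoid activation function f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, thresholds):
    """Compute the outputs of one layer.

    inputs     : output values x_i of the previous layer
    weights    : weights[i][j] is w_ij, from previous-layer node i to node j
    thresholds : thresholds[j] is b_j
    """
    outputs = []
    for j, b_j in enumerate(thresholds):
        s_j = sum(weights[i][j] * x_i for i, x_i in enumerate(inputs)) + b_j
        outputs.append(sigmoid(s_j))
    return outputs

# Two inputs feeding two nodes; both weighted sums are 0 here.
print(layer_forward([1.0, 2.0], [[0.5, -1.0], [0.25, 0.5]], [-1.0, 0.0]))  # [0.5, 0.5]
```

Applying `layer_forward` once per layer, each time feeding it the outputs of the previous call, gives the forward pass of the whole network.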
Reverse transfer subprocess. After finishing the forward pass, we construct the reverse transfer (back-propagation) process. The most important part of this process is the adjustment of the weights and thresholds between each pair of adjacent layers. The specific adjustment steps are as follows:

Step 1. Let $d_j$ be the expected result of output node $j$ and $x_j$ its actual output. The error function is:

$$E(w, b) = \frac{1}{2} \sum_{j} (d_j - x_j)^2 \tag{5}$$

Step 2. According to the gradient descent method, the weights and thresholds are modified repeatedly in order to minimize the error function. The correction of the weight vector is proportional to the gradient of $E(w, b)$ at the current position. For the output node $j$:

$$\Delta w_{ij} = -\eta \frac{\partial E(w, b)}{\partial w_{ij}} \tag{6}$$

where $\eta$ is the learning rate.

Step 3. To calculate the weights and thresholds between the hidden layer and the output layer, we differentiate the activation function given by equation (4). The gradient with respect to $w_{ij}$ then follows from equations (7) and (8), and $\delta_{ij}$ and $b_j$ are calculated by equations (9) and (10).

Step 4. Calculate the thresholds between the two hidden layers and between the input layer and the hidden layer. In equations (11) and (12), $w_{mn}$ is the weight between node $m$ of the first hidden layer and node $n$ of the second hidden layer, and $w_{ki}$ is the weight between node $k$ of the input layer and node $i$ of the hidden layer. The terms $\delta_{ki}$ and $\delta_{mn}$ are calculated by equations (13) and (14).

Step 5. According to the gradient descent method and the formulas above, equations (15) and (16) are used to adjust the weights and thresholds between the hidden layer and the output layer, equations (17) and (18) to adjust those between the two hidden layers, and equations (19) and (20) to adjust those between the input layer and the hidden layer.

This completes the reverse transfer process of the Deep Learning method proposed in this paper. To complete the learning process of the entire network, the weights and thresholds must be adjusted continuously. An error threshold or a maximum number of cycles can be set as the stop criterion that ends the learning process.

Figure 5. The four-layer Deep Learning model constructed in this paper. x represents the data of each input node and D S represents the data of each output node. W ki is the weight between the input layer and the hidden layer, w mn is the weight between the first hidden layer and second hidden layer and w ij is the weight between the hidden layer and the output layer.
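The whole learning procedure (forward pass, reverse transfer and stop criterion) can be sketched in plain Python as below. This is a textbook back-propagation sketch under stated assumptions, not the authors' implementation: it assumes sigmoid activations in every layer, the squared-error function E = 1/2 Σ (d_j − x_j)², and a single learning rate `lr` for weights and thresholds. All function names, the initial weight range and the default hyperparameters are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_network(layer_sizes, seed=0):
    """Create one (weights, thresholds) pair per pair of adjacent layers."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
        layers.append((weights, [0.0] * n_out))
    return layers

def forward(layers, x):
    """Return the output values of every layer, the input layer included."""
    activations = [list(x)]
    for weights, thresholds in layers:
        prev = activations[-1]
        activations.append([
            sigmoid(sum(weights[i][j] * prev[i] for i in range(len(prev))) + thresholds[j])
            for j in range(len(thresholds))
        ])
    return activations

def train_step(layers, x, target, lr=0.5):
    """One gradient-descent update; returns the error measured before the update."""
    acts = forward(layers, x)
    out = acts[-1]
    error = 0.5 * sum((d - o) ** 2 for d, o in zip(target, out))
    # Output layer: delta_j = (d_j - x_j) * x_j * (1 - x_j), i.e. minus dE/dS_j.
    deltas = [(d - o) * o * (1.0 - o) for d, o in zip(target, out)]
    for index in range(len(layers) - 1, -1, -1):
        weights, thresholds = layers[index]
        prev = acts[index]
        prev_deltas = []
        if index > 0:
            # Propagate the deltas backwards using the weights before they change.
            prev_deltas = [
                sum(weights[i][j] * deltas[j] for j in range(len(deltas)))
                * prev[i] * (1.0 - prev[i])
                for i in range(len(prev))
            ]
        for i in range(len(prev)):
            for j in range(len(deltas)):
                weights[i][j] += lr * deltas[j] * prev[i]  # w <- w - lr * dE/dw
        for j in range(len(deltas)):
            thresholds[j] += lr * deltas[j]  # b <- b - lr * dE/db
        deltas = prev_deltas
    return error

def train(layers, samples, max_epochs=5000, error_threshold=0.01, lr=0.5):
    """Repeat updates until the summed error is small or the epoch limit is hit."""
    epoch, total = 0, float("inf")
    for epoch in range(1, max_epochs + 1):
        total = sum(train_step(layers, x, d, lr) for x, d in samples)
        if total < error_threshold:
            break
    return epoch, total
```

A four-layer model is obtained with a call such as `init_network([43, 8, 8, 1])`, where the layer sizes are illustrative. The arguments `error_threshold` and `max_epochs` of `train` correspond to the two stop criteria mentioned above.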