Artificial intelligence in cancer target identification and drug discovery

You, Yujie; Lai, Xin; Pan, Yi; Zheng, Huiru; Vera, Julio; Liu, Suran; Deng, Senyi; Zhang, Le

doi:10.1038/s41392-022-00994-0

Review Article
Open access
Published: 10 May 2022

Yujie You¹^na1,
Xin Lai ORCID: orcid.org/0000-0003-4913-5822²^na1,
Yi Pan³,
Huiru Zheng⁴,
Julio Vera²,
Suran Liu¹,
Senyi Deng⁵ &
…
Le Zhang ORCID: orcid.org/0000-0002-3708-1727^1,6,7

Signal Transduction and Targeted Therapy volume 7, Article number: 156 (2022)

Abstract

Artificial intelligence is an advanced method to identify novel anticancer targets and discover novel drugs from biology networks because the networks can effectively preserve and quantify the interaction between components of cell systems underlying human diseases such as cancer. Here, we review and discuss how to employ artificial intelligence approaches to identify novel anticancer targets and discover drugs. First, we describe the scope of artificial intelligence biology analysis for novel anticancer target investigations. Second, we review and discuss the basic principles and theory of commonly used network-based and machine learning-based artificial intelligence algorithms. Finally, we showcase the applications of artificial intelligence approaches in cancer target identification and drug discovery. Taken together, the artificial intelligence models have provided us with a quantitative framework to study the relationship between network characteristics and cancer, thereby leading to the identification of potential anticancer targets and the discovery of novel drug candidates.

Introduction

As one of the cutting-edge cancer treatments, targeted drug therapy has the advantages of high efficiency, few side effects, and low drug resistance for patients¹. However, existing targeted therapies have several drawbacks, such as the small number of druggable targets², ineffective coverage of the patient population, and the lack of alternative options when patients develop drug resistance¹. Therefore, identifying novel therapeutic targets and evaluating their druggability^3,4 has become a current focus of cancer research on targeted drug therapy.

Because the complexity of cancer makes its pathogenesis difficult to understand comprehensively⁵, most current targeted drugs are developed from an experimentally validated hypothesis that explains one possible mechanism underlying carcinogenesis but ignores other aspects of the disease⁶. As a result, these therapies could have undesired impacts on normal tissues and even provoke serious side effects for patients^7,8.

To elucidate the molecular mechanisms underlying cancer genesis, interactome data can be compiled and modelled as network structures in which nodes are biological entities (e.g., genes, proteins, mRNAs, and metabolites) and edges are associations/interactions between them (e.g., gene co-expression, signalling transduction, gene regulation, and physical interaction between proteins^{9,10,11,12,13,14}). Artificial intelligence biology analysis algorithms, which build machines or programs that simulate human intelligence, are an effective means of processing biological network data and of implementing classification, clustering, and prediction tasks on biological networks¹⁵. Therefore, artificial intelligence algorithms can effectively tackle the complexity of cancer that arises from interactions between genes and their products^16,17 in biological network structures, so as to improve our understanding of carcinogenesis^{11,12,18,19,20,21,22} and explore novel anticancer targets^{23,24,25,26,27,28,29}.

Over the past few decades, artificial intelligence biology analysis algorithms have developed rapidly. To make this review easy to follow, we divide these algorithms into network-based biology analysis algorithms and machine learning-based (ML-based) biology analysis algorithms according to the data of the biological network structure, and we use Fig. 1 to describe the historical milestones of these algorithms.

On the one hand, network-based biology analysis algorithms provide a variety of alternative network approaches to identify cancer targets. More importantly, because different network-based biology analysis algorithms investigate network data from different perspectives, they can complement each other to provide accurate biological explanations³⁰.

On the other hand, ML-based biology analysis^31,32,33 can not only efficiently handle high-throughput, heterogeneous, and complex molecular data but also mine the features and relationships in biological networks. Thus, more ML-based biology analysis algorithms should be developed to provide advanced biology analyses that allow precise target identification and drug discovery for cancer.

Although artificial intelligence biology analysis has been widely used to improve our understanding of carcinogenesis, to the best of our knowledge, there is no systematic review that introduces the scope of related research and explains the network-based and the ML-based biology analysis algorithms to identify novel anticancer targets and discover drugs. Therefore, in the next section, we will describe the scope of artificial intelligence biology analysis for novel anticancer targets investigation. In the third section, we will introduce the basic principles and theory of commonly used artificial intelligence biology analysis algorithms. Then, we will briefly review and discuss studies that utilize network-based and ML-based biology analysis for cancer target identification and drug discovery. Finally, we will summarize the content of the article, discuss the limitations and challenges faced by the community, and point out the potential of artificial intelligence biology analysis to identify the therapeutic targets and discover drugs for cancer.

The scope of artificial intelligence biology analysis for novel anticancer target investigations

Recently, the rapid development of cancer-related multiomics technologies^34,35,36 has been one of the most important factors for artificial intelligence biology analysis to explore novel anticancer targets^37,38,39. Figure 2 classifies these technologies into five aspects: epigenetics, genomics, proteomics, metabolomics, and multiomics integration analysis. Furthermore, Table 1 lists the related major diseases, drug targets, genomics, and network databases commonly used in multiomics integration analysis for these five aspects. Next, we will detail these five aspects.

Table 1 Commonly used repositories related to human diseases, drug targets, genomics, and biological networks

Epigenetics analyses the reversible modifications of DNA or DNA-related proteins⁵⁴. These modifications affect gene expression without changing the DNA sequence⁵⁴. Investigating epigenetic data through artificial intelligence is not only important for elucidating fundamental mechanisms of cancer but also necessary for the design of targeted therapeutics. For example, Wilson et al.⁵⁵ took advantage of information-rich transcriptomic and epigenetic data to study regulatory networks surrounding histone lysine demethylation and highlighted the importance of epigenetic regulators in mitogenic control and their potential as therapeutic targets. Their results showed that epigenetic regulators such as KDM1A, KDM3A, EZH2, and DOT1L⁵⁶ are critical in oncogenesis and drug resistance.

Genomics aims to characterize the function of every genomic element of an organism by using genome-scale assays such as genome sequencing⁵⁷. Applications of genomics include finding associations between genotype and phenotype⁵⁸, discovering biomarkers for patient stratification⁵⁹, predicting the function of genes⁶⁰ and charting biochemically active genomic regions such as transcriptional enhancers⁴⁹. Recent developments in network-based biology analysis methods, such as sequence-similarity networks, genome networks, and gene family networks, have significantly improved the usability of molecular datasets in comparative genomics analysis⁶¹. These network methods collect expression and interaction data in the beginning and then transform them into interpretable biological processes^62,63, leading to the identification of tumour subtypes and the discovery of drug targets⁶⁴.

For example, Medi et al.⁶⁵ integrated gene expression profiles into genome-scale molecular networks to identify novel therapeutic targets for cervical cancer, including receptors, microRNAs (miRNAs), transcription factors (TFs), proteins (e.g., CRYAB, CDK1, PARP1, WNK1, GSK3B, and KAT2B), and metabolites (arachidonic acids). Laura et al.⁶⁶ developed a network-based biology analysis workflow that integrates different layers of genomic information, including transcription factor cotargeting, miRNA cotargeting, protein–protein interaction and gene coexpression, into a biological network. Then, the authors applied a consensus clustering algorithm (an ML-based biology analysis algorithm that divides the network into sub-modules with different functions)^{67,68,69,70,71,72,73} to the identified network communities to discover cancer driver genes, which demonstrated that F11R, HDGF, PRCC, ATF3, BTG2, and CD46 could be oncogenes and promising markers for pancreatic cancer.

For proteomics, proteomic experiments are performed for annotation and correlation of genome sequences, quantitation of protein abundance, detection of posttranslational modifications, and identification of protein-protein interactions (PPIs)⁷⁴. PPIs not only play fundamental roles in structuring and mediating biological processes but also have been widely used for proteomics data analysis⁷⁵. For example, Vinayagam et al.³⁷ analysed the human PPI network with control theory⁷⁶ to identify indispensable proteins that affect the controllability of the network. According to control theory, a system is controllable if it can be driven from any initial state to any desired final state in finite time with a suitable choice of inputs. According to how the number of driver nodes in the network changes upon removal of a protein, that protein can be classified as “indispensable”, “neutral”, or “dispensable”, which corresponds to an increase, no change, or a decrease in the number of driver nodes, respectively. The evidence shows that these indispensable proteins are primary targets of disease-causing mutations, viruses, and drugs.
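The classification described above can be illustrated with a minimal sketch. It assumes the structural-controllability formulation in which the number of driver nodes equals the number of nodes left unmatched by a maximum matching of the directed network (with a minimum of one driver node); the source does not state how driver nodes are computed, so this choice, the function names, and the toy network are all illustrative assumptions rather than the method of Vinayagam et al.

```python
def count_driver_nodes(graph):
    """Number of driver nodes of a directed network.

    graph maps each node to the list of nodes it points to.
    Assumption: drivers = nodes unmatched by a maximum matching (at least 1).
    """
    match_of_target = {}  # target node -> source node matched to it

    def try_match(source, visited):
        # Augmenting-path search (Kuhn's algorithm) for bipartite matching.
        for target in graph[source]:
            if target in visited:
                continue
            visited.add(target)
            if target not in match_of_target or try_match(match_of_target[target], visited):
                match_of_target[target] = source
                return True
        return False

    matched = sum(1 for source in graph if try_match(source, set()))
    return max(len(graph) - matched, 1)


def classify_node(graph, node):
    """Classify a node by how its removal changes the number of driver nodes."""
    before = count_driver_nodes(graph)
    reduced = {u: [v for v in vs if v != node] for u, vs in graph.items() if u != node}
    after = count_driver_nodes(reduced)
    if after > before:
        return "indispensable"
    if after < before:
        return "dispensable"
    return "neutral"


# Hypothetical toy network (not real data): a -> b, b -> c, b -> d.
toy_network = {"a": ["b"], "b": ["c", "d"], "c": [], "d": []}
labels = {node: classify_node(toy_network, node) for node in toy_network}
```

In this toy network, removing b disconnects everything and raises the number of driver nodes, whereas removing one of the two leaves lowers it.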

Furthermore, analysing data from 1,547 cancer patients revealed 56 indispensable genes in nine cancers. 46 of these genes were associated with cancer for the first time, demonstrating the ability of intelligent network controllability analysis to identify novel disease genes and potential drug targets⁷⁷. Moreover, Valle et al.⁷⁸ developed a network-based biology analysis framework to compute the proximity between polyphenol targets and disease proteins. The calculated results indicated that the diseases whose proteins are proximal to polyphenol targets have significant gene expression changes, while the diseases whose proteins are distal to polyphenol targets have no such change. The network relationship between disease proteins and polyphenol targets provides not only a computing method to reveal the effect of polyphenols on diseases but also a basis to identify novel anticancer targets.
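As a rough illustration of network proximity, the sketch below uses one common definition, the "closest" measure: the average, over all targets, of the shortest-path distance to the nearest disease protein. The source does not give the formula used by Valle et al., so this definition, the function names, and the toy interactome are assumptions, and the statistical normalization usually applied to such distances is omitted.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def closest_proximity(graph, targets, disease_proteins):
    """Average over targets of the distance to the nearest disease protein."""
    nearest = []
    for target in targets:
        dist = bfs_distances(graph, target)
        reachable = [dist[p] for p in disease_proteins if p in dist]
        if reachable:
            nearest.append(min(reachable))
    return sum(nearest) / len(nearest) if nearest else float("inf")


# Hypothetical toy interactome (undirected chain A-B-C-D-E), not real data.
toy_graph = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D"],
}
proximal = closest_proximity(toy_graph, targets=["B"], disease_proteins=["A", "C"])
distal = closest_proximity(toy_graph, targets=["A"], disease_proteins=["D", "E"])
```

A smaller value means that the targets lie closer to the disease proteins in the network.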

Metabolomics is routinely applied for biomarker discovery by profiling metabolites in biofluids, cells and tissues³⁴. Because of the inherent sensitivity of biotechnology, subtle alterations in metabolic pathways can be detected to provide insights into the mechanisms that underlie various physiological conditions and cancer progression³⁴. Owing to innovative developments in network biology, researchers employ biological networks to perform metabolomic analyses and provide us with a systems-level understanding of the role that metabolites play in cancer.

For example, Basler et al.⁷⁹ proposed an effective network-based biology analysis framework for the systematic study of flow control and identification of driver reactions in large-scale metabolic networks. They found that the driver reactions were under complex cellular regulation in Escherichia coli, suggesting their preeminent role in facilitating cellular control. Correlation statistics indicate that the driver reactions play an important role in inhibiting tumour growth and represent potential therapeutic targets.

For multiomics integration analysis, addressing the complexity of tumour-host interactions requires an approach to handle integrative omics data⁸⁰. Compared to single omics studies, multiomics data provide researchers with various and interconnected molecular profiles to study carcinogenesis⁸⁰. Thus, integrating multiomics datasets into a network structure for artificial intelligence biology analysis has emerged as a powerful way to fully appreciate the complex interlayer regulatory interactions in cancer progression. Such an approach allows us to benefit from prior information that can be summarized and presented in networks, thereby providing us with insights into carcinogenesis from an overall perspective⁸¹.

For example, Gov et al.⁸² first performed comparative analyses of transcriptome data, and then identified common and tissue-specific reporter biomolecules such as genes, receptors, membrane proteins, TFs, and miRNAs. Second, they used the interactions among receptors, TFs, miRNAs, and their targeted DEGs to reconstruct a tissue-specific network for ovarian cancer and used network-based biology methods to identify interaction hubs. Finally, GATA2 and miR-124-3p were identified as hub nodes, suggesting that they are potential biomarkers for ovarian cancer.

The principles and theories for commonly used artificial intelligence biology analysis algorithms

This study divides the commonly used artificial intelligence biology analysis algorithms into two categories. One is the network-based biology analysis algorithms, including shortest path⁸³, module detection⁸⁴, and network centrality⁸⁵; the other is the ML-based biology analysis algorithms, including decision tree^86,87,88 and deep learning models^89,90,91.

The principles and theory of network-based biology analysis algorithms

Biological networks are efficient in integrating complicated biological data, because they can capture the properties of biological entities and their relationships⁹². Mathematically, a network can be represented as a graph G = (V, E) where V and E are a set of nodes (vertices) and edges, respectively. Nodes in biological networks can represent proteins, genes, diseases, and drugs, and edges represent various biochemical, physical, or functional interactions between nodes. Therefore, network-based biology analysis algorithms focus on identifying therapeutic targets and discovering novel drugs for cancer from molecular networks such as protein-protein interaction networks⁷⁵, gene regulatory networks⁹³, metabolic networks⁹⁴, and drug-drug interaction networks⁹⁵.

Computational biologists have developed several network-based biology analysis algorithms to effectively process and analyze non-ordered or non-Euclidean data in biological networks, which can perform tasks such as link prediction⁹⁶, node ranking⁸⁵, network propagation⁹⁷, network modularization⁹⁸, and network control⁹⁹. Here, we briefly review and discuss the shortest path algorithm, module detection algorithm, and node prioritization methods using node centrality in identifying cancer therapeutic targets and discovering drugs.

The shortest path algorithm

The shortest path algorithm, one of the network link algorithms, is used to intelligently identify the shortest connection between two genes or proteins in a graphical model that represents a cellular network^100,101. The algorithm is illustrated in Fig. 3 and Algorithm 1. The shortest distance for a given network is calculated by Eq. (1):

$$d(S,T) = \min_{K \in V} \left[ d(S,K) + d_{K,T} \right]$$

(1)

Here, S and T stand for the source and target node, respectively. d(S,T) is the length of the shortest path from node S to T. V is the set of network nodes. K stands for a node in the network, and d_K,T represents the length of the edge connecting nodes K and T (taken as infinite if K and T are not directly connected).

Algorithm 1

The shortest path algorithm¹⁰²

Input: Network G, Source S, Target T
create an empty set P and a set Q containing all nodes of G
for each vertex V in G:
    d(S,V) ← infinity
    prev(V) ← undefined
d(S,S) ← 0
do:
    U ← vertex in Q with minimal d(S,U)
    remove U from Q
    add U to the set P
    for each vertex V in Q that is connected with U:
        alt ← d(S,U) + d_U,V
        if alt < d(S,V):
            d(S,V) ← alt
            prev(V) ← U
until Q is empty
Output: the shortest path from S to T, traced back from T through prev
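Algorithm 1 can be sketched in Python as follows. This is a minimal illustration that assumes non-negative edge lengths and uses a binary heap to pick the closest unvisited vertex; the function name, the dictionary-based graph format, and the toy network are assumptions introduced here, not part of the original algorithm box.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (distance, path) for the shortest path from source to target.

    graph maps each node to a dict {neighbour: non-negative edge length}.
    """
    dist = {source: 0.0}
    prev = {}
    visited = set()
    heap = [(0.0, source)]
    while heap:
        d_u, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == target:
            break
        for v, weight in graph[u].items():
            alt = d_u + weight
            if alt < dist.get(v, float("inf")):
                dist[v] = alt
                prev[v] = u
                heapq.heappush(heap, (alt, v))
    if target not in dist:
        return float("inf"), []
    # Trace the path back from the target through the predecessor map.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


# Hypothetical weighted toy network, not real interaction data.
toy_ppi = {
    "S": {"A": 1.0, "B": 4.0},
    "A": {"S": 1.0, "B": 2.0, "T": 6.0},
    "B": {"S": 4.0, "A": 2.0, "T": 1.0},
    "T": {"A": 6.0, "B": 1.0},
}
length, path = dijkstra(toy_ppi, "S", "T")
```

Stopping as soon as the target is removed from the heap is a small optimization over Algorithm 1, which runs until every vertex has been processed.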

The shortest path algorithm has been widely used to determine regulatory paths in cancer networks^103,104 and then discover the key targets on the paths¹⁰⁵. For example, Li et al.¹⁰⁶ first identified a set of six genes that can distinguish colorectal tumours from normal adjacent tissues using the maximum relevance minimum redundancy approach¹⁰⁷. The method ranks genes according to their relevance to the class of samples concerned while considering the redundancy of genes. Those genes that had the best trade-off between the maximum relevance to the sample class and the minimum redundancy were considered “good” biomarkers. Then, the authors applied the shortest path algorithm among the six genes in a PPI network underlying cancer and identified 15 shortest paths between any two genes of the gene set. Last, they found 35 genes on the identified shortest paths and ranked them according to their betweenness¹⁰⁸. The results showed that androgen receptor (AR), a ligand-dependent transcription factor, is ranked as the top gene, suggesting its involvement in colon carcinogenesis through regulating the proliferation and differentiation of tumour cells¹⁰⁹.

Additionally, Chen et al.¹⁰⁵ used SAM (Significance Analysis of Microarrays)¹¹⁰ to analyse omics data and identified 153 differentially methylated CpG sites and differentially expressed molecules, including 42 miRNAs and 1,373 protein-coding genes. The authors first used the differentially expressed genes and the interactions from the STRING database¹¹¹ to construct a PPI network. Then, they searched all the shortest paths connecting dysfunctional genes to identify potential cancer driver genes. Next, they ranked the genes by a permutation test and their network properties, such as betweenness and interaction scores. The top-ranking genes at different levels (i.e., methylation level, miRNA level, mutation level, and mRNA level) were regarded as driver genes of lung adenocarcinoma. Among these cancer driver genes, some appeared to be top candidates at different levels, suggesting their multifaceted contribution to lung carcinogenesis.

In summary, the shortest path algorithms^100,101 can help us efficiently identify regulatory paths in networks, allowing us to identify potential genes that are proximate to known cancer genes and thereby important for tumorigenesis. However, due to the complexity of the disease, potential cancer genes are not always on the identified shortest paths¹⁰⁶, revealing the limitations of such algorithms. To resolve this issue, Lu et al.¹¹² proposed a random walk with restart method and identified 298 potential CRC-associated genes, which was more effective and accurate than the shortest path algorithm proposed by Li et al.¹⁰⁶. In addition, the computational efficiency of the shortest path algorithm could be compromised by large networks and their search strategies¹¹².
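For intuition, a random walk with restart can be sketched as below. The walker starts from known disease genes (seeds), moves to a random neighbour at each step, and returns to a seed with a fixed restart probability; the stationary probabilities rank candidate genes. The restart probability of 0.3, the convergence tolerance, the function name, and the toy network are assumptions for illustration and are not the settings of Lu et al.

```python
def random_walk_with_restart(graph, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that restarts at the seeds.

    graph maps each node to a list of neighbours (undirected, unweighted).
    """
    seeds = set(seeds)
    nodes = list(graph)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term plus probability mass spread evenly over neighbours.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if not graph[u]:
                continue
            share = (1.0 - restart) * p[u] / len(graph[u])
            for v in graph[u]:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy chain A-B-C-D-E with a single seed gene A.
toy_chain = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D"],
}
scores = random_walk_with_restart(toy_chain, seeds=["A"])
ranking = sorted(scores, key=scores.get, reverse=True)
```

Unlike a shortest-path search, every node receives a score, so genes that are close to the seeds without lying on any shortest path can still be ranked.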

The module detection algorithm

Cancers usually result from disruption of interactions of key regulatory genes with their partners^81,113. Module detection algorithms¹¹⁴, one type of network propagation algorithm, identify communities of cancer genes in complex networks¹¹⁵ by analysing their topological structures (Fig. 4 and Algorithm 2). Here, we explain and illustrate the commonly used modularity maximization algorithm¹¹⁶, which identifies network modules with the maximum modularity coefficients by Eq. (2).

$$Q = \frac{1}{2M} \sum_{i,j \in V} \left[ A_{ij} - P_{ij} \right] \delta_{C_i,C_j}$$

(2)

where Q represents the modularity coefficient of a given division of the network into modules, M is the total number of edges in the network, A_ij is the element of the adjacency matrix for nodes i and j, and P_ij represents the expected number of edges between nodes i and j. C_i or C_j represents the module to which node i or node j belongs. If i and j belong to the same module, $\delta_{C_i,C_j} = 1$; otherwise, $\delta_{C_i,C_j} = 0$. The identified modules are groups of genes that are supposed to have a similar biological function, such as promoting or inhibiting tumourigenesis.
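The modularity coefficient can be evaluated directly for a given assignment of nodes to modules. The sketch below assumes the widely used null model P_ij = k_i k_j / 2M, where k_i is the degree of node i; the source only describes P_ij as the expected number of edges, so this choice, the function name, and the toy network are assumptions.

```python
def modularity(edges, module_of):
    """Modularity Q of a partition of an undirected, unweighted network.

    edges is a list of (u, v) pairs; module_of maps each node to a module label.
    Assumption: P_ij = k_i * k_j / (2M).
    """
    m = len(edges)
    degree = {}
    adjacency = set()
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        adjacency.add((u, v))
        adjacency.add((v, u))
    q = 0.0
    for i in degree:
        for j in degree:
            if module_of[i] != module_of[j]:
                continue  # the delta term is zero for different modules
            a_ij = 1.0 if (i, j) in adjacency else 0.0
            p_ij = degree[i] * degree[j] / (2.0 * m)
            q += a_ij - p_ij
    return q / (2.0 * m)


# Hypothetical toy network: two triangles joined by a single edge c-d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
two_modules = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_module = {node: 0 for node in two_modules}
q_split = modularity(toy_edges, two_modules)
q_single = modularity(toy_edges, one_module)
```

Splitting the toy network into its two triangles gives Q = 5/14, whereas placing every node in one module gives Q = 0.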

Algorithm 2

Module detection algorithm.

Input: Network G
M ← the total number of edges in G
for each vertex i in G:
    assign i to its own module
    k_i ← degree of vertex i
    a_i ← k_i / 2M
for each pair of vertices (i, j) in G:
    if vertex i connects to j:
        e_i,j ← 1 / 2M
    else:
        e_i,j ← 0
do:
    for each pair of connected modules (i, j):
        ΔQ_i,j ← e_i,j + e_j,i − 2 a_i a_j
    merge the pair of modules with the greatest increase (or smallest decrease) in Q
    update e and a for the merged module
until the entire network becomes one module
Output: the division into modules with the maximum Q found during merging
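Algorithm 2 can be sketched in Python as a greedy agglomerative procedure. The version below is a simplified illustration that rescans all connected module pairs at every step instead of using an efficient priority structure; the function name and the toy network are assumptions, and ties between equally good merges are broken by insertion order.

```python
def greedy_modularity(edges):
    """Greedy merging of modules; returns (best Q, node -> module label)."""
    m = len(edges)
    nodes = sorted({n for edge in edges for n in edge})
    module_of = {n: idx for idx, n in enumerate(nodes)}
    a = {idx: 0.0 for idx in module_of.values()}  # fraction of edge ends per module
    e = {}  # e[(i, j)], i < j: stores e_ij, so e_ij + e_ji = 2 * e[(i, j)]
    for u, v in edges:
        i, j = module_of[u], module_of[v]
        a[i] += 1.0 / (2 * m)
        a[j] += 1.0 / (2 * m)
        key = (min(i, j), max(i, j))
        e[key] = e.get(key, 0.0) + 1.0 / (2 * m)
    q = -sum(x * x for x in a.values())  # Q when every node is its own module
    best_q, best_partition = q, dict(module_of)
    while e:
        # Delta Q = e_ij + e_ji - 2 a_i a_j for each connected pair of modules.
        (i, j), delta = max(
            ((pair, 2 * (frac - a[pair[0]] * a[pair[1]])) for pair, frac in e.items()),
            key=lambda item: item[1],
        )
        merged = {}
        for (x, y), frac in e.items():
            x2 = i if x == j else x
            y2 = i if y == j else y
            if x2 == y2:
                continue  # edges that are now internal to the merged module
            key = (min(x2, y2), max(x2, y2))
            merged[key] = merged.get(key, 0.0) + frac
        e = merged
        a[i] += a.pop(j)
        for n in module_of:
            if module_of[n] == j:
                module_of[n] = i
        q += delta
        if q > best_q:
            best_q, best_partition = q, dict(module_of)
    return best_q, best_partition


# Hypothetical toy network: two triangles joined by a single edge c-d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
best_q, partition = greedy_modularity(toy_edges)
```

On the toy network the procedure recovers the two triangles as separate modules before the final merge lowers Q back to zero.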

Currently, many researchers employ module detection algorithms to intelligently identify potential therapeutic targets for cancer^117,118,119. For example, Ghiassian et al.¹²⁰ used the DIseAse MOdule Detection (DIAMOnD) method¹²¹ to identify the local modules within the interconnected map of molecular components. They found that disease-related genes were significantly enriched in highly overlapping modules, which indicated that the predicted modules may help identify new anticancer targets. Of note, since the results of module detection algorithms depend mainly on network structures, the identified modules may vary for the same disease network with slightly different topology^85,117.

Since potential drug targets may exist in different network modules, we can make use of the correlation between modules to identify reliable cancer treatment targets⁸¹. Therefore, Wang et al.¹²² proposed the seed connector algorithm, which adds a small number of extra hidden nodes to link the disease proteins as fully as possible, by considering the interactions among cancer-associated proteins. First, this algorithm starts with known seed proteins and induces a loosely connected subnetwork consisting of only seed proteins. Second, it sequentially selects as seed connectors the proteins that maximally increase the size of the largest connected component of the subnetwork, until no additional protein can be selected as a seed connector. Finally, the cancer modules are pinpointed.
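The greedy selection of seed connectors can be sketched as below. In this illustration, the size of the largest connected component is measured by the number of seed proteins it contains, so a candidate is accepted only if it links additional seeds; this scoring rule, the function names, and the toy network are assumptions, and the published algorithm may differ in its details.

```python
def largest_component_seed_count(graph, members, seeds):
    """Largest number of seeds in one connected component of the subnetwork
    induced by `members`."""
    seen = set()
    best = 0
    for start in members:
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        count = 0
        while stack:
            node = stack.pop()
            if node in seeds:
                count += 1
            for nb in graph[node]:
                if nb in members and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, count)
    return best


def seed_connectors(graph, seeds):
    """Greedily add non-seed nodes that best enlarge the largest component."""
    seeds = set(seeds)
    members = set(seeds)
    connectors = []
    current = largest_component_seed_count(graph, members, seeds)
    while True:
        best_gain, best_node = 0, None
        for candidate in sorted(set(graph) - members):
            size = largest_component_seed_count(graph, members | {candidate}, seeds)
            if size - current > best_gain:
                best_gain, best_node = size - current, candidate
        if best_node is None:
            return connectors, members
        members.add(best_node)
        connectors.append(best_node)
        current += best_gain


# Hypothetical toy network: seeds s1..s4, candidate connectors x, y, z.
toy_network = {
    "s1": ["s2", "z"], "s2": ["s1", "x"], "x": ["s2", "s3"],
    "s3": ["x", "y"], "y": ["s3", "s4"], "s4": ["y"], "z": ["s1"],
}
connectors, module = seed_connectors(toy_network, ["s1", "s2", "s3", "s4"])
```

Because only one node is added per step, this sketch cannot bridge seeds that are separated by two or more consecutive non-seed nodes.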

While the aforementioned algorithms^122,123,124 can intelligently identify meaningful functional modules from network topologies, they may have difficulty capturing disease modules¹²⁵. One possible reason is that disease proteins do not constitute particularly densely connected subgraphs but agglomerate in specific large regions of the network. For this reason, Tripathi et al.¹²⁶ considered analysing the patterns of connectivity in a disease module to be an effective way to understand the properties of disease modules.

The node centrality

Node centrality measures the importance of nodes and is suitable to intelligently locate key nodes with important biological functions for network biology¹²⁷.

We list four commonly used types of node centrality as follows: (1) As the simplest form of network centrality, degree centrality is the number of nodes directly connected to a given node^127,128; (2) Coreness centrality considers both the degree of nodes and their positions in a network¹²⁹; (3) Betweenness centrality of a node is the probability for the shortest path between two randomly chosen nodes to go through that node, and it determines the actor that controls information among other nodes by connecting paths¹³⁰; (4) Eigenvector centrality¹³¹ considers not only the number of edges and the position of nodes but also the impact of adjacent nodes on the interactive network.

Table 2 shows the formulas for node centrality computing. Figure 5(a–d) illustrates the above four types of node centrality, and Algorithm 3 presents the pseudocode to compute four types of node centrality.

Table 2 The formula to compute degree centrality, coreness centrality, betweenness centrality and eigenvector centrality

**Fig. 5: Four types of node centralities of biological networks.**

Algorithm 3

The algorithm of degree centrality, coreness centrality, betweenness centrality and eigenvector centrality.

function1 Degree centrality:
    Input: Network G
    for each vertex i in Network:
        d_i ← the number of ties that vertex i has
        C_D(i) ← d_i
    Output: C_D(i)
function2 Coreness centrality:
    Input: Network G
    for each vertex i in Network:
        C_C(i) ← 0
        N(i) ← the set of the neighbours adjacent to vertex i
        for each vertex j in N(i):
            ks(j) ← the k-shell index of vertex j
            C_C(i) ← C_C(i) + ks(j)
    Output: C_C(i)
function3 Betweenness centrality:
    Input: Network G
    for each vertex i in Network:
        C_B(i) ← 0
        for each vertex j in Network:
            for each vertex k in Network:
                if j < k and i ≠ j and i ≠ k:
                    g_j,k ← number of all shortest paths between j and k
                    g_j,k(i) ← number of shortest paths between j and k containing i
                    C_B(i) ← C_B(i) + g_j,k(i)/g_j,k
    Output: C_B(i)
function4 Eigenvector centrality:
    Input: Network G
    for each vertex i in Network:
        C_E(i) ← 0
        for each vertex j in Network:
            if vertex i is linked to vertex j:
                a_i,j ← 1
            else:
                a_i,j ← 0
            x_j ← the degree of vertex j
            C_E(i) ← C_E(i) + (1/λ) · a_i,j · x_j
    Output: C_E(i)

As described in Fig. 5(a) and Eq. 3, the degree centrality of node 2 is 3 (C_D (2) = 3) because node 2 interacts with nodes 0, 1, and 3. It has been demonstrated that highly connected nodes, or hubs, are more likely to be essential¹²⁷. Because the more direct connections a node has, the greater the impact that the node can exert on the network¹³², we can utilize the degree centrality of nodes to identify cancer therapeutic targets.

For example, Zhang et al.¹³³ constructed a PPI network and predicted that hypoxia inducible factor-1α (HIF-1α) and prolyl 4-hydroxylase beta polypeptide (P4HB) may be potential biomarkers of gastric cancer. Nevertheless, Jalili et al.¹³⁰ suggested that the high connectivity of a node does not necessarily imply its essentiality, and Kitsak et al.¹²⁹ argued that the location of a node is more significant than its immediate neighbours for evaluating its spreading influence. Because degree centrality considers only the direct interactions of a node but not its impact on other nodes, it yields lower accuracy for target prediction than other methods such as coreness centrality¹³⁴.

As shown in Fig. 5(b) and Eq. 4, the coreness centrality of node 3 is 8 (C_C (3) = 8) because the neighbours adjacent to the labelled vertex (3) are vertex (1), vertex (2), vertex (4) and vertex (5), and these four nodes belong to a 2-shell. Coreness centrality is an advanced form of node centrality because it considers both the degree of nodes and their positions in a network to quantify the importance of nodes in a network¹²⁹. A node with a greater coreness means that the node is located in a more central place and is much more influential in network propagation than the nodes with high-degree but less coreness¹²⁹. Among them, the most classic method to calculate the coreness centrality of network nodes is the k-core decomposition method¹³⁵, which decomposes the network iteratively according to the remaining degree of the nodes.
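The k-core decomposition and the coreness centrality of Algorithm 3 can be sketched as follows. The toy network is hypothetical: it was chosen so that node 3 has four neighbours in the 2-shell and therefore reproduces the value of 8 quoted above, but it is not necessarily the network drawn in Fig. 5(b); the function names are likewise assumptions.

```python
def k_shell_index(graph):
    """k-shell index of every node, obtained by iteratively pruning the
    nodes whose remaining degree is at most k."""
    degree = {n: len(nbs) for n, nbs in graph.items()}
    alive = set(graph)
    shell = {}
    k = 0
    while alive:
        k = max(k, min(degree[n] for n in alive))
        peel = [n for n in alive if degree[n] <= k]
        while peel:
            node = peel.pop()
            if node not in alive:
                continue  # already removed in this round
            alive.remove(node)
            shell[node] = k
            for nb in graph[node]:
                if nb in alive:
                    degree[nb] -= 1
                    if degree[nb] <= k:
                        peel.append(nb)
    return shell


def coreness_centrality(graph):
    """C_C(i): sum of the k-shell indices of the neighbours of node i."""
    shell = k_shell_index(graph)
    return {n: sum(shell[nb] for nb in graph[n]) for n in graph}


# Hypothetical toy network (undirected adjacency lists, no duplicates).
toy_network = {
    0: [1],
    1: [0, 2, 3],
    2: [1, 3],
    3: [1, 2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}
shells = k_shell_index(toy_network)
coreness = coreness_centrality(toy_network)
```

Node 0 is pruned first and lands in the 1-shell, while the remaining nodes form the 2-shell.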

For instance, Li et al.¹³⁶ employed the k-core decomposition method to obtain the coreness of the PPI network. Subsequently, the targets were screened for topological importance. Then, the major hubs in the hub interaction network were determined, and a total of 62 major hubs were identified, including 11 targets of indirubin and its derivatives (EGFR, JAK2, ERBB2, CHUK, CDK5, KIF11, DRD2, CDK3, HTR1A, JAK3 and TYK2) and 51 differentially expressed genes (DEGs) for imatinib resistance. These 11 major hubs were closely related to DEGs that were resistant to imatinib. Indirubin and its derivatives may inhibit imatinib resistance through the regulation of these genes to treat chronic myeloid leukaemia (CML).

As described in Fig. 5(c) and Eq. 5, the betweenness centrality of node 1 is 3.5 (C_B (1) = 3.5) because four node pairs contribute to node 1 (g_0,2(1)/g_0,2 = 1, g_0,3(1)/g_0,3 = 1, g_0,4(1)/g_0,4 = 1, and g_2,3(1)/g_2,3 = 0.5). Betweenness centrality is based on the frequency with which a node lies on the shortest paths between all other pairs of nodes within a network, and it identifies the gatekeepers that control communication between nodes in the network¹³⁰.
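The betweenness computation of Algorithm 3 can be sketched by counting shortest paths with breadth-first search. The toy network below is hypothetical: it was constructed so that the same four node pairs contribute to node 1 and give a value of 3.5, but it is not necessarily the network of Fig. 5(c). The brute-force loop over all pairs mirrors the pseudocode and is only practical for small networks.

```python
from collections import deque


def bfs_counts(graph, source):
    """Distances from source and the number of shortest paths to each node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness_centrality(graph):
    """C_B(i): sum over pairs j < k of g_jk(i) / g_jk, with i distinct from j and k."""
    nodes = sorted(graph)
    info = {s: bfs_counts(graph, s) for s in nodes}
    score = {i: 0.0 for i in nodes}
    for index, j in enumerate(nodes):
        dist_j, sigma_j = info[j]
        for k in nodes[index + 1:]:
            if k not in dist_j:
                continue  # j and k are not connected
            for i in nodes:
                if i == j or i == k or i not in dist_j:
                    continue
                dist_i, sigma_i = info[i]
                # i lies on a shortest j-k path if the distances add up.
                if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                    score[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return score


# Hypothetical toy network (undirected adjacency lists).
toy_network = {0: [1], 1: [0, 2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
betweenness = betweenness_centrality(toy_network)
```

The pair (2, 3) contributes only 0.5 because one of its two shortest paths goes through node 4 instead of node 1.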

For example, Taylor et al.¹³⁷ used betweenness centrality analysis to identify intermodular hub proteins and intramodular hub proteins in the breast cancer network. The identified proteins may serve as an indicator of breast cancer prognosis. Moreover, Raman et al.¹³⁸ computed degree, betweenness, and closeness indices in PPI networks for 20 organisms and showed that the degree and betweenness centralities of nodes correlate with their lethality in many organisms.

As described in Fig. 5(d) and Eq. 6, the eigenvector centrality of node 1 is 3 (C_E (1) = 3) because node 1 is connected to nodes 0, 2 and 3 (a_1,0, a_1,2 and a_1,3 equal 1, respectively), and the degree of x₀, x₂ and x₃ equals 1, respectively. Eigenvector centrality considers not only the number of edges and the position of nodes but also the impact of adjacent nodes on a network.
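Algorithm 3 evaluates a single update step starting from node degrees. In practice, eigenvector centrality is often obtained by repeating this update until the scores stop changing (power iteration), which is the variant sketched below. The shift by the identity matrix, the normalization to unit length, the function name, and the star-shaped toy network are assumptions made for this illustration, so the resulting values are on a different scale from the C_E (1) = 3 quoted above.

```python
import math


def eigenvector_centrality(graph, max_iter=1000, tol=1e-10):
    """Eigenvector centrality by power iteration, normalized to unit length.

    graph maps each node to a list of neighbours (undirected).
    """
    x = {n: 1.0 for n in graph}
    for _ in range(max_iter):
        # Multiply by (A + I); the identity shift keeps the same leading
        # eigenvector but avoids oscillation on bipartite networks.
        nxt = {n: x[n] + sum(x[nb] for nb in graph[n]) for n in graph}
        norm = math.sqrt(sum(v * v for v in nxt.values()))
        nxt = {n: v / norm for n, v in nxt.items()}
        if sum(abs(nxt[n] - x[n]) for n in graph) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical toy network: node 1 linked to nodes 0, 2 and 3.
toy_network = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
centrality = eigenvector_centrality(toy_network)
```

For this star-shaped network the hub converges to 1/√2 and each leaf to 1/√6, so the hub is ranked highest.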

For example, Mallik et al.¹³⁹ first identified differentially expressed and methylated genes in uterine leiomyoma tumours and then found TFs and miRNAs that regulate the expression of these genes. Subsequently, they reconstructed a network that comprised the genes, TFs, and miRNAs and then used eigenvector centrality to identify potential biomarkers. They specified that PTGS2 and TACSTD2 are potential novel biomarkers, since both genes are downregulated and hypermethylated in the tumour.

Moreover, several researchers have attempted to integrate more than one centrality index to increase the efficiency of the node centrality algorithm. For instance, Chen et al.¹⁴⁰ used the differentially expressed proteins of prostate cancer (PC) to construct a PPI network. Then, they integrated the connectivity degree, betweenness centrality, and closeness centrality of nodes to evaluate critical nodes and identify the core module of the PPI network. Finally, they identified SLC2A4 and TUBB2C as important proteins regulating the pathogenesis of cancer, suggesting the proteins involved in biological processes and pathways as potential targets for PC diagnosis and treatment. In addition, Aamri et al.¹⁴¹ constructed a gene-gene-interaction network for the entire human genome and then applied betweenness, closeness, eigenvector, and degree centrality metrics to rank the central genes of the network to identify possible cancer-related genes. The results showed that the average precision for identifying breast, prostate, and lung cancer genes ranged from 80% to 100%.

Although highly connected nodes in the network architecture are essential, recent studies point out that integrating prior knowledge of cancer into centrality indices can identify anticancer targets more accurately¹³⁰. For this reason, Jiang et al.¹⁴² developed a network-based biology analysis method, named NEST, which predicts essential proteins according to the expression levels of their interacting partners in a network. Their results showed that NEST significantly outperformed the classic centralities on gene essentiality prediction and functional screen result enhancement.

Machine learning-based biology analysis algorithms

Machine learning (ML) algorithms are a subset of AI algorithms that learn from data, removing the need for explicit instructions on how to perform certain tasks¹⁵. The key to identifying therapeutic targets and discovering drugs using ML-based biology analysis is to make use of the network features of biological networks. These network features include topological features (such as node centrality, interaction, local structure, subgraph, network propagation results, and network-based structure similarities) and the biological information that is embedded in network nodes (such as the gene expression profile, gene mutation frequency, and gene functional annotation).

Here, we introduce two classical ML-based algorithms: one is the decision tree algorithm, which selects significant topological features for cancer; the other is deep learning, which uses the network features to identify cancer targets and discover drugs.

The decision tree algorithm

A decision tree is a supervised classification algorithm¹⁴³ with three steps: feature selection, decision tree generation, and decision tree pruning^86,87,88. Figure 6 shows how to classify a set of samples into two groups using the decision tree algorithm.

In network-based biology analysis, network topology features⁸⁸ are usually integrated into a decision tree to classify gene-phenotype associations for cancers^144,145,146 and thereby select the topological features that are significant for cancer.

For instance, Ramadan et al.¹⁴⁷ extracted thirteen network topological features (Table 3) from a publicly available gene co-expression network and a PPI network of breast cancer. Then, to assess the significance of topological measurements associated with breast cancer, they used Decision Tree Bagger¹⁵⁶ to classify breast cancer gene-phenotype associations. The importance of each topological measure was then evaluated using a score that combines the accuracy of breast cancer classification and the Gini index¹⁴⁸ (Table 3). The computed scores of the top five identified features (i.e., structural holes, node degree, node coreness, k-Step Markov and subgraph) outperformed the others, and they were selected as key features for the classification of breast cancer phenotype-gene associations.
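To make the Gini index mentioned above concrete, the following minimal sketch scores a single topological feature by the reduction in Gini impurity obtained when splitting on a threshold. It is not the combined score of Ramadan et al.; the function names, the toy degree values and the threshold are illustrative assumptions.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum_c p_c^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_gain(values, labels, threshold):
    """Impurity reduction when samples are split at `threshold`."""
    left = [y for v, y in zip(values, labels) if v >= threshold]
    right = [y for v, y in zip(values, labels) if v < threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted

# Hypothetical node degrees and disease labels (1 = cancer-associated).
degree = [9, 8, 7, 2, 1, 1]
label = [1, 1, 1, 0, 0, 0]
gain = gini_gain(degree, label, threshold=5)
print(gain)
```

A feature whose best split yields a larger gain separates the classes more cleanly, which is one way a tree ranks candidate features.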

Table 3 Thirteen network topological features for decision tree classification¹⁴⁷. The score is a combination of the classification accuracy and the Gini index¹⁴⁸


Although the decision tree algorithm can help us select key network features, it tends to overfit when too many features exist in the network¹⁵⁷, which significantly degrades classification and prediction performance on independent test data¹⁵⁷.

At present, there are two commonly used methods to resolve the overfitting of the decision tree algorithm. One is to use dimension reduction¹⁵⁷ and a pruning strategy⁸⁶ to improve the classification accuracy by feature reduction; the other is to employ the random forest algorithm¹⁵⁸, an ensemble algorithm with multiple decision trees. The random forest algorithm adopts a bagging strategy and has higher accuracy and reliability than the classical decision tree algorithm¹⁵⁹.
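The bagging strategy can be sketched in a few lines. This is a simplified illustration under stated assumptions: each "tree" is a one-split decision stump, the data are hypothetical, and the random feature subsampling that a full random forest adds on top of bagging is omitted.

```python
import random
from collections import Counter

def fit_stump(rows):
    """Pick the (feature, threshold) pair with the fewest training errors.
    A stump predicts class 1 when x[feature] >= threshold."""
    best = None
    for f in range(len(rows[0][0])):
        for x, _ in rows:
            t = x[f]
            errors = sum((xi[f] >= t) != bool(yi) for xi, yi in rows)
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]

def fit_forest(rows, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]
        forest.append(fit_stump(sample))
    return forest

def predict(forest, x):
    """Majority vote over all stumps."""
    votes = Counter(int(x[f] >= t) for f, t in forest)
    return votes.most_common(1)[0][0]

# Hypothetical samples: ([node degree, noise feature], label).
rows = [([9, 0.2], 1), ([8, 0.9], 1), ([7, 0.1], 1),
        ([2, 0.8], 0), ([1, 0.3], 0), ([1, 0.7], 0)]
forest = fit_forest(rows)
pred_high = predict(forest, [9, 0.5])
pred_low = predict(forest, [1, 0.5])
```

Because every stump sees a different resample, an unlucky split in one stump is outvoted by the others, which is the intuition behind the improved reliability.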

For example, Toth et al.¹⁶⁰ used the random forest algorithm to predict the aggressive behaviour of prostate cancer. Their methylation-based classifier demonstrated excellent performance in discriminating prognosis subgroups of the test set (Kaplan-Meier survival analyses with log-rank p value < 0.0001) with an AUC value of 0.95¹⁶¹ for the sensitivity analysis. Finally, the experimental verification showed that the loss of ZIC2 protein expression was associated with poor prognosis and correlated with a significantly shorter time to biochemical recurrence.

In addition to the overfitting problem, it is difficult to visualize the complicated classification procedure of decision trees¹⁴⁶. Recently, the alternating decision tree (ADTree)¹⁶² has made the classification procedure intuitive and easy to understand by adding an intuitive graphical model. The algorithm builds decision trees over a user-defined number of iterations using confidence-rated boosting, so it returns both a class label and a score that measures confidence in the classification, as shown in Fig. 7 and Algorithm 4.

**Fig. 7: An example of an ADTree model.**

For example, Carson et al.¹⁴⁶ used ADTree to classify proteins in a breast cancer network. As indicated in Fig. 7, the most effective attributes to distinguish disease and non-disease proteins are node degree, disease neighbour ratio, eccentricity, and neighbourhood connectivity, a finding supported by Hao et al.¹⁶³ and Zhang et al.¹⁶⁴.

Algorithm 4

The algorithm of ADTree model¹⁶⁵

Input: labelled dataset
root node ← the bias in the dataset
for each decision node in the tree:
    a_i ← attribute value
    t_i ← threshold
for each decision node in the tree:
    if (the decision node has a parent node):
        if a_i ≥ t_i:
            return the score of the prediction node for the left path
        else:
            return the score of the prediction node for the right path
    else:
        return 0
s ← the sum of all scores acquired
if s > 0:
    Output: the positive class
else:
    Output: the negative class
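The scoring procedure of Algorithm 4 can be sketched as runnable code. This is a minimal illustration that assumes an already trained ADTree; the rule layout, the attribute thresholds and all prediction values are hypothetical and are not taken from Fig. 7.

```python
def adtree_score(bias, rules, sample):
    """Sum the bias and the prediction values of every decision node
    whose parent prediction node is reached by `sample`."""
    score = bias
    reached = {"root"}
    for rule in rules:  # rules must be listed parent-first
        if rule["parent"] not in reached:
            continue    # precondition not met: contributes 0
        branch = "yes" if sample[rule["attr"]] >= rule["threshold"] else "no"
        score += rule[branch]
        reached.add((rule["name"], branch))
    return score

# Hypothetical model: a bias plus three decision nodes.
bias = -0.1
rules = [
    {"name": "r1", "parent": "root", "attr": "degree",
     "threshold": 5, "yes": 0.8, "no": -0.6},
    {"name": "r2", "parent": ("r1", "yes"), "attr": "disease_neighbour_ratio",
     "threshold": 0.3, "yes": 0.5, "no": -0.2},
    {"name": "r3", "parent": "root", "attr": "eccentricity",
     "threshold": 4, "yes": -0.4, "no": 0.3},
]

protein_a = {"degree": 7, "disease_neighbour_ratio": 0.4, "eccentricity": 3}
protein_b = {"degree": 2, "disease_neighbour_ratio": 0.4, "eccentricity": 6}
score_a = adtree_score(bias, rules, protein_a)
score_b = adtree_score(bias, rules, protein_b)
label_a = "disease" if score_a > 0 else "non-disease"
label_b = "disease" if score_b > 0 else "non-disease"
```

The sign of the summed score gives the class and its magnitude serves as a confidence measure, which is what distinguishes an ADTree from a tree that returns only a leaf label.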

Although the decision tree, random forest and ADTree^86,87,88,158 tend to identify proteins that are well annotated and well studied in cancer, these methods are prone to locally optimal solutions. Therefore, Chen et al.¹⁴³ proposed using a decision tree classifier based on particle swarm optimization¹⁶⁶, which adds randomness to avoid the trap of local minima while optimizing the number of features and the detection accuracy of cancer treatment targets. Furthermore, the gradient boosting decision tree¹⁶⁷ is a very flexible and scalable method to classify network nodes in future studies.

The deep learning algorithms

Deep learning is a subfield of machine learning, and the origin of neural networks set the stage for the emergence of deep learning models¹⁶⁸. A deep learning model is a neural network composed of complex structures and nonlinear transformations^90,91 that attempts to model high-level abstractions of data using multilayer neurons. Through training, which iteratively updates the weights of the network (Eq. 7), the initial low-level feature representation (such as topological features and biological information) of samples is transformed into a high-level representation that shows the distinction between samples. The strength of deep learning is its ability to detect complex patterns in data, making it suitable to interrogate biological networks that consist of complex, interdependent relationships among genes.

$$W_{k} \to W_{k+1} = W_{k} - \eta \frac{\partial C}{\partial W_{k}}$$

(7)

W, k, η, and C are the weight, iteration index, learning rate, and loss function, respectively.
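The update rule of Eq. 7 can be illustrated with a one-dimensional sketch. The quadratic loss C(W) = (W − 3)², the starting weight and the learning rate are assumptions chosen only so that the minimum is known in advance.

```python
def gradient_descent(grad, w0, eta=0.1, n_iter=100):
    """Apply W_{k+1} = W_k - eta * dC/dW_k for n_iter iterations."""
    w = w0
    for _ in range(n_iter):
        w = w - eta * grad(w)
    return w

# Assumed toy loss C(W) = (W - 3)^2 with gradient dC/dW = 2 (W - 3).
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

Each step moves the weight against the gradient, so the weight approaches the minimiser W = 3; in a real network the same rule is applied to every weight, with gradients obtained by backpropagation.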

Currently, there are many neural network models and complex functions for ML-based biology analysis. In this paper, we only present several commonly used neural networks (Table 4). Benefiting from the strong ability of neural networks in mining complex information on links or nodes, deep learning is a suitable method to identify potential cancer targets and discover drugs for cancer treatment in complex biological networks¹⁷⁵. For example, Selvaraj et al.¹⁷⁶ searched for therapeutic targets for lung adenocarcinoma in a network of protein-protein and protein-drug interactions and employed a neural network to identify candidate drugs, where phosphothreonine is predicted via molecular dynamics simulations to target the hub node MAPK1 in the network.

Table 4 Commonly used neural networks in ML-based biology analysis


Currently, artificial intelligence biology analysis has benefited from the utilization of graph-based neural networks instead of commonly used non-graph neural networks such as CNN¹⁷⁰ or DNN¹⁶⁹, because graph-based neural networks can take the biological network structure as the input directly, learn an embedding that contains information about the neighbourhood of a target node in a graph, and analyse the biological network with neural network techniques. Figure 8 illustrates the basic flowchart of graph-based neural networks for the investigation of different properties of biological networks.

**Fig. 8: The illustration of graph-based neural networks for ML-based biology analysis.**

There are two advantages in using graph-based neural networks to identify cancer targets or discover drugs from biological networks.

1. Feature representation. Graph embedding¹⁷⁷ is the core method to extract features in graph-based neural networks; it represents network nodes as low-dimensional vectors, preserving both network topology and node content information¹⁷⁸. For example, Li et al.¹⁷⁴ proposed a similarity-based miRNA-disease prediction method that used DeepWalk, a graph embedding algorithm, to compute the topological similarities between two disease nodes. The model extracts the disease node features in the disease-disease network based on the random walk algorithm and significantly enhances the prediction performance by utilizing global network association information. For disease nodes with similar features, if one of the diseases is associated with a miRNA, the other is predicted to be associated with that miRNA.

In addition, Zheng et al.¹⁷⁹ proposed an attention-based graph neural network to learn the graph embedding features (association scores) from a piRNA-disease association network. The attention mechanism assigns different weight parameters to different targets through learning, so as to consider the importance of key targets locally and globally¹⁸⁰. The results showed that the predicted scores of piRNA-disease associations are positively correlated with the association probability between a piRNA and a disease, suggesting that piRNAs with closer distances to tumour genes in the network are more likely to be therapeutic targets of cancer.

2. Feature integration. Graph-based neural networks integrate heterogeneous, noisy, nonlinearly related biological network information (such as node similarity, node interactions, upstream and downstream relationships) from multiple views (such as drug molecular structures and drugs’ indications)¹⁸¹. For example, Ma et al.¹⁷² proposed a novel graph autoencoder (GAE) model to learn accurate and interpretable drug similarity measures from multiple types of drug properties. The GAE uses an attention mechanism¹⁸⁰ to integrate multiple views (multiple types of drug properties) from a drug-drug interaction network and determines the weight of each view with respect to the similarity measure task, which better explains the contribution of drug properties to drug similarity. Owing to its ability to integrate network data from multiple views and its autoencoder structure, the GAE can resist noise interference in the data. Thus, graph-based neural networks are more robust and reliable in most application scenarios¹⁸².
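The random-walk step that DeepWalk-style graph embedding relies on, mentioned under feature representation, can be sketched as follows. Only the walk generation is shown; the subsequent skip-gram training that turns walks into vectors is omitted, and the toy disease-disease network and parameter values are hypothetical.

```python
import random

def random_walks(graph, walk_length=5, walks_per_node=2, seed=0):
    """Generate truncated uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in graph:
            walk = [start]
            # Extend the walk by moving to a uniformly chosen neighbour.
            while len(walk) < walk_length and graph[walk[-1]]:
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks

# Hypothetical disease-disease similarity network (adjacency lists).
graph = {
    "d1": ["d2", "d3"],
    "d2": ["d1", "d3"],
    "d3": ["d1", "d2", "d4"],
    "d4": ["d3"],
}
walks = random_walks(graph)
```

Nodes that co-occur in many walks end up with similar embeddings once the walks are fed to a sequence model, which is how topological similarity between disease nodes is captured.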

Overall, deep learning can comprehensively explore features such as node degree, edge length, and module in biological networks^83,84,85,183 to provide an accurate prediction of drug targets for cancer through artificial intelligence analysis of multiomics data in complex biology networks¹⁸⁴. However, there are still two key issues to be addressed. One is the interpretability of the models, which is critical for clinical adoption¹⁸⁵. The other is how to demonstrate the generalizability of the approach¹⁸⁵ and validate these approaches in the context of multi-institutional datasets. These issues are actively being tackled through work on model interpretation, extraction of biological insights¹⁸⁶ and model reproducibility¹⁸⁷.

The artificial intelligence biology analysis for biomedical applications

Because the wide and easy accessibility of high-throughput data in oncology has provided the basis for developing novel artificial intelligence methods and validating their capability to identify therapeutic targets, this section will focus on reviewing the biomedical applications from four perspectives. First, we present the artificial intelligence applications to identify novel anticancer targets. Second, we present the artificial intelligence applications to evaluate the druggability of potential target genes. Third, we show the artificial intelligence applications for drug discovery. Fourth, we show the artificial intelligence applications for drug property prediction.

Identification of novel anticancer targets

Artificial intelligence biology analysis applications¹⁸⁸ usually use omics data to build networks and identify co-expression modules of genes, proteins, metabolites, critical pathways between molecules, and key molecules in biological networks¹⁸⁹. This study will introduce these applications from two perspectives: one is network-based biology analysis applications, and the other is ML-based biology analysis applications.

Network-based artificial intelligence for identifying novel anticancer targets

Network-based biology analysis applications first reconstruct networks by computing differential expressions of molecules and their correlations^190,191,192,193. Then, gene set enrichment analysis is performed to identify network modules with different biological functions¹⁹⁴. Finally, the identified network modules are used to discover key genes that are potential therapeutic targets (or biomarkers) for cancer. Here, we show the key target identification procedure of network-based biology analysis applications as follows.

WGCNA¹⁹⁵ is a commonly used network-based biology analysis application that takes various gene expression matrices as input and outputs the gene network modules and the core genes of the biological network. For example, Zhou et al.¹⁹⁶ used WGCNA to analyse colorectal cancer data from TCGA (Fig. 9) and demonstrated, through the following steps, that 11 hub genes and 5 hub miRNAs have predictive power for the prognosis of colorectal cancer patients.

In Step 1, the correlations between all pairs of genes and between all pairs of miRNAs identified by differential expression analysis were calculated, and two similarity matrices were constructed. In Step 2, the adjacency matrix, which is derived from the similarity matrices, was transformed into a topological overlap matrix (TOM) by using TOM similarity, and the coexpressed gene and miRNA modules were then identified by using dynamic tree cutting¹⁹⁷. In Step 3, after module preservation analysis, six gene modules were found to have strong stability, and one miRNA module was found to have low stability. In Step 4, they performed module-trait relationship analysis to further validate the module–clinical trait relationships, and two pathological stage-related gene modules and one pathological stage-related miRNA module were identified. In Step 5, hub genes and hub miRNAs were identified by calculating the module membership and gene significance.
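Steps 1 and 2 can be sketched without any dependencies. The example below assumes an unsigned network with soft-thresholding power β = 6 and the standard topological overlap formula TOM_ij = (Σ_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 − a_ij); the expression values are hypothetical, and module detection by dynamic tree cutting is not shown.

```python
def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def soft_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(i, j)|^beta with a zero diagonal."""
    n = len(expr)
    return [[0.0 if i == j else abs(pearson(expr[i], expr[j])) ** beta
             for j in range(n)] for i in range(n)]

def topological_overlap(adj):
    """Topological overlap matrix computed from a weighted adjacency."""
    n = len(adj)
    k = [sum(row) for row in adj]  # node connectivity
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j] for u in range(n))
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom

# Hypothetical expression profiles of three genes across five samples.
expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.5],
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
adj = soft_adjacency(expr)
tom = topological_overlap(adj)
```

Genes 0 and 1 are strongly co-expressed and therefore obtain a high overlap, whereas gene 2 overlaps little with either; clustering on 1 − TOM would group the first two genes into one module.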

Although network-based biology analysis methods are useful in identifying anticancer targets, they have some limitations. For example, they cannot effectively handle multiomics data, which leads to high false-positive rates among the identified targets⁴². Developing comprehensive network-based biology analysis applications may resolve these problems and increase the precision of cancer biomarker prediction¹⁹⁸.

For example, Lai et al.¹⁹⁹ deployed an integrated approach that combined network-based algorithms and RNA sequencing data to delineate miRNA-based strategies that enhanced DC (dendritic cell)-elicited immune responses. First, the authors performed RNA sequencing to obtain the protein-coding genes and miRNAs in relation to standard DCs. Then, they analysed miRNA-gene interactions at the pathway level and reconstructed regulatory networks underlying the immunological functions of DCs. Finally, they performed network-based prioritization of miRNAs by combining their expression profiles and strength of association with other protein-coding genes. Their analysis identified dozens of promising miRNA candidates, of which miR-15a and miR-16 are the most promising ones for increasing the immunogenic potency of DCs and therefore improving DC-based immunotherapy against cancer.

In summary, we consider that an increasing number of network-based biology analysis applications will be developed for the identification of novel anticancer targets in the future.

ML-based artificial intelligence for identifying novel anticancer targets

ML-based biology network analysis applications are used to interrogate large, complex data and thus identify reliable potential novel targets for the effective treatment of human diseases²⁰⁰. These ML-based biology analysis applications for novel anticancer target identification include classification²⁰¹, clustering²⁰², neural networks^203,204, and so on²⁰⁵. Here, owing to the limited space of the review, we focus only on the ML-based biology network analysis applications for classification and graph-based neural networks.

ML-based biology network analysis applications for classification identify key targets by determining the key factors of the classification²⁰⁶. They consider specific biomarkers (such as gene or protein nodes) of the defined classes as key targets²⁰⁶. Recently, classification-based applications and molecular profiling²⁰⁷ have used genome-wide gene transcription profiles, protein expression profiles and/or mutational landscapes to classify tumor subtypes more accurately and identify biomarkers for specific tumor types.

For example, Sinkala et al.²⁰⁸ applied classification analysis on networks to reveal subtypes of pancreatic cancer and their molecular characteristics. Firstly, the authors applied K-means clustering to reverse phase protein array (RPPA)-determined proteomics data from 45 high-purity pancreatic cancer samples and identified two clusters of samples.

Secondly, they compared their clustering results to other subtypes that have been reported in the literature for various other molecular data types (such as DNA methylation status, protein expression levels and expression levels of mRNAs and miRNAs), and then applied similarity network fusion (SNF) to identify two-cluster and three-cluster solutions that comprised 25 and 20 tumors. The SNF method solves the disparate clustering problem by constructing similarity networks of samples for each available molecular data type and then efficiently fusing these into one network that represents clustering based on all the underlying data.

Thirdly, they applied proteomics-based signaling pathway analysis to distinguish disease subtypes and found that, for tumors of the two major pancreatic cancer subtypes, oncogenesis may be primarily driven by perturbation in either SMAD4 or mTOR signaling pathways. Furthermore, they performed gene set enrichment analysis using the Gene Ontology database⁵² and found that pancreatic cancer subtypes classified by mRNA expression levels and DNA methylation statuses show differences in molecular functions in terms of mRNA.

Finally, given that different types of molecular data yield different patterns of tumor clustering, they attempted to identify a list of biomarkers that can differentiate the two tumor subtypes. Using neighborhood component analysis, they identified biomarker sets comprising 50 mRNAs, 49 methylated genes, 14 proteins, and 20 miRNAs. Subsequently, they separately applied hierarchical clustering using each type of the molecular data and successfully reproduced the two pancreatic cancer subtypes.

Graph-based neural networks exploit not only the correlations among samples described by similarity networks but also message passing between targets and their neighbors to improve the accuracy of target identification²⁰⁹.

For example, to the best of our knowledge, MOGONET, proposed by Wang et al.²⁰³, is the first method to make use of both graph convolutional networks (GCNs) and cross-omics relationships in the label space for effective multiomics integration in biomedical data classification tasks. The specific process is as follows:

Firstly, they constructed a weighted sample similarity network for each type of omics data using cosine similarity. Taking both the omics features and the corresponding similarity network as the input, a GCN is trained for each type of omics data to predict class labels.
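The construction of a weighted sample similarity network from cosine similarity can be sketched as follows. The rule of keeping only edges above a fixed threshold, the threshold value and the omics vectors are illustrative assumptions rather than the exact procedure of Wang et al.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_network(samples, threshold=0.9):
    """Weighted adjacency matrix keeping edges with similarity >= threshold."""
    n = len(samples)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(samples[i], samples[j])
            if s >= threshold:
                adj[i][j] = adj[j][i] = s
    return adj

# Hypothetical omics profiles of three samples.
samples = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.1],
    [3.0, 0.0, -1.0],
]
adj = similarity_network(samples)
```

The resulting matrix, together with the per-sample feature vectors, is the kind of input a graph convolutional network consumes: features are propagated only between samples joined by an edge.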

Secondly, the predictions generated by each omics data-specific GCN are further utilized to construct a new tensor, named cross-omics discovery tensor, which can reflect the cross-omics label correlations.

Finally, the cross-omics discovery tensor is forwarded to VCDN (view correlation discovery network) to explore the latent correlations across different omics data for final label prediction. Because the importance of a feature to the classification task can be measured by the decrease in performance after that feature is removed, they applied this procedure to the test data set to quantify and rank the contribution of each feature of the different omics data to the prediction. Using this method, they identified the top-ranking features as biomarkers for breast cancer.
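Ranking features by the performance drop after their removal can be sketched independently of the model. Here "removal" is implemented by setting the feature to zero, which is an assumption, and the classifier and data are hypothetical stand-ins for a trained network and a test set.

```python
def accuracy(predict, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)

def ablation_importance(predict, X, y):
    """Accuracy drop observed when each feature in turn is set to zero."""
    base = accuracy(predict, X, y)
    drops = []
    for f in range(len(X[0])):
        X_ablated = [[0.0 if i == f else v for i, v in enumerate(x)] for x in X]
        drops.append(base - accuracy(predict, X_ablated, y))
    return drops

# Hypothetical trained classifier that relies only on feature 0.
def toy_classifier(x):
    return int(x[0] > 0.5)

X_test = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.9], [0.1, 0.3]]
y_test = [1, 1, 0, 0]
drops = ablation_importance(toy_classifier, X_test, y_test)
ranking = sorted(range(len(drops)), key=lambda f: drops[f], reverse=True)
```

Features with the largest drop are the candidates reported as biomarkers; a feature the model ignores produces no drop at all.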

In addition, Xuan et al.²⁰⁴ proposed a novel method based on the graph convolutional network and convolutional neural network (GCNLDA) to infer disease-related lncRNA candidates. First, they developed a network that is comprised of lncRNA, disease, and miRNA nodes. Then, they developed an embedding matrix of lncRNA-disease node pairs with respect to the biological premises. Then, they employed a convolutional neural network to explore various connections related to lncRNA-disease on node pair embedding. Finally, they learned the local network representations of lncRNA-disease pairs by deeply integrating the graph convolution autoencoder into topological lncRNA-disease-miRNA heterogeneous networks. Cross-validation confirmed that GCNLDA outperforms other state-of-the-art methods in terms of both AUC and AUPR¹⁶¹. Case studies²⁰⁴ on stomach cancer, osteosarcoma and lung cancer confirmed that GCNLDA effectively discovered potential lncRNA-disease associations. Therefore, GCNLDA is becoming an effective tool to screen reliable candidates for lncRNA-disease association validation with the help of biological experiments.

In summary, we consider that an increasing number of ML-based biology analysis applications will be developed to identify novel anticancer targets with the development of deep learning in the future.

Evaluation of the druggability of potential targets

Druggability is a concept that assesses whether a drug can bind to a protein to alter its activity^3,4. The human proteome has approximately 6,000 to 8,000 potential pharmacological targets, but only a small fraction can be targeted by drugs^7,210. Therefore, it is important for us to evaluate druggability after finding novel anticancer targets. This study will introduce these applications from two perspectives: one is network-based biology analysis applications, and the other is ML-based biology analysis applications.

Network-based artificial intelligence for evaluating the druggability of potential targets

Evaluating druggability by analysing the 3D structures of proteins requires a long development cycle and high financial cost²¹¹, whereas network-based biology analysis applications provide an alternative method to accelerate the evaluation of the druggability of potential targets²¹².

As shown in Fig. 10, PockDrug is a web server that predicts pocket druggability on proteins and can be queried with a protein or a set of proteins²¹³. For example, Yang et al.²¹⁴ constructed a protein–protein interaction network for thyroid cancer and identified three key targets, HEY2, TNIK, and LRP4. Then, they used PockDrug to predict whether HEY2, TNIK, or LRP4 have targetable pockets for drugs in the following three steps.

In Step 1, they input the potential targets and chose the pocket estimation methods. In Step 2, they predicted the druggability of the pockets by computing the physicochemical properties of the target pockets. In Step 3, they screened the three hub genes, HEY2, TNIK, and LRP4. Based on the predictions, TNIK, which has 8 out of 538 residues, has an average druggability probability greater than 0.5 and was therefore considered to contain a druggable pocket for thyroid cancer.

In short, with the in-depth study of protein pockets, an increasing number of network-based biology analysis applications are being developed to accurately evaluate the druggability of anticancer targets, providing reliable druggable targets for cancer treatment.

ML-based artificial intelligence for evaluating the druggability of potential targets

These ML-based biology analysis applications for evaluating the druggability of potential targets consist of protein structure modeling and drug-target affinity analysis. Previously, protein structure modeling required considerable time and financial cost²¹¹, which greatly limited the application of PockDrug, since it depends heavily on an accurate 3D protein structure. Recent ML-based biology analysis applications have focused on developing methods to predict the 3D structure of a protein from its genetic sequence, also known as the protein folding problem. Cutting-edge ML-based modelling methods^215,216,217 can generate 3D protein structures with high accuracy and efficiency, which makes it possible for PockDrug to be widely used.

For example, Yang et al.²¹⁸ developed the trRosetta algorithm, which predicts protein structures quickly and accurately based on direct energy minimization with restrained Rosetta. They employ a deep residual neural network to predict the restraints, which consist of inter-residue distance and orientation distributions. Since trRosetta outperforms all previously described protein modelling methods in benchmark tests on CASP13-²¹⁹ and CAMEO-²²⁰ derived sets, it can predict protein structure accurately. Furthermore, Senior et al.²²¹ developed AlphaFold to predict protein structures from amino acid sequences. First, AlphaFold predicts the distances between pairs of residues by training a neural network to analyse the covariation of homologous sequences. Then, AlphaFold constructs a potential of mean force that accurately describes the shape of a protein. Finally, AlphaFold optimizes the protein structure by a gradient descent algorithm. Because AlphaFold can predict protein structure with high accuracy even for sequences with few homologous sequences, we consider that AlphaFold represents great progress in protein-structure prediction.

ML-based biology analysis applications for drug-target affinity (DTA) analysis estimate the interaction strength of novel drug–target pairs based on previous studies to evaluate the druggability of targets²²².

Compared with other methods, such as molecular docking²²³ and collaborative filtering²²⁴, graph-based neural networks are more effective in DTA prediction, because graph-based models facilitate learning by considering both drug structure and drug-target interaction information instead of representing drugs as strings. String sequences may lose the structural information of the molecule and thus impair the predictive power of models²²⁵.

For example, Nguyen et al.²²⁵ were the first to use a GNN for predicting DTA. The authors proposed GraphDTA, a new neural network model for regression tasks, which takes the drug-target pair as the input and outputs a continuous measurement of the binding affinity of the pair.

In detail, for the input drug-target pair, the protein targets are represented as sequence information instead of the molecular diagram of tertiary structure, while the drug compounds are represented as graphs of atomic interactions, where each node is a feature vector that represents five kinds of information: the atom symbol, the number of adjacent atoms, the number of adjacent hydrogens, the implicit valence of the atom, and whether the atom is in an aromatic structure. For the output, GraphDTA combines the feature information of the drug-target pair to predict a continuous measurement of its binding affinity.
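The five-part node description can be illustrated with a small hand-built molecular graph. This is a sketch under explicit assumptions: the vocabularies are truncated, the atom properties of ethanol are entered by hand instead of being derived with a cheminformatics toolkit, and the one-hot layout is illustrative rather than GraphDTA's actual encoding.

```python
def one_hot(value, choices):
    """Binary indicator vector marking the position of `value` in `choices`."""
    return [1 if value == c else 0 for c in choices]

SYMBOLS = ["C", "N", "O"]   # truncated, assumed vocabulary
COUNTS = [0, 1, 2, 3, 4]    # allowed values for the three count features

def atom_features(symbol, degree, n_hydrogens, implicit_valence, aromatic):
    """Concatenate the five kinds of atom information into one vector."""
    return (one_hot(symbol, SYMBOLS)
            + one_hot(degree, COUNTS)
            + one_hot(n_hydrogens, COUNTS)
            + one_hot(implicit_valence, COUNTS)
            + [int(aromatic)])

# Heavy-atom graph of ethanol (C-C-O), with properties entered by hand.
atoms = [
    ("C", 1, 3, 3, False),
    ("C", 2, 2, 2, False),
    ("O", 1, 1, 1, False),
]
bonds = [(0, 1), (1, 2)]

node_features = [atom_features(*atom) for atom in atoms]
# Store each bond in both directions, as an undirected graph requires.
edge_index = bonds + [(j, i) for i, j in bonds]
```

The pair (node_features, edge_index) is the typical input format of a graph neural network layer, in contrast to a flat string representation of the molecule.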

Through a multivariable statistical analysis of GraphDTA’s output data from hidden layers, the authors drew two conclusions. First, they identified correlations between hidden node activations and domain-specific drug annotations, such as the number of aliphatic hydroxyl groups, which suggests that the graph neural network can automatically assign importance to well-defined chemical features without any prior knowledge. Second, the model extracts features more easily from drugs with obvious molecular structure patterns and thereby achieves high-precision predictions, whereas drugs that do not have an obvious molecular structure pattern are more difficult to predict.

In short, with the development of deep learning, an increasing number of ML-based biology analysis applications can quickly and accurately evaluate the druggability of anticancer targets, providing reliable druggable targets for cancer treatment and reducing the time and financial costs of experiments.

Drug discovery

After evaluating the druggability of potential targets, it is essential to discover the drugs that interact with the potential therapeutic targets. Because complex or concomitant diseases often require treatment with multiple drugs, and the use of multiple drugs increases the risk of side effects²⁰⁰, it is essential for drug discovery to predict both drug-target and drug-drug interactions.

As in the section above, this study will introduce these applications from two perspectives: one is network-based biology analysis applications, and the other is ML-based biology analysis applications.

Network-based artificial intelligence for drug discovery

These network-based analysis applications for drug discovery consist of drug screening and drug repurposing. Drug screening is a process in which potential drugs are identified and optimized before a candidate drug is selected to progress to clinical trials²²⁶. Since screening drugs through biological experiments is laborious, expensive, and time-consuming²²⁶, network-based biology analysis applications have become an alternative way to screen drugs efficiently.

Identifying drug-target interactions (DTIs) is crucial for drug screening. In particular, novel DTIs can be used to search for novel anticancer drugs with known targets²²⁷.

The network-based biology analysis applications for DTI prediction are usually based on the guilt-by-association principle: a protein may be a target for a drug if many of the protein’s neighbors in the interaction network are targets of the drug²²⁸. Based on this principle, we classify the network-based biology analysis applications for predicting DTIs into two categories.

One is ‘top-down’, which proceeds from observable characteristics, such as side effects or the diseases treated by a drug, to the interaction. For example, Campillos et al.²²⁹ used physiological effect information from side-effect similarity networks to predict whether two molecules could interact.

The other is ‘bottom-up’, which proceeds from molecular features, such as protein structure, to the interaction. For example, Feng et al.²³⁰ and Lee et al.²³¹ predicted DTIs on the basis that proteins with similar property features in protein-protein interaction networks may interact with the same drug.
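The guilt-by-association principle behind both categories can be illustrated with a minimal sketch. It scores a candidate protein by the fraction of its network neighbours that are already known targets of a drug. The toy network, the protein names, and the function `guilt_by_association_score` are hypothetical and are not taken from any of the cited tools.

```python
from typing import Dict, Set


def guilt_by_association_score(
    ppi: Dict[str, Set[str]], known_targets: Set[str], protein: str
) -> float:
    """Fraction of a protein's neighbours that are known targets of the drug."""
    neighbours = ppi.get(protein, set())
    if not neighbours:
        return 0.0
    return len(neighbours & known_targets) / len(neighbours)


# Hypothetical toy PPI network stored as undirected adjacency sets.
ppi = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2"},
    "P4": {"P1", "P5"},
    "P5": {"P4"},
}
known_targets = {"P2", "P3"}  # assumed known targets of one drug

# Rank the proteins that are not yet known targets.
candidates = [p for p in ppi if p not in known_targets]
ranking = sorted(
    candidates,
    key=lambda p: guilt_by_association_score(ppi, known_targets, p),
    reverse=True,
)
print(ranking)
```

In this toy network P1 ranks first because two of its three neighbours are known targets. Published methods weight neighbours by similarity or interaction confidence rather than counting them equally.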

Drug repurposing, also known as drug repositioning, is another drug discovery application. It identifies new indications for approved drugs or for drug candidates that failed in the development phase²³². Because drug repurposing can significantly reduce the drug development period and costs compared with drug screening²³³, it is an attractive way to discover anticancer drugs.

Network-based biology analysis applications are well suited to drug repurposing analysis, because the constructed drug similarity networks capture the similarities, interactions, or linkages between drugs, diseases, and targets. Here, we introduce four major network-based biology analysis applications of drug repurposing^{234,235,236,237,238,239,240,241} as follows.

The first network-based biology analysis application of drug repurposing quantifies the similarities or relationships among known drug-disease associations, and then uses regression or statistical models to predict novel drug-disease associations^{234,235}. For example, Cheng et al.²⁴² presented a network-based drug repurposing tool that accurately predicts drug responses in cancer cell lines by integrating the human protein-protein interactome with transcriptome profiles, whole-exome sequencing, drug-target interactions, and drug-induced microarray data.

The second network-based biology analysis application of drug repurposing infers new indications of drugs by analyzing information flow or performing random walks on drug-disease association networks^{236,237,238}. For example, Luo et al.²⁴³ proposed a novel random walk method that measures the similarity of drugs and of diseases from their respective properties, so as to predict potential indications of drugs.
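As a rough illustration of the random walk idea, the sketch below runs a generic random walk with restart from one drug over a small network that mixes drug-disease associations with drug-drug and disease-disease similarity links. It is not the method of Luo et al.; the node names, edges, and the restart probability of 0.3 are assumptions chosen for the example.

```python
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from an edge list."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def random_walk_with_restart(
    adj: Dict[str, Set[str]],
    seed: str,
    restart: float = 0.3,
    tol: float = 1e-12,
    max_iter: int = 10000,
) -> Dict[str, float]:
    """Return the stationary visiting probabilities of a walk restarting at `seed`."""
    p0 = {n: (1.0 if n == seed else 0.0) for n in adj}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart mass goes back to the seed; the rest spreads over neighbours.
        nxt = {n: restart * p0[n] for n in adj}
        for u, neighbours in adj.items():
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in adj)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: known associations plus similarity links.
edges = [
    ("drug:A", "disease:X"),     # known indication of drug A
    ("drug:A", "drug:B"),        # drugs A and B are similar
    ("drug:B", "disease:Y"),     # known indication of drug B
    ("disease:Y", "disease:Z"),  # diseases Y and Z are similar
    ("drug:C", "disease:Z"),     # known indication of drug C
]
adj = build_adjacency(edges)
scores = random_walk_with_restart(adj, seed="drug:A")

# Candidate new indications: diseases not already linked to drug A.
candidates = [
    n for n in adj if n.startswith("disease:") and n not in adj["drug:A"]
]
ranked = sorted(candidates, key=scores.get, reverse=True)
print(ranked)
```

Disease Y scores higher than disease Z because it is reached through a shorter path from the seed drug, which is the behaviour a proximity-based repurposing score is meant to capture.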

The third network-based biology analysis application of drug repurposing, named individualized Network-based Co-Mutation, quantifies putative genetic interactions in cancer and can be used to identify candidate therapeutic pathways for cancer²³⁹. For example, Cheng et al.²⁴⁴ used this approach to identify potential targets or new indications of existing cancer drugs that directly target significantly mutated genes or their neighbor genes in the human PPI network.

The fourth network-based biology analysis application of drug repurposing works directly on the adjacency matrix of the drug-disease network^{240,241}. Based on this method, Luo et al.²⁴⁵ used a matrix completion algorithm to fill in the unknown entries of the drug–disease matrix by constructing a low-rank matrix approximation. New drug–disease associations are then screened according to the predicted values.
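The low-rank idea can be sketched with the simplest possible case, a rank-1 approximation fitted by alternating least squares over the observed entries only. This is a toy stand-in rather than the algorithm used by Luo et al.; the matrix values, the function name `rank1_complete`, and the use of `None` for unknown entries are assumptions made for illustration.

```python
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def rank1_complete(m: Matrix, iters: int = 200) -> List[List[float]]:
    """Fit m[i][j] ~ u[i] * v[j] on observed entries and return the filled matrix."""
    n_rows, n_cols = len(m), len(m[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iters):
        # Update each row factor with the column factors held fixed.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if m[i][j] is not None]
            den = sum(v[j] ** 2 for j in obs)
            if den > 0:
                u[i] = sum(m[i][j] * v[j] for j in obs) / den
        # Update each column factor with the row factors held fixed.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if m[i][j] is not None]
            den = sum(u[i] ** 2 for i in obs)
            if den > 0:
                v[j] = sum(m[i][j] * u[i] for i in obs) / den
    return [[u[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]


# Hypothetical drug (rows) x disease (columns) association scores.
# None marks the unknown entry that we want to predict.
toy = [
    [1.0, 2.0, 4.0],
    [2.0, 4.0, 8.0],
    [3.0, 6.0, None],
]
filled = rank1_complete(toy)
print(round(filled[2][2], 4))
```

Because the observed entries of this toy matrix are exactly rank 1, the missing entry is recovered from the row and column patterns. Real drug-disease matrices are sparse and noisy, so practical methods use higher ranks and regularization.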

Taken together, network-based drug screening and repurposing applications offer researchers many alternative approaches for rapid anticancer drug discovery.

ML-based artificial intelligence for drug discovery

Currently, ML-based biology analysis applications are employed for both drug screening and drug repurposing. For drug screening, previous studies have shown that network-based biology analysis applications can only screen the neighbour proteins of known targets, whereas drug-protein interactions may also dysregulate the targets’ interacting neighbours²²⁷, which leads to a high rate of false-positive predictions. ML-based biology analysis applications, such as graph-based neural networks, can integrate features from both the ‘top-down’²²⁹ and ‘bottom-up’²³⁰ approaches and thereby reduce the number of false positives.

For example, Hinnerichs et al.²²⁷ developed DTI-Voodoo, which uses graph neural networks to combine molecular features and phenotype information with an interaction network in order to predict drug-protein interactions (Fig. 11).

Firstly, the model takes two kinds of features, phenotype features and molecular features, as input. To extract phenotype features, they used DL2Vec²⁴⁶ to obtain ontology-based representations. DL2Vec constructs a graph by introducing nodes for each ontology class and edges for ontology axioms, and then performs random walks starting from each node in the graph to generate representations that encode drug effects or protein functions while preserving their semantic neighborhood within that graph. To extract molecular features, they used SmilesTransformer²⁴⁷ to capture the molecular organization of each drug from its molecular structure and DeepGOPlus²⁴⁸ to capture protein molecular features from amino acid sequences.

Secondly, they used two learnable feature transformer models to investigate the latent relationship between phenotype features and molecular features. Using this relationship information, the transformer model that takes the phenotype features as input outputs the protein embedding for the PPI network (the top-down approach), and the other transformer model, which takes the molecular features as input, outputs the drug embedding (the bottom-up approach).

Finally, a DNN was used to extract protein-related information from the drug embedding, while a GCN was used to update the node embeddings in the PPI network. The protein features and the drug features were then combined by computing their cosine similarity. The good performance of DTI-Voodoo demonstrates that graph-based neural networks are well suited to identifying novel drug-protein interactions.
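The last step, updating protein embeddings over the PPI network and scoring them against a drug embedding by cosine similarity, can be sketched as follows. This is a heavily simplified illustration and not the DTI-Voodoo implementation: the graph convolution is replaced by an unweighted neighbourhood average without learned parameters, and all feature vectors and protein names are made up.

```python
import math
from typing import Dict, List, Set

Vector = List[float]


def mean_aggregate(
    features: Dict[str, Vector], adj: Dict[str, Set[str]]
) -> Dict[str, Vector]:
    """One simplified graph-convolution step: average a node with its neighbours."""
    updated = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in adj.get(node, set())]
        updated[node] = [sum(col) / len(group) for col in zip(*group)]
    return updated


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical protein features and PPI edges (P1-P2, P2-P3).
protein_features = {"P1": [1.0, 0.0], "P2": [0.0, 1.0], "P3": [0.0, 1.0]}
ppi = {"P1": {"P2"}, "P2": {"P1", "P3"}, "P3": {"P2"}}

# Hypothetical drug embedding living in the same feature space.
drug_embedding = [0.0, 1.0]

protein_embeddings = mean_aggregate(protein_features, ppi)
scores = {
    p: cosine_similarity(drug_embedding, emb)
    for p, emb in protein_embeddings.items()
}
print(sorted(scores, key=scores.get, reverse=True))
```

After aggregation, P1 inherits part of its neighbour's features and receives a non-zero score for the drug, which shows how network context can change a prediction that raw features alone would miss.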

For drug repurposing, graph-based neural networks have an advantage in feature representation: they can use not only the drug-drug link information but also the features of drug-cancer pairs.

For example, Cui et al.²⁴⁹ proposed GraphRepur, a model for drug repurposing prediction based on graph neural networks. Firstly, the authors collected drug-induced gene expression data from the LINCS project²⁵⁰ as well as drug-drug link information from the STITCH database²⁵¹. Secondly, to obtain drug signatures, they identified differentially expressed genes for breast cancer and used the drug-induced genes from LINCS as drug signatures. Thirdly, based on the drug-drug link information from the STITCH database and the drug signatures, they constructed a drug-drug link graph with drug signatures as node features. Fourthly, they fed the drug signatures and drug-drug link information into GraphRepur, which computes scores for drugs that can be repurposed for treating breast cancer. Finally, the authors validated some of the predicted drugs for breast cancer using experimental data from the literature and showed that the model performs significantly better in drug repurposing than other models, such as GCN, DNN, and random forest.
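The graph construction step can be sketched as below. It is an illustrative data-assembly example under stated assumptions, not the GraphRepur code: restricting each drug signature to the disease's differentially expressed genes is one possible reading of the procedure, and the gene names, expression values, links, and helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_drug_graph(
    expression: Dict[str, Dict[str, float]],
    disease_genes: Set[str],
    links: Iterable[Tuple[str, str]],
) -> Tuple[List[str], Dict[str, List[float]], Dict[str, Set[str]]]:
    """Return (gene order, node features per drug, undirected drug-drug adjacency)."""
    genes = sorted(disease_genes)
    # Node feature = drug-induced expression change on each disease gene
    # (0.0 when the drug profile has no value for that gene).
    node_features = {
        drug: [profile.get(g, 0.0) for g in genes]
        for drug, profile in expression.items()
    }
    adjacency: Dict[str, Set[str]] = {drug: set() for drug in expression}
    for a, b in links:
        # Keep only links between two different drugs that both have signatures.
        if a in adjacency and b in adjacency and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return genes, node_features, adjacency


def isolated_nodes(adjacency: Dict[str, Set[str]]) -> List[str]:
    """Drugs without any link, which receive no neighbourhood information."""
    return sorted(d for d, nb in adjacency.items() if not nb)


# Hypothetical inputs standing in for LINCS profiles and STITCH links.
expression = {
    "drugA": {"G1": 1.5, "G2": -0.5, "G9": 2.0},
    "drugB": {"G1": -1.0},
    "drugC": {"G2": 0.8},
}
disease_genes = {"G1", "G2"}
links = [("drugA", "drugB"), ("drugB", "drugX")]

genes, node_features, adjacency = build_drug_graph(expression, disease_genes, links)
print(node_features, isolated_nodes(adjacency))
```

Counting isolated nodes is a cheap sanity check before training, since a graph neural network cannot propagate information to drugs that have no links.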

Furthermore, the authors draw three conclusions. The first is that drug-drug link information plays an important role in drug repurposing studies. The second is that a network with fewer isolated nodes provides more network topology information, which significantly improves the prediction performance of graph neural networks. The third is that drug-induced genetic features help to improve the DTI prediction accuracy of graph neural networks.

Taken together, with the development of graph-based neural networks, an increasing number of ML-based drug screening and repurposing applications can quickly and accurately discover anticancer drugs, reducing the time and financial costs of experiments.

Drug properties prediction

ADMET properties prediction

As discussed in section 4.3 (drug discovery step), once we have a list of drug molecules showing high affinity for the therapeutic target, it is necessary to investigate the properties of these candidate drugs^{252,253,254,255}. Since drug property prediction usually adopts ML-based methods, this study mainly reviews the ML-based biology analysis applications for predicting drug properties such as the absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of chemical compounds²⁵⁶. Table 5 briefly describes the ADMET properties.

Table 5 The brief description of the ADMET properties²⁵⁶


ADMET properties prediction can be treated as a classification or regression problem. Because of their strong feature representation ability¹⁷⁷, graph-based neural networks can capture drug descriptors (the physicochemical properties, molecular representations, and drug-like properties of molecules) from drug fingerprints (the substructure features of a molecule)²⁵⁷, and then predict ADMET properties with classification or regression algorithms (Fig. 12)²⁵⁸.

For example, Duvenaud et al.²⁵⁹ proposed a graph convolution network to learn drug molecular fingerprints, which performs better than the state-of-the-art circular fingerprint method for ADMET properties prediction. Since then, an increasing number of researchers have used graph-based neural networks to predict the ADMET properties of drug molecules.

For instance, Liu et al.¹⁷¹ proposed Chemi-Net, which uses a GCN for ADMET properties prediction. The input of Chemi-Net is the characterization of the atoms of the drug molecule and of the relationships between atoms, and its output is the predicted ADMET properties of the drug molecule. The predictive process of Chemi-Net is as follows.

Firstly, the model projects the assembled atom descriptors and atom pair descriptors (features between atomic pairs)²⁵⁷ onto a 3D space to obtain a graph structure shaped like the drug molecule. Secondly, Chemi-Net carries out a series of graph convolution operations to output a single fixed-size molecule embedding. Finally, the molecule embedding is passed through fully connected layers to obtain the ADMET property predictions for the drug.
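The key property of this pipeline, that molecules of different sizes are reduced to an embedding of the same length, can be shown with a minimal sketch. It is not the Chemi-Net architecture: the convolution below simply sums neighbour features without learned weights or 3D information, the atom features are toy one-hot vectors, and the linear read-out weights are arbitrary placeholders.

```python
from typing import List, Tuple

Features = List[List[float]]
Bonds = List[Tuple[int, int]]


def graph_convolve(atom_feats: Features, bonds: Bonds, rounds: int = 2) -> Features:
    """Repeatedly add each atom's neighbour features to its own (no learned weights)."""
    neighbours = {i: [] for i in range(len(atom_feats))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    h = [list(f) for f in atom_feats]
    for _ in range(rounds):
        # The comprehension reads the previous h for every atom before rebinding.
        h = [
            [h[i][k] + sum(h[j][k] for j in neighbours[i]) for k in range(len(h[i]))]
            for i in range(len(h))
        ]
    return h


def molecule_embedding(atom_feats: Features, bonds: Bonds, rounds: int = 2) -> List[float]:
    """Sum-pool atom states into one vector whose length ignores molecule size."""
    h = graph_convolve(atom_feats, bonds, rounds)
    return [sum(col) for col in zip(*h)]


def predict_property(embedding: List[float], weights: List[float], bias: float) -> float:
    """Stand-in for the fully connected read-out: a single linear layer."""
    return sum(w * x for w, x in zip(weights, embedding)) + bias


# Toy heavy-atom graphs with one-hot features [is_carbon, is_oxygen].
methanol = ([[1.0, 0.0], [0.0, 1.0]], [(0, 1)])
ethanol = ([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [(0, 1), (1, 2)])

emb_methanol = molecule_embedding(*methanol)
emb_ethanol = molecule_embedding(*ethanol)

# Placeholder weights; a trained model would learn these from ADMET data.
score = predict_property(emb_ethanol, weights=[0.1, -0.2], bias=0.0)
print(len(emb_methanol), len(emb_ethanol), score)
```

Sum pooling is what makes the embedding length independent of the number of atoms, so the same fully connected layers can be applied to every molecule.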

In summary, we expect that more artificial intelligence models for drug properties prediction will be developed in the future.

The drug properties application in clinical trial

Although a large number of artificial intelligence applications have been developed to study drug properties, it still takes on average 10–15 years and US$1.5–2.0 billion to bring a new drug to market²⁶⁰. One of the main stumbling blocks is the high failure rate of clinical trials. Therefore, some studies apply artificial intelligence to clinical trial design.

For example, Shah et al.²⁶¹ constructed an artificial intelligence system that uses ‘self-learning’ deep reinforcement learning to examine treatment regimens currently in use and iteratively adjust the doses. The system can thus determine the fewest and smallest doses that still shrink brain tumors while reducing toxicity, and eventually find an optimal treatment plan with the lowest possible potency and frequency of doses that still reduces tumor sizes to a degree comparable to that of traditional regimens. In simulated trials of 50 patients, the system designed treatment cycles that reduced dose potency to less than half while maintaining the same tumor-shrinking potential.

In conclusion, we believe that, as artificial intelligence applications for drug property prediction continue to develop, they will provide better support for clinical trials.

Discussion and Conclusions

Modelling of cellular networks underlying cancer has provided us with a quantitative framework to investigate the link between network properties and the disease by artificial intelligence biology analysis, thereby leading to the discovery of potential novel anticancer targets and drugs^{23,24,25,26,27,28,29}. However, there is no systematic review that introduces artificial intelligence biology analysis in cancer target identification and drug discovery. For this reason, this study briefly reviewed the scope of artificial intelligence biology analysis to explore new anticancer targets^{34,54,57,74,80}, the principles and theory of commonly used artificial intelligence biology analysis algorithms^{83,84,85,86,87,88,89,90,91}, and the applications of artificial intelligence biology analysis^{42,195,213}.

The scope of artificial intelligence analysis to explore novel anticancer targets consists of epigenetics⁵⁴, genomics⁵⁷, proteomics⁷⁴, metabolomics³⁴, etc. Since single-omics studies cannot identify anticancer targets accurately, we have to employ artificial intelligence biology analysis to effectively integrate multiple omics data, tackle the complexity of cancer that arises from interactions between genes and their products^{16,17}, and improve our understanding of carcinogenesis^{23,24,25,26,27,28,29}. Therefore, how to employ artificial intelligence biology analysis algorithms to integrate multiomics data and identify novel anticancer targets will be an important direction for future study.

Next, we introduced two categories of commonly used artificial intelligence algorithms. One is network-based biology analysis algorithms and the other is ML-based biology analysis algorithms. We here discuss their limitations and advantages.

The network-based biology analysis algorithms usually comprise shortest path⁸³, module detection⁸⁴ and network centrality⁸⁵ algorithms, which have three major advantages: First, they provide a variety of alternative approaches to identify cancer targets, and different algorithms can complement each other to identify targets from various perspectives, therefore providing new biological explanations³⁰; Second, since they are not limited by the scale of the network, they are good at dealing with small-sample networks; Third, prior biological knowledge and experience can be conveniently integrated into network-based biology analysis algorithms to make them interpretable.

However, previous studies also show two major shortcomings for the network-based algorithms: First, the current biological network data are biased toward much-studied targets²⁶². Since previous studies have paid much attention to these targets, the network-based algorithms will more likely identify these well-studied targets than others due to the data bias²⁶². Second, most algorithms only use the topological information of the biological network, but neglect the association between cell function or phenotypes and topological features (such as centrality-based algorithms that are discussed in Section 3.1.2).

ML-based biology analysis algorithms usually comprise decision trees^{86,87,88} and deep learning^{89,90,91}, which have two major advantages.

One is feature learning and detection^{177,181}, which employs sophisticated neural network architectures to link up features of biological networks and characterize their relationships. The model is then trained iteratively to detect features that are hard to detect with network-based biology analysis algorithms.

The other is their ability to effectively integrate large and diverse data. ML-based network biology analysis algorithms can integrate multiomics biological network data and identify novel targets²⁶³, thanks to the fast development of deep learning models and the easy access to high-throughput biological data.

Although employing ML-based algorithms greatly benefits the target identification and drug discovery for cancer treatment¹⁷⁴, we still have three major challenges to overcome.

The first challenge is the lack of consistent data for validation³³. Although recent advances in biotechnologies have enabled the fast generation of massive biomedical data, such data often suffer from inconsistent production and missing annotation, resulting in a lack of reliable and consistent data for validating deep learning models²⁶⁴.

The second challenge is the integration of heterogeneous information¹⁰³. Although deep learning models facilitate the integration of multimodal biological data, it is still difficult to build up a universal deep learning model due to the lack of biological domain knowledge²⁰⁰.

The third challenge is the difficulty of interpreting deep learning models¹⁸⁵. However, a recent study sheds light on this issue by combining a disease network with a neural network to characterize the mechanism of melanoma²⁶³. In addition, graph-based neural networks can improve the interpretability of deep learning models²⁶⁵.

In the last section of the study, we reviewed the applications of artificial intelligence biology analysis for cancer therapy from four perspectives: novel anticancer target identification¹⁸⁹, evaluation of the druggability of potential targets^{3,4}, drug discovery²⁰⁰, and drug properties prediction^{252,253,254,255}.

First, we presented several widely used applications to identify novel anticancer targets. However, as exemplified by WGCNA¹⁹⁵, these network-based biology analysis applications not only require high computing costs to reconstruct gene co-expression networks⁴² but also have difficulty in accurately locating effective network nodes. Although ML-based biology analysis applications use collaborative modelling with neighbourhood node information to reduce the computational cost and improve the predictive accuracy for anticancer targets, biological networks still have data bias²⁶², so most of the targets identified by current applications have already been reported in previous studies. Therefore, an efficient feature selection application that can solve the data bias problem will be appealing for novel therapeutic anticancer target identification^{266,267,268} in the future.

Second, we introduced several widely used applications to evaluate the druggability of potential targets. For example, PockDrug is usually used to predict druggable pockets on proteins²¹³. Although trRosetta²¹⁸ and AlphaFold²²¹ offer opportunities for PockDrug to evaluate the druggability of potential targets, PockDrug cannot yet predict druggability accurately because of the complexity of protein structure^{269,270,271}, and its predictions are costly to validate through biological experiments^{272,273}. Nevertheless, since DTA prediction can quickly provide reliable druggable targets for cancer care at low financial cost²¹¹, there is potential to develop efficient artificial intelligence biology analysis applications for DTA prediction in the future.

Third, we investigated several widely used applications for drug discovery, which consist of drug screening and drug repurposing.

For drug screening, identifying drug-target interactions (DTIs) is a crucial step. Since network-based biology analysis applications for DTI prediction are usually based on the guilt-by-association principle²²⁸, they can only predict the interacting neighbors of known cancer targets. ML-based biology analysis applications can now extend the predictions to downstream consequences²²⁷, thereby screening out more possible anticancer drugs.

For drug repurposing²³², there are four commonly used network-based biology analysis applications^{234,235,236,237,238,239,240,241} that integrate the similarities among various drugs but ignore prior knowledge. However, ML-based biology analysis applications not only can take advantage of the similarity among drugs, but also can integrate drug properties to improve the accuracy of drug repurposing.

Fourth, we introduced widely used applications for drug properties prediction. For example, graph convolution networks, which have a strong feature representation ability¹⁷⁷, can capture the features related to the ADMET properties of drugs from their molecular structures. Therefore, predicting drug properties with graph convolution networks that integrate drug molecular structures and drug clinical phenotypes is becoming a popular method²⁷⁴. We hope that more artificial intelligence biology analysis models will be developed to capture the features related to ADMET properties from drug molecular structures, thereby improving the success rate of clinical trials.

In summary, although we have reviewed and discussed many artificial intelligence algorithms and corresponding applications for novel anticancer target identification and drug discovery, this review is still too brief to cover the entire research area. However, because artificial intelligence algorithms are effective in exploring new anticancer targets and discovering drugs, we hope this review offers valuable insights for interested researchers to develop an understanding of the principles behind artificial intelligence biology analysis in cancer target identification and drug discovery. Moreover, we hope that our perspective on artificial intelligence and related applications will provide a pathway for further advancement in the field.

References

Shabani, M. & Hojjat-Farsangi, M. Targeting receptor tyrosine kinases using monoclonal antibodies: the most specific tools for targeted-based cancer therapy. Curr. Drug Targets 17, 1687–1703 (2016).
Paananen, J. & Fortino, V. An omics perspective on drug target discovery platforms. Brief. Bioinform 21, 1937–1953 (2019).
Hopkins, A. L. & Groom, C. R. Opinion: The druggable genome. Nat. Rev. Drug Discov. 1, 727–730 (2002).
Bushweller, J. H. Targeting transcription factors in cancer—from undruggable to reality. Nat. Rev. Cancer 19, 611–624 (2019).
Colaprico, A. et al. Interpreting pathways to discover cancer driver genes with Moonlight. Nat. Commun. 11, 69 (2020).
Dugger, S. A., Platt, A. & Goldstein, D. B. Drug development in the era of precision medicine. Nat. Rev. Drug Discov. 17, 183–196 (2018).
Manzari, M. T. et al. Targeted drug delivery strategies for precision medicines. Nat. Rev. Mater. 6, 351–370 (2021).
Rosenblum, D., Joshi, N., Tao, W., Karp, J. M. & Peer, D. Progress and challenges towards targeted delivery of cancer therapeutics. Nat. Commun. 9, 1410 (2018).
Song, H. et al. Denoising of MR and CT images using cascaded multi-supervision convolutional neural networks with progressive training. Neurocomputing 469, 354–365 (2022).
Zhang, L. et al. MCDB: a comprehensive curated mitotic catastrophe database for retrieval, protein sequence alignment, and target prediction. Acta Pharm. Sin. B 11, 3092–3104 (2021).
Gao, J., Liu, P., Liu, G. D. & Zhang, L. Robust needle localization and enhancement algorithm for ultrasound by deep learning and beam steering methods. J. Comput. Sci. Technol. 36, 334–346 (2021).
Liu, G. D., Li, Y. C., Zhang, W. & Zhang, L. A brief review of artificial intelligence applications and algorithms for psychiatric disorders. Eng.-Prc 6, 462–467 (2020).
Zhang, L., Bai, W., Yuan, N. & Du, Z. Comprehensively benchmarking applications for detecting copy number variation. PLoS Comput. Biol. 15, e1007069 (2019).
Zhang, L. & Zhang, S. Using game theory to investigate the epigenetic control mechanisms of embryo development: Comment on: “Epigenetic game theory: How to compute the epigenetic control of maternal-to-zygotic transition” by Qian Wang et al. Phys. Life Rev. 20, 140–142 (2017).
Zhou, Y., Wang, F., Tang, J., Nussinov, R. & Cheng, F. Artificial intelligence in COVID-19 drug repurposing. Lancet Digit. Health 2, e667–e676 (2020).
Suhail, Y. et al. Systems biology of cancer metastasis. Cell Syst. 9, 109–127 (2019).
Barabási, A.-L. & Oltvai, Z. N. Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5, 101–113 (2004).
Lv, J., Deng, S. & Zhang, L. A review of artificial intelligence applications for antimicrobial resistance. Biosaf. Health 3, 22–31 (2021).
Wu, W. et al. Exploring the dynamics and interplay of human papillomavirus and cervical tumorigenesis by integrating biological data into a mathematical model. BMC Bioinform. 21, 152 (2020).
Xiao, M. et al. 2019nCoVAS: developing the web service for epidemic transmission prediction, genome analysis, and psychological stress assessment for 2019-nCoV. IEEE/ACM Trans. Comput. Biol. Bioinform. 18, 1250–1261 (2021).
Xiao, M., Yang, X., Yu, J. & Zhang, L. CGIDLA: developing the web server for CpG island related density and LAUPs (Lineage-Associated Underrepresented Permutations) study. IEEE/ACM Trans. Comput Biol. Bioinform 17, 2148–2154 (2020).
Zhao, J., Cao, Y. & Zhang, L. Exploring the computational methods for protein-ligand binding site prediction. Comput. Struct. Biotechnol. J. 18, 417–426 (2020).
Chen, Y. et al. Variations in DNA elucidate molecular networks that cause disease. Nature 452, 429–435 (2008).
Hu, J. X., Thomas, C. E. & Brunak, S. Network biology concepts in complex disease comorbidities. Nat. Rev. Genet. 17, 615–629 (2016).
Ideker, T. & Nussinov, R. Network approaches and applications in biology. PLoS Comput. Biol. 13, e1005771 (2017).
Lai, X. et al. MiR-205-5p and miR-342-3p cooperate in the repression of the E2F1 transcription factor in the context of anticancer chemotherapy resistance. Theranostics 8, 1106 (2018).
Lai, X., Eberhardt, M., Schmitz, U. & Vera, J. Systems biology-based investigation of cooperating microRNAs as monotherapy or adjuvant therapy in cancer. Nucleic Acids Res. 47, 7753–7766 (2019).
Seyfried, N. T. et al. A multi-network approach identifies protein-specific co-expression in asymptomatic and symptomatic Alzheimer’s disease. Cell Syst. 4, 60–72.e64 (2017).
Vidal, M., Cusick, M. E. & Barabási, A.-L. Interactome networks and human disease. Cell 144, 986–998 (2011).
Chen, L. & Wu, J. Bio-network medicine. J. Mol. Cell Biol. 7, 185–186 (2015).
Ghanat Bari, M., Ung, C. Y., Zhang, C., Zhu, S. & Li, H. Machine learning-assisted network inference approach to identify a new class of genes that coordinate the functionality of cancer networks. Sci. Rep. 7, 6993 (2017).
Muzio, G., O’Bray, L. & Borgwardt, K. Biological network analysis with deep learning. Brief. Bioinform. 22, 1515–1530 (2021).
Camacho, D. M., Collins, K. M., Powers, R. K., Costello, J. C. & Collins, J. J. Next-generation machine learning for biological networks. Cell 173, 1581–1592 (2018).
Johnson, C. H., Ivanisevic, J. & Siuzdak, G. Metabolomics: beyond biomarkers and towards mechanisms. Nat. Rev. Mol. Cell Biol. 17, 451–459 (2016).
Hasin, Y., Seldin, M. & Lusis, A. Multi-omics approaches to disease. Genome Biol. 18, 1–15 (2017).
Kim, H. & Kim, Y.-M. Pan-cancer analysis of somatic mutations and transcriptomes reveals common functional gene clusters shared by multiple cancer types. Sci. Rep. 8, 6041 (2018).
Vinayagam, A. et al. Controllability analysis of the directed human protein interaction network identifies disease genes and drug targets. Proc. Natl Acad. Sci. USA 113, 4976–4981 (2016).
do Valle, Í. F. et al. Network integration of multi-tumour omics data suggests novel targeting strategies. Nat. Commun. 9, 1–10 (2018).
Yang, K. et al. A comprehensive analysis of metabolomics and transcriptomics in cervical cancer. Sci. Rep. 7, 43353 (2017).
Hamosh, A., Scott, A. F., Amberger, J. S., Bocchini, C. A. & McKusick, V. A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 33, D514–D517 (2005).
Casparie, M. et al. Pathology databanking and biobanking in The Netherlands, a central role for PALGA, the nationwide histopathology and cytopathology data network and archive. Cell Oncol. 29, 19–24 (2007).
Wishart, D. S. et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 46, D1074–D1082 (2017).
Wang, Y. et al. Therapeutic target database 2020: enriched resource for facilitating research and early development of targeted therapeutics. Nucleic Acids Res. 48, D1031–D1041 (2019).
Wang, Y. et al. PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res. 37, W623–W633 (2009).
Gaulton, A. et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100–D1107 (2011).
Edgar, R., Domrachev, M. & Lash, A. E. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207–210 (2002).
McLendon, R. et al. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 (2008).
Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
Consortium, E. P. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Forbes, S. A. et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 39, D945–D950 (2010).
Szklarczyk, D. et al. The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 49, D605–D612 (2020).
Consortium, G. O. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32, D258–D261 (2004).
Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Perakakis, N., Yazdani, A., Karniadakis, G. E. & Mantzoros, C. Omics, big data and machine learning as tools to propel understanding of biological mechanisms and to discover novel diagnostics and therapeutics. Metabolism 87, A1–A9 (2018).
Wilson, S. & Filipp, F. V. A network of epigenomic and transcriptional cooperation encompassing an epigenomic master regulator in cancer. NPJ Syst. Biol. Appl. 4, 24 (2018).
Filipp, F. V. Crosstalk between epigenetics and metabolism—Yin and Yang of histone demethylases and methyltransferases in cancer. Brief. Funct. Genom. 16, 320–325 (2017).
Holmes, M. V., Richardson, T. G., Ference, B. A., Davies, N. M. & Davey Smith, G. Integrating genomics with biomarkers and therapeutic targets to invigorate cardiovascular drug development. Nat. Rev. Cardiol. 18, 435–453 (2021).
Ozaki, K. et al. Functional SNPs in the lymphotoxin-α gene that are associated with susceptibility to myocardial infarction. Nat. Genet. 32, 650–654 (2002).
Golub, T. R. et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537 (1999).
Oliver, S. Proteomics: guilt-by-association goes global. Nature 403, 601–603 (2000).
Lanza, V. F., Baquero, F., Cruz, F. D. L. & Coque, T. M. AccNET (Accessory Genome Constellation Network): comparative genomics software for accessory genome analysis using bipartite networks. Bioinformatics 33, btw601 (2016).
Califano, A., Butte, A. J., Friend, S., Ideker, T. & Schadt, E. Leveraging models of cell regulation and GWAS data in integrative network-based association studies. Nat. Genet. 44, 841–847 (2012).
Escala-Garcia, M. et al. A network analysis to identify mediators of germline-driven differences in breast cancer prognosis. Nat. Commun. 11, 312 (2020).
Pidò, S., Ceddia, G. & Masseroli, M. Computational analysis of fused co-expression networks for the identification of candidate cancer gene biomarkers. NPJ Syst. Biol. Appl. 7, 17 (2021).
Kori, M. & Arga, K. Y. Potential biomarkers and therapeutic targets in cervical cancer: insights from the meta-analysis of transcriptomics data within network biomedicine perspective. PLoS One 13, e0200717 (2018).
Cantini, L., Medico, E., Fortunato, S. & Caselle, M. Detection of gene communities in multi-networks reveals cancer drivers. Sci. Rep. 5, 17386 (2015).
Zhang, L., Dai, Z., Yu, J. & Xiao, M. CpG-island-based annotation and analysis of human housekeeping genes. Brief. Bioinform. 22, 515–525 (2021).
Zhang, L. et al. Computed tomography angiography-based analysis of high-risk intracerebral haemorrhage patients by employing a mathematical model. BMC Bioinform. 20, 193 (2019).
Zhang, L. et al. EZH2-, CHD4-, and IDH-linked epigenetic perturbation and its association with survival in glioma patients. J. Mol. Cell Biol. 9, 477–488 (2017).
Zhang, L. et al. Investigation of mechanism of bone regeneration in a porous biodegradable calcium phosphate (CaP) scaffold by a combination of a multi-scale agent-based model and experimental optimization/validation. Nanoscale 8, 14877–14887 (2016).
Zhang, L., Xiao, M., Zhou, J. & Yu, J. Lineage-associated underrepresented permutations (LAUPs) of mammalian genomic sequences based on a Jellyfish-based LAUPs analysis application (JBLA). Bioinformatics 34, 3624–3630 (2018).
Zhang, L. et al. Building up a robust risk mathematical platform to predict colorectal cancer. Complexity 2017, 8917258 (2017).
Zhang, L. et al. Bioinformatic analysis of chromatin organization and biased expression of duplicated genes between two poplars with a common whole-genome duplication. Horticult. Res. 8, 62 (2021).
Ong, S.-E. & Mann, M. Mass spectrometry–based proteomics turns quantitative. Nat. Chem. Biol. 1, 252–262 (2005).
Li, Z. et al. The OncoPPi network of cancer-focused protein–protein interactions to inform biological insights and therapeutic strategies. Nat. Commun. 8, 14356 (2017).
Kalman, R. E. Mathematical description of linear dynamical systems. J. Soc. Ind. Appl. Math. Ser. A Control 1, 152–192 (1963).
Ravindran, V., Sunitha, V. & Bagler, G. Identification of critical regulatory genes in cancer signaling network using controllability analysis. Phys. A: Stat. Mech. Appl. 474, 134–143 (2017).
do Valle, I. F. et al. Network medicine framework shows that proximity of polyphenol targets and disease proteins predicts therapeutic effects of polyphenols. Nat. Food 2, 143–155 (2021).
Basler, G., Nikoloski, Z., Larhlimi, A., Barabási, A.-L. & Liu, Y.-Y. Control of fluxes in metabolic networks. Genome Res. 26, 956–968 (2016).
Chakraborty, S., Hosen, M. I., Ahmed, M. & Shekhar, H. U. Onco-Multi-OMICS approach: a new frontier in cancer research. Biomed. Res. Int. 2018, 9836256 (2018).
Zhang, C. et al. The identification of key genes and pathways in hepatocellular carcinoma by bioinformatics analysis of high-throughput data. Med. Oncol. 34, 101 (2017).
Gov, E., Kori, M. & Arga, K. Y. Multiomics analysis of tumor microenvironment reveals Gata2 and miRNA-124-3p as potential novel biomarkers in ovarian cancer. OMICS 21, 603–615 (2017).
Guney, E., Menche, J., Vidal, M. & Barabási, A.-L. Network-based in silico drug efficacy screening. Nat. Commun. 7, 10331 (2016).
Fortunato, S. Community detection in graphs. Phys. Rep. 486, 75–174 (2009).
Lü, L. et al. Vital nodes identification in complex networks. Phys. Rep. 650, 1–63 (2016).
Jiménez-Luna, J., Grisoni, F. & Schneider, G. Drug discovery with explainable artificial intelligence. Nat. Mach. Intell. 2, 573–584 (2020).
Loh, W.-Y. Classification and regression trees. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 1, 14–23 (2011).
Cao, Y., Geddes, T. A., Yang, J. Y. H. & Yang, P. Ensemble deep learning in bioinformatics. Nat. Mach. Intell. 2, 500–508 (2020).
Nordhausen, K. An introduction to statistical learning: with applications in R by Gareth James, Daniela Witten, Trevor Hastie & Robert Tibshirani. Int. Stat. Rev. 82, 156–157 (2014).
Hao, X., Zhang, G. & Ma, S. Deep learning. Int. J. Semantic Comput. 10, 417–439 (2016).
LeCun, Y., Bengio, Y. & Hinton, G. E. Deep learning. Nature 521, 436–444 (2015).
Barabási, A.-L., Gulbahce, N. & Loscalzo, J. Network medicine: a network-based approach to human disease. Nat. Rev. Genet. 12, 56–68 (2011).
Karlebach, G. & Shamir, R. Modelling and analysis of gene regulatory networks. Nat. Rev. Mol. Cell Biol. 9, 770–780 (2008).
Stelling, J., Klamt, S., Bettenbrock, K., Schuster, S. & Gilles, E. D. Metabolic network structure determines key aspects of functionality and regulation. Nature 420, 190–193 (2002).
Hu, T.-M. & Hayton, W. L. Architecture of the drug-drug interaction network. J. Clin. Pharm. Ther. 36, 135–143 (2011).
Martinez, V., Berzal, F. & Cubero, J. C. A survey of link prediction in complex networks. ACM Comput. Surv. 49, 69:1–69:33 (2017).
Hens, C., Harush, U., Haber, S., Cohen, R. & Barzel, B. Spatiotemporal signal propagation in complex networks. Nat. Phys. 15, 403–412 (2019).
Lazareva, O., Baumbach, J., List, M. & Blumenthal, D. B. On the limits of active module identification. Brief. Bioinform. 22, bbab066 (2021).
Liu, Y.-Y., Slotine, J.-J. & Barabási, A.-L. Controllability of complex networks. Nature 473, 167–173 (2011).
Seal, A. & Wild, D. J. Netpredictor: R and Shiny package to perform drug-target network analysis and prediction of missing links. BMC Bioinform. 19, 265 (2018).
Kuperstein, I. et al. The shortest path is not the one you know: application of biological network resources in precision oncology research. Mutagenesis 30, 191–204 (2015).
Rabbani, M. & Kazemi, S. Solving uncapacitated multiple allocation p-hub center problem by Dijkstra’s algorithm-based genetic algorithm and simulated annealing. Int. J. Ind. Eng. Comput. 6, 405–418 (2015).
Li, Z. et al. Identifying novel genes and chemicals related to nasopharyngeal cancer in a heterogeneous network. Sci. Rep. 6, 25515 (2016).
Ruiz, C., Zitnik, M. & Leskovec, J. Identification of disease treatment mechanisms through the multiscale interactome. Nat. Commun. 12, 1796 (2021).
Chen, L. et al. Identification of novel candidate drivers connecting different dysfunctional levels for lung adenocarcinoma using protein-protein interactions and a shortest path approach. Sci. Rep. 6, 29849 (2016).
Li, B.-Q., Huang, T., Liu, L., Cai, Y.-D. & Chou, K.-C. Identification of colorectal cancer related genes with mrmr and shortest path in protein-protein interaction network. PLoS One 7, e33393 (2012).
Peng, H., Long, F. & Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27, 1226–1238 (2005).
Barthélemy, M. Betweenness centrality in large complex networks. Eur. Phys. J. B 38, 163–168 (2004).
Maclean, H. E., Warne, G. L. & Zajac, J. D. Localization of functional domains in the androgen receptor. J. Steroid Biochem. Mol. Biol. 62, 233–242 (1997).
Tusher, V. G., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA 98, 5116 (2001).
Szklarczyk, D. et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613 (2018).
Lu, S., Zhu, Z.-G. & Lu, W.-C. Inferring novel genes related to colorectal cancer via random walk with restart algorithm. Gene Ther. 26, 373–385 (2019).
Menche, J. et al. Uncovering disease-disease relationships through the incomplete interactome. Science 347, 1257601 (2015).
Choobdar, S. et al. Assessment of network module identification across complex diseases. Nat. Methods 16, 843–852 (2019).
Fortunato, S. & Hric, D. Community detection in networks: a user guide. Phys. Rep. 659, 1–44 (2016).
Newman, M. E. J. Communities, modules and large-scale structure in networks. Nat. Phys. 8, 25–31 (2012).
Cowen, L., Ideker, T., Raphael, B. J. & Sharan, R. Network propagation: a universal amplifier of genetic associations. Nat. Rev. Genet. 18, 551–562 (2017).
Silverbush, D. et al. Simultaneous integration of multi-omics data improves the identification of cancer driver modules. Cell Syst. 8, 456–466.e455 (2019).
Hossain, S. M. M., Halsana, A. A., Khatun, L., Ray, S. & Mukhopadhyay, A. Discovering key transcriptomic regulators in pancreatic ductal adenocarcinoma using Dirichlet process Gaussian mixture model. Sci. Rep. 11, 7853 (2021).
Ghiassian, S. D. et al. Endophenotype network models: common core of complex diseases. Sci. Rep. 6, 27414 (2016).
Ghiassian, S. D., Menche, J. & Barabási, A.-L. A DIseAse MOdule Detection (DIAMOnD) algorithm derived from a systematic analysis of connectivity patterns of disease proteins in the human interactome. PLoS Comput. Biol. 11, e1004120 (2015).
Wang, R.-S. & Loscalzo, J. Network-based disease module discovery by a novel seed connector algorithm with pathobiological implications. J. Mol. Biol. 430, 2939–2950 (2018).
Wang, Q., Yu, H., Zhao, Z. & Jia, P. EW_dmGWAS: edge-weighted dense module search for genome-wide association studies and gene expression profiles. Bioinformatics 31, 2591–2594 (2015).
Zhang, Y. et al. A gene module identification algorithm and its applications to identify gene modules and key genes of hepatocellular carcinoma. Sci. Rep. 11, 5517 (2021).
Paci, P. et al. Gene co-expression in the interactome: moving from correlation toward causation via an integrated approach to disease module discovery. NPJ Syst. Biol. Appl. 7, 3 (2021).
Tripathi, B., Parthasarathy, S., Sinha, H., Raman, K. & Ravindran, B. Adapting community detection algorithms for disease module identification in heterogeneous biological networks. Front. Genet. 10, 164 (2019).
Jeong, H., Mason, S. P., Barabási, A. L. & Oltvai, Z. N. Lethality and centrality in protein networks. Nature 411, 41–42 (2001).
Mangangcha, I. R., Malik, M. Z., Küçük, Ö., Ali, S. & Singh, R. K. B. Identification of key regulators in prostate cancer from gene expression datasets of patients. Sci. Rep. 9, 16420 (2019).
Kitsak, M. et al. Identification of influential spreaders in complex networks. Nat. Phys. 6, 888–893 (2010).
Jalili, M. et al. Evolution of centrality measurements for the detection of essential proteins in biological networks. Front. Physiol. 7, 375 (2016).
Pastor-Satorras, R. & Castellano, C. Distinct types of eigenvector localization in networks. Sci. Rep. 6, 18847 (2016).
Barabási, A.-L. & Albert, R. Emergence of scaling in random networks. Science 286, 509–512 (1999).
Zhang, J. et al. P4HB, a novel hypoxia target gene related to gastric cancer invasion and metastasis. Biomed. Res. Int. 2019, 9749751 (2019).
Ahajjam, S. & Badir, H. Identification of influential spreaders in complex networks using HybridRank algorithm. Sci. Rep. 8, 11932 (2018).
Malliaros, F. D., Rossi, M.-E. G. & Vazirgiannis, M. Locating influential nodes in complex networks. Sci. Rep. 6, 19307 (2016).
Li, H. et al. Deciphering the mechanism of Indirubin and its derivatives in the inhibition of Imatinib resistance using a “drug target prediction-gene microarray analysis-protein network construction” strategy. BMC Complement. Alter. Med. 19, 75 (2019).
Taylor, I. W. et al. Dynamic modularity in protein interaction networks predicts breast cancer outcome. Nat. Biotechnol. 27, 199–204 (2009).
Raman, K., Damaraju, N. & Joshi, G. K. The organisational structure of protein networks: revisiting the centrality-lethality hypothesis. Syst. Synth. Biol. 8, 73–81 (2014).
Mallik, S. & Maulik, U. MiRNA-TF-gene network analysis through ranking of biomolecules for multi-informative uterine leiomyoma dataset. J. Biomed. Inform. 57, 308–319 (2015).
Chen, C. et al. Construction and analysis of protein-protein interaction networks based on proteomics data of prostate cancer. Int. J. Mol. Med. 37, 1576–1586 (2016).
Al-Aamri, A., Taha, K., Al-Hammadi, Y., Maalouf, M. & Homouz, D. Analyzing a co-occurrence gene-interaction network to identify disease-gene association. BMC Bioinform. 20, 70 (2019).
Jiang, P. et al. Network analysis of gene essentiality in functional genomics experiments. Genome Biol. 16, 1–10 (2015).
Chen, K.-H., Wang, K.-J., Wang, K.-M. & Angelia, M.-A. Applying particle swarm optimization-based decision tree classifier for cancer classification on gene expression data. Appl. Soft Comput. 24, 773–780 (2014).
Chen, K.-H. et al. Gene selection for cancer identification: a decision tree model empowered by particle swarm optimization algorithm. BMC Bioinform. 15, 49 (2014).
Li, Y., Tang, X.-Q., Bai, Z. & Dai, X. Exploring the intrinsic differences among breast tumor subtypes defined using immunohistochemistry markers based on the decision tree. Sci. Rep. 6, 1–13 (2016).
Carson, M. B. & Lu, H. Network-based prediction and knowledge mining of disease genes. BMC Med. Genom. 8, S9 (2015).
Ramadan, E., Alinsaif, S. & Hassan, M. R. Network topology measures for identifying disease-gene association in breast cancer. BMC Bioinform. 17, 274 (2016).
Lerman, R. I. & Yitzhaki, S. A note on the calculation and interpretation of the Gini index. Econ. Lett. 15, 363–368 (1984).
Burt, R. S. Structural holes and good ideas. Am. J. Sociol. 110, 349–399 (2004).
Ye, N., Zhang, Y., Wang, R. & Malekian, R. Vehicle trajectory prediction based on Hidden Markov Model. KSII Trans. Internet Infor. Syst. 10, 3150–3170 (2016).
Estrada, E. & Rodríguez-Velázquez, J. A. Subgraph centrality in complex networks. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 71, 056103 (2005).
Guimerà, R. & Nunes Amaral, L. A. Functional cartography of complex metabolic networks. Nature 433, 895–900 (2005).
Katz, L. A new status index derived from sociometric analysis. Psychometrika 18, 39–43 (1953).
Towfic, F. et al. Detection of gene orthology from gene co-expression and protein interaction networks. BMC Bioinform. 11, S7 (2010).
Soffer, S. N. & Vázquez, A. Network clustering coefficient without degree-correlation biases. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 71, 057101 (2005).
Breiman, L. Bagging predictors. Mach. Learn. 24, 123–140 (1996).
Ezzat, A., Wu, M., Li, X. L. & Kwoh, C. K. Drug-target interaction prediction using ensemble learning and dimensionality reduction. Methods 129, 81 (2017).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Sarica, A., Cerasa, A. & Quattrone, A. Random Forest algorithm for the classification of neuroimaging data in Alzheimer’s disease: a systematic review. Front. Aging Neurosci. 9, 329 (2017).
Toth, R., Schiffmann, H., Hube-Magg, C., Büscheck, F. & Gerhäuser, C. Random forest-based modelling to detect biomarkers for prostate cancer progression. Clin. Epigenet. 11, 148 (2019).
Jin, H. & Ling, C. X. Using AUC and accuracy in evaluating learning algorithms. IEEE Trans. Knowl. Data Eng. 17, 299–310 (2005).
Kingsford, C. & Salzberg, S. L. What are decision trees? Nat. Biotechnol. 26, 1011–1013 (2008).
Hao, D. & Li, C. The dichotomy in degree correlation of biological networks. PLoS One 6, e28322 (2011).
Zhang, Q., Wang, F. Y., Zeng, D. & Wang, T. Understanding crowd-powered search groups: a social network perspective. PLoS One 7, 1–16 (2012).
Freund, Y. & Mason, L. The Alternating Decision Tree Learning Algorithm. In Proc. Sixteenth International Conference on Machine Learning, 124–133 (1999).
Zhang, L. et al. Revealing dynamic regulations and the related key proteins of myeloma-initiating cells by integrating experimental data into a systems biological model. Bioinformatics 37, 1554–1561 (2021).
Tabrizchi, H., Tabrizchi, M. & Tabrizchi, H. Breast cancer diagnosis using a multi-verse optimizer-based gradient boosting decision tree. SN Appl. Sci. 2, 1–19 (2020).
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
Liu, H., Zhang, W., Song, Y., Deng, L. & Zhou, S. HNet-DNN: inferring new drug–disease associations with deep neural network based on heterogeneous network features. J. Chem. Inform. Modeling 60, 2367–2376 (2020).
Ragoza, M., Hochuli, J., Idrobo, E., Sunseri, J. & Koes, D. R. Protein–ligand scoring with convolutional neural networks. J. Chem. Inform. Modeling 57, 942–957 (2017).
Korotcov, A., Tkachenko, V., Russo, D. P. & Ekins, S. Comparison of deep learning with multiple machine learning methods and metrics using diverse drug discovery data sets. Mol. Pharm. 14, 4462–4475 (2017).
Ma, T., Xiao, C., Zhou, J. & Wang, F. Drug similarity integration through attentive multi-view graph auto-encoders. In Proc. Twenty-Seventh International Joint Conference on Artificial Intelligence, 3477–3483, https://doi.org/10.24963/ijcai.2018/483 (2018).
Lan, W. et al. GANLDA: graph attention network for lncRNA-disease associations prediction. Neurocomputing 469, 384–393 (2022).
Li, G. et al. Predicting MicroRNA-disease associations using network topological similarity based on DeepWalk. IEEE Access 5, 24032–24039 (2017).
Webb, S. Deep learning for biology. Nature 554, 555–557 (2018).
Selvaraj, G. et al. Identification of target gene and prognostic evaluation for lung adenocarcinoma using gene expression meta-analysis, network analysis and neural network algorithms. J. Biomed. Inform. 86, 120–134 (2018).
Goyal, P. & Ferrara, E. Graph embedding techniques, applications, and performance: a survey. Knowl.-Based Syst. 151, 78–94 (2017).
Wu, Z. et al. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn Syst. 32, 4–24 (2021).
Zheng, K., You, Z.-H., Wang, L., Wong, L. & Chen, Z.-H. Inferring disease-associated Piwi-interacting RNAs via graph attention networks. 239–250 (2020).
Vaswani, A. et al. Attention is all you need. In Proc. 31st International Conference on Neural Information Processing Systems, 5998–6008 (2017).
Singh, M., Singh, R. & Ross, A. A comprehensive overview of biometric fusion. Inform. Fusion 52, 187–205 (2019).
Shi, Z., Zhang, H., Jin, C., Quan, X. & Yin, Y. A representation learning model based on variational inference and graph autoencoder for predicting lncRNA-disease associations. BMC Bioinform. 22, 136 (2021).
Jordan, M. I. & Mitchell, T. M. Machine learning: trends, perspectives, and prospects. Science 349, 255–260 (2015).
Kim, D. et al. Using knowledge-driven genomic interactions for multi-omics data analysis: metadimensional models for predicting clinical outcomes in ovarian carcinoma. J. Am. Med. Inform. Assoc. 24, 577–587 (2016).
Vamathevan, J. et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov. 18, 463–477 (2019).
Finnegan, A. & Song, J. S. Maximum entropy methods for extracting the learned features of deep neural networks. PLoS Comput. Biol. 13, e1005836 (2017).
Hutson, M. Artificial intelligence faces reproducibility crisis. Science 359, 725–726 (2018).
Ozerov, I. V. et al. In silico Pathway Activation Network Decomposition Analysis (iPANDA) as a method for biomarker development. Nat. Commun. 7, 13427 (2016).
Xia, J., Benner, M. J. & Hancock, R. E. NetworkAnalyst-integrative approaches for protein–protein interaction network analysis and visual exploration. Nucleic Acids Res. 42, W167–W174 (2014).
Hernández-de-Diego, R. et al. PaintOmics 3: a web resource for the pathway analysis and visualization of multi-omics data. Nucleic Acids Res. 46, W503–W509 (2018).
Tuhkuri, A. et al. Patients with early-stage oropharyngeal cancer can be identified with label-free serum proteomics. Br. J. Cancer 119, 200–212 (2018).
Abbas, S. Z., Qadir, M. I. & Muhammad, S. A. Systems-level differential gene expression analysis reveals new genetic variants of oral cancer. Sci. Rep. 10, 14667 (2020).
Ren, G. & Liu, Z. NetCAD: a network analysis tool for coronary artery disease-associated PPI network. Bioinformatics 29, 279–280 (2012).
Reimand, J. et al. Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap. Nat. Protoc. 14, 482–517 (2019).
Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 9, 559 (2008).
Zhou, X.-G. et al. Identifying miRNA and gene modules of colon cancer associated with pathological stage by weighted gene co-expression network analysis. Onco Targets Ther. 11, 2815–2830 (2018).
Langfelder, P., Zhang, B. & Horvath, S. Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics 24, 719–720 (2008).
Wang, A. et al. Cell adhesion-related molecules play a key role in renal cancer progression by multinetwork analysis. Biomed. Res. Int. 2019, 2325765 (2019).
Lai, X. et al. Network- and systems-based re-engineering of dendritic cells with non-coding RNAs for cancer immunotherapy. Theranostics 11, 1412–1428 (2021).
Jin, S., Zeng, X., Xia, F., Huang, W. & Liu, X. Application of deep learning methods in biological networks. Brief. Bioinform. 22, 1902–1917 (2020).
Zhu, Y., Shen, X. & Pan, W. Network-based support vector machine for classification of microarray samples. BMC Bioinform. 10, S21 (2009).
Sanchez, R. & Mackenzie, S. A. Integrative network analysis of differentially methylated and expressed genes for biomarker identification in leukemia. Sci. Rep. 10, 2123 (2020).
Wang, T. et al. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat. Commun. 12, 3445 (2021).
Xuan, P., Pan, S., Zhang, T., Liu, Y. & Sun, H. Graph convolutional network and convolutional neural network based method for predicting lncRNA-disease associations. Cells 8, 1012 (2019).
Wu, M.-Y. et al. Regularized logistic regression with network-based pairwise interaction for biomarker identification in breast cancer. BMC Bioinform. 17, 108 (2016).
Swan, A. L., Mobasheri, A., Allaway, D., Liddell, S. & Bacardit, J. Application of machine learning to proteomics data: classification and biomarker identification in postgenomics biology. OMICS 17, 595–610 (2013).
Gigliotti, B. J., Russell, M. D., Shonka, D. & Stathatos, N. Fine-needle aspiration and molecular analysis. Surgery of the Thyroid and Parathyroid Glands (Third Edition), 118–131, https://doi.org/10.1016/B978-0-323-66127-0.00012-0 (2021).
Sinkala, M., Mulder, N. & Martin, D. Machine learning and network analyses reveal disease subtypes of pancreatic cancer and their molecular characteristics. Sci. Rep. 10, 1212 (2020).
Kaczmarek, E. et al. Multi-Omic graph transformers for cancer classification and interpretation. In Proc. Pacific Symposium on Biocomputing 27, 373–384, https://doi.org/10.1142/9789811250477_0034.
Vermeulen, M. & Lelie, N. The current status of nucleic acid amplification technology in transfusion-transmitted infectious disease testing. ISBT Sci. Ser. 11, 123–128 (2016).
Kuhlman, B. & Bradley, P. Advances in protein structure prediction and design. Nat. Rev. Mol. Cell Biol. 20, 681–697 (2019).
Kandoi, G., Acencio, M. L. & Lemke, N. Prediction of druggable proteins using machine learning and systems biology: a mini-review. Front. Physiol. 6, 366 (2015).
Hussein, H. A. et al. PockDrug-Server: a new web server for predicting pocket druggability on holo and apo proteins. Nucleic Acids Res. 43, W436–W442 (2015).
Yang, Y.-F., Yu, B., Zhang, X.-X. & Zhu, Y.-H. Identification of TNIK as a novel potential drug target in thyroid cancer based on protein druggability prediction. Medicine 100, e25541 (2021).
Wang, S. et al. Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput. Biol. 13, e1005324 (2017).
Ovchinnikov, S. et al. Protein structure determination using metagenome sequence data. Science 355, 294–298 (2017).
Wang, T. et al. Improved fragment sampling for ab initio protein structure prediction using deep neural networks. Nat. Mach. Intell. 1, 347–355 (2019).
Yang, J. et al. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl Acad. Sci. USA 117, 1496–1503 (2020).
Kryshtafovych, A., Schwede, T., Topf, M., Fidelis, K. & Moult, J. Critical assessment of methods of protein structure prediction (CASP)—Round XIII. Proteins 87, 1011–1020 (2019).
Haas, J. et al. Continuous Automated Model EvaluatiOn (CAMEO) complementing the critical assessment of structure prediction in CASP12. Proteins 86, 387–398 (2018).
Senior, A. W. et al. Improved protein structure prediction using potentials from deep learning. Nature 577, 706–710 (2020).
Shim, J., Hong, Z.-Y., Sohn, I. & Hwang, C. Prediction of drug–target binding affinity using similarity-based convolutional neural network. Sci. Rep. 11, 4416 (2021).
Liu, B., He, H., Luo, H., Zhang, T. & Jiang, J. Artificial intelligence and big data facilitated targeted drug discovery. Stroke Vasc. Neurol. 4, 206–213 (2019).
He, T., Heidemeyer, M., Ban, F., Cherkasov, A. & Ester, M. SimBoost: a read-across approach for predicting drug–target binding affinities using gradient boosting machines. J. Cheminform. 9, 24 (2017).
Nguyen, T. et al. GraphDTA: predicting drug–target binding affinity with graph neural networks. Bioinformatics 37, 1140–1147 (2020).
Roda, A., Guardigli, M., Pasini, P. & Mirasoli, M. Bioluminescence and chemiluminescence in drug screening. Anal. Bioanal. Chem. 377, 826–833 (2003).
Hinnerichs, T. & Hoehndorf, R. DTI-Voodoo: machine learning over interaction networks and ontology-based background knowledge predicts drug–target interactions. Bioinformatics 37, 4835–4843 (2021).
Oliver, S. Guilt-by-association goes global. Nature 403, 601–602 (2000).
Campillos, M. et al. Drug target identification using side-effect similarity. Science 321, 263–266 (2008).
Feng, Y., Wang, Q. & Wang, T. Drug target protein-protein interaction networks: a systematic perspective. Biomed. Res. Int. 2017, 1289259 (2017).
Lee, I., Keum, J. & Nam, H. DeepConv-DTI: prediction of drug-target interactions via deep learning with convolution on protein sequences. PLoS Comput. Biol. 15, e1007129 (2019).
Parvathaneni, V., Kulkarni, N. S., Muth, A. & Gupta, V. Drug repurposing: a promising tool to accelerate the drug discovery process. Drug Discov. Today 24, 2076–2085 (2019).
Pritchard, J. E., O’Mara, T. A. & Glubb, D. M. Enhancing the promise of drug repositioning through genetics. Front. Pharm. 8, 896 (2017).
Gottlieb, A., Stein, G. Y., Ruppin, E. & Sharan, R. PREDICT: a method for inferring novel drug indications with application to personalized medicine. Mol. Syst. Biol. 7, 496 (2011).
Iwata, H., Sawada, R., Mizutani, S. & Yamanishi, Y. Systematic drug repositioning for a wide range of diseases with integrative analyses of phenotypic and molecular data. J. Chem. Inform. Model. 55, 446–459 (2015).
Liu, H., Song, Y., Guan, J., Luo, L. & Zhuang, Z. Inferring new indications for approved drugs via random walk on drug-disease heterogenous networks. BMC Bioinform. 17, 539 (2016).
Luo, H. et al. Drug repositioning based on comprehensive similarity measures and Bi-Random walk algorithm. Bioinformatics 32, 2664–2671 (2016).
Wang, W., Yang, S., Zhang, X. & Li, J. Drug repositioning by integrating target information through a heterogeneous network model. Bioinformatics 30, 2923–2930 (2014).
Liu, C. et al. Individualized genetic network analysis reveals new therapeutic vulnerabilities in 6,700 cancer genomes. PLoS Comput. Biol. 16, e1007701 (2020).
Yang, M., Luo, H., Li, Y. & Wang, J. Drug repositioning based on bounded nuclear norm regularization. Bioinformatics 35, i455–i463 (2019).
Yang, M., Luo, H., Li, Y., Wu, F.-X. & Wang, J. Overlap matrix completion for predicting drug-associated indications. PLoS Comput. Biol. 15, e1007541 (2019).
Cheng, F. et al. A genome-wide positioning systems network algorithm for in silico drug repurposing. Nat. Commun. 10, 3476 (2019).
Luo, H. et al. Drug repositioning based on comprehensive similarity measures and Bi-Random Walk algorithm. Bioinformatics 32, btw228 (2016).
Cheng, F., Zhao, J., Fooksa, M. & Zhao, Z. A network-based drug repositioning infrastructure for precision cancer medicine through targeting significantly mutated genes in the human cancer genomes. J. Am. Med. Inform. Assoc. 23, 681–691 (2016).
Luo, H. et al. Computational drug repositioning using low-rank matrix approximation and randomized algorithms. Bioinformatics 34, 1904–1912 (2018).
Chen, J., Althagafi, A. & Hoehndorf, R. Predicting candidate genes from phenotypes, functions and anatomical site of expression. Bioinformatics 37, 853–860 (2020).
Honda, S., Shi, S. & Ueda, H. R. SMILES transformer: pre-trained molecular fingerprint for low data drug discovery. CoRR abs/1911.04738 (2019).
Kulmanov, M. & Hoehndorf, R. DeepGOPlus: improved protein function prediction from sequence. Bioinformatics 36, 422–429 (2019).
Cui, C. et al. Drug repurposing against breast cancer by integrating drug-exposure expression profiles and drug–drug links based on graph neural network. Bioinformatics 37, 2930–2937 (2021).
Subramanian, A. et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 171, 1437–1452.e17 (2017).
Szklarczyk, D. et al. STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43, D447–D452 (2014).
Huang, L.-C., Wu, X. & Chen, J. Y. Predicting adverse side effects of drugs. BMC Genom. 12, S11 (2011).
Arrowsmith, J. Trial watch: phase III and submission failures: 2007–2010. Nat. Rev. Drug Discov. 10, 87 (2011).
Kuhn, M., Letunic, I., Jensen, L. J. & Bork, P. The SIDER database of drugs and side effects. Nucleic Acids Res. 44, D1075–D1079 (2016).
Shaked, I., Oberhardt, M. A., Atias, N., Sharan, R. & Ruppin, E. Metabolic network prediction of drug side effects. Cell Syst. 2, 209–213 (2016).
Zhong, H. A. ADMET properties: overview and current topics. Drug Design: Principles and Applications, 113–133, https://doi.org/10.1007/978-981-10-5187-6_8 (2017).
Lei, T. et al. ADMET evaluation in drug discovery: 15. Accurate prediction of rat oral acute toxicity using relevance vector machine and consensus modeling. J. Cheminform. 8, 6 (2016).
Tropsha, A., Gramatica, P. & Gombar, V. K. The importance of being earnest: validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb. Sci. 22, 69–77 (2003).
Duvenaud, D. et al. Convolutional networks on graphs for learning molecular fingerprints. Adv. Neural Inform. Process. Syst. 13, 2224–2232 (2015).
Harrer, S., Shah, P., Antony, B. & Hu, J. Artificial intelligence for clinical trial design. Trends Pharm. Sci. 40, 577–591 (2019).
Yauney, G. & Shah, P. Reinforcement learning with action-derived rewards for chemotherapy and clinical trial dosing regimen selection. Proc. 3rd Mach. Learn. Healthc. Conf. 85, 161–226 (2018).
Cusick, M. E. et al. Literature-curated protein interaction datasets. Nat. Methods 6, 934–935 (2009).
Lai, X. et al. A disease network-based deep learning approach for characterizing melanoma. Int. J. Cancer 150, 1029–1044 (2022).
Ching, T. et al. Opportunities and obstacles for deep learning in biology and medicine. J. R. Soc. Interface 15, 20170387 (2018).
Mohamed, S. K., Nováček, V. & Nounu, A. Discovering protein drug targets using knowledge graph embeddings. Bioinformatics 36, 603–610 (2019).
Zhang, Q. et al. Deep learning based classification of breast tumors with shear-wave elastography. Ultrasonics 72, 150–157 (2016).
Takahashi, Y. et al. Improved metabolomic data-based prediction of depressive symptoms using nonlinear machine learning with feature selection. Transl. Psychiatry 10, 157 (2020).
Schulte-Sasse, R., Budach, S., Hnisz, D. & Marsico, A. Integration of multiomics data with graph convolutional networks to identify new cancer genes and their associated molecular mechanisms. Nat. Mach. Intell. 3, 513–526 (2021).
Wu, F., Ma, C. & Tan, C. Network motifs modulate druggability of cellular targets. Sci. Rep. 6, 36626 (2016).
Abi Hussein, H. et al. Global vision of druggability issues: applications and perspectives. Drug Discov. Today 22, 404–415 (2017).
Abi Hussein, H. et al. PockDrug-Server: a new web server for predicting pocket druggability on holo and apo proteins. Nucleic Acids Res. 43, W436–W442 (2015).
Zhang, A. et al. Discovery and verification of the potential targets from bioactive molecules by network pharmacology-based target prediction combined with high-throughput metabolomics. RSC Adv. 7, 51069–51078 (2017).
Madhamshettiwar, P. B., Maetschke, S. R., Davis, M. J., Reverter, A. & Ragan, M. A. Gene regulatory network inference: evaluation and application to ovarian cancer allows the prioritization of drug targets. Genome Med. 4, 41 (2012).
Zheng, Y., Peng, H., Ghosh, S., Lan, C. & Li, J. Inverse similarity and reliable negative samples for drug side-effect prediction. BMC Bioinform. 19, 554 (2019).

Author information

These authors contributed equally: Yujie You, Xin Lai.

Authors and Affiliations

College of Computer Science, Sichuan University, Chengdu, 610065, China
Yujie You, Suran Liu & Le Zhang
Laboratory of Systems Tumor Immunology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum Erlangen, Erlangen, 91052, Germany
Xin Lai & Julio Vera
Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Room D513, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, 518055, China
Yi Pan
School of Computing, Ulster University, Belfast, BT15 1ED, UK
Huiru Zheng
Institute of Thoracic Oncology, Department of Thoracic Surgery, West China Hospital, Sichuan University, Chengdu, 610065, China
Senyi Deng
Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, 310024, China
Le Zhang
Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China
Le Zhang

Contributions

Y.Y. and X.L. contributed equally to this work. Y.Y., X.L., Y.P., H.Z., J.V., S.L., S.D. and L.Z. contributed to writing and revising the paper. X.L., S.D., and L.Z. supervised the research. All authors have read and approved the article.

Corresponding authors

Correspondence to Senyi Deng or Le Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

You, Y., Lai, X., Pan, Y. et al. Artificial intelligence in cancer target identification and drug discovery. Sig Transduct Target Ther 7, 156 (2022). https://doi.org/10.1038/s41392-022-00994-0

Received: 04 July 2021
Revised: 14 March 2022
Accepted: 05 April 2022
Published: 10 May 2022
DOI: https://doi.org/10.1038/s41392-022-00994-0
