Hyperbolic matrix factorization improves prediction of drug-target associations

Past research in computational systems biology has focused more on the development and applications of advanced statistical and numerical optimization techniques and much less on understanding the geometry of the biological space. By representing biological entities as points in a low dimensional Euclidean space, state-of-the-art methods for drug-target interaction (DTI) prediction implicitly assume the flat geometry of the biological space. In contrast, recent theoretical studies suggest that biological systems exhibit tree-like topology with a high degree of clustering. As a consequence, embedding a biological system in a flat space leads to distortion of distances between biological objects. Here, we present a novel matrix factorization methodology for drug-target interaction prediction that uses hyperbolic space as the latent biological space. When benchmarked against classical, Euclidean methods, hyperbolic matrix factorization exhibits superior accuracy while lowering embedding dimension by an order of magnitude. We see this as additional evidence that the hyperbolic geometry underpins large biological networks.


Aleksandar Poleksic
Computational methods for biological relationship inference use dimension reduction techniques to represent biological objects as points in a low-dimensional space. The underlying assumption is that biological systems have low intrinsic dimension. For instance, it has been well established that most variation in genomic databases can be explained by a small set of features, such as the cell state, the cell type, or a gene program 1 . In a different example, the low dimensionality of databases of drugs' adverse reactions is due to associations of side effects with chemical substructures and their combinations 2,3 . Put differently, drugs sharing chemical substructures are known to give rise to the same adverse reactions.
The research on dimensionality reduction and associated relationship prediction has traditionally focused on the development and applications of advanced computational and statistical techniques while taking the Euclidean geometry of the native biological space for granted. However, recent theoretical studies challenge the flat geometry assumption [4][5][6][7][8][9][10] . According to these studies, complex systems exhibit tree-like topology with a high degree of clustering. Therefore, embedding those systems into the Euclidean space inevitably leads to distortion of distances between individual objects and, in turn, compromises the accuracy of relationship inference. In contrast, a negatively curved space can accommodate the exponential growth in the number of relevant network features, since the area of a hyperbolic circle is an exponential function of its radius (Fig. 1).
Recent years have seen the development of practical algorithms that use hyperbolic geometry to model complex networks [11][12][13][14][15][16][17][18] . Hyperbolic distance learning has also been incorporated into artificial neural network models, for instance to encode the chemical structures of drugs 19 .
In this paper, we show how hyperbolic latent space can be utilized to increase the accuracy of matrix factorization. While our algorithm has been benchmarked on drug-target interaction datasets, the same technique can be applied to other relationship inference tasks (e.g., to predict drug-disease or drug-side effect associations, user preferences to movies or songs, etc.).
We emphasize that improving matrix factorization techniques is of particular importance in recommender systems, since a carefully designed matrix factorization method is known to outperform deep learning in many recommendation tasks.

Methods
The theoretical foundation. Hyperbolic geometry can be modeled on the n-dimensional hyperboloid in the Lorentzian space R^{n,1} (Fig. 1), where R^{n,1} is a copy of R^{n+1} equipped with the bilinear form

⟨x, y⟩_L = x_1 y_1 + · · · + x_n y_n − x_{n+1} y_{n+1}. (1)

Hyperbolic space is represented by one sheet of the two-sheeted hyperboloid (which can be thought of as a sphere of radius i = √−1), namely

H^n = { x ∈ R^{n,1} : ⟨x, x⟩_L = −1, x_{n+1} > 0 }. (2)

It can be shown that the bilinear form ⟨·, ·⟩_L restricted to the tangent space T_p H^n = { x ∈ R^{n,1} : ⟨p, x⟩_L = 0 } at a point p ∈ H^n is positive definite, thereby providing a genuine Riemannian metric on H^n. The distance between two points x, y ∈ H^n is given by

d(x, y) = arccosh(−⟨x, y⟩_L).

An interesting (and, in the biological context, insightful) property of hyperbolic space is that the shortest path between two random points in H^n that are far away from the vertex µ_0 has almost the same length as the path through the vertex (Fig. 1). This resembles the distance function on trees, where the shortest path between two randomly selected nodes deep in the tree has almost the same length as the path through the root.

Let A = {a_i}_{i=1}^m be the set of drugs and B = {b_j}_{j=1}^n the set of targets (proteins). Denote by R = [r_{i,j}]_{m×n} the matrix of relationships (edges) between the elements of A and B. Specifically, r_{i,j} = 1 if a_i interacts with b_j and r_{i,j} = 0 otherwise (no interaction or unknown). Let u_i, v_j ∈ H^d be the latent vector representations of a_i and b_j, respectively, where d ≪ max(m, n). Denote by e_{i,j} the event that a_i interacts with b_j. In line with the classical logistic matrix factorization, the probability of this event is modeled as

p_{i,j} = P(e_{i,j} | u_i, v_j) = exp(−d_L²(u_i, v_j)) / (1 + exp(−d_L²(u_i, v_j))), (6)

where d_L²(x, y) denotes the squared Lorentzian distance 34 between the points x, y ∈ H^d, namely

d_L²(x, y) = −2 − 2⟨x, y⟩_L.

Denote by W = [w_{i,j}]_{m×n} our confidence in the entries r_{i,j} of the interaction matrix R. In many practical applications, w_{i,j} = 1 if r_{i,j} = 0, and w_{i,j} = c if r_{i,j} = 1, where c > 1 is a constant 21 . In general, the idea is to assign higher weights to trustworthy pairs, i.e., those for which we have higher confidence of interaction. Given the weights w_{i,j}, the likelihood of r_{i,j} given u_i and v_j is

P(r_{i,j} | u_i, v_j) = p_{i,j}^{w_{i,j} r_{i,j}} (1 − p_{i,j})^{1 − r_{i,j}}.

Thus, assuming the independence of the events e_{i,j}, it follows that

P(R | U, V) = ∏_{i,j} p_{i,j}^{w_{i,j} r_{i,j}} (1 − p_{i,j})^{1 − r_{i,j}}, (9)

where U and V represent the matrices of latent preferences of the elements of A and B, respectively (in other words, the i-th row of U is the vector u_i and the j-th row of V is v_j).
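For concreteness, the core quantities above (the Lorentzian bilinear form, the hyperboloid lift, the geodesic and squared Lorentzian distances, and the logistic interaction probability) can be sketched in a few lines of numpy. This is an illustrative sketch, not the paper's implementation; all function names are ours.

```python
import numpy as np

def lorentz_inner(x, y):
    """Bilinear form <x, y>_L = x_1 y_1 + ... + x_n y_n - x_{n+1} y_{n+1}."""
    return np.dot(x[:-1], y[:-1]) - x[-1] * y[-1]

def to_hyperboloid(z):
    """Lift z in R^n onto H^n: the last coordinate solves <x, x>_L = -1."""
    return np.append(z, np.sqrt(1.0 + np.dot(z, z)))

def hyp_distance(x, y):
    """Geodesic distance d(x, y) = arccosh(-<x, y>_L)."""
    return np.arccosh(np.clip(-lorentz_inner(x, y), 1.0, None))

def sq_lorentz_distance(x, y):
    """Squared Lorentzian distance d_L^2(x, y) = -2 - 2 <x, y>_L."""
    return -2.0 - 2.0 * lorentz_inner(x, y)

def interaction_prob(u, v):
    """Logistic link with -d_L^2 playing the role of the Euclidean dot product."""
    s = -sq_lorentz_distance(u, v)
    return 1.0 / (1.0 + np.exp(-s))
```

Note that identical embeddings give d_L² = 0 and hence an interaction probability of exactly 0.5; larger separations push the probability toward 0.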
Computing the prior distribution. Similar to the Euclidean case 21,31 , our goal is to derive the probability p(U, V |R) from (9) through the Bayesian inference.
Utilizing the recent work on the wrapped normal distribution in hyperbolic space 35 , we define the prior distributions as

p(U) = ∏_{i=1}^m G(µ_0, σ_U² I)(u_i),  p(V) = ∏_{j=1}^n G(µ_0, σ_V² I)(v_j), (10)

where G(µ, Σ) is the pseudo-hyperbolic Gaussian distribution and µ_0 = (0, . . . , 0, 1) is the vertex of the hyperboloid (the origin of the hyperbolic space).
The pseudo-hyperbolic Gaussian distribution extends the notion of a Gaussian distribution to the hyperbolic space (Fig. 2). In short, for µ ∈ H^d and a positive definite Σ, sampling from G(µ, Σ) can be thought of as a three-step process: (a) sample a vector x ∈ T_{µ_0} H^d from N(0, Σ), (b) transport x along the geodesic joining the points µ_0 ∈ H^d and µ ∈ H^d to y ∈ T_µ H^d, and (c) project y to z ∈ H^d via the exponential map.
The step (b) is carried out using the parallel transport g_{µ_0→µ}:

g_{µ_0→µ}(x) = x + (⟨µ, x⟩_L / (α + 1)) (µ_0 + µ),  where α = −⟨µ_0, µ⟩_L.
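The three-step sampling recipe (a)-(c) can be sketched directly in numpy. This is our own minimal sketch of the construction, not the paper's code; the exponential map used in step (c) is the standard one on the hyperboloid.

```python
import numpy as np

def lorentz_inner(x, y):
    """<x, y>_L = x_1 y_1 + ... + x_d y_d - x_{d+1} y_{d+1}."""
    return np.dot(x[:-1], y[:-1]) - x[-1] * y[-1]

def exp_map(mu, y):
    """Step (c): project the tangent vector y at mu onto the hyperboloid."""
    n = np.sqrt(max(lorentz_inner(y, y), 1e-30))  # Lorentzian norm (y is spacelike)
    return np.cosh(n) * mu + np.sinh(n) * (y / n)

def parallel_transport(mu0, mu, x):
    """Step (b): transport x from T_{mu0} H^d to T_mu H^d along the geodesic."""
    alpha = -lorentz_inner(mu0, mu)
    return x + lorentz_inner(mu, x) / (alpha + 1.0) * (mu0 + mu)

def sample_pseudo_gaussian(mu, sigma, rng):
    """Draw one sample from G(mu, sigma^2 I) by the three-step process (a)-(c)."""
    d = mu.shape[0] - 1
    mu0 = np.zeros(d + 1)
    mu0[-1] = 1.0                                  # vertex of the hyperboloid
    x = np.append(rng.normal(0.0, sigma, d), 0.0)  # (a): tangent vector at mu0
    y = parallel_transport(mu0, mu, x)             # (b)
    return exp_map(mu, y)                          # (c)
```

Every sample lands back on the upper sheet of the hyperboloid, i.e., satisfies ⟨z, z⟩_L = −1 with positive last coordinate.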

The loss function.
With the prior placed on U and V, we return to calculating the posterior probability p(U, V | R) through the Bayesian inference

p(U, V | R) ∝ P(R | U, V) p(U) p(V). (15)

Following the Euclidean matrix factorization, we take the negative logarithm of the posterior distribution (15) to arrive at a closed-form expression for the loss function

L_{A,B} = − Σ_{i,j} [ w_{i,j} r_{i,j} log p_{i,j} + (1 − r_{i,j}) log(1 − p_{i,j}) ] − Σ_i log G(µ_0, σ_U² I)(u_i) − Σ_j log G(µ_0, σ_V² I)(v_j) + C. (21)

In the expression above, the density of G is obtained from p, the probability density function of the normal distribution N(0, σ²I) in the tangent space T_{µ_0} H^d at the vertex µ_0 = (0, . . . , 0, 1), via the change of variables induced by the maps in steps (b) and (c); σ_U and σ_V are trainable parameters and C is a constant.
Alternating gradient descent in hyperbolic space. Minimizing a real function defined in a d-dimensional Euclidean space R^d is routinely accomplished using the gradient descent technique. We adopt a similar method for finding a point u ∈ H^d of a local minimum of a real-valued function f : H^d → R 36,37 . For this strategy to work, the function f must be defined in the ambient space R^{d,1} of H^d, as well as on H^d. Specifically, given an initial value u = u^(0) and a step size η, the gradient descent in hyperbolic space can be carried out by repeating the following steps:

1. Compute the gradient of f in the ambient space R^{d,1}, namely the vector (∂f/∂u_1, . . . , ∂f/∂u_d, −∂f/∂u_{d+1}) (note the negative sign of the last component). This representation of the gradient follows directly from its definition as the vector h satisfying df(x) = ⟨h, x⟩_L.
2. Project this vector orthogonally onto the tangent space T_u H^d, using proj_u(x) = x + ⟨u, x⟩_L u.
3. Update u by moving along the geodesic (via the exponential map) in the direction of the negative projected gradient, scaled by η.

We use the alternating gradient descent method (fixing U while updating V, and vice versa) to minimize the error function L_{A,B} given in (21); the required partial derivatives of L_{A,B} are available in closed form. Figure 4 shows the pseudocode of our algorithm.

Hyperbolic neighborhood regularization and cold-start. A standard way to increase the accuracy of relationship inference between the elements of two biological domains A and B is to employ the so-called neighborhood regularization. The goal is to ensure that similar entities from A are in relationship with similar entities from B (e.g., similar drugs interact with similar genes). To achieve this, we extend the Euclidean neighborhood regularization method 21,38 to H^d by adding the following term to the loss function L_{A,B} (21):

β_U Σ_{i,j} s_{i,j} d_L²(u_i, u_j) + β_V Σ_{i,j} t_{i,j} d_L²(v_i, v_j),

where s_{i,j} (respectively t_{i,j}) is the value reflecting the similarity between a_i and a_j (respectively b_i and b_j) and β_U, β_V are trainable (neighborhood regularization) parameters.
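The three-step gradient-descent recipe above can be sketched in numpy. This is our own illustrative sketch under stated assumptions (the clipping threshold and function names are ours, and the clipping is a stand-in for the paper's heuristic, not its exact procedure).

```python
import numpy as np

def lorentz_inner(x, y):
    return np.dot(x[:-1], y[:-1]) - x[-1] * y[-1]

def hyperbolic_gradient_step(u, ambient_grad, eta, clip=1.0):
    """One gradient-descent step on H^d, following steps 1-3 in the text.

    ambient_grad: ordinary vector of partial derivatives of f at u.
    """
    g = ambient_grad.copy()
    g[-1] = -g[-1]                      # step 1: Lorentzian gradient (flip last sign)
    g = g + lorentz_inner(u, g) * u     # step 2: project onto the tangent space T_u H^d
    step = -eta * g
    n = np.sqrt(max(lorentz_inner(step, step), 0.0))
    if n > clip:                        # crude gradient clipping (hypothetical threshold)
        step, n = step * (clip / n), clip
    if n < 1e-12:
        return u
    return np.cosh(n) * u + np.sinh(n) * (step / n)  # step 3: exponential map
```

As a sanity check, iterating this step on f(u) = d_L²(u, t) = −2 − 2⟨u, t⟩_L, whose ambient partials are (−2t_1, . . . , −2t_d, +2t_{d+1}), drives u toward the target t while keeping u on the hyperboloid.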
A separate procedure is needed to address the "cold-start" problem i.e., the arrival of a new node (a node with no known relationships to other nodes). In the setting of drug-target interaction prediction, this procedure is used to predict targets for new compounds (such as a chemical in pre-clinical studies) and vice versa.
For the hyperbolic cold-start, we use a hyperbolic variant of the Euclidean weighted-profile method 21,31,33 . Specifically, the latent vector u_i ∈ H^d for a drug a_i ∈ A that does not interact with any protein b_j ∈ B (i.e., the i-th row of R is empty) is computed as the weighted combination of the rows u_j ∈ U corresponding to the drugs most similar to a_i:

u_i = HypCM( (s_{i,1}/SM) u_1, . . . , (s_{i,J}/SM) u_J ),

where SM = Σ_{j=1}^J s_{i,j}, J is a pre-defined number of nearest neighbors, and the hyperbolic center of mass HypCM is computed as in Law et al. 39 .
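The weighted-profile construction can be sketched as follows. The Lorentzian centroid (the minimizer of the weighted sum of squared Lorentzian distances, as in Law et al.) has the closed form "normalize the weighted average back onto the hyperboloid"; the function names are ours.

```python
import numpy as np

def lorentz_inner(x, y):
    return np.dot(x[:-1], y[:-1]) - x[-1] * y[-1]

def hyperbolic_center_of_mass(points, weights):
    """Weighted Lorentzian centroid: rescale the weighted average onto H^d."""
    a = np.average(points, axis=0, weights=weights)
    return a / np.sqrt(abs(lorentz_inner(a, a)))

def cold_start_vector(U, sims, J):
    """Weighted-profile embedding for a drug with an empty interaction profile:
    the center of mass of the J most similar drugs' latent vectors,
    weighted by s_{i,j} / SM."""
    idx = np.argsort(sims)[::-1][:J]     # J nearest neighbors by similarity
    w = sims[idx] / sims[idx].sum()      # SM = sum of the J similarities
    return hyperbolic_center_of_mass(U[idx], w)
```

The rescaling works because any convex combination of points on the upper sheet is a timelike vector, so dividing by its Lorentzian norm lands back on the hyperboloid.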

Results
Benchmarking experiments. We benchmarked the hyperbolic matrix factorization on four drug-target interaction test sets, specifically Nr, Gpcr, Ion, and Enz 40 , using four traditional classification measures, namely the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPR), precision at top ten (PREC@10), and the average precision (AP). An extensive grid search is employed to train the parameters of each method (see the Supplementary Data). In our first benchmark, we assessed the advantage of the basic logistic hyperbolic matrix factorization over the classical Euclidean matrix factorization (as implemented in the popular NRLMF method 21 ), in the absence of any side-information (i.e., the pairwise drug and the pairwise protein similarity). As described in the "Methods" section, the hyperbolic method is conceptually the same as the Euclidean method, but it uses −d_L²(x, y) = 2 + 2⟨x, y⟩_L in place of the dot product ⟨x, y⟩ and uses the pseudo-hyperbolic Gaussian distribution (10) in place of the Gaussian prior.
We subject each method (Euclidean and hyperbolic) to ten rounds of fivefold cross-validation (CV), also known as the CVP test 22 . In each CV round, the data set under consideration (i.e., the drug-target association matrix) is randomly split into five groups. Each group is used once as test data, while the remaining four groups serve as training data. Hence, every (interacting and non-interacting) drug-target pair is scored exactly once in each CV round. The final classification score (AUC, AUPR, PREC@10, AP) assigned to each DTI prediction method is computed by averaging the classification scores obtained across the CV rounds.
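The evaluation protocol above can be sketched in a few lines of numpy: partition all drug-target pairs into five groups, and score each round with a rank-based AUC. This is an illustrative sketch (the helper names are ours, and the AUC below ignores ties, which is adequate for continuous scores).

```python
import numpy as np

def cvp_splits(m, n, k, rng):
    """Randomly partition all m*n drug-target pairs into k test groups (CVP setting)."""
    return np.array_split(rng.permutation(m * n), k)

def rank_auc(scores, labels):
    """AUC as the probability that a random positive pair outscores a random negative."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

In each CV round, the pairs in one group are hidden from the training matrix, scored by the fitted model, and fed to `rank_auc` (and the analogous AUPR, PREC@10, and AP computations); the per-round scores are then averaged.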
As seen in Table 1, the bare-bones hyperbolic matrix factorization routinely outperforms the bare-bones Euclidean factorization in identifying four types of drug targets (Nr, Gpcr, Ion, and Enz) and across fundamentally different classification measures (AUC, AUPR, PREC@10, AP).
Interestingly, the hyperbolic matrix factorization achieves superior accuracy at latent dimensions an order of magnitude smaller than those needed for an optimal Euclidean embedding. Specifically, optimal Euclidean factorization is most often achieved at ranks exceeding 150. In contrast, most of the time, hyperbolic factorization needs only 5 or 10 latent features to achieve the same or better classification scores (Fig. 5). We view this as additional evidence that the hyperbolic space is the native space of biological networks.
In our second test, we allow both methodologies to use drug and protein homophily information to boost the prediction accuracy. In the classical (Euclidean) setting, we incorporate side-information precisely as done in the NRLMF method 21 . The hyperbolic algorithm uses the same general formula (27), but employs the hyperbolic distances in place of the Euclidean distances. As seen in Table 2, the Euclidean factorization erases some head-start advantage of hyperbolic factorization in the fivefold CVP test, albeit at much higher latent dimensions. This is somewhat expected, as the side information enables the Euclidean method to approach the theoretical limits on the accuracy that can be achieved on the four noisy, sparse, and biased test sets used in our study. For a more thorough analysis, we also carried out the above benchmarks using tenfold cross-validation. The results of our tenfold CV tests are shown in the Supplementary Tables 1 and 2. Depending on the test set under consideration, tenfold cross-validation might be a more meaningful experiment, as removing only 10% of the existing network links (as opposed to 20% in fivefold CV) preserves important structural features of the target network 41,42 .

Table 1. Comparison of the basic (no side-information or profile-weighting) Euclidean and hyperbolic logistic matrix factorization algorithms (as implemented in the NRLMF method). The results are obtained using ten rounds of fivefold cross-validation. Better results are shown in bold.

While the first two benchmarks help gain insight into the value added by different components of the loss function, our final benchmark compares the two techniques in the most important and most difficult cold-start setting. In this experiment, known as Leave-One-Out Cross-Validation (LOOCV), we hide (zero out) and then try to recover all interactions of every drug under consideration. Specifically, for each drug d, we hide and then try to recover all interactions (known and unknown) of d with all proteins in the data set. Thus, LOOCV can be viewed as a (non-stochastic) variant of a single round of m-fold cross-validation, where m is the number of drugs.
To better assess the performance of hyperbolic embedding, we include in the LOOCV benchmark two additional state-of-the-art methods, namely DNILMF 22 and NGN 24 . The DNILMF method is similar to NRLMF, but it incorporates drug and protein homophily directly into the formula for p_{i,j} (6). Moreover, it employs a nonlinear diffusion technique to construct the pairwise drug and protein similarity matrices 22 . The NGN method is also similar in spirit to NRLMF, but it builds a neighborhood-based global network model instead of learning drug and target features separately 24 .
We constructed a hyperbolic variant of each technique by simply replacing the Euclidean dot product with the negative Lorentzian distance and by replacing the Gaussian prior by the wrapped normal distribution in the hyperbolic space (as discussed in the "Methods" section).
As seen in Table 3, the hyperbolic matrix factorization improves the accuracy of current techniques in predicting protein targets for new compounds, such as the chemicals in preclinical studies or clinical trials. In addition, the Supplementary Table 3 shows that our method improves DTI predictions on isolated samples, namely drug-target pairs (d, t) where d does not have interacting targets (other than t) and t does not have interacting drugs (other than d).

Table 2. Accuracy of the full-blown Euclidean and hyperbolic logistic matrix factorizations in predicting drug-target interactions (10 rounds of fivefold CV test). Both methods are allowed to take advantage of the side-information and profile weighting. Significantly better values are shown in bold.

Additional tests. Recent years have seen the development of machine learning algorithms for different biological relationship inference tasks [43][44][45][46] . While many of those methods can be tailored to provide predictions of drug-target interactions, it would be unrealistic to benchmark them all against the methodology presented in this article. Supplementary Table 4 provides the comparison of our technique against the SVM-based algorithm BLM 47 and GRGMF, a matrix factorization algorithm 48 . We were also interested in how our method fares against Cannistraci's methods 49 based on the local-community-paradigm (LCP). These methods are simple to interpret, as they use a combination of node similarity metrics (directly observable in a bipartite drug-target network), such as the number of common neighbors (CN) and the number of links between those neighbors (LCL). Aside from exhibiting accuracy superior to that of other unsupervised drug-target link prediction algorithms (and comparable to the accuracies of supervised algorithms), Cannistraci's methods are extremely fast (Supplementary Fig. 1) and thus ideally suited for the task of link prediction in large networks.
The results of our comparison with the LCP-based methods are shown in the Supplementary Tables 5 and 6.
While our project was, in part, inspired by recent studies on hyperbolic network embedding, most of those methods, such as Coalescent Embedding (CE) 14 , were not specifically tailored for the DTI prediction task. To make a meaningful comparison with CE, we first had to place the two algorithms on equal footing. More precisely, in our tests the inference by CE was conducted based upon the hyperbolic distances between drugs and targets (closer objects are more likely to interact) computed from the coalescent embedding of the drug-target interaction network in the Poincaré disk. We also restricted the embedding dimension in our method to 2, since CE preferably uses the Poincaré disk as the latent space. The classification scores achieved by the two techniques are presented in the Supplementary Tables 7 and 8. We emphasize that, due to the methods' modifications mentioned above, the benchmarking results shown in the supplementary material should be interpreted with caution.
In a quest for high accuracy, some algorithms for DTI prediction utilize biomedical knowledge beyond the protein amino-acid sequences and drug chemical structures, including information on adverse drug reactions, drug-disease and protein-disease associations, drug-induced gene expression profiles, protein-protein interactions, etc. Such rich input often leads to information leakage, presenting a challenge in evaluating these methods in a classical drug discovery setting where (typically) only the chemical structure of the drug and the primary sequence of the gene are known upfront.
Recent years have also seen the development of methods for drug-target affinity (DTA) prediction [50][51][52][53] . In contrast to DTI prediction methods, DTA algorithms utilize drug-target binding affinity scores and treat DTI as a regression (rather than a binary classification) problem. Moreover, unlike DTI methods, DTA algorithms are typically evaluated on the Davis 54 data set.

Discussion and conclusion
Matrix factorization is one of the main techniques used in computational systems biology to uncover relationships between the elements from a pair of biological domains. The technique works by representing the biological objects as points in a low-dimensional (latent) space in a way that best explains the input set of known interactions. More precisely, the input matrix of known associations is completed by approximating it as a product of two lower-dimensional matrices.
Past research in computational systems biology, including matrix factorization techniques, has taken the Euclidean geometry of the biological space for granted. This has been convenient due to the availability of advanced analytic, numerical, statistical, and machine learning procedures in the Euclidean space. However, recent theoretical studies suggest that hyperbolic geometry, rather than Euclidean, underpins complex networks in general and biological networks in particular. Therefore, a radical shift in data representation is necessary to obtain an undistorted view of the biological space and, in turn, ensure further progress in systems biology and related fields.
We have developed and benchmarked a technique for a probabilistic hyperbolic matrix factorization and applied it to predict drug-target interactions. We demonstrate that the Lorentzian model of hyperbolic space allows for a closed form expression of the key transformations and techniques required for latent space dimensionality reduction. Our method builds upon recent advances in the development of probabilistic models and numerical optimization in hyperbolic space to learn an optimal embedding and to compute the probabilities of drug-target interactions. Our benchmarking tests demonstrate a significant increase in accuracy and a drastic reduction in latent space dimensionality of hyperbolic embedding compared to Euclidean embedding. These findings reaffirm the negative curvature of the native biological space.
Although a (bipartite drug-target) hyperbolic network embedding arises as a byproduct of hyperbolic matrix factorization, our focus is on prioritizing targets for a given drug (and vice versa). To better assist structure-based drug discovery, DTI prediction methods focus more on identifying a handful of targets with strong binding affinities and much less on prioritizing the many remaining targets with weak interactions (this also explains why the AUPR-like metrics are preferred in computational systems biology). To achieve this goal, DTI prediction methods are willing to distort the network structure away from the immediate neighbors of each drug in order to better model the network in the vicinity of drugs. In our method, the distortion occurs each time a weighted profile is constructed to address the cold-start problem.

There are several aspects of hyperbolic matrix factorization that this study has not explored in detail, including the optimal procedure for gradient descent in hyperbolic space. In contrast to decades of research on Euclidean numerical analysis techniques, methods for numerical optimization in hyperbolic space are few and far between. The main difficulty is the numerical instability of the hyperbolic gradient descent in the vicinity of cliffs 57 . In this study, we applied a simple heuristic intervention to combat the explosion in the magnitude of the hyperbolic gradient: to allow our optimization method to converge to a local minimum, we carried out three iterations of the gradient descent procedure, lowering the learning rate on the fly and clipping the gradient if necessary. We believe that further research in this area will add significant value to hyperbolic embedding and inference methods.
Our model uses the same hyperbolic space to represent both drugs and proteins. This widely used approach 58-60 is applied in our study due to the simplicity of algorithm design and the fact that heterogeneous networks have been shown to have a metric structure with an effective hyperbolic geometry underneath 61 . However, alternative approaches are also worth considering. Viewing biomedical entities (in our case drugs and proteins) as objects residing in spaces of different dimension and curvature, the bipartite graph of their relationships can be realized in a hyperbolic product space 62 . Finding the proper dimension and curvature of the space that underlies each biological domain is expected to result in a more accurate latent representation and, in turn, more accurate relationship prediction.

Data availability
The code and test sets are available in the GitHub repository https://github.com/poleksic/Hyperbolic_MF.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.