Toward Repurposing Metformin as a Precision Anti-Cancer Therapy Using Structural Systems Pharmacology

Metformin, a drug prescribed to treat type-2 diabetes, exhibits anti-cancer effects in a portion of patients, but the direct molecular and genetic interactions leading to this pleiotropic effect have not yet been fully explored. To repurpose metformin as a precision anti-cancer therapy, we have developed a novel structural systems pharmacology approach to elucidate metformin’s molecular basis and genetic biomarkers of action. We integrated structural proteome-scale drug target identification with network biology analysis by combining structural genomic, functional genomic, and interactomic data. Through searching the human structural proteome, we identified twenty putative metformin binding targets and their interaction models. We experimentally verified the interactions between metformin and our top-ranked kinase targets. Notably, kinases, particularly SGK1 and EGFR were identified as key molecular targets of metformin. Subsequently, we linked these putative binding targets to genes that do not directly bind to metformin but whose expressions are altered by metformin through protein-protein interactions, and identified network biomarkers of phenotypic response of metformin. The molecular targets and the key nodes in genetic networks are largely consistent with the existing experimental evidence. Their interactions can be affected by the observed cancer mutations. This study will shed new light into repurposing metformin for safe, effective, personalized therapies.


Supplemental methods
Materials and methods 1. Overview: a structural systems pharmacology approach to predictive modeling of drug actions under diverse genetic background.
Our methodology begins at the level of molecular structure. We start by ranking the proteins annotated within the PDB by the similarity of their binding sites to that of AMPK 1 and dipeptidyl peptidase-IV activity (DPP4), a protein which has been observed to interact directly with metformin 2 . We considered the top hits from these comparisons to be our "putative molecular targets", and simulated the binding between metformin and each putative molecular target to assess the binding pose and interactions between ligand and target. Next, we moved to the larger and more abstract scale of the protein-protein interaction network. We obtained a dataset which represents genes whose expression is perturbed by metformin treatment 3 . Then, for each putative molecular target, we computationally predicted a tree-shaped path (TSP) which functionally linked that putative molecular target to all genes whose expression is perturbed by metformin. We then analyzed the intermediate nodes of each TSP in terms of their association with biological pathways linked to cancer or AMPK signaling, and identified the most critical nodes of each network. Subsequently, we experimentally validated the binding between metformin and several putative molecular targets. Finally, we mapped the observed cancer mutations to the binding pockets of experimentally validated targets, and provided clues to the individualized response of metformin.

Prediction of molecular targets
Previously, we have constructed 3-dimensional complex models of metformin and AMPK and used it to search for the targets of metformin. In addition, DPP4 was retrieved as a molecular target directly binding to metformin by querying the MySql database of ChEMBL release 16. SQL and its output is shown in Figure S1. Recent studies support that metformin inhibits the activity of DPP4 4 5 . Thus, we used the DPP4 (PDB ID 3O95) as another template for the binding site analysis 6 . The SMAP software developed by Xie et al obtains protein structures from the PDB and characterizes that protein"s ligand-binding potential from the geometric, physiochemical, and evolutionary characteristics of its binding pocket. The software compares protein structures and accurately predicts the binding site similarity between the template and all other available structures 7 . Using the crystal structure for AMPK bound to metformin and DPP4 bound to TAK-100 as the templates and 7277 available non-redundant PDB structures for human proteins as the test sample, we identified all structures with similar binding sites at a confidence of 95% which were not homologs of DPP4. The p-value of ligand binding site similarity was corrected using the ROC curve of a benchmark set.
We obtained structures for these proteins co-crystallized with their ligand from the PDB, as well as a 3D structure for metformin. Autodock Vina 8 was used to predict the binding pose and energy of binding between each protein and metformin. These reverse-docking experiments were performed at a simulated pH of 7. The size of the binding region was determined by re-docking each protein"s original ligand to the pocket and choosing the size which minimized the RMSD between the location of the original ligand and the re-docked ligand. Binding interactions were analyzed using DS Visualizer 9 and visualized using PyMOL 10 .

Experimental validation of metformin's interaction with kinases
To test the accuracy of our binding site analysis, we employed a competition binding assay to detect the binding of metformin to a set of kinases chosen from our putative targets. We examined binding between metformin and six top-ranked kinases, namely AKT1, SGK, CDK7, and MAPK14, MAP2K2, and EGFR. The proprietary KINOMEscan assay was performed by DiscoverX (CA). The assay tested the capacity for metformin to disrupt the binding of each DNA-tagged kinase to a support which was in turn bound to the kinase"s known ligand. If binding between the kinase and its known ligand was disrupted in the presence of metformin, this indicated that metformin either competed directly with the known ligand or allosterically altered the kinase"s ability to bind to that ligand. DMSO was used as a positive control and a pico-molar kinase inhibitor was used as a negative control. Binding levels were quantified by performing qPCR on the DNA tag of the ligand-bound kinases. The tests were performed at 1x10 6 nM concentration of metformin, and results were reported as %Control, calculated as follows: A lower %Control score indicates a stronger interaction. The KINOMEScan experiment and data analysis were performed by DiscoveRx (Fremont, CA).

Identification of drug-modulated interaction sub-networks
We obtained human protein-protein interaction data from STRING-DB 11 , and converted the Ensembl-protein IDs 12 to Unigene IDs 13 , and then to official gene symbols 14 15 . We removed entries for two ubiquitin genes, UBC and UBA52, from the network to prevent their high degree of connectivity from introducing bias into our predicted sub-networks.
The terminal nodes for our network were chosen from a set of genes which were differentially- Using the MSGSTEINER software, we generated a protein-protein interaction sub-network for each metformin-interacting protein 16 . The parameters used in this study are shown in Table S2.
Each sub-network represents the most parsimonious series of interactions found within the STRING-DB that connects each root node to all differentially-translated genes. Edges were weighted as the strength of the protein-protein interaction as documented in the STRING-DB, normalized to a score between zero and one. As our comparison of binding site experiment initially yielded 16 putative targets, as a control we also generated interaction networks using the same terminal nodes for 16 randomly-selected genes (R-control) whose protein products were annotated within the STRING-DB 17 . A second control set was produced using 20 genes selected at random from the set of leaf nodes (L-control). A third control set was generated using a random selection of 19 human genes (K-control) which were annotated with the "kinase activity" Gene Ontology term (GO:0016301).
A previous study tested the possibility that metformin"s effects are elicited through direct interaction with AMPK. The PDB structure for the AMPK subunit PRKAB1 (PDB ID 1Z0M) was used as a template for binding site comparison, and the structures for MAP2K2 (PDB ID 1S9I), EGFR (PDB ID 3B2V), TIAM1 (PDB ID 3A8N), and PDK2 (PDB ID 2BU7) were identified 1 . Additional sub-networks were generated using these five putative targets as roots.

Analysis of interaction sub-networks
With the goal of differentiating between the predicted content of each sub-network, we removed the terminal nodes from the gene set for our interaction networks. We used the online tool GeneTrail to perform over-or under-representation analysis of these gene sets to identify biological pathways relevant to metformin"s putative mechanism of action or other cancerrelated pathways 18 . The significance threshold was set to 0.05 and the false discovery rate was controlled by the Benjamini-Hochberg method, and the minimum number of genes that must be part of a pathway for it to be considered significant was set to 2.
Based on the results of the pathway analysis, we chose several key KEGG pathways to examine.
These pathways included cancer-related pathways as well as pathways linked to metformin"s demonstrated regulation of metabolism and AMPK signaling: "AMPK signaling", "MAPK signaling", "ErbB signaling", "mTOR signaling", "B cell receptor signaling", "Adipocytokine signaling", "Neurotrophin signaling", "Insulin signaling", "VEGF signaling", "DNA replication", "Apoptosis", "Cell cycle", "Pathways in cancer", and "Nucleotide excision repair" 19 . We evaluated the number of genes from each interaction network that participated in each of these pathways. Both terminal and constituent nodes were included in this analysis.
We used the Markov Cluster algorithm to rank the nodes of these sub-networks by their betweenness-centrality in an attempt to determine which genes are most critical to the functionality of the interaction network 20 21 . We defined a node as critical if its betweennesscentrality value was ranked in the 95 th percentile for the sub-network in question. We identified those nodes that are critical across all predicted sub-networks, as well as those that were grouped within highly-participatory sub-networks. We characterized individual genes using the GeneCards database (http://www.genecards.org/).

Mutation analysis
For experimentally validated kinase targets of metformin, the binding pose of metformin was predicted using protein-ligand docking software Autodock Vina 8 . The amino acid residues that interact with metformin were determined using DS Visualizer. The mutations observed in COSMIC 22 were mapped to the binding site residues, and visualized using DS Visualizer. The mutations that might lead to the rewiring of protein-protein interaction network were extracted from AlQuraishi et al 23 .   Supplemental Tables   Table S1: Mutations from the COSMIC database which likely affect metformin binding.
Italicized mutations are those that affect residues identified as the binding site on PDB. * indicates mutations which involve charged residues.