HyperFoods: Machine intelligent mapping of cancer-beating molecules in foods

Recent data indicate that up to 30–40% of cancers can be prevented by dietary and lifestyle measures alone. Herein, we introduce a unique network-based machine learning platform to identify putative food-based cancer-beating molecules. These were identified through the commonality of their molecular biological networks with those of clinically approved anti-cancer therapies. A machine learning algorithm based on random walks on graphs (operating within the supercomputing DreamLab platform) was used to simulate drug actions on human interactome networks and obtain genome-wide activity profiles of 1962 approved drugs (199 of which were classified as “anti-cancer” by their primary indications). A supervised approach was then employed to predict cancer-beating molecules from these ‘learned’ interactome activity profiles. In validation, the model predicted anti-cancer therapeutics with a classification accuracy of 84–90%. A comprehensive database of 7962 bioactive molecules within foods was fed into the model, which predicted 110 cancer-beating molecules (defined by an anti-cancer drug likeness threshold of >70%) with expected capacity comparable to clinically approved anti-cancer drugs. The predicted molecules come from a variety of chemical classes, including flavonoids, terpenoids, and polyphenols. These predictions were in turn used to construct a ‘food map’ in which the anti-cancer potential of each ingredient is defined by the number of cancer-beating molecules found therein. Our analysis underpins the design of next-generation cancer preventative and therapeutic nutrition strategies.
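
The abstract does not give the exact propagation scheme, so the following is only a minimal sketch of one common variant, a random walk with restart, run on a toy graph. The function name, the `restart` value, and the gene labels `G1` to `G4` are illustrative assumptions and are not taken from the study. The sketch shows how a score placed on a drug's target node spreads over neighbouring nodes to give a network-wide activity profile.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-12, max_iter=10_000):
    """Propagate seed weights over an undirected graph.

    adjacency: dict mapping each node to the set of its neighbours
               (hypothetical toy format, not the study's data model).
    seeds:     dict mapping seed nodes (e.g. drug targets) to weights.
    restart:   probability of jumping back to the seeds at each step (assumed value).
    """
    nodes = sorted(adjacency)
    total = float(sum(seeds.values()))
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}  # normalised start vector
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fixed fraction of the mass returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # Isolated node: keep its walking mass in place.
                nxt[n] += (1.0 - restart) * p[n]
                continue
            # Walk term: split the remaining mass equally among neighbours.
            share = (1.0 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        change = max(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy interactome: a chain G1 - G2 - G3 - G4, with the drug targeting G1.
toy_graph = {
    "G1": {"G2"},
    "G2": {"G1", "G3"},
    "G3": {"G2", "G4"},
    "G4": {"G3"},
}
profile = random_walk_with_restart(toy_graph, seeds={"G1": 1.0})
```

In this toy case the propagated score decreases with distance from the seeded node, and the scores sum to one. A real interactome would need a sparse-matrix implementation rather than nested dictionaries.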


Additional dataset S1 (separate file): Model parameters and scores
List of all parameter combinations used, with their corresponding F-scores and the fractions of correctly predicted labels for the AC (anti-cancer, "positive") and non-AC ("other", "negative") classes.
Additional dataset S2 (separate file): Anti-cancer likeness drug prediction gene scores
A higher score means a higher correlation between the anti-cancer likeness of a drug and a higher propagated gene value.
Additional dataset S3 (separate file): Anti-cancer likeness prediction for existing FDA approved drugs
Anticancer label columns reflect whether the drug was marked as anti-cancer ("1") or not ("0") in DrugBank, repoDB or DrugCentral (as an indication or off-label use). Column "Any" summarizes the labels. Anticancer likeness is the Platt/Logistic Regression probability of the compound being an anti-cancer drug and is an averaged prediction from the 700 best selected models with F-score >=0.84. Different models have different compound-gene connection confidence thresholds. Some compounds could not be analyzed by models with higher thresholds because none of the compound connections passed them. Model count indicates how many models could make the prediction for the compound in the list and reflects the reliability/strength of the prediction. The default decision threshold for anticancer likeness is 0.5. However, some compounds with a predicted anticancer likeness as low as 0.09 were retrospectively found to be reported in the literature as potential anti-cancer candidates. Thus the list of compounds is reported down to an anticancer likeness of 0.09. Candidates with fewer than 90 models are excluded as unreliable.
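
The averaging and filtering described for this dataset can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name and the input format (one list of per-model probabilities per compound, with `None` where a model could not score the compound) are hypothetical, and the 0.09 cut-off is assumed to be inclusive.

```python
from statistics import mean


def summarize_predictions(per_model_probs, min_models=90, min_likeness=0.09):
    """Average per-model probabilities and apply the reporting filters.

    per_model_probs: dict mapping a compound name to a list of probabilities,
                     with None for models that could not score the compound
                     (hypothetical input format).
    Returns a list of (compound, anticancer_likeness, model_count) tuples,
    sorted by decreasing likeness.
    """
    rows = []
    for compound, probs in per_model_probs.items():
        scored = [p for p in probs if p is not None]
        model_count = len(scored)
        if model_count < min_models:
            continue  # too few models: treated as unreliable
        likeness = mean(scored)
        if likeness < min_likeness:
            continue  # below the reporting cut-off (assumed inclusive)
        rows.append((compound, likeness, model_count))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Toy input with made-up compounds and probabilities.
toy_predictions = {
    "compound_a": [0.8] * 100,                 # kept
    "compound_b": [0.9] * 50 + [None] * 50,    # dropped: only 50 models
    "compound_c": [0.05] * 95,                 # dropped: likeness below 0.09
    "compound_d": [0.2] * 90 + [None] * 10,    # kept: exactly 90 models
}
summary = summarize_predictions(toy_predictions)
```

Keeping the model count next to the averaged likeness lets a reader judge how many models support each prediction.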

Additional dataset S4 (separate file): Anti-cancer likeness prediction for the food compounds from FooDB
Anticancer likeness is the Platt/Logistic Regression probability of the compound being an anti-cancer drug and is an averaged prediction from the 700 best selected models with F-score >=0.84. Different models have different compound-gene connection confidence thresholds. Some compounds could not be analyzed by models with higher thresholds because none of the compound connections passed them. Model count indicates how many models could make the prediction for the compound in the list and reflects the reliability/strength of the prediction. Only compounds for which half of the models gave predictions were reported. Both carcinogenic and anti-cancer compounds could potentially be highlighted by this method. Some compounds were also found in the toxin database T3DB (by InChI Key matching, with stereoisomers treated as the same molecule) and were supplemented with additional information about their mechanism of toxicity and carcinogenicity, if available. Other fields cover the compound classification from FooDB, the compound description, the top 10 Reactome and KEGG pathways affected according to GSEA (Gene Set Enrichment Analysis), the top 10 propagated genes, and InChI keys and synonyms from FooDB. PubChem IDs are also provided for all compounds. The list of compounds is reported down to an anticancer likeness of 0.09, and candidates with fewer than 90 models are excluded as unreliable, as is the case for the anti-cancer drugs in SI Dataset S3.
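
The text does not say how stereoisomers were collapsed during InChI Key matching. One plausible approach, sketched here as an assumption, relies on the standard InChIKey layout, in which the first 14-character block hashes the connectivity and the second block carries stereochemistry and related layers. Comparing only the first block then treats stereoisomers as the same molecule. The function names and the placeholder keys below are hypothetical and do not correspond to real compounds.

```python
def connectivity_block(inchi_key):
    """Return the first hyphen-separated block of an InChIKey (14 characters)."""
    return inchi_key.split("-")[0]


def match_ignoring_stereo(food_keys, toxin_keys):
    """Return the food InChIKeys whose connectivity block also occurs among the toxin keys."""
    toxin_blocks = {connectivity_block(key) for key in toxin_keys}
    return [key for key in food_keys if connectivity_block(key) in toxin_blocks]


# Placeholder keys in InChIKey-like shape (not real compounds).
food_keys = [
    "AAAAAAAAAAAAAA-BBBBBBBBSA-N",  # same skeleton as the toxin entry, different stereo block
    "CCCCCCCCCCCCCC-DDDDDDDDSA-N",  # no counterpart among the toxins
]
toxin_keys = [
    "AAAAAAAAAAAAAA-EEEEEEEESA-N",
]
matches = match_ignoring_stereo(food_keys, toxin_keys)
```

Note that first-block matching also merges forms that differ only in layers stored in the second block, such as isotopic substitution, so it is slightly more permissive than stereoisomer matching alone.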
Additional dataset S5 (separate file): Pathway enrichment analysis for the curated list of cancer-beating molecules (CBM) from FooDB with anti-cancer likeness >0.7
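
As an illustration of how the >0.7 threshold selects cancer-beating molecules and how a per-food count could then be derived, consider the sketch below. The function name, the input dictionaries, and all compound and food names are hypothetical; the manual curation step behind the CBM list is not modelled here.

```python
def build_food_map(likeness, food_contents, threshold=0.7):
    """Select CBMs by likeness and count them per food.

    likeness:      dict mapping compound name to anticancer likeness (0 to 1).
    food_contents: dict mapping food name to the set of compounds it contains.
    Both inputs are hypothetical toy formats.
    """
    cbms = {compound for compound, score in likeness.items() if score > threshold}
    counts = {
        food: sum(1 for compound in compounds if compound in cbms)
        for food, compounds in food_contents.items()
    }
    return cbms, counts


# Made-up likeness values and food compositions.
toy_likeness = {
    "compound_1": 0.91,
    "compound_2": 0.74,
    "compound_3": 0.70,  # not a CBM: the threshold is strict (>0.7)
    "compound_4": 0.12,
}
toy_foods = {
    "food_x": {"compound_1", "compound_2", "compound_4"},
    "food_y": {"compound_3", "compound_4"},
    "food_z": {"compound_2"},
}
cbm_set, food_map = build_food_map(toy_likeness, toy_foods)
```

Here `food_x` contains two CBMs, `food_z` one, and `food_y` none, which gives the ranking that a count-based food map would display.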