Abstract
As the field of precision medicine progresses, treatments for patients with cancer are starting to be tailored to their molecular as well as their clinical features. The emerging cancer subtypes defined by these molecular features require that dedicated resources be used to assist the discovery of drug candidates for preclinical evaluation. Voluminous gene expression profiles of patients with cancer have been accumulated in public databases, enabling the creation of cancer-specific expression signatures. Meanwhile, large-scale gene expression profiles of cellular responses to chemical compounds have also recently became available. By matching the cancer-specific expression signature to compound-induced gene expression profiles from large drug libraries, researchers can prioritize small molecules that present high potency to reverse expression of signature genes for further experimental testing of their efficacy. This approach has proven to be an efficient and cost-effective way to identify efficacious drug candidates. However, the success of this approach requires multiscale procedures, imposing considerable challenges to many labs. To address this, we developed Open Cancer TherApeutic Discovery (OCTAD; http://octad.org): an open workspace for virtually screening compounds targeting precise groups of patients with cancer using gene expression features. Its database includes 19,127 patient tissue samples covering more than 50 cancer types and expression profiles for 12,442 distinct compounds. The program is used to perform deep-learning-based reference tissue selection, disease gene expression signature creation, drug reversal potency scoring and in silico validation. OCTAD is available as a web portal and a standalone R package to allow experimental and computational scientists to easily navigate the tool.
This is a preview of subscription content, access via your institution
Access options
Access Nature and 54 other Nature Portfolio journals
Get Nature+, our best-value online-access subscription
$29.99 / 30 days
cancel any time
Subscribe to this journal
Receive 12 print issues and online access
$259.00 per year
only $21.58 per issue
Buy this article
- Purchase on SpringerLink
- Instant access to full article PDF
Prices may be subject to local taxes which are calculated during checkout
Similar content being viewed by others
Data availability
The data related to this protocol can be found at http://octad.org/download or https://www.synapse.org/#!Synapse:syn22101254. You can also refer to the preprint version of our protocol: https://www.biorxiv.org/content/10.1101/821546v1. This pipeline was verified in our previous research papers.
Software availability
The software is available from http://octad.org/download or https://www.synapse.org/#!Synapse:syn22101254.
References
Balamuth, N. J. & Womer, R. B. Ewing’s sarcoma. Lancet Oncol. 11, 184–192 (2010).
Torre, L. A. et al. Global cancer statistics, 2012. CA Cancer J. Clin. 65, 87–108 (2015).
Genetic and Rare Diseases Information Center, National Institutes of Health. FAQs About Rare Diseases. https://rarediseases.info.nih.gov/diseases/pages/31/faqs-about-rare-diseases (2020).
Sirota, M. et al. Discovery and preclinical validation of drug indications using compendia of public gene expression data. Science Tranl. Med. 3, 96ra77 (2011).
Jahchan, N. S. et al. A drug repositioning approach identifies tricyclic antidepressants as inhibitors of small cell lung cancer and other neuroendocrine tumors. Cancer Discov. 3, 1364–1377 (2013).
van Noort, V. et al. Novel drug candidates for the treatment of metastatic colorectal cancer through global inverse gene-expression profiling. Cancer Res. 74, 5690–5699 (2014).
Brum, A. M. et al. Connectivity Map-based discovery of parbendazole reveals targetable human osteogenic pathway. Proc. Natl Acad. Sci. USA 112, 12711–12716 (2015).
Chen, B. et al. Computational discovery of niclosamide ethanolamine, a repurposed drug candidate that reduces growth of hepatocellular carcinoma cells in vitro and in mice by inhibiting cell division cycle 37 signaling. Gastroenterology 152, 2022–2036 (2017).
Pessetto, Z. Y. et al. In silico and in vitro drug screening identifies new therapeutic approaches for Ewing sarcoma. Oncotarget 8, 4079–4095 (2017).
Mirza, A. N. et al. Combined inhibition of atypical PKC and histone deacetylase 1 is cooperative in basal cell carcinoma treatment. JCI Insight 2, e97071 (2017).
Chen, B. et al. Reversal of cancer gene expression correlates with drug efficacy and reveals therapeutic targets. Nat. Commun. 8, 16022 (2017).
Lamb, J. et al. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313, 1929–1935 (2006).
Subramanian, A. et al. A next generation Connectivity Map: L1000 platform and the first 1,000,000 profiles. Cell 171, 1437–1452 (2017).
Corsello, S. M. et al. The Drug Repurposing Hub: a next-generation drug library and information resource. Nat. Med. 23, 405–408 (2017).
Zeng, W. Z. D., Glicksberg, B. S., Li, Y. & Chen, B. Selecting precise reference normal tissue samples for cancer research using a deep learning approach. BMC Med. Genomics 12, 21 (2019).
Aran, D., Sirota, M. & Butte, A. J. Systematic pan-cancer analysis of tumour purity. Nat. Commun. 6, 8971 (2015).
Yoshihara, K. et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun. 4, 2612 (2013).
Yu, K. et al. Comprehensive transcriptomic analysis of cell lines as models of primary tumors across 22 tumor types. Nat. Commun. 10, 3574 (2019).
Vivian, J. et al. Toil enables reproducible, open source, big biomedical data analyses. Nat. Biotechnol. 35, 314–316 (2017).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
Liu, K. et al. Evaluating cell lines as models for metastatic breast cancer through integrative analysis of genomic data. Nat. Commun. 10, (2019).
Risso, D., Ngai, J., Speed, T. P. & Dudoit, S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechol. 32, 896–902 (2014).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Law, C. W., Chen, Y., Shi, W. & Smyth, G. K. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 15, R29 (2014).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, (2014).
Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics 14, 128 (2013).
Costa-Silva, J., Domingues, D. & Lopes, F. M. RNA-Seq differential expression analysis: An extended review and a software tool. PLoS ONE 12, (2017).
Sterling, T. & Irwin, J. J. ZINC 15–ligand discovery for everyone. J. Chem. Inf. Model 55, 2324–2337 (2015).
Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 1, 8 (2009).
Kim, S. et al. PubChem 2019 update: improved access to chemical data. Nucleic Acids Res. 47, D1102–D1109 (2019).
Bento, A. P. et al. The ChEMBL bioactivity database: an update. Nucleic Acids Res. 42, D1083–D1090 (2014).
Chen, B., Sirota, M., Fan-Minogue, H., Hadley, D. & Butte, A. J. Relating hepatocellular carcinoma tumor samples and cell lines using gene expression data in translational research. BMC Med. Genomics 8, S5 (2015).
Iorio, F. et al. A landscape of pharmacogenomic interactions in cancer. Cell 166, 740–754 (2016).
Seashore-Ludlow, B. et al. Harnessing connectivity in a large-scale small-molecule sensitivity dataset. Cancer Discov. 5, 1210–1223 (2015).
Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
Gao, J. et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal. 6, pl1 (2013).
McFerrin, L. G. et al. Analysis and visualization of linked molecular and clinical cancer data by using Oncoscape. Nat. Genet. 50, 1203–1204 (2018).
Newton, Y. et al. TumorMap: exploring the molecular similarities of cancer samples in an interactive portal. Cancer Res. 77, e111–e114 (2017).
Schmid, M. W. & Grossniklaus, U. Rcount: simple and flexible RNA-Seq read counting. Bioinformatics 31, 436–437 (2015).
Lachmann, A. et al. Massive mining of publicly available RNA-seq data from human and mouse. Nat. Commun. 9, 1366 (2018).
Huang, D. W. et al. The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol. 8, R183 (2007).
Kucukural, A., Yukselen, O., Ozata, D. M., Moore, M. J. & Garber, M. DEBrowser: interactive differential expression analysis and visualization tool for count data. BMC Genomics 20, 6 (2019).
Wu, H., Huang, J., Zhong, Y. & Huang, Q. DrugSig: a resource for computational drug repositioning utilizing gene expression signatures. PLoS ONE 12, e0177743 (2017).
Moosavinasab, S. et al. ‘RE:fine drugs’: an interactive dashboard to access drug repurposing opportunities. Database 2016, baw083 (2016).
Lee, B. K. B. et al. DeSigN: connecting gene expression with therapeutics for drug repurposing and development. BMC Genomics 18, 934 (2017).
Wang, Z. et al. Drug Gene Budger (DGB): an application for ranking drugs to modulate a specific gene based on transcriptomic signatures. Bioinformatics 35, 1247–1248.
Shameer, K. et al. Systematic analyses of drugs and disease indications in RepurposeDB reveal pharmacological, biological and epidemiological factors influencing drug repositioning. Brief. Bioinformatics 19, 656–678 (2018).
Brown, A. S. & Patel, C. J. A standard database for drug repositioning. Sci. Data 4, 170029 (2017).
Chen, B. et al. Harnessing big ‘omics’ data and AI for drug discovery in hepatocellular carcinoma. Nat. Rev. Gastroenterol. Hepatol. 17, 238–251 (2020).
Smirnov, P. et al. PharmacoGx: an R package for analysis of large pharmacogenomic datasets. Bioinformatics 32, 1244–1246 (2016).
Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics 14, 7 (2013).
Dang, C. V. MYC on the path to cancer. Cell 149, 22–35 (2012).
Courtney, K. D., Corcoran, R. B. & Engelman, J. A. The PI3K pathway as drug target in human cancer. J. Clin. Oncol. 28, 1075–1083 (2010).
Glicksberg, B. S., Li, L., Chen, R., Dudley, J. & Chen, B. Leveraging big data to transform drug discovery. Methods Mol. Biol. 1939, 91–118 (2019).
Robinson, D. R. et al. Integrative clinical genomics of metastatic cancer. Nature 548, 297–303 (2017).
Acknowledgements
The research is supported by R01GM134307, R21 TR001743 and K01 ES028047 and the MSU Global Impact Initiative. Amazon AWS research credits were received to support portal development and hosting. The portal was developed with help from Optra Health and MSU IT. The content is solely the responsibility of the authors and does not necessarily represent the official views of sponsors.
Author information
Authors and Affiliations
Contributions
B.Z. led the project, developed the desktop version and wrote the manuscript. B.S.G. and P.N. co-led the project, coordinated web portal development and wrote the manuscript. E.C. developed the R package, led the revision and wrote the manuscript. J.X. performed the LINCS compound analysis, developed compound enrichment analysis and tested the desktop package. K.L. implemented the Toil pipeline and processed RNA-Seq samples. A.W. helped develop the code, prepared tutorials and created case studies. C.C. helped with troubleshooting. B.C. developed the initial code, wrote the manuscript and supervised the project.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Protocols thanks the anonymous reviewers for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Key references using this protocol
Chen, B. et al. Nat. Commun. 8, 16022 (2017): https://doi.org/10.1038/s41467-019-10148-6
Chen, B. et al. Gastroenterology 152, 2022–2036 (2017): https://doi.org/10.1053/j.gastro.2017.02.039
Zeng, W. Z. D., Glicksberg, B. S., Li, Y. & Chen, B. BMC Med. Genomics 12, 21 (2019): https://doi.org/10.1186/s12920-018-0463-6
Liu, K. et al. Nat. Commun. 10, 2138 (2019): https://doi.org/10.1038/s41467-019-10148-6
Extended data
Extended Data Fig. 1 Screenshots of the web portal.
(a) Disease sample selection, (b) control sample selection, (c) drug prediction job submission, (e) job management, (f) predicted drug list and (g) result files.
Supplementary information
Supplementary Information
Supplementary Text and Supplementary Figs. 1–5.
Rights and permissions
About this article
Cite this article
Zeng, B., Glicksberg, B.S., Newbury, P. et al. OCTAD: an open workspace for virtually screening therapeutics targeting precise cancer patient groups using gene expression features. Nat Protoc 16, 728–753 (2021). https://doi.org/10.1038/s41596-020-00430-z
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41596-020-00430-z
This article is cited by
-
Tumor removal limits prostate cancer cell dissemination in bone and osteoblasts induce cancer cell dormancy through focal adhesion kinase
Journal of Experimental & Clinical Cancer Research (2023)
-
ASGARD is A Single-cell Guided Pipeline to Aid Repurposing of Drugs
Nature Communications (2023)
-
Clustering cancers by shared transcriptional risk reveals novel targets for cancer therapy
Molecular Cancer (2022)
-
Machine and cognitive intelligence for human health: systematic review
Brain Informatics (2022)
-
Reversal of cancer gene expression identifies repurposed drugs for diffuse intrinsic pontine glioma
Acta Neuropathologica Communications (2022)
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.