As the field of precision medicine progresses, treatments for patients with cancer are starting to be tailored to their molecular as well as their clinical features. The emerging cancer subtypes defined by these molecular features require dedicated resources to assist the discovery of drug candidates for preclinical evaluation. Voluminous gene expression profiles of patients with cancer have accumulated in public databases, enabling the creation of cancer-specific expression signatures. Meanwhile, large-scale gene expression profiles of cellular responses to chemical compounds have also recently become available. By matching a cancer-specific expression signature to compound-induced gene expression profiles from large drug libraries, researchers can prioritize small molecules that show high potency in reversing the expression of signature genes, and then test their efficacy experimentally. This approach has proven to be an efficient and cost-effective way to identify efficacious drug candidates. However, its success requires multiscale procedures, which poses considerable challenges for many labs. To address this, we developed Open Cancer TherApeutic Discovery (OCTAD; http://octad.org): an open workspace for virtually screening compounds targeting precise groups of patients with cancer using gene expression features. Its database includes 19,127 patient tissue samples covering more than 50 cancer types and expression profiles for 12,442 distinct compounds. The program performs deep-learning-based reference tissue selection, disease gene expression signature creation, drug reversal potency scoring and in silico validation. OCTAD is available as a web portal and as a standalone R package, so that both experimental and computational scientists can easily navigate the tool.
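The signature-matching idea described above can be illustrated with a minimal sketch. It is not OCTAD's scoring method: as a simplified stand-in, it ranks compounds by the Spearman rank correlation between a disease signature and each compound-induced profile, so that the most negative value indicates the strongest reversal. The function names, gene values and compound names below are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def reversal_scores(signature, drug_profiles, min_genes=3):
    """Rank compounds by how strongly they reverse a disease signature.

    signature: gene -> log2 fold change (disease vs. reference tissue).
    drug_profiles: compound -> {gene -> expression change after treatment}.
    Returns (compound, score) pairs, most negative (best reversal) first.
    """
    scores = []
    for compound, profile in drug_profiles.items():
        genes = sorted(set(signature) & set(profile))
        if len(genes) < min_genes:
            continue  # too few shared genes for a meaningful correlation
        rho = spearman([signature[g] for g in genes],
                       [profile[g] for g in genes])
        scores.append((compound, rho))
    return sorted(scores, key=lambda item: item[1])


# Hypothetical toy data, not taken from the OCTAD database.
signature = {"MYC": 2.1, "CCNB1": 1.5, "TOP2A": 1.8, "ALB": -2.4, "CYP3A4": -1.9}
drug_profiles = {
    "compound_A": {"MYC": -1.8, "CCNB1": -0.9, "TOP2A": -1.2, "ALB": 1.6, "CYP3A4": 1.1},
    "compound_B": {"MYC": 1.0, "CCNB1": 0.4, "TOP2A": 0.7, "ALB": -0.5, "CYP3A4": -0.2},
}

ranked = reversal_scores(signature, drug_profiles)
for compound, score in ranked:
    print(f"{compound}: {score:+.2f}")
```

In this toy example, compound_A changes expression in the opposite direction to the disease signature and is ranked first, whereas compound_B mimics the signature and is ranked last. A real analysis would use the scoring and validation steps provided by OCTAD itself.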
The research is supported by R01 GM134307, R21 TR001743 and K01 ES028047 and the MSU Global Impact Initiative. Amazon AWS research credits were received to support portal development and hosting. The portal was developed with help from Optra Health and MSU IT. The content is solely the responsibility of the authors and does not necessarily represent the official views of sponsors.
The authors declare no competing interests.
Peer review information Nature Protocols thanks the anonymous reviewers for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Key references using this protocol
Chen, B. et al. Nat. Commun. 8, 16022 (2017): https://doi.org/10.1038/ncomms16022
Chen, B. et al. Gastroenterology 152, 2022–2036 (2017): https://doi.org/10.1053/j.gastro.2017.02.039
Zeng, W. Z. D., Glicksberg, B. S., Li, Y. & Chen, B. BMC Med. Genomics 12, 21 (2019): https://doi.org/10.1186/s12920-018-0463-6
Liu, K. et al. Nat. Commun. 10, 2138 (2019): https://doi.org/10.1038/s41467-019-10148-6
(a) Disease sample selection, (b) control sample selection, (c) drug prediction job submission, (e) job management, (f) predicted drug list and (g) result files.
Cite this article
Zeng, B., Glicksberg, B.S., Newbury, P. et al. OCTAD: an open workspace for virtually screening therapeutics targeting precise cancer patient groups using gene expression features. Nat Protoc 16, 728–753 (2021). https://doi.org/10.1038/s41596-020-00430-z