Abstract
Tumors from individuals with cancer are frequently genetically profiled for information about the driving forces behind the disease. We present the CancerMine resource, a text-mined and routinely updated database of drivers, oncogenes and tumor suppressors in different types of cancer. All data are available online (http://bionlp.bcgsc.ca/cancermine) and downloadable under a Creative Commons Zero license for ease of use.
Data availability
The data can be viewed and downloaded through the online viewer (http://bionlp.bcgsc.ca/cancermine). The February 2019 CancerMine release was used for this analysis (https://doi.org/10.5281/zenodo.2557358). All releases can be found at https://doi.org/10.5281/zenodo.1156241.
Code availability
All code for the text mining and analysis in this paper is available in the GitHub repository (https://github.com/jakelever/cancermine). The specific code release is archived in Zenodo (https://doi.org/10.5281/zenodo.2586207).
Acknowledgements
J.L. was supported by a Vanier Canada Graduate Scholarship. Funding for S.J.M.J. and M.R.J. was provided through the Personalized Oncogenomics (POG) program, which is generously supported by the BC Cancer Foundation and Genome British Columbia (project B20POG). The authors would like to thank Compute Canada for the use of computational infrastructure for this research.
Author information
Contributions
J.L., M.R.J. and S.J.M.J. conceived the idea. J.L. implemented the software and carried out the analysis. J.L., E.Y.Z. and J.G. annotated the sentence data. All authors contributed to the writing of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
Supplementary Fig. 1 Sources of the text-mined associations from within full-text articles.
The link between the novelty of a cancer gene association to CancerMine and the location within the paper where it is found is shown. Mentions in the Introduction are much more likely not to be novel than mentions in other sections.
Supplementary Fig. 2 Sources of text-mined associations across years.
The mentions of drivers, oncogenes and tumor suppressors are increasingly being extracted from the main section of full-text articles. Please note that the minor dip in 2018 numbers (compared to 2017) is due to many 2018 papers only becoming accessible for text-mining later into 2019.
Supplementary Fig. 3 Comparison against other resources using a lower-threshold classifier.
To explore the effect of the high-precision, low-recall classifier, we show a comparison without the strict thresholding. All classifiers use a threshold of 0.5 instead of the higher thresholds. There is still no substantial overlap with the CGC and IntOGen resources, suggesting that many of the gene associations in these databases are not mentioned in the literature in a form that can be extracted by CancerMine.
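The trade-off behind this comparison can be illustrated with a minimal sketch. It is not the CancerMine classifier: the function name `precision_recall`, the scores and the labels are all hypothetical, invented only to show how raising a probability threshold above 0.5 tends to increase precision at the cost of recall.

```python
from typing import List, Tuple

# Hypothetical (classifier score, is a true association) pairs.
# These values are invented for illustration and are not CancerMine data.
EXAMPLE: List[Tuple[float, bool]] = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.60, False), (0.55, True), (0.30, False), (0.20, True),
]


def precision_recall(scored: List[Tuple[float, bool]],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is accepted."""
    tp = sum(1 for score, label in scored if score >= threshold and label)
    fp = sum(1 for score, label in scored if score >= threshold and not label)
    fn = sum(1 for score, label in scored if score < threshold and label)
    # Convention assumed here: precision is 1.0 when nothing is accepted.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    for threshold in (0.5, 0.85):
        p, r = precision_recall(EXAMPLE, threshold)
        print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

On this toy input the stricter threshold accepts fewer candidates, all of them correct, while missing more of the true associations, which is the behavior the lower-threshold comparison is meant to probe.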
Supplementary Fig. 4 Validation of cancer profiles using TCGA somatic mutation data.
All samples in seven TCGA projects are analyzed for likely loss-of-function mutations, compared with the CancerMine tumor suppressor profiles, and matched with the closest profile. The percentage shown in each cell is the proportion of samples labeled with a given CancerMine profile that come from each TCGA project. Samples that match no tumor suppressor in these profiles or are ambiguous are assigned to none. The TCGA projects are breast cancer (BRCA), colorectal adenocarcinoma (COAD), liver hepatocellular carcinoma (LIHC), prostate adenocarcinoma (PRAD), low grade glioma (LGG), lung adenocarcinoma (LUAD) and stomach adenocarcinoma (STAD).
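The matching step described in this caption could be sketched roughly as follows. The caption does not say how "closest" is measured, so the overlap count used here is an assumption, and the function name `closest_profile`, the profile names and the gene names are hypothetical placeholders rather than CancerMine content. Only the rule that unmatched or ambiguous samples are assigned to none comes from the caption.

```python
from typing import Dict, Set

# Hypothetical tumor suppressor profiles: profile name -> gene set.
# Placeholder names only; these are not real CancerMine profiles.
PROFILES: Dict[str, Set[str]] = {
    "cancer_1": {"GENE_A", "GENE_B"},
    "cancer_2": {"GENE_C", "GENE_D"},
}


def closest_profile(mutated_genes: Set[str],
                    profiles: Dict[str, Set[str]]) -> str:
    """Return the profile sharing the most genes with the sample.

    Assumption: closeness is the number of shared genes. Samples with no
    shared gene, or with a tie for the best overlap, are labeled "none".
    """
    overlaps = {name: len(mutated_genes & genes)
                for name, genes in profiles.items()}
    best = max(overlaps.values(), default=0)
    if best == 0:
        return "none"  # sample matches no tumor suppressor in any profile
    winners = [name for name, count in overlaps.items() if count == best]
    return winners[0] if len(winners) == 1 else "none"  # tie is ambiguous


if __name__ == "__main__":
    print(closest_profile({"GENE_A", "GENE_X"}, PROFILES))
    print(closest_profile({"GENE_A", "GENE_C"}, PROFILES))
```

A weighted score, for example one that favors more frequently cited genes, could replace the plain overlap count without changing the handling of unmatched and ambiguous samples.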
Supplementary Information
Supplementary Figs. 1–4 and Supplementary Tables 1–7
About this article
Cite this article
Lever, J., Zhao, E.Y., Grewal, J. et al. CancerMine: a literature-mined resource for drivers, oncogenes and tumor suppressors in cancer. Nat Methods 16, 505–507 (2019). https://doi.org/10.1038/s41592-019-0422-y
DOI: https://doi.org/10.1038/s41592-019-0422-y