Abstract
Tumors from individuals with cancer are frequently genetically profiled for information about the driving forces behind the disease. We present the CancerMine resource, a text-mined and routinely updated database of drivers, oncogenes and tumor suppressors in different types of cancer. All data are available online (http://bionlp.bcgsc.ca/cancermine) and downloadable under a Creative Commons Zero license for ease of use.
Data availability
The data can be viewed and downloaded through the online viewer (http://bionlp.bcgsc.ca/cancermine). The February 2019 CancerMine release was used for this analysis (https://doi.org/10.5281/zenodo.2557358). All releases can be found at https://doi.org/10.5281/zenodo.1156241.
Code availability
All code for text mining and the analysis in this paper is available in the GitHub repository (https://github.com/jakelever/cancermine). The specific code release is archived in Zenodo (https://doi.org/10.5281/zenodo.2586207).
Acknowledgements
J.L. was supported by a Vanier Canada Graduate Scholarship. Funding for S.J.M.J. and M.R.J. was provided through the Personalized Oncogenomics (POG) program, which is generously supported by the BC Cancer Foundation and Genome British Columbia (project B20POG). The authors would like to thank Compute Canada for the use of computational infrastructure for this research.
Author information
Contributions
J.L., M.R.J. and S.J.M.J. conceived the idea. J.L. implemented the software and carried out the analysis. J.L., E.Y.Z. and J.G. annotated the sentence data. All authors contributed to the writing of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Integrated supplementary information
Supplementary Fig. 1 Sources of the text-mined associations from within full-text articles.
The link between the novelty of a cancer gene association to CancerMine and the section of the paper in which it is found is shown. Mentions in the Introduction are much more likely than mentions in other sections to be not novel.
Supplementary Fig. 2 Sources of text-mined associations across years.
Mentions of drivers, oncogenes and tumor suppressors are increasingly being extracted from the main section of full-text articles. Please note that the minor dip in 2018 numbers (compared to 2017) arises because many 2018 papers only became accessible for text mining later in 2019.
Supplementary Fig. 3 Comparison against other resources using a lower-threshold classifier.
To explore the effect of using a high-precision, low-recall classifier, we show a comparison without the strict thresholding. All classifiers use a threshold of 0.5 instead of the higher thresholds. There is still no substantial overlap with the CGC and IntOGen resources, suggesting that many of the gene associations in these databases are not mentioned in the literature in a form that can be extracted by CancerMine.
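The trade-off described in this caption can be illustrated with a minimal sketch. The scores, labels and the strict threshold of 0.75 below are invented for illustration only; they are not CancerMine data or the thresholds used in the paper. The sketch simply shows how lowering the acceptance threshold to 0.5 raises recall at the cost of precision.

```python
# Hypothetical (classifier score, is a true association) pairs.
# These values are made up for illustration; they are NOT CancerMine data.
predictions = [
    (0.95, True), (0.90, True), (0.80, True), (0.70, False),
    (0.65, True), (0.60, False), (0.55, True), (0.40, False),
]


def precision_recall(preds, threshold):
    """Return (precision, recall) when scores >= threshold are accepted."""
    accepted = [label for score, label in preds if score >= threshold]
    true_pos = sum(accepted)
    total_pos = sum(label for _, label in preds)
    precision = true_pos / len(accepted) if accepted else 1.0
    recall = true_pos / total_pos if total_pos else 0.0
    return precision, recall


# 0.75 is an assumed "strict" threshold; 0.5 is the relaxed one from the caption.
strict = precision_recall(predictions, 0.75)
relaxed = precision_recall(predictions, 0.5)
print("strict :", strict)
print("relaxed:", relaxed)
```

With these toy values the strict threshold accepts only correct predictions but misses some true associations, while the 0.5 threshold recovers all of them and admits some false ones.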
Supplementary Fig. 4 Validation of cancer profiles using TCGA somatic mutation data.
All samples in seven TCGA projects are analyzed for likely loss-of-function mutations, compared with the CancerMine tumor suppressor profiles, and matched with the closest profile. Percentages shown in each cell are the proportion of samples labeled with each CancerMine profile that come from each TCGA project. Samples that match no tumor suppressor in these profiles, or whose closest profile is ambiguous, are assigned to none. The TCGA projects are breast cancer (BRCA), colorectal adenocarcinoma (COAD), liver hepatocellular carcinoma (LIHC), prostate adenocarcinoma (PRAD), low grade glioma (LGG), lung adenocarcinoma (LUAD) and stomach adenocarcinoma (STAD).
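The matching step in this caption could look roughly like the sketch below. It is an assumption-laden illustration, not the authors' implementation: the scoring rule (summing the profile weights of the mutated genes), the function name `closest_profile`, the profile names, the gene names and the weights are all hypothetical. Only the fallback to none for unmatched or ambiguous samples is taken from the caption.

```python
# Hypothetical tumor suppressor profiles: gene -> importance weight.
# Names and weights are placeholders, not values from CancerMine.
profiles = {
    "profile_1": {"GENE_A": 0.6, "GENE_B": 0.4},
    "profile_2": {"GENE_B": 0.4, "GENE_C": 0.6},
}


def closest_profile(mutated_genes, profiles):
    """Return the best-matching profile name, or "none".

    Assumed scoring rule: sum the weights of the profile's genes that carry
    a likely loss-of-function mutation in the sample. A sample with no
    overlap, or with a tie for the top score, is assigned to "none".
    """
    scores = {
        name: sum(w for gene, w in weights.items() if gene in mutated_genes)
        for name, weights in profiles.items()
    }
    best = max(scores.values(), default=0.0)
    if best <= 0:
        return "none"  # no tumor suppressor in any profile is mutated
    winners = [name for name, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else "none"  # tie is ambiguous


print(closest_profile({"GENE_A"}, profiles))
print(closest_profile({"GENE_B"}, profiles))
print(closest_profile({"GENE_X"}, profiles))
```

Here a sample mutated only in `GENE_B` scores equally against both profiles, so it falls into the none category, mirroring the handling of ambiguous samples described above.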
Supplementary Information
Supplementary Figs. 1–4 and Supplementary Tables 1–7
About this article
Cite this article
Lever, J., Zhao, E.Y., Grewal, J. et al. CancerMine: a literature-mined resource for drivers, oncogenes and tumor suppressors in cancer. Nat Methods 16, 505–507 (2019). https://doi.org/10.1038/s41592-019-0422-y
DOI: https://doi.org/10.1038/s41592-019-0422-y