Tumors from individuals with cancer are frequently genetically profiled for information about the driving forces behind the disease. We present the CancerMine resource, a text-mined and routinely updated database of drivers, oncogenes and tumor suppressors in different types of cancer. All data are available online (http://bionlp.bcgsc.ca/cancermine) and downloadable under a Creative Commons Zero license for ease of use.
J.L. was supported by a Vanier Canada Graduate Scholarship. Funding for S.J.M.J. and M.R.J. was provided through the Personalized Oncogenomics (POG) program, which is generously supported by the BC Cancer Foundation and Genome British Columbia (project B20POG). The authors would like to thank Compute Canada for the use of computational infrastructure for this research.
Integrated supplementary information
The relationship between whether a cancer gene association is novel to CancerMine and the section of the paper in which it is found is shown. Mentions in the Introduction are much less likely to be novel than mentions in other sections.
Mentions of drivers, oncogenes and tumor suppressors are increasingly being extracted from the main body of full-text articles. The minor dip in 2018 (compared with 2017) is because many 2018 papers became accessible for text mining only later in 2019.
To explore the effect of the high-precision, low-recall classifiers, we show a comparison without the strict thresholding: all classifiers use a threshold of 0.5 instead of the higher thresholds. There is still no substantial overlap with the CGC and IntOGen resources, suggesting that many of the gene associations in these databases are not mentioned in the literature in a form that CancerMine can extract.
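The relaxed comparison can be sketched in Python as below. This is a minimal illustration, not CancerMine's own code: it assumes each prediction is a hypothetical (gene, cancer type, role, probability) tuple, that the comparison with a reference resource such as CGC or IntOGen is keyed on gene symbols alone, and that the strict per-role thresholds shown are placeholder values rather than the ones CancerMine uses.

```python
def mined_genes(predictions, thresholds, default=0.5):
    """Return genes whose predicted association passes its role's threshold.

    predictions: iterable of (gene, cancer_type, role, probability) tuples
    (a hypothetical layout). thresholds maps role -> minimum probability;
    roles missing from it fall back to `default` (0.5, the relaxed setting).
    """
    return {
        gene
        for gene, _cancer_type, role, probability in predictions
        if probability >= thresholds.get(role, default)
    }


def compare_with_reference(mined, reference):
    """Count genes shared by both sets, or unique to either one."""
    return {
        "shared": len(mined & reference),
        "mined_only": len(mined - reference),
        "reference_only": len(reference - mined),
    }


# Toy inputs with placeholder gene names (illustrative only, not real data).
predictions = [
    ("GENE_A", "cancer_x", "Oncogene", 0.95),
    ("GENE_B", "cancer_x", "Tumor_Suppressor", 0.62),
    ("GENE_C", "cancer_y", "Driver", 0.40),
]
reference = {"GENE_A", "GENE_B", "GENE_D"}

# Hypothetical strict thresholds; the real CancerMine values may differ.
strict = {"Driver": 0.8, "Oncogene": 0.8, "Tumor_Suppressor": 0.8}

strict_result = compare_with_reference(mined_genes(predictions, strict), reference)
relaxed_result = compare_with_reference(mined_genes(predictions, {}), reference)
print(strict_result)   # {'shared': 1, 'mined_only': 0, 'reference_only': 2}
print(relaxed_result)  # {'shared': 2, 'mined_only': 0, 'reference_only': 1}
```

Because a lower threshold can only add genes to the mined set, the shared count can only stay the same or grow; a small shared count even at 0.5 therefore points to limited coverage in the literature rather than to the threshold itself.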
All samples in seven TCGA projects are analyzed for likely loss-of-function mutations, which are compared with the CancerMine tumor suppressor profiles; each sample is matched with the closest profile. Percentages shown in each cell are the proportions of samples labeled with each CancerMine profile that come from the different TCGA projects. Samples that match no tumor suppressor in these profiles, or that match ambiguously, are assigned to the "none" category. The TCGA projects are breast cancer (BRCA), colorectal adenocarcinoma (COAD), liver hepatocellular carcinoma (LIHC), prostate adenocarcinoma (PRAD), low-grade glioma (LGG), lung adenocarcinoma (LUAD) and stomach adenocarcinoma (STAD).
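The profile matching described above can be sketched as follows. The caption does not say how "closest" is measured, so this sketch assumes a simple rule: each profile is scored by how many of a sample's loss-of-function genes it contains, the highest-scoring profile wins, and a zero score or a tie yields "none". The function names, data layout and placeholder gene names are all hypothetical.

```python
from collections import Counter, defaultdict


def assign_profile(lof_genes, profiles):
    """Label a sample with its best-matching tumor suppressor profile.

    lof_genes: set of genes with likely loss-of-function mutations.
    profiles: dict mapping profile name -> set of tumor suppressor genes.
    Scoring by overlap size is an assumed stand-in for "closest".
    """
    scores = {name: len(lof_genes & genes) for name, genes in profiles.items()}
    best = max(scores.values(), default=0)
    if best == 0:
        return "none"  # matches no tumor suppressor in any profile
    winners = [name for name, score in scores.items() if score == best]
    if len(winners) > 1:
        return "none"  # ambiguous: several profiles tie
    return winners[0]


def project_percentages(samples, profiles):
    """For each assigned label, the percentage of its samples from each project.

    samples: iterable of (tcga_project, lof_gene_set) pairs.
    """
    counts = defaultdict(Counter)
    for project, lof_genes in samples:
        counts[assign_profile(lof_genes, profiles)][project] += 1
    return {
        label: {project: 100 * n / sum(by_project.values())
                for project, n in by_project.items()}
        for label, by_project in counts.items()
    }


# Toy example with placeholder genes (illustrative only, not real data).
profiles = {
    "breast cancer": {"TSG_1", "TSG_2"},
    "colorectal cancer": {"TSG_3", "TSG_4"},
}
samples = [
    ("BRCA", {"TSG_1", "OTHER_1"}),
    ("BRCA", {"TSG_2"}),
    ("COAD", {"TSG_3"}),
    ("COAD", {"TSG_2", "TSG_3"}),  # tie between profiles -> "none"
    ("LUAD", {"OTHER_2"}),         # no tumor suppressor hit -> "none"
]
result = project_percentages(samples, profiles)
print(result)
# {'breast cancer': {'BRCA': 100.0}, 'colorectal cancer': {'COAD': 100.0}, 'none': {'COAD': 50.0, 'LUAD': 50.0}}
```

The percentages are normalized within each assigned label rather than within each TCGA project, following the caption's description of the cell values.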