
  • Brief Communication

CancerMine: a literature-mined resource for drivers, oncogenes and tumor suppressors in cancer

Abstract

Tumors from individuals with cancer are frequently genetically profiled for information about the driving forces behind the disease. We present the CancerMine resource, a text-mined and routinely updated database of drivers, oncogenes and tumor suppressors in different types of cancer. All data are available online (http://bionlp.bcgsc.ca/cancermine) and downloadable under a Creative Commons Zero license for ease of use.


Fig. 1: Performance of the relation classifier and frequently extracted genes and cancers.
Fig. 2: Comparison to other resources and cancer-type clustering by gene role.


Data availability

The data can be viewed and downloaded through the online viewer (http://bionlp.bcgsc.ca/cancermine). The February 2019 CancerMine release was used for this analysis (https://doi.org/10.5281/zenodo.2557358). All releases can be found at https://doi.org/10.5281/zenodo.1156241.

Code availability

All code for the text mining and the analysis in this paper is available in the GitHub repository (https://github.com/jakelever/cancermine). The specific code release is archived in Zenodo (https://doi.org/10.5281/zenodo.2586207).


Acknowledgements

J.L. was supported by a Vanier Canada Graduate Scholarship. Funding for S.J.M.J. and M.R.J. was provided through the Personalized Oncogenomics (POG) program, which is generously supported by the BC Cancer Foundation and Genome British Columbia (project B20POG). The authors would like to thank Compute Canada for the use of computational infrastructure for this research.

Author information

Contributions

J.L., M.R.J. and S.J.M.J. conceived the idea. J.L. implemented the software and carried out the analysis. J.L., E.Y.Z. and J.G. annotated the sentence data. All authors contributed to the writing of the manuscript.

Corresponding author

Correspondence to Steven J. M. Jones.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Fig. 1 Sources of the text-mined associations from within full-text articles.

The figure shows the relationship between whether a cancer gene association is novel to CancerMine and the section of the paper in which it was found. Mentions in the Introduction are much less likely to be novel than mentions in other sections.
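The per-section novelty comparison can be illustrated with a small sketch. It assumes a hypothetical `Mention` record (gene, cancer, role, publication year, section) and treats a mention as novel when its (gene, cancer, role) association does not appear in any paper from an earlier year; CancerMine's exact definition of novelty is not given here, so this is an illustration rather than the authors' method.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Mention(NamedTuple):
    """Hypothetical record for one text-mined association mention."""
    gene: str
    cancer: str
    role: str      # e.g. "Driver", "Oncogene", "Tumor_Suppressor"
    year: int      # publication year of the source paper
    section: str   # e.g. "Introduction", "Results", "Discussion"


def novelty_by_section(mentions: Iterable[Mention]) -> dict[str, float]:
    """Return, per section, the fraction of mentions whose association
    (gene, cancer, role) was not extracted from any earlier year."""
    mentions = list(mentions)
    # Earliest publication year in which each association appears.
    first_year: dict[tuple[str, str, str], int] = {}
    for m in mentions:
        key = (m.gene, m.cancer, m.role)
        first_year[key] = min(m.year, first_year.get(key, m.year))
    novel = Counter()
    total = Counter()
    for m in mentions:
        total[m.section] += 1
        # Assumed rule: novel if the mention comes from the earliest year.
        if m.year == first_year[(m.gene, m.cancer, m.role)]:
            novel[m.section] += 1
    return {section: novel[section] / total[section] for section in total}


example = [
    Mention("TP53", "breast cancer", "Tumor_Suppressor", 2005, "Results"),
    Mention("TP53", "breast cancer", "Tumor_Suppressor", 2010, "Introduction"),
    Mention("ERBB2", "breast cancer", "Oncogene", 2012, "Results"),
]
print(novelty_by_section(example))
```

In this toy data the Introduction mention restates an association first reported in 2005, so it counts as not novel, mirroring the pattern described for the Introduction.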

Supplementary Fig. 2 Sources of text-mined associations across years.

Mentions of drivers, oncogenes and tumor suppressors are increasingly extracted from the main body of full-text articles. The minor dip in 2018 (compared with 2017) arises because many 2018 papers became accessible for text mining only later, in 2019.

Supplementary Fig. 3 Comparison against other resources using a lower-threshold classifier.

To explore the effect of using a high-precision, low-recall classifier, we show a comparison without the strict thresholding: all classifiers use a threshold of 0.5 instead of the higher thresholds. There is still little overlap with the CGC and IntOGen resources, suggesting that many of the gene associations in these databases are not mentioned in the literature in a form that CancerMine can extract.
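The effect of relaxing the threshold can be sketched as follows. The toy sentence-level predictions, the reference gene set standing in for a curated resource, and the 0.8 "strict" threshold are all hypothetical; the point is only that lowering the cutoff to 0.5 admits more mined genes, and the overlap with another resource is then counted Venn-style.

```python
from typing import Iterable


def genes_above_threshold(
    predictions: Iterable[tuple[str, float]], threshold: float
) -> set[str]:
    """Genes with at least one sentence-level prediction scoring >= threshold."""
    return {gene for gene, prob in predictions if prob >= threshold}


def overlap_summary(mined: set[str], other: set[str]) -> dict[str, int]:
    """Venn-style counts comparing mined genes with another resource's genes."""
    return {
        "mined_only": len(mined - other),
        "both": len(mined & other),
        "other_only": len(other - mined),
    }


# Toy data: (gene, classifier probability) for each extracted sentence.
preds = [("TP53", 0.95), ("KRAS", 0.62), ("MYC", 0.40), ("PTEN", 0.81)]
reference = {"TP53", "KRAS", "APC"}  # hypothetical stand-in for a curated list

for threshold in (0.8, 0.5):  # 0.8 is a hypothetical strict threshold
    mined = genes_above_threshold(preds, threshold)
    print(threshold, overlap_summary(mined, reference))
```

Even at the lower threshold, genes present only in the reference set (here APC) stay unmatched when no sentence mentions them in an extractable form.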

Supplementary Fig. 4 Validation of cancer profiles using TCGA somatic mutation data.

All samples in seven TCGA projects are analyzed for likely loss-of-function mutations, which are compared with the CancerMine tumor suppressor profiles; each sample is then matched with the closest profile. The percentage in each cell is the proportion of samples labeled with a given CancerMine profile that come from each TCGA project. Samples that match no tumor suppressor in these profiles, or that match ambiguously, are assigned to "none". The TCGA projects are breast cancer (BRCA), colon adenocarcinoma (COAD), liver hepatocellular carcinoma (LIHC), prostate adenocarcinoma (PRAD), lower grade glioma (LGG), lung adenocarcinoma (LUAD) and stomach adenocarcinoma (STAD).
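A minimal sketch of the profile matching and the per-cell percentages follows. It assumes that "closest profile" means the profile sharing the most genes with a sample's loss-of-function-mutated genes, with zero overlap or a tie counted as "none"; the similarity measure actually used is not stated here, and the profile contents below are hypothetical toy data.

```python
from collections import Counter, defaultdict

# Hypothetical tumor suppressor profiles (cancer type -> genes).
PROFILES = {
    "breast cancer": {"BRCA1", "BRCA2", "TP53", "PTEN"},
    "colorectal cancer": {"APC", "SMAD4", "TP53"},
    "prostate cancer": {"PTEN", "NKX3-1"},
}


def assign_profile(lof_genes: set[str], profiles: dict[str, set[str]]) -> str:
    """Label a sample with the profile sharing the most genes (assumed
    similarity); no overlap or a tie for the best score gives 'none'."""
    scores = {cancer: len(lof_genes & genes) for cancer, genes in profiles.items()}
    best = max(scores.values(), default=0)
    winners = [cancer for cancer, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return "none"
    return winners[0]


def project_composition(assignments: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """For each assigned profile, the fraction of its samples from each project."""
    by_profile = defaultdict(Counter)
    for project, profile in assignments:
        by_profile[profile][project] += 1
    return {
        profile: {project: n / sum(counts.values()) for project, n in counts.items()}
        for profile, counts in by_profile.items()
    }


samples = [("COAD", {"APC", "SMAD4"}), ("BRCA", {"TP53"}),
           ("BRCA", {"BRCA1", "PTEN"}), ("COAD", {"KRAS"})]
labels = [(project, assign_profile(genes, PROFILES)) for project, genes in samples]
print(project_composition(labels))
```

The TP53-only sample ties between two profiles and the KRAS-only sample matches no profile, so both fall into "none", as described for ambiguous or unmatched samples.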

Supplementary Information

Supplementary Figs. 1–4 and Supplementary Tables 1–7

Reporting Summary

Source data

About this article

Cite this article

Lever, J., Zhao, E.Y., Grewal, J. et al. CancerMine: a literature-mined resource for drivers, oncogenes and tumor suppressors in cancer. Nat Methods 16, 505–507 (2019). https://doi.org/10.1038/s41592-019-0422-y

  • DOI: https://doi.org/10.1038/s41592-019-0422-y
