DoCM: a database of curated mutations in cancer


Figure 1: Putting DoCM in the context of other resources.



Acknowledgements

The authors gratefully acknowledge L. Trani, J. Hodges, and A. Wollam for efforts in manual review. T. Ley, R. Bose, R. Govindan, and S. Devarakonda provided valuable curation input. D. Larson provided valuable analysis input. M.G. was supported by the National Human Genome Research Institute (NIH NHGRI K99HG007940). O.L.G. was supported by the National Cancer Institute (NIH NCI K22CA188163). This work was supported by a grant to R.K.W. from the National Human Genome Research Institute (NIH NHGRI U54HG003079).

Author information

Correspondence to Malachi Griffith or Obi L Griffith.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Overview of DoCM resource

(a) Outline of criteria to curate a variant. Variants are evaluated for inclusion and then curated elements are identified. (b) Summary of current DoCM contents. DoCM contains SNSs and indels across many cancer subtypes with easy identification of the journal article that outlines the variant's relevance. (c) Screenshot of the DoCM web application available at http://docm.info. (d) Illustration of the API. The API accepts an HTTP GET request with a variety of parameters, including gene, chromosome, and position, and returns a JSON response with the PubMed IDs, diseases, and other useful information. The API is thoroughly documented at http://docm.genome.wustl.edu/api.
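As a rough illustration of panel (d), the sketch below builds a GET query URL from filter parameters and collects PubMed IDs from a JSON response. The `/variants.json` path, the parameter names, and the response fields (`gene`, `diseases`, `pubmed_ids`) are assumptions made for this example, not the documented DoCM API; the API documentation is the authority. The sample response is hand-written with placeholder IDs so that the code runs without network access.

```python
import json
from urllib.parse import urlencode

# Base URL taken from the caption; the path below is a hypothetical placeholder.
API_BASE = "http://docm.genome.wustl.edu/api"
VARIANTS_PATH = "/variants.json"  # assumption, not the documented route


def build_variant_query(**filters):
    """Build a GET URL from keyword filters, skipping any set to None."""
    params = sorted((k, v) for k, v in filters.items() if v is not None)
    return f"{API_BASE}{VARIANTS_PATH}?{urlencode(params)}"


def pubmed_ids_by_gene(response_text):
    """Group PubMed IDs by gene from a JSON list of variant records."""
    grouped = {}
    for record in json.loads(response_text):
        grouped.setdefault(record["gene"], set()).update(record["pubmed_ids"])
    return {gene: sorted(ids) for gene, ids in grouped.items()}


# Hand-written stand-in for a response; field names and IDs are made up.
SAMPLE_RESPONSE = json.dumps([
    {"gene": "BRAF", "diseases": ["melanoma"], "pubmed_ids": [111, 222]},
    {"gene": "BRAF", "diseases": ["colorectal cancer"], "pubmed_ids": [222]},
])

url = build_variant_query(gene="BRAF", chromosome="7", position=None)
ids = pubmed_ids_by_gene(SAMPLE_RESPONSE)
```

Sorting the parameters keeps the generated URL deterministic, which makes queries easier to cache and compare.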

Supplementary Figure 2 Screenshot of DoCM batch submission form.

In the batch submission form, users can enter all the parameters necessary for inclusion into DoCM: the name of the batch, the rationale statement outlining the reason for including the variants and curation details, any relevant URLs, tags to be applied to the whole batch, the TSV file with variants, and submitter information. Following submission, the user is given a link to review the batch and any messages from moderators.

Supplementary Figure 3 Screenshot of moderator's view of the submitted batches queue.

Once a batch has been submitted, it can be reviewed in the password-protected moderator queue. A listing of current DoCM moderators can be viewed at http://docm.genome.wustl.edu/about. Moderators can select a batch to review it; the “Drug Gene Knowledge Database” batch is highlighted in purple because it is the subject of Supplementary Figure 4. Once multiple batches have been accepted, a moderator can create a new DoCM version using the blue button at the bottom right of the screen.

Supplementary Figure 4 Screenshot of moderator review page.

A moderator can review all information submitted with a batch and evaluate whether it fits the scope and quality requirements of DoCM. Individual variants can be accepted or rejected, and the moderator can leave a message for the submitter.

Supplementary Figure 5 Number of papers in PubMed indexed by “Cancer” per year.

Searching PubMed with the search term “Cancer” yields the number of papers relating to cancer per year. This serves as an upper bound on the number of papers that need to be curated to accurately summarize important cancer variants. There is a need for public resources that reduce the duplication of curation effort.

Supplementary Figure 6 Overview of variant curation for entry into DoCM

An illustrative example of the curation involved for the variant BRAF V600E is shown. Typically, the literature lists only the gene and amino acid change (purple in the figure), so extensive curation is required to uniquely identify the variant. Correct genomic coordinates on a consistent genome build need to be identified, with accompanying nucleotide and strand information. Occasionally, multiple nucleotide changes result in the same amino acid change. A representative transcript that correctly models the variant described in the literature also needs to be specified. Cancer subtypes are specified using the Disease Ontology nomenclature. Green boxes note the class of information that needs to be captured in DoCM, black boxes show the subtype of each class, and white boxes denote the value.
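To make the point about multiple nucleotide changes concrete, here is a minimal sketch that lists every codon encoding a target amino acid and counts how many bases differ from the reference codon. It uses the standard genetic code and the BRAF codon 600 reference codon GTG (valine); the function name is hypothetical, and this is not DoCM's curation code. It covers only the codon-level ambiguity; genomic coordinates, strand, and transcript still have to be resolved separately.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for_change(ref_codon, alt_aa):
    """List (alt_codon, n_substitutions) pairs that turn ref_codon into alt_aa."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa == alt_aa:
            # Count positions where the candidate codon differs from the reference.
            n_diff = sum(r != a for r, a in zip(ref_codon, codon))
            hits.append((codon, n_diff))
    # Fewest substitutions first.
    return sorted(hits, key=lambda hit: hit[1])


# BRAF codon 600 is GTG (valine) on the coding strand; E is glutamate.
v600e_codons = codons_for_change("GTG", "E")
```

For V600E this yields two candidates: GAG, reached by a single substitution, and GAA, which requires two adjacent substitutions. Both would be reported in the literature simply as "V600E".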

Supplementary Figure 7 Overview of analysis and validation sequencing of four TCGA projects

(a) Outline of the manual review strategy. DoCM sites with two or more reads of support are evaluated for obvious errors. (b) Summary of the variants that passed manual review and were not identified in the original TCGA analyses. (c) Summary of the variants that were validated in the 93 validation samples. (d) Comparison of DoCM-MSRV to ClinSek and the Bayesian classifier.
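The read-support threshold in panel (a) can be sketched as a simple filter over per-site read counts. The record layout, the sample names, and the counts below are hypothetical; only the "two or more reads of support" cutoff comes from the caption.

```python
from dataclasses import dataclass

MIN_SUPPORTING_READS = 2  # threshold stated in the caption


@dataclass(frozen=True)
class SiteCount:
    """Read counts at one DoCM site in one sample (hypothetical layout)."""
    sample: str
    variant: str
    ref_reads: int
    alt_reads: int


def needs_manual_review(counts, min_alt=MIN_SUPPORTING_READS):
    """Keep sites whose variant-supporting read count meets the threshold."""
    return [c for c in counts if c.alt_reads >= min_alt]


# Made-up counts for illustration only.
counts = [
    SiteCount("sample_A", "BRAF V600E", ref_reads=80, alt_reads=1),
    SiteCount("sample_B", "BRAF V600E", ref_reads=75, alt_reads=2),
    SiteCount("sample_C", "KRAS G12D", ref_reads=40, alt_reads=12),
]
review_queue = needs_manual_review(counts)
```

The filter only selects candidates; in the workflow described, each selected site is still inspected manually for obvious errors.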

Supplementary Figure 8 Coverage of the custom capture validation sequencing

Heatmap illustrating the coverage obtained at all target sites in validation sequencing. Bar graphs on the x- and y-axes illustrate the mean coverage at each case and position.

Supplementary Figure 9 Overview of validation sequencing results.

Variant allele fraction (VAF) plot illustrating the types of variants identified through manual review that were confirmed by validation sequencing. Variants called in the original TCGA study are highlighted in blue and those missed are in green. Note that TCGA was unable to call variants below 10% VAF, while the MSRV approach was able to recover many such variants. Density plots on the x- and y-axes show the distribution of tumor VAF and coverage depth, respectively, for validated variants.
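For readers less familiar with the metric, the sketch below computes a variant allele fraction and flags values under the 10% level mentioned in the caption. It assumes that depth is the sum of reference and variant reads, ignoring any other alleles at the site; the function names are hypothetical.

```python
LOW_VAF_CUTOFF = 0.10  # the 10% level mentioned in the caption


def variant_allele_fraction(ref_reads, alt_reads):
    """Fraction of reads supporting the variant (assumes depth = ref + alt)."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("VAF is undefined at a site with no coverage")
    return alt_reads / depth


def is_low_vaf(ref_reads, alt_reads, cutoff=LOW_VAF_CUTOFF):
    """True when the variant falls below the cutoff."""
    return variant_allele_fraction(ref_reads, alt_reads) < cutoff
```

For example, a site with 95 reference reads and 5 variant reads has a VAF of 5% and would be flagged as low, whereas 60 reference reads and 40 variant reads give 40%.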

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9, Supplementary Tables 1–4, Supplementary Methods and Supplementary Results (PDF 2630 kb)

Supplementary Data 1

DoCM read-count data and manual review calls for all TCGA samples. (ZIP 9091 kb)

Supplementary Data 2

DoCM read-count data and manual review calls for all Validation samples. (ZIP 1052 kb)
