The authors gratefully acknowledge L. Trani, J. Hodges, and A. Wollam for efforts in manual review. T. Ley, R. Bose, R. Govindan, and S. Devarakonda provided valuable curation input. D. Larson provided valuable analysis input. M.G. was supported by the National Human Genome Research Institute (NIH NHGRI K99HG007940). O.L.G. was supported by the National Cancer Institute (NIH NCI K22CA188163). This work was supported by a grant to R.K.W. from the National Human Genome Research Institute (NIH NHGRI U54HG003079).
The authors declare no competing financial interests.
Integrated supplementary information
(a) Outline of criteria to curate a variant. Variants are evaluated for inclusion and then curated elements are identified. (b) Summary of current DoCM contents. DoCM contains SNSs and indels across many cancer subtypes, and each variant is linked to the journal article that outlines its relevance. (c) Screenshot of the DoCM web application available at http://docm.info. (d) Illustration of the API. An HTTP GET request accepts a variety of parameters, including gene, chromosome and position, and returns a JSON response with the PubMed IDs, diseases and other useful information. The API is thoroughly documented at http://docm.genome.wustl.edu/api.
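As a rough illustration of the request and response pattern described in panel (d), the following Python sketch builds a GET URL from query parameters and summarizes a JSON reply. The endpoint path, the parameter names and the JSON field names are assumptions made for this example; the API documentation linked above is the authoritative source. The sample response is mock data, not a real DoCM record.

```python
import json
from urllib.parse import urlencode

# Assumed values for illustration only; consult the API documentation
# for the real endpoint path and parameter names.
BASE_URL = "http://docm.info"
VARIANTS_PATH = "/api/v1/variants.json"  # hypothetical path


def build_query_url(**params):
    """Build a GET URL from keyword parameters, skipping unset ones."""
    pairs = sorted((k, v) for k, v in params.items() if v is not None)
    return f"{BASE_URL}{VARIANTS_PATH}?{urlencode(pairs)}"


def summarize(response_text):
    """Extract (gene, diseases, PubMed IDs) from a JSON list of records.

    The field names used here are hypothetical.
    """
    records = json.loads(response_text)
    return [
        (r.get("gene"), r.get("diseases", []), r.get("pubmed_ids", []))
        for r in records
    ]


url = build_query_url(gene="BRAF", chromosome="7", position=None)

# Mock response with placeholder content, used instead of a live request.
sample = '[{"gene": "BRAF", "diseases": ["melanoma"], "pubmed_ids": ["PLACEHOLDER"]}]'
summary = summarize(sample)
```

In practice, the URL would be fetched with an HTTP client and the response body passed to `summarize`.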
In the batch submission form, users enter all the parameters required for inclusion in DoCM: the name of the batch, a rationale statement outlining the reason for including the variants and the curation details, any relevant URLs, tags to be applied to the whole batch, the TSV file with variants, and submitter information. Following submission, the user is given a link to review the batch and any messages from moderators.
Once a batch has been submitted, it can be reviewed in the password-protected moderator queue. A listing of current DoCM moderators can be viewed at http://docm.genome.wustl.edu/about. Moderators can select a batch to review, such as the “Drug Gene Knowledge Database” batch, which is highlighted in purple because it is the subject of Supplementary Figure 4. Once multiple batches have been accepted, a moderator can create a new DoCM version using the blue button at the bottom right of the screen.
A moderator can review all information submitted with a batch and evaluate whether it fits the scope and quality requirements of DoCM. Individual variants can be accepted or rejected and the moderator can leave a message to the submitter.
Searching PubMed with the term “Cancer” yields the number of cancer-related papers published per year. This serves as an upper bound on the number of papers that need to be curated to accurately summarize important cancer variants. There is a need for public resources that reduce the duplication of curation effort.
An anecdotal example of the curation involved for the variant BRAF V600E is shown. Typically the literature only lists the gene and amino acid change (purple in the figure), requiring extensive curation to uniquely identify the variant. Correct genomic coordinates on a consistent genome build need to be identified, with accompanying nucleotide and strand information. Occasionally there are multiple nucleotide changes that are synonymous with a particular amino acid change. A representative transcript that correctly models the variant described in the literature also needs to be specified. Cancer subtypes are specified using the disease ontology nomenclature. Green boxes note the class of information that needs to be captured in DoCM, black boxes show the subtype of each class, and white boxes denote the value.
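The classes of information listed in this caption can be pictured as a single record. The Python sketch below is one possible shape for such a record, not the DoCM schema: the class name and field names are hypothetical, and the BRAF V600E values are commonly cited GRCh37 values supplied for illustration rather than transcribed from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CuratedVariant:
    """Hypothetical record holding the curated elements for one variant."""
    gene: str
    amino_acid_change: str
    genome_build: str
    chromosome: str
    start: int
    stop: int
    reference: str   # reference allele on the stated strand
    variant: str     # variant allele on the stated strand
    strand: str      # strand on which the alleles are reported
    transcript: str  # representative transcript modeling the change
    disease: str     # cancer subtype name

    def genomic_key(self):
        """Tuple that uniquely identifies the nucleotide change on a build."""
        return (self.genome_build, self.chromosome, self.start,
                self.stop, self.reference, self.variant)


# Illustrative values only (see the note above).
braf_v600e = CuratedVariant(
    gene="BRAF", amino_acid_change="V600E", genome_build="GRCh37",
    chromosome="7", start=140453136, stop=140453136,
    reference="A", variant="T", strand="+",
    transcript="ENST00000288602", disease="melanoma",
)
```

Keying on the genomic coordinates rather than on the gene and amino acid change alone reflects the point made in the caption: several nucleotide changes can produce the same amino acid change, so the protein-level label is not a unique identifier.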
(a) Outline of the manual review strategy. DoCM sites with two or more reads of support are evaluated for obvious errors. (b) Summary of the variants that passed manual review and were not identified in the original TCGA analyses. (c) Summary of the variants that were validated in the 93 validation samples. (d) Comparison of DoCM-MSRV to ClinSek and the Bayesian classifier.
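The selection rule in panel (a), two or more reads of support, can be expressed as a simple filter. This Python sketch assumes a hypothetical input layout (a list of dictionaries with `site` and `variant_reads` keys) and uses placeholder site labels; it is not the authors' pipeline.

```python
# Threshold taken from the caption: two or more supporting reads.
MIN_SUPPORTING_READS = 2


def sites_for_manual_review(read_counts, min_reads=MIN_SUPPORTING_READS):
    """Return the sites whose variant-supporting read count meets the threshold.

    `read_counts` is a list of dicts with hypothetical keys
    'site' and 'variant_reads'.
    """
    return [row["site"] for row in read_counts if row["variant_reads"] >= min_reads]


# Placeholder data for illustration only.
example_counts = [
    {"site": "site_A", "variant_reads": 0},
    {"site": "site_B", "variant_reads": 2},
    {"site": "site_C", "variant_reads": 17},
    {"site": "site_D", "variant_reads": 1},
]
selected = sites_for_manual_review(example_counts)
```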
Heatmap illustrating the coverage obtained at all target sites in validation sequencing. Bar graphs on the x- and y-axes illustrate the mean coverage for each case and each position.
Variant allele fraction (VAF) plot illustrating the types of variants identified through manual review that validated. Variants called in the original TCGA study are highlighted in blue and those missed are in green. Note that TCGA was unable to call variants below ∼10% VAF, whereas the MSRV approach was able to recover many such variants. Density plots on the x- and y-axes show the distribution of tumor VAF and coverage depth, respectively, for validated variants.
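For readers unfamiliar with the quantity on the plot, variant allele fraction is conventionally computed as the number of variant-supporting reads divided by the total read depth at the site. The Python sketch below applies that standard definition and flags values under the approximate 10% level mentioned in the caption. The function names are hypothetical, and the cutoff is treated as an adjustable approximation rather than a fixed rule.

```python
# Approximate level from the caption; exposed as a parameter because
# the caption gives it only as "about 10%".
LOW_VAF_CUTOFF = 0.10


def variant_allele_fraction(variant_reads, total_reads):
    """Fraction of reads at a site that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= variant_reads <= total_reads:
        raise ValueError("variant_reads must be between 0 and total_reads")
    return variant_reads / total_reads


def is_low_vaf(variant_reads, total_reads, cutoff=LOW_VAF_CUTOFF):
    """True when the VAF falls below the cutoff."""
    return variant_allele_fraction(variant_reads, total_reads) < cutoff


low = is_low_vaf(5, 100)    # 5 of 100 reads support the variant
high = is_low_vaf(30, 100)  # 30 of 100 reads support the variant
```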
Supplementary Figures 1–9, Supplementary Tables 1–4, Supplementary Methods and Supplementary Results (PDF 2630 kb)
DoCM read-count data and manual review calls for all TCGA samples. (ZIP 9091 kb)
DoCM read-count data and manual review calls for all validation samples. (ZIP 1052 kb)