CIViC is a community knowledgebase for expert crowdsourcing the clinical interpretation of variants in cancer

Journal name: Nature Genetics
Volume: 49
Pages: 170–174
DOI: 10.1038/ng.3774

Abstract

CIViC is an expert-crowdsourced knowledgebase for Clinical Interpretation of Variants in Cancer describing the therapeutic, prognostic, diagnostic and predisposing relevance of inherited and somatic variants of all types. CIViC is committed to open-source code, open-access content, public application programming interfaces (APIs) and provenance of supporting evidence to allow for the transparent creation of current and accurate variant interpretations for use in cancer precision medicine.

At a glance

Figures

  1. Contribution of CIViC to the precision cancer treatment cycle.
    Figure 1: Contribution of CIViC to the precision cancer treatment cycle.

    The diagram summarizes how research, clinical treatment and CIViC knowledge curation are interrelated. The CIViC knowledgebase aims to develop clinical interpretations for raw cancer variant observations stored in large variant databases (gray). Each CIViC variant interpretation is based on published evidence and leverages complementary knowledgebases and ontologies wherever possible (yellow). The precision medicine clinical treatment cycle (blue) and research cycle (green) both involve sampling, sequencing, analysis, interpretation, intervention, evaluation and publication. These cycles start with hypothesis generation, followed by research projects or clinical trials, and dissemination of their findings. Examples of how each stage specifically relates to or benefits from the CIViC resource are represented by 'persona' icons for the four types of CIViC stakeholders: research scientists (green), clinical scientists (blue), patient advocates (orange) and developers (red). Each is accompanied by a brief description of a possible research, clinical, outreach or software development action. In the center of the diagram, key features of the CIViC interface and data model are summarized (purple). These include the roles and permissions of CIViC users, especially consumers of the content, curators and editors. Members of the CIViC community participate by adding, editing, discussing and approving individual evidence records and summaries that support the clinical interpretation of cancer variants. Anyone willing to log in may assume the role of curator, but contributions must be reviewed by expert editors before acceptance.

  2. CIViC interface overview
    Supplementary Fig. 1: CIViC interface overview

    The user-friendly CIViC interface is the primary point of contact with users whether they are consuming, editing or adding content. CIViC user-curated content (blue boxes) is visible without signing in and provides the bulk of visible content ordered from gene level (top) to variant level (middle), and finally individual evidence records (bottom). Curated content is enhanced by imported content and citations (orange boxes) that are linked directly to their original source. Website navigation and extensive documentation are highlighted with red boxes. Finally, a curator can interact (green boxes) with CIViC user-curated content by 1) suggesting changes (edit button) or adding content; 2) commenting on content or suggested revisions; 3) downloading content; or 4) viewing their activity, suggested changes, notifications, or profile.

  3. The CIViC data model
    Supplementary Fig. 2: The CIViC data model

    Key elements of the CIViC data model are listed below. Briefly, CIViC aims to provide gene and variant level executive summaries of the clinical relevance of specific variants. Multiple structured evidence records are first created and then synthesized to produce these executive variant/gene summaries. Each evidence record is associated with a specific variant and gene. Each evidence record also corresponds to a single clinical assertion for a single cancer type from a single peer-reviewed publication. One publication can be used to generate multiple evidence records. The evidence record consists of a free-form, human readable statement and several structured elements. The statement consists of a few sentences written by a curator to summarize the clinical relevance of a variant according to evidence described in a particular publication. The curator attempts to concisely summarize the clinical assertion being made by the publication, as well as the nature of the evidence supporting that assertion and any caveats the reader should be aware of. The curator must also assign values for each structured element by evaluating details from the publication. These elements include evidence type, clinical significance, evidence direction, and others. Where possible, structured ontologies are used in the CIViC data model (e.g. the disease ontology for disease names). Dark blue boxes refer to primary CIViC entities and light blue boxes refer to external data.

  4. Evidence level definitions and examples
    Supplementary Fig. 3: Evidence level definitions and examples

    Evidence levels defined in the CIViC data model are summarized below. Evidence levels are ordered A-E according to clinical utility (likelihood of relevance to a clinician reading a molecular report). A brief definition of each evidence level is provided along with an example obtained from www.civicdb.org. Updates to the CIViC data model (including to these evidence levels) will be maintained in the CIViC online documentation (https://civic.genome.wustl.edu/#/help/evidence). Additional examples of evidence records assigned to each evidence level can be obtained using the advanced search interface online: https://civic.genome.wustl.edu/#/search/evidence/.

  5. CIViC evidence classes and their relative potential to influence clinical actions and understanding of disease
    Supplementary Fig. 4: CIViC evidence classes and their relative potential to influence clinical actions and understanding of disease

    The following diagram attempts to order each combination of evidence level (A-E) and evidence type (predictive, prognostic, diagnostic, or predisposing) according to its potential clinical relevance and actionability. 'Clinical relevance' refers to the contribution of the variant to clinical understanding of the disease and 'actionability' refers to the ability to identify a specific clinical action for a specific variant. In this assessment, validated predictive variants tend to be the most relevant and actionable, while inferential diagnostic variants are the least relevant. In general, higher evidence levels are more actionable and predictive assertions exceed prognostic and diagnostic evidence for clinical utility. While CIViC is designed to capture both supporting (positive) and refuting (negative) evidence, the following is an assessment of the likely utility of supporting evidence only.

  6. CIViC database schema
    Supplementary Fig. 5: CIViC database schema

    A simplified schema representing the CIViC data model below provides all table names of the CIViC relational database (running on PostgreSQL). Polymorphic associations are used to relate core domain objects such as evidence records, genes, and variants to the tables that power on-site workflows like moderation and discussion. This allows for a significant reduction in the total number of tables required at the expense of database enforced foreign key constraints. In lieu of traditional foreign keys, validations in the application's business logic are used to enforce data integrity. Solid lines in the diagram indicate direct relationships in the database implemented by a local foreign key (for example, a variant has an evidence record identifier in the variants table, and thus a direct relationship). Dotted lines indicate relationships that exist indirectly (the relationship goes through an intermediate event with some conditions attached to it). For a complete schema including all fields and foreign key relationships, refer to the CIViC backend code repository: https://github.com/genome/civic-server.

  7. Usage statistics and growth of content
    Supplementary Fig. 6: Usage statistics and growth of content

    A) CIViC content as of December 2016. B) Tracking of evidence statements within CIViC over time with respective contributions of internal (Washington University, 'WashU') and external (community) curation. C) Treemap with box size illustrating the relative number of visits (sessions) to the CIViC website www.civicdb.org from specific external organizations and colored by the average session duration (in seconds). Sessions from our own institute are excluded from this summary. D) Map illustrating the location where sessions originated. The size of each circle indicates the amount of traffic from that city. Dark blue indicates visits from a dense cluster of cities that are close to each other. To date, CIViC has achieved 39,881 visits from 16,484 unique visitors from 2,507 cities in 125 countries around the world.

  8. Summary of current CIViC evidence records
    Supplementary Fig. 7: Summary of current CIViC evidence records

    The following panels briefly summarize CIViC evidence records at the time of publication. A) Total publications used in 1,703 evidence records, broken down by review status of the evidence record. Panels B-F further summarize these evidence records after excluding those that had a 'rejected' status (leaving 1,678 submitted or accepted evidence records). B) Evidence records broken down by evidence type and clinical significance. C) Evidence records broken down by evidence direction. D) Evidence records broken down by evidence trust rating. E) Evidence records broken down by evidence level. F) Evidence records broken down by variant origin.

  9. Summary of the most curated drugs and diseases in CIViC
    Supplementary Fig. 8: Summary of the most curated drugs and diseases in CIViC

    A summary of the drugs and diseases represented in CIViC evidence records ranked by the number of evidence records associated with each. A) The top 25 drugs were identified from 1,105 accepted or submitted evidence records of the predictive evidence type. The evidence records for these drugs are broken down by evidence level (left panel) and clinical significance (right panel). B) The top 25 cancer types (distinct disease ontology terms) were identified from all 1,678 accepted or submitted evidence records. The evidence records for these diseases are broken down by evidence level (left panel) and evidence type (right panel).

  10. CIViC evidence records summarized by literature sources
    Supplementary Fig. 9: CIViC evidence records summarized by literature sources

    The published literature used to create all CIViC evidence records is summarized below. A total of 1,678 accepted or submitted evidence records were derived from 1,077 peer-reviewed publications. A) A histogram summarizing articles used in CIViC evidence records broken down by year of publication (and further divided according to their open versus closed access status). B) A histogram showing the distribution of the number of evidence records obtained from single publications. Most publications yield only a single evidence record, but as many as 38 have been obtained from a single paper. C) Evidence records obtained from the top 25 journals most commonly mined in CIViC are summarized and broken down by evidence star rating on the left. The same evidence records are broken down by the evidence type on the right.

  11. The collaborative process and user roles in creating evidence
    Supplementary Fig. 10: The collaborative process and user roles in creating evidence

    CIViC consists of an online web resource whose target audience is an international community of cancer researchers, clinicians, and patient advocates. Participants in CIViC fall into various categories with increasing privileges or capabilities in the interface. The first category and most basic level of user is that of 'consumer'. Consumers may view, download and programmatically (via API) access all of the content of CIViC under the terms of the Creative Commons Public Domain Dedication license (CC0). No login is required to use CIViC. No login requirement, fees, or other encumbrances will be introduced in future versions of CIViC. Consumers may not add, approve, edit, or discuss revisions of content in CIViC. The second category of users includes all those roles that do permit modification and discussion on the site: 'curators', 'editors', and 'administrators'. 'Curators' may add new evidence records describing clinical relevance of variants, add or improve variant/gene summaries, and discuss existing content. While comments/discussion are automatically accepted, additions and revisions to existing content are initially entered in a pending state and must be approved prior to acceptance in CIViC. Rejected content is not deleted and may be revived after further discussion and revision. Editors have the additional capability to approve or reject additions and revisions of content. However, an editor cannot approve their own submissions or revisions, meaning that all content in CIViC must be created in collaboration between at least two members of the community. Editors are selected by a committee of existing editors, based on direct knowledge of the editor's expertise or by promotion from curator after demonstrating extensive high-quality contributions to CIViC. Finally, administrators have the abilities of editors but may also change user roles and use advanced site management utilities (e.g. merging duplicate records).

  12. Screenshot of the editor view for a submitted evidence record
    Supplementary Fig. 11: Screenshot of the editor view for a submitted evidence record

    Every new evidence record and any revision of existing content in CIViC must be approved by at least one independent editor prior to acceptance. The following screenshot shows a new evidence record submitted by a curator that is awaiting review by an editor. The following URL will display the live version of this example: https://civic.genome.wustl.edu/links/variants/34

  13. Screenshot of the editor view for a pending revision
    Supplementary Fig. 12: Screenshot of the editor view for a pending revision

    After proposing a revision to existing content, a contributor is presented with a summary of the fields they are proposing to modify. An independent editor must approve these revisions before they are displayed in the canonical CIViC results (the web interface and API).

  14. Screenshot of a complex evidence query
    Supplementary Fig. 13: Screenshot of a complex evidence query

    CIViC has an advanced search interface that currently supports complex queries for evidence records and variants. An arbitrary number of query conditions can be set and the query can be configured to match any one, or all of these conditions. Evidence records can be queried by sixteen variables including disease, variant name, publication ID, evidence type, evidence level, trust rating, curator name, etc. In the following screenshot, the advanced search interface is being used to retrieve all evidence records that correspond to variants involving the gene ALK, where the evidence type is 'Predictive', and the drug involved is alectinib. From this query, 13 evidence records are returned and sorted according to their quality level (evidence level, and trust rating). The standard CIViC evidence datagrid is used to display a summary of the 13 evidence records including: evidence identifier (EID), gene name, variant name, evidence statement (DESC), cancer type (DIS), drugs, evidence level (EL), evidence type (ET), evidence direction (ED), clinical significance (CS), variant origin (VO), and evidence trust rating (TR). The 'Help' button provides a comprehensive legend of all abbreviations, symbols, and colors used to encode information in the evidence record summary. Clicking any row will take the user to the comprehensive display for that evidence record. Every advanced search generates a unique URL that can be used to generate an updated result or easily share the result with a colleague. For example: https://civic.genome.wustl.edu/#/search/evidence/fbf0df08-0211-4e55-b4e7-d103d76d0b59.

  15. Screenshot of the source suggestion queue
    Supplementary Fig. 14: Screenshot of the source suggestion queue

    CIViC includes a “source suggestion queue”. This feature allows domain experts external to CIViC to quickly and easily add important publications (using PubMed ID) to a queue for later generation of evidence records by the curation team. In addition to the PubMed ID, an entry in the queue contains a free text field where submitters can add comments to help guide curation efforts related to each publication. Optional fields available when creating an entry for the publication queue are gene, variant, and disease. Action buttons allow curators to add new evidence records for each publication suggested (yellow), reject the suggestion (red), mark the suggestion as completed (green), or re-activate the source in the suggestion queue (grey).
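
The 'match any one, or all' behavior described for the advanced search interface (Supplementary Fig. 13) can be illustrated with a minimal sketch. This is not the CIViC search implementation: the function names, the field names and the placeholder records are all assumptions made for illustration, and only exact-equality conditions are modeled.

```python
from typing import Any, Dict, List, Tuple

Condition = Tuple[str, Any]  # (field name, required value)


def matches(record: Dict[str, Any], conditions: List[Condition], mode: str = "all") -> bool:
    """Return True if the record satisfies all (or any one) of the conditions."""
    if mode not in ("all", "any"):
        raise ValueError("mode must be 'all' or 'any'")
    combine = all if mode == "all" else any
    return combine(record.get(field) == value for field, value in conditions)


def search(records: List[Dict[str, Any]], conditions: List[Condition], mode: str = "all"):
    """Keep only the records that match under the chosen mode."""
    return [r for r in records if matches(r, conditions, mode)]


# Placeholder records; the field names are hypothetical, not CIViC's schema.
records = [
    {"id": 1, "gene": "ALK", "evidence_type": "Predictive", "drug": "alectinib"},
    {"id": 2, "gene": "ALK", "evidence_type": "Prognostic", "drug": None},
    {"id": 3, "gene": "GENE2", "evidence_type": "Predictive", "drug": "drug_x"},
]
conditions = [("gene", "ALK"), ("evidence_type", "Predictive"), ("drug", "alectinib")]

print([r["id"] for r in search(records, conditions, mode="all")])
print([r["id"] for r in search(records, conditions, mode="any")])
```

With an empty condition list, this sketch returns every record in 'all' mode and none in 'any' mode; a real interface would likely require at least one condition.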

Understanding of the genetic heterogeneity and mutational landscape underlying cancer has seen incredible advances in recent years. This has accelerated the implementation of precision medicine strategies in which clinicians and researchers target specific molecular variants with treatments tailored to the individual and their disease1. The biomedical literature describing such associations is large and growing rapidly. As a result, the interpretation of individual variants observed in patients has become a bottleneck in clinical sequencing workflows2. Many cancer hospitals and research centers are engaged in separate efforts to interpret cancer-driving variants and genes in the context of clinical relevance. These efforts are largely occurring within independent 'information silos', producing interpretations that require constant updates, lack community consensus and involve intense manual input.

Estimates of the proportion of patients with cancer who would benefit from comprehensive molecular profiling vary substantially3, in part because of the lack of both a community consensus definition of actionability and a comprehensive catalog of specific clinical variant interpretations. Achieving the goals of precision medicine will require this information to be centralized, freely accessible, openly debated and accurately interpreted for application in the clinic. Existing efforts to facilitate clinical interpretation of variants include the Gene Drug Knowledge Database4, the Database of Curated Mutations5, ClinVar6, ClinGen7, PharmGKB8, Cancer Driver Log9, My Cancer Genome10, Jax-Clinical Knowledgebase11, the Personalized Cancer Therapy Knowledgebase, the Precision Medicine Knowledgebase, the Cancer Genome Interpreter, OncoKB and others (Supplementary Table 1). These resources often have barriers to widespread adoption, including some combination of (i) no public access to content, (ii) restrictive content licenses, (iii) no public API, (iv) no bulk data download capabilities and (v) no mechanism for rapid improvement of the content (see Supplementary Table 1 for detailed feature comparison). To address these limitations, we present CIViC, an open-access, open-source knowledgebase for expert crowdsourcing of Clinical Interpretation of Variants in Cancer (http://civicdb.org/; Fig. 1).

Figure 1: Contribution of CIViC to the precision cancer treatment cycle.

The critical distinguishing features of the CIViC initiative, in comparison to several of the resources cited above, stem from its strong commitment to openness and transparency. We believe that these principles (Box 1) are necessary for widespread adoption of such a resource. The target audience of CIViC is deliberately broad, encompassing researchers, clinicians and patient advocates. CIViC is designed to encourage development of community consensus by leveraging an interdisciplinary, international team of experts collaborating remotely within a centralized curation interface. Variant interpretations are created with a high degree of transparency and detailed provenance. The interface is designed to help keep interpretations current and comprehensive, and to acknowledge the efforts of content creators (Supplementary Fig. 1). CIViC accepts public knowledge contributions but requires that experts review these submissions.

Box 1: CIViC principles

The manner in which the clinical relevance of variants in cancer is presented in the published literature is highly heterogeneous. To represent these data in a more easily interpretable and consistent fashion, the CIViC data model is highly structured and ontology driven (Supplementary Fig. 2). Clinical interpretations are captured and displayed as evidence records consisting of a freeform 'evidence statement' and several structured attributes. Each evidence record is associated with a specific gene, variant, disease and clinical action. Evidence records belong to one of four evidence types indicating whether a variant is predictive of response to therapy, prognostic, diagnostic and/or predisposing for cancer. Evidence records are also assigned to an evidence level ranging from established clinical utility (level A) to inferential (level E) evidence (Supplementary Figs. 3 and 4). The quality of the underlying published evidence is rated from one to five stars. As evidence records accumulate for a single variant, they are in turn synthesized into a human-readable 'variant summary' of the variant's overall significance in cancer. Variants can also be aggregated into 'variant groups' that share a clinical significance (for example, imatinib resistance). All variant types are supported (including structural variants, RNA fusions and other expression events) as well as all variant origins (somatic mutation, germline mutation and germline polymorphism). Genomic coordinates, transcript identifiers and variant synonyms are determined by curators, reviewed by editors and stored in a standardized format (for example, HGVS) for each variant. Additional variant information is imported through the MyVariant.info annotation API12, providing links to complementary resources and variant annotations such as ClinVar6, COSMIC13 and ExAC14. Each variant is associated with a single gene, and each gene view provides a curated 'gene summary' synthesizing all of the variants it contains. 
Additional gene information is imported through the MyGene.info annotation API12, allowing users to focus curation effort on clinical impact and not repeat the efforts of other databases. Integration of public ontologies and databases, such as Disease Ontology15 and Sequence Ontology16, allows CIViC's data to be formally structured (Supplementary Fig. 5) and integrated with other resources. This structure provides both computationally accessible information and human-interpretable content with the flexibility to capture key details for the wide range of variants and experiment types being interpreted (refer to the Supplementary Note for implementation details).
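
To make the structure of an evidence record concrete, the following is a minimal sketch of how the attributes described above could be represented. It is not the CIViC schema: the class and field names are assumptions chosen for illustration, and only the constraints stated in the text (four evidence types, levels A to E, a one-to-five star rating, a free-form statement) are encoded.

```python
from dataclasses import dataclass
from enum import Enum


class EvidenceType(Enum):
    PREDICTIVE = "Predictive"
    PROGNOSTIC = "Prognostic"
    DIAGNOSTIC = "Diagnostic"
    PREDISPOSING = "Predisposing"


class EvidenceLevel(Enum):
    A = "A"  # established clinical utility
    B = "B"
    C = "C"
    D = "D"
    E = "E"  # inferential


@dataclass(frozen=True)
class EvidenceRecord:
    """Hypothetical in-memory form of one evidence record (field names assumed)."""
    gene: str
    variant: str
    disease: str
    evidence_type: EvidenceType
    evidence_level: EvidenceLevel
    trust_rating: int        # quality of the underlying evidence, 1-5 stars
    statement: str           # free-form evidence statement
    source_pubmed_id: str    # the single publication supporting this record

    def __post_init__(self):
        if not 1 <= self.trust_rating <= 5:
            raise ValueError("trust rating must be between one and five stars")
        if not self.statement.strip():
            raise ValueError("evidence statement must not be empty")


# Placeholder values only; this is not a real CIViC record.
example = EvidenceRecord(
    gene="GENE1",
    variant="V123X",
    disease="example carcinoma",
    evidence_type=EvidenceType.PREDICTIVE,
    evidence_level=EvidenceLevel.B,
    trust_rating=4,
    statement="Placeholder summary of the clinical assertion made by the source.",
    source_pubmed_id="00000000",
)
print(example.evidence_type.value, example.evidence_level.value)
```

Using enumerations for the structured attributes mirrors the ontology-driven intent of the data model: a record cannot be created with an evidence type or level outside the defined vocabulary.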

CIViC currently contains 1,678 curated interpretations of clinical relevance for 713 variants affecting 283 genes (Supplementary Fig. 6). These interpretations were curated from 1,077 published studies by 58 CIViC curators. CIViC evidence records are supported by a wide range of evidence levels and trust ratings, currently biased toward somatic alterations and positive associations with treatment response (Supplementary Fig. 7). At least one evidence record has been created for 209 cancer subtypes and 291 drugs, with some bias toward lung, breast, hematologic, colorectal and skin cancers and associated targeted therapies (Supplementary Fig. 8). Supporting publications for these interpretations come from a large number of journals, primarily over the last five years, and tend to provide just one or two evidence records each (Supplementary Fig. 9). From the public launch of CIViC in June 2015 to December 2016, external curators (not affiliated with Washington University) contributed 46.7% of the evidence statements within the knowledgebase (Supplementary Fig. 6b). Thus far, submissions, revisions, comments and expert reviews have produced 11,254 distinct curation actions. These numbers continue to grow. More than 16,000 users have accessed CIViC interpretations from academic, governmental and commercial institutions around the world (Supplementary Fig. 6c,d). Early adopters of CIViC include leaders in developing cancer genomics pipelines17, the UCSC Genome Browser18 and Agilent's Cartagenia Bench Lab NGS. Early curation and content partners include the Gene Drug Knowledgebase4 and the Personalized Oncogenomics Program19. The CIViC resource is freely accessible without login, and no fees or exclusive access will be introduced in the future. Both academic and commercial adoption is free and encouraged. 
The variant and gene summaries, with additional statistics summarizing the level of supporting evidence in CIViC, can be automatically incorporated into clinical reports using the CIViC API or bulk data releases (updated nightly, with stable monthly releases) (Fig. 1). The source code for the CIViC website and public API are released under an open-source license (the Massachusetts Institute of Technology or 'MIT' license), and all curated content within CIViC is released under an open-access license (the Creative Commons Public Domain Dedication or 'CC0' license). The unencumbered availability of the CIViC bulk data releases, lack of requirements to establish a licensing agreement, the well-documented public API, and use of a structured data model and ontologies allow rapid adoption of CIViC in clinical workflows. As the user base grows, the number of experts with a vested interest in the content will increase, driving community engagement and increasing curation from external users.
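
As a rough illustration of the kind of statistics a clinical report might derive from evidence records obtained through the API or a bulk data release, the sketch below summarizes the accepted records for one variant. The dictionary keys ('status', 'evidence_level', 'trust_rating') and the choice of statistics are assumptions for this example, not the format of the CIViC API or data releases.

```python
from collections import Counter

LEVEL_ORDER = "ABCDE"  # A = established clinical utility ... E = inferential


def summarize_evidence(records):
    """Summarize accepted evidence records for a single variant.

    `records` is an iterable of dicts with the hypothetical keys
    'status', 'evidence_level' and 'trust_rating'.
    """
    accepted = [r for r in records if r["status"] == "accepted"]
    if not accepted:
        return {"n_accepted": 0, "by_level": {}, "best_level": None,
                "mean_trust_rating": None}
    by_level = Counter(r["evidence_level"] for r in accepted)
    # The best level is the one closest to 'A' among the accepted records.
    best_level = min(by_level, key=LEVEL_ORDER.index)
    mean_rating = sum(r["trust_rating"] for r in accepted) / len(accepted)
    return {
        "n_accepted": len(accepted),
        "by_level": dict(sorted(by_level.items())),
        "best_level": best_level,
        "mean_trust_rating": round(mean_rating, 2),
    }


# Placeholder records for illustration only.
sample = [
    {"status": "accepted", "evidence_level": "B", "trust_rating": 4},
    {"status": "accepted", "evidence_level": "D", "trust_rating": 2},
    {"status": "rejected", "evidence_level": "A", "trust_rating": 5},
    {"status": "submitted", "evidence_level": "C", "trust_rating": 3},
]
print(summarize_evidence(sample))
```

Restricting the summary to accepted records is a design choice of this sketch; a reporting pipeline could equally decide to show submitted records separately, flagged as pending review.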

A critical concern, as CIViC content expands, is the maintenance of high-quality data and the inherent tradeoff between data quality and rapid or automated updating. The curation workflow of CIViC (Supplementary Fig. 10) requires agreement between at least two independent contributors before acceptance of new evidence or revisions of existing content (Supplementary Figs. 11 and 12). At least one of these users must be an expert editor, and editors are barred from approving their own contributions. CIViC includes features such as typeahead suggestions (recommendations that appear as soon as you start to type), automatic warning of possible duplicates, detailed documentation in all entry forms, and input validation to encourage high-quality data entry. To facilitate team curation efforts (Supplementary Fig. 10), the CIViC interface also includes features such as subscriptions, notifications and mentions. Curators can also use an advanced search interface to generate and share complex queries of CIViC data that help guide curation effort and content consumption (Supplementary Fig. 13). Many of these features were inspired by the 'best practices' of active online collaborative research and software development platforms, including BioStars20 and GitHub.
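
The two-contributor rule described above can be expressed as a small sketch. This is an illustration of the stated policy (pending submissions, review by an editor, no self-approval), not the CIViC server code; the class names, role strings and status values are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Administrators are described as having the abilities of editors.
ROLES_THAT_CAN_REVIEW = {"editor", "administrator"}


@dataclass
class User:
    name: str
    role: str  # "curator", "editor" or "administrator"


@dataclass
class Submission:
    author: User
    status: str = "pending"  # "pending", "accepted" or "rejected"
    reviewed_by: Optional[User] = None


def review(submission: Submission, reviewer: User, approve: bool) -> Submission:
    """Accept or reject a submission, enforcing the two-contributor rule."""
    if reviewer.role not in ROLES_THAT_CAN_REVIEW:
        raise PermissionError("only editors may approve or reject content")
    if reviewer.name == submission.author.name:
        raise PermissionError("editors may not review their own contributions")
    submission.status = "accepted" if approve else "rejected"
    submission.reviewed_by = reviewer
    return submission


# Hypothetical users for illustration.
curator = User("curator_one", "curator")
editor = User("editor_one", "editor")

new_record = Submission(author=curator)
review(new_record, editor, approve=True)
print(new_record.status)
```

Because the check compares the reviewer with the author, an editor's own submission stays pending until a different editor acts on it, which is what guarantees that accepted content reflects at least two contributors.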

A major challenge to the success of CIViC is the scope and complexity of the knowledge that needs to be summarized, and the development of strategies to assess the completeness of the resource. The American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) recently reported on the variability among nine laboratories in clinical interpretations of germline variants relevant to Mendelian diseases21, a field where the ACMG–AMP have proposed detailed standards and guidelines for variant classification22. This report identified a low rate of interpretation agreement between laboratories (34% concordance). However, discussion and review of criteria were able to more than double this concordance, demonstrating the need for and success of open discourse in clinical variant interpretation21. Recently, the Somatic Working Group (WG) of the Clinical Genome Resource (ClinGen) has published a consensus set of minimal variant-level data (MVLD) to help standardize data elements needed for curation of the clinical utility of somatic cancer variants23. At present, cancer variant interpretation efforts that nominally have the same goals show a remarkably low overlap in source publications cited for these interpretations (1.6–71.6%, but generally less than 25%; Supplementary Table 2). This suggests that no single effort has comprehensively identified or summarized even the most relevant literature in this area, further illustrating the high curation burden involved. Conversely, these small overlaps emphasize the importance of reducing duplication of effort moving forward, especially considering the vastness of the existing literature and its tremendous growth rate. In CIViC, curation efforts thus far have focused on variants relevant to cancer types of particular interest at our center (for example, acute myeloid leukemia, breast cancer and lung cancer; Supplementary Fig. 8b), on variants identified as high priority by early CIViC partners4,19 and on variants targeted by proof-of-principle precision medicine 'basket' clinical trials such as NCI-MATCH (also known as EAY131 or NCT02465060). Our ability to provide expertise in these areas is complemented by the expert knowledge of other groups and organizations, making CIViC a more comprehensive resource than would be possible with a 'siloed data' approach. To this end, recruitment of external contributors and domain experts from multiple fields is a top priority. This is accomplished in part through planning of CIViC-sponsored events in the cancer research and treatment community. We also allow for different levels of external community involvement, including submission of suggested publications to a queue to guide others to generate new evidence records (Supplementary Fig. 14).
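
The overlap in source publications between interpretation efforts, discussed above, can be computed in more than one way, and the text does not state the exact definition used for Supplementary Table 2. As one plausible sketch, the code below assumes a directional definition: the percentage of one resource's cited publications that also appear in another resource. The resource names and publication identifiers are placeholders.

```python
from itertools import permutations


def percent_overlap(pubs_a, pubs_b):
    """Percentage of publications cited by A that are also cited by B."""
    a, b = set(pubs_a), set(pubs_b)
    if not a:
        return 0.0
    return 100.0 * len(a & b) / len(a)


def pairwise_overlap(resources):
    """Directional overlap for every ordered pair of resources.

    `resources` maps a resource name to a collection of publication IDs.
    """
    return {
        (x, y): percent_overlap(resources[x], resources[y])
        for x, y in permutations(resources, 2)
    }


# Placeholder identifiers; these are not real citation lists.
resources = {
    "kb_one": {"pmid1", "pmid2", "pmid3", "pmid4"},
    "kb_two": {"pmid4", "pmid5"},
}
for pair, pct in pairwise_overlap(resources).items():
    print(pair, f"{pct:.1f}%")
```

A directional measure gives different values for each ordering of a pair when the resources differ in size, so any comparison should state which definition was applied.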

Additional challenges faced by CIViC include long-term sustainability of funding, ensuring broad clinical engagement and maintaining the enthusiasm for the crowdsourcing efforts. We are addressing each of these challenges by engagement with other resources, experts and funding agencies with track records of long-term maintenance of informatics resources (see the Supplementary Note for further discussion). To facilitate such engagement and seek broad input and external guidance for our efforts, we have recently established a Variant Interpretation for Cancer Consortium (VICC) as part of the Global Alliance for Genomics and Health (GA4GH). We have also established a panel of clinical domain experts to provide independent guidance on development of the resource and to assess the completeness and accuracy of our variant curation efforts.

CIViC is designed to address many of the challenges of cancer variant interpretation. To our knowledge, CIViC is the only variant interpretation effort currently capable of leveraging community experts and additionally has the most open model (open-access content, open-source code and an open API). We believe that this open strategy represents a sustainable model for achieving current, standardized and comprehensive interpretations of the clinical relevance of cancer variants. As the community of contributors grows, an increased incentive will emerge to help keep CIViC updated with cutting-edge clinical trial and US Food and Drug Administration (FDA) investigational new drug (IND) findings. As we have created a comprehensive and modern API, centers can rapidly integrate CIViC into automated clinical report generation for gene panel, exome, whole-genome and RNA sequencing of tumor samples. While there are many challenges faced by an effort such as this one, we hope that, with input from a critical mass of interested parties, these challenges can be largely overcome. We invite all researchers, healthcare providers and patient advocates engaged in clinical interpretation of variants to join the community at CIViC (http://civicdb.org/).

URLs. The Clinical Interpretation of Variants in Cancer resource described by this work is available at http://www.civicdb.org/. Personalized Cancer Therapy Knowledgebase, https://pct.mdanderson.org/; Precision Medicine Knowledgebase, https://pmkb.weill.cornell.edu/; Cancer Genome Interpreter, https://www.cancergenomeinterpreter.org/; OncoKB, http://oncokb.org/; GitHub, https://github.com/; GA4GH Variant Interpretation for Cancer Consortium (VICC), http://ga4gh.org/#/vicc; MIT license, https://opensource.org/licenses/MIT; CC0 license, https://creativecommons.org/publicdomain/zero/1.0/.

References

  1. Collins, F.S. & Varmus, H. A new initiative on precision medicine. N. Engl. J. Med. 372, 793–795 (2015).
  2. Good, B.M., Ainscough, B.J., McMichael, J.F., Su, A.I. & Griffith, O.L. Organizing knowledge to enable personalization of medicine in cancer. Genome Biol. 15, 438 (2014).
  3. Meric-Bernstam, F. et al. Feasibility of large-scale genomic testing to facilitate enrollment onto genomically matched clinical trials. J. Clin. Oncol. 33, 2753–2762 (2015).
  4. Dienstmann, R., Jang, I.S., Bot, B., Friend, S. & Guinney, J. Database of genomic biomarkers for cancer drugs and clinical targetability in solid tumors. Cancer Discov. 5, 118–123 (2015).
  5. Ainscough, B.J. et al. DoCM: a database of curated mutations in cancer. Nat. Methods 13, 806–807 (2016).
  6. Landrum, M.J. et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44 D1, D862–D868 (2016).
  7. Rehm, H.L. et al. ClinGen—the Clinical Genome Resource. N. Engl. J. Med. 372, 2235–2242 (2015).
  8. Thorn, C.F., Klein, T.E. & Altman, R.B. PharmGKB: the Pharmacogenomics Knowledge Base. Methods Mol. Biol. 1015, 311–320 (2013).
  9. Damodaran, S. et al. Cancer Driver Log (CanDL): catalog of potentially actionable cancer mutations. J. Mol. Diagn. 17, 554–559 (2015).
  10. Yeh, P. et al. DNA-Mutation Inventory to Refine and Enhance Cancer Treatment (DIRECT): a catalog of clinically relevant cancer mutations to enable genome-directed anticancer therapy. Clin. Cancer Res. 19, 1894–1901 (2013).
  11. Patterson, S.E. et al. The clinical trial landscape in oncology and connectivity of somatic mutational profiles to targeted therapies. Hum. Genomics 10, 4 (2016).
  12. Xin, J. et al. High-performance web services for querying gene and variant annotation. Genome Biol. 17, 91 (2016).
  13. Forbes, S.A. et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 45, D777–D783 (2017).
  14. Karczewski, K.J. et al. The ExAC browser: displaying reference data information from over 60 000 exomes. Nucleic Acids Res. 45, D840–D845 (2017).
  15. Kibbe, W.A. et al. Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data. Nucleic Acids Res. 43, D1071–D1078 (2015).
  16. Eilbeck, K. et al. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol. 6, R44 (2005).
  17. Gagan, J. & Van Allen, E.M. Next-generation sequencing to guide cancer therapy. Genome Med. 7, 80 (2015).
  18. Speir, M.L. et al. The UCSC Genome Browser database: 2016 update. Nucleic Acids Res. 44 D1, D717–D725 (2016).
  19. Laskin, J. et al. Lessons learned from the application of whole-genome analysis to the treatment of patients with advanced cancers. Cold Spring Harb. Mol. Case Stud. 1, a000570 (2015).
  20. Parnell, L.D. et al. BioStar: an online question & answer resource for the bioinformatics community. PLOS Comput. Biol. 7, e1002216 (2011).
  21. Amendola, L.M. et al. Performance of ACMG-AMP Variant-Interpretation Guidelines among nine laboratories in the Clinical Sequencing Exploratory Research Consortium. Am. J. Hum. Genet. 98, 1067–1076 (2016).
  22. Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).
  23. Ritter, D.I. et al. Somatic cancer variant curation and harmonization through consensus minimum variant level data. Genome Med. 8, 117 (2016).

Acknowledgments

First and foremost, we are grateful to the community of curators, editors, domain experts and users who make CIViC possible. CIViC is supported by the National Cancer Institute (NCI) of the National Institutes of Health (NIH) under award number U01CA209936 to O.L.G. (with M.G. and E.R.M. as co-principal investigators). This work was also supported by a grant to R.K.W. from the National Human Genome Research Institute (NHGRI) of the NIH under award number U54HG003079. D.T.R. was supported by the German Federal Ministry of Education and Research under award numbers 031L0030E and 031L0023B. The Institute for Research in Biomedicine is a recipient of a Severo Ochoa Centre of Excellence Award from MINECO (Government of Spain) and is supported by CERCA (Generalitat de Catalunya). D.T. is funded by a grant from the Spanish Ministry of Economy and Competitiveness and FEDER/UE (SAF2015-74072-JIN). M.G. is supported by the NHGRI under award number K99HG007940. O.L.G. is supported by the NCI under award number K22CA188163. L.D.W. is supported by the NCI under award number K08CA166229. The content of this manuscript is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Author information

  1. These authors contributed equally to this work.

    • Malachi Griffith,
    • Nicholas C Spies &
    • Kilannin Krysiak

Affiliations

  1. McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri, USA.

    • Malachi Griffith,
    • Nicholas C Spies,
    • Kilannin Krysiak,
    • Joshua F McMichael,
    • Adam C Coffman,
    • Arpad M Danos,
    • Benjamin J Ainscough,
    • Cody A Ramirez,
    • Lynzey Kujan,
    • Erica K Barnell,
    • Alex H Wagner,
    • Zachary L Skidmore,
    • Amber Wollam,
    • Connor J Liu,
    • Rachel L Bilski,
    • Robert Lesurf,
    • Yan-Yang Feng,
    • Nakul M Shah,
    • Lee Trani,
    • Matthew Matlock,
    • Avinash Ramu,
    • Katie M Campbell,
    • Gregory C Spies,
    • Aaron P Graubert,
    • James M Eldred,
    • David E Larson,
    • Jason R Walker,
    • David H Spencer,
    • Lukas D Wartman,
    • Richard K Wilson,
    • Elaine R Mardis &
    • Obi L Griffith
  2. Siteman Cancer Center, Washington University School of Medicine, St. Louis, Missouri, USA.

    • Malachi Griffith,
    • Benjamin J Ainscough,
    • Alex H Wagner,
    • Lukas D Wartman,
    • Richard K Wilson,
    • Elaine R Mardis &
    • Obi L Griffith
  3. Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, USA.

    • Malachi Griffith,
    • Benjamin J Ainscough,
    • David E Larson,
    • Richard K Wilson,
    • Elaine R Mardis &
    • Obi L Griffith
  4. Department of Medicine, Washington University School of Medicine, St. Louis, Missouri, USA.

    • Malachi Griffith,
    • Kilannin Krysiak,
    • Ron Bose,
    • David H Spencer,
    • Lukas D Wartman,
    • Richard K Wilson,
    • Elaine R Mardis &
    • Obi L Griffith
  5. Charité Comprehensive Cancer Center, Charité Universitätsmedizin, Berlin, Germany.

    • Damian T Rieke
  6. Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, Canada.

    • Martin R Jones,
    • Melika Bonakdar &
    • Steven J M Jones
  7. Department of Molecular and Experimental Medicine, Scripps Research Institute, La Jolla, California, USA.

    • Karthik Gangavarapu,
    • Benjamin M Good,
    • Chunlei Wu &
    • Andrew I Su
  8. Oncology Data Science Group, Vall d'Hebron Institute of Oncology, Barcelona, Spain.

    • Rodrigo Dienstmann
  9. Computational Biology, Oregon Health and Science University, Portland, Oregon, USA.

    • Adam A Margolin
  10. Institute for Research in Biomedicine, Barcelona Institute of Science and Technology, Barcelona, Spain.

    • David Tamborero &
    • Nuria Lopez-Bigas

Contributions

M.G., N.C.S., K.K. and O.L.G. wrote the paper with input from A.M.D., E.K.B., E.R.M., A.H.W., R.L., A.C.C., J.F.M., B.J.A., Z.L.S., C.A.R., C.J.L., D.T.R. and G.C.S. A.C.C. led the back-end code development with contributions from N.C.S., J.F.M., M.G., O.L.G., K.K., G.C.S. and K.G. J.F.M. led the front-end code development with contributions from A.P.G., A.C.C., K.K., M.G., O.L.G. and N.C.S. B.M.G., A.I.S., A.A.M., D.T., N.L.-B. and S.J.M.J. contributed ideas relating to crowdsourcing functionality and integration with existing community resources. J.M.E., D.E.L., C.W. and J.R.W. contributed software engineering expertise. Substantial curation efforts were contributed by M.G., N.C.S., K.K., A.M.D., B.J.A., C.A.R., D.T.R., L.K., E.K.B., A.H.W., Z.L.S., A.W., C.J.L., M.R.J., R.L.B., R.L., Y.-Y.F., N.M.S., M.B., L.T., M.M., A.R., K.M.C., R.D., R.B., D.H.S., L.D.W., E.R.M. and O.L.G. Guidance on developing CIViC for clinical applications was provided by R.D., R.B., D.H.S. and L.D.W. Trainee supervision and project leadership were provided by M.G., R.K.W., E.R.M. and O.L.G.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Figures

  1. Supplementary Figure 1: CIViC interface overview (644 KB)

    The user-friendly CIViC interface is the primary point of contact with users whether they are consuming, editing or adding content. CIViC user-curated content (blue boxes) is visible without sign in and provides the bulk of visible content ordered from gene level (top) to variant level (middle), and finally individual evidence records (bottom). Curated content is enhanced by imported content and citations (orange boxes) that are linked directly to their original source. Website navigation and extensive documentation are highlighted with red boxes. Finally, a curator can interact (green boxes) with CIViC user-curated content by 1) suggesting changes (edit button) or adding content; 2) commenting on content or suggested revisions; 3) downloading content; or 4) viewing their activity, suggested changes, notifications, or profile.

  2. Supplementary Figure 2: The CIViC data model (561 KB)

    Key elements of the CIViC data model are listed below. Briefly, CIViC aims to provide gene and variant level executive summaries of the clinical relevance of specific variants. Multiple structured evidence records are first created and then synthesized to produce these executive variant/gene summaries. Each evidence record is associated with a specific variant and gene. Each evidence record also corresponds to a single clinical assertion for a single cancer type from a single peer-reviewed publication. One publication can be used to generate multiple evidence records. The evidence record consists of a free-form, human readable statement and several structured elements. The statement consists of a few sentences written by a curator to summarize the clinical relevance of a variant according to evidence described in a particular publication. The curator attempts to concisely summarize the clinical assertion being made by the publication, as well as the nature of the evidence supporting that assertion and any caveats the reader should be aware of. The curator must also assign values for each structured element by evaluating details from the publication. These elements include evidence type, clinical significance, evidence direction, and others. Where possible, structured ontologies are used in the CIViC data model (e.g. the disease ontology for disease names). Dark blue boxes refer to primary CIViC entities and light blue boxes refer to external data.

  3. Supplementary Figure 3: Evidence level definitions and examples (633 KB)

    Evidence levels defined in the CIViC data model are summarized below. Evidence levels are ordered A-E according to clinical utility (likelihood of relevance to a clinician reading a molecular report). A brief definition of each evidence level is provided along with an example obtained from www.civicdb.org. Updates to the CIViC data model (including to these evidence levels) will be maintained in the CIViC online documentation (https://civic.genome.wustl.edu/#/help/evidence). Additional examples of evidence records assigned to each evidence level can be obtained using the advanced search interface online: https://civic.genome.wustl.edu/#/search/evidence/.

  4. Supplementary Figure 4: CIViC evidence classes and their relative potential to influence clinical actions and understanding of disease (256 KB)

    The following diagram attempts to order each combination of evidence level (A-E) and evidence type (predictive, prognostic, diagnostic, or predisposing) according to its potential clinical relevance and actionability. 'Clinical relevance' refers to the contribution of the variant to clinical understanding of the disease and 'actionability' refers to the ability to identify a specific clinical action for a specific variant. In this assessment, validated predictive variants tend to be the most relevant and actionable, while inferential diagnostic variants are the least relevant. In general, higher evidence levels are more actionable, and predictive assertions exceed prognostic and diagnostic evidence in clinical utility. While CIViC is designed to capture both supporting (positive) and refuting (negative) evidence, the following is an assessment of the likely utility of supporting evidence only.

  5. Supplementary Figure 5: CIViC database schema (253 KB)

    A simplified schema representing the CIViC data model below provides all table names of the CIViC relational database (running on PostgreSQL). Polymorphic associations are used to relate core domain objects such as evidence records, genes, and variants to the tables that power on-site workflows like moderation and discussion. This allows for a significant reduction in the total number of tables required at the expense of database enforced foreign key constraints. In lieu of traditional foreign keys, validations in the application's business logic are used to enforce data integrity. Solid lines in the diagram indicate direct relationships in the database implemented by a local foreign key (for example, a variant has an evidence record identifier in the variants table, and thus a direct relationship). Dotted lines indicate relationships that exist indirectly (the relationship goes through an intermediate event with some conditions attached to it). For a complete schema including all fields and foreign key relationships, refer to the CIViC backend code repository: https://github.com/genome/civic-server.

  6. Supplementary Figure 6: Usage statistics and growth of content (352 KB)

    A) CIViC content as of December 2016. B) Tracking of evidence statements within CIViC over time with respective contributions of internal (Washington University, 'WashU') and external (community) curation. C) Treemap with box size illustrating the relative number of visits (sessions) to the CIViC website www.civicdb.org from specific external organizations and colored by the average session duration (in seconds). Sessions from our own institute are excluded from this summary. D) Map illustrating the location where sessions originated. The size of each circle indicates the amount of traffic from that city. Dark blue indicates visits from a dense cluster of cities that are close to each other. To date, CIViC has achieved 39,881 visits from 16,484 unique visitors from 2,507 cities in 125 countries around the world.

  7. Supplementary Figure 7: Summary of current CIViC evidence records (282 KB)

    The following panels briefly summarize CIViC evidence records at the time of publication. A) Total publications used in 1,703 evidence records, broken down by review status of the evidence record. Panels B-F further summarize these evidence records after excluding those that had a 'rejected' status (leaving 1,678 submitted or accepted evidence records). B) Evidence records broken down by evidence type and clinical significance. C) Evidence records broken down by evidence direction. D) Evidence records broken down by evidence trust rating. E) Evidence records broken down by evidence level. F) Evidence records broken down by variant origin.

  8. Supplementary Figure 8: Summary of the most curated drugs and diseases in CIViC (261 KB)

    A summary of the drugs and diseases represented in CIViC evidence records ranked by the number of evidence records associated with each. A) The top 25 drugs were identified from 1,105 accepted or submitted evidence records of the predictive evidence type. The evidence records for these drugs are broken down by evidence level (left panel) and clinical significance (right panel). B) The top 25 cancer types (distinct disease ontology terms) were identified from all 1,678 accepted or submitted evidence records. The evidence records for these diseases are broken down by evidence level (left panel) and evidence type (right panel).

  9. Supplementary Figure 9: CIViC evidence records summarized by literature sources (262 KB)

    The published literature used to create all CIViC evidence records is summarized below. A total of 1,678 accepted or submitted evidence records were derived from 1,077 peer-reviewed publications. A) A histogram summarizing articles used in CIViC evidence records broken down by year of publication (and further divided according to their open versus closed access status). B) A histogram showing the distribution of the number of evidence records obtained from single publications. Most publications yield only a single evidence record, but as many as 38 have been obtained from a single paper. C) Evidence records obtained from the top 25 journals most commonly mined in CIViC are summarized and broken down by evidence star rating on the left. The same evidence records are broken down by the evidence type on the right.

  10. Supplementary Figure 10: The collaborative process and user roles in creating evidence (210 KB)

    CIViC is an online web resource whose target audience is an international community of cancer researchers, clinicians, and patient advocates. Participants in CIViC fall into various categories with increasing privileges or capabilities in the interface. The first category and most basic level of user is that of 'consumer'. Consumers may view, download and programmatically (via API) access all of the content of CIViC under the terms of the Creative Commons Public Domain Dedication license (CC0). No login is required to use CIViC. No login requirement, fees, or other encumbrances will be introduced in future versions of CIViC. Consumers may not add, approve, edit, or discuss revisions of content in CIViC. The second category of users includes all those roles that do permit modification and discussion on the site: 'curators', 'editors', and 'administrators'. 'Curators' may add new evidence records describing clinical relevance of variants, add or improve variant/gene summaries, and discuss existing content. While comments/discussion are automatically accepted, additions and revisions to existing content are initially entered in a pending state and must be approved prior to acceptance in CIViC. Rejected content is not deleted and may be revived after further discussion and revision. Editors have the additional capability to approve or reject additions and revisions of content. However, an editor cannot approve their own submissions or revisions, meaning that all content in CIViC must be created in collaboration between at least two members of the community. Editors are selected by a committee of existing editors, based on direct knowledge of the editor's expertise or by promotion from curator after demonstrating extensive high quality contributions to CIViC. Finally, administrators have the abilities of editors but may also change user roles and use advanced site management utilities (e.g. merging duplicate records).

  11. Supplementary Figure 11: Screenshot of the editor view for a submitted evidence record (511 KB)

    Every new evidence record and any revision of existing content in CIViC must be approved by at least one independent editor prior to acceptance. The following screenshot shows a new evidence record submitted by a curator that is awaiting review by an editor. The following URL will display the live version of this example: https://civic.genome.wustl.edu/links/variants/34

  12. Supplementary Figure 12: Screenshot of the editor view for a pending revision (536 KB)

    After proposing a revision to existing content, a contributor is presented with a summary of the fields they are proposing to modify. An independent editor must approve these revisions before they are displayed in the canonical CIViC results (the web interface and API).

  13. Supplementary Figure 13: Screenshot of a complex evidence query (346 KB)

    CIViC has an advanced search interface that currently supports complex queries for evidence records and variants. An arbitrary number of query conditions can be set and the query can be configured to match any one, or all of these conditions. Evidence records can be queried by sixteen variables including disease, variant name, publication ID, evidence type, evidence level, trust rating, curator name, etc. In the following screenshot, the advanced search interface is being used to retrieve all evidence records that correspond to variants involving the gene ALK, where the evidence type is 'Predictive', and the drug involved is alectinib. From this query, 13 evidence records are returned and sorted according to their quality level (evidence level, and trust rating). The standard CIViC evidence datagrid is used to display a summary of the 13 evidence records including: evidence identifier (EID), gene name, variant name, evidence statement (DESC), cancer type (DIS), drugs, evidence level (EL), evidence type (ET), evidence direction (ED), clinical significance (CS), variant origin (VO), and evidence trust rating (TR). The 'Help' button provides a comprehensive legend of all abbreviations, symbols, and colors used to encode information in the evidence record summary. Clicking any row will take the user to the comprehensive display for that evidence record. Every advanced search generates a unique URL that can be used to generate an updated result or easily share the result with a colleague. For example: https://civic.genome.wustl.edu/#/search/evidence/fbf0df08-0211-4e55-b4e7-d103d76d0b59.

  14. Supplementary Figure 14: Screenshot of the source suggestion queue (428 KB)

    CIViC includes a “source suggestion queue”. This feature allows CIViC external domain experts to quickly and easily add important publications (using PubMed ID) to a queue for later generation of evidence records by the curation team. In addition to PubMed ID, an entry to the queue contains a free text field where submitters can add comments to help guide curation efforts related to each publication. Optional fields available when creating an entry for the publication queue are gene, variant, and disease. Action buttons allow curators to add new evidence records for each publication suggested (yellow), reject the suggestion (red), mark the suggestion as completed (green), or re-activate the source in the suggestion queue (grey).
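
The structured evidence record (Supplementary Figure 2), the independent-editor rule (Supplementary Figures 10 and 11) and the any/all search with quality-based sorting (Supplementary Figure 13) might be modeled roughly as in the Python sketch below. This is an illustration under stated assumptions, not the CIViC implementation: the class, field and function names are hypothetical, the integer trust rating is an assumed encoding of the star rating, and the example records are invented.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple

LEVELS = "ABCDE"  # evidence levels, with A ranked first


@dataclass
class EvidenceRecord:
    # Hypothetical field names; only a subset of the structured elements.
    eid: int
    gene: str
    variant: str
    disease: str
    evidence_type: str        # e.g. "Predictive", "Prognostic"
    evidence_level: str       # "A" to "E"
    trust_rating: int         # star rating; integer encoding is assumed
    drugs: Tuple[str, ...] = ()
    submitted_by: str = ""
    status: str = "submitted"  # "submitted", "accepted" or "rejected"


def approve(record: EvidenceRecord, editor: str, editors: Set[str]) -> None:
    """Accept a pending record; editors may not approve their own work."""
    if editor not in editors:
        raise PermissionError(f"{editor} is not an editor")
    if editor == record.submitted_by:
        raise PermissionError("an editor cannot approve their own submission")
    record.status = "accepted"


Condition = Callable[[EvidenceRecord], bool]


def search(records: Iterable[EvidenceRecord],
           conditions: List[Condition],
           match_all: bool = True) -> List[EvidenceRecord]:
    """Match all (or any) conditions, then sort by level and trust rating."""
    combine = all if match_all else any
    hits = [r for r in records if combine(cond(r) for cond in conditions)]
    return sorted(
        hits, key=lambda r: (LEVELS.index(r.evidence_level), -r.trust_rating)
    )


# Invented records for illustration only; not taken from CIViC.
records = [
    EvidenceRecord(1, "ALK", "variant_1", "lung cancer", "Predictive", "B", 3,
                   ("alectinib",), "curator_1"),
    EvidenceRecord(2, "ALK", "variant_2", "lung cancer", "Predictive", "A", 4,
                   ("alectinib",), "curator_2"),
    EvidenceRecord(3, "ALK", "variant_1", "lung cancer", "Prognostic", "B", 5,
                   (), "curator_1"),
    EvidenceRecord(4, "GENE_X", "variant_3", "breast cancer", "Predictive",
                   "A", 5, ("drug_y",), "curator_2"),
]

hits = search(records, [
    lambda r: r.gene == "ALK",
    lambda r: r.evidence_type == "Predictive",
    lambda r: "alectinib" in r.drugs,
])
print([r.eid for r in hits])

# A second community member with editor rights accepts record 1.
approve(records[0], editor="editor_1", editors={"editor_1", "curator_2"})
print(records[0].status)
```

Keeping the self-approval check inside `approve` mirrors the requirement that every accepted item involve at least two members of the community.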

PDF files

  1. Supplementary Figures and Text (7 MB)

    Supplementary Figures 1–14 and Supplementary Note

Excel files

  1. Supplementary Table 1 (14 KB)

    Related resources.

  2. Supplementary Table 2 (36 KB)

    Literature covered by CIViC as compared to related resources.
