Introduction

Many groups have published germ-line variant interpretation/classification systems for use in clinical laboratory reporting for inherited conditions or cancer predisposition disorders. Available resources include recommendations from the American College of Medical Genetics and Genomics (ACMG),1 the International Agency for Research on Cancer,2 and others.3,4,5 The focus of these germ-line variant classification systems is to assign pathogenicity to variants to distinguish known or likely pathogenic variants from benign ones. A clear classification system for reporting variants, particularly if used across multiple laboratories, simplifies variant reporting and highlights significant findings in a potentially complex genetic test report for laboratory clients. Such a classification system may also help with triaging and reporting clinical next-generation sequencing (NGS) data.

The ACMG recommendations for germ-line variants1 are useful for stratifying their pathogenicity and functional significance. Additionally, the ACMG’s 2013 publication on reporting incidental findings from clinical sequencing efforts provides guidance on reporting potentially clinically significant germ-line variants detected secondarily while testing for a different indication.6 We identified additional studies addressing whether a constitutional sequence variation is likely a benign polymorphism or a functionally deleterious variant.2,3,4 Classification and interpretation systems for somatic variants detected via molecular profiling of cancer and other diseases have not yet been established. We conducted a comprehensive search for categorization systems for somatic variants in cancer in PubMed and Google Scholar using various combinations of the following search terms: somatic, cancer, variant, sequence variant, classification, category, categorization, reporting, and guidelines. A limited number of relevant articles were identified, with most pertaining to germ-line,1,2,3,4,5 rather than somatic, findings in cancer. We found only one relevant recent publication on somatic variant classification (SVC), which describes a system of prioritization and of assigning a level of evidence for clinical actions given a somatic variant found in whole exome sequencing.7 Our work complements this prior publication by providing a framework for interpreting variants after they have gone through such a prioritization.

Molecular profiling in cancer aims to identify tumor DNA variants that can confer diagnostic, prognostic, or treatment-related information to guide patient management. For example, testing of lung adenocarcinomas for variants in EGFR exons 18–21 is recommended8 to identify patients with sensitivity or emerging resistance to epidermal growth factor receptor (EGFR) tyrosine kinase inhibitors (TKIs). Testing for RAS variants is recommended to optimize patient management in colorectal cancers.9,10 Testing guidelines also recommend profiling melanoma patients for BRAF V600 variants11 to identify patients who benefit from BRAF TKIs, such as vemurafenib or dabrafenib.12

Recent advances in NGS have greatly enhanced the feasibility of routine molecular testing of solid tumors, which was previously constrained by limitations of sequential single-gene testing on limited patient samples. NGS permits parallel profiling of multiple genes from the same sample simultaneously, thus reducing patient wait time while increasing the quantity of information resulting from the test.13,14

Given the increasing clinical utility of molecular profiling in cancer, there is a significant need for a standardized variant interpretation and classification system for somatic variants identified in solid tumors. Such a system should be applicable to multiple types of genetic variation, including single nucleotide variations, indels/frameshifts, copy-number changes, and structural changes across multiple tumor types. The goals of somatic variant interpretation and classification differ from those for germ-line variants in that somatic variants are assessed for diagnostic, prognostic, predictive, and/or therapeutic impact in the context of tumor site and histology.

We present a five-category SVC system based on assessment of the key factors necessary to interpret the significance of a genetic variant in somatic tumor molecular analysis. Molecular profiling in cancer aims to find variants that are potentially clinically “actionable” in patient management. Our classification system therefore uses “actionability” as the core principle, with sub-categories based on evidence for actionability of the variant in the specific tumor site and histology. The distinction between “actionability” and “pathogenicity” and the relevance of the former to the oncology setting distinguish our proposed categorization system from previously published classifications. We also present an example of application of our SVC to a dataset of somatic variants found by NGS molecular profiling of 158 tumor samples and summarize the findings with a discussion of the utility of the classification system. Our SVC can be of practical value to other clinical molecular laboratories performing cancer genetic profiling by promoting consistent reporting of somatic variants and permitting harmonization of variant data among laboratories and clinical studies.

Materials and Methods

Assessment protocol for SVC

Our assessment protocol (Figure 1) comprises the following points. First, sequencing data quality is evaluated, including coverage depth at variant position, in normal blood and formalin-fixed, paraffin-embedded (FFPE) tumor tissues to ensure that somatic variant data meet minimum quality criteria15,16; variants at or near quality and allele frequency thresholds are investigated manually, verified by an orthogonal method, and interpreted and reported only if verification is successful. Second, variant frequency in normal germ-line population databases (the 1000 Genomes Project; the Exome Sequencing Project17) is determined. Any variant with a frequency in either database of >1% is considered benign and excluded from further analysis. This algorithm can be applied to cases where no matched normal sample is available or where it is impractical to test normal samples. Third, the variant’s observed frequency in the patient’s tumor site and histology is determined, using databases such as the Catalogue of Somatic Mutations in Cancer (COSMIC)18 and The Cancer Genome Atlas (TCGA) via the cBio Cancer Genomics Portal.19 If a variant is found in a gene for which locus-specific databases containing somatic variant information exist, then these databases are also searched. For cases in this report, locus-specific databases reviewed include the International Agency for Research on Cancer (IARC) TP53 database, the APC Mutations Database, the International Society for Gastrointestinal Hereditary Tumors’ MLH1 database, and the Multiple Endocrine Neoplasia Type 2 RET Database. Fourth, evidence of the impact of the variant on the biochemical activity of the protein and/or cellular pathway, as drawn from the literature and/or locus-specific databases, is compiled. In the absence of published biochemical data, the variant’s effect on protein function is predicted using in silico algorithms for missense variants, including the Sorting Intolerant from Tolerant, PolyPhen, Mutation Taster, Mutation Assessor, AlignGVGD, and likelihood ratio test algorithms. Variants were annotated using NextGENe v2.3.1 (SoftGenetics, State College, PA) and Alamut v2.4.5 (Interactive Biosoftware, Rouen, France), or were annotated manually (using the Integrative Genomics Viewer, Broad Institute, Cambridge, MA) when appropriate.
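The population-frequency step of the protocol can be illustrated with a minimal Python sketch. Only the >1% threshold and the two database names come from the text; the record layout, the field names `freq_1000g` and `freq_esp`, and the example frequencies are hypothetical and are not the laboratory's implementation.

```python
from typing import Optional

# Threshold stated in the protocol: a population frequency above 1% in
# either database marks the variant as benign.
BENIGN_FREQUENCY_THRESHOLD = 0.01


def is_common_polymorphism(freq_1000g: Optional[float],
                           freq_esp: Optional[float]) -> bool:
    """Return True if either population frequency exceeds 1%.

    None means the variant is absent from that database (an assumption
    about how missing data would be encoded).
    """
    observed = [f for f in (freq_1000g, freq_esp) if f is not None]
    return any(f > BENIGN_FREQUENCY_THRESHOLD for f in observed)


def filter_candidates(variants: list[dict]) -> list[dict]:
    """Keep only variants that pass the population-frequency step."""
    return [
        v for v in variants
        if not is_common_polymorphism(v.get("freq_1000g"), v.get("freq_esp"))
    ]


# Hypothetical records for illustration only; the frequencies are invented.
example = [
    {"id": "var_a", "freq_1000g": 0.15, "freq_esp": None},
    {"id": "var_b", "freq_1000g": None, "freq_esp": None},
    {"id": "var_c", "freq_1000g": 0.002, "freq_esp": 0.004},
]
kept = [v["id"] for v in filter_candidates(example)]
print(kept)  # ['var_b', 'var_c']
```

Because the comparison is strict, a variant observed at exactly 1% is retained in this sketch, which matches the ">1%" wording of the protocol.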

Figure 1

Overview of the somatic variant assessment protocol. Variants identified in tumor and normal samples are assessed for data quality and compared to identify those somatic variants, which are present only in the tumor, but not the normal. Somatic variants are then assessed using relevant variant databases for their frequencies in cancer, functional prediction algorithms for their impact on biochemical activity of the protein, and the available medical literature for biochemical data as well as data on actionability. Variants identified in normal and tumor, or in normal alone, are treated as putative germ-line variants and different analyses undertaken, as shown.
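The tumor/normal comparison in this overview amounts to a set partition, sketched below in Python. The function name, the string keys used to identify variants, and the example calls are hypothetical; real pipelines would compare normalized genomic coordinates and alleles.

```python
def partition_variants(tumor_calls: set[str],
                       normal_calls: set[str]) -> dict[str, set[str]]:
    """Split calls into somatic and putative germ-line groups.

    Somatic: present in the tumor but not in the matched normal.
    Putative germ-line: present in the normal, whether or not the
    variant is also seen in the tumor.
    """
    return {
        "somatic": tumor_calls - normal_calls,
        "putative_germline": set(normal_calls),
    }


# Hypothetical variant identifiers for illustration only.
tumor_calls = {"GENE1:c.100A>T", "GENE2:c.35G>A", "GENE3:c.7C>G"}
normal_calls = {"GENE3:c.7C>G", "GENE4:c.12del"}

result = partition_variants(tumor_calls, normal_calls)
print(sorted(result["somatic"]))
print(sorted(result["putative_germline"]))
```

Only the somatic group would continue to the database, prediction, and literature steps; the putative germ-line group follows the separate analysis path shown in the figure.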

Evidence for clinical actionability is assessed through a detailed manual search of the available literature via Google Scholar and PubMed using combinations of the terms “<GENE NAME>,” “<VARIANT NAME>,” “TISSUE,” “HISTOLOGY,” “CANCER,” “PROGNOSIS,” and “RESPONSE.” Information available through compendia of cancer-associated variants (e.g., mycancergenome.org) is also assessed. The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) principles are applied to determine whether evidence in the literature was sufficient to place a given variant in a category based on actionability.20,21,22,23,24,25 Briefly, randomized controlled trials (e.g., phase III studies) are considered to be high-quality evidence, whereas observational and non-human studies are considered low-quality evidence, with adjustments to level of evidence based on number and consistency of studies, magnitude of effect reported, known or potential study bias and limitations, and potential publication bias.

Because databases used in variant interpretation are updated regularly, we perform a regular database and literature review. Available information is reviewed and interpretations are updated every 3 months.

SVC

We identified four considerations in classifying somatic variants in cancer-associated genes: (i) current evidence for clinical actionability; (ii) primary site and tumor histology in which the variant is found; (iii) pathogenicity; and (iv) variant recurrence in cancer. The proposed classification system is shown in Figure 2. Table 1 gives a detailed description of each of the variant categories for the SVC.

Figure 2

Summary of the proposed somatic variant classification. Variants are classified as classes 1 through 5, based on information around actionability (same variant: classes 1 and 2; other variants in the same gene: classes 3 and 4; no data: class 5), tumor site/histology, recurrence in the literature, and variant effect from prediction tools.
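The class assignment summarized in this caption can be read as a short decision cascade, sketched below in Python. The split of classes 1 vs. 2 and 3 vs. 4 by tumor site/histology, the order in which the checks are applied, and all field names are assumptions made for illustration; the authoritative definitions are those in Table 1, and recurrence and prediction-tool output are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Curated actionability evidence for one variant (hypothetical fields)."""
    variant_same_site: bool = False   # this variant, this tumor site/histology
    variant_other_site: bool = False  # this variant, a different site/histology
    gene_same_site: bool = False      # other variants in the gene, this site
    gene_other_site: bool = False     # other variants in the gene, other sites
    ruled_out: bool = False           # actionability has been ruled out


def classify(e: Evidence) -> str:
    """Return the SVC class as a string, checking the strongest evidence first."""
    if e.variant_same_site:
        return "1"
    if e.variant_other_site:
        return "2"
    if e.gene_same_site:
        return "3"
    if e.gene_other_site:
        return "4"
    # No actionability data for the gene (5A) or actionability ruled out (5B).
    return "5B" if e.ruled_out else "5A"


print(classify(Evidence(variant_same_site=True)))   # 1
print(classify(Evidence(variant_other_site=True)))  # 2
print(classify(Evidence()))                         # 5A
```

Under these assumptions, a variant with evidence only in a different tumor site would be reported as class 2 even when gene-level evidence also exists, because variant-level evidence is checked before gene-level evidence.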

Table 1 Detailed description of the somatic variant classification scheme

To ensure applicability and relevance of the SVC to the clinical diagnostic setting, we utilized the Princess Margaret Cancer Centre’s multidisciplinary Cancer Genomics Program Tumor Board, attended by oncologists, pathologists, bioinformaticians, genetic counselors, and laboratory and clinical geneticists, as a review group during the development and initial implementation of the classification system.

“Actionability” was defined as the ability of the test result to modify patient management. We define a variant as actionable if it is associated with a known targeted therapy and/or patient prognosis and/or response to any therapy. For example, although there are currently no direct anti-KRAS therapies, KRAS variants in colon adenocarcinoma are known to be predictive of resistance to EGFR inhibitors26; these variants were therefore considered to be actionable because they would be used to guide treatment decisions. Furthermore, a variant is actionable if it has diagnostic/classification implications and patient management is impacted by variant identification. For example, BRAF V600E can differentiate sporadic from inherited colorectal carcinomas in patients with microsatellite instability-high colonic adenocarcinomas.

Application of the SVC to a pilot set of molecular profiling results

To pilot the SVC, we applied our assessment protocol to molecular profiling results from a cohort of 158 patients with breast or colorectal cancer, lung adenocarcinoma, or melanoma (tested with their consent) who were enrolled in institutional research ethics board–approved studies. Tests were performed in the University Health Network’s (UHN’s) College of American Pathologists/Clinical Laboratory Improvement Amendments-certified Advanced Molecular Diagnostics Laboratory. Patients were profiled using the Illumina (San Diego, CA) TruSeq Amplicon Cancer Panel on the MiSeq benchtop next-generation sequencer (Illumina) according to laboratory standard protocols. Patients were recruited through an ongoing clinical trial (Integrated Molecular Profiling in Advanced Cancer Therapy (IMPACT); ClinicalTrials.gov identifier NCT01505400). All somatic variants were assessed using the criteria defined above and subsequently classified. Primary review, assessment, and classification of all variants were independently carried out by a minimum of two assessors (M.A.S., K.J.C., or M.T.), followed by a secondary level review (S.K.-R. or T.L.S.); cases involving assessor disagreement were reviewed in group discussion.

Results

We profiled a pilot set of 158 patients diagnosed with breast or colorectal carcinoma, melanoma, and lung adenocarcinoma (see Supplementary Table S1 online), assessed all variants identified using our somatic variant annotation algorithm, and classified them according to actionability using the SVC. We identified and classified 258 variants in this pilot cohort and evaluated them for clinical actionability based on their impact on patient management (i.e., predictive, prognostic, and druggability information). Additionally, to assess the impact of a more focused definition of clinical actionability, we classified variants according to availability of genomic directed therapy with either US Food and Drug Administration (FDA)-approved drugs or clinical trial treatments that patients may have been potentially eligible to receive. These data are summarized in Figure 3 .

Figure 3

Classification of variants identified in somatic tumors. (a) Gene distribution of 41 variants identified by molecular profiling of 27 breast cancer patients using the Illumina TruSeq Amplicon Cancer Panel. (b) Distribution by class of variants after application of two definitions of clinical actionability (dark bars: actionability defined as predictive/prognostic/druggable; light bars: actionability defined as information on druggability only). (c) Gene distribution of 103 variants identified by molecular profiling of 39 colorectal cancer patients using the Illumina TruSeq Amplicon Cancer Panel. (d) Distribution by class of variants after application of two definitions of clinical actionability. (e) Gene distribution of 69 variants identified by molecular profiling of 41 lung adenocarcinoma patients using the Illumina TruSeq Amplicon Cancer Panel. (f) Distribution by class of variants after application of two definitions of clinical actionability. (g) Gene distribution of 45 variants identified by molecular profiling of 51 melanoma patients using the Illumina TruSeq Amplicon Cancer Panel. (h) Distribution by class of variants after application of two definitions of clinical actionability.
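As a consistency check, the per-site counts in this caption can be summed and compared with the cohort totals reported in the text (258 variants, 158 patients). The short Python sketch below uses only numbers stated in the caption; the variants-per-patient ratio is a derived convenience figure, not a statistic reported by the study.

```python
# (variants, patients) per tumor site, as stated in the Figure 3 caption.
cohort = {
    "breast": (41, 27),
    "colorectal": (103, 39),
    "lung adenocarcinoma": (69, 41),
    "melanoma": (45, 51),
}

total_variants = sum(v for v, _ in cohort.values())
total_patients = sum(p for _, p in cohort.values())
print(total_variants, total_patients)  # 258 158

# Derived ratio for orientation only.
for site, (variants, patients) in cohort.items():
    print(f"{site}: {variants / patients:.2f} variants per patient")
```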

Overall, 37.0% of identified variants were classified as actionable, class 1, when we considered whether the variant had prognostic or predictive impact or was therapeutically targetable. Furthermore, 49.0% of variants were classified as classes 2–4 (variants of unknown significance but potentially actionable based on gene-level data and/or data in other tumor sites). Conversely, when we focused specifically on actionability being a function of the ability to target the identified variant with an approved or investigational therapy, 15.7% of identified variants were considered actionable, class 1 (druggable, predictive, prognostic, or diagnostic in the tumor site in which they were detected), whereas 22.2% were considered classes 2–4 (variants known to be actionable in different tumor sites/histologies or variants in genes known to be actionable). Although variants in these classes could be used to influence patient care at the discretion of the oncologist, it should be noted that the relevance of that specific variant to patient care in the specific tumor site/histology is presumably much lower for class 4 variants than for class 2 or 3 variants.

The classification of variants varied by tumor site and actionability definition used. This is particularly evident for colorectal carcinomas, for which substantially more data are available for the impact of somatic variants on patient prognosis or response to therapy than on druggability in that tumor site. However, the available information for somatic variants in melanoma is skewed toward druggability; therefore, there is much less difference observed in actionability when either definition is used. Of note, the majority of variants for which gene-level actionability information, but not variant-level data, is available are found in tumor-suppressor genes (e.g., TP53, FBXW7 and SMAD4). In these instances, data are available for prognostic/predictive indications.

Class 5 variants are those for which no information is available in the literature on the clinical actionability of that gene, irrespective of tumor site (5A), or for which actionability has been ruled out (5B). Variants in genes that are specifically associated with one tumor site may fall within this class. For example, while variants in APC may inactivate that protein and are thought to lead to enhanced WNT/CTNNB1 signaling, this pathway is not yet demonstrated to be druggable in a clinical setting. Furthermore, there are no current reports that demonstrate prognostic/predictive relevance for variants in APC. Thus, we classified variants in APC as class 5A.

As expected from available data in the TCGA,27,28 the variant profiles observed were tumor-specific and limited by the NGS panel chosen for the test as well as the patient cohort used in this study. Of significant relevance to our efforts to define a variant classification system, the actionability of a given gene/variant was strongly dependent on tissue of origin. For example, the available evidence indicates that variants in codons 12 and 13 of KRAS are actionable (predictive of patient response to EGFR inhibition) in colorectal cancer.9 Meanwhile, these variants are the active subjects of clinical trials with MEK inhibitors in ovarian cancers.29 Similarly, the p.Val600Glu variant of BRAF is druggable in melanoma,12,30,31 whereas recent evidence indicates its potential predictive utility in colorectal cancer patients treated with EGFR inhibitors.32

Figure 3 shows in further detail how sequence variants were classified for each of the four primary sites of cancer in our study. The pie graphs in panels a, c, e, and g demonstrate the differing patterns of gene variant frequencies at each primary tumor site, whereas the bar graphs in panels b, d, f, and h illustrate the breakdown of classification based on the two definitions of actionability utilized. Major known actionable genes included: TP53 and PIK3CA33,34,35 in breast cancer; KRAS in colorectal cancer9; EGFR in lung cancer; and PIK3CA33 in ovarian cancer.

Discussion

We present a system for the categorization of somatic variants detected by high-throughput genomic testing of cancer samples. The SVC takes into account the following: (i) current evidence of clinical actionability, including targeted therapies and diagnostic utility; (ii) primary site and tumor histology; (iii) pathogenicity; and (iv) recurrence of the variant in cancer. By focusing on actionability, the SVC attempts to gauge the impact of genomic findings on patient management, thus bringing the most clinically relevant findings to the forefront of a potentially lengthy list identified using a genomic scale testing platform.

Van Allen et al.7 recently described a system of assigning levels of evidence to each of a set of potential clinical actions for somatic variants identified during whole exome sequencing. The clinical actions were administration of an FDA-approved therapy, enrollment in an open clinical trial, prognostic stratification, and diagnostic utility. The levels of evidence were as follows: (i) a validated association; (ii) limited clinical evidence; (iii) clinical evidence in another tumor type only; (iv) preclinical evidence; and (v) inferential association. This is a useful system for prioritization of variants from a large-scale sequencing approach such as whole genome sequencing. The SVC described here focuses more on prioritization of variants from a diagnostic clinical reporting perspective, and it takes into account both the potential for gene-level actionability information and potential interactions among multiple variants identified in a given sample. We also emphasize that our classification system is not intended to help distinguish a single-nucleotide polymorphism or other normal polymorphism from a known causative variant but is meant to be applied after normal variants have been excluded.

Factors in determining variant classification

Definition of actionability. Impact on patient care can be multidimensional. Availability of an FDA-approved targeted molecular therapeutic (e.g., molecularly targeted therapies to BRAF, EGFR, or RET) is the most striking example of impact on patient care. However, the number of available targeted therapeutic agents that have been clinically approved at this time is very small; thus, only a small subset of variants may be deemed actionable by this criterion. Availability of investigational agents in clinical trials for therapeutic targeting of a given variant, gene, or pathway greatly increases the opportunities to identify molecularly targeted therapies based on genome-scale testing results. The broadest definition of actionability is whether the variant will impact patient management in any way (e.g., administration of a targeted therapy or a non-targeted therapy; a decision to hold off on further therapy; or a change in patient monitoring). The presence of a given alteration in a gene of interest may predict patient response to conventional therapies (e.g., KRAS codon 12 variants in colorectal cancers) or impact patient prognosis (e.g., TP53 variants in breast cancer). Furthermore, variants with diagnostic utility can impact patient management through application of treatment algorithms specifically dependent on molecular diagnosis. In this study, we chose to examine actionability as defined by availability of approved or investigational therapies (assessed through curation of literature, mycancergenome.org, and/or ClinicalTrials.gov), as well as by predictive/prognostic and diagnostic utility.

Level of available evidence. Data on clinical actionability may be presented in the form of clinical trials of drugs or biomarkers, or may be inferred from preclinical, biochemical, and/or in silico evidence. In applying GRADE principles for assessing evidence in the context of SVC, we developed a set of standards to use in evaluating available data. For example, large, multi-center, prospective, and well-controlled phase II or III trials were considered a higher level of evidence than small-scale and phase I studies. Multiple studies in agreement were considered as cumulative evidence and were weighted higher than a single study. We established a minimum level of evidence required to demonstrate clinical actionability as well, whereby preclinical, biochemical, and in silico data were considered to be insufficient to infer actionability ( Table 2 ). For variants in tumor-specific contexts where no clinical studies are available, preclinical drug sensitivity encyclopedias can be mined to infer potential clinical relevance.36,37 However, there are concerns about the ability to validate predictive genomic biomarkers across cell line datasets38 and the lack of reproducibility of preclinical experiments.39 Therefore, such studies did not suffice for assigning a variant to classes 1–4.
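The minimum-evidence rule described above can be expressed as a simple gate, sketched here in Python. The three evidence kinds treated as insufficient come from the text; the label strings, the function name, and the assumption that any other (clinical) evidence kind clears the minimum bar are illustrative only, since the actual thresholds are those defined in Table 2.

```python
# Evidence kinds that the text describes as insufficient, on their own,
# to infer clinical actionability.
INSUFFICIENT_ALONE = frozenset({"preclinical", "biochemical", "in_silico"})


def meets_minimum_evidence(evidence_kinds: list[str]) -> bool:
    """Return True if at least one item is not preclinical/biochemical/in silico.

    Assumption: every label outside INSUFFICIENT_ALONE denotes a clinical
    study; grading that study as low, medium, or high quality is a separate
    step not modeled here.
    """
    return any(kind not in INSUFFICIENT_ALONE for kind in evidence_kinds)


print(meets_minimum_evidence(["preclinical", "in_silico"]))      # False
print(meets_minimum_evidence(["preclinical", "phase_ii_trial"]))  # True
```

A variant that fails this gate would not be eligible for classes 1–4 under the rule stated in the text, regardless of how much preclinical data supports it.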

Table 2 Level of evidence assessment for clinical actionability data

Interassessor variability. To minimize the potential for interassessor variability in larger centers where interpretations were undertaken by teams, we applied GRADE guidelines for levels of evidence to assessment of the literature and developed definitions for low-, medium-, and high-quality evidence in support of clinical actionability (Table 2). With this framework in place, most classifications, particularly those in classes 1 and 2, were clear, whereas uncertainty in classification was extremely rare and most often seen in variants of unknown significance (classes 4 and 5).

Impact of primary site and histology on variant classification. The correct primary tumor site and histology information are crucial in accurately determining the clinical significance of genomic findings. This has been demonstrated clinically in several instances, e.g., the targetable nature of BRAF V600E variants in melanoma (class 1)11,12 versus colorectal carcinoma (class 2).32 The requirement for specific histology data is likewise important. Exon 19 deletions in EGFR are actionable, class 1 (targetable with TKIs), in lung adenocarcinomas. However, the same variant observed in a patient with lung squamous cell carcinoma would be considered class 2 because of the paucity of information on clinical actionability in this histology. An incorrect or ambiguous histology/tissue site assignment will lead to faulty classification and interpretation of a variant’s implications for patient management. Larger targeted sequencing panels include more genes and variants whose actionability in the absence of detailed tissue site/histology information is ambiguous at best. Together, these considerations highlight the requirement for detailed and accurate tissue site/histology data.

Pathogenicity. Our approach to the SVC prioritizes clinical actionability over variant pathogenicity, which references mechanistic (causative or oncogenic) relevance. A variant may be a driver alteration, but not actionable, in a given tumor site and histology. The distinction between “actionability” and “pathogenicity” and the relevance of the former to the oncology setting distinguish the SVC from previously published classifications.1

Recurrence. The TCGA and COSMIC provide useful information about a variant’s prevalence in cancer. Only a few genes (e.g., EGFR in non–small cell lung cancer, KRAS in colorectal cancer, and BRAF in melanoma) have been studied well enough to make clear, evidence-based judgments with respect to the clinical actionability of individual variants. There are some cases (e.g., KIT variants in melanoma) of rare variants with known actionability or of rare variants in “hotspot” codons that have known actionability. Given the available evidence, the classification of these types of variants is potentially ambiguous. This raises the question of the degree to which codon-level actionability information should affect variant classification. This decision requires expert opinion and may result in variable categorization of less common variants.20,21,22,23,24,25,40 In our system, this assessment determines whether a variant is assigned to classes 1 vs. 3 or 2 vs. 4.

Correct variant annotation

Correct variant annotation using publicly available databases is dependent on use of the appropriate transcript identifier/accession number. Incorrect transcripts can lead to faulty interpretation. Alternate transcripts can be annotated differently, confounding variant interpretation and classification. In commercially available targeted panels, target amplicons are designed against specific transcript identifiers, which need to be used in variant annotation and interpretation. Likewise, commercially available software tools utilize default transcript identification numbers (IDs). For custom panels and/or informatics solutions, it is important to define transcript IDs according to a consensus. Our practice has been to evaluate which transcripts have been annotated across available databases. For example, we evaluated transcript IDs for TP53 used by National Center for Biotechnology Information RefSeq, the IARC TP53 database, and the Human Gene Mutation Database to determine the most appropriate transcript to annotate against.
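One way to make the cross-database transcript comparison explicit is sketched below in Python. Majority voting is an assumption introduced here for illustration, not the procedure described above, and the source names and transcript identifiers (`TX_A`, `TX_B`) are placeholders rather than real accession numbers.

```python
from collections import Counter


def consensus_transcript(ids_by_source: dict[str, str]) -> tuple[str, list[str]]:
    """Return the most frequently used transcript ID and the dissenting sources.

    Ties are not resolved here; a tie would need manual review.
    """
    if not ids_by_source:
        raise ValueError("at least one source is required")
    counts = Counter(ids_by_source.values())
    top_id, _ = counts.most_common(1)[0]
    dissenting = sorted(
        source for source, tid in ids_by_source.items() if tid != top_id
    )
    return top_id, dissenting


# Placeholder identifiers for illustration only.
sources = {
    "reference_sequence_db": "TX_A",
    "locus_specific_db": "TX_A",
    "mutation_db": "TX_B",
}
chosen, dissenting = consensus_transcript(sources)
print(chosen, dissenting)  # TX_A ['mutation_db']
```

Listing the dissenting sources makes it clear which databases would need coordinate conversion before their annotations can be compared with variants called against the chosen transcript.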

Utility of the SVC

In this study, we applied our proposed classification system to pilot datasets derived from profiling tumor samples on a commercially available NGS panel. However, the SVC system can be applied to whole exome or whole genome sequencing data, or other genetic testing methods that may arise, irrespective of tumor type. Furthermore, the SVC is not restricted to a specific variant type; it has been useful in our experience in classifying coding as well as splice site changes, and could be applied to cytogenetic abnormalities (e.g., translocations, amplifications, and deletions).

Some alterations may interact with each other in a given tumor, e.g., a secondary T790M variant in EGFR may undermine the sensitizing effect of an exon 19 deletion in EGFR, resulting in resistance to TKI therapy in a lung cancer.9 Evidence for the clinical significance of such combined effects and for patterns of molecular pathway alterations will continue to emerge as further comprehensive cancer profiling studies are published. This will enable refinements to the SVC to capture the clinical significance of such interactions in reporting variants.

Classification of specific genes/variants may be flexible and dependent on the context of academic center-directed clinical trials. It is only at the end of a given trial that these genes/variants may be classified in a more consistent way. To ensure consistent application of the SVC by practitioners from different health-care environments, the following solutions are warranted: (i) education of all oncologists about the SVC, including details on interpretation of the classifications; (ii) laboratory support for interpretation (i.e., consultation with laboratory directors); and (iii) simplified reporting of variants of less immediate clinical impact (e.g., listing all class 5 variants together).

Applying a few objective variables can result in the stratification of clinical significance of variants detected in a patient’s tumor. The information requirements for the SVC are simply the list of somatic variant(s) identified, the tumor site, and histology. The SVC provides structure to the interface between “actionable” and “not actionable” findings for patients with acquired genetic alterations. The application of the SVC to other datasets using increasingly comprehensive genomic platforms will provide a method of comparing platforms and datasets in terms of volumes of clinically relevant findings. Furthermore, the SVC provides a means to address issues such as whether additional findings from a more comprehensive platform would lead to changes in patient management. It also provides a framework for longitudinal comparison of the impact of the growth of cancer-related knowledge over time on the clinical actionability of findings from genome-scale testing.

Disclosure

The authors declare no conflict of interest.