Genome-wide sequencing approaches enable the characterization of the landscape of genomic alterations in many cancer types and have produced numerous clinically relevant biological insights.1,2 Whole-exome sequencing (WES) studies have identified genomic alterations of potential clinical interest beyond those present in “hot-spot” or “targeted panel” readouts.3 Moreover, the availability of germ-line WES data has led to the discovery of new Mendelian genes,4 high-yield molecular diagnoses in rare diseases5 and in Mendelian neurological disorders,6 identification of clinically actionable secondary findings in cancer,7 and unraveling of causative genetic mutations related to complex phenotype.8 Prospective large-scale somatic and germ-line studies of pediatric and adult cancer patients have also proved informative.9,10,11,12

In most large-scale approaches to cancer precision medicine, the systematic interpretation of somatic and germ-line WES data for clinical care remains challenging. In particular, assigning clinical meaning to each somatic and germ-line variant, including the therapeutic, prognostic, and diagnostic implications for individual patients, poses considerable difficulties in light of the inconsistent state of genome biological annotation. This highlights the need for a systematic and easy-to-understand interpretation system for implementing precision medicine in clinical oncology.

To begin to address these challenges, we developed a basic interpretive framework for somatic and germ-line variants used by CanSeq, a project of the national Clinical Sequencing Exploratory Research consortium13 funded by the National Human Genome Research Institute and the National Cancer Institute. CanSeq is a prospective study of clinical WES in adults with advanced lung and colorectal cancer14 (Supplementary Figure S1 online). Features of the study include the use of clinically acquired formalin-fixed paraffin-embedded (FFPE) tumor specimens, genome-scale analysis and interpretation of both somatic and germ-line results, and return of clinically relevant results to the patient and clinical team according to patient preferences.

Materials and Methods


This study was approved by the Institutional Review Board of the Dana-Farber/Harvard Cancer Center (DFCI protocol 12–078). All patients provided written informed consent for sequencing and return of results (Supplementary Table S1 online).

Patient enrollment

To date, 2,091 patients with metastatic lung cancer and 1,450 patients with metastatic colon cancer have been screened at Dana-Farber Cancer Institute for potential eligibility to participate in this study over a 3-year period (February 2013 to July 2015). The initial screening was based on the availability of tumor DNA; this cohort included patients who had 250 ng DNA previously extracted and available from FFPE tumors obtained for other studies. Overall, 128 patients with metastatic lung adenocarcinoma and 123 patients with colorectal adenocarcinoma were approached for enrollment based on specific clinical criteria (Supplementary Methods online). A total of 116 patients with lung adenocarcinoma and 113 patients with colorectal adenocarcinoma (combined enrollment rate 91%; 229 of 251 approached) had been enrolled in the study at the time of this analysis (Supplementary Tables S1 and S2 online). All enrolled patients provided their preferences regarding the return of somatic and germ-line findings during the process of consent.


WES was performed on DNA extracted from FFPE tumor samples and from matched peripheral blood. Initially, the sequencing was performed in a research setting at the Broad Institute (Cambridge, MA) and all reportable findings were confirmed in a Clinical Laboratory Improvement Amendments (CLIA)-certified clinical laboratory at the Brigham and Women’s Hospital by a targeted panel or Sanger sequencing. Over the course of the study, exome sequencing was transitioned to a CLIA platform at the Broad Institute and somatic findings were returned to participants without additional confirmation; germ-line variants continued to be confirmed prior to the return of results.

Briefly, extracted DNA was fragmented, adapter-ligated, barcoded, and subjected to solution-phase hybridization with probes designed to enrich selected gene regions. Enriched library fragments were sequenced (2 × 76 bp, paired-end) using sequencing-by-synthesis chemistry on the Illumina HiSeq 2500 sequencer. Sequence data were aligned to the specified National Center for Biotechnology Information human reference sequence after low-quality sequences were discarded. Reads that aligned to more than one region of the reference genome, reads with low alignment scores, and bases with low-quality scores were excluded from variant calling. WES sequencing metrics are listed in Supplementary Table S3 online.

Sequencing analysis

Sequencing data were analyzed to identify somatic and germ-line point mutations, small insertions/deletions (indels), and copy-number alterations. The somatic single-nucleotide variants in targeted exons were identified using the MuTect algorithm. Indelocator was used to identify small insertions or deletions. Identified variants were annotated using Oncotator. ReCapSeg was used to analyze somatic copy-number gains and losses (threshold 0.1/−0.1). Tumor-in-normal (TiN) contamination was analyzed by the DeTiN tool with a TiN cutoff of <1%.

By design, WES did not detect genome rearrangements in introns, translocations and their breakpoints, or repeat expansions. In addition, sequencing was not performed as a clinical test. Because sequencing was performed using an instrument for which clinical-grade analytic parameters have not been established, the absence of findings does not entirely exclude the possibility that additional clinically relevant sequence alterations were present but not detected.

The next step of analysis was performed using the analytical tool Precision Heuristics for Interpreting the Alteration Landscape (PHIAL) as previously described.15 Briefly, PHIAL uses a rule-based approach to rank-order alterations according to clinical and biological significance. These criteria enabled the prioritization of tumor or germ-line variants that were most likely to be clinically significant in patients with cancer (i.e., candidate variants). For somatic alterations, PHIAL interrogates the TARGET (tumor alterations relevant for genomics-driven therapy) database and pathway information to establish a score reflecting potential clinical relevance of alterations. Somatic alterations with a score ≥3 were selected for further evaluation in the postanalytical step. To rank-order relevant germ-line alterations, a list of the Online Mendelian Inheritance in Man (OMIM) genes was generated for PHIAL (Supplementary Table S4 online). This list, comprising 149 cancer-related genes and 35 non-cancer-related genes, included the 56 genes recommended for inclusion in exome- or genome-scale analyses by the American College of Medical Genetics and Genomics (ACMG).16 PHIAL interrogates this custom gene list and the Exome Aggregation Consortium (ExAC) database for reported population frequency. Germ-line variants in the target gene list occurring at a population frequency of <1% in ExAC were selected as candidate variants for further evaluation in the postanalytical step.
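The candidate-selection thresholds described above can be sketched in a few lines of Python. This is a minimal illustration and not the PHIAL implementation: the record fields (`gene`, `phial_score`, `exac_af`), the function names, and the treatment of variants absent from ExAC are assumptions made for the example; only the two cutoffs (score ≥3 for somatic alterations, ExAC frequency <1% within the custom gene list for germ-line variants) come from the text.

```python
from dataclasses import dataclass
from typing import Optional

SOMATIC_SCORE_CUTOFF = 3      # from the text: somatic alterations with score >= 3
GERMLINE_MAX_EXAC_AF = 0.01   # from the text: ExAC population frequency < 1%


@dataclass
class Variant:
    gene: str
    phial_score: Optional[int] = None   # hypothetical field, somatic records only
    exac_af: Optional[float] = None     # hypothetical field; None = not seen in ExAC


def is_somatic_candidate(v: Variant) -> bool:
    """Keep somatic alterations whose score meets the cutoff."""
    return v.phial_score is not None and v.phial_score >= SOMATIC_SCORE_CUTOFF


def is_germline_candidate(v: Variant, gene_list: set) -> bool:
    """Keep rare germ-line variants that fall in the curated gene list."""
    if v.gene not in gene_list:
        return False
    # Assumption: a variant never observed in ExAC is treated as rare.
    af = 0.0 if v.exac_af is None else v.exac_af
    return af < GERMLINE_MAX_EXAC_AF
```

For example, `is_germline_candidate(Variant("BRCA2", exac_af=0.001), {"BRCA2"})` passes the filter, whereas the same variant at a frequency of exactly 1% does not, because the cutoff is strict.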

Postanalysis clinical interpretation of somatic alterations

The purpose of clinical interpretation of somatic data is to identify and characterize clinically relevant and potential clinically relevant alterations by assigning in-depth variant-level annotations to the somatic findings at the following levels: variant, gene, therapy, prognostic and diagnostic. After selection of candidate somatic variants by PHIAL (as described above), the variants were further annotated by a curation team. At the variant level, alterations were delineated for the type (nature) of the mutation, location of the mutation in the protein, reported functional consequence of the mutation in the literature, and reported incidence of the mutation in a subset of cancers. At the gene level, the role and reported frequency of the genes that harbor the mutation in the given cancer type were assessed. Therapeutic evidence was evaluated for all available therapies, including Food and Drug Administration (FDA)-approved agents as well as investigational therapies specific for the given cancer type. Information obtained from the variant description combined with knowledge of current therapies enabled assignment of a clinical classification to each somatic variant in a given cancer type. Levels of evidence associated with each therapeutic, prognostic, or diagnostic implication were provided using an updated version of our previously described framework15,17 ( Table 1 ): FDA-approved, level A (eligibility criterion for open clinical trial is met); level B (evidence is supported by early or limited clinical evidence); level C (clinical evidence is present in tumor types other than the tumor being tested); and level D (only preclinical evidence is present). Functionally characterized variants with inferential associations with therapeutic agents and functionally uncharacterized variants were assigned to level E.
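Because a single variant can carry a different level of evidence for each therapy, it can help to see the levels as an ordered scale. The sketch below is illustrative only: the enum and function names are hypothetical, and it assumes that the order in which the levels are listed above (FDA-approved, then A through E) is also their rank order. The split into "clinical" (FDA-approved and levels A to C) versus "preclinical or theoretical" (levels D and E) follows the grouping used when the reported variants are summarized.

```python
from enum import IntEnum


class EvidenceLevel(IntEnum):
    # Lower value = stronger evidence (assumed to follow the listing order).
    FDA_APPROVED = 0
    A = 1   # eligibility criterion for an open clinical trial is met
    B = 2   # early or limited clinical evidence
    C = 3   # clinical evidence in a different tumor type
    D = 4   # preclinical evidence only
    E = 5   # inferential association or functionally uncharacterized


def highest_level(levels):
    """Return the strongest level among the per-therapy levels of one variant."""
    return min(levels)


def evidence_group(level):
    """Collapse a level into the two summary groups used for reporting."""
    if level <= EvidenceLevel.C:
        return "clinical"
    return "preclinical or theoretical"
```

A variant annotated at level D for one agent and level B for another would be rank-ordered under level B and counted in the "clinical" group.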

Table 1 Levels of predictive therapeutic evidence of somatic alterations

At the beginning of the study (46 samples), the variants were curated and annotated by an institutional curation team with expertise in somatic cancer genetics and reviewed and finalized by at least two reviewers from the molecular tumor board (composed of 28 members with a wide variety of expertise, in oncology, pathology, cancer genomics, genetics, developmental therapeutics, bioethics, and other fields). Later in the study, after the protocol-based interpretation and reporting pipelines had been established, each variant was described and classified by the curation team and then reviewed and finalized by at least two reviewers (a medical geneticist and a medical oncologist).

Postanalysis clinical interpretation of germ-line alterations

The purpose of clinical interpretation of germ-line data is to identify and report constitutional genomic variants with familial or therapeutic implications. After selection of candidate germ-line variants by PHIAL (variants in the 184 genes with a population frequency <1%), the variants were evaluated and classified for pathogenicity.

At the beginning of the study (46 samples), germ-line curation and annotation were performed by an institutional curation team with expertise in germ-line genetics and then reviewed and finalized by two reviewers from the molecular tumor board. Variants were classified as pathogenic, benign, or a variant of unknown significance. Later in the study, after the protocol-based interpretation and reporting pipelines had been established, germ-line variants were classified according to ACMG guidelines18 and then reviewed and finalized by a board-certified medical geneticist. After classification, germ-line variants were categorized into the following groups: variants in genes specific to the given indication, variants in other cancer-related genes, variants in the ACMG incidental finding list, and variants in non-cancer-related genes.

Variant review and return decision

Decisions to return each annotated alteration (somatic and germ-line) to clinicians were initially based on the review by a molecular tumor board. Over the first 15 months, the tumor board evaluated curated variants from 46 patients. This process was subsequently replaced by a protocol-based interpretive algorithm (Figure 3c) to improve scalability. The somatic algorithm specified the return of results in variants with potential clinical relevance for the given clinical context (FDA-approved classes, levels A–E). Characterized and uncharacterized variants in genes with unknown clinical implications were not reported. The germ-line algorithm specified the return of variants in genes specific to the patient’s cancer (pathogenic, likely pathogenic, and variants of unknown significance), variants in other cancer-related genes (pathogenic, likely pathogenic), variants in the ACMG incidental finding list (pathogenic or likely pathogenic according to ACMG recommendations), and variants in non-cancer-related genes (pathogenic only). Results were returned in accordance with patient preferences for the types of somatic and germ-line findings they were willing to receive. All return decisions were reviewed and finalized by a medical geneticist.
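The germ-line return rules stated above amount to a small lookup table keyed by gene category. The following sketch encodes them as such; it is an illustration under stated assumptions rather than the study's software. The category and classification labels, the function name, and the single boolean for patient preference are all hypothetical simplifications (in the study, preferences were recorded per type of finding).

```python
# Classifications that are returnable for each gene category, per the text.
RETURNABLE = {
    "cancer_specific": {"pathogenic", "likely_pathogenic", "vus"},
    "other_cancer_related": {"pathogenic", "likely_pathogenic"},
    "acmg_incidental": {"pathogenic", "likely_pathogenic"},
    "non_cancer_related": {"pathogenic"},
}


def should_return_germline(category, classification, patient_opted_in=True):
    """Apply the protocol rule, then respect the patient's stated preference."""
    if not patient_opted_in:
        return False
    # Unknown categories return nothing rather than raising.
    return classification in RETURNABLE.get(category, set())
```

Under these rules a variant of unknown significance is returned only when it lies in a gene specific to the patient's cancer, and a likely pathogenic variant in a non-cancer-related gene is not returned.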

Reporting genomic results

An integrated somatic and germ-line clinical report was prepared for each patient and officially signed out by a medical geneticist before being placed in the medical record. Genome reports were also sent directly to the treating oncologists for disclosure to the participants. All patients were offered genetic counseling at the Dana-Farber Cancer Institute’s Center for Cancer Genetics and Prevention at the time of enrollment and after disclosure of results.


Results

Somatic exome sequencing

At the time of this analysis, tumor and germ-line WES was performed, analyzed, interpreted, and returned to the clinical team for 165 participants. At least one somatic alteration was reported for 91% (150/165) of these participants. Overall, 768 somatic alterations were selected for further curation after initial prioritization by PHIAL;15 67% (511/768) of these somatic alterations were determined to be potentially clinically relevant and reported to the clinical team ( Figure 1a , Supplementary Figures S2–S4 online, Supplementary Table S5 online). Two hundred seven somatic alterations (27%) were assessed as being not clinically relevant and not reported. The remaining 50 alterations were found to be clinically relevant, but 17 were not reported due to patient preferences and 33 were not reported because they were not confirmed by a confirmatory test in a CLIA laboratory (Supplementary Table S5 online). Virtually all the mutations that were not confirmed in a CLIA laboratory were low-allelic-fraction mutations detected by WES in the research platform and were subsequently not detected by a CLIA-targeted sequencing platform, which has a higher stringency for reporting potentially subclonal mutations.
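As a quick arithmetic check, the four dispositions reported above should account for all 768 curated somatic alterations. The short tally below uses only the counts given in the text; the dictionary keys and the helper name are illustrative labels.

```python
# Counts of curated somatic alterations by disposition, as stated in the text.
dispositions = {
    "reported": 511,
    "not clinically relevant": 207,
    "withheld by patient preference": 17,
    "not confirmed in CLIA laboratory": 33,
}


def percentages(counts):
    """Return each count as a whole-number percentage of the total."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


total_curated = sum(dispositions.values())   # 768
shares = percentages(dispositions)           # 67, 27, 2 and 4 percent
```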

Figure 1

Reported CanSeq somatic alterations. A total of 768 somatic variants were deemed candidates for curation; 67% of somatic variants were reported, 27% were not reported because there was no evidence for clinical relevance, 2% were not reported based on patient preference, and the remaining 4% were not confirmed in a Clinical Laboratory Improvement Amendments laboratory (a). Reported somatic variants were rank-ordered according to the highest predictive therapeutic level; 31% were classified as having “clinical evidence” (Food and Drug Administration-approved, levels A, B, and C) and the remaining 69% were classified as having “preclinical or theoretical evidence” (levels D and E) (b). Reported genes that harbored somatic variants are shown (c).

Each of the 511 somatic alterations returned to the clinical team was classified in at least one therapeutic category ( Table 1 ). Only 31% (158/511) were associated with some degree of clinical evidence (i.e., FDA-approved and levels A–C) ( Figure 1b , c ). Reported somatic alterations in each specific therapeutic category are described in further detail in Supplementary Figures S2 and S3 online and Supplementary Tables S5–S8 online. A large proportion (69% (353/511)) of the reported somatic alterations was not associated with any direct clinical evidence ( Figure 1b , c ). For 21%, the highest level of evidence for therapeutic decision making was from preclinical studies (level D); for 48%, the rationale was theoretical and/or inferential (level E). Thus, although 91% of participants received at least one somatic alteration, the majority of these were associated with preclinical, inferential, or theoretical rationales.

Germ-line exome sequencing

Twenty-one percent of the 165 patient reports included at least one germ-line alteration. Overall, 806 germ-line variants were reviewed, of which only 5% (43/806) were deemed potentially clinically relevant (Figure 2a, Supplementary Figure S4 online). These 43 variants were classified as pathogenic (47%), likely pathogenic (16%), or a variant of unknown significance when the variant was in a gene directly related to the patient’s cancer (37%) (Figure 2b, Supplementary Table S9 online). The clinical implications of these variants relate to familial susceptibility to cancer and other OMIM diseases. Figure 2c shows the frequency of reported germ-line genes in each of the three categories. Of the 43 reported germ-line variants, 72% were in cancer-related genes and 28% in non-cancer-related genes. Ten additional alterations were found to be clinically relevant; five were not reported due to patient preferences and five were not reported because they were not confirmed by a confirmatory test in a CLIA laboratory (Supplementary Table S5 online). The five germ-line mutations that were not confirmed in a CLIA laboratory were probably due to technical artifacts in the research platform. The remaining 753 germ-line variants (93%) were classified as benign, likely benign, or variants of unknown significance (VUS) in a gene not directly related to the patient’s cancer and were therefore not reported (Supplementary Figures S4 and S5 online, Supplementary Table S5 online).

Figure 2

Reported CanSeq germ-line alterations. Of 806 curated germ-line variants, 5% were reported and the remaining 95% were not reported (a). Of reported germ-line variants, 47% were classified as pathogenic, 16% as likely pathogenic, and 37% as variants of unknown significance (b). The genes that harbored reported germ-line variants and their associated American College of Medical Genetics and Genomics classifications are shown (c).

Of all curated germ-line variants, 56% were classified as variants of unknown significance (Supplementary Figure S5 online). Although the role of these variants in relation to familial disease or carrier status is unknown, they could present potential therapeutic options for cancer patients. Because many of these variants reside in genes involved in DNA repair (BRCA1, BRCA2, ATM, CHEK), chromosome instability (FANC, BLM, WRN), microsatellite instability, and DNA mismatch repair (MSH2, MSH3, MLH1, MSH6, PMS1, PMS2) pathways, tumors might harbor secondary somatic mutations in these pathways that could be potential therapeutic targets in clinical trials.

Role of the molecular tumor board

Over the first 15 months, CanSeq held two meetings of the molecular tumor board each month. The board’s 28 members represented a wide variety of expertise in oncology, pathology, cancer genomics, genetics, developmental therapeutics, bioethics, and other fields. The initial purpose of this group was to vet the somatic and germ-line sequencing results and clinical annotations and to make decisions about the return of each somatic and germ-line alteration. For the 46 patients reviewed by the tumor board, the committee considered 207 somatic alterations and decided to return 156 (75%), and considered 166 germ-line variants and decided to return 19 (11%).

The scalability of this time-intensive approach proved challenging because the molecular tumor board had difficulty keeping pace with the rapid accumulation of incoming variants from enrolled patients (Figure 3a). To address this, lessons learned from early board meetings were transformed into protocols for the evaluation and return decision for somatic and germ-line findings (Figure 3c). Specifically, the experience with the first 46 patients informed the development and codification of rules for curation, interpretation, and decisions for return. This included decisions about the threshold for return of somatic and germ-line alterations, the type of information to be returned in reports, and specific language for communicating levels of evidence and uncertainty about clinical decision making to physicians. Implementing a protocol-driven approach to return of results substantially improved the efficiency of the process of variant review and reporting (Figure 3b).
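The Figure 3 legend describes the efficiency of return decisions in terms of cumulative incoming curated variants and cumulative variants with a reporting decision over the same period. One plausible reading, sketched below, is the running ratio of decided to incoming variants. The exact formula is not given in the text, so the ratio, the function name, and the per-period input format are assumptions for illustration.

```python
from itertools import accumulate


def decision_efficiency(incoming_per_period, decided_per_period):
    """Running ratio of cumulative decided variants to cumulative incoming variants."""
    cumulative_incoming = accumulate(incoming_per_period)
    cumulative_decided = accumulate(decided_per_period)
    return [
        decided / incoming if incoming else 0.0
        for incoming, decided in zip(cumulative_incoming, cumulative_decided)
    ]
```

A value near 1.0 means decisions are keeping pace with curation, while a falling value indicates a growing backlog of the kind the tumor board experienced.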

Figure 3

CanSeq variant review and return decision process. The variant review and return decision process is compared when performed by a tumor board (a) and a protocol-driven approach (b). The cumulative number of the incoming curated variants over time is shown in dark red. The cumulative number of variants that have received a return decision over time is shown in light red. The efficiency of return decision was calculated as a function of cumulative incoming curated variants to cumulative variants with reporting decisions over the same period of time. The somatic and germ-line rules for the protocol-based return decision algorithm are shown (c).

Discussion


In this paper, we report our experience with clinical interpretation and reporting of prospective somatic and germ-line WES for 165 patients with advanced colorectal and lung carcinoma. The interpretive framework presented here has several key features. First, somatic clinical interpretation is performed at the level of the variant rather than simply that of the gene. Although many genes might be considered actionable in principle, different somatic alterations in a given gene can have different functional consequences, depending on the nature and the location of the mutation, and thus must be interpreted individually.15,17,19 Second, the interpretation of each alteration takes into account the clinical, pathological, and molecular context to determine clinical implications. For example, tumor type, stage, and concurrent genomic alterations may all impact the interpretation of a particular somatic mutation. Finally, because each somatic alteration may have a different level of evidence for each therapeutic option, multiple levels of evidence for each variant can be provided. The level of evidence is linked to the action taken rather than being intrinsic to the particular alteration.

Other frameworks for providing levels of evidence for actionable somatic alterations have recently been described.9,10 Most of these evidentiary frameworks follow the same general pattern as that of the framework used in this study, with therapeutic categories largely mapping to one another. However, many frameworks have less granularity than the one presented in this paper. The presence of distinct clinical, preclinical, and theoretical/inferential categories in the CanSeq framework indicates that a large fraction of the reports of clinical relevance of somatic alterations in tumor profiling studies may be based on preclinical or theoretical rationales. Indeed, this is the case for nearly 70% of reportable alterations in this study.

Interpretive methods for these categories of somatic alterations remain underdeveloped. As a result, for most somatic WES data, no widely accepted framework for decision support currently exists. Nevertheless, the distinction in levels of evidence between clinical, preclinical, and theoretical/inferential may be important for patients and physicians when making clinical decisions and represents a clear area of need for data-interpretation research.

A next-generation interpretive framework might build on the more simplistic model used in this and other studies (Figure 4). Elements of a future framework might include multiple types of -omic and molecular data, novel computational tools (e.g., pathway analysis and structural analysis), and integration of germ-line and somatic findings. Orthogonal sources of information to modify the strength of evidence for alterations that are currently classified as level D and E evidence, including experimental data generated by large-scale efforts or by focused “VUS investigative teams,”15 could also be incorporated. In addition, future interpretive frameworks might include integration of detailed clinicopathological information, physician reasoning and decision making, and longitudinal patient-outcome data. Such information would allow the development of a “learning system” to refine levels of evidence in a specific clinical context, which could then be applied to future patients. Such learning systems are now beginning to emerge in pilot studies, e.g., the American Society of Clinical Oncology’s TAPUR study (Targeted Agent and Profiling Utilization Registry) and the American Association for Cancer Research’s Project GENIE (Genomics Evidence Neoplasia Information Exchange). Ultimately, however, studies of patient outcomes following the use of sequencing to inform decision making are needed to permit assessment of the clinical utility of this technology.

Figure 4

A next-generation interpretive framework model. This model includes multiple types of -omic data and novel computational tools for pathway analysis and functional investigation of variants of unknown significance alterations. Such a model could enable the development of more refined levels of therapeutic evidence that can in turn aid in the decision-making process.

Matched germ-line sequence data are helpful in refining the analysis of tumor genomic data15,19,20,21 but require interpretation in their own right.10,18,21,22 In this study, we developed a list of 184 genes for analysis (Supplementary Table S4 online). Others have recently proposed similar lists of germ-line genes for cancer patients.10,22 In contrast to somatic alterations, only 5% of the germ-line variants evaluated were judged to be clinically relevant. This difference was due, in part, to different thresholds for return of somatic and germ-line results. Somatic alterations with theoretical or inferential levels of evidence (level E) were returned to the clinical team because of the potential value of therapeutic options for patients with advanced cancer who had exhausted the standard-of-care treatments. By contrast, the threshold for returning germ-line variants was higher, requiring known pathogenic or likely pathogenic findings for most cancer and noncancer genes. The high threshold for return of germ-line variants is based on the relevance of these alterations to inherited genetic conditions.

Most germ-line interpretation systems, including the one used in CanSeq, do not take into account the potential relevance of these variants on therapeutic decision making in cancer. This highlights another area in need of oncology research because patients harboring these germ-line mutations may benefit from emerging clinical trials due to the presence of germ-line findings or a combination of germ-line and somatic mutations. Indeed, many of the germ-line variants were in genes related to DNA repair, microsatellite instability, and chromosome instability. The current germ-line interpretation system focuses only on establishing pathogenic classification in relation to disease susceptibility. A secondary germ-line interpretation framework will be needed to focus on the classification of germ-line variants based on therapeutic implications and candidacy for clinical trials.

The use of a molecular tumor board was pioneered in the MiOncoseq study at the University of Michigan23 and has become a cornerstone of many cancer precision medicine programs. However, we found it difficult to scale a molecular tumor board that played a gating role in the process of variant evaluation and return ( Figure 3a ). Thus, over time, we incorporated the lessons of the board into a protocol-based framework that could more easily be scaled with patient enrollment ( Figure 3b , c ). The molecular tumor board is no longer “gating” for CanSeq reporting; the role and scope of the CanSeq molecular tumor board evolved to an institutional tumor board for genomics, serving as educational meetings for challenging or educational cases, much as a traditional tumor board is used in academic teaching hospitals.

Implementation of a prospective somatic and germ-line WES program for cancer precision medicine in patients with solid tumors requires robust analytic and interpretive approaches. Beyond this, challenges remain regarding the communication and utilization of clinical sequencing data. Ultimately, the use of clinical sequencing data will depend not only on the level of evidence provided by the interpretation but also on factors such as patient and clinician preference, the availability of therapies, the existence of alternative options, and other aspects of any specific clinical scenario. Furthermore, as more comprehensive sequencing tests are performed, clinicians and patients increasingly face the challenge of prioritizing multiple potentially actionable alterations found simultaneously. The use of experts in clinical cancer genome interpretation as part of a somatic genomics consult team or subspecialty clinic may be helpful in providing guidance in the near term, similar to the role played by medical subspecialists or genetic counselors. Studies of the impact on clinical decision making as well as patient and physician understanding of the results are ongoing and will begin to answer some of these questions as cancer precision medicine becomes more widely implemented.

An interpretive framework for assigning clinical meaning to somatic and germ-line sequencing data is an essential element of prospective genomic profiling in clinical decision making, although many aspects of these frameworks remain underdeveloped. The evolution and implementation of such frameworks in clinical sequencing programs are necessary for the development of cancer precision medicine.

Disclosure


N.W. is a stockholder in Foundation Medicine, is a consultant/adviser for Novartis, and receives sponsored research support from Novartis. L.A.G. is a stockholder in Foundation Medicine; is a consultant/adviser for Foundation Medicine, Novartis, and Boehringer Ingelheim; is a member of the Scientific Advisory Board at Warp Drive; and receives sponsored research support from Novartis. E.M.V.A. is a stockholder in Syapse; is a consultant/adviser for Roche, Third Rock Ventures, and Takeda; and receives sponsored research support from Bristol-Myers Squibb. The other authors declare no conflict of interest.