Toward automation of germline variant curation in clinical cancer genetics

A Correction to this article was published on 21 March 2019

This article has been updated

Abstract

Purpose

Cancer care professionals are confronted with interpreting results from multiplexed gene sequencing of patients at hereditary risk for cancer. Assessments for variant classification now require orthogonal data searches and aggregation of multiple lines of evidence from diverse resources. The clinical genetics community needs a fast algorithm that automates American College of Medical Genetics and Genomics (ACMG)-based variant classification and provides uniform results.

Methods

Pathogenicity of Mutation Analyzer (PathoMAN) automates germline genomic variant curation from clinical sequencing based on ACMG guidelines. PathoMAN aggregates multiple tracks of genomic, protein, and disease-specific information from public sources. To assess performance, we compared PathoMAN’s classifications against expertly curated variant data from clinical laboratories.

Results

PathoMAN achieved a high overall concordance of 94.4% for pathogenic and 81.1% for benign variants. We observed negligible discordance (0.3% pathogenic, 0% benign) when contrasted against expert-curated variants. Some loss of resolution (5.3% pathogenic, 18.9% benign) and gain of resolution (1.6% pathogenic, 3.8% benign) were also observed.

Conclusion

Automation of variant curation enables unbiased, fast, efficient delivery of results in both clinical and laboratory research. We highlight the advantages and weaknesses related to the programmable automation of variant classification. PathoMAN will aid in rapid variant classification by generating robust models using a knowledgebase of diverse genetic data (https://pathoman.mskcc.org).

INTRODUCTION

Genetic testing and targeted resequencing of cancer susceptibility genes to facilitate precision cancer prevention and early diagnosis have grown exponentially because of the decreasing costs of next-generation sequencing (NGS).1,2 A major challenge in clinical sequencing is interpreting sequence variants. The American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) have published guidelines on the interpretation of germline variants, considering both their pathogenicity and clinical actionability.3 Yet germline variant classification continues to impose an immense burden on the time and resources of diagnostic molecular laboratories and cancer care professionals. The ACMG classification schema requires manually exploring multiple lines of public data, other orthogonal data sources, and the literature, and then aggregating and scoring this evidence to classify variants.4 Currently, there is no widely available automated computational framework for classifying genetic variants based on ACMG criteria. We developed the Pathogenicity of Mutation Analyzer (PathoMAN), a computational resource that automates germline variant classification with uniformity, transparency, and speed, to facilitate variant curation for the cancer genetics community.

PathoMAN’s variant curation algorithm classifies germline genomic variants using a schema inspired by the ACMG/AMP classification.3 It aggregates multiple tracks of genetic and molecular evidence from variant annotators and from public repositories that contain the evidence needed for a pathogenicity assertion. The compiled data are then evaluated against 28 distinct criteria, which are grouped by variant type, biological impact, in silico predictions, presence in the control cohort, familial information, and inheritance mode. The aggregate score resulting from evaluation of these criteria is used to assign the variant to one of five classes: Pathogenic (P), Likely Pathogenic (LP), Benign (B), Likely Benign (LB), or Variant of Uncertain Significance (VUS).
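The step from met criteria to a five-tier class can be illustrated with the combining rules published in the ACMG/AMP guidelines (ref. 3, Table 5). The following is a minimal Python sketch, not PathoMAN’s implementation: the paper describes an aggregate score whose exact weighting is not given here, and the function names `tally` and `classify` are hypothetical. It assumes that the met criteria have already been determined and are passed in as codes such as "PVS1" or "PM2".

```python
from collections import Counter


def tally(criteria):
    """Count met criteria by strength using the code prefix, e.g. 'PM2' -> 'PM'."""
    return Counter(code.rstrip("0123456789") for code in criteria)


def classify(criteria):
    """Combine met ACMG/AMP criteria into P, LP, B, LB, or VUS.

    Follows the ACMG/AMP combining rules (Richards et al., 2015, Table 5).
    Illustrative sketch only; not PathoMAN's aggregate scoring.
    """
    c = tally(criteria)
    pvs, ps, pm, pp = c["PVS"], c["PS"], c["PM"], c["PP"]
    ba, bs, bp = c["BA"], c["BS"], c["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    # Rules met on both the pathogenic and the benign side are contradictory -> VUS.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "VUS"
    if pathogenic:
        return "P"
    if likely_pathogenic:
        return "LP"
    if benign:
        return "B"
    if likely_benign:
        return "LB"
    return "VUS"


print(classify(["PVS1", "PM2"]))         # LP
print(classify(["PVS1", "PM2", "PP3"]))  # P
print(classify(["BA1"]))                 # B
```

Pathogenic rules are checked before likely pathogenic ones, so a variant that satisfies both is reported at the stronger level, mirroring how the guideline table is read.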

PathoMAN’s performance was measured by reevaluating expertly curated germline cancer variants from three clinical testing laboratories: Ambry, Invitae, and GeneDx. We selected cancer susceptibility genes commonly tested in multiplex panels, many of which are on the ACMG recommended gene list. We also tested the algorithm on reported P/LP variants in non-ACMG heritable cancer risk and putative risk genes from four published cancer studies. In addition, we used ExAC (noTCGA) data to assess the frequency of clinically actionable variants in cancer susceptibility and predisposition genes in the general population. Together, these analyses test how ACMG criteria apply to germline cancer variants and how automated algorithms can relieve the bottleneck of variant curation.

MATERIALS AND METHODS

Test data sets

To test the performance of PathoMAN against manual curation, we selected variants that were reported in common by three CLIA-certified clinical testing laboratories (Ambry, Invitae, and GeneDx; 3513 variants in 27 genes) with identical assertions in ClinVar (version September 2018). Many variants in this test data set have also been reported by multiple other submitters in ClinVar. The Invitae data set consisted of 1494 B/LB, 608 P/LP, and 1412 VUS variants; the Ambry data set of 1517 B/LB, 646 P/LP, and 1351 VUS variants; and the GeneDx data set of 1512 B/LB, 633 P/LP, and 1369 VUS variants (Table S1). We used these data sets to test PathoMAN’s pathogenic and benign classifications.

We also tested PathoMAN on 300 P/LP variants in 55 genes from four published reports on several cancer types,5,6,7,8 for which ACMG criteria and primary assertions were available. We chose these data sets to test lesser-known cancer predisposition genes that are not on the ACMG list.

ACMG/AMP guidelines

The variant classification criteria used by PathoMAN are based on the ACMG/AMP guidelines,3 which comprise 16 criteria for classifying pathogenicity and 12 criteria for classifying benignity. This classification system resolves a variant as pathogenic or benign based on eight major categories of evidence: population frequency data, genomic annotation and computational predictive data, functional data, segregation data, de novo data, allelic/genotypic data, public databases, and literature and other data (Table 1). Their detailed usage is described in the “Determination of ACMG criteria for variant classification” section of the supplemental materials.
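To make the 16/12 split concrete, the criterion codes can be held in a small lookup table grouped by direction and strength. This is a minimal Python sketch; the names `CRITERIA`, `LOOKUP`, `count_criteria`, and `strength_of` are hypothetical, and the code lists follow the ACMG/AMP guidelines (ref. 3) rather than any published PathoMAN source.

```python
# ACMG/AMP criterion codes grouped by direction and evidence strength.
CRITERIA = {
    "pathogenic": {
        "very_strong": ["PVS1"],
        "strong": [f"PS{i}" for i in range(1, 5)],      # PS1-PS4
        "moderate": [f"PM{i}" for i in range(1, 7)],    # PM1-PM6
        "supporting": [f"PP{i}" for i in range(1, 6)],  # PP1-PP5
    },
    "benign": {
        "stand_alone": ["BA1"],
        "strong": [f"BS{i}" for i in range(1, 5)],      # BS1-BS4
        "supporting": [f"BP{i}" for i in range(1, 8)],  # BP1-BP7
    },
}

# Reverse index: code -> (direction, strength).
LOOKUP = {
    code: (direction, strength)
    for direction, levels in CRITERIA.items()
    for strength, codes in levels.items()
    for code in codes
}


def count_criteria(direction):
    """Number of criteria available in one direction."""
    return sum(len(codes) for codes in CRITERIA[direction].values())


def strength_of(code):
    """Direction and strength of a single criterion code."""
    return LOOKUP[code]


print(count_criteria("pathogenic"), count_criteria("benign"))  # 16 12
print(strength_of("PM2"))  # ('pathogenic', 'moderate')
```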

Table 1 Utilization of ACMG criteria and knowledgebase in PathoMAN variant curation

The results of PathoMAN were compared against the reported clinical assertions and are reported here in four categories: concordance, discordance, loss of resolution (LOR), and gain of resolution (GOR). Results are concordant when reported P/LP and B/LB variants are reclassified by PathoMAN as P/LP and B/LB, respectively, and discordant when reported P/LP and B/LB variants are reclassified as B/LB and P/LP, respectively. Reported P/LP or B/LB variants that PathoMAN reclassifies as VUS are placed in the LOR category, because PathoMAN cannot definitively classify them as either pathogenic or benign. Conversely, reported VUS that PathoMAN reclassifies as P/LP or B/LB are considered GOR, because PathoMAN resolves them as pathogenic or benign (Table S2). These evaluations help clarify how the eight ACMG categories of evidence are used in cancer genetics and how well they differentiate between P/LP, B/LB, and VUS.
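The four comparison categories amount to a small decision rule on each (reported, reclassified) pair. The sketch below is a hypothetical Python illustration of that rule and of per-class percentages; the text does not name the VUS-to-VUS case, so counting it as agreement is an assumption flagged in the code, and the example pairs are toy values, not study data.

```python
from collections import Counter

# Collapse five-tier labels to the three tiers used in the comparison.
TIER = {"P": "P/LP", "LP": "P/LP", "B": "B/LB", "LB": "B/LB", "VUS": "VUS"}


def compare(reported, predicted):
    """Outcome category for one variant: reported label vs. PathoMAN label."""
    r, p = TIER[reported], TIER[predicted]
    if r == p:
        return "concordant"  # includes VUS -> VUS (assumption; not named in the text)
    if r == "VUS":
        return "GOR"         # reported VUS resolved to P/LP or B/LB
    if p == "VUS":
        return "LOR"         # reported P/LP or B/LB called VUS
    return "discordant"      # P/LP <-> B/LB


def summarize(pairs):
    """Percentage of each outcome within each reported tier."""
    counts = {}
    for reported, predicted in pairs:
        counts.setdefault(TIER[reported], Counter())[compare(reported, predicted)] += 1
    return {
        tier: {k: round(100 * v / sum(c.values()), 1) for k, v in c.items()}
        for tier, c in counts.items()
    }


# Hypothetical toy pairs (reported, PathoMAN), not data from the study.
example = [("P", "LP"), ("LP", "VUS"), ("B", "LB"), ("LB", "P"), ("VUS", "LB"), ("VUS", "VUS")]
print(summarize(example))
```

Percentages are computed within each reported class, which matches how concordance, LOR, and GOR are expressed in the Results (e.g., "5.3% of reported P/LP variants").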

ExAC subset of cancer predisposition genes

The Exome Aggregation Consortium9 (ExAC) is a joint effort to aggregate exome sequencing data from 14 large sequencing projects and provide summary data, such as ethnicity-specific allele frequencies, to the wider scientific community. The ExAC noTCGA data set is a subset of 53,105 samples that excludes The Cancer Genome Atlas (TCGA) cancer germline samples (n = 7601). ExAC is often used as a convenience control set for several cancers6,10,11 in case–control designs. We wanted to estimate the burden of variants in ExAC noTCGA as classified by PathoMAN and contrast it with known information in ClinVar. We selected 55,566 variants in exonic or essential splice-site regions from 76 known and putative cancer risk genes (Table S3). This set was not treated as a test data set like those described earlier, because ExAC data are themselves used to evaluate ACMG criteria PS4, PM2, BA1, BS1, and BS2.
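Because several criteria depend on control allele frequency, a minimal sketch of how allele count (AC) and allele number (AN) from a control population such as ExAC noTCGA might be turned into frequency criteria is shown below. Only the BA1 threshold (allele frequency above 5%) comes from the ACMG/AMP guidelines; the BS1 cutoff is disease specific, so the 1% default is a placeholder, and PM2 is simplified to "absent at a covered site". PS4 and BS2 need case–control counts or genotype observations in healthy adults and are not covered. The function name is hypothetical and this is not PathoMAN’s code.

```python
def frequency_criteria(ac, an, bs1_cutoff=0.01):
    """Population-frequency criteria met by a variant, given allele count (AC)
    and allele number (AN) in a control population.

    BA1: allele frequency > 5% (ACMG/AMP stand-alone benign threshold).
    BS1: allele frequency above a disease-specific cutoff; the 1% default is a
         placeholder, not a value taken from PathoMAN.
    PM2: simplified here to "absent (AC == 0) at a site with coverage (AN > 0)".
    """
    if an <= 0:
        return []  # no control data at this site; make no frequency call
    af = ac / an
    met = []
    if af > 0.05:
        met.append("BA1")  # stand-alone benign; BS1 is not added separately
    elif af > bs1_cutoff:
        met.append("BS1")
    if ac == 0:
        met.append("PM2")
    return met


print(frequency_criteria(700, 10_000))  # ['BA1']
print(frequency_criteria(0, 10_000))    # ['PM2']
```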

RESULTS

We developed PathoMAN, a germline variant classification algorithm, which is provided freely as a web-based service that allows users to either query single variants or batch upload a comma-separated values (CSV) file. A single-variant query takes the chromosome, position, reference allele, alternative allele, allele count, allele number, de novo status, cosegregation status, and preferred control population subgroup. Batch upload requires six columns [chr, pos, ref, alt, ac, an] in a CSV file, and users can select de novo status, cosegregation status, and preferred control population subgroup. The program converts the CSV file to a minimal VCF4.2 file, annotates it, and prepares it for PathoMAN variant classification. The result of a single-variant query is displayed on the webpage immediately, whereas batch upload results are emailed to the submitter. For an annotated VCF file containing 100 variants, PathoMAN takes 6 minutes, or 3.6 seconds per variant. In contrast, an expert reviewer may take 20–30 minutes to classify a novel variant. This gives automated classification a large advantage over manual curation in speed, uniformity, efficiency, and service assurance.
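The CSV-to-VCF conversion step can be sketched with Python’s standard `csv` module. This is a hypothetical illustration of producing a minimal sites-only VCF 4.2 from the six batch-upload columns, with AC and AN carried in the INFO field; PathoMAN’s actual converter, and how it handles sorting, normalization, or multiallelic sites, are not described in the text.

```python
import csv
import io

VCF_HEADER = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count">',
    '##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
REQUIRED = {"chr", "pos", "ref", "alt", "ac", "an"}


def csv_to_minimal_vcf(csv_text: str) -> str:
    """Convert a batch-upload CSV (chr,pos,ref,alt,ac,an) to a minimal VCF 4.2 string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    lines = list(VCF_HEADER)
    for row in reader:
        info = f"AC={int(row['ac'])};AN={int(row['an'])}"
        # ID, QUAL, and FILTER are unknown for uploaded sites, so use '.'.
        lines.append("\t".join([
            row["chr"].strip(), str(int(row["pos"])), ".",
            row["ref"].strip().upper(), row["alt"].strip().upper(),
            ".", ".", info,
        ]))
    return "\n".join(lines) + "\n"


# Hypothetical one-variant upload.
print(csv_to_minimal_vcf("chr,pos,ref,alt,ac,an\n17,1000,G,A,1,2\n"))
```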

PathoMAN versus manual curation for test data sets

To evaluate the performance of PathoMAN, we compared its results against clinical assertions from three clinical laboratories. The test data set contained 3513 variants with prior curation reported by Ambry Genetics, Invitae, and GeneDx. Interlaboratory agreement across the three laboratories was 84%; the 16% of variants on which the laboratories disagreed were treated as VUS when comparing against PathoMAN results. Of the 3513 variants, missense variants accounted for 54.2%, synonymous variants for 24.7%, frameshift variants for 7.5%, and stopgain variants for 4.7%; the rest were splice variants, in-frame insertions/deletions, and stoploss variants (Fig. 1a, Table S4). Variants were annotated with CAVA, ANNOVAR, ExAC noTCGA, gnomAD, and ClinVar and then run through PathoMAN.

Fig. 1: Comparison of PathoMAN results against expertly curated variants.

(a) Distribution of the 3513 variants in the test data set by variant class. EE, variant that alters the first or last three bases of an exon (i.e., the exon end) but not the frame of the coding sequence; ESS, any variant that alters an essential splice-site base (+1, +2, −1, −2); FS, frameshifting insertion and/or deletion that alters the length and frame of the coding sequence; IF, in-frame insertion and/or deletion that alters the length but not the frame of the coding sequence; IM, variant that alters the initiating methionine start codon; NSY, nonsynonymous variant that alters amino acid(s) but not coding sequence length; SG, stopgain (nonsense) variant caused by base substitution; SS, any variant that alters a splice-site base within the first eight intronic bases flanking an exon (i.e., +8 to −8) but not an ESS or SS5 base; SS5, any variant that alters the +5 splice-site base but not an ESS base; SY, synonymous variant that alters neither amino acid nor coding sequence length; 3PU, any variant in the 3′ untranslated region; 5PU, any variant in the 5′ untranslated region. (b) Performance of PathoMAN’s variant classification against variant classification from three clinical laboratories: Ambry Genetics, Invitae, and GeneDx. B/LB, benign/likely benign; GOR, gain of resolution; LOR, loss of resolution; P/LP, pathogenic/likely pathogenic. (c) Concordance of PathoMAN and clinical laboratory results for reported P/LP variants grouped by gene penetrance (1257 variants in 27 genes [Table S5]). (d) Concordance of PathoMAN and published reports for reported P/LP variants from four cancer studies (300 variants in 55 genes: Mandelker et al.7, 97 variants; Maxwell et al.5, 40 variants; Pritchard et al.6, 56 variants; and Zhang et al.8, 107 variants [Table S7]).

Comparing the clinical assertions from the three laboratories with PathoMAN’s results, observed agreement was 90% (3153 of 3513 variants), and Cohen’s kappa coefficient for interrater agreement (κ) was 0.83 (95% confidence interval [CI] 0.82–0.85).
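Cohen’s kappa corrects observed agreement for the agreement expected by chance from each rater’s marginal class frequencies, κ = (p_o − p_e) / (1 − p_e). The minimal Python sketch below computes the point estimate from a square agreement table; the function name and the example counts are hypothetical, and the paper does not state how its confidence interval was derived, so no interval is computed here.

```python
def cohens_kappa(matrix):
    """Cohen's kappa for a square agreement matrix.

    matrix[i][j] = number of variants placed in class i by rater A
    (e.g. the clinical laboratory) and in class j by rater B (e.g. PathoMAN).
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    p_observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: product of matching marginal proportions, summed over classes.
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


print(cohens_kappa([[20, 5], [10, 15]]))  # ≈ 0.4

# Hypothetical 3x3 table, rows/columns ordered P/LP, VUS, B/LB (not study data).
table = [[30, 5, 1], [4, 20, 6], [2, 3, 29]]
print(round(cohens_kappa(table), 3))
```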

PathoMAN achieved a concordance of 94.4% for P/LP and 81.1% for B/LB variants (Fig. 1b). It showed 100% concordance for P/LP frameshift, splice-site, and truncating variants, and for B/LB synonymous variants. Discordance was minimal, at 0.3% for P/LP variants (n = 2): PathoMAN reclassified 2 P/LP variants as B/LB, a synonymous variant and an extended splice-site variant (Table 2). It also failed to resolve 22 P/LP missense variants. The synonymous PMS2 variant (c.825A>G p.Gln275Gln) has been shown experimentally in the literature to cause aberrant splicing, and the MSH2 splice variant (c.942+3A>T) has been demonstrated to disrupt messenger RNA (mRNA) splicing, resulting in skipping of exon 5. Because PathoMAN makes its assertions algorithmically in real time, it does not capture such variant-specific experimental evidence from the literature. In future versions of the program, we intend to incorporate a consensus splice prediction module and a literature-based evidence module, which would aid in the correct classification of these edge-case variants. No B/LB variants were reclassified as P/LP.

Table 2 Distribution of PathoMAN reclassified germline cancer variants by variant class

PathoMAN reclassified 5.3% (n = 32) of reported P/LP variants and 18.9% (n = 238) of reported B/LB variants as VUS. Fifty-three percent of these P/LP variants (17/32) and 74.4% of these B/LB variants (177/238) had previously been classified as VUS in ClinVar by at least one submitter, suggesting that a consensus assertion has not been reached for these variants because of insufficient clinical information. Conversely, PathoMAN reclassified 1.6% (n = 26) and 3.8% (n = 63) of reported VUS as P/LP and B/LB, respectively. Sixty-two percent of these reported VUS (55/89) had been classified as P/LP or B/LB by at least one submitter among the three laboratories and in ClinVar. A major source of this gain of resolution is PathoMAN’s use of saturation mutagenesis experiments on BRCA1.12 Other reasons for reclassification are changes between ClinVar versions and updates to public allele frequencies in ExAC and gnomAD.

When variants were grouped by high- and low-penetrance genes (Fig. 1c, Table S5), we calculated the absolute difference between classifications and observed Cohen’s kappa coefficients for interrater agreement (κ) of 0.82 (95% CI 0.805–0.843) and 0.84 (95% CI 0.77–0.92), respectively.

When compared against 300 P/LP variants in putative cancer predisposition genes from four published cancer studies, PathoMAN showed 96.4% concordance for Pritchard et al.6 (prostate cancer), 87.5% for Maxwell et al.5 (breast cancer), 84.5% for Mandelker et al.7 (multiple cancer types), and 80.3% for Zhang et al.8 (pediatric cancer) (Fig. 1d).

PathoMAN results for ExAC data set

PathoMAN classified <1% of the heterozygous genotypes in the ExAC noTCGA data set (55,566 exonic and essential splice-site variants from 76 cancer risk genes) as P/LP. We tabulated pathogenic variant burden by gene and compared it against ClinVar (Table 3). For high-risk cancer genes such as BRCA1 and BRCA2, PathoMAN calls a similar number of P/LP variants to that reported in ClinVar. PathoMAN also predicts a few novel P/LP variants that are unreported in ClinVar. Investigators who intend to use the ExAC noTCGA data set as controls in a gene burden test against sequenced cancer cases can use PathoMAN to obtain a list of P/LP variants across genes.
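As an illustration of the per-gene tabulation described above, the sketch below sums P/LP variants and their allele counts by gene. The input shape, PathoMAN labels joined with ExAC allele counts as (gene, classification, allele_count) tuples, is a hypothetical assumption, as are the function name and toy records; a real burden test would also need the control sample size (AN/2 for diploid genotypes) and matched case counts, which are omitted here.

```python
from collections import defaultdict


def plp_burden(variants):
    """Per-gene count of P/LP variants and their summed allele counts.

    `variants` is an iterable of (gene, classification, allele_count) tuples,
    a hypothetical shape for PathoMAN output joined with control AC values.
    """
    burden = defaultdict(lambda: {"variants": 0, "alleles": 0})
    for gene, label, ac in variants:
        if label in ("P", "LP"):
            burden[gene]["variants"] += 1
            burden[gene]["alleles"] += ac
    return dict(burden)


# Hypothetical toy records, not ExAC data.
records = [("BRCA1", "P", 3), ("BRCA1", "VUS", 10), ("BRCA2", "LP", 2), ("BRCA2", "P", 1)]
print(plp_burden(records))
```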

Table 3 Comparison of PathoMAN P/LP burden versus ClinVar P/LP burden for ExAC (noTCGA) variants by cancer susceptibility genes

Usage of ACMG/AMP categories in PathoMAN

We analyzed how usable, and how frequently used, each of the eight categories of evidence described in the ACMG/AMP guidelines was (population frequency, genomic annotation and computational prediction, functional evidence, cosegregation, de novo status, allelic/genotypic data, public databases, and scientific literature and other data). The most frequently used categories are population frequency data, genomic annotation and computational predictions, databases, and scientific literature (Fig. 2a); these data are readily available owing to generous data- and toolkit-sharing policies in the genomics field. The categories that are rarely if ever used are familial cosegregation or de novo status, allelic data, and functional data. In the current version, we have also used ClinVar’s review status to upweight functional evidence, because we believe that review status reflects the literature evidence reported for a variant (Fig. 2b). Cosegregation and de novo status data are limited to familial studies and are mostly unavailable in sporadic case–control settings, because investigators collect them from patient input. Under the ACMG combining rules, a pathogenic or likely pathogenic classification never requires more than one PVS1, two PS, three PM, or four PP criteria, and the knowledgebase and resources used by PathoMAN were demonstrably sufficient to supply evidence at these levels. We describe below the bottlenecks in sharing this information and propose a novel framework to circumvent and ameliorate these issues.

Fig. 2: Comparison of PathoMAN results against expertly curated variants.

(a) Utilization of knowledgebase components by PathoMAN during variant curation of the test data sets. (b) Ratio of the number of articles per variant across the review status levels reported in ClinVar.

DISCUSSION

PathoMAN as a tool to aid variant curation

Traditionally, genetic variant curation has been performed manually by groups of experts. However, this is a time-intensive task that requires aggregating and interpreting information from multiple sources. In the cancer realm, the task was relatively manageable when only one or two genes, e.g., BRCA1/2, were under investigation. In contemporary testing scenarios, which routinely rely on multiplex gene panels, it is onerous. Large gene discovery efforts, as well as clinical reporting, could benefit from a simplified, automated method that prioritizes variants for closer review or, in the best case, serves as the classification tool of choice. PathoMAN addresses this critical unmet need for an unbiased algorithmic approach to classifying genetic variants of clinical interest in cancer predisposition. PathoMAN can be accessed through a web browser; results for individual variants are available almost immediately, while results of a batch query may take a few minutes.

Genetic testing laboratories are increasingly using ACMG/AMP classification rules to classify variants for pathogenicity within cancer predisposition genes. However, results vary depending on the availability of accessible data and on interpretational differences.13 Using the ACMG guidelines, concordance between CLIA-certified laboratories ranged from 37% before to 71% after a consultative process. We observed 84% interlaboratory agreement across all three clinical laboratories in our test data set. Initiatives such as ClinGen are under way to narrow interpretational differences. In a recent report,14 13% of the ClinVar variants that were reanalyzed remained unresolved, underscoring the difficulties even for expert curator groups. For manual or automated curation, the minimal set of information required to classify a variant as likely pathogenic or likely benign includes population frequency, in silico predictions, and prior reported evidence of pathogenicity from public databases. PathoMAN compiles this information uniformly in a machine-readable format that serves as a knowledgebase for variant classification. An additional advantage of PathoMAN is that it can readily identify benign variants based on public allele frequency and genomic context. In a typical multiplexed gene panel variant list, after filtering for rare high- or moderate-impact variants only, PathoMAN will classify about one-third of the variants as B/LB with high precision. This saves time and effort for variant curators and helps them focus on curating the remaining, potentially actionable variants. PathoMAN will also identify known founder variants.

Cancer is a complex disease with multigene etiology. Some cancer genes confer high risk, whereas others only moderately affect the carrier’s risk. Panel testing currently guides active surveillance and interventions to lower disease risk. Large sequencing and genotyping efforts to discover new cancer predisposition genes are being carried out by consortia such as BCAC,15 SIMPLEXO,16 COMPLEXO,17 and CIMBA.18 As the cost of sequencing decreases, the number of genes tested is increasing, and automation allows rapid processing, service assurance, and reproducibility of results. Gold standard sets of curation pioneered by ClinGen19,20 would aid in refining these pathogenicity classifications further, while efforts such as the PROMPT21,22 registry enable accurate penetrance estimates for variants in susceptibility genes. The PROMPT registry has identified a 26% discordance rate among clinical laboratories and an 11% rate of conflicting interpretations, discrepancies that could alter medical management.

Many laboratories and certain programs, such as CardioClassifier23 and InterVar,24 use prior knowledge of disease–gene associations. This helps reduce misclassification of variants, but it cannot be applied to lesser-known disease–gene pairs or to novel gene discovery. In a recent report,7 we showed that, in a series of selected advanced cancers at a single institution, half of the cases represented nonsyndromic associations: probands or their close relatives had clinically actionable variants in cancer genes with known syndromic associations, but not with the specific cancers observed. PathoMAN currently does not use syndromic context when assessing the pathogenicity of variants. This is a distinct advantage when searching for novel genetic associations. In clinical sequencing, however, we acknowledge that limiting analysis to known disease–gene pairs reduces false positives. In future versions, we hope to offer both clinical and gene discovery modes.

Variants that are manually curated as P/LP or B/LB but called VUS by PathoMAN are grouped under the loss of resolution (LOR) category. This loss of resolution reflects a lack of accessible supporting evidence, which can arise for several reasons, e.g., the inability to programmatically parse free text in public databases, limited access to up-to-date versions of databases such as HGMD and LOVD, and the unavailability of in-house functional evidence25 or familial cosegregation information.26

The upgrade of VUS to P/LP or B/LB by PathoMAN relies on three categories of evidence: lines of evidence available in public databases, population frequency, and computational (in silico) predictions of deleteriousness. These variants could be further reclassified as pathogenic or benign if additional functional or cosegregation data became available to the user.

Commercial testing laboratories have proprietary interpretation pipelines, such as Sherloc27 (Invitae Corporation) and MyVISION (Myriad Genetics), but these are unavailable to the community at large. PathoMAN is designed to provide an optimized platform for clinical variant classification using publicly available data resources.

Using ACMG for variant classification in cancer

Variants in tumor suppressors and oncogenes lead to tumorigenesis, and the Knudson two-hit hypothesis28 applies in many common cancers; common examples include APC, TP53, and BRCA1/2. However, several of these genes, especially those in the Fanconi anemia pathway (FANCS-BRCA1, FANCD1-BRCA2, FANCJ-BRIP1, FANCN-PALB2, FANCP-SLX4, FANCO-RAD51C), as well as the genes for neurofibromatosis (NF1), ataxia–telangiectasia (ATM), Bloom syndrome (BLM), Nijmegen breakage syndrome (NBN), and dyskeratosis congenita (TERT), which cause rare, often autosomal recessive, Mendelian disorders, are also risk genes for autosomal dominant cancer predisposition. Heterozygous carriers of variants in these genes are reported to have increased risks for syndromic cancers.29 Occasionally, gene-disrupting heterozygous variants in these genes that are rare or absent in public controls such as ExAC and gnomAD are observed in sequenced cancer cohorts. Their ClinVar pathogenicity records are usually based on the recessive Mendelian syndrome rather than on cancer phenotypes. Hence, applying the ACMG rules to genes that are not on the ACMG list may lead to misclassification. However, we believe that continuing data streams for variants in these genes will lead to better classifications, especially when coupled with familial cosegregation and functional validation. While PathoMAN classifications for such genes are a useful starting point for identifying potentially pathogenic variants and discarding benign ones, we emphasize the need for expert manual curation to disentangle these issues.

Limitations of automation

Automating variant classification based on publicly available information has some pitfalls. The supporting evidence that submitters provide for variants in ClinVar is largely free text that is not computation friendly and requires manual curation to interpret. In several instances, the citations are not relevant to the specific records. Technologies such as natural language processing and tagging may eventually help to build a knowledgebase that can in turn be used for deep learning.

ACMG guidelines do not account for functional evidence provided in ClinVar (supporting observations), which leads to a loss of information that could otherwise be used in variant classification. Because this evidence is unstructured free text, even bona fide variants in ClinVar can be coded only as PP5 or BP6, not as PS3 or BS3. We used a ClinVar review status of two stars or more as a proxy for functional evidence. Not all submitters are equipped to perform, or actually perform, independent analyses of functional evidence for their clinical assertions. If ClinVar evidence were informatically coded, molecular geneticists and clinical curators could use it in their pathogenicity estimation. For example, the TP53 (R273H), BRCA1 (Y105C), and BRCA1 (V1688del) variants have overwhelming literature evidence (Figure S1); however, the evidence in the descriptions of their ClinVar submissions cannot be derived computationally. Similarly, many variants reported in the literature may have some level of supporting evidence for pathogenicity or benignity in ClinVar. Currently, all such data integration is done by manual curators on a case-by-case basis.
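The review-status proxy can be sketched as a small mapping from a ClinVar record to ACMG codes. In this hypothetical Python illustration, PP5/BP6 come from the ClinVar assertion itself, and PS3/BS3 are added when the record reaches two or more review stars. Whether PathoMAN adds PS3/BS3 alongside PP5/BP6 or substitutes them is not stated in the text, so the "add alongside" choice here is an assumption; the star values follow ClinVar’s displayed review-status levels.

```python
# ClinVar review status -> gold stars (0-4), as displayed by ClinVar.
REVIEW_STARS = {
    "practice guideline": 4,
    "reviewed by expert panel": 3,
    "criteria provided, multiple submitters, no conflicts": 2,
    "criteria provided, single submitter": 1,
    "criteria provided, conflicting interpretations": 1,
    "no assertion criteria provided": 0,
}

PATHOGENIC = {"pathogenic", "likely pathogenic", "pathogenic/likely pathogenic"}
BENIGN = {"benign", "likely benign", "benign/likely benign"}


def clinvar_codes(significance, review_status, min_stars=2):
    """ACMG codes derived from one ClinVar record (illustrative proxy rule).

    PP5/BP6 come from the ClinVar assertion itself; PS3/BS3 are added as a
    proxy for functional evidence when the review status reaches `min_stars`.
    Unknown review-status strings are treated as 0 stars.
    """
    stars = REVIEW_STARS.get(review_status.strip().lower(), 0)
    sig = significance.strip().lower()
    if sig in PATHOGENIC:
        return ["PP5", "PS3"] if stars >= min_stars else ["PP5"]
    if sig in BENIGN:
        return ["BP6", "BS3"] if stars >= min_stars else ["BP6"]
    return []


print(clinvar_codes("Pathogenic", "reviewed by expert panel"))  # ['PP5', 'PS3']
print(clinvar_codes("Likely benign", "criteria provided, single submitter"))  # ['BP6']
```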

We propose a framework for reporting ClinVar data in a structured form that an automated algorithm can parse in the context of cancer. The format consists of six important fields that condense the vast information present in the literature or clinical reports:

  1. Population/ethnicity (Non-Finnish European [NFE], Ashkenazi Jewish [ASJ], African/African American [AFR], South Asian [SAS], Latino [AMR], Finnish [FIN], East Asian [EAS], and Other [OTH])

  2. Inheritance model (autosomal dominant [AD], autosomal recessive [AR], de novo, X-linked)

  3. Allelic status (Hom, Het)

  4. Family history/cosegregation information (Yes = 1; No = 0)

  5. Disease association (TCGA code/OncoTree code, http://oncotree.mskcc.org)

  6. Functional evidence (experiment type: nonsense-mediated decay [NMD], loss of heterozygosity [LOH], etc.)

For example, the ERCC3 (R109X) variant30 can be encoded as ASJ-AD-Het-1:1-BRCA, BLCA-NMD. This variant was seen in Ashkenazi Jewish individuals, with autosomal dominant inheritance of the heterozygous allele, and it cosegregated in one family with a history of cancer. It was found in individuals with breast cancer and bladder cancer, and functional evidence for pathogenicity came from testing for nonsense-mediated decay and other experiments.
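To show that the proposed six-field string is machine-parsable, here is a minimal Python parser for the ERCC3 example. The class and function names are hypothetical, and reading the "1:1" field as "cosegregation flag : number of families" is an assumption inferred from the example’s description. One design wrinkle the sketch exposes is that the field delimiter "-" also occurs inside the value "X-linked", so that value is protected before splitting; a delimiter that cannot appear in any value would avoid this.

```python
from dataclasses import dataclass
from typing import List

POPULATIONS = {"NFE", "ASJ", "AFR", "SAS", "AMR", "FIN", "EAS", "OTH"}


@dataclass
class StructuredEvidence:
    population: str
    inheritance: str
    allelic_status: str
    family_history: bool
    cosegregating_families: int
    diseases: List[str]
    functional: List[str]


def parse_evidence(code: str) -> StructuredEvidence:
    """Parse a six-field, hyphen-delimited evidence string such as
    'ASJ-AD-Het-1:1-BRCA, BLCA-NMD'."""
    # 'X-linked' contains the delimiter itself, so protect it before splitting.
    parts = [p.strip() for p in code.replace("X-linked", "X_linked").split("-")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {code!r}")
    population, inheritance, allelic, family, disease, functional = parts
    if population not in POPULATIONS:
        raise ValueError(f"unknown population code: {population!r}")
    # Assumed reading of '1:1': cosegregation flag, then number of families.
    flag, _, n_families = family.partition(":")
    return StructuredEvidence(
        population=population,
        inheritance=inheritance.replace("X_linked", "X-linked"),
        allelic_status=allelic,
        family_history=flag == "1",
        cosegregating_families=int(n_families) if n_families else 0,
        diseases=[d.strip() for d in disease.split(",") if d.strip()],
        functional=[f.strip() for f in functional.split(",") if f.strip()],
    )


print(parse_evidence("ASJ-AD-Het-1:1-BRCA, BLCA-NMD"))
```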

Large sequencing studies and gene-specific functional studies provide curated lists of variants with their pathogenic impacts, such as the TP53 database31 and functional studies of PALB2 variants.32,33 As a pilot project, we have collected PALB2 and TP53 variants from these sources into PathoMAN’s knowledgebase as supporting functional evidence (PS3/BS3), but there is a real need for a publicly available, well-curated list of variants from the literature that is amenable to programmatic interpretation. Similarly, as standards evolve for incorporating somatic variants into germline interpretation, we expect such events to be integrated for well-established tumor suppressors and oncogenes. The roles played by the ENIGMA consortium,34,35 GA4GH (https://www.ga4gh.org/), and BRCA-Share36 in this regard are commendable. Although clinical laboratories collaborate to resolve differences in variant interpretations submitted to ClinVar,14 a unified framework for incorporating supporting machine-readable evidence into any variant database, including ClinVar, remains a critical bottleneck.

Functional data are rarely available for most genes; BRCA1/2 are exceptions owing to the concerted efforts of the ENIGMA consortium.34,35 In single-variant reports, data are usually buried in scientific jargon that is not readily linked to genomic variant information. In many instances, functional data depend on the model used, e.g., overexpression of a mutant construct, deletion of a region using a CRISPR endonuclease, or introduction of the specific nucleotide change through homology-directed DNA repair, and the results from these three methods may not agree. Novel methods that assess deleteriousness using saturation mutagenesis are also starting to emerge;37,38 for BRCA1 (ref. 12), for example, we have incorporated the resulting loss-of-function information into PathoMAN’s algorithm. We hope these methods will add a uniform layer of functional data for determining pathogenicity in the coming years.

In conclusion, we performed pathogenicity assessment of 59,379 variants in germline cancer risk genes, the first and largest uniform classification using an unbiased computational tool. The high concordance and low discordance we observed relative to manual curation are a harbinger of how such programs will soon be able to help domain experts and manual curators. PathoMAN is a first step toward our goal of automating the complex process of variant classification and interpretation. A beta version of the web app is available at https://pathoman.mskcc.org/.

URLs

1. CAVA: https://github.com/RahmanTeam/CAVA

2. SnpEff: http://snpeff.sourceforge.net/SnpEff_manual.html

3. ANNOVAR: https://annovar.openbioinformatics.org

4. ExAC noTCGA: http://exac.broadinstitute.org

5. gnomAD: http://gnomad.broadinstitute.org/

6. ClinVar: https://www.ncbi.nlm.nih.gov/clinvar/

7. IARC database: http://p53.iarc.fr/

8. ClinVar parser tool: https://github.com/macarthur-lab/clinvar

9. dbNSFP and dbscSNV: https://sites.google.com/site/jpopgen/dbNSFP

10. Gene List: https://github.com/macarthur-lab/gene_lists

11. RepeatMasker: http://www.repeatmasker.org/

12. UCSC Genome Browser: https://genome.ucsc.edu

13. CardioClassifier: https://www.cardioclassifier.org/

14. InterVar: https://github.com/WGLab/InterVar

15. ACMG: https://www.acmg.net/

16. ClinVar Miner: https://clinvarminer.genetics.utah.edu/

17. PathoMAN: http://pathoman.mskcc.org/

18. Ambry: https://ambrygen.com/

19. Invitae: https://www.invitae.com/en/

20. GeneDx: https://www.genedx.com/

Change history

  • 21 March 2019

    The originally published conflict of interest statement for author Liying Zhang, PhD, has been updated: L.Z. received compensation from Future Technology Research LLC (seminar on precision medicine), Roche Diagnostics Asia Pacific, BGI, and Illumina (speaking activities at conferences/workshops). A family member of L.Z. holds a leadership position and an ownership interest in Shanghai Genome Center. This correction has been made.

References

  1. Offit K. Multigene testing for hereditary cancer: when, why, and how. J Natl Compr Canc Netw. 2017;15:741–743.

  2. Desmond A, Kurian AW, Gabree M, et al. Clinical actionability of multigene panel testing for hereditary breast and ovarian cancer risk assessment. JAMA Oncol. 2015;1:943–951.

  3. Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17:405–424.

  4. Pandey KR, Maden N, Poudel B, et al. The curation of genetic variants: difficulties and possible solutions. Genomics Proteomics Bioinformatics. 2012;10:317–325.

  5. Maxwell KN, Hart SN, Vijai J, et al. Evaluation of ACMG-guideline-based variant classification of cancer susceptibility and non-cancer-associated genes in families affected by breast cancer. Am J Hum Genet. 2016;98:801–817.

  6. Pritchard CC, Mateo J, Walsh MF, et al. Inherited DNA-repair gene mutations in men with metastatic prostate cancer. N Engl J Med. 2016;375:443–453.

  7. Mandelker D, Zhang L, Kemel Y, et al. Mutation detection in patients with advanced cancer by universal sequencing of cancer-related genes in tumor and normal DNA vs guideline-based germline testing. JAMA. 2017;318:825–835.

  8. Zhang J, Walsh MF, Wu G, et al. Germline mutations in predisposition genes in pediatric cancer. N Engl J Med. 2015;373:2336–2346.

  9. Lek M, Karczewski KJ, Minikel EV, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285–291.

  10. Hu C, Hart SN, Polley EC, et al. Association between inherited germline mutations in cancer predisposition genes and risk of pancreatic cancer. JAMA. 2018;319:2401–2409.

  11. Robinson DR, Wu YM, Lonigro RJ, et al. Integrative clinical genomics of metastatic cancer. Nature. 2017;548:297–303.

  12. Findlay GM, Daza RM, Martin B, et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature. 2018;562:217–222.

  13. Amendola LM, Jarvik GP, Leo MC, et al. Performance of ACMG-AMP variant-interpretation guidelines among nine laboratories in the Clinical Sequencing Exploratory Research Consortium. Am J Hum Genet. 2016;99:247.

  14. Harrison SM, Dolinsky JS, Knight Johnson AE, et al. Clinical laboratories collaborate to resolve differences in variant interpretations submitted to ClinVar. Genet Med. 2017;19:1096–1104.

  15. Breast Cancer Association Consortium. Commonly studied single-nucleotide polymorphisms and breast cancer: results from the Breast Cancer Association Consortium. J Natl Cancer Inst. 2006;98:1382–1396.

  16. Hart SN, Maxwell KN, Thomas T, et al. Collaborative science in the next-generation sequencing era: a viewpoint on how to combine exome sequencing data across sites to identify novel disease susceptibility genes. Brief Bioinform. 2016;17:672–677.

  17. Complexo, Southey MC, Park DJ, et al. COMPLEXO: identifying the missing heritability of breast cancer via next generation collaboration. Breast Cancer Res. 2013;15:402.

  18. Chenevix-Trench G, Milne RL, Antoniou AC, et al. An international initiative to identify genetic modifiers of cancer risk in BRCA1 and BRCA2 mutation carriers: the Consortium of Investigators of Modifiers of BRCA1 and BRCA2 (CIMBA). Breast Cancer Res. 2007;9:104.

  19. Patel RY, Shah N, Jackson AR, et al. ClinGen Pathogenicity Calculator: a configurable system for assessing pathogenicity of genetic variants. Genome Med. 2017;9:3.

  20. Rehm HL, Berg JS, Brooks LD, et al. ClinGen—the Clinical Genome Resource. N Engl J Med. 2015;372:2235–2242.

  21. Balmana J, Digiovanni L, Gaddam P, et al. Conflicting interpretation of genetic variants and cancer risk by commercial laboratories as assessed by the Prospective Registry of Multiplex Testing. J Clin Oncol. 2016;34:4071–4078.

  22. Walsh MF, Gaddam P, Digiovanni L, et al. Prospective registry of multiplex testing (PROMPT): a web-based platform to assess cancer risk of genetic variants. J Clin Oncol. 2016;34(15 suppl):1518.

  23. Whiffin N, Walsh R, Govind R, et al. CardioClassifier: disease- and gene-specific computational decision support for clinical genome interpretation. Genet Med. 2018;20:1246–1254.

  24. Li Q, Wang K. InterVar: clinical interpretation of genetic variants by the 2015 ACMG-AMP Guidelines. Am J Hum Genet. 2017;100:267–280.

  25. Kelly MA, Caleshu C, Morales A, et al. Adaptation and validation of the ACMG/AMP variant classification framework for MYH7-associated inherited cardiomyopathies: recommendations by ClinGen’s Inherited Cardiomyopathy Expert Panel. Genet Med. 2018;20:351–359.

  26. Jarvik GP, Browning BL. Consideration of cosegregation in the pathogenicity classification of genomic variants. Am J Hum Genet. 2016;98:1077–1081.

  27. Nykamp K, Anderson M, Powers M, et al. Sherloc: a comprehensive refinement of the ACMG-AMP variant classification criteria. Genet Med. 2017;19:1105–1117.

  28. Knudson AG Jr. Mutation and cancer: statistical study of retinoblastoma. Proc Natl Acad Sci USA. 1971;68:820–823.

  29. Olsen JH, Hahnemann JM, Borresen-Dale AL, et al. Cancer in patients with ataxia-telangiectasia and in their relatives in the Nordic countries. J Natl Cancer Inst. 2001;93:121–127.

  30. Vijai J, Topka S, Villano D, et al. A recurrent ERCC3 truncating mutation confers moderate risk for breast cancer. Cancer Discov. 2016;6:1267–1275.

  31. Bouaoun L, Sonkin D, Ardin M, et al. TP53 variations in human cancers: new lessons from the IARC TP53 database and genomics data. Hum Mutat. 2016;37:865–876.

  32. Janatova M, Kleibl Z, Stribrna J, et al. The PALB2 gene is a strong candidate for clinical testing in BRCA1- and BRCA2-negative hereditary breast cancer. Cancer Epidemiol Biomarkers Prev. 2013;22:2323–2332.

  33. Southey MC, Goldgar DE, Winqvist R, et al. PALB2, CHEK2 and ATM rare variants and cancer risk: data from COGS. J Med Genet. 2016;53:800–811.

  34. Guidugli L, Carreira A, Caputo SM, et al. Functional assays for analysis of variants of uncertain significance in BRCA2. Hum Mutat. 2014;35:151–164.

  35. Spurdle AB, Healey S, Devereau A, et al. ENIGMA—evidence-based network for the interpretation of germline mutant alleles: an international initiative to evaluate risk and clinical significance associated with sequence variation in BRCA1 and BRCA2 genes. Hum Mutat. 2012;33:2–7.

  36. Beroud C, Letovsky SI, Braastad CD, et al. BRCA Share: a collection of clinical BRCA gene variants. Hum Mutat. 2016;37:1318–1328.

  37. Findlay GM, Boyle EA, Hause RJ, et al. Saturation editing of genomic regions by multiplex homology-directed repair. Nature. 2014;513:120–123.

  38. Starita LM, Ahituv N, Dunham MJ, et al. Variant interpretation: functional assays to the rescue. Am J Hum Genet. 2017;101:315–325.

Acknowledgements

We thank Sabine Topka, Semanti Mukherjee, Maria Carlo, and Zoe Steinsnyder for helpful suggestions to improve the manuscript. Research reported in this paper was supported by the National Cancer Institute of the National Institutes of Health under award numbers R21CA029533 and P50CA221745, as well as by Cycle for Survival, the Breast Cancer Research Foundation, and the V Foundation for Cancer Research. It was also supported by the Cancer Center core grant P30CA008748 and the Robert and Kate Niehaus Center for Inherited Cancer Genomics. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or other funding agencies.

Author information

Corresponding author

Correspondence to Joseph Vijai, PhD.

Ethics declarations

Disclosure

L.Z. received compensation from Future Technology Research LLC (seminar on precision medicine), Roche Diagnostics Asia Pacific, BGI, and Illumina (speaking activities at conferences/workshops). A family member of L.Z. holds a leadership position and an ownership interest in Shanghai Genome Center. Y.K. is a former employee of Bioreference Laboratories, a subsidiary of OPKO Health; this employment ended in January 2016. K.C. declares institutional support for therapeutic clinical trials from AstraZeneca and Syndax Pharmaceuticals outside the submitted work. Z.S. declares that an immediate family member works in the Department of Ophthalmology at MSKCC. She also holds consulting/advisory roles with Allergan, Adverum Biotechnologies, Alimera Sciences, Biomarin, Fortress Biotech, Genentech, Novartis, Optos, Regeneron, Regenxbio, and Spark Therapeutics. M.R. declares grants, personal fees, and nonfinancial support from AstraZeneca; personal fees from McKesson; grants and personal fees from Pfizer; nonfinancial support from Myriad and Invitae; and grants from AbbVie, Tesar, and Medivation, all outside the submitted work. V.J. and K.O. declare that they hold a patent on “Diagnosis and Treatment of ERCC3 mutant cancer” (PCT/US18/22588). The other authors declare no conflicts of interest.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ravichandran, V., Shameer, Z., Kemel, Y. et al. Toward automation of germline variant curation in clinical cancer genetics. Genet Med 21, 2116–2125 (2019). https://doi.org/10.1038/s41436-019-0463-8

Keywords

  • pathogenicity
  • ACMG
  • germline
  • cancer
  • curation