INTRODUCTION

Genetic testing and targeted resequencing of cancer susceptibility genes to facilitate precision cancer prevention and early diagnosis have grown exponentially because of the decreasing costs of next-generation sequencing (NGS).1,2 A major challenge in clinical sequencing is interpreting sequence variants. The American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) have published guidelines on interpretation of germline variants, considering both their pathogenicity and clinical actionability.3 Yet, germline variant classification continues to impose an immense burden on the time and resources of diagnostic molecular laboratories and cancer care professionals. The ACMG classification schema requires manually exploring multiple lines of public data, as well as other orthogonal data sources and literature, and then aggregating and scoring them to provide evidence for classifying variants.4 Currently, there is no widely available automated computational framework for classifying genetic variants based on ACMG criteria. We developed the Pathogenicity of Mutation Analyzer (PathoMAN), a computational resource that automates germline variant classification with uniformity, transparency, and speed, to facilitate variant curation for the cancer genetics community.

PathoMAN’s variant curation algorithm classifies germline genomic variants. The schema is inspired by ACMG/AMP classification.3 It aggregates multiple tracks of genetic and molecular evidence from variant annotators and from public repositories containing the evidence necessary for pathogenicity assertion. The compiled data are then evaluated in 28 distinct categories, which are grouped as variant type, biological impact, in silico predictions, presence in the control cohort, familial information, and inheritance mode. The aggregate score resulting from evaluation of these categories is used to generate the assertion for a variant as Pathogenic (P), Likely Pathogenic (LP), Benign (B), Likely Benign (LB), or Variant of Uncertain Significance (VUS).

PathoMAN’s performance was measured by reevaluating expertly curated germline cancer variants from three clinical testing laboratories: Ambry, Invitae, and GeneDx. We selected the commonly tested cancer susceptibility genes in multiplex panels, many of which are in the ACMG recommended gene list. We also tested the algorithm on reported P/LP variants from four published cancer studies on non-ACMG heritable cancer risk and putative risk genes. In this study, we also assessed the frequency of clinically actionable variants present in the general population using ExAC (noTCGA) data in cancer susceptibility and predisposition genes. We tested the application of ACMG criteria for germline cancer variants and addressed the bottleneck of variant curation by using automated algorithms in variant classification.

MATERIALS AND METHODS

Test data sets

To test the performance of PathoMAN against manual curation, we selected variants that were commonly reported by CLIA-certified clinical testing laboratories—Ambry, Invitae and GeneDx (3513 variants in 27 genes)—with identical assertions in ClinVar (version September 2018). Many variants in this test data set have also been reported by multiple other submitters in ClinVar. The Invitae data set consisted of 1494 B/LB, 608 P/LP, and 1412 VUS variants; the Ambry data set 1517 B/LB, 646 P/LP, and 1351 VUS variants; and the GeneDx data set 1512 B/LB, 633 P/LP, and 1369 VUS variants (Table S1). We used these data sets to test the assertions of pathogenic and benign classifications using PathoMAN.

We also tested PathoMAN on 300 P/LP variants in 55 genes from four published reports on multiple cancers,5,6,7,8 where ACMG criteria and primary assertions were available. We chose these data sets to test lesser known cancer predisposition genes that are not part of the ACMG list.

ACMG/AMP guidelines

The variant classification criteria used by PathoMAN follow the ACMG/AMP guidelines.3 The ACMG/AMP guidelines comprise 16 criteria that aid in classifying pathogenicity and 12 criteria that aid in classifying benignity. This classification system resolves a variant as pathogenic or benign based on eight major components: population frequency data, genomic annotation and computational predictive data, functional data, segregation data, de novo data, allelic/genotypic data, public databases, and literature and other data (Table 1). The detailed usage is described in the “Determination of ACMG criteria for variant classification” section of the supplemental materials.
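To make the criteria-to-class step concrete, the sketch below encodes the combining rules as published in the ACMG/AMP guideline (ref. 3). It is an illustration only, not PathoMAN's scoring code: the function name `combine_acmg` and the convention of passing met criteria as a list of code strings are assumptions made for this example.

```python
from collections import Counter

def combine_acmg(criteria):
    """Combine met ACMG/AMP criteria codes, e.g. ["PVS1", "PM2"], into a class.

    Hypothetical helper: implements the published combining rules,
    not PathoMAN's internal aggregate score.
    """
    n = Counter()
    for code in criteria:
        if code == "BA1":
            n["BA"] += 1          # stand-alone benign
        elif code.startswith("PVS"):
            n["PVS"] += 1         # very strong pathogenic
        else:
            n[code[:2]] += 1      # PS, PM, PP, BS, BP

    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]
    ba, bs, bp = n["BA"], n["BS"], n["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm == 1 and pp == 1) or pp >= 2))
        or ps >= 2
        or (ps == 1 and (pm >= 3 or (pm == 2 and pp >= 2) or (pm == 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm == 1)
        or (ps == 1 and 1 <= pm <= 2)
        or (ps == 1 and pp >= 2)
        or pm >= 3
        or (pm == 2 and pp >= 2)
        or (pm == 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    # Contradictory pathogenic and benign evidence resolves to VUS.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "VUS"
    if pathogenic:
        return "P"
    if likely_pathogenic:
        return "LP"
    if benign:
        return "B"
    if likely_benign:
        return "LB"
    return "VUS"

print(combine_acmg(["PVS1", "PM2"]))  # LP
```

A variant with no sufficient combination, or with conflicting pathogenic and benign evidence, falls through to VUS, which is the behavior the guideline prescribes.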

Table 1 Utilization of ACMG criteria and knowledgebase in PathoMAN variant curation

The results of PathoMAN were compared against the reported clinical assertion. We report them here in four categories: concordance, discordance, loss of resolution (LOR), and gain of resolution (GOR). When the reported P/LP and B/LB variants are reclassified as P/LP and B/LB respectively by PathoMAN, then the results are considered concordant. Similarly, when reported P/LP and B/LB variants are reclassified as B/LB and P/LP respectively by PathoMAN, then the variants are considered discordant. When reported P/LP or B/LB variants are reclassified as VUS by PathoMAN, then they are placed in the LOR category because PathoMAN cannot definitively classify these variants as either pathogenic or benign, thus losing resolution. Similarly, when the reported VUS are reclassified as P/LP or B/LB, they are considered as GOR because PathoMAN can resolve these variants as pathogenic or benign (Table S2). These evaluations aid in understanding the usage of the eight ACMG categories of evidence in the context of cancer genetics, and in their ability to differentiate between P/LP, B/LB, and VUS.
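The four comparison categories defined above can be written as a small lookup. This is a minimal sketch under the assumption that both the reported and the PathoMAN assertions have already been collapsed to "P/LP", "B/LB", or "VUS"; the function name and the "VUS retained" label for a VUS that stays VUS are illustrative and do not come from the paper.

```python
RESOLVED = {"P/LP", "B/LB"}

def comparison_category(reported, predicted):
    """Categorize one variant by reported versus PathoMAN assertion.

    Both arguments must be one of "P/LP", "B/LB", or "VUS".
    """
    for label in (reported, predicted):
        if label not in RESOLVED and label != "VUS":
            raise ValueError(f"unexpected assertion: {label!r}")
    if reported in RESOLVED:
        if predicted == reported:
            return "concordant"
        if predicted in RESOLVED:
            return "discordant"
        return "LOR"  # resolved call reclassified as VUS
    if predicted in RESOLVED:
        return "GOR"  # reported VUS resolved to P/LP or B/LB
    return "VUS retained"  # label assumed here; not a category named in the text

print(comparison_category("P/LP", "VUS"))  # LOR
```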

ExAC subset of cancer predisposition genes

The Exome Aggregation Consortium9 (ExAC) is a joint effort to aggregate exome sequencing data from 14 large sequencing projects to provide summary data, such as ethnicity-specific allele frequency, for the wider scientific community. The ExAC noTCGA data set is a subset of 53,105 samples that excludes The Cancer Genome Atlas (TCGA) cancer germline samples (n = 7601). ExAC is often used as a set of convenience controls for several cancers6,10,11 in case–control designs. We wanted to estimate the burden of variants in ExAC noTCGA as classified by PathoMAN and contrast it against known information in ClinVar. We selected 55,566 variants from 76 known and putative cancer risk genes (Table S3) that were in exonic or essential splice-site regions. This set is not considered part of the test data sets described earlier, because ExAC data are used as part of the ACMG criteria PS4, PM2, BA1, BS1, and BS2.

RESULTS

We developed PathoMAN, a germline variant classification algorithm, which is provided freely as a web-based service that allows users to either query single variants or batch upload a CSV file. Single-variant query works on chromosome, position, reference allele, alternative allele, allele count, allele number, de novo status, cosegregation status, and preferred control population subgroup. Batch upload requires six columns [chr, pos, ref, alt, ac, an] in a comma-separated values (CSV) file. Users can select de novo status, cosegregation status, and preferred control population subgroup. The program converts the CSV file to a minimal VCF4.2 file, annotates the minimal VCF, and prepares it for PathoMAN variant classification. The result of a single-variant query is displayed on the webpage immediately, whereas batch upload results are emailed to the submitter. For an annotated VCF file containing 100 variants, PathoMAN takes 6 minutes, which is 3.6 seconds per variant. In contrast, an expert reviewer may take 20–30 minutes to classify a novel variant. Automation therefore provides a massive advantage in terms of speed, uniformity, efficiency, and service assurance compared with manual curation.
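The CSV-to-VCF conversion step can be pictured roughly as follows. This is a sketch, not PathoMAN's converter: it assumes the upload has a header row naming the six columns exactly as listed above, and the function name, the INFO field definitions, and the example variant values are all made up for illustration.

```python
import csv
import io

def csv_to_minimal_vcf(csv_text):
    """Turn a six-column CSV (chr, pos, ref, alt, ac, an) into minimal VCF 4.2 text."""
    lines = [
        "##fileformat=VCFv4.2",
        '##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count">',
        '##INFO=<ID=AN,Number=1,Type=Integer,Description="Allele number">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for row in csv.DictReader(io.StringIO(csv_text)):
        # int() rejects malformed positions and counts early.
        info = f"AC={int(row['ac'])};AN={int(row['an'])}"
        lines.append("\t".join([
            row["chr"].strip(),
            str(int(row["pos"])),
            ".",                          # ID unknown
            row["ref"].strip().upper(),
            row["alt"].strip().upper(),
            ".",                          # QUAL not available
            ".",                          # FILTER not available
            info,
        ]))
    return "\n".join(lines) + "\n"

# Hypothetical input; the coordinates and counts are illustrative only.
example = "chr,pos,ref,alt,ac,an\n17,41245466,G,A,3,106210\n"
print(csv_to_minimal_vcf(example))
```

Keeping only the eight fixed VCF columns, with allele count and allele number in INFO, is one reasonable reading of "minimal VCF"; the annotation step would then add its own INFO fields.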

PathoMAN versus manual curation for test data sets

To evaluate the performance of PathoMAN, we compared its results against clinical assertions from three clinical laboratories. The test data set contained 3513 variants with prior reported curation from the three clinical laboratories: Ambry Genetics, Invitae, and GeneDx. Interlab agreement across the three laboratories was 84%. The variants in the 16% with interlab disagreement were marked as VUS when comparing against PathoMAN results. Among the 3513 variants, missense variants accounted for 54.2%, synonymous variants for 24.7%, frameshift variants for 7.5%, and stopgain variants for 4.7%; the rest were distributed among splice variants, in-frame insertion/deletion, and stoploss (Fig. 1a, Table S4). Variants were annotated with CAVA, ANNOVAR, ExAC noTCGA, gnomAD, and ClinVar and run through PathoMAN.

Fig. 1: Comparison of PathoMAN results against expertly curated variants.

(a) Distribution of 3513 variants in the test data set by variant class. EE variant that alters the first or last three bases of an exon (i.e., the exon end), but not the frame of the coding sequence; ESS any variant that alters essential splice-site base (+1, +2, −1, −2); FS frameshifting insertion and/or deletion; it alters length and frame of coding sequence; IF in-frame insertion and/or deletion; it alters length but not frame of coding sequence; IM variant that alters initiating methionine start codon; NSY nonsynonymous variant; it alters amino acid(s) but not coding sequence length; SG stopgain (nonsense) variant caused by base substitution; SS any variant that alters splice-site base within the first eight intronic bases flanking exon (i.e., +8 to −8) but not an ESS or SS5 base; SS5 any variant that alters +5 splice-site base but not an ESS base; SY synonymous variant; it does not alter amino acid or coding sequence length; 3PU any variant in 3′ untranslated region; 5PU any variant in 5′ untranslated region. (b) Performance of PathoMAN’s variant classification against variant classification from three clinical laboratories: Ambry Genetics, Invitae, and GeneDx. B/LB benign/likely benign, GOR gain of resolution, LOR loss of resolution, P/LP pathogenic/likely pathogenic. (c) Concordance of PathoMAN and clinical lab results for reported P/LP variants grouped by penetrance of the gene (1257 variants in 27 genes [supplementary table S5]). (d) Concordance of PathoMAN and published reports for reported P/LP variants from four cancer studies (300 variants in 55 genes: Mandelker et al.7, 97 variants; Maxwell et al.5, 40 variants; Pritchard et al.6, 56 variants; and Zhang et al.8, 107 variants [supplementary table S7]).

Upon comparing the clinical assertions from the three laboratories with PathoMAN’s results, the observed agreement was 90% (3153 of 3513 variants) and Cohen’s kappa coefficient for interrater agreement (κ) was 0.83 (95% confidence interval [CI] 0.82–0.85).
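For readers unfamiliar with the statistic, Cohen's kappa corrects the observed agreement p_o for the agreement p_e expected by chance from each rater's label frequencies: κ = (p_o − p_e) / (1 − p_e). The sketch below computes it from two parallel lists of assertions. The function name and the toy labels are illustrative; the example does not reproduce the study's data or its confidence interval.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (p_o - p_e) / (1 - p_e)

# Toy data: six variants, one disagreement.
lab      = ["P/LP", "P/LP", "B/LB", "B/LB", "VUS", "VUS"]
pathoman = ["P/LP", "P/LP", "B/LB", "VUS",  "VUS", "VUS"]
print(round(cohens_kappa(lab, pathoman), 2))  # 0.75
```

In the toy data the raw agreement is 5/6 and the chance agreement is 1/3, giving κ = 0.75, which shows why kappa is always lower than the raw agreement whenever chance agreement is nonzero.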

PathoMAN achieved a concordance of 94.4% for P/LP and 81.1% for B/LB variants (Fig. 1b). It showed 100% concordance for P/LP frameshift, splice sites, and truncating variants. Similarly, there was 100% concordance for B/LB synonymous variants. There was a minimal discordance seen at 0.3% for P/LP variants (n = 2). PathoMAN failed to resolve 22 P/LP missense variants and reclassified 2 P/LP variants as B/LB (a synonymous variant and an extended splice-site variant) (Table 2). The synonymous PMS2 variant (c.825A>G p.Gln275Gln) was functionally shown to have aberrant splicing by experimental methods in literature. The MSH2 splice variant (c.942+3A>T) has been demonstrated to disrupt messenger RNA (mRNA) splicing and result in skipping of exon 5. PathoMAN assertions are made in real time by the algorithm. In future versions of the program, we intend to incorporate a consensus splice prediction module and a literature-based evidence module, which would aid in correct classification of these edge case variants. No B/LB variants were reclassified as P/LP.

Table 2 Distribution of PathoMAN reclassified germline cancer variants by variant class

PathoMAN reclassified 5.3% (n = 32) of reported P/LP variants and 18.9% (n = 238) of reported B/LB variants as VUS. Fifty-three percent of these P/LP variants (17/32) and 74.4% of these B/LB variants (177/238) had previously been classified as VUS in ClinVar by at least one submitter. This suggests that, for these variants, a consensus assertion has not been achieved due to insufficient clinical information. Similarly, PathoMAN reclassified 1.6% (n = 26) and 3.8% (n = 63) of VUS as P/LP and B/LB, respectively. Sixty-two percent of these reported VUS variants (55/89) had at least one submitter classify them as P/LP or B/LB among the three laboratories and in ClinVar. One of the major sources of gain of resolution is PathoMAN’s use of the saturation mutagenesis experiments on BRCA1.12 Other reasons for reclassification are version changes in ClinVar and updates to the public allele frequencies in ExAC and gnomAD.

When variants were grouped by high and low penetrance genes (Fig. 1c, Table S5), we calculated the absolute difference between classifications and observed Cohen’s kappa coefficients for interrater agreement (κ) of 0.82 (95% CI 0.805–0.843) and 0.84 (95% CI 0.77–0.92), respectively.

When compared against 300 P/LP variants in putative cancer predisposition candidate genes from four published cancer studies, PathoMAN showed 96.4% concordance for Pritchard et al.6 (prostate cancer), 87.5% concordance for Maxwell et al.5 (breast cancer), 84.5% concordance for Mandelker et al.7 (multiple cancer types), and 80.3% for Zhang et al.8 (pediatric cancer) (Fig. 1d).

PathoMAN results for ExAC data set

PathoMAN classified <1% of the heterozygous genotypes in the ExAC noTCGA data set (55,566 exonic and essential splice variants from 76 cancer risk genes) as P/LP. We tabulated pathogenic variant burden by gene and compared it against ClinVar (Table 3). PathoMAN calls a similar number of P/LP variants as reported in ClinVar for high-risk cancer genes such as BRCA1 and BRCA2. PathoMAN also predicts a few novel P/LP variants unreported in ClinVar. Investigators who intend to use the ExAC noTCGA data set as controls in a gene burden test against sequenced cancer cases can use PathoMAN to get a list of P/LP variants across genes.

Table 3 Comparison of PathoMAN P/LP burden versus ClinVar P/LP burden for ExAC (noTCGA) variants by cancer susceptibility genes

Usage of ACMG/AMP categories in PathoMAN

We analyzed the usability and frequency of use for the eight categories of evidence (population frequency, genomic annotation and computational prediction, functional evidence, cosegregation, de novo status, allelic/genotypic data, public databases, scientific literature and other data) described in the ACMG/AMP guidelines. Interestingly, we find that the categories population frequency data, genomic annotation and computational predictions, databases, and scientific literature (Fig. 2a) are the most used. These are available due to generous data and toolkit-sharing policies in the genomics field. The categories that are rarely if ever used are familial cosegregation data or de novo status, allelic data, and functional data. We have also used ClinVar’s review status to upweight functional evidence in the current version, as we believe that the review status directly corresponds to the literature evidence reported for a variant (Fig. 2b). The cosegregation data and de novo status data are limited to familial studies and are mostly unavailable in sporadic case–control settings because these are collected by investigators based on patient input. It was clear from ACMG guidelines that, for a variant to be classified as pathogenic or likely pathogenic by ACMG criteria, one needs a maximum of 1 PVS1 or 2 PSs or 3 PMs or 4 PPs for which the knowledgebase and resources used by PathoMAN were demonstrably sufficient. We describe below the bottlenecks in sharing this information and propose a novel framework to circumvent and ameliorate these issues.

Fig. 2: Comparison of PathoMAN results against expertly curated variants.

(a) Utilization of knowledgebase components by PathoMAN during variant curation of test data sets. (b) Ratio of number of articles per variant across different review status reported in ClinVar.

DISCUSSION

PathoMAN as a tool to aid variant curation

Traditionally, genetic variant curation has been performed manually by expert groups of individuals. However, this is a time-intensive task that requires aggregation and interpretation of information from multiple sources. In the cancer realm, this was relatively easy when a single gene, e.g., BRCA1/2, was under investigation. In contemporary testing scenarios, which routinely rely on multiplex gene panels, this task is onerous. Large gene discovery efforts, as well as clinical reporting, could use a simplified, automated method for prioritizing variants for a closer look, or in the best case, be useful as the classification tool of choice. PathoMAN addresses this critical unmet need for an unbiased algorithmic approach toward classifying genetic variants of clinical interest in cancer predisposition. PathoMAN can be easily accessed through a web browser and results for individual variants are almost immediately available, while results of a batch query may take a few minutes.

Genetic testing laboratories are increasingly utilizing ACMG/AMP classification rules to classify variants for pathogenicity within cancer predisposition genes. However, results vary depending on availability of accessible data and interpretational differences.13 Concordance between CLIA-certified laboratories varied between 37% and 71% pre- and postconsultative processes using the ACMG guidelines. We observed 84% interlaboratory agreement across all three clinical laboratories. Efforts are being made to narrow interpretational differences through initiatives underway such as ClinGen. In a recent report,14 13% of variants in ClinVar were reanalyzed, and were found to be unresolved, underscoring the difficulties even for expert curator groups. For manual or automated curation, the minimal set of information required to classify a variant as likely pathogenic or likely benign includes population frequency, in silico predictors, and prior reported evidence of pathogenicity from public databases. PathoMAN compiles this information uniformly in a machine accessible format that is used as a knowledgebase for variant classification. An additional advantage of using PathoMAN is that it can effortlessly identify benign variants based on public allele frequency and the genomic context information. In a typical multiplexed gene panel variant list, after filtering for only rare high or moderate impact variants, PathoMAN will classify about one-third of the variants as B/LB with high precision. This saves time and effort for the variant curators and helps them to focus on curating the remaining potentially actionable variants. PathoMAN will also identify known founder variants.

Cancer is a complex disease with multigene etiology. Some cancer genes confer high risk whereas some only moderately affect the carrier’s risk. Panel testing is currently used for active surveillance and intervention to lower disease risk. Large sequencing and genotyping efforts to discover new cancer predisposition genes are being carried out by several consortia like BCAC,15 SIMPLEXO,16 COMPLEXO,17 CIMBA,18 etc. As the cost for sequencing decreases, the number of genes tested is increasing. Automation allows for rapid processing, service assurance, and reproducibility of results. Gold standard sets of curation pioneered by ClinGen19,20 would aid in refining these pathogenicity classifications further, while efforts such as the PROMPT21,22 registry enable accurate penetrance estimates of variants in susceptibility genes. The PROMPT registry has identified a 26% discordance rate among clinical laboratories and an 11% rate with conflicting interpretations, a discrepancy that has implications for altering medical management.

Many laboratories and certain programs such as CardioClassifier23 and InterVar24 use prior knowledge of disease–gene pair association. This is advantageous because it reduces misclassification of variants. However, it also has the disadvantage that it cannot be used for lesser known disease–gene pairs or for novel gene hunting. In a recent report,7 we showed that half of the cases in a series consisting of selected advanced cancers at a single institution were nonsyndromic associations. Probands or their close relatives had clinically actionable variants in cancer genes not directly associated with the specific cancers for which there were known syndromic associations. PathoMAN currently does not use the contextual syndromic association in deciphering pathogenicity of variants. This is a distinct advantage when searching for novel genetic associations. However, in clinical sequencing, we acknowledge that limiting the search to known disease–gene pairs to identify pathogenic variants reduces false positives. In future versions, we hope to incorporate both clinical and gene discovery modes.

The variants that are manually curated as P/LP or B/LB, but that are called as VUS by PathoMAN, are grouped under the loss of resolution (LOR) category. This loss of resolution due to lack of accessible supporting evidence could be due to several reasons, e.g., inability to programmatically parse inline texts from public databases, availability of updated proprietary databases like HGMD and LOVD, unavailability of in-house functional evidence25 or familial cosegregation information,26 etc.

The upgrade of a VUS to either LP or LB by PathoMAN is based on three categories: lines of available evidence in public databases, population frequency, and computational and in silico prediction of deleteriousness. These variants can be reclassified as either pathogenic or benign if additional functional or cosegregation data become available to the user.

Commercial testing laboratories have proprietary versions of interpretation pipelines such as Sherloc27 (Invitae Corporation) and MyVISION (Myriad Genetics). However, these are unavailable to the community at large. PathoMAN is designed to provide an optimized platform for clinical variant calling utilizing publicly available data resources.

Using ACMG for variant classification in cancer

Variants in tumor suppressors and oncogenes lead to tumorigenesis, and the Knudson two-hit hypothesis28 is seen to operate in many common cancers. Common examples include the APC, TP53, and BRCA1/2 genes. However, several of these genes, especially those that are part of the Fanconi complex (FANCS-BRCA1, FANCD1-BRCA2, FANCJ-BRIP1, FANCN-PALB2, FANCP-SLX4, RAD51C), neurofibromatosis (NF1), ataxia–telangiectasia (ATM), Bloom syndrome (BLM), Nijmegen breakage syndrome (NBN), and dyskeratosis congenita (TERT), which lead to autosomal recessive rare Mendelian disorders, are also found to be risk genes for autosomal dominant cancer predisposition. Heterozygous carriers of these gene variants are reported to have increased risks for syndromic cancers.29 Occasionally, gene-disrupting heterozygous variants in these genes that are rare or absent in public controls such as ExAC and gnomAD may be observed in sequenced cancer cohorts. Their ClinVar record for pathogenicity is usually based on their Mendelian recessive syndrome and not on the cancer phenotypes. Hence, applying the ACMG rules to genes without membership in the ACMG list may be fraught with misclassification. However, we believe that continuing data streams for variants in these genes will lead to better classifications, especially when coupled with familial cosegregation and functional validations. While PathoMAN classifications for such genes are a useful starting point for identifying variants that may be pathogenic and discarding benign ones, we emphasize that expert manual curation is needed to disentangle these issues.

Limitations of automation

Automating variant classification based on publicly available information has some pitfalls. Supporting evidence provided in ClinVar for variants by submitters is not computation friendly and requires manual curation to interpret free text. In several instances, the citations are not relevant to the specific records. Technologies such as natural language processing and tagging will eventually help to build a knowledgebase that can further be used for deep learning.

ACMG guidelines do not account for functional evidence provided in ClinVar (supporting observations), which leads to loss of information that could otherwise be used in variant classification. Due to this lack of data structure (free text), the bona fide variants in ClinVar are being coded only as PP5 or BP6 and not as PS3 or BS3. We employed a review status of 2 or more as a proxy for functional evidence. Not all submitters are equipped to perform, or do perform, independent analyses to assess functional evidence for their clinical assertion. If the ClinVar evidence were informatically coded, it would be helpful for molecular geneticists and clinical curators to use this information for their pathogenicity estimation. For example, the TP53 (R273H), BRCA1 (Y105C), and BRCA1 (V1688del) variants have overwhelming literature evidence (Figure S1); however, the evidence present in the description of the submissions within ClinVar is computationally underivable. Similarly, there are many variants reported in the literature that may have some level of supporting evidence for pathogenicity or benignity in ClinVar. Currently all data integration is done by manual curators on a case-by-case basis.

We propose a framework to report ClinVar data that can be structured and parsable for an automated algorithm in the context of cancer. This format consists of six important fields that compress the vast information that is present in literature or clinical reports:

1. Population/ethnicity (Non-Finnish European [NFE], Ashkenazi Jewish [ASJ], African/African American [AFR], South Asian [SAS], Latino [AMR], Finnish [FIN], East Asian [EAS] and Other [OTH])

2. Inheritance model (autosomal dominant [AD], autosomal recessive [AR], de novo, X-linked)

3. Allelic status (Hom, Het)

4. Family history/cosegregation information (Yes–1; No–0)

5. Disease association (TCGA code/OncoTree code http://oncotree.mskcc.org)

6. Functional evidence (experiment type: nonsense-mediated decay [NMD], loss of heterozygosity [LOH], etc.)

For example, ERCC3 (R109X) variant30 can be depicted as ASJ-AD-Het-1:1-BRCA, BLCA-NMD. This variant was seen in Ashkenazi Jewish individuals with an autosomal dominant inheritance for the heterozygous allele. This variant cosegregated in one family with cancer history. The variant was found in breast cancer and bladder cancer individuals and the functional evidence for pathogenicity was carried out by testing for nonsense-mediated decay and other experiments.
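A sketch of how such a six-field string could be produced and read back by a program is shown below. The class name `EvidenceTag`, its field names, and the reading of "1:1" as family history followed by cosegregation are assumptions made for illustration; the proposal above fixes only the order of the fields. The decoder also assumes that no field value contains a hyphen, so an inheritance model such as X-linked would need a hyphen-free code.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EvidenceTag:
    population: str        # e.g. "ASJ"
    inheritance: str       # e.g. "AD"
    allelic_status: str    # "Het" or "Hom"
    family_history: bool   # assumed first flag of "1:1"
    cosegregation: bool    # assumed second flag of "1:1"
    diseases: List[str]    # TCGA/OncoTree codes, e.g. ["BRCA", "BLCA"]
    functional: List[str]  # experiment types, e.g. ["NMD"]

    def encode(self) -> str:
        flags = f"{int(self.family_history)}:{int(self.cosegregation)}"
        return "-".join([
            self.population,
            self.inheritance,
            self.allelic_status,
            flags,
            ", ".join(self.diseases),
            ", ".join(self.functional),
        ])

    @classmethod
    def decode(cls, text: str) -> "EvidenceTag":
        parts = text.split("-")
        if len(parts) != 6:
            raise ValueError(f"expected 6 hyphen-separated fields, got {len(parts)}")
        population, inheritance, allelic, flags, diseases, functional = parts
        family, coseg = flags.split(":")
        return cls(
            population=population,
            inheritance=inheritance,
            allelic_status=allelic,
            family_history=(family == "1"),
            cosegregation=(coseg == "1"),
            diseases=[d.strip() for d in diseases.split(",")],
            functional=[f.strip() for f in functional.split(",")],
        )

tag = EvidenceTag.decode("ASJ-AD-Het-1:1-BRCA, BLCA-NMD")
print(tag.diseases)   # ['BRCA', 'BLCA']
print(tag.encode())   # ASJ-AD-Het-1:1-BRCA, BLCA-NMD
```

Because the encoded form round-trips through `decode` and `encode`, a database could store either the compact string or the individual fields without losing information.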

Large sequencing studies and gene-specific functional studies give curated lists of variants with their pathogenic impacts, such as the TP53 database31 and a functional study on PALB2 variants.32,33 As a pilot project, we have collected a list of PALB2 and TP53 variants from this literature to support the knowledgebase for PathoMAN (PS3/BS3 functional evidence), but there is a real need to create a publicly available, well-curated list of variants from the literature that is amenable to programmatic interpretation. Similarly, as standards evolve for the incorporation of somatic variants into germline interpretation, we expect an integration of such events for well-established tumor suppressors and oncogenes. The roles played by the ENIGMA consortium,34,35 GA4GH (https://www.ga4gh.org/), and BRCA-Share36 in this regard are meritorious. Though clinical laboratories collaborate to resolve the differences in variant interpretations submitted to ClinVar,14 the lack of a unified framework for incorporating supporting machine-readable evidence in any variant database, including ClinVar, remains a critical bottleneck.

Functional data are rarely available for most genes. Exceptions are BRCA1/2 due to the concerted efforts of the ENIGMA consortium.34,35 In single-variant reports, data are usually buried within scientific jargon that is not compatible with genomic variant information. In many instances, functional data are dependent on the models used, e.g., overexpression of a mutant construct, deletion of a region using a CRISPR endonuclease, and sometimes, introduction of the specific nucleotide through homology-directed DNA repair. It is also likely that the results from these three methods do not agree. Novel methods to understand deleteriousness using saturation mutagenesis are also starting to emerge;37,38 for example, for BRCA1 (ref. 12), we have incorporated the loss of function information into PathoMAN’s algorithm. We hope these will add a uniform layer of functional data that can be used in determining pathogenicity in the coming years.

In conclusion, we performed pathogenicity assessment of 59,379 variants in germline cancer risk genes, the first and largest uniform classification using an unbiased computational tool. We demonstrate the high concordance and low discordance when compared with manual curation as a harbinger of how such programs will soon be able to help domain experts and manual curators. PathoMAN is a first step toward our goal of automating the complex process of variant classification and interpretation. A beta version of the web app is available at https://pathoman.mskcc.org/.

URLs

1. CAVA: https://github.com/RahmanTeam/CAVA

2. SnpEff: http://snpeff.sourceforge.net/SnpEff_manual.html

3. ANNOVAR: https://annovar.openbioinformatics.org

4. ExAC noTCGA: http://exac.broadinstitute.org

5. gnomAD: http://gnomad.broadinstitute.org/

6. ClinVar: https://www.ncbi.nlm.nih.gov/clinvar/

7. IARC database: http://p53.iarc.fr/

8. ClinVar parser tool: https://github.com/macarthur-lab/clinvar

9. dbNSFP and dbscSNV: https://sites.google.com/site/jpopgen/dbNSFP

10. Gene List: https://github.com/macarthur-lab/gene_lists

11. RepeatMasker: http://www.repeatmasker.org/

12. UCSC Genome Browser: https://genome.ucsc.edu

13. CardioClassifier: https://www.cardioclassifier.org/

14. InterVar: https://github.com/WGLab/InterVar

15. ACMG: https://www.acmg.net/

16. ClinVar Miner: https://clinvarminer.genetics.utah.edu/

17. PathoMAN: http://pathoman.mskcc.org/

18. Ambry: https://ambrygen.com/

19. Invitae: https://www.invitae.com/en/

20. GeneDx: https://www.genedx.com/