In silico analysis of BRCA1 and BRCA2 missense variants and the relevance in molecular genetic testing

Since genetic testing of BRCA1 and BRCA2 was first conducted for research and later introduced into clinical practice, a large number of missense variants have been reported in the literature and deposited in public databases. Polymorphism Phenotyping v2 (PolyPhen-2) and Sorting Intolerant from Tolerant (SIFT) are two widely applied bioinformatics tools used to assess the functional impact of missense variants. A total of 2605 BRCA1 and 4763 BRCA2 variants from the ClinVar database were analysed with PolyPhen-2 and SIFT. When SIFT was evaluated alongside PolyPhen-2 HumDiv and HumVar, it showed the best performance in terms of negative predictive value (NPV) (100%) and sensitivity (100%) for ClinVar-classified benign and pathogenic BRCA1 variants. Both SIFT and PolyPhen-2 HumDiv achieved 100% NPV and 100% sensitivity in predicting the pathogenicity of the BRCA2 variants. The three tested approaches agreed in their predictions for 55.04% and 68.97% of the variants of unknown significance (VUS) for BRCA1 and BRCA2, respectively. The performance of PolyPhen-2 and SIFT in predicting functional impact varied across the two genes. Because the tested algorithms lacked high concordance in their predictions, their usefulness in classifying the pathogenicity of VUS identified through molecular testing of BRCA1 and BRCA2 is limited in the clinical setting.


Scientific Reports | (2021) 11:11114 | https://doi.org/10.1038/s41598-021-88586-w | www.nature.com/scientificreports/
variant of unknown significance (VUS) while classes 4 and 5 include likely pathogenic and definite pathogenic variants, respectively. Since its establishment in 2009, the ENIGMA (Evidence-based Network for the Interpretation of Germline Mutant Alleles) consortium has played an important role in evaluating the clinical significance of sequence variants in high-risk breast cancer genes, including BRCA1 and BRCA2 13 . The BRCA Exchange portal also serves as a consolidated resource that aggregates curated BRCA1 and BRCA2 variants for public access 14 .
VUS are reported in up to 20% of individuals undergoing genetic testing of BRCA1 and BRCA2, although the frequency can be lower in well-studied populations 15 . A VUS finding is considered inconclusive and not clinically actionable. Besides creating uncertainty for treatment decisions, reporting a VUS has the potential to raise anxiety in its carriers 16 . A large number of VUS have been listed in publicly available databases such as the Breast Cancer Information Core (BIC) database 17 and ClinVar 18 . VUS usually consist of non-truncating single nucleotide changes resulting in missense variants, small in-frame insertions or deletions, and variants in intronic, non-coding and untranslated regions. A VUS may be novel at the time it is reported because no prior clinical or functional evidence about its causality is available. Over time, the status of a particular VUS can shift to either benign or pathogenic once sufficient findings have accumulated 19 .
With advances in bioinformatics and computational biology, many in silico tools have been developed to help biologists, scientists, geneticists and clinicians predict the effects and potential significance of missense coding variants. Polymorphism Phenotyping v2 (PolyPhen-2) 20 and Sorting Intolerant from Tolerant (SIFT) 21 are two widely applied bioinformatics tools used to assess the functional effects of missense variants. PolyPhen-2 bases its predictions on both sequence alignment and structural features characterizing the amino acid substitution, while SIFT considers only sequence homology and protein conservation among species to assess the functional effects of the variants. Many other bioinformatics programs based on different algorithms have also been developed. Most of the online tools grant free access to users.
The aim of the present study was to perform in silico analysis of the BRCA1 and BRCA2 missense variants currently reported in ClinVar, including pathogenic variants, benign variants and VUS, using PolyPhen-2 and SIFT.

Results
Prediction results from PolyPhen-2 and SIFT on known benign and pathogenic variants. There were 165 BRCA1 and 142 BRCA2 missense variants with known clinical significance (classified as benign or pathogenic) that were subjected to both PolyPhen-2 and SIFT predictions. The prediction results for the benign and pathogenic entries are summarised in Supplementary Table S1 for BRCA1 and Supplementary Table S2 for BRCA2. For PolyPhen-2, the variants predicted as "possibly damaging" and "probably damaging" were grouped together as "damaging" variants. The positive predictive value (PPV), negative predictive value (NPV), sensitivity, specificity and accuracy are summarised in Table 1.

Discussions
The current study compared the prediction performance of PolyPhen-2 HumDiv, PolyPhen-2 HumVar and SIFT using benchmark datasets for the BRCA1 and BRCA2 genes. The study was further extended to evaluate their practical use in predicting the pathogenicity of reported VUS in BRCA1 and BRCA2. The analysis was based on expert-reviewed curated variants from ClinVar 18 , which is now widely referred to by genetic laboratory personnel, clinicians and scientists when interpreting molecular genetic testing results. The results show differences in performance between PolyPhen-2 HumDiv, PolyPhen-2 HumVar and SIFT. Among the missense variants with known clinical significance (classified as benign or pathogenic), all three approaches gave concordant predictions for 96 (58.18%) of 165 BRCA1 variants and 95 (66.90%) of 142 BRCA2 variants. The discordant predictions are attributed to the inherent differences between the algorithms. PolyPhen-2 combines information from protein sequence and structure to analyse missense variants. However, two different datasets were used to train HumDiv and HumVar, so the two models can assess a given missense variant differently 20 . HumDiv is mainly used to evaluate rare alleles in complex disease phenotypes and in analyses of natural selection. HumVar, on the other hand, is more relevant for analysing variants in Mendelian disorders, where strongly disease-causing mutations must be distinguished more stringently from benign polymorphisms present in the normal population. Hereditary breast cancers do not clearly fit into either category, complex or Mendelian disorder, because of the incomplete penetrance of the two predisposing genes. HumVar showed higher overall accuracy than HumDiv, i.e. 66.67% versus 60.00% for BRCA1 and 77.46% versus 61.27% for BRCA2 missense variants.
Notably, both HumDiv and HumVar had higher NPV (> 85%) than PPV (< 35%), indicating that PolyPhen-2 gave more reliable results when predicting benign variants. When SIFT was evaluated along with PolyPhen-2 HumDiv and HumVar, it showed the best performance in terms of NPV (100%) and sensitivity (100%) for BRCA1 and BRCA2. HumDiv also showed 100% NPV and 100% sensitivity for BRCA2. Hence, although some similar results were observed in this study, the predictive performance of these algorithms varied between BRCA1 and BRCA2 missense variants. The NPV of all three tested approaches was above 85% (Table 1). However, the other performance measures, including PPV, specificity and accuracy, were generally low (< 80%), which makes these approaches unsuitable for clinical application. A limitation of the current study is the small size of the datasets of tested missense variants with known clinical significance for BRCA1 (Benign, N = 131; Pathogenic, N = 34) and BRCA2 (Benign, N = 129; Pathogenic, N = 13).
Genetic testing of the BRCA1 and BRCA2 genes can provide actionable results to the clinicians and genetic counsellors who deal directly with patients and their relatives. The utility of such testing depends strongly on the interpretation of the variants identified. Current laboratory practice in variant interpretation includes searching the published literature and online curated databases for evidence of pathogenicity. Frequently, the identified missense variants are novel and are not catalogued in the existing publicly available databases. Research laboratories perform in vitro assays to study the functional impact of a missense variant on the protein, but this is beyond the ability of most clinical laboratories. Functional tests for BRCA1 and BRCA2 are generally time-consuming, laborious and costly. Studying familial co-segregation of a variant with cancer is feasible through carrier testing, but such extended testing involves genotyping many relatives before a firm conclusion can be drawn. Again, such a strategy is more applicable in the research setting.
Even without evidence from functional analysis and familial co-segregation with cancer, a rare missense variant is not always considered a VUS. Apart from very frequent polymorphic variants, according to the ENIGMA BRCA1/2 gene variant classification criteria, a missense variant could be classified as class 1 if it has a prior probability of pathogenicity ≤ 0.02 from clinically calibrated bioinformatic analyses and an allele frequency ≥ 0.001 and < 0.01 in large outbred control reference groups 13 . Reporting a VUS in the clinical report can lead to problematic interpretation, since such results cannot readily be used to identify at-risk family members or to indicate increased surveillance. Such uninformative results could also give patients and family members the misperception that they are at higher risk of developing cancer.
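The two numeric thresholds quoted above can be written as a simple predicate. This is only a sketch of the sentence as stated here, not of the full ENIGMA classification criteria, which involve further evidence and expert judgement; the function name and parameter names are hypothetical.

```python
def meets_quoted_class1_thresholds(prior_probability: float,
                                   allele_frequency: float) -> bool:
    """Check only the two thresholds quoted in the text.

    Assumption: both inputs are proportions in [0, 1]. The prior probability
    is expected to come from a clinically calibrated bioinformatic analysis
    and the allele frequency from a large outbred control reference group.
    """
    prior_ok = prior_probability <= 0.02
    frequency_ok = 0.001 <= allele_frequency < 0.01
    return prior_ok and frequency_ok
```

A variant with a prior probability of 0.01 and an allele frequency of 0.005 satisfies both thresholds, whereas an allele frequency of exactly 0.01 falls outside the quoted range.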
In the current study, low consensus was noted when the three approaches were used to predict the pathogenicity of the VUS. The three tested approaches agreed in their predictions for 55.04% and 68.97% of the VUS for BRCA1 and BRCA2, respectively. This lack of concordance can cause confusion when interpreting the significance of missense variants. Previous comparative studies that evaluated other in silico algorithms suggested that combining several prediction tools could improve prediction performance 22 ; however, the study by Walters-Sen et al. 23 reported the opposite.
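One way to compute such an agreement figure is sketched below. It assumes that concordance means all three approaches give the same binary call after PolyPhen-2's "possibly damaging" and "probably damaging" are grouped as damaging; the source does not spell out the exact rule used for the VUS, and the function names and example rows are illustrative only.

```python
from typing import Iterable, Tuple

POLYPHEN2_DAMAGING = {"possibly damaging", "probably damaging"}


def polyphen2_is_damaging(call: str) -> bool:
    """Map a PolyPhen-2 category to a binary damaging/benign call."""
    call = call.strip().lower()
    if call in POLYPHEN2_DAMAGING:
        return True
    if call == "benign":
        return False
    raise ValueError(f"unexpected PolyPhen-2 call: {call!r}")


def sift_is_damaging(call: str) -> bool:
    """Map a SIFT category to a binary damaging/tolerated call."""
    call = call.strip().lower()
    if call == "damaging":
        return True
    if call == "tolerated":
        return False
    raise ValueError(f"unexpected SIFT call: {call!r}")


def is_concordant(humdiv: str, humvar: str, sift: str) -> bool:
    """True when HumDiv, HumVar and SIFT give the same binary call."""
    calls = {polyphen2_is_damaging(humdiv),
             polyphen2_is_damaging(humvar),
             sift_is_damaging(sift)}
    return len(calls) == 1


def concordance_percent(rows: Iterable[Tuple[str, str, str]]) -> float:
    """Percentage of variants on which all three approaches agree."""
    rows = list(rows)
    if not rows:
        raise ValueError("no variants supplied")
    agreeing = sum(is_concordant(*row) for row in rows)
    return 100.0 * agreeing / len(rows)


# Invented calls for four variants (HumDiv, HumVar, SIFT); not study data.
example_rows = [
    ("benign", "benign", "tolerated"),
    ("probably damaging", "possibly damaging", "damaging"),
    ("possibly damaging", "benign", "damaging"),
    ("benign", "benign", "damaging"),
]
example_concordance = concordance_percent(example_rows)
```

In this invented example the first two variants are concordant and the last two are not, so the agreement is 50%.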
Several other studies have conducted in silico analyses of BRCA1 and BRCA2 missense variants. Based on datasets from a commercial testing laboratory, Kerr et al. 24 reported that SIFT achieved 100% sensitivity and NPV in predicting both BRCA1 and BRCA2 variants, a finding similar to ours. Ernst et al. 25 also showed 100% sensitivity for SIFT predictions on both BRCA1 and BRCA2 variants, which is congruent with the findings of our study. However, for accuracy we have shown that PolyPhen-2 had better performance than SIFT as compared to the studies by Kerr
In conclusion, the performance of PolyPhen-2 and SIFT in predicting functional impact varied across a clinical dataset of BRCA1 and BRCA2 missense variants. The lack of high concordance in prediction outcomes highlights their limited clinical application in classifying the pathogenicity of VUS identified through molecular testing of BRCA1 and BRCA2.

Methodology
Mining of BRCA1 and BRCA2 missense variants. ClinVar is a freely accessible public repository of genetic variants (https://www.ncbi.nlm.nih.gov/clinvar/) that accepts submissions of curated variants with their clinical significance. The variants of the BRCA1 and BRCA2 genes as of March 16, 2021 were downloaded after applying filters including "missense" for "molecular consequence" and "expert panel" for "review status". The datasets were imported into spreadsheet software (Microsoft Excel for Mac), visualized and analysed. Where different DNA alterations represented the identical missense protein change, the duplicate entries were removed. For BRCA1, the missense variant c.5074G > C (p.Asp1692His) was excluded from analysis because its proven effect is on mRNA splicing rather than amino acid substitution 27,28 . A total of 307 expert-reviewed entries, consisting of 34 pathogenic and 131 benign BRCA1 variants and 13 pathogenic and 129 benign BRCA2 variants, were subjected to in silico analysis for functional predictions.
The missense variants of unknown significance (VUS) were also sought for the BRCA1 and BRCA2 genes in ClinVar by applying filters including "missense" for "molecular consequence" and "uncertain significance" for "clinical significance". Duplicate entries with identical missense protein change were removed. There were 2440 BRCA1 and 4621 BRCA2 VUS included in this study.
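The duplicate-removal step was carried out in a spreadsheet; the sketch below shows an equivalent approach in code, not the authors' procedure. It assumes each entry is a dictionary and keeps the first entry seen for each protein change. The field names `dna_change` and `protein_change` and the example entries are invented placeholders, not ClinVar column names or real variants.

```python
from typing import Dict, Iterable, List


def drop_duplicate_protein_changes(
    entries: Iterable[Dict[str, str]],
    key: str = "protein_change",  # hypothetical field name
) -> List[Dict[str, str]]:
    """Keep the first entry for each distinct protein change, in input order."""
    seen = set()
    unique = []
    for entry in entries:
        change = entry[key]
        if change in seen:
            continue  # a different DNA alteration giving the same protein change
        seen.add(change)
        unique.append(entry)
    return unique


# Placeholder entries: two DNA changes map to the same protein change.
example_entries = [
    {"dna_change": "c.2C>T", "protein_change": "p.Ala1Val"},
    {"dna_change": "c.2_3delinsTA", "protein_change": "p.Ala1Val"},
    {"dna_change": "c.5G>A", "protein_change": "p.Gly2Asp"},
]
unique_entries = drop_duplicate_protein_changes(example_entries)
```

Keeping the first occurrence is an arbitrary choice; which of several duplicate entries is retained does not affect a protein-level prediction, because the tools used here take only the amino acid substitution as input.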
In silico analysis using PolyPhen-2. PolyPhen-2 is an online tool for predicting the functional consequences of an amino acid substitution on a human protein. The PolyPhen-2 web interface was accessed at http://genetics.bwh.harvard.edu/pph2/index.shtml. Batch query mode was used for analysis. Each query line for a missense variant followed the format "# Protein ID Position AA1 AA2". The protein IDs of BRCA1 and BRCA2 are NP_009225 and NP_000050, respectively. Both HumDiv- and HumVar-trained PolyPhen-2 models were used in this study. Each analysed variant is classified as benign (score ≤ 0.5), possibly damaging (0.5 < score ≤ 0.9), or probably damaging (score > 0.9) according to the predetermined thresholds of False Positive Rate (FPR) for each of the two models (HumDiv and HumVar).
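The score cut-offs and the query-line layout described above can be sketched as follows. The code assumes the score lies between 0 and 1 and that a query line consists of four whitespace-separated fields (protein ID, position, original residue, substituted residue); the function names are hypothetical, and the example residue change is a placeholder that has not been checked against the reference sequence.

```python
def classify_polyphen2(score: float) -> str:
    """Apply the cut-offs quoted in the text to a PolyPhen-2 score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score <= 0.5:
        return "benign"
    if score <= 0.9:
        return "possibly damaging"
    return "probably damaging"


def polyphen2_is_damaging(score: float) -> bool:
    """Group 'possibly' and 'probably' damaging together, as in this study."""
    return classify_polyphen2(score) != "benign"


def polyphen2_query_line(protein_id: str, position: int,
                         aa1: str, aa2: str) -> str:
    """Build one batch query line: protein ID, position, AA1, AA2."""
    return f"{protein_id} {position} {aa1} {aa2}"


# Placeholder substitution on the BRCA1 protein ID given in the text.
example_line = polyphen2_query_line("NP_009225", 100, "A", "V")
```

Note that the boundaries are inclusive on the lower category: a score of exactly 0.5 is benign and a score of exactly 0.9 is possibly damaging.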
In silico analysis using SIFT. SIFT is an online tool that uses sequence homology to judge the functional impact of missense variants. The SIFT web interface was accessed at http://sift.jcvi.org/. SIFT Human Protein was selected. Each query line for a missense variant consisted of the protein Ensembl ENSP ID followed by the specified substitution: "ENSP ID, substitution". The Ensembl ENSP IDs used for BRCA1 and BRCA2 were ENSP00000350283 and ENSP00000439902, respectively. The batch protein tool was used to submit multiple queries. SIFT gives prediction outcomes for missense variants as damaging (score < 0.05) or tolerated (score ≥ 0.05).
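The SIFT cut-off and query layout can be sketched in the same way. The code assumes the score lies between 0 and 1 and that the ENSP ID and substitution are joined by a comma with no surrounding space, which is an assumption about the exact separator; the function names and the example substitution are hypothetical placeholders.

```python
def classify_sift(score: float) -> str:
    """Apply the cut-off quoted in the text: below 0.05 is damaging."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return "damaging" if score < 0.05 else "tolerated"


def sift_query_line(ensp_id: str, substitution: str) -> str:
    """Build one batch query line: ENSP ID, then the substitution."""
    return f"{ensp_id},{substitution}"


# Placeholder substitution on the BRCA1 ENSP ID given in the text.
example_line = sift_query_line("ENSP00000350283", "A100V")
```

The direction of the SIFT scale is the opposite of PolyPhen-2: low SIFT scores indicate a damaging prediction, so a score of exactly 0.05 is classed as tolerated.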
Evaluation of prediction results of variants with known clinical significance. The performance of PolyPhen-2 and SIFT was analysed by comparing the PPV, NPV, sensitivity, specificity and accuracy of each approach. For these calculations, TP or True Positives are pathogenic variants with ClinVar expert review called possibly damaging or probably damaging by PolyPhen-2, or damaging by SIFT. FP or False Positives are benign variants called possibly damaging or probably damaging by PolyPhen-2, or damaging by SIFT. TN or True Negatives are benign variants called benign by PolyPhen-2, or tolerated by SIFT. FN or False Negatives are pathogenic variants called benign by PolyPhen-2, or tolerated by SIFT. Calculations were performed using the MedCalc diagnostic test evaluation calculator.
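For readers who want to reproduce these statistics without an online calculator, the sketch below uses the standard definitions of the five measures in terms of TP, FP, TN and FN. It is not the calculator used in the study, the class and function names are hypothetical, and the example counts are invented for illustration rather than taken from Table 1.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # pathogenic variants predicted damaging
    fp: int  # benign variants predicted damaging
    tn: int  # benign variants predicted benign/tolerated
    fn: int  # pathogenic variants predicted benign/tolerated


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard diagnostic test measures, returned as proportions."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        "accuracy": _ratio(c.tp + c.tn, total),
    }


# Invented counts for illustration only.
example_metrics = diagnostic_metrics(ConfusionCounts(tp=8, fp=2, tn=18, fn=2))
```

With these definitions, a tool that produces no false negatives (FN = 0) has both 100% sensitivity and 100% NPV, regardless of how many false positives it produces.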