A computational model for classification of BRCA2 variants using mouse embryonic stem cell-based functional assays

Sequencing-based genetic tests that identify individuals at increased risk of hereditary breast and ovarian cancers have led to the identification of more than 40,000 sequence variants of BRCA1 and BRCA2. Most of these are classified as variants of uncertain significance (VUS) because their impact on disease risk remains unknown, largely owing to insufficient familial linkage and epidemiological data. Several assays have been developed to examine the effect of VUS on protein function, and their results can be used to assess the impact of these variants on cancer susceptibility. In this study, we report the functional characterization of 88 BRCA2 variants, including several previously uncharacterized variants, using a well-established mouse embryonic stem cell (mESC)-based assay. We examined the ability of each variant to rescue the lethality of Brca2-null mESCs, as well as the sensitivity of the rescued cells to six DNA-damaging agents, including ionizing radiation and a PARP inhibitor. We also examined the impact of the BRCA2 variants on splicing. In addition, we developed a computational model that estimates each variant's probability of impact on function, which can be used for risk assessment. Whereas previous VarCall models were based on a single functional assay, our new platform analyzes data from multiple functional assays both separately and in combination. We validated our VarCall models using 12 known pathogenic and 10 neutral variants and demonstrated their usefulness in determining the pathogenicity of BRCA2 variants that are listed as VUS or as variants with conflicting functional interpretations.
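To illustrate in general terms how scores from several functional assays can be turned into one probability of impact on function, the sketch below uses a deliberately simplified Bayesian classifier. It is not the VarCall model, which this abstract does not specify in detail. It assumes, for illustration only, that each assay's scores follow a normal distribution within the pathogenic and within the neutral control class, and that assays are conditionally independent given a variant's true class, so per-assay log-likelihood ratios can be summed. The function names, assay names (`viability`, `parp_survival`), the 0.5 prior, and all numbers are hypothetical and are not data from this study.

```python
import math
from statistics import mean, stdev


def normal_logpdf(x, mu, sd):
    """Log density of a normal distribution N(mu, sd^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def sigmoid(z):
    """Numerically stable logistic function: converts log-odds to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def posterior_impact(variant_scores, pathogenic_ref, neutral_ref, prior=0.5):
    """Probability that a variant impacts function, combining several assays.

    variant_scores: dict mapping assay name -> score of the variant under test
    pathogenic_ref, neutral_ref: dict mapping assay name -> list of scores of
        known pathogenic / neutral control variants in that assay
    prior: prior probability of impact on function (hypothetical default 0.5)

    Assumes (illustration only) Gaussian scores within each class and
    conditional independence of assays given the variant's true class.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for assay, x in variant_scores.items():
        # Fit a normal distribution to each reference class for this assay
        mu_p, sd_p = mean(pathogenic_ref[assay]), stdev(pathogenic_ref[assay])
        mu_n, sd_n = mean(neutral_ref[assay]), stdev(neutral_ref[assay])
        # Add this assay's log-likelihood ratio (pathogenic vs neutral)
        log_odds += normal_logpdf(x, mu_p, sd_p) - normal_logpdf(x, mu_n, sd_n)
    return sigmoid(log_odds)


# Hypothetical control scores (percent of wild-type activity), not study data
PATHOGENIC_REF = {"viability": [5, 8, 3, 6], "parp_survival": [10, 15, 12]}
NEUTRAL_REF = {"viability": [90, 95, 85, 100], "parp_survival": [80, 85, 90]}

p_single = posterior_impact({"viability": 4}, PATHOGENIC_REF, NEUTRAL_REF)
p_combined = posterior_impact(
    {"viability": 92, "parp_survival": 88}, PATHOGENIC_REF, NEUTRAL_REF
)
```

Calling `posterior_impact` with one assay corresponds to analyzing that assay separately; passing several assays combines them. The conditional-independence assumption is the main simplification: correlated assays (for example, two drug-sensitivity readouts) would be over-counted by this sketch, which a joint model would avoid.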


Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
- The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
- A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
- The statistical test(s) used AND whether they are one- or two-sided (only common tests should be described solely by name; describe more complex techniques in the Methods section)
- A description of all covariates tested
- A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
- A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
- For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted (give P values as exact values whenever suitable)
- For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
- For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
- Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.

Software and code
Policy information about availability of computer code

Data collection
No software used.

Data analysis
All analyses were conducted in the R programming language, Version 3.6.3. The R code and appropriately formatted input data files are available upon request.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability

The raw data are provided in Supplemental Tables 2 and 3.

nature research | reporting summary
April 2020

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

- Life sciences
- Behavioural & social sciences
- Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sample size
Two independently generated mouse ES cell clones expressing each BRCA2 variant were used in every experiment. Two independent clones were used for each variant to rule out any "position effect" arising from the site at which the BAC integrated into the mouse ES cell genome.

Data exclusions
No experimental data were excluded.

Replication
All drug sensitivity assays were performed in triplicate using two independent ES cell clones. All cell viability assays were performed using two independently generated mES cell clones expressing each variant.

Randomization
Cells were randomly plated for the drug sensitivity and HAT selection assays.
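The replication scheme above (technical triplicates nested within two independent clones per variant) implies a two-level summary. The sketch below shows one common way to do this under stated assumptions: each replicate's readout is normalized to the mean of the untreated replicates, triplicates are averaged within each clone, and the mean and standard deviation are then taken across clones, so the clone rather than the well is the unit of replication. The readout type (e.g. colony counts), the normalization, the function names and the clone IDs are all hypothetical; the source does not state how survival was computed, and the numbers are not study data.

```python
from statistics import mean, stdev


def percent_survival(treated, untreated):
    """Per-replicate survival (%) relative to the mean of untreated replicates."""
    baseline = mean(untreated)
    return [100.0 * count / baseline for count in treated]


def summarize_variant(clones):
    """Summarize one variant measured in two (or more) independent clones.

    clones: dict mapping a hypothetical clone ID -> (treated, untreated), each a
        list of triplicate readouts (e.g. colony counts) for one drug dose.
    Returns (per-clone mean survival, mean across clones, SD across clones).
    Technical triplicates are averaged within each clone first, so the clone,
    not the well, is the unit of replication.
    """
    per_clone = {cid: mean(percent_survival(t, u)) for cid, (t, u) in clones.items()}
    clone_means = list(per_clone.values())
    return per_clone, mean(clone_means), stdev(clone_means)


# Hypothetical triplicate readouts for one variant at one dose, not study data
clones = {
    "clone_A": ([40, 44, 36], [100, 100, 100]),
    "clone_B": ([50, 46, 54], [100, 100, 100]),
}
per_clone, overall_mean, overall_sd = summarize_variant(clones)
```

Averaging within clones before computing the spread keeps the reported variation tied to independent biological replicates (n = 2 clones) instead of treating all six wells as independent.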

Blinding
All experiments were performed blind by individuals who were unaware of the pathogenicity of the variants.

Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.
