Introduction

Over the last two decades, molecular diagnosis of hereditary breast and/or ovarian cancer (HBOC) has been mostly based on the identification of germline inactivating mutations within BRCA1 [MIM# 113705] and BRCA2 [MIM# 600185].1, 2 In BRCA1 and BRCA2 mutation carriers, the cumulative risk of breast cancer at 70 years has been estimated at 65 and 45%, respectively, and the risk of ovarian cancer at 39 and 10%, respectively.3 The identification of a deleterious BRCA1/BRCA2 mutation within a family is crucial for medical follow-up, as mutation carriers should be offered annual MRI or, alternatively, prophylactic mastectomy and prophylactic salpingo-oophorectomy. Furthermore, in a breast cancer patient, the detection of a germline BRCA1 or BRCA2 mutation may have important therapeutic consequences: complete mastectomy instead of partial mastectomy and, in the future, the prescription of specific targeted therapies, such as PARP inhibitors.4, 5 Considering the medical consequences of the identification of a germline BRCA1 or BRCA2 mutation and the frequency of mutation carriers, which has been estimated at up to 1/500,6 BRCA1 and BRCA2 are among the most frequently analyzed genes in the context of Mendelian diseases. In molecular diagnostic laboratories, analysis of BRCA1/BRCA2 is usually performed by Sanger sequencing, sometimes preceded by a pre-screening step mainly based on the detection of heteroduplexes using methods such as denaturing high-performance liquid chromatography (DHPLC), high-resolution melting analysis (HRM) or enhanced mismatch mutation analysis (EMMA). This analysis is completed by screening for genomic rearrangements, which is routinely performed using semi-quantitative methods, such as QMPSF or MLPA.7, 8, 9, 10, 11 The number of amplicons and the need to combine several methods make the analysis of BRCA1/BRCA2 particularly labour-intensive.
Furthermore, screening for germline mutations within BRCA1 and BRCA2 often remains negative, even in families strongly suspected of presenting a Mendelian form of breast and/or ovarian cancer. Numerous other genes have been shown to be involved in the genetic determinism of breast or ovarian cancers, but for most of them the respective contribution and the penetrance of their mutations remain to be characterized. Mutations within TP53, PTEN, STK11 and CDH1, resulting in Li-Fraumeni syndrome (LFS), Cowden syndrome, Peutz–Jeghers syndrome and hereditary diffuse gastric cancer, respectively, are associated with an increased risk of breast cancer;12, 13, 14, 15 mutations within RAD51 paralogs, such as RAD51C, confer an increased risk of ovarian cancer;16 and variations within ATM, BRIP1, PALB2 and CHEK2 have been shown to be associated with a moderate increase in breast cancer risk.17

Next-generation sequencing (NGS), based on massively parallel sequencing after clonal amplification of DNA templates in emulsion PCR or on solid phase, allows molecular diagnostic laboratories to increase their throughput, to reduce the delay of analysis and to analyze simultaneously the different genes involved in a specific disease or a group of diseases. As recently highlighted, the main challenge of transferring NGS to medical diagnosis is the development of a workflow and bioinformatic pipelines fulfilling the quality control requirements for diagnosis.18 The aim of this study was to prospectively evaluate the performance of NGS for the routine analysis of BRCA1 and BRCA2 and to determine the rate of potentially deleterious mutations within other genes in a large series of patients referred for a suspicion of HBOC. To fulfil both aims, we chose a strategy based on genomic capture and full sequencing of multiple genes, comparing three different capture designs.

Patients and methods

Patients

All the patients analyzed in this study were seen in a genetics consultation and fulfilled at least one of the following criteria: (i) breast cancer before the age of 36, (ii) medullary breast adenocarcinoma before the age of 61, (iii) triple-negative breast cancer before the age of 41, (iv) male breast cancer before the age of 71, (v) ovarian adenocarcinoma before the age of 61, (vi) two breast cancer cases in first- or second-degree relatives (with a transmitting male), with at least one cancer before the age of 51 and the other before 71, (vii) three breast cancer cases in first- or second-degree relatives with at least one cancer before the age of 61, (viii) one breast cancer before the age of 51 in first-degree relatives with prostate cancer before the age of 61 or pancreatic cancer before the age of 61.
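As an illustration only, the individual criteria (i) to (v) can be written as a simple predicate. This is a minimal sketch, not part of the study's workflow: the `IndexCase` record and its field names are hypothetical, "before the age of N" is read as a strict inequality, and an ovarian case is assumed to be an adenocarcinoma. The family-based criteria (vi) to (viii) require pedigree data and are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class IndexCase:
    """Hypothetical record for one affected patient (field names are illustrative)."""
    sex: str                       # "F" or "M"
    cancer: str                    # "breast" or "ovarian" (ovarian assumed adenocarcinoma)
    age_at_diagnosis: int
    medullary: bool = False        # medullary breast adenocarcinoma
    triple_negative: bool = False  # triple-negative breast cancer


def meets_individual_criteria(case: IndexCase) -> bool:
    """Return True if the case satisfies at least one of criteria (i) to (v)."""
    age = case.age_at_diagnosis
    if case.cancer == "breast":
        return (
            age < 36                                  # (i)
            or (case.medullary and age < 61)          # (ii)
            or (case.triple_negative and age < 41)    # (iii)
            or (case.sex == "M" and age < 71)         # (iv)
        )
    if case.cancer == "ovarian":
        return age < 61                               # (v)
    return False
```

For example, `meets_individual_criteria(IndexCase("F", "breast", 40, triple_negative=True))` evaluates to `True` through criterion (iii), whereas the same patient without the triple-negative flag would need a family-based criterion to qualify.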

A consecutive series of 708 patients was studied. Fifty-nine patients, previously genotyped for BRCA1 and BRCA2 and harbouring 62 representative variations, were used as controls (Table 1). For each patient, informed consent for genetic analysis was obtained.

Table 1 BRCA1 and BRCA2 variations used for the NGS validation

Enrichment

We used Agilent eArray to design three different Sureselect solution library baits (Agilent, Santa Clara, CA, USA), covering a variable number of genes (Table 2). For each gene, exons and introns were covered by the capture. The first design had been kindly provided by MC King’s laboratory and was used for the validation steps. Two different designs in addition to the first one were used to perform the prospective screening.19

Table 2 Description of the different capture designs and number of detected mutations

Sample preparation and next-generation sequencing

DNA was extracted from peripheral blood, using the EZ1 BioRobot (Qiagen, Courtaboeuf, France). DNA was sonicated using a Covaris S2 (Covaris, Inc., MS, Woburn, MA, USA). The sample preparation was performed with SPRIworks System I or HT-High Throughput (Beckman, Villepinte, France). Illumina adapters were replaced by indexed adapters (Eurogentec, Angers, France), previously published by Huentelman’s team.20 The SureSelect enrichment process was performed either before combining indexed samples (hereafter called ‘pooling after capture’) according to the manufacturer’s procedures (Agilent) or after combining equimolarly indexed samples (hereafter called ‘pooling before capture’) according to Kenny et al.21 The current protocol, available on request, was robotized on two Biomek FX workstations dedicated to the pre- and post-PCR zone. Libraries were then sequenced on GAIIx (Illumina, San Diego, CA, USA) using the paired-end 2 × 76 bp program.

Bioinformatic analyses

The bioinformatic pipeline was automated using scripts written in Python, with Java used to create report files (see Supplementary Figure S1). We used one hash-based and one BWT-based aligner. The CASAVA suite v1.8 from Illumina performed demultiplexing, generation of .fastq files, mapping of the reads and variant calling; the variants were then annotated using Alamut-HT (Interactive BioSoftware, Rouen, France). The default setting of the Eland-pair analysis implemented in CASAVA was used with the ‘variantsNoCovCutoff’ option. In parallel, the NextGENe software v2.1 (SoftGenetics, State College, PA, USA) was used. NextGENe parameters were: single-end read analysis, 15 base seeds, 5 base move step, matching base percentage ≥70, ‘detect large indels’ option on, mutation percentage ≥10. The CNVseq software was implemented for the detection of genomic rearrangements within BRCA1 and BRCA2.22 The ‘window size’ parameter was set to 500 bp and data were filtered with a log2 ratio ≥0.4 for duplication and ≤−0.5 for deletion detection. For quality control, bam files were inspected for coverage quality, using Samtools and the NextGENe software.23 If the coverage was below 20×, the corresponding genomic region was checked by conventional methods (DHPLC-HRM-MLPA-Sanger sequencing). For BRCA1 and BRCA2, the clinical significance of the variations detected was based on consensus data integrated in the French UMD-BRCA1/BRCA2 databases.24 Variations detected within TP53 were classified according to the IARC TP53 database.25 For variations not listed as deleterious within the databases and for variations within the other genes of the capture, interpretations were based on the Align-GVGD tool and the minor allele frequencies (MAF) estimated from the NHLBI GO Exome Sequencing Project (ESP; http://evs.gs.washington.edu/EVS/).26 In this study, we classified missense mutations with an AGVGD score ≥C45 and an ESP MAF <0.01 as potentially deleterious.
Impact on splicing was predicted using MaxEntScan score and SpliceSiteFinder score. Variations outside the canonical AG/GT splice sites and inducing a 15% decrease of the MaxEntScan score and a 5% decrease of the SpliceSiteFinder score were considered potential splicing defects.27 All mutations have been submitted to the Universal Mutation Database (http://www.umd.be/BRCA1/, http://www.umd.be/BRCA2/).
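The filtering thresholds described in this section can be summarized as a few small functions. The following is a minimal sketch, not the authors' pipeline code: all function and constant names are hypothetical, the splicing "decrease" is assumed to be a relative drop of at least the stated percentage with respect to a positive wild-type score, and a variant absent from ESP is assumed to have a MAF of 0.

```python
from typing import Optional

# Thresholds quoted in the text above.
DUP_LOG2_MIN = 0.4       # CNVseq log2 ratio >= 0.4 -> duplication
DEL_LOG2_MAX = -0.5      # CNVseq log2 ratio <= -0.5 -> deletion
MIN_COVERAGE = 20        # below 20x -> region re-checked by conventional methods
AGVGD_MIN_CLASS = 45     # Align-GVGD class >= C45
ESP_MAF_MAX = 0.01       # ESP minor allele frequency < 0.01
MAXENT_MIN_DROP = 0.15   # 15% decrease of the MaxEntScan score
SSF_MIN_DROP = 0.05      # 5% decrease of the SpliceSiteFinder score


def call_cnv_window(log2_ratio: float) -> str:
    """Label one window from its log2 ratio."""
    if log2_ratio >= DUP_LOG2_MIN:
        return "duplication"
    if log2_ratio <= DEL_LOG2_MAX:
        return "deletion"
    return "neutral"


def needs_conventional_check(coverage: float) -> bool:
    """True when a region is too shallow to be trusted from NGS alone."""
    return coverage < MIN_COVERAGE


def is_potentially_deleterious_missense(agvgd_class: str,
                                        esp_maf: Optional[float]) -> bool:
    """Apply the AGVGD >= C45 and ESP MAF < 0.01 rule to a missense variant."""
    grade = int(agvgd_class.lstrip("Cc"))         # "C45" -> 45
    maf = 0.0 if esp_maf is None else esp_maf     # assumption: absent from ESP = 0
    return grade >= AGVGD_MIN_CLASS and maf < ESP_MAF_MAX


def relative_drop(wild_type: float, variant: float) -> float:
    """Fractional decrease of a splice score relative to the wild-type score."""
    if wild_type <= 0:
        raise ValueError("relative drop is only defined here for a positive wild-type score")
    return (wild_type - variant) / wild_type


def is_potential_splicing_defect(mes_wt: float, mes_var: float,
                                 ssf_wt: float, ssf_var: float) -> bool:
    """Both predictors must show at least their threshold decrease."""
    return (relative_drop(mes_wt, mes_var) >= MAXENT_MIN_DROP
            and relative_drop(ssf_wt, ssf_var) >= SSF_MIN_DROP)
```

Keeping the thresholds as named constants makes it straightforward to document, in a quality report, exactly which cut-offs were applied to a given run.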

Confirmation of the detected variations

For BRCA1 and BRCA2, all the SNV and indels detected using NGS within the coding sequence or within ±50 bp of the intronic sequences and not recorded as polymorphisms were confirmed by Sanger sequencing, using the BigDye Terminator Cycle Sequencing V1.1 Ready Reaction kit (Life Technologies, Carlsbad, CA, USA). Genomic rearrangements detected by the CNVseq software were checked using semi-quantitative methods, either MLPA (MRC-Holland, Amsterdam, Netherlands) or QMPSF. For the other genes included in the capture enrichment, all variations inducing a premature stop codon or a potential splicing defect were checked by Sanger sequencing.

Results

Validation of the NGS pipeline

First, we compared two indexing protocols based on pooling after capture and pooling before capture, respectively (Supplementary Figure S2). Twelve libraries prepared from patients harbouring known BRCA1 or BRCA2 mutations (patients T1 to T12, Table 1) were sequenced according to the two protocols, using the same index sequences. Each lane produced a mean number of reads equal to 60 × 10⁶ ± 5 × 10⁶ (SD) (Supplementary Figure S2). No obvious sequencing biases between indexes were noted after the demultiplexing procedure. The two protocols did not show differences in sequencing quality parameters or in coverage. All the variations tested were correctly identified (Table 1). Then, using the pooling before capture protocol, we evaluated the NGS quality control parameters on 48 other control DNA samples (T6 and T13 to T59; Table 1) harbouring 430 SNV and indels, and 13 large genomic rearrangements within BRCA1 or BRCA2. Each lane of GAIIx produced 90 × 10⁶ ± 6 × 10⁶ (SD) reads. The average on-target ratio was equal to 42 ± 1% (SD). The coverage in targeted regions was on average equal to 225 ± 27× (SD) for six multiplexed libraries. Our NGS pipeline detected 438 SNV and indels, and 17 large genomic rearrangements. All the known variations tested were detected, showing a sensitivity of 100%. The eight additional detected SNV and indels were not confirmed by Sanger sequencing, corresponding to 1.8% of false positives. Most of them were recurrent transversion variants with an unbalanced forward/reverse strand ratio. Similarly, the four additional large genomic rearrangements were not validated by MLPA/QMPSF. Two discrepancies were noted between the CASAVA and NextGENe software (Table 1). The BRCA2 c.156_157insAP003441.3:g.105088_105370, also known as c.156_157insAlu, in patient T59, corresponding to a Portuguese founder mutation,28 was correctly detected only by CASAVA using orphan paired reads data, and the BRCA1 c.5277+48_5277+59dup mutation in patient 15 was identified only by NextGENe.
These two discrepancies led us to keep in the bioinformatics pipeline both the CASAVA and the NextGENe software.
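The validation figures quoted above follow directly from the counts given in the text. The short sketch below reproduces that arithmetic under the assumption that the 1.8% figure is the share of unconfirmed calls among all SNV and indel calls; the variable and function names are illustrative.

```python
def fraction(part: int, whole: int) -> float:
    """Return part / whole, refusing an empty denominator."""
    if whole <= 0:
        raise ValueError("denominator must be positive")
    return part / whole


# Counts reported for the 48 control samples.
KNOWN_SNV_INDELS = 430
CALLED_SNV_INDELS = 438
KNOWN_REARRANGEMENTS = 13
CALLED_REARRANGEMENTS = 17

# Every known variant was detected, so true positives equal the known count.
sensitivity = fraction(KNOWN_SNV_INDELS, KNOWN_SNV_INDELS)

# Calls that were not confirmed by Sanger sequencing or MLPA/QMPSF.
unconfirmed_snv = CALLED_SNV_INDELS - KNOWN_SNV_INDELS
unconfirmed_rearrangements = CALLED_REARRANGEMENTS - KNOWN_REARRANGEMENTS

# Assumption: the false-positive percentage is taken over all SNV/indel calls.
false_positive_share = fraction(unconfirmed_snv, CALLED_SNV_INDELS)

print(f"sensitivity: {sensitivity:.0%}")
print(f"unconfirmed SNV/indel calls: {unconfirmed_snv} ({false_positive_share:.1%})")
print(f"unconfirmed rearrangement calls: {unconfirmed_rearrangements}")
```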

The last step of the validation study was conducted on a group of 168 consecutive, previously unanalyzed patients by performing NGS and DHPLC-HRM-MLPA-Sanger sequencing in parallel. These conventional methods had been routinely used in our laboratories for the molecular diagnosis of HBOC in more than 4000 patients. Excluding polymorphisms, 85 variations, including 14 causal variations, of which one was a genomic rearrangement, were detected within BRCA1 and BRCA2. All the variations detected by our conventional procedures were also detected by NGS.

Detection, using the NGS pipeline, of BRCA1, BRCA2 and TP53 mutations

The NGS workflow was applied to the molecular diagnosis of HBOC in 708 new patients, using different versions of the capture set (Table 2). A total of 69 germline deleterious mutations (37 in BRCA1 and 32 in BRCA2) were detected (Table 3): 53 mutations were predicted to induce a premature termination codon (PTC), 10 were previously known to induce a splicing defect, 3 corresponded to genomic rearrangements and 3 were known deleterious missense mutations. Eight additional missense variations within BRCA2 suspected to be deleterious were also detected. Among this series of patients (see the inclusion criteria in Patients and methods), the overall mutation detection rate for BRCA1 and BRCA2 was therefore 10.8%, which is comparable with the rates obtained by the French diagnostic laboratories (http://www.e-cancer.fr). We also detected four TP53 germline mutations, recorded as deleterious: c.638G>C, p.Arg213Pro; c.646G>A, p.Val216Met; c.704A>G, p.Asn235Ser; c.1010G>A, p.Arg337His (according to the reference sequence NM_000546.5). Among the four corresponding families (Supplementary Figure S3), only one family clearly met the Chompret criteria for LFS.29 Three rare variants (EVS MAF <0.01) of unknown significance were also found in TP53: c.664C>T, p.Pro222Ser; c.1025G>A, p.Arg342Gln; c.1060C>A, p.Gln354Lys.

Table 3 Deleterious mutations found in BRCA1 and BRCA2 by NGS in 708 patients

Detection, using the NGS pipeline, of mutations affecting other genes

In the other panel genes, 36 variations inducing a PTC or affecting the canonical AG/GT splice sites were detected and were classified as deleterious (Tables 2 and 4, Figure 1). These mutations accounted for one-third of the mutations classified as deleterious in our series (Figure 1). Among 468 families, deleterious mutations within the MMR genes MSH2 and PMS2 were detected in five distinct families, two of which included at least one relative with an ovarian cancer (Supplementary Figure S4). Only one deleterious mutation was found in CDH1 and, retrospectively, gastric cancers were mentioned within the corresponding family (data not shown). We also detected, among 708 patients, 10 inactivating mutations within PALB2 and RAD51C and 10 inactivating mutations within CHEK2 and ATM (Table 4). In addition, according to the thresholds described in Patients and methods (section Bioinformatic analyses) for missense changes and for potential splicing mutations, 28 missense changes could be suspected to be deleterious and 11 variants were predicted to induce a splicing defect (Table 4, Supplementary Table S1).

Table 4 Inactivating mutations detected in other genes than BRCA1, BRCA2 and TP53
Figure 1

Relative distribution of variants detected with NGS in 708 HBOC patients. Percentages were based on the number of times each gene was sequenced, which depended on the version of the capture design.

Discussion

The transfer of NGS from research to diagnostic laboratories is today one of the most important challenges in medical genetics. Although NGS offers considerable possibilities in terms of throughput, the entire procedure, including the bioinformatic analyses, should fulfil the quality requirements of diagnostic laboratories. Here, we show and validate the efficiency of the entire NGS procedure for the molecular diagnosis of HBOC. Indeed, all SNVs, indels and genomic rearrangements tested and previously detected by conventional methods were detected using our pipeline. Compared with the classical methods commonly used for genetic analyses, the sensitivity of the entire procedure was estimated to be 100% and we observed 1.8% of false positives. We found that the CNVseq algorithm was able to detect genomic rearrangements with good sensitivity and specificity. Nevertheless, this algorithm is computationally intensive and, therefore, other software such as CONTRA should offer an interesting alternative.30 A major advantage of the NGS procedure is that the progressive implementation of specific software should allow the detection of other types of alterations, which are not found using conventional procedures. For instance, complex rearrangements, such as inversions, should be detectable by taking advantage of paired-end data, using software such as PINDEL.31 Deep intronic mutations probably constitute a reservoir of undetected mutations, and the capture of intronic sequences will represent an additional advantage of a capture-based NGS strategy once tools for the interpretation of intronic mutations become available.

In terms of efficiency, the transition from conventional to NGS procedures has allowed our medium-size molecular diagnostic laboratory (including three full-time technicians, two bioinformaticians and two medical geneticists) to perform 1000 complete screenings of BRCA1/BRCA2 per year, each analysis being completed and validated within a maximum of 3 months, in a routine procedure. We observed that the implementation of NGS did not result in a dramatic increase in reagent cost per patient, as this cost was evaluated at 292 and 311 euros for the conventional and the NGS procedures, respectively. The main consequence in terms of human resources is the integration of full-time bioinformaticians in medical diagnostic laboratories, regardless of the type of platform.32 In our experience, this is absolutely crucial to construct the informatics pipeline, to evaluate the numerous available software packages and to generate quality reports at each step of the process.

Another advantage of an NGS procedure based on gene capture is the possibility to simultaneously analyze other genes involved in the phenotype. Therefore, an additional aim of our study was to estimate, on a large series of patients, the mutation detection rate within the other genes that have been demonstrated or suspected to be involved in HBOC. Among these, the other gene whose mutations also confer a high risk of breast cancer is TP53. Early-onset breast cancer is one of the canonical tumours of the LFS spectrum and the lifetime breast cancer risk associated with germline TP53 mutations has been estimated to be 49%.33 The TP53 mutation detection rate in women with breast cancer before 36 years of age and without detectable BRCA1 or BRCA2 mutation has been estimated to be 7%.29, 34 Our TP53 mutation detection rate in this series of 468 patients analyzed for TP53 was lower (1%), which can easily be explained by the fact that our patients had not been selected for an early age of breast cancer onset. In patients harbouring germline TP53 mutations, several studies have highlighted the risk of secondary tumours within the radiotherapy field, suggesting that, in a breast cancer patient with a germline TP53 mutation, radiotherapy should be avoided.35, 36, 37 This is a strong additional argument justifying the inclusion of TP53 in a diagnostic NGS panel for breast cancer. Nevertheless, considering the wide LFS tumour spectrum and the tumour risk in children, TP53 testing should be carefully considered and the medical implications of a positive test should be clearly explained to the patient before the test. The increased risk of ovarian cancer in MMR mutation carriers led us to include the MMR genes in the NGS panel. In our series, we detected five deleterious MMR mutations in five families whose presentation was not strongly suggestive of Lynch syndrome (Supplementary Figure S4).
The low mutation detection rate is counterbalanced by the medical benefit resulting from the identification of a MMR germline mutation.38, 39

The detection, among 708 patients suspected of HBOC, of 20 inactivating mutations within PALB2, RAD51C, CHEK2 and ATM (Table 2) indicates that their collective contribution can be estimated at approximately 3% and provides another argument highlighting the genetic heterogeneity of HBOC.19, 40 Within families harbouring these mutations, segregation studies will be performed to estimate their causality and penetrance. At present, published data concerning the causality of these mutations are still insufficient to integrate these genes into a routine HBOC diagnostic panel. For the other genes of the panel, such as MRE11A, NBN, BARD1 and BRIP1, additional studies are needed to validate their implication in HBOC predisposition.

In conclusion, this report shows that the deployment of NGS in medical laboratories significantly increases the throughput, reduces the delay and optimizes the molecular diagnosis of HBOC. Considering the medical consequences of the identification of a deleterious mutation within a family, NGS represents remarkable progress for the clinical management of the families. National and international networks of medical laboratories using NGS should facilitate the diffusion, evaluation and comparison of NGS tools and software. The second challenge for laboratories performing HBOC diagnosis will be the interpretation of the mutations identified, in particular within genes other than BRCA1 and BRCA2. Even more than in the pre-NGS era, the creation of clinical-grade databases and interaction with clinicians will be essential.