Main

Next-generation sequencing (NGS) technologies require probabilistic algorithms for the conversion of uniquely aligned short sequence reads into genotypes. These algorithms are sensitive to multiple sources of error, including sequencing errors, incorrect alignment (“mismapping”), and random sampling.1,2,3,4,5,6,7,8 False-positive results due to sequencing errors are particularly prevalent when read depth is below 10 reads per base on average (“10× coverage” by convention).3 Due to this uncertainty, amplification-based dye terminator dideoxy DNA (“Sanger”) sequencing has been used routinely to confirm NGS results.9,10,11,12,13,14,15,16

However, as read depth increases and additional samples are tested using a consistent experimental protocol and analytical pipeline, more information is available to interrogate the validity of a given variant call. Valuable data, in addition to the count of reference and nonreference (“variant”) nucleotides observed at a given position, are amassed. These data include mapping quality, strand origin, base call quality, position of the variant within a sequence read, haplotype information, and cross-sample comparisons. The commonly used genotype-calling pipeline using the Genome Analysis Toolkit1,17 implements a Bayesian genotype likelihood model (based on known polymorphic loci such as dbSNP variants) and variant quality score recalibration to estimate posterior probabilities for each variant call (with hapmap_3.3.b37.sites and 1000G_omni2.5.b37.sites for training resources).
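The idea of a Bayesian genotype likelihood model can be illustrated with a deliberately simplified sketch. This is not the Genome Analysis Toolkit implementation: it assumes a single per-read error rate, independent reads, and hypothetical prior probabilities, and it ignores mapping quality, strand, and base quality. The function name and default values are illustrative assumptions only.

```python
import math


def genotype_posteriors(n_ref, n_alt, error_rate=0.01,
                        priors=(0.998, 0.001, 0.001)):
    """Return posteriors for (hom-ref, het, hom-alt) at one biallelic site.

    Simplified model (an assumption, not the GATK model): each read shows
    the variant allele with probability `error_rate` under hom-ref, 0.5
    under het, and 1 - error_rate under hom-alt. `priors` are hypothetical.
    """
    alt_probs = (error_rate, 0.5, 1.0 - error_rate)
    # Work in log space so that deep coverage does not underflow.
    log_joint = [
        math.log(prior) + n_alt * math.log(p) + n_ref * math.log(1.0 - p)
        for prior, p in zip(priors, alt_probs)
    ]
    peak = max(log_joint)
    weights = [math.exp(value - peak) for value in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative counts only: 20 reference reads and 18 variant reads.
posteriors = genotype_posteriors(n_ref=20, n_alt=18)
```

Under these assumptions, a balanced mix of reference and variant reads concentrates the posterior on the heterozygous genotype, and each additional read sharpens the call further.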

Although technically these final quality scores (“Qscores”, or “QN,” where N is a value greater than zero) are reported as log-scaled probabilities, comparison across experiment types is not advisable because of the large degree of variability of data volume, data quality, and options between NGS analytical pipelines. In this study, Qscores are considered to be relative measures and are compared only between clinical exome-sequencing (CES) data sets from the end-to-end analytically validated procedures established in the University of California, Los Angeles (UCLA) Clinical Genomics Center, which is part of the UCLA Molecular Diagnostics Laboratories (accredited by both the Clinical Laboratory Improvement Amendments (CLIA) and the College of American Pathologists).
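As a minimal sketch of what "log-scaled" means here, assuming the Qscores follow the standard Phred convention (Q = -10 log10 P), the conversion in both directions is shown below. The function names are illustrative and not part of any pipeline.

```python
import math


def phred_to_error_probability(q):
    """Nominal error probability implied by a Phred-scaled score Q."""
    return 10.0 ** (-q / 10.0)


def error_probability_to_phred(p):
    """Phred-scaled score implied by an error probability p (0 < p <= 1)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10.0 * math.log10(p)


nominal_q30 = phred_to_error_probability(30)    # nominally 1 in 1,000
nominal_q500 = phred_to_error_probability(500)  # nominally 1 in 10^50
```

The nominal probability implied by a score in the hundreds or thousands is far smaller than any error rate that could be verified empirically, which is one reason to treat such scores as relative rather than literal measures.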

For variants with high quality scores (>Q10,000) and high coverage (>100×), the amount of information supporting the genotype call is overwhelming. For such variants, failure to replicate the finding by Sanger sequencing is highly indicative of human error (e.g., sample swap). Thus, for high-quality NGS variants, Sanger confirmation serves almost exclusively as a sample quality control measure. Therefore, it is the goal of this study to establish a conservative internal quality score cutoff, above which Sanger confirmation of CES-identified variants will no longer be a necessary quality control measure in our laboratory.

Materials and Methods

CES procedure

Exome sequencing was performed in the UCLA Clinical Genomics Center (http://pathology.ucla.edu/genomics) following validated protocols. Briefly, high-molecular-weight genomic DNA was isolated from whole blood collected in a lavender top tube (ethylene diamine tetraacetic acid dipotassium (K2EDTA) or ethylene diamine tetraacetic acid tripotassium (K3EDTA)) using a QIAcube (Qiagen, Venlo, The Netherlands). For all of the clinical samples, exome sequencing was performed using the Agilent SureSelect Human All Exon 50Mb kit (Agilent Technologies, Santa Clara, CA) for exome capture and Illumina HiSeq2000 (Illumina, San Diego, CA) for sequencing as 50-bp paired-end runs using V3 chemistry. For the nonclinical samples, the Agilent SureSelect Human All Exon 50Mb XT kit (V2) was used for exome capture and Illumina HiSeq2000 for sequencing as 100-bp paired-end runs using V3 chemistry.

Data analysis was performed using the analytical pipeline implemented and validated for CES in the UCLA Clinical Genomics Center. All sequence reads were aligned to the human reference genome (Human GRCh37/hg19) using Novoalign (Novocraft, Selangor, Malaysia). Polymerase chain reaction (PCR) duplicates were marked by Picard (http://picard.sourceforge.net/), and the Genome Analysis Toolkit (Broad Institute, Cambridge, MA) was used to realign indels, recalibrate the quality scores, and call, filter, recalibrate, and evaluate the variants. All variants called across the protein-coding regions and the flanking junctions were annotated using Variant Annotator X, an in-house MySQL database using data from the publicly available Ensembl Variant Effect Predictor (European Bioinformatics Institute, Hinxton, UK).18 A detailed description of the bioinformatic methods used to analyze these data is presented in Supplementary Materials online.

Several steps are taken to reduce the probability of sample swap errors in our laboratory, including the following: (i) assays are performed by appropriately licensed technologists with experience in NGS workflows; (ii) at least two unique identifiers are used to label all reaction vessels and worksheets at all preanalytical stages; and (iii) samples are alternated by gender. In addition, when related individuals are tested as part of a trio, Mendelian errors are analyzed by counting the number of inconsistent genotypes. For instance, from internal experience, the proband should not have more than five de novo amino acid–altering rare variants, and approximately half of the heterozygous variants present in the proband should be inherited from the mother and the other half from the father. Moreover, when available, previous genetic testing results (such as variants described in clinical reports from individual gene assessments or regions of homozygosity from chromosomal microarray analyses) are also cross-referenced with the CES data. Some additional steps that laboratories could use to reduce sample errors include running samples in duplicate, running the CES assay in parallel with a single-nucleotide polymorphism array or genotyping identity panel for concordance analysis (if not previously performed as mentioned earlier), or spiking the blood sample with a unique plasmid during extraction and confirming that the plasmid sequence occurs in the final result.
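The trio check described above, counting genotypes that are inconsistent with Mendelian inheritance, can be sketched as follows. This is a minimal illustration and not the laboratory's code: it assumes autosomal, biallelic sites with genotypes encoded as the number of variant alleles (0, 1, or 2), and all function names are hypothetical.

```python
def transmissible_alleles(genotype):
    """Variant-allele counts (0 or 1) a parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_consistent(child, mother, father):
    """True if the child's genotype can be formed from one allele per parent."""
    return any(
        m + f == child
        for m in transmissible_alleles(mother)
        for f in transmissible_alleles(father)
    )


def count_mendelian_errors(trio_genotypes):
    """Count (child, mother, father) genotype triples that are inconsistent."""
    return sum(
        not is_mendelian_consistent(child, mother, father)
        for child, mother, father in trio_genotypes
    )


# Illustrative genotypes only (child, mother, father).
example_sites = [(1, 0, 0), (1, 1, 0), (2, 1, 0), (0, 2, 0), (2, 1, 1)]
n_errors = count_mendelian_errors(example_sites)
```

An apparent de novo heterozygous call, such as (1, 0, 0), is counted as inconsistent in this sketch, so an unusually high count across the exome would point toward a sample swap rather than true de novo variation.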

Variant selection

All clinically reported variants, both clinically significant findings and variants of uncertain significance, were selected for confirmation. In total, 110 unique single-nucleotide variants (SNVs) were selected for Sanger confirmation ( Table 1 ), and a subset (16 SNVs) of these was randomly selected for assessment from a pool of variants with quality scores <Q2000. These additional variants had not been clinically reported because they are not in genes that are known to cause any clinical condition.

Table 1 Variant information

All SNVs selected, regardless of report status, are predicted to be nonsynonymous and are rare (with an average minor allele frequency <1% in the Exome Variant Server).19

Sanger sequencing

PCR primers were designed for each target locus using the Web-based Primer3Plus software (Andreas Untergasser and Harm Nijveen).20 Targets were amplified using PCR and subjected to agarose gel electrophoresis for size analysis of resulting amplicons. If no amplicon was observed, multiple amplicons were observed, or an amplicon of improper size was observed, a second independent set of PCR primers was designed and tested in a similar fashion.

Unique, properly sized amplicons were purified using standard techniques. BigDye Terminator DNA-sequencing reactions were then performed on eluted amplicons, and sequencing was conducted by automated capillary gel electrophoresis (ABI 3730, Life Technologies, Carlsbad, CA). A clinical molecular geneticist certified by the American Board of Medical Genetics manually analyzed the resulting sequence traces using Sequence Scanner (Applied Biosystems, Foster City, CA).

Cost analysis

The reagent cost per validation is estimated to be US$20 per variant on average. The personnel cost for designing PCR primers, running the assay, analyzing the data, and interpreting the results is estimated to be US$120. Overhead (including facilities, maintenance, instrument costs, and other considerations) contributes approximately US$100 per test. Combined, the estimated cost of performing Sanger confirmation of a single SNV is thus approximately US$240. These values were calculated based on standard clinical molecular genetics practices and average licensed medical technologist salaries in the UCLA Molecular Diagnostics Laboratories.

Results

Exome-sequencing results were confirmed for 103/103 (100%) of SNVs with quality scores ≥Q500 ( Table 1 ). The coverage depth for these variants ranged from 5× to 250×, with a mean of 116×. The correlation between quality score and coverage depth is positive and statistically significant (R = 0.56; P < 0.0001; Figure 1 ). Of the seven SNVs with quality scores <Q500, only one was not corroborated by the Sanger sequencing data ( Table 1 and Figure 2 ).
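For readers who wish to run the same kind of analysis on their own data, a Pearson correlation coefficient between quality score and coverage depth can be computed as in the sketch below. The input values are made up for illustration and are not the study data; the function name is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical per-variant values, for illustration only.
depths = [10, 40, 80, 120, 200]
quality_scores = [150, 600, 900, 2500, 3000]
r = pearson_r(depths, quality_scores)
```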

Figure 1

Correlation between quality scores and depth of coverage. Individual quality scores are plotted against read depth for 110 single-nucleotide variant loci tested. Quality score threshold of Q500 is marked by a dashed gray vertical line. The correlation is positive and significant (Pearson correlation significance test, P < 10^−13).

Figure 2

Validation results sorted by quality score. Each single-nucleotide variant (SNV) tested is represented by a point, sorted by ascending quality score. Red points represent SNVs with quality scores <Q500 (horizontal red dashed line). Vertical red bars indicate failure to confirm.

From the first 144 signed-out reports, the average number of reported variants per report is ~1 (range: 0–5 variants). With an estimated cost of US$240 per confirmation and a sample volume of 40 reports per month, the total cost to the laboratory performing the test is US$9,600 per month (US$115,200 per year). Furthermore, the number of clinically relevant (nonincidental) variants reported per case is not expected to decline over time. Instead, as more disease–gene associations are made, we expect the number of cases with at least one potentially causal variant to increase. Thus, the cost of Sanger confirmation will scale at least linearly with this increased sample volume. Notably, turnaround times for reports requiring variant confirmation were delayed at least 1 week on average compared with reports with no reported variants.
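The cost projection above is simple arithmetic, reproduced here as a small sketch so that other laboratories can substitute their own figures. The default values are the estimates quoted in the text; the function and parameter names are illustrative.

```python
def sanger_cost_per_variant(reagents=20, personnel=120, overhead=100):
    """Estimated cost (US$) of confirming one SNV by Sanger sequencing."""
    return reagents + personnel + overhead


def monthly_confirmation_cost(reports_per_month=40, variants_per_report=1,
                              cost_per_variant=None):
    """Estimated monthly cost (US$) of confirming every reported variant."""
    if cost_per_variant is None:
        cost_per_variant = sanger_cost_per_variant()
    return reports_per_month * variants_per_report * cost_per_variant


per_variant = sanger_cost_per_variant()   # 240
per_month = monthly_confirmation_cost()   # 9600
per_year = 12 * per_month                 # 115200
```

Because the total is a product of volume, variants per report, and unit cost, any increase in sample volume or in the number of reportable variants per case raises the confirmation cost proportionally.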

Discussion

For each UCLA CES test, the decision to report a variant begins with interpretation by a group of diverse experts at a Genomic Data Board. This interpretation considers the molecular genetic evidence (e.g., the effect that a DNA change is predicted to have on its corresponding protein product) as it relates directly to the primary clinical concern(s) noted by the ordering physician. At present, incidental findings are not reported. If the board decides that a variant is worthy of reporting, the laboratory then considers the technical validity of the finding. Before May 2013, Sanger sequencing was used as an alternative methodology for validation of each reported variant. Since that time, only indels and SNVs with quality scores <Q500 are validated by Sanger sequencing.
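The current confirmation policy described above amounts to a simple decision rule, sketched here for clarity. The function name and the string labels for variant type are assumptions made for illustration; the threshold of Q500 is specific to the assay described in this study.

```python
SNV_QUALITY_THRESHOLD = 500  # assay-specific value from this study


def needs_sanger_confirmation(variant_type, quality_score,
                              threshold=SNV_QUALITY_THRESHOLD):
    """Return True if a reportable variant should be confirmed by Sanger.

    All indels are confirmed; SNVs are confirmed only below the threshold.
    """
    if variant_type == "indel":
        return True
    if variant_type == "SNV":
        return quality_score < threshold
    raise ValueError(f"unsupported variant type: {variant_type!r}")


# Illustrative calls only.
confirm_low_snv = needs_sanger_confirmation("SNV", 320)      # True
confirm_high_snv = needs_sanger_confirmation("SNV", 2400)    # False
confirm_indel = needs_sanger_confirmation("indel", 2400)     # True
```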

Because it has been considered the gold standard for more than two decades, using capillary-based Sanger sequencing for confirmation of all NGS results is a safe choice. However, taken out of context, this is highly unusual; technical confirmation of results from a validated assay using an alternative methodology before reporting is not often done for other types of molecular testing. Additionally, there are several specific reasons to suspect that Sanger confirmation of all clinically relevant SNVs detected by NGS is an unnecessarily conservative approach with significant drawbacks.

First, NGS can be sampled to generate dozens or hundreds of independent reads across a locus, whereas increased sampling of Sanger sequencing requires technical replicates. Although a Sanger sequencing peak does represent a large number of individual DNA molecules, these are clonal and arise from an unknown number of original template molecules. At heterozygous positions sequenced bidirectionally, the minimum number of original template molecules required to produce a signal is only four: forward reference, forward alternate, reverse reference, and reverse alternate. Although it is likely that a larger number of template molecules are typically amplified, it is not possible to assess or confirm this number due to the clonal nature of PCR amplification. Moreover, the error rate for a single base is relatively higher in NGS than in Sanger sequencing; however, high read depth (“coverage”) of a locus can overcome this issue.

Second, PCR-based amplification is susceptible to allele dropout due to cryptic variation within primer-binding sites, whereas the target enrichment techniques used in exome sequencing are not. Furthermore, some genomic intervals are extremely difficult to amplify and may not yield high-quality Sanger results despite multiple attempts. Being unable to report a clinically significant variant due to a failure of the Sanger technology introduces a challenging obstacle if the NGS assay is analytically validated.

NGS variant identification is not without error. However, above a certain hypothetical quality threshold, the probability of observing a false-positive NGS result is lower than the false-negative rate of Sanger sequencing (which itself is not perfect). This means that for variants meeting this threshold, performing Sanger sequencing is noninformative beyond sample quality control because the vast majority of results will be concordant and the remaining negative results will not be interpretable. Thus, such high-quality NGS results, when routinely obtained using a method validated by a clinical laboratory, should be considered an equally defensible “gold standard.”

The difficulty then is in determining a high-confidence quality threshold. Coverage depth is a useful guide, but probabilistic genotyping algorithms—such as those implemented within the Genome Analysis Toolkit17—provide highly informative quality scores. Because quality scores are assay specific and relative, it is not possible to calculate an a priori threshold value. Rather, based on a sample of 110 SNV confirmation tests, we have established a conservative in-house quality score threshold of Q500 (~40× coverage) for the CES test in our laboratory, above which all 103 SNVs detected were confirmed by Sanger sequencing.
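One way a laboratory might begin to derive such an empirical threshold from its own confirmation data is sketched below. This is an illustrative approach, not the procedure used in this study: it finds the lowest observed quality score at or above which every tested variant was confirmed, and a laboratory would typically then choose a more conservative (higher) cutoff. The data shown are hypothetical.

```python
def lowest_fully_confirmed_quality(results):
    """Lowest quality score q such that all variants with score >= q confirmed.

    `results` is an iterable of (quality_score, sanger_confirmed) pairs.
    Returns None if the highest-scoring variant failed confirmation.
    """
    threshold = None
    # Walk from the highest score down; at tied scores, visit failures first
    # so that a tie containing a failure is never included in the safe range.
    for quality, confirmed in sorted(results, key=lambda r: (-r[0], r[1])):
        if not confirmed:
            break
        threshold = quality
    return threshold


# Hypothetical confirmation results, for illustration only.
example_results = [(3000, True), (900, True), (520, True),
                   (300, False), (450, True), (120, True)]
empirical_floor = lowest_fully_confirmed_quality(example_results)
```

A cutoff derived this way is only as reliable as the number of variants tested above it, which is why each laboratory needs its own validation set for each assay.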

Manual inspection of variant calls by a genomics expert, using a visualization tool such as the Integrative Genomics Viewer,21,22 is a potential alternative to our quality score threshold approach. Although our experiences generally support this as a valid potential solution, we do not have sufficient data to broadly assess the efficacy of this approach.

Small insertions and deletions (“indels,” defined here as being <10 bp long) are also detected by CES and are reported if clinically significant. At this time, we do not have sufficient data to propose a quality score threshold for confirmation of indels and will thus continue to confirm all such reported variants by the Sanger method.

The perceived benefits of performing Sanger confirmation on all NGS-detected SNVs lie in quality control and risk avoidance. These must be weighed against increased test cost, delayed turnaround time, and the potentially paralyzing failure to confirm a very high–quality variant of clinical significance. Although current professional practice guidelines recommend confirmatory testing of all clinical NGS results, they also allow for laboratories to reduce the amount of confirmatory testing performed as long as suitable validation studies have been completed.23 Follow-up testing of identified variants in additional family members for carrier or presymptomatic status by Sanger sequencing is performed in our laboratory on request for an additional fee. However, in practice, this has been a rare occurrence; for the majority of our exome-sequencing cases, the original proband is the only family member tested, which also argues against the need to have a pair of Sanger sequencing primers available in the laboratory for every variant detected.

All genetic tests introduce uncertainty. At the genomic level, it is the exception, not the rule, when a causal relationship between a genetic variant and a clinical condition can be made absolutely. Thus, when counseling for CES results, the slight probability of a high-quality variant being an analytical false positive is typically a minor consideration compared with the uncertainty of genotype–phenotype relationships. This argues against devoting large amounts of resources to confirmatory testing for variants of high confidence, especially when the testing laboratory is conservative in the ascertainment and reporting of “causative” variants, as ours is.

Stemming from these theoretical and practical considerations, and based on data resulting from the confirmation of 110 SNVs, our group has decided to discontinue routine Sanger confirmation of reported CES results with quality scores >Q500 (SNVs only; Table 2 ). However, other laboratories wishing to follow this paradigm must establish their own quality thresholds for each assay and provide empirical evidence to support those decisions.

Table 2 Summary of Sanger confirmation results, split by quality score threshold of Q500

Disclosure

S.P.S., H.L., K.D., S.F.N., W.W.G., and J.L.D. are employed by a fee-for-service laboratory at UCLA performing next-generation sequencing.