Abstract
Purpose:
Sanger sequencing is currently considered the gold standard methodology for clinical molecular diagnostic testing. However, next-generation sequencing has already emerged as a much more efficient means to identify genetic variants within gene panels, the exome, or the genome. We sought to assess the accuracy of next-generation sequencing variant identification in our clinical genomics laboratory with the goal of establishing a quality score threshold for confirmatory Sanger-based testing.
Methods:
Confirmation data for reported results from 144 sequential clinical exome-sequencing cases (94 unique variants) and an additional set of 16 variants from comparable research samples were analyzed.
Results:
Of the 110 total single-nucleotide variants analyzed, 103 had a quality score ≥Q500, all of which (100%) were confirmed by Sanger sequencing. Of the remaining seven variants with quality scores <Q500, six (86%) were confirmed by Sanger sequencing.
Conclusion:
For single-nucleotide variants, we predict that going forward, we will be able to reduce our Sanger confirmation workload by 70–80%. This serves as a proof of principle that as long as sufficient validation and quality control measures are implemented, the volume of Sanger confirmation can be reduced, alleviating a significant amount of the labor and cost burden on clinical laboratories wishing to use next-generation sequencing technology. However, Sanger confirmation of low-quality single-nucleotide variants and all insertions or deletions <10 bp remains necessary at this time in our laboratory.
Genet Med 16(7), 510–515.
Main
Next-generation sequencing (NGS) technologies require probabilistic algorithms for the conversion of uniquely aligned short sequence reads into genotypes. These algorithms are sensitive to multiple sources of error, including sequencing errors, incorrect alignment (“mismapping”), and random sampling.1,2,3,4,5,6,7,8 False-positive results due to sequencing errors are particularly prevalent when read depth is below 10 reads per base on average (“10× coverage” by convention).3 Due to this uncertainty, amplification-based dye terminator dideoxy DNA (“Sanger”) sequencing has been used routinely to confirm NGS results.9,10,11,12,13,14,15,16
However, as read depth increases and additional samples are tested using a consistent experimental protocol and analytical pipeline, more information is available to interrogate the validity of a given variant call. Valuable data, in addition to the count of reference and nonreference (“variant”) nucleotides observed at a given position, are amassed. These data include mapping quality, strand origin, base call quality, position of the variant within a sequence read, haplotype information, and cross-sample comparisons. The commonly used genotype-calling pipeline using the Genome Analysis Toolkit1,17 implements a Bayesian genotype likelihood model (based on known polymorphic loci such as dbSNP variants) and variant quality score recalibration to estimate posterior probabilities for each variant call (with hapmap_3.3.b37.sites and 1000G_omni2.5.b37.sites for training resources).
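As a minimal sketch of where such per-variant data typically live, the following Python snippet parses one line of a VCF file, the tab-delimited format written by the Genome Analysis Toolkit. The function name `parse_vcf_record` and the example record are hypothetical and invented for illustration. The INFO keys shown (DP for depth, MQ for mapping quality, FS for strand bias) are common annotations, but the source does not state which annotations the laboratory relied on.

```python
def parse_vcf_record(line: str) -> dict:
    """Split one tab-delimited VCF data line into fixed columns and INFO pairs."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, qual, flt, info = fields[:8]
    info_dict = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        # Flag-type INFO entries carry no "=value" part.
        info_dict[key] = value if value else True
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "qual": float(qual),  # the variant quality score ("Qscore")
        "filter": flt,
        "info": info_dict,    # values are kept as strings
    }


# Hypothetical record, invented for illustration only.
example = "1\t123456\t.\tA\tG\t1532.77\tPASS\tDP=96;MQ=60.00;FS=1.204"
record = parse_vcf_record(example)
print(record["qual"], record["info"]["DP"], record["info"]["MQ"])
```

A production pipeline would normally use a dedicated VCF library rather than manual string splitting; the sketch only shows which fields carry the quality score and the supporting evidence.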
Although technically these final quality scores (“Qscores”, or “QN,” where N is a value greater than zero) are reported as log-scaled probabilities, comparison across experiment types is not advisable because of the large degree of variability of data volume, data quality, and options between NGS analytical pipelines. In this study, Qscores are considered to be relative measures and are compared only between clinical exome-sequencing (CES) data sets from the end-to-end analytically validated procedures established in the University of California, Los Angeles (UCLA) Clinical Genomics Center, which is part of the UCLA Molecular Diagnostics Laboratories (accredited by both the Clinical Laboratory Improvement Amendments (CLIA) and the College of American Pathologists).
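To illustrate what "log-scaled" means here, the sketch below assumes the standard Phred convention, Q = -10·log10(P), which is the convention of the VCF QUAL field; the function names are hypothetical. Under this convention a score such as Q500 corresponds to a nominal probability far too small to be taken literally, which is consistent with treating Qscores as relative measures within a single validated pipeline.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred-scaled score Q into a nominal error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an error probability P (0 < P <= 1) into a Phred-scaled score."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -10 * math.log10(p)


for q in (10, 20, 30, 50):
    print(f"Q{q}: nominal error probability {phred_to_error_probability(q):.0e}")
```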
For variants with high quality scores (>Q10,000) and high coverage (>100×), the amount of information supporting the genotype call is overwhelming. For such variants, failure to replicate the finding by Sanger sequencing is highly indicative of human error (e.g., sample swap). Thus, for high-quality NGS variants, Sanger confirmation serves almost exclusively as a sample quality control measure. Therefore, it is the goal of this study to establish a conservative internal quality score cutoff, above which Sanger confirmation of CES-identified variants will no longer be a necessary quality control measure in our laboratory.
Materials and Methods
CES procedure
Exome sequencing was performed in the UCLA Clinical Genomics Center (http://pathology.ucla.edu/genomics) following validated protocols. Briefly, high-molecular-weight genomic DNA was isolated from whole blood collected in a lavender top tube (ethylene diamine tetraacetic acid dipotassium (K2EDTA) or ethylene diamine tetraacetic acid tripotassium (K3EDTA)) using a QIAcube (Qiagen, Venlo, The Netherlands). For all of the clinical samples, exome sequencing was performed using the Agilent SureSelect Human All Exon 50Mb (Agilent Technologies, Santa Clara, CA) for exome capture and Illumina HiSeq2000 (Illumina, San Diego, CA) for sequencing as 50-bp paired-end runs using V3 chemistry. For the nonclinical samples, Agilent SureSelect Human All Exon 50Mb XT kit (V2) was used for exome capture and Illumina HiSeq2000 for sequencing as 100-bp paired-end runs using V3 chemistry.
Data analysis was performed using the analytical pipeline implemented and validated for CES in the UCLA Clinical Genomics Center. All sequence reads were aligned to the human reference genome (Human GRCh37/hg19) using Novoalign (Novocraft, Selangor, Malaysia). Polymerase chain reaction (PCR) duplicates were marked by Picard (http://picard.sourceforge.net/), and Genome Analysis Toolkit (Broad Institute, Cambridge, MA) was used to realign indels, recalibrate the quality scores, call, filter, recalibrate, and evaluate the variants. All variants called across the protein-coding regions and the flanking junctions were annotated using Variant Annotator X, an in-house MySQL database using data from the publicly available Ensembl Variant Effect Predictor (European Bioinformatics Institute, Hinxton, UK).18 A detailed description of the bioinformatic methods used to analyze these data is presented in Supplementary Materials online.
Several steps are taken to reduce the probability of sample swap errors in our laboratory, including the following: (i) assays are performed by appropriately licensed technologists with experience in NGS workflows; (ii) at least two unique identifiers are used to label all reaction vessels and worksheets at all preanalytical stages; and (iii) samples are alternated by gender. In addition, when related individuals are tested as part of a trio, Mendelian errors are analyzed by counting the number of inconsistent genotypes. For instance, from internal experience, the proband should not have more than five de novo amino acid–altering rare variants, and approximately half of the heterozygous variants present in the proband should be inherited from the mother and the other half from the father. Moreover, when available, previous genetic testing results (such as variants described in clinical reports from individual gene assessments or regions of homozygosity from chromosomal microarray analyses) are also cross-referenced with the CES data. Some additional steps that laboratories could use to reduce sample errors include running samples in duplicate, running the CES assay in parallel with a single-nucleotide polymorphism array or genotyping identity panel for concordance analysis (if not previously performed as mentioned earlier), or spiking the blood sample with a unique plasmid during extraction and confirming that the plasmid sequence occurs in the final result.
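The trio-based Mendelian error count described above can be sketched as follows. This is an illustrative Python example, not the laboratory's implementation: the function names and the genotype data are hypothetical, and the logic assumes autosomal, diploid genotypes represented as pairs of alleles.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """True if the child genotype can be formed from one allele of each parent.

    Genotypes are tuples of two alleles, e.g. ("A", "G").
    Assumes an autosomal diploid site.
    """
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((m, f))) == child_sorted
        for m, f in product(mother, father)
    )


def count_mendelian_errors(trio_genotypes):
    """Count sites where the proband genotype is inconsistent with the parents."""
    return sum(
        1
        for child, mother, father in trio_genotypes
        if not is_mendelian_consistent(child, mother, father)
    )


# Hypothetical (child, mother, father) genotypes, invented for illustration.
sites = [
    (("A", "G"), ("A", "A"), ("G", "G")),  # consistent
    (("A", "A"), ("A", "G"), ("A", "G")),  # consistent
    (("G", "G"), ("A", "A"), ("A", "G")),  # inconsistent: mother cannot transmit G
    (("A", "T"), ("A", "A"), ("A", "A")),  # inconsistent: apparent de novo T
]
print(count_mendelian_errors(sites))  # 2
```

An unexpectedly large count across a trio would point toward a sample swap or a misattributed relationship rather than toward many true de novo events.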
Variant selection
All clinically reported variants, both clinically significant findings and variants of uncertain significance, were selected for confirmation. In total, 110 unique single-nucleotide variants (SNVs) were selected for Sanger confirmation (Table 1), and a subset (16 SNVs) of these was randomly selected for assessment from a pool of variants with quality scores <Q2000. These additional variants had not been clinically reported because they are not in genes that are known to cause any clinical condition.
All SNVs selected, regardless of report status, are predicted to be nonsynonymous and are rare (with an average minor allele frequency <1% in the Exome Variant Server).19
Sanger sequencing
PCR primers were designed for each target locus using the Web-based Primer3Plus software (Andreas Untergasser and Harm Nijveen).20 Targets were amplified using PCR and subjected to agarose gel electrophoresis for size analysis of resulting amplicons. If no amplicon was observed, multiple amplicons were observed, or an amplicon of improper size was observed, a second independent set of PCR primers was designed and tested in a similar fashion.
Unique, properly sized amplicons were purified using standard techniques. BigDye Terminator DNA-sequencing reactions were then performed on eluted amplicons, and sequencing was conducted by automated capillary gel electrophoresis (ABI 3730, Life Technologies, Carlsbad, CA). A clinical molecular geneticist certified by the American Board of Medical Genetics manually analyzed the resulting sequence traces using Sequence Scanner (Applied Biosystems, Foster City, CA).
Cost analysis
The reagent cost per validation is estimated to be US$20 per variant on average. The personnel cost for designing PCR primers, running the assay, analyzing the data, and interpreting the results is estimated to be US$120. Overhead (including facilities, maintenance, instrument costs, and other considerations) contributes approximately US$100 per test. Combined, the estimated cost of performing Sanger confirmation of a single SNV is thus approximately US$240. These values were calculated based on standard clinical molecular genetics practices and average licensed medical technologist salaries in the UCLA Molecular Diagnostics Laboratories.
Results
Exome-sequencing results were confirmed for 103/103 (100%) of the SNVs with quality scores ≥Q500 (Table 1). The coverage depth for these variants ranged from 5× to 250×, with a mean of 116×. The correlation between quality score and coverage depth is positive and statistically significant (R = 0.56; P < 0.0001; Figure 1). Of the seven SNVs with quality scores <Q500, only one was not corroborated by the Sanger sequencing data (Table 1 and Figure 2).
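As a rough guide to how much a 103/103 confirmation result constrains the true discordance rate, the sketch below computes a one-sided 95% upper bound for a proportion when zero failures are observed. This calculation is not reported by the authors; it is an illustration that assumes the 103 confirmations are independent trials, and the function names are hypothetical.

```python
def exact_upper_bound_zero_failures(n: int, confidence: float = 0.95) -> float:
    """One-sided exact (Clopper-Pearson) upper bound when 0 of n trials fail.

    Solves (1 - p)^n = 1 - confidence for p.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 1 - (1 - confidence) ** (1 / n)


def rule_of_three(n: int) -> float:
    """Common approximation to the same 95% bound: 3 / n."""
    return 3 / n


n_confirmed = 103  # SNVs with quality score >= Q500, all confirmed
print(f"exact bound:   {exact_upper_bound_zero_failures(n_confirmed):.2%}")
print(f"rule of three: {rule_of_three(n_confirmed):.2%}")
```

Both estimates come out near 3%, which shows why a laboratory adopting a similar threshold would need its own validation set, and ideally a larger one, to support a tighter bound.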
From the first 144 signed-out reports, the average number of reported variants per report is ~1 (range: 0–5 variants). With an estimated cost of US$240 per confirmation and a sample volume of 40 reports per month, the total cost to the laboratory performing the test is US$9,600 per month (US$115,200 per year). Furthermore, the number of clinically relevant (nonincidental) variants reported per case is not expected to decline over time. Instead, as more disease–gene associations are made, we expect the number of cases with at least one potentially causal variant to increase. Thus, the cost of Sanger confirmation will scale at least linearly with this increased sample volume. Notably, turnaround times for reports requiring variant confirmation were delayed at least 1 week on average compared with reports with no reported variants.
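The cost figures above follow from simple arithmetic, reproduced in this short Python sketch. The cost components, report volume, and variants per report are taken from the text; the projected savings simply apply the 70–80% workload reduction predicted in the abstract and are an illustration, not a reported result.

```python
REAGENT_USD = 20      # reagent cost per confirmation
PERSONNEL_USD = 120   # primer design, assay, analysis, interpretation
OVERHEAD_USD = 100    # facilities, maintenance, instruments

cost_per_confirmation = REAGENT_USD + PERSONNEL_USD + OVERHEAD_USD  # 240

REPORTS_PER_MONTH = 40
VARIANTS_PER_REPORT = 1  # approximate average from the first 144 reports

monthly_cost = cost_per_confirmation * REPORTS_PER_MONTH * VARIANTS_PER_REPORT
annual_cost = monthly_cost * 12

print(f"per confirmation: US${cost_per_confirmation}")
print(f"per month:        US${monthly_cost:,}")
print(f"per year:         US${annual_cost:,}")

# Illustrative savings if 70-80% of confirmations become unnecessary.
for reduction_pct in (70, 80):
    saved = annual_cost * reduction_pct // 100
    print(f"{reduction_pct}% reduction saves about US${saved:,} per year")
```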
Discussion
For each UCLA CES test, the decision to report a variant begins with interpretation by a group of diverse experts at a Genomic Data Board. This interpretation considers the molecular genetic evidence (e.g., the effect that a DNA change is predicted to have on its corresponding protein product) as it relates directly to the primary clinical concern(s) noted by the ordering physician. At present, incidental findings are not reported. If the board decides that a variant is worthy of reporting, the laboratory then considers the technical validity of the finding. Before May 2013, Sanger sequencing was used as an alternative methodology for validation of each reported variant. Since that time, only indels and SNVs with quality scores <Q500 are validated by Sanger sequencing.
Because it has been considered the gold standard for more than two decades, using capillary-based Sanger sequencing for confirmation of all NGS results is a safe choice. However, taken out of context, this is highly unusual; technical confirmation of results from a validated assay using an alternative methodology before reporting is not often done for other types of molecular testing. Additionally, there are several specific reasons to suspect that Sanger confirmation of all clinically relevant SNVs detected by NGS is an unnecessarily conservative approach with significant drawbacks.
First, NGS can be sampled to generate dozens or hundreds of independent reads across a locus, whereas increased sampling of Sanger sequencing requires technical replicates. Although a Sanger sequencing peak does represent a large number of individual DNA molecules, these are clonal and arise from an unknown number of original template molecules. At heterozygous positions sequenced bidirectionally, the minimum number of original template molecules required to produce a signal is only four: forward reference, forward alternate, reverse reference, and reverse alternate. Although it is likely that a larger number of template molecules are typically amplified, it is not possible to assess or confirm this number due to the clonal nature of PCR amplification. Moreover, the error rate for a single base is relatively higher in NGS than in Sanger sequencing; however, high read depth (“coverage”) of a locus can overcome this issue.
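The effect of read depth on random sequencing error can be illustrated with a simple binomial model. The sketch below is an assumption-laden illustration rather than the genotype likelihood model used by the Genome Analysis Toolkit: it assumes independent reads, a hypothetical per-base error rate of 1%, and errors spread evenly over the three possible alternate bases. It ignores systematic errors and mismapping, which depth alone does not correct.

```python
from math import comb


def prob_at_least_k_errors(n_reads: int, k: int, p_error: float) -> float:
    """P(X >= k) for X ~ Binomial(n_reads, p_error)."""
    return sum(
        comb(n_reads, i) * p_error**i * (1 - p_error) ** (n_reads - i)
        for i in range(k, n_reads + 1)
    )


PER_BASE_ERROR = 0.01                 # assumed value, for illustration only
P_SPECIFIC_ALT = PER_BASE_ERROR / 3   # chance an error yields one given base

# Chance that random errors alone mimic a variant seen in 2 of 5 reads
# versus 10 of 40 reads.
low_depth = prob_at_least_k_errors(5, 2, P_SPECIFIC_ALT)
high_depth = prob_at_least_k_errors(40, 10, P_SPECIFIC_ALT)
print(f"2 or more of 5 reads:   {low_depth:.1e}")
print(f"10 or more of 40 reads: {high_depth:.1e}")
```

Under these assumptions the high-depth scenario is many orders of magnitude less likely to arise from random error, which is the sense in which coverage compensates for the higher per-base error rate of NGS.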
In addition, PCR-based amplification is susceptible to allele dropout due to cryptic variation within primer-binding sites, whereas the target enrichment techniques used in exome sequencing are not. Furthermore, some genomic intervals are extremely difficult to amplify and may not yield high-quality Sanger results despite multiple attempts. Being unable to report a clinically significant variant due to a failure of the Sanger technology introduces a challenging obstacle if the NGS assay is analytically validated.
NGS variant identification is not without error. However, above a certain hypothetical quality threshold, the probability of observing a false-positive NGS result is lower than the false-negative rate of Sanger sequencing (which itself is not perfect). This means that for variants meeting this threshold, performing Sanger sequencing is noninformative beyond sample quality control because the vast majority of results will be concordant and the remaining negative results will not be interpretable. Thus, such high-quality NGS results, when routinely obtained using a method validated by a clinical laboratory, should be considered an equally defensible “gold standard.”
The difficulty then is in determining a high-confidence quality threshold. Coverage depth is a useful guide, but probabilistic genotyping algorithms—such as those implemented within the Genome Analysis Toolkit17—provide highly informative quality scores. Because quality scores are assay specific and relative, it is not possible to calculate an a priori threshold value. Rather, based on a sample of 110 SNV confirmation tests, we have established a conservative in-house quality score threshold of Q500 (~40× coverage) for the CES test in our laboratory, above which all 103 SNVs detected were confirmed by Sanger sequencing.
Manual inspection of variant calls using a visualization tool such as the Integrative Genomics Viewer21,22 by a genomics expert is a potential alternative to our quality score threshold approach. Although our experiences generally support this as a valid potential solution, we do not have sufficient data to broadly assess the efficacy of this approach.
Small insertions and deletions (“indels,” defined here as being <10 bp long) are also detected by CES and are reported if clinically significant. At this time, we do not have sufficient data to propose a quality score threshold for confirmation of indels and will thus continue to confirm all such reported variants by the Sanger method.
The perceived benefits of performing Sanger confirmation on all NGS-detected SNVs lie in quality control and risk avoidance. These must be weighed against increased test cost, delayed turnaround time, and the potentially paralyzing failure to confirm a very high–quality variant of clinical significance. Although current professional practice guidelines recommend confirmatory testing of all clinical NGS results, they also allow for laboratories to reduce the amount of confirmatory testing performed as long as suitable validation studies have been completed.23 Follow-up testing of identified variants in additional family members for carrier or presymptomatic status by Sanger sequencing is performed in our laboratory on request for an additional fee. However, in practice, this has been a rare occurrence; for the majority of our exome-sequencing cases, the original proband is the only family member tested, which also argues against the need to have a pair of Sanger sequencing primers available in the laboratory for every variant detected.
All genetic tests introduce uncertainty. At the genomic level, it is the exception, not the rule, when a causal relationship between a genetic variant and a clinical condition can be made absolutely. Thus, when counseling for CES results, the slight probability of a high-quality variant being an analytical false positive is typically a minor consideration compared with the uncertainty of genotype–phenotype relationships. This argues against devoting large amounts of resources to confirmatory testing for variants of high confidence, especially when the testing laboratory is conservative in the ascertainment and reporting of “causative” variants, as ours is.
Stemming from these theoretical and practical considerations, and based on data resulting from the confirmation of 110 SNVs, our group has decided to discontinue routine Sanger confirmation of reported CES results with quality scores >Q500 (SNVs only; Table 2). However, other laboratories wishing to follow this paradigm must establish their own quality thresholds for each assay and provide empirical evidence to support those decisions.
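The resulting confirmation policy can be written as a small decision rule. The following Python sketch is a hypothetical illustration: the class and function names are invented, and the Q500 threshold is specific to this laboratory's validated CES pipeline, so it should not be transferred to another assay. SNVs below the threshold and all indels are still sent for Sanger confirmation.

```python
from dataclasses import dataclass

SNV_QUAL_THRESHOLD = 500.0  # laboratory-specific; must be re-established per assay


@dataclass
class ReportedVariant:
    ref: str     # reference allele
    alt: str     # alternate allele
    qual: float  # variant quality score from the pipeline


def is_snv(variant: ReportedVariant) -> bool:
    """A single-nucleotide variant replaces exactly one base with one base."""
    return len(variant.ref) == 1 and len(variant.alt) == 1


def needs_sanger_confirmation(
    variant: ReportedVariant, threshold: float = SNV_QUAL_THRESHOLD
) -> bool:
    """Indels are always confirmed; SNVs only when below the threshold."""
    if not is_snv(variant):
        return True
    return variant.qual < threshold


# Hypothetical variants, invented for illustration.
examples = [
    ReportedVariant("A", "G", 1532.8),   # high-quality SNV
    ReportedVariant("C", "T", 212.4),    # low-quality SNV
    ReportedVariant("CAG", "C", 4100.0)  # deletion, confirmed regardless of score
]
for v in examples:
    print(v.ref, v.alt, v.qual, needs_sanger_confirmation(v))
```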
Disclosure
S.P.S., H.L., K.D., S.F.N., W.W.G., and J.L.D. are employed by a fee-for-service laboratory at UCLA performing next-generation sequencing.
References
DePristo MA, Banks E, Poplin R, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 2011;43:491–498.
Kiezun A, Garimella K, Do R, et al. Exome sequencing and the genetic basis of complex traits. Nat Genet 2012;44:623–630.
Abecasis GR, Altshuler D, Auton A, et al. A map of human genome variation from population-scale sequencing. Nature 2010;467:1061–1073.
Brockman W, Alvarez P, Young S, et al. Quality scores and SNP detection in sequencing-by-synthesis systems. Genome Res 2008;18:763–770.
Li M, Nordborg M, Li LM. Adjust quality scores from alignment and improve sequencing accuracy. Nucleic Acids Res 2004;32:5183–5191.
Li H, Handsaker B, Wysoker A, et al.; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009;25:2078–2079.
Li H, Homer N. A survey of sequence alignment algorithms for next-generation sequencing. Brief Bioinformatics 2010;11:473–483.
Li Y, Vinckenbosch N, Tian G, et al. Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants. Nat Genet 2010;42:969–972.
Ng SB, Turner EH, Robertson PD, et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 2009;461:272–276.
Ng SB, Buckingham KJ, Lee C, et al. Exome sequencing identifies the cause of a mendelian disorder. Nat Genet 2010;42:30–35.
Sikkema-Raddatz B, Johansson LF, de Boer EN, et al. Targeted next-generation sequencing can replace Sanger sequencing in clinical diagnostics. Hum Mutat 2013;34:1035–1042.
Peng G, Fan Y, Palculict TB, et al. Rare variant detection using family-based sequencing analysis. Proc Natl Acad Sci USA 2013;110:3985–3990.
McCourt CM, McArt DG, Mills K, et al. Validation of next generation sequencing technologies in comparison to current diagnostic gold standards for BRAF, EGFR and KRAS mutational analysis. PLoS ONE 2013;8:e69604.
Dames S, Chou LS, Xiao Y, et al. The development of next-generation sequencing assays for the mitochondrial genome and 108 nuclear genes associated with mitochondrial disorders. J Mol Diagn 2013;15:526–534.
Sivakumaran TA, Husami A, Kissell D, et al. Performance evaluation of the next-generation sequencing approach for molecular diagnosis of hereditary hearing loss. Otolaryngol Head Neck Surg 2013;148:1007–1016.
Chin EL, da Silva C, Hegde M. Assessment of clinical analytical sensitivity and specificity of next-generation sequencing for detection of simple and complex mutations. BMC Genet 2013;14:6.
McKenna A, Hanna M, Banks E, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 2010;20:1297–1303.
McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 2010;26:2069–2070.
NHLBI Exome Sequencing Project (ESP). Exome Variant Server [cited 2012 December]. http://evs.gs.washington.edu/EVS/.
Untergasser A, Nijveen H, Rao X, Bisseling T, Geurts R, Leunissen JA. Primer3Plus, an enhanced web interface to Primer3. Nucleic Acids Res 2007;35(Web Server issue):W71–W74.
Robinson JT, Thorvaldsdóttir H, Winckler W, et al. Integrative genomics viewer. Nat Biotechnol 2011;29:24–26.
Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinformatics 2013;14:178–192.
Rehm HL, Bale SJ, Bayrak-Toydemir P, et al.; Working Group of the American College of Medical Genetics and Genomics Laboratory Quality Assurance Committee. ACMG clinical laboratory standards for next-generation sequencing. Genet Med 2013;15:733–747.
Acknowledgements
This work was made possible by the use of data generated from consented clinical testing participants, and we thank them for this valuable contribution. Technical assistance was provided by Nora Warschaw, Traci Toy, Robert Chin, Thien Huynh, and Jean Reiss at the UCLA Molecular Diagnostics Laboratories. Variant Annotator X software was used with the permission and guidance of its author, Michael Yourshaw, and computational assistance was provided by Bret Harry.
Strom, S., Lee, H., Das, K. et al. Assessing the necessity of confirmatory testing for exome-sequencing results in a clinical molecular diagnostic laboratory. Genet Med 16, 510–515 (2014). https://doi.org/10.1038/gim.2013.183
DOI: https://doi.org/10.1038/gim.2013.183