Next-generation sequencing (NGS) provides a broad investigation of the genome, and it is being readily applied for the diagnosis of disease-associated genetic features. However, the interpretation of NGS data remains challenging owing to the size and complexity of the genome and the technical errors that are introduced during sample preparation, sequencing and analysis. These errors can be understood and mitigated through the use of reference standards — well-characterized genetic materials or synthetic spike-in controls that help to calibrate NGS measurements and to evaluate diagnostic performance. The informed use of reference standards, and associated statistical principles, ensures rigorous analysis of NGS data and is essential for its future clinical use.
- The analysis of next-generation sequencing (NGS) data is complex, owing to the breadth of sequences tested and the range of internal biases and errors. In a clinical context, this can lead to false positives and false negatives, and the potential for misdiagnosis.
- These errors and biases can be mitigated through the use of reference standards — materials with known characteristics that are crucial for test development, quality control and proficiency testing.
- Various reference standards have been developed for NGS, including well-characterized biological samples, synthetic controls and in silico data sets. Each approach has its own strengths and limitations.
- Despite recent progress in developing reference standards, several important challenges remain, including the need to establish the commutability of standards with patient samples.
- We consider an informed use of reference standards, along with associated statistical principles, to be essential for the rigorous analysis of NGS data. Furthermore, reference standards will have a key role in developing the next generation of sequencing technologies.
Next-generation sequencing (NGS) is increasingly being applied in clinical diagnosis, as it can identify genetic variations associated with disease1, can determine fusion genes that cause cancer2 and can detect pathogens in patient samples or isolates3. Unlike previous diagnostic sequencing, NGS can deliver a full qualitative and quantitative analysis of the DNA or RNA sequences within a sample in a single test, and thereby promises improved diagnostic yield.
Despite these advantages, the application and analysis of NGS data remain challenging. The sheer size and diversity of the human genome defy simple analysis, and the breadth of the tested sequences increases the risk of both false-positive and false-negative diagnoses. Furthermore, the detection of clinically relevant features in extreme or repetitive regions of the genome is limited4, and technical variables that are introduced during sample preparation, library construction, sequencing and bioinformatic analysis further confound analysis5.
These errors can cause inaccurate analysis and misdiagnosis, and many clinical laboratories continue to validate complex or ambiguous results with Sanger sequencing6, 7. Many of these errors can be mitigated with the use of reference standards, which have been recommended by a range of professional organizations8, 9, 10, 11, 12, 13, 14, 15, 16. The use of reference standards to assess diagnostic performance has been well studied in fields such as analytical chemistry17, 18, 19, providing a conceptual framework that can be readily applied to NGS assays. A reference standard is defined as a material that is “sufficiently homogeneous and stable with respect to one or more specified properties, which has been established to be fit for its intended use in a measurement process” (Ref. 20). These properties can be qualitative, such as the sequence of a DNA molecule; or quantitative, such as its abundance within a sample. Given that DNA is stable, can be replicated with fidelity and can be accurately characterized, it constitutes an ideal reference material. Although RNA enjoys similar advantages, its lower stability necessitates additional care in handling and storage.
Although the adoption of reference standards for NGS has been fairly slow, partly owing to the rapid pace of technological innovation, the increasing translation of NGS technology into clinical practice has recently focused attention on the development of accompanying reference standards. In this Review, we describe the development and use of reference standards for NGS. We particularly focus on the conceptual and statistical principles that underpin the design of reference standards, and the relative merits and limitations of differing approaches. We propose that an understanding of reference standards, and their associated statistical principles, is required for the rigorous analysis of NGS data and its application in clinical diagnosis.
To minimize bias associated with any particular method, a reference standard should be characterized using different procedures. For NGS, this could entail characterization with different NGS technologies or with orthogonal methods, such as Sanger sequencing and quantitative PCR (qPCR). After extensive characterization, a consensus value, such as a genotype, can then be assigned to the reference standard. Metrological organizations, such as the National Institute of Standards and Technology (NIST), can certify reference standards, indicating that they have been characterized for composition with a stated uncertainty and are directly traceable to an external system of units21.
To be fit for its intended use, a reference standard must be commutable; that is, it must perform comparably to samples undergoing testing22. For example, commutability requires a clinical DNA standard to perform similarly to patient genomic DNA (gDNA) during library preparation, sequencing and analysis. The use of reference standards with poor commutability will bias any calibrated measurements and will result in inaccurate diagnosis23. Given the importance of commutability, clear guidelines have been developed for clinical standards that can be readily adapted to reference standards in NGS (Box 1).
Box 1: Commutability
Once commutability has been established, reference standards can be used to calibrate measurements made from patient samples. For example, the abundance of a DNA sequence in a patient sample can be determined by comparison to a reference sequence of known abundance, along with the uncertainty associated with this measurement. This calibration allows standardization of measurements across multiple samples, and allows diagnostic thresholds to be anchored to reference standards.
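The calibration described above can be sketched in a few lines. This is a minimal illustration, not a validated procedure: the function name `calibrate_abundance` and the example numbers are hypothetical, and the uncertainty term assumes that Poisson counting noise is the only source of error, which will understate the uncertainty of a real NGS workflow.

```python
from math import sqrt

def calibrate_abundance(sample_reads, reference_reads, reference_copies):
    """Estimate target copies from a reference standard of known abundance.

    Returns (estimated_copies, relative_standard_error). The error term
    assumes Poisson counting noise only (an illustrative simplification).
    """
    if sample_reads <= 0 or reference_reads <= 0:
        raise ValueError("read counts must be positive")
    # Reads per copy are inferred from the reference and applied to the target.
    estimate = sample_reads * reference_copies / reference_reads
    # Delta-method approximation for the ratio of two Poisson counts.
    relative_se = sqrt(1.0 / sample_reads + 1.0 / reference_reads)
    return estimate, relative_se

# Hypothetical example: the reference was added at 1,000 copies.
copies, rel_se = calibrate_abundance(
    sample_reads=200, reference_reads=100, reference_copies=1000
)
print(f"estimated copies: {copies:.0f} (relative SE about {rel_se:.1%})")
```

The estimate is only as good as the commutability of the reference: if the standard is captured or sequenced with a different efficiency from the patient DNA, the ratio is biased regardless of read depth.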
Reference standards in NGS. The development of reference standards should address some of the unique challenges posed by NGS. Foremost, the breadth and depth of NGS enable many genome regions to be tested in a single assay, increasing diagnostic (or prognostic) yield but also increasing the risk of false-positive findings. Evaluating these events requires reference standards that reflect the diversity of the genomic or transcriptomic features being tested by an NGS assay.
Given the breadth of the interrogated sequence, precision is of crucial importance for NGS tests in which, for example, only a few rare mutations are expected within large regions of the genome, and even a low false-positive rate can overwhelm diagnosis with erroneous mutation calls. Filtering practices are often applied to maximize precision; however, these practices can concomitantly increase the proportion of false-negative results, which represent missed opportunities for diagnosis12. This false-negative diagnosis rate is poorly understood for many NGS applications and is difficult to measure without the use of well-characterized reference standards.
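The effect of breadth on precision can be made concrete with some simple arithmetic. The sketch below uses hypothetical numbers (10 true mutations and one false call per 100,000 bases) and assumes a uniform per-base false-positive rate, which real assays will not have.

```python
def expected_precision(region_bp, true_variants, fp_rate_per_bp, sensitivity=1.0):
    """Expected precision, TP / (TP + FP), for a tested region of a given size."""
    true_positives = true_variants * sensitivity
    # Every non-variant base is an opportunity for a false-positive call.
    false_positives = (region_bp - true_variants) * fp_rate_per_bp
    return true_positives / (true_positives + false_positives)

# Hypothetical scenario: the same 10 true mutations in ever larger regions.
for region_bp in (10_000, 1_000_000, 100_000_000):
    precision = expected_precision(region_bp, true_variants=10, fp_rate_per_bp=1e-5)
    print(f"{region_bp:>11,} bp  expected precision = {precision:.3f}")
```

With a fixed number of true variants, expected precision falls as the tested region grows, which is why a false-positive rate that is acceptable for a small gene panel can become unworkable for an exome or genome.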
Sequencing coverage is one of the most relevant technical variables in NGS and is typically limited by library complexity and cost considerations24. Sufficient coverage is crucial for sensitivity and is required for confident measurements of gene expression and variant detection. At low sequence coverage, sensitivity limits are reached, and uncertainty increases. Accordingly, the ability of reference standards to assess the impact of sequencing coverage is important for many NGS applications.
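The dependence of sensitivity on coverage can be illustrated with a binomial model. This is a sketch under stated assumptions: reads are treated as independent draws of the two alleles, sequencing errors and alignment bias are ignored, and the requirement of at least three supporting reads is a hypothetical calling rule rather than a recommendation.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

def detection_probability(depth, allele_fraction, min_alt_reads):
    """Chance that a variant is supported by at least min_alt_reads reads."""
    return prob_at_least(min_alt_reads, depth, allele_fraction)

# Heterozygous germline variant (0.5) versus a low-frequency somatic variant (0.05).
for depth in (10, 30, 100, 500):
    germline = detection_probability(depth, 0.5, min_alt_reads=3)
    somatic = detection_probability(depth, 0.05, min_alt_reads=3)
    print(f"depth {depth:>3}: germline {germline:.3f}, somatic {somatic:.3f}")
```

A reference standard sequenced at several depths lets this idealized curve be compared with the sensitivity that is actually achieved.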
High sequencing coverage can also be required to overcome the appreciable error rate of NGS that results from DNA damage and sequencing errors25. By contrast, systematic sequencing errors due to sequencing artefacts and short-read misalignment cannot be overcome by high coverage and often require reference standards to be identified and understood26.
Given that reference standards can provide known 'truths', the difference between the expected values and the measured values can provide an empirical estimate of uncertainty27. This is otherwise difficult for NGS workflows, which typically include multiple steps that each introduce different types and amounts of uncertainty. A reference standard will accrue such uncertainties as it progresses through the NGS workflow, and can provide a cumulative measure of uncertainty associated with the final diagnosis.
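One simple way to turn the difference between expected and measured values into an empirical estimate of uncertainty is sketched below. The function name and the example allele fractions are hypothetical; the summary statistics (bias, standard deviation and root-mean-square error of the residuals) are generic choices, not a method prescribed by the cited work.

```python
from math import sqrt
from statistics import mean, stdev

def empirical_uncertainty(expected, measured):
    """Summarize measured-minus-expected residuals for a reference standard."""
    if len(expected) != len(measured) or len(expected) < 2:
        raise ValueError("need two or more paired expected/measured values")
    residuals = [m - e for e, m in zip(expected, measured)]
    return {
        "bias": mean(residuals),  # systematic offset
        "sd": stdev(residuals),   # spread around that offset
        "rmse": sqrt(mean([r * r for r in residuals])),  # overall error
    }

# Hypothetical allele fractions: known values versus values measured by NGS.
known = [0.50, 0.25, 0.10]
observed = [0.45, 0.25, 0.15]
print(empirical_uncertainty(known, observed))
```

Because the standard passes through every step of the workflow, these residuals reflect the accumulated error of library preparation, sequencing and analysis rather than any single step.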
Biological reference materials
Natural genetic materials can act as useful reference standards that are relatively cheap and easy to develop, encompass the full size and diversity of the human genome or transcriptome, and are expected to be generally commutable with patient samples. Natural genetic materials are also agnostic to the limitations of current NGS technologies and constitute an impartial reference against which to compare alternative methods (Fig. 1).
Human genome reference standards. The detection of genetic variation associated with disease is the leading clinical use of NGS. However, the use of alternative sequencing technologies and bioinformatic analyses typically returns substantial disagreement among variant calls, often at thousands of genomic sites, for the same individual genome28, 29, 30, 31. The difficulty in establishing a comprehensive and unambiguous set of variants for even a single genome indicates the need for reliably genotyped human samples that can serve as reference standards.
Because the original human reference genome sequence was assembled from a consensus of multiple individuals32, it does not provide a biological material to use as a reference standard. Instead, various individual human genomes have been established as reference standards to benchmark NGS test performance. Despite some concerns regarding genomic instability and drift33, stable gDNA from these individuals can be fairly easily and inexpensively sourced from transformed cell lines.
The genome of a healthy female donor of European ancestry known as NA12878 has become the foremost human genome reference standard. The limitations of individual NGS technologies in analysing the genome were offset by integrating multiple sequencing and analytical approaches to generate a high-confidence set of single nucleotide variants (SNVs) and small insertions and deletions (indels) across most of the genome34. For example, long-read sequencing was required to resolve structural variant sets35 and to assign phasing information to NA12878 variants36. Despite these efforts, a substantial proportion of the human genome remains refractory to sequencing analysis owing to extreme GC content, low complexity or repetitive sequences. These difficult regions often vary between individuals and host a range of clinically relevant mutations4.
Many clinical laboratories routinely sequence the NA12878 gDNA as a process control for their NGS workflow (Box 2). The identified variants can be benchmarked against high-confidence genotypes to assess performance, and sequencing multiple replicates enables repeatability (within-run variation) and reproducibility (between-run variation) to be assessed37. Despite such widespread usage, the consent provided by the NA12878 individual is limited to research use only, and commercial products cannot be derived from the genome.
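Benchmarking a call set against high-confidence genotypes reduces, at its simplest, to set comparisons. The sketch below assumes that variants have already been normalized to a common representation and restricted to the high-confidence regions; the tuple layout `(chrom, pos, ref, alt)` and the toy variants are illustrative only.

```python
def benchmark(truth, calls):
    """Compare called variants with a truth set of (chrom, pos, ref, alt) tuples."""
    truth, calls = set(truth), set(calls)
    tp = len(truth & calls)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # present in the truth set but missed
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
    }

# Toy example with hypothetical variants.
truth_set = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")]
call_set = [("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"), ("chr3", 10, "T", "C")]
print(benchmark(truth_set, call_set))
```

Running the same comparison on replicate libraries from one run, and on libraries from different runs, gives a direct view of repeatability and reproducibility.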
Box 2: Diagnostic statistics
The diversity of human genetic variation has motivated the development and characterization of reference genomes from different ancestries. Accordingly, NIST is expanding its set of supported genome reference standards to include representatives from different ethnic populations38. These additional genomes were collected by the Personal Genome Project, with consent provided for a broader range of uses39. These efforts are further supported by a number of regional initiatives that aim to develop reference genome banks for specific countries and to provide more relevant reference standards for local populations40, 41, 42, 43.
Reference genomes for disease studies. Patient genomes that harbour disease-causing variants can also provide valuable reference standards to guide clinical diagnosis with NGS. However, the sheer diversity of causative variants that are tested with NGS, of which only a small proportion can be present within any single genome, presents a challenge to developing reference genomes for disease.
Cells from patient samples with clinical variants of interest can be transformed to provide a renewable source of reference material. The Genetic Testing Reference Materials Coordination Program (GeT-RM) has characterized a wide range of cell lines that harbour pathogenic mutations for a range of inherited diseases44, 45, 46, 47, 48, 49, as well as variants in pharmacogenetic loci50, 51. These genomes stand as representative examples of the variation associated with disease.
Genome editing to engineer specific variants into a cell line offers an alternative approach52, 53. However, the risk of unintended off-target effects requires careful validation of engineered cell lines to ensure that they remain isogenic for other genome positions.
Establishing a stable, well-characterized and renewable reference genome material for cancer has proved particularly difficult. The extensive characterization of several matched tumour and normal samples has illustrated the complexity of tumour genome populations and has provided useful reference data for benchmarking analysis54, 55, 56. However, tumour samples are typically small and finite, and do not provide a ready source of biological reference material. Furthermore, a tumour sample can encompass multiple, evolving sub-clonal populations and can be insufficiently stable to derive a reliable and homogeneous reference material57. Derived cell lines can provide a simplified example of a cancer genome and can be mixed with other matched cell lines to simulate tumour samples58. Ongoing efforts, such as the Sequencing Quality Control (SEQC) Consortium, aim to establish reference cell lines for use in cancer studies and diagnosis59.
Reference RNA samples. RNA-sequencing (RNA-seq) is confounded by the sheer size and diversity of the transcriptome, variation in RNA sample quality and library preparation methods, and complex bioinformatic analysis60. Nevertheless, the importance of accurately and reproducibly measuring gene expression has motivated the development of well-characterized natural RNA reference materials. However, the responsiveness of gene expression to external stimuli, even during laboratory culture, can cause substantial batch-specific transcriptome variation, and many reference RNA materials must be non-renewably stocked from a single large batch61.
The SEQC59, GEUVADIS62 and Association of Biomolecular Resource Facilities (ABRF)63 projects have used human reference RNA samples to provide comprehensive assessments of RNA-seq accuracy and reproducibility with different protocols, NGS technologies and laboratory sites. Combining these reference samples at known ratios has also allowed the relative accuracy of NGS technologies to be evaluated based on the detection of differentially expressed genes, and this consensus analysis of reference RNA samples has subsequently informed best practices for RNA-seq experimentation59, 62, 63.
Although these RNA samples were sequenced using multiple NGS technologies, novel isoforms continue to be discovered with increasing depth, suggesting that further transcriptional diversity remains unannotated. This highlights the challenge of using natural RNA reference samples for which there is no comprehensive consensus annotation to evaluate false-positive and false-negative findings with RNA-seq. Despite these challenges, natural reference RNA standards are invaluable resources for understanding complex transcriptional features associated with disease, such as the diagnosis of the BCR–ABL1 fusion transcript in leukaemia64.
Reference standards for microorganisms. Metagenomic sequencing can deliver a global profile of the microbial community within an environmental sample65 and can diagnose the presence of pathogens directly from patient samples or isolates3. Unlike previous technologies, NGS can also discover microorganisms that are entirely novel or uncultivable in the laboratory66. However, microbial diversity poses a challenge to analysis by NGS, with individual microorganisms exhibiting a range of genome architectures. Analysis is further confounded by shared and missing reference genome sequences and the presence of background matrix DNA, such as human DNA in patient samples9.
Various microbial reference genomes have been released by NIST for tool development and analysis67. The genomes were selected due to their importance in food safety and clinical microbiology, and encompass a wide range of GC contents. The US Food and Drug Administration (FDA) has also established FDA-ARGOS, a database that lists validated genome sequences from a diverse range of infectious microorganisms, which can be used to standardize the development of NGS tests9.
Mock microbial communities, in which multiple microorganisms have been individually cultured and combined at known abundances to form a community, are often favoured as reference standards to benchmark metagenome analysis. Mock communities can be assembled from extracted gDNA samples, or directly from individual cultures, to allow biases that arise during DNA extraction to be examined. The use of microorganisms with completed reference genomes and combined at known concentrations also allows the limitations of genome quantification and de novo assembly to be investigated. Similarly, mock communities can act as common templates for the multiplex PCR primers that are used in 16S ribosomal DNA (rDNA) profiling, and indicate whether specific microbial lineages are under-estimated or missed during analysis68.
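Because the composition of a mock community is known, under- or over-representation can be measured directly. The sketch below compares observed read fractions with expected fractions as a log2 ratio; the taxon names and counts are made up, and it assumes that read fractions are directly comparable with the mixing fractions (for example, that differences in genome size or 16S copy number have already been accounted for).

```python
from math import log2

def abundance_bias(expected_fraction, observed_reads):
    """log2(observed / expected) per taxon; None when a taxon is not detected."""
    total_expected = sum(expected_fraction.values())
    total_reads = sum(observed_reads.values())
    bias = {}
    for taxon, expected in expected_fraction.items():
        observed = observed_reads.get(taxon, 0) / total_reads
        expected_share = expected / total_expected
        bias[taxon] = log2(observed / expected_share) if observed > 0 else None
    return bias

# Hypothetical even community of four taxa and the reads assigned to each.
expected = {"taxon_A": 0.25, "taxon_B": 0.25, "taxon_C": 0.25, "taxon_D": 0.25}
reads = {"taxon_A": 4000, "taxon_B": 2500, "taxon_C": 1500}
for taxon, value in abundance_bias(expected, reads).items():
    print(taxon, "missed" if value is None else f"{value:+.2f}")
```

Negative values flag lineages that are under-estimated and `None` flags lineages that are missed entirely, the two failure modes noted above for 16S profiling.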
To improve standardization between participating laboratories, the Human Microbiome Project Consortium assembled a mock community of different bacteria and archaea that represent a range of GC contents, genome sizes, repeat content and phylogenetic diversity69, 70. The Microbiome Quality Control (MBQC) project was subsequently initiated to evaluate methods for measuring the human microbiome, using a range of reference standards71. More recent efforts have expanded the scope of represented microorganisms, and have tailored mock communities to specific environmental sources or to NGS applications72. These reference communities have served as useful controls to compare 16S and shotgun sequencing data, to evaluate bias due to GC content and to benchmark metagenome analysis.
A major limitation of using natural genetic materials as NGS reference standards is that they cannot typically be combined with patient samples without contaminating downstream analysis. Spike-in controls, by contrast, are designed to be directly added to a sample and to undergo concurrent library preparation and sequencing, thereby acting as internal quantitative and qualitative controls that are subject to the same downstream technical variables as the accompanying sample (Fig. 1).
Spike-in controls often comprise non-human or artificial sequences73, or contain unique molecular barcodes74, so derivative reads can be distinguished from the accompanying sample following sequencing. For example, the PhiX bacteriophage genome is routinely used as a spike-in control to determine the basic quality control and error rate in Illumina sequencing runs75.
The design of spike-in control sequences is flexible and constrained only by the limits of synthesis, enabling them to be rapidly developed to represent diagnostic features and to address the specific requirements of an NGS test. Spike-ins are typically prepared individually and can be combined at different concentrations to formulate complex mixtures in which many features are represented and internal 'ladders' are built to measure the quantitative features of the accompanying sample. Despite these advantages, achieving commutability remains a constant challenge in developing spike-in controls, as synthetic constructs may not reflect the complexity or behaviour of native DNA or RNA samples76.
Genome spike-ins. Synthetic DNA spike-in controls have been used to represent instances of human genetic variation, including SNVs, indels, and large structural and copy number variants77. The ability to represent genetic variation, particularly with clinical relevance, enables spike-ins to evaluate the detection of these variants with NGS technologies. Furthermore, many variants can be represented within a single mixture of DNA spike-ins, enabling the breadth of NGS diagnosis to be appraised.
A substantial proportion of clinically relevant variants are difficult to resolve using current NGS technologies4. Genome spike-ins can be used to represent such difficult variants, the presence of which may be otherwise ambiguous in natural genome materials. Similarly, natural genome reference materials can be supplemented with synthetic controls that represent clinically relevant or difficult variants that are not otherwise present74, 78.
By manipulating the abundance of specific DNA spike-ins, it is also possible to simulate quantitative features of genome biology, such as variant allele frequency and copy number variation. For example, pairs of DNA spike-ins that represent reference and variant alleles can be either combined to emulate heterozygous genotypes, or further titrated to emulate lower somatic variant allele frequencies that are commonly observed in tumour samples77. These internal DNA spike-in ladders can derive quantitative statistics that are specific to an individual library and can empirically define thresholds for distinguishing sequencing errors from true positives at low allele frequencies (Box 3).
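A library-specific threshold of this kind might be derived as sketched below. The approach shown, taking a high quantile of the non-reference allele fractions observed at spike-in positions known to carry no variant, is one plausible choice and is an assumption made here, as are the function names and the example values.

```python
def empirical_threshold(error_fractions, quantile=0.99):
    """Allele fraction at a given quantile of errors seen at known-reference sites."""
    ordered = sorted(error_fractions)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]

def lowest_detected_level(ladder, threshold):
    """Lowest expected allele fraction whose observed value exceeds the threshold.

    ladder maps expected allele fraction -> observed allele fraction.
    """
    detected = [exp for exp, obs in ladder.items() if obs > threshold]
    return min(detected) if detected else None

# Hypothetical error fractions at variant-free spike-in sites in one library.
errors = [0.0005, 0.001, 0.001, 0.002, 0.0015, 0.003, 0.0008, 0.0012]
threshold = empirical_threshold(errors, quantile=0.99)

# Hypothetical ladder: expected versus observed variant allele fractions.
ladder = {0.5: 0.48, 0.1: 0.09, 0.01: 0.012, 0.001: 0.002}
print("threshold:", threshold)
print("lowest level detected:", lowest_detected_level(ladder, threshold))
```

Because both the errors and the ladder come from the same library, the resulting limit of detection reflects that library's depth and error profile rather than a generic assumption.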
Box 3: Illustrating diagnostic performance
RNA spike-ins. RNA spike-ins were originally developed by the External RNA Controls Consortium (ERCC) as reference standards for quantitative reverse transcriptase (qRT) PCR and microarray assays73, 79, 80, but have since been widely adopted by the RNA-seq community. The ERCC spike-ins comprise a set of polyadenylated transcripts with a range of lengths and GC contents, and without homology to the human genome. The development of spliced RNA spike-ins, which emulate the complex exon and intron architecture of human genes, has allowed further assessments of alternative splicing and transcript assembly using RNA-seq81, 82, 83. Custom RNA spike-in sets have also been developed for more specific applications, including the sequencing of small RNA classes84 and the detection of oncogenic fusion genes85.
RNA spike-ins can be combined to form staggered mixtures that encompass the range of human gene expression86. The accuracy of gene expression measurements with RNA-seq can then be empirically assessed by comparison to this quantitative ladder (Box 4). This comparison also allows gene expression to be quantified with absolute transcript copy numbers86. RNA spike-ins can be formulated at different concentrations between alternative mixtures to provide both positive and negative controls in differential gene expression tests. Adding alternative mixtures to different samples enables users to empirically assess the accuracy of fold-change measurements at different gene expression levels, and can inform the interpretation of differential gene expression between accompanying samples87.
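Comparison with a quantitative ladder is commonly done by linear regression on log-transformed values. The following is a minimal least-squares sketch in plain Python; the concentrations and read counts are invented, and a real analysis would also need to handle zero counts and transcript length.

```python
from math import log2

def fit_ladder(known_conc, observed):
    """Least-squares fit of log2(observed) on log2(known concentration)."""
    x = [log2(c) for c in known_conc]
    y = [log2(o) for o in observed]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy**2 / (sxx * syy)
    return slope, intercept, r_squared

def estimate_concentration(observed_value, slope, intercept):
    """Invert the fitted line to place a sample measurement on the ladder."""
    return 2 ** ((log2(observed_value) - intercept) / slope)

# Hypothetical ladder: spike-in input concentrations and their read counts.
conc = [1, 4, 16, 64, 256]
counts = [12, 45, 200, 790, 3100]
slope, intercept, r2 = fit_ladder(conc, counts)
print(f"slope {slope:.2f}, intercept {intercept:.2f}, R^2 {r2:.3f}")
print(f"a gene with 400 reads sits at about {estimate_concentration(400, slope, intercept):.1f} units")
```

A slope near 1 indicates that measured abundance scales proportionally with input, and the spread of points around the line at the low end of the ladder indicates where quantification becomes unreliable.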
Box 4: Quantitative accuracy and regression analysis
RNA spike-ins can provide a completely characterized truth set that enables the evaluation of false-positive and false-negative findings that is not otherwise possible for natural reference RNA samples. This advantage was utilized during the SEQC and ABRF projects, which complemented reference RNA samples with ERCC controls, enabling a broader analysis of RNA-seq performance that included evaluations of sensitivity and technical variation between NGS methods and laboratories59, 63.
Spike-in controls can also be used as scaling factors for normalization between multiple samples (Box 5). This has proved particularly useful in single-cell RNA-seq experiments, which typically compare thousands of individual cells between which the mRNA composition and the impact of experimental variables can vary substantially88. By adding spike-ins during cell lysis, the mRNA content returned from each cell can be estimated according to the fraction of the library that is derived from spike-ins, with an atypically high fraction being indicative of low RNA quantity (and potentially an experimental error)89. This ability to measure absolute transcript numbers at high cellular resolution allows researchers to investigate novel aspects of transcriptome kinetics that were previously imperceptible using conventional (bulk) RNA-seq90.
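The per-cell estimate described above can be written out as follows. This sketch assumes that spike-in molecules and endogenous mRNAs are captured and sequenced with equal efficiency, which is a strong simplification; the function name, the read counts and the 0.5 cut-off for flagging a cell are all hypothetical.

```python
def spike_in_summary(spike_reads, endogenous_reads, spike_molecules_added):
    """Return (spike-in fraction of the library, estimated mRNA molecules)."""
    if spike_reads <= 0:
        raise ValueError("no spike-in reads: cannot scale this library")
    spike_fraction = spike_reads / (spike_reads + endogenous_reads)
    # Reads per spike-in molecule serve as the scaling factor for the cell.
    estimated_mrna = endogenous_reads * spike_molecules_added / spike_reads
    return spike_fraction, estimated_mrna

# Hypothetical cells: (spike-in reads, endogenous reads), 1,000 molecules added to each.
cells = {"cell_1": (1000, 9000), "cell_2": (2000, 8000), "cell_3": (7000, 3000)}
for name, (spike, endo) in cells.items():
    fraction, mrna = spike_in_summary(spike, endo, spike_molecules_added=1000)
    flag = "  <- low RNA content?" if fraction > 0.5 else ""
    print(f"{name}: spike-in fraction {fraction:.2f}, ~{mrna:.0f} mRNA molecules{flag}")
```

Scaling each cell by its own spike-in reads, rather than by total library size, avoids assuming that every cell contains the same amount of mRNA.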
Box 5: Normalization with spike-in controls
In silico data sets for bioinformatic analysis
The bioinformatic steps during the analysis of NGS libraries are often complex, and are a substantial source of bias and errors. In silico data sets can be generated quickly and easily, and have proved useful for developing and troubleshooting software tools, and for assessing bioinformatic performance (Fig. 1).
Common data sets (typically in FASTQ or SAM/BAM format) can be rapidly simulated or altered to generate 'ground truth' examples for testing bioinformatics analysis. For example, rare or challenging variants can be quickly represented at any desired frequency within a simulated data set91. Furthermore, the progress of each simulated read can be traced at each step during the analytical workflow, from their original genomic position, through alignment and final analysis92. This allows each step to be assessed, and enables the rapid optimization of the NGS workflow.
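A toy simulator illustrates how a ground-truth data set with traceable reads might be built. It is a deliberately simplified sketch and not a substitute for the published simulation tools: reads are error-prone substrings of a short made-up reference, a single substitution is introduced at a chosen allele fraction, and every name in the code is hypothetical.

```python
import random

def simulate_reads(reference, variant_pos, alt_base, allele_fraction,
                   n_reads, read_len, error_rate=0.0, seed=0):
    """Simulate reads that record their true origin and variant status."""
    rng = random.Random(seed)  # fixed seed so the data set is reproducible
    reads = []
    for i in range(n_reads):
        start = rng.randrange(0, len(reference) - read_len + 1)
        bases = list(reference[start:start + read_len])
        offset = variant_pos - start
        carries_variant = False
        # Reads overlapping the site carry the variant at the requested frequency.
        if 0 <= offset < read_len and rng.random() < allele_fraction:
            bases[offset] = alt_base
            carries_variant = True
        # Random substitution errors (these may also hit the variant base).
        for j in range(read_len):
            if rng.random() < error_rate:
                bases[j] = rng.choice([b for b in "ACGT" if b != bases[j]])
        reads.append({"name": f"read{i}", "start": start,
                      "seq": "".join(bases), "carries_variant": carries_variant})
    return reads

reference = "ACGTACGTACGTACGTACGTACGTACGTACGT"
reads = simulate_reads(reference, variant_pos=10, alt_base="T",
                       allele_fraction=0.05, n_reads=1000, read_len=8,
                       error_rate=0.001)
print(sum(r["carries_variant"] for r in reads), "of", len(reads), "reads carry the variant")
```

Because each read keeps its true start position and variant status, the output of an aligner or variant caller can be checked read by read against the simulated truth.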
Various software tools have been developed to produce human genomes with known genotypes and to simulate derivative NGS libraries93. These tools can often incorporate sequencing errors and can model other sources of error present in NGS libraries. Similarly, for RNA-seq, in silico reference data sets have been used to benchmark existing analytical tools94 and, more recently, to develop innovative methods for transcript quantification95, 96.
The most obvious limitation of in silico data sets is that their use is restricted to assessing bioinformatic steps of NGS workflows, and it is difficult to fully model the complexity and variability present in real data using simulated data. Therefore, although in silico reference standards are a useful supplement for testing bioinformatic steps, they do not replace the use of physical standards that measure the full range of variables faced in clinical diagnosis.
The ability to return diagnostic information from the human genome sequence, which may form a permanent component of an individual's health record, necessitates clear and robust regulatory oversight. Regional organizations are typically vested with authority and responsibility to regulate the development and the validation of clinical diagnostic NGS tests, including the use of reference standards to assess and monitor test performance. The FDA and the European Medicines Agency are two of the largest regulatory organizations. For clarity, we outline the regulatory environment in the United States, but analogous principles are applied in other countries.
Validation data demonstrating the diagnostic performance for a gene or a mutation of interest are required for FDA approval of in vitro diagnostic tests. However, this requirement may be prohibitively difficult and expensive for NGS tests that can detect many variants across large genome regions. Nevertheless, in the few examples of NGS tests that have sought FDA approval, such as the diagnosis of CFTR mutations with Illumina's MiSeqDx instrument97, reference standards have proved critical for benchmarking performance.
Alternatively, an NGS test can be accredited for use within a single laboratory under the Clinical Laboratory Improvement Amendments (CLIA)98. Reference standards are central to demonstrating the validity of the NGS assay, including a global analysis of accuracy, precision, sensitivity, specificity, reportable range and reference interval14. Currently, most NGS diagnostic laboratories have sought approval through this pathway, partly because it allows rapidly evolving sequencing and bioinformatics tools to be accredited at lower cost and more quickly than through FDA approval.
The ongoing performance of CLIA-approved NGS tests is routinely monitored with proficiency testing, in which blinded samples are periodically sent to participating clinical laboratories for analysis, which then report results for performance evaluation14. In some cases, performance can also be verified by the informal inter-laboratory exchange of patient samples99. Given the breadth of the variants tested, NGS tests are more suited to methods-based evaluation, rather than to specific genes or mutations of interest100, 101. Using this approach, proficiency testing can use a set of central reference samples to provide an independent and standardized evaluation of many different laboratories and different types of NGS tests.
The College of American Pathologists (CAP) offers one of the most comprehensive proficiency testing programmes for NGS (CAP proficiency testing), including germline and somatic variants102, as well as common or actionable fusion genes103. For microbial genomics, the Global Microbial Identifier recently launched a proficiency testing challenge for bacterial whole-genome sequencing (GMI proficiency tests), in which participants are sent live cultures, extracted gDNA and NGS data from bacterial strains9.
Given the analytical complexity of NGS data, in silico proficiency testing challenges have also been introduced to test bioinformatics workflows102, 104, 105. In these programmes, participants are sent NGS library data for analysis using their local bioinformatics workflow. This will prove particularly useful for assessing the diagnosis of complex structural variants or for evaluating false-negative rates. Although not intended as a formal proficiency testing programme, the FDA recently launched precisionFDA, an online portal where participants can access and share NGS data sets and bioinformatic tools, and can standardize analytical best practices106.
The high sequencing throughput of NGS enables the broad interrogation of the genome or transcriptome with a single test. Given this advantage, NGS is being rapidly established in clinics for the diagnosis of disease-associated genetic features. However, the diagnosis of features is far from simple, particularly given the size and diversity of the genome and the complexity of sequence data and bioinformatic analysis. Reference standards are an invaluable resource through which to understand these limitations.
We have described a range of reference standards that have been developed for NGS (summarized in Table 1). Natural biological samples retain genome complexity and can stand as a record of common and pathogenic human genetic variation that is agnostic to current sequencing or bioinformatic technologies. By contrast, synthetic controls can be precisely designed to address specific clinical or technological applications and, through their careful synthesis and preparation, can enable quantitative aspects of genome biology to be assessed. Although not a substitute for physical reference standards, in silico data sets can be used to efficiently optimize bioinformatic steps. Each type of reference standard has its own relative merits and limitations, and ideally a combination of different types should be used in order to provide a robust framework for validation and quality control14.
Reference standards have so far mostly been used to benchmark NGS workflows. However, we anticipate that the routine use of reference standards will increasingly enable the development of novel statistical and bioinformatic analyses87, 107. This includes the ability to empirically measure library statistics and uncertainty, which can then inform and train new generations of bioinformatic tools26, 108. Furthermore, the use of reference standards to expand and to standardize the assessment of difficult, complex or quantitative features of the genome can lead to further gains in diagnostic yield109.
Continued technological innovation is expected to generate more sophisticated synthetic controls and to lead to more comprehensively characterized biological materials. This relationship between technological innovation and reference standards is reciprocal, as reference standards will in turn inform the development and optimization of new sequencing technologies. The continued development of reference standards is a relatively simple alternative approach to improving the accuracy, reliability and standardization of clinical diagnosis, without requiring further advances in NGS technologies. Accordingly, the use of reference standards is likely to expand in step with the broader implementation of NGS in clinical diagnosis and our evolving understanding of genetic disease.
- Clinical whole-exome sequencing for the diagnosis of Mendelian disorders. N. Engl. J. Med. 369, 1502–1511 (2013).
- Translating RNA sequencing into clinical diagnostics: opportunities and challenges. Nat. Rev. Genet. 17, 257–271 (2016).
- Next-generation sequencing for infectious disease diagnosis and management. J. Mol. Diagn. 17, 623–634 (2015).
- Medical implications of technical accuracy in genome sequencing. Genome Med. 8, 24 (2016).
This study investigated the location of clinically relevant variants in regions of the human genome that are refractory to reliable genotyping with NGS owing to the presence of extreme GC content or repetitive sequences.
- Library preparation methods for next-generation sequencing: tone down the bias. Exp. Cell Res. 322, 12–20 (2014).
- Sanger confirmation is required to achieve optimal sensitivity and specificity in next-generation sequencing panel testing. J. Mol. Diagn. 18, 923–932 (2016).
- Systematic evaluation of Sanger validation of next-generation sequencing variants. Clin. Chem. 62, 647–654 (2016).
- Guidelines for diagnostic next-generation sequencing. Eur. J. Hum. Genet. 24, 2–5 (2016).
- Assuring the quality of next-generation sequencing in clinical microbiology and public health laboratories. J. Clin. Microbiol. 54, 2857–2865 (2016).
- Good laboratory practice for clinical next-generation sequencing informatics pipelines. Nat. Biotechnol. 33, 689–693 (2015).
- College of American Pathologists' laboratory standards for next-generation sequencing clinical tests. Arch. Pathol. Lab. Med. 139, 481–493 (2015).
- ACMG clinical laboratory standards for next-generation sequencing. Genet. Med. 15, 733–747 (2013).
- Opportunities and challenges associated with clinical diagnostic genome sequencing. J. Mol. Diagn. 14, 525–540 (2012).
- Assuring the quality of next-generation sequencing in clinical laboratory practice. Nat. Biotechnol. 30, 1033–1036 (2012).
The Nex-StoCT (Next-generation Sequencing: Standardization of Clinical Testing) workgroup developed a set of guidelines to ensure that results from NGS tests are sufficiently reliable for clinical diagnosis, including the recommendation of reference standards for test validation, quality control and proficiency testing.
- Centers for Disease Control and Prevention. Good laboratory practices for molecular genetic testing for heritable diseases and conditions. MMWR Recomm. Rep. 58, 1–29 (2009).
- Developing a sustainable process to provide quality control materials for genetic testing. Genet. Med. 7, 534–549 (2005).
- Roadmap for harmonization of clinical laboratory measurement procedures. Clin. Chem. 57, 1108–1117 (2011).
- Impact of reference materials on accuracy in clinical chemistry. Clin. Biochem. 31, 449–457 (1998).
- What is a standard? Clin. Chem. 13, 55–76 (1967).
- International Organization for Standardization. ISO Guide 30:2015 — Reference Materials — Selected Terms and Definitions (ISO, 2015).
- Reference materials and reference measurement procedures: an overview from a national metrology institute. Clin. Biochem. Rev. 28, 131–137 (2007).
- Reference materials and commutability. Clin. Biochem. Rev. 28, 139–147 (2007).
- Why commutability matters. Clin. Chem. 52, 553–554 (2006).
- Sequencing depth and coverage: key considerations in genomic analyses. Nat. Rev. Genet. 15, 121–132 (2014).
- DNA damage is a pervasive cause of sequencing errors, directly confounding variant identification. Science 355, 752–756 (2017).
- Synthetic spike-in standards improve run-specific systematic error analysis for DNA and RNA sequencing. PLoS ONE 7, e41356 (2012).
- Uncertainty of measurement in quantitative medical testing: a laboratory implementation guide. Clin. Biochem. Rev. 25, S1–S24 (2004).
- Characterizing and measuring bias in sequence data. Genome Biol. 14, R51 (2013).
- Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med. 5, 28 (2013).
- Optimized filtering reduces the error rate in detecting genomic variants by short-read sequencing. Nat. Biotechnol. 30, 61–68 (2012).
- Performance comparison of whole-genome sequencing platforms. Nat. Biotechnol. 30, 78–82 (2012).
- Extending reference assembly models. Genome Biol. 16, 13 (2015).
- U-251 revisited: genetic drift and phenotypic consequences of long-term cultures of glioblastoma cells. Cancer Med. 3, 812–824 (2014).
- Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat. Biotechnol. 32, 246–251 (2014).
The Genome in a Bottle Consortium used a range of NGS technologies and analytical tools to characterize the NA12878 genome and to provide a set of high-confidence genotypes that can be used to benchmark germline variant-calling pipelines.
- svclassify: a method to establish benchmark structural variant calls. BMC Genomics 17, 64 (2016).
- A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Res. 27, 157–164 (2017).
- Analytical validation of whole exome and whole genome sequencing for clinical applications. BMC Med. Genomics 7, 20 (2014).
- Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci. Data 3, 160025 (2016).
- A public resource facilitating clinical use of genomes. Proc. Natl Acad. Sci. USA 109, 11920–11927 (2012).
- De novo assembly and phasing of a Korean human genome. Nature 538, 243–247 (2016).
- Large-scale whole-genome sequencing of the Icelandic population. Nat. Genet. 47, 435–444 (2015).
- De novo assembly of a haplotype-resolved human genome. Nat. Biotechnol. 33, 617–622 (2015).
- Novel variation and de novo mutation rates in population-wide de novo assembled Danish trios. Nat. Commun. 6, 5969 (2015).
- Development of a genomic DNA reference material panel for Rett syndrome (MECP2-related disorders) genetic testing. J. Mol. Diagn. 16, 273–279 (2014).
- Development of a genomic DNA reference material panel for myotonic dystrophy type 1 (DM1) genetic testing. J. Mol. Diagn. 15, 518–525 (2013).
- Quality assurance for Duchenne and Becker muscular dystrophy genetic testing. J. Mol. Diagn. 13, 167–174 (2011).
- Development of genomic reference materials for cystic fibrosis genetic testing. J. Mol. Diagn. 11, 186–193 (2009).
- Consensus characterization of 16 FMR1 reference materials: a consortium study. J. Mol. Diagn. 10, 2–12 (2008).
- Development of genomic reference materials for Huntington disease genetic testing. Genet. Med. 9, 719–723 (2007).
- Characterization of 137 genomic DNA reference materials for 28 pharmacogenetic genes. J. Mol. Diagn. 18, 109–123 (2016).
This paper illustrates the process undertaken by GeT-RM to develop reference materials for genetic testing, including characterization by multiple laboratories and subsequent consensus verification of genotypes.
- Characterization of 107 genomic DNA reference materials for CYP2D6, CYP2C19, CYP2C9, VKORC1, and UGT1A1: a GeT-RM and Association for Molecular Pathology collaborative project. J. Mol. Diagn. 12, 835–846 (2010).
- Routine use of the Ion Torrent AmpliSeq™ Cancer Hotspot Panel for identification of clinically actionable somatic mutations. Clin. Chem. Lab. Med. 52, 707 (2014).
- A novel method for creating artificial mutant samples for performance evaluation and quality control in clinical molecular genetics. J. Mol. Diagn. 7, 247–251 (2005).
- A somatic reference standard for cancer genome sequencing. Sci. Rep. 6, 24607 (2016).
- Optimizing cancer genome sequencing and analysis. Cell Syst. 1, 210–223 (2015).
This characterization of matched tumour and normal samples shows the requirement for deep sequencing to reveal the diversity of somatic mutations and subclonal populations, with the resulting data providing a useful resource for the bioinformatic analysis of tumour samples.
- A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463, 191–196 (2010).
- Advancing benchmarks for genome sequencing. Cell Syst. 1, 176–177 (2015).
- A cancer cell-line titration series for evaluating somatic classification. BMC Res. Notes 8, 823 (2015).
- SEQC/MAQC-III Consortium. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat. Biotechnol. 32, 903–914 (2014).
This is a comprehensive study of RNA-seq accuracy and reproducibility across multiple sequencing platforms and laboratory sites, using human reference RNA samples spiked with the ERCC controls.
- A survey of best practices for RNA-seq data analysis. Genome Biol. 17, 13 (2016).
- Universal Reference RNA as a standard for microarray experiments. BMC Genomics 5, 20 (2004).
- Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories. Nat. Biotechnol. 31, 1015–1022 (2013).
- Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study. Nat. Biotechnol. 32, 915–925 (2014).
- Establishment of the first World Health Organization International Genetic Reference Panel for quantitation of BCR-ABL mRNA. Blood 116, e111–e117 (2010).
- The road to metagenomics: from microbiology to DNA sequencing technologies and bioinformatics. Front. Genet. 6, 348 (2015).
- Unusual biology across a group comprising more than 15% of domain bacteria. Nature 523, 208–211 (2015).
- Best practices for evaluating single nucleotide variant calling methods for microbial genomics. Front. Genet. 6, 235 (2015).
- Every base matters: assessing small subunit rRNA primers for marine microbiomes with mock communities, time series and global field samples. Environ. Microbiol. 18, 1403–1414 (2016).
- The Human Microbiome Project Consortium. A framework for human microbiome research. Nature 486, 215–221 (2012).
- Jumpstart Consortium Human Microbiome Project Data Generation Working Group. Evaluation of 16S rDNA-based community profiling for human microbiome research. PLoS ONE 7, e39315 (2012).
The Human Microbiome Project developed a mock community of microbes commonly found on or in the human body, which has been used to benchmark metagenome sequencing and analysis.
- The microbiome quality control project: baseline study design and future directions. Genome Biol. 16, 276 (2015).
- High-resolution phylogenetic microbial community profiling. ISME J. 10, 2020–2032 (2016).
- The External RNA Controls Consortium. The External RNA Controls Consortium: a progress report. Nat. Methods 2, 731–734 (2005).
- Plasmid-based materials as multiplex quality controls and calibrators for clinical next-generation sequencing assays. J. Mol. Diagn. 18, 336–349 (2016).
- SASI-Seq: sample assurance spike-ins, and highly differentiating 384 barcoding for Illumina sequencing. BMC Genomics 15, 110 (2014).
- Technical validation of a multiplex platform to detect thirty mutations in eight genetic diseases prevalent in individuals of Ashkenazi Jewish descent. Genet. Med. 7, 633–639 (2005).
- Representing genetic variation with synthetic DNA standards. Nat. Methods 13, 784–791 (2016).
This study presents a set of synthetic spike-in controls representing DNA variants (SNVs, indels and structural variants), which can function as qualitative and quantitative controls for genome sequencing.
- Multiplexed reference materials as controls for diagnostic next-generation sequencing. J. Mol. Diagn. 18, 882–889 (2016).
- The External RNA Controls Consortium. Proposed methods for testing and selecting the ERCC external RNA controls. BMC Genomics 6, 150 (2005).
- Universal RNA reference materials for gene expression. Clin. Chem. 50, 1464–1471 (2004).
- SIRVs: Spike-In RNA Variants as external isoform controls in RNA-sequencing. Preprint at bioRxiv http://dx.doi.org/10.1101/080747 (2016).
- Using synthetic mouse spike-in transcripts to evaluate RNA-seq analysis tools. PLoS ONE 11, e0153782 (2016).
- Spliced synthetic genes as internal controls in RNA sequencing experiments. Nat. Methods 13, 792–798 (2016).
- Improving small RNA-seq by using a synthetic spike-in set for size-range quality control together with a set for data normalization. Nucleic Acids Res. 43, e89 (2015).
- Open-access synthetic spike-in mRNA-seq data for cancer gene fusions. BMC Genomics 15, 824 (2014).
- Synthetic spike-in standards for RNA-seq experiments. Genome Res. 21, 1543–1551 (2011).
This study used the ERCC controls to measure the sensitivity, dynamic range, quantitative accuracy and biases of RNA-seq experiments.
- Assessing technical performance in differential gene expression experiments with external spike-in RNA control ratio mixtures. Nat. Commun. 5, 5125 (2014).
- Computational and analytical challenges in single-cell transcriptomics. Nat. Rev. Genet. 16, 133–145 (2015).
- Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods 10, 1093–1095 (2013).
- Measuring absolute RNA copy numbers at high temporal resolution reveals transcriptome kinetics in development. Cell Rep. 14, 632–647 (2016).
- Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection. Nat. Methods 12, 623–630 (2015).
- Understanding the limitations of next generation sequencing informatics, an approach to clinical pipeline validation using artificial data sets. Cancer Genet. 206, 441–448 (2014).
- A comparison of tools for the simulation of genomic next-generation sequencing data. Nat. Rev. Genet. 17, 459–469 (2016).
- Systematic evaluation of spliced alignment programs for RNA-seq data. Nat. Methods 10, 1185–1191 (2013).
- Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
- Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
- Milestone approval lifts Illumina's NGS from research into clinic. Nat. Biotechnol. 32, 111–112 (2014).
- Centers for Medicare and Medicaid Services. US Department of Health and Human Services. Part 493 — Laboratory Requirements: Clinical Laboratory Improvement Amendments of 1988. 42 CFR §493.1443–1495 https://www.cdc.gov/clia/Regulatory/default.aspx
- Alternative approaches to proficiency testing in molecular genetics. Clin. Chem. 49, 717–718 (2003).
- Methods-based proficiency testing in molecular genetic pathology. J. Mol. Diagn. 16, 283–287 (2014).
- Three-year experience of a CAP/ACMG methods-based external proficiency testing program for laboratories offering DNA sequencing for rare inherited disorders. Genet. Med. 16, 25–32 (2014).
- A model study of in silico proficiency testing for clinical next-generation sequencing. Arch. Pathol. Lab. Med. 140, 1085–1091 (2016).
- Quality assurance of RNA expression profiling in clinical laboratories. J. Mol. Diagn. 14, 1–11 (2012).
- In silico proficiency testing for clinical next-generation sequencing. J. Mol. Diagn. 19, 35–42 (2017).
- Multi-institutional FASTQ file exchange as a means of proficiency testing for next-generation sequencing bioinformatics and variant interpretation. J. Mol. Diagn. 18, 572–579 (2016).
- A research roadmap for next-generation sequencing informatics. Sci. Transl Med. 8, 335ps10 (2016).
- Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 32, 896–902 (2014).
These authors developed a normalization strategy for RNA-seq termed RUV (remove unwanted variation), which adjusts for nuisance technical effects between samples by performing factor analysis on suitable sets of control genes (for example, RNA spike-ins).
- Creating a universal SNP and small indel variant caller with deep neural networks. Preprint at bioRxiv http://dx.doi.org/10.1101/092890 (2016).
- Haplotyping germline and cancer genomes with high-throughput linked-read sequencing. Nat. Biotechnol. 34, 303–311 (2016).
- Clinical validation of a next-generation sequencing screen for mutational hotspots in 46 cancer-related genes. J. Mol. Diagn. 15, 607–622 (2013).
- Power analysis of single-cell RNA-sequencing experiments. Nat. Methods 14, 381–387 (2017).
- Commutability of reference materials in clinical chemistry. J. Int. Fed. Clin. Chem. 5, 169–173 (1993).
- Points of significance: classification evaluation. Nat. Methods 13, 603–604 (2016).
- Deep sequencing of 10,000 human genomes. Proc. Natl Acad. Sci. USA 113, 11901–11906 (2016).
- Genotype and SNP calling from next-generation sequencing data. Nat. Rev. Genet. 12, 443–451 (2011).
- Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
- The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10, e0118432 (2015).
- Limit of blank, limit of detection and limit of quantitation. Clin. Biochem. Rev. 29, S49–S52 (2008).
- Points of significance: simple linear regression. Nat. Methods 12, 999–1000 (2015).
- A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).
- Transcriptional amplification in tumor cells with elevated c-Myc. Cell 151, 56–67 (2012).
- Revisiting global gene expression analysis. Cell 151, 476–482 (2012).
- Adjusting microbiome profiles for differences in microbial load by spike-in bacteria. Microbiome 4, 28 (2016).
The authors thank the following funding sources: Australian National Health and Medical Research Council (NHMRC) Australia Fellowship 1062470 (to T.R.M.). S.A.H. and I.W.D. are supported by Australian Postgraduate Award scholarships. The contents of the published material are solely the responsibility of the administering institution, a participating institution or individual authors and do not reflect the views of NHMRC. The authors also thank L. Burnett (Kinghorn Centre for Clinical Genomics, Australia) for helpful suggestions during manuscript preparation.
- Reference standards
Control materials with known characteristics (for example, a known genotype) against which test performance can be measured.
- Commutability
The ability of a reference standard to perform comparably to actual patient samples when measured using more than one measurement procedure.
- Matrix effects
Effects caused by any sample component other than the analyte of interest that can lead to the non-commutability of reference standards.
- Variant allele frequencies
The fraction of alleles in a given sample (for example, a tumour biopsy sample) that correspond to a variant of interest.
- Precision
(Also known as positive predictive value). The fraction of positive predictions made by a test that are true.
- Sensitivity
(Also known as recall). The fraction of known positives that are correctly predicted by a test.
- Systematic sequencing errors
Nonrandom errors in sequence determination due to sample preparation and sequencing processes.
- NA12878
The well-characterized genome from a healthy female individual that is commonly used to benchmark genome analysis.
- Long-read sequencing
Sequencing approach that uses reads in excess of several kilobases, enabling the resolution of large structural genomic features.
- Phasing
The process of determining the chromosome from which a particular DNA variant is derived.
- Mock microbial communities
A reference standard generated by combining the genome DNA (or cells) from multiple individually cultured microorganisms at known concentrations.
- Spike-in controls
DNA or RNA molecules of known length, sequence composition and abundance that are directly added to samples to act as qualitative and quantitative internal controls.
- Limit of detection
The lowest concentration of an analyte that can be detected by an assay.
- Normalization
The adjustment of technical bias between multiple samples to facilitate accurate comparisons.
- Reportable range
The genomic region or regions in which sequencing data of an acceptable quality can be derived by a next-generation sequencing test.
- Reference interval
The spectrum of sequence variants that occur in an unaffected population from which the patient specimen has been derived.
- Proficiency testing
The provision of reference samples to participating laboratories for testing, with results reported to an independent organization for evaluation (often known as external quality assessment in Europe).
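The glossary definitions of the precision of a test, its sensitivity and the variant allele frequency can be written compactly as ratios. The symbols used here are notation assumed for illustration and do not appear in the original text: TP, FP and FN denote the numbers of true-positive, false-positive and false-negative predictions, and n_variant and n_total denote the number of alleles carrying the variant of interest and the total number of alleles in the sample.

```latex
\begin{align}
\text{Precision}   &= \frac{TP}{TP + FP}, \\
\text{Sensitivity} &= \frac{TP}{TP + FN}, \\
\text{Variant allele frequency} &= \frac{n_{\text{variant}}}{n_{\text{total}}}.
\end{align}
```

For example, a test that reports 90 true variants and 10 false variants while missing 30 known variants would have a precision of 0.90 and a sensitivity of 0.75.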