Abstract
Next-generation sequencing (NGS) has been widely adopted to identify genetic variants and investigate their association with disease. However, the analysis of sequencing data remains challenging because of the complexity of human genetic variation and confounding errors introduced during library preparation, sequencing and analysis. We have developed a set of synthetic DNA spike-ins—termed ‘sequins’ (sequencing spike-ins)—that are directly added to DNA samples before library preparation. Sequins can be used to measure technical biases and to act as internal quantitative and qualitative controls throughout the sequencing workflow. This step-by-step protocol explains the use of sequins for both whole-genome and targeted sequencing of the human genome. This includes instructions regarding the dilution and addition of sequins to human DNA samples, followed by the bioinformatic steps required to separate sequin- and sample-derived sequencing reads and to evaluate the diagnostic performance of the assay. These practical guidelines are accompanied by a broader discussion of the conceptual and statistical principles that underpin the design of sequin standards. This protocol is suitable for users with standard laboratory and bioinformatic experience. The laboratory steps require ~1–4 d and the bioinformatic steps (which can be performed with the provided example data files) take an additional day.
This is a preview of subscription content, access via your institution
Access options
Access Nature and 54 other Nature Portfolio journals
Get Nature+, our best-value online-access subscription
$29.99 / 30 days
cancel any time
Subscribe to this journal
Receive 12 print issues and online access
$259.00 per year
only $21.58 per issue
Buy this article
- Purchase on Springer Link
- Instant access to full article PDF
Prices may be subject to local taxes which are calculated during checkout
Similar content being viewed by others
Data availability
All next-generation sequencing libraries and associated data files, including synthetic sequences and variant annotations, are available for download at http://www.sequinstandards.com/resources/#nature_protocols. Please see the ‘Equipment setup’ section and Supplementary Notes 1 and 2 for further details.
Code availability
Anaquin source code is available from https://github.com/sequinstandards/RAnaquin.
References
Goodwin, S., McPherson, J. D. & McCombie, W. R. Coming of age: ten years of next-generation sequencing technologies. Nat. Rev. Genet. 17, 333–351 (2016).
Sims, D., Sudbery, I., Ilott, N. E., Heger, A. & Ponting, C. P. Sequencing depth and coverage: key considerations in genomic analyses. Nat. Rev. Genet. 15, 121–132 (2014).
Chen, L., Liu, P., Evans, T. C. & Ettwiller, L. M. DNA damage is a pervasive cause of sequencing errors, directly confounding variant identification. Science 355, 752–756 (2017).
Goldfeder, R. L. et al. Medical implications of technical accuracy in genome sequencing. Genome Med. 8, 24 (2016).
Ross, M. G. et al. Characterizing and measuring bias in sequence data. Genome Biol. 14, R51 (2013).
Li, H. Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics 30, 2843–2851 (2014).
Clark, M. J. et al. Performance comparison of exome DNA sequencing technologies. Nat. Biotechnol. 29, 908–914 (2011).
Lam, H. Y. K. et al. Performance comparison of whole-genome sequencing platforms. Nat. Biotechnol. 30, 78–82 (2011).
Gargis, A. S. et al. Assuring the quality of next-generation sequencing in clinical laboratory practice. Nat. Biotechnol. 30, 1033–1036 (2012).
Deveson, I. W. et al. Chiral DNA sequences as commutable controls for clinical genomics. Nat. Commun. 10, 1342 (2019).
Deveson, I. W. et al. Representing genetic variation with synthetic DNA standards. Nat. Methods 13, 784–791 (2016).
Hardwick, S. A., Deveson, I. W. & Mercer, T. R. Reference standards for next-generation sequencing. Nat. Rev. Genet. 18, 473–484 (2017).
Zook, J. M. et al. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat. Biotechnol. 32, 246–251 (2014).
Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
Kim, S. et al. Strelka2: fast and accurate variant calling for clinical sequencing applications. Nat. Methods 15, 591–594 (2018).
Wong, T., Deveson, I. W., Hardwick, S. A. & Mercer, T. R. ANAQUIN: a software toolkit for the analysis of spike-in controls for next generation sequencing. Bioinformatics 33, 1723–1724 (2017).
Hodges, E. et al. Genome-wide in situ exon capture for selective resequencing. Nat. Genet. 39, 1522–1527 (2007).
Albert, T. J. et al. Direct selection of human genomic loci by microarray hybridization. Nat. Methods 4, 903–905 (2007).
Hardwick, S. A. et al. Spliced synthetic genes as internal controls in RNA sequencing experiments. Nat. Methods 13, 792–798 (2016).
Hardwick, S. A. et al. Synthetic microbe communities provide internal reference standards for metagenome sequencing and analysis. Nat. Commun. 9, 3096 (2018).
Zook, J. M. et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci. Data 3, 160025 (2016).
Zook, J. M. & Salit, M. Genomes in a bottle: creating standard reference materials for genomic variation—why, what and how?. Genome Biol. 12, P31 (2011).
Sims, D. J. et al. Plasmid-based materials as multiplex quality controls and calibrators for clinical next-generation sequencing assays. J. Mol. Diagn. 18, 336–349 (2016).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323, 133–138 (2009).
Clarke, J. et al. Continuous base identification for single-molecule nanopore DNA sequencing. Nat. Nanotechnol. 4, 265–270 (2009).
Zheng, G. X. Y. et al. Haplotyping germline and cancer genomes with high-throughput linked-read sequencing. Nat. Biotechnol. 34, 303–311 (2016).
Abyzov, A., Urban, A. E., Snyder, M. & Gerstein, M. CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 21, 974–984 (2011).
Layer, R. M., Chiang, C., Quinlan, A. R. & Hall, I. M. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 15, R84 (2014).
Kavak, P. et al. Discovery and genotyping of novel sequence insertions in many sequenced individuals. Bioinformatics 33, i161–i169 (2017).
Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).
Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
Murphy, K. M. et al. Comparison of the microsatellite instability analysis system and the Bethesda panel for the determination of microsatellite instability in colorectal cancers. J. Mol. Diagn. 8, 305–311 (2006).
Ka, S. et al. HLAscan: genotyping of the HLA region using next-generation sequencing data. BMC Bioinformatics 18, 258 (2017).
Thorvaldsdottir, H., Robinson, J. T. & Mesirov, J. P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinform. 14, 178–192 (2013).
Acknowledgements
The authors would like to thank the following funding sources: Australian National Health and Medical Research Council (NHMRC) Australia Fellowships (1062470 to T.R.M.), APP1108254 (to B.S.K.) and APP1114016 (to J.B). I.W.D is supported by a Cancer Institute NSW Early Career Fellowship (2018/ECF013). T.R.M. and T.W. are supported by a Paramor Family Fellowship. S.A.H. is supported by an Australian Postgraduate Award scholarship. A.L.M.R. is supported by a University of New South Wales Sydney Tuition Fee Scholarship. The contents of the published material are solely the responsibility of the administering institution, a participating institution or individual authors and do not reflect the views of the NHMRC.
Author information
Authors and Affiliations
Contributions
J.B., B.S.K. and C.B. contributed materials. J.B. performed the experiments. T.W., I.W.D., S.A.H. and A.L.M.R. carried out the bioinformatic analysis. J.B., T.W., I.W.D. and T.R.M. wrote the manuscript. All authors conceived the study and contributed to manuscript preparation.
Corresponding authors
Ethics declarations
Competing interests
The authors declare competing interests: the Garvan Institute of Medical Research has filed patents covering aspects of sequencing controls.
Additional information
Journal peer review information: Nature Protocols thanks Justin Zook and other anonymous reviewer(s) for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Key references using this protocol
Deveson, I. W. et al. Nat. Commun. 10, 1342 (2019): https://doi.org/10.1038/s41467-019-09272-0
Hardwick, S. A., Deveson, I. W. & Mercer, T. R. Nat. Rev. Genet. 18, 473–484 (2017): https://doi.org/10.1038/nrg.2017.44
Deveson, I. W. et al. Nat. Methods 13, 784–791 (2016): https://doi.org/10.1038/nmeth.3957
Integrated supplementary information
Supplementary Figure 1 Example of sequin calibration.
Genome browser views show sequencing alignments within a single sequin standard before (upper) and after (middle) coverage calibration, performed using anaquin ‘calibrate’. During calibration, sequin alignments are down-sampled to achieved matched coverage with the human sample DNA (lower) within sequin regions. This example also shows artifactual enrichment of read-pairs at sequin termini, which occurs during some library preparation methods. Anaquin ‘calibrate’ automatically removes these terminal alignments before calibration. Sequin edge regions (550 bp, by default) are also excluded during the calibration process, as well as downstream anaquin analyses (germline/somatic).
Supplementary information
Supplementary information
Supplementary Figure 1
Supplementary Information
Supplementary Notes 1 and 2
Rights and permissions
About this article
Cite this article
Blackburn, J., Wong, T., Madala, B.S. et al. Use of synthetic DNA spike-in controls (sequins) for human genome sequencing. Nat Protoc 14, 2119–2151 (2019). https://doi.org/10.1038/s41596-019-0175-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41596-019-0175-1
This article is cited by
-
A universal molecular control for DNA, mRNA and protein expression
Nature Communications (2024)
-
Reference Materials for Improving Reliability of Multiomics Profiling
Phenomics (2024)
-
Reliable biological and multi-omics research through biometrology
Analytical and Bioanalytical Chemistry (2024)
-
Vibrio-Sequins - dPCR-traceable DNA standards for quantitative genomics of Vibrio spp
BMC Genomics (2023)
-
The Quartet Data Portal: integration of community-wide resources for multiomics quality control
Genome Biology (2023)
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.