Use of synthetic DNA spike-in controls (sequins) for human genome sequencing

Blackburn, James; Wong, Ted; Madala, Bindu Swapna; Barker, Chris; Hardwick, Simon A.; Reis, Andre L. M.; Deveson, Ira W.; Mercer, Tim R.

doi:10.1038/s41596-019-0175-1

Protocol
Published: 19 June 2019

Use of synthetic DNA spike-in controls (sequins) for human genome sequencing

James Blackburn^1,2^ (ORCID: orcid.org/0000-0001-5216-8815),
Ted Wong^1^,
Bindu Swapna Madala^1^,
Chris Barker^1^,
Simon A. Hardwick^1,2^,
Andre L. M. Reis^1,2^,
Ira W. Deveson^1,2^ (ORCID: orcid.org/0000-0003-3861-0472) &
Tim R. Mercer^1,2,3^

Nature Protocols volume 14, pages 2119–2151 (2019)

Abstract

Next-generation sequencing (NGS) has been widely adopted to identify genetic variants and investigate their association with disease. However, the analysis of sequencing data remains challenging because of the complexity of human genetic variation and confounding errors introduced during library preparation, sequencing and analysis. We have developed a set of synthetic DNA spike-ins—termed ‘sequins’ (sequencing spike-ins)—that are directly added to DNA samples before library preparation. Sequins can be used to measure technical biases and to act as internal quantitative and qualitative controls throughout the sequencing workflow. This step-by-step protocol explains the use of sequins for both whole-genome and targeted sequencing of the human genome. This includes instructions regarding the dilution and addition of sequins to human DNA samples, followed by the bioinformatic steps required to separate sequin- and sample-derived sequencing reads and to evaluate the diagnostic performance of the assay. These practical guidelines are accompanied by a broader discussion of the conceptual and statistical principles that underpin the design of sequin standards. This protocol is suitable for users with standard laboratory and bioinformatic experience. The laboratory steps require ~1–4 d and the bioinformatic steps (which can be performed with the provided example data files) take an additional day.
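The final bioinformatic step described in the abstract is evaluating the diagnostic performance of the assay. At its core, this means comparing variant calls in sequin regions against the known sequin variant annotations. The Python below is a minimal sketch of that comparison. It is not the anaquin implementation. It makes three assumptions:

- Variants are reduced to a hypothetical (contig, position, REF, ALT) key.
- A call counts as correct only on an exact match.
- The contig names and variants are invented for illustration.

```python
from typing import Dict, Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant key: contig, 1-based position, REF and ALT alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def evaluate_calls(expected: Iterable[Variant],
                   called: Iterable[Variant]) -> Dict[str, float]:
    """Score variant calls against the known (annotated) sequin variants.

    TP: calls that exactly match an annotated variant.
    FP: calls with no matching annotation.
    FN: annotated variants that were not called.
    """
    truth = set(expected)
    calls = set(called)
    tp = len(truth & calls)
    fp = len(calls - truth)
    fn = len(truth - calls)
    # Sensitivity = TP / (TP + FN); precision = TP / (TP + FP).
    sensitivity = tp / (tp + fn) if truth else float("nan")
    precision = tp / (tp + fp) if calls else float("nan")
    return {"TP": tp, "FP": fp, "FN": fn,
            "sensitivity": sensitivity, "precision": precision}


# Usage with made-up sequin contig names and variants (illustrative only).
truth = [Variant("sequin_A", 100, "A", "G"),
         Variant("sequin_A", 250, "C", "T"),
         Variant("sequin_B", 80, "G", "GA")]
calls = [Variant("sequin_A", 100, "A", "G"),
         Variant("sequin_B", 80, "G", "GA"),
         Variant("sequin_B", 300, "T", "C")]
result = evaluate_calls(truth, calls)
print(result["TP"], result["FP"], result["FN"])  # 2 1 1
```

Exact matching is a simplification. Dedicated benchmarking tools also normalize indel representation before comparing calls, because the same indel can be written in more than one way.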

**Fig. 1: Schematic showing the design and use of sequins in NGS experiments.**

**Fig. 3: Compatibility of sequins with targeted sequencing.**

**Fig. 4: Overview of protocol for sequin use in human genome sequencing.**

**Fig. 5: Calibration of sequin coverage to matched human genome regions.**

**Fig. 6: Example traces measuring DNA fragment size and abundance.**

**Fig. 7: Example qPCR assessment of target enrichment for *ALK*, *BRAF*, *PIK3CA*, *PTEN* and *TP53*.**

**Fig. 8: Example of sequin and corresponding human variants.**

**Fig. 9: Alignment-free comparison of quantitative accuracy between libraries.**

**Fig. 10: Impact of sequence context on NGS performance.**

**Fig. 11: Performance evaluation of somatic variant calling by anaquin.**

**Fig. 12: Comparison of expected human and sequin variants analyzed by targeted sequencing.**

Data availability

All next-generation sequencing libraries and associated data files, including synthetic sequences and variant annotations, are available for download at http://www.sequinstandards.com/resources/#nature_protocols. Please see the ‘Equipment setup’ section and Supplementary Notes 1 and 2 for further details.

Code availability

Anaquin source code is available from https://github.com/sequinstandards/RAnaquin.

Acknowledgements

The authors thank the following funding sources: Australian National Health and Medical Research Council (NHMRC) Australia Fellowships (1062470 to T.R.M.), APP1108254 (to B.S.K.) and APP1114016 (to J.B.). I.W.D. is supported by a Cancer Institute NSW Early Career Fellowship (2018/ECF013). T.R.M. and T.W. are supported by a Paramor Family Fellowship. S.A.H. is supported by an Australian Postgraduate Award scholarship. A.L.M.R. is supported by a University of New South Wales Sydney Tuition Fee Scholarship. The contents of the published material are solely the responsibility of the administering institution, a participating institution or individual authors and do not reflect the views of the NHMRC.

Author information

These authors contributed equally: J. Blackburn, T. Wong.

Authors and Affiliations

Genomics and Epigenetics Division, Garvan Institute of Medical Research, Sydney, Australia
James Blackburn, Ted Wong, Bindu Swapna Madala, Chris Barker, Simon A. Hardwick, Andre L. M. Reis, Ira W. Deveson & Tim R. Mercer
St Vincent’s Clinical School, Faculty of Medicine, UNSW Australia, Sydney, Australia
James Blackburn, Simon A. Hardwick, Andre L. M. Reis, Ira W. Deveson & Tim R. Mercer
Altius Institute for Biomedical Sciences, Seattle, WA, USA
Tim R. Mercer

Contributions

J.B., B.S.K. and C.B. contributed materials. J.B. performed the experiments. T.W., I.W.D., S.A.H. and A.L.M.R. carried out the bioinformatic analysis. J.B., T.W., I.W.D. and T.R.M. wrote the manuscript. All authors conceived the study and contributed to manuscript preparation.

Corresponding authors

Correspondence to Ira W. Deveson or Tim R. Mercer.

Ethics declarations

Competing interests

The authors declare competing interests: the Garvan Institute of Medical Research has filed patents covering aspects of sequencing controls.

Additional information

Journal peer review information: Nature Protocols thanks Justin Zook and other anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1: Example of sequin calibration.

Genome browser views show sequencing alignments within a single sequin standard before (upper) and after (middle) coverage calibration, performed using anaquin ‘calibrate’. During calibration, sequin alignments are down-sampled to achieve coverage matched to that of the human sample DNA (lower) within sequin regions. This example also shows artifactual enrichment of read-pairs at sequin termini, which occurs with some library preparation methods. Anaquin ‘calibrate’ automatically removes these terminal alignments before calibration. Sequin edge regions (550 bp by default) are also excluded from calibration and from downstream anaquin analyses (germline and somatic).
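The calibration described in this legend can be illustrated as uniform random down-sampling of sequin reads. The Python below is a minimal sketch under stated assumptions, not the anaquin ‘calibrate’ code:

- Reads are hypothetical (start, end) intervals on a single sequin.
- The matched human depth is assumed to be already known.
- Every read overlapping the 550 bp edge regions is dropped. This stands in for anaquin's own removal of terminal alignments and edge regions.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Read:
    start: int  # 0-based leftmost aligned position within the sequin (hypothetical)
    end: int    # exclusive end position within the sequin


def interior_reads(reads: List[Read], lo: int, hi: int) -> List[Read]:
    """Keep only reads lying fully inside [lo, hi).

    This drops reads touching the edge regions, including any
    artifactual read-pairs stacked at the sequin termini.
    """
    return [r for r in reads if r.start >= lo and r.end <= hi]


def mean_depth(reads: List[Read], lo: int, hi: int) -> float:
    """Mean per-base depth over [lo, hi) contributed by the given reads."""
    covered = sum(max(0, min(r.end, hi) - max(r.start, lo)) for r in reads)
    return covered / (hi - lo)


def calibrate(reads: List[Read], sequin_length: int, human_depth: float,
              edge: int = 550, seed: int = 0) -> List[Read]:
    """Randomly thin sequin reads so their interior depth matches human_depth."""
    lo, hi = edge, sequin_length - edge
    if hi <= lo:
        raise ValueError("sequin is shorter than twice the edge length")
    kept = interior_reads(reads, lo, hi)
    depth = mean_depth(kept, lo, hi)
    if depth <= human_depth:
        return kept  # down-sampling cannot raise depth, so keep everything
    fraction = human_depth / depth
    rng = random.Random(seed)
    # Keep each read independently with probability `fraction`.
    return [r for r in kept if rng.random() < fraction]


# Usage: a hypothetical 3,000 bp sequin tiled by 150 bp reads at high depth.
reads = [Read(s, s + 150) for _ in range(10) for s in range(0, 2851)]
calibrated = calibrate(reads, sequin_length=3000, human_depth=30.0)
print(len(reads), len(calibrated), round(mean_depth(calibrated, 550, 2450), 1))
```

Keeping each read with probability (human depth / sequin depth) makes the expected interior depth equal the human depth. A real workflow would work on alignment records, for example BAM files read with pysam. It would also sample by read name so that mates stay together, and it would calibrate each sequin against its own matched human region.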

Supplementary information

Supplementary Figure 1

Reporting Summary

Supplementary Information

Supplementary Notes 1 and 2

About this article

Cite this article

Blackburn, J., Wong, T., Madala, B.S. et al. Use of synthetic DNA spike-in controls (sequins) for human genome sequencing. Nat Protoc 14, 2119–2151 (2019). https://doi.org/10.1038/s41596-019-0175-1

Received: 30 June 2017
Accepted: 03 April 2019
Published: 19 June 2019
Issue Date: July 2019
DOI: https://doi.org/10.1038/s41596-019-0175-1
