Abstract

The identification of genetic variation with next-generation sequencing is confounded by the complexity of the human genome sequence and by biases that arise during library preparation, sequencing and analysis. We have developed a set of synthetic DNA standards, termed 'sequins', that emulate human genetic features and constitute qualitative and quantitative spike-in controls for genome sequencing. Sequencing reads derived from sequins align exclusively to an artificial in silico reference chromosome, rather than to the human reference genome, which allows them to be partitioned for parallel analysis. Here we use this approach to represent common and clinically relevant genetic variation, ranging from single nucleotide variants to large structural rearrangements and copy-number variation. We validate the design and performance of sequin standards by comparison to examples in the NA12878 reference genome, and we demonstrate their utility in the detection and quantification of variants. We provide sequins as a standardized, quantitative resource against which human genetic variation can be measured and diagnostic performance assessed.
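Because sequin-derived reads align only to the in silico chromosome, they can be separated from human reads after a single alignment. The following is a minimal sketch of that partitioning step, not the authors' pipeline. It assumes SAM-formatted text input and a hypothetical sequin contig named `chrIS`; the real contig name depends on the sequin reference used for alignment, and the function name `partition_sam` is illustrative.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable

# Hypothetical contig name for the in silico sequin chromosome; replace it with
# the name used in the sequin reference FASTA that reads were aligned against.
SEQUIN_CHROM = "chrIS"

# SAM FLAG bit 0x4: the segment is unmapped.
FLAG_UNMAPPED = 0x4


def partition_sam(lines: Iterable[str], sequin_chrom: str = SEQUIN_CHROM) -> dict[str, list[str]]:
    """Split SAM text into header, sequin, human and unmapped groups.

    Each alignment record is assigned by its reference name (RNAME, column 3),
    so reads aligned to the sequin chromosome are separated from reads aligned
    to the human reference. Header lines are kept apart so they can be
    prepended to each group when it is written back out as SAM.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            groups["header"].append(line)
            continue
        fields = line.split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & FLAG_UNMAPPED or rname == "*":
            groups["unmapped"].append(line)
        elif rname == sequin_chrom:
            groups["sequin"].append(line)
        else:
            groups["human"].append(line)
    return dict(groups)


# Toy input (made-up records): one human read, one sequin read, one unmapped read.
example = [
    "@SQ\tSN:chr1\tLN:248956422",
    "@SQ\tSN:chrIS\tLN:10000",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchrIS\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
parts = partition_sam(example)
print({name: len(records) for name, records in parts.items()})
# {'header': 2, 'human': 1, 'sequin': 1, 'unmapped': 1}
```

For coordinate-sorted, indexed BAM files, a similar split can be obtained with `samtools view` by passing the sequin contig as a region argument. This sketch assigns each alignment record independently, so the two mates of a pair that map to different references end up in different groups.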

Accessions

Primary accessions

BioProject

Referenced accessions

European Nucleotide Archive

Acknowledgements

We thank our colleagues M. Cowley and M. Pinese for useful discussions and K. Ying for assistance with bioinformatic pipelines. This work was supported by the following funding sources: Australian National Health and Medical Research Council (NHMRC) Australia Fellowship (1062470 to T.R.M. and 1062606 to W.Y.C.). I.W.D. and S.A.H. are supported by Australian Postgraduate Award scholarships. The contents of the published material are solely the responsibility of the administering institution, a participating institution or individual authors, and they do not reflect the views of NHMRC.

Author information

Author notes

    Ira W Deveson and Wendy Y Chen contributed equally to this work.

Affiliations

  1. Genomics and Epigenetics Division, Garvan Institute of Medical Research, New South Wales, Australia.

    • Ira W Deveson, Wendy Y Chen, Ted Wong, Simon A Hardwick, John S Mattick and Tim R Mercer
  2. School of Biotechnology and Biomolecular Sciences, Faculty of Science, The University of New South Wales, New South Wales, Australia.

    • Ira W Deveson
  3. St. Vincent's Clinical School, Faculty of Medicine, The University of New South Wales, New South Wales, Australia.

    • Wendy Y Chen, Simon A Hardwick, John S Mattick and Tim R Mercer
  4. Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Queensland, Australia.

    • Stacey B Andersen and Lars K Nielsen

Contributions

T.R.M. conceived the project, designed the sequins and the synthetic chromosome, and conceived the experiments. W.Y.C. and S.B.A. prepared sequins and performed experiments. I.W.D., T.W. and T.R.M. performed data analysis. I.W.D., S.A.H., L.K.N., J.S.M. and T.R.M. prepared the manuscript.

Competing interests

The Garvan Institute of Medical Research has filed patent applications on some techniques described in this study.

Corresponding author

Correspondence to Tim R Mercer.

Supplementary information

PDF files

  1. Supplementary Text and Figures: Supplementary Figures 1–15

About this article

Publication history

Received

Accepted

Published

DOI

https://doi.org/10.1038/nmeth.3957
