Representing genetic variation with synthetic DNA standards

Abstract

The identification of genetic variation with next-generation sequencing is confounded by the complexity of the human genome sequence and by biases that arise during library preparation, sequencing and analysis. We have developed a set of synthetic DNA standards, termed 'sequins', that emulate human genetic features and constitute qualitative and quantitative spike-in controls for genome sequencing. Sequencing reads derived from sequins align exclusively to an artificial in silico reference chromosome, rather than the human reference genome, which allows them to be partitioned for parallel analysis. Here we use this approach to represent common and clinically relevant genetic variation, ranging from single nucleotide variants to large structural rearrangements and copy-number variation. We validate the design and performance of sequin standards by comparison to examples in the NA12878 reference genome, and we demonstrate their utility during the detection and quantification of variants. We provide sequins as a standardized, quantitative resource against which human genetic variation can be measured and diagnostic performance assessed.

Figure 1: Sequin standards representing genetic variation.
Figure 2: Assessing quantitative variability within and between sequin mixtures.
Figure 3: Validation of DNA sequins against the NA12878 reference genome.
Figure 4: Using sequins to measure somatic variant-allele frequency.
Figure 5: Representing structural variation with DNA sequins.
Figure 6: Representing copy-number variation and repeat DNA with sequins.

Acknowledgements

We thank our colleagues M. Cowley and M. Pinese for useful discussions and K. Ying for assistance in bioinformatic pipelines. The authors would like to thank the following funding sources: Australian National Health and Medical Research Council (NHMRC) Australia Fellowship (1062470 to T.R.M. and 1062606 to W.Y.C.). I.W.D. and S.A.H. are supported by Australian Postgraduate Award scholarships. The contents of the published material are solely the responsibility of the administering institution, a participating institution or individual authors, and they do not reflect the views of NHMRC.

Author information

Contributions

T.R.M. conceived the project, designed sequins and synthetic chromosome and conceived experiments. W.Y.C. and S.B.A. prepared sequins and performed experiments. I.W.D., T.W. and T.R.M. performed data analysis. I.W.D., S.A.H., L.K.N., J.S.M. and T.R.M. prepared the manuscript.

Corresponding author

Correspondence to Tim R Mercer.

Ethics declarations

Competing interests

The Garvan Institute of Medical Research has filed patent applications on some techniques described in this study.

Integrated supplementary information

Supplementary Figure 1 Workflow for use and analysis of DNA sequin standards in genome sequencing.

(From top) Sequins are added to a genomic DNA sample and undergo concurrent library preparation and sequencing. Reads representing genomic DNA or synthetic sequin DNA are distinguished by their alignment to the reference human genome (left) or the artificial in silico reference chromosome (colored), respectively. Sequin-derived alignments undergo independent analysis in parallel to assess the performance of variant detection and the accuracy of allele frequency quantification, and to facilitate inter-sample comparisons and/or normalization. Multiple aspects of the workflow, including both laboratory (library preparation and sequencing) and bioinformatic (alignment and analysis) processes, can be assessed using sequins, with examples indicated (dashed boxes).
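The partitioning step in this workflow can be sketched in a few lines. The following is a minimal illustration, not the authors' pipeline: it assumes alignments are available as SAM-format text lines and that the in silico chromosome appears in the alignment file under the contig name chrIS_D. The function name partition_sam_lines is hypothetical.

```python
def partition_sam_lines(sam_lines, sequin_contig="chrIS_D"):
    """Split SAM records into (sequin, genome, unmapped) lists by reference name.

    Assumption: the in silico chromosome is named `sequin_contig` in the
    combined reference index; every other contig is treated as human genome.
    """
    sequin, genome, unmapped = [], [], []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        rname = fields[2]  # third SAM column is the reference sequence name
        if rname == "*":
            unmapped.append(line)
        elif rname == sequin_contig:
            sequin.append(line)
        else:
            genome.append(line)
    return sequin, genome, unmapped
```

Each partition can then be passed to the same downstream variant caller, so that calls on the sequin partition are scored against the known synthetic variants.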

Supplementary Figure 2 Design workflow for artificial in silico chromosome and DNA sequins

(a) An artificial in silico chromosome sequence was designed to represent features of a real human chromosome with artificial, non-homologous primary sequences. To generate a background sequence into which synthetic variants could be incorporated, the human genome (Hg38) sequence was binned into 100 kb windows, which were ranked according to gene density, repeat density and GC content, then systematically sampled to ensure proportional representation. To abolish homology to Hg38, sequences were inverted and, where necessary, subjected to local shuffling, nucleotide substitution and/or manual curation. Annotated instances of small and large scale genetic variation from human populations were systematically sampled and incorporated, along with their flanking genome sequences, into the artificial chromosome, with homology being abolished as above. (b) Artificial sequences (background and genetic variants) were assembled into in silico chromosome sequences. Three in silico chromosomes were assembled; the first sequence (chrIS_D) contains no genetic variation and corresponds to the consensus reference sequence, analogous to the Hg38 reference sequence. The remaining two sequences emulate diploid human genotypes, with homozygous variation encoded throughout both sequences and heterozygous variation encoded in only one of the sequences. These form the template for synthetic DNA sequin synthesis. (c) Sequins are short (<10 kb) stretches of synthetic DNA that encode mutations with respect to the sequence of chrIS_D, whose sequence remains in silico. Sequin sequences were flanked by SapI Type II restriction digest sites to enable their excision and purification from the host (pMA) vector. Purified sequins were combined at desired concentrations before library preparation and sequencing. The resulting sequencing reads align to chrIS_D and identify instances of genetic variation with respect to this reference sequence.
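Two of the design steps in panel (a), windowing by GC content and inversion, can be illustrated with a short sketch. This is an assumption-laden illustration rather than the authors' design code: "inverted" is read here as reversing the sequence without complementing it, and the function names are hypothetical. Gene density, repeat density, local shuffling and manual curation are not modelled.

```python
def split_into_windows(seq, size=100_000):
    """Cut a sequence into consecutive windows (the last one may be shorter)."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def rank_windows_by_gc(seq, size=100_000):
    """Return windows sorted from lowest to highest GC content."""
    return sorted(split_into_windows(seq, size), key=gc_content)


def invert(seq):
    """Reverse the sequence; assumed meaning of 'inverted' in the caption."""
    return seq[::-1]
```

Reversal preserves base composition and repeat structure within a window while changing the primary sequence, which is consistent with the stated goal of keeping human-like features without homology.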

Supplementary Figure 3 Representing human genetic variation within DNA sequins

(a) Large scale (1.5 Mb) genome browser view of sequins mapping within chrIS_D, in which eight individual sequins are visible, as well as a closer view of a single sequin, encoding six synthetic variants. (b) Histograms indicate spacing between synthetic variants encoded within sequins, with a density distribution for bona fide variants in the NA12878 genome provided for comparison. (c,d) Common genetic variation within human populations was proportionately sampled and incorporated into chrIS_D. Common genetic variation was also supplemented with an additional set of InDels that pose a challenge to detection via short-read sequencing. Histograms indicate the relative frequency of nucleotide substitutions (c) and the distribution of InDel sizes (d) within sequins (blue) and human populations (dbSNP 141; grey). (e) Examples of variants residing in low-complexity and simple-repeat sequences that could not be represented in sequins due to synthesis difficulties and/or cross-alignment to the human genome.

Supplementary Figure 4 Assessment of cross-alignment between sequin and human DNA libraries

When aligned to a combined reference index containing Hg38 and chrIS_D, all libraries (simulated and experimental) show no cross-alignment contamination. Forcing cross-alignment requires (i) performing alignment in the absence of either chrIS_D or Hg38 and (ii) reducing the mapping penalty associated with mismatching bases (BWA mem default penalty = 4). This also results in a high rate of erroneous alignment. Histograms illustrate the proportion of unique (blue), multiple/ambiguous alignments (grey) and unmapped reads (orange) from simulated (a) and real (b) libraries when aligned to either chrIS_D or Hg38, as mismatch-penalty rates are reduced from 4 (default) to 0.

Supplementary Figure 5 Sources of variability in observed sequin abundances within and between mixtures

Sequence-specific biases during PCR amplification, library preparation, sequencing and alignment contribute to systematic variation (assumed to be reproducible between independent mixtures) in the observed abundances of sequins, while different mixtures exhibit independent variation, resulting from aliquot variability (pipetting error). (a) The observed abundances (median per-base coverage) of individual sequin standards (normalized to median abundance of all standards within a mixture) are compared between five independent mixtures, in which all standards were theoretically incorporated at equimolar concentrations. The vector directions associated with either systematic or independent variation are indicated. (b) Histograms indicate observed abundances (normalized to median) of sequin standards in five independent mixtures (matched order), before and after correction to remove systematic biases. Standard deviations indicate the magnitude of variability within each, with this being considerably reduced following systematic-bias correction.
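The normalization and correction described in this caption could be sketched as follows. The caption does not state how the systematic bias was estimated, so this sketch makes an explicit assumption: the bias of each sequin is taken as the median of its median-normalized abundance across mixtures, and is divided out. All function names are hypothetical.

```python
from statistics import median


def normalize_to_median(abundances):
    """Scale each sequin's abundance by the median abundance in its mixture."""
    centre = median(abundances.values())
    return {name: value / centre for name, value in abundances.items()}


def correct_systematic_bias(mixtures):
    """Remove per-sequin bias that is shared across independent mixtures.

    `mixtures` is a list of dicts mapping sequin name -> observed abundance
    (e.g. median per-base coverage). Assumption: the systematic component is
    estimated as the across-mixture median of the normalized abundance.
    """
    normalized = [normalize_to_median(mix) for mix in mixtures]
    names = list(normalized[0])
    bias = {n: median(mix[n] for mix in normalized) for n in names}
    corrected = [{n: mix[n] / bias[n] for n in names} for mix in normalized]
    return corrected, bias
```

After correction, any remaining spread in a sequin's value between mixtures reflects the independent component (for example, pipetting error) rather than sequence-specific bias.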

Supplementary Figure 6 Sequencing and alignment quality for sequenced reads derived from sequin or NA12878 genomic DNA

(a, b) FastQC reports for NA12878 DNA and sequin-derived reads, showing phred quality scores as a function of nucleotide position within sequenced reads (a) and on a per-read basis (b). (c) Population distributions show the frequency of single-base substitution errors in sequenced reads from sequins or NA12878 genomic DNA.

Supplementary Figure 7 Calibrating sequencing coverage of sequins to match NA12878 genome

(a) Alignments to chrIS_D were subsampled to achieve coverage matched to the accompanying NA12878 genome. Frequency distributions illustrate coverage (per nucleotide) for (i) sequins before (right, blue) and (ii) after (left; blue) sub-sampling and (iii) accompanying NA12878 genome at maximum sequencing depth (red). Box-whisker plots (above) summarize median and interquartile ranges for frequency distributions. (b) Plots show mean coverage (+/- SD) across the length of sequin standards (left; n=36) in comparison to coverage across matched ~1 kb windows in NA12878 genome (right; n=36). Terminal regions of sequins where edge-effects impact coverage are indicated (dashed lines) and were excluded during coverage calibration and analysis.
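Coverage calibration of this kind amounts to keeping a fixed fraction of sequin alignments. The sketch below is one simple way to do it, under the assumption that each alignment is kept independently with probability equal to the ratio of genome to sequin coverage; the function names and the seed parameter are hypothetical, and the authors' actual subsampling tool is not stated in the caption.

```python
import random


def subsample_fraction(sequin_coverage, genome_coverage):
    """Fraction of sequin alignments to keep so coverage matches the genome."""
    if sequin_coverage <= 0:
        raise ValueError("sequin coverage must be positive")
    # Never upsample: if sequins are already at or below genome coverage, keep all.
    return min(1.0, genome_coverage / sequin_coverage)


def subsample(alignments, fraction, seed=0):
    """Keep each alignment independently with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    return [aln for aln in alignments if rng.random() < fraction]
```

For example, sequins sequenced at 300-fold alongside a 30-fold genome would be subsampled with a fraction of 0.1.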

Supplementary Figure 8 Reference scale to measure allele frequencies

Sequin standards representing variant alleles were titrated at a two-fold serial dilution relative to reference allele standards to establish a ladder of allele frequencies from 1:1 (i.e. heterozygous) to 1:4096. (a) Scatter plot illustrates the mixture structure used to establish variant allele frequency reference scale. (b,c) Observed alignment counts are plotted against expected allelic fractions for synthetic sequin variants, with SNVs (b) and InDels (c) shown separately. (d) Genome browser views provide illustrative examples of synthetic variants at 0.5, 0.25, 0.125 and 0.0625 allelic fractions, with alignments (grey) containing variant indicated.
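The ladder itself is a geometric series. As a small sketch, and reading the scale as allelic fractions that halve at each step (consistent with the 0.5, 0.25, 0.125 and 0.0625 examples in panel d), the expected fractions can be generated as below. The function name and the default of 12 steps are assumptions chosen so that the last step is 1/4096.

```python
from fractions import Fraction


def allele_frequency_ladder(steps=12):
    """Expected allelic fractions for a two-fold serial dilution.

    Assumption: the first step is the heterozygous fraction 1/2 and each
    further step halves the fraction, ending at 1/2**steps.
    """
    return [Fraction(1, 2 ** k) for k in range(1, steps + 1)]


ladder = allele_frequency_ladder()
print([float(f) for f in ladder[:4]])  # first four steps of the ladder
```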

Supplementary Figure 9 Impact of library depth on variant detection sensitivity

Sequins constitute an internal standard that can be used to inform the filtering stringency required to achieve sensitivity and specificity according to sample-specific variation, such as library depth. (a, b) Read counts supporting reference (grey) and variant alleles (red and blue) are plotted against expected allelic fractions (1/2 – 1/4096) at 5,000-, 1,000- and 250-fold sequencing depth. Variants were called at a minimum confidence threshold (blue; p-value indicated) that was empirically determined within each sampled library in order to maximize sensitivity without compromising precision. (c, d) Frequency distributions illustrate variant detection sensitivity at different library depths. Whilst a linear relationship between library depth and detection sensitivity is apparent for SNVs (c), a more complicated relationship, due to the confounding impact of erroneous alignments, is apparent for InDels (d).
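The dependence of sensitivity on depth follows from simple arithmetic: the expected number of variant-supporting reads is the depth multiplied by the allelic fraction. The sketch below illustrates this with a hypothetical rule of thumb (min_reads, not a threshold from the study) and ignores sampling noise and sequencing error, so it is an idealized lower bound rather than a model of the reported results.

```python
from fractions import Fraction


def expected_variant_reads(depth, allele_fraction):
    """Expected count of reads carrying the variant at a given depth."""
    return depth * allele_fraction


def lowest_detectable_fraction(depth, min_reads=2, steps=12):
    """Smallest ladder fraction (1/2 ... 1/2**steps) expected to yield at
    least `min_reads` supporting reads; None if no step qualifies.

    `min_reads` is a hypothetical illustration parameter.
    """
    lowest = None
    for k in range(1, steps + 1):
        fraction = Fraction(1, 2 ** k)
        if expected_variant_reads(depth, fraction) >= min_reads:
            lowest = fraction
    return lowest


for depth in (5000, 1000, 250):
    print(depth, lowest_detectable_fraction(depth))
```

Under this simplification, a 20-fold reduction in depth (5,000 to 250) raises the lowest reachable step on the ladder by five two-fold steps.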

Supplementary Figure 10 Evaluating filtering strategies for somatic variant detection

Filtering strategies based on multiple features are commonly used to distinguish true-positive variants (red) and false-positive variants (blue). Cumulative frequency plots illustrate the distribution of (a) read frequencies, (b) quality score at identified variant nucleotide(s), (c) confidence p-value ascribed by VarScan2 (derived from Fisher exact test on reference and variant alignment counts). (d) Box-whisker plots show fraction (+/- min and max range) of alignments in forward relative to reverse orientation. (e) Read counts supporting reference (grey) and variant alleles (blue and purple) for synthetic InDels are plotted against expected variant allele fractions. InDels in blue were called with a minimum confidence of p < 10^-6 (Fisher exact test), while green variants failed to reach this threshold. (f) Frequency distributions describe the sensitivity (blue) and precision (red) of synthetic InDel detection and the accumulation of false positive calls (grey) as a function of variant calling confidence threshold (Fisher exact test). (g) Limit of detection plot indicates the confidence with which synthetic InDels were detected relative to their expected allele frequencies. False positive InDel calls (grey) are included for comparison.
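For readers unfamiliar with the test behind the confidence values, a one-sided Fisher exact test on a 2 × 2 table of read counts can be written with the standard library alone. This is a generic sketch, not VarScan2's implementation; how VarScan2 constructs its contingency table is not described in this caption, so the table layout in the docstring is an assumption for illustration.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Returns the probability, with row and column totals fixed, of seeing a
    count of at least `a` in the top-left cell. Illustrative layout (an
    assumption): row 1 = observed (variant reads, reference reads),
    row 2 = counts expected from background error alone.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    denominator = comb(n, row1)
    upper = min(row1, col1)
    numerator = sum(
        comb(col1, x) * comb(n - col1, row1 - x) for x in range(a, upper + 1)
    )
    return numerator / denominator


def passes_threshold(p_value, threshold=1e-6):
    """Apply a confidence cut-off such as the p < 10^-6 used for InDels in (e)."""
    return p_value < threshold
```

Sweeping the threshold over the calls made on sequins, where the true variants are known, yields sensitivity and precision curves of the kind shown in panel (f).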

Supplementary Figure 11 Large deletions in the in silico chromosome and NA12878 genome

(a) Genome browser examples of large deletions illustrating characteristic coverage profiles for discordant-pair (orange; alignment position/orientation of paired reads violates expected relationship) and split-read (green; partial alignment of single read to two positions) alignments at synthetic breakpoint sites in chrIS_D and deletion sites previously annotated in the NA12878 genome (Personalis 1000 Genomes). (b) Normalized density of 3’ split-alignment termini, in forward (blue) and reverse (red) orientation, are plotted with respect to synthetic variant breakpoint sites (mean, n=3). (c) The accumulation of split-read (green) or discordant-pair (yellow) alignments is plotted against sequencing depth (x-fold coverage). Mean indicated, error bars = SD, n = 3.

Supplementary Figure 12 Large novel insertions in the in silico chromosome and NA12878 genome

(a) Genome browser examples of large novel insertions illustrating characteristic coverage profiles for hanging-read alignments (partnered read is unmapped) in forward (blue) and reverse (red) orientation at synthetic insertion sites in chrIS_D and insertion sites previously annotated in the NA12878 genome (Spiral Genetics). (b) Normalized density of 3’ hanging-alignment termini, in forward (blue) and reverse (red) orientation, are plotted with respect to synthetic insertion breakpoint sites (mean, n=3 for each class). (c) The accumulation of hanging-reads in forward and reverse orientations is plotted against sequencing depth (x-fold coverage).

Supplementary Figure 13 Mobile element insertions in the in silico chromosome

Sequins were constructed to represent examples of mobile-element insertion events (n = 3; based on Alu, MIR and L2 repeat classes) at non-annotated positions within chrIS_D. (a) Genome browser examples show characteristic coverage profiles for discordant-pair alignments in forward (blue) and reverse (red) orientation at synthetic mobile insertion sites. (b) Normalized density of 3’ discordant-pair alignment termini, in forward (blue) and reverse (red) orientation, are plotted with respect to synthetic mobile element insertion breakpoint sites (mean, n = 3 for each class). (c) The accumulation of hanging-reads in forward and reverse orientation is plotted against sequencing depth (x-fold coverage).

Supplementary Figure 14 Copy-number amplification in the in silico chromosome and NA12878 genome

Sequins were designed to represent copy number amplifications of different sizes and amplification numbers (6 × 95 bp, 4 × 202 bp and 2 × 621 bp) with respect to a single copy encoded in the artificial in silico chromosome. (a) Genome browser profile illustrates characteristic coverage by standard (grey), discordant- (orange) and split- (green) alignments at breakpoint (dashed line) in chrIS_D. (b) Scatter-plots indicate normalized standard alignment coverage within amplified region (green) and flanking regions (grey), relative to sequencing depth. The fold-difference in coverage observed between amplified region and flanking regions scales proportionally to copy number, as indicated below.
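The proportional relationship in panel (b) can be expressed as a ratio of coverages. The following minimal sketch assumes that the median is used as the coverage summary and that the flanking region represents a single copy; both choices, and the function names, are illustrative assumptions rather than the authors' stated method.

```python
from statistics import median


def coverage_fold_change(amplified_coverage, flank_coverage):
    """Ratio of median coverage inside the amplified region to the flanks."""
    flank = median(flank_coverage)
    if flank <= 0:
        raise ValueError("flanking coverage must be positive")
    return median(amplified_coverage) / flank


def estimate_copy_number(amplified_coverage, flank_coverage, reference_copies=1):
    """Scale the fold change by the copy number assumed for the flanks."""
    return reference_copies * coverage_fold_change(amplified_coverage, flank_coverage)
```

With per-base coverages of about 120 inside the region and about 30 in the flanks, the estimate is 4 copies, matching the 4 × amplification class.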

Supplementary Figure 15 Using sequins as copy-number reference scales

The t-statistic is determined by a one-way t-test between coverage of repeats (A, B, C and D at 1, 2, 4 and 8 copies, respectively) relative to background (calibrated to B repeat). (a) Frequency distribution illustrates the performance of the t-statistic in relation to an 8-fold shift in mean coverage established by the copy-number sequin scale. (b) Frequency distribution illustrates the t-statistic assigned to previously annotated variants in the NA12878 genome for comparison to copy number sequin scale above. (d) Scatter plot illustrates the observed abundance of the quantitative DNA scale. The inflection point (dashed line) indicates the lower limit (orange) beyond which copy-number elements are not detected or inaccurately measured. Above this line sequins form a quantitative DNA scale (red). For comparison, the corresponding abundance of representative repeat DNA features are indicated (right blue/green panel).
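The caption does not specify the exact form of the test, so the sketch below substitutes a Welch two-sample t-statistic between coverage in a repeat region and coverage in the background. It is offered only to show how a coverage shift translates into a t-statistic; the choice of Welch's form and the function name are assumptions.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(region_coverage, background_coverage):
    """Welch's t-statistic for a difference in mean coverage.

    Positive values indicate higher coverage in the region than in the
    background. Each sample needs at least two observations.
    """
    n1, n2 = len(region_coverage), len(background_coverage)
    m1, m2 = mean(region_coverage), mean(background_coverage)
    v1, v2 = variance(region_coverage), variance(background_coverage)
    standard_error = sqrt(v1 / n1 + v2 / n2)
    if standard_error == 0:
        raise ValueError("both samples have zero variance")
    return (m1 - m2) / standard_error
```

Applying the same statistic to sequins at known copy numbers gives a reference scale against which the statistic for a candidate variant in the accompanying genome can be compared.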

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15 (PDF 2963 kb)

Cite this article

Deveson, I., Chen, W., Wong, T. et al. Representing genetic variation with synthetic DNA standards. Nat Methods 13, 784–791 (2016). https://doi.org/10.1038/nmeth.3957
