Protocol

Deriving genotypes from RAD-seq short-read data using Stacks

Nature Protocols volume 12, pages 2640–2659 (2017)

Abstract

Restriction site-associated DNA sequencing (RAD-seq) allows for the genome-wide discovery and genotyping of single-nucleotide polymorphisms in hundreds of individuals at a time in model and nonmodel species alike. However, converting short-read sequencing data into reliable genotype data remains a nontrivial task, especially as RAD-seq is used in systems that have very diverse genomic properties. Here, we present a protocol to analyze RAD-seq data using the Stacks pipeline. This protocol will be of use in areas such as ecology and population genetics. It covers the assessment and demultiplexing of the sequencing data, read mapping, inference of RAD loci, genotype calling, and filtering of the output data, as well as providing two simple examples of downstream biological analyses. We place special emphasis on checking the soundness of the procedure and choosing the main parameters, given the properties of the data. The procedure can be completed in 1 week, but determining definitive methodological choices will typically take up to 1 month.
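As an illustration of the final filtering step mentioned above, the following is a minimal sketch of one common filter: keeping only loci that are genotyped in at least a given fraction of samples. It is a toy example and not the Stacks implementation; the function name `filter_loci`, the parameter `min_call_rate`, and the data layout are all assumptions made for this sketch.

```python
from typing import Dict, List, Optional

# Hypothetical toy layout: genotypes[locus][i] is the genotype of sample i,
# or None when sample i has no genotype call at that locus.
Genotypes = Dict[str, List[Optional[str]]]


def filter_loci(genotypes: Genotypes, min_call_rate: float) -> Genotypes:
    """Keep loci genotyped in at least `min_call_rate` of the samples."""
    kept: Genotypes = {}
    for locus, calls in genotypes.items():
        n_called = sum(call is not None for call in calls)
        # Skip loci with no samples at all to avoid dividing by zero.
        if calls and n_called / len(calls) >= min_call_rate:
            kept[locus] = calls
    return kept


# Made-up genotypes for four samples at three loci.
toy: Genotypes = {
    "locus_1": ["A/A", "A/G", "G/G", "A/G"],  # 4 of 4 samples called
    "locus_2": ["C/C", None, None, "C/T"],    # 2 of 4 samples called
    "locus_3": ["T/T", "T/T", None, "T/C"],   # 3 of 4 samples called
}

filtered = filter_loci(toy, min_call_rate=0.75)
print(sorted(filtered))
```

With a threshold of 0.75, `locus_2` is dropped because only half of the samples have a call. In practice the appropriate threshold depends on the properties of the data set, which is why the protocol stresses checking parameter choices.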

Acknowledgements

We thank J. Paris and N. Rayamajhi for their help in testing the procedure and for discussion of the manuscript.

Author information

Affiliations

  1. Department of Animal Biology, University of Illinois at Urbana–Champaign, Urbana, Illinois, USA.

    • Nicolas C Rochette
    • Julian M Catchen

Contributions

N.C.R. and J.M.C. designed the protocol, performed experiments, and wrote the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Julian M Catchen.

Supplementary information

PDF files

  1. Supplementary Figures

    Supplementary Figures 1 and 2.

About this article

DOI

https://doi.org/10.1038/nprot.2017.123
