Sequencing pools of individuals — mining genome-wide polymorphism data without big funding

Key Points

  • Whole-genome sequencing of pools of individuals (Pool-seq) is a cost-effective approach to determine genome-wide allele frequencies in an unbiased manner from a large number of individuals.

  • Once minimum quality criteria have been met, Pool-seq-based allele frequency estimates are accurate and reliable.

  • Typical issues of Pool-seq are alignment problems due to copy number variation or problems in the reference genome. The calling of low-frequency alleles is challenging owing to the difficulty in distinguishing them from sequencing errors.

  • Pool-seq has been successfully applied to a wide range of applications, including bulk segregant analyses, evolve and resequence studies, evolutionary genome analyses, analyses of time-series data and cancer genomics.

  • Owing to its cost-effectiveness, Pool-seq will continue to be a powerful tool for studies that require genome-wide allele frequency data in a large number of population samples. New technological and analytical advances will facilitate the extraction of haplotype information from Pool-seq data.
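
The points above on estimate accuracy and on low-frequency alleles can be illustrated with a short sketch. It is not taken from the Review: it assumes that every individual contributes equal amounts of DNA to the pool, that reads are sampled independently, and that a single per-base error rate toward the alternative allele applies. The function names and the parameters `n_chromosomes`, `depth` and `error_rate` are hypothetical and chosen only for illustration.

```python
from math import comb


def pool_allele_frequency(alt_reads: int, depth: int) -> float:
    """Estimate the allele frequency as the fraction of reads carrying the allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def sampling_variance(p: float, n_chromosomes: int, depth: int) -> float:
    """Variance of the estimate under two-stage binomial sampling (an assumption).

    Stage 1 samples n_chromosomes from the population into the pool;
    stage 2 samples `depth` reads from the pool.
    Var = p(1-p) * (1/n + 1/c - 1/(n*c)).
    """
    n, c = n_chromosomes, depth
    return p * (1.0 - p) * (1.0 / n + 1.0 / c - 1.0 / (n * c))


def error_tail_probability(alt_reads: int, depth: int, error_rate: float) -> float:
    """Probability of seeing at least `alt_reads` erroneous reads out of `depth`
    if the site is monomorphic and errors are independent (binomial tail).

    Direct summation is fine for modest depths; very deep coverage would
    need a log-space version to avoid float overflow.
    """
    return sum(
        comb(depth, i) * error_rate**i * (1.0 - error_rate) ** (depth - i)
        for i in range(alt_reads, depth + 1)
    )


if __name__ == "__main__":
    freq = pool_allele_frequency(alt_reads=5, depth=100)
    var = sampling_variance(freq, n_chromosomes=100, depth=100)
    tail = error_tail_probability(alt_reads=5, depth=100, error_rate=0.01)
    print(f"estimated frequency: {freq:.3f}")
    print(f"approx. standard error: {var ** 0.5:.4f}")
    print(f"P(>= 5 error reads | no variant): {tail:.4g}")
```

Under these assumptions, a small tail probability suggests that the observed reads are unlikely to be sequencing errors alone. With only one or two supporting reads at typical error rates the tail probability stays large, which is one way to see why low-frequency alleles are hard to call.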

Abstract

The analysis of polymorphism data is becoming increasingly important as a complementary tool to classical genetic analyses. Nevertheless, despite plunging sequencing costs, genomic sequencing of individuals at the population scale is still restricted to a few model species. Whole-genome sequencing of pools of individuals (Pool-seq) provides a cost-effective alternative to sequencing individuals separately. With the availability of custom-tailored software tools, Pool-seq is being increasingly used for population genomic research on both model and non-model organisms. In this Review, we not only demonstrate the breadth of questions that are being addressed by Pool-seq but also discuss its limitations and provide guidelines for users.

Figure 1: Cost-effectiveness of Pool-seq.
Figure 2: Comparison of sequencing strategies.
Figure 3: Pool-seq applications.

References

  1. Ellegren, H. Genome sequencing and population genomics in non-model organisms. Trends Ecol. Evol. 29, 51–63 (2014).

    PubMed  Google Scholar 

  2. Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).

    CAS  PubMed  PubMed Central  Google Scholar 

  3. International HapMap, C. et al. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).

    Google Scholar 

  4. 1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).

  5. Weigel, D. & Mott, R. The 1001 genomes project for Arabidopsis thaliana. Genome Biol. 10, 107 (2009).

    PubMed  PubMed Central  Google Scholar 

  6. Daetwyler, H. D. et al. Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle. Nature Genet. 46, 858–865 (2014).

    CAS  PubMed  Google Scholar 

  7. Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).

    CAS  PubMed  PubMed Central  Google Scholar 

  8. The Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007).

  9. Sheridan, C. Illumina claims $1,000 genome win. Nature Biotech. 32, 115 (2014).

    Google Scholar 

  10. Weinstock, G. M. Genomic approaches to studying the human microbiota. Nature 489, 250–256 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  11. Futschik, A. & Schlötterer, C. The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics 186, 207–218 (2010). This study is the first to provide a statistical framework for the analysis of Pool-seq data in population genetics.

    CAS  PubMed  PubMed Central  Google Scholar 

  12. Gautier, M. et al. Estimation of population allele frequencies from next-generation sequencing data: pool-versus individual-based genotyping. Mol. Ecol. 22, 3766–3779 (2013).

    CAS  PubMed  Google Scholar 

  13. Bamshad, M. J. et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nature Rev. Genet. 12, 745–755 (2011).

    CAS  PubMed  Google Scholar 

  14. Gilissen, C., Hoischen, A., Brunner, H. G. & Veltman, J. A. Disease gene identification strategies for exome sequencing. Eur. J. Hum. Genet. 20, 490–497 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  15. Wang, Z., Gerstein, M. & Snyder, M. RNA-seq: a revolutionary tool for transcriptomics. Nature Rev. Genet. 10, 57–63 (2009).

    CAS  PubMed  Google Scholar 

  16. Davey, J. W. et al. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nature Rev. Genet. 12, 499–510 (2011).

    CAS  PubMed  Google Scholar 

  17. Pihlstrom, L., Rengmark, A., Bjornara, K. A. & Toft, M. Effective variant detection by targeted deep sequencing of DNA pools: an example from Parkinson's disease. Ann. Hum. Genet. 78, 243–252 (2014).

    CAS  PubMed  Google Scholar 

  18. Suvorov, A. et al. Intra-specific regulatory variation in Drosophila pseudoobscura. PLoS ONE 8, e83547 (2013).

    PubMed  PubMed Central  Google Scholar 

  19. Wittkopp, P. J., Haerum, B. K. & Clark, A. G. Regulatory changes underlying expression differences within and between Drosophila species. Nature Genet. 40, 346–350 (2008).

    CAS  PubMed  Google Scholar 

  20. Konczal, M., Koteja, P., Stuglik, M. T., Radwan, J. & Babik, W. Accuracy of allele frequency estimation using pooled RNA-seq. Mol. Ecol. Resour. 14, 381–392 (2014).

    CAS  PubMed  Google Scholar 

  21. Gross, J. B., Furterer, A., Carlson, B. M. & Stahl, B. A. An integrated transcriptome-wide analysis of cave and surface dwelling Astyanax mexicanus. PLoS ONE 8, e55659 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  22. Kozak, G. M., Brennan, R. S., Berdan, E. L., Fuller, R. C. & Whitehead, A. Functional and population genomic divergence within and between two species of killifish adapted to different osmotic niches. Evolution 68, 63–80 (2014).

    CAS  PubMed  Google Scholar 

  23. Sloan, D. B. et al. De novo transcriptome assembly and polymorphism detection in the flowering plant Silene vulgaris (Caryophyllaceae). Mol. Ecol. Resour. 12, 333–343 (2012).

    CAS  PubMed  Google Scholar 

  24. Gautier, M. et al. The effect of RAD allele dropout on the estimation of genetic variation within and between populations. Mol. Ecol. 22, 3165–3178 (2013).

    CAS  PubMed  Google Scholar 

  25. Arnold, B., Corbett-Detig, R. B., Hartl, D. & Bomblies, K. RADseq underestimates diversity and introduces genealogical biases due to nonrandom haplotype sampling. Mol. Ecol. 22, 3179–3190 (2013).

    CAS  PubMed  Google Scholar 

  26. Karczewski, K. J. et al. Systematic functional regulatory assessment of disease-associated variants. Proc. Natl Acad. Sci. USA 110, 9607–9612 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  27. Khurana, E. et al. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 342, 1235587 (2013).

    PubMed  PubMed Central  Google Scholar 

  28. Schaub, M. A., Boyle, A. P., Kundaje, A., Batzoglou, S. & Snyder, M. Linking disease associations with regulatory information in the human genome. Genome Res. 22, 1748–1759 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  29. Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nature Rev. Genet. 11, 499–511 (2010).

    CAS  PubMed  Google Scholar 

  30. Qanbari, S. et al. Classic selective sweeps revealed by massive sequencing in cattle. PLoS Genet. 10, e1004148 (2014).

    PubMed  PubMed Central  Google Scholar 

  31. Pasaniuc, B. et al. Extremely low-coverage sequencing and imputation increases power for genome-wide association studies. Nature Genet. 44, 631–635 (2012).

    CAS  PubMed  Google Scholar 

  32. Lou, D. I. et al. High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing. Proc. Natl Acad. Sci. USA 110, 19872–19877 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  33. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genet. 43, 491–498 (2011).

    CAS  PubMed  Google Scholar 

  34. Minoche, A. E., Dohm, J. C. & Himmelbauer, H. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biol. 12, R112 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  35. Li, H., Ruan, J. & Durbin, R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851–1858 (2008).

    CAS  PubMed  PubMed Central  Google Scholar 

  36. Robasky, K., Lewis, N. E. & Church, G. M. The role of replicates for error mitigation in next-generation sequencing. Nature Rev. Genet. 15, 56–62 (2014).

    CAS  PubMed  Google Scholar 

  37. Sham, P., Bader, J. S., Craig, I., O'Donovan, M. & Owen, M. DNA pooling: a tool for large-scale association studies. Nature Rev. Genet. 3, 862–871 (2002). This is a comprehensive review of pooling strategies.

    CAS  PubMed  Google Scholar 

  38. Zhu, Y., Bergland, A. O., Gonzalez, J. & Petrov, D. A. Empirical validation of pooled whole genome population re-sequencing in Drosophila melanogaster. PLoS ONE 7, e41901 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  39. Kofler, R. et al. PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS ONE 6, e15925 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  40. Schrider, D. R., Begun, D. J. & Hahn, M. W. Detecting highly differentiated copy-number variants from pooled population sequencing. Pac. Symp. Biocomput 1, 344–344 (2013).

    Google Scholar 

  41. Kapun, M., van Schalkwyk, H., McAllister, B., Flatt, T. & Schlötterer, C. Inference of chromosomal inversion dynamics from Pool-seq data in natural and laboratory populations of Drosophila melanogaster. Mol. Ecol. 23, 1813–1827 (2014).

    CAS  PubMed  Google Scholar 

  42. Kofler, R., Betancourt, A. J. & Schlötterer, C. Sequencing of pooled DNA samples (Pool-seq) uncovers complex dynamics of transposable element insertions in Drosophila melanogaster. PLoS Genet. 8, e1002487 (2012). This study is the first to infer TE insertion sites and the population frequency of TE insertions from Pool-seq data.

    CAS  PubMed  PubMed Central  Google Scholar 

  43. Sax, K. The association of size differences with seed-coat pattern and pigmentation in Phaseolus vulgaris. Genetics 8, 552–560 (1923).

    CAS  PubMed  PubMed Central  Google Scholar 

  44. Schneeberger, K. et al. SHOREmap: simultaneous mapping and mutation identification by deep sequencing. Nature Methods 6, 550–551 (2009). This paper is the first to show that Pool-seq can be used to map induced mutations.

    CAS  PubMed  Google Scholar 

  45. Schneeberger, K. Using next-generation sequencing to isolate mutant genes from forward genetic screens. Nature Rev. Genet. 15, 662–676 (2014).

    CAS  PubMed  Google Scholar 

  46. Hill, J. T. et al. MMAPPR: mutation mapping analysis pipeline for pooled RNA-seq. Genome Res. 23, 687–697 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  47. Miller, A. C., Obholzer, N. D., Shah, A. N., Megason, S. G. & Moens, C. B. RNA-seq-based mapping and candidate identification of mutations from forward genetic screens. Genome Res. 23, 679–686 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  48. Galvao, V. C. et al. Synteny-based mapping-by-sequencing enabled by targeted enrichment. Plant J. 71, 517–526 (2012).

    CAS  PubMed  Google Scholar 

  49. Ehrenreich, I. M. et al. Dissection of genetically complex traits with extremely large pools of yeast segregants. Nature 464, 1039–1042 (2010). This study provides proof that Pool-seq provides enough power to map complex traits.

    CAS  PubMed  PubMed Central  Google Scholar 

  50. Wenger, J. W., Schwartz, K. & Sherlock, G. Bulk segregant analysis by high-throughput sequencing reveals a novel xylose utilization gene from Saccharomyces cerevisiae. PLoS Genet. 6, e1000942 (2010).

    PubMed  PubMed Central  Google Scholar 

  51. Swinnen, S. et al. Identification of novel causative genes determining the complex trait of high ethanol tolerance in yeast using pooled-segregant whole-genome sequence analysis. Genome Res. 22, 975–984 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  52. Wade, M. J. Epistasis, complex traits, and mapping genes. Genetica 112–113, 59–69 (2001).

  53. Earley, E. J. & Jones, C. D. Next-generation mapping of complex traits with phenotype-based selection and introgression. Genetics 189, 1203–1209 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  54. Bastide, H. et al. A genome-wide, fine-scale map of natural pigmentation variation in Drosophila melanogaster. PLoS Genet. 9, e1003534 (2013). This papershows that Pool-seq allows highly accurate fine mapping using natural population samples.

    CAS  PubMed  PubMed Central  Google Scholar 

  55. Jeong, S. et al. The evolution of gene regulation underlies a morphological difference between two Drosophila sister species. Cell 132, 783–793 (2008).

    CAS  PubMed  Google Scholar 

  56. Kelly, J. K., Koseva, B. & Mojica, J. P. The genomic signal of partial sweeps in Mimulus guttatus. Genome Biol. Evol. 5, 1457–1469 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  57. Beissinger, T. M. et al. A genome-wide scan for evidence of selection in a maize population under long-term artificial selection for ear number. Genetics 196, 829–840 (2014).

    CAS  PubMed  Google Scholar 

  58. Johansson, A. M., Pettersson, M. E., Siegel, P. B. & Carlborg, O. Genome-wide effects of long-term divergent selection. PLoS Genet. 6, e1001188 (2010).

    PubMed  PubMed Central  Google Scholar 

  59. Rubin, C. J. et al. Whole-genome resequencing reveals loci under selection during chicken domestication. Nature 464, 587–591 (2010). This is a particularly nice demonstration of the power of Pool-seq to detect selected loci in population samples.

    CAS  PubMed  Google Scholar 

  60. Burke, M. K. et al. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467, 587–590 (2010). The is the first experimental evolution study measuring allele frequency changes using Pool-seq.

    CAS  PubMed  Google Scholar 

  61. Remolina, S. C., Chang, P. L., Leips, J., Nuzhdin, S. V. & Hughes, K. A. Genomic basis of aging and life-history evolution in Drosophila melanogaster. Evolution 66, 3390–3403 (2012).

    PubMed  PubMed Central  Google Scholar 

  62. Turner, T. L., Stewart, A. D., Fields, A. T., Rice, W. R. & Tarone, A. M. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7, e1001336 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  63. Zhou, D. et al. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc. Natl Acad. Sci. USA 108, 2349–2354 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  64. Turner, T. L. & Miller, P. M. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191, 633–642 (2012).

    PubMed  PubMed Central  Google Scholar 

  65. Tobler, R. et al. Massive habitat-specific genomic response in D. melanogaster populations during experimental evolution in hot and cold environments. Mol. Biol. Evol. 31, 364–375 (2013).

    PubMed  PubMed Central  Google Scholar 

  66. Orozco-terWengel, P. et al. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol. Ecol. 21, 4931–4941 (2012).

    PubMed  PubMed Central  Google Scholar 

  67. Reed, L. K. et al. Systems genomics of metabolic phenotypes in wild-type Drosophila melanogaster. Genetics 197, 781–793 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  68. Martins, N. et al. Host adaptation to viruses relies on few genes with different cross-resistance properties. Proc. Natl Acad. Sci. USA 111, 5938–5943 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  69. Jalvingh, K. M., Chang, P. L., Nuzhdin, S. V. & Wertheim, B. Genomic changes under rapid evolution: selection for parasitoid resistance. Proc. Biol. Sci. 281, 20132303 (2014).

    PubMed  PubMed Central  Google Scholar 

  70. Magwire, M. M. et al. Genome-wide association studies reveal a simple genetic basis of resistance to naturally coevolving viruses in Drosophila melanogaster. PLoS Genet. 8, e1003057 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  71. Turner, T. L., Bourne, E. C., Von Wettberg, E. J., Hu, T. T. & Nuzhdin, S. V. Population resequencing reveals local adaptation of Arabidopsis lyrata to serpentine soils. Nature Genet. 42, 260–263 (2010). The study is the first to show that ecologically important traits can be mapped with Pool-seq by comparing two functionally diverged populations.

    CAS  PubMed  Google Scholar 

  72. Lamichhaney, S. et al. Population-scale sequencing reveals genetic differentiation due to local adaptation in Atlantic herring. Proc. Natl Acad. Sci. USA 109, 19345–19350 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  73. Fabian, D. K. et al. Genome-wide patterns of latitudinal differentiation among populations of Drosophila melanogaster from North America. Mol. Ecol. 21, 4748–4769 (2012).

    PubMed  PubMed Central  Google Scholar 

  74. Kolaczkowski, B., Kern, A. D., Holloway, A. K. & Begun, D. J. Genomic differentiation between temperate and tropical Australian populations of Drosophila melanogaster. Genetics 187, 245–260 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  75. Cheng, C. et al. Ecological genomics of Anopheles gambiae along a latitudinal cline: a population-resequencing approach. Genetics 190, 1417–1432 (2012).

    PubMed  PubMed Central  Google Scholar 

  76. Hancock, A. M. et al. Adaptations to climate in candidate genes for common metabolic disorders. PLoS Genet. 4, e32 (2008).

    PubMed  PubMed Central  Google Scholar 

  77. Hancock, A. M. et al. Adaptation to climate across the Arabidopsis thaliana genome. Science 334, 83–86 (2011).

    CAS  PubMed  Google Scholar 

  78. Fischer, M. C. et al. Population genomic footprints of selection and associations with climate in natural populations of Arabidopsis halleri from the Alps. Mol. Ecol. 22, 5594–5607 (2013). This is a nice application of Pool-seq to find selected loci in a non-model organism.

    CAS  PubMed  PubMed Central  Google Scholar 

  79. Günther, T. & Coop, G. Robust identification of local adaptation from allele frequencies. Genetics 195, 205–220 (2013). This paper presents the first statistical framework to identify significant associations of a given locus with one or more environmental variables using Pool-seq data.

    PubMed  PubMed Central  Google Scholar 

  80. Rubin, C. J. et al. Strong signatures of selection in the domestic pig genome. Proc. Natl Acad. Sci. USA 109, 19529–19536 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  81. Axelsson, E. et al. The genomic signature of dog domestication reveals adaptation to a starch-rich diet. Nature 495, 360–364 (2013).

    CAS  PubMed  Google Scholar 

  82. He, Z. et al. Two evolutionary histories in the genome of rice: the roles of domestication genes. PLoS Genet. 7, e1002100 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  83. Nolte, V., Pandey, R. V., Kofler, R. & Schlötterer, C. Genome-wide patterns of natural variation reveal strong selective sweeps and ongoing genomic conflict in Drosophila mauritiana. Genome Res. 23, 99–110 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  84. True, J. R., Mercer, J. M. & Laurie, C. C. Differences in crossover frequency and distribution among three sibling species of Drosophila. Genetics 142, 507–523 (1996).

    CAS  PubMed  PubMed Central  Google Scholar 

  85. Casacuberta, E. & Gonzalez, J. The impact of transposable elements in environmental adaptation. Mol. Ecol. 22, 1503–1517 (2013).

    CAS  PubMed  Google Scholar 

  86. Kazazian, H. H. Jr Mobile elements: drivers of genome evolution. Science 303, 1626–1632 (2004).

    CAS  PubMed  Google Scholar 

  87. Boitard, S., Schlötterer, C., Nolte, V., Pandey, R. V. & Futschik, A. Detecting selective sweeps from pooled next-generation sequencing samples. Mol. Biol. Evol. 29, 2177–2186 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  88. Clément, J. A. et al. Private selective sweeps identified from next-generation pool-sequencing reveal convergent pathways under selection in two inbred Schistosoma mansoni strains. PLoS Negl Trop. Dis. 7, e2591 (2013).

    PubMed  PubMed Central  Google Scholar 

  89. Foll, M. et al. Influenza virus drug resistance: a time-sampled population genetics perspective. PLoS Genet. 10, e1004185 (2014).

    PubMed  PubMed Central  Google Scholar 

  90. Lang, G. I. et al. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature 500, 571–574 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  91. Barrick, J. E. & Lenski, R. E. Genome-wide mutational diversity in an evolving population of Escherichia coli. Cold Spring Harb. Symp. Quant. Biol. 74, 119–129 (2009).

    CAS  PubMed  PubMed Central  Google Scholar 

  92. Kvitek, D. J. & Sherlock, G. Whole genome, whole population sequencing reveals that loss of signaling networks is the major adaptive strategy in a constant environment. PLoS Genet. 9, e1003972 (2013).

    PubMed  PubMed Central  Google Scholar 

  93. Parts, L. et al. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21, 1131–1138 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  94. Illingworth, C. J., Parts, L., Schiffels, S., Liti, G. & Mustonen, V. Quantifying selection acting on a complex trait using allele frequency time series data. Mol. Biol. Evol. 29, 1187–1197 (2012).

    CAS  PubMed  Google Scholar 

  95. Bergland, A. O., Behrman, E. L., O'Brien, K. R., Schmidt, P. S. & Petrov, D. A. Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila. arXiv 1303.5044 (2014).

  96. Traverse, C. C., Mayo-Smith, L. M., Poltak, S. R. & Cooper, V. S. Tangled bank of experimentally evolved Burkholderia biofilms reflects selection during chronic infections. Proc. Natl Acad. Sci. USA 110, E250–E259 (2013).

    CAS  PubMed  Google Scholar 

  97. Versace, E., Nolte, V., Pandey, R. V., Tobler, R. & Schlötterer, C. Experimental evolution reveals habitat-specific fitness dynamics among Wolbachia clades in Drosophila melanogaster. Mol. Ecol. 23, 802–814 (2014).

    PubMed  PubMed Central  Google Scholar 

  98. Barcellos-Hoff, M. H., Lyden, D. & Wang, T. C. The evolution of the cancer niche during multistage carcinogenesis. Nature Rev. Cancer 13, 511–518 (2013).

    CAS  Google Scholar 

  99. Merlo, L. M. F., Pepper, J. W., Reid, B. J. & Maley, C. C. Cancer as an evolutionary and ecological process. Nature Rev. Cancer 6, 924–935 (2006).

    CAS  Google Scholar 

  100. Ding, L. et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature 481, 506–510 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  101. Newburger, D. E. et al. Genome evolution during progression to breast cancer. Genome Res. 23, 1097–1108 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  102. Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  103. Aparicio, S. & Caldas, C. The implications of clonal genome evolution for cancer medicine. New Engl. J. Med. 368, 842–851 (2013).

    CAS  PubMed  Google Scholar 

  104. Greaves, M. & Maley, C. C. Clonal evolution in cancer. Nature 481, 306–313 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  105. Kinde, I., Wu, J., Papadopoulos, N., Kinzler, K. W. & Vogelstein, B. Detection and quantification of rare mutations with massively parallel sequencing. Proc. Natl Acad. Sci. USA 108, 9530–9535 (2011).

    PubMed  PubMed Central  Google Scholar 

  106. Long, Q. et al. PoolHap: inferring haplotype frequencies from pooled samples by next generation sequencing. PLoS ONE 6, e15292 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  107. Kessner, D., Turner, T. L. & Novembre, J. Maximum likelihood estimation of frequencies of known haplotypes from pooled sequence data. Mol. Biol. Evol. 30, 1145–1158 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  108. Burke, M. K., King, E. G., Shahrestani, P., Rose, M. R. & Long, A. D. Genome-wide association study of extreme longevity in Drosophila melanogaster. Genome Biol. Evol. 6, 1–11 (2014).

    PubMed  Google Scholar 

  109. Eskin, I. et al. eALPS: estimating abundance levels in pooled sequencing using available genotyping data. J. Computat. Biol. 20, 861–877 (2013).

    CAS  Google Scholar 

  110. Kofler, R. & Schlötterer, C. A guide for the design of evolve and resequencing studies. Mol. Biol. Evol. 31, 474–483 (2014).

    CAS  PubMed  Google Scholar 

  111. Imsland, F. et al. The Rose-comb mutation in chickens constitutes a structural rearrangement causing both altered comb morphology and defective sperm motility. Plos Genetics 8, e1002775 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  112. Del Fabbro, C., Scalabrin, S., Morgante, M. & Giorgi, F. M. An extensive evaluation of read trimming effects on Illumina NGS data analysis. PLoS ONE 8, e85024 (2013).

    PubMed  PubMed Central  Google Scholar 

  113. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10–12 (2011).

    Google Scholar 

  114. Nevado, B., Ramos-Onsins, S. E. & Perez-Enciso, M. Resequencing studies of nonmodel organisms using closely related reference genomes: optimal experimental designs and bioinformatics approaches for population genomics. Mol. Ecol. 23, 1764–1779 (2014).

    CAS  PubMed  Google Scholar 

  115. Degner, J. F. et al. Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics 25, 3207–3212 (2009).

    CAS  PubMed  PubMed Central  Google Scholar 

  116. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  117. Albers, C. A. et al. Dindel: accurate indel calls from short-read data. Genome Res. 21, 961–973 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  118. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  119. Koboldt, D. C. et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25, 2283–2285 (2009).

    CAS  PubMed  PubMed Central  Google Scholar 

  120. Raineri, E. et al. SNP calling by sequencing pooled samples. BMC Bioinformatics 13, 239 (2012).

    PubMed  PubMed Central  Google Scholar 

  121. Bansal, V. A statistical method for the detection of variants from next-generation resequencing of DNA pools. Bioinformatics 26, i318–i324 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  122. Altmann, A. et al. vipR: variant identification in pooled DNA using R. Bioinformatics 27, I77–I84 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  123. Zhou, B. Y. An empirical Bayes mixture model for SNP detection in pooled sequencing data. Bioinformatics 28, 2569–2575 (2012).

    CAS  PubMed  Google Scholar 

  124. Chen, Q. & Sun, F. A unified approach for allele frequency estimation, SNP detection and association studies based on pooled sequencing data using EM algorithms. BMC Genomics 14 (Suppl. 1), S1 (2013).

    PubMed  PubMed Central  Google Scholar 

  125. Druley, T. E. et al. Quantification of rare allelic variants from pooled genomic DNA. Nature Methods 6, 263–265 (2009).

    CAS  PubMed  PubMed Central  Google Scholar 

  126. Vallania, F. L. et al. High-throughput discovery of rare insertions and deletions in large cohorts. Genome Res. 20, 1711–1718 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  127. Wei, Z., Wang, W., Hu, P., Lyon, G. J. & Hakonarson, H. SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data. Nucleic Acids Res. 39, e132 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  128. Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. arXiv 1207.3907 (2012).

  129. Calvo, S. E. et al. High-throughput, pooled sequencing identifies mutations in NUBPL and FOXRED1 in human complex I deficiency. Nature Genet. 42, 851–858 (2010).

    CAS  PubMed  Google Scholar 

  130. Fiston-Lavier, A.-S., Barron, M. G., Petrov, D. A. & González, J. T-lex2: genotyping, frequency estimation and re-annotation of transposable elements using single or pooled next-generation sequencing data. bioRxiv http://dx.doi.org/10.1101/002964 (2014).

  131. Zhuang, J., Wang, J., Theurkauf, W. & Weng, Z. TEMP: a computational method for analyzing transposable element polymorphism in populations. Nucleic Acids Res. 42, 6826–6838 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  132. Kofler, R., Pandey, R. V. & Schlötterer, C. PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-seq). Bioinformatics 27, 3435–3436 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  133. Boitard, S. et al. Pool-HMM: a Python program for estimating the allele frequency spectrum and detecting selective sweeps from next generation sequencing of pooled samples. Mol. Ecol. Resour. 13, 337–340 (2013).

    PubMed  PubMed Central  Google Scholar 

  134. Ferretti, L., Ramos-Onsins, S. E. & Perez-Enciso, M. Population genomics from pool sequencing. Mol. Ecol. 22, 5561–5576 (2013).

    PubMed  Google Scholar 

  135. Catchen, J., Hohenlohe, P. A., Bassham, S., Amores, A. & Cresko, W. A. Stacks: an analysis tool set for population genomics. Mol. Ecol. 22, 3124–3140 (2013).

    PubMed  PubMed Central  Google Scholar 

  136. Vitalis, R., Gautier, M., Dawson, K. J. & Beaumont, M. A. Detecting and measuring selection from gene frequency data. Genetics 196, 799–817 (2014).

    PubMed  Google Scholar 

  137. Gautier, M. & Vitalis, R. Inferring population histories using genome-wide allele frequency data. Mol. Biol. Evol. 30, 654–668 (2013).

    CAS  PubMed  Google Scholar 

  138. Feder, A. F., Petrov, D. A. & Bergland, A. O. LDx: estimation of linkage disequilibrium from high-throughput pooled resequencing data. PLoS ONE 7, e48588 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  139. Minevich, G., Park, D. S., Blankenberg, D., Poole, R. J. & Hobert, O. CloudMap: a cloud-based pipeline for analysis of mutant genome sequences. Genetics 192, 1249–1269 (2012).

  140. Edwards, M. D. & Gifford, D. K. High-resolution genetic mapping with pooled sequencing. BMC Bioinformatics 13 (Suppl. 6), S8 (2012).

  141. Bowen, M. E., Henke, K., Siegfried, K. R., Warman, M. L. & Harris, M. P. Efficient mapping and cloning of mutations in zebrafish by low-coverage whole-genome sequencing. Genetics 190, 1017–1024 (2012).

  142. Austin, R. S. et al. Next-generation mapping of Arabidopsis genes. Plant J. 67, 715–725 (2011).

  143. Leshchiner, I. et al. Mutation mapping and identification by whole-genome sequencing. Genome Res. 22, 1541–1548 (2012).

  144. Prosperi, M. C. & Salemi, M. QuRe: software for viral quasispecies reconstruction from next-generation sequencing data. Bioinformatics 28, 132–133 (2012).

  145. Zagordi, O., Bhattacharya, A., Eriksson, N. & Beerenwinkel, N. ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data. BMC Bioinformatics 12, 119 (2011).

  146. Eyre, D. W. et al. Detection of mixed infection from bacterial whole genome sequence data allows assessment of its role in Clostridium difficile transmission. PLoS Comput. Biol. 9, e1003059 (2013).

  147. Astrovskaya, I. et al. Inferring viral quasispecies spectra from 454 pyrosequencing reads. BMC Bioinformatics 12 (Suppl. 6), S1 (2011).

  148. Yang, X., Charlebois, P., Macalalad, A., Henn, M. R. & Zody, M. C. V-Phaser 2: variant inference for viral populations. BMC Genomics 14, 674 (2013).

  149. Töpfer, A. et al. Viral quasispecies assembly via maximal clique enumeration. PLoS Comput. Biol. 10, e1003515 (2014).

  150. Töpfer, A. et al. Probabilistic inference of viral quasispecies subject to recombination. J. Comput. Biol. 20, 113–123 (2013).

  151. Prabhakaran, S., Rey, M., Zagordi, O., Beerenwinkel, N. & Roth, V. HIV haplotype inference using a constraint-based Dirichlet process mixture model. Machine Learning in Computational Biology NIPS Workshop (2010).

  152. Pandey, R. V., Kofler, R., Orozco-terWengel, P., Nolte, V. & Schlötterer, C. PoPoolation DB: a user-friendly web-based database for the retrieval of natural polymorphisms in Drosophila. BMC Genet. 12, 27 (2011).

  153. Chen, X., Listman, J. B., Slack, F. J., Gelernter, J. & Zhao, H. Biases and errors on allele frequency estimation and disease association tests of next-generation sequencing of pooled samples. Genet. Epidemiol. 36, 549–560 (2012).

  154. Roth, A. et al. PyClone: statistical inference of clonal population structure in cancer. Nature Methods 11, 396–398 (2014).

Acknowledgements

The authors apologize to all colleagues who were not cited owing to space limitations. They are grateful to all colleagues who shared unpublished manuscripts, especially D. Kessner, Q. Long, M. Pérez Enciso, A. S. Fiston-Lavier and K. Schneeberger for comments and discussions. They thank members of the Institut für Populationsgenetik, in particular A. Betancourt, M. Dolezal, A. Futschik and A. Kalinka for discussion and comments on earlier versions of the manuscript. This work has been supported by the ERC (ArchAdapt) and the Austrian Science Funds (FWF, W1225).

Author information

Corresponding author

Correspondence to Christian Schlötterer.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Next-generation sequencing

(NGS; also known as second-generation sequencing). An umbrella term for different sequencing platforms delivering millions of short DNA sequence reads.

Reads

DNA sequences that are generated by next-generation sequencing.

Pool-seq

A sequencing technique in which sequencing libraries are not prepared from DNA of a single individual or cell but from a mixture of DNA fragments originating from different individuals or cells. In the context of this Review, Pool-seq is used to describe the unbiased sequencing of the entire genome.

Coverage

The number of reads that span a given genomic position.

Sequencing libraries

Sets of fragmented DNA extracted from one or more individuals that serve as the template for subsequent sequencing.

Exome sequencing

A sequencing approach in which the complexity of the genome is reduced through hybridization to exonic sequences, which results in a higher sequence coverage of protein-coding regions.

Restriction-site-associated DNA markers

Sequence polymorphisms in close proximity to a restriction enzyme recognition site.

Linkage disequilibrium

(LD). Nonrandom association between alleles at two loci. Estimating LD in outcrossing diploid individuals requires that the genotypes first be sorted into haplotypes, a statistical procedure called phasing.
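
As a rough illustration of what "nonrandom association" means numerically, the sketch below computes two standard LD summaries for a pair of biallelic loci: the coefficient D (the haplotype frequency minus the product of the allele frequencies) and its normalized form r². The function and argument names are hypothetical, and the input frequencies are assumed to come from phased haplotypes.

```python
from typing import Tuple


def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A (locus 1) and allele B (locus 2)
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    # D is zero when alleles at the two loci are associated at random.
    d = p_ab - p_a * p_b
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("both loci must be polymorphic")
    return d, (d * d) / denom


# Complete association: A and B always occur on the same haplotype.
print(linkage_disequilibrium(0.5, 0.5, 0.5))
# Random association: the haplotype frequency equals the product of allele frequencies.
print(linkage_disequilibrium(0.25, 0.5, 0.5))
```

In the first call the two alleles are perfectly associated, so r² reaches its maximum; in the second, D and r² are both zero.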

Genetic markers

Polymorphic loci that can be scored with a genotyping technique.

F2 analysis

Analysis of mapping populations generated by the F2 design. The F1 progeny from crossing two phenotypically different parental strains are themselves crossed to produce an F2 population that is segregating for the phenotype of interest. The F2 mapping population may carry up to three genotypes at every marker and therefore allows the detection of additive and dominance effects, as well as interactions between loci.

Phased genomic sequences

Genome sequences for which the haplotype phase (that is, the combination of alleles or genetic markers that coexist on a single chromosome) has been determined.

Imputation

In statistics, imputation refers to the replacement of missing data with estimated values. In genomics, it describes the use of haplotype sequences to fill in missing sequence information.

Haplotypes

The combination of alleles or genetic markers that coexist on a single chromosome. Chromosomal regions carrying a haplotype are inherited as intact physical units until they are broken up by recombination.

Pool genome-wide association studies

(Pool-GWASs). Genotype–phenotype mapping studies in which phenotypically extreme individuals are grouped and sequenced as pools. Causative variants are identified by contrasting the allele frequencies between the pools.
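
The contrast of allele frequencies between pools can be sketched as follows. This is a minimal illustration, assuming that each pool is summarized per site by its reference and alternative read counts; the function names and the read counts are hypothetical. Real analyses would add a statistical test that accounts for the sampling of both individuals and reads.

```python
from typing import Tuple


def allele_frequency(ref_count: int, alt_count: int) -> float:
    """Estimate the alternative-allele frequency in one pool from read counts."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_count / total


def frequency_contrast(pool_a: Tuple[int, int], pool_b: Tuple[int, int]) -> float:
    """Return the allele frequency difference (pool_a minus pool_b) at one site.

    Each pool is given as a (ref_count, alt_count) tuple.
    """
    return allele_frequency(*pool_a) - allele_frequency(*pool_b)


# Hypothetical (ref, alt) read counts for two phenotypically extreme pools.
sites = {
    "snp1": ((10, 90), (80, 20)),  # large difference: candidate site
    "snp2": ((50, 50), (48, 52)),  # near-identical frequencies
}
for name, (high_pool, low_pool) in sites.items():
    print(name, round(frequency_contrast(high_pool, low_pool), 2))
```

Sites with a large absolute difference are the candidates; sites with near-zero difference behave as expected for unlinked variants.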

Evolve and resequence studies

Studies that combine experimental evolution with next-generation sequencing. They make use of controlled environmental, demographic and selective variables to facilitate genotype–phenotype mapping.

Forward genetics

An approach in which mutations induced by random mutagenesis that lead to the disruption of gene function are identified based on their phenotypes. The causative mutation is traditionally identified by positional cloning or by a candidate-gene approach.

Bulk segregant analysis

(BSA). Analysis in which offspring from diverged parents are phenotyped and the DNA of individuals from opposing tails of the phenotypic distribution is combined (pooled). Causative variants are identified by contrasting allele frequency differences among the pools.

Epistatic interactions

Non-additive interactions between genes in which the effect of an allele at one locus is modified by the genotypes at other loci in the genome. The resulting phenotype is different from that expected by summing the independent effects of the individual loci.

Introgress

To introduce a genomic region from one strain or species into the genome of another by repeated backcrossing. By selecting for the phenotype of interest, the genomes become isogenic except for the chromosomal regions causing the selected phenotype.

Paired-end reads

DNA fragments that were sequenced from both ends, yielding pairs of reads that are separated by a defined distance that is dependent on the library preparation protocol.

Soft clipping

Substrings at either end of a read that a local alignment algorithm left unaligned and that are therefore excluded from the subsequent analysis.

Proper pairs

Paired-end reads for which both reads of the pair can be mapped to the same chromosome, within a distance that is consistent with the insert size chosen during library preparation.

Broken pairs

Paired-end reads that do not map as proper pairs.

Mapping quality

Negative log (base 10) of the probability that a read is incorrectly mapped, multiplied by 10 (Phred scale).

Base quality

Negative log (base 10) of the probability that a given base call is incorrect, multiplied by 10 (Phred scale).
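
Both mapping quality and base quality are conventionally reported on the Phred scale, Q = −10 log10(P), where P is the error probability. The following is a minimal sketch of the conversion in both directions; the function names are illustrative and are not taken from any particular tool.

```python
import math


def phred_from_error_prob(p_error: float) -> float:
    """Return the Phred-scaled quality Q = -10 * log10(p_error)."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("p_error must be in the interval (0, 1]")
    return -10.0 * math.log10(p_error)


def error_prob_from_phred(quality: float) -> float:
    """Return the error probability p = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)


# An error probability of 1 in 1,000 corresponds to a quality of about 30.
print(phred_from_error_prob(0.001))
# A quality of 20 corresponds to an error probability of about 1 in 100.
print(error_prob_from_phred(20))
```

Each increase of 10 in the quality score therefore corresponds to a tenfold lower error probability.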

Insertions and deletions

(Indels). DNA sequences that have been inserted into or deleted from a genomic region. As only phylogenetic analysis allows the distinction between insertions and deletions, indel is used as a neutral term.

Strand bias

The tendency of a variant to occur significantly more often in reads that originate from one of the two strands of DNA.

GWASs

Trait mapping studies that rely on a statistical test to determine associations between sequence variants and a given phenotype in natural populations.

Cline

The gradual change in phenotypes or allele frequencies along a geographical or environmental gradient.

Hitchhiking

The population genetic mechanism by which a neutral, or in some cases slightly deleterious, mutation increases in population frequency solely as a result of physical linkage with a positively selected mutation.

Cite this article

Schlötterer, C., Tobler, R., Kofler, R. et al. Sequencing pools of individuals — mining genome-wide polymorphism data without big funding. Nat Rev Genet 15, 749–763 (2014). https://doi.org/10.1038/nrg3803

  • DOI: https://doi.org/10.1038/nrg3803
