The analysis of polymorphism data is becoming increasingly important as a complementary tool to classical genetic analyses. However, despite plunging sequencing costs, sequencing the genomes of individuals at the population scale is still restricted to a few model species. Whole-genome sequencing of pools of individuals (Pool-seq) provides a cost-effective alternative to sequencing individuals separately. With the availability of custom-tailored software tools, Pool-seq is increasingly used for population genomic research on both model and non-model organisms. In this Review, we not only demonstrate the breadth of questions that are being addressed by Pool-seq but also discuss its limitations and provide guidelines for users.
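Pool-seq saves money because a pooled library does not yield individual genotypes. It yields population allele frequencies, estimated as the fraction of reads that carry an allele. That estimate is shaped by two rounds of sampling. Chromosomes are first sampled into the pool, and reads are then sampled from the pool. The sketch below illustrates this under stated assumptions: every individual contributes equally to the DNA pool, reads are drawn independently, and sequencing errors are ignored. The function names are hypothetical and are not taken from any published Pool-seq tool. The variance expression follows directly from the law of total variance applied to the two binomial sampling steps.

```python
"""Minimal sketch of allele-frequency estimation from a Pool-seq sample.

Assumptions (illustrative, not from any specific tool): equal DNA
contribution per individual, independent reads, no sequencing errors.
"""
import random


def pooled_frequency(alt_reads: int, coverage: int) -> float:
    """Estimate the allele frequency as the fraction of reads carrying it."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return alt_reads / coverage


def pooled_variance(p: float, n_chromosomes: int, coverage: int) -> float:
    """Variance of the read-based estimate under two-stage binomial sampling.

    Stage 1 samples n_chromosomes into the pool; stage 2 samples `coverage`
    reads from the pool. Var = p(1-p) * [1/C + (1/n)(1 - 1/C)].
    """
    return p * (1 - p) * (1 / coverage + (1 / n_chromosomes) * (1 - 1 / coverage))


def simulate_estimate(p: float, n_chromosomes: int, coverage: int,
                      rng: random.Random) -> float:
    """Simulate one pooled experiment and return the read-based estimate."""
    # Stage 1: count pooled chromosomes that carry the allele
    carriers = sum(rng.random() < p for _ in range(n_chromosomes))
    pool_freq = carriers / n_chromosomes
    # Stage 2: each read samples one chromosome from the pool at random
    alt_reads = sum(rng.random() < pool_freq for _ in range(coverage))
    return pooled_frequency(alt_reads, coverage)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical design: 100 diploid individuals (200 chromosomes), 100x coverage
    p, n_chrom, cov = 0.3, 200, 100
    sims = [simulate_estimate(p, n_chrom, cov, rng) for _ in range(2000)]
    mean = sum(sims) / len(sims)
    var = sum((x - mean) ** 2 for x in sims) / (len(sims) - 1)
    print(f"mean estimate      {mean:.3f}")
    print(f"simulated variance {var:.5f}")
    print(f"expected variance  {pooled_variance(p, n_chrom, cov):.5f}")
```

For the hypothetical design above, `pooled_variance(0.3, 200, 100)` is about 0.00314. As coverage grows, the variance approaches p(1-p)/n, which is the value obtained by genotyping all pooled chromosomes individually. Raising coverage far above the number of pooled chromosomes therefore buys progressively less precision.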