Article series: Applications of next-generation sequencing

Sequencing pools of individuals — mining genome-wide polymorphism data without big funding

Journal name: Nature Reviews Genetics
Volume: 15
Pages: 749–763
DOI: 10.1038/nrg3803

Abstract

The analysis of polymorphism data is becoming increasingly important as a complementary tool to classical genetic analyses. Nevertheless, despite plunging sequencing costs, genomic sequencing of individuals at the population scale is still restricted to a few model species. Whole-genome sequencing of pools of individuals (Pool-seq) provides a cost-effective alternative to sequencing individuals separately. With the availability of custom-tailored software tools, Pool-seq is being increasingly used for population genomic research on both model and non-model organisms. In this Review, we not only demonstrate the breadth of questions that are being addressed by Pool-seq but also discuss its limitations and provide guidelines for users.

Figures

    Figure 1: Cost-effectiveness of Pool-seq.

    The accuracy of allele frequency estimates from whole-genome sequencing of pools of individuals (Pool-seq) and from whole-genome sequencing of individuals is compared using the ratio of the standard deviations (SD) of the allele frequency estimates obtained with the two methods. The same number of reads is used for both sequencing strategies. A value smaller than one indicates that Pool-seq is more accurate than sequencing of individuals. a | The influence of the pool size is shown. A larger pool size results in higher accuracy of Pool-seq, and in most comparisons Pool-seq still produces more accurate allele frequency estimates even for a pool size of 50 individuals. Only when the number of sequenced individuals approaches the pool size does sequencing of individuals become the superior strategy. b | The influence of coverage and of variation in the representation of individuals in a pool is shown. With a lower coverage per individual, the advantage of Pool-seq decreases. Note that as the coverage per individual decreases, the two approaches produce very similar types of data; that is, sequencing of individuals shows the same limitations as Pool-seq, for example in estimating linkage disequilibrium and in distinguishing sequencing errors from low-frequency polymorphisms. Variation in the representation of individuals in the DNA pool reduces the accuracy of Pool-seq only slightly (compare 0% variation, in which all individuals are uniformly represented (orange line), with 30% variation (light blue line)). The graphs were generated with the PIFs software (Ref. 12), ignoring sequencing errors.
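The comparison in Figure 1 can be approximated with a simple two-stage binomial model. A pool of n diploid individuals contributes 2n chromosomes, and C reads are then drawn from those chromosomes, so that Var(p̂) = p(1 − p)[(1 − 1/(2n))/C + 1/(2n)]. For m individuals genotyped without error, Var(p̂) = p(1 − p)/(2m). The Python sketch below assumes exactly this simplified model (no sequencing errors, uniform representation of individuals, error-free genotypes). It is not the PIFs software used for the figure, and the function names and read budget are hypothetical.

```python
"""Sketch: accuracy of allele frequency estimates, Pool-seq vs individuals.

Simplified two-stage binomial model (an illustrative assumption, not the
PIFs model): diploid individuals, no sequencing errors, equal representation
of individuals in the pool, and error-free genotypes for individually
sequenced samples.
"""
import math


def pool_seq_variance(p: float, pool_size: int, depth: int) -> float:
    """Variance of the Pool-seq allele frequency estimate at one site.

    Stage 1 samples 2 * pool_size chromosomes from the population;
    stage 2 samples `depth` reads with replacement from those chromosomes.
    """
    chromosomes = 2 * pool_size
    return p * (1 - p) * ((1 - 1 / chromosomes) / depth + 1 / chromosomes)


def individual_seq_variance(p: float, n_individuals: int) -> float:
    """Variance when n_individuals diploids are genotyped without error."""
    return p * (1 - p) / (2 * n_individuals)


def sd_ratio(p: float, pool_size: int, total_reads: int, depth_per_individual: int) -> float:
    """SD(Pool-seq) / SD(individuals) for the same number of reads per site.

    Values below 1 mean that Pool-seq gives the more accurate estimate.
    """
    n_individuals = total_reads // depth_per_individual
    var_pool = pool_seq_variance(p, pool_size, total_reads)
    var_ind = individual_seq_variance(p, n_individuals)
    return math.sqrt(var_pool / var_ind)


if __name__ == "__main__":
    # Hypothetical budget: 100 reads per site; individuals sequenced at 10x.
    for pool_size in (10, 50, 100, 500):
        ratio = sd_ratio(p=0.3, pool_size=pool_size, total_reads=100, depth_per_individual=10)
        print(f"pool size {pool_size:>3}: SD ratio = {ratio:.2f}")
```

With these hypothetical settings (100 reads per site, and individuals sequenced at 10x so that the same reads cover 10 individuals), the SD ratio is about 1.09 for a pool of 10 and about 0.63 for a pool of 50. This mirrors the qualitative pattern in panel a. Because the model ignores sequencing errors and genotyping uncertainty at low coverage, it is only a rough guide.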

    Figure 2: Comparison of sequencing strategies.

    Three sequencing approaches, namely whole-genome sequencing (part a), exome sequencing (part b) and restriction-site-associated DNA sequencing (RAD-seq; part c), are compared, and sequencing of individuals (left panel) is contrasted with sequencing of pools of individuals (right panel). Reads are coloured according to the individual from which they originate. In exome sequencing, sequencing libraries are enriched for exonic sequences (part b). RAD-seq determines only the sequence adjacent to restriction sites, which results in stacks of sequence reads (part c). Both exome sequencing and RAD-seq direct the sequencing effort to targeted regions. Because the same number of reads is spread over a smaller fraction of the genome, each covered position receives more reads. This yields more accurate allele frequency estimates in the covered regions than whole-genome sequencing.
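The coverage argument in Figure 2 is simple arithmetic. The expected depth is the number of sequenced bases divided by the size of the sequenced region, and the read-sampling SD of an allele frequency estimate shrinks with the square root of depth. The sketch below makes several assumptions: hypothetical numbers (10 million 100-bp reads, a 200-Mb genome and a 20-Mb targeted region), perfectly even coverage and no off-target reads. The function names are illustrative only.

```python
"""Sketch: why targeted sequencing gives more reads per covered position.

Assumes (hypothetically) that every read maps uniquely inside the sequenced
region and that reads are spread evenly; real data show uneven coverage and
off-target reads. Only read-sampling noise is considered here.
"""
import math


def mean_depth(n_reads: int, read_length: int, region_size: int) -> float:
    """Expected read depth = sequenced bases / size of the sequenced region."""
    return n_reads * read_length / region_size


def read_sampling_sd(p: float, depth: float) -> float:
    """Binomial SD of an allele frequency estimated from `depth` reads."""
    return math.sqrt(p * (1 - p) / depth)


if __name__ == "__main__":
    n_reads, read_length = 10_000_000, 100  # hypothetical sequencing budget
    for label, size in (("whole genome", 200_000_000), ("targeted", 20_000_000)):
        depth = mean_depth(n_reads, read_length, size)
        print(f"{label:>12}: {depth:.0f}x, SD at p = 0.5: {read_sampling_sd(0.5, depth):.3f}")
```

Under these assumptions, concentrating the same reads on a region ten times smaller raises the expected depth from 5x to 50x and lowers the read-sampling SD by a factor of √10 (about 3.2). The sketch does not model the additional variance from sampling individuals into a pool.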

    Figure 3: Pool-seq applications.

    Whole-genome sequencing of pools of individuals (Pool-seq) is a versatile technology that can be used for a wide range of applications. a | A pool genome-wide association study (Pool-GWAS) was used to examine female abdominal pigmentation in Drosophila melanogaster. Contrasting the allele frequencies in pools of light and dark females identified candidate single-nucleotide polymorphisms (SNPs) with an exceptionally high mapping resolution (<210 bp) near the pigmentation genes tan and bric-à-brac 1 (Ref. 54). b | Neutral SNPs (left panel) and SNPs associated with salinity (right panel) in herring are compared. Whereas neutral SNPs show no population differentiation, SNPs under selection clearly distinguish high-salinity (Atlantic) from low-salinity (Baltic Sea) populations (Ref. 72). c | The Manhattan plot shows the significance of SNPs in the chicken genome in a test for loci selected during domestication, in which domestic chicken populations are compared with a wild population. One domestication-specific adaptation is in the thyroid-stimulating hormone receptor (TSHR) gene (Ref. 59). d | Polymorphism on chromosome 3R of Drosophila mauritiana and D. melanogaster is shown. In D. melanogaster, the reduced recombination rate towards the centromere is reflected by a lower level of polymorphism, whereas the polymorphism level remains high in D. mauritiana. This reflects the differences in the recombination landscape between the two species (Ref. 83). e | The time series shows the dynamics of three Burkholderia cenocepacia morphs: ruffled (R), studded (S) and wrinkly (W). The upper panel displays the relative frequencies of the morphs, including morph switches, as inferred from allele frequency analyses using Pool-seq. "M" indicates mutations acquired during the time course. The lower panel schematically shows the morph switches during the experiment (Ref. 96). f | Clonal evolution in human leukaemia before and after chemotherapy is shown. An initial leukaemia cell (black) expands by cell division to form a leukaemic cell population (grey area). During this expansion, additional mutations are acquired in minor subclones (orange and pink areas). Chemotherapy reduces the leukaemic cell population, and only one of the minor subclones survives treatment and expands during the post-treatment relapse. Across different leukaemia samples, this approach distinguished whether the mutation that possibly causes relapse was already present before chemotherapy (this example) or emerged after chemotherapy (not shown) (Ref. 100). FDR, false discovery rate. Figure reproduced with permission from: a, Ref. 54, © 2013 Bastide et al.; b, Ref. 72, National Academy of Sciences; c, Ref. 59, Nature Publishing Group; d, Ref. 83, Cold Spring Harbor Laboratory Press; e, Ref. 96, National Academy of Sciences; f, Ref. 100, Nature Publishing Group.
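Contrasting allele frequencies between two pools, as in the Pool-GWAS of panel a of Figure 3, amounts to a per-SNP test on a 2 × 2 table of allele read counts. Fisher's exact test is one common choice; PoPoolation2 (Ref. 132), for example, offers both Fisher's exact test and the Cochran–Mantel–Haenszel test. The sketch below is not necessarily the test used in Ref. 54. It is a standard-library Python implementation, and its function name and read counts are hypothetical.

```python
"""Sketch: per-SNP contrast of allele read counts between two pools.

Two-sided Fisher's exact test implemented with the standard library only.
Caveat: reads are treated as independent draws, which ignores the extra
variance from sampling individuals into each pool, so p-values are best
used to rank SNPs rather than taken at face value.
"""
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are pools (e.g. light and dark females); columns are read counts
    of the two alleles. Sums the hypergeometric probabilities of all tables
    with the observed margins that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so that tables tied with the observed one count.
    p_value = sum(q for q in (prob(x) for x in range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical read counts (allele A, allele B) at three SNPs.
    light = [(30, 70), (48, 52), (12, 88)]
    dark = [(70, 30), (51, 49), (15, 85)]
    for i, ((a, b), (c, d)) in enumerate(zip(light, dark), start=1):
        print(f"SNP {i}: p = {fisher_exact_two_sided(a, b, c, d):.3g}")
```

Because reads are treated as independent observations, the sketch ignores the extra sampling variance from drawing a finite number of individuals into each pool, so these p-values are anti-conservative. In practice, such p-values would usually be combined with replicate pools and with control of the false discovery rate across SNPs.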

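Selection scans such as the domestication scan in panel c of Figure 3 often search for genomic windows with reduced variation. One statistic of this kind is pooled heterozygosity, Hp = 2ΣnMAJΣnMIN/(ΣnMAJ + ΣnMIN)². Here nMAJ and nMIN are the read counts of the major and minor allele at each SNP in a window, and Hp is Z-transformed across windows; Ref. 59 applied this approach to chickens. The Python sketch below is a simplified illustration under assumptions of our own, so the window size, step and input format are hypothetical. It is not necessarily identical to the procedure behind the plot in that panel.

```python
"""Sketch: pooled heterozygosity (Hp) in sliding windows and its Z-transform.

Assumptions (hypothetical): input is a list of (position, count_allele1,
count_allele2) tuples for one chromosome, with read counts from one pool;
window size and step are user-chosen.
"""
from statistics import mean, pstdev


def pooled_heterozygosity(allele_counts):
    """Hp = 2 * sum(n_MAJ) * sum(n_MIN) / (sum(n_MAJ) + sum(n_MIN))**2 for one window."""
    n_maj = sum(max(c1, c2) for c1, c2 in allele_counts)  # major-allele reads per SNP
    n_min = sum(min(c1, c2) for c1, c2 in allele_counts)  # minor-allele reads per SNP
    total = n_maj + n_min
    if total == 0:
        raise ValueError("window contains no reads")
    return 2 * n_maj * n_min / total ** 2


def sliding_windows(snps, window_size, step):
    """Yield the allele counts of SNPs in each non-empty window [start, start + window_size)."""
    last = max(pos for pos, _, _ in snps)
    for start in range(0, last + 1, step):
        window = [(c1, c2) for pos, c1, c2 in snps if start <= pos < start + window_size]
        if window:
            yield window


def z_hp(snps, window_size, step):
    """Z-transform Hp across windows; strongly negative values flag candidate sweeps."""
    hp = [pooled_heterozygosity(w) for w in sliding_windows(snps, window_size, step)]
    mu, sigma = mean(hp), pstdev(hp)
    return [(h - mu) / sigma for h in hp]


if __name__ == "__main__":
    # Hypothetical data: (position, reads of allele 1, reads of allele 2).
    snps = [(5_000, 20, 18), (15_000, 25, 20), (45_000, 39, 1),
            (55_000, 40, 2), (85_000, 22, 21)]
    print([round(z, 2) for z in z_hp(snps, window_size=20_000, step=20_000)])
```

Windows with strongly negative Z values (low Hp relative to the genome-wide average) are candidate selective sweeps. The sketch raises an error if there are no SNPs or if all windows have the same Hp (zero standard deviation). It also recomputes window membership by brute force, which is adequate only for small examples.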
References

  1. Ellegren, H. Genome sequencing and population genomics in non-model organisms. Trends Ecol. Evol. 29, 51–63 (2014).
  2. Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
  3. International HapMap 3 Consortium et al. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
  4. 1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
  5. Weigel, D. & Mott, R. The 1001 genomes project for Arabidopsis thaliana. Genome Biol. 10, 107 (2009).
  6. Daetwyler, H. D. et al. Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle. Nature Genet. 46, 858–865 (2014).
  7. Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
  8. The Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007).
  9. Sheridan, C. Illumina claims $1,000 genome win. Nature Biotech. 32, 115 (2014).
  10. Weinstock, G. M. Genomic approaches to studying the human microbiota. Nature 489, 250–256 (2012).
  11. Futschik, A. & Schlötterer, C. The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics 186, 207–218 (2010).
    This study is the first to provide a statistical framework for the analysis of Pool-seq data in population genetics.
  12. Gautier, M. et al. Estimation of population allele frequencies from next-generation sequencing data: pool-versus individual-based genotyping. Mol. Ecol. 22, 3766–3779 (2013).
  13. Bamshad, M. J. et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nature Rev. Genet. 12, 745–755 (2011).
  14. Gilissen, C., Hoischen, A., Brunner, H. G. & Veltman, J. A. Disease gene identification strategies for exome sequencing. Eur. J. Hum. Genet. 20, 490–497 (2012).
  15. Wang, Z., Gerstein, M. & Snyder, M. RNA-seq: a revolutionary tool for transcriptomics. Nature Rev. Genet. 10, 57–63 (2009).
  16. Davey, J. W. et al. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nature Rev. Genet. 12, 499–510 (2011).
  17. Pihlstrom, L., Rengmark, A., Bjornara, K. A. & Toft, M. Effective variant detection by targeted deep sequencing of DNA pools: an example from Parkinson's disease. Ann. Hum. Genet. 78, 243–252 (2014).
  18. Suvorov, A. et al. Intra-specific regulatory variation in Drosophila pseudoobscura. PLoS ONE 8, e83547 (2013).
  19. Wittkopp, P. J., Haerum, B. K. & Clark, A. G. Regulatory changes underlying expression differences within and between Drosophila species. Nature Genet. 40, 346–350 (2008).
  20. Konczal, M., Koteja, P., Stuglik, M. T., Radwan, J. & Babik, W. Accuracy of allele frequency estimation using pooled RNA-seq. Mol. Ecol. Resour. 14, 381–392 (2014).
  21. Gross, J. B., Furterer, A., Carlson, B. M. & Stahl, B. A. An integrated transcriptome-wide analysis of cave and surface dwelling Astyanax mexicanus. PLoS ONE 8, e55659 (2013).
  22. Kozak, G. M., Brennan, R. S., Berdan, E. L., Fuller, R. C. & Whitehead, A. Functional and population genomic divergence within and between two species of killifish adapted to different osmotic niches. Evolution 68, 63–80 (2014).
  23. Sloan, D. B. et al. De novo transcriptome assembly and polymorphism detection in the flowering plant Silene vulgaris (Caryophyllaceae). Mol. Ecol. Resour. 12, 333–343 (2012).
  24. Gautier, M. et al. The effect of RAD allele dropout on the estimation of genetic variation within and between populations. Mol. Ecol. 22, 3165–3178 (2013).
  25. Arnold, B., Corbett-Detig, R. B., Hartl, D. & Bomblies, K. RADseq underestimates diversity and introduces genealogical biases due to nonrandom haplotype sampling. Mol. Ecol. 22, 3179–3190 (2013).
  26. Karczewski, K. J. et al. Systematic functional regulatory assessment of disease-associated variants. Proc. Natl Acad. Sci. USA 110, 9607–9612 (2013).
  27. Khurana, E. et al. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 342, 1235587 (2013).
  28. Schaub, M. A., Boyle, A. P., Kundaje, A., Batzoglou, S. & Snyder, M. Linking disease associations with regulatory information in the human genome. Genome Res. 22, 1748–1759 (2012).
  29. Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nature Rev. Genet. 11, 499–511 (2010).
  30. Qanbari, S. et al. Classic selective sweeps revealed by massive sequencing in cattle. PLoS Genet. 10, e1004148 (2014).
  31. Pasaniuc, B. et al. Extremely low-coverage sequencing and imputation increases power for genome-wide association studies. Nature Genet. 44, 631–635 (2012).
  32. Lou, D. I. et al. High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing. Proc. Natl Acad. Sci. USA 110, 19872–19877 (2013).
  33. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genet. 43, 491–498 (2011).
  34. Minoche, A. E., Dohm, J. C. & Himmelbauer, H. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems. Genome Biol. 12, R112 (2011).
  35. Li, H., Ruan, J. & Durbin, R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851–1858 (2008).
  36. Robasky, K., Lewis, N. E. & Church, G. M. The role of replicates for error mitigation in next-generation sequencing. Nature Rev. Genet. 15, 56–62 (2014).
  37. Sham, P., Bader, J. S., Craig, I., O'Donovan, M. & Owen, M. DNA pooling: a tool for large-scale association studies. Nature Rev. Genet. 3, 862–871 (2002).
    This is a comprehensive review of pooling strategies.
  38. Zhu, Y., Bergland, A. O., Gonzalez, J. & Petrov, D. A. Empirical validation of pooled whole genome population re-sequencing in Drosophila melanogaster. PLoS ONE 7, e41901 (2012).
  39. Kofler, R. et al. PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS ONE 6, e15925 (2011).
  40. Schrider, D. R., Begun, D. J. & Hahn, M. W. Detecting highly differentiated copy-number variants from pooled population sequencing. Pac. Symp. Biocomput 1, 344344 (2013).
  41. Kapun, M., van Schalkwyk, H., McAllister, B., Flatt, T. & Schlötterer, C. Inference of chromosomal inversion dynamics from Pool-seq data in natural and laboratory populations of Drosophila melanogaster. Mol. Ecol. 23, 1813–1827 (2014).
  42. Kofler, R., Betancourt, A. J. & Schlötterer, C. Sequencing of pooled DNA samples (Pool-seq) uncovers complex dynamics of transposable element insertions in Drosophila melanogaster. PLoS Genet. 8, e1002487 (2012).
    This study is the first to infer TE insertion sites and the population frequency of TE insertions from Pool-seq data.
  43. Sax, K. The association of size differences with seed-coat pattern and pigmentation in Phaseolus vulgaris. Genetics 8, 552–560 (1923).
  44. Schneeberger, K. et al. SHOREmap: simultaneous mapping and mutation identification by deep sequencing. Nature Methods 6, 550–551 (2009).
    This paper is the first to show that Pool-seq can be used to map induced mutations.
  45. Schneeberger, K. Using next-generation sequencing to isolate mutant genes from forward genetic screens. Nature Rev. Genet. 15, 662–676 (2014).
  46. Hill, J. T. et al. MMAPPR: mutation mapping analysis pipeline for pooled RNA-seq. Genome Res. 23, 687–697 (2013).
  47. Miller, A. C., Obholzer, N. D., Shah, A. N., Megason, S. G. & Moens, C. B. RNA-seq-based mapping and candidate identification of mutations from forward genetic screens. Genome Res. 23, 679–686 (2013).
  48. Galvao, V. C. et al. Synteny-based mapping-by-sequencing enabled by targeted enrichment. Plant J. 71, 517–526 (2012).
  49. Ehrenreich, I. M. et al. Dissection of genetically complex traits with extremely large pools of yeast segregants. Nature 464, 1039–1042 (2010).
    This study provides proof that Pool-seq provides enough power to map complex traits.
  50. Wenger, J. W., Schwartz, K. & Sherlock, G. Bulk segregant analysis by high-throughput sequencing reveals a novel xylose utilization gene from Saccharomyces cerevisiae. PLoS Genet. 6, e1000942 (2010).
  51. Swinnen, S. et al. Identification of novel causative genes determining the complex trait of high ethanol tolerance in yeast using pooled-segregant whole-genome sequence analysis. Genome Res. 22, 975–984 (2012).
  52. Wade, M. J. Epistasis, complex traits, and mapping genes. Genetica 112–113, 59–69 (2001).
  53. Earley, E. J. & Jones, C. D. Next-generation mapping of complex traits with phenotype-based selection and introgression. Genetics 189, 1203–1209 (2011).
  54. Bastide, H. et al. A genome-wide, fine-scale map of natural pigmentation variation in Drosophila melanogaster. PLoS Genet. 9, e1003534 (2013).
    This paper shows that Pool-seq allows highly accurate fine mapping using natural population samples.
  55. Jeong, S. et al. The evolution of gene regulation underlies a morphological difference between two Drosophila sister species. Cell 132, 783–793 (2008).
  56. Kelly, J. K., Koseva, B. & Mojica, J. P. The genomic signal of partial sweeps in Mimulus guttatus. Genome Biol. Evol. 5, 1457–1469 (2013).
  57. Beissinger, T. M. et al. A genome-wide scan for evidence of selection in a maize population under long-term artificial selection for ear number. Genetics 196, 829–840 (2014).
  58. Johansson, A. M., Pettersson, M. E., Siegel, P. B. & Carlborg, O. Genome-wide effects of long-term divergent selection. PLoS Genet. 6, e1001188 (2010).
  59. Rubin, C. J. et al. Whole-genome resequencing reveals loci under selection during chicken domestication. Nature 464, 587–591 (2010).
    This is a particularly nice demonstration of the power of Pool-seq to detect selected loci in population samples.
  60. Burke, M. K. et al. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467, 587–590 (2010).
    This is the first experimental evolution study measuring allele frequency changes using Pool-seq.
  61. Remolina, S. C., Chang, P. L., Leips, J., Nuzhdin, S. V. & Hughes, K. A. Genomic basis of aging and life-history evolution in Drosophila melanogaster. Evolution 66, 3390–3403 (2012).
  62. Turner, T. L., Stewart, A. D., Fields, A. T., Rice, W. R. & Tarone, A. M. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster. PLoS Genet. 7, e1001336 (2011).
  63. Zhou, D. et al. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc. Natl Acad. Sci. USA 108, 2349–2354 (2011).
  64. Turner, T. L. & Miller, P. M. Investigating natural variation in Drosophila courtship song by the evolve and resequence approach. Genetics 191, 633–642 (2012).
  65. Tobler, R. et al. Massive habitat-specific genomic response in D. melanogaster populations during experimental evolution in hot and cold environments. Mol. Biol. Evol. 31, 364–375 (2013).
  66. Orozco-terWengel, P. et al. Adaptation of Drosophila to a novel laboratory environment reveals temporally heterogeneous trajectories of selected alleles. Mol. Ecol. 21, 4931–4941 (2012).
  67. Reed, L. K. et al. Systems genomics of metabolic phenotypes in wild-type Drosophila melanogaster. Genetics 197, 781–793 (2014).
  68. Martins, N. et al. Host adaptation to viruses relies on few genes with different cross-resistance properties. Proc. Natl Acad. Sci. USA 111, 5938–5943 (2014).
  69. Jalvingh, K. M., Chang, P. L., Nuzhdin, S. V. & Wertheim, B. Genomic changes under rapid evolution: selection for parasitoid resistance. Proc. Biol. Sci. 281, 20132303 (2014).
  70. Magwire, M. M. et al. Genome-wide association studies reveal a simple genetic basis of resistance to naturally coevolving viruses in Drosophila melanogaster. PLoS Genet. 8, e1003057 (2012).
  71. Turner, T. L., Bourne, E. C., Von Wettberg, E. J., Hu, T. T. & Nuzhdin, S. V. Population resequencing reveals local adaptation of Arabidopsis lyrata to serpentine soils. Nature Genet. 42, 260–263 (2010).
    The study is the first to show that ecologically important traits can be mapped with Pool-seq by comparing two functionally diverged populations.
  72. Lamichhaney, S. et al. Population-scale sequencing reveals genetic differentiation due to local adaptation in Atlantic herring. Proc. Natl Acad. Sci. USA 109, 19345–19350 (2012).
  73. Fabian, D. K. et al. Genome-wide patterns of latitudinal differentiation among populations of Drosophila melanogaster from North America. Mol. Ecol. 21, 4748–4769 (2012).
  74. Kolaczkowski, B., Kern, A. D., Holloway, A. K. & Begun, D. J. Genomic differentiation between temperate and tropical Australian populations of Drosophila melanogaster. Genetics 187, 245–260 (2011).
  75. Cheng, C. et al. Ecological genomics of Anopheles gambiae along a latitudinal cline: a population-resequencing approach. Genetics 190, 1417–1432 (2012).
  76. Hancock, A. M. et al. Adaptations to climate in candidate genes for common metabolic disorders. PLoS Genet. 4, e32 (2008).
  77. Hancock, A. M. et al. Adaptation to climate across the Arabidopsis thaliana genome. Science 334, 83–86 (2011).
  78. Fischer, M. C. et al. Population genomic footprints of selection and associations with climate in natural populations of Arabidopsis halleri from the Alps. Mol. Ecol. 22, 5594–5607 (2013).
    This is a nice application of Pool-seq to find selected loci in a non-model organism.
  79. Günther, T. & Coop, G. Robust identification of local adaptation from allele frequencies. Genetics 195, 205–220 (2013).
    This paper presents the first statistical framework to identify significant associations of a given locus with one or more environmental variables using Pool-seq data.
  80. Rubin, C. J. et al. Strong signatures of selection in the domestic pig genome. Proc. Natl Acad. Sci. USA 109, 19529–19536 (2012).
  81. Axelsson, E. et al. The genomic signature of dog domestication reveals adaptation to a starch-rich diet. Nature 495, 360–364 (2013).
  82. He, Z. et al. Two evolutionary histories in the genome of rice: the roles of domestication genes. PLoS Genet. 7, e1002100 (2011).
  83. Nolte, V., Pandey, R. V., Kofler, R. & Schlötterer, C. Genome-wide patterns of natural variation reveal strong selective sweeps and ongoing genomic conflict in Drosophila mauritiana. Genome Res. 23, 99–110 (2013).
  84. True, J. R., Mercer, J. M. & Laurie, C. C. Differences in crossover frequency and distribution among three sibling species of Drosophila. Genetics 142, 507–523 (1996).
  85. Casacuberta, E. & Gonzalez, J. The impact of transposable elements in environmental adaptation. Mol. Ecol. 22, 1503–1517 (2013).
  86. Kazazian, H. H. Jr Mobile elements: drivers of genome evolution. Science 303, 1626–1632 (2004).
  87. Boitard, S., Schlötterer, C., Nolte, V., Pandey, R. V. & Futschik, A. Detecting selective sweeps from pooled next-generation sequencing samples. Mol. Biol. Evol. 29, 2177–2186 (2012).
  88. Clément, J. A. et al. Private selective sweeps identified from next-generation pool-sequencing reveal convergent pathways under selection in two inbred Schistosoma mansoni strains. PLoS Negl Trop. Dis. 7, e2591 (2013).
  89. Foll, M. et al. Influenza virus drug resistance: a time-sampled population genetics perspective. PLoS Genet. 10, e1004185 (2014).
  90. Lang, G. I. et al. Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations. Nature 500, 571–574 (2013).
  91. Barrick, J. E. & Lenski, R. E. Genome-wide mutational diversity in an evolving population of Escherichia coli. Cold Spring Harb. Symp. Quant. Biol. 74, 119–129 (2009).
  92. Kvitek, D. J. & Sherlock, G. Whole genome, whole population sequencing reveals that loss of signaling networks is the major adaptive strategy in a constant environment. PLoS Genet. 9, e1003972 (2013).
  93. Parts, L. et al. Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res. 21, 1131–1138 (2011).
  94. Illingworth, C. J., Parts, L., Schiffels, S., Liti, G. & Mustonen, V. Quantifying selection acting on a complex trait using allele frequency time series data. Mol. Biol. Evol. 29, 1187–1197 (2012).
  95. Bergland, A. O., Behrman, E. L., O'Brien, K. R., Schmidt, P. S. & Petrov, D. A. Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila. arXiv 1303.5044 (2014).
  96. Traverse, C. C., Mayo-Smith, L. M., Poltak, S. R. & Cooper, V. S. Tangled bank of experimentally evolved Burkholderia biofilms reflects selection during chronic infections. Proc. Natl Acad. Sci. USA 110, E250–E259 (2013).
  97. Versace, E., Nolte, V., Pandey, R. V., Tobler, R. & Schlötterer, C. Experimental evolution reveals habitat-specific fitness dynamics among Wolbachia clades in Drosophila melanogaster. Mol. Ecol. 23, 802–814 (2014).
  98. Barcellos-Hoff, M. H., Lyden, D. & Wang, T. C. The evolution of the cancer niche during multistage carcinogenesis. Nature Rev. Cancer 13, 511–518 (2013).
  99. Merlo, L. M. F., Pepper, J. W., Reid, B. J. & Maley, C. C. Cancer as an evolutionary and ecological process. Nature Rev. Cancer 6, 924–935 (2006).
  100. Ding, L. et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature 481, 506–510 (2012).
  101. Newburger, D. E. et al. Genome evolution during progression to breast cancer. Genome Res. 23, 1097–1108 (2013).
  102. Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
  103. Aparicio, S. & Caldas, C. The implications of clonal genome evolution for cancer medicine. New Engl. J. Med. 368, 842–851 (2013).
  104. Greaves, M. & Maley, C. C. Clonal evolution in cancer. Nature 481, 306–313 (2012).
  105. Kinde, I., Wu, J., Papadopoulos, N., Kinzler, K. W. & Vogelstein, B. Detection and quantification of rare mutations with massively parallel sequencing. Proc. Natl Acad. Sci. USA 108, 9530–9535 (2011).
  106. Long, Q. et al. PoolHap: inferring haplotype frequencies from pooled samples by next generation sequencing. PLoS ONE 6, e15292 (2011).
  107. Kessner, D., Turner, T. L. & Novembre, J. Maximum likelihood estimation of frequencies of known haplotypes from pooled sequence data. Mol. Biol. Evol. 30, 1145–1158 (2013).
  108. Burke, M. K., King, E. G., Shahrestani, P., Rose, M. R. & Long, A. D. Genome-wide association study of extreme longevity in Drosophila melanogaster. Genome Biol. Evol. 6, 1–11 (2014).
  109. Eskin, I. et al. eALPS: estimating abundance levels in pooled sequencing using available genotyping data. J. Comput. Biol. 20, 861–877 (2013).
  110. Kofler, R. & Schlötterer, C. A guide for the design of evolve and resequencing studies. Mol. Biol. Evol. 31, 474–483 (2014).
  111. Imsland, F. et al. The Rose-comb mutation in chickens constitutes a structural rearrangement causing both altered comb morphology and defective sperm motility. PLoS Genet. 8, e1002775 (2012).
  112. Del Fabbro, C., Scalabrin, S., Morgante, M. & Giorgi, F. M. An extensive evaluation of read trimming effects on Illumina NGS data analysis. PLoS ONE 8, e85024 (2013).
  113. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10–12 (2011).
  114. Nevado, B., Ramos-Onsins, S. E. & Perez-Enciso, M. Resequencing studies of nonmodel organisms using closely related reference genomes: optimal experimental designs and bioinformatics approaches for population genomics. Mol. Ecol. 23, 1764–1779 (2014).
  115. Degner, J. F. et al. Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics 25, 3207–3212 (2009).
  116. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359 (2012).
  117. Albers, C. A. et al. Dindel: accurate indel calls from short-read data. Genome Res. 21, 961–973 (2011).
  118. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
  119. Koboldt, D. C. et al. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25, 2283–2285 (2009).
  120. Raineri, E. et al. SNP calling by sequencing pooled samples. BMC Bioinformatics 13, 239 (2012).
  121. Bansal, V. A statistical method for the detection of variants from next-generation resequencing of DNA pools. Bioinformatics 26, i318–i324 (2010).
  122. Altmann, A. et al. vipR: variant identification in pooled DNA using R. Bioinformatics 27, i77–i84 (2011).
  123. Zhou, B. Y. An empirical Bayes mixture model for SNP detection in pooled sequencing data. Bioinformatics 28, 2569–2575 (2012).
  124. Chen, Q. & Sun, F. A unified approach for allele frequency estimation, SNP detection and association studies based on pooled sequencing data using EM algorithms. BMC Genomics 14 (Suppl. 1), S1 (2013).
  125. Druley, T. E. et al. Quantification of rare allelic variants from pooled genomic DNA. Nature Methods 6, 263–265 (2009).
  126. Vallania, F. L. et al. High-throughput discovery of rare insertions and deletions in large cohorts. Genome Res. 20, 1711–1718 (2010).
  127. Wei, Z., Wang, W., Hu, P., Lyon, G. J. & Hakonarson, H. SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data. Nucleic Acids Res. 39, e132 (2011).
  128. Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. arXiv 1207.3907 (2012).
  129. Calvo, S. E. et al. High-throughput, pooled sequencing identifies mutations in NUBPL and FOXRED1 in human complex I deficiency. Nature Genet. 42, 851–858 (2010).
  130. Fiston-Lavier, A.-S., Barron, M. G., Petrov, D. A. & González, J. T-lex2: genotyping, frequency estimation and re-annotation of transposable elements using single or pooled next-generation sequencing data. bioRxiv http://dx.doi.org/10.1101/002964 (2014).
  131. Zhuang, J., Wang, J., Theurkauf, W. & Weng, Z. TEMP: a computational method for analyzing transposable element polymorphism in populations. Nucleic Acids Res. 42, 6826–6838 (2014).
  132. Kofler, R., Pandey, R. V. & Schlötterer, C. PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-seq). Bioinformatics 27, 3435–3436 (2011).
  133. Boitard, S. et al. Pool-HMM: a Python program for estimating the allele frequency spectrum and detecting selective sweeps from next generation sequencing of pooled samples. Mol. Ecol. Resour. 13, 337–340 (2013).
  134. Ferretti, L., Ramos-Onsins, S. E. & Perez-Enciso, M. Population genomics from pool sequencing. Mol. Ecol. 22, 5561–5576 (2013).
  135. Catchen, J., Hohenlohe, P. A., Bassham, S., Amores, A. & Cresko, W. A. Stacks: an analysis tool set for population genomics. Mol. Ecol. 22, 3124–3140 (2013).
  136. Vitalis, R., Gautier, M., Dawson, K. J. & Beaumont, M. A. Detecting and measuring selection from gene frequency data. Genetics 196, 799–817 (2014).
  137. Gautier, M. & Vitalis, R. Inferring population histories using genome-wide allele frequency data. Mol. Biol. Evol. 30, 654–668 (2013).
  138. Feder, A. F., Petrov, D. A. & Bergland, A. O. LDx: estimation of linkage disequilibrium from high-throughput pooled resequencing data. PLoS ONE 7, e48588 (2012).
  139. Minevich, G., Park, D. S., Blankenberg, D., Poole, R. J. & Hobert, O. CloudMap: a cloud-based pipeline for analysis of mutant genome sequences. Genetics 192, 1249–1269 (2012).
  140. Edwards, M. D. & Gifford, D. K. High-resolution genetic mapping with pooled sequencing. BMC Bioinformatics 13 (Suppl. 6), S8 (2012).
  141. Bowen, M. E., Henke, K., Siegfried, K. R., Warman, M. L. & Harris, M. P. Efficient mapping and cloning of mutations in zebrafish by low-coverage whole-genome sequencing. Genetics 190, 1017–1024 (2012).
  142. Austin, R. S. et al. Next-generation mapping of Arabidopsis genes. Plant J. 67, 715–725 (2011).
  143. Leshchiner, I. et al. Mutation mapping and identification by whole-genome sequencing. Genome Res. 22, 1541–1548 (2012).
  144. Prosperi, M. C. & Salemi, M. QuRe: software for viral quasispecies reconstruction from next-generation sequencing data. Bioinformatics 28, 132–133 (2012).
  145. Zagordi, O., Bhattacharya, A., Eriksson, N. & Beerenwinkel, N. ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data. BMC Bioinformatics 12, 119 (2011).
  146. Eyre, D. W. et al. Detection of mixed infection from bacterial whole genome sequence data allows assessment of its role in Clostridium difficile transmission. PLoS Comput. Biol. 9, e1003059 (2013).
  147. Astrovskaya, I. et al. Inferring viral quasispecies spectra from 454 pyrosequencing reads. BMC Bioinformatics 12 (Suppl. 6), S1 (2011).
  148. Yang, X., Charlebois, P., Macalalad, A., Henn, M. R. & Zody, M. C. V-Phaser 2: variant inference for viral populations. BMC Genomics 14, 674 (2013).
  149. Töpfer, A. et al. Viral quasispecies assembly via maximal clique enumeration. PLoS Comput. Biol. 10, e1003515 (2014).
  150. Töpfer, A. et al. Probabilistic inference of viral quasispecies subject to recombination. J. Comput. Biol. 20, 113–123 (2013).
  151. Prabhakaran, S., Rey, M., Zagordi, O., Beerenwinkel, N. & Roth, V. HIV haplotype inference using a constraint-based Dirichlet process mixture model. Machine Learning in Computational Biology NIPS Workshop (2010).
  152. Pandey, R. V., Kofler, R., Orozco-terWengel, P., Nolte, V. & Schlötterer, C. PoPoolation DB: a user-friendly web-based database for the retrieval of natural polymorphisms in Drosophila. BMC Genet. 12, 27 (2011).
  153. Chen, X., Listman, J. B., Slack, F. J., Gelernter, J. & Zhao, H. Biases and errors on allele frequency estimation and disease association tests of next-generation sequencing of pooled samples. Genet. Epidemiol. 36, 549–560 (2012).
  154. Roth, A. et al. PyClone: statistical inference of clonal population structure in cancer. Nature Methods 11, 396–398 (2014).

Author information

Affiliations

  1. Institut für Populationsgenetik, Vetmeduni Vienna, Veterinärplatz 1, 1210 Vienna, Austria.

    • Christian Schlötterer,
    • Raymond Tobler,
    • Robert Kofler &
    • Viola Nolte
  2. Vienna Graduate School of Population Genetics.

    • Raymond Tobler

Competing interests statement

The authors declare no competing interests.

Author details

  • Christian Schlötterer

    Christian Schlötterer received his Ph.D. in Genetics from the Ludwig Maximilian University Munich, Germany. Following postdoctoral work at the Ludwig Maximilian University Munich and the University of Chicago, Illinois, USA, he joined the faculty of the Vetmeduni Vienna, Austria, where he heads the Institute of Population Genetics. The main focus of his laboratory is on the genetic basis of adaptation to local environments.

  • Raymond Tobler

    Ray Tobler is currently finishing his Ph.D. at the Vetmeduni Vienna, Austria. He is interested in how populations adapt to stressful environments. By combining experimental evolution and next-generation sequencing, he studies adaptive genomic changes in Drosophila melanogaster populations exposed to novel thermal environments.

  • Robert Kofler

    Robert Kofler is a senior bioinformatics postdoctoral researcher at the Institute of Population Genetics, Vetmeduni Vienna, Austria. He is interested in the evolutionary dynamics of transposable elements and the trajectories of beneficial alleles in experimentally evolving populations. He has pioneered the data analysis of whole-genome sequencing of pools of individuals (Pool-seq) and developed several software solutions for such data.

  • Viola Nolte

    Viola Nolte studied Biology at the Leibniz Universität Hannover, Germany. She has a broad range of research interests, ranging from molecular evolution to experimental population genetics.