  • Brief Communication

WASP: allele-specific software for robust molecular quantitative trait locus discovery

Abstract

Allele-specific sequencing reads provide a powerful signal for identifying molecular quantitative trait loci (QTLs), but they are challenging to analyze and are prone to technical artifacts. Here we describe WASP, a suite of tools for unbiased allele-specific read mapping and discovery of molecular QTLs. Using simulated reads, RNA-seq reads and chromatin immunoprecipitation sequencing (ChIP-seq) reads, we demonstrate that WASP has a low error rate and is far more powerful than existing QTL-mapping approaches.

Figure 1: Mapping of allele-specific reads.
Figure 2: The combined haplotype test and its performance.

Acknowledgements

We thank members of the Liu, Pritchard, Stephens and Gilad labs for helpful discussions. We thank X.S. Liu's lab for hosting G.M. as a visitor in the Department of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute while this work was conducted. We thank many early users of WASP, particularly C. DeBoever, who contributed bug fixes and code improvements. This work was supported by the Howard Hughes Medical Institute, the US National Institutes of Health (NIH grants HG007036, HG006123, MH101825 and GM007197) and the US National Science Foundation (NSF Graduate Research Fellowship DGE-0638477 to B.v.d.G.).

Author information

Contributions

B.v.d.G., G.M., J.K.P. and Y.G. conceived of the project. B.v.d.G. and G.M. performed the analyses and implemented the software. G.M. and B.v.d.G. wrote the manuscript with input from all authors. J.K.P. and Y.G. directed the project.

Corresponding author

Correspondence to Jonathan K Pritchard.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 WASP mapping errors at heterozygous sites.

WASP mapping errors at heterozygous sites as a function of the rate of unknown single-nucleotide variants (SNVs).

Supplementary Figure 2 Quantile-quantile plots of ranked –log10 P values from the combined haplotype test.

(a) Ranked –log10 P values from running the combined haplotype test on H3K27ac ChIP-seq data from ten lymphoblastoid cell lines, compared to the P values expected under the null hypothesis. The permuted points are for the same data set, but with the genotypes of each SNP shuffled. (b) Ranked –log10 P values from running the combined haplotype test on RNA-seq data from 69 YRI cell lines. The test was run only on eQTLs that were previously identified in cell lines derived from European individuals (ref. 1). The permuted points are for the same data set, but with the genotypes of each SNP shuffled.

1. Lappalainen, T. et al. Nature 501, 506–511 (2013).

Supplementary Figure 3 Receiver operating characteristic (ROC) curves showing the performance of five methods for QTL identification on simulated data.

Performance for different numbers of individuals and effect sizes. The simulations are described in Supplementary Note 6.

Supplementary Figure 4 The WASP mapping pipeline.

Reads are first mapped to the genome using a mapping tool of the user’s choice. The aligned reads are provided to WASP in SAM (sequence alignment/map) or BAM (binary alignment/map) format, along with a list of known polymorphisms. WASP identifies reads that overlap known polymorphisms, flips the alleles in the reads, and remaps them to the genome. Reads that map to a different location than the original read are then discarded. Finally, WASP can optionally remove reads that map to the same genomic location (‘duplicate reads’) without introducing a reference bias.
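
The flip-and-remap filter described in this legend can be illustrated with a small Python sketch. It is not the WASP implementation: the function names, the tuple layout for SNPs, the gap-free alignment, and the `remap` callback (a stand-in for whichever mapping tool is used) are all assumptions made for illustration. Generating every allelic combination for reads that overlap several SNPs is also an assumption of this sketch.

```python
from itertools import product


def allele_flipped_reads(seq, start, snps):
    """Return versions of a read with alleles at known SNPs swapped.

    seq   : read sequence (assumed here to align without gaps)
    start : 0-based genomic position of the first base of the read
    snps  : iterable of (position, allele1, allele2) tuples (hypothetical layout)
    """
    # Keep only SNPs that fall inside the read, as offsets into the read.
    overlapping = [(pos - start, a1, a2) for pos, a1, a2 in snps
                   if start <= pos < start + len(seq)]
    variants = set()
    # Enumerate every combination of alleles at the overlapping SNPs.
    for choice in product(*[(a1, a2) for _, a1, a2 in overlapping]):
        bases = list(seq)
        for (offset, _, _), allele in zip(overlapping, choice):
            bases[offset] = allele
        variants.add("".join(bases))
    variants.discard(seq)  # the original read is not an alternative
    return sorted(variants)


def keep_read(seq, start, snps, remap):
    """Keep a read only if every flipped version remaps to the original position.

    remap : hypothetical callable returning the mapped start position of a sequence
    """
    return all(remap(alt) == start
               for alt in allele_flipped_reads(seq, start, snps))
```

A read that overlaps no known polymorphism yields no alternatives and is kept unchanged, which matches the legend's statement that only reads overlapping known polymorphisms are remapped.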

Supplementary Figure 5 The WASP combined haplotype test pipeline.

Mapped reads (in BAM or SAM format) for each individual, genotypes for known SNPs, and a list of regions and SNPs to test are provided to WASP. WASP extracts read counts for the target regions as well as allele-specific read counts. Read counts from multiple sources can be used to update heterozygous probabilities. Expected read counts for each region are adjusted through modeling of the relationships between read counts and GC content and between read counts and total read counts for each sample. Dispersion parameters are estimated from the data and provided to the combined haplotype test along with the read counts. Principal components can optionally be used as covariates by the test.
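
As a rough illustration of the allele-specific read counts mentioned in this legend, the following Python sketch tallies reads that carry each allele at a single known SNP. It is a toy example under stated assumptions, not WASP's own code: the function name, the `(start, sequence)` read representation, and gap-free alignments are all hypothetical simplifications.

```python
def allele_counts(reads, snp_pos, ref, alt):
    """Count reads carrying the reference or alternative allele at one SNP.

    reads   : iterable of (start, sequence) pairs with 0-based start positions
              (hypothetical representation; alignments are assumed gap-free)
    snp_pos : 0-based genomic position of the SNP
    ref/alt : the two known alleles at the SNP
    """
    counts = {"ref": 0, "alt": 0, "other": 0}
    for start, seq in reads:
        offset = snp_pos - start
        if not 0 <= offset < len(seq):
            continue  # read does not overlap the SNP
        base = seq[offset]
        if base == ref:
            counts["ref"] += 1
        elif base == alt:
            counts["alt"] += 1
        else:
            counts["other"] += 1  # matches neither known allele
    return counts
```

In a full analysis, counts like these would be collected at heterozygous sites for each individual and passed to the test together with the region-level read counts.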

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5, Supplementary Table 1 and Supplementary Notes 1–8 (PDF 3390 kb)

Supplementary Software

WASP code and documentation. Updated files are maintained at https://github.com/bmvdgeijn/WASP (ZIP 688 kb)

Cite this article

van de Geijn, B., McVicker, G., Gilad, Y. et al. WASP: allele-specific software for robust molecular quantitative trait locus discovery. Nat Methods 12, 1061–1063 (2015). https://doi.org/10.1038/nmeth.3582

  • DOI: https://doi.org/10.1038/nmeth.3582
