WASP: allele-specific software for robust molecular quantitative trait locus discovery

van de Geijn, Bryce; McVicker, Graham; Gilad, Yoav; Pritchard, Jonathan K

doi:10.1038/nmeth.3582

Brief Communication
Published: 14 September 2015


Bryce van de Geijn^1,2^,
Graham McVicker^3^,
Yoav Gilad^1^ &
Jonathan K Pritchard^3,4,5^

Nature Methods volume 12, pages 1061–1063 (2015)


Abstract

Allele-specific sequencing reads provide a powerful signal for identifying molecular quantitative trait loci (QTLs), but they are challenging to analyze and are prone to technical artifacts. Here we describe WASP, a suite of tools for unbiased allele-specific read mapping and discovery of molecular QTLs. Using simulated reads, RNA-seq reads and chromatin immunoprecipitation sequencing (ChIP-seq) reads, we demonstrate that WASP has a low error rate and is far more powerful than existing QTL-mapping approaches.


**Figure 1: Mapping of allele-specific reads.**

**Figure 2: The combined haplotype test and its performance.**



Acknowledgements

We thank members of the Liu, Pritchard, Stephens and Gilad labs for helpful discussions. We thank X.S. Liu's lab for hosting G.M. as a visitor in the Department of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute while this work was conducted. We thank many early users of WASP, particularly C. DeBoever, who contributed bug fixes and code improvements. This work was supported by the Howard Hughes Medical Institute, the US National Institutes of Health (NIH grants HG007036, HG006123, MH101825 and GM007197) and the US National Science Foundation (NSF Graduate Research Fellowship DGE-0638477 to B.v.d.G.).

Author information

Bryce van de Geijn and Graham McVicker: These authors contributed equally to this work.

Authors and Affiliations

Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
Bryce van de Geijn & Yoav Gilad
Committee on Genetics, Genomics and Systems Biology, University of Chicago, Chicago, Illinois, USA
Bryce van de Geijn
Department of Genetics, Stanford University, Stanford, California, USA
Graham McVicker & Jonathan K Pritchard
Department of Biology, Stanford University, Stanford, California, USA
Jonathan K Pritchard
Howard Hughes Medical Institute, Stanford University, Stanford, California, USA
Jonathan K Pritchard

Authors

Bryce van de Geijn
Graham McVicker
Yoav Gilad
Jonathan K Pritchard

Contributions

B.v.d.G., G.M., J.K.P. and Y.G. conceived of the project. B.v.d.G. and G.M. performed the analyses and implemented the software. G.M. and B.v.d.G. wrote the manuscript with input from all authors. J.K.P. and Y.G. directed the project.

Corresponding author

Correspondence to Jonathan K Pritchard.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 WASP mapping errors at heterozygous sites.

WASP mapping errors at heterozygous sites as a function of the rate of unknown single-nucleotide variants (SNVs).

Supplementary Figure 2 Quantile-quantile plots of ranked –log₁₀ P values from the combined haplotype test.

(a) Ranked –log₁₀ P values from running the combined haplotype test on H3K27ac ChIP-seq data from ten lymphoblastoid cell lines compared to P values expected under the null hypothesis. The permuted points are for the same data set, but with the genotypes of each SNP shuffled. (b) Ranked –log₁₀ P values from running the combined haplotype test on RNA-seq data from 69 YRI cell lines. The test was run only on eQTLs that were previously identified in cell lines derived from European individuals¹. The permuted points are for the same data set, but with the genotypes of each SNP shuffled. 1. Lappalainen, T. et al. Nature 501, 506–511 (2013).
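The quantile-quantile comparison described in this caption can be illustrated with a minimal Python sketch. It is not taken from the WASP code: the function names `qq_points` and `shuffle_genotypes` are hypothetical, and the expected quantile k / (n + 1) is one common convention assumed here. The P values themselves would come from whatever test is being calibrated.

```python
import math
import random


def qq_points(p_values):
    """Return (expected, observed) -log10 P pairs, least significant first.

    Every P value must be greater than 0 so that its logarithm is defined.
    """
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Assumed convention: under the null hypothesis the k-th smallest of
    # n P values is expected to lie near k / (n + 1).
    expected = sorted(-math.log10(k / (n + 1)) for k in range(1, n + 1))
    return list(zip(expected, observed))


def shuffle_genotypes(genotypes, rng):
    """Return a copy of one SNP's genotype vector with individuals permuted."""
    shuffled = list(genotypes)
    rng.shuffle(shuffled)
    return shuffled
```

Rerunning the test on genotypes permuted with `shuffle_genotypes(genotypes, random.Random(0))` breaks any real association between genotype and read counts, so the resulting points should follow the diagonal if the test is well calibrated.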

Supplementary Figure 3 Receiver operating characteristic (ROC) curves showing the performance of five methods for QTL identification on simulated data.

Performance for different numbers of individuals and effect sizes. The simulations are described in Supplementary Note 6.
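For readers unfamiliar with how such curves are built, the following Python sketch computes ROC points from a list of test scores and simulated truth labels. It is a generic illustration rather than the authors' evaluation code, and the names `roc_curve` and `auc` are hypothetical.

```python
def roc_curve(scores, labels):
    """Return (false positive rate, true positive rate) points.

    scores: one value per tested site; larger means stronger evidence.
    labels: True where the simulated site has a real effect.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one true and one null site")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Consume tied scores together so the curve does not depend on
        # the order in which tied sites were supplied.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A method that ranks every true effect above every null site gives an area of 1, and a method that cannot separate them gives an area near 0.5.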

Supplementary Figure 4 The WASP mapping pipeline.

Reads are first mapped to the genome using a mapping tool of the user’s choice. The aligned reads are provided to WASP in SAM (sequence alignment/map) or BAM (binary alignment/map) format, along with a list of known polymorphisms. WASP identifies reads that overlap known polymorphisms, flips the alleles in the reads, and remaps them to the genome. Reads that map to a different location than the original read are then discarded. Finally, WASP can optionally remove reads that map to the same genomic location (‘duplicate reads’) without introducing a reference bias.
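The flip-and-remap filter described in this caption can be sketched in Python as follows. This is an illustration under stated assumptions, not the WASP implementation: reads are assumed to align without gaps, every combination of known alleles other than the original is generated, and `remap` is a hypothetical caller-supplied function wrapping the user's aligner.

```python
from itertools import product


def allele_swapped_reads(seq, read_start, snps):
    """Yield versions of a read with alleles changed at overlapping known SNPs.

    snps maps a 0-based genome position to a tuple of known alleles,
    e.g. {102: ("G", "T")}. The read is assumed to align without gaps,
    so the offset within the read is position - read_start.
    """
    sites = [(pos - read_start, alleles)
             for pos, alleles in sorted(snps.items())
             if read_start <= pos < read_start + len(seq)]
    if not sites:
        return
    for combo in product(*(alleles for _, alleles in sites)):
        bases = list(seq)
        for (offset, _), allele in zip(sites, combo):
            bases[offset] = allele
        swapped = "".join(bases)
        if swapped != seq:  # skip the combination identical to the original
            yield swapped


def keep_read(seq, chrom, read_start, snps, remap):
    """Return True if every allele-swapped version remaps to the same place.

    remap (hypothetical) takes a sequence and returns (chrom, start),
    or None when the sequence does not map uniquely.
    """
    return all(remap(alt) == (chrom, read_start)
               for alt in allele_swapped_reads(seq, read_start, snps))
```

Reads that overlap no known polymorphism produce no swapped versions and are kept, because there is nothing that could bias their mapping toward one allele.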

Supplementary Figure 5 The WASP combined haplotype test pipeline.

Mapped reads (in BAM or SAM format) for each individual, genotypes for known SNPs, and a list of regions and SNPs to test are provided to WASP. WASP extracts read counts for the target regions as well as allele-specific read counts. Read counts from multiple sources can be used to update heterozygous probabilities. Expected read counts for each region are adjusted through modeling of the relationships between read counts and GC content and between read counts and total read counts for each sample. Dispersion parameters are estimated from the data and provided to the combined haplotype test along with the read counts. Principal components can optionally be used as covariates by the test.
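One step in this pipeline, updating heterozygous probabilities from read counts, can be illustrated with a simple Bayesian calculation. The Python sketch below is not WASP's model: the function name `update_het_probability`, the default error rate and the equal split of prior mass between the two homozygous genotypes are all assumptions made for illustration.

```python
import math


def update_het_probability(prior_het, ref_count, alt_count, error_rate=0.01):
    """Posterior probability that an individual is heterozygous at a SNP.

    Assumptions of this sketch: reads are independent; a heterozygote
    yields each allele with probability 0.5; a homozygote shows the other
    allele only through errors at rate error_rate; the two homozygous
    genotypes share the remaining prior mass equally.
    """
    if not 0.0 < prior_het < 1.0:
        raise ValueError("prior_het must lie strictly between 0 and 1")
    if not 0.0 < error_rate < 0.5:
        raise ValueError("error_rate must lie strictly between 0 and 0.5")
    n = ref_count + alt_count
    log_prior_hom = math.log((1.0 - prior_het) / 2.0)
    log_err = math.log(error_rate)
    log_ok = math.log(1.0 - error_rate)
    # The binomial coefficient is the same for all genotypes and cancels.
    log_terms = [
        math.log(prior_het) + n * math.log(0.5),                # heterozygous
        log_prior_hom + ref_count * log_ok + alt_count * log_err,  # hom. ref
        log_prior_hom + ref_count * log_err + alt_count * log_ok,  # hom. alt
    ]
    # Work in log space so that deep coverage does not underflow to zero.
    top = max(log_terms)
    weights = [math.exp(t - top) for t in log_terms]
    return weights[0] / sum(weights)
```

With no reads the posterior equals the prior; balanced counts of both alleles push the probability toward 1, and many reads carrying a single allele push it toward 0.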

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5, Supplementary Table 1 and Supplementary Notes 1–8 (PDF 3390 kb)

Supplementary Software

WASP code and documentation. Updated files are maintained at https://github.com/bmvdgeijn/WASP (ZIP 688 kb)


About this article

Cite this article

van de Geijn, B., McVicker, G., Gilad, Y. et al. WASP: allele-specific software for robust molecular quantitative trait locus discovery. Nat Methods 12, 1061–1063 (2015). https://doi.org/10.1038/nmeth.3582


Received: 23 June 2015
Accepted: 13 August 2015
Published: 14 September 2015
Issue Date: November 2015
DOI: https://doi.org/10.1038/nmeth.3582
