Abstract
Allele-specific sequencing reads provide a powerful signal for identifying molecular quantitative trait loci (QTLs), but they are challenging to analyze and are prone to technical artifacts. Here we describe WASP, a suite of tools for unbiased allele-specific read mapping and discovery of molecular QTLs. Using simulated reads, RNA-seq reads and chromatin immunoprecipitation sequencing (ChIP-seq) reads, we demonstrate that WASP has a low error rate and is far more powerful than existing QTL-mapping approaches.
Acknowledgements
We thank members of the Liu, Pritchard, Stephens and Gilad labs for helpful discussions. We thank X.S. Liu's lab for hosting G.M. as a visitor in the Department of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute while this work was conducted. We thank many early users of WASP, particularly C. DeBoever, who contributed bug fixes and code improvements. This work was supported by the Howard Hughes Medical Institute, the US National Institutes of Health (NIH grants HG007036, HG006123, MH101825 and GM007197) and the US National Science Foundation (NSF Graduate Research Fellowship DGE-0638477 to B.v.d.G.).
Author information
Contributions
B.v.d.G., G.M., J.K.P. and Y.G. conceived of the project. B.v.d.G. and G.M. performed the analyses and implemented the software. G.M. and B.v.d.G. wrote the manuscript with input from all authors. J.K.P. and Y.G. directed the project.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 WASP mapping errors at heterozygous sites.
WASP mapping errors at heterozygous sites as a function of the rate of unknown single-nucleotide variants (SNVs).
Supplementary Figure 2 Quantile-quantile plots of ranked –log10 P values from the combined haplotype test.
(a) Ranked –log10 P values from running the combined haplotype test on H3K27ac ChIP-seq data from ten lymphoblastoid cell lines, compared to P values expected under the null hypothesis. The permuted points are for the same data set, but with the genotypes of each SNP shuffled. (b) Ranked –log10 P values from running the combined haplotype test on RNA-seq data from 69 YRI cell lines. The test was run only on eQTLs that were previously identified in cell lines derived from European individuals (Lappalainen, T. et al. Nature 501, 506–511 (2013)). The permuted points are for the same data set, but with the genotypes of each SNP shuffled.
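The construction of a quantile-quantile plot like this one can be sketched in Python. This is a minimal illustration and not WASP code: the function names qq_points and shuffle_genotypes are hypothetical, and the rank / (n + 1) convention for expected P values is an assumption made for this example.

```python
import math
import random


def qq_points(pvalues):
    """Pair expected and observed -log10 P values, most significant first.

    Assumes every P value lies in (0, 1]. Expected values use the uniform
    quantiles rank / (n + 1), one common convention (an assumption here).
    """
    n = len(pvalues)
    observed = sorted(pvalues)  # smallest P value first
    expected = [(rank + 1) / (n + 1) for rank in range(n)]
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]


def shuffle_genotypes(genotypes, seed=None):
    """Return the genotypes of one SNP in a random order across individuals.

    The input list is left unchanged; rerunning the test on the shuffled
    genotypes gives the 'permuted' points of the plot.
    """
    rng = random.Random(seed)
    return rng.sample(genotypes, len(genotypes))
```

Points from real genotypes that rise above the diagonal indicate an excess of small P values, whereas points from shuffled genotypes are expected to follow the diagonal if the test is well calibrated.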
Supplementary Figure 3 Receiver operating characteristic (ROC) curves showing the performance of five methods for QTL identification on simulated data.
Performance for different numbers of individuals and effect sizes. The simulations are described in Supplementary Note 6.
Supplementary Figure 4 The WASP mapping pipeline.
Reads are first mapped to the genome using a mapping tool of the user’s choice. The aligned reads are provided to WASP in SAM (sequence alignment/map) or BAM (binary alignment/map) format, along with a list of known polymorphisms. WASP identifies reads that overlap known polymorphisms, flips the alleles in the reads, and remaps them to the genome. Reads that map to a different location than the original read are then discarded. Finally, WASP can optionally remove reads that map to the same genomic location (‘duplicate reads’) without introducing a reference bias.
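The allele-flipping and filtering steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the WASP implementation: the function names allele_flipped_reads and keep_read are hypothetical, reads are assumed to align without gaps, only biallelic SNPs are handled, and the remapping itself (done by the user's aligner) is represented only by the positions it returns.

```python
from itertools import product


def allele_flipped_reads(seq, start, snps):
    """Return every allelic version of a read other than the observed one.

    seq   : read sequence (ungapped alignment assumed)
    start : 0-based genomic position of the first base of the read
    snps  : dict mapping 0-based genomic position -> (ref_allele, alt_allele)
    """
    # Offsets within the read that overlap a known polymorphism.
    sites = [(pos - start, alleles)
             for pos, alleles in sorted(snps.items())
             if 0 <= pos - start < len(seq)]
    versions = set()
    # Enumerate every combination of alleles at the overlapping sites.
    for combo in product(*(alleles for _, alleles in sites)):
        bases = list(seq)
        for (offset, _), allele in zip(sites, combo):
            bases[offset] = allele
        versions.add("".join(bases))
    versions.discard(seq)  # the observed read has already been mapped
    return sorted(versions)


def keep_read(original_pos, remapped_positions):
    """Keep a read only if every flipped version remaps to the same position.

    An unmapped version can be passed as None, which also rejects the read.
    """
    return all(pos == original_pos for pos in remapped_positions)
```

For example, allele_flipped_reads("ACGT", 100, {101: ("C", "T")}) yields the single alternative read "ATGT"; if that read remaps anywhere other than position 100, keep_read returns False and the original read is discarded.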
Supplementary Figure 5 The WASP combined haplotype test pipeline.
Mapped reads (in BAM or SAM format) for each individual, genotypes for known SNPs, and a list of regions and SNPs to test are provided to WASP. WASP extracts read counts for the target regions as well as allele-specific read counts. Read counts from multiple sources can be used to update heterozygous probabilities. Expected read counts for each region are adjusted through modeling of the relationships between read counts and GC content and between read counts and total read counts for each sample. Dispersion parameters are estimated from the data and provided to the combined haplotype test along with the read counts. Principal components can optionally be used as covariates by the test.
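To illustrate how allele-specific read counts and a dispersion parameter might enter a test, the sketch below runs a likelihood ratio test for allelic imbalance under a beta-binomial model. It is not the combined haplotype test itself, which also uses the total read counts for each region. The function names, the parameterization of the dispersion as a value in (0, 1), and the grid search over the reference-allele fraction are all assumptions made for this example.

```python
from math import erfc, lgamma, sqrt


def betabinom_logpmf(k, n, p, dispersion):
    """Log-probability of k reference reads out of n under a beta-binomial.

    p          : expected fraction of reads from the reference allele
    dispersion : overdispersion in (0, 1); values near 0 approach a binomial
    """
    # Convert the mean and overdispersion into beta shape parameters.
    s = (1.0 - dispersion) / dispersion
    a, b = p * s, (1.0 - p) * s
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    log_beta_num = lgamma(k + a) + lgamma(n - k + b) - lgamma(n + a + b)
    log_beta_den = lgamma(a) + lgamma(b) - lgamma(a + b)
    return log_choose + log_beta_num - log_beta_den


def allelic_imbalance_test(counts, dispersion):
    """Likelihood ratio test of p = 0.5 against a freely estimated p.

    counts : list of (ref_reads, total_reads), one per heterozygous individual
    Returns the test statistic and a P value from a chi-square with 1 df.
    """
    def loglik(p):
        return sum(betabinom_logpmf(k, n, p, dispersion) for k, n in counts)

    grid = [i / 100 for i in range(1, 100)]  # coarse search; includes 0.5
    null = loglik(0.5)
    best = max(loglik(p) for p in grid)
    stat = max(0.0, 2.0 * (best - null))
    # Survival function of a chi-square with 1 df is erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2.0))
```

Calling allelic_imbalance_test([(18, 20), (27, 30)], dispersion=0.05) returns a much smaller P value than the same call on balanced counts such as [(10, 20), (15, 30)]. Raising the dispersion makes the test more conservative for the same counts.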
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–5, Supplementary Table 1 and Supplementary Notes 1–8 (PDF 3390 kb)
Supplementary Software
WASP code and documentation. Updated files are maintained at https://github.com/bmvdgeijn/WASP (ZIP 688 kb)
About this article
Cite this article
van de Geijn, B., McVicker, G., Gilad, Y. et al. WASP: allele-specific software for robust molecular quantitative trait locus discovery. Nat Methods 12, 1061–1063 (2015). https://doi.org/10.1038/nmeth.3582
DOI: https://doi.org/10.1038/nmeth.3582