Allele-specific sequencing reads provide a powerful signal for identifying molecular quantitative trait loci (QTLs), but they are challenging to analyze and are prone to technical artifacts. Here we describe WASP, a suite of tools for unbiased allele-specific read mapping and discovery of molecular QTLs. Using simulated reads, RNA-seq reads and chromatin immunoprecipitation sequencing (ChIP-seq) reads, we demonstrate that WASP has a low error rate and is far more powerful than existing QTL-mapping approaches.
- Supplementary Figure 1: WASP mapping errors at heterozygous sites. (95 KB)
WASP mapping errors at heterozygous sites as a function of the rate of unknown single-nucleotide variants (SNVs).
- Supplementary Figure 2: Quantile-quantile plots of ranked –log10 P values from the combined haplotype test. (99 KB)
(a) Ranked –log10 P values from running the combined haplotype test on H3K27ac ChIP-seq data from ten lymphoblastoid cell lines, compared to the P values expected under the null hypothesis. The permuted points are for the same data set, but with the genotypes of each SNP shuffled. (b) Ranked –log10 P values from running the combined haplotype test on RNA-seq data from 69 YRI cell lines. The test was run only on eQTLs that were previously identified in cell lines derived from European individuals¹. The permuted points are for the same data set, but with the genotypes of each SNP shuffled.
1. Lappalainen, T. et al. Nature 501, 506–511 (2013).
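The quantile-quantile comparison described in this legend can be illustrated with a minimal sketch. It is not the WASP code: the function names `qq_points` and `shuffle_genotypes` are hypothetical, the expected quantiles assume P values that are uniform under the null (using the common rank/(n + 1) convention), and all P values are assumed to be greater than zero.

```python
import math
import random


def qq_points(p_values):
    """Pair ranked observed -log10 P values with those expected under a uniform null.

    Assumption: every P value is strictly greater than 0.
    """
    n = len(p_values)
    # Largest -log10 P (smallest P value) first.
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    # The k-th smallest of n uniform P values is expected near k / (n + 1).
    expected = [-math.log10(rank / (n + 1)) for rank in range(1, n + 1)]
    return list(zip(expected, observed))


def shuffle_genotypes(genotypes, seed=None):
    """Return a permuted copy of one SNP's genotypes across individuals."""
    rng = random.Random(seed)
    permuted = list(genotypes)
    rng.shuffle(permuted)
    return permuted
```

Re-running a test on the output of `shuffle_genotypes` and passing the resulting P values to `qq_points` gives the kind of permuted points referred to above; points that track the diagonal indicate a well-calibrated test.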
- Supplementary Figure 3: Receiver operating characteristic (ROC) curves showing the performance of five methods for QTL identification on simulated data. (263 KB)
Performance for different numbers of individuals and effect sizes. The simulations are described in Supplementary Note 6.
- Supplementary Figure 4: The WASP mapping pipeline. (119 KB)
Reads are first mapped to the genome using a mapping tool of the user’s choice. The aligned reads are provided to WASP in SAM (sequence alignment/map) or BAM (binary alignment/map) format, along with a list of known polymorphisms. WASP identifies reads that overlap known polymorphisms, flips the alleles in the reads, and remaps them to the genome. Reads that map to a different location than the original read are then discarded. Finally, WASP can optionally remove reads that map to the same genomic location (‘duplicate reads’) without introducing a reference bias.
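The allele-flipping and filtering steps in this legend can be sketched as follows. This is an illustration under stated assumptions rather than the WASP implementation: `flipped_reads` and `keep_read` are hypothetical names, alignments are assumed to be ungapped (no indels or clipping), and enumerating every allelic combination when a read overlaps several polymorphisms is an assumption that the legend does not spell out. The remapping itself is left to the external mapping tool.

```python
from itertools import product


def flipped_reads(seq, read_start, snps):
    """Return alternative versions of a read with alleles swapped at overlapping SNPs.

    seq: read sequence as aligned to the forward strand.
    read_start: 0-based genome position of the first base (ungapped alignment assumed).
    snps: dict mapping genome position -> (allele1, allele2).
    """
    overlapping = [
        (pos - read_start, alleles)
        for pos, alleles in sorted(snps.items())
        if read_start <= pos < read_start + len(seq)
    ]
    variants = set()
    # Assumption: try every combination of alleles across the overlapping SNPs.
    for combo in product(*(alleles for _, alleles in overlapping)):
        bases = list(seq)
        for (offset, _), allele in zip(overlapping, combo):
            bases[offset] = allele
        variants.add("".join(bases))
    variants.discard(seq)  # the original read has already been mapped
    return sorted(variants)


def keep_read(original_pos, remapped_positions):
    """Keep a read only if every flipped version remaps to the original location.

    remapped_positions: one position per flipped read, None if it failed to map.
    """
    return all(pos == original_pos for pos in remapped_positions)
```

A read that overlaps no known polymorphism yields no flipped versions, so `keep_read` retains it without remapping.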
- Supplementary Figure 5: The WASP combined haplotype test pipeline. (122 KB)
Mapped reads (in BAM or SAM format) for each individual, genotypes for known SNPs, and a list of regions and SNPs to test are provided to WASP. WASP extracts read counts for the target regions as well as allele-specific read counts. Read counts from multiple sources can be used to update heterozygous probabilities. Expected read counts for each region are adjusted through modeling of the relationships between read counts and GC content and between read counts and total read counts for each sample. Dispersion parameters are estimated from the data and provided to the combined haplotype test along with the read counts. Principal components can optionally be used as covariates by the test.
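The read-count extraction step can be illustrated with a short sketch. It covers only the counting described in this legend, not the probability updates, the GC-content and depth adjustments, or the test itself. The names `region_read_count` and `allele_counts` are hypothetical, reads are represented as simple (start, sequence) pairs with ungapped alignments, and regions are taken as half-open intervals; all of these are assumptions made for illustration.

```python
def region_read_count(reads, region_start, region_end):
    """Count reads overlapping the half-open region [region_start, region_end).

    reads: iterable of (start, sequence) with 0-based starts, ungapped alignments.
    """
    return sum(
        1
        for start, seq in reads
        if start < region_end and start + len(seq) > region_start
    )


def allele_counts(reads, snp_pos, ref, alt):
    """Count reads carrying the reference, alternative, or another base at a SNP."""
    counts = {"ref": 0, "alt": 0, "other": 0}
    for start, seq in reads:
        offset = snp_pos - start
        if 0 <= offset < len(seq):  # read covers the SNP position
            base = seq[offset]
            if base == ref:
                counts["ref"] += 1
            elif base == alt:
                counts["alt"] += 1
            else:
                counts["other"] += 1
    return counts
```

For a heterozygous individual, the `ref` and `alt` counts supply the allele-specific signal, while `region_read_count` supplies the total read depth for the target region.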
- Supplementary Text and Figures (3,472 KB)
Supplementary Figures 1–5, Supplementary Table 1 and Supplementary Notes 1–8