Detection of quantitative trait loci from RNA-seq data with or without genotypes using BaseQTL

A preprint version of the article is available at bioRxiv.

Abstract

Detecting genetic variants associated with traits (quantitative trait loci, QTL) requires genotyped study individuals. Here we describe BaseQTL, a Bayesian method that exploits allele-specific expression to map molecular QTL from sequencing reads (eQTL for gene expression) even when no genotypes are available. When used with genotypes to map eQTL, BaseQTL has lower error rates and increased power compared with existing QTL mapping methods. Running without genotypes limits how many tests can be performed, but because QTL variants tend to lie close to gene bodies, the 2.8% of variants within a 100 kb window that could be tested contained 26% of eQTL detectable with genotypes. eQTL effect estimates were invariably consistent between analyses performed with and without genotypes. Differential expression studies often generate sequencing data on patients and controls without genotypes; in one such dataset we identified an apparent psoriasis-specific eQTL for GSTP1, providing new insights into disease-dependent gene regulation.

Fig. 1: Schematic representation of BaseQTL.
Fig. 2: Reference mapping bias correction.
Fig. 3: Benchmarking BaseQTL with observed genotypes.
Fig. 4: eQTL effects estimated with BaseQTL with observed or hidden genotypes.
Fig. 5: eQTLs in skin.
Fig. 6: Disentangling condition-specific eQTLs.

Data availability

Geuvadis samples were accessed from E-GEUV-1, ftp://ftp.sra.ebi.ac.uk/vol1/fastq, on 16 April 2017 or 23 January 2018 as indicated in Supplementary Table 1. Psoriasis and normal skin samples were accessed from E-GEOD-54456, ftp://ftp.sra.ebi.ac.uk/vol1/fastq, on 2 November 2018. GTEx associations for skin, blood and lymphoblastic cell lines corresponding to Analysis V7 were downloaded from https://gtexportal.org/home/datasets on 21 June 2019. Differentially regulated genes between psoriasis and normal skin were downloaded from https://ars.els-cdn.com/content/image/1-s2.0-S0022202X15368834-mmc2.xls on 21 November 2018.

We downloaded RNA-seq data from 86 Geuvadis samples with EUR ancestry (GBR code) from ArrayExpress (E-GEUV-1, Supplementary Table 1). We also analyzed 94 and 90 RNA-seq normal and psoriasis skin samples (ref. 13) obtained from ArrayExpress (E-GEOD-54456). For the analysis of psoriasis eQTL we selected 51 upregulated genes in psoriasis versus normal skin (P ≤ 10⁻⁶, corresponding to family-wise error rate <0.025) and with a median expression of at least 500 RPKM in psoriasis samples (data extracted from https://ars.els-cdn.com/content/image/1-s2.0-S0022202X15368834-mmc2.xls; ref. 13), and/or within 100 kb of a psoriasis GWAS hit (ref. 25; 380 genes). Datasets to reproduce figures in this paper were uploaded to Zenodo (ref. 41).
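
The gene-selection rule described above can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the authors' code: the `Gene` record, its field names and the toy values are hypothetical, and reading "and/or" as the union of the two criteria is our interpretation of the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds quoted in the text above.
P_MAX = 1e-6               # differential-expression P value cut-off
MIN_MEDIAN_RPKM = 500.0    # median expression in psoriasis samples
GWAS_WINDOW_BP = 100_000   # 100 kb around a psoriasis GWAS hit


@dataclass
class Gene:
    """Hypothetical record; field names are illustrative only."""
    name: str
    p_value: float                   # psoriasis versus normal skin
    upregulated: bool                # True if expression is higher in psoriasis
    median_rpkm_psoriasis: float
    gwas_distance_bp: Optional[int]  # distance to nearest GWAS hit; None if unknown


def is_upregulated_and_expressed(g: Gene) -> bool:
    # Criterion 1: significantly upregulated and highly expressed in psoriasis.
    return (g.upregulated
            and g.p_value <= P_MAX
            and g.median_rpkm_psoriasis >= MIN_MEDIAN_RPKM)


def is_near_gwas_hit(g: Gene) -> bool:
    # Criterion 2: within the window around a psoriasis GWAS hit.
    return g.gwas_distance_bp is not None and g.gwas_distance_bp <= GWAS_WINDOW_BP


def select_genes(genes: List[Gene]) -> List[Gene]:
    # Assumption: "and/or" means a gene qualifies if either criterion holds.
    return [g for g in genes
            if is_upregulated_and_expressed(g) or is_near_gwas_hit(g)]


# Toy data with made-up values, for illustration only.
toy = [
    Gene("GENE_A", 1e-8, True, 800.0, None),      # passes criterion 1
    Gene("GENE_B", 1e-3, True, 900.0, 50_000),    # passes criterion 2
    Gene("GENE_C", 1e-8, False, 900.0, None),     # downregulated, no GWAS hit
    Gene("GENE_D", 1e-7, True, 100.0, 200_000),   # low expression, too far
]
selected = [g.name for g in select_genes(toy)]
print(selected)
```

Keeping the two criteria as separate predicates makes it easy to report how many genes enter through each route, which is how the text quotes them (51 and 380 genes).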

Code availability

The source code and documentation for BaseQTL are available at https://gitlab.com/evigorito/baseqtl. We also provide a pipeline to process RNA fastq files and genotypes, if available, to prepare for running BaseQTL at https://gitlab.com/evigorito/baseqtl_pipeline (Supplementary Fig. 12 and Supplementary Section 3). The code to reproduce the figures is available at https://gitlab.com/evigorito/baseqtl_paper. The three repositories have been uploaded to Zenodo (ref. 41).

References

  1. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).

  2. Chun, S. et al. Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat. Genet. 49, 600–605 (2017).

  3. Guo, H. et al. Integration of disease association and eQTL data using a Bayesian colocalisation approach highlights six candidate causal genes in immune-mediated diseases. Hum. Mol. Genet. 24, 3305–3313 (2015).

  4. Huang, H. et al. Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature 547, 173–178 (2017).

  5. Zhernakova, D. V. et al. Identification of context-dependent expression quantitative trait loci in whole blood. Nat. Genet. 49, 139–145 (2017).

  6. Fairfax, B. P. et al. Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression. Science 343, 1246949 (2014).

  7. Gamazon, E. R. et al. Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation. Nat. Genet. 50, 956–967 (2018).

  8. Wall, J. D. et al. Estimating genotype error rates from high-coverage next-generation sequence data. Genome Res. 24, 1734–1739 (2014).

  9. Peters, J. E. et al. Insight into genotype-phenotype associations through eQTL mapping in multiple cell types in health and immune-mediated disease. PLoS Genet. 12, e1005908 (2016).

  10. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

  11. Carpenter, B. et al. Stan: a probabilistic programming language. J. Stat. Softw. 76, 1–32 (2017).

  12. Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).

  13. Li, B. et al. Transcriptome analysis of psoriasis in a large case-control sample: RNA-seq provides insights into disease mechanisms. J. Invest. Dermatol. 134, 1828–1838 (2014).

  14. Kumasaka, N., Knights, A. J. & Gaffney, D. J. Fine-mapping cellular QTLs with RASQUAL and ATAC-seq. Nat. Genet. 48, 206–213 (2016).

  15. van de Geijn, B., McVicker, G., Gilad, Y. & Pritchard, J. K. WASP: allele-specific software for robust molecular quantitative trait locus discovery. Nat. Methods 12, 1061–1063 (2015).

  16. Sun, W. A statistical framework for eQTL mapping using RNA-seq data. Biometrics 68, 1–11 (2012).

  17. Hu, Y.-J., Sun, W., Tzeng, J.-Y. & Perou, C. M. Proper use of allele-specific expression improves statistical power for cis-eQTL mapping with RNA-seq data. J. Am. Stat. Assoc. 110, 962–974 (2015).

  18. Howie, B., Marchini, J. & Stephens, M. Genotype imputation with thousands of genomes. G3 1, 457–470 (2011).

  19. Degner, J. F. et al. Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics 25, 3207–3212 (2009).

  20. Liu, Z. et al. Comparing computational methods for identification of allele-specific expression based on next generation sequencing data. Genet. Epidemiol. 38, 591–598 (2014).

  21. Castel, S. E., Levy-Moonshine, A., Mohammadi, P., Banks, E. & Lappalainen, T. Tools and best practices for data processing in allelic expression analysis. Genome Biol. 16, 195 (2015).

  22. Stranger, B. E. et al. Population genomics of human gene expression. Nat. Genet. 39, 1217–1224 (2007).

  23. Brown, A. A. et al. Predicting causal variants affecting expression by using whole-genome sequencing and RNA-seq from multiple human tissues. Nat. Genet. 49, 1747+ (2017).

  24. Võsa, U. et al. Unraveling the polygenic architecture of complex traits using blood eQTL meta-analysis. Preprint at bioRxiv https://doi.org/10.1101/447367 (2018).

  25. Tsoi, L. C. et al. Large scale meta-analysis characterizes genetic architecture for common psoriasis associated variants. Nat. Commun. 8, 15382 (2017).

  26. Aguet, F. et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).

  27. Ding, J. et al. Gene expression in skin and lymphoblastoid cells: refined statistical method reveals extensive overlap in cis-eQTL signals. Am. J. Hum. Genet. 87, 779–789 (2010).

  28. Gudjonsson, J. E. et al. Assessment of the psoriatic transcriptome in a large sample: additional regulated genes and comparisons with in vitro models. J. Invest. Dermatol. 130, 1829–1840 (2010).

  29. Schalkwijk, J., Chang, A., Janssen, P., De Jongh, G. J. & Mier, P. D. Skin-derived antileucoproteases (SKALPs): characterization of two new elastase inhibitors from psoriatic epidermis. Br. J. Dermatol. 122, 631–641 (1990).

  30. Chen, L. et al. Genetic drivers of epigenetic and transcriptional variation in human immune cells. Cell 167, 1398–1414 (2016).

  31. Joehanes, R. et al. Integrated genome-wide analysis of expression quantitative trait loci aids interpretation of genomic association studies. Genome Biol. 18, 16 (2017).

  32. Nestle, F. O., Kaplan, D. H. & Barker, J. Psoriasis. N. Engl. J. Med. 361, 496–509 (2009).

  33. Urbut, S. M., Wang, G., Carbonetto, P. & Stephens, M. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nat. Genet. 51, 187–195 (2019).

  34. Schmieder, R. & Edwards, R. Quality control and preprocessing of metagenomic datasets. Bioinformatics 27, 863–864 (2011).

  35. Dobin, A. & Gingeras, T. R. Mapping RNA-seq reads with STAR. Curr. Protoc. Bioinformatics 51, 11.14.1–11.14.19 (2015).

  36. Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).

  37. Castel, S. E., Mohammadi, P., Chung, W. K., Shen, Y. & Lappalainen, T. Rare variant phasing and haplotypic expression from RNA sequencing with phASER. Nat. Commun. 7, 12817 (2016).

  38. Delaneau, O., Marchini, J. & Zagury, J.-F. A linear complexity phasing method for thousands of genomes. Nat. Methods 9, 179–181 (2011).

  39. Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nat. Rev. Genet. 11, 499–511 (2010).

  40. Muller, P., Parmigiani, G. & Rice, K. FDR and Bayesian Multiple Comparisons Rules Working Paper (Johns Hopkins University, Department of Biostatistics, 2006).

  41. Vigorito, E. et al. Dataset to reproduce BaseQTL figures. Zenodo https://doi.org/10.5281/zenodo.4759202 (2021).

Acknowledgements

This work was co-funded by the Wellcome Trust (WT107881), the MRC (MC_UU_00002/2, MC_UU_00002/4, MC_UU_00002/13, MR/R013926/1 (to the CLUSTER Consortium)) and the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014). S.R.W. was supported by the NIHR Cambridge Biomedical Research Centre. The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care. This research was funded in whole, or in part, by the Wellcome Trust (WT107881). For the purpose of open access, the author has applied a CC BY public copyright licence to any author accepted manuscript version arising from this submission.

Author information

Contributions

C.W. conceived of the project. E.V., C.W. and S.R.W. developed the model. E.V. wrote the software and performed analyses. W.-Y.L. and C.S. performed analyses and implemented the software. P.D.W.K. and S.R.W. contributed to the design of statistical analysis. E.V. and C.W. wrote the manuscript with input from all authors. C.W. directed the project.

Corresponding authors

Correspondence to Elena Vigorito or Chris Wallace.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Computational Science thanks Eric Gamazon, Wei Sun and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Handling editor: Ananya Rastogi, in collaboration with the Nature Computational Science team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Tables 1–17, Figs. 1–21, Sections 1–4 and References.

Supplementary Data 1

Geuvadis samples used in this study.

Supplementary Data 2

eQTL estimates for psoriasis and normal skin running individual models.

Supplementary Data 3

eQTL estimates for psoriasis and normal skin running a joint model.

About this article

Cite this article

Vigorito, E., Lin, WY., Starr, C. et al. Detection of quantitative trait loci from RNA-seq data with or without genotypes using BaseQTL. Nat Comput Sci 1, 421–432 (2021). https://doi.org/10.1038/s43588-021-00087-y

  • DOI: https://doi.org/10.1038/s43588-021-00087-y
