Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project—the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.
The Geuvadis RNA-sequencing data, genotype data, variant annotations, splice scores, quantifications, and QTL results are freely and openly available with no restrictions. The main portal for accessing the data is EBI ArrayExpress, under accessions E-GEUV-1, E-GEUV-2 and E-GEUV-3 (see the data access schema in Supplementary Fig. 39). For visualization of the results we created the Geuvadis Data Browser (http://www.ebi.ac.uk/Tools/geuvadis-das) where quantifications and QTLs can be viewed, searched and downloaded (Supplementary Fig. 40). The project webpage (http://www.geuvadis.org) provides full documentation and links to all files, and the analysis group wiki is open to the public (http://geuvadiswiki.crg.es).
We would like to thank E. Falconnet, L. Romano, A. Planchon, D. Bielsen, A. Yurovsky, A. Buil, J. Bryois, A. Nica, I. Topolsky, N. Fusi, S. Waszak, C. Bustamante, J. Rung, N. Kolesnikov, A. Roa, E. Bragin, S. Brent, J. Gonzalez, M. Morell, A. Puig, E. Palumbo, M. Ventayol Garcia, J. F. J. Laros, J. Blanc, R. Birkelund, G. Plaja, M. Ingham, J. Camps, M. Bayes, L. Agueda, A. Gouin, M.-L. Yaspo, E. Graf, A. Walther, C. Fischer, S. Loesecke, B. Schmick, D. Balzereit, S. Dökel, M. Linser, A. Kovacsovics, M. Friskovec, C. von der Lancken, M. Schlapkohl, A. Hellmann, M. Schilhabel, the SNP&SEQ Technology Platform in Uppsala, S. Sauer, the Vital-IT high-performance computing centre of the SIB Swiss Institute of Bioinformatics, B. Goldstein and others at the Coriell Institute, and J. Cooper, E. Burnett, K. Ball and others at the European Collection of Cell Cultures (ECACC) and the 1000 Genomes Consortium. This project was funded by the European Commission 7th Framework Program (FP7) (261123; GEUVADIS); the Swiss National Science Foundation (130326, 130342), the Louis Jeantet Foundation, and ERC (260927) (E.T.D.); NIH-NIMH (MH090941) (E.T.D., M.I.M., R.G.); Spanish Plan Nacional SAF2008-00357 (NOVADIS), the Generalitat de Catalunya AGAUR 2009 SGR-1502, and the Instituto de Salud Carlos III (FIS/FEDER PI11/00733) (X.E.); Spanish Plan Nacional (BIO2011-26205) and ERC (294653) (R.G.); ESGI, READNA (FP7 Health-F4-2008-201418), Spanish Ministry of Economy and Competitiveness (MINECO) and the Generalitat de Catalunya (I.G.G.); DFG Cluster of Excellence Inflammation at Interfaces, the INTERREG4A project HIT-ID, and the BMBF IHEC project DEEP SP 2.3 (P.Ro.); German Centre for Cardiovascular Research (DZHK) and the German Ministry of Education and Research (01GR0802, 01GM0867, 01GR0804, 16EX1020C) (T.M.); EurocanPlatform (FP7 260791), ENGAGE and CAGEKID (241669) (A.B.); FP7/2007-2013, ENGAGE project, HEALTH-F4-2007-201413, and the Centre for Medical Systems Biology within the framework of The Netherlands Genomics Initiative (NGI)/Netherlands Organisation for Scientific Research (NWO) (P.AC.H and G.-J.v.O.); The Swedish Research Council (C0524801, A028001) and the Knut and Alice Wallenberg Foundation (2011.0073) (A.-C.S.); The Swiss National Science Foundation (127375, 144082) and ERC (249968) (S.E.A.); Instituto de Salud Carlos III (FIS/FEDER PS09/02368) (A.C.); German Federal Ministry of Education and Research (01GS08201) (R.S.); Max Planck Society (H.L.); Wellcome Trust (WT085532) and the European Molecular Biology Laboratory (P.F.); ENGAGE, Wellcome Trust (081917, 090367, 090532, 098381), and Medical Research Council UK (G0601261) (M.I.M.); Wellcome Trust Centre for Human Genetics (090532/Z/09/Z, 075491/Z/04/B), Wellcome Trust (098381, 090367, 076113, 083270), the WTCCC2 project (085475/B/08/Z, 085475/Z/08/Z), Royal Society Wolfson Merit Award, Wellcome Trust Senior Investigator Award (095552/Z/11/Z) (P.D.); EMBO long-term fellowship EMBO-ALTF 2010-337 (H.K.); NIH-NIGMS (R01 GM104371) (D.G.M.); Marie Curie FP7 fellowship (O.S.); Scholarship by the Clarendon Fund of the University of Oxford, and the Nuffield Department of Medicine (M.A.R.); EMBO long-term fellowship ALTF 225-2011 (M.R.F.); Emil Aaltonen Foundation and Academy of Finland fellowships (T.L.).
This file contains Supplementary Table 4, showing miRNA-mRNA correlations, and Supplementary Table 5, showing the top eQTL variants for 91 GWAS SNPs.