We assessed gene expression profiles in 2,752 twins, using a classic twin design to quantify expression heritability and quantitative trait loci (eQTLs) in peripheral blood. The most highly heritable genes (∼777) were grouped into distinct expression clusters, enriched in gene-poor regions, associated with specific gene function or ontology classes, and strongly associated with disease designation. The design enabled a comparison of twin-based heritability to estimates based on dizygotic identity-by-descent sharing and distant genetic relatedness. Consideration of sampling variation suggests that previous heritability estimates have been upwardly biased. Genotyping of 2,494 twins enabled powerful identification of eQTLs, which we further examined in a replication set of 1,895 unrelated subjects. A large number of non-redundant local eQTLs (6,756) met replication criteria, whereas a relatively small number of distant eQTLs (165) met quality control and replication standards. Our results provide a new resource toward understanding the genetic control of transcription.
Hindorff, L.A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA 106, 9362–9367 (2009).
Maurano, M.T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
Hardy, J. Psychiatric genetics: are we there yet? JAMA Psychiatry 70, 569–570 (2013).
Majewski, J. & Pastinen, T. The study of eQTL variations by RNA-seq: from SNPs to phenotypes. Trends Genet. 27, 72–79 (2011).
Cookson, W., Liang, L., Abecasis, G., Moffatt, M. & Lathrop, M. Mapping complex disease traits with global gene expression. Nat. Rev. Genet. 10, 184–194 (2009).
Nicolae, D.L. et al. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 6, e1000888 (2010).
Stranger, B.E. et al. Patterns of cis regulatory variation in diverse human populations. PLoS Genet. 8, e1002639 (2012).
Grundberg, E. et al. Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat. Genet. 44, 1084–1089 (2012).
Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010).
Emilsson, V. et al. Genetics of gene expression and its effect on disease. Nature 452, 423–428 (2008).
de Jong, S. et al. Expression QTL analysis of top loci from GWAS meta-analysis highlights additional schizophrenia candidate genes. Eur. J. Hum. Genet. 20, 1004–1008 (2012).
Fransen, K. et al. Analysis of SNPs with an effect on gene expression identifies UBE2L3 and BCL3 as potential new risk genes for Crohn's disease. Hum. Mol. Genet. 19, 3482–3488 (2010).
Luo, R. et al. Genome-wide transcriptome profiling reveals the functional impact of rare de novo and recurrent CNVs in autism spectrum disorders. Am. J. Hum. Genet. 91, 38–55 (2012).
Speliotes, E.K. et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat. Genet. 42, 937–948 (2010).
Zeller, T. et al. Genetics and beyond—the transcriptome of human monocytes and disease susceptibility. PLoS ONE 5, e10693 (2010).
Gamazon, E.R., Huang, R.S., Cox, N.J. & Dolan, M.E. Chemotherapeutic drug susceptibility associated SNPs are enriched in expression quantitative trait loci. Proc. Natl. Acad. Sci. USA 107, 9287–9292 (2010).
Thurman, R.E. et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82 (2012).
Degner, J.F. et al. DNase I sensitivity QTLs are a major determinant of human expression variation. Nature 482, 390–394 (2012).
Xia, K. et al. seeQTL: a searchable database for human eQTLs. Bioinformatics 28, 451–452 (2012).
Fehrmann, R.S. et al. Trans-eQTLs reveal that independent genetic variants associated with a complex phenotype converge on intermediate genes, with a major role for the HLA. PLoS Genet. 7, e1002197 (2011).
Min, J.L. et al. The use of genome-wide eQTL associations in lymphoblastoid cell lines to identify novel genetic pathways involved in complex traits. PLoS ONE 6, e22070 (2011).
Grundberg, E. et al. Population genomics in a disease targeted primary cell model. Genome Res. 19, 1942–1952 (2009).
Gibbs, J.R. et al. Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. PLoS Genet. 6, e1000952 (2010).
Leek, J.T. et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet. 11, 733–739 (2010).
Akey, J.M., Biswas, S., Leek, J.T. & Storey, J.D. On the design and analysis of gene expression studies in human populations. Nat. Genet. 39, 807–808 (2007).
Innocenti, F. et al. Identification, replication, and functional fine-mapping of expression quantitative trait loci in primary human liver tissue. PLoS Genet. 7, e1002078 (2011).
Fairfax, B.P. et al. Genetics of gene expression in primary immune cells identifies cell type–specific master regulators and roles of HLA alleles. Nat. Genet. 44, 502–510 (2012).
Flutre, T., Wen, X., Pritchard, J. & Stephens, M. A statistical framework for joint eQTL analysis in multiple tissues. PLoS Genet. 9, e1003486 (2013).
Westra, H.J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).
Powell, J.E. et al. Genetic control of gene expression in whole blood and lymphoblastoid cell lines is largely independent. Genome Res. 22, 456–466 (2012).
Choy, E. et al. Genetic analysis of human traits in vitro: drug response and gene expression in lymphoblastoid cell lines. PLoS Genet. 4, e1000287 (2008).
van Dongen, J., Slagboom, P.E., Draisma, H.H., Martin, N.G. & Boomsma, D.I. The continuing value of twin studies in the omics era. Nat. Rev. Genet. 13, 640–653 (2012).
Flicek, P. et al. Ensembl 2013. Nucleic Acids Res. 41, D48–D55 (2013).
Rossin, E.J. et al. Proteins encoded in genomic regions associated with immune-mediated disease physically interact and suggest underlying biology. PLoS Genet. 7, e1001273 (2011).
Huang, W. et al. DAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res. 35, W169–W175 (2007).
Grossman, S.R. et al. A composite of multiple signals distinguishes causal variants in regions of positive selection. Science 327, 883–886 (2010).
Nickel, G.C., Tefft, D. & Adams, M.D. Human PAML browser: a database of positive selection on human genes using phylogenetic methods. Nucleic Acids Res. 36, D800–D808 (2008).
Nielsen, R. et al. A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biol. 3, e170 (2005).
Voight, B.F., Kudaravalli, S., Wen, X. & Pritchard, J.K. A map of recent positive selection in the human genome. PLoS Biol. 4, e72 (2006).
Andrés, A.M. et al. Targets of balancing selection in the human genome. Mol. Biol. Evol. 26, 2755–2764 (2009).
Grossman, S.R. et al. Identifying recent adaptations in large-scale genomic data. Cell 152, 703–713 (2013).
Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476–482 (2011).
Sivakumaran, S. et al. Abundant pleiotropy in human complex diseases and traits. Am. J. Hum. Genet. 89, 607–618 (2011).
McKusick, V.A. Mendelian Inheritance in Man and its online version, OMIM. Am. J. Hum. Genet. 80, 588–604 (2007).
Visscher, P.M., Hill, W.G. & Wray, N.R. Heritability in the genomics era—concepts and misconceptions. Nat. Rev. Genet. 9, 255–266 (2008).
Yang, J., Lee, S.H., Goddard, M.E. & Visscher, P.M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Powell, J.E. et al. Congruence of additive and non-additive effects on gene expression estimated from pedigree and SNP data. PLoS Genet. 9, e1003502 (2013).
Stranger, B.E. et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 315, 848–853 (2007).
Montgomery, S.B. et al. Transcriptome genetics using second generation sequencing in a Caucasian population. Nature 464, 773–777 (2010).
Pickrell, J.K. et al. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464, 768–772 (2010).
Price, A.L. et al. Effects of cis and trans genetic ancestry on gene expression in African Americans. PLoS Genet. 4, e1000294 (2008).
Spielman, R.S. et al. Common genetic variants account for differences in gene expression among ethnic groups. Nat. Genet. 39, 226–231 (2007).
Gatti, D.M., Barry, W.T., Nobel, A.B., Rusyn, I. & Wright, F.A. Heading down the wrong pathway: on the influence of correlation within gene sets. BMC Genomics 11, 574 (2010).
McLaren, W. et al. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26, 2069–2070 (2010).
Sun, W., Ibrahim, J.G. & Zou, F. Genomewide multiple-loci mapping in experimental crosses by iterative adaptive penalized regression. Genetics 185, 349–359 (2010).
Marfil, V. et al. Interaction between Hhex and SOX13 modulates Wnt/TCF activity. J. Biol. Chem. 285, 5726–5737 (2010).
Betancur, C. Etiological heterogeneity in autism spectrum disorders: more than 100 genetic and genomic disorders and still counting. Brain Res. 1380, 42–77 (2011).
Chiurazzi, P., Schwartz, C.E., Gecz, J. & Neri, G. XLMR genes: update 2007. Eur. J. Hum. Genet. 16, 422–434 (2008).
Inlow, J.K. & Restifo, L.L. Molecular and comparative genetics of mental retardation. Genetics 166, 835–881 (2004).
Cooper, G.M. et al. A copy number variation morbidity map of developmental delay. Nat. Genet. 43, 838–846 (2011).
Sullivan, P.F., Daly, M.J. & O'Donovan, M. Genetic architectures of psychiatric disorders: the emerging picture and its implications. Nat. Rev. Genet. 13, 537–551 (2012).
Wise, A.L., Gyi, L. & Manolio, T.A. eXclusion: toward integrating the X chromosome in genome-wide association analyses. Am. J. Hum. Genet. 92, 643–647 (2013).
Xavier, R.J. & Rioux, J.D. Genome-wide association studies: a new window into immune-mediated diseases. Nat. Rev. Immunol. 8, 631–643 (2008).
Hurst, L.D., Pal, C. & Lercher, M.J. The evolutionary dynamics of eukaryotic gene order. Nat. Rev. Genet. 5, 299–310 (2004).
Osborne, C.S. et al. Active genes dynamically colocalize to shared sites of ongoing transcription. Nat. Genet. 36, 1065–1071 (2004).
Sproul, D., Gilbert, N. & Bickmore, W.A. The role of chromatin structure in regulating the expression of clustered genes. Nat. Rev. Genet. 6, 775–781 (2005).
Hentges, K.E., Pollock, D.D., Liu, B. & Justice, M.J. Regional variation in the density of essential genes in mice. PLoS Genet. 3, e72 (2007).
Cai, J.J., Macpherson, J.M., Sella, G. & Petrov, D.A. Pervasive hitchhiking at coding and regulatory sites in humans. PLoS Genet. 5, e1000336 (2009).
Davidson, S., Starkey, A. & MacKenzie, A. Evidence of uneven selective pressure on different subsets of the conserved human genome; implications for the significance of intronic and intergenic DNA. BMC Genomics 10, 614 (2009).
Willemsen, G. et al. The Netherlands Twin Register biobank: a resource for genetic epidemiological studies. Twin Res. Hum. Genet. 13, 231–245 (2010).
Penninx, B.W. et al. The Netherlands Study of Depression and Anxiety (NESDA): rationales, objectives and methods. Int. J. Methods Psychiatr. Res. 17, 121–140 (2008).
Boomsma, D.I. et al. Netherlands Twin Register: from twins to twin families. Twin Res. Hum. Genet. 9, 849–857 (2006).
Boomsma, D.I. et al. Genome-wide association of major depression: description of samples for the GAIN major depressive disorder study: NTR and NESDA Biobank Projects. Eur. J. Hum. Genet. 16, 335–342 (2008).
Sullivan, P.F. et al. Genomewide association for major depressive disorder: a possible role for the presynaptic protein piccolo. Mol. Psychiatry 14, 359–375 (2009).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Altshuler, D.M. et al. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
Schadt, E.E., Woo, S. & Hao, K. Bayesian method to predict individual SNP genotypes from gene expression data. Nat. Genet. 44, 603–608 (2012).
Leek, J.T. & Storey, J.D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 3, 1724–1735 (2007).
Falconer, D.S. & Mackay, T.F.C. Introduction to Quantitative Genetics (Longman Group, Ltd., London, 1996).
Neale, M.C. & Cardon, L.R. Methodology for the Study of Twins and Families (Kluwer Academic Publisher Group, Dordrecht, The Netherlands, 1992).
Wang, X., Guo, X., He, M. & Zhang, H. Statistical inference in mixed models and analysis of twin and family data. Biometrics 67, 987–995 (2011).
Visscher, P.M. et al. Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genet. 2, e41 (2006).
Wright, F.A. The phenotypic difference discards sib-pair QTL linkage information. Am. J. Hum. Genet. 60, 740–742 (1997).
Abecasis, G.R., Cherny, S.S., Cookson, W.O. & Cardon, L.R. Merlin—rapid analysis of dense genetic maps using sparse gene flow trees. Nat. Genet. 30, 97–101 (2002).
Barry, W.T., Nobel, A.B. & Wright, F.A. A statistical framework for testing functional categories in microarray data. Ann. Appl. Stat. 2, 286–315 (2008).
Shabalin, A.A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics 28, 1353–1358 (2012).
The work described in this paper was funded by the US National Institute of Mental Health (RC2 MH089951, principal investigator P.F.S.) as part of the American Recovery and Reinvestment Act of 2009. Transport, extraction and preparation of the NTR samples were carried out under a supplement to the NIMH Center for Collaborative Genomics Research on Mental Disorders (U24 MH068457, principal investigator J.T.). We thank T. Lehner (National Institute of Mental Health) for his support. Additional analytic support was provided by grants R01 MH090936, R01 GM074175 and P42 ES005948 and by a Gillings Innovations Award. The Netherlands Study of Depression and Anxiety (NESDA) and the Netherlands Twin Register (NTR) were funded by the Netherlands Organization for Scientific Research (MagW/ZonMW; grants 904-61-090, 985-10-002, 904-61-193, 480-04-004, 400-05-717 and 912-100-20; Spinozapremie 56-464-14192; and Geestkracht program grant 10-000-1002), the Center for Medical Systems Biology (CMSB2; NWO Genomics), Biobanking and Biomolecular Resources Research Infrastructure (BBMRI-NL), the VU University EMGO+ Institute for Health and Care Research and the Neuroscience Campus Amsterdam, NBIC/BioAssist/RK (2008.024), the European Science Foundation (EU/QLRT-2001-01254), the European Community's Seventh Framework Programme (FP7/2007-2013), ENGAGE (HEALTH-F4-2007-201413) and the European Research Council (ERC; 230374).
P.F.S. was on the scientific advisory board of Expression Analysis. The other authors report no conflicts of interest.
Integrated supplementary information
Supplementary Figure 1 Effects of covariates and mean expression on heritability and shared environmental effects (n = 2,752).
(a) Adjusted R2 for all covariates (not including hybridization plate) for predicting expression levels for each of 43,628 transcripts. (b) Heritability estimates (h2) from the ACE model applied to all transcripts, before and after covariate correction. Negative values are implausible, reflecting sampling variation, but the entire range is shown for illustration, and the estimates are unbiased. The correction generally strengthens the evidence for the most highly significant transcripts. (c) Shared environment (twinship) effect estimates c2, before and after covariate correction. (d) Covariate-corrected observed versus expected right-tailed P values for h2 show a large number of significant transcripts. (e) The same plot for positive c2 shows that none reaches transcriptome-wide significance, nor did any transcript show significant negative c2 evidence (data not shown).
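The h2 and c2 estimates in panels b and c come from the ACE model. As a rough illustration only, the sketch below uses the simpler Falconer approximation (h2 = 2(rMZ − rDZ), c2 = 2rDZ − rMZ) rather than the model fitting used in the study, and it uses a plain Pearson correlation across twin pairs instead of an intraclass correlation. The function names and toy data are hypothetical. It shows how sampling variation can produce a negative h2 when the DZ correlation happens to exceed the MZ correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def falconer_estimates(mz_pairs, dz_pairs):
    """Return (h2, c2, e2) from lists of (twin1, twin2) expression values.

    Falconer approximation, not the ACE model fit used in the paper.
    """
    r_mz = pearson([a for a, _ in mz_pairs], [b for _, b in mz_pairs])
    r_dz = pearson([a for a, _ in dz_pairs], [b for _, b in dz_pairs])
    h2 = 2 * (r_mz - r_dz)   # additive genetic share
    c2 = 2 * r_dz - r_mz     # shared (twinship) environment share
    e2 = 1 - r_mz            # unique environment and measurement error
    return h2, c2, e2


# Toy data (hypothetical): perfectly correlated MZ pairs, noisier DZ pairs.
mz = [(1, 1), (2, 2), (3, 3), (4, 4)]
dz = [(1, 2), (2, 1), (3, 4), (4, 3)]
print(falconer_estimates(mz, dz))

# Swapping the two groups mimics an unlucky sample in which r_DZ > r_MZ,
# which yields a negative, biologically implausible h2 estimate.
print(falconer_estimates(dz, mz))
```

The estimates are not truncated at zero, so they can be averaged across transcripts without bias, in the same spirit as the untruncated range shown in panel b.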
(a) Mean heritability as a function of gene proximity to 3,931 NHGRI GWAS catalog SNPs with GWAS P < 5.0 × 10^−8. For each of the NHGRI catalog SNPs, the closest gene was recorded, then the second closest, and so on, and each gene was designated according to whether it was the kth-closest gene to at least one SNP in the catalog. The mean heritability as a function of these ranks shows that genes with higher proximity rank tend to have higher h2. A simple rank correlation of the two axis values gives P = 0.017. For each k, the number of genes included is ∼1,600. (b) Mean heritability as a function of the number of SNPs that are closest to the gene. For each gene, the number of significant NHGRI catalog SNPs for which the gene is closest was recorded, and the mean heritability was displayed for that group. Using all genes, the rank correlation of h2 versus the number of closest GWAS SNPs gives P = 8.2 × 10^−16. For each point, the number of genes included in the calculation is shown.
Supplementary Figure 3 Contribution of components to expression variation for 9,060 genes with h2 > 0.1 (whether or not declared expressed).
(a) Ratio of r2 (variation explained) by best local SNP to overall (twin-based) h2, with medians and means of ratios, as well as the proportion of h2 explained by the best local SNP. (b–d) Analogous plots and values for the best distant SNP (b), local GCTA estimation (c) and local IBD estimation (d) using DZ twins. Analyses using only expressed genes are similar throughout, due to the restriction h2 > 0.1.
Supplementary Figure 4 Effects of sample size on the reliability of h2 estimates in twin-based designs and comparison with additive-only variance components, applied to 18,392 genes.
(a) From the shrunken ‘true’ estimate of the h2 distribution in NTR, the distribution of estimated h2 was determined, using the twin proportions of the Brisbane Systems Genetics Study (BSGS) and assuming no family effects. These hypothetical h2 estimates from NTR (using the shrunken h2 distribution and computed error variation applicable to the BSGS sample size and analysis approach) very closely match the published BSGS report. BSGS values were obtained by digitizing the plot from the authors' report using WebPlotDigitizer (http://arohatgi.info/WebPlotDigitizer/app/). (b) Rank correlation of estimated h2 versus true h2 for the transcriptome as a function of total sample size, computed using the shrunken ‘true’ estimate of the h2 distribution from NTR and the standard error of a twin-based design and ACE model (with MZ versus DZ proportions the same as in NTR).
Supplementary Figure 5 Comparison of twin-based heritability to alternate sources of heritability information, using the 18,392 best h2 set of unique genes.
(a) Best local SNP r2 versus twin-based h2. The proportion of variance in twin-based h2 explained by the best local SNP in a linear regression model is listed. Each gene was classified as ‘local’ or ‘distant’ according to the smaller of the eQTL P values (regardless of genome-wide significance). (b) Local GCTA r2 versus twin-based h2. (c) DZ local IBD analysis versus twin-based h2. The local IBD analysis is less powerful, as it is based on roughly half of the data. (d) A model using the three predictors in a–c produces only a slight improvement in prediction.
(a) Local eQTL –log10 (q) values for NESDA versus NTR. (b) Inset to highlight less significant local eQTLs. (c) Distant eQTL –log10 (q) values for NESDA versus NTR. The values shown are interchromosomal eQTLs, i.e., the SNP and target gene are on different chromosomes, ensuring that they are truly distant. (d) Inset to highlight less significant distant eQTLs.
The 6,864 SNPs involved in the 6,941 local eQTLs were annotated using Variant Effect Predictor (version 2.8) of Ensembl (see the Supplementary Note). (a) The proportion of local eQTLs located in regulatory regions. The number on the top of each bar is the P value indicating over-representation (red) and under-representation (blue) of the eQTLs located in regulatory regions. (b) The proportion of replicating local eQTLs.
Supplementary Figure 8 P values of SNPs in NTR and NESDA for genes declared significant in Westra et al.
(a) Local eQTL replication P values in NTR. (b) Local eQTL replication P values in NESDA. (c) Distant eQTL replication P values in NTR. (d) Distant eQTL replication P values in NESDA.
The 304 SNPs involved in the 348 distant eQTLs were annotated using Variant Effect Predictor (version 2.8) of Ensembl. Fourteen (26%) of 53 SNPs annotated as intergenic variants were replicated in NESDA, which is significantly lower than the overall replication rate of 47%. There was no significant enrichment or deficiency of replication in other categories.
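The comparison of 14 of 53 replicated intergenic SNPs against the overall 47% replication rate can be checked with simple arithmetic. The caption does not say which test was used, so the one-sided exact binomial test below is an assumption chosen for illustration; the function name is hypothetical.

```python
from math import comb


def binom_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


replicated = 14       # intergenic SNPs replicated in NESDA (from the caption)
total = 53            # SNPs annotated as intergenic variants (from the caption)
overall_rate = 0.47   # overall replication rate (from the caption)

observed_rate = replicated / total
# Probability of seeing 14 or fewer replications if intergenic SNPs
# replicated at the overall rate (assumed null hypothesis).
p_value = binom_lower_tail(replicated, total, overall_rate)

print(f"observed replication rate: {observed_rate:.0%}")
print(f"one-sided exact binomial P value: {p_value:.4f}")
```

A small P value here is consistent with the statement that intergenic variants replicate at a lower rate, although the test the authors actually used may differ.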
For each of the 304 SNPs of distant eQTLs, we assessed its association with local genes (< 1 Mb away), and we then grouped these 304 SNPs by their minimum local eQTL P values. The value above each black bar shows the number of SNPs belonging to each significance grouping. For comparison, we randomly selected 10,000 SNPs from the remaining SNPs, matched to the 304 SNPs by minor allele frequency and imputation quality R2, and then grouped analogously by their minimum local eQTL P values.
A total of 304 SNPs were involved in the 348 distant eQTLs. The SNPs were clustered by genomic position so that a SNP was assigned to a cluster if its distance to any SNP already in the cluster was smaller than 1 Mb. The SNPs grouped into 203 clusters, of which 160 contained only a single SNP. The 43 clusters with more than one SNP spanned 2 kb to 2 Mb, with a median size of 89 kb. The plot shows the number of eQTLs belonging to each cluster. Eleven clusters with more than five eQTLs are highlighted in the inset table. The number of NESDA eQTLs (q value > 0.01) associated with each cluster is also shown.
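The clustering rule described above (a SNP joins a cluster if it lies within 1 Mb of any SNP already in it) is single-linkage clustering along each chromosome. A minimal sketch follows, assuming SNPs are given as (chromosome, position) tuples. The function name and example coordinates are hypothetical and are not taken from the study data.

```python
from collections import defaultdict


def cluster_snps(snps, max_gap=1_000_000):
    """Group (chrom, pos) SNPs into single-linkage clusters.

    On a sorted chromosome, "within max_gap of any SNP in the cluster" is
    equivalent to "within max_gap of the previous SNP", so one pass suffices.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)

    clusters = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] < max_gap:
                current.append(pos)      # chain onto the open cluster
            else:
                clusters.append([(chrom, p) for p in current])
                current = [pos]          # gap too large: start a new cluster
        clusters.append([(chrom, p) for p in current])
    return clusters


# Hypothetical coordinates for illustration only.
example = [
    ("19", 100),
    ("19", 500_000),
    ("19", 1_400_000),   # within 1 Mb of 500,000, so it chains on
    ("19", 3_000_000),
    ("20", 100),
]
clusters = cluster_snps(example)
singletons = [c for c in clusters if len(c) == 1]
print(len(clusters), "clusters,", len(singletons), "with a single SNP")
```

Because membership chains through neighboring SNPs, a cluster can span more than 1 Mb, which is consistent with the reported cluster sizes of up to 2 Mb.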
(a) A distant eQTL on chromosome 19 is associated with the expression of 12 distant genes and a local gene, MYO1F. The network plot shows the partial correlation graph of these 13 genes, where an edge indicates a nonzero partial correlation. The partial correlations were estimated by the penalized estimation method cited in the Supplementary Note. (b) A distant eQTL on chromosome 20 is associated with the expression of six distant genes and a local gene, SMOX. Using a likelihood ratio test approach (described in the Supplementary Note) and independent genotype and gene expression data from NESDA (to avoid winner's curse), we found that the causal relation eSNP → SMOX → distant eQTL gene is significantly more likely than other possible relations for these six distant eQTL genes. (c) Distant eQTLs on six other chromosomes.
About this article
Cite this article
Wright, F., Sullivan, P., Brooks, A. et al. Heritability and genomics of gene expression in peripheral blood. Nat Genet 46, 430–437 (2014). https://doi.org/10.1038/ng.2951