Skip to main content

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

Identification of rare and common regulatory variants in pluripotent cells using population-scale transcriptomics

Abstract

Induced pluripotent stem cells (iPSCs) are an established cellular system to study the impact of genetic variants in derived cell types and developmental contexts. However, in their pluripotent state, the disease impact of genetic variants is less well known. Here, we integrate data from 1,367 human iPSC lines to comprehensively map common and rare regulatory variants in human pluripotent cells. Using this population-scale resource, we report hundreds of new colocalization events for human traits specific to iPSCs, and find increased power to identify rare regulatory variants compared with somatic tissues. Finally, we demonstrate how iPSCs enable the identification of causal genes for rare diseases.

This is a preview of subscription content

Access options

Buy article

Get time limited or full article access on ReadCube.

$32.00

All prices are NET prices.

Fig. 1: Map of cis genetic regulation in human induced pluripotent cells.
Fig. 2: Linking rare variants to gene expression outliers.
Fig. 3: Colocalization of disease and trait variants with iPSC eQTLs.
Fig. 4: Integration of common-variant colocalization analyses with outlier-associated rare variants.

Data availability

All data used in the study are available from the Sequence Read Archive, dbGaP or the European Nucleotide Archive; the full data availability is provided in Supplementary Table 1. Supplementary Table 2 provides a description of the samples used in the study. Full summary statistics on significant eQTLs can be obtained from Zenodo (https://doi.org/10.5281/zenodo.4005576). The colocalization results are accessible from the LocusCompare portal (http://locuscompare.com/). We used eQTL summary statistics from GTEx (v7; https://gtexportal.org/home/datasets/) and the BIOS cohort (https://genenetwork.nl/biosqtlbrowser/). We further used GWAS summary statistics from the NHGRI-EBI GWAS catalog (Ensembl v92; https://www.ebi.ac.uk/gwas/), Phenome scanner (v2; http://www.phenoscanner.medschl.cam.ac.uk/) and GWAS studies aggregated in the LocusCompare study. We downloaded the GTEx colocalization results via LocusCompare. Other external data sources are referenced in the Methods and the main text. Source data are provided with this paper.

Code availability

Code produced for the following analyses is available on GitHub: eQTL mapping (https://github.com/PMBio/hipsci_pipeline/tree/master/limix_QTL_pipeline/), gene expression outlier analysis (https://github.com/csmail/i2qtl_outlier/) and colocalization analysis (https://github.com/mikegloudemans/ipsc-coloc).

References

  1. Westra, H.-J. et al. Systematic identification of trans-eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  2. Bonder, M. J. et al. Disease variants alter transcription factor levels and methylation of their binding sites. Nat. Genet. 49, 131–138 (2017).

    CAS  PubMed  Article  Google Scholar 

  3. Zhernakova, D. V. et al. Identification of context-dependent expression quantitative trait loci in whole blood. Nat. Genet. 49, 139–145 (2017).

    CAS  PubMed  Article  Google Scholar 

  4. GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).

    PubMed Central  Article  Google Scholar 

  5. Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  6. Alasoo, K. et al. Shared genetic effects on chromatin and gene expression indicate a role for enhancer priming in immune response. Nat. Genet. 50, 424–431 (2018).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  7. Schwartzentruber, J. et al. Molecular and functional variation in iPSC-derived sensory neurons. Nat. Genet. 50, 54–61 (2018).

    CAS  PubMed  Article  Google Scholar 

  8. Cuomo, A. S. E. et al. Single-cell RNA sequencing of differentiating iPS cells reveals dynamic genetic effects on gene expression. Nat. Commun. 11, 810 (2020).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  9. Jerber, J. et al. Population-scale single-cell RNA-seq profiling across dopaminergic neuron differentiation. Nat. Genet. https://doi.org/10.1038/s41588-021-00801-6 (2021).

  10. Sun, N. et al. Patient-specific induced pluripotent stem cells as a model for familial dilated cardiomyopathy. Sci. Transl. Med. 4, 130ra47 (2012).

    PubMed  PubMed Central  Article  Google Scholar 

  11. Lan, F. et al. Abnormal calcium handling properties underlie familial hypertrophic cardiomyopathy pathology in patient-specific induced pluripotent stem cells. Cell Stem Cell 12, 101–113 (2013).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  12. Lee, J. et al. Activation of PDGF pathway links LMNA mutation to dilated cardiomyopathy. Nature 572, 335–340 (2019).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  13. Kodo, K. et al. iPSC-derived cardiomyocytes reveal abnormal TGF-β signalling in left ventricular non-compaction cardiomyopathy. Nat. Cell Biol. 18, 1031–1042 (2016).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  14. Wu, H. et al. Modelling diastolic dysfunction in induced pluripotent stem cell-derived cardiomyocytes from hypertrophic cardiomyopathy patients. Eur. Heart J. 40, 3685–3695 (2019).

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  15. Dubois, N. C. et al. SIRPA is a specific cell-surface marker for isolating cardiomyocytes derived from human pluripotent stem cells. Nat. Biotechnol. 29, 1011–1018 (2011).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  16. Sterneckert, J. L., Reinhardt, P. & Schöler, H. R. Investigating human disease using stem cell models. Nat. Rev. Genet. 15, 625–639 (2014).

    CAS  PubMed  Article  Google Scholar 

  17. Kilpinen, H. et al. Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546, 370–375 (2017).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  18. Panopoulos, A. D. et al. iPSCORE: a resource of 222 iPSC lines enabling functional characterization of genetic variation across a variety of cell types. Stem Cell Rep. 8, 1086–1100 (2017).

    CAS  Article  Google Scholar 

  19. Pashos, E. E. et al. Large, diverse population cohorts of hiPSCs and derived hepatocyte-like cells reveal functional genetic variation at blood lipid-associated loci. Cell Stem Cell 20, 558–570 (2017).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  20. Banovich, N. E. et al. Impact of regulatory variation across human iPSCs and differentiated cells. Genome Res. 28, 122–131 (2018).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  21. Carcamo-Orive, I. et al. Analysis of transcriptional variability in a large human iPSC library reveals genetic and non-genetic determinants of heterogeneity. Cell Stem Cell 20, 518–532.e9 (2017).

    CAS  PubMed  Article  Google Scholar 

  22. Frésard, L. et al. Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts. Nat. Med. 25, 911–919 (2019).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  23. Li, X. et al. The impact of rare variation on gene expression across tissues. Nature 550, 239–243 (2017).

    PubMed  PubMed Central  Article  Google Scholar 

  24. Choi, J. et al. A comparison of genetically matched cell lines reveals the equivalence of human iPSCs and ESCs. Nat. Biotechnol. 33, 1173–1181 (2015).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  25. Thomas, S. M. et al. Reprogramming LCLs to iPSCs results in recovery of donor-specific gene expression signature. PLoS Genet. 11, e1005216 (2015).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  26. Donovan, M. K. R., D’Antonio-Chronowska, A., D’Antonio, M. & Frazer, K. A. Cellular deconvolution of GTEx tissues powers eQTL studies to discover thousands of novel disease and cell-type associated regulatory variants. Nat. Commun. 11, 955 (2020).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  27. Forbes, S. A. et al. COSMIC: mining complete cancer genomes in the catalogue of somatic mutations in cancer. Nucleic Acids Res. 39, D945–D950 (2010).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  28. Gerrard, D. T. et al. An integrative transcriptomic atlas of organogenesis in human embryos. eLife 5, e15657 (2016).

    PubMed  PubMed Central  Article  Google Scholar 

  29. Urbut, S. M., Wang, G., Carbonetto, P. & Stephens, M. Flexible statistical methods for estimating and testing effects in genomic studies with multiple conditions. Nat. Genet. 51, 187–195 (2019).

    CAS  PubMed  Article  Google Scholar 

  30. Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).

    CAS  PubMed  Article  Google Scholar 

  31. Jakubosky, D. et al. Discovery and quality analysis of a comprehensive set of structural variants and short tandem repeats. Nat. Commun. 11, 2928 (2020).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  32. Zhao, J. et al. A burden of rare variants associated with extremes of gene expression in human peripheral blood. Am. J. Hum. Genet. 98, 299–309 (2016).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  33. Li, X. et al. Transcriptome sequencing of a large human family identifies the impact of rare noncoding variants. Am. J. Hum. Genet. 95, 245–256 (2014).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  34. Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7, 500–507 (2012).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  35. Ferraro, N. M. et al. Transcriptomic signatures across human tissues identify functional rare genetic variation. Science 369, eaaz5900 (2020).

  36. Cummings, B. B. et al. Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Sci. Transl. Med. 9, eaal5209 (2017).

  37. Kremer, L. S. et al. Genetic diagnosis of Mendelian disorders via RNA sequencing. Nat. Commun. 8, 15824 (2017).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  38. Kernohan, K. D. et al. Whole-transcriptome sequencing in blood provides a diagnosis of spinal muscular atrophy with progressive myoclonic epilepsy. Hum. Mutat. 38, 611–614 (2017).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  39. McKusick, V. A. Mendelian Inheritance in Man: a Catalog of Human Genes and Genetic Disorders (JHU Press, 1998).

  40. Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  41. Hormozdiari, F. et al. Colocalization of GWAS and eQTL signals detects target genes. Am. J. Hum. Genet. 99, 1245–1260 (2016).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  42. Kamat, M. A. et al. PhenoScanner V2: an expanded tool for searching human genotype–phenotype associations. Bioinformatics https://doi.org/10.1093/bioinformatics/btz469 (2019).

  43. Liu, B., Gloudemans, M. J., Rao, A. S., Ingelsson, E. & Montgomery, S. B. Abundant associations with gene expression complicate GWAS follow-up. Nat. Genet. 51, 768–769 (2019).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  44. van der Harst, P. & Verweij, N. Identification of 64 novel genetic loci provides an expanded view on the genetic architecture of coronary artery disease. Circ. Res. 122, 433–443 (2018).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  45. Liu, J. Z. et al. Dense fine-mapping study identifies new susceptibility loci for primary biliary cirrhosis. Nat. Genet. 44, 1137–1141 (2012).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  46. Teslovich, T. M. et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713 (2010).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  47. Pongor, L. et al. A genome-wide approach to link genotype to clinical outcome by utilizing next-generation sequencing and gene chip data of 6,697 breast cancer patients. Genome Med. 7, 104 (2015).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  48. Yengo, L. et al. Meta-analysis of genome-wide association studies for height and body mass index in 700000 individuals of European ancestry. Hum. Mol. Genet. 27, 3641–3649 (2018).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  49. Sanchez, E. et al. POLR1B and neural crest cell anomalies in Treacher Collins syndrome type 4. Genet. Med. 22, 547–556 (2020).

    PubMed  Article  Google Scholar 

  50. Howson, J. M. M. et al. Fifteen new risk loci for coronary artery disease highlight arterial-wall-specific mechanisms. Nat. Genet. 49, 1113–1119 (2017).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  51. Pankratz, N. et al. Meta-analysis of Parkinson’s disease: identification of a novel locus, RIT2. Ann. Neurol. 71, 370–384 (2012).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  52. Lambert, J. C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet. 45, 1452–1458 (2013).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  53. Scott, R. A. et al. An expanded genome-wide association study of type 2. Diabetes 66, 2888–2902 (2017).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  54. Webb, G. J., Siminovitch, K. A. & Hirschfield, G. M. The immunogenetics of primary biliary cirrhosis: a comprehensive review. J. Autoimmun. 64, 42–52 (2015).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  55. 1000 Genomes Project Consortiumet al. A global reference for human genetic variation. Nature 526, 68–74 (2015).

    Article  CAS  Google Scholar 

  56. Streeter, I. et al. The human-induced pluripotent stem cell initiative-data resources for cellular genetics. Nucleic Acids Res. 45, D691–D697 (2017).

    CAS  PubMed  Article  Google Scholar 

  57. D’Antonio, M. et al. Insights into the mutational burden of human induced pluripotent stem cells from an integrative multi-omics approach. Cell Rep. 24, 883–894 (2018).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  58. DeBoever, C. et al. Large-scale profiling reveals the influence of genetic variation on gene expression in human induced pluripotent stem cells. Cell Stem Cell 20, 533–546.e7 (2017).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  59. Knowles, J. W., Hao, K., Xie, W., Weedon, M. N. & Zhang, Z. Genetic and functional analyses identify NAT2 as a human insulin sensitivity gene. Circulation 128, A10906 (2013).

  60. Casale, F. P., Rakitsch, B., Lippert, C. & Stegle, O. Efficient set tests for the genetic analysis of correlated traits. Nat. Methods 12, 755–758 (2015).

    CAS  PubMed  Article  Google Scholar 

  61. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  62. Ongen, H., Buil, A., Brown, A. A., Dermitzakis, E. T. & Delaneau, O. Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics 32, 1479–1485 (2016).

    CAS  PubMed  Article  Google Scholar 

  63. Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  64. Saha, A. & Battle, A. False positives in trans-eQTL and coexpression analyses arising from RNA-sequencing alignment errors. F1000Res. 7, 1860 (2018).

    CAS  PubMed  Article  Google Scholar 

  65. Pedersen, B. S., Layer, R. M. & Quinlan, A. R. Vcfanno: fast, flexible annotation of genetic variants. Genome Biol. 17, 118 (2016).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  66. Hall, C. L. et al. Frequency of genetic variants associated with arrhythmogenic right ventricular cardiomyopathy in the genome aggregation database. Eur. J. Hum. Genet. 26, 1312–1318 (2018).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  67. Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J. & Kircher, M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894 (2019).

    CAS  Article  PubMed  Google Scholar 

  68. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 1000, 2078–2079 (2009).

    Article  CAS  Google Scholar 

  69. Churchhouse, C & Neale, B. Rapid GWAS of thousands of phenotypes for 337,000 samples in the UK Biobank. https://www.nealelab.is/blog/2017/7/19/rapid-gwas-of-thousands-of-phenotypes-for-337000-samples-in-the-uk-biobank/ (2017).

Download references

Acknowledgements

The authors thank the HipSci, iPSCORE, GENSIPS and PhLiPS cohorts for their input into the study and (early) access to the data. This work was supported by: a strategic award from the Wellcome Trust and UK Medical Research Council (WT098503), the EMBL Interdisciplinary Postdoc (EI3POD) program under Marie Skłodowska-Curie Actions COFUND (grant no. 664726 to M.J.B. and D.S.); National Institutes of Health (NIH; T32 LM012409 to C.S.; T15 LM01127 to D.J.; U01 HL107388-01, P30DK116074 and SPO 130829 to I.C.-O.; HL107442, DK105541, DK112155 to K.A.F.); Stanford Graduate Fellowship (to M.J.G.); National Natural Science Foundation of China (31970554 to X.L.); National Key R&D Program of China (2019YFC1315804 to X.L.); Shanghai Municipal Science and Technology Major Project (2017SHZDZX01 to X.L.); EBI–Sanger Postdoctoral Fellowship (to N.C.); National Science Foundation Graduate Research Fellowship (to N.M.F.); and California Institute for Regenerative Medicine (GC1R-06673 to K.A.F.). Research in the Stegle laboratory is supported by the BMBF, the Volkswagen Foundation and the European Union (ERC project DECODE). S.B.M. is supported by NIH grants U01HG009431, R01HL142015, R01HG008150, R01AG066490 and U01HG009080. Research in the Knowles laboratory is supported by NIH grants DK116750, DK120565, DK106236, DK107437 and HL107388. We thank the staff in the Cellular Genetics and Phenotyping and Sequencing core facilities at the Wellcome Trust Sanger Institute. Work at the Wellcome Trust Sanger Institute was further supported by Wellcome Trust grant WT090851. Several datasets used in this study were sourced from the NextGen Consortium, available from the NCBI database of Genotypes and Phenotypes (dbGaP): iPSCORE (phs000924.v4.p1 and phs001325.v3.p1); PhLiPS (phs001341.v1.p1); and GENESiPS (phs001139.v1.p1). This work, in part, used supercomputing resources provided by the Stanford Genetics Bioinformatics Service Center, supported by NIH S10 instrumentation grant S10OD023452. Research reported in this manuscript was in part supported by the NIH Common Fund, through the Office of Strategic Coordination/Office of the NIH Director under award no. U01HG007708. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Author information

Authors and Affiliations

Authors

Consortia

Contributions

The main analyses and data preparations were performed by M.J.B. and C.S.; D.J. and M.D. performed SV calling and analysis; N.M.F. performed GTEx (v7) data processing; L.F. and X.L. performed rare-variant annotation and interpretation; H.K. and D.V. annotated and validated the rare-disease variants and genes for the HipSci rare-disease samples; M.J.G. performed the colocalization analysis; M.J.B., D.S., B.M. and D.H. developed the eQTL software; C.Z. generated iPSC lines for Undiagnosed Diseases Network rare-disease samples; N.C., I.C.-O. and Y.P. assisted with data processing and analysis; M.J.B., C.S., M.J.G., S.B.M. and O.S. wrote the manuscript; I.C.-O., N.C., N.M.F., K.A.F., L.F., M.J.G., D.J., M.T.W., D.B.Z. and B.M. assisted in editing the manuscript; M.J.B., C.S., E.S., K.A.F., O.S. and S.B.M. conceived and oversaw the study.

Corresponding authors

Correspondence to Marc Jan Bonder, Craig Smail, Stephen B. Montgomery or Oliver Stegle.

Ethics declarations

Competing interests

S.B.M. is a member of the Scientific Advisory Board of Myome. All other authors declare no competing interests.

Additional information

Peer review information Nature Genetics thanks Gabriel Hoffman and Vijay Sankaran for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Replication and consistency of effect sizes of eQTL discovered in the the original iPSC studies and replicated in i2QTL.

Shown are scatter plots between eQTL effect size estimates in this study (i2QTL, x-axis) versus effect size estimates from previous studies (y-axis). Dots correspond to eQTL discovered in the respective study. Black: eQTL with consistent effect direction. Red: eQTL with discordant effect direction. Replication defined at nominal P < 0.05 in i2QTL. a, Replication of HipSci in i2QTL. 70.4% of the effects are replicated; 98% of the eQTL have concordant effect direction. Differences in the approach for estimating effect sizes result in the observed variation. b, Replication of iPSCORE in i2QTL. 81% of the effects are replicated; 98.7% of the eQTL have concordant effect direction. Notably only SNP eQTL were considered whereas SVs were not considered for replication. c, Replication of GENESiPS in i2QTL; 76.7% of the effects are replicated; 99.5% of the effects have concordant effect direction. d, Replication of PhiLiPS in i2QTL. 76.8% of the effects are replicated; 97.5% of the effects have concordant effect direction.

Extended Data Fig. 2 Identification of multiple independent eQTL for the same gene using stepwise regression.

a, Histogram of the number of independent eQTL effects identified for individual eGenes. Up to 12 independent effects were identified. b, Zoom-in view displaying the number of eGenes with 5 to 8 independent effects. c, Zoom-in view displaying the number eGenes with 8 to 12 independent effects.

Extended Data Fig. 3 Sharing of lead eQTL signals between cell types and studies, considering i2QTL, GTEX and the iPS differentiation study from Cuomo et al.

a, Pairwise sharing of lead eQTL signals in i2QTL (iPSC) and 48 GTEx tissues. Shown is the fraction of shared eQTL signals relative to the total number of common genes and lead eQTL variants in the two respective maps. Shared eQTL signals are defined as eQTL with concordant effect direction and absolute effect size within a factor of two (Methods). b, Distribution of pairwise sharing as in A of iPSCs versus GTEx tissues (blue, N = 48 comparisons) versus pairwise sharing between GTEx tissues (red, N = 2,401 comparisons).c, Pairwise sharing of lead eQTL signals in i2QTL (iPSC) and 48 GTEx tissues as in A, however additionally including single-cell eQTL form Cuomo et al. in iPSCs (iPSCsc), differentiated cell types (mesendo, endoderm). d, Distribution of pairwise sharing as in C, considering iPSCs and differentiated cell types (bulk left, followed by iPSC single cell), mesendoderm (cyan), and endoderm (green) versus GTex tissues (N = 48 comparisons). During differentiation genetic signals in iPSC become more similar to those in GTEx tissues. Data in panels b and d are displayed as violin- and boxplot with the midpoint corresponding to the median, the lower and upper edges of the box to the first and third quartiles and the whiskers corresponding to the IQR ×1.5.

Extended Data Fig. 4 Properties of distal (trans) gene-level eQTL.

a, Dot plot of N = 862 trans-eQTL detected in iPSC (FDR < 10%). Dots correspond to individual trans-eQTL, with color denoting the variant category (blue: cis-eQTL, red: GWAS variant, purple: cis-eQTL and GWAS variant). b, Breakdown of unique trans-eQTL variants (N = 193) across different variant annotations. Darker shaded colors denote trans-eQTL linked to variants that have more than one annotation; lighter shades correspond variants that can be unique assigned to a given variant category. c, Mediation analysis of trans-eQTL, considering variants that are linked to a cis eQTL. The outer pie denotes the annotation of the underlying trans variant: GWAS only variants (n = 7, white), cis eQTL variants (n = 186, dark-blue). The inner pie displays results from the mediation analysis for individual trans variants: n = 7 trans eQTL are exclusively linked to GWAS variants and hence not mediated (white, ‘GWAS only trans-eQTL’); n = 58 trans eQTL variants are not significantly linked to mediation with any RNA trait (‘Non-significant link to any cis-eQTL’); 86 are exclusively linked to gene-level abundance (‘Only linked to gene-level cis-eQTL’), 7 are exclusively linked a RNA trait other than gene-level abundance (‘Linked only to non gene-level cis-eQTL’); 35 are linked to gene-level abundance and at least one additional RNA trait(‘Linked both to gene-level and other level cis-eQTL’).

Extended Data Fig. 5 Analysis of the trans-eQTL hotspot at ELF2.

a, Schematic representation of the genetic loci around ELF2 (left) and NAA15 (right). SNPs are annotated by evidence of cis-eQTL regulation on different traits (blue: transcript-ratio eQTL on ELF2, purple: gene-level eQTL on EFL2, green: splice eQTL on ELF2, brown: the APA eQTL on NAA15). b, LD structure between cis-eQTL variants implicated in the hotspot, annotated by cis-eQTL type as in a. c-f. Lead cis-eQTL effects by RNA trait for ELF2 and NAA15 across SNPs linked to downstream trans-eQTLs(n = 682 samples). Data are represented as a violin- and boxplot with the midpoint corresponding to the median, the lower and upper edges of the box to the first and third quartiles, the whiskers represent the interquartile range ×1.5. C. Splicing cis-eQTL on ELF2. d. Gene level cis-eQTL on ELF2. E. Transcript-ratio cis-eQTL on ELF2. F. Alternative polyadenilation cis-eQTL on NAA15. g, Co-expression network of genes that are controlled by the trans-eQTL linked to the hotspot at ELF2 (N = 37 genes), including ELF2 itself (center). Color of the bounding box around Genes denotes the cis-eQTL variant that drives the corresponding trans effect (colors as in d-g). Genes with multiple trans regulators are depicted with multiple colored rings. h, Replication of trans effects at ELF2 in a single cell differentiation study (Cuomo et al.). Shown are trans-eQTL effect sizes (beta’s) for the 12 ELF2-linked trans targets that show significant replication (defined as P < 0.05 and consistent effect direction) in any of the Cuomo et al tissues. From left to right: eQTL effect size in; the i2QTL study (discovery); in undifferentiated iPSC profiled using scRNA-seq; in mesendoderm profiled using scRNA-seq and in definitive endoderm profiled using scRNA-seq. Significant replication are indicated with a red asterisk.

Extended Data Fig. 6 Enrichment for rare, deleterious SNPs in iPSC and GTEx tissues.

Comparison of enrichments for singleton, highly-deleterious (CADD > 25) SNPs in iPSC versus GTEx v7 tissues analogous to Fig. 2b, however with an additional adjustment for the number of expressed genes. Displayed are enrichments for 10,000 random draws of 50 samples, controlling for the number of expressed genes (genes subset at a fixed number across tissues, N = 500 genes). Strongest enrichment is observed in iPSC. The data are represented as a boxplot where the middle line corresponds to the median, the lower and upper edges of the box corresponding to the first and third quartiles, the whiskers represent the interquartile range (IQR) ×1.5 and beyond the whiskers are outlier points.

Extended Data Fig. 7 Expression level of rare disease genes in iPSC versus GTEx tissues.

a, Distribution of gene expression level of XX rare disease genes (log10(FPKM + 1)) in i2QTL iPSC and 17 GTEx tissues with a median expression level of at least 1 FPKM (red dashed line). Expression in i2QTL highlighted in teal, GTEx tissues in yellow. Disease genes are expressed in iPSCs and only difficult to biopsy tissues in GTEx display higher expression levels. n = 2,952 biologically independent samples. Data are represented as boxplots with the middle line corresponding to the median, the lower and upper edges of the box to the first and third quartiles, the whiskers to the interquartile range (IQR) ×1.5. b, Expression level of genes in different curated gene lists, comparing i2QTL iPSC (left), all GTEx tissues (middle) and GTEx whole blod (right). Shown is the fraction of genes in each category for two expression bins: [0,1) FPKM expression absent or lowly expressed; [1,1e + 12) FPKM gene expressed.

Extended Data Fig. 8 Additional results from the colocalization analysis of eQTL and GWAS traits.

a, Summary of colocalization results for the Coronary artery Disease GWAS (van der Harst et al 2018). 36 out of 93 GWAS loci were identified as colocalized with an eQTL of at least one RNA trait (38.7%). Shown are the number of colocalization events for eQTL of different RNA traits. From left to right: any eQTL type (All), gene-level eQTL (Gene), exon eQTL (Exon), transcript eQTL (Transcript), splicing eQTL (Splicing), APA eQTL (APA). The number of GWAS colocalization that are uniquely linked to a given RNA trait eQTL are displayed using a triangle and the total number of colocalizations per trait is depicted as a circle. b, Colocalization between a gene-level eQTL for POLR1B and a GWAS hit (rs7586668:C > T) for Height. Left: Manhatten plots displaying the local association signal for the eQTL on POLR1B (bottom) and the GWAS signal on Height (top). Right: Scatter plot of negative log P-values for the GWAS signal (x-axis) for BMI versus the POLR1B eQTL signal (y-axis) for the corresponding region.

Extended Data Fig. 9 Enrichment for large-effect outlier-associated rare variants in colocalized genes.

Absolute effect size of GWAS trait associations for iPSC outlier- and non-outlier-associated variants, considering genes with varying degree of evidence for colocalization with common eQTL variants.

Extended Data Fig. 10 Optimization of the number of PEER factors to adjust for confounding expression heterogeneity.

Shown is the number of genes with at least one gene-level eQTL (eGenes) for the top 3,000 highest expressed genes in iPSC for increasing number of PEER factors. PEER factors are adjusted for as additional fixed effect covariates. The vertical red line denotes the number of PEER factors considered in all i2QTL analyses.

Supplementary information

Supplementary Information

Supplementary Results, Methods and Note

Reporting Summary

Supplementary Tables

Supplementary Tables 1–18

Source data

Source Data Fig. 1

Figure data.

Source Data Fig. 2

Figure data.

Source Data Fig. 3

Figure data.

Source Data Fig. 4

Figure data.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Bonder, M.J., Smail, C., Gloudemans, M.J. et al. Identification of rare and common regulatory variants in pluripotent cells using population-scale transcriptomics. Nat Genet 53, 313–321 (2021). https://doi.org/10.1038/s41588-021-00800-7

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1038/s41588-021-00800-7

Further reading

Search

Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing