Skip to main content

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

Common genetic variation drives molecular heterogeneity in human iPSCs

A Corrigendum to this article was published on 14 June 2017

Abstract

Technology utilizing human induced pluripotent stem cells (iPS cells) has enormous potential to provide improved cellular models of human disease. However, variable genetic and phenotypic characterization of many existing iPS cell lines limits their potential use for research and therapy. Here we describe the systematic generation, genotyping and phenotyping of 711 iPS cell lines derived from 301 healthy individuals by the Human Induced Pluripotent Stem Cells Initiative. Our study outlines the major sources of genetic and phenotypic variation in iPS cells and establishes their suitability as models of complex human traits and cancer. Through genome-wide profiling we find that 5–46% of the variation in different iPS cell phenotypes, including differentiation capacity and cellular morphology, arises from differences between individuals. Additionally, we assess the phenotypic consequences of genomic copy-number alterations that are repeatedly observed in iPS cells. In addition, we present a comprehensive map of common regulatory variants affecting the transcriptome of human pluripotent cells.

This is a preview of subscription content, access via your institution

Relevant articles

Open Access articles citing this article.

Access options

Buy article

Get time limited or full article access on ReadCube.

$32.00

All prices are NET prices.

Figure 1: iPS cell line generation and quality control.
Figure 2: Locations and consequences of recurrent CNAs.
Figure 3: Variance component analysis of HipSci assays.
Figure 4: Comparison of iPS cell and somatic tissue eQTLs.
Figure 5: iPS cell eQTLs tag disease-associated variation.

Accession codes

Primary accessions

ArrayExpress

European Nucleotide Archive

References

  1. Sterneckert, J. L., Reinhardt, P. & Schöler, H. R. Investigating human disease using stem cell models. Nat. Rev. Genet. 15, 625–639 (2014)

    Article  CAS  PubMed  Google Scholar 

  2. Kim, K. et al. Epigenetic memory in induced pluripotent stem cells. Nature 467, 285–290 (2010)

    Article  CAS  ADS  PubMed  PubMed Central  Google Scholar 

  3. Kim, K. et al. Donor cell type can influence the epigenome and differentiation potential of human induced pluripotent stem cells. Nat. Biotechnol. 29, 1117–1119 (2011)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Lister, R. et al. Hotspots of aberrant epigenomic reprogramming in human induced pluripotent stem cells. Nature 471, 68–73 (2011)

    Article  CAS  ADS  PubMed  PubMed Central  Google Scholar 

  5. Nazor, K. L. et al. Recurrent variations in DNA methylation in human pluripotent stem cells and their differentiated derivatives. Cell Stem Cell 10, 620–634 (2012)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Rouhani, F. et al. Genetic background drives transcriptional variation in human induced pluripotent stem cells. PLoS Genet. 10, e1004432 (2014)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Burrows, C. K. et al. Genetic variation, not cell type of origin, underlies the majority of identifiable regulatory differences in iPSCs. PLoS Genet. 12, e1005793 (2016)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Vallier, L. et al. Signaling pathways controlling pluripotency and early cell fate decisions of human induced pluripotent stem cells. Stem Cells 27, 2655–2666 (2009)

    Article  CAS  PubMed  Google Scholar 

  9. Müller, F. J. et al. A bioinformatic assay for pluripotency in human cells. Nat. Methods 8, 315–317 (2011)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Danecek, P., & McCarthy, S. A., HipSci Consortium & Durbin, R. A method for checking genomic integrity in cultured cell lines from SNP genotyping data. PLoS ONE 11, e0155014 (2016)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  11. Laurent, L. C. et al. Dynamic changes in the copy number of pluripotency and cell proliferation genes in human ESCs and iPSCs during reprogramming and time in culture. Cell Stem Cell 8, 106–118 (2011)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. The International Stem Cell Initiative. Screening ethnically diverse human embryonic stem cells identifies a chromosome 20 minimal amplicon conferring growth advantage. Nat. Biotechnol. 29, 1132–1144 (2011)

  13. Abyzov, A. et al. Somatic copy number mosaicism in human skin revealed by induced pluripotent stem cells. Nature 492, 438–442 (2012)

    Article  CAS  ADS  PubMed  PubMed Central  Google Scholar 

  14. Mayshar, Y. et al. Identification and classification of chromosomal aberrations in human induced pluripotent stem cells. Cell Stem Cell 7, 521–531 (2010)

    Article  CAS  PubMed  Google Scholar 

  15. Taapken, S. M. et al. Karotypic abnormalities in human induced pluripotent stem cells and embryonic stem cells. Nat. Biotechnol. 29, 313–314 (2011)

    Article  CAS  PubMed  Google Scholar 

  16. Hussein, S. M. et al. Copy number variation and selection during reprogramming to pluripotency. Nature 471, 58–62 (2011)

    Article  CAS  ADS  PubMed  Google Scholar 

  17. Laurin, M. & Côté, J. F. Insights into the biological functions of Dock family guanine nucleotide exchange factors. Genes Dev. 28, 533–547 (2014)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Zhang, X. et al. FATS is a transcriptional target of p53 and associated with antitumor activity. Mol. Cancer 9, 244 (2010)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Lo, J. Y., Chou, Y. T., Lai, F. J. & Hsu, L. J. Regulation of cell signaling and apoptosis by tumor suppressor WWOX. Exp. Biol. Med. 240, 383–391 (2015)

    Article  CAS  Google Scholar 

  20. Futreal, P. A. et al. A census of human cancer genes. Nat. Rev. Cancer 4, 177–183 (2004)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Duckett, C. S. et al. A conserved family of cellular genes related to the baculovirus iap gene and encoding apoptosis inhibitors. EMBO J. 15, 2685–2694 (1996)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Chia, N. Y. et al. A genome-wide RNAi screen reveals determinants of human embryonic stem cell identity. Nature 468, 316–320 (2010)

    Article  CAS  ADS  PubMed  Google Scholar 

  23. Belinky, F . et al. PathCards: multi-source consolidation of human biological pathways. Database (Oxford) 2015, bav006 (2015)

    Article  Google Scholar 

  24. GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015)

  25. Grundberg, E. et al. Mapping cis- and trans-regulatory effects across multiple tissues in twins. Nat. Genet. 44, 1084–1089 (2012)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Xu, H. et al. ESCAPE: database for integrating high-content published data collected from human and mouse embryonic stem cells. Database (Oxford) 2013, bat045 (2013)

    Google Scholar 

  28. Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Dubois, P. C. et al. Multiple common variants for celiac disease influencing immune gene expression. Nat. Genet. 42, 295–302 (2010)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Stranger, B. E. et al. Population genomics of human gene expression. Nat. Genet. 39, 1217–1224 (2007)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Zeller, T. et al. Genetics and beyond—the transcriptome of human monocytes and disease susceptibility. PLoS One 5, e10693 (2010)

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  32. Purrington, K. S. et al. Genome-wide association study identifies 25 known breast cancer susceptibility loci as risk factors for triple-negative breast cancer. Carcinogenesis 35, 1012–1019 (2014)

    Article  CAS  PubMed  Google Scholar 

  33. Garcia-Closas, M. et al. Genome-wide association studies identify four ER negative-specific breast cancer risk loci. Nat. Genet. 45, 392–398 (2013)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. Wang, Z. et al. Imputation and subset-based association analysis across different cancer types identifies multiple independent risk loci in the TERTCLPTM1L region on chromosome 5p15.33. Hum. Mol. Genet. 23, 6616–6633 (2014)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Li, Q. et al. Integrative eQTL-based analyses reveal the biology of breast cancer risk loci. Cell 152, 633–641 (2013)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Ongen, H. et al. Putative cis-regulatory drivers in colorectal cancer. Nature 512, 87–90 (2014)

    Article  CAS  ADS  PubMed  Google Scholar 

  37. Chen, Q. R., Hu, Y., Yan, C., Buetow, K. & Meerzaman, D. Systematic genetic analysis identifies cis-eQTL target genes associated with glioblastoma patient survival. PLoS One 9, e105393 (2014)

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  38. Bojesen, S. E. et al. Multiple independent variants at the TERT locus are associated with telomere length and risks of breast and ovarian cancer. Nat. Genet. 45, 371–384 (2013)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  39. Chiba, K. et al. Cancer-associated TERT promoter mutations abrogate telomerase silencing. eLife 4, e07918 (2015)

    Article  PubMed Central  Google Scholar 

  40. Kyttälä, A. et al. Genetic Variability Overrides the Impact of Parental Cell Type and Determines iPSC Differentiation Potential. Stem Cell Reports 6, 200–212 (2016)

    Article  PubMed  PubMed Central  Google Scholar 

  41. Kajiwara, M. et al. Donor-dependent variations in hepatic differentiation from human-induced pluripotent stem cells. Proc. Natl Acad. Sci. USA 109, 12538–12543 (2012)

    Article  CAS  ADS  PubMed  PubMed Central  Google Scholar 

  42. Choi, J. et al. A comparison of genetically matched cell lines reveals the equivalence of human iPSCs and ESCs. Nat. Biotechnol. 33, 1173–1181 (2015)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Gerrits, A. et al. Expression quantitative trait loci are highly sensitive to cellular differentiation state. PLoS Genet. 5, e1000692 (2009)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  44. Spies, N. et al. Constraint and divergence of global gene expression in the mammalian embryo. eLife 4, e05538 (2015)

    Article  PubMed  PubMed Central  Google Scholar 

  45. Cannavò, E. et al. Genetic variants regulating expression levels and isoform diversity during embryogenesis. Nature 541, 402–406 (2017)

    Article  ADS  CAS  PubMed  Google Scholar 

  46. Carcamo-Orive, L. et al. Analysis of transcriptional variability in large human iPSC library reveals genetic and non-genetic determinants of heterogeneity. Cell Stem Cell 20, 518–532 (2017)

    Article  CAS  PubMed  Google Scholar 

  47. DeBoever, C. et al. Large-scale profiling reveals the influence of genetic variation on gene expression in human induced pluripotent stem cells. Cell Stem Cell 20, 533–546 (2017)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Kim, N. W. et al. Specific association of human telomerase activity with immortal cells and cancer. Science 266, 2011–2015 (1994)

    Article  CAS  ADS  PubMed  Google Scholar 

  49. Kelly, L. M. & Gilliland, D. G. Genetics of myeloid leukemias. Annu. Rev. Genomics Hum. Genet. 3, 179–198 (2002)

    Article  CAS  PubMed  Google Scholar 

  50. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012)

  51. Takahashi, K. & Yamanaka, S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663–676 (2006)

    Article  CAS  PubMed  Google Scholar 

  52. The UK10K Consortium. The UK10K project identifies rare variants in health and disease. Nature 526, 82–90 (2015)

  53. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015)

  54. Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  55. Delaneau, O., Marchini, J. & Zagury, J. F. A linear complexity phasing method for thousands of genomes. Nat. Methods 9, 179–181 (2011)

    Article  CAS  PubMed  Google Scholar 

  56. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  57. Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  58. Huber, W., von Heydebreck, A., Sültmann, H., Poustka, A. & Vingron, M. Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 18 (Suppl 1), S96–S104 (2002)

    Article  PubMed  Google Scholar 

  59. Vallier, L. et al. Early cell fate decisions of human embryonic stem cells and mouse epiblast stem cells are controlled by the same signalling pathways. PLoS ONE 4, e6082 (2009)

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  60. Ly, T. et al. A proteomic chronology of gene expression through the cell cycle in human myeloid leukemia cells. eLife 3, e01630 (2014)

    Article  PubMed  PubMed Central  Google Scholar 

  61. Bensaddek, D. et al. Micro-proteomics with iterative data analysis: proteome analysis in C. elegans at the single worm level. Proteomics 16, 381–392 (2016)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  62. Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372 (2008)

    Article  CAS  PubMed  Google Scholar 

  63. Aryee, M. J. et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30, 1363–1369 (2014)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  64. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013)

    Article  CAS  PubMed  Google Scholar 

  65. Anders, S., Pyl, P. T. & Huber, W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015)

    Article  CAS  PubMed  Google Scholar 

  66. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  67. DeLuca, D. S. et al. RNA-SeQC: RNA-seq metrics for quality control and process optimization. Bioinformatics 28, 1530–1532 (2012)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  68. Leha, A. et al. A high-content platform to characterise human induced pluripotent stem cell lines. Methods 96, 85–96 (2016)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  69. Zack, T. I. et al. Pan-cancer patterns of somatic copy number alteration. Nat. Genet. 45, 1134–1140 (2013)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  70. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005)

    Article  CAS  ADS  PubMed  PubMed Central  Google Scholar 

  71. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  72. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  73. Stegle, O., Parts, L., Durbin, R. & Winn, J. A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLOS Comput. Biol. 6, e1000770 (2010)

    Article  ADS  MathSciNet  CAS  PubMed  PubMed Central  Google Scholar 

  74. Lippert, C., Casale, F. P., Rakitsch, B. & Stegle, O. LIMIX: genetic analysis of multiple traits. Preprint at http://biorxiv.org/content/early/2014/05/21/003905 (2014)

  75. Casale, F. P., Rakitsch, B., Lippert, C. & Stegle, O. Efficient set tests for the genetic analysis of correlated traits. Nat. Methods 12, 755–758 (2015)

    Article  CAS  PubMed  Google Scholar 

  76. Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003)

    Article  CAS  ADS  MathSciNet  MATH  PubMed  PubMed Central  Google Scholar 

  77. Pers, T. H., Timshel, P. & Hirschhorn, J. N. SNPsnap: a web-based tool for identification and annotation of matched SNPs. Bioinformatics 31, 418–420 (2015)

    Article  CAS  PubMed  Google Scholar 

  78. Lawrence, M., Gentleman, R. & Carey, V. rtracklayer: an R package for interfacing with genome browsers. Bioinformatics 25, 1841–1842 (2009)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  79. Farh, K. K. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015)

    Article  CAS  ADS  PubMed  Google Scholar 

  80. Lambert, J. C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet. 45, 1452–1458 (2013)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Trynka, G. et al. Dense genotyping identifies and localizes multiple common and rare variant association signals in celiac disease. Nat. Genet. 43, 1193–1201 (2011)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  82. Liu, J. Z. et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 47, 979–986 (2015)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  83. International Multiple Sclerosis Genetics Consortium (IMSGC). Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis. Nat. Genet. 45, 1353–1360 (2013)

  84. Faraco, J. et al. ImmunoChip study implicates antigen presentation to T cells in narcolepsy. PLoS Genet. 9, e1003270 (2013)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  85. Cordell, H. J. et al. International genome-wide meta-analysis identifies new primary biliary cirrhosis risk loci and targetable pathogenic pathways. Nat. Commun. 6, 8019 (2015)

    Article  CAS  PubMed  Google Scholar 

  86. Tsoi, L. C. et al. Identification of 15 new psoriasis susceptibility loci highlights the role of innate immunity. Nat. Genet. 44, 1341–1348 (2012)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  87. Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014)

    Article  CAS  ADS  PubMed  Google Scholar 

  88. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014)

  89. Bentham, J. et al. Genetic association analyses implicate aberrant regulation of innate and adaptive immunity genes in the pathogenesis of systemic lupus erythematosus. Nat. Genet. 47, 1457–1464 (2015)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  90. Onengut-Gumuscu, S. et al. Fine mapping of type 1 diabetes susceptibility loci and evidence for colocalization of causal variants with lymphoid gene enhancers. Nat. Genet. 47, 381–386 (2015)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  91. Morris, A. P. et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat. Genet. 44, 981–990 (2012)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  92. Li, Y. I. et al. RNA splicing is a primary link between genetic variation and disease. Science 352, 600–604 (2016)

    Article  CAS  ADS  PubMed  PubMed Central  Google Scholar 

  93. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010)

    Article  CAS  PubMed  PubMed Central  Google Scholar 

Download references

Acknowledgements

This work was funded with a strategic award from the Wellcome Trust and UK Medical Research Council (WT098503). We thank the staff in the Cellular Genetics and Phenotyping and Sequencing core facilities at the Wellcome Trust Sanger Institute. Work at the Wellcome Trust Sanger Institute was further supported by Wellcome Trust grant WT090851. H.K. is supported by a MRC eMedLab Medical Bioinformatics career development award from the UK Medical Research Council (MR/L016311/1). F.M.W. acknowledges financial support from the Department of Health via the NIHR Biomedical Research Centre award to Guy’s & St Thomas’ National Health Service Foundation Trust in partnership with King’s College London and King’s College Hospital NHS Foundation Trust. We acknowledge the participation of all NIHR Cambridge BioResource volunteers, and thank the NIHR Cambridge BioResource centre staff for their contribution. We thank the National Institute for Health Research and NHS Blood and Transplant. The NIHR/Wellcome Trust Cambridge Clinical Research Facility supported the volunteer recruitment. We acknowledge Life Science Technologies Corporation as the provider of Cytotune. We thank F.-J. Müller for insights regarding the PluriTest method, and the GTEx consortium for making raw data and intermediate results available.

Author information

Authors and Affiliations

Authors

Contributions

H.K., A.G., O.S. and D.J.G. wrote the paper with input from all authors. H.K., A.G., D.B., Y.M., I.S., P.D., D.M., A.A., M.P., A.M., D.D., A.I.L., O.S. and D.G. contributed to the Supplementary Information. H.K., A.G., A.L., F.P.C., P.D., D.M., K.A. and D.D. analysed the data. S.A. and W.H.O. managed and supervised collection of research volunteer samples. F.S., C.A.A., A.A., R.N., S.H., M.P., S.R.P., A.W. and C.M.K. generated iPS cell lines, tier 1 assay data, cell growth data, RNA-seq and methylation data. V.A. and D.B. generated and processed the proteomics data. A.L., O.J.C., R.M., N.M. and D.D. generated and processed the high-content cellular imaging data. S.A.M., S.B. and Y.M. carried out initial data quality control and bioinformatics processing/pipelines. A.F., P.W.H., I.S. and L.C. curated and managed data and the project website. R.H. and A.K.-K. coordinated the project. D.D., P.B., W.H.O., E.B., L.V., A.I.L., F.M.W., R.D., O.S. and D.G. supervised and designed the research. H.K. and A.G. contributed equally to this work; O.S. and D.J.G. contributed equally to this work.

Corresponding authors

Correspondence to Fiona M. Watt, Richard Durbin, Oliver Stegle or Daniel J. Gaffney.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Reviewer Information Nature thanks E. Dermitzakis, S. Montgomery and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Figure 1 Overview of the cellomics assay.

a, Example plate layout for the cellular differentiation assay. Images are shown for the pluripotency markers (OCT4, SOX2 and NANOG) as they are measured in the Cellomics imaging device. Each line is measured in two rows of the same plate as technical replicates. The secondary antibody used for each marker is shown in parenthesis (aG, anti-goat antibody; aM, anti-mouse antibody). Each plate also has measurements for staining with the secondary antibody only, which serves as a means to assess background fluorescence. The red channel shows the signal from the DAPI staining, the green channel the marker signal. As expected, there is only a small signal from the green channel in the wells stained for the secondary antibody only. Image acquisition stops as soon as 10,000 cells have been detected. b, Detailed variance components of the cellomics markers (Methods). Substantial proportions of the marker variance could be attributed to batch factors, including staining, technician effects and antibody lots. These effects mean that the fraction of cells expressing particular markers need to be interpreted with caution (Fig. 1c, d). c, Pairwise Pearson’s correlation (r) between quantitative expression scores derived from immunostaining for pluripotency and differentiation and the PluriTest score (P values from a Student’s t-test).

Extended Data Figure 2 Pluritest scores in the two culture conditions.

ac, Comparison of PluriTest novelty score versus pluripotency score for the 711 lines generated. Lines grown on feeder-free conditions (E8 medium) scored systematically lower than feeder-dependent lines (P = 1.62 × 10−43; t-test, for pluripotency score). We note that, while we cannot rule out that feeder-free lines are less pluripotent, feeder-free conditions are not well represented in the PluriTest training dataset, which may explain this result (of the 204 ES cell/iPS cell lines in the PluriTest paper that have medium metadata available, none were cultured in E8 and only 37 were cultured in a variety of other feeder-free formulations such as MTSER). d, Despite lower pluripotency scores, lines grown on feeder-free conditions have higher fractions of cells expressing canonical protein markers of pluripotency.

Extended Data Figure 3 Extended CNA analysis.

Relationship between the number of CNAs, using three CNA minimum length thresholds for calling CNAs: 200 kb, 500 kb and 1,000 kb and other experimental factors. Values on the x axis have been ‘jittered’ (that is, small random ‘noise’ has been added to the true values) to enhance the visualization. Data points underlying the box plots are shown as semi-transparent blue dots. a, Number of CNAs per line versus passage number. P values are shown from a generalized linear mixed model (Poisson regression) with donor random effect. b, Box plot of the number of autosomal CNAs per line versus growth medium. P values are for a Poisson regression for culture condition. c, d, Number of autosomal CNAs per line versus PluriTest pluripotency and novelty scores. P values are for a linear mixed model of the number of autosomal CNAs per line with a random effect for donor. e, f, Number of CNA counts per donor versus gender and donor age. CNA counts refer to the total number of unique CNAs across all lines derived from the same donor. CNAs that are shared between lines of the same donor (overlap by at least one base) are counted only once. P values are shown for a Poisson regression for either gender or age.

Extended Data Figure 4 Location and consequence of the recurrent CNA on chr20.

Related to Fig. 2. Top, genomic location versus number of lines with copy number 3 (grey) and with a CNA (black). Bottom, the NAV gene score from ref. 22 and log2 gene expression fold change between the iPS cell lines with copy number 2 and 3a (colour scale), in the region highlighted in red in the top panel. Highlighted genes are upregulated when copy number increases, known oncogenes/tumour-suppressor genes and/or genes with a NAV score in the top 2%.

Extended Data Figure 5 Functional assessment of CNAs using growth assays.

ac, Cell growth rate (a), proliferation (b) and apoptosis (c) in cell lines with copy number 2 (wild type, blue dots) or copy number 3 (mutant, red dots) in a recurrently duplicated region in iPS cells on chromosome 1, 17 or 20. Plot titles show the donor name and the genomic coordinates of the CNA. a, Cell counts taken on successive days in culture, for pairs of lines (one mutant, one wild type) grown on the same 24-well plates are shown. Asterisks denote significance levels for statistical interactions between day and copy number in a linear mixed model, using fixed effects to fit day and copy number, and random effects to account for culture plate effects. EIF4A3 denotes whether a copy number variant overlaps one of the suspected candidate genes on chromosome 17. *P < 0.05; **P < 0.01; ***P < 0.001. b, Protein expression level measured using TMT-based quantification using the Q-exactive plus (labelled QE Plus) orbitrap and a fusion (labelled Fusion) orbitrap mass spectometry platforms. c, Estimated fraction of fluorescing nuclei following an EdU assay in mutant and wild-type lines, following exposure to mitomycin (Treated) or in a control sample (Untreated). d, Estimated fraction of fluorescing nuclei following a terminal deoxynucleotidyl transferase dUTP nick end labelling (TUNEL) assay in mutant and wild-type lines, following exposure to mitomycin (Treated) or in a control sample (Untreated). Solid trend lines are least squares regression fits. P values in b and c denote the significance of statistical interactions between copy number and mitomycin treatment condition (treated or untreated).

Extended Data Figure 6 Effect of passage number on tier 1 and tier 2 data and overview of iPS cell cis eQTLs mapped with tier 1 gene expression array data.

a, b, Passage number versus PluriTest pluripotency and novelty scores shows no significant association between passage number and pluripotency. Trend lines are fit using a linear regression of the PluriTest scores and the passage number (score P = 0.66, novelty P = 0.21). Association was also not significant when including gender and medium as fixed effects and batch variables and donor as random effects (score P = 0.3, novelty P = 0.14). c, Passage number versus log10 RNA-seq expression of pluripotency factors NANOG and OCT4 shows no significant association between passage number and pluripotency. Trend lines are fit using linear regression of log10 expression and passage number (NANOG, P = 0.5; POU5F1, P = 0.15). Association was also not significant when considering the two genes together and when including gender and medium as fixed effects and batch variables and donor as random effects (passage, P = 0.28; passage–gene interaction, P = 0.96). d, e, Variance component analysis for tier 2 assays, showing that for the majority of genes gender and passage explained little of the total variance. f, g, Comparison of lead variant effect sizes (β2) in gexarray-based eQTL maps. The eQTL maps were derived by using mean expression levels per donor (‘main’ map) and with two sets of individual lines (one per donor), drawn randomly (‘replicate’ maps). The effect sizes for all tested genes are shown in black, with FDR < 5% eGenes from the main map indicated in blue. Effect sizes are compared between the two replicate maps (f, Spearman’s rank correlation ρ = 0.47 genome-wide, ρ = 0.80 eGenes only, both P < 2.2 × 10−16) and between the main map and one replicate map (g, Spearman’s rank correlation ρ = 0.57 genome-wide, ρ = 0.88 eGenes only, both P < 2.2 × 10−16). The effect sizes obtained using the mean expression values per donor are higher than when using individual lines. h, Pairwise correlation between gene expression levels in iPS cells measured with RNA-seq and gexarray. The Spearman rank correlation coefficients of either gene (pink) or gexarray probe (blue) region based read counts are shown, showing a higher correlation for probe-based counts.

Extended Data Figure 7 Properties of iPS cell cis eQTLs in comparison to somatic eQTLs.

a, b, The power to detect eQTLs is plotted, comparing 44 somatic tissues from GTEx24 (V6p) and the HipSci RNA-seq-based eQTL map (purple triangle), considering either the absolute (a) or relative (b) number of eQTLs identified (eGenes, FDR < 5%). The major determinant of eQTL detection power is sample size. c, Cumulative fraction of RNA-seq reads relative to the number of protein-coding genes expressed. The mean read count derived from 20 iPS cell lines (10 donors, two lines each) is plotted, five fibroblast lines, and two ES cell (ESC) lines. In iPS cells, half of the reads are explained by the expression of 1,071 genes, whereas 75% and 90% of the reads are explained by the expression of 3,159 and 5,814 genes, respectively (total protein-coding genes with non-zero counts n = 17,332). d, Distribution of iPS cell eQTLs around the annotated gene start position. The −log10(eQTL P value) is plotted against the distance (in bp) from the gene start for lead eQTL variants genome-wide, highlighting significant eQTLs (FDR < 5%) in orange. e, Comparison of the magnitude of eQTL effect size (absolute beta; left) and minor allele frequency (MAF; right) between iPS-cell-specific (n = 2,131; labelled as S) and non-specific eQTLs (n = 4,500; labelled as NS), demonstrating that overall, iPS-cell-specific eQTLs have smaller effects on the transcriptome than eQTLs shared among multiple tissues (P = 9.97 × 10−161; Wilcoxon rank-sum test) and have a lower minor allele frequency (P = 1.08 × 10−35, Wilcoxon rank-sum test).

Extended Data Figure 8 Comparison of eQTL mapping pipelines between HipSci and GTEx (V6p).

a, Proportion of tissue-specific eQTLs as a function of the discovery sample size. For iPS cells, the two sets of tissue-specific eQTLs obtained with the two different mapping pipelines (Methods) are shown, namely the standard HipSci pipeline (iPSC; purple triangle) and the alternative ‘GTEx-like’ pipeline (iPSC2; purple triangle). Points other than iPS cells are from the GTEx Consortium (44 somatic tissues and cell lines)24. b, Heat map of pairwise π1 values (π1 = 1 − π0) between iPS cells and GTEx tissues, with rows representing the discovery tissue and columns the replication tissue. Clustering of tissues is based on euclidean distance (R hclust, method = average). c, Effect of eQTL replication threshold on the definition of tissue-specific effects. The replication profile of iPS cell eQTLs across GTEx tissues relative to discovery sample size in each replication tissue is shown. The proportion of lead eQTLs from iPS cells that replicate in each tissue is plotted, with replication defined using two different replication thresholds (TH1: nominal eQTL P < 0.01 / ntissues; TH5: P < 0.05 / ntissues; plotted as dots and triangles, respectively). d, Enrichment of alternative iPS cell eQTLs (GTEx-like) at proximal and distal (defined as less than or greater than 2 kb from the transcription start site) transcription factor binding sites of promoters in H1 hES cells from the ENCODE Project50. Fold enrichments per factor are shown for iPS-cell-specific and non-specific eQTLs (minimum 10 observed overlaps) (Methods). Pluripotency-associated factors are indicated with an asterisk. The profile of enrichments is comparable to that obtained with the standard HipSci pipeline (Fig. 4d).

Extended Data Figure 9 iPS cell eQTLs and disease.

a, Cumulative number of cancer genes (COSMIC cancer census 27 April 2016; ngenes = 571; ref. 20) regulated by eQTLs in iPS cells, somatic tissues (GTEx V6p), and three different cancers (ER positive and negative breast cancer, colorectal cancer)32,33. b, Enrichment of iPS cells and somatic eQTLs (lead variants and their high-linkage-disequilibrium proxies) at disease-associated variants in the NHGRI–EBI GWAS catalogue (10 April 2016). The fold enrichment of eQTLs over 100 random sets of matched variants for each tissue relative to eQTL discovery sample size is shown. The tissues showing the highest fold enrichment are liver and brain (cerebellar hemisphere; BrainCH). c, Somatic eQTL signal for the PTPN2 (protein tyrosine phosphatase, non-receptor type 2) locus on chromosome 18. This locus contains a colocalizing association signal for PTPN2 gene expression in iPS cells and five immunological disease phenotypes (Fig. 5a). d, Somatic eQTL signal for the TERT (telomerase reverse transcriptase) locus on chromosome 5 (Fig. 5b). In both c and d, the lead eQTL variant locations are indicated with red and orange vertical lines for iPS cells and somatic tissues, respectively. The focal gene regions are indicated in solid grey and gene start positions of other protein-coding genes on the same strand with vertical grey lines.

Extended Data Figure 10 Tissue expression and alternative splicing results for the TERT locus.

a, b, Normalized RNA-seq per-base coverage across the TERT locus stratified by rs10069690 genotype. The full locus (a) or zoomed view of the region (b) around the lead eQTL and cancer risk variant rs10069690 are shown. rs10069690 is indicated with a dotted line on each plot. Grey regions indicate annotated exons from Ensembl version 75. Coverage was computed from indexed BAM files using the coverageBed function from the bedtools (version 2.25.0)93. Raw coverage was divided by total library size in millions (total number of mapped reads) per sample to obtain normalized coverage, which was then averaged over samples with the same rs10069690 genotype to obtain mean normalized coverage for each genotype group. c, Profile of TERT expression in iPS cells and across somatic tissues from GTEx. The gene FPKM values obtained with RNA-SeQC (GTEx V6p) are shown. d, Splicing-QTL of TERT. We quantified TERT intron retention rates using Leafcutter92 and identified one alternative splicing event associated with rs10069690, the lead iPS cell eQTL variant for TERT (Fig. 5b). The TERT intron 4 retention ratio (PSI, per cent spliced in) is shown in iPS cell lines of all individual donors stratified by their genotype at rs10069690. This variant affects the splicing of the intron where it is located, with the minor allele (T) increasing the fraction of TERT transcripts in which intron 4 is retained (P = 1.7 × 10−9, Bonferroni-adjusted linear regression).

Supplementary information

Supplementary Information

This file contains supplementary information and methods. (PDF 337 kb)

Supplementary Table 1

Sample meta data for the HipSci cell lines used in this publication. This is a subset of HipSci's full catalogue of cell lines and data, which can be queried at http://www.hipsci.org/lines. (XLSX 184 kb)

Supplementary Table 2

CNA results. (a) CNA locations (b) Significance of CNA recurrence over 200 kb genome windows (c) Properties of the recurrent CNAs, including: peak region, overlap with chromatin fragile sites, cis (same chromosome from the CNA) and trans (different chromosome) regulated genes (i.e. genes differentially expressed between copy-number 2 and 3 lines), and top candidate genes (identified as described in the main text). (d) Genome-wide association of copy numbers at recurrent CNAs with gene expression (e) Pathway enrichment analysis of genes regulated in trans by the chromosome 17 recurrent CNA region. (XLSX 199 kb)

Supplementary Table 3

Gene expression variance components analysis. Fraction of variance explained by the factors considered for each expression array probe. (XLSX 3388 kb)

Supplementary Table 4

iPSC eQTL results. (a) eGene level summary of the cis-eQTLs discovered with different sample sets in this study. (b) eQTL results for primary and secondary lead eQTL variants of HipSci RNA-seq iPSC eGenes at FDR < 5% (N = 6,631). Primary and secondary eQTLs are defined by the column ‘primary_eQTL’. The column ‘iPSC_specific’ defines whether the eQTL is iPSC-specific. Columns ‘N_proxies_used’ and ‘proxy_positions’ give the total number and positions of proxy variants that were tested in the tissue-specific analysis. Additionally, the column ‘overlaps_CNA’ indicates whether the eQTL lead variant overlaps with a recurrent iPSC CNA. (XLSX 1545 kb)

Supplementary Table 5

Tissue information. (a) Description of the tissue data used in this study to define tissuespecific eQTLs (GTEx V6p, HipSci), including the embryonic origin of each tissue and number of tissue-specific eQTLs identified for each tissue. (b) Summary of iPSC eQTL replication tests in the tissue-specific analysis, showing for each replication tissue how often proxy variants (‘ld_buddy’, ‘best_proxy’) were tested instead of the same lead variant (‘same_as_lead’). (XLSX 17 kb)

Supplementary Table 6

iPSC eQTL overlap with disease-associated variants. (a) All disease-associated variants in the NHGRI-EBI GWAS catalogue (release 2016-04-10) which are tagged by an iPSC eQTL (lead variant or r2 > 0.8 proxy). For proxy matches, all eQTLs for which the variant is a proxy (r2 > 0.8) are shown. (b) Disease-associated variants in the GWAS catalogue that are lead eQTL variants in iPSCs (subset of (a)). For each variant, the number of high-LD proxies it has is listed (‘N_HIGH_LD_PROXIES’). (c) Individual traits in the GWAS catalogue for which iPSC eQTLs show a significant enrichment (BH-adjusted empirical P < 0.05, derived from 100 random sets of matched variants; Methods). Shown are traits with minimum five variants tagged by iPSC eQTLs. (d) Results of the colocalisation analysis for 14 traits. (XLSX 121 kb)

PowerPoint slides

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Kilpinen, H., Goncalves, A., Leha, A. et al. Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546, 370–375 (2017). https://doi.org/10.1038/nature22403

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1038/nature22403

This article is cited by

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Search

Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing