Human pluripotent stem cells (hPS cells) can self-renew indefinitely, making them an attractive source for regenerative therapies. This expansion potential has been linked with the acquisition of large copy number variants that provide mutated cells with a growth advantage in culture1,2,3. The nature, extent and functional effects of other acquired genome sequence mutations in cultured hPS cells are not known. Here we sequence the protein-coding genes (exomes) of 140 independent human embryonic stem cell (hES cell) lines, including 26 lines prepared for potential clinical use4. We then apply computational strategies for identifying mutations present in a subset of cells in each hES cell line5. Although such mosaic mutations were generally rare, we identified five unrelated hES cell lines that carried six mutations in the TP53 gene that encodes the tumour suppressor P53. The TP53 mutations we observed are dominant negative and are the mutations most commonly seen in human cancers. We found that the TP53 mutant allelic fraction increased with passage number under standard culture conditions, suggesting that the P53 mutations confer selective advantage. We then mined published RNA sequencing data from 117 hPS cell lines, and observed another nine TP53 mutations, all resulting in coding changes in the DNA-binding domain of P53. In three lines, the allelic fraction exceeded 50%, suggesting additional selective advantage resulting from the loss of heterozygosity at the TP53 locus. As the acquisition and expansion of cancer-associated mutations in hPS cells may go unnoticed during most applications, we suggest that careful genetic characterization of hPS cells and their differentiated derivatives be carried out before clinical use.
We thank the many institutions and investigators world-wide that provided their cell lines and supported the publication of the results. We are indebted to D. Santos, M. Smith, K. Elwell, M. A. Yram, S. Ellender, L. Bevilacqua, and D. Gage for their assistance with the regulatory and logistical efforts required to acquire and sequence hES cell lines. We also thank K. Lilliehook for her comments, I. Yildirim for his assistance with the molecular modelling of P53 mutations, and C. Usher for help with figure schematics. We regret the omission of any relevant references or discussion due to space limitations. The Genomics Platform at the Broad Institute performed sample preparation, sequencing, and data storage. Y.A. is a Clore Fellow. N.B. is the Herbert Cohn Chair in Cancer Research and was partially supported by The Rosetrees Trust and The Azrieli Foundation. Costs associated with acquiring and sequencing hES cell lines were supported by HHMI and the Stanley Center for Psychiatric Research. F.T.M., S.A.M., and K.E. were supported by grants from the NIH (5P01GM099117, 5K99NS08371). K.E. was supported by the Miller consortium of the HSCI, and F.T.M. is currently supported by funds from the Wellcome Trust, the Medical Research Council (MR/P501967/1), and the Academy of Medical Sciences (SBF001\1016).
Extended data figures
Considered and whole exome sequenced hESC lines. Tab 1. We considered hESC lines for WES if they were listed on the NIH Human Embryonic Stem Cell Registry (http://grants.nih.gov/stem_cells/registry/current.htm) or if they were prepared under GMP conditions. Cell lines were typically excluded from consideration if they were unavailable for distribution, contained known karyotypic abnormalities in more than 10% of analyzed cells, or carried disease-causing mutations identified by PGD. Cell lines with MTAs that restricted our ability to work with them, that could not be recovered upon thawing, or that proved to be unavailable upon request were also excluded. Passage number at the time of request, the number of passages and time in culture from thaw to passaging, and the passaging method, media, and substrate are provided, as are the mean sequencing coverage and the percentage of cross-sample contamination per cell line. GMP, good manufacturing practice; MTA, material transfer agreement; PGD, pre-implantation genetic diagnosis; WES, whole exome sequencing. Tab 2. Summary of the number of cell lines considered and sequenced, including reasons for exclusion. These data are presented graphically in Figure 1b-e.
Identification of candidate mosaic variants present in sequenced hESCs. Tab 1. Filters used to identify likely mosaic variants among all heterozygous variants present in the sequenced exomes of 140 hESC lines. Tab 2. List of 263 candidate mosaic variants passing quality control filters and present no more than two times among the 140 sequenced hESC lines. Variants are arranged by chromosome position and annotated by likely functional impact and frequency in the general population (ExAC AC). Tab 3. Variants from the list in Tab 2 predicted to have either a high or damaging impact on gene function based on a consensus of seven bioinformatic algorithms. See Materials and Methods for further details. Tab 4. In addition to mosaic variants identified using these stringent filters, we provide an inclusive list of all high-confidence somatic variants (n=36,396) that pass the binomial test with a P value of <0.01. SNP, single nucleotide polymorphism; CHROM, chromosome number; POS, genomic position (hg19); REF, reference allele; ALT, alternate allele; HESC, human embryonic stem cell line; REFC, count of reference alleles; ALTC, count of alternate alleles; FILTER, high confidence variant score; EXACAC, allele count in the Exome Aggregation Consortium (ExAC) database; IMPACT, predicted effect of mutation; HESCAC, allele count in hESCs; TOTALC, REFC+ALTC; AF, allelic fraction (ALTC/TOTALC); P, P value for binomial test on allelic fraction.
Characteristics of TP53 mutations identified in hESCs by WES and RNAseq. Tab 1. Summary of all 15 instances of TP53 mutations observed by WES and RNAseq with details of read depth, allelic fraction, P value, reference, and culture method. Note that all observed mutations are frequently seen in human cancer, and that most mutations have evidence of mosaicism, indicating that they were likely culture-derived. bFGF, basic fibroblast growth factor (FGF2); COSMIC, Catalogue of Somatic Mutations in Cancer (http://cancer.sanger.ac.uk/cosmic); ExAC, Exome Aggregation Consortium (http://exac.broadinstitute.org/); Freq., frequency; GMP, good manufacturing practice; IARC, International Agency for Research on Cancer (http://p53.iarc.fr/); ICGC, International Cancer Genome Consortium (http://icgc.org/); MEF, mouse embryonic fibroblast; Seq., sequencing; SNL, SNL mouse fibroblast feeder cell line; WES, whole exome sequencing. Errors denote SEM. Tab 2. Breakdown of the incidence of P53 mutations by culture media, substrate, and passaging method.
Primer and probe sequences used for ddPCR-based determination of P53 variant allele frequency.
Calculation of selective advantage conferred by three distinct TP53 variants. The allelic fraction of TP53 variants was measured at several passages by ddPCR in hESCs cultured under standard conditions. Replicate experiments per passage are shown in grey, and average values are shown in black. The observed increase in the allelic fraction of each variant over time in culture is consistent with a substantial growth or survival advantage in all but one instance. See Materials and Methods for details on ddPCR and the calculation of the effect per passage.
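One common way to estimate a per-passage effect from such a time series is sketched below. It is not necessarily the calculation used in Materials and Methods. The sketch assumes that mutant cells are heterozygous (so the mutant cell fraction is twice the allelic fraction) and that mutant cells have a constant relative advantage, in which case the log-odds of the mutant cell fraction rises linearly with passage number and the slope gives the advantage per passage. The function name and the example values are hypothetical.

```python
from math import exp, log


def logit(x: float) -> float:
    """Log-odds of a fraction strictly between 0 and 1."""
    return log(x / (1.0 - x))


def fit_advantage_per_passage(passages, allelic_fractions):
    """Least-squares slope of logit(mutant cell fraction) versus passage.

    Assumes heterozygous mutant cells, i.e. cell fraction = 2 * AF, so the
    model only applies while AF < 0.5. Returns (slope, fold_change), where
    fold_change = exp(slope) is the per-passage change in the ratio of
    mutant to wild-type cells.
    """
    cell_fractions = [2.0 * af for af in allelic_fractions]
    if any(c <= 0.0 or c >= 1.0 for c in cell_fractions):
        raise ValueError("model requires 0 < AF < 0.5 at every passage")
    ys = [logit(c) for c in cell_fractions]
    n = len(passages)
    mean_x = sum(passages) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in passages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(passages, ys))
    slope = sxy / sxx
    return slope, exp(slope)


# Illustrative (made-up) measurements, not data from the study.
slope, fold = fit_advantage_per_passage([0, 2, 4, 6], [0.05, 0.08, 0.13, 0.19])
print(f"log-odds gain per passage: {slope:.3f}; fold change in ratio: {fold:.2f}")
```

Under these assumptions, a fold change above 1 indicates that the mutant cells expand relative to wild-type cells at each passage, and a value near 1 indicates no detectable advantage.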
Large copy number variants in hESCs identified by the human Psych Array. Tab 1. Summary of hESC lines with large copy number variants (>500kb) as ascertained by SNP array analysis. Two of the five cell lines with acquired TP53 mutations harbored large structural alterations (HUES71 and MShef10). Five separate cell lines (CSES25, ESI051, MShef3, UM78-1 and WA21) had an amplification at the pericentromeric region of chromosome 20 (Chr20q11.21). Tab 2. Complete list of large deletions or duplications (>500kb) identified across the 140 hESC lines.
Identification of TP53 mutations in hPSCs by RNA sequencing and WES. Tab 1. List of all RNA sequenced samples from hPSCs. Five of these samples (cell2-7) were removed since they were from single stem cells rather than cell lines. Tab 2. Summary of the number of samples and studies generated from each cell line. Tab 3. List of all samples harboring TP53 mutations, their chromosomal location, and the relevant study. Tab 4. Summary of all affected cell lines and studies. Tab 5. Summary of affected samples, cell lines, and number of mutations seen in hESCs and hiPSCs by WES and RNAseq.