Michael A Teitell

As potential therapies derived from pluripotent stem-cell lines move towards the clinic, researchers have yet to address an essential, practical question: how should cells be screened for genetic variants that contribute to cancer or other diseases? The tools to address this question exist, but discussions on how to use them for this purpose are rare and uncoordinated.

Stem cell biologists are understandably more obsessed with epigenetics than with genetics. In a developing human embryo, epigenetic controls over gene expression coordinate the creation of several hundred cell types to generate an entire organism. Remarkably, laboratory techniques can trigger epigenetic reprogramming of somatic cells to an embryonic-like state, and scientists are learning how to direct such cells into the types useful for research and therapies. Improvements in techniques (in both nuclear transfer and direct reprogramming) could soon mean that more pluripotent cell lines can be generated for particular purposes or even for particular patients. That is precisely why the field should give more attention to genetic variation within stem cell lines.

Embryonic stem (ES) cell lines vary dramatically in their proclivity to differentiate into particular lineages1,2. Although some of this variation will be related to chance events and culture techniques, some differences will certainly come down to the genetics of human populations. Understanding this contribution will become even more important as more cell lines are derived from individuals with specific disease susceptibilities.

Much human variation can be pinned on differences in DNA. Genome-wide association studies identify loci with small effects that influence a person's chance of getting a range of diseases, including inflammatory diseases and cancer. Gene-linkage analyses, such as the linking of mutations in BRCA1 and BRCA2 genes to breast cancer, have found a smaller number of loci with larger effects. The lists of dangerous sequences grow with every upload to PubMed.

The recent examination of James Watson's genome provides a good thought experiment. Massively parallel DNA sequencing identified more than 3.3 million single nucleotide polymorphisms (SNPs) in Watson's genome relative to a human 'reference genome'3. That analysis revealed several variants linked to illness and about 11,000 SNPs affecting the amino-acid sequence of various proteins. If a pluripotent stem cell line were made from Watson's cells, how could scientists use genetic analyses to determine whether this line would be more or less suitable than others for a particular study or therapy? What established disease-susceptibility genes should they test for?

In short, it is time to consider how to assess stem cell genomes, and to discuss whether genomes associated with disease predisposition should be excluded from the therapeutic arsenal.

Counting chromosomes is not enough

Right now, the one quality that pluripotent stem cell genomes are routinely assessed for is stability: do cells maintain the normal number and gross configuration of chromosomes over long periods in culture? This overlooks finer genome rearrangements, including ones that could, presumably, make cells dangerous in the clinic and unreliable in the laboratory.

At the very least, it seems, stem cell lines should be screened for genetic variants predisposing cells to uncontrolled growth, but even for that defined purpose, setting the level of rigour is difficult. Insightful data-mining techniques recently implicated the unsuspected TMPRSS2–ERG fusion gene in prostate cancer4. At what point should an institution hoping to use a cell line for therapy screen lines for such oncogenic variants? What if a susceptibility locus for heart disease were in a cell line being prepared for transplantation into children with heart disease? Those transplants could outlast the scientists and regulators who green-light the clinical trial. At what point in clinical development should these questions be raised? By the time cell lines carrying suspected variations are ready to move forward, such variants might be recognized as either harmless or unacceptable.

Stem cells have multiple sources of genetic variation

Assessing genome quality will also mean understanding the sources of variation. Certain genetic changes are likely to arise only at distinct stages of human ES cell generation, maintenance and differentiation. The long-term effects of intentional genomic modifications, such as inserting reporter genes, are also largely unassessed. In other words, assessing genome quality means not just asking what the variants are, but also when that variation occurs.

A crucial distinction is whether variations have arisen before or after material taken from individuals is used to generate a cell line; for an ES cell line, variations not found in either gamete donor could arise during gametogenesis, embryo development or manipulation. Extended cell culture may select for other variants. It is well established that some gross chromosomal changes tend to accumulate over time in culture5, and smaller, currently undetected changes may accumulate as well.

A basic assessment would compare genomes in the starting materials and in established cell lines. For human ES cell lines created from left-over embryos, this could be problematic. Even if tissue from the parents were available, it could be unreliable. Chromosomes within gametes are different from those in progenitor cells; for these and other reasons, genomes in tissue collected from adults may not represent the genomic contribution to an embryo.

For somatic cell genomes reset to pluripotency, sufficient material for analysis can probably be gathered from the initial biopsy and the established pluripotent stem cell line. Such analyses would pick up any variation between the patient's genome and an established stem cell line. Arguably, if cell lines are generated from specific patients, stability will become the most important criterion, as the patient and the cells will have the same genome. Even when human pluripotent stem cell lines are generated without viruses or through nuclear transfer, the act of reprogramming and culturing might still introduce selective pressures that favour certain genetic alterations over others.
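At its simplest, the biopsy-versus-line comparison described above is a set difference between two lists of variant calls. The sketch below assumes hypothetical, already-filtered calls reduced to chromosome, position, reference allele and alternate allele; a real analysis would also have to weigh sequencing depth, zygosity and calling error, none of which is modelled here. The `Variant` type and both function names are illustrative, not taken from any published tool.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A simplified variant call (hypothetical format for illustration)."""
    chrom: str
    pos: int   # 1-based genomic position
    ref: str   # reference allele
    alt: str   # alternate allele


def acquired_variants(biopsy: set[Variant], cell_line: set[Variant]) -> set[Variant]:
    """Variants called in the established line but not in the donor biopsy.

    These are candidates for changes introduced during reprogramming or culture.
    """
    return cell_line - biopsy


def lost_variants(biopsy: set[Variant], cell_line: set[Variant]) -> set[Variant]:
    """Variants called in the biopsy but not in the line.

    These may reflect loss of heterozygosity, a line derived from a cell that
    never carried a mosaic variant, or simply missed calls, so they call for
    follow-up rather than a verdict.
    """
    return biopsy - cell_line


if __name__ == "__main__":
    # Hypothetical calls; positions are made up.
    biopsy = {Variant("chr1", 1_000, "A", "G"), Variant("chr2", 5_000, "C", "T")}
    line = {Variant("chr1", 1_000, "A", "G"), Variant("chr3", 300, "G", "A")}
    print("acquired:", acquired_variants(biopsy, line))
    print("lost:", lost_variants(biopsy, line))
```

Returning both directions matters: a variant gained in the line and a variant lost from it point to different underlying events.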

Many techniques are available

Currently, genome integrity for both human ES cells and induced pluripotent stem cells (iPS cells) is mainly determined by classical G-banding cytogenetics: Giemsa staining of metaphase chromosomes, which are compact enough to be visible under a light microscope. Very occasionally, a technique called metaphase-based comparative genomic hybridization (CGH) is used to assess human ES cells. These techniques detect gross chromosomal aberrations such as aneuploidy (a missing or extra copy of a chromosome). But G-banding-based karyotyping can detect only features at least 5–10 Mb in size, and CGH only those of at least 2–3 Mb, so genomic changes smaller than about 2 Mb go unnoticed. None of the thousands of variations of the sort identified in Watson's genome can be detected this way, nor can differences in how many copies of a gene exist within a genome; such copy number variant (CNV) regions are increasingly recognized as important.
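To make the resolution gap concrete, the sketch below encodes the approximate lower size limits quoted in the paragraph above and reports which conventional method could, in principle, see a change of a given size. Taking the lower end of each quoted range (5 Mb for G-banding, 2 Mb for metaphase CGH) is an optimistic assumption; real resolution varies with banding level, chromosomal region and laboratory practice. The names are hypothetical.

```python
# Approximate lower size limits (base pairs) taken from the ranges quoted in
# the text: G-banding 5-10 Mb, metaphase CGH 2-3 Mb. Using the lower end of
# each range is an optimistic assumption.
DETECTION_LIMITS_BP = {
    "G-banding karyotype": 5_000_000,
    "metaphase CGH": 2_000_000,
}


def detectable_by(change_size_bp: int) -> list[str]:
    """Return the conventional methods whose approximate resolution limit
    a genomic change of this size meets or exceeds."""
    return [
        method
        for method, limit in DETECTION_LIMITS_BP.items()
        if change_size_bp >= limit
    ]


if __name__ == "__main__":
    for size in (8_000_000, 3_000_000, 150_000):
        methods = detectable_by(size) or ["neither method"]
        print(f"{size / 1e6:.2f} Mb change: {', '.join(methods)}")
```

A 150 kb change, a size typical of many CNVs, falls below both limits and would pass a routine karyotype unnoticed.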

The genetic variability known to influence human traits has not yet been assessed in pluripotent stem cell lines, but there are several approaches for doing so (Table 1). Several relatively rapid, massively parallel genome-wide sequencing methodologies are available6,7. The parallel-sequencing technique used to examine Watson's genome has also identified more than a hundred somatic rearrangements in lung cancer biopsies, and shows what is currently possible6.

Microarray-based techniques, such as array-CGH (aCGH), use DNA hybridization to detect SNPs or CNV regions ranging from a few thousand to more than a million base pairs8. These non-structural, non-sequencing methods measure how much of a sequence is present, not where it sits, so they do not reveal where in the genome extra copies lie, and rearrangements that leave copy number unchanged, such as balanced translocations and inversions, go undetected. Still, a technique like aCGH can survey the genome more broadly than can traditional, non-parallel sequencing-based techniques. A recent aCGH study by eight investigators, led by my laboratory, compared two commonly used human ES cell lines and identified nine regions with confirmed CNV differences plus 70 likely differences9. Although the effects of these differences have not been characterized, they could reasonably influence how cells differentiate or function.
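Array CGH reports, for each probe, how strongly test DNA hybridizes relative to a reference sample. Copy-number changes show up as shifts in the log2 of that ratio: against a diploid reference, a one-copy loss gives log2(1/2) = -1 and a one-copy gain gives log2(3/2) ≈ 0.58. The sketch below is a deliberately naive caller. It assumes hypothetical probe records, a ±0.3 threshold and a minimum run of three probes, all illustrative choices. Production tools use proper segmentation algorithms (circular binary segmentation is a common one) and explicit noise models.

```python
import math
from itertools import groupby
from typing import NamedTuple


class Probe(NamedTuple):
    """One array probe (hypothetical record format)."""
    chrom: str
    pos: int             # probe position in base pairs
    test_signal: float   # hybridization intensity of the test sample
    ref_signal: float    # hybridization intensity of the reference sample


def log2_ratio(p: Probe) -> float:
    return math.log2(p.test_signal / p.ref_signal)


def call_state(ratio: float, gain: float = 0.3, loss: float = -0.3) -> str:
    """Classify one probe; the +/-0.3 thresholds are illustrative assumptions."""
    if ratio >= gain:
        return "gain"
    if ratio <= loss:
        return "loss"
    return "neutral"


def call_segments(probes: list[Probe], min_probes: int = 3) -> list[tuple]:
    """Merge consecutive probes on one chromosome that share a non-neutral call.

    Runs shorter than `min_probes` are discarded as likely noise.
    Returns (chrom, first_pos, last_pos, state, n_probes) tuples.
    """
    ordered = sorted(probes, key=lambda p: (p.chrom, p.pos))
    segments = []
    for (chrom, state), run in groupby(
        ordered, key=lambda p: (p.chrom, call_state(log2_ratio(p)))
    ):
        run = list(run)
        if state != "neutral" and len(run) >= min_probes:
            segments.append((chrom, run[0].pos, run[-1].pos, state, len(run)))
    return segments


if __name__ == "__main__":
    # Hypothetical probes: a four-probe single-copy gain flanked by normal
    # signal, plus a two-probe dip that is too short to call.
    ratios = [1.0, 1.0, 1.0, 1.5, 1.5, 1.5, 1.5, 1.0, 0.5, 0.5, 1.0]
    probes = [Probe("chr1", 10_000 * (i + 1), r, 1.0) for i, r in enumerate(ratios)]
    print(call_segments(probes))
```

Because the ratio reflects only how much DNA is present, a caller like this can report that a segment is gained but not where the extra copy has been inserted.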

What variations should we look for?

Of course, too detailed an analysis of stem cell genomes could be viewed as merely a technical exercise, because we do not yet know how to pluck useful information from the enormous amounts of data that would be generated. So what is the best way to analyse genomes of pluripotent stem cell lines?

Sequencing whole genomes over multiple passages, although most accurate, is impracticable. Still, to answer basic questions about genome stability in culture, as well as after genome modification, reprogramming and differentiation, high-resolution sequencing could be required. Perhaps high-density array platforms for SNP and CNV analysis, coupled with traditional karyotyping, would identify genomes best suited for clinical applications.
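Whatever assay is chosen, one practical use of data from several passages is to follow each variant's allele fraction over time: a variant carried by a subclone that is outgrowing its neighbours will tend to rise. The sketch assumes a hypothetical mapping from variant identifiers to allele fractions ordered by passage, and an arbitrary rise of 0.2 as the flagging threshold. It ignores sampling noise, which a real analysis would have to model.

```python
def rising_variants(
    vaf_by_passage: dict[str, list[float]], min_increase: float = 0.2
) -> list[str]:
    """Return IDs of variants whose variant allele fraction (VAF) rises by at
    least `min_increase` between the earliest and latest passage sampled.

    A rising VAF is one possible signature of a subclone being selected in
    culture. Variants measured at fewer than two passages are skipped.
    """
    flagged = []
    for variant_id, vafs in vaf_by_passage.items():
        if len(vafs) >= 2 and vafs[-1] - vafs[0] >= min_increase:
            flagged.append(variant_id)
    return flagged


if __name__ == "__main__":
    # Hypothetical allele fractions at three passages.
    data = {
        "var_A": [0.05, 0.15, 0.40],  # steadily rising: candidate for selection
        "var_B": [0.50, 0.48, 0.51],  # stable heterozygous variant
        "var_C": [0.00],              # only one time point
    }
    print(rising_variants(data))
```

Comparing only the first and last passages keeps the sketch short; fitting a trend across all passages would be less sensitive to a single noisy measurement.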

So far, this is easier said than done. After all, findings from genome-wide association studies in whole organisms may not be useful for stem cells. Most SNPs and CNVs associated with diabetes have yet to be linked to clinical traits such as insulin production, even in the intact organism. It is too early to say that genetic variations protecting individuals from diabetes would yield clinically successful cell products. For developing treatments, currently unmeasured traits may come into play, such as how stably or efficiently transplanted cells respond to cues in their new environment. Indeed, genetic variation within transplant recipients (beyond potential immune incompatibilities), rather than within transplanted cells, could become an important consideration.

Long lists of important but difficult questions should not stop scientists from considering what is possible now. A near-term consideration could be targeted array platforms that screen for those genome alterations that are strongly linked to disease susceptibilities. Another near-term consideration is planning for the future.
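At its core, a targeted screen of this kind is an interval-overlap check between alterations detected in a line and a curated panel of disease-linked regions. In the sketch below, the panel, its labels and all coordinates are hypothetical placeholders, not a recommended list of loci, and intervals are assumed to be 0-based and half-open. The brute-force pairwise comparison is adequate for a short panel; for genome-wide catalogues an interval tree or a sorted sweep would be the usual choice.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A genomic interval, 0-based and half-open: [start, end)."""
    chrom: str
    start: int
    end: int
    label: str


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def screen(detected: list[Region], panel: list[Region]) -> list[tuple[str, str]]:
    """Return (detected_label, panel_label) pairs for every overlap found."""
    return [(d.label, p.label) for d in detected for p in panel if overlaps(d, p)]


if __name__ == "__main__":
    # Hypothetical panel and detected CNVs; coordinates are placeholders.
    panel = [
        Region("chr1", 1_000, 2_000, "panel_locus_A"),
        Region("chr2", 500, 800, "panel_locus_B"),
    ]
    detected = [
        Region("chr1", 1_500, 3_000, "cnv_1"),  # overlaps panel_locus_A
        Region("chr1", 2_000, 2_500, "cnv_2"),  # abuts it but does not overlap
        Region("chr3", 0, 100, "cnv_3"),        # different chromosome
    ]
    print(screen(detected, panel))
```

The half-open convention decides edge cases such as `cnv_2`, which begins exactly where the panel region ends and so is not reported.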

The scientific community has the tools for more rigorous assessments than are currently made. How to apply those tools is undoubtedly a complex problem, and at this point a solution is far from clear. Certainly, though, a first step is to get discussion of this issue onto the agenda.