A common nursery school song asks “Which of these things is not like the others?” For stem cell lines, this question can be tough to answer. Even cells expressing the same cell-surface markers can have different origins and show disparate behaviours. Now, researchers led by Jeanne Loring at The Scripps Research Institute in La Jolla, California, have created a robust classification system by applying computer algorithms to data from about 150 cell samples [1]. Further analysis revealed a vast protein-protein network that maintains stem cells' pluripotency, or the ability to become any sort of cell in the body.

Jeanne Loring and Franz-Josef Müller

The project began because Loring and her colleagues suspected researchers were overgeneralizing results from one neural stem cell line to others. Usually, labs work with only one or two neural stem cell lines, which makes broad comparisons impossible, says Loring. She and her colleagues began asking labs to share data on gene expression and culture conditions. The subsequent analysis of more than five dozen brain-derived lines showed that these lines fell into many clusters.

Hoping to resolve the clusters, the researchers began adding more and more types of cell lines. Although neural stem cells continued to fall into several clusters, pluripotent human stem cell lines clustered together whether they were derived from embryos or from genetically engineered cultured skin cells. This indicates that both embryonic stem cells and so-called induced pluripotent stem cells will be similarly useful for drug screening or potential therapies. It also promises a technique for determining whether a cell is truly pluripotent. “The network captures pluripotency better than any markers we can find,” says Loring.

Although scientists routinely analyse gene activity within stem cells, the analysis technique and the number of lines analysed are unusual, says lead author Franz-Josef Müller. In particular, the scientists used an 'unsupervised' machine-learning approach that clustered cell types irrespective of whether they were labelled as neural stem cells, pluripotent stem cells or other types. When Müller proposed this strategy at a training course for microarray analysis at the Max Planck Institute in Berlin, other attendees were not impressed. Most people thought it best to analyse just one cell line at a time, he recalls. “They were like 'go home, think about what you want to do and come back later'.” But one of the instructors, coauthor Dennis Kostka, had used unbiased clustering techniques for categorizing cancer and thought there might be something to Müller's proposal. The first analysis did not show a clear picture of the clustering that the researchers ultimately identified, but adding more data from more samples improved the outcomes. “We had to embrace complexity,” says Müller.
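
To make the idea of 'unsupervised' clustering concrete, here is a minimal sketch in Python. It is not the algorithm used in the study, which the article does not specify. It is a plain average-linkage clustering on correlation distance, run on made-up expression profiles. The function names, the two toy 'programmes' and the noise level are all assumptions chosen for illustration. The key point it shows is that the sample labels are never passed to the clustering step and are consulted only afterwards to see what ended up together.

```python
import math
import random


def pearson_distance(a, b):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


def average_linkage_clusters(profiles, n_clusters):
    """Merge the two closest groups repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average distance over all cross-group sample pairs.
                total = sum(pearson_distance(profiles[p], profiles[q])
                            for p in clusters[i] for q in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def make_toy_samples(seed=0, n_genes=50, per_group=6):
    """Build synthetic profiles: a shared programme per group plus noise.

    Entirely invented data; the group names are labels for the toy only.
    """
    rng = random.Random(seed)
    programmes = {
        "pluripotent": [rng.gauss(0, 1) for _ in range(n_genes)],
        "neural": [rng.gauss(0, 1) for _ in range(n_genes)],
    }
    labels, profiles = [], []
    for name, base in programmes.items():
        for _ in range(per_group):
            labels.append(name)
            profiles.append([v + rng.gauss(0, 0.3) for v in base])
    return labels, profiles


labels, profiles = make_toy_samples()
# The clustering sees only the numbers, never the labels.
clusters = average_linkage_clusters(profiles, n_clusters=2)
# Labels are used only afterwards, to inspect what each cluster contains.
cluster_labels = [sorted({labels[i] for i in c}) for c in clusters]
print(cluster_labels)
```

In this toy, each cluster ends up containing samples of a single type. Real data are far noisier, which is consistent with Müller's account that the picture only sharpened as more samples were added.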

Not only did the researchers find that pluripotent stem cell lines clustered together, but they also found that the grouping became more robust as more cell lines were added. “All these samples cultured in different ways from all over the world are all very alike,” says Loring.

The gene-expression analysis they used was capable of finding differences between samples but not of finding networks common within clusters, so Loring and her colleagues used another sort of algorithm to probe publicly available data on protein-protein interactions. When they applied this analysis to the pluripotent stem cell lines, they identified elements of a regulatory network known in mouse cells, but this analysis also suggested that the known network was a relatively small component of a much bigger one represented by nearly 300 genes. Their analysis also helps resolve some apparent paradoxes. For example, the transcription factor Nanog is required in pluripotent stem cell lines but is not expressed in oocytes, which generate pluripotent stem cells. Other components of the network are overexpressed in oocytes, says Müller.
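
One simple way to picture how a small known network can turn out to be part of a larger one is sketched below in Python. This is an illustration under stated assumptions, not the researchers' algorithm, which the article does not name. The gene identifiers (`G1` to `G8`), the interaction list and the set of 'active' genes are all invented placeholders. The sketch starts from a few seed genes and follows protein-protein interactions outward, but only through genes that are active in the cluster of interest.

```python
from collections import deque


def expand_network(seed_genes, interactions, active_genes):
    """Breadth-first search from the seeds over interactions between active genes."""
    # Build an adjacency map, keeping only edges whose two ends are both active.
    neighbours = {}
    for a, b in interactions:
        if a in active_genes and b in active_genes:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)

    found = {g for g in seed_genes if g in active_genes}
    queue = deque(found)
    while queue:
        gene = queue.popleft()
        for nxt in neighbours.get(gene, ()):
            if nxt not in found:
                found.add(nxt)
                queue.append(nxt)
    return found


# Hypothetical toy data: placeholder gene names and made-up interactions.
interactions = [
    ("G1", "G2"), ("G2", "G3"), ("G3", "G4"),
    ("G4", "G5"), ("G5", "G8"), ("G6", "G7"),
]
active_genes = {"G1", "G2", "G3", "G4", "G5", "G6", "G7"}  # G8 is inactive
seed_genes = {"G1", "G2"}  # stands in for a small, previously known network

network = expand_network(seed_genes, interactions, active_genes)
print(sorted(network))
```

Here the two seed genes grow into a five-gene network. `G8` is excluded because it is not active, and `G6` and `G7` are excluded because no interaction connects them to the seeds.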

Ultimately, finding this robust network of proteins depended on a human network of scientists who shared data on their cell lines. These people were taking a risk, says Loring, as the analysis could have discovered information that might have made the lines from a particular lab seem irregular or inferior to others in the same category.

The researchers hope that both sorts of networks will expand. They have created a wiki where scientists can share ideas, questions and methods. The goal is that as more and more kinds of stem cell lines are included in the analysis, a sort of taxonomy of stem cell lines will emerge. Then, just as biologists can make reasonable predictions about the biology of one species from studies of its relatives, stem cell scientists will also know how their results on, say, blood, embryonic, neural or skin cells can be more broadly applied.

Genes within the identified regulatory network have already been linked to life span, embryogenesis and tumorigenesis. Further refinements should make it easier to predict and manipulate stem cells' behaviour, and plenty of data and cell types remain to be sorted. “There are lots of blurry edges,” says Müller. “These are very attractive.”