The problem of cell type became clear to genome biologist Jason Buenrostro in 2013. He was studying a cell line derived from someone with cancer, trying to map out how the DNA was arranged in the nucleus. The cells should have been pretty much identical, he thought. But the more Buenrostro looked at the DNA, the more differences he found in how it was packaged1. “I realized that there were probably hundreds of flavours,” recalls Buenrostro, who was a graduate student at Stanford University in California at the time.

This and other research pushed him to conclude that “every cell is a special snowflake”. And that conclusion significantly complicated his research into how some cancer cells develop resistance to drugs. For Buenrostro, now at Harvard University in Cambridge, Massachusetts, it meant that “all of these snowflake cells can actually be important”.

Despite their individuality, there are pragmatic reasons to group similar cells together. “Defining cell types is crucial for understanding new biological phenomena, elucidating underlying mechanisms, and identifying therapy targets,” says Zhang Zhang, a bioinformaticist at the Beijing Institute of Genomics at the Chinese Academy of Sciences.

Already, projects to build huge atlases of cells are yielding torrents of data and insights into disease. Since the mid-2010s, scientists categorizing cells have leant heavily on single-cell RNA sequencing, a technique that identifies the genes that each cell has turned on, to group cells with similar profiles. The Human Cell Atlas, a multinational collaboration launched in 2016, has analysed more than 90 million cells from more than 11,000 people in an ongoing effort to build 18 different atlases, and has published more than 440 studies.

But behind this progress lies a deceptively simple question: what exactly is a cell type?

“A cell type is a group of cells that are similar to each other and distinct from other groups of cells,” offers neuroscientist Hongkui Zeng, director of the Allen Institute for Brain Science in Seattle, Washington. But even that definition leaves plenty of room for interpretation: similar how? Distinct in what way?
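To see why "similar how?" matters, consider a toy sketch of grouping cells by the genes they have switched on. Everything here is an assumption made for illustration: the five cells and their read counts are invented, the cut-off for calling a gene "on" is arbitrary, and the Jaccard overlap with a linking threshold stands in for the far more elaborate normalization and clustering steps that real single-cell pipelines use.

```python
from itertools import combinations

# Hypothetical toy data: read counts per gene for five invented cells.
CELLS = {
    "cell_a": {"GFAP": 12, "AQP4": 7, "SNAP25": 0, "GAD1": 0},
    "cell_b": {"GFAP": 9, "AQP4": 5, "SNAP25": 1, "GAD1": 0},
    "cell_c": {"GFAP": 0, "AQP4": 0, "SNAP25": 14, "GAD1": 8},
    "cell_d": {"GFAP": 1, "AQP4": 0, "SNAP25": 11, "GAD1": 6},
    "cell_e": {"GFAP": 0, "AQP4": 0, "SNAP25": 10, "GAD1": 0},
}


def genes_on(counts, min_count=2):
    """Return the set of genes with at least min_count reads (assumed cut-off)."""
    return frozenset(gene for gene, n in counts.items() if n >= min_count)


def jaccard(a, b):
    """Overlap between two gene sets: shared genes divided by all genes seen."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_cells(cells, min_similarity=0.5):
    """Link any two cells whose profiles overlap enough, then return the groups."""
    profiles = {name: genes_on(counts) for name, counts in cells.items()}
    parent = {name: name for name in cells}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(cells, 2):
        if jaccard(profiles[a], profiles[b]) >= min_similarity:
            parent[find(a)] = find(b)

    groups = {}
    for name in cells:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


print(group_cells(CELLS, min_similarity=0.5))  # looser notion of "similar"
print(group_cells(CELLS, min_similarity=0.6))  # stricter notion of "similar"
```

With the looser threshold, the toy data fall into two groups; with the stricter one, cell_e splits off and there are three. The cells have not changed, only the definition of "similar".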

Ask a dozen researchers and you’ll get as many different answers. In fact, when the journal Cell Systems did that in 2017, 15 researchers offered wildly different suggestions, pointing to developmental history, molecular profile, shape or function as possible identifiers2. “The debates can be quite heated,” says Buenrostro.

Perhaps that’s because the question goes to the heart of how scientists conceive of the basic unit of life. Some researchers reject the view that a cell is a simple summation of gene expression, as atlases based on RNA sequencing imply. Others argue that a cell’s progression through various states over time must also be considered.

But most can agree on one thing, says Barbara Treutlein, a multicellular systems biologist at the Swiss Federal Institute of Technology (ETH) in Zurich. “There’s a general consensus that it is extremely complicated.”

Technology begets taxonomy

Such struggles over central definitions are hardly unique to cell biology: taxonomists have wrestled with ‘what is a species?’ for centuries, and geneticists had to confront ‘what is a gene?’ when the dogma of one gene making one protein began to break down decades ago.

Throughout the history of cell biology, cellular parts lists have appeared in many forms, reflecting the predominant technology of the time. Around 1900, microscopy was king, and researchers such as Spanish histologist Santiago Ramón y Cajal sketched cells and started grouping them by appearance. For example, certain common, stellate brain cells were called astrocytes, or star-like cells.

As a result of the molecular-biology revolution, which picked up steam in the mid-1900s, scientists learnt to classify cells on the basis of a limited set of molecular markers. So astrocytes became cells that produced glial fibrillary acidic protein, or GFAP, which is easily visualized by staining cells with antibodies or tagging the GFAP gene with green fluorescent protein.

Then came single-cell RNA sequencing, a method3 first published in 2009. Today, cell cartographers might define astrocytes by the host of RNAs that they express — an approach that has parallels with the use of comparative genomics to understand the evolution of species.

But none of those tools says much about what astrocytes do, which is to support neurons and synapses.

A fluorescent micrograph of a section through the cerebellum of the brain showing Purkinje cells.

Purkinje neurons (turquoise) in the cerebellum, a brain area that controls movement and other functions. Credit: Thomas Deerinck, NCMIR/Science Photo Library

Not that trying to group cells by function is easy. One class of cell that has often been categorized in a functional manner is neurons, which are frequently classified by the chemicals, such as dopamine or serotonin, that they release, says Anne West, a neurobiologist at Duke University in Durham, North Carolina. But many neurons make the same neurotransmitters. In the mid-2000s, for example, scientists debated how many types of interneuron produced the neurotransmitter GABA; estimates ranged from four to many, many more. West expects that ongoing work with single cells and RNA expression across the brain will help the field to agree on a number.

Nonetheless, cell function might be, “in principle”, the best way to define a cell type, says Joshua Sanes, a neurobiologist at Harvard. And part of a cell’s function, Treutlein adds, is its response to its environment. In living tissue, cells are constantly exposed to signals that could influence them, such as metabolites, hormones or pathogens. “You only really know what type a cell is once you also know how it responds,” Treutlein says. “These states, all together, will tell you what it is.”

She suggests that a future phase of cell atlases should include how cells respond to such changes — for instance, how cells might alter their developmental trajectories in response to drug treatments.

Unfortunately, cell responses and functions are not obvious for many cell types, and might be transient features. Working them out is time-consuming, and cells often change function when transferred from an entire organism into a laboratory dish for focused study.

This forces researchers to adopt more practical cell-typing criteria and explains the dominance of standardized molecular methods — mainly single-cell RNA sequencing, but also the technique developed by Buenrostro and others to examine how DNA is packaged4, as well as spatial methods that link these molecular markers to a cell’s place in tissues. By combining those approaches, “we’ve really redefined what cell types are”, says Sarah Teichmann, a genome biologist and biophysicist at the University of Cambridge, UK, and co-chair of the Human Cell Atlas project.

Have map, will travel

This new molecular approach has yielded exciting results that promise to rewrite much of what cell biologists know about the body. Indeed, although scientists once estimated that there were about 200 cell types in the human body, last year Zeng and her colleagues identified more than 5,000 RNA-based clusters — and therefore potential cell types — in the cortex of the mouse brain alone5.

Focused efforts are also turning up new types, even in well-explored tissues. For example, about 10 years ago, Sanes and his colleagues began investigating the cell types in the mouse retina with single-cell RNA sequencing. At the time, scientists had estimated that there were about 65; the new analysis6 netted at least 130. Before that, researchers had probably missed rarer types or very similar ones that the molecular methods could distinguish, Sanes suggests. Sanes and his collaborators are now comparing retina atlases from different species7.

Cell atlases directly affect medical investigations, too. Two independent research teams discovered a rare, new cell type that is potentially involved in cystic fibrosis8,9; another group profiled and mapped pacemaker cells in the heart10.

During the COVID-19 pandemic, many atlas researchers turned to investigations of the virus SARS-CoV-2, says Aviv Regev, a computational and systems biologist and head of research and early development at biotechnology company Genentech in South San Francisco, California, who co-chairs the Human Cell Atlas. Studies identified a variety of cell types that were susceptible to infection and showed how their cellular responses mirrored or diverged from those in other diseases11.

Regev says that Genentech is already using cell-atlas data in drug development. For example, one team has been testing a drug for lung disease that binds to a receptor found in cells in the lung. But perusing the cell atlas, the researchers discovered the same receptor in more cells located in the gut that are relevant to inflammatory bowel disease. This led them to test the same drug for that condition. Without the resource, they never would have noticed the similarity, says Regev.

Deeper definitions

Beyond the quest for therapies and the desire to make an inventory of the body, the cell-type question speaks to a deeper quandary: what is the basic unit of life?

“I would say broadly there are two camps,” says Itai Yanai, a systems biologist at New York University Langone Health. “One camp looks at cells, and the other camp looks at genes.”

One person firmly in the cell-focused camp is Alfonso Martinez Arias, a developmental biologist at the Catalan Institution for Research and Advanced Studies in Barcelona, Spain. He says that single-cell RNA sequencing creates a gene-centric view that distracts scientists from other questions. “I think a cell is much more than the sum total of the RNAs that it contains,” says Martinez Arias. For example, when he grows cells in a dish to model the early embryo, the RNA profiles of 2D cultures differ little from those of 3D organoids, he says — even though the 3D versions have very different structures and organizations.

For scientists such as Yanai, however, genes are the fundamental unit of life, and cells are manifestations of those genes. So cataloguing cell type by RNA makes sense: “You tell me which genes are on, I’ll tell you what cell type,” he says. For example, he says, the skin’s pigment-making cells, melanocytes, express a particular “melanocyte module” of genes.

A fluorescent light micrograph of the small intestine.

Villi in the small intestine, which help to absorb nutrients from food. Credit: Thomas Deerinck, NCMIR/Science Photo Library

Another rubric for defining cell type, says Yanai, is to look at the physical state of the genome in the nucleus — how the genome forms loops and coils, leaving some genes accessible and others sequestered, and governing which genes are available for transcription.

But even that genomic arrangement is controlled by other upstream genes and proteins. Could those regulatory molecules be considered the true root of cell types? Günter Wagner, an evolutionary biologist at the University of Vienna, Austria, thinks so.

Wagner and his colleagues have a theory12: that cell types are controlled by large complexes of transcription factors and other molecules called the ‘core regulatory complex’, or CoRC. This big ball of collaborating regulators would pluck the DNA strings to turn on some genes and suppress others, and therefore determine chromatin arrangement, RNA profile — and cell type. CoRCs have been defined for a handful of cell types, such as neural and blood cells, says Wagner, but it’s not clear yet how generalizable the concept is. He suspects that CoRCs would define a shorter list of cell types than would clusters based on single-cell analyses.

The CoRC “is kind of like the unicorn that you’re searching for, for what a cell type is”, says Jeff Doyle, a plant systematist at Cornell University in Ithaca, New York. He has seen hints of them in some plant-cell atlases.

As for the current focus on RNA sequencing, Teichmann admits that the critics have a point. “Of course, a cell type isn’t just the RNA profile,” she says. She notes that the Human Cell Atlas expects to incorporate different methods of cell typing; RNA analysis was just the first to become manageable at scale. And she says that it’s been powerful because RNA reflects other aspects of a cell’s biology, including the arrangement of chromatin and its complement of proteins.

Time and state

Cell types are often sketched according to a cell’s present identity. But a cell’s past and future are just as crucial, says Sam Morris, a stem-cell biologist at Washington University School of Medicine in St. Louis, Missouri. Even cells with seemingly stable identities might have the potential to turn into different types — such as an immune cell that activates to fight infection — or even turn cancerous or diseased under some conditions.

A cell’s past, of course, is of deep interest to developmental biologists, who study how one cell divides and diversifies to produce first an embryo and then an entire creature. That’s why the ultimate representation of cell types should be a tree-like structure, rooted in the body’s first cell and ending with mature types at the branch tips, argues Jay Shendure, a developmental geneticist at the University of Washington in Seattle. A parts list in an atlas, he says, “under-prioritizes the concept of time and the notion of continuity”.

Researchers are beginning to create the data that would underlie such trees. For example, in a study this year, Shendure and his colleagues tracked single-cell transcriptomes in mouse embryos from early development to birth and beyond. They found major changes in the RNA that was expressed in cells during the hour after birth, probably because the animals had to adapt to life outside the womb13.

Tracking cell types by developmental lineage has its own problems, however. There are rare instances when types that seem identical can arise through different trajectories. And it’s not clear yet how to categorize intermediate forms. “I still think there’s a question of, is cell identity a continuous property, is it a discrete property?” says Morris.

There’s also a more transient cellular property, called cell state, to consider. A cell’s type can remain consistent while its state radically changes: say, from newly born to preparing for the next cell division, or from quiescent to activated. It can be very challenging to distinguish impermanent cell states from true cell types, says Zhang.

Agree to disagree

If different technologies don’t classify cells in the same way, and every cell is an individual at the finest level, then what is a cell type?

If the concept still seems vague, that’s as it should be, argues Allon Klein, a systems biologist at Harvard Medical School in Boston, Massachusetts. He says that the concept can be both “extremely useful and poorly defined” at the same time.

That’s because ultimately, there’s no simple ground truth to find. Nature hasn’t created a neat parts list as a human engineer would, and any effort to delineate categories is in some sense artificial. The same is true of taxonomists’ efforts to define species: the question never really went away, says Klein, but the answers evolved as genetic data poured in. Klein thinks that something similar will happen in cell biology.

Researchers are already coming up with more nuanced ways to accept and account for cellular variation. Buenrostro and Regev have come to see cells less as members of a particular type, and more as collections of identities, based on the modules or pathways that a given cell is running at a given time. So a cell could be running, say, a stable ‘fibroblast’ program with overlay states of ‘activate to repair wound’ and ‘cell division’.
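One way to picture this layered view is as a lookup of which gene modules a cell is running at a given moment. The sketch below is only an illustration of the idea, not a method used by Buenrostro or Regev: the module names echo the example above, but the gene lists assigned to each module and the 60% cut-off for calling a module active are assumptions chosen for the example.

```python
# Hypothetical modules: the gene lists are illustrative assumptions only.
MODULES = {
    "fibroblast": {"COL1A1", "COL1A2", "PDGFRA"},
    "activate to repair wound": {"ACTA2", "FN1", "TGFB1"},
    "cell division": {"MKI67", "TOP2A", "CCNB1"},
}


def active_programs(genes_on, modules=MODULES, min_fraction=0.6):
    """List every module for which enough of its genes are switched on.

    A cell is described by all the modules it is running, rather than
    being forced into a single best-matching label.
    """
    active = []
    for name, module_genes in modules.items():
        fraction_on = len(module_genes & genes_on) / len(module_genes)
        if fraction_on >= min_fraction:
            active.append(name)
    return active


# An invented cell: all fibroblast genes on, most division genes on,
# and only one wound-repair gene on.
cell = {"COL1A1", "COL1A2", "PDGFRA", "MKI67", "TOP2A", "ACTA2"}
print(active_programs(cell))  # ['fibroblast', 'cell division']
```

Because the function returns a list rather than a single answer, the same cell can carry a stable identity and one or more transient states at once.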

The modules that matter to a given researcher will depend on their interests and perspective. That’s why the metaphor of a cell ‘atlas’ is so fitting, says Regev. Just as a geographical atlas combines natural features, political borders and other concepts, cell atlases can unify different versions of cell identity, no matter the user’s perspective or where they are headed.