Single-cell multi-omics means walking a tightrope, trading off such issues as cost and data sparsity. Credit: Huber & Starke / Getty Images

Single cells have much to say to those who can tune in, especially with the help of multimodal single-cell omics experiments. To assess the plethora of cell types and cell states, the heterogeneity of diseased tissue or the dynamics of protein movement into the nucleus, or to peer into the ‘black box’ of embryo development at high resolution, researchers might, for instance, wish to measure, all in the same cells, how accessible DNA is and how abundant mRNA and protein are, and seek to capture lineage information and change over time. The single-cell multimodal approach, the Nature Methods 2019 Method of the Year, entails experimental and computational integration that is not a push-button affair.

How the genome encodes the organism is “to me, the big question in genomics,” says Tim Stuart, postdoctoral fellow in the lab of Rahul Satija at the New York Genome Center (NYGC). Given that organisms hold “representations of themselves” in their DNA sequences, understanding these representations’ intricacies can, for example, help to predict how DNA sequence changes might affect an organism’s development or how certain DNA mutations lead to disease. Mutations in certain genomic regions have been associated with certain traits. Most of these loci are not in protein-coding regions but rather in ones that regulate gene expression, says Stuart. But which regulatory elements regulate which genes, and in which cell types are these elements active? “Gathering multimodal single-cell data is one approach that I think will be very powerful in working out the function of regulatory elements and their cell-type-specific activity,” he says. With multiple data modalities from a cell, researchers can begin to work out how the results relate to one another, he says. They can bring an understanding of how greater DNA accessibility at a certain location connects to increased gene expression.
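The idea of relating accessibility at a site to expression of a gene across the same cells can be sketched in a few lines. The example below is a simplified illustration, not the method implemented in any particular package: it assumes one accessibility value per peak and one expression value per gene for each cell, and it uses a plain Pearson correlation with an arbitrary cutoff. The function names, the toy data and the `min_r` threshold are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no co-variation signal
    return cov / sqrt(var_x * var_y)


def link_peaks_to_gene(peak_counts, gene_counts, min_r=0.5):
    """Return {peak_name: r} for peaks whose accessibility tracks the gene.

    min_r is an arbitrary illustrative cutoff, not a recommended value.
    """
    links = {}
    for name, counts in peak_counts.items():
        r = pearson(counts, gene_counts)
        if r >= min_r:
            links[name] = r
    return links


# Toy data: one value per cell, measured in the same six cells.
gene_expression = [0, 1, 0, 5, 4, 6]
peaks = {
    "peak_A": [0, 0, 0, 1, 1, 1],  # open in the cells that express the gene
    "peak_B": [1, 0, 1, 0, 1, 0],  # pattern that does not follow the gene
}
print(link_peaks_to_gene(peaks, gene_expression))
```

Because both measurements come from the same cells, the correlation is computed cell by cell; with data from separate experiments, that pairing would not exist.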

Software tools are needed to address such questions and have to be useable even by those without extensive computational expertise. This motivated Satija, Stuart, Caleb Lareau, colleagues at Stanford University and others to develop Signac1. “We developed the package specifically with multimodal data in mind and have incorporated methods to do things like link regulatory elements to genes that they might control,” says Stuart. Signac is software for labs analyzing single-cell chromatin data. It’s been integrated with Seurat, the Satija lab’s widely used single-cell RNA-seq analysis package. The ability to identify mitochondrial variants in single-cell assays is an exciting development, says Stuart. It applies to studies of clonal relationships among cells or the impact of pathogenic mitochondrial variants. Scientists can analyze data from experiments that measure—in the same cells—DNA accessibility, gene expression, protein abundance and mitochondrial variants. Getting such data is an ambitious endeavor for labs plumbing complex questions, such as development, which Wolf Reik, a researcher at the Babraham Institute, has long focused on.

One challenge researchers face with single-cell multi-omics methods, and not just in developmental biology, is data sparsity. As Reik says, every single-modality dimension “has the problem of sparsity.” One can computationally impute what’s missing in the sparse information, and with collaborators, he and colleague Stephen Clark, also at the Babraham, are exploring an “AI-guided” approach to filling in data gaps in new ways. But, says Reik, caution is advised because when multiplying one data-sparsity aspect with a second and a third, “you get into trouble quite quickly.” There is also a tightrope of tradeoffs in single-cell multi-omics: “Do you want more cells or more coverage per cell?” says Clark. “The answer is usually somewhere in between.”
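Both points lend themselves to simple arithmetic. The sketch below assumes, purely for illustration, that each modality independently detects a given fraction of the features in a cell, and that a fixed sequencing budget is split evenly across cells. The detection rates and read counts are hypothetical numbers, not values from any experiment described here.

```python
from math import prod


def joint_detection(rates):
    """Chance that a locus is observed in every modality of one cell,
    assuming (as a simplification) that modalities drop out independently."""
    return prod(rates)


def reads_per_cell(total_reads, n_cells):
    """Even split of a fixed sequencing budget across cells."""
    return total_reads // n_cells


# Hypothetical per-modality detection rates for one cell.
rates = [0.2, 0.1, 0.1]
print(f"joint detection: {joint_detection(rates):.3f}")

# Hypothetical budget of 400 million reads: more cells means less coverage each.
for n_cells in (1_000, 10_000, 100_000):
    print(n_cells, "cells ->", reads_per_cell(400_000_000, n_cells), "reads per cell")
```

Under these assumptions, three modestly sparse layers leave only a small fraction of loci observed in all three, which is the compounding effect Reik cautions about.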

ScNMT captures transcriptomic information, DNA methylation and chromatin accessibility from the same cell. DNA is labeled to visualize accessible chromatin regions; bisulfite sequencing captures epigenetic state. RNA is sequenced with Smart-seq2. Credit: Adapted with permission from ref. 4, Springer Nature

How to gastrulate

Developmental biologist Lewis Wolpert famously quipped: “It is not birth, marriage or death, but gastrulation, which is truly the most important time in your life.” Little wonder that the promise of measuring gastrulation at single-cell resolution entices developmental biologists.

During gastrulation, a body plan emerges from a ball of cells and cells change shape and location. In vertebrate gastrulation, some cells move outward, others invaginate. Three germ layers are established: ectoderm, mesoderm and endoderm. It’s a regulated affair as the layers interact, and there’s much epigenetic reprogramming. Single-cell analysis tools are helping researchers to characterize the molecular details of these events.

“It’s a piece of beauty,” says Reik, commenting on work2 by Shankar Srinivas of the University of Oxford and team and Antonio Scialdone from the Helmholtz Zentrum Munich and team to characterize human gastrulation on a single-cell level. They apply methods such as single-cell RNA-seq, data clustering using diffusion pseudotime and Velocyto, nicknamed RNA velocity, which is a computational way to tell ‘RNA time’. The scientists compared mouse and human gastrulation, performed immunocytochemistry assessments, analyzed the transcriptome and characterized cell types. The researchers see much commonality between human and mouse gastrulation and note that “mouse represents a good model of human gastrulation.” Among the dissimilarities are different paths cells take as they migrate. The epithelial-to-mesenchymal transition may, they suggest, be regulated differently in mice and humans.

Gastrulation has long captivated Reik’s and Clark’s interest. Together with colleagues at the University of Cambridge, the German Cancer Research Center and others, they used a method called scNMT to profile mouse gastrulation3, particularly germ-layer epigenetic events. They found, for example, that the primary germ layers emerge in a hierarchical fashion and that the epigenome shapes and guides cell fates and cell lineages. ScNMT4, an approach spearheaded by Clark and used mainly in developmental biology and cancer research, measures transcriptomic information, DNA methylation and chromatin accessibility from the same cell. “I like these three measurements,” says Clark, whom I interviewed jointly with Reik. Clark is now scaling up the method from a few hundred cells to soon, he hopes, tens of thousands.

Steps in scNMT after cell lysis include labeling DNA with GpC methyltransferase to visualize accessible chromatin regions. Next the DNA is prepared for bisulfite conversion and library preparation, and then sequenced to capture the epigenetic state of the cell. To obtain the transcriptome, the cell’s RNA is processed separately and sequenced using Smart-seq2, an approach for generating full-length cDNA and sequencing libraries. scNMT has involved ‘traditional’ library prep, says Clark, in which each cell is treated like an individual sample and molecular barcodes are added right at the end during PCR. “What we’re now doing is basically using methods to barcode as soon as we collect single cells,” he says.
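The labeling step can share a bisulfite readout with endogenous methylation because the two marks sit in different sequence contexts: a GpC methyltransferase marks cytosines that follow a G, whereas endogenous mammalian methylation is found mainly at cytosines followed by a G (CpG). The sketch below shows one way a cytosine could be assigned to one readout or the other. The function name, the toy sequence and the choice to set aside positions that fit both contexts are illustrative assumptions, not the authors' pipeline.

```python
def cytosine_context(seq, i):
    """Classify the cytosine at index i of a forward-strand sequence.

    Returns "accessibility" for GpC sites, "methylation" for CpG sites,
    "ambiguous" when the C sits in both contexts (GCG), else "other".
    """
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position is not a cytosine")
    after_g = i > 0 and seq[i - 1] == "G"              # GpC: enzyme-labeled
    before_g = i + 1 < len(seq) and seq[i + 1] == "G"  # CpG: endogenous
    if after_g and before_g:
        return "ambiguous"
    if after_g:
        return "accessibility"
    if before_g:
        return "methylation"
    return "other"


# Toy reference sequence (hypothetical), scanned for every cytosine.
reference = "AGCTTCGAGCGA"
for pos, base in enumerate(reference):
    if base == "C":
        print(pos, cytosine_context(reference, pos))
```

A real analysis would also handle the reverse strand and the read-level methylation calls; this sketch covers only the context assignment.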

In an afternoon they can collect 10 or even 30 plates of cells and do the barcoding steps right away. Then the cells can be pooled for downstream steps. “Generally speaking, the hard stuff comes at the early stages, before you’ve done any amplification,” says Clark. “Once you’ve amplified, you’re fairly safe.” The scale-up plan has no fixed timetable. “The short answer is: not yet,” he says. The team wants to be sure it scales robustly. “What I’m looking for is an experiment where I demonstrate that the data quality essentially replicates our previous method, but obviously with the increased scale,” he says. To assign cell types when using scNMT, the team uses RNA-seq data based on reference atlases and then computationally infers data. They find the major cell types, says Clark, “but we miss out on some of the more subtle types.” This, says Reik, highlights how important it is “that the wet-lab methods and the computational methods work hand and glove all the time.” The best scenario, he says, is for wet-lab and computational scientists to begin collaborating right at the project design stage.

Reik and Clark hope to learn more about single-cell epigenetic dynamics during development and how disease states are modified by gene expression changes. Profiling both the epigenetics and the RNA is gratifying, says Clark. “It’s so restrictive if you have to do it separately because you don’t see the relationship.” When combining data from measurements made in different cells, linkages between dimensions go missing, says Reik, and aspects that co-vary from one cell to another aren’t captured.

By exploring relationships between different modalities and data layers, scientists can develop hypotheses for mechanistic studies. With epigenetics and with gene regulation more generally, says Reik, one needs to consider “what comes first, what comes second, what comes third.” The sequence of events lets scientists infer causality. Epigenetic signals can be quite transient, says Reik. DNA methylation was once considered a longer-term change, but studies are showing it’s dynamic and often transient. Some histone methylation is also quite transient, as are other types of histone modifications. In the early embryo, methylation is globally erased and then re-established, but in a “very transient, turnover kind of fashion,” he says. In single-cell, multimodal experiments, researchers need to stay aware of how tissue dissociation can change gene expression, he says. Dissociating tissue on ice can mitigate some effects, says Clark, but doing this work on ice is also challenging.

Find your type

Traditionally, says Clark, cell types have been defined by morphology. As it’s become possible to assess gene expression differences, scientists are taking note of large gene expression differences between cells that would have been labeled as the same cell types in terms of morphology. “The same is true of the epigenome,” says Clark. In the future, it might become more common to catalog cells based on molecular features rather than morphological ones, he says. Reik points out that cells undergo transition states and identity shifts quickly and even undergo cell fate conversion. During development, cells travel from one area of the body to another and merge into a new identity upon arrival. Such aspects can be dissected by single-cell methods better than with other approaches.

Cell types are strongly influenced by their microenvironment, says Roney Santos Coimbra, a neuroscientist and molecular biologist at the Oswaldo Cruz Foundation (Fiocruz) in Belo Horizonte, Brazil. To study bacterial meningitis and Zika virus, he works with brain tissue slices such as from young rat hippocampus. Instead of cultivating one cell type in a dish, it’s “something similar to the real microenvironment,” he says. Coimbra and his team hope a microenvironmental view can help them to parse the details of infections. In bacterial meningitis, neurons die and progenitor cells are depleted. This is accompanied by a rise in homocysteine levels and a decrease in DNA methylation. With an in vivo model of bacterial meningitis, the team found that vitamin B12 lowers neuroinflammation and halts neuronal death5.

Coimbra says that vitamin B12 is a co-factor for methionine synthase in the remethylation pathway: homocysteine is converted to methionine and then into S-adenosylmethionine, which is a universal methyl group donor. Perhaps, he says, this explains vitamin B12’s apparent neuroprotective effect. Methyl-dependent epigenetic mechanisms might be shifting gene expression, modulating the homocysteine pathway and increasing DNA methylation. He and his colleagues have just completed transcriptomic analysis of their experiments and are evaluating reduced representation bisulfite sequencing (RRBS) results. RRBS is a way to home in on regions rich in CpG and analyze methylation genome-wide. The researchers focused on a few genes from their RNA-seq analysis, one of which is Ccl3, which appears to play a role in the immune response in COVID-19 as well. “We are working to integrate the data,” he says.

In a separate project using hippocampus slices to study the infection with Zika virus, the scientists have been exploring how infection forces cells to differentiate before they are supposed to, depleting the reserve of progenitor cells, says Coimbra. The researchers have been using RNA-seq to assess transcriptomic signatures of this derailment in brain development. Given that the work is in brain slices and not single cells, the RNA-seq and RRBS results will have background noise and the mixed cell population will mask some results. It’s a risk Coimbra has to take. “Working with single cells would be great for us, to deconvolute, to have clear signals,” he says. Higher resolution results would let the scientists iterate between single-cell experiments and mixed cells, which would help to characterize microenvironmental influences.

“We also must dream to do research in Brazil,” he says. RRBS reagents and library preparation are expensive. For this work, he traded bioinformatics analysis for reagents that other labs had in stock and crowdsourced funding from friends. He and his colleagues want to dig deeper into epigenetic regulation of infection and assess microRNAs, too. But it’s unclear if that can happen.

Nuclear dynamics

To characterize how cells react to changing conditions or disease, researchers need a window into the nucleus. With live cell imaging they can assess transcription factor levels and gene expression in cell nuclei. They might perform in situ measurement of protein and RNA using stains and hybridization or use flow cytometry to sort cells on the basis of fluorescent reporters. But such approaches make it hard to assess single-cell changes. Transcription factors on the move can influence transcription genome-wide. inCITE-seq6 is a way to make such multiplex measurements of a protein’s journey. The method involves DNA-coupled antibodies and RNA-seq on a droplet-based platform. It yields quantitative intranuclear protein measurements and transcriptomic information. “The most enthusiastic interest has been from cancer biologists who are interested in signaling proteins,” says Hattie Chung, the postdoctoral associate at the Broad Institute of MIT and Harvard who led inCITE-seq’s development. It’s been a journey to develop this method, which she hopes to keep scaling up. Chung was a member of Aviv Regev’s lab. Regev has transitioned to a new post at Genentech but remains supportive from afar, says Chung, who now works with Fei Chen at the Broad. He helped develop inCITE-seq, including advising on buffers.

Chung joined the Regev lab right as single-cell genomic approaches were being applied to identify the molecular identities of diverse cell types. “I thought that was super cool,” says Chung. It seemed like a natural extension to explore how to discern cells within the same cell type that are ‘turned on’. They change as they process incoming signals, and a signaling cascade can trigger movement of regulatory proteins such as transcription factors. “It’s a very complex set of shuttling of molecules across different locations,” she says. “There are dynamic changes within these cell types that we’re now charting at an unprecedented pace.”

Chung arrived at the Broad with a PhD unrelated to single-cell genomics or functional genomics. In pre-pandemic times, she had worked on bacterial pathogens and the evolutionary dynamics of pathogens with a focus on computational approaches. As a postdoc, she has become a wet-lab researcher. She explored how to iterate quickly and cheaply and avoid sequencing at every step. “That would have been exorbitantly expensive and not feasible even for our lab,” says Chung. Another group at the Broad had developed a HeLa cell line that expresses a p65–mNeonGreen reporter construct. It can tag the transcription factor p65, which belongs to a family of transcription factors called nuclear factor (NF)-κB that is involved in development and cell growth, as well as in cancer and inflammation. In untreated cells, p65 is in the cytoplasm, but upon stimulation with tumor necrosis factor-α, it moves to the nucleus. Total cell levels of the protein remain constant, but nuclear levels become highly elevated.

The idea was to distinguish treated from untreated cells on the basis of these different p65 levels in nuclei. “We needed a quick way to iterate upon our buffer recipes and see whether we were getting sensitive measurements or not,” says Chung. They used antibodies specific to p65 and flow cytometry. A green fluorescent reporter revealed when the protein levels in the nuclei shifted, so the team needed a signal to match that shift. They went through dozens of different buffers, she says. “We didn’t sequence anything until we were able to get buffer conditions that gave us the antibody signal against that same protein, which matched the protein signal from the green fluorescent reporter,” she says. A typical antibody against p65 delivers a noticeable signal shift as it moves from the cytosol to the nucleus. But when DNA tags are conjugated to the antibody for later sequencing purposes, background signal drowns out this shift.

Hattie Chung (bottom left) led the development of inCITE-seq as a postdoctoral associate in the Broad Institute labs of Aviv Regev (top), who is now at Genentech, and Fei Chen (bottom right). Credit: C. Atkins; V. Chudik; R. Majovski

A sticky place

As other groups working on spatial transcriptomic methods have found, says Chung, it’s tough to attach DNA tags to track antibodies in the nucleus. The nucleus is likely a “sticky place,” she says, chock full of proteins and other molecules always on the lookout for DNA motifs. Her immunofluorescence staining of cells in a dish assured her there was detectable antibody signal. It’s why she recommends “bread and butter” ways of validating signal with flow cytometry and in situ immunofluorescence. “They’re also just really cheap ways to iterate,” she says. “And they also give you visual proof.”

She and her team collaborated with the company BioLegend; the company handled the DNA–antibody conjugation for the group’s experiments. The company’s proprietary pipeline linked the tags to the antibody at a targeted spot, she says. One alternative would have been a less specific chemical attachment with amine-directed conjugation. But, she says, that risked having a variable number of DNA tags on antibodies or disruption of the antibody-binding sites. The researchers benefitted from the collaboration with BioLegend, which helped to defray costs, says Chung. To conjugate antibodies for the inCITE-seq experiments in the company pipeline, one to two milligrams of antibody were needed. The antibodies also had to be in a solution free of bovine serum albumin, which interferes with conjugation. When labs choose antibodies for DNA conjugation, she says, they will definitely want to be sure they have a clearly detectable signal. Although some labs might try conjugation on their own, she thinks a commercial platform might offer a more consistent way to control stoichiometry and to assure the tag is not interfering with the target recognition site. “That gives me peace of mind,” she says.

CITE-seq, or Cellular Indexing of Transcriptomes and Epitopes, captures expression of proteins on the cell surface at the same time as RNA expression. It was developed by Peter Smibert and colleagues, formerly at NYGC and now at 10x Genomics. The computational analysis aspect was developed in the Satija lab at NYGC. Antibodies that work in surface CITE-seq tend to have been tested for fresh or frozen tissue or suspensions, not formalin-fixed paraffin-embedded (FFPE) tissue, says Chung. Using frozen or fresh tissues fixed quite lightly is preferable for single-cell and single-nucleus analysis with inCITE-seq. But antibodies used in tagging intracellular or intranuclear proteins in immunohistochemistry commonly involve FFPE tissue, and the antibodies have been validated for that application. FFPE, she warns, changes the epitopes. “Don’t assume that antibodies that work well for FFPE necessarily work well for inCITE-seq,” she says.

inCITE-seq can reveal how a transcription factor reacts to cues. Here, p65 in the cytosol of untreated cells (left) moves into the nucleus upon treatment with tumor necrosis factor-α (right). Credit: H. Chung, Broad Inst. of MIT and Harvard. Adapted with permission from ref. 6, Springer Nature

Taking on headaches

Looking back on all the iteration, Chung wonders whether her computational background was helpful because it gave her a “very naive way” of looking at each experimental protocol, such as fluorescence in situ hybridization (FISH). “I want to know what every single chemical and every single buffer does,” she says. As she assessed FISH hybridization buffers, she talked through the experiment with Chen, who suggested trying dextran sulfate. “It worked!” she says. Chung sees “a huge need and appetite” for cheaper ways to get controlled DNA conjugation with antibodies. She is intrigued by a company called AlphaThera, which offers a way to conjugate DNA to antibodies with very little material, which would influence cost. “I have not actually tested any of these kits yet,” she says.

“My biggest headache was antibody conjugations,” says Andrew Tsourkas, a researcher at the University of Pennsylvania’s Department of Biomedical Engineering who co-founded AlphaThera in 2016. He was customer number one for the company, which has set up shop in the Pennovation Center, UPenn’s incubator. Tsourkas had kept trying to attach antibodies to nanoparticles. “It was a huge headache,” he says, and it was inefficient. “We just needed so much antibody to even make it happen.” It failed in different people’s hands. He and his colleagues developed and commercialized a method7 that might help Chung and her colleagues and other labs, too, he says, who are attaching oligos to antibodies as part of their multiplexed analysis to combine genomics, transcriptomics and proteomics. “The proteomics piece was always missing because of this challenge of attaching oligonucleotides,” says Tsourkas.

The company makes linkers that can attach oligos, a dye, a biotin or another label to antibodies. Once exposed to illumination at 365 nm, the oYo-Link binds the oligo covalently to the antibody. This ‘black light’ wavelength, commonly used in bars and nightclubs, does not damage oligos, proteins or nucleotides, he says. The linker is a protein G-based adaptor—a bacterially derived protein engineered with a light-specific moiety. When activated by light, it covalently binds to an antibody in a site-specific location in the antibody’s ‘tail’ region. Attaching in this fragment crystallizable (Fc) region avoids interaction with the antibody’s binding regions. The reaction itself can be performed in any kind of buffer, he says, and researchers can use low amounts of antibody. “We can label just a single microgram and they can run it on an SDS–PAGE, and they see it right away if the antibody has been conjugated or not,” says Tsourkas. The AlphaThera conjugation process takes around two hours.

Tsourkas and Feifan Yu, who heads R&D, quality control and manufacturing at AlphaThera and is the company’s first employee, believe oYo-Links can help in many types of multimodal single-cell experiments. The linkers, they say, give people a way to perform conjugation, avoid background signal challenges and help contain costs too. Once labs design oligos and find the right antibodies that attach reliably to the proteins of interest, says Yu, conjugation can be time-consuming or even beyond the expertise of a given lab to do on their own. She and Tsourkas know that people want to use as little antibody as possible for conjugation.

Yu says she has customers who have achieved consistent conjugation with oligos 100–150 nucleotides long. The goal of the company’s technology is to help people avoid time lost on troubleshooting reagents. “Troubleshooting might be a biology question, not a reagent question,” she says, laughing. She recommends antibody validation and recalls an instance from her time in academia when she assumed the ordered antibody was fine. She ran a gel and, she says, “there was no antibody there.”

At University of Pennsylvania spinout AlphaThera, Andrew Tsourkas and Feifan Yu want to help labs with antibody conjugation woes.

Wish list

Reik and Clark have a wish list of desirable single-cell, multimodal measurements that includes epigenomic, transcriptomic, spatial and proteomic analysis as well as lineage information. “I think that would be a very nice wish list for Christmas,” says Reik, whom I interviewed shortly before Christmas 2021. Reik adds, laughing, that this wish list might apply for next Christmas.

Current spatial methods, says Reik, tend to be static and do not capture the huge waves of cell movement in, for example, a developing tissue. He watches emerging techniques, such as those that apply in vivo light-sheet imaging, and that deliver multiple data modalities without killing or damaging cells. “We’ll get there eventually,” he says. CRISPR-based ‘scarring’ approaches, in which one creates genetic lesions for lineage tracing, make it possible to integrate cell lineage information. “That’s also hugely exciting,” says Reik. It can, for instance, capture an integrated impression of developmental processes from a fertilized egg to a particular point in time in development. What matters in developmental biology is “to understand how it all started and where it went and how it went from there,” he says.

Chung sees a number of labs, such as Nir Yosef’s team at the University of California, Berkeley and the Satija lab at NYGC, emphasizing ways to identify and discover cell types in samples, including computational approaches. The future may bring a representation of cells that is based not on a static cell type identity but one that captures shared pathways and shared signaling states. “I would love to now see the next extension, which is, how do we start to probe the more dynamic changes that cells undergo in their lifetime?” she says. Perhaps, she says, the current ways of analyzing these datasets for defining cell types can give way to representing cells in terms of their activity states, their pathways or the pathways shared across cell types.

The differences might be small compared to other molecular differences between cell types. Approaches are emerging that will “help refine our understanding of what is a cell type,” she says. Such trends may well get her to put “her computational hat back on,” she says. It’s an exciting challenge, she says, to move toward new analytical approaches that capture a cell’s functional profile, be that in its activity or in its readiness to receive signals from the external world, and how it responds to these cues.