Main

There are currently no general models that can reliably predict the phenotypic impact of a specific genetic change, and therefore broad screens of genetic perturbations (see definition in Box 1) will be with us for the foreseeable future. Such screens come in two flavors: those that aim to figure out how cells work and those that aim to build new genetic circuits or protein functions for medical or biotechnological applications; however, the challenge of mapping genotype to phenotype is usually similar. Typical approaches include (1) knocking in/out each gene and detecting changes in the phenotype of interest, (2) altering the regulatory or coding sequence of a specific gene and monitoring the resulting change in expression dynamics or function, or (3) labeling all proteins or genomic loci and studying how these move or localize in response to an environmental signal.

Classic genetic screening techniques, such as randomly mutating cells and seeing what survives, are rapidly being replaced by newer strategies. In the last ten years, our capability to make precise genomic changes has exceeded our wildest expectations, mainly owing to CRISPR–Cas9 (refs. 1,2,3). At the same time, microscopy has experienced a revolution in what temporal and spatial resolution can be achieved in living cells4,5. In this Perspective, we will focus on different approaches that combine these methods to study the impact of specific genetic changes by high-throughput live-cell imaging. To keep this piece focused, we have restricted ourselves to the imaging of synthetic libraries, which excludes the large body of literature that deals with imaging of the natural cell-to-cell variation in tissues, such as spatial transcriptomics6 or lineage tracing7, although these methods are closely related to our scope.

Arrayed libraries studied with imaging

The first examples of imaging-based screens used arrayed libraries (Box 1), where genetically modified cell strains (Box 1) were stored in physical isolation (for example, in a series of 96- or 384-well plates; Fig. 1a). Sampling the libraries in a way that preserves this order allows high-content time-lapse microscopy of many strains in one experiment.

Fig. 1: Different approaches to phenotype a library of genetically different cells.

a, Arrayed libraries have a high cost in labor to construct and spot them on plates for phenotyping, but each strain in the library can be phenotyped completely in any modality (imaging, sequencing, etc.). b, Pooled libraries where fluorescence intensity serves as the readout. Cells can be sorted by flow cytometry into bins on the basis of fluorescence intensity, and each bin is then barcoded and sequenced. The result is histograms of intensity for each perturbation in the library. c, In a competition screen, a pool-generated library has some selection pressure applied, and the populations before and after application of the pressure are then identified by sequencing and compared to determine enrichment or depletion. d, Another sequencing option is to isolate individual cells with droplet microfluidics and perform scRNA-seq, with the result being data that map a perturbation to changes in a transcriptional profile. e,f, Cells can be phenotyped on the fly, with a handful of strains of interest isolated for downstream analysis (e), or all strains can be phenotyped and fixed in place for genotyping (f). The former has the advantage of not requiring a barcode, making library construction simpler, whereas the latter approach allows larger-scale mapping of genotype to phenotype.

The need for large-scale phenotypic screens was already apparent when the technology for whole-genome sequencing became available8. Sequencing the yeast genome9 allowed for genome-scale targeted studies to replace random mutagenesis in this organism. An international multi-laboratory consortium generated both haploid and diploid knockouts of 2,026 open reading frames (ORFs) in the yeast genome10, generating an impressive resource that allowed for repeated pooled experiments or individual study of each knockout. A proof-of-principle demonstration of a strategy to replace genes with a unique 20-nucleotide genetic barcode (Box 1) was also described11. Taking advantage of the ability to cross haploid strains, arrayed libraries of double knockouts were also constructed12,13.

In two studies, libraries were constructed that would allow phenotyping by subcellular localization. One was a mixed library of plasmids and transposon-generated epitope tags of ORFs14. In the other, chromosomal green fluorescent protein (GFP) fusions to the end of all yeast ORFs15 were created. The latter approach allows for live-cell imaging. While the authors crossed in a red fluorescent protein (RFP) fusion with a defined spatial pattern as a landmark to aid analysis, cells were manually scored for each strain until over a decade later when machine learning took over16.

On the bacterial side, Taniguchi et al. quantified the expression of >1,000 fluorescent protein (FP) fusions by live-cell single-molecule microscopy in microfluidic chips. The imaging was followed by smFISH (see Box 1) against the FP transcript, thus quantifying both transcription and translation levels in the same cell17. Alternatively, it is possible to spot individual bacterial clones from an arrayed library on agarose18 or agar pads19,20. This approach was applied to the ASKA library of FP fusions21 to generate high-resolution space–time maps of protein locations18.

While the scale of the arrayed libraries and the data produced from these studies are impressive, the methods themselves are limited by the labor involved in performing the screens. Another general drawback of arrayed libraries is that the strains have to be cultured separately, which makes it hard to perform experiments under identical conditions. This may, in turn, limit which phenotypic differences can be resolved.

Pool-synthesized cell libraries

Pool-generated (pooled) libraries (Box 1) present an alternative approach to arrayed libraries. Early examples were created by randomly mutagenizing yeast with transposons, wherein the strains were screened on the basis of fitness22 or even subcellular localization of a transposon-generated fusion epitope for immunofluorescence23 (although, for the latter, we note that when cells were imaged in pools only population-level statistics about spatial patterns could be gathered).

Targeted pooled approaches to scale up strain generation are based on leveraging designable DNA oligonucleotide pools24. Pooled library synthesis makes it significantly easier and more affordable to generate many strains than with the arrayed approach, but the genetic identity of each cell is unknown until the individual cells are genotyped. At present, libraries of hundreds of thousands of designed oligonucleotide sequences, up to 200 nucleotides in length, can be generated for US$10,000–40,000. Smaller libraries (~10,000 oligonucleotides) cost approximately US$1,000, which makes approaches based on this technology affordable. Currently, the most common approaches are to (1) make genome-wide alterations or perturbations using CRISPR-based technology with pools of guide RNAs (gRNAs) or (2) focus on depth by varying a specific genomic locus or a mobile genetic vector. The former allows for wide screens to find targets for follow-up studies25. The latter is typically used to draw precise conclusions about the effect of variation in a specific sequence, such as the contribution of each nucleotide to protein–DNA binding26, or for optimization of protein properties (for example, fluorescent proteins (FPs)27,28, recombination machinery29 and the SARS-CoV-2 receptor-binding domain30).

In terms of altering a specific DNA locus, Kinney et al.26 developed a pioneering assay in which they built a library of bacterial strains with different promoter regions in front of a gene for an FP. By sorting the library into different bins using flow cytometry and sequencing the promoter regions of the cells in each bin, they could precisely quantify the contribution of any base in each sequence position to the promoter activity (Fig. 1b). This simple but elegant screen works because the phenotypic readout is fluorescence level. To quantify expression of any gene of interest, the approach has been extended and generalized by replacing flow cytometry with single-cell RNA sequencing (scRNA-seq) and fluorescence intensity with transcript expression levels31.
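The readout of such a sort-and-sequence experiment can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis used by Kinney et al.: the function name, the bin centers (taken here to be in log-fluorescence units) and the read counts are hypothetical, and read counts are assumed to be proportional to the number of cells of each variant in each bin.

```python
def mean_log_fluorescence(bin_counts, bin_centers):
    """Estimate the mean log fluorescence of one variant.

    bin_counts:  reads of this variant recovered from each sorting bin
    bin_centers: representative log-fluorescence value of each bin (assumed known)
    """
    total = sum(bin_counts)
    if total == 0:
        raise ValueError("variant has no reads in any bin")
    # Count-weighted average of the bin centers
    return sum(c * x for c, x in zip(bin_counts, bin_centers)) / total


# Hypothetical example: four sorting bins and two promoter variants
bin_centers = [1.0, 2.0, 3.0, 4.0]
reads = {
    "promoter_A": [90, 10, 0, 0],   # mostly in the dimmest bin
    "promoter_B": [0, 10, 40, 50],  # mostly in the brightest bins
}
estimates = {
    name: mean_log_fluorescence(counts, bin_centers)
    for name, counts in reads.items()
}
```

The per-bin counts themselves form the intensity histogram for each variant; the weighted mean is only one possible summary of that histogram.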

A similar experimental workflow was used to measure expression of an FP under the regulation, first, of 75 transcription factors in yeast32 and, later, of all combinations of 114 promoters with 111 ribosome-binding sites in Escherichia coli33. Johns et al.34 extended the concept to a wide range of organisms by barcoding the expression of >29,000 regulatory regions from 184 different bacterial species. They determined both the transcriptional efficiency, using the RNA-to-DNA ratio for each barcode, and the translational efficiency, using SORT-seq. In all cases, barcoding of bins leverages the power of next-generation sequencing to analyze pool-generated libraries of variants in one experiment.

Straightforward screens can also be made when the phenotypic readout is the fitness of a strain in a selective environment (Fig. 1c). In such experiments, the phenotype is typically the relative frequency of each genetic barcode (and thus each genotype) in the population before and after the fitness competition. This approach was the basis for the first generation of CRISPR libraries to multiplex mapping of a perturbation to the corresponding fitness phenotype for knockout35,36,37, knockdown38,39 and activation38,40 in mammalian cell lines. Before CRISPR, conceptually similar fitness screens were made with transposons41,42,43,44 or RNA interference45.
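As a minimal sketch of how such a competition readout can be computed, the barcode counts before and after selection can be converted to frequencies and compared as a log2 ratio. The function name, the pseudocount and the example counts below are illustrative assumptions and do not reproduce any specific published pipeline.

```python
import math


def log2_enrichment(before, after, pseudocount=1.0):
    """Log2 change in barcode frequency between two sequenced populations.

    before, after: dicts mapping barcode -> read count.
    A pseudocount (an assumption of this sketch) avoids division by zero
    for barcodes that drop out completely after selection.
    """
    n = len(before)
    total_before = sum(before.values()) + pseudocount * n
    total_after = sum(after.get(b, 0) for b in before) + pseudocount * n
    scores = {}
    for barcode, count in before.items():
        f_before = (count + pseudocount) / total_before
        f_after = (after.get(barcode, 0) + pseudocount) / total_after
        scores[barcode] = math.log2(f_after / f_before)
    return scores


# Hypothetical counts: barcode "a" is enriched, "b" is depleted
before = {"a": 100, "b": 100}
after = {"a": 150, "b": 50}
scores = log2_enrichment(before, after)
```

Positive scores indicate enrichment under the selection pressure and negative scores indicate depletion.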

Going beyond counting the frequency of certain barcodes in the population, single-cell sequencing can be used in multiple ways to assess the phenotype of individually perturbed cells (Fig. 1d), for example, to determine the transcriptional state of each individual cell (e.g., droplet-based scRNA-seq46,47,48) or the state of the chromosome (e.g., scHi-C49 or scATAC-seq50). A combination of both approaches has also been demonstrated (scNMT-seq51 and sci-CAR52). Pool-generated CRISPR libraries phenotyped by single-cell sequencing have been used in contexts ranging from immortalized mammalian cell lines53,54,55 to primary immune cells from Cas9-transgenic mice55 and even to cells collected from Cas9- and GFP-transgenic mice transduced with single guide RNA (sgRNA) vectors and then injected into a new host mouse56.

Screening pooled libraries with live-cell imaging

The major limitation of phenotyping based on sequencing is that the cells are lysed in the process, and, as a consequence, all spatial, morphological and dynamic information is lost. This is unfortunate because many of the phenotypes of interest to cell biologists and microbiologists require this type of data.

In the following section, we discuss approaches for microscopy-based screening of pooled cell libraries, which allow for high-resolution imaging and time-lapse microscopy. These methods overcome most of the challenges of the arrayed imaging screens, such as the amount of work needed and the challenge of maintaining identical experimental conditions for all strains. However, because all strains are handled the same, one may need to run the whole experiment under different conditions to maintain the dynamic range. For example, if a reporter protein is expressed at very different levels across a pooled library, imaging conditions that are optimal for capturing variation in intensity for one strain may result in saturated pixel values for another strain with a higher expression level.

First, we will describe some selected recent methods, divided by the approach to genotyping: selection of a few cells with desired phenotypes (Fig. 1e) or in situ genotyping of the entire library (Fig. 1f). Next, we will discuss the relative advantages of the different approaches.

Selection of a few cells out of many for genotyping

Some of the earliest methods to visually select individual cells from a heterogeneous population used photoinducible chemistry (which fits naturally with microscopy for phenotyping) to mark cells of interest. Photostick57 was one of the first such methods (Fig. 2a). In this approach, a small molecule cross-links the selected cells to the imaging substrate upon light exposure. Non-selected cells are washed away, and the remaining cells are identified, for example, by sequencing. The authors used the method to successfully engineer hippocampal neurons with a specific firing pattern.

Fig. 2: Different ways to connect imaging-based phenotypes to genotypes.

a, Cells are selectively attached to the substrate using photochemistry57. b, Photoconvertible molecules in or adjacent to cells are photoconverted, and the cells are sorted by flow cytometry59,61. c, Selected cells are mechanically moved to new locations64,65. d, Cells are grown and phenotyped in a microfluidic chip and moved to an exit channel by optical tweezers70. e, Cells are phenotyped on a culture dish or slide and then genotyped in situ27,80,81. f, Cells are grown and phenotyped in a microfluidic chip that allows for in situ genotyping76,77,79.

Rather than sticking the cells in place, photo-cross-linking can be used to fluorescently label cells, which can then be sorted with flow cytometry. For example, biotin-4-fluorescein was photo-cross-linked to selected cells58, and, similarly, photoconvertible quantum dots were attached to cells before selective photoactivation59. The demonstrated throughput of these methods is, however, limited to several hundred selected cells.

A next step along these lines is to have cells constitutively express a photoconvertible FP, convert the FP in cells that display a desired phenotype and then use flow cytometry to sort out cells of interest (Fig. 2b). This enabled selection of cells from tissues60 or from a large heterogeneous library61. The latter approach was used to study the efficiency of different nuclear localization signal peptides. In recent studies, where machine learning was used for characterization of phenotypes and automation for selecting and photoactivating cells, it was possible to scale the approach to thousands of gRNAs and millions of cells62. Yan et al.63 also showed that they can sort out many populations of cells in the same experiment by different degrees of photoactivation.

Multiple rounds of library imaging and selection of adherent cells were used by Piatkevich et al.64 to evolve proteins on the basis of complex criteria (Fig. 2c). In each round, a computer vision-guided automated micropipette was used to screen 300,000 cells expressing different protein constructs in ~4 h. In particular, Piatkevich et al. evolved a genetically encoded fluorescent voltage indicator, simultaneously optimizing its brightness and membrane localization.

A similar approach was taken by Wheeler et al.65. The researchers seeded a pool-generated CRISPR-edited human cell library at low density in polydimethylsiloxane (PDMS)/magnetic microwells to ensure one founder cell per microwell. The cells were observed by confocal imaging, and the microwells with interesting phenotypes were manually removed with a motorized microneedle for further studies65. In the screen, the authors identified RNA-binding proteins related to the stress-induced formation of punctate protein–RNA assemblies.

Microfluidics approaches66,67,68,69 are rapidly becoming the standard solution for bacterial imaging, as they allow for exponential growth over many generations, excellent imaging conditions, high reproducibility and highly controlled medium switches. A high-precision method for selecting individual strains from a microfluidic chip was presented by Luro et al.70 (Fig. 2d), who performed targeted mutagenesis on a genetic oscillator71. The researchers then loaded the resulting pool of strains into a microfluidic chip where they could be phenotyped for hours. Strains presenting desired characteristics (that is, more robust oscillations) were identified and individually selected with optical tweezers.

In situ genotyping of all cells in a screen

Selection-based methods are favorable when the goal is to extract a few interesting strains from a large pool. If, on the other hand, the aim is to map each genotype to its resulting phenotype, such methods quickly become impractical because of the laborious process of picking individual cells.

Imaging-based phenotyping of a pool-synthesized library followed by in situ genotyping72,73,74,75 of the whole library was described in 2014 (ref. 76); however, practical implementation of the approach was first demonstrated in two studies published in 2017 (refs. 27,77). Emanuel et al.27 developed a method to screen for novel FPs (Fig. 2e), where a large library of mutated FPs was expressed from plasmids in bacterial cells and imaged on a coverslip. The cells were fixed, and the fluorescence properties of the proteins were connected to the corresponding genotype through an expressed barcode RNA identified by multiplexed FISH74. The coverslip format allows for the screening of large libraries, but for a limited time span because the bacteria are not kept in a state of exponential growth. Simultaneously, our group77 implemented a microfluidic culture system that allows single-molecule microscopy in bacterial cells growing exponentially for many generations (Fig. 2f), named Dynamic μ-fluidic Microscopy-based Phenotyping of a Library before In situ Genotyping (DuMPLING). The microfluidic design also facilitates direct spatial mapping between the phenotyped cell and the RNA FISH-based genotype barcodes. This proof-of-principle demonstration was implemented on a very small CRISPR interference (CRISPRi78; Box 1) library constructed with barcoded plasmids. CRISPRi makes it possible to target genes anywhere on the chromosome while the gRNA is expressed from one position. This simplifies in situ genotyping enormously because the genetic alteration is in the same place in all strains.

We later used the microfluidic format to identify genes related to synchronization of the division and replication cycles in E. coli79. In this study, a pooled CRISPRi library was used to monitor the effect of different gene knockdowns on DNA replication by time-lapse imaging of replication forks throughout multiple division cycles in hundreds of different strains. As in the previous study77, phenotypes were mapped to genotypes in situ by sequential FISH probing of an RNA barcode. The structure of the microfluidic system accommodates many physically isolated strains in the same field of view, making it possible to perform time-lapse microscopy on tens of thousands of bacteria with 1-min time resolution.
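The mapping step from sequential probing to strain identity can be illustrated as follows. This is a simplified sketch, assuming that each probing round yields one color call per imaged position and that a codebook links each color sequence to a perturbation; the names `decode_barcodes`, `codebook` and the sgRNA labels are hypothetical, and the sketch ignores the error handling a real screen would need.

```python
def decode_barcodes(observed_signals, codebook):
    """Map the color sequence read at each position to a genotype.

    observed_signals: position -> list of color calls, one per probing round
    codebook:         tuple of colors -> genotype label
    Positions whose color sequence is not in the codebook map to None.
    """
    decoded = {}
    for position, colors in observed_signals.items():
        decoded[position] = codebook.get(tuple(colors))
    return decoded


# Hypothetical three-round, two-color codebook
codebook = {
    ("red", "green", "red"): "sgRNA_1",
    ("green", "green", "red"): "sgRNA_2",
}
observed = {
    0: ["red", "green", "red"],
    1: ["green", "green", "red"],
    2: ["red", "red", "red"],  # not in the codebook, left unassigned
}
genotypes = decode_barcodes(observed, codebook)
```

Because the decoded label is keyed by position, it can be joined directly to the time-lapse phenotype recorded at the same position before fixation.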

Implementation of a large-scale imaging-based pooled screen in human cells was performed by Feldman et al.80, who studied knockouts of 1,000 different genes with 4,000 distinct barcodes (Fig. 2e). In total, 20 million cells were analyzed across these screens, where cell nuclei were tracked using a DNA stain and the nuclear translocation of p65–mNeonGreen was assessed at each time point. Following live-cell phenotyping, cells were fixed and the identity of the disrupted gene was determined by in situ sequencing of the sgRNA sequences, as well as barcodes, using an extension of the gap-fill padlock rolling circle amplification approach72. The gap-fill approach requires that the ends of the hybridization probe bind on each side of the barcode (the remainder of the probe sequence loops out and is not hybridized), such that the polymerization reaction can integrate the barcode sequence into the circular template. The template is next amplified by rolling circle amplification. Another example of an application in eukaryotes was demonstrated by Wang et al.81, who studied the effect of 54 CRISPRi knockdowns on RNA localization to nuclear compartments. In this study, fixed cells were used for phenotyping with FISH probes and antibodies, and the genotypes were assessed by multiplexed FISH.

When to use what: comparing the strengths and weaknesses of the different methods

In the following subsections, we contrast the relative strengths and weaknesses of different imaging-based methods and give practical guidance on how to pick the appropriate approach for different applications. Please also see the roadmap in Fig. 3.

Fig. 3: Roadmap to method selection for library screening.

Depending on how you answer the questions, different methods may be most suitable for your needs. Please see the text for a more complete reference list.

Pooled library generation

The relative ease and flexibility of library generation for selection-based methods follow from the fact that the cells can be cultured and manipulated downstream of the imaging step (Fig. 2c,d). In this way, essentially any DNA variation can be identified by nucleic acid sequencing. In principle, even a mixed population of unknown cell types can be analyzed, for example, in an environmental sample. In contrast, if the experiment requires identification of all the different clones in the library, genotyping in situ is usually necessary and generally requires a barcode nucleic acid sequence that is separate from the sequence that defines the variation or perturbation (Fig. 2e,f). Most applications use an RNA barcode so that many copies of the barcode can be generated before fixing the cells (although we note exciting recent work that allows for amplification of a desired sequence after fixation82).

A common problem for methods with barcode sequences physically distant from the genetic perturbation is the formation of mismatched barcodes (chimeras). In the CRISPRi setting, for example, there is generally a short sequence that acts as a linker between the promoter for the barcode RNA and the sgRNA gene. This linker sequence is ripe for recombination, for example, during amplification of the oligonucleotide pool (that is, chimera formation during PCR). A solution to the issue of barcode–perturbation mismatch was proposed in the context of CRISPRi screening with scRNA-seq readout54. The authors devised a construct with two promoters upstream of the sgRNA sequence resulting in two transcripts with different functions: (1) a CRISPR gRNA that is generated by RNA polymerase III and (2) a polyadenylated transcript that is generated by RNA polymerase II, which can be captured for identification by scRNA-seq.
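One way to gauge the extent of the mismatch problem in a given library is to sequence barcode and sgRNA together and compare each observed pair with the design table. The sketch below assumes that such paired reads are available (for example, from a read spanning both elements); the function name and the example identifiers are hypothetical.

```python
def chimera_fraction(read_pairs, design):
    """Fraction of reads whose barcode is paired with the wrong sgRNA.

    read_pairs: iterable of (barcode, sgRNA) tuples observed by sequencing
    design:     dict mapping each designed barcode to its intended sgRNA
    Reads with a barcode absent from the design are ignored.
    """
    matched = 0
    mismatched = 0
    for barcode, guide in read_pairs:
        if barcode not in design:
            continue  # sequencing error or contaminant; not counted
        if design[barcode] == guide:
            matched += 1
        else:
            mismatched += 1
    total = matched + mismatched
    return mismatched / total if total else float("nan")


# Hypothetical design table and reads
design = {"BC1": "g1", "BC2": "g2"}
pairs = [("BC1", "g1"), ("BC1", "g2"), ("BC2", "g2"), ("BCX", "g1")]
fraction = chimera_fraction(pairs, design)
```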

Phenotyping: what is the time scale of the process?

For phenotypes that can be determined by a single image, like cell morphology or spatial distribution of molecules, fixed cells on a coverslip offer an easy and viable approach. However, as has been discussed elsewhere, static distributions can result in an ambiguous picture of the underlying mechanism83. In such cases, it is essential that the same cell be observed at multiple time points. For short time scales (shorter than a cell division), cells adhered to or sandwiched on a coverslip may be sufficient80. However, a microfluidic approach is indispensable when studying processes on a longer time scale, that is, more than one cell generation70,77,79.

Connecting phenotypes to genotypes: breadth versus depth

Selection-based methods have the advantage that the identified cells can be separated and cultured for downstream analysis (such as scRNA-seq or Hi-C to assay the state of the transcriptome or chromosome conformation, respectively) and to make stocks for later use (Fig. 2c,d). This naturally enables a more complete and deeper view of the phenotype resulting from each selected perturbation. In situ identification methods have not yet been successfully combined with other phenotyping assays downstream of the imaging phase, but, theoretically, they are compatible with transcriptome-scale in situ methods such as seqFISH84,85 and MERFISH86.

The obvious limitation of the selection approaches is throughput. The selected cells must be isolated either one at a time, for example, by optical tweezers70, or in a batch, for example, by photoactivation and sorting by flow cytometry. By contrast, in situ genotyping methods generally reveal the genotype of all cells that are imaged (Fig. 2e,f), giving far greater breadth to the results.

A method to increase throughput of selection-based genotyping is imaging-activated cell sorting, where cells briefly pass the microscope (for example, see refs. 87,88). However, imaging resolution has remained insufficient for phenotyping beyond distinguishing coarse differences in cell types. A recent microfluidic solution for improved image resolution uses a PDMS valve to transiently press cells against a coverslip. The trapped cells are then either kept or discarded89. While this approach allows for single-molecule fluorescence microscopy, the throughput is relatively low, and, as several cells are trapped together, additional selection steps would likely be required for most applications.

Genotyping in situ: sequencing or hybridization

A division within in situ identification approaches is the method of genotyping: sequencing or hybridization. The sequencing methods are generally extensions or adaptations of sequencing-by-ligation protocols72,73, whereas the hybridization methods are adaptations of combinatorial RNA FISH74,75. The advantage of the sequencing methods is that the barcode can be very compact, and diversity scales as 4^L, where L is the barcode length. For some library designs, such as in an operator library, it is even possible to read out the genetic variation directly. However, these methods require multiple enzymatic steps in situ, which has proven to be difficult in microfluidic settings77. Hybridization methods are generally less experimentally challenging. While FISH methods require longer target regions (typically 15–20 nucleotides per probe), they have been demonstrated to scale up to 60,000 variants, with a million barcodes being a viable extension29.
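The exponential scaling of barcode diversity can be made concrete with a short calculation. The sketch below assumes that every position of the barcode is read without error and that all combinations are usable, which is optimistic because practical designs typically reserve part of the barcode space for error detection; the function name is hypothetical. The same calculation applies to a combinatorial hybridization scheme if the alphabet size is taken to be the number of distinguishable signals per round and the length to be the number of rounds, which is an assumption of this sketch rather than a description of a specific protocol.

```python
def min_barcode_length(n_variants, alphabet_size=4):
    """Smallest length L such that alphabet_size**L >= n_variants."""
    if n_variants < 1 or alphabet_size < 2:
        raise ValueError("need n_variants >= 1 and alphabet_size >= 2")
    length = 0
    capacity = 1
    while capacity < n_variants:
        capacity *= alphabet_size
        length += 1
    return length


# Sequenced nucleotides (alphabet of 4) for the library sizes mentioned in the text
length_60k = min_barcode_length(60_000)          # 4**8 = 65,536
length_1m = min_barcode_length(1_000_000)        # 4**10 = 1,048,576
# Hypothetical three-signal readout per hybridization round
rounds_60k = min_barcode_length(60_000, alphabet_size=3)
```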

Future directions

The methods presented in this piece have three methodological concepts in common: library construction, genotyping and phenotyping. While great advances have been made, some aspects of the current methods leave considerable room for improvement. The tools for cloning and genome engineering are rapidly progressing and moving past most conceivable obstacles, even in a pooled format. In eukaryotes, new tools are moving library-scale gene editing with CRISPR toward reliable, specific and targeted gene edits (for example, by improving template delivery90,91). Methods for robust in situ identification of chromosomal barcodes at the single-cell level have been limiting, but important progress in this direction has been reported82.

Methods for imaging-based phenotyping have few limitations in enabling the study of intracellular dynamic processes in perturbed libraries beyond the usual microscopy caveats of labeling, resolution and cell toxicity. A logical phenotyping modality to combine with large knockout screens is high-dimensional unbiased imaging of cellular morphologies, such as Cell Painting92. Imaging-based screens can also be taken into the dizzyingly large combinatorial space of drug combinations93 or interactions of different combinations of cell types94,95.

The major limitation of live-cell phenotyping of libraries is throughput: it is not possible to move the microscope stage and image with sufficient speed to capture more than a few frames per second, which introduces a trade-off between the number of strains imaged and time resolution. Large field-of-view setups96 will be at least part of the solution. However, in the near future, it is likely that these experimental challenges will be small compared to those related to handling and analyzing the data. As an example, we are already acquiring 1 TB of relevant physiological data per day in DuMPLING screens79. The staggering volumes of high-quality data are an obvious match for machine learning approaches, as has been covered elsewhere97,98. As an early example, a support vector machine was used to derive meaningful phenotypic information from fluorescence microscopy data for 20 million cells representing an arrayed library of 5,000 genetically altered yeast strains99.
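The trade-off between the number of imaged positions and time resolution, and the resulting data volume, can be estimated with simple arithmetic. In the sketch below, the time per position (stage move plus exposure) and the camera format are illustrative assumptions and not parameters reported for the screens discussed here; the function names are hypothetical.

```python
def max_positions_per_cycle(time_resolution_s, seconds_per_position):
    """Number of stage positions that fit into one time-lapse interval."""
    if seconds_per_position <= 0:
        raise ValueError("seconds_per_position must be positive")
    return int(time_resolution_s // seconds_per_position)


def bytes_per_day(n_positions, time_resolution_s, bytes_per_image):
    """Raw data volume for one imaging channel over 24 h."""
    frames_per_day = int(24 * 3600 // time_resolution_s)
    return n_positions * frames_per_day * bytes_per_image


# Assumed: 1-min time resolution, 0.5 s per position,
# 2048 x 2048 pixel camera with 2 bytes per pixel
positions = max_positions_per_cycle(60, 0.5)
daily_volume = bytes_per_day(positions, 60, 2048 * 2048 * 2)
```

Under these assumed numbers, a single channel already produces on the order of a terabyte of raw images per day, which illustrates why data handling quickly becomes the bottleneck.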

As with all machine learning applications, image analysis may be sensitive to biases in manually curated reference datasets; thus, in building standardized tools, it is important to test for such subjectivity in the final results100. Much work is still needed to avoid the pitfall where each laboratory designs a tool for its own dataset, making standardization of analysis nearly impossible. Starfish is an effort by the Chan Zuckerberg Initiative to combat this by standardizing dot detection in single-molecule spatial transcriptomics experiments101, which is relevant for genotyping in situ and will likely produce tools that are useful for phenotyping where dot detection is involved. Another institutional effort is CellProfiler, an endeavor to fill the space of general phenotyping tools102.

Discussion

We will soon be at a state in biological research where our capacity for making genetic perturbations and for detailed imaging of complex phenotypes in individual cells will have few real limitations, except for practicalities such as cost, storage space, and imaging and analysis speed. For a long time, biological research has been akin to trying to understand how a commercial airliner works by removing one part at a time and observing when it crashes. We now have the tools to gently turn the knobs in the cockpit and at the same time monitor the flaps on the wings. However, because the number of possible genetic alterations, even in a small bacterial genome, massively exceeds the number of atoms in the universe, and the number of phenotypes that can be studied is even more bewildering, the real challenge still lies in making clever and specific experimental designs and developing the tools to analyze the data generated. So, in conclusion, focus on the biological question of interest, look at the relevant flap and turn the right knobs gently.