Introduction

Transposon-insertion sequencing (TIS) methods combine large-scale transposon mutagenesis with next-generation sequencing to estimate the essentiality and/or fitness contribution of each genetic feature in a bacterial genome simultaneously. A strength of TIS is that experiments are performed with pooled transposon libraries, which allows direct linkage of phenotype to genotype in a high-throughput manner. Ultimately TIS aims to elucidate the function of each genomic feature and is therefore a critical tool to help interpret the mounting levels of genome sequencing data being generated. TIS methods can be sensitive enough to detect even minor changes in mutant fitness but also, with sufficient density, precise enough to be able to assay not only genes but also intergenic regions, promoter regions and essential protein domains within coding regions. Four variations on the TIS method were published in 2009: transposon sequencing (Tn-Seq)1, transposon-directed insertion site sequencing (TraDIS)2, insertion sequencing (INSeq)3 and high-throughput insertion tracking by deep sequencing (HITS)4. Since then, TIS has become a valuable tool in our molecular biology toolkit, whose full utility is still being explored.

The basic TIS workflow is summarized in Fig. 1. Briefly, it begins with construction of a saturated mutant library (Fig. 1A) by introducing a randomly inserting transposon, commonly a Tn5 or mariner transposon, into a strain of interest often by transformation or conjugation. The goal is to create a population of bacteria where each cell carries a single transposon insertion in the genome, and when cells are pooled together, each genetic component is disrupted multiple times at different sites. By direct sequencing of the transposon-flanking regions of the initial library, potential essential features can be identified as those that do not tolerate insertions. Alternatively, the library can be subjected to a selective condition, for instance antibiotic stress (Fig. 1B), to query non-essential features involved in survival and growth within that environment. Such conditionally important components are defined by insertions whose frequency significantly changes in the population during the selection, determined by sequencing before and after selection. Genomic features that have disruptive transposon insertions with a decrease in frequency over experimental selection are assumed to be important for fitness in the test conditions; such features could include antibiotic resistance genes during antibiotic selection or virulence factors in an infection model. Features where insertions show an increase in frequency are assumed to have a disadvantageous effect in the test conditions, including negative regulators of fitness-enhancing features, or metabolically costly systems that are not necessary in those conditions.
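
The before-and-after comparison described above can be illustrated with a minimal Python sketch. It assumes that read counts have already been summed per gene; the function name, the pseudocount of 1 and the toy counts are illustrative assumptions rather than part of any published TIS pipeline, and real analyses add replicate-aware statistical testing.

```python
import math

def log2_fold_changes(before, after, pseudocount=1.0):
    """Per-gene log2 ratio of insertion read frequency (after vs. before selection).

    `before` and `after` map gene name -> read count summed over all insertions
    in that gene. Counts are converted to frequencies so that differences in
    sequencing depth are not mistaken for fitness effects. The pseudocount
    (an assumption of this sketch) avoids taking the log of zero.
    """
    n_genes = len(before)
    total_before = sum(before.values()) + pseudocount * n_genes
    total_after = sum(after.get(g, 0) for g in before) + pseudocount * n_genes
    result = {}
    for gene, count_before in before.items():
        freq_before = (count_before + pseudocount) / total_before
        freq_after = (after.get(gene, 0) + pseudocount) / total_after
        result[gene] = math.log2(freq_after / freq_before)
    return result

# Toy counts (hypothetical, for illustration only).
before = {"geneX": 1000, "geneY": 1000, "geneZ": 1000}
after = {"geneX": 1000, "geneY": 100, "geneZ": 1900}
lfc = log2_fold_changes(before, after)
```

In this toy example, insertions in geneY become rarer during selection (negative value, suggesting the gene is important for fitness in the condition), whereas insertions in geneZ become more frequent (positive value, suggesting that disruption is advantageous).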

Fig. 1: Basic TIS method overview.

A | Creation of the transposon-insertion sequencing (TIS) library has four steps. The first is to create random transposon (Tn) mutants (part Aa). The horizontal black lines and arrows represent the host’s genomic DNA (gDNA) and the coding regions of genes are marked ‘X’, ‘Y’ and ‘Z’. The horizontal blue line is the transposon containing an antibiotic resistance (AbR) selection marker and bounded by inverted repeats, shown in green. When the transposon inserts itself into the gDNA, the disruption in a gene (gene Y in this example) is shown by a red cross. The second step is to select and pool mutants (part Ab). The red cross represents a single mutation in each cell; these cells are selected, for instance on antibiotic-containing agar plates, and pooled and DNA is extracted. The third step is fragmentation, addition of adaptors and PCR amplification (part Ac). Fragmentation (vertical dashed lines) can be enzymatic or by shearing (depending on the version of TIS). Sequencing adaptors (yellow rectangles) are then added, and primers (purple arrows) P1 and P2 are used for PCR amplification. Step 4 is sequencing and mapping (part Ad). Sequences out from the transposon end (using primer P3) are mapped onto the reference genome and the transposon insertion point is determined (vertical red arrow) and mapped for each mutant. Genes that cannot tolerate insertions (gene Z in this example) will not have any TIS reads mapped. B | Challenging the TIS library — in this example with an antibiotic, colistin (bottom row), compared with an untreated control (top row) (data from ref.42). The vertical lines denote the density of insertions at each insertion site, and red and blue denote forward or reverse insertion direction, respectively. Below are the predicted genes, in light blue. The first gene (usq) has equivalent numbers of insertions in the treated and untreated samples and thus has no effect on fitness in colistin. 
The next gene (truA) has relatively more insertions in the treated sample compared with the control (its mutants have increased fitness in colistin) and thus is considered a sensitivity gene. The next gene (dedA) is an experimentally confirmed resistance gene42 and it has decreased insertions in the treated sample (mutants have decreased fitness in colistin). The last two genes (accD_1 and folC) have no insertions in either the treated sample or the untreated sample and are thus considered essential for growth.

There are four major TIS versions that differ in various steps of their sequencing procedures (see ref.5 for more detail on these variations). For example, the way DNA undergoes fragmentation for library preparation differs: Tn-Seq and INSeq use the type II restriction enzyme MmeI to yield uniform-length shorter reads, which can remove PCR amplification bias, whereas TraDIS and HITS use random-sized shearing via sonication, which can have the advantage of improved transposon mapping owing to longer reads. Similarly, Tn-Seq and INSeq exclusively use the mariner transposon, which inserts itself into thymine–adenine dinucleotide (TA) sites but otherwise does not have a sequence preference, whereas TraDIS and HITS can use any transposon but commonly use Tn5, as it is commercially available and does not have an insertion site bias. After fragmentation, various adaptors are added, and transposon–genome junctions are amplified and sequenced with a sequencing primer facing out of either the transposon or the adaptor. Finally, mapping of the adjacent genomic DNA allows the exact position of each transposon in the bacterial genome to be determined with use of appropriate bioinformatic tools (see Developments in TIS data analysis).
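
Because mariner inserts only at TA dinucleotides, the set of possible insertion sites in a genome can be enumerated directly from its sequence. The following Python sketch shows one way to do this; the function names, the toy sequence and the gene coordinates are hypothetical and serve only as an illustration.

```python
def ta_sites(sequence):
    """Return the 0-based start position of every TA dinucleotide.

    TA is its own reverse complement, so scanning one strand is enough
    to find every candidate site on both strands.
    """
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "TA"]

def ta_sites_per_gene(sequence, genes):
    """Count TA sites lying fully inside each gene.

    `genes` maps name -> (start, end), 0-based and end-exclusive
    (a coordinate convention assumed for this sketch).
    """
    sites = ta_sites(sequence)
    return {
        name: sum(1 for s in sites if start <= s and s + 2 <= end)
        for name, (start, end) in genes.items()
    }

# Toy sequence and gene coordinates (hypothetical).
genome = "ATGTACCGTAGGCTAA"
sites = ta_sites(genome)
per_gene = ta_sites_per_gene(genome, {"a": (0, 6), "b": (6, 12)})
```

Counting candidate sites per feature in this way gives the denominator against which observed insertions can be compared when a mariner library is assessed for saturation.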

Since the last comprehensive reviews on TIS5,6 in 2013, a range of exciting and multidisciplinary methods that build on TIS have emerged to answer increasingly complex biological questions. These advances include scaling TIS analysis to hundreds of different conditions using high-throughput phenotyping, the use of machine learning to predict bacterial survival outcomes and combining TIS with cutting-edge techniques from single-cell analysis (droplet Tn-Seq (dTn-Seq)) to fluorescence sorting (TraDISort). Analysis tools have also evolved to cope with this increase in complexity of TIS studies. Lastly, a broad range of in vitro and in vivo applications of TIS have been implemented in pathogenic, commensal and environmental bacteria in the past decade. In this Review, we discuss these exciting developments and applications of TIS and present our vision for TIS into the future. We refer readers to previous reviews5,6,7 for detailed information on the design of TIS experiments, including choice of transposon and statistical impacts of experimental parameters, comparisons of TIS method variations, limitations of standard TIS and details on applications before 2013.

Advances and extensions of TIS methods

Over the past decade, TIS methods have been developed to incorporate other technologies and techniques to answer complex biological questions in creative ways. These include physical separation and sorting of individual mutant cells, using inducible promoters to study essential genes and scaling of current techniques to simultaneously screen multiple environments and different species, facilitating pan-organism analysis (Fig. 2).

Fig. 2: Extensions to the TIS method.

a | Physical separation of mutant populations based on motility. This includes assaying genes for motility by inoculating transposon-insertion sequencing (TIS) mutants on an agar plate (yellow circle and beige spot) and separating the inner mutant pool (less motile) from the outer mutant pool (more motile). b | In density–TraDISort, mutant populations can be separated into top, middle and bottom fractions (shown by horizontal orange bands) on the basis of their increased or decreased cellular density using a Percoll gradient and centrifugation. c | Separation of single mutants using fluorescence (TraDISort). The mutant pool is treated with the fluorescent marker ethidium bromide (EthBr) and subjected to fluorescence-activated cell sorting, where each cell is sorted with use of a laser (horizontal red line) on the basis of its fluorescence (shown as green), reporting on efflux activity. d | Encapsulation, growth and sorting with microfluidics of single mutants in droplets for droplet transposon sequencing (dTn-Seq). Each single mutant with different growth rates (in the schematic on the left, low, medium and high levels of growth are represented as blue, orange and green background colours, respectively) will grow independently within its own droplet (grey circle), eliminating the effects of interactions between mutants. A final sorting step, based on cell fluorescence or microscopy, can also be added. 
Alternatively, cell-containing droplets (blue droplet on the right) can undergo multiple layers of re-encapsulation, so that an encapsulated mutant can be encapsulated within another droplet containing a different cell (shown in yellow; this can be another mutant, another bacterial cell or a host cell) and signals can freely diffuse between the layers (shown as red bolts) to allow cell interactions to be investigated by sorting those cell combinations that have altered fitness and grow at different rates, or those that can be separated by sorting based on markers, such as alterations of cell morphology observed with a microscope.

Beyond growth-based selection approaches

A major recent advance of TIS is based on the ability to separate mutants by their physical characteristics, rather than solely on the basis of growth. The simplest forms of this have adapted classical microbiological assays to the massive multiplexing made possible by TIS (Fig. 2a). For example, motility genes can be assayed by ‘racing’ mutant libraries across agar plates and comparing mutants in the inner population (less motile) with those in the outer population (more motile). This approach has been applied to Escherichia coli ST131 (ref.8) and Pseudomonas aeruginosa PA14 (ref.9), leading to the identification of known motility genes, such as those encoding common bacterial motors (flagella, fimbriae and pili), in addition to new candidates. Similarly, density–TraDISort10 combines TraDIS and density gradient centrifugation to separate mutants on the basis of their density (Fig. 2b) and identify genes involved in bacterial capsule production, which is a major virulence factor for many pathogens. In this study, 78 genes underlying capsule production were identified across two clinically relevant Klebsiella pneumoniae strains10.

The application of cell sorting to TIS has led to the development of techniques that progress from bulk separation to separation of single cells. One such application is TraDISort, which combines fluorescence-activated cell sorting (FACS) and TraDIS11 and sorts single cells on the basis of fluorescence. TraDISort has used the cytosolic concentration of ethidium bromide (EthBr), a fluorescent DNA intercalating agent, as a marker for altered efflux activity (Fig. 2c). For instance, mutants with insertions in efflux pump genes, such as amvA, had reduced ability to remove ethidium bromide from the cell, resulting in an overall higher level of fluorescence. By contrast, mutants with insertions in the amvA repressor gene (amvR) had increased efflux and lower fluorescence. A similar approach used a fluorescent reporter to separate heterogeneous populations of Mycobacterium tuberculosis12, which uncovered lamA, a gene of previously unknown function that reduced overall heterogeneity in the population by decreasing asymmetric polar growth. Similarly, FAST-INSeq was developed to identify regulators of typhoid toxin production, with use of FACS of Salmonella enterica subsp. enterica serovar Typhi-infected macrophages with a fluorescent reporter for toxin expression13. Lastly, Tn-FACSeq was used to identify genes from Bdellovibrio bacteriovorus, a bacterial predator, that are important for attachment to Vibrio cholerae14. These types of fluorescence-based techniques could be extended further, for instance to examine bacterial responses to other fluorescent (or fluorescently tagged) compounds, to use other fluorescent reporter constructs or simply to use FACS to differentiate mutants with altered cell size.

Population-independent mutant assays

In traditional TIS approaches, mutant fitness is measured within the context of the entire mutant population. However, the true fitness of a mutant can be obscured when it is grown in the presence of other mutants. For instance, TIS cannot report on the effect of secreted products or other ‘common goods’ that act beyond the cell containing the mutation, or similarly mutants that suffer from density dependence. Recently, on the basis of advances in single-cell analysis, dTn-Seq was developed to address these issues15. dTn-Seq sorts single mutants by combining microfluidics with TIS, encapsulating individual transposon mutants in growth-medium-in-oil droplets, facilitating isolated growth of mutants free from the influence of the population (Fig. 2d). dTn-Seq experiments showed that in Streptococcus pneumoniae 1–3% of mutants have altered fitness when grown in isolation; some mutants may grow faster or slower in isolation compared with their growth measured in a traditional TIS screen. To highlight its versatility, dTn-Seq has been applied to investigate hypercompetence, processing of host glycoproteins, defence against host immune factors and microcolony formation15. Moreover, dTn-Seq is compatible with microscopy and FACS-based screening, and by reloading droplets into a microfluidic device, multilayer encapsulations can be achieved. Such droplets consist of multiple layers containing different mutants, other bacterial species or even host cells, between which communication signals can freely diffuse, thereby facilitating investigations of interbacterial and bacteria–host cell interactions15 (Fig. 2d). Although these droplets cannot easily be applied to in vivo animal models, any interesting phenotypes that arise from dTn-Seq screens, including host–microorganism interaction mediators, can be directly confirmed in cell culture assays and/or in vivo in animal models using targeted mutants.

Assaying function of essential genes and gain-of-function screens

One limitation of standard TIS is that only non-essential genes can be assayed, as essential genes, by definition, do not tolerate insertions. A handful of studies have overcome this by using gain-of-function screens that use libraries of transposons with outward-facing promoters to facilitate gene overexpression and repression (Fig. 3A). Monitoring the change in frequency of transposons that induce the expression of downstream genes, including essential genes, during selection can identify phenotypes that may not be evident from gene disruption (Fig. 3B). This idea is not new; for example the TnAraOut method, developed in 2000 (ref.16), used transposons containing the arabinose inducible promoter PBAD to screen V. cholerae for essential antibiotic targets. Various approaches, where an outward facing promoter is engineered into a transposon system, have been developed to assay essential genes, for example in Caulobacter crescentus17 and Staphylococcus aureus18,19. Recently, this approach was combined with traditional TIS, resulting in the TraDIS-Xpress package20 (previously known as TraDIS+ (ref.21)). TraDIS-Xpress uses an inducible PBAD promoter facing out of a Tn5 in E. coli, in addition to detailed transposon-mediated inactivation data, to query all genes. It was successfully applied to identify both essential and non-essential genes affecting tolerance to various concentrations of the biocide triclosan, and differential responses to bactericidal and bacteriostatic concentrations were found. A high-throughput method for gain-of-function assays was recently developed, dual-barcoded shotgun expression library sequencing (Dub-Seq), where barcoded overexpression libraries of E. coli were mapped, barcoded and used to assign gene function in 52 experimental conditions on the basis of mutant fitness changes due to increased gene dosage22. 
To control expression in gain-of-function screens, some studies17,20 used inducible promoters on a single transposon, which can have the advantage (over using multiple transposons) of allowing high library density and reducing insertion bias. Other studies used constitutive promoters with different strengths on either barcoded or different types of transposons, which has the strategic advantage that different gene dosages can be assayed in the same culture18,19.

Fig. 3: TIS to assay the functions of essential genes.

A | An inducible promoter (right-angled red arrow), such as PBAD, is positioned facing out of each transposon (Tn) to overexpress all genes, including essential genes, so that their function can be assayed. Orange bars indicate transposon-induced transcription on top of wild-type expression (grey bars). B | In traditional transposon-insertion sequencing (TIS) approaches, essential genes can be identified as those that cannot tolerate insertions (part Ba). In gain-of-function screens, when transposons in the transposon pool are induced, for example with arabinose (part Bb), high expression of essential gene Z is achieved. When selection is applied, involvement of essential genes in the condition can now be assayed (part Bc) by monitoring relative differences in the number of transposons that influence expression levels. In this example, after exposure to the condition and sequencing out of the transposon, an increase in the number of transposon insertions that increase gene Z’s expression is observed. Therefore, mutants that overexpress gene Z have increased fitness within the overall population during selection, indicating that gene Z’s expression is beneficial in that condition. All other features are as in Fig. 1.

Scaling up TIS using high-throughput phenotyping

Although it is possible to apply the original TIS protocols at scale, the multistep library preparations involved can become increasingly costly when one is dealing with hundreds of samples. One solution to this problem, random barcode transposon-site sequencing (RB-Tn-Seq)23, introduces a random DNA barcode into each transposon. An initial conventional TIS approach is used to determine the insertion site associated with each barcode, and then a single-step PCR barcode amplicon can be directly sequenced in future experiments to track changes in mutant frequencies24, substantially speeding up screening. For instance, one recent upscaled study applied RB-Tn-Seq to 32 different bacterial strains across 129 conditions, and identified a large variety of leads for gene function25. A second problem in scaling TIS to large collections of bacteria is that optimized transposon delivery vectors often do not exist for non-model organisms. The ‘magic pool’ approach accelerates the optimization process using pools of transposon vectors, each of which has a different combination of upstream sequences (promoters and ribosome-binding sites) and antibiotic resistance markers as well as a random DNA barcode sequence, which allows quick measurement of vector efficiency during mutagenesis26.
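
The two-stage logic of barcoded libraries (map each barcode to its insertion site once, then simply count barcodes in every later experiment) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function names, the rule of discarding barcodes seen at more than one position, and the short toy barcodes are all hypothetical, and published pipelines apply more elaborate filtering.

```python
from collections import Counter

def build_barcode_map(mapped_reads):
    """One-off step: link each barcode to the insertion position it maps to.

    `mapped_reads` is an iterable of (barcode, position) pairs obtained from
    a conventional TIS run. Barcodes observed at more than one position are
    dropped as ambiguous (a simplifying assumption of this sketch).
    """
    positions = {}
    ambiguous = set()
    for barcode, position in mapped_reads:
        if barcode in positions and positions[barcode] != position:
            ambiguous.add(barcode)
        positions.setdefault(barcode, position)
    return {bc: pos for bc, pos in positions.items() if bc not in ambiguous}

def count_insertions(barcode_reads, barcode_map):
    """Per-experiment step: tally barcode amplicon reads by insertion position."""
    counts = Counter()
    for barcode in barcode_reads:
        if barcode in barcode_map:  # unknown barcodes are ignored
            counts[barcode_map[barcode]] += 1
    return counts

# Toy data (hypothetical barcodes and positions).
mapped = [("AAAC", 120), ("GGTT", 455), ("AAAC", 120), ("CCGA", 30), ("CCGA", 900)]
barcode_map = build_barcode_map(mapped)
counts = count_insertions(["AAAC", "AAAC", "GGTT", "TTTT", "CCGA"], barcode_map)
```

Only the second function needs to run for each new condition, which is why the per-sample cost drops once the lookup table exists.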

Developments in TIS data analysis

A typical analysis protocol for TIS data can be summarized as follows: after the splitting of sequencing reads on the basis of their multiplexing barcode, any transposon or adaptor sequences are removed, reads are mapped to an annotated reference genome and the unique position of the transposon and relative insertion coverage (that is, number of reads) are recorded, processed and presented to calculate the effect of each transposon on fitness (for example, growth or survival). Over the past 10 years, several bioinformatics pipelines and protocols have been developed for this purpose (Table 1). These TIS tools include Web-based applications27,129, stand-alone graphical applications29 and command-line toolkits30,31,32. All of these tools implement variations on both gene essentiality and conditional fitness analyses, although they differ in the details of preprocessing and read alignment, normalization techniques, and statistical models or tests used. Often choices made in the experimental protocol can impact the appropriateness of a particular analysis procedure as much as any theoretical issue; for instance, many hidden Markov model and sliding window approaches to defining essential regions in the absence of annotation are applicable only to mariner transposon studies, as the assumption of a uniform insertion probability at TA sites simplifies the underlying statistical model. Many of these core issues were addressed in a 2016 review on design and analysis of TIS experiments7, but recent TIS developments have pushed analysis methods in new directions.
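
The trimming and position-recording steps of this protocol can be illustrated with a deliberately simplified Python sketch. It is not how the tools in Table 1 are implemented: exact string matching stands in for a real read aligner, strand orientation is ignored, and the transposon end sequence, function names and toy reference are hypothetical.

```python
from collections import Counter

def trim_transposon(read, tn_end):
    """Return the genomic sequence following the transposon end, or None if the tag is absent."""
    idx = read.find(tn_end)
    if idx == -1:
        return None
    return read[idx + len(tn_end):]

def insertion_counts(reads, tn_end, reference, min_len=8):
    """Count reads per insertion position.

    The position recorded is the 0-based reference coordinate of the first
    genomic base after the transposon (a convention assumed here). Flanks
    that are too short or that match the reference more than once are skipped.
    """
    counts = Counter()
    for read in reads:
        flank = trim_transposon(read, tn_end)
        if flank is None or len(flank) < min_len:
            continue
        pos = reference.find(flank)
        if pos != -1 and reference.find(flank, pos + 1) == -1:
            counts[pos] += 1
    return counts

# Toy reference, transposon end and reads (all hypothetical).
reference = "ACGTTGCATGCCGATTACAGGCTTAACGGA"
tn_end = "AGATGTGTA"
reads = [
    tn_end + reference[10:20],
    "NN" + tn_end + reference[10:20],
    tn_end + reference[3:13],
    "GGGGGGGG",  # no transposon tag, so it is discarded
]
counts = insertion_counts(reads, tn_end, reference)
```

The resulting table of positions and read counts is the common starting point for both essentiality and conditional fitness analyses.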

Table 1 Transposon-insertion sequencing data analysis tools

One major development of TIS analysis methods is in dealing with infection dynamics and particularly the effects of bottlenecks, which are transient reductions in population size during the course of the experiment (see later). While these have been dealt with in analyses using normalization based on changes in neutral loci33 or subsampling30, two interesting new approaches to this problem were proposed recently. In the first, principal component analysis (PCA) is performed on log fold changes in mutant abundance across replicate infection experiments34. Examination of the principal components recovered can then identify linear combinations of the changes across replicates that separate genes consistently across experiments, providing a score for association of any particular gene with survival in infection and eliminating the contribution of spurious stochastic changes. A second approach has adopted the zero-inflated negative binomial (ZINB) distribution to model transposon insertion counts35. The ZINB distribution is a mixture of a zero-inflation component, modelled on the logit scale, which captures the probability of detection of a data point, and the negative binomial, which captures the overdispersion generally observed in sequencing data. This distribution has attracted attention recently in the analysis of single-cell RNA sequencing (RNA-seq), as it provides a natural mechanism for capturing technical dropout of transcripts36. Similarly, by fitting the logit component of the ZINB genome-wide, this approach can correct for differences in library saturation between conditions arising either in library creation or due to bottlenecking.
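
To make the structure of the ZINB mixture concrete, the following Python sketch evaluates its probability mass function using only the standard library. The mean and dispersion parameterization, the function names and the parameter `pi_zero` (the probability of an excess zero) are assumptions chosen for illustration and may differ from the parameterization used in ref.35.

```python
import math

def nb_pmf(k, mean, dispersion):
    """Negative binomial probability of count k.

    Parameterized by the mean and a dispersion (size) parameter r, so that
    variance = mean + mean**2 / r. Requires mean > 0 and dispersion > 0.
    """
    r = dispersion
    p = r / (r + mean)
    log_coeff = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coeff + r * math.log(p) + k * math.log(1.0 - p))

def zinb_pmf(k, mean, dispersion, pi_zero):
    """Zero-inflated negative binomial probability of count k.

    With probability pi_zero the observation is an excess zero (for example,
    a site with no detected insertion); otherwise the count follows the
    negative binomial.
    """
    nb = nb_pmf(k, mean, dispersion)
    if k == 0:
        return pi_zero + (1.0 - pi_zero) * nb
    return (1.0 - pi_zero) * nb

def logistic(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Toy parameters (hypothetical).
p_zero = zinb_pmf(0, mean=20.0, dispersion=2.0, pi_zero=0.3)
total = sum(zinb_pmf(k, 20.0, 2.0, 0.3) for k in range(2001))
```

In a regression setting, `pi_zero` would typically be obtained as `logistic(eta)` for some linear predictor `eta`, which is what allows differences in library saturation between conditions to be absorbed by the zero component rather than by the count component.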

A second development is the move from simple condition–control comparisons to the simultaneous investigation of large suites of conditions. For instance, the PCA and ZINB methods highlighted above demonstrated their effectiveness using combinations of existing datasets, identifying commonalities in genes required by different Vibrio strains in infection37 or response to a panel of antibiotics in M. tuberculosis35, respectively. A striking example of such a data analysis pipeline, AlbaTraDIS38, was developed for the TraDIS-Xpress study examining triclosan tolerance (see ref.20 and the TraDIS-Xpress discussion earlier); it uses sliding window analyses that integrate all available information to predict all genes and promoter regions involved during selection.

As TIS experiments become more complex and the results are combined with other data types, tools for visualization and data delivery are becoming increasingly important. For example, a recent study integrating expression and fitness data from the HIV-associated Salmonella enterica subsp. enterica serovar Typhimurium strain D23580 (ref.39) included a Dalliance-based browser40, allowing readers to directly interrogate the data themselves and providing a valuable community resource. Platforms for easily providing this kind of interactive interface are beginning to emerge, such as ShinyOmics41, a Web-based application for rapid collaborative exploration of omics data, including TIS, RNA-seq and proteomics data, which allows comparisons between datasets, PCA and simple network analysis. As datasets accumulate and automation increases throughput, such integrative analysis approaches will become increasingly important.

Key biological applications of TIS

Since its development, TIS has been used in a range of in vitro studies as well as in vivo infection models. Here we summarize how the development of TIS has facilitated the investigation of key biological questions, with a focus on studies with implications for human health.

Identifying genes and networks involved in antibiotic resistance

The emergence of antibiotic resistance is a major global health problem, exacerbated by a lack of development of new antibiotics. TIS is well equipped to infer the relative impact that disrupting each genomic feature has on antibiotic sensitivity (Fig. 4) and can contribute to developing a better understanding of how resistance emerges, as well as guide the development of new strategies to target resistant bacteria. Traditional TIS experiments performed by culturing transposon libraries with inhibitory but sublethal concentrations of antibiotics for several generations have been used to define a comprehensive non-essential gene complement involved in intrinsic resistance for many clinically important pathogens, including the notorious ESKAPE species (Enterococcus faecium, S. aureus, K. pneumoniae, Acinetobacter baumannii, P. aeruginosa and Enterobacter species)18,42,43,44.

Fig. 4: Mapping complex genotype–phenotype relationships.

a | A gene–antibiotic network for three antibiotics (based on refs54,119). Each node (circle) depicts a gene, whereas each edge indicates a negative (solid grey line), neutral (dashed grey line) or positive (solid red line) effect on fitness between the genes in the presence of antibiotics as determined by transposon-insertion sequencing (TIS). All three antibiotics — penicillin G (PeniG), vancomycin (Vanco) and daptomycin (Dapto) — affect cell wall integrity but TIS uncovers a wide variety of genes involved, many of which are not direct targets of the antibiotic. In this case, the unknown gene Q is likely to be involved in peptidoglycan synthesis/membrane integrity on the basis of the function of other genes with similar fitness profiles. b | By mutation of gene Q and construction of a transposon library in this mutant background, genetic interactions can be identified (based on concepts from ref.33). In this case, the genes uncovered in the mutant background, as depicted in the gene interaction map (GIM), further support a role in peptidoglycan synthesis, that it may function in the cell membrane and that it may be controlled by a particular regulator. Additionally, in vivo TIS data with the mutant library performed in healthy mice and mice depleted of neutrophils (neut−) indicate that gene Q is needed to establish lung infection but is dispensable in the absence of neutrophils. Adapted from ref.54, CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).

These studies have shown that while antibiotics may have specific targets (for example, in cell wall synthesis, DNA replication or protein synthesis), the bacterial response to antibiotics is actually distributed across the genome. For example, fluoroquinolones target topoisomerase IV and DNA gyrase, which are essential enzymes involved in DNA replication. Although most TIS experiments cannot assay these targets directly owing to their inherent essentiality, TIS profiles generated under fluoroquinolone exposure implicate other genes involved in DNA replication and repair, such as recN and xseA33,45. While these genes are not direct fluoroquinolone targets, they contribute to intrinsic resistance as part of a secondary effect; for example, fluoroquinolones trigger DNA damage, which activates DNA repair. In general, antibiotic TIS profiles for each antibiotic tested, and each organism screened, show a role for genes beyond those related to the primary target, indicating the importance of genes with diverse functions, including amino acid and carbohydrate metabolism, energy generation, transport and regulation42,43,46,47,48,49,50,51. This finding underlines that although we have a limited view of how an antibiotic inhibits a bacterial cell, TIS can be used to uncover this complex, multifactorial process. As a result, TIS profiles have been demonstrated to be effective in determining the mechanism of action of novel antibiotics52,53. Creating profiles for multiple similar conditions can help to direct attention to genes with unknown function that are important under all of these conditions and thereby help to identify leads that may assist in uncovering their function (Fig. 4b and see Integrating TIS with other genomic approaches to predict complex traits). Moreover, TIS profiles can also uncover opportunities to sensitize a bacterium to a drug, facilitating the design of secondary or helper drugs42,54.

Investigating virulence genes, host adaptation and vaccine development

Interrogating the genomic requirements for pathogens to cause disease has been a major motivation in the development of large-scale reverse genetics approaches (Fig. 5). For instance, the development of early transposon screens in Salmonella Typhimurium55,56 provided key evidence for the discovery of major virulence factors. TIS has made it much easier and faster to screen bacterial pathogens for virulence factors. Such screens have included in vitro assays, such as capsule production10,57, growth in serum58 and colicin resistance and sensitivity in E. coli59, as well as sporulation in Clostridioides difficile60. An early in vivo screen used retrospective TIS on existing samples to examine Salmonella Typhimurium genes involved in infection of three farm animals (chickens, cows and pigs61,62) plus mice, and found multiple conserved virulence genes. Other examples include an analysis of the virulence genes of Legionella pneumophila using both cell culture and mouse models63, survival of Streptococcus pyogenes in human saliva64, survival of A. baumannii in a bloodstream infection mouse model65, survival of Streptococcus equi in horse blood or hydrogen peroxide66, the demonstration that oxidative stress resistance enhances V. cholerae host adaptation in a mouse model67, survival of Burkholderia cenocepacia in a Caenorhabditis elegans host68, Streptococcus mutans infection in an oral rodent model69 and the building of a targeted sublibrary of type IV secreted proteins in the intracellular pathogen Coxiella burnetii using INSeq, which was subsequently screened for vacuole formation in human HeLa cells70.

Fig. 5: Bottleneck and realism trade-off in TIS infection models.

Different animal models of infection (top) can induce strong bottleneck effects (bottom) in infecting bacterial populations, which can confound transposon-insertion sequencing (TIS) analysis. This bottleneck effect is particularly pronounced in models where bacteria must overcome barrier defences (for example oral infection models of Salmonella enterica subsp. enterica serovar Typhimurium). Organoid models can provide a complex environment while limiting bottlenecks, but technical limitations in culture may limit the total population size screened. Finally, cell culture screens can be scaled to arbitrary sizes, allowing screening of extremely large collections of mutants, but they often provide insight into only a particular aspect of disease.

Although TIS has streamlined the process of generating and monitoring mutant libraries, challenges remain in applying the technique to infection models. A primary concern is the effect of bottlenecks, which can be quantified experimentally by measuring the loss of neutral markers33, for instance using the wild-type isogenic tagged strains (WITS) method71. Bottleneck effects should be considered computationally (see earlier) and can, at least partially, be avoided by careful consideration of the infection model. The size and temporal structure of a bottleneck are often specific to the particular infection model, and can be influenced by a range of factors, including physical barriers, nutrient availability and competition with the native microbiota72. Whereas mild bottleneck effects can be partially compensated for during analysis, major bottlenecks can irreversibly bias an experiment that does not account for them. In these cases, the surviving mutants represent the subset of bacteria that happened to pass some barrier to infection, rather than being representative of all mutants that could, leading to skewed representation and a lack of reproducibility between experimental replicates. The wide variety of infection models developed to study Salmonella Typhimurium provides an example of how model choice can affect the design of a transposon screen (Fig. 5). For the most realistic infection models based on infection through the gut epithelium, bottlenecks can be severe73,74, limiting transposon analysis to small pools of tens to hundreds of mutants61,62. Intraperitoneal inoculation can bypass this major bottleneck in a mouse model, which allowed screening of ~10,000 mutants in a single animal75; however, the results are uninformative with regard to gastrointestinal disease.
Finally, in the case of cell culture models, such as the macrophage model that captures a key challenge to the development of systemic salmonellosis55, the only constraint on library complexity is the number of cells available for infection, allowing efficient screening of very large mutant populations (~10^6)39. Similar trade-offs between realism and library complexity are likely to exist for many infection models, particularly those that involve bacterial penetration of barrier defences.
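The idea of quantifying a bottleneck from the loss of neutral markers can be illustrated with a rough calculation. The sketch below is not the published WITS analysis; it is a simplified estimate that assumes all neutral tags were equally abundant in the inoculum and that founder cells pass the bottleneck independently, so that a given tag is lost with probability of approximately exp(-N/T) for N founders and T tags. The function name and parameters are hypothetical.

```python
import math


def estimate_founders(tags_inoculated: int, tags_recovered: int) -> float:
    """Rough estimate of the number of founder cells passing a bottleneck.

    Assumption (not from the source): tags are equally abundant and founders
    are sampled independently, so P(tag lost) ~ exp(-N / T). Solving for N
    gives N ~ -T * ln(fraction of tags lost).
    """
    if not 0 < tags_recovered <= tags_inoculated:
        raise ValueError("tags_recovered must be between 1 and tags_inoculated")
    if tags_recovered == tags_inoculated:
        # No tag was lost, so the bottleneck is too wide to resolve this way.
        return math.inf
    fraction_lost = 1 - tags_recovered / tags_inoculated
    return -tags_inoculated * math.log(fraction_lost)


# Hypothetical example: 8 neutral tags inoculated, 4 recovered after infection.
founders = estimate_founders(tags_inoculated=8, tags_recovered=4)
print(f"Estimated founders: {founders:.1f}")
```

An estimate of a few founders, as in this toy example, would suggest that only very small mutant pools could be screened reliably in that model, whereas recovery of every tag indicates only that the bottleneck is wider than this simple approach can measure.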

Experimental challenges notwithstanding, TIS applied to animal models has also proved useful in identifying and understanding vaccine targets. For example, screening of an S. pneumoniae TIS library in a ferret transmission model, describing the fitness landscape of genes during mammalian transmission, yielded valuable and translatable data. Targeted deletion confirmed that key TIS hits (putative C3-degrading protease CppA, iron transporter PiaA and competence regulatory histidine kinase ComD) significantly decreased transmissibility. Importantly, maternal vaccination with recombinant PiaA and CppA alone or in combination blocked transmission from mother to offspring and was more effective than capsule-based vaccines76. In a second example, a mouse sickle cell disease (SCD) model coupled with TIS identified a set of pneumococcal virulence genes specific to hosts with SCD. Not only did these factors point to aspects of SCD pathophysiology, but they also showed that the protective capacity of antigens can be different in the healthy versus the SCD population, highlighting the importance of understanding bacterial pathogenesis in the context of common comorbidities77.

Assaying functional components of mobile genetic elements

Mobile genetic elements, including plasmids, transposable elements and bacteriophages, are important players in interspecies and intraspecies gene transfer and are heavily implicated in the spread of antibiotic resistance and virulence determinants. Plasmids are notoriously difficult to study with screening approaches, owing to their independent replication systems and capacity to regulate copy number. TIS has been used to identify genes involved in maintenance of IncA/C plasmids in E. coli, and the results were then developed into an IncA/C plasmid typing scheme78. TIS has also been used to demonstrate the involvement of a type IV secretion system in conjugation of an IncP plasmid in Edwardsiella piscicida79.

Bacteriophages (‘phages’) are important mobile genetic elements that have been used for bacterial typing for decades and can greatly influence bacterial pathogenesis through the transduction of pathogenicity islands. Additionally, phage therapy to treat resistant bacterial infections is experiencing a resurgence as an alternative to antibiotics. TIS is able to identify essential host factors that mediate or hinder bacteriophage infection. For example, challenge of an E. coli O157 TraDIS library with T4 and T7 bacteriophages identified new host genes involved in both bacteriophage resistance (for example, sspA, encoding stringent starvation protein A) and susceptibility (for example, the sap operon)80. Similarly, experiments with bacteriophages specific to particular capsules have allowed the identification of not only modifiers of bacteriophage resistance81, but also genes responsible for capsule expression57.

Uncovering essential genes and the influence of the pan-genome

One of the first uses of TIS was to define the essential genes for survival for a plethora of bacterial species, including human pathogens such as Porphyromonas gingivalis82, B. cenocepacia83 and Yersinia pseudotuberculosis84, animal pathogens such as S. equi85, plant pathogens such as Pseudomonas syringae86, the model organism E. coli K12 (ref.87) or commensal gut bacteria such as Bifidobacterium breve88. These valuable essential gene datasets gained from TIS studies have been shown to correlate well with existing phenotypic gene essentiality data, such as from the single-gene E. coli knockout library (Keio University)89, and these can not only be interpreted to understand basic functioning of the cell but can also provide vital information for identifying potential novel drug targets.

Once TIS has been implemented in a species it is often straightforward to create libraries in related strains. This flexibility facilitates functional exploration of a species’s pan-genome and how genetic background can affect phenotype. This kind of in-depth investigation can ultimately help to uncover how bacterial species, particularly diverse species that include both pathogens and non-pathogens, can become harmful or antibiotic resistant, for example via horizontal transfer of pathogen-associated genetic material. TIS in nine strains of P. aeruginosa determined the core essential genome in five media, and highlighted that essentiality of some genes depends on genomic context90. The influence of genetic background on phenotype is further illustrated by examples that highlight how genes involved in responding to antibiotic stress can be strain specific. For example, screening of two S. pneumoniae isolates for genes that are important for intrinsic resistance to antibiotics from three classes showed that on average only ~50% of the responsive genes are common between strains. Investigation of the underlying reasons for this variability showed that network architecture, including regulatory pathways that direct competence, is wired in a strain-specific manner, thereby making responses strain specific54. A recent study probed five diverse strains of S. aureus for daptomycin resistance mediators and identified several core pathways consistently involved across strains, including the lipoteichoic acid pathway, as well as factors that varied with strain diversity, such as the cell envelope18. Furthermore, in A. baumannii a single gyrA resistance allele results in preferential poisoning of topoisomerase IV by ciprofloxacin, leading to large alterations in the fitness landscape of insertion mutants compared with a wild-type gyrA background.
This altered background triggers the activation of prophage and quickly leads to the emergence of ciprofloxacin-resistant clones91. In M. tuberculosis, loss-of-function mutations in katG can result in isoniazid resistance. However, TIS experiments have shown that several clinical strains have an increased requirement for katG compared with the reference strain H37Rv92. This variability underscores how genome variation can affect adaptive solutions and highlights the importance of extending TIS to clinical isolates that may have a very different genetic background to laboratory strains.

Understanding metabolism, the response to environmental factors and microorganism–microorganism interactions

Despite decades of accumulating genome sequences in public databanks, many protein-coding genes remain unannotated or carry inaccurate annotations, particularly in non-model and difficult-to-culture organisms. Often even the conditions under which a gene contributes to survival are unknown, leading to a serious roadblock in any attempt at molecular characterization. TIS can help us understand how bacterial cells experience the changing environments encountered in nature, by providing comprehensive profiling of mutant phenotypes93. Specifically, recent TIS studies have identified fitness determinants during energy-limited growth in P. aeruginosa94 and outlined how several bacterial species synthesize amino acids95. Others have identified genes in E. coli that promote survival during exposure to ionizing radiation96, genes in Salmonella Typhi that allow adaptation to survival in water97 and genes involved in desiccation stress in Salmonella Typhimurium98. TIS approaches have also uncovered how bacteria interact with plants or deal with soil environments99. For instance, TIS was used to identify genes required for growth of the soft-rot pathogenic bacterium Dickeya dadantii in chicory plants100 or genes of Pantoea stewartii that are essential for survival in planta to provide insights into how it causes wilt disease in corn101. As the breadth of examined stresses increases, shared adaptations to diverse stress conditions are emerging.

In a massively upscaled example of assaying genes for adapting to changing environments in parallel, a recent study demonstrated the utility of this approach by applying RB-Tn-seq to 32 diverse bacteria in ~200 conditions; this work assigned a phenotype to more than 11,000 uncharacterized proteins, with ~2,000 of these functional annotations demonstrated to be conserved across organisms25. A similar study investigated the major human gut commensal and obligate anaerobe Bacteroides thetaiotaomicron across 492 conditions and identified genes involved in metabolism and bile tolerance102. Together, these two studies demonstrate how TIS can be applied broadly across organisms and deeply within an organism to extract leads for future molecular characterization.

TIS approaches can also directly answer questions with implications for human health; for instance, how does the gut microbiota affect drug metabolism and toxicity? Using the robotics-driven TIS mutant array approach pioneered by the INSeq method3,103, where individual mutants are mapped through combinatorial pooling, a comprehensive library of ~1,300 B. thetaiotaomicron mutants was selected and incubated with the antiviral drug brivudine. Drug metabolites were then measured by mass spectrometry, and individual B. thetaiotaomicron genes required for the production of a hepatotoxic metabolite were identified. Follow-up studies in gnotobiotic mice confirmed the in vivo relevance of this toxin production pathway104, illustrating the power of this approach in understanding drug–microbiota interactions.

In the wild, microorganisms rarely live planktonically in isolation, but are constantly interacting with other microorganisms and forming communities, either by chance as in wound co-infections or as part of a stable ecosystem as in the mammalian gut. TIS provides an opportunity to understand these microorganism–microorganism interactions, as the genetic response of one bacterium to other bacteria can be recorded. Numerous studies have shown that the fitness effects of gene disruption can depend critically on the presence of other community members. For instance, co-infection has been shown to alter the bacterial fitness landscape in wound models, such as with the opportunistic pathogens Streptococcus gordonii and Aggregatibacter actinomycetemcomitans105. Furthermore, studies of co-infection with P. aeruginosa and S. aureus in mouse surgical wounds showed that ~25% of S. aureus genes that are essential during co-infection are no longer needed during single-species infection (Fig. 6a). In addition, single mutants, such as those of the community-dependent essential gene udk, encoding a uridine kinase, were confirmed to influence levels of co-infection but not monoinfection in vivo106. Interaction studies have even been extended to predatory relationships, such as in studies of B. bacteriovorus that identified genes required for predation of V. cholerae during planktonic and biofilm growth using Tn-Seq libraries of both the prey107 and the predator14. Similarly, bacterial genes that influence infection with viruses have been examined by TIS in numerous bacterial species57,80,81,108. The mechanisms of bacterial interactions within communities have also been investigated by screening for effectors of type VI protein secretion systems (T6SSs). T6SSs are conserved bacterial defence mechanisms that deliver toxins to neighbouring cells through a contact-dependent mechanism, killing those that lack immunity proteins (Fig. 6b).
New toxin and immunity genes have been identified through TIS in V. cholerae109 and P. aeruginosa110. Together, these studies have provided insight into a wide range of relationships that shape the microbial environment.

Fig. 6: TIS to assay microorganism–microorganism interactions.

a | Transposon-insertion sequencing (TIS) of Staphylococcus aureus (SA) mutants alone (orange circles) compared with coculture of SA and Pseudomonas aeruginosa (PA) wild-type strain (green rods) can identify community-dependent essential genes that are needed only during coculture. Other features are as in Fig. 1. b | Using TIS to identify type VI secretion system (T6SS) toxin immunity pairs by growing cells with close cell-to-cell contact, and performing TIS so that genes involved in protection from T6SS-dependent killing (depicted as a yellow bolt) can be detected. In this example, genes in PA encoding immunity proteins (Tsi; orange diamond) that protect from neighbour killing by toxins (Tse; red square) become essential only in a T6SS-active (retS; bottom panel) background but not in the inactive (H1_retS; top panel) cellular background.

Integrating TIS with other genomic approaches to predict complex traits

Whereas TIS has been most commonly used to make simple associations between environments and genetic components, it can also uncover more complex relationships. No genomic element, gene or pathway exists in isolation; rather they are connected through intricate networks, resulting in specific organismal properties and, ideally, an appropriate response when disturbed. One layer of these networks is gene regulation, including the non-coding genome. By combining saturated TIS libraries with expression profiling, one can identify functional non-coding RNAs (ncRNAs). For example, RNA-seq can be used to map out expression units across the entire genome and indicate whether non-coding/intergenic regions display significant levels of transcription. In turn, a parallel TIS experiment with insertions in these regions can then be used to associate a phenotype with the disrupted ncRNA. This approach was used in S. pneumoniae, and yielded 89 ncRNAs, more than half of which had not been identified previously, and several could be associated through TIS in vivo data as being critical for virulence111. A follow-up study used different RNA-seq techniques to map the full transcriptional landscape of the S. pneumoniae virulent type strain TIGR4 (ref.112). This resulted in identification of many non-coding regulatory regions, which could be associated with a phenotype through integration of TIS data from different environments. Another comprehensive TIS-based regulatory study in Neisseria meningitidis identified 288 genes and small ncRNAs needed for colonization of human epithelial and/or endothelial cells113.

Mapping of genetic interactions, which quantify fitness dependencies between genes, can be used to build genetic interaction networks and infer regulatory relationships, pathway structures or leads for gene function114. Genetic interactions can be identified by creating a TIS library in a query gene deletion background and screening for genes whose fitness deviates from the multiplicative fitness of the individual mutants (Fig. 7). The most obvious example of a genetic interaction is synthetic lethal interaction, where two individual mutants have no or little fitness effect but abolish growth when combined, and can occur if two gene products perform redundant essential functions. A variety of such interactions exist, which imply different types of relationships between components (reviewed in ref.114). Genetic interaction networks can be combined with in vivo studies to further inform gene function in the host. For example, if a gene involved in intrinsic resistance to cell wall-targeting antibiotics is important only in healthy mice but not when certain immune components are missing, this indicates that the gene product is not only a resistance factor but is also potentially visible to the immune system (Fig. 4). Several studies have successfully used a genetic interaction approach with TIS in S. pneumoniae, including to uncover regulatory dependencies for catabolite control protein A, how a potassium uptake system and a subpathway for nasopharyngeal colonization are regulated and how the protease ClpP is involved in competence1,33,54. Additionally, cell division components such as CozE in S. pneumoniae were identified by use of pbp1A as a query gene, revealing that CozE directs the activity of Pbp1A to the midcell plane, where it promotes zonal cell elongation115. Alternatively, a query gene/pathway can be inhibited by a drug or inhibitor, as has been done in the case of wall teichoic acid biosynthesis in S. aureus, and then screened with TIS for synthetic lethal interactions116. This study connected wall teichoic acids with other pathways, including cell-envelope D-alanylation, and peptidoglycan and lipoteichoic acid synthesis116. Other applications of note include investigating the role of the quorum-sensing and virulence regulator LasR in different P. aeruginosa infection models117, and assigning function to uncharacterized genes in M. tuberculosis118.
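The comparison against multiplicative fitness can be made concrete with a small numerical sketch. It assumes that relative fitness values (wild type = 1.0) are already available for the query deletion alone, for each insertion mutant in the wild-type background and for the same insertion in the query deletion background; the function names, the threshold of 0.1 and the example values are hypothetical and serve only as illustration.

```python
def interaction_score(w_query: float, w_insertion: float, w_double: float) -> float:
    """Deviation of the observed double-mutant fitness from the
    multiplicative expectation w_query * w_insertion."""
    expected = w_query * w_insertion
    return w_double - expected


def classify_interaction(score: float, threshold: float = 0.1) -> str:
    """Label an interaction; the threshold is an arbitrary illustrative cutoff."""
    if score <= -threshold:
        return "negative"  # double mutant is sicker than expected
    if score >= threshold:
        return "positive"  # double mutant is fitter than expected
    return "none"


# Hypothetical synthetic lethal pair: each single mutant grows almost
# normally, but the combination abolishes growth.
score = interaction_score(w_query=0.9, w_insertion=1.0, w_double=0.0)
print(round(score, 2), classify_interaction(score))
```

In practice a statistical test across replicates and insertion sites would replace the fixed threshold used here.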

Fig. 7: Integrating TIS with RNA-seq data.

A | An example of combining RNA sequencing (RNA-seq; depicted in green throughout) and transposon-insertion sequencing (TIS; depicted in red throughout) to identify antagonistic interactions between the antibiotics polymyxin B (PolyB) and gentamicin (Gent) or tobramycin (Tobr) in Pseudomonas aeruginosa. TIS data indicate that genes mexY and mexX are involved in intrinsic resistance in P. aeruginosa to gentamicin and tobramycin, indicated by the red edges between the antibiotics and the genes. When these genes are disrupted by a transposon insertion, the bacterium becomes more sensitive to these antibiotics. Moreover, RNA-seq data reveal that polymyxin B induces expression of these genes, as indicated by the green arrows. This led to the hypothesis that polymyxin B, owing to its transcriptional activation of mexY and mexX, will make the bacterium less sensitive to either gentamicin or tobramycin. The study authors confirmed experimentally that these antibiotics work in an antagonistic manner48, which highlights the strength of probing response networks from different perspectives to extract biological meaning. B | Measurement of TIS and RNA-seq responses under the same conditions (part Ba) has shown that the transcriptional responses to a specific environment (Δ expression) are often not accurate predictors of gene deletion phenotypes, as expression and fitness do not correlate well (Δ fitness; part Bb)119. However, by overlaying these datasets over a known network (for example, a metabolic network; part Bc), network analyses can identify patterns between TIS and RNA-seq. In this example a small part of a metabolic network is depicted; grey circles are metabolites and arrows are genes encoding enzymes that mediate each reaction. Red arrows are phenotypically important genes in a specific environment identified by TIS, whereas green arrows are genes that change transcriptionally in the same environment, identified by RNA-seq.
The distance between two genes in a metabolic network is the number of reactions between them, and can be calculated for all pairs across the network. In part Bd, the left network is an example where distances between pairs of fitness and expression changes are small, whereas the right network illustrates larger distances. An adapted response to an environment is characterized by fitness and expression changes that are relatively small (distance to neighbour) and correlated (Δ fitness × expression). Exposure to stress conditions to which a bacterium is not adapted leads to a loss in correlation between genes that change in transcription and those that have a fitness effect (part Bd)119. Importantly, such associations can be used to make predictions on antibiotic susceptibility126. Part B is adapted with permission from ref.119, Elsevier.

Both TIS and RNA-seq data have shown that even relatively simple perturbations in bacteria (for example, changes in pH, or exposure to low-level antibiotics) trigger complex responses. There is value in combining these data sets; for example, combined TIS and RNA-seq antibiotic response data obtained from P. aeruginosa allowed predictions of antagonistic antibiotic combinations48 (Fig. 7A). Such observations suggest that TIS and RNA-seq may register distinct but complementary features of the underlying network architecture of the cell and illustrate how responses can be separated into at least two organizational levels (phenotypic and transcriptional). Remarkably, when expression and fitness are directly compared (Fig. 7Ba) there is often little correlation between them (Fig. 7Bb)119,120,121, with some notable exceptions, such as in some metabolic pathways120 and classical virulence factors39. This discordance has long been known in yeast122,123,124, and suggests that the majority of transcriptional regulation is not optimized by selection125.

Considering RNA-seq and TIS in the context of the underlying cellular network can clarify their relationship119 (Fig. 7Bc). When responding to an environmental stimulus to which a bacterium is adapted (for instance nutrient depletion), the majority of fitness and expression changes occur at short distances from each other within the metabolic network, with 80% of genes with fitness changes connected by two or fewer metabolic reactions to genes with expression changes, and 93% within a three-reaction radius119 (Fig. 7Bd). Furthermore, the correlation of fitness and expression changes decreases with distance in the network119. This indicates that fitness and expression changes are not only colocated within the network but are of a magnitude comparable to those of their neighbours. These relationships can disappear when a bacterium responds to an environment to which it is not adapted, for instance an antibiotic119 (Fig. 7Bb). This suggests that the apparently paradoxical lack of correlation between fitness and expression measurements can be in part understood through network models that incorporate regulatory and genetic relationships, which could aid drug target predictions and genetic network engineering. Quantifying the degree of disruption a stimulus creates in a bacterium’s transcriptional network has already resulted in accurate predictions of fitness, antibiotic sensitivity and drug mechanism of action126.
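The distance-based comparison described above can be sketched computationally. The example below assumes a simplified representation in which genes are nodes and two genes are adjacent when their reactions share a metabolite; this representation, the function names and the toy pathway are illustrative assumptions and not the implementation used in ref.119. A breadth-first search started from all genes with expression changes gives, for every gene, the number of steps to the nearest expression change, from which the fraction of fitness genes within a given radius follows.

```python
from collections import deque


def distance_to_nearest(adjacency, sources):
    """Multi-source breadth-first search: steps from each reachable gene
    to the closest gene in `sources` (source genes have distance 0)."""
    dist = {gene: 0 for gene in sources if gene in adjacency}
    queue = deque(dist)
    while queue:
        gene = queue.popleft()
        for neighbour in adjacency.get(gene, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[gene] + 1
                queue.append(neighbour)
    return dist


def fraction_within(adjacency, fitness_genes, expression_genes, max_distance):
    """Fraction of fitness genes lying within `max_distance` steps of any
    gene with an expression change; unreachable genes count as far away."""
    fitness_genes = list(fitness_genes)
    if not fitness_genes:
        raise ValueError("fitness_genes must not be empty")
    dist = distance_to_nearest(adjacency, expression_genes)
    close = sum(
        1 for gene in fitness_genes
        if dist.get(gene, float("inf")) <= max_distance
    )
    return close / len(fitness_genes)


# Hypothetical linear pathway: geneA - geneB - geneC - geneD - geneE
toy_network = {
    "geneA": ["geneB"],
    "geneB": ["geneA", "geneC"],
    "geneC": ["geneB", "geneD"],
    "geneD": ["geneC", "geneE"],
    "geneE": ["geneD"],
}
print(fraction_within(toy_network, ["geneB", "geneD"], ["geneA"], max_distance=2))
```

How distance is counted (here, adjacent genes have distance 1) is a convention chosen for this sketch; a different convention would shift the radius values without changing the overall approach.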

Conclusions and future perspectives

Considerable advances and extensions of TIS have been made since its introduction in 2009, many of which are highlighted in this Review. As a result of advances in cell sorting and microfluidics, TIS has been adapted to examine phenotypes on the level of single mutant cells11,15,127. TIS has also demonstrated its practical utility, for example, in the development of new vaccine candidates76,77 and new antibiotics or helper drug targets42. Lastly, genetic interaction approaches are beginning to build networks that map out the complex relationships between genetic components within the cell, and these could be further extended by combining two transposons into a single genome, by combining TIS and CRISPR-based transcriptional interference (CRISPRi) or by scaling approaches that simultaneously mutagenize interacting organisms128.

Furthermore, the vast majority of TIS studies to date have been performed in bacterial species, owing to their ease of genetic manipulation. For those species in which TIS works well, it is a powerful technique that can provide high volumes of valuable genotype–phenotype linkage data on a fine scale. Over the next decade, we hope to see an expansion in the types of organisms assayed by TIS methods. Encouragingly, several related TIS-like methods have been developed for use in mammalian, fungal, parasite and archaeal backgrounds. Profiles of TIS applied in the two model yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe identified essential genes, genes involved in rapamycin resistance and factors that contribute to the formation of heterochromatin129,130,131. Other examples include determining the essential genes of the archaeal species Sulfolobus islandicus132 and Methanococcus maripaludis133, phenotypic interrogation via tag sequencing (PhITSeq) in haploid human cells to assign gene function134,135, quantitative insertion site sequencing (QISeq) in mice using piggyBac and Sleeping Beauty transposons to screen them for cancer-related genes136,137, QI-Seq in Plasmodium falciparum, using the piggyBac transposon to determine essential genes138 and barcode analysis by sequencing (BarSeq) in yeast24,139. Moreover, we expect the biological questions answered with TIS to become increasingly complex. Such applications will only enhance the utility and breadth of TIS approaches in the future.

The analogous functional genomics method of pooled CRISPRi screening, which silences genes in a targeted fashion and uses single-guide RNAs and catalytically dead Cas proteins (first demonstrated with dCas9 (ref.140)), has been successfully applied to numerous bacterial species since the development of mobile CRISPRi systems141,142,143,144,145. CRISPRi has some advantages over TIS, primarily that silencing is directly targetable to regions of interest, which can reduce the complexity of the assay and thus the amount of sequencing reads required, and it can allow knockdown of any coding regions, for instance essential genes142, which traditional TIS cannot. However, CRISPRi requires design, synthesis and cloning libraries of sgRNAs, which can be technically challenging, and understanding the impact of off-target effects or differences in sgRNA efficiency can add complications during implementation and analysis. By contrast, the execution of TIS requires no prior specific knowledge of the genetic make-up of an organism, and owing to its more random nature, TIS can uncover unexpected or novel genes, can potentially assay transcriptionally inactive regions of the genome and can be precise enough to interrogate specific regions within the transcriptional units, such as essential protein domains38. Modifications of both technologies can assay the effects of gene overexpression and suppression through complementary approaches. Functional genomics screening techniques such as TIS and CRISPRi can suffer from similar shortfalls, namely that deciphering detailed mechanistic insight from the large datasets generated can be difficult to automate. For effective data analysis, research groups will have to pool data, resources and expertise to construct holistic workflows that can manage this complexity. 
To this end, we expect to see an increase in data sharing platforms, such as the newly established TIS depository TraDIS-vault, the viewer available for the invasive Salmonella Typhimurium strain D23580 (ref.39) or interactive data visualization platforms, such as ShinyOmics41.

Looking forward, we predict that TIS methods will be applied to answer increasingly complex and diverse biological questions. For this expansion, we must move past the straightforward, homogeneously grown laboratory assays to more sophisticated ones that better mimic ecological states that occur in nature. One major limitation of TIS is that it is available only for use in easily culturable and genetically tractable species, which represent only the minority of total bacterial and archaeal species146. A key challenge will be to develop tools to allow the recalcitrant microbes of medical, industrial and environmental importance to be assayed. TIS-inspired methods, such as the ‘magic pool’ approach to optimizing transposon delivery in non-model organisms26, are already beginning to address this. A driving factor will be massive upscaling in the numbers of conditions and bacterial strains that can be simultaneously screened, building on RB-Tn-Seq25 and similar approaches. As these methods push beyond model strains, we will increasingly gain insight into how the genetic diversity within pan-genomes is shaped and maintained. This migration away from one-dimensional genotype versus phenotype experiments will require consideration of the larger genetic network and particularly interactions between genetic background and fitness (see Scaling up TIS using high-throughput phenotyping). This will involve the application of machine learning, modelling and network analyses to integrate and extract knowledge from accumulating TIS datasets. Eventually, and in combination with other postgenomic functional data, this approach will increasingly enable us to move from describing the genetic architecture of the cell to predicting future behaviours119,147. Collectively, these developments illustrate that the journey of TIS is far from over, with many exciting paths yet to be explored.