A decade of advances in transposon-insertion sequencing

It has been 10 years since the introduction of modern transposon-insertion sequencing (TIS) methods, which combine genome-wide transposon mutagenesis with high-throughput sequencing to estimate the fitness contribution or essentiality of each genetic component in a bacterial genome. Four TIS variations were published in 2009: transposon sequencing (Tn-Seq), transposon-directed insertion site sequencing (TraDIS), insertion sequencing (INSeq) and high-throughput insertion tracking by deep sequencing (HITS). TIS has since become an important tool for molecular microbiologists, being one of the few genome-wide techniques that directly links phenotype to genotype and ultimately can assign gene function. In this Review, we discuss the recent applications of TIS to answer overarching biological questions. We explore emerging and multidisciplinary methods that build on TIS, with an eye towards future applications.

Transposon-insertion sequencing (TIS) methods combine large-scale transposon mutagenesis with next-generation sequencing to estimate the essentiality and/or fitness contribution of each genetic feature in a bacterial genome simultaneously. A strength of TIS is that experiments are performed with pooled transposon libraries, which allows direct linkage of phenotype to genotype in a high-throughput manner. Ultimately, TIS aims to elucidate the function of each genomic feature and is therefore a critical tool to help interpret the mounting volume of genome sequencing data being generated. TIS methods can be sensitive enough to detect even minor changes in mutant fitness and, with sufficient insertion density, precise enough to assay not only genes but also intergenic regions, promoter regions and essential protein domains within coding regions. Four variations on the TIS method were published in 2009: transposon sequencing (Tn-Seq) 1, transposon-directed insertion site sequencing (TraDIS) 2, insertion sequencing (INSeq) 3 and high-throughput insertion tracking by deep sequencing (HITS) 4. Since then, TIS has become a valuable tool in our molecular biology toolkit, whose full utility is still being explored.
The basic TIS workflow is summarized in Fig. 1. Briefly, it begins with construction of a saturated mutant library (Fig. 1A) by introducing a randomly inserting transposon, commonly a Tn5 or mariner transposon, into a strain of interest, often by transformation or conjugation. The goal is to create a population of bacteria in which each cell carries a single transposon insertion in the genome and, when cells are pooled together, each genetic component is disrupted multiple times at different sites. By direct sequencing of the transposon-flanking regions of the initial library, potential essential features can be identified as those that do not tolerate insertions. Alternatively, the library can be subjected to a selective condition, for instance antibiotic stress (Fig. 1B), to query non-essential features involved in survival and growth within that environment. Such conditionally important components are defined by insertions whose frequency changes significantly in the population during the selection, as determined by sequencing before and after selection. Genomic features in which disruptive transposon insertions decrease in frequency over experimental selection are assumed to be important for fitness in the test conditions; such features could include antibiotic resistance genes during antibiotic selection or virulence factors in an infection model. Features in which insertions increase in frequency are assumed to have a disadvantageous effect in the test conditions; these include negative regulators of fitness-enhancing features and metabolically costly systems that are not necessary in those conditions.
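The before-and-after comparison described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any published TIS pipeline: the function name, the dictionary inputs, the pseudocount and the simple depth normalization are all assumptions, and real analyses add replicate handling and statistical testing.

```python
import math
from collections import defaultdict


def gene_log2_fold_changes(before, after, gene_of_site, pseudocount=1.0):
    """Per-gene log2 change in insertion read frequency across a selection.

    before, after: dict mapping insertion position -> read count
    gene_of_site:  dict mapping insertion position -> gene name
    (All inputs and the pseudocount are illustrative assumptions.)
    """
    total_before = defaultdict(float)
    total_after = defaultdict(float)
    for site, gene in gene_of_site.items():
        total_before[gene] += before.get(site, 0)
        total_after[gene] += after.get(site, 0)

    # Normalize by sequencing depth so the two samples are comparable.
    depth_before = sum(before.values())
    depth_after = sum(after.values())

    changes = {}
    for gene in total_before:
        freq_before = (total_before[gene] + pseudocount) / depth_before
        freq_after = (total_after[gene] + pseudocount) / depth_after
        changes[gene] = math.log2(freq_after / freq_before)
    return changes


# Hypothetical toy data: insertions in geneY are depleted during selection.
sites = {10: "geneX", 20: "geneX", 30: "geneY", 40: "geneY"}
before = {10: 100, 20: 100, 30: 100, 40: 100}
after = {10: 175, 20: 175, 30: 25, 40: 25}
lfc = gene_log2_fold_changes(before, after, sites)
```

A negative value (geneY here) corresponds to a feature whose disruption reduces fitness under selection; a positive value corresponds to a feature whose disruption is advantageous.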
Fig. 1 | A | The first step is to create random transposon (Tn) mutants (part Aa). The horizontal black lines and arrows represent the host's genomic DNA (gDNA), and the coding regions of genes are marked 'X', 'Y' and 'Z'. The horizontal blue line is the transposon, which contains an antibiotic resistance (AbR) selection marker and is bounded by inverted repeats, shown in green. When the transposon inserts itself into the gDNA, the disruption in a gene (gene Y in this example) is shown by a red cross. The second step is to select and pool mutants (part Ab). The red cross represents a single mutation in each cell; these cells are selected, for instance on antibiotic-containing agar plates, and pooled, and DNA is extracted. The third step is fragmentation, addition of adaptors and PCR amplification (part Ac). Fragmentation (vertical dashed lines) can be enzymatic or by shearing (depending on the version of TIS). Sequencing adaptors (yellow rectangles) are then added, and primers (purple arrows) P1 and P2 are used for PCR amplification. Step 4 is sequencing and mapping (part Ad). Sequences out from the transposon end (using primer P3) are mapped onto the reference genome and the transposon insertion point is determined (vertical red arrow) and mapped for each mutant. Genes that cannot tolerate insertions (gene Z in this example) will not have any TIS reads mapped. B | Challenging the TIS library, in this example with an antibiotic, colistin (bottom row), compared with an untreated control (top row) (data from ref. 42). The vertical lines denote the density of insertions at each insertion site, and red and blue denote forward or reverse insertion direction, respectively. Below are the predicted genes, in light blue. The first gene (usq) has equivalent numbers of insertions in the treated and untreated samples and thus has no effect on fitness in colistin. The next gene (truA) has relatively more insertions in the treated sample compared with the control (its mutants have increased fitness in colistin) and thus is considered a sensitivity gene. The next gene (dedA) is an experimentally confirmed resistance gene 42 and has decreased insertions in the treated sample (mutants have decreased fitness in colistin). The last two genes (accD_1 and folC) have no insertions in either the treated sample or the untreated sample and are thus considered essential for growth.
There are four major TIS versions that differ in various steps of their sequencing procedures (see ref. 5 for more detail on these variations). For example, the way DNA is fragmented for library preparation differs: Tn-Seq and INSeq use the type II restriction enzyme MmeI to yield uniform-length shorter reads, which can remove PCR amplification bias, whereas TraDIS and HITS use random-sized shearing via sonication, which can have the advantage of improved transposon mapping owing to longer reads. Similarly, Tn-Seq and INSeq exclusively use the mariner transposon, which inserts itself into thymine-adenine dinucleotide (TA) sites but otherwise does not have a sequence preference, whereas the others can use any transposon but commonly use Tn5, as it is commercially available and does not have an insertion-site bias. After fragmentation, various adaptors are added, and transposon-genome junctions are amplified and sequenced with a sequencing primer facing out of either the transposon or the adaptor. Finally, mapping of the adjacent genomic DNA allows the exact position of each transposon in the bacterial genome to be determined with use of appropriate bioinformatic tools (see Developments in TIS data analysis).
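Because mariner inserts only at TA dinucleotides, the set of possible insertion sites in a region can be enumerated directly from its sequence. The sketch below is illustrative only: the function names and the 0-based coordinate convention are assumptions, and the short sequence is a made-up example. As TA is its own reverse complement, scanning one strand finds the sites on both.

```python
def ta_sites(sequence):
    """Return the 0-based start position of every TA dinucleotide."""
    seq = sequence.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "TA"]


def ta_site_occupancy(sequence, insertion_positions):
    """Fraction of TA sites in `sequence` that carry at least one insertion.

    insertion_positions: iterable of 0-based positions (same convention as
    ta_sites). Returns None when the sequence has no TA site at all.
    """
    sites = ta_sites(sequence)
    if not sites:
        return None
    hit = set(insertion_positions)
    return sum(1 for s in sites if s in hit) / len(sites)


# Hypothetical 9-nt region with TA sites at positions 3 and 7.
region = "GATTACATA"
sites = ta_sites(region)
occupancy = ta_site_occupancy(region, [3])
```

A region with many TA sites and an occupancy near zero in a dense library is a candidate essential feature, whereas a region with few TA sites cannot be assessed with confidence.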
Since the last comprehensive reviews on TIS 5,6 in 2013, a range of exciting and multidisciplinary methods that build on TIS have emerged to answer increasingly complex biological questions. These advances include scaling TIS analysis to hundreds of different conditions using high-throughput phenotyping, the use of machine learning to predict bacterial survival outcomes and combining TIS with cutting-edge techniques from single-cell analysis (droplet Tn-Seq (dTn-Seq)) to fluorescence sorting (TraDISort). Analysis tools have also evolved to cope with this increase in complexity of TIS studies. Lastly, a broad range of in vitro and in vivo applications of TIS have been implemented in pathogenic, commensal and environmental bacteria in the past decade. In this Review, we discuss these exciting developments and applications of TIS and present our vision for TIS into the future. We refer readers to previous reviews 5-7 for detailed information on the design of TIS experiments, including choice of transposon and statistical impacts of experimental parameters, comparisons of TIS method variations, limitations of standard TIS and details on applications before 2013.

Advances and extensions of TIS methods
Over the past decade, TIS methods have been developed to incorporate other technologies and techniques to answer complex biological questions in creative ways. These include physical separation and sorting of individual mutant cells, using inducible promoters to study essential genes and scaling of current techniques to simultaneously screen multiple environments and different species, facilitating pan-organism analysis (Fig. 2).
Fig. 2 | a | This includes assaying genes for motility by inoculating transposon-insertion sequencing (TIS) mutants on an agar plate (yellow circle and beige spot) and separating the inner mutant pool (less motile) from the outer mutant pool (more motile). b | In density-TraDISort, mutant populations can be separated into top, middle and bottom fractions (shown by horizontal orange bands) on the basis of their increased or decreased cellular density using a Percoll gradient and centrifugation. c | Separation of single mutants using fluorescence (TraDISort). The mutant pool is treated with the fluorescent marker ethidium bromide (EthBr) and subjected to fluorescence-activated cell sorting, where each cell is sorted with use of a laser (horizontal red line) on the basis of its fluorescence (shown as green), reporting on efflux activity. d | Encapsulation, growth and sorting with microfluidics of single mutants in droplets for droplet transposon sequencing (dTn-Seq). Each single mutant with different growth rates (in the schematic on the left, low, medium and high levels of growth are represented as blue, orange and green background colours, respectively) will grow independently within its own droplet (grey circle), eliminating the effects of interactions between mutants. A final sorting step, based on cell fluorescence or microscopy, can also be added. Alternatively, cell-containing droplets (blue droplet on the right) can undergo multiple layers of re-encapsulation, so that an encapsulated mutant can be encapsulated within another droplet containing a different cell (shown in yellow; this can be another mutant, another bacterial cell or a host cell) and signals can freely diffuse between the layers (shown as red bolts) to allow cell interactions to be investigated by sorting those cell combinations that have altered fitness and grow at different rates, or those that can be separated by sorting based on markers, such as alterations of cell morphology observed with a microscope.
www.nature.com/nrg

Beyond growth-based selection approaches.
A major recent advance of TIS is based on the ability to separate mutants by their physical characteristics, rather than solely on the basis of growth. The simplest forms of this have adapted classical microbiological assays to the massive multiplexing made possible by TIS (Fig. 2a). For example, motility genes can be assayed by 'racing' mutant libraries across agar plates and comparing mutants in the inner population (less motile) with those in the outer population (more motile). This approach has been applied to Escherichia coli ST131 (ref. 8) and Pseudomonas aeruginosa PA14 (ref. 9), leading to the identification of known motility genes, such as those encoding common bacterial motors (flagella, fimbriae and pili), in addition to new candidates. Similarly, density-TraDISort 10 combines TraDIS and density gradient centrifugation to separate mutants on the basis of their density (Fig. 2b) and identify genes involved in bacterial capsule production, which is a major virulence factor for many pathogens. In this study, 78 genes underlying capsule production were identified across two clinically relevant Klebsiella pneumoniae strains 10. The application of cell sorting to TIS has led to the development of techniques that progress from bulk separation to separation of single cells. One such application is TraDISort, which combines fluorescence-activated cell sorting (FACS) and TraDIS 11 and sorts single cells on the basis of fluorescence. TraDISort has used the cytosolic concentration of ethidium bromide (EthBr), a fluorescent DNA intercalating agent, as a marker for altered efflux activity (Fig. 2c). For instance, mutants with insertions in efflux pump genes, such as amvA, had reduced ability to remove ethidium bromide from the cell, resulting in an overall higher level of fluorescence. By contrast, mutants with insertions in genes such as the amvA repressor gene (amvR) had increased efflux and lower fluorescence. A similar approach used a fluorescent reporter to separate heterogeneous populations of Mycobacterium tuberculosis 12, which uncovered lamA, a gene of previously unknown function that reduced overall heterogeneity in the population by decreasing asymmetric polar growth. Similarly, FAST-INSeq was developed to identify regulators of typhoid toxin production, with use of FACS of Salmonella enterica subsp. enterica serovar Typhi-infected macrophages with a fluorescent reporter for toxin expression 13. Lastly, Tn-FACSeq was used to identify genes from Bdellovibrio bacteriovorus, a bacterial predator, that are important for attachment to Vibrio cholerae 14.
These types of fluorescence-based technique could be extended further, for instance to examine bacterial responses to other fluorescent (or fluorescently tagged) compounds, to other fluorescent reporter constructs or simply using FACS to differentiate mutants with altered cell size.

Population-independent mutant assays.
In traditional TIS approaches, mutant fitness is measured within the context of the entire mutant population. However, the true fitness of a mutant can be obscured when it is grown in the presence of other mutants. For instance, TIS cannot report on the effect of secreted products or other 'common goods' that act beyond the cell containing the mutation, or similarly mutants that suffer from density dependence. Recently, on the basis of advances in single-cell analysis, dTn-Seq was developed to address these issues 15 . dTn-Seq sorts single mutants by combining microfluidics with TIS, encapsulating individual transposon mutants in growth-medium-in-oil droplets, facilitating isolated growth of mutants free from the influence of the population (Fig. 2d). dTn-Seq experiments showed that in Streptococcus pneumoniae 1-3% of mutants have altered fitness when grown in isolation; some mutants may grow faster or slower in isolation compared with their growth measured in a traditional TIS screen. To highlight its versatility, dTn-Seq has been applied to investigate hypercompetence, processing of host glycoproteins, defence against host immune factors and microcolony formation 15 . Moreover, dTn-Seq is compatible with microscopy and FACS-based screening, and by reloading droplets into a microfluidic device, multilayer encapsulations can be achieved. Such droplets consist of multiple layers containing different mutants, other bacterial species or even host cells, between which communication signals can freely diffuse, thereby facilitating investigations of interbacterial and bacteria-host cell interactions 15 (Fig. 2d). Although these droplets cannot easily be applied to in vivo animal models, any interesting phenotypes that arise from dTn-Seq screens, including host-microorganism interaction mediators, can be directly confirmed in cell culture assays and/or in vivo in animal models using targeted mutants.

Assaying function of essential genes and gain-of-function screens.
One limitation of standard TIS is that only non-essential genes can be assayed, as essential genes, by definition, do not tolerate insertions. A handful of studies have overcome this by using gain-of-function screens that use libraries of transposons with outward-facing promoters to facilitate gene overexpression and repression (Fig. 3A). Monitoring the change in frequency of transposons that induce the expression of downstream genes, including essential genes, during selection can identify phenotypes that may not be evident from gene disruption (Fig. 3B). This idea is not new; for example, the TnAraOut method, developed in 2000 (ref. 16), used transposons containing the arabinose-inducible promoter PBAD to screen V. cholerae for essential antibiotic targets. Various approaches in which an outward-facing promoter is engineered into a transposon system have been developed to assay essential genes, for example in Caulobacter crescentus 17 and Staphylococcus aureus 18,19. Recently, this approach was combined with traditional TIS, resulting in the TraDIS-Xpress package 20 (previously known as TraDIS+ (ref. 21)). TraDIS-Xpress uses an inducible PBAD promoter facing out of a Tn5 in E. coli, in addition to detailed transposon-mediated inactivation data, to query all genes. It was successfully applied to identify both essential and non-essential genes affecting tolerance to various concentrations of the biocide triclosan, and differential responses to bactericidal and bacteriostatic concentrations were found. A high-throughput method for gain-of-function assays, dual-barcoded shotgun expression library sequencing (Dub-Seq), was also recently developed; in this method, barcoded overexpression libraries of E. coli were mapped and used to assign gene function in 52 experimental conditions on the basis of mutant fitness changes due to increased gene dosage 22.
To control expression in gain-of-function screens, some studies 17,20 used inducible promoters on a single transposon, which can have the advantage (over using multiple transposons) of allowing high library density and reducing insertion bias. Other studies used constitutive promoters with different strengths on either barcoded or different types of transposons, which has the strategic advantage that different gene dosages can be assayed in the same culture 18,19 .

Scaling up TIS using high-throughput phenotyping.
Transposon
A mobile genetic element that inserts itself into a genome and disrupts genes or genetic features at that site.

Next-generation sequencing
DNA sequencing using a massively parallel platform that separates DNA templates on a flow cell and clonally amplifies clusters for sequencing.

Genomic features
Every component of the genome that can be annotated as being a feature, whether it be a gene, coding RNA, non-coding RNA or promoter region.

Fragmentation
Breaking up DNA into smaller pieces in order to be sequenced. This can be done by physical shearing methods, such as sonication, or enzymatic digestion.

Fluorescence-activated cell sorting (FACS)
A specialized type of flow cytometry that separates cells, one cell at a time, by their fluorescent characteristics on the basis of light scattering.

Nature Reviews | Genetics

Although it is possible to apply the original TIS protocols at scale, the multistep library preparations involved can become increasingly costly when one is dealing with hundreds of samples. One solution to this problem, random barcode transposon-site sequencing (RB-Tn-Seq) 23, introduces a random DNA barcode into each transposon. An initial conventional TIS approach is used to determine the insertion site associated with each barcode, and then a single-step PCR barcode amplicon can be directly sequenced in future experiments to track changes in mutant frequencies 24, substantially speeding up screening. For instance, one recent upscaled study applied RB-Tn-Seq to 32 different bacterial strains across 129 conditions and identified a large variety of leads for gene function 25. A second problem in scaling TIS to large collections of bacteria is that optimized transposon delivery vectors often do not exist for non-model organisms. The 'magic pool' approach accelerates the optimization process using pools of transposon vectors, each of which has a different combination of upstream sequences (promoters and ribosome-binding sites) and antibiotic resistance markers as well as a random DNA barcode sequence, which allows quick measurement of vector efficiency during mutagenesis 26.
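The two-stage logic of RB-Tn-Seq (map each barcode to an insertion site once, then simply count barcodes in every later experiment) can be illustrated as follows. This is a hedged sketch rather than the published RB-Tn-Seq software: the function name, the data structures and the very short barcodes are hypothetical, and real pipelines also handle sequencing errors within barcodes.

```python
from collections import Counter


def count_barcodes(barcode_reads, barcode_to_site):
    """Tally reads per insertion site using a precomputed barcode lookup.

    barcode_reads:   iterable of barcode strings from one amplicon run
    barcode_to_site: dict built once from the initial TIS mapping,
                     barcode -> (contig, position)
    Returns (Counter of reads per site, number of unrecognized barcodes).
    """
    counts = Counter()
    unmapped = 0
    for barcode in barcode_reads:
        site = barcode_to_site.get(barcode)
        if site is None:
            unmapped += 1  # barcode absent from the mapped library
        else:
            counts[site] += 1
    return counts, unmapped


# Hypothetical lookup table and reads (real barcodes are much longer).
lookup = {"ACGT": ("chr", 100), "TTAG": ("chr", 250)}
reads = ["ACGT", "ACGT", "TTAG", "GGGG"]
site_counts, n_unmapped = count_barcodes(reads, lookup)
```

Because each condition then requires only a dictionary lookup per read rather than genome alignment, the per-sample cost of analysis falls along with the cost of library preparation.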

Developments in TIS data analysis
A typical analysis protocol for TIS data can be summarized as follows: after the splitting of sequencing reads on the basis of their multiplexing barcode, any transposon or adaptor sequences are removed, reads are mapped to an annotated reference genome and the unique position of the transposon and relative insertion coverage (that is, number of reads) are recorded, processed and presented to calculate the effect of each transposon on fitness (for example, growth or survival). Over the past 10 years, several bioinformatics pipelines and protocols have been developed for this purpose (Table 1). These TIS tools include Web-based applications 27,129, stand-alone graphical applications 29 and command-line toolkits 30-32. All of these tools implement variations on both gene essentiality and conditional fitness analyses, although they differ in the details of preprocessing and read alignment, normalization techniques, and statistical models or tests used. Often choices made in the experimental protocol can impact the appropriateness of a particular analysis procedure as much as any theoretical issue; for instance, many hidden Markov model and sliding window approaches to defining essential regions in the absence of annotation are applicable only to mariner transposon studies, as the assumption of a uniform insertion probability at TA sites simplifies the underlying statistical model. Many of these core issues were addressed in a 2016 review on design and analysis of TIS experiments 7, but recent TIS developments have pushed analysis methods in new directions.
Fig. 3 | B | In traditional transposon-insertion sequencing (TIS) approaches, essential genes can be identified as those that cannot tolerate insertions (part Ba). In gain-of-function screens, when transposons in the transposon pool are induced, for example with arabinose (part Bb), high expression of essential gene Z is achieved. When selection is applied, involvement of essential genes in the condition can now be assayed (part Bc) by monitoring relative differences in the number of transposons that influence expression levels. In this example, after exposure to the condition and sequencing out of the transposon, an increase in the number of transposon insertions that increase gene Z's expression is observed. Therefore, mutants that overexpress gene Z have increased fitness within the overall population during selection, indicating that gene Z's expression is beneficial in that condition. All other features are as in Fig. 1.
One major development of TIS analysis methods is in dealing with infection dynamics and particularly the effects of bottlenecks, which are transient reductions in population size during the course of the experiment (see later). While these have been dealt with in analyses using normalization based on changes in neutral loci 33 or subsampling 30, two interesting new approaches to this problem were proposed recently. In the first, principal component analysis (PCA) is performed on log fold changes in mutant abundance across replicate infection experiments 34. Examination of the principal components recovered can then identify linear combinations of the changes across replicates that separate genes consistently across experiments, providing a score for association of any particular gene with survival in infection and eliminating the contribution of spurious stochastic changes. A second approach has adopted the zero-inflated negative binomial (ZINB) distribution to model transposon insertion counts 35. The ZINB distribution is a mixture of a logit distribution, which captures the probability of detection of a data point, and the negative binomial, which captures the overdispersion generally observed in sequencing data. This distribution has attracted attention recently in the analysis of single-cell RNA sequencing (RNA-seq), as it provides a natural mechanism for capturing technical dropout of transcripts 36. Similarly, by fitting the logit component of the ZINB genome-wide, this approach can correct for differences in library saturation between conditions arising either in library creation or due to bottlenecking.
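To make the structure of the ZINB distribution concrete, its probability mass function can be written out directly. The sketch below uses the standard textbook form with a mean and a size (dispersion) parameter for the negative binomial component and a single zero-inflation probability; in regression settings that probability is usually modelled through a logit link. The parameterization and function names are our assumptions and are not taken from ref. 35.

```python
import math


def nb_pmf(y, mean, size):
    """Negative binomial probability of count y (mean > 0, size > 0).

    Variance is mean + mean**2 / size, so a smaller size gives more
    overdispersion relative to a Poisson with the same mean.
    """
    log_p = (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
             + size * math.log(size / (size + mean))
             + y * math.log(mean / (size + mean)))
    return math.exp(log_p)


def zinb_pmf(y, mean, size, zero_prob):
    """Zero-inflated negative binomial probability of count y.

    With probability zero_prob the site is a structural zero (for example,
    an insertion absent from the library); otherwise the count follows the
    negative binomial.
    """
    p = (1.0 - zero_prob) * nb_pmf(y, mean, size)
    if y == 0:
        p += zero_prob
    return p


# Illustrative parameters only: mean 20 reads, size 2, 30% structural zeros.
p_zero = zinb_pmf(0, 20.0, 2.0, 0.3)
total = sum(zinb_pmf(y, 20.0, 2.0, 0.3) for y in range(3000))
```

Separating the two sources of zeros is what allows differences in library saturation to be absorbed by the zero-inflation term rather than being misread as changes in fitness.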
A second development is the move from simple condition-control comparisons to the simultaneous investigation of large suites of conditions. For instance, the PCA and ZINB methods highlighted above demonstrated their effectiveness using combinations of existing datasets, identifying commonalities in genes required by different Vibrio strains in infection 37 or response to a panel of antibiotics in M. tuberculosis 35, respectively. A striking example of such a data analysis pipeline, AlbaTraDIS 38, was developed for the TraDIS-Xpress study examining triclosan tolerance (see ref. 20 and the TraDIS-Xpress discussion earlier); it uses sliding window analyses that integrate all available information to predict all genes and promoter regions involved during selection.
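The general idea of an annotation-independent sliding window scan can be sketched as below. This is a generic illustration and not the AlbaTraDIS implementation: the function name, window size, step and half-open coordinate convention are all assumptions.

```python
from bisect import bisect_left


def sliding_window_counts(insertion_positions, genome_length, window=100, step=50):
    """Count insertions in half-open windows [start, start + window).

    Returns a list of (start, end, n_insertions) tuples for windows that
    fit entirely within a linear genome of `genome_length` bases.
    """
    positions = sorted(insertion_positions)
    results = []
    for start in range(0, max(genome_length - window, 0) + 1, step):
        end = start + window
        # Binary search gives the number of sorted positions in [start, end).
        n = bisect_left(positions, end) - bisect_left(positions, start)
        results.append((start, end, n))
    return results


# Hypothetical 200-bp genome with five insertions.
windows = sliding_window_counts([5, 20, 45, 160, 190], 200, window=50, step=50)
empty = [(s, e) for s, e, n in windows if n == 0]
```

Windows with unexpectedly few insertions, or with a large change in insertion counts between conditions, can then be flagged without relying on gene annotation.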
As TIS experiments become more complex and the results are combined with other data types, tools for visualization and data delivery are becoming increasingly important. For example, a recent study integrating expression and fitness data from the HIV-associated Salmonella enterica subsp. enterica serovar Typhimurium strain D23580 (ref. 39) included a Dalliance-based browser 40, allowing readers to directly interrogate the data themselves and providing a valuable community resource. Platforms for easily providing this kind of interactive interface are beginning to emerge, such as ShinyOmics 41, a Web-based application for rapid collaborative exploration of omics data, including TIS, RNA-seq and proteomics data, which allows comparisons between datasets, PCA and simple network analysis. As datasets accumulate and automation increases throughput, such integrative analysis approaches will become increasingly important.

Sliding window
A window of arbitrary length is set and events within that window are assayed. This window is then moved around the genome. This approach provides a more objective method of assessing the genome that is independent of annotation.

Bottlenecks
Events in which the size of a population is drastically reduced through stochastic processes; the surviving cells make up the new population, which has reduced genetic diversity.

Overdispersion
When data exhibit greater variability than would be expected under a statistical model. In the context of sequencing data, overdispersion is often used in reference to the negative binomial distribution, which can be understood as a generalization of the Poisson distribution that allows a larger variance relative to the mean.

Key biological applications of TIS
Since its development, TIS has been used in a range of in vitro studies as well as in vivo infection models. Here we summarize how the development of TIS has facilitated the investigation of key biological questions, with a focus on studies with implications for human health.
Identifying genes and networks involved in antibiotic resistance.
The emergence of antibiotic resistance is a major global health problem, exacerbated by a lack of development of new antibiotics. TIS is well equipped to infer the relative impact that disrupting each genomic feature has on antibiotic sensitivity (Fig. 4). TIS studies have shown that while antibiotics may have specific targets (for example, in cell wall synthesis, DNA replication or protein synthesis), the bacterial response to antibiotics is actually distributed across the genome. For example, fluoroquinolones target topoisomerase IV and DNA gyrase, which are essential enzymes involved in DNA replication. Although most TIS experiments cannot assay these targets directly owing to their inherent essentiality, TIS profiles generated under fluoroquinolone exposure implicate other genes involved in DNA replication and repair, such as recN and xseA 33,45. While these genes are not direct fluoroquinolone targets, they contribute to intrinsic resistance as part of a secondary effect; for example, fluoroquinolones trigger DNA damage, which activates DNA repair. In general, TIS profiles for each antibiotic tested, and each organism screened, show a role for genes beyond those related to the primary target, indicating the importance of genes with diverse functions, including amino acid and carbohydrate metabolism, energy generation, transport and regulation 42,43,46-51. This finding underlines that although we have a limited view of how an antibiotic inhibits a bacterial cell, TIS can be used to uncover this complex, multifactorial process. As a result, TIS profiles have been demonstrated to be effective in determining the mechanism of action of novel antibiotics 52,53.
Creating profiles for multiple similar conditions can help to direct attention to genes with unknown function that are important under all of these conditions and thereby help to identify leads that may assist in uncovering their function (Fig. 4b; see also Integrating TIS with other genomic approaches to predict complex traits). Moreover, TIS profiles can uncover opportunities to sensitize a bacterium to a drug, facilitating the design of secondary or helper drugs 42,54.

Investigating virulence genes, host adaptation and vaccine development.
Interrogating the genomic requirements for pathogens to cause disease has been a major motivation in the development of large-scale reverse genetics approaches (Fig. 5). For instance, the development of early transposon screens in Salmonella Typhimurium 55,56 provided key evidence for the discovery of major virulence factors. TIS has made it much easier and faster to screen bacterial pathogens for virulence factors. These screens have included in vitro assays, such as capsule production 10,57, growth in serum 58 and colicin resistance and sensitivity in E. coli 59, as well as sporulation in Clostridioides difficile 60.

Reverse genetics
Determining the phenotypic effects of a genetic feature by altering the genetic feature and observing changes in the organism compared with a wild type. 'Reverse' refers to the genotype-to-phenotype mode of investigation, being opposite the classical phenotype-to-genotype genetic investigations ('forward genetics').

Although TIS has streamlined the process of generating and monitoring mutant libraries, challenges remain in applying the technique to infection models. A primary concern is the effect of bottlenecks, which can be quantified experimentally by measuring the loss of neutral markers 33, for instance using the wild-type isogenic tagged strains (WITS) method 71. Bottleneck effects should be considered computationally (see earlier) and can, at least partially, be avoided by careful consideration of the infection model. The size and temporal structure of a bottleneck are often specific to the particular infection model, and can be influenced by a range of factors, including physical barriers, nutrient availability and competition with the native microbiota 72. Whereas mild bottleneck effects can be partially compensated for during analysis, major bottlenecks can irreversibly bias an experiment that does not account for them. In these cases, the surviving mutants represent the subset of bacteria that happened to pass some barrier to infection, rather than being representative of all mutants that could, leading to skewed representation and a lack of reproducibility between experimental replicates. The wide variety of infection models developed to study Salmonella Typhimurium provide an example of how model choice can affect the design of a transposon screen (Fig. 5). For the most realistic infection models based on infection through the gut epithelium, bottlenecks can be severe 73,74, limiting transposon analysis to small pools of tens to hundreds of mutants 61,62.
Intraperitoneal inoculation can bypass this major bottleneck in a mouse model, which allowed screening of ~10,000 mutants in a single animal 75 ; however, the results are uninformative with regard to gastrointestinal disease. Finally, in the case of cell culture models, such as the macrophage model that captures a key challenge to the development of systemic salmonellosis 55 , the only constraint on library complexity is the number of cells available for infection, allowing efficient screening of very large mutant populations (~10 6 ) 39 . Similar trade-offs between realism and library complexity are likely to exist for many infection models, particularly those that involve bacterial penetration of barrier defences.
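The idea of quantifying a bottleneck from the loss of neutral markers can be illustrated with a small calculation. The sketch below is not the published WITS analysis; it is a simplified estimate that assumes all tags are equally abundant in the inoculum and that founders are sampled independently, so that the probability of losing a given tag is roughly exp(-N/k) for N founders and k tags. The function name and the example numbers are hypothetical.

```python
import math


def estimate_founders(tags_in_inoculum: int, tags_recovered: int) -> float:
    """Rough founding-population estimate from neutral tag loss.

    Assumption (not from the source): k equally abundant tags and N
    independently sampled founders, so P(tag lost) ~ exp(-N / k) and
    therefore N ~ -k * ln(fraction of tags lost).
    """
    if not 0 < tags_recovered <= tags_in_inoculum:
        raise ValueError("tags_recovered must be between 1 and tags_in_inoculum")
    lost_fraction = 1 - tags_recovered / tags_in_inoculum
    if lost_fraction == 0:
        # No tag was lost: the bottleneck is too wide to resolve with k tags.
        return math.inf
    return -tags_in_inoculum * math.log(lost_fraction)


# Hypothetical example: 8 tagged strains inoculated, 4 recovered from the organ.
print(round(estimate_founders(8, 4), 2))
```

When no tags are lost the estimate is unbounded, which mirrors the practical point that a small tag set can only resolve tight bottlenecks.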
Experimental challenges notwithstanding, TIS applied to animal models has also proved useful in identifying and understanding vaccine targets. For example, screening of a S. pneumoniae TIS library in a ferret transmission model, describing the fitness landscape of genes during mammalian transmission, yielded valuable and translatable data. Targeted deletion confirmed that key TIS hits (putative C3-degrading protease CppA, iron transporter PiaA and competence regulatory histidine kinase ComD) significantly decreased transmissibility. Importantly, maternal vaccination with recombinant PiaA and CppA alone or in combination blocked transmission from mother to offspring and was more effective than capsule-based vaccines 76 . In a second example, a mouse sickle cell disease (SCD) model coupled with TIS identified a set of pneumococcal virulence genes specific to hosts with SCD. Not only did these factors point to aspects of SCD pathophysiology, but they also showed that the protective capacity of antigens can be different in the healthy versus the SCD population, highlighting the importance of understanding bacterial pathogenesis in the context of common comorbidities 77 .

Assaying functional components of mobile genetic elements.
Mobile genetic elements, including plasmids, transposable elements and bacteriophages, are important players in interspecies and intraspecies gene transfer and are heavily implicated in the spread of antibiotic resistance and virulence determinants. Plasmids are notoriously difficult to study with screening approaches, owing to their independent replication systems and capacity to regulate copy number. TIS has been used to identify genes involved in maintenance of IncA/C plasmids in E. coli, and the results were then developed into an IncA/C plasmid typing scheme 78 . TIS has also been used to demonstrate the involvement of a type IV secretion system in conjugation of an IncP plasmid in Edwardsiella piscicida 79 .
Fig. 5 | … analysis. This bottleneck effect is particularly pronounced in models where bacteria must overcome barrier defences (for example oral infection models of Salmonella enterica subsp. enterica serovar Typhimurium). Organoid models can provide a complex environment while limiting bottlenecks, but technical limitations in culture may limit the total population size screened. Finally, cell culture screens can be scaled to arbitrary sizes, allowing screening of extremely large collections of mutants, but they often provide insight into only a particular aspect of disease.

Organoid
A three-dimensional, simplified replica of an organ derived from stem cells to realistically model the organ in vitro.

Nature Reviews | Genetics

Bacteriophages ('phages') are important mobile genetic elements that have been used for bacterial typing for decades and can greatly influence bacterial pathogenesis through the transduction of pathogenicity islands. Additionally, phage therapy to treat resistant bacterial infections is experiencing a resurgence as an alternative to antibiotics. TIS is able to identify essential host factors that mediate or hinder bacteriophage infection. For example, challenge of an E. coli O157 TraDIS library with T4 and T7 bacteriophages identified new host genes involved in both bacteriophage resistance (for example, sspA, encoding stringent starvation protein A) and susceptibility (for example, the sap operon) 80 .
Similarly, experiments with bacteriophages specific to particular capsules have allowed the identification of not only modifiers of bacteriophage resistance 81 , but also genes responsible for capsule expression 57 .
Uncovering essential genes and the influence of the pan-genome.
One of the first uses of TIS was to define the essential genes for survival for a plethora of bacterial species, including human pathogens such as Porphyromonas gingivalis 82 , B. cenocepacia 83 and Yersinia pseudotuberculosis 84 , animal pathogens such as S. equi 85 , plant pathogens such as Pseudomonas syringae 86 , the model organism E. coli K12 (ref. 87) or commensal gut bacteria such as Bifidobacterium breve 88 . These valuable essential gene datasets gained from TIS studies have been shown to correlate well with existing phenotypic gene essentiality data, such as from the single-gene E. coli knockout library (Keio University) 89 , and these can not only be interpreted to understand basic functioning of the cell but can also provide vital information for identifying potential novel drug targets. Once TIS has been implemented in a species it is often straightforward to create libraries in related strains. This flexibility facilitates functional exploration of a species' pan-genome and how genetic background can affect phenotype. This kind of in-depth investigation can ultimately help to uncover how bacterial species, particularly diverse species that include both pathogens and non-pathogens, can become harmful or antibiotic resistant, for example via horizontal transfer of pathogen-associated genetic material. TIS in nine strains of P. aeruginosa determined the core essential genome in five media, and highlighted that essentiality of some genes depends on genomic context 90 . The influence of genetic background on phenotype is further illustrated by examples that highlight how genes involved in responding to antibiotic stress can be strain specific. For example, screening of two S. pneumoniae isolates for genes that are important for intrinsic resistance to antibiotics from three classes showed that on average only ~50% of the responsive genes are common between strains. Investigation of the underlying reasons for this variability showed that network architecture, including regulatory pathways that direct competence, is wired in a strain-specific manner, thereby making responses strain specific 54 . A recent study probed five diverse strains of S. aureus for daptomycin resistance mediators and identified several core pathways consistently involved across strains, including the lipoteichoic acid pathway, as well as factors that varied with strain diversity, such as the cell envelope 18 . Furthermore, in A. baumannii a single gyrA resistance allele results in preferential poisoning of topoisomerase IV by ciprofloxacin, leading to large alterations in the fitness landscape of insertion mutants compared with a wild-type gyrA background. This altered background triggers the activation of prophage and quickly leads to the emergence of ciprofloxacin-resistant clones 91 . In M. tuberculosis, loss-of-function mutations in katG can result in isoniazid resistance. However, TIS experiments have shown that several clinical strains have an increased requirement for katG compared with the reference strain H37Rv 92 . This variability underscores how genome variation can affect adaptive solutions and highlights the importance of extending TIS to clinical isolates that may have a very different genetic background to laboratory strains.
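The notion of a core essential genome versus strain-dependent essentiality can be expressed as simple set operations. The sketch below is illustrative only: it assumes essential-gene calls have already been made per strain and that gene identifiers have been mapped to shared orthologue names, which is a non-trivial step in practice. Strain names and gene identifiers are hypothetical.

```python
from typing import Dict, Set, Tuple


def core_and_variable_essentials(
    essential_by_strain: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str]]:
    """Split essential-gene calls into core and strain-dependent sets.

    core     = genes called essential in every strain (set intersection)
    variable = genes called essential in at least one strain but not all
    """
    if not essential_by_strain:
        raise ValueError("at least one strain is required")
    gene_sets = list(essential_by_strain.values())
    core = set.intersection(*gene_sets)
    variable = set.union(*gene_sets) - core
    return core, variable


# Hypothetical essential-gene calls for three strains.
calls = {
    "strainA": {"dnaA", "ftsZ", "geneX"},
    "strainB": {"dnaA", "ftsZ"},
    "strainC": {"dnaA", "ftsZ", "geneY"},
}
core, variable = core_and_variable_essentials(calls)
print(sorted(core), sorted(variable))
```

Genes that fall in the variable set are the candidates for context-dependent essentiality of the kind described above.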

Understanding metabolism, the response to environmental factors and microorganism-microorganism interactions.
Despite decades of accumulating genome sequences in public databanks, many protein-coding genes remain unannotated or carry inaccurate annotations, particularly in non-model and difficult-to-culture organisms. Often even the conditions under which a gene contributes to survival are unknown, leading to a serious roadblock in any attempt at molecular characterization. TIS can help us understand how bacterial cells experience the changing environments encountered in nature, by providing comprehensive profiling of mutant phenotypes 93 . Specifically, recent TIS studies have identified fitness determinants during energy-limited growth in P. aeruginosa 94 and outlined how several bacterial species synthesize amino acids 95 . Others have identified genes in E. coli that promote survival during exposure to ionizing radiation 96 , genes in Salmonella Typhi that allow adaptation to survival in water 97 and genes involved in desiccation stress in Salmonella Typhimurium 98 . TIS approaches have also uncovered how bacteria interact with plants or deal with soil environments 99 . For instance, TIS was used to identify genes required for growth of the soft-rot pathogenic bacterium Dickeya dadantii in chicory plants 100 or genes of Pantoea stewartii that are essential for survival in planta to provide insights into how it causes wilt disease in corn 101 . As the breadth of examined stresses increases, shared adaptations to diverse stress conditions are emerging.
In a massively upscaled example of assaying genes for adapting to changing environments in parallel, a recent study demonstrated the utility of this approach by applying RB-Tn-seq to 32 diverse bacteria in ~200 conditions; this work assigned a phenotype to more than 11,000 uncharacterized proteins, with ~2,000 of these functional annotations demonstrated to be conserved across organisms 25 . A similar study investigated the major human gut commensal and obligate anaerobe Bacteroides thetaiotaomicron across 492 conditions and identified genes involved in metabolism and bile tolerance 102 . Together, these two studies demonstrate how TIS can be applied broadly across organisms and deeply within an organism to extract leads for future molecular characterization.
TIS approaches can also directly answer questions with implications for human health; for instance, how does the gut microbiota affect drug metabolism and toxicity? Using the robotics-driven TIS mutant array approach pioneered by the INSeq method 3,103 , where individual mutants are mapped through combinatorial pooling, a comprehensive library of ~1,300 B. thetaiotaomicron mutants was selected and incubated with the antiviral drug brivudine. Drug metabolites were then measured by mass spectrometry, and individual B. thetaiotaomicron genes required for the production of a hepatotoxic metabolite were identified. Follow-up studies in gnotobiotic mice confirmed the in vivo relevance of this toxin production pathway 104 , illustrating the power of this approach in understanding drug-microbiota interactions.

Pan-genome
The complete set of genes in all strains within a species, in contrast to the core genome, which is the set of genes shared by all strains within a species.

Essential genome
The complete set of genes and genetic features in a genome that are essential for a cell to survive and grow, the exemplar of which are 'housekeeping' genes for core processes such as replication and division.

www.nature.com/nrg
In the wild, microorganisms rarely live planktonically in isolation, but are constantly interacting with other microorganisms and forming communities, either by chance as in wound co-infections or as part of a stable ecosystem as in the mammalian gut. TIS provides an opportunity to understand these microorganism-microorganism interactions, as the genetic response of one bacterium to other bacteria can be recorded. Numerous studies have shown that the fitness effects of gene disruption can depend critically on the presence of other community members. For instance, co-infection has been shown to alter the bacterial fitness landscape in wound models, such as with the opportunistic pathogens Streptococcus gordonii and Aggregatibacter actinomycetemcomitans 105 . Furthermore, studies of co-infection with P. aeruginosa and S. aureus in mouse surgical wounds showed that ~25% of S. aureus genes that are essential during co-infection are no longer needed during single-species infection (Fig. 6a). Furthermore, single mutants, such as those of the community-dependent essential gene udk, encoding a uridine kinase, were confirmed to influence levels of co-infection but not monoinfection in vivo 106 . Interaction studies have even been extended to predatory relationships, such as in studies of B. bacteriovorus that identified genes required for predation of V. cholerae during planktonic and biofilm growth using Tn-Seq libraries of both the prey 107 and the predator 14 .
Similarly, bacterial genes that influence infection with viruses have been examined by TIS in numerous bacterial species 57,80,81,108 . The mechanisms of bacterial interactions within communities have also been investigated by screening for effectors of type VI protein secretion systems (T6SSs). T6SSs are conserved bacterial defence mechanisms that deliver toxins to neighbouring cells through a contact-dependent mechanism, killing those that lack immunity proteins (Fig. 6b). New toxin and immunity genes have been identified through TIS in V. cholerae 109 and P. aeruginosa 110 . Together, these studies have provided insight into a wide range of relationships that shape the microbial environment.
Integrating TIS with other genomic approaches to predict complex traits.
Whereas TIS has been most commonly used to make simple associations between environments and genetic components, it can also uncover more complex relationships. No genomic element, gene or pathway exists in isolation; rather they are connected through intricate networks, resulting in specific organismal properties and, ideally, an appropriate response when disturbed. One layer of these networks is gene regulation, including the non-coding genome. By combining saturated TIS libraries with expression profiling, one can identify functional non-coding RNAs (ncRNAs). For example, RNA-seq can be used to map out expression units across the entire genome and indicate whether non-coding/intergenic regions display significant levels of transcription. In turn, a parallel TIS experiment with insertions in these regions can then be used to associate a phenotype with the disrupted ncRNA. This approach was used in S. pneumoniae, and yielded 89 ncRNAs, more than half of which had not been identified previously, and several could be associated through TIS in vivo data as being critical for virulence 111 . A follow-up study used different RNA-seq techniques to map the full transcriptional landscape of the S. pneumoniae virulent type strain TIGR4 (ref. 112). This resulted in identification of many non-coding regulatory regions, which could be associated with a phenotype through integration of TIS data from different environments. Another comprehensive TIS-based regulatory study in Neisseria meningitidis identified 288 genes and small ncRNAs needed for colonization of human epithelial and/or endothelial cells 113 .

Fig. 6 | … Fig. 1. b | Using TIS to identify type VI secretion system (T6SS) toxin immunity pairs by growing cells with close cell-to-cell contact, and performing TIS so that genes involved in protection from T6SS-dependent killing (depicted as a yellow bolt) can be detected. In this example, genes in PA encoding immunity proteins (Tsi; orange diamond) that protect from neighbour killing by toxins (Tse; red square) become essential only in a T6SS-active (retS; bottom panel) background but not in the inactive (H1_retS; top panel) cellular background.

Gnotobiotic
An environment for culturing microorganisms, such as an animal model, where all microorganisms are either defined or removed.
Mapping of genetic interactions, which quantify fitness dependencies between genes, can be used to build genetic interaction networks and infer regulatory relationships, pathway structures or leads for gene function 114 . Genetic interactions can be identified by creating a TIS library in a query gene deletion background and screening for genes whose fitness deviates from the multiplicative fitness of the individual mutants (Fig. 7). The most obvious example of a genetic interaction is synthetic lethal interaction, where two individual mutants have no or little fitness effect but abolish growth when combined, and can occur if two gene products perform redundant essential functions. A variety of such interactions exist, which imply different types of relationships between components (reviewed in ref. 114). Genetic interaction networks can be combined with in vivo studies to further inform gene function in the host. For example, if a gene involved in intrinsic resistance to cell wall-targeting antibiotics is important only in healthy mice but not when certain immune components are missing, this indicates that the gene product is not only a resistance factor but is also potentially visible to the immune system (Fig. 4). Several studies have successfully used a genetic interaction approach with TIS in S. pneumoniae, including to uncover regulatory dependencies for catabolite control protein A, how a potassium uptake system and a subpathway for nasopharyngeal colonization are regulated and how the protease ClpP is involved in competence 1,33,54 . Additionally, cell division components such as CozE in S. pneumoniae were identified by use of pbp1A as a query gene, revealing that CozE directs the activity of Pbp1A to the midcell plane, where it promotes zonal cell elongation 115 . Alternatively, a query gene/pathway can be inhibited by a drug or inhibitor, as has been done in the case of wall teichoic acid biosynthesis in S. aureus, and then screened with TIS for synthetic lethal interactions 116 . This study connected wall teichoic acids with other pathways, including cell-envelope D-alanylation, and peptidoglycan and lipoteichoic acid synthesis 116 . Other applications of note include investigating the role of the quorum-sensing and virulence regulator LasR in different P. aeruginosa infection models 117 , and assigning function to uncharacterized genes in M. tuberculosis 118 .
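The multiplicative expectation used to call genetic interactions can be written down directly. The sketch below is a minimal illustration, assuming fitness values are scaled so that wild type equals 1; the function names, the 0.1 threshold and the example numbers are hypothetical and would in practice be replaced by a statistical test across replicates.

```python
def interaction_score(w_query: float, w_gene: float, w_double: float) -> float:
    """Deviation of observed double-mutant fitness from the multiplicative model.

    w_query  : fitness of the query deletion alone
    w_gene   : fitness of the insertion mutant in the wild-type background
    w_double : fitness of the insertion mutant in the query deletion background
    """
    expected = w_query * w_gene
    return w_double - expected


def classify(score: float, threshold: float = 0.1) -> str:
    """Label an interaction; the threshold is an arbitrary illustrative cutoff."""
    if score < -threshold:
        return "negative"  # worse than expected, e.g. synthetic lethality
    if score > threshold:
        return "positive"  # better than expected, e.g. suppression
    return "none"


# Hypothetical synthetic lethal pair: both single mutants grow, the double does not.
print(classify(interaction_score(0.9, 1.0, 0.0)))
# Hypothetical independent pair: the double mutant matches the product.
print(classify(interaction_score(0.5, 0.8, 0.4)))
```

A strongly negative score with near-neutral single mutants corresponds to the synthetic lethal case described above.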
Both TIS and RNA-seq data have shown that even relatively simple perturbations in bacteria (for example, changes in pH, or exposure to low-level antibiotics) trigger complex responses. There is value in combining these datasets; for example, combined TIS and RNA-seq antibiotic response data obtained from P. aeruginosa allowed predictions of antagonistic antibiotic combinations 48 (Fig. 7A). Such observations suggest that TIS and RNA-seq may register distinct but complementary features of the underlying network architecture of the cell and illustrate how responses can be separated into at least two organizational levels (phenotypic and transcriptional). Remarkably, when expression and fitness are directly compared (Fig. 7Ba) there is often little correlation between them (Fig. 7Bb) 119–121 , with some notable exceptions, such as in some metabolic pathways 120 and classical virulence factors 39 . This discordance has long been known in yeast 122–124 , and suggests that the majority of transcriptional regulation is not optimized by selection 125 .
Considering RNA-seq and TIS in the context of the underlying cellular network can clarify their relationship 119 (Fig. 7Bc). When responding to an environmental stimulus to which a bacterium is adapted (for instance nutrient depletion), the majority of fitness and expression changes occur at short distances from each other within the metabolic network, with 80% of genes with fitness changes connected by two or fewer metabolic reactions to genes with expression changes, and 93% within a three-reaction radius 119 (Fig. 7Bd). Furthermore, the correlation of fitness and expression changes decreases with distance in the network 119 . This indicates that fitness and expression changes are not only colocated within the network but are of a magnitude comparable to those of their neighbours. These relationships can disappear when a bacterium responds to an environment to which it is not adapted, for instance an antibiotic 119 (Fig. 7Bb). This suggests that the apparently paradoxical lack of correlation between fitness and expression measurements can be in part understood through network models that incorporate regulatory and genetic relationships, which could aid drug target predictions and genetic network engineering. Quantifying the degree of disruption a stimulus creates in a bacterium's transcriptional network has already resulted in accurate predictions of fitness, antibiotic sensitivity and drug mechanism of action 126 .
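The network-distance comparison described above can be sketched as a breadth-first search. This is an illustration under stated assumptions rather than the analysis from ref. 119: genes are treated as nodes that are adjacent when their reactions share a metabolite, distance is counted in edges, and the toy network and gene names (g1 to g6) are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set


def distance_to_nearest(
    graph: Dict[str, List[str]], source: str, targets: Set[str]
) -> Optional[int]:
    """Shortest number of edges from source to any target gene (None if unreachable)."""
    if source in targets:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in seen:
                continue
            if neighbour in targets:
                return dist + 1
            seen.add(neighbour)
            queue.append((neighbour, dist + 1))
    return None


def fraction_within(
    graph: Dict[str, List[str]],
    fitness_genes: Iterable[str],
    expression_genes: Set[str],
    radius: int,
) -> float:
    """Fraction of fitness genes lying within `radius` edges of an expression gene."""
    distances = [distance_to_nearest(graph, g, expression_genes) for g in fitness_genes]
    if not distances:
        raise ValueError("fitness_genes must not be empty")
    near = sum(1 for d in distances if d is not None and d <= radius)
    return near / len(distances)


# Toy linear pathway g1-g2-g3-g4-g5 plus an unconnected gene g6.
toy = {
    "g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2", "g4"],
    "g4": ["g3", "g5"], "g5": ["g4"], "g6": [],
}
print(fraction_within(toy, ["g1", "g4", "g6"], {"g3"}, radius=2))
```

In this toy case two of the three fitness genes sit within two steps of the gene with an expression change; the same summary statistic could be compared between an adapted and a non-adapted condition.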

Conclusions and future perspectives
Considerable advances and extensions of TIS have been made since its introduction in 2009, many of which are highlighted in this Review. As a result of advances in cell sorting and microfluidics, TIS has been adapted to examine phenotypes on the level of single mutant cells 11,15,127 . TIS has also demonstrated its practical utility, for example, in the development of new vaccine candidates 76,77 and new antibiotics or helper drug targets 42 . Lastly, genetic interaction approaches are beginning to build networks that map out the complex relationships between genetic components within the cell, and these could be further extended by combining two transposons into a single genome, by combining TIS and CRISPR-based transcriptional interference (CRISPRi) or by scaling approaches that simultaneously mutagenize interacting organisms 128 .
Furthermore, the vast majority of TIS studies to date have been performed in bacterial species, owing to their ease of genetic manipulation. For those species in which TIS works well, it is a powerful technique that can provide high volumes of valuable genotype-phenotype linkage data on a fine scale. Over the next decade, we hope to see an expansion in the types of organisms assayed by TIS methods. Encouragingly, several related TIS-like methods have been developed for use in mammalian, fungal, parasite and archaeal backgrounds. Profiles of TIS applied in the two model yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe identified essential genes, genes involved in rapamycin resistance and factors that contribute to the formation of heterochromatin 129–131 . Other examples include determining the essential genes of the archaeal species Sulfolobus islandicus 132 and Methanococcus maripaludis 133 , phenotypic interrogation via tag sequencing (PhITSeq) in haploid human cells to assign gene function 134,135 , quantitative insertion site sequencing (QISeq) in mice using piggyBac and Sleeping Beauty transposons to screen them for cancer-related genes 136,137 , QISeq in Plasmodium falciparum, using the piggyBac transposon to determine essential genes 138 and barcode analysis by sequencing (BarSeq) in yeast 24,139 . Moreover, we expect the biological questions answered with TIS to become increasingly complex. Such applications will only enhance the utility and breadth of TIS approaches in the future.

Synthetic lethal
Where individual mutants have no or little fitness effect, but when two or more of these mutations are combined, this leads to arrest in cell growth or to cell death.

Antagonistic antibiotic combinations
When the activity of an antibiotic combination is lower than would be predicted from the effects of the individual antibiotics.
Fig. 7 | Integrating TIS with RNA-seq data. A | An example of combining RNA sequencing (RNA-seq; depicted in green throughout) and transposon-insertion sequencing (TIS; depicted in red throughout) to identify antagonistic interactions between the antibiotics polymyxin B (PolyB) and gentamicin (Gent) or tobramycin (Tobr) in Pseudomonas aeruginosa. TIS data indicate that genes mexY and mexX are involved in intrinsic resistance in P. aeruginosa to gentamicin and tobramycin, indicated by the red edges between the antibiotics and the genes. When these genes are disrupted by a transposon insertion, the bacterium becomes more sensitive to these antibiotics. Moreover, RNA-seq data reveal that polymyxin B induces expression of these genes, as indicated by the green arrows. This led to the hypothesis that polymyxin B, owing to its transcriptional activation of mexY and mexX, will make the bacterium less sensitive to either gentamicin or tobramycin. The study authors confirmed experimentally that these antibiotics work in an antagonistic manner 48 , which highlights the strength of probing response networks from different perspectives to extract biological meaning. B | Measurement of TIS and RNA-seq responses under the same conditions (part Ba) has shown that the transcriptional responses to a specific environment (Δ expression) are often not accurate predictors of gene deletion phenotypes, as expression and fitness do not correlate well (Δ fitness; part Bb) 119 . However, by overlaying these datasets over a known network (for example, a metabolic network; part Bc), network analyses can identify patterns between TIS and RNA-seq. In this example a small part of a metabolic network is depicted; grey circles are metabolites and arrows are genes encoding enzymes that mediate each reaction. Red arrows are phenotypically important genes in a specific environment identified by TIS, whereas green arrows are genes that change transcriptionally in the same environment, identified by RNA-seq. The distance between two genes in a metabolic network is the number of reactions between them, and can be calculated for all pairs across the network. In part Bd, the left network is an example where distances between pairs of fitness and expression changes are small, whereas the right network illustrates larger distances. An adapted response to an environment is characterized by fitness and expression changes that are relatively small (distance to neighbour) and correlated (Δ fitness × expression). Exposure to stress conditions to which a bacterium is not adapted leads to a loss in correlation between genes that change in transcription and those that have a fitness effect (part Bd) 119 . Importantly, such associations can be used to make predictions on antibiotic susceptibility 126 . Part B is adapted with permission from ref. 119, Elsevier.

The analogous functional genomics method of pooled CRISPRi screening, which silences genes in a targeted fashion and uses single-guide RNAs (sgRNAs) and catalytically dead Cas proteins (first demonstrated with dCas9 (ref. 140)), has been successfully applied to numerous bacterial species since the development of mobile CRISPRi systems 141–145 . CRISPRi has some advantages over TIS, primarily that silencing is directly targetable to regions of interest, which can reduce the complexity of the assay and thus the number of sequencing reads required, and it can allow knockdown of any coding region, for instance essential genes 142 , which traditional TIS cannot. However, CRISPRi requires the design, synthesis and cloning of sgRNA libraries, which can be technically challenging, and understanding the impact of off-target effects or differences in sgRNA efficiency can add complications during implementation and analysis.
By contrast, the execution of TIS requires no prior specific knowledge of the genetic make-up of an organism, and owing to its more random nature, TIS can uncover unexpected or novel genes, can potentially assay transcriptionally inactive regions of the genome and can be precise enough to interrogate specific regions within the transcriptional units, such as essential protein domains 38 . Modifications of both technologies can assay the effects of gene overexpression and suppression through complementary approaches. Functional genomics screening techniques such as TIS and CRISPRi can suffer from similar shortfalls, namely that deciphering detailed mechanistic insight from the large datasets generated can be difficult to automate. For effective data analysis, research groups will have to pool data, resources and expertise to construct holistic workflows that can manage this complexity. To this end, we expect to see an increase in data-sharing platforms, such as the newly established TIS depository TraDIS-vault, the viewer available for the invasive Salmonella Typhimurium strain D23580 (ref. 39) or interactive data visualization platforms, such as ShinyOmics 41 .
Looking forward, we predict that TIS methods will be applied to answer increasingly complex and diverse biological questions. For this expansion, we must move past the straightforward, homogeneously grown laboratory assays to more sophisticated ones that better mimic ecological states that occur in nature. One major limitation of TIS is that it is available only for use in easily culturable and genetically tractable species, which represent only the minority of total bacterial and archaeal species 146 . A key challenge will be to develop tools to allow the recalcitrant microbes of medical, industrial and environmental importance to be assayed. TIS-inspired methods, such as the 'magic pool' approach to optimizing transposon delivery in non-model organisms 26 , are already beginning to address this. A driving factor will be massive upscaling in the numbers of conditions and bacterial strains that can be simultaneously screened, building on RB-Tn-Seq 25 and similar approaches. As these methods push beyond model strains, we will increasingly gain insight into how the genetic diversity within pan-genomes is shaped and maintained. This migration away from one-dimensional genotype versus phenotype experiments will require consideration of the larger genetic network and particularly interactions between genetic background and fitness (see Scaling up TIS using high-throughput phenotyping). This will involve the application of machine learning, modelling and network analyses to integrate and extract knowledge from accumulating TIS datasets. Eventually, and in combination with other postgenomic functional data, this approach will increasingly enable us to move from describing the genetic architecture of the cell to predicting future behaviours 119,147 . Collectively, these developments illustrate that the journey of TIS is far from over, with many exciting paths yet to be explored. 
A massively upscaled study that applies TIS to multiple bacterial species and more than 100 conditions so as to assign broad gene function en masse.