Main

A striking and consistent finding to emerge from the genome-sequencing projects is that the function of most genes cannot be determined from analysis of the primary sequence alone. Instead, clues can be obtained from a range of other approaches, of which the most informative is usually the identification of a mutant phenotype. Phenotypic analysis of mutants that have been obtained by either forward or reverse genetics must therefore continue to have a central role in the post-genome-sequencing, functional genomics era. Meeting this requirement is far from trivial. For even the intensively studied model organism Escherichia coli K12, over 50% of its ORFs remain uncharacterized (see The Institute for Genomic Research web site), and it is still a major undertaking to analyse a corresponding number of individual mutants that carry single gene deletions for many interesting and biologically relevant phenotypes.

An attractive alternative to analysing mutants individually is to analyse them in pools. However, to achieve this, one needs a means to distinguish between the different mutants. Genetic footprinting1 was developed as one approach for more efficient identification of mutants in mixed populations. However, it is restricted in that only one gene is analysed at a time.

An alternative to the slow and laborious analysis of individual mutants is provided by signature-tagged mutagenesis (STM), which was originally designed to enable high-throughput, parallel analysis of mutant strains of pathogenic microorganisms2. In STM, each mutant is tagged with a different DNA sequence in such a way that all tags can be co-amplified from the DNA of mixed populations of mutants in a single PCR. They can also be simultaneously labelled to provide specific probes for the detection of mutants, before and after they have been subjected to selection2. Therefore, the sequence tag acts as a molecular barcode to monitor the presence of each mutant in the mixed population.

In the original description of the method2, the tags consisted of short DNA segments containing a 40 bp variable central tag that was flanked by invariant 'arms' of 20 bp in length, which enable the co-amplification and labelling of the central portions by PCR. The junctions of the variable and invariant regions were marked by restriction sites that could be used to release the arms from the central regions following amplification and labelling. These two features allow tag-specific probes to be generated (Fig. 1a). Although the majority of sequences that were generated in this way produced efficiently labelled tags that did not cross-hybridize with each other, this was not true for all sequences, and a pre-screening process was used to remove mutants that carried tags that did not amplify or label efficiently.

Figure 1: Original signature-tagged mutagenesis of Salmonella.

a | Design of a signature tag. Each tag has a unique central sequence of 40 bp ([NK]20; N = A, C, G, or T; K = G or T), flanked by invariable arms of 20 bp, which are common to all the tags. These arms allow the sequence tags to be amplified and labelled with radioactive nucleotides (marked with a star) by PCR with primers P1 and P2. Following labelling and before hybridization, the invariant arms are removed by digestion with a restriction enzyme that recognizes sequences (shown in red boxes) between the variable region and the invariable arms. b | Signature-tagged mutagenesis screening in mice. A complex pool of tags (shown as coloured rectangles) is ligated to transposons. The tagged transposons are then used to mutagenize bacteria, which are subsequently assembled into a library. Only bacteria with tags that are efficiently amplified by PCR and are not cross-reactive with other tags in hybridization experiments are selected for inclusion in the pool that is used to infect the mice. Genomic DNA is isolated from this pool (input pool) and from the bacteria that are recovered from the animals (output pool). The tags from these two DNA pools are amplified and radiolabelled to create probes for hybridization. Colonies of the mutant library whose DNA hybridizes to the probes from the input pool but not to the probes from the output pool represent mutants with attenuated virulence.
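The tag layout described in panel a can be illustrated with a minimal Python sketch. It generates an [NK]20 variable region flanked by two 20-nt arms and counts the size of the design space. The arm sequences here are placeholders invented for illustration (they are not the published arms), the restriction sites at the junctions are not modelled, and the function names are hypothetical.

```python
import random

N_BASES = "ACGT"  # N = A, C, G or T
K_BASES = "GT"    # K = G or T

# Placeholder 20-nt arms, invented for this sketch (NOT the published sequences).
LEFT_ARM = "A" * 20
RIGHT_ARM = "C" * 20


def random_variable_region(rng, repeats=20):
    """Return one [NK]_repeats sequence (40 nt when repeats is 20)."""
    return "".join(
        rng.choice(N_BASES) + rng.choice(K_BASES) for _ in range(repeats)
    )


def make_tag(rng):
    """Return arm + variable region + arm (80 nt in total)."""
    return LEFT_ARM + random_variable_region(rng) + RIGHT_ARM


def design_space(repeats=20):
    """Number of distinct [NK]_repeats sequences: (4 * 2) ** repeats."""
    return (len(N_BASES) * len(K_BASES)) ** repeats


rng = random.Random(0)  # fixed seed so the example is reproducible
tag = make_tag(rng)
variable = tag[20:60]
```

Because each NK dinucleotide has 4 × 2 = 8 possibilities, the design space is 8^20 sequences, which is why randomly synthesized tags are very unlikely to be duplicated within a pool.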

The feasibility of the STM method was evaluated using the mouse model of infection by Salmonella enterica, because previous research had shown that systemic infection of mice can result from the proliferation of a significant proportion of the bacteria that comprise the inoculum3. The process that was used to identify a large number of S. enterica virulence genes is illustrated in Fig. 1b.

STM has subsequently been used in many screens to provide functional information on thousands of genes, in particular from pathogenic bacteria and the yeast Saccharomyces cerevisiae. Here we describe the refinements that have been made to the methodology, particularly with respect to the use of different mutagens, signature tags and detection methods (for an overview, see Table 1). We review the broad range of viral, prokaryotic and eukaryotic systems to which tagging mutagenesis has been applied, and highlight its recent use in conjunction with high-throughput RNAi screens. Finally, we consider what the future might hold for the application of this technology in high-throughput genetic screens.

Table 1: Technical modifications of signature-tagged mutagenesis

Technical adaptations

Methods of mutagenesis. Many approaches to random or directed insertional mutagenesis are available. Of course, the choice of the mutagenesis method will depend on its applicability to the organism that is under investigation. Those that have been used in combination with STM include in vivo and in vitro transposon mutagenesis (in this case it is essential that only one insertion occurs in each genome), shuttle mutagenesis, insertion–duplication mutagenesis by homologous recombination, gene replacement by homologous recombination and illegitimate or non-homologous recombination (Fig. 2). The chosen method should enable the efficient generation of large libraries of different mutants.

Figure 2: Methods for generating pools of tagged mutants.

Panels a, b and c show methods that involve random transposition. The methods in panels d and e use homologous recombination. DNA tags are represented by different coloured segments, whereas the targeted sequences are shown in green. Tags can be introduced into the genome by direct in vivo transposition of the target organism (panel a). Alternatively, transposition can be carried out on the target DNA library in vivo, in an organism for which an efficient transposition method exists (panel b), or in vitro, on isolated DNA (panel c). In the methods depicted in panels b and c, the mutagenized DNA is subsequently reintroduced into the target organism to allow the incorporation of tags into the chromosome by homologous recombination. Insertion–duplication mutagenesis (panel d) involves ligating small fragments (random or specific) of target DNA into a pool of tagged plasmids. The plasmids are then introduced into the microorganism of interest, where they integrate into the genome. In PCR-mediated gene disruption (panel e), PCR is carried out to amplify a selectable marker, which is flanked with short sequences (50 bp) that are identical to those immediately downstream and upstream of the targeted gene. When introduced into a cell, the resulting PCR product that incorporates the marker can replace the targeted gene by homologous recombination. For more information on techniques of transposon-based mutagenesis see Ref. 81.

Tags and screening. The problem of unreliable amplification and/or labelling of tags can be solved by the empirical selection of tags on the basis of their efficient amplification, labelling and lack of cross-hybridization to other tags. Such pre-selected tags are then used separately to generate a potentially infinite number of mutant strains, which are arrayed according to the tags they carry (Fig. 3Aa). Another advantage of this approach is that because the identity of the tag in each mutant is known, labelled tags can be hybridized to purified plasmids that carry tags4, or to purified tag DNA5 rather than to the chromosomal DNA of mutant strains. This greatly increases the sensitivity of the assay and allows the use of non-radioactive detection methods, for example, digoxigenin4 or biotin6. The incorporation of longer tags7,8 or multiple tags at each mutated locus6,9 is another adaptation that was designed to increase the reliability of the screening process.

Figure 3: Methods for the detection of signature tags.

Several techniques have been designed that incorporate synthetic DNA tags (A) or that take advantage of flanking sequences (B). A | Tags that are efficiently and specifically amplified and labelled can be pre-selected and used repeatedly to generate separate pools of mutants (coloured ovals in panel Aa). Membranes can then be constructed with purified tags or the plasmids that harbour them. The detection of tags can also be carried out without the need for hybridization. Tags can be amplified in multiple PCRs, each containing a different primer pair for a specific tag (panel Ab). Alternatively, in polymorphic tag-length transposon mutagenesis (PTTM; panel Ac), tags of different lengths are amplified with a single primer pair, giving rise to products of various sizes. B | Probes that are generated from the flanking sequences can be used to hybridize to genomic microarrays. In transposon site hybridization (TraSH; panel Ba), flanking sequences are amplified by ligating linkers to digested genomic DNA from pools of mutants. In microarray tracking of transposon mutants (MATT; panel Bb), flanking sequences are amplified by arbitrary PCR, which involves two rounds of PCR, with the first round including a primer of degenerate sequence (dashed arrow) and a transposon-specific primer (solid arrow). In designer arrays for defined mutant analysis (DeADMAn; panel Bc), the sequences that flank each mutation are isolated and assembled onto an array, which is then used for subsequent hybridizations.

Another modification involves the use of high-density oligonucleotide arrays for hybridization analysis6,9,10,11,12. In principle, it enables thousands of sequences to be analysed in parallel, but in the case of pathogenic bacteria, the number of mutants that can be screened in vivo is sometimes restricted by aspects of host anatomy and immunity; this limitation must usually be investigated in pilot experiments before large-scale screening can be initiated. Therefore, to fully exploit the potentially vast scale-up that is offered by microarrays, pools of DNAs or microorganisms from different hosts might need to be combined before hybridization analysis. However, if mutant microorganisms are being tested in environments outside living hosts, these assays can frequently be scaled up to allow analysis of highly complex pools9,13.
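The effect of a host bottleneck on pool size can be explored with a simple back-of-the-envelope model. The sketch below is an assumption made for illustration and is not taken from the studies cited: it supposes that all mutants in the pool are equally represented and fully virulent, and that a fixed number of founder cells pass the bottleneck independently. Under those assumptions, the probability that a given mutant is absent from the output purely by chance is (1 − 1/C)^F, where C is the pool complexity and F the number of founders. The function names are hypothetical.

```python
def chance_loss_probability(pool_complexity, founders):
    """Probability that one fully virulent mutant is missing from the output
    by chance, assuming `founders` cells are drawn independently and uniformly
    from a pool of `pool_complexity` equally represented mutants."""
    if pool_complexity < 1 or founders < 0:
        raise ValueError("pool_complexity must be >= 1 and founders >= 0")
    return (1.0 - 1.0 / pool_complexity) ** founders


def expected_false_attenuated(pool_complexity, founders):
    """Expected number of mutants that would be wrongly scored as attenuated."""
    return pool_complexity * chance_loss_probability(pool_complexity, founders)


# Compare a tight and a loose bottleneck for a hypothetical pool of 96 mutants.
tight = expected_false_attenuated(96, 100)
loose = expected_false_attenuated(96, 960)
```

In this toy model, a bottleneck of about as many founders as there are mutants in the pool leaves a large fraction of tags missing by chance, whereas a tenfold excess of founders makes chance loss rare. Real pilot experiments remain necessary because the assumptions rarely hold exactly.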

STM without hybridization. As an alternative to hybridization, PCR products can be analysed directly5 to indicate the presence or absence of tags (Fig. 3Ab). This method relies on using primers that are specific for each tag. Mutagenesis is carried out with transposons that carry different tags of known sequence, and the DNA that is recovered from virulent mutants is subjected to PCRs in which at least one of the primers is tag-specific. The total number of PCRs that are required for analysis is therefore twice the number of mutants being analysed, and the products are visualized by agarose gel electrophoresis14. This simple modification has the great advantage of circumventing the need for hybridization after the PCR step. However, this approach is inherently less quantitative, and a large number of PCR products must be analysed by gel electrophoresis — a problem that was addressed by the introduction of multiplex PCR-based STM15. This modification uses a small number of tags with known sequences that have been combined with three different selection markers. The mutants that are recovered are identified by a PCR in which a tag-specific primer is combined with three primers that anneal to the selection markers, yielding three different PCR products. For example, a combination of 24 sequence tags and 3 selection markers allowed a pool of 72 mutants to be analysed in 24 PCRs15.
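The bookkeeping behind the multiplex scheme can be sketched in Python. The sketch only illustrates how 24 tags combined with 3 selection markers identify 72 mutants in 24 reactions; the tag and marker names are invented, and the representation of a gel result as a set of marker names per reaction is an assumption made for illustration.

```python
from itertools import product

# Hypothetical identifiers: 24 tags and 3 selection markers.
TAGS = [f"tag{i:02d}" for i in range(1, 25)]
MARKERS = ["markerA", "markerB", "markerC"]

# Each mutant in the pool is defined by one (tag, marker) combination.
POOL = list(product(TAGS, MARKERS))


def missing_mutants(observed_products):
    """Return mutants whose expected PCR product was not observed.

    `observed_products` maps each tag (one PCR per tag) to the set of
    markers for which a product of the expected size was seen.
    """
    return [
        (tag, marker)
        for tag, marker in POOL
        if marker not in observed_products.get(tag, set())
    ]


# Example: every product is present except tag05 combined with markerB.
observed = {tag: set(MARKERS) for tag in TAGS}
observed["tag05"] = {"markerA", "markerC"}
lost = missing_mutants(observed)
```

One reaction per tag is enough because the three marker-specific primers give products that can be told apart within the same reaction.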

Another approach that avoids hybridization, and further reduces the number of PCRs that are required, is polymorphic tag-length transposon mutagenesis (PTTM), which has been applied to group A Streptococcus16 (Fig. 3Ac). In this modification of the method, specificity is conferred by the different lengths of the tags, each of which can be distinguished by the separation of PCR products on acrylamide gels. Only two PCRs (for input and output) are required for each screen.

TraSH, MATT and DeADMAn. Transposon site hybridization (TraSH)17,18, microarray tracking of transposon mutants (MATT)19, and designer arrays for defined mutant analysis (DeADMAn)20 are variations on STM that incorporate microarray technology (Fig. 3B). In each case, DNA is extracted from bacterial transposon-mutagenized pools before and after a selective process, and unique sequences that are physically linked to each mutation are amplified and labelled before hybridization to a genomic microarray. By comparing the signal intensity that is generated by probes that have been derived from the mutants pre- and post-selection, those with a selective disadvantage are identified. The main difference between these techniques is the way in which specific probes are generated for each mutation. In the case of TraSH, the genomic DNA that is isolated from the mutant pool is partially digested with a restriction enzyme that makes frequent cuts in the genome. Double-stranded adaptors are then ligated to the ends of the digested DNA. A PCR is carried out with primers that anneal to the adaptors to amplify the DNA regions that flank the transposon insertion. Next, the PCR products are used as templates for transcription by T7 RNA polymerase, which transcribes from the transposon into the genomic DNA that flanks the transposon insertion. Finally, labelled cDNA is generated by reverse transcriptase PCR (RT-PCR) and hybridized to DNA microarrays.

This method has been simplified by eliminating the adaptors and using degenerate primers for probe generation by RT-PCR. The microarray analysis was also optimized by labelling input and output probes with different fluorescent dyes to allow their simultaneous hybridization to the DNA array21.

The distinctive feature of MATT19 is that the DNA that lies adjacent to a transposon insertion is amplified by a two-step PCR, without the RT-PCR step that is carried out in TraSH. In the first reaction, genomic DNA is amplified by a transposon-specific primer and a primer containing a degenerate 3′ region and an invariant 5′ anchor region. The products of this reaction are used as a template for a second PCR, which uses primers that are complementary to an amplified region of the transposon and the conserved portion of the degenerate primer. The amplified DNA from this reaction is coupled to a dye and hybridized to a microarray. This method of obtaining the flanking sequences is known as arbitrary PCR, because of the degenerate nature of the primer that is used in the first round of amplification.

In the more laborious DeADMAn approach, the sequences that flank transposon insertion sites are determined for every mutant in the pool, and oligonucleotides that are based on these flanking sequences are used to build a microarray. DNA from pools of mutants is extracted in the normal way, cut with restriction enzymes and ligated to adaptors. Nested PCR primers are then used to amplify the DNA that flanks the transposon, and the products are labelled with fluorescent dyes before hybridization to the microarray20. These approaches all have an important advantage in that they circumvent the need to synthesize tags for each transposon.
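The comparison of pre- and post-selection signals that underlies all three methods can be sketched as a log-ratio calculation. The following Python example is a simplified illustration rather than the analysis used in the cited studies: the pseudocount, the cut-off of a fourfold decrease and the feature names are assumptions, and no normalization between arrays or dyes is attempted.

```python
import math


def log2_output_over_input(input_signal, output_signal, pseudocount=1.0):
    """Return log2(output / input) for every feature in the input pool.

    A pseudocount (an assumption of this sketch) avoids division by zero
    when a feature gives no signal after selection.
    """
    ratios = {}
    for feature, in_value in input_signal.items():
        out_value = output_signal.get(feature, 0.0)
        ratios[feature] = math.log2(
            (out_value + pseudocount) / (in_value + pseudocount)
        )
    return ratios


def candidates(ratios, threshold=-2.0):
    """Features whose signal fell at least fourfold (log2 ratio <= -2)."""
    return sorted(f for f, r in ratios.items() if r <= threshold)


# Invented intensities for three array features.
input_signal = {"geneA": 1000.0, "geneB": 1000.0, "geneC": 1000.0}
output_signal = {"geneA": 950.0, "geneB": 20.0, "geneC": 1100.0}
depleted = candidates(log2_output_over_input(input_signal, output_signal))
```

In practice, replicate hybridizations and a normalization step would be needed before any threshold is applied.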

Choice of tagging and detection methods. The principles of tag design and detection are general and can, in principle, be applied to any organism. The original tag design2 has proved remarkably robust and has been used in the majority of bacterial STM studies22. It has also been adapted for use in yeast9,10 and in mammalian RNAi screens23,24 (see below). PTTM, TraSH, MATT and DeADMAn have not been used extensively outside the laboratories in which they were developed.

With respect to the detection of tags, hybridization-based methods are the most popular. However, to our knowledge, there has been no comparative study of the sensitivity and ease of different tagging and detection methods with the same model organism and selection procedure. For this reason, the choice of a specific method seems to be largely based on personal preference and expertise.

Application to pathogenic bacteria

So far, STM has found its broadest applications in studying pathogenic bacteria. Numerous screens have been carried out involving all the leading human pathogens that are genetically tractable, which has resulted in the identification of over 2,000 virulence and colonization determinants (see Refs 25, 26 for reviews). Some of these studies are listed in Table 1. Examples of important biological insights that have emerged from follow-up studies are highlighted in Box 1. Bacterial pathogens frequently colonize more than one cell type, tissue or organ of a host during the course of infection, and different virulence factors are needed to enable growth and survival in these different environments. To identify genes that are involved in these processes, different models of infection can be used to screen the same mutant library in the same host by STM, in order to identify tissue- or organ-specific virulence factors27,28 (Box 1).

Some bacteria show remarkable host adaptation and can only cause disease in certain species, whereas others can cause disease in a range of hosts. The genetic basis for this has received little attention and remains largely unexplained. To identify factors that mediate host adaptation, the same pools of mutants can be inoculated into different host species to reveal mutants that are attenuated for virulence in only one host29,30.

STM can also be used in knockout mutant mice to identify genes that counter innate immune effectors. In this technique, called 'differential STM' (Fig. 4), pools of mutant pathogenic bacteria are used to infect different immunodeficient mouse strains, and mutant bacterial strains are identified on the basis of their ability to proliferate in the tissues of mice of one genetic background, but not another31,32. Subsequent analysis of the specific functions of genes that are affected in the so-called 'counter-immune' mutants can be enhanced by knowledge of the function of the relevant host genes.

Figure 4: Differential STM screen.

The same set of mutants is inoculated into wild-type animals, and those with a specific genetic defect (X−/−). Counter-immune mutants are attenuated in a wild-type host, but retain their virulence in the knockout animals. This indicates that there is an interaction between the product of the gene that is affected in the counter-immune mutant and the function that is lost in the knockout animal.
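The logic of a differential screen reduces to a comparison of three sets of tags. The short Python sketch below illustrates this under the simplifying assumption that each mutant is scored as either present or absent in each output pool; the mutant names are invented.

```python
def counter_immune_candidates(input_pool, recovered_wild_type, recovered_knockout):
    """Mutants absent from the wild-type output but recovered from the knockout host."""
    attenuated_in_wild_type = input_pool - recovered_wild_type
    return attenuated_in_wild_type & recovered_knockout


# Invented example: m2 is lost in both hosts, m3 only in the wild-type host.
input_pool = {"m1", "m2", "m3", "m4"}
recovered_wild_type = {"m1", "m4"}
recovered_knockout = {"m1", "m3", "m4"}
found = counter_immune_candidates(input_pool, recovered_wild_type, recovered_knockout)
```

A mutant such as m2, which is attenuated in both hosts, is excluded because its defect is not relieved by the loss of the host function.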

Application to yeast

Although STM was originally developed for use with microbial pathogens, it has also been exploited to assist large-scale functional genomic studies of the budding yeast S. cerevisiae. In S. cerevisiae, targeted mutagenesis is achieved by taking advantage of the efficient homologous recombination in this organism10. This has facilitated the construction of a nearly complete collection of deletion mutants, covering 96% of ORFs in the yeast genome9,13. To make the mutant library, short regions of DNA that corresponded to the ends of yeast genes were used in PCRs to amplify a selectable marker. Two unique signature tags were also incorporated into each construct, and a total of almost 6,000 deletion strains have been created by an international consortium13. Used in conjunction with a set of DNA microarrays that represent all the signature tags in the library, every member of this collection of mutants can be simultaneously tested for altered growth rates in response to any environmental condition. Those that have been tested to date have identified thousands of genes that are involved in processes as diverse as nutrient utilization, responses to osmotic stress, high pH, high osmolarity and ion stress13, resistance to DNA-damaging agents33,34,35,36, drug resistance, sporulation and post-germination growth37, mechanisms of haploinsufficiency38, the non-homologous end-joining pathway39, the response to proteasome inhibition40, endoplasmic reticulum biosynthesis41 and mitochondrial respiration42 (Box 1).

The yeast consortium constructed a library of both homozygous and heterozygous signature-tagged diploid mutants9,13. An interesting aspect of heterozygous deletion mutants is that some are sensitized to drugs that inhibit the product of the heterozygous gene11. This gene dosage, or 'haploinsufficiency' effect, can be used to identify genes with products that represent potential drug targets, a process that is referred to as 'chemogenomic profiling'. Two large-scale screens have been conducted to identify the targets of various clinically or agriculturally relevant compounds with distinct chemical structures43,44. Although haploinsufficiency or chemogenomic profiling has its limitations, this approach is likely to facilitate rapid identification of many targets for new small-molecule inhibitors.

The set of yeast deletion mutants (haploids or homozygous diploids) can also be examined for chemical sensitivity using a pooled mutant set and a barcode microarray readout. This approach identifies genes with products that buffer the cells from the toxic effects of the compound, thereby generating a chemical genetic profile. Compounds with similar chemical genetic profiles often target the same pathway and therefore have a similar mode of action, whereas deletion mutants that show a similar signature of chemical sensitivities often have similar cellular functions45,46,47.
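One simple way to compare chemical genetic profiles is to correlate the sensitivity scores of the same set of deletion strains across two compounds. The Python sketch below uses the Pearson correlation coefficient for this purpose. The choice of metric, the compound names and the scores are assumptions made for illustration and are not taken from the cited studies.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Invented sensitivity scores for the same five deletion strains, in the same order.
profiles = {
    "compound_1": [0.1, 2.5, 0.0, 3.1, 0.2],
    "compound_2": [0.0, 2.2, 0.1, 2.9, 0.3],
    "compound_3": [2.8, 0.1, 3.0, 0.0, 0.2],
}
similar = pearson(profiles["compound_1"], profiles["compound_2"])
dissimilar = pearson(profiles["compound_1"], profiles["compound_3"])
```

Transposing the same matrix and correlating strains instead of compounds gives the complementary comparison, in which deletion mutants with similar sensitivity signatures are grouped together.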

Synthetic lethality is a useful genetic tool that has been pioneered in yeast48. It can result from a lack of function in parallel biochemical pathways or redundant components of the same essential pathway, and provides an indication that two genes are functionally related. The synthetic genetic array (SGA) system enables large-scale analysis of synthetic lethal phenotypes using high-density yeast arrays49,50. The availability of the signature-tagged mutant library and DNA microarrays to analyse the relative abundance of tags has also enabled synthetic lethal screens to be carried out on a genome-wide scale, a procedure that is referred to as synthetic lethality analysis by microarray (SLAM)51. An adapted version, dSLAM (diploid-based SLAM), involves transformation of heterozygous diploids carrying an SGA reporter, which enables selection for haploid double mutants after sporulation of transformed diploids, thereby improving the robustness of this method52.

Application to other organisms

A modified STM screen was used to find adhesins of the haploid, opportunistically pathogenic yeast Candida glabrata53. In this screen, different DNA tags were introduced into a dispensable chromosomal locus to provide 96 tagged strains. They were subsequently mutagenized by non-homologous integration of a vector that carried a selectable marker into the chromosome. Screening of 4,800 mutants yielded 31 that had altered adherence to human epithelial cells, and led to the discovery of a novel family of adhesive surface glycoproteins54.

Tools that are based on either homologous recombination or illegitimate recombination between plasmid and chromosomal DNA have also been developed for insertional mutagenesis-based STM screens on three other pathogenic fungi: Candida albicans55, Aspergillus fumigatus56 and Cryptococcus neoformans57.

The ability to clone entire viral genomes on BACs means that STM can be used to analyse gene function in DNA viruses with large genomes. The application of STM to the mouse γ-herpesvirus 68 has allowed screens to be carried out in cultured fibroblasts and the mouse host58,59, and might in the future facilitate the analysis of genes that are involved in reactivation from latency, an important aspect of the infectious process of herpesviruses.

Protozoan parasites with haploid stages in their life cycles can, in principle, be suitable for STM screening, as long as efficient methods of insertional mutagenesis exist. Proof of principle for STM screening of Toxoplasma gondii (a zoonotic pathogen and a model for apicomplexan parasites, which include the genus Plasmodium) has already been established by insertional mutagenesis with an STM plasmid60.

Barcoding for RNAi screens

Until recently, it was difficult to conduct large-scale genetic screens in diploid mammalian cells because of the relatively large size of their genomes and the need to inactivate both alleles of a gene. However, the advent of transient gene silencing using small interfering RNAs (siRNAs) has, to a significant degree, overcome these problems61. In these screens, either chemically synthesized dsRNA or a retrovirus-based vector (from which short hairpin RNAs (shRNAs) are transcribed) is introduced into mammalian cells. The cellular mRNA that contains the sequence of the dsRNA or shRNA is usually recognized and degraded, resulting in efficient knockdown of gene expression. Libraries of thousands of siRNAs can be assembled for analysis in 96-well plates, but subsequent infection or transfection and analysis of mammalian cells is a labour-intensive and time-consuming process. Two groups have demonstrated the feasibility of using signature tagging in conjunction with siRNA to facilitate high-throughput screening. Berns et al.23 used the unique sequence of each shRNA as a signature tag, whereas Paddison et al.24 incorporated unique synthetic 60-bp tags into each shRNA vector.

The utility of this approach has been highlighted by its application to the discovery of a novel human tumour suppressor62, and a new target of the antiproliferative drug nutlin-3 (Ref. 63). Second-generation libraries comprising over 140,000 shRNA expression plasmids, each carrying a randomly generated 60-bp signature tag, are now available64. Whereas these two studies identified shRNAs that are selectively enriched under certain experimental conditions, recent work by Staudt and colleagues65 describes a method for identifying shRNAs that are depleted during the course of an experiment. These authors used a library of barcoded vectors encoding shRNAs under the control of an inducible promoter. The library is introduced into a cell line, which is then divided into two groups: one is grown in the presence and the other in the absence of the shRNA expression inducer. Barcode abundances, which are assayed using microarrays, give an indication of the fitness of cells expressing specific shRNAs (Fig. 5). This 'loss-of-function' RNAi screen (like conventional STM screens) relies on negative selection, and could be used to discover new oncogenic pathways that promote tumour malignancy65. These recently established resources should greatly facilitate large-scale analysis of mammalian gene function in the future.

Figure 5: RNAi screen with barcoding.

Retroviral vectors that encode a library of short hairpin RNAs (shRNAs) that are under the control of an inducible promoter are introduced into a cancer cell line. The cells are then divided into two subpopulations: one is subjected to induction of shRNA expression and the other is used as a control cell population. An shRNA that reduces the expression of a protein that is critical for proliferation or survival of the cancer cells will be eliminated from the induced, shRNA-expressing culture. Genomic DNA is isolated from the two populations at different time points, and PCR is used to amplify the barcodes that are present in the genomic DNA. Amplified DNAs from the induced and control cultures are labelled with different fluorescent dyes and cohybridized to barcode oligonucleotides on microarrays to determine the relative abundance of each barcode in the two populations. This indicates the relative depletion or enrichment of cells that express a given shRNA65.

Perspectives

Future STM screens of organisms other than yeast will be greatly helped by the development of comprehensive, ordered libraries of mutants and whole-genome DNA microarrays. Some fully sequenced bacterial genomes have already been used to construct ordered mutant libraries66 and even if these are not made with chemically synthesized tags, STM can be carried out using the flanking sequences at insertion sites to provide tags. Recently, an ordered, non-redundant transposon mutant library of 4,596 predicted ORFs of Pseudomonas aeruginosa has been constructed (corresponding to 77% of all predicted genes) in a way that allows TraSH analysis to be carried out67. There is, in principle, no reason that similar libraries could not be constructed for other important bacterial species; such libraries could also be used for screening synthetic lethal mutations, similar to the yeast SLAM technique51.

The creation of large collections of mouse mutants, generated either by targeted68 or random69 mutagenesis, offers tremendous opportunities to study pathogenicity genes that target specific aspects of host immunity, and to discover novel mechanisms of resistance to microbial infection. For example, eight mutations that cause susceptibility to mouse cytomegalovirus (MCMV) infection have been identified in a screen of 3,500 mice70. Further screening is expected to define the MCMV 'resistome': the total number of genes with non-redundant functions in resistance to this pathogen. We can anticipate that the resistomes to other pathogens of mice will be characterized in the future. The corresponding mutant mouse strains could then be used in conjunction with ordered collections of signature-tagged mutant pathogens to provide high-throughput counter-immune screens for the identification of pathogen virulence factors that target specific immune functions.

One potential application of tagged strains has been largely neglected — the study of the populations of genetically identical strains. By varying the pool complexity and inoculum dose, it might be possible to exploit the tags to obtain information on anatomical and immunological bottlenecks, as well as pathogen population dynamics and transmission during infection71.

The integration of barcoding with RNAi screens is an emerging technology that promises to deliver unprecedented insights into eukaryotic cellular processes. The ability to knock down up to three genes simultaneously using multi-shRNAs72,73 provides an opportunity for RNAi-based synthetic-lethality-like studies. Perhaps the most challenging aspect of multi-RNAi barcoding will be the design of efficient screening procedures to deliver specific selective pressures for physiological processes of interest.

Conclusion

STM has proved to be a robust and powerful high-throughput screening technique for the analysis of genes that are not essential for life, but are needed for growth in specific environments. Its application has uncovered unexpected phenotypes for many previously uncharacterized genes, particularly those for which bioinformatics has been essentially uninformative. However, it is important to bear in mind that for every identified gene with a sequence that does not reveal function, a great deal of careful work is required to identify more specific mutant phenotypes, which in turn provide clues to biochemical function. Further research is then necessary to define the precise role of the gene product by determining its cellular location, biochemical activity, interacting partners and structure. Given the amount of work that is involved in analysing gene function, insertional mutagenesis will continue to provide the basis for addressing innumerable important biological questions for decades to come.