Clone-based sequencing of bacterial genomes frequently results in gaps, which harbour hypothetical genes of unknown function. Sorek and colleagues show that these gaps contain a vast array of genes encoding proteins that are toxic to the sequencing host (Escherichia coli), including previously uncharacterized restriction enzymes and toxin–antitoxin systems, as well as non-coding RNAs. They also identify motifs in the cloned DNA that are predicted to bind DnaA and interfere with E. coli replication. The resulting database of potentially toxic genes and non-coding elements could be a useful tool for identifying antimicrobial targets.