The genome of an Escherichia coli strain that is emerging as a severe threat to human health has been sequenced. Comparing it with that of a harmless strain suggests why some forms of this bacterium cause disease.
Humans and Escherichia coli normally live happily together: E. coli is a beneficial bacterium commonly found in the human gastrointestinal system. But it also exists in harmful forms. One of the most harmful of these, called O157:H7, was first linked to human disease in 1983 (ref. 1), when it was shown to have been the cause of two outbreaks of an unusual and severe gastrointestinal ailment in the United States the previous year. The number of documented human illnesses and deaths caused by O157:H7 strains has since increased steadily worldwide, and these strains are now considered to be both emerging pathogens and major threats to public health (ref. 2). Studies of O157:H7 receive a boost from a paper on page 529 of this issue (ref. 3), in which Perna and colleagues describe and analyse the genome sequence of one strain of E. coli O157:H7.
Outbreaks of O157:H7 infections in humans have been traced primarily to infected cattle (ref. 4), which are the source of contamination of ground beef, milk and, indirectly through fertilizer, many fruit and vegetable products. Fortunately, proper cooking or pasteurization can prevent human infection by contaminated food. But infections can also be transmitted by sewage-contaminated water and person-to-person contact.
Genetically, the pathogenicity (ability to cause damage) and virulence (degree of pathogenicity) of O157:H7 strains depend on several factors. For example, these strains possess genes encoding the so-called shiga toxin, as well as small, circular DNA molecules (plasmids) that encode 'virulence factors'. And O157:H7 has at least one pathogenicity island, a section of chromosomal DNA containing many genes that contribute to pathogenicity (ref. 5). Evolutionary studies have shown that all O157:H7 strains are closely related, and that they share a common ancestry with pathogenic O55:H7 strains (ref. 6). (The numbers and letters refer to the different types of variable antigen molecules produced by these bacteria.) Many virulence factors in O157:H7 strains were acquired from other strains or species by a process known as horizontal gene transfer (refs 5, 7). So, what can analysis of the whole genome add to our knowledge of this pathogen?
Perna et al. (ref. 3) have used the now standard 'whole-genome shotgun' approach to determine the genome sequence of the O157:H7 strain EDL933, which was isolated from ground beef linked to the 1982 outbreak. Their analysis of this genome, and in particular their comparison of this genome with that of a non-pathogenic laboratory E. coli strain, K-12 MG1655, is revealing.
The two genomes have a large amount of DNA — about 4.1 million base pairs (megabases) each — that was clearly derived from a common ancestor. This 'backbone' DNA is arranged similarly in the two strains: the two genomes can be lined up side by side along their lengths, except at one point, where part of the O157:H7 genome is reversed.
Although this conserved arrangement and inversion are not surprising for closely related strains (ref. 8), one feature is rather unusual. Scattered roughly evenly within each genome's backbone are hundreds of sections of DNA that are unique to one or the other strain. Sections found only in O157:H7, termed 'O-islands', total 1.34 megabases and contain 1,387 genes. K-islands, which are unique to the non-pathogenic E. coli strain, add up to 0.53 megabases and 528 genes. It remains to be seen which of these differences contribute to the virulence and pathogenicity of O157:H7. The O-islands include many known and predicted pathogenicity genes; for example, some of them may encode toxins, or factors needed to make the adhesive filaments (fimbriae) that help the bacterium to stick to the lining of the gut. But it is difficult to predict gene function accurately, and many of the O-islands might have no connection with pathogenicity.
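As a rough check on these figures, the short Python sketch below adds each strain's island DNA to the shared backbone of about 4.1 megabases. It uses only the rounded numbers quoted in the text; the function and variable names are illustrative assumptions, and the totals are approximations rather than reported genome sizes.

```python
# Rounded figures quoted in the text (megabases and gene counts).
BACKBONE_MB = 4.1  # shared backbone, approximate

islands = {
    "O157:H7 EDL933": {"island_mb": 1.34, "island_genes": 1387},
    "K-12 MG1655": {"island_mb": 0.53, "island_genes": 528},
}


def summarize(backbone_mb, island_mb, island_genes):
    """Derive approximate totals from a backbone size and strain-specific islands."""
    total_mb = backbone_mb + island_mb
    return {
        # Backbone plus islands: an approximate genome size in megabases.
        "total_mb": round(total_mb, 2),
        # Share of the approximate genome made up of strain-specific DNA.
        "island_fraction": round(island_mb / total_mb, 3),
        # Gene density within the islands, in genes per megabase.
        "genes_per_island_mb": round(island_genes / island_mb),
    }


summary = {name: summarize(BACKBONE_MB, **vals) for name, vals in islands.items()}

for name, stats in summary.items():
    print(name, stats)
```

On these rounded inputs, the implied genome sizes are roughly 5.4 megabases for O157:H7 and 4.6 megabases for K-12, so strain-specific DNA makes up about a quarter of the pathogen's genome but only about a ninth of the laboratory strain's.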
Differences between the DNA backbones may also be important. Although most of the backbone differences do not result in changes in protein sequence, many do: about 75% of the backbone-encoded proteins differ by at least one amino acid between the two strains. A more thorough analysis of these patterns will help in determining which differences are the result of natural selection and which are merely neutral changes.
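To make the '75%' statistic concrete, here is a minimal Python sketch of how such a fraction could be counted from pairs of aligned protein sequences. It is not the authors' pipeline: the function names are hypothetical, and the four short sequences are invented toy data chosen only so that three of the four pairs differ.

```python
def differs(seq_a, seq_b):
    """Return True if two aligned, equal-length protein sequences differ anywhere."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return any(a != b for a, b in zip(seq_a, seq_b))


def fraction_differing(pairs):
    """Fraction of sequence pairs that differ by at least one amino acid."""
    if not pairs:
        raise ValueError("need at least one pair of sequences")
    return sum(differs(a, b) for a, b in pairs) / len(pairs)


# Invented toy data (NOT real E. coli proteins): one tuple per backbone gene,
# holding the protein from each of the two strains.
toy_pairs = [
    ("MKTAYIAK", "MKTAYIAK"),  # identical in both strains
    ("MSLNDKQR", "MSLNDRQR"),  # one substitution
    ("MAVTGELK", "MAVSGELK"),  # one substitution
    ("MPEQLWTS", "MPEQLWTA"),  # one substitution
]

toy_fraction = fraction_differing(toy_pairs)
print(toy_fraction)
```

A count of this kind only records whether a protein differs at all. It says nothing about whether a given change is neutral or selected, which is why the more thorough analysis mentioned above is still needed.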
Interestingly, the patterns of variation within each genome differ between the coding and non-coding strands of backbone genes. Perna et al. suggest that this may result from transcription-coupled repair of oxidative damage in DNA. This process was originally discovered in E. coli: as the transcribed (template) strand is copied into RNA, DNA damage in that strand is mended at a higher rate than normal (ref. 9). Confirmation of whether this has caused the strand bias described here will require analysis of the genome sequence of another related species (ref. 10).
The authors also suggest that much of the DNA in the O-islands and K-islands was acquired by horizontal gene transfer. One of their lines of evidence is that many of the islands contain sequences related to those of bacterial viruses and other vectors that carry genes from one species to another. Another possibility is that the islands were present in the common ancestor of the two strains, and then lost in one lineage (Fig. 1).
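The two alternatives (gain by transfer versus loss from an ancestor) can be framed as a simple parsimony rule once a related 'outgroup' genome is available. The Python sketch below is a deliberately simplified illustration under that assumption. The function name and return strings are hypothetical, and a single outgroup cannot exclude independent gains or losses in the outgroup itself.

```python
def classify_island(in_o157, in_k12, in_outgroup):
    """Guess the history of a DNA island from its presence in three genomes.

    Single-outgroup parsimony: prefer the explanation that needs one event.
    """
    if in_o157 == in_k12:
        # Present in both strains, or in neither: not a strain-specific island.
        return "not strain-specific"

    holder = "O157:H7" if in_o157 else "K-12"
    lacker = "K-12" if in_o157 else "O157:H7"

    if in_outgroup:
        # Outgroup shares the island, so one loss explains the pattern.
        return f"likely ancestral, lost in {lacker} lineage"
    # Outgroup lacks the island, so one gain explains the pattern.
    return f"likely gained in {holder} lineage"


# An island found in O157:H7 only, and absent from the outgroup.
print(classify_island(in_o157=True, in_k12=False, in_outgroup=False))
# An island found in O157:H7 and the outgroup, but missing from K-12.
print(classify_island(in_o157=True, in_k12=False, in_outgroup=True))
```

In practice, sequence features such as similarity to bacterial viruses, which the authors cite, would be weighed alongside this kind of presence-or-absence evidence.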
Analyses of the genomes of related species will also help to answer this question. But if horizontal gene transfer has occurred, then the fact that so many genes have been transferred to O157:H7 supports the suggestion (ref. 11) that the continuing emergence of O157:H7 as a pathogen results from its ability to undergo rapid genetic change. This suggestion was made because a high proportion of O157:H7 strains have defects in genes involved in repairing DNA mismatches (ref. 11). This tends to lead to higher rates of both mutation and acquisition of DNA from other strains. Many other pathogenic bacteria also have mismatch-repair defects (ref. 12), but so too do many non-pathogenic E. coli strains (ref. 13), and the existence of so many K-islands suggests that gene transfer is also common in non-pathogenic strains. Moreover, O157:H7 has an apparently normal long-term rate of sequence change (ref. 14).
The work of Perna et al. (ref. 3) emphasizes the power of comparing genomes from closely related strains or species, something that is becoming possible for more and more taxa. Such comparisons allow the detection and analysis of genetic processes that occur on relatively short timescales. They have led to discoveries such as the possible occurrence of transcription-coupled repair of oxidative damage, reported here (ref. 3), and the finding that inversions that are symmetrical around the start point of replication of a bacterial chromosome are common in bacterial genome evolution (ref. 8).
Many of these insights depend on knowing details such as gene location and orientation, and the absence of genes that are present in related species. This emphasizes the importance of having complete or nearly complete genome sequences. (The sequencing of the O157:H7 genome is nearly complete; only two gaps remain.) We should view with scepticism press releases announcing the completion of a genome sequence in a day (ref. 15): they refer simply to the completion of the initial sequencing part of a project, but many gaps (frequently hundreds) always remain. Closing those gaps is difficult but essential.
There is still much to learn about the biology and history of E. coli O157:H7, but the sequence and analysis presented by Perna et al. (ref. 3) will be an excellent starting point. One of the great things about genomics is that the data it provides allow nearly any group, anywhere in the world, to start seeking answers to outstanding questions immediately.