The 1,667,867-base-pair genome of the bacterium that is responsible for peptic ulcers has been completely sequenced. Among the many features revealed is machinery for existence in an acidic environment.

Yet another completely sequenced bacterial genome! The 1,667,867 base pairs of Helicobacter pylori, reported by Tomb et al.1 on page 539 of this issue, constitute the seventh completely sequenced bacterial genome (Table 1), and the fourth by The Institute for Genomic Research (TIGR). With another 30 such projects likely to be completed in various labs around the world during the next two years, some readers may be wondering when enough is enough. Is this new sequence big news? The answer has to be a resounding “yes”: Helicobacter pylori (a.k.a. Campylobacter pylori) is the organism that causes peptic ulcers and, astonishingly, it may infect as much as half of the world's population.

Table 1: Fully sequenced genomes

In 1983 an Australian pathologist, Robin Warren, and his colleague Barry Marshall, a physician, wrote separate letters to The Lancet under a single title2. Warren described how, over the previous five years or so, he had observed spiral-shaped bacteria on the gastric epithelium of patients with peptic ulcers. Marshall then discussed the history of pyloric bacteria in mammals, and offered some thoughts on why these Campylobacter-like organisms had not been reported previously. He noted that they had been found in non-human mammals, and that they had even been observed long ago in human cadavers, but their presence had been dismissed as a post-mortem artefact.

In 1984 the two wrote an article together in The Lancet3 proposing that these bacteria were actually the cause of peptic ulcers, a very common malady then attributed to stress, diet and excess gastric acidity. At about the same time, a large epidemiological study of peptic ulcers was published in the Medical Journal of Australia4. The authors ascribed the varying incidence of peptic ulcers in different Australian states to environmental factors, such as contaminated drinking water. An accompanying editorial5 stated that "the agent is probably not a bacterium". Warren responded immediately, stating that the agent probably was a bacterium6.

Interestingly, the epidemiological study was based entirely on records of state-mandated prescriptions for the histamine H2-receptor antagonist cimetidine (better known by its brand name, Tagamet) which, according to newspaper accounts, was the most prescribed drug in the world during the 1980s. Imagine the impact of a finding that penicillin or other antibacterial agents might be a more appropriate treatment. Henrik Ibsen would have loved it!

Bemused by the situation, I searched Medline for reports involving H. or C. pylori, for cimetidine and, for reasons that will become apparent, for the enzyme urease. The numbers of citations were plotted against five-year time periods, and the graph (Fig. 1) speaks for itself: papers dealing with H. pylori and with urease have greatly increased in number, whereas cimetidine citations are in precipitous decline. But why is urease suddenly so popular? The answer is that this enzyme is crucial to the survival of H. pylori at the highly acidic pH of the stomach. (Incidentally, urease is also a sentimental favourite among biochemists because, in 1926, it was the first enzyme ever to be crystallized.)

Figure 1: Citations to Helicobacter (or Campylobacter) pylori, cimetidine (Tagamet) and urease plotted in five-year increments.

Note the abrupt changes in the period following Marshall and Warren's 1984 report3 that H. pylori probably causes peptic ulcers (data for the year 2000 extrapolated from 1997).
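A minimal Python sketch of the kind of tallying behind Fig. 1, assuming yearly citation counts are already in hand. The counts below are invented placeholders, not the Medline numbers, and the text does not say exactly how the year-2000 value was extrapolated; here the incomplete final period is simply scaled up linearly (an assumption). The function name and its parameters are hypothetical.

```python
def five_year_bins(counts_by_year, first_end=1970, last_end=2000, last_observed=1997):
    """Sum yearly citation counts into five-year periods labelled by their final year.

    A period that is only partly observed (years after last_observed missing) is
    scaled up by 5 / (years observed): a simple linear extrapolation, assumed here.
    """
    bins = {}
    for end in range(first_end, last_end + 1, 5):
        years = [y for y in range(end - 4, end + 1) if y <= last_observed]
        total = sum(counts_by_year.get(y, 0) for y in years)
        if 0 < len(years) < 5:
            total = total * 5 / len(years)  # extrapolate the incomplete period
        bins[end] = total
    return bins

# Invented placeholder counts, not the Medline data behind Fig. 1.
counts = {1995: 80, 1996: 100, 1997: 120}
print(five_year_bins(counts)[2000])  # → 550.0
```

With only 1996 and 1997 observed, the 1996–2000 period is estimated as 5/2 of their sum; any other extrapolation rule would slot into the same place.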

It has long been known that resident microorganisms in the stomach use the enzyme urease to convert urea to ammonia and carbon dioxide7. Few other organisms can survive in this acidic environment, but H. pylori has an electropositive internal milieu that helps it to fend off the onslaught of protons from the surrounding medium. Tomb et al.1 have now shown that the proteins of H. pylori contain twice as many of the basic amino acids arginine and lysine as those of other organisms. Beyond that, the positively charged ammonium ions, formed when the ammonia produced by urease takes up protons, must contribute greatly to this effect.
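The chemistry behind this acid defence can be summarized as two net reactions: urease hydrolyses urea to ammonia and carbon dioxide, and the ammonia then takes up protons to form ammonium ions. The intermediate carbamate step is omitted here for brevity.

$$
\begin{aligned}
\mathrm{CO(NH_2)_2} + \mathrm{H_2O} &\xrightarrow{\ \text{urease}\ } 2\,\mathrm{NH_3} + \mathrm{CO_2} \\
\mathrm{NH_3} + \mathrm{H^+} &\rightleftharpoons \mathrm{NH_4^+}
\end{aligned}
$$

Each molecule of urea hydrolysed therefore yields two ammonia molecules, each able to neutralize one proton.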

Urease is the main antigenic component of H. pylori and its activity provides a convenient diagnostic tool8, which is why there has been such an enormous increase in citations involving urease. Having the entire genomic sequence puts many earlier observations into proper context. For example, all indications are that the ‘pathogenicity island’ is the main virulence factor. This 40-kilobase segment encodes a set of accessory factors for secretion of the cytotoxins that damage the host gut9. The region is flanked by insertion elements and probably arose by horizontal gene transfer from some other organism.

The peptic ulcer-urease story is, however, hardly the only reason that the sequence reported by Tomb et al.1 is news. This is a genome that has something for everyone: clinicians, sociologists, epidemiologists, biochemists, ecologists, molecular biologists, immunologists and, last mentioned but hardly least interested, evolutionary biologists. Tomb et al. reveal the entire restriction-modification system for recognizing and destroying foreign DNA, and outline a complete scheme of metabolism based on the resident genes. They have also worked out how H. pylori mimics blood-group antigens, and have identified a likely molecular mechanism for promoting antigenic variation.

Of the many details reported, the generalizations about molecular evolution are the most interesting. H. pylori is a Gram-negative bacterium, so most of its protein sequences would be expected to resemble those of other Gram-negative bacteria such as Escherichia coli and Haemophilus influenzae. Most of them do, of course, but a considerable number are most similar to sequences from other, more distantly related, bacteria. One enzyme involved in chorismate biosynthesis is even reported to be most closely related to its equivalent in chloroplasts.

Although the significance of these anomalies is not clear, the possibility of rampant horizontal gene transfer is unnerving to a community that hopes to reconstruct the history of life from comparisons of amino-acid sequences10. To test the point, I examined a number of bacterial urease sequences, including that of H. pylori (Fig. 2). I found it unsettling not only that the generic boundaries were so ill-defined, but also that the plant (jackbean) sequence was almost as similar to each of the bacterial sequences (an average of 65 per cent identity) as most of them were to one another.

Figure 2: The genome of the Gram-negative bacterium H. pylori, sequenced by Tomb et al.1, has thrown up a few surprises.

One is that certain protein sequences do not most closely resemble those of other Gram-negative bacteria, which may sound a warning for the reconstruction of evolutionary relationships. The phylogenetic tree shown was constructed from a highly conserved 480-residue region of the enzyme urease. On average, the plant (jackbean) sequence is 65 per cent identical with the bacterial sequences used. The closest relationship shown is between Klebsiella pneumoniae and Proteus mirabilis, which are 77 per cent identical.
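Percentage-identity figures like the 65 and 77 per cent quoted above come from counting matching residues in aligned sequences. A minimal sketch, assuming the sequences are already aligned (gaps written as '-') and that columns with a gap in either sequence are simply skipped, which is only one of several conventions in use. The function names and toy fragments are hypothetical, and this is not necessarily how the values in Fig. 2 were computed.

```python
from statistics import mean

def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over ungapped columns of two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # columns with a gap in either sequence are ignored (assumed convention)
        compared += 1
        identical += a == b  # True counts as 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared

def mean_identity(query: str, others: list[str]) -> float:
    """Average identity of one aligned sequence against several others."""
    return mean(percent_identity(query, other) for other in others)

# Toy aligned fragments (hypothetical, not real urease sequences).
print(percent_identity("MKLV-AG", "MKIVQAG"))  # 5 of 6 ungapped columns match
```

An average such as the jackbean-versus-bacteria figure corresponds to `mean_identity` applied to one sequence against the rest. Turning pairwise identities into a tree requires a separate tree-building method, which the text does not specify.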

There are other frustrations with the data. As in previous genome studies, many of the potential protein-coding regions have not been assigned a likely function. Given that the genes for all new proteins arise from existing genes by duplication, in whole or in part, followed by descent with modification, I am surprised that so many open reading frames remain unidentified. More rigorous searching will doubtless establish more relationships, as proved to be the case with the earlier genome reports.
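Because so much of the discussion turns on open reading frames, a minimal sketch of what the term means computationally may help: a run of codons that begins with a start codon and continues in frame to a stop codon. This sketch assumes the standard stop codons (TAA, TAG, TGA), considers only ATG starts on the forward strand, and uses an arbitrary minimum length; real gene-finding pipelines, including whatever Tomb et al. used, are considerably more elaborate. The function name and threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

def find_orfs(dna: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """Return 0-based, end-exclusive (start, end) spans of forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon; its length in
    codons includes the stop. min_codons is an arbitrary illustrative threshold.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)

# Toy sequence: ATG AAA TAG in the third reading frame.
print(find_orfs("CCATGAAATAGGG", min_codons=3))  # → [(2, 11)]
```

Scanning the reverse complement the same way would cover the other strand; assigning a function to each ORF is the separate, harder step of searching it against sequence databases.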

Nonetheless, one cannot help but be impressed by the tremendous accomplishments of Tomb et al.1. Not only have they generated the data at a breathtaking pace, but the analyses are both insightful and thorough. The organizational problems associated with searching 1,600 open reading frames against existing databases, and then making proper judgements about the findings, are awesome. Bring on the next bacterial genome sequence — don't hold that TIGR!