The power of whole-genome sequencing, open data release and crowdsourced analyses was amply illustrated during this summer's outbreak of Shiga toxin–producing E. coli strain O104:H4 in Europe. The first signs that a massive food-poisoning outbreak was underway in Germany emerged in early May. A large number of patients progressed to hemolytic uremic syndrome (HUS), a devastating complication resulting in hemolytic anemia, thrombocytopenia and acute injury to the kidney and (more rarely) the brain and pancreas. By the end of July, the death toll had risen to 49, of which 29 were attributable to HUS. Thousands more were sickened, many requiring hospitalization, rehabilitation, blood transfusions and/or dialysis. Overall, this was the deadliest E. coli outbreak on record.

Characterization of the etiological agent was initially hindered because laboratory tests focused on the most common HUS-causing E. coli O157 strains. Unlike O157 strains, the outbreak strain, O104:H4, ferments sorbitol, so it is missed by screens that rely on failure to ferment sorbitol, the trait usually used to flag HUS-causing strains. It also produces an extended-spectrum beta-lactamase that renders it resistant to almost all penicillins and cephalosporins.

Isolation and identification of O104:H4 thus relied on use of an antibiotic-containing medium twinned with serotyping and detection of the Shiga toxin gene using the polymerase chain reaction. Conventional molecular typing using multi-locus sequence typing (MLST) was carried out at the University of Münster, Germany, using a database in Ireland. This confirmed that all isolates from the outbreak belonged to the same sequence type, suggesting a common source for the epidemic. Epidemiologists ultimately tracked the source to fenugreek seeds imported from Egypt.
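The logic of the MLST comparison can be illustrated with a minimal sketch. It assumes a seven-locus scheme (the locus names below are those of a widely used E. coli scheme, but the text does not say which scheme was applied), and the allele numbers, sequence-type labels and isolate names are hypothetical placeholders rather than data from the outbreak.

```python
from typing import Dict, Optional, Tuple

# Assumed seven-locus scheme; the source does not name the loci used.
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")
Profile = Tuple[int, ...]  # one allele number per locus, in LOCI order


def sequence_type(profile: Profile, st_table: Dict[Profile, str]) -> Optional[str]:
    """Look up the sequence type for an allele profile (None if not in the table)."""
    if len(profile) != len(LOCI):
        raise ValueError(f"expected {len(LOCI)} alleles, got {len(profile)}")
    return st_table.get(profile)


def shared_sequence_type(
    isolates: Dict[str, Profile], st_table: Dict[Profile, str]
) -> Optional[str]:
    """Return the sequence type if every isolate has the same known type, else None."""
    types = {sequence_type(p, st_table) for p in isolates.values()}
    if len(types) == 1:
        return next(iter(types))  # may itself be None if the profile is unknown
    return None


# Hypothetical lookup table and isolates, for illustration only.
st_table = {
    (1, 2, 3, 4, 5, 6, 7): "ST-A",
    (1, 2, 3, 4, 5, 6, 8): "ST-B",
}
outbreak_isolates = {
    "isolate_1": (1, 2, 3, 4, 5, 6, 7),
    "isolate_2": (1, 2, 3, 4, 5, 6, 7),
    "isolate_3": (1, 2, 3, 4, 5, 6, 7),
}
mixed_isolates = dict(outbreak_isolates, isolate_4=(1, 2, 3, 4, 5, 6, 8))

common_type = shared_sequence_type(outbreak_isolates, st_table)
mixed_type = shared_sequence_type(mixed_isolates, st_table)
```

A single shared type across all isolates is consistent with a common source; a difference at even one locus, as in `mixed_isolates`, yields a different type and the function returns `None`.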

This is when the genome sequencers came to the fore. On May 28, researchers at BGI in Shenzhen, China, obtained samples of genomic DNA of the outbreak strain from the University Medical Center Hamburg-Eppendorf. Within five days, BGI's scientists had completed seven runs on an Ion Torrent Personal Genome Machine and released the sequencing reads into the public domain. This rapid, open data release triggered a frenzy of crowdsourced analyses by bioinformaticians across the globe. A day later, a de novo assembly of the genome had been produced by a bioinformatician in the UK, and within a week, over 20 entries had been filed on a new website dedicated to genomics of the strain, revealing details of its pathogenic potential and evolutionary origins. BGI subsequently used an Illumina HiSeq to produce an improved assembly.

In parallel with the BGI's crowdsourced efforts, the University of Münster and Life Technologies also sequenced the genome of an isolate from the outbreak using Ion Torrent sequencing. A day after the BGI's data release, they released a hybrid assembly that they had produced in-house, mapping as many reads as possible to a reference genome from a related strain and using de novo assembly only for unmapped reads. Two other research centers, the Göttingen Genomics Laboratory and the UK's Health Protection Agency (HPA), sequenced isolates from the outbreak using 454 technology. Finally, Pacific Biosciences, a relative latecomer to the scene, used their platform to sequence not only the outbreak strain, but also 11 other related strains, including 6 strains of the O104:H4 serotype that are not associated with HUS. In their analysis, they highlighted the presence of an unusually large number of accessory virulence factors (serine protease autotransporter cytotoxins), which, they speculated, might account for the high frequency of HUS in this outbreak.
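The first step of the hybrid strategy described above, separating reads that map to a reference from those that must be assembled de novo, can be sketched in a few lines. This is a toy illustration only: it treats a read as mapped if it (or its reverse complement) matches the reference exactly, whereas real read mappers tolerate mismatches and gaps. The function names and sequences are hypothetical and are not taken from the Münster pipeline.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def partition_reads(reads: List[str], reference: str) -> Tuple[List[str], List[str]]:
    """Split reads into those found in the reference and those left over.

    Simplifying assumption: exact substring matching on either strand.
    """
    mapped, unmapped = [], []
    for read in reads:
        if read in reference or reverse_complement(read) in reference:
            mapped.append(read)
        else:
            unmapped.append(read)  # candidates for de novo assembly
    return mapped, unmapped


# Hypothetical reference and reads, for illustration only.
reference = "ACGTTGCAAGGCT"
reads = ["CGTTG", "AGCCT", "TTTTT"]
mapped, unmapped = partition_reads(reads, reference)
```

Here `CGTTG` matches the forward strand, `AGCCT` matches via its reverse complement, and `TTTTT` is left for de novo assembly, mirroring the division of labor in a hybrid assembly.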

So what did genome sequencing contribute? In this case, it made little or no difference to the management of patients or of the outbreak as a whole. Nonetheless, the ability to obtain genome sequences from outbreak strains within days of isolation was impressive and allowed the pathogen to be characterized at extraordinary speed. Similarly, an open attitude to data release meant that researchers and public health scientists could immediately exploit, and even add to, sequence-based information, using the existing well-defined collection of HUS-associated type strains.

The outbreak also provided an opportunity to test drive and benchmark the range of competing sequencing technologies and associated bioinformatics pipelines. In fact, this strain has probably been sequenced on more platforms than any other organism, although curiously, among this flurry of draft genomes, we still have no finished genome. There is no simple path from genome sequence to an understanding of virulence or transmissibility. For example, in this outbreak, the genome sequence revealed a list of potential adhesins, but proving their roles, if any, in adhesion to foodstuffs or to the human gut will require months or years of detailed investigation in the laboratory.

One last promising application of whole-genome sequencing—its combination with epidemiological and geographical data to track the evolution of strains in outbreaks—hasn't figured in the O104:H4 outbreak. High-quality sequence is needed to detect true variants for genomic epidemiology but next-generation sequencers currently introduce too many errors into draft sequences—at rates as high as 1 in 100,000 bases—to make real variation distinguishable from base-call errors, at least without confirmation by Sanger sequencing. Even so, it is clear from previous work on outbreaks of Mycobacterium tuberculosis, Staphylococcus aureus, Salmonella enterica and streptococci that whole-genome sequence variants will be a powerful new adjunct to MLST data in epidemiological studies.
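A back-of-the-envelope calculation shows why such an error rate matters. The sketch below takes the 1 in 100,000 figure from the text; the genome length of about 5.3 million bases and the count of five true variants between two isolates are assumptions chosen purely for illustration, and errors in the two drafts are assumed to be independent and never to coincide.

```python
GENOME_LENGTH_BP = 5_300_000  # assumed approximate E. coli genome size
ERROR_RATE = 1 / 100_000      # per-base error rate quoted in the text


def expected_errors(genome_length_bp: int, error_rate: float) -> float:
    """Expected number of erroneous bases in one draft genome."""
    return genome_length_bp * error_rate


def true_variant_fraction(
    true_variants: int, genome_length_bp: int, error_rate: float
) -> float:
    """Expected share of differences between two drafts that are real variants.

    Assumes independent errors in each draft that never fall on the same base.
    """
    spurious = 2 * expected_errors(genome_length_bp, error_rate)
    total = true_variants + spurious
    return true_variants / total if total else 0.0


errors_per_draft = expected_errors(GENOME_LENGTH_BP, ERROR_RATE)
# Hypothetical: five genuine variants separate two outbreak isolates.
fraction_real = true_variant_fraction(5, GENOME_LENGTH_BP, ERROR_RATE)
```

Under these assumptions each draft carries roughly 53 erroneous bases, so a comparison of two drafts would be dominated by about a hundred spurious differences, and fewer than 1 in 20 observed differences would reflect real variation.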

Thus, the E. coli outbreak provides an encouraging precedent for the application of open-source genome analyses to future outbreaks, particularly those caused by pathogens where there is no existing diagnostic or typing scheme in place. Here, the open-ended nature of genome sequencing might even allow the discovery of 'unknown unknowns' (that is, previously unsuspected pathogens or virulence determinants or resistance factors). Although genomics may not prevent the next outbreak, it is set to transform our ability to understand the origins, nature and spread of the next emerging pathogen. Knowledge is power. Vorsprung durch Technik!