The plot shows a hierarchical cluster analysis of the 428 LS-BSR gene clusters that were significantly (chi-square test or Fisher's exact test, P < 0.05) more prevalent in genomes from symptomatic (LI and NSI) cases than in genomes from asymptomatic (AI) cases among the 70 EPEC genomes analysed. The LS-BSR gene clusters were generated using a clustering threshold of 90% nucleotide identity. Hierarchical clustering [41] with Pearson correlation and average linkage was performed using MeV [42]. Each column represents a genome, and each row represents an LS-BSR gene cluster. Gene clusters that were present (LS-BSR value of ≥0.9) are shown in blue, and gene clusters that were absent (LS-BSR value of <0.9) are shown in white. Red boxes indicate three groups of genomes, designated I, II and III, and red asterisks identify the nodes that separate the genomes into these three groups. The colour-coded rectangles at the top of the plot denote the phylogenomic lineage, and the colour-coded squares indicate the clinical outcome of each isolate. The colour coding of each symbol is given in the key at the top of the figure. A star symbol marks each genome that carries bfpA.
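
For readers who want to trace the steps described in this legend, the following is a minimal sketch in standard-library Python, not the LS-BSR or MeV implementation used in the study. It converts LS-BSR values to presence/absence at the 0.9 cut-off, computes a Fisher's exact test P value for one gene cluster, and groups genomes by average-linkage clustering on a Pearson correlation distance. The toy matrix, the symptomatic/asymptomatic labels, the function names, the one-sided form of the test and the choice to stop at a fixed number of groups are all illustrative assumptions.

```python
from itertools import combinations
from math import comb, sqrt

PRESENT = 0.9  # LS-BSR value at or above which a gene cluster counts as present


def binarize(bsr_rows):
    """Rows are gene clusters, columns are genomes; returns a 1/0 matrix."""
    return [[1 if value >= PRESENT else 0 for value in row] for row in bsr_rows]


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    a, b: symptomatic genomes with / without the gene cluster.
    c, d: asymptomatic genomes with / without the gene cluster.
    The one-sided alternative is an assumption of this sketch.
    """
    n_sym, n_asym, n_present = a + b, c + d, a + c
    total = comb(n_sym + n_asym, n_present)
    upper = min(n_sym, n_present)
    tail = sum(comb(n_sym, k) * comb(n_asym, n_present - k)
               for k in range(a, upper + 1))
    return tail / total


def pearson_distance(x, y):
    """1 - Pearson correlation between two presence/absence profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sd_x = sqrt(sum((p - mean_x) ** 2 for p in x))
    sd_y = sqrt(sum((q - mean_y) ** 2 for q in y))
    if sd_x == 0 or sd_y == 0:
        return 1.0  # constant profile: treated as uncorrelated (assumption)
    return 1.0 - cov / (sd_x * sd_y)


def average_linkage(profiles, n_groups):
    """Merge the closest pair of groups until n_groups remain."""
    indices = range(len(profiles))
    dist = {(i, j): pearson_distance(profiles[i], profiles[j])
            for i, j in combinations(indices, 2)}

    def between(g1, g2):
        # Average linkage: mean of all pairwise distances between members.
        pairs = [dist[min(i, j), max(i, j)] for i in g1 for j in g2]
        return sum(pairs) / len(pairs)

    groups = [[i] for i in indices]
    while len(groups) > n_groups:
        i, j = min(combinations(range(len(groups)), 2),
                   key=lambda p: between(groups[p[0]], groups[p[1]]))
        groups[i] = groups[i] + groups[j]
        del groups[j]  # j > i, so index i is unaffected
    return [sorted(g) for g in groups]


# Toy LS-BSR matrix (invented values): 6 gene clusters x 6 genomes.
bsr = [
    [1.00, 0.97, 0.93, 0.40, 0.12, 0.00],
    [0.95, 0.99, 0.90, 0.00, 0.35, 0.22],
    [0.96, 0.91, 0.55, 0.10, 0.94, 0.30],
    [0.00, 0.20, 0.45, 0.98, 0.92, 0.99],
    [0.31, 0.00, 0.10, 0.93, 1.00, 0.95],
    [0.05, 0.90, 0.89, 0.97, 0.91, 0.60],
]
symptomatic = [True, True, True, False, False, False]  # hypothetical labels

presence = binarize(bsr)

# 2x2 table for the first gene cluster, then its P value.
row = presence[0]
a = sum(v for v, s in zip(row, symptomatic) if s)
c = sum(v for v, s in zip(row, symptomatic) if not s)
p_first = fisher_greater(a, sum(symptomatic) - a, c, symptomatic.count(False) - c)

# Cluster genomes (columns), so transpose the presence matrix first.
genome_profiles = [list(col) for col in zip(*presence)]
groups = average_linkage(genome_profiles, n_groups=2)
print(p_first, groups)
```

In this sketch the genomes, not the gene clusters, are the objects being grouped, which mirrors the column-wise groups I, II and III marked in the plot; clustering the rows would use the same functions on the untransposed matrix.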