Assembly and de-replication with dRep results in more and higher-quality genome bins as compared to co-assembly. (a) A complete Escherichia coli genome was subset 10 times in increments of 10% (10%, 20%, 30%, etc.). Subsets were compared to each other in a pairwise manner (100 total comparisons) using three algorithms: ANIm, MASH and gANI. For each pair of subsets, the alignment coverage between the two genomes as determined by MUMmer (aligned length / average genome length) is shown on the x axis, and the ANI reported by each algorithm is shown on the y axis. ANIm and gANI are accurate when genomes are incomplete, but MASH is only accurate when genomes are essentially complete. (b) Using previously reported algorithm runtimes, we estimated the time required to de-replicate genome sets of various sizes. The estimated runtime of gANI exhibits a sharp exponential climb, limiting its use on larger genome sets; the runtimes of MASH and dRep do not. (c) De-replication of bins from individual assemblies and co-assembly (dRep assembly method) resulted in more bins (⩾75% complete, ⩽5% contaminated) than co-assembly alone. (d and e) Examples of genome relatedness figures generated by dRep. The red dotted line marks the lowest ANI resulting from a self-vs-self alignment of each genome in the cluster.
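The x-axis quantity in panel (a) can be illustrated with a minimal Python sketch. It enumerates the 10 x 10 = 100 subset pairs and applies the coverage definition given above (aligned length / average genome length). Several things here are assumptions made for illustration and are not taken from the study: the genome length is a round number of roughly E. coli size, the subsets are treated as nested (so the aligned length of a pair equals the length of the shorter subset), and the function and variable names are hypothetical. In the actual analysis the aligned length is measured by MUMmer rather than derived this way.

```python
from itertools import product

# Illustrative round number, roughly the size of an E. coli genome (assumption).
GENOME_LENGTH = 4_600_000


def alignment_coverage(aligned_length, length_a, length_b):
    """Aligned length divided by the average length of the two genomes."""
    return aligned_length / ((length_a + length_b) / 2)


# Subset lengths at 10%, 20%, ..., 100% of the full genome.
subset_lengths = [GENOME_LENGTH * i // 10 for i in range(1, 11)]

# Every subset against every subset, self-comparisons included: 100 pairs.
pairs = list(product(subset_lengths, repeat=2))

# Assumption: subsets are nested, so the shorter subset aligns in full.
# A real analysis would take the aligned length from the aligner output.
coverages = [alignment_coverage(min(a, b), a, b) for a, b in pairs]

if __name__ == "__main__":
    print(f"{len(pairs)} comparisons")
    print(f"coverage range: {min(coverages):.3f} to {max(coverages):.3f}")
```

Under these assumptions the lowest coverage comes from comparing the 10% subset with the complete genome, and self-comparisons give a coverage of 1. This low-coverage end of the x axis is where the caption reports that MASH is no longer accurate.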