The phylogeny of phages was reconstructed using large terminase sequences from this study (n = 397) and similar matches from all RefSeq r92 proteins (n = 532). The tree also includes large terminase sequences from complete RefSeq phages, the Lak megaphage clade (ref. 9; n = 9) and non-artefactual phage genomes larger than 200 kb from a previous study (ref. 14). Huge phage clades identified in this study were independently corroborated by a phylogenetic reconstruction of major capsid protein (MCP) genes (Extended Data Fig. 5a) and by protein clustering (Extended Data Fig. 5b). The tree was rooted using eukaryotic herpesvirus terminases (n = 7). From inner to outer, the rings display the presence of CRISPR–Cas in this study, host phylum, environmental sampling type and genome size. Host phylum and genome size are not shown for RefSeq protein database matches whose sequences may come from an integrated prophage or from organismal genome projects. Scale bars show the number of substitutions per site (left) and the number of base pairs (right).