a, Maximum-likelihood phylogenetic tree of the NCLDV inferred from a concatenated protein alignment of five core NCVOGs (ref. 16). Branches in dark red represent published genomes; branches in black represent GVMAGs generated in this study. Shades of grey mark the boundaries of genus- and subfamily-level clades, and previously described lineages are labelled. Tree annotations, from the inside outwards: (1) superclade (SC), (2) GC content, (3) assembly size and (4) environmental origin. b, Distribution of NCLDV lineages across habitats. The bars adjacent to the heat map show the number of detected major capsid proteins (MCPs) per habitat (bars extending to the right) and per lineage (bars extending downwards), both as the total count (full bar length) and as a count corrected for the average MCP copy number of the respective lineage (darker shaded portion). Only lineages with at least 100 detected MCPs are shown. NCLDV lineages with available virus isolates are indicated in red. The turquoise dashed line indicates the total size of the metagenome assemblies screened in this analysis. Bars on the far right show, for each environment, the number of detected MCPs per assembled gigabase (Gb).
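The per-lineage and per-habitat summaries in panel b can be illustrated with a small sketch. It assumes, since the caption does not state the exact formula, that the corrected count is the raw MCP count divided by the lineage's average MCP copy number, and that the per-Gb density is the raw MCP count in a habitat divided by that habitat's assembled gigabases. The function name `summarize_mcps`, its input format and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter


def summarize_mcps(detections, avg_copy_number, assembly_gb, min_mcps=100):
    """Hypothetical helper; detections holds one (habitat, lineage) tuple per detected MCP."""
    per_lineage = Counter(lineage for _, lineage in detections)
    per_habitat = Counter(habitat for habitat, _ in detections)
    # Keep only lineages with at least min_mcps detected MCPs (the plotting threshold).
    kept = {lin for lin, n in per_lineage.items() if n >= min_mcps}
    raw = {lin: per_lineage[lin] for lin in kept}
    # Assumed correction: raw count divided by the lineage's average MCP copy number.
    corrected = {lin: raw[lin] / avg_copy_number[lin] for lin in kept}
    # Density: detected MCPs per assembled gigabase in each habitat (all detections counted).
    density = {h: n / assembly_gb[h] for h, n in per_habitat.items()}
    return raw, corrected, density


# Toy example with made-up numbers, for illustration only.
detections = (
    [("marine", "lineage_A")] * 120
    + [("freshwater", "lineage_A")] * 30
    + [("soil", "lineage_B")] * 50
)
raw, corrected, density = summarize_mcps(
    detections,
    avg_copy_number={"lineage_A": 1.5, "lineage_B": 1.0},
    assembly_gb={"marine": 60.0, "freshwater": 10.0, "soil": 25.0},
)
print(raw, corrected, density)
```

Here `lineage_B` (50 MCPs) falls below the 100-MCP threshold and is dropped from the lineage bars, while its detections still count toward the soil density. Whether the original analysis treated excluded lineages this way is not stated in the caption.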