Introduction

Large and giant viruses are part of a group of double-stranded DNA viruses, the nucleocytoplasmic large DNA viruses (NCLDVs)1,2, which constitutes the viral phylum Nucleocytoviricota3. Viruses of this phylum infect a wide range of eukaryotic hosts, from the tiniest known unicellular choanoflagellates to multicellular animals4. NCLDVs typically replicate in so-called viral factories built in the host cytoplasm or use the host nucleus to replicate and sometimes assemble their progeny5,6. Hallmark features of these viruses are large genomes ranging from 70 kb to up to 2.5 Mb and virions that can reach more than 2 μm in length7. The term ‘giant virus’ was initially coined in the 1990s, when it became apparent that viruses that infect algae have unusually large genomes8 and, further, in the early 2000s, when the first virus with a genome in the megabase range was discovered; initial light microscopy observations led to the assumption that its particles corresponded to a Gram-positive bacterial pathogen of amoebae9,10. More detailed ultrastructural analyses revealed a typical icosahedral-shaped virion and genome sequencing yielded a 1.2 Mb viral genome11. This virus was named ‘mimivirus’, short for ‘microbe-mimicking virus’, and represented an unexpected novelty in the virosphere, due not only to its exceptional particle and genome sizes but also to its coding potential as it includes several genes with possible roles in protein biosynthesis11. Since this discovery of giant viruses, their coding potential has been full of surprises, and the presence of hallmark genes of cellular life led to the hypothesis that these viruses might represent an enigmatic fourth domain of life11,12,13. Equally intriguing, much smaller viruses (so-called virophages) were found to infect some NCLDVs that have exclusively cytoplasmic infectious cycles; virophages parasitize and sometimes kill their hosts14. 
Also discovered was a third partner coined ‘transpoviron’, which corresponds to a 7 kb double-stranded DNA episome that is able to propagate using both the giant virus and the virophage particles as vehicles15,16.

For well over a decade, giant viruses were studied chiefly through cultivation-based approaches. Only very recently did virology follow in the footsteps of microbial genomics by applying cultivation-independent metagenomics to investigate the evolutionary diversity and metabolic potential of these viruses at an unparalleled pace. In this Review, we explore a wealth of experimental data that has revealed many insights into giant virus biology, in particular their virion structure and distinctive infection strategies. We build upon this knowledge by integrating the latest sequence-based studies that expanded NCLDV diversity, biogeography, coding potential and putative host range. Furthermore, we discuss compelling evidence that the presence of a variety of cellular hallmark genes in giant virus genomes enables the virus to reprogramme host metabolism, and that the integration of giant virus genetic material into host genomes may impact the biology and evolution of the eukaryotic cell.

Giant virus discovery through isolation

The earliest discovered NCLDVs were the Poxviridae, which include the causative agent of smallpox and were the first viral particles seen under a microscope more than 130 years ago17. Large viruses that infect Chlorella green algae were isolated in the 1980s. The first genomes of Vaccinia virus (a poxvirus) and Paramecium bursaria chlorella virus 1 (PBCV1) were sequenced in the early 1990s18 and 1999 (ref.8), respectively. Shortly thereafter, additional genomes of Poxviridae were sequenced (Fig. 1), with sizes ranging from 120 kb to 360 kb (ref.19). Subsequently, other viruses that infect animals, including members of the Ascoviridae, Iridoviridae and Asfarviridae families, were found and their genomes sequenced20,21,22. Genomes of viruses in these groups are comparatively small (up to 220 kb) and even smaller in the recently discovered shrimp-associated Mininucleoviridae (70–80 kb)23. In addition to animal-infecting NCLDVs, a wide range of NCLDVs were detected in various eukaryotic algae, including chlorophytes, haptophytes, pelagophytes, brown algae and dinoflagellates in the early 2000s24. These algae-associated NCLDVs were classified as Phycodnaviridae24 and Mesomimiviridae25,26 and, although most of their genomes are ~200–500 kb (refs.24,27), the genomes of Tetraselmis virus and Prymnesium kappa virus RF01 are 668 kb (ref.28) and 1.4 Mb (ref.29), respectively.

Fig. 1: Timeline of important cultivation-dependent and cultivation-independent discoveries in the Nucleocytoviricota.

Stacked bars indicate the number of virus isolate genomes (blue; left y-axis) and giant virus metagenome-assembled genomes (GVMAGs; red; left y-axis) of members of the Nucleocytoviricota that have been published and/or became available in the NCBI Genbank database for each year on the x axis. Filled circles indicate the assembly size of virus isolates and GVMAGs (right y-axis). Important cultivation-based events are highlighted by green stars. (1) First isolation of a giant virus (Paramecium bursaria chlorella virus 1)156. (2) Isolation of mimivirus in amoeba co-cultivation10. (3) Isolation of a giant virus together with its Cafeteria roenbergensis host58. (4) Isolation of amphora-shaped pandoravirus with a 2.5-Mb genome48. (5) Recovery of pithovirus, from a 30,000-year-old ice core, through co-cultivation with an amoeba174. (6) Isolation of faustovirus in co-cultivation with Vermamoeba vermiformis34. (7) Isolation of Bodo saltans virus, the first isolated member of the Klosneuvirinae with its native kinetoplastid host64. (8) Isolation of tupanvirus in amoeba, currently the largest giant virus based on its capsid diameter and length7. (9) Isolation of medusavirus in amoeba, representing a divergent new lineage in the Nucleocytoviricota33. Important genomic and metagenomic events are highlighted by yellow stars. (1) First genomes of members of the Nucleocytoviricota were sequenced: Vaccinia virus (1a)18 and Paramecium bursaria chlorella virus 1 (1b)8. (2) Sequencing of the first giant virus with a genome size above 1 Mb: Acanthamoeba polyphaga mimivirus11. (3) First-time recovery of GVMAGs (from Organic Lake, Antarctica)90. (4) Viral subfamily Klosneuvirinae proposed based on GVMAGs recovered from environmental sequence data92. (5) Single-cell genomics-enabled discovery of Choanovirus from marine choanoflagellates85. 
(6) First large-scale global metagenomic study leading to the recovery of more than 2,000 GVMAGs yielding an 11-fold increase in phylogenetic diversity and a 10-fold expansion in functional diversity82. (7) Detection of whole giant virus genomes integrated in host chromosomes119.

After the discovery of mimivirus in 2003 (ref.10), other NCLDVs with larger virions and genomes above 500 kb were found to infect heterotrophic protists30 (mainly members of the Amoebozoa). For more than a decade, Acanthamoeba strains were the main hosts used for the co-cultivation of new viruses, leading to the frequent isolation of closely related giant viruses able to infect this unicellular host31. Acanthamoeba spp. have proven to be particularly suitable hosts for many Megamimivirinae and Marseilleviridae31. Consequently, viruses from these taxonomic groups are currently among the most commonly cultivated NCLDVs with more than 30 genome sequences readily available in public databases, including the novel Megamimivirinae lineages tupanvirus7 and cotonvirus32. The co-cultivation approach has been widely successful and also led to the recovery of isolates from divergent NCLDV clades, facilitating the organization and naming of pithoviruses, pandoraviruses, molliviruses and medusaviruses33. More recently, the use of alternative hosts, such as Vermamoeba spp., has led to the co-cultivation of several new faustovirus isolates34, orpheovirus35, pacmanvirus36 and kaumoebavirus37, all distant relatives of pithovirus, marseillevirus and asfarvirus. A newly developed high-throughput co-cultivation-based approach using high-content screening microscopy38 has proven a valuable tool for giant virus discovery and isolation38. Yet, co-cultivation is limited by host specificity of giant viruses4; some NCLDV lineages are able to infect only specific hosts, such as certain species of Acanthamoeba39, whereas others may be more versatile, exhibiting a broader host range7. Considering the enormous diversity of eukaryotes40, and in particular of microeukaryotes, it is likely that giant viruses that have been recovered through isolation reflect only a minute fraction of NCLDV lineages extant in the wild.

Virion structures and infection strategies

Viruses with nucleocytoplasmic infectious cycles

Chloroviruses were the first viruses designated as ‘giant viruses’8 owing to their large icosahedral virions of 190 nm in diameter (T number 169)41 (Fig. 2) and genomes of up to 370 kb (Table 1). In particular, PBCV1 (ref.42) was extensively studied; its capsids have a few external fibres extending from some of the capsomers41 and a spike-like structure present at one vertex to anchor onto the host cell43 (Fig. 2). The capsids are glycosylated with an unusual oligosaccharide synthesized by the virus-encoded glycosylation machinery; the oligosaccharide is N-linked to asparagines in atypical sequons44 in the major capsid protein (MCP; Vp54)45. The outer capsid layer covers a single lipid membrane46, which is essential for infectivity. Chloroviruses deliver their genome into their algal host by creating a hole in the cell wall using a virus-encoded enzyme packaged in the virion. The viral internal membrane then fuses with the host plasma membrane, forming a channel through which the genome and some viral proteins enter the cell. Because the virus does not encode an RNA polymerase, the incoming genome must be transcribed inside the host cell’s nucleus prior to virion assembly in the cytoplasm. Virions are released after host cell lysis.

Fig. 2: Giant virus infection mechanisms and virion structures.

A | Giant viruses enter the host by attachment to the host cell envelope followed either by endocytic uptake (part a) or membrane fusion after capsid opening (part b). Giant virus transcription is then initiated in the cytoplasm or viral factory (part c; purple arrow) or the host nucleus (part d; green arrow). Genome replication and assembly of new virions then occur in the periphery of the cytoplasmic viral factory (part e) or newly synthesized virions are scattered in a large cytoplasmic viral factory (part f). Finally, virions are released after host cell lysis (part g), fusion of virion-containing vacuoles with host cell membrane (part h) or exocytosis of membrane-bound virions (part i). Small coloured circles indicate viral genome and viral proteins. B | Infection strategies of selected giant viruses. C | Transmission electron micrographs of ultrathin sections of non-icosahedral viruses embedded in resin. D | Structures of isolated giant viruses resolved by cryo-electron microscopy67,71,176,177. Note the blue coloured stargate structure on mimivirus. The scale bars in parts C and D are 100 nm. AaV, Aureococcus anophagefferens virus; CroV, Cafeteria roenbergensis virus; OtV, Ostreococcus tauri virus; PBCV1, Paramecium bursaria chlorella virus 1. Part D reprinted from refs.67,176,177, CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Part D reprinted with permission from ref.43, PNAS. Part D, image courtesy of M. Kazuyoshi. Part D, image courtesy of T. Klose.

Table 1 Virion structures, genome characteristics, infection strategies and hosts of giant virus isolates

Other Nucleocytoviricota viruses that infect algae have small virions. Among the smallest members of the Nucleocytoviricota are prasinoviruses with virion diameters of ~120 nm and genomes up to 410 kb. Being small is crucial for infecting and replicating within Ostreococcus tauri, which is one of the smallest free-living eukaryotes with cells only 0.8 µm in size47. Following viral infection, the genome is released into the nucleus and its replication begins almost immediately. Within hours, new virions assemble in the cytoplasm and, in less than 24 h, host lysis occurs. The host cell nucleus, mitochondrion and chloroplast remain intact throughout this period.

Larger viruses with nucleocytoplasmic infectious cycles are the amoeba-infecting pandoraviruses, with amphora-shaped virions up to 1 µm in length and 500 nm in diameter (Fig. 2) and genomes up to 2.5 Mb (ref.48). There is at least one lipid membrane lining a thick tegument made of three layers, including one made of cellulose49. The particles are taken up through phagocytosis and an ostiole-like structure at the apex opens to allow the internal membrane to fuse with the phagosome membrane; this results in the delivery of the genome and necessary proteins into the host cytoplasm. Although pandoraviruses encode an RNA polymerase, the enzyme is not packaged in the capsids and, thus, infecting viruses rely on the host cell for early transcription of viral genes. At the viral factories set up within the cytoplasm (Fig. 2), new virions start to assemble from the apex and lipid vesicles are recruited to the viral factory to be used in virion assembly. Nascent virions are released either by cell lysis or, if viruses are within vacuoles, by exocytosis through membrane fusion with the plasma membrane50,51.

Molliviruses have an ovoid virion of smaller size (~650 nm) and genomes of 650 kb (refs.49,52) (Fig. 2); they share 16% of their genes with pandoraviruses but two-thirds of their genes are ORFans. The capsids seem haloed by fibrils of different lengths and they present a membrane-lined tegument resembling that of pandoraviruses. Their infectious cycle is also similar to that of pandoraviruses, except that DNA seems to be pre-packaged in filaments that accumulate in the viral factory before being loaded into the maturing virions. Membrane remodelling involved in virion assembly was extensively analysed by cryogenic electron microscopy (cryo-EM)53.

Medusaviruses are also Acanthamoeba-infecting viruses33. Their icosahedral virions are 260 nm in diameter, covered by spherical-headed spikes extending from each capsomer, and have a lipid membrane that surrounds the capsid interior. A low-resolution structure was determined by cryo-EM, which returned a T number of 277 (ref.54). The mechanism of entry and egress of the medusavirus virion from its host has yet to be determined. After uptake into the host cytoplasm, its DNA is replicated in the host nucleus and virions assemble in the cytoplasm (Fig. 2).

Viruses with exclusively cytoplasmic infectious cycles

The second most studied virus after PBCV1 is the amoeba-infecting mimivirus9. The ~700 nm virions are made of an icosahedral capsid ~500 nm in diameter with a genome of 1.2 Mb (ref.11) (Table 1). Bacterial-type sugars are synthesized by the virus-encoded glycosylation machinery and are the building blocks of the complex 70 kDa and 25 kDa polysaccharide structures that decorate the mimivirus fibrils surrounding the capsid55. A low-resolution structure of the mimivirus capsid has been determined56 (Fig. 2) and detailed atomic force microscopy provided additional insights into virion composition57, further underlining the complexity of the capsid. There are two internal lipid membranes, one lining the capsid and the other in the nucleoid compartment, which contains the genome and hundreds of proteins, including RNA polymerase and transcript maturation machinery. It has been proposed that the non-structural proteins in the nucleoid are required to initiate the viral infectious cycle, protect the virion from oxidative stress and perform early transcription5,58. Preliminary data suggest that the genome is organized in a 30-nm diameter helical nucleocapsid comprising GMC oxidoreductases, which also constitute the glycosylated fibrils of the capsid. The folded genome lines the shell of the nucleocapsid, leaving a central channel that can accommodate large proteins, including RNA polymerase59. Mimivirus enters its host by triggering phagocytosis upon adhering to the host cell membrane with its glycosylated fibrils. Once in the vacuole, a specific structure at one vertex of the icosahedron (the stargate) opens, and the membrane under the capsid is pulled out and fuses with the vacuole membrane60, allowing transfer of the nucleoid into the host cytoplasm61,62. Similar to other known members of the Mimiviridae, mimivirus replicates in its host’s cytoplasm58,63,64 (Fig. 2). 
Early transcription begins using the virus-encoded transcription machinery, which, at first, remains confined in the nucleoid65. The accumulation of nucleic acids due to active transcription and replication increases the size of the viral factory, and newly synthesized virions start budding at its periphery, recycling host cell membranes derived from the endoplasmic reticulum61,66 or Golgi apparatus32. The last step of virion maturation, after genome loading into the nucleoid, is the addition of the fibril layer to the capsids66, with hundreds of newly synthesized virions released after cell lysis.

Several viruses related to mimivirus have similar infectious cycles but smaller virions. Among them is Cafeteria roenbergensis virus, which has an icosahedral capsid of 300 nm in diameter (Fig. 2) with a lipid membrane underneath the capsid shell. Its mode of infection is not fully understood but, similar to mimivirus, a nucleoid structure in the cytoplasm and extracellular empty capsids have been observed, supporting an external opening of the capsids followed by fusion of the internal membrane with that of the cell, thus allowing the transfer of the nucleoid into the host cytoplasm. Virions contain ~150 proteins, which either make up the icosahedral capsid or are necessary to initiate the infectious cycle61. Nascent virions assemble during the late stage of infection and are released through cell lysis. The structure of the complex capsid, determined by cryo-EM, corresponds to a T number of 499 and has provided a new model for capsid assembly67.

Another member of the Mimiviridae, with a similar icosahedral capsid of 300 nm in diameter, is Bodo saltans virus. Its capsid appears to be made of two proteinaceous layers surrounded by 40 nm-long fibrils. A possible stargate-like structure is present at one vertex of the capsid and there are two membranes, one lining the external protein shell and one internal to the nucleoid compartment containing the genome. The infectious cycle is similar to that of mimivirus except that the host’s nuclear genome appears to be degraded. The viral factory develops at the posterior pole of the cell to fill two-thirds of the cell space, pushing aside the nucleus and organelles. Lipid vesicles are recruited for virion assembly, which takes place at one side of the viral factory, and mature virions detach after genome loading and migrate to the posterior pole of the cell. Virions are released by budding in vesicles from the host membrane after cell lysis64 (Fig. 2).

Some of the largest viruses that infect algae belong to the Mimiviridae, all of which have icosahedral capsids with sizes ranging from 150 nm in the case of Aureococcus anophagefferens virus (Fig. 2; Table 1) to 370 nm in the recently described Prymnesium kappa virus29. These viruses also build a viral factory in the host cytoplasm, but it is unknown if the transcription machinery is loaded into the capsids, allowing an entirely cytoplasmic infectious cycle.

The largest virions found in the Nucleocytoviricota are those of pithovirus and cedratvirus (Fig. 2), which have very large amphora-shaped capsids that can be up to 2-µm long and 600-nm wide encapsidating genomes of up to 685 kb (Table 1). The capsids are closed by corks — one cork for pithovirus68,69 (Fig. 2) and two for cedratvirus70 — that are made by proteins organized in a honeycomb array. Despite a virion morphology that closely resembles that of pandoravirus, the external tegument is different and appears to be made of parallel strips and no cellulose; the capsids appear to be coated with short sparse fibrils35,68. The infectious cycle proceeds, as for other amoeba-infecting viruses, by phagocytosis followed by capsid opening and membrane fusion with the phagosome5. For pithovirus and cedratvirus, the RNA polymerase loaded in the virion starts early transcription in the cytoplasm and the host nucleus remains intact during the entire infectious cycle. During maturation, reservoirs of tegument and corks accumulate in the host cytoplasm and are used to build the new amphora-shaped virions. The nascent virions then exit the host cell either by exocytosis or upon cell lysis68,70.

Outside of the Mimiviridae, there are smaller amoeba-infecting viruses such as members of the Marseilleviridae, which have icosahedral virions of ~250 nm in diameter (Fig. 2). A recent publication and two preprints showed the cryo-EM structure of the capsid for two members of the family at various resolutions, revealing a T number of 309 and a complex capsid structure41,71,72 with many minor capsid proteins. Melbournevirus and other members of the family Marseilleviridae are taken up by phagocytosis and then lose their icosahedral appearance to become spherical after the disappearance of the vacuole membrane. Similar to Megamimivirinae, their genome remains in the cytoplasm; however, RNA polymerase is not loaded into the virion. Instead, the nuclear proteins are recruited to the early viral factory, including the host RNA polymerase that performs early transcription73. The appearance of the cell nucleus changes early in infection and becomes leaky through a still-unknown mechanism triggered by viral infection. After 1 h of infection, the nucleus integrity is restored and the virus-encoded RNA polymerase performs intermediate and late transcription74, and icosahedral particles assemble inside the viral factory (Fig. 2A). Marseilleviridae viruses encode histone doublets that form nucleosomes to pack the genome into virions75,76,77. Mature capsids can gather in large vesicles78 and cell lysis leads to the release of both individual virions and filled vacuoles.
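The T numbers quoted for the icosahedral capsids in this section (169 for PBCV1, 277 for medusavirus, 499 for Cafeteria roenbergensis virus and 309 for Marseilleviridae) can be related to lattice indices through the standard Caspar–Klug relation T = h² + hk + k², with 10T + 2 capsomers for an idealized capsid. The short Python sketch below is an illustration only and is not taken from the cited structural studies; the function names are hypothetical, and real capsids may deviate from the idealized capsomer count (for example, at a specialized vertex).

```python
def triangulation_indices(t_number, max_index=60):
    """Return all (h, k) with 0 <= h <= k such that h^2 + h*k + k^2 == t_number.

    Caspar-Klug relation; max_index is an arbitrary search bound (assumption).
    """
    solutions = []
    for h in range(max_index + 1):
        for k in range(h, max_index + 1):
            if h * h + h * k + k * k == t_number:
                solutions.append((h, k))
    return solutions


def capsomer_count(t_number):
    """Idealized number of capsomers (12 pentamers plus hexamers): 10T + 2."""
    return 10 * t_number + 2


if __name__ == "__main__":
    # T numbers as quoted in the text for the respective capsids.
    for label, t in [("PBCV1", 169), ("medusavirus", 277),
                     ("CroV", 499), ("Marseilleviridae", 309)]:
        print(label, t, triangulation_indices(t), capsomer_count(t))
```

All four T numbers admit a solution with h = 7 (k = 8, 12, 18 and 13, respectively). The lattice indices alone do not determine handedness, and T = 169 also has the solution (0, 13), so the indices actually adopted by a given capsid have to come from the structural data.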

As these examples illustrate, there is no shared blueprint for the structure of giant viruses and their infection mechanisms; these characteristics vary between giant virus lineages and are likely shaped by the host organisms. The host range of the experimentally characterized giant viruses is limited to a few amoeba and algae lineages representing only a minute fraction of eukaryotic diversity. Thus, we expect that many more unusual virions and infection strategies will be revealed when new viruses are captured together with their native hosts.

Cultivation-independent genomics

Sequence-inferred prevalence and diversity of giant viruses

Many important discoveries in giant virus biology and diversity have been made through giant virus isolation and cultivation. However, such approaches are constrained by the need to satisfy optimal growth requirements in a laboratory setting and are often restricted to lytic viruses. Cultivation-independent methods have proven to be an indispensable tool to discover the genetic make-up of giant viruses from environmental samples.

In the earlier days of metagenomics, single-marker gene-based surveys (Box 1) revealed that several viruses of the Phycodnaviridae and Mimiviridae were present in a wide range of marine metagenomes collected during the Tara Oceans and the Sargasso Sea expeditions79,80 and that these viruses were more abundant in the photic layer than eukaryotes80. In a follow-up study, data from these surveys gave rise to the hypothesis that giant viruses are more diverse in the oceans than any cellular organism81. Subsequently, a large-scale analysis of the NCLDV major capsid protein (MCP), in which more than 50,000 of these proteins were found across Earth’s biomes, revealed the global dispersal of giant viruses, including in terrestrial ecosystems82.

Other approaches that enabled the discovery of novel NCLDVs are single-virus or single-cell genomics and mini-metagenomics (Box 1). First, sorting viral particles from marine samples enabled the detection of viruses that had previously been found to be associated with the algae Ostreococcus spp. and Phaeocystis globosa83. This approach led to the sequencing of several so-called giant virus single amplified genomes, of which the largest was an 813 kb genome belonging to the Mimiviridae that encoded a metacaspase, which potentially enables autocatalytic cell death of the host cell84. Single-cell methods, including sorting and genome amplification of single eukaryotic cells, were also used to identify and sequence the genomes of five giant viruses associated with marine choanoflagellates85,86; comparative genomics together with all other NCLDV genomes revealed that viruses that infect hosts with similar trophic modes, including host habitat and lifestyles, express distinct genetic features86,87. Furthermore, mini-metagenomics analysis (Box 1) of a single forest soil sample led to the enrichment and discovery of 15 diverse giant virus metagenome-assembled genomes (MAGs), including several members of the Klosneuvirinae, highlighting an untapped diversity of giant viruses in soil88.

The most successful approach for obtaining NCLDV genomes from environmental sequence data is genome-resolved metagenomics (Box 1). Since the early 2000s, this approach has become common practice for recovering genomes of bacteria and archaea from complex environmental samples89, yet it took nearly another decade before the first giant virus MAGs (GVMAGs) appeared in public databases (Fig. 1). Yau et al. reconstructed the first GVMAGs as a by-product of their work on virophages in metagenomes from the Organic Lake in Antarctica90. Several years later, four additional potentially algae-associated GVMAGs were retrieved from environmental sequence data from Yellowstone Lake in Yellowstone National Park, United States; they were found to be related to the viral families Phycodnaviridae and Mimiviridae and shared some genes with virophages that co-occurred in the same sample91. Cultivation-independent approaches for the discovery of giant virus genome-centric sequence information gained traction when members of a Mimiviridae-affiliated subfamily, the proposed Klosneuvirinae, were recovered from metagenomic data92. The fact that these were found in metagenomes from freshwater and sewage samples originating from four different continents suggested this novel group of giant viruses is cosmopolitan92. More than 20 GVMAGs from the deep sea were subsequently discovered, including 15 affiliated with the Pithoviridae, indicating a surprisingly high prevalence of pithovirus-like viruses in the ocean93, followed by the discovery of additional, likely algae-associated freshwater giant viruses in samples collected from Dishui Lake, Shanghai, China94,95. 
The unique strength of cultivation-independent approaches for viral genomics and discovery became most evident when more than 2,000 GVMAGs were extracted from metagenome datasets generated from analyses of thousands of samples collected from diverse biomes82; an additional 500 GVMAGs from mainly marine systems were reconstructed shortly after96. The addition of the GVMAGs to the Nucleocytoviricota species tree led to an increase in phylogenetic diversity by more than tenfold and enabled a comprehensive update of the taxonomic framework of the Nucleocytoviricota26,82, in which the Mesomimiviridae makes up more than one-third of the observed diversity (Fig. 3). The addition of the new lineages also led to a substantial increase in the size of the Nucleocytoviricota pan-genome, which now comprises more than 900,000 proteins82. This translated to an extensively expanded repertoire of functional genes, providing not only many novel insights into how giant viruses may interact with their hosts and the environment but also generating compelling novel hypotheses about their evolutionary roles82,96,97,98.

Fig. 3: Expansion of the Nucleocytoviricota phylogenetic diversity through metagenomics.

Two species trees of the Nucleocytoviricota are shown: the inner tree illustrates the diversity of Nucleocytoviricota based on the genomes of viral isolates, and the outer inverted tree highlights the expansion of species diversity through genomes derived from cultivation-independent sequencing approaches (black branches). Branches of virus isolates are coloured based on affiliation to taxonomic groups and extrapolated to the expanded diversity of the outer tree. Coloured bands show order-level taxonomy and coloured circles indicate the phylogenetic position of selected giant viruses that are further discussed in the text. Tree is rooted at Poxviridae. Species tree adapted from the taxonomic framework for Nucleocytoviricota lineages26. AaV, Aureococcus anophagefferens virus; CroV, Cafeteria roenbergensis virus; PBCV1, Paramecium bursaria chlorella virus 1.

Exploring the host range of giant viruses

Genome-resolved metagenomics enabled the discovery of thousands of viral genomes, of which many represented lineages divergent from viruses recovered by isolation or co-cultivation82,96 (Fig. 3). However, giant viruses recovered from metagenomes typically lack information on host organisms99. An approach to overcome this limitation is the detection of viruses and potential eukaryotic hosts co-occurring in the same sample. Furthermore, horizontal transfer of genetic material between viruses and their hosts is a common phenomenon and can go in both directions100,101,102, and the analysis of viral genes that may have been acquired through recent horizontal gene transfer (HGT) might identify host organisms. In the early days of giant virus metagenomics, read mapping-based co-occurrence analysis (Box 1) revealed that the presence of viral sequences in some marine samples was positively correlated with those of eukaryotic oomycetes80, which have not been found to be associated with NCLDVs. In another study, co-expression analysis of metatranscriptomic data revealed a strong connection between Aureococcus anophagefferens virus and its algal host, and also indicated that other Mimiviridae present in the same sample were likely associated with Aureococcus spp.103. This approach also linked Phycodnaviridae and Mimiviridae members to a wide range of marine microeukaryotes, including choanoflagellates, stramenopiles, diatoms, dinoflagellates and cercozoan algae103. In a different study, virus–host relationships were implied through the co-occurrence analysis of viral and eukaryotic PolB-encoding genes and the hypervariable V9 region of the eukaryotic 18S rRNA gene104. This approach was then applied to a comprehensive set of marine metagenomes collected during the Tara Oceans expedition, revealing that particular microeukaryotes belonging to the Alveolata, Opisthokonta, Rhizaria and Stramenopiles co-occurred with different NCLDV lineages104. 
In a similar study, a strong co-occurrence signal was detected between a virus belonging to the Mimiviridae and marine chrysophytes as its potential host105. Subsequent detection of putative HGT events between GVMAGs and chrysophyte genomes and transcriptomes provided further support for this host–virus relationship105. A systematic analysis of HGT candidates present in more than 2,000 NCLDV genomes, most of which were MAGs from diverse global sampling sites, revealed thousands of genes likely introduced into host chromosomes or derived from the host through recent HGT82. Based on these results, it was possible to propose connections between NCLDVs and members of all major eukaryotic phyla82. Although most of these predicted hosts have not yet been found to be infected by giant viruses, more than 20 previously isolated virus–host relationships were successfully predicted through recent HGT events, underlining the validity of this sequence inference-based approach to metagenome-assembled viral genomes (Fig. 4).
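The co-occurrence reasoning described above can be sketched in a few lines of Python. The example ranks candidate host taxa by the Spearman rank correlation between a virus abundance profile and each taxon's abundance profile across the same samples. It is a minimal illustration under stated assumptions, not the pipeline used in the cited studies: the function names, the taxon labels and all abundance values are hypothetical, and real analyses additionally require normalization, many more samples and correction for multiple testing.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation computed as Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def rank_candidate_hosts(virus_abundance, host_abundances):
    """Sort candidate host taxa by decreasing correlation with the virus."""
    scores = {name: spearman(virus_abundance, profile)
              for name, profile in host_abundances.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical per-sample abundances (six samples); values are made up.
    virus = [0, 2, 5, 9, 14, 1]
    hosts = {
        "taxon_A": [1, 3, 6, 10, 20, 2],  # rises and falls with the virus
        "taxon_B": [9, 7, 5, 3, 1, 8],    # opposite trend
    }
    for name, rho in rank_candidate_hosts(virus, hosts):
        print(name, round(rho, 3))
```

A high correlation only nominates a candidate host. As the oomycete example shows, co-occurrence does not by itself demonstrate infection, which is why the studies discussed here sought supporting evidence such as putative HGT events.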

Fig. 4: Experimentally verified and computationally predicted host ranges of the Nucleocytoviricota.

Host lineages identified through isolation with the native host, co-cultivation, single-cell sorting and in silico horizontal gene transfer-based predictions are shown. The black outline of coloured boxes indicates that an experimentally verified interaction has also been predicted computationally. Chloroplastida comprises both Streptophytina (this group includes some green algae) and Chlorophyta (this group includes most green algae). Topology of the eukaryotic species and eukaryotic taxonomy tree adapted from ref.40. CroV subfamily, viral subfamily-level clade in the Mimiviridae that contains Cafeteria roenbergensis virus; HaV family, family-level clade in the Algavirales that contains Heterosigma akashiwo virus; TSAR, Telonemia–Stramenopiles–Alveolata–Rhizaria supergroup.

Although sequence-based computational host predictions provide a means to expand the range of putative NCLDV hosts, the approaches have some potential challenges and biases. For example, co-occurrence analysis is dependent on sufficient host genome coverage for detection in metagenome data, and HGT analysis requires the availability of the host genomic sequences. Furthermore, it is difficult to detect ancient HGT from previous hosts. Another limitation to the analysis of the integration of NCLDV genes into host genomes can be the quality of the database used. For example, GVMAGs have been found mis-annotated as bacteria, archaea or eukaryotes in public databases, which hampers the use of automated tools for correct HGT detection82,106. Despite these limitations, expanding the putative host range of metagenome-derived NCLDVs provides a basis for targeted sampling of putative hosts, for the study of virus–host co-evolution and for the identification of virus-encoded functions for targeted modulation of host metabolism. Sequence-based inferences of viruses and their hosts may then be extrapolated to assess the impact of such interactions on global ecosystems.
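The HGT-based host inference discussed in this section can be sketched in a simplified form. The sketch assumes that each viral gene has already been searched against a reference database and that each hit carries a domain label, a lineage label and a percent identity. The `Hit` record, the identity threshold and the minimum number of supporting genes are hypothetical choices and do not reproduce the procedure of the cited studies. Because the outcome depends entirely on the taxonomic labels of the hits, the sketch also illustrates why mis-annotated database entries are a problem.

```python
from collections import Counter
from typing import NamedTuple


class Hit(NamedTuple):
    gene: str        # viral gene identifier (hypothetical)
    domain: str      # "Viruses", "Eukaryota", "Bacteria" or "Archaea"
    lineage: str     # taxonomic label of the matched sequence
    identity: float  # percent amino acid identity of the alignment


def predict_hosts(hits, min_identity=70.0, min_genes=2):
    """Tally eukaryotic lineages supported by candidate recent HGT events.

    A viral gene counts as a candidate when its best non-viral hit is
    eukaryotic and at least min_identity percent identical. A lineage is
    reported when at least min_genes viral genes support it.
    """
    best_nonviral = {}
    for hit in hits:
        if hit.domain == "Viruses":
            continue  # hits to other viruses say nothing about the host
        current = best_nonviral.get(hit.gene)
        if current is None or hit.identity > current.identity:
            best_nonviral[hit.gene] = hit

    support = Counter(
        hit.lineage
        for hit in best_nonviral.values()
        if hit.domain == "Eukaryota" and hit.identity >= min_identity
    )
    return {lineage: n for lineage, n in support.items() if n >= min_genes}


# Hypothetical search results for four genes of one viral genome.
hits = [
    Hit("gene_1", "Eukaryota", "Chrysophyceae", 88.0),
    Hit("gene_1", "Viruses", "Mimiviridae", 95.0),
    Hit("gene_2", "Eukaryota", "Chrysophyceae", 76.5),
    Hit("gene_3", "Bacteria", "Proteobacteria", 81.0),
    Hit("gene_3", "Eukaryota", "Chlorophyta", 60.0),
    Hit("gene_4", "Eukaryota", "Chlorophyta", 45.0),
]

predicted = predict_hosts(hits)
```

A high identity threshold biases such a screen towards recent transfers, which is consistent with the difficulty noted above of detecting ancient HGT from previous hosts.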

From HGT to endogenization

Not only is HGT between viruses and their hosts a common phenomenon but some giant viruses can even integrate their entire genomes into the host chromosome (Fig. 4). This so-called endogenization is a mechanism observed for most eukaryotic viruses107,108. Arrays of NCLDV genes have occasionally been found in genomes of eukaryotes, in particular in algae, plants109,110,111 and amoebae112,113,114. A recent survey of published eukaryotic genomes and transcriptomes revealed the presence of giant virus genes in 66 different eukaryotes, including several Acanthamoeba species, flagellates, ciliates, stramenopiles, oomycetes, fungi, arthropods and diverse unicellular and multicellular algae115 (Fig. 4). Yet, for many of these eukaryotes, giant virus infections have not been observed. The integration of NCLDV genes often appears to be highly host specific, with viral genes detected in one eukaryotic species being unrelated to viral genes found in closely related species115. Among the integrated genes are NCLDV hallmark genes that are, in some instances, scattered throughout the host chromosome and, in others, co-localized in islands composed of more than 100 genes115. The integration of complete viral genomes has been described for some members of the Mesomimiviridae; for example, Ectocarpus siliculosus virus was shown more than 20 years ago to be integrated into the genome of its brown algal host111, likely through the use of integrases116. The related Phaeocystis globosa virus is a lysogenic virus that causes persistent infections117,118, in stark contrast to many other known NCLDV lineages, which were successfully isolated because they lyse their amoeba host5. The analysis of existing algal genomes and transcriptome data revealed other examples of whole giant virus genomes integrated into eukaryotic host chromosomes119. Some regions encoded more than 1,500 viral genes, accounting for up to 10% of the genes of the green algal host119. 
Several of the detected viral genes were annotated as enzymes with roles in carbohydrate metabolism, chromatin remodelling, signal transduction, energy production and translation119.

It remains unknown whether integrated giant viruses are dormant with no or minimal benefit to the host, or whether the host cell benefits from some viral genes that may provide or fine-tune metabolic capabilities. Another unanswered question is whether there are mechanisms encoded in the integrated viral genome that may reactivate infection after transcribing and translating some of the integrated viral genes. This would then be followed by the release of the giant virus genetic material during host replication and effective dispersal to new hosts. If there is no reactivation of viral infection, giant virus genes decay over time, leading to rearrangements and pseudogenization107,112 and making their detection more challenging or impossible. Giant virus endogenization has been found mainly through the analysis of eukaryotic isolate genomes, but we anticipate that genome-resolved metagenomics of eukaryotes will further facilitate the discovery of many additional examples of this phenomenon. Future investigation of the integration of giant virus genes is expected to provide some answers for how endogenization has shaped and continues to shape the evolution and ecology of eukaryotic organisms.

Reprogramming of the host and its impact on host populations

Upon infection, a virus reprogrammes its host cell and turns it into a so-called virocell that supports viral replication120,121. Analogous to bacteriophages122,123, which are viruses (including large ones124) that infect bacteria, giant viruses seem to contribute genes to their hosts to augment and/or modulate the metabolic capabilities of the host cell (Fig. 5). The first described example was a hyaluronan synthase encoded by Chlorella virus that enabled its algal host to synthesize hyaluronan125. In addition, an active potassium channel encoded by Chlorella virus was found to be integrated into the host membrane during infection126. Another example is that of a host-derived nitrogen transporter in Ostreococcus tauri virus that is expressed during the infection of its green algal host127. Experimental characterization provided evidence that this transporter may increase the uptake of nitrogen by the host cell127. Other studies revealed the presence of fermentation genes in the Tetraselmis virus genome with possible implications for host metabolism in nutrient-limited marine systems28. A survey of giant virus isolates and MAGs revealed the widespread presence of genes for cytochrome P450 monooxygenases, potentially enabling or modulating complex metabolic processes such as the synthesis of sterols and fatty acids98. Metagenome-informed experimental characterization of the distinctive cytochrome P450 of hokovirus did not reveal any sterols metabolized by the recombinant viral cytochrome P450 (ref.98). Distant homologues of eukaryotic actins (‘viractins’) and myosins (‘virmyosins’) have been found in NCLDV genomes in two recent studies128,129 and a preprint97, suggesting that these viruses may affect cell structure, motility and intracellular transport processes; however, further functional validation is needed. 
Furthermore, a giant virus related to Mesomimiviridae that infects heterotrophic choanoflagellates was found to encode type 1 rhodopsins together with the pathway for synthesis of the required pigment, β-carotene85. Metagenome-informed experimental characterization of the NCLDV rhodopsin showed that the putative rhodopsin likely functions as a proton pump, generating energy from light85. A phylogenetically distinct NCLDV rhodopsin was found in a GVMAG from Organic Lake, Antarctica, and experimental characterization of this protein revealed that it may function as a light-gated pentameric ion channel, potentially impacting ion homeostasis and phototaxis of the host cell130. Furthermore, through global metagenomics, it was predicted that genes encoding various substrate transport processes, energy generation through light (rhodopsins and genes involved in photosynthesis), carbon fixation and glycolysis are commonly found in GVMAGs affiliated with diverse lineages of the Nucleocytoviricota82,96 (Fig. 5). More detailed phylogenetic analysis revealed that some auxiliary metabolic genes encoding transporters for iron, phosphate, magnesium and ammonium originated in eukaryotic hosts and were likely recently acquired by giant viruses through HGT82,85,96. However, other genes encoding several rhodopsins, succinate dehydrogenase, aconitase and glyceraldehyde 3-phosphate dehydrogenase showed a pattern that suggested a viral origin or a common evolutionary origin in one of the ancestral hosts82,85,96. Taken together, the widespread presence of metabolic genes in diverse NCLDV lineages implies that augmenting host metabolic capacities is likely a strategy more commonly used by NCLDVs than initially assumed. However, the current lack of experimental evidence of the functions and activities of most of these genes and pathways, as well as their effects on the host cell, demands further experimental investigation.

Fig. 5: Predicted metabolic reprogramming of a giant virus-derived virocell and consequences of giant virus infection for host populations.

A hypothetical virocell is shown with a combination of metabolic roles that different giant viruses are predicted to have during host infection based on the presence of auxiliary metabolic genes in giant virus genomes. Darker shades of red denote metabolic roles that are supported by some functional data obtained through experiments, including a Paramecium bursaria chlorella virus 1 (PBCV1)-derived potassium channel126, an Ostreococcus tauri virus-derived ammonium transporter127, the light-driven proton pump encoded by Choanovirus85, and the light-gated ion channel encoded in the metagenome-assembled genome of Organic Lake Phycodnavirus (OLPV) from Antarctica130. Also highlighted is Tetraselmis virus, which encodes fermentation genes. TCA, tricarboxylic acid. (a) Experimental validation has not been performed in the native virus host system. (b) There is currently no experimental evidence for the function of these genes28.

Metabolic reprogramming has direct consequences on host population structure and dynamics. One striking example is the cosmopolitan marine coccolithophore Emiliania huxleyi, which forms massive blooms that play key roles in global carbon and sulfur cycles131. E. huxleyi populations are subject to persistent but ultimately lytic infections by the coccolithovirus Emiliania huxleyi virus24. Once lysis is induced, it leads to the termination of the algal bloom and the deposition of massive amounts of calcite and nutrients into the ocean, which increases the marine pool of dissolved organic matter132,133,134. Importantly, viral infections not only lead to host lysis but also promote viral replication by rewiring host physiology, in particular the turnover of sugars and synthesis of fatty acids and lipids135,136,137. Comparatively little is known about how host populations are impacted by giant viruses that were recovered through genome-resolved metagenomics but, considering the predicted host range of these viruses, it is conceivable that similar principles are omnipresent and are actively shaping the biomes and biogeochemical cycles of Earth.

Giant virus genomes encode hallmark genes of cellular life

Among the most intriguing features found in giant virus genomes are hallmark genes of cellular life such as tRNAs and genes involved in protein biosynthesis138. This phenomenon was first described upon sequencing the mimivirus genome9. Subsequent analyses revealed the phylogenetic placement of virus-encoded cellular genes between bacteria and eukaryotes, suggesting an ancient origin11. Other cellular hallmark genes with similarly deep branching patterns were found in other giant virus genomes and led to the hypotheses that giant viruses may either represent a fourth domain of life13 or be remnants of a highly degraded eukaryotic cell derived by reductive evolution12. The subsequent use of more complex phylogenetic models revealed that many of these genes had most likely been acquired from different eukaryotic hosts139,140,141. Some of these genes might represent ancient transfers from undiscovered eukaryotic hosts. This pattern of gene acquisition from hosts provided evidence for the hypothesis that giant viruses may have evolved from smaller viruses140. Yet, other studies have reported alternative topologies for some housekeeping and other metabolic genes of cellular organisms, including rhodopsins82,85,96 and cytochrome P450 (ref.98). It has also been proposed that such genes may have been transferred from ancestral giant viruses to past eukaryotic hosts, or even to a proto-eukaryote, highlighting a potentially integral role of giant viruses in the evolution of the eukaryotic cell142,143. Furthermore, it is possible that some genes that may function as part of the eukaryotic core metabolism were introduced upon integration of giant virus genetic material into the genome of an ancient eukaryotic cell, further shaping eukaryotic evolution142,144. The presence of genes for aminoacyl-tRNA synthetases (aaRS) and eukaryotic translation factors has been recorded multiple times in newly recovered giant virus genomes. 
Indeed, a nearly complete set of the 20 aaRS has been reported in klosneuvirus from metagenomic data92. Shortly after, two tupanviruses were isolated with genomes that contain a full set of aaRS and tRNAs7, and subsequently the first Klosneuvirinae isolates were described, of which one also contained a complete set of aaRS145. Especially in the Klosneuvirinae, the presence of aaRS with lineage-specific evolutionary histories provided additional support that these genes derived from different eukaryotic hosts92. The presence of genes for a complete set of aaRS is currently restricted to members of the Mimiviridae and information on the role of giant virus aaRS in host interactions is limited; however, some have been experimentally studied and were indeed functional146. There is even some experimental evidence for the potential roles of these genes in making giant viruses less dependent on host machinery, for example, during shutdown of host translation in response to viral infection or other adverse conditions147. On the other hand, a suspected role in enhancing viral translation by providing additional copies of aaRS to support host translation has not yet been confirmed. Additional hallmark genes of cellular life include those encoding the four core histones33,76,148,149 and giant virus genes predicted to be involved in energy generation28,96. A recent study reported an active membrane potential in Pandoravirus massiliensis virions together with the expression of several remote homologues of tricarboxylic acid cycle genes150. Although giant viruses encode functions that were until recently thought to be exclusively present in cellular organisms, there is currently no evidence that they perform protein translation without host-derived ribosomes or host-independent energy generation.

Conclusions

Nearly 20 years of giant virus isolation has yielded viral isolates representing highly diverse lineages. Complementary detailed research on the biology of these viruses has revealed many important details of virion structures and infection strategies. It has become clear that there are stark differences in virion size and structure and, although there are some similarities in how these viruses enter and exit the host cell, most giant viruses employ contrasting strategies for replicating within and exploiting their host cells. Sequencing of viral isolates has led to the discovery of the largest and smallest known genomes of viruses of the Nucleocytoviricota.

Cultivation-independent approaches have accelerated the discovery of genome sequences of new giant viruses and other large viruses in the Nucleocytoviricota, providing novel insights into their phylogenetic diversity and functional potential. Metagenomics also revealed that these viruses can be found nearly anywhere on Earth, are affiliated with diverse eukaryotes and are likely modifying host physiology through metabolic reprogramming, ultimately altering the structure and function of host communities in the environment. At the same time, estimates based on NCLDV hallmark genes in metagenomic datasets indicated that only a small fraction of giant virus genomes have been discovered so far82 and that the diversity of giant viruses may be far greater than that of bacteria, at least in the oceans81. A controlled metagenomic binning experiment where giant viruses were spiked into an environmental sample showed that genome fragments of many giant viruses that are present in a given sample likely remain below the detection limit, highlighting the need for ultra-deep metagenome sequencing151 or targeted isolation efforts52. Furthermore, there is a strong bias towards detecting giant viruses that are similar to those already known, as tools used to identify viruses from metagenomes rely heavily on features observed in sequenced NCLDV genomes such as large sets of conserved genes82,93,96,152,153. However, giant virus genomes exhibit extensive plasticity, such that viruses within the same clade quickly diverge and share very few genes30. A recent stunning example of NCLDV diversity is yaravirus, which was isolated with its native amoeba host154, yet no closely related sequences were detectable in public metagenomic datasets. Its classification was difficult because more than 90% of its genes lack similarity to those in public databases and most viral hallmark genes are absent154, and its placement within the Nucleocytoviricota is still under debate. 
Furthermore, a recent preprint described the genome-resolved metagenomic-based discovery of the Proculviricetes and Mirusviricetes from marine systems, which might be two class-level novel lineages within the Nucleocytoviricota that lack most of the typical viral hallmark genes155. Taken together, the extensive gene novelty of viruses in the Nucleocytoviricota, observed through both cultivation and cultivation-independent methods, further underlines that many giant viruses are likely to be hiding in plain sight.