Widespread endogenization of giant viruses shapes genomes of green algae

Abstract

Endogenous viral elements (EVEs), viruses that have integrated their genomes into those of their hosts, are prevalent in eukaryotes and have an important role in genome evolution [1,2]. The vast majority of EVEs identified to date are small genomic regions comprising a few genes [2], but recent evidence suggests that some large double-stranded DNA viruses may also endogenize into the genome of the host [1]. Nucleocytoplasmic large DNA viruses (NCLDVs) have recently become of great interest owing to their large genomes and complex evolutionary origins [3,4,5,6], but it is not yet known whether they are a prominent component of eukaryotic EVEs. Here we report the widespread endogenization of NCLDVs in diverse green algae; these giant EVEs reached sizes greater than 1 million base pairs and contained up to around 10% of the total open reading frames in some genomes, substantially increasing the scale of known viral genes in eukaryotic genomes. These endogenized elements often shared genes with host genomic loci and contained numerous spliceosomal introns and large duplications, suggesting tight assimilation into host genomes. NCLDVs contain large and mosaic genomes with genes derived from multiple sources, and their endogenization represents an underappreciated conduit of new genetic material into eukaryotic lineages that can substantially impact genome composition.

Fig. 1: Distribution and general features of the GEVEs.
Fig. 2: Signatures of endogenization.
Fig. 3: Evolutionary history of the GEVEs.

Data availability

Nucleotide and protein sequences specific to each of the GEVEs, hallmark gene set used for phylogenetic analyses, alignments for all phylogenies presented, HMM profiles of the core genes and NCVOG families, and other data products are available at: https://zenodo.org/record/3975964#.XzFj0hl7mfZ.

Code availability

A custom bioinformatic pipeline (ViralRecall) was developed in Python 3.5 for this study. The code is publicly available on the Aylward lab GitHub: https://github.com/faylward/viralrecall. For NCLDV marker gene detection, we also used a custom Python script available on GitHub: https://github.com/faylward/ncldv_markersearch. Other bioinformatic analyses in this study used publicly available tools and are described in the Methods.

References

  1. Feschotte, C. & Gilbert, C. Endogenous viruses: insights into viral evolution and impact on host biology. Nat. Rev. Genet. 13, 283–296 (2012).

  2. Holmes, E. C. The evolution of endogenous viral elements. Cell Host Microbe 10, 368–377 (2011).

  3. Fischer, M. G. Giant viruses come of age. Curr. Opin. Microbiol. 31, 50–57 (2016).

  4. Wilhelm, S. W. et al. A student’s guide to giant viruses infecting small eukaryotes: from Acanthamoeba to zooxanthellae. Viruses 9, 46 (2017).

  5. Abergel, C., Legendre, M. & Claverie, J.-M. The rapidly expanding universe of giant viruses: Mimivirus, Pandoravirus, Pithovirus and Mollivirus. FEMS Microbiol. Rev. 39, 779–796 (2015).

  6. Weynberg, K. D., Allen, M. J. & Wilson, W. H. Marine prasinoviruses and their tiny plankton hosts: a review. Viruses 9, 43 (2017).

  7. Bhattacharya, D. & Medlin, A. L. Algal phylogeny and the origin of land plants. Plant Physiol. 116, 9–15 (1998).

  8. Jeanniard, A. et al. Towards defining the chloroviruses: a genomic journey through a genus of large DNA viruses. BMC Genomics 14, 158 (2013).

  9. Moniruzzaman, M., Martinez-Gutierrez, C. A., Weinheimer, A. R. & Aylward, F. O. Dynamic genome evolution and complex virocell metabolism of globally-distributed giant viruses. Nat. Commun. 11, 1710 (2020).

  10. Filée, J. Genomic comparison of closely related giant viruses supports an accordion-like model of evolution. Front. Microbiol. 6, 593 (2015).

  11. Van Etten, J. L. et al. Chloroviruses have a sweet tooth. Viruses 9, 88 (2017).

  12. Schvarcz, C. R. & Steward, G. F. A giant virus infecting green algae encodes key fermentation genes. Virology 518, 423–433 (2018).

  13. Sun, C., Feschotte, C., Wu, Z. & Mueller, R. L. DNA transposons have colonized the genome of the giant virus Pandoravirus salinus. BMC Biol. 13, 38 (2015).

  14. Marcet-Houben, M. & Gabaldón, T. Acquisition of prokaryotic genes by fungal genomes. Trends Genet. 26, 5–8 (2010).

  15. Rossoni, A. W. et al. The genomes of polyextremophilic cyanidiales contain 1% horizontally transferred genes with diverse adaptive functions. eLife 8, e45017 (2019).

  16. Filée, J. Multiple occurrences of giant virus core genes acquired by eukaryotic genomes: the visible part of the iceberg? Virology 466–467, 53–59 (2014).

  17. Maumus, F. & Blanc, G. Study of gene trafficking between Acanthamoeba and giant viruses suggests an undiscovered family of amoeba-infecting viruses. Genome Biol. Evol. 8, 3351–3363 (2016).

  18. Gallot-Lavallée, L. & Blanc, G. A glimpse of nucleo-cytoplasmic large DNA virus biodiversity through the eukaryotic genomics window. Viruses 9, 17 (2017).

  19. Maumus, F., Epert, A., Nogué, F. & Blanc, G. Plant genomes enclose footprints of past infections by giant virus relatives. Nat. Commun. 5, 4268 (2014).

  20. Guglielmini, J., Woo, A. C., Krupovic, M., Forterre, P. & Gaia, M. Diversification of giant and large eukaryotic dsDNA viruses predated the origin of modern eukaryotes. Proc. Natl Acad. Sci. USA 116, 19585–19592 (2019).

  21. Forterre, P. & Gaïa, M. Giant viruses and the origin of modern eukaryotes. Curr. Opin. Microbiol. 31, 44–49 (2016).

  22. Piacente, F., Gaglianone, M., Laugieri, M. E. & Tonetti, M. G. The autonomous glycosylation of large DNA viruses. Int. J. Mol. Sci. 16, 29315–29328 (2015).

  23. Schulz, F. et al. Giant virus diversity and host interactions through global metagenomics. Nature 578, 432–436 (2020).

  24. Abrahão, J. et al. Tailed giant Tupanvirus possesses the most complete translational apparatus of the known virosphere. Nat. Commun. 9, 749 (2018).

  25. Wilson, W. H. et al. Complete genome sequence and lytic phase transcription profile of a Coccolithovirus. Science 309, 1090–1092 (2005).

  26. Roux, S. et al. Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses. Nature 537, 689–693 (2016). 

  27. Koonin, E. V. & Krupovic, M. The depths of virus exaptation. Curr. Opin. Virol. 31, 1–8 (2018).

  28. Ochman, H., Lawrence, J. G. & Groisman, E. A. Lateral gene transfer and the nature of bacterial innovation. Nature 405, 299–304 (2000).

  29. Groisman, E. A. & Ochman, H. Pathogenicity islands: bacterial evolution in quantum leaps. Cell 87, 791–794 (1996).

  30. Martin, W. F. Too much eukaryote LGT. BioEssays 39, 1700115 (2017).

  31. Keeling, P. J. & Palmer, J. D. Horizontal gene transfer in eukaryotic evolution. Nat. Rev. Genet. 9, 605–618 (2008).

  32. Cock, J. M. et al. The Ectocarpus genome and the independent evolution of multicellularity in brown algae. Nature 465, 617–621 (2010).

  33. Delaroque, N., Maier, I., Knippers, R. & Müller, D. G. Persistent virus integration into the genome of its algal host, Ectocarpus siliculosus (Phaeophyceae). J. Gen. Virol. 80, 1367–1370 (1999).

  34. Delaroque, N. & Boland, W. The genome of the brown alga Ectocarpus siliculosus contains a series of viral DNA pieces, suggesting an ancient association with large dsDNA viruses. BMC Evol. Biol. 8, 110 (2008).

  35. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).

  36. Eddy, S. R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).

  37. El-Gebali, S. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 47, D427–D432 (2019).

  38. Yutin, N., Wolf, Y. I., Raoult, D. & Koonin, E. V. Eukaryotic large nucleo-cytoplasmic DNA viruses: clusters of orthologous genes and reconstruction of viral genome evolution. Virol. J. 6, 223 (2009).

  39. Filée, J., Siguier, P. & Chandler, M. I am what I eat and I eat what I am: acquisition of bacterial genes by giant viruses. Trends Genet. 23, 10–15 (2007).

  40. Filée, J., Pouget, N. & Chandler, M. Phylogenetic evidence for extensive lateral acquisition of cellular genes by nucleocytoplasmic large DNA viruses. BMC Evol. Biol. 8, 320 (2008).

  41. Hoff, K. J. & Stanke, M. Predicting genes in single genomes with AUGUSTUS. Curr. Protoc. Bioinformatics 65, e57 (2019).

  42. Stanke, M. & Morgenstern, B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res. 33, W465–W467 (2005).

  43. Gu, Z., Gu, L., Eils, R., Schlesner, M. & Brors, B. circlize implements and enhances circular visualization in R. Bioinformatics 30, 2811–2812 (2014).

  44. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).

  45. Kiełbasa, S. M., Wan, R., Sato, K., Horton, P. & Frith, M. C. Adaptive seeds tame genomic sequence comparison. Genome Res. 21, 487–493 (2011).

  46. Federhen, S. The NCBI Taxonomy database. Nucleic Acids Res. 40, D136–D143 (2012).

  47. Huerta-Cepas, J., Serra, F. & Bork, P. ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Mol. Biol. Evol. 33, 1635–1638 (2016).

  48. Pagès, H., Aboyoun, P., Gentleman, R. & DebRoy, S. Biostrings: efficient manipulation of biological strings. R package version 2.56.0  https://bioconductor.org/packages/Biostrings (2020).

  49. Bao, Z. & Eddy, S. R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 12, 1269–1276 (2002).

  50. Delcher, A. L., Phillippy, A., Carlton, J. & Salzberg, S. L. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 30, 2478–2483 (2002).

  51. Tatusov, R. L., Galperin, M. Y., Natale, D. A. & Koonin, E. V. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 28, 33–36 (2000).

  52. Haft, D. H. et al. TIGRFAMs: a protein family resource for the functional identification of proteins. Nucleic Acids Res. 29, 41–43 (2001).

  53. Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, D309–D314 (2019).

  54. Moniruzzaman, M. et al. Virus–host relationships of marine single-celled eukaryotes resolved from metatranscriptomics. Nat. Commun. 8, 16054 (2017).

  55. Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).

  56. Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011).

  57. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).

  58. Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res. 47, W256–W259 (2019).

  59. Lechner, M. et al. Proteinortho: detection of (co-)orthologs in large-scale analysis. BMC Bioinformatics 12, 124 (2011).

  60. Csardi, G. & Nepusz, T. The igraph software package for complex network research. InterJournal Complex Systems 1695, 1–9 (2006).

  61. Burns, J. A., Paasch, A., Narechania, A. & Kim, E. Comparative genomics of a bacterivorous green alga reveals evolutionary causalities and consequences of phago-mixotrophic mode of nutrition. Genome Biol. Evol. 7, 3047–3061 (2015).

  62. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  63. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  64. Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).

  65. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).

  66. Martinez-Gutierrez, C. A. & Aylward, F. O. Strong purifying selection is associated with genome streamlining in epipelagic Marinimicrobia. Genome Biol. Evol. 11, 2887–2894 (2019).

  67. Huerta-Cepas, J. et al. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44, D286–D293 (2016).

  68. Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).

  69. Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).

Acknowledgements

We thank J. Burns from the Bigelow Laboratory for Ocean Sciences and E. Kim from the American Museum of Natural History for providing access to the RNA sequencing data of C. tetramitiformis. We acknowledge use of the Virginia Tech Advanced Research Computing Center for bioinformatic analyses performed in this study. This work was supported by a Simons Early Career Investigator Award in Marine Microbial Ecology and Evolution (grant no. 620443) and NSF grant IIBR-1918271 to F.O.A.

Author information

Contributions

F.O.A. and M.M. designed the project and wrote the paper. M.M. curated GEVEs, performed gene annotations and phylogenetic analysis. A.R.W. performed the GEVE protein annotations. C.A.M.-G. performed the dN/dS analysis.

Corresponding author

Correspondence to Frank O. Aylward.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Chantal Abergel, Matthew Sullivan and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Workflow for GEVE detection.

Overview of the initial steps to identify virus-like regions in chlorophyte genomes and subsequent steps to curate Giant Endogenous Viral Elements (GEVEs). Steps in the grey box are implemented in the ViralRecall tool; steps outside this box represent additional analyses we performed to validate our findings and further analyse the GEVEs.

Extended Data Fig. 2 General features of additional GEVEs.

Circular genome plots of 6 additional GEVEs (apart from those shown in Fig. 1b) showing NCVOG HMM hits, spliceosomal intron locations, and best LAST hit matches. Black dots atop the outermost track mark the locations of the core genes, while the blue links inside the circles represent duplicated regions. For Chlorella and Tetradesmus obliquus, the grey shading demarcates the location of the integrated GEVE as determined by ViralRecall.

Extended Data Fig. 3 GEVEs have coding potential similar to known giant viruses.

a, Principal component analysis (PCA) of the coding potential of the GEVE genomes, corresponding host genomes and reference giant viruses, based on the presence or absence of nucleocytoplasmic virus orthologous group (NCVOG) proteins in these genomes. The plot demonstrates the similarity in coding content of GEVEs and reference giant viruses, whereas the eukaryotic hosts are distinct in terms of coding potential. Nonviral chlorophyte host chromosomes have a much more scattered distribution owing to the sporadic occurrence and low abundance of some NCVOGs in these genomes (for example, ankyrin repeat proteins and transposons are represented in NCVOGs and are present in the nonviral portion of host chromosomes). Eukaryotic-specific proteins are not included in NCVOGs, so this aspect of the host chlorophyte genomic repertoires is not captured and the host genomes do not cluster tightly. The prcomp() function in R was used to calculate the principal components. b, Bipartite network of 18 GEVEs and 126 reference giant viruses based on shared gene content. The network is constructed by profiling the presence of NCVOGs across all the virus and GEVE genomes represented. Large nodes represent NCLDV or GEVE genomes, smaller nodes represent NCVOG protein families and edges denote gene families represented in different genomes.
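
As a rough illustration of how a bipartite network like the one in panel b can be assembled from presence data, the following Python sketch builds genome-to-family edges and lists the families shared by more than one genome. The genome names, family identifiers and function names are hypothetical placeholders, and this is not the authors' own code.

```python
from collections import defaultdict

def build_bipartite_edges(presence):
    """Return sorted (genome, family) edges from a genome -> set-of-families mapping."""
    return sorted((genome, fam) for genome, fams in presence.items() for fam in fams)

def family_degree(edges):
    """Count how many genome nodes each family node is connected to."""
    degree = defaultdict(int)
    for _genome, family in edges:
        degree[family] += 1
    return dict(degree)

# Hypothetical toy input: which NCVOG-like families were detected in each genome.
presence = {
    "GEVE_A": {"NCVOG0001", "NCVOG0002"},
    "GEVE_B": {"NCVOG0002"},
    "Virus_X": {"NCVOG0001", "NCVOG0002", "NCVOG0003"},
}

edges = build_bipartite_edges(presence)
degree = family_degree(edges)
# Families connected to more than one genome link those genomes in the network.
shared = sorted(fam for fam, d in degree.items() if d > 1)
```

Genomes form one node set and families the other; a family with degree greater than one is what ties two genomes together in the layout.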

Extended Data Fig. 4 Example of gene prediction approach within the GEVEs.

Genes predicted by AUGUSTUS (outer ring, brown) and non-overlapping Prodigal predicted genes (middle ring, green) in the GEVEs within Chlamydomonas eustigma and Tetrabaena socialis are shown as examples. In most cases, Prodigal predicted many genes that were not detected by eukaryotic gene prediction algorithms. Many of the Prodigal predicted genes originally missed by AUGUSTUS have hits to NCVOGs (innermost ring, purple), including NCLDV core genes.
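
A minimal sketch of selecting non-overlapping Prodigal predictions is given below. It assumes that genes are 1-based inclusive (start, end) intervals on the same contig, that any shared base counts as an overlap, and that strand is ignored; the exact criterion used in the study is not stated in this caption, and the coordinates are invented.

```python
def overlaps(a, b):
    """True if closed intervals a=(start, end) and b=(start, end) share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]

def non_overlapping(prodigal, augustus):
    """Keep Prodigal intervals that overlap no AUGUSTUS interval."""
    return [p for p in prodigal if not any(overlaps(p, a) for a in augustus)]

# Hypothetical coordinates on a single contig.
augustus = [(100, 500), (900, 1500)]
prodigal = [(50, 90), (450, 700), (600, 850), (1600, 2000)]

kept = non_overlapping(prodigal, augustus)
```

Here the Prodigal gene at 450-700 is dropped because it overlaps the first AUGUSTUS gene, while the other three are retained.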

Extended Data Fig. 5 Level of duplications and core gene copy numbers in GEVE genomes versus reference giant virus genomes.

The left panel shows duplication level (repeated genomic regions at >90% nucleotide similarity) as estimated using RECON 1.08. The right panel shows copy numbers of NCLDV core genes in each of the GEVEs and reference genomes (see Methods for details).

Extended Data Fig. 6 Signature of relaxed selection in the GEVEs compared to free viruses.

Violin plot representing median dN/dS values of endogenized and free reference giant viruses. Statistical significance of the difference between the dN/dS values of the two groups, according to a non-paired, one-sided Mann–Whitney Wilcoxon test, is denoted by ***P < 0.0001. ‘W’ denotes the Wilcoxon test statistic. For this test, 79 values were GEVE-GEVE comparisons and 775 were comparisons between free viruses. The IDs of the reference genomes used for calculating the dN/dS values are provided in Supplementary Data 6.
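
To make the test concrete, here is a small pure-Python sketch of a one-sided Mann–Whitney comparison. It assumes the alternative hypothesis is that GEVE dN/dS values tend to be larger than those of free viruses, and it uses a plain normal approximation without tie or continuity correction, so the P value is only indicative for small samples. The input values are invented and the function names are illustrative.

```python
import math

def mann_whitney_u(x, y):
    """U statistic for x: number of (xi, yj) pairs with xi > yj, ties counted as 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

def one_sided_p_greater(x, y):
    """Return (U, p) for H1 'x tends to be larger than y' via a normal approximation."""
    n1, n2 = len(x), len(y)
    u = mann_whitney_u(x, y)
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    # Upper-tail probability of a standard normal variable.
    return u, 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical dN/dS values, not taken from the study.
geve = [0.30, 0.45, 0.50, 0.62]
free = [0.05, 0.10, 0.12, 0.20, 0.31]

u_stat, p_value = one_sided_p_greater(geve, free)
```

The U statistic computed this way corresponds to the W value reported by R's wilcox.test for the first sample.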

Extended Data Fig. 7 Expression profiles of GEVE genes.

Selected set of expressed genes in 6 of the GEVEs. For each GEVE, up to 15 genes with the highest expression are shown, with the exception of Tetrabaena socialis GEVE_1, for which all genes with expression coverage >1 are presented. For a given gene, expression is measured as the average read mapping coverage of the CDS(s) in that gene. Genes with putative functions (based on Pfam or COG annotations) are shown in red, while mobile elements are shown in blue.
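
The expression measure described above can be sketched as follows. The sketch assumes a per-base depth list for the contig and 1-based inclusive CDS coordinates, and it takes the length-weighted mean over all CDS bases of a gene; whether the study averaged over bases or over individual CDSs is not specified here, and the data are invented.

```python
def gene_expression(depth, cds_intervals):
    """Mean per-base read depth over all CDS bases of one gene.

    depth: per-base read depths for the contig (index 0 holds position 1).
    cds_intervals: list of 1-based inclusive (start, end) tuples.
    """
    total = 0
    n_bases = 0
    for start, end in cds_intervals:
        segment = depth[start - 1:end]
        total += sum(segment)
        n_bases += len(segment)
    return total / n_bases if n_bases else 0.0

# Hypothetical depths for a 10-bp contig and a gene with two CDS segments.
depth = [0, 0, 2, 2, 4, 4, 0, 0, 6, 6]
cds = [(3, 6), (9, 10)]

expr = gene_expression(depth, cds)
```

A gene would then pass a coverage threshold such as the >1 cutoff mentioned in the caption if `expr > 1`.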

Extended Data Fig. 8 Functional potential coded by the GEVEs.

Functional profiles (EggNOG) of the GEVEs, normalized across all the NOG functional categories except category S (Function unknown). No gene was found in category R (General function prediction only). The numbers of genes with no hits or in category S (Function unknown) are shown in the table on the right.
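
One plausible reading of this normalization is that each category's gene count is divided by the total number of genes assigned to any category other than S. The sketch below follows that assumption, with one category letter per gene; the category list is invented and the function name is illustrative.

```python
from collections import Counter

def functional_profile(categories, exclude=("S",)):
    """Proportion of genes per functional category, ignoring excluded categories."""
    counts = Counter(c for c in categories if c not in exclude)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: n / total for cat, n in sorted(counts.items())}

# Hypothetical category letters for the annotated genes of one GEVE.
genes = ["L", "L", "K", "S", "S", "O", "L"]

profile = functional_profile(genes)
```

Normalizing this way makes profiles comparable between GEVEs that differ in gene count.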

Extended Data Table 1 NCLDV hallmark genes in diverse chlorophyte genomes without GEVEs
Extended Data Table 2 GEVE feature summaries

Supplementary information

Supplementary Information

This file contains the following: a) Supplementary results and discussion with references. b) Supplementary figures with captions describing each figure. c) Supplementary tables with captions describing each table.

Reporting Summary

Supplementary Data

Supplementary Data 1: Information on the genomes analysed in this study. FTP download links are provided for each of the genomes.

Supplementary Data 2: Summary statistics for individual contigs in each of the viral elements (GEVEs) analysed.

Supplementary Data 3: Average amino acid identities (AAI) between each pair of GEVEs.

Supplementary Data 4: Functional annotation for each of the GEVEs obtained using a number of protein family databases. Databases used are: COG, PFam, EggNOG, VOG, TIGR and EggVOG. See ‘Methods’ for references for all these databases.

Supplementary Data 5: Annotation and expression values of the expressed genes in six of the GEVEs. Annotations are only provided for the genes which had hits to different databases (as specified in Supplementary Data 4).

Supplementary Data 6: Genome IDs of the reference NCLDVs that were used to calculate dN/dS values in the Phycodnaviridae and Mimiviridae group. The reference genomes can be accessed from the study cited in the ‘Calculation of dN/dS ratios’ sub-section in the ‘Methods’.

About this article

Cite this article

Moniruzzaman, M., Weinheimer, A.R., Martinez-Gutierrez, C.A. et al. Widespread endogenization of giant viruses shapes genomes of green algae. Nature 588, 141–145 (2020). https://doi.org/10.1038/s41586-020-2924-2

  • DOI: https://doi.org/10.1038/s41586-020-2924-2
