Histones and associated chromatin proteins have essential functions in eukaryotic genome organization and regulation. Despite this fundamental role in eukaryotic cell biology, we lack a phylogenetically comprehensive understanding of chromatin evolution. Here, we combine comparative proteomics and genomics analysis of chromatin in eukaryotes and archaea. Proteomics uncovers the existence of histone post-translational modifications in archaea. However, archaeal histone modifications are scarce, in contrast with the highly conserved and abundant marks we identify across eukaryotes. Phylogenetic analysis reveals that chromatin-associated catalytic functions (for example, methyltransferases) have pre-eukaryotic origins, whereas histone mark readers and chaperones are eukaryotic innovations. We show that further chromatin evolution is characterized by expansion of readers, including capture by transposable elements and viruses. Overall, our study infers detailed evolutionary history of eukaryotic chromatin: from its archaeal roots, through the emergence of nucleosome-based regulation in the eukaryotic ancestor, to the diversification of chromatin regulators and their hijacking by genomic parasites.
This is a preview of subscription content
Subscribe to Nature+
Get immediate online access to the entire Nature family of 50+ journals
Subscribe to Journal
Get full journal access for 1 year
only $9.92 per issue
All prices are NET prices.
VAT will be added later in the checkout.
Tax calculation will be finalised during checkout.
Get time limited or full article access on ReadCube.
All prices are NET prices.
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD031991.
Code for reproducing the analysis is available in our laboratory Github repository (https://github.com/sebepedroslab/chromatin-evolution-analysis).
Struhl, K. Fundamentally different logic of gene regulation in eukaryotes and prokaryotes. Cell 98, 1–4 (1999).
Kornberg, R. D. & Lorch, Y. Primary role of the nucleosome. Mol. Cell 79, 371–375 (2020).
Jenuwein, T. & Allis, C. D. Translating the histone code. Science 293, 1074–1080 (2001).
Berger, S. L. The complex language of chromatin regulation during transcription. Nature 447, 407–412 (2007).
Banaszynski, L. A., Allis, C. D. & Lewis, P. W. Histone variants in metazoan development. Dev. Cell 19, 662–674 (2010).
Allis, C. D. & Jenuwein, T. The molecular hallmarks of epigenetic control. Nat. Rev. Genet. 17, 487–500 (2016).
Sultana, T. et al. The landscape of L1 retrotransposons in the human genome is shaped by pre-insertion sequence biases and post-insertion selection. Mol. Cell 74, 555–570 (2019).
Gangadharan, S., Mularoni, L., Fain-Thornton, J., Wheelan, S. J. & Craig, N. L. DNA transposon Hermes inserts into DNA in nucleosome-free regions in vivo. Proc. Natl Acad. Sci. USA 107, 21966–21972 (2010).
Shinn, P. et al. HIV-1 integration in the human genome favors active genes and local hotspots. Cell 110, 521–529 (2002).
Goodier, J. L. Restricting retrotransposons: a review. Mob. DNA 7, 16 (2016).
Molaro, A. & Malik, H. S. Hide and seek: how chromatin-based pathways silence retroelements in the mammalian germline. Curr. Opin. Genet. Dev. 37, 51–58 (2016).
Malik, H. S. & Henikoff, S. Phylogenomics of the nucleosome. Nat. Struct. Biol. 10, 882–891 (2003).
Talbert, P. B. & Henikoff, S. Histone variants—ancient wrap artists of the epigenome. Nat. Rev. Mol. Cell Biol. 11, 264–275 (2010).
Soboleva, T. A., Nekrasov, M., Ryan, D. P. & Tremethick, D. J. Histone variants at the transcription start-site. Trends Genet. 30, 199–209 (2014).
Zink, L.-M. & Hake, S. B. Histone variants: nuclear function and disease. Curr. Opin. Genet. Dev. 37, 82–89 (2016).
Weber, C. M. & Henikoff, S. Histone variants: dynamic punctuation in transcription. Genes Dev. 28, 672–682 (2014).
Borg, M., Jiang, D. & Berger, F. Histone variants take center stage in shaping the epigenome. Curr. Opin. Plant Biol. 61, 101991 (2021).
Zentner, G. E. & Henikoff, S. Regulation of nucleosome dynamics by histone modifications. Nat. Struct. Mol. Biol. 20, 259–266 (2013).
Campos, E. I. & Reinberg, D. Histones: annotating chromatin. Annu. Rev. Genet. 43, 559–599 (2009).
Strahl, B. D. & Allis, C. D. The language of covalent histone modifications. Nature 403, 41–45 (2000).
Bannister, A. J. & Kouzarides, T. Regulation of chromatin by histone modifications. Cell Res. 21, 381–395 (2011).
Talbert, P. B. & Henikoff, S. The yin and yang of histone marks in transcription. Annu. Rev. Genom. Hum. Genet. 22, 147–170 (2021).
Taverna, S. D., Li, H., Ruthenburg, A. J., Allis, C. D. & Patel, D. J. How chromatin-binding modules interpret histone modifications: lessons from professional pocket pickers. Nat. Struct. Mol. Biol. 14, 1025–1040 (2007).
Musselman, C. A., Lalonde, M.-E., Côté, J. & Kutateladze, T. G. Perceiving the epigenetic landscape through histone readers. Nat. Struct. Mol. Biol. 19, 1218–1227 (2012).
Gurard-Levin, Z. A., Quivy, J.-P. & Almouzni, G. Histone chaperones: assisting histone traffic and nucleosome dynamics. Annu. Rev. Biochem. 83, 487–517 (2014).
Burgess, R. J. & Zhang, Z. Histone chaperones in nucleosome assembly and human disease. Nat. Struct. Mol. Biol. 20, 14–22 (2013).
Koster, M. J. E., Snel, B. & Timmers, H. T. M. Genesis of chromatin and transcription dynamics in the origin of species. Cell 161, 724–736 (2015).
Hargreaves, D. C. & Crabtree, G. R. ATP-dependent chromatin remodeling: genetics, genomics and mechanisms. Cell Res. 21, 396–420 (2011).
Gornik, S. G. et al. Loss of nucleosomal DNA condensation coincides with appearance of a novel nuclear protein in dinoflagellates. Curr. Biol. 22, 2303–2312 (2012).
Mattiroli, F. et al. Structure of histone-based chromatin in Archaea. Science 357, 609–612 (2017).
Warnecke, T., Becker, E. A., Facciotti, M. T., Nislow, C. & Lehner, B. Conserved substitution patterns around nucleosome footprints in eukaryotes and Archaea derive from frequent nucleosome repositioning through evolution. PLoS Comput. Biol. 9, e1003373 (2013).
Ammar, R. et al. Chromatin is an ancient innovation conserved between Archaea and Eukarya. eLife 1, e00078 (2012).
Rojec, M., Hocher, A., Merkenschlager, M. & Warnecke, T. Chromatinization of Escherichia coli with archaeal histones.eLife 8, e49038 (2019).
Forbes, A. J. et al. Targeted analysis and discovery of posttranslational modifications in proteins from methanogenic archaea by top-down MS. Proc. Natl Acad. Sci. USA 101, 2678–2683 (2004).
Weidenbach, K. et al. Deletion of the archaeal histone in Methanosarcina mazei Gö1 results in reduced growth and genomic transcription. Mol. Microbiol. 67, 662–671 (2008).
Talbert, P. B., Meers, M. P. & Henikoff, S. Old cogs, new tricks: the evolution of gene expression in a chromatin context. Nat. Rev. Genet. 20, 283–297 (2019).
de Mendoza, A. & Sebe-Pedros, A. Origin and evolution of eukaryotic transcription factors. Curr. Opin. Genet. Dev. 59, 25–32 (2019).
Schwaiger, M. et al. Evolutionary conservation of the eumetazoan gene regulatory landscape. Genome Res. 24, 639–650 (2014).
Sebé-Pedrós, A. et al. Early metazoan cell type diversity and the evolution of multicellular gene regulation. Nat. Ecol. Evol. 2, 1176–1188 (2018).
Connolly, L. R., Smith, K. M. & Freitag, M. The Fusarium graminearum histone H3 K27 methyltransferase KMT6 regulates development and expression of secondary metabolite gene clusters. PLoS Genet. 9, e1003916 (2013).
Jamieson, K., Rountree, M. R., Lewis, Z. A., Stajich, J. E. & Selker, E. U. Regional control of histone H3 lysine 27 methylation in Neurospora. Proc. Natl Acad. Sci. USA 110, 6027–6032 (2013).
Sebé-Pedrós, A. et al. The dynamic regulatory genome of Capsaspora and the origin of animal multicellularity. Cell 165, 1224–1237 (2016).
Bourdareau, S. et al. Histone modifications during the life cycle of the brown alga Ectocarpus. Genome Biol. 22, 12 (2021).
Wang, S. Y. et al. Role of epigenetics in unicellular to multicellular transition in Dictyostelium. Genome Biol. 22, 134 (2021).
Taverna, S. D., Coyne, R. S. & Allis, C. D. Methylation of histone H3 at lysine 9 targets programmed DNA elimination in Tetrahymena. Cell 110, 701–711 (2002).
Garcia, B. A. et al. Organismal differences in post-translational modifications in histones H3 and H4. J. Biol. Chem. 282, 7641–7655 (2007).
Drinnenberg, I. A. et al. EvoChromo: towards a synthesis of chromatin biology and evolution. Development 146, dev178962 (2019).
Draizen, E. J. et al. HistoneDB 2.0: a histone database with variants—an integrated resource to explore histones and their variants. Database 2016, baw014 (2016).
Maile, T. M. et al. Mass spectrometric quantification of histone post-translational modifications by a hybrid chemical labeling method. Mol. Cell. Proteom. 14, 1148–1158 (2015).
Li, B., Carey, M. & Workman, J. L. The role of chromatin during transcription. Cell 128, 707–719 (2007).
Rajagopal, N. et al. Distinct and predictive histone lysine acetylation patterns at promoters, enhancers, and gene bodies. Genes Genomes Genet. 4, 2051–2063 (2014).
Koonin, E. V. & Yutin, N. The dispersed archaeal eukaryome and the complex archaeal ancestor of eukaryotes. Cold Spring Harb. Perspect. Biol. 6, a016188–a016188 (2014).
Sandman, K. & Reeve, J. N. Archaeal histones and the origin of the histone fold. Curr. Opin. Microbiol. 9, 520–525 (2006).
Pereira, S. L., Grayling, R. A., Lurz, R. & Reeve, J. N. Archaeal nucleosomes. Proc. Natl Acad. Sci. USA 94, 12633–12637 (1997).
Henneman, B., van Emmerik, C., van Ingen, H. & Dame, R. T. Structure and function of archaeal histones. PLoS Genet. 14, e1007582 (2018).
Imachi, H. et al. Isolation of an archaeon at the prokaryote-eukaryote interface. Nature 577, 519–525 (2020).
Spang, A. et al. Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature 521, 173–179 (2015).
Zaremba-Niedzwiedzka, K. et al. Asgard archaea illuminate the origin of eukaryotic cellular complexity. Nature 541, 353–358 (2017).
Da Cunha, V., Gaia, M., Nasir, A. & Forterre, P. Asgard archaea do not close the debate about the universal tree of life topology. PLoS Genet. 14, e1007215 (2018).
Alva, V. & Lupas, A. N. Histones predate the split between bacteria and archaea. Bioinformatics 35, 2349–2353 (2019).
Allis, C. D. et al. New nomenclature for chromatin-modifying enzymes. Cell 131, 633–636 (2007).
Wu, F. et al. Unique mobile elements and scalable gene flow at the prokaryote–eukaryote boundary revealed by circularized Asgard archaea genomes. Nat. Microbiol. 7, 200–212 (2022).
Schuettengruber, B., Bourbon, H.-M., Di Croce, L. & Cavalli, G. Genome regulation by polycomb and trithorax: 70 years and counting. Cell 171, 34–57 (2017).
Dion, M. F., Altschuler, S. J., Wu, L. F. & Rando, O. J. Genomic characterization reveals a simple histone H4 acetylation code. Proc. Natl Acad. Sci. USA 102, 5501–5506 (2005).
de Jong, J. et al. Chromatin landscapes of retroviral and transposon integration profiles. PLoS Genet. 10, e1004250 (2014).
Sultana, T., Zamborlini, A., Cristofari, G. & Lesage, P. Integration site selection by retroviruses and transposable elements in eukaryotes. Nat. Rev. Genet. 18, 292–308 (2017).
Gao, X., Hou, Y., Ebina, H., Levin, H. L. & Voytas, D. F. Chromodomains direct integration of retrotransposons to heterochromatin. Genome Res. 18, 359–369 (2008).
Cosby, R. L. et al. Recurrent evolution of vertebrate transcription factors by transposase capture. Science 371, eabc6405 (2021).
Cordaux, R., Udit, S., Batzer, M. A. & Feschotte, C. Birth of a chimeric primate gene by capture of the transposase gene from a mobile element. Proc. Natl Acad. Sci. USA 103, 8101–8106 (2006).
Fiedler, M. et al. Decoding of methylated histone H3 tail by the Pygo-BCL9 Wnt signaling complex. Mol. Cell 30, 507–518 (2008).
Erives, A. J. Phylogenetic analysis of the core histone doublet and DNA topo II genes of Marseilleviridae: evidence of proto-eukaryotic provenance. Epigenetics Chromatin 10, 55 (2017).
Liu, Y. et al. Virus-encoded histone doublets are essential and form nucleosome-like structures. Cell 184, 4237–4250 (2021).
Valencia-Sánchez, M. I. et al. The structure of a virus-encoded nucleosome. Nat. Struct. Mol. Biol. 28, 413–417 (2021).
Iyer, L. M., Balaji, S., Koonin, E. V. & Aravind, L. Evolutionary genomics of nucleo-cytoplasmic large DNA viruses. Virus Res. 117, 156–184 (2006).
Nagamine, T. Apoptotic arms races in insect-baculovirus coevolution. Physiol. Entomol. 47, 1–10 (2021).
Starrett, G. J. et al. Adintoviruses: a proposed animal-tropic family of midsize eukaryotic linear dsDNA (MELD) viruses. Virus Evol. 7, veaa055 (2021).
Hocher, A. et al. Growth temperature is the principal driver of chromatinization in archaea. Preprint at bioRxiv https://doi.org/10.1101/2021.07.08.451601 (2021).
Alpha-Bazin, B. et al. Lysine-specific acetylated proteome from the archaeon Thermococcus gammatolerans reveals the presence of acetylated histones. J. Proteom. 232, 104044 (2021).
Eme, L., Spang, A., Lombard, J., Stairs, C. W. & Ettema, T. J. G. Archaea and the origin of eukaryotes. Nat. Rev. Microbiol. 15, 711–723 (2017).
Akıl, C. & Robinson, R. C. Genomes of Asgard archaea encode profilins that regulate actin. Nature 562, 439–443 (2018).
Koonin, E. V. The origin and early evolution of eukaryotes in the light of phylogenomics. Genome Biol 11, 209 (2010).
Sebé-Pedrós, A., Grau-Bové, X., Richards, T. A. & Ruiz-Trillo, I. Evolution and classification of myosins, a paneukaryotic whole-genome approach. Genome Biol. Evol. 6, 290–305 (2014).
Richards, T. A. & Cavalier-Smith, T. Myosin domain evolution and the primary divergence of eukaryotes. Nature 436, 1113–1118 (2005).
Wickstead, B., Gull, K. & Richards, T. Patterns of kinesin evolution reveal a complex ancestral eukaryote with a multifunctional cytoskeleton. BMC Evol. Biol. 10, 110 (2010).
Dacks, J. B. & Field, M. C. Evolution of the eukaryotic membrane-trafficking system: origin, tempo and mode. J. Cell Sci. 120, 2977–2985 (2007).
Collins, L. & Penny, D. Complex spliceosomal organization ancestral to extant eukaryotes. Mol. Biol. Evol. 22, 1053–1066 (2005).
Grau-Bové, X., Sebé-Pedrós, A. & Ruiz-Trillo, I. The eukaryotic ancestor had a complex ubiquitin signaling system of archaeal origin. Mol. Biol. Evol. 32, 726–739 (2015).
Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Ho, J. W. K. et al. Comparative analysis of metazoan chromatin organization. Nature 512, 449–452 (2014).
Montgomery, S. A. et al. Chromatin organization in early land plants reveals an ancestral association between H3K27me3, transposons, and constitutive heterochromatin. Curr. Biol. 30, 573–588 (2020).
Frapporti, A. et al. The Polycomb protein Ezl1 mediates H3K9 and H3K27 methylation to repress transposable elements in Paramecium. Nat. Commun. 10, 2710 (2019).
Lennartsson, A. & Ekwall, K. Histone modification patterns and epigenetic codes. Biochim. Biophys. Acta 1790, 863–868 (2009).
Peterson, C. L. & Laniel, M.-A. Histones and histone modifications. Curr. Biol. 14, R546–R551 (2004).
Rando, O. J. Combinatorial complexity in chromatin structure and function: revisiting the histone code. Curr. Opin. Genet. Dev. 22, 148–155 (2012).
de Mendoza, A., Pflueger, J. & Lister, R. Capture of a functionally active methyl-CpG binding domain by an arthropod retrotransposon family. Genome Res. 29, 1277–1286 (2019).
De Mendoza, A. et al. Recurrent acquisition of cytosine methyltransferases into eukaryotic retrotransposons. Nat. Commun. 9, 1341 (2018).
Ji, X. et al. Chromatin proteomic profiling reveals novel proteins associated with histone-marked genomic regions. Proc. Natl Acad. Sci. USA 112, 3841–3846 (2015).
Wierer, M. & Mann, M. Proteomics to study DNA-bound and chromatin-associated gene regulatory complexes. Hum. Mol. Genet. 25, R106–R114 (2016).
Villaseñor, R. et al. ChromID identifies the protein interactome at chromatin marks. Nat. Biotechnol. 38, 728–736 (2020).
Stieglmeier, M. et al. Nitrososphaera viennensis gen. nov., sp. nov., an aerobic and mesophilic, ammonia-oxidizing archaeon from soil and a member of the archaeal phylum Thaumarchaeota. Int. J. Syst. Evol. Microbiol. 64, 2738–2752 (2014).
Tirichine, L. et al. Histone extraction protocol from the two model diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana. Mar. Genomics 13, 21–25 (2014).
Shechter, D., Dormann, H. L., Allis, C. D. & Hake, S. B. Extraction, purification and analysis of histones. Nat. Protoc. 2, 1445–1457 (2007).
Perkins, D. N., Pappin, D. J. C., Creasy, D. M. & Cottrell, J. S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567 (1999).
Taus, T. et al. Universal and confident phosphorylation site localization using phosphoRS. J. Proteome Res. 10, 5354–5362 (2011).
Vizcaíno, J. A. et al. 2016 update of the PRIDE database and its related tools. Nucleic Acids Res. 44, D447–D456 (2016).
Hagberg, A. A., Schult, D. A. & Swart, P. J. Exploring network structure, dynamics, and function using NetworkX. In Proc. 7th Python in Science Conference (eds Varoquaux, G. et al.) 11–15 (Python in Science Conference, 2008).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Veluchamy, A. et al. An integrative analysis of post-translational histone modifications in the marine diatom Phaeodactylum tricornutum. Genome Biol. 16, 102 (2015).
Ren, Q. & Gorovsky, M. A. Histone H2A.Z acetylation modulates an essential charge patch. Mol. Cell 7, 1329–1335 (2001).
Allis, C. D. et al. hv1 is an evolutionarily conserved H2A variant that is preferentially associated with active genes. J. Biol. Chem. 261, 1941–1948 (1986).
Fusauchi, Y. & Iwai, K. Tetrahymena histone H2A. Acetylation in the N-terminal sequence and phosphorylation in the C-terminal sequence. J. Biochem. 95, 147–154 (1984).
Xiong, L., Adhvaryu, K. K., Selker, E. U. & Wang, Y. Mapping of lysine methylation and acetylation in core histones of Neurospora crassa. Biochemistry 49, 5236–5243 (2010).
Zhang, K., Sridhar, V. V., Zhu, J., Kapoor, A. & Zhu, J. K. Distinctive core histone post-translational modification patterns in Arabidopsis thaliana. PLoS ONE 2, e1210 (2007).
Johnson, L. et al. Mass spectrometry analysis of Arabidopsis histone H3 reveals distinct combinations of post-translational modifications. Nucleic Acids Res. 32, 6511–6518 (2004).
Bergmüller, E., Gehrig, P. M. & Gruissem, W. Characterization of post-translational modifications of histone H2B-variants isolated from Arabidopsis thaliana. J. Proteome Res. 6, 3655–3668 (2007).
Beck, H. C. et al. Quantitative proteomic analysis of post-translational modifications of human histones. Mol. Cell. Proteom. 5, 1314–1325 (2006).
Goudarzi, A. et al. Dynamic competing histone H4 K5K8 acetylation and butyrylation are hallmarks of highly active gene promoters. Mol. Cell 62, 169–180 (2016).
Hake, S. B. et al. Expression patterns and post-translational modifications associated with mammalian histone H3 variants. J. Biol. Chem. 281, 559–568 (2006).
Tan, M. et al. Identification of 67 histone marks and histone lysine crotonylation as a new type of histone modification. Cell 146, 1016–1028 (2011).
Moniruzzaman, M., Martinez-Gutierrez, C. A., Weinheimer, A. R. & Aylward, F. O. Dynamic genome evolution and complex virocell metabolism of globally-distributed giant viruses. Nat. Commun. 11, 2020 (1710).
Punta, M. et al. The Pfam protein families database. Nucleic Acids Res. 40, D290–D301 (2012).
Eddy, S. R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Enright, A. J., Van Dongen, S. & Ouzounis, C. A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002).
Steenwyk, J. L., Buida, T. J., Li, Y., Shen, X.-X. & Rokas, A. ClipKIT: a multiple sequence alignment trimming software for accurate phylogenomic inference. PLoS Biol. 18, e3001007 (2020).
Minh, B. Q., Nguyen, M. A. T. & von Haeseler, A. Ultrafast approximation for phylogenetic bootstrap. Mol. Biol. Evol. 30, 1188–1195 (2013).
Grau-Bové, X. & Sebé-Pedrós, A. Orthology clusters from gene trees with Possvm. Mol. Biol. Evol. 38, 5204–5208 (2021).
Huerta-Cepas, J., Dopazo, H., Dopazo, J. & Gabaldón, T. The human phylome. Genome Biol. 8, R109 (2007).
Huerta-Cepas, J., Serra, F. & Bork, P. ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Mol. Biol. Evol. 33, 1635–1638 (2016).
Csűrös, M. & Miklós, I. in Research in Computational Molecular Biology (eds Apostolico, A. et al.) 206–220 (Springer, 2006).
Csurös, M. Count: evolutionary analysis of phylogenetic profiles with parsimony and likelihood. Bioinformatics 26, 1910–1912 (2010).
Gansner, E. R. & North, S. C. An open graph visualization system and its applications to software engineering. Softw. Pract. Exp. 30, 1203–1233 (2000).
Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
Jombart, T., Balloux, F. & Dray, S. adephylo: new tools for investigating the phylogenetic signal in biological traits. Bioinformatics 26, 1907–1909 (2010).
Wells, J. N. & Feschotte, C. A field guide to eukaryotic transposable elements. Annu. Rev. Genet. 54, 539–561 (2020).
Storer, J., Hubley, R., Rosen, J., Wheeler, T. J. & Smit, A. F. The Dfam community resource of transposable element families, sequence models, and genome annotations. Mob. DNA 12, 2 (2021).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinf. 10, 421 (2009).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Anisimova, M., Gil, M., Dufayard, J.-F., Dessimoz, C. & Gascuel, O. Survey of branch support methods demonstrates accuracy, power, and robustness of fast likelihood-based approximation schemes. Syst. Biol. 60, 685–699 (2011).
Bodenhofer, U., Bonatesta, E., Horejš-Kainrath, C. & Hochreiter, S. msa: an R package for multiple sequence alignment. Bioinformatics 31, 3997–3999 (2015).
We thank A. de Mendoza for critical input on the analysis of TE fusions. We also thank J. Casacuberta for P. patens samples, H. J. G. Meijer for P. infestans samples, M. Adamska for S. ciliatum samples and A. Simpson for access to the G. okellyi culture (made possible by his funding from NSERC, Canada). Research in the A.S.-P. group was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Programme (grant agreement no. 851647) and the Spanish Ministry of Science and Innovation (PGC2018-098210-A-I00). We also acknowledge support of the Spanish Ministry of Science and Innovation to the EMBL partnership, the Centro de Excelencia Severo Ochoa and the CERCA Programme (Generalitat de Catalunya). C.N. is supported by an FPI PhD fellowship from the Spanish Ministry of Economy, Industry and Competitiveness (MEIC). X.G.-B. is supported by a Juan de la Cierva fellowship (FJC2018-036282-I) from MEIC. I.R.-T. was supported by a European Research Council (grant no. 616960). B.F.L. was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC; RGPIN-2017-05411) and by the ‘Fonds de Recherche Nature et Technologie’, Quebec. P.L.-G. and D.M. were supported by a Moore and Simons foundations grant (GBMF9739) and by European Research Council advanced grants (322669, 787904). Research in the C.S. group was supported by the ERC through project TACKLE (advanced grant no. 695192).
The authors declare no competing interests.
Peer review information
Nature Ecology & Evolution thanks Paul Talbert, Michael Borg and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
a, Primary and secondary alignments of histone-fold containing proteins classified as canonical H2A, H2B, H3 and H4, based on identity to reference sequences in HistoneDB. Pie plots represent the number of alignments to HistoneDB-annotated sequences, for the entire dataset (prokaryotic, eukary-otic and viral sequences, large pie plots in the inset) and the eukaryotic subset (smaller plots in the inset). For those proteins that align to more than one canonical histone or major variant (macroH2A, H2A.Z or cenH3), the scatter plots represent the relative identity between the primary (horizontal axis) and secondary alignment(s) (vertical axis). b, Aggregated counts of histone gene pairs, classified ac-cording to histone type and orientation. c, Presence of histone variants (left) and number of collinear pairs of histone-encoding genes (right) per species, classified according to their histone types and rela-tive orientation (head-to-head, hh; head-to-tail, ht; and tail-to-tail, tt). Source data available in Supple-mentary Data 2. Histone variant classification is based on the highest-scoring HMM profile from His-toneDB. Asterisks colors in the macroH2A column indicate species where histone-less Macro do-mains orthologous to the macroH2A genes are found (see panel d). Lighter colors in the variant classi-fication indicate ambiguously classified histones (i.e. cases in which the highest-scoring HMM profile exhibited a low bitscore, defined as a probability below 0.05 in the profile-wise distribution function of scaled bitscores; or cases in which the first-to-second ratio between high scoring profiles was below 1.01). d, Alignments of putatively conserved histone N-tails in archaea. Conserved amino-acids are color-coded according to chemical properties. Dots next to species names are color-coded according to taxonomy (same as Fig. 2c). e, Phylogenetic analysis of the Macro motif of macroH2A histones across eukaryotes, highlighting the macroH2A ortholog group (green), and, within this group, Macro-containing genes lacking histone domains (orange), and their protein domain architectures.
a, Proteomics detection coverage (% of amino acids), number of hPTMs and number of hPTMs per covered position, for the best-covered histone in each species in our proteomics survey. b, Number of samples in which each histone-matching peptide with post-translational modifications (peptide spectral matches defined by Proteome Discoverer) has been identified, per species. For each species, we report the percentage of modified peptides found in more than one replicate. c, Number of samples in which histone-matching modified peptide has been identified, across all the samples from this study. The tree pie charts represent these distributions for all hPTMs, acetylations, and methylations. d, Evidence of hPTM conservation in the major histone variants H2A.Z and macroH2A (conserved positions only), as well as any position in the linker histones H1.
a-c, Number of taxa within each lineage that contain chromatin-associated genes, for archaeal, bacterial (per phyla) or viral (per family) genomes. Numbers indicate the exact number of taxa. d, Number of genes encoding core domains that define chromatin-associated gene families per eukaryotic genome/transcriptome. Numbers indicate exact number of proteins.
a, Species tree of eukaryotes used in the ancestral reconstruction analysis, with branch lengths calibrated to the gain/loss rates of Pfam domains (see Methods). Available in Supplementary Table 1. b, Conservation of archetypical protein domain architectures across orthogroups, in acetylases, deacetylases, methyltransferases, demethylases, remodellers and chaperones. In each heatmap, we indicate the fraction of genes within an orthogroup (rows) that contain a specific protein domain (columns). Domains in bold are catalytic (black) or reader (purple) functions. At the right of each heatmap, we summarize the presence/absence profile of each orthogroup across eukaryotic lineages (as listed in Fig. 1a).
a, Pie plot representing the number of genes classified as part of the catalytic (acetylases, deacetylases, methyltransferases, demethylases, remodellers or chaperones) or reader families or as both. The barplot at the right shows the most common reader domains in genes classified with both reader and catalytic functions. b, Pie plot representing the number of reader domain-encoding genes classified according to whether they contain one type of reader domain (for example, PHD) or more than one (for example, PHD + PWWP). The barplot at the right shows the most common combinations of reader domains among genes with multiple reader domains. c, Summary of gene family gains per reader family, with example cases highlighted in selected nodes. Node size is proportional to number of gains at 90% probability.
a, Number of candidate fusion genes classified by the level of gene model validation evidence, based on contiguity of the gene model over the genome assembly (that is lack of poly-N stretches in the genomic region between the TE- and chromatin-associated domains), evidence of expression, and evidence of contiguous expression (see inset at the right). b, Summary of candidate gene fusions within each chromatin-associated gene family, divided by gene family. For each gene, we indicate their similarity to known TE families, presence of TE-associated domains, the evidence of gene model validity, and information on their gene structure (whether they are monoexonic or are located in clusters with other fusion genes). Source data available in Supplementary Table 6. c, Number of species with at least one valid fusion, divided by gene family. d, Mapping positions of RNA-seq reads supporting candidate gene-transposon fusions (selected examples from Fig. 5e). For each fusion, we show reads spanning the region along the spliced transcript that fully covers the transposon-associated domains (highlighted in green), the chromatin-associated domains, and the inter-domain region. Uninterrupted stretches of mapped positions between domains indicate the validity of a domain co-occurrence. For clarity purposes, reads mapping entirely within a single domain have been excluded from this visualization.
a-c, Selected gene trees highlighting examples of eukaryotic- and prokaryotic-like viral homologues. d, Number of viral genes of each chromatin-associated gene family, classified according to their closest neighbours from cellular clades in gene tree analyses based on phylogenetic affinity scores (see Methods). Within each gene family, viral sequences are classified according to their PFAM domain architecture – the most common architecture being single domain in most gene families except for remodellers and BIR readers. e, Id. but classifying viral genes according to their phylogenetic affinity to eukaryotic orthology groups. Source data available in Supplementary Table 6.
Taxon sampling. a, List of eukaryotic species used in the comparative genomic analyses, including species abbreviations, data sources for genome or transcriptome assemblies and annotations and their taxonomic classification. b, List of gene expression datasets (SRA accession numbers) used for gene model validation analyses of candidate fusion genes. c, List of histone post-translational modification proteomics datasets used in this study (PRIDE accession numbers).
Histone clusters and classification. a, Pairs of collinear histone-encoding genes, including their genomic coordinates and relative orientation. b, List and sequences of archaeal HMfB histones with N-terminal tails (at least ten amino acids before a complete globular domain). c, Classification of histone variants across eukaryotes.
hPTM conservation. a–g, Table of hPTMs identified in histones of the 26 eukaryotic species used in the comparative proteomics analysis, separated by histone type (canonical and major variants: H2A, H2B, H3, H4, macroH2A, H2A.Z and H1). Each entry corresponds to a modified peptide, for which we specify modification coordinates along the peptide and relative to the consensus histone sequence (if available). We also indicate whether each peptide can be uniquely mapped to a conserved or non-conserved region in a canonical histone or to specific histone variants. These tables also include entries for hPTMs reported in the literature (indicated as a cited source or as a specific UNIPROT entry; see Methods for a list of sources); in these cases, source peptides and associated data may not be available. h, hPTMs in archaea.
Gene family analysis. a, List of gene classes analysed in the comparative genomics analyses, including the PFAM protein domains used to retrieve homologues and search parameters. b, List of transposon-associated PFAM domains surveyed in the analyses of transposon–chromatin gene fusions.
Evolution of the chromatin machinery in eukaryotes. a, Summary of gene family evolutionary patterns in eukaryotes (n = 1,713 orthogroups). For each orthogroup, we indicate its gene and functional class, the number of members, species where it is present and major eukaryotic lineages (Amoebozoa, Opisthokonta + Breviatea + Apusozoa, CRuMs, Ancyromonadida, Malawimonadidae, Archaeplastida + Cryptista, SAR + Haptista, Hemimastigophora, Discoba and Metamonada), the probability of presence at the last eukaryotic common ancestor, the phylogenetic affinity of their closest homologues (other eukaryotic orthogroups, bacteria, archaea or viruses) and their average frequency amongst the ten nearest neighbours of its member gene in phylogenetic trees (‘Phylogenetic affinity score’, Methods); as well as its consensus protein domain architecture (present in at least 25% of its members). We also indicate the gene symbols of members from four model species: H. sapiens, D. melanogaster, S. cerevisiae and A. thaliana. b,c, Probability of gain and loss of each gene family at extant and ancestral nodes along the eukaryotic phylogeny. d, Orthogroup assignments per gene.
Transposon fusions and viral homology. a, List of candidate fusions between chromatin-associated genes and transposons, including the phylogenetic classification of each gene (orthogroup), protein domain architectures and the transcriptomics-level and gene model-level evidence supporting each fusion. b, List of chromatin-associated genes encoded by viral genomes, including their species of origin and a summary of their phylogenetic embedding among cellular species (specifically, which are its closest homologues in cellular genomes and the fraction of phylogenetic nearest neighbours they represent, the closest eukaryotic gene family among those close to eukaryotic genes in the gene trees and the distance to the closest cellular homologue).
Phylogenetic analyses. Collection of gene trees used to identify orthology groups for the eukaryotic chromatin toolkit. UFBS bootstrap supports rare indicated at each node. An annotated eukaryotic species tree is also included.
Peptide sequences. Collection of peptide sequences used to build gene trees of the eukaryotic chromatin toolkit.
About this article
Cite this article
Grau-Bové, X., Navarrete, C., Chiva, C. et al. A phylogenetic and proteomic reconstruction of eukaryotic chromatin evolution. Nat Ecol Evol 6, 1007–1023 (2022). https://doi.org/10.1038/s41559-022-01771-6