The human genome encodes some 350 Krüppel-associated box (KRAB) domain-containing zinc-finger proteins (KZFPs), the products of a rapidly evolving gene family that has been traced back to early tetrapods [1,2]. The function of most KZFPs is unknown, but a few have been demonstrated to repress transposable elements in embryonic stem (ES) cells by recruiting the transcriptional regulator TRIM28 and associated mediators of histone H3 Lys9 trimethylation (H3K9me3)-dependent heterochromatin formation and DNA methylation [3–9]. Depletion of TRIM28 in human or mouse ES cells triggers the upregulation of a broad range of transposable elements [4,10,11], and recent data based on a few specific examples have pointed to an arms race between hosts and transposable elements as an important driver of KZFP gene selection [5]. Here, to obtain a global view of this phenomenon, we combined phylogenetic and genomic studies to investigate the evolutionary emergence of KZFP genes in vertebrates and to identify their targets in the human genome. First, we unexpectedly reassigned the root of the family to a common ancestor of coelacanths and tetrapods. Second, although we confirmed that the majority of KZFPs bind transposable elements and pinpointed cases of ongoing co-evolution, we found that most of their transposable element targets have lost all transposition potential. Third, by examining the interplay between human KZFPs and other transcriptional modulators, we obtained evidence that KZFPs exploit evolutionarily conserved fragments of transposable elements as regulatory platforms long after the arms race against these genetic invaders has ended. Together, our results demonstrate that KZFPs partner with transposable elements to build a largely species-restricted layer of epigenetic regulation.
1. Huntley, S. et al. A comprehensive catalog of human KRAB-associated zinc finger genes: insights into the evolutionary history of a large family of transcriptional repressors. Genome Res. 16, 669–677 (2006)
2. Liu, H., Chang, L. H., Sun, Y., Lu, X. & Stubbs, L. Deep vertebrate roots for mammalian zinc finger transcription factor subfamilies. Genome Biol. Evol. 6, 510–525 (2014)
3. Wolf, D. & Goff, S. P. Embryonic stem cells use ZFP809 to silence retroviral DNAs. Nature 458, 1201–1204 (2009)
4. Castro-Diaz, N. et al. Evolutionally dynamic L1 regulation in embryonic stem cells. Genes Dev. 28, 1397–1409 (2014)
5. Jacobs, F. M. et al. An evolutionary arms race between KRAB zinc-finger genes ZNF91/93 and SVA/L1 retrotransposons. Nature 516, 242–245 (2014)
6. Schultz, D. C., Ayyanathan, K., Negorev, D., Maul, G. G. & Rauscher, F. J. III. SETDB1: a novel KAP-1-associated histone H3, lysine 9-specific methyltransferase that contributes to HP1-mediated silencing of euchromatic genes by KRAB zinc-finger proteins. Genes Dev. 16, 919–932 (2002)
7. Schultz, D. C., Friedman, J. R. & Rauscher, F. J. III. Targeting histone deacetylase complexes via KRAB-zinc finger proteins: the PHD and bromodomains of KAP-1 form a cooperative unit that recruits a novel isoform of the Mi-2α subunit of NuRD. Genes Dev. 15, 428–443 (2001)
8. Matsui, T. et al. Proviral silencing in embryonic stem cells requires the histone methyltransferase ESET. Nature 464, 927–931 (2010)
9. Quenneville, S. et al. The KRAB-ZFP/KAP1 system contributes to the early embryonic establishment of site-specific DNA methylation patterns maintained during development. Cell Reports 2, 766–773 (2012)
10. Rowe, H. M. et al. KAP1 controls endogenous retroviruses in embryonic stem cells. Nature 463, 237–240 (2010)
11. Turelli, P. et al. Interplay of TRIM28 and DNA methylation in controlling human endogenous retroelements. Genome Res. 24, 1260–1270 (2014)
12. Amemiya, C. T. et al. The African coelacanth genome provides insights into tetrapod evolution. Nature 496, 311–316 (2013)
13. Rhee, H. S. & Pugh, B. F. Comprehensive genome-wide protein-DNA interactions detected at single-nucleotide resolution. Cell 147, 1408–1419 (2011)
14. Quenneville, S. et al. In embryonic stem cells, ZFP57/KAP1 recognize a methylated hexanucleotide to affect chromatin and DNA methylation of imprinting control regions. Mol. Cell 44, 361–372 (2011)
15. Baudat, F. et al. PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice. Science 327, 836–840 (2010)
16. Frietze, S., O’Geen, H., Blahnik, K. R., Jin, V. X. & Farnham, P. J. ZNF274 recruits the histone methyltransferase SETDB1 to the 3′ ends of ZNF genes. PLoS One 5, e15082 (2010)
17. Chuong, E. B., Elde, N. C. & Feschotte, C. Regulatory activities of TEs: from conflicts to benefits. Nat. Rev. Genet. 18, 71–86 (2017)
18. Ecco, G. et al. Transposable elements and their KRAB-ZFP controllers regulate gene expression in adult tissues. Dev. Cell 36, 611–623 (2016)
19. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012)
20. Chuong, E. B., Elde, N. C. & Feschotte, C. Regulatory evolution of innate immunity through co-option of endogenous retroviruses. Science 351, 1083–1087 (2016)
21. Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015)
22. Lizio, M. et al. Gateways to the FANTOM5 promoter level mammalian expression atlas. Genome Biol. 16, 22 (2015)
23. Hedges, S. B., Dudley, J. & Kumar, S. TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics 22, 2971–2972 (2006)
24. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012)
25. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008)
26. Dreos, R., Ambrosini, G., Cavin Périer, R. & Bucher, P. EPD and EPDnew, high-quality promoter resources in the next-generation sequencing era. Nucleic Acids Res. 41, D157–D164 (2013)
27. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010)
28. Dale, R. K., Pedersen, B. S. & Quinlan, A. R. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27, 3423–3424 (2011)
29. Chikina, M. D. & Troyanskaya, O. G. An effective statistical evaluation of ChIPseq dataset similarity. Bioinformatics 28, 607–613 (2012)
30. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013)
We thank the University of Lausanne Genomic Technologies Facility for sequencing, J. Marquis, S. Offner and C. Raclot for technical assistance and Vital-IT for computing resources. This work was financed through grants from the Swiss National Science Foundation and the European Research Council (ERC 268721) to D.T., and a fellowship from the Fonds de la Recherche en Santé du Québec to M.I.
The authors declare no competing financial interests.
Reviewer Information: Nature thanks V. Iyer, Z. Izsvak and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
a, Zoomed-out overview. Each KZFP is represented by a circle (‘node’) and linked to a large node of the same colour representing its species of origin. The colours were randomly assigned for each species. Weighted links between any two KZFP nodes are based on significant homology (≥60%) between their ZNF signatures. The topology of the network is established by a force-directed process that follows three rules: all nodes repel each other, all nodes are attracted to the centre by artificial ‘gravity’, and linked nodes attract each other, which leads to the formation of clusters when many KZFPs share homology. Species-specific KZFPs aggregate close to the corresponding species node and highly conserved clusters tend to radiate out of the centre. b–g, Zoom-in on indicated taxonomic regions. The guinea pig-restricted cluster stood out for its very high content of duplicated KZFPs (e), suggestive of ongoing amplification, a phenomenon seen only to a lesser extent in a few other genomes (notably the mouse, c). Clusters specific to amphibians, lizards, snakes, turtles and crocodilians were clearly delineated (g), with very few KZFPs from these groups being shared with mammals. Nevertheless, a bridge between humans and birds or lizards could be established via highly conserved KZFPs, corresponding in human to ZNF777 and ZNF282 (g, top right).
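The three layout rules described in this legend (mutual repulsion, gravity towards the centre, attraction along weighted links) can be illustrated with a minimal sketch. This is not the software used to draw the figure: the function name `force_layout`, the force constants, the inverse-square repulsion and the linear spring are all assumptions chosen for illustration.

```python
import math
import random


def force_layout(nodes, links, steps=300, repulsion=0.05, spring=0.05,
                 gravity=0.02, max_step=0.1, seed=0):
    """Return {node: (x, y)} after a toy force-directed relaxation.

    nodes: list of node names.
    links: list of (node_a, node_b, weight) tuples.
    All constants are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in nodes}
        # Rule 1: all nodes repel each other (inverse-square falloff).
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) + 1e-9
                push = repulsion / dist ** 2
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Rule 2: linked nodes attract, in proportion to the link weight.
        for a, b, weight in links:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            force[a][0] -= spring * weight * dx
            force[a][1] -= spring * weight * dy
            force[b][0] += spring * weight * dx
            force[b][1] += spring * weight * dy
        # Rule 3: artificial gravity pulls every node towards the centre.
        for n in nodes:
            fx = force[n][0] - gravity * pos[n][0]
            fy = force[n][1] - gravity * pos[n][1]
            # Cap the displacement so very close pairs do not fly apart.
            size = math.hypot(fx, fy)
            scale = min(1.0, max_step / size) if size > 0 else 0.0
            pos[n][0] += fx * scale
            pos[n][1] += fy * scale
    return {n: (p[0], p[1]) for n, p in pos.items()}


# Hypothetical toy network: two homologous KZFPs and one unrelated KZFP.
layout = force_layout(["A1", "A2", "B1"], [("A1", "A2", 0.9)])
```

In this toy network the two linked nodes settle closer to each other than to the unlinked node, which is the clustering behaviour the legend describes for KZFPs that share homology.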
Venn diagrams of ChIP–exo replicates indicating the number of peaks found in either or both experiments, along with their percentage overlap.
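The replicate comparison summarized by these Venn diagrams can be sketched as follows. The legend does not define how the percentage overlap was computed, so the definition used here (the share of each replicate's peaks that overlap at least one peak in the other replicate) is an assumption, as are the function names and the half-open `(chrom, start, end)` peak representation.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def venn_counts(rep1, rep2):
    """Count replicate-specific and shared peaks for two peak lists.

    A peak counts as shared if it overlaps any peak in the other replicate.
    Shared counts are reported per replicate because one peak in rep1 may
    overlap several peaks in rep2, and vice versa.
    """
    shared1 = sum(any(overlaps(p, q) for q in rep2) for p in rep1)
    shared2 = sum(any(overlaps(p, q) for q in rep1) for p in rep2)
    return {
        "only_rep1": len(rep1) - shared1,
        "only_rep2": len(rep2) - shared2,
        "shared_rep1": shared1,
        "shared_rep2": shared2,
        "pct_rep1": 100.0 * shared1 / len(rep1) if rep1 else 0.0,
        "pct_rep2": 100.0 * shared2 / len(rep2) if rep2 else 0.0,
    }


# Hypothetical peak coordinates, for illustration only.
replicate_1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
replicate_2 = [("chr1", 150, 250), ("chr2", 300, 400)]
counts = venn_counts(replicate_1, replicate_2)
```

The quadratic scan is fine for a sketch; for genome-scale peak sets, a sorted sweep or an interval tool such as BEDTools would be the more practical choice.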
Extended Data Figure 3 Overlap between TRIM28 and KZFP ChIP–exo in different cellular contexts illustrating similar profiles.
a, Number of high-quality peaks retained for all tested KZFPs in 293T cells (log scale) with colour code for KRAB domain configuration. b, ChIP–exo of ZFP57 and PRDM9, illustrating the single base-pair resolution of the technique. Top, alignments of genomic targets, using a four-colour code for nucleotides (C, yellow; G, red; A, green; T, blue). Bottom, forward (red) and reverse (blue) strand 5′ end read counts from ChIP–exo data, with the area of overlap (purple) corresponding to the formaldehyde-fixed interface between protein and DNA. c, Evolutionary distribution of KZFPs according to their preferential binding to transposable elements or near TSSs (within 2.5 kb). Dots are coloured according to KRAB domain configuration (blue, KRAB-ZNF; green, SCAN-KRAB-ZNF; red, DUF3669-KRAB-ZNF). d, Proportion of transposable elements bound by 0, 1 or 2+ KZFPs per class (top) or within a panel of LINE1 elements (bottom). e, Overview of a genomic region with tracks of TRIM28 ChIP–exo in H1 ES cells (purple) and indicated KZFPs in 293T cells (green) along with corresponding total input controls. f, Percentage of overlap for each ChIP–exo of KZFPs with ChIP–exo data for endogenous TRIM28 in H1 ES cells. g, Superimposition of ChIP–exo at overlap between endogenous TRIM28 in human ES cells (bottom) and indicated HA-tagged KZFPs in 293T cells (middle), with bound regions aligned on top. Colours are representative of distinct DNA bases: C, yellow; G, red; A, green; T, blue. The base pair-resolution footprint of each factor is represented by areas in which forward (red) and reverse (blue) reads overlap (purple). For ZNF675, the TRIM28 signal contains a peak that is absent from the HEK293 ChIP–exo; this represents signal from another endogenous factor binding in close proximity. In this particular case, the extraneous signal overlaps with ZNF765 (Extended Data Fig. 4b).
a, Network analysis of interactions between KZFPs and transposable elements. Strength of links between KZFPs (grey) and transposable element families, based on data illustrated in Fig. 2c. The weakest illustrated link corresponds to P < 1 × 10^−20. Some KZFP–transposable element relationships are very exclusive (at the edge of the network), but a high level of promiscuity is also apparent, with some KZFPs recognizing multiple subfamilies, or even multiple classes of transposable elements (for example SINEs and ERVs). b, ChIP–exo of ZNF765, which significantly recognizes both LINEs (L1PA) and ERVs (THE1) through the indicated consensus motif, also found in some non-repeat sequences. ZNF765 is indicated by a black arrow in a.
a, Proximity-binding plot of human KZFPs. The colour scale represents the fraction of peaks from two KZFPs binding significantly close to one another (P < 0.05, IntervalStats software). The cluster in the lower right corner corresponds to KZFPs co-recruited in various combinations at promoters. Other pairs of closely binding KZFPs can be detected, many in transposable elements as illustrated in Figs 2, 3. b, Genome browser view of KZFPs binding in close proximity at promoters. c, Signal overlay of selected pairs of promoter-binding KZFPs. Some (for example, ZNF282 and ZNF398) systematically bind in close proximity to one another, while others (for example, ZKSCAN2 and ZNF263) are recruited at less inter-dependent distances from the TSS.
a, Recruitment of ZNF75D and ZNF274 to 3′ ZNF-encoding exons of ZNF proteins, illustrated at the ZNF180 locus, with overlapping but distinct consensus binding motifs. b, Proportion of non-KRAB-ZFP (top) or KZFP protein-coding (bottom) genes bound by ZNF75D, ZNF274 or both. For KZFP-bound genes, there was no clear preference for young or old members of the family. c, Motif and ChIP–exo signal for ZNF75D and ZNF274, with links to the C2H2 zinc-finger protein motif. Bound sequences overlap with highly conserved amino acids and some overrepresented subsequences in the motif (GK, QR) directly correspond to the DNA sequence required to recruit the protein.
a, Sequence logos of KRAB-bound sequences of each subfamily of L1PA elements. Gaps are represented by black squares and each letter is proportional in height to its prevalence. Specific mutations correlating with the loss of binding for specified KZFPs are highlighted by black arrows and the motif is displayed at the top. A plus–minus system on the chart illustrates the level of binding of the specific KZFP to each subfamily of L1PAs. b, Binding dynamics of selected KZFPs within L1PA families, illustrating a wave-like pattern of recruitment. However, when the ages of KZFPs and mutation profiles are taken into account, paradoxical scenarios of arms-race dynamics (ZNF141, ZNF765, ZNF93 and ZNF649) and positive selection (ZNF382 and ZNF84) can be observed.
Additional examples of KZFP binding to domesticated retroelements. Notably, the non-KRAB-ZFP chromatin-looping factor ZNF143 was significantly enriched with the KZFP ZNF317 at ancient LINE1 integrants. Also, FOXA1 within L1 co-occurred with multiple KZFPs binding L1PA elements. Finally, ZNF808 co-occurred with the transcription factor GATA3 within MER11B elements, with signs of possible enhancer effects suggested by a significant association with EP300 at the same loci.
Extended Data Figure 9 Non-KZFP-recognized transposable elements are less epigenetically active and shorter; association of KZFPs with H3K9me3 in NIH Roadmap.
a, Proportion of histone modifications in non-KZFP-recognized transposable elements. The number of elements is displayed in parentheses. Compared with these, the KZFP-recognized transposable elements shown in Fig. 4b are more likely to be decorated with either active or repressive histone marks. b, The length in base pairs of transposable elements is plotted and subdivided by class and by KZFP binding status. c, Transposable element-specific H3K9me3 enrichment within 127 cell types from NIH Roadmap and KZFPs from our dataset. However, as many KZFPs bind in close proximity to one another, it can be difficult to associate H3K9me3 status with a particular KZFP. Other sources of H3K9me3 targeting transposable elements can also add noise to the signal. d, For each transposable element, we sum the expression values (RPKM) of KZFPs binding it from RNA-seq in the NIH Roadmap dataset. Transposable elements bound by highly expressed KZFPs are on average more likely to be covered by H3K9me3. ***P < 1 × 10^−50, Wilcoxon signed-rank test.
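The per-element summation described in panel d can be sketched in a few lines. The input layout (a list of element/KZFP binding pairs and a dictionary of RPKM values), the function name, the element identifiers and the expression values are all hypothetical; only the idea of summing the expression of every KZFP bound to a given transposable element comes from the legend.

```python
from collections import defaultdict


def summed_kzfp_expression(bindings, rpkm):
    """Sum the RPKM of all KZFPs bound to each transposable element.

    bindings: iterable of (element_id, kzfp_name) pairs.
    rpkm: {kzfp_name: expression value}; KZFPs without a value count as 0.
    """
    totals = defaultdict(float)
    # A set removes duplicate pairs so each KZFP is counted once per element.
    for element_id, kzfp in set(bindings):
        totals[element_id] += rpkm.get(kzfp, 0.0)
    return dict(totals)


# Hypothetical element identifiers and expression values.
example_bindings = [
    ("L1PA4_copy1", "ZNF93"),
    ("L1PA4_copy1", "ZNF765"),
    ("THE1B_copy7", "ZNF765"),
]
example_rpkm = {"ZNF93": 4.0, "ZNF765": 2.5}
example_totals = summed_kzfp_expression(example_bindings, example_rpkm)
```

The resulting totals could then be compared between elements with and without H3K9me3 coverage, which is the comparison the legend reports.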
a, Strategy used to assay the repressive potential of selected KZFPs, with details for each candidate target. These were selected on the basis of a few criteria, namely high signal for both TRIM28 and H3K9me3 in H1 ES cells and strong binding by KZFP in 293T. b, Confirmation of the repression-recruiting capabilities of the cloned motifs in H1 ES cells. In all four cases, repression is naturally observed with the constructs containing the cloned motifs without the need to overexpress the KZFP targeting the sequence. c, Repression of cloned motifs in 293T cells. For three out of the four constructs, endogenous TRIM28 binding data in 293T are negative and repression was neither expected nor observed. However, TRIM28 is recruited at the original genomic ZNF675-targeted sequence in 293T and spontaneous repression was gradually observed over time in our reporter system. We then induced expression of the cognate KZFPs by addition of doxycycline and could observe repression of all four constructs. The kinetics for ZNF675 were in this case slightly faster than those observed with endogenous repression. The data presented are representative of two independent experiments.
A list of all vertebrate genomes analyzed along with their assembly version, common names, Latin names, taxonomic class and order, as well as the count of detected KZFP genetic units (genes + pseudogenes) displayed in Figure 1a. (XLSX 25 kb)
A list of the zinc-finger genes and pseudogenes we annotated in vertebrate genomes along with clustering information based on zinc-finger signatures, domain configuration, genomic localization and other features. In this table, data within parentheses indicate domains (zinc fingers, KRAB domains) outside of current annotation. Some units annotated as ZNFs cluster with KRABs, but manual inspection of many of these units shows incomplete sequencing upstream of the zinc-finger array. KZFPs with the same cluster # (last column) are considered to be DNA-binding orthologs based on ZNF similarity. (XLSX 9511 kb)
Data presented in Figure 1c, which represents the average proportion of shared KZFP genetic units between species according to our ZNF signature clustering. (XLSX 358 kb)
A list of all the KZFPs for which we obtained ChIP-exo data, along with Ensembl identifiers, KRAB domain configuration, estimated evolutionary origin according to our functional clustering, as well as the DNA coding sequence we cloned in our expression vectors. (XLSX 141 kb)
Data presented in Figure 2b, which shows the percentage of overlap between KZFP ChIP datasets and various genomic structures and transposable element classes. (XLSX 32 kb)
Data presented in Figure 2f, which represents the enrichment of KZFPs within precise transposable element subfamilies. The score is −log10 of the P value and the maximum achievable is 320. (XLSX 359 kb)
Data presented in Extended Data Fig. 5a, which shows the association of different KZFP datasets based on a proximity analysis. The score is the percentage of peaks found significantly close at P < 0.05. (XLSX 415 kb)
Data presented in Figure 4a, which shows the enrichment between KZFP peaks and transcription factors from ENCODE. The score is −log10 of the P value and the maximum achievable is 320. Only the subset of data for which there is at least one significant interaction is shown. (XLSX 419 kb)
Data presented in Extended Data Fig. 9c, which shows the enrichment between KZFP peaks and H3K9me3 from the NIH Roadmap dataset. The score is −log10 of the P value and the maximum achievable is 320. Only the subset of data for which there is at least one significant interaction is shown. (XLSX 324 kb)
Data presented in Figure 4d, which shows expression values in tags per million from the FANTOM5 dataset. (XLSX 7138 kb)
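Several of the tables above report enrichment as −log10 of the P value with a maximum of 320. A minimal sketch of such a capped score is given below. The source states only the formula and the ceiling; the function name, the clamping and the treatment of a P value of exactly zero as the ceiling are assumptions made for illustration.

```python
import math

MAX_SCORE = 320.0  # ceiling quoted in the table descriptions


def enrichment_score(p_value):
    """Return -log10(p_value), capped at MAX_SCORE.

    A P value of exactly 0 (for example after floating-point underflow)
    is mapped to the ceiling; this handling is an assumption.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value == 0.0:
        return MAX_SCORE
    return min(-math.log10(p_value), MAX_SCORE)


# Hypothetical P values, from non-significant to vanishingly small.
example_scores = [enrichment_score(p) for p in (1.0, 0.05, 1e-20, 0.0)]
```

With this convention, the weakest link illustrated in the KZFP network (P < 1 × 10^−20) corresponds to a score just above 20.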
Cite this article
Imbeault, M., Helleboid, PY. & Trono, D. KRAB zinc-finger proteins contribute to the evolution of gene regulatory networks. Nature 543, 550–554 (2017). https://doi.org/10.1038/nature21683