  • Resource

C2H2 zinc finger proteins greatly expand the human regulatory lexicon

Abstract

Cys2-His2 zinc finger (C2H2-ZF) proteins represent the largest class of putative human transcription factors. However, for most C2H2-ZF proteins it is unknown whether they even bind DNA or, if they do, to which sequences. Here, by combining data from a modified bacterial one-hybrid system with protein-binding microarray and chromatin immunoprecipitation analyses, we show that natural C2H2-ZFs encoded in the human genome bind DNA both in vitro and in vivo, and we infer the DNA recognition code using DNA-binding data for thousands of natural C2H2-ZF domains. In vivo binding data are generally consistent with our recognition code and indicate that C2H2-ZF proteins recognize more motifs than all other human transcription factors combined. We provide direct evidence that most KRAB-containing C2H2-ZF proteins bind specific endogenous retroelements (EREs), ranging from currently active to ancient families. The majority of C2H2-ZF proteins, including KRAB proteins, also show widespread binding to regulatory regions, indicating that the human genome contains an extensive and largely unstudied adaptive C2H2-ZF regulatory network that targets a diverse range of genes and pathways.

Figure 1: B1H data and PBM confirmations.
Figure 2: The B1H recognition code.
Figure 3: Comparison of B1H-RC to ChIP-seq results for 39 human C2H2-ZF proteins.
Figure 4: The majority of KRAB family proteins bind to EREs.

Accession codes

Primary accessions

Gene Expression Omnibus

Acknowledgements

We are grateful to S. Wolfe for providing B1H reagents and protocols; to F. Aidoo, H. Zheng, H. Tang, P. Young, T. Kanagalingam, D. Torti and the Donnelly Sequencing Centre for technical support; and to E. Chan, H. van Bakel and X. Chen for computational support and analyses. This work was supported by grants from the Canadian Institutes of Health Research (MOP-77721 and MOP-111007 to T.R.H., MOP-272138 to T.R.H., J.G. and Andrew Emili), and funding from the Canadian Institute for Advanced Research to T.R.H., B.J.F. and M.T.W. H.S.N. was supported by a Canadian Institutes of Health Research Banting Fellowship, F.W.S. by a European Molecular Biology Organization postdoctoral fellowship, and K.N.L. by a Natural Sciences and Engineering Research Council CGS-M.

Author information

Contributions

H.S.N., S.M., F.W.S. and T.R.H. conceived and designed the experiments. S.M. performed the B1H experiments, with contributions from K.N.L. F.W.S. performed the ChIP-seq experiments, with contributions from E.R. S.M. and A.Y. performed the PBM experiments. H.S.N. analyzed the data and developed the computational models. M.G. performed the structural modeling. M.A., M.T.W. and T.R.H. contributed to data analysis. J.G. contributed reagents and materials. P.M.K., J.G. and B.J.F. provided critical advice and commentary on data analysis. H.S.N. prepared the figures. T.R.H. conceived the study and supervised the project, and H.S.N. and T.R.H. wrote the manuscript.

Corresponding author

Correspondence to Timothy R. Hughes.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–10 and Supplementary Table 1 (PDF 6067 kb)

Supplementary Data

Supplementary Data (ZIP 31609 kb)

About this article

Cite this article

Najafabadi, H., Mnaimneh, S., Schmitges, F. et al. C2H2 zinc finger proteins greatly expand the human regulatory lexicon. Nat Biotechnol 33, 555–562 (2015). https://doi.org/10.1038/nbt.3128

  • DOI: https://doi.org/10.1038/nbt.3128
