Resource | Published:

C2H2 zinc finger proteins greatly expand the human regulatory lexicon

Nature Biotechnology volume 33, pages 555–562 (2015)

Abstract

Cys2-His2 zinc finger (C2H2-ZF) proteins represent the largest class of putative human transcription factors. However, for most C2H2-ZF proteins it is unknown whether they even bind DNA or, if they do, to which sequences. Here, by combining data from a modified bacterial one-hybrid system with protein-binding microarray and chromatin immunoprecipitation analyses, we show that natural C2H2-ZFs encoded in the human genome bind DNA both in vitro and in vivo, and we infer the DNA recognition code using DNA-binding data for thousands of natural C2H2-ZF domains. In vivo binding data are generally consistent with our recognition code and indicate that C2H2-ZF proteins recognize more motifs than all other human transcription factors combined. We provide direct evidence that most KRAB-containing C2H2-ZF proteins bind specific endogenous retroelements (EREs), ranging from currently active to ancient families. The majority of C2H2-ZF proteins, including KRAB proteins, also show widespread binding to regulatory regions, indicating that the human genome contains an extensive and largely unstudied adaptive C2H2-ZF regulatory network that targets a diverse range of genes and pathways.

Accessions

Primary accessions

Gene Expression Omnibus

Acknowledgements

We are grateful to S. Wolfe for providing B1H reagents and protocols, F. Aidoo, H. Zheng, H. Tang, P. Young, T. Kanagalingam, D. Torti and the Donnelly Sequencing Centre for technical support, and E. Chan, H. van Bakel and X. Chen for computational support and analyses. This work was supported by grants from the Canadian Institutes of Health Research (MOP-77721 and MOP-111007 to T.R.H., MOP-272138 to T.R.H., J.G. and Andrew Emili), and funding from the Canadian Institutes for Advanced Research to T.R.H., B.J.F. and M.T.W. H.S.N. was supported by a Canadian Institutes of Health Research Banting Fellowship, F.W.S. by a European Molecular Biology Organization postdoctoral fellowship, and K.N.L. by a Natural Science and Engineering Research Council CGS-M.

Author information

Author notes

    • Hamed S Najafabadi
    • Sanie Mnaimneh
    • Frank W Schmitges

    These authors contributed equally to this work.

Affiliations

  1. Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada.

    • Hamed S Najafabadi
    • Sanie Mnaimneh
    • Frank W Schmitges
    • Michael Garton
    • Ally Yang
    • Mihai Albu
    • Philip M Kim
    • Jack Greenblatt
    • Brendan J Frey
    • Timothy R Hughes
  2. Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada.

    • Kathy N Lam
    • Ernest Radovani
    • Philip M Kim
    • Jack Greenblatt
    • Timothy R Hughes
  3. Center for Autoimmune Genomics and Etiology (CAGE) and Divisions of Biomedical Informatics and Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA.

    • Matthew T Weirauch
  4. Canadian Institutes for Advanced Research, Toronto, Ontario, Canada.

    • Matthew T Weirauch
    • Brendan J Frey
    • Timothy R Hughes
  5. Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.

    • Philip M Kim
    • Brendan J Frey
  6. Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada.

    • Brendan J Frey

Authors

  1. Hamed S Najafabadi

  2. Sanie Mnaimneh

  3. Frank W Schmitges

  4. Michael Garton

  5. Kathy N Lam

  6. Ally Yang

  7. Mihai Albu

  8. Matthew T Weirauch

  9. Ernest Radovani

  10. Philip M Kim

  11. Jack Greenblatt

  12. Brendan J Frey

  13. Timothy R Hughes

Contributions

H.S.N., S.M., F.W.S. and T.R.H. conceived and designed the experiments. S.M. performed the B1H experiments, with contributions from K.N.L. F.W.S. performed the ChIP-seq experiments, with contributions from E.R. S.M. and A.Y. performed the PBM experiments. H.S.N. analyzed the data and developed the computational models. M.G. performed the structural modeling. M.A., M.T.W. and T.R.H. contributed to data analysis. J.G. contributed reagents and materials. P.M.K., J.G. and B.J.F. provided critical advice and commentary on data analysis. H.S.N. prepared the figures. T.R.H. conceived the study and supervised the project, and H.S.N. and T.R.H. wrote the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Timothy R Hughes.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–10 and Supplementary Table 1

Zip files

  1. Supplementary Data

About this article

DOI

https://doi.org/10.1038/nbt.3128
