Mapping genome-wide transcription-factor binding sites using DAP-seq

Journal name: Nature Protocols
Volume: 12
Pages: 1659–1672
DOI: doi:10.1038/nprot.2017.055

Abstract

To enable low-cost, high-throughput generation of cistrome and epicistrome maps for any organism, we developed DNA affinity purification sequencing (DAP-seq), a transcription factor (TF)-binding site (TFBS) discovery assay that couples affinity-purified TFs with next-generation sequencing of a genomic DNA library. The method is fast, inexpensive, and more easily scaled than chromatin immunoprecipitation sequencing (ChIP-seq). DNA libraries are constructed using native genomic DNA from any source of interest, preserving cell- and tissue-specific chemical modifications that are known to affect TF binding (such as DNA methylation) and providing increased specificity as compared with in silico predictions based on motifs from methods such as protein-binding microarrays (PBMs) and systematic evolution of ligands by exponential enrichment (SELEX). The resulting DNA library is incubated with an affinity-tagged in vitro-expressed TF, and TF–DNA complexes are purified using magnetic separation of the affinity tag. Bound genomic DNA is eluted from the TF and sequenced using next-generation sequencing. Sequence reads are mapped to a reference genome, identifying genome-wide binding locations for each TF assayed, from which sequence motifs can then be derived. A researcher with molecular biology experience should be able to follow this protocol, processing up to 400 samples per week.

At a glance

Figures

  1. Figure 1: DAP-seq protocol overview.

    (a) An adaptor-ligated DNA library is prepared by shearing native genomic DNA into ~200-bp fragments and ligating Illumina-based sequencing adaptors to the repaired ends. (b) Transcription factor (TF) ORF clones fused to the Halo affinity tag are expressed in vitro and bound to ligand-coupled beads, whereas nonspecific proteins are washed away. (c) HaloTag-TF fusion proteins are incubated with an adaptor-ligated genomic DNA library, and unbound DNA fragments are washed away. Samples are heated to release TF-bound DNA, and the recovered DNA is PCR-amplified to attach indexed sequencing primers. Indexed DNA samples are subsequently combined and size-selected to remove residual adaptor dimers. Purified DNA libraries are sequenced using next-generation sequencing, and the resulting genome-wide binding events are analyzed. Peaks shown are maize DAP-seq peaks viewed in the Integrative Genomics Viewer.

  2. Figure 2: DAP-seq DNA library titration experiment with Arabidopsis transcription factor TGA5 (AT5G06960).

    (a) The number of mapped reads increases with input library amount. (b,c) The number of peaks called (b) is highest at 50 ng of input and then decreases, probably because of increased background noise, which is consistent with the reduction in the fraction of reads in peaks (c). (d) Coverage histogram of the number of positions in the genome (y axis) covered at each sequencing depth (x axis) shows that inputs of 10, 50, and 100 ng lead to enrichment (i.e., high sequencing depth) at a subset of base positions. (e,f) Strand-shift cross-coverage score (e) and mean read pileup at peaks (f), both indicators of the extent of enrichment at specific genomic locations, are much higher for 50- and 100-ng input. (g) The sequence motifs identified from the top 600 peaks of all experiments are very similar. (h,i) Similarities between pairs of experiments, both quantitative (h, Pearson's correlation of reads in peaks) and qualitative (i, Jaccard index of peak regions), show that experiments with 10, 50, and 100 ng of input identify similar genome-wide binding profiles. The DNA library was prepared from A. thaliana ecotype Col-0. Following the DAP-seq protocol with the TF TGA5, the resulting libraries were sequenced on an Illumina HiSeq 4000 instrument with 100-bp paired-end reads. Reads were mapped using Bowtie2 (ref. 28) v2.2.9 against the TAIR10 reference with default parameters. Reads aligned to nuclear chromosomes with MAPQ scores ≥30 were used to call peaks with the GEM peak caller (ref. 30), using only the first read in the pair, with the parameters '--k_min 6 --k_max 20 --k_seqs 600 --outNP --outMEME --outJASPAR --k_neg_dinu_shuffle --t 11'. The top 600 peaks, ranked first by enrichment q-value and then by fold enrichment, were used for de novo motif discovery by MEME-ChIP (ref. 32) v4.11.2. Fractions of reads in peaks, coverage histograms, strand-shift cross-coverage scores, mean read pileup at peaks, and pairwise Pearson's correlations were computed by the R package ChIPQC (ref. 33) v1.10.2. Motif logos were drawn by the R package motifStack (ref. 34) v1.16.2. Pairwise Jaccard index was computed by BEDTools (ref. 35) v2.24.0.
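
    To make the Jaccard index in panel (i) concrete, the following Python sketch computes it for two peak sets at the base-pair level: the number of bases covered by both sets divided by the number of bases covered by either set. It is an illustration under stated assumptions, not the BEDTools implementation used for the figure. The function names and the (chromosome, start, end) tuple layout with half-open, 0-based coordinates are choices made here for the example.

```python
from typing import List, Tuple

# Assumed peak representation: (chromosome, start, end), 0-based, half-open.
Interval = Tuple[str, int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge those that overlap or touch on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_length(intervals: List[Interval]) -> int:
    """Total number of bases covered by a merged interval list."""
    return sum(end - start for _, start, end in intervals)


def intersection_length(a: List[Interval], b: List[Interval]) -> int:
    """Bases shared by two merged, sorted interval lists (two-pointer sweep)."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Advance whichever list is behind in chromosome sort order.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        shared += max(0, min(end_a, end_b) - max(start_a, start_b))
        # Advance the interval that ends first.
        if end_a <= end_b:
            i += 1
        else:
            j += 1
    return shared


def jaccard_index(peaks_a: List[Interval], peaks_b: List[Interval]) -> float:
    """Base-pair Jaccard index: |A ∩ B| / |A ∪ B|; 0.0 if both sets are empty."""
    a = merge_intervals(peaks_a)
    b = merge_intervals(peaks_b)
    shared = intersection_length(a, b)
    union = total_length(a) + total_length(b) - shared
    return shared / union if union else 0.0


# Toy peak sets (made-up coordinates, for illustration only).
peaks_10ng = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
peaks_50ng = [("chr1", 250, 350), ("chr2", 25, 75)]
print(jaccard_index(peaks_10ng, peaks_50ng))
```

    A value near 1 means the two experiments called nearly the same regions, and a value near 0 means the peak sets barely overlap. Merging each set first keeps overlapping peaks within one experiment from being counted twice.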

References

  1. Swinnen, G., Goossens, A. & Pauwels, L. Lessons from domestication: targeting cis-regulatory elements for crop improvement. Trends Plant Sci. http://dx.doi.org/10.1016/j.tplants.2016.01.014 (2016).
  2. Deplancke, B., Alpern, D. & Gardeux, V. The genetics of transcription factor DNA binding variation. Cell http://dx.doi.org/10.1016/j.cell.2016.07.012 (2016).
  3. Babu, M.M., Luscombe, N.M., Aravind, L., Gerstein, M. & Teichmann, S.A. Structure and evolution of transcriptional regulatory networks. Curr. Opin. Struct. Biol. 14, 283–291 (2004).
  4. Niu, W. et al. Diverse transcription factor binding features revealed by genome-wide ChIP-seq in C. elegans. Genome Res. 21, 245–254 (2011).
  5. Negre, N. et al. A cis-regulatory map of the Drosophila genome. Nature 471, 527–531 (2011).
  6. Gerstein, M.B. et al. Architecture of the human regulatory network derived from ENCODE data. Nature 489, 91–100 (2012).
  7. Celniker, S.E. et al. Unlocking the secrets of the genome. Nature 459, 927–930 (2009).
  8. Landt, S.G. et al. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 22, 1813–1831 (2012).
  9. Weirauch, M.T. et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell 158, 1431–1443 (2014).
  10. Jolma, A. et al. DNA-binding specificities of human transcription factors. Cell 152, 327–339 (2013).
  11. Jolma, A. et al. DNA-dependent formation of transcription factor pairs alters their binding specificity. Nature 527, 384–388 (2015).
  12. Domcke, S. et al. Competition between DNA methylation and transcription factors determines binding of NRF1. Nature 528, 575–579 (2015).
  13. Hu, S. et al. DNA methylation presents distinct binding sites for human transcription factors. Elife 2013 (2013).
  14. Raghav, S.K. et al. Integrative genomics identifies the corepressor SMRT as a gatekeeper of adipogenesis through the transcription factors C/EBPB and KAISO. Mol. Cell 46, 335–350 (2012).
  15. O'Malley, R.C. et al. Cistrome and epicistrome features shape the regulatory DNA landscape. Cell http://dx.doi.org/10.1016/j.cell.2016.04.038 (2016).
  16. Dror, I., Golan, T., Levy, C., Rohs, R. & Mandel-Gutfreund, Y. A widespread role of the motif environment in transcription factor binding across diverse protein families. Genome Res. http://dx.doi.org/10.1101/gr.184671.114 (2015).
  17. Los, G.V. et al. HaloTag: a novel protein labeling technology for cell imaging and protein analysis. ACS Chem. Biol. http://dx.doi.org/10.1021/cb800025k (2008).
  18. Worsley Hunt, R. & Wasserman, W.W. Non-targeted transcription factors motifs are a systemic component of ChIP-seq datasets. Genome Biol. 15, 412 (2014).
  19. Schultz, M.D. et al. Human body epigenome maps reveal noncanonical DNA methylation variation. Nature 523, 212–216 (2015).
  20. Kawakatsu, T. et al. Unique cell-type-specific patterns of DNA methylation in the root meristem. Nat. Plants 2, 16058 (2016).
  21. Song, L. & Crawford, G.E. DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells. Cold Spring Harb. Protoc. http://dx.doi.org/10.1101/pdb.prot5384 (2010).
  22. Buenrostro, J.D., Giresi, P.G., Zaba, L.C., Chang, H.Y. & Greenleaf, W.J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).
  23. Rizzo, J.M. & Sinha, S. Epidermal Cells: Methods and Protocols (ed. Turksen, K.) 49–59 (Springer, 2014).
  24. Kawakatsu, T. et al. Epigenomic diversity in a global collection of Arabidopsis thaliana accessions. Cell 166, 492–506 (2016).
  25. Chen, K., Zhao, B.S. & He, C. Nucleic acid modifications in regulation of gene expression. Cell Chem. Biol. 23, 74–85 (2016).
  26. Arabidopsis Interactome Mapping Consortium. Evidence for network evolution in an Arabidopsis interactome map. Science 333, 601–607 (2011).
  27. Yazaki, J. et al. Mapping transcription factor interactome networks using HaloTag protein arrays. Proc. Natl. Acad. Sci. USA 113, E4238–E4247 (2016).
  28. Langmead, B. & Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
  29. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
  30. Guo, Y., Mahony, S. & Gifford, D.K. High resolution genome wide binding event finding and motif discovery reveals transcription factor spatial binding constraints. PLoS Comput. Biol. 8, e1002638 (2012).
  31. Robinson, J.T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
  32. Machanick, P. & Bailey, T.L. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696–1697 (2011).
  33. Carroll, T.S., Liang, Z., Salama, R., Stark, R. & de Santiago, I. Impact of artifact removal on ChIP quality metrics in ChIP-seq and ChIP-exo data. Front. Genet. 5, 75 (2014).
  34. Ou, J. & Zhu, L.J. motifStack: plot stacked logos for single or multiple DNA, RNA and amino acid sequence. http://bioconductor.org/packages/release/bioc/html/motifStack.html (2015).
  35. Quinlan, A.R. & Hall, I.M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
  36. Harper, S. & Speicher, D.W. in Protein Chromatography (Humana Press) 681, 259–280 (2011).
  37. Structural Genomics Consortium. et al. Protein production and purification. Nat. Methods 5, 135–146 (2008).
  38. Urich, M.A., Nery, J.R., Lister, R., Schmitz, R.J. & Ecker, J.R. MethylC-seq library preparation for base-resolution whole-genome bisulfite sequencing. Nat. Protoc. 10, 475–483 (2015).
  39. Gallagher, S. & Chakavarti, D. Immunoblot analysis. J. Vis. Exp. 2, 2008 (2008).

Author information

  1. Present address: United States Department of Energy Joint Genome Institute, Walnut Creek, California, USA.

    • Ronan C O'Malley

Affiliations

  1. Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, California, USA.

    • Anna Bartlett,
    • Ronan C O'Malley,
    • Mary Galli,
    • Joseph R Nery &
    • Joseph R Ecker
  2. Plant Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, California, USA.

    • Ronan C O'Malley,
    • Shao-shan Carol Huang &
    • Joseph R Ecker
  3. Waksman Institute of Microbiology, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA.

    • Mary Galli &
    • Andrea Gallavotti
  4. Howard Hughes Medical Institute, The Salk Institute for Biological Studies, La Jolla, California, USA.

    • Joseph R Ecker

Contributions

R.C.O. and J.R.E. designed the original protocol. R.C.O., A.B., S.-s.C.H., M.G., and A.G. modified and updated the protocol to its current state. J.R.N. performed all the sequencing. A.B., M.G., S.-s.C.H., and J.R.E. wrote the manuscript with contributions from all authors.

Competing financial interests

The authors declare no competing financial interests.
