Identifying ChIP-seq enrichment using MACS

Abstract

Model-based analysis of ChIP-seq (MACS) is a computational algorithm that identifies genome-wide locations of transcription/chromatin factor binding or histone modification from ChIP-seq data. MACS consists of four steps: removing redundant reads, adjusting read position, calculating peak enrichment and estimating the empirical false discovery rate (FDR). In this protocol, we provide a detailed demonstration of how to install MACS and how to use it to analyze three common types of ChIP-seq data sets with different characteristics: the sequence-specific transcription factor FoxA1, the histone modification mark H3K4me3 with sharp enrichment and the H3K36me3 mark with broad enrichment. We also explain how to interpret and visualize the results of MACS analyses. The algorithm requires approximately 3 GB of RAM and 1.5 h of computing time to analyze a ChIP-seq data set containing 30 million reads, an estimate that increases with sequence coverage. MACS is open source and is available from http://liulab.dfci.harvard.edu/MACS/.
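
The four steps named in the abstract can be illustrated with a toy sketch. This is not the MACS implementation: every function name here is hypothetical, the fragment size `d` is supplied rather than modeled from the data, fixed windows stand in for a sliding window, one read is kept per position and strand, and the background rate is simply the larger of a genome-wide rate and the scaled control count in the same window. The empirical FDR is estimated by swapping the ChIP and control samples and comparing peak counts at the same cutoff.

```python
import math
from collections import Counter

def remove_redundant(reads, max_dup=1):
    """Step 1: keep at most max_dup reads per (5' position, strand)."""
    seen = Counter()
    kept = []
    for pos, strand in reads:
        if seen[(pos, strand)] < max_dup:
            seen[(pos, strand)] += 1
            kept.append((pos, strand))
    return kept

def shift_reads(reads, d):
    """Step 2: move each read d/2 toward its 3' end (the fragment center)."""
    half = d // 2
    return [pos + half if strand == "+" else pos - half for pos, strand in reads]

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def call_windows(treat_pos, ctrl_pos, genome_len, window, p_cutoff):
    """Step 3: report windows whose read count is unlikely under the background."""
    n_windows = math.ceil(genome_len / window)
    scale = len(treat_pos) / max(len(ctrl_pos), 1)  # match sequencing depth
    lam_bg = len(treat_pos) / n_windows             # genome-wide rate
    treat = Counter(p // window for p in treat_pos)
    ctrl = Counter(p // window for p in ctrl_pos)
    return sorted(
        w for w, k in treat.items()
        if poisson_sf(k, max(lam_bg, ctrl[w] * scale)) < p_cutoff
    )

def macs_like(chip_reads, control_reads, d, genome_len, window=100, p_cutoff=1e-5):
    """Run the four toy steps; return (peak windows, empirical FDR)."""
    chip = shift_reads(remove_redundant(chip_reads), d)
    ctrl = shift_reads(remove_redundant(control_reads), d)
    chip_peaks = call_windows(chip, ctrl, genome_len, window, p_cutoff)
    # Step 4: swap the samples; peaks called in the control are false positives.
    ctrl_peaks = call_windows(ctrl, chip, genome_len, window, p_cutoff)
    fdr = len(ctrl_peaks) / len(chip_peaks) if chip_peaks else 0.0
    return chip_peaks, fdr

if __name__ == "__main__":
    # Made-up reads: a binding site near position 500 plus a duplicate pile.
    chip_reads = ([(400 + i, "+") for i in range(10)]
                  + [(600 + i, "-") for i in range(10)]
                  + [(2000, "+")] * 5)
    control_reads = [(1000, "+"), (3000, "-"), (5000, "+"), (7000, "-")]
    print(macs_like(chip_reads, control_reads, d=200, genome_len=10000))
```

In this made-up example the plus-strand and minus-strand reads flanking the binding site converge on the same window after shifting, while the five identical reads at position 2000 collapse to one and no longer look enriched.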

Figure 1: Workflow of MACS 1.4.2.
Figure 2: Peak model built by MACS using the FoxA1 data set.
Figure 3: IGV visualization of MACS results using the FoxA1 data set.
Figure 4: IGV visualization of MACS results using the University of Washington H3K4me3 data set.
Figure 5: IGV visualization of MACS results using the Broad Institute H3K36me3 data set.

Acknowledgements

This project was supported by the National Natural Science Foundation of China (31028011 and 31071114); the National Basic Research Program of China (973 Program: 2010CB944904 and 2011CB965104); US National Institutes of Health grant HG4069; and the Excellent Young Teachers Program of Tongji University (2010KJ041).

Author information

Contributions

Y.Z., T.L. and X.S.L. developed the original MACS algorithm. T.L. developed the current version of the MACS program. J.F. and B.Q. performed the data analysis. J.F., T.L. and X.S.L. wrote the initial manuscript. All authors contributed to the discussion and writing of the final manuscript.

Corresponding authors

Correspondence to Yong Zhang or Xiaole Shirley Liu.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Feng, J., Liu, T., Qin, B. et al. Identifying ChIP-seq enrichment using MACS. Nat Protoc 7, 1728–1740 (2012). https://doi.org/10.1038/nprot.2012.101
