Building an ENCODE-style data compendium on a shoestring

Ruau, David; Ng, Felicia S L; Wilson, Nicola K; Hannah, Rebecca; Diamanti, Evangelia; Lombard, Patrick; Woodhouse, Steven; Göttgens, Berthold

doi:10.1038/nmeth.2643

Correspondence
Published: 27 September 2013

David Ruau^1,2,
Felicia S L Ng^1,2,
Nicola K Wilson^1,2,
Rebecca Hannah^1,2,
Evangelia Diamanti^1,2,
Patrick Lombard^1,2,
Steven Woodhouse^1,2 &
…
Berthold Göttgens^1,2

Nature Methods volume 10, page 926 (2013)

To the Editor:

One perhaps unintended consequence of the success of the human genome project has been a shift in the biomedical research funding landscape toward large-scale programs, commonly involving several hundred scientists and budgets of hundreds of millions of dollars. However, this emphasis on large-scale projects has been questioned, as illustrated by recent debates following last year's publications from the Encyclopedia of DNA Elements (ENCODE) project^1,2. Rather than making decisions ahead of time about what data sets should be generated for a given research community, as large-scale projects must do, we have explored an alternative approach, compiling all data sets produced by one such community as soon as they have been deposited in public databases. We demonstrate that the compendium size resulting from such real-time curation can exceed that of large-consortium efforts, thereby providing a highly topical contribution to the ongoing 'small science versus big science' debate.

We created HAEMCODE, a repository for transcription factor (TF)-binding maps in mouse blood cells; the maps are generated from chromatin immunoprecipitation followed by sequencing (ChIP-seq) data. Using a standardized analysis pipeline, we manually curated more than 300 TF ChIP-seq studies from a wide range of primary mouse hematopoietic cells and major cell line models. As of September 2013, the HAEMCODE compendium covered 84 TFs across 24 major blood cell types. Hemopoiesis is also a major focus of ENCODE, yet the currently available mouse ENCODE data (36 TFs; May 2013) cover less than half the HAEMCODE contents, with only 9 TFs investigated by ENCODE not available elsewhere.

We developed a Web interface (http://haemcode.stemcells.cam.ac.uk/) to provide data access as well as a range of online analysis tools that we designed to be useful to both experimentalists and computational biologists. In the classical use case, a user selects experiments within HAEMCODE before being directed to a workspace that offers precomputed options to inspect and/or download selected ChIP-seq data sets. Additional online tools can compute global similarity between selected experiments, investigate overrepresentation of a user-submitted gene list in any subset of ChIP-seq experiments^3, inspect precomputed results from de novo motif discovery and output all ChIP-seq experiments with binding peaks for a user-supplied gene locus.
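
To make two of these tools concrete, the following is a minimal sketch of how gene-list overrepresentation and global similarity between experiments could be computed. It is not the HAEMCODE implementation: the letter does not state which statistics are used, so the one-sided hypergeometric test and the Jaccard index are assumptions chosen because they are common for these tasks. The function names, the toy gene identifiers and the idea of representing each experiment as a set of target genes are all hypothetical.

```python
from math import comb


def overrepresentation_pvalue(gene_list, bound_genes, background):
    """One-sided hypergeometric P(X >= k) for a gene list versus one experiment.

    Assumed model (not taken from the letter): the user's genes are a random
    draw from `background`, and we ask how likely it is to see at least the
    observed number of genes that are also targets in the experiment.
    """
    background = set(background)
    gene_list = set(gene_list) & background      # ignore genes outside the background
    bound_genes = set(bound_genes) & background
    N = len(background)                          # population size
    K = len(bound_genes)                         # targets in the population
    n = len(gene_list)                           # size of the user's list
    k = len(gene_list & bound_genes)             # observed overlap
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    )
    return favourable / comb(N, n)


def jaccard_similarity(targets_a, targets_b):
    """Jaccard index between the target sets of two experiments."""
    a, b = set(targets_a), set(targets_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy data, purely illustrative.
background = [f"gene{i}" for i in range(20)]
experiment_a = ["gene0", "gene1", "gene2", "gene3", "gene4"]
experiment_b = ["gene3", "gene4", "gene5"]
user_genes = ["gene0", "gene1", "gene2", "gene10"]

p_value = overrepresentation_pvalue(user_genes, experiment_a, background)
similarity = jaccard_similarity(experiment_a, experiment_b)
print(f"overrepresentation p-value: {p_value:.4f}")
print(f"Jaccard similarity: {similarity:.3f}")
```

In practice the overrepresentation test would be repeated for every selected experiment, in which case the resulting p-values would need a multiple-testing correction.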

Integration of publicly available data represents a powerful approach to make novel discoveries across diseases, species and platforms that would be impossible to achieve from single projects^4. Successful completion of the HAEMCODE project on a small budget highlights this approach as a potentially widely applicable complement to multimillion-dollar research initiatives.

References

1. Alberts, B. Science 337, 1583 (2012).
2. The ENCODE Project Consortium. Nature 489, 57–74 (2012).
3. Joshi, A., Hannah, R., Diamanti, E. & Göttgens, B. Exp. Hematol. 41, 354–366 (2013).
4. Butte, A.J. & Kohane, I.S. Nat. Biotechnol. 24, 55–62 (2006).

Acknowledgements

This work was funded by the Biotechnology and Biological Sciences Research Council, Leukaemia and Lymphoma Research, the Medical Research Council (MRC), Cancer Research UK, the Cambridge National Institute for Health Research (NIHR) Biomedical Research Center and core support grants from the Wellcome Trust–MRC Cambridge Stem Cell Institute. F.S.L.N. is supported by a Yousef Jameel scholarship.

Author information

Authors and Affiliations

Department of Hematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
David Ruau, Felicia S L Ng, Nicola K Wilson, Rebecca Hannah, Evangelia Diamanti, Patrick Lombard, Steven Woodhouse & Berthold Göttgens
Wellcome Trust–Medical Research Council (MRC) Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
David Ruau, Felicia S L Ng, Nicola K Wilson, Rebecca Hannah, Evangelia Diamanti, Patrick Lombard, Steven Woodhouse & Berthold Göttgens

Corresponding authors

Correspondence to David Ruau or Berthold Göttgens.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Ruau, D., Ng, F., Wilson, N. et al. Building an ENCODE-style data compendium on a shoestring. Nat Methods 10, 926 (2013). https://doi.org/10.1038/nmeth.2643

Published: 27 September 2013
Issue Date: October 2013
DOI: https://doi.org/10.1038/nmeth.2643
