Next-generation genomics: an integrative approach

Key Points

  • The integration of transcriptomic, genetic, genomic, epigenetic and network interaction data is crucial for a unified view of biological processes and to advance our understanding of human disease and biology.

  • The genome sequence is a scaffold on which known annotations and experimental data can be assembled. It is useful to view these different levels of information together on a genome browser.

  • Data integration can be used to identify functional elements in the genome, explore the function of genetic variation and improve understanding of gene regulation.

  • Given large multidimensional data sets with minimal parameters, unsupervised learning techniques can be used to identify frequently occurring patterns in the data and therefore to suggest hypotheses.

  • Carefully designed computational experiments for supervised integration can be used to test hypotheses on a global scale. Other supervised approaches, such as Bayesian networks, can also generate hypotheses of function.

  • A range of online and stand-alone tools is available to bench scientists for tackling large-scale data sets.

  • Several analytical hurdles remain, which are being addressed by bioinformaticians.
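
As a rough illustration of the unsupervised approach named in the Key Points, the sketch below groups genomic regions by their signal across two data types using plain k-means. The signal values, the two unnamed marks and the choice of k-means are assumptions made for illustration only; the Review does not prescribe this particular algorithm.

```python
from math import dist

def kmeans(points, centers, n_iter=10):
    """Plain k-means: assign each point to its nearest center, then recompute centers."""
    centers = [tuple(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by Euclidean distance.
        labels = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        # Update step: mean of each cluster's members (keep the old center if empty).
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

# Hypothetical signal matrix: rows are genomic regions, columns are two marks.
regions = [(5.0, 0.5), (4.5, 0.8), (5.2, 0.3), (0.4, 4.8), (0.6, 5.1), (0.2, 4.4)]

# Seed the two clusters with one region from each apparent group (an assumption).
labels, centers = kmeans(regions, centers=[regions[0], regions[3]])
print(labels)
```

Regions that share a pattern end up with the same label, and each recurring pattern can then be treated as a hypothesis to test against annotations or further experiments.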

Abstract

Integrating results from diverse experiments is an essential process in our effort to understand the logic of complex systems, such as development, homeostasis and responses to the environment. With the advent of high-throughput methods — including genome-wide association (GWA) studies, chromatin immunoprecipitation followed by sequencing (ChIP–seq) and RNA sequencing (RNA–seq) — acquisition of genome-scale data has never been easier. Epigenomics, transcriptomics, proteomics and genomics each provide an insightful, and yet one-dimensional, view of genome function; integrative analysis promises a unified, global view. However, the large amount of information and diverse technology platforms pose multiple challenges for data access and processing. This Review discusses emerging issues and strategies related to data integration in the era of next-generation genomics.

Figure 1: Annotating the genome through detecting transcription-factor binding sites and histone-modification states.
Figure 2: Identification of regulatory SNPs.
Figure 3: Data visualization.
Figure 4: Flow chart for data analysis.

Acknowledgements

We apologize to those authors whose work we were unable to reference owing to limitations of space. R.D.H. is supported by a postdoctoral fellowship from the American Cancer Society. We acknowledge generous funding from the Ludwig Institute for Cancer Research, the US National Institutes of Health, the California Institute for Regenerative Medicine and the Juvenile Diabetes Research Foundation. We thank the anonymous reviewers for their valuable comments on earlier versions of this Review.

Author information

Corresponding author

Correspondence to Bing Ren.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

FURTHER INFORMATION

Bing Ren's homepage

1000 Genomes

Anno-J

The Cancer Genome Atlas

CEAS

CisGenome

Cytoscape

DAVID

ENCODE Project

Entrez Genome

Ensembl

Galaxy

Gene Ontology

Gene Set Enrichment Analysis

International HapMap Project

MACS

The MEME Suite

mouseNET

NCBI

NextBio

PubMed

Roadmap Epigenomics Project

STRING

UCSC Genome Browser

Glossary

Next-generation sequencing

Here, we define this as the use of established sequencing platforms, including the Illumina/Solexa Genome Analyzer, Roche/454 Genome Sequencer and Applied Biosystems SOLiD platforms, as well as newer platforms, such as the Helicos and Pacific Biosciences platforms.

Reduced representation bisulphite sequencing

This technique cuts genomic DNA with restriction enzymes to enrich for CG-rich regions, which are then converted through bisulphite treatment and sequenced with next-generation sequencing. Bisulphite treatment converts unmethylated C to uracil — which appears as T in sequencing reads — while leaving methylated C intact.
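
The conversion rule described above can be sketched in a few lines. This is a simplified illustration, not a real methylation caller: it assumes complete conversion, no sequencing errors, and a read that is already aligned to the same strand of the reference without gaps. The function names and the example sequence are hypothetical.

```python
def bisulphite_convert(seq, methylated):
    """Return the read expected after conversion.

    Unmethylated C is read as T; methylated C (positions in `methylated`) stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )

def call_methylation(reference, read):
    """For each C in the reference, report True if the read still shows C."""
    return {i: read[i] == "C" for i, base in enumerate(reference) if base == "C"}

reference = "ACGTCCGA"
# Assume the cytosines at positions 1 and 5 (0-based) are methylated.
read = bisulphite_convert(reference, methylated={1, 5})
calls = call_methylation(reference, read)
print(read, calls)
```

Comparing each reference cytosine with the corresponding read base recovers the methylation state: a retained C indicates methylation, and a T indicates an unmethylated cytosine.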

MeDIP–seq

Methylated DNA is immunoprecipitated with an antibody against methylated cytosine and then sequenced by next-generation sequencing.

MethylC–seq

(Also known as bisulphite conversion followed by sequencing (BS–seq).) Methylated DNA is identified by shotgun sequencing of bisulphite-converted DNA.

Sequence capture

This uses oligonucleotide microarrays or oligonucleotide-coupled beads to select regions of the genome, such as all exons (exome sequencing), for targeted sequencing.

RNA sequencing

(RNA–seq.) RNA isolated from cells is sequenced by next-generation sequencing after conversion to cDNA.

Nuclear run-on

An assay that directly measures the transcriptional activity of a gene by incorporation of labelled UTP into its mRNA.

Histones

Small, highly conserved basic proteins, found in the chromatin of all eukaryotic cells, which associate with DNA to form a nucleosome. The amino-terminal tails of histones are subject to various post-translational modifications.

Chromatin immunoprecipitation

A technique used to identify potential regulatory sequences by isolating soluble DNA chromatin extracts (complexes of DNA and protein) using antibodies that recognize specific DNA-binding proteins.

DNase I hypersensitivity site footprinting

An assay that identifies regions of the genome that lack nucleosome structure and are therefore readily degraded by the enzyme DNase I. Such regions tend to be associated with transcriptional activity. When coupled with sequencing, the ends of DNA fragments generated by treatment of chromatin with DNase I are sequenced.

HITS-CLIP

A technique similar to ChIP–seq in which proteins bound to RNA — such as splicing factors — are immunoprecipitated and the RNA fragments are sequenced.

Two-hybrid

An assay system in which one protein is fused to an activation domain and the other to a DNA-binding domain, and both fusion proteins are expressed in cells. Expression of a reporter gene indicates that the two proteins physically interact.

Epistatic miniarray profiles

These are created by screening the fitness of double mutants in a high-throughput manner. The results, when analysed as a whole, can reveal both positive and negative genetic interactions between genes and provide insights into biological pathways and protein–protein complexes in the cell.

Single-nucleotide variant

Sequence variations that include insertions and deletions in addition to base substitutions (which are known as SNPs).

Genomic imprinting

The epigenetic marking of a gene on the basis of parental origin, which results in monoallelic expression.

Cap analysis of gene expression

(CAGE.) The high-throughput sequencing of concatamers of DNA tags that are derived from the initial nucleotides of 5′ mRNA.

Formaldehyde-assisted isolation of regulatory elements followed by sequencing

(FAIRE–seq.) This technique isolates nucleosome-free regions of DNA from chromatin during phenol:chloroform extraction.

Discretization

The conversion of a continuous signal to a discrete signal.
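
A minimal sketch of discretization by fixed thresholds follows. The signal values and cut points are invented for illustration, and threshold binning is only one of several possible schemes. A value equal to a threshold is placed in the upper bin.

```python
from bisect import bisect_right

def discretize(signal, thresholds):
    """Map each continuous value to the index of the bin it falls in."""
    cuts = sorted(thresholds)
    # bisect_right counts how many cut points are <= x, which is the bin index.
    return [bisect_right(cuts, x) for x in signal]

# Hypothetical enrichment values for five genomic windows.
signal = [0.1, 2.7, 5.0, 9.3, 1.0]

# Two cut points give three levels: 0 (low), 1 (medium), 2 (high).
levels = discretize(signal, thresholds=[1.0, 5.0])
print(levels)
```

Discrete levels of this kind are the sort of input that methods such as Bayesian networks typically expect in place of raw continuous signal.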

About this article

Cite this article

Hawkins, R., Hon, G. & Ren, B. Next-generation genomics: an integrative approach. Nat Rev Genet 11, 476–486 (2010). https://doi.org/10.1038/nrg2795

  • DOI: https://doi.org/10.1038/nrg2795