Epiviz: interactive visual analytics for functional genomics data

Journal name: Nature Methods
Volume: 11
Pages: 938–940
DOI: 10.1038/nmeth.3038

Visualization is an integral aspect of genomics data analysis. Algorithmic-statistical analysis and interactive visualization are most effective when used iteratively. Epiviz (http://epiviz.cbcb.umd.edu/), a web-based genome browser, and the Epivizr Bioconductor package allow interactive, extensible and reproducible visualization within a state-of-the-art data-analysis platform.

At a glance

Figures

  1. Figure 1: Screenshot of visualization of chromosome 11 region of colon cancer methylome using Epiviz.

    Views by feature (top): gene expression across multiple tissues (top left). Each cell of the graph shows the degree of gene expression based on the Gene Expression Barcode project (ref. 7). The highlighted region (yellow boxes) shows the brushing feature linking all charts by spatial location. Difference in gene expression between colon tumor and normal (M) versus average expression (A) for colon normal and colon tumor as an MA plot (top right) shows genes in view region that are differentially expressed. Views by location (bottom): UCSC genome browser gene models (genes), hypermethylation and hypomethylation blocks showing long regions of methylation difference in colon cancer (ref. 13), partially methylated domains in fibroblast (PMDs) (ref. 15), and methylation colon cancer and normal showing base-pair-resolution smoothed methylation log ratio from sequencing of bisulfite-converted DNA (ref. 13). (This workspace can be accessed at http://epiviz.cbcb.umd.edu/?ws=cDx4eNK96Ws.)

  2. Figure 2: Screenshots for integrative analysis of Illumina HumanMethylation450 BeadChip data and exon-level RNA-seq data using Epivizr.

    View by feature (top): Difference between colon tumor and normal exon-level expression (M) versus average colon tumor and normal exon-level expression (A) as an MA plot of RNA-seq data from the TCGA project. View by location (bottom): annotation tracks from UCSC genome browser (genes and CpG islands), long regions of methylation difference obtained from sequencing data (hypermethylation and hypomethylation blocks) (ref. 13), and methylation difference regions obtained from TCGA data using the HumanMethylation450 BeadChip (450k colon_blocks).

  3. Supplementary Fig. 1: The Epiviz architecture.

    Presentation, visualizations and data representations are distinct. This allows Epiviz to reuse visualizations regardless of data source (Epiviz server, or WebSocket connection through Epivizr). Data providers and visualizations can be plugged in on the fly using the Epiviz plugin API.

  4. Supplementary Fig. 2: Chart load times with and without cache.

    Average comparison of time taken by ‘add chart’ and ‘navigate’ operations per 1,000 data objects with and without using the predictive cache in the Epiviz data management tier.

  5. Supplementary Fig. 3: Chart draw times for different parameter values.

    A comparison of draw times when varying specific chart parameters for Scatter Plot and Blocks Track. The parameter for the scatter plot is “circle ratio”, which splits the chart into a grid of squares of width equal to this parameter and draws at most one circle in each cell of the grid. All data objects that overlap this point are mapped to the single circle displayed. The parameter for block tracks is the minimum distance in screen pixels between two blocks before they are merged into one display object. Again, all data objects merged are mapped to the single display object. The data-to-visual-object mapping is used for brushing, tooltips and other interactivity actions.

  6. Supplementary Fig. 4: A comparison of draw times when varying specific chart parameters for Heatmap Plot and Lines Track.

    The parameter for heatmap is the maximum number of columns to be drawn by the heat map before multiple columns are averaged into one. All data objects that are merged are mapped to the single column displayed. The data to visual object mapping is used for brushing, tooltips and other interactivity actions. The parameter for line tracks is the maximum number of points drawn. If the number of data points is greater than this parameter, the required number of points are sampled uniformly.

  7. Supplementary Fig. 5: Gene expression analysis of colon cancer methylation loss regions with Epiviz.

    A) We used the Epiviz computed columns feature to define an MA plot of colon cancer expression in the MMP gene family region (Figure 1). B) Gene expression barcode data for the same region show similar expression patterns across multiple cancer types. Both of these plots were saved as PDFs directly from Epiviz.

  8. Supplementary Fig. 6: Comparison of hypomethylation block finding methods.

    We compare hypomethylation blocks inferred using BSmooth on whole-genome bisulfite sequencing with blocks inferred with minfi on Illumina HumanMethylation450k beadarray data. In this plot we show the regions found along with smoothed bp-level mean methylation (for BSmooth) and probe-level mean methylation (aggregated over CpG clusters for minfi) data. The block-finding method used in minfi ignores methylation measurements in CpG islands by design, so that long blocks of methylation change span across CpG islands. BSmooth does not use this design, so its blocks are frequently punctuated by CpG islands. We see this effect in this specific integrative visualization using Epivizr, where the only difference between the hypomethylation blocks is the punctuation at the CpG island for the BSmooth block.

  9. Supplementary Fig. 7: The spatial distribution of genes in correlation with hypomethylated blocks.

    Visualizing genes and corresponding exons side by side with methylation levels in normal and cancer tissues using Epiviz confirms that hypomethylated blocks are gene-poor.

  10. Supplementary Fig. 8: Exon-level expression in differentially methylated regions.

    The track-based visualization of exon-level expression data, side by side with a view of DNA methylation and one of differentially methylated blocks reveals that at low resolution, exons tend to be silenced within blocks, and highly expressed outside.
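The display-reduction parameters described in the captions for Supplementary Figs. 3 and 4 can be illustrated with a minimal Python sketch. This is not the Epiviz implementation (Epiviz itself is a web application); the function names, argument names and data layouts below are hypothetical and chosen only to mirror the four behaviors the captions describe: one circle per grid cell, merging of nearby blocks, averaging of heatmap columns, and uniform sampling of line points. In each case the sketch keeps, or could keep, the mapping from display object back to the underlying data objects, which the captions say is used for brushing and tooltips.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) position in screen pixels
Block = Tuple[int, int]      # (start, end) in screen pixels, start <= end


def bin_points(points: List[Point], cell: float) -> Dict[Tuple[int, int], List[int]]:
    """Scatter plot: group point indices by grid cell of width `cell`.

    Each key stands for one drawn circle; the value lists the data
    objects mapped to that circle.
    """
    cells: Dict[Tuple[int, int], List[int]] = {}
    for i, (x, y) in enumerate(points):
        key = (int(x // cell), int(y // cell))
        cells.setdefault(key, []).append(i)
    return cells


def merge_blocks(blocks: List[Block], min_gap: int) -> List[Tuple[int, int, List[int]]]:
    """Blocks track: merge blocks whose gap is below `min_gap` pixels.

    Returns one (start, end, member_indices) tuple per display object.
    """
    merged: List[Tuple[int, int, List[int]]] = []
    for i in sorted(range(len(blocks)), key=lambda k: blocks[k][0]):
        start, end = blocks[i]
        if merged and start - merged[-1][1] < min_gap:
            s, e, members = merged[-1]
            merged[-1] = (s, max(e, end), members + [i])
        else:
            merged.append((start, end, [i]))
    return merged


def average_columns(values: List[float], max_columns: int) -> List[float]:
    """Heatmap row: average adjacent columns so at most `max_columns` remain."""
    n = len(values)
    if n <= max_columns:
        return list(values)
    out = []
    for c in range(max_columns):
        chunk = values[c * n // max_columns:(c + 1) * n // max_columns]
        out.append(sum(chunk) / len(chunk))
    return out


def sample_uniform(values: List[float], max_points: int) -> List[float]:
    """Lines track: keep at most `max_points` evenly spaced points."""
    n = len(values)
    if n <= max_points:
        return list(values)
    return [values[i * n // max_points] for i in range(max_points)]
```

All four functions assume a positive parameter value. Larger cell widths or gap thresholds, and smaller column or point limits, reduce the number of objects drawn, which is the trade-off the draw-time comparisons in those figures explore.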

References

  1. Bostock, M., Ogievetsky, V. & Heer, J. IEEE Trans. Vis. Comput. Graph. 17, 2301–2309 (2011).
  2. Stolte, C., Tang, D. & Hanrahan, P. Commun. ACM 51, 75–84 (2008).
  3. Lister, R. et al. Cell 133, 523–536 (2008).
  4. Zhou, X. et al. Nat. Methods 8, 989–990 (2011).
  5. Gentleman, R.C. et al. Genome Biol. 5, R80 (2004).
  6. Yi, J.S., Kang, Y.A., Stasko, J. & Jacko, J. IEEE Trans. Vis. Comput. Graph. 13, 1224–1231 (2007).
  7. McCall, M.N., Uppal, K., Jaffee, H.A., Zilliox, M.J. & Irizarry, R.A. Nucleic Acids Res. 39, D1011–D1015 (2011).
  8. Karolchik, D. et al. Nucleic Acids Res. 36, D773–D779 (2008).
  9. Hubbard, T.J.P. et al. Nucleic Acids Res. 37, D690–D697 (2009).
  10. Durinck, S. et al. Bioinformatics 21, 3439–3440 (2005).
  11. Anders, S. & Huber, W. Genome Biol. 11, R106 (2010).
  12. Lawrence, M. et al. PLoS Comput. Biol. 9, e1003118 (2013).
  13. Hansen, K.D. et al. Nat. Genet. 43, 768–775 (2011).
  14. Paulson, J.N., Stine, O.C., Bravo, H.C. & Pop, M. Nat. Methods 10, 1200–1202 (2013).
  15. Lister, R. et al. Nature 462, 315–322 (2009).
  16. Aryee, M.J. et al. Bioinformatics 30, 1363–1369 (2014).
  17. Cancer Genome Atlas Network. Nature 487, 330–337 (2012).
  18. Goecks, J. et al. BMC Genomics 14, 397 (2013).
  19. Miller, C.A., Anthony, J., Meyer, M.M. & Marth, G. Bioinformatics 29, 381–383 (2013).

Author information

Affiliations

  1. Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA.

    • Florin Chelaru,
    • Llewellyn Smith,
    • Naomi Goldstein &
    • Héctor Corrada Bravo
  2. Department of Computer Science, University of Maryland, College Park, Maryland, USA.

    • Florin Chelaru &
    • Héctor Corrada Bravo
  3. Department of Mathematics, Williams College, Williamstown, Massachusetts, USA.

    • Llewellyn Smith
  4. Department of Computer Science, Williams College, Williamstown, Massachusetts, USA.

    • Llewellyn Smith
  5. Department of Mechanical Engineering and Materials Science, Washington University in St. Louis, St. Louis, Missouri, USA.

    • Naomi Goldstein

Contributions

H.C.B. conceived the project. F.C. and H.C.B. designed the project. F.C., L.S., N.G. and H.C.B. wrote the Epiviz and Epivizr software. F.C., L.S. and H.C.B. analyzed data. H.C.B. and F.C. wrote the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (2,187 KB)

    Supplementary Figures 1–8 and Supplementary Note