BrowserGenome.org: web-based RNA-seq data analysis and visualization

Schmid-Burgk, Jonathan L; Hornung, Veit

doi:10.1038/nmeth.3615

Correspondence
Published: 29 October 2015

Jonathan L Schmid-Burgk¹ &
Veit Hornung¹

Nature Methods volume 12, page 1001 (2015)

To the Editor

Applications of deep-sequencing technologies in life science research and clinical diagnostics are rapidly expanding. Although fast data-processing algorithms exist¹, intuitive, portable data-evaluation solutions are still needed. Web tools have a history in bioinformatics of providing platform-independent, intuitive, barrier-free software solutions. Whereas in most scientific web tools a server performs intense calculations, the new HTML5 standard and the competition between web browser platforms have recently opened access to computational resources for web apps. However, so far web apps have been used only to visualize existing genome annotations or alignment data²,³. Here we describe BrowserGenome (http://www.BrowserGenome.org), a web-based deep-sequencing data-analysis platform offering barcode deconvolution, read mapping, real-time data visualization, transcript-count analysis and data normalization. BrowserGenome is specifically focused on the evaluation of mRNA-seq data, but it can easily be extended to other applications. BrowserGenome matches the speed and memory footprint of state-of-the-art software while being visually driven and intuitive to use.

Read-mapping, visualization and transcript-counting algorithms were implemented in JavaScript through adaptation of a non-overlapping q-gram indexing algorithm⁴, sorted data structures and random sampling³ (Supplementary Note 1 and Supplementary Figs. 1 and 2). The read-mapping strategy was specifically designed to allow quantification of gene expression in the limited web browser environment, without aims of splice-variant detection, calling of single-nucleotide polymorphisms or the evaluation of paired-end sequencing data, as offered by other software⁵. BrowserGenome uses raw sequencing data in FASTQ format or imports mapping results from other software in SAM format. It outputs binary or SAM-format mapping results or transcript-count tables. The graphical user interface displays the genome as a dynamic circle, with the mapping density displayed eccentrically (Fig. 1). The user navigates through the data using a mouse, with gestures similar to those used in web applications such as Google Maps. Reference gene names and exons are displayed at high zoom levels. Up to six hit-density tracks can be loaded in parallel. Wizard menus guide users through the read-mapping and transcript-counting processes (Supplementary Note 2).

**Figure 1: The BrowserGenome.org web application.**

To validate the performance of BrowserGenome, we analyzed a publicly available mRNA-seq data set from the ENCODE database⁶ (human HepG2 cells; data set ENCFF000DPK) on a standard laptop computer. We observed that 59.2% of 26.6 million raw reads were mapped to the human genome at a rate of 18 million reads per hour. The hit-density map could be navigated in real time, and normalized transcript counts were calculated in less than two seconds (Supplementary Table 1). Despite BrowserGenome's simple read-mapping algorithm, analyzing the same data with the established STAR⁵ software produced highly correlated transcript-count data (Pearson R = 0.974; Supplementary Fig. 3) and near-equal correlation coefficients between gene expression results and sequencing-independent gene expression data (Supplementary Fig. 4).

BrowserGenome's usability and accessibility compare favorably with those of other graphics-based RNA-seq evaluation tools (Supplementary Fig. 5). The core functions can be easily extended or incorporated into other web apps through a library interface (Supplementary Note 3). The platform-independent web app does not transfer any scientific data via the Internet and is open-source software under the terms of GNU General Public License version 2 without depending on third-party code.

References

1. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
2. Skinner, M.E., Uzilov, A.V., Stein, L.D., Mungall, C.J. & Holmes, I.H. JBrowse: a next-generation genome browser. Genome Res. 19, 1630–1638 (2009).
3. Miller, C.A., Qiao, Y., DiSera, T., D'Astous, B. & Marth, G.T. bam.iobio: a web-based, real-time, sequence alignment file inspector. Nat. Methods 11, 1189 (2014).
4. Ukkonen, E. Approximate string-matching with q-grams and maximal matches. Theor. Comput. Sci. 92, 191–211 (1992).
5. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
6. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

Acknowledgements

We thank H. Hauswedell (Institut für Informatik, Freie Universität Berlin, Berlin, Germany) for critical comments. This work was supported by grants from the German Research Foundation (EXC1023) and the European Research Council to V.H. and from the German National Academic Foundation to J.L.S.-B.

Author information

Authors and Affiliations

Institute of Molecular Medicine, University Hospital, University of Bonn, Bonn, Germany
Jonathan L Schmid-Burgk & Veit Hornung

Corresponding author

Correspondence to Jonathan L Schmid-Burgk.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Fast read-mapping algorithm of BrowserGenome.

(a) Indexing strategy: The genome sequence of interest is divided into non-overlapping 12-mers. A Hook table is generated that contains the genome position of the first occurrence of every possible 12-mer sequence. This table therefore has 4¹² entries and occupies 67 MB of RAM (4 bytes per entry). A second table, called a Jump table, lists for every genomic 12-mer position the position at which the same 12-mer next occurs in the genome. Its number of rows equals the genome length divided by 12; for the human genome it therefore occupies 1.1 GB of RAM (4 bytes per entry). (b) Fast sequence search: From a 25-mer search sequence, 12 overlapping 12-mers are extracted (left panel). For each 12-mer, all genomic occurrences are retrieved by looking them up in the Hook table and then iterating through the Jump table (right panel). At every 12-mer occurrence in the genome, the whole 25-mer search sequence is locally matched to the genome (100% identity, no gaps allowed).
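The Hook/Jump indexing scheme described in this legend can be sketched in a few lines. This is a minimal illustration only, not the BrowserGenome source: the function names `build_index` and `find_read` are hypothetical, a dictionary stands in for the fixed 4¹²-entry Hook array, and a small q-mer length is used in the example so the tables stay readable. With q = 12 and 25-mer reads the same logic applies, because any read of length at least 2q − 1 fully contains at least one grid-aligned q-mer.

```python
from typing import Dict, List, Tuple


def build_index(genome: str, q: int) -> Tuple[Dict[str, int], List[int]]:
    """Index the non-overlapping q-mers of `genome`.

    hook[kmer]    -> genome position of the first grid-aligned occurrence
    jump[pos // q] -> genome position of the next occurrence of the same
                      q-mer, or -1 if there is none
    (Assumption: a dict replaces the fixed-size Hook array of the legend.)
    """
    n_blocks = len(genome) // q
    hook: Dict[str, int] = {}
    jump: List[int] = [-1] * n_blocks
    last_block: Dict[str, int] = {}  # most recent block seen for each q-mer
    for block in range(n_blocks):
        pos = block * q
        kmer = genome[pos:pos + q]
        if kmer in last_block:
            jump[last_block[kmer]] = pos  # chain previous occurrence to this one
        else:
            hook[kmer] = pos  # first occurrence of this q-mer
        last_block[kmer] = block
    return hook, jump


def find_read(genome: str, hook: Dict[str, int], jump: List[int],
              read: str, q: int) -> List[int]:
    """Return all start positions where `read` matches exactly (no gaps)."""
    hits = set()
    for offset in range(q):  # q overlapping q-mers taken from the read
        kmer = read[offset:offset + q]
        pos = hook.get(kmer, -1)
        while pos != -1:
            start = pos - offset  # candidate read start in the genome
            if start >= 0 and genome[start:start + len(read)] == read:
                hits.add(start)
            pos = jump[pos // q]  # follow the chain to the next occurrence
    return sorted(hits)


genome = "ACGTTGCAAGGCTACGTTGCATT"
hook, jump = build_index(genome, 3)
print(find_read(genome, hook, jump, "CGTTGCA", 3))  # [1, 14]
```

The memory figure for the Hook table follows directly from the table size: 4¹² = 16,777,216 entries at 4 bytes each is 67,108,864 bytes, or about 67 MB.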

Supplementary Figure 2 Fast visualization and gene-counting algorithms of BrowserGenome.

(a) For visualization of hits in an exemplary viewing range spanning from position 50 to position 80, an unsorted hit list has to be scanned from top to bottom in order to filter visible hits (left panel). BrowserGenome instead uses sorted hit lists (right panel), for which only the first and the last visible hit entry need to be located. Localization works in O(log n) time by the algorithm described in Supplementary Note 1. Single comparison operations are color-coded in blue (equal to or greater than target) and green (smaller than target). (b) For counting the hit numbers in annotated exonic regions at a genome-wide scale, a naive algorithm would require testing every hit position against every annotated exon, which would be computationally intense (left panel). BrowserGenome therefore makes use of both sorted exon and hit lists (right panel), which allows genome-wide exon hit counting in seconds by the algorithm detailed in Supplementary Note 1.
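The benefit of sorted hit lists can be illustrated with a short sketch. The algorithm of Supplementary Note 1 is not reproduced here; as a stand-in, this example uses binary search from Python's standard `bisect` module, and the function names `visible_hits` and `count_exon_hits` as well as the inclusive exon coordinates are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple


def visible_hits(sorted_hits: Sequence[int], view_start: int,
                 view_end: int) -> List[int]:
    """Return the hits inside [view_start, view_end] from a sorted hit list.

    Only the first and last visible entries are located, each in O(log n).
    """
    first = bisect_left(sorted_hits, view_start)   # first hit >= view_start
    last = bisect_right(sorted_hits, view_end)     # one past last hit <= view_end
    return list(sorted_hits[first:last])


def count_exon_hits(sorted_hits: Sequence[int],
                    exons: Sequence[Tuple[int, int]]) -> List[int]:
    """Count hits per exon, with exons given as inclusive (start, end) pairs.

    Each exon costs two binary searches instead of a scan over all hits.
    """
    return [
        bisect_right(sorted_hits, end) - bisect_left(sorted_hits, start)
        for start, end in exons
    ]


hits = [12, 47, 50, 63, 63, 80, 95]
print(visible_hits(hits, 50, 80))                            # [50, 63, 63, 80]
print(count_exon_hits(hits, [(10, 50), (60, 70), (90, 100)]))  # [3, 2, 1]
```

With h hits and e exons, this approach needs on the order of e · log h comparisons rather than the e · h comparisons of the naive test. When the exon list is also sorted, as the legend describes, a single merge-style pass over both lists is another possible design.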

Supplementary Figure 3 RNA-seq data mapping performance comparison between STAR and BrowserGenome.org.

(a) The RNA-seq test data set ENCFF000DPK containing 26,642,287 raw deep-sequencing reads retrieved from human HepG2 cells was downloaded from encodeproject.org and was analyzed on two different computers using STAR 2.4.2a or BrowserGenome.org. (b) Correlation of gene expression quantification results from STAR and BrowserGenome.org using the same data set. Plotted are the absolute numbers of reads mapped to individual genes on a semi-logarithmic scale with added jitter. The Pearson correlation coefficient R was calculated after jitter addition and logarithmization.

Supplementary Figure 4 Correlation of transcript-quantification results of STAR and BrowserGenome with nanoString quantification data.

(a,b) ENCODE raw RNA-seq data set ENCFF000DPK was analyzed using STAR (a) or BrowserGenome.org (b) with default parameters. nCounter data of 52 exemplary genes were retrieved from ref. 3. Plotted are the decadic logarithms of the nCounter counts (x-axis) or RPKM values (y-axis) incremented by 10 and 0.1, respectively, in order to avoid infinite values for zero counts. Pearson correlation coefficients R were calculated after logarithmization.
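The transformation described in this legend can be written out as a small worked example. It is a sketch under stated assumptions: the standard RPKM definition (reads per kilobase of exon per million mapped reads) is assumed, since the exact normalization code of BrowserGenome is not shown here, and the helper names `rpkm`, `pearson_r` and `log_correlation` are hypothetical. The offsets of 10 and 0.1 are the ones given in the legend.

```python
import math
from typing import Sequence


def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Standard RPKM: reads per kilobase of exon per million mapped reads."""
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def log_correlation(ncounter_counts: Sequence[float],
                    rpkm_values: Sequence[float],
                    x_offset: float = 10.0,
                    y_offset: float = 0.1) -> float:
    """Pearson R after adding the offsets and taking base-10 logarithms.

    The offsets keep log10 finite when a count or RPKM value is zero.
    """
    xs = [math.log10(c + x_offset) for c in ncounter_counts]
    ys = [math.log10(v + y_offset) for v in rpkm_values]
    return pearson_r(xs, ys)


# 1,000 reads on 2,000 bp of exon with 10 million mapped reads
print(rpkm(1000, 2000, 10_000_000))  # 50.0
```

The input values in the example are made up for illustration and are not taken from the data set analyzed in the figure.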

Supplementary Figure 5 Feature comparison of BrowserGenome with three established graphics-based RNA-seq data-evaluation software tools.

(a) Current versions of Galaxy, CLC genomics workbench, and Chipster were compared to BrowserGenome with regard to the features given in the first column.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5, Supplementary Notes 1–3 (PDF 2039 kb)

Supplementary Table 1

Normalized transcript counts calculated from RNA-seq data of human HepG2 cells using BrowserGenome. (XLSX 1040 kb)

About this article

Cite this article

Schmid-Burgk, J., Hornung, V. BrowserGenome.org: web-based RNA-seq data analysis and visualization. Nat Methods 12, 1001 (2015). https://doi.org/10.1038/nmeth.3615

Issue Date: November 2015
DOI: https://doi.org/10.1038/nmeth.3615
