Abstract
Comparative genome- and proteome-wide screens yield large amounts of data. To efficiently present such datasets and to simplify the identification of hits, the results are often presented in a type of scatterplot known as a volcano plot, which shows a measure of effect size versus a measure of significance. The data points with the largest effect size and a statistical significance beyond a user-defined threshold are considered as hits. Such hits are usually annotated in the plot by a label with their name. Volcano plots can represent tens of thousands of data points, of which typically only a handful are annotated. Information on data points that are not annotated is difficult or impossible to access. To simplify access to the data and enable its re-use, we have developed an open source and online web tool with R/Shiny. The web app is named VolcaNoseR and it can be used to create, explore, label and share volcano plots (https://huygens.science.uva.nl/VolcaNoseR). When the data is stored in an online data repository, the web app can retrieve that data together with user-defined settings to generate a customized, interactive volcano plot. Users can interact with the data, adjust the plot and share their modified plot together with the underlying data. Therefore, VolcaNoseR increases the transparency and re-use of large comparative genome- and proteome-wide datasets.
Introduction
The volcano plot visualizes complex datasets generated by genomic screening or proteomic approaches. It is essentially a scatter plot, in which the coordinates of data points are defined by effect size and statistical significance1,2. Volcano plots typically show the data of hundreds to tens of thousands of genes or proteins. Examples of such datasets are gene expression changes measured by RNA-seq3, genome-wide loss-of-function CRISPR screens4, or the interactome of proteins-of-interest mapped by mass spectrometry5. Although volcano plots are based on rich datasets, only a handful of data points are usually labeled with a gene or protein name. This enables the visual identification of hits and simplifies the interpretation of the complex dataset. Nevertheless, the data points that are not annotated may be of equal interest. Therefore, it is highly desirable to have easy access to the information of all data points from such large datasets.
Volcano plots are typically generated using commercial software or with software that requires the user to write scripts. A viable alternative is provided by dedicated free web apps that allow users to generate plots through a graphical user interface (GUI). Several web apps are available6,7, but these do not generate interactive plots and only have limited options for customization and annotation. Moreover, there is currently no easy and straightforward way of sharing the volcano plot together with the data. Therefore, we decided to generate a web-based online tool for generating and sharing volcano plots, similar to other plotting apps that we previously generated8,9. Here, we report an open source web app for generating, exploring, labeling and sharing volcano plots. The web app is created with R/Shiny and is dubbed VolcaNoseR. Below we discuss the features of the app.
Availability, code and issue reporting
The VolcaNoseR web tool is available at https://huygens.science.uva.nl/VolcaNoseR or, as long as the bandwidth limit is not reached, at https://goedhart.shinyapps.io/VolcaNoseR/.
The code was written using R (https://www.r-project.org) and RStudio (https://www.rstudio.com). To run the app, several freely available packages are required: shiny, ggplot2, magrittr, dplyr, ggrepel, shinycssloaders, DT, RCurl and readxl. The code of version 1.0.3 reported in this manuscript is archived at Zenodo.org: https://doi.org/10.5281/zenodo.4002791.
Up-to-date code and new releases will be made available on GitHub, together with information on running the app locally: https://github.com/JoachimGoedhart/VolcaNoseR.
The GitHub page of VolcaNoseR is the preferred way to report issues and request features (https://github.com/JoachimGoedhart/VolcaNoseR/issues). Alternatively, users can contact the developers by email or Twitter. Contact information is found on the “About” page of the app.
Data input and format
The data can be supplied via file upload. The accepted file formats are text (with extension CSV or TXT) and spreadsheets (with extension XLS or XLSX). Different delimiters are acceptable for the text format, including the Comma-Separated Values (CSV) format. Upload of Excel workbooks with multiple sheets is also supported. Alternatively, a CSV file from an online data repository can be used through a URL.
The app on the Huygens server (https://huygens.science.uva.nl/VolcaNoseR) limits the file size to ~1 MB. Larger files are accepted when the app is run locally from R (up to 10 MB) or from the shinyapps webserver (https://goedhart.shinyapps.io/VolcaNoseR/).
To demonstrate the features of the app, example data are included; details of these datasets can be found elsewhere3,10.
After data upload, the user selects the columns that hold the information on the fold change (for the x-coordinate) and the significance (for the y-coordinate). Selecting a column with gene or protein names is optional.
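The column selection described above can be illustrated with a minimal Python sketch. It is not the R code of the app. The function names, the column names in the example and the delimiter heuristic (pick the candidate that occurs most often in the header line) are assumptions made for illustration only.

```python
import csv
import io

def guess_delimiter(header, candidates=(",", "\t", ";")):
    """Pick the candidate delimiter that occurs most often in the header line."""
    return max(candidates, key=header.count)

def read_table(text, fold_col, sig_col, name_col=None):
    """Parse delimited text and keep only the user-selected columns.

    fold_col -> x-coordinate, sig_col -> y-coordinate, name_col is optional.
    """
    delimiter = guess_delimiter(text.splitlines()[0])
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for record in reader:
        rows.append({
            "x": float(record[fold_col]),
            "y": float(record[sig_col]),
            "name": record[name_col] if name_col else None,
        })
    return rows

# Hypothetical tab-delimited input with made-up values.
example = "Gene\tlog2FC\tminus_log10_p\ngeneA\t-1.2\t4.5\ngeneB\t0.3\t0.7\n"
rows = read_table(example, "log2FC", "minus_log10_p", "Gene")
```

Keeping the name column optional mirrors the text: a plot can be drawn from the two numeric columns alone, and names are only needed for labels.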
Data visualization
A typical volcano plot shows the log2 of the fold change on the x-axis and minus log10 of the p-value on the y-axis. The data are shown as dots whose position is defined by these coordinates and whose size and transparency can be adjusted. Hovering over the data points gives immediate, dynamic access to the underlying information: when the pointer (mouse) is near a data point, its x- and y-coordinates and its name are retrieved. In some cases, it may be desirable to rotate the volcano plot by 90 degrees. This option is available and depicts the fold change on the y-axis and the significance on the x-axis.
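As a worked illustration of these axes, the sketch below converts a raw fold change and p-value into volcano plot coordinates. It assumes that the input holds untransformed values, that the fold change is positive and that the p-value is greater than zero. The function name and the `rotated` flag are hypothetical.

```python
import math

def volcano_coordinates(fold_change, p_value, rotated=False):
    """Return (x, y) for one data point.

    x = log2(fold change), y = -log10(p-value).
    With rotated=True the axes are swapped, as in a plot rotated by 90 degrees.
    """
    effect = math.log2(fold_change)
    significance = -math.log10(p_value)
    return (significance, effect) if rotated else (effect, significance)

# A 4-fold increase with p = 0.001 maps to approximately (2, 3).
x, y = volcano_coordinates(4.0, 0.001)
```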
Thresholds and hits
The user can set threshold values for the fold change and the significance. The threshold values are indicated by dashed lines in the plot and used to classify the data as ‘unchanged’, ‘decreased’ or ‘increased’. The data are colored according to this classification and this can be shown in a legend.
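The classification can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the app: the fold change threshold is taken to be symmetric around zero, the comparisons are inclusive, and the default values are arbitrary.

```python
def classify(x, y, fc_threshold=1.5, sig_threshold=2.0):
    """Classify a point from its log2 fold change (x) and -log10 p-value (y).

    Assumed rule: a point is a hit only if it passes BOTH thresholds.
    """
    if y >= sig_threshold and x >= fc_threshold:
        return "increased"
    if y >= sig_threshold and x <= -fc_threshold:
        return "decreased"
    return "unchanged"

# Made-up points: (log2 fold change, -log10 p-value).
labels = [classify(x, y) for x, y in [(2.0, 3.0), (-2.0, 3.0), (2.0, 1.0)]]
```

A point with a large fold change but low significance, such as the third one, stays 'unchanged' because it fails the significance threshold.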
The ‘top hits’ can be automatically detected and ranked based on a number of criteria. The default criterion is the Manhattan distance (|ΔX| + |ΔY|) of the data from the origin (0,0). The other criteria are Euclidean distance (SQRT(ΔX² + ΔY²)), absolute fold change or significance. The data are sorted based on the selected criterion and the 10 top-ranking data points are selected. The number of top-ranking data points can be adjusted by the user.
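The four ranking criteria can be written compactly, as in the Python sketch below. The dictionary keys, the function name and the example points are hypothetical. The sketch ranks all points that it is given, so any restriction to ‘increased’ or ‘decreased’ points is assumed to happen beforehand.

```python
import math

# Score functions for a point at (x, y), measured from the origin (0, 0).
CRITERIA = {
    "manhattan": lambda x, y: abs(x) + abs(y),
    "euclidean": lambda x, y: math.hypot(x, y),
    "fold_change": lambda x, y: abs(x),
    "significance": lambda x, y: y,
}

def top_hits(points, criterion="manhattan", n=10):
    """Sort points by the chosen criterion (largest first) and keep the top n."""
    score = CRITERIA[criterion]
    ranked = sorted(points, key=lambda p: score(p["x"], p["y"]), reverse=True)
    return ranked[:n]

# Made-up points for illustration.
points = [
    {"name": "a", "x": -1.2, "y": 4.5},
    {"name": "b", "x": 3.0, "y": 2.0},
    {"name": "c", "x": 0.1, "y": 0.5},
    {"name": "d", "x": -4.0, "y": 1.5},
]
best_two = [p["name"] for p in top_hits(points, n=2)]
```

The choice of criterion changes the outcome: point "d" ranks first by absolute fold change, whereas point "a" ranks first by Manhattan distance.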
It is possible to annotate only ‘increased’ or ‘decreased’ or all significantly changed (‘increased’ and ‘decreased’) data points. The top-ranking hits are shown in the plot and there is an option to list them in a table. Finally, the user can manually search and select the names of genes or proteins of interest, which will be annotated in the plot and added to the table.
The standard colors to indicate ‘unchanged’, ‘increased’ and ‘decreased’ are respectively grey, red and blue. Another color combination that is available is grey, blue and green. Users can also define their own color scheme.
Output
Users can customize the titles and sizes for the axes labels. The plot that is generated by the app can be directly retrieved by drag-and-drop from the web browser. In addition, the plot can be downloaded as a PNG or PDF file. The PNG is a lossless bitmap format. The PDF allows for downstream processing/editing with software that can handle vector-based graphics.
Sharing data and plot settings
All settings that are defined in the user interface can be stored as a URL, as was previously implemented for PlotsOfData and PlotTwist8,9. When the data is retrieved from an external online resource, this hyperlink is included in the URL. The URL with settings is sufficient to (1) launch the app, (2) retrieve the data, and (3) plot the data according to user-defined settings. Once the plot is available, it can be adjusted and a new URL reflecting the new settings can be obtained. This feature enables transparent reporting of all the data and simplifies re-use of the data (Fig. 1).
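The principle of storing settings in a URL can be sketched with a query string. The parameter names below (`data_url`, `fc_threshold`, `top_n`) and the example data location are hypothetical and are not the names used by VolcaNoseR, whose actual settings are listed in its supplementary documentation. The sketch only shows the round trip between settings and URL.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://huygens.science.uva.nl/VolcaNoseR/"

def settings_to_url(settings, base_url=BASE_URL):
    """Encode a dict of settings as the query string of a shareable URL."""
    return base_url + "?" + urlencode(settings)

def url_to_settings(url):
    """Recover the settings dict from a URL created by settings_to_url."""
    query = urlsplit(url).query
    # parse_qs returns a list per key; each key is used once here.
    return {key: values[0] for key, values in parse_qs(query).items()}

# Hypothetical settings, including the location of the data file.
settings = {
    "data_url": "https://example.org/data.csv",
    "fc_threshold": "1.5",
    "top_n": "10",
}
shared = settings_to_url(settings)
restored = url_to_settings(shared)
```

Because the data location travels with the plot settings, a recipient of the URL needs no separate file to reproduce the plot.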
We illustrate this feature with data from proteomic screens that we have recently published5. These data are deposited and publicly available at the data repository zenodo.org, https://doi.org/10.5281/zenodo.3713174. Volcano plots were generated with VolcaNoseR using the data from the CSV files in the repository. Next, the URL that encodes all necessary information was generated using the ‘clone current setting’ button. With this unique URL, the data is retrieved and a plot is generated by VolcaNoseR based on the parameters that are stored in the URL. For instance, this URL produces an interactive plot of which a static version is shown in Fig. 2A:
Users can easily access the data and plot through this URL, inspect the data and replot it. For example, a user who is interested in both showing and annotating increased and decreased proteins with more stringent threshold levels can replot the data as shown in Fig. 2B. The URL can be copied and shared. This URL would be:
A list of settings that can be stored in the URL is available in a supplemental document (Supplementary information S1 text).
Data re-use
To demonstrate the re-use of data, we examined the results of a recently published genome-wide CRISPR-based proliferation screen in a retinal pigment epithelial (RPE1) cell line4. First, we retrieved the data of the 2D proliferation screens in wildtype and TP53 knockout cell lines (shown in Fig. 1 of that paper). The data of each of the screens was converted to a CSV file and deposited at zenodo.org, https://doi.org/10.5281/zenodo.3843685. Next, we used the CSV file as input for VolcaNoseR and inspected the volcano plot (Fig. 3). Given our interest in G protein-coupled receptor signaling11, we looked for components of this signaling module. The GNAS gene was among the significant hits in the 2D proliferation screens in both wildtype and TP53 knockout cells (Fig. 3A,B), suggesting that it has an antiproliferative role in RPE1 cells. This result is in line with recent work on GNAS in the context of sonic hedgehog signaling12,13. This finding nicely demonstrates that the re-use of data from genome-wide, CRISPR-based screens is an efficient way to generate or test hypotheses. Here, we show that the VolcaNoseR web tool can be used to mine current datasets and communicate new observations (Fig. 3), which can be easily shared through hyperlinks for re-use.
Conclusion
Volcano plots are data visualizations that can plot a large amount of information. Unfortunately, only a fraction of the data is labeled in static figures and, therefore, the vast majority of the information is inaccessible. To provide access to all of the data represented in a volcano plot, we developed an interactive online plotting tool. A unique feature that sets VolcaNoseR apart from other software for making volcano plots is that it enables an easy and straightforward way of sharing the volcano plot together with the data.
By hovering over the plot with a pointer, each data point can be inspected. In addition, user-defined candidates can be labeled in the plot and listed in a table. Together, these features enable access to all the information that the plot is based on. Finally, the web app can be used to share the data and the plot to allow other users to interact with the data and reuse it. Therefore, VolcaNoseR increases the transparency and re-use of large comparative genome- and proteome-wide datasets.
Data availability
All data and code are available in public repositories (GitHub and Zenodo) as referenced in the manuscript.
References
1. Li, W. Volcano plots in analyzing differential expressions with mRNA microarrays. J. Bioinform. Comput. Biol. 10, 1231003 (2012).
2. Cui, X. & Churchill, G. A. Statistical tests for differential expression in cDNA microarray experiments. Genome Biol. 4, 210 (2003).
3. Becares, N. et al. Impaired LXRα phosphorylation attenuates progression of fatty liver disease. Cell Rep. 26, 984-995.e6 (2019).
4. Drainas, A. P. et al. Genome-wide screens implicate loss of cullin ring ligase 3 in persistent proliferation and genome instability in TP53-deficient cells. Cell Rep. 31, 107465 (2020).
5. van der Weegen, Y. et al. The cooperative action of CSB, CSA, and UVSSA target TFIIH to DNA damage-stalled RNA polymerase II. Nat. Commun. 11, 2104 (2020).
6. Singh, S., Hein, M. Y. & Stewart, A. F. msVolcano: a flexible web application for visualizing quantitative proteomics data. bioRxiv 38356 (2016). https://doi.org/10.1101/038356
7. Naumov, V., Balashov, I., Lagutin, V., Borovikov, P. & Alexeev, A. VolcanoR: web service to produce volcano plots and do basic enrichment analysis. bioRxiv 165100 (2017). https://doi.org/10.1101/165100
8. Goedhart, J. PlotTwist: a web app for plotting and annotating continuous data. PLOS Biol. 18, e3000581 (2020).
9. Postma, M. & Goedhart, J. PlotsOfData: a web app for visualizing data together with their summaries. PLOS Biol. 17, e3000202 (2019).
10. Gillingham, A. K., Bertram, J., Begum, F. & Munro, S. In vivo identification of GTPase interactors by mitochondrial relocalization and proximity biotinylation. Elife 8, e45916 (2019).
11. Chavez-Abiega, S., Goedhart, J. & Bruggeman, F. J. Physical biology of GPCR signalling dynamics inferred from fluorescence spectroscopy and imaging. Curr. Opin. Struct. Biol. 55, 204–211 (2019).
12. Pusapati, G. V. et al. CRISPR screens uncover genes that regulate target cell sensitivity to the morphogen sonic hedgehog. Dev. Cell 44, 113-129.e8 (2018).
13. Pusapati, G. V. et al. G protein–coupled receptors control the sensitivity of cells to the morphogen Sonic Hedgehog. Sci. Signal. 11, eaao5749 (2018).
Acknowledgements
Some of the VolcaNoseR code is taken from PlotTwist and it is partially inspired by the VolcanoR app (https://github.com/vovalive/volcanoR). We are grateful to Graham Dellaire (Dalhousie University, Canada) and Inés Pineda-Torra (University College London, UK) for their input and thank Auke Folkerts (UvA, The Netherlands) for help with the server that runs Shiny. The feedback, suggestions, enthusiastic responses and example plots that are shared on Twitter with @joachimgoedhart and @luijsterburglab are highly appreciated.
Contributions
J.G. and M.S.L. conceived the project and co-wrote the manuscript. J.G. wrote the code for the webtool.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Goedhart, J., Luijsterburg, M.S. VolcaNoseR is a web app for creating, exploring, labeling and sharing volcano plots. Sci Rep 10, 20560 (2020). https://doi.org/10.1038/s41598-020-76603-3
DOI: https://doi.org/10.1038/s41598-020-76603-3