VolcaNoseR is a web app for creating, exploring, labeling and sharing volcano plots

Comparative genome- and proteome-wide screens yield large amounts of data. To efficiently present such datasets and to simplify the identification of hits, the results are often presented in a type of scatterplot known as a volcano plot, which shows a measure of effect size versus a measure of significance. The data points with the largest effect size and a statistical significance beyond a user-defined threshold are considered as hits. Such hits are usually annotated in the plot by a label with their name. Volcano plots can represent ten thousands of data points, of which typically only a handful is annotated. The information of data that is not annotated is hardly or not accessible. To simplify access to the data and enable its re-use, we have developed an open source and online web tool with R/Shiny. The web app is named VolcaNoseR and it can be used to create, explore, label and share volcano plots (https://huygens.science.uva.nl/VolcaNoseR). When the data is stored in an online data repository, the web app can retrieve that data together with user-defined settings to generate a customized, interactive volcano plot. Users can interact with the data, adjust the plot and share their modified plot together with the underlying data. Therefore, VolcaNoseR increases the transparency and re-use of large comparative genome- and proteome-wide datasets.

The volcano plot visualizes complex datasets generated by genomic screening or proteomic approaches. It is essentially a scatter plot, in which the coordinates of data points are defined by effect size and statistical significance 1,2 . Volcano plots typically show the data of hundreds to ten thousands of genes or proteins. Examples of such datasets are gene expression changes measured by RNA-seq 3 , genome-wide loss-of-function CRISPR screens 4 , or mapping the interactome of proteins-of-interest by mass spectrometry 5 . Although volcano plots are based on rich datasets, only a handful of data points are usually labeled with a gene or protein name. This enables the visual identification of hits and simplifies the interpretation of the complex dataset. Nevertheless, the data points that are not annotated may be of equal interest. Therefore, it is highly desirable to have easy access to the information of all data points from such large datasets.
Volcano plots are typically generated using commercial software or with software that requires the user to write scripts. A viable alternative is provided by dedicated free web apps that allow users to generate plots through a graphical user interface (GUI). Several web apps are available 6,7 , but these do not generate interactive plots and only have limited options for customization and annotation. Moreover, there is currently no easy and straightforward way of sharing the volcano plot together with the data. Therefore, we decided to generate a web-based online tool for generating and sharing volcano plots, similar to other plotting apps that we previously generated 8,9 . Here, we report an open source web app for generating, exploring, labeling and sharing volcano plots. The web app is created with R/Shiny and is dubbed VolcaNoseR. Below we discuss the features of the app.
The GitHub page of VolcaNoseR is the preferred way to communicate issues and request features (https :// githu b.com/Joach imGoe dhart /Volca NoseR /issue s). Alternatively, the users can contact the developers by email or Twitter. Contact information is found on the "About" page of the app.

Data input and format
The data can be supplied via file upload. The accepted file formats are text (with extension CSV or TXT) and spreadsheets (with extension XLS or XLSX). Different delimiters are acceptable for the text format, including the Comma Separate Values (CSV) format. Upload of Excel workbooks with multiple sheets is also supported. Alternatively, a CSV file from an online data repository can be used through a URL.
A limitation of the app on the Huygens server (https ://huyge ns.scien ce.uva.nl/Volca NoseR ) is the file size of ~ 1 Mb. Larger files are accepted when the app is run locally from R (up to 10 Mb) or from the shinyapps webserver (https ://goedh art.shiny apps.io/Volca NoseR /).
To demonstrate the features of the app, example data is included of which the details can be found elsewhere 3,10 .
After data upload, the user selects the columns that hold the information on the fold change (for the x-coordinate) and the significance (for the y-coordinate). Selecting a column with gene or protein names is optional.

Data visualization
A typical volcano plot shows the log 2 of the fold change on the x-axis and minus log 10 of the p-value on the y-axis. The data is shown as dots and their size and transparency can be adjusted. The position of the individual points is defined by these coordinates. By hovering over the data points, the information about the data can be accessed immediately and dynamically. When the pointer (mouse) is near a data point, the x-and y-coordinate and the name is retrieved, providing the user with easy access to the underlying data. In some cases, it may be desirable to display a 90 degrees rotated volcano plot. This option is available and will depict the fold change on the y-axis and the significance on the x-axis.

Thresholds and hits
The user can set threshold values for the fold change and the significance. The threshold values are indicated by dashed lines in the plot and used to classify the data as 'unchanged' , 'decreased' or 'increased' . The data are colored according to this classification and this can be shown in a legend.
The 'top hits' can be automatically detected and ranked based on a number of criteria. The default criterion is the Manhattan distance (|ΔX| +|ΔY|) of the data from the origin (0,0). The other criteria are Euclidean distance (SQRT(ΔX 2 + ΔY 2 )), absolute fold change or significance. The data are sorted based on the selected criterion and the 10 top-ranking data points are selected. The number of top ranking data points can be adjusted by the user.
It is possible to annotate only 'increased' or 'decreased' or all significantly changed ('increased' and 'decreased') data points. The top-ranking hits are shown in the plot and there is an option to list them in a table. Finally, the user can manually search and select the names of genes or proteins of interest, which will be annotated in the plot and added to the table.
The standard colors to indicate 'unchanged' , 'increased' and 'decreased' are respectively grey, red and blue. Another color combination that is available is grey, blue and green. Users can also define their own color scheme.

Output
Users can customize the titles and sizes for the axes labels. The plot that is generated by the app can be directly retrieved by drag-and-drop from the web browser. In addition, the plot can be downloaded as a PNG or PDF file. The PNG is a lossless bitmap format. The PDF allows for downstream processing/editing with software that can handle vector-based graphics.

Sharing data and plot settings
All settings that are defined in the user interface can be stored as a URL, as was previously implemented for PlotsOfData and PlotTwist 8,9 . When the data is retrieved from an external online resource, this hyperlink is included in the URL. The URL with settings is sufficient to (1) launch the app, (2) retrieve the data, and (3) plot the data according to user-defined settings. Once the plot is available, it can be adjusted and a new URL reflecting the new settings can be obtained. This feature enables transparent reporting of all the data and simplifies re-use of the data (Fig. 1).
We illustrate this feature with data from proteomic screens that we have recently published 5 . These data are deposited and publicly available at the data repository zenodo.org, https ://doi.org/10.5281/zenod o.37131 74. Volcano plots are generated with VolcaNoseR using the data from the CSV files in the repository. Next, the URL that encodes all necessary information was generated using the 'clone current setting' button. With this unique URL, the data is retrieved and a plot is generated by VolcaNoseR based on the parameters that are stored in the URL. For instance, this URL produces an interactive plot of which a static version is shown in Fig. 2A www.nature.com/scientificreports/ Users can easily access the data and plot through this URL, inspect the data and replot it. Suppose that a user is interested in both showing and annotating increased and decreased proteins with more stringent threshold levels, the user can replot the data as shown in Fig. 2B. The URL can be copied and shared. This URL would be: https ://huyge ns.scien ce.uva.nl/Volca NoseR /?data=5;;Diffe rence _CSB_GFP;p_value _CSB_GFP;Gene_ names &vis=4;0.8;-3,3;3;signi fican t;manh&can=20;;;&layou t=;;;;;X;600;800&color =3;none&label =TRUE;GFP-CSB-WT%20vs%20GFP -NLS;TRUE;Enric hment %20(log2);Signi fican ce%20(-log10 %20p-value );;24;24;18;6;&url=https ://zenod o.org/recor d/37131 74/files /CSV_1-GFP-CSB-WT_vs_GFP-NLS.csv A list of settings that can be stored in the URL is available in a supplemental document (Supplementary information S1 text).

Data re-use
To demonstrate the re-use of data, we examined the results of a recently published genome-wide CRISPR-based proliferation screen in a retinal pigment epithelial (RPE1) cell line 4 . First, we retrieved the data of the 2D proliferation screens in wildtype and TP53 knockout cell lines (shown in Fig. 1 of that paper). The data of each of the screens was converted to a CSV file and deposited at zenodo.org, https ://doi.org/10.5281/zenod o.38436 85. Next, we used the CSV file as input for VolcaNoseR and inspected the volcano plot (Fig. 3). Given our interest in G protein-coupled receptor signaling 11 , we looked for components of this signaling module. The GNAS gene was among the significant hits in the 2D proliferation screens in both wildtype and TP53 knockout cells (Fig. 3A,B), suggesting that it has an antiproliferative role in RPE1 cells. This result is in line with recent work on GNAS in the context of sonic hedgehog signaling 12,13 . This finding nicely demonstrates that the re-use of data from genome-wide, CRISPR-based screens is an efficient way to generate or test hypotheses. Here, we show that the VolcaNoseR web tool can be used to mine current datasets and communicate new observations (Fig. 3), which can be easily shared through hyperlinks for re-use.

Conclusion
Volcano plots are data visualizations that can plot a large amount of information. Unfortunately, only a fraction of the data is labeled in static figures and, therefore, the vast majority of the information is inaccessible. To provide access to all of the data represented in a volcano plot, we developed an interactive online plotting tool. A unique feature of the web app that sets it apart from other software for making volcano plots is that VolcanoseR enables an easy and straightforward way of sharing the volcano plot together with the data. By hovering over the plot with a pointer, each data point can be inspected. In addition, user-defined candidates can be labeled in the plot and listed in a table. Together, these features enable access to all the information that the plot is based on. Finally, the web app can be used to share the data and the plot to allow other users to interact with the data and reuse it. Therefore, VolcaNoseR increases the transparency and re-use of large comparative genome-and proteome-wide datasets.

Data availability
All data and code is available in public repositories (GitHub and Zenodo) as referenced in the manuscript.
Received: 6 July 2020; Accepted: 19 October 2020 Figure 3. Volcano plots generated from a published genome-wide CRISPR screen dataset 4 show that GNAS has an antiproliferative role in RPE1 cell growth. The volcano plots were generated with VolcaNoseR using published data and setting the threshold for the Log2(Fold Change) and − Log10(q-value) to 2. Significant hits are depicted in red and reflect genes that have an antiproliferative function. The top five candidates and GNAS are labeled. (A) Results from a 2D proliferation assay on RPE1 cells, plot and data are accessible through this hyper link. (B) Results from a 2D proliferation assay on RPE1 TP53 −/− cells, plot and data are accessible through this hyper link.