
  • Correspondence

Galaxy External Display Applications: closing a dataflow interoperability loop


Fig. 1: Anatomy of a basic Galaxy External Display Application.

Data availability

A small example BAM dataset displayed at IOBIO as shown in Fig. 1 is available from https://usegalaxy.org/u/dan/h/very-important-experiment-geda-nat-meth-shared.

Code availability

Code described here has been integrated into the Galaxy codebase and released under the Academic Free License (AFL) v. 3.0 (https://github.com/galaxyproject/galaxy; Galaxy version 19.09 was used in creating Fig. 1: https://github.com/galaxyproject/galaxy/releases/tag/v19.09). A free public instance of Galaxy can be accessed at https://usegalaxy.org.

References

  1. Blankenberg, D., Taylor, J. & Nekrutenko, A. Cold Spring Harb. Protoc. 2015, 324–335 (2015).

  2. Qu, K. et al. Nat. Methods 13, 245–247 (2016).

  3. Afgan, E., Baker, D., Batut, B. & Van Den Beek, M. Nucleic Acids Res. 47, W537–W544 (2018).

  4. Blankenberg, D., Coraor, N., Von Kuster, G., Taylor, J. & Nekrutenko, A., Galaxy Team. Database 2011, bar011 (2011).

  5. Blankenberg, D., Johnson, J. E., Galaxy Team, Taylor, J. & Nekrutenko, A. Bioinformatics 30, 1917–1919 (2014).

  6. Thorvaldsdóttir, H., Robinson, J. T. & Mesirov, J. P. Brief Bioinform. 14, 178–192 (2013).

  7. Kent, W. J. et al. Genome Res. 12, 996–1006 (2002).

  8. Nicol, J. W., Helt, G. A., Blanchard, S. G. Jr, Raja, A. & Loraine, A. E. Bioinformatics 25, 2730–2731 (2009).

  9. Kalderimis, A. et al. Nucleic Acids Res. 42, W468–W472 (2014).

  10. Miller, C. A., Qiao, Y., DiSera, T., D’Astous, B. & Marth, G. T. Nat. Methods 11, 1189 (2014).

Acknowledgements

The authors are grateful and indebted to the Galaxy team and the Galaxy community for all of their contributions.

Author information

Contributions

D.B. wrote the manuscript. All authors read and approved the manuscript. All authors contributed code.

Corresponding author

Correspondence to Daniel Blankenberg.

Ethics declarations

Competing interests

The authors declare no competing interests.

Integrated supplementary information

Supplementary Figure 1 Anatomy of a dynamic Galaxy External Display Application.

(A.I) A dynamic GEDA is initially defined in the same fashion as the static type (see main Fig. 1). (A.II) A set of “dynamic_links” is defined according to the entries contained within a Galaxy Data Table (see panels B and C), with the link “id” and displayed “name” coming from the “value” and “name” fields, respectively. (A.III) This URL is generated with the Python string format operator (%), applied to the value of “url” from the Data Table along with a dictionary containing an encoded URL to the “intermine_file” dataset content. (A.IV) The URL containing the InterMine dataset content is defined as containing a “DATASET_HASH” value that is unique to the Galaxy instance. This can be helpful to differentiate between multiple datasets within external resources that may only utilize or display the base filename of uploaded content; Galaxy dataset names can also be used (Supplementary Figs. 2 and 3). (B) A flat file (colloquially referred to as a “.loc” or “location” file) is tab-delimited and can be modified outside of the Galaxy codebase to configure this display application. A single entry is shown (B.I), which contains three columns: (B.II) a unique id for the entry, (B.III) the display name for the entry, and (B.IV) a variable named “url” that contains the URL template used in (A). (C) The Data Tables configuration (C.I) can contain any number of defined “table”s (C.II), each of which declares a “name” for the table and specifies that lines in the file beginning with a comment character (#) are ignored and that duplicate entries are not allowed. The (C.III) columns for the three fields in the table are defined as “value”, “name”, and “url”. The (C.IV) content of the data table is loaded from the specified flat file (B). Reference: Kalderimis A, Lyne R, Butano D, Contrino S, Lyne M, Heimbach J, et al. InterMine: extensive web services for modern biology. Nucleic Acids Res 2014;42:W468–72. https://doi.org/10.1093/nar/gku301.
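The mechanics described in this legend can be sketched in a few lines of Python. This is an illustrative sketch only, not Galaxy's implementation: the function names, the template key “intermine_file_url_qp”, and the example.org URLs are assumptions made for the example. It reads tab-delimited “.loc” content while skipping comment lines and duplicate entries, then fills each entry's “url” template using the Python string format operator (%) and a dictionary holding the encoded dataset URL.

```python
from urllib.parse import quote_plus


def parse_loc(text, comment_char="#"):
    """Read tab-delimited .loc content into entries with value, name and url fields."""
    entries = []
    seen = set()
    for line in text.splitlines():
        # Ignore blank lines and lines that begin with the comment character.
        if not line.strip() or line.startswith(comment_char):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        row = tuple(fields[:3])
        # Do not allow duplicate entries.
        if row in seen:
            continue
        seen.add(row)
        entries.append(dict(zip(("value", "name", "url"), row)))
    return entries


def build_links(entries, dataset_url):
    """Fill each URL template with the encoded dataset URL via the % operator."""
    # Hypothetical key name; the real key is whatever the .loc template refers to.
    params = {"intermine_file_url_qp": quote_plus(dataset_url)}
    return [
        {"id": e["value"], "name": e["name"], "url": e["url"] % params}
        for e in entries
    ]


# Hypothetical .loc content: one comment line and the same entry listed twice.
LOC_TEXT = (
    "# value\tname\turl\n"
    "example_mine\tExample Mine\t"
    "https://mine.example.org/import?dataURL=%(intermine_file_url_qp)s\n"
    "example_mine\tExample Mine\t"
    "https://mine.example.org/import?dataURL=%(intermine_file_url_qp)s\n"
)

links = build_links(
    parse_loc(LOC_TEXT),
    "https://galaxy.example.org/display/abc123/data.txt",
)
for link in links:
    print(link["id"], link["name"], link["url"])
```

Encoding the dataset URL before substitution keeps its own “:” and “/” characters from being misread as part of the external resource's query string.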

Supplementary Figure 2 Anatomy of a dynamic Galaxy External Display Application utilizing filters and templates.

(I) A dynamic GEDA is initially defined in the usual manner (see Supplementary Fig. 1) for standard text-based VCF files. (II) A set of “dynamic_links” is defined according to the entries contained within an internal Galaxy system “site_type”. Additional (III) dynamic parameters are defined in a named fashion from the content of the site_type entry. The third parameter, “builds”, is generated as a list from a comma-separated value within the site_type entry. (IV) Two filters are applied to determine whether a GEDA link will be generated for the dataset: the first confirms that this UCSC Genome Browser site (e.g. main, mirror, or domain specific) has been enabled by the Galaxy Administrator, and the second confirms that the dataset belongs to a genome build that is valid for this particular UCSC Genome Browser site. Should any of the filters fail, the GEDA will not be available and the link will not be generated for the user. The resource URL (V) is generated dynamically, with the Galaxy content being defined using a UCSC Track definition file syntax (“track” template parameter) and the bigDataUrl mechanism. This GEDA is designed to work for standard VCF files, and (VI) we use Galaxy’s automatic datatype conversion system twice: first to bgzip the text-based VCF file, and then to build a Tabix index on the bgzipped VCF file. We (VII) define a template-type parameter named “track” that will be dynamically generated and presented as an externally viewable file to the external resource. This Track file provides the track type, a track name, dbkey (genome build), and a bigDataUrl value that references the bgzipped VCF dataset content. This approach enables the UCSC Genome Browser to make efficient, semi-random access to the user’s dataset by transferring only the subset of data required for a particular user view. Reference: Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, et al. The human genome browser at UCSC. Genome Res 2002;12:996–1006. https://doi.org/10.1101/gr.229102.
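The two filters and the track definition described in this legend can be approximated as follows. This is a minimal sketch under stated assumptions rather than Galaxy's own code: the function names, the shape of the site entry, and the example.org URL are hypothetical. The track line follows the general UCSC custom track form for bgzipped, Tabix-indexed VCF data (type “vcfTabix” with a “bigDataUrl” setting).

```python
def passes_filters(site, dataset_dbkey, enabled_sites):
    """Return True only if the site is enabled and supports the dataset's genome build."""
    # "builds" arrives as a comma-separated value and is turned into a list.
    builds = [b.strip() for b in site["builds"].split(",") if b.strip()]
    site_enabled = site["name"] in enabled_sites
    build_valid = dataset_dbkey in builds
    return site_enabled and build_valid


def track_line(name, dbkey, big_data_url):
    """Compose a track definition pointing at remotely hosted bgzipped VCF content."""
    return 'track type=vcfTabix name="%s" db=%s bigDataUrl=%s' % (
        name,
        dbkey,
        big_data_url,
    )


# Hypothetical site entry and administrator-enabled site list.
site = {"name": "main", "builds": "hg19, hg38,mm10"}
enabled = {"main"}

if passes_filters(site, "hg19", enabled):
    print(track_line("My VCF", "hg19", "https://galaxy.example.org/d.vcf.gz"))
```

Because the track file carries only a URL, the browser can request just the indexed byte ranges it needs for the current view instead of downloading the whole dataset.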

Supplementary Figure 3 Anatomy of a complex Galaxy External Display Application interoperating with locally installed desktop software, which can be optionally launched over the web.

This dynamic GEDA is initially defined in the usual manner (I); however, it has two sets of “dynamic_links” tagsets (II and III), each of which is able to generate multiple external resource links. The first link set (II) is generated from the Galaxy built-in sites list for the Integrative Genomics Viewer (IGV), with dynamic parameters (IV) defined similarly as in (3.III). GEDA links are filtered (V) to include sites designed for directly operating on running local software or where the genome build for the dataset is enabled at the external resource. The URL (VI) is dynamically generated from a template-based parameter named “redirect_url” (X). This GEDA will work for GFF datasets (VII) and derivatives, with the URL file extension coming directly from the specific hierarchical datatype for the dataset. A (VIII) template-based parameter named “site_organism” is used to determine the genome build ID valid for the IGV software, either from a look-up table or as directly specified, with the “strip” attribute indicating that any whitespace surrounding the populated parameter value should be removed. A (IX) template-based parameter named “jnlp” dynamically generates a Java Network Launch Protocol (JNLP) description file that can be used to launch a new instance of the IGV desktop application with the user’s GFF dataset. The (X) redirect_url used for the external resource URL (VI) is created in one of three different versions, depending upon the type of external resource site definition (local, web, or jnlp). The second set of “dynamic_links” (III) is defined from a Galaxy Data Table that is populated from an external URL (http://igv.broadinstitute.org/genomes/genomes.txt), enabling automatic syncing with the latest IGV genome build support. A filter (XI) is applied to ensure that only datasets with genome builds available within this IGV database will have GEDA links created. A URL (XII) is created that forwards the user and dataset content URL directly to the launch IGV page at the Broad Institute, along with a defined track name and genome build. The GFF dataset content (XIII) is defined as in (VII). Reference: Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 2013;14:178–92. https://doi.org/10.1093/bib/bbs017.
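The three-way choice of “redirect_url” described in (X) can be illustrated with a small dispatch function. This is a hedged sketch, not the template used by Galaxy: the function signature, the query parameter names (“file”, “genome”, “name”), the “/load” path, and all example URLs are assumptions made for illustration.

```python
from urllib.parse import urlencode


def redirect_url(site_type, site_url, dataset_url, jnlp_url, genome, name):
    """Pick one of three URL forms depending on the site definition type."""
    if site_type == "jnlp":
        # Hand the user the generated JNLP description file, which launches IGV.
        return jnlp_url
    # Hypothetical parameter names; urlencode percent-encodes each value.
    query = urlencode({"file": dataset_url, "genome": genome, "name": name})
    if site_type == "local":
        # Send a load request to an IGV instance already running on the desktop.
        return "%s/load?%s" % (site_url.rstrip("/"), query)
    if site_type == "web":
        # Forward the user to a web page that launches IGV with the dataset.
        return "%s?%s" % (site_url, query)
    raise ValueError("unknown site type: %r" % site_type)


# Hypothetical inputs for each of the three site types.
DATASET = "https://galaxy.example.org/d.gff"
JNLP = "https://galaxy.example.org/d.jnlp"

for kind, base in (
    ("local", "http://localhost:60151"),
    ("web", "https://igv.example.org/launch"),
    ("jnlp", ""),
):
    print(kind, redirect_url(kind, base, DATASET, JNLP, "hg19", "My track"))
```

Keeping the branch logic in one template-style function means the external resource URL (VI) never needs to know which kind of site it is pointing at.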


Cite this article

Blankenberg, D., Chilton, J. & Coraor, N. Galaxy External Display Applications: closing a dataflow interoperability loop. Nat Methods 17, 123–124 (2020). https://doi.org/10.1038/s41592-019-0727-x


  • DOI: https://doi.org/10.1038/s41592-019-0727-x
