The genomics world has no shortage of visualization tools. But as new methods and data types emerge, existing techniques can struggle to cope. Now, a tool known as Gosling allows bioinformaticians to build apps that can display genomic information with the same level of flexibility that developers have come to expect from other graphics programming tools.

First released in 2020 by bioinformatician Nils Gehlenborg and his team at Harvard Medical School in Boston, Massachusetts, Gosling stands for ‘grammar of scalable linked interactive nucleotide graphics’ [1]. But the name is also a nod to structural biologist Raymond Gosling, who, with Rosalind Franklin, captured the famous ‘Photograph 51’, which revealed the structure of DNA.

Gosling is what is known as a grammar. It is implemented in programming libraries that provide a flexible syntax for describing genomic regions and interactions and how they should be laid out on a web page. Researchers and bioinformaticians can use these libraries to create interactive, scalable visualizations that they can share with their colleagues, and to build bespoke genetic-analysis tools.

The views that Gosling creates can be linked, so that selecting a region in one panel highlights the same region in another. They can also be panned and manipulated, and zoomed from the chromosome level down to single nucleotides. “The visual representation adapts to the zoom level,” Gehlenborg says, a feature called semantic zoom. An online testing environment provides visualizations that users can extend to create and export their own graphics. And libraries for both Python (Gos) and JavaScript (gosling.js) enable bioinformaticians to program the images directly into Jupyter computational notebooks and other applications. An alpha-stage R version, g(R)osling, was released in July.

Grammar-based libraries are used to systematically relate data sets to their visualizations, says Tamara Munzner, a computer scientist at the University of British Columbia in Vancouver, Canada. Popular libraries such as ggplot2 and Vega-Lite use the ‘grammar of graphics’ to define their visualizations. But these tools can be used for any type of graphic, whereas Gosling is specifically designed for genomics visualizations. “It’s like Vega-Lite for genomics,” Munzner says.
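The linked views described above can be sketched with nothing more than Python dictionaries. The example below is a minimal illustration, not Gosling's official API: it assembles an overview panel with a brush and a detail panel that share one linking label, then checks that both refer to the same label. The property names (`linkingId`, `brush`, `alignment`, `chromosomeField`, `genomicFields`) reflect one reading of the Gosling specification and should be checked against the documentation at gosling-lang.org; the data URL and column names are hypothetical.

```python
import json

LINK_ID = "detail-view"  # arbitrary label shared by the two linked panels

# Hypothetical data source and column names, used only for illustration.
DATA = {
    "url": "https://example.org/coverage.csv",
    "type": "csv",
    "chromosomeField": "chrom",
    "genomicFields": ["position"],
}


def x_channel(**extra):
    """Genomic x-axis encoding; any extra keys are merged in."""
    return {"field": "position", "type": "genomic", **extra}


# Overview panel: a bar chart with a brush drawn on top of it.
overview = {
    "alignment": "overlay",
    "data": DATA,
    "x": x_channel(),
    "y": {"field": "depth", "type": "quantitative"},
    "tracks": [
        {"mark": "bar"},
        {"mark": "brush", "x": {"linkingId": LINK_ID}},
    ],
    "width": 600,
    "height": 80,
}

# Detail panel: its x-axis follows whatever region the brush selects.
detail = {
    "data": DATA,
    "mark": "bar",
    "x": x_channel(linkingId=LINK_ID),
    "y": {"field": "depth", "type": "quantitative"},
    "width": 600,
    "height": 120,
}

spec = {
    "arrangement": "vertical",
    "views": [{"tracks": [overview]}, {"tracks": [detail]}],
}


def linking_ids(node):
    """Collect every linkingId value found anywhere in a nested spec."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "linkingId":
                found.append(value)
            else:
                found.extend(linking_ids(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(linking_ids(item))
    return found


print(json.dumps(spec, indent=2))
```

Because the brush and the detail panel carry the same label, a renderer that honours it can keep the two in step. The `linking_ids` helper is a convenience written for this sketch, handy for confirming that no panel has been left unlinked by a typo.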

Bridging the gap

Programming tools for visualizations range from template-based functions that use a single line of code to create a standard type of graph to those that assemble visualizations piece by piece from lines and geometric shapes, such as the JavaScript D3.js library. The template version is easy to use, but relatively inflexible; the other offers a great deal more customization, but is laborious to use.

“Gosling really bridges that gap, making it way easier to make new tools that have visualization components,” says Maria Nattestad, a software engineer at Google in Mountain View, California. As part of her PhD research in 2015, Nattestad developed a tool called SplitThreader, which presents the genome in a circular layout known as a Circos plot, with sequencing reads as arcs to highlight structural variations. With no other options, she drew those elements from scratch, using D3.js to specify the placement and dimensions of each line, rectangle and circle. “It was such a learning curve,” she says. “It took me a long time to build SplitThreader,” she adds, noting that it could probably have been built a lot faster with Gosling.

Gehlenborg says that Gosling arose out of a 2019 literature review [2], during which his team surveyed the genome visualization landscape and built up a taxonomy for the tools and their capabilities. From there, the researchers developed a syntax to systematically describe the visualizations those tools could make. Gosling, Gehlenborg explains, “is a fundamental approach to assemble genomic visualizations using that same taxonomy”.

Gosling encodes the data using a plain-text format called JavaScript Object Notation (JSON) and uses language that is specific to genomics to supplement the more general terms used in standard graphing libraries. Gosling.js, Gos and g(R)osling then use that encoding to generate files in their respective programming languages. The final visualization is drawn in a web browser using a rendering engine and file-formatting tools developed by the Gehlenborg team to visualize chromosomal data from a technique called Hi-C [3]. Visualizations at gosling-lang.org provide starting points for Circos plots, gene annotation, chromatin conformation heat maps, evolutionary conservation and more.
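To give a sense of what such a JSON encoding might look like, the sketch below builds a single-track description as a Python dictionary and serializes it with the standard library. It is an illustration under stated assumptions rather than output from Gos or gosling.js: the field names follow one reading of the Gosling specification, and the data URL and column names (`chrom`, `start`, `end`, `score`) are hypothetical.

```python
import json

# Hypothetical data location; replace with a real file before rendering.
DATA_URL = "https://example.org/genes.csv"


def make_track(mark, x_field, y_field):
    """Build one track: a data source, a mark type and channel encodings."""
    return {
        "data": {
            "url": DATA_URL,
            "type": "csv",
            # Assumed column names in the hypothetical CSV file.
            "chromosomeField": "chrom",
            "genomicFields": ["start", "end"],
        },
        "mark": mark,
        # "genomic" is the genomics-specific field type that supplements
        # general-purpose types such as "quantitative".
        "x": {"field": x_field, "type": "genomic"},
        "y": {"field": y_field, "type": "quantitative"},
        "width": 600,
        "height": 120,
    }


spec = {
    "title": "Minimal single-track sketch",
    "tracks": [make_track("bar", "start", "score")],
}

# Plain-text JSON that could be pasted into an editor or saved to a file.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The point of the sketch is the division of labour: generic terms such as the mark type and the quantitative y-axis sit alongside a genomic x-axis, which is the kind of domain-specific vocabulary the article describes.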

Postdoc Sehi L’Yi, who led Gosling’s development, says that what differentiates Gosling from other visualization tools is its expressiveness. With most tools, he says, the graphics that can be made and what they will look like are predefined. “It is really not easy to customize visualizations as a user.” But with Gosling, users can, for instance, specify the colour, dimensions and placement of the symbol used to represent a centromere or genomic interval, then overlay that on an ideogram of a chromosome to highlight a region of interest.
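The kind of customization L’Yi describes can be illustrated with a short Python sketch that overlays a styled rectangle on an ideogram track. This is a hedged example, not the team's own code: the styling properties (`color`, `opacity`, `stroke`, `strokeWidth`), the inline `json` data type and the overlay layout reflect one reading of the Gosling specification, while the cytoband URL, the column names and the highlighted coordinates are invented for illustration.

```python
import json


def highlight(chrom, start, end, colour, opacity=0.4, stroke_width=2):
    """Rectangle over one genomic interval, styled by the caller."""
    return {
        # Inline data: a single hypothetical region of interest.
        "data": {
            "type": "json",
            "values": [{"chrom": chrom, "start": start, "end": end}],
            "chromosomeField": "chrom",
            "genomicFields": ["start", "end"],
        },
        "mark": "rect",
        "x": {"field": "start", "type": "genomic"},
        "xe": {"field": "end", "type": "genomic"},
        "color": {"value": colour},
        "opacity": {"value": opacity},
        "stroke": {"value": "black"},
        "strokeWidth": {"value": stroke_width},
    }


# Ideogram drawn from a hypothetical cytoband table, with the highlight on top.
ideogram = {
    "alignment": "overlay",
    "data": {
        "url": "https://example.org/cytobands.csv",
        "type": "csv",
        "chromosomeField": "chrom",
        "genomicFields": ["start", "end"],
    },
    "tracks": [
        {
            "mark": "rect",
            "x": {"field": "start", "type": "genomic"},
            "xe": {"field": "end", "type": "genomic"},
            "color": {"field": "stain", "type": "nominal"},
        },
        highlight("chr1", 1_000_000, 2_500_000, colour="red"),
    ],
    "width": 600,
    "height": 30,
}

print(json.dumps({"tracks": [ideogram]}, indent=2))
```

Changing the colour, opacity or outline is a matter of passing different arguments to the hypothetical `highlight` helper, rather than redrawing shapes by hand.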

An interesting space

A team of master’s students at the University of British Columbia decided to use Gosling to create its final project in a data-visualization class. “One of my team mates had heard about it at a conference last year,” says team member Armita Safa. “Even for someone who doesn’t have a coding background, it is relatively easier to work with Gosling than most other things that are used for visualization,” she says. That said, she notes that they initially struggled to extract the data they needed to allow users to click on regions and create new visualizations.

Dominic Girardi, chief product officer at the data visualization company Datavisyn in Linz, Austria, has also experimented with Gosling to create an interactive playground that allows users to filter a table of genes by genomic region. The firm — which Gehlenborg co-founded — is now using Gosling to generate visualization tools for its corporate clients, although it has not yet completed one, Girardi says.

Gosling isn’t the only visualization library for genomic data; other examples include ggbio, gggenomes and gggenes, all of which are extensions of the ggplot2 graphing library. But most of these tools create static images, Gehlenborg says: pictures, rather than interactive visualizations. He adds that future plans for Gosling include giving it a graphical interface, so that researchers can create visualizations by dragging and dropping widgets onto a virtual canvas rather than having to program them.

Robert Buels, who is leading development of a genome browser at the University of California, Berkeley, says that Gosling “occupies a really interesting space” in the genomics visualization toolbox. “You can get a lot more customizability with Gosling,” he says. But users don’t have to write nearly as much code as they do for tools such as D3.js.

“It’s a really interesting niche in between the two things,” he says, “that I think is a really great addition to the field.”