Gene.iobio: an interactive web tool for versatile, clinically-driven variant interrogation and prioritization

Di Sera, Tonya; Velinder, Matt; Ward, Alistair; Qiao, Yi; Georges, Stephanie; Miller, Chase; Pitman, Anders; Richards, Will; Ekawade, Aditya; Viskochil, David; Carey, John C.; Pace, Laura; Bale, Jim; Clardy, Stacey L.; Andrews, Ashley; Botto, Lorenzo; Marth, Gabor

doi:10.1038/s41598-021-99752-5

Article
Open access
Published: 13 October 2021

Scientific Reports volume 11, Article number: 20307 (2021)

An Author Correction to this article was published on 06 April 2022

This article has been updated

Abstract

With increasing utilization of comprehensive genomic data to guide clinical care, anticipated to become the standard of care in many clinical settings, the practice of diagnostic medicine is undergoing a notable shift. However, the move from single-gene or panel-based genetic testing to exome and genome sequencing has not been matched by the development of tools to enable diagnosticians to interpret increasingly complex or uncertain genomic findings. Here, we present gene.iobio, a real-time, intuitive and interactive web application for clinically-driven variant interrogation and prioritization. We show gene.iobio is a novel and effective approach that significantly improves upon and reimagines existing methods. In a radical departure from existing methods that present variants and genomic data in text and table formats, gene.iobio provides an interactive, intuitive and visually-driven analysis environment. We demonstrate that adoption of gene.iobio in clinical and research settings empowers clinical care providers to interact directly with patient genomic data both for establishing clinical diagnoses and informing patient care, using sophisticated genomic analyses that previously were only accessible via complex command line tools.

Introduction

It is becoming increasingly common for clinical care providers to incorporate genetic information into a patient’s clinical diagnosis and subsequent care. This major transition in care relies on a number of factors, including but not limited to: a patient’s access to genetic sequencing; genetics education of providers and an understanding of how genetic variants can impact patient care; time constraints on clinicians and clinical groups to adopt additional considerations and workloads into their patient care; and clinicians’ ability to confidently analyze and interpret genetic findings. Given the wide scope of this clinical transition, we developed gene.iobio to address the challenge clinicians face in analyzing and interpreting genomic findings.

Typical exome or genome sequencing studies produce vast amounts of data that are stored on cloud or institutional hardware. Raw sequencing reads pass through a number of complex processing tasks to generate variant calls. For a typical trio exome sequencing study (proband, mother and father), it is expected that well over 50,000 variants will be identified. This number reaches multiple millions when the entire genome is analyzed. These variants need to be prioritized and evaluated based on whether they can reasonably contribute to the patient’s phenotype. Given the number of variants, this is a daunting task that must consider numerous factors such as: the Mendelian mode of inheritance and segregation of the given allele in the family, population allele frequency, predicted impact and biological consequence, known gene:disease associations and in-silico predictions of pathogenicity. Command line, UNIX-based variant prioritization tools have been developed to consider these factors¹,²,³,⁴. However, these tools are often difficult to download, install, and run, making it unreasonable to expect clinical care providers to perform these analyses. As a result, the current paradigm relies on a team of experts across multiple disciplines to deliver optimal and expedient genomically-informed care. This typically involves bioinformaticians, clinical geneticists, molecular pathologists and subspecialty clinicians, all of whom need to be in near-constant communication and have in-depth discussions about candidate variants before reaching a clinical diagnosis based on genetic information. This level of logistics and organization becomes even more challenging in sequencing studies that demand rapid turnaround and timely diagnoses, as clinical interventions informed by genetic findings can significantly improve patient outcomes and prognoses⁵.

Broadly speaking, our approach to these challenges has been to reimagine the current paradigm and bring clinical knowledge closer to genomic data and variant interpretation. This approach has led to the development of an expanding suite of intuitive, visually-driven, web-based bioinformatics tools: the iobio⁶ software suite. Our other iobio tools provide rapid quality review of BAM/CRAM files⁷ (http://bam.iobio.io) and VCF files (http://vcf.iobio.io), and generate lists of genes associated with given genetic disorders and phenotypes⁸ (http://genepanel.iobio.io). Gene.iobio expands our iobio approach into variant interrogation and prioritization. Few tools have attempted similar visual web-based approaches. The vast majority of these tools (Emedgene, Alamut, Fabric, Varsome Premium, QIAGEN Ingenuity, Genuity Science and others) have been commercialized with significant licensing costs, limiting access for academic research groups and underserved clinics and their patients. The only free-to-use academic option to our knowledge is VCF/Plotein⁹, which attempts to visualize variants and their context within publicly-available variant databases as well as their pathogenicity scores. However, the VCF/Plotein tool lacks significant functionality required for comprehensive variant prioritization in clinical settings. Gene.iobio addresses numerous unsolved clinical genetics challenges and provides a comprehensive, up-to-date, disease and phenotype-informed variant review platform for clinicians and clinical teams. Even before a dedicated publication, the tool has become extremely popular and highly accessed, with more than 1000 distinct monthly users, each performing an average of 3 analysis sessions lasting 15 min or longer and involving more than 15 user interactions.

Methods

File input/output

Gene.iobio accepts file-format compliant indexed BAM/CRAM¹⁰ and indexed (unannotated or annotated) VCF¹¹ files. Files can be provided via a publicly accessible URL, via secure private URLs (using access tokens or VPN), and/or from the user’s local machine. Importantly, these files can be in distinct locations (e.g. BAM/CRAM files on the user’s local machine and the VCF via a URL). Regardless of file locations, gene.iobio streams the relevant portions of data, including VCF variant data and BAM/CRAM sequencing data, and displays them in a visual interface for the user to analyze. Gene.iobio allows a user to save results as a VCF or a comma-separated values file through an “Export Variants” option. This exported file includes all relevant annotations and reviews made by the user and can be loaded back into gene.iobio using the “Import Variants” option, allowing users to recall previously saved analyses at any time and across browser sessions. Additionally, gene.iobio can save analyses back to Mosaic, a commercial and collaborative genomic data platform developed by Frameshift Genomics (https://frameshift.io/).

System architecture

Gene.iobio is a JavaScript application that interfaces with our cloud-based iobio backend services (https://github.com/iobio/iobio-gru-backend). This architecture separates application logic from data processing logic. Gene.iobio controls user visualizations and interactions in the browser and the coordination between various visual components, as well as interfacing with the iobio backend. The iobio backend services are functionally equivalent to command line bioinformatics tools, wrapped as web services. The iobio backend performs region-based bioinformatics analyses (see below) on source files (BAM/CRAM and VCF) and transforms these data into formats that are interpretable to web applications such as gene.iobio. These region-based analyses process only the streamed data, as discrete and manageable chunks, allowing the outputs to automatically update gene.iobio visualizations in real-time. Gene.iobio interfaces with iobio backend services asynchronously through secure HTTPS requests.
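The idea of analyzing a region as discrete chunks can be illustrated with a minimal sketch. The function name `chunk_region` and the default chunk size of 10,000 bp are assumptions made for illustration; the source does not state how the iobio backend sizes or names its chunks.

```python
def chunk_region(chrom, start, end, chunk_size=10_000):
    """Split a 1-based, inclusive region into consecutive region strings.

    chunk_size is a hypothetical value chosen for illustration only.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    chunks = []
    pos = start
    while pos <= end:
        chunk_end = min(pos + chunk_size - 1, end)
        chunks.append(f"{chrom}:{pos}-{chunk_end}")
        pos = chunk_end + 1
    return chunks


# Each region string could then be sent as its own asynchronous request.
for region in chunk_region("chr6", 1, 25_000):
    print(region)
```

Because each chunk is independent, results can be drawn as they arrive rather than after the whole region has been processed.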

Variant annotation

Variant annotation is performed by iobio backend services in a region-specific manner, with the data streamed back to gene.iobio. This variant annotation service includes: tabix¹² (for region-based querying of indexed VCF files), vt¹³ (for sample subsetting, variant decomposing, normalizing and transforming), VEP¹⁴ (for transcript-aware annotation of variants with functional consequence, impact, ClinVar¹⁵ significance, REVEL¹⁶ score, HGVS¹⁷, and dbSNP¹⁸ ID), and bcftools (for determining variant population allele frequency in gnomAD) (https://github.com/samtools/bcftools).
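As a rough illustration of what decomposing and normalizing a variant record involves, the sketch below splits a multi-allelic record into biallelic records and trims bases shared by the reference and alternate alleles. It is a simplified stand-in, not the vt implementation: it omits the left-alignment step, which requires the reference sequence, and the function names are hypothetical.

```python
def trim_alleles(pos, ref, alt):
    """Trim bases shared by REF and ALT, keeping at least one base in each."""
    # Trim the shared suffix first.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim the shared prefix, advancing the position for each base removed.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def decompose_and_trim(chrom, pos, ref, alts):
    """Split a multi-allelic record into one trimmed record per ALT allele."""
    return [(chrom, *trim_alleles(pos, ref, alt)) for alt in alts]


# Toy record with two alternate alleles: a deletion and a substitution.
for record in decompose_and_trim("chr1", 100, "CAG", ["CG", "CAT"]):
    print(record)
```

After this step each record describes a single alternate allele in a compact form, which makes it easier to match against external resources keyed on position and alleles.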

Sequencing data coverage and alignment

Gene.iobio displays sequencing data coverage visualizations based on the data returned from iobio backend services. This coverage-based iobio backend service utilizes samtools¹⁰ for region-based queries of CRAM/BAM alignment files and to determine coverage across a gene or a given region such as an exon. Gene.iobio visually summarizes coverage in these regions including the min, max, median and mean.
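The summary statistics named above can be computed from per-base depths as in the following minimal sketch. The function name and the input format (a plain list of depths for one region) are assumptions; the source does not describe the data structure returned by the coverage service.

```python
from statistics import mean, median


def summarize_coverage(depths):
    """Return min, max, median and mean depth for one region (e.g. an exon)."""
    if not depths:
        raise ValueError("no depth values supplied for this region")
    return {
        "min": min(depths),
        "max": max(depths),
        "median": median(depths),
        "mean": mean(depths),
    }


# Toy per-base depths, for illustration only.
print(summarize_coverage([10, 20, 30, 40]))
```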

Variant calling

On-demand variant calling is performed by a backend service that includes samtools¹⁰ for region-based queries of CRAM/BAM alignment files and Freebayes¹⁹ for calling variants. The Freebayes command used is as follows: freebayes -f $reference --only-use-input-alleles --min-mapping-quality 0 --min-coverage 0 --use-duplicate-reads. Called variants are annotated in the same way as described in the Variant annotation section above.
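For readers who want to reproduce the call from a script, the sketch below assembles the quoted command as an argument list. Only the flags come from the text; passing the alignment file as a trailing positional argument is an assumption, since the quoted command does not show how the alignments are supplied, and the file names are placeholders.

```python
import shlex

# Flags copied verbatim from the command quoted in the text.
SOURCE_FLAGS = [
    "--only-use-input-alleles",
    "--min-mapping-quality", "0",
    "--min-coverage", "0",
    "--use-duplicate-reads",
]


def freebayes_argv(reference, alignment_path):
    """Build an argument list; alignment_path placement is an assumption."""
    return ["freebayes", "-f", reference, *SOURCE_FLAGS, alignment_path]


# Placeholder file names, for illustration only.
print(shlex.join(freebayes_argv("reference.fa", "region.bam")))
```

Building the command as a list rather than a single string avoids shell quoting problems if a path contains spaces.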

IGV integration

Gene.iobio directly integrates a web-based JavaScript version of the Integrative Genomics Viewer (IGV)²⁰, called igv.js (https://github.com/igvteam/igv.js/).

Gene:disease association

Gene.iobio provides a controlled gene, phenotype and disorder vocabulary to help guide variant prioritization and ensure correct names have been loaded. Gene.iobio uses GENCODE²¹ and RefSeq²² gene names in an input text box with typeahead and autocomplete functionality. Gene.iobio also integrates Phenolyzer²³, which allows the user to enter a phenotype term and automatically generate a list of genes associated with that phenotype. Gene.iobio retrieves up-to-date gene:disease association data from OMIM²⁴ via their web API. PubMed articles associated with a particular gene are retrieved using the web API, NCBI E-utils.
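A PubMed lookup of the kind described above can be sketched by building an E-utilities ESearch URL. This is an illustration rather than the query gene.iobio issues: the search term (a bare gene symbol), the `retmax` value and the function name are assumptions, and the sketch only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(gene_symbol, max_results=5):
    """Build an ESearch URL listing PubMed IDs that mention a gene symbol."""
    params = {
        "db": "pubmed",         # search the PubMed database
        "term": gene_symbol,    # assumed query: the bare gene symbol
        "retmode": "json",      # ask for a JSON response
        "retmax": max_results,  # cap the number of IDs returned
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(pubmed_search_url("ARID1B"))
```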

Language and codebase

Gene.iobio is a large and complex codebase with over 30,000 lines of code, and is available in the public GitHub repository at https://github.com/iobio/gene.iobio.vue. Gene.iobio uses the Vue.js JavaScript framework, which supports reusable components that can be plugged into different parts of the application and, more broadly, into our suite of iobio applications. All interactive data visualizations are built using D3²⁵, allowing for custom genomic visual components that respond in real-time to new data as it is streamed from iobio backend services.

Backend services

The iobio backend is written in server-side JavaScript running on Node.js. The source is available in the public GitHub repository (https://github.com/iobio/iobio-gru-backend). The iobio backend (gru) is responsible for remote invocation of command line bioinformatics pipelines. These pipelines are bash scripts that read from standard input and pipe through various bioinformatics applications and write data to standard output. Bash scripts are wrapped as web services using Node.js. Pipeline requests are made using remote procedure calls (RPCs), using simple Hypertext Transfer Protocol Secure (HTTPS) requests. Many of the requests are resource-intensive, therefore the gru iobio backend is designed to be horizontally scaled by load-balancing across as many compute nodes as necessary.

To serve local files, gene.iobio utilizes fibridge, a generic service that provides a way to proxy HTTP connections to a WebSocket source. The local file proxy greatly simplifies the code base, because local files and remotely served files are both read by the iobio backend via an HTTP request. When the gene.iobio client application opens a local file, it opens a WebSocket channel to the fibridge server. The fibridge server then provides a URL for the file.

The iobio backend heavily leverages Singularity containers, which allow each bioinformatics tool to be self-contained and isolated from the other tools, and make the tools portable to other sites. This is particularly helpful for complex or legacy tools which may be written in languages like Perl, and for tools which have many dependencies. The gru backend is deployed on AWS EC2 instances, providing a scalable, fault-tolerant compute environment. The iobio backend is also installed in the protected environment of the University of Utah’s Center for High Performance Computing.

External resources and databases

Gene.iobio integrates numerous public datasets to present up-to-date gene and variant annotations. These external resources and databases are kept up-to-date using iobio backend services built around the individual data type. For instance, the ClinVar resource is maintained with a backend service that retrieves the latest ClinVar VCF on a weekly basis. ClinVar VCF variants are matched to user-provided variants during gene.iobio analyses. Similarly, the gnomAD resource is regularly updated and variants are matched during analyses. GnomAD population allele frequencies, as well as heterozygous and homozygous alternate allele counts, are provided within gene.iobio. Gene function summaries are retrieved via the NCBI E-utilities²⁶ REST API. The iobio backend also maintains FASTA files for genome reference builds including GRCh37 and GRCh38. The phyloP²⁷ conservation scores and multiple species sequence alignment visualizations in gene.iobio rely on UCSC²⁸ genome tracks to display multiple organism sequence alignments surrounding the given variant. Gene names are maintained via GENCODE and RefSeq resources. Numerous other external links are provided at the gene- and variant-specific level, including MARRVEL²⁹, VarSome³⁰, OMIM³¹, DECIPHER³², GeneCards³³, GTEx³⁴, HumanMine (Kalderimis et al. 2014), PubMed, UniProt, Human Protein Atlas (Uhlén et al. 2015) and the UCSC Browser.
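Matching user-provided variants to a resource such as ClinVar can be sketched as a lookup keyed on chromosome, position, reference allele and alternate allele. The record layout, the function name and the toy values below are assumptions for illustration; the source does not describe how the backend performs the match.

```python
def annotate_with_clinvar(user_variants, clinvar_index):
    """Attach a ClinVar significance to each variant, or None if unmatched."""
    annotated = []
    for variant in user_variants:
        key = (variant["chrom"], variant["pos"], variant["ref"], variant["alt"])
        annotated.append({**variant, "clinvar": clinvar_index.get(key)})
    return annotated


# Toy index and variants; positions and assertions are made up for illustration.
toy_index = {("chr1", 100, "CA", "C"): "Pathogenic"}
toy_variants = [
    {"chrom": "chr1", "pos": 100, "ref": "CA", "alt": "C"},
    {"chrom": "chr1", "pos": 200, "ref": "G", "alt": "T"},
]
for row in annotate_with_clinvar(toy_variants, toy_index):
    print(row)
```

An exact-key match like this only works if both sides represent alleles the same way, which is one reason variants are decomposed and normalized before annotation.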

Development

Gene.iobio has been developed using best practices in software development, versioning and testing. We maintain a dynamic codebase with multiple developers contributing to the project. The gene.iobio code base is maintained in GitHub. This allows changes to be merged into the current version, pull requests initiated and versions to be tracked. We have also developed a testing environment that allows us to deploy specific versions directly to the web via a unique URL. These test builds are automatically generated each time a pull request is submitted to the GitHub repository. This enables developers to quickly share their proposed changes with the rest of the team, making collaboration more efficient. We actively maintain the gene.iobio codebase, regularly making improvements, adding new features and deploying them to the public version on a regular release schedule.

Deployment, usage and availability

Gene.iobio is publicly available and free-to-use for academic purposes at http://gene.iobio.io/. Commercial use is licensed through Frameshift Genomics (https://frameshift.io/). The University of Utah and the Utah Center for Genetic Discovery maintain an institutional version of gene.iobio for use by our clinical teams and genetics researchers. Gene.iobio has been developed and optimized for the Chrome browser, with additional support for the Firefox and Safari browsers.

Gene.iobio instances have been deployed at the University of Utah, Nebula Genomics³⁵ (https://nebula.org/), MyGene2³⁶ (https://mygene2.org/MyGene2/) and for an educational exhibit at the Natural History Museum of Utah (https://learngene.iobio.io/). From our web analytics, gene.iobio has more than 4000 distinct users, with an average of 1000 distinct users per month and more than 20,000 pageviews per month. The typical user performs multiple analysis sessions per month and spends more than 15 min per analysis session.

Results

Gene.iobio is a real-time, intuitive and interactive platform for performing sophisticated gene and variant level review. Gene.iobio does not require time-consuming data uploads and can be used for real-time analysis of both exome and genome sequencing data. Gene.iobio can be used in singleton sequencing studies, but is most powerful in family studies where parents and additional siblings have been sequenced. For all variants in user-provided genes, gene.iobio determines allele segregation and visually displays the mode of inheritance, including the evidence for reference and alternate alleles. This allows users to enter suspected genes, given the clinical phenotype or from disease:gene association tools such as genepanel.iobio⁸, Phenomizer³⁷ or PanelApp³⁸ as well as genes prioritized by upstream variant prioritization tools such as slivar⁴, GEMINI¹, ANNOVAR² and others. We find an especially useful approach is to use genepanel.iobio to generate a comprehensive list of disease-associated genes and enter this gene list into gene.iobio for variant review. Gene.iobio provides visual summaries of pertinent variant annotations such as biological impact, gnomAD population frequency³⁹, ClinVar assertion¹⁵, REVEL score¹⁶ and evolutionary conservation—with visual cues for how each annotation might contribute to pathogenicity. Users can assign a significance to variants, as well as attach freeform text notes. As gene.iobio is a research analysis tool and not a clinical diagnostic tool, significance terms (Significant, Unknown significance, Not significant) are informative, yet distinct from the accepted ACMG/AMP clinical diagnostic terms. Analyses and variant annotations can be saved and exported, allowing users to return to the analysis at any time. Gene.iobio allows a user to perform rapid and comprehensive variant interrogation and prioritization in a single visual and interactive web interface.

We have applied gene.iobio in numerous clinical settings at our institution, where it has contributed to the clinical and genetic diagnosis of numerous cases. One representative case was of a boy, less than 10 years old, whose primary objective finding was dysgenesis of the corpus callosum with retrocerebellar fluid collection, laryngeal cleft, cryptorchidism, proximal projections from 2nd metacarpals bilaterally, and non-familial facial profile with ear pits, unusual scalp hair pattern, and abnormal transverse palmar creases. Less specific findings included intellectual disability, autism spectrum disorder, unique pigmentation pattern on torso, and food aversion (Fig. 1a). The child and his parents underwent whole exome sequencing through a commercial sequencing provider, who returned a “likely pathogenic” synonymous variant in the report. However, the geneticist following the family disagreed with the report’s findings and requested our group to obtain the raw sequencing data (CRAM files) and perform independent variant calling and variant interpretation.

Following this reprocessing and reanalysis, the bioinformatics and clinical genetics teams reviewed candidate variants in gene.iobio. One of the candidate variants was a rare de novo frameshift variant in ARID1B. This variant was prioritized at the top of the variant list panel in gene.iobio (Fig. 1b). Coffin-Siris syndrome was part of the initial differential diagnosis for this patient, but the patient did not present with classic Coffin-Siris syndrome. With this consideration, the clinical team entered Coffin-Siris syndrome into the phenotype entry component of gene.iobio, which uses Phenolyzer²³ to generate a list of phenotype-associated genes (Fig. 1c). ARID1B was among the genes in this Coffin-Siris syndrome-associated gene list (Fig. 1d). Also within this view of gene.iobio, the clinical team reviewed key variant annotations such as consequence, gnomAD allele frequency and inheritance (Fig. 1d). The clinical team also reviewed OMIM phenotypes and PubMed publications related to ARID1B for further clinical evidence. After reviewing this variant, phenotype and literature evidence, the clinical and bioinformatics teams were able to conclude this variant was causative of the patient’s phenotype, as it fits within a larger group of ARID1B-related disorders, which includes Coffin-Siris syndrome⁴⁰. Lastly, the team assigned a significance to the ARID1B variant and entered a note describing why this variant was considered causative (Fig. 1e).

Gene.iobio uses a sophisticated software architecture (see “Methods” section) to provide real-time, comprehensive and visually-driven variant annotations and gene information (Fig. 2). Gene.iobio takes in a number of inputs, including variant and sequence alignment files (Fig. 2a). These genomic data files can be provided from a user’s local machine or as a publicly accessible URL. Files are never directly uploaded or stored, but rather data from these large files are analyzed in a gene region-specific manner, with small discrete data chunks being streamed to backend bioinformatics services (Fig. 2b). This allows for real-time analysis and visualization of the genomic data. During an analysis, gene.iobio automatically passes numerous pieces of information to various knowledge sources and bioinformatics services, which return pertinent information for variant filtering, interpretation and prioritization (Fig. 2c). This robust and versatile approach allows users to iteratively analyze variants, entering new genes and prioritizing new variants. Analyses and variants can be saved and exported for other downstream uses or for returning to gene.iobio at a later time.

Clinical expertise informs candidate variant prioritization. This can be especially useful in research settings where clinicians can often be more inclusive in their consideration of candidate variants and is distinct from commercial sequencing providers who typically only have access to a small set of phenotype terms and are more restricted in the variants they report. Clinicians in a research setting can also consider more subtle phenotypes and/or phenotypes that may not have been noted in the initial evaluation. For example, expert clinicians at our institution have reviewed variants in genes prioritized by our computational pipelines and provided key clinical insights about gene:disease association or first hand experience of patients with similar genetic and phenotypic findings. This has included a case of a male between 30 and 40 years old with adult onset leukodystrophy. The clinical team prioritized a rare X-linked recessive missense variant in ATP6AP2 due to the gene’s association with epilepsy syndromes, an insight that was not immediately available to the bioinformatician analyzing the case (Fig. 3a). Gene.iobio provided this clinician with a comprehensive and easy to understand summary of the variant as well as the OMIM phenotypes and PubMed literature (Fig. 3a). While this particular patient did not display all of the phenotypes associated with X-linked Parkinsonism with spasticity, as defined by OMIM, the patient had sufficient phenotypic overlap with the described disorder that the variant was considered diagnostic. This example highlights the clinical knowledge bases that gene.iobio provides to help clinical experts bolster evidence for a given variant’s pathogenicity.

Conversely, gene.iobio can also provide clinical experts with evidence against a given candidate variant’s pathogenicity. In the same case as above, clinical experts used gene.iobio to refute a computationally prioritized rare X-linked recessive missense variant in BRWD3 (Fig. 3b). Using gene.iobio the clinical team could confirm that while this variant was rare and possibly impactful, the gene is associated with X-linked mental retardation in OMIM and numerous publications in PubMed support this association (Fig. 3b). However, this phenotype is not consistent with the patient, who has no described intellectual disability, allowing the clinical team to correctly discard this variant from their analysis. Both previous examples highlight the utility of bringing clinical expertise closer to genomic analyses and variant prioritization, as those responsible for the patient’s clinical care have unique insights about the patient’s phenotypes and specific disease presentation.

Gene.iobio is uniquely positioned for reanalysis and reinterpretation of sequencing data and the increasing efforts to improve diagnostic yield. Gene.iobio provides users with the most up-to-date variant annotations and gene:disease associations, served from their latest releases through automatically updating backend services (Fig. 2c). This allows a user to return to previous sequencing data and analyze variants with the most up-to-date annotations. We have diagnosed numerous cases at our institution using gene.iobio to reinterpret variants from previously undiagnosed cases. One example was a case that was sequenced by a commercial provider and remained undiagnosed for multiple years. After obtaining the sequencing data from the commercial provider, the clinical team reviewed computationally prioritized variants in gene.iobio. During this review, the clinical team identified a rare de novo frameshift variant in the SON gene (Fig. 3c). Since the original sequencing was performed, numerous publications have described de novo loss of function variants in the SON gene⁴¹,⁴²,⁴³,⁴⁴, with the published patients’ phenotypes largely overlapping that of our patient. Furthermore, the variant had also been asserted as pathogenic in ClinVar. All of this evidence, as displayed in gene.iobio, was sufficient to return a genetic diagnosis to the patient and family. This example highlights the ease and power of gene.iobio as a variant reanalysis and reinterpretation platform.

Gene.iobio also provides users with a clearer understanding of nuanced, and sometimes confusing genomic information. Gene.iobio allows users to quickly and easily identify and adjudicate potential false positive and false negative variant calls, a particularly challenging task for de novo variants (Fig. 4). Allele balance (the number and ratio of reference and alternate allele observations) can often help adjudicate de novo variants. Gene.iobio displays allele balance information in an easily understandable and visual format. Allele balance information in gene.iobio was used to identify a false negative de novo variant in the GRASP gene, where upstream prioritization tools discarded this variant due to low coverage. Yet upon viewing in gene.iobio, the user can readily determine that while the total depth is low (16X), the allele balance is consistent with a heterozygous de novo variant (6 alternate observations and 10 reference observations) and no alternate observations are observed in either parent (Fig. 4a). Furthermore, the user can launch the integrated IGV viewer and confirm at the read level that this is a high quality variant and should likely be retained.

Similarly, allele balance views in gene.iobio can be used to discard likely false positive de novo variants. One such example was a variant observed in the KMT2C gene (Fig. 4b). This variant was prioritized as de novo by upstream prioritization tools due to it being heterozygous in proband, and both parents being homozygous reference. However, upon viewing the variant in gene.iobio, the user can readily see a poor allele balance and poor evidence for the alternate allele (38 alternate observations and 502 reference observations) in the proband sample. This evidence, provided by gene.iobio, suggests this variant is a false positive and should likely be discarded.
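The allele balance arithmetic behind the GRASP and KMT2C examples can be written out as a short sketch. The 0.25 to 0.75 window used to flag a plausible heterozygous call is an illustrative assumption, not a threshold taken from gene.iobio, and the function names are hypothetical.

```python
def allele_balance(alt_obs, ref_obs):
    """Fraction of observations supporting the alternate allele."""
    total = alt_obs + ref_obs
    if total == 0:
        raise ValueError("no observations at this site")
    return alt_obs / total


def looks_heterozygous(alt_obs, ref_obs, low=0.25, high=0.75):
    """Check the balance against an assumed window for heterozygous calls."""
    return low <= allele_balance(alt_obs, ref_obs) <= high


# Counts quoted in the text for the two examples.
print("GRASP:", round(allele_balance(6, 10), 3), looks_heterozygous(6, 10))
print("KMT2C:", round(allele_balance(38, 502), 3), looks_heterozygous(38, 502))
```

With the counts from the text, the GRASP variant has a balance of 6/16 = 0.375 despite its low depth, while the KMT2C variant has a balance of 38/540, roughly 0.07, which is far from the 0.5 expected for a heterozygous site.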

Instances also exist where a variant is prioritized as de novo in the proband, yet evidence for the alternate allele exists in the parents, despite their genotypes being reported as homozygous reference. Gene.iobio also empowers users to identify these nuanced situations. One such example was observed in the CASP5 gene, where the proband was genotyped as heterozygous and both parents were genotyped as homozygous reference (Fig. 4c). However, when viewing this variant in gene.iobio, the user can readily observe that while there are 6 alternate allele observations in the proband, each parent also has 4 alternate allele observations. This evidence can be inspected further in IGV, where a nearby paternally inherited variant can also be observed. This evidence for the alternate allele in both parents and read level inspection in IGV suggests the variant may be inherited from either parent and is likely not a true de novo.
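The check described above, looking for alternate allele evidence in parents whose genotypes were reported as homozygous reference, can be sketched as follows. The function name, the flag labels and the default of tolerating zero parental alternate observations are assumptions for illustration.

```python
def de_novo_flags(proband_alt, mother_alt, father_alt, max_parent_alt=0):
    """Return reasons to doubt a de novo call, based on alternate read counts."""
    flags = []
    if proband_alt == 0:
        flags.append("no_alt_in_proband")
    if mother_alt > max_parent_alt:
        flags.append("alt_in_mother")
    if father_alt > max_parent_alt:
        flags.append("alt_in_father")
    return flags


# Alternate observation counts quoted in the text.
print("CASP5:", de_novo_flags(proband_alt=6, mother_alt=4, father_alt=4))
print("GRASP:", de_novo_flags(proband_alt=6, mother_alt=0, father_alt=0))
```

An empty list means the read counts raise no objection to the de novo call; it does not replace read-level inspection in IGV.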

Variant prioritization efforts only consider alleles that segregate with affected status. These Mendelian modes of inheritance include: autosomal dominant; autosomal recessive; de novo and X-linked. Violations of these inheritance modes are clinically important but can also lead to important questions about the samples (if sample swapping has occurred), the underlying genomic region (low complexity regions prone to calling multi-allelic variants) and the variant quality (often due to low complexity or difficult to sequence or call regions). A representative example of a Mendelian violation identified by gene.iobio was seen in the CNGA1 gene (Fig. 4d). In this example, the unaffected sibling in a quartet sequencing study shows a heterozygous genotype, despite both parents having strong evidence for their homozygous alternate genotypes. This is a clear violation of Mendelian inheritance modes and would lead the user to question the genotypes of all individuals in the family. This example highlights how gene.iobio conveys complex genetic concepts, encoded in genomic files, in a visual format that is immediately intuitive to the user.
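The Mendelian violation in the CNGA1 example can be expressed as a simple consistency check on genotypes. This sketch assumes an autosomal, diploid site with genotypes written as pairs of allele indices (0 for reference, 1 for alternate); it ignores de novo events and sex chromosomes, and the function name is hypothetical.

```python
def is_mendelian_consistent(child, mother, father):
    """True if the child could have received one allele from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


# CNGA1 example: both parents homozygous alternate, sibling heterozygous.
print(is_mendelian_consistent(child=(0, 1), mother=(1, 1), father=(1, 1)))

# A consistent case for comparison: the alternate allele comes from the father.
print(is_mendelian_consistent(child=(0, 1), mother=(0, 0), father=(0, 1)))
```

In the CNGA1 case the child's reference allele cannot come from either parent, so the check fails and the genotypes of the whole family warrant review.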

Current variant analysis platforms rely entirely on the variants called by the upstream variant calling pipeline. This reliance can be limiting, in that variants are often missed or inappropriately removed by post-processing filtering steps. In contrast, gene.iobio has built-in variant calling functionality, directly integrating Freebayes¹⁹ for on-demand variant calling. This is especially useful for cases where variants are suspected but have not been called, or to confirm that the provided VCF file has not undergone overly restrictive filtering that removed potentially interesting variants. Freebayes variant calling in gene.iobio calls variants in the provided gene regions in real-time. One example where this approach was useful was a case where a prior variant calling pipeline failed to call any variants in the gene DGCR2, a potentially clinically-relevant gene for the case. However, Freebayes variant calling in gene.iobio identified numerous variants (Fig. 4e). Our previous publication⁴⁵ also describes the benefits of this variant calling approach in early infantile epileptic encephalopathy cases.

Gene.iobio is a comprehensive and feature-rich variant interrogation and prioritization tool that incorporates state-of-the-art bioinformatics tools and clinical genetics resources into a single visual and interactive interface (see Supplemental Table 1). This single application approach removes burdens from the user in several ways. Gene.iobio removes the need to perform complex command line operations and to interpret often cryptic bioinformatics file formats and metrics. It also removes the need to manually match a given variant to external resources such as databases of publicly reported variants. Gene.iobio is publicly available for academic use (http://gene.iobio.io/) and has been integrated into the Mosaic⁴⁶ tool at our institution.

Discussion

Genetic information is becoming more routinely used to guide patient care. As such, clinical care providers are taking an increasingly active role in genetic analysis and diagnosis, from independently reviewing genetic findings to performing variant prioritization tasks. However, current variant prioritization and bioinformatics approaches rely almost exclusively on command line tools. Given their training and expertise, it is unreasonable to expect providers to add command line bioinformatics to their current patient care regimen. Our solution to this challenge has been to develop intuitive, visually driven web tools that are immediately usable by clinical care providers. This approach was the motivation for developing gene.iobio, a comprehensive genomic analysis and variant review application.

Gene.iobio enables secure, clinically-driven variant interrogation and prioritization, bringing clinical care providers’ intimate knowledge of the patient’s disease and phenotype closer to their genetics. Gene.iobio is a web application that allows providers to perform variant prioritization tasks through the web browser of any typical computer, without any specialized hardware or software. Gene.iobio protects genomic data security: data is never uploaded or stored, and all data is queried through secure data requests and connections. Gene.iobio is also interactive and visually-driven, allowing providers to immediately intuit what the data represents and to interact with it directly, regardless of their bioinformatics experience. Furthermore, visualizations and interactive data are streamed into the application in real-time, allowing users to immediately interact with and analyze their data. This real-time approach analyzes only discrete, user-provided genomic regions, removing the need for large genomics file uploads and for long-running end-to-end data processing. Lastly, gene.iobio addresses numerous data insufficiency issues and many nuanced considerations that arise during variant prioritization. These include the adjudication of putative de novo variants and the detection of Mendelian violations, both of which are presented to the user in an easily understandable visual format.

We continue to actively develop gene.iobio, releasing regular updates that add new features and fix issues raised by our users. Furthermore, as the uses of sequencing and genomic data continue to expand, gene.iobio is well suited to integrate new features, adopt new annotation metrics and display new data types. Since its initial development, we have added new variant metrics, such as REVEL scores. We anticipate adding pertinent new metrics, such as the metric for constrained coding regions⁴⁷, as they are published. We are also exploring ways to add other relevant functional genomics data types to gene.iobio. In the future these may include data from ChIP-seq, RNA-seq, ATAC-seq or methylation studies.

Gene.iobio reimagines many of the current paradigms in clinical genetics and addresses many of the challenges associated with the increased incorporation of genetic information into clinical care. To our knowledge, the level of ease and sophistication provided by gene.iobio is unmatched by any other existing tool. We anticipate gene.iobio will help lower technical barriers and allow more providers to review and prioritize genetic variants for patients in their own clinical practices. These providers have an intimate understanding of their patient’s disease presentation and phenotypes, information that is unavailable to the bioinformatics analyst in the current paradigm. We anticipate that enabling clinicians in this way will help accelerate the adoption of genetic information into clinical care decisions and ultimately improve patient care and outcomes. We also anticipate that gene.iobio will contribute to new genetic diagnoses and the discovery of new genetic disorders through reanalysis and reinterpretation of previously undiagnosed cases, an approach that is being increasingly appreciated in the field.

Change history

06 April 2022
A Correction to this paper has been published: https://doi.org/10.1038/s41598-022-09959-3

References

1. Paila, U., Chapman, B. A., Kirchner, R. & Quinlan, A. R. GEMINI: Integrative exploration of genetic variation and genome annotations. PLoS Comput. Biol. 9, e1003153 (2013).
2. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data. Nucl. Acids Res. 38, e164 (2010).
3. Moore, B., Flygare, S., Reese, M. G. & Yandell, M. VAAST 2.0: Improved variant classification and disease-gene identification using a conservation-controlled amino acid substitution matrix. Genet. Epidemiol. 37, 622–634 (2013).
4. Pedersen, B. S. et al. Effective variant filtering and expected candidate variant yield in studies of rare human disease. NPJ Genom. Med. https://doi.org/10.1101/2020.08.13.249532 (2020).
5. Farnaes, L. et al. Rapid whole-genome sequencing decreases infant morbidity and cost of hospitalization. NPJ Genom. Med. 3, 10 (2018).
6. What is it? iobio. http://iobio.io/.
7. Miller, C. A., Qiao, Y., DiSera, T., D’Astous, B. & Marth, G. T. Bam.iobio: A web-based, real-time, sequence alignment file inspector. Nat. Methods 11, 1189 (2014).
8. Ekawade, A., Velinder, M., Ward, A., DiSera, T. & Marth, G. Genepanel.iobio: An easy to use web tool for generating disease- and phenotype-associated gene lists. bioRxiv https://doi.org/10.1101/722843 (2019).
9. Ossio, R. et al. VCF/Plotein: Visualization and prioritization of genomic variants from human exome sequencing projects. Bioinformatics 35, 4803–4805 (2019).
10. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
11. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
12. Li, H. Tabix: fast retrieval of sequence features from generic TAB-delimited files. Bioinformatics 27, 718–719 (2011).
13. Tan, A., Abecasis, G. R. & Kang, H. M. Unified representation of genetic variants. Bioinformatics 31, 2202–2204 (2015).
14. McLaren, W. et al. The ensembl variant effect predictor. Genom. Biol. 17, 122 (2016).
15. Landrum, M. J. et al. ClinVar: Public archive of relationships among sequence variation and human phenotype. Nucl. Acids Res. 42, D980–D985 (2014).
16. Ioannidis, N. M. et al. REVEL: An ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. 99, 877–885 (2016).
17. den Dunnen, J. T. et al. HGVS recommendations for the description of sequence variants: 2016 update. Hum. Mutat. 37, 564–569 (2016).
18. Sherry, S. T. et al. dbSNP: The NCBI database of genetic variation. Nucl. Acids Res. 29, 308–311 (2001).
19. Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. arXiv [q-bio.GN] (2012).
20. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
21. Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucl. Acids Res. 47, D766–D773 (2019).
22. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: Current status, taxonomic expansion, and functional annotation. Nucl. Acids Res. 44, D733–D745 (2016).
23. Yang, H., Robinson, P. N. & Wang, K. Phenolyzer: Phenotype-based prioritization of candidate genes for human diseases. Nat. Methods 12, 841–843 (2015).
24. McKusick, V. A. Mendelian inheritance in man and its online version, OMIM. Am. J. Hum. Genet. 80, 588–604 (2007).
25. Bostock, M., Ogievetsky, V. & Heer, J. D³: Data-driven documents. IEEE Trans. Vis. Comput. Graph. 17, 2301–2309 (2011).
26. Sayers, E. The E-utilities In-Depth: Parameters, Syntax and More (National Center for Biotechnology Information (US), 2018).
27. Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genom. Res. 20, 110–121 (2010).
28. Kent, W. J. et al. The human genome browser at UCSC. Genom. Res. 12, 996–1006 (2002).
29. Wang, J. et al. MARRVEL: Integration of human and model organism genetic resources to facilitate functional annotation of the human genome. Am. J. Hum. Genet. 100, 843–853 (2017).
30. Kopanos, C. et al. VarSome: The human genomic variant search engine. Bioinformatics 35, 1978–1980 (2019).
31. OMIM: Online Mendelian Inheritance in Man. https://omim.org/.
32. Firth, H. V. et al. DECIPHER: Database of chromosomal imbalance and phenotype in humans using ensembl resources. Am. J. Hum. Genet. 84, 524–533 (2009).
33. Stelzer, G. et al. The GeneCards suite: From gene data mining to disease genome sequence analyses. Curr. Protoc. Bioinform. 54, 1.30.1-1.30.33 (2016).
34. GTEx Consortium. The genotype-tissue expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
35. Nebula Genomics. https://nebula.org/whole-genome-sequencing/.
36. University of Washington Center for Mendelian Genomics. MyGene2. https://mygene2.org/MyGene2/.
37. Köhler, S. et al. Clinical diagnostics in human genetics with semantic similarity searches in ontologies. Am. J. Hum. Genet. 85, 457–464 (2009).
38. Martin, A. R. et al. PanelApp crowdsources expert knowledge to establish consensus diagnostic gene panels. Nat. Genet. 51, 1560–1565 (2019).
39. Karczewski, K. J. et al. Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes. bioRxiv https://doi.org/10.1101/531210 (2019).
40. Vergano, S. A., van der Sluijs, P. J. & Santen, G. ARID1B-related disorder. In GeneReviews® (eds Adam, M. P. et al.) (University of Washington, 2019).
41. Takenouchi, T., Miura, K., Uehara, T., Mizuno, S. & Kosaki, K. Establishing SON in 21q22.11 as a cause a new syndromic form of intellectual disability: Possible contribution to Braddock-Carey syndrome phenotype. Am. J. Med. Genet. A 170, 2587–2590 (2016).
42. Kim, J.-H. et al. De novo mutations in SON disrupt RNA splicing of genes essential for brain development and metabolism, causing an intellectual-disability syndrome. Am. J. Hum. Genet. 99, 711–719 (2016).
43. Tokita, M. J. et al. De novo truncating variants in SON cause intellectual disability, congenital malformations, and failure to thrive. Am. J. Hum. Genet. 99, 720–727 (2016).
44. Yang, Y., Xu, L., Yu, Z., Huang, H. & Yang, L. Clinical and genetic analysis of ZTTK syndrome caused by SON heterozygous mutation c.394C>T. Mol. Genet. Genom. Med. 7, e953 (2019).
45. Ward, A. et al. Rapid clinical diagnostic variant investigation of genomic patient sequencing data with iobio web tools. J. Clin. Transl. Sci. 1, 381–386 (2017).
46. Frameshift Genomics: Genomic Data Visualization and Analytics. Frameshift Genomics https://frameshift.io/.
47. Havrilla, J. M., Pedersen, B. S., Layer, R. M. & Quinlan, A. R. A map of constrained coding regions in the human genome. bioRxiv https://doi.org/10.1101/220814 (2017).

Acknowledgements

We would like to acknowledge all those whose feedback, both through personal use and organized testing sessions, has been instrumental in focusing the development of gene.iobio and of iobio in general. This includes: Rong Mao, Pinar Bayrak-Toydemir, Steven Guthery, Marti Tristani-Firouzi, Nicola Longo, Betsy Ostrander, Hilary Coon, Josh Bonkowsky, Tatiana Tvrdik, Will Dere, Karl Voelkerding, Attila Kumanovics, Karin Chen, Russ Butterfield, Steve Bleyl and David Nix. This research was carried out with funding from NHGRI (R01HG009712 to GTM).

Author information

Authors and Affiliations

University of Utah, Salt Lake City, UT, USA
Tonya Di Sera, Matt Velinder, Alistair Ward, Yi Qiao, Stephanie Georges, Chase Miller, Anders Pitman, Will Richards, Aditya Ekawade, David Viskochil, John C. Carey, Laura Pace, Jim Bale, Stacey L. Clardy, Ashley Andrews, Lorenzo Botto & Gabor Marth

Contributions

T.D. implemented the overwhelming majority of the software. A.E., W.R., S.G. and A.P. helped in software development. A.W., Y.Q., C.M. and G.M. conceived of the project. D.V., J.C., L.P., J.B., S.C., A.A. and L.B. helped guide software development from a clinical perspective. M.V. managed software development and conceived of and wrote the manuscript.

Corresponding authors

Correspondence to Matt Velinder or Gabor Marth.

Ethics declarations

Competing interests

Gabor Marth is a founder and chief scientific officer of Frameshift Labs, Inc. and Alistair Ward is a founder and chief executive officer of Frameshift Labs, Inc.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this Article was revised: The Competing interests section in the original version of this Article was incorrect. “The authors declare no competing interests.” now reads: “Gabor Marth is a founder and chief scientific officer of Frameshift Labs, Inc. and Alistair Ward is a founder and chief executive officer of Frameshift Labs, Inc.”

Supplementary Information

Supplementary Table 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Di Sera, T., Velinder, M., Ward, A. et al. Gene.iobio: an interactive web tool for versatile, clinically-driven variant interrogation and prioritization. Sci Rep 11, 20307 (2021). https://doi.org/10.1038/s41598-021-99752-5

Received: 07 April 2021
Accepted: 03 September 2021
Published: 13 October 2021
DOI: https://doi.org/10.1038/s41598-021-99752-5
