Saliva research

The Strategic Plan of the National Institute of Dental and Craniofacial Research strongly recognizes the need to advance saliva research.1 The National Cancer Institute has also recognized saliva as a promising source of cancer biomarkers.2 The ability to monitor health status, disease onset, progression, recurrence and treatment outcome through non-invasive means is highly important to advancing health care management. Saliva (oral fluid) is a medium well suited to this purpose, offering a non-invasive, easy-to-obtain means of detecting and monitoring disease.

Adopting saliva testing would allow patients to collect their own specimens at home, which would reduce health costs, be more convenient for the patient and make repeated sampling easier. Saliva collection is less objectionable to patients than the collection of other bodily fluids, and it is easier in children and older individuals. The analysis of saliva can thus provide a cost-effective approach to screening large populations. Because of these advantages, developing salivary biomarkers for the detection of serious illnesses such as oral and systemic cancers has been on the national healthcare agenda for several years (Government Performance & Results Act 2008).3 One mandate formulated in the Government Performance & Results Act report is that by 2013 proof of principle will be obtained for the ability of saliva to monitor health and to diagnose one systemic disease.

Challenges and opportunities

Recent studies using high-throughput technologies have generated a vast amount of salivaomics data.4,5,6,7 However, researchers must still overcome barriers before such data can be exploited, including the lack of computationally accessible salivary data and information and the inability to cross-reference the salivaomics data that could potentially be made available through different proteomics, transcriptomics, genomics and metabolomics studies. For these reasons, there is an urgent need to create the Salivaomics Knowledge Base (SKB), a data management system and web resource constructed to support saliva diagnostics research. We present below the informatics advances brought about through the SKB and its associated tools and resources.

Biomedical ontology

Ontologies are controlled, structured vocabularies designed to provide a consensus-based means of ensuring that scientists working in disparate domains describe data consistently. In the biomedical domain, ontologies play a key role in the consistent annotation of biological and medical data and information, most conspicuously within the framework of the Gene Ontology8 and now of its sister ontologies within the Open Biomedical Ontologies Foundry (http://obofoundry.org).

The Basic Formal Ontology (BFO) is a formal ontological framework developed by Barry Smith, Pierre Grenon and others, which serves as the starting point for some 100 ontology projects primarily in the biomedical domain (http://www.ifomis.uni-saarland.de/bfo/). The BFO framework can be readily extended to the treatment of families of ontologies of other types, above all to the treatment of relations between ontologies of different levels of granularity, from genes to species and from a single patient to epidemics at a geographical scale (combining applications of BFO to the medical and to the geographical domain). The framework may also be used as a tool for dealing with the relations between distinct perspectives on the biomedical domain, including culturally generated perspectives of the sort which are studied by linguists and anthropologists.9

Two BFO-based ontologies of special significance for our work here are the Ontology for Biomedical Investigations (OBI)10 and the Ontology for General Medical Science. OBI addresses the need for a controlled vocabulary to support the integration of experimental data. It is designed to serve the coordinated representation of designs, protocols, instrumentation, materials, processes, data and types of analysis in all areas of biological and biomedical investigation. The Ontology for General Medical Science is an ontology of the entities involved in the clinical encounter. It thus includes very general terms that are used across medical disciplines, including ‘disease’, ‘disorder’, ‘disease course’, ‘diagnosis’, ‘patient’ and ‘healthcare provider’.11

A dental research ontology consortium

To advance the consistency of data in the dental research community, Smith et al.12 propose an approach to building a consensus-based ontology to support dental research (ODR). By analogy with efforts in other fields,11 a consortium of research groups specializing in different areas of study would undertake such an effort, each building different components of the ODR. Initial efforts in this direction, by scientists in dental research and biomedical ontology at the University at Buffalo and the University of California, include work on the ontology of oral pathology, oral maxillofacial anatomy, dental disease and dental procedures and, as we discuss below, the Saliva Ontology.13

Integral to this work is a plan to allow a seamless connection between the ODR and existing ontology resources developed in other areas of biology and medicine, by reusing elements and strategies from them. The anatomy work is based on the Foundational Model of Anatomy. The ontology of dental procedures uses the framework established by OBI.14 The work on dental diseases is carried out in conjunction with the development of the Ontology for General Medical Science.11

Saliva ontology

The Saliva Ontology (SALO) (Figure 1) is a detailed ontology of saliva that is optimized to meet the needs of both the clinical diagnostic community and the cross-disciplinary community of omics researchers. SALO is being created through cross-disciplinary interaction among saliva experts, protein experts, diagnosticians and ontologists. To aid the development and testing of SALO, we are developing a corpus of saliva-relevant literature in SKB to assist in characterizing core terms and synonyms within the ontology and to provide links between SALO content and relevant items in PubMed. SKB will also incorporate the results of experiments in data and text mining using the ontology.

Figure 1. A fragment of the basic Saliva Ontology in its current form.

SALO will incorporate links to existing ontologies and terminology resources that treat saliva-relevant phenomena. We will also identify and represent within SALO relationships to saliva-relevant types represented in ontologies such as the Gene Ontology, the Protein Ontology15 and the Chemical Entities of Biological Interest ontology,16 and provide links to corresponding SNOMED CT terms where available. SALO is a public domain resource and entirely web-based. Each term in the ontology has its own URL, which points to a webpage providing definitions, PubMed sources, and references to annotations in SKB and to external databases. SALO and the Blood Ontology17 are the foundation for a unified body fluids ontology resource, the Body Fluids Ontology.18
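As an illustration of how ontology terms, their parent links and their cross-references to external resources might be represented computationally, consider the minimal Python sketch below. It is not the SALO implementation. The `Term` class, the `ancestors` function, the `EX:` identifiers, the labels and the cross-reference value are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    """One ontology term with is_a parents and external cross-references."""
    term_id: str                      # hypothetical identifier, not a real SALO ID
    label: str
    is_a: List[str] = field(default_factory=list)
    xrefs: Dict[str, str] = field(default_factory=dict)


def ancestors(term_id: str, terms: Dict[str, Term]) -> List[str]:
    """Return every is_a ancestor of term_id, nearest first, without duplicates."""
    found: List[str] = []
    queue = list(terms[term_id].is_a)
    while queue:
        parent = queue.pop(0)
        if parent not in found:
            found.append(parent)
            queue.extend(terms[parent].is_a)
    return found


# Illustrative terms only; labels and identifiers are assumptions.
TERMS = {
    "EX:0001": Term("EX:0001", "bodily fluid"),
    "EX:0002": Term("EX:0002", "saliva", is_a=["EX:0001"]),
    "EX:0003": Term(
        "EX:0003",
        "stimulated saliva",
        is_a=["EX:0002"],
        xrefs={"SNOMED CT": "placeholder-code"},
    ),
}

if __name__ == "__main__":
    for ancestor_id in ancestors("EX:0003", TERMS):
        print(ancestor_id, TERMS[ancestor_id].label)
```

Storing parents and cross-references as identifiers rather than nested objects keeps each term independently addressable, which mirrors the idea that each term has its own URL.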

BioMart

BioMart is a free, open-source, federated database system. It is cross-platform and supports many popular relational database management systems, including MySQL, Oracle, PostgreSQL, SQL Server and DB2. The software is data-agnostic and can therefore be easily adapted to existing data sets. It is expandable and customizable through a plug-in system, and because it is open-source the community can participate in deeper development. Furthermore, BioMart can seamlessly connect geographically disparate databases, facilitating collaboration between different groups. These features have catalyzed the creation of BioMart Central Portal, a first-of-its-kind, community-supported effort to create a single access point integrating many different, independently administered biological databases. Anybody can contribute an independently maintained resource to the Central Portal, allowing it to be exposed to and shared with the research community and linked with the other resources in the portal. Users can take advantage of the common interface to quickly use different sources without learning a new system for each. The system also simplifies cross-database searches that might otherwise require several complicated steps. Several integrated tools streamline common tasks, such as converting between ID formats and retrieving sequences. The combination of a wide variety of databases, an easy-to-use interface, robust programmatic access and the array of tools makes Central Portal a one-stop shop for biological data querying.19
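To give a flavor of programmatic access, the Python sketch below assembles a query in the XML form accepted by BioMart web services. It is an illustration under stated assumptions rather than a definitive client: the helper name `build_query` is hypothetical, the dataset, filter and attribute names are examples taken from Ensembl's mart and may differ on other BioMart servers, and the sketch only builds the `Query` element without sending it to any server.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def build_query(
    dataset: str,
    attributes: List[str],
    filters: Optional[Dict[str, str]] = None,
) -> str:
    """Build a BioMart-style XML query string (hypothetical helper)."""
    query = ET.Element(
        "Query",
        {
            "virtualSchemaName": "default",
            "formatter": "TSV",        # tab-separated results
            "header": "1",             # include a header row
            "uniqueRows": "1",         # drop duplicate rows
            "datasetConfigVersion": "0.6",
        },
    )
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict which records are returned.
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Attributes select which columns are returned.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    # Dataset, filter and attribute names are assumptions for illustration.
    print(
        build_query(
            "hsapiens_gene_ensembl",
            ["ensembl_gene_id", "external_gene_name"],
            {"chromosome_name": "1"},
        )
    )
```

The resulting string would typically be submitted as the `query` parameter of a mart's web service endpoint; the endpoint address depends on the server and is not assumed here.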

SDxMart—a BioMart portal for salivaomics data

SDxMart is a BioMart data portal that hosts salivary proteomic, transcriptomic, metabolomic and microRNA data and offers access to the data through the BioMart interface and querying environment. It is designed to support a variety of queries that facilitate saliva biomarker discovery, including complex queries that integrate genomic, clinical and functional information.

SDxMart holds data from projects on oral and systemic diseases, including oral cancer, Sjögren’s syndrome, pancreatic cancer and breast cancer. The types of datasets are: (i) proteomics; (ii) transcriptomics; (iii) microRNA; and (iv) metabolomics. In addition, several public databases have been imported into SDxMart, including the Ensembl genome database (Ensembl release 37),20 and the number of resources is continuously growing.
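The kind of cross-referencing that such a portal is meant to support can be sketched in a few lines of Python. The example below is a simplified illustration, not SDxMart's query logic: it assumes each dataset type has already been reduced to a list of shared gene identifiers, and the function name `cross_reference` and the identifiers `GENE_A` to `GENE_C` are invented placeholders, not real results.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def cross_reference(
    datasets: Dict[str, Iterable[str]], min_sources: int = 2
) -> Dict[str, List[str]]:
    """Map each identifier to the dataset types that report it.

    Only identifiers reported by at least min_sources dataset types are kept.
    """
    sources = defaultdict(set)
    for data_type, gene_ids in datasets.items():
        for gene_id in gene_ids:
            sources[gene_id].add(data_type)
    return {
        gene_id: sorted(types)
        for gene_id, types in sorted(sources.items())
        if len(types) >= min_sources
    }


# Placeholder identifiers only; these are not real study results.
EXAMPLE = {
    "proteomics": ["GENE_A", "GENE_B"],
    "transcriptomics": ["GENE_B", "GENE_C"],
    "microRNA": ["GENE_C", "GENE_B"],
}

if __name__ == "__main__":
    for gene_id, types in cross_reference(EXAMPLE).items():
        print(gene_id, ", ".join(types))
```

In practice the identifiers from different omics platforms would first have to be mapped to a common namespace, which is one of the tasks a shared resource such as the Ensembl genome database can support.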

Summary

The SKB is being created to help researchers use salivary data from multiple perspectives. It is being built in tandem with SALO and SDxMart, which will allow the SKB to interoperate with other omics databases as part of a general strategy to facilitate the integration of heterogeneous and disparate data sources and so enable systems biology approaches. SALO and SDxMart are each the first and only resource of its kind in the field of dentistry.