To fully understand the context, methods, data and conclusions that pertain to an experiment, one must have access to a range of background information. However, the current diversity of experimental designs and analytical techniques complicates the discovery and evaluation of experimental data; furthermore, the increasing rate at which those data are produced compounds the problem. Community opinion increasingly favors associating a regularized set of the available metadata ('data about the data') pertaining to an experiment1,2 with its results, making explicit both the biological and methodological contexts. Many journals and funding agencies now require that authors reporting microarray-based transcriptomics experiments comply with the Minimum Information about a Microarray Experiment (MIAME) checklist3 as a prerequisite for publication4,5,6,7. Similarly, minimum information guidelines for reporting proteomics experiments and describing systems biology models are gaining broader support in their respective database communities8,9; and progress is being made toward the standardization of the reporting of clinical trials in the medical literature10. Such minimum information checklists promote transparency in experimental reporting, enhance access to data and support effective quality assessment, increasing the general value of a body of work (and the competitiveness of the originators).
Collaborative minimum information checklist development projects for diverse biologically and technologically delineated subject areas are ongoing. A special issue of the journal OMICS11 included invited pieces from eight communities supporting minimum information checklist development projects. However, until recently there were no mechanisms for such projects to coordinate their development. Consequently, the full range of checklists can be difficult to establish without intensive searching, and tracking their evolution is nontrivial. Furthermore, overlaps in scope and arbitrary decisions on wording and substructuring inhibit their use in combination. These issues present difficulties for checklist users, especially those who routinely combine information from several disciplines. Here we explore some of the issues arising from the development of checklists in relative isolation, discuss the potential benefits of greater coordination and describe the mechanisms we have put in place to facilitate such coordination. Specifically, we present the MIBBI project (http://www.mibbi.org/), which maintains a web-based, freely accessible resource for checklist projects, providing straightforward access to extant checklists (and to complementary data formats, controlled vocabularies, tools and databases), thereby enhancing both transparency and accessibility, as discussed above. MIBBI is managed by representatives of its various participant communities and is fully open to comment from any interested party. Our goal is to facilitate the development of an integrated checklist resource site for the wider bioscience community.
On the need to harmonize minimum information checklists
The current proliferation of documents specifying the minimum information to provide when reporting particular kinds of experimental data has in large part been driven by the advent of a range of so-called 'omics' (and allied) technologies, many of which operate in a high-throughput mode, thereby generating large volumes of data. These documents have been developed independently for the most part, and as a result feature many arbitrary differences in both wording and structure. This greatly complicates the integration of data sets that comply with different minimum information checklists. Increasing appreciation of the potential value accruing to 'secondary use' of data is also a significant factor8, reflecting the general increase in frequency of data-driven (as opposed to hypothesis-driven) investigations in recent years. These trends have together made the need for coordination and harmonization between groups developing data format and reporting standards a critical issue10. Throughout this document, the words 'standard' and 'standardization' are used to refer only to the regularization of data capture, representation, annotation or reporting, as opposed to best practices for experimental procedures. Specifically, we refer to three kinds of reporting standards: (i) minimum information checklists or guidelines; (ii) formats (syntax); and (iii) controlled vocabularies and ontologies (semantics).
It is clear that checklists should be developed through close consultation with their sponsoring practitioner communities, but such checklists should also, we believe, be designed to anticipate 'cross-domain' integrative activities. It is unhelpful to confine checklists for the use of particular technologies to a limited set of biologically delineated communities, or to conceive of any such community as being restricted to a particular set of technologies. Consider mass spectrometry, which is used to study proteins and metabolites and even to sequence genes; or consider toxicology, which may use any or all of the available 'omics' technologies in pursuit of a greater understanding of the mode of action of a particular compound. Clearly the vistas from any two locations can overlap substantially, so who can claim sole ownership of any part of the scientific landscape? Initiatives such as that to harmonize the description of 'sample' (the biological source material for a study)12 or to develop (separable) community-level extensions to shared core standards such as MIAME to better describe domain-specific studies (for example, in environmental biology13) are clearly the order of the day. This throws into relief an important division between analytical approaches and the various subdivisions of the biosciences. Checklists that do not span that division will always achieve greater utility because they can be reused more straightforwardly to construct new, made-to-order checklists for a wider range of workflows.
The management of information from experiments (both data and metadata) requires the adoption of reporting standards that ensure transparency and interoperability and that facilitate the integration and exchange of data from different sources. Reporting standards also facilitate the execution of more powerful queries against repositories of experimental data because core information will be regularized and extended information will be supplied in a well characterized manner. This long-term vision will require significant effort and buy-in from a range of scientific communities spread across many nations, but development of some of the kinds of components required to establish such infrastructure is well underway: Functional Genomics Experiment14 (FuGE) is an object-oriented data model (with an associated XML-based syntactic format) capable of capturing a wide range of (meta)data in a consistent manner; Reporting Structure for Biological Investigations (RSBI)15 provides a foundational lingua franca for standards projects (described further below) and builds on this to define a simple, but general, tabular format (ISA-TAB)16 aligned with FuGE; Ontology for Biomedical Investigations (OBI; http://obi.sourceforge.net/) is a broad-scope ontology providing a self-compatible set of terms with which to describe a wide range of biological and medical studies; and the Open Biomedical Ontologies (OBO) Foundry17 (http://obofoundry.org/) coordinates the development of a set of 'gold-standard reference ontologies' (including OBI) that can be used in combination because they are based on common principles and, importantly, because procedures have been established to ensure resolution of the conflicts that might arise where ontologies overlap.
Although the primary purpose of minimum information checklists is to guide researchers in reporting their experiments, they can, for the kinds of projects mentioned above, serve a valuable role as key 'use cases', in that they represent the distilled opinion of a particular community on the information that should normally be captured to effectively describe a particular kind of experiment. They therefore provide a realistic scenario with which to test any resource's suitability for use by a community; for example, for software and database developers to ensure that their products can handle the specified data appropriately; or for instrument vendors to offer checklist-compliant data set export from their instrument management software. It is also likely that journals and funders will adopt some checklists wholesale, incorporating them into their guidance for authors and applicants.
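To make the 'use case' role more concrete, the following Python sketch shows one way a software or database developer might test whether a metadata record supplies every item that a checklist requires. The `Checklist` class, the `missing_items` and `is_compliant` functions and all item names are hypothetical illustrations, not part of any MIBBI-registered checklist or tool; real checklists also specify what each item should contain, which this sketch does not check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checklist:
    """Hypothetical minimum information checklist: a name plus required item names."""
    name: str
    required_items: frozenset


def missing_items(checklist: Checklist, record: dict) -> set:
    """Return the required items that are absent from, or blank in, a metadata record."""
    return {
        item
        for item in checklist.required_items
        if not str(record.get(item, "")).strip()
    }


def is_compliant(checklist: Checklist, record: dict) -> bool:
    """A record complies only if no required item is missing."""
    return not missing_items(checklist, record)


# Illustrative use with made-up item names (not taken from a real checklist).
demo = Checklist("toy-checklist", frozenset({"organism", "sample_source", "instrument_model"}))
record = {"organism": "Mus musculus", "sample_source": "liver", "instrument_model": ""}
print(sorted(missing_items(demo, record)))  # ['instrument_model']
print(is_compliant(demo, record))           # False
```

A blank value is treated as missing here, on the assumption that an empty field conveys no more information than an absent one.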
A resource for minimum information checklists: MIBBI
The activities of standardization groups often go unpublished and may not be accessible at all, practically speaking. A common resource for minimum information checklists, coordinated by a group of community representatives from ongoing standardization activities, will help unify the standardization community. It will assist in recruiting participants to ongoing activities and it will help to maintain transparency of process by providing access to project-related information (for example, status, key players and plans). It will also ease the establishment of new initiatives by providing answers to questions such as, “How do we get started?” and, importantly, “How do we make sure we don't reinvent the wheel?” Such an effort will improve communication, knowledge transfer and integration between checklist development projects hailing from different scientific communities and, further, between different kinds of reporting standards projects, ultimately resulting in simplified access to a broad range of richly annotated data for the end user. Thus, we have established the MIBBI project, a web-based, communal resource designed to act as a 'one-stop shop' for those exploring the range of extant checklist projects and to foster collaborative, integrative development of checklists (http://www.mibbi.org/).
MIBBI has two key parts. The first is the 'Portal', which exists simply to raise awareness of, and afford more straightforward access to, a wide range of checklists by providing researchers, journal editors, reviewers, funders and the wider community of checklist developers with a quick and simple way to discover whether a checklist addresses a particular area and to establish the scope and progress of the underlying project. The Portal provides summary information for each of the MIBBI-affiliated projects; specifically, the primary contact(s) and website (where available), an overview of the project's scope and developmental status and links to publications and other documents (including, where possible, a link to the most recent version of that project's checklist). Information available through the Portal will be updated as circumstances change (for example, if a project is fragmented or amalgamated, or simply becomes dormant). Box 1 offers brief textual descriptions of the 21 projects currently registered with MIBBI; Table 1 provides a representation of the concepts that make up each project's scope, along with each checklist's developmental status and, where applicable, an indication that a checklist is composed of separate modules.
By signing up with the MIBBI Portal and thereby attracting more intensive peer oversight, communities will come under pressure to maintain their checklists in light of scientific advances, to provide open access to their processes and to respond to comments. We hope that one of the primary benefits of the Portal will be to raise awareness in the biological and medical communities of the importance of standardization, thereby increasing willingness among researchers to become involved in guiding and shaping the evolution of these activities. We hope it will help push the community to strive for compliance in their own publication and data-dissemination practices by facilitating access to relevant information about these efforts. We also see this as an excellent artifact with which to promote collaboration within and between communities: the principle we endorse is that if a broadly relevant effort already exists (for example, describing the use of a particular technology), individuals with an interest should seek to join that effort rather than compete with it. However, it is crucial that MIBBI never preclude revisions or innovations; the hoped-for kudos and enhanced coordination accruing to membership should not translate into dominion over a field.
The second key part of MIBBI is the 'Foundry'. Communities can, if motivated, sign up with the Foundry to jointly examine ways to refactor the checklists over which they have control and then to develop a suite of self-consistent, clearly bounded, orthogonal, integrable checklist modules. These modules will then be made available to the community through the MICheckout tool, a collaborative development between the European Bioinformatics Institute and the UK Natural Environment Research Council's Environmental Bioinformatics Centre. MICheckout will assist users in compiling the correct list of modules and downloading them in a form that they can use. Note that registering a project with MIBBI implies no commitment by a project to participate in the Foundry activity. Furthermore, attempts to integrate checklists through the Foundry should be managed through a community-driven mechanism that relies primarily on openness and transparency to encourage (voluntary) uptake. The MIBBI Foundry is modeled on the OBO Foundry17, a newly established initiative in the field of ontology development. Communities working together through MIBBI will produce orthogonal (that is, non-overlapping) minimum information modules, just as the communities involved with the OBO Foundry are aiming to produce orthogonal ontologies.
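The idea of orthogonal, combinable modules can be illustrated with a short Python sketch, assuming (purely for illustration) that a module is simply a named set of reporting items. The module names, item names and the `overlaps` and `compile_checklist` functions are hypothetical; they do not describe the actual interface or behavior of MICheckout.

```python
from itertools import combinations

# Hypothetical modules: module name -> set of reporting items it owns.
MODULES = {
    "organism": {"species", "strain"},
    "sample": {"tissue", "collection_date"},
    "ms_assay": {"instrument_model", "ionization_method"},
    "qpcr_assay": {"primer_sequences", "instrument_model"},
}


def overlaps(modules: dict) -> dict:
    """Map each pair of modules to the items they both claim (empty => orthogonal)."""
    found = {}
    for a, b in combinations(sorted(modules), 2):
        shared = modules[a] & modules[b]
        if shared:
            found[(a, b)] = shared
    return found


def compile_checklist(modules: dict, selected: list) -> list:
    """Merge the selected modules into one checklist, refusing overlapping modules."""
    chosen = {name: modules[name] for name in selected}
    clashes = overlaps(chosen)
    if clashes:
        raise ValueError(f"selected modules are not orthogonal: {clashes}")
    return sorted(set().union(*chosen.values()))


print(overlaps(MODULES))
# {('ms_assay', 'qpcr_assay'): {'instrument_model'}}
print(compile_checklist(MODULES, ["organism", "sample", "ms_assay"]))
# ['collection_date', 'instrument_model', 'ionization_method', 'species', 'strain', 'tissue']
```

In this toy example `instrument_model` is claimed by two modules, so they cannot be combined; in the spirit of the Foundry, such a shared item would instead be factored out into a single module that both communities reuse.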
Foundry activities must be driven by the member communities (acting through their representatives). In preparation for the Foundry activity, we have established discussion forums to facilitate communication between communities and to encourage discussion of the overlaps between checklists. Exploratory studies are ongoing, based on coarse comparison tables (such as Table 1) that highlight areas addressed by one or more projects. The next stage is to use 'groupware' (for example, a wiki or an online document-sharing tool) to jointly develop modules for those shared areas. Throughout this gradually intensifying activity, we will hold regular face-to-face meetings that act as development workshops and promote good working relationships between project representatives. The first such meeting was held in April 2008 at the European Bioinformatics Institute and was funded by the UK's Biotechnology and Biological Sciences Research Council. This first meeting rapidly reached consensus on a work plan and established working groups to begin to generate MIBBI Foundry modules. A full workshop report is available through the project's website (http://www.mibbi.org/).
High-level abstractions of the components of experimental workflows offer a useful framework to support the integration of checklists. An example of a group attempting to produce such abstractions is the RSBI working group14, which interacts with a number of other initiatives18,19,20 in working toward an integrated view of functional genomics investigations. In their characterization, an 'Investigation' is a self-contained unit of scientific enquiry, with a holistic hypothesis or objective and a design that is defined by the relationships between one or more 'Studies' and 'Assays'. A Study represents the part of an experiment containing information about the biological material, and an Assay is the part using particular technologies that produce data. The RSBI's proposed framework of well defined, high-level abstractions (such as the three just described) was developed because the above concepts are duplicated, but differently named, across different checklists, confounding the uniform description of the diverse events that may occur within a Study (sensu RSBI).
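The RSBI abstractions can be pictured as a simple containment hierarchy. The Python sketch below is an illustrative assumption, not the RSBI, FuGE or ISA-TAB model: the class and field names are hypothetical, and nesting each Assay inside a single Study simplifies the more general relationships that an Investigation's design may define.

```python
from dataclasses import dataclass, field


@dataclass
class Assay:
    """Part of an experiment that applies a particular technology to produce data."""
    technology: str
    data_files: list = field(default_factory=list)


@dataclass
class Study:
    """Part of an experiment describing the biological material."""
    organism: str
    material: str
    assays: list = field(default_factory=list)


@dataclass
class Investigation:
    """Self-contained unit of enquiry with an overall objective and one or more Studies."""
    objective: str
    studies: list = field(default_factory=list)

    def technologies(self) -> set:
        """Collect every technology used by any Assay in any Study."""
        return {assay.technology for study in self.studies for assay in study.assays}


# Hypothetical toxicology investigation combining two 'omics' technologies.
inv = Investigation(
    objective="mode of action of compound X",
    studies=[
        Study(
            organism="Rattus norvegicus",
            material="liver",
            assays=[Assay("DNA microarray"), Assay("mass spectrometry")],
        )
    ],
)
print(sorted(inv.technologies()))  # ['DNA microarray', 'mass spectrometry']
```

Because the biological material is described once at the Study level, the same description can be shared by every Assay, which is the kind of duplication across checklists that these abstractions are meant to avoid.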
Foundational analysis of MIBBI-registered projects
To better understand the scope and depth of the various MIBBI-registered minimum information checklists, we performed a comparative analysis. Table 1 presents a projection of the various checklists onto a coarse-grained list of ad hoc concepts, constructed exclusively for the purpose of identifying overlaps between those existing checklists; note that the concepts vary widely in breadth of scope (see Box 2), so the number of concepts addressed by any one project is not necessarily indicative of the size of that project's guidelines, as some concepts cover whole workflows (for example, 'nucleic acid sequencing'). It will be clear to the reader that some of these concepts, such as 'organism', are almost universal, whereas others, such as 'quantitative PCR amplification', may relate to one group alone. It is also clear that the depth of description required in relation to particular concepts varies widely across projects, suggesting a 'tiered' approach; that is, some of the checklist modules generated by the MIBBI Foundry should require different depths of description depending on the particular experimental context. Row and column totals (summing presence or absence only) are provided in Table 1; the row totals have been used to rank-order concepts by 'popularity'. Figure 1 lists the eighteen most common ad hoc concepts.
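The row and column totals described above are simple sums over a presence/absence matrix. The Python sketch below uses a small made-up matrix (placeholder project and concept names, not the contents of Table 1) to show how row totals can rank concepts by 'popularity'; breaking ties alphabetically is an assumption of this sketch.

```python
# Hypothetical presence (1) / absence (0) matrix: concept -> one flag per project.
PROJECTS = ["ProjectA", "ProjectB", "ProjectC", "ProjectD"]
MATRIX = {
    "organism": [1, 1, 1, 1],
    "sample": [1, 1, 0, 1],
    "mass spectrometry": [0, 1, 0, 1],
    "qPCR amplification": [0, 0, 1, 0],
}


def row_totals(matrix: dict) -> dict:
    """Number of projects whose scope includes each concept."""
    return {concept: sum(flags) for concept, flags in matrix.items()}


def column_totals(matrix: dict, projects: list) -> dict:
    """Number of concepts within each project's scope."""
    return {
        project: sum(flags[i] for flags in matrix.values())
        for i, project in enumerate(projects)
    }


def rank_by_popularity(matrix: dict) -> list:
    """Concepts ordered by descending row total, ties broken alphabetically."""
    totals = row_totals(matrix)
    return sorted(totals, key=lambda concept: (-totals[concept], concept))


print(rank_by_popularity(MATRIX))
# ['organism', 'sample', 'mass spectrometry', 'qPCR amplification']
print(column_totals(MATRIX, PROJECTS))
# {'ProjectA': 2, 'ProjectB': 3, 'ProjectC': 2, 'ProjectD': 3}
```

As the text cautions, a column total counts concepts rather than checklist items, so it says little about the size of a project's guidelines when some concepts cover whole workflows.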
To support greater understanding of the relatedness of the different projects and of the various ad hoc concepts, we conducted two pairwise comparisons using the data presented in Table 1: concepts 'shared' between pairs of projects, and pairs of concepts occurring together within projects (counting presence or absence only). Supplementary Figure 1 online illustrates the interrelatedness of the 21 MIBBI-registered projects both as a tree and as an interaction graph. These two representations make clear that there is a subset of closely related (that is, heavily overlapping) projects; these are, broadly speaking, the 'technologically delineated' projects, such as MIAME and the Minimum Information About Proteomics Experiment (MIAPE). It is also clear that there are many projects that are 'related' (according to the tree, if considered in isolation) only by their low degree of relatedness to any other project (as the interaction graph makes explicit). Supplementary Figure 2 online presents an unrooted tree expressing the relatedness of individual concepts. Although this analysis is based on the various projects' scopes, rather than any sense of the similarity of the concepts themselves, it produces some sensible-looking groupings. All the highly ranked ('high-priority') concepts from Figure 1 cluster together because most of the projects share an interest in many of them, so they are often found to occur together in individual projects' scopes. Such an analysis can help in deciding how the ad hoc concept-based survey presented in Table 1 should be used as we draft the checklist modules that will ultimately be developed by participants in the MIBBI Foundry's activities (that is, whether some concepts can be combined, whether others should be further subdivided, and so on).
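Both pairwise comparisons can be computed directly from per-project concept sets. The Python sketch below again uses hypothetical placeholder scopes rather than the data behind Table 1 or the supplementary figures; the `minimum` threshold in `interaction_edges` is an illustrative assumption, and constructing the trees themselves is not shown.

```python
from itertools import combinations

# Hypothetical project scopes (project -> set of ad hoc concepts).
SCOPES = {
    "ProjectA": {"organism", "sample"},
    "ProjectB": {"organism", "sample", "mass spectrometry"},
    "ProjectC": {"organism", "qPCR amplification"},
    "ProjectD": {"organism", "sample", "mass spectrometry"},
}


def shared_concepts(scopes: dict) -> dict:
    """Count the concepts shared by each pair of projects."""
    return {
        (a, b): len(scopes[a] & scopes[b])
        for a, b in combinations(sorted(scopes), 2)
    }


def co_occurrence(scopes: dict) -> dict:
    """Count the projects in which each pair of concepts occurs together."""
    counts = {}
    for concepts in scopes.values():
        for pair in combinations(sorted(concepts), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


def interaction_edges(shared: dict, minimum: int = 1) -> list:
    """Project pairs sharing at least `minimum` concepts (edges of an interaction graph)."""
    return sorted(pair for pair, n in shared.items() if n >= minimum)


shared = shared_concepts(SCOPES)
print(shared[("ProjectB", "ProjectD")])  # 3
print(interaction_edges(shared, minimum=2))
# [('ProjectA', 'ProjectB'), ('ProjectA', 'ProjectD'), ('ProjectB', 'ProjectD')]
print(co_occurrence(SCOPES)[("organism", "sample")])  # 3
```

With a threshold of 1 every pair in this toy data would be connected through the near-universal 'organism' concept, which is why raising the threshold helps separate heavily overlapping projects from those related only weakly.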
These various analyses make two things plain: first, that there are standout priority areas for the MIBBI Foundry (for example, the uniform description of an organism); and second, that there are many niche areas where little or no collaborative activity is required (for example, the process of mouse phenotyping)—a simple endorsement by MIBBI of the products of a particular project being sufficient, as things stand.
By providing easy access to checklist development projects and their products, MIBBI will facilitate the discovery of checklists appropriate to the needs of practitioners from diverse parts of biological and biomedical science (the 'one-stop shop' principle). The widespread availability of well-annotated data sets resulting from the routine use of minimum information checklists will increase secondary use of data and allow for a more thorough assessment of the worth of a body of work, making for more efficient and effective science.
MIBBI will increase connectivity between minimum information checklist projects and, more widely, will increase connectivity with projects developing other kinds of informatics resources (formats, vocabularies, tools, databases). The resultant evolution of an interdisciplinary community of checklist developers will bring into focus the collective expertise residing in that group. It will accelerate the establishment of mutually beneficial networks of expertise, and it will advance (through the MIBBI Foundry, building on the foundational analysis presented here) our long-term vision of a fully integrated, broad-coverage suite of minimum information checklists, in step with the general movement in the biological and medical sciences toward integrated, multifaceted investigations of the puzzles that remain to be addressed in the postgenomic era.
D.F., S.-A.S. and C.F.T. conceived and designed the concept of MIBBI as a synergistic project; D.F. and S.-A.S. raised the funds to support MIBBI activities, and C.F.T. performed the analysis presented in this paper; all authors discussed the results and implications and commented on the manuscript.
Note: Supplementary information is available on the Nature Biotechnology website.
We acknowledge funding from the UK Natural Environment Research Council's Environmental Bioinformatics Centre and the UK Biotechnology and Biological Sciences Research Council (BB/E025080/1) to D.F. and S.-A.S. to support C.F.T. and MIBBI. Work on MIFlowCyt is supported by the US National Institutes of Health's National Institute of Biomedical Imaging and Bioengineering (EB005034-01) and by Bioinformatics Integration Support Contract A140076 from the US National Institute of Allergy and Infectious Diseases. R.R.B. is supported by the Michael Smith Foundation for Health Research, by the International Society for the Advancement of Cytology and by grant funding from the US National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health (R01EB005034). N.W.H. acknowledges the support of the European Union Framework VI project META-PHOR (Food-ST-2006-03622). F.G., P.L. and work on CARMEN are supported by the UK Engineering and Physical Sciences Research Council (EP/E002331/1). K.T. acknowledges support from Science Foundation Ireland. Work on MIAME/Tox and MIAME/Nutr by P.R-S. is supported by the NuGO (NoE 503630) and CarcinoGenomics (PL 037712) European Union projects. Work on MIARE is supported by the eDIKT project. Opinions, findings and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the US National Science Foundation or the US National Institutes of Health.