One can sum up all this by saying that the criterion of the scientific status of a theory is its falsifiability, or refutability, or testability. —Karl Popper, Conjectures and Refutations

Here, inspired by the bubble-popping metaphor as much as by Popper himself, we propose a way in which research articles can become partially or wholly semantic publications. This means that a publication can state not only that the entire article is fit for the journal but also that the semantic assertions the article contains (such as “mutation X in gene Y causes disease Z”) have met specific criteria. If this process is implemented, peer referees will check the assertions that authors make within a publication against the experimental results the authors report. When the experiment is an apt test of an assertion, the referees can state that and the journal can apply its own tag to make an assertion-experiment triple with provenance. This construct holds the hypothesis and the falsifying or confirming experiment together, and it is applied by the publishing journal or database once peer scrutiny shows that its criteria have been met.
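
As a rough illustration only, the assertion-experiment construct described above might be represented as follows. This is a minimal sketch under our own assumptions: the class names, field names and tag labels are hypothetical and do not correspond to any existing journal schema or standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assertion:
    """A semantic assertion, e.g. 'mutation X in gene Y causes disease Z'."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Experiment:
    """The experiment offered as a test of an assertion (hypothetical fields)."""
    description: str
    outcome: str  # assumed vocabulary: "corroborates" or "falsifies"


@dataclass(frozen=True)
class TaggedAssertion:
    """Holds the hypothesis, its test and the provenance of both together."""
    assertion: Assertion
    experiment: Optional[Experiment]  # None if no apt test has been reported
    asserted_by: str                  # provenance: who made the assertion
    tagged_by: Optional[str]          # provenance: journal or database applying the tag

    @property
    def tag(self) -> str:
        # Hypothetical labels: a tag is only 'tested' when an experiment is
        # attached and a journal or database has applied the tag after review.
        if self.experiment is None or self.tagged_by is None:
            return "untested hypothesis"
        return "tested: " + self.experiment.outcome


claim = Assertion("mutation X in gene Y", "causes", "disease Z")
reviewed = TaggedAssertion(
    claim,
    Experiment("placeholder: the experiment reported by the authors", "corroborates"),
    asserted_by="the authors",
    tagged_by="the publishing journal",
)
unreviewed = TaggedAssertion(claim, None, asserted_by="the authors", tagged_by=None)

print(reviewed.tag)    # tested: corroborates
print(unreviewed.tag)  # untested hypothesis
```

Keeping the experiment and the two provenance fields in the same record as the assertion is the point of the sketch: the tag cannot be separated from the evidence or from whoever vouched for it.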

Our definition of a conceptual advance worthy of publication is a theory or model that makes predictions that without the theory would be extremely unlikely, together with the skeptical testing of those predictions. There is, of course, always a place for descriptive and exploratory science, but much of genomics research progresses by hypothesis generation and experimental replication. Although the context in which hypotheses are asserted and tested can vary by experimental method, paradigmatic framework and scientific strategy, there are examples of reproducible and productive methods in genetics that are already amenable to semantic enhancement. Indeed, the approach may reduce the time, anxiety and waste of some practices we currently struggle with.

By paying careful attention to community standards for separating the generation of a hypothesis from the testing of exactly the same hypothesis under conditions that explicitly invite its refutation, we have been able in recent years to publish results in several fields that can be readily reused by other scientists. For example, genome-wide association studies have since 2007 entailed a required internal replication because only a few years earlier fewer than a quarter of such studies could be successfully repeated. Monogenic disease studies, including those using targeted and exome resequencing, need to show, in more than one pedigree, mutations sufficient to disrupt the function of the hypothesized gene in a manner consistent with the stated mode of inheritance. Somatic mutations proposed to be recurrent drivers of cancer must be identified not only in the discovery samples but also in a separate series of further tumor-normal pairs. With semantic tagging, results within publications that meet community standards of hypothesis testing can be tagged as such, and what remains can be labeled as an untested hypothesis or an available experimental result to invite further investigation.

It follows that studies that meet the technical criteria for hypothesis generation but not the full community standard for testing should not be considered as compromising the advance in these fuller publications held to the higher standard. Nor should they be considered premature or scientifically invalid. Indeed, the rapid dissemination of the hypothesis may accelerate and influence research directions of those with similar information or those prepared to take a greater risk. In some fields, there may be benefit in rapid dissemination of results that by themselves do not yet meet publication criteria in any existing journal. One example would be a single instance of a human mutation that could aid in decision making for an individual with a rare disease (J. Med. Genet. 48, 577–578, 2011). In this case, the hypothesis tag would act as an attribution license on a database entry that could ensure proper provenance and authorship or contribution credit once corroborating mutations were found in separate pedigrees sufficient to map the causal gene unambiguously. Dedicated medical genetics databases or alliances of affected individuals could become the stewards of these isolated observations.

One of the consequences of the greater transparency of online publication is the routine use of semantic identifiers such as standardized names for medical entities and genes, genome coordinates and data set accession codes. The relationships among these entities can also be formally stated within a given context and experimental setup. We invite you as authors to identify the assertions within your articles that can be defined in a sufficiently formal way that invites refutation, as well as the conditions under which they were tested and the experimental evidence that falsifies or corroborates each assertion. For the moment, these may be submitted as a Supplementary Table to be considered alongside the main article. We will conduct an experiment on the use of these in peer review and work towards their formal representation in the ensuing online publications.
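
For authors wondering what such a Supplementary Table might look like, the following is a minimal sketch that writes one row per assertion to a tab-separated table. The column names, the outcome vocabulary and the function name are our own assumptions for illustration, not a required submission format, and the row contents are placeholders.

```python
import csv
import io

# Hypothetical column layout: one row per assertion.
COLUMNS = ["assertion", "test_conditions", "evidence", "outcome"]
# Assumed vocabulary for the outcome column.
ALLOWED_OUTCOMES = {"corroborates", "falsifies", "untested"}


def write_assertion_table(rows):
    """Return the rows as tab-separated text with a header line."""
    for row in rows:
        if row["outcome"] not in ALLOWED_OUTCOMES:
            raise ValueError("unknown outcome: " + row["outcome"])
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {
        "assertion": "mutation X in gene Y causes disease Z",
        "test_conditions": "placeholder: conditions under which it was tested",
        "evidence": "placeholder: data set accession code",
        "outcome": "corroborates",
    },
]

table = write_assertion_table(rows)
print(table)
```

A flat table of this kind is deliberately simple: it can be read by referees without special tools and converted later into whatever formal representation the online publication adopts.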

We know that, from where we stand, the scientific method is alive and well, sound and productive. It thrives on clarity and improves with criticism. Let's use the new tools we have to make it even better.