Gene names can confound most-searched listings

Yale University, New Haven, Connecticut, USA.

A rough proportionality might be expected between the number of citations a gene collects in PubMed (see Nature 551, 427–431; 2017) and the hits it receives in Internet searches: the former reflects its scientific value, whereas the latter is also influenced by its impact on the wider public. Sometimes, however, the names of the genes themselves may introduce anomalies that distort this relationship.

Some gene names are much more popular outside science than they are on PubMed. ‘Superman’ is an example, referring as it does to a cult figure as well as to the SUPERMAN gene in the thale cress Arabidopsis thaliana. This distortion is particularly pronounced for longer gene names that are full words or phrases, such as drop dead and Brokenheart in the fruit fly Drosophila melanogaster (M. R. Seringhaus et al. Genome Biol. 9, 401; 2008).

Moreover, this distortion may be evident for genes that are now rarely a focus in the literature but still attract search-engine hits on a scale comparable to scientifically popular genes such as TP53, which encodes the tumour-suppressor protein p53. The gene for alcohol dehydrogenase (ADH), the enzyme responsible for metabolizing alcohol, is one such example.

Nature 553, 405 (2018)

doi: 10.1038/d41586-018-01077-3

