Biologists' collaborative data repositories come of age.
Most Tuesdays, a group of scientists at the Wellcome Trust Sanger Institute in Hinxton, UK, meets over lunch to edit Wikipedia pages. But there is no obsessing over the minutiae of Britney Spears's career to be found here — instead, they are building the next generation of global biological databases.
"Yesterday, we created 18 microRNA articles," says Alex Bateman, a computational biologist at the Sanger Institute who helped to found Rfam, a database that includes a set of Wikipedia articles covering about 1,500 families of RNA molecules, maintained by more than 2,000 editors. In the past few years, community-curated biological websites such as this have multiplied, and on 29 November scientists will gather for a first-of-its-kind conference called Biological Wikis, in Naples, Italy, to take stock of the 'bio-wiki' approach and plan future expansion.
Wikis, collaboratively edited web pages named after the Hawaiian word for quick, offer a solution to the growing data glut in biology. Conventional databases are struggling to keep up with the flood of information about genes and proteins that labs are amassing by the terabyte (10^12 bytes). "The old model of annotation, where the central database handles that information, doesn't work," says Dan Bolser, a computational biologist at the University of Dundee, UK, who is involved in curating a protein-structure database called PDBWiki.
So biologists increasingly maintain and update web pages focused on particular genes or proteins, or on any other concept or object of interest. The success of Wikipedia, the ubiquitous online encyclopaedia compiled using the same technique, proves that community annotation works, say bio-wiki enthusiasts. "There's a genuine generational and technological change that's happening," says Ewan Birney, a bioinformatician at the European Bioinformatics Institute in Hinxton.
At the Naples meeting, scientists will discuss the lessons of the handful of bio-wikis that are beginning to assemble a critical mass of readers and contributors. These bio-wikis are now attracting more contributions from the community than from the developers themselves, and advocates say that the sites are becoming indispensable tools in some areas of biology.
Gene Wiki, for instance, has more than 10,000 Wikipedia pages, each devoted to a single gene, and draws some 4 million views and 1,000 edits every month. Many scientists come to the site after an experiment identifies a laundry list of genes that are of interest in their work — and which they know little about or have never heard of, says Andrew Su, a bioinformatician at the Genomics Institute of the Novartis Research Foundation in San Diego, California, and one of the driving forces behind Gene Wiki. "It's a great way to go in and get up to speed," he says.
Bio-wikis that are hosted by Wikipedia benefit from the contributions of its existing altruistic community of 'Wikipedians', says Bateman. His team will soon launch a protein-family wiki that will also be hosted on Wikipedia. "One of the big surprises for me in all of this is that the contributions we're getting are as much from non-scientists as they are from scientists," he says. Non-specialists with enough interest and expertise to add information about the activity of microRNAs may be rare, but others can provide help with page formatting and standardization, which are "important, valid contributions", says Bateman.
But Wikipedia comes with its own rules and idiosyncrasies, which limit its usefulness for some kinds of biological data. To merit a page on Wikipedia, a subject — whether a gene, a protein or any other biological entity — must be considered noteworthy by the Wikipedia community. Important data, such as protein crystal structures and genetic variants, do not always qualify, says Su.
The rub is that many bio-wikis not housed within Wikipedia struggle to attract readers and editors. But Alexander Pico, a bioinformatician at the Gladstone Institute of Cardiovascular Disease in San Francisco, California, thinks that these problems will fix themselves. "The vision going forward is that more and more scientists will be involved in the curation and consumption of data and they won't need to accidentally stumble on it through Wikipedia," he argues. His team's WikiPathways site, which characterizes and visualizes biological pathways and is independent of Wikipedia, thrives because the systems biologists it attracts are already avid consumers of other people's data and therefore see the benefits of a wiki, says Pico.
One challenge to bio-wikis that will be addressed at the Naples meeting is their text-based default layout, says Su. Written entries devoted to individual genes and proteins fit well within Wikipedia. But the format is a poor match to the highly structured, searchable data sets favoured by computational biologists, which include the precise relationships between genes, proteins and other factors. A number of bio-wikis, including Su's Gene Wiki, are adopting a software package called Semantic MediaWiki. This will bring them closer to working like true databases: for instance, the software could allow scientists to search for all the proteins phosphorylated by a specific kinase enzyme expressed in a particular tissue, rather than having to look up each interaction individually.
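To make the contrast with free text concrete, here is a minimal sketch of the kind of question that structured annotations allow. It is not Semantic MediaWiki's own query syntax or API; the record type, the field names and the kinase and protein names are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phosphorylation:
    """One structured annotation (hypothetical schema, not from any real wiki)."""
    kinase: str      # enzyme that adds the phosphate group
    substrate: str   # protein that is phosphorylated
    tissue: str      # tissue in which the kinase is expressed


# Made-up records standing in for annotations attached to wiki pages.
RECORDS = [
    Phosphorylation("KinaseA", "ProteinX", "liver"),
    Phosphorylation("KinaseA", "ProteinY", "liver"),
    Phosphorylation("KinaseA", "ProteinZ", "brain"),
    Phosphorylation("KinaseB", "ProteinX", "liver"),
]


def substrates_of(records, kinase, tissue):
    """Return the sorted, de-duplicated substrates of `kinase` in `tissue`."""
    return sorted(
        {r.substrate for r in records if r.kinase == kinase and r.tissue == tissue}
    )


if __name__ == "__main__":
    print(substrates_of(RECORDS, "KinaseA", "liver"))
```

With annotations stored as fields rather than sentences, one filter answers the whole question; with prose pages, a reader would have to open each protein's entry and look for the relevant statement.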
Despite such innovations, bio-wikis might not truly take off until scientists can get career-advancing credit for contributing to them. "Editing your wiki is not going to get you your grant, it's not going to get you promoted," says Jim Hu, a molecular biologist at Texas A&M University in College Station and one of the founders of EcoliWiki, a repository of information about the model bacterium Escherichia coli. One database trying to solve the attribution problem is Wikigenes, a site devoted to annotating 120,000 genes and other biomedical concepts, which meticulously records and displays the individual contributions of its 1,800 active editors.
Persuading funding agencies and tenure committees to take those contributions seriously would mark a major milestone for bio-wikis. Until then, says Bolser, "it's not clear to scientists why they should spend time editing a wiki article if it just gets them kudos from a few geeks on Wikipedia".