The burgeoning of public databases is benefiting climate change research, and the archiving of data in accessible, permanent repositories will soon become the norm.
Climate change research increasingly depends on the storage and ready accessibility of vast amounts of data, along with the brute — albeit sophisticated — power of modern computational and analytical methods. Massive observational data sets are used to test and validate models, the results of which are essential for predicting and preparing for future climate change and its expected impacts, and for informing the policymakers tasked with dealing with mitigation and adaptation. In short, climate change research is now 'big science', comparable in its magnitude, complexity and societal importance to human genomics and bioinformatics.
Over the past couple of decades, the genomics community has developed a strong culture of scientific openness. Since the initial sequencing of the human genome, the deposition of sequence data and metadata in public data repositories has become not only routine but de rigueur. The accessibility of this information, often in standardized formats, has helped to catalyse the remarkable advances seen over recent years. So too has the availability of bioinformatics software packages used to analyse and interpret the reams of data churned out by high-throughput sequencing machines, as well as information on regulatory networks, gene expression and the biochemical make-up of cells.
The advances in genomics continue apace. Consider, for example, the coordinated publication last month of 30 open-access papers — six of them in Nature — by the international ENCODE consortium. ENCODE stands for the Encyclopaedia of DNA Elements. The aim of the project is to identify all of the functional elements, whether coding or regulatory, present within the human genome. Primarily funded by the National Human Genome Research Institute at the US National Institutes of Health (www.genome.gov) and based in Bethesda, Maryland, ENCODE is “a community resource project to accelerate access to and use of the data by the entire scientific community.” So far the project has generated no fewer than 1,640 large data sets, all of which are openly accessible in public databases and are available for use by others without restriction. This very fact is likely to spur huge further analyses, perhaps leading to insights not even anticipated by members of the consortium itself. Lead ENCODE analysis coordinator Ewan Birney (Nature 489, 49–51; 2012) says of this: “Large consortia clearly benefit from an open-door policy that allows new, unfunded analysts to participate. And when these individuals join the group or work with released consortium data, their analyses should be considered equally creditable and stigma-free relative to those performed by long-standing group members.”
After some false starts, and hard lessons learned, climate change researchers have woken up to the need for transparency and are increasingly following the example set by the genomics community in fostering scientific openness and the sharing of information through public data repositories. This trend is already reaping rewards. For example, the Coupled Model Intercomparison Project (http://cmip-pcmdi.llnl.gov/) developed under the auspices of the World Climate Research Programme (www.wcrp-climate.org) provides “a community-based infrastructure in support of climate model diagnosis, validation, intercomparison, documentation and data access.” Data generated by the project are archived by the Program for Climate Model Diagnosis and Intercomparison, established by the Lawrence Livermore National Laboratory in California. This allows researchers from a range of disciplines to analyse and compare general circulation models systematically, thereby aiding model improvement and the identification of key uncertainties in our understanding of the climate system.
Data accessibility can only increase the transparency, quality and perceived credibility of climate change science. It is no surprise therefore that national science funding agencies are increasingly insisting on free access to data resulting from publicly funded research. Since 2011, grant proposals to the US National Science Foundation, for example, have had to be accompanied by a data management plan. The United Kingdom's Research Councils now have a similar policy, and others are certain to follow suit.
Meanwhile, there are many existing databases that archive data of interest to the broad climate change research community. Some of these are long established, at least at national level. For example, the British Oceanographic Data Centre (www.bodc.ac.uk) traces its origins to 1969. Funded largely by the UK Natural Environment Research Council, it is responsible for the storage, quality control and archiving of biological, chemical, physical and geophysical data. Much of the information and data archived by the British Oceanographic Data Centre and other networked oceanographic data centres around the world is of direct relevance to issues such as flood risk, ocean warming and acidification, and the impact of climate change on marine ecosystems.
Earth sciences databases include PANGAEA (www.pangaea.de), which has been operational on the Internet since 1995. Hosted by the Alfred Wegener Institute for Polar and Marine Research in Bremerhaven, and the Center for Marine Environmental Sciences at the University of Bremen, PANGAEA is an Open Access library that archives geo-scientific and environmental data with an emphasis on global change research. Other highly relevant data repositories include the World Data Center System (www.icsu-wds.org), and the National Climatic Data Center (www.ncdc.noaa.gov) run by the National Oceanic and Atmospheric Administration in the United States. Although gaps exist, there are of course a host of other data repositories catering for different scientific communities broadly concerned with past, present and future global climate and environmental change.
Nature Climate Change encourages authors of submitted manuscripts to archive their data in accessible, permanent public databases and to provide the editors with full citation details so that a link to the data can be included in the paper in the event of publication. We recognize that this can generate extra work for authors and that some will fear that other researchers will make use of such data in their own publications, even if the source of those data is properly acknowledged. But it is increasingly evident that the benefits of data deposition to the research community greatly outweigh the costs (see Nature Clim. Change 1, 10–12; 2011). With moves afoot to make primary scientific data sets citable in their own right (see for example www.datacite.org), perceived loss of credit for those who generated the data in the first place should soon be a thing of the past.