The flood of digital research data that scientists are generating through genomics, sensor and other technologies has made it imperative to create an infrastructure to use, repurpose and preserve those data. Some such efforts are already under way, notably the US National Science Foundation's $100-million, five-year DataNet programme, and Europe's Alliance for Permanent Access (see Big Data special, http://tinyurl.com/5hh2rq). But how should the responsibilities be divided between governments and the private sector?

A series of events in December highlights the complexities of this issue. One was a pioneering move by Amazon to host large scientific data sets for free, starting with GenBank and other widely used sequence and chemical-structure databases.

Amazon's move is not altogether altruistic. Although researchers will be able to download the data to their own computers, the company is betting that many will instead use its 'cloud computing' technology, which makes the company's vast server infrastructure available to process the data sets on a pay-as-you-go basis.

Such services could offer immense benefits to research. By giving scientific data a permanent home online, Amazon could help to ease the long-running problem of databases that are abandoned when, for example, funding dries up at the end of a research grant. Its cloud-computing approach could liberate smaller labs from the cost of running data centres of their own. And it should facilitate the sharing of data and analysis tools between widely dispersed research teams.

Also in December, however, came a reminder of the risks of depending on the public or private sector alone to create such infrastructure: Google announced it was abandoning its plans to host large scientific data sets for free, apparently because of the economic downturn. In November, the European Union, in collaboration with research organizations, libraries and museums, launched the Europeana online digital library as its much-touted alternative to Google Books. Europeana has scanned valuable historical collections, but its computing infrastructure crashed within hours, not clunking back into service until more than a month later. A similar fate met Géoportail, a service created by the French government as a competitor to Google Earth, and launched with great fanfare in 2006 by President Jacques Chirac.

In creating such public offerings, governments address valid concerns that private companies will exercise monopolies over significant cultural and scientific heritage. But their focus on creating their own digital libraries and databases too often means that other, perhaps more important, ways to address such concerns are neglected. Making standardized data openly available to both commercial and not-for-profit organizations, for instance, could spur innovation of superior information services. And to avoid embarrassing crashes, public efforts might well consider partnerships with private firms to tap into the economies of scale and expertise of the Googles and Amazons of the world.