The US government is considering a massive plan to store almost all scientific data generated by federal agencies in publicly accessible digital repositories. The aim is to give the whole of US science the kind of data access and sharing currently enjoyed by genome researchers through GenBank, or by astronomers through the National Virtual Observatory.

Scientists would then be able to access data from any federal agency and integrate it into their studies. For example, a researcher browsing an online journal article on the spread of a disease could not only pull up the underlying data, but mesh them with information from databases on agricultural land use, weather and genetic sequences.
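If such repositories shared common identifiers, the meshing itself could be as routine as a database join. The following is a minimal sketch in Python using pandas; every dataset, column name and value is invented for illustration, and the real obstacle, as the rest of this article makes clear, is getting different agencies to agree on those shared keys in the first place.

```python
# A minimal sketch of the cross-agency integration described above.
# All datasets, column names and values are invented for illustration;
# real federal repositories would each have their own schemas.
import pandas as pd

# Hypothetical extract from a health-agency repository: weekly case counts.
cases = pd.DataFrame({
    "county": ["Ada", "Ada", "Boone"],
    "week":   [1, 2, 1],
    "cases":  [12, 19, 4],
})

# Hypothetical extract from a weather repository, keyed the same way.
weather = pd.DataFrame({
    "county":      ["Ada", "Ada", "Boone"],
    "week":        [1, 2, 1],
    "rainfall_mm": [30.2, 12.5, 44.1],
})

# Hypothetical extract from an agricultural land-use repository.
land_use = pd.DataFrame({
    "county":       ["Ada", "Boone"],
    "pct_cropland": [0.42, 0.61],
})

# The join itself is trivial; the hard part in practice is agreeing
# on shared keys, units and vocabularies across agencies.
merged = (cases
          .merge(weather, on=["county", "week"])
          .merge(land_use, on="county"))
print(merged)
```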

Nature has learned that a draft strategic plan will be drawn up by next autumn by a new Interagency Working Group on Digital Data (IWGDD). It represents 22 agencies, including the National Science Foundation (NSF), NASA and the Departments of Energy, Agriculture, and Health and Human Services, as well as other government branches such as the Office of Science and Technology Policy.

The group's first step is to set up a robust public infrastructure so all researchers have a permanent home for their data. One option is to create a national network of online data repositories, funded by the government and staffed by dedicated computing and archiving professionals. It would extend to all communities a model similar to the Arabidopsis Information Resource, in which 20 staff serve 13,000 registered users and 5,000 labs.

The group then aims to help scientific communities create standards that let databases in one field talk to those in other disciplines. That will be no mean feat. Even scientists in highly organized 'big science' fields such as genomics and astronomy, who routinely work with large shared data sets, encounter problems when trying to carry out sophisticated calculations using data from different communities.

For example, it is still a huge task to mash together astronomical observations of the same object taken at different wavelengths and stored in different databases, says Giuseppina Fabbiano of the Harvard-Smithsonian Center for Astrophysics, the Smithsonian's representative on the IWGDD. “We still cannot extract data from different archives and put it together seamlessly.” Agreeing standards even within closely linked communities “is like Middle-East peacekeeping — every detail has to be worked out and agreed on,” adds her colleague at the centre, Martin Elvis.
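To make the problem concrete, here is a sketch of the simplest piece of that task, cross-matching source positions drawn from two archives, written with the modern astropy library. The catalogues, coordinates and five-arcsecond tolerance are all invented for illustration; a real multi-wavelength mash-up would also have to reconcile calibrations, epochs and coordinate frames that differ between archives.

```python
# A sketch of cross-matching sources from two hypothetical archives
# (say, an X-ray catalogue and an optical catalogue of the same field)
# using astropy. All positions and the tolerance are invented.
import astropy.units as u
from astropy.coordinates import SkyCoord

# Source positions reported by archive A (e.g. X-ray detections).
xray = SkyCoord(ra=[10.001, 10.250, 10.420] * u.deg,
                dec=[41.000, 41.120, 41.300] * u.deg)

# Source positions reported by archive B (e.g. optical detections).
optical = SkyCoord(ra=[10.000, 10.251, 10.900] * u.deg,
                   dec=[41.001, 41.119, 41.500] * u.deg)

# For each X-ray source, find the nearest optical source on the sky.
idx, sep2d, _ = xray.match_to_catalog_sky(optical)

# Accept only matches within an (arbitrary) 5-arcsecond tolerance.
matched = sep2d < 5 * u.arcsec
for i, (j, ok) in enumerate(zip(idx, matched)):
    print(f"X-ray source {i} -> optical source {j}" if ok
          else f"X-ray source {i}: no counterpart within 5 arcsec")
```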

Many researchers are reluctant to share their raw data in the first place. To counter this, the IWGDD is considering making the submission of well-documented data sets to public archives a condition of receiving a grant.

Christopher Greer, senior adviser for digital data at the NSF's Office of Cyberinfrastructure, hopes that, if and when all federally supported science data are accessible, publishers and computing companies will layer more sophisticated information services on top of them. This would give researchers unprecedented ability to test their ideas: “The next web browser could be a visualization and data-navigation tool, and the next Google an information integrator.”