The open-data movement encourages the sharing and interlinking of heterogeneous research data through large infrastructures. We argue that a greater number of smaller databases, each serving similar or related research domains, could cut the financial, environmental, social and governance costs involved.

All-inclusive data repositories need large up-front investments and continued funding (see go.nature.com/3xvk0yp). Their energy consumption is huge, and significant work is needed to reformat vast amounts of data for easy sharing. The organization of deposited data can also vary and create confusion (Y. Demchenko et al. Int. Conf. Collab. Technol. Syst. 48–55; IEEE, 2013).

Streamlining data according to research domain and using a single format would simplify data processing and analysis, and the resulting databases would be cheaper to run. These bespoke databases must not be balkanized: for example, different types of wildfire data (from the atmosphere, the ground or modelling, say) would need to be lodged in a single repository in one format. This approach would also allow incentives for data sharing to be better tailored to each domain. For instance, experiments at large facilities, such as synchrotrons, increasingly make their data available through citable DOIs.