The huge repositories of data collected by services such as Twitter, Facebook and Google can cause serious problems beyond quality control (Nature 481, 25; 2012).

Many of the emerging 'big data' come from private sources that are inaccessible to other researchers. The data source may be hidden, which compounds problems of verification and raises concerns about the generality of the results.

Such results are meaningful only if many other data sets reveal the same behaviour. This uncovers a deeper problem: if an independent set of data fails to validate results derived from privately owned data, how do we know whether that is because those data are not universal or because the authors made a mistake?

If this trend continues, we could see a small group of scientists with access to private data repositories enjoying an unfair amount of attention at the expense of equally talented researchers without these 'connections'.