Large open-access data sets offer unprecedented opportunities for scientific discovery; the current global collapses of bee and frog populations are classic examples. However, we must resist the temptation to do science backwards by posing questions after, rather than before, data analysis.

A scant understanding of the context in which data sets were collected can lead to poorly framed questions, misleading results and conclusions that are plain wrong. Scientists who intend to use large composite data sets need to work closely with those responsible for gathering the data. Standard scientific principles and practice then demand that they first frame the important questions, then design and execute the data analyses needed to answer them.