I suggest two more 'red flags' in addition to the six that C. Glenn Begley identifies for evaluating preclinical studies (Nature 497, 433–434; 2013). These extend Begley's question about the suitability of statistical tests and apply particularly to computational analyses of large volumes of data, such as those generated by high-throughput proteomics experiments.

The first new flag concerns the application of a multiple-hypothesis correction. The large number of statistical comparisons made in high-throughput data analyses inflates apparent significance: the more tests that are performed, the greater the probability that some individual result will reach a given significance level by chance alone.
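As a minimal sketch of why such a correction matters (the simulated data, group sizes and test count below are illustrative assumptions, not taken from any real experiment), consider 10,000 comparisons in which no real effect exists. Without correction, roughly 5% of them pass the conventional 0.05 threshold; a standard Benjamini-Hochberg correction removes them.

```python
# Minimal sketch of the first flag. The data are simulated pure noise
# (an assumption made here for illustration), so every "significant"
# result below is a false positive.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
n_tests = 10_000  # e.g. proteins quantified in a high-throughput run

# Two groups of 10 replicates drawn from the SAME distribution, so the
# null hypothesis is true for every one of the 10,000 comparisons.
group_a = rng.normal(0.0, 1.0, size=(n_tests, 10))
group_b = rng.normal(0.0, 1.0, size=(n_tests, 10))
p_values = stats.ttest_ind(group_a, group_b, axis=1).pvalue

# Uncorrected, roughly 5% of pure-noise tests clear the 0.05 threshold.
print("uncorrected 'hits':", int((p_values < 0.05).sum()))  # ~500

# A Benjamini-Hochberg false-discovery-rate correction removes them.
reject, _, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
print("BH-corrected 'hits':", int(reject.sum()))  # ~0
```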

Essentially, the first flag extends the question “how likely is it that the difference I observe in one measurement is a chance finding?” to the population-level question “how likely is it that I would find this difference by chance if I examined many measurements?” By analogy, it would be a mistake to assume that the chance of finding a left-handed player on a basketball team is simply around 10%, the probability that any one individual is left-handed. The real probability is considerably higher because many players are being examined at once.
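The arithmetic behind this illustration is worth spelling out (the 12-player roster size is an assumption introduced here for concreteness): the probability of at least one left-handed player is one minus the probability that every player is right-handed.

```python
# The roster size of 12 is an assumption introduced here for concreteness.
p_lefty = 0.10                       # chance any one person is left-handed
roster_size = 12                     # players examined
p_at_least_one = 1 - (1 - p_lefty) ** roster_size
print(f"{p_at_least_one:.0%}")       # ~72%, far above the naive 10%
```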

The second flag questions whether an appropriate background distribution was used. It is vital that the set of measurements against which the experimental results are tested for significance is appropriate to the question being asked. An inappropriate choice of background can artificially induce significance or mask genuine results. An example would be to sample a women's basketball team, rather than women in general, to determine whether there is a significant height difference between men and women.
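A minimal sketch of that height illustration follows (all means, spreads and sample sizes are rough assumptions chosen for the demonstration, not measured values): tested against a representative female background, the height difference is unmistakable; tested against a biased background of players selected for height, it is masked.

```python
# All means, spreads and sample sizes here are rough assumptions chosen
# for the demonstration, not measured values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
men           = rng.normal(175, 7, size=100)  # male sample, cm
women_general = rng.normal(162, 7, size=100)  # representative background
women_team    = rng.normal(176, 6, size=100)  # biased background: a team
                                              # selected for height

# Against the appropriate background, the real difference is unmistakable.
print("vs women in general:", stats.ttest_ind(men, women_general).pvalue)

# Against the biased background the genuine male-female difference is
# masked, because the team is an unrepresentative sample of tall women.
print("vs basketball team: ", stats.ttest_ind(men, women_team).pvalue)
```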