Responding to our recent Review (The pleiotropic structure of the genotype–phenotype map: the evolvability of complex organisms. Nature Reviews Genetics 12, 204–213 (2011))1, Hill and Zhang argue in their Correspondence (Assessing pleiotropy and its evolutionary consequences: pleiotropy is not necessarily limited, nor need it hinder the evolution of complexity. Nature Reviews Genetics 21 Feb 2012 (doi:10.1038/nrg2949-c1))2 that the experimental data summarized and discussed in our article cannot reject the hypothesis of universal pleiotropy (HUP), which asserts that every mutation (or gene) affects every trait. This influential idea originated in Fisher's geometric model of evolution3. Hill and Zhang2 argue that any empirical detection of a gene effect relies on some statistical significance threshold, so empirical studies are likely to underestimate the true degree of pleiotropy. We agree with the substance of this argument and have discussed its implications in our paper1. However, we disagree with the conclusion that the HUP is a viable model of genetic architecture. We think this disagreement has deeper methodological roots, which we discuss below.

Hill and Zhang2 use the HUP as their null hypothesis and request evidence to reject it. However, any real experiment has a detection limit. If the HUP allows for arbitrarily small effect sizes, it can never be falsified and thus does not rise to the level of a scientific hypothesis. That Hill and Zhang2 have this version of the HUP in mind is reflected in the mathematical structure of their model, whose effect size distribution has a mode of zero4. Hence, most effects are assumed to be small from the outset. It follows that lowering the detection threshold will quickly raise the detected degree of pleiotropy. We expect that Hill and Zhang would agree that the published data1 show that each mutation has few large effects and many effects that fall anywhere between 'small' and 'zero'.
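
The threshold argument can be illustrated with a small simulation. This is a minimal sketch under illustrative assumptions, not a reproduction of Hill and Zhang's model: the hypothetical function `simulate_effects` draws the effects of one mutation on many traits from a Laplace distribution (an exponential magnitude with a random sign), which, like the distribution in their model, has its mode at zero. Effects are treated as measured without error, and a fixed `threshold` stands in for the detection limit of an experiment.

```python
import random


def simulate_effects(n_traits, scale=1.0, seed=0):
    """Draw hypothetical effects of one mutation on n_traits traits.

    Assumption (for illustration only): |effect| ~ Exponential(mean=scale)
    with a random sign, i.e. a Laplace distribution whose mode is zero,
    so most effects are small.
    """
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / scale)
            for _ in range(n_traits)]


def apparent_pleiotropy(effects, threshold):
    """Count traits whose effect magnitude exceeds the detection threshold."""
    return sum(1 for e in effects if abs(e) > threshold)


effects = simulate_effects(n_traits=1000)
# The same underlying effects, "detected" with ever more sensitive experiments.
for threshold in (2.0, 1.0, 0.5, 0.1, 0.01):
    print(threshold, apparent_pleiotropy(effects, threshold))
```

Under these assumptions the expected fraction of traits counted is exp(-threshold/scale), so the count climbs towards the total number of traits as the threshold approaches zero, even though the underlying effects never change.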

We believe that an experimentally testable null hypothesis should be 'no gene effect'. An effect is established only when this null hypothesis is rejected, that is, when the measured effect on a trait exceeds the detection limit of the experimental design. In fact, the null hypothesis of no gene effect is used by almost all geneticists, explicitly or implicitly. Of course, the real question is not which effects are statistically significant but which ones are biologically meaningful. A natural cut-off would be the smallest effect that can still be 'seen' by natural selection.

We also agree with Hill and Zhang2 that counting the traits significantly affected by a mutation is not the best way of measuring the mutation's degree of pleiotropy, because the count depends on experimental design and sensitivity. In our Review1, we proposed that a better way would be to measure the dispersion of the distribution of the mutation's effect sizes across a large sample of traits. However, until such methods have been developed, counting measured effects is the best available proxy.
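
The contrast between counting and a distribution-based measure can be sketched as follows. The statistic here is a hypothetical choice made for illustration, not necessarily the measure proposed in the Review: `effective_number_of_traits` computes the participation ratio (Σe²)²/Σe⁴, which equals the number of traits n divided by the uncentred kurtosis of the n effects, so it reflects the shape of the whole effect-size distribution rather than which effects clear a threshold. The two effect vectors are invented toy values.

```python
def count_pleiotropy(effects, threshold):
    """Conventional proxy: number of traits with |effect| above a threshold."""
    return sum(1 for e in effects if abs(e) > threshold)


def effective_number_of_traits(effects):
    """Hypothetical threshold-free summary of how widely effects are spread.

    Participation ratio (sum e^2)^2 / sum e^4: equals n when a mutation
    affects n traits equally and approaches 1 when one trait dominates.
    Returns 0.0 for a mutation with no effects at all.
    """
    squares = [e * e for e in effects]
    total = sum(squares)
    if total == 0:
        return 0.0
    return total * total / sum(s * s for s in squares)


# Toy effect vectors (invented): one large effect plus tiny ones, versus
# equal effects on all six traits.
concentrated = [5.0, 0.1, 0.1, 0.1, 0.0, 0.0]
diffuse = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

for name, effects in (("concentrated", concentrated), ("diffuse", diffuse)):
    print(name,
          count_pleiotropy(effects, 0.5),   # coarse experiment
          count_pleiotropy(effects, 0.05),  # more sensitive experiment
          round(effective_number_of_traits(effects), 3))
```

Lowering the threshold from 0.5 to 0.05 raises the count for `concentrated` from 1 to 4, whereas its participation ratio stays close to 1 because it is computed from all the effects, small ones included; for `diffuse` both measures give 6. With real data, such a statistic would still be influenced by measurement noise on the small effects.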

We think that our disagreement with Hill and Zhang concerns not so much the empirical facts as the methods used to analyse and represent them. At the very least, the disagreement shows that research in this area urgently needs more sophisticated and more biologically meaningful ways to measure pleiotropy.