Reply to

Marchini, Jonathan; Cardon, Lon R; Phillips, Michael S; Donnelly, Peter

doi:10.1038/ng1104-1131

Correspondence
Published: 01 November 2004

Reply to "Genomic Control to the extreme"

Jonathan Marchini¹,
Lon R Cardon²,
Michael S Phillips³ &
…
Peter Donnelly¹

Nature Genetics volume 36, page 1131 (2004)

In reply:

The main point of our original paper¹ was that even the relatively small levels of structure in large populations cannot be ignored in the coming generation of association studies, effectively because of the sizes of these studies (both sample size and numbers of loci). We continue to believe, however, that association studies have a central role in unraveling the genetic basis of common human diseases, provided that population structure is handled appropriately. One published method for dealing with population structure is Genomic Control (GC)². Our paper showed that GC typically performs well but that there are some previously unrecognized problems in certain settings.

We are delighted that our work prompted Devlin and his colleagues to correct this aspect of GC. Their new procedure, GCF, represents an important advance and should be used in place of the original method. We also agree that this approach to handling uncertainty in the estimation of the correction factor λ is better than the use of confidence limits³.
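For readers less familiar with the correction factor λ, the following is a minimal sketch of the basic GC idea, not the authors' or Devlin and Roeder's code. It assumes 1-df χ² test statistics, estimates λ as the median statistic divided by the median of a χ² distribution with 1 df (about 0.455), and divides every statistic by that estimate. The function names, the simulated data and the chosen inflation of 1.3 are illustrative assumptions. The sketch does not implement GCF and makes no allowance for the uncertainty in the estimate of λ.

```python
import random
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1_MEDIAN = 0.4549364


def estimate_lambda(stats):
    """Median-based inflation estimate, floored at 1 (no deflation)."""
    return max(1.0, statistics.median(stats) / CHI2_1_MEDIAN)


def gc_adjust(stats):
    """Divide every 1-df test statistic by the estimated inflation factor."""
    lam = estimate_lambda(stats)
    return [s / lam for s in stats]


def simulate_inflated_null(n_loci, true_lambda, seed=0):
    """Hypothetical null loci whose statistics are true_lambda * chi2_1."""
    rng = random.Random(seed)
    return [true_lambda * rng.gauss(0.0, 1.0) ** 2 for _ in range(n_loci)]


stats = simulate_inflated_null(n_loci=20_000, true_lambda=1.3)
lam_hat = estimate_lambda(stats)
adjusted = gc_adjust(stats)
print(f"estimated lambda: {lam_hat:.3f}")
print(f"median after adjustment: {statistics.median(adjusted):.4f}")
```

In this idealized setting the null statistics are an exact multiple of a χ² distribution, so rescaling restores the nominal median by construction; the interesting cases are those in which that assumption fails.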

But whether the settings in which GC had problems should be dismissed as 'extreme' is less clear. Of course the design and analysis of studies should attempt to control for stratification. This is not simple to do in practice. First, there are important unresolved empirical questions about the levels and nature of such structure in population groups (e.g., people of European descent in a particular country or African Americans) and unresolved statistical issues about how best to use this kind of information in study design and analysis. Second, in the real world many studies will not meet these worthy objectives, in some cases because relevant confounding factors are not known or not easily measured and in other cases because investigators apportion their limited resources in other directions. Finally, as our paper noted¹, even with the best design and analysis, there is likely to be a level of residual structure after allowing for known confounders. At present there is limited relevant data to determine the probable levels of residual structure, but the simulations in our paper deliberately included plausible scenarios for these. Notably, in their original paper², Devlin and Roeder described the level of population structure that we considered in ref. 1 (F = 0.01 in their notation) as “realistic”. Further, as noted in ref. 2, cryptic relatedness poses as much of a threat to association studies as does geographic population structure and is much more difficult to reduce by experimental design. Preliminary analysis of a large UK case-control study (886 cases, 878 controls, 8,000 markers) showed substantial inflation of χ² statistics even after accounting for broad geographical region, with a portion of this inflation plausibly due to population structure (D. Clayton, personal communication).

Although we are positive in general about Bayesian statistical methods, we urge caution against viewing the Bayesian mixture approach (GCB), and more generally false discovery rates⁴,⁵,⁶, as a simple panacea to multiple testing issues. There are not often free lunches. The idea of GCB is to partition loci into two groups: those associated with the disease (outlier loci) and those not associated with the disease, using a sensible statistical model and method to assign loci to each group. Informally, this will be easy if the test statistics of outlier loci look very different from those of nonassociated loci, which would be the case if the genetic effects were large and if there were moderate numbers of loci in each category. On the other hand, for the small effects appropriate to complex diseases, genome scans with massive numbers of nonassociated loci and a small relative number of true disease loci, the tail of the null distribution (after GC) of test statistics may well overlap, or possibly even bury, the few values from associated loci, and no statistical procedure will reliably separate the two. These kinds of settings have not been extensively explored.
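The overlap described above can be illustrated with a small simulation. This is a hedged sketch under assumed numbers, not an analysis from the paper: null statistics are drawn as squared standard normals, associated loci as squared normals with a mean shift (`shift`, a hypothetical stand-in for effect size), and the code counts how many null loci exceed the median statistic among the associated loci. The locus counts and shift values are arbitrary choices for illustration.

```python
import random
import statistics


def count_nulls_above_typical_signal(n_null, n_assoc, shift, seed):
    """Count null statistics that exceed the median associated statistic.

    Null loci: z**2 with z ~ N(0, 1).
    Associated loci: (z + shift)**2, where shift is a hypothetical
    stand-in for the strength of the genetic effect.
    """
    rng = random.Random(seed)
    null_stats = [rng.gauss(0.0, 1.0) ** 2 for _ in range(n_null)]
    assoc_stats = [(rng.gauss(0.0, 1.0) + shift) ** 2 for _ in range(n_assoc)]
    typical_signal = statistics.median(assoc_stats)
    return sum(1 for s in null_stats if s > typical_signal)


# Many null loci, very few associated loci, as in a genome scan.
small_effect = count_nulls_above_typical_signal(50_000, 10, shift=2.0, seed=1)
large_effect = count_nulls_above_typical_signal(50_000, 10, shift=8.0, seed=1)
print(f"nulls above typical signal, small effects: {small_effect}")
print(f"nulls above typical signal, large effects: {large_effect}")
```

With the large shift the associated loci stand clear of the null tail, whereas with the small shift a typical associated statistic is exceeded by many null loci, which is the situation in which any partitioning procedure has little to work with.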

We conclude with two points of detail. It is false that our original paper¹ assumed “subjects that originate from different populations”. Much of our focus (e.g., Fig. 4c–e and Fig. 6 in ref. 1) deliberately (and explicitly) concerned structure plausible within current populations. Finally, there are two different ways in which GC (or GCB or GCF) could fail in practice: (i) the null distribution of the test statistic may not behave as a simple multiple of a χ² distribution, or (ii) the statistical allowance for the inflation factor may not be effective. The 'short cut' simulations given by Devlin et al. above presuppose that the first point is not a problem. In the absence of a formal mathematical proof, and with abundant computing resources, it would seem better to check routinely both aspects of GC, as in their Table 1, rather than only the second, as in their Figures 1 and 2.
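A check of failure mode (i) might look like the sketch below, which is an illustration under stated assumptions rather than the procedure used in either paper. It rescales simulated null statistics by a median-based estimate of λ and compares the fraction of loci declared significant at a nominal level with that level. Two hypothetical nulls are compared: one that is an exact multiple of a 1-df χ² distribution, and one in which the inflation differs between loci (an equal mixture of scales 1 and 3, chosen arbitrarily), so that no single multiplier fits.

```python
import math
import random
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1_MEDIAN = 0.4549364


def chi2_1_sf(x):
    """Upper-tail probability of a 1-df chi-square: P(Z**2 > x)."""
    return math.erfc(math.sqrt(x / 2.0))


def tail_rate_after_gc(stats, alpha):
    """Fraction of loci with p < alpha after median-based rescaling."""
    lam = statistics.median(stats) / CHI2_1_MEDIAN
    hits = sum(1 for s in stats if chi2_1_sf(s / lam) < alpha)
    return hits / len(stats)


def simulate_null(n_loci, scales, seed):
    """Hypothetical null: each locus draws its own scale from `scales`."""
    rng = random.Random(seed)
    return [rng.choice(scales) * rng.gauss(0.0, 1.0) ** 2 for _ in range(n_loci)]


ALPHA = 0.001
# Exact multiple of chi-square: rescaling should recover the nominal rate.
simple_rate = tail_rate_after_gc(simulate_null(100_000, [1.5], seed=1), ALPHA)
# Locus-specific inflation: not a simple multiple of chi-square.
mixed_rate = tail_rate_after_gc(simulate_null(100_000, [1.0, 3.0], seed=2), ALPHA)
print(f"nominal {ALPHA}, simple multiple: {simple_rate:.4f}")
print(f"nominal {ALPHA}, mixed scales:    {mixed_rate:.4f}")
```

When the null is an exact multiple of χ², the empirical rate stays close to the nominal level; under the mixed-scale null the median is matched but the upper tail is not, so the false-positive rate rises well above nominal however well λ itself is estimated.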

References

1. Marchini, J., Cardon, L.R., Phillips, M.S. & Donnelly, P. Nat. Genet. 36, 512–517 (2004).
2. Devlin, B. & Roeder, K. Biometrics 55, 997–1004 (1999).
3. Freedman, M.L. et al. Nat. Genet. 36, 388–393 (2004).
4. Benjamini, Y. & Hochberg, Y. J. R. Stat. Soc. B 57, 289–300 (1995).
5. Storey, J.D. J. R. Stat. Soc. B 64, 479–498 (2002).
6. Storey, J.D. & Tibshirani, R. Proc. Natl. Acad. Sci. USA 100, 9440–9445 (2003).

Author information

Authors and Affiliations

Department of Statistics, University of Oxford, 1 South Parks Road, Oxford, OX1 3TG, UK
Jonathan Marchini & Peter Donnelly
Wellcome Trust Center for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK
Lon R Cardon
Genome Quebec and McGill University Genome Center, Montreal, H3A 1A4, Canada
Michael S Phillips

Corresponding author

Correspondence to Peter Donnelly.

About this article

Cite this article

Marchini, J., Cardon, L., Phillips, M. et al. Reply to "Genomic Control to the extreme". Nat Genet 36, 1131 (2004). https://doi.org/10.1038/ng1104-1131

Issue Date: 01 November 2004
DOI: https://doi.org/10.1038/ng1104-1131
