Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines


Although pioneered by human geneticists as a potential solution to the challenging problem of finding the genetic basis of common human diseases (refs 1, 2), genome-wide association (GWA) studies have, owing to advances in genotyping and sequencing technology, become an obvious general approach for studying the genetics of natural variation and traits of agricultural importance. They are particularly useful when inbred lines are available, because once these lines have been genotyped they can be phenotyped multiple times, making it possible (as well as extremely cost effective) to study many different traits in many different environments, while replicating the phenotypic measurements to reduce environmental noise. Here we demonstrate the power of this approach by carrying out a GWA study of 107 phenotypes in Arabidopsis thaliana, a widely distributed, predominantly self-fertilizing model plant known to harbour considerable genetic variation for many adaptively important traits (ref. 3). Our results are dramatically different from those of human GWA studies, in that we identify many common alleles of major effect, but they are also, in many cases, harder to interpret because confounding by complex genetics and population structure makes it difficult to distinguish true associations from false. However, a priori candidates are significantly over-represented among these associations as well, making many of them excellent candidates for follow-up experiments. Our study demonstrates the feasibility of GWA studies in A. thaliana and suggests that the approach will be appropriate for many other organisms.

At a glance


  1. Figure 1: The number of associations identified using different P-value thresholds for each phenotype.

    For each phenotype, the numbers of distinct peaks of association significant at nominal P-value thresholds (colour scale) are shown. The number of SNPs (out of 250,000) that would be expected to exceed each threshold is shown for comparison. a, No correction for population structure (non-parametric Wilcoxon rank-sum test). b, Correction for population structure (parametric mixed model (EMMA)).

  2. Figure 2: GWA analysis of hypersensitive response to the bacterial elicitor AvrRpm1.

    a, Genome-wide P values from Fisher’s exact test. The horizontal dash–dot line corresponds to a nominal 5% significance threshold with Bonferroni correction for 250,000 tests. b, Magnification of the genomic region surrounding RPM1, the position (and extent) of which is indicated by the vertical blue line. Mb, megabase.

  3. Figure 3: Candidate SNPs are over-represented among strong associations.

    GWA analysis of the phenotype of flowering time at 10 °C: P values from the Wilcoxon rank-sum test are plotted against those from EMMA. Points corresponding to SNPs within 20 kb of a candidate gene are shown in red; the rest are shown in blue. The enrichment of the former over the latter in different parts of the distribution is as indicated.

  4. Figure 4: Association with FLC expression at the top of chromosome 4 near FRI.

    The P values are from EMMA and the position of FRI is indicated by a vertical yellow line. The blue and red dots correspond to the Columbia and Landsberg erecta alleles of FRI, respectively. The horizontal dash–dot lines correspond to a nominal 5% significance threshold with Bonferroni correction for 250,000 tests. a, Single-SNP tests. b, Test with the Columbia allele included as a cofactor in the model. c, Test with the Landsberg erecta allele included as a cofactor in the model. d, Test with both alleles included as cofactors in the model.
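
The thresholds mentioned in the figure legends above can be illustrated with a few lines of arithmetic. The Python sketch below is not taken from the paper's pipeline; it only shows how a Bonferroni-corrected threshold for 250,000 tests and the number of SNPs expected to exceed a nominal P-value threshold by chance could be computed. The `fold_enrichment` helper is one plausible way to express over-representation of candidate SNPs, and both it and the counts passed to it are hypothetical, chosen purely for illustration.

```python
import math

N_SNPS = 250_000  # number of SNPs tested, as stated in the figure legends
ALPHA = 0.05      # nominal 5% significance level


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value threshold after Bonferroni correction."""
    return alpha / n_tests


def expected_null_hits(p_threshold: float, n_tests: int) -> float:
    """Number of tests expected to fall below p_threshold if no SNP is associated."""
    return p_threshold * n_tests


def fold_enrichment(cand_hits: int, cand_total: int,
                    other_hits: int, other_total: int) -> float:
    """Hit rate among candidate SNPs divided by hit rate among all other SNPs.

    Assumption: this ratio is only one possible definition of enrichment;
    the paper's exact statistic is described in its Supplementary Information.
    """
    if min(cand_total, other_total, other_hits) <= 0:
        raise ValueError("totals and other_hits must be positive")
    return (cand_hits / cand_total) / (other_hits / other_total)


threshold = bonferroni_threshold(ALPHA, N_SNPS)
print(f"Bonferroni threshold: {threshold:.1e}")               # 2.0e-07
print(f"-log10(threshold):    {-math.log10(threshold):.2f}")  # 6.70

# Chance expectation at a few nominal thresholds (compare Figure 1).
for p in (1e-4, 1e-5, 1e-6):
    print(f"P < {p:.0e}: about {expected_null_hits(p, N_SNPS):.2f} SNPs expected by chance")

# Hypothetical counts, for illustration only (not data from the study).
print(fold_enrichment(cand_hits=30, cand_total=10_000,
                      other_hits=120, other_total=240_000))
```

On a Manhattan plot drawn on a -log10 scale, the dash–dot line for 250,000 tests would therefore sit at about 6.7.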


  1. Hirschhorn, J. N. & Daly, M. J. Genome-wide association studies for common diseases and complex traits. Nature Rev. Genet. 6, 95-108 (2005)
  2. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661-678 (2007)
  3. Koornneef, M., Alonso-Blanco, C. & Vreugdenhil, D. Naturally occurring genetic variation in Arabidopsis thaliana. Annu. Rev. Plant Biol. 55, 141-172 (2004)
  4. Nordborg, M. et al. The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 3, e196 (2005)
  5. Shindo, C. et al. Role of FRIGIDA and FLC in determining variation in flowering time of Arabidopsis thaliana. Plant Physiol. 138, 1163-1173 (2005)
  6. Kim, S. et al. Recombination and linkage disequilibrium in Arabidopsis thaliana. Nature Genet. 39, 1151-1155 (2007)
  7. Nordborg, M. et al. The extent of linkage disequilibrium in Arabidopsis thaliana. Nature Genet. 30, 190-193 (2002)
  8. Aranzana, M. J. et al. Genome-wide association mapping in Arabidopsis identifies previously known flowering time and pathogen resistance genes. PLoS Genet. 1, e60 (2005)
  9. Zhao, K. et al. An Arabidopsis example of association mapping in structured samples. PLoS Genet. 3, e4 (2007)
  10. Pritchard, J. K., Stephens, M., Rosenberg, N. A. & Donnelly, P. Association mapping in structured populations. Am. J. Hum. Genet. 67, 170-181 (2000)
  11. Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genet. 38, 904-909 (2006)
  12. Yu, J. et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nature Genet. 38, 203-208 (2005)
  13. Kang, H. M. et al. Efficient control of population structure in model organism association mapping. Genetics 178, 1709-1723 (2008)
  14. Grant, M. R. et al. Structure of the Arabidopsis RPM1 gene enabling dual-specificity disease resistance. Science 269, 843-846 (1995)
  15. Johanson, U. et al. Molecular analysis of FRIGIDA, a major determinant of natural variation in Arabidopsis flowering time. Science 290, 344-347 (2000)
  16. Toomajian, C. et al. A non-parametric test reveals selection for rapid flowering in the Arabidopsis genome. PLoS Biol. 4, e137 (2006)
  17. Michaels, S. D. & Amasino, R. M. FLOWERING LOCUS C encodes a novel MADS domain protein that acts as a repressor of flowering. Plant Cell 11, 949-956 (1999)
  18. Bentsink, L., Jowett, J., Hanhart, C. J. & Koornneef, M. Cloning of DOG1, a quantitative trait locus controlling seed dormancy in Arabidopsis. Proc. Natl Acad. Sci. USA 103, 17042-17047 (2006)
  19. Rus, A. et al. Natural variants of AtHKT1 enhance Na+ accumulation in two wild populations of Arabidopsis. PLoS Genet. 2, e210 (2006)
  20. Baxter, I. et al. Variation in molybdenum content across broadly distributed populations of Arabidopsis thaliana is controlled by a mitochondrial molybdenum transporter (MOT1). PLoS Genet. 4, e1000004 (2008)
  21. Hilscher, J., Schlötterer, C. & Hauser, M.-T. A single amino acid replacement in ETC2 acts as major modifier of trichome patterning in natural Arabidopsis populations. Curr. Biol. 19, 1747-1751 (2009)
  22. Lu, H., Rate, D. N., Song, J. T. & Greenberg, J. T. ACD6, a novel ankyrin protein, is a regulator and an effector of salicylic acid signaling in the Arabidopsis defense response. Plant Cell 15, 2408-2420 (2003)
  23. Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747-753 (2009)
  24. Han, J. et al. A genome-wide association study identifies novel alleles associated with hair color and skin pigmentation. PLoS Genet. 4, e1000074 (2008)
  25. Stokowski, R. P. et al. A genomewide association study of skin pigmentation in a South Asian population. Am. J. Hum. Genet. 81, 1119-1132 (2007)
  26. Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55, 997-1004 (1999)
  27. Rosenberg, N. A. & Nordborg, M. A general population-genetic model for the production by population structure of spurious genotype-phenotype associations in discrete, admixed, or spatially distributed populations. Genetics 173, 1665-1678 (2006)
  28. Nordborg, M. & Weigel, D. Next-generation genetics in plants. Nature 456, 720-723 (2008)


Author information

  1. These authors contributed equally to this work.

    • Susanna Atwell,
    • Yu S. Huang,
    • Bjarni J. Vilhjálmsson &
    • Glenda Willems


  1. Molecular and Computational Biology,

    • Susanna Atwell,
    • Yu S. Huang,
    • Bjarni J. Vilhjálmsson,
    • Glenda Willems,
    • Dazhe Meng,
    • Alexander Platt,
    • Aaron M. Tarone,
    • Tina T. Hu,
    • Rong Jiang,
    • Muhammad Ali Amer,
    • Chunlao Tang &
    • Magnus Nordborg
  2. Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California 90089, USA

    • Paul Marjoram
  3. Department of Ecology & Evolution, University of Chicago, Chicago, Illinois 60637, USA

    • Matthew Horton,
    • Yan Li,
    • N. Wayan Muliyati,
    • Xu Zhang,
    • Joel M. Kniskern,
    • Fabrice Roux,
    • M. Brian Traw,
    • Justin O. Borevitz &
    • Joy Bergelson
  4. Bindley Bioscience Center,

    • Ivan Baxter
  5. Purdue University, West Lafayette, Indiana 47907, USA

    • David E. Salt
  6. Laboratoire de Génétique et Evolution des Populations Végétales, UMR CNRS 8016, Université des Sciences et Technologies de Lille 1, F-59655 Villeneuve d’Ascq Cedex, France

    • Benjamin Brachi,
    • Nathalie Faure &
    • Fabrice Roux
  7. Howard Hughes Medical Institute, La Jolla, California 92037, USA

    • Joanne Chory
  8. Plant Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, California 92037, USA

    • Joanne Chory,
    • Joseph R. Ecker &
    • Todd Michael
  9. Department of Cell and Development Biology, John Innes Centre, Norwich NR4 7UH, UK

    • Caroline Dean
  10. Max Planck Institute for Plant Breeding Research, D-50829 Cologne, Germany

    • Marilyne Debieu &
    • Juliette de Meaux
  11. Sainsbury Laboratory, Norwich NR4 7UH, UK

    • Jonathan D. G. Jones &
    • Adnane Nemri
  12. Department of Molecular Biology, Max Planck Institute for Developmental Biology, D-72076 Tübingen, Germany

    • Marco Todesco &
    • Detlef Weigel
  13. Gregor Mendel Institute, A-1030 Vienna, Austria

    • Magnus Nordborg


J.O.B., J.B. and M.N. are equal senior authors. J.R.E. and D.W. generated the SNPs used in this project. S.A., M.H., Y.L., N.W.M., X.Z., J.O.B. and J.B. were responsible for the experimental aspects of genotyping. Y.S.H., B.J.V., M.H., T.T.H., R.J., X.Z., M.A.A., P.M., J.O.B., J.B. and M.N. were responsible for data management and the bioinformatics pipeline. S.A., I.B., B.B., J.C., C.D., M.D., J.d.M., N.F., J.M.K., J.D.G.J., T.M., A.N., F.R., D.E.S., C.T., M.T., M.B.T., D.W., J.B. and M.N. were responsible for phenotyping. S.A., Y.S.H., B.J.V., G.W., D.M., A.P., A.M.T., P.M. and M.N. carried out the GWA analyses. Y.S.H. and D.M. developed the project website. M.N. wrote the paper with significant contributions from S.A., Y.S.H., B.J.V., G.W., A.P. and J.B. J.O.B., J.B. and M.N. designed and supervised the project.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Information (34.4M)

    This file contains Supplementary Information which comprises: 1 Genotyping; 2 Association Mapping Methods; 3 Enrichment for a priori candidates, Supplementary Figures 1-152 with legends, Supplementary References and Supplementary Tables 1-7.


    Antoni Rafalski said:

    For an example of a whole-genome association (WGA) study in a crop species, maize, see Beló A, Zheng P, Luck S, Shen B, Meyer DJ, Li B, Tingey S, Rafalski A. Whole genome scan detects an allelic variant of fad2 associated with increased oleic acid levels in maize. Molecular Genetics and Genomics 2008, 279: 1-10. This study was haplotype-based (rather than single-SNP-based) and included ca. 50,000 SNPs at 10,000 loci. Many of the conclusions are similar. Since the time of publication, multiple additional phenotypes have been analyzed using this approach.
    A. Rafalski
