Genome-wide association study using whole-genome sequencing rapidly identifies new genes influencing agronomic traits in rice

Journal name: Nature Genetics
Volume: 48
Pages: 927–934
DOI: 10.1038/ng.3596

Abstract

A genome-wide association study (GWAS) can be a powerful tool for the identification of genes associated with agronomic traits in crop species, but it is often hindered by population structure and the large extent of linkage disequilibrium. In this study, we identified agronomically important genes in rice using GWAS based on whole-genome sequencing, followed by the screening of candidate genes based on the estimated effect of nucleotide polymorphisms. Using this approach, we identified four new genes associated with agronomic traits. Some genes were undetectable by standard SNP analysis, but we detected them using gene-based association analysis. This study provides fundamental insights relevant to the rapid identification of genes associated with agronomic traits using GWAS and will accelerate future efforts aimed at crop improvement.
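
The abstract notes that some genes were detected only by gene-based association analysis. A common rationale for gene-based tests is that several independent variants in one gene, each too rare to reach significance on its own, can be pooled into a single gene-level genotype. The sketch below illustrates that idea as a simple collapsing test; it is not the authors' method. The set of "high-impact" variant classes (`HIGH_IMPACT`), the helper names (`collapse_gene`, `permutation_p`), the permutation test and all example data are hypothetical.

```python
import random
from statistics import mean

# Hypothetical variant classes treated as likely loss-of-function.
# The classification actually used in the study is not given here.
HIGH_IMPACT = {"frameshift", "stop_gained", "splice_site"}


def collapse_gene(variants_by_variety):
    """Collapse one gene to a 0/1 genotype per variety:
    1 if the variety carries any high-impact variant in the gene, else 0."""
    return {
        variety: int(any(v in HIGH_IMPACT for v in variants))
        for variety, variants in variants_by_variety.items()
    }


def permutation_p(phenotype, indicator, n_perm=10000, seed=0):
    """Two-sided permutation P value for the difference in mean phenotype
    between carriers (1) and non-carriers (0). Both groups must be non-empty."""
    varieties = sorted(phenotype)
    y = [phenotype[v] for v in varieties]
    g = [indicator[v] for v in varieties]

    def abs_mean_diff(labels):
        carriers = [yi for yi, gi in zip(y, labels) if gi == 1]
        others = [yi for yi, gi in zip(y, labels) if gi == 0]
        return abs(mean(carriers) - mean(others))

    observed = abs_mean_diff(g)
    rng = random.Random(seed)
    labels = g[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # shuffling keeps the group sizes fixed
        if abs_mean_diff(labels) >= observed:
            hits += 1
    # Add-one correction so the estimated P value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical example: eight varieties and their variants in one gene.
variants = {
    "var1": ["missense"], "var2": ["frameshift"], "var3": [],
    "var4": ["stop_gained"], "var5": ["synonymous"], "var6": ["splice_site"],
    "var7": [], "var8": ["missense"],
}
heading = {"var1": 90, "var2": 80, "var3": 92, "var4": 79,
           "var5": 91, "var6": 81, "var7": 89, "var8": 93}

carrier = collapse_gene(variants)
p = permutation_p(heading, carrier, n_perm=2000)
print(carrier, round(p, 3))
```

Here three different high-impact variants are pooled into one carrier group, so the gene is tested once rather than each rare variant separately.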

Figures

    Figure 1: Phenotypic diversity and genetic structure of the Japanese rice varieties.

    (a–d) Histograms of zero-mean-normalized phenotypic values for days to heading (a), plant height (b), panicle length (c) and leaf blade width (d). Yellow and gray bars represent the 176 Japanese rice varieties used in this study (phenotyped in 2013) and the diversity panel reported in ref. 19, respectively. (e) Principal component analysis (PCA) of the 176 Japanese rice varieties based on whole-genome sequence data. PC1 and PC2 indicate scores on principal components 1 and 2, respectively. Values in parentheses indicate the percentage of variance explained by each principal component.

    Figure 2: GWAS for days to heading and identification of the causal gene for the peak on chromosome 1.

    (a) Manhattan plot for days to heading. Dashed line represents the significance threshold (−log10 P = 4.77). Arrowheads indicate the positions of strong peaks that did not colocalize with the known Hd genes investigated in this study. (b) Local Manhattan plot (top) and LD heatmap (bottom) surrounding the peak on chromosome 1. Arrow indicates the position of nucleotide variation in LOC_Os01g62780. Dashed lines indicate the candidate region for the peak. (c) Exon-intron structure of LOC_Os01g62780 and DNA polymorphism in that gene. (d) Boxplots of days to heading for the haplotypes (Hap.) of LOC_Os01g62780 in 2013 (left) and 2014 (right). Box edges represent the 0.25 and 0.75 quantiles, with medians shown by bold lines. Whiskers extend to data no more than 1.5 times the interquartile range from the box edges; remaining data are shown as dots. Differences between the haplotypes were analyzed by Welch's t-test. (e) Image of transgenic plants transformed with empty vector (VEC), haplotype A (Hap. A) and haplotype B (Hap. B). Red arrows indicate panicle exsertion. Scale bar, 15 cm. (f) Days to heading of the transgenic plants. Error bars, s.d. (n = 20). **P < 0.01; n.s., not significant (Welch's t-test).

    Figure 3: GWAS for plant height and panicle length, and identification of the causal gene for the peak on chromosome 11.

    (a,b) Manhattan plots for plant height (a) and panicle length (b). Arrowheads indicate the positions of strong peaks investigated in this study. Dashed lines represent significance thresholds (−log10 P = 3.67 in a and −log10 P = 5.30 in b). (c) Local Manhattan plot (top) and LD heatmap (bottom) surrounding the peak on chromosome 11. Arrow indicates the position of nucleotide variations in LOC_Os11g08410. Dashed lines indicate the candidate region for the peak. (d) Exon structure of LOC_Os11g08410 and DNA polymorphisms in that gene. Red- and gray-shaded regions indicate nucleotide variations that are significantly (−log10 P ≥ 4.77, 3.67 and 5.30 for days to heading, plant height and panicle length, respectively) and not significantly associated with phenotypic variation, respectively. ins, insertion; del, deletion. (e) Days to heading, plant height and panicle length for the indicated haplotypes of LOC_Os11g08410. Data are presented as in Figure 2d. Differences between the haplotypes were analyzed by Tukey's test (*P < 0.05; n.s., not significant). (f–h) Days to heading (f), plant height (g) and panicle length (h) for transgenic plants transformed with empty vector (VEC), haplotype A (Hap. A) and haplotype C (Hap. C). Error bars, s.d. (n = 20). *P < 0.05, **P < 0.01; n.s., not significant (Welch's t-test).

    Figure 4: GWAS for panicle number per plant, spikelet number per panicle and leaf blade width, and identification of the causal gene for the peak on chromosome 4.

    (a–c) Manhattan plots for panicle number per plant (a), spikelet number per panicle (b) and leaf blade width (c). Arrowheads indicate the positions of strong peaks investigated in this study. Dashed lines represent significance thresholds (−log10 P = 5.39 in a, 4.60 in b and 5.50 in c). (d) Local Manhattan plot (top) and LD heatmap (bottom) surrounding the peak on chromosome 4. Arrow indicates the position of nucleotide variation in LOC_Os04g52479. Dashed lines indicate the candidate region for the peak. (e) Exon-intron structure of LOC_Os04g52479 and DNA polymorphism in this gene. (f) Panicle number per plant, spikelet number per panicle and leaf blade width for the indicated haplotypes of LOC_Os04g52479. Data are presented as in Figure 2d. Differences between the haplotypes were analyzed by Welch's t-test. (g) Expression of LOC_Os04g52479 (NAL1) in the transgenic plants. (h,i) Panicle number per plant (h) and leaf blade width (i) for transgenic plants transformed with empty vector (VEC) or overexpressing haplotype A (UBQ::Hap.A) or haplotype B (UBQ::Hap.B). UBQ, maize ubiquitin promoter used to overexpress LOC_Os04g52479. Error bars, s.d. (n = 12). **P < 0.01; n.s., not significant (Welch's t-test).

    Figure 5: Analyses of the peak for days to heading on chromosome 6.

    (a) Local Manhattan plot (top) and LD heatmap (bottom) surrounding the peak on chromosome 6. Arrow indicates the position of nucleotide variations in Hd1. (b) Schematic representation of the genome structure of the region in a. Major and minor alleles at each polymorphic site are shown in blue and orange, respectively (Online Methods). The varieties were grouped by Hd1 function as follows: haplotype (Hap.) A (intermediate), Hap. B and D (functional), and Hap. E and F (null). Arrowhead indicates the position of the nucleotide polymorphisms in Hd1, which are not reflected in the graphical genotype owing to the effect of surrounding nucleotide polymorphisms (Online Methods). (c) Exon-intron structure of Hd1 and DNA polymorphisms in Hd1. (d) Manhattan plot of gene-based association analysis for days to heading. Dashed line represents a false discovery rate of 0.2 (−log10 P = 3.02). (e) Local Manhattan plot of gene-based association analysis surrounding the peak on chromosome 6. Arrow indicates the position of Hd1. (f) Plot of −log10 P values of each marker, arranged in descending order of −log10 P. Arrows indicate the positions of genes identified in this study.

    Figure 6: GWAS for awn length and identification of the causal gene for the peak on chromosome 8.

    (a) Manhattan plot of single-polymorphism-based association analysis. Dashed line represents the significance threshold (−log10 P = 4.21). (b) Local Manhattan plot of single-polymorphism-based association (top) and LD heatmap (bottom) surrounding the peak on chromosome 8. Arrow indicates the position of nucleotide variations in LOC_Os08g37890. Dashed lines indicate the candidate region for the peak. (c) Manhattan plot of gene-based association analysis. Dashed line represents a false discovery rate of 0.2 (−log10 P = 2.28). Arrowheads in a and c indicate the positions of strong peaks investigated in this study. (d) Local Manhattan plot of gene-based association analysis surrounding the peak on chromosome 8. Arrow indicates the position of LOC_Os08g37890. (e) −log10 P values of each marker, arranged in descending order of −log10 P. Arrow indicates the position of LOC_Os08g37890. (f) Exon-intron structure of LOC_Os08g37890 and DNA polymorphisms in that gene. ins, insertion; del, deletion. (g) Awn length for the haplotypes of LOC_Os08g37890 in 2013 (left) and 2014 (right). Data are presented as in Figure 2d. Differences between the haplotypes were analyzed by Tukey's test (*P < 0.05; n.s., not significant). (h) Schematic representation of the genome structure of the indicated region of chromosome 8. Major and minor alleles at each polymorphic site are shown in blue and orange, respectively. (i) Awn lengths for transgenic plants transformed with empty vector (VEC), haplotype B (Hap. B) and haplotype C (Hap. C). Scale bar, 15 mm.
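
Several two-group comparisons in the figures (Figures 2d,f, 3f–h and 4f,h,i) use Welch's t-test, which compares means without assuming equal variances in the two groups. The minimal standard-library sketch below computes the Welch t statistic, t = (mean_a − mean_b) / sqrt(s_a²/n_a + s_b²/n_b), and the Welch–Satterthwaite degrees of freedom. The function name `welch_t` and the sample values are illustrative only, and converting t to a P value (which needs the Student's t distribution) is not reimplemented here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    # Squared standard errors of each group mean (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical days-to-heading values for two haplotype groups.
hap_a = [91, 92, 93, 94, 95]
hap_b = [82, 84, 86, 88, 90]
t, df = welch_t(hap_a, hap_b)
print(round(t, 3), round(df, 2))
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same t statistic together with a two-sided P value.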

Accession codes

Primary accessions

Sequence Read Archive

References

  1. Godfray, H.C.J. et al. Food security: the challenge of feeding 9 billion people. Science 327, 812–818 (2010).
  2. Miura, K., Ashikari, M. & Matsuoka, M. The role of QTLs in the breeding of high-yielding rice. Trends Plant Sci. 16, 319–326 (2011).
  3. Huang, X. & Han, B. Natural variations and genome-wide association studies in crop plants. Annu. Rev. Plant Biol. 65, 531–551 (2014).
  4. Myles, S. et al. Association mapping: critical considerations shift from genotyping to experimental design. Plant Cell 21, 2194–2202 (2009).
  5. Hamblin, M.T., Buckler, E.S. & Jannink, J.-L. Population genetics of genomics-based crop improvement methods. Trends Genet. 27, 98–106 (2011).
  6. Lipka, A.E. et al. From association to prediction: statistical methods for the dissection and selection of complex traits in plants. Curr. Opin. Plant Biol. 24, 110–118 (2015).
  7. Huang, X. et al. A map of rice genome variation reveals the origin of cultivated rice. Nature 490, 497–501 (2012).
  8. Huang, X., Lu, T. & Han, B. Resequencing rice genomes: an emerging new era of rice genomics. Trends Genet. 29, 225–232 (2013).
  9. Jia, G. et al. A haplotype map of genomic variations and genome-wide association studies of agronomic traits in foxtail millet (Setaria italica). Nat. Genet. 45, 957–961 (2013).
  10. Mace, E.S. et al. Whole-genome sequencing reveals untapped genetic potential in Africa's indigenous cereal crop sorghum. Nat. Commun. 4, 2320 (2013).
  11. Aflitos, S. et al. Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing. Plant J. 80, 136–148 (2014).
  12. Lin, T. et al. Genomic analyses provide insights into the history of tomato breeding. Nat. Genet. 46, 1220–1226 (2014).
  13. Hazzouri, K.M. et al. Whole genome re-sequencing of date palms yields insights into diversification of a fruit tree crop. Nat. Commun. 6, 8824 (2015).
  14. Huang, X. et al. Genomic analysis of hybrid rice varieties reveals numerous superior alleles that contribute to heterosis. Nat. Commun. 6, 6258 (2015).
  15. Zhou, Z. et al. Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean. Nat. Biotechnol. 33, 408–414 (2015).
  16. Koboldt, D.C., Steinberg, K.M., Larson, D.E., Wilson, R.K. & Mardis, E.R. The next-generation sequencing revolution and its impact on genomics. Cell 155, 27–38 (2013).
  17. Ott, J., Wang, J. & Leal, S.M. Genetic linkage analysis in the age of whole-genome sequencing. Nat. Rev. Genet. 16, 275–284 (2015).
  18. Huang, X. et al. Genome-wide association studies of 14 agronomic traits in rice landraces. Nat. Genet. 42, 961–967 (2010).
  19. Zhao, K. et al. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat. Commun. 2, 467 (2011).
  20. Huang, X. et al. Genome-wide association study of flowering time and grain yield traits in a worldwide collection of rice germplasm. Nat. Genet. 44, 32–39 (2012).
  21. Li, H. et al. Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat. Genet. 45, 43–50 (2013).
  22. Yu, J., Holland, J.B., McMullen, M.D. & Buckler, E.S. Genetic design and statistical power of nested association mapping in maize. Genetics 178, 539–551 (2008).
  23. McMullen, M.D. et al. Genetic properties of the maize nested association mapping population. Science 325, 737–740 (2009).
  24. Kump, K.L. et al. Genome-wide association study of quantitative resistance to southern leaf blight in the maize nested association mapping population. Nat. Genet. 43, 163–168 (2011).
  25. Tian, F. et al. Genome-wide association study of leaf architecture in the maize nested association mapping population. Nat. Genet. 43, 159–162 (2011).
  26. Cavanagh, C., Morell, M., Mackay, I. & Powell, W. From mutations to MAGIC: resources for gene discovery, validation and delivery in crop plants. Curr. Opin. Plant Biol. 11, 215–221 (2008).
  27. Holland, J.B. MAGIC maize: a new resource for plant genetics. Genome Biol. 16, 163 (2015).
  28. Dell'Acqua, M. et al. Genetic properties of the MAGIC maize population: a new platform for high definition QTL mapping in Zea mays. Genome Biol. 16, 167 (2015).
  29. Reich, D.E. et al. Linkage disequilibrium in the human genome. Nature 411, 199–204 (2001).
  30. Gupta, P.K., Rustgi, S. & Kulwal, P.L. Linkage disequilibrium and association studies in higher plants: present status and future prospects. Plant Mol. Biol. 57, 461–485 (2005).
  31. Woolston, C. Rice. Nature 514, S49 (2014).
  32. International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature 436, 793–800 (2005).
  33. Matsubara, K., Hori, K., Ogiso-Tanaka, E. & Yano, M. Cloning of quantitative trait genes from rice reveals conservation and divergence of photoperiod flowering pathways in Arabidopsis and rice. Front. Plant Sci. 5, 193 (2014).
  34. Takahashi, Y., Shomura, A., Sasaki, T. & Yano, M. Hd6, a rice quantitative trait locus involved in photoperiod sensitivity, encodes the alpha subunit of protein kinase CK2. Proc. Natl. Acad. Sci. USA 98, 7922–7927 (2001).
  35. Koo, B.H. et al. Natural variation in OsPRR37 regulates heading date and contributes to rice cultivation at a wide range of latitudes. Mol. Plant 6, 1877–1888 (2013).
  36. Ren, G., Chen, X. & Yu, B. Uridylation of miRNAs by HEN1 SUPPRESSOR1 in Arabidopsis. Curr. Biol. 22, 695–700 (2012).
  37. Chen, X., Liu, J., Cheng, Y. & Jia, D. HEN1 functions pleiotropically in Arabidopsis development and acts in C function in the flower. Development 129, 1085–1094 (2002).
  38. Fujita, D. et al. NAL1 allele from a rice landrace greatly increases yield in modern indica cultivars. Proc. Natl. Acad. Sci. USA 110, 20431–20436 (2013).
  39. Takai, T. et al. A natural variant of NAL1, selected in high-yield rice breeding programs, pleiotropically increases photosynthesis rate. Sci. Rep. 3, 2149 (2013).
  40. Yano, M. et al. Hd1, a major photoperiod sensitivity quantitative trait locus in rice, is closely related to the Arabidopsis flowering time gene CONSTANS. Plant Cell 12, 2473–2484 (2000).
  41. Fujino, K. et al. Multiple introgression events surrounding the Hd1 flowering-time gene in cultivated rice, Oryza sativa L. Mol. Genet. Genomics 284, 137–146 (2010).
  42. Takahashi, Y. & Shimamoto, K. Heading date 1 (Hd1), an ortholog of Arabidopsis CONSTANS, is a possible target of human selection during domestication to diversify flowering times of cultivated rice. Genes Genet. Syst. 86, 175–182 (2011).
  43. Atwell, S. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465, 627–631 (2010).
  44. Baxter, I. et al. A coastal cline in sodium accumulation in Arabidopsis thaliana is driven by natural variation of the sodium transporter AtHKT1;1. PLoS Genet. 6, e1001193 (2010).
  45. Dickson, S.P., Wang, K., Krantz, I., Hakonarson, H. & Goldstein, D.B. Rare variants create synthetic genome-wide associations. PLoS Biol. 8, e1000294 (2010).
  46. Platt, A., Vilhjálmsson, B.J. & Nordborg, M. Conditions under which genome-wide association studies will be positively misleading. Genetics 186, 1045–1052 (2010).
  47. Jorgenson, E. & Witte, J.S. A gene-centric approach to genome-wide association studies. Nat. Rev. Genet. 7, 885–891 (2006).
  48. Ivanov, D.K. et al. Longevity GWAS using the Drosophila genetic reference panel. J. Gerontol. A Biol. Sci. Med. Sci. 70, 1470–1478 (2015).
  49. Ferrari, R. et al. A genome-wide screening and SNPs-to-genes approach to identify novel genetic risk factors associated with frontotemporal dementia. Neurobiol. Aging 36, 2904.e13–2904.e26 (2015).
  50. Abrash, E.B., Davies, K.A. & Bergmann, D.C. Generation of signaling specificity in Arabidopsis by spatially restricted buffering of ligand-receptor interactions. Plant Cell 23, 2864–2879 (2011).
  51. Segura, V. et al. An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat. Genet. 44, 825–830 (2012).
  52. Vilhjálmsson, B.J. & Nordborg, M. The nature of confounding in genome-wide association studies. Nat. Rev. Genet. 14, 1–2 (2013).
  53. Sasaki, A. et al. Green revolution: a mutant gibberellin-synthesis gene in rice. Nature 416, 701–702 (2002).
  54. Asano, K. et al. Artificial selection for a green revolution gene during japonica rice domestication. Proc. Natl. Acad. Sci. USA 108, 11034–11039 (2011).
  55. Konishi, S. et al. An SNP caused loss of seed shattering during rice domestication. Science 312, 1392–1396 (2006).
  56. Li, Y. et al. Natural variation in GS5 plays an important role in regulating grain size and yield in rice. Nat. Genet. 43, 1266–1269 (2011).
  57. Wang, Y. et al. Copy number variation at the GL7 locus contributes to grain size diversity in rice. Nat. Genet. 47, 944–948 (2015).
  58. Si, L. et al. OsSPL13 controls grain size in cultivated rice. Nat. Genet. 48, 447–456 (2016).
  59. Hashimoto, Z. et al. Genetic diversity and phylogeny of Japanese sake-brewing rice as revealed by AFLP and nuclear and chloroplast SSR markers. Theor. Appl. Genet. 109, 1586–1596 (2004).
  60. Ebana, K., Kojima, Y., Fukuoka, S., Nagamine, T. & Kawase, M. Development of mini core collection of Japanese rice landrace. Breed. Sci. 58, 281–291 (2008).
  61. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
  62. DePristo, M.A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
  63. Ouyang, S. et al. The TIGR Rice Genome Annotation Resource: improvements and new features. Nucleic Acids Res. 35, D883–D887 (2007).
  64. Price, A.L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
  65. Shin, J.-H., Blay, S., McNeney, B. & Graham, J. LDheatmap: an R function for graphical display of pairwise linkage disequilibria between single nucleotide polymorphisms. J. Stat. Softw. 16, Code Snippet 3 (2006).
  66. Ripke, S. et al. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
  67. Ma, X. et al. No association between ovarian cancer susceptibility variants and breast cancer risk among Chinese women. Cancer Epidemiol. Biomarkers Prev. 22, 467–469 (2013).
  68. Mayerle, J. et al. Identification of genetic loci associated with Helicobacter pylori serologic status. J. Am. Med. Assoc. 309, 1912–1920 (2013).
  69. Yu, J. et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38, 203–208 (2006).
  70. Endelman, J.B. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome J. 4, 250–255 (2011).
  71. Dudbridge, F. & Gusnanto, A. Estimation of significance thresholds for genomewide association scans. Genet. Epidemiol. 32, 227–234 (2008).
  72. Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729 (2013).
  73. Ozawa, K. A high-efficiency Agrobacterium-mediated transformation system of rice (Oryza sativa L.). Methods Mol. Biol. 847, 51–57 (2012).
  73. Ozawa, K. A high-efficiency Agrobacterium-mediated transformation system of rice (Oryza sativa L.). Methods Mol. Biol. 847, 5157 (2012).

Author information

Affiliations

  1. Bioscience and Biotechnology Center, Nagoya University, Nagoya, Japan.

    • Kenji Yano,
    • Koichiro Aya,
    • Hideyuki Takeuchi,
    • Pei-ching Lo,
    • Li Hu,
    • Hidemi Kitano,
    • Ko Hirano &
    • Makoto Matsuoka
  2. NARO Institute of Vegetable and Tea Science, Tsu, Japan.

    • Eiji Yamamoto
  3. Food Resources Education and Research Center, Graduate School of Agricultural Science, Kobe University, Kasai, Hyogo, Japan.

    • Masanori Yamasaki
  4. Hyogo Prefectural Research Center for Agriculture, Forestry and Fisheries, Kasai, Hyogo, Japan.

    • Shinya Yoshida

Contributions

K.Y., K.A., H.T., P.-C.L. and L.H. performed the field experiments and analyzed the results. K.Y., K.A. and H.T. performed the genotyping and the genome data analyses. M.Y. and S.Y. prepared the population material. K.Y. produced the constructs and generated and analyzed the transformants. K.Y., E.Y., K.A., H.K., K.H. and M.M. designed the research and wrote the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (8,300 KB)

    Supplementary Figures 1–21 and Supplementary Tables 1, 3–5, 13 and 14.

Excel files

  1. Supplementary Table 2 (62,077 KB)

    Phenotypic data of the seven traits observed in 2013 and 2014.

  2. Supplementary Table 6 (65,381 KB)

    List of the top 50 P-value-ranked genes in the gene-based association analysis of days to heading.

  3. Supplementary Table 7 (101 KB)

    List of the top 50 P-value-ranked genes in the gene-based association analysis of plant height.

  4. Supplementary Table 8 (58,891 KB)

    List of the top 50 P-value-ranked genes in the gene-based association analysis of panicle length.

  5. Supplementary Table 9 (62,473 KB)

    List of the top 50 P-value-ranked genes in the gene-based association analysis of panicle number per plant.

  6. Supplementary Table 10 (55,987 KB)

    List of the top 50 P-value-ranked genes in the gene-based association analysis of leaf blade width.

  7. Supplementary Table 11 (67,230 KB)

    List of the top 50 P-value-ranked genes in the gene-based association analysis of spikelet number per panicle.

  8. Supplementary Table 12 (101 KB)

    List of the top 50 P-value-ranked genes in the gene-based association analysis of awn length.