Genomic prediction contributing to a promising global strategy to turbocharge gene banks


The 7.4 million plant accessions in gene banks are largely underutilized because of various resource constraints, but current genomic and analytic technologies now enable us to mine this natural heritage. Here we report a proof-of-concept study that integrates genomic prediction into a broad germplasm evaluation process. First, germplasm curators chose a set of 962 biomass sorghum accessions as a reference set. Using high-throughput genotyping-by-sequencing (GBS), we genetically characterized this reference set with 340,496 single nucleotide polymorphisms (SNPs). A set of 299 accessions was selected as the training set to represent the overall diversity of the reference set, and we phenotypically characterized the training set for biomass yield and other related traits. Cross-validation with multiple analytical methods on the training-set data indicated high prediction accuracy for biomass yield. Empirical experiments with a 200-accession validation set chosen from the reference set confirmed this high prediction accuracy. We also examined the potential to apply the prediction model in broader genetic contexts using an independent population. Detailed analyses of prediction reliability provided new insights for optimizing the strategy. The success of this project illustrates that a global, cost-effective strategy may be designed to assess the vast amount of valuable germplasm archived in 1,750 gene banks.
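To make the cross-validation step more concrete, the following is a minimal Python sketch of k-fold cross-validation for a GBLUP-style (kernel ridge) genomic prediction model, with prediction accuracy taken as the Pearson correlation between predicted and observed values. It is an illustration under stated assumptions, not the study's pipeline: the study used multiple analytical methods, whereas this sketch uses a single GBLUP-like model, a simplified linear relationship kernel, and simulated genotypes and phenotypes rather than the sorghum data. The function names (`gblup_predict`, `cross_validate`), the shrinkage parameter `lam`, the five folds and the simulated population are all hypothetical.

```python
import random
import statistics


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def pearson(a, b):
    """Pearson correlation, used here as 'prediction accuracy'."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = (sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b)) ** 0.5
    return num / den


def relationship(z):
    """Linear kernel K = Z Z' / m on column-centred markers (a simplified genomic relationship)."""
    m = len(z[0])
    return [[sum(p * q for p, q in zip(a, b)) / m for b in z] for a in z]


def gblup_predict(k, y, train, test, lam):
    """y_hat = mu + K[test,train] (K[train,train] + lam*I)^-1 (y_train - mu).
    mu is the training mean (a simplification of the GLS intercept);
    lam plays the role of sigma_e^2 / sigma_g^2 (hypothetical value)."""
    mu = statistics.fmean(y[i] for i in train)
    lhs = [[k[i][j] + (lam if i == j else 0.0) for j in train] for i in train]
    alpha = solve(lhs, [y[i] - mu for i in train])
    return [mu + sum(k[t][j] * w for j, w in zip(train, alpha)) for t in test]


def cross_validate(k, y, folds=5, lam=1.0, seed=1):
    """Random k-fold cross-validation; returns the mean correlation across folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    accs = []
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        pred = gblup_predict(k, y, train, test, lam)
        accs.append(pearson(pred, [y[i] for i in test]))
    return statistics.fmean(accs)


# Hypothetical simulated data: 100 accessions x 200 SNPs coded -1/0/1, heritability ~0.5.
rng = random.Random(42)
n, m = 100, 200
x = [[rng.choice((-1, 0, 1)) for _ in range(m)] for _ in range(n)]
col_means = [statistics.fmean(col) for col in zip(*x)]
z = [[v - c for v, c in zip(row, col_means)] for row in x]  # centre each marker
effects = [rng.gauss(0.0, 1.0) for _ in range(m)]
g = [sum(v * e for v, e in zip(row, effects)) for row in z]  # true genetic values
sd_g = statistics.pstdev(g)
y = [gi + rng.gauss(0.0, sd_g) for gi in g]  # residual variance equal to genetic variance

acc = cross_validate(relationship(z), y)
print(f"mean 5-fold prediction accuracy: {acc:.2f}")
```

Random folds are used here only for simplicity; the study instead selected its training set deliberately to represent the overall diversity of the reference set.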


Figure 1: Study design and genetic characterization of the sorghum population.
Figure 2: Selective phenotyping and cross-validation.
Figure 3: Relationship between biomass yield and three other traits.
Figure 4: Prediction accuracy from empirical validation experiments.
Figure 5: Prediction extrapolation to an independent population.
Figure 6: Upper bound for reliability (U) and the predicted yield.
Figure 7: Turbocharging gene banks through genomic selection.
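Figure 6 refers to an upper bound for reliability (U). One common formulation of such a bound for GBLUP, assumed here rather than taken from the study, is the reliability of an untested accession i in the limit where the trait has no residual variance: U_i = G[i,T] G[T,T]^-1 G[T,i] / G[i,i], where G is a genomic relationship matrix and T indexes the training set. The Python sketch below computes this quantity; the function name `upper_bound_reliability` and the toy matrix `G` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def upper_bound_reliability(g, train, i):
    """U_i = G[i,T] G[T,T]^-1 G[T,i] / G[i,i]: the GBLUP reliability of candidate i
    when residual variance is zero (assumed formulation; G must be symmetric and
    G[T,T] invertible)."""
    g_tt = [[g[r][c] for c in train] for r in train]
    g_ti = [g[r][i] for r in train]
    w = solve(g_tt, g_ti)  # G[T,T]^-1 G[T,i]
    return sum(a * b for a, b in zip(g_ti, w)) / g[i][i]


# Toy relationship matrix (hypothetical): accessions 0 and 1 form the training set.
G = [[1.0, 0.2, 0.5],
     [0.2, 1.0, 0.3],
     [0.5, 0.3, 1.0]]
print(round(upper_bound_reliability(G, [0, 1], 2), 4))  # 0.2917 (= 7/24)
```

Under this formulation, a candidate genetically identical to a training accession gets U = 1, and a candidate unrelated to the whole training set gets U = 0, so U summarizes how well the training set "covers" each untested accession.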




Acknowledgements

This work was supported by the Agriculture and Food Research Initiative competitive grant (2011-03587) from the USDA National Institute of Food and Agriculture, by the National Science Foundation grant IOS-1238142, by the Kansas State University Center for Sorghum Improvement, by the Iowa State University Raymond F. Baker Center for Plant Breeding and by the Iowa State University Plant Science Institute. We thank K. Mayfield, L. Lambright and S. Staggenborg of Chromatin for conducting the experiments in Lubbock, Texas.

Author information




J.Y., M.L.W., G.A.P., T.T.T., P.S.S. and R.B. conceived and designed the experiments. X.Y., X.L., T.G., C.Z., Y.W., K.L.R., M.L.W. and J.Y. performed the experiments. X.Y., X.L. and C.Z. analysed the data. S.E.M., K.L.R., D.W., M.L.W., G.A.P., T.T.T., P.S.S. and R.B. contributed materials/analysis tools. X.Y., X.L., T.G. and J.Y. wrote the paper.

Corresponding author

Correspondence to Jianming Yu.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Information

Supplementary Figures 1-12 and Supplementary Tables 1-3. (PDF 2298 kb)

Supplementary Data Set

Trait data for the 299-accession training set, the 200-accession validation set, and the 45-accession validation set. (XLSX 50 kb)

About this article


Cite this article

Yu, X., Li, X., Guo, T. et al. Genomic prediction contributing to a promising global strategy to turbocharge gene banks. Nature Plants 2, 16150 (2016).
