Genomic innovation for crop improvement


Crop production must increase to secure future food supplies while reducing its impact on ecosystems. Detailed characterization of plant genomes and genetic diversity is crucial for meeting these challenges. Advances in genome sequencing and assembly are being used to access the large and complex genomes of crops and their wild relatives. These advances have helped to identify a wide spectrum of genetic variation and have permitted the association of genetic diversity with diverse agronomic phenotypes. In combination with improved and automated phenotyping assays and functional genomic studies, genomics is providing new foundations for crop-breeding systems.

At a glance


  1. Figure 1: Evolution and domestication of the common polyploid crops wheat and the genus Brassica.

    a, About 7 Myr ago, an ancestral Triticeae gave rise to the AA and BB diploid precursor genomes of wheat. These formed the diploid precursor DD genome of goatgrass 5 Myr ago. Around 800 kyr ago, the tetraploid AABB genome of wild emmer wheat formed, which was domesticated in the region that is now south-west Turkey. About 8 kyr ago, a random hexaploidization event between domesticated emmer wheat and a wild goatgrass, which contributed its DD genome, formed the contemporary bread wheat genome. b, About 14 Myr ago, an ancestral Brassicales underwent genome triplication; it then speciated to form the AA, BB and CC genomes around 5 Myr ago. The resulting species were domesticated relatively recently and also formed polyploid species that are important crops worldwide.

  2. Figure 2: Optimal sequencing systems for crop applications.

    A variety of sequencing methods are now available for different applications in crop improvement (green pyramid). The number of genomes that can be sequenced cost-effectively varies according to the method applied (left). Long-read technologies from PacBio, alone or coupled with Illumina assemblies, can be used to provide accurate long-range assemblies for a smaller number of genomes. These are used to define comprehensively the range and types of variation that are found in the genomes of a species (the pan-genome). Linked reads, coupled to Illumina sequencing, may provide more cost-effective capacity for sequencing on the order of thousands of genomes, which is useful, for example, for the identification of structural variation. Skim sequencing consists of low-coverage (for example, 5–10×) Illumina reads and presents a cost-effective way of identifying genetic variation and haplotypes in populations. Exome sequencing captures gene-coding regions, and genotyping by sequencing typically involves the sequencing of about 100–150 bases from a randomly located restriction-enzyme cleavage site in the genome.
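The coverage figures quoted for skim sequencing translate into read counts through simple arithmetic: mean coverage equals the number of reads multiplied by the read length, divided by the genome size. The following is a minimal Python sketch of that relationship. The function name `reads_needed`, the 17-Gb genome size and the 150-base read length are illustrative assumptions and are not taken from the caption.

```python
import math


def reads_needed(genome_size_bp: int, coverage: float, read_length_bp: int) -> int:
    """Return the number of reads required to reach a mean coverage depth.

    Uses coverage = reads * read_length / genome_size, rearranged for reads.
    """
    if genome_size_bp <= 0 or read_length_bp <= 0 or coverage < 0:
        raise ValueError("genome size and read length must be positive; coverage must be non-negative")
    return math.ceil(coverage * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    # Assumed values for illustration only: a 17-Gb genome and 150-base reads.
    genome_size = 17_000_000_000
    read_length = 150
    for depth in (5, 10):  # the 5-10x skim-sequencing range from the caption
        print(f"{depth}x coverage: {reads_needed(genome_size, depth, read_length):,} reads")
```

The same function can be used in reverse planning: dividing a fixed read budget by the per-genome requirement gives a rough estimate of how many genomes a sequencing run could cover at a given depth.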

  3. Figure 3: Erosion of genetic diversity in cultivated crops and its re-incorporation through genomics.

    a, Genetic diversity (coloured circles) in populations of wild precursors of crops has been eroded by domestication, in which a limited range of diversity is present in landraces that were initially selected and adopted for cultivation. Subsequent breeding has drawn on a limited range of the variation present in landraces to produce the elite cultivars that are used in modern agriculture. b, The identification of genes for crop improvement can use mutagenesis to introduce changes into the DNA of crops (top). Mutants with desired characteristics can be identified by screening for desired properties, known as phenotypes. In practice, this method is time-consuming and imprecise unless a specific phenotype can be measured in large populations. Mutant lines with desired phenotypes are pooled and sequenced (middle). Genomics can accelerate the process of identifying mutants by sequencing populations of mutant crops (or a range of wild relatives). Sequencing can be targeted to all genes, or specific families of genes, using sequence capture methods. RNA can also be sequenced to identify changes in gene expression that are caused by mutagenesis. Sequences of mutant lines are then compared to identify genes that are consistently mutated in the lines that exhibit the desired phenotype (bottom). c, Genomics can also be used to access genetic variation in populations of crop wild relatives. A population can be sequenced using a variety of approaches (described in Fig. 2). At the same time, the population is screened for a range of phenotypes of interest. Patterns of sequence variation, or haplotypes, can be associated with phenotypes to identify sequence variation that may cause the phenotype.
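The comparison step in panel b, in which sequences of mutant lines are compared to find genes that are consistently mutated in lines with the desired phenotype, can be sketched as a simple counting exercise. The Python example below is an illustration under stated assumptions, not the method used in the cited studies: the function name `consistently_mutated`, the `min_fraction` parameter and the line and gene names are all hypothetical, and real analyses would also need to account for background mutation load and sequencing error.

```python
from collections import Counter
from typing import Dict, Iterable, List


def consistently_mutated(
    mutations_by_line: Dict[str, Iterable[str]],
    min_fraction: float = 1.0,
) -> List[str]:
    """Return genes mutated in at least `min_fraction` of the mutant lines.

    `mutations_by_line` maps each mutant line that shows the desired
    phenotype to the genes found to carry a mutation in that line.
    """
    if not mutations_by_line:
        return []
    # Count each gene once per line, even if it is listed several times.
    counts = Counter(
        gene for genes in mutations_by_line.values() for gene in set(genes)
    )
    threshold = min_fraction * len(mutations_by_line)
    return sorted(gene for gene, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Hypothetical data: three mutant lines that share a phenotype.
    lines = {
        "mutant_1": {"geneA", "geneB"},
        "mutant_2": {"geneA", "geneC"},
        "mutant_3": {"geneA", "geneB"},
    }
    print(consistently_mutated(lines))
    print(consistently_mutated(lines, min_fraction=0.6))
```

Lowering `min_fraction` trades specificity for sensitivity, which may be useful when some lines carry mutations that the sequencing failed to detect.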

  4. Figure 4: The assembly of haplotypes in a crop-breeding programme.

    a, An example of a genomic region that consists of four genes and contains genetic variation that defines three haplotypes (H1, H2 and H3) at a particular locus (locus 5) on a chromosome. The position of the SNP that defines each haplotype is marked by an asterisk. An array of haplotypes (H1–H4) from the same chromosome, with the variants of four breeding lines (line 1, line 2, line 3 and line 4) aligned underneath each locus, is also shown. Line n and line n+1 are landraces (domesticated lines) that can introduce new haplotypes (H5–H9) and genetic diversity. The genomic structure, diversity and functions of haplotypes are established by the re-sequencing of lines and the analysis of quantitative trait loci. The red line traces the assembly of a new line (variety X) from component haplotypes, using markers that are specific for the haplotypes in each line, that have been chosen on the basis of desired combinations of phenotypes that are expressed by each haplotype. b, The performance of various haplotypes in lines 1–4 is determined in different environments, often under field conditions and over several years, using specific assays. Examples are shown for the variation in performance of four common plant traits that are influenced by genetic variation at locus 1 (time to flower), locus 4 (yield), locus 8 (resistance to disease) and locus X (protein content), with the combined performance of variety X highlighted in light blue.
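The selection traced by the red line in panel a, in which a new variety is assembled from the haplotype with the preferred phenotype at each locus, can be sketched as a per-locus choice. The Python example below is a simplified illustration only. The function name `assemble_variety`, the scores and the line assignments are hypothetical, and the sketch assumes that a higher score is always better and that loci can be combined independently, which ignores linkage and trait trade-offs.

```python
from typing import Dict, Tuple


def assemble_variety(
    scores: Dict[str, Dict[str, float]],
    carriers: Dict[str, Dict[str, str]],
) -> Dict[str, Tuple[str, str]]:
    """Pick the best-scoring haplotype at each locus.

    `scores` maps locus -> haplotype -> performance score (higher is better).
    `carriers` maps locus -> haplotype -> breeding line that carries it.
    Returns locus -> (chosen haplotype, donor line).
    """
    plan = {}
    for locus, by_haplotype in scores.items():
        best = max(by_haplotype, key=by_haplotype.get)
        plan[locus] = (best, carriers[locus][best])
    return plan


if __name__ == "__main__":
    # Hypothetical scores for two of the loci named in the caption.
    scores = {
        "locus 4 (yield)": {"H1": 6.1, "H2": 7.4},
        "locus 8 (disease resistance)": {"H1": 0.9, "H3": 0.4},
    }
    carriers = {
        "locus 4 (yield)": {"H1": "line 1", "H2": "line 3"},
        "locus 8 (disease resistance)": {"H1": "line 2", "H3": "line 4"},
    }
    for locus, (haplotype, line) in assemble_variety(scores, carriers).items():
        print(f"{locus}: take {haplotype} from {line}")
```

For a trait such as time to flower, where the target may be an optimum rather than a maximum, the scores would first need to be transformed so that larger values reflect the breeding objective.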



Author information


  1. John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK.

    • Michael W. Bevan,
    • Cristobal Uauy,
    • Brande B. H. Wulff &
    • Ji Zhou
  2. Earlham Institute, Norwich Research Park, Norwich NR4 7UH, UK.

    • Ji Zhou,
    • Ksenia Krasileva &
    • Matthew D. Clark
  3. The Sainsbury Laboratory, Norwich Research Park, Norwich NR4 7UH, UK.

    • Ksenia Krasileva
  4. School of Environmental Sciences, University of East Anglia, Norwich Research Park, Norwich NR4 7TJ, UK.

    • Matthew D. Clark

Competing financial interests

The authors declare no competing financial interests.

Reviewer Information Nature thanks V. Albert, J. Schmutz, T. Mitchell-Olds and the other anonymous reviewer(s) for their contribution to the peer review of this work.
