Plant pan-genomes are the new reference

An Author Correction to this article was published on 02 November 2020

This article has been updated

Abstract

Recent years have seen a surge in plant genome sequencing projects and in the comparison of multiple related individuals. The high degree of genomic variation observed led to the realization that single reference genomes do not represent the diversity within a species, and prompted the expansion of the pan-genome concept. Pan-genomes represent the genomic diversity of a species and include core genes, found in all individuals, as well as variable genes, which are absent in some individuals. Variable gene annotations often show similarities across plant species, with genes for biotic and abiotic stress commonly enriched within variable gene groups. Here we review the growth of pan-genomics in plants, explore the origins of gene presence and absence variation, and show how pan-genomes can support plant breeding and evolution studies.
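
The core/variable distinction described above can be illustrated with a minimal sketch. It assumes gene presence has already been called per individual and is available as one set of gene identifiers per accession. The function name `classify_genes`, the accession names and the gene identifiers are hypothetical and are not taken from the article or from any pan-genome tool.

```python
from typing import Dict, Set, Tuple


def classify_genes(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split a pan-genome into core and variable genes.

    `presence` maps each individual (hypothetical accession name) to the
    set of gene identifiers detected in that individual.
    """
    if not presence:
        return set(), set()
    gene_sets = list(presence.values())
    pan = set().union(*gene_sets)        # every gene seen in any individual
    core = set.intersection(*gene_sets)  # genes found in all individuals
    variable = pan - core                # genes absent in at least one individual
    return core, variable


# Toy data, invented purely for illustration.
toy = {
    "accession_A": {"g1", "g2", "g3"},
    "accession_B": {"g1", "g2", "g4"},
    "accession_C": {"g1", "g3", "g4"},
}
core, variable = classify_genes(toy)
print(sorted(core))      # ['g1']
print(sorted(variable))  # ['g2', 'g3', 'g4']
```

In this toy input only `g1` is present in every accession, so it is the sole core gene; the remaining genes are each missing from one accession and are therefore variable. Real analyses would derive the presence sets from assemblies or read mapping, which this sketch does not attempt.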

Fig. 1: Comparison of pan-genome approaches.
Fig. 2: The growth of pan-genome publications.
Fig. 3: Different sources for novel genes.

Change history

  • 02 November 2020

    An amendment to this paper has been published and can be accessed via a link at the top of the paper.

Author information

Contributions

P.B., A.G., A.S., J.B. and D.E. wrote and edited the manuscript.

Corresponding author

Correspondence to David Edwards.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Plants thanks Xuehui Huang, Fay-Wei Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bayer, P.E., Golicz, A.A., Scheben, A. et al. Plant pan-genomes are the new reference. Nat. Plants 6, 914–920 (2020). https://doi.org/10.1038/s41477-020-0733-0
