Recent years have seen a surge in plant genome sequencing projects and in comparisons of multiple related individuals. The high degree of genomic variation observed has made it clear that a single reference genome does not represent the diversity within a species, and has driven the expansion of the pan-genome concept. A pan-genome represents the genomic diversity of a species and includes core genes, found in all individuals, as well as variable genes, which are absent in some individuals. Variable gene annotations often show similarities across plant species, with genes associated with biotic and abiotic stress commonly enriched within variable gene groups. Here we review the growth of pan-genomics in plants, explore the origins of gene presence and absence variation, and show how pan-genomes can support plant breeding and evolution studies.
The authors declare no competing interests.
Peer review information: Nature Plants thanks Xuehui Huang, Fay-Wei Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Cite this article
Bayer, P.E., Golicz, A.A., Scheben, A. et al. Plant pan-genomes are the new reference. Nat. Plants 6, 914–920 (2020). https://doi.org/10.1038/s41477-020-0733-0