A reference genome for pea provides insight into legume genome evolution

Kreplak, Jonathan; Madoui, Mohammed-Amin; Cápal, Petr; Novák, Petr; Labadie, Karine; Aubert, Grégoire; Bayer, Philipp E.; Gali, Krishna K.; Syme, Robert A.; Main, Dorrie; Klein, Anthony; Bérard, Aurélie; Vrbová, Iva; Fournier, Cyril; d’Agata, Leo; Belser, Caroline; Berrabah, Wahiba; Toegelová, Helena; Milec, Zbyněk; Vrána, Jan; Lee, HueyTyng; Kougbeadjo, Ayité; Térézol, Morgane; Huneau, Cécile; Turo, Chala J.; Mohellibi, Nacer; Neumann, Pavel; Falque, Matthieu; Gallardo, Karine; McGee, Rebecca; Tar’an, Bunyamin; Bendahmane, Abdelhafid; Aury, Jean-Marc; Batley, Jacqueline; Le Paslier, Marie-Christine; Ellis, Noel; Warkentin, Thomas D.; Coyne, Clarice J.; Salse, Jérome; Edwards, David; Lichtenzveig, Judith; Macas, Jiří; Doležel, Jaroslav; Wincker, Patrick; Burstin, Judith

doi:10.1038/s41588-019-0480-1

Published: 02 September 2019

Nature Genetics volume 51, pages 1411–1422 (2019)

Abstract

We report the first annotated chromosome-level reference genome assembly for pea, Gregor Mendel’s original genetic model. Phylogenetics and paleogenomics show genomic rearrangements across legumes and suggest a major role for repetitive elements in pea genome evolution. Compared to other sequenced Leguminosae genomes, the pea genome shows intense gene dynamics, most likely associated with genome size expansion when the Fabeae diverged from its sister tribes. During Pisum evolution, translocation and transposition differentially occurred across lineages. This reference sequence will accelerate our understanding of the molecular basis of agronomically important traits and support crop improvement.

Main

Pea (Pisum sativum L., 2n = 14) is the second most important grain legume in the world after common bean and is an important green vegetable, with 14.3 Mt of dry pea and 19.9 Mt of green pea produced in 2016 (http://www.fao.org/faostat/). Pea belongs to the Leguminosae (or Fabaceae), which includes cool-season grain legumes from the Galegoid clade, such as pea, lentil (Lens culinaris Medik.), chickpea (Cicer arietinum L.) and faba bean (Vicia faba L.), and tropical grain legumes from the Millettoid clade, such as common bean (Phaseolus vulgaris L.), cowpea (Vigna unguiculata (L.) Walp.) and mungbean (Vigna radiata (L.) R. Wilczek). It provides significant ecosystem services: it is a valuable source of dietary proteins, mineral nutrients, complex starch and fibers with demonstrated health benefits^1,2,3,4 and its symbiosis with N-fixing soil bacteria reduces the need for applied N fertilizers, thereby mitigating greenhouse gas emissions^5,6,7. Pea was domesticated ~10,000 years ago by Neolithic farmers of the Fertile Crescent, along with cereals and other grain legumes⁸. The large reservoir of genetic diversity in Pisum has facilitated its spread throughout Asia, Europe, Africa, the Americas and Oceania where it has adapted to diverse environments and culinary practices (https://iyp2016.org/). Due to its large genome size (1C ~ 4.45 gigabases, Gb⁹), pea genomics has lagged behind that of legumes with smaller genomes, such as Medicago truncatula Gaertn.¹⁰, Lotus japonicus L.¹¹ or soybean (Glycine max (L.) Merr)¹². Yet, pea has been studied as a genetic model since the eighteenth century; the analysis of the inheritance of different pea morphotypes led Gregor Mendel to uncover the laws of genetics¹³.
Several pea developmental mutations have since been characterized¹⁴ and chromosomal regions controlling agronomic traits identified¹⁵, but tools exploiting pea diversity for plant breeding, identifying favorable alleles underlying phenotypic variations and accelerating trait improvement by marker-assisted selection have been limited. The pea genome is large, probably resulting from a recent expansion and diversification of retrotransposons¹⁶. Early reassociation kinetic studies of the pea genome indicated that 75–97% is made up of a heterogeneous population of repetitive sequences^17,18. More recent investigations confirmed the occurrence of highly diverse families of high to moderately repeated sequences comprising about 76% of pea nuclear DNA¹⁹. When the repetitive DNA sequences of pea, soybean and M. truncatula are compared, little sequence similarity is found between pea and soybean¹⁹. Repetitive sequences between pea and M. truncatula were more similar but differed in abundance. The pea karyotype includes two sub-metacentric (1 and 2) and five acrocentric (3, 4, 5, 6 and 7) chromosomes¹⁶. Several major rearrangements, including translocations between nonhomologous chromosomes, have been reported^20,21,22.

Technological innovation now enables the sequencing and assembly of large genomes, bridging the gap between models and crops for quantitative trait analysis and genome-wide breeding approaches. Accordingly, an international consortium was formed to produce a reference genome sequence for pea. Here we report the draft assembly of the seven chromosomes of the inbred pea cultivar ‘Caméor’, released by the French breeding company Seminor in 1973 and characterized by its protein-rich seeds. This fully annotated assembly builds on genomic resources developed for Caméor over the last decade (Supplementary Fig. 1) and will enable genomic-assisted crop improvement. It provides insights into legume genome evolution, with resequencing data for 42 wild, landrace and cultivar Pisum genotypes, revealing genomic events that have shaped the evolution of this large and diverse genus.

Results

Genome sequencing and assembly

Complementary approaches were combined to obtain the pea reference genome assembly (Supplementary Fig. 2). Whole-genome Illumina short-read sequences (281× genome coverage; Supplementary Table 1) were assembled into contigs using SOAPdenovo2, then combined into scaffolds using long-range PacBio RSII sequences (13× genome coverage; Supplementary Table 1) and whole-genome profiling of a bacterial artificial chromosome (BAC) library²³. Scaffolds were manually curated for inter- and intrachromosomal chimeras using (1) sequences obtained from single chromosomes isolated by flow-cytometry²⁴ (Supplementary Fig. 3) and (2) an ultra-high-density skim genotyping-by-sequencing genetic map (Supplementary Dataset 1). Curated scaffolds were then integrated into 24,623 super-scaffolds (L50 of 415 kilobases (kb), Supplementary Table 2) using BioNano maps (Supplementary Tables 3 and 4). The seven pseudomolecules representing the pea chromosomes were obtained by anchoring super-scaffolds onto high-density genetic maps (Supplementary Dataset 2). Pseudomolecules were named according to the reference pea genetic map²⁵ and chromosome numbering²⁴ (Supplementary Table 5).

The pea genome v.1a assembly spans 3.92 Gb (Table 1) representing ~88% of the estimated pea genome size (~4.45 Gb), with 82.5% (3.23 Gb) of sequences assigned to the seven pseudomolecules and 14,266 unassigned scaffolds representing 685 Mb. The estimated size gap between the genome and assembly was mostly due to highly repeated sequences collapsed in the assembly, reflected by repeat proportions in unassembled reads compared to the assembly (Supplementary Fig. 4 and Supplementary Table 6). The most under-represented repeats were tandemly arranged satellite repeats and ribosomal RNA genes whose arrays were highly reduced or absent from the assembly, accounting for about 15% of the missing sequence and probably more at the centromeres and telomeres. No group of dispersed repeats was missing from the assembly, but under-representation of high copy number mobile elements accounted for most (~75%) of the difference between assembly length and estimated genome size. Recent long read sequencing technologies should in the future allow access to collapsed repeats and missing sequences.
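The proportions quoted in this paragraph follow from the reported sizes. A minimal Python sketch of that arithmetic, using only figures given in the text (the variable and function names are illustrative, not part of the published pipeline):

```python
# Sizes in gigabases, taken from the text above.
GENOME_SIZE_GB = 4.45      # estimated 1C genome size
ASSEMBLY_GB = 3.92         # total length of assembly v.1a
PSEUDOMOLECULES_GB = 3.23  # sequence assigned to the seven pseudomolecules


def percent(part: float, whole: float) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


assembled_pct = percent(ASSEMBLY_GB, GENOME_SIZE_GB)
anchored_pct = percent(PSEUDOMOLECULES_GB, ASSEMBLY_GB)
missing_gb = GENOME_SIZE_GB - ASSEMBLY_GB

print(f"assembled: {assembled_pct:.1f}% of the estimated genome")
print(f"anchored:  {anchored_pct:.1f}% of the assembly")
print(f"missing:   {missing_gb:.2f} Gb")
```

With the rounded inputs shown here the anchored fraction comes out at about 82.4%, which agrees with the reported 82.5% to within rounding of the published sizes.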

Table 1 Characteristics of the pea genome assembly v.1a

Centromere positions were indicated by regions of suppressed meiotic recombination revealed by comparing marker positions in the skim-GBS genetic map with the pseudomolecules (Fig. 1a and Supplementary Fig. 5). These were confirmed using selected sequences for FISH (Fig. 1b–f). Pea chromosomes are metapolycentric, characterized by extended primary constrictions containing multiple domains of centromeric histone cenH3²⁶. The coordinates of nonrecombining regions of the pseudomolecules agreed well with centromere positions obtained from cytogenetic measurements of the pea karyotype (Fig. 1b and Supplementary Notes). Outside centromeres, recombination rate appeared constant along chromosomes and marker order on pseudomolecules was highly (Spearman r > 0.95) collinear with high-density linkage maps of five recombinant inbred line (RIL) populations from intra-specific crosses²⁵ (Supplementary Dataset 2).
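Collinearity between a linkage map and the pseudomolecules is summarized above with a Spearman rank correlation. As a hedged illustration of what that statistic measures, the sketch below ranks genetic and physical marker positions and correlates the ranks. The marker coordinates are hypothetical, and the helper names are assumptions rather than the authors' code:

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical markers: genetic position (cM) and physical position (Mb).
genetic_cm = [0.0, 2.1, 2.1, 5.4, 9.8, 12.0]
physical_mb = [1.2, 8.5, 9.1, 40.3, 35.0, 77.6]
rho = spearman(genetic_cm, physical_mb)
print(f"Spearman rho = {rho:.3f}")
```

One locally inverted marker pair lowers rho only slightly, which is why a threshold such as r > 0.95 tolerates small ordering errors while still flagging large misplacements.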

Repeat annotation and gene prediction

Annotation (Supplementary Fig. 6) identified 2,225,175 repetitive elements clustered into 2,940 consensus sequences representing ~83% of the genome (Table 1 and Fig. 1a). Most of these corresponded to transposable elements (TE) that were further sub-classified (Supplementary Table 7). Retrotransposons (Class I), with 1,945,520 copies, were the most abundant. Long-terminal repeat (LTR) retrotransposons (1,707,747 copies) represented 72.7% of the genome, with Ty3-gypsy Ogre elements being their major lineage (Supplementary Table 7). The 246,432 transposons (Class II) represented 5.4% of the genome, 84% of which were terminal-inverted repeat (TIR) transposons (Supplementary Table 7). TE family distribution varied across the genome (Fig. 1g). For example, the abundant Ogre family was distributed throughout all chromosomes with a lower density near telomeres. In contrast, Ty1-copia Ivana and Ty3-gypsy TatV were preferentially found near telomeres and Ty3-gypsy chromovirus CRM were mainly located around centromeres.

Ab initio and homology-based methods were combined to annotate protein-coding sequences (Supplementary Notes). In total, 44,756 complete and 29 truncated genes were predicted (Table 1 and Supplementary Table 8), with an average gene length, coding sequence length and exon number of 2,784 base pairs (bp), 1,016 bp and 6.33 exons, respectively. The vast majority of gene models were supported by complementary DNA/expressed sequence tag evidence. The completeness of the gene repertoire was assessed using BUSCO v.3.0.2 (see Methods). From a core set of 1,440 single-copy ortholog genes from the Embryophyta lineage, 92.3% were complete in the assembly (67.4% as single-copy, 24.9% as duplicates), 2.7% were fragmented and 5.0% were not found, suggesting that the assembly includes most of the pea gene space. We identified 7,191 long non-coding RNAs, 824 transfer RNAs (tRNAs) and 71 microRNAs (miRNAs) expressed in developing seeds (Fig. 1a, Supplementary Notes). Fourteen of these miRNAs and their 67 putative targets were identified for the first time (Supplementary Dataset 3).

Legume genome size evolution

Genome size varies significantly among land plants²⁷. The pea genome (~4.45 Gb (ref. ⁹)) is within the upper range for the superrosid eudicots²⁷. Among 695 Leguminosae species, only 104 have a larger genome size than P. sativum²⁸. All but three of these belong to the Fabeae tribe, which includes the genera Lathyrus, Vicia, Pisum and Lens. The Fabeae thus display distinctively large genomes compared to the closely related Trifolieae (genome size ~1.05 Gb) and Cicereae (genome size ~1.27 Gb (ref. ²⁸)). The pea genome assembly was thus a good opportunity to study the drivers of genome expansion in the Fabeae.

Genome expansion in plants is primarily driven by polyploidization (whole-genome duplication events) and the proliferation of TEs. A comparison with 21 eudicot species, especially Leguminosae (Supplementary Dataset 4 and Fig. 2a,b), showed that pea has an intermediate number of gene-coding sequences (44,791; Supplementary Dataset 4), ranking fifth after Cajanus cajan (L.) Millsp., M. truncatula, Lupinus angustifolius L. and G. max (Fig. 2a), the latter two exhibiting recent paleo-polyploidization^12,29 (Fig. 2b). Notably, the pea genome contains the largest percentage of singletons (54%) as compared to other legumes (Supplementary Dataset 5), which could explain why pea was such a successful plant model in early genetics when large collections of mutants were described for contrasting phenotypes³⁰. Paralogs and orthologs were identified using Orthofinder (Supplementary Notes). The distribution of synonymous substitutions per synonymous site (Ks) for pea paralog pairs shows no evidence of a recent whole-genome duplication but reflects the ancestral Papilionoideae whole-genome duplication event (PWGD), estimated to have occurred ~55 million years ago (Ma)^10,31 and the whole-genome triplication event common to the core eudicots³². The pea genome shows the highest whole-genome mutation rate among the Leguminosae, as demonstrated by a shift in the pea PWGD-peak (mode at Ks = 1) compared to other species (for example, M. truncatula at Ks = 0.83 and G. max at Ks = 0.61; Supplementary Fig. 7 and Supplementary Table 9), consistent with pea having the highest percentage of genus-specific genes (33%; Supplementary Dataset 5). We classified paralog pairs according to their presence or absence among taxonomic lineages (Fig. 2c, Supplementary Dataset 5 and Supplementary Fig. 8).
About 75% of pea paralogs specific to Pisum or to the Trifolieae/Fabeae clade show Ks < 0.4, while most of those specific to the inverted-repeat-lacking clade (IRLC) have Ks just below ~0.4 and those specific to the Leguminosae lineages have Ks > 0.4 (Supplementary Fig. 8). In sharp contrast, for M. truncatula paralogs, the Ks distribution is higher than in pea, except for those specific to the Leguminosae lineage where Ks is close to the PWGD-peak (Supplementary Fig. 8). We used synteny as an additional criterion to select a subset of paralog pairs in pea and M. truncatula (Fig. 2d). Many of these pea paralogs appeared to be in tandem and have lower Ks (~0.2) than in M. truncatula (Ks ~ 0.5). Gene number, high whole-genome mutation rate, high proportions of recent paralogs and Pisum-specific genes are all indicative of more frequent gene gain or loss in pea, most likely associated with genome size expansion about 24.7 and 17.5 Ma, coincident with the divergence of the Fabeae from its sister tribes³³. The appearance of these paralogs at that time is intriguing and could be related to genome reorganization associated with TE expansion and/or removal³⁴.

The massive increase of Ty3-gypsy, and to a lesser extent Ty1-copia, LTR-retrotransposons accounts for most of the genome size differences between pea and M. truncatula, Trifolium pratense L., L. japonicus, P. vulgaris or G. max^10,11,35 (Supplementary Table 10). Investigation of TE representation in Pisum species and subspecies confirmed that TE dynamics has shaped Pisum diversity through successive expansions and deletions (Fig. 3a and Supplementary Dataset 6). P. fulvum has fewer of several retroelements compared to cultivated pea and an increased content of Ogre retroelements. Wild P. s. elatius TE representation is intermediate between P. fulvum and cultivated pea. To determine the historical dynamics of the different Ty3-gypsy and Ty1-copia retroelements in the pea genome, we analyzed the divergence of the reverse transcriptase (RT) and integrase (INT) sequences of different TE lineages, revealing different evolutionary patterns among lineages (Fig. 3b,c). For example, Angela elements are all relatively young, consistent with either an intense and recent burst of insertion or a strong selection against Angela elements. This is in marked contrast to TatV elements, which are the most ancient (Fig. 3b). Interestingly, all TE lineages that showed significant representational differences among Pisum species and subspecies were, on average, older or of the same age as Ogre elements (Fig. 3c).

**Fig. 3: TE evolution in the pea genome.**

Paleohistory of modern legume genomes

To assess the paleohistory of modern legume genomes³⁶, we performed homology and synteny analyses (Supplementary Notes) with representatives of the Galegoid (P. sativum, L. japonicus, M. truncatula and C. arietinum) and Millettoid (C. cajan, G. max, P. vulgaris, V. radiata and Vigna angularis (Willd.) Ohwi & H. Ohashi) clades, together with one diploid peanut relative (Arachis duranensis Krapov. & W.C. Greg). Within the Galegoid subfamily, we identified 12,025 ancestral genes (that is, conserved between the four investigated species) defining an ancestral Galegoid karyotype (AGK) of eight conserved ancestral regions (CARs). The pea genome differentiated from this AGK through at least three chromosomal fissions, four fusions and a translocation between chromosomes Ps1 and Ps5. The genome of the closely related M. truncatula evolved through two fissions, two fusions and one translocation (between Mt4-Mt8 (ref. ³⁷), Supplementary Fig. 9). The five Millettoid genomes had 12,387 ancestral genes, defining an ancestral Millettoid karyotype (AMK) of 16 CARs. We then compared AGK, AMK and A. duranensis, an outgroup of the Galegoid and Millettoid subfamilies, and identified 25 CARs with 13,181 protogenes. Merging CARs sharing partial synteny between a subset of these extant Millettoid and Galegoid genomes elucidated the ancestral legume karyotype (ALK), consisting of a minimum of 19 proto-chromosomes. We propose a legume evolutionary scenario from the reconstructed ancestral karyotypes showing that the legume genomes have been massively rearranged during their evolution (Fig. 4 and Supplementary Table 11). This approach delivered the first reconstruction of the legume (ALK) as well as Galegoid (AGK) and Millettoid (AMK) subfamily ancestors and updated the publicly available catalog of paralogous and orthologous gene relationships between extant legume genomes (https://urgi.versailles.inra.fr/synteny/legumes) for translational research on conserved agronomical traits.

**Fig. 4: Legume evolutionary history.**

Pisum genome structure evolution

‘Caméor’ shows a translocation compared to the ancestral Galegoid karyotype and while translocations within Pisum have long been known^20,21,22, identifying the chromosomes involved suffered from the lack of clear chromosome identification. Cytological analyses³⁸ identified pairwise crosses between (1) P. sativum, including northern P. humile, (2) P. elatius, including southern P. humile and (3) P. fulvum, which gave rise to chromosomal rings during F₁ meiosis and to low hybrid fertility, suggesting that chromosome translocations accompanied Pisum evolution. To reassess these events in the light of the pea genome assembly, we sequenced single-chromosome samples isolated from three accessions that were used by Ben-Ze’ev and Zohary³⁸ (Supplementary Notes). These three lines were considered archetypes of wild species and subspecies: ‘703’ for P. fulvum, ‘721’ for P. elatius and ‘711’ for southern P. humile. DNA amplified from ~40 single chromosomes obtained for each (Supplementary Fig. 10 and Supplementary Table 12) was sequenced. Mapping reads from each chromosome sample to the ‘Caméor’ pseudomolecules identified the correspondence between the wild pea and Caméor chromosomes (Fig. 5a,b and Supplementary Fig. 11). All wild pea chromosomes were assigned to ‘Caméor’ chromosomes, but for accessions ‘711’, ‘721’ and ‘703’, reads from chromosome samples corresponding to pseudomolecule 5 mapped only from 0 to 465 Mb of this pseudomolecule and chromosome samples with reads mapping from ~465 Mb to the end of ‘Caméor’ pseudomolecule 5 also mapped to another ‘Caméor’ chromosome (Fig. 5b). For accessions ‘711’ and ‘721’, these mapped predominantly to pseudomolecule 1 of ‘Caméor’, while for ‘703’ they mapped predominantly to pseudomolecule 3 of ‘Caméor’ (Fig. 5b). This indicated a translocation between chromosomes 5 and 1 in ‘711’ and ‘721’ and between chromosomes 5 and 3 in ‘703’ as compared to ‘Caméor’. 
Investigating synteny between pea and other Galegoid species suggested that the ancestral Pisum karyotype resembled the present P. elatius/humile karyotype rather than the cultivated pea karyotype. Indeed, ‘Caméor’ chromosome 5 is syntenic with M. truncatula chromosome 3 from 0 to 467 Mb and with chromosome 2 of M. truncatula from 467 Mb to its end (Fig. 4). This breakpoint in synteny is close to the translocation point but lies 2 Mb closer to the centromere of chromosome 5. Similarly, a breakpoint in synteny between ‘Caméor’ chromosome 5 and C. arietinum chromosome 5 occurred at this translocation point, with the translocated fragment being syntenic with C. arietinum chromosome 1 (Fig. 4). C. arietinum chromosome 1 and M. truncatula chromosome 2 are syntenic with ‘Caméor’ chromosome 1 and the end of ‘Caméor’ chromosome 5. Considering the AGK reconstruction (Supplementary Fig. 9), the ancestral Pisum chromosome 1 probably contained the translocated fragment (Fig. 5c), as in the P. elatius/humile karyotype. This ancestral chromosome would then have been involved in two independent rearrangements, with the end of chromosome 1 translocated to chromosome 3 in P. fulvum and to chromosome 5 in cultivated pea. What remains unresolved is what role, if any, this breakage may have played in Pisum evolution and adaptation. We note that the repetitive 5S rRNA gene sequences³⁹ are present at these chromosomal regions (ends of chromosomes 1 and 3 and pericentromeric regions of chromosome 5), suggestive of a role in these translocations.

Pisum genetic diversity

Pisum is extremely diverse in terms of phenotypes, and pea breeding could benefit from broad crosses, including introgressions from wild relatives⁴⁰. Reproductive barriers are not strict among Pisum species and subspecies⁴¹. Davis⁴² proposed that Pisum comprises two species, P. fulvum and P. sativum, with two subspecies: P. s. sativum, which includes all formerly distinguished cultivated types, and P. s. elatius, which includes all formerly distinguished wild types. Although useful, this classification does not clarify the relationships between wild and domesticated forms, or between former taxa. To help refine Pisum taxonomy and evolution, we resequenced the genomes of 36 Pisum accessions representing the range of diversity of the species and one Lathyrus sativus accession as an outgroup. We also included public data from seven Pisum accessions (Supplementary Dataset 7). Because the boundary between wild and cultivated Pisum is blurred by possible introgressions and/or migration, we reassessed the ‘wild’ and ‘cultivated’ status of accessions by scoring germination after imbibing freshly harvested seeds in water for 7 d. Free germination is indeed considered the most important pea domestication trait⁴⁰. The accessions presented a wide range of phenotypic diversity (Fig. 6a) as shown by principal component analysis (PCA) of plant morphology, phenology, seed productivity and quality traits, which separated wild, landrace and cultivar accessions (Supplementary Dataset 7).

**Fig. 6: The genetic relationships among the *Pisum* genus.**

Whole-genome resequencing reads were mapped onto the pea genome assembly and SNPs were called using BCFtools v.1.6. After filtering, 17,212,424 high-quality SNPs were identified. Of 37,591,394 alleles, 51.6% were shared among wild, landrace and cultivar accessions, 25.6% were present only in wild accessions, 3.5% only in landraces and 0.5% only in cultivars (Fig. 6b). Mean nucleotide diversity (π) decreased 1.7-fold between wild accessions (π = 8.2 × 10⁻⁴) and landraces (π = 4.9 × 10⁻⁴), and 3.4-fold between wild accessions and cultivars (π = 2.4 × 10⁻⁴), showing moderate diversity reduction associated with pea domestication and breeding (Fig. 6b and Supplementary Fig. 12). This reduction was accompanied by a high mean pairwise population differentiation (F_ST) between wild accessions and cultivars (F_ST = 0.213) and an increase in linkage disequilibrium (LD) across the genome (Supplementary Fig. 13). Mean Tajima's D values were significantly positive in wild accessions (D = 0.424) and slightly negative in cultivars (D = −0.038; Supplementary Fig. 12), consistent with recent selection. Phylogenetic analysis of a subset of two million SNPs clustered accessions according to assigned taxon (Fig. 6c): P. fulvum clustered separately from P. sativum accessions. P. sativum accessions clustered according to their cultivated status (wild or cultivated) as well as their geographical origin and usage type (that is, as fodder, dry or fresh seeds). Wild P. s. elatius included former P. elatius and P. humile, and cultivated P. s. sativum included P. transcaucasicum, P. asiaticum, P. arvense and P. hortense, but not Pisum abyssinicum. The two P. abyssinicum accessions clustered among the wild P. sativum elatius/humile accessions from Israel while presenting phenotypic attributes of cultivated accessions, including free germination (Fig. 6c). This strengthens the hypothesis of an independent domestication of this taxon from a distinct P. s. elatius⁴³ followed by a migration to Abyssinia possibly through ancient human trading routes⁴⁴. The chloroplast phylogenetic tree supports this scenario (Supplementary Fig. 14). Notably, the P. elatius accession closest to the cultivated pea was PI639984, an accession collected in 1986 on an abandoned agricultural terrace in Turkey, within the area where pea cultivation emerged.
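The fold reductions in nucleotide diversity quoted above can be checked from the reported π values, and the statistic itself is simple to state for biallelic SNPs. The sketch below is an illustration only: it uses the standard unbiased per-site estimator, which is assumed here and is not necessarily the exact implementation used in the study, and the toy allele counts are hypothetical.

```python
from typing import Iterable


def site_pi(alt_count: int, n_chrom: int) -> float:
    """Expected pairwise difference at one biallelic site.

    alt_count: copies of the alternate allele among n_chrom sampled chromosomes.
    """
    ref_count = n_chrom - alt_count
    return 2.0 * alt_count * ref_count / (n_chrom * (n_chrom - 1))


def mean_pi(alt_counts: Iterable[int], n_chrom: int, n_sites: int) -> float:
    """Average per-site diversity over n_sites (monomorphic sites contribute 0)."""
    return sum(site_pi(k, n_chrom) for k in alt_counts) / n_sites


# Mean diversity values reported in the text.
PI_WILD, PI_LANDRACE, PI_CULTIVAR = 8.2e-4, 4.9e-4, 2.4e-4
fold_landrace = PI_WILD / PI_LANDRACE
fold_cultivar = PI_WILD / PI_CULTIVAR
print(f"wild vs landraces: {fold_landrace:.1f}-fold")
print(f"wild vs cultivars: {fold_cultivar:.1f}-fold")

# Toy example (hypothetical): 3 SNPs in a 1,000 bp region, 10 chromosomes sampled.
toy = mean_pi([1, 5, 2], n_chrom=10, n_sites=1000)
print(f"toy pi = {toy:.2e}")
```

The two printed ratios reproduce the 1.7-fold and 3.4-fold reductions stated in the text.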

Seed storage protein gene families

Pea is an important source of dietary proteins for humans and domestic animals. Fractionation of pea seeds into protein, starch and fiber is expanding rapidly in North America and Europe in response to the demand for plant-based protein. Pea seed storage proteins (SSPs) include legumin, vicilin and convicilin globulins and PA1 and PA2 albumins, whose nutritional and technological properties vary according to their amino-acid content and secondary structure^45,46. We searched the pea genome assembly for SSP genes using all pea storage protein genes available in UniProt (Supplementary Notes) and found 12, 9, 2, 8 and 9 genes encoding legumin, vicilin, convicilin, PA1 and PA2, respectively, as well as a few pseudogenes (Supplementary Dataset 8).

The various SSPs that characterize the pea seed proteome vary in quantity in response to the environment⁴⁷. Their diversity is magnified by the range of (1) cleavage sites controlling pre-polypeptide cleavage (Supplementary Fig. 15) and (2) transcriptional regulatory regions. Several regulatory motifs upstream of the SSP genes are presumed to modulate their expression^48,49 (Supplementary Dataset 8) depending on developmental and environmental cues. The RY motif, reported to be required for SSP seed expression⁵⁰, was found upstream of all but three SSP genes, with some having seven upstream RY motifs. Other motifs were found upstream of legumin genes (for example, the ABRE motif) or vicilin genes (for example, the ACGT motif). Expression analysis of some SSP genes (Fig. 7a and Supplementary Dataset 8), assessed by microfluidic quantitative PCR, showed that RY motifs were not systematically associated with seed-specific expression. Examination of Legumin and Vicilin genes in pea and M. truncatula showed an overall conservation of tandem organization in these two species: clusters of SSP genes were found on syntenic pea and M. truncatula chromosomes, but gene copy number differed (Vicilin and Legumin genes on syntenic Ps3 and Mt7, Convicilin and Legumin genes on syntenic Ps6 and Mt1). Additional gene clusters were found in pea (Vicilin genes on Ps5 and Legumin genes on Ps6 and Ps4, Fig. 7b). Interestingly, all Legumin and Vicilin gene cluster positions in pea corresponded to reported SSP quantity loci⁵¹.

**Fig. 7: Pea seed storage protein gene families.**

Discussion

Pea is an important plant-based protein source for human food and animal feed. This reference genome provides a foundation to elucidate Pisum evolution. The Pisum common ancestor was probably cytogenetically like P. s. elatius; this taxon evolved across the Mediterranean and Middle East^40,52 and gave rise in the northern Middle East to P. s. sativum. P. fulvum diverged from the Pisum ancestor in the southern Middle East. P. abyssinicum, an Ethiopian cultivated form, is likely the result of a domestication event from a southern P. s. elatius ancestor and is independent of the domestication of P. s. sativum. Different lines of evidence suggest that the pea genome is evolving at a faster pace than other investigated Leguminosae genomes, potentially through transposon-mediated unequal recombination giving rise to gain or loss of genes, or ectopic double-strand break repair³⁴. Differential expansion and removal of these elements probably shaped genomes throughout the evolution of the Fabeae and notably within Pisum¹⁹, suggesting that repetitive elements were major drivers in the evolution of these large genomes. A valuable tool for basic discovery, this high-quality, annotated pea genome sequence will facilitate the characterization of its many known mutants, enhance pea improvement and allow more efficient use of the wide genetic diversity present in the genus.

Methods

Genome sequencing

To enable an optimal assembly of the large (~4.45 Gb) and complex (85% repetitive DNA) pea reference genome, more than 1,300 Gb of sequence data (equivalent to 294-fold genome coverage) were generated using DNA extracted from fresh plant material of the French pea cultivar ‘Caméor’ (Supplementary Notes). The data included 100 and 150 bp Illumina reads and two PacBio RSII read batches, one with N50 = 9,500 bp and another with N50 = 15,917 bp. The Illumina reads derived from paired-end libraries with insert sizes of 300, 500, 600 and 800 bp and ten mate-pair libraries with insert sizes between 3 and 17 kb. All sequence data have been deposited in EBI Bioproject PRJEB30482 (Illumina reads) and NCBI Bioproject PRJNA509681 (PacBio reads). Reads 150 bp long, equivalent to 30-fold genome coverage, were randomly sampled and genome size was estimated using the GenomeScope program (http://qb.cshl.edu/genomescope/). The genome size of ‘Caméor’ estimated by this method (4.426 Gb) was consistent with previous estimates obtained by flow-cytometry⁹.
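The coverage figures in this paragraph are related by simple arithmetic: fold coverage is total sequenced bases divided by genome size. The sketch below works through that relation with the numbers given in the text; the function names are illustrative assumptions.

```python
GENOME_SIZE_GB = 4.45  # estimated pea genome size, from the text


def fold_coverage(total_bases_gb: float, genome_size_gb: float = GENOME_SIZE_GB) -> float:
    """Fold coverage = total sequenced bases / genome size."""
    return total_bases_gb / genome_size_gb


def bases_for_coverage(coverage: float, genome_size_gb: float = GENOME_SIZE_GB) -> float:
    """Gigabases of sequence needed to reach a given fold coverage."""
    return coverage * genome_size_gb


# "More than 1,300 Gb" of data gives a lower bound on total coverage.
lower_bound = fold_coverage(1300)

# Illumina (281x) plus PacBio (13x) coverage reported in the Results.
combined = 281 + 13

# Size of the 30-fold subsample of 150 bp reads used for GenomeScope.
subsample_gb = bases_for_coverage(30)
reads_needed = subsample_gb * 1e9 / 150

print(f"lower bound: {lower_bound:.0f}x, combined: {combined}x")
print(f"30x subsample: {subsample_gb:.1f} Gb, about {reads_needed:.2e} reads")
```

The lower bound of roughly 292× from 1,300 Gb is consistent with the stated 294-fold coverage, given that the total was "more than" 1,300 Gb.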

De novo assembly

The pea nuclear genome was assembled into seven pseudomolecules in a step-wise manner. The assembly pipeline is summarized in Supplementary Fig. 2. Shotgun Illumina reads were assembled using SOAPdenovo2 (ref. ⁵³) with a k-mer size of 127 nt and the -R option in the ‘pregraph’ step. Contigs shorter than 500 nt were removed. The remaining contigs were scaffolded with SSPACE 2.0 (ref. ⁵⁴) using the information captured by mate-pair reads; scaffolds 2 kb or larger, and validated by at least five read pairs, were considered as part of a first draft assembly. This assembly was improved with layers of data from physical maps (Whole-Genome Profiling, WGP), optical maps (Bionano maps), various high-density linkage maps (Genetic maps) and synteny to the M. truncatula genome. The physical map was produced using 295,680 BAC clones of cv. ‘Caméor’ pooled in a multi-dimensional manner. The BAC library was provided by INRA IPS2 and is available at INRA CNRGV (https://cnrgv.toulouse.inra.fr/fr/Banques/Pois); its average BAC insert size is 125 kb and its genome coverage is 9.3×. The BAC DNA was digested with HindIII/MseI; fragments were ligated, amplified by PCR, and sequenced using the Illumina HiSeq 2000 platform (100 nt read length). The reads were clustered according to the parental BAC clones’ ID and assembled using FPC software (Keygene N.V.). The physical map was generated according to Gali et al.²³ and used to link the scaffolds in the draft assembly into super-scaffolds using MaGuS 1.0 (ref. ⁵⁵) and the WGP technology⁵⁶ (Keygene N.V.). Gaps in super-scaffolds were closed with GapCloser⁵⁷,⁵⁸ using paired-end, mate-pair and PacBio reads. Super-scaffolds were manually curated for inter- and intrachromosomal chimeras (Supplementary Notes) using (1) sequences obtained from single chromosomes isolated by flow-cytometry sorting²⁴ (Supplementary Fig. 3; Bioprojects PRJEB30482 at ENA and PRJNA507688 at NCBI) and (2) an ultra-high-density genetic map obtained from 162 RILs derived from the cross ‘Caméor’ × ‘Melrose’ (Pop6)²⁵ and genotyped by skim genotyping-by-sequencing⁵⁹ (Supplementary Dataset 1, Bioproject PRJNA507685). This map included 468,448 SNPs and represents the highest-density genetic map published for pea. Manually corrected scaffolds were integrated into 24,623 super-scaffolds (L50 of 415 kb; Supplementary Table 2) using an optical map generated from ‘Caméor’ high-molecular-weight DNA prepared from the nuclei of young leaves following the IrysPrep protocols (BioNano Genomics; Supplementary Table 3). The curated super-scaffolds were anchored onto high-density genetic maps (derived from populations Pop4, 5, 7 and 9 described by Tayeh et al.²⁵, and the Pop6 map described herein) using ALLMAPS⁶⁰ to form quasi-chromosomal pseudomolecules. The genome of the model legume M. truncatula v.4 (JCVI⁶¹) was used for scaffold orientation when the pea genetic maps gave no indication. The assembly, the pea genome v.1a, is available at https://urgi.versailles.inra.fr/Species/Pisum and at the European Nucleotide Archive (http://www.ebi.ac.uk/ena/) under project PRJEB31320. A JBrowse genome browser is available at https://urgi.versailles.inra.fr/Species/Pisum.
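Two of the simpler steps above, discarding contigs shorter than 500 nt and summarizing contiguity, can be sketched as follows. This is a minimal illustration and not the authors' pipeline; the function names are hypothetical, and it assumes the common convention in which N50 is a length and L50 is a sequence count.

```python
def drop_short(lengths, min_len=500):
    """Keep only sequences at least min_len long (contigs < 500 nt were removed)."""
    return [n for n in lengths if n >= min_len]


def n50_l50(lengths):
    """Return (N50, L50): the length at which the cumulative sum of sequences,
    sorted longest first, reaches half the total, and how many sequences that takes."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no sequences supplied")


contigs = [100, 600, 2000, 3000, 4400]   # toy lengths, not real data
kept = drop_short(contigs)
print(kept, n50_l50(kept))
```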

Genome annotation: repetitive sequences, gene models and microRNAs

The REPET package v.2.6 (refs. ⁶²,⁶³) was used to identify and annotate repetitive elements in contigs of the pea genome sequence, as summarized in Supplementary Fig. 6. A sample from each pseudomolecule, consisting of 700 Mb of the longest scaffolds, served to compare the genome to itself using the TEdenovo pipeline to detect repeats present in at least three copies; 200 Mb were aligned to themselves to identify repeats and RepeatScout was applied to screen the remaining 500 Mb for repetitive low-complexity DNA. Identified repeat sequences were clustered by multiple alignments to produce a library of consensus sequences. The repeat consensus sequences were classified according to their characteristics and redundancy using PASTEC⁶⁴ with Repbase (v.20.05). TEannot then mapped the repeat consensus sequence library produced by TEdenovo against the genome using a two-step approach⁶⁵. The first step identified consensus sequences with at least one full-copy fragment in the genome. The second step identified the copies of these elements in the genome. The annotation of transposon protein domains was refined using DANTE (RepeatExplorer server; https://repeatexplorer-elixir.cerit-sc.cz/) against a custom database⁶⁶,⁶⁷. Hits were retained if they covered at least 80% of the reference sequence with a minimum identity of 35% and a minimum similarity of 45%, allowing for a maximum of three interruptions (frameshifts or stop codons). TE classes were defined according to Wicker et al.⁶⁸ and TE lineages were defined according to Novák et al.⁶⁷. The density of TE consensus copies according to their lineages was computed along pseudomolecules and visualized using 1 Mb windows sliding in 500 kb steps (Fig. 1). Repetitive sequences were identified and quantified from unassembled Illumina reads using RepeatExplorer. The pipeline was run with default parameters, using 3,972,596 paired-end reads (100 nt) as input.
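The windowed density calculation (1 Mb windows advanced in 500 kb steps) can be illustrated with a short sketch. It assumes, for simplicity, that each TE copy is counted in a window by its 0-based start coordinate; the text does not state how copies spanning a window boundary were handled, and the function name is hypothetical.

```python
def window_density(starts, chrom_len, window=1_000_000, step=500_000):
    """Count feature start positions in overlapping windows along one sequence.

    Returns a list of (window_start, window_end, count) tuples; the last
    windows are truncated at the end of the sequence.
    """
    results = []
    pos = 0
    while pos < chrom_len:
        end = min(pos + window, chrom_len)
        n = sum(1 for s in starts if pos <= s < end)
        results.append((pos, end, n))
        pos += step
    return results


# Toy example: three TE copies on a 2 Mb pseudomolecule.
density = window_density([100_000, 600_000, 1_200_000], chrom_len=2_000_000)
for row in density:
    print(row)
```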

Gene models were predicted de novo using AUGUSTUS v.3.0.3 (ref. ⁶⁹) and Fgenesh v.7.1.1 (ref. ⁷⁰) trained on the M. truncatula gene matrix once repetitive DNA was masked using maskfasta v.5.1.22. Protein homology searches (TBLASTN) were done using sequences from: (1) C. arietinum (GA_v.1.0), G. max (275_Wm82.a2.v.1) and M. truncatula (Mt4.0 v.1), retaining hits with an E value < 1 × 10⁻⁵⁰ and more than 50% of the protein length mapped; (2) the UniProt and Swissprot databases, retaining hits with an E value < 1 × 10⁻²⁰; (3) pea DNA and RNA sequences from IPK and NCBI, retaining hits with an E value < 1 × 10⁻⁵⁰ and identity ≥98%. Retained sequences were analyzed using Exonerate v.2.2.0 (ref. ⁷¹) to generate protein-based gene models. To refine the annotation and identify splice junctions, RNA-seq reads from a series of libraries were aligned to the genome assembly using the ultrafast universal RNA-seq aligner STAR (v.2.4.0j; ref. ⁷²). Twenty RNA-seq libraries from various plant tissues of ‘Caméor’ at different plant growth stages (188,446,568 reads) are described in Alves-Carvalho et al.⁷³, and 12 highly dense libraries generated from cultivar Kaspa inoculated with isolates of the fungal complex causing Ascochyta blight and from mock-inoculated leaf tissue (160,332,071 reads) are described by Turo⁷⁴ and available in NCBI Bioproject PRJNA510273. A set of assembled transcripts was obtained from the alignments using StringTie (v.1.2.2) (ref. ⁷⁵) and Trinity-GG (v.2.0.6) (ref. ⁷⁶). Integration of all the above gene models and identification of alternative splice sites were done using the annotation pipeline PASA v.2.0.2, which includes Evidence Modeler v.1.1.1 (ref. ⁷⁷). The completeness of the gene repertoire was assessed using BUSCO v.3.0.2 (ref. ⁷⁸).
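The first homology filter above (E value below 1 × 10⁻⁵⁰ and more than 50% of the protein length mapped) can be expressed as a simple predicate. The sketch below is illustrative only: the `Hit` record and its field names are hypothetical and do not correspond to any particular BLAST output format.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str        # protein identifier (hypothetical field)
    evalue: float     # E value reported by the search
    aligned_len: int  # residues of the query protein covered by the hit
    query_len: int    # full length of the query protein


def keep_legume_hit(hit: Hit, max_evalue: float = 1e-50, min_cov: float = 0.5) -> bool:
    """Retain a hit only if it is significant and covers more than half the protein."""
    coverage = hit.aligned_len / hit.query_len
    return hit.evalue < max_evalue and coverage > min_cov


hits = [
    Hit("protA", 1e-60, 300, 400),  # significant, 75% covered: kept
    Hit("protB", 1e-60, 100, 400),  # significant, 25% covered: dropped
    Hit("protC", 1e-30, 400, 400),  # fully covered, not significant enough: dropped
]
kept_hits = [h.query for h in hits if keep_legume_hit(h)]
print(kept_hits)
```

The other two filters differ only in their thresholds, so the same predicate could be reused with different arguments.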

Putative gene functions were assigned using the best match to SwissProt and TrEMBL databases⁷⁹. Motifs and domains were searched using InterProScan v.5 (refs. ⁸⁰,⁸¹) against all default protein databases including ProDom, PRINTS, PfamA, SMART, TIGRFAM, PrositeProfiles, HAMAP, PrositePatterns, SITE, SignalP, TMHMM, Panther, Gene3d, Phobius, Coils and CDD. In addition, we used TrapID (http://bioinformatics.psb.ugent.be/webtools/trapid/) and the PLAZA v.2.5 reference database to assign each transcript to a reference gene family and to transfer functional annotation, including GO terms, to each transcript. Additionally, an embedded pipeline of EuGene v.4.2 (refs. ⁸²,⁸³) was launched using the same protein and RNA-seq databases. This annotation procedure yielded 34,137 gene models and was used to curate gene models manually.

For the identification of miRNAs, developing seeds of ‘Caméor’ were harvested at two stages (12 d and 22 d after pollination). RNA was purified and small RNA libraries were produced and sequenced according to Lelandais-Brière et al.⁸⁴. Reads were pooled, trimmed using fastx_clipper with a minimum length of 15 nt, and mapped to identify miRNAs using ShortStack (v.3.8.5). ShortStack classifies putative miRNAs according to several criteria: a Y classification indicates that the miRNA sequence passed all tests, including sequencing of the exact miRNA-star, supporting a de novo annotation of a new miRNA family; an N15 classification indicates that the miRNA sequence passed all tests except that the miRNA-star was not sequenced. Y and N15 miRNAs were aligned against miRBase v.22 mature miRNA sequences using ssearch36, and only alignments with at least 95% identity were retained. For N15 miRNAs, only those with a match to a known plant miRNA were kept. Y miRNAs without annotation were considered newly identified miRNAs. Finally, targets were predicted using TargetFinder and kept only if their score was greater than 3. Fifty-nine miRNAs showed at least one putative target (Supplementary Dataset 3b).
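The decision rules for Y and N15 calls described above can be summarized in a few lines. This is a sketch of the logic as stated in the text, not the authors' code; the function name, the string labels it returns, and the use of `None` for "no miRBase alignment" are assumptions made for illustration.

```python
def classify_mirna(call, best_identity, min_identity=0.95):
    """Return 'known', 'novel' or None (discarded) for one candidate miRNA.

    call          -- classification code of the candidate ('Y' or 'N15')
    best_identity -- identity (0..1) of the best alignment to a known mature
                     miRNA, or None if nothing aligned
    """
    matched = best_identity is not None and best_identity >= min_identity
    if call == "Y":
        # Y calls are kept either way; unmatched ones count as newly identified.
        return "known" if matched else "novel"
    if call == "N15":
        # N15 calls are kept only when they match a known miRNA.
        return "known" if matched else None
    return None


for candidate in [("Y", None), ("Y", 0.97), ("N15", 0.90), ("N15", 1.0)]:
    print(candidate, classify_mirna(*candidate))
```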

Genome structure and evolution

To identify putative paralogous and orthologous gene clusters, protein-coding gene sets from pea and 21 other eudicot species (Supplementary Dataset 4) were analyzed using OrthoFinder v.2.1.2 with its default parameters⁸⁵ and the Diamond v.0.9.14 option instead of BLAST⁸⁶ (Supplementary Notes). Before the analysis, genome assemblies and annotations were amended to exclude plastid sequences, spurious stop codons, sequences with premature stop codons and alternative transcripts, and to resolve inconsistencies in header format between FASTA and GFF3 files. In cases where there were two or more transcript variants, the longest transcript was selected to represent the coding region (input data are summarized in Supplementary Dataset 4). The sequence divergence for all possible pairs of paralogs within each orthogroup was estimated based on pairwise Ks. To identify pairs of paralogs from the Orthogroups.csv file, we created an in-memory SQLite database including the whole-genome assemblies and annotations. Protein sequences were aligned using MUSCLE v.3.8.31 (ref. ⁸⁷) and converted into codon-aligned nucleotides using the bioruby-alignment package⁸⁸. Ks values were calculated through maximum likelihood estimation (MLE) using the ‘codeml’⁸⁹ and ‘yn00’⁹⁰ programs in the PAML package⁹¹ with the following parameters: runmode = −2, seqtype = 1 (codon sequences), alpha fixed to 0, CodonFreq = 2 (F3X4). For all Ks distribution histograms, the x axes were drawn on a log scale with non-transformed Ks values to represent the decreasing relative importance of differences as the Ks value increases, which results from the stochastic nature and saturation of Ks calculations⁹². The range of values, 0.01–50, was binned into 400 intervals. To reduce the exponential effect of spurious homologs on background noise, we filtered the data based on orthogroup size. The histograms in Supplementary Fig. 7 represent paralog pairs in orthogroups no larger than a species-specific limit of 8 to 20 genes: for each species, the limit was determined based on the genome multiples for events from the eudicot divergence onwards (Supplementary Dataset 4).
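The pairing and binning steps can be sketched as follows: all pairs of paralogs are enumerated within orthogroups that do not exceed a size limit, and Ks values between 0.01 and 50 are placed into 400 bins of equal width on a log10 scale. This is a minimal illustration with hypothetical function names; it assumes values outside the range are simply ignored, which the text does not specify.

```python
import math
from itertools import combinations


def paralog_pairs(genes, max_size):
    """All unordered gene pairs in one orthogroup, or none if the group is too large."""
    if len(genes) > max_size:
        return []
    return list(combinations(sorted(genes), 2))


def log_histogram(values, lo=0.01, hi=50.0, n_bins=400):
    """Count values into n_bins bins of equal width on a log10 axis over [lo, hi)."""
    counts = [0] * n_bins
    span = math.log10(hi) - math.log10(lo)
    for v in values:
        if lo <= v < hi:
            idx = int((math.log10(v) - math.log10(lo)) / span * n_bins)
            counts[min(idx, n_bins - 1)] += 1  # guard against rounding at the top edge
    return counts


pairs = paralog_pairs(["geneB", "geneA", "geneC"], max_size=20)
counts = log_histogram([0.01, 0.1, 1.0, 100.0])  # toy Ks values; 100.0 is out of range
print(len(pairs), sum(counts))
```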

Based on both homology and synteny, we further investigated the paleohistory of legume genomes. An evolutionary scenario was obtained following the method described in Pont et al.⁹³, based on synteny relationships identified between pea (P. sativum), the peanut diploid ancestor (Arachis duranensis⁹⁴), lotus (Lotus japonicus¹¹), barrel medic (Medicago truncatula¹⁰), chickpea (Cicer arietinum⁹⁵), pigeonpea (Cajanus cajan⁹⁶), soybean (Glycine max¹²), common bean (Phaseolus vulgaris⁹⁷), mungbean (Vigna radiata⁹⁸) and adzuki bean (Vigna angularis⁹⁹). Genomes were aligned to define conserved or duplicated gene pairs based on alignment parameters. Groups of conserved genes were then clustered or chained into synteny blocks (excluding blocks with fewer than five genes) corresponding to independent sets of blocks sharing orthologous relationships in modern species. Next, conserved groups of gene-to-gene adjacencies defining identical chromosome-to-chromosome relationships between all the extant genomes were merged into contiguous ancestral regions (CARs). CARs were merged into protochromosomes based on partial synteny observed between a subset of the investigated species. The ancestral karyotype is a ‘median’ or ‘intermediate’ genome consisting of protochromosomes defining a clean reference gene order common to the extant species investigated. From the reconstructed ancestral karyotype, an evolutionary scenario was then inferred that assumes the fewest genomic rearrangements between the inferred ancestors and the modern genomes (Supplementary Notes).

Pisum diversity

Genomic resequencing data of 44 accessions were used to study the pea genome diversity (Supplementary Dataset 7). Sixteen genotypes, including Caméor, were resequenced as described in Tayeh et al.²⁵, as part of the ANR program GENOPEA (Bioproject PRJNA285605). Another 16 genotypes were chosen²⁵,⁵²,¹⁰⁰ and resequenced in the Pisdom Burgundy region PARI project (FABER M. Siol, Bioproject PRJNA431567). Nuclear DNA was extracted using the Floraclean Plant DNA isolation kit as recommended by MP Biomedicals (http://www.mpbio.com). Quality control was performed for all DNA samples with Quant-iT PicoGreen (Invitrogen), by measuring absorbance and by checking the electrophoretic profile on an agarose gel. Illumina paired-end shotgun indexed libraries were prepared from 1 µg of DNA per genotype, using the TruSeq DNA PCR-free LT Sample Preparation Kit (Illumina Inc., https://www.illumina.com/). Paired-end sequencing with 2 × 100 sequencing-by-synthesis (SBS) cycles was performed on a HiSeq 2000 with TruSeq v.3 chemistry according to the manufacturer’s instructions. Additionally, three genotypes (DSP, 90–2131, Kiflica; Bioproject PRJNA509279) were sequenced by a commercial company (NovoGene) using Illumina HiSeq paired-end 150 bp reads from DNA libraries with 350 bp inserts, and three accessions (‘703’, ‘711’, ‘721’) were resequenced at GENOSCOPE on a HiSeq 2500 using the Nextera Mate Pair Sample preparation kit from Illumina (Bioproject PRJEB30482) as described above for the genome sequencing. All resequenced pea genotypes, except Zhongwan6 for which we had no seeds, were evaluated in the glasshouse for classical growth and development traits (Supplementary Notes and Supplementary Dataset 1). Two pots per accession and six seeds per pot were sown in February 2017 in 7 l pots. In total, 59 phenotypic traits were scored on the 44 genotypes, including seed protein composition traits. Germination tests were conducted on freshly harvested seeds (five seeds per accession, three replicates) and mean germination rates were calculated.

Resequencing data for the 43 accessions of Pisum and the accession of Lathyrus sativus were mapped onto the pea genome v.1a assembly using BWA MEM¹⁰¹, keeping only uniquely mapped reads with a mapping quality of at least 30. Optical duplicates were removed with PICARD tools (http://picard.sourceforge.net/). Altogether, 95,326,251 SNPs were called using BCFtools v.1.6 (ref. ¹⁰¹) mpileup and call. All calls supported by fewer than three reads were reimputed. All markers that were homozygous or heterozygous for the alternative allele in ‘Caméor’ as compared to the reference were deleted using SnpSift¹⁰². We produced two different datasets depending on the type of analysis to be conducted. For phylogenetic analysis, 2,026,659 SNPs with fewer than five missing data and ten heterozygotes were retained using vcftools¹⁰³ and plink¹⁰⁴ (Phylogeny SNP dataset). For diversity analysis, 17,212,608 SNPs with fewer than ten missing data and ten heterozygotes were retained (Diversity SNP dataset). In this dataset, accessions L180 and Zhongwan6 were removed.
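The per-site filter on missing data and heterozygous calls can be sketched as below. The sketch assumes genotypes have already been reduced to alternate-allele counts (0, 1, 2, or `None` when missing) and reads the thresholds as "fewer than N missing calls and fewer than M heterozygous calls"; both the function name and this reading of the thresholds are assumptions, and the authors used vcftools and plink rather than custom code for this step.

```python
def passes_site_filter(dosages, max_missing, max_het):
    """Return True if a SNP has fewer than max_missing missing calls and
    fewer than max_het heterozygous calls across accessions."""
    n_missing = sum(1 for d in dosages if d is None)
    n_het = sum(1 for d in dosages if d == 1)
    return n_missing < max_missing and n_het < max_het


# Toy site across eight accessions: one missing call, one heterozygote.
site = [0, 2, 2, None, 1, 0, 2, 0]
print(passes_site_filter(site, max_missing=5, max_het=10))   # phylogeny-style thresholds
print(passes_site_filter(site, max_missing=10, max_het=10))  # diversity-style thresholds
```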

The ‘Phylogeny’ SNP dataset was used to build a phylogenetic tree of the 44 accessions using IQ-Tree v.1.6 (ref. ¹⁰⁵). TVM + R10 was selected as the best model for a maximum likelihood tree using ModelFinder¹⁰⁶. The tree was inferred with 1,000 replicates of ultrafast likelihood bootstrap¹⁰⁷ and the SH-aLRT test to obtain branch support values. The number of alleles present in the different Pisum groups was computed using the ‘Diversity’ dataset. An in-house script was used to transform SNP information into alleles coded in an allele dose 012 format. The vennCounts function of the R package limma¹⁰⁸ was used to compute the counts for Venn diagrams for each group.
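The in-house script is not shown in the text, so the following is only a sketch of what an allele dose 012 encoding typically looks like for biallelic SNPs: each diploid genotype becomes the number of alternate alleles it carries. The function names, the VCF-style genotype strings and the 'ref'/'alt' labels are assumptions made for illustration.

```python
def to_dosage(gt):
    """Convert a VCF-style diploid genotype ('0/0', '0/1', '1|1', './.') to 0, 1, 2 or None."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return None  # missing or non-diploid call
    return sum(1 for a in alleles if a != "0")


def alleles_in_group(dosages):
    """Set of alleles ('ref', 'alt') observed at one site within a group of accessions."""
    seen = set()
    for d in dosages:
        if d is None:
            continue
        if d < 2:
            seen.add("ref")
        if d > 0:
            seen.add("alt")
    return seen


calls = ["0/0", "0/1", "1/1", "./.", "1|0"]
dosages = [to_dosage(g) for g in calls]
print(dosages, sorted(alleles_in_group(dosages)))
```

Per-group allele sets of this kind are what a Venn-style comparison between groups would then count.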

Resequencing reads obtained for wild, landrace and a few cultivar accessions were mapped onto the genome using NGM with default parameters¹⁰⁹ (Supplementary Notes). Counts were computed using featureCounts¹¹⁰ on specific associated lineage domains. The reads mapping onto TE domains were counted and normalized by dividing the number of counts on a specific domain by the total number of counts on all TE domains and by the total number of occurrences of each domain in the pea genome v.1a assembly, per million.
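The normalization described above can be written out explicitly. In this sketch the count for each domain is divided by the total count over all TE domains and by the number of occurrences of that domain in the assembly, then multiplied by one million; the scaling direction of "per million" is an assumption, as are the function name and the toy domain labels.

```python
def normalize_te_counts(counts, occurrences, scale=1_000_000):
    """Normalize read counts per TE domain.

    counts      -- reads mapped to each domain in one accession
    occurrences -- number of copies of each domain in the reference assembly
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads mapped to any TE domain")
    return {
        domain: n / total / occurrences[domain] * scale
        for domain, n in counts.items()
    }


# Toy numbers, not real data.
norm = normalize_te_counts({"RT": 300, "INT": 100}, {"RT": 150, "INT": 50})
print(norm)
```

In the toy example both domains receive the same normalized value, because the domain with three times the reads also has three times the copies in the assembly.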

Statistical tests were performed as follows. The variation of TE representation among the different Pisum species and subspecies was tested using proc GLM (SAS Institute). Different models were tested by analysis of variance (ANOVA): Model 1 tested the different TE representation between P. fulvum, P. sativum wild and P. s. sativum groups; Model 2, between P. fulvum, P. sativum wild, P. sativum landraces and P. sativum cultivars; and Model 3, between P. fulvum, P. sativum wild, P. abyssinicum, P. sativum landraces and P. sativum cultivars. Counts were normalized by dividing the number of counts on a specific domain by the total number of counts on all TE domains and by the total number of occurrences of each domain in the pea genome v.1a assembly, per million. For Model 2, least-squares mean predicted values of the normalized mapped read counts and their standard deviations were computed, and two-tailed t-tests were performed for eight selected TE lineages.

Translocation analyses

To identify chromosome translocations, we sequenced single chromosomes isolated by flow sorting from the three accessions P. fulvum ‘703’, P. sativum elatius ‘721’ and P. sativum southern humile ‘711’, characterized by Ben-Ze’ev and Zohary³⁸, and compared the sequences with the sequence assembly of P. sativum cv. Caméor. Preparation of suspensions of intact mitotic chromosomes, flow cytometric analysis and sorting were done according to Neumann et al.²⁴. For each genotype, 84 chromosomes were flow-sorted and single-chromosome DNA amplification was done (Supplementary Notes). Of these, a total of 137 DNA samples were selected and sequenced (Supplementary Notes). To identify the pseudomolecule to which each sample corresponded, we mapped the chromosome sequence data onto the genome assembly of P. sativum cv. Caméor.
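One simple way to assign a single-chromosome sample to a pseudomolecule is to take the pseudomolecule receiving the largest share of its mapped reads. The sketch below illustrates that idea only; the text does not describe the assignment rule actually used, and the function name, the `min_fraction` threshold and the pseudomolecule labels are hypothetical.

```python
from collections import Counter


def assign_pseudomolecule(read_targets, min_fraction=0.5):
    """Return the pseudomolecule hit by the largest share of reads, or None
    if there are no reads or no pseudomolecule reaches min_fraction."""
    tally = Counter(read_targets)
    if not tally:
        return None
    name, n = tally.most_common(1)[0]
    return name if n / sum(tally.values()) >= min_fraction else None


# Toy sample: eight reads map to chr5, two to chr1.
sample = ["chr5"] * 8 + ["chr1"] * 2
print(assign_pseudomolecule(sample))
```

Under this kind of rule, a sample whose reads split between two pseudomolecules of the reference would fall below the threshold and return no single assignment, which is the kind of sample worth inspecting further.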

Seed storage proteins annotation

A list of storage protein sequences was compiled by combining sequences retrieved from the pea gene atlas, UniProt and NCBI, and these sequences were used to search for homologs in the pea genome assembly (Supplementary Dataset 4). Candidate sequences were manually curated using protein alignments, RNA-seq data and EuGene gene models. Known regulatory motifs were searched in the 5′ region of the identified gene models (Supplementary Dataset 4). Best homology matches were searched for in UniProt, GenBank and the M. truncatula genome v.4. To assess seed storage protein gene expression, total RNA from seeds was extracted using an RNeasy plant mini kit (Qiagen, www.qiagen.com) after grinding plant tissue in liquid nitrogen using a pestle and mortar. cDNAs were prepared according to Gallardo et al.¹¹¹. Other cDNAs were produced as described in Alves-Carvalho et al.⁷³. High-throughput real-time quantitative PCR was performed using the Biomark microfluidic system from Fluidigm according to the manufacturer’s protocol. Primers used are listed in Supplementary Dataset 4. Expression was normalized as in Alves-Carvalho et al.⁷³.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

All raw sequencing data are available at the European Nucleotide Archive (PRJEB30482) and as an NCBI BioProject (PRJNA507685, PRJNA507688, PRJNA509681, PRJNA510273, PRJNA285605, PRJNA431567, PRJNA509279). The pea genome v.1a reference assembly is available for download and JBrowse at https://urgi.versailles.inra.fr/Species/Pisum. The genome is also available at the European Nucleotide Archive under project PRJEB31320.

References

Burstin, J., Gallardo, K., Mir, R. R., Varshney, R. K. & Duc, G. Improving protein content and nutrition quality, in Biology and Breeding of Food Legumes (eds Pratap, A. & Kumar, J.) 314–328 (CAB International, 2011).
Guillon, F. & Champ, M. M.-J. Carbohydrate fractions of legumes: uses in human nutrition and potential for health. Br. J. Nutr. 88, S293–S306 (2002).
Dahl, W. J., Foster, L. M. & Tyler, R. T. Review of the health benefits of peas (Pisum sativum L.). Br. J. Nutr. 108, S3–S10 (2012).
Foschia, M., Horstmann, S. W., Arendt, E. K. & Zannini, E. Legumes as functional ingredients in gluten-free bakery and pasta products. Ann. Rev. Food Sci. Technol. 8, 75–96 (2017).
Nemecek, T. et al. Environmental impacts of introducing grain legumes into European crop rotations. Eur. J. Agron. 28, 380–393 (2008).
Crews, T. E. & Peoples, M. B. Legume versus fertilizer sources of nitrogen: ecological tradeoffs and human needs. Agric. Ecosyst. Environ. 102, 279–297 (2004).
Poore, J. & Nemecek, T. Reducing food’s environmental impacts through producers and consumers. Science 360, 987–992 (2018).
Zohary, D. & Hopf, M. Domestication of Plants in the Old World (Oxford Univ. Press, Oxford, 2000).
Doležel, J. et al. Plant genome size estimation by flow cytometry: inter-laboratory comparison. Ann. Bot. 82, 17–26 (1998).
Young, N. D. et al. The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature 480, 520–524 (2011).
Sato, S. et al. Genome structure of the legume, Lotus japonicus. DNA Res. 15, 227–239 (2008).
Schmutz, J. et al. Genome sequence of the palaeopolyploid soybean. Nature 463, 178–183 (2010).
Mendel, G. Versuche über Pflanzenhybriden. Verhandlungen des naturforschenden Vereines in Brünn, Bd. IV für das Jahr 1865. Abhandlungen, 3–47 (1866).
Ellis, T. H. N., Hofer, J. M. I., Timmerman-Vaughan, G. M., Coyne, C. J. & Hellens, R. P. Mendel, 150 years on. Trends Plant Sci. 16, 590–596 (2011).
Tayeh, N. et al. Genomic tools in pea breeding programs: status and perspectives. Front. Plant Sci. 6, 1037 (2015).
Ellis, T. H. N. & Poyser, S. J. An integrated and comparative view of pea genetic and cytogenetic maps. New Phytol. 153, 17–25 (2002).
Flavell, R. B., Bennett, M. D., Smith, J. B. & Smith, D. B. Genome size and the proportion of repeated nucleotide sequence DNA in plants. Biochem. Genet. 12, 257–269 (1974).
Murray, M. G., Peters, D. L. & Thompson, W. F. Ancient repeated sequences in the pea and mung bean genomes and implications for genome evolution. J. Mol. Evol. 17, 31–42 (1981).
Macas, J. et al. In depth characterization of repetitive DNA in 23 plant genomes reveals sources of genome size variation in the legume tribe Fabeae. PLoS One 10, e0143424 (2015).
Hammarlund, C. & Håkansson, A. Parallelism of chromosome ring formation, sterility and linkage in Pisum. Hereditas 14, 97–98 (1930).
Sansome, E. Segmental interchange lines in Pisum sativum. Nature 139, 113 (1937).
Lamm, R. & Miravalle, R. J. A translocation tester set in Pisum. Hereditas 45, 417–440 (1959).
Gali, K. K. et al. Development of a sequence-based reference physical map of pea (Pisum sativum L.). Front. Plant Sci. 10, 323 (2019).
Neumann, P., Pozárková, D., Vrána, J., Doležel, J. & Macas, J. Chromosome sorting and PCR-based physical mapping in pea (Pisum sativum L.). Chromosome Res. 10, 63–71 (2002).
Tayeh, N. et al. Development of two major resources for pea genomics: the GenoPea 13.2K SNP Array and a high density, high resolution consensus genetic map. Plant J. 84, 1257–1273 (2015).
Neumann, P. et al. Stretching the rules: monocentric chromosomes with multiple centromere domains. PLoS Genet. 8, e1002777 (2012).
Pellicer, J., Hidalgo, O., Dodsworth, S. & Leitch, I. Genome size diversity and its impact on the evolution of land plants. Genes 9, 88 (2018).
Bennett, M. C. & Leitch, I. J. Plant DNA C-values Database release 6.0 (FAIRsharing.org, 2012); https://doi.org/10.25504/FAIRsharing.7qexb2
Hane, J. K. et al. A comprehensive draft genome sequence for lupin (Lupinus angustifolius), an emerging health food: insights into plant–microbe interactions and legume evolution. Plant Biotechnol. J. 15, 318–330 (2017).
Blixt, S. Mutation genetics in Pisum. Agric. Hort. Genet. 30, 1–293 (1972).
Cannon, S. et al. Multiple polyploidy events in the early radiation of nodulating and nonnodulating legumes. Mol. Biol. Evol. 32, 193–210 (2015).
Bowers, J. E., Chapman, B. A., Rong, J. & Paterson, A. H. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422, 433–438 (2003).
Lavin, M., Herendeen, P. S. & Wojciechowski, M. F. Evolutionary rates analysis of Leguminosae implicates a rapid diversification of lineages during the tertiary. Syst. Biol. 54, 575–594 (2005).
Li, S. F. et al. Chromosome evolution in connection with repetitive sequences and epigenetics in plants. Genes 8, 290 (2017).
De Vega, J. J. et al. Red clover (Trifolium pratense L.) draft genome provides a platform for trait improvement. Sci. Rep. 5, 17394 (2015).
Lee, C., Yu, D., Choi, H. K. & Kim, R. W. Reconstruction of a composite comparative map composed of ten legume genomes. Genes Genom. 39, 111–119 (2017).
Kamphuis, L. G. et al. The Medicago truncatula reference accession A17 has an aberrant chromosomal configuration. New Phytol. 174, 299–303 (2007).
Ben-Ze'ev, N. & Zohary, D. Species relationships in the genus Pisum L. Isr. J. Bot. 22, 73–91 (1973).
Neumann, P., Nouzová, M. & Macas, J. Molecular and cytogenetic analysis of repetitive DNA in pea (Pisum sativum L.). Genome 44, 716–728 (2001).
Ladizinsky, G. & Abbo, S. (eds.) The Pisum genus. in The Search for Wild Relatives of Cool Season Legumes 55–68 (Springer, 2015).
Kosterin, O. E. & Bogdanova, V. S. Reciprocal compatibility within the genus Pisum L. as studied in F₁ hybrids: 1. Crosses involving P. sativum L. subsp. sativum. Genet. Resour. Crop Evol. 62, 691–709 (2015).
Davis, P. H. in Flora of Turkey and the East Aegean Islands Vol. 3 (ed P. H. Davis) 370–373 (Edinburgh Univ., 1970).
Weeden, N. F. Domestication of pea (Pisum sativum L.): the case of the Abyssinicum pea. Front. Plant Sci. 9, 515 (2018).
Pagani, L. et al. Ethiopian genetic diversity reveals linguistic stratification and complex influences on the Ethiopian gene pool. Am. J. Hum. Genet. 91, 83–96 (2012).
Gabriel, I. et al. Variation in seed protein digestion of different pea (Pisum sativum L.) genotypes by cecectomized broiler chickens: 1. Endogenous amino acid losses, true digestibility and in vitro hydrolysis of proteins. Livest. Sci. 113, 251–261 (2008).
Rubio, L. A. et al. Characterization of pea (Pisum sativum) seed protein fractions. J. Sci. Food Agric. 94, 280–287 (2014).
Bourgeois, M. et al. Dissecting the proteome of pea mature seeds reveals the phenotypic plasticity of seed protein composition. Proteomics 9, 254–271 (2009).
Casey, R. & Domoney, C. in Seed Proteins (eds Shewry, P. R. & Casey, R.) 171–208 (Kluwer Academic Publishers, 1999).
Yoshino, M., Nagamatsu, A., Tsutsumi, K. I. & Kanazawa, A. The regulatory function of the upstream sequence of the β-conglycinin α subunit gene in seed-specific transcription is associated with the presence of the RY sequence. Genes Genet. Syst. 81, 135–141 (2006).
Yamamoto, S., Nishihara, M., Morikawa, H., Yamauchi, D. & Minamikawa, T. Promoter analysis of seed storage protein genes from Canavalia gladiata DC. Plant Mol. Biol. 27, 729–741 (1995).
Bourgeois, M. et al. A PQL (protein quantity loci) analysis of mature pea seed proteins identifies loci determining seed protein composition. Proteomics 11, 1581–1594 (2011).
Smýkal, P. et al. Genomic diversity and macroecology of the crop wild relatives of domesticated pea. Sci. Rep. 7, 17384 (2017).
Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012).
Boetzer, M., Henkel, C. V., Jansen, H. J., Butler, D. & Pirovano, W. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27, 578–579 (2011).
Madoui, M.-A. et al. MaGuS: a tool for quality assessment and scaffolding of genome assemblies with Whole Genome Profiling™ data. BMC Bioinformatics 17, 115 (2016).
van Oeveren, J. et al. Sequence-based physical mapping of complex genomes by whole genome profiling. Genome Res. 21, 618–625 (2011).
Li, R. et al. The sequence and de novo assembly of the giant panda genome. Nature 463, 311–317 (2010).
Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272 (2010).
Bayer, P. E. et al. High-resolution skim genotyping by sequencing reveals the distribution of crossovers and gene conversions in Cicer arietinum and Brassica napus. Theor. Appl. Genet. 128, 1039–1047 (2015).
Tang, H. et al. ALLMAPS: robust scaffold ordering based on multiple maps. Genome Biol. 16, 3 (2015).
Tang, H. et al. An improved genome release (version Mt4.0) for the model legume Medicago truncatula. BMC Genomics 15, 312 (2014).
Flutre, T., Duprat, E., Feuillet, C. & Quesneville, H. Considering transposable element diversification in de novo annotation approaches. PloS One 6, e16526 (2011).
Quesneville, H. et al. Combined evidence annotation of transposable elements in genome sequences. PLoS Comput. Biol. 1, e22 (2005).
Hoede, C. et al. PASTEC: an automatic transposable element classification tool. PLoS ONE 9, e91929 (2014).
Jamilloux, V., Daron, J., Choulet, F. & Quesneville, H. De novo annotation of transposable elements: tackling the fat genome issue. Proc. IEEE 105, 474–481 (2017).
Novák, P., Neumann, P. & Macas, J. Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data. BMC Bioinformatics 11, 378 (2010).
Novák, P., Neumann, P., Pech, J., Steinhaisl, J. & Macas, J. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics 29, 792–793 (2013).
Wicker, T. et al. A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet. 8, 973–982 (2007).
Keller, O. et al. A novel hybrid gene prediction method employing protein multiple sequence alignments. Bioinformatics 27, 757–763 (2011).
Solovyev, V. et al. Automatic annotation of eukaryotic genes, pseudogenes and promoters. Genome Biol. 7, S10 (2006).
Slater, G. S. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31 (2005).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Alves-Carvalho, S. et al. Full-length de novo assembly of RNA-seq data in pea (Pisum sativum L.) provides a gene expression atlas and gives insights into root nodulation in this species. Plant J. 84, 1–19 (2015).
Turo, C. J. Genomic Analysis of Fungal Species Causing Ascochyta Blight in Field Pea. PhD thesis, Curtin Univ. (2016).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotech. 33, 290 (2015).
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 9, R7 (2008).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
The UniProt Consortium. Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Res. 39, D214–D219 (2011).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
Cock, P. J. A., Grüning, B. A., Paszkiewicz, K. & Pritchard, L. Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology. PeerJ 1, e167 (2013).
Foissac, S. et al. Genome annotation in plants and fungi: EuGene as a model platform. Curr. Bioinf. 3, 87–97 (2008).
Badouin, H. et al. The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution. Nature 546, 148–152 (2017).
Lelandais-Brière, C. et al. Genome-wide Medicago truncatula small RNA analysis revealed novel microRNAs and isoforms differentially regulated in roots and nodules. Plant Cell 21, 2780–2796 (2009).
Emms, D. M. & Kelly, S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, 157 (2015).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2014).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Bonnal, R. J. P. et al. Biogem: an effective tool-based approach for scaling up open source software development in bioinformatics. Bioinformatics 28, 1035–1037 (2012).
Goldman, N. & Yang, Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11, 725–736 (1994).
Yang, Z. & Nielsen, R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17, 32–43 (2000).
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Vanneste, K., Van de Peer, Y. & Maere, S. Inference of genome duplications from age distributions revisited. Mol. Biol. Evol. 30, 177–190 (2013).
Pont, C. et al. Paleogenomics: reconstruction of plant evolutionary trajectories from modern and ancient DNA. Genome Biol. 20, 29 (2019).
Bertioli, D. J. et al. The genome sequences of Arachis duranensis and Arachis ipaensis, the diploid ancestors of cultivated peanut. Nat. Genet. 47, 438–446 (2015).
Varshney, R. K. et al. Draft genome sequence of chickpea (Cicer arietinum) provides a resource for trait improvement. Nat. Biotechnol. 31, 240–246 (2013).
Singh, N. K. et al. The first draft of the pigeonpea genome sequence. J. Plant Biochem. Biotechnol. 21, 98–112 (2012).
Schmutz, J. et al. A reference genome for common bean and genome-wide analysis of dual domestications. Nat. Genet. 46, 707–713 (2014).
Kang, Y. J. et al. Genome sequence of mungbean and insights into evolution within Vigna species. Nat. Commun. 5, 5443 (2014).
Kang, Y. J. et al. Draft genome sequence of adzuki bean Vigna angularis. Sci. Rep. 5, 8069 (2015).
Siol, M. et al. Patterns of genetic structure and linkage disequilibrium in a large collection of pea germplasm. G3: Genes, Genomes, Genet. 7, 2461–2471 (2017).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Cingolani, P. et al. Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift. Front. Genet. 3, 35 (2012).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Purcell, S. et al. PLINK: A Tool Set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Nguyen, L. T. et al. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2014).
Kalyaanamoorthy, S. et al. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).
Hoang, D. T. et al. UFBoot2: improving the ultrafast bootstrap approximation. Mol. Biol. Evol. 35, 518–522 (2017).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
Sedlazeck, F. J., Rescheneder, P. & Von Haeseler, A. NextGenMap: fast and accurate read mapping in highly polymorphic genomes. Bioinformatics 29, 2790–2791 (2013).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2013).
Gallardo, K. et al. A combined proteome and transcriptome analysis of developing Medicago truncatula seeds: evidence for metabolic specialization of maternal and filial tissues. Mol. Cell. Proteomics 6, 2165–2179 (2007).

Acknowledgements

We thank F. Jacquin, M. Chabert-Martinello, C. Rond-Coissieux, M. Touratier, M. Naudet-Huart and F. Naudé for their expert assistance in preparing plant and DNA materials and in phenotyping accessions. We are thankful to V. Jamilloux (REPET) and J. Gouzy (EuGene) for their support, to Z. Dubská, M. Karafiátová and J. Weiserová for assistance in flow-cytometry chromosome sorting, to V. Vernoud for providing seed tissues for transcriptomics, to E. Bonin for the high-throughput q-PCR assays, to E. Marquand, A. Chauveau and D. Brunel for generating and managing the resequencing data for 32 accessions, to CEA-IG/CNG for giving the EPGV group access to its DNA quality control service and its Illumina sequencers, to E. van der Vossen (Keygene) for the development of the physical map, to H. Bergès for providing the BAC library, to D. Pouchnik and M. Wildung for PacBio library preparation and sequencing, to C. Cruaud for miRNA sequencing, to M. Siol and P. Smykal for the choice of resequenced accessions, to M. Siol for initiating the PARI Pisdom project, to N. Hostáková for her assistance during initial phases of repeat analysis, to B. Noel for data submission, to Raphael Flores for making the legume synteny data publicly available, to R. Thompson, N. Tayeh and K. Avia for discussions and reviewing the manuscript, and to V. Malécot for helpful discussions on taxonomy. This project was supported by ANR France-Génomique (no. ANR-10-INBS-09; Illumina genome sequencing, chromosome resequencing, assembly), ANR Genopea (no. ANR-09-GENM-026; resequencing), Région Bourgogne Franche-Comté (Projet PARI Pisdom; resequencing), European FP7 project ‘Legumes for the Agriculture of Tomorrow’ (no. 613551; genetic mapping), Czech Science Foundation (no. 17-09750S; centromere and repeat analysis), Czech Ministry of Education, Youth and Sports ERDF project ‘Plants as a tool for sustainable global development’ (no. CZ.02.1.01/0.0/0.0/16_019/0000827; chromosome sorting and optical maps), AVRIL (France; annotation and mapping), Saskatchewan Pulse Growers (Canada; WGP), USA Dry Pea & Lentil Council, Northern Pulse Growers (USA; PacBio sequencing), the Australian Grains Research and Development Corporation (no. GRDC CUR00021; skim-GBS and chromosome sequencing), Australian Research Council (nos. LP160100030 and LP140100537 to D.E. and J. Batley) and the AUS-Aid Australian Awards for Africa Scholarship (to C.J.T., supervised by J.L.). J.L., R.A.S., C.J.T., D.E., P.E.B. and H.T.L. were supported by resources provided by the Pawsey Supercomputing Centre with funding from the Australian Government and the Government of Western Australia. P.E.B. acknowledges the support of the Forrest Research Foundation.

Author information

HueyTyng Lee
Present address: Department of Plant Breeding, IFZ Research Centre for Biosystems, Land Use and Nutrition, Justus Liebig University, Giessen, Germany
These authors contributed equally: Jonathan Kreplak, Mohammed-Amin Madoui.

Authors and Affiliations

Agroécologie, AgroSup Dijon, INRA, Université Bourgogne, Université Bourgogne Franche-Comté, Dijon, France
Jonathan Kreplak, Grégoire Aubert, Anthony Klein, Cyril Fournier, Ayité Kougbeadjo, Morgane Térézol, Karine Gallardo & Judith Burstin
Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Université Evry, Université Paris-Saclay, Evry, France
Mohammed-Amin Madoui & Patrick Wincker
Institute of Experimental Botany, Centre of the Region Haná for Biotechnological and Agricultural Research, Olomouc, Czech Republic
Petr Cápal, Helena Toegelová, Zbyněk Milec, Jan Vrána & Jaroslav Doležel
Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
Petr Novák, Iva Vrbová, Pavel Neumann & Jiří Macas
Genoscope, Institut François Jacob, CEA, Université Paris-Saclay, Evry, France
Karine Labadie, Leo d’Agata, Caroline Belser, Wahiba Berrabah & Jean-Marc Aury
School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, Western Australia, Australia
Philipp E. Bayer, HueyTyng Lee, Jacqueline Batley & David Edwards
Crop Development Centre/Department of Plant Sciences, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
Krishna K. Gali, Bunyamin Tar’an & Thomas D. Warkentin
Centre for Crop and Disease Management, Curtin University, Bentley, Western Australia, Australia
Robert A. Syme
Department of Horticulture, Washington State University, Pullman, WA, USA
Dorrie Main
Etude du Polymorphisme des Génomes Végétaux, INRA, Université Paris-Saclay, Evry, France
Aurélie Bérard & Marie-Christine Le Paslier
UMR 1095 Génétique, Diversité, Ecophysiologie des Céréales, INRA, Université Clermont Auvergne, Clermont-Ferrand, France
Cécile Huneau & Jérome Salse
Centre for Crop and Disease Management, School of Molecular and Life Science, Curtin University, Bentley, Western Australia, Australia
Chala J. Turo
URGI, INRA, Université Paris-Saclay, Versailles, France
Nacer Mohellibi
GQE–Le Moulon, INRA, University of Paris-Sud, CNRS, AgroParisTech, Université Paris-Saclay, Gif-sur-Yvette, France
Matthieu Falque
USDA Agricultural Research Service, Pullman, WA, USA
Rebecca McGee & Clarice J. Coyne
Institute of Plant Sciences Paris-Saclay, INRA, CNRS, University of Paris-Sud, University of Evry, University Paris-Diderot, Sorbonne Paris-Cite, University of Paris-Saclay, Orsay, France
Abdelhafid Bendahmane
School of Biological Sciences, University of Auckland, Auckland, New Zealand
Noel Ellis
School of Agriculture and Environment, University of Western Australia, Perth, Western Australia, Australia
Judith Lichtenzveig

Contributions

J.K., G.A., M.A.M., J.D., J.M., D.E., T.D.W., C.J.C., J.L. and J. Burstin formed part of the International Pea Genome Sequencing Consortium steering committee and established the experimental outline. G.A. supervised DNA material production, A. Klein supervised plant material production and phenotyping, and C.J.C. and J. Burstin supervised the germplasm choice. P.W. and K.L. generated Illumina genomic sequence data and single-chromosome sequencing data. D.M., R.M. and C.J.C. generated PacBio genomic sequence data. C.J.T. and J.L. conducted transcriptome assays and generated RNA-seq data. M.A.M., P.W., L.d’A. and J.M.A. conducted the genome assembly. J.K., A. Kougbeadjo, G.A. and J. Burstin curated and improved the genome assembly. J.K. and C.F. produced the genome annotation. A. Bendahmane supervised the generation of the BAC library. P.C., J.V. and J.D. performed flow cytometric sorting of single chromosomes and amplified their DNA for sequencing. H.T., Z.M., C.B. and J.D. made the associated optical maps. G.A., K.G. and J. Burstin performed the seed storage protein gene analyses. P.E.B., H.T.L., J. Batley and D.E. generated the skim-GBS map and ‘Caméor’ single-chromosome sequencing. A. Bérard, M.C.L.P., K.L., C.J.C., D.M. and R.M. generated resequencing data. K.K.G., B.T. and T.D.W. procured the whole-genome profiling data. R.A.S., J.K., A. Kougbeadjo, J.L. and J. Burstin contributed to the whole-genome evolution studies. C.H. and J.S. reconstructed the legume paleo-genome. C.B. and W.B. developed the whole-genome optical maps. M.T., G.A. and K.G. performed the miRNA analyses. J.K., P. Novák, I.V., P. Neumann, J. Burstin and J.M. analyzed repetitive DNA. I.V., P. Neumann and J.M. performed FISH assays. N.E. contributed to data analysis. N.M., M.T. and J.K. set up the JBrowse platform. M.F. built the genetic maps. J.K., J.M., C.J.C., J.S., J.L. and J. Burstin wrote the paper. G.A., K.G., P.W., J.D., C.B., D.E., B.T., T.D.W., D.M., R.M. and N.E. edited the paper. All authors have read and approved the manuscript.

Corresponding author

Correspondence to Judith Burstin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–15, Tables 1–12 and Note

Reporting Summary

Supplementary Datasets 1, 3–8

Supplementary Dataset 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kreplak, J., Madoui, MA., Cápal, P. et al. A reference genome for pea provides insight into legume genome evolution. Nat Genet 51, 1411–1422 (2019). https://doi.org/10.1038/s41588-019-0480-1

Received: 28 December 2018
Accepted: 10 July 2019
Published: 02 September 2019
Issue Date: September 2019
DOI: https://doi.org/10.1038/s41588-019-0480-1
