Genus Brassica in Brassicaceae

Plants of the genus Brassica are grouped into the tribe Brassiceae, which belongs to the plant family Brassicaceae. Brassicaceae comprises a large family of plants that exhibit common and distinct features in their flowers. The flowers have cruciform petals and six stamens, two of which are short outer stamens. In total, Brassicaceae is composed of 3709 species and 338 genera,1 with 308 of the 338 genera further assigned to 44 tribes.2 Among the abundant Brassicaceae species, the genus Brassica is important because it contains many economically valuable crops that are used as oilseeds, condiments, and culinary vegetables. Brassica species share an additional common feature in that they all experienced an extra whole genome triplication (WGT) event, which occurred approximately 9–15 million years ago3, 4 or even approximately 28 million years ago.5–7

The U’s triangle model describes the relationship among Brassica crops

Six species of the genus Brassica are used widely throughout the world as oilseed, condiments, fodder or vegetable crops. Three of these species are diploid (Brassica rapa, n=10; B. nigra, n=8; and B. oleracea, n=9), whereas the other three are allotetraploids (B. juncea, n=18; B. napus, n=19; B. carinata, n=17) derived from each pair of the three diploid species. The genetic relationships of these species were identified and confirmed by extensive experimental crosses between tetraploid and/or diploid plants as well as karyotyping or microscopic inspection at the synapsis stage of meiosis in these crosses.8 For example, when crosses were performed between B. napus and B. rapa, F1 plants with a chromosome number of 2n=29 were generated. When there were two sets of chromosomes from the B. rapa genome, only nine normal pairs of synaptic chromosomes at the synapsis stage in meiotic cells of the F1 plants were observed by microscopic inspection.9 The other 10 chromosomes of B. napus that can never form normal pairs of synaptic chromosomes in the F1 were affected by the chromosomes of the A genome, B. rapa. This supports the theory that these 10 chromosomes of B. napus are homologous to the 10 chromosomes in B. rapa.9 Similar experiments using crosses of B. napus and B. oleracea showed that the other nine chromosomes of B. napus are homologous with the nine chromosomes of B. oleracea. Taken together, these results lead to the conclusion that B. napus is a tetraploid of B. rapa and B. oleracea.8, 9 Based on this experimental evidence, the relationships of the six species were summarized in the U’s triangle model,8 in which the three diploid species B. rapa, B. nigra and B. oleracea are considered to be the basic genomes A, B and C, respectively, and are placed at the three vertices of the triangle. The three allotetraploids, B. juncea, B. napus and B. carinata, which are hybrids of AB, AC and BC, respectively, are placed in the middle of the three edges of the triangle.
U’s triangle has been successfully applied to aid in understanding the relationships among Brassica crops and has fostered the genetic study of these species.
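
The arithmetic behind U’s triangle can be checked in a few lines. The following is a minimal sketch that uses only the haploid chromosome numbers quoted above; the dictionary layout and the function names `expected_n` and `check_triangle` are illustrative assumptions, not part of any published tool.

```python
from itertools import combinations

# Haploid chromosome numbers (n) of the three diploid species, as given in the text.
DIPLOIDS = {"A": ("B. rapa", 10), "B": ("B. nigra", 8), "C": ("B. oleracea", 9)}

# Allotetraploids named in the text, keyed by genome composition, with reported n.
ALLOTETRAPLOIDS = {
    "AB": ("B. juncea", 18),
    "AC": ("B. napus", 19),
    "BC": ("B. carinata", 17),
}


def expected_n(genome_pair: str) -> int:
    """Sum the haploid numbers of the two parental genomes (hypothetical helper)."""
    return sum(DIPLOIDS[genome][1] for genome in genome_pair)


def check_triangle() -> dict:
    """Return {genome_pair: (species, expected n, reported n)} for each triangle edge."""
    result = {}
    for first, second in combinations(sorted(DIPLOIDS), 2):
        pair = first + second
        species, reported = ALLOTETRAPLOIDS[pair]
        result[pair] = (species, expected_n(pair), reported)
    return result


if __name__ == "__main__":
    for pair, (species, expected, reported) in check_triangle().items():
        print(f"{pair}: {species} expected n={expected}, reported n={reported}")
```

Each edge of the triangle is consistent when the expected and reported values agree, which is the case for all three allotetraploids listed above.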

The rich diversity of Brassica plants

Brassica plants have rich diversity with respect to both speciation and the abundant morphotypes in each Brassica species. Brassica crops described by U’s triangle are close relatives, and many traits are shared but developed independently and in parallel, such as heading leaves and enlarged roots (Figure 1). One of the important vegetables of B. rapa is Chinese cabbage, which has the distinct feature of a leafy head; this feature is also observed in B. oleracea and B. juncea. Turnip, another morphotype of B. rapa, develops enlarged roots as storage organs, and this feature is also found in B. juncea and B. napus. Furthermore, each Brassica species has evolved multiple morphotypes, including leafy heads and enlarged roots, other enlarged organs of stems and inflorescences, oilseeds, sarsons and even ornamental features.10 Different morphotypes have different usages. In B. rapa (Figure 1a), heading Chinese cabbage and pak choi are consumed as leafy vegetables. Chinese cabbage is distinct for its large leafy head, whereas pak choi has relatively smaller leaves and does not develop heading leaves. Turnip has an enlarged root that is eaten and occasionally used as fodder. Caixin and purple caitai bolt rapidly and generate long, tender stems used as food. Morphotypes of oilseed B. rapa produce large, full seeds for oil extraction and sarsons produce seed pods that are eaten in India. Some morphotypes of B. rapa develop beautiful leaf patterns and colors, thus are used as ornamental plants. B. oleracea also has an abundance of morphotypes, as shown in Figure 1b. Heading B. oleracea is consumed as a leaf vegetable, whereas oilseed B. oleracea produces edible oil. Cauliflower and broccoli, special morphotypes of B. oleracea, have developed enlarged inflorescences that are eaten as vegetables. Other Brassica crops, such as B. juncea, have even greater morphotype richness than B. rapa and B. oleracea. 
In addition to these cultivated crops, there are many wild relatives of the species in U’s triangle that have greatly diversified phenotypes, further extending the diversity of Brassica plants.

Figure 1

Rich morphotypes of Brassica plants. (a) Morphotypes of B. rapa; top two lines from left to right: pak choi, heading B. rapa, turnip, oilseed, purple pak choi, caixin, mizuna, purple caitai and takucai; the third line shows additional morphotypes or varieties of the previous morphotypes. (b) Morphotypes of B. oleracea; top two lines from left to right: heading cabbage, Brussels sprouts, broccoli, cauliflower, purple cabbage, purple cauliflower, collard; the third line shows additional morphotypes or varieties. Some of the pictures were collected from the Internet.

The WGT event was important to the speciation and the expansion of rich morphotypes in the genus Brassica. The subsequent genomic rearrangement and gene evolution initiated by WGT promoted the appearance of a variety of Brassica plants.

Chromosome evolution after WGT promoted Brassica speciation

Genomic blocks (GBs) are collinear chromosome fragments conserved among different genomes. The genomes of Brassicaceae species are composed of 24 GBs labeled A to X. These blocks were defined by a comparative genomic analysis among the genomes of many Brassicaceae species, such as B. napus, Arabidopsis thaliana, A. lyrata and Capsella rubella.11, 12 These GBs formed the basic units in ancestral chromosome reshuffling that generated the present-day species. Parkin et al.11 constructed a high-density linkage map of B. napus, which is the allotetraploid of B. rapa and B. oleracea. Based on this map, they defined 21 GBs shared between B. napus and A. thaliana. Subsequently, Schranz and co-workers12 combined the B. napus linkage map with comparative mapping results from A. lyrata and C. rubella, both of which have eight chromosomes and are defined as the ancestral common karyotype (ACK or AK) of Brassicaceae, resulting in the definition of 24 GBs (A–X) as the basic units of all Brassicaceae genomes. Various combinations of these 24 blocks, occasionally accompanied by whole genome duplication, compose all of the genomes of Brassicaceae species. Genomes that contain only one set of the 24 GBs are considered diploid species. There are many such genomes in Brassicaceae; the main karyotypes include ACK (n=8), Proto-Calepine karyotype (PCK, n=7), translocated PCK (tPCK, n=7) and A. thaliana (n=5).12, 13 Previous studies based on phylogenetics and comparative chromosome painting showed that ACK was the ancestral karyotype of Brassicaceae.12–15 ACK has eight chromosomes with GBs ordered from A to X across chromosomes one to eight. Examples of extant ACK species include A. lyrata and C. rubella.14, 16, 17 Both PCK and tPCK have seven chromosomes and differ in one inter-chromosomal translocation.13 Conringia orientalis has a PCK chromosome order,13 whereas Schrenkiella parvula has a tPCK genome.18 Genomes having more than one set of the 24 GBs in Brassicaceae are considered paleopolyploid species, including all Brassica crop species, which experienced a WGT event.

The genomic structure of the triplicated-genome species in the Brassica genus as well as their ancestral genome evolution were first studied in detail after the whole genome sequencing of the Brassica A genome B. rapa 3 followed by the C genome B. oleracea.19 The sequencing of other Brassica species is now underway. Genome datasets of the Brassica species are maintained and continuously updated within the Brassica database (http://brassicadb.org).20 For B. rapa, the genome size was estimated to be 485 Mb based on K-mer analysis, and gene prediction suggested 41 020 protein-coding gene models. For B. oleracea, the genome size was estimated to be 630 Mb with approximately 45 758 gene models.

Comparative genomic analysis between B. rapa and A. thaliana clearly indicated the WGT event experienced by B. rapa.3 Syntenic gene analysis between the triplicated genome of B. rapa and the diploid genome of A. thaliana using the tool SynOrths showed that most genes inherited from their nearest common diploid ancestor were shared by both species (80.2% and 73.8% for B. rapa and A. thaliana, respectively).21, 22 After WGT, the genomic fragments were reshuffled and fractionated. However, the local gene order was conserved, and syntenic genomic fragments can be clearly observed in both B. rapa and A. thaliana. Furthermore, for each GB in A. thaliana, three corresponding syntenic GBs in B. rapa were detected, which were generated by the WGT event.3, 21 Genomic synteny analysis between B. oleracea and A. thaliana as well as between B. rapa and B. oleracea showed that B. oleracea has good genomic collinearity with the genomes of A. thaliana and B. rapa (Figure 2). As observed in B. rapa, B. oleracea has three copies of each GB found in A. thaliana, thus confirming at the whole genome sequence level that B. oleracea also experienced the extra WGT event.19 A previous comparative genomic study of B. juncea and other Brassicas based on genetic maps showed that the genome of B. nigra also shared the WGT event.23 Based on the comparison of GB distribution in the Brassica A, B and C genomes, we provided a framework for the comparative genomic study of Brassica species.

Figure 2

Framework for the comparative genomic analysis of Brassica plants. (a) Chromosomal synteny between B. rapa and B. oleracea determined based on whole genome sequences. Block associations for each chromosome are listed above or below the chromosome bars. The numbers 1, 2 and 3 placed after each block label (A–X) denote subgenomes LF, MF1 and MF2, respectively. (b) Evolutionary relationships between the chromosomes of B. rapa and B. nigra. Block information for B. nigra was extracted from the genetic map of B. napus.11, 23 Block information is shown below each chromosome bar, and the syntenic chromosomes of B. rapa to B. nigra are listed above the chromosome bars.

The diploid ancestor of B. rapa before WGT had seven chromosomes, which resembles the block arrangement of tPCK.24 In B. rapa, there should be three sets of the 24 GBs, ideally 72 GBs in total, that resulted from WGT. Using A. thaliana as a reference, the genomic fragments of the three copies of each GB were clearly identified (with the exception of one copy of block G) in the genome of B. rapa.24 For a certain GB, the three corresponding copies in B. rapa are not always all associated with the same GB. Some block associations are found in the ACK genome, such as block A, which associates with block B (block association A/B). However, some block associations do not exist in ACK, suggesting unique block associations for the diploid ancestor of B. rapa but not ACK; this lack of associations may result from genomic reshuffling after WGT. Based on this block distribution information, the block association relationships across the 10 chromosomes of B. rapa were observed. The breakage and formation of block associations occur independently; thus, the probability that an ancestral block association was broken more than twice after WGT is low, and the probability that a newly derived block association formed more than twice after WGT is also low. Based on this rule, by counting the copy numbers of all block associations in the genome of B. rapa and comparing these numbers with the extant diploid karyotypes in Brassicaceae, such as ACK, PCK, tPCK and A. thaliana, Cheng and co-workers24 found that the diploid ancestor of B. rapa had a tPCK-like karyotype. Furthermore, block association analysis of B. oleracea, B. napus and Raphanus sativus based on genetic maps showed that these species also evolved from an ancestor having a tPCK genome.
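
The counting logic described above can be illustrated with a small sketch. It is not the published pipeline: the block orders below are toy data invented for the example, the function names are hypothetical, and the `min_copies=2` threshold is an illustrative simplification of the copy-number rule rather than a value taken from the cited study.

```python
from collections import Counter


def block_associations(chromosomes):
    """Count unordered adjacent block pairs; A next to B is stored as ('A', 'B')."""
    counts = Counter()
    for blocks in chromosomes:
        for left, right in zip(blocks, blocks[1:]):
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts


def support_for_karyotype(genome_counts, karyotype, min_copies=2):
    """Fraction of a candidate karyotype's associations seen at least min_copies times."""
    candidate = set(block_associations(karyotype))
    if not candidate:
        return 0.0
    supported = sum(1 for pair in candidate if genome_counts[pair] >= min_copies)
    return supported / len(candidate)


# Toy triplicated genome (hypothetical block orders, not real B. rapa data).
toy_genome = [
    ["A", "B", "C", "E", "D"],
    ["A", "B", "C"],
    ["D", "E"],
    ["B", "A", "D", "C"],
]

# Two hypothetical candidate ancestral karyotypes to compare against.
candidate_x = [["A", "B", "C"], ["D", "E"]]
candidate_y = [["A", "B"], ["C", "D", "E"]]

toy_counts = block_associations(toy_genome)

if __name__ == "__main__":
    print("candidate X support:", support_for_karyotype(toy_counts, candidate_x))
    print("candidate Y support:", support_for_karyotype(toy_counts, candidate_y))
```

In this toy case every association of candidate X is recovered in at least two copies, whereas one association of candidate Y is seen only once, so X would be the better-supported ancestor under the stated assumption.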

The distribution patterns of transposable elements (TEs) support the positions of the 21 tPCK paleocentromeres in the genome of B. rapa. It is well known that TEs are enriched in the flanking regions of centromeres; this configuration has been observed in the genomes of many species such as A. thaliana, maize and soybean.25–27 TE sequences continue to show a relatively high density surrounding the positions of the 21 paleocentromeres in B. rapa millions of years after rediploidization following WGT. After reconstructing the three subgenomes of B. rapa along the seven tPCK ancestral chromosomes, using a method similar to playing a jigsaw puzzle (Supplementary Fig. S1), we plotted the TEs as a function of their density along the 21 reconstructed tPCK chromosomes (Figure 3). This plot clearly shows that the TE distribution variation reflects the locations of the 10 inherited centromeres in B. rapa as well as the 11 inactivated paleocentromeres. The supported locations of the 21 paleocentromeres accurately match the centromere regions of tPCK, which are positioned between the block associations B/C, G/H, I/J, S/T, P/W, M/E and D/V for tPCK chromosomes one to seven, respectively.13, 24 Other TE-rich regions, such as the distal end of AK2/5/6/8 in subgenome LF (the least fractionated subgenome) (Figure 3), could represent traces of paleocentromeres from more ancient genome duplications. In addition, as shown in Figure 3, gene fractionations or large genomic fragmental deletions are generally more concentrated near the paleocentromere regions compared to the genomic background.
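
As a rough illustration of how a TE density profile can point to a candidate paleocentromere, the sketch below averages per-gene TE ratios in fixed-size bins along one reconstructed chromosome and reports the bin with the highest mean. The ratios are toy values, the bin size is arbitrary, and the function names are hypothetical; the source does not state how the plot in Figure 3 was computed.

```python
def binned_mean(values, bin_size):
    """Average consecutive values in non-overlapping bins of bin_size genes."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    means = []
    for start in range(0, len(values), bin_size):
        window = values[start:start + bin_size]
        means.append(sum(window) / len(window))
    return means


def peak_bin(values, bin_size):
    """Index of the bin with the highest mean TE ratio (values must be non-empty)."""
    means = binned_mean(values, bin_size)
    return max(range(len(means)), key=means.__getitem__)


# Toy TE ratios for genes ordered along one reconstructed chromosome (not real data).
toy_te_ratios = [0.1, 0.1, 0.2, 0.1, 0.6, 0.8, 0.7, 0.2, 0.1, 0.1]

if __name__ == "__main__":
    print("bin means:", binned_mean(toy_te_ratios, 2))
    print("candidate paleocentromere bin:", peak_bin(toy_te_ratios, 2))
```

A real analysis would also need to distinguish centromere-associated peaks from other TE-rich regions, such as the distal ends mentioned above.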

Figure 3

Distribution of TEs supporting the positions of the 21 paleocentromeres in the three tPCK subgenomes of the B. rapa genome. The color of each bin represents the ratio of TE sequences in the flanking region of a given gene used to reconstruct the tPCK subgenomes. The x-axis shows the position of the reconstructed chromosomes; the y-axis shows the three copies of the seven chromosomes in tPCK, which are the three subgenomes of B. rapa.

Chromosomal reduction together with paleocentromere descent from the primal hexaploid ancestor (tPCK×3, n=21) is important for the speciation of Brassica plants. After WGT, extensive chromosome reshuffling during rediploidization contributed to the origin of closely related species in Brassica. As mentioned above, in the cross between B. napus and B. rapa, more than two copies of homologous chromosomes in the synapsis stage of meiosis will result in abnormal synaptonemal complexes, thereby decreasing the fertility of gametes. Logically, natural selection drives the rediploidization process with chromosomal rearrangement that removes the extra homologous chromosomes. Further rounds of genomic reshuffling of the rediploid ancestor at different evolutionary timepoints then created the different species in Brassica. In the B. rapa genome, the number of chromosomes and paleocentromeres was reduced from 21 to 10. The chromosomes were reduced by multi-chromosome translocation, fusion, and inter-/intrachromosomal recombination. Chromosomes A03 and A08 serve as examples.28, 29 A03 evolution involved six chromosomes of tPCK (Figure 4a), AK2/5, AK7, AK2/5/6/8, AK3, AK6/8 and AK4 (a proposed chromosomal rearrangement process is shown in Figure 4b), whereas A08 was generated from several rounds of interchromosomal translocation of two tPCK chromosomes, AK1 and AK7 (Figure 4c). The circle model for block associations of M/N/T/U/D and V/K/L/Q/X (Figure 4b) or T/U (Figure 4c) has been used in previous reports.28, 29 However, the chromosomal rearrangement explained by the circle model (Supplementary Fig. S2a) can also be achieved through an alternative process of chromosome translocation and fusion (Supplementary Fig. S2b).

Figure 4

Chromosomal rearrangement of A03 and A08 in B. rapa. (a) The genomic block orders in the seven chromosomes of tPCK; block colors follow the labeling scheme of a previous report.12 (b) The chromosome evolution of A03 involved six chromosomes of tPCK: AK2/5, AK7, AK2/5/6/8, AK3, AK6/8 and AK4. Red, green, and blue colors denote the subgenomes LF, MF1 and MF2. (c) The process by which two tPCK chromosomes, AK7 and AK1, were reshuffled into chromosome A08 of B. rapa.

Gene evolution after WGT propelled the expansion of rich morphotypes for Brassica species

Subgenome dominance has been detected among the three tPCK subgenomes in B. rapa.21, 30 The subgenome dominance effect resulted in the differentiation of paralogous genes and featured the following characteristics: (i) one subgenome retained more genes than the other two through gene fractionation after WGT; (ii) genes located in the subgenome with high gene density are always expressed at higher levels than their paralogs in the other two subgenomes; and (iii) genes in the dominant subgenome accumulated fewer non-synonymous mutations than did the other subgenomes.21 Gene density differentiation is clearly observed when counting the number of genes within the reconstructed tPCK subgenomes: the subgenome LF has approximately 1.6 times more genes than the other two subgenomes MF1 and MF2 (the more fractionated subgenomes one and two).3, 21 Using mRNA-Seq data generated for different organs of B. rapa, a comparison of paralogous gene pairs showed that a greater number of genes located in subgenome LF are expressed at a higher level (i.e., either those showing at least two-fold greater expression or ‘horserace’ winners) than their paralogs in the MF subgenomes.21 The resequencing of different morphotypes of B. rapa, such as L144 and a turnip, showed that genes located in LF accumulated fewer functional mutations (non-synonymous single-nucleotide polymorphisms and frame-shift InDels) than those located in the MF subgenomes.21 This subgenome dominance effect has also been observed in the genome of maize.31
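
The two-fold expression comparison between paralogs can be sketched as follows. This is only an illustration of the criterion named above: the expression values are toy numbers, the function names are hypothetical, and the handling of ties and of pairs in which neither copy is expressed is an assumption that the source does not specify.

```python
def dominant_copy(expr_lf, expr_mf, fold=2.0):
    """Return 'LF', 'MF' or 'none' depending on which paralog is >= fold higher."""
    if expr_lf > 0 and expr_lf >= fold * expr_mf:
        return "LF"
    if expr_mf > 0 and expr_mf >= fold * expr_lf:
        return "MF"
    return "none"


def tally_dominance(pairs, fold=2.0):
    """Count how many paralog pairs are dominated by the LF copy, the MF copy or neither."""
    counts = {"LF": 0, "MF": 0, "none": 0}
    for expr_lf, expr_mf in pairs:
        counts[dominant_copy(expr_lf, expr_mf, fold)] += 1
    return counts


# Toy (LF expression, MF expression) values for five paralog pairs (not real data).
toy_pairs = [(10, 4), (3, 9), (5, 4), (0, 0), (8, 1)]

if __name__ == "__main__":
    print(tally_dominance(toy_pairs))
```

Subgenome dominance would show up as a clear excess of pairs in the 'LF' category across many pairs and tissues; the toy input is far too small to demonstrate that and only shows the bookkeeping.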

The three aspects of the dominance effect among the subgenomes of B. rapa are united by the rule of improving the fitness of the plant. Under this rule, genes that are expressed at higher levels than their paralogs should be more important for the biological function of the plant. Thus, functional mutations of these dominantly expressed genes would be more significant in reducing the plant’s fitness than mutations of their syntenic paralogs. Therefore, natural selection drives the conservation of the dominantly expressed genes against functional mutations, whereas their paralogs accumulate more mutations and eventually become fractionated, resulting in a higher gene density in the dominant subgenome and lower gene density in the dominated subgenomes. This explanation was first suggested following an analysis of the maize genome and subsequently for the genome of B. rapa.21, 31, 32

Short homologous sequence-mediated deletion regulates gene fractionation in B. rapa. By investigating the fractionated genes in the B. rapa genome, it was found that genes were lost individually rather than via the simultaneous deletion of many genes located in a large fragment. Short repeated sequence-mediated individual gene fractionation has been observed in maize.33 First, a pair of small direct repeats appear near the gene coding region before fractionation. The small repeated sequences then form a loop for intrachromosome recombination, and the gene sequence located in the middle of the two homologous repeat sequences is deleted. This mechanism was also found to function in the process of gene fractionation in B. rapa.30
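
The outcome of such a deletion can be mimicked on a string. The sketch below removes the segment between the first two copies of a short direct repeat and keeps a single copy of the repeat, which is an assumption about the recombination product; the sequence, the repeat and the function name are toy or hypothetical, and real repeats need not be perfectly identical.

```python
def delete_between_direct_repeats(sequence, repeat):
    """Remove the segment between the first two copies of `repeat`, keeping one copy."""
    first = sequence.find(repeat)
    if first == -1:
        return sequence
    second = sequence.find(repeat, first + len(repeat))
    if second == -1:
        return sequence
    # Everything from the first repeat up to the second repeat is looped out.
    return sequence[:first] + sequence[second:]


# Toy locus: flank + repeat + "gene" + repeat + flank (not a real sequence).
toy_locus = "AAA" + "CGT" + "GGGGGG" + "CGT" + "TTT"

if __name__ == "__main__":
    print(toy_locus)
    print(delete_between_direct_repeats(toy_locus, "CGT"))
```

Because only the sequence between one repeat pair is removed, each event affects a single gene-sized interval, which is consistent with genes being lost individually rather than in large blocks.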

The 24-bp small RNA-targeted TE methylation that suppressed the expression of nearby genes as well as its biased distribution among the subgenomes of B. rapa led to subsequent subgenome dominance.34 Small RNA-Seq data analysis showed that dominantly expressed genes in B. rapa always have fewer 24-bp RNA-targeted TEs in their 1-kb flanking regions compared with their paralogs. Previous reports on A. thaliana showed that small RNA-targeted TEs were subjected to methylation,35, 36 and the methylated TEs then suppressed nearby gene expression. All of these observations suggest that the biased distribution of small RNA-targeted TEs played an important role in the formation of the subgenome dominance effect.

WGT provided a bulk of genes that served as both the raw materials and a buffer pool for multicopy genes to evolve disparate or new functions (subfunctionalization or neofunctionalization), whereas the subgenome dominance effect facilitated this process by differentiating the multicopy genes. These newly evolved functions further promoted the evolution of rich morphotypes in Brassica. In A. thaliana, after several rounds of whole genome duplication (α, β and γ polyploidization), many duplicated genes were subfunctionalized and/or neofunctionalized. For example, in A. thaliana, some genes from extra duplications have subfunctionalized compared with those in Carica papaya, such as the enzymes CYP79A and CYP79B that catalyze the first step of glucosinolate synthesis.37 Some genes have neofunctionalized to develop extra biosynthetic pathways for indole and methionine-derived aliphatic glucosinolates in A. thaliana, which are not detected in C. papaya. Analysis of glucosinolate genes in the genome of B. rapa showed that for most of these genes, multiple copies were retained after WGT.38 These over-retained genes would undergo subfunctionalization or neofunctionalization to develop new biological functions related to glucosinolate metabolism in B. rapa, as in A. thaliana. It is expected that there are many more such examples of other over-retained genes in B. rapa. The subgenome dominance effect may aid in this evolutionary process by conserving one copy of the dominant gene and letting the other copies differentiate or develop new roles. Finally, these differentiated genes will contribute to the different traits of B. rapa.

Biased gene retention after WGT promoted the morphotype diversification of Brassica plants. Phytohormones, especially auxin, play important roles in plant morphogenesis.39 The genes involved in plant hormone signaling pathways are thus important for divergent morphotype formation.39, 40 By comparing the gene contents in A. thaliana and other sequenced genomes, such as Carica papaya and Vitis vinifera, it was found that auxin-related genes were expanded in the B. rapa genome.3 Furthermore, by comparing the number of gene categories that retained only one or multiple copies, genes involved in the response to phytohormone signaling were found to be significantly over-retained via gene fractionation following WGT in the genomes of B. rapa 3 and B. oleracea.

Two-step theory to illustrate the WGT process in Brassica

From a genome evolution perspective, a two-step theory of polyploidization was suggested to illustrate the process of WGT in Brassica plants.3, 21 Based on the results of comparative subgenome analysis in Brassica, mainly in B. rapa as summarized above, we proposed that the WGT event occurred as two genome duplication steps (Figure 5). In the first step, the two tPCK genomes MF1 and MF2 were merged together. Subsequently, a round of genomic reshuffling and gene fractionation resulted in a new diploid. No significant genome dominance is now observed between the MF1 and MF2 subgenomes of B. rapa, and thus autotetraploidization cannot be excluded as a possible process for the first duplication. However, based on the observation that there were a greater number of recent small deletions within the exons of MF1 than those of MF2,30 the first duplication is likely to have been an allotetraploidization. In the second step, the third tPCK genome LF was merged with the MFs (MF1 and MF2). A second round of genomic reshuffling and gene fractionation then resulted in the mesohexaploid ancestor of B. rapa. In the second step, the ‘two’ merged genomes (LF and MFs) had different karyotypes, which produced an allopolyploid, and subsequently resulted in biased genome fractionation and the dominant gene expression phenomenon.

Figure 5

Two-step polyploidization theory for the WGT event experienced by Brassica plants.

B. rapa genes belonging to different gene families or having important biological functions have been systematically analyzed. Based on the genome sequencing and comparative genomic study of B. rapa, the evolution of many gene families or categories, such as circadian clock genes,41 resistance genes,42 stress response genes,43–45 glucosinolate genes,38 anthocyanin biosynthesis genes,46 phytohormone-related genes,47 certain transcription factor families48 and other genes,49–54 was determined individually and studied in detail. Moreover, some functionally important genes related to self-incompatibility,55, 56 male sterility,57 flowering regulation,58–60 leaf heading61 or color62 have been identified or cloned, and functional studies have been conducted for some of these genes. These follow-up studies in B. rapa helped to further elucidate the evolution of specific genes after the WGT event.

Conclusions and discussion

WGT promoted the diversification of Brassica plants with respect to both the speciation and expansion of rich morphotypes for each species. First, WGT promoted genomic reshuffling, i.e., rediploidization, to stabilize the genome and the meiosis process. Genomic reshuffling accompanied by chromosome reduction contributed to the speciation of diploid Brassica plants, such as B. rapa, B. nigra and B. oleracea. Genomic differentiation of the three basic genomes in U’s triangle then generated the stable allotetraploid species B. carinata, B. napus and B. juncea. Second, subgenome differentiation, biased gene retention through gene fractionation after WGT, and further multicopy gene subfunctionalization or neofunctionalization promoted the parallel evolution of many different morphotypes in each Brassica species. Therefore, WGT with subsequent genomic and gene-level evolution drove Brassica speciation and generated an abundance of morphotypes in the Brassica species.

In the future, additional research should be conducted to investigate the morphotype evolution of Brassica plants. Previous studies based on de novo genome sequencing determined the genome- and gene-level evolution of only one accession of B. rapa. Additional B. rapa accessions and other Brassica species should now be extensively studied to address the following aspects regarding the Brassica population: (i) the origins and phylogenetic relationship of different morphotypes in Brassica species; (ii) the mechanism of the parallel evolution of similar traits that developed independently in different Brassica species (e.g., the leafy head in B. rapa and B. oleracea); and (iii) the genes involved in the development of different morphotypes or genes that regulate the agronomically important traits of Brassica crops. This knowledge will increase our understanding of Brassica morphotype diversification and ultimately leverage the benefits of genomic studies for the genetic improvement of Brassica crops.

Conflict of interest

The authors declare no conflict of interest.