Introduction

Brassica juncea (Indian mustard) is India’s second-most important source of edible oil and is widely grown in many parts of the world1. Seed size is one of the most crucial traits and is one of the principal breeding targets in B. juncea2,3. In dicots, seed development ensues after double fertilization and initiates the growth of the diploid embryo and triploid endosperm, surrounded by the maternal integuments. The growth of the embryo lags behind that of the endosperm. Therefore, seed size is initially determined by the growth of integuments and endosperm4. As the seed matures, the embryo fills the whole seed cavity, and replaces the endosperm. The mature seed thus contains a fully developed embryo, endosperm, and seed coat derived from integuments. Therefore, seed size is regulated by coordinated regulation of growth of the diploid zygotic embryo, triploid endosperm, and maternal integuments5,6,7. These stages of seed development have previously been characterized in detail in B. napus8.

Seed size is a quantitative trait, and several QTLs that regulate seed size have been identified in Brassica species2,9,10,11,12,13,14. QTLs of different seed traits frequently show co-localization15,16. In B. juncea, seed size negatively correlates with seed oil content17. Recombinant inbred line (RIL) phenotyping in B. napus indicated that seed weight is positively correlated with pod length and protein content but negatively correlated with seed number18. Therefore, improvement of seed size through breeding remains challenging. Moreover, due to complex genomic constitution, redundancy due to whole genome triplication, and lack of mutant stocks, the molecular basis of seed size control in Brassica species is still unclear19. The pieces of evidence from genetic and omic studies and molecular characterization are now starting to accumulate, and their careful assimilation is required to understand the complete picture.

As evidenced in model dicot Arabidopsis, seed size is governed by several maternal and non-maternal genetic factors by affecting the final cell number and size in the seed coat and the zygotic tissues20. Analysis of the correlation between seed weight and seed size, using genetically diverse inbred lines with contrasting seed weight in B. napus, has confirmed that seed diameter, volume, and surface area are highly correlated with seed weight while bulk density is not associated with seed weight21. Using high-resolution analysis of seed topology, it has been shown in B. napus that the final seed size is determined initially by the seed coat and the endosperm and later by the expansion of the developing embryo8. However, the spatial environment provided by the seed coat acts as a limitation for further embryo expansion and acts as a mechanical constraint to prevent further increase in embryo size8,22. Further, extensive analysis of topology and lipid imaging has suggested that increasing seed oil content may be challenging; instead, increasing the surface area of seeds may be a reasonable strategy to improve the overall oil content and yield8.

Transcriptional dynamics of developing seeds have recently been investigated in B. napus23,24,25,26 and B. rapa27. Here we report transcriptional profiling of developing seeds of two contrasting varieties of B. juncea, Early Heera 2 (EH2; small-seeded, yellow seed coat, low oil content) and Pusajaikisan (PJK; bold seeded, brown seed coat, high oil content). EH2 and PJK exhibit contrasting agronomic characters with significant differences in seed size, coat color, and other agronomic traits2,28(Fig. 1). In B. napus, seed development has previously been classified into three distinct stages viz., 0-18D (days after pollination), 18-40D and 40D-maturity representing the early morphogenic phase marked by the rapid growth of the embryo, middle phase characterized by the accumulation of triacylglycerols, and the desiccation phase, respectively29. A similar trajectory of seed development was also confirmed by another study in B. napus8. Therefore, to investigate the transcriptional dynamics at the landmark stages in B. juncea, we performed RNA sequencing from seeds collected at 15D, 30D, and 45D. Hereafter, these stages shall be referred to as early, middle, and late stages, respectively. While this manuscript was in preparation, another study reported transcriptome profiling from EH2 and PJK seeds30. However, all three stages used in that study correspond to the first phase of the seed development30. By investigating the transcriptional repertoire of early, middle and late seed stages, we demonstrate that the late developmental stages are particularly critical for regulating variety-specific differences in seed size in B. juncea.

Figure 1
figure 1

Summary of phenotyping and expression dynamics. (a) Picture showing seeds of Early Heera2 (EH2) (yellow colored) and Pusa Jai Kisan (PJK) (brown colored). The scale bar at the bottom corresponds to 1 cm, (b) Thousand seed weight (g) of EH2 and PJK depicting the contrasting seed weight of both varieties, (c) Venn diagram showing the number of transcripts expressed commonly in both varieties, and exclusively in EH2 and PJK, (d) Number of transcripts expressed in EH2 and PJK at each of the three stages, according to the level of expression. The legend at the bottom explains the ranges of TPM values.

Results

Global expression dynamics during seed development in B. juncea

The thousand seed weight of two contrasting varieties of B. juncea, EH2 and PJK was measured confirming the extreme variation in seed weight in both varieties, i.e., 2.0 g and 6.9 g, respectively (Fig. 1b). A total of 18 libraries comprising three biological replicates each of early (15D), middle (30D), and late (45D) seed stages from EH2 and PJK were prepared for sequencing. More than 694 billion paired-end reads were obtained. After adapter trimming and quality filtering, > 691 million high-quality reads were retained from 18 libraries (Supplementary Table 1). Read alignment to B. juncea reference genome31 resulted in 81.0 to 95.4% alignment of all the libraries (Supplementary Table 1). However, one replicate of 30D PJK sample was later removed due to an extremely high percentage of duplicates. Therefore, further analysis was carried out with 17 libraries. All the libraries showed a high correlation coefficient (spearman coefficient > 0.9) between biological replicates (Supplementary Table 2). A total of 137,912 unique transcripts were identified using StringTie. From these, 112,550 transcripts with normalized expression value of transcripts per million (TPM) > 0 were analyzed further. (Supplementary Table 3). Of these, 77,875 were previously annotated in B. juncea, while 34,675 were novel transcripts. Further, out of total 112,550 expressed transcripts, 95,471 were expressed in both the varieties, while 7590 and 9489 were detected exclusively in EH2 and PJK, respectively (Fig. 1c). The analysis of transcript abundance showed that ~ 50% of transcripts were detected at TPM > 1 in each of the stages in both the varieties (Fig. 1d). Further, 469 transcripts with TPM > 1 in EH2 but no expression in PJK were referred as EH2-specific while 638 transcripts with TPM > 1 in PJK but no expression in EH2 were identified as PJK-specific (Supplementary Table 4). Some of these transcripts expressed at very high levels with two of the EH2-specific and PJK-specific transcripts having TPM > 100, whereas a set of 41 and 37 EH2- and PJK-specific transcripts, respectively had average TPM > 10. Further, we performed intra- and inter-variety comparisons to identify differentially expressed transcripts at all three stages of seed development.

Intra-variety stage-wise differential expression analysis

Pairwise differential expression analysis by measuring ≥ two-fold change at q value < 0.05 revealed differentially expressed transcripts at different stages of seed development. A total of 20,514 and 18,901 unique transcripts were differentially expressed in EH2 and PJK, respectively (Supplementary Table 5 and 6). In EH2, 10,118 transcripts were differentially expressed in the middle vs. early stage, of which 4598 and 5520 transcripts were upregulated and downregulated, respectively (Fig. 2a). While in PJK, 8427 transcripts were differentially expressed in the same comparison, and 3676 and 4751 transcripts were upregulated and downregulated, respectively. In the late vs. middle stage, 8006 transcripts were differentially expressed in EH2, of which 5221 and 2785 transcripts were upregulated and downregulated, respectively while 7667 transcripts were differentially expressed in PJK out of which 3209 and 4458 transcripts were upregulated and downregulated, respectively. In late vs. early-stage samples, 16,474 transcripts were differentially expressed in EH2 with 10,106 and 6368 transcripts upregulated and downregulated, respectively. In same comparison, 15,988 transcripts were differentially expressed in PJK, of which 7411 and 8577 transcripts were upregulated and downregulated, respectively. Thus, maximum differential expression was observed in late vs. early stage EH2 and PJK.

Figure 2
figure 2

Results of differential expression analysis. (a) Differential expression analysis within each variety in three sets, 30D vs. 15D, 45D vs. 30D, 45D vs. 15D. The number of differentially expressed transcripts has been plotted for each comparison in EH2 and PJK. The green bars represent EH2, while the purple bars represent PJK (b) Differential expression analysis between the varieties, as compared at each stage, 15D, 30D, and 45D. The green bars represent EH2, while the purple bars represent PJK.

Inter-variety stage-wise differential expression analysis

In the inter-variety comparison between EH2 and PJK, 19,522 unique transcripts were differentially expressed during seed development (Supplementary Table 7) (Fig. 2b). In the early stage, 11,851 transcripts were differentially expressed in the PJK vs. EH2 comparison, of which 5292 and 6559 transcripts were upregulated and downregulated, respectively. In the middle stage, 8889 transcripts were differentially expressed, of which 3869 and 5020 transcripts were upregulated and downregulated, respectively. In contrast, 10,280 transcripts were differentially expressed in the late stage, of which 3125 and 7155 were upregulated and downregulated, respectively. Thus, the early stage showed maximum differential expression between the varieties, followed by the late and middle stages. Further, in each of the comparisons, the number of transcripts upregulated in PJK was higher than in EH2.

Functional analysis of stage-wise differentially expressed transcripts in both the varieties

To dissect the genetic pathways and regulatory modules regulating different stages of development in seeds of B. juncea, we separately carried out the functional analysis of transcripts differentially expressed in both varieties at early, middle, and late stages. Firstly, to facilitate functional predictions, we identified the Arabidopsis orthologs of all expressed transcripts. A total of 112,550 Brassica transcripts mapped to 21,329 Arabidopsis genes. Of the 103,061 transcripts expressed in EH2, 96,629 transcripts mapped to 20,880 Arabidopsis orthologs, while in PJK, of the 104,960 transcripts, 97,948 mapped to 20,948 Arabidopsis counterparts (Supplementary Table 3). Next, using MapMan pathway mapping tool, 73,180 Brassica transcripts were mapped to 36 different MapMan pathways and 224 MapMan pathway sub-categories. A total of 7521 transcripts encoding transcription factors could be assigned to 58 TF categories (Supplementary Table 3). Subsequently, to identify the transcripts, which are crucial for regulating early, middle, and late seed development in B. juncea, we shortlisted the transcripts exhibiting highest expression at the early, middle, and late stages in both EH2 and PJK (Fig. 3a). The transcripts upregulated in 30 vs. 15D and 45 vs. 15D comparison exhibited highest expression at early stage while those upregulated in 15 vs. 30D and 45 vs. 30D comparison were expressed at highest levels in middle stage. Similarly, transcripts upregulated in 15 vs. 45D and 30 vs. 45D comparison had highest expression at late stages (Fig. 3a). The information about the key genes with known roles in seed traits such as embryo development, seed size, TAG biosynthesis, glucosinolate content, seed coat color, etc. was retrieved from various databases like TAIR (https://www.arabidopsis.org), ARALIP (http://aralip.plantbiology.msu.edu) BRAD (http://brassicadb.cn/#/), and SeedGenes (https://seedgenes.org/index.html) and mapped onto differentially expressed gene sets (Supplementary Table 11). The key findings are elaborated below.

Figure 3
figure 3

Functional analysis of transcripts upregulated in early (15D), middle (30D), and late (45D) stages of seed development in both EH2 and PJK. (a) Venn diagrams showing the number of transcripts upregulated in both EH2 (green) and PJK (purple), as identified through pairwise differential expression analysis between different stages. The number of transcripts shown in bold were commonly upregulated in both varieties in different comparisons and were used for functional analysis (b) Pathway mapping results showing the MapMan pathway subcategories enriched in transcripts commonly upregulated at early (30D vs. 15D and 45D vs. 15D comparisons), middle (15D vs. 30D and 45D vs. 30D comparisons), and late (15D vs. 45D and 30D vs. 45D) stages of seed development in both varieties. (c) Fold enrichment of transcription factor categories enriched in the transcripts upregulated in early, middle, and late seed development stages in EH2 and PJK.

Functional analysis of transcripts exhibiting highest expression at the early stage

We observed that 1718 and 4065 transcripts were upregulated at an early stage in EH2 and PJK, compared with the middle and late stages, respectively (Fig. 3a). The key functions assigned to these genes using pathway mapping were cell wall cellulose synthesis, modification, and degradation; signaling receptor kinases; development (SPL); hormone metabolism like jasmonate and ethylene; protein degradation; transport, etc. (Supplementary Table 8) (Fig. 3b). The top GO categories enriched in this gene set were: cell wall, external encapsulating structure, endomembrane system, plasma membrane, apoplast, etc. (Supplementary Table 9). Further, TF fold enrichment at q value < 0.05 showed that transcripts encoding M-type MADS, SRS, ARR-B, ERF, and SBPs TFs were over-represented at the early stage (Supplementary Table 10) (Fig. 3c). Further, a total of 56 transcripts upregulated in the early-stage mapped to 38 Arabidopsis genes involved in the regulation of glucosinolate (30.4%), fatty acid elongation, oil content and TAG synthesis (25%), embryo development and endosperm cellularization (16.1%), seed germination and dormancy (16.1%), seed size (5.4%), seed development (3.5%), seed coat color (1.8%), and seed storage (1.8%) (Supplementary Table 11).

Functional analysis of transcripts exhibiting highest expression at the middle stage

In the middle stage, 2469 and 1437 transcripts were commonly upregulated in both the varieties compared to early and late stages, respectively (Fig. 3a). They showed enrichment in Mapman pathway sub-categories like brassinosteroid metabolism, hormone transport, signaling, glycolysis, secondary metabolism, etc. (Fig. 3b). GO categories like cell wall, endomembrane system, lipid particle, lipid storage, etc., were also enriched in this dataset (Supplementary Table 9). Further, TFs belonging to B3, BES1, HD-ZIP, LBD, and MYB were overrepresented at the middle stage (Fig. 3c). Candidate gene analysis showed that 168 transcripts upregulated in the middle stage mapped to known candidates, with maximum mapping to fatty acid elongation and synthesis, triacylglycerols (TAG) synthesis, and oil synthesis (47%), followed by embryo development (20.8%), seed coat color, development and mucilage (10.7%), seed germination and dormancy (10.7%), glucosinolate (6.5%), seed development (2.4%), and seed size (1.8%) (Supplementary Table 11).

Functional analysis of transcripts exhibiting highest expression at the late stage

In the late stage, 3905 and 1290 transcripts were upregulated as compared to early and middle stages, respectively (Fig. 3a). These were enriched in pathways like light reaction, flavonoid metabolism, development (LEA), etc. (Fig. 3b). GO categories related to photosystem, phenylpropanoid metabolism, secondary metabolites, etc., were enriched in transcripts upregulated at the late stage (Supplementary Table 9). In contrast, TFs like bHLH, WOX, and YABBY were enriched (Fig. 3c). The mapping of 166 transcripts upregulated in late-stage to candidate genes proffered transcripts related to fatty acid elongation and synthesis, oil synthesis and TAG synthesis (56%), seed germination and dormancy (15.1%), embryo development (14.5%), glucosinolate (4.8%), seed coat color, mucilage, and seed coat structure (4.2%), seed development (1.8%), seed maturation (0.6%), seed size (1.8%), and seed storage (0.6%) (Supplementary Table 11).

Apart from the stage-specific pathways, several pathways like regulation of transcription, auxin and gibberellin metabolism, development, cytochrome P450, etc., were enriched at all three stages. Similarly, DOF and TCP TFs were enriched in all the stages of seed development. In contrast, ZF-HD and GRFs were enriched in the middle and late stages.

Functional analysis of inter-variety differentially expressed transcripts

To identify the critical determinants governing the variety-specific differences concerning seed weight, we first analyzed the transcripts upregulated at each stage in a variety-specific manner (Fig. 4) (Supplementary Tables 5 and 6). Among the transcripts upregulated in EH2 only, the topmost enriched pathways at the early stage were biotic stress, protein synthesis, S-assimilation, cell wall modification, protease inhibitor, etc. Those involved in DNA synthesis, cell cycle, cell division, cell wall degradation, auxin, etc. were enriched at both early and middle stages, while cytochrome P450 genes were enriched at early, middle, and late-stages. Similarly, transcripts involved in secondary metabolism, peptide transport, etc. were enriched at the early and late stage. Those involved in regulation of transcription, development—storage proteins, transport—sugars, ATP-binding cassette (ABC) transporters, lipid metabolism, plastocyanin, etc. were enriched at the middle stage while transcripts regulating development and minor carbohydrates metabolism were enriched in middle and late-stage. Further, transcripts regulating abiotic stress, development—LEA, flavonoid metabolism, brassinosteroid, and ethylene metabolism, etc. were enriched at the late-stage. The enriched GO categories also reflected the same pattern with cell cycle and DNA replication among the top categories in the early and middle stages (Supplementary Table 9). On the contrary, among the transcripts upregulated in PJK only, the topmost enriched pathways belonged to categories S- assimilation, signaling receptor kinases, ethylene, and abscisic acid (ABA) metabolism, mitochondrial electron transport, transport sugars, etc. at the early stage while those involved in peptide and protein transport were enriched at early and middle stages. Transcripts associated with biotic stress and cytochrome P450 were enriched in early, middle, and late-stages, while those associated with transport—major intrinsic proteins, cell wall degradation, regulation of transcription, development were enriched at early and late stages. Similarly, gibberellin metabolism, gluconeogenesis, calcium signaling, etc.-associated transcripts were enriched at the middle stage and those associated with light reaction, Calvin cycle, flavonoid metabolism, etc. were enriched at the middle and late-stages. The transcripts involved in protein synthesis, lipid metabolism, tetrapyrrole synthesis, cell wall pectin esterases and secondary metabolism were enriched at late-stage.

Figure 4
figure 4

Results of MapMan pathway enrichment for transcripts differentially expressed only in EH2 and those differentially expressed only in PJK. The enriched MapMan pathway subcategories are shown on the left. The q value and the number of genes are represented for transcripts upregulated at early, middle, and late stages of seed development in both EH2 and PJK. Transcripts related to cell division and cell cycle are enriched among transcripts upregulated in the early and middle stages in EH2 only.

Interestingly, cell division and cycle-related pathways were enriched among the transcripts upregulated at both early and middle stages in EH2 only, while their expression goes down in the late stage of seed development (Fig. 4). Since cell proliferation is a significant determinant of seed size regulation, this observation indicates that the cell division and cell cycle transcripts might play a key role in regulating the seed size difference among these varieties. Therefore, we shortlisted the transcripts belonging to cell division and cell cycle pathway categories which were uniquely upregulated at 15D and 30D compared to 45D in EH2 only (Supplementary Table 12). We obtained a total of 136 transcripts. While these transcripts showed a reduction in expression at the 45D stage in EH2, most of these did not show the reduction in the 45D stage in PJK (Fig. 5a). This expression pattern correlates with the differences observed in seed weight in these varieties, as the continuous assessment of seed weight from 10D to the mature stage showed that the seed weight in both the varieties does not exhibit much difference in the early stages, and the maximum seed weight difference is observed at the later stage of seed development, i.e., from 30 to 45D (Fig. 5b). Identification of the orthologous genes from Arabidopsis revealed that 45 of these transcripts were cyclins, 12 were cell division control (CDC) transcripts, 7 were Cyclophilins, 4 were Anaphase Promoting Complex (APC) transcripts, 3 were Cyclin-dependent kinases (CDK), and the rest were miscellaneous cell cycle-related transcripts like core cell cycle, retinoblastoma-related (RBR) family, pin-formed (PIN) protein family, Ataxia-telangectasia and Rad3 related (ATR) kinase, etc. (Supplementary Table 12).

Figure 5
figure 5

Identification of candidate genes responsible for variety-specific differences in cell proliferation during the late stage of seed development. (a) Heatmap showing expression of cell cycle and cell division related transcripts upregulated in early and middle stage only in EH2. Most of the transcripts exhibit a decrease in expression in E45 but not in P45 (b). Analysis of seed weight (mg) in EH2 and PJK from 10 days after pollination till maturity. Fresh weight is shown from 10 to 45D, and dry weight is shown at maturity. An average weight of twenty seeds was used, and three replicates were used. Standard deviation is also shown. EH2 and PJK exhibit similar seed weight till 15D, and maximum seed weight difference can be observed 30D to 45D. (c) Heatmaps showing expression patterns of transcripts mapped to eight phytohormones based on MapMan pathway mapping. The transcripts upregulated in the early and middle stages only in EH2 are shown. The boxes demarcate the transcripts, which show a decrease in expression in E45 but not in P45, indicating candidates for phytohormones transcripts responsible for continued cell proliferation in PJK.

To further identify the candidates that may be important in regulating the cell cycle and cell division-related transcripts identified above, we shortlisted the phytohormones-related transcripts specifically upregulated at 15D and 30D compared to 45D in EH2 only. MapMan pathway mapping results showed that a total of 286 transcripts in this category mapped to phytohormones, corresponding to ABA (28), auxin (83), brassinosteroid (38), cytokinin (15), ethylene (51), gibberellin (36), jasmonate (27), and salicylic acid (8) biosynthesis/signaling (Fig. 5c, Supplementary Table 13).

To further dissect the cell cycle and division-related transcripts governing variety-specific differences in seed weight, we investigated the transcripts upregulated at the late stage in PJK compared to EH2. A total of 10,280 transcripts were differentially expressed in PJK vs. EH2 at 45D, of which 7155 transcripts were upregulated in PJK (Fig. 2b). Of these, 102 transcripts were related to cell cycle and cell division, as shown by pathway mapping (Supplementary Table 14). Further, we identified 446 phytohormones and TF-related transcripts upregulated at 45D in PJK vs. EH2. (Supplementary Table 14). Of these 548 transcripts, 11 were expressed in a PJK-specific manner (A01_VARUNA_g4163.t1, A03_VARUNA_g1726.t1, A03_VARUNA_g820.t1, B02_VARUNA_g6466.t1, B04_VARUNA_g3041.t1, B05_VARUNA_g1330.t1, B05_VARUNA_g1773.t1, B07_VARUNA_g6715.t1, MSTRG.19891.1, MSTRG.3848.1, MSTRG.42813.2). Next, using the 548 transcripts, we carried out k-means clustering using MeV to identify coexpressed transcripts to narrow down the regulatory modules. After clustering, these 548 transcripts were grouped into eight different groups (Fig. 6). Out of all the cell cycle and cell division transcripts, maximum (17) grouped into cluster 3, which showed the prevalence of MYB and B3 TFs and transcripts mapping to all six phytohormones, including auxin, ABA, brassinosteroid, ethylene, gibberellin, and salicylic acid. Three clusters, 2, 6, and 8, harbored 16 cell cycle and cell division transcripts. Clusters 5, 1, and 4 harbored 15, 11, and 6 transcripts, respectively, while the smallest cluster 7 harbored five transcripts related to cell cycle and division. MYB TFs were the top TFs present in all the clusters. Among phytohormones, auxin was a top hormone in most of the clusters. Based on the insights from the literature, some of the crucial candidates like E2F, BRI1(BRASSINOSTEROID INSENSITIVE1), BRIL2, GASA10 (GIBBERELLIC ACID STIMULATED ARABIDOPSIS 10), CKX1 (CYTOKININ DEHYDROGENASE 1), and CKX2 (CYTOKININ DEHYDROGENASE 2) were predicted as key regulators among these clusters.

Figure 6
figure 6

Clustering analysis of transcripts upregulated in the late seed development stage in PJK compared to EH2 (P45D vs. E45D). 548 transcripts belonging to pathway categories cell cycle, cell division, phytohormones, and transcription factors were used for k-means clustering in MeV using the log-transformed TPM values. 548 transcripts clustered into eight groups, as shown on the left. The number of transcripts belonging to each category (cell cycle and cell division, transcription factors, phytohormones) is shown. The percentage of cell cycle-related genes out of total mapping to each cluster is also shown, depicting that maximum cell cycle-related transcripts clustered in group 3 and lowest in group 5. The top categories of TFs and phytohormones are also shown. The purple boxes on the right side depict the most crucial candidates for the regulation of cell proliferation in developing seeds in each group.

Orthology-based identification of candidate genes for regulating seed size, coat color and oil content in B. juncea

We identified candidate genes regulating seed weight from the literature to further shortlist the candidates regulating variety-specific differences in seed size. Based on the knowledge from Arabidopsis, several pathways, including transcriptional regulators, phytohormones, signaling, and other players, are well elucidated for their roles in governing seed size32. We identified such candidate genes and observed that 79 transcripts, orthologous to 29 Arabidopsis genes, were differentially expressed between EH2 and PJK (Supplementary Table 15). This comparison unravelled that orthologous transcripts of at least eight positive regulators (BRI1, DET2 (DE-ETIOLATED-2), EOD3 (ENHANCER3 OF DA1), MINI3, CKX2, WRI1 (WRINKLED1), ANT (AINTEGUMENTA), and MKK5 (MITOGEN-ACTIVATED PROTEIN KINASE KINASE5) were upregulated in PJK (bold-seeded) and downregulated in EH2 (small-seeded), and those orthologous to one negative regulator (ARF2; AUXIN RESPONSE FACTOR2) was upregulated in EH2 and downregulated in PJK (Fig. 7).

Figure 7
figure 7

Identification of candidate genes responsible for variety-specific differences in seed weight in Brassica juncea. Candidate genes regulating seed size/weight were identified from the literature. Seed size is mainly regulated by genes regulating cell proliferation or expansion in the seed coat, endosperm, or embryo. Those genes which showed differential expression between PJK and EH2 are shown here. Positive regulators are shown in blue and negative regulators are shown in red. Heatmaps are shown for orthologous transcripts in B. juncea transcriptome, only for the transcripts differentially expressed at early, middle, or late stages of seed development. Color coding for heatmap: white—not differentially expressed, green—upregulated in EH2, purple—upregulated in PJK. Yellow stars mark the transcripts that are good candidates responsible for variety-specific seed size differences based on our transcriptome analysis.

Further, we also leveraged the information available in TAIR, ARALIP, SEEDGENES, and BRAD databases for Arabidopsis gene functions of genes, to identify candidate genes regulating seed coat color and oil content (Supplementary Table 16). EH2 is a yellow-seeded variety with ~ 36.9% oil content, while PJK is a brown seeded variety with ~ 41.1% oil content28 (Fig. 1). Orthologs of at least 14 seed coat color-related and 17 oil content-related Arabidopsis genes showed differential expression in EH2 and PJK indicating their potential involvement in regulating contrasting phenotypes (Table 1). We also analysed the highest expressing transcripts in each of the stages, in both the varieties, to determine if any of these are interesting candidates based on inter-varietal differential expression. For this, we shortlisted the top 100 expressing transcripts in 15D, 30D, and 45D stages of both EH2 and PJK. Among these, we obtained 11, 7, 37 highest expressing transcripts in 15D, 30D, and 45D respectively which showed significant upregulation in EH2 as compared to PJK at the respective stages, and 12, 18, and 22 transcripts that were highly expressed in 15D, 30D, and 45D in PJK and also significantly upregulated in PJK as compared to EH2 at the same stage (Supplementary Table 17). Based on overlaps with the pathway mapping and candidate genes results, we observed an interesting finding that of the 37 of these highest expressing transcripts upregulated in 45D in EH2, 10 were orthologs of Arabidopsis oleosin genes involved in oil biosynthesis (Supplementary Table 16 and 17), indicating that these may be good candidates for further dissecting the differences in lipid metabolism in EH2 and PJK.

Table 1 Transcripts shortlisted as candidates for seed coat color and oil content based on differential expression between PJK and EH2 and known roles in Arabidopsis (* represents ‘not differentially expressed’).

Discussion

Identification of conserved hallmarks regulating temporal stages of seed development in Brassica juncea

We utilized two approaches to dissect the global transcriptional dynamics during the early, middle, and late stages of seed development in B. juncea. In the first approach, we identified commonly upregulated transcripts in both the varieties at different stages of development, and dissected functional categories and candidates important for governing early, middle, and late seed development in B. juncea. Several key transcripts known as hallmarks of embryo development and endosperm cellularization, seed maturation and filling, and desiccation and dormancy were detected at early, middle and late stages of seed development, respectively.

The functional categories enriched among transcripts upregulated at an early stage highlighted cell wall-related transcripts. It is well evidenced that cell wall loosening and weakening are crucial for the growth of developing embryos33. Several MADS and MYB TFs, enriched in the early stage in our data, are also known to regulate embryo development in Arabidopsis34. One of the most well-known candidates was MINISEED3 (MINI3), which encodes a WRKY10 transcriptional activator and acts as a pivotal regulator of embryo development and cell size by regulating cell division and elongation35. It is also recently shown in B. napus as a potential regulator of CK responses in developing endosperm36. Some of the other important candidates were homologs of AMINO ACID PERMEASES1 (AAP1), ARABIDOPSIS FORMIN HOMOLOGY5 (AtFH5), (RETARDED GROWTH OF EMBRYO1 (RGE1), etc. AAP1 imports amino acids in the embryos and affects the content of carbon, nitrogen, and seed storage compounds37. AtFH5 codes for a formin protein, crucial for maintaining actin organization and cytokinesis, and is important for normal endosperm cellularization38. RGE1 codes for a bHLH TF, which regulates embryo development39.

The pathway categories enriched in the middle stage also conformed to the physiological changes during the middle stage of seed development. For example, brassinosteroids (BRs) regulate embryo and endosperm development40. B3 TFs, enriched in the middle stage, consist of several crucial regulators of seed storage accumulation like ABSCISIC ACID INSENSITIVE3 (ABI3), FUSCA3 (FUS3), LEAFY COTYLEDON2 (LEC2), etc. These are the master regulators of the seed maturation programs in dicot seed development41. Several crucial candidates upregulated in the middle stage were orthologs of genes regulating oil content and quality. For instance, in B. napus, simultaneous silencing of FATTY ACID DESATURASE2 (FAD2) and FATTY ACID ELONGASE1 (FAE1) has been shown to reduce the content of polyunsaturated fatty acids (PUFA) and erucic acid and increase the content of oleic acid and proteins in the oil42,43. FUS3 expresses during embryo maturation and is a significant regulator of seed filling during seed maturation44. Seed coat-related transcripts were also prominent in transcripts upregulated at the middle stage, for example, TRANSPARENT TESTA4 (TT4), TT10, TT16, TRANSPARENT TESTA GLABRA2 (TTG2), etc. TT10 codes for a laccase-like enzyme involved in the polymerization of flavonoids and regulates the browning of the seed coat45. TTG2 encodes a WRKY TF that controls the deposition of tannins in the seed coat46.

The late-stage-upregulated transcripts were mainly related to seed dormancy and desiccation-related functions. For example, LEA proteins are crucial for desiccation and seed longevity in mature seeds47. bHLH TFs are known to participate in the regulation of DELAY OF GERMINATION1 (DOG1), which regulates seed dormancy48. Eight transcripts orthologous to DOG1 showed upregulation among the dormancy-related transcripts at the late stage. DOG1 is a significant regulator of seed dormancy and longevity, as shown in several species like Arabidopsis, wheat, lettuce, etc. It regulates dormancy in response to various endogenous and environmental factors by affecting multiple germination-related processes like activation of storage compounds, weakening seed coats, cell elongation, etc.49. These observations not only confirm that we could successfully capture the transcriptional repertoire regulating landmarks stages of seed development in B. juncea but also highlights high conservation in the key regulators modules regulating seed development in Arabidopsis and Brassica.

An extended period of cell proliferation is a possible cause of increased seed size in the bold-seeded variety

In the second approach, we used the transcripts upregulated explicitly in each of the varieties at different stages, and identified the crucial candidates governing variety-specific differences in seed size. We identified cell cycle and cell division-related transcripts as significant determinants of seed size. These were orthologous to genes encoding cyclins and CDKs, the regulatory and catalytic subunits of the protein kinases that regulate the cell cycle progression. For example, CDCs are crucial for DNA replication initiation while APCs are essential for cell cycle phase transition50. The decrease in expression of these transcripts in late-stage in EH2, but not PJK, and the upregulation of several cell cycle-related transcripts in the late stage of PJK compared to EH2 indicates that cell division and cell cycle-related genes stay upregulated during later stages in PJK. At the same time, they are downregulated in EH2, suggesting that an extended period of cell proliferation is likely responsible for increased seed weight in PJK compared to EH2.

Interestingly, a similar finding has been recently made in B. rapa27. Based on the differential expression analysis between the seeds of high and low seed size cultivars, Niu et al. 202027 concluded that cell division and cell cycle are the significant determinants of seed size in B. rapa. They also suggested that the duration of cell cycle-specific gene expression at different seed development stages may be instrumental in determining seed size differences27. Further, Li et al. 201521 have also demonstrated that seed weight exhibits a higher correlation with cell number rather than the cell size of cotyledons and seed coats. These observations also hold up in B. napus18, indicating that cell number rather than cell size is the primary determinant of seed weight in Brassica. In another dicot, chickpea, an extended period of cell proliferation has been determined to contribute to cultivar-specific differences in seed weight51.

Since phytohormones are crucial regulators of the cell cycle, and auxins and cytokinins especially are classically known as positive regulators of cell proliferation, we also identified the phytohormone-related transcripts which may be responsible for regulating the cell cycle-related transcription52,53. Coexpression analysis suggested that TFs like E2Fs and phytohormones including BRs, CKS, GAs and auxins contribute to differential levels of cell cycle and cell division in developing seeds. E2Fs are well known for regulating the cell cycle during seed development50 and were represented in 6 of the 8 clusters. E2F and B3 TFs were also found to coexpress with cell cycle genes in B. napus and were identified as possible hub genes for regulating cell proliferation in developing seeds27. BRs regulate cell division positively, and BRI1 mutants exhibit a decrease in seed size. Hence, BRI1 is a valuable candidate for transcripts clustered in group 1. GASA genes express in response to GAs, and GASA10 is previously known to regulate cell elongation positively54. Therefore, GASA10 may also be a useful candidate for regulating cell cycle-related transcripts. CKXs are cytokinin dehydrogenases regulating cell proliferation55 and are thus, other potential candidates in the clusters. In B. napus, the cytokinin oxidase/dehydrogenase (CKX) gene family has been predicted to be a good candidate for manipulating seed size based on the transcriptome analysis of developing seeds and other tissues56.

Furthermore, candidate gene-based analysis identified orthologs of at least eight genes (BRI1, DET2, EOD3 MINI3, CKX2, WRI1, ANT, MKK5) which may be involved in regulating seed size differences in EH2 and PJK (Fig. 7). Among these, BRI1, DET2, EOD3, ANT, and MKK5 mainly regulate cell proliferation in the seed coat. MINI3 and CKX2 mainly control cell division in the endosperm, and WRI1 is primarily involved in regulating cell division in embryos. On the other hand, the negative regulator ARF2 mainly restricts cell division in the seed coat57. BRI1 and DET2 are related to BR signaling-mediated control of seed size. BRI1 mutants are BR insensitive and defective in BR reception. DET2 regulates cell size and cell number in embryos and cell size in integuments positively58. In our data, both orthologs of DET exhibited upregulation in PJK in the early stage. EOD3 codes for CytochromeP450/CYP78A6 and acts as an enhancer of DA1-1. It positively regulates the cell size of the integument 59. ANT codes for an AP2 (APETALA2) TF and promotes cell proliferation in the integuments60. It has also been shown to act downstream of TF MEE45 (MATERNAL EFFECT EMBRYO ARREST45). ANT positively regulates the auxin signaling and promotes cell proliferation in integuments and possibly embryo61. Only two of the Brassica orthologs of ANT showed upregulation in EH2 in the middle stage. Two orthologs showed upregulation in PJK in the late stage, and three orthologs showed upregulation in the early stage (Supplementary Table 15). MKK5 is a positive maternal regulator of seed size62, and three out of four orthologous transcripts showed upregulation in PJK in all three stages of seed development. MINI3, a WRKY TF, expressed in both endosperm and embryo, is a positive regulator of seed size. It regulates endosperm size and thus seed size. Mutants of MINI3 exhibit reduced seed size63,64. Two orthologs of MINI3 exhibited upregulation in PJK in the early stage, indicating its possible involvement in seed size regulation. CKX2 degrades cytokinin and thus reduces cytokinin signaling in the micropylar endosperm and controls endosperm size via the IKU2 pathway. Reduction of cytokinin activity through the IKU pathway is required for the active growth of endosperm. Hence, CKX2 is also a positive regulator of endosperm and seed size65. Of the three orthologs showing differential expression in our data, one showed upregulation in all three stages of seed development in PJK (Supplementary Table 15).

Further, overexpression of Arabidopsis WRI1 enhances seed weight in Camelina sativa, primarily through cell expansion in embryos66. We observed upregulation of WRI1 ortholog in the middle stage in PJK. ARF2 is a negative regulator of seed size, repressing cell proliferation in the seed coat through mediating auxin signaling67, and was upregulated in EH2 and downregulated in PJK at all three stages of seed development (Supplementary Table 15). Thus, these transcripts are good candidates for investigating variety-specific differences in seed size in B. juncea. ARF2 and ANT are the most interesting, as ARF2 represses ANT by binding to its promoter, and ANT positively regulates cell proliferation68,69. In our study, ARF2 is upregulated in EH2, while ANT is upregulated in PJK. Hence, this opposite expression pattern makes this ARF2-ANT cascade a strong candidate for future validation for investigating the molecular basis of seed size differences in these varieties.

Identification of candidates from East European and Indian gene pools of Brassica juncea for improving seed quality

Utilizing the differential expression analysis between the species, we also shortlisted candidates to regulate seed coat color, oil content, and other traits (Table 1, Supplementary Table 16). One of the seed coat color genes, TT8, has been previously shown in B. juncea to exhibit perfect co-segregation with the seed coat color phenotype70. In our study, TT8 ortholog B03_VARUNA_g5220.t1 showed upregulation in the brown-seeded variety (PJK) compared to the yellow-seeded variety (EH2), conforming to the expected pattern. In B. napus, TT8 has also been demonstrated to block phenylpropanoid synthesis, contributing to yellow seed color71. Similarly, orthologs of BAN (BANYULS), a late biosynthetic gene for flavonoid synthesis, were upregulated in all stages of PJK as compared to EH2. Downregulation of flavonoid biosynthesis-related genes like BAN, TT8, TT4, TT5, etc., have also been reported in yellow-seeded B. napus25. Moreover, various candidates related to TAG biosynthesis and metabolism were differentially expressed between EH2 and PJK. For instance, all four of the pivotal regulators of seed maturation and seed filling, LEC1, ABI3, FUS3, and LEC2 (LAFL) genes, were present among the differentially expressed transcripts in the PJK vs. EH2 comparison, and some were upregulated in EH2, while some were upregulated in PJK. These transcripts are suitable candidates for enhancing oil quality and content by contributing favorable alleles from both the varieties. Furthermore, upregulation of oil biosynthesis candidates like DGAT1, DGAT2, FAD2, LEC2 in PJK suggests that PJK can serve as a source of trait-enhancing alleles for not just seed size, but also for oil content. Previous studies have well-established that Indian gene pools have been selected for bigger seeds and higher oil content14. Hence, detailed transcriptional dynamics reported here can be further used to validate candidates for marker-assisted breeding for improving both oil content and seed size simultaneously using PJK as one of the parental lines.

Conclusions and future perspectives

We identified at least 136 cell cycle and cell division related transcripts like cyclins, CDCs, 14Cyclophilins, APC genes, CDKs, and other miscellaneous cell cycle-related genes that exhibit a decrease in the expression in later stages of seed development in the low seed weight variety EH2, but not in the high seed weight variety PJK. Further, we identified the candidate TFs (E2F, MYB, B3, etc.) and phytohormone-related transcripts (112 transcripts), which might act as upstream regulators of these cell cycle-related transcripts. Furthermore, eight previously known seed size-related genes (BRI1, DET2, EOD3 MINI3, CKX2, WRI1, ANT, and MKK5) were identified as the known candidates for explaining the variety-specific seed size differences. We also identified seed coat color and oil content-related transcripts differentially expressed between EH2 and PJK that can be utilized to improve oil quality. EH2 and PJK belong to distinct Indian and East European gene pools, respectively2. These gene pools are heterotic, and therefore have significant application in hybrid breeding in B. juncea72. The QTLs for seed size, coat color, oil content, and other seed traits have been previously reported in B. juncea2,11,28,73. However, marker-assisted breeding for improving these traits still lacks behind due to the scarcity of tightly linked markers and precise candidates. Hence, the candidates identified here (Table 1, Supplementary Tables 12 and 16) shall help address this lacuna. They can be prioritized for further validation and in crop improvement programs through genome editing or marker-assisted breeding to improve seed size and other seed-related economically important agronomic characters.

Materials and methods

Harvesting of samples and phenotypic analysis

Two varieties of B. juncea, including a bold-seeded variety Pusajaikisan of Indian origin and a small-seed weight East European variety EH2, were used for RNA sequencing. Plants were grown from October to March 2018 under open field conditions. Hand pollinations were performed as described previously74. Seeds were harvested at three developmental stages, i.e., 15, 30, and 45 days after pollination (D), immediately frozen in liquid nitrogen and stored at − 80 °C for RNA isolation. Seed weight was measured using five biological replicates of 20 seed each, and the average seed weight was calculated in mg. Thousand seed weight of mature seeds was calculated as described earlier2.

RNA isolation and sequencing

Total RNAs were isolated using the Spectrum ™ Plant Total RNA kit (Sigma-Aldrich) as per manufacturer’s guidelines. Genomic DNA contamination was removed from the samples using the Turbo DNase kit (Ambion) following the manufacturer’s instructions. RNA quality was checked using nanodrop, agarose gel electrophoresis, and Bioanalyzer. A total of 18 seeds samples with RIN > 7 were selected to prepare libraries using the Illumina Truseq™ RNA Sample Prep kit as per the manufacturer’s instructions. Paired-end sequencing was performed using Illumina HiSeq 2500 with an average read length of 100 bp.

Analysis of RNA sequencing data

Quality filtering at Q30 and adaptor trimming were carried out using FastP75. High-quality reads were mapped to the genome assembly of B. juncea variety Varuna31 using HISAT276. Further, the aligned reads were assembled using StringTie v2.1.777 and expression was estimated as counts using featureCounts v.2.2.0 with default parameters78. The gene expression level was quantified in TPM79 (transcripts per million). The correlation coefficient among biological replicates was calculated using Spearman’s correlation coefficient using the R package. Pairwise differential expression analysis was performed between different developmental stages/ varieties using the R package DESeq2 v.1.32.078. The significance of differential gene expression was considered by the cut-off threshold of absolute log2 fold change ≥ 1 (upregulated transcripts) or ≤ − 1 (downregulated transcripts) and Benjamini- Hochberg adjusted P-values < 0.05 (false discovery rate).

Functional analyses of transcriptomic data

Arabidopsis orthologs of Brassica transcripts were identified using BLASTx with TAIR protein (https://www.arabidopsis.org/download/index-auto.jsp?dir=/download_files/Proteins) sequences at an e-value of e < 10–3. Pathway mapping and enrichment analysis were carried out using MapMan (https://mapman.gabipd.org/mapman) as described earlier80. A p-value cut-off of  ≤ 0.05 was used for enrichment analysis, followed by Benjamini Hochberg correction with a q-value cut-off of ≤ 0.05. Gene ontology analysis was performed using AgriGO (http://systemsbiology.cau.edu.cn/agriGOv2/c_SEA.php). GO enrichment analysis was performed using a p-value cut of ≤ 0.05 followed by Hypergeometric test with the Bonferroni analysis method. Transcription factors were downloaded from Plant Transcription Factor Database (PTFDB v5.0) (http://planttfdb.gao-lab.org/). Heatmaps were generated using the MeV tool81. For identification of expressed transcripts, k-means clustering was performed using MeV. K-means clustering was performed to obtain 8 clusters using Pearson correlation distance metric with maximum number of iterations set to 1000. To identify candidate genes for seed-related traits, the functional_descriptions related to “seed” traits were downloaded from TAIR (https://www.arabidopsis.org/index.jsp) database. ARALIP (http://aralip.plantbiology.msu.edu/pathways/pathways) was used to identify triacylglycerol synthesis-related genes. For extracting glucosinolate biosynthesis-related genes, BRAD (http://brassicadb.cn/#/) database was used.

The use of plants in the present study complies with international, national, and/or institutional guidelines.