Abstract
Soybean is highly sensitive to flooding and extreme rainfall. Flooding tolerance is a complex quantitative trait whose phenotypic variation is controlled by many genes and their interactions with environmental factors. We previously constructed a gene pool relevant to soybean flooding-tolerant responses by integrating multiple omics and non-omics databases, and selected 144 prioritized flooding tolerance genes (FTgenes). In this study, we propose a comprehensive systems-level framework that uses competitive (hypergeometric test) and self-contained (sum-statistic, sum-square-statistic) pathway-based approaches to identify biologically enriched pathways by evaluating the joint effects of the FTgenes within annotated pathways. The FTgenes were significantly enriched in 36 pathways in the Gene Ontology database. These pathways were related to plant hormones, defense, primary metabolic processes, and system development, all of which play key roles in soybean flooding-induced responses. We further identified nine key FTgenes from important subnetworks extracted from several gene networks of enriched pathways. In a qRT-PCR analysis, the nine key FTgenes were significantly expressed in soybean roots under flooding stress. This systems biology framework shows promise for uncovering key genes underlying the molecular mechanisms of flooding-tolerant responses in soybean, and the results provide a foundation for future gene function analysis.
Introduction
Soybean [Glycine max (L.) Merr] provides abundant flavonoids, plant-based proteins and lipids, and is a major protein source for vegetarians. Its nutritional value derives in part from isoflavones and anthocyanins, both of which are flavonoid compounds1. Isoflavones occur in many kinds of plants, but soybean has a comparatively high content2. Isoflavones have been functionally linked to anti-oxidation, reduction in inflammation, inhibition of free radicals, and cancer prevention3,4,5. Anthocyanins and their main constituents, such as cyanidin-3-O-glucoside, present in soybeans can effectively inhibit lipopolysaccharide, hydrogen peroxide, and pro-inflammatory cytokines, making soybean a natural source of antioxidant and anti-inflammatory compounds6,7,8. Hence, soybeans could be used to boost nutritional content and to develop nutraceutical products and potential therapeutic agents for some pathological diseases.
Soybeans are highly sensitive to growth conditions, particularly in flooding environments9,10. In recent years, global agricultural damage and losses from the changing climate (e.g. flooding) have increased11,12. Extreme torrential rain or sudden heavy rain brought by strong southwesterly air currents or jet streams induced by typhoons has caused severe flooding during the autumn seedling stage of soybean (including edamame) in southern and western areas of Taiwan. In the United States, flooding can occur sequentially during a single crop cycle or independently in the same fields during different years. Over the past 15 years, flooding resulted in $6.2 billion worth of soybean production losses. Loss of soil, nutrients, and pesticides to waterways is a major problem in high agricultural production areas such as the Mid-western United States13,14. In China, the flooding stress of soybean is associated with excess irrigation that impairs water uptake, and soil waterlogging is largely affected by the season15. The total summer crop sown area in 2020 was 26.17 million hectares; the floods affected 23% of the planted area of summer crops and caused 4.3% of crop failure. Facing such highly uncertain climate change, we need a systematic and comprehensive method to obtain the whole picture of defense mechanisms against flooding for breeding stress-tolerant cultivars.
There is general recognition that flooding can be classified into waterlogging, when the water covers only the root system, and submergence, when the water covers both the shoot and the root system, according to water levels above the soil surface16. The present study mainly focuses on submergence. Abiotic stresses can disturb plant growth and adversely affect growth characteristics, for example, leaf etiolation and the number of pods per plant17,18,19. Under the flooding stress, the contents of flavonoid compounds in soybean increase significantly, but the yields decrease simultaneously20,21,22. Also, cell wall maturation, cell wall formation, and plant development will be seriously changed during flooding23,24,25,26. Thus, a better understanding of the physiological mechanisms involved in flooding-induced response and tolerance of soybeans is needed for breeding work.
Mechanisms related to flooding tolerance or response have been investigated and reviewed27,28. At the initial stage of flooding stress in soybean, ATP-citrate lyase and xylosidase decrease while alcohol dehydrogenases and calreticulin increase29. These proteins are related to the tricarboxylic acid cycle, cell wall maturation, alcohol fermentation, and calcium homeostasis26,30,31. Prolonged submergence causes a significant decline in photosynthesis, stomatal conductance, and nutrient absorption by leaves32. Soybean produces abscisic acid (ABA) to regulate protein kinases under hypoxia33. These protein kinases are related to pathways including glycolysis, cell organization, and vesicle transport20,22,33. Proteomic analyses have found that excessive water supply to soybean roots induces an increase in anthocyanin 5-aromatic acyltransferase, anthocyanin malonyltransferase, and isoflavone reductase20,34,35. These enzymes promote the accumulation of isoflavones and anthocyanins, which increases the survival rate after flooding. Although many molecular and physiological mechanisms have been reported, the mechanisms of flooding-induced response and tolerance have yet to be fully clarified for soybean. No studies have been reported that enhance pathway analysis for flooding tolerance and response, a polygenic trait, by introducing multiple genes selected from an integrated knowledge framework in a systematic and comprehensive design36.
Flooding tolerance is a complex quantitative (or polygenic) trait, which is regulated through several biological pathways that are controlled by a number of genes (i.e., polygenes). Many functional mechanisms studies for flooding tolerance in soybean have been reported20,33,37. Most of the studies were based on selected candidate genes that were hypothesis-driven, such as text-mining-based38 and meta-analysis-based39. However, these mechanisms may only partially explain flooding due to a limited understanding of the genetic make-up of a polygenic trait, particularly flooding tolerance. Furthermore, potential biases might have affected the results using the hypothesis-free approach, for example, genome-wide association study (GWAS), because it is challenging to account for variations between germplasms and quantitative trait40. It is also challenging to balance the results between false positives and false negatives in GWAS41. Determining the genetic makeup underlying flooding tolerance in soybean is crucial to precisely identifying mechanisms related to flooding tolerance or responding to stress. Hence, applying pathway-based analysis to selected candidate genes can systematically integrate prior biological knowledge of gene regulating functions and biological pathway information or functional categories to figure out the whole picture of physiological mechanisms for flooding tolerance in soybean. This can reveal a more comprehensive picture at the molecular level than a single marker-based or gene-level analysis.
The main purpose of systems biology is to precisely explore the unknown mechanisms in experimental data containing implicit biological information42. Systematic methods, such as pathway enrichment analysis and network analysis, enable us to understand how response signals are transmitted in a plant cell that is stimulated by an environmental factor. These signals are complicated: a single signal carries little information, but signals considered together in a systematic manner are informative43. Pathway enrichment analysis, a knowledge-based approach, provides biological insights into molecular responses to a trait of interest from integrated omics and non-omics (OnO) data44. Pathway enrichment analysis detects whether particular biological pathways or molecular groups are significantly overrepresented. Networks draw on graph theory and probability theory to succinctly represent the mathematical structure of biological components using a group of nodes (e.g. proteins, genes, pathways) and links (e.g. genetic and/or functional interactions)45. Using available biological knowledge and candidate genes selected from integrated OnO data for network analysis provides great potential to uncover novel information on complex biological networks46.
Methods of pathway enrichment analysis in systems biology can be broadly divided into, but are not limited to, competitive and self-contained methods47. A competitive method, such as the hypergeometric test, compares the association with a trait between two gene sets (i.e., genes in a specific pathway versus genes not in that pathway)48. In contrast, a self-contained method, such as the sum-statistic (SUMSTAT) or the sum-square-statistic (SUMSQ), considers only the association between the genes in a specific pathway and the trait49. Several studies have successfully applied pathway-based analysis to explore potential mechanisms and biological functions for important traits in plants, including cytoplasmic male sterility in soybean50, comparison between a mutant gene and the wild type in soybean51, and high temperature in soybean52. Recently, Naithani et al.53 developed the Plant Reactome, a knowledgebase and resource for pathway-based analysis in plants to address important biological questions and regulatory mechanisms. Many open-access knowledgebases such as Gene Ontology (GO, http://geneontology.org/) and Kyoto Encyclopedia of Genes and Genomes (KEGG, https://www.genome.jp/kegg/kegg2.html) are commonly used worldwide. These functional annotations provide opportunities to access the whole map underlying a specific trait by systematically testing gene sets of unknown function with statistical models54. Networks integrate biological information (e.g. proteins, molecules, pathways) and quantify nucleic acid information, providing information on the associations between several genetic loci and on how genes and pathways interact with each other (i.e., gene modules) to regulate traits. The associations between genes can be visualized as a network composed of nodes and edges, which presents complex associations between genes in a simple way55,56.
Using candidate genes prioritized from integrated OnO databases is a practical and efficient way to reveal enriched pathways and networks for flooding-tolerant responses57.
We previously developed a comprehensive framework to integrate OnO data relevant to flooding-tolerant responses in soybean. A total of 36,705 genes were collected and prioritized according to their magnitude of association with flooding-tolerant responses36. In this study, we introduce a systems biology framework (Fig. 1) that uses pathway enrichment analysis (both the competitive and the self-contained methods) and network analysis to combine the joint effects of the 144 prioritized flooding tolerance genes (i.e., FTgenes) (Fig. 2) and uncover the molecular mechanisms underlying flooding-tolerant responses in soybean. The strategies proposed in this study can improve our understanding of how flooding-tolerance genes act against a flooding event and protect soybean plants from floods in complex biological systems.
Results
Gene-pathway mapping
Expression data for a total of 14,772, 17,017, 19,060, and 18,889 genes (Step 1 in Fig. 1) from soybean roots after 3, 6, 12, and 24 hours (h) of submergence treatments in the RNA-seq database58 were used as the test sets to conduct pathway enrichment analyses. Only pathways (i.e. GO terms) containing at least one FTgene (Fig. 2) were considered, resulting in 417 annotated pathways (Step 1 in Fig. 1) for pathway enrichment analysis.
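The filtering step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the toy annotation dictionary, and the placeholder gene names are hypothetical, and restricting pathway members to genes present in the expression test set is an assumption rather than something the text specifies.

```python
def pathways_with_ftgenes(pathway_to_genes, ftgenes, expressed_genes):
    """Keep only pathways that contain at least one FTgene.

    pathway_to_genes: dict mapping a GO term name to its annotated genes.
    ftgenes: iterable of prioritized flooding tolerance genes.
    expressed_genes: genes present in the expression test set (assumed filter).
    """
    ftgenes = set(ftgenes)
    expressed = set(expressed_genes)
    kept = {}
    for pathway, genes in pathway_to_genes.items():
        members = set(genes) & expressed      # pathway genes found in the test set
        if members & ftgenes:                 # at least one FTgene is a member
            kept[pathway] = members
    return kept


# Toy input with placeholder gene names (not real soybean identifiers).
annotations = {
    "response to hypoxia": ["geneA", "geneB", "geneC"],
    "glycolysis": ["geneC", "geneD"],
    "cell wall": ["geneE", "geneF"],
}
toy_ftgenes = ["geneA", "geneD"]
toy_expressed = ["geneA", "geneB", "geneC", "geneD", "geneE"]

kept = pathways_with_ftgenes(annotations, toy_ftgenes, toy_expressed)
print(sorted(kept))
```

In this toy input, 'cell wall' is dropped because none of its expressed members is an FTgene.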
Gene-wise statistic values
For gene score calculation, we transformed expression-level statistics (i.e. p-values) using base-10 logarithms into gene-wise statistic scores (Step 2 in Fig. 1) to measure changes in gene expression in roots after 3, 6, 12, and 24 h of flooding. The distributions of gene-wise statistic scores were highly skewed to the right (Fig. 3), as seen in microarray data. The expression skewness of the four datasets was 4.82, 4.31, 4.91, and 4.24, respectively, indicating that expression skewness has the potential to reveal new insights into the FTgenes (Fig. 2) in the analyses of pathway enrichment and gene networks.
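As a rough illustration of the score transformation and the skewness measure, the sketch below assumes that the score is the negative base-10 logarithm of the p-value (so that smaller p-values give larger scores) and that skewness is the moment coefficient m3 / m2^1.5. Neither detail is stated explicitly in the text, and the function names and toy p-values are hypothetical.

```python
import math


def gene_scores(p_values, floor=1e-300):
    """Transform p-values into scores; assumes score = -log10(p)."""
    # The floor guards against log10(0) for p-values reported as exactly zero.
    return [-math.log10(max(p, floor)) for p in p_values]


def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Toy p-values: most genes are unremarkable, one is strongly differential.
toy_p = [0.5, 0.4, 0.3, 0.2, 0.1, 1e-6]
toy_scores = gene_scores(toy_p)
print(skewness(toy_scores) > 0)  # a long right tail gives positive skewness
```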
Competitive method (hypergeometric test) revealed the mechanisms of flooding-tolerant responses
Using the hypergeometric test (Step 3 in Fig. 1), we initially found 27 pathways (Fig. 4) with at least one nominal p-value less than 1.00 × 10⁻⁴ that were enriched with flooding tolerance or response to the stress in the gene expression data from soybean roots after 3, 6, 12, and 24 h of submergence treatments. Among them, 24 pathways were significantly enriched at all four time points after submergence treatments. Table 1 provides detailed information on the significantly enriched pathways overrepresented in the gene expression dataset. The top five pathways were ‘abscisic acid mediated signaling pathway’, ‘response to ethylene stimulus’, ‘ethylene biosynthetic process’, ‘hyperosmotic salinity response’, and ‘response to the jasmonic acid stimulus’. Two pathways, ‘abscisic acid mediated signaling pathway’ and ‘response to ethylene stimulus’, were the most significantly enriched at 3 h after submergence treatments. The ‘abscisic acid mediated signaling pathway’ was the most significantly enriched at 6 and 12 h after submergence treatments. The top five pathways were the most significantly enriched at 24 h after submergence treatments.
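For readers unfamiliar with the competitive test, a minimal sketch of a one-sided hypergeometric enrichment p-value is given below. The parameterization (background size N, pathway size K, number of FTgenes n, FTgenes in the pathway k) is the standard over-representation setup and is an assumption about how the test was configured here; the function name and the pathway counts in the example call are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail probability P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background (e.g. genes in one expression test set)
    K: background genes annotated to the pathway
    n: candidate genes (e.g. FTgenes) present in the background
    k: candidate genes annotated to the pathway
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Small worked case: 10 genes, 4 in the pathway, 3 candidates, 2 of them in the pathway.
p_small = hypergeom_enrichment_p(N=10, K=4, n=3, k=2)
print(round(p_small, 4))  # (6*6 + 4*1) / 120 = 0.3333

# Hypothetical pathway of 50 genes containing 8 of 144 candidates in a 14,772-gene background.
p_example = hypergeom_enrichment_p(N=14772, K=50, n=144, k=8)
```

Exact integer binomials keep the tail sum free of rounding error even for backgrounds of this size, at the cost of some speed compared with a log-space implementation.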
Self-contained methods (SUMSTAT, SUMSQ) revealed the mechanisms of flooding-tolerant responses
The 144 FTgenes (Fig. 2) were significantly enriched in fourteen GO pathways (14 in SUMSTAT and 1 in SUMSQ) after controlling the false discovery rate at the 0.05 level in the self-contained approaches (Step 3 in Fig. 1; Fig. 5). Among them, only one GO pathway, ‘response to hypoxia’, was found at all four time points in both methods. Tables 2 and 3 provide detailed information on the significantly enriched pathways overrepresented in the gene expression dataset using SUMSTAT and SUMSQ, respectively. Five GO pathways were significantly enriched at all four time points after submergence treatments. The top five pathways were ‘response to hypoxia’, ‘response to cadmium ion’, ‘systemic acquired resistance, salicylic acid mediated signaling pathway’, ‘regulation of hydrogen peroxide metabolic process’, and ‘glycolysis’. One pathway, ‘response to hypoxia’, was the most significantly enriched at 3 h after submergence treatments. Three pathways, ‘response to hypoxia’, ‘systemic acquired resistance, salicylic acid mediated signaling pathway’, and ‘response to cadmium ion’, were the most significantly enriched at both 6 and 12 h after submergence treatments. The ‘response to wounding’ pathway and the top five pathways were the most significantly enriched at 24 h after submergence treatments. Six pathways (‘response to wounding’, ‘ethylene mediated signaling pathway’, ‘carboxy-lyase activity’, ‘thiamine pyrophosphate binding’, ‘response to cold’, and ‘cell wall’) were not enriched at 3 h but became enriched later, between 6 and 24 h after submergence treatments.
We found that four pathways (‘response to hypoxia’, ‘systemic acquired resistance, salicylic acid mediated signaling pathway’, ‘regulation of hydrogen peroxide metabolic process’, and ‘response to wounding’) were reported in both the competitive and the self-contained approaches. However, the top five pathways of the two approaches did not overlap.
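The self-contained statistics in this subsection can be illustrated with a small permutation sketch. It is not the procedure used in this study: the text does not describe how the null distribution was generated, so the sketch assumes sample-level expression values, a Welch t-statistic per gene, SUMSTAT as the sum of absolute t-values, SUMSQ as the sum of squared t-values, and a null obtained by permuting the flooded/control labels. All names and the toy data are hypothetical.

```python
import random
from statistics import mean, variance


def welch_t(x, y):
    """Welch t-statistic for two groups of expression values."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    return (mean(x) - mean(y)) / se if se > 0 else 0.0


def pathway_stats(expr, labels, genes):
    """Return (SUMSTAT, SUMSQ) for the genes of one pathway."""
    sumstat = sumsq = 0.0
    for gene in genes:
        flooded = [v for v, lab in zip(expr[gene], labels) if lab]
        control = [v for v, lab in zip(expr[gene], labels) if not lab]
        t = welch_t(flooded, control)
        sumstat += abs(t)   # assumed form: sum of absolute gene statistics
        sumsq += t * t      # assumed form: sum of squared gene statistics
    return sumstat, sumsq


def self_contained_test(expr, labels, genes, n_perm=2000, seed=0):
    """Permutation p-values for SUMSTAT and SUMSQ (sample labels shuffled)."""
    rng = random.Random(seed)
    observed = pathway_stats(expr, labels, genes)
    exceed = [0, 0]
    permuted = list(labels)
    for _ in range(n_perm):
        rng.shuffle(permuted)  # same shuffle for all genes keeps gene correlation
        null = pathway_stats(expr, permuted, genes)
        for j in range(2):
            if null[j] >= observed[j]:
                exceed[j] += 1
    return tuple((e + 1) / (n_perm + 1) for e in exceed)


# Toy data: six flooded and six control samples, two placeholder genes.
toy_labels = [True] * 6 + [False] * 6
toy_expr = {
    "geneA": [5.0, 5.2, 4.9, 5.1, 4.8, 5.0, 1.0, 1.1, 0.9, 1.2, 0.8, 1.0],
    "geneB": [3.1, 2.9, 3.0, 3.2, 2.8, 3.0, 1.0, 0.9, 1.1, 1.0, 1.2, 0.8],
}
p_sumstat, p_sumsq = self_contained_test(toy_expr, toy_labels, ["geneA", "geneB"])
```

Because only genes inside the pathway enter the statistic and the null comes from relabeling samples, the test asks whether the pathway is associated with flooding at all, which is the self-contained null hypothesis rather than the competitive one.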
Gene network analysis selects the key genes relevant to flooding-tolerant responses
Since many FTgenes were involved in flooding-tolerant responses, we conducted a functional gene network analysis (Step 4 in Fig. 1) to better understand how these FTgenes work together. Among the 144 FTgenes, 103 were found to have protein–protein interactions (PPIs) in the soybean interactome. Using the functional modules analytic tool in SoyNet, we constructed a gene network specific to flooding-tolerant responses in soybean (Fig. 6; Sheet 1 in Supplementary Table 1). This gene network contained 103 FTgenes and 70 intermediate genes that were highly connected nodes (hubs) in the reference network and were hence recruited into the gene network. The degree values of the 173 genes ranged from 0 to 66, with an average degree of 7.32. Of these, 110 genes (degree values between 0 and 2) and 13 genes (degree values between 3 and 10) had a low degree of centrality and were hence excluded from the gene network. Figure 6 shows a dense gene network containing 50 genes, of which 23 genes had degree values between 20 and 30, indicating a high degree of centrality in the gene network. The 23 genes (highlighted in yellow), comprising eight FTgenes and 15 significant intermediate genes, had an average degree of 25.5. Among them, the eight FTgenes (Glyma.02g222400, Glyma.18g009700, Glyma.13g231700, Glyma.13g361900, Glyma.15g012000, Glyma.07g153100, Glyma.01g118000, and Glyma.15g011900), which were reported by both the competitive and the self-contained pathway analytic strategies (Table 4), had a high degree of centrality, with degree values between 20 and 30 (the average degree was 26). The eight FTgenes are mainly related to signal transduction (Glyma.13g361900, Glyma.15g011900, and Glyma.15g012000), energy production (Glyma.02g222400, Glyma.13g361900, and Glyma.18g009700), and plant hormone regulation (Glyma.15g011900 and Glyma.15g012000), indicating that they play important roles in flooding-tolerant responses in soybean.
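The degree values used above to rank genes can be computed directly from an edge list. The following sketch uses a hypothetical toy network with placeholder node names; it does not reproduce SoyNet or Cytoscape output, and the helper names are illustrative only.

```python
from collections import defaultdict


def node_degrees(edges, nodes=()):
    """Degree of each node in an undirected network.

    Duplicate edges and self-loops are ignored; nodes without any edge
    (passed via `nodes`) are reported with degree 0.
    """
    neighbors = defaultdict(set)
    for node in nodes:
        neighbors[node]                 # register isolated nodes
    for u, v in edges:
        if u == v:
            continue                    # skip self-loops
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {node: len(adj) for node, adj in neighbors.items()}


def select_by_degree(degrees, low, high):
    """Nodes whose degree lies in the closed interval [low, high]."""
    return sorted(n for n, d in degrees.items() if low <= d <= high)


# Toy network with placeholder names; ("g2", "g1") duplicates ("g1", "g2").
toy_edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3"), ("g2", "g1")]
toy_degrees = node_degrees(toy_edges, nodes=["g1", "g2", "g3", "g4", "g5"])
average_degree = sum(toy_degrees.values()) / len(toy_degrees)
hubs = select_by_degree(toy_degrees, 2, 3)
print(toy_degrees, average_degree, hubs)
```

With real data, the same interval filter would correspond to keeping genes with degree values between 20 and 30 and setting aside those with degree values of 10 or less.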
We selected 77 FTgenes from the 24 significantly enriched pathways reported in the hypergeometric test to compute node edges in SoyNet and construct a gene network in Cytoscape. As a result, 103 genes (74 FTgenes and 29 intermediate genes) were retained, with degree values ranging from 0 to 35 (the average degree was 9.68). For simplicity, we grouped 60 genes with a low degree of centrality into a single node (named Group0_2), and combined this node with the remaining 43 genes (15 FTgenes and 28 intermediate genes) to form a gene network (Fig. 7; Sheet 2 in Supplementary Table 1). Of those, 20 genes (highlighted in yellow) had higher degree values between 20 and 30, with an average degree of 28.2, and interacted with each other more closely in the gene network. Among them, one FTgene (Glyma.14g127800) is mainly related to plant hormone transport, which contributes to flooding-tolerant responses.
Edges for another 34 FTgenes from 5 significantly enriched pathways reported in SUMSTAT were computed in SoyNet, resulting in 65 genes (32 FTgenes and 33 intermediate genes), with degree values ranging from 0 to 71 (the average degree was 17.94). We further constructed a gene network in Cytoscape (Fig. 8; Sheet 3 in Supplementary Table 1), and observed that 29 genes (highlighted in yellow) were highly connected and formed a dense module, with higher degree values between 20 and 29 (the average degree was 25.5). Among them, 6 FTgenes are mainly related to signal transduction (Glyma.15g011900, Glyma.15g012000), plant hormone transport (Glyma.14G127800, Glyma.18G009700), and enzyme catalytic activity (Glyma.13G231700, Glyma.07G153100).
The above results show closely connected PPIs in the soybean interactome by entering the 144 FTgenes, 24 enriched pathways at all four-time points (3, 6, 12, and 24 h) in the hypergeometric test, and 5 enriched pathways at all four-time points in the SUMSTAT method, respectively. These genes were first compared to 23 (Fig. 6), 20 (Fig. 7), and 29 (Fig. 8) selected important genes (including the FTgenes and the intermediate genes) to examine their topological characteristics. Our results showed that these important genes had higher degree values in all comparisons, suggesting high degree of centrality. These FTgenes (103, 77, and 34 FTgenes) were further compared to the intermediate genes (70, 29, and 33 genes) and the remaining genes (14,599, 16,911, and 18,993 genes), respectively. We found that the FTgenes and the intermediate genes in the corresponding gene network more frequently received small p-values at all four-time points in gene expression datasets.
To further explore the key FTgenes, we selected 25 FTgenes from 5 significantly enriched pathways that overlapped in both the hypergeometric test and the SUMSTAT method to compute node edges in SoyNet and construct a gene network in Cytoscape. As a result, a gene network containing 25 FTgenes and 26 intermediate genes was obtained (Fig. 9; Sheet 4 in Supplementary Table 1), with degree values ranging from 0 to 28 (the average degree was 13.68). Of these, 26 genes (highlighted in yellow) were closely linked, having higher degree values between 20 and 30, with an average degree of 25.3. Among them, four FTgenes (Glyma.13g361900, Glyma.15g012000, Glyma.15g011900, and Glyma.14g127800) exhibited higher degree values ranging between 26 and 28, with an average degree value of 26.3. The four FTgenes are mainly related to signal transduction (Glyma.13g361900, Glyma.15g011900, Glyma.15g012000) and plant hormone transport (Glyma.14g127800), and play key roles in flooding-tolerant responses in soybean.
More importantly, all these FTgenes in the corresponding gene networks had significantly larger mean scores (or smaller p-values) in the corresponding gene expression datasets (p-values < 0.001) compared to the remaining genes (Fig. 10). Similar patterns were also observed in the intermediate genes, although they did not reach the significance level of 0.05. In particular, we further selected 8, 1, and 6 key FTgenes from the corresponding gene networks, respectively, and found that these key FTgenes significantly outperformed all gene groups (Step 5 in Fig. 1; Fig. 10). The nine key FTgenes (Fig. 11A) were involved in signal transduction (Glyma.15g012000 and Glyma.15g011900), energy (Glyma.02g222400, Glyma.18g009700, Glyma.13g361900, and Glyma.14g127800), enzyme activity (Glyma.07g153100 and Glyma.13g231700), and unknown function (Glyma.01g118000), and were significantly related to abscisic acid transport and terpenoid transport.
To validate the flooding stress responses in the plant cell, real-time quantitative reverse transcription polymerase chain reaction (qRT-PCR) was used to measure the expression levels of the nine key FTgenes in soybean roots under flooding stress (Step 6 in Fig. 1; Fig. 12). Of the four energy-related genes, three (Glyma.02g222400, Glyma.18g009700, and Glyma.13g361900) were significantly upregulated from 3 to 24 h, whereas Glyma.14g127800 was downregulated under all conditions compared with the control (i.e. untreated condition). For the genes related to enzyme activity (Glyma.07g153100 and Glyma.13g231700), the highest expression was found at 12 h after treatment. For the genes related to signal transduction, the transcript levels of Glyma.15g011900 and Glyma.15g012000 were significantly higher than the control at 3–24 h and 6–24 h, respectively. Interestingly, Glyma.01g118000, a gene of unknown function, exhibited an expression level around 330–7000 times higher than the control, and it also had the highest relative expression among the nine genes.
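Fold changes of the kind reported above are commonly derived from qRT-PCR cycle threshold (Ct) values with the 2^(−ΔΔCt) calculation. The sketch below shows that calculation only as an illustration: this section does not state which quantification method or reference gene was used, so the method, the function name, and the Ct values are all assumptions.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change by the 2^(-ddCt) calculation (assumed method).

    Each argument is a mean Ct value; 'ref' denotes a reference
    (housekeeping) gene measured in the same sample.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated   # normalize treated sample
    d_ct_control = ct_target_control - ct_ref_control   # normalize control sample
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: the target amplifies 3 cycles earlier after flooding.
fold = relative_expression(22.0, 18.0, 25.0, 18.0)
print(fold)  # 2 ** 3 = 8.0
```

A fold change above 1 indicates upregulation relative to the control and a value below 1 indicates downregulation, assuming roughly 100% amplification efficiency for both genes.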
Discussion
Understanding genetic backgrounds and molecular mechanisms underlying flooding-tolerant responses is imperative for soybean breeding. However, the success in identifying candidate genes for flooding-tolerant responses in soybean has been limited because of the complex nature of abiotic stresses. The present study introduced systems biology methods using pathway enrichment analysis (both competitive and self-contained approaches were considered) and gene network analysis to evaluate the joint effects of multi-genes (in this context, FTgenes) within annotated GO pathways. Most importantly, the FTgenes36 (Fig. 2) used in this study were prioritized from multiple OnO databases integrated from experimental and computational studies that have been made available in the last decades. In particular, several data-ensemble approaches were performed, including data cleaning, data harmonization, data heterogeneity, and data mapping, to remove unwanted data and inaccurate data. Through the process of gene prioritization, the uncertainties, noise, biases, and false positives raised from the data itself and statistical approaches could be reduced effectively.
Systems biology often requires sophisticated computational models and simulations to understand the larger picture of biological systems by studying interactions among a set of candidate genes59. Integrative pathway and network analysis combines mathematical graph theory with data-driven approaches (e.g. multiple omics data, OnO data integration) to efficiently uncover genotype–phenotype relationships at the systems level by integrating knowledge of gene regulation and function. In an attempt to integrate OnO data with mathematical graph theory, we introduced an integrative pathway and network approach to construct a comprehensive view of the biological mechanisms for flooding-tolerant responses in soybean. To the best of our knowledge, this is the first work on pathway and network analyses using candidate genes prioritized from multiple OnO data integration algorithms. Our results reveal novel molecular pathways and functional relationships of the FTgenes, which help to clarify their biological implications in the regulatory system and can be validated further.
Skewness is widely used to measure the degree of asymmetry in expression data. Expression skewness can identify novel molecular pathways and key genes via the systems biology approaches (e.g. enrichment pathway analysis and functional network analysis), which is a valuable way to capture meaningful outliers (i.e. the greatest variation between samples with and without submergence) and asymmetrical behavior in the whole genome expression dataset60. Our results demonstrated a high degree of skewness (Fig. 3) that was appropriate for pathway and network analyses. In addition, our results may provide valuable insights into exploring mechanisms underlying flooding-tolerant responses in soybean.
In the studies of crops, pathway analysis is merely used for the exploration of candidate genes focusing on specific traits58,61. In general, pathway analysis can be distinguished into two different approaches, the competitive and the self-contained, according to their null hypothesis48. In practical applications, however, two different approaches often generated inconsistent results62 due to distinct null hypotheses. The competitive methods can potentially exclude confounding effects and provide biological relevance to the analysis63. The self-contained methods have the greater power to identify feature-set (i.e. GO pathways), and the outcomes are highly reproducible64. Both approaches have their strengths and limitations. Therefore, a suitable way to gain better insights into the data is to perform the competitive and the self-contained approaches simultaneously for feature-set (i.e. GO pathways) testing. This could reduce the likelihood of false-positive results and gain biological relevance to the analysis.
In this study, we identified 36 overrepresented GO pathways (Tables 1, 2, 3) in the independent RNA-seq databases of submergence treatments in soybean. The most frequently shared FTgenes among enriched pathways were Glyma.02g195300 (functioning in 23 pathways), Glyma.05g021100 (functioning in 20 pathways), Glyma.08g218600 (functioning in 20 pathways), and Glyma.14g102900 (functioning in 20 pathways), which were found in two or more pathway-based methods (Table 4). Many of these FTgenes (6, 18, 10, and 12 FTgenes at 3, 6, 12, and 24 h, respectively) were not significantly overrepresented at the single-gene level in the RNA-seq databases (Fig. 3); however, they were enriched (p-values < 0.001) with flooding-tolerant responses at the systems level using pathway-based analytic approaches. For instance, the Glyma.17g236200, Glyma.19g013700, and Glyma.11g180500 genes did not reach genome-wide significance, but were detected at the systems level by our approaches. In particular, the Arabidopsis GO and UniProt GO databases provide opportunities to better understand how these FTgenes participate in flooding activities. Under flooding conditions, the Glyma.17g236200 gene regulates root development to prevent wounding, and the Glyma.19g013700 gene mediates transpiration efficiency by regulating ABA to control stomatal closure. Furthermore, the Glyma.11g180500 gene participates in RNA regulation, producing factors that control where plant hormones act. Evidence from previous studies confirmed the roles of these important FTgenes and pathways identified in this study in the complex mechanisms of flooding-tolerant responses in soybean. These findings indicate that systems biology methods can boost the power to reveal the potential roles of FTgenes in uncovering the molecular mechanisms and biological novelties of flooding-tolerant responses in soybean.
The hypotheses and models of the different categories of pathway analysis are distinct; hence, their results also differ. In this study, we compared the results across the hypergeometric test, the SUMSTAT method, and the SUMSQ method. In total, 27, 14, and 1 enriched pathway(s) were identified in the hypergeometric test (Table 1, Fig. 4), the SUMSTAT method (Table 2, Fig. 5A), and the SUMSQ method (Table 3, Fig. 5B), respectively. The three methods found only one pathway, ‘response to hypoxia’, at all four time points (3, 6, 12, and 24 h) of gene expression data. Under flooding conditions, the response to hypoxia begins with low-oxygen stimulation, followed by activation of the transcription of plant hormone genes. Plant hormones, such as ABA, ethylene, and salicylic acid, participate in root recovery65. Five pathways were consistently reported by both the competitive and the self-contained approaches, even when a more stringent threshold was applied to correct for multiple testing. The ‘systemic acquired resistance, salicylic acid mediated signaling pathway’ is responsible for regulating biosynthesis, perception, and signal mediation66. When a plant suffers from flooding, hydrogen peroxide is produced in roots to remove some harmful chemicals arising from flooding stress, and salicylic acid mediates a series of signals in producing hydrogen peroxide66,67. Evidence shows that the regulation of hydrogen peroxide interacts with salicylic acid through a signaling cascade to eliminate harmful chemicals in root cells under flooding stress66,67,68. Thus, ‘regulation of hydrogen peroxide metabolic process’ and ‘systemic acquired resistance, salicylic acid mediated signaling pathway’ are linked to flooding-tolerant responses in soybean. In addition, these two pathways have also been shown to be involved in the response of rice to flooding stress69,70. In soybean roots, the cell wall and aerenchyma swell under flooding.
The response to wounding in roots activates many salicylic acid signals, which interact with other plant hormones in order to repair the wounds71. After wounding, the soybean’s adventitious roots grow in response to hypoxic environments. The energy from glycolysis and pyruvate phosphorylation is consumed when soybean grows adventitious roots. Evidence showed that ‘response to wounding’ and plant hormone-related pathways may play key roles in flooding-tolerant responses in soybean. In addition, gluconeogenesis and glycolysis, which synthesize or degrade carbohydrates, allow crops to gain and store adequate ATP in order to obtain more energy20,21,22,33. All the evidence suggested that ‘glycolysis’, ‘gluconeogenesis’, ‘pyruvate decarboxylase activity’, ‘abscisic acid mediated signaling pathway’, and ‘regulation of hydrogen peroxide metabolic process’ are linked to flooding-tolerant responses in soybean, in line with previous studies20,21,22,33. All the pathways mentioned above play key roles in the physiological mechanisms underlying flooding-tolerant responses. Our results demonstrated that the pathways identified differ considerably between distinct types of pathway analyses. Hence, combining distinct pathway-based analyses that rest on different hypotheses and models can provide comprehensive, precise, and reliable results.
Ethylene is important for protein phosphorylation at the initial stage of flooding stress, especially in root tips. Evidence shows that root recovery requires more ATP to provide energy and more protein phosphorylation to develop the cell tissue25,28,29,37. At the initial stage of flooding, root cells are stimulated by ethylene, and a series of mediated signals produces more ethylene. This evidence indicates that ‘response to ethylene stimulus’, ‘ethylene biosynthetic process’, and ‘ethylene mediated signaling pathway’ are important in flooding stress.
Our results also showed 22 (Table 1, Fig. 4) and 9 (Table 2, Fig. 5A) enriched pathways specific to the hypergeometric test and the SUMSTAT method, respectively. Without comparing the results of the two approaches, we might obtain false-positive and false-negative results. For instance, two pathways, ‘glycolysis’ and ‘gluconeogenesis’, are supported by previous evidence19,20,21,33 and were found by the SUMSTAT approach but not by the hypergeometric test; hence, they are false negatives of the hypergeometric test. Likewise, the ‘abscisic acid mediated signaling pathway’ is supported by previous evidence22,25,28,33 and was reported by the hypergeometric test but not by SUMSTAT; hence, it is a false negative of SUMSTAT. Two pathways, ‘response to chitin’ and ‘response to fungus’, were significantly enriched in the hypergeometric test but did not reach significance in the SUMSTAT approach. These two pathways have been reported to be related to biotic stress72. Hence, they might be false-positive results or novel findings that need further validation. Again, combining the competitive and the self-contained methods is a promising approach to better understand a given set of candidate genes for a trait of interest.
By combining the advantages of pathway enrichment analysis and network analysis, our study not only provides new insights into the flooding mechanisms in soybean but also captures more information about the biological system. For instance, of the nine key FTgenes finally selected, four (Glyma.13g361900, Glyma.14g127800, Glyma.15g012000, and Glyma.15g011900) were recorded in the DNA, RNA, protein, function, and homolog layers; one (Glyma.18g009700) was recorded in the RNA and protein layers; and four (Glyma.02g222400, Glyma.07g153100, Glyma.13g231700, and Glyma.01g118000) were recorded in the RNA, protein, and homolog layers36 (Fig. 11B). These key genes may play important roles in coordinating physiological mechanisms underlying flooding-tolerant responses in soybean.
The systems biology framework proposed in this study demonstrated its power to identify the nine key FTgenes in a rigorous and efficient manner. To validate the nine key FTgenes, in planta expression analysis was performed. Our qRT-PCR results (Fig. 12) revealed that eight key FTgenes were upregulated and one FTgene was downregulated after exposure to flooding stress. The results demonstrated the distinct and differential response of soybean root tissue under flooding, offering genetic and molecular evidence of a genuine response to flooding. Our results provide a good foundation for future gene function analysis of flooding-tolerant responses.
Although systems biology offers a comprehensive and systematic understanding of the FTgenes in flooding-tolerant responses in soybean, this study has some limitations and considerations. First, pathway and network analyses depend on the completeness of gene and pathway annotation. The research teams of GO and PlantRegMap update the databases and maintain the websites annually, which helps keep the databases complete and accurate. Second, the accuracy of pathway and network analyses relies on the accuracy and completeness of the FTgenes. Fortunately, our FTgenes were selected through a comprehensive framework consisting of omics and non-omics data integration and a gene prioritization algorithm. Several data quality control processes were applied during the data-ensemble step to reduce potential uncertainties, noise, and false-positive results. Although our FTgenes are informative, more validation experiments are required.
Flooding tolerance is a quantitative trait regulated by polygenes; thus, many traditional single-marker methods, such as association mapping, linkage mapping, and genome-wide association study, have limited power to uncover the whole picture of how these genes interact with each other to regulate the trait. Our proposed systems biology framework can efficiently integrate gene information with the annotated GO database to boost the power of identifying key FTgenes and their underlying molecular pathways or mechanisms. This provides an opportunity to better understand complex flooding-tolerant responses. These findings present a wealth of information for future validation.
Methods
We developed an integrative systems biology framework to gain insights into the FTgenes underlying flooding-tolerant responses in soybean. A six-step pipeline (Fig. 1) was proposed to select the key FTgenes, comprising GO annotation filtering, gene-wise statistic score calculation, pathway enrichment analysis, functional network analysis, a validation study, and key gene selection. Detailed methods and materials used in this study are described below.
Candidate genes for flooding-tolerance (FTgenes)
We previously proposed a comprehensive framework for mining, integrating, and prioritizing multiple omics and non-omics (OnO) data36. All genetic data (SNPs, genes, SSRs, QTLs) and bioinformatics information (trait index, variety, biochemical, statistical values) relevant to flooding-tolerant responses in soybean were collected and defined as a flooding-tolerance gene pool (containing 36,705 genes). These OnO data were integrated from multidimensional data platforms, including association mapping and GWAS, linkage mapping, gene expression, pathway regulation, network analysis, protein–protein interaction, proteomic analysis, and model plants. Through the systems biology framework, a total of 144 prioritized FTgenes (Fig. 2) were selected from the gene pool based on a cut-off score of 4236. The FTgenes were defined as genes significantly associated or enriched with flood tolerance or flood response after flooding treatment (i.e., submergence) during the germination and vegetative growth stages of soybean. The study framework and the prioritized results, the data of which are used here, are provided elsewhere36.
Gene expression dataset and gene-wise statistic values
The gene expression dataset (whole-genome expression data) of soybean seedlings under submergence, published by Lin et al.58, was accessed through the database of Genotypes and Phenotypes (dbGaP, https://www.ncbi.nlm.nih.gov/gap/). They used cultivar Qihuang 34, a flooding-resistant variety, for submergence experiments and recorded gene expression changes in roots after 3, 6, 12, and 24 h of submergence treatment. RNA expression data for all four time points were obtained from the dbGaP repository. We used the p-value of each gene, computed under the null hypothesis of no differential gene expression, as the gene-level statistic of flooding tolerance in soybean. To obtain gene-level significance, we applied a base-10 logarithmic transformation to the p-values, yielding gene-wise statistic scores that capture gene expression changes in roots after 3, 6, 12, and 24 h of flooding.
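The transformation from p-values to gene-wise statistic scores can be illustrated with a minimal sketch. It assumes the negative base-10 logarithm, so that smaller p-values give larger scores; the function name, the floor used to avoid taking the logarithm of zero, and the example p-values are hypothetical and not taken from the study.

```python
import math

def gene_wise_score(p_value, floor=1e-300):
    """Return -log10(p), so that smaller p-values give larger scores."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    # Guard against log10(0); the floor value is an illustrative assumption.
    return -math.log10(max(p_value, floor))

# Hypothetical p-values for one gene at the four submergence time points (h).
p_by_hour = {3: 0.01, 6: 0.001, 12: 0.05, 24: 1.0}
scores = {hour: gene_wise_score(p) for hour, p in p_by_hour.items()}
```

Under this assumption, a gene with p = 0.01 receives a score of 2 and a gene with p = 1 receives a score of 0.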
Pathway annotations
To perform mapping for functional pathway analysis, we used GO73,74 (http://geneontology.org/) annotations. GO-based functional annotation in soybean contains 4896 terms covering 48,606 unique genes, mapped in the Williams 82 reference genome version 2 (Glycine max Wm82.a2.v1). These annotated GO gene sets (i.e. pathways) systematically provide a standard catalogue to classify functional genes into biological functions and molecular mechanisms. Pathways with overly limited information (< 6 genes) were removed, as well as substantially large (> 1500 genes) pathways. As a result, a total of 2926 pathways, which consist of 916 cellular components, 762 biological processes, and 1248 molecular functions, remained for pathway analysis. In pathway analysis, we used the negative logarithm of these 2926 pathways’ p-values as our statistic.
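The size filter on GO gene sets can be sketched as follows. This is only an illustration under the stated thresholds (gene sets with fewer than 6 or more than 1500 genes are dropped); the function name, term identifiers, and gene names are hypothetical.

```python
def filter_pathways(gene_sets, min_size=6, max_size=1500):
    """Keep gene sets whose number of unique genes lies in [min_size, max_size]."""
    kept = {}
    for term, genes in gene_sets.items():
        unique_genes = set(genes)
        if min_size <= len(unique_genes) <= max_size:
            kept[term] = unique_genes
    return kept

# Hypothetical toy annotation; term IDs and gene names are made up.
toy_annotation = {
    "GO:A": [f"gene{i}" for i in range(3)],     # too small (< 6 genes)
    "GO:B": [f"gene{i}" for i in range(40)],    # retained
    "GO:C": [f"gene{i}" for i in range(2000)],  # too large (> 1500 genes)
}
kept = filter_pathways(toy_annotation)
```

Applied to the toy annotation, only the middle-sized term is retained for pathway analysis.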
Statistical methods for pathway enrichment analysis
We utilized two different strategies, the competitive method and the self-contained method47, to test for significantly enriched pathways for the trait of flooding tolerance in soybean. Three statistical methods, namely the hypergeometric test (competitive method), SUMSTAT, and SUMSQ (self-contained methods), were used to assess the significance of enriched pathways. The first method compares two gene sets (genes inside and outside a pathway) in terms of association with a phenotypic trait based on a statistical probability model, whereas the latter two methods test only the association between a phenotypic trait and the genes in a pathway.
The hypergeometric test, which assumes that an experimentally derived gene list is drawn at random conditional on a fixed pathway, is a widely used competitive method for pathway analysis75. The null hypothesis of the test is that genes in a pathway are no more strongly associated with the phenotypic trait than those outside the pathway. The main idea of the test is to sample randomly, without replacement, from a finite population and to calculate the statistic for the characteristic of interest (here, flooding tolerance). Hence, this method tests whether annotated pathways (i.e. biological functions or processes), which are functionally related, are enriched or over-represented in a list of important genes (i.e. FTgenes) for the trait of interest. In its standard upper-tail form, the p-value can be computed by
$$p = P(X \ge x) = 1 - \sum_{g=0}^{x-1} \frac{\binom{S}{g}\binom{L-S}{M-g}}{\binom{L}{M}},$$
where L is the total number of genes in the finite population, M is the number of important genes, S is the number of genes in a specific pathway, x is the observed number of important genes in that pathway, and g is the summation index, i.e. the number of genes among the M important genes that fall in the pathway.
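As a rough illustration of the competitive test, the sketch below evaluates the upper-tail hypergeometric probability P(X ≥ x) with exact integer arithmetic, using the symbols L, M, S, and x as defined above. The function name and the toy numbers are hypothetical, and this is not the code used in the study.

```python
from math import comb

def hypergeom_enrichment_p(L, M, S, x):
    """Upper-tail probability P(X >= x) of drawing at least x pathway genes
    when M genes are sampled without replacement from L genes, S of which
    belong to the pathway."""
    if not (0 <= S <= L and 0 <= M <= L):
        raise ValueError("require 0 <= S <= L and 0 <= M <= L")
    upper = min(M, S)  # largest possible overlap
    if x > upper:
        return 0.0
    total = comb(L, M)
    # comb(n, k) is 0 when k > n, so impossible overlaps contribute nothing.
    tail = sum(comb(S, g) * comb(L - S, M - g) for g in range(max(x, 0), upper + 1))
    return tail / total

# Toy example: 20 genes in total, 5 important genes, a pathway of 6 genes,
# and 3 important genes observed in the pathway.
p_toy = hypergeom_enrichment_p(L=20, M=5, S=6, x=3)
```

For the toy numbers, the tail sums to 2036 favourable draws out of 15,504, giving a p-value of about 0.131.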
The idea of the self-contained methods is to use a large number of permutations to generate a null distribution. We compared the genes in a specific pathway with random sets sampled from the null distribution and calculated an empirical p-value for pathway analysis. The tests ignored genes not in the pathways. The present study applied two self-contained methods, SUMSTAT and SUMSQ76. Under the null hypothesis (the pathway is unrelated to the trait), we tested whether the observed gene set (i.e. pathway) outperforms the random gene sets generated by permutations. The enrichment score (ES) of SUMSTAT and SUMSQ can be expressed as
$$ES_{\mathrm{SUMSTAT}} = \sum_{i=1}^{S} t_i, \qquad ES_{\mathrm{SUMSQ}} = \sum_{i=1}^{S} t_i^{2},$$
where t_i is the i-th value of the statistic (in this context, an expression metric, e.g. p-value or fold-change) of the FTgenes, and S is the number of genes in a specific pathway.
The analysis pipeline of SUMSTAT and SUMSQ consists of calculating the ES of the observed soybean gene sets, randomly permuting the statistics calculated from the gene expression data, calculating the permuted ES, and deriving the association p-value. The ES represents the association signal of each annotated pathway; ES_SUMSTAT and ES_SUMSQ sum over all statistics and all squared statistics, respectively, of a gene set (i.e. GO pathway) containing S FTgenes. We randomly shuffled the statistics calculated from the gene expression data for each pathway and followed the same procedure as above to calculate a permuted ES. Then, we normalized the ES by subtracting the mean of the permuted ESs and dividing by the standard deviation of the permuted ESs. Finally, we calculated empirical p-values by comparing the observed ES with the permuted ESs from 10,000 permutations for all pathways.
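The permutation procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: random gene sets are drawn from all gene-wise scores, which is one way to shuffle the statistics, and the "+1" correction in the empirical p-value, the function names, and the toy scores are all assumptions.

```python
import random
import statistics

def enrichment_score(values, squared=False):
    """SUMSTAT sums the statistics; SUMSQ sums their squares."""
    return sum(v * v for v in values) if squared else sum(values)

def permutation_test(scores, pathway_genes, squared=False, n_perm=10_000, seed=0):
    """Return (observed ES, normalized ES, empirical p-value) for one pathway."""
    rng = random.Random(seed)
    members = [g for g in pathway_genes if g in scores]
    observed = enrichment_score([scores[g] for g in members], squared)
    all_values = list(scores.values())
    # Null distribution: ES of random gene sets of the same size.
    permuted = [
        enrichment_score(rng.sample(all_values, len(members)), squared)
        for _ in range(n_perm)
    ]
    mean = statistics.fmean(permuted)
    sd = statistics.pstdev(permuted)
    normalized = (observed - mean) / sd if sd > 0 else 0.0
    # The +1 terms keep the empirical p-value away from exactly zero (assumption).
    n_extreme = sum(es >= observed for es in permuted)
    p_empirical = (n_extreme + 1) / (n_perm + 1)
    return observed, normalized, p_empirical

# Hypothetical scores: 10 strongly responding genes among 200.
scores = {f"g{i}": (5.0 if i < 10 else 0.5) for i in range(200)}
pathway = [f"g{i}" for i in range(10)]
observed, normalized, p_empirical = permutation_test(scores, pathway, n_perm=2000)
observed_sq, _, p_empirical_sq = permutation_test(scores, pathway, squared=True, n_perm=2000)
```

In the toy data the pathway contains exactly the ten high-scoring genes, so the observed ES lies far in the upper tail of the permuted ESs and the empirical p-value is small.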
Functional gene network analysis
A graphical model of a network is composed of nodes and edges. Nodes can be defined as genes, proteins, metabolites, or annotated pathways. Edges typically represent connections between nodes. In network analysis, the degree is the most widely used measure to describe the connections of the nodes in a network. In this study, we defined nodes as the FTgenes and calculated edges using the sum of the log-likelihood scores in the SoyNet functional gene network tool (https://www.inetbio.org/soynet/Network_nfm_form_conv.php). For detailed steps of the network link calculation, please refer to Berger et al.77 and Kim et al.78. We further used Cytoscape v3.9.079 to integrate molecular interaction network data and visualize the graphical model of the network.
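The degree of each node can be computed directly from an edge list, as in the small sketch below. The edge list, gene names, and scores are made up for illustration; in practice the edges would come from the network tool's output, whose exact format is not assumed here.

```python
from collections import defaultdict

def node_degrees(edges):
    """Count distinct neighbours of each node in an undirected network."""
    neighbours = defaultdict(set)
    for gene_a, gene_b, _score in edges:
        if gene_a == gene_b:
            continue  # ignore self-loops
        neighbours[gene_a].add(gene_b)
        neighbours[gene_b].add(gene_a)
    return {node: len(linked) for node, linked in neighbours.items()}

# Hypothetical edges: (gene, gene, log-likelihood score).
edges = [
    ("geneA", "geneB", 2.1),
    ("geneA", "geneC", 1.4),
    ("geneB", "geneC", 3.0),
    ("geneA", "geneD", 1.1),
]
degrees = node_degrees(edges)
# Rank nodes from most to least connected (ties broken alphabetically).
ranked = sorted(degrees, key=lambda node: (-degrees[node], node))
```

Highly connected nodes at the top of such a ranking are natural candidates when looking for hub genes in a subnetwork.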
Multiple testing correction
To account for multiple testing in pathway analysis, we applied both the Benjamini–Hochberg correction method80 and the Bonferroni correction method to balance false-positive and false-negative results. The Benjamini–Hochberg procedure controls the false discovery rate at the 0.05 level in the current study, assuming that p-values are independently distributed under the null hypothesis. Only pathways reaching the genome-wide significance threshold of a p-value less than 1.00 × 10⁻⁴ were considered significantly enriched.
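Both corrections can be written in a few lines. The sketch below returns adjusted p-values in the input order; the function names and the example p-values are hypothetical, and the code is a generic illustration of the two procedures rather than the software used in the study.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical pathway p-values.
p_values = [0.01, 0.04, 0.03, 0.005]
p_bonferroni = bonferroni(p_values)
p_bh = benjamini_hochberg(p_values)
# Pathways passing a false discovery rate of 0.05 under Benjamini-Hochberg.
significant = [i for i, p in enumerate(p_bh) if p < 0.05]
```

The Bonferroni adjustment is the more conservative of the two: in the toy example it leaves two of four p-values above 0.05, whereas all four remain below 0.05 after the Benjamini–Hochberg adjustment.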
Validation for the key FTgenes
Soybean seeds of the Chiangmai 60 cultivar were obtained from Thanya Farm Co., Ltd., Nonthaburi, Thailand. The seeds were surface-sterilized in 1% sodium hypochlorite and rinsed with distilled water three times. Seed culture and stress treatments followed Lin et al.58 with some modifications. Ten seeds were sown in sandy soil in a plastic pot (240-mm length × 240-mm width × 190-mm depth). A total of eight pots were sown. Five seedlings of the same size were retained in each pot. When two true leaves were fully unfolded (~ 8 days), the pots of seedlings were transferred into new plastic containers filled with water. Samples were collected at 3, 6, 12, and 24 h. Untreated plants were used as the control. The roots were collected and immediately frozen in liquid nitrogen for RNA extraction.
Total RNA was isolated from root and shoot with TRIzol reagent according to the manufacturer’s protocol. Subsequently, 5 µg of the total RNA was mixed with 500 ng of oligo(dT)18 and 200 U Superscript™ III reverse transcriptase (Invitrogen), and the mixture was reverse transcribed at 50 °C for 60 min. Real-time PCR was performed following the manufacturer’s protocol for the Luna® Universal qPCR Master Mix (NEB) with the gene-specific primers listed in Supplementary Table 2. After the PCR had finished, PCR specificity was examined on a 2% agarose gel, and the relative gene expression ratios were calculated using the 2^(−ΔΔCT) method with cDNA from untreated plants as the reference sample and actin as the reference gene. All experiments were done in biological triplicates.
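The relative expression calculation can be sketched as below: the target gene's Ct is first normalized to the reference gene (actin) within each sample, and the treated sample is then compared with the untreated control. The function name and the Ct values are hypothetical and serve only to show the arithmetic.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative expression ratio by the 2^(-ddCt) method."""
    delta_treated = ct_target_treated - ct_ref_treated   # dCt, flooded sample
    delta_control = ct_target_control - ct_ref_control   # dCt, untreated sample
    delta_delta = delta_treated - delta_control          # ddCt
    return 2.0 ** (-delta_delta)

# Hypothetical Ct values: target gene and actin in flooded and untreated roots.
fold_change = relative_expression(
    ct_target_treated=22.0, ct_ref_treated=18.0,
    ct_target_control=25.0, ct_ref_control=18.0,
)
```

With these made-up values, ΔΔCT is −3, which corresponds to an eightfold higher expression in the flooded sample than in the control.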
To test for significant differences in the transcript quantities of the FTgenes under flooding stress, statistical analysis was performed using one-way ANOVA followed by Tukey’s HSD test in IBM SPSS Statistics software. p-values less than 0.05 were considered statistically significant.
Conclusions
This study sheds new light on the effectiveness of a systems biology framework, based on FTgenes selected from integrated OnO data and a gene prioritization algorithm, for uncovering the mechanisms behind flooding-tolerant responses in soybean. We proposed a computational systems biology pipeline to discover enriched pathways and nine key genes that responded to flooding stress in our qRT-PCR experiments. This work suggests that the integrative pathway and network framework at the systems biology level can be a good foundation for key gene discovery and gene function analysis in future work. In addition, this pipeline can reduce potential uncertainties and false positives and provide valuable insights into the mechanisms underlying flooding-tolerant responses in soybean. The framework presented in this work can be applied to other complex traits in important crops.
Data availability
The raw RNA-seq data of the whole-genome gene expression dataset of soybean seedling submergence can be accessed in the NCBI Sequence Read Archive (SRA) under accession number SRP181976. The data of the flooding-tolerance genes (FTgenes) presented in the study are deposited in the DRYAD repository under the digital object identifier (DOI) https://doi.org/10.5061/dryad.dv41ns229. The dataset of the FTgenes is available at https://datadryad.org/stash/share/yfjZHzx6Oal5UyUr87EISoC6txczBChObdEOYAwSbTE. Soybean seeds were obtained from Thanya Farm Co., Ltd., Nonthaburi, Thailand.
References
Terahara, N. Flavonoids in foods: A review. Nat. Prod. Commun. 10, 521–528 (2015).
Kim, E. H., Ro, H. M., Kim, S. L., Kim, H. S. & Chung, I. M. Analysis of isoflavone, phenolic, soyasapogenol, and tocopherol compounds in soybean Glycine max (L.) Merrill germplasms of different seed weights and origins. J. Agric. Food Chem. 60, 6045–6055 (2012).
Beavers, K. M., Jonnalagadda, S. S. & Messina, M. J. Soy consumption, adhesion molecules, and pro-inflammatory cytokines: A brief review of the literature. Nutr. Rev. 67, 213–221 (2009).
Hernandez-Montes, E. et al. Activation of glutathione peroxidase via Nrf1 mediates genistein’s protection against oxidative endothelial cell injury. Biochem. Biophys. Res. Commun. 346, 851–859 (2006).
Suzuki, K. et al. Genistein, a soy isoflavone, induces glutathione peroxidase in the human prostate cancer cell lines LNCaP and PC-3. Int. J. Cancer 99, 846–852 (2002).
Ali, T. et al. Natural dietary supplementation of anthocyanins via PI3K/Akt/Nrf2/HO-1 pathways mitigate oxidative stress, neurodegeneration, and memory impairment in a mouse model of Alzheimer’s disease. Mol. Neurobiol. 55, 6076–6093 (2018).
Min, J. Y. et al. Neuroprotective effect of cyanidin-3-O-glucoside anthocyanin in mice with focal cerebral ischemia. Neurosci. Lett. 500, 157–161 (2011).
Shin, W. H., Park, S. J. & Kim, E. J. Protective effect of anthocyanins in middle cerebral artery occlusion and reperfusion model of cerebral ischemia in rats. Life Sci. 79, 130–137 (2006).
Oh, M. & Komatsu, S. Characterization of proteins in soybean roots under flooding and drought stresses. J. Proteom. 114, 161–181 (2015).
Wang, X. & Komatsu, S. Proteomic approaches to uncover the flooding and drought stress response mechanisms in soybean. J. Proteom. 172, 201–215 (2018).
Sun, W. J. et al. Climate drives global soil carbon sequestration and crop yield changes under conservation agriculture. Glob. Change Biol. 26, 3325–3335 (2020).
Teshome, D. T., Zharare, G. E. & Naidoo, S. The threat of the combined effect of biotic and abiotic stress factors in forestry under a changing climate. Front. Plant Sci. 11, 601009 (2020).
Dietzel, R. et al. How efficiently do corn- and soybean-based cropping systems use water? A systems modeling analysis. Glob. Change Biol. 22, 666–681 (2016).
Tamang, B. G., Li, S., Rajasundaram, D., Lamichhane, S. & Fukao, T. Overlapping and stress-specific transcriptomic and hormonal responses to flooding and drought in soybean. Plant J. 107, 100–117 (2021).
Feng, Z., Ding, C. Q., Li, W. H., Wang, D. C. & Cui, D. Applications of metabolomics in the research of soybean plant under abiotic stress. Food Chem. 310, 125914 (2020).
Fukao, T., Barrera-Figueroa, B. E., Juntawong, P. & Pena-Castro, J. M. Submergence and waterlogging stress in plants: A review highlighting research opportunities and understudied aspects. Front. Plant Sci. 10, 340 (2019).
Li, M. W. et al. Using genomic information to improve soybean adaptability to climate change. J. Exp. Bot. 68, 1823–1834 (2017).
Valliyodan, B. et al. Genetic diversity and genomic strategies for improving drought and waterlogging tolerance in soybeans. J. Exp. Bot. 68, 1835–1849 (2017).
Yu, Z. P. et al. Identification of QTN and candidate gene for seed-flooding tolerance in soybean Glycine max (L.) Merr. using genome-wide association study (GWAS). Genes 10, 957 (2019).
Khan, M. N., Sakata, K. & Komatsu, S. Proteomic analysis of soybean hypocotyl during recovery after flooding stress. J. Proteom. 121, 15–27 (2015).
Valliyodan, B. et al. Expression of root-related transcription factors associated with flooding tolerance of soybean (Glycine max). Int. J. Mol. Sci. 15, 17622–17643 (2014).
Yin, X. J., Hiraga, S., Hajika, M., Nishimura, M. & Komatsu, S. Transcriptomic analysis reveals the flooding tolerant mechanism in flooding tolerant line and abscisic acid treated soybean. Plant Mol. Biol. 93, 479–496 (2017).
Chen, S. L., Ehrhardt, D. W. & Somerville, C. R. Mutations of cellulose synthase (CESA1) phosphorylation sites modulate anisotropic cell expansion and bidirectional mobility of cellulose synthase. Proc. Natl. Acad. Sci. U.S.A. 107, 17188–17193 (2010).
Fatland, B. L., Nikolau, B. J. & Wurtele, E. S. Reverse genetic characterization of cytosolic acetyl-CoA generation by ATP-citrate lyase in Arabidopsis. Plant Cell 17, 182–203 (2005).
Komatsu, S., Kobayashi, Y., Nishizawa, K., Nanjo, Y. & Furukawa, K. Comparative proteomics analysis of differentially expressed proteins in soybean cell wall during flooding stress. Amino Acids 39, 1435–1449 (2010).
Sunna, A. & Antranikian, G. Xylanolytic enzymes from fungi and bacteria. Crit. Rev. Biotechnol. 17, 39–67 (1997).
Wang, X. & Komatsu, S. Review: Proteomic techniques for the development of flood-tolerant soybean. Int. J. Mol. Sci. 21, 7497 (2020).
Yin, X. J. & Komatsu, S. Comprehensive analysis of response and tolerant mechanisms in early-stage soybean at initial-flooding stress. J. Proteom. 169, 225–232 (2017).
Yin, X., Sakata, K., Nanjo, Y. & Komatsu, S. Analysis of initial changes in the proteins of soybean root tip under flooding stress using gel-free and gel-based proteomic techniques. J. Proteom. 106, 1–16 (2014).
Johnson, S., Michalak, M., Opas, M. & Eggleton, P. The ins and outs of calreticulin: From the ER lumen to the extracellular space. Trends Cell Biol. 11, 122–129 (2001).
Smith, A. M. & Rees, T. A. Pathways of carbohydrate fermentation in the roots of marsh plants. Planta 146, 327–334 (1979).
Jackson, M. B., Ishizawa, K. & Ito, O. Evolution and mechanisms of plant tolerance to flooding stress. Ann. Bot. 103, 137–142 (2009).
Komatsu, S. et al. Label-free quantitative proteomic analysis of abscisic acid effect in early-stage soybean under flooding. J. Proteome Res. 12, 4769–4784 (2013).
Khan, M. N., Sakata, K., Hiraga, S. & Komatsu, S. Quantitative proteomics reveals that peroxidases play key roles in post-flooding recovery in soybean roots. J. Proteome Res. 13, 5812–5828 (2014).
Nanjo, Y. et al. Transcriptional responses to flooding stress in roots including hypocotyl of soybean seedlings. Plant Mol. Biol. 77, 129–144 (2011).
Lai, M. C., Lai, Z. Y., Jhan, L. H., Lai, Y. S. & Kao, C. F. Prioritization and evaluation of flooding tolerance genes in soybean Glycine max (L.) Merr. Front. Genet. 11, 612131 (2021).
Komatsu, S. et al. A comprehensive analysis of the soybean genes and proteins expressed under flooding stress using transcriptome and proteome techniques. J. Proteome Res. 8, 4766–4778 (2009).
Xia, J. B. et al. Gene prioritization of resistant rice gene against Xanthomas oryzae pv. oryzae by using text mining technologies. Biomed. Res. Int. 2013, 853043 (2013).
Zhai, J. J. et al. A meta-analysis based method for prioritizing candidate genes involved in a pre-specific function. Front. Plant Sci. 7, 01914 (2016).
Kaler, A. S. & Purcell, L. C. Estimation of a significance threshold for genome-wide association studies. BMC Genom. 20, 7 (2019).
Perneger, T. V. What’s wrong with Bonferroni adjustments. Br. Med. J. 316, 1236–1238 (1998).
Kitano, H. Systems biology: A brief overview. Science 295, 1662–1664 (2002).
Karahalil, B. Overview of systems biology and omics technologies. Curr. Med. Chem. 23, 4221–4230 (2016).
Canzler, S. & Hackermuller, J. multiGSEA: A GSEA-based pathway enrichment analysis for multi-omics data. BMC Bioinform. 21, 1 (2020).
Bertozzi, A. L. Proceedings of the International Congress of Mathematicians: Rio de Janeiro 3865–3892 (World Scientific, 2018).
Ristevski, I., Flegg, K., Livingstone, M. & Dimaras, H. Co-creation of a pathway of care for retinoblastoma patients and families. Pediatr. Blood Cancer 67, S362–S363 (2020).
Nam, D. & Kim, S. Y. Gene-set approach for expression pattern analysis. Brief. Bioinform. 9, 189–197 (2008).
Goeman, J. J. & Buhlmann, P. Analyzing gene expression data in terms of gene sets: Methodological issues. Bioinformatics 23, 980–987 (2007).
Ebrahimpoor, M., Spitali, P., Hettne, K., Tsonaka, R. & Goeman, J. Simultaneous enrichment analysis of all possible gene-sets: Unifying self-contained and competitive methods. Brief. Bioinform. 21, 1302–1312 (2020).
Li, J. J. et al. Comparative transcriptome analysis between the cytoplasmic male sterile line NJCMS1A and its maintainer NJCMS1B in soybean (Glycine max (L.) Merr.). PLoS ONE 10, e0126771 (2015).
Shi, G. X. et al. RNA-Seq analysis reveals that multiple phytohormone biosynthesis and signal transduction pathways are reprogrammed in curled-cotyledons mutant of soybean Glycine max (L.) Merr.. BMC Genom. 15, 510 (2014).
Shu, Y. J. et al. A transcriptomic analysis reveals soybean seed pre-harvest deterioration resistance pathways under high temperature and humidity stress. Genome 63, 115–124 (2020).
Naithani, S. et al. Plant reactome: A knowledgebase and resource for comparative pathway analysis. Nucleic Acids Res. 48, D1093–D1103 (2020).
Huang, D. W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 1–13 (2009).
Jia, P. L., Kao, C. F., Kuo, P. H. & Zhao, Z. M. A comprehensive network and pathway analysis of candidate genes in major depressive disorder. BMC Syst. Biol. 5, S12 (2011).
Zheng, M. L., Zhou, N. K., Huang, D. L. & Luo, C. H. Pathway cross-talk network strategy reveals key pathways in non-small cell lung cancer. J. BUON 22, 1252–1258 (2017).
Carter, H., Hofree, M. & Ideker, T. Genotype to phenotype via network analysis. Curr. Opin. Genet. Dev. 23, 611–621 (2013).
Lin, Y. H. et al. Identification of genes/proteins related to submergence tolerance by transcriptome and proteome analyses in soybean. Sci. Rep. 9, 14688 (2019).
Kitano, H. Computational systems biology. Nature 420, 206–210 (2002).
Zhao, Z. M. et al. The international conference on intelligent biology and medicine (ICIBM) 2019: Bioinformatics methods and applications for human diseases. BMC Bioinform. 20, 4 (2019).
Chen, W. et al. Identification and comparative analysis of differential gene expression in soybean leaf tissue under drought and flooding stress revealed by RNA-Seq. Front. Plant Sci. 7, 1044 (2016).
Wu, M. C. & Lin, X. H. Prior biological knowledge-based approaches for the analysis of genome-wide expression profiles using gene sets and pathways. Stat. Methods Med. Res. 18, 577–593 (2009).
de Leeuw, C. A., Neale, B. M., Heskes, T. & Posthuma, D. The statistical properties of gene-set analysis. Nat. Rev. Genet. 17, 353–364 (2016).
Rahmatallah, Y., Emmert-Streib, F. & Glazko, G. Gene set analysis approaches for RNA-seq data: Performance evaluation and application guideline. Brief. Bioinform. 17, 393–407 (2016).
Binns, D. et al. QuickGO: A web-based tool for Gene Ontology searching. Bioinformatics 25, 3045–3046 (2009).
Gao, Q. M., Zhu, S. F., Kachroo, P. & Kachroo, A. Signal regulators of systemic acquired resistance. Front. Plant Sci. 6, 228 (2015).
Wang, C. X. et al. Free radicals mediate systemic acquired resistance. Cell Rep. 7, 348–355 (2014).
El-Shetehy, M. et al. Nitric oxide and reactive oxygen species are required for systemic acquired resistance in plants. Plant Signal. Behav. 10, e998544 (2015).
Hussain, S. et al. Comparative transcriptional profiling of primed and non-primed rice seedlings under submergence stress. Front. Plant Sci. 7, 01125 (2016).
Li, Y. S., Ou, S. L. & Yang, C. Y. The seedlings of different japonica rice varieties exhibit differ physiological properties to modulate plant survival rates under submergence stress. Plants-Basel 9, 982 (2020).
Khatoon, A., Rehman, S., Oh, M. W., Woo, S. H. & Komatsu, S. Analysis of response mechanism in soybean under low oxygen and flooding stresses using gel-base proteomics technique. Mol. Biol. Rep. 39, 10581–10594 (2012).
Dubey, A., Malla, M. A. & Kumar, A. Taxonomical and functional bacterial community profiling in disease-resistant and disease-susceptible soybean cultivars. Braz. J. Microbiol. 53, 1355 (2022).
Carbon, S. et al. The Gene Ontology resource: Enriching a GOld mine. Nucleic Acids Res. 49, D325–D334 (2021).
Ashburner, M. et al. Gene Ontology: Tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
Evangelou, M., Rendon, A., Ouwehand, W. H., Wernisch, L. & Dudbridge, F. Comparison of methods for competitive tests of pathway analysis. PLoS ONE 7, e41018 (2012).
Tintle, N. L., Borchers, B., Brown, M. & Bekmetjev, A. Comparing gene set analysis methods on single-nucleotide polymorphism data from Genetic Analysis Workshop 16. BMC Proc. 3, S96 (2009).
Berger, S. I., Posner, J. M. & Ma’ayan, A. Genes2Networks: Connecting lists of gene symbols using mammalian protein interactions databases. BMC Bioinform. 8, 372 (2007).
Kim, E., Hwang, S. & Lee, I. SoyNet: A database of co-functional networks for soybean Glycine max. Nucleic Acids Res. 45, D1082–D1089 (2017).
Shannon, P. et al. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc 57, 289–300 (1995).
Acknowledgements
The authors gratefully acknowledge the financial support from the NCHU-KU Joint Research Project (No.00062022). This work was financially supported (in part) by the Faculty of Science at Sriracha Kasetsart University, Thailand, and the Advanced Plant Biotechnology Center from The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan. We thank Hao-Wei Fu for maintaining the FTgenes and being in charge of data management. We also thank Min-Lun Lee for English editing.
Author information
Contributions
Study conception and design: C.F.K.; acquisition and analysis of data: L.H.J., C.F.K., S.B., M.C.L. and C.M.H.; interpretation of data: L.H.J., C.F.K., S.B. and C.Y.Y.; original draft preparation: L.H.J., C.F.K. and C.Y.Y.; revise and editing the manuscript: C.F.K., S.B. and L.H.J. All authors have read and approved the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Jhan, LH., Yang, CY., Huang, CM. et al. Integrative pathway and network analysis provide insights on flooding-tolerance genes in soybean. Sci Rep 13, 1980 (2023). https://doi.org/10.1038/s41598-023-28593-1
DOI: https://doi.org/10.1038/s41598-023-28593-1