Integrative pathway and network analysis provide insights on flooding-tolerance genes in soybean

Jhan, Li-Hsin; Yang, Chin-Ying; Huang, Chih-Min; Lai, Mu-Chien; Huang, Yen-Hsiang; Baiya, Supaporn; Kao, Chung-Feng

doi:10.1038/s41598-023-28593-1

Article
Open access
Published: 03 February 2023

Scientific Reports volume 13, Article number: 1980 (2023)

Abstract

Soybean is highly sensitive to flooding and extreme rainfall. Flooding tolerance is a complex quantitative trait whose phenotypic variation is controlled by many genes and their interactions with environmental factors. We previously constructed a gene pool relevant to soybean flooding-tolerant responses by integrating multiple omics and non-omics databases, and selected 144 prioritized flooding tolerance genes (FTgenes). In this study, we propose a comprehensive systems-level framework that uses competitive (hypergeometric test) and self-contained (sum-statistic, sum-square-statistic) pathway-based approaches to identify biologically enriched pathways by evaluating the joint effects of the FTgenes within annotated pathways. These FTgenes were significantly enriched in 36 pathways in the Gene Ontology database. These pathways were related to plant hormones, defense, primary metabolic processes, and system development, all of which play key roles in soybean flooding-induced responses. We further identified nine key FTgenes from important subnetworks extracted from several gene networks of enriched pathways. The nine key FTgenes were significantly expressed in soybean root under flooding stress in a qRT-PCR analysis. We demonstrate that this systems biology framework is promising for uncovering key genes underlying the molecular mechanisms of flooding-tolerant responses in soybean. This result provides a good foundation for future gene function analysis.

Introduction

Soybean [Glycine max (L.) Merr] provides abundant flavonoids, plant-based proteins and lipids, and is the major protein source for vegetarians. Its nutritional value derives partly from isoflavones and anthocyanins, both of which are flavonoid compounds¹. Isoflavones occur in many kinds of plants, but soybean has a particularly high content². Isoflavones have been functionally linked to anti-oxidation, reduction in inflammation, inhibition of free radicals, and cancer prevention^3,4,5. Anthocyanins and their main constituents, such as cyanidin-3-O-glucoside, present in soybeans can effectively inhibit lipopolysaccharide, hydrogen peroxide, and pro-inflammatory cytokines, making them a natural source of antioxidant and anti-inflammatory compounds^6,7,8. Hence, soybeans could be used to boost nutritional content, in nutraceutical products, and as potential therapeutic agents for some diseases.

Soybeans are highly sensitive to growth conditions, particularly flooding^9,10. In recent years, global agricultural damage and losses from the changing climate (e.g. flooding) have increased^11,12. Extreme torrential rain or sudden heavy rain brought by strong southwesterly air currents or jet streams induced by typhoons has caused severe flooding during the autumn seedling stage of soybean (including edamame) in southern and western Taiwan. In the United States, flooding can occur sequentially during a single crop cycle or independently in the same fields during different years. Over the past 15 years, flooding resulted in $6.2 billion worth of soybean production losses. Loss of soil, nutrients, and pesticides to waterways is a major problem in high agricultural production areas such as the Midwestern United States^13,14. In China, flooding stress in soybean is associated with excess irrigation that impairs water uptake, and soil waterlogging is largely affected by the season¹⁵. The total summer crop sown area in 2020 was 26.17 million hectares; the floods affected 23% of the planted area of summer crops and caused 4.3% crop failure. Given the high uncertainty of climate change, we need a systematic and comprehensive method to obtain the whole picture of defense mechanisms against flooding for breeding stress-tolerant cultivars.

There is general recognition that flooding can be classified into waterlogging, when the water covers only the root system, and submergence, when the water covers both the shoot and the root system, according to water levels above the soil surface¹⁶. The present study mainly focuses on submergence. Abiotic stresses can disturb plant growth and adversely affect growth characteristics, for example, leaf etiolation and the number of pods per plant^17,18,19. Under the flooding stress, the contents of flavonoid compounds in soybean increase significantly, but the yields decrease simultaneously^20,21,22. Also, cell wall maturation, cell wall formation, and plant development will be seriously changed during flooding^23,24,25,26. Thus, a better understanding of the physiological mechanisms involved in flooding-induced response and tolerance of soybeans is needed for breeding work.

Mechanisms related to flooding tolerance or response have been investigated and reviewed^27,28. At the initial stage of flooding stress in soybean, ATP-citrate lyase and xylosidase decrease while alcohol dehydrogenases and calreticulin increase²⁹. These enzymes are related to the tricarboxylic acid cycle, cell wall maturation, alcohol fermentation, and calcium homeostasis^26,30,31. Prolonged submergence causes a significant decline in photosynthesis, stomatal conductance, and nutrient absorption in leaves³². Soybean produces abscisic acid (ABA) to regulate protein kinases under hypoxia³³. These protein kinases are related to pathways including glycolysis, cell organization, and vesicle transport^20,22,33. Proteomic analyses have found that excessive water supply to soybean roots increases anthocyanin 5-aromatic acyltransferase, anthocyanin malonyltransferase, and isoflavone reductase^20,34,35. These enzymes help increase isoflavones and anthocyanins, which improves the survival rate after flooding. Although many molecular and physiological mechanisms have been reported, the mechanisms of flooding-induced response and tolerance have yet to be fully clarified for soybean. No studies have reported enhancing pathway analysis for flooding tolerance and response, a polygenic trait, by introducing multiple genes selected from an integrated knowledge framework in a systematic and comprehensive design³⁶.

Flooding tolerance is a complex quantitative (or polygenic) trait, which is regulated through several biological pathways that are controlled by a number of genes (i.e., polygenes). Many functional mechanisms studies for flooding tolerance in soybean have been reported^20,33,37. Most of the studies were based on selected candidate genes that were hypothesis-driven, such as text-mining-based³⁸ and meta-analysis-based³⁹. However, these mechanisms may only partially explain flooding due to a limited understanding of the genetic make-up of a polygenic trait, particularly flooding tolerance. Furthermore, potential biases might have affected the results using the hypothesis-free approach, for example, genome-wide association study (GWAS), because it is challenging to account for variations between germplasms and quantitative trait⁴⁰. It is also challenging to balance the results between false positives and false negatives in GWAS⁴¹. Determining the genetic makeup underlying flooding tolerance in soybean is crucial to precisely identifying mechanisms related to flooding tolerance or responding to stress. Hence, applying pathway-based analysis to selected candidate genes can systematically integrate prior biological knowledge of gene regulating functions and biological pathway information or functional categories to figure out the whole picture of physiological mechanisms for flooding tolerance in soybean. This can reveal a more comprehensive picture at the molecular level than a single marker-based or gene-level analysis.

The main purpose of systems biology is to precisely explore unknown mechanisms in experimental data containing implicit biological information⁴². Systematic methods, such as pathway enrichment analysis and network analysis, enable us to understand how response signals are transmitted in a plant cell stimulated by an environmental factor. These signals are complicated: a single signal carries little information, whereas signals considered systematically are informative⁴³. Pathway enrichment analysis, a knowledge-based approach, provides biological insights into molecular responses to a trait of interest from integrated omics and non-omics (OnO) data⁴⁴. It detects whether particular biological pathways or molecular groups are significantly overrepresented. Networks draw on graph theory and probability theory to succinctly represent the mathematical structure of biological components using a group of nodes (e.g. proteins, genes, pathways) and links (e.g. genetic and/or functional interactions)⁴⁵. Using available biological knowledge and candidate genes selected from integrated OnO data for network analysis offers great potential to uncover novel information on complex biological networks⁴⁶.

Methods of pathway enrichment analysis in systems biology can be broadly divided into, but are not limited to, competitive and self-contained methods⁴⁷. A competitive method, such as the hypergeometric test, compares the association with a trait between two gene sets (i.e., genes in a specific pathway versus genes not in that pathway)⁴⁸. A self-contained method, such as the sum-statistic (e.g. SUMSTAT) or the sum-square statistic (e.g. SUMSQ), considers only the association between the genes in a specific pathway and the trait⁴⁹. Several studies have successfully applied pathway-based analysis to explore potential mechanisms and biological functions for important traits in plants, including cytoplasmic male sterility in soybean⁵⁰, comparison between a mutant and the wild type in soybean⁵¹, and high temperature in soybean⁵². Recently, Naithani et al.⁵³ developed the Plant Reactome, a knowledgebase and resource for pathway-based analysis in plants that addresses important biological questions and regulatory mechanisms. Many open-access knowledgebases, such as Gene Ontology (GO, http://geneontology.org/) and the Kyoto Encyclopedia of Genes and Genomes (KEGG, https://www.genome.jp/kegg/kegg2.html), are commonly used worldwide. These functional annotations provide opportunities to access the whole map underlying a specific trait by systematically testing unknown functional gene sets with statistical models⁵⁴. Networks integrate biological information (e.g. proteins, molecules, pathways) and quantify nucleic acid information, providing information on the associations between several genetic loci and on how genes and pathways interact with each other (i.e., gene modules) to regulate traits. The association between genes can be visualized as a network composed of nodes and edges, which presents the complex associations between genes in a simple way^55,56. Using candidate genes prioritized from integrated OnO databases is a practical and efficient way to reveal enriched pathways and networks for flooding-tolerant responses⁵⁷.

We previously developed a comprehensive framework to integrate OnO data relevant to flooding-tolerant responses in soybean. A total of 36,705 genes were collected and prioritized according to their magnitude of association with flooding-tolerant responses³⁶. In this study, we introduced a systems biology framework (Fig. 1) that uses pathway enrichment analysis (both the competitive and the self-contained methods) and network analysis to combine the joint effects of the 144 prioritized flooding tolerance genes (i.e., FTgenes) (Fig. 2) and uncover the molecular mechanisms underlying flooding-tolerant responses in soybean. The strategies proposed in this study can improve understanding of how flooding-tolerance genes act against a flooding event and protect soybean plants from floods in complex biological systems.

Results

Gene-pathway mapping

A total of 14,772, 17,017, 19,060, and 18,889 expression data (Step 1 in Fig. 1) from soybean roots after 3, 6, 12, and 24 hours (h) of submergence treatment in the RNA-seq database⁵⁸ were used as the test sets to conduct pathway enrichment analyses. Only pathways (i.e. GO terms) containing at least one FTgene (Fig. 2) were considered, resulting in 417 annotated pathways (Step 1 in Fig. 1) for pathway enrichment analysis.
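The filtering rule above (keep a GO term only if it contains at least one FTgene) can be sketched as follows. This is a minimal illustration, not the pipeline used in the study; the function name, the toy annotation table, and the gene labels are all hypothetical.

```python
def pathways_with_candidates(pathway_to_genes, candidate_genes):
    """Keep only pathways that contain at least one candidate gene."""
    candidates = set(candidate_genes)
    return {
        name: set(genes)
        for name, genes in pathway_to_genes.items()
        if candidates & set(genes)
    }


# Toy annotation table; gene labels are placeholders, not real annotations.
toy_annotation = {
    "response to hypoxia": ["geneA", "geneB", "geneC"],
    "glycolysis": ["geneC", "geneD"],
    "cell wall": ["geneE", "geneF"],
}
toy_ftgenes = ["geneA", "geneD"]  # hypothetical candidate genes

kept = pathways_with_candidates(toy_annotation, toy_ftgenes)
print(sorted(kept))  # "cell wall" is dropped: it holds no candidate gene
```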

Gene-wise statistic values

For gene score calculation, we transformed expression-level statistics (i.e. p-values) using base-10 logarithms into gene-wise statistic scores (Step 2 in Fig. 1) to measure changes in gene expression in roots after 3, 6, 12, and 24 h of flooding. The distributions of gene-wise statistic scores were highly skewed to the right (Fig. 3), as seen in microarray data. The expression skewness of the four datasets was 4.82, 4.31, 4.91, and 4.24, respectively, indicating that expression skewness has the potential to reveal new insights into the FTgenes (Fig. 2) in the pathway enrichment and gene network analyses.
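A minimal sketch of the score transformation and the skewness measure is given below. It assumes the score is the negative base-10 logarithm of the p-value (the text states only that base-10 logarithms were used) and uses the moment-based skewness coefficient; the floor value and the toy p-values are illustrative assumptions.

```python
import math


def gene_score(p_value, floor=1e-300):
    """Return -log10(p). The sign convention and the floor are assumptions."""
    return -math.log10(max(p_value, floor))


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Toy p-values: most genes are unremarkable, one is strongly significant.
toy_p_values = [0.9, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 1e-6]
scores = [gene_score(p) for p in toy_p_values]
print(skewness(scores))  # positive value: the long tail is on the right
```

Under this convention a small p-value maps to a large score, so a few strongly responsive genes produce the right-skewed distribution described above.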

Competitive method (hypergeometric test) revealed the mechanisms of flooding-tolerant responses

Using the hypergeometric test (Step 3 in Fig. 1), we initially found 27 pathways (Fig. 4) with at least one nominal p-value less than 1.00 × 10^–4 that were enriched for flooding tolerance or response to the stress in the gene expression data from soybean roots after 3, 6, 12, and 24 h of submergence treatment. Among them, 24 pathways were significantly enriched at all four time points. Table 1 provides detailed information on the significantly enriched pathways overrepresented in the gene expression dataset. The top five pathways were ‘abscisic acid mediated signaling pathway’, ‘response to ethylene stimulus’, ‘ethylene biosynthetic process’, ‘hyperosmotic salinity response’, and ‘response to the jasmonic acid stimulus’. At 3 h after submergence, two pathways, ‘abscisic acid mediated signaling pathway’ and ‘response to ethylene stimulus’, were the most significantly enriched. At 6 and 12 h, ‘abscisic acid mediated signaling pathway’ was the most significantly enriched. At 24 h, the top five pathways were the most significantly enriched.
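For readers unfamiliar with the competitive test, the upper-tail hypergeometric probability can be computed directly from binomial coefficients. The sketch below is generic; the argument names and the toy counts are assumptions for illustration and do not reproduce the gene counts behind Table 1.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_universe genes, of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy counts: 20 genes overall, 5 in the pathway, 4 selected, 2 of them in it.
p_value = hypergeom_enrichment_p(20, 5, 4, 2)
print(p_value)
```

A small p-value indicates that the selected genes fall in the pathway more often than expected if selection were unrelated to pathway membership, which is the competitive null hypothesis.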

Table 1 Significantly enriched pathways in gene expression data for flooding-tolerance using hypergeometric test.

Self-contained methods (SUMSTAT, SUMSQ) revealed the mechanisms of flooding-tolerant responses

The 144 FTgenes (Fig. 2) were significantly enriched in fourteen GO pathways (14 in SUMSTAT and 1 in SUMSQ) after controlling the false discovery rate at the 0.05 level in the self-contained approaches (Step 3 in Fig. 1; Fig. 5). Among them, only one GO pathway, ‘response to hypoxia’, was found at all four time points in both methods. Tables 2 and 3 provide detailed information on the significantly enriched pathways overrepresented in the gene expression dataset using SUMSTAT and SUMSQ, respectively. Five GO pathways were significantly enriched at all four time points after submergence treatment. The top five pathways were ‘response to hypoxia’, ‘response to cadmium ion’, ‘systemic acquired resistance, salicylic acid mediated signaling pathway’, ‘regulation of hydrogen peroxide metabolic process’, and ‘glycolysis’. At 3 h after submergence, one pathway, ‘response to hypoxia’, was the most significantly enriched. At both 6 and 12 h, three pathways, ‘response to hypoxia’, ‘systemic acquired resistance, salicylic acid mediated signaling pathway’, and ‘response to cadmium ion’, were the most significantly enriched. At 24 h, the ‘response to wounding’ pathway and the top five pathways were the most significantly enriched. Six pathways (‘response to wounding’, ‘ethylene mediated signaling pathway’, ‘carboxy-lyase activity’, ‘thiamine pyrophosphate binding’, ‘response to cold’, and ‘cell wall’) were not enriched at 3 h but became enriched between 6 and 24 h after submergence treatment.
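The two self-contained statistics and a false discovery rate adjustment can be sketched as follows. The statistics are the plain sum and sum of squares of gene-wise scores within a pathway. The text does not name the FDR procedure, so the Benjamini-Hochberg adjustment shown here is an assumption, and the scores and p-values are toy numbers.

```python
def sumstat(scores):
    """SUMSTAT: sum of the gene-wise scores in a pathway."""
    return sum(scores)


def sumsq(scores):
    """SUMSQ: sum of the squared gene-wise scores in a pathway."""
    return sum(s * s for s in scores)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


toy_scores = [1.0, 2.0, 3.0]               # hypothetical gene-wise scores
toy_pathway_p = [0.01, 0.04, 0.03, 0.5]    # hypothetical pathway p-values
adjusted = benjamini_hochberg(toy_pathway_p)
significant = [q <= 0.05 for q in adjusted]
print(sumstat(toy_scores), sumsq(toy_scores), adjusted, significant)
```

Because SUMSQ squares each score, it gives more weight to a few extreme genes, whereas SUMSTAT responds to many moderate scores; this difference is one plausible reason the two statistics can flag different numbers of pathways.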

Table 2 Significantly enriched pathways in gene expression data for flooding-tolerance using SUMSTAT statistic.

Table 3 Significantly enriched pathways in gene expression data for flooding-tolerance using SUMSQ statistic.

We found that four pathways (‘response to hypoxia’, ‘systemic acquired resistance, salicylic acid mediated signaling pathway’, ‘regulation of hydrogen peroxide metabolic process’, and ‘response to wounding’) were reported by both the competitive and the self-contained approaches. However, the top five pathways of the two approaches did not overlap.

Gene network analysis selects the key genes relevant to flooding-tolerant responses

Since many FTgenes were involved in flooding-tolerant responses, we conducted a functional gene network analysis (Step 4 in Fig. 1) to better understand how these FTgenes work together. Among the 144 FTgenes, 103 were found to have protein–protein interactions (PPIs) in the soybean interactome. Using the functional modules analytic tool in SoyNet, we constructed a gene network specific to flooding-tolerant responses in soybean (Fig. 6; Sheet 1 in Supplementary Table 1). This gene network contained 103 FTgenes and 70 intermediate genes that were highly connected nodes (hubs) in the reference network and were hence recruited into the gene network. The degree values of the 173 genes ranged from 0 to 66, with an average degree of 7.32. Of these, 110 genes (degree values between 0 and 2) and 13 genes (degree values between 3 and 10) had a low degree of centrality and were hence excluded from the gene network. Figure 6 shows the resulting dense gene network of 50 genes, of which 23 genes had degree values between 20 and 30, indicating a high degree of centrality in the gene network. The 23 genes (highlighted in yellow), including eight FTgenes and 15 significant intermediate genes, had an average degree of 25.5. Among them, the eight FTgenes (Glyma.02g222400, Glyma.18g009700, Glyma.13g231700, Glyma.13g361900, Glyma.15g012000, Glyma.07g153100, Glyma.01g118000, and Glyma.15g011900), reported in both competitive and self-contained pathway analytic strategies (Table 4), had a high degree of centrality, with degree values between 20 and 30 (the average degree was 26). The eight FTgenes are mainly related to signal transduction (Glyma.13g361900, Glyma.15g011900, and Glyma.15g012000), energy production (Glyma.02g222400, Glyma.13g361900, and Glyma.18g009700), and plant hormone regulation (Glyma.15g011900 and Glyma.15g012000), indicating that they play important roles in flooding-tolerant responses in soybean.
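Degree, the centrality measure used above, is simply the number of distinct interaction partners of a gene. The sketch below computes it from an edge list using plain Python rather than SoyNet or Cytoscape; the edge list, the gene labels, and the hub threshold are hypothetical.

```python
def node_degrees(edges, nodes=()):
    """Count distinct neighbours per node in an undirected network."""
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        neighbours.setdefault(a, set())
        neighbours.setdefault(b, set())
        if a != b:  # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {node: len(adj) for node, adj in neighbours.items()}


# Toy interaction list; the duplicate ("g2", "g1") edge is counted once.
toy_edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3"), ("g2", "g1")]
toy_nodes = ["g1", "g2", "g3", "g4", "g5"]  # g5 has no interactions

degrees = node_degrees(toy_edges, toy_nodes)
average_degree = sum(degrees.values()) / len(degrees)
hubs = sorted(g for g, d in degrees.items() if d >= 3)  # illustrative cutoff
print(degrees, average_degree, hubs)
```

Genes with no recorded interactions are kept with degree 0, which matches the reported degree ranges that start at 0.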

Table 4 Contributions of the FTgenes in three pathway analyses.

We selected 77 FTgenes from 24 significantly enriched pathways reported in the hypergeometric test to compute node edges in SoyNet and construct a gene network in Cytoscape. As a result, 103 genes (74 FTgenes and 29 intermediate genes) were retained, with degree values ranging from 0 to 35 (the average degree was 9.68). For simplicity, we grouped 60 genes with a low degree of centrality into a single node (named Group0_2), and combined this node with the remaining 43 genes (15 FTgenes and 28 intermediate genes) to form a gene network (Fig. 7; Sheet 2 in Supplementary Table 1). Of those, 20 genes (highlighted in yellow) with higher degree values between 20 and 30 and an average degree of 28.2 were shown to interact with each other more closely in the gene network. Among them, one FTgene (Glyma.14g127800) is mainly related to plant hormone transport, which contributes to flooding-tolerant responses.

Another 34 FTgenes from 5 significantly enriched pathways reported in SUMSTAT were used to compute edges in SoyNet, resulting in 65 genes (32 FTgenes and 33 intermediate genes), with degree values ranging from 0 to 71 (the average degree was 17.94). We further constructed a gene network in Cytoscape (Fig. 8; Sheet 3 in Supplementary Table 1), and observed that 29 genes (highlighted in yellow) were highly connected to form a dense module, with higher degree values between 20 and 29 (the average degree was 25.5). Among them, 6 FTgenes are mainly related to signal transduction (Glyma.15g011900, Glyma.15g012000), plant hormone transport (Glyma.14g127800, Glyma.18g009700), and enzyme catalytic activity (Glyma.13g231700, Glyma.07g153100).

The above results show closely connected PPIs in the soybean interactome obtained by entering the 144 FTgenes, the 24 pathways enriched at all four time points (3, 6, 12, and 24 h) in the hypergeometric test, and the 5 pathways enriched at all four time points in the SUMSTAT method, respectively. These genes were first compared to the 23 (Fig. 6), 20 (Fig. 7), and 29 (Fig. 8) selected important genes (including the FTgenes and the intermediate genes) to examine their topological characteristics. Our results showed that these important genes had higher degree values in all comparisons, suggesting a high degree of centrality. The FTgenes (103, 77, and 34 FTgenes) were further compared to the intermediate genes (70, 29, and 33 genes) and the remaining genes (14,599, 16,911, and 18,993 genes), respectively. We found that the FTgenes and the intermediate genes in the corresponding gene network more frequently received small p-values at all four time points in the gene expression datasets.

To further explore the key FTgenes, we selected 25 FTgenes from 5 significantly enriched pathways that overlapped in both the hypergeometric test and the SUMSTAT method to compute node edges in SoyNet and construct a gene network in Cytoscape. As a result, a gene network containing 25 FTgenes and 26 intermediate genes was obtained (Fig. 9; Sheet 4 in Supplementary Table 1), with degree values ranging from 0 to 28 (the average degree was 13.68). Of these, 26 genes (highlighted in yellow) were closely linked, with higher degree values between 20 and 30 and an average degree of 25.3. Among them, four FTgenes (Glyma.13g361900, Glyma.15g012000, Glyma.15g011900, and Glyma.14g127800) exhibited higher degree values ranging between 26 and 28, with an average degree value of 26.3. The four FTgenes are mainly related to signal transduction (Glyma.13g361900, Glyma.15g011900, Glyma.15g012000) and plant hormone transport (Glyma.14g127800) and play key roles in flooding-tolerant responses in soybean.

More importantly, all these FTgenes in the corresponding gene networks had significantly larger mean scores (or smaller p-values) in the corresponding gene expression datasets (p-values < 0.001) compared to the remaining genes (Fig. 10). A similar pattern was observed for the intermediate genes, although they did not reach the 0.05 significance level. In particular, we further selected 8, 1, and 6 key FTgenes from the corresponding gene networks, respectively, and found that these key FTgenes significantly outperformed all gene groups (Step 5 in Fig. 1; Fig. 10). The nine key FTgenes (Fig. 11A) were involved in signal transduction (Glyma.15g012000 and Glyma.15g011900), energy (Glyma.02g222400, Glyma.18g009700, Glyma.13g361900, and Glyma.14g127800), enzyme activity (Glyma.07g153100 and Glyma.13g231700), and unknown function (Glyma.01g118000), and were significantly related to abscisic acid transport and terpenoid transport.
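One way to compare the mean score of a gene group against the remaining genes is a one-sided permutation test, sketched below. The text does not state which test produced the reported p-values, so this is an assumed stand-in for illustration only; the function name, the seed, and the toy scores are hypothetical.

```python
import random


def permutation_p_value(group, background, n_permutations=2000, seed=0):
    """One-sided permutation p-value for mean(group) > mean(background)."""
    rng = random.Random(seed)
    observed = sum(group) / len(group) - sum(background) / len(background)
    pooled = list(group) + list(background)
    k = len(group)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k)
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


toy_group = [5.0, 6.0, 7.0, 8.0]                 # hypothetical key-gene scores
toy_background = [0.1 * i for i in range(20)]    # hypothetical remaining genes
p_perm = permutation_p_value(toy_group, toy_background)
print(p_perm)
```

The smallest p-value this estimator can return is 1 / (n_permutations + 1), so the number of permutations bounds the achievable resolution.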

To validate the flooding stress responses in the plant cell, real-time quantitative reverse transcription polymerase chain reaction (qRT-PCR) was used to measure the expression levels of the nine key FTgenes in soybean root under flooding stress (Step 6 in Fig. 1; Fig. 12). Of the four energy-related genes (Glyma.02g222400, Glyma.18g009700, Glyma.13g361900, and Glyma.14g127800), three were significantly upregulated from 3 to 24 h, whereas Glyma.14g127800 was downregulated in all conditions compared with the control (i.e. untreated condition). For the enzyme activity-related genes (Glyma.07g153100 and Glyma.13g231700), the highest expression was found at 12 h after treatment. For the signal transduction-related genes, the transcript levels of Glyma.15g011900 and Glyma.15g012000 were significantly higher than the control from 3 to 24 h and from 6 to 24 h, respectively. Interestingly, Glyma.01g118000, a gene of unknown function, exhibited an expression level around 330–7000 times higher than the control and had the highest relative expression among the nine genes.

Discussion

Understanding genetic backgrounds and molecular mechanisms underlying flooding-tolerant responses is imperative for soybean breeding. However, the success in identifying candidate genes for flooding-tolerant responses in soybean has been limited because of the complex nature of abiotic stresses. The present study introduced systems biology methods using pathway enrichment analysis (both competitive and self-contained approaches were considered) and gene network analysis to evaluate the joint effects of multi-genes (in this context, FTgenes) within annotated GO pathways. Most importantly, the FTgenes³⁶ (Fig. 2) used in this study were prioritized from multiple OnO databases integrated from experimental and computational studies that have been made available in the last decades. In particular, several data-ensemble approaches were performed, including data cleaning, data harmonization, data heterogeneity, and data mapping, to remove unwanted data and inaccurate data. Through the process of gene prioritization, the uncertainties, noise, biases, and false positives raised from the data itself and statistical approaches could be reduced effectively.

Systems biology often requires sophisticated computational models and simulations to understand the larger picture of biological systems by studying interactions among a set of candidate genes⁵⁹. Integrative pathway and network analysis combines mathematical graph theory with data-driven approaches (e.g. multiple omics data, OnO data integration) to efficiently uncover the genotype–phenotype relationship at the systems level by integrating knowledge of gene regulation and function. To integrate OnO data with mathematical graph theory, we introduced an integrative pathway and network approach to construct a comprehensive view of the biological mechanisms for flooding-tolerant responses in soybean. To the best of our knowledge, this is the first work on pathway and network analyses using candidate genes prioritized from multiple OnO data integration algorithms. Our results reveal novel molecular pathways and functional relationships of the FTgenes, which help to clarify their biological implications in the regulatory system and can be validated further.

Skewness is widely used to measure the degree of asymmetry in expression data. Expression skewness can identify novel molecular pathways and key genes via the systems biology approaches (e.g. enrichment pathway analysis and functional network analysis), which is a valuable way to capture meaningful outliers (i.e. the greatest variation between samples with and without submergence) and asymmetrical behavior in the whole genome expression dataset⁶⁰. Our results demonstrated a high degree of skewness (Fig. 3) that was appropriate for pathway and network analyses. In addition, our results may provide valuable insights into exploring mechanisms underlying flooding-tolerant responses in soybean.

In the studies of crops, pathway analysis is merely used for the exploration of candidate genes focusing on specific traits^58,61. In general, pathway analysis can be distinguished into two different approaches, the competitive and the self-contained, according to their null hypothesis⁴⁸. In practical applications, however, two different approaches often generated inconsistent results⁶² due to distinct null hypotheses. The competitive methods can potentially exclude confounding effects and provide biological relevance to the analysis⁶³. The self-contained methods have the greater power to identify feature-set (i.e. GO pathways), and the outcomes are highly reproducible⁶⁴. Both approaches have their strengths and limitations. Therefore, a suitable way to gain better insights into the data is to perform the competitive and the self-contained approaches simultaneously for feature-set (i.e. GO pathways) testing. This could reduce the likelihood of false-positive results and gain biological relevance to the analysis.

In this study, we identified 36 overrepresented GO pathways (Tables 1, 2, 3) in the independent RNA-seq databases of submergence treatments in soybean. The most frequently shared FTgenes among enriched pathways were Glyma.02g195300 (functioning in 23 pathways), Glyma.05g021100 (functioning in 20 pathways), Glyma.08g218600 (functioning in 20 pathways), and Glyma.14g102900 (functioning in 20 pathways), which were found in two or more pathway-based methods (Table 4). Many of these FTgenes (6, 18, 10, and 12 FTgenes at 3, 6, 12, and 24 h, respectively) were not significantly overrepresented at the single-gene level in the RNA-seq databases (Fig. 3); however, they were enriched (p-values < 0.001) for flooding-tolerant responses at the systems level using pathway-based analytic approaches. For instance, the Glyma.17g236200, Glyma.19g013700, and Glyma.11g180500 genes did not reach genome-wide significant association, but were found at the systems level in our approaches. In particular, the Arabidopsis GO and UniProt GO databases provide opportunities to better understand how these FTgenes participate in flooding responses. Under flooding conditions, the Glyma.17g236200 gene regulates root development to prevent wounding, and the Glyma.19g013700 gene mediates transpiration efficiency by regulating ABA to control stomatal closure. Furthermore, the Glyma.11g180500 gene participates in RNA regulation, producing factors that control where plant hormones act. Evidence from previous studies confirmed the roles of these important FTgenes and pathways identified in this study in the complex mechanisms of flooding-tolerant responses in soybean. These findings indicate that systems biology methods can boost the power to reveal the potential roles of FTgenes in uncovering the molecular mechanisms and biological novelties of flooding-tolerant responses in soybean.

The hypothesis and the model of different categories of pathway analysis are distinct; hence, the results are also different. In this study, we compared the results across the hypergeometric test, the SUMSTAT method, and the SUMSQ method. In total, 27, 14, and 1 enriched pathway(s) were identified in the hypergeometric test (Table 1, Fig. 4), the SUMSTAT method (Table 2, Fig. 5A), and the SUMSQ method (Table 3, Fig. 5B), respectively. The three methods found only one pathway, ‘response to hypoxia’ in all four-time points (3, 6, 12, and 24 h) of gene expression data. Under flooding conditions, the response to hypoxia begins with low-oxygen stimulation, followed by activates the transcription of plant hormone genes. Plant hormones, such as ABA, ethylene, and salicylic acid, are involved in participating in roots recovery⁶⁵. Of which five pathways were consistently reported by both the competitive and the self-contained approaches, even a more stringent threshold was applied to correct for multiple testing. The ‘systemic acquired resistance, salicylic acid mediated signaling pathway’ is responsible for regulating the biosynthesis, the perception, and the signal mediating⁶⁶. When a plant suffers from flooding, hydrogen peroxide begins to express in roots to remove some harmful chemicals from flooding stress, and salicylic acid mediates a series of signals in producing hydrogen peroxide^66,67. Evidence shows the important fact that the regulation of hydrogen peroxide interacts with salicylic acid by signaling series forms to eliminate fatal chemicals in roots cell under flooding stress^66,67,68. Thus, ‘regulation of hydrogen peroxide metabolic process’ and ‘systemic acquired resistance, salicylic acid mediated signaling pathway’ are evidenced to be linked to flooding-tolerant responses in soybean. Besides, rice was also evidenced to be involved with these two pathways under flooding stress^69,70. In soybean roots, cell wall and aerenchyma will swell under flooding. 
The response to wounding in roots activates many salicylic acid signals, which interact with other plant hormones to repair the wounds⁷¹. After wounding, soybean grows adventitious roots to cope with hypoxic environments. The energy from glycolysis and pyruvate phosphorylation is consumed when soybean grows adventitious roots. This evidence suggests that ‘response to wounding’ and plant hormone-related pathways may play key roles in flooding-tolerant responses in soybean. In addition, gluconeogenesis and glycolysis, which synthesize or degrade carbohydrates, allow crops to generate and store adequate ATP as an energy supply^20,21,22,33. Taken together, the evidence suggests that ‘glycolysis’, ‘gluconeogenesis’, ‘pyruvate decarboxylase activity’, ‘abscisic acid mediated signaling pathway’, and ‘regulation of hydrogen peroxide metabolic process’ are linked to flooding-tolerant responses in soybean, in line with previous studies^20,21,22,33. All the pathways mentioned above play key roles in the physiological mechanisms underlying flooding-tolerant responses. Our results demonstrated that the pathways identified differ considerably between distinct types of pathway analyses. Hence, combining distinct pathway-based analyses that rest on different hypotheses and models can provide more comprehensive, precise, and reliable results.

Ethylene is important for protein phosphorylation during the initial stage of flooding stress, especially in root tips. Evidence shows that root recovery requires more ATP to provide energy and protein phosphorylation to develop cell tissue^25,28,29,37. At the initial stage of flooding, root cells are stimulated by ethylene, and a series of mediated signals leads to the production of more ethylene. This evidence indicates that ‘response to ethylene stimulus’, ‘ethylene biosynthetic process’, and ‘ethylene mediated signaling pathway’ are important in flooding stress.

Our results also showed 22 (Table 1, Fig. 4) and 9 (Table 2, Fig. 5A) enriched pathways specific to the hypergeometric test and the SUMSTAT method, respectively. Relying on a single approach without comparing the two could therefore yield false-positive and false-negative results. For instance, two pathways, ‘glycolysis’ and ‘gluconeogenesis’, are supported by previous evidence^19,20,21,33 and were found by the SUMSTAT approach but not by the hypergeometric test; hence, they are false negatives of the hypergeometric test. Likewise, the ‘abscisic acid mediated signaling pathway’ is supported by previous evidence^22,25,28,33 and was reported by the hypergeometric test but not by SUMSTAT; hence, it is a false negative of SUMSTAT. Two pathways, ‘response to chitin’ and ‘response to fungus’, were significantly enriched in the hypergeometric test but did not reach significance in the SUMSTAT approach. These two pathways have been reported to be related to biotic stress⁷². Hence, they might be false-positive results or novel findings that need further validation. Again, combining the competitive and the self-contained methods is a promising approach to better understanding a given set of candidate genes for a trait of interest.

By combining the advantages of pathway enrichment analysis and network analysis, our study not only yields new insights into the flooding mechanisms in soybean but also captures more information about biological systems. We finally selected nine key FTgenes: four of these genes (Glyma.13g361900, Glyma.14g127800, Glyma.15g012000, and Glyma.15g011900) were recorded in the DNA, RNA, protein, function, and homolog layers; one gene (Glyma.18g009700) was recorded in the RNA and protein layers; and four genes (Glyma.02g222400, Glyma.07g153100, Glyma.13g231700, and Glyma.01g118000) were recorded in the RNA, protein, and homolog layers³⁶ (Fig. 11B). These key genes may play important roles in coordinating physiological mechanisms under flooding-tolerant responses in soybean.

The systems biology framework proposed in this study demonstrated its power to identify the nine key FTgenes in a rigorous and efficient manner. To validate the nine key FTgenes, in planta expression analysis was performed. Our qRT-PCR results (Fig. 12) revealed that eight key FTgenes were upregulated and one FTgene was downregulated after exposure to flooding stress. The results demonstrated the unique and differential response of soybean root tissue under flooding, offering genetic and molecular evidence of a genuine response to flooding. Our results provide a good foundation for gene function analysis of flooding-tolerant responses in further work.

Although systems biology offers a comprehensive and systematic understanding of the FTgenes in flooding-tolerant responses in soybean, this study has some limitations and considerations. First, pathway and network analyses depend on the completeness of gene and pathway annotations. The research teams of GO and PlantRegMap update the databases and maintain the websites annually, which helps keep the databases complete and accurate. Second, the accuracy of pathway and network analyses relies on the accuracy and completeness of the FTgenes. Our FTgenes were selected using a comprehensive framework consisting of omics and non-omics data integration and a gene prioritization algorithm. Several data quality control processes were applied during the data-ensemble step to reduce potential uncertainties, noise, and false-positive results. Although our FTgenes are informative, more validation experiments are required.

Flooding tolerance is a quantitative trait regulated by polygenes; thus, many traditional single-marker methods, such as association mapping, linkage mapping, and genome-wide association study, lack the power to uncover the whole picture of how these genes interact with each other to regulate traits. Our proposed systems biology framework can efficiently integrate gene information with biological annotations from the GO database to boost the power of identifying key FTgenes and their underlying molecular pathways or mechanisms. This provides an opportunity to better understand complex flooding-tolerant responses. These findings present a wealth of information for future validation.

Methods

We developed an integrative systems biology framework to explore insights into the FTgenes underlying flooding-tolerant responses in soybean. A six-step pipeline (Fig. 1) was proposed to select the key FTgenes, comprising GO annotation filtering, gene-wise statistic score calculation, pathway enrichment analysis, functional network analysis, a validation study, and key gene selection. Detailed methods and materials used in this study are described below.

Candidate genes for flooding-tolerance (FTgenes)

We previously proposed a comprehensive framework for mining, integrating, and prioritizing multiple omics and non-omics (OnO) data³⁶. All genetic data (SNPs, genes, SSRs, QTLs) and bioinformatics information (trait index, variety, biochemical, statistical values) that were relevant to flooding-tolerant responses in soybean were collected and defined as a flooding-tolerance gene pool (containing 36,705 genes). These OnO data were integrated from multidimensional data platforms, including association mapping and GWAS, linkage mapping, gene expression, pathway regulatory, network analysis, protein–protein interaction, proteomic analysis, and model plants. Through the systems biology framework, a total of 144 prioritized FTgenes (Fig. 2), based on the cut-off score of 42, were selected from the gene pool³⁶. The FTgenes were defined as genes significantly associated or enriched with flood tolerance or flood response after flooding treatment (i.e., submergence) was conducted during the germination and vegetative growth stages of soybean. The study framework and the prioritized results, the data of which are used here, are provided elsewhere³⁶.

Gene expression dataset and gene-wise statistic values

The gene expression dataset (whole-genome expression database) of soybean seedling submergence, published by Lin et al.⁵⁸, was accessed through the database of Genotypes and Phenotypes (dbGaP, https://www.ncbi.nlm.nih.gov/gap/). They used cultivar Qihuang 34, a flooding-resistant variety, for submergence experiments, and recorded gene expression changes in roots after 3, 6, 12, and 24 h of submergence treatment. RNA expression data for all four time points were obtained from the dbGaP repository. We used the p-value of each gene, obtained under the null hypothesis of no differential gene expression, as the gene-level statistic for flooding tolerance in soybean. To obtain gene-level significance, we transformed the p-values with base-10 logarithms into gene-wise statistic scores that capture gene expression changes in roots after 3, 6, 12, and 24 h of flooding.
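The transformation can be illustrated with a minimal sketch. It assumes that the gene-wise score is the negative base-10 logarithm of each gene's differential-expression p-value; the function name, the `floor` guard against taking the logarithm of zero, and the p-values are hypothetical and serve only as an illustration.

```python
import math


def gene_score(p_value, floor=1e-300):
    """Gene-wise score: negative base-10 logarithm of a p-value.

    `floor` is a hypothetical guard against log10(0); it is an assumption
    of this sketch, not a parameter reported in the study.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    return -math.log10(max(p_value, floor))


# Made-up p-values for one time point (illustration only, not study data).
p_values_3h = {
    "Glyma.02g195300": 1e-6,
    "Glyma.05g021100": 0.05,
    "Glyma.08g218600": 1.0,
}
scores_3h = {gene: gene_score(p) for gene, p in p_values_3h.items()}
```

Under this convention, smaller p-values give larger scores, so genes with stronger evidence of differential expression contribute more to pathway-level statistics.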

Pathway annotations

To perform mapping for functional pathway analysis, we used GO^73,74 (http://geneontology.org/) annotations. GO-based functional annotation in soybean contains 4896 terms covering 48,606 unique genes, mapped to the Williams 82 reference genome version 2 (Glycine max Wm82.a2.v1). These annotated GO gene sets (i.e., pathways) provide a standard catalogue for classifying functional genes into biological functions and molecular mechanisms. Pathways containing too little information (< 6 genes) and overly large pathways (> 1500 genes) were removed. As a result, a total of 2926 pathways, consisting of 916 cellular components, 762 biological processes, and 1248 molecular functions, remained for pathway analysis. In pathway analysis, we used the negative logarithm of these 2926 pathways’ p-values as our statistic.
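The size filter can be sketched as follows. This is a minimal illustration that assumes gene sets are held as a mapping from a GO term to a set of gene identifiers; the term names and gene identifiers are placeholders, and only the two size thresholds come from the text.

```python
MIN_GENES = 6      # pathways with fewer genes are removed
MAX_GENES = 1500   # pathways with more genes are removed


def filter_pathways(gene_sets, min_genes=MIN_GENES, max_genes=MAX_GENES):
    """Keep only gene sets whose size lies within [min_genes, max_genes]."""
    return {
        term: genes
        for term, genes in gene_sets.items()
        if min_genes <= len(genes) <= max_genes
    }


# Placeholder gene sets (hypothetical terms and genes, for illustration only).
toy_sets = {
    "GO:toy_small": {f"gene{i}" for i in range(3)},
    "GO:toy_kept": {f"gene{i}" for i in range(40)},
    "GO:toy_large": {f"gene{i}" for i in range(2000)},
}
kept = filter_pathways(toy_sets)
```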

Statistical methods for pathway enrichment analysis

We utilized two different strategies, the competitive method and the self-contained method⁴⁷, to test for significantly enriched pathways for the trait of flooding tolerance in soybean. Three statistical methods, namely the hypergeometric test (competitive method) and SUMSTAT and SUMSQ (self-contained methods), were used to assess the significance of enriched pathways. The competitive method compares two gene sets in terms of their association with a phenotypic trait based on a statistical probability model, whereas the two self-contained methods test only the association between a phenotypic trait and the genes in a pathway.

The hypergeometric test, which treats an experimentally derived gene list as a random sample conditional on a fixed pathway, is a widely utilized competitive method for pathway analysis⁷⁵. The null hypothesis of the test is that genes in a pathway are no more strongly associated with the phenotypic trait than those outside the pathway. The main idea of the test is to sample randomly, without replacement, from a finite population and to calculate the statistic for the characteristic of interest (here, flooding tolerance). Hence, this method aims to test whether annotated pathways (i.e., biological functions or processes), which are functionally related, are enriched or over-represented in a list of important genes (i.e., FTgenes) for the trait of interest. The p-value can be computed by

$$\mathrm{p-value }= \sum_{x=g}^{S}\frac{\left(\begin{array}{c}S\\ x\end{array}\right)\left(\begin{array}{c}L-S\\ M-x\end{array}\right)}{\left(\begin{array}{c}L\\ M\end{array}\right)},$$

where L is the total number of genes in the finite population, M is the number of important genes (here, FTgenes), S is the number of genes in a specific pathway, x is the summation index running over the possible numbers of important genes in that pathway, and g is the observed number of important genes in the pathway.
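As a minimal sketch of this computation, the upper-tail probability can be evaluated directly from binomial coefficients. The function name is hypothetical, and the sum stops at min(S, M) because terms beyond that point are zero; the example call uses the population size (48,606 genes) and the number of FTgenes (144) from the text, whereas the pathway size and overlap are made-up values.

```python
from math import comb


def hypergeom_enrichment_p(L, M, S, g):
    """Upper-tail hypergeometric p-value, P(X >= g).

    L: genes in the population; M: important genes (e.g. FTgenes);
    S: genes in the pathway; g: observed important genes in the pathway.
    """
    if not (0 <= S <= L and 0 <= M <= L and g >= 0):
        raise ValueError("require 0 <= S <= L, 0 <= M <= L and g >= 0")
    upper = min(S, M)  # the overlap cannot exceed either set size
    tail = sum(comb(S, x) * comb(L - S, M - x) for x in range(g, upper + 1))
    return tail / comb(L, M)


# Hypothetical pathway of 50 genes containing 5 of the 144 FTgenes.
p_example = hypergeom_enrichment_p(L=48606, M=144, S=50, g=5)
```

Exact integer arithmetic avoids rounding error in the individual terms; for very large gene sets a log-space implementation or a statistics library would be faster.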

The idea of the self-contained methods is to use a large number of permutations to generate null distributions. We compared the genes in a specific pathway with random sets sampled from the null distributions and calculated an empirical p-value for pathway analysis. The tests ignore genes not in the pathways. The present study applied two self-contained methods, SUMSTAT and SUMSQ⁷⁶. Under the null hypothesis (the pathway is unrelated to the trait), we tested whether the observed gene set (i.e., pathway) outperforms the random gene sets generated by permutations. The enrichment score (ES) calculation of SUMSTAT and SUMSQ can be expressed as

$$(\mathrm{SUMSTAT})\quad \mathrm{ ES_{SUMSTAT} }=\sum_{i=1}^{S}{t}_{i},$$

$$\left(\mathrm{SUMSQ}\right)\quad \mathrm{ ES_{SUMSQ }}=\sum_{i=1}^{S}{t}_{i}^{2},$$

where t_i is the i-th value of the statistic (in this context, expression metrics e.g. p-value, fold-change) of FTgenes, and S is the number of genes in a specific pathway.

The analysis pipelines of SUMSTAT and SUMSQ consist of calculating the ES of the observed gene sets of soybean, randomly permuting the statistics calculated from gene expression data, and calculating the permuted ES and the association p-value. The ES represents the association signal for each annotated pathway; ES_SUMSTAT and ES_SUMSQ sum all statistics and all squared statistics, respectively, of a gene set (i.e., GO pathway) containing S FTgenes. We randomly shuffled the statistics calculated from gene expression data for each pathway and followed the same procedure as above to calculate a permuted ES. Then, we normalized the ES by subtracting the mean of the permuted ESs and dividing by the standard deviation of the permuted ESs. Finally, we calculated empirical p-values by comparing the observed ES with the permuted ESs in 10,000 permutations for all pathways.
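The permutation scheme can be sketched as follows. This is an illustrative outline under stated assumptions, not the exact implementation used in the study: drawing a random gene set of the same size without replacement is used as an equivalent of shuffling the gene-level statistics, the add-one correction in the empirical p-value is a common convention that the text does not specify, and all function names and toy data are hypothetical.

```python
import random
import statistics


def enrichment_score(stats, squared=False):
    """ES_SUMSTAT (sum of statistics) or ES_SUMSQ (sum of squared statistics)."""
    return sum(t * t for t in stats) if squared else sum(stats)


def permutation_test(gene_stats, pathway_genes, n_perm=10_000, squared=False, seed=0):
    """Return (observed ES, normalized ES, empirical p-value) for one pathway."""
    rng = random.Random(seed)
    values = list(gene_stats.values())
    members = [g for g in pathway_genes if g in gene_stats]
    observed = enrichment_score([gene_stats[g] for g in members], squared)
    # Each permutation scores a random gene set of the same size.
    permuted = [
        enrichment_score(rng.sample(values, len(members)), squared)
        for _ in range(n_perm)
    ]
    mean = statistics.fmean(permuted)
    sd = statistics.pstdev(permuted)
    normalized = (observed - mean) / sd if sd > 0 else float("nan")
    # Add-one correction (assumption) keeps the empirical p-value above zero.
    p_emp = (1 + sum(es >= observed for es in permuted)) / (n_perm + 1)
    return observed, normalized, p_emp


# Toy data (made up): 100 genes, the first 10 carry a strong signal.
toy_stats = {f"gene{i}": (5.0 if i < 10 else 0.5) for i in range(100)}
toy_pathway = [f"gene{i}" for i in range(10)]
es, z, p = permutation_test(toy_stats, toy_pathway, n_perm=1000)
```

Passing `squared=True` switches the same routine from SUMSTAT to SUMSQ, which gives more weight to genes with extreme statistics.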

Functional gene network analysis

A graphical model of a network is composed of nodes and edges. Nodes can be defined as genes, proteins, metabolites, or annotated pathways. Edges represent connections between nodes. In network analysis, the degree is the most widely used measure to describe the connections of the nodes in a network. In this study, we defined nodes as the FTgenes, and calculated edges using the sum of the log-likelihood score in the SoyNet functional gene network tool (https://www.inetbio.org/soynet/Network_nfm_form_conv.php). For detailed steps of the network link calculation, please refer to Berger et al.⁷⁷ and Kim et al.⁷⁸. We further used Cytoscape v3.9.0⁷⁹ to integrate molecular interaction network data and visualize the graphical model of the network.
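To make the degree measure concrete, a small sketch is given below. It assumes the network is available as a weighted edge list; the node names and weights are placeholders rather than SoyNet output, and the helper function is hypothetical.

```python
from collections import defaultdict


def node_degrees(edges):
    """Degree of each node: the number of distinct neighbours it connects to."""
    neighbours = defaultdict(set)
    for a, b, _weight in edges:
        if a != b:  # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {node: len(adjacent) for node, adjacent in neighbours.items()}


# Placeholder edges (node, node, weight); the weights stand in for
# log-likelihood scores and are invented for illustration.
toy_edges = [
    ("geneA", "geneB", 1.2),
    ("geneA", "geneC", 0.8),
    ("geneB", "geneC", 2.5),
    ("geneC", "geneD", 0.4),
]
degrees = node_degrees(toy_edges)
hub = max(degrees, key=degrees.get)  # the most connected node
```

Nodes with a high degree are candidate hubs of a subnetwork, which is one way to shortlist key genes for further inspection.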

Multiple testing correction

To account for multiple testing problems in pathway analysis, we applied both the Benjamini–Hochberg correction method⁸⁰ and the Bonferroni correction method to balance false-positive and false-negative results. The Benjamini–Hochberg procedure controls the false discovery rate at the 0.05 level in the current study, assuming that p-values are independently distributed under the null hypothesis. Only pathways reaching the genome-wide significance threshold of a p-value less than 1.00 × 10^–4 were considered significantly enriched.
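The two corrections can be sketched as follows. The functions implement the standard Bonferroni and Benjamini–Hochberg adjustments; the function names and the example p-values are hypothetical and are not taken from the study.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up pathway p-values (illustration only).
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_bh = benjamini_hochberg(toy_p)
toy_bonf = bonferroni(toy_p)
fdr_significant = [q < 0.05 for q in toy_bh]  # FDR controlled at 0.05
```

In this toy example all four pathways pass the false discovery rate threshold, whereas only two remain below 0.05 after the Bonferroni adjustment, which illustrates why the two corrections trade off false positives against false negatives.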

Validation for the key FTgenes

Soybean seeds of the Chiangmai 60 cultivar were obtained from Thanya Farm Co., Ltd., Nonthaburi, Thailand. The seeds were surface-sterilized in 1% sodium hypochlorite and rinsed with distilled water three times. Seed culture and stress conditions followed Lin et al.⁵⁸ with some modifications. Ten seeds were sown in sandy soil in a plastic pot (240-mm length × 240-mm width × 190-mm depth). A total of eight pots were sown. Five soybean seedlings of the same size were retained in each pot. When two true leaves were fully unfolded (~ 8 days), the pots were transferred into new plastic containers filled with water. Samples were collected at 3, 6, 12, and 24 h. Untreated plants were used as the control. Roots were collected and immediately frozen in liquid nitrogen for RNA extraction.

Total RNA was isolated from root and shoot with TRIzol reagent according to the manufacturer’s protocol. Subsequently, 5 µg of the total RNA was mixed with 500 ng of oligo(dT)₁₈ and 200 U of Superscript™ III reverse transcriptase (Invitrogen), and the mixture was reverse transcribed at 50 °C for 60 min. Real-time PCR was performed following the manufacturer’s protocol for the Luna® Universal qPCR Master Mix (NEB) with the gene-specific primers listed in Supplementary Table 2. After the PCR had finished, PCR specificity was examined on a 2% agarose gel, and the relative gene expression ratios were calculated using the 2^−ΔΔCT method with cDNA from untreated plants as the reference sample and actin as the reference gene. All experiments were done in biological triplicates.
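The 2^−ΔΔCT calculation can be illustrated with a short sketch. It assumes the standard form of the method, in which the target gene is first normalized to the reference gene (actin) and then to the untreated sample, with amplification efficiency taken as 100%; the function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative expression ratio by the 2^-ddCt method."""
    delta_treated = ct_target_treated - ct_ref_treated   # dCt, flooded sample
    delta_control = ct_target_control - ct_ref_control   # dCt, untreated sample
    delta_delta = delta_treated - delta_control          # ddCt
    return 2.0 ** (-delta_delta)


# Made-up Ct values: the target gene crosses the threshold 3 cycles earlier
# under flooding, while the reference gene (actin) is unchanged.
fold_change = relative_expression(
    ct_target_treated=22.0, ct_ref_treated=18.0,
    ct_target_control=25.0, ct_ref_control=18.0,
)
```

A ratio above 1 indicates upregulation relative to the untreated control, and a ratio below 1 indicates downregulation.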

To test for significant differences in the transcript quantities of FTgenes under flooding stress, statistical analysis was performed using one-way ANOVA followed by Tukey’s HSD test in IBM SPSS Statistics software. p-values less than 0.05 were considered statistically significant.

Conclusions

This study sheds new light on the effectiveness of a systems biology framework, based on the FTgenes selected from integrated OnO data and a gene prioritization algorithm, for uncovering the mechanisms behind flooding-tolerant responses in soybean. We proposed a computational systems biology pipeline to discover enriched pathways and nine key genes that responded to flooding stress in our qRT-PCR experiments. This work suggests that the integrative pathway and network framework at the systems biology level can be a good foundation for key gene discovery and gene function analysis in further work. In addition, this pipeline can help minimize potential uncertainties and false positives and provide valuable insights into the mechanisms underlying flooding-tolerant responses in soybean. The framework presented in this work can be applied to other complex traits in important crops.

Data availability

The raw RNA-seq data of the whole genome gene expression dataset of soybean seedling submergence can be accessed in the NCBI Sequence Read Archive (SRA), and the accession number is SRP181976. The data of the flooding-tolerance genes (FTgenes) presented in the study are deposited in the DRYAD repository, accession number for a unique digital object identifier (DOI): https://doi.org/10.5061/dryad.dv41ns229. The dataset of the FTgenes is available at https://datadryad.org/stash/share/yfjZHzx6Oal5UyUr87EISoC6txczBChObdEOYAwSbTE. Soybean seeds were obtained from Thanya Farm Co., Ltd., Nonthaburi, Thailand.

References

Terahara, N. Flavonoids in foods: A review. Nat. Prod. Commun. 10, 521–528 (2015).
Kim, E. H., Ro, H. M., Kim, S. L., Kim, H. S. & Chung, I. M. Analysis of isoflavone, phenolic, soyasapogenol, and tocopherol compounds in soybean Glycine max (L.) Merrill germplasms of different seed weights and origins. J. Agric. Food Chem. 60, 6045–6055 (2012).
Beavers, K. M., Jonnalagadda, S. S. & Messina, M. J. Soy consumption, adhesion molecules, and pro-inflammatory cytokines: A brief review of the literature. Nutr. Rev. 67, 213–221 (2009).
Hernandez-Montes, E. et al. Activation of glutathione peroxidase via Nrf1 mediates genistein’s protection against oxidative endothelial cell injury. Biochem. Biophys. Res. Commun. 346, 851–859 (2006).
Suzuki, K. et al. Genistein, a soy isoflavone, induces glutathione peroxidase in the human prostate cancer cell lines LNCaP and PC-3. Int. J. Cancer 99, 846–852 (2002).
Ali, T. et al. Natural dietary supplementation of anthocyanins via PI3K/Akt/Nrf2/HO-1 pathways mitigate oxidative stress, neurodegeneration, and memory impairment in a mouse model of Alzheimer’s disease. Mol. Neurobiol. 55, 6076–6093 (2018).
Min, J. Y. et al. Neuroprotective effect of cyanidin-3-O-glucoside anthocyanin in mice with focal cerebral ischemia. Neurosci. Lett. 500, 157–161 (2011).
Shin, W. H., Park, S. J. & Kim, E. J. Protective effect of anthocyanins in middle cerebral artery occlusion and reperfusion model of cerebral ischemia in rats. Life Sci. 79, 130–137 (2006).
Oh, M. & Komatsu, S. Characterization of proteins in soybean roots under flooding and drought stresses. J. Proteom. 114, 161–181 (2015).
Wang, X. & Komatsu, S. Proteomic approaches to uncover the flooding and drought stress response mechanisms in soybean. J. Proteom. 172, 201–215 (2018).
Sun, W. J. et al. Climate drives global soil carbon sequestration and crop yield changes under conservation agriculture. Glob. Change Biol. 26, 3325–3335 (2020).
Teshome, D. T., Zharare, G. E. & Naidoo, S. The threat of the combined effect of biotic and abiotic stress factors in forestry under a changing climate. Front. Plant Sci. 11, 601009 (2020).
Dietzel, R. et al. How efficiently do corn- and soybean-based cropping systems use water? A systems modeling analysis. Glob. Change Biol. 22, 666–681 (2016).
Tamang, B. G., Li, S., Rajasundaram, D., Lamichhane, S. & Fukao, T. Overlapping and stress-specific transcriptomic and hormonal responses to flooding and drought in soybean. Plant J. 107, 100–117 (2021).
Feng, Z., Ding, C. Q., Li, W. H., Wang, D. C. & Cui, D. Applications of metabolomics in the research of soybean plant under abiotic stress. Food Chem. 310, 125914 (2020).
Fukao, T., Barrera-Figueroa, B. E., Juntawong, P. & Pena-Castro, J. M. Submergence and waterlogging stress in plants: A review highlighting research opportunities and understudied aspects. Front. Plant Sci. 10, 340 (2019).
Li, M. W. et al. Using genomic information to improve soybean adaptability to climate change. J. Exp. Bot. 68, 1823–1834 (2017).
Valliyodan, B. et al. Genetic diversity and genomic strategies for improving drought and waterlogging tolerance in soybeans. J. Exp. Bot. 68, 1835–1849 (2017).
Yu, Z. P. et al. Identification of QTN and candidate gene for seed-flooding tolerance in soybean Glycine max (L.) Merr. using genome-wide association study (GWAS). Genes 10, 957 (2019).
Khan, M. N., Saizata, K. & Komatsu, S. Proteomic analysis of soybean hypocotyl during recovery after flooding stress. J. Proteom. 121, 15–27 (2015).
Valliyodan, B. et al. Expression of root-related transcription factors associated with flooding tolerance of soybean (Glycine max). Int. J. Mol. Sci. 15, 17622–17643 (2014).
Yin, X. J., Hiraga, S., Hajika, M., Nishimura, M. & Komatsu, S. Transcriptomic analysis reveals the flooding tolerant mechanism in flooding tolerant line and abscisic acid treated soybean. Plant Mol. Biol. 93, 479–496 (2017).
Chen, S. L., Ehrhardt, D. W. & Somerville, C. R. Mutations of cellulose synthase (CESA1) phosphorylation sites modulate anisotropic cell expansion and bidirectional mobility of cellulose synthase. Proc. Natl. Acad. Sci. U.S.A. 107, 17188–17193 (2010).
Fatland, B. L., Nikolau, B. J. & Wurtele, E. S. Reverse genetic characterization of cytosolic acetyl-CoA generation by ATP-citrate lyase in Arabidopsis. Plant Cell 17, 182–203 (2005).
Komatsu, S., Kobayashi, Y., Nishizawa, K., Nanjo, Y. & Furukawa, K. Comparative proteomics analysis of differentially expressed proteins in soybean cell wall during flooding stress. Amino Acids 39, 1435–1449 (2010).
Sunna, A. & Antranikian, G. Xylanolytic enzymes from fungi and bacteria. Crit. Rev. Biotechnol. 17, 39–67 (1997).
Wang, X. & Komatsu, S. Review: Proteomic techniques for the development of flood-tolerant soybean. Int. J. Mol. Sci. 21, 7497 (2020).
Yin, X. J. & Komatsu, S. Comprehensive analysis of response and tolerant mechanisms in early-stage soybean at initial-flooding stress. J. Proteom. 169, 225–232 (2017).
Yin, X., Sakata, K., Nanjo, Y. & Komatsu, S. Analysis of initial changes in the proteins of soybean root tip under flooding stress using gel-free and gel-based proteomic techniques. J. Proteom. 106, 1–16 (2014).
Johnson, S., Michalak, M., Opas, M. & Eggleton, P. The ins and outs of calreticulin: From the ER lumen to the extracellular space. Trends Cell Biol. 11, 122–129 (2001).
Smith, A. M. & Rees, T. A. Pathways of carbohydrate fermentation in the roots of marsh plants. Planta 146, 327–334 (1979).
Jackson, M. B., Ishizawa, K. & Ito, O. Evolution and mechanisms of plant tolerance to flooding stress. Ann. Bot. 103, 137–142 (2009).
Komatsu, S. et al. Label-free quantitative proteomic analysis of abscisic acid effect in early-stage soybean under flooding. J. Proteome Res. 12, 4769–4784 (2013).
Khan, M. N., Sakata, K., Hiraga, S. & Komatsu, S. Quantitative proteomics reveals that peroxidases play key roles in post-flooding recovery in soybean roots. J. Proteome Res. 13, 5812–5828 (2014).
Nanjo, Y. et al. Transcriptional responses to flooding stress in roots including hypocotyl of soybean seedlings. Plant Mol. Biol. 77, 129–144 (2011).
Lai, M. C., Lai, Z. Y., Jhan, L. H., Lai, Y. S. & Kao, C. F. Prioritization and evaluation of flooding tolerance genes in soybean Glycine max (L.) Merr. Front. Genet. 11, 612131 (2021).
Komatsu, S. et al. A comprehensive analysis of the soybean genes and proteins expressed under flooding stress using transcriptome and proteome techniques. J. Proteome Res. 8, 4766–4778 (2009).
Xia, J. B. et al. Gene prioritization of resistant rice gene against Xanthomas oryzae pv. oryzae by using text mining technologies. Biomed. Res. Int. 2013, 853043 (2013).
Zhai, J. J. et al. A meta-analysis based method for prioritizing candidate genes involved in a pre-specific function. Front. Plant Sci. 7, 01914 (2016).
Kaler, A. S. & Purcell, L. C. Estimation of a significance threshold for genome-wide association studies. BMC Genom. 20, 7 (2019).
Perneger, T. V. What’s wrong with Bonferroni adjustments. Br. Med. J. 316, 1236–1238 (1998).
Kitano, H. Systems biology: A brief overview. Science 295, 1662–1664 (2002).
Karahalil, B. Overview of systems biology and omics technologies. Curr. Med. Chem. 23, 4221–4230 (2016).
Canzler, S. & Hackermuller, J. multiGSEA: A GSEA-based pathway enrichment analysis for multi-omics data. BMC Bioinform. 21, 1 (2020).
Bertozzi, A. L. Proceedings of the International Congress of Mathematicians: Rio de Janeiro 3865–3892 (World Scientific, 2018).
Ristevski, I., Flegg, K., Livingstone, M. & Dimaras, H. Co-creation of a pathway of care for retinoblastoma patients and families. Pediatr. Blood Cancer 67, S362–S363 (2020).
Nam, D. & Kim, S. Y. Gene-set approach for expression pattern analysis. Brief. Bioinform. 9, 189–197 (2008).
Goeman, J. J. & Buhlmann, P. Analyzing gene expression data in terms of gene sets: Methodological issues. Bioinformatics 23, 980–987 (2007).
Ebrahimpoor, M., Spitali, P., Hettne, K., Tsonaka, R. & Goeman, J. Simultaneous enrichment analysis of all possible gene-sets: Unifying self-contained and competitive methods. Brief. Bioinform. 21, 1302–1312 (2020).
Li, J. J. et al. Comparative transcriptome analysis between the cytoplasmic male sterile line NJCMS1A and its maintainer NJCMS1B in soybean (Glycine max (L.) Merr.). PLoS ONE 10, e0126771 (2015).
Shi, G. X. et al. RNA-Seq analysis reveals that multiple phytohormone biosynthesis and signal transduction pathways are reprogrammed in curled-cotyledons mutant of soybean Glycine max (L.) Merr. BMC Genom. 15, 510 (2014).
Shu, Y. J. et al. A transcriptomic analysis reveals soybean seed pre-harvest deterioration resistance pathways under high temperature and humidity stress. Genome 63, 115–124 (2020).
Naithani, S. et al. Plant reactome: A knowledgebase and resource for comparative pathway analysis. Nucleic Acids Res. 48, D1093–D1103 (2020).
Huang, D. W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 1–13 (2009).
Jia, P. L., Kao, C. F., Kuo, P. H. & Zhao, Z. M. A comprehensive network and pathway analysis of candidate genes in major depressive disorder. BMC Syst. Biol. 5, S12 (2011).
Zheng, M. L., Zhou, N. K., Huang, D. L. & Luo, C. H. Pathway cross-talk network strategy reveals key pathways in non-small cell lung cancer. J. BUON 22, 1252–1258 (2017).
Carter, H., Hofree, M. & Ideker, T. Genotype to phenotype via network analysis. Curr. Opin. Genet. Dev. 23, 611–621 (2013).
Lin, Y. H. et al. Identification of genes/proteins related to submergence tolerance by transcriptome and proteome analyses in soybean. Sci. Rep. 9, 14688 (2019).
Kitano, H. Computational systems biology. Nature 420, 206–210 (2002).
Zhao, Z. M. et al. The international conference on intelligent biology and medicine (ICIBM) 2019: Bioinformatics methods and applications for human diseases. BMC Bioinform. 20, 4 (2019).
Chen, W. et al. Identification and comparative analysis of differential gene expression in soybean leaf tissue under drought and flooding stress revealed by RNA-Seq. Front. Plant Sci. 7, 1044 (2016).
Wu, M. C. & Lin, X. H. Prior biological knowledge-based approaches for the analysis of genome-wide expression profiles using gene sets and pathways. Stat. Methods Med. Res. 18, 577–593 (2009).
de Leeuw, C. A., Neale, B. M., Heskes, T. & Posthuma, D. The statistical properties of gene-set analysis. Nat. Rev. Genet. 17, 353–364 (2016).
Rahmatallah, Y., Emmert-Streib, F. & Glazko, G. Gene set analysis approaches for RNA-seq data: Performance evaluation and application guideline. Brief. Bioinform. 17, 393–407 (2016).
Binns, D. et al. QuickGO: A web-based tool for Gene Ontology searching. Bioinformatics 25, 3045–3046 (2009).
Gao, Q. M., Zhu, S. F., Kachroo, P. & Kachroo, A. Signal regulators of systemic acquired resistance. Front. Plant Sci. 6, 228 (2015).
Wang, C. X. et al. Free radicals mediate systemic acquired resistance. Cell Rep. 7, 348–355 (2014).
El-Shetehy, M. et al. Nitric oxide and reactive oxygen species are required for systemic acquired resistance in plants. Plant Signal. Behav. 10, e998544 (2015).
Hussain, S. et al. Comparative transcriptional profiling of primed and non-primed rice seedlings under submergence stress. Front. Plant Sci. 7, 01125 (2016).
Li, Y. S., Ou, S. L. & Yang, C. Y. The seedlings of different japonica rice varieties exhibit differ physiological properties to modulate plant survival rates under submergence stress. Plants-Basel 9, 982 (2020).
Khatoon, A., Rehman, S., Oh, M. W., Woo, S. H. & Komatsu, S. Analysis of response mechanism in soybean under low oxygen and flooding stresses using gel-base proteomics technique. Mol. Biol. Rep. 39, 10581–10594 (2012).
Dubey, A., Malla, M. A. & Kumar, A. Taxonomical and functional bacterial community profiling in disease-resistant and disease-susceptible soybean cultivars. Braz. J. Microbiol. 53, 1355 (2022).
Carbon, S. et al. The Gene Ontology resource: Enriching a GOld mine. Nucleic Acids Res. 49, D325–D334 (2021).
Ashburner, M. et al. Gene Ontology: Tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
Evangelou, M., Rendon, A., Ouwehand, W. H., Wernisch, L. & Dudbridge, F. Comparison of methods for competitive tests of pathway analysis. PLoS ONE 7, e41018 (2012).
Tintle, N. L., Borchers, B., Brown, M. & Bekmetjev, A. Comparing gene set analysis methods on single-nucleotide polymorphism data from Genetic Analysis Workshop 16. BMC Proc. 3, S96 (2009).
Berger, S. I., Posner, J. M. & Ma’ayan, A. Genes2Networks: Connecting lists of gene symbols using mammalian protein interactions databases. BMC Bioinform. 8, 372 (2007).
Kim, E., Hwang, S. & Lee, I. SoyNet: A database of co-functional networks for soybean Glycine max. Nucleic Acids Res. 45, D1082–D1089 (2017).
Shannon, P. et al. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289–300 (1995).

Acknowledgements

The authors gratefully acknowledge the financial support from the NCHU-KU Joint Research Project (No.00062022). This work was financially supported (in part) by the Faculty of Science at Sriracha Kasetsart University, Thailand, and the Advanced Plant Biotechnology Center from The Featured Areas Research Center Program within the framework of the Higher Education Sprout Project by the Ministry of Education (MOE) in Taiwan. We thank Hao-Wei Fu for maintaining the FTgenes and being in charge of data management. We also thank Min-Lun Lee for English editing.

Author information

Authors and Affiliations

Department of Agronomy, College of Agriculture and Natural Resources, National Chung Hsing University, Taichung, Taiwan
Li-Hsin Jhan, Chin-Ying Yang, Chih-Min Huang, Yen-Hsiang Huang & Chung-Feng Kao
Physiology and Biochemistry Division, Taiwan Banana Research Institute, Pingtung, Taiwan
Mu-Chien Lai
Department of Resource and Environment, Faculty of Science at Sriracha, Kasetsart University at Sriracha Campus, Sriracha, 20230, Chonburi, Thailand
Supaporn Baiya
Advanced Plant Biotechnology Center, National Chung Hsing University, Taichung, Taiwan
Chung-Feng Kao

Contributions

Study conception and design: C.F.K.; acquisition and analysis of data: L.H.J., C.F.K., S.B., M.C.L. and C.M.H.; interpretation of data: L.H.J., C.F.K., S.B. and C.Y.Y.; original draft preparation: L.H.J., C.F.K. and C.Y.Y.; revising and editing the manuscript: C.F.K., S.B. and L.H.J. All authors have read and approved the published version of the manuscript.

Corresponding authors

Correspondence to Supaporn Baiya or Chung-Feng Kao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Table 1.

Supplementary Table 2.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Jhan, LH., Yang, CY., Huang, CM. et al. Integrative pathway and network analysis provide insights on flooding-tolerance genes in soybean. Sci Rep 13, 1980 (2023). https://doi.org/10.1038/s41598-023-28593-1

Received: 01 September 2022
Accepted: 20 January 2023
Published: 03 February 2023
DOI: https://doi.org/10.1038/s41598-023-28593-1
