Abstract
Soybean is one of the most important legume crops worldwide. However, soybean yield is dramatically affected by fungal diseases, leading to economic losses of billions of dollars yearly. Here, we integrated publicly available genome-wide association studies and transcriptomic data to prioritize candidate genes associated with resistance to Cadophora gregata, Fusarium graminearum, Fusarium virguliforme, Macrophomina phaseolina, and Phakopsora pachyrhizi. We identified 188, 56, 11, 8, and 3 high-confidence candidates for resistance to F. virguliforme, F. graminearum, C. gregata, M. phaseolina and P. pachyrhizi, respectively. The prioritized candidate genes are highly conserved in the pangenome of cultivated soybeans and are heavily biased towards fungal species-specific defense responses. The vast majority of the prioritized candidate resistance genes are related to plant immunity processes, such as recognition, signaling, oxidative stress, systemic acquired resistance, and physical defense. Based on the number of resistance alleles, we selected the five most resistant accessions against each fungal species in the soybean USDA germplasm. Interestingly, the most resistant accessions do not reach the maximum theoretical resistance potential. Hence, they can be further improved to increase resistance in breeding programs or through genetic engineering. Finally, the coexpression network generated here is available in a user-friendly web application (https://soyfungigcn.venanciogroup.uenf.br/) and an R/Shiny package (https://github.com/almeidasilvaf/SoyFungiGCN) that serve as a public resource to explore soybean-pathogenic fungi interactions at the transcriptional level.
Introduction
Soybean (Glycine max (L.) Merr.) is a major legume crop worldwide, contributing to global food security and economy. However, soybean yield is significantly affected by diseases, with an estimated economic loss of 95.8 billion dollars from 1996 to 2016 in the US1. Most of the yield loss has been linked to foliar and stem/root diseases, which are mostly caused by phytopathogenic fungi1. Fungal diseases, such as sudden death syndrome, Fusarium wilt, brown stem rot, and Asian rust, can impact soybean crops through leaf damage, necrosis, chlorosis, and death1,2,3.
Over the past decade, several genome-wide association studies (GWAS) have uncovered multiple single-nucleotide polymorphisms (SNPs) associated with resistance to pathogenic fungi in soybean populations3,4,5,6,7,8,9. Nevertheless, GWAS often fail to accurately pinpoint the causative genes10. GWAS limitations are particularly challenging for self-pollinating plants (e.g., soybean) because of limited recombination and strong linkage disequilibrium between causative and non-causative variants11. Such limitations ultimately lead to large genetic intervals with several genes, hindering causative gene identification. Because of the exponential accumulation of genomic and transcriptomic data in public databases12,13,14,15,16, integrative analyses to prioritize candidate genes have become a promising approach. This strategy consists of investigating the transcriptional patterns of all the genes near a significant SNP. Hence, the combination of multiple sources of evidence can result in richer and narrower sets of high-confidence candidate genes for downstream experimental validation towards biotechnological applications.
Here, we integrated multiple publicly available RNA-seq and GWAS datasets to identify high-confidence candidate genes for resistance to five phytopathogenic fungi. The prioritized resistance genes are species-specific and highly conserved in the pangenome of cultivated soybeans. The candidate resistance genes against each species are involved in various immunity-related processes, such as recognition, signaling, oxidative stress, and apoptosis. We also highlighted the five most resistant accessions against each fungal species in the USDA germplasm, uncovering important information for breeding programs and genetic engineering initiatives. Finally, the coexpression network resulting from this work was made available as a public web application (https://soyfungigcn.venanciogroup.uenf.br/) and R/Shiny package (https://github.com/almeidasilvaf/SoyFungiGCN).
Materials and methods
Curation of resistance-associated SNPs and pan-genome data
SNPs that contribute to resistance against phytopathogenic fungi were manually curated from the scientific literature (Table 1; Supplementary Table S1). SNPs that were identified using the Gmax_a1.v1 genome were converted to their corresponding sites in the Gmax_a2.v1 assembly using the .vcf files for both assemblies available at Soybase17. A matrix of gene presence/absence variation (PAV) in the pan-genome of cultivated soybeans (n = 204 genomes from 24 countries and 5 continents) was obtained from the Supplementary Data in18.
Transcriptome data
Gene expression estimates in transcripts per million mapped reads (TPM, Kallisto estimation) were retrieved from the Soybean Expression Atlas19. Additional RNA-seq samples comprising soybean tissues infected with fungal pathogens were retrieved from a recent publication from our group20. We filtered the SNP and transcriptome datasets to keep only fungal species that were represented by both data sources. A total of 150 RNA-seq samples from soybean tissues infected with fungal pathogens were selected (Supplementary Table S2). Finally, genes with median expression values lower than 5 were excluded to attenuate noise, resulting in an 18,748 × 150 gene expression matrix for downstream analyses.
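The median-based noise filter described above can be illustrated with a minimal Python sketch. It is not the code used in the study, which was written in R; the function name `filter_by_median`, the toy gene IDs, and the dictionary layout (gene ID mapped to a list of per-sample expression values) are assumptions made for illustration only.

```python
from statistics import median


def filter_by_median(expression, min_median=5.0):
    """Keep genes whose median expression across samples is at least min_median.

    expression: dict mapping a gene ID to a list of expression values,
    one value per RNA-seq sample (hypothetical layout).
    """
    return {
        gene: values
        for gene, values in expression.items()
        if median(values) >= min_median
    }


# Toy matrix with three genes and four samples (made-up values).
toy_expression = {
    "geneA": [0, 1, 2, 30],  # median 1.5, removed
    "geneB": [6, 7, 8, 9],   # median 7.5, kept
    "geneC": [5, 5, 5, 5],   # median 5, kept (not lower than 5)
}
kept = filter_by_median(toy_expression)
```

Using the median rather than the mean keeps a gene such as `geneA`, which is highly expressed in a single sample only, from passing the filter.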
Selection of guide genes
MapMan annotations for soybean genes were retrieved from the PLAZA 3.0 Dicots database21. Genes assigned to defense-related pathways (e.g., pathogenesis-related proteins, lignin biosynthesis, oxidative stress, and phytohormone regulation) were used as guides (Supplementary Table S3).
Candidate gene mining and functional analyses
Gene expression data were adjusted for confounding artifacts and quantile normalized with the R package BioNERO22. An unsigned coexpression network was inferred with BioNERO using Pearson’s r as the correlation measure. All genes located in a 2 Mb sliding window relative to each SNP were selected as putative candidates, as previously proposed23. Candidate genes were prioritized using the algorithm implemented in the R package cageminer24, with an \(r_{pb}\) threshold of 0.2 for gene significance (gene-trait correlation). Enrichment analyses were also performed with BioNERO, using functional annotations from the PLAZA 4.0 database25. To rank the prioritized candidates, each candidate gene \(i\) was given a score \(S_i\) using the formula:
\[S_i = r_{pb} \, \kappa\]
where
- \(r_{pb}\) = point-biserial correlation coefficient (cageminer algorithm)
- \(\kappa = 2\) if the gene is a transcription factor
- \(\kappa = 2\) if the gene is a hub
- \(\kappa = 3\) if the gene is a hub and a transcription factor
- \(\kappa = 1\) if the gene is neither a hub nor a transcription factor.
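As a rough illustration of how these quantities combine, the Python sketch below computes a point-biserial correlation (Pearson's r between expression values and a 0/1 condition vector), looks up \(\kappa\) from hub and transcription factor status, and multiplies the two. This is not the cageminer implementation. All function names and toy values are hypothetical, and ranking by the absolute value of the score is an assumption made here so that strongly negative correlations are not pushed to the bottom.

```python
from math import sqrt


def point_biserial(expression, condition):
    """Pearson correlation between expression values and a 0/1 condition vector."""
    n = len(expression)
    mean_x = sum(expression) / n
    mean_y = sum(condition) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(expression, condition))
    var_x = sum((x - mean_x) ** 2 for x in expression)
    var_y = sum((y - mean_y) ** 2 for y in condition)
    return cov / sqrt(var_x * var_y)


def kappa(is_hub, is_tf):
    """Weight: 3 for hub and TF, 2 for hub or TF, 1 otherwise."""
    if is_hub and is_tf:
        return 3
    if is_hub or is_tf:
        return 2
    return 1


def candidate_score(r_pb, is_hub, is_tf):
    """Score of one candidate gene: r_pb multiplied by kappa."""
    return r_pb * kappa(is_hub, is_tf)


def rank_candidates(candidates):
    """Sort (gene, score) pairs by absolute score, highest first.

    candidates: dict gene ID -> (r_pb, is_hub, is_tf).
    Ranking on the absolute value is an assumption of this sketch.
    """
    scores = {gene: candidate_score(*info) for gene, info in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy example: expression in 2 control (0) and 2 infected (1) samples.
r = point_biserial([1, 2, 8, 9], [0, 0, 1, 1])
ranking = rank_candidates({
    "g1": (0.5, False, False),  # kappa = 1
    "g2": (0.3, True, True),    # kappa = 3
    "g3": (-0.4, True, False),  # kappa = 2
})
```

In this toy ranking, `g2` comes first despite having the weakest correlation, because being both a hub and a transcription factor triples its score.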
Selection of most resistant accessions from the USDA germplasm
The VCF file with genotypic information for all accessions in the USDA germplasm was downloaded from Soybase17. For each locus, scores 0, 1, or 2 were attributed if accessions had 0, 1, or 2 beneficial SNPs (effect size > 0), respectively, whereas scores 2, 1, or 0 were attributed if accessions had 0, 1, or 2 deleterious SNPs (effect size < 0). Total resistance scores for each accession were calculated as the sum of scores \(S_i\) for all \(n\) loci as follows:
\[S = \sum_{i=1}^{n} S_i\]
Total resistance scores were ranked from highest to lowest, and ranks were used to select the most resistant accessions. The resistance potential of the best accessions was calculated as a ratio of the attributed scores to the theoretical maximum score (all beneficial SNPs and no deleterious SNPs).
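The scoring and ranking procedure can be sketched in a few lines of Python. This is an illustrative reading of the description above, not the authors' code. The function names, the SNP and accession identifiers, and the encoding of genotypes as counts (0, 1, or 2) of the effect allele are assumptions. A locus with an effect size of exactly zero is not covered by the description, so the sketch rejects it.

```python
def locus_score(effect_allele_count, effect_size):
    """Score one locus: reward beneficial alleles, penalize deleterious ones."""
    if effect_allele_count not in (0, 1, 2):
        raise ValueError("allele count must be 0, 1, or 2")
    if effect_size > 0:  # beneficial SNP: more copies, higher score
        return effect_allele_count
    if effect_size < 0:  # deleterious SNP: fewer copies, higher score
        return 2 - effect_allele_count
    raise ValueError("effect size of zero is not covered by the scheme")


def resistance_score(genotype, effects):
    """Total score: sum of per-locus scores over all loci in `effects`."""
    return sum(locus_score(genotype[snp], effects[snp]) for snp in effects)


def resistance_potential(genotype, effects):
    """Ratio of the observed score to the theoretical maximum (2 per locus)."""
    return resistance_score(genotype, effects) / (2 * len(effects))


def rank_accessions(genotypes, effects, top=5):
    """Return the `top` accession IDs with the highest total scores."""
    scores = {acc: resistance_score(gt, effects) for acc, gt in genotypes.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Toy data: three SNPs (two beneficial, one deleterious), three accessions.
effects = {"snp1": 0.4, "snp2": -0.3, "snp3": 0.1}
genotypes = {
    "acc1": {"snp1": 2, "snp2": 0, "snp3": 1},  # score 2 + 2 + 1 = 5
    "acc2": {"snp1": 0, "snp2": 2, "snp3": 0},  # score 0 + 0 + 0 = 0
    "acc3": {"snp1": 2, "snp2": 0, "snp3": 2},  # score 2 + 2 + 2 = 6
}
best = rank_accessions(genotypes, effects, top=2)
```

An accession reaches a potential of 1.0 only when it is homozygous for every beneficial allele and carries no deleterious allele, as `acc3` does in the toy data.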
Results and discussion
Data summary and genomic distribution of SNPs
After filtering the datasets to keep only fungal species represented by both SNP and transcriptome information, we kept five common phytopathogenic fungi: Cadophora gregata, Fusarium graminearum, Fusarium virguliforme, Macrophomina phaseolina, and Phakopsora pachyrhizi (Fig. 1A). Overall, SNPs were located in gene-rich regions of the genome (Fig. 1B). SNPs were unevenly distributed across chromosomes, except for F. virguliforme (Fig. 1C). Further, we found that most SNPs were located in intergenic regions (Fig. 1D). Hence, predicting SNP effect on genes would not be suitable for this trait.
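Because most SNPs fall outside genes, candidates are drawn from the genomic neighborhood of each SNP rather than from the SNP position itself. The Python sketch below shows one way such a window-based selection could work. It is an illustration only: the function name, the coordinate layout, and the toy gene IDs are hypothetical, and the text does not specify whether the 2 Mb window extends on each side of the SNP or spans it in total. The sketch assumes 2 Mb on each side and exposes this as the `flank_bp` parameter.

```python
def genes_near_snp(snp, genes, flank_bp=2_000_000):
    """Return IDs of genes overlapping a window around a SNP.

    snp: (chromosome, position) tuple.
    genes: dict gene ID -> (chromosome, start, end).
    flank_bp: distance on each side of the SNP (assumed interpretation).
    """
    chrom, pos = snp
    window_start = max(0, pos - flank_bp)
    window_end = pos + flank_bp
    return [
        gene_id
        for gene_id, (g_chrom, g_start, g_end) in genes.items()
        if g_chrom == chrom and g_start <= window_end and g_end >= window_start
    ]


# Toy annotation with made-up coordinates.
toy_genes = {
    "geneA": ("Chr01", 4_000_000, 4_005_000),  # inside the window
    "geneB": ("Chr01", 7_500_000, 7_503_000),  # too far downstream
    "geneC": ("Chr02", 5_000_000, 5_002_000),  # different chromosome
    "geneD": ("Chr01", 6_999_000, 7_001_000),  # overlaps the window edge
}
putative = genes_near_snp(("Chr01", 5_000_000), toy_genes)
```

Genes that only partially overlap the window, such as `geneD`, are retained here. Requiring full containment would be an equally defensible choice.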
Candidate gene mining reveals a highly species-specific immune response
Using defense-related genes as guides, the cageminer algorithm identified 188, 56, 11, 8, and 3 high-confidence genes for F. virguliforme, F. graminearum, C. gregata, M. phaseolina, and P. pachyrhizi, respectively (Fig. 2). Only three genes were shared between species, revealing a high specificity in plant-pathogen interactions for these species. The three genes are shared by F. virguliforme and F. graminearum, suggesting that some conservation can occur at the genus level, but not at other broader taxonomic levels.
The specificity of resistance genes to particular species has been widely reported26,27,28,29. This phenomenon imposes a challenge for biotechnological applications, as it requires pyramiding many different genes to render elite cultivars resistant to different pathogens. However, we cannot rule out that the species-specific trend we observed results from low diversity in the association panels in the GWAS we analyzed. Additionally, as SNP and transcriptome data are not available for multiple pathogen strains, we might overlook broad-spectrum resistance genes that confer resistance to multiple strains of the same species27.
Further, we manually curated the high-confidence candidate resistance genes to predict the putative role of their products in plant immunity (Supplementary Table S4). The largest share of the prioritized candidates (28%) encode proteins involved in immune signaling, although this does not apply to all fungal species (Fig. 3). The main discrepancy in the functional classification of candidates was observed for candidate resistance genes against P. pachyrhizi. However, this is likely due to sampling bias, as the number of SNPs associated with resistance to P. pachyrhizi is limited compared with other species. Candidates also encode proteins that play a role in recognition, phytohormone metabolism, systemic acquired resistance, transport, transcriptional regulation, oxidative stress, apoptosis, physical defense, and direct function against fungi (Fig. 3).
Interestingly, 21 candidate genes lack a functional description and, hence, we could not infer their roles in plant immunity (n = 2, 4, 14, and 1 for C. gregata, F. graminearum, F. virguliforme, and P. pachyrhizi, respectively). Nevertheless, as they were identified as high-confidence candidate genes, we hypothesize that they encode defense-related proteins. This finding reveals that, besides identifying high-confidence candidate genes, our algorithm can serve as a network-based approach to predict functions of unannotated genes, similar to previous approaches30,31.
We also developed a scheme that was used to rank high-confidence candidate genes (Table 2). Ranking candidates is particularly useful to prioritize genes when there are several candidates, such as for F. virguliforme and F. graminearum. Here, we suggest using the top 10 candidate resistance genes against each pathogen for experimental validation in future studies. Experimental tests with transgenic or edited soybeans using our set of target genes will likely reveal which genes are more suitable to develop soybean lines with increased resistance to each fungal disease.
Pangenome presence/absence variation analysis demonstrates that most prioritized genes are core genes
We analyzed PAV patterns for our prioritized candidate genes in the recently published pangenome of cultivated soybeans to unveil which soybean genotypes contain prioritized candidate genes and explore gene presence/absence variation patterns across genomes18. We found that most candidates are present in all 204 accessions (Supplementary Fig. 1A). This trend is not surprising, as the gene content in this pangenome is highly conserved, with ~ 91% of the genes being shared by > 99% of the genomes. Although the variable genome is enriched in genes associated with defense, signaling, and plant development, this trend was not found in our gene set.
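A presence/absence check of this kind can be sketched as follows in Python. The sketch is illustrative and not taken from the study's code. The function name, the gene IDs, and the PAV layout (gene ID mapped to a list of 0/1 calls, one per genome) are assumptions, as is the default definition of a core gene as one present in every genome.

```python
def classify_pav(pav, core_fraction=1.0):
    """Label each gene as 'core' or 'variable' from a presence/absence matrix.

    pav: dict gene ID -> list of 0/1 calls, one per genome (hypothetical layout).
    core_fraction: minimum fraction of genomes that must carry the gene for it
    to count as core (1.0 means present in all genomes; an assumption here).
    """
    labels = {}
    for gene, calls in pav.items():
        fraction_present = sum(calls) / len(calls)
        labels[gene] = "core" if fraction_present >= core_fraction else "variable"
    return labels


# Toy PAV matrix for three candidate genes across five genomes.
toy_pav = {
    "cand1": [1, 1, 1, 1, 1],  # present everywhere
    "cand2": [1, 1, 0, 1, 1],  # absent from one genome
    "cand3": [1, 1, 1, 1, 1],
}
labels = classify_pav(toy_pav)
n_core = sum(label == "core" for label in labels.values())
```

Lowering `core_fraction` (for example to 0.99) would relax the definition to tolerate a few missing calls.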
Further, we investigated whether gene PAV patterns could be explained by the geographical origins of the accessions (Supplementary Fig. 1B). We observed no clustering by geographical origin, suggesting that gene PAV is not affected by population structure. As this pangenome comprises improved soybean accessions18, the lack of a population structure effect may be due to breeding programs targeting optimal adaptation to different environmental conditions (e.g., latitude and climate), even within the same country.
Screening of the USDA germplasm reveals room for genetic improvement
We inspected the USDA germplasm to find the top 5 most resistant genotypes against each fungal pathogen (see Materials and Methods for details). Strikingly, the most resistant genotypes do not contain all resistance alleles, revealing that, theoretically, they could be further improved to increase resistance (Table 3). All resistance-associated SNPs against P. pachyrhizi are present in some accessions, but this is because only two SNPs have been reported for this species. Additionally, none of the reported SNPs for F. graminearum have been identified in the SoySNP50k collection. Hence, we could not predict the most resistant accessions to this fungal species in the USDA germplasm.
Although some individual genes can confer full race-specific resistance to some pathogens, their durability in the field is often short because of pathogen evolution27. Thus, pyramiding quantitative trait loci (QTL) that confer partial resistance has been proposed as a strategy to confer long-term resistance28. To accomplish this, the most resistant genotypes identified here can be targets of allele pyramiding in breeding programs using marker-assisted selection. Alternatively, these genotypes might have their genomes edited with CRISPR/Cas systems to introduce beneficial alleles or remove deleterious alleles, ultimately boosting resistance.
Development of a user-friendly web application for network exploration
To facilitate network exploration and data reuse, we developed a user-friendly web application named SoyFungiGCN (https://soyfungigcn.venanciogroup.uenf.br/). Users can input a soybean gene of interest (Wm82.a2.v1 assembly) and visualize the gene’s module, scaled intramodular degree, and hub status (Fig. 4A). Additionally, users can explore enriched GO terms, MapMan bins and/or InterPro domains associated with the input gene’s module (Fig. 4A). Users can also visualize a network plot with the input gene and its coexpression neighbors (Fig. 4B). This resource can be particularly useful for researchers studying soybean response to other fungal species, as they can check whether their genes of interest are located in defense-related coexpression modules. Researchers studying other species can likewise verify whether the soybean ortholog of their gene of interest is located in a defense-related module. The application is also available as an R package named SoyFungiGCN (https://github.com/almeidasilvaf/SoyFungiGCN). This package lets users run the application locally as a Shiny app, so the application remains available even in case of server downtime.
Conclusions
By integrating publicly available GWAS and RNA-seq data, we found promising candidate genes in soybean associated with resistance to five common phytopathogenic fungi, namely C. gregata, F. graminearum, F. virguliforme, M. phaseolina, and P. pachyrhizi. The prioritized candidates encode proteins that play a role in immunity-related processes, such as recognition, signaling, transcriptional regulation, oxidative stress, and physical defense. We have also found the top 5 most resistant soybean accessions against each fungal species and hypothesize that they can be further genetically improved in breeding programs with marker-assisted selection or through genome editing. The coexpression network generated here was also made available in a web resource and R package to help in future studies on soybean-pathogenic fungi interactions.
Data availability
All data and code used in this study are available in our GitHub repository (https://github.com/almeidasilvaf/SoyFungi_GWAS_GCN) to ensure full reproducibility.
References
Bandara, A. Y., Weerasooriya, D. K., Bradley, C. A., Allen, T. W. & Esker, P. D. Dissecting the economic impact of soybean diseases in the United States over two decades. PLoS ONE 15(4), 1–28. https://doi.org/10.1371/journal.pone.0231141 (2020).
Pandey, A. K. et al. Functional analysis of the asian soybean rust resistance pathway mediated by Rpp2. Mol. Plant-Microbe Interact. 24(2), 194–206. https://doi.org/10.1094/MPMI-08-10-0187 (2011).
Rincker, K., Lipka, A. E. & Diers, B. W. Genome-wide association study of brown stem rot resistance in soybean across multiple populations. Plant Genome https://doi.org/10.3835/plantgenome2015.08.0064 (2016).
Iquira, E., Humira, S. & François, B. Association mapping of QTLs for sclerotinia stem rot resistance in a collection of soybean plant introductions using a genotyping by sequencing (GBS) approach. BMC Plant Biol. 15(1), 1–12. https://doi.org/10.1186/s12870-014-0408-y (2015).
Sun, M. et al. Genome-wide association study of partial resistance to sclerotinia stem rot of cultivated soybean based on the detached leaf method. PLoS ONE 15(5), 1–15. https://doi.org/10.1371/journal.pone.0233366 (2020).
Kandel, R. et al. Soybean resistance to white mold: Evaluation of soybean germplasm under different conditions and validation of QTL. Front. Plant Sci. 9(April), 1–12. https://doi.org/10.3389/fpls.2018.00505 (2018).
Zhang, J., Singh, A., Mueller, D. S. & Singh, A. K. Genome-wide association and epistasis studies unravel the genetic architecture of sudden death syndrome resistance in soybean. Plant J. 84(6), 1124–1136. https://doi.org/10.1111/tpj.13069 (2015).
Zhang, C. et al. Loci and candidate genes in soybean that confer resistance to Fusarium graminearum. Theor. Appl. Genet. 132(2), 431–441. https://doi.org/10.1007/s00122-018-3230-3 (2019).
Chang, H. X., Lipka, A. E., Domier, L. L. & Hartman, G. L. Characterization of disease resistance loci in the USDA soybean germplasm collection using genome-wide association studies. Phytopathology 106(10), 1139–1151. https://doi.org/10.1094/PHYTO-01-16-0042-FI (2016).
Baxter, I. We aren’t good at picking candidate genes, and it’s slowing us down. Curr. Opin. Plant Biol. 54, 57–60. https://doi.org/10.1016/j.pbi.2020.01.006 (2020).
Michno, J. M., Liu, J., Jeffers, J. R., Stupar, R. M. & Myers, C. L. Identification of nodulation-related genes in Medicago truncatula using genome-wide association studies and co-expression networks. Plant Direct 4(5), 1–10. https://doi.org/10.1002/pld3.220 (2020).
Schwartz, T. S. The promises and the challenges of integrating multi-omics and systems biology in comparative stress biology. Integr. Comp. Biol. 53(9), 1689–1699. https://doi.org/10.1017/CBO9781107415324.004 (2020).
Deshmukh, R. et al. Integrating omic approaches for abiotic stress tolerance in soybean. Front. Plant Sci. 5, 1–12. https://doi.org/10.3389/fpls.2014.00244 (2014).
Schaefer, R. J. et al. Integrating coexpression networks with GWAS to prioritize causal genes in maize. Plant Cell 30(December), 2922–2942. https://doi.org/10.1105/tpc.18.00299 (2018).
Baker, R. L. et al. Integrating transcriptomic network reconstruction and eQTL analyses reveals mechanistic connections between genomic architecture and Brassica rapa development. PLOS Genet. 15(9), e1008367. https://doi.org/10.1371/journal.pgen.1008367 (2019).
Wen, Z. et al. Integrating GWAS and gene expression data for functional characterization of resistance to white mould in soya bean. Plant Biotechnol. J. 16(11), 1825–1835. https://doi.org/10.1111/pbi.12918 (2018).
Brown, A. V. et al. A new decade and new data at SoyBase, the USDA-ARS soybean genetics and genomics database. Nucleic Acids Res. 13(3), 1–6. https://doi.org/10.1093/nar/gkaa1107 (2020).
Torkamaneh, D., Lemay, M.-A. & Belzile, F. The pan-genome of the cultivated soybean (pansoy) reveals an extraordinarily conserved gene content. Plant Biotechnol. J. 19, 1852–1862. https://doi.org/10.1111/pbi.13600 (2021).
Machado, F. B. et al. Systematic analysis of 1,298 RNA-Seq samples and construction of a comprehensive soybean (Glycine max) expression atlas. Plant J. 103, 1894–2190. https://doi.org/10.1111/tpj.14850 (2020).
Almeida-Silva, F. & Venancio, T. M. Pathogenesis-related protein 1 (PR-1) genes in soybean: Genome-wide identification, structural analysis and expression profiling under multiple biotic and abiotic stresses. Gene 809, 146013. https://doi.org/10.1016/j.gene.2021.146013 (2022).
Proost, S. et al. PLAZA 3.0: an access point for plant comparative genomics. Nucleic Acids Res. 43(D1), D974–D981. https://doi.org/10.1093/nar/gku986 (2015).
Almeida-Silva, F. & Venancio, T. M. BioNERO: an all-in-one R/Bioconductor package for comprehensive and easy biological network reconstruction. Funct. Integr. Genom. https://doi.org/10.1007/s10142-021-00821-9 (2021).
Brodie, A., Azaria, J. R. & Ofran, Y. How far from the SNP may the causative genes be?. Nucleic Acids Res. 44(13), 6046–6054. https://doi.org/10.1093/nar/gkw500 (2016).
Almeida-Silva, F. & Venancio, T. M. cageminer: an R/Bioconductor package to prioritize candidate genes by integrating GWAS and gene coexpression networks. bioRxiv 54, 57. https://doi.org/10.1101/2021.08.04.455037 (2021).
Van Bel, M. et al. PLAZA 4.0: An integrative resource for functional, evolutionary and comparative plant genomics. Nucleic Acids Res. 46(D1), D1190–D1196. https://doi.org/10.1093/nar/gkx1002 (2018).
Kourelis, J. & Van Der Hoorn, R. A. L. Defended to the nines: 25 years of resistance gene cloning identifies nine mechanisms for R protein function. Plant Cell https://doi.org/10.1105/tpc.17.00579 (2018).
Ning, Y. & Wang, G. L. Breeding plant broad-spectrum resistance without yield penalties. Proc. Natl. Acad. Sci. USA 115(12), 2859–2861. https://doi.org/10.1073/pnas.1801235115 (2018).
Li, W., Deng, Y., Ning, Y., He, Z. & Wang, G. L. Exploiting broad-spectrum disease resistance in crops: From molecular dissection to breeding. Annu. Rev. Plant Biol. 71, 575–603. https://doi.org/10.1146/annurev-arplant-010720-022215 (2020).
Durrant, W. E. & Dong, X. Systemic acquired resistance. Annu. Rev. Phytopathol. 42, 185–209. https://doi.org/10.1146/annurev.phyto.42.040803.140421 (2004).
Almeida-Silva, F., Moharana, K. C., Machado, F. B. & Venancio, T. M. Exploring the complexity of soybean (Glycine max) transcriptional regulation using global gene co-expression networks. Planta 252, 1–12. https://doi.org/10.1007/s00425-020-03499-8 (2020).
Depuydt, T. & Vandepoele, K. Multi-omics network-based functional annotation of unknown Arabidopsis genes. Plant J. 108, 1198–1212. https://doi.org/10.1111/tpj.15507 (2021).
Bao, Y., Kurle, J. E., Anderson, G. & Young, N. D. Association mapping and genomic prediction for resistance to sudden death syndrome in early maturing soybean germplasm. Mol. Breed. 35(6), 1–14. https://doi.org/10.1007/s11032-015-0324-3 (2015).
Swaminathan, S. et al. Genome wide association study identifies novel single nucleotide polymorphic loci and candidate genes involved in soybean sudden death syndrome resistance. PLoS ONE 14(2), 1–21. https://doi.org/10.1371/journal.pone.0212071 (2019).
Vinholes, P., Rosado, R., Roberts, P., Borém, A. & Schuster, I. Single nucleotide polymorphism-based haplotypes associated with charcoal rot resistance in Brazilian soybean germplasm. Agron. J. 111(1), 182–192. https://doi.org/10.2134/agronj2018.07.0429 (2019).
Coser, S. M. et al. Genetic architecture of charcoal rot (Macrophomina phaseolina) resistance in soybean revealed using a diverse panel. Front. Plant Sci. 8(September), 1–12. https://doi.org/10.3389/fpls.2017.01626 (2017).
Acknowledgements
This work was supported by Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro (FAPERJ; grants E-26/203.309/2016 and E-26/203.014/2018), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES; Finance Code 001), and Conselho Nacional de Desenvolvimento Científico e Tecnológico. The funding agencies had no role in the design of the study and collection, analysis, and interpretation of data and in writing.
Author information
Contributions
Conceived the study: F.A.-S. and T.M.V. Data analysis: F.A.-S. Funding, project coordination and infrastructure: T.M.V. Manuscript writing: F.A.-S. and T.M.V.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Almeida-Silva, F., Venancio, T.M. Integration of genome-wide association studies and gene coexpression networks unveils promising soybean resistance genes against five common fungal pathogens. Sci Rep 11, 24453 (2021). https://doi.org/10.1038/s41598-021-03864-x
DOI: https://doi.org/10.1038/s41598-021-03864-x