Integration of genome-wide association studies and gene coexpression networks unveils promising soybean resistance genes against five common fungal pathogens

Soybean is one of the most important legume crops worldwide. However, soybean yield is dramatically affected by fungal diseases, leading to economic losses of billions of dollars yearly. Here, we integrated publicly available genome-wide association studies and transcriptomic data to prioritize candidate genes associated with resistance to Cadophora gregata, Fusarium graminearum, Fusarium virguliforme, Macrophomina phaseolina, and Phakopsora pachyrhizi. We identified 188, 56, 11, 8, and 3 high-confidence candidates for resistance to F. virguliforme, F. graminearum, C. gregata, M. phaseolina and P. pachyrhizi, respectively. The prioritized candidate genes are highly conserved in the pangenome of cultivated soybeans and are heavily biased towards fungal species-specific defense responses. The vast majority of the prioritized candidate resistance genes are related to plant immunity processes, such as recognition, signaling, oxidative stress, systemic acquired resistance, and physical defense. Based on the number of resistance alleles, we selected the five most resistant accessions against each fungal species in the soybean USDA germplasm. Interestingly, the most resistant accessions do not reach the maximum theoretical resistance potential. Hence, they can be further improved to increase resistance in breeding programs or through genetic engineering. Finally, the coexpression network generated here is available in a user-friendly web application (https://soyfungigcn.venanciogroup.uenf.br/) and an R/Shiny package (https://github.com/almeidasilvaf/SoyFungiGCN) that serve as a public resource to explore soybean-pathogenic fungi interactions at the transcriptional level.


Curation of resistance-associated SNPs
SNPs that contribute to resistance against phytopathogenic fungi were manually curated from the using the .vcf files for both assemblies available at Soybase (Brown et al., 2020).   The VCF file with genotypic information for all accessions in the USDA germplasm was downloaded 119 from Soybase (Brown et al., 2020). Scores 0, 1, and 2 were attributed to accessions with 0, 1, and 120 2 beneficial SNPs (effect size >0), respectively, whereas scores 2, 1, and 0 were attributed to 121 accessions with 0, 1, and 2 deleterious SNPs (effect size <0). The resistance potential of the best 122 accessions was calculated as a ratio of the attributed scores to the theoretical maximum score (all 123 beneficial SNPs and no deleterious SNPs).  Figure 1C). Further, we found that most 132 SNPs were located in intergenic regions ( Figure 1D). Hence, predicting SNP effect on genes would 133 not be suitable for this trait.

142
The specificity of resistance genes to particular species has been widely reported (Durrant & 6 imposes a challenge for biotechnological applications, as it requires pyramiding many different genes to render elite cultivars resistant to different pathogens. However, we cannot rule out that multiple strains of the same species (Ning & Wang, 2018).

150
Further, we manually curated the high-confidence candidate resistance genes to predict the 151 putative role of their products in plant immunity (Supplementary Table S4

156
Interestingly, 21 candidate genes lack functional description and, hence, we could not infer their 157 roles in plant immunity (n=2, 4, 14, and 1 for C. gregata, F. virguliforme, and P. pachyrhizi, 158 respectively). Nevertheless, as they were identified as high-confidence candidate genes, we 159 hypothesize that they encode defense-related proteins. We also developed a scheme that was used 160 to rank high-confidence candidate genes, which can be used to prioritize candidates for 161 experimental validation in future studies (Table 2).  Figure 1B). Strikingly, we observed no clustering by geographical origin, suggesting that gene PAV is not affected by population structure. As this pangenome is 7 structure effect can be due to breeding programs targeting optimal adaptation to different environmental conditions (e.g., latitude and climate), even if they are in the same country.
contain all resistance alleles, revealing that, theoretically, they could be further improved to increase 185 resistance (Table 3). All resistance-associated SNPs against P. pachyrhizi are present in some 186 accessions, but this is because only two SNPs have been reported for this species. Additionally, none 187 of the reported SNPs for F. graminearum have been identified in the SoySNP50k collection. Hence, 188 we could not predict the most resistant accessions to this fungal species in the USDA germplasm.

189
Although some individual genes can confer full race-specific resistance to some pathogens, the application locally as a Shiny app, ensuring the application will always be available, even in case 210 of server downtime.

8
By integrating publicly available GWAS and RNA-seq data, we found promising candidate genes in association study of partial resistance to sclerotinia stem rot of cultivated soybean based on the detached leaf method. PLoS ONE 15: 1-15.   Figure 2. Venn diagram of prioritized candidate resistance genes against each species. The diagram demonstrates a high species-specific response to each pathogen, as genes are mostly not shared. Only three genes are shared between F. graminearum and F. virguliforme, suggesting some conservation at the genus level.    Table 3. Top 5 most resistant soybean accessions against each fungal pathogen. Overall, the best genotypes do not reach the maximum potential. An exception is observed for P. pachyrhizi-resistant genotypes, but this is likely due to the small number of resistance SNPs. None of the resistance SNPs for F. graminearum have been identified in the USDA SoySNP50k compendium and, hence, we could not predict resistance potential against this species.