A simple, high-throughput modeling approach reveals insights into the mechanism of gametophytic self-incompatibility

Specificity in the GSI response results from the S-haplotype-specific molecular interaction of S-locus F-box (SLF/SFB) and SRNase proteins in the self-incompatibility locus (S-locus). The answer to the question of how these two components of the S-locus (SRNase and SLF/SFB) interact has been gathered from several models. Since there is not enough evidence as to which one is the definitive model, none of them can be ruled out. Despite the identification of interacting protein elements, the mechanism by which SLF/SFB and SRNase interact to differently trigger the self-incompatibility among families and subfamilies remain uncertain. The high-throughput modeling approach demonstrates structural visions into the possible existence of a Collaborative Non-Self Recognition model in apple. These findings postulate several prospects for future investigation providing useful information to guide the implementation of breeding strategies.

2 the quality of the predicted structures the ProSa-web Z-scores are between -6.11 and -7. 16, all of which are within the range of the experimentally predicted structures (Supplementary Figure 3). The folding topologies of the modeled structures were examined using the Dali server, which was found to be very similar to the topologies of Momordica charantia ribonuclease MC and MC1, with RMSD value of 2.0 − 2.3 and the identity of 28 − 31% (Supplementary Table 4). Hence, it is concluded that these predicted structures have good homology with the experimentally solved structures of RNase T2 family enzymes. These models can, therefore, be confidently used in the docking study to further assess the Collaborative Non-Self Recognition model in apple, Malus × domestica (Borkh.). The final tertiary structures of SRNases were shown in figure S4 and their α+β counts are as follows: i) S1-and S2-RNase contain six αhelices and four β-sheets, ii) S3-, S8-, and S31-RNase contain seven α-helices and seven βsheets, iii) S4-, S16-, S26-RNase consist of seven α-helices and five β-sheets, iv) S7-and S9-RNase contain six α-helices and seven β-sheets, v) S10-and S24-RNase consist of seven αhelices and four β-sheets, vi) S20-RNase has seven α-helices and six β-sheets, and vii) S25-, S28-, and S30-RNase contain six α-helices and five β-sheets.
With respect to SLF/SFBs, Supplementary Table 4 shows the best templates used for modeling the structures and includes a list of PDB hits along with %Identity, %Coverage, and Norm Z-score values. Among the PDB hits used for the prediction four are common, namely Mus Musculus Skp1-FbS1 complex (2e31A), Homo sapiance Fbw7-Skp1-cyclin E complex (2ovpB), E. coli β-propeller protein YghT (2uvkA), Saccharomyces cerevisiae Cdc4/Skp1-SCF-I2 (3mksA) (Supplementary Figure 5). The general functional theme of these proteins allows one to elucidate that the predicted structure shares common binding sites with the above-mentioned proteins that allow for their interaction in a similar manner typical of Skp1-Fbox protein.
Generally, an F-box protein allows for the formation of SCF (Skp1/Cullin/F-box) E3 ligase complex resulting in the interaction of the F-box motif with the Skp1 protein. Therefore, this information provides further support for the validity of the predicted structures. Furthermore, the percentage sequence identity of the templates in the threading aligned region with the query sequences ranges between 6 − 22% while the coverage of the threading alignment is between 21 − 98% suggesting that many residues of the query sequences have aligned. Additionally, the Norm Z-score provided by LOMETS threading alignment is in the range of 1.06-3.15 (Z-score >1) presenting an acceptable alignment (Supplementary Table 5). To determine the best model from the top five predicted models by the I-TASSER suite the C-score, QMEAN-norm score and folding energies were assessed (Supplementary Table 6). The C-score for the predicted models ranges from -1.34 to -4.29 with the QMEAN-norm score between 0.17 and 0.35 and folding energies between -309.98 and -396.80. According to these results model 1 among the top five models of SFBB3α and SFBB3β, model 2 for SLFB9 and SFBB9β, and model 4 for SLFB3 and SFBB9α were selected as the best models and are highlighted in Supplementary Table 6. The quality of the final refined models was subjected to a series of tests for their internal consistency and reliability. Backbone conformation and overall model quality were evaluated by PROCHECK 2 in VADAR v1.8 3 and ProSa-web 4 , respectively. PROCHECK provides a detailed assessment of the stereochemistry of a protein structure as compared with well-defined structures of the same resolution while highlighting regions that may require further investigation 2 . Furthermore, the ProSa-web is used in the refinement and validation of protein structures and provides the quality scores in the context of known protein structures, as well as energy plots that indicate possible problems in the protein structure 4 . The Ramachandran plots represented in Supplementary Figure 6 were generated for the top models from Supplementary Table 6. 3 According to these figures, the distribution of the φ/ψ angles of most amino acids (90 − 96%) of the modeled structures is in the core and allowed region of the Ramachandran plots. To further support the quality of the predicted models ProSa-web Z-scores are between -4.11 and -6.82, all of which are within the range of the experimentally predicted structures (Supplementary Figure  7). The final tertiary structures of the predicted models were shown in Supplementary Figure 8 and their α+β counts are as follows: i) SLFB3 contains 10 α-helices and 23 β-sheets, ii) SFBB3α contain eight α-helices and 19 β-sheets, iii) SFBB3β consist of 9 α-helices and 17 β-sheets, iv) SLFB9 contain eight α-helices and 22 β-sheets, v) SFBB9α consist of 9 α-helices and 20 βsheets, and vi) SFBB9β contains 10 α-helices and 21 β-sheets. Furthermore, the folding topologies of the modeled structures were examined using the Dali server, which was found to be generally very similar to the topologies of F-BOX/WD-repeat proteins (Supplementary Table 7), which generally function as platforms of protein-protein interactions and are involved in the various biological process 5 .
Since, the hypervariable regions are involved in the control of allele specificity, to identify the location of hypervariable regions on Malus SRNases and SLF/SFBs, the previously identified hypervariable regions by Ashkani and Rees (2016) containing 24 amino acids from Ser-40 to Pro-63 and 27 amino acids from Lys-247 to Cys-273, based on ancestor sequence, were mapped on the primary and secondary structures of Malus SRNases and SLF/SFBs, respectively. To achieve this, McRate v1 7 was used to determine site-specific evolutionary rates. The evolutionary rate for each site was averaged with that of its neighbors using a sliding window of the size of 11-13 amino acid 6 . The position of the HV regions of SLF/SFBs and SRNases were shown in Figure 1. These results are in line with the findings of Vieira and colleagues (2007) who reported a HV region in the C-terminus of amino acid positions 41-61 for Malus/Pyrus SRNase 8 . For SLF/SFBs a HV region was detected in the N-terminus of SLF/SFBs, which is consistent with the results obtained by Tao and colleagues (2010) with the only exception that they described two HV regions (HVa and HVb) in Prunus SLF/SFBs 9 .
The ZDOCK program was used to evaluate the interactions between currently known Malus × domestica (Borkh.) SLF/SFBs and SRNases. The PDB file of the modeled structures of SLF/SFBs and SRNases were used as inputs to ZDOCK as receptors and ligands respectively. For this analysis, the number of top poses for SLF/SFB-SRNase complex was set to 2 000 with the root mean square division (RMSD) cut off the value of 10 while only the hypervariable region was provided to ZDOCK as an interface for docking. Furthermore, ZRANK was used to re-rank the ZDOCK scores. The binding energies from ZRANK were analysed using Wilcoxon rank-sum test 10 , as implemented in the R-package 11 . The RMSD values were calculated with ProFitv3.1 (Martin, unpublished, http://www.bioinf.org.uk/software/profit). Assuming that a pose with the lowest binding energy is the nearest native, the structure with the top ZRANK from the previous analysis was provided to ProFit as the reference structure in PDB format. All the other structures (from 2 000 poses) were provided as mobile structures. The reference structure remains fixed while the mobile structures are fitted on to it. Finally, to determine the accuracy of the scoring function for docking models that were shown to be significant using Wilcoxon rank-sum test, the ZRANK scores were plotted against ligand RMSDs, L-RMSDs. The red, yellow, green, and gray areas refer to core, allowed, general, and disallowed regions of the plot respectively. The plot was generated using PROCHECK 2 in VADAR v1.8 3 .  Table 1). The location of HV region is shown as surface representation on the S10-RNase structure in red while the amino acid composition of HV is also marked. Supplementary Figure 6 | Ramachandran plots of the modeled SLF/SFBs. The red, yellow, green, and gray areas refer to core, allowed, general, and disallowed regions of the plot respectively. The plot was generated using PROCHECK 2 in VADAR v1.8 3 .      [14], 2uvkA: Escherichia coli betapropeller protein YjhT [15], 3mksA: Saccharomyces cerevisiae Cdc4/Skp1-SCF-I2 [16], 1kb0A: Comamonas testosterone Quinohemoprotein alcohol dehydrogenase [17], 1p22A: Human β-TrCP1-Skp1-beta-catenin complex [18], 2dyhA: Mus musculus Kelch-like ECH-associated protein 1 [19]. Table 6 | Quality assessment of the top-five predicted models of SLF/SFBs. The scores were generated using I-TASSER suite and the QMEANnorm score was calculated using QMEAN server.

Supplementary
Note: The best model among the top-five predicted models for each SLF/SFB is shaded.