Comparative secretome analysis of Rhizoctonia solani isolates with different host ranges reveals unique secretomes and cell death inducing effectors

Rhizoctonia solani is a fungal pathogen causing substantial damage to many of the worlds’ largest food crops including wheat, rice, maize and soybean. Despite impacting global food security, little is known about the pathogenicity mechanisms employed by R. solani. To enable prediction of effectors possessing either broad efficacy or host specificity, a combined secretome was constructed from a monocot specific isolate, a dicot specific isolate and broad host range isolate infecting both monocot and dicot hosts. Secretome analysis suggested R. solani employs largely different virulence mechanisms to well-studied pathogens, despite in many instances infecting the same host plants. Furthermore, the secretome of the broad host range AG8 isolate may be shaped by maintaining functions for saprophytic life stages while minimising opportunities for host plant recognition. Analysis of possible co-evolution with host plants and in-planta up-regulation in particular, aided identification of effectors including xylanase and inhibitor I9 domain containing proteins able to induce cell death in-planta. The inhibitor I9 domain was more abundant in the secretomes of a wide range of necrotising fungi relative to biotrophs. These findings provide novel targets for further dissection of the virulence mechanisms and potential avenues to control this under-characterised but important pathogen.


Supplementary Methods. Prediction of the R. solani pan secretome
Proteins encoded by R. solani AG8 WAC10335 1 and R. solani AG1-IA 2 along with predicted protein sequences (Table S2) from a GeneMark-ES version 2 3 gene prediction of R. solani AG3 Rhs1AP 4 were analysed by the SignalP 2.0 neural network algorithm to predict the secretome from each isolate in accordance with previous reports 5, 6 . Any protein with a predicted transmembrane domain outside of the secretion sequence according to TMHMM 2.0 7 or a potential mitochondrial signal according to TargetP 1.1b 8 were removed. Predicted secretion signal sequences were removed from the proteins, any proteins with a trimmed size less than 40 amino acid residues were removed and CD-HIT 9 used to cluster redundant proteins from all isolates with the parameters -c 90 -n 5 producing efficient clustering of orthologs but minimal clustering of homologs. In instances where proteins from multiple isolates were clustered, the AG8 protein was chosen as the cluster representative to facilitate later RNAseq analysis. Proteins in the pan-secretome were analysed for the percent cysteine residues, the presence of known effector associated motifs, the presence in internal repeats, similarity to domains in the pfam database 10 , the length of the flanking intergenic region, the presence of nuclear localisation signals, whole protein and sitespecific diversifying selection, EffectorP classification as candidate effectors 11 , isolate specificity or conservation, conservation in other fungal plant pathogens, animal pathogens and non-pathogenic fungi, and for AG8 proteins; their regulation during infection of wheat.  5,[12][13][14][15][16][17][18] . Internal repeats within the secretome proteins were predicted using T-Reks 19 and nuclear localisation signals were predicted with PredictNLS 20 . PFAM domains were mapped on proteins using the PFAM batch search server 10 . The flanking intergenic region for the coding region relating to each secretome protein was calculated using Perl scripts. EffectorP prediction was conducted on the secretome proteins according to Sperschneider et al. 11 . Calculation of entire protein and site-specific positive selection was conducted according to Sperschneider et al. 21 . Markov clustering of the pan secretome was performed using TribeMCL 22 according to Haas et al. 23 . Orthology between the AG3 protein sequences used in this study with the gene models released in Cubeta et al. 4 is provided in Supplementary Data S8. Orthology between genes in all R. solani isolates currently having a publically available genome sequence according to OrthoFinder v0.4 24 is presented in Supplementary Data S7.

Conservation of proteins in R. solani isolates and other fungal species
To explore the potential for conservation of the proteins among the three isolates, AG8, AG1-IA and AG3, the proteins comprising the secretomes and proteomes were compared using Orthofinder version 0.4 24 to group protein into orthology groups. The presence of orthology groups between isolates was compared using VENNY 25 to display shared and isolate specific orthology groups. Blast2GO 26 was used to identify overrepresented gene ontology terms associated with proteins in shared or isolate-specific orthology groups. To take a stringent approach toward identifying proteins with potential homology in the other isolates, and to examine the potential for conservation of genes between isolates that is not reflected in the predicted secretome, 3.3e6 100 bp HiSeq reads from genomic AG8 DNA (equivalent to approximately 17x genome coverage) were mapped to the coding sequences for the pan-secretome using bowtie 2.0.5 27 using the very-sensitive-local parameters. The reads per coding region counted using bedtools 2.20.1 28 . Any coding sequence with at least one read mapping was considered potentially present in that species. The conservation of secretome proteins in the secretome of other fungal species was examined using BLASTp version 2.2.30. Proteins hits with e-values higher than 1e-5 were disregarded. Genome sequences were obtained from data of Broad Institute of Harvard and MIT, and Joint Genome Institute. Secretomes were predicted according to the same method of the prediction of R. solani secretomes as described above. The number of proteins from that species meeting the cut off was recorded for each R. solani combined-secretome protein (Supplementary Data S3).

Gene expression analysis
Three day old seedlings of wheat genotype Chinese spring were planted into vermiculite that had been preinoculated with R. solani AG8 for 1 week at 21°C. Mock treated control seedlings were planted into vermiculite without pre-inoculation with R. solani. At 48 hours and 7 days after inoculation, seedlings were harvested and above ground and root tissue collected separately and frozen in liquid N 2 . Additional seedlings were scored for disease development at 21 days after inoculation to confirm infection occurred successfully. RNA was extracted from three biological replicates of all treatments using Trizol (Sigma) and sequenced with 100bp paired end strand specific Illumina HiSeq reads (Ramaciotti Centre for Genomics, NSW). Sequence reads were trimmed for adapters and quality and mapped to the AG8 genome sequence (NCBI bioproject number PRJNA187548) using Tophat-2.0.8b 29 according to Hane et al. 1 . Reads mapping to genes were counted using HTseq-count 0.6.0 30 and analysis of differential expression conducted using both EdgeR 2.4.6 and DEseq 1.6.1 31,32 . Genes predicted to be significantly differentially expressed with greater than 2 fold change in expression by both EdgeR and DEseq were considered differentially regulated for the purposes of this study.

Tribe scoring
Scoring of tribes for protein characteristics was conducted according to Saunders et al. 5 . Briefly, the characteristics were converted to plus or minus scores for individual proteins as follows. The degree of conservation among the three R. solani isolates was assessed by identifying proteins with matches in all three isolates (R. solani conserved) or unique to single isolates (R. solani unique). Homology of secretome proteins to non-pathogens was assessed by evaluating the absence of homology in the non-pathogen secretomes tested to allow for a positive score to be incorporated. Proteins with a hit to at least one plant or animal pathogen species were considered potentially conserved. Scores for up-regulation of gene expression during infection were assigned only to tribes that contained AG8 proteins according to whether those AG8 proteins were up-regulated during infection of wheat. Tribes without AG8 proteins were given an infection-related expression score of zero (minus). Tribe characteristic scores were calculated based on the proportion of proteins within that tribe that meet a particular criteria and the overall tribe score calculated as a sum of all characteristic scores. Even weighting was applied to all characteristics since selection for tribes with particular combinations of characteristics was to be conducted only after hierarchical clustering of high scoring tribes. As tribes with low overall scores were more prone to harbouring effector properties by chance, a score threshold being the median tribe score for tribes having at least three members was calculated. Tribes having an overall score greater than the calculated median (42.03) were selected for further analyses. Hierarchical clustering of tribes was conducted with the score associated to each property for proteins in tribes considered as the 'intensity' values. The consensus tribe hierarchical tree was derived from 1000 bootstrap runs with Pearson correlation coefficient as distance value, and average linkage between groups using MEV 4.9 33 . The tribe hierarchical tree and tribe characteristics were viewed using circos-0.69-3 34 .
Quantitative polymerase chain reaction and functional analysis of candidate effectors RNA extraction, cDNA production and quantitative PCR was conducted as previously described 35 using primer sequences in Table S12. Three biological replicates and two technical repeats were analysed for each treatment. The coding regions for candidate effector genes minus the predicted secretion sequence were cloned from cDNA using primers in Table S12 or codon optimised sequence was synthesised. The tobacco PR1 secretion sequence 36 was added to the 5' end of the open reading frame to enable efficient secretion from tobacco cells. For transient expression in N. benthamiana, the coding region was cloned under the control of the 35S promoter in pK7WG2D 37 and introduced into leaves via Agrobacterium tumefaciens AGL1 mediated transformation as previously described 38 . Expression of GFP from pK7WG2D was observed in the infiltrated regions to confirm successful transformation and transgene expression in each leaf. Each experiment was repeated a minimum of three time with similar results