Newly identified relatives of botulinum neurotoxins shed light on their molecular evolution

The evolution of bacterial toxins is a central question to understanding the origins of human pathogens and infectious disease. Through genomic data mining, we traced the evolution of the deadliest known toxin family, clostridial neurotoxins, comprised of tetanus and botulinum neurotoxins (BoNT). We identified numerous uncharacterized lineages of BoNT-related genes in environmental species outside of Clostridium, revealing insights into their molecular ancestry. Phylogenetic analysis pinpointed a sister lineage of BoNT-like toxins in the gram-negative organism, Chryseobacterium piperi, that exhibit distant homology at the sequence level but preserve overall domain architecture. Resequencing and assembly of the C. piperi genome confirmed the presence of BoNT-like proteins encoded within two toxin-rich gene clusters. A C. piperi BoNT-like protein was validated as a novel toxin that induced necrotic cell death in human kidney cells. Mutagenesis of the putative active site abolished toxicity and indicated a zinc metalloprotease-dependent mechanism. The C. piperi toxin did not cleave common SNARE substrates of BoNTs, indicating that BoNTs have diverged from related families in substrate specificity. The new lineages of BoNT-like toxins identified by computational methods represent evolutionary missing links, and suggest an origin of clostridial neurotoxins from ancestral toxins present in environmental bacteria. Significance statement The origins of bacterial toxins that cause human disease is a key question in our understanding of pathogen evolution. To explore this question, we searched genomes for evolutionary relatives of the deadliest biological toxins known to science, botulinum neurotoxins. Genomic and phylogenetic analysis revealed a group of toxins in the Chryseobacterium piperi genome that are a sister lineage to botulinum toxins. Genome sequencing of this organism confirmed the presence of toxin-rich gene clusters, and a predicted C. piperi toxin was shown to induce necrotic cell death in human cells. These newly predicted toxins are missing links in our understanding of botulinum neurotoxin evolution, revealing its origins from an ancestral family of toxins that may be widespread in the environment.


Introduction
Despite a wealth of detailed knowledge on the structure-function relationship of many bacterial toxins (1)(2)(3)(4), the evolutionary origins of important toxins responsible for human disease remain unclear (5,6). Although bacterial toxins are typically studied in the context of their pathogenesis toward human cells, modern toxins are likely descendants of ancestral proteins that targeted difference substrates, cell types and hosts. By identifying the evolutionary relatives of bacterial toxins in diverse environments (5), it may be possible to not only reconstruct these toxin evolutionary histories, but also identify sequence changes responsible for their unique activities (5)(6)(7)(8).
Clostridial neurotoxins, including botulinum neurotoxins (BoNTs) and tetanus neurotoxin (TeNT), the causative agents of botulism and tetanus, are the most deadly toxins known to science with LD 50 values ranging from 0.1-1.0 ng per kg (9). Clostridial neurotoxins represent serious threats to public health and require constant monitoring by health agencies. Owing to their extreme toxicity, BoNTs are potential bioterrorism agents, and yet also have enormous utility as protein therapeutics (10,11). BoNTs are produced by Clostridium botulinum, a polyphyletic taxon classified solely by the presence of the neurotoxin, and several other Clostridium species. Neurotoxin genes reside in distinct gene clusters encoded on the chromosome, plasmids or phages.
The extreme toxicity of BoNT is ultimately due to its structure and function. BoNTs are initially produced as a single polypeptide chain, which is then cleaved by a bacterial protease to result in a light-chain (LC) and heavy-chain (HC) which remain linked by a disulfide bond. A C-terminal receptor-binding domain in the heavy chain (H CC ), which adopts a ricin-type beta-trefoil structure, targets BoNTs to different neuronal receptors in the host, including SV2 for BoNT serotypes A/D/E/F, and synaptotagmin I/II for BoNT serotypes B/G with polysialogangliosides as co-receptors (12)(13)(14)(15)(16)(17)(18)(19)(20)(21)(22)(23). After neuronal binding, BoNTs are internalized within endocytic vesicles. At low pH, the HC translocation domain, which forms an all alpha-helical bundle structure, transports the partially unfolded LC into the cytosol. The BoNT-LC, an ~400 residue N-terminal zinc metalloprotease domain, then cleaves intracellular SNARE proteins including VAMPs, SNAP25, and syntaxin 1 (24)(25)(26)(27) to prevent exocytosis of synaptic vesicles, leading to flaccid paralysis (3).
The sophistication of BoNT's mechanism of action (see Rossetto et al. (3) for a thorough review) raises questions about its evolutionary origins. Recent work describing a divergent homolog of BoNTs in the genome of Weissella oryzae, suggests that BoNT-related proteins may not be limited to the Clostridium genus (8,28). As the Weissella toxin is the only identified homolog outside of the BoNT family to date, however, it is unclear whether it represents a highly divergent pseudogene, or if it indicates the existence of a larger bont-like gene family present in a broader range of bacterial taxa.
Here we present a large-scale bioinformatic screen for putative BoNT-related toxin genes in all available genomes. Our search identified 161 BoNT-related genes in 30 different species.
Predicted genes include new candidate toxins from Chryseobacterium, Mycobacterium, Streptomyces, and 19 species of entomopathogenic fungi, all containing signatures of toxin functionality. We identified toxin candidates in the Chryseobacterium piperi genome (29), which we resequenced and verified to encode two toxin-containing gene clusters with tandem duplications reminiscent of BoNT-NTNH. Experimental validation of C. piperi toxins revealed a candidate toxin with metallopeptidase activity that arrested cell proliferation in human kidney cells. Phylogenetic analysis suggests that the C. piperi and other identified toxins are missing links that shed light on the evolution of BoNTs from ancestral protein toxins present in environmental bacteria.

Genomic data mining predicts BoNT-like toxins outside of Clostridium
We screened the NCBI Genbank database (March 26, 2017) comprised of 94,396 prokaryotic, 4,123 eukaryotic, and 7,178 viral genomes, for potential homologs of BoNTs. Using PSI-BLAST with selected BoNT sequence queries (see Methods), we detected a total of 309 protein sequences displaying at least partial homology to BoNTs with an E-value below 0.001 (Table   S1). The dataset includes all known BoNT serotypes, and an additional 161 BoNT-related sequences, all of which are experimentally uncharacterized to date. We performed all-by-all pairwise alignments and clustered the toxins using principal coordinates analysis (PCoA), which revealed three main sequence clusters (Fig. 1A). Cluster I includes a large family of putative ADP-ribosyltransferase toxins from entomopathogenic fungi, as well as diphtheria toxin, all possessing partial similarity only to the BoNT translocation domain (17.3% max sequence identity, PSI-BLAST E-value = 7 x 10 -40 ). Cluster II includes type III effector proteases such as Escherichia coli NleD (peptidase family M91) which cleaves host JNK and p38 (30), and which possess remote detectable homology to the BoNT-LC with 14.9% max sequence identity and a PSI-BLAST E-value of 4 x 10 -5 (see Methods).
The lower right cluster (III) contains BoNTs, NTNHs, the Weissella toxin and several uncharacterized proteins (Fig. 1A) that share multiple domains in common with BoNTs (Fig. 1B,   Fig. S2) and are therefore of considerable interest (8,31). Among the uncharacterized proteins are nine partial and full-length homologs from the genome of Chryseobacterium piperi, two from Mycobacterium chelonae, and five from other Actinobacteria. Some of these organisms are associated with disease; Chryseobacterium species are opportunistic pathogens (32,33), Acaricomes phytoseiuli is a pathogen of mites (34), and Mycobacterium chelonae is a human pathogen associated with skin, soft tissue, and bone infections (35). We termed these proteins "BoNT-like toxins" given their evolutionary distance to BoNTs (Fig. 1A) and similarity of domain architecture (Fig. 1B). As shown for a representative protein from this group (putative

Chryseobacterium toxins are a sister lineage to BoNTs
Putative protease domains from BoNT-like toxins were aligned to the BoNT-LC to perform sequence, structural, and phylogenetic analysis (Fig. 2, Fig. S3, S4). Phylogenetically, BoNTs grouped into a distinct clade with BoNT/X and Weissella toxin forming divergent early branching lineages ( Fig. 2A, Fig. S4). A sister lineage consisting of the four C. piperi homologs and one M. chelonae homolog next grouped with the BoNT family with high statistical confidence (98% boostrap support) ( Fig. 2A, Fig. S4). Additional toxins from Streptomyces grouped next with this cluster, and finally a large clade of distantly related, type III effector proteases (M91 family) ( Fig. 2A). Despite some variable segments and low sequence identity (BoNTA1/CP1: 17.9%), the protease domains from C. piperi and other BoNT-like toxins possess detectable homology to the BoNT-LC (bl2seq E-value = 2 x 10 -6 between Cp1 and BoNT/A1) and conserve key functional residues found in BoNTs (Fig. 2B). These residues include: the critical HExxH zinc-coordinating active site motif; the third zinc ligand Glu-261; the functionally important Glu-350 which shapes active site fine structure, the active site stabilizing motif R363-x-x-Tyr366 (36), and the cysteine residues that form the disulfide bridge between BoNT LC and HC (37) (Fig. 2B).
Consistent with phylogenetic analysis, the predicted structure of C. piperi toxins are most similar to BoNT-LC (7.0Å RMSD versus 12.0Å for E. coli NleD) ( Fig. 2A). One insertion is common to BoNT-LCs and BoNT-like toxins, and absent in NleD and other M91 proteases ( Fig. 2A,B), which makes extensive contacts with SNAP25 (51 inter-residue contacts < 2 Å) and VAMP2 (91 inter-residue contacts < 2Å) in BoNT co-crystal complexes (Fig. S5). This insertion may therefore have contributed to an ancestral shift in substrate specificity between M27 and M91 protease families. A second C-terminal insertion common to BoNT-LC and BoNT-like toxins ( Fig. 2A,B) forms part of the hydrophobic SNAP25 binding pocket (38), and has been shown to mediate catalytic activity and product removal (39).
In addition to LC conservation, Chryseobacterium and other BoNT-like toxins possess significant similarity to the BoNT translocation domain (22% identity, bl2seq E-value < 1 x 10 -5 ), particularly across the region 593-686 in BoNT/A1 (Fig. S6), which forms a critical channelforming amphipathic alpha helical motif (40)(41)(42). This region common to BoNTs and BoNT-like toxins is also highly similar to the membrane-inserting residues 286-325 of diphtheria toxin (helices TH5-TH6/TH7, PDB ID 4AE0) (43), consistent with a structural prediction for this region made by PHYRE (Fig. S2). In particular, the motif [K/R]x(8)PxxG is universally conserved among the translocation regions of all the identified toxins in our screen except the single-domain M91 family proteases (Fig. 1B, Fig. S6), highlighting it as a key sequence signature of translocation function.
Lastly, following the translocation domain, BoNT-like toxins possess a receptor-binding domain that is predicted to adopt the same fold as the BoNT H CC domain (Fig. 1B, Fig. S2). A ricin-type beta-trefoil fold was predicted for the C-terminal region of the putative C. piperi toxins by three separate methods (HHpred, Pfam, and Phyre with E-values < 0.001). Interestingly, a ricin-type beta-trefoil domain from the C. botulinum hemagglutinin, HA33, was identified as the top template by PHYRE (Fig. S2), indicating yet another link between the C. piperi toxins and BoNT clusters.

Genome resequencing of Chryseobacterium piperi confirms presence of bont-like gene clusters
The putative C. piperi toxins are located on three separate contigs (2, 44, and 59) in the original draft C. piperi genome (NCBI accession JPRJ01, 89 total contigs). To verify a C. piperi origin for these contigs, rule out the possibility of sequencing artifacts, and enable further genome-wide analysis, C. piperi was acquired from ATCC and sequenced on Illumina MiSeq and Pacific Biosciences RS-II sequencing platforms. A closed genome 4.5Mbp in length, 35.3% GC content, and 250X coverage was produced and analysed (Fig. 3). The assembly revealed one toxin gene cluster (GC1) located at 1399-1432 kbp consisting of two bont-like genes, and a second toxin gene cluster (GC2) located at 3287-3312 kbp consisting of one bont-like gene (Fig. 3).
Similar to BoNT and BoNT/Wo gene clusters, C. piperi BoNT-like toxin genes are flanked by a neighboring gene (designated "NTNH-analog") with detectable partial homology but lacking the conserved HExxH active site motif. Additionally, the C. piperi "NTNH-analog" genes uniquely feature a IBC1 ("Isoprenoid_Biosyn_C1 super family") domain at the N-terminus similar to class I terpene-synthases whose role is unclear. No additional neurotoxin-associated proteins from the ha/p47/orfx families are present in these clusters. are complete and partial IS1595 family, ISChpi, insertion sequences (CJF12_14525 and CJF12_14620). Third, genes neighboring Chryseobacterium toxin gene clusters were found to possess homology to genes in M. chelonae (e.g., closest homolog of CJF12_14560 was M. chelonae WP_064393402.1, 71% amino acid identity), consistent with the detected similarity between the C. piperi toxins and M. chelonae genes ( Fig. 2A).

Chryseobacterium piperi toxin: a novel BoNT-like toxin that induces necrotic cell death
Given the substantial sequence variation between predicted C. piperi toxins, we selected WP_034687872 (Cp1) for experimental characterization based on it having the greatest sequence identity to BoNTs (% id = 17%). Initial protease assays of recombinant Cp1 LC against known BoNT substrates, including VAMPs and SNAP25 SNARE proteins, yielded negative results (results not shown). Although the putative toxins from Chryseobacterium did not display activity against canonical BoNT targets, the conservation of the active site residues and similarity to M27 and M91 metalloproteases suggested the possibility of other targets. We elected to test for broad, metallopeptidase-induced toxicity via transfection and subsequent expression of the Cp1 LC cDNA in human embryonic kidney HEK293T cell line. CP1-LCs with active site knockouts at H209A and E210A, were utilized as negative controls.
As shown in Fig. 4A, the expression of wild-type CP1-LC resulted in a cell death phenotype in HEK293T cells. These cells stopped proliferating and were visibly shrunken, eventually dying and detaching from culture plates. Cell counts after 48 hours revealed an almost 4-fold reduction in the number of cells (Fig. 4B). No cell death phenotype or significant reduction in cell number was observed in the H209A and E210A mutants, confirming that the observed toxicity is metalloprotease-dependant. To further confirm the effect of CP1-LC, we performed cell apoptosis assays by flow cytometry using Hoechst 33342, YO-PRO-1, and propidium iodide (Fig. 4C) Following the recently discovered Weissella toxin, Chryseobacterium BoNT-like toxins appear to be the next closest evolutionary relatives to BoNTs. The identification of BoNT homologs that cluster outside of the BoNT family ( Fig. 2A) and possess toxin activity (Fig. 4) raises the strong possibility that BoNTs evolved from precursor proteolytic toxins targeting different substrates.

. In this assay, live cells (blue), apoptotic cells (blue and green), and dead cells (bright
By assessing the domain architectures of BoNT-like toxins in the context of the phylogenetic tree, it is possible to reconstruct a stepwise model of BoNT evolution (Fig. 5). This phylogenetic model suggests that BoNTs evolved from an ancestral toxin containing a M27-like protease domain, which in turn is related to the more distant M91 family of single domain type III effector proteases. The ancestral toxin also likely possessed a translocation domain that may have more closely resembled the T domain of diphtheria toxin. Finally, the ancestral toxin also likely possessed a receptor-binding region encoding a beta-trefoil ricin-type fold. Evolutionary novelties that occurred in the lineage leading to BoNTs appear to include extension of the translocation domain to include the large alpha-helical bundle, the gain of the N-terminal LamGlike binding domain, as well as the acquisition of toxin accessory genes neighbouring the proteolytic toxin gene (including NTNH and HA or P47/ORFX proteins). The increased sequence diversity of BoNT-like toxins, as well as the taxonomic diversity of their hosts, suggests that these represent early-branching lineages in BoNT evolution, and the canonical domain architecture of BoNTs has emerged more recently.
The future identification of the substrates targeted by Chryseobacterium toxin, Weissella toxin and others, combined with determination of their structure, will illuminate the specific evolutionary changes in the BoNT LC responsible for its gain of activity against neuronal SNAREs. We anticipate that this work will ultimately uncover a detailed model for the origin of the most deadly toxin known.

Detection, comparison and analysis of bont-like genes
Sequences were retrieved using PSI-BLAST with default parameters (BLOSUM62 scoring matrix; expect threshold 10; gap open 11; extension 1) from the nr database on (March 26, 2017) (45). Initial homologs were discovered by searching with BoNT/A1 (NCBI accession number coli NleD (WP_069191536.1) as the original queries. These sets of T3SS effectors and diphtheria toxins were pruned to remove identical sequences using Jalview (46). To obtain the Austwickia pseudogene corresponding to the diphtheria C domain, we translated the region immediately upstream of the protein containing diphtheria T and R domains (the region 39931-43386 on contig NZ_BAGZ01000024.1 of A. chelonae NBRC 105200).
All-by-all sequence pairwise alignments were generated with needle (of the EMBOSS package, v6.6.0.0 (47)) with default parameters (gap open = 10, gap extend = 0.5, EBLOSUM62 scoring matrix). In Figure 1, percent similarity was used over percent identity in order to allow divergent homologs to cluster more accurately. Principal coordinate analysis was performed in R on a distance matrix of pairwise similarity values using the default dist() and cmdscale() functions.
Domains were annotated with hmmscan (v3.1b2, available from http://hmmer.org/) against the Pfam database v31.0 (48) with an E-value cutoff of 1e-6. Annotations were subsequently confirmed by comparison to the Conserved Domain Database with relaxed cutoffs v3.16 (49), and alignment to BoNTs. Full domain annotations are available upon request. For Figure 1, the BoNT homologs with the most BoNT-like annotations were depicted to facilitate comparison between categories.

Comparison of proteases from BoNTs, BoNT-like proteins and T3SS effector peptidases
All BoNT homologs possessing a putative peptidase domain (i.e., possessing an HExxH motif) were aligned with BoNT and M91 peptidases using Clustal-Omega with defaults v1.2.1 (50), manipulated and colored in Jalview (46). Only regions corresponding to the peptidase domain boundaries were used, the positions of which were estimated based on alignment with domain boundaries of BoNT/A1 (PDB structure 3BTA). The same alignment procedure was used to identify the putative translocation region of BoNT homologs (Fig. S6). After identifying putative domains in BoNT homologs, the segments were combined and realigned.

Structural modelling of BoNT-like proteins and T3SS effector peptidases
Structural templates were identified for Cp1 (C. piperi, accession WP_034687872.1) using the LOMETS meta-server (52) (54). Identified template sequences were aligned to M91 with T-Coffee Expresso (55), which uses 3D-Coffee to incorporate structural information during alignment. A total of 20 homology models were created with slow refinement based on the resulting alignment using MODELLER (56). The model with the lowest DOPE Z-Score was selected. Structural quality was assessed with Ramachandran plot analysis using PROCHECK (57). Models were visualized using Chimera (58).

Re-sequencing of the Chryseobacterium piperi genome
Methods, materials, and platforms used in the sequencing and assembly of C. piperi are described in Wentz et al. (59). The closed genome is accessible at the DDBJ/ENA/Genbank under the accession number CP023049. MiSeq and RS-II reads utilized in assembly are available at NCBI SRA under accessions SRX3229522, SRX3231351, SRX3231352. Figure 3 was generated using the program Circos (60).