Venoms are a rich source for the discovery of molecules with biotechnological applications, but their analysis is challenging even for state-of-the-art proteomics. Here we report on a large-scale proteomic assessment of the venom of Loxosceles intermedia, the so-called brown spider. Venom was extracted from 200 spiders and fractionated into two aliquots around a 10 kDa molecular-mass cutoff. Each of these was further fractionated and digested with trypsin (4 h), trypsin (18 h), pepsin (18 h), and chymotrypsin (18 h), then analyzed by MudPIT on an LTQ-Orbitrap XL ETD mass spectrometer fragmenting precursors by CID, HCD, and ETD. Aliquots of undigested samples were also analyzed. Our experimental design allowed us to apply spectral networks, thus enabling us to obtain meta-contig assemblies, and consequently de novo sequencing of practically complete proteins, culminating in a deep proteome assessment of the venom. Data are available via ProteomeXchange, with identifier PXD005523.
Background & Summary
Scientists have long enlisted venoms in their quest to characterize novel molecules with biotechnological applications1,2. The literature provides innumerable examples of venom-derived applications, ranging from biopesticides to medical applications. In particular, works on snake venom are unarguably success stories. Some examples are: Batroxobin, a widely used thrombin-like enzyme commonly extracted from the venom of Bothrops atrox and Bothrops moojeni, which has been used as a replacement for thrombin in bleeding injuries3; Ecarin, from Echis carinatus, the primary reagent for laboratory tests that monitor anticoagulation4; and Captopril, developed from peptides of the Bothrops jararaca venom, a widely adopted inhibitor of the angiotensin converting enzyme (ACE). Other examples of venom-derived drugs include: Aggrastat, for myocardial infarct and ischemia; Ancrod, for stroke; Defibrase, for acute cerebral infarction and angina pectoris; Exanta, used as an anti-coagulant; Hemocoagulase, for hemorrhage; and Integrilin, for acute coronary syndrome5. Venoms have also been used to search for inhibitors derived from other species (e.g., Didelphis marsupialis)6,7.
Motivated by all the successful research on snake venoms, efforts have been geared towards spider toxins. In particular, those from the Loxosceles genus are already being used in at least four general application fronts, viz.: as therapeutic anti-venom sera8; as tools in molecular and cellular biology research; as aids in drug development; and in the production of selective and environmentally friendly bioinsecticides5. Peptides originating from the venom of Thrixopelma pruriens have been used in the treatment of pain and inflammation9; the Tx2-5 and Tx2-6 neuropeptides from the Phoneutria nigriventer venom, for treating erectile dysfunction10; and distinct bioactive peptides from spider venoms, in the treatment of diverse diseases, such as cancer11. Taken together, toxins have served as an endless treasure trove for biotechnological applications.
Spider venoms, in particular, comprising mainly proteins and peptides2,5,12,13 and displaying great diversity in their toxins, have drawn considerable attention. Yet, characterizing venoms poses great challenges even for state-of-the-art proteomic strategies: in fact, most species lack a reference sequence genome14 and the post-translational modifications of venoms vary greatly. Moreover, current mainstream strategies are not tailored towards performing de novo sequencing of the large (i.e., greater than tryptic), biologically active peptides that abound in venoms. Indeed, peptide-centric approaches are oblivious to whether a sequenced peptide originates from a larger peptide or a full protein, but obtaining the complete sequence of these larger molecules will undoubtedly fuel a great diversity of biotechnological applications. In this regard, it is our view that widely adopted proteomic strategies such as peptide spectrum matching (PSM)15,16 and mainstream de novo sequencing17 only reveal the tip of the iceberg in terms of what can be unveiled from venoms.
One of our goals has been to characterize the venom of the so-called brown spiders (the Loxosceles genus). Altogether, their venom is composed of a complex cocktail of biologically active compounds, with toxins ranging up to 40 kDa and over18. To the best of our knowledge, an in-depth, comprehensive proteomic profiling of the Loxosceles venom tailored towards the discovery of new molecules has so far remained elusive. Currently, there are several descriptions of enzymatic and non-enzymatic proteins from distinct Loxosceles species19,20. In 2003, a study aimed to investigate whether venoms of phylogenetically-related groups of Haplogyne spiders possess sphingomyelinase-D (SMD) toxins21. The study included 10 Loxosceles species and 2 Sicarius species, among other spider genera. The Amplex Red Phospholipase-D assay kit indicated SMD activity and these results were further supported by a Surface-Enhanced Laser Desorption/Ionization (SELDI) Time-of-Flight (TOF) analysis showing mass spectral peaks with m/z’s corresponding to those of SMD. Loxosceles SMDs, later referred to as phospholipases-D (PLDs), are known to be the major component of Loxosceles venoms and are the most well characterized toxin family in brown spider venoms. In 2005, two-dimensional protein profiles of the L. intermedia, L. laeta, and L. gaucho venoms were determined, but protein identification was focused only on the SMD toxins of the L. gaucho venom22. The identification of seven spots of interest was first attempted using data from Matrix-Assisted Laser Desorption/Ionization (MALDI) Time-of-Flight (TOF) Mass Spectrometry (MS) and Electrospray Ionization (ESI) quadrupole-time-of-flight Tandem Mass Spectrometry (MS/MS) for direct search of raw data using MASCOT22. Since the searches retrieved no significant match, de novo sequencing was performed and the resulting sequences were BLASTed against the non-redundant sequences, allowing SMD identification for all analyzed spots22. 
Only in 2009 was a proteomic study described that targeted the total protein content of the Loxosceles venom23. Although the L. intermedia venom was analyzed using Multi-Dimensional Protein Identification Technology (MudPIT)24, only 39 proteins were identified. Of these proteins, only 14 were described as toxins generally found in animal venoms23. Thus, this proteomic study seems to have severely underestimated the great toxin diversity of the Loxosceles venom, particularly in comparison to the many publications that already described distinct molecular clones from venoms of different Loxosceles species25,
The lack of genomic data from this arachnid prevents employing the PSM approach in full, so most of the heavy lifting must be accomplished through de novo sequencing. Mainstream de novo sequencing, however, cannot efficiently handle unanticipated post-translational modifications and is far more prone to sequencing errors. This is because various molecules fail to provide enough mass spectral peaks during fragmentation to enable the sequencing of full peptides. To overcome these limitations, our dataset was acquired with multiple dissociation strategies applied to the same precursor (i.e., collision-induced dissociation (CID), higher-energy collisional dissociation (HCD), and electron-transfer dissociation (ETD)), thereby enabling the use of state-of-the-art de novo sequencing algorithms. These capitalize on complementary dissociation information and thus achieve unprecedented sequencing accuracy35,36. The use of different proteolytic enzymes on the venom aliquots unlocks the application of another very powerful paradigm, that of spectral networks37,38. These ‘specnets’ align spectra against one another, ultimately allowing the detection of unanticipated post-translational modifications. Moreover, they can assemble consensus mass spectra from overlapping peptides yielded by different proteolytic digests. A consensus spectrum thus obtained presents a better signal-to-noise ratio and allows for the de novo sequencing of amino-acid stretches far longer than those handled by the conventional approach. Once high-confidence de novo data are available, it becomes possible to employ tools, such as PepExplorer39 or Meta-SPS37, that apply pattern recognition approaches to the mapping of de novo sequencing data against sequences from homologous organisms, thereby facilitating biological interpretation.
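To illustrate why digests from different proteases yield the overlapping peptides that spectral networks rely on, the following minimal Python sketch performs an in-silico digestion of a toy sequence. It is not part of the pipeline used in this work. The cleavage rules are deliberately simplified assumptions (trypsin after K/R, chymotrypsin after F/W/Y, neither before P, no missed cleavages), pepsin is omitted because of its broader specificity, and the sequence and the names digest and overlapping_pairs are hypothetical.

```python
import re

# Simplified cleavage rules, assumed for illustration only:
# cleave C-terminal to the listed residues unless the next residue is proline.
CLEAVAGE_RULES = {
    "trypsin": re.compile(r"(?<=[KR])(?!P)"),
    "chymotrypsin": re.compile(r"(?<=[FWY])(?!P)"),
}


def digest(sequence, enzyme):
    """Return (start, end, peptide) tuples for a complete digestion."""
    sites = [m.start() for m in CLEAVAGE_RULES[enzyme].finditer(sequence)]
    # Keep internal cleavage sites only, then add both sequence termini.
    bounds = [0] + [s for s in sites if 0 < s < len(sequence)] + [len(sequence)]
    return [(a, b, sequence[a:b]) for a, b in zip(bounds, bounds[1:])]


def overlapping_pairs(sequence):
    """List tryptic/chymotryptic peptide pairs that share residues."""
    pairs = []
    for a0, a1, pep_a in digest(sequence, "trypsin"):
        for b0, b1, pep_b in digest(sequence, "chymotrypsin"):
            shared = min(a1, b1) - max(a0, b0)  # length of the common stretch
            if shared > 0:
                pairs.append((pep_a, pep_b, shared))
    return pairs


if __name__ == "__main__":
    toy = "GAFLKDWTRSYVEK"  # hypothetical sequence, not a venom protein
    for pep_a, pep_b, shared in overlapping_pairs(toy):
        print(f"{pep_a:>6} / {pep_b:<6} share {shared} residues")
```

Each tryptic peptide here is bridged by chymotryptic peptides spanning its cleavage sites, which is the kind of overlap that lets aligned spectra be chained into longer contigs.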
By themselves, the meta-contig assemblies provided by spectral networks are not enough for one to conclude whether a biomolecule obtained 100% coverage. To pave the way in this direction, top-down proteomic data in combination with MS3 (i.e., product ion(s) selected from an MS/MS spectrum further fragmented and producing another tandem mass spectrum) and ETD were also acquired for a partition of the venom molecules into two sets (<~10 kDa and >~10 kDa). The top-down strategy consists of injecting intact proteins into the mass spectrometer, thus doing away with the inference limitations of the peptide-centric approach40. This provides complementary information to that of the networks and helps in the discovery of how much is required for obtaining full coverage. We anticipate that these data will be fundamental in the development of next-generation algorithms capable of bridging the gap between bottom-up, middle-down, and top-down proteomics.
Here, we present the first multi-protease, multi-dissociation, bottom-up-to-top-down proteomic dataset of the venom of L. intermedia, the ‘urban’ spider species commonly found in the city of Curitiba, Brazil41, along with an analysis using state-of-the-art tools. The approach stems from the motivation that multiple enzyme digestion increases protein coverage42, besides relying on different activation and acquisition methods.
Adult L. intermedia specimens (both male and female) were collected in the wild in accordance with the Brazilian Federal System for Authorization and Information on Biodiversity (SISBIO-ICMBIO, license number 29801-1). Venom from 200 spiders was extracted through the electrostimulation method43 and immediately diluted in 0.4 M ammonium bicarbonate/8 M urea buffer. Protein concentration was determined through the Coomassie blue method, using bovine serum albumin (BSA) as the standard44. First, the venom was separated into two fractions using an ultra-filter unit (MW cutoff 10 kDa) (Millipore), one fraction containing venom proteins above ~10 kDa (400 μg) and the other containing venom proteins and peptides below ~10 kDa (90 μg). All procedures described next were performed equally for each fraction, after further dividing it into four aliquots, each of which was reduced with dithiothreitol (DTT) to a final concentration of 25 mM for 3 h at room temperature. Afterwards, the samples were alkylated with iodoacetamide (IAA) to a final concentration of 80 mM for 15 min at room temperature in the dark. Each aliquot was digested with one of the following enzymes: trypsin (Trypsin Gold, Mass Spectrometry Grade, Promega Corporation, Madison, WI, USA, cat. No. V5280), chymotrypsin (Promega, cat. No. V1062), or pepsin (Promega, cat. No. V1959) at an enzyme-to-substrate ratio of 1:50 (E:S). We note that an additional aliquot was stored and not digested. Three aliquots were incubated individually with each enzyme for 18 h, at 25 °C for chymotrypsin and 37 °C for trypsin and pepsin. The other aliquot was incubated for only 4 h with trypsin at 37 °C. Each digested fraction was divided into three aliquots and desalted with ultra-micro C-18 spin columns according to the manufacturer’s instructions (Harvard Apparatus).
One of these three aliquots was stored for future use; another, once desalted, was submitted directly to reversed-phase chromatography coupled online with an Orbitrap XL mass spectrometer. The third aliquot of the desalted peptides was eluted with 70% acetonitrile (ACN) and 0.1% formic acid, then dried in a speed vacuum concentrator and resuspended in buffer C (i.e., 10 mM K2HPO4, 25% ACN, pH 3.0). Afterwards, the sample was passed through a micro strong cation exchange (SCX) spin column according to the manufacturer’s instructions (Harvard Apparatus). Briefly, the column was equilibrated with buffer C, centrifuged for 1 min at 100×g, and the sample was eluted from the SCX spin column with increasing concentrations of KCl, i.e., 100, 170, 290, and 400 mM. Finally, each fraction was desalted once more with ultra-micro C-18 spin columns according to the manufacturer’s instructions (Harvard Apparatus). All columns were then washed ten times with 0.1% formic acid and the peptides were eluted with buffer B (i.e., 70% acetonitrile, 0.1% formic acid) to proceed to the next step.
Mass spectrometry analysis
Each fraction of peptides, including the non-fractionated as well as those from the SCX fractionation, had been desalted beforehand and was subjected to an LC-MS/MS analysis on a nano-LC 1D plus System (Eksigent, Dublin, CA), an ultra-high performance liquid chromatography (UHPLC) system coupled with an LTQ-Orbitrap XL ETD (Thermo, San Jose, CA) mass spectrometer, at the Mass Spectrometry Facility RPT02H of the Carlos Chagas Institute (Fiocruz, Brazil). In these analyses, the peptide mixtures were loaded onto a column (75 μm i.d., 15 cm long), packed in-house with a 3.2 μm ReproSil-Pur C18-AQ resin (Dr Maisch), at a flow of 500 nl/min and subsequently eluted at a flow of 250 nl/min from 5 to 40% ACN in 0.5% formic acid in a 120 min gradient. The mass spectrometer was set to data-dependent mode to automatically switch between MS and MS/MS acquisition. Full-scan MS spectra (m/z 350–1,800) were acquired in the Orbitrap analyzer with resolution R=60,000 at m/z 400 (after accumulation to a target value of 1,000,000 in the linear trap) using survey mode. The three most intense ions were sequentially isolated, and each precursor was fragmented using CID, HCD, and ETD. Previous target ions selected for MS/MS were dynamically excluded for 60 s. The total cycle time was approximately 5 s. The general mass spectrometric conditions were: spray voltage, 2.4 kV; no sheath or auxiliary gas flow; ion transfer tube temperature, 100 °C; collision gas pressure, 1.3 mTorr; normalized collision energy, 35% for MS/MS, using wide-band activation mode. The ion selection threshold was 5,000 counts for MS/MS. The parameters for each fragmentation type in MS/MS acquisitions were as follows. For CID: isolation width, m/z 2.5; normalized collision energy, 35; activation, q=0.25; activation time, 30 ms. For HCD: isolation width, m/z 2.5; normalized collision energy, 35; activation time, 30 ms; full width at half maximum resolution, 15,000. For ETD: isolation width, m/z 2.5; activation time, 100 ms.
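The data-dependent logic described above (top three precursors, 5,000-count threshold, 60 s dynamic exclusion, three dissociation modes per precursor) can be sketched in Python as follows. This is a simplified illustration and not the instrument's control software: the function name select_precursors, the representation of the exclusion list, and the m/z matching tolerance of 0.01 are assumptions introduced here, and the toy survey peaks are hypothetical.

```python
TOP_N = 3                      # most intense ions picked per survey scan
MIN_INTENSITY = 5_000          # ion selection threshold (counts)
EXCLUSION_S = 60.0             # dynamic exclusion window (seconds)
FRAGMENTATION_MODES = ("CID", "HCD", "ETD")


def select_precursors(survey_peaks, now_s, exclusion_list, mz_tolerance=0.01):
    """Pick precursors from one survey scan and schedule their MS/MS scans.

    survey_peaks: list of (mz, intensity) tuples.
    exclusion_list: list of (mz, excluded_until_s) tuples from earlier cycles.
    mz_tolerance: assumed matching window for the exclusion list.
    Returns (scheduled_scans, updated_exclusion_list).
    """
    # Forget exclusions whose window has already elapsed.
    active = [(mz, until) for mz, until in exclusion_list if until > now_s]
    candidates = [
        (mz, intensity)
        for mz, intensity in survey_peaks
        if intensity >= MIN_INTENSITY
        and not any(abs(mz - ex_mz) <= mz_tolerance for ex_mz, _ in active)
    ]
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    chosen = [mz for mz, _ in candidates[:TOP_N]]
    # Every chosen precursor is fragmented once per dissociation mode.
    scans = [(mz, mode) for mz in chosen for mode in FRAGMENTATION_MODES]
    active.extend((mz, now_s + EXCLUSION_S) for mz in chosen)
    return scans, active


if __name__ == "__main__":
    peaks = [(500.25, 9000), (622.31, 20000), (701.40, 4000),
             (810.45, 15000), (455.20, 6000)]  # hypothetical survey scan
    scans, exclusions = select_precursors(peaks, 0.0, [])
    for mz, mode in scans:
        print(f"MS/MS of m/z {mz} by {mode}")
```

With three precursors and three modes, each cycle comprises one survey scan followed by up to nine MS/MS scans, which yields the CID, HCD, and ETD spectrum triple per precursor that the downstream analysis merges.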
The de novo sequencing approach employed in this work utilized multiple MS/MS spectra from overlapping peptides generated by multiple proteases, with each precursor analyzed as a CID, HCD, and ETD spectrum triple. Each spectrum was then converted into a prefix residue mass (PRM) spectrum. In this conversion, MS/MS peak masses were converted into putative cumulative precursor fragment masses, with intensity scores determined from likelihood models specific to each fragmentation mode. Triples of PRM spectra from the same precursor were then merged into a single PRM spectrum per precursor by adding scores for matching peak masses. Spectral-network algorithms, implemented in the ProteoSAFe web platform that is freely accessible at http://proteomics.ucsd.edu/ProteoSAFe/, were then used to align merged PRM spectra from peptides with overlapping sequences. Moreover, A-Bruijn algorithms were used to integrate these alignments into assembled contigs.
Each contig was then used to construct a consensus contig spectrum, or meta-contig, capitalizing on the corroborating evidence from all of its assembled spectra to yield a high-quality consensus de novo sequence36. Subsequently, the Meta-SPS algorithm was used to align the meta-contigs against a FASTA sequence database37. This database contained all Loxosceles sequences from UniProt, all from the transcriptome of the L. intermedia venom gland20, and an internal database with common mass spectrometry contaminants and proteases.
A summary of this methodology is found in Fig. 1.
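The merging of PRM spectrum triples described in this section can be illustrated with a minimal Python sketch. It is not the ProteoSAFe implementation. As simplifying assumptions, all peaks are treated as singly charged prefix ions (b ions for CID and HCD, c ions for ETD), the scores are placeholder numbers rather than values from the likelihood models, and the 0.02 Da matching tolerance and the function names to_prm and merge_prm are hypothetical.

```python
PROTON = 1.00728    # mass of a proton (Da)
AMMONIA = 17.02655  # NH3, the mass difference between a c ion and a b ion (Da)

# Offset from a prefix residue mass to the m/z of its singly charged prefix ion.
PREFIX_ION_OFFSET = {
    "CID": PROTON,            # b ions
    "HCD": PROTON,            # b ions
    "ETD": AMMONIA + PROTON,  # c ions
}


def to_prm(peaks, mode):
    """Convert (m/z, score) peaks, read as prefix ions, into PRM peaks."""
    offset = PREFIX_ION_OFFSET[mode]
    return [(mz - offset, score) for mz, score in peaks]


def merge_prm(spectra, tolerance=0.02):
    """Merge PRM spectra by adding the scores of peaks with matching masses."""
    merged = []  # [mass, summed score], in increasing mass order
    for mass, score in sorted(peak for spectrum in spectra for peak in spectrum):
        if merged and mass - merged[-1][0] <= tolerance:
            merged[-1][1] += score  # same prefix mass seen in another spectrum
        else:
            merged.append([mass, score])
    return [(round(mass, 4), score) for mass, score in merged]


if __name__ == "__main__":
    # Toy triple for a peptide starting with G-A (prefix masses 57.02146, 128.05857).
    cid = [(58.02874, 1.0), (129.06585, 2.0)]
    hcd = [(129.06600, 0.5)]
    etd = [(75.05529, 1.5)]
    triple = [to_prm(cid, "CID"), to_prm(hcd, "HCD"), to_prm(etd, "ETD")]
    print(merge_prm(triple))
```

Prefix masses supported by more than one dissociation mode accumulate a higher score, which is the sense in which complementary fragmentation improves confidence in the merged spectrum.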
Our bioinformatics analysis disclosed a list of 190 proteins (Table 1). As far as we know, this is the most comprehensive proteomic profiling of the L. intermedia venom to date. All mass spectrometry data are available from both the ProteomeXchange Consortium via the PRIDE45 partner repository, with dataset identifier PXD005523 (Data Citation 1: PRIDE PXD005523), and our servers (http://proteomics.fiocruz.br/pcarvalho/lintermedia/venom/). A full list of the proteins, meta-contigs, and homologous sequences is made available in Table 1.
All Meta-SPS results for >~10 kDa and <~10 kDa, together with the parameter files used for running the software, are available as separate material (MetaSPS_Results.xlsx, Data Citation 2: Figshare https://doi.org/10.6084/m9.figshare.c.3709168). The results are presented in six tabs, viz., for >~10 kDa grouped by contig, >~10 kDa grouped by spectrum, >~10 kDa parameter file, <~10 kDa grouped by contig, <~10 kDa grouped by spectrum, and <~10 kDa parameter file.
The lack of any previous comprehensive proteomic analysis of the Loxosceles venom demonstrates that studying this venom in detail has been a challenge, one that stems from the organism being highly non-canonical and from the fact that protein sequences for it have remained scarce in databases. The present work circumvented these obstacles by using a combination of shotgun proteomic experiments and different tools to generate and analyze large proteomic datasets and de novo sequencing results.
Our results revealed 190 protein identifications, including all classes of toxins described in previous transcriptome analyses19,20 (Table 2 (available online only)). Our approach identified both high- and low-abundance toxins of the L. intermedia venom, as well as homolog sequences from distinct Loxosceles species (astacin-like proteases, PLDs, peptides, TCTPs, hyaluronidases, allergens, serine proteases, serine protease inhibitors, and housekeeping proteins) (Table 2 (available online only)). These data reinforce the holocrine nature of the Loxosceles venom gland23 and demonstrate that its venom is composed of toxins and housekeeping proteins originating from epithelial-cell content, such as the angiotensin converting enzyme, the 60S ribosomal protein, the Na-Pi co-transporter, and the myosin heavy chain (Table 2 (available online only)). Our results, therefore, validate the method used for analyzing the proteome of an organism with an unsequenced genome.
Taken together, the identified toxins in the L. intermedia venom include representatives from all toxin groups, even if in low abundances (as in the case of, e.g., hyaluronidases and serine proteases). We also find it noteworthy that we obtained significant coverage of the three major families present in the venom, viz., PLDs, astacin-like metalloproteases, and ICK peptides. These families are of great importance for studies of the brown-spider envenomation features and of biotechnological and medical applications.
Many of the aligned contigs mapped to distinct PLD isoforms from a variety of Loxosceles species. In fact, these toxins are the most studied and well-characterized components of the Loxosceles venom5,20,26,31,46,
As for the astacin-like metalloproteases identified, we note that astacins were first described as an animal-venom component in 2007 (ref. 28) and only later recognized as a family of toxins present in the Loxosceles venom33. These toxins present proteolytic activity on distinct extracellular matrix proteins and are related to the hemostatic effects in loxoscelism43,49.
ICK peptides, the major components of the L. intermedia venom-gland transcriptome (54.9% of the expressed sequence tags), were identified with correspondence to all four different ICK peptides described for L. intermedia (LiTx1, LiTx2, LiTx3, and LiTx4)50,51. These ICK peptides, also called knottins, are characterized by the neurotoxic properties they exhibit on ion channels and receptors expressed in the nervous systems of insects and mammals52. The high expression of LiTx transcripts, which correlates with the proteomic results found herein, is consistent with the venom’s effects of paralyzing and killing both prey and predators1,20,51.
How to cite this article: Trevisan-Silva, D. et al. A multi-protease, multi-dissociation, bottom-up-to-top-down proteomic view of the Loxosceles intermedia venom. Sci. Data 4:170090 doi: 10.1038/sdata.2017.90 (2017).
Trevisan-Silva, D. Figshare https://doi.org/10.6084/m9.figshare.c.3709168 (2017)
The authors thank CNPq and CAPES for financial support. They also thank Fiocruz for use of Mass Spectrometry Facility RPT02H at the Carlos Chagas Institute, as well as Dr Michel Batista for aiding in the mass spectrometry procedures. V.C.B. acknowledges support from a FAPERJ BBP grant. N.B. is an Alfred P. Sloan Research Fellow and was partially supported by the US National Institutes of Health Grant 2 P41 GM103484-06A1 from the National Institute of General Medical Sciences. The authors thank Wagner Nagib from the Carlos Chagas Institute, Fiocruz-Paraná, for generating the final cover art.