Elucidation of protein biomarkers for verification of selected biological warfare agents using tandem mass spectrometry

Some pathogens and toxins have the potential to be used as weapons of mass destruction and to instigate population-based fear. Efforts to mitigate biothreats require the development of efficient countermeasures, which in turn rely on fast and accurate methods to detect biological agents in a range of complex matrices, including environmental and clinical samples. We report here a mass spectrometry (MS) based methodology, employing both targeted and shot-gun approaches, for the verification of biological agents from environmental samples. Our shot-gun methodology relied on tandem MS analysis of abundant peptides from the spiked samples, whereas the targeted method was based on an extensive elucidation of marker proteins and unique peptides, resulting in an inclusion list of masses reflecting relevant peptides for the unambiguous identification of nine bacterial species [listed as priority agents of bioterrorism by the Centers for Disease Control and Prevention (CDC)] belonging to phylogenetically diverse genera. The marker peptides were elucidated by extensive literature mining, in silico analysis, and tandem MS (MS/MS) analysis of abundant proteins of the bacterial species cultivated in our laboratory. A combination of shot-gun MS/MS analysis and a targeted search using a panel of unique peptides is likely to provide unambiguous verification of biological agents at the sub-species level, even with limited fractionation of crude protein extracts from environmental samples. The comprehensive set of peptides in the inclusion list is a valuable resource for the multiplex analysis of select biothreat agents and for further development of targeted MS/MS assays.


Supplementary
Supplementary Table S5: Results from proof-of-concept studies.

Selection of species-specific putative marker proteins
To select putative marker proteins, the following criteria were applied: 1) the protein should have evidence of abundant expression by the microorganism; 2) it should not be closely related to its homologs in other bacterial species; and 3) it should preferably be identifiable in wet lab experiments using whole cell lysate of a pure culture. Additional attributes (such as virulence and surface localization) of each of the proteins were also noted. Predicted molecular function and cellular processes for each of the selected proteins were listed using the ExPASy Proteomics tools (http://www.expasy.ch). Localisation of each protein in the cell was predicted using the online PSORT tool at ExPASy (http://www.expasy.ch). All these putative protein markers, previously shown to be expressed in abundance in the respective pathogenic bacterial species (9), were initially ranked in decreasing order of the number of reports for a given protein, with a cut-off of at least three reports for each. Several independent reports showing experimental evidence of expression increased the likelihood of identifying the protein using MS from the diverse matrices in which the given species may be present.
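The report-count ranking described above can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function name, the input layout (pairs of protein name and reference identifier), and the example protein names are all assumptions made for the sketch. Only the cut-off of three reports comes from the text.

```python
from collections import Counter

MIN_REPORTS = 3  # cut-off stated in the text


def rank_by_reports(reports, min_reports=MIN_REPORTS):
    """Rank proteins by the number of independent reports of expression.

    reports: iterable of (protein, reference_id) pairs (hypothetical layout).
    Returns a list of (protein, n_reports), highest count first.
    """
    # Deduplicate first so the same reference is not counted twice for a protein.
    counts = Counter(protein for protein, _ in set(reports))
    kept = [(p, n) for p, n in counts.items() if n >= min_reports]
    # Sort by decreasing count, then by name for a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

Proteins with fewer than three distinct reports are dropped before ranking, which mirrors the cut-off but says nothing about how the reports themselves were collected.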
FASTA sequences for each of these putative marker proteins, short-listed on the basis of three or more independent reports of expression, were retrieved from the reference strain of the selected bacterial species and subjected to global protein BLAST against the non-redundant protein database at NCBI, and the percent identity of the query sequence with its nearest homolog in another species was noted.
A low percent identity of the protein with its closest homolog in any other bacterial species was given maximum weightage, as it increases the likelihood of obtaining species-specific unique peptides (Table 2).
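As a rough illustration of the homolog check, the sketch below picks the nearest homolog outside the query species from a list of BLAST hits. The hit layout (species name, percent identity) and the function name are assumptions, and parsing of real BLAST output is deliberately left out.

```python
def nearest_foreign_homolog(hits, own_species):
    """Return the (species, percent_identity) hit with the highest identity
    among hits that do not belong to the query species, or None if there are none.

    hits: iterable of (species, percent_identity) pairs (hypothetical layout).
    own_species: set of species names treated as the query taxon.
    """
    foreign = [(sp, ident) for sp, ident in hits if sp not in own_species]
    if not foreign:
        return None
    return max(foreign, key=lambda hit: hit[1])
```

A lower identity for the returned hit suggests the protein is more likely to yield species-specific peptides, which is the property the weighting in the text rewards.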
Using MALDI-TOF-TOF, abundant proteins were identified in the laboratory from the cell lysates of the selected bacterial species after minimal fractionation on SDS-PAGE. The peptides with significant MS/MS ion scores were listed (Supplementary Table 1 A-F). The identified proteins and their peptides with significant MS/MS data indicated the abundance of the proteins and the amenability of the peptides to tandem MS analysis.

Prioritisation of the species-specific putative marker proteins
Putative marker proteins for each of the selected species were prioritized according to an arbitrarily designed scoring scheme aimed at selecting proteins and peptides that are unique to the species and expressed by the microorganism in abundance, which in turn increases the probability of obtaining a better signal on MS analysis. The ranking scheme prioritized candidate protein markers for each of the selected species by taking three parameters into consideration. After adding all the points for each protein in a given species, 15-27 proteins were short-listed as marker proteins for each of the selected species (Table 1).

In silico digestion of species-specific marker proteins and unique peptide selection
The FASTA sequences of the selected putative marker proteins were subjected to in silico tryptic digestion using the Peptide Mass algorithm at ExPASy Proteomics tools (http://www.expasy.org). After in silico digestion, peptides in the mass range of 1000-3000 Da were selected from each selected protein of the selected bacterial species. These peptides were subjected to global protein BLAST against the GenBank non-redundant protein database (http://www.ncbi.nlm.nih.gov) with search parameters adjusted for short input sequences. Peptides showing less than 100% sequence identity with any other bacterial species (except the query species) were selected for the inclusion list of unique peptides. This selection was based on the assumption that a difference of even a single amino acid is most likely to ensure that the selected mass will not come from any species other than the target taxon. For the Bacillus cereus sensu lato group, uniqueness meant that the queried peptide did not show 100% sequence identity with any species outside the group, but a complete match with one or more of the six species within the group was allowed. Similarly, the closely related species of Burkholderia (B. mallei and B. pseudomallei) and Brucella (B. suis, B. abortus, and B. melitensis) were considered together for the peptide BLAST search and the scoring of unique peptides. To discriminate between these closely related pathogens, the list of putative markers was further appended with peptides unique to each of these related species, taken from the reported literature.
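The digestion, mass filter, and uniqueness rule can be sketched in a few lines. This is a simplified stand-in for the ExPASy tool, under stated assumptions: cleavage C-terminal to K or R but not before P, no missed cleavages, no modifications, and neutral monoisotopic masses. The tool settings actually used are not given in the text, and all function names here are hypothetical.

```python
import re

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per peptide


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    # Zero-width split; requires Python 3.7 or later.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def peptides_in_mass_range(sequence, low=1000.0, high=3000.0):
    """Tryptic peptides whose mass falls in the 1000-3000 Da window from the text."""
    return [p for p in tryptic_digest(sequence) if low <= peptide_mass(p) <= high]


def is_unique(exact_match_species, allowed_taxa):
    """True if every species with a 100% identical hit lies inside the allowed group."""
    return set(exact_match_species) <= set(allowed_taxa)
```

For grouped taxa, `allowed_taxa` would hold all members of the group (for example the three Brucella species named in the text), so a perfect match within the group does not disqualify a peptide.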
The unique peptides in the inclusion list were further screened for strain coverage using all the available sequences retrieved by a species-specific BLAST search, counting the presence of a given peptide in the alignment data of these sequences from 111 diverse strains in the genome database. The peptides were further curated to reduce background signal by removing peptides isobaric with those observed in the MS analysis of a tryptic digest of an environmental bacterial consortium from garden soil.
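A minimal sketch of these two curation steps follows. The data layout, the function names, and especially the mass tolerance are assumptions for illustration; the text does not state the tolerance used to call two peptides isobaric.

```python
def strain_coverage(peptide, strain_sequences):
    """Fraction of strains whose protein sequence contains the peptide.

    strain_sequences: dict mapping strain name to protein sequence (hypothetical layout).
    """
    if not strain_sequences:
        return 0.0
    hits = sum(1 for seq in strain_sequences.values() if peptide in seq)
    return hits / len(strain_sequences)


def drop_isobaric(candidates, background_masses, tol_da=0.5):
    """Remove candidates whose mass lies within tol_da of any background mass.

    candidates: dict mapping peptide to mass in Da.
    background_masses: masses observed in the background (soil consortium) digest.
    tol_da: assumed tolerance, not a value taken from the text.
    """
    return {
        pep: mass
        for pep, mass in candidates.items()
        if not any(abs(mass - bg) <= tol_da for bg in background_masses)
    }
```

The substring test in `strain_coverage` stands in for counting a peptide in alignment data; a real implementation would read the aligned sequences rather than plain strings.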
In this way, an inclusion list of masses corresponding to the peptides selected as described above was generated for each of the nine bacterial species, and a consolidated list covering all nine species was used for the validation experiment for the targeted analysis.
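Merging the per-species lists into one consolidated inclusion list might look like the sketch below. It assumes singly protonated ions ([M+H]+) and a dictionary layout chosen for the example; neither the charge state nor the file format of the actual inclusion list is specified in the text.

```python
PROTON = 1.007276  # mass of a proton in Da


def consolidated_inclusion_list(per_species):
    """Merge per-species peptide masses into one list sorted by m/z.

    per_species: dict mapping species to a dict of peptide -> neutral
    monoisotopic mass (hypothetical layout).
    Returns rows of (mz, species, peptide), assuming [M+H]+ ions.
    """
    rows = [
        (round(mass + PROTON, 4), species, peptide)
        for species, peptides in per_species.items()
        for peptide, mass in peptides.items()
    ]
    return sorted(rows)
```

Keeping the species and peptide alongside each m/z value lets a matched mass be traced back to the taxon it is meant to verify.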