Key Points
- The prediction of a bacterial protein's subcellular localization can be of considerable aid to microbiological research. It can be used to infer potential functions for a protein, to either design or support the results of particular experimental approaches and, in the case of surface-exposed proteins, to quickly identify potential drug or vaccine targets in a given pathogen genome, or potential diagnostic/detection targets in pathogen or environmental isolates.
- Bacterial proteins contain sequence features that either directly influence the targeting of a protein to a particular cellular compartment or else are characteristic of proteins found at a specific localization site. These features are encoded in the protein's amino-acid sequence and can be identified computationally.
- By analyzing a protein for the presence or absence of one or more of these features and integrating the results, a prediction of which compartment a protein is likely to reside in can be generated.
- Since the 1991 release of the first comprehensive, web-based bacterial protein localization prediction method, PSORT I, seven other such tools have been released. This review summarizes the techniques implemented by each tool, their benefits, pitfalls and predictive performance.
- The review also describes alternative methods for localization prediction, including similarity searches against localization databases and the use of predictive tools designed to identify individual sequence features. The performance of these methods is compared with that of the seven broad-spectrum localization prediction tools.
- PSORTb and Proteome Analyst are the most precise predictive methods currently available, with other methods complementing them when higher sensitivity (a larger number of predictions) is required.
- The precision of certain localization prediction tools has now surpassed the precision of some high-throughput laboratory methods for localization determination. We can now reliably assign potential localization sites to the majority of proteins encoded in a genome.
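The idea in the key points above, detecting sequence features and integrating them into a single call, can be illustrated with a deliberately simplified sketch. It is not the method of PSORT I, PSORTb or any other tool named here. The Kyte-Doolittle hydropathy values and the 19-residue window with a 1.6 cutoff are the commonly cited heuristic for membrane-spanning segments; the 8-residue N-terminal rule, its 2.0 cutoff, the 30-residue boundary, the function names and the decision order are all assumptions made for illustration.

```python
# Toy illustration only: two crude sequence features and a hand-written rule.
# Assumes an uppercase sequence of the 20 standard amino acids.

# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_mean(seq, window):
    """Highest mean hydropathy over all sliding windows of the given size."""
    scores = [KD[aa] for aa in seq]
    if len(scores) < window:
        return float("-inf")  # sequence too short to hold one window
    return max(
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    )


def extract_features(seq):
    """Report presence or absence of two hypothetical features."""
    return {
        # Assumed proxy for a signal peptide's hydrophobic core.
        "n_terminal_hydrophobic_core": max_window_mean(seq[:30], 8) >= 2.0,
        # Kyte-Doolittle style proxy for a membrane-spanning helix.
        "internal_hydrophobic_segment": max_window_mean(seq[30:], 19) >= 1.6,
    }


def toy_localization(seq):
    """Integrate the features with an assumed, hand-written rule order."""
    features = extract_features(seq)
    if features["internal_hydrophobic_segment"]:
        return "cytoplasmic membrane"
    if features["n_terminal_hydrophobic_core"]:
        return "exported"
    return "cytoplasmic"


# Synthetic sequences, invented for this example.
exported_like = "MKKK" + "L" * 10 + "AAA" + "D" * 60
membrane_like = "D" * 40 + "L" * 20 + "D" * 40
soluble_like = "DEKR" * 20

for name, seq in [("exported_like", exported_like),
                  ("membrane_like", membrane_like),
                  ("soluble_like", soluble_like)]:
    print(name, extract_features(seq), toy_localization(seq))
```

Real predictors replace each hand-set threshold with trained or curated feature detectors and combine their outputs statistically, so this sketch only shows the shape of the pipeline, not its accuracy.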
Abstract
The computational prediction of the subcellular localization of bacterial proteins is an important step in genome annotation and in the search for novel vaccine or drug targets. Since the 1991 release of PSORT I, the first comprehensive algorithm to predict bacterial protein localization, many other localization prediction tools have been developed. These methods offer significant improvements in predictive performance over PSORT I and the accuracy of some methods now rivals that of certain high-throughput laboratory methods for protein localization identification.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Table 1
Glossary
- Type I secretion system
-
A protein export system spanning the bacterial cell envelope that transports newly synthesized proteins directly from the cytoplasm to the extracellular space.
- Type II secretion system
-
A two-stage protein export system spanning both the bacterial cytoplasmic and outer membranes. Also known as the general secretory pathway.
- Sec-dependent pathway
-
One of several possible first stages of the general secretory pathway protein export system in the cytoplasmic membrane that transports newly synthesized proteins into or across the cytoplasmic membrane.
- SRP-dependent pathway
-
One of several possible first stages of the general secretory pathway protein export system that inserts membrane proteins into the cytoplasmic membrane.
- Twin arginine translocation pathway
-
(TAT pathway). One of several possible first stages of the general secretory pathway protein export system in the cytoplasmic membrane that transports folded proteins across the cytoplasmic membrane.
- Type III secretion system
-
A system that is used by many pathogenic bacteria to inject virulence proteins directly into host cells through needle-like structures. Ancestrally related to the system used by bacteria to export flagellum protein subunits.
- Type IV secretion system
-
A syringe-like proteinaceous machinery that can transport bacterial protein or DNA effector molecules directly into a eukaryotic cell.
- Type V secretion system
-
A system that involves autotransporter proteins, which are translocated across the outer membrane of Gram-negative bacteria through a transmembrane pore that is formed by a self-encoded β-barrel structure.
- Signal peptide
-
A short sequence of mainly hydrophobic amino acids at the N terminus of some secreted proteins that directs the nascent protein to the first step of the general secretory pathway.
- k-nearest-neighbour classification technique
-
A method for classifying an unknown object based on its proximity in multidimensional space to neighbouring objects of known class.
- HMMTOP
-
(Hidden Markov model for topology prediction). An automatic server for predicting transmembrane helices and the topology of proteins. HMMTOP is based on the principle that the topology of transmembrane proteins is determined by the maximum divergence of the amino-acid composition of sequence segments.
- Bayesian network
-
A statistical approach (named after Bayes' Theorem) for inferring the likelihood of an event given a series of prior events with known probabilities.
- BLAST
-
(Basic local alignment search tool). A sequence comparison algorithm, optimized for speed, used to search sequence databases for regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches.
- PA-GOSUB
-
(Proteome Analyst–Gene Ontology Molecular Function and Subcellular Localization). A publicly available, web-based, searchable and downloadable database that contains the sequences, predicted molecular functions and predicted subcellular localizations of over 107,000 proteins from ten model organisms.
- Matthews Correlation Coefficient
-
(MCC). A measure of predictive performance that incorporates both precision and recall into a single value between −1 and +1.
- PSI-BLAST
-
Position-specific iterative BLAST. This is a feature of BLAST 2.0 in which a profile (or position-specific scoring matrix, PSSM) is constructed from a multiple alignment of the highest scoring hits in an initial BLAST search. The PSSM is generated by calculating position-specific scores for each position in the alignment. Highly conserved positions receive high scores and weakly conserved positions receive scores near zero. The profile is then used to do a second BLAST search and the results of each 'iteration' are used to refine the profile. This iterative searching strategy results in increased sensitivity.
- Expect value
-
This describes the likelihood that a sequence with a similar score will occur in the database by chance. The smaller the e value, the more significant the alignment. For example, if the first alignment has a low e value of 10⁻¹¹⁷, this indicates that there is a significant sequence alignment and that a sequence with a similar score is unlikely to occur simply by chance.
- FASTA
-
A commonly used sequence format in bioinformatics starting with a '>' character and optional description, followed by a DNA or protein sequence.
- BLASTp
-
This is used to compare an amino-acid query sequence with other protein sequences stored in databases.
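The Matthews Correlation Coefficient entry in the glossary above can be made concrete with a short worked example. The sketch below assumes a binary setting (a protein either is or is not assigned to a given localization site) summarized by counts of true positives, true negatives, false positives and false negatives; the function names and the example counts are illustrative and are not taken from the review.

```python
import math


def precision(tp, fp):
    """Fraction of positive predictions that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of true positives that were recovered (sensitivity)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient, ranging from -1 to +1."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Invented counts for illustration: 90 correct positive calls, 10 false
# positives, 30 missed positives and 870 correct negative calls.
tp, fp, fn, tn = 90, 10, 30, 870
print("precision:", precision(tp, fp))
print("recall:", recall(tp, fn))
print("MCC:", round(mcc(tp, tn, fp, fn), 3))
```

A tool that makes few but reliable predictions scores high on precision and low on recall, whereas MCC penalizes both false positives and false negatives, which is why it is useful for comparing methods with a single number.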
Cite this article
Gardy, J., Brinkman, F. Methods for predicting bacterial protein subcellular localization. Nat Rev Microbiol 4, 741–751 (2006). https://doi.org/10.1038/nrmicro1494
DOI: https://doi.org/10.1038/nrmicro1494