Antimicrobial peptides (AMPs) are produced by multicellular organisms as a defence mechanism against competing pathogenic microbes1,2. Pioneering studies have led to the discovery of various types of these 'host defence peptides' — including defensins3, cecropins4, magainins5 and cathelicidins6 — with remarkably different structures and bioactivity profiles7. Extensive research has led to the realization that these bioactive peptides do not merely act as direct antimicrobial agents but also represent important effectors and regulators of the innate immune system that are able to profoundly modulate the immune response through a range of activities, including increasing chemokine production and release by immune and epithelial cells, enhancing wound healing and angiogenesis, exerting pro- and anti-apoptotic effects on different immune cell types, as well as having adjuvant activity in promoting adaptive immunity2,8,9,10.

In addition to their direct (although sometimes weak) antimicrobial activities, AMPs have further antimicrobial effects: they can suppress biofilm formation, induce the dissolution of existing biofilms11, chemotactically attract phagocytes and mediate non-opsonic phagocytosis12,13,14. Some natural AMPs, like porcine protegrin, exhibit strong antimicrobial effects, but in general the activity of AMPs is improved by their increased concentrations in phagocyte granules, the crypts of the intestine and near degranulating phagocytes15,16,17. In vitro studies have revealed that direct antimicrobial activity is not limited to the previously suggested mechanisms of membrane and/or cell rupture but instead extends to interference with membrane-associated biosynthesis, macromolecular synthesis in the cytoplasm and metabolic functions2,18,19.

It has been suggested that the term 'antimicrobial peptide' should be used when direct antimicrobial activity is being examined, and the term 'host defence peptide' should be used when referring to anti-infective activity that enhances or modulates the host immune response to infectious agents20. Here, we review the current status of computer-assisted peptide design, with particular focus on the in silico generation of novel AMPs. We concentrate on describing the optimization of direct antimicrobial activity, as optimization studies on host defence peptides and their synthetic innate defence regulator counterparts are still in their infancy.

Our awareness of natural antimicrobials — namely, the “existence of antimicrobial substances in blood, leucocytes and lymphatic tissue”21 — dates back to the late nineteenth century22. The concept that low-molecular-weight antimicrobial proteins were important in immunity was first highlighted by studies on human phagocytes23 and Hyalophora cecropia moths24. Currently, the most potent natural peptides known to exhibit antimicrobial activity are the β-hairpin peptides typified by the horseshoe crab polyphemusin I (RRWC1FRVC2YRGFC2YRKC1R; where the equivalently numbered cysteine residues form cystine disulphide bridges)25,26 and pig protegrin (RGGRLC1YC2RRRFC2VC1VGR)27,28. Polyphemusin I inhibits the growth of various Gram-positive and Gram-negative bacteria as well as Candida albicans at a minimal inhibitory concentration (MIC) of around 0.5–1 μg per ml29. However, polyphemusin I does not protect against Pseudomonas aeruginosa infections in cyclophosphamide-treated (neutropenic) mice, it exhibits haemolytic activity at higher concentrations, and a protegrin derivative — IB-367 (iseganan) — failed in Phase III clinical trials of oral mucositis29,30,31.

Most of the natural cationic peptides are much less (10–100-fold) active than these AMPs and are strongly antagonized by physiological concentrations of mono- and divalent cations as well as polyanionic polymers like glycosaminoglycans and mucin32. Thus, new design approaches are required that enable the identification of more cost-effective sequences (that is, smaller sequences without post-translational modifications) that are highly active, have broad-spectrum activity without associated toxicities, good pharmacokinetics and a desired selectivity profile.

The challenge of bacterial resistance

Presumably, bacteria have been exposed to AMPs for millions of years and, with the exception of a few species (such as Burkholderia spp.)33, widespread resistance has not been reported, making them a potential treasure trove of starting points for rational, focused antimicrobial drug design. In most instances natural AMPs do not appear to be highly optimized for direct antimicrobial activity, and it is likely that multiple modestly active peptides with concomitant immunomodulatory activities work effectively in combination and/or when induced or delivered to sites of infection34. Drug developers are not limited to such considerations and can strive to develop AMPs with an optimized specific function or target.

Modern antibiotics have a considerably limited number of macromolecular targets, often essential bacterial proteins, and are subject to severe and growing resistance problems35,36,37. Notably, the development of resistance against AMPs has occurred to a much lesser degree as they usually work by attacking multiple hydrophobic and/or polyanionic targets38,39. Thus, it is difficult to obtain mutants that are resistant to AMPs, and training methods — for example, multiple passages with half the MIC of AMPs — are usually required for the development of any resistance40.

Mechanisms of resistance to AMPs include cell surface modification (which can occur adaptively through the two-component regulator PhoPQ, for example), external trapping of AMPs, active efflux of AMPs, proteolytic degradation, as well as the suppression of host pathways by the pathogen for the production of AMPs39,41. These challenges are usually met by screening peptides for activity against intact bacteria using conventional bacterial growth media that contain physiological salt concentrations, and by counterscreening for lack of mammalian cell toxicity. The development of computational alternatives to this experimental selection protocol is aiding the elimination of poor candidate peptides at very early stages of development.

In addition, the realization that small-molecule drugs tend to interact with multiple macromolecular targets has profoundly changed drug design42,43,44. Consequently, chemogenomics and 'systems' views have evolved as methods for rational drug discovery and compound library design, and target profile prediction that specifically addresses undesired off-target activity is now possible for drug-like compounds45,46,47. So-called kernel methods have recently been added to the set of algorithms that seem to be particularly suited for the purpose of identifying drug–target interaction pairs48. These methodological advances have not yet been transferred to peptide design but have the potential to boost the design of AMPs with desired selective antibacterial activity.

Mechanism of action of AMPs

A profound understanding of the molecular mechanisms responsible for the direct antibacterial activity of AMPs will enable the development of improved predictive models, and several mechanisms of action have been proposed49,50,51. Indisputably, AMPs must interact with membranes as part of their direct antibacterial mechanism (or mechanisms) of action, leading to membrane perturbation, disruption of membrane-associated physiological events such as cell wall biosynthesis or cell division, and/or translocation across the membrane to interact with cytoplasmic targets (Fig. 1). It is generally assumed that the positively charged AMP initially interacts with the negatively charged lipid head groups of the outer surface of the cytoplasmic membrane. The peptide is then inserted, in an approximately parallel orientation to the bilayer, into the outer leaflet of the cytoplasmic membrane lipid bilayer, leading to the displacement of lipids.

Figure 1: Potential mechanism of membrane disruption and/or translocation by antimicrobial peptides.

The antimicrobial peptide (AMP) is represented as a ribbon diagram, with positively charged residues indicated in blue (other residues are shown in yellow). The initial stages include membrane attachment (part a) and insertion into the outer leaflet (part b). Not all AMPs actually insert into and disrupt the membrane (part c) or form pores. Some peptides are too short to span the bilayer (for example, the cyclic decapeptide gramicidin S). A high peptide concentration leads to increased membrane curvature and facilitates pore formation192 or translocation across the membrane. A suggested feature of the mode of action of AMPs is to stretch, disorder and thin the outer leaflet of the membrane (part b). The panels on the right illustrate hypothetical mechanisms of the AMP–membrane interaction that can be studied by molecular dynamics simulations. Here, spinigerin (Protein Data Bank193 ID: 1ZRV194) was simulated in explicit water with a POPE (1-palmitoyl-2-oleoyl-phosphatidylethanolamine) membrane bilayer model using the NAMD v2.7b2 software195 and the CHARMM22 force field196.

Possible alterations in membrane structure, including thinning, pore formation, altered curvature, modified electrostatics and localized perturbations, may result in the reorientation of peptide molecules in the membrane. Finally, the peptides may translocate through the membrane and diffuse into the cytoplasm to reach intracellular targets. These basic mechanisms certainly explain many aspects of the observed antibacterial activity of AMPs but they do not increase our understanding of the overall number of fundamentally different mechanisms by which AMPs interact with the bilayer, or the relative importance of membrane rupture, graded leakage and non-membrane mechanisms. Moreover, they do not answer the important question of whether membrane binding represents the sole basis of AMP selectivity52. Further modelling analyses of peptides and lipids will be required to achieve significant progress in rational AMP design and engineering based on membrane interactions.

The interactions of an AMP with the membrane cannot be explained by a particular sequential amino-acid pattern or motif; rather, they originate from a combination of physicochemical and structural features8 including size, residue composition, overall charge, secondary structure, hydrophobicity and amphiphilic character53. There has been considerable discussion, based largely on model membrane studies, about how AMPs exhibit strong preferences for specific membrane compositions and seemingly prefer membranes, such as bacterial membranes, that: contain comparatively large fractions of anionic lipids, such as phosphatidylglycerol and cardiolipin; maintain a high electrical potential gradient; and lack cholesterol. For example, bacterial membranes possess a comparatively large fraction (up to 20 mole percent) of negatively charged lipids and maintain high electrical potential gradients (a transmembrane potential (Δψ) of approximately −120 mV) that attract positively charged substances like AMPs, whereas the membranes of plant cells and animal cells are enriched in cholesterol and in lipids with no net charge, and maintain a weak Δψ (Ref. 1).
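Amphiphilic character, one of the features listed above, is often quantified as a hydrophobic moment. The following is a minimal Python sketch under stated assumptions: it uses the Kyte-Doolittle hydropathy scale and an ideal α-helical periodicity of 100° per residue, neither of which is prescribed by the studies discussed here, and the function name is illustrative only.

```python
import math

# Kyte-Doolittle hydropathy values (assumed choice of scale; others exist).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, angle_deg=100.0):
    """Mean hydrophobic moment per residue.

    Each residue contributes a vector whose length is its hydropathy and
    whose direction advances by `angle_deg` per position (100 degrees is
    the assumed ideal alpha-helix). A large value means that hydrophobic
    and polar residues segregate onto opposite faces of the helix.
    """
    delta = math.radians(angle_deg)
    sin_sum = sum(KD[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(KD[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)


if __name__ == "__main__":
    # Penetratin sequence as quoted in this article.
    print(round(hydrophobic_moment("RQIKIWFQNRRMKWKK"), 3))
```

Such a value only describes an idealized helix; it says nothing about whether the peptide actually adopts that conformation at a membrane.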

To assess AMP-induced membrane tropism, a protocol for surface plasmon resonance has been devised that involves using membrane compositions that mimic those of mammalian and microbial cells53. A membrane-based surface plasmon resonance sensor chip has also been described that can detect a binding event and monitor the actual membrane-disturbing effect of the peptides54. However, the actual in vivo situation is likely to be more complex as membranes are not just simple passive bilayer structures as assumed in this surface plasmon resonance protocol and in model membrane studies; rather, they contain domains that may include hexagonal structures, minor lipid species, embedded proteins and glycolipids, and may differ in fluidity, fatty acid chain composition, as well as transmembrane Δψ and ΔpH gradients49,55.

This does not mean that model membrane systems have no value; however, any predictions that come from such systems cannot automatically be transferred to the in vivo setting. For example, the human cathelicidin AMP LL-37 has an absolute tropism for model membranes of bacterial-like composition56 but its antibacterial activity is largely restricted by physiological conditions57. LL-37 is freely translocated across eukaryotic membranes, in a similar manner to cell-penetrating peptides, and this translocation is obligatory for its immunomodulatory function of chemokine induction58,59.

Preference for a certain membrane or classes of membranes is likely to be important in peptide activity. This probably requires nascent or induced (via peptide–membrane interaction) structural features of AMPs, rather than mere electrostatic attraction due to opposing charges or hydrophobic interactions. Further evidence for this concept stems from extensive studies performed with the cell-penetrating peptide penetratin (also known as the protein transduction domain), a 16-residue cationic peptide (RQIKIWFQNRRMKWKK-amide) corresponding to the third helix of the homeodomain of the antennapedia protein60, which has been patented as a carrier peptide (or a cargo carrier) for drug delivery into cells61. The mutation of a single residue (W6F) abolished the membrane-transfer properties of penetratin, indicating that lipid binding per se might be insufficient for AMP activity62. Similarly a W2G mutation in cecropin, an AMP constituting a major part of cell-free immunity in insects63, nearly abolished antibacterial activity64.

There are many similarities between cell-penetrating peptides and AMPs. Both exhibit antimicrobial effects and can carry passenger molecules into the cell; in fact, the well-studied peptide LL-37 can be translocated into eukaryotic cells at concentrations below those required to kill bacteria (at the equivalent divalent cation concentrations)65. However, it is worth noting that AMPs are considered to have the ability to pass across bacterial membranes without requiring a transport apparatus, whereas cell-penetrating peptides are thought to be internalized primarily by active endocytosis66, perhaps pointing to a fundamental difference in how peptides access prokaryotic and eukaryotic cells.

AMPs feature a remarkable variety of structural motifs67. Figure 2 presents an alternative to the traditional structural classification of peptides with some degree of antimicrobial activity (and for which experimentally determined structures are available). To obtain an overview of their structural diversity, we clustered the peptides according to their backbone structure in solution (data obtained from nuclear magnetic resonance spectroscopy). These peptides have very different folds, which suggests that elements of their secondary structure can be used as a means for classification68. Furthermore, these peptides have to integrate, interact with and/or pass a specific membrane, but the mechanisms by which they do this are unclear. Extracting the discriminating features, however delicately tuned, among these groups of membrane-interacting peptides will help us understand the functional aspects of AMPs and enable the optimization of activity and 'rational' structure-based peptide design.

Figure 2: Clustering of antimicrobial peptides according to backbone torsion angles.

Representatives of each cluster formed are shown as ribbon diagrams with secondary structure elements highlighted (helices are shown in turquoise, and strands are shown in magenta). We selected 135 antimicrobial peptides (AMPs) from the updated Antimicrobial Peptide Database76, for which nuclear magnetic resonance solution structures are available from the Protein Data Bank. Each peptide structure was represented by a 16-dimensional vector coding for the prevalence of backbone torsion angles (Φ and ψ), which are characteristic of peptide structure, and clustered using Ward's method197. As a result, AMPs with similar backbone folds were grouped together. This automated, fine-grained classification provides a basis for the rational design of novel AMPs with desired structural features. All calculations were performed using Matlab R2010b (The MathWorks Inc., Natick, USA).
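The clustering procedure described in this legend can be sketched in Python as follows. This is an illustration, not the original Matlab workflow: the legend does not state how the 16 dimensions were defined, so the sketch assumes a 4 × 4 grid over the (φ, ψ) plane, and the function names are hypothetical. Ward's criterion is implemented directly as the increase in within-cluster variance caused by each merge.

```python
def torsion_histogram(phi_psi, bins=4):
    """Fraction of residues in each cell of a bins x bins (phi, psi) grid.

    Angles are in degrees within [-180, 180]. With bins=4 this yields a
    16-dimensional vector (an assumed encoding, see the text above).
    """
    counts = [0.0] * (bins * bins)
    width = 360.0 / bins
    for phi, psi in phi_psi:
        col = min(int((phi + 180.0) // width), bins - 1)
        row = min(int((psi + 180.0) // width), bins - 1)
        counts[row * bins + col] += 1.0
    return [c / len(phi_psi) for c in counts]


def ward_clusters(vectors, n_clusters):
    """Agglomerative clustering with Ward's minimum-variance criterion."""
    clusters = [{"members": [i], "centroid": list(v)}
                for i, v in enumerate(vectors)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na = len(clusters[a]["members"])
                nb = len(clusters[b]["members"])
                d2 = sum((x - y) ** 2 for x, y in
                         zip(clusters[a]["centroid"], clusters[b]["centroid"]))
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        ca, cb = clusters[a], clusters[b]
        na, nb = len(ca["members"]), len(cb["members"])
        merged = {
            "members": ca["members"] + cb["members"],
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(ca["centroid"], cb["centroid"])],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return [sorted(c["members"]) for c in clusters]
```

In practice the (φ, ψ) pairs would be computed from the deposited structures, and a library routine would replace this cubic-time loop for larger data sets.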

AMP databases

As natural AMPs are largely derived from gene-coded sequences, bioinformatics methods have been applied to create databases of known AMPs as well as tools to specifically predict AMPs from genomes that have not yet been annotated. For example, the AMPer resource was developed with the aim of classifying natural AMPs, as well as predicting novel AMPs in the bovine genome using hidden Markov models69,70. Recently, feature-extraction methods have been applied to the Collection of Antimicrobial Peptides71, and alignment-based AMP detection was performed72. Another example is the ANTIMIC collection, which contains approximately 1,700 known and putative AMP sequences73. One of the original data resources, the Antimicrobial Sequences Database, was created more than 14 years ago and there are now at least 21 AMP repositories in different stages of activity and maintenance74. The Antimicrobial Peptides Database (APD)75, and later the updated APD2 (Ref. 76), is a valuable resource of AMP sequences. This well-maintained database offers various options for convenient searches in different phylogenetic kingdoms. In recent years, the discovery of AMPs from plants and marine organisms has further augmented the variety of anti-infective peptides with biotechnological and pharmaceutical potential77,78.

Despite many convenient database analysis options, one may doubt whether studies across AMPs that are structurally widely dissimilar will be fruitful for understanding AMP activity in general, given their potentially diverse mechanisms of action. To address this issue and enable the development of improved data resources, synthetic peptide design will help to unravel the underlying structure–activity relationships (SARs). Like databases of small, drug-like bioactive compounds, AMP databases provide an indispensable knowledge base for both qualitative and quantitative activity prediction models79,80,81. Knowledge-based computational approaches have already led to the design of synthetic AMPs, such as the adepantins (automatically designed peptide antibiotics)82, and allowed the systematic mining of genomic expressed sequence tag data aimed at the discovery of hitherto undescribed natural AMP sequences83.

Synthetic AMPs

To develop AMPs into therapeutics, several objectives must be met. As an anti-infective treatment, an AMP must be active against the pathogen of interest and have low toxicity at the therapeutic dose (that is, it must have a high therapeutic index). Recent research on AMPs has focused on methods to search through the constellation of known or predicted peptide sequences — either empirically or computationally — for peptides with desired properties, and these approaches are continually evolving. We distinguish three research approaches: modification of known AMP sequences (known as templates) with limited computational input; rigorous biophysical modelling to understand peptide activity; and virtual screening. We outline these methodologies and data requirements below.

Template-based studies. Studies of AMP activity based on a template AMP with known activity seek to identify peptides with greater antimicrobial activity or reduced toxicity based on altered amino-acid sequences84. These studies have often systematically — but incompletely — changed a single amino acid in the peptide to identify amino acids and positions that are important for activity. Cecropin, magainin, protegrin and lactoferricin have all been used as AMP templates. In most cases, the source of their activity has been investigated by examining peptide variants that were designed on the basis of the general — but somewhat limited — concept that charge and amphiphilicity are crucial to peptide activity85,86,87. Despite their apparent limitations, such studies have shed light on the importance of specific amino acids and residue positions to peptide activity.

It is evident, however, that such 'local' approaches fail to account for interactions between amino acids that influence the global three-dimensional conformation of the peptide88. Using high-throughput peptide synthesis on arrays, coupled with a high-throughput and rapid luminescence-based assay for bacterial killing89, it has been possible to carry out a complete substitution study including all single amino-acid changes to bactenecin 2A, a linearized version of bovine bactenecin90,91. Overall, it is clear when comparing two amino-acid substitution studies that the substitutions favouring activity vary according to the template sequence, and analogous substitutions may well have substantially different effects at different positions in the primary sequence. In other words, the effect of a particular residue substitution is context-dependent92.
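A complete single-substitution library of the kind described above is straightforward to enumerate in silico. The Python sketch below is illustrative only: the function name is hypothetical, and the template in the usage example is the 13-residue sequence ALWKTLLKKVLKA quoted elsewhere in this article rather than bactenecin 2A itself.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 proteinogenic residues


def single_substitution_library(template):
    """Map labels such as 'W3G' to every single-residue variant of template."""
    variants = {}
    for pos, original in enumerate(template):
        for aa in AMINO_ACIDS:
            if aa != original:
                label = f"{original}{pos + 1}{aa}"  # 1-based position
                variants[label] = template[:pos] + aa + template[pos + 1:]
    return variants


if __name__ == "__main__":
    library = single_substitution_library("ALWKTLLKKVLKA")
    print(len(library))       # 13 positions x 19 alternatives
    print(library["W3G"])
```

A template of length n yields 19n variants, which is why such scans are feasible on peptide arrays, whereas exhaustive double or triple substitution quickly becomes impractical.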

Possibly the most intuitive attempt to model AMPs based on natural peptide sequences was a linguistic model in which sequences in one-letter amino-acid codes were considered as 'text' and 'formal grammar rules' were applied to identify text patterns in naturally occurring peptides93. When novel peptides were constructed based on this linguistic model but selected to be dissimilar to natural AMPs, they were found to be superior to similar peptides with a shuffled amino-acid sequence, demonstrating that this method did indeed identify functionally relevant patterns or motifs. Out of the 40 peptides that were designed, four exhibited activity against Escherichia coli or Bacillus cereus at concentrations below 64 μg per ml; the most potent peptide designed through this approach, D28, had an MIC against Staphylococcus aureus of 4–8 μg per ml and MICs against E. coli of 64 μg per ml. Although these values fall far short of the most potent natural peptides and even the peptides used in the training sets, this study elegantly demonstrated how innovative computer-based modelling could support peptide design.

Several studies have chosen templates that are based on small amino acids or reduced numbers of amino acids. As early as 1992, AMPs consisting only of leucine and lysine residues were synthesized, which were active against both Gram-negative and Gram-positive bacteria94. Short AMPs consisting of only tryptophan, leucine and lysine residues were designed de novo, and the effects of tryptophan positioning and sequence length on antimicrobial activity were investigated95. This study also succeeded in designing peptide sequences that are capable of forming amphipathic helices, using only arginine and valine as prototype residues.

In another project, the generic template sequence acetyl-C-HBHB(P)HBH-GSG-HBHB(P)HBH-C-amide (where B corresponds to a cationic residue; H corresponds to a hydrophobic residue; and P corresponds to a polar residue) served as the starting point for the design of anti-endotoxic peptides using chemoinformatics methods96. Following a similar template approach, simplistic cationic amphiphilic peptides have been devised and shown to have membrane-lytic effects against Bacillus subtilis and C. albicans by scanning electron microscopy85. Notably, the replacement of very hydrophobic residues in the apolar face of the amphipathic helix with weakly hydrophobic residues decreased the lytic effect of the peptide on erythrocytes, demonstrating that excessive hydrophobicity in AMPs increases undesirable erythrocyte lysis97.

Recently, a template-based design approach has been presented, introducing the concept of specificity determinants to achieve membrane selectivity98. This design concept involves the use of positively charged amino-acid residues in the centre of the non-polar face of amphipathic α-helical AMPs to enhance the peptide's selectivity between eukaryotic and prokaryotic membranes. Starting from a known broad-spectrum AMP, the systematic alteration of residues led to reduced haemolytic activity and improved therapeutic indices against the targets Acinetobacter baumannii and P. aeruginosa. These results demonstrated that a reduction in haemolytic activity can be achieved using computer-assisted drug design without losing antimicrobial activity, and that both charge distribution and structural features have a delicate role in the balance between membrane recognition and the selectivity of AMPs.

In 2009, the design of an AMP that exhibits considerable selectivity for both P. aeruginosa and Streptococcus mutans was reported99. This effect was accomplished by fusing a nonspecific widespread AMP with a specifically targeted AMP. One domain of the chimeric peptide was proposed to harbour the killing function, whereas the other led to selectivity. Forming chimaeras from two active compounds is a popular concept in computer-assisted drug design, and is used by the software tools TOPAS100 and BREED101. However, this approach may not always be applicable to peptides, as it does not automatically take into account structural folding (or refolding) that might involve the two supposed domains.

Another important strategy for enhancing AMP activity is the addition of acyl moieties, as these can provide the hydrophobic domains that short peptides need to be active102,103,104. For example, the aminolauryl-acylated AMP sequence ALWKTLLKKVLKA exhibits improved bactericidal properties and is less prone to degradation by plasma proteases than the unmodified peptide42.

Biophysical studies. In contrast to template-based design methods in which peptides are treated as a 'text' formed by amino acid 'letters' bearing individual properties like charge and hydrophobicity values, biophysically motivated modelling studies aim to understand AMP activity and design improved variants by examining peptide structures in hydrophobic environments or by modelling peptides at the atomic level. Examples of these types of studies include molecular modelling based on free energy perturbation, molecular dynamics simulation as well as thermodynamic calculations of the interactions of peptides with membranes105. Such computationally demanding exercises have been successfully applied to drug design and optimization106,107, and are also increasingly being applied to peptide design. The AMPs bactenecin108 and indolicidin109 are examples of known peptides that have been used as templates for structure-guided design.

Conversely, porcine protegrin110 was investigated using molecular dynamics methods. Protegrin is thought to act primarily by causing membrane disruption as a result of pore formation (at a concentration of approximately 1 μg per ml), and it is amenable to computational modelling because of its β-hairpin structure111. Although the precise details of the mechanism of action of protegrin are largely unknown, the model of protegrin activity suggests that the following mechanisms are involved: electrostatic attraction to the anionic membrane; dimerization; insertion of protegrin into the membrane; and the formation of large aggregates that lead to transmembrane pores and a consequent lethal flux of ions from the cytoplasm (Fig. 1).

Molecular dynamics simulations may involve representing each atom of the peptide, surrounding solvent and portions of the membrane; the value of a model is inevitably limited by the complexity of natural systems as simplifications have to be made in computational approaches. For example, one can restrict the number of peptide atoms represented or replace the solvent by a continuous medium (known as an implicit solvent model). Interactions between atoms or coarser-grained sites are typically calculated based on Coulomb and van der Waals forces as well as bond interactions using methods derived from a consistent empirical force field. Simulations and experimental confirmation are often restricted to peptides interacting with micelles or lipid bilayers as surrogates for the bacterial membrane. Phenomena such as lipid bilayer thinning and conductance induction, as well as the biological effects of pores, have emerged from these studies, and potentially relevant hydrogen-bonding sites have been suggested112,113,114.
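The non-bonded part of such a force field calculation can be illustrated with a short Python sketch. It assumes a 12-6 Lennard-Jones term plus a Coulomb term, with geometric-mean well depths and summed half-radii as combining rules; the tuple layout, function names and the uniform dielectric constant (a very crude stand-in for an implicit solvent) are assumptions made for illustration. Bonded terms, exclusions of bonded neighbours and periodic boundaries are deliberately omitted.

```python
import math

# Coulomb constant in kcal * Angstrom / (mol * e^2)
COULOMB_CONSTANT = 332.0637


def lennard_jones(r, epsilon, r_min):
    """12-6 Lennard-Jones energy; the minimum is -epsilon at r = r_min."""
    ratio = (r_min / r) ** 6
    return epsilon * (ratio * ratio - 2.0 * ratio)


def coulomb(r, q1, q2, dielectric=1.0):
    """Electrostatic energy of two point charges (in units of e)."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)


def nonbonded_energy(atoms, dielectric=1.0, cutoff=12.0):
    """Sum pairwise energies in kcal/mol.

    atoms: list of (x, y, z, charge, epsilon, half_r_min) tuples with
    coordinates and radii in Angstrom (hypothetical layout).
    """
    total = 0.0
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            a, b = atoms[i], atoms[j]
            r = math.dist(a[:3], b[:3])
            if r > cutoff:
                continue  # simple truncation, no switching function
            eps = math.sqrt(a[4] * b[4])   # geometric mean of well depths
            r_min = a[5] + b[5]            # sum of half-radii
            total += lennard_jones(r, eps, r_min)
            total += coulomb(r, a[3], b[3], dielectric)
    return total
```

Production simulation packages evaluate the same kind of sum with neighbour lists and long-range electrostatics methods, because the naive double loop scales quadratically with the number of atoms.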

Even though molecular dynamics simulation of the AMP–membrane interaction can provide a working hypothesis, one has to bear in mind that with currently available technology the initial conditions for simulation (including the conformation of the pore) must be well defined, and practical simulation times might be too short to allow spontaneous pore formation to be observed (if indeed this occurs biologically). Simulation studies are often interpreted in terms of a disordered toroidal pore model, but they tend to demonstrate115 that only a very small number of peptides (as few as one or two) are oriented perpendicular to the membrane, causing substantial membrane perturbation, and these peptides do not tend to cluster (which is possibly more consistent with an aggregate model of an AMP–membrane interaction)116. Nevertheless, molecular dynamics simulation was successfully applied in the design of ovispirin117 and indolicidin118 analogues, and has led to the development of peptides with a twofold improvement in antimicrobial activity and a tenfold decrease in haemolytic activity.

Virtual screening studies. Virtual screening methods can be used when exhaustive synthesis and testing is prohibitively expensive and biology-assisted techniques such as phage display119 cannot be applied120. These approaches have the advantage of having only a few a priori assumptions, as they seek to impute peptide structures based on primary sequences. In contrast to computational simulation studies, virtual screening studies do not necessarily attempt to create models with immediately and easily interpretable outcomes. Instead, numerical methods are used to determine quantifiable properties of peptides (descriptors) — such as charge and hydrophobicity — from the primary structure and physicochemical characteristics of the peptides, and they are used to relate such properties to the biological activities of the peptides using SAR models. Virtual screening studies — more specifically, quantitative SAR (QSAR) models — apply numerical analyses to describe the relationship between these descriptors as input variables and biological activity as the output variable. Many papers have been published on the statistical analyses and machine-learning methodologies that may be used for this purpose121,122.
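As an illustration of how descriptors are derived from the primary structure alone, the following Python sketch computes three simple ones. The choices made here are assumptions: histidine is treated as neutral, the free termini are taken to cancel unless the C-terminus is amidated, and the Kyte-Doolittle scale is used for hydropathy. Published QSAR studies typically use far richer descriptor sets.

```python
# Kyte-Doolittle hydropathy values (assumed choice of scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Integer side-chain charges near neutral pH (histidine assumed neutral).
SIDE_CHAIN_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def descriptors(seq, amidated=False):
    """Return a small descriptor dictionary for a one-letter sequence."""
    net_charge = sum(SIDE_CHAIN_CHARGE.get(aa, 0) for aa in seq)
    if amidated:
        net_charge += 1  # the C-terminal carboxylate charge is removed
    return {
        "length": len(seq),
        "net_charge": net_charge,
        "mean_hydropathy": sum(KD[aa] for aa in seq) / len(seq),
    }


if __name__ == "__main__":
    # Penetratin, quoted in this article as RQIKIWFQNRRMKWKK-amide.
    print(descriptors("RQIKIWFQNRRMKWKK", amidated=True))
```

Each peptide then becomes one row of numerical inputs, and measured activities supply the output variable for whichever regression or machine-learning model is fitted.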

The most important aspect of computer-assisted AMP design (Fig. 3) is the accurate estimation or prediction of desired biological activity from the primary amino-acid sequence alone. Since the 1980s, computational QSAR models for peptides have been used as a guide for activity prediction and sequence optimization for several biological activities (Table 1). In the 1990s, machine-learning methods — specifically artificial neural networks123 — replaced the more traditional regression functions for peptide QSAR models. Currently, a common computer-based design approach involves combining a sophisticated activity estimator with a technique that enables stochastic optimization (that is, involving random iterative processes to overcome issues associated with experimental noise).

Figure 3: Computer-assisted molecular design cycle.

Peptide design starts from scratch (de novo) or from known peptides that have a desired activity (also termed 'seed peptides'). In an iterative process new peptides are generated in silico using alternating variation-selection operators. A 'fitness function', often a machine-learning model, guides the design towards regions in sequence space that contain residue sequences with a higher predicted biological activity.

Table 1 Selection of computer-based peptide design concepts and approaches

Evolutionary algorithms, particularly evolution strategies and genetic algorithms, in which the process of evolution is performed in silico through successive generations of mutations, deletions, sequence shuffling and so on, have also been used to search for peptides with improved activity124,125,126. Stochastic approaches are suited for peptide optimization in a vast search space, particularly for long sequences for which deterministic methods are prohibitive. Although some implementations of evolutionary algorithms suffer from certain well-known computational inefficiencies (for example, a dependency on parameter initialization, partially insufficient sampling and premature convergence)127, they have proven their applicability in many practical studies128,129; this can be attributed to their overall robustness to experimental noise and optimization involving many solutions that are locally optimal.
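The variation-selection cycle described above can be sketched as a small evolution strategy. This is only an illustration under stated assumptions, not the procedure used in the cited studies: `toy_fitness` is a hypothetical stand-in for a trained activity model, and the single-residue mutation operator, the population sizes and the "plus" selection scheme (parents compete with their offspring) are arbitrary choices made for this example.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids


def toy_fitness(peptide: str) -> float:
    """Hypothetical stand-in for a trained activity model.

    Rewards tryptophan, arginine and lysine content; a real design cycle
    would call a machine-learning QSAR model here instead.
    """
    return sum(peptide.count(res) for res in "WRK") / len(peptide)


def mutate(peptide: str, rng: random.Random) -> str:
    """Variation operator: replace one randomly chosen residue."""
    pos = rng.randrange(len(peptide))
    return peptide[:pos] + rng.choice(AMINO_ACIDS) + peptide[pos + 1:]


def evolve(seed, fitness, generations=50, parents=5, offspring=20, rng_seed=0):
    """Run a (mu + lambda) evolution strategy starting from a seed peptide."""
    rng = random.Random(rng_seed)
    population = [seed] * parents
    for _ in range(generations):
        # Variation: each child is a mutated copy of a random parent.
        children = [mutate(rng.choice(population), rng) for _ in range(offspring)]
        # Selection: keep the fittest distinct sequences among parents and
        # children; ties are broken alphabetically for reproducibility.
        pool = sorted(set(population + children), key=lambda p: (-fitness(p), p))
        population = pool[:parents]
    return population


if __name__ == "__main__":
    for peptide in evolve("AAAAAAAAA", toy_fitness):
        print(peptide, round(toy_fitness(peptide), 2))
```

Because parents are retained in the selection pool, the best predicted fitness can never decrease from one generation to the next; schemes that discard parents explore more widely but lose this guarantee.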

Notably, for each molecular design problem there is a best-suited combination of the size of a screening compound library (the number of peptides synthesized and tested at a time) and the number of iterative synthesize-and-test rounds required, with the aim of keeping experimental efforts minimal130. The overall concept is to utilize the results that are obtained by activity testing or prediction to influence the decision as to which peptides should be designed, generated and tested in the next cycle. Such a 'memory' concept is characteristic of adaptive systems and essential for successful navigation in a large sequence space131,132 (Box 1).

Molecular descriptors and QSAR models

In general, peptide modelling is guided by the same concepts as small-molecule drug design, particularly to account for the underlying pharmacophore and molecular shape features that are relevant for an observed or desirable activity133,134. Modelling peptides using molecular descriptors that account for these features is not a new approach. In 1987 a set of three descriptors (z1, z2 and z3) was proposed, based on a principal component analysis of 29 largely empirical properties such as molecular weight, pK (the logarithm of the dissociation constant), pI (the isoelectric point), nuclear magnetic resonance measurements and chromatographic indices135. Studies examining peptides based on lactoferricin successfully used the z-descriptors as well as other descriptor sets136,137,138,139. Other early studies of AMPs using chemoinformatics were based on protegrin140,141.

The most straightforward molecular representations in terms of computed descriptors provide a quantification of whole-molecule properties — such as partial charge, hydrophobicity and amphiphilicity — and are relatively intuitive. However, arrays of descriptors have been developed that lack such clear interpretations and intuitive understanding. These include descriptors that are calculated based on theoretical models, such as van der Waals surface area and hardness (akin to the energy required to remove the outermost electron), as well as properties that are experimentally measured, such as retention time in a given chromatography column, pI, octanol–water partition fraction or circular dichroism. Commercial software packages are available that offer hundreds of descriptors of the molecular nature of molecules, and are often customized specifically to the type of molecule (for example, they can be specific for small compounds versus peptides and proteins)142.
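As a rough illustration of such whole-molecule descriptors, the sketch below computes a net charge, a mean hydrophobicity and a helical hydrophobic moment (one common measure of amphiphilicity) from the primary sequence alone. The choices are assumptions made for this example and are not taken from the studies cited here: the Kyte-Doolittle hydropathy scale, a crude integer charge count at neutral pH (lysine and arginine +1, aspartate and glutamate -1, histidine and the termini ignored), and an Eisenberg-style moment with 100 degrees of rotation per residue, as for an ideal α-helix.

```python
import math

# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def net_charge(peptide: str) -> int:
    """Crude net charge: +1 per K/R, -1 per D/E (assumption: pH ~7)."""
    return sum((res in "KR") - (res in "DE") for res in peptide)


def mean_hydrophobicity(peptide: str) -> float:
    """Average Kyte-Doolittle hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[res] for res in peptide) / len(peptide)


def hydrophobic_moment(peptide: str, angle_deg: float = 100.0) -> float:
    """Per-residue hydrophobic moment for a helix with the given rotation."""
    angle = math.radians(angle_deg)
    x = sum(KYTE_DOOLITTLE[r] * math.cos(i * angle) for i, r in enumerate(peptide))
    y = sum(KYTE_DOOLITTLE[r] * math.sin(i * angle) for i, r in enumerate(peptide))
    return math.hypot(x, y) / len(peptide)


def descriptors(peptide: str) -> dict:
    """Bundle the three descriptors into one record per peptide."""
    return {
        "charge": net_charge(peptide),
        "hydrophobicity": mean_hydrophobicity(peptide),
        "moment": hydrophobic_moment(peptide),
    }


if __name__ == "__main__":
    print(descriptors("KRWWKWIRW"))
```

Descriptor vectors of this kind form the input variables of a QSAR model; the measured activity is the output variable.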

Nevertheless, to some extent the computational approaches pursued in peptide design have been uncoupled from those used in small-molecule drug design. This might be due to the comparably higher molecular weight, increased flexibility and abundance of repetitive pharmacophoric features (such as the amide backbone) of peptides. The choice of descriptors has often been made based on a prior understanding of the physical attributes that give rise to the activity of the peptide, but ideally descriptors can and should be automatically selected during numerical analysis, through a method termed 'feature selection'143,144.
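One of the simplest forms of feature selection is a univariate filter that ranks each descriptor by the strength of its correlation with measured activity. The sketch below assumes this filter approach purely for illustration; the cited work may use other schemes (for example wrapper or embedded methods), and the descriptor names and values in the usage example are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def select_features(table, activity, top_k):
    """Return the names of the top_k descriptors by absolute correlation.

    `table` maps a descriptor name to its values, one value per peptide,
    in the same order as the `activity` list.
    """
    ranked = sorted(table, key=lambda name: abs(pearson(table[name], activity)),
                    reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical descriptor table for four peptides.
    table = {
        "charge": [1, 2, 3, 4],
        "noise": [1, -1, 1, -1],
        "constant": [5, 5, 5, 5],
    }
    print(select_features(table, [2, 4, 6, 8], top_k=2))
```

A filter of this kind treats each descriptor independently, so it can miss descriptors that are informative only in combination.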

Regardless of the numerical method used, most algorithms require at least as many distinct peptides with measured activity as there are parameters to be fitted. For simple methods this is not an onerous requirement (for example, a linear model relating activity to a single descriptor has only two parameters). However, more recent studies on modelling AMP activity have led to models based on machine-learning methods that involve simultaneously fitting tens to hundreds of parameters depending on the configuration. Many descriptors are available for modelling AMPs, and a substantial number of modelling methods have been applied to find QSAR functions that predict AMP activity (reviewed in Ref. 145).
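The two-parameter case can be made concrete with an ordinary least-squares fit of activity against a single descriptor. The sketch below is a generic textbook fit, not a model from the cited literature, and the data in the usage example are hypothetical. It also shows why the data requirement exists: with fewer than two distinct descriptor values, the slope and intercept cannot both be determined.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Raises ValueError when the data cannot determine both parameters.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(set(x)) < 2:
        raise ValueError("need at least two distinct descriptor values "
                         "to fit two parameters")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Hypothetical descriptor values and activities for three peptides.
    print(fit_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]))
```

Models with tens to hundreds of parameters face the same constraint at a larger scale, which is why the size of the measured data set becomes the limiting factor.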

Complex prediction models using machine-learning methods often require a large number of measured values of AMP activity to fit the correspondingly large set of parameters. Solid-phase synthesis and high-throughput screening of large peptide arrays have become common practice in drug discovery146. For peptide lengths of up to 6 or 7 residues, full combinatorial arrays have been only marginally feasible in practice, as the sequence space contains 20ⁿ peptides for the 20 natural amino acids (if n = 6, 20⁶ = 6.4 × 10⁷; if n = 7, 20⁷ = 1.28 × 10⁹); owing to potential crosslinking, oxidation, poor aqueous solubility and synthetic issues, cysteine and methionine residues as well as very hydrophobic sequences are usually excluded from these arrays.
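The combinatorial growth quoted above is easy to verify. The short sketch below counts the sequences of a given length; the 18-letter alphabet in the second call is an assumption that models only the exclusion of cysteine and methionine, not the additional removal of very hydrophobic sequences.

```python
def sequence_space(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct peptides of the given length."""
    return alphabet_size ** length


if __name__ == "__main__":
    for n in (6, 7):
        full = sequence_space(n)
        reduced = sequence_space(n, alphabet_size=18)  # without Cys and Met
        print(f"n = {n}: {full:.2e} peptides (20 residues), "
              f"{reduced:.2e} (18 residues)")
```

Even with the reduced alphabet, each additional residue multiplies the library size by 18, so exhaustive arrays quickly become impractical and guided search is needed.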

Measurements of MIC, which must be made using low-throughput methods (overnight incubation with dilutions of AMPs), have restricted the size of data sets available for modelling AMP activity. Surrogate measures of bacterial killing, such as lipid vesicle experiments147 or the diminished energy-dependent luminescence of bacteria constitutively expressing luciferase90, have been used to develop faster assays for peptide activity. By replacing MIC values with such surrogate measures148, and combining these with high-throughput analysis and relatively inexpensive peptide array synthesis on cellulose sheets, a set of peptides with more than 1,400 distinct sequences was assayed for activity149,150. Initially, two iterative sets of randomized peptide sequences were synthesized and tested for antibacterial activity against P. aeruginosa. These peptide sequences were biased according to the amino-acid composition of the most active peptides, which were especially rich in lysine, arginine, tryptophan and hydrophobic amino acids.
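A composition-biased random library of the kind described above can be sketched as follows. The details are assumptions for illustration only: residue frequencies are estimated from a list of active peptides with a pseudocount so that no residue has zero probability, and positions are sampled independently. The cited study may have derived and applied its bias differently.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids


def composition_weights(active_peptides, pseudocount=1.0):
    """Estimate residue probabilities from the most active peptides."""
    counts = {aa: pseudocount for aa in AMINO_ACIDS}
    for peptide in active_peptides:
        for residue in peptide:
            counts[residue] += 1
    total = sum(counts.values())
    return {aa: count / total for aa, count in counts.items()}


def biased_library(weights, n_peptides, length=9, seed=0):
    """Sample random peptides, drawing each position from `weights`."""
    rng = random.Random(seed)
    residues = list(weights)
    probabilities = [weights[r] for r in residues]
    return ["".join(rng.choices(residues, weights=probabilities, k=length))
            for _ in range(n_peptides)]


if __name__ == "__main__":
    # Two nonapeptides used here only as example "actives".
    weights = composition_weights(["KRWWKWIRW", "KRWWKWWRR"])
    for peptide in biased_library(weights, n_peptides=5):
        print(peptide)
```

Repeating the estimate-and-sample step on each round's most active peptides gives the iterative enrichment described in the text.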

Based on these data, several artificial neural networks were trained to recognize potent peptides using the measured peptide activities and 44 calculated descriptors of the peptides based on the properties of the amino-acid sequences (rather than the sequence per se). To maximize the use of the data, a set of crossvalidated neural networks was used rather than a single neural network. A consensus of this set of neural networks was used to predict the ranked activities of nearly 100,000 virtual (that is, generated in silico) peptides. A total of 200 peptides, classified into four levels of activity (assessed by comparison with the bactenecin 2A control), were synthesized and used for validation. Of the novel peptides predicted to be highly active compared to the bactenecin 2A control, 94% were found to be highly active. Of the novel peptides predicted to have low activity, all were found to have low activity. Interestingly, when MIC values were measured against clinical pathogens with demonstrated drug resistance, many synthetic peptides had MIC values <10 μM and several had MIC values <1 μM, which is equivalent to the best peptides described in the literature, despite being significantly shorter (nine residues long) than any natural peptide (natural peptides can be >12 residues long, but are usually >18 residues long).
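The idea of ranking virtual peptides by the consensus of several models can be sketched as below. The aggregation rule (mean rank across models) is an assumption made for this example, since the text does not specify how the consensus was formed, and the two toy scoring functions are hypothetical stand-ins for trained neural networks.

```python
def consensus_rank(peptides, models):
    """Order peptides from most to least active by mean rank across models.

    `peptides` must contain unique sequences; each model is a callable that
    maps a peptide to a predicted activity score (higher = more active).
    """
    mean_rank = {peptide: 0.0 for peptide in peptides}
    for model in models:
        ordered = sorted(peptides, key=model, reverse=True)
        for rank, peptide in enumerate(ordered):
            mean_rank[peptide] += rank / len(models)
    return sorted(peptides, key=lambda peptide: mean_rank[peptide])


if __name__ == "__main__":
    # Hypothetical stand-ins for models trained on different data splits.
    toy_models = [
        lambda p: p.count("W"),
        lambda p: p.count("K") + p.count("R"),
    ]
    print(consensus_rank(["AAAAAAAAA", "KRWWKWIRW", "WVKVWKYTW"], toy_models))
```

Averaging ranks rather than raw scores keeps any one model with an unusual output scale from dominating the consensus.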

It is also notable that peptides with similar overall properties, such as hydrophobicity or charge, can have dramatically different levels of activity. For example, the peptides KRWWKWIRW and KRWWKWWRR demonstrated the highest antibacterial activity, whereas peptides with a very similar residue composition — for example, WHGVRWWKW, WVKVWKYTW, WVRFVYRYW and AIRRWRIRK — were ranked in the least active 30% of peptides and found to be virtually inactive. These findings suggest that there are common physicochemical features shared by AMPs, presumably in the context of their three-dimensional structures. Some of the features found in the most active AMP sequences are consecutive pairs of tryptophan residues and interspersed arginine and/or lysine residues; these residues might aid in the general interfacial affinity of these AMPs for lipid membranes or in their translocation across lipid membranes (Fig. 4).

Figure 4: Artificial fitness landscape spanned by synthetic peptides.

There is a visible separation between only moderately active peptides (red) and potent antimicrobial peptides (AMPs) (blue). Selected peptide sequences are shown together with their relative IC50 (half-maximal inhibitory concentration) values indicating their antibacterial activity against Pseudomonas aeruginosa in a reporter gene assay151. Many active peptides are strongly enriched in tryptophan and arginine residues. For computation and visualization of the structure–activity landscape, each nonapeptide was represented by 9 × 19 properties (principal component scores computed from a set of 434 physicochemical amino-acid properties)123,198 and projected to two new coordinates using stochastic proximity embedding199. The continuous landscape was generated using the software tool LiSARD200.

Evolutionary search methods for peptide sequences allow for a guided search in the sequence space, in which peptide sequences are varied to achieve improvements in a 'fitness landscape' — an analogy for visualizing how good (or well matched) numerical solutions are in the space of possible parameter settings (Box 1). This concept of peptide design was first applied to signal peptides88,92. Recently, the methodology has been adapted to generate potent synthetic AMPs. Using trained neural network models as an estimate of peptide 'fitness', a genetic algorithm was used to find AMPs with greater efficiency in a preliminary computational screening151.

Owing to the limited number of studies that have used machine-learning models for peptide design, we feel that an assessment of their relative performance and practical applicability would be premature. Nevertheless, pioneering concept studies have already demonstrated that novel, short AMP sequences with substantial biological activity can be obtained using adaptive computer-based design. It is safe to say that machine-learning applications hold substantial promise for peptide design in general, and not just for finding novel membrane-interacting peptides like AMPs.

AMPs as drugs: challenges and solutions

Since 2000, 20 new antibiotics have been launched and approximately 40 compounds are currently in active clinical development152. Several synthetic AMPs have entered clinical trials and at least 15 peptides or mimetics are in (or have completed) clinical trials as antimicrobial or immunomodulatory agents (Table 2). In a Phase I/II trial, the AMP hLF1–11 (which is composed of amino-terminal amino acids 1–11 of human lactoferrin) was found to be safe and well tolerated when delivered intravenously153. By contrast, a protegrin derivative, iseganan, failed to demonstrate significant efficacy in the clinic30,154.

Table 2 Selected host defence peptides in drug development

Two peptides have demonstrated efficacy in Phase III clinical trials but have not yet been approved. Pexiganan155, a derivative of magainin, showed equivalence to an oral fluoroquinolone for foot ulcer infections in patients with diabetes but was deemed non-approvable by the US Food and Drug Administration, although there is evidence it might resurface in clinical trials. Omiganan (MBI-226), an analogue of indolicidin, has been proven to be capable of significantly reducing catheter colonization and microbiologically confirmed tunnel infections during catheterization (ClinicalTrials.gov identifier: NCT00608959), and in Phase II trials it exhibited anti-inflammatory activity against the non-infectious skin condition rosacea. Both of these peptides are first-generation peptides that were devised by template-based design. Overall, the peptides that are currently in the clinic offer fascinating alternatives to standard therapies and indicate that synthetic peptides are an active and promising area of research. AMP-coated devices represent another promising application, although the reduction in antimicrobial activity by the tethering of the peptide to solid supports must be overcome156. SAR studies have demonstrated that tethered peptides are nearly 100-fold less active (on a molecular weight basis) than their soluble counterparts157.

Despite several attempts to develop AMPs as antibiotics, synthetic AMPs have not progressed more successfully through the clinic; the reasons include the cost of goods, their lability to proteolytic degradation, and their unknown toxicology profile when administered systemically2. Each of these factors can be addressed by the peptide design approaches described above in combination with advanced chemoinformatics tools158,159. For example, the cost of goods can be addressed by making smaller peptides, and machine-learning approaches have already delivered highly active, broad-spectrum peptides that work systemically in animals. The lability to degradation by proteases in the body can be addressed using D-amino acids, non-natural amino-acid analogues, mimetics with different backbone structures or appropriate formulations160,161.

The toxicology of AMPs can typically be addressed by making a plethora of highly active sequences and testing these for lack of toxicity in animals and/or using formulations that mask the peptides — for example, liposomal formulations2,162. Although it has become common to investigate the haemolytic toxicity of AMPs163, it is evident that reliable computational toxicology prediction will be necessary to improve design algorithms that explicitly consider crucial preclinical toxicological end points for AMPs. Taking into consideration the multifactorial nature of toxicology and current lack of large sets of published standardized toxicology data for AMPs, some machine-learning methods and alerting tools have been devised that seem to be suited for this task164,165,166. Multidimensional design techniques have been devised and were first applied to combinatorial optimization and drug discovery167,168,169,170,171,172. Computer-assisted peptide design and peptide-mimetics design coupled with in silico pharmacology will undoubtedly benefit from these methodological advances173,174.

A recent and extensive review of the field of peptide mimetics provides an overview of various peptide-to-drug design approaches175. In many cases these design principles are analogous to those described above and will benefit from prior experiences in this arena with natural peptides176,177. For example, the reported effects of secondary-structure disruption or modification of D-amino-acid replacements in AMPs suggest that secondary-structure preference and biological activity are not directly coupled178. Furthermore, methylation has been shown to fine-tune the haemolytic activity of a cecropin A–melittin-derived helical AMP (KWKLFKKIGAVLKVL-amide)179 without significantly affecting the secondary structure of the AMP. These studies also suggest that helix formation in at least part of this chimeric AMP, together with ionic interactions with the bacterial membrane, is mandatory for direct antimicrobial activity.

There are apparent similarities in amino-acid composition (with the exception of positively charged residues) between aggregation-prone regions of proteins and AMPs, and it has recently been shown that amyloid-forming peptides can be turned into membrane-disrupting AMPs by placing cationic amino acids at selected residue positions so that the mutated peptides possess the ability to adopt amphiphilic structures180. Such experiments provide a rationale for linking molecular structure with direct antimicrobial activity in peptides that have been designed de novo. The apparent preferences of AMPs for certain membranes could be triggered by differential membrane lipid compositions181, and methodological advances in structure determination will aid future investigations of the dynamic behaviour of AMP structure at the membrane–solvent interface; this was recently demonstrated using deep-ultraviolet resonance Raman spectroscopy of a model helical peptide embedded in a membrane-mimetic environment182.

Given these considerations, how likely is it that AMP-like compounds will succeed in delivering their therapeutic potential? There are clear precedents for cationic peptides having clinical efficacy183; the cationic lipopeptide polymyxin is the last-resort drug for treating multiresistant Pseudomonas spp. and Acinetobacter spp. infections, the cyclic cationic peptide gramicidin S is widely used in topical ointments and eye drops, and the cationic lantibiotic nisin is an approved food additive in Europe184. Thus, in our opinion, the increasing availability and use of innovative computer-assisted design strategies has considerable potential to boost the discovery of next-generation therapeutic peptides and peptide mimetics as anti-infectives not only for targeting bacteria that have become resistant to existing antibiotics but also for targeting disease-causing protozoa, helminths, insects and fungi185.