Introduction

One of the most profound open questions in biology is how the genetic code was established. While proteins are encoded by nucleic acid blueprints, decoding this information in turn requires proteins. The emergence of this self-referencing system poses a chicken-or-egg dilemma and its origin is still heavily debated1,2. Aminoacyl-tRNA synthetases (aaRSs) implement the correct assignment of amino acids to their codons and are thus inherently connected to the emergence of genetic coding. These enzymes link tRNA molecules with their amino acid cargo and are consequently vital for protein biosynthesis. Besides the correct recognition of tRNA features3, highly specific non-covalent interactions in the binding sites of aaRSs are required to correctly detect the designated amino acid4,5,6,7 and to prevent errors in biosynthesis5,8. The minimization of such errors represents the utmost barrier for the development of biological complexity9 and accurate specification of aaRS binding sites is proposed to be one of the major determinants for the closure of the genetic code10. Besides binding site features, recognition fidelity is controlled by the ratio of concentrations of aaRSs and cognate tRNA molecules11 and may involve spatial secondary structure motifs in addition to side chain configurations12,13.

Evolution

The evolutionary origin of aaRSs is hard to track. Phylogenetic analyses of aaRS sequences show that they do not follow the standard model of life14; the development of aaRSs was nearly complete before the Last Universal Common Ancestor (LUCA)15,16. Their complex evolutionary history included horizontal gene transfer, fusion, duplication, and recombination events14,17,18,19,20,21. Sequence analyses22 and subsequent structure investigations23,24 revealed that aaRSs can be divided into two distinct classes (Class I and Class II) that share no similarities at sequence or structure level. Each of the classes is responsible for 10 of the 20 proteinogenic amino acids and can be further grouped into subclasses15. One exception to this class separation rule is lysyl-tRNA synthetase (LysRS), where euryarchaeal genomes were shown to contain a Class I form25 instead of the standard Class II form. Most eukaryotic genomes contain the complete set of 20 aaRSs. However, some species lack certain aaRS-encoding genes and compensate for this by post-modifications7,26,27,28 or alternative pathways29,30,31. A scenario where Class I and Class II originated simultaneously from opposite strands of the same gene32,33 is among the most popular explanations for the origin of aaRSs. This so-called Rodin-Ohno hypothesis (named after Sergei N. Rodin and Susumu Ohno32) is supported by experimental deconstructions of both aaRS classes34,35,36. At the dawn of life, this concurrent duality could have allowed the implementation of an initial binary choice, which is the minimal requirement to establish any code9.

Origin of genetic coding

Several theories exist (for a summary see reference2) that aim to explain the origin of the genetic code and its self-translating machinery. The theory of co-evolution37 states that the appearance of amino acids via new biochemical synthesis pathways was strongly coupled with their integration into the genetic code. Thus, the co-evolution theory takes the age of amino acids (defined by the complexity of their biochemical pathways38) into account. Another theory, ambiguity reduction of physicochemical properties39,40, considers the major selective pressure for genetic code emergence to be the minimization of deleterious effects of mutations. According to this theory, codons that differ only in a single base encode amino acids with comparable physicochemical properties to mitigate the effect of translation errors. The role of stereochemical forces in genetic code formation is supported by numerous studies2. Here, the role of primordial amino acid binding structures is seen as a major determinant of which amino acids were embedded in the genetic code. This theory is of special interest in the light of our study, as amino acid binding sites of aaRSs might have composed such an ancient recognition structure.

Biochemical function

In order to fulfill their biological function, aaRSs are required to catalyze two distinct reaction steps. Prior to its covalent attachment to the 3’ end of the tRNA molecule, the designated amino acid is activated with adenosine triphosphate (ATP) and an aminoacyl-adenylate intermediate is formed41,42. In general, the binding sites of aaRSs can be divided into two moieties: the part where ATP is bound and the part where specific interactions with the amino acid ligand are established (Fig. 1). It is assumed that the amino acid activation with ATP constituted the principal kinetic barrier for the creation of peptides in the prebiotic context36. Due to the fundamental importance of this first reaction step, highly conserved sequence4 and structural motifs43 exist, which are likely to be vital for the aminoacylation reaction. These structural motifs were detected in our previous study43, reinforcing the Class I and Class II separation of aaRSs. In structures of Class I aaRSs, ATP is bound via backbone hydrogen bonds. This motif, termed Backbone Brackets, undergoes structural rearrangement upon ATP binding and is only revealed at functional interaction level. Class II aaRSs ensure ATP binding with a pair of arginine residues, forming salt bridges towards the ATP molecule. These Arginine Tweezers are observable at sequence as well as structure level. While the activation of amino acids with ATP is the basic requirement of all aaRSs and is consistent within each aaRS class43, the recognition mechanism of individual amino acids differs substantially between each aaRS. These differences are among the key drivers to maintain a low error rate during the translational process.

Figure 1

The aaRS·tRNA complex (PDB:1f7u) and the architecture of its active site. The enzyme catalyzes the covalent attachment of an amino acid to the 3’ end of a tRNA molecule. The binding site itself can be divided into two moieties. While the ATP moiety is responsible for the fixation of ATP, which is consistent within each aaRS class43, the specificity-conferring moiety differs between each aaRS and forms highly specific non-covalent interactions with the amino acid ligand. Depending on the reaction state (pre- or post-activation), the ATP moiety contains ATP or an adenylate group.

Non-covalent binding site interactions

Non-covalent protein-ligand interactions play an important role in the specific binding of any ligand. These interactions are generally reversible and correspond to an energy of binding between −80 kJ\(\cdot \)mol\(^{-1}\) and −10 kJ\(\cdot \)mol\(^{-1}\), which is weaker than covalent interactions44. Several types of non-covalent interactions exist that can add energetic contribution to the binding of a protein-ligand complex. Each type is constrained regarding interaction partners and geometry. Generally, directed hydrogen bonds are considered to be the strongest non-covalent interaction, followed by \(\pi \)-cation and \(\pi \)-stacking interactions, electrostatic (or salt bridge) interactions, and hydrophobic interactions45. Based on experimentally determined three-dimensional structures of protein-ligand complexes, non-covalent interactions can be studied computationally. However, this requires a detailed annotation of non-covalent interaction patterns. In this study, we use the rule-based Protein-Ligand Interaction Profiler (PLIP)46 to characterize the amino acid binding in aaRSs.

Motivation

In our previous study we identified two unique ATP binding motifs in Class I and Class II aaRSs43, which are to date the minimal description of the two classes. Hence, a detailed study of the amino acid binding site is the logical next step to extend the picture of ligand binding in aaRSs. Protein structures of aaRSs from all kingdoms of life, co-crystallized with their amino acid ligands, are publicly available in the Protein Data Bank (PDB)47. Furthermore, there are tools such as PLIP46 to characterize and map the interactions of proteins and their ligands. These rich data allow for the investigation of specific characteristics of amino acid recognition in individual aaRSs. The overall aim is to contribute to the understanding of how aaRSs realize the correct mapping of the genetic code and to provide a compendium of binding site interactions relevant to maintain amino acid specificity. The results shed light on how evolution implemented a specific recognition via the amino acid composition of the binding site, non-covalent interaction patterns, pre- or post-transfer correction mechanisms, and steric effects such as the volume of the binding cavity. Moreover, the overall recognition strategies for Class I and Class II aaRSs differ, suggesting that the existence of the classes allowed the enzymes to cover a broader ligand diversity and thus the gradual incorporation of new amino acids into the genetic code.

Results

Dataset

Based on all available structures in the PDB, 424 (189 Class I, 235 Class II) three-dimensional structures of aaRSs co-crystallized with their corresponding amino acid ligands were analyzed. The selected data covers aaRSs of 56 different species in total, 180 from eukaryotes, 213 from bacteria, and 31 from archaea (SI Appendix Fig. S1). In total, 70 human structures are part of the dataset. Each protein chain that contains a protein-ligand complex of a catalytic aaRS domain was considered. Data was available for each of the 20 aaRSs, plus the non-standard aaRSs pyrrolysyl-tRNA synthetase (PylRS) and phosphoseryl-tRNA synthetase (SepRS). Unfortunately, Class I LysRS could not be considered for analysis. The single structure of this enzyme from Pyrococcus horikoshii (PDB-ID: 1irx), which is part of the dataset, does not contain any co-crystallized amino acid ligand. The numbers of protein-ligand complexes available for each aaRS are given in SI Appendix Fig. S2. For twelve aaRSs, protein-ligand complexes were available in both pre-activation and post-activation reaction states, i.e. co-crystallized with either amino acid or aminoacyl ligand (SI Appendix Fig. S3). Out of all analyzed structures, 240 are in pre-activation and 184 in post-activation state. Out of the post-activation complexes, 72 are adenosine monophosphate (AMP) esters and 112 are non-hydrolysable analogs, mainly sulfamoyl derivatives.

Interaction features

The frequencies of observed non-covalent binding site interactions with respect to the aaRS class and the type of interaction are shown in Table 1. In general, hydrophobic interactions are the most prevalent interactions for Class I aaRSs with a frequency of 44.60% with respect to the total number of interactions, while hydrogen bonds are most frequently observed in Class II aaRSs with 59.23% frequency. Five interaction types (hydrogen bonds, hydrophobic interactions, salt bridges, \(\pi \)-stacking, and metal complexes) were observed in aaRSs. No \(\pi \)-cation interactions were observed to be involved in amino acid binding. Water bridges were excluded from the interaction analysis. Some aaRS structures deposited in the PDB are resolved including water, but other structures do not contain water molecules. In these cases, no water bridges can be detected using PLIP, despite them existing in vivo, which would lead to an experimental bias. Nonetheless, water molecules are known to mediate important interactions for ligand recognition48 and their role should not be underestimated.

Table 1 Overview of observed interactions between aaRSs and their amino acid ligands.

Amino acid recognition

The annotation of non-covalent protein-ligand interactions allowed us to characterize interaction preferences of each aaRS at the level of individual atoms of their amino acid ligands. This analysis highlights the preferred modes of binding for each of the 22 amino acid ligands. Figure 2 shows the occurring interactions for each aaRS based on the analysis with PLIP. Each interaction is annotated with its occupancy, i.e. the relative frequency of occurrence with respect to the total number of structures for this aaRS. Binding site features are neglected at this point and all interactions are shown with respect to the amino acid ligand.
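The occupancy defined above can be illustrated with a minimal sketch. It assumes that each structure has already been reduced to a set of (ligand atom, interaction type) pairs; the function name, the data layout, and the toy input are hypothetical and not taken from the study. The cutoff of 0.1 mirrors the threshold stated in the caption of Fig. 2.

```python
from collections import Counter

def interaction_occupancy(structures, min_occupancy=0.0):
    """Relative frequency of each (ligand atom, interaction type) pair.

    `structures` is a list with one set of pairs per structure
    (hypothetical layout chosen for this illustration).
    """
    if not structures:
        raise ValueError("at least one structure is required")
    counts = Counter()
    for interactions in structures:
        # Count every interaction at most once per structure.
        counts.update(set(interactions))
    n = len(structures)
    occupancy = {key: count / n for key, count in counts.items()}
    # Drop interactions below the chosen occupancy cutoff.
    return {k: v for k, v in occupancy.items() if v >= min_occupancy}

# Toy data for four hypothetical structures of one aaRS.
toy_structures = [
    {("N", "hydrogen bond"), ("CB", "hydrophobic")},
    {("N", "hydrogen bond")},
    {("N", "hydrogen bond"), ("O", "salt bridge")},
    {("CB", "hydrophobic")},
]
occupancy = interaction_occupancy(toy_structures)
filtered = interaction_occupancy(toy_structures, min_occupancy=0.3)
```

In this toy case the hydrogen bond with the amine nitrogen occurs in three of four structures, so its occupancy is 0.75, while the salt bridge falls below a cutoff of 0.3 and is dropped from `filtered`.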

Figure 2

The recognition of individual amino acids by aaRSs mapped to their ligands. The ligands are grouped by physicochemical properties49 and aaRS class. Different types of non-covalent protein-ligand interactions were determined with PLIP46 and assigned to individual atoms of the ligand using subgraph isomorphism detection50. Backbone atoms of the ligand are depicted as circles without filled interior. The relative occupancy of each interaction in respect of the total number of investigated structures (number in parentheses for each aaRS) is given by pie charts. Interactions with an occupancy below 0.1 are neglected. Interactions for which a unique mapping to an individual atom is not possible due to ambiguous isomorphism, e.g. for the side chain of valine, were assigned to multiple atoms. \(\pi \)-stacking interactions are shown in dark green and refer to all atoms of the aromatic ring structures in TyrRSs, TrpRSs and PheRSs. Some aaRSs prevent the mischarging of their tRNAs via error correction mechanisms (“editing”)51. The aaRSs conducting error correction are typeset in bold.

Class I

In general, Class I aaRSs interact mainly via hydrogen bonds and hydrophobic interactions with the ligand. The backbone atoms of all Class I ligands feature hydrogen bonding with the primary amine group. The occupancy of this interaction is high throughout all Class I aaRSs, indicating a pivotal role of this interaction for ligand fixation. Additionally, the oxygen atom of the ligand’s carboxyl group is involved in hydrogen bonding except for glutaminyl-tRNA synthetase (GlnRS), isoleucyl-tRNA synthetase (IleRS), and valyl-tRNA synthetase (ValRS). The same atom forms additional salt bridges in leucyl-tRNA synthetase (LeuRS), arginyl-tRNA synthetase (ArgRS), methionyl-tRNA synthetase (MetRS), and glutamyl-tRNA synthetase (GluRS). The side chains of the aliphatic amino acids leucine, isoleucine, and valine are exclusively bound via hydrophobic interactions. ArgRS and GluRS form salt bridges between binding site residues and the charged guanidine and carboxyl groups of the ligand, respectively. Glutamine is bound by GlnRS via conserved hydrogen bonds to the amide group and hydrophobic interactions with beta and delta carbon atoms. The two aromatic amino acids tyrosine and tryptophan are recognized by \(\pi \)-stacking interactions and extensive hydrophobic contact networks. Tryptophan is bound preferably from one side of its indole group at positions one, six, and seven. The sulfur atom of the cysteinyl-tRNA synthetase (CysRS) ligand forms a metal complex with a zinc ion in both structures. MetRSs bind their ligand with a highly conserved hydrophobic interaction with the beta carbon atom.

Class II

Class II aaRSs consistently interact with the backbone atoms of the ligand via hydrogen bonds and salt bridges. The primary amine group forms hydrogen bonds with high occupancy and is involved in metal complex formation in threonyl-tRNA synthetases (ThrRSs) and seryl-tRNA synthetases (SerRSs). The carboxyl oxygen atoms of the ligands are bound by a combination of hydrogen bonding and electrostatic salt bridge interactions. The overall backbone interaction pattern is highly conserved within Class II aaRSs. Closer investigation revealed that a previously described structural motif of two arginine residues43, responsible for ATP fixation, seems to be involved in stabilizing the amino acid carboxyl group with its N-terminal arginine residue. The charged amino acid ligands in histidyl-tRNA synthetase (HisRS) and LysRS form highly conserved hydrogen bonds with the binding site residues. Other specificity-conferring interactions include \(\pi \)-stacking interactions and hydrophobic contacts observed for phenylalanyl-tRNA synthetase (PheRS), metal complex formation for ThrRS and SerRS with zinc, and salt bridges as well as hydrogen bonds for aspartyl-tRNA synthetase (AspRS). The amino acids alanine and proline are bound by alanyl-tRNA synthetases (AlaRSs) and prolyl-tRNA synthetases (ProRSs) via hydrophobic interactions. No specificity-conferring interactions can be described for the smallest amino acid glycine due to the absence of a side chain. Hence, glycyl-tRNA synthetase (GlyRS) can only form interactions with the backbone atoms of the ligand. Furthermore, asparaginyl-tRNA synthetases (AsnRSs) mediate highly conserved hydrogen bonds with the amide group of their asparagine ligand. The non-standard amino acid pyrrolysine is bound by PylRS via several hydrogen bonds and hydrophobic interactions with the pyrroline group. SepRSs employ mainly salt bridge interactions to fixate the phosphate group of the phosphoserine ligand.

Conserved Interaction Patterns

Class I aaRSs show a strong conservation of hydrogen bonds with the primary amine group of the amino acid ligand with 83.16% of all structures forming this interaction. Interactions with the carboxyl group are less conserved with a frequency of 32.65% for hydrogen bonds and 28.57% for salt bridges, respectively. In this context, the salt bridges with the carboxyl group are a form of extra strong hydrogen bonding52. Interaction patterns with the backbone atoms of the amino acid ligand are strikingly consistent within Class II aaRSs. This class forms hydrogen bonds with the primary amine group in 92.15% of all structures. Additionally, hydrogen bonds with the oxygen atom of the carboxyl group occur in 65.70% of all structures and salt bridges with the same atom are formed in 39.26% of all Class II protein-ligand complexes.

Similar recognition requires editing mechanisms

Various aaRSs are known to conduct pre- or post-transfer editing (see the work of Perona and Gruic-Sovulj51 for a detailed discussion of editing mechanisms) in order to ensure proper mapping of amino acids to their cognate tRNAs. The similarity of interaction preferences depicted in Fig. 2 suggests that groups of very similar amino acids require editing mechanisms for their correct handling. In particular, the three aliphatic amino acids isoleucine, leucine, and valine are bound via unspecific and weak hydrophobic interactions, substantiating the necessity of editing mechanisms observed for their aaRSs53 and indicating that substrate hydrophobicity cannot entirely account for specificity54. Distinction between those three similar amino acids is proposed to happen via the “double sieve”52 mechanism. In IleRS, for example, amino acids larger than isoleucine are excluded with the “first sieve” at the aminoacylation site, whereas smaller amino acids (like valine and leucine) are sorted out by the editing domain, functioning as a finer “sieve”. Specificity can therefore be accomplished by steric selection based on side chain length and shape at the editing site53. A similar trend can be observed, e.g., for AlaRS55 in order to distinguish alanine from serine or glycine.

Binding site geometry and cavity volume

We investigated binding site geometry and cavity volume in order to quantify their potential contribution to amino acid recognition. Known editing mechanisms in aaRSs are focused on the prevention or correction of tRNA mischarging within one aaRS class (intra-class), e.g. the amino acids isoleucine, leucine, and valine belong to Class I. However, GluRSs and AspRSs have a highly similar interaction pattern of hydrogen bonds and salt bridges with the carboxyl group and weak hydrophobic interactions. Neither aaRS uses editing, and the two enzymes belong to different aaRS classes. In this case, the geometry and size of the binding site can act as an additional layer of selectivity; a mechanism also exploited by ValRS53,56. To quantify the contribution of binding site geometry, seven structures of GluRS and six structures of AspRS were superimposed with respect to their common adenine substructure using the Fit3D57 software. As this superimposition can solely be computed for protein-ligand complexes which resemble the post-reaction state, only a subset of the structures was used. The results show that the ligands of GluRSs and AspRSs are oriented towards different sides of a plane defined by their common adenine substructure (Fig. 3A). There is a significant difference (Mann-Whitney U p<0.01) in ligand orientation, described by the torsion angle between phosphate and the amino acid substructure of the ligand (Fig. 3B). Class I GluRSs feature a torsion angle of 54.64 ± 7.12\(^{\circ }\), whereas the torsion angle of Class II AspRSs is −65.02 ± 7.40\(^{\circ }\). Furthermore, the volume of the specificity-conferring moiety of the binding site (see Fig. 1) was estimated with the POVME58 algorithm. It differs significantly (Mann-Whitney U p<0.01) between GluRS (147.00 ± 22.31 Å\(^3\)) and AspRS (73.34 ± 17.12 Å\(^3\)). This trend can be observed for all Class I and Class II structures, respectively. An analysis of all representative structures for Class I and Class II aaRSs shows that Class I binding sites are significantly (Mann-Whitney U p<0.01) larger on average (Fig. 3C). While Class I binding cavities have a mean volume of 143.40 ± 39.62 Å\(^3\), Class II binding sites are on average 90.36 ± 32.09 Å\(^3\) in volume.
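As an illustration of the two quantities used in this comparison, the following sketch computes a torsion angle from four points and the Mann-Whitney U statistic for two samples. The choice of the four atoms that define the torsion angle in the ligand is not specified here, so the coordinates and sample values below are hypothetical toy inputs, and the sign of the angle depends on the order in which the atoms are passed. Only the U statistic is computed; in practice a library routine such as `scipy.stats.mannwhitneyu` would typically be used to obtain the p-value.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_angle(p0, p1, p2, p3):
    """Signed torsion angle in degrees around the p1-p2 axis."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def mann_whitney_u(sample_a, sample_b):
    """U statistics of both samples; ties count as one half."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    return u_a, len(sample_a) * len(sample_b) - u_a

# Hypothetical toy inputs, not values from the dataset.
angle = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
group_a = [50.0, 55.0, 60.0]
group_b = [-70.0, -65.0, -60.0]
u_a, u_b = mann_whitney_u(group_a, group_b)
```

For two completely separated samples, as in the toy data, one U statistic equals the product of the sample sizes and the other is zero, which is the most extreme outcome the test can produce.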

Figure 3

Binding geometry and binding cavity volume analysis. (A) Binding geometry of GluRSs and AspRSs. Aminoacyl ligands of Class I GluRSs and Class II AspRSs in post-activation state aligned with Fit3D57 with respect to their adenine substructure. The midpoints of non-covalent interactions46 with binding site residues are depicted as small spheres. Blue is hydrogen bond, yellow is salt bridge, and gray is hydrophobic interaction. (B) Distribution of torsion angles between the phosphate and amino acid substructure of the ligand. The orientation of the ligand in the binding site differs significantly (Mann–Whitney U \(p<0.01\)) between GluRSs and AspRSs. (C) The volume of the specificity-conferring moiety of the binding site, estimated with the POVME algorithm58, differs significantly between Class I and Class II aaRSs (Mann-Whitney U \(p<0.01\)).

Interaction patterns of individual aaRSs

In addition to the investigation of interaction preferences from the ligand point-of-view, the binding sites of each aaRS were analyzed regarding the residues that form interactions with the amino acid ligand. Because each aaRS is backed by multiple proteins from diverse organisms with considerably divergent sequences, we devised a computational abstraction to allow the reader to infer amino acids of individual proteins via structure-driven multiple sequence alignments (MSAs) (see "Methods" section). Original sequence numbers for each position can be inferred with the mapping tables published along with this manuscript (see Data Availability). Each row in the table corresponds to the artificial sequence position, whereas each column gives the original position for each structure in our dataset as defined by the PDB. Figure 4A shows a sequence logo59 representation of binding site interactions for AlaRS. Each colored position in the sequence logo represents interactions occurring at this position. Highly conserved interactions can be observed at renumbered position 135. The corresponding hydrogen bond and salt bridge interactions are formed with the backbone atoms of the ligand. On the protein side, this interaction is mediated by a conserved arginine residue that corresponds to the N-terminal residue of the previously described Arginine Tweezers motif43. Another prominent interaction is formed by valine at renumbered position 293. This residue interacts with the beta carbon atom of the alanine ligand via hydrophobic interactions. In some structures, this hydrophobic interaction is complemented by an alanine residue at renumbered position 325. Aspartic acid at renumbered position 323 is highly conserved in AlaRSs and seems to be involved in amino acid fixation via hydrogen bonding of the primary amine group. Overall, the specificity-conferring interactions with the small side chain of alanine are hydrophobic contacts. An example of amino acid recognition in AlaRSs is given in Fig. 4B. The structure of bacterial Escherichia coli AlaRS forms the whole array of observed interactions. Sequence logos of the remaining aaRSs are given in SI Appendix Figs. S4-S24. Based on the interactions between binding site residues and the ligand, a qualitative summary of specificity-conferring mechanisms and key residues was composed (Table 2). Moreover, the ligand size and count of observed interactions were checked for dependence. There is a weak but significant positive correlation between the average number of interacting binding site residues for each aaRS and the number of all non-hydrogen atoms of the amino acid ligand (Pearson r=0.32, p<0.01). This indicates that the number of formed interactions generally increases with ligand size. However, smaller amino acids do not necessarily have a less complex recognition pattern. ThrRSs, for example, bind their amino acid ligand with on average more than a dozen binding site residues, while ValRSs employ on average five binding site residues. The hydroxyl group of threonine allows for an extended range of non-covalent interactions to be formed with binding site residues compared to valine, where only hydrophobic contacts can be established. Distributions of interacting binding site residues for each aaRS are given in SI Appendix Fig. S25.
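The correlation reported above relates ligand size to the number of interacting residues. A minimal sketch of the Pearson coefficient is given below; the heavy-atom counts are those of the free amino acids glycine, alanine, valine, threonine, and tryptophan, whereas the residue counts are hypothetical toy values chosen for illustration and are not the data of this study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Non-hydrogen atoms of Gly, Ala, Val, Thr, Trp (free amino acids).
heavy_atoms = [5, 6, 8, 8, 15]
# Hypothetical average numbers of interacting residues (toy values).
interacting_residues = [4.0, 5.0, 5.0, 13.0, 11.0]
r = pearson_r(heavy_atoms, interacting_residues)
```

The toy values place a small ligand with many interacting residues next to an equally sized ligand with few, which is the kind of deviation that keeps such a correlation weak even when the overall trend is positive.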

Table 2 Overview of specificity-conferring recognition mechanisms for all aaRSs grouped by aaRS class and subclass15. Only interactions with side chain atoms of the amino acid ligand were included in this summary. HB is hydrogen bond, SB is salt bridge, HP is hydrophobic, MC is metal complex, and PS is \(\pi \)-stacking interaction. Correspondences between interactions and residues are indicated by superscript letters. Entries in parentheses were only observed in certain structures and are no general pattern. (*) Residue numbers are given according to the respective MSA (see "Methods" section). Original residue numbers can be inferred with tables published along with this manuscript (see Data Availability).
Figure 4

Interaction patterns of AlaRS. (A) Sequence logo59 of representative sequences for AlaRSs. Non-covalent interactions with the amino acid ligand occurring at certain positions are indicated by colored circles. Filled circles are interactions with the side chain atoms, while hollow circles are interactions with any of the backbone atoms of the amino acid ligand. Blue is hydrogen bond, yellow is salt bridge, gray is hydrophobic interaction. (B) Depiction of interactions in the binding site (blue stick model) of an AlaRS from Escherichia coli (PDB:3hxz chain A) with its ligand (orange stick model). Here, hydrogen bonds (solid blue lines) and hydrophobic interactions (dashed gray lines) are established. The sequence positions of the interacting residues are given in accordance with the MSA (black) as well as the original structure (red). Figure created with PyMOL60. Double bonds are indicated by parallel line segments, aromatic bonds by circular dashed lines.

Quantitative comparison of ligand recognition

To allow for a quantitative analysis and comparison of ligand recognition between several aaRSs, interaction and binding site features were represented as binary vectors, so-called interaction fingerprints (see "Methods" section). Based on these fingerprints, the Jaccard distance was computed for each pair of structures to represent the dissimilarity in ligand recognition. Subsequently, the Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP) algorithm61 was used for dimensionality reduction and embedding of the high-dimensional fingerprints into two dimensions for visualization. This embedding is considered to be the recognition space of aaRSs. The two-dimensional visualization of this recognition space (Fig. 5) can be seen as a map describing the similarity in ligand recognition across all aaRSs. Thereby, each data point corresponds to a single amino acid binding site that was characterized by interaction and binding site features. In general, a similar recognition mechanism between two aaRSs can be assumed if they are located close to each other in this map. The more distant two aaRSs are from each other, the less similar their amino acid recognition. However, it has to be noted that the applied dimension reduction does not perfectly conserve distances. Figure 5A shows the embedding results for all aaRSs in the dataset colored according to the aaRS classes. A Principal Component Analysis (PCA) of the same data is given in SI Appendix Fig. S26. For each aaRS the average position of all data points in the embedding space was calculated and is shown as one-letter code label. Figure 5B shows the same data colored according to the physicochemical properties of the amino acid ligand, i.e. positive (lysine, arginine, and histidine), aromatic (phenylalanine, tyrosine, and tryptophan), negative (aspartic acid and glutamic acid), polar (asparagine, cysteine, glutamine, proline, serine, and threonine), and nonpolar (glycine, alanine, isoleucine, leucine, methionine, and valine).
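The Jaccard distance between two binary fingerprints can be sketched as follows. The fingerprint layout and the toy vectors are hypothetical; the actual fingerprint design is described in the "Methods" section. Treating two all-zero fingerprints as identical (distance 0) is a convention assumed here, since the ratio is otherwise undefined.

```python
def jaccard_distance(fp_a, fp_b):
    """Jaccard distance between two equally long binary fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    intersection = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    union = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return 1.0 - intersection / union

def pairwise_distances(fingerprints):
    """Symmetric matrix of Jaccard distances for a list of fingerprints."""
    n = len(fingerprints)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = jaccard_distance(fingerprints[i], fingerprints[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix

# Hypothetical toy fingerprints; each bit marks one observed feature.
toy_fingerprints = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
]
distances = pairwise_distances(toy_fingerprints)
```

The first two toy fingerprints share one of three set bits, giving a distance of 2/3, while the first and third share none and are maximally distant. A matrix of this kind is the type of input from which a two-dimensional embedding can then be derived.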

Figure 5

Recognition space analysis of all aaRSs. (A) Embedding61 space of interaction fingerprints for all aaRS structures in the dataset. Scaling is in arbitrary units. The data points are colored according to the aaRS class. One letter code labels are given for each aaRS based on the averaged coordinates in the embedding space. An asterisk indicates the non-standard amino acids phosphoserine (J*) and pyrrolysine (O*). (B) Embedding space of interaction fingerprints for all aaRS structures in the dataset except phosphoserine and pyrrolysine. Scaling is in arbitrary units. One-letter codes of amino acid ligands are used to identify each aaRS. Every data point represents an individual protein-ligand complex. The color of the data points encodes the physicochemical properties49 of the ligand.

Class I

In terms of amino acid binding, both aaRS classes seem to employ different overall mechanisms; they separate almost perfectly in the embedding space. In particular, aromatic amino acid recognition in Class I tryptophanyl-tRNA synthetases (TrpRSs) and tyrosyl-tRNA synthetases (TyrRSs) is distinct from Class II aaRSs and forms two outgroups in the embedding space. Remarkably, two different recognition mechanisms exist for TrpRSs, indicated by two clusters approximately at positions (−2.0,6.0) and (1.0,8.5) of the embedding space, respectively. The cluster at position (−2.0,6.0) is formed by structures from bacteria and archaea, while the cluster at position (1.0,8.5) is formed by eukaryotes and archaea and is in proximity to TyrRSs. Closer investigation of two representatives from these clusters shows two distinct forms of amino acid recognition for TrpRSs. Human aaRSs employ a tyrosine residue in order to bind the amine group of the indole ring, while prokaryotes employ different residues (SI Appendix Fig. S27). The Class I aaRSs that are closest to Class II are GluRSs and CysRSs. A cluster of high density is formed by Class I IleRS, MetRS, and ValRS, which handle aliphatic amino acids. This indicates closely related recognition mechanisms and difficult discrimination between these amino acids.

Class II

For Class II aaRSs the recognition space is less structured. Nonetheless, clusters are formed that coincide with individual Class II aaRSs, e.g. a distinct recognition mechanism in AlaRSs. The aaRSs handling the small and polar amino acids threonine, serine, and proline are close neighbors in the embedding space. Recognition by GlyRSs seems to be diverse; GlyRSs are not grouped in the embedding space. However, the recognition of glycine, which has no side chain, is limited by definition and thus the fingerprinting approach might fail to capture subtle recognition features. AspRSs and AsnRSs are located next to each other in the embedding space. Their recognition mechanisms seem to be very similar as the only difference between these two amino acids is the carboxylate and amide group, respectively.

Mechanisms that drive specificity

In order to quantify the influence of different aspects of binding site evolution on amino acid recognition by aaRSs, different interaction fingerprint designs were compared against each other. Each design includes varying levels of information and combinations thereof: the sequence composition of the enzyme’s binding site (Seq), non-covalent interactions formed between side chains of the enzyme’s binding site and the amino acid ligand (Int), whether pre- or post-transfer correction (i.e. “editing”) is conducted (Ed), and the overall volume of the enzyme’s binding cavity (Vol). To assess the segregation power of each fingerprint variant, the mean silhouette coefficient62, a measure of clustering quality, was calculated over all data points. This score indicates to which extent the recognition of one aaRS differs from that of other aaRSs and how similar it is within its own group. Perfect discrimination between all amino acids would give a value close to one, while a totally random assignment corresponds to a value of zero. Negative values indicate that the recognition of a different aaRS is rated to be more similar than the recognition of the same aaRS. Figure 6 shows the results of this comparison. When using fingerprints describing the sequence composition of the enzyme’s binding site (Seq\(_\text {sim}\)), the mean silhouette coefficient over all samples is −0.0510, which indicates many overlapping data points and unspecific recognition. By including non-covalent interactions (Seq, Int) the value increases to 0.1361. If pre- or post-transfer correction mechanisms are considered (Seq, Int, Ed), the silhouette coefficient improves further to 0.2731. Adding information about the binding cavity volume (Seq, Int, Ed, Vol) slightly increases the quality of the embedding to 0.2757. The silhouette coefficients for fingerprints based solely on error correction or volume were calculated as a baseline comparison. If only pre- or post-transfer correction mechanisms (Ed) are considered, the mean silhouette coefficient amounts to −0.3027. For binding cavity volume (Vol) the mean silhouette coefficient is −0.4682.
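
The meaning of the silhouette values quoted above can be illustrated with a minimal sketch. It assumes a precomputed pairwise distance matrix and one group label per protein-ligand complex; the function name `mean_silhouette` and the toy data are hypothetical, and the text does not state which implementation or distance was used for the reported values.

```python
from typing import Dict, List, Sequence


def mean_silhouette(dist: Sequence[Sequence[float]], labels: Sequence[str]) -> float:
    """Mean silhouette coefficient for a precomputed distance matrix.

    dist[i][j] is the dissimilarity between samples i and j; labels[i] is the
    group (here: the aaRS type) of sample i. At least two groups are required.
    """
    members: Dict[str, List[int]] = {}
    for idx, label in enumerate(labels):
        members.setdefault(label, []).append(idx)
    if len(members) < 2:
        raise ValueError("silhouette needs at least two groups")

    scores: List[float] = []
    for i, label in enumerate(labels):
        own = [j for j in members[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention for single-member groups
            continue
        # a: mean distance to the other members of the own group
        a = sum(dist[i][j] for j in own) / len(own)
        # b: smallest mean distance to the members of any other group
        b = min(
            sum(dist[i][j] for j in idxs) / len(idxs)
            for other, idxs in members.items()
            if other != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Toy example (hypothetical data): samples 0/1 and 2/3 form two tight pairs.
toy_dist = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
well_separated = mean_silhouette(toy_dist, ["A", "A", "B", "B"])
mixed_up = mean_silhouette(toy_dist, ["A", "B", "A", "B"])
```

With labels that follow the two pairs the score reaches its maximum, whereas labels that cut across the pairs give a negative score, mirroring the interpretation of positive and negative values given above.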

Relation to physicochemical properties of the ligands

In order to investigate whether the fingerprinting approach is a simple encoding of the physicochemical properties of the amino acids, the results were related to experimentally determined phase transfer free energies for the side chains of amino acids from water (\(\Delta G_{w>c}\)) and vapor (\(\Delta G_{v>c}\)) to cyclohexane3,63. These energies are descriptors for the size and polarity of amino acid side chains and underlie both the rules of protein folding and the genetic code64. The Spearman’s rank correlation between pairwise distances for each aaRS in the recognition space and in the physicochemical property space is weak, with \(\rho \)=0.2564 and p\(<0.01\) (see SI Appendix Fig. S28). This indicates that the fingerprinting approach used in this study is a true high-dimensional representation of the complex binding mechanisms of amino acid recognition in aaRSs. This assumption is supported by a PCA (SI Appendix Fig. S26) of the fingerprint data, where the first two principal components account for only 9.24% and 8.44% of the variance, respectively.
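
As an illustration of the rank correlation used here, the following sketch computes Spearman’s \(\rho \) for two lists of pairwise distances. The helper names are hypothetical, ties are handled by average ranks, and no p-value is computed; it is a sketch under these assumptions and not the procedure used to obtain the value reported above.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical pairwise distances in recognition space and property space.
recognition_dist = [0.2, 0.9, 0.4, 0.7]
property_dist = [1.0, 8.0, 2.5, 3.0]
rho = spearman_rho(recognition_dist, property_dist)
```

Because only ranks enter the calculation, any monotone relation between the two distance lists yields \(\rho \) = 1, independent of the scale of either space.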

Figure 6

Comparison of different fingerprint designs that include the sequence composition of the enzyme’s binding site (Seq), non-covalent interactions formed between side chains of the enzyme’s binding site and the amino acid ligand (Int), pre- or post-transfer correction (i.e. “editing”) mechanisms (Ed), and volume of the enzyme’s binding cavity (Vol). Simple sequence-based fingerprints (Seq\(_\text {sim}\)) are a 20-dimensional representation of binding site composition. The line plot shows the silhouette coefficient62 for each embedding. Points represent mean values, error bars are calculated based on all silhouette coefficients for each data point.

Discussion

The correct recognition of individual amino acids is a key determinant for evolutionary fitness of aaRSs and considered to be one of the major determinants for the closure of the genetic code10. The results of this study emphasize the multitude of mechanisms that lead to the identification of the correct amino acid ligand in the binding sites of aaRSs. Based on available protein structure data, a thorough characterization of binding site features and interaction patterns made it possible to pinpoint the most important drivers for the correct mapping of the genetic code. The main findings of this analysis can be summarized as follows: (i) Class I and Class II aaRSs employ different overall strategies for amino acid recognition. (ii) Interaction patterns and binding site composition are the most important drivers of specificity. However, very similar amino acids require additional selectivity through steric effects or editing mechanisms. (iii) The analysis of interaction fingerprints suggests that error-free recognition is a delicate task demanding a complex interplay between binding site composition, interaction patterns, editing mechanisms, and steric effects. The results point towards a gradual diversification of amino acid recognition and, hence, a gradual extension of the genetic code.

Genetic code formation

We propose that the ancient aaRS binding sites might have formed the structural basis of the genetic code on the protein side. The exploration of stereochemical possibilities in the binding sites of aaRSs was likely vital for a stable and successful integration of amino acids into the genetic code. For the case of an RNA world, nucleotide sequences that bind specific amino acids have already been suggested65. However, the limited conformational and catalytic repertoire of RNA molecules66 is an underestimated factor. The amino acid binding sites of ancient aaRS precursors could have created a much broader recognition space, which gradually gained complexity as new amino acids were added to the translational system. Combined with the findings of our previous structural analyses43, a modularity may be proposed for substrate binding in aaRSs. This modularity allowed recognition patterns to be re-used across different aaRSs while still achieving a sufficient separation of the amino acid entities. ATP fixation differs substantially between the aaRS classes and is implemented by the Backbone Brackets in Class I and the Arginine Tweezers in Class II. Binding of the amino acids still shows a general trend to employ different kinds of interactions in the two classes. Nonetheless, the recognition of each amino acid is less class-specific and is determined more by slight differences in this highly specific part of the ligand. We consider these considerations compatible with the idea of stereochemically driven genetic code formation39 and support the hypothesis that peptides and RNA coexisted and complemented each other from the very beginning32,33,66,67.

Generation of orthologous aaRS·tRNA pairs

According to our analysis and the representation of binding site features in a high-dimensional vector space, the recognition space of aaRSs seems not yet to be fully explored, i.e. there are “blank areas” (Fig. 5). Whether these spots are accessible to the enzymes through binding site evolution can only be speculated. However, engineering aaRS·tRNA pairs in order to create an artificially extended genetic code and subsequently to generate novel biopolymers is of high interest68,69. Besides the requirement of new codons and engineered ribosomes with broader substrate compatibility, the choice of an appropriate aaRS·tRNA pair is of great importance. The major goal at the aaRS level is to engineer specificity towards the new substrate without interfering with canonical aaRSs. According to our analysis there are several interesting candidates which are separated from other aaRSs in terms of their amino acid recognition as described by the high-dimensional fingerprints (see Fig. 5), namely bacterial TrpRSs, AlaRSs, HisRSs, GlnRSs, and LeuRSs. These aaRSs, especially bacterial TrpRSs (SI Appendix Fig. S27), form distinct clusters in the recognition space analysis and thus might be interesting targets for directed evolution of binding sites. TrpRSs were already successfully used to accomplish this goal70. We envision that an approach similar to the one presented in this study might be helpful to estimate the success of generating novel aaRS binding sites in silico. The characterization of key interactions for each aaRS (SI Appendix Figs. S4–S24) provides a valuable resource for predicting which mutations in the binding site are expected to alter specificity.

Coupling between tRNA and amino acid recognition

The specific detection of the cognate amino acid by aaRSs investigated here is only part of the whole reaction. aaRSs need to discriminate the tRNA molecule as well, to ensure correct coupling of amino acid and tRNA. This happens based on the anticodon and the acceptor stem of the tRNA, which are recognized by the aaRS anticodon binding domain and the catalytic domain, respectively. Mischarged tRNAs due to failed cognate tRNA detection can hardly be corrected by the respective aaRS, but mistranslation may still be avoided by cross-editing of the cognate aaRS71. The evolutionarily older72 acceptor stem is essential for specificity, whereas tRNAs are still correctly detected when the evolutionarily younger anticodon information is masked73. Additionally, certain informative nucleotide motifs in the tRNA are relevant for the aaRS to couple the cognate tRNA and amino acid, differing in as few as one position between the 20 types15,73,74. These specific discriminating motifs are thought to have been incorporated and extended over time as new amino acids joined the genetic code75. Combined with our results on amino acid recognition, it is therefore conceivable that aaRS specificity towards the cognate amino acid and tRNA developed simultaneously.

Class duality extends possibilities

The aaRS class duality made it possible to broaden the amino acid recognition space significantly. In general, the recognition of amino acids with low side chain complexity seems to be complemented by allosteric interactions and cannot be implemented exclusively by configuring side chains. Although the volumes of Class I and Class II binding sites differ significantly, they are probably not the major determinants of amino acid selectivity. In general, Class I aaRSs handle most of the hydrophobic and larger amino acids3 and thus the binding site volume of Class I aaRSs is expected to match the volumes of their larger ligands. Nonetheless, binding site volume and geometry may act as additional layers of selectivity. An example is the pair of negatively charged amino acids glutamic acid and aspartic acid, handled by a Class I and a Class II aaRS, respectively. In this case, overall interactions are highly similar but binding geometry and binding site volume are significantly different. The two ligands are attacked from opposite sides76, as highlighted by significantly different conformations (Fig. 3B). There is evidence that both amino acids were among the first to exist in the prebiotic context37,77,78,79,80,81. It is conceivable that the discrimination between glutamic and aspartic acid was based on tertiary contacts between secondary structure elements and size selectivity rather than on specific side chain interactions82. The recent identification of a protein folding motif83 strengthens this assumption. This is further supported by the observation that ancient proteins, based on a limited set of amino acids, were still capable of exhibiting secondary structures81,84,85. One can only speculate whether a simultaneous emergence of two different aaRS classes and secondary structure formation allowed these early – but highly similar – amino acids to be incorporated into the genetic code.
An interesting hypothesis for the existence of two aaRS classes proposes that ancestral Class I and Class II aaRSs paired with a tRNA molecule, contacting it from opposite sides to form a ternary complex. These pairs can be traced to the now known aaRS subclasses86. This coincides with GluRS and AspRS, belonging to subclass Ib and IIb, respectively. The distribution of the aaRSs in Fig. 5 might support this idea of the development of two symmetrical classes as well, since the proposed pairs86 are separated from each other. According to the biochemical pathway hypothesis77, GluRS and AspRS might have been the first Class I and Class II representatives, with other aaRSs evolving from them77,87. However, the decreased usage of aspartic acid and the enrichment of glutamic acid in modern species, compared to the LUCA, points in a different direction88. According to these usage frequencies, aspartic acid was incorporated into the genetic code prior to glutamic acid. The same temporal order was concluded from the evaluation of various criteria to derive a consensus order of amino acid appearance89.

Glutamine and asparagine followed glutamic acid and aspartic acid

Glutamine and asparagine are chemically closely related to glutamic and aspartic acid, respectively. It is likely that GlnRSs6 and AsnRSs7 mutually co-evolved from the evolutionarily old GluRSs and AspRSs through recent gene duplication and were distributed via horizontal gene transfer (HGT)15,90. A theory for this fast change in recognition was recently proposed by Carter et al.91, according to which the HGT resulted in distinct clades in the phylogenetic tree of aaRSs, distinguishing an aaRS like GlnRS strictly from the others. This resulted in a rapid change of specificity towards the amino acid ligand. In contrast, older aaRSs did not go through HGT, allowing their sequences to “wander” more during evolution, leading to clades which are not as clearly separated from each other and therefore inferior in discriminating specific amino acids91. Although the ligands of GluRS and GlnRS are rather similar, interaction patterns and binding site compositions differ between these two enzymes. These differences coincide with the analysis of the recognition space (Fig. 5), where GluRSs and GlnRSs are not neighbors in the embedding. Hence, they evolved to distinguish between these amino acids without editing mechanisms92 or the exploitation of the negative charge of glutamic acid93,94. The discrimination of glutamine and glutamic acid by GlnRS cannot be attributed entirely to the composition of the binding site; changing the specificity from glutamine to glutamic acid could not be achieved by mutating only first order binding site residues95. This emphasizes the role of subtler interactions and allosteric effects within the catalytic domain, as was shown to be the case for TrpRS96. In contrast to the observed differences between GlnRS and GluRS, AspRS and AsnRS are direct neighbors in the embedding space and share a greater similarity in their recognition mechanism.
However, as for GlnRS and GluRS, the discrimination between aspartic acid and asparagine is not entirely driven by specific interactions with binding site residues. Correct recognition depends on a water molecule that forms water-assisted hydrogen bonding between a binding site leucine in AsnRS and the amide group of the activated asparagine. Additionally, specificity of AsnRS to discriminate asparagine against aspartic acid is supported by two water molecules, forming the binding pocket to perfectly fit the asparagine side chain48. Although such indirect aspects may not be detected with our interaction-based investigation of static structures, they still contribute to specificity. The vicinity in the recognition space might be due to the limitation of interaction data, since water bridges were excluded from our analyses. Multiple aaRS structures were not determined with co-crystallized water molecules, making a detection with PLIP impossible. To avoid an overall bias due to this imbalance, water-mediated interactions were not considered during analysis. We conclude that for both Class I GlnRS/GluRS and Class II AsnRS/AspRS, the role of allosteric effects and other subtle interactions should not be underestimated.

Distinct recognition of arginine and lysine

Another interesting example is the pair of positively charged amino acids lysine and arginine. Interaction data suggest two unrelated ways to achieve ligand recognition in Class II LysRSs and Class I ArgRSs, i.e. the two enzymes are well separated in the embedding space. The poor editing capabilities of LysRS regarding arginine97 might have required a good separation of the two recognition mechanisms. Although a relation of ArgRSs to aaRSs of hydrophobic amino acids was proposed98, a separate subclass grouping for ArgRSs15 seems to be reasonable and is in accordance with the observed data; the recognition mechanism differs substantially from that of the hydrophobic amino acids. Furthermore, based on the consensus of all analyzed ArgRS structures, the characteristic Class I HIGH motif4 seems to play an important role in stabilizing the arginine ligand in the pre-activation state (see SI Appendix Fig. S4). For both histidine residues of the HIGH motif, highly conserved salt bridges to the carboxyl group of the ligand are observed.

Glycine recognition is not interaction-driven

Based on interaction data, the recognition of the smallest amino acid glycine seems to be rather unspecific; a large spread in the embedding space can be observed for individual protein-ligand complexes of GlyRS. This is to be expected as GlyRS is known to maintain its specificity not due to interactions with glycine – it has no side chain to interact with – but rather due to active site geometry that blocks larger amino acids10, 99.

Alanine recognition is crucial

Alanine is the second smallest amino acid with only a single heavy side chain atom. The idiosyncratic architecture of AlaRS is different from other Class II aaRSs100. Still, the confusion with glycine and serine55, or non-proteinogenic amino acids8, poses a challenge for correct recognition of alanine and a loss of specificity is associated with severe disease outcomes101. The recognition mechanism in AlaRSs seems to differ substantially from other Class II aaRSs (see Fig. 5), indicating evolutionary endeavor to develop a unique recognition mechanism.

Discrimination of hydrophobic amino acids requires editing

The hydrophobic amino acids isoleucine, leucine, valine, and methionine likely entered the genetic code at the same time20,37,98. The highly similar interaction patterns for IleRS, ValRS, and MetRS substantiate this assumption. Due to their difficult discrimination, editing functionality is key5,56,92,102,103 for these aaRSs.

Tryptophan recognition suggests late addition to the genetic code

The emergence of TrpRSs and TyrRSs is considered to have happened at a later stage of evolution. The two aaRSs are likely to be of common origin42,104 and constitute their own subclass, which is supported by sequence and structure studies15,18,19,105,106. PheRS supposedly evolved from the same precursor as TrpRS and TyrRS21. In general, TrpRSs and TyrRSs separate well from other aaRSs in the recognition space, which is likely due to the unique utilization of \(\pi \)-stacking interactions with binding site residues. Besides specific interactions in the binding site, allosteric effects and interdomain cooperativity107,108 are drivers for TrpRS specificity. Furthermore, mutations in the dimerization interface of TrpRSs were shown to reduce specificity96. Remarkably, two distinct ways of recognition are apparent for TrpRSs in bacteria and eukaryotes. These differences support the previously described separation of eukaryotic TrpRSs and TyrRSs from their prokaryotic counterparts109 and the late addition of these amino acids to the genetic code110. However, structures from archaea do not follow this pattern and feature both recognition variants.

Methods

Data acquisition

The dataset from our last study43 served as the basis for all analyses. As all structures in the dataset are annotated with ligand information, only entries containing ligands relevant for amino acid recognition were considered, i.e. they bind to the specificity-conferring moiety of the binding site (see Fig. 1). Every protein chain of an entry was considered that: (i) comprises a catalytic aaRS domain, (ii) contains a co-crystallized specificity-relevant ligand in the active site, and (iii) has a ligand that contains an amino acid substructure. Filtering of the data resulted in 189 (235) structures for Class I (Class II) aaRSs that contain ligands with relevance for specificity. The number of structures with respect to the pre- or post-activation state of the catalyzed reaction is shown in SI Appendix Fig. S3. Furthermore, sequences of the dataset entries were clustered using single-linkage clustering with a sequence identity cutoff of 95% according to a global Needleman-Wunsch111 alignment with BLOSUM62 substitution matrix computed with BioJava112. Representative chains for each cluster were selected, preferring wild type and high-quality structures. In total, 47 (54) protein chains were selected to be representatives for Class I (Class II) aaRSs. The dataset covers structures of all known aaRSs from species across all kingdoms of life (SI Appendix Fig. S1).
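
The single-linkage step described above can be sketched as follows. This is a simplified illustration, not the BioJava-based procedure: the function `toy_identity` is a hypothetical stand-in for the identity derived from a global Needleman-Wunsch alignment, and treating the 95% cutoff as inclusive is an assumption.

```python
from typing import Callable, Dict, List, Sequence


def single_linkage_clusters(
    items: Sequence[str],
    identity: Callable[[str, str], float],
    cutoff: float = 0.95,
) -> List[List[str]]:
    """Group items so that any pair with identity >= cutoff shares a cluster.

    Single linkage is transitive: A~B and B~C place A, B, and C together even
    if A and C fall below the cutoff. Implemented with a union-find structure.
    """
    parent = list(range(len(items)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if identity(items[i], items[j]) >= cutoff:
                parent[find(j)] = find(i)

    clusters: Dict[int, List[str]] = {}
    for i, item in enumerate(items):
        clusters.setdefault(find(i), []).append(item)
    return list(clusters.values())


def toy_identity(a: str, b: str) -> float:
    """Placeholder: fraction of identical positions without any alignment."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


# Hypothetical 20-residue sequences: the first three chain together.
seqs = ["A" * 20, "A" * 19 + "C", "A" * 18 + "CC", "G" * 20]
clusters = single_linkage_clusters(seqs, toy_identity)
```

In this toy input the first and third sequence are only 90% identical, yet they end up in the same cluster because each is 95% identical to the second sequence.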

Mapping of sequence positions

Amino acid sequences were derived from the set of representative structures of the respective aaRS. To allow a unified mapping of sequence positions, an MSA was computed for each aaRS using the T-Coffee113 Expresso pipeline. The quality of each MSA in the specificity-conferring region of the binding site was assessed regarding the correct mapping of the Backbone Brackets and Arginine Tweezers structural motifs43, and the conservation of the respective sequence signature motifs4,22. All MSAs preserved the considered regions and passed the quality checks. The sequence positions for each aaRS were then unified according to the resulting MSA in order to investigate conserved interaction patterns. For this purpose the custom script “MSA PDB Renumber”, available under open-source license (MIT) at github.com/vjhaupt, was used.

Annotation of non-covalent protein-ligand interactions

Non-covalent protein-ligand interactions were annotated for all entries in the dataset that contained a valid ligand using PLIP v1.3.346 with default parameters.

Determination of interactions relevant for specificity

Only interactions formed between the amino acid substructure of the ligand and binding site residues were considered for analysis. For this purpose subgraph isomorphism detection with the RI algorithm50 was applied. The RI implementation of the SiNGA framework v0.5.0114 was used. Each amino acid scaffold was represented by a graph created from the amino acid’s SMILES string taken from PubChem115. The full amino acid graph was modified using MolView v2.4 (available at molview.org) in order to remove the terminal hydroxyl group, which is cleaved during the enzymatic reaction and must thus be ignored for subgraph matching. For each dataset entry that contained a valid ligand, the corresponding amino acid graph was matched against the ligand in order to identify the atoms involved in the formation of specificity-conferring interactions. A depiction of the workflow to determine specificity-conferring interactions is given in Fig. 7.

Figure 7

The identification of specificity-conferring interactions in SerRS. For each aaRS a pattern graph is used to map interactions. This pattern graph resembles the amino acid without its terminal hydroxyl group and is matched against the full ligand with annotated interactions using subgraph isomorphism detection50. The interactions formed between matched atoms and binding site residues are considered to be specificity-conferring interactions.

Generation of interaction fingerprints

To allow for a quantitative comparison of recognition mechanisms, each protein-ligand complex was represented by a structure-invariant binary interaction fingerprint (see for example the paper of Salentin et al.45 about the idea of interaction fingerprinting). Different fingerprint designs were chosen for comparison: a simple 20-dimensional fingerprint on binding site composition and a 500-dimensional fingerprint based on binding site composition and interaction information. The latter was further enriched with editing and binding site volume information.

Simple binding site based fingerprints

Binary and structure-invariant fingerprints that represent binding site compositions (used as baseline for the comparison of different fingerprint designs, Fig. 6) were constructed as follows. Each residue predicted to be in contact with any specificity-relevant atom of the ligand was considered for fingerprint generation. A 20-dimensional binary vector was used to represent the occurrence of individual residue types in the binding site. For each of the interacting residues the corresponding bit was set to active. Hence, multiple occurrences of the same residue type were not taken into account.
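
A minimal sketch of this 20-dimensional composition fingerprint is given below. The alphabetical ordering of the residue types and the function name `composition_fingerprint` are assumptions made for illustration; the text does not specify a bit order.

```python
from typing import Iterable, List

# Assumed bit order: three-letter codes of the 20 proteinogenic amino acids.
AMINO_ACIDS = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
]


def composition_fingerprint(contact_residues: Iterable[str]) -> List[int]:
    """Set one bit per residue type that contacts the ligand.

    Repeated residue types do not change the result, i.e. the fingerprint
    records presence only and not counts.
    """
    bits = [0] * len(AMINO_ACIDS)
    for residue in contact_residues:
        bits[AMINO_ACIDS.index(residue.upper())] = 1
    return bits


# Hypothetical binding site with two tyrosines and one glycine in contact.
example = composition_fingerprint(["TYR", "GLY", "TYR"])
```

The two tyrosines in the example activate a single bit, which reflects the statement that multiple occurrences of the same residue type are not taken into account.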

Binding site and interaction-based fingerprints

Single three-dimensional vectors of non-covalent interactions were encoded into a binary vector by considering the type of interaction, the interacting group in the ligand and the interacting amino acid residue. One such feature could be a hydrogen bond between an oxygen atom in the ligand and tyrosine in the protein. Each of these features is hashed to a number between 1 and 500 so that the resulting fingerprint has 500 bits.
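
The hashing of interaction features into a 500-bit vector can be sketched as follows. The hash function is not specified in the text; SHA-256 is used here purely as an assumption to obtain reproducible indices, and the feature tuple layout and function names are hypothetical.

```python
import hashlib
from typing import Iterable, List, Tuple

N_BITS = 500

# (interaction type, interacting ligand group, residue type); layout assumed.
Feature = Tuple[str, str, str]


def feature_index(feature: Feature, n_bits: int = N_BITS) -> int:
    """Map a feature to a stable number between 1 and n_bits."""
    key = "|".join(feature).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_bits + 1


def interaction_fingerprint(features: Iterable[Feature], n_bits: int = N_BITS) -> List[int]:
    """Binary fingerprint with one active bit per hashed feature.

    Distinct features can collide on the same bit; this is inherent to hashed
    fingerprints of fixed length.
    """
    bits = [0] * n_bits
    for feature in features:
        bits[feature_index(feature, n_bits) - 1] = 1
    return bits


# Example from the text: a hydrogen bond between a ligand oxygen and tyrosine.
hbond = ("hydrogen_bond", "oxygen", "TYR")
fingerprint = interaction_fingerprint([hbond, hbond])
```

Because the index depends only on the feature itself, the same interaction observed in different structures always activates the same bit, which is what makes the fingerprints comparable across complexes.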

Encoding of editing mechanisms and binding site volume

Information about the editing mechanisms performed by some aaRSs was taken from the paper of Perona and Gruic-Sovulj51 and encoded by appending a 22-dimensional bit vector to the 500-dimensional fingerprint. Each active bit represents a ligand against which editing is performed, e.g. for structures of ThrRS the bit for serine is set. In addition to editing information, the binding site volume, estimated with the POVME58 algorithm, was encoded. Twelve bins were created that represent binding site volumes ranging from 30–270 Å\(^3\) in steps of 20 Å\(^3\). For example, if a structure has a binding site volume of 45 Å\(^3\), the first bit was set to active. For a binding site volume of, e.g., 52 Å\(^3\), the second bit was set to active, and so on. The fingerprints were concatenated to contain the binding site and interaction features (500 bits), editing mechanisms (22 bits), and binding site volume (12 bits). The final fingerprint has a size of 534 bits.
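
The volume binning and the concatenation into the final 534-bit fingerprint can be sketched as follows. The function names are hypothetical, the half-open bin boundaries are an assumption, and volumes outside 30–270 Å\(^3\) are assumed to leave all volume bits inactive, since the text does not describe these cases.

```python
from typing import List, Sequence


def volume_bits(volume: float, low: float = 30.0, high: float = 270.0, step: float = 20.0) -> List[int]:
    """One-hot encoding of a binding site volume into (high - low) / step bins.

    Bin k covers low + k * step <= volume < low + (k + 1) * step (assumed).
    """
    n_bins = int(round((high - low) / step))  # 12 bins for the defaults
    bits = [0] * n_bins
    if low <= volume < high:
        bits[int((volume - low) // step)] = 1
    return bits


def full_fingerprint(
    interaction_bits: Sequence[int],
    editing_bits: Sequence[int],
    volume: float,
) -> List[int]:
    """Concatenate interaction (500), editing (22), and volume (12) bits."""
    if len(interaction_bits) != 500 or len(editing_bits) != 22:
        raise ValueError("expected 500 interaction bits and 22 editing bits")
    return list(interaction_bits) + list(editing_bits) + volume_bits(volume)


# The two volumes used as examples in the text.
first_bin = volume_bits(45.0)
second_bin = volume_bits(52.0)
combined = full_fingerprint([0] * 500, [0] * 22, 45.0)
```

With the default parameters, 45 Å\(^3\) falls into the first bin and 52 Å\(^3\) into the second, in agreement with the examples given in the text.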

Embedding of interaction fingerprints

To allow for a quantitative comparison of the interactions between individual aaRSs, the high-dimensional interaction fingerprints were embedded using UMAP version 0.3.261. The parameters for all embeddings given in this manuscript were set as follows: \(\texttt{n\_neighbors} = 60\), \(\texttt{min\_dist} = 0.1\), \(\texttt{n\_components} = 2\). The Jaccard distance was used to describe the dissimilarity between two fingerprints a and b:

$$\begin{aligned} d(a,b) = 1 - \frac{n_{a \wedge b}}{n_a + n_b - n_{a \wedge b}} \end{aligned}$$
(1)

with \(n_{a \wedge b}\) being the count of active bits common between fingerprints a and b, \(n_a\) the number of active bits in fingerprint a, and \(n_b\) the number of active bits in fingerprint b. This distance metric was used as input for UMAP.
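
Equation (1) translates directly into code. The following sketch operates on fingerprints given as lists of 0/1 values; the function name and the treatment of two all-zero fingerprints as identical are assumptions, as this case is not covered by the equation.

```python
from typing import Sequence


def jaccard_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard distance d(a, b) = 1 - n_ab / (n_a + n_b - n_ab)."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    n_a = sum(1 for x in a if x)  # active bits in a
    n_b = sum(1 for y in b if y)  # active bits in b
    n_ab = sum(1 for x, y in zip(a, b) if x and y)  # bits active in both
    denominator = n_a + n_b - n_ab
    if denominator == 0:
        return 0.0  # assumption: two empty fingerprints are identical
    return 1.0 - n_ab / denominator


# Toy fingerprints sharing one of three active bits overall.
fp_a = [1, 1, 0, 0]
fp_b = [0, 1, 1, 0]
distance = jaccard_distance(fp_a, fp_b)
```

For the toy fingerprints, \(n_a = n_b = 2\) and \(n_{a \wedge b} = 1\), so the distance is \(1 - 1/3\). Identical fingerprints have a distance of zero and fingerprints without common active bits have a distance of one.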