The pocketome of G-protein-coupled receptors reveals previously untargeted allosteric sites

G-protein-coupled receptors do not only feature the orthosteric pockets, where most endogenous agonists bind, but also a multitude of other allosteric pockets that have come into the focus as potential binding sites for synthetic modulators. Here, to better characterise such pockets, we investigate 557 GPCR structures by exhaustively docking small molecular probes in silico and converting the ensemble of binding locations to pocket-defining volumes. Our analysis confirms all previously identified pockets and reveals nine previously untargeted sites. In order to test for the feasibility of functional modulation of receptors through binding of a ligand to such sites, we mutate residues in two sites, in two model receptors, the muscarinic acetylcholine receptor M3 and β2-adrenergic receptor. Moreover, we analyse the correlation of inter-residue contacts with the activation states of receptors and show that contact patterns closely correlating with activation indeed coincide with these sites.

sites. Blockade of the β 1 AR in heart by beta-blockers is desired for cardiovascular disease, 23 but antagonising the β 2 AR in lung tissue is detrimental for asthma. Conversely, stimulation 24 of the β 2 AR helps asthma patients but potentially damages their heart through concomitant 25 agonism of the β 1 AR. 26 As a possible way of circumventing this challenge of highly similar pockets, the targeting 27 of allosteric pockets is billed as a sensible alternative. 3 Due to the nature of GPCRs as 28 bundles of seven transmembrane helices that are only relatively loosely coupled, 4 one could 29 indeed expect that a ligand binding to one of these pockets is able to modulate the response 30 of a receptor. Moreover, it is generally claimed -but has never been shown -that these 31 alternative pockets share lower sequence homology. 3 There are examples of individual ligands 32 binding to non-orthosteric sites on a few receptors (e.g. refs [5][6][7][8] ), but it is currently unknown 33 to what extent such binding sites exist across the receptorome and how different or similar 34 they are in shape and sequence. 35 In this work, we therefore identified and analysed the ensemble of all discernible pockets 36 -the "pocketome" -of 557 GPCR structures of 113 different receptors. We discovered 37 potential pockets by exhaustive docking of small molecular probes, taking into account the 38 different electrostatics of the solvent-exposed and transmembrane parts of the receptors, and 39 compared these data across all receptors. 40 Based on class A and B1 structures in active and inactive conformations, we computed 41 residue contacts including both backbone and side chain atoms. In doing so, we identified 42 interhelical residue contacts crucial for an active or inactive state of both class A and class 43 B1 GPCRs (we follow the nomenclature in IUPHAR's "Guide to Pharmacology" and refer 44 to "classes" of GPCRs rather than "families"). We were then able to show that known and 45 orphan allosteric pockets contain such contacts of importance, speaking to the likelihood 46 of their functional relevance. These computational investigations were strengthened with 47 experimental studies of two model class A receptors, the muscarinic acetylcholine receptor 48 M 3 (M 3 R) and the β 2 AR. Through mutations of two pockets that have not been targeted by 49 a synthetic ligand before, we demonstrate that the residues forming these pockets are indeed 50 involved in receptor activation. Last, but not least, we compare the sequence similarity of 51 the most frequently occurring pockets, thereby providing a quantitative assessment of their 52 overall selectivity potential. 53 This represents the currently most exhaustive analysis of the GPCR pocketome, span- 54 ning receptors from classes A, B1, B2, C, D1, and F. In the present manuscript, we will 55 first describe the receptor alignment and quality control procedure. Second, the docking of 56 the small probes as well as the aggregation of all probe poses to densities will be explained. 57 We will then describe several of the known and as-of-yet-untargeted ("orphan") sites (ab-58 breviated as KS and OS, respectively, in the following), and show mutational data speaking 59 to the biological relevance of two of the OS. Finally, we will discuss the implications of our   On average, each of the obtained maps consisted of 1 000 000 up to 3 500 000 volume elements.

87
Since we wanted to investigate the density maps for trends across the different receptor  The class-specific density maps provided with this work can be visualised using Pymol (see 93 supplementary Pymol density [.dx] files and README) and might aid a reader with the 94 following description. Said density maps reveal multiple contiguous regions that represent 95 common cavities on the surface of all GPCRs analysed in this study (Fig. 1). Particularly 96 for class A GPCRs, these pockets are distributed in a notably symmetric manner: both at 97 the intra-and extracellular end of the 7TM bundle, pockets can be seen between each pair of 98 adjacent helices. The density maps for the other classes are somewhat less well-defined and 99 more scattered overall. This is owed to the lower numbers of structures and therefore poorer 100 statistics, as individual structures -and possible deviations in them -carry a relatively 101 higher weight than for the more numerous class A structures.

102
Here, we present only those pockets that we will discuss and examine in depth, whereas 103 the rest of them is described in the Supplementary Information. We chose to focus on three 104 of the largest and -by our analysis -best-defined orphan sites and contrast them with an 105 equal number of known sites, which we picked because they are clearly defined and because 106 they host synthetic ligands. While the vast majority of sites defined by the densities is 107 located at the outward-facing receptor portion (i.e. receptor residues in contact with the 108 membrane), we also were able to identify regions of density inside the 7TM bundle. Of note, 109 in each class, the orthosteric (ORTHO) and adjacent secondary binding pockets (ORTHO1 110 and ORTHO2) can clearly be discerned, making it plausible that the other pockets identified 111 in this work could also host ligands. By aligning our density maps with each other, one can 112 see that the average orthosteric pocket for class C receptors is significantly deeper than for 113 the one of class B1, which again is slightly deeper than the one in class A. This is perfectly 114 consistent with experimental evidence. 9 Due to the overall higher flexibility and thus often 115 worse resolution of extra-and intracellular loops, pockets found within these regions will not 116 be further analysed or discussed. 117 Figure 1: Representative depiction of the GPCR pocketome. Cumulative densities for all class A GPCRs (orange volumes) are shown projected on the structure with PDB 1F88 (ice blue ribbon). The surface is indicated white transparent. Visible hotspots (pockets) located at the lipid-facing receptor portion around the 7TM bundle are labelled either as "OS" (Orphan Site) or "KS" (Known Site). A more detailed description of their location is provided in the text and Tab. 1. We note that OS3 was described in the most recent X-ray structure PDB 7M3J 10 during writing of this manuscript. We therefore re-labeled this site to KS12. Furthermore, OS4 was not found in the class A densities. Three known and three orphan pockets (red labels) are discussed in more detail in the text. The red sphere indicates the tip of HVIII and has been included for ease of orientation.
Comparing the densities on the outward-facing receptor portion for all analysed GPCR 118 classes, we assigned pocket identifiers to several volumes that appeared well-defined and 119 clearly distinct from their neighbouring densities. This facilitated later analysis and provided 120 the means for a common orientation and discussion. However, since not only the GPCR 121 structures themselves but also the density map shapes differ across the classes, the reader's 122 view on whether a particular region is an individual pocket might differ from ours. That being 123 said, our general conclusions are independent of any such small differences in definitions. The 124 full list of pockets is presented in Tab In order to provide evidence that it is possible to achieve modulation of receptor function 143 with a ligand binding to one of the allosteric pockets, we investigated to what extent these 144 pockets are formed by residues that also participate in contact patterns specific for an active 145 or inactive conformation of the receptor. The rationale is that residues which are involved 146 in crucial state-specific contacts are more susceptible to interference by a ligand. In . 11 The right panel shows the data re-calculated based on the points in the clearly separated active and inactive clusters. Each PC value for each PDB can be seen as a linear combination of variables (i.e. contacts) that represents the residue contact landscape of a structure in a condensed manner. The first two PCs shown here explained most of the variance across all structural data, hence represent the most interesting PCs for investigating differences between receptor states on the residue contact level.
states ranging all the way from structures classified as active to those classified as inactive,    show that it seems to be conserved across all GPCR classes. In order to further validate 196 this finding, we analysed the receptorome-wide sequence identity and similarity of residues show that the overall identity is considerably low, the similarity based on physicochemical 199 properties is much higher with an average value above 50% (Fig. S6). 200 We then investigated the interactions of known ligands with KS2 and compared them to    IV,V  A, B1, B2, C, D1, F A (6RZ5, 6RZ8, 6RZ9)  KS5  MP-LP OF III,IV,V  A, B1, B2, C, D1, F A (5KW2, 5O9H, 5TZY, 6C1Q, 6C1R,  Table 1: List of all pockets observed after probe docking, their approximate locations and structures in which they are visible. We note that OS3 was described in the most recent X-ray structure PDB 7M3J 10 during writing of this manuscript. We therefore re-labeled this site to KS12.  Table 2: Position of residues making up each of the sites discussed in more detail in the text and Supplementary Information (Ballesteros-Weinstein numbering 15 was chosen for an overall better comparability). The conservation of these residues across all analysed receptors is shown in Fig. S4 and S5.

215
In this section, we will focus on two orphan sites that could represent binding sites for al-216 losteric modulators. They are among the best-defined and largest-volume sites that emerged 217 from our analysis. A third site is discussed in the Supplementary Information. Since no 218 structural data of ligands binding to these regions is known yet, we will describe the pockets 219 based on their amino acid sequences and our class A and class B1 residue contact analysis.  In order to identify the impact of OS5 and OS9 on the biological function of the receptors, 237 we constructed double or quadruple mutants, where two or four, respectively, of the residues 238 that form these pockets in the M 3 R were changed (residues mutated in a particular mutant 239 are connected by a grey vertical bar in Fig. 3). These mutants were tested in a BRET-based 240 G protein activation assay as well as in a FRET-based β-arrestin2 recruitment assay. The 241 summary of the resulting data is shown in Fig. 3.   The same is true for OS9, where a similar right shift of G αq activation in the mutants 250 occurs ( Fig. 3 and S10B). Similarly, the extent of the recruitment of β-arrestin2 is substan-251 tially reduced for all mutations of OS9 ( Fig. 3 and S11), without major effects on the logEC 50 252 values, suggesting a major impact of the mutants on agonist efficacy. By demonstrating that 253 the partial agonist arecoline led to a greatly diminished G protein activation even under 254 saturating conditions (Fig. S10C), we confirmed that the mutations indeed strongly affect 255 the efficacy of muscarinic agonists.

256
To further support our findings, we also mutated individual OS5 and OS9 residues in the 257 β 2 AR. The results are summarised in Fig. 4 and full curves are shown in Fig. S12. 258 Figure 4: Experimental data for OS5-and OS9-mutants in the β 2 AR. The y-axis depicts the position of the pocket residues (Ballesteros-Weinstein numbering 15 ). The left panel shows normalised (to the wt) logEC 50 values, and the right panel the normalised amplitude of the extent to which β-arrestin2 was recruited. While in the left plot a value of 0 corresponds to no potency changes, a value of 1 in the right plot corresponds to no changes in efficacy compared to wt. The grayed out area indicates minimal and maximal potency shifts of multiple G αs wt measurements. The central panel depicts interhelical residue contacts of each of the mutated residues that are important in active (blue) or inactive (red) conformations of class A GPCRs in a similar manner to Fig. S9 (normalised PCA coefficient cut-off: 0.7). The point size correlates with the normalised PCA coefficient for a given contact and can be seen as a direct measurement of importance for a given state.
Similar to our findings for the M 3 R, our results show an increase in the logEC 50 values of 259 adrenaline-induced activation of two different biosensors (G αs and β-arrestin2) and a decrease 260 in efficacy relative to the β 2 AR wt. Our β 2 AR signalling results show that the logEC 50 is 261 increased for either G αs or β-arrestin2 or both for half the mutations across the OS5 and OS9 262 pockets. Similarly, for about half of the mutations, the efficacy for β-arrestin2 recruitment 263 is reduced. As was the case for the M 3 R, these results further strengthen the assumption 264 that OS5 and OS9 are indeed physiologically relevant.

265
In order to obtain deeper insight into the possible reasons behind the interference of 266 residues in OS5 and OS9 with GPCR activation, we compared the mutational data with our 267 class A contact analysis (middle panel of Fig. 3 and 4). For OS5, we found that the mutated 268 residues were frequently involved in conserved active and inactive state contacts. Starting

330
In our study, we have determined the occurrence of pockets across the largest part of the 331 currently available structural G protein-coupled receptorome. Because the ultimate goal 332 of this research is to identify ligands for these pockets through which receptors might be 333 modulated in their activity, we chose the docking of small molecular probes, i.e. chemically 334 valid compounds, as a method, rather than definitions of a pocket based on the protein 335 surface. Thus, we optimised our definition towards what a potential ligand would "see". 336 We explicitely accounted for the different environments of the various receptor portions, viz. 337 the membrane-embedded core and the solvent-exposed ends, by using a docking method that some sites are more conserved regarding their shape and physicochemical properties 349 than others. This is also borne out at the level of the presence of crystallisation additives in 350 the pockets. These characteristics therefore should be exploitable for the design of class-or 351 type-specific allosteric modulators, which has implications when designing ligands for such 352 pockets in order to avoid unintended polypharmacology. 353 We are convinced that the number of structures investigated is such that the general 354 trends observed, at least for class A and class B1, will hold even as new structures become 355 available. In fact, we reran our analysis pipeline shortly before submission, approximately 356 nine months after the first time, and did not observe noticeable changes in the pocket defi-  While we found all pockets that have previously been localised through structure determi-360 nation with a ligand, we also identified several that have not yet successfully been targeted, 361 at least according to publicly available data. To demonstrate that the two most prominent 362 "orphan pockets" OS5 and OS9 have potential as target sites for small-molecule modulators, 363 we mutated several of the residues lining these two pockets in two model GPCRs, the M 3 R 364 and the β 2 AR. In both cases, mutations had a robust effect on both G protein activation as 365 well as β-arrestin recruitment. Of note, we did not only observe substantial right-shifts of 366 up to more than 10-fold of the concentration-response curves for G protein activation, indi-367 cating the need for higher ligand concentrations to achieve similar levels of stimulation, but 368 also decreases in the maximum level of response for β-arrestin. These results indeed point 369 towards the modulatory potential of ligands binding at these sites. Moreover, given the fact 370 that logEC 50 -values were shifted in opposite directions for G αq and β-arrestin2 at the M 3 R, 371 one might speculate that pathway-selective ligands could be designed for these pockets. A 372 receptor region similar to OS9 has also recently been investigated in the angiotensin II type 373 1 receptor, 16 where it has been termed a cryptic site. As mutations of amino acids led to 374 changes in G αq and β-arrestin2 responses similar to what we show for our receptors, this 375 lends further credibility to our finding that OS9 is a pan-class A pocket.

376
In addition, we compared the experimental findings to our contact analysis, which is based 377 on the residue contact maps of 557 GPCR structures. We found that ligands addressing the 378 two pockets could potentially act as negative allosteric modulators (NAMs) by disrupting 379 contacts crucial for an active state. This was expected, since both OS5 and OS9 reside 380 near the G protein binding site, which is known for undergoing profound rearrangements 381 upon receptor activation. However, it also seems plausible that by strategically targeting 382 the inactive-state contacts or stabilising active-state contacts, one could potentially design 383 positive allosteric modulators (PAMs) for both pockets.

384
Our probe docking allows us to make observations also for several of the known pockets.

385
In the case of KS2, the available structural data from the FFAR1 and the PAR2 suggests   Similar rationales as for KS2 above can be applied to the design of ligands for the orphan 401 sites. Of course, in these cases, the challenge will be to unequivocally determine the binding 402 locations of such ligands and to design assays that are fast yet precise enough to be utilised 403 in their optimisation. We certainly hope that the three-dimensional atlas laid out in this 404 work will aid the community in achieving this goal. ware. 33 Here, incomplete residues were built utilising the "Structure Preparation" function.

440
Termini and chain breaks that contained only one atom were removed. The built-in method 441 "Protonate3D" was used to assign protonation states to histidine and cysteine residues. For 442 consistency, all other residues were assigned their most frequent protonation state under 443 physiological conditions.

444
Preparations were continued using CHARMM together with the CHARMM36 protein 445 force field. 34 Termini and breaks were capped by adding ACE and NME caps to the N-and 446 C-terminal ends, respectively. Hydrogen atoms were placed with the HBUILD command. In 447 order to remove too close van-der-Waals contacts, an energy minimisation was carried out for 448 each structure with a short 20-step steepest-descent optimisation followed by an adopted-449 basis Newton-Raphson optimisation until convergence. In order to keep as much original 450 structural information as possible, only the side chains of formerly incomplete residues and 451 the backbone and caps of terminal residues were allowed to move. Hydrogen atoms were 452 rebuilt using HBUILD again after all previous operations. Then, structures were aligned by 453 using the "cealign" algorithm as implemented in Pymol. Finally, structures were converted 454 to MOL2 file format using UCSF Chimera. 35 The correct CHARMM atom types and charges 455 were reassigned based on the information from the CHARMM PSF output file.  mCit-β-arrestin2-mTurq was analogously cloned as described for similar reference constructs 558 in Dorsch et al. 46 The M 3 R mutants were generated from these plasmids by mutagenesis 559 using the following primers:   Single-Cell FRET Imaging 608 A FRET-based assay was used to measure the agonist-induced interaction between β-arrestin2-609 mTurq and M 3 R-mCit wt/mutant sensors. 49 The measurements were performed as previ-  The measurements were additionally baseline-corrected for photobleaching, using Origin Pro For all class A and class B1 structures, residue contact maps were calculated only considering 666 the transmembrane portions. A contact was defined to occur when the distance between any 667 two atoms of two distinct residues was smaller than the sum of their van der Waals radii plus a 668 buffer distance of 0.5Å. In order to prevent sampling of local contacts which might introduce 669 noise at later stages of the analysis, contacts between residues less than four positions apart 670 in sequence and where one of the atoms involved was in the backbone were not considered.

672
In order to describe the residue contact distribution of each GPCR in a simplistic manner, 673 a class-specific contact fingerprint was calculated for every structure. First, for each class of 674 GPCRs, the set of all residue-residue contacts that occurred in at least one of its members 675 was compiled. Only contacts that were observable in all activation states were considered. 676 Further, this set of contacts was treated as a fingerprint in which the individual bits were set 677 to either 1 or 0 depending whether or not a particular contact occurred in a structure. Finally, 678 for each of our 557 structures, the appropriate class-specific fingerprint was calculated, the 679 aggregate of which was then used to determine activation networks.  For both classes, the PCA coefficients were used in order to estimate the importance of a 702 contact for a certain receptor state. Here, we focused on those contacts that were formed 703 between residues in KS2, KS5, KS8, OS5, OS6 and OS9. Contacts between two residues 704 that belong to the same helix were not considered. By investigating the re-calculated PCA 705 plots ( Fig. 2 and S2), each contact was considered either as an active or inactive contact 706 depending on the sign of its corresponding PCA coefficient. The PCA coefficients were 707 normalised to their highest absolute value such that they ranged from 0 (not important) to 708 ± 1 (important). For each pocket, the residue contacts together with their normalised PCA 709 coefficient were plotted. The datasets generated and analysed during the current study (e.g. aligned receptor struc-740 tures) are available from the corresponding author on reasonable request.

741
Code Availability

742
The structure retrieval pipeline is available from the corresponding author upon reasonable 743 request. SEED 4.1.2 is available from http://www.biochem-caflisch.uzh.ch/download.