Introduction

It's shocking to realize that with all the advances in our molecular understanding of cellular processes, with all the new potential drug targets being discovered, and with all the new chemistries and screening systems available, there has still been no real increase (in fact, there has been a significant decrease) in the success rate for new drugs as they move through clinical trials to gain US Food and Drug Administration (FDA) approval (refs. 1,2). The published statistics are dismal, leading some to call for a radical shift in how we incentivize pharmaceutical companies as they decide which drugs to put through clinical trials (ref. 3). Of the many thousands of drugs that go into preclinical testing in animal models, only a few make it to testing in humans. Of every five drugs that enter phase 1 trials, a large portion are knocked out in phase 1 itself, and only one will finally achieve FDA approval (refs. 1,2). Those drugs that make it past approval face the potential burden of toxic side effects that are discovered only when the drugs are broadly 'tested' in the normal human population, and such drugs might even be withdrawn if the side effects are severe enough, as was the case with rofecoxib (Vioxx) (ref. 4).
On the bright side, such a low success rate might be viewed as the result of a vigilant regulatory process that carefully examines drugs from every vantage, ensuring the benefit-to-harm ratio for every drug adds up to more people getting better than suffering grievous outcomes. However, one is left with the nagging sensation that although the regulatory process is largely playing its appropriate role (please, no more regulation!), it is the input quality of the drugs at the start of the pipeline, combined with a lack of appropriate monitoring of drug effects during clinical trials, that is somehow at fault.
For the sake of wanting to start an argument—diplomat I am not—I am going to argue for a fundamental revamping of how drug screening is conducted and how drugs are assayed during preclinical and early clinical development. The current standard assay systems for drug screening, though often successful in the laboratory, are ultimately a source of discouragement and dead ends during development and clinical trials. I believe that huge advances in drug quality at the start of the pipeline could be achieved with a concerted focus on an early understanding of systematic drug effects in primary cells, and, critically, against networks (systems) of multiple proteins on a per-cell basis. The sooner drug screening moves from screening with purified proteins in vitro or with artificial cell lines on to understanding of drug effects in primary cells by assaying multiple markers concurrently, the better off the entire industry will be. The approach that I am positing here will have the added benefit of allowing an early look at drug pharmacodynamics and drug efficacy in cell populations with different disease markers during initial drug screening. With this clinically relevant information in hand from early-stage research, the progress of a drug can be monitored with those same assay systems (again, against a network of proteins) during the clinical trial and possibly after approval.
In the beginning...
We don't make drugs to save the lives of cell lines or to better the existence of bacterial extracts filled with overexpressed kinases. Why, then, do we screen our drug candidates against them? As technologies increase in sensitivity and precision, the question arises: when will we stop cheating and start the business of screening drugs in more relevant and physiological settings? Why wait until an expensive phase 1 or phase 2 trial? Perhaps the appropriate place to start this discussion is to look at where drug screening has come from, and then move to the direction screening should take in the future.
Not too long ago drug screening moved from a shamanistic ritual of trial and error in the jungle (or the vicar's herb garden) to the modern targeted approach, when drugs were discovered to act against knowable molecular targets, as proposed essentially by Paul Ehrlich in the nineteenth century (in his use of dyes for staining of bacterial and human cells). Drugs were found to bind to specific sites that antagonized, agonized or completely modified the function and shape of proteins, RNA or DNA. From this point, the current process of searching through large libraries of drug candidates to find an appropriate shape that fits into the right pocket on a known target emerged as the dominant strategy for identifying new drugs. The race was on to create the most diverse sets of shapes with chemically malleable backbones that could be tailored to have the 'appropriate' pharmacologic properties (uptake in the gut, decent half-life in the body, ability to cross cell membranes or the blood-brain barrier, no metabolism into nasty new compounds), among many possibilities (ref. 5). As a result, most large and small pharmaceutical companies have developed in-house libraries of millions of compounds that are either synthetic or culled from natural sources. The objective of these libraries is to cover as much of the 'shape-space' of chemical opportunities as possible in any screen, while minimizing the risk of duplicating chemical screening effort. The goal has always been to create enough diversity to get a drug 'hit', which is then iteratively optimized to achieve greater binding capacity and better pharmacologic properties, and to minimize the side effects. Of course, with that premise it's very easy to become a reductionist. How do you best measure the binding strength and inhibitory capacity of a molecule? "With purified enzyme," says the Biochemist: no messy cells or mice, please, and once we ramp up we can do the tests in a few nanoliters on a zillion-well plate!
Sounds good, and great leads are produced and optimized this way, don't get me wrong. But the genetic maxim always applies: in any screen you get exactly what you ask for, which is not always what you want. For instance, you might get from a large screen a set of drug candidates that work great at target inhibition in vitro, but the qualities that give the drug marvelous properties in a cell-free extract might also turn out to be qualities that prevent the compound from ever crossing a cell membrane, or that make the compound bind to serum albumin in the blood so tightly as to take it out of action, or that make the drug bind to a bystander protein in the wrong cell type, thereby wreaking unexpected toxic havoc. While these statements come as no surprise to anyone who is well versed in drug screening, they set up an important question: where have people gone from here? Is drug screening just about purified proteins and extracts? Can we exploit 'real' cells in drug screening?
The utility of single real cells
It would be great to have a working 'system' that completely mimics the complexity of the human body to test our drug candidates against. The best real thing we have is human beings, and clearly it is not practical to do screening with them. The next best things are chimpanzees, then rats or mice, then mice with human organs or cells implanted in them, then tissue-culture-based primary normal or diseased cells or organs from humans, then cell lines derived from the latter, then reconstituted biochemical systems, and finally proteins produced in bacterial, insect or mammalian cells from characterized genes or gene fragments. With the pros and cons associated with all of these choices, that's a lot to consider. Even the best model (humans) isn't perfect because no single human with a given disease is a match for a complex human population containing multiple ethnicities, disease subtypes and desired outcomes.
But if you look at the choices closely, there's a common objective beneath the surface: model the system sufficiently well to get a decent answer with the most practical use of available resources. Ideally, the objective achieved in the most reductionist system (the bacterially produced protein) should faithfully reflect the desired outcome in the most complex (the human). The smallest unit of a reasonably living/breathing complex system is the single cell. As such, single cells are a great place to look for the effects of a drug, its ability to get into the cell and its lack of toxicity against housekeeping processes common to normal cells. The kind of single cells one uses of course depends on the disease and character of the outcome desired. And once you stop thinking of single cells as homogeneous entities, you realize that populations of cells might contain many different subtypes of cells, or at the very least individual cells are often in different phases of the cell cycle. In addition, do you screen against normal cells or do you screen against diseased cells? Do you use a cell line that last saw a human body 5,000 cell divisions ago and has lived through multiple freeze-thawing events, or do you use cells recently acquired from a patient?
Of course, such questions lead one to conclude that cells recently acquired from a normal or diseased person are likely to be the most relevant. Cells from patients can be limiting, of course, which leads to issues about technology and sensitivity (discussed below). This affects the size of the library that might be screened: more available cells means a larger library to be screened; a smaller cell sample set might mean the screening must be done against cells from a mouse model, a cell line or a bacterial extract, but at least starting a large-scale screening in the latter allows a secondary screen against a more physiologically relevant target type, such as hard-to-obtain human cells. And, in the case of the immune system, for instance, working with primary cells derived from patients allows for complex cell populations composed of multiple cell types to be screened simultaneously.
It's obvious that primary cells are more physiological. But is there a common utility that we can exploit about primary cells that makes them even more valuable than just serving as a place to express a target protein? The focus in primary drug screens has most often been maniacally focused on the protein target, with not enough thought about how the immediate neighbors in the signaling pathway might be used to provide additional input on the action of the drug. As such, it is not just the immediate target of the drug's activity that should be measured, but several outcomes that can give in finer detail a report of the drug's activity in clinical practice. The argument then becomes, in a positive sense, circular: if one is going to measure several things to determine the viability and utility of the drug in the later stage, why not use as many of these measurements as possible in the initial stages of the screening to determine the function and activity of the drug?
For certain kinds of screens and cellular processes, such as screens against signaling networks, a uniform system can be created that measures multiple points in a complex, interconnected system in a common assay type, and that can be applied from early cell-based screening approaches all the way through clinical trials and in diagnostic utility after approval. The goal is simple: get to the primary cell type as quickly as possible and use as many simultaneous assays per cell as you can handle. A screening specialist might understand this intuitively, yet still confuse simultaneous measurements with parallel assays. Newer screeners, especially those in academic labs with institutional screening operations nearby, can be persuaded to this way of thinking early on. A dozen parallel measures accomplished in a dozen independent assays on 100 cells per assay (1,200 cells measured) have far less information content than a dozen measures accomplished per cell on 100 cells (100 cells assayed in a correlated manner). It is the correlated measurement on a per-cell basis that gives such approaches real power (for an example of this, see refs. 6,7).
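The difference between parallel and correlated measurement can be made concrete with a toy simulation. The sketch below is only an illustration under stated assumptions: the cell population, the two markers (A and B), the 'responder' subtype and all numeric values are hypothetical and are not taken from any real assay. It shows that when two markers are read on the same cells, their per-cell relationship is recoverable, whereas two separate assays on separate cell samples yield only marginal distributions, so any pairing between them is arbitrary.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_cells(n, rng):
    """Hypothetical population: every other cell is a 'responder' in which
    markers A and B are both high; the remaining cells have both low."""
    cells = []
    for i in range(n):
        base = 10.0 if i % 2 == 0 else 1.0
        cells.append((base + rng.gauss(0, 0.5), base + rng.gauss(0, 0.5)))
    return cells


rng = random.Random(0)

# Correlated measurement: both markers are read on the same 100 cells.
same_cells = simulate_cells(100, rng)
r_correlated = pearson([a for a, _ in same_cells], [b for _, b in same_cells])

# Parallel measurement: marker A is read on one sample of 100 cells and
# marker B on a different sample. No cell-to-cell pairing exists, which is
# modeled here by shuffling one list before comparing.
a_parallel = [a for a, _ in simulate_cells(100, rng)]
b_parallel = [b for _, b in simulate_cells(100, rng)]
rng.shuffle(b_parallel)
r_parallel = pearson(a_parallel, b_parallel)

print(f"per-cell correlation, same cells:     {r_correlated:.2f}")
print(f"apparent correlation, parallel assay: {r_parallel:.2f}")
```

In this toy model the same-cell measurement reveals that A and B rise together in one subpopulation, while the parallel assays, despite consuming twice as many cells, say nothing reliable about that relationship.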
It's all about the network
What a network concept really means is that we can use one, or multiple, protein signaling elements of a connected system to predict the state of the total system. What you need is sufficient knowledge about how changing one or more protein activities affects the rest. You might not know precisely what all the proteins are doing by measuring a fraction of them, but you can do a lot of decent prediction by measuring several proteins that 'attach' to the target of the drug via a network and that have connectivity to the desired outcome. If you are measuring all of these events in a single cell, or in a correlated manner in many thousands of cells, you can get deep insight even into how the signaling network responds to drug perturbation in many different cell subtypes simultaneously.
But how does measuring many things in a cell tell us about the state of the cell? One way to think of this is to imagine several spiders, say, sitting on a spiderweb, and picture how a newly introduced insect tugging on the web strands can be felt by the several spiders on the web. As the insect pulls and struggles in the web, it distorts the web's shape (Fig. 1). If we wanted to save the bug and get the web 'back to normal', we could try to remove the bug by force (tweezers), but this might similarly distort the web and attract the attention of the spiders. So, what do we measure? Do we measure the location of the bug as an indicator of the system state, or do we measure the spiders? Where are the spiders and the bug in three-dimensional space on the web? What is their general state of agitation? Knowing these things can allow us to predict the shape of the web, and in many cases we can also predict where else on the web other spiders might lurk. Now, knowing this, and knowing what a normal spiderweb should look like absent the invading bug, we can screen for drugs that return the entire network to a state of normality. Essentially, we can measure not just the target locus itself on which the bug lands, but also the general state of where all the spiders sit, and we can use this information to better understand whether the state of the web is back to normal.
Figure 1: A tangled web we weave.
A change at the center of a complex network leads to changes felt elsewhere in the network. Dysregulated signaling in the form of a diseased signaling protein (shown in this model as an insect landing in the center of the web network) leads to changes elsewhere in the network. Measuring multiple network elements that respond to those changes (in this case spiders that sit on the web node interfaces) can provide significant information about the state of the web network. Inappropriately attempting to act on the target diseased node can further distort the web network and cause side effects, for instance. Determining whether the problem is 'fixed' by suitable application of a drug to the target leads to reversion of the web to a normal shape, as measured through the correct placement of the web elements in the form of the spatial locales of the spiders at the web node interfaces.
Flipping back to drug screening reality, measuring several downstream or connected molecules in a complex signaling network can allow us to accurately detail the shape of the network and assess how close to normal it is with regard to a drug's desired action (Fig. 1). The goal, therefore, in drug screening today should be to detail nearby and distantly connected players in the network to assure not only that the drug has the inhibitory or stimulatory outcome immediately proximal to the target's action (for example, inhibiting a kinase leads to lack of phosphorylation on a hypothesized target of the kinase), but also that other nodal elements on the net are appropriately repositioned to indicate that, as much as possible, we have recreated a normal state of affairs in the network topology.
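One way to picture 'how close to normal' a network is would be a simple distance score across several markers. The following is a minimal sketch, not a description of any published method: the marker names (pTarget, pNeighborA, pNeighborB), the reference cohort values and the choice of a root-mean-square z-score are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical phospho-markers: the drug target plus two network neighbors.
MARKERS = ["pTarget", "pNeighborA", "pNeighborB"]


def reference_profile(normal_samples):
    """Per-marker mean and standard deviation across a 'normal' cohort."""
    profile = {}
    for marker in MARKERS:
        values = [sample[marker] for sample in normal_samples]
        profile[marker] = (mean(values), stdev(values))
    return profile


def distance_from_normal(sample, profile):
    """Root-mean-square z-score of a sample relative to the normal profile."""
    squared = [((sample[m] - mu) / sd) ** 2 for m, (mu, sd) in profile.items()]
    return sqrt(sum(squared) / len(squared))


# Made-up illustrative values (arbitrary fluorescence units).
normal_cohort = [
    {"pTarget": 1.0, "pNeighborA": 1.0, "pNeighborB": 1.0},
    {"pTarget": 1.2, "pNeighborA": 0.9, "pNeighborB": 1.1},
    {"pTarget": 0.8, "pNeighborA": 1.1, "pNeighborB": 0.9},
]
profile = reference_profile(normal_cohort)

diseased = {"pTarget": 3.0, "pNeighborA": 2.0, "pNeighborB": 0.4}
# Candidate X normalizes the target but leaves the neighbors distorted.
drug_x = {"pTarget": 1.1, "pNeighborA": 1.8, "pNeighborB": 0.5}
# Candidate Y brings the whole network back toward the normal profile.
drug_y = {"pTarget": 1.1, "pNeighborA": 1.05, "pNeighborB": 0.95}

d_diseased = distance_from_normal(diseased, profile)
d_x = distance_from_normal(drug_x, profile)
d_y = distance_from_normal(drug_y, profile)
print(f"diseased: {d_diseased:.2f}  drug X: {d_x:.2f}  drug Y: {d_y:.2f}")
```

A readout restricted to pTarget would rate the two candidates as equally good; only the multi-marker score separates the one that fixes the target from the one that restores the web.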
Network assays present and future
If the goal is to obtain correlated measurements on multiple primary cells and to relate these data to drug action and disease outcome, and perhaps also use them as a monitoring device in clinical practice, the question becomes: how do we get there? I always start in my design of any experiment with the ideal goal, determine what's the practical outcome that best approximates that goal, and then think backwards to where I am today, the intermediate steps I need to take, and the tools I have at my disposal. Most of the time, the key tools I need do not even exist, so I iterate the process, now with the tool I need as the goal, and determine how I design and create the tool. I then apply that to each tool and each step in the process. After enough iterations like this, and time, a plan emerges. It's likely that along the way I have developed some fancy tools that can be applied effectively to other problems. Essentially, as with organic syntheses of complex small molecules, this could be considered a retrosynthetic approach to tool development.
For our laboratory this was the case with the development of phospho-flow cytometry for measuring the activities of kinases and phosphatases in single primary cells (refs. 8,9). Flow cytometry sets the bar high for quantitative measures of fluorescence in many thousands of cells per second. Thirty years of academic and clinical development have made the technology widespread, useful, and robust in its applications. We imagined an end goal of measuring multiple phosphoproteins on a per-cell basis in primary cells (Fig. 2). We essentially took antibodies developed originally for western blot, attached them to fluorescent dyes, and used them to detail the phosphorylation states of multiple proteins on a per-cell basis in primary cells from mouse models and patients (ref. 8). For us, the key aspect that led to significant insight was using evoked or potentiated cell signaling through stimulation to force cells to initiate signaling events (ref. 6).
Figure 2: Multicomponent network screening from discovery to clinic.
Starting from primary screens, the initiative should be to forswear the use of in vitro assays as soon as practical, with an emphasis on moving to whole cells, and where appropriate, primary cells. Multicomponent measurements in the form of specific signaling biomarkers in networks should be chosen to provide sensors on the target itself and on the state of the network in the cell. Emphasis should be placed on simultaneous measures per cell, not parallel measures, to enhance network derivations. As assays move to clinical studies, it is appropriate to start to use the most informative subsets of markers in relevant cell subsets to drive the generation of diagnostics that can reflect pharmacodynamic or patient stratification strategies in phases 1 to 3 and for post-approval diagnostics strategies.
Why is this of consequence? Of course, useful knowledge is gained by simply measuring the basal phosphorylation states of cells out of the body, but by measuring evoked signaling one adds to this a wealth of connectivity knowledge about the multiple members of the signaling network being analyzed. Different evocations (cytokines, drugs) lead to different phosphorylation outcomes. With application of appropriate algorithms, everything from derivation of the signaling network to correlation of network status with drug outcomes in patients can be achieved (refs. 6,7).
What is most useful about phosphorylation is that kinases, and their counterparts, phosphatases, are exquisitely sensitive to cellular states. No matter whether it is metabolism, damage, reaction to drugs, cell division, cell movement, or some other state, some of that information is reflected by phosphorylation, and it will be processed via kinases and phosphatases. Thus, though they are not the targets of all or even most drugs, the activities they have by virtue of their phosphorylation of thousands of different epitopes within cells are the most lasting cellular state that can be readily measured (they can be measured for years if the cells are frozen appropriately). The phosphorylation state of these target proteins holds further value in that it is often correlated with a functional state of the protein on which it is carried. So, for the criteria we set above, measurement of multiple states in a network seems to mesh perfectly with kinases and phosphoproteins: they talk to each other via a common tag (phosphorylation), are closely and rapidly correlated to cell states of interest, and are readily measurable in primary cells from patients. Flow cytometry sets the standard for high-throughput quantitative measurement of many events per cell (up to 18–20 for high practitioners of the art). Together the tools are irresistible (to me at least). Now, there are other cytometric technologies out there that are great at measuring, but most static imaging systems and microchannel-based systems are still sorely lacking in their ability to achieve more than two or three colors per cell, and their throughput and quantitation are variable. Of course, there are many more events in cells that are critical to cellular function. Protein-protein binding, protein location in the cell over time, and other secondary modifications to cellular constituents (such as sumoylation, methylation and sugar additions) are important for getting a crucial snapshot of the cell.
Unfortunately the technology for measuring these events at the single-cell level is either fully lacking, reduced to one-off proof-of-concept measurements, or based on highly customized fluorophores that react to singular proteins. Gene expression arrays are great for measuring thousands of genes, and those genes are arguably tightly linked (through signaling networks), but the need to lyse thousands of cells limits the approach to a parallel measure of multiple events in an averaged lysate of cells rather than a correlated set of multiple events measured in each cell. This is true for mass spectrometry measures of all kinds of cellular material as well. So, for the time being, phosphorylation suffices as a great measurement system of cellular history and potential. In the near future it might be possible to carry out mass spectrometry on single cells, or to use imaging platforms that report on multiple cellular states in live primary cells with novel indicators (developments in this area are being made). But for the time being, measuring multiple events per cell above five or six colors remains the province of flow cytometry.
Integrating network biomarkers
To screen for a drug that creates a certain cellular state, it's best not to rely on a single output to measure that complex state. On an assumption of connectivity in a network one can measure the most likely downstream (for instance) phosphorylated targets of the enzyme or receptor targeted by the drug, as well as multiple proteins that coexist in nearby or compensatory pathways. One can set as a standard what a 'normal' cellular state should be (by surveying a cohort of 'normal' individuals) or define a compensatory state of how the drug should 'fix' a set of cells from diseased samples, and screen by this approach with primary cells. Now, that said, the cells I prefer to work with are immune system cells—for the obvious benefit that it is easy to dissociate this tissue into individual cell constituents for staining and flow cytometry. Appropriate application of dissociation techniques has resulted in phospho-specific measurements of solid tumor cells as well as their infiltrating immune system cells (E. Danna and G.P.N., unpublished data). Although our approach has easy applications in liquid cancers, studies of the immune system, and some solid tumors, I will bow to the caveat that not all cell types are perfectly amenable to this system by flow cytometry (neural cells might best be surveyed by static imaging systems, for instance). The benefit of defining early what multiple associated markers might correlate with drug action, or even with patient outcomes, has immediate utility in the clinical setting as a drug passes the hurdles of preclinical testing and reaches phase 1 or phase 2 clinical testing. The same markers, the same techniques, and the same cells can be used to determine whether or not the results seen in vitro for screening hold up in the patient. Is the desired network state achieved? If not, is the network state different, and if so, how different? And, what network state or states correlate with drug efficacy or toxicity? 
As a simple pharmacodynamic monitoring tool, a network topology measurement could be used to determine whether or not the drug leads to a titratable effect or whether a threshold is achieved across each member of the network. At the end of the day, the marker sets used in the clinical trials could even be used to prescreen patients for the patient subgroups for inclusion and exclusion criteria. As the drug moves forward, the initial hypothesis of screening is verified, the preclinical results back this up (assuming requisite measurements), and clinical studies with the same biomarker measurements in phases 1 to 3 can be used to move the drug forward as evidentiary for approvals. Finally, obviously, a list of exclusion/inclusion criteria is an ideal companion diagnostic for a drug's application after FDA approval.
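The distinction between a titratable effect and a threshold effect can be illustrated with a crude heuristic applied to each network member's dose-response readings. This is a hedged sketch only: the marker names, the readings, the 0.7 cutoff and the 'largest single step' rule are hypothetical choices for illustration, not an established pharmacodynamic method.

```python
# Hypothetical cutoff: if a single step between adjacent doses accounts for
# at least this share of the total change, call the response threshold-like.
THRESHOLD_CUTOFF = 0.7


def largest_step_fraction(responses):
    """Share of the total dose-to-dose change contributed by the largest
    single step. Returns 0.0 when the response does not change at all."""
    steps = [abs(b - a) for a, b in zip(responses, responses[1:])]
    total = sum(steps)
    return max(steps) / total if total > 0 else 0.0


def classify(responses):
    """Label a dose-response series as 'threshold' or 'titratable'."""
    if largest_step_fraction(responses) >= THRESHOLD_CUTOFF:
        return "threshold"
    return "titratable"


# Made-up marker readings at five increasing drug doses (relative units).
dose_response = {
    "pTarget": [1.00, 0.80, 0.60, 0.40, 0.20],     # falls steadily with dose
    "pNeighborA": [1.00, 0.98, 0.95, 0.15, 0.12],  # drops abruptly at one dose
}

calls = {marker: classify(values) for marker, values in dose_response.items()}
for marker, call in calls.items():
    print(f"{marker}: {call}")
```

Applied per marker, a summary like this would flag network members that respond in proportion to dose separately from those that switch state only once a critical dose is reached.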
Though all of the above makes sense, I have been surprised by the lack of uniform application of biomarker assays across clinical testing by many drug companies, large and small. The reasons are varied and often come down to the need to maintain timelines, the difficulty in getting access to clinical samples and the existential push to just keep moving because everyone else is moving. I would posit that the most useful approach would be to start integrating these kinds of screens as early as possible (most practically as a secondary or sublibrary scaffold screen if one is doing screening by single-cell flow-based screening; ref. 10). By starting early it then becomes an intuitive imperative to continue with the subset of markers that is most relevant as the drugs move forward through preclinical and on to clinical trials. It might be too late to apply this notion to older drugs already well along in the process, but for new drugs, an integrated approach will likely serve up huge benefits.
Philosophical meanderings
I believe that it is time to stop taking the easy way out in drug screening and that there are many assay types beyond my provincial favorite (single-cell flow cytometry) to apply to primary cells that can be used in a clinical setting. The percentage of successful drugs coming through the pipeline has not improved, and the budgets for pushing them through clinical trials are not expanding enough to test more drugs, even if more passed the preclinical stages. I have argued that it is time to stop winning the small victories and start moving the use of clinical samples back toward larger-scale drug screening. And I believe it's time to start moving more mechanistic studies of cellular systems at the single-cell level forward into the clinic to truly use them as mechanistically relevant biomarkers of disease course and drug action. With such an approach, more fit candidate drugs will move out of the labs and into preclinical development, and with the right assays in the clinic we'll discard the suboptimal drugs faster, achieve earlier clinical focus through 'patient stratification', and effectively increase the success rate of drugs as they move through the system. And finally, I want to state emphatically that looking for a single-biomarker 'one size fits all' knockout punch is an elusive philosopher's stone for transmuting sera into gold. The HER2/neu-trastuzumab (Herceptin) biomarker stories are a dream to attain as a biomarker that relates symptoms to mechanism, and for most diseases they will remain a dream because such easy pickings are few and far between. The network, and measures of immediate drug impact through single cells, is this shaman's advice for the near term. And that's what I think, at least for now.
