## Introduction

Despite billion-dollar investments in research and development, the process of approving new drugs remains lengthy and costly due to high attrition rates1,2,3. Failure is common because the models used preclinically—which include computational, traditional cell culture, and animal models—have limited predictive validity4. The resulting damage to productivity in the pharmaceutical industry causes concern across a broad community of drug developers, investors, payers, regulators, and patients, the last of whom desperately need access to medicines with proven efficacy and improved safety profiles. Approximately 75% of the cost in research and development is the cost of failure5—that is, money spent on projects in which the candidate drug was deemed efficacious and safe by early testing but was later revealed to be ineffective, unsafe, or otherwise of limited commercial value during human clinical trials. Pharmaceutical companies are addressing this challenge by learning from drugs that failed and devising frameworks to unite research and development organizations to enhance the probability of clinical success6,7,8,9. One of the major goals of this effort is to develop preclinical models that could enable a “fail early, fail fast” approach, which would result in candidate drugs with greater probability of clinical success, improved patient safety, lower cost, and a faster time to market.

There are important practical challenges in ascertaining the predictive validity of new preclinical models, as there is a broad diversity of chemistries and mechanisms of action or toxicity to consider, as well as considerable time needed to confirm the model’s predictions once tested in the clinic. Consequently, arguments for the adoption of these new models are often based on features that are presumed to correlate with human responses to pharmacological interventions—realistic histology, similar genetics, or the use of patient-derived tissues. But even here there is a common problem in much of the academic literature: the important model features are chosen post hoc by the authors and not prospectively by an independent third party that has expertise in the therapeutic problem at hand10.

The Innovation and Quality (IQ) consortium is a collaboration of pharmaceutical and biotechnology companies that aims to advance science and technology to enhance drug discovery programs. To further this goal, the consortium has described a series of performance criteria that a new preclinical model must meet to become qualified. Within this consortium is an affiliate dedicated to microphysiological systems (MPS), including Organ-on-a-Chip (Organ-Chip) technology, which employs microfluidic engineering to recapitulate in vivo cell and tissue microenvironments in an organ-specific context11,12. This is achieved by recreating tissue-tissue interfaces and providing fine control over fluid flow and mechanical forces13,14, optionally supporting interactions with immune cells15 and the microbiome16, and reproducing clinical drug exposure profiles17. Recognizing the promise of MPS for drug research and development, the IQ MPS affiliate has provided guidelines for qualifying new models for specific contexts of use to help advance regulatory acceptance and broader industrial adoption18; however, to date, no published studies have carried out this type of performance validation for any specific context of use or demonstrated an MPS capable of meeting these IQ consortium performance goals.

## Discussion

Numerous authors have argued that Organ-Chip technology has the potential to substantially improve drug discovery and development55, but although many major pharmaceutical companies have already invested in the technology, routine utilization remains limited56. This may be due to several factors, including the absence of end-to-end investigations showing that Organ-Chips replicate human biological responses in a robust and repeatable manner; demonstrations that Organ-Chip performance exceeds that of existing preclinical models across a suitably broad set of compounds; and illustrations of ways to implement the technology within routine preclinical workflows. Furthermore, the broader stakeholder group, especially budget holders, needs assurance that there will be an attractive return on investment and an increase in R&D productivity that may mitigate the pharmaceutical industry’s widely documented productivity crisis57,58,59. This study aims to address these four concerns.

Here we report the systematic evaluation of the validity of Organ-Chips for DILI prediction against criteria designed by a third party of experts. To our knowledge, no other MPS has been evaluated against 27 small-molecule drugs in a single study involving three different human donors and hundreds of chips, making this the largest reported evaluation of Organ-Chip performance. In this evaluation, the Liver-Chip correctly distinguished toxic drugs from their non-toxic structural analogs, and, across a blinded set of 27 small molecules, it displayed a true positive rate of 87%, a specificity of 100%, and a Spearman correlation of 0.78 against the Garside DILI severity scale when two donors are used and data are corrected for protein binding. Importantly, these data were independently verified by two external toxicologists. Put differently, the Liver-Chip detected nearly 7 out of every 8 drugs that proved hepatotoxic in clinical use despite having been deemed to have an appropriate therapeutic window by animal models; it similarly detected 2 out of 4 such drugs that were additionally missed by 3D hepatic spheroids. We therefore believe that these findings support the routine use of the human Liver-Chip in drug discovery programs to enhance the probability of clinical success while improving patient safety. This would be achieved by more accurately categorizing the risk associated with a candidate drug, providing valuable data to support a ‘weight-of-evidence’ argument both for entry into the clinic and for the starting dose in Phase I. Such added evidence could potentially remove any safety factor applied because of a liver finding in an animal model60,61. In turn, this would reduce overall cost and time in the preclinical development process.
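As a quick illustration of how the headline sensitivity and specificity figures are derived, the sketch below computes both from confusion-matrix counts. The toxic/non-toxic split and the individual counts here are hypothetical, chosen only so that they sum to 27 drugs and roughly reproduce the reported rates; the actual counts come from the study data.

```python
# Hypothetical confusion-matrix counts for illustration only; the true
# toxic/non-toxic split of the 27 blinded drugs is not given in this excerpt.
tp, fn = 13, 2   # clinically hepatotoxic drugs flagged / missed by the chip
tn, fp = 12, 0   # non-toxic drugs correctly cleared / falsely flagged

sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate

print(f"sensitivity = {sensitivity:.0%}")   # -> sensitivity = 87%
print(f"specificity = {specificity:.0%}")   # -> specificity = 100%
```

With these assumed counts, 13/15 toxic drugs detected rounds to the 87% true positive rate quoted above, and zero false positives yield the 100% specificity.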

A unique feature of this work is the demonstration of the throughput capability of Organ-Chip technology using automated culture instruments: a total of 870 chips were created and analyzed. To establish an effective workflow, scientists were divided into three teams. The first team prepared the drug solutions and supplied them in a blinded manner to the second team, which seeded, maintained, and dosed the Liver-Chips while carrying out various morphological, biochemical, and genetic analyses at the end of the experiment. The third team collected the effluents and performed real-time analyses of albumin and ALT as well as terminal immunofluorescence imaging using an automated confocal microscope (Opera Phenix; Perkin Elmer). In this manner, we were able to analyze and report the hepatotoxic effects of 27 drugs in 870 Liver-Chips using cells from three human donors over a period of 20 weeks.

Based on this experience, we believe that the Liver-Chip could be employed in the drug-development pipeline during the lead optimization phase, where projects have identified three to five chemical compounds with the potential to become the candidate drug (Fig. 5). If a chemical compound produces a toxic signal in the Liver-Chip, this indicates to toxicologists that there is a high (~87%) probability that the compound would similarly cause toxicity in humans. This, in turn, would enable scientists to deprioritize such compounds from early in vivo toxicology studies (such as the maximum tolerated dose/dose-range-finding study) and, consequently, reduce animal usage and advance the “fail early, fail fast” strategy. Importantly, the absence of false positives strengthens the argument that Liver-Chips should also be adopted within the early discovery phase, as stopping drug candidates that are falsely determined to be toxic by less-robust preclinical models could result in good therapeutics never reaching patients.

Despite these positive findings, it should be acknowledged that the current chip material, polydimethylsiloxane (PDMS), used in the construction of the Liver-Chip may be problematic for a subset of small molecules that are prone to non-specific binding. Although this study demonstrates that material binding does not, in practice, greatly reduce the predictive value of the Liver-Chip DILI model, work is currently underway to develop chips from materials with lower binding potential. Until such a chip is available, we recommend that users assess potential PDMS binding by dosing an acellular chip and measuring drug concentrations in the effluent by LC/MS, enabling adjustment of the workflow if required. It should also be recognized that many pharmaceutical companies have diversified portfolios, with only 40–50% of candidates now being small molecules. Consequently, further investigation of Liver-Chip performance against large molecules and biologic therapies should be carried out. Integration of resident and circulating immune cells should add even greater predictive capability.
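The recommended acellular-chip check reduces to a simple recovery calculation: compare the drug concentration measured in the effluent against the nominal dosed concentration, and scale the dose up if losses to PDMS are large. The concentrations and the 80% acceptance cutoff below are hypothetical placeholders for illustration, not values from the study.

```python
# Minimal sketch of a PDMS-binding check on an acellular chip.
# All numbers are hypothetical; real values would come from LC/MS.
dosed_conc_uM = 10.0      # nominal concentration in the dosing solution
effluent_conc_uM = 6.5    # concentration measured by LC/MS in the effluent

fraction_recovered = effluent_conc_uM / dosed_conc_uM

if fraction_recovered < 0.8:   # arbitrary acceptance cutoff (assumption)
    # Scale the dosing solution so cells still see the intended exposure.
    adjusted_dose_uM = dosed_conc_uM / fraction_recovered
    print(f"compensate: dose at {adjusted_dose_uM:.1f} uM")
else:
    print("binding loss acceptable; dose nominally")
```

Here 65% recovery falls below the cutoff, so the dosing solution would be scaled to about 15.4 uM to offset the loss; in practice the correction could also be applied analytically, as was done for protein binding in this study.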

Finally, predictive models that demonstrate concordance with clinical outcomes should provide scientists and corporate leadership with greater confidence in decision making at major investment milestones. Our economic analysis revealed that supplementing existing preclinical models with human Liver-Chips for the prediction of small-molecule DILI could have a substantial economic impact, with broad adoption of the technology potentially generating an estimated $3 billion annually across the industry through improved R&D productivity. Moreover, the analysis illustrates that the productivity gain could potentially extend to an estimated $24 billion annually if four additional Organ-Chip models are used to address the most common toxicities that result in drug attrition and those Organ-Chips demonstrate a level of performance similar to that of the Liver-Chip. Taken together, these results suggest that Organ-Chip technology has tremendous potential to benefit drug development, improve patient safety, and enhance pharmaceutical industry productivity and capital efficiency. This work also provides a starting point for other groups that hope to validate their MPS models for integration into commercial drug pipelines.