We are Merck, a vibrant science and technology company. Science is at the heart of everything we do. It drives the discoveries we make and the technologies we create. The passion of our curious minds makes a positive difference to millions of people’s lives every day. In healthcare, we discover unique ways to treat the most challenging diseases, such as multiple sclerosis and cancer. Our life science experts empower scientists by developing tools and solutions that help deliver breakthroughs more quickly. And in electronics, we develop science that sits inside technologies and changes the way we access, store, process and display information. Everything we do is fuelled by a belief in science and technology as a force for good. A belief that has driven our work since 1668 and will continue to inspire us to find more joyful and sustainable ways to live. At Merck, we have curious minds and are dedicated to human progress.
Merck has invested in digital infrastructure and tools in the past five years; we are now focused on building and scaling an innovation engine that harnesses the power of data and digital to accelerate growth for ‘One Merck’ as a science and technology company. We focus on innovation as One Merck, bringing together the collective capabilities across our three business sectors, and identifying, prioritizing and implementing technical capabilities that all three business sectors need for future growth. Examples are the group data strategy and smart manufacturing. To enable new innovation opportunities across Merck, we apply an enterprise-wide view and seek to seed, de-risk and integrate transformative technologies that go beyond the strategic direction of any one business sector but can have future growth potential for Merck. To drive transformative technologies, we also leverage open innovation, collaboration, partnering, public funding and strategic investments through M Ventures, and drive the creation of new digital products and business models.
In this white paper we introduce some examples to explain how we leverage data and digital for innovation across the value chain, from early leads to the manufacturing of final products.
AI-powered drug discovery platform
Discovering drugs is a long, iterative process. Bringing a drug to market still takes on average more than ten years and costs over US$2 billion, and only about 10% of Phase I candidates make it to market [1]. In the early discovery process, it can take months (or even years) to design molecule libraries, synthesize them in the lab, and perform high-throughput screening to identify a potential candidate out of millions. The use of artificial intelligence (AI) and machine learning methods is having a profound impact on this early drug discovery process. However, implementing these new tools in everyday drug discovery is challenging because model predictions are often difficult to explain. To overcome this challenge, it is crucial to integrate generative models, property predictions, and computer-aided drug design (CADD) tools in a manner that ensures the reliability of their results. This is particularly important when building a next-generation platform for AI-powered drug discovery whose outputs are both novel and trustworthy.
AIDDISON is Merck’s AI-powered drug discovery platform that integrates molecule design, synthesis planning and direct sourcing of chemical building blocks to accelerate the journey from concept to clinic [2]. It is intended to help medicinal chemists with early-stage prediction of both manufacturability and drug-like properties for novel molecules. By combining generative design with predictive synthesis planning, the platform allows rapid identification of promising candidates and reduces the risk of late-stage failures. A typical user interface can be seen in Figure 1.
AIDDISON uses generative methods and machine learning models trained on experimentally validated ADMET data (absorption, distribution, metabolism, elimination and toxicity) to guide the search in ultra-large chemical spaces and the de novo design of ‘drug-like’ and synthetically viable compounds. AIDDISON also encompasses SA-space, a synthetically accessible chemical space of approximately 25 billion virtual compounds built on the Sigma-Aldrich catalogue of molecules that are readily available for purchase and on well-known, robust chemical transformation rules. AIDDISON has been developed mostly in-house. It uses decades of exclusive Merck drug discovery data to train predictive AI/machine learning models on drug properties and synthesis, while also connecting customers to our global supplier network for sourcing essential chemical building blocks. Additionally, to complete the drug discovery workflow in a single, user-friendly interface, the AIDDISON platform leverages best-in-class technologies from strategic partnerships, such as BioSolveIT for virtual screening or Cresset for 3D molecular docking.
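How a catalogue of purchasable building blocks and a handful of transformation rules expand into a large virtual space can be illustrated with a toy enumeration. This is only a sketch under stated assumptions: the building-block names, the two rules and the one-step limit are hypothetical placeholders, not the actual contents or construction method of SA-space.

```python
from itertools import product

# Hypothetical catalogue: building blocks grouped by reactive class.
BUILDING_BLOCKS = {
    "carboxylic_acid": ["acid_1", "acid_2", "acid_3"],
    "amine": ["amine_1", "amine_2"],
    "boronic_acid": ["boronic_1", "boronic_2"],
    "aryl_halide": ["halide_1", "halide_2", "halide_3", "halide_4"],
}

# Hypothetical rule table: each rule names the reactant classes it combines.
RULES = {
    "amide_coupling": ("carboxylic_acid", "amine"),
    "suzuki_coupling": ("boronic_acid", "aryl_halide"),
}


def enumerate_space(blocks, rules):
    """List every one-step virtual product as (rule name, reactant tuple)."""
    virtual_products = []
    for rule_name, reactant_classes in rules.items():
        pools = [blocks[cls] for cls in reactant_classes]
        for reactants in product(*pools):
            virtual_products.append((rule_name, reactants))
    return virtual_products


space = enumerate_space(BUILDING_BLOCKS, RULES)
print(len(space), "virtual products from",
      sum(len(v) for v in BUILDING_BLOCKS.values()), "building blocks")
```

Even in this toy case the number of virtual products grows with the product of the pool sizes for each rule, which is why a catalogue-sized set of building blocks can span billions of compounds that never need to be stored explicitly.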
As millions of people are waiting on the promise of new drugs and therapies to come to market, the goal is to leverage the power of AI and machine learning to accelerate drug discovery and reduce its cost. According to market research firm Bekryl [3], AI has the potential to offer over US$70 billion in savings for the drug discovery process by 2028. According to Gartner Research, by 2025 [4], more than 30% of new drugs and materials will be systematically discovered using generative AI techniques, up from zero today. While the potential is significant, to date, no new drug has been developed and approved based on fully AI-generated drug discovery. AI methods have helped discover, in part, about 19 drug candidates in early clinical trials [5], and the future is promising.
Further information on AIDDISON can be found in "AIDDISON: Harnessing generative AI to revolutionize drug discoveries", "Merck Launches First Ever AI Solution to Integrate Drug Discovery and Synthesis" and "Fast-Tracking Drug Discovery with an AI Boost".
From BayBE steps to giant leaps
The Bayesian Back End software (BayBE for short) is an AI-assisted experimental planner developed by Merck scientists across the entire group. The project has grown into an ecosystem that provides our researchers and product developers with new high-tech digital tools, speeding up development cycles and enabling innovative digital business ideas that were not possible before.
Every day, countless experiments are being performed at chemical, materials and pharmaceutical companies to develop their products, from screening reaction conditions to altering the composition of mixtures and layers to finding suitable process parameters. These and many more uses are united by one goal: finding the best outcome with as few trials as possible. Failing in this endeavour is costly, time consuming and produces undesired amounts of waste. In many cases, complex projects cannot even be tackled, inhibiting the field’s overall ability to innovate.
State-of-the-art methods are all too often limited to non-systematic (intuition-driven) or simple linear approaches (classical design of experiments, DOE), and practices vary widely between labs and divisions. BayBE tackles these issues by providing a digital toolset built on data-driven AI methods [6]. With BayBE, scientists can flexibly ask for recommendations and report results. The machine learning core can handle non-linear correlations, multiple targets and chemical information, and has many more features that address the issues described above.
However, at Merck, BayBE is not just a single codebase. It has become an entire ecosystem, developed in a true One Merck spirit with contributions from all sectors and group functions. The Python package can be used by data science experts and gives them the full power of all features. In addition, a REST API was built so that web and dashboard developers can use BayBE without worrying about deployment and computation. Lastly, BayBE powers BayChem, our self-service tool directly usable by wet lab scientists, which saw more than 300 unique users in 2023.
Thanks to the flexibility of the framework, BayBE is applicable to nearly all projects that perform iterative experimentation. By now, BayBE:
• Powers ~30 use cases within Merck, among them ACHM film development, optimal container design, EUV photoresist and OLED chemistry.
• Won benchmark comparisons against startup vendors and comparable open-source tools (see Figure 2).
• Enabled Merck’s first self-driving autonomous platform.
• Is part of released (Viscosity Reduction Platform) and in-development (Bioreactor Digital Twin, SYNTHIA, AIDD) Merck software products.
BayBE is Merck’s most popular open-source GitHub repository. It showcases our software and AI capabilities to a wide audience and helps to attract digitally native talent. We have also teamed up with the Acceleration Consortium to enable the self-driving labs of tomorrow.
BayBE has a set of features that makes it unique among the AI-driven experimentation planners available as open source from academic groups and industry consortia, or from software vendors. For example:
• Built-in chemical encodings to improve campaigns with chemical knowledge.
• Custom parameter encodings to improve campaigns with domain knowledge.
• Custom surrogate models for specialized problems or active learning.
• Fully de-/serializable objects for storing results in databases or for use in wrappers such as application programming interfaces.
• Hybrid search spaces (mixed continuous and discrete).
• Transfer learning to mix data from multiple campaigns and accelerate optimization.
• Comprehensive back-test, simulation and imputation utilities to find the best settings.
• A fully typed and hypothesis-tested code base for robustness.
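The loop of asking for a recommendation and reporting the result back can be sketched with a small Bayesian optimizer. The sketch below is deliberately not BayBE's API: the `ToyPlanner` class, its `recommend` and `add_measurement` methods, the Gaussian-process surrogate with a fixed kernel, the upper-confidence-bound rule and the `hidden_response` function are all assumptions chosen to keep the example dependency-free.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar settings."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def forward(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class ToyPlanner:
    """Hypothetical recommend/report planner over a discrete list of settings."""

    def __init__(self, candidates, noise=1e-4, kappa=2.0):
        self.candidates = list(candidates)
        self.noise = noise      # assumed measurement noise (also stabilizes the solve)
        self.kappa = kappa      # exploration weight in the acquisition rule
        self.X, self.y = [], []

    def add_measurement(self, x, y):
        self.X.append(x)
        self.y.append(y)

    def posterior(self, x):
        """Gaussian-process posterior mean and standard deviation at x."""
        if not self.X:
            return 0.0, 1.0
        K = [[rbf(a, b) + (self.noise if i == j else 0.0)
              for j, b in enumerate(self.X)] for i, a in enumerate(self.X)]
        L = cholesky(K)
        mean_y = sum(self.y) / len(self.y)
        alpha = backward(L, forward(L, [v - mean_y for v in self.y]))
        k = [rbf(x, a) for a in self.X]
        mu = mean_y + sum(ki * ai for ki, ai in zip(k, alpha))
        v = forward(L, k)
        var = max(rbf(x, x) - sum(vi * vi for vi in v), 0.0)
        return mu, math.sqrt(var)

    def recommend(self):
        """Return the untested setting with the highest upper confidence bound."""
        untested = [c for c in self.candidates if c not in self.X]
        if not untested:
            raise ValueError("all candidate settings have been measured")
        return max(untested, key=lambda c: sum(
            w * p for w, p in zip((1.0, self.kappa), self.posterior(c))))


def run_campaign(experiment, candidates, rounds):
    """Alternate between asking for a recommendation and reporting the result."""
    planner, history = ToyPlanner(candidates), []
    for _ in range(rounds):
        x = planner.recommend()
        y = experiment(x)              # in practice: a wet-lab measurement
        planner.add_measurement(x, y)
        history.append((x, y))
    return planner, history


def hidden_response(x):
    """Stand-in for the unknown experimental outcome (illustrative only)."""
    return -(x - 0.7) ** 2


planner, history = run_campaign(hidden_response, [i / 20 for i in range(21)], rounds=8)
best_setting, best_value = max(history, key=lambda h: h[1])
print("best setting tried:", best_setting, "outcome:", best_value)
```

The design point this illustrates is the separation of concerns: the planner only sees settings and measured outcomes, so the same loop can drive a Python script, a REST endpoint or a self-service tool.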
Retrosynthesis software
Retrosynthesis is a common problem-solving technique in organic chemistry that starts with a target molecule and works iteratively backwards to simpler starting compounds. Without SYNTHIA retrosynthesis software, organic chemists must rely on their own knowledge, expertise and other time-consuming manual methods for synthetic route planning. Engineered by organic chemists and computer scientists over the course of 15 years, SYNTHIA retrosynthesis software harnesses the power of AI in chemistry to curate information and predict reactions for the computer-aided design of optimal synthetic routes. Based on years of development as Chematica, then further enhancement as SYNTHIA retrosynthesis software, this unique tool enables chemists to easily navigate through viable pathways that can be executed at the bench.
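The backward search at the heart of retrosynthesis can be sketched as a recursive walk from the target to purchasable starting materials. This is a minimal illustration, not how SYNTHIA works internally: the compound names, the `RULES` table, the `PURCHASABLE` set and the "fewest steps" scoring are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Step = Tuple[str, Tuple[str, ...]]  # (product, precursors used to make it)

# Hypothetical disconnection rules: product -> alternative precursor sets.
RULES: Dict[str, List[Tuple[str, ...]]] = {
    "target_amide": [("acid_A", "amine_B")],
    "acid_A": [("ester_C",)],
    "amine_B": [("nitro_D",), ("halide_E", "ammonia")],
}

# Hypothetical catalogue of compounds that can simply be bought.
PURCHASABLE: Set[str] = {"ester_C", "halide_E", "ammonia"}


def plan_route(target: str,
               rules: Dict[str, List[Tuple[str, ...]]],
               purchasable: Set[str],
               max_depth: int = 5) -> Optional[List[Step]]:
    """Return the shortest list of steps that reaches purchasable compounds,
    or None if no route exists within max_depth disconnections."""
    if target in purchasable:
        return []                      # nothing to make, just buy it
    if max_depth == 0:
        return None                    # give up on overly long (or cyclic) routes
    best: Optional[List[Step]] = None
    for precursors in rules.get(target, []):
        steps: List[Step] = [(target, precursors)]
        feasible = True
        for precursor in precursors:
            sub_route = plan_route(precursor, rules, purchasable, max_depth - 1)
            if sub_route is None:
                feasible = False       # this disconnection leads to a dead end
                break
            steps.extend(sub_route)
        if feasible and (best is None or len(steps) < len(best)):
            best = steps
    return best


route = plan_route("target_amide", RULES, PURCHASABLE)
for made, from_what in route or []:
    print(made, "<=", " + ".join(from_what))
```

In this toy table the branch through `nitro_D` is a dead end because it is neither purchasable nor covered by a rule, so the search falls back to the second disconnection of `amine_B`.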
The Merck Compound Synthesis Challenge [7] is a globally open competition to identify the most efficient synthetic pathway for a given small molecular compound. The competition is open to scientists of all career levels from around the world. After a 48-hour sprint to develop a synthetic route, in which all teams are given access to SYNTHIA to utilize during the competition, the routes undergo review to determine the top routes which will be tested in a wet lab. The team with the best route wins a prize of €10,000.
Other open innovation and crowdsourcing offers that also involve research grants or competitions in relation to innovation powered by digital and data are regularly posted at http://researchgrants.merckgroup.com and http://researchchallenges.merckgroup.com. Likewise, the topic consistently plays a major role at the bi-annual Curious – Future Insight Conference and the Innovation Cup, a one-week summer camp for students and young professionals with a chance to win the Merck Innovation Cup along with €20,000 for the most innovative project plan [8,9]. We will also roll out new scientific awards to honour and enable scientists whose work is utilizing new digital methodologies to advance science. For example, the 2024 Merck Future Insight Prize will be given to a researcher for work to help fight the next pandemic with AI.
Syntropy – for the healthcare industry
In the fast-paced world of biomedical research, advances in analytics and AI, such as generative models, underscore the critical need for data that is not only of high quality but also meticulously traceable. Syntropy offers a secure and collaborative digital ecosystem that enables researchers to unlock the value of healthcare data, and it is tailored to the demands of modern research, where collaboration accelerates discovery. There is no shortage of healthcare data available to researchers: health data makes up one-third of all data generated in the world. The vast volumes of data generated within the healthcare sector hold the key to groundbreaking therapeutic innovations and enhanced health outcomes. However, much of the available health data is trapped in silos within and between institutions, which burdens the scientists and clinicians who need the information to advance their work with unnecessary data administration. Currently, researchers often spend up to 80% of their time integrating and preparing data for analysis. To optimize collaboration, researchers need data to be accessible, contextualized and strongly secured.
Syntropy’s connected research environment securely integrates this siloed data. With over 15 publications to date, Syntropy helps research organizations structure and contextualize their own data. By bringing together disparate data sets directly from source systems, including clinical, wearable and genomic data, into a single, structured platform, Syntropy not only streamlines the preparation process but also elevates the quality of the data itself. This ensures that researchers spend less time on data management and more on what they do best: advancing medical research and patient care. By ensuring that all information within the Syntropy platform is transparent, traceable and auditable, we also enable researchers to swiftly build upon one another’s work, fostering a cycle of rapid innovation and robust, trustworthy research outcomes. For instance, during the critical periods of the coronavirus disease 2019 (COVID-19) pandemic, Syntropy played a pivotal role in facilitating a research consortium led by MITRE. By connecting real-world patient data from four disparate health systems in California, Minnesota, Texas and Utah, we enabled a collaborative effort that was instrumental in advancing our understanding of COVID-19. Utilizing our secure data collaboration platform, researchers established a pipeline that captured and analyzed data from patients with COVID-19 across each organization, allowing them to quickly deliver high-quality data to the final analysis [10].
Syntropy is not just a tool but a principle of speed in research, where every second counts towards saving lives. By enabling a seamless flow of high-quality, secure and collaborative data analysis, we are setting a new standard for healthcare innovation. Our platform ensures that researchers can leverage the collective intelligence of the global scientific community, propelling medical advancements at an unprecedented pace and ushering in a new era of cooperative discovery. With Syntropy, the path from data to discovery is not only accelerated but also paved with the assurance of trust and transparency, marking a cornerstone in the collaborative effort to advance healthcare research.
Athinia – for the semiconductor industry
With the proliferation of digital technologies, there is immense pressure on the semiconductor industry to produce with zero defects and rapidly deliver new innovations to market, a situation that has been accentuated by the chip shortage. The immense amount of data produced today creates opportunities not only for a single company, but for the entire value chain to achieve excellence in production, innovation and cost reduction. The continuous and secure sharing of data between many companies in the semiconductor industry has required the creation of a new standard in quality. Industry participants know that there is a need for data collaboration, but individual companies do not want to establish an isolated ecosystem given the prohibitive cost and time investments. Moreover, the average semiconductor supplier needs five to ten years to create the required data foundation. Many companies have dispersed data systems, and their learning cycles and capability building are lengthy.
The Athinia platform was created to address these limitations and provide a single source of real-time data for collaborating on relevant information from participants across the industry (Figure 3). It brings manufacturers and materials suppliers together to share, aggregate and analyze data in a highly secure way, unlocking efficiencies and improving quality, supply chain transparency and time to market. Connecting the value chain allows companies to be more efficient and innovate quickly.
Athinia creates a new standard in quality based on a data ecosystem that allows secure and continuous feedback and sharing of data between many companies to avoid silos. Athinia will increase the efficiency of current production, as companies will better understand their own data to reduce quality deviations and speed up time to market. The rapidly increasing number of process variables within semiconductor manufacturing requires smart and scalable data and analytical capabilities from industry players. Moreover, due to the growing amount and types of data, companies need to focus time and resources on the parameters that matter most so they can remain competitive. Finding and interpreting the right parameters is of utmost importance, but not always easy. In fact, one of the challenges is that key parameters that impact performance are often hidden below the surface and are not identified in standard certificate of analysis (CofA) procedures (Figure 4). Therefore, statistical analysis and machine learning models are essential to determine critical-to-quality key performance indicators for specific processes that would otherwise have remained hidden.
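One simple form of the statistical analysis mentioned above is to rank material parameters by how strongly they correlate with a downstream outcome. The sketch below uses synthetic, randomly generated lots purely for illustration; the parameter names, value ranges and the assumed dependence of yield on a trace-metal level are hypothetical and do not describe any real material, customer data or Athinia functionality.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0                     # a constant column carries no signal
    return cov / math.sqrt(vx * vy)


def rank_parameters(lots, outcome):
    """Sort parameters by absolute correlation with the outcome column."""
    names = [name for name in lots[0] if name != outcome]
    ys = [lot[outcome] for lot in lots]
    scores = {name: pearson([lot[name] for lot in lots], ys) for name in names}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Synthetic lots (illustrative only). Two parameters stand in for values
# reported on a CofA; the third stands in for a parameter that is not.
rng = random.Random(0)
lots = []
for _ in range(40):
    trace_metal = rng.uniform(0.0, 10.0)
    lots.append({
        "assay_purity_pct": rng.uniform(99.90, 99.99),
        "moisture_ppm": rng.uniform(5.0, 15.0),
        "trace_metal_ppb": trace_metal,
        # Assumed relationship: yield falls as the trace-metal level rises.
        "process_yield_pct": 95.0 - 0.5 * trace_metal + rng.gauss(0.0, 0.2),
    })

ranking = rank_parameters(lots, outcome="process_yield_pct")
for name, r in ranking:
    print(f"{name:18s} r = {r:+.2f}")
```

A linear correlation ranking is only a first screen. Interactions and non-linear effects would call for the machine learning models the text refers to.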
The use of AI and machine learning for predictive material analytics is pivotal for making sense of big data and unlocking all the above-mentioned benefits. Athinia enables customers to leverage off-the-shelf machine learning models as well as build custom algorithms to analyze encrypted customer data. Big data can be quite complex to manage, so it is important that complex data are as straightforward as possible to navigate. By combining a highly secure and user-friendly platform with expert support, Athinia makes data aggregation and analysis more efficient and easier than before.
Leading with ethics and security
Both Syntropy and Athinia handle data responsibly by enabling data collaboration without ever taking ownership of that data. Both companies have been built on state-of-the-art security architecture that brings different parties together within their respective sectors to collaborate on key data. This ensures that intellectual property is protected and that each company retains full control of governance and ownership over its data. The secure collaboration between ecosystem participants in each industry is facilitated by the Palantir Foundry platform, which meets and exceeds current requirements for data security and ownership. Palantir is a world-class provider of secure data integration technology, while Merck draws on a long history in bioethics that has evolved to expand into digital ethics. With technology advancing at an unprecedented pace, questions around who bears responsibility for digital actions are commonplace. Merck has not only thought about these existential questions but has acted as a true pioneer with the initial formation of its digital ethics advisory panel, followed by the rollout of its code of digital ethics (CoDE). The CoDE was developed together with the digital ethics advisory panel to provide clear ethical principles for handling data and algorithms, and to address the rapid and dynamic nature of deploying novel digital technologies, such as Syntropy.
SmartFacturing
In the contemporary technological landscape, the Merck SmartFacturing programme epitomizes innovation within the life science, healthcare and electronics sectors, promoting smart and scalable capabilities in the entire value chain of operations including the supply chain, underpinned by robust Information Technology/Operational Technology (IT/OT) integration, data management, and workforce readiness. It employs advanced technologies like robotics, automation, augmented reality/virtual reality (AR/VR) and AI, facilitating agile and adaptable manufacturing ecosystems. Central to these advancements are digital twins, dynamic virtual replicas of physical entities and processes used for optimizing real-life counterparts. First utilized in 1970 by the United States National Aeronautics and Space Administration (NASA), these models now enable pre-implementation testing and maintenance of various applications remotely, fostering efficiency and innovation across industries. Digital twins vary in complexity and application, from individual products to entire factories, and are integral for predictive maintenance and process improvement. The SmartFacturing programme not only underscores the technological shift towards interconnected digital and physical realms but also emphasizes human elements, nurturing a culture inclined towards continuous learning and innovation. We are exploring the application of digital twins across our life science, electronics and healthcare business sectors to improve our facilities' performance and sustainability and accelerate product delivery to our customers and patients. For example, in life science, we could build a digital twin of a bioreactor for manufacturing monoclonal antibodies to predict the titer and scalability of any given antibody production method and optimize production for clinical development. 
This approach could be used for any bioreactor used to make crucial life sciences reagents or therapeutic agents, from bespoke cell lines to rapidly optimized production of new antibiotics. Our electronics business also plans to use digital twins to rapidly scale up chemical production processes and reduce downtime by steering more productive operating windows. Currently, we are piloting the use of digital twins to further optimize supply chains, such as the materials and components required for semiconductor manufacturing.
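To make the bioreactor example concrete, a digital twin can be thought of as a model that is run ahead of the real process to compare operating scenarios. The sketch below is a deliberately simple stand-in: logistic cell growth with product formation proportional to biomass, integrated with explicit Euler steps. The model form, all parameter values, the units and the scenario names are assumptions for illustration and do not describe Merck's Bioreactor Digital Twin.

```python
def simulate_bioreactor(hours, mu_max=0.04, x_max=20.0, q_p=0.002,
                        x0=0.3, dt=0.5):
    """Simulate a batch run and return a list of (time, biomass, titer).

    Assumed toy model (hypothetical units):
      dX/dt = mu_max * X * (1 - X / x_max)   logistic cell growth
      dP/dt = q_p * X                        product formed per unit biomass
    """
    t, x, p = 0.0, x0, 0.0
    trajectory = [(t, x, p)]
    for _ in range(int(round(hours / dt))):
        dx = mu_max * x * (1.0 - x / x_max)
        dp = q_p * x
        x += dx * dt                   # explicit Euler update
        p += dp * dt
        t += dt
        trajectory.append((t, x, p))
    return trajectory


def final_titer(**settings):
    """Titer at the end of a simulated run with the given settings."""
    return simulate_bioreactor(**settings)[-1][2]


# Hypothetical what-if scenarios evaluated on the model instead of in the plant.
scenarios = {
    "baseline_14_days": dict(hours=336),
    "shorter_run_7_days": dict(hours=168),
    "higher_productivity_clone": dict(hours=336, q_p=0.003),
}
predicted = {name: final_titer(**cfg) for name, cfg in scenarios.items()}
best_scenario = max(predicted, key=predicted.get)
for name, titer in predicted.items():
    print(f"{name:28s} predicted titer = {titer:.2f}")
```

A production twin would be calibrated against plant data and updated as new measurements arrive. The value of even a simple model is that scenarios can be screened before committing a physical run.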
What does the future hold?
At Merck, we are committed to staying at the forefront of innovation driven by data and digital. We invest in leading-edge technologies and collaborations with academic institutions, research organizations and technology companies. Our dedicated research teams actively explore and develop AI algorithms and participate in consortia to build collaborative models and solutions tailored to specific challenges. We foster collaborations and partnerships with external experts and startups in the AI field. By engaging in open innovation initiatives, we can tap into a broader range of expertise and ideas, which helps us stay agile and adaptive in a rapidly evolving landscape.