Microsoft is parlaying its machine-learning expertise into a new integrated platform for working with biological systems, and to refine it, the tech giant is partnering with a Princeton University researcher and UK-based companies Oxford Biomedica and Synthace. The platform, dubbed Station B and unveiled in March, aims to improve how life scientists conduct their research. It enables them first to model new biomolecules and organisms in silico, then to test their designs in wet-lab experiments, and finally to analyze the results using machine-learning algorithms. To achieve this, the platform combines Microsoft Azure’s cloud infrastructure and algorithms with Synthace’s automated platform for biological experiments. For real-world testing, Microsoft has brought in one commercial enterprise, Oxford Biomedica, which will apply the platform to its gene therapy vector manufacturing processes, and one academic researcher, Princeton University molecular biologist Bonnie Bassler, who will use Station B to gain insights into bacterial biofilms. Station B and other solutions that combine laboratory information management systems (LIMS) for uploading research data with cloud-based services for search, analysis, and discovery promise not only to enhance reproducibility but also to save time and money for researchers in both academia and industry.
With this move into synthetic biology, Microsoft underlines the accelerating effect that machine learning could have on understanding biological systems and designing new solutions to agricultural, industrial and therapeutic problems. This vision is what increasingly draws tech founders to synthetic biology, says John Cumbers, founder and CEO of the SynBioBeta industry group. Zymergen, of Emeryville, California, a company that uses machine learning and genomics to discover new materials and products, counts among its investors Jerry Yang, founder and former CEO of Yahoo; Ev Williams, previously chairman and CEO of Twitter; and Google’s Eric Schmidt. PayPal co-founder Peter Thiel has bet on Transcriptic, of Menlo Park, California, and Emerald Cloud Lab, of South San Francisco, California, both of which are robotic cloud-based research service providers, as well as Emeryville-based Bolt Threads, a company making materials from spider-silk fibers. At the SynbiTECH conference in London in June, Cumbers noted that the field amassed a record $1.9 billion in venture funding in 2018, with a similar investment figure expected for 2019.
Station B will follow the framework that synthetic biology normally applies to designing and building biological systems: ‘design–build–test–learn’. The design phase uses programming languages that Microsoft developed to represent information about a cell’s metabolism and genetics and about its transcription, regulation and signaling networks. The build phase converts those designs into DNA sequences and experiment instructions. The wet-lab experiments that follow will be carried out not by human researchers working by trial and error, but by Synthace’s ANTHA robotic platform. Machine-learning algorithms will then analyze the results and use them to refine subsequent designs and increase reproducibility. By combining cloud-based algorithms with automated experimentation, the platform is meant to help scientists work more efficiently and more reliably.
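To make the design–build–test–learn cycle concrete, here is a toy Python sketch. It is not Station B’s software and makes no claim about how Microsoft implements the loop: the single numeric design parameter, the simulated assay `measure` (with a hidden optimum at 0.7), and the shrinking grid search that stands in for the machine-learning step are all hypothetical simplifications.

```python
import random


def measure(design: float, rng: random.Random) -> float:
    """'Test' step: a hypothetical assay whose hidden optimum is at 0.7, plus noise."""
    return 1.0 - (design - 0.7) ** 2 + rng.gauss(0.0, 0.001)


def dbtl(rounds: int = 6, batch: int = 5, seed: int = 0):
    """Run a toy design-build-test-learn loop over one design parameter in [0, 1]."""
    rng = random.Random(seed)
    history: list[tuple[float, float]] = []  # (design, measured output) pairs
    center, radius = 0.5, 0.5                # start by covering the whole range
    for _ in range(rounds):
        # Design: an evenly spaced batch of candidates around the current best guess.
        designs = [
            min(1.0, max(0.0, center + radius * (2 * i / (batch - 1) - 1)))
            for i in range(batch)
        ]
        # Build: a no-op here; a real platform would emit DNA sequences
        # and robot instructions at this point.
        # Test: "run" every experiment in the batch.
        history.extend((d, measure(d, rng)) for d in designs)
        # Learn: recentre on the best result seen so far and narrow the search.
        center = max(history, key=lambda pair: pair[1])[0]
        radius *= 0.6
    return center, history


best, history = dbtl()
print(f"best design ~ {best:.2f} after {len(history)} experiments")
```

Each pass uses everything measured so far to decide what to try next, which is what lets an automated platform iterate quickly; a real system would swap the grid search for a learned model and the simulated assay for robot-run experiments.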
As yet, Station B is a research project rather than a commercial offering, but the Microsoft Research group anticipates that the pilot projects with Oxford Biomedica and Princeton’s Bassler will prove useful to its customers in both academic and large-scale industrial settings, says Andrew Phillips, head of the Biological Computation Group at Microsoft Research in Cambridge, UK.
For the most part, when synthetic biology labs want to predict, say, the best conditions for a new drug, or a cell’s response to an RNA sequence, they rely on computational models of cells and cell components to run simulations. Such models need to be detailed and accurate enough for researchers to successfully design new and modified organisms in silico. And that requires extraordinary amounts of data about the cells, says Phillips. Biotech companies resort to brute force to generate those data, typically testing many biomolecular variations in parallel. “We can’t get away from that completely but we’re really pushing hard on learning computational models from the data using advanced machine-learning techniques,” he says. Key to building those models is gaining a detailed understanding of cellular components, such as organelles, microtubules and cytoskeletal proteins. “So you would build a component, you’d use it in one context, you get measurements, you characterize it, use it in another context [where] it may behave slightly differently,” he says. “You learn over time about which properties are intrinsic to the component and which properties depend on the environment.”
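Phillips’s point about separating intrinsic from context-dependent behavior can be illustrated with a classic two-way decomposition. The sketch below is not Microsoft’s method. It assumes hypothetical components (P1 to P3) each measured once in every hypothetical context, and a multiplicative model (additive on a log scale). It then fits the main effects from row and column means, which is the least-squares fit for a complete grid. Whatever the main effects cannot explain lands in the residual, a rough flag for context-dependent behavior.

```python
import math
from statistics import mean


def decompose(measurements: dict[tuple[str, str], float]):
    """Split log-scale measurements into component, context and residual terms.

    Assumes every component was measured exactly once in every context.
    """
    logs = {key: math.log(v) for key, v in measurements.items()}
    components = sorted({c for c, _ in logs})
    contexts = sorted({k for _, k in logs})
    grand = mean(logs.values())
    # "Intrinsic" part: how far each component sits from the overall average.
    comp_effect = {c: mean(logs[c, k] for k in contexts) - grand for c in components}
    # Environment part: how far each context shifts every component.
    ctx_effect = {k: mean(logs[c, k] for c in components) - grand for k in contexts}
    # Leftover: behavior that depends on the particular component-context pairing.
    residual = {
        (c, k): logs[c, k] - grand - comp_effect[c] - ctx_effect[k]
        for c in components
        for k in contexts
    }
    return comp_effect, ctx_effect, residual


# Hypothetical output levels for three parts in two host contexts.
data = {
    ("P1", "host_A"): 100.0, ("P1", "host_B"): 50.0,
    ("P2", "host_A"): 200.0, ("P2", "host_B"): 100.0,
    ("P3", "host_A"): 100.0, ("P3", "host_B"): 200.0,
}
comp, ctx, resid = decompose(data)
for key, r in sorted(resid.items(), key=lambda kv: -abs(kv[1])):
    print(key, round(r, 3))
```

With these made-up numbers, P1 and P2 differ by a constant factor in both hosts, while P3 reverses its relative behavior between hosts, so its residuals come out about twice as large as the others’. In practice, replicates and more contexts would be needed before reading much into any single residual.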
Rapid knowledge generation is one of the main selling points for AI-driven synthetic biology platforms such as Station B. For a company, such insights can boost productivity in the short term by narrowing the field of candidates (among biomolecular variants, for example) and in the long term by contributing to mechanistic cell models that enable rational cell design.
Though machine learning is key to generating knowledge, the strength of platforms like Station B is that they tie all the processes together, from designing cells to automating wet lab functions to analyzing test results via machine-learning software, says Nathan Hillson, a synthetic biologist and bioinformatician with the US Department of Energy and cofounder and chief scientific officer of San Francisco–based TeselaGen, an AI-driven, cloud-based platform for biological systems. It also matters that researchers use the platforms in a way that produces high-quality, reproducible data, he says. “It’s the whole engineering cycle, so everything from the original design of experiment to actually figuring out how to make those designs in the real world: how to fabricate them, how to validate that they’re made correctly, how do you test them, how do you capture that test data, how do you learn from that test data to then inform your next round of design,” he says. “Unless you have software infrastructure coordinated with automation and analytical instrumentation, and all of those types of things across that whole cycle, you can’t really iterate effectively and quickly.”
Oxford Biomedica’s primary goal in working with Microsoft is to reduce the cost of producing a dose of lentiviral vector, says Jason Slingsby, the company’s chief business officer. Oxford Biomedica is a cell and gene therapy biotech known in the industry for providing lentiviral vectors to numerous biotech and pharma companies. It manufactures the lentiviral vector used in Novartis’s chimeric antigen receptor (CAR)-T therapy Kymriah, which is approved for children and young adults with acute lymphoblastic leukemia and adults with diffuse large B-cell lymphoma. It’s an expensive therapy: in the United States it costs $475,000 for the leukemia indication and $373,000 for the lymphoma indication. “By lowering the price we might have a chance to democratize access to gene therapy treatments,” says Slingsby. “It also helps global supply.”
The partnership with Microsoft Research comes as Oxford Biomedica is developing a line of stable producer cells to replace its current transfection method of producing lentiviral vectors. Station B might help the company make better producer cells, says Slingsby. One of Oxford Biomedica’s long-term goals for the collaboration with Microsoft Research is to better understand what makes a good cell for producing gene therapy vectors. “Then we’ll be in a better position to do rational design,” says Slingsby. “Once you know what to do from a rational basis there’s a lot of new technologies we can use to engineer cells optimally. And then by using stable producer cell lines we feel that there's major, major economic savings we can make.”
AI-driven synthetic biology platforms can boost basic research as well as commercial R&D. Princeton’s Bassler is a pioneer in the field of bacterial quorum sensing. Her lab researches ways of disrupting the bacterial communal behavior that causes many infectious diseases. Even with a team of graduate students and postdocs using laboratory automation systems, Bassler’s lab is limited in the number of constructs it can test and the amount of data it can collect, she says. With Station B, her aim is to learn the patterns of what doesn’t work so that it takes fewer attempts to arrive at constructs that do. “The idea is to accelerate success. It makes us think that we’re not bound by how many of these can I do in a day, and that is an incredible way for a scientist to be able to think.”
Meanwhile, in March, Asimov, of Cambridge, Massachusetts, which develops engineered mammalian cell lines for therapeutics, landed a contract with the US Defense Advanced Research Projects Agency (DARPA) to develop a physics-based AI-driven design engine aimed at programming complex cellular behaviors. This would be a major step toward rational cell design. In October, TeselaGen, along with Twist Bioscience, of San Francisco, and Labcyte, of San Jose, California, partnered with Arzeda, of Seattle, to build a DNA-assembly platform to boost the efficiency of Arzeda's designer protein and enzyme business. TeselaGen is providing a synthetic biology platform that will work with Labcyte’s lab automation equipment and Twist Bioscience’s synthetic DNA.
Separately, Berkeley Lights, of Emeryville, California, which has developed a high-throughput cell-handling platform for the synthetic biology field, inked a deal with Amyris, also of Emeryville, in June. Berkeley Lights’ system uses light beams to handle and analyze thousands of individual cells, moving laboratory automation from the scale of microwells to the scale of single cells. Amyris develops microbial strains that produce substances for consumer products. The company plans to use Berkeley Lights’ platform to identify and isolate strains that produce higher yields.
Station B and other AI-driven synthetic biology platforms will help move rational cell design forward, says Simone Bianco, of IBM Research. Bianco is a principal investigator at the Center for Cellular Construction, a National Science Foundation center housed at the University of California, San Francisco, charged with the mission of laying the foundation for rational cell design. “It’s very encouraging because it shows that there is an interest in a discipline that in my opinion lacks a strong quantitative foundation,” he says.
At the same time, the synthetic biology field needs to be careful about overselling the benefits of machine learning, says Charles Fracchia, CEO of BioBright, in Boston, which markets a platform that centralizes and annotates large volumes of research data. The platform includes dashboards and tools for searching those data and feeding them to machine-learning algorithms. But getting those algorithms to generate insights about biology without first giving them the rules of biology would require quantities of data and computing power beyond today’s technology, he says. “You need to guide the machines.”
In the context of developing therapies, it’s also important to recognize that these AI-driven synthetic biology platforms address only the front end of the process, says Fracchia. “I can see the trough of disillusionment coming soon in the sense that it doesn’t matter how good your algorithm is at finding a new potential candidate, it still takes 10 years to validate that,” he says. Developing a drug typically takes 10–12 years and $1.4 billion to $2 billion, and finding ways to cut that in half or better will take innovations throughout the pipeline, he says.