An AlphaFold3 model of a kinase (PI5P4Kγ) bound to a chemical compound. Credit: Isomorphic Labs

When Google DeepMind unveiled AlphaFold3 (ref. 1), the latest edition of its revolutionary protein-structure-prediction AI, in Nature this month, it came with a hitch. Unlike with a previous version (ref. 2), no computer code describing the advance accompanied the paper.

The London-based company reversed course days later, promising to release the code by year’s end. But the omission has set researchers worldwide racing to develop their own open-source versions of AlphaFold3, an artificial-intelligence (AI) model that can predict the structures of proteins and of other molecules, including potential new drugs. Other scientists are doing their best to hack the web version of AlphaFold3 that DeepMind released, to skirt its limitations.

“It would be bad if capabilities that are just so fundamental to our ability to do drug discovery and other things that are relevant for human health end up getting locked up,” says Mohammed AlQuraishi, a computational biologist at Columbia University in New York City. His ‘OpenFold’ team has already begun coding an open-source version of AlphaFold3 (ref. 3), which it hopes to complete this year.

Scientists disappointed

DeepMind’s initial withholding of the AlphaFold3 code, together with Nature’s 9 May publication of the paper without it, irked many scientists (Nature’s news team is independent of its journal team). Nature’s policies say that code associated with studies should typically be made available, while acknowledging that there can be restrictions.

“This does not align with the principles of scientific progress, which rely on the ability of the community to evaluate, use, and build upon existing work,” states an 11 May open letter to Nature co-written by Stephanie Wankowicz, a computational structural biologist at the University of California, San Francisco, and nine other scientists, and since signed by more than 600 researchers.

In an editorial published on 22 May, Nature said that it welcomed the conversation that the AlphaFold3 publication had sparked, and that it was seeking opinions from readers on how it can encourage openness in science. It added that its policies support open science, but acknowledged that the private sector funds most global research and that many findings resulting from such work remain proprietary. “We at Nature think it’s important that journals engage with the private sector and work with its scientists so they can submit their research for peer review and publication,” it stated. The journal said that it would update the paper with the code when DeepMind releases it.

In lieu of releasing the code and the parameters obtained from training AlphaFold3, known as model weights, DeepMind created a website where researchers can access the tool. But this AlphaFold3 server is restricted: it can be used only for non-commercial research, and it is not possible to obtain structures of proteins bound to possible drugs. The paper describing AlphaFold3 also contains detailed ‘pseudocode’ outlining how the model works.

Retraining AlphaFold

Roland Dunbrack, a computational structural biologist at Fox Chase Cancer Center in Philadelphia, Pennsylvania, says that he peer reviewed the AlphaFold3 paper for Nature and was disappointed that DeepMind did not release the code, either for him to review or alongside the publication. The availability of the AlphaFold2 code widened its reach and enabled researchers to adapt and improve on the tool. “I wanted the downloadable code because of the science that would happen if I and others had access,” says Dunbrack, a co-author of the open letter.

On 13 May, days after the backlash began, DeepMind did an about-face and announced that it would make the AlphaFold3 code and model weights available for academic use within six months.

But questions remain, say scientists, as to whether this version of AlphaFold3 will have its full range of capabilities, especially the ability to predict the structures of proteins in complex with potential drug molecules, or ligands. “I don’t think they are going to give us the ability to do any ligand,” says Dunbrack. The OpenFold3 model that AlQuraishi’s team is developing won’t have such limitations, he says, nor any restrictions on commercial use.

There are other reasons that scientists are pursuing open-source versions of AlphaFold3. One, says AlQuraishi, is the ability to retrain the model to better predict interactions between proteins and would-be drugs. His team retrained its version of AlphaFold2 using the same publicly available data sets that DeepMind used. But AlQuraishi expects that many pharmaceutical companies, which have access to troves of experimentally determined structures of proteins bound to possible drugs, will be keen to have a version of AlphaFold3 that they can retrain with their own proprietary data, which could boost the model’s performance.

AlQuraishi isn’t the only scientist trying to learn AlphaFold3’s secrets. David Baker, a computational biophysicist at the University of Washington in Seattle, wants to see what can be applied to RoseTTAFold-All-Atom, an open-source protein- and chemical-structure-prediction model that his team developed, which doesn’t yet perform as well as AlphaFold3.

And Phil Wang, an independent software engineer in San Francisco, has begun a crowdsourced effort to replicate DeepMind’s latest model. Wang, who also has a medical degree, has developed open-source versions of dozens of AI models, including the image-generating tool DALL-E. He has in the past received financial support from companies for such work, but has not yet received any offers to support his work on opening up AlphaFold3.

Hacked versions

Wang says that his team of three expects to have the code describing the AlphaFold3 model done in a month. But the most time-consuming step will be training the models on experimentally determined protein structures and other data sets, says AlQuraishi. “The code is by far the easier piece. That’s 5% of the effort.”

It’s also likely to prove expensive, says Sergey Ovchinnikov, an evolutionary biologist at the Massachusetts Institute of Technology in Cambridge. It could cost upwards of US$1 million in cloud-computing resources to train AlphaFold3 in the same way that DeepMind did, Ovchinnikov estimates, although it might be possible to cut corners to bring costs down without compromising performance.

A fully open-source version of AlphaFold3 will allow researchers to better understand how the model works and expand its abilities. But some scientists are already trying to do this with the AlphaFold3 server. “There has already been some hacking online,” says Ovchinnikov, for instance to obtain more-accurate models of proteins embedded in the cell membrane, where they interact with fat molecules. Another server hack has revealed an alternative shape that one protein adopts.

AlQuraishi hopes that the push to develop open-source versions of AlphaFold3 will serve as a “cautionary tale” to academics about the perils of relying on technology companies such as DeepMind to develop and distribute tools such as AlphaFold. “It’s good they did it, but we shouldn’t depend on it,” he says. “We need to create a public-sector infrastructure to be able to do that in academia.”