DeepMind stunned the biology world late last year when its AlphaFold2 AI model predicted the structure of proteins (a common and very difficult problem) so accurately that many declared the decades-old problem “solved.” Now researchers claim to have leapfrogged DeepMind just as DeepMind leapfrogged the rest of the world, with RoseTTAFold, a system that does nearly the same thing at a fraction of the computational cost. (Oh, and it’s free to use.)
AlphaFold2 has been the talk of the industry since November, when it blew away the competition at CASP14, a virtual competition between algorithms built to predict the physical structure of a protein from the sequence of amino acids that make it up. DeepMind’s model was so far ahead of the others, so highly and reliably accurate, that many in the field have talked (half-seriously and in good humor) about moving on to a new field.
But one aspect that satisfied no one was DeepMind’s plans for the system. These were not fully and openly described, and some worried that the company (which is owned by Alphabet/Google) was planning to more or less keep the secret sauce to itself, which would be its prerogative but also somewhat against the ethos of mutual aid in the scientific world.
Update: Surprisingly, DeepMind published its more detailed methods today in the journal Nature, and the code is available on GitHub. This significantly alleviates the concern above, but the advance described below is still highly relevant. I’ve also added a comment from that team at the bottom of the article.
DeepMind’s thunder appears to have been at least partially stolen by the work of University of Washington researchers led by David Baker and Minkyung Baek, published in the latest issue of the journal Science. You may recall that Baker recently won the Breakthrough Prize for his team’s work on combating COVID-19 with engineered proteins.
The team’s new model, RoseTTAFold, makes predictions at similar levels of accuracy using methods that Baker, answering questions by email, readily acknowledged were inspired by those used by AlphaFold2.
“The AlphaFold2 group presented several new high-level ideas at the CASP14 meeting. Building on these ideas, and with collective brainstorming with colleagues in the group, Minkyung was able to make astonishing progress in a very short time,” he said. (“She’s amazing!” he added.)
Baker’s group came in more or less second at CASP14, no mean feat in itself, but hearing DeepMind describe its methods effectively set them on a collision course with it. They developed a “three-track” neural network that simultaneously considers the amino acid sequence (one dimension), the distances between residues (two dimensions), and coordinates in space (three dimensions). The implementation is extremely complicated and far beyond the scope of this article, but the result is a model that achieves nearly the same level of accuracy, a level that, it bears repeating, was completely unprecedented just a year ago.
What’s more, RoseTTAFold achieves this accuracy much more quickly, that is, using far less computing power. As the paper puts it:
DeepMind reported using multiple GPUs to make individual predictions, whereas our predictions are made in a single pass through the network in the same manner that would be used for a server … The end-to-end version of RoseTTAFold requires ~10 minutes on an RTX2080 GPU to generate backbone coordinates for proteins with less than 400 residues.
Hear that? It’s the sound of thousands of microbiologists breathing a sigh of relief and abandoning their draft emails asking for supercomputer time. It may not be easy to get your hands on a 2080 these days, but the point is that any high-end desktop GPU can perform this task in minutes, rather than a high-end cluster having to run for days.
The modest requirements also make RoseTTAFold suitable for public hosting and distribution, something that may never have been in the cards for AlphaFold2.
“We have a public server that anyone can submit protein sequences to and get predicted structures back,” Baker said. “Since we put the server up a few weeks ago, there have been over 4,500 submissions. We’ve also made the source code freely available.”
This may sound rather niche, and it is, but protein folding has historically been one of the most difficult problems in biology, and one to which countless hours of high-performance computing have been devoted. You might remember Folding@home, the popular distributed computing app that let people donate their computing cycles to the effort to predict protein structures. The kind of problem that might have taken a thousand computers days or weeks to work through, essentially by brute-forcing solutions and checking fits, can now be done in minutes on a single desktop.
The physical structure of proteins is of utmost importance in biology, since proteins perform most of the functions in our bodies, and it is proteins that must be modified, suppressed, amplified, and so on for medical reasons. First, however, they need to be understood, and until November that understanding could not be reliably achieved computationally. CASP14 proved it possible, and now it has been made widely available.
It is not, by a long shot, a “solution” to the problem of protein folding, though that sentiment has been expressed. The structure of most proteins at rest under neutral conditions can now be predicted, and this has enormous bearing in many domains, but proteins are rarely found “at rest under neutral conditions.” They twist and turn to capture or release other molecules, to block or slip through gates and other proteins, and generally to do whatever it is they do. These interactions are far more numerous, complex, and difficult to predict, and neither AlphaFold2 nor RoseTTAFold can predict them.
“There are many exciting chapters ahead… the story is just beginning,” Baker said.
On the DeepMind paper, Baker offered the following comment in the spirit of collegial camaraderie:
I’ve read it, and I think it’s a beautiful paper describing fantastic work.
The DeepMind paper is really very complementary to ours, and I think it’s fitting that it isn’t coming out after ours, since our work really builds on their progress.
I think readers will enjoy reading both papers; they are far from redundant. As we point out in our paper, their method is more accurate than ours, and it will now be very interesting to see which features of their approach are responsible for the remaining differences. We are already using RoseTTAFold for protein design and for more systematic prediction of protein-protein complex structures, and we are excited to rapidly improve these, along with traditional single-chain modeling, by incorporating ideas from the DeepMind paper.
If you’re curious about the science and its possible implications, consider reading this more detailed and technical account of the methods and possible next steps, written in the wake of AlphaFold2’s CASP14 performance.