DeepMind has solved perhaps one of the greatest challenges in biology, one that rivals the discovery of DNA’s double helix. It could forever change biomedicine, drug discovery and vaccine development.
The actual feat sounds a lot less sexy at first glance. One of DeepMind’s powerful AI algorithms, called AlphaFold, used its deep learning capabilities to predict the three-dimensional (3D) shape of a protein, down to the width of an atom. It’s a challenge that has baffled biologists for 50 years – so much so that computer-based protein structure predictions have been worked into crowdsourced games, global competitions and a Nobel Prize in the search for a breakthrough.
We are at that turning point. AlphaFold triumphed over about 100 other teams in a long-running challenge called Critical Assessment of Structure Prediction, or CASP, with an overwhelming performance. Speaking to Nature, CASP co-founder Dr. John Moult of the University of Maryland said, “In a way, the problem has been solved.”
Dr. Mohammed AlQuraishi of Columbia University, who also participated in CASP, praised the AI as transformational. “It’s a first-class breakthrough, certainly one of the most important scientific achievements of my life,” he told Nature.
It also comes as a triumph for DeepMind, which rose to fame with a slew of algorithms that outperformed humans in games like Go and the entire laundry list of Atari titles. The win in protein structure prediction, however, marks its dazzling real-world debut – one that defies skepticism about AI’s value for real-life dilemmas.
DeepMind isn’t the only contender in the protein folding game. AlphaFold relies on biological data and insights. This week, a group of experimental scientists delivered. By tactically altering the genes of a complicated protein assembly and observing the result, the team was able to build an algorithm that reconstructs the protein with extremely high accuracy.
Together, these advances put us on the road to a paradigm shift. “This will change medicine,” said Dr. Andrei Lupas of the Max Planck Institute for Developmental Biology. “It will change research. It will change biotechnology. It will change everything.”
What’s the problem?
A central principle in biology is ‘structure explains function’. The discovery of the double helix shape of DNA, for example, dramatically increased our understanding of how genetic information is copied and stored. Without that structure, we wouldn’t have gene editing, DNA computers, or DNA storage devices.
Protein structures have been shown to contain just as much, if not more, information. But they are much more difficult to decipher. They begin life as ribbons of linear components called amino acids, like beads on a string. Based on hugely complicated biophysics – much of which remains mysterious – the string folds into delicate shapes, such as sheets of twisting and turning strands, or helices that wrap around each other. Many of these structures further link into a megaplex. Only then can they function as intended to sustain life.
Knowing the structure of a protein allows us to make educated guesses about its function. And by mapping thousands of protein structures, we can begin to decipher the biology of life – and find ways to manipulate it.
Take Covid-19 vaccines. A major breakthrough has been the mapping of the structure of spike proteins on the surface of the virus, which the virus relies on to enter our cells. Imagine the 3D structure of a protein as a lock. If we can map the shape of the lock, it is possible to design ‘keys’ – drugs or vaccines – to disrupt it. Unsurprisingly, DeepMind’s AlphaFold went after these spike protein structures in March, just as Covid-19 cases were skyrocketing around the world.
The classic “gold standard” for uncovering protein structures is based on an extremely tedious and difficult laboratory technique called X-ray crystallography. Scientists essentially ‘freeze’ proteins into delicate crystal-like structures and use a combination of X-rays, high-tech microscopes and math to figure out their shapes. But not all proteins can be “flash-frozen” for analysis, leaving a hole the size of the Grand Canyon in our ability to decode biology. Other methods, with unfriendly names like “nuclear magnetic resonance spectroscopy,” are just as expensive and finicky.
But here’s the thing. The instructions for building a 3D protein are inherently embedded in the 1D amino acid sequences – a discovery that won the Nobel Prize. And if there is one thing that AI is good at, it is finding patterns in complicated sequences that are beyond our meager human ability.
3D Chess
CASP challenges entrants to predict protein structures that have already been determined using X-ray crystallography but are not yet available to the public. DeepMind is no newcomer to the challenge; in 2018, its performance shocked many academic scientists who had long worked in the field.
AlphaFold’s strategy is similar to most of the entries in CASP this year in that it is based on deep learning. Remember, amino acid sequences, the building blocks of proteins, contain data about the final 3D shape of a protein, which seems perfect for a deep learning approach.
DeepMind went one step further. They took on the leviathan task of adding physics, geometry, and evolutionary history data to their model. The neural network, trained on protein databases of approximately 170,000 protein structures, could interpret the structure of the protein as a “3D map” and analyze any hidden relationships or patterns. By repeating this process, AlphaFold was able to “determine highly accurate structures within days,” DeepMind wrote.
These are not empty words. At CASP, the algorithm put competitors to shame. Almost two-thirds of its predictions were comparable to experimental data, with similar single-atom resolution. It scored a mind-boggling 90 out of 100 – a massive 25-point margin over other contenders.
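To give a feel for what a score out of 100 can mean here, below is a minimal sketch in the spirit of the GDT_TS measure used at CASP: it counts the fraction of atoms in a predicted structure that land within 1, 2, 4 and 8 angstroms of their experimentally measured positions, then averages those fractions. This is a simplification, not CASP's scoring code. It assumes the two structures are already superimposed (the real measure searches over many alignments), and the function name `gdt_ts_like` and the toy coordinates are hypothetical.

```python
import math

def gdt_ts_like(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS-style score on a 0-100 scale.

    Assumption: both coordinate lists are already superimposed and list
    the same atoms in the same order. The real measure also optimizes
    the superposition, which this sketch skips.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Distance, in angstroms, between each predicted atom and its measured position.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    # Fraction of atoms that fall within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Hypothetical four-atom backbone; the predicted atoms are off by
# 0.5, 1.5, 3.0 and 9.0 angstroms respectively.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]
score = gdt_ts_like(predicted, experimental)
print(score)  # 56.25
```

A perfect prediction scores 100 under this scheme, and one badly misplaced atom costs far less than it would under a plain average of distances, which is why a score around 90 is read as roughly on par with experimental methods.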
More to go
More practically, the success of AlphaFold means that we could gain access to previously “undruggable” proteins – many of which are involved in cancer and other serious diseases.
Almost all of our drugs are designed to attach to a protein, like keys to a lock. The first step is to know your enemy; that is, to know the structure of the protein in order to find vulnerable attack points. An AI-based method for decoding protein structure could make it possible to quickly screen tens of thousands of new drug targets. “AlphaFold will open a new area of research,” Dame Janet Thornton of the European Bioinformatics Institute in the UK told MIT Technology Review.
Overwhelming accolades aside, there is room for improvement. AlphaFold is relatively slow compared to some algorithms that return results in seconds, though those trade away accuracy. More importantly, it struggled to decipher protein complexes – megastructures of multiple individual 3D building blocks forming into a collective functional entity. These are hardly rare in biology – most of the chemical receptors in our brain cells, for example, depend on these structures. They are also like shape-shifting mega-Rubik’s cubes, as their 3D structure can change depending on the state of the body. For example, a closed tunnel megaprotein can open when it detects a chemical docked on the surface – a process that’s central to how our brains work.
The positive side? DeepMind has help. This week, a team took a separate approach to analyzing protein complexes in living cells – something that AlphaFold has not yet mastered. Their approach to the vexing problem went back to genes, the blueprint that drives amino acid chain construction and therefore contains information about 3D protein folding.
It’s also an out-of-the-box idea. The team found they were able to quickly screen thousands of mutations of a gene that makes a protein in living cells. By observing the structure of the resulting protein complexes, they can then use AI-based methods to map out how one mutation affects another – in turn revealing the ‘rules’ behind how these megastructures arise just by looking at their underlying genetic instructions.
As with AlphaFold, the technology, called “integrative modeling,” is not yet ready to replace the gold standard of protein mapping. But we are closer than ever before. From single proteins to mega-protein complexes, we now have faster, simpler, and cheaper ways to accurately visualize what was once biologically invisible. With AI and biology working together, protein folding may be the first big breakthrough for medicine in our generation.
“AlphaFold is one of our most significant advancements to date,” wrote the DeepMind team. “[The progress] gives us further confidence that AI will become one of humanity’s most useful tools in pushing the boundaries of scientific knowledge, and we look forward to many years of hard work and discovery ahead!”
Image credit: fdecomite / flickr