- We’re still learning how SARS-CoV-2 interacts with cells in the body
- Understanding how the virus enters our cells requires modeling and simulation
- AI and machine learning are vital to building these models and simulations efficiently
Rick Stevens, associate laboratory director at Argonne National Laboratory, gave a presentation at ISC High Performance 2020 about how high-performance computing (HPC) is being used to fight back against the SARS-CoV-2 virus. In particular, he discussed how machine learning and artificial intelligence (AI) are instrumental to these studies.
At this point, we’ve heard just about every COVID-19 infection story there is. A woman in China took a single elevator ride that resulted in 71 infections. Some evidence suggests that as few as 10% of infected people cause 80% of new infections. There’s no doubt that this virus can spread quickly and efficiently, but we’re still figuring out how it works—and how we can stop it.
One way to learn about the SARS-CoV-2 virus is through computational modeling and simulation. Understanding how the virus interacts with human host cells will allow us to discover viable treatments.
Working as part of the COVID-19 HPC Consortium, Stevens and other scientists around the world are trying to model the virus as accurately and as quickly as possible—and they’re counting on artificial intelligence to give them an edge.
A key in a lock
According to Stevens, one goal of the modeling and simulation work is to understand how SARS-CoV-2 infects the human host. Our cells carry angiotensin-converting enzyme 2 (ACE2), a surface protein that normally helps us lower our blood pressure. But the ACE2 receptor on the cell’s surface also serves as an entry point for some coronaviruses, including SARS-CoV-2. Scientists want to model that entry.
Picture what happens at the cellular level: The SARS-CoV-2 virus enters your body and finds itself near an ACE2 receptor on one of your cells. A spike protein wriggling on the exterior of the virus finds the ACE2 receptor and connects with it like a key opening a lock.
If scientists can model the whole virus and its interaction with the ACE2 receptor, they can then develop therapies for blocking infection. In the terms of our analogy, they want to learn how to stick some gum on that wriggling spike protein key so it no longer fits in the ACE2 receptor lock.
“The questions you’re trying to answer with these methods are: Do these proteins have multiple states? Are the states stable? And are they good states for drugging?” says Stevens. “That is, can you get small molecules that can bind to these proteins in these different states and basically shut down their functions.”
Traditional modeling and simulation on even the fastest supercomputer in the world would take too much time. And with over 25.2 million COVID-19 cases worldwide and growing, time is something we don’t have. But AI and machine learning can help make giant leaps in our understanding in a much shorter timeframe.
Outsourcing the thinking
Imagine AI as the means of having a computer do your thinking for you. Stevens explains that there are about 4 billion molecules that could possibly be used to gum up the SARS-CoV-2 spike protein’s key. For each of those molecules, there are 100 possible binding sites on the protein to try. Simulating each of those 400 billion options individually would take too much time.
The more efficient alternative is to train an AI to do all that work for you.
“You take a sample of a million of these small molecules, and you do a molecular interaction-level simulation of how the protein might bind to that molecule. And then you use that data to train an AI,” says Stevens. “So we build something like 100 AIs, each trained on a subset of the larger data and how well the model predicts it docks, and then use those AI models to search that much larger space.”
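The workflow Stevens describes (simulate a small sample, train many models on subsets of the results, then let the models rank the full library) can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: `simulate_docking` is a hypothetical stand-in for an expensive docking simulation, each "molecule" is just two made-up numbers, a higher toy score is assumed to mean better binding, and plain least-squares fits stand in for whatever AI models the consortium actually trains.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def simulate_docking(mol):
    # Hypothetical stand-in for an expensive docking simulation.
    # Assumption: a higher toy score means better binding.
    x, y = mol
    return 2.0 * x - y + rng.gauss(0.0, 0.05)


def fit_linear(samples):
    # Least-squares fit of score ~ a*x + b*y, solved with the
    # 2x2 normal equations (Cramer's rule).
    sxx = sum(x * x for (x, y), _ in samples)
    sxy = sum(x * y for (x, y), _ in samples)
    syy = sum(y * y for (x, y), _ in samples)
    sxs = sum(x * s for (x, _), s in samples)
    sys_ = sum(y * s for (_, y), s in samples)
    det = sxx * syy - sxy * sxy
    a = (sxs * syy - sys_ * sxy) / det
    b = (sys_ * sxx - sxs * sxy) / det
    return a, b


# A small library stands in for the billions of candidate molecules.
library = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(20000)]

# Run the "expensive" simulation only on a small random sample.
sample = rng.sample(library, 500)
labelled = [(mol, simulate_docking(mol)) for mol in sample]

# Train an ensemble, each member on a different random subset.
models = [fit_linear(rng.sample(labelled, 100)) for _ in range(10)]


def predict(mol):
    # Average the ensemble's predictions for one molecule.
    x, y = mol
    return sum(a * x + b * y for a, b in models) / len(models)


# Use the cheap ensemble to rank the whole library, then keep a shortlist
# that would go on to the slower, more accurate simulations.
shortlist = sorted(library, key=predict, reverse=True)[:20]
```

The point of the design is cost: the slow simulation runs 500 times here, while the cheap ensemble scores all 20,000 candidates.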
In order for the AI to learn to effectively analyze this dataset, it needs ways to represent the molecules. The consortium is currently working with three main types of representation: graph, descriptors, and image.
- Graph: Sometimes known as a fingerprint, this is an abstract pattern that represents the structural formula of a molecule.
- Descriptors: Numerical properties of a molecule. Each molecule has about 5,000 specific descriptors to work with.
- Image: Computationally derived images of the molecule. Like an experienced chemist, computer vision methods are extremely good at recognizing aspects of a molecule that affect its binding.
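To make the idea of a representation concrete, here is a minimal Python sketch of two of the three types, built from a molecule's SMILES text string. Everything in it is an assumption made for illustration: `toy_fingerprint` and `toy_descriptors` are hypothetical helpers, the fingerprint simply hashes short substrings into a bit vector, and the handful of character counts is a crude stand-in for the thousands of descriptors a real cheminformatics toolkit would compute. It is not the consortium's method.

```python
import zlib

ETHANOL = "CCO"
ASPIRIN = "CC(=O)OC1=CC=CC=C1C(=O)O"


def toy_fingerprint(smiles, n_bits=64, max_len=3):
    # Hash every substring of length 1..max_len into a fixed-size bit vector.
    # CRC32 is used because it gives the same result on every run.
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            bits[zlib.crc32(fragment.encode("utf-8")) % n_bits] = 1
    return bits


def toy_descriptors(smiles):
    # Crude numerical properties read straight off the string.
    # Assumption: only C and O atoms appear, so counting letters is enough.
    return {
        "carbons": smiles.count("C"),
        "oxygens": smiles.count("O"),
        "double_bonds": smiles.count("="),
        # Each ring is opened and closed by the same digit.
        "ring_closures": sum(ch.isdigit() for ch in smiles) // 2,
    }


aspirin_bits = toy_fingerprint(ASPIRIN)
aspirin_props = toy_descriptors(ASPIRIN)
```

Either output is a fixed-format list of numbers, which is what lets a machine learning model compare molecules of very different sizes.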
Of course, using AI in this way is a lot easier said than done. Stevens and his colleagues have been working on this since the early spring of 2020. The database they’ve created already contains nearly 80 terabytes of data—and that’s not everything they wanted to include.
But they are making progress. Stevens reports that the AI programs can now search 4 billion molecules for a given target in just two days, clearing the way for the team to move on to the more laborious docking simulations. Once that step is complete, they will move on to free energy simulations at the atomic level, where they can get a much more accurate understanding of potential treatments.
When asked how he’s been dealing with researching a virus that has swallowed large portions of everyday life, Stevens is both realistic and optimistic.
“Well, it’s exhausting,” says Stevens. “But it’s also giving me a reason to get up every day. I’ve been probably working 12, 14, 15-hour days on this ever since the end of February. We’ve got hundreds of people working on this across the country and in Europe and other places, and that really makes it easier to bear. It’s good to be able to contribute to something that’s causing so much trouble.”