Small-molecule therapeutics, such as the anti-cancer drug Gleevec, work by binding to a specific protein in the body and modulating its function. Cancer researchers work to pinpoint proteins in cancer cells and to find compounds that target those proteins, in hopes of shutting them down. "The tough part is finding that compound," says Samy Meroueh, associate professor in the department of biochemistry and molecular biology at the Indiana University School of Medicine (IUSM).
Meroueh's laboratory has created SPLInter (Structural Protein Ligand Interactome), an online interactome that predicts the interactions of thousands of small organic molecules with thousands of proteins, through structure-based molecular docking and scoring. The site also ranks proteins for active compounds against individual targets. Users can also identify potential off-targets of a compound of interest for further experimental validation in cells or in vivo.
"In SPLInter, we're doing the same thing we would traditionally do in rational drug design, but doing it at the cellular level. Instead of one focused target, we screen for all proteins of the human proteome whose structure has been solved by x-ray crystallography or NMR. We take a protein, find its structure, and through its structure we do virtual screening for all the targets in the cell."
"Proteins evolve to adopt a specific shape. That shape and how the protein associates with other things are critical. If you want a drug to bind to a protein it has to fit into a cavity; the small molecules like pockets." explains Meroueh.
"Collecting all of the protein structures available in the PDB and finding all 7,500 pockets was our first step in creating SPLInter. So far with Open Science Grid (OSG) we've docked thousands of compounds and we're continually expanding the interactome. More compounds means more diversity, which translates into more chances of finding interesting proteins and discoveries that are promising," says Meroueh.
"When we met with Dr. Meroueh in January, we were looking at a first run that would potentially include 3,900 proteins and 5,000 compounds, totaling 19.5 million docked pairs," explains Rob Quick, manager of Indiana University's high-throughput computing group and OSG operations area coordinator. "This translates to 19.5 million individual, short jobs on OSG."
"We did face a few preliminary challenges. Some nodes on OSG did not support the Computational Sciences at Indiana University (CSIU) Virtual Organization at the time, which we knew we were going to use. We also knew some sort of grouping was necessary to negotiate that many individual jobs. We were also looking at 19.5 million output files," Quick says.
"Mats Rynge from the Information Sciences Institute at the University of Southern California, US, hooked us into Pegasus for workflow management and helped with the grouping and creating the sub-workflows. His work and the grouping really put us in the sweet spot on OSG for a total runtime of between one to two hours."
Subsequent runs addressed the non-trivial problem of how to conveniently deliver the large number of output files to the researchers. "In the later runs, we further grouped jobs so that output was organized into a more manageable directory structure, significantly reducing output file handling overhead," says Scott Teige, OSG operations center manager. Teige built on Quick's work in the follow-on runs requested by Meroueh's group, which comprised approximately 11 million docking jobs.
"We had contributing nodes on OSG totaling near 1.4 million hours for the first 19.5 million jobs, which ran for about a month," Quick says. "Since then we've done two more runs and we're now up to more than 30 million docking jobs consuming over 3 million CPU-hours from opportunistic OSG resources. Following all of the preliminary work though, the crowning jewel in this project has been being able to finally hand this off to the graduate students at IUSM, who can now control their data and submit jobs on OSG at their own discretion."
"Hopefully the interactome will be a platform for researchers to do a lot of exciting science," says Meroueh. "I'm of the opinion that cancer is not a single target disease; it is a multi-target disease, meaning in order to really vanquish cancer, we have to be able to target multiple proteins simultaneously. One protein will not be enough."
"Cancer cells have this amazing ability to adapt; if you inhibit one protein, the cancer cell will find another signaling pathway to continue surviving. Most scientists agree a multi-pronged approach is necessary, but doing it is the challenge. I hope this interactome will enable us to look at an ensemble of targets and start thinking about how to find compounds that could shut down multiple pathways simultaneously. My hope is that we will eventually be able to explore the entire proteome."