Feature - Cancer researchers speed crystallography
Using the World Community Grid (WCG), scientists at the Help Conquer Cancer Project have found a way to automate and speed up protein crystallography, according to a recent paper in the Journal of Structural and Functional Genomics.
X-ray crystallography uses X-rays to determine the atomic structure of crystals. Although biological molecules such as proteins and DNA are not normally crystalline, they can be prompted to form crystals through exposure to the right chemical compounds. Once a protein has crystallized, scientists can use X-rays to map its structure. Knowing that structure is invaluable to scientists trying to understand how a protein interacts with the human body to cause cancer and other illnesses.
However, it can take thousands of attempts before a single protein sample crystallizes, so human experts have had to examine each attempt to verify whether crystallization has occurred.
"The process of crystallography some say is more art than science," said Igor Jurisica, a senior scientist at the Ontario Cancer Institute. "You can find multiple experts that not necessarily completely agree on what they see."
Human verification is also time-consuming; for high-throughput protein crystallography to become a reality, scientists needed to find a way to automate the process.
That's exactly what Jurisica and his colleagues at the Hauptman-Woodward Medical Research Institute (HWMRI) in Buffalo, New York, have been working on for the last decade. Using HWMRI's image bank of 12,500 human-identified crystallized cancer-related proteins as their gold standard, they have been refining their image-analysis algorithm, making it progressively more accurate and comprehensive.
The first challenge they faced was identifying which quantifiable characteristics distinguish the image of a protein that has crystallized from the image of one that has not. This was harder than you might think. Just as an athlete may not be able to explain how they accomplish a complex maneuver, most experts can't really describe what they look for when they identify crystallized proteins. They just know it when they see it.
To start, Jurisica's team came up with 850 characteristics to measure in each image. Using those features, their algorithm correctly verified crystallization in 70% of cases. The algorithm has evolved since then; today it computes over 15,000 features and correctly identifies 80% of crystal-bearing images and 98% of the clear drops of protein solution that exist before crystallization.
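The two later figures measure different things: the share of crystal-bearing images the algorithm labels as crystals, and the share of clear drops it labels as clear. Read that way, each is a per-class hit rate (recall). The Python sketch below computes that metric under this assumed reading; the function name per_class_recall is hypothetical, and the labels are invented toy data, not HWMRI results.

```python
from typing import Dict, List


def per_class_recall(truth: List[str], predicted: List[str]) -> Dict[str, float]:
    """For each true class, the fraction of its images labeled correctly.

    Hypothetical helper illustrating how figures such as "80% of
    crystal-bearing images" and "98% of clear drops" can be read.
    """
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for t, p in zip(truth, predicted):
        totals[t] = totals.get(t, 0) + 1
        if t == p:
            hits[t] = hits.get(t, 0) + 1
    return {label: hits.get(label, 0) / n for label, n in totals.items()}


# Invented toy labels: 5 crystal images and 5 clear drops.
truth = ["crystal"] * 5 + ["clear"] * 5
predicted = ["crystal"] * 4 + ["clear"] + ["clear"] * 5
print(per_class_recall(truth, predicted))  # {'crystal': 0.8, 'clear': 1.0}
```

Neither number alone is overall accuracy: with these toy labels, 9 of 10 images are classified correctly, even though the two classes score differently.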
"As we improved the algorithm, we had the problem that the fastest computer that we would be able to use would be a Linux cluster with 2000 cores, but even if we could take it completely it would take 168 years," Jurisica said.
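Jurisica's estimate implies a fixed amount of work: 2,000 cores running for 168 years is 336,000 core-years. The Python sketch below does that arithmetic. It then estimates how many dedicated cores would finish the same work in five years, assuming perfect linear scaling and cores of roughly equal speed. Both are simplifying assumptions, since volunteer machines vary in speed and are not always available. The function names are hypothetical.

```python
def core_years(cores: int, years: float) -> float:
    """Total work, assuming every core stays busy for the whole period."""
    return cores * years


def cores_needed(total_core_years: float, target_years: float) -> float:
    """Cores required to finish the work within target_years.

    Assumes perfect linear scaling: because each image is analyzed
    independently, doubling the cores halves the wall-clock time.
    """
    return total_core_years / target_years


total = core_years(2000, 168)
print(total)                   # 336000
print(cores_needed(total, 5))  # 67200.0
```

The result only indicates scale; actual grid throughput depends on how many volunteer machines are active at any time and how fast they are.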
Jurisica had a history of working with IBM on other research projects, and his colleagues at IBM had already mentioned the WCG to him once before. It quickly became clear that the grid was the right choice for the project's needs.
"The grid was the only option where we could actually go and process all of these images within a reasonable time frame," Jurisica said. "It's basically simple parallelism, because each image can be sent to a separate processor; we can process each image separately without any need for communication between those processing nodes."
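Because each image is analyzed on its own, the workload is what computer scientists call embarrassingly parallel. The Python sketch below illustrates that pattern with a hypothetical extract_features function that computes two toy statistics per image. It is not the project's C++ code, its 15,000 features, or the BOINC API. A local thread pool stands in for the grid, where BOINC distributes work units to volunteer computers. Threads keep the sketch portable; CPU-bound Python code would normally use processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

Image = List[List[int]]  # hypothetical: grayscale pixel values, 0-255


def extract_features(image: Image) -> Dict[str, float]:
    """Hypothetical stand-in for the real feature extractor.

    It reads only its own image, so it needs no data from any other task.
    """
    pixels = [p for row in image for p in row]
    n = len(pixels)
    return {
        "mean_intensity": sum(pixels) / n,
        "bright_fraction": sum(1 for p in pixels if p > 200) / n,
    }


def process_all(images: Dict[str, Image], workers: int = 4) -> Dict[str, Dict[str, float]]:
    """Analyze every image as an independent task; workers never communicate."""
    names = list(images)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(extract_features, (images[name] for name in names))
        return dict(zip(names, results))


if __name__ == "__main__":
    toy_images = {"a": [[0, 255], [255, 0]], "b": [[100, 100], [100, 100]]}
    print(process_all(toy_images, workers=2))
```

Since no task depends on another, the parallel result is identical to analyzing the images one after another; only the wall-clock time changes.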
The Help Conquer Cancer team translated their prototype into C++, and IBM donated the effort needed to set the system up on the WCG, which uses the well-known BOINC (Berkeley Open Infrastructure for Network Computing) platform for volunteer computing.
When the Help Conquer Cancer Project officially launched on the WCG in November 2007, the team expected to be able to access the WCG's 85 million CPUs and analyze their images in only five years, a far cry from the 168 years they would need on their high-performance cluster.
Since then, the outlook has only improved: the Help Conquer Cancer Project has refined its algorithm, and the WCG has grown to include 115 million CPUs. Jurisica now hopes to complete the analysis in 2011.
While the WCG chips away at the mountain of images to be analyzed, Jurisica and his colleagues continue to refine their algorithm.
"The more computations we can do, the better result we obtain from this analysis," Jurisica explained. "We already started doing data mining and analysis of these results in order to speed up and gain additional benefits."
The algorithm designed by the Help Conquer Cancer Project may also prove useful beyond this project, for instance in other protein crystallography studies.
Said Jurisica, "If they are photographed in a similar way, they would be able to work with maybe some small modifications and some retraining of the algorithm."
-Miriam Boon, iSGTW