Feature - Stem cell research goes Boolean with BooleanNet

Image courtesy of Rodolfo Clix.

To make use of the human genome in our quest to understand genetic disorders, we need to learn more about what each gene accomplishes. Unfortunately, connecting a specific gene to the formation of a specific cell can take years of hard work and thousands of dollars.

However, an algorithm that could cut that time from years to hours has passed its first litmus test, according to a paper recently published in the Proceedings of the National Academy of Sciences.

The paper's lead author, Debashis Sahoo, had his eureka moment during an immunology class. Sahoo, who was working on a doctorate in electrical engineering at the time, observed that although many biological relationships are asymmetrical, biologists tended nonetheless to look for symmetrical relationships. Sahoo and his advisors quickly realized that these asymmetrical relationships can be found using Boolean logic, such as if-then implication structures, and the BooleanNet program was born.

"My magnifying glass is Boolean implications," explained Sahoo, who is now a bioinformatics researcher at the Stanford Stem Cell Institute.
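To make the asymmetry concrete, here is a minimal sketch of an if-then check between two genes. It is an illustration only, not BooleanNet's actual code: the function name `implies`, the `max_violations` tolerance, and the toy high/low values are all assumptions made for this example, and the real program would need a statistical test rather than a simple count.

```python
def implies(antecedent, consequent, max_violations=0):
    """Check 'if antecedent is high, then consequent is high' across samples.

    Each argument is a list of booleans, one per sample (True = gene is high).
    The implication is violated by any sample where the antecedent is high
    but the consequent is low. max_violations is a hypothetical tolerance.
    """
    violations = sum(1 for p, q in zip(antecedent, consequent) if p and not q)
    return violations <= max_violations


# Toy data, invented for illustration: five samples, two genes.
a_high = [True, True, False, False, False]
b_high = [True, True, True, False, True]

print(implies(a_high, b_high))  # True: whenever A is high, B is high
print(implies(b_high, a_high))  # False: B is high in samples where A is low
```

The relationship holds in one direction but not the other, which is exactly the kind of pattern a search for symmetrical relationships would miss.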

In order to test BooleanNet, Sahoo and his colleagues chose the B cell. "B cell is the most well understood developmental pathway in the whole of developmental biology," said Sahoo.

Sahoo and his colleagues began with three genes associated with B cell development. One, which we will call A, is known to be active at the very beginning of B cell development. The other two, C and D, are active late: one just before the end of B cell development and the other at the very end. The researchers decided to search for a gene we will call B, which is a precursor to C and D.

By searching existing databases for genes that are inactive while A is active, but are active while C and D are active, the algorithm produced a list of 62 genes. These genes, according to the algorithm, had a high probability of being involved in the middle of B cell development.
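The search described above can be sketched as a filter over a table of high/low gene states. This is a simplified illustration under stated assumptions: the function `find_intermediate_genes`, the candidate names X1 to X3, and the six-sample data set are all hypothetical, and the strict zero-violation rule stands in for whatever statistical criterion the real algorithm applies.

```python
def implies(antecedent, consequent, max_violations=0):
    """True if 'antecedent high => consequent high' holds across samples."""
    violations = sum(1 for p, q in zip(antecedent, consequent) if p and not q)
    return violations <= max_violations


def negate(values):
    """Flip high/low so that 'X is low' can be used as a consequent."""
    return [not v for v in values]


def find_intermediate_genes(states, early, late):
    """Return genes that are low whenever `early` is high
    and high whenever each gene in `late` is high."""
    hits = []
    for gene, values in states.items():
        if gene == early or gene in late:
            continue
        off_with_early = implies(states[early], negate(values))
        on_with_late = all(implies(states[g], values) for g in late)
        if off_with_early and on_with_late:
            hits.append(gene)
    return hits


# Toy data, invented for illustration: six samples ordered from early to late.
states = {
    "A":  [True,  True,  False, False, False, False],
    "C":  [False, False, False, False, True,  True],
    "D":  [False, False, False, False, False, True],
    "X1": [False, False, True,  True,  True,  True],   # fits the pattern
    "X2": [True,  False, True,  True,  True,  True],   # high while A is high
    "X3": [False, False, False, True,  False, True],   # low while C is high
}

print(find_intermediate_genes(states, early="A", late=["C", "D"]))  # ['X1']
```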

Image courtesy of Svilen Milev.

To test that prediction, the researchers searched public databases for strains of laboratory mice engineered to be deficient in one of the 62 genes. Strains were found for 41 of the genes, and 26 of those 41 (about 63 percent) are known to be associated with defects in B cell development.

Before the method is likely to be widely accepted, "we have to prove this in multiple systems," said Sahoo. That process may take longer, as other developmental pathways are not as well understood, and the data on relevant strains of laboratory mice needed to verify predictions may not exist.

Gaining the trust of developmental biologists is just one barrier standing in the researchers' way. Getting access to data is also likely to be a challenge. So far they have been able to pull data from existing free data archives, but not all of the archived data is usable. For example, people sometimes publish only "normalized" data in these archives, whereas Sahoo needs the raw, un-normalized data.

BooleanNet is not, of course, a panacea. Even with all of the data in the world, this method will not find every gene associated with a developmental path.

"I look at every gene that has Boolean implications, but there are genes that don't have Boolean implications," said Sahoo. "You won't be able to figure out everything but you will be able to figure out most things that have systematic Boolean relations."

For the PNAS paper, BooleanNet took only four hours of computational time on a standard high-end desktop. But the algorithm's need for processing power scales quadratically as more information is added. With next-generation sequencing technologies coming online, the amount of data available to BooleanNet will increase sharply. At that point, applying the algorithm will require more processing power, storage space, and/or bandwidth.
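A rough way to see what quadratic scaling means in practice: if we assume the cost comes from comparing every gene against every other gene (an assumption for this sketch, since the article does not spell out the source of the scaling), then the number of pairs to check grows with the square of the number of genes. The function name `pair_count` and the gene counts below are illustrative only.

```python
from math import comb


def pair_count(n_genes):
    """Number of distinct gene pairs among n_genes, i.e. n * (n - 1) / 2."""
    return comb(n_genes, 2)


# Doubling the number of genes roughly quadruples the number of pairs.
for n in (1_000, 2_000, 4_000):
    print(n, pair_count(n))
```

Under that assumption, a data set twice as large needs about four times the work, which is why a desktop that handles today's archives may not keep up with tomorrow's.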

"If we can get the data in one place, then we can probably use a supercomputer for that analysis," said Sahoo. "If the data is distributed in many parts of the world, you can't get them together for security issues; a distributed grid can be used in that scenario."

-Miriam Boon, iSGTW


  • Miriam Boon

    After earning her undergraduate degree in physics from MIT, Miriam decided that writing about science is just as much fun as doing it. Over the next decade, she developed a specialization in science journalism and new media, earning a master's degree in journalism in 2010. Her career culminated in three years as US Editor of iSGTW. Today, she is a student and research assistant in the Technology and Social Behavior doctoral program at Northwestern University, where she hopes to do applied experimental research at the intersection of online communication, science, democracy, learning, and computing.