iSGTW Feature - Never-born proteins: predicting protein structures from your web browser

By predicting the structure and functional properties of "never born proteins," scientists can learn more about the structure and function of the proteins that actually exist.
Image courtesy of INFN

While the number of known natural protein sequences is large, it is vanishingly small compared with the number of proteins that could theoretically be built from the 20 natural amino acids. Consequently, a huge number of possible protein sequences have never been observed in nature: the so-called "never born proteins."
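The scale of the gap follows from simple counting. A chain of n residues, each chosen from the 20 natural amino acids, can be written in 20^n different ways. Taking a 100-residue chain purely as an illustrative length (our choice, not a figure from the study):

```latex
N(n) = 20^{n}, \qquad
N(100) = 20^{100} = 2^{100} \times 10^{100} \approx 1.27 \times 10^{130}
```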

If we can study the structural and functional properties of these "never born proteins," we can improve our knowledge of the fundamental properties that make existing protein sequences so unique. Such studies would take years on a single CPU, but grid infrastructures make it feasible to tackle the problem in an acceptable time frame.

The Rosetta software

The Rosetta ab initio module is a software application developed by David Baker at the University of Washington. It predicts the three-dimensional structure of a protein from its amino acid sequence, starting from a secondary structure prediction of the sequence itself and a set of fragments specifically extracted from the Protein Data Bank.
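Rosetta's ab initio protocol is generally described as fragment assembly: short backbone fragments taken from known structures are repeatedly inserted into a model chain, and a Monte Carlo search keeps the moves that improve a score. The toy sketch below illustrates only that general idea. It is not Rosetta's code, move set, or energy function. The scoring function (`toy_score`), which merely rewards agreement with a predicted secondary-structure string, and the tiny fragment library are hypothetical stand-ins.

```python
import math
import random

# Idealized backbone torsion angles (phi, psi), in degrees, for a
# right-handed alpha helix ("H") and a beta strand ("E").
IDEAL = {"H": (-57.0, -47.0), "E": (-120.0, 130.0)}


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def toy_score(torsions, ss):
    """Hypothetical score (lower is better): squared deviation of each residue's
    torsions from the ideal values for its predicted secondary structure.
    Coil ("C") residues are unconstrained."""
    total = 0.0
    for (phi, psi), state in zip(torsions, ss):
        if state in IDEAL:
            ideal_phi, ideal_psi = IDEAL[state]
            total += angle_diff(phi, ideal_phi) ** 2 + angle_diff(psi, ideal_psi) ** 2
    return total


def fragment_assembly(ss, fragments, steps=2000, temperature=100.0, seed=0):
    """Metropolis Monte Carlo over fragment insertions: repeatedly overwrite a
    window of the chain with a library fragment and keep the best model seen."""
    rng = random.Random(seed)
    frag_len = len(fragments[0])
    current = [(180.0, 180.0)] * len(ss)  # fully extended starting chain
    current_score = toy_score(current, ss)
    best, best_score = list(current), current_score
    for _ in range(steps):
        pos = rng.randrange(len(ss) - frag_len + 1)
        trial = current[:pos] + list(rng.choice(fragments)) + current[pos + frag_len:]
        trial_score = toy_score(trial, ss)
        delta = trial_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score


# Hypothetical inputs: a predicted secondary structure string and a tiny
# library of 3-residue fragments (real libraries are extracted from the PDB).
ss = "CHHHHHHC"
fragments = [
    [(-57.0, -47.0)] * 3,   # helical
    [(-120.0, 130.0)] * 3,  # strand-like
    [(-60.0, 140.0)] * 3,   # extended, polyproline-like
]
best, score = fragment_assembly(ss, fragments)
print(f"best toy score: {score:.1f}")
```

A real prediction differs in almost every detail: it uses thousands of fragments per position, a physically motivated score, and many independent runs per sequence. This is part of why each prediction costs tens of CPU minutes.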

The Rosetta ab initio module is computationally demanding, requiring 40 to 80 minutes of CPU time per prediction, depending on the degree of refinement required. For this reason, as a first step towards running it on the EUChinaGRID production grid infrastructure, part of EGEE, we first integrated it into the GILDA facility, with the fundamental contribution of the INFN unit.
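Why a grid is needed becomes clear with some back-of-the-envelope arithmetic. The sketch below uses the 40 to 80 minutes per prediction quoted above. The workload of 10,000 sequences and the pool of 500 grid CPUs are hypothetical figures chosen only for illustration. The grid estimate assumes perfect parallelism and ignores queueing, data transfer and failed jobs, so real turnaround will be longer.

```python
import math

MINUTES_PER_YEAR = 60 * 24 * 365


def serial_cpu_years(n_sequences, minutes_per_prediction):
    """CPU time, in years, to run every prediction one after another on one CPU."""
    return n_sequences * minutes_per_prediction / MINUTES_PER_YEAR


def ideal_grid_days(n_sequences, minutes_per_prediction, n_workers):
    """Wall-clock days if predictions run n_workers at a time with perfect
    parallelism (ignores queueing, data transfer and failed jobs)."""
    rounds = math.ceil(n_sequences / n_workers)
    return rounds * minutes_per_prediction / (60 * 24)


# Hypothetical workload: 10,000 sequences at the slow end (80 min each).
n = 10_000
print(f"single CPU: {serial_cpu_years(n, 80):.2f} years")         # single CPU: 1.52 years
print(f"500 grid CPUs: {ideal_grid_days(n, 80, 500):.2f} days")   # 500 grid CPUs: 1.11 days
```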

The GENIUS grid portal allows grid users to access the power of Rosetta software at the click of a button.
Courtesy of GILDA and INFN


To allow the wider biological community to run this software through a user-friendly interface, the Rosetta ab initio application has been integrated into the GENIUS Grid Portal. GENIUS was developed within the Italian INFN Grid project and is based on EnginFrame, a grid portal developed by the grid technology company NICE.

Thanks to this grid portal, non-expert users can access the grid infrastructure from a conventional web browser to execute and monitor their protein structure prediction jobs. GENIUS hides all the complexity of the underlying grid infrastructure from the user, leaving researchers free to concentrate on their results.

In our context, a huge number of never born protein (NBP) sequences must be simulated. For this reason, we have set up an automatic procedure on the GENIUS Grid Portal for generating parametric JDL (Job Description Language) files. This procedure exploits features introduced in the latest release of the gLite middleware, allowing biologists to create and submit parametric jobs to the grid. Each submitted job then independently predicts a protein structure.
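The article does not show the JDL that GENIUS generates. As a hedged illustration only, the sketch below builds a parametric gLite JDL description of the kind such a procedure might produce. It relies on the gLite parametric-job attributes as we understand them (`JobType = "Parametric"`, `Parameters`, `ParameterStart`, `ParameterStep`, and the `_PARAM_` placeholder, which the workload management system replaces with each parameter value to create one sub-job per sequence). The wrapper script `run_rosetta.sh`, the file names `nbp_PARAM_.fasta` and `model_PARAM_.pdb`, and the helper functions are hypothetical.

```python
def parametric_jdl(n_sequences, executable="run_rosetta.sh"):
    """Build a parametric JDL description with one sub-job per NBP sequence.
    The executable and all file names are hypothetical placeholders."""
    attributes = [
        'Type = "Job";',
        'JobType = "Parametric";',
        f'Executable = "{executable}";',
        'Arguments = "nbp_PARAM_.fasta";',
        f"Parameters = {n_sequences};",   # sub-jobs for values 0 .. n_sequences - 1
        "ParameterStart = 0;",
        "ParameterStep = 1;",
        'StdOutput = "rosetta_PARAM_.out";',
        'StdError = "rosetta_PARAM_.err";',
        f'InputSandbox = {{"{executable}", "nbp_PARAM_.fasta"}};',
        'OutputSandbox = {"rosetta_PARAM_.out", "rosetta_PARAM_.err", "model_PARAM_.pdb"};',
    ]
    return "[\n  " + "\n  ".join(attributes) + "\n]\n"


def expand_for_job(jdl, value):
    """Local illustration of the placeholder substitution the middleware
    performs for each sub-job (it does more than this on a real grid)."""
    return jdl.replace("_PARAM_", str(value))


jdl = parametric_jdl(1000)
print(jdl)
print(expand_for_job(jdl, 7))  # the sub-job that predicts sequence nbp7.fasta
```

The design point is that one description covers the whole batch. The biologist submits a single parametric job, and the middleware fans it out into independent predictions, each with its own input, output and error files.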

- Fabio Polticelli, Pier Luigi Luisi and Giovanni Minervini, Computational Biochemistry Lab, Roma 3, Italy; and Giuseppe la Rocca, GILDA, Italy