Whether it's modeling the neural connections of the brain or visualizing blood flow in the arteries, life scientists' use of distributed computing for research is increasing rapidly. In 2012, the European Grid Infrastructure (EGI) reported a 106% rise in life science usage, and around 11% of users of PRACE's high-performance computing resources are life scientists. Software use and development is growing in the area, but codes and applications can exhibit very different efficiencies and scaling behaviors on different hardware architectures. ScalaLife, a one-stop shop for life scientists and software communities, has been helping the community make simulations run faster and more efficiently. With the project set to end in August, Rossen Apostolov will discuss some of its findings at the International Supercomputing Conference (ISC'13) later this month in Leipzig, Germany.
"If you come in as an experimentalist and you need to run a bio-molecular simulation, it can be a bit intimidating to dive into an area of command-lines, clusters and parallelizations," says Apostolov, who is a researcher at KTH Royal Institute of Technology, Sweden and is the project coordinator of ScalaLife. "If a researcher wants to use their application efficiently on HPC machines, they have to understand both the underlying algorithms, and the specifics of the systems, which can be a rather a steep learning curve. Most end-users will often run a sub-optimal setup and that's where expert advice is very helpful," he says.
Seven apps in Competence Center for Computational Life Scientists
Since 2011, the EU-funded project has united life scientists with expert software developers through a web portal - a life sciences "Competence Centre", which offers best practice guides, tutorials, a help desk, training and advice on choosing parameters for modeling. The site attracts on average 3,500 visits each month and shows, for example, how simple tweaks can improve performance four- to five-fold.
"Structural biologists will use crystallography to investigate the interaction of membrane proteins so they can test out possible drug targets and they can gain a lot from bio-molecular simulations. With the latest codes, you can simulate the molecular dynamics of gigantic biological systems of 500 million atoms for example. We are only limited by available memory," says Apostolov.
In the last year, three new applications have also been added. MUSIC allows various neurosimulator applications to exchange data during run-time. XMIPP is a suite of image processing programs, primarily aimed at single-particle 3D electron microscopy. SIMONA, the newest application, helps simulate protein interactions.
Help with HPC access
It is still mainly computational experts, not the "average" life scientist, who apply for HPC resources. The project, however, offers assistance with proposals for PRACE's resources: researchers can state that their code has been tested and runs efficiently in parallel on thousands of cores. "Clearly, compiling code to get the most out of the HPC resources is non-trivial. ScalaLife avoids reinventing the wheel every time, by providing optimized solutions," says Alexandre Bonvin, project coordinator of WeNMR, a worldwide e-infrastructure for structural biology and NMR (nuclear magnetic resonance).
Gluing together structural and quantum worlds
The scalability of GROMACS, a popular application for simulating molecular dynamics, has been pushed several times beyond that of the original implementation. GROMACS is fifth in the list of WeNMR's most popular applications.
Quantum mechanical calculations are even more computationally intensive. At the quantum scale, researchers often rely on a program called DALTON to study the quantum electronic properties of biomolecules. Researchers at KTH recently published a new super-fast algorithm for electron integrals in the Journal of Chemical Physics. Zilvinas Rinkevicius, a lecturer in theoretical chemistry, says that the project has enabled their research to move from the investigation of conceptual model systems to full-scale simulations of biomolecules in their environment.
The project has now integrated Dalton and Gromacs into the multi-scale MAPPER framework that can break up the task of simulating multi-scale processes into several single-scale problems. The aim is to facilitate data exchange between quantum and molecular mechanical calculations. "Typically, you would run a simulation with Gromacs and take snapshots from the trajectory of the protein and membrane, and then process those snapshots with Dalton to understand and calculate the optical features of those proteins embedded in the membrane. This is not very efficient as you often waste CPU cycles going back and forth between the two applications," says Apostolov.
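The snapshot workflow Apostolov describes can be sketched in a few lines of Python. This is only an illustration of the idea, not ScalaLife's or MAPPER's actual code: the frame layout, the function names `select_snapshots` and `process_snapshots`, and the toy quantum step are all assumptions made for the example, and no real Gromacs or Dalton interface is called.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for one frame of a molecular dynamics trajectory.
Frame = Dict[str, float]


def select_snapshots(trajectory: Sequence[Frame], stride: int) -> List[Frame]:
    """Keep every `stride`-th frame of the trajectory, starting with the first."""
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    return list(trajectory[::stride])


def process_snapshots(
    snapshots: Sequence[Frame], quantum_step: Callable[[Frame], float]
) -> List[float]:
    """Hand each snapshot to a separate, more expensive calculation."""
    return [quantum_step(frame) for frame in snapshots]


def toy_quantum_step(frame: Frame) -> float:
    # Placeholder for a quantum chemistry calculation; it only doubles a number.
    return 2.0 * frame["energy"]


# A made-up ten-frame trajectory (assumed data, for illustration only).
trajectory = [{"time_ps": float(t), "energy": 0.5 * t} for t in range(10)]

snapshots = select_snapshots(trajectory, stride=3)
results = process_snapshots(snapshots, toy_quantum_step)
```

In this sketch the two stages only meet through the list of snapshots, which mirrors the back-and-forth between two separate applications that the quote describes as inefficient.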
Simplifying tool discovery
For life scientists, the reasons for choosing particular software vary enormously. A recent survey published in the journal Science suggests that, when modeling species distribution, 57% of scientists based their decisions on an easy-to-manipulate user interface, while only 11% used 'syntax-driven' platforms (read more about this in last week's issue of iSGTW).
Bridging the skills gap is an important step, but finding the right tools for research needs can prove problematic as semantics can differ. "It can be difficult to find and make sense of the software offerings out there as tools come in a variety of types of interface, can have multiple functions (including both data services and analytical functions), are bundled in arbitrary collections, and served by multiple providers," explains John Ison, a developer from the EMBL-European Bioinformatics Institute in Cambridge, UK, who is constructing a service/tool registry to help researchers navigate the heterogeneous bioinformatics software landscape. Researchers will be able to find, understand, compare and select software more easily, including web services, command-line tools, web-user interfaces, and desktop programs. The initiative is part of the BioMedBridges project, which is bringing together ten different communities in the biological and medical sciences.
As the software landscape is vast and evolves all the time, managing the curation effort will have to be a federated community effort. It is about shifting the culture of developers and providers so that when changes are made, software metadata (in an easily consumed format) is updated - so that it can be incorporated automatically into a central public registry.
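As a rough illustration of what automatic incorporation of software metadata could look like, the Python sketch below reads a JSON record and adds it to an in-memory registry. The field names (`name`, `version`, `interface`), the `ingest` function and the example tool are hypothetical; they are not the schema or code of the BioMedBridges registry.

```python
import json
from typing import Dict

# Assumed minimal set of fields a provider must publish (hypothetical schema).
REQUIRED_FIELDS = ("name", "version", "interface")


def ingest(registry: Dict[str, dict], metadata_json: str) -> bool:
    """Add or update one tool record; return True if the registry changed."""
    record = json.loads(metadata_json)
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError("metadata is missing fields: " + ", ".join(missing))
    if registry.get(record["name"]) == record:
        return False  # nothing new to incorporate
    registry[record["name"]] = record
    return True


registry: Dict[str, dict] = {}

first = json.dumps(
    {"name": "example-tool", "version": "1.0", "interface": "command-line"}
)
update = json.dumps(
    {"name": "example-tool", "version": "1.1", "interface": "command-line"}
)

added = ingest(registry, first)       # new record
updated = ingest(registry, update)    # provider published a new version
unchanged = ingest(registry, update)  # same metadata submitted again
```

Keying the registry on the tool name means a provider's update replaces the earlier record rather than creating a duplicate, which is one simple way to keep a federated registry current.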
Projects such as ScalaLife and the BioMedBridges tool registry are helping to establish a software environment that is easier to navigate, which will allow life scientists to focus their efforts on their research goals rather than on the complex and often distracting journey of compiling, testing and debugging code.
Read more iSGTW stories on Mapper (Glueing together a multi-scale world) and ScalaLife ('One-stop shop' saves time & money for life sciences).