Feature: Scientific software goes parallel



Most scientists do their data analysis using commercial software over which they have little control. Yet, to take advantage of the multicore processors that new computers ship with, algorithms must be designed to run in parallel. Luckily, many of the more popular scientific software packages have gone parallel. Some even offer versions or toolboxes that manage clusters or grids. iSGTW surveyed the leading packages for the latest information on how each one handles parallelization.

Excel

"More than 50 percent of customers cite lack of skills and complexity of integrated cluster solutions as inhibiting to adoption," said Kyril Faenov, general manager of Microsoft's technical computing group. "On the other hand, we know that more than 50 percent of workstation users today have said that they have problems that don't fit in their workstations and that require large-scale infrastructure, but only 10 percent of them use HPC."

Since virtually all Windows machines already have a copy of Excel, the HPC team at Microsoft views Excel as the perfect place to start researchers on parallel computing, according to Faenov. "We have a broad set of researchers that work for engineering firms for example and they want to look at the data coming in from their labs," Faenov said. "Some of them choose to very quickly whip up their models in Excel to do some simple statistical analysis. But then what you want to avoid is having to recode it."

The latest version of Excel will automatically make use of multiple cores, but for more advanced scheduling, it should be used in conjunction with Windows HPC Server 2008 R2, which can distribute jobs across multiple workstations or even clusters, spawning new instances of Excel as needed. Many of the algorithms scientists use on a regular basis, such as Monte Carlo simulations, come in libraries that must be purchased separately.

Maple

Maplesoft introduced the Grid Computing Toolbox (GCT) for Maple in late 2007. The GCT allows Maple users to run computations on large networks, multinode supercomputers, or the more commonplace multicore computers widely available on the market. Maple 13, which was released 27 April 2009, provides a multithreaded programming model called the Task Programming Model. The Task Programming Model was originally inspired by the MIT Supercomputing Technologies Group's Cilk project and Intel's Threading Building Blocks.

Parallelization is an ongoing area of development for Maple. "Our goal for the future is to keep expanding the set of operations where multiprocessing happens automatically and without further user intervention, while always allowing an expert user to override the defaults and interact with Maple's parallel toolkits at a low level in order to achieve the best possible performance," said Laurent Bernardin, executive vice president of research and development at Maplesoft.

Mathematica

Mathematica first dipped a toe in the parallelization pool in 2005, when Wolfram Research made many of its intensive numerical capabilities, such as statistics and matrix operations, multithread automatically across multiple local CPUs or multicore CPUs. The most recent version of Mathematica, which was released in 2008, introduced user-specified parallelization tools and an extensible license model for additional computation engines.

"The idea is that a user can specify within their own programs that a particular task is to be split into parts and distributed over subordinate instances of Mathematica to be computed in parallel," said Jon McLoone, who is currently director of business development at Wolfram Research, the makers of Mathematica. "Then the results [are] returned and combined in the controlling Mathematica."

Mathematica developers have tried to automate as much as possible, in as many ways as possible. "So, for example, I can write code that says effectively 'do the following operation in parallel on 1000 data points' but not have to specify how many parts to break it into, or where the subordinate Mathematicas are or how many of them there are," McLoone explained. That way, whether the software is run on a single multicore machine or in a lab with all of the computers networked together, Mathematica will automatically split the problem into the correct number of parts and distribute them, as long as the user has the right Mathematica licenses. Other kinds of automation incorporated into Mathematica include automatically distributing programs, restarting subordinates in the case of a failure, and discovering and acquiring remote computers.

What next? "There are a few more well-understood parallel algorithms that are yet to be implemented, but also many algorithmic areas where no one has ever really studied how one might parallelize in a robust and effective way," McLoone said. "The user-specified parallelization platform makes it easy for our developers to write parallel algorithms just as it does for end-users. It also means that many of the parallel algorithms that will be in future versions of Mathematica will also work over networked grids and clusters, not just on multicore local hardware."

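The pattern McLoone describes, asking for an operation to be run in parallel over 1000 data points without saying how the work is divided, can be illustrated outside Mathematica. The following is only a rough Python analogue, not Mathematica code and not Wolfram's implementation. The names `parallel_map` and `square` are hypothetical, and a thread pool is used purely to keep the sketch short and portable; for CPU-bound work in Python, the standard library's `ProcessPoolExecutor` would be the usual substitute.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map(func, data, workers=None):
    """Apply func to every item of data using a pool of workers.

    The caller never says how to split the data. With workers=None the
    executor picks a default pool size based on the machine's CPU count.
    (Hypothetical helper, written for illustration only.)
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so they line up with data.
        return list(pool.map(func, data))


def square(x):
    """Stand-in for 'the following operation' applied to each data point."""
    return x * x


data_points = list(range(1000))
results = parallel_map(square, data_points)
print(results[:5], len(results))
```

The calling code stays the same whether the pool ends up with two workers or thirty-two, which is the property McLoone highlights: the user states what should happen in parallel, and the runtime decides how to spread it out.
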
MATLAB

The MathWorks' Parallel Computing Toolbox (PCT) allows users to take advantage of multiprocessing on a multicore computer, while the MATLAB Distributed Computing Server (DCS) extends that ability to larger pools of processors such as clusters. A variety of MATLAB algorithms incorporate support for the use of parallel processing; these work in conjunction with the PCT and DCS.

"Users are looking to easily access the power of high performance computing resources, whether it be maximizing the multicore capabilities of their local desktops or using clusters and grids," said Silvina Grad-Freilich, manager of parallel computing and application deployment marketing at The MathWorks. "Parallel Computing Toolbox supports up to eight desktop cores, enabling users to leverage the increased hardware capabilities without requiring significant programming."

Researchers can also use the Parallel Computing Toolbox to create their own programs. "Parallel Computing Toolbox language constructs, such as spmd, simplify the development of data-intensive parallel applications," Grad-Freilich said. "With language features such as spmd, users solve large computationally and data-intensive technical problems by making minimal to no code changes to their existing code."
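
For readers unfamiliar with the term, spmd stands for "single program, multiple data": every worker runs the same block of code, and each one uses its own index to decide which part of the data to handle. The sketch below shows that idea in Python rather than MATLAB, so it is an analogy and not the toolbox's API. The names `spmd_body`, `run_spmd`, `rank` and `num_workers` are assumptions made for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def spmd_body(rank, num_workers, data):
    """The single program that every worker runs.

    Only rank differs between workers; each takes every num_workers-th
    element starting at its own rank and returns a partial sum.
    """
    my_slice = data[rank::num_workers]
    return sum(my_slice)


def run_spmd(num_workers, data):
    """Launch the same body once per worker and collect the partial results."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [
            pool.submit(spmd_body, rank, num_workers, data)
            for rank in range(num_workers)
        ]
        # Results are gathered in rank order.
        return [future.result() for future in futures]


data = list(range(1, 101))
partials = run_spmd(4, data)
total = sum(partials)
print(partials, total)
```

Because the per-worker logic lives in one function, changing the number of workers changes only how the data is sliced, not the code that processes each slice.
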

Stata

Stata is a statistical package, and many of its operations require substantial computational power. That's why a multiprocessing version that runs on up to 64 cores, called Stata/MP, was introduced in April of 2006. "Stata/MP works by parallelizing the computations that underlie the commands," said Vince Wiggins, vice president of scientific development at StataCorp. "That means that Stata users do not have to change anything to obtain performance improvements."

According to Wiggins, thanks to the three years StataCorp spent developing Stata/MP, many of its algorithms approach the theoretical limit of running twice as fast every time you double the number of cores. "Because we implemented parallel algorithms and tuned the use of multiple processors, rather than rely on compiler switches, we were able to effectively parallelize more commands and approach theoretical performance limits for many commands," Wiggins said.

Since then, they have tried using the new compiler switches that automatically provide multiprocessor support, with little success. "Compared to our hand-rolled algorithms, the results were dismal," Wiggins said. "For statistical computations, compilers cannot adequately identify large code blocks that can be parallelized, and they clearly cannot automagically apply wholly different algorithms that are naturally parallelizable."

Stata also has an integrated matrix programming language called Mata. The Stata/MP version of Stata 11, which was released in July of 2009, included 109 parallelized functions in Mata. It doesn't stop there, either. "We continue to parallelize new commands in Stata and new functions in Mata," Wiggins said.
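
The "theoretical limit" Wiggins refers to, twice the speed for twice the cores, is reached only when essentially all of a command's work can run in parallel. A standard way to reason about this is Amdahl's law, which is not cited by StataCorp and is introduced here only as an illustration: if a fraction p of the work is parallelizable, the speedup on n cores is 1 / ((1 - p) + p / n). The short Python sketch below tabulates that formula; the parallel fractions are hypothetical values, not measurements of Stata/MP.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical parallel fractions, chosen only to show the shape of the curve.
fractions = (1.0, 0.95, 0.75)

print("cores " + " ".join(f"p={p:<6}" for p in fractions))
for cores in (1, 2, 4, 8, 16, 32, 64):
    row = " ".join(f"{amdahl_speedup(p, cores):8.2f}" for p in fractions)
    print(f"{cores:5d} {row}")
```

With p = 1.0 the speedup doubles with every doubling of cores, which is the limit described above. Even a small serial remainder flattens the curve well before 64 cores, which helps explain why hand-tuning the parallel portion of each command matters.
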

STATISTICA

"We introduced this capability with the release of the server-based WebSTATISTICA solutions back in January 2004," said Thomas Hill, the vice president of analytic solutions at StatSoft. "In this architecture, instances of STATISTICA would run in parallel to support significant computational load." WebSTATISTICA is scalable, supporting multiple servers and intelligent failover (that is, requests for analyses will automatically be routed to available servers).

STATISTICA Version 9, the desktop version of the software, supports multithreading for most analytic tasks. These include all of the data mining algorithms, such as classification and regression trees and stochastic gradient boosting. For true parallel processing, however, users will have to go with WebSTATISTICA.

What next? In addition to considering a possible move into the realms of cloud computing and software as a service, StatSoft will focus on improving integration between desktop and server-based computing, and on continued development of efficient interfaces to STATISTICA analytic services to support automated scoring and analysis solutions.

Miriam Boon, iSGTW