To build an HPC cluster in house, or to access third-party HPC resources through the cloud: that is the question. While it may not be quite as poetic as Hamlet, this is the conundrum facing many small-to-medium-sized enterprises and research institutes. Organizations interested in conducting computationally expensive data analysis or carrying out complex simulations have to decide whether to build in-house HPC clusters or to take advantage of such clusters through cloud offerings. Both options have their relative pros and cons, but the message from last week's ISC Cloud '13 conference in Heidelberg, Germany, is that a clear middle way is increasingly emerging.
Termed 'utility HPC' by keynote speaker Jason Stowe, CEO of Cycle Computing, this middle way involves organizations owning in-house HPC resources of sufficient performance to cover their typical usage, but also supplementing this on an ad hoc basis with additional cloud-based HPC resources for particularly computationally expensive projects - in other words, HPC clouds to cover the peaks.
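To make the "cover the peaks" idea concrete, the scheduling logic can be reduced to a toy sketch: send jobs to the in-house cluster while it has free capacity, and burst anything that does not fit out to cloud resources. This is purely illustrative and is not Cycle Computing's actual scheduler; the function and parameter names are invented for the example.

```python
def schedule(jobs, in_house_capacity):
    """Toy 'utility HPC' placement: fill the in-house cluster first,
    then burst overflow jobs to the cloud.

    jobs              -- list of (job_name, nodes_requested) tuples
    in_house_capacity -- total nodes available in the local cluster
    Returns (in_house_jobs, cloud_jobs) as two lists of job names.
    """
    in_house, cloud = [], []
    free = in_house_capacity
    for name, nodes in jobs:
        if nodes <= free:
            # Typical workload: runs on owned hardware.
            in_house.append(name)
            free -= nodes
        else:
            # Peak demand: burst this job to cloud-based HPC.
            cloud.append(name)
    return in_house, cloud
```

For example, with a 10-node local cluster and jobs requesting 4, 8, and 6 nodes, the 8-node job bursts to the cloud while the other two run in house. Real burst schedulers weigh cost, data-transfer time, and queue wait as well, but the peak-shaving principle is the same.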
Consequently, as Stowe and others at the event argued, the question at the top of this article has become a false dichotomy: why ask 'either/or', when you can intelligently use both in conjunction? Why invest in extensive in-house HPC capacity that is unlikely to be fully exploited at all times? But equally, why not build some limited in-house capacity if your organization possesses the relevant expertise and if this would be cheaper than purely using cloud-based offerings? "Too often, scientists and researchers have to scale their research to match the capabilities of the hardware available, but with supercomputing available on demand, researchers should be able to do better science and do it faster," says Stowe.
Flexibility and adaptability were themes that ran through the conference as a whole: flexibility both in deciding when clouds are and are not used, and within the clouds themselves. Andrés Gómez of the Supercomputing Centre of Galicia, Spain, gave a presentation at the event about how federated HPC clouds can be used to optimize radiation therapy cancer treatments. He seeks to use complex mathematical algorithms to calculate the best treatment for an individual patient.
As part of this, Gómez presented an autonomous and fault-tolerant virtual cluster architecture, developed as part of the European BonFIRE project, which can handle significant variability in computing performance, including the failure of a cloud site. The architecture includes an 'elasticity engine', which uses information about application performance to automatically tailor the size of the HPC cluster to meet the expectations of the user. Given a specific deadline objective, the elasticity engine can start new machines and add them to the virtual cluster. Additionally, if the virtual cluster is deployed on two different cloud sites and one site fails, the cluster at the other site can resize itself to recover from the failure.
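The core of such deadline-driven elasticity can be sketched in a few lines: from the measured task throughput and the time remaining, compute how many nodes the cluster must have, and grow or shrink accordingly. This is a minimal illustration of the general idea, not the BonFIRE implementation; all names and the uniform-task assumption are invented for the example.

```python
import math

def required_nodes(remaining_tasks, avg_task_seconds, seconds_to_deadline):
    """Minimum number of identical worker nodes needed to finish all
    remaining tasks before the deadline, assuming independent tasks
    that spread evenly across nodes."""
    total_work = remaining_tasks * avg_task_seconds  # node-seconds of work left
    return math.ceil(total_work / seconds_to_deadline)

def scaling_decision(current_nodes, remaining_tasks, avg_task_seconds,
                     seconds_to_deadline):
    """Nodes to add (positive) or release (negative) to meet the deadline.

    avg_task_seconds would come from runtime performance monitoring,
    which also lets the engine react when a cloud site slows down or
    its nodes are lost to a failure.
    """
    needed = required_nodes(remaining_tasks, avg_task_seconds,
                            seconds_to_deadline)
    return needed - current_nodes
```

For instance, with 1,200 tasks of 30 seconds each remaining and one hour to the deadline, ten nodes are required; a four-node cluster would request six more, while a twelve-node cluster could release two. If one of two federated sites fails, `current_nodes` drops and the same calculation tells the surviving site how far to resize.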
Other presentations at the event also focused on the use of HPC clouds in a medical context. Ad Emmen, managing director of Genias Benelux, presented on the Dutch Health Hub, which is an initiative seeking to enable the sharing and use of medical big data across the Netherlands. Emmen explained that using big data in this setting requires a federation of clouds, and several cloud-federation tools that have been developed as part of the European Contrail project are currently being deployed to achieve this.
Also on a medical theme, Horst Schwichtenberg from the Fraunhofer Institute for Algorithms and Scientific Computing in Germany discussed effective ways to deal with sensitive patient data in a secure cloud environment. Of course, not all research case studies presented focused purely on medical applications. Fermilab's Steven Timm presented the latest results from the FermiCloud project and discussed research on the use of virtualized InfiniBand for the message-passing interface (MPI) and as a virtual private network (VPN). Also, Wolfgang Gentzsch, chair of the event, led a session focusing on some of the roadblocks faced by HPC as a cloud-based service. During this session, panelists discussed some of the lessons learnt from the UberCloud experiment, including from a case study focusing on computational fluid dynamics for a desalination plant. You can read more about this and other case studies in the UberCloud compendium.
Finally, Earl Dodd, president of Ideas and Machines, spoke during the closing session of the event on the importance of training in HPC. He argued that HPC cloud standards play a crucial role in developing a skilled workforce and said that he would have liked to have heard more people focus on the challenges related to the workforce throughout the two-day event. "Graduates often require years of additional training when they turn up to work at an oil company, or elsewhere," he says. "We need to focus more on the people and less on the systems." During his presentation, Dodd highlighted a number of ways to improve the situation, including: industry partnering with and funding groups like the Cloud and Autonomic Computing Center at Texas Tech University, US; extending university and college certification programs; increasing incentives for industry to train staff and exchange information; promoting and exploring more collaborative outreach; and more closely coupling HPC, the cloud, and big data business models with industry-driven needs. "Workforce development is like online dating: it's not just about what you are looking for, it's also about what you have to offer," says Dodd. "We've got to make it fun and sexy again."