Virtualization - putting all the spinning plates on one stick
To give a better idea of what virtualization is and how it can be used, we look at recent developments at CERN as a case study.
In the words of EGEE director Bob Jones, a computer center operates "a lot like the guy spinning plates at the circus. He's got more and more plates spinning on more and more sticks, and he's running around trying to keep them all going."
At the typical computer center, one machine is doing graphics, one is doing electronic document handling, and one is doing something else.
"But what virtualization does is put all those plates on one stick," said Jones. "You don't need several different physical machines, but can have everything running on one machine doing everything."
As a result, it's faster and more efficient.
Virtualization technology is not a new idea. Virtual memory was developed at Manchester University, UK, for the Atlas Computer as far back as 1962. In the 1980s, IBM's VM370 virtual machine system allowed hundreds of users to each work in a small slice of a single machine.
So why is virtualization becoming so important now?
The answer lies in multicore processors.
New tricks for an old dog
Because of the trend towards miniaturization and the plateau in computer clock speeds around 2005, the performance offered by single-core processors has hit something of a brick wall. To keep up with the annual performance gains dictated by Moore's law, microprocessor manufacturers started putting 2, then 4, 8, 16 and even 32 cores on the same processor. Multicore is here to stay: there is no other way that we know of to keep improving processor performance.
Until the advent of virtual memory, a computer was limited by the amount of physical memory it actually possessed. Virtual memory removed that limit by multiplexing pages of memory in and out of a backing store, giving each program the illusion of more memory than physically exists.
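The multiplexing idea can be sketched in a few lines. This is a toy illustration only, not a real operating-system pager: a small "physical memory" of a few frames, backed by a larger store, with least-recently-used pages evicted when space runs out.

```python
from collections import OrderedDict

class TinyVirtualMemory:
    """Toy pager: a small physical memory backed by a larger 'disk',
    with least-recently-used (LRU) eviction. Illustrative only."""

    def __init__(self, num_frames):
        self.num_frames = num_frames      # physical memory capacity
        self.frames = OrderedDict()       # page -> data, in LRU order
        self.backing_store = {}           # evicted pages live here
        self.page_faults = 0

    def access(self, page):
        if page in self.frames:
            self.frames.move_to_end(page)  # mark as recently used
            return self.frames[page]
        self.page_faults += 1              # page fault: fetch from disk
        if len(self.frames) >= self.num_frames:
            victim, data = self.frames.popitem(last=False)
            self.backing_store[victim] = data  # write victim out
        self.frames[page] = self.backing_store.pop(page, f"data-{page}")
        return self.frames[page]

vm = TinyVirtualMemory(num_frames=3)
for p in [0, 1, 2, 0, 3, 0, 1]:  # touches 4 distinct pages with 3 frames
    vm.access(p)
print(vm.page_faults)  # prints 5
```

The program appears to have four pages of memory even though only three frames physically exist; the cost is the occasional page fault when a page has to be brought back in.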
Virtualization is essentially a partitioning system: it divides the physical resources of a computer into manageable chunks, like a sliced loaf of bread, and dedicates each chunk to a particular service. These isolated slices are then stacked on top of one another, allowing many people to run multiple applications on the same piece of physical equipment simultaneously. It also hides the complexity of the hardware from users, who get the impression of having the machine's full memory, CPU and so forth to themselves, when in reality they are only getting a slice.
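The sliced-loaf analogy can be made concrete with a minimal sketch. The function and guest names below are invented for illustration; a real hypervisor such as KVM or Xen does this partitioning in hardware and in the kernel, not in a few lines of Python.

```python
# Illustrative sketch only: carving a host's physical resources into
# isolated guest "slices". A real hypervisor enforces this isolation
# in hardware; here we just compute the shares.

def partition_host(total_cores, total_mem_gb, guests):
    """Give each guest an equal, isolated share of CPU and memory."""
    share_cores = total_cores // len(guests)
    share_mem = total_mem_gb // len(guests)
    return {name: {"cores": share_cores, "mem_gb": share_mem}
            for name in guests}

# Four services that might otherwise need four physical machines:
slices = partition_host(total_cores=32, total_mem_gb=64,
                        guests=["web", "mail", "batch", "graphics"])
print(slices["web"])  # prints {'cores': 8, 'mem_gb': 16}
```

Each guest sees only its own 8 cores and 16 GB, and behaves as if it had the machine to itself.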
"To me, virtualization is no longer a fundamental technology issue, it's just there. Everybody can use it." - Sverre Jarp
One use of virtual machines is to allow users to run two operating systems concurrently while only experiencing one. For example, you might be using Microsoft Windows to type up a report, while BOINC's LHC@Home is running on Scientific Linux behind the scenes. Large data centers can use the technology to run all of their services on virtual machines, which means they require fewer physical machines.
But the technology has yet to achieve its full potential. Virtualization offers a way to effectively exploit multicore, because running the different partitions on different cores of the same chip is a very efficient way to get the most out of the processor. It minimizes the amount of wasted processing power.
This means virtualization has started receiving serious attention again, but how much of a difference will this make? Sverre Jarp, chief technical officer of CERN's Open Lab, thinks it's ready for widespread application: "To me, virtualization is no longer a fundamental technology issue, it's just there. Everybody can use it. You can use it in banking, in grid, on your laptop. You can use it anywhere."
Even users with basic requirements can benefit from virtualization, by running housekeeping applications such as virus checkers and disk defragmentation automatically in the background. Jarp sees no reason why virtualization can't be widely used by computers in offices and homes within the decade, provided the technology is integrated in such a way that it places no additional burden on the user.
Challenges for the future
Virtualization offers even more advantages for large data centers because it allows them to consolidate their server requirements. Fewer physical machines are needed, which produces overhead savings because the demands for space, power and cooling systems are smaller.
It's also more private and secure because virtualization applies another layer of separation between the different applications. Last but not least, it makes upgrading easier because you don't have to modify the virtual machine for every hardware change, and the service provider can change the system architecture without users noticing any difference.
There are still organizational details that need to be ironed out. For example, all the network addresses associated with virtual machines need to be virtualized as well. It's like splitting up a large house into ten apartments: each apartment requires a new, distinct postal address. However, unlike postal addresses, network addresses are a limited resource. Virtualization increases the pressure on the dwindling pool of available IPv4 addresses, but taking advantage of IPv6 network addresses may alleviate this problem. As with most technologies, virtualization is still evolving, but it's ready for widespread deployment.
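The scale of the address problem is easy to see with Python's standard `ipaddress` module. The subnets below are the reserved documentation ranges, chosen purely for illustration.

```python
import ipaddress

# Illustrative comparison of address headroom: a typical IPv4 /24
# subnet versus a standard IPv6 /64 allocation for the same site.
# Both networks are reserved documentation ranges.
v4_net = ipaddress.ip_network("192.0.2.0/24")
v6_net = ipaddress.ip_network("2001:db8::/64")

print(v4_net.num_addresses)  # prints 256
print(v6_net.num_addresses)  # prints 18446744073709551616 (2**64)
```

With 256 addresses, a /24 is exhausted quickly once every virtual machine needs its own address; a single IPv6 /64 offers 2^64 addresses, effectively removing the constraint.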
"It's the biggest technological advance having an impact on the grid," comments Jones. Virtualization lets consumers manage their own computing infrastructure, because it allows them to dictate whether they want to run their applications as a series of short jobs or as a longer-running, processing-intensive task - in other words, as if the computing infrastructure is a grid or a cloud. So virtualization could be a standardizing element.
Virtualization may even unite grids and clouds.
-Seth Bell, iSGTW, with special thanks to Bob Jones of EGEE, and Sverre Jarp of CERN Open Lab. For more, see below.
On a related note, CERN held a three-day workshop to discuss how best to adapt the laboratory's applications and computing services to suit virtualization. Bringing together experts from industry, developers using these technologies, and CERN IT service providers, the workshop provided an understanding of what can be achieved today and identified a "To Do" list which will allow physics applications to further exploit multi-core and virtualization technologies.
Our workshop's suggestions:
• Provide an infrastructure in computer centers for the preparation of virtual images and the delivery of application software from virtual organizations.
• Test scheduling options for parallel jobs in mixed workload environments.
• Establish procedures for creating images which can be run and trusted.
• Investigate scenarios for reducing the need for public IPv4 addresses on worker nodes.
• Deploy multi-core performance monitoring tools, such as KSM and PERFMON, at grid sites.
• Contribute to initiatives for running multi-core jobs grid-wide, such as the recommendations of EGEE's Message Passing Interface Working Group.
• Prototype a working solution to run grid jobs on cloud resources.
-Denise Heagerty, CERN