• Subscribe

Back to Basics - How hardware virtualization works: Part 2

Back to Basics - How hardware virtualization works: Part 2

Greg Pfister
Since retiring from his position as an IBM Distinguished Engineer, Greg Pfister has worked as an independent consultant as well as serving as research faculty at Colorado State University. Pfister is the author of "In Search of Clusters," and over his 30-year career, he has accrued over 30 patents in parallel computing and computer communications.

It is possible to find many explanations of hardware virtualization on the Internet and, of course, in computer science courses. Nonetheless, there remains a great deal of confusion regarding this increasingly popular technology.

This multi-part series attempts to provide an approachable explanation of hardware virtualization. You can see Part 1 here.

The Goal

The goal of hardware virtualization is to maintain, for all the code running in a virtual machine, the illusion that it is running on its own, private, stand-alone piece of hardware. What a provider is giving you is a lease on your own private computer, after all.

"All code" includes all applications, all middleware like databases or LAMP stacks, and crucially, your own operating system -including the ability to run different operating systems, like Windows and Linux, on the same hardware, simultaneously. Hence: Isolation of virtual machines from each other is key. Each should think it still "owns" all of its own hardware.

The result isn't always precisely perfect. With sufficient diligence, operating system code can figure out that it isn't running on bare metal. Usually, however, that is the case only when specific programming is done with the aim of finding that out.

Trap and Map

The basic technique used is often referred to as "trap and map." Imagine you are a thread of computation in a virtual machine, running on a one processor of a multiprocessor that is also running other virtual machines.

So off you go, pounding away, directly executing instructions on your own processor, running directly on bare hardware. There is no simulation or, at this point, software of any kind involved in what you are doing; you manipulate the real physical registers, use the real physical adders, floating-point units, cache, and so on. You are running asfastas thehardwarewillgo. Fastasyoucan. Poundingoncache, playingwithpointers, keepinghardwarepipelinesfull, until…


Greg Pfister
Image courtesy of Greg Pfister

You attempt to execute an instruction that would change the state of the physical machine in a way that would be visible to other virtual machines. (See the figure nearby.)

Just altering the value in your own register file doesn't do that, and neither does, for example, writing into your own section of memory. That's why you can do such things at full-bore hardware speed.

Suppose, however, you attempt to do something like set the real-time clock - the one master real time clock for the whole physical machine. Having that clock altered out from under other running virtual machines would not be very good at all for their health. You aren't allowed to do things like that.

So, BAM, you trap. You are wrenched out of user mode, or out of supervisor mode, up into a new higher privilege mode; call it hypervisor mode. There, the hypervisor looks at what you wanted to do - change the real-time clock - and looks in a bag of bits it keeps that holds the description of your virtual machine. In particular, it grabs the value showing the offset between the hardware real time clock and your real time clock, alters that offset appropriately, returns the appropriate settings to you, and gives you back control. Then you start runningasfastasyoucan again. If you later read the real-time clock, the analogous sequence happens, adding that stored offset to the value in the hardware real-time clock.

Not every such operation is as simple as computing an offset, of course. For example, a client virtual machine's supervisor attempting to manipulate its virtual memory mapping is a rather more complicated case to deal with, a case that involves maintaining an additional layer of mapping (kept in the bag o' bits): A map from the hardware real memory space to the "virtually real" memory space seen by the client virtual machine. All the mappings involved can be, and are, ultimately collapsed into a single mapping step; so execution directly uses the hardware that performs virtual memory mapping.

Concerning Efficiency

How often do you BAM? Unhelpfully, this is clearly application dependent. But the answer in practice, setting aside input/output for the moment, is not often at all. It's usually a small fraction of the total time spent in the supervisor, which itself is usually a small fraction of the total run time. As a coarse guide, think in terms of overhead that is well less than 5%, or in other words, for most purposes, negligible. Programs that are IO intensive can see substantially higher numbers, though, unless you have access to the very latest in hardware virtualization support; then it's negligible again. A little more about that later.

I originally asked you to imagine you were a thread running on one processor of a multiprocessor. What happens when this isn't the case? You could be running on a uniprocessor, or, as is commonly the case, there could be more virtual machines than physical processors or processor hardware theads. For such cases, hypervisors implement a time-slicing scheduler that switches among the virtual machine clients. It's usually not as complex as schedulers in modern operating systems, but it suffices. This might be pointed to as a source of overhead: You're only getting a fraction of the whole machine! But assuming we're talking about a commercial server, you were only using 12% or so of it anyway, so that's not a problem. A more serious problem arises when you have less real memory than all the machines need; virtualization does not reduce aggregate memory requirements. But with enough memory, many virtual machines can be hosted on a single physical system with negligible degradation.

The next post covers more of the techniques used to do this, getting around some hardware limitations (translate/trap/map) and efficiency issues (paravirtualization).

This post originally appeared on Greg Pfister's blog, Perils of Parallel, and in the proceedings of Cloudviews - Cloud Computing Conference 2009, the 2nd Cloud Computing International Conference.

Join the conversation

Do you have story ideas or something to contribute? Let us know!

Copyright © 2021 Science Node ™  |  Privacy Notice  |  Sitemap

Disclaimer: While Science Node ™ does its best to provide complete and up-to-date information, it does not warrant that the information is error-free and disclaims all liability with respect to results from the use of the information.


We encourage you to republish this article online and in print, it’s free under our creative commons attribution license, but please follow some simple guidelines:
  1. You have to credit our authors.
  2. You have to credit ScienceNode.org — where possible include our logo with a link back to the original article.
  3. You can simply run the first few lines of the article and then add: “Read the full article on ScienceNode.org” containing a link back to the original article.
  4. The easiest way to get the article on your site is to embed the code below.