Sidebar: Virtualization RAM Technology

There are numerous memory optimization technologies used by virtualization platforms today. I'd like to look at the major ones and discuss how they relate to Hyper-V and its methods for memory allocation.

Memory Overcommitment

One method of thinking about VM memory is to look at a virtualization server and say, "This box has 16GB of memory. I'm going to set aside 1GB for the hypervisor and the management (parent) partition and use the other 15GB for virtual machines (VMs). The memory I allocate to VMs can't be more than 15GB. I could have fifteen 1GB VMs, or seven 2GB VMs, and so on."

This method ensures that there's always enough physical RAM, even if all your VMs are running. Generally, however, VMs aren't using all of the memory allocated to them. Memory overcommitment takes advantage of this fact, allowing more memory to be allocated to VMs than physically exists in the box, based on real usage patterns you've measured and planned for. The VM thinks it has its full amount of memory, but in reality the VM's memory isn't mapped to physical RAM until the VM first tries to write to it. As the VM writes to its virtual memory space, the pages are mapped through to physical RAM.
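To make the first-write behavior concrete, here's a minimal sketch in Python (all names are hypothetical, not any real hypervisor's API) of a hypervisor that promises each VM its full allocation but only consumes a physical frame on the first write to a page:

    # Hypothetical sketch: overcommitment via first-write page mapping.
    class Hypervisor:
        def __init__(self, physical_pages):
            self.free_frames = list(range(physical_pages))  # unused physical frames
            self.page_table = {}  # (vm_id, guest_page) -> physical frame

        def create_vm(self, vm_id, pages):
            # The allocation is a promise, not a reservation: no physical RAM
            # is consumed yet, so the sum of all VM sizes can exceed the box.
            print(f"{vm_id}: {pages} pages promised, 0 pages mapped")

        def write(self, vm_id, guest_page):
            key = (vm_id, guest_page)
            if key not in self.page_table:        # first write to this page?
                if not self.free_frames:
                    raise MemoryError("overcommitted: no physical RAM left")
                self.page_table[key] = self.free_frames.pop()  # map on demand
            return self.page_table[key]

    hv = Hypervisor(physical_pages=100)
    hv.create_vm("vm1", pages=80)   # 80 + 80 pages promised, only 100 physical
    hv.create_vm("vm2", pages=80)
    hv.write("vm1", 0)              # only now does a page consume real RAM

If both VMs really do touch all of their pages, write() eventually raises an error, which is exactly the failure mode described next.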

This sounds great, but the problem with memory overcommitment is just what the name suggests: you're overcommitting the resources you actually have in a box. If those resources are pushed to maximum usage, you won't have enough physical RAM and the VMs won't actually get the resources you planned for them. Careful planning can offset these problems, but memory overcommitment doesn't work as well with modern OSs as it did with legacy OSs.

OSs today aren't wasteful. In old versions of Windows, if a machine had 4GB of memory and only needed 1.5GB, you'd see a big figure for free memory in Task Manager. A hypervisor that allocated RAM on first use never allocated the other 2.5GB of memory to the VM, so the overcommitment was safe. Today's OSs waste not, want not: whatever memory they have, they'll try to use for caching features, most notably SuperFetch (which was introduced in Windows Vista and preloads a machine's most used applications into memory for faster application start times). Linux OSs have similar functionality. You can see this when you look at the Free and Available memory values in Task Manager, as shown here. I have 12GB of memory and only 68MB Free, but nearly 10GB of Available memory. Windows is using 10GB of memory as cache (the Cached value) to store information I've used or may use, but it can easily drop data from the cache and make it available to programs when needed, which is why the Available figure includes Cached memory.

Free and available RAM
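You can check the same counters programmatically. Here's a small sketch using the cross-platform psutil library (assuming you have it installed); note how Available is far larger than Free because it includes cache that can be dropped on demand:

    # Requires the psutil package (pip install psutil).
    import psutil

    mem = psutil.virtual_memory()
    gib = 1024 ** 3
    print(f"Total:     {mem.total / gib:5.1f} GB")
    print(f"Free:      {mem.free / gib:5.1f} GB")       # truly unused RAM
    print(f"Available: {mem.available / gib:5.1f} GB")  # free + droppable cache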

 

In a hypervisor that allocates physical memory on first write, memory overcommitment won't help, because Windows Vista, Windows Server 2008, later Windows OSs, and modern Linux OSs will try to use whatever memory they have to improve overall performance. First-write memory allocation won't be very effective and risks performance problems.

 

Ballooning

The process of reclaiming memory that a VM no longer needs is called ballooning. It's a clever way to get a guest OS to decide which memory it no longer really needs. Balloon drivers are kernel-mode device drivers, so the OS has to give them memory when they ask for it. The virtualization manager tells the guest component to grow the balloon to a certain size, and the balloon driver requests that amount of memory from the OS. The OS looks for the best way to meet the request (hopefully just by allocating free memory, but potentially by using memory currently allocated for cache, and perhaps by paging memory out to the guest OS page file). The guest OS gets to intelligently decide which pages should be given up in the most unobtrusive way, with the least hit to performance.

Memory ballooning

Once the memory is allocated to the balloon driver, those memory addresses are communicated to the virtualization manager, which tells the hypervisor it can now unmap those address ranges from physical RAM: the balloon driver will never actually touch them, and no other part of the guest OS is allowed to. You've reclaimed memory. Ballooning is a neat way to reclaim memory and is the best option short of being able to actually ask the VM to give back areas of memory. If the VM needs additional memory, VM management can tell the balloon to deflate, and physical RAM is reallocated to the memory areas given back to the guest OS.
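Here's a sketch of that handshake in Python (all class and method names are hypothetical): the host asks the in-guest balloon driver to inflate, the guest OS picks the least painful pages to surrender, and the hypervisor unmaps them; deflating reverses the process.

    # Hypothetical sketch of the ballooning handshake.
    class GuestOS:
        def __init__(self, free_pages, cached_pages):
            self.free = set(free_pages)
            self.cached = set(cached_pages)

        def allocate(self, count):
            # The guest picks the pages that hurt least: free pages first,
            # then cache (a real OS might also page out to its own page file).
            victims = []
            for pool in (self.free, self.cached):
                while pool and len(victims) < count:
                    victims.append(pool.pop())
            return victims

    class BalloonDriver:
        def __init__(self, guest, hypervisor, vm_id):
            self.guest, self.hv, self.vm_id = guest, hypervisor, vm_id
            self.held = []

        def inflate(self, count):
            pages = self.guest.allocate(count)  # kernel-mode: request granted
            self.held += pages
            self.hv.unmap(self.vm_id, pages)    # host reclaims physical RAM

        def deflate(self, count):
            pages, self.held = self.held[:count], self.held[count:]
            self.hv.map(self.vm_id, pages)      # backed by physical RAM again
            self.guest.free.update(pages)       # returned to the guest's pool

    class Hypervisor:
        def unmap(self, vm_id, pages):
            print(f"{vm_id}: unmapped {len(pages)} pages from physical RAM")

        def map(self, vm_id, pages):
            print(f"{vm_id}: remapped {len(pages)} pages to physical RAM")

    guest = GuestOS(free_pages=range(10), cached_pages=range(10, 30))
    balloon = BalloonDriver(guest, Hypervisor(), "vm1")
    balloon.inflate(15)  # reclaim 15 pages: 10 free ones plus 5 from cache
    balloon.deflate(15)  # the VM needs memory again; give the pages back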

Page Sharing

When you use virtualization, you run many OS instances on one physical piece of hardware, and often the VMs run the same version of an OS. For example, you might have 50 instances of Windows 7 running on a single physical server in a VDI environment. Because these are the same OS, a large part of their memory contents will be the same. Page sharing is the idea of storing a page that's duplicated across VMs only once in memory: basically, Single Instance Storage for VM memory.

A process in the hypervisor looks at every page of memory for every VM and creates a hash value for each one. It compares the hash values, and if a duplicate hash is found, the process does a bit-by-bit comparison of the memory pages to make sure they really are identical. The content is then stored only once in memory, and the duplicate VM page addresses simply point to that single page.
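A short Python sketch of that hash-then-verify scheme (hypothetical, not any particular hypervisor's implementation); hashing finds candidate duplicates cheaply, and the bit-by-bit comparison guards against hash collisions:

    # Hypothetical sketch of page sharing (memory deduplication).
    import hashlib

    def share_pages(vm_pages):
        # vm_pages maps (vm_id, page_no) -> page contents (bytes).
        store = {}    # shared-copy id -> the single stored page contents
        by_hash = {}  # content hash -> shared-copy ids with that hash
        mapping = {}  # (vm_id, page_no) -> shared-copy id

        for key, content in vm_pages.items():
            digest = hashlib.sha256(content).digest()
            for copy_id in by_hash.get(digest, []):
                if store[copy_id] == content:  # bit-by-bit check on hash match
                    mapping[key] = copy_id     # point at the existing copy
                    break
            else:
                copy_id = len(store)           # first page with this content
                store[copy_id] = content
                by_hash.setdefault(digest, []).append(copy_id)
                mapping[key] = copy_id
        return mapping, store

    pages = {("vm1", 0): b"\x00" * 4096,  # two identical zeroed 4KB pages
             ("vm2", 0): b"\x00" * 4096,
             ("vm2", 1): b"\x01" * 4096}
    mapping, store = share_pages(pages)
    print(f"{len(pages)} guest pages backed by {len(store)} physical copies")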

A number of factors make this technique less effective with modern OSs than older ones. One factor (but not a huge one) is that Windows Vista and above use Address Space Layout Randomization (ASLR), a security technology that loads key components of the Windows kernel into 1 of 256 possible locations. The feature makes it harder for malware to attack the kernel based on a component's location in memory, because the locations will vary on different instances of the OS and at each reboot. As a side effect, though, duplicate instances of the same OS won't have the same content in the same locations. ASLR hurts the effectiveness of page sharing for this specific content, but that's only a small part of the OS's memory.

Another factor is that page sharing works best on empty pages, but as I mentioned in the previous sections, modern OSs rarely leave memory empty. The biggest blow to page sharing, though, is large memory pages. In the past, memory pages were 4KB, and the chances of finding 4KB pages with the same content across OSs are pretty high, so significant physical memory can be saved. Modern Windows and Linux OSs use 2MB memory pages by default, however, and the chances of finding duplicate 2MB memory pages are very slim.

Why would you even want larger memory pages? OSs work with virtual address spaces for memory, and these spaces have to be mapped to physical memory through page tables. Lookups in the page tables take time, so processors have the Translation Lookaside Buffer (TLB), a very fast memory cache on the processor that stores recently used virtual-to-physical address mappings. This gives you faster lookups for commonly used memory areas, but the TLB has a finite size, so using 2MB memory pages instead of 4KB pages lets the TLB cover far more memory: 512 times more memory is addressable through the TLB with large pages than with small ones.
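The arithmetic is easy to check; assuming a hypothetical TLB with 1,024 entries:

    # TLB reach = number of entries x page size (entry count is illustrative).
    entries = 1024
    small_page, large_page = 4 * 1024, 2 * 1024 * 1024  # 4KB and 2MB

    print(f"4KB pages: {entries * small_page // 2**20} MB of reach")  # 4 MB
    print(f"2MB pages: {entries * large_page // 2**30} GB of reach")  # 2 GB
    print(f"Improvement: {large_page // small_page}x")                # 512x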

Swapping to Disk

No virtualization platform recommends hypervisor-level swapping of memory to disk, but some include it as a last resort for when the hypervisor badly needs memory and can't reclaim it through page sharing, or quickly enough through ballooning. In this case, the hypervisor randomly picks pages of the guests' memory and writes them out to a hypervisor-level disk swap file, which will likely hurt the guest OSs' performance. You never want this type of swapping to occur, because the hypervisor could be writing key areas of memory to disk, crippling performance.
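A sketch of why this is so painful (names hypothetical): unlike the guest OS in the ballooning case, the hypervisor has no idea which guest pages are hot, so it evicts blindly:

    # Hypothetical sketch of last-resort hypervisor-level swapping.
    import random

    def hypervisor_swap_out(page_table, swap_file, count):
        # Evict `count` mapped guest pages to a host-side swap file. The
        # hypervisor can't tell a hot kernel page from a cold one, so it
        # picks victims at random; any future guest access to a swapped
        # page becomes a slow disk fault.
        victims = random.sample(list(page_table), count)
        freed_frames = []
        for key in victims:
            frame, contents = page_table.pop(key)
            swap_file[key] = contents
            freed_frames.append(frame)
        return freed_frames  # physical frames handed back to the host

    table = {("vm1", n): (n, b"guest data") for n in range(8)}
    swap = {}
    print(hypervisor_swap_out(table, swap, count=3))  # e.g. [5, 1, 7]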

Microsoft's solution for VM memory doesn't use page sharing because most OSs will use large memory pages going forward. It doesn't allocate memory on first write in the overcommit style of allocation because of the inherent dangers of overcommitting any resource. It also doesn't perform hypervisor swapping out to disk.
