Server virtualization has many benefits, including savings on hardware, power, and cooling, as well as simplified management. Anyone who’s been in IT for any length of time has dealt with a computer that was infected with spyware, a virus, or possibly a rootkit. If the infected computer is a physical machine, you can at least shut it down or unplug it from the network to prevent further infection. But what if the infected computer is a virtual machine (VM)? Or worse, what if the infected computer is a VMware ESX host?
If a host is compromised, a hacker can create new rogue VMs or launch a hyperstack or hyperjack attack. A hyperstack attack is similar to a man-in-the-middle attack, in which the original hypervisor is replaced with a rogue hypervisor but the legitimate hypervisor and hardware are unaware of the compromise. The rogue hypervisor is then privy to all the communication between the original hypervisor and the VMs. A hyperjack attack is a compromise of the hypervisor, similar to a rootkit infection on a physical computer. There have been a few documented hypervisor attacks, and it’s just a matter of time before an ESX attack is released into the public domain.
When a hypervisor is compromised, determining the source of attack is very difficult because the machine is a VM. The problem is even worse if the hypervisor is part of a cluster. The rogue VM could be moved from one host to another and possibly even replicated to a remote site using SAN replication. If a host gets infected in a cluster, you’ll probably have to take down your entire virtualization cluster and clean up the mess.
I hope I’ve raised enough concerns to convince you that virtualization security is indeed important. The basic tenet of virtualization security is to protect the virtualization host at all costs. If a host is compromised, you might not be able to fully recover your environment. I teach a 5-day course on this subject, so this article obviously isn’t a comprehensive tutorial on how to harden your virtualization environment. However, the suggestions I highlight in this article can be a significant help in hardening your virtualization infrastructure in a VMware environment.
Protected Management Network
The most important step you can take to protect your ESX hosts is to establish a protected management network for your hosts, as Figure 1 shows. This dedicated management network is protected from the other internal networks. In Figure 1, the only way to manage the ESX host is to access the Virtual Host Management Computer. This computer typically runs VMware vCenter Server.
To access the management network, the network administrator must authenticate to the SSL VPN, which is set up for two-factor authentication. After the network administrator is authenticated through Active Directory (AD), he receives a one-time password (OTP) via text message to his cell phone. (If a network administrator receives an OTP but wasn’t accessing the SSL VPN appliance, his AD username and password have been compromised.) After authenticating to the SSL VPN, the network administrator receives a shortcut for using RDP to access the vCenter server.
A firewall rule is created to allow SSL (TCP port 443) traffic to pass from the virtual server network to the dedicated management network. For further security, you can restrict the firewall rule to allow only specific IP addresses to access the SSL VPN appliance on the protected network. We typically allow all traffic to pass from the dedicated management network to the virtual server network. In the event of an attack, this dedicated network should buy you extra time to further isolate the network before any information is compromised.
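The rule set described above can be sketched as a small default-deny model. This is only an illustration of the logic, not a configuration for any particular firewall product. The subnets, the SSL VPN appliance address, and the administrator workstation addresses are all hypothetical values chosen for the example.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rule:
    src: str              # source network in CIDR form
    dst: str              # destination network in CIDR form
    port: Optional[int]   # TCP port, or None for any port

# Hypothetical addressing, assumed for illustration only:
#   virtual server network  10.0.0.0/24
#   management network      192.168.50.0/24
#   SSL VPN appliance       192.168.50.10
RULES = [
    # Only two named administrator workstations may reach the SSL VPN on TCP 443.
    Rule("10.0.0.21/32", "192.168.50.10/32", 443),
    Rule("10.0.0.22/32", "192.168.50.10/32", 443),
    # All traffic may pass from the management network to the virtual server network.
    Rule("192.168.50.0/24", "10.0.0.0/24", None),
]

def permits(rules, src_ip, dst_ip, port):
    """Return True if any rule allows the connection; anything unmatched is denied."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for rule in rules:
        if (src in ipaddress.ip_network(rule.src)
                and dst in ipaddress.ip_network(rule.dst)
                and (rule.port is None or rule.port == port)):
            return True
    return False
```

With these rules, a connection from 10.0.0.21 to the appliance on port 443 is permitted, while the same workstation on any other port, or any other workstation on port 443, falls through to the default deny.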
This approach is consistent with protecting the host at all costs. Although this kind of strategy with a VPN appliance is somewhat uncommon, it’s appropriate for an enterprise environment with multiple ESX administrators. If you don’t want to use an SSL VPN, I still strongly suggest that you use a dedicated management network and that you open up TCP port 3389 (i.e., Terminal Server) on the firewall so that network administrators can access the vCenter server.
The concept of a dedicated management network is consistent with today’s security model of siloing. Siloing segments a network into logical subnets so that users don’t have access to computers they don’t need access to. I prefer using a firewall to implement siloing rather than using a switch or router because the logging, management, and troubleshooting tools are significantly better on a firewall. The following should also be considerations in a dedicated management network:
- HP Integrated Lights-Out (iLO)/Dell Remote Access Controller (DRAC) cards—These cards let the network administrator gain remote console access to an ESX host even when it’s turned off. However, they must be plugged in to the dedicated management network rather than the public network to prevent leaving an open back door to the host.
- Switch management—Should be accessible only from the management network.
- Firewall management—Should be accessible only from a dedicated management network.
- UPS management—If you’re running a UPS with a network-enabled management card, this card should be plugged in to the management network; otherwise, a hacker could launch a Denial of Service (DoS) attack against all the VMs by accessing the UPS management card and simulating a power failure.
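One way to verify the items in this list is to keep an inventory of every out-of-band management interface and check that each address sits inside the management subnet. The sketch below assumes a hypothetical management subnet and made-up device names and addresses; substitute your own inventory.

```python
import ipaddress

# Hypothetical management subnet, assumed for illustration.
MGMT_NET = ipaddress.ip_network("192.168.50.0/24")

# Hypothetical inventory of management interfaces (name -> IP address).
INVENTORY = {
    "esx01-ilo": "192.168.50.21",
    "core-switch-mgmt": "192.168.50.2",
    "firewall-mgmt": "192.168.50.1",
    "ups-mgmt-card": "10.0.0.250",   # deliberately misplaced for the example
}

def misplaced_interfaces(inventory, mgmt_net):
    """Return the names of interfaces whose address is outside the management network."""
    return sorted(
        name for name, addr in inventory.items()
        if ipaddress.ip_address(addr) not in mgmt_net
    )

print(misplaced_interfaces(INVENTORY, MGMT_NET))
```

Any name the function returns is a management interface that is reachable from outside the protected network and should be re-cabled or re-addressed.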
On ESX or ESXi, you can establish management presence on a specific network card on the ESX host. In the vSphere Client, on the ESXi host, select the Configuration tab and click Networking. Select Add Networking, VMkernel, Use this port group for management traffic. Then, assign a static IP address to the host. Figure 2 shows an example of a dedicated management network on ESX.
In this example, a dedicated ESX management network was created with an ESX host IP address of 192.168.x.x. As you can see, all the other VMs are isolated on a separate network called VM Network. The two VMs that are connected to this management network are running vCenter. All other VMs on the ESX host are connected to the separate VM network.
Another important step in ensuring a secure virtualized environment is to configure your hosts with an adequate number of network cards. vMotion (ESX) lets you move a VM from one host to another host while the VM is running. Isolating traffic, especially vMotion traffic, is a security consideration as well as a performance consideration. You should establish a dedicated isolated network for vMotion, because any time a VM is moved from one host to another, the traffic moves in clear text—including everything that’s currently in RAM—which creates a significant security risk.
For standalone hosts, you need three network cards, as Table 1 shows. For clustered hosts, you need seven network cards, as Table 2 shows.
On ESX, VMware Distributed Resource Scheduler (DRS) performs automatic load balancing between ESX hosts. You create a resource pool of two or more ESX hosts, then run VMs on that resource pool. When a host gets overloaded, some of the VMs are automatically migrated to other ESX hosts in the resource pool that have lower utilization. VMware Distributed Power Management (DPM) automatically consolidates VMs onto fewer ESX hosts when the VM load is light. ESX hosts are automatically powered down when they aren’t in use. When the VM load increases, the ESX hosts are started and the VMs are migrated to the newly started hosts. It’s especially important to create an isolated network for vMotion moves on DRS- and DPM-enabled clusters because you can’t predict when a vMotion move will occur. An added benefit of a dedicated vMotion network is the improved performance of vMotion migrations because of reduced network contention.
ESX Hardening Guidelines
Establishing a dedicated management network is just one step in hardening your virtualization infrastructure. Next, let’s look at some ways to make your ESX hosts more secure.
According to VMware, vSphere 4.1 will be the last version to include both the ESX and ESXi versions of vSphere Hypervisor. All future releases of vSphere Hypervisor will include only ESXi. The main reason for this change is security. ESXi removes the Service Console and Web Server from ESX, making the footprint significantly smaller. I suggest that you move to ESXi now because you’ll be forced to move to it in the next release of vSphere. You must learn the Command-Line Interface (CLI) on ESXi, which is the replacement for the Service Console in ESX. Before you upgrade, make sure that any third-party applications or applications that run in the Service Console have compatible versions that work with ESXi.
If you can’t migrate to ESXi yet (in my experience, about half of all shops have yet to migrate from ESX), make sure that you adhere to the following best practices for accessing the ESX console:
- Disable remote root access. This access is disabled by default, but many administrators enable remote root access after ESX is installed. Instead of logging on as root, create an additional user with administrator rights to perform your ESX console management.
- Use sudo. When logging on to the ESX console, use sudo rather than logging on as root or using su. When you use sudo, all the console commands are logged in /var/log/secure. If you log on as root or use su, not all the console commands are logged. I suggest removing or disabling su so it can’t be used.
- Use host profiles. Host profiles are included in VMware vSphere Enterprise Plus. After you harden an ESX host, you can use host profiles to clone the ESX host configuration, determine which ESX hosts are out of compliance, and automatically remediate them. Host profiles ensure that you have a consistent ESX host configuration across all the ESX hosts in your virtualization infrastructure.
- Configure the ESX firewall. Verify that the ESX firewall is enabled with only the proper ports. Issue the command esxcfg-firewall -q to view the current firewall settings. Table 3 contains a list of ESX ports and their use.
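Because sudo records each command in /var/log/secure, that log can be reviewed to see who ran what on the console. The sketch below assumes the standard sudo syslog line layout (user, TTY, PWD, USER, COMMAND fields); the two sample lines are invented for illustration and are not taken from a real host.

```python
import re

# Matches the standard sudo syslog layout, for example:
#   ... sudo:   user : TTY=... ; PWD=... ; USER=root ; COMMAND=/path/to/cmd args
SUDO_LINE = re.compile(
    r"sudo:\s+(?P<user>\S+)\s+:\s+.*?USER=(?P<runas>\S+)\s+;\s+COMMAND=(?P<command>.+)$"
)

def sudo_commands(lines):
    """Return (user, run-as user, command) tuples for every sudo entry in the log lines."""
    found = []
    for line in lines:
        match = SUDO_LINE.search(line.rstrip("\n"))
        if match:
            found.append((match.group("user"), match.group("runas"), match.group("command")))
    return found

# Invented sample lines; the user name, host name, and addresses are hypothetical.
SAMPLE = [
    "Jan 12 09:14:02 esx01 sudo:   jsmith : TTY=pts/0 ; PWD=/home/jsmith ; USER=root ; COMMAND=/usr/sbin/esxcfg-firewall -q",
    "Jan 12 09:15:40 esx01 sshd[4121]: Accepted password for jsmith from 192.168.50.30 port 51234 ssh2",
]

for user, runas, command in sudo_commands(SAMPLE):
    print(f"{user} ran as {runas}: {command}")
```

On a real host, you would pass the open log file instead of SAMPLE, for example `sudo_commands(open("/var/log/secure"))`, which requires read access to the log.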
Use vCenter’s Update Manager plug-in for patch management. I suggest using vCenter to manage your ESX hosts. Even VMware vSphere Essentials includes a license for VMware vCenter Server for Essentials, which eliminates the excuse that vCenter is too expensive. Using the vCenter Update Manager automates the patching of all your ESX hosts and VMs running on the hosts. You should run vCenter on a physical server. You won’t be able to patch the ESX host if your vCenter server is a VM and you’re not in a cluster, because you must shut down all the VMs on the host before you can patch the host. vCenter lets an ESX administrator establish very fine-grained privileges when managing an ESX host and cluster.
Purchase commercial SSL certificates for ESX hosts. By default, ESX and ESXi use self-signed certificates. These certificates are subject to man-in-the-middle attacks and should be replaced with commercial SSL certificates.
Perform VM image backups. Although this practice isn’t directly security related, it can help you recover in the event of a major attack. I suggest performing regular *.vmdk image backups of VMs. This action significantly simplifies and accelerates the recovery process of the VM, especially if the VM is running Microsoft Exchange Server, SQL Server, or SharePoint Server.
Use Trusted Platform Module (TPM) on ESX hosts. TPM chips first appeared as an option on laptops to prevent a rogue OS from booting on a laptop and optionally as a way to encrypt a laptop’s hard drive. TPM chips are now available as an option on the current generation of servers. You can use a TPM chip on an ESX host that serializes the ESX version with the TPM chip to provide additional protection against a hyperstack or hyperjack attack.
If you’re running ESX or ESXi 4.1, be sure to apply the patch that addresses the root password authentication/truncation problem. When you set a root password in ESX or ESXi 4.1, the password is authenticated only to the first eight characters. The rest of the password is truncated regardless of its length. For more information about this problem, see the VMware Knowledge Base article “ESX 4.1 and ESXi 4.1 root passwords are authenticated up to only 8 characters.”
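To see why this problem matters, consider a toy model of a check that compares only the first eight characters. This is not VMware's authentication code; it only illustrates the effect described above, and the passwords are made up.

```python
def effective_password(password, significant_chars=8):
    """Return the portion of the password that the truncating check actually compares."""
    return password[:significant_chars]

def accepted(stored, attempt, significant_chars=8):
    """Model a check that ignores everything after the first `significant_chars` characters."""
    return effective_password(stored, significant_chars) == effective_password(attempt, significant_chars)

stored = "CorrectHorse!Battery9"   # hypothetical 21-character root password

print(accepted(stored, "CorrectHwrong"))   # shares the first eight characters
print(accepted(stored, "CorrectX"))        # differs at the eighth character
```

In this model, any attempt that begins with the same eight characters is accepted, so a long passphrase offers no more protection than its first eight characters until the patch is applied.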
Don’t use nonpersistent disks. Verify that you don’t have any nonpersistent disks on any of your VMs. If nonpersistent disks are enabled on a VM, all the changes to the disk are discarded when the VM is turned off. This is a great way for a hacker to cover his tracks. One of the few legitimate uses of a nonpersistent disk is in a lab environment where all the changes to the VM are removed after the VM is turned off. This setting can be changed only when a VM is off. To check whether a VM has nonpersistent disks enabled, start vSphere, right-click the VM, and select Properties. Click the VM’s hard drive and review the VM’s hard disk mode. The Independent check box shouldn’t be selected. Figure 3 shows a VM with a nonpersistent disk enabled.
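Checking each VM by hand in the vSphere Client doesn’t scale well, so the same review can be sketched as a scan of each VM’s .vmx configuration text. The sketch assumes that disk modes appear as lines such as scsi0:0.mode = "independent-nonpersistent"; verify that against your own .vmx files before relying on it. The sample configuration is invented for illustration.

```python
import re

# Assumed .vmx layout: <controller><n>:<n>.mode = "<mode>"
DISK_MODE = re.compile(
    r'^\s*(?P<disk>(?:scsi|ide)\d+:\d+)\.mode\s*=\s*"(?P<mode>[^"]*)"',
    re.IGNORECASE,
)

def nonpersistent_disks(vmx_text):
    """Return the disk identifiers whose mode is a nonpersistent one."""
    hits = []
    for line in vmx_text.splitlines():
        match = DISK_MODE.match(line)
        if match and "nonpersistent" in match.group("mode").lower():
            hits.append(match.group("disk"))
    return hits

# Invented sample configuration; the VM and file names are hypothetical.
SAMPLE_VMX = '''\
displayName = "lab-vm01"
scsi0:0.present = "TRUE"
scsi0:0.fileName = "lab-vm01.vmdk"
scsi0:0.mode = "independent-nonpersistent"
scsi0:1.present = "TRUE"
scsi0:1.fileName = "lab-vm01_1.vmdk"
scsi0:1.mode = "persistent"
'''

print(nonpersistent_disks(SAMPLE_VMX))
```

Any disk the function reports should be reviewed and, unless the VM is a lab machine that is meant to reset, switched back to a persistent mode while the VM is off.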
Plan Ahead to Avoid Disaster
This article isn’t a comprehensive set of instructions for hardening your virtual infrastructure—realistically, an entire book could be written on this subject. However, the techniques that I suggest will go a long way toward securing your virtual infrastructure in a VMware environment. Although implementing these suggestions can be a significant amount of work, it’s still less work than cleaning up the mess if your virtualization infrastructure is compromised. For more information about security in a virtual environment, see VMware’s “vSphere 4.1 Hardening Guide.”