One tool that will be very familiar to anyone who has gone through a physical-to-virtual migration with VMware is the VMware Capacity Planner. It runs on a machine within your network and gathers a wealth of performance monitor data from the (Windows) servers on that network. Once it’s installed, you simply tell it which machines to monitor, and a couple of weeks later it spits out a nice report covering all the various metrics for your environment.
This tool is used by just about every SAN/server/VMware reseller and VAR out there to size environments for new virtualization projects. My problem with it is that it doesn’t work worth a damn for smaller environments, because the VMware Capacity Planner works on averages. If the servers being measured run at a consistent load throughout the day, the Capacity Planner report will be just fine. However, if a server only works for a few hours per week, but works really hard when it does, the numbers in the report will be next to useless.
As an example, last week I was working with a client that has a brand new EMC, UCS, and VMware environment which was sized from the VMware Capacity Planner. They took one of their existing production SQL Servers and created a copy of it on the new platform, and a process that used to run in one hour now took nine hours. The process was slow because the storage hadn’t been sized correctly, and the storage hadn’t been sized correctly because the VMware Capacity Planner showed that the SQL Server needed 14 IO/second and 0.08 MB/sec of data transfer, so the storage for this server was designed with that workload in mind. The actual workload is that for one hour a night the server runs at about 600 IO/second, and for the rest of the day it is totally idle. On average the numbers break down to about 14 IO/second, but the workload when the server is actually running is WAY higher than that.
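The arithmetic behind that gap is easy to reproduce. The sketch below uses made-up hourly samples (one busy hour at 600 IO/second and 23 idle hours), not the client’s actual data, so the average comes out at 25 rather than the 14 from the report, but the shape of the problem is the same.

```python
from statistics import mean

# Hypothetical hourly IOPS samples for one day: idle except for a
# one-hour nightly job. Illustrative numbers only, not real data.
hourly_iops = [0] * 24
hourly_iops[2] = 600  # assume the nightly job runs during the 02:00 hour

average_iops = mean(hourly_iops)  # what an averages-based report shows
peak_iops = max(hourly_iops)      # what the storage has to survive

print(f"average: {average_iops:.0f} IO/second")
print(f"peak:    {peak_iops} IO/second")
print(f"peak is {peak_iops / average_iops:.0f}x the average")
```

Storage sized for the first number will be dozens of times too small for the one hour that matters.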
Now, if this were a large company which had purchased a fully loaded VMAX from EMC, or even a pretty powerful midrange VNX, there wouldn’t be any problem, as the system would have been powerful enough to handle the extra IO without issue. However, this company was sold a smaller VNX 5300, which didn’t have enough IO capacity to handle the unexpected workload.
Now, I may not be entirely fair in placing the blame on the VMware Capacity Planner. There is plenty of blame to be placed on the VAR for not validating the numbers from the report against the raw data. If they had looked at the peaks in the raw data, they would have seen that the peaks weren’t anywhere close to the averages in the Capacity Planner output, and they could have done something about it. Instead I’ve got a client who trusted their reseller and vendors to sell them a solution which could handle their workload, who is now pissed off about the situation they are in, and who has me telling them that the brand new platform they just purchased needs to be upgraded to handle that workload.
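That validation step can be sketched in a few lines. This assumes you can get the raw per-sample IO numbers for each server out of whatever collected them; the server names, the sample values, and the 4x threshold are all hypothetical and only there to illustrate the idea.

```python
from statistics import mean

def find_bursty_servers(samples_by_server, max_ratio=4.0):
    """Return {server: (average, peak)} for servers whose peak IOPS
    is more than max_ratio times their average IOPS."""
    flagged = {}
    for server, samples in samples_by_server.items():
        if not samples:
            continue  # no data collected for this server
        avg = mean(samples)
        peak = max(samples)
        # A completely idle server (peak == 0) is never flagged.
        if peak > 0 and peak > max_ratio * avg:
            flagged[server] = (avg, peak)
    return flagged

# Hypothetical raw samples, one value per hour.
raw = {
    "sql01": [0] * 23 + [600],      # idle all day, one heavy hour
    "web01": [40, 45, 50, 42] * 6,  # steady load
}

for server, (avg, peak) in find_bursty_servers(raw).items():
    print(f"{server}: average {avg:.0f}, peak {peak} IO/second")
```

Any server this flags is one where the averaged number should not be used for sizing without a closer look at when and how hard it actually runs.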
If you are working with a VAR or vendor and they are using the VMware Capacity Planner to size a new storage or server platform, be VERY sure that you double-check the numbers they are using so that you don’t end up with a big surprise after you finish the purchase. In fact, one of the services I’m happy to offer as a consultant is helping you double-check those numbers from the VMware Capacity Planner to ensure that the solution you are purchasing is the right size.