One frequently cited advantage of most cloud computing platforms is elastic scaling of compute services. That is, customers can specify how much compute capability they need at any given time and scale out or in as their requirements change. For example, a financial system that is hosted on Azure might need to increase compute capacity at the end of the month to accommodate the burst of work that happens with month-end procedures. Alternatively, an e-commerce website might experience a burst of traffic because of media attention and need to increase capacity quickly to accommodate the new demand. In the opposite direction, scaling in, no company wants to pay for resources that it isn't using.
Windows Azure provides the ability to scale out by increasing the number of instances, or to scale in by decreasing the number of instances, of a particular Azure role. The platform also provides the ability to scale out or in by changing a role's virtual machine (VM) size (going from small to large or vice versa). Both settings are specified within the role's service configuration.
Although Azure enables these forms of scaling changes, they require manual application. An automated solution is preferable, hence the concept of auto-scaling. Auto-scaling refers to scaling out or in, by changing the configuration of Azure roles, based on some criteria. In this article, we simplify the complexities of auto-scaling on Azure, to provide you with an approach that you can tailor to your particular needs.
Let's begin by asking the simple question: Which metrics should you use? When considering auto-scaling for your solution, some heuristic that tells whether you should scale in or out will implicitly drive the scaling process. There are probably thousands of variants, but the two most common types of metrics on which scaling decisions are based are a schedule, such as particular weeks in the month, or Key Performance Indicators (KPIs), such as CPU utilization or queue depth. In this article, we use heuristics derived from these two types of metrics to guide us in making scaling decisions about Azure roles.
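To make the schedule-based heuristic concrete, here is a minimal Python sketch of a lookup that asks for more instances during the last few days of each month, in the spirit of the month-end example above. The instance counts, the three-day window, and the function name are all assumptions for illustration; in practice you would read them from your own configuration file or database.

```python
import calendar
from datetime import date

# Hypothetical schedule values; tune these to your own workload.
BASELINE_INSTANCES = 2      # assumed normal capacity
MONTH_END_INSTANCES = 6     # assumed capacity for month-end processing
MONTH_END_WINDOW_DAYS = 3   # assumed length of the month-end burst


def scheduled_instance_count(today: date) -> int:
    """Return the instance count the schedule calls for on the given day."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    days_remaining = days_in_month - today.day
    if days_remaining < MONTH_END_WINDOW_DAYS:
        return MONTH_END_INSTANCES
    return BASELINE_INSTANCES
```

Comparing the returned value with the role's current instance count tells you whether to scale out, scale in, or do nothing.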
General Scaling Approach
Azure does not provide a native auto-scaling solution, but that doesn't mean that such a solution is beyond your reach. Figure 1 provides a flowchart that illustrates the general scaling approach. Starting from the top of this chart, suppose you have already deployed your initial hosted service. You need to retrieve either the current KPIs or the schedule, as well as your current capacity, to decide whether you have enough, too little, or too much compute capacity. We'll explore how to retrieve this information in a moment.
Figure 1: General auto-scaling approach
Assuming that you need to scale, you next need to determine in which direction: out or in? In either case, you need to modify the service configuration that you retrieved previously and adjust the number of instances for the role accordingly. After you have completed this task, you need to apply these settings to your deployed hosted service. This part is the one that catches most people off guard: applying these changes will cause your instances to restart, hence the warnings in the flowchart. (We will return later to ways in which we can minimize the effect of this downtime on users.) As you can see, conceptually the process isn't too complicated, but the devil is in the details, so we will explore them next.
How to Collect Metrics
The first challenge that you'll face is how to get at the metrics that inform your scaling decision. If you are scaling based only on a schedule, then the lookup is pretty simple (e.g., against a configuration file or a database). But what about KPIs? For example, say you determine empirically that whenever a role's average CPU utilization climbs above 50 percent, you need to add more instances to maintain a good experience for users. CPU utilization is a performance counter, so, as with most other performance counters, you configure the role at startup to periodically transfer performance-counter data to an Azure table. Figure 2 shows how to collect CPU utilization every 5 seconds (on the role instance) and then transfer it to the Azure table every 1 minute.
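Once the counter data is in a table, the KPI check itself is small. The following Python sketch averages the CPU samples from a recent window and compares the result with the 50 percent threshold from the example above. It assumes the samples have already been read from storage into (timestamp, value) pairs; the five-minute window and the function names are our own illustrative choices, not part of any Azure API.

```python
from datetime import datetime, timedelta
from statistics import mean

CPU_THRESHOLD_PERCENT = 50.0    # threshold from the example in the text
WINDOW = timedelta(minutes=5)   # assumed look-back window


def average_cpu(samples, now, window=WINDOW):
    """Average the values of samples taken within `window` before `now`.

    `samples` is an iterable of (timestamp, cpu_percent) pairs.
    Returns None when no sample falls inside the window.
    """
    recent = [value for ts, value in samples if now - window <= ts <= now]
    return mean(recent) if recent else None


def needs_scale_out(samples, now):
    """Return True when the windowed average exceeds the threshold."""
    avg = average_cpu(samples, now)
    return avg is not None and avg > CPU_THRESHOLD_PERCENT
```

Averaging over a window rather than reacting to a single sample keeps one short spike from triggering a scaling operation.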
A similar approach can be used for other performance counters, or even non-performance counters such as the depth of an Azure queue. The key is to write out the counter data to an Azure table (or even to SQL Azure) so that you can analyze the data, which will inform your scaling decisions.
How to Trigger Scaling
Now you have the data, but where and how do you make your scaling decisions? The most common solution is to build a worker role that periodically polls your active hosted service's configuration and the relevant key performance counters, then proceeds with the decision tree that Figure 1 shows. The worker role implementation can retrieve the active number of instances by invoking the GetDeployment operation of the Windows Azure Service Management API. This API is readily consumable in Representational State Transfer (REST) form, but a sample implementation of the API in compiled form is available from CodePlex and simplifies the API's use from your .NET code.
The GetDeployment operation returns a deployment object that contains the Base64 encoded form of the XML ServiceConfiguration file within the object's Configuration property. Figure 3 shows an example of a retrieved instance of this document. You need to update the count attribute of the Instances element.
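The configuration update can be sketched in a few lines. The Python example below decodes the Base64 string, sets the count attribute of the Instances element for one role, and re-encodes the document so that it is ready to send back. It assumes the usual ServiceConfiguration layout (Role elements that each contain an Instances element, in the ServiceConfiguration XML namespace); the function name is hypothetical, and retrieving or submitting the configuration through the Service Management API is left out.

```python
import base64
import xml.etree.ElementTree as ET

# Namespace used by ServiceConfiguration (.cscfg) documents.
NS = "http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration"


def set_instance_count(config_b64: str, role_name: str, new_count: int) -> str:
    """Return a Base64 configuration with the role's instance count updated."""
    # Keep the default namespace unprefixed when the document is serialized.
    ET.register_namespace("", NS)
    root = ET.fromstring(base64.b64decode(config_b64))
    for role in root.findall(f"{{{NS}}}Role"):
        if role.get("name") == role_name:
            instances = role.find(f"{{{NS}}}Instances")
            if instances is None:
                raise ValueError(f"role {role_name!r} has no Instances element")
            instances.set("count", str(new_count))
            xml_bytes = ET.tostring(root, encoding="utf-8")
            return base64.b64encode(xml_bytes).decode("ascii")
    raise ValueError(f"role {role_name!r} not found")
```

Only the named role is touched, so the instance counts and settings of any other roles in the same deployment pass through unchanged.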
Next, you need to apply this updated configuration to your hosted service. As Figure 1 shows, you have two options, both of which have certain ramifications. One approach is to use the ChangeDeploymentConfiguration operation, which allows you to send only the Base64 encoded version of the updated configuration file. Azure will apply the changes, during which time all your instances will become unavailable. One workaround is to deploy to staging and perform a virtual IP (VIP) swap, which can be done through the API via the SwapDeployment operation. Alternatively, you can use the UpgradeDeployment operation. This option requires you to put your CSPackage in blob storage but has the benefit of walking the upgrade domains so that some of your instances are always available.
The astute reader might guess a few details that need to be considered. First, all these operations take some amount of time to complete. Therefore, your system cannot be too aggressive in increasing the count. A workable pace is to increase the instance count and then wait 10 to 20 minutes to allow those instances to kick in, before re-evaluating whether additional instances are still required. Given this delay, you might also choose to scale by multiple instances at a time. Second, when it comes to downsizing, there is little benefit in being more aggressive than removing instances at an hourly pace. Compute services on Azure are billed by the full hour, so you pay for a full hour even when you use only 45 minutes. You might as well let the extra resources run for the full hour.
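These pacing rules amount to a cooldown check before each scaling operation. The Python sketch below is one simple way to express them. The 15-minute scale-out cooldown is an assumed value within the 10-to-20-minute range mentioned above, the function name is hypothetical, and measuring the cooldown from the most recent change in either direction is a simplification.

```python
from datetime import datetime, timedelta

SCALE_OUT_COOLDOWN = timedelta(minutes=15)  # assumed, within the 10-20 minute range
SCALE_IN_COOLDOWN = timedelta(hours=1)      # hourly pace, matching hourly billing


def may_scale(direction, last_change, now):
    """Return True if enough time has passed since the last scaling change.

    `direction` is "out" or "in"; `last_change` is the time of the most
    recent scaling operation, or None if there has not been one yet.
    """
    if direction not in ("out", "in"):
        raise ValueError("direction must be 'out' or 'in'")
    if last_change is None:
        return True
    cooldown = SCALE_OUT_COOLDOWN if direction == "out" else SCALE_IN_COOLDOWN
    return now - last_change >= cooldown
```

The polling worker would call this check after deciding that a change is needed and skip the change whenever it returns False.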
Getting Your Hands Dirty with Azure Auto Scale
Obviously, you need to consider many details, particularly given the requirements of your own solution. If you just want to get started playing with auto-scale, a good approach is to start from a sample. There are quite a few, but we have found Windows Azure Service Instances Auto Scaling (aka Azure Auto Scale) to be the simplest of those we've tried, with Cloud Ninja being a much more robust example. We suggest that you check out Azure Auto Scale first. (Be aware that as of this writing, Azure Auto Scale is still in alpha testing and has a few minor bugs, which we mention here for your benefit.)
Follow the basic setup instructions on the CodePlex page (see the Additional Resources box for the URL) to set up your Azure hosted services, modify Azure Auto Scale's XML configuration to suit your needs, and add the necessary code to your scaled role so that you can log the CPU performance counter (if you aren't already doing so).
Next, update the code in AutoScalingWorker.Utils.getLocalCert() to load a configured management certificate instead of the first certificate in the personal store, and add a service configuration setting that stores the thumbprint of the certificate to load. Figure 4 shows the changes that you should make.
Update the code in AutoScalingWorker.ServiceController.updateDeploymentConfigurations to use the UpgradeDeployment operation instead, as Figure 5 shows. Finally, ensure that you create the table ScalingLogTable (or whatever you name it in Azure Auto Scale's XML configuration file) prior to deployment.
You now have information and some tools to help you start working with Windows Azure's scaling capabilities, so you can adjust your Azure installation according to the resource needs of your application. Have fun!