How to Coordinate Startup Across Multiple Windows Azure Instances

Synchronize work between Azure instances using flags stored as simple Windows Azure blobs

When you deploy an infrastructure to multiple Windows Azure instances, you might run into the problem of coordinating work between those instances. Each instance runs independently of the others, according to its own timing, and sometimes you need a way to ensure that all the instances have completed a particular task. There are many subtle variants of this problem and equally many solutions. Here I'll focus on one scenario to show how you can coordinate the startup state across multiple instances in a deployment, using files created in blob storage as simple flags.

Synchronizing Startup: The Scenario

Let's say our sample scenario involves synchronizing the boot sequence of a set of Azure worker role instances, as Figure 1 illustrates.

Figure 1: Azure Instances Waiting for All Others

We have a deployment starting up that has three instances. In the OnStart method, we perform some work (indicated by the DoWork method) and then, before we can do anything else in the startup process, we need to make sure that all instances have gotten this far. In this case, we aren't concerned with a semaphore that allows only one instance at a time to do work (that's a subject for another article).

The Solution

To accomplish the proposed synchronization, we could write our state out to a database table or a Windows Azure table, or even create some form of pub/sub notification service between the instances. However, there's a simpler way: Have each instance write out a flag file to blob storage that indicates that the instance is ready. The contents of the file don't matter -- only that there's a protocol that's followed for creating the file and a convention used for generating the filename.

Getting Instance Ordinals

Each instance signals that it's ready by creating a file in blob storage, including in the filename an identifier for the instance. In our case, because we have only a single role, we can simply use the ordinal of the instance as parsed from the RoleEnvironment.CurrentRoleInstance.Id property, as Listing 1 shows.

public static int GetCurrentInstanceIdOrdinal()
{
    string id = RoleEnvironment.CurrentRoleInstance.Id;
    return GetInstanceIdOrdinal(id);
}

public static int GetInstanceIdOrdinal(string id)
{
    int instanceId = int.Parse(id.Substring(id.LastIndexOf("_") + 1));
    return instanceId;
}

In the code, when we call GetCurrentInstanceIdOrdinal, we're really just taking the instance ID that might look something like "MyWorkerRole_IN_0" and pulling out only the ordinal ("0" in this example). We factor it out into a helper method, so that we can apply the same logic in the sections that follow to get the ordinal from the instance IDs of other instances in the role.
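
The parsing above assumes the ID always ends in an underscore followed by a number; int.Parse throws if it doesn't. As a minimal sketch of a more defensive variant, the hypothetical TryGetOrdinal helper below (the InstanceIds class and method name are my own, not part of the article's Utilities class or the Azure SDK) reports failure instead of throwing:

```csharp
using System;
using System.Globalization;

public static class InstanceIds
{
    // Hypothetical helper: extracts the trailing ordinal from an ID such as
    // "MyWorkerRole_IN_0". Returns false when the ID has no "_<digits>" suffix.
    // The value of ordinal is only meaningful when the method returns true.
    public static bool TryGetOrdinal(string id, out int ordinal)
    {
        ordinal = -1;
        if (string.IsNullOrEmpty(id))
        {
            return false;
        }

        int separator = id.LastIndexOf('_');
        if (separator < 0 || separator == id.Length - 1)
        {
            return false; // no underscore, or nothing after it
        }

        // NumberStyles.None accepts digits only (no sign, spaces, or separators).
        return int.TryParse(
            id.Substring(separator + 1),
            NumberStyles.None,
            CultureInfo.InvariantCulture,
            out ordinal);
    }
}
```

A caller could log and fail startup explicitly when this returns false, rather than letting a FormatException surface from deep inside OnStart.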

The File-Naming Convention

With the ordinal for the current instance in hand, we rely on a convention that only the owner of that ordinal (i.e., the current instance) can write a file with that ordinal in the filename. Therefore, we can write out a file like "startupWorkDone_0" when instance 0 has reached that point. We can capture this logic in a helper method, as shown in Listing 2.

public static void CreateFlagFile(string storageAccountSettingName, string containerName, string flagFileName)
{
    Trace.TraceInformation("Creating Flag File: " + flagFileName);

    // Resolve the storage account from the role's configuration settings.
    CloudStorageAccount storageAccount = CloudStorageAccount.FromConfigurationSetting(storageAccountSettingName);
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

    // The blob address is "<container>/<flag file name>"; the container must already exist.
    CloudPageBlob file = new CloudPageBlob(containerName + "/" + flagFileName, blobClient);

    try
    {
        // 512 bytes is the smallest page blob size; the contents are never read.
        file.Create(512);
    }
    catch (StorageClientException ex)
    {
        if (ex.ErrorCode == StorageErrorCode.ConditionFailed)
        {
            // The blob is locked, e.g., another writer holds a lease on it.
            Trace.TraceError("Unable to create flag file, unable to acquire lease on a locked blob. " + ex.Message);
            throw;
        }
        else
        {
            Trace.TraceError("Unable to create flag file. " + ex.Message);
            throw;
        }
    }

    Trace.TraceInformation("Done creating flag file.");
}

There are a few things worth pointing out in that code snippet. The first parameter names the setting in the role's service configuration that contains the connection string for the Windows Azure Storage account we want to use. The second parameter names the container (which must exist before we run this code) that will store all the flag files. The third parameter is the name of the flag file, which the caller constructs by appending the ordinal of the current instance.

Inside the method, we create a CloudBlobClient from a CloudStorageAccount built from the provided configuration setting, then use that CloudBlobClient to create a CloudPageBlob object that represents the flag file. Using that object, we create a tiny page blob of 512 bytes, then handle any exceptions, including one that might indicate that the file was locked with a lease.

We would invoke this helper when the instance has completed the work of interest (e.g., at the end of DoWork in our scenario). A call to this method would appear as in Listing 3.

Utilities.CreateFlagFile("MyStorage",
                         "flags",
                         "startupWorkDone_" + Utilities.GetCurrentInstanceIdOrdinal());


Waiting for Ready

With each instance writing out its own flag file, all that remains is to have every instance wait until all the expected flag files have been written, which we can do by polling the container, as shown in Listing 4. The key to this code is that we can build all the expected filenames, because we can get the ordinal for each instance from the IDs in the Role.Instances collection.

private void WaitForAllInstancesReady()
{
    int numRoleInstances = RoleEnvironment.CurrentRoleInstance.Role.Instances.Count();
    int numReady = 0;

    while (true)
    {
        numReady = 0;
        foreach (RoleInstance instance in RoleEnvironment.CurrentRoleInstance.Role.Instances)
        {
            if (Utilities.DoesFlagFileExist("MyStorage",
                            "flags",
                            "startupWorkDone_" + Utilities.GetInstanceIdOrdinal(instance.Id)))
            {
                numReady++;
            }
        }

        if (numReady == numRoleInstances)
        {
            break;
        }

        Thread.Sleep(1500);
    }
}
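
The loop above waits forever if one instance never writes its flag. As a hedged sketch of one way to bound the wait, the hypothetical WaitUntil helper below (the Polling class, method name, and parameters are my own assumptions, not part of the article's code or the Azure SDK) polls a caller-supplied check until it succeeds or a timeout elapses:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Polling
{
    // Hypothetical helper: calls isReady every pollInterval until it returns
    // true (result: true) or until timeout has elapsed (result: false).
    // isReady is always called at least once.
    public static bool WaitUntil(Func<bool> isReady, TimeSpan pollInterval, TimeSpan timeout)
    {
        if (isReady == null)
        {
            throw new ArgumentNullException("isReady");
        }

        Stopwatch elapsed = Stopwatch.StartNew();
        while (true)
        {
            if (isReady())
            {
                return true;
            }

            if (elapsed.Elapsed >= timeout)
            {
                return false;
            }

            Thread.Sleep(pollInterval);
        }
    }
}
```

The flag-counting body of the while loop in WaitForAllInstancesReady could be moved into a method that returns true when numReady equals numRoleInstances and passed as isReady. What to do when the wait times out (fail startup, log and continue, retry) depends on your scenario.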

Listing 5 shows the implementation of the helper method DoesFlagFileExist. The important piece is the call to file.FetchAttributes(): getting a blob reference doesn't contact the storage service, so it's this call that determines whether the file really exists in blob storage. If the file doesn't exist, the call throws an exception, which we translate into a return value of false. With the call to WaitForAllInstancesReady within our OnStart code, we can now proceed to include any additional work that must follow this synchronization point.

public static bool DoesFlagFileExist(string storageAccountSettingName, string containerName, string flagFileName)
{
    bool doesFlagFileExist = false;
    CloudStorageAccount storageAccount = CloudStorageAccount.FromConfigurationSetting(storageAccountSettingName);
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
    CloudBlobContainer container = blobClient.GetContainerReference(containerName);

    try
    {
        CloudBlob file = container.GetBlobReference(flagFileName);
        file.FetchAttributes();
        Trace.TraceInformation("Flag file exists: " + flagFileName);
        doesFlagFileExist = true;
    }
    catch (StorageException ex)
    {
        if (ex.ErrorCode == StorageErrorCode.BlobNotFound || ex.ErrorCode == StorageErrorCode.ResourceNotFound)
        {
            // A missing flag file is expected while polling, so it isn't logged as an error.
            Trace.TraceInformation("Flag file does not exist: " + flagFileName);
            return false;
        }
        else
        {
            Trace.TraceError("Unable to fetch flag file attributes. " + ex.Message);
            throw;
        }
    }

    return doesFlagFileExist;
}

Completing the Solution

There are additional pieces to consider beyond what I've shown in the solution here -- such as what to do when you restart an instance and when you should clean up your flag files. What specific additional pieces you'll need to provide will be determined mainly by your scenario's particulars, but with the help of the code I've provided for manipulating your flag files, these parts should be fairly easy to implement.
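
One of those pieces is stale flags: if files from an earlier run are still in the container, the wait succeeds immediately. As a sketch of one possible approach (not something the solution above does), you could include a scope token in each filename so that flags from a different run are ignored. The FlagNames class and the "<prefix>_<scope>_<ordinal>" convention below are hypothetical; the deployment ID exposed by RoleEnvironment.DeploymentId is one candidate for the scope, assuming its value is acceptable in a blob name.

```csharp
using System;
using System.Globalization;

public static class FlagNames
{
    // Hypothetical convention: "<prefix>_<scope>_<ordinal>", for example
    // "startupWorkDone_abc123_0". The scope distinguishes one run from another.
    public static string Build(string prefix, string scope, int ordinal)
    {
        if (string.IsNullOrEmpty(prefix))
        {
            throw new ArgumentException("A prefix is required.", "prefix");
        }

        if (string.IsNullOrEmpty(scope))
        {
            throw new ArgumentException("A scope is required.", "scope");
        }

        if (ordinal < 0)
        {
            throw new ArgumentOutOfRangeException("ordinal");
        }

        return string.Format(CultureInfo.InvariantCulture, "{0}_{1}_{2}", prefix, scope, ordinal);
    }
}
```

Both the writer (CreateFlagFile) and the poller (DoesFlagFileExist) would need to use the same builder so the names match. Note that a deployment ID stays the same when a single instance restarts within a deployment, so this only separates deployments; handling an individual restart still needs its own policy, such as deleting the instance's flag before DoWork runs.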
