Achieving 100 percent application availability is the holy grail of development; it's a goal that borders on impossibility. That said, I'm currently working on a distributed solution that's attempting to achieve 100 percent availability through at least one of its distributed nodes. That has caused me to finally take SQL Azure for a spin—as well as think more about redundancy and the reality of being able to hit 100 percent uptime.
Finally Using SQL Azure
I admit that anyone following my articles on a regular basis might wonder if I'm mentally unstable. My coverage of Windows Azure and SQL Azure, for example, has seemed contradictory at a number of points. I've voiced my opinion about how beneficial SQL Azure will be to SQL Server, and I've taken a similar view of how Windows Azure will bring huge benefits to .NET development. Yet I've also taken the time to actively rant about how confusing and problematic signing up for Windows Azure or SQL Azure can be, and I've explained why I'd never used Azure.
On the surface it might look like I can't make up my mind about Azure. But in my mind, things are pretty simple: Azure has a lot of amazing potential that's slowly clawing its way out of some really ugly and perennial problems Microsoft has with the delivery and execution of new solutions. For example, the fact that Azure was so late to the cloud services party, yet Microsoft persists in pretending that it invented the cloud, is a bit of a turn-off to say the least. Similarly, Microsoft pushes Azure heavily and aggressively even when many developers, system admins, and businesses don't need cloud services. And the fact that Azure's website marketing was such a calamity until recently has made me very critical of Azure's execution and offerings—but never of its potential.
Accordingly, it was exciting to finally reach a point at which a number of the things I disliked about Azure's execution had been addressed and I was able to sign up and start working with Azure.
Taking SQL Azure for a Test-Drive
To date I've only taken SQL Azure for a test drive with a very small database. But, I didn't have to dumb down any of my functionality to get things up and running as needed. I also find that the Silverlight-based dashboard that Microsoft provides for managing Azure resources is a big win because it's very responsive and easy to use.
I was also pleased to see how easy it was to create a new server for SQL Azure and then spin up a database against it, configure permissions, and start populating and querying data. Being able to directly query and interact with my SQL Azure database from within SQL Server Management Studio was a big win. Because I'm a fan of T-SQL templates, I was very happy to see that the SQL Azure team has leveraged templates heavily as a way to get around current GUI limitations (and, I'm assuming, to cut down on chatter over the wire) by providing templates for many administrative actions and operations.
In fact, the only negative point I have to report about my SQL Azure experience is that I find the firewall interaction a bit cumbersome and heavy-handed. The need for a firewall with SQL Azure and other Azure offerings can't be overstated, so the fact that there's a requisite hurdle in the form of a firewall is a huge win. That said, my beef (as weird as it is) is that the only way to work with the firewall today is to specify the IP addresses, or ranges of addresses, that can access a given Azure endpoint or resource. Although that's going to be fine for most businesses, it's a bit ugly and tedious for me because I'm building a highly distributed solution with numerous nodes in a host of different locations that all need to access my Azure database. Consequently, it would be cool if Azure had a less-secure option that let me set up DNS entries and then define reverse DNS for things such as node1.mydomain.com and node2.mydomain.com as an easier way of working with the firewall. Other than this one negative point, though, my experience so far has been great—and I haven't had to do any dumbing-down of my apps or code to get them to go from working against an on-premises SQL Server database to working in SQL Azure.
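To illustrate why per-IP rules get tedious with lots of distributed nodes, here's a minimal Python sketch of the rule model: each firewall entry is a named start/end IP range, and every node's public address has to be covered by one. The node addresses and rule names here are hypothetical examples, not anything from my actual deployment.

```python
import ipaddress

# Hypothetical node endpoints: with an IP-based firewall model, every
# distributed node's public IP must be covered by an explicit rule.
node_ips = ["203.0.113.10", "198.51.100.22", "192.0.2.7"]

# Each rule is a (name, start_ip, end_ip) range, mirroring the
# start/end form that IP-range firewall settings typically take.
# With single-address nodes, that means one rule per node.
rules = [(f"node-{i}", ip, ip) for i, ip in enumerate(node_ips, start=1)]

def is_allowed(ip, rules):
    """Return True if ip falls inside any configured rule range."""
    addr = ipaddress.ip_address(ip)
    return any(
        ipaddress.ip_address(start) <= addr <= ipaddress.ip_address(end)
        for _, start, end in rules
    )
```

Every time a node moves or a new one comes online, the rule list has to be updated to match—which is exactly the maintenance burden that a DNS-based option would sidestep.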
SQL Azure and the Promise of Cloud Availability
In sizing up SQL Azure as an option for my solution, one thing I did have to do was gauge how much downtime I could potentially expect when using SQL Azure. As much as the hype from all cloud vendors would have you believe that cloud solutions are always on, that's commonly not the case. As such, I went ahead and did some homework on overall uptime for SQL Azure within the last year to get a feel for what I could potentially expect going forward.
Of course, my homework was nothing more than taking a peek at the excellent uptime statistics compiled by www.cloudharmony.com. Its Cloud Status report is something anyone looking to use a cloud service of any type or flavor should bookmark. According to the report's metrics, SQL Azure had an uptime of 99.985 percent over the 365 days prior to my inquiry. Doing the math, an uptime of 99.985 percent works out to roughly 79 minutes of downtime per year (by comparison, 99.99 percent—four nines—allows just under 53 minutes).
More specifically, the detailed statistics on cloudharmony.com show that SQL Azure was actually down a little under 33 minutes over the entire year. Not bad, but not 100 percent uptime either.
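The arithmetic behind those numbers is straightforward. Here's a quick Python check of how an uptime percentage translates into minutes of downtime per year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a (non-leap) year

def downtime_minutes(uptime_pct):
    """Minutes of downtime per year implied by an uptime percentage."""
    return MINUTES_PER_YEAR * (1 - uptime_pct / 100)

# 99.985 percent uptime allows roughly 79 minutes of downtime per year;
# 99.99 percent (four nines) allows just under 53 minutes.
print(round(downtime_minutes(99.985), 2))  # 78.84
print(round(downtime_minutes(99.99), 2))   # 52.56
```

So SQL Azure's measured 33 minutes of downtime actually beat the four-nines threshold over that particular year.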
Because the highly distributed application I'm building uses SQL Server persistence only during the startup of each node and to periodically write tiny amounts of data that can be eventually consistent, I decided that even if SQL Azure were to double its failure rate in the next year, a SQL Azure database paired with a mirror SQL Server database hosted somewhere else would be good enough to work with. So I've settled on SQL Azure as my primary data storage mechanism, because pricing and uptime are great, and I've gone ahead and set up a secondary mirrored database for redundancy purposes.
This approach obviously wouldn't work for many database-centric applications, but this solution will probably only ever grow to around 2GB in size, reads only about 2MB of data when an individual node starts up, and makes only periodic writes. Consequently, instead of using a plain data repository, I've created a redundant data repository that pushes all writes to both databases (using an Amazon Simple Queue Service persistence mechanism to queue writes against either database if it's down) and tries to read from the failover, or secondary, database when the primary database doesn't respond quickly enough during node startup.
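As a rough sketch of that dual-write, queue-on-failure, failover-read design, here's what such a redundant repository might look like in Python. The `primary` and `secondary` objects, their `read`/`write` methods, and the in-memory retry queues (standing in for the Amazon SQS queues the real solution uses) are all illustrative assumptions, not my actual implementation.

```python
import queue

class RedundantRepository:
    """Dual-write repository with queued retries and failover reads.

    `primary` and `secondary` are assumed to expose write(key, value)
    and read(key), raising ConnectionError when the store is down.
    """

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        # One retry queue per store; an in-memory stand-in for SQS.
        self.retry = {id(primary): queue.Queue(),
                      id(secondary): queue.Queue()}

    def write(self, key, value):
        # Push every write to both databases; queue it if either is down.
        for db in (self.primary, self.secondary):
            try:
                db.write(key, value)
            except ConnectionError:
                self.retry[id(db)].put((key, value))

    def read(self, key):
        # Read from the primary, falling back to the secondary.
        try:
            return self.primary.read(key)
        except ConnectionError:
            return self.secondary.read(key)

    def drain(self):
        # Replay queued writes once a database comes back up.
        for db in (self.primary, self.secondary):
            q = self.retry[id(db)]
            while not q.empty():
                key, value = q.get()
                try:
                    db.write(key, value)
                except ConnectionError:
                    q.put((key, value))  # still down; try again later
                    break
```

Because writes are tiny, periodic, and tolerant of eventual consistency, replaying a queued backlog after an outage keeps both stores converging without any coordination between them.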
So far this approach is working well. And although creating a redundant repository did add some additional complexity to my application, the amount of complexity paled in comparison to what I would have had to do had I not written my application against a semi-permanent centralized data store. This meant that I could avoid coding up true peer-to-peer semantics that would have made my simple solution a nightmare.
As such, I think I might actually have a shot at hitting that coveted 100 percent uptime—even if 100 percent uptime is theoretically impossible and crazy to pursue.