There have been several highly publicized outages of cloud services over the last few months, ranging from Amazon's cloud collapse to Microsoft's Office 365 failures. While the public cloud's track record hasn't been spotless, it's important to remember that internal IT resources (and, by extension, private cloud deployments) aren't infallible either. That's why it's doubly important to make sure that all of your cloud IT resources, whether internal or external, are researched, purchased, and deployed with quality of service (QoS) in mind.
When discussing QoS, it's important to remember that we're not just talking about service-level agreements (SLAs). QoS should definitely include solid, well-researched SLAs, but other factors contribute to QoS as well. Here are some basic things to look for and plan for when adopting a cloud solution.
Plan for Failure
It's a fact of life that mistakes and accidents happen, and that applies to both traditional and cloud resources. Netflix famously survived the April 2011 collapse of Amazon Web Services (AWS), in part because of the approach that Netflix engineer John Ciancutti described in a blog post back in December 2010:
"One of the first systems our engineers built in AWS is called the Chaos Monkey. The Chaos Monkey’s job is to randomly kill instances and services within our architecture. If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most – in the event of an unexpected outage." Source: http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html
So if you're using a cloud service (or any IT service), be sure that you have backups, alternatives, and a "Plan B" ready in case things go south. For example, if you're using a public cloud backup provider, make sure that its service also supports creating local backups. In an imperfect world, bad things will eventually happen, so prepare for them.
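The Chaos Monkey idea quoted above, killing instances at random so you find out whether your redundancy really works, can be illustrated with a toy simulation. This is not Netflix's code and does not touch any real cloud API; it is a minimal sketch assuming a hypothetical `Service` class whose pool of instances is automatically refilled after each failure, plus hypothetical `chaos_monkey` and `run_drill` helpers.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical service backed by a pool of interchangeable instances."""
    name: str
    replicas: int  # target pool size; assumed to be at least 1
    instances: list[str] = field(default_factory=list)
    next_id: int = 0

    def __post_init__(self) -> None:
        self.heal()

    def heal(self) -> None:
        # Stand-in for auto-recovery: launch replacements until the pool is full again.
        while len(self.instances) < self.replicas:
            self.instances.append(f"{self.name}-{self.next_id}")
            self.next_id += 1

    def is_available(self) -> bool:
        # The service is "up" as long as at least one instance is still running.
        return bool(self.instances)


def chaos_monkey(services: list[Service], rng: random.Random) -> str:
    """Terminate one randomly chosen instance of a randomly chosen service."""
    target = rng.choice([s for s in services if s.instances])
    victim = rng.choice(target.instances)
    target.instances.remove(victim)
    return victim


def run_drill(services: list[Service], rounds: int, seed: int = 0) -> list[tuple[int, str]]:
    """Repeat kill/heal cycles and return (round, service name) for every outage seen."""
    rng = random.Random(seed)  # seeded so a drill can be replayed exactly
    outages: list[tuple[int, str]] = []
    for round_no in range(rounds):
        chaos_monkey(services, rng)
        # Check availability *before* healing: this is the gap users would notice.
        outages.extend((round_no, s.name) for s in services if not s.is_available())
        for s in services:
            s.heal()
    return outages


if __name__ == "__main__":
    web = Service("web", replicas=3)          # redundant
    billing = Service("billing", replicas=1)  # single point of failure
    for round_no, name in run_drill([web, billing], rounds=10):
        print(f"round {round_no}: {name} went down")
```

Only `billing` can ever show up in the drill's output: with one kill per round and a replacement launched afterward, a three-instance pool never empties, while a single-instance service goes down every time the monkey picks it. That is exactly the kind of weakness constant failure testing is meant to surface before a real outage does.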
SLAs are Paramount
When working on an SLA with internal or external cloud providers, the old axiom "measure twice, cut once" has never been more true. SLAs should be very specific about what you expect a provider to deliver, and about the consequences if that provider doesn't keep its promises. Make sure your data is portable, so that if a vendor goes out of business you can move that data to a new one. If accessing your data requires a provider's custom software or a special access method, consider a software escrow arrangement, so that even if the provider goes out of business, you still have the legal right to use that provider's systems and software to access your data.
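One way to make an uptime commitment specific is to translate its percentage into a concrete downtime budget you can check against. The sketch below assumes availability is simply the fraction of time in a 30-day period that the service is up; real SLAs may define measurement windows, maintenance exclusions, and credits differently, so check each provider's exact terms. The function names are hypothetical.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 30) -> float:
    """Maximum downtime (minutes) permitted by an availability target over a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def sla_met(availability_pct: float, observed_downtime_min: float, period_days: float = 30) -> bool:
    """True if the observed downtime stays within the SLA's downtime budget."""
    return observed_downtime_min <= allowed_downtime_minutes(availability_pct, period_days)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% uptime -> {budget:.1f} minutes of downtime per 30 days")
```

Over 30 days, 99%, 99.9%, and 99.99% availability allow roughly 432, 43.2, and 4.3 minutes of downtime respectively. Seeing the numbers side by side makes it easier to decide which tier you actually need, and to spell it out unambiguously in the agreement.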
Communication is Key
End users and stakeholders who access the public and private cloud resources you've deployed should be told why you're moving resources to the cloud and given ample training on how to access and use those resources. Even the best-planned, best-managed, and best-deployed IT projects will fail if end users don't receive proper training. I've personally seen expensive SharePoint deployments gather dust, unused, simply because IT and upper management never effectively communicated the what, why, where, and how of the deployment. Making sure that users receive proper training and support will help convince them that QoS is being maintained.
These are just some quick tips for maintaining QoS in the cloud, but feel free to add some suggestions and questions of your own by commenting on this blog post.
EMC’s Window to the Private Cloud Partner Post:
By Dustin Smith
Microsoft updated its virtualization guidance for Exchange 2010 around the time of TechEd, and I have been getting more and more questions from customers about virtualizing Exchange 2010, based on some of the solutions we have been creating.
As virtualizing Exchange 2010 grows in popularity, a number of questions usually arise about high availability and BC/DR options (aka site resiliency) and how virtualization affects them. The good news is that they are not affected very much, and technologies like Hyper-V Live Migration, VM/HA, and VMware's Site Recovery Manager give you additional options to consider. Since these technologies can now coexist with a Database Availability Group (DAG), you must consider whether and how these other options come into play.…