Retaining Productivity During Cloud Service Outages

How do workers who depend upon cloud-based SaaS remain productive when those services go down? It seemed like a reasonable question to ask after a year where more and more enterprises have accelerated their moves to the cloud, and one that would be of interest to IT pros and system admins who are responsible both for assessing the cloud services they’ll be implementing and for supporting end users.

The cloud has become an integral part of many organizations, and cloud computing services such as productivity suites underpin the apps end users need to accomplish their daily work. As an IT pro, it is important to understand how cloud service outages can affect your users so that you can minimize the impact on productivity wherever possible.

Of course, a possible approach to all of this is to do what one company told ITPro Today they do during cloud service outages: Take a break and wait for the cloud to come back online. That might be OK for some businesses and organizations but could be a deal breaker for others.

Organizations must have a plan in place to remain at least somewhat productive if their cloud computing service goes offline.

How Often Do Cloud Service Outages Occur?

ITPro Today asked the three big cloud players (Amazon, Microsoft, and Google) for verified numbers on how many working hours per year their services were down. After all, even if a service is up 99% of every year, that still leaves roughly 87.65 hours per year of downtime, or a little over two 40-hour working weeks. One company never responded to my request, another offered to talk about cloud service outages but only on background (which, in journalism, means no attribution), and one PR firm tried for more than two weeks, without success, to get the data from its client.
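The arithmetic behind that 99% figure can be sketched in a few lines of Python. This is only an illustration: the function name `annual_downtime_hours` and the 365.25-day average year are assumptions made for the example, not numbers supplied by any cloud vendor.

```python
# Assumption: an average year of 365.25 days (accounts for leap years).
HOURS_PER_YEAR = 365.25 * 24


def annual_downtime_hours(uptime_percent: float) -> float:
    """Return the hours per year a service is down at a given uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


if __name__ == "__main__":
    # Compare a few common availability targets.
    for uptime in (99.0, 99.9, 99.99):
        hours = annual_downtime_hours(uptime)
        print(f"{uptime}% uptime: {hours:.2f} hours down per year "
              f"({hours / 40:.2f} 40-hour working weeks)")
```

With these assumptions, 99% uptime comes out to a little under 88 hours a year, in line with the figure quoted above. Each additional "nine" cuts the allowance by a factor of ten.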

The best and most recent outage data I could track down is from a Network World article by Zeus Kerravala, founder and principal analyst for ZK Research.

Kerravala’s data, which was obtained from an unnamed third-party firm that continuously collects outage data self-reported by the cloud service vendors, covers the period between January 1, 2018 and May 3, 2019. For reference, that is 488 days, or 11,712 hours of total monitored time.

During that timeframe, the cloud service outages were:

Microsoft Azure: 1,934 hours, or 16.51% of the total monitored time
Google Cloud Platform: 361 hours, or 3.08% of the total monitored time
Amazon Web Services: 338 hours, or 2.89% of the total monitored time
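The monitored window and the percentages listed above can be reproduced with a short Python sketch. The outage hours are the figures quoted in this article. The helper names (`monitored_hours`, `outage_share_percent`) are hypothetical, and the window is assumed to include both the start and end dates.

```python
from datetime import date

# Vendor-reported outage hours as quoted in the article.
REPORTED_OUTAGE_HOURS = {
    "Microsoft Azure": 1934,
    "Google Cloud Platform": 361,
    "Amazon Web Services": 338,
}


def monitored_hours(start: date, end: date) -> int:
    """Total hours in the window, counting both the start and end dates."""
    days = (end - start).days + 1  # +1 because the window is inclusive
    return days * 24


def outage_share_percent(outage_hours: float, total_hours: float) -> float:
    """Outage hours expressed as a percentage of the monitored hours."""
    return outage_hours / total_hours * 100


if __name__ == "__main__":
    total = monitored_hours(date(2018, 1, 1), date(2019, 5, 3))
    print(f"Monitored window: {total:,} hours")
    for provider, hours in REPORTED_OUTAGE_HOURS.items():
        share = outage_share_percent(hours, total)
        print(f"{provider}: {hours:,} hours, {share:.2f}% of monitored time")
```

Because the underlying totals are not broken out by service or region, these shares describe reported incident time and should not be read as platform-wide unavailability.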

As Kerravala notes in his article, this data is not normalized because it does not contain outages listed on a service or regional basis. However, it provides a quick point-in-time snapshot of reliability.

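Spread out over the entire 488-day period that Kerravala’s data covers, those totals are easier to picture as daily averages. For Microsoft Azure, 16.51% of the monitored time works out to just under 4 hours of reported outages per day. For Google and Amazon, it is about 44 and 42 minutes per day, respectively. Because the data is not broken out by service or region, those averages treat an incident in a single service or region the same as a platform-wide one, so they may overstate the downtime any one customer actually experienced.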

Whatever the averages say, when any one of those services goes down, it makes headlines. Most recently, Slack went down on the first full workday of 2021, and the outage was documented across the internet. The same thing has happened whenever any of the big three suffered cloud service outages. Even when the overall disruption is small, that does not change the fact that a service is unavailable when a business needs it to be up.

I spoke with Roy Isley, Omdia’s chief analyst for IT ecosystems and operations, and he confirmed that reliability for cloud-based SaaS has certainly improved over the years. He added that many customers judge reliability less by total downtime than by a service’s overall ability to deliver when it is needed. What’s driving the ability to deliver on demand? Credit the advent of multi-cloud computing.

What Exactly Is Multi-Cloud?

This nebulous phrase is bandied about these days as if everyone understands exactly what it means. During a discussion, Mark Nunnikhoven, the VP of cloud research for Trend Micro, helped clarify what multi-cloud actually means for most companies, and how the approach can be beneficial during cloud service outages.

“When a company mentions they use multi-cloud, that typically means they are using different clouds for different services,” Nunnikhoven said.

In other words, some companies decide to use Salesforce for their CRM, Amazon Web Services for hosting an app and Microsoft’s Office productivity suite for email, cloud files, collaboration and content creation.

This then allows a company to shift their productivity focus over to another stack/service while other cloud services might be down. (Rarely will all services be down at the same time unless there is a larger connectivity outage in a certain region.)

Nunnikhoven also confirmed that the one thing multi-cloud does not provide is full redundancy for every SaaS offering a company uses. It is cost prohibitive to have 100% redundancy across multiple cloud services.

Alternatively, multi-region deployments (more expensive) or multiple availability zones (less expensive) are better options for many companies looking to have some modicum of near-continuous service.

Hybrid Cloud to the Rescue?

Although many companies have already made a full digital transformation to the cloud and no longer have their own on-premises hardware, organizations that still retain a local data center or server rack have an option for staying productive during cloud service outages.

The big three cloud providers have embraced hybrid cloud to the point of adding robust tools and services that let these companies operate across both on-premises and cloud environments.

As Nunnikhoven pointed out, cloud providers have embraced hybrid solutions because they see value in an on-premises company learning the cloud environment while retaining that local capability. It helps them become more familiar and comfortable with the interfaces used to manage these cloud services and yet retain that local hardware redundancy.

That is why hybrid cloud is an option for retaining some level of productivity during cloud service outages. In this configuration, data is typically synced on a regular basis between local and cloud storage, along with the front-end interface that provides the functionality. In the event the cloud goes down, it would be a simple process to redirect the front-end software to the data on the local on-premises hardware until the cloud outage is resolved.
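The redirect described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `read_with_fallback`, `BackendUnavailable`, and the two stand-in readers are invented for this example and are not part of Azure Stack, Anthos, Outposts, or any vendor SDK. It assumes each storage backend can be wrapped in a function that raises an exception when the backend is unreachable.

```python
from typing import Callable, List, Sequence, Tuple


class BackendUnavailable(Exception):
    """Raised by a backend that cannot currently serve a request."""


Reader = Callable[[str], bytes]


def read_with_fallback(key: str,
                       backends: Sequence[Tuple[str, Reader]]) -> Tuple[str, bytes]:
    """Try each backend in order; return (backend_name, data) from the first that works."""
    errors: List[str] = []
    for name, reader in backends:
        try:
            return name, reader(key)
        except BackendUnavailable as exc:
            errors.append(f"{name}: {exc}")  # remember why this backend failed
    raise BackendUnavailable("all backends failed: " + "; ".join(errors))


# Hypothetical stand-in for cloud storage during an outage.
def cloud_reader(key: str) -> bytes:
    raise BackendUnavailable("simulated cloud outage")


# Hypothetical stand-in for the regularly synced on-premises copy.
LOCAL_COPY = {"report.txt": b"last synced copy"}


def local_reader(key: str) -> bytes:
    try:
        return LOCAL_COPY[key]
    except KeyError:
        raise BackendUnavailable(f"{key!r} not synced locally") from None


if __name__ == "__main__":
    source, data = read_with_fallback(
        "report.txt", [("cloud", cloud_reader), ("on-prem", local_reader)]
    )
    print(f"served from {source}: {data!r}")
```

Note that the local copy is only as fresh as the last sync, and any changes made on-premises during the outage would still need to be reconciled with the cloud afterward. The sketch does not attempt either.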

Microsoft’s Azure Stack, Google’s Anthos, and Amazon’s Outposts all offer hybrid cloud solutions. Keep in mind, though, that even these hybrid solutions will not provide 100% business continuity during a cloud outage, but they could restore critical processes depending on your organization’s priorities.

What's the Best Option For Retaining Some Productivity During Cloud Service Outages?

Both Omdia’s Isley and Trend Micro’s Nunnikhoven recommended hosting certain services across different clouds rather than putting all company services into one cloud basket. It is not a perfect 100% redundancy solution, but it would allow employees to retain some productivity during an outage. Next, consider the hybrid options previously discussed.

The reality of these options is that they are better suited to keeping critical business continuity functionality online – they’re not aimed at keeping the interoffice chat going.

That means employees in admin, finance, human resources, and similar departments who rely on SaaS cloud computing solutions might have no way to stay productive during a cloud outage.

In that case, the advice from one company to just “take a break” during the cloud outage might be the best productivity option for many organizations until the outage is over.
