Tuesday's release of the Global Data Center Survey report from Uptime Institute offers something of an inside view of the data center business. Conducted earlier this year, the survey gathered responses from nearly 900 data center operators and IT practitioners affiliated with enterprise and service provider data center facilities.
The survey's sponsor, Uptime Institute, is a Seattle-based organization that focuses on all aspects of data center operations, and is probably best known as the creator of the data center "Tier Standard" and its associated certifications.
This year's results present a mixed bag: mostly business as usual with positive indicators, but with a few red flags pointing to areas that need improvement.
We asked Lee Kirby, Uptime Institute's executive director, what he thought the takeaway was from this year's results.
"If I were to sum it up, I think it shows the growing need for reliable digital infrastructure, because the digital economy is so reliant on it and getting more so every day," he said. "Outages are costing us more and having greater impact. The key is that our reliance on each other continues to increase, so we as an industry need to improve, and we can do that by embracing common global standards for how we design, build, and operate data centers."
The "outages" he mentioned are probably the most worrisome of the red flags. This year, 31 percent of the respondents reported they experienced a downtime incident or severe service degradation in the past year, up from 25 percent in the previous year's survey. On-premises power failures, network failures, and software or IT systems errors were cited as the most common primary causes, and nearly 80 percent indicated their most recent outage could have been prevented. The time to full recovery for most outages was one to four hours, with over a third reporting a recovery time of five hours or longer.
Given the great expense associated with downtime, and the proactive approaches data centers take to maintain "nine nines" reliability, this was a surprising result. Kirby said that a possible explanation might center on increasing data center complexity as IT adds resources at the edge.
"I think as we see the adoption rates go up, and the use of the applications and deployment of the Internet of Things with all of the devices that are out there, we're taxing the entire matrix of the digital infrastructure," he said.
"The key point is, we are all dependent on each other, and if we don't embrace performance levels, and increasing those as an industry, we're going to see even greater impacts that will have a negative effect on economies. Just because something goes down at Telecity doesn't mean it's only Telecity that's impacted; it's going to impact the entire European segment. Different people will get blamed, but we're all to blame if we don't, as an industry, increase our performance levels."
In many ways, data centers seem to be on top of the problem. The traditional method of attempting to guarantee a reasonably quick recovery from a catastrophic failure, regular backups to a secondary site, is deployed by 68 percent of the respondents, and 51 percent perform near real-time replication to secondary sites (with 40 percent replicating to two or more sites). Newer methods are also gaining traction. Forty-two percent said they use some sort of disaster recovery as a service program, and 36 percent take advantage of cloud-based high availability services.
Going hand-in-hand with increased downtime is the finding that many data center operators aren't paying much attention to the effects that climate change might have on their facilities. According to the survey, 45 percent of respondents said their organizations "are not adapting to climate change impacts at this time."
"That was surprising to me," Kirby said. "It's kind of counterintuitive to some of the things that we've seen happening, like having multiple 'storms of the century' and records being broken time and again. We're definitely seeing climate change, and I think that what it means for the data center industry is that there needs to be more diligence in the planning for disaster recovery. It's not just running a drill at the infrastructure level, but full business resumption planning and addressing what happens if a data center is no longer going to be able to function."
Despite these negatives, most of the survey indicates that data center operators are staying ahead of the curve in many key areas. For example, cooling costs as a percentage of a data center's total power outlay continue to improve. According to survey results, this year's power usage effectiveness (PUE) number is at an all-time low (lower is better) of 1.58. The survey report points out that in a separate survey the institute conducted in 2007, the PUE stood at 2.5, and that it had dropped to 1.65 by 2013. Since then, the decreases have been incremental.
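For readers who want to put those PUE figures in context: PUE is conventionally defined as total facility energy divided by the energy delivered to IT equipment, so the share of a facility's energy that goes to cooling and other overhead works out to 1 - 1/PUE. The short Python sketch below applies that arithmetic to the three figures cited in the report. The function name `overhead_share` is ours, chosen for illustration, and the calculation is our own back-of-the-envelope reading, not something taken from the report.

```python
def overhead_share(pue: float) -> float:
    """Fraction of total facility energy NOT consumed by IT equipment.

    Assumes the conventional definition:
        PUE = total facility energy / IT equipment energy
    so the IT share is 1 / PUE and everything else is overhead.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return 1.0 - 1.0 / pue


# PUE figures cited in the survey report.
for label, pue in [("2007", 2.5), ("2013", 1.65), ("this year", 1.58)]:
    print(f"{label}: PUE {pue} -> {overhead_share(pue):.1%} of energy is overhead")
```

By that reading, the drop from 2.5 to 1.65 cut overhead from roughly 60 percent of facility energy to about 39 percent, while the move from 1.65 to 1.58 trimmed it by only a few more points.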
"One takeaway is that the biggest infrastructure efficiency gains happened five years ago," the institute notes in the report. "Further improvements will require significant investment and effort, with increasingly diminishing returns. Organizations will continue to increase efficiency in a bid to lower operating costs or to maximize available power (or both), including with artificial intelligence driven data center management as a service (DMaaS), software-defined power, and other approaches."
Rack density is also on the rise. When asked about the highest onsite server density, exactly half reported between 10 kW and 29 kW per rack, and 19 percent reported a density of 30 kW per rack or higher.
The highest rack density is far from the average, however. In last year's survey, 67 percent reported an average of below 6 kW per rack and just 9 percent had average densities of 10 kW per rack or higher. No figures for current average rack densities were included in the report.
One speed bump to increasing compute density is cooling. While only 30 percent still rely on basic room-based systems and 56 percent say they rely on precision air cooling, just 14 percent use liquid cooling, which allows the highest server density. Most respondents said they cool their highest density racks using a precision air solution.
The report points out that this will, by necessity, need to change, as server hardware optimized for artificial intelligence (AI) workloads has much higher power and cooling requirements than standard x86 servers.
"While very high-density IT environments are likely to be confined to operators of AI applications and high-performance computing (including gaming and IoT applications with high IO), some colocation data center providers serving these types of customers will need to adapt," the report says. "Prefabricated modular data center components, equipped with precision or liquid cooling, are increasingly being viewed as a retrofit tactic to enable mixed-density colocation (and other) environments."
Other items covered in the survey are data center infrastructure management (DCIM) software (it's mainstream), and staffing (finding skilled workers will become more difficult). Tied to the latter is the subject of diversity and gender equality. According to this report, women make up less than 6 percent of the workforce at most data centers, which was not perceived as a problem by 70 percent of the respondents.
"As study after study shows, a lack of diversity typically represents not just a lack of pipeline for hiring but also a threat of technical stagnation, negative publicity, and, ultimately, a loss of market share," the report notes. "There is growing consensus among data center industry leaders, and elsewhere, that the future success of the data center business will depend on building a diverse workforce."
We asked Kirby what he found least surprising about this year's survey.
"The fact that edge computing is changing the dynamics of operations and management," he said. "We've known that's coming and have rolled out a program called TIER-Ready that's helping people ensure that their infrastructure is reliable at the edge, because it's obvious that edge devices are going to be used for multiple purposes. The operation of the edge is going to be different than the core data center, so people need to adopt their practices and procedures for distributed data centers with those edge devices."
For those wanting a deeper dive into the numbers in this year's survey, Uptime Institute is offering a free webinar on August 22 at 8:00 AM Pacific Time.