2022 State of SRE Report Identifies Site Reliability DevOps Challenges

Site reliability engineers face a number of difficult challenges, chief among them defining service-level objectives.

Sean Michael Kerner, Contributor

March 18, 2022

The 2022 State of SRE Report was released on March 17, providing insight into the current state of site reliability engineering.

Site reliability engineering (SRE) is an increasingly essential component of DevOps, and of IT operations overall. The 2022 State of SRE Report is based on input from 450 site reliability engineers about their views of SRE. Among the surprising findings in the 46-page report, which was sponsored by software intelligence vendor Dynatrace, is how challenging nearly all respondents (99%) found defining service-level objectives (SLOs), a foundational metric for site reliability engineering.

"Google's work around their SRE practices have inspired wide discussion over the past years, and we're seeing more tool support to make it easier to define SLOs based on best practices," Andreas Grabner, DevOps activist at Dynatrace, told ITPro Today. "This is why 99% is a surprise."

Why Service-Level Objectives Are a Challenge for Site Reliability Engineering

As to why defining SLOs continues to be such a hard challenge, Grabner has a few ideas.

"There's a disconnect between business objectives and how they translate to technical and digital services," Grabner said. "Business objectives of the organization are what must define SLOs, and in my observation, this alignment is where SREs are most challenged."

Another reason defining SLOs is such a challenge is that it's difficult to understand what the current service levels in an organization actually are. The majority (68%) of survey respondents said that the use of multiple tools and siloed organizational units makes it harder to determine actual service levels.
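To make the measurement problem concrete, here is a minimal Python sketch of how a current service level might be compared against an SLO. It is an illustration only: the function names, the request counts, and the 99.9% availability target are all assumptions chosen for the example, and none of them come from the report.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Measured service level: the share of requests that succeeded."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    allowed_failure = 1.0 - slo_target   # failure rate the SLO tolerates
    actual_failure = 1.0 - sli           # failure rate actually observed
    return 1.0 - actual_failure / allowed_failure


# Hypothetical numbers: 1,000,000 requests, 500 of which failed.
sli = availability_sli(good_requests=999_500, total_requests=1_000_000)
remaining = error_budget_remaining(sli, slo_target=0.999)

print(f"measured availability: {sli:.4%}")
print(f"error budget remaining: {remaining:.0%}")
```

The arithmetic itself is simple. The hard part the survey points to is agreeing on what counts as a "good" request and collecting consistent counts when the data is spread across multiple tools and teams.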

While there is a challenge defining SLOs, there is widespread acknowledgment of the importance of the role of site reliability engineers overall. Sixty-eight percent of respondents expect the role of SRE in security to increase in the years to come, and 88% noted that in 2022 most organizations understand the strategic importance of SREs.

How SREs Spend Their Time Solving Site Reliability Challenges

There are a number of common tasks that dominate site reliability engineers' time.

The 2022 State of SRE Report found that SREs spend the largest share of their time improving mean time to recovery (MTTR) for organizations' IT operations and services. Following closely in second place is building and maintaining automation code for a range of tasks.
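For readers less familiar with the metric, MTTR is commonly calculated as the average time between detecting an incident and restoring service. The Python sketch below shows that arithmetic under stated assumptions: the function name and the three incident timestamps are invented for illustration and are not drawn from the report.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average of (resolved - detected) over (detected, resolved) pairs."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(),  # start value so timedeltas can be summed
    )
    return total / len(incidents)


# Hypothetical incidents lasting 30, 90 and 60 minutes.
incidents = [
    (datetime(2022, 3, 1, 9, 0), datetime(2022, 3, 1, 9, 30)),
    (datetime(2022, 3, 5, 14, 0), datetime(2022, 3, 5, 15, 30)),
    (datetime(2022, 3, 9, 22, 15), datetime(2022, 3, 9, 23, 15)),
]

print(mean_time_to_recovery(incidents))  # 1:00:00
```

Teams differ on when the clock starts and stops (for example, at the first alert versus at the customer report), so any MTTR figure is only comparable when those definitions are the same.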

SREs are building automation code to help remediate security issues and resolve application failures. Automation code is also used to increase the speed of application and service delivery. In fact, 85% of respondents noted that both automation and AI are critical to scaling SRE practices overall.

"SREs need to influence future architectures so that it will become easier to leverage the new generation of automation to remediate problems," Grabner said.


About the Author(s)

Sean Michael Kerner


Sean Michael Kerner is an IT consultant, technology enthusiast and tinkerer. He consults to industry and media organizations on technology issues.

