The widespread adoption of site reliability engineering, or SRE, roles has been one of the biggest disruptors in the IT operations world over the past several years. Although SRE originated at Google nearly two decades ago, it has only been over the past half-decade or so that companies of all types and sizes have begun adding SREs to their IT organizations.
The question is: Will the trend last? Will there remain strong demand for SREs five or 10 years from now?
I don't know the answer. But I do have some thoughts on the future of SRE roles. Let me explain them by articulating two reasons why site reliability engineers may be part of IT teams for years to come, and two reasons why site reliability engineering may turn out just to be a fad.
Why SREs Could Be Here to Stay
Let's start with reasons why SREs may have a bright future.
1. High user expectations
Probably the most compelling reason to envision SRE as an enduring role is that reliability expectations are higher than ever, and companies need SREs to help meet them.
Even just five years ago, users had more patience for websites that failed to load or applications that crashed. But that's no longer the case. Factors such as hyper-competitiveness between businesses and the need to support remote workforces in the wake of the COVID-19 pandemic have raised the bar.
Because SREs specialize in optimizing the performance and availability of resources, they can be a unique asset to companies trying to meet the very high expectations of modern users.
2. Complex architectures
SREs also have the potential to deliver unique value for organizations that need to support complex software architectures and environments, like Kubernetes.
Maintaining the reliability of a monolithic application hosted on a VM is straightforward enough. Distributed applications and architectures are a different type of beast, however, and not every IT engineer knows how to tame them from a reliability perspective. SREs typically do.
As cloud-native architectures become ever more widespread, there is likely to be steady demand for SREs.
Why SREs May Fade Away
On the other hand, there are good reasons to think that SRE will turn out to be mostly a fad, and that even if there are still SREs five or 10 years from now, they'll constitute a niche role more than a mainstream part of the IT industry.
1. SRE is a concept for hyperscale companies
Arguably, the greatest reason why site reliability engineering may not become an enduring role at most companies is that it was conceived by and for hyperscale companies, namely Google. Companies that lack IT estates as large or complex as Google's may discover that SRE is overkill and that they can get by perfectly fine using traditional reliability strategies and roles.
Whether that turns out to be the case depends in large part, I think, on how far the adoption of cloud-native technology goes. Platforms such as Kubernetes were also designed for and by hyperscalers (indeed, Kubernetes is another product that traces its origins to Google), and it's possible that smaller businesses will decide they don't actually need to containerize and microservice-ize all of their applications. By extension, they won't need SREs to help manage those technologies.
2. The SRE role remains ambiguous
Another challenge that site reliability engineers currently face is that there is little consensus about what exactly they do, or how they do it.
Sure, there are resources like the Google SRE Book, which spells out in some detail what the SRE role is supposed to consist of. And there's a general sense that the purpose of SREs is to specialize in reliability, typically by applying the techniques of software engineers to IT operations.
But to a greater extent than other roles, such as software developer or IT engineer, the definition of SRE remains hazy. It's likely to be unclear to at least some IT leaders how SREs differ from IT operations engineers, for example, or why they need to spend money on SRE roles when their existing teams are already heavily invested in achieving performance and availability goals.
Admittedly, you could make the same criticisms of a role like DevOps engineer, which is also subject to some ambiguity. But DevOps caught on in large part because it promised to form a bridge between developers and IT engineers. SREs more or less do the same thing, which may make them seem redundant for businesses that already have DevOps roles, even though the two roles are not identical.
SREs' Murky Future
There's no denying that SREs are in demand today. Indeed, they are among the best-compensated members of the IT organization, which is a reflection of the premium that companies place on hiring SREs.
But I'm not convinced that this trend will endure. It could, if demand for engineers who specialize in managing complex, cloud-native environments lasts. But there's also a decent chance that most companies will decide that they can get along well enough without the help of SREs.
About the author
Christopher Tozzi is a technology analyst with subject matter expertise in cloud computing, application development, open source software, virtualization, containers and more. He also lectures at a major university in the Albany, New York, area. His book, “For Fun and Profit: A History of the Free and Open Source Software Revolution,” was published by MIT Press.