Once upon a time, the people responsible for deploying and managing applications and infrastructure were known as IT operations engineers (or ITOps engineers for short).
ITOps teams still exist, and they still do those things. However, there’s a new category of engineer in town: site reliability engineer, or SRE.
While SREs perform many of the same tasks as ITOps teams, there are subtle but important differences between the ITOps engineer and site reliability engineer roles. It’s critical to understand these differences if you want to know how IT organizations are structured today, and how SREs fit into them.
What Is IT Operations?
In a typical organization, IT operations is the team responsible for setting up and managing IT resources.
Traditionally, there has been a distinction between the IT operations team and the development team. The latter writes applications; the former deploys and manages them. And although DevOps encourages collaboration between ITOps engineers and developers, DevOps is not a replacement for ITOps. Usually, you’d keep your ITOps team even if you form a new DevOps team.
What Is Site Reliability Engineering?
Site reliability engineering is an IT discipline that focuses on optimizing the reliability of applications and infrastructure. The core mission of SREs is to maximize both availability and performance across IT environments.
Conceptually, site reliability engineering is pretty old. It originated at Google in the early 2000s, and other web-scale companies (like Facebook and Netflix) embraced the concept during the years that followed. However, it has only been over the past several years that SRE has begun to catch on within smaller companies, which are now increasingly adding SREs to their IT departments.
Similarities Between ITOps and Site Reliability Engineer Roles
That trend has more and more people asking the questions: Is SRE really all that different from ITOps? Or is it just a new buzzword and job title that covers what ITOps engineers have long been doing?
Those are fair questions. There is significant overlap between IT operations work and SRE work, especially in areas like:
- Deployment: IT operations teams and SREs are both concerned with ensuring that applications are deployed smoothly into production environments.
- Monitoring and observability: Both roles use monitoring and observability tools to detect and respond to problems.
- Incident management: Managing incidents (meaning disruptions that cause problems within applications or infrastructure) is a core responsibility of both SREs and ITOps engineers.
- Collaboration with developers: In today’s DevOps-centric world, ITOps teams and SREs alike are expected to work closely alongside developers, and to embrace the concept of shared ownership over software delivery.
Viewed from these perspectives, ITOps teams and SRE teams look much alike.
Key Differences Between ITOps and SRE
Despite those similarities, it would be a mistake to conflate ITOps with SRE – or to suggest that SREs are just ITOps engineers by a different name.
The main difference between ITOps and SRE boils down to scope of work. While SREs have a hand in many of the core IT operations processes, they also do things that fall beyond the scope of IT operations.
For instance, SREs might work with developers to help plan the next round of application updates to ensure that those updates will help rather than hinder application reliability. They may also work closely with QA engineers, who test applications before they are deployed, to help detect and fix reliability issues before the applications reach production.
A second important difference is that, while reliability is one thing that ITOps engineers care about, it’s not the main or only thing. IT operations teams are equally concerned with things such as end-user experience (which only partly involves reliability), infrastructure lifecycle management and infrastructure cost optimization, to name just a few ITOps priorities.
In contrast, SREs focus on reliability first and foremost. That doesn’t mean they totally ignore concerns like cost management. But they don’t get paid to address those other challenges. Their chief priority, and their main measure of success, is ensuring that IT resources meet performance and availability requirements.
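To make "meeting performance and availability requirements" a little more concrete, here is a minimal sketch of how such a check might be expressed in code. The 99.9% availability target, the 300 ms latency ceiling, the sample data and every function name below are assumptions made for illustration, not figures or tooling from any particular organization.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Requirements:
    # Hypothetical targets; real values would come from your own service-level goals.
    min_availability: float = 0.999     # fraction of requests that must succeed
    max_p95_latency_ms: float = 300.0   # ceiling for 95th-percentile latency


def availability(successful: int, total: int) -> float:
    """Fraction of requests that succeeded; no traffic counts as fully available."""
    return 1.0 if total == 0 else successful / total


def p95(latencies_ms: Sequence[float]) -> float:
    """95th-percentile latency using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Integer form of ceil(0.95 * n), which avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_requirements(
    successful: int,
    total: int,
    latencies_ms: Sequence[float],
    req: Requirements = Requirements(),
) -> bool:
    """True only if both the availability and the latency targets are met."""
    return (
        availability(successful, total) >= req.min_availability
        and p95(latencies_ms) <= req.max_p95_latency_ms
    )


# Illustrative, made-up measurements for one reporting window.
sample_latencies = [120.0] * 95 + [450.0] * 5
print(meets_requirements(99_950, 100_000, sample_latencies))
```

In practice the inputs would come from monitoring data rather than hard-coded lists, but the idea is the same: reliability goals are written down as numbers, and the team's success is judged against them.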
Why Hire an SRE?
Those two differences explain why more and more businesses are investing in SREs to complement their IT operations teams.
When you hire an SRE, you get someone who collaborates with all stakeholders – developers, QA engineers, security analysts, ITOps engineers and everyone in between. By extension, you get a type of engineer who will help make your IT organization more cohesive.
At the same time, SREs bring a dedication to reliability that other roles just don’t have, at least not to the same extent. That’s important in an era when users expect sites to load in seconds, and when even a few milliseconds of latency may mean you fail to deliver on the promise of technologies like 5G.
If you’re an ITOps engineer, it’s likely that you’ll be working with SREs in the relatively near future. Although there is certainly some overlap between ITOps and site reliability engineer roles, SREs can do things that ITOps engineers can’t, and vice versa.