If you're a company like Google, it's easy to see how site reliability engineering — which, not coincidentally, originated inside Google — is valuable. SRE helps large businesses optimize the performance and reliability of their sprawling, complex infrastructures and applications.
It may be harder, however, to determine whether hiring site reliability engineers is worth the cost and effort for smaller companies. If you have a relatively small infrastructure and set of applications to manage — or if you just have a limited overall engineering budget — SRE is not as obvious a choice. And just because the tech giants are all about SRE these days doesn't mean every small and midsize business (SMB) or small and midsize enterprise (SME) should be, too.
This raises the question: How large does a company need to be, exactly, before it should hire site reliability engineers?
For Site Reliability Engineering, Size Matters
Any company could hire SREs. However, small companies may shoot themselves in the foot if they hire SREs unnecessarily, for several reasons:
- SREs — who are among the highest-earning types of engineers in the IT industry — can put strain on a smaller business's budget.
- The job of SREs is to help developers and IT engineers manage reliability more effectively. If you have few developers and IT engineers on staff, there may not be much point to hiring SREs.
- SREs could create unnecessary organizational complexity within small companies that already have deeply entrenched ways of doing things, raising questions about who "owns" what when it comes to reliability.
- Small companies may not have IT environments that are large or complicated enough to require the expertise of SREs.
This isn't to say that smaller businesses never need SREs. It just means that hiring SREs could do more harm than good for some small companies.
How SMBs and SMEs Should Think About SRE
How, then, should smaller businesses decide whether SREs are right for them? Here are several pointers to consider.
The '25 engineer' rule
While every business's mileage will, of course, vary, one rule of thumb for determining whether to invest in site reliability engineers is what you could call the "25 engineer" rule.
This is the idea — articulated by people like Seth Vargo of Google — that companies begin to need SRE once they have at least 25 engineers on staff, or at least 25 employees of some type within their engineering organization.
Below that point, your engineering team may simply be too small and uncomplicated to require roles that focus on reliability alone.
Companies should think, too, about how they handle on-call scheduling — meaning the process of determining which engineers are "on call" to respond to unexpected incidents — and whether it would be beneficial to have SREs take the lead on that front.
You certainly don't need SREs to cover on-call duty. Plenty of companies manage on-call scheduling using just IT engineers. That said, on-call operations can be smoother when at least some of the people who are on call are SREs, who excel at responding to incidents because that is a main focus of their jobs.
So, if your existing on-call management strategy is not working well, you may stand to benefit more from hiring SREs than would another small company that doesn't face on-call challenges. That might be the case because your engineers don't like being on call, or because it's hard to ensure that you always have someone on call who has the background necessary to respond quickly to any type of incident.
Small businesses should think, too, about their growth plans. With regard to the value SREs can bring, there is a big difference between a high-growth startup that has only 10 engineers today but expects to have 50 within a year and an established SMB that is not in growth mode.
The point here is that businesses should assess whether SREs may be valuable to them not just today, but also one or three or five years out. It makes more sense to invest in SREs if you expect to need them in the relatively near future, even if you don't really need them at present because your company isn't big enough yet.
A final critical factor to weigh if you are a small business thinking about hiring site reliability engineers is how complex your technology stack is. While I don't have data to prove it, I suspect that SMBs and SMEs are, on the whole, less likely than larger enterprises to be deploying highly complex, cloud-native technologies, due to their smaller engineering staffs.
But, of course, there are exceptions. If you're a small company that has gone all-in on Kubernetes or hybrid cloud, for instance, you probably can benefit from SREs more than a business that mostly uses just VMs. VM-based environments are easier to keep running without the specialized reliability expertise of SREs.
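For readers who like to see a checklist written down, the four factors above (team size, on-call needs, growth plans and stack complexity) can be sketched as a few lines of Python. This is only an illustration, not a formula: the `Company` fields and the `sre_signals` function are hypothetical names, the booleans stand in for judgment calls that only you can make, and the only number taken from the discussion above is the 25-engineer rule of thumb.

```python
from __future__ import annotations

from dataclasses import dataclass

# The "25 engineer" rule of thumb discussed above; a guideline, not a hard limit.
ENGINEER_THRESHOLD = 25


@dataclass
class Company:
    """Hypothetical inputs for the checklist; all field names are illustrative."""
    engineers_today: int
    engineers_expected: int   # projected headcount over your planning horizon
    on_call_struggling: bool  # is the current on-call arrangement working poorly?
    complex_stack: bool       # e.g. all-in on Kubernetes or hybrid cloud


def sre_signals(company: Company) -> list[str]:
    """Return the factors that currently point toward hiring SREs."""
    signals = []
    if company.engineers_today >= ENGINEER_THRESHOLD:
        signals.append("team size")
    elif company.engineers_expected >= ENGINEER_THRESHOLD:
        # Too small today, but expected to cross the threshold soon.
        signals.append("growth plans")
    if company.on_call_struggling:
        signals.append("on-call challenges")
    if company.complex_stack:
        signals.append("stack complexity")
    return signals


# The high-growth startup from the example above: 10 engineers now, 50 expected.
startup = Company(engineers_today=10, engineers_expected=50,
                  on_call_struggling=False, complex_stack=True)
print(sre_signals(startup))
```

An empty list does not prove you should never hire SREs, and a full one does not prove you must. The sketch deliberately returns the matching factors rather than a score, since how much weight each one deserves depends on your business.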
Site reliability engineers are great when you actually need them, but not every business does. Smaller companies in particular should carefully weigh their reliability requirements, as well as how SREs would fit within their existing engineering organization, before committing to site reliability engineering.