How can you optimize the reliability of your applications and infrastructure?
The conventional answer involves deploying a range of tools, teams, and roles that are not exactly cheap. You can hire site reliability engineers (SREs), who specialize in optimizing availability and performance, but they represent one of the most expensive roles in the modern IT organization. You can refactor your applications to run as microservices, which can improve reliability and performance, but that takes a lot of developer resources that your business may not have. You can pay for more expensive cloud hosting or mirror your workloads across multiple cloud availability zones or regions to increase availability, but doing so could increase your cloud computing bill considerably.
So, what do you do if you need to improve application reliability but don't have unlimited financial resources? What if you can't afford to refactor or pay SREs to be on-call on a 24/7 basis?
There are actually a number of things you can do. As this article explains, it's possible to improve application reliability without breaking the budget, even if that budget is tight to begin with.
1. Leverage Load Balancing
Load balancers distribute application traffic between multiple app instances or servers. They increase application reliability by ensuring that you use hosting resources as efficiently as possible. For example, a good load-balancing configuration can redirect traffic from an application instance that is maxed out to another one that is underutilized so that the application continues to operate without dropping requests.
Public clouds offer fully managed load-balancing services, or you can configure your own load balancer.
Managed load balancers do cost money, but the cost is typically small, and most load balancers are easy to configure. That makes them a simple, cost-effective way to improve reliability.
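As a rough illustration of how a load balancer steers traffic away from a maxed-out instance, here is a minimal Python sketch of a least-connections policy. It is not how any particular cloud load balancer is implemented. The Backend class, its fields, and the function names are all hypothetical, and real services add health checks, timeouts, and connection draining on top of this.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    """One application instance behind the balancer (hypothetical model)."""
    name: str
    max_connections: int
    active_connections: int = 0
    healthy: bool = True

    def has_capacity(self) -> bool:
        # An instance can take traffic only if it is up and not maxed out.
        return self.healthy and self.active_connections < self.max_connections


def pick_backend(backends: List[Backend]) -> Optional[Backend]:
    """Choose the least-loaded instance that still has spare capacity."""
    candidates = [b for b in backends if b.has_capacity()]
    if not candidates:
        return None  # every instance is down or saturated
    return min(candidates, key=lambda b: b.active_connections)


def route_request(backends: List[Backend]) -> str:
    """Assign one request; return the chosen instance name or 'rejected'."""
    backend = pick_backend(backends)
    if backend is None:
        return "rejected"
    backend.active_connections += 1
    return backend.name
```

With one saturated instance and one idle instance in the list, route_request sends new requests to the idle one until it is also full, and only then starts rejecting.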
2. Configure Autoscaling
Autoscaling services are another low-cost (or, in some cases, free) and simple way to improve reliability. Autoscaling lets you configure rules that automatically increase or decrease infrastructure capacity, for example by adding or removing server instances. As a result, you can accommodate shifts in demand, which in turn enables you to keep your application running smoothly even if you experience unanticipated changes in load.
Not all applications and infrastructure can be autoscaled, but autoscaling is available for most core IaaS services — such as AWS EC2 and Azure Virtual Machines.
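To make the idea of an autoscaling rule concrete, the following Python sketch computes how many instances a simple target-based rule would ask for. It is an illustration only and does not call any cloud provider's API. The function name, the 60% CPU target, and the instance limits are assumptions chosen for the example.

```python
import math


def desired_instances(
    current: int,
    avg_cpu_percent: float,
    target_cpu_percent: float = 60.0,  # assumed target, not a provider default
    min_instances: int = 2,            # assumed floor
    max_instances: int = 10,           # assumed ceiling
) -> int:
    """Return the instance count that would bring average CPU near the target."""
    if current < 1:
        raise ValueError("current must be at least 1")
    if target_cpu_percent <= 0:
        raise ValueError("target_cpu_percent must be positive")
    # Scale the fleet in proportion to how far load is from the target.
    raw = math.ceil(current * avg_cpu_percent / target_cpu_percent)
    # Clamp so the rule never scales below the floor or above the ceiling.
    return max(min_instances, min(max_instances, raw))
```

Under these assumptions, four instances averaging 90% CPU would be scaled out to six, while four instances averaging 30% CPU would be scaled in to two. The ceiling matters for cost control: it caps how much a traffic spike can add to the bill.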
3. Create Reliability Playbooks
Playbooks are predefined procedures that teams develop ahead of time to spell out how they will react to various types of problems — such as a server or network failure. By speeding incident response processes and removing some of the guesswork, playbooks help decrease the risk of downtime due to unexpected incidents.
There is no direct cost to develop playbooks and no special tools to pay for (although to make the most of playbooks, you may want to integrate them with your observability and incident response tools). And although you'll need some staff time to create playbooks, you can streamline the process by looking at which problems your team has experienced in the past and how it responded. That information can form the basis for your playbooks.
4. Containerize Your Application — Even if It's a Monolith
Running applications in containers improves reliability because it provides for a more consistent and predictable hosting environment. When your app is containerized, configuration variables on the host server aren't very important because the only configuration that really matters is what's baked into the container.
Containers are most commonly used to host microservices. But there's no reason why you can't run a monolithic application inside a container, too. You won't enjoy all of the scalability benefits that you'd get from a microservices architecture, but you will enjoy a more consistent application hosting environment and, by extension, a lower risk of reliability issues.
You also won't have to expend significant development resources refactoring your application. You may have to make some changes to address requirements like application storage (because containers are ephemeral, they can't provide persistent storage resources for a monolithic application in the same way that a host server could), but you won't need a whole development team and months of time to handle those challenges.
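One common way to handle the storage change is to have the application write its state to a directory that the container runtime maps to storage outside the container. The Python sketch below assumes that approach. The APP_DATA_DIR environment variable, the default path, and the function names are hypothetical, and how the directory gets mounted depends on your container platform.

```python
import os
from pathlib import Path


def data_dir(default: str = "/var/lib/myapp") -> Path:
    """Return the directory for persistent state.

    APP_DATA_DIR is a hypothetical variable; the idea is that it points at
    a mounted volume rather than the container's own writable layer.
    """
    return Path(os.environ.get("APP_DATA_DIR", default))


def save_record(name: str, content: str) -> Path:
    """Write one record under the data directory and return its path."""
    directory = data_dir()
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / name
    path.write_text(content, encoding="utf-8")
    return path
```

Because the location comes from configuration rather than being hard-coded, the same monolith can run on a traditional server or in a container without code changes.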
5. Use Canary Releases
A canary release is the deployment of a newer version of an application to a select group of users — the so-called canaries. That way, any reliability problems introduced by the version will impact a limited portion of your user base, and you can fix the issues before pushing the version out to everyone.
Canary releases add a bit of complexity to the application deployment process because they require you to be able to host different versions of your application for different users. But you can typically do this easily enough — and without great cost — by setting up load balancers to direct traffic as required to multiple application versions.
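The routing decision behind a canary release can be sketched in a few lines of Python. This example assumes users are assigned to the canary group by hashing a user ID, so that each user consistently sees the same version. The function names and the version labels "v1" and "v2" are hypothetical, and a managed load balancer would typically express the same idea as weighted routing rules rather than application code.

```python
import hashlib


def in_canary(user_id: str, canary_percent: int) -> bool:
    """Return True if this user falls in the canary group."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    # Hash the ID so assignment is stable across requests and servers.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return bucket < canary_percent


def pick_version(
    user_id: str,
    canary_percent: int,
    stable: str = "v1",   # hypothetical label for the current release
    canary: str = "v2",   # hypothetical label for the new release
) -> str:
    """Choose which application version should serve this user."""
    return canary if in_canary(user_id, canary_percent) else stable
```

Raising canary_percent gradually widens the rollout, and setting it back to 0 sends everyone to the stable version if the new release misbehaves.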
Application Reliability Doesn't Have to Be Expensive
In a perfect world, every team would have the resources to invest in software architectures and hosting models that maximize reliability. But in the real world, it's not always financially feasible to take advantage of the most sophisticated reliability tools and techniques.
Fortunately, there are less costly ways to improve application reliability — and most of them don't require a lot of complexity or configuration, either.
About the author
Christopher Tozzi is a technology analyst with subject matter expertise in cloud computing, application development, open source software, virtualization, containers and more. He also lectures at a major university in the Albany, New York, area. His book, “For Fun and Profit: A History of the Free and Open Source Software Revolution,” was published by MIT Press.