Sometimes, the best way to ensure the reliability of an application or IT service is to inject some chaos into the process to see how it will react.
That's the goal of Steadybit, which has developed a chaos and resilience engineering platform to help site reliability engineers (SREs) and IT operations professionals. The company announced the general availability of its platform on Sept. 22, alongside $7.8 million in seed funding.
With chaos engineering, faults and errors are deliberately injected into a system to observe how it responds.
"Normally chaos engineering is something for experts only, as you need to know a lot of information about your application," Benjamin Wilms, CEO and co-founder of Steadybit, told ITPro Today. "Now with Steadybit, people are able to do it on their own without knowing anything about chaos engineering."
Moving Chaos Engineering from Production to Development
There are multiple chaos engineering tools on the market, including open source ones such as Chaos Monkey. Steadybit differentiates itself from other tools by targeting developers and injecting chaos experiments into the development process, whereas chaos engineering has typically been used on the operations and production side, Wilms said.
Chaos engineering tools typically provide somewhat generic failures to test resilience, he said. In contrast, Steadybit users describe in the platform what they would like to achieve. For example, users can configure Steadybit to test whether a specific service will survive an update while under a heavy load of users.
Users have expectations for what their system can deliver, and with Steadybit, they can see if those expectations match the reality, according to Wilms.
"We provide a policy so users can describe expectations as code, and then they are executed by Steadybit," he said.
Steadybit is deployed and run as part of an organization's continuous integration/continuous delivery (CI/CD) process. Among the challenges of testing during the development process is the fact that, unlike a production application, the application being developed doesn't yet have users. To that end, Steadybit integrates with load testing tools that can simulate traffic, such as JMeter.
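To make the idea of "expectations as code" concrete, here is a minimal sketch in Python. It is not Steadybit's policy format, which the article does not describe. The `Expectation` class, the `evaluate` function, the metric names and the thresholds are all hypothetical, and the observed metrics are hard-coded stand-ins for numbers a load testing tool might report during a CI/CD run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Expectation:
    """A hypothetical expectation: a description plus a check on metrics."""
    description: str
    check: Callable[[Dict[str, float]], bool]


def evaluate(expectations: List[Expectation],
             observed: Dict[str, float]) -> List[str]:
    """Return the descriptions of every expectation the metrics violate."""
    return [e.description for e in expectations if not e.check(observed)]


# Illustrative expectations; the thresholds are assumptions, not vendor defaults.
EXPECTATIONS = [
    Expectation("error rate stays below 1% during a rolling update",
                lambda m: m["error_rate"] < 0.01),
    Expectation("p95 latency stays under 500 ms under load",
                lambda m: m["p95_latency_ms"] < 500),
]

# Stand-in for metrics collected while a fault was injected under load.
observed = {"error_rate": 0.03, "p95_latency_ms": 420.0}
failed = evaluate(EXPECTATIONS, observed)

for description in failed:
    print(f"expectation not met: {description}")
```

In a pipeline, a non-empty `failed` list would typically be used to fail the build, so that the gap between expectation and reality surfaces before a feature reaches production.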
"Normally, people are just checking for performance with load testing tools, but they are not verifying if the system is really working under production situations, because production is not a happy place to be," Wilms said.
From Chaos to Resilience Engineering
Wilms argued that a primary motivation for developers is not resilience, but rather getting new and exciting features into production.
With that understanding, Steadybit aims to work the way developers do, with a focus on helping to get features into production. To that end, as part of a CI/CD process, Steadybit can provide recommendations that improve resilience and also help developers complete features faster.
Currently, Steadybit enables developers to run chaos experiments in a time-based approach, where a failure condition will operate for a specific period of time. Looking forward, Wilms said Steadybit will introduce event-based timing, such that a chaos experiment will run until a certain event occurs.
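The difference between the two timing models can be illustrated with a short Python sketch. This is an illustration under stated assumptions rather than how Steadybit implements either mode: `FlakyService`, `injected_fault`, `run_time_based` and `run_event_based` are hypothetical names, and the "fault" is just a flag that makes calls fail.

```python
import threading
import time
from contextlib import contextmanager


class FlakyService:
    """Hypothetical service whose calls fail while a fault is active."""

    def __init__(self) -> None:
        self.fault_active = False

    def call(self) -> str:
        if self.fault_active:
            raise ConnectionError("injected fault")
        return "ok"


@contextmanager
def injected_fault(service: FlakyService):
    """Turn the fault on, and always turn it off again on exit."""
    service.fault_active = True
    try:
        yield service
    finally:
        service.fault_active = False


def run_time_based(service: FlakyService, duration_s: float) -> None:
    """Time-based: the fault stays active for a fixed period."""
    with injected_fault(service):
        time.sleep(duration_s)


def run_event_based(service: FlakyService, stop_event: threading.Event,
                    timeout_s: float) -> bool:
    """Event-based: the fault stays active until the event is set.

    Returns True if the event ended the experiment, False on timeout.
    """
    with injected_fault(service):
        return stop_event.wait(timeout=timeout_s)


service = FlakyService()
run_time_based(service, 0.05)

# Simulate an external signal (for example, "deployment finished").
done = threading.Event()
threading.Timer(0.05, done.set).start()
ended_by_event = run_event_based(service, done, timeout_s=2.0)
print(ended_by_event, service.call())
```

The timeout in the event-based variant is a design choice of this sketch: it bounds how long the fault can stay active if the awaited event never arrives.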
About the author
Sean Michael Kerner is an IT consultant, technology enthusiast and tinkerer. He consults to industry and media organizations on technology issues.